[jira] [Commented] (HADOOP-15191) Add Private/Unstable BulkDelete operations to supporting object stores for DistCP
[ https://issues.apache.org/jira/browse/HADOOP-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352889#comment-16352889 ] Sanjay Radia commented on HADOOP-15191: --- Steve, can you please explain how this will be used? For example, will distcp call the fs-object to see if it has a bulk delete and then call that fs's bulk deletes? Alternatively we could add a bulk delete operation to the FileSystem and FileContext API and have distcp simply call fs.bulkDelete(...); the fs implementation will either call the bulk delete operation or call individual deletes. The second approach has the advantage that distcp's code is simpler. > Add Private/Unstable BulkDelete operations to supporting object stores for > DistCP > - > > Key: HADOOP-15191 > URL: https://issues.apache.org/jira/browse/HADOOP-15191 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, tools/distcp >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15191-001.patch, HADOOP-15191-002.patch, > HADOOP-15191-003.patch, HADOOP-15191-004.patch > > > Large scale DistCP with the -delete option doesn't finish in a viable time > because of the final CopyCommitter doing a 1-by-1 delete of all missing > files. This isn't randomized (the list is sorted), and it's throttled by AWS. > If bulk deletion of files were exposed as an API, distCP would make 1/1000th of > the REST calls and so not get throttled. > Proposed: add an initially private/unstable interface for stores, > {{BulkDelete}}, which declares a page size and offers a > {{bulkDelete(List)}} operation for the bulk deletion. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
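To make the two alternatives above concrete, here is a minimal sketch of what a caller-side probe for a {{BulkDelete}}-style interface could look like. Only the names BulkDelete, bulkDelete, and the page-size idea come from the issue; every other type and method name here is hypothetical, not from the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of the proposed private/unstable interface.
interface BulkDelete {
    int pageSize();                       // max paths per backend DELETE request
    void bulkDelete(List<String> paths);  // delete up to pageSize() paths in one call
}

public class BulkDeleteSketch {
    // DistCp-style caller: if the store implements BulkDelete, batch the missing
    // files into pages; otherwise fall back to one delete call per file.
    static int deleteAll(Object fs, List<String> paths) {
        int calls = 0;
        if (fs instanceof BulkDelete) {
            BulkDelete bd = (BulkDelete) fs;
            int page = bd.pageSize();
            for (int i = 0; i < paths.size(); i += page) {
                bd.bulkDelete(paths.subList(i, Math.min(i + page, paths.size())));
                calls++;
            }
        } else {
            for (String p : paths) {      // per-file fallback: one call each
                calls++;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 2500; i++) paths.add("s3a://bucket/f" + i);
        BulkDelete store = new BulkDelete() {
            public int pageSize() { return 1000; }  // S3 multi-object delete caps at 1000 keys
            public void bulkDelete(List<String> batch) { /* one REST call */ }
        };
        System.out.println(deleteAll(store, paths)); // 3 paged calls instead of 2500
    }
}
```

Sanjay's second alternative would move the `instanceof` probe out of distcp and into a default FileSystem#bulkDelete that loops over single deletes, leaving the caller with just one method call.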
[jira] [Commented] (HADOOP-15140) S3guard mistakes root URI without / as non-absolute path
[ https://issues.apache.org/jira/browse/HADOOP-15140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336896#comment-16336896 ] Sanjay Radia commented on HADOOP-15140: --- {quote}{{FileContext}} tried to move all the path manipulation up a level, so the FS implementations only deal with absolute URIs {quote} Yes that is correct. It simplified the semantics and the implementation. > S3guard mistakes root URI without / as non-absolute path > > > Key: HADOOP-15140 > URL: https://issues.apache.org/jira/browse/HADOOP-15140 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Abraham Fine >Priority: Major > > If you call {{getFileStatus("s3a://bucket")}} then S3Guard will throw an > exception in putMetadata, as it mistakes the empty path for "non-absolute > path" -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-11452) Revisit FileSystem.rename(path, path, options)
[ https://issues.apache.org/jira/browse/HADOOP-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802980#comment-15802980 ] Sanjay Radia commented on HADOOP-11452: --- *Some background:* # FileSystem#rename(src, dest, options) originally served as a temporary implementation of FileContext#rename(src, dest, options). Hence it was kept protected. It was deprecated when the implementation in AbstractFileSystem called dfsClient.rename(...), which btw is atomic as noted in this Jira. # Why didn't we make it public with the correct impl? The expectation was that all Hadoop apps would move to FileContext. The flaw in that plan was that apps like Hive need to run on both Hadoop 1 and Hadoop 2 and they picked the lowest common denominator: Hadoop 1, which only has FileSystem. We should have backported FileContext to Hadoop 1. Big mistake. Going forward there is little chance of removing FileSystem since many customers probably use it directly. Hence I am okay with FileSystem#rename(src, dest, options) becoming public and having a correct implementation (i.e. call dfsClient#rename(...)). However we should do this ONLY if we feel customers/apps need the OVERWRITE flag. I have already commented on the Rename.ATOMIC_REQUIRED. My vote is -1 for this option as I explained in my comment above. We could move the Rename.ATOMIC_REQUIRED part to a separate jira if folks want to discuss this further and move forward on making rename-with-options public. > Revisit FileSystem.rename(path, path, options) > -- > > Key: HADOOP-11452 > URL: https://issues.apache.org/jira/browse/HADOOP-11452 > Project: Hadoop Common > Issue Type: Task > Components: fs >Affects Versions: 2.7.3 >Reporter: Yi Liu >Assignee: Steve Loughran > Attachments: HADOOP-11452-001.patch, HADOOP-11452-002.patch > > > Currently in {{FileSystem}}, {{rename}} with _Rename options_ is protected > and with _deprecated_ annotation. And the default implementation is not > atomic. 
> So this method is not able to be used outside. On the other hand, HDFS has a > good and atomic implementation. (Also an interesting thing in {{DFSClient}}, > the _deprecated_ annotations for these two methods are opposite). > It makes sense to make public for {{rename}} with _Rename options_, since > it's atomic for rename+overwrite, also it saves RPC calls if user desires > rename+overwrite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
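The rename+overwrite semantics under discussion (replace an existing destination in a single atomic call, with no separate delete RPC) mirror POSIX rename. As an analogy only (this is java.nio.file, not HDFS client code), the same single-call semantics look like:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RenameOverwriteSketch {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-demo");
        Path src = Files.writeString(dir.resolve("part-00000.tmp"), "new data");
        Path dst = Files.writeString(dir.resolve("part-00000"), "old data");

        // One call replaces the existing destination -- the OVERWRITE-flag
        // behavior the jira wants public. On HDFS this can be atomic on the
        // NN; object stores like S3 generally cannot promise atomicity.
        Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);

        System.out.println(Files.readString(dst)); // new data
    }
}
```

Without the flag, the caller would need a check-delete-rename sequence: extra RPCs and a window where the destination is missing, which is exactly what the issue description points out.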
[jira] [Commented] (HADOOP-11452) Revisit FileSystem.rename(path, path, options)
[ https://issues.apache.org/jira/browse/HADOOP-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802888#comment-15802888 ] Sanjay Radia commented on HADOOP-11452: --- Steve suggested: bq. note that we could consider adding a new enum operation Rename.ATOMIC_REQUIRED which will fail if atomicity is not supported We had considered such things (and this specific one) multiple times in the past, in the context of S3 and also the local file system, for not just rename but also other methods. Neither the local fs nor S3 has exactly the same semantics as HDFS for each method. *Here is the main issue:* File systems like LocalFileSystem are used for testing apps, and for a long time S3 was used for simply testing or for non-critical usage on the cloud. Folks were willing to live with the occasional inconsistency or with the performance overhead of say copy-delete for rename on S3. If applications like Hive or Spark used Rename.ATOMIC_REQUIRED then the app would just fail on S3 and those use cases (testing, non-critical, or willing to live with the performance overhead) would not be supported and its users would be unhappy. Now that users want to run production apps on cloud storage like S3, apps like Hive are being modified to run well on S3 by changing how they do commit (say via the metastore or a manifest file instead of the rename). So adding the Rename.ATOMIC_REQUIRED flag is easy. But is it going to be useful? Please articulate how it will be used. For example, if we were to change Hive to use Rename.ATOMIC_REQUIRED then Hive would just fail on S3. So I think we should continue to make progress on Hive, Spark and others to run first class on S3. I don't think Rename.ATOMIC_REQUIRED helps. 
I believe it makes sense to have an FS.whatFeaturesDoYouSupport() API so that an app like Hive could be implemented to run first class on HDFS, S3, Azure Blob Storage etc. by querying the FS features and then using a different implementation for, say, committing the output of a job. In some cases it may be better to use a totally different approach that works on all FSs, such as a manifest file, or depend on the Hive Metastore to commit. (Turns out Hive needs to be able to commit multiple tables and hence even the rename-dir is not good enough.) > Revisit FileSystem.rename(path, path, options) > -- > > Key: HADOOP-11452 > URL: https://issues.apache.org/jira/browse/HADOOP-11452 > Project: Hadoop Common > Issue Type: Task > Components: fs >Affects Versions: 2.7.3 >Reporter: Yi Liu >Assignee: Steve Loughran > Attachments: HADOOP-11452-001.patch, HADOOP-11452-002.patch > > > Currently in {{FileSystem}}, {{rename}} with _Rename options_ is protected > and with _deprecated_ annotation. And the default implementation is not > atomic. > So this method is not able to be used outside. On the other hand, HDFS has a > good and atomic implementation. (Also an interesting thing in {{DFSClient}}, > the _deprecated_ annotations for these two methods are opposite). > It makes sense to make public for {{rename}} with _Rename options_, since > it's atomic for rename+overwrite, also it saves RPC calls if user desires > rename+overwrite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
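The FS.whatFeaturesDoYouSupport() idea above can be sketched as a capability probe that an app consults once to pick a commit strategy. Everything here is hypothetical (the method name is Sanjay's suggestion; the enum, its values, and the strategies are illustrative; Hadoop later grew a broadly similar mechanism in hasPathCapability, but that is not what this sketch implements):

```java
import java.util.EnumSet;
import java.util.Set;

public class CommitterChoiceSketch {
    // Hypothetical feature set a FileSystem could advertise.
    enum FsFeature { ATOMIC_RENAME, ATOMIC_DIR_RENAME, CONSISTENT_LISTING }

    // An app like Hive queries the features and picks a commit implementation,
    // rather than failing outright the way a hard ATOMIC_REQUIRED flag would.
    static String chooseCommitStrategy(Set<FsFeature> features) {
        if (features.contains(FsFeature.ATOMIC_DIR_RENAME)) {
            return "rename-based commit";      // HDFS-style
        }
        return "manifest/metastore commit";    // works on S3, Azure Blob Storage, ...
    }

    public static void main(String[] args) {
        Set<FsFeature> hdfsLike = EnumSet.allOf(FsFeature.class);
        Set<FsFeature> s3Like = EnumSet.noneOf(FsFeature.class);
        System.out.println(chooseCommitStrategy(hdfsLike)); // rename-based commit
        System.out.println(chooseCommitStrategy(s3Like));   // manifest/metastore commit
    }
}
```

The design point is the one made in the comment: the probe degrades gracefully (a store lacking a feature gets a different code path), whereas Rename.ATOMIC_REQUIRED would simply fail the job on S3.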
[jira] [Commented] (HADOOP-12909) Change ipc.Client to support asynchronous calls
[ https://issues.apache.org/jira/browse/HADOOP-12909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204595#comment-15204595 ] Sanjay Radia commented on HADOOP-12909: --- I haven't had a chance to look at the patch or review all the comments, but wanted to bring attention to one issue wrt async rpc that is well known by implementors and practitioners of message passing & rpc systems (excuse me if this has already been covered): * One needs to watch out for buffer management, i.e. async rpc/message passing has the potential to use up memory for buffering the messages. This is prevented in sync rpc systems: ** the sender (client) blocks and cannot flood the receiver unless it uses threads ** the receiver (server) is guaranteed that the sender (i.e. client) is waiting to receive, and if it has died then the reply can be discarded. With async rpc, my suggestion is to consider something along the following lines: * the client needs to allocate some buffer (or space for it) where replies are stored. On each async rpc call, it passes a ref to this buffer for storing replies. If the client does not pick up the replies fast enough then its next async call using that buffer space will block. * Note this makes the client's code tricky in what to do if it is blocked, since one must ensure that a deadlock or starvation does not happen (but async messaging has always been tricky, which is why the CS community went with sync rpc). Note this problem does not arise with server-side async-rpc since the client is blocked waiting for the reply (unless the client also did an async call, but in that case its buffer, as per my suggestion, must be there to store the reply). 
> Change ipc.Client to support asynchronous calls > --- > > Key: HADOOP-12909 > URL: https://issues.apache.org/jira/browse/HADOOP-12909 > Project: Hadoop Common > Issue Type: New Feature > Components: ipc >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: HADOOP-12909-HDFS-9924.000.patch, > HADOOP-12909-HDFS-9924.001.patch, HADOOP-12909-HDFS-9924.002.patch, > HADOOP-12909-HDFS-9924.003.patch > > > In ipc.Client, the underlying mechanism is already supporting asynchronous > calls -- the calls share a connection, the call requests are sent using a > thread pool and the responses can be out of order. Indeed, synchronous call > is implemented by invoking wait() in the caller thread in order to wait for > the server response. > In this JIRA, we change ipc.Client to support asynchronous mode. In > asynchronous mode, it returns once the request has been sent out but does not > wait for the response from the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
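The bounded reply buffer Sanjay suggests can be sketched with a semaphore: a client that fails to drain replies eventually blocks on its own next async call, so buffering memory stays bounded. This is an illustrative sketch, not the ipc.Client patch; all names here are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

public class BoundedAsyncClientSketch {
    private final Semaphore replySlots;  // one permit per buffered, unconsumed reply
    final ExecutorService rpcPool = Executors.newFixedThreadPool(4);

    BoundedAsyncClientSketch(int bufferSize) {
        replySlots = new Semaphore(bufferSize);
    }

    // Async call: blocks the CALLER (not the server) once bufferSize replies
    // are outstanding and unconsumed -- the back-pressure described above.
    Future<String> asyncCall(String request) throws InterruptedException {
        replySlots.acquire();            // may block here: reply buffer is full
        return rpcPool.submit(() -> "reply-to-" + request);
    }

    // Consuming a reply frees buffer space for further async calls.
    String takeReply(Future<String> f) throws Exception {
        try {
            return f.get();
        } finally {
            replySlots.release();
        }
    }

    public static void main(String[] args) throws Exception {
        BoundedAsyncClientSketch client = new BoundedAsyncClientSketch(2);
        Future<String> a = client.asyncCall("r1");
        Future<String> b = client.asyncCall("r2");
        // A third asyncCall here would block until a reply below is consumed.
        System.out.println(client.takeReply(a));
        System.out.println(client.takeReply(b));
        client.rpcPool.shutdown();
    }
}
```

As the comment warns, the blocking acquire is exactly where the client-side deadlock/starvation care is needed: a caller holding other locks while blocked on a full reply buffer is the failure mode to design against.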
[jira] [Commented] (HADOOP-7310) Trash location needs to be revisited
[ https://issues.apache.org/jira/browse/HADOOP-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626988#comment-14626988 ] Sanjay Radia commented on HADOOP-7310: -- Looking at HDFS-8747, perhaps the better solution is: Whenever a file is deleted, the trash is located by searching up the path and finding the closest parent dir that has trash. It solves trash in an encryption zone, trash in any quota-based subtree (such as a home directory). I.e. the system creates trash in /, home dirs, encryption zones etc. The increased complication is that the NN needs to deal with multiple trash locations. Trash location needs to be revisited Key: HADOOP-7310 URL: https://issues.apache.org/jira/browse/HADOOP-7310 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Sanjay Radia Assignee: Sanjay Radia -- This message was sent by Atlassian JIRA (v6.3.4#6332)
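The "search up the path" resolution described above can be sketched as a simple ancestor walk. The trash-root set and path handling here are illustrative (string paths, a `.Trash` name), not NN code; per the comment, the system would create trash roots in /, home dirs, encryption zones, etc.

```java
import java.util.Set;

public class TrashResolverSketch {
    // Walk up from the deleted path and return the trash root of the closest
    // ancestor directory that has one. "/" is assumed to always have trash,
    // so the walk always terminates with a result.
    static String resolveTrashRoot(String deletedPath, Set<String> dirsWithTrash) {
        String p = deletedPath;
        while (true) {
            int slash = p.lastIndexOf('/');
            String parent = (slash <= 0) ? "/" : p.substring(0, slash);
            if (dirsWithTrash.contains(parent)) {
                return parent.equals("/") ? "/.Trash" : parent + "/.Trash";
            }
            if (parent.equals("/")) return "/.Trash"; // fallback: root trash
            p = parent;
        }
    }

    public static void main(String[] args) {
        Set<String> trashRoots = Set.of("/", "/user/alice", "/secure/ez1");
        System.out.println(resolveTrashRoot("/user/alice/data/f1", trashRoots)); // /user/alice/.Trash
        System.out.println(resolveTrashRoot("/secure/ez1/f2", trashRoots));      // /secure/ez1/.Trash
        System.out.println(resolveTrashRoot("/tmp/f3", trashRoots));             // /.Trash
    }
}
```

This shows why the scheme handles both encryption zones and quota subtrees in one rule: a delete inside /secure/ez1 never crosses the zone boundary, and a delete under /user/alice lands in trash charged to Alice's quota.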
[jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555032#comment-14555032 ] Sanjay Radia commented on HADOOP-9984: -- [~asuresh] thanks for your comment on Hive. * if you configure hiveserver2 with sql-standard-auth then the file system permissions do not apply and symlinks should not be an issue. The data should be owned by hiveserver and users should not have hdfs access to those directories and files. * if you configure file-system-auth then the issue you describe will occur *when impersonation is turned off*. Recall we had to fix Hive to work with encryption; likewise Hive will need to understand symlinks. To deal with atomicity issues (race between checking and setting a symlink) we may have to add an API to HDFS to resolve to inode# and then resolve from inode#. (HDFS does have inode numbers that were added for NFS.) However, isn't file-system-auth usually used with impersonation, where symlinks are not an issue? * With impersonation turned on, the job will run as the user and symlinks will work. Correct? FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default -- Key: HADOOP-9984 URL: https://issues.apache.org/jira/browse/HADOOP-9984 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.0-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Critical Labels: BB2015-05-TBR Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. 
One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11966) Variable cygwin is undefined in hadoop-config.sh when executed through hadoop-daemon.sh.
[ https://issues.apache.org/jira/browse/HADOOP-11966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541017#comment-14541017 ] Sanjay Radia commented on HADOOP-11966: --- Your comment says that you tested hadoop-daemon.sh. Have you tested the hadoop, yarn and mapred commands? (Your change affects all of them, and yes, it should work since they all source hadoop-config.sh.) Variable cygwin is undefined in hadoop-config.sh when executed through hadoop-daemon.sh. Key: HADOOP-11966 URL: https://issues.apache.org/jira/browse/HADOOP-11966 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 2.7.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Critical Attachments: HADOOP-11966-branch-2.001.patch HADOOP-11464 reinstated support for running the bash scripts through Cygwin. The logic involves setting a {{cygwin}} flag variable to indicate if the script is executing through Cygwin. The flag is set in all of the interactive scripts: {{hadoop}}, {{hdfs}}, {{yarn}} and {{mapred}}. The flag is not set through hadoop-daemon.sh though. This can cause an erroneous overwrite of {{HADOOP_HOME}} and {{JAVA_LIBRARY_PATH}} inside hadoop-config.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519971#comment-14519971 ] Sanjay Radia commented on HADOOP-9984: -- bq. The problem with dereferencing all symlinks in listStatus is that it's disastrously inefficient # In the proposal listStatus2 is the new API that replaces listStatus # all our libraries need to be changed to use listStatus2 (see item 3 in the proposal) # customers who have old code that calls the old listStatus and cannot convert that code immediately can disable symlinks, not use symlinks, or use symlinks sparingly. In practice I don't think there will be dirs with even tens of symlinks (but listStatus2 addresses the problem going forward). bq. isSymlink is broken for dangling symlinks, FileSystem#rename is broken for symlinks, the behavior of symlinks in globStatus is controversial, distCp doesn't support it, ... These are fixable. I think this jira itself was attempting to fix some of these when we ran into the design flaw of the original listStatus. bq. cross-filesystem symlinks ... As I pointed out this needs to be discussed. Let me make a separate comment that summarizes the cross-namespace issues that have been presented in the various comments in this and other jiras. 
FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default -- Key: HADOOP-9984 URL: https://issues.apache.org/jira/browse/HADOOP-9984 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.0-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Critical Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519998#comment-14519998 ] Sanjay Radia commented on HADOOP-9984: -- bq. symlinks in globStatus is controversial Colin can you please summarize the globStatus issue. (I will take a stab at summarizing the cross-namespace issues.) Thanks FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default -- Key: HADOOP-9984 URL: https://issues.apache.org/jira/browse/HADOOP-9984 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.0-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Critical Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515986#comment-14515986 ] Sanjay Radia commented on HADOOP-9984: -- The following proposal on symlinks is based on discussions with Jason, Nathan and Daryn a few months ago. The recent disabling of symlinks (HADOOP-11852) has prompted me to finally put this comment out. Symlink is a very frequently asked for feature and ran into trouble mostly because the original listStatus was not well designed. This issue has been heavily discussed and we have gone back and forth. The proposal below is basically Jason Lowe's proposal as mostly described in https://issues.apache.org/jira/browse/HADOOP-9912?focusedCommentId=13772002page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13772002 An additional issue concerns cross-namespace links that should be discussed in a separate comment. Further, Colin has raised a Hive concern in an email thread that I will also cover in a separate comment. Summary of proposal: * 1) Existing listStatus() API will follow symlinks to maintain compatibility for isDir() and throw an exception if it cannot. * 2) Add a new listStatus2() API that does the right thing (i.e. not follow symlinks) * 3) Change all other libraries such as glob, cli and tools to use the new API listStatus2 * 4) Deprecate the existing listStatus. Details: * 1) For the current API: listStatus() returns FileStatus[]. ** a) listStatus will follow the symlink. If any of the symlinks are not followable (i.e. no permissions or dangling) then listStatus throws an exception. ** b) The list of children in FileStatus is for those of the symlink and NOT the target ** c) everything else in FileStatus\[i] (filesize, isDir, owner, perms, etc.) needs to be from the resolved target of the symlinks. E.g. FileStatus\[i].isDir will return the status of the symlink target. 
If it can't resolve a symlink then we must throw an error since we can't return partial results nor can we indicate per FileStatus entry that an error occurred. (Note it would have been much nicer for isDir to throw the exception, but that is not possible since it does not declare any exception and the only other option is a runtime exception, which is bad.) * 2) Create a New API: listStatus2() (a better name? listDir) that returns FileStatusExtended[] ** a) This returns the raw list with symlinks *not* followed. ** b) FileStatusExtended has a method called getFileType() that returns an enum. Optionally it could have methods called isDir(), isFile(), isSymlink() * 3) Fix all internal utilities and libraries (ls, glob, distcp) to do the correct thing using API 1 or 2 as needed. * 4) Deprecate the existing listStatus() API. The reasoning behind the above proposal (Jason Lowe's words): As discussed in HADOOP-9912, listStatus is effectively a combination of readdir() and stat() from POSIX. readdir() does not follow symlinks but stat() does. That means we need to return the original names in the child directory, i.e.: what readdir() does, but the FileStatus information returned by listStatus needs to be what the symlink points to except for the name part, i.e.: what stat() does. And yes, throwing an exception for bad (dangling) symlinks is severe, but it seems like the lesser of evils. We don't know what the application will do if we expose the raw symlink to it or hide it, which are basically our only choices if we don't throw. Either approach could lead to silent dataloss or other badness because we don't know what the app is going to do. That's why we'd deprecate the original API because it doesn't allow us to return errors for individual entries in the listStatus results -- it's all or nothing. 
FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default -- Key: HADOOP-9984 URL: https://issues.apache.org/jira/browse/HADOOP-9984 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.0-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Critical Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and
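A minimal sketch of the shape the proposed API above might take. The names listStatus2, FileStatusExtended, and getFileType come from the proposal; everything else (the enum values, fields, and the hard-coded directory listing) is illustrative only.

```java
import java.util.List;

public class ListStatus2Sketch {
    enum FileType { FILE, DIRECTORY, SYMLINK }

    // Hypothetical extended status: carries a raw file type instead of the
    // old boolean isDir, so symlinks are first-class and never followed.
    static class FileStatusExtended {
        final String path;
        final FileType type;
        FileStatusExtended(String path, FileType type) {
            this.path = path;
            this.type = type;
        }
        FileType getFileType() { return type; }
        boolean isSymlink() { return type == FileType.SYMLINK; }
    }

    // listStatus2 behaves like POSIX readdir(): it reports the raw entries.
    // A dangling symlink is simply reported as SYMLINK -- no exception needed,
    // unlike the legacy listStatus, which must throw on unresolvable links.
    static List<FileStatusExtended> listStatus2(String dir) {
        return List.of(
            new FileStatusExtended(dir + "/data", FileType.DIRECTORY),
            new FileStatusExtended(dir + "/part-0", FileType.FILE),
            new FileStatusExtended(dir + "/latest", FileType.SYMLINK));
    }

    public static void main(String[] args) {
        for (FileStatusExtended st : listStatus2("/jobs")) {
            System.out.println(st.path + " " + st.getFileType());
        }
    }
}
```

The contrast with the legacy API is the key design point: because each entry carries its own type, per-entry "errors" (a dangling link) are representable, whereas the old FileStatus[] result was all or nothing.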
[jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516025#comment-14516025 ] Sanjay Radia commented on HADOOP-9984: -- The Hive issue: Colin posted the following in an HDFS-dev email thread that I am reproducing: {quote} Basically any higher-level software that is relying on path-based access will have problems with symlinks. For example, Hive assumes that if you limit a user's access to just things under /home/username, then you have effectively sandboxed that person. But if you can create a symlink from /home/username/foo to /foo, then you've effectively broken out of Hive's sandbox. Since Hive often runs with elevated permissions, and is willing to access files under /home/username with those permissions, this would be disastrous. Hive is just one example, of course... basically we'd have to audit all software using HDFS for this kind of problem before enabling symlinks. {quote} I am not aware of the above sandboxing feature in Hive. I checked with a couple of folks who are active in Hive and they told me that the above style of sandboxing was not supported in Hive. FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default -- Key: HADOOP-9984 URL: https://issues.apache.org/jira/browse/HADOOP-9984 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.0-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Critical Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. 
One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11852) Disable symlinks in trunk
[ https://issues.apache.org/jira/browse/HADOOP-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515998#comment-14515998 ] Sanjay Radia commented on HADOOP-11852: --- I have posted a [comment | https://issues.apache.org/jira/browse/HADOOP-9984?focusedCommentId=14515986page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14515986] in HADOOP-9984 to fix symlinks. Disable symlinks in trunk - Key: HADOOP-11852 URL: https://issues.apache.org/jira/browse/HADOOP-11852 Project: Hadoop Common Issue Type: Sub-task Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Fix For: 3.0.0 Attachments: hadoop-11852.001.patch In HADOOP-10020 and HADOOP-10162 we disabled symlinks in branch-2. Since there's currently no plan to finish this work, let's disable it in trunk too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11552) Allow handoff on the server side for RPC requests
[ https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387533#comment-14387533 ] Sanjay Radia commented on HADOOP-11552: --- You mean more than tests? Allow handoff on the server side for RPC requests - Key: HADOOP-11552 URL: https://issues.apache.org/jira/browse/HADOOP-11552 Project: Hadoop Common Issue Type: Improvement Components: ipc Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: HADOOP-11552.1.wip.txt, HADOOP-11552.2.txt, HADOOP-11552.3.txt An RPC server handler thread is tied up for each incoming RPC request. This isn't ideal, since this essentially implies that RPC operations should be short lived, and most operations which could take time end up falling back to a polling mechanism. Some use cases where this is useful. - YARN submitApplication - which currently submits, followed by a poll to check if the application is accepted while the submit operation is written out to storage. This can be collapsed into a single call. - YARN allocate - requests and allocations use the same protocol. New allocations are received via polling. The allocate protocol could be split into a request/heartbeat along with a 'awaitResponse'. The request/heartbeat is sent only when there's a request or on a much longer heartbeat interval. awaitResponse is always left active with the RM - and returns the moment something is available. MapReduce/Tez task to AM communication is another example of this pattern. The same pattern of splitting calls can be used for other protocols as well. This should serve to improve latency, as well as reduce network traffic since the keep-alive heartbeat can be sent less frequently. I believe there's some cases in HDFS as well, where the DN gets told to perform some operations when they heartbeat into the NN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11552) Allow handoff on the server side for RPC requests
[ https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315460#comment-14315460 ] Sanjay Radia commented on HADOOP-11552: --- bq. Are you proposing to keep the TCP session open, but reuse the handler thread for something else, while the RPC is progressing? bq. Yes, the intent is to keep the TCP session open and re-use the handlers Note our RPC system forces the handler thread to do the response and hence we have to have a large number of handler threads, since some of the requests (such as a write operation on a NN) take longer because they have to write to the journal. Other RPC systems and also request-response message passing systems allow hand-off to any thread to do the work and reply. The TCP connection being kept open is not due to the handler thread-binding, but instead because our RPC layer depends on a connection close to detect server failures (and I believe we send some heartbeat bytes to detect server failures promptly). So we need to keep the connection open if the RPC operation is not completed. Now the impact on RPC connections that you raised: * for normal end-clients (e.g. HDFS clients) the connections will remain open as in the original case - i.e. till the request is completed and the reply is sent. Hence the number of such connections will be the same. * for internal clients where the request is of the type "do you have more work for me" (as sent by DN or NM) the number of connections will increase but will be bounded. Here we can have a hybrid approach where the RM could keep a few requests blocked and reply only when work is available, and for other such requests it could say "no work, try again in 2 seconds". 
Allow handoff on the server side for RPC requests - Key: HADOOP-11552 URL: https://issues.apache.org/jira/browse/HADOOP-11552 Project: Hadoop Common Issue Type: Improvement Components: ipc Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: HADOOP-11552.1.wip.txt An RPC server handler thread is tied up for each incoming RPC request. This isn't ideal, since this essentially implies that RPC operations should be short lived, and most operations which could take time end up falling back to a polling mechanism. Some use cases where this is useful. - YARN submitApplication - which currently submits, followed by a poll to check if the application is accepted while the submit operation is written out to storage. This can be collapsed into a single call. - YARN allocate - requests and allocations use the same protocol. New allocations are received via polling. The allocate protocol could be split into a request/heartbeat along with a 'awaitResponse'. The request/heartbeat is sent only when there's a request or on a much longer heartbeat interval. awaitResponse is always left active with the RM - and returns the moment something is available. MapReduce/Tez task to AM communication is another example of this pattern. The same pattern of splitting calls can be used for other protocols as well. This should serve to improve latency, as well as reduce network traffic since the keep-alive heartbeat can be sent less frequently. I believe there's some cases in HDFS as well, where the DN gets told to perform some operations when they heartbeat into the NN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
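The hand-off pattern described above can be sketched with plain JDK concurrency primitives. This is a toy model, not Hadoop's actual ipc.Server API: the "handler" thread does only cheap work and hands the call off to a worker pool, and the response is completed later without tying up a handler. All class and method names here are illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy model of server-side RPC hand-off: a small handler pool accepts calls
// and immediately delegates the slow work (e.g. a journal write) to a larger
// worker pool. The handler thread is free to pick up the next call; the
// response is sent whenever the worker completes the future.
public class HandoffServer {
    private final ExecutorService handlers = Executors.newFixedThreadPool(2);
    private final ExecutorService workers = Executors.newFixedThreadPool(8);

    // Each call gets a future; completing it models sending the reply
    // over the still-open TCP connection.
    public CompletableFuture<String> submit(String request) {
        CompletableFuture<String> response = new CompletableFuture<>();
        handlers.execute(() -> {
            // Handler does only decode/dispatch work, then hands off --
            // it does NOT block while the slow operation runs.
            workers.execute(() -> response.complete("done: " + request));
        });
        return response;
    }

    public void shutdown() {
        handlers.shutdown();
        workers.shutdown();
    }
}
```

With this shape, the number of handler threads bounds only the decode/dispatch concurrency, not the number of in-flight slow operations.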
[jira] [Commented] (HADOOP-11552) Allow handoff on the server side for RPC requests
[ https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315474#comment-14315474 ] Sanjay Radia commented on HADOOP-11552: --- bq. If we move to an offer-based system like Mesos, You are mixing layers. Sid is talking about the RPC layer. The layer above RPC, such as how YARN resources are obtained and used, will be unaffected. bq. have the resource manager make outgoing connections to the executors Making outgoing connections as you suggest is another valid approach. For that to work well we need client-side async support, while this jira is proposing server-side 'async' (I put 'async' in quotes because in my mind the hand-off is not async-RPC, since the RPC client blocks till the work is done). Another good use case for this jira is the write operations on the NN that write to the journal. Such operations should be handed off to a worker thread that writes to the journal and then replies. The original handler thread goes back to serving new requests as soon as the hand-off is done. If we do this we could drastically reduce the number of handler threads needed in the NN (you already noted the reduction in handler threads for the other use case). Allow handoff on the server side for RPC requests - Key: HADOOP-11552 URL: https://issues.apache.org/jira/browse/HADOOP-11552 Project: Hadoop Common Issue Type: Improvement Components: ipc Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: HADOOP-11552.1.wip.txt An RPC server handler thread is tied up for each incoming RPC request. This isn't ideal, since this essentially implies that RPC operations should be short lived, and most operations which could take time end up falling back to a polling mechanism. Some use cases where this is useful. - YARN submitApplication - which currently submits, followed by a poll to check if the application is accepted while the submit operation is written out to storage. 
This can be collapsed into a single call. - YARN allocate - requests and allocations use the same protocol. New allocations are received via polling. The allocate protocol could be split into a request/heartbeat along with a 'awaitResponse'. The request/heartbeat is sent only when there's a request or on a much longer heartbeat interval. awaitResponse is always left active with the RM - and returns the moment something is available. MapReduce/Tez task to AM communication is another example of this pattern. The same pattern of splitting calls can be used for other protocols as well. This should serve to improve latency, as well as reduce network traffic since the keep-alive heartbeat can be sent less frequently. I believe there's some cases in HDFS as well, where the DN gets told to perform some operations when they heartbeat into the NN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
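The request/awaitResponse split described in the issue can also be modeled with a blocking queue: the server parks awaitResponse callers and answers only when an allocation arrives, so the client needs no polling loop and a keep-alive can flow on a much longer interval. This is an illustrative sketch, not YARN's actual allocate protocol; the names are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of splitting a heartbeat-style protocol into request + awaitResponse:
// new allocations are pushed by the server side and delivered to a blocked
// awaitResponse call the moment they are available, instead of being picked
// up on the next poll.
public class AllocationChannel {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    // Server side: a new allocation becomes available for this client.
    public void offerAllocation(String alloc) {
        pending.add(alloc);
    }

    // Client side: blocks until something is available; the timeout stands in
    // for the (much longer) keep-alive interval.
    public String awaitResponse(long timeoutMs) throws InterruptedException {
        String alloc = pending.poll(timeoutMs, TimeUnit.MILLISECONDS);
        return alloc != null ? alloc : "keep-alive";
    }
}
```

The same shape fits MapReduce/Tez task-to-AM communication and the DN heartbeat cases mentioned above.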
[jira] [Updated] (HADOOP-9992) Modify the NN loadGenerator to optionally run as a MapReduce job
[ https://issues.apache.org/jira/browse/HADOOP-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-9992: - Status: Open (was: Patch Available) Modify the NN loadGenerator to optionally run as a MapReduce job Key: HADOOP-9992 URL: https://issues.apache.org/jira/browse/HADOOP-9992 Project: Hadoop Common Issue Type: Bug Reporter: Akshay Radia Assignee: Akshay Radia Attachments: HADOOP-9992.004.patch, hadoop-9992-v2.patch, hadoop-9992-v3.patch, hadoop-9992-v4.patch, hadoop-9992.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-9992) Modify the NN loadGenerator to optionally run as a MapReduce job
[ https://issues.apache.org/jira/browse/HADOOP-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-9992: - Attachment: hadoop-9992-v4.patch Modify the NN loadGenerator to optionally run as a MapReduce job Key: HADOOP-9992 URL: https://issues.apache.org/jira/browse/HADOOP-9992 Project: Hadoop Common Issue Type: Bug Reporter: Akshay Radia Assignee: Akshay Radia Attachments: HADOOP-9992.004.patch, hadoop-9992-v2.patch, hadoop-9992-v3.patch, hadoop-9992-v4.patch, hadoop-9992.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-9992) Modify the NN loadGenerator to optionally run as a MapReduce job
[ https://issues.apache.org/jira/browse/HADOOP-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-9992: - Status: Patch Available (was: Open) Modify the NN loadGenerator to optionally run as a MapReduce job Key: HADOOP-9992 URL: https://issues.apache.org/jira/browse/HADOOP-9992 Project: Hadoop Common Issue Type: Bug Reporter: Akshay Radia Assignee: Akshay Radia Attachments: HADOOP-9992.004.patch, hadoop-9992-v2.patch, hadoop-9992-v3.patch, hadoop-9992-v4.patch, hadoop-9992.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10741) A lightweight WebHDFS client library
[ https://issues.apache.org/jira/browse/HADOOP-10741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14155967#comment-14155967 ] Sanjay Radia commented on HADOOP-10741: --- I see part of the counter-argument being that folks using REST are doing it for one of two reasons. 1) Protocol compatibility - this was the original motivation in the past, when HDFS protocols were not compatible across some versions. This has been fixed. 2) Want a lightweight client that is independent of any version of HDFS. However, as Mohammad has pointed out in his description, customers using the WebHDFS REST protocol find that managing failures, auth, etc. is painful. Hence a library would help. I can see Andrew's argument for putting it outside Hadoop Common to better satisfy (2). We can decide the exact mechanism to distribute this library later. Note the goal of this library is *not* another FS API but a client-side library that wraps HDFS's REST protocol. It is a valid question whether this API should mimic the actual Hadoop FS API. Mohammad, please post the patch. We will figure out the mechanism of distributing that library separately. Thanks. A lightweight WebHDFS client library Key: HADOOP-10741 URL: https://issues.apache.org/jira/browse/HADOOP-10741 Project: Hadoop Common Issue Type: New Feature Components: tools Reporter: Tsz Wo Nicholas Sze Assignee: Mohammad Kamrul Islam One of the motivations for creating WebHDFS is for applications connecting to HDFS from outside the cluster. In order to do so, users have to either # install Hadoop and use WebHdfsFileSystem, or # develop their own client using the WebHDFS REST API. For #1, it is very difficult to manage and unnecessarily complicated for other applications since Hadoop is not a lightweight library. For #2, it is not easy to deal with security and handle transient errors. Therefore, we propose adding a lightweight WebHDFS client as a separate library which does not depend on Common and HDFS. 
The client can be packaged as a standalone jar. Other applications simply add the jar to their classpath for using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
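A minimal sketch of what such a standalone client could look like: with no Hadoop jars on the classpath, it only needs to build WebHDFS REST URLs of the form http://host:port/webhdfs/v1/&lt;path&gt;?op=... and issue plain HTTP calls. Only the URL layout comes from the WebHDFS REST API; the class and method names are hypothetical, and a real library would add the auth, retry, and error handling discussed above.

```java
// Hypothetical URL builder for a dependency-free WebHDFS client.
// The /webhdfs/v1 prefix and the op= query parameter follow the
// WebHDFS REST API; everything else is illustrative.
public class WebHdfsUrls {
    public static String opUrl(String host, int port, String path, String op) {
        if (!path.startsWith("/")) {
            // WebHDFS addresses files by absolute path under /webhdfs/v1.
            throw new IllegalArgumentException("path must be absolute: " + path);
        }
        return "http://" + host + ":" + port + "/webhdfs/v1" + path + "?op=" + op;
    }
}
```

A caller would pair this with java.net.HttpURLConnection (or any HTTP client), keeping the whole thing packageable as a single jar.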
[jira] [Updated] (HADOOP-9992) Modify the NN loadGenerator to optionally run as a MapReduce job
[ https://issues.apache.org/jira/browse/HADOOP-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-9992: - Status: Open (was: Patch Available) Modify the NN loadGenerator to optionally run as a MapReduce job Key: HADOOP-9992 URL: https://issues.apache.org/jira/browse/HADOOP-9992 Project: Hadoop Common Issue Type: Bug Reporter: Akshay Radia Assignee: Akshay Radia Attachments: hadoop-9992-v2.patch, hadoop-9992-v3.patch, hadoop-9992.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-9992) Modify the NN loadGenerator to optionally run as a MapReduce job
[ https://issues.apache.org/jira/browse/HADOOP-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-9992: - Status: Patch Available (was: Open) Modify the NN loadGenerator to optionally run as a MapReduce job Key: HADOOP-9992 URL: https://issues.apache.org/jira/browse/HADOOP-9992 Project: Hadoop Common Issue Type: Bug Reporter: Akshay Radia Assignee: Akshay Radia Attachments: hadoop-9992-v2.patch, hadoop-9992-v3.patch, hadoop-9992.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes
[ https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-10919: -- Attachment: HDFS-6134-Distcp-cp-UseCasesTable2.pdf I misunderstood the EZKey. Matching does not matter for distcp/cp. I have updated the use cases table. Copy command should preserve raw.* namespace extended attributes Key: HADOOP-10919 URL: https://issues.apache.org/jira/browse/HADOOP-10919 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HADOOP-10919.001.patch, HADOOP-10919.002.patch, HDFS-6134-Distcp-cp-UseCasesTable.pdf, HDFS-6134-Distcp-cp-UseCasesTable2.pdf Refer to the doc attached to HDFS-6509 for background. Like distcp -p (see MAPREDUCE-6007), the copy command also needs to preserve extended attributes in the raw.* namespace by default whenever the src and target are in /.reserved/raw. To not preserve raw xattrs, don't specify /.reserved/raw in either the src or target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes
[ https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095744#comment-14095744 ] Sanjay Radia commented on HADOOP-10919: --- bq. trashing It's assumed that an hdfs admin would not (intentionally) do that. Okay, please add that to your doc when you next update it. We could allow just read access to /r/r/ to all. Use cases: Charles, can we please work together to get the distcp use cases nailed down? We can work offline to go faster and then summarize for the community. Copy command should preserve raw.* namespace extended attributes Key: HADOOP-10919 URL: https://issues.apache.org/jira/browse/HADOOP-10919 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HADOOP-10919.001.patch, HADOOP-10919.002.patch Refer to the doc attached to HDFS-6509 for background. Like distcp -p (see MAPREDUCE-6007), the copy command also needs to preserve extended attributes in the raw.* namespace by default whenever the src and target are in /.reserved/raw. To not preserve raw xattrs, don't specify /.reserved/raw in either the src or target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes
[ https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096353#comment-14096353 ] Sanjay Radia commented on HADOOP-10919: --- Q. when you say distcp /r/r/src /r/r/dest are the keys read from src and preserved in the dest? Does the act of copying the keys from a /r/r/src into a /r/r/dest automatically set up a matching EZ in the destination? Copy command should preserve raw.* namespace extended attributes Key: HADOOP-10919 URL: https://issues.apache.org/jira/browse/HADOOP-10919 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HADOOP-10919.001.patch, HADOOP-10919.002.patch Refer to the doc attached to HDFS-6509 for background. Like distcp -p (see MAPREDUCE-6007), the copy command also needs to preserve extended attributes in the raw.* namespace by default whenever the src and target are in /.reserved/raw. To not preserve raw xattrs, don't specify /.reserved/raw in either the src or target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes
[ https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-10919: -- Attachment: HDFS-6134-Distcp-cp-UseCasesTable.pdf I have attached a table that shows the distcp/cp use cases and the desirable outcomes. I think this is implementable in a transparent fashion within distcp or cp using the /r/r mechanism. Copy command should preserve raw.* namespace extended attributes Key: HADOOP-10919 URL: https://issues.apache.org/jira/browse/HADOOP-10919 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HADOOP-10919.001.patch, HADOOP-10919.002.patch, HDFS-6134-Distcp-cp-UseCasesTable.pdf Refer to the doc attached to HDFS-6509 for background. Like distcp -p (see MAPREDUCE-6007), the copy command also needs to preserve extended attributes in the raw.* namespace by default whenever the src and target are in /.reserved/raw. To not preserve raw xattrs, don't specify /.reserved/raw in either the src or target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes
[ https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094447#comment-14094447 ] Sanjay Radia commented on HADOOP-10919: --- bq. Given that, I'm wondering what would the purpose be for checking that the target is an EZ? You mentioned that in your doc and hence I raised it here. Given that your document mentioned that the target and src must match with respect to EZ, I thought that you had made distcp transparent: i.e. distcp will check if any dir in the subtree is an EZ and will prefix the path with /.reserved/raw. And I think that is a good idea, since it will mean that all existing distcp scripts will continue to work if you set the EZ on the src and target correctly. Copy command should preserve raw.* namespace extended attributes Key: HADOOP-10919 URL: https://issues.apache.org/jira/browse/HADOOP-10919 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HADOOP-10919.001.patch, HADOOP-10919.002.patch Refer to the doc attached to HDFS-6509 for background. Like distcp -p (see MAPREDUCE-6007), the copy command also needs to preserve extended attributes in the raw.* namespace by default whenever the src and target are in /.reserved/raw. To not preserve raw xattrs, don't specify /.reserved/raw in either the src or target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes
[ https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094640#comment-14094640 ] Sanjay Radia commented on HADOOP-10919: --- bq. Right now it's transparent in that distcp will decrypt when it reads from the normal path. This is what all existing distcp scripts will be doing, copying to and from normal paths. ... but it's a reasonable and sometimes desirable behavior. At the meeting and in the jira we concluded that the above behavior is not desirable: the user running the distcp may not have permission to decrypt (e.g. an Admin at NSA). Second, the data is being transmitted in the clear. Third, the efficiency argument. You are saying 'but it's a reasonable and sometimes desirable behavior' - I thought we had established it is not, and hence we are doing the /.r/.r and that distcp will take advantage of it. I hope you still want to do /.r/.r? Maybe you are asserting that /.r/.r was unnecessary but you are willing to do it to please a few in the community. That's okay - we can agree to disagree here. I would have thought that if distcp prefixes all paths by /.r/.r then it would just work. Your comment says that /.r/r is also superuser-only -- not sure what you mean - only the superuser can access /.r/.r? Surely that is not the case? Is this mentioned in the distcp doc and I missed it? Copy command should preserve raw.* namespace extended attributes Key: HADOOP-10919 URL: https://issues.apache.org/jira/browse/HADOOP-10919 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HADOOP-10919.001.patch, HADOOP-10919.002.patch Refer to the doc attached to HDFS-6509 for background. Like distcp -p (see MAPREDUCE-6007), the copy command also needs to preserve extended attributes in the raw.* namespace by default whenever the src and target are in /.reserved/raw. 
To not preserve raw xattrs, don't specify /.reserved/raw in either the src or target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes
[ https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095122#comment-14095122 ] Sanjay Radia commented on HADOOP-10919: --- Charles can you expand on what trashing you are worried about? One only needs read access on the src side. Copy command should preserve raw.* namespace extended attributes Key: HADOOP-10919 URL: https://issues.apache.org/jira/browse/HADOOP-10919 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HADOOP-10919.001.patch, HADOOP-10919.002.patch Refer to the doc attached to HDFS-6509 for background. Like distcp -p (see MAPREDUCE-6007), the copy command also needs to preserve extended attributes in the raw.* namespace by default whenever the src and target are in /.reserved/raw. To not preserve raw xattrs, don't specify /.reserved/raw in either the src or target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes
[ https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095149#comment-14095149 ] Sanjay Radia commented on HADOOP-10919: --- Charles, let's enumerate the distcp use cases - here is my first draft. Below, for some of the use cases, I propose possible desirable outcomes, but these outcomes can be debated separately from the use cases. # src subtree and dst subtree do not have an EZ - easy, same as today # src subtree has no EZ but dest does have an EZ in a portion of its subtree. Possible outcomes ## if the user performing the operation has permissions in the dest EZ then the files within the dest EZ subtree are encrypted ## if the user does not (say an Admin), what do we expect to happen? # src subtree has an EZ but dest does not. Possible outcomes ## files copied as encrypted but cannot be decrypted at the dest since it does not have an EZ - useful as a backup ## files copied as encrypted and a matching EZ is created automatically. Can an admin do this operation, since he does not have access to the keys? ## throw an error which can be overridden by a flag, in which case the files are decrypted and the copies in the dest are left decrypted. This only works if the user has permissions for decryption; an admin cannot do this. # both src and dest have an EZ at exactly the same part of the subtree. Possible outcomes ## If the user has permission to decrypt and encrypt, then the data is copied and encryption is redone with new keys. ## If the user does not have permission, then ?? Fail or copy as raw? # both src and dest have EZs at different parts of the subtree. This should reduce to 2 or 3. For each of the above, one can have distcp do the right thing automatically, or we can force the user to explicitly submit a /r/r/path as appropriate. Let's explore both approaches and see which one works better. 
Copy command should preserve raw.* namespace extended attributes Key: HADOOP-10919 URL: https://issues.apache.org/jira/browse/HADOOP-10919 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HADOOP-10919.001.patch, HADOOP-10919.002.patch Refer to the doc attached to HDFS-6509 for background. Like distcp -p (see MAPREDUCE-6007), the copy command also needs to preserve extended attributes in the raw.* namespace by default whenever the src and target are in /.reserved/raw. To not preserve raw xattrs, don't specify /.reserved/raw in either the src or target. -- This message was sent by Atlassian JIRA (v6.2#6252)
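The "do the right thing automatically" option at the end of the use-case list can be sketched as a simple path rewrite: if a source path falls under a known encryption zone, distcp would substitute the /.reserved/raw equivalent so raw bytes and raw.* xattrs are copied instead of decrypted data. This toy model uses a static list of EZ roots for illustration; a real implementation would ask the NameNode whether the path is in an EZ, and the class name is hypothetical.

```java
import java.util.List;

// Illustrative rewrite step for "transparent" raw copies: a path under any
// encryption-zone root is prefixed with /.reserved/raw, so the copy reads
// and writes raw (still-encrypted) bytes; other paths are copied normally.
public class RawPathRewriter {
    static final String RAW_PREFIX = "/.reserved/raw";

    public static String maybeRaw(String path, List<String> ezRoots) {
        for (String root : ezRoots) {
            if (path.equals(root) || path.startsWith(root + "/")) {
                return RAW_PREFIX + path;  // in an EZ: copy raw bytes + raw.* xattrs
            }
        }
        return path;                       // not in an EZ: copy as today
    }
}
```

With this in place, existing distcp invocations would keep working unchanged, which is the advantage argued for in the comments above.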
[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes
[ https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093548#comment-14093548 ] Sanjay Radia commented on HADOOP-10919: --- Charles, you list a disadvantage of the .raw scheme where the target of a distcp is not an encrypted zone. Would it make sense for distcp to check for that and to fail the distcp? Copy command should preserve raw.* namespace extended attributes Key: HADOOP-10919 URL: https://issues.apache.org/jira/browse/HADOOP-10919 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HADOOP-10919.001.patch, HADOOP-10919.002.patch Refer to the doc attached to HDFS-6509 for background. Like distcp -p (see MAPREDUCE-6007), the copy command also needs to preserve extended attributes in the raw.* namespace by default whenever the src and target are in /.reserved/raw. To not preserve raw xattrs, don't specify /.reserved/raw in either the src or target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes
[ https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093550#comment-14093550 ] Sanjay Radia commented on HADOOP-10919: --- Charles, the work you did for distcp needs to be also applied to har. I suspect .raw would also work. Copy command should preserve raw.* namespace extended attributes Key: HADOOP-10919 URL: https://issues.apache.org/jira/browse/HADOOP-10919 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HADOOP-10919.001.patch, HADOOP-10919.002.patch Refer to the doc attached to HDFS-6509 for background. Like distcp -p (see MAPREDUCE-6007), the copy command also needs to preserve extended attributes in the raw.* namespace by default whenever the src and target are in /.reserved/raw. To not preserve raw xattrs, don't specify /.reserved/raw in either the src or target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes
[ https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093594#comment-14093594 ] Sanjay Radia commented on HADOOP-10919: --- Charles, what is the usage model for distcp of encrypted files: * distcp path1 path2 - where distcp will insert /.reserved/raw into the pathnames if in an encrypted zone. * OR distcp /.reserved/raw/path1 /.reserved/raw/path2 BTW, is the proposal that both src and dest MUST be encrypted zones, or neither? (Because of your misspoke comment I am a little confused.) Copy command should preserve raw.* namespace extended attributes Key: HADOOP-10919 URL: https://issues.apache.org/jira/browse/HADOOP-10919 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HADOOP-10919.001.patch, HADOOP-10919.002.patch Refer to the doc attached to HDFS-6509 for background. Like distcp -p (see MAPREDUCE-6007), the copy command also needs to preserve extended attributes in the raw.* namespace by default whenever the src and target are in /.reserved/raw. To not preserve raw xattrs, don't specify /.reserved/raw in either the src or target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10382) Add Apache Tez to the Hadoop homepage as a related project
[ https://issues.apache.org/jira/browse/HADOOP-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965358#comment-13965358 ] Sanjay Radia commented on HADOOP-10382: --- +1 Add Apache Tez to the Hadoop homepage as a related project -- Key: HADOOP-10382 URL: https://issues.apache.org/jira/browse/HADOOP-10382 Project: Hadoop Common Issue Type: Bug Reporter: Arun C Murthy Assignee: Arun C Murthy Attachments: HADOOP-10382.patch Add Apache Tez to the Hadoop homepage as a related project -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10191) Missing executable permission on viewfs internal dirs
[ https://issues.apache.org/jira/browse/HADOOP-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943680#comment-13943680 ] Sanjay Radia commented on HADOOP-10191: --- Chris, 0555 is more correct since the mount links are not writable (like other internal dirs). Missing executable permission on viewfs internal dirs - Key: HADOOP-10191 URL: https://issues.apache.org/jira/browse/HADOOP-10191 Project: Hadoop Common Issue Type: Bug Components: viewfs Reporter: Gera Shegalov Priority: Blocker Attachments: HADOOP-10191.v01.patch ViewFileSystem allows 1) unconditional listing of internal directories (mount points) and 2) and changing work directories. 1) requires read permission 2) requires executable permission However, the hardcoded PERMISSION_RRR == 444 for FileStatus representing an internal dir does not have executable bit set. This confuses YARN localizer for public resources on viewfs because it requires executable permission for other on all of the ancestor directories of the resource. {code} java.io.IOException: Resource viewfs:/pubcache/cache.txt is not publicly accessable and as such cannot be part of the public cache. at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:182) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:51) at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:279) at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:277) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
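The 0555-vs-0444 point above is just octal bit arithmetic: YARN's public-cache check needs the execute bit for "other" on every ancestor directory, which r--r--r-- (0444) lacks and r-xr-xr-x (0555) has. A minimal check, using plain Java octal literals rather than Hadoop's FsPermission class:

```java
// Why the hardcoded 444 permission breaks the YARN localizer: traversing
// a directory requires the execute bit, which is the lowest bit of each
// octal digit. 0444 has only read bits; 0555 adds execute for all.
public class PermBits {
    static final int OTHER_EXECUTE = 01; // execute bit for "other"

    public static boolean otherCanTraverse(int octalPerm) {
        return (octalPerm & OTHER_EXECUTE) != 0;
    }
}
```

So reporting viewfs internal dirs as 0555 lets the localizer's ancestor-directory check pass while the mount links stay non-writable.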
[jira] [Commented] (HADOOP-9992) Modify the NN loadGenerator to optionally run as a MapReduce job
[ https://issues.apache.org/jira/browse/HADOOP-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862502#comment-13862502 ] Sanjay Radia commented on HADOOP-9992: -- Looks good. Update the javadoc to reflect the MR option. Also you have left some debugging printfs in the code. Modify the NN loadGenerator to optionally run as a MapReduce job Key: HADOOP-9992 URL: https://issues.apache.org/jira/browse/HADOOP-9992 Project: Hadoop Common Issue Type: Bug Reporter: Akshay Radia Assignee: Akshay Radia Attachments: hadoop-9992.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-9992) Modify the NN loadGenerator to optionally run as a MapReduce job
[ https://issues.apache.org/jira/browse/HADOOP-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862506#comment-13862506 ] Sanjay Radia commented on HADOOP-9992: -- Update MapredTestDriver to include the loadGenerator for convenience. Modify the NN loadGenerator to optionally run as a MapReduce job Key: HADOOP-9992 URL: https://issues.apache.org/jira/browse/HADOOP-9992 Project: Hadoop Common Issue Type: Bug Reporter: Akshay Radia Assignee: Akshay Radia Attachments: hadoop-9992.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10106) Incorrect thread name in RPC log messages
[ https://issues.apache.org/jira/browse/HADOOP-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848037#comment-13848037 ] Sanjay Radia commented on HADOOP-10106: --- +1 Incorrect thread name in RPC log messages - Key: HADOOP-10106 URL: https://issues.apache.org/jira/browse/HADOOP-10106 Project: Hadoop Common Issue Type: Bug Reporter: Ming Ma Priority: Minor Attachments: hadoop_10106_trunk.patch, hadoop_10106_trunk_2.patch INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8020: readAndProcess from client 10.115.201.46 threw exception org.apache.hadoop.ipc.RpcServerException: Unknown out of band call #-2147483647 This is thrown by a reader thread, so the message should be like INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client 10.115.201.46 threw exception org.apache.hadoop.ipc.RpcServerException: Unknown out of band call #-2147483647 Another example is Responder.processResponse, which can also be called by handler thread. When that happend, the thread name should be the handler thread, not the responder thread. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HADOOP-10044) Improve the javadoc of rpc code
[ https://issues.apache.org/jira/browse/HADOOP-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-10044: -- Resolution: Fixed Target Version/s: 2.3.0 Status: Resolved (was: Patch Available) The failed test timeout is unrelated (and I also ran it successfully). Committed. Improve the javadoc of rpc code --- Key: HADOOP-10044 URL: https://issues.apache.org/jira/browse/HADOOP-10044 Project: Hadoop Common Issue Type: Improvement Reporter: Sanjay Radia Assignee: Sanjay Radia Priority: Minor Attachments: HADOOP-10044.20131014.patch, hadoop-10044.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HADOOP-10106) Incorrect thread name in RPC log messages
[ https://issues.apache.org/jira/browse/HADOOP-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843477#comment-13843477 ] Sanjay Radia commented on HADOOP-10106: --- Refactoring of code generally needs to be done in a separate jira. Isn't the fix for the thread name possible with the old structure? If so, I suggest that you do the refactoring in a separate jira. Incorrect thread name in RPC log messages - Key: HADOOP-10106 URL: https://issues.apache.org/jira/browse/HADOOP-10106 Project: Hadoop Common Issue Type: Bug Reporter: Ming Ma Priority: Minor Attachments: hadoop_10106_trunk.patch INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8020: readAndProcess from client 10.115.201.46 threw exception org.apache.hadoop.ipc.RpcServerException: Unknown out of band call #-2147483647 This is thrown by a reader thread, so the message should be like INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client 10.115.201.46 threw exception org.apache.hadoop.ipc.RpcServerException: Unknown out of band call #-2147483647 Another example is Responder.processResponse, which can also be called by a handler thread. When that happens, the thread name should be the handler thread, not the responder thread. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HADOOP-10044) Improve the javadoc of rpc code
[ https://issues.apache.org/jira/browse/HADOOP-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-10044: -- Status: Open (was: Patch Available) Improve the javadoc of rpc code --- Key: HADOOP-10044 URL: https://issues.apache.org/jira/browse/HADOOP-10044 Project: Hadoop Common Issue Type: Improvement Reporter: Sanjay Radia Assignee: Sanjay Radia Priority: Minor Attachments: HADOOP-10044.20131014.patch, hadoop-10044.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HADOOP-10044) Improve the javadoc of rpc code
[ https://issues.apache.org/jira/browse/HADOOP-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-10044: -- Status: Patch Available (was: Open) Improve the javadoc of rpc code --- Key: HADOOP-10044 URL: https://issues.apache.org/jira/browse/HADOOP-10044 Project: Hadoop Common Issue Type: Improvement Reporter: Sanjay Radia Assignee: Sanjay Radia Priority: Minor Attachments: HADOOP-10044.20131014.patch, hadoop-10044.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HADOOP-10106) Incorrect thread name in RPC log messages
[ https://issues.apache.org/jira/browse/HADOOP-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841665#comment-13841665 ] Sanjay Radia commented on HADOOP-10106: --- You seem to have unnecessarily moved the doRead() function's location in the Server.java file. Please leave it in its original place in the file and resubmit the patch. Incorrect thread name in RPC log messages - Key: HADOOP-10106 URL: https://issues.apache.org/jira/browse/HADOOP-10106 Project: Hadoop Common Issue Type: Bug Reporter: Ming Ma Priority: Minor Attachments: hadoop_10106_trunk.patch INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8020: readAndProcess from client 10.115.201.46 threw exception org.apache.hadoop.ipc.RpcServerException: Unknown out of band call #-2147483647 This is thrown by a reader thread, so the message should be like INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client 10.115.201.46 threw exception org.apache.hadoop.ipc.RpcServerException: Unknown out of band call #-2147483647 Another example is Responder.processResponse, which can also be called by a handler thread. When that happens, the thread name should be the handler thread, not the responder thread. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800031#comment-13800031 ] Sanjay Radia commented on HADOOP-9984: -- bq. [sanjay] Fix all internal utilities, hive, pig, map reduce, yarn, etc to not use isDir() and understand that a directory may contain symlinks. bq. [daryn] I do not agree. This means symlinks are not transparent and not compatible with pre-2.x. Our tools *may* want to copy a symlink as-is rather than copy the file it refers to; all I am saying is that if there is a need to do that we need to fix such tools. For example, distcp needs to be symlink-aware rather than blindly copy 1PB when in reality one would have desired to copy the symlink. The main concern is other applications that are doing a listStatus + isDir; for that I have listed 2 options and my own personal opinion on what we should do. FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default -- Key: HADOOP-9984 URL: https://issues.apache.org/jira/browse/HADOOP-9984 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.0-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799499#comment-13799499 ] Sanjay Radia commented on HADOOP-9984: -- *Background:* Applications often call listStatus and then call fileStatus.isDir() on each of the returned children to decide if a node is a dir or a file. Such code would potentially break if any of the children are symlinks. This jira proposed that listStatus should follow any child symlinks and return a resolved list of children. Note symlinks that occur in the pathname passed to listStatus are always transparently followed and are not an issue. Also note that when symlinks were introduced, isDir() was deprecated and isDirectory(), isFile(), isSymlink() were added. *Compare with Posix:* Posix has separate readdir and stat/lstat. While readdir does not return the full status of each child, it does return the file type in the struct dirent (i.e. regular file, dir, symlink etc). *Issue with following child symlinks* This jira's proposed solution (follow the child symlinks) has an issue. Comments [by daryn|https://issues.apache.org/jira/browse/HADOOP-9984?focusedCommentId=13786431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13786431] and [Oct 9th|https://issues.apache.org/jira/browse/HADOOP-9984?focusedCommentId=13790972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13790972] in this jira show potential problems with following child symlinks - the most egregious being the duplicate entry. *New Proposed Solution* listStatus should NOT follow child symlinks. Fix all internal utilities, hive, pig, map reduce, yarn, etc to not use isDir() and understand that a directory may contain symlinks. 
We have two choices for isDir() (which, btw, has already been deprecated): a) isDir() returns the file type of the child without following the symlink (this is the code in trunk); b) isDir() returns the file type of the child after following the symlink (unless the link is dangling). My own preference is (a). The argument in favor of (b) is that it would provide greater compatibility. I think regardless of which choice we pick we will break some apps; in that case I would rather pick the cleaner solution, (a). FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default -- Key: HADOOP-9984 URL: https://issues.apache.org/jira/browse/HADOOP-9984 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.0-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths. -- This message was sent by Atlassian JIRA (v6.1#6144)
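Option (a) mirrors POSIX readdir's d_type: report each child's own type without following the link. A small java.nio sketch of that semantics (illustrative only; Hadoop's FileStatus offers the analogous isSymlink()/isDirectory()/isFile()):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

// Illustrative sketch of option (a): classify each directory child by its
// own type, without following symlinks - the readdir/d_type behavior.
public class ListNoFollow {
    public static String childType(Path child) {
        if (Files.isSymbolicLink(child)) return "symlink";
        if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) return "dir";
        return "file";
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("listdemo");
        Path a = Files.createFile(dir.resolve("a"));
        Files.createSymbolicLink(dir.resolve("c"), a);  // c -> a
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) {
                System.out.println(p.getFileName() + " is a " + childType(p));
            }
        }
    }
}
```

Here the listing reports c as a symlink rather than as a file, so a caller that cares only about "file vs dir" must be symlink-aware - which is exactly the application-compatibility cost discussed above.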
[jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799716#comment-13799716 ] Sanjay Radia commented on HADOOP-9984: -- bq. Let's make hadoop work like every other filesystem by making symlinks be transparent Unix's readdir does return the file type - see my comment. So your statement is not true. It is mostly transparent. So you prefer the second option (b) for readdir. Is your layered file system proposal for fixing symlinks an implementation choice for option (b), or something with fundamentally different semantics? FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default -- Key: HADOOP-9984 URL: https://issues.apache.org/jira/browse/HADOOP-9984 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.0-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HADOOP-10044) Improve the javadoc of rpc code
Sanjay Radia created HADOOP-10044: - Summary: Improve the javadoc of rpc code Key: HADOOP-10044 URL: https://issues.apache.org/jira/browse/HADOOP-10044 Project: Hadoop Common Issue Type: Improvement Reporter: Sanjay Radia Assignee: Sanjay Radia Priority: Minor -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-10044) Improve the javadoc of rpc code
[ https://issues.apache.org/jira/browse/HADOOP-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793139#comment-13793139 ] Sanjay Radia commented on HADOOP-10044: --- The Hadoop RPC code, especially the code in Server.java, is fairly complicated and poorly documented. Every time I make changes there or try to debug an issue, I have to re-learn parts of the code. The javadoc needs to be improved. Improve the javadoc of rpc code --- Key: HADOOP-10044 URL: https://issues.apache.org/jira/browse/HADOOP-10044 Project: Hadoop Common Issue Type: Improvement Reporter: Sanjay Radia Assignee: Sanjay Radia Priority: Minor -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-10044) Improve the javadoc of rpc code
[ https://issues.apache.org/jira/browse/HADOOP-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-10044: -- Attachment: hadoop-10044.patch Improve the javadoc of rpc code --- Key: HADOOP-10044 URL: https://issues.apache.org/jira/browse/HADOOP-10044 Project: Hadoop Common Issue Type: Improvement Reporter: Sanjay Radia Assignee: Sanjay Radia Priority: Minor Attachments: hadoop-10044.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-10029) Specifying har file to MR job fails in secure cluster
[ https://issues.apache.org/jira/browse/HADOOP-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791670#comment-13791670 ] Sanjay Radia commented on HADOOP-10029: --- There are compiler warnings. Otherwise +1. Specifying har file to MR job fails in secure cluster - Key: HADOOP-10029 URL: https://issues.apache.org/jira/browse/HADOOP-10029 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.0-alpha Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HADOOP-10029.1.patch, HADOOP-10029.2.patch, HADOOP-10029.3.patch, HADOOP-10029.patch This is an issue found by [~rramya]. See the exception stack trace in the following comment. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-10029) Specifying har file to MR job fails in secure cluster
[ https://issues.apache.org/jira/browse/HADOOP-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-10029: -- Attachment: HADOOP-10029.4.patch Updated patch that adds suppress warning for deprecation Specifying har file to MR job fails in secure cluster - Key: HADOOP-10029 URL: https://issues.apache.org/jira/browse/HADOOP-10029 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.0-alpha Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HADOOP-10029.1.patch, HADOOP-10029.2.patch, HADOOP-10029.3.patch, HADOOP-10029.4.patch, HADOOP-10029.patch This is an issue found by [~rramya]. See the exception stack trace in the following comment. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-10029) Specifying har file to MR job fails in secure cluster
[ https://issues.apache.org/jira/browse/HADOOP-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791996#comment-13791996 ] Sanjay Radia commented on HADOOP-10029: --- +1 Specifying har file to MR job fails in secure cluster - Key: HADOOP-10029 URL: https://issues.apache.org/jira/browse/HADOOP-10029 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.0-alpha Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HADOOP-10029.1.patch, HADOOP-10029.2.patch, HADOOP-10029.3.patch, HADOOP-10029.4.patch, HADOOP-10029.4.patch, HADOOP-10029.5.patch, HADOOP-10029.6.patch, HADOOP-10029.patch This is an issue found by [~rramya]. See the exception stack trace in the following comment. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-10035) Cleanup TestFilterFileSystem
[ https://issues.apache.org/jira/browse/HADOOP-10035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792025#comment-13792025 ] Sanjay Radia commented on HADOOP-10035: --- In "MUST not implement" the caps should be on NOT, not on MUST. +1 Cleanup TestFilterFileSystem Key: HADOOP-10035 URL: https://issues.apache.org/jira/browse/HADOOP-10035 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.1.1-beta Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HADOOP-10035.1.patch, HADOOP-10035.2.patch, HADOOP-10035.patch Currently TestFilterFileSystem only checks for FileSystem methods that must be implemented in FilterFileSystem, with a list of methods that are exceptions to this rule. This jira wants to make the check stricter by adding a test ensuring that the methods in the exception list must not be implemented by FilterFileSystem. It also converts the current class that holds the exception-list methods to an interface, to avoid having to provide dummy implementations of the methods. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-9631) ViewFs should use underlying FileSystem's server side defaults
[ https://issues.apache.org/jira/browse/HADOOP-9631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790714#comment-13790714 ] Sanjay Radia commented on HADOOP-9631: -- Here are two early comments (haven't finished reviewing the whole patch). * viewfs#getServerDefaults(path) can be simplified. See how open or list are implemented; they take advantage of the internal class InternalDirOfViewFs. Something like this should work: {code} viewfs#getServerDefaults(f) { InodeTree.ResolveResult<AbstractFileSystem> res = fsState.resolve(getUriPath(f), true); return res.targetFileSystem.getServerDefaults(res.remainingPath); } InternalDirOfViewFs#getServerDefaults() { return LocalConfigKeys.getServerDefaults(); } InternalDirOfViewFs#getServerDefaults(f) { checkPathIsSlash(f); return LocalConfigKeys.getServerDefaults(); } {code} * FileSystem#getServerDefaults(f) is incorrect due to getDefaultReplication(). It should use getDefaultReplication(f). Hence move the code from FileSystem#getServerDefaults() to FileSystem#getServerDefaults(f), changing getDefaultReplication to pass the pathname f. Have FileSystem#getServerDefaults() call FileSystem#getServerDefaults("/"). ViewFs should use underlying FileSystem's server side defaults -- Key: HADOOP-9631 URL: https://issues.apache.org/jira/browse/HADOOP-9631 Project: Hadoop Common Issue Type: Bug Components: fs, viewfs Affects Versions: 2.0.4-alpha Reporter: Lohit Vijayarenu Attachments: HADOOP-9631.trunk.1.patch, HADOOP-9631.trunk.2.patch, HADOOP-9631.trunk.3.patch, HADOOP-9631.trunk.4.patch, TestFileContext.java On a cluster with ViewFS as the default FileSystem, creating files using FileContext will always result in a replication factor of 1, instead of the underlying filesystem's default (like HDFS) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-10029) Specifying har file to MR job fails in secure cluster
[ https://issues.apache.org/jira/browse/HADOOP-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790862#comment-13790862 ] Sanjay Radia commented on HADOOP-10029: --- * resolvePath(). Not sure what is correct here. Resolve is supposed to follow through symlinks/mount points and resolve the path. One possibility is to make it the same as the default implementation of FileSystem (it calls fileStatus.getPath)? * Add a comment on why getCanonicalUri calls the underlying file system (due to tokens). * I would make ALL the copyFromXX and moveFromXX variants throw the exception rather than rely on the fact that FileSystem's default implementation calls one of the copyFromXX methods that HarFileSystem implements and throws an exception. Specifying har file to MR job fails in secure cluster - Key: HADOOP-10029 URL: https://issues.apache.org/jira/browse/HADOOP-10029 Project: Hadoop Common Issue Type: Bug Components: fs Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 2.2.0 Attachments: HADOOP-10029.1.patch, HADOOP-10029.2.patch, HADOOP-10029.patch This is an issue found by [~rramya]. See the exception stack trace in the following comment. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790972#comment-13790972 ] Sanjay Radia commented on HADOOP-9984: -- bq. Daryn, the discussion about resolved paths versus unresolved ones belongs on HADOOP-9780, not here. At least some of the points in Daryn's comments on Oct 4th apply to HADOOP-9984, rather than HADOOP-9780. HADOOP-9984's latest patch resolves the symlinks for listStatus, i.e. if the target directory denoted by the path has children that are symlinks, those symlinks will be resolved (so as to allow old apps that did if (!stat.isDir()) then assumeItIsAFile to work unchanged). Let's consider the following example: say the directory /foo/bar has children a, b, c, d, and say c is a symlink to /x/a. The method listStatus(/foo/bar) will, with the patch, return an array of FileStatus for a, b, *a*, d. The repeated a is because /foo/bar/c is resolved and its target /x/a is returned. This is a spec violation: the result of listStatus is supposed to be a set of unique directory entries (since a dir cannot have duplicate names). Further, if someone was using listStatus to copy the contents of /foo/bar, the copy operation will fail with a FileAlreadyExistsException. Daryn gives an example where someone trying to do a rename gets tripped by the duplicate entry. One could argue that for some of the other issues that Daryn raises, the application writer should have been using another API. I picked the duplicates one because it breaks a fundamental invariant of a directory - i.e. all its children have unique names. I am not offering any solution in this comment (although I have 2 suggestions). I want us to first agree that the current patch, which resolves symlinks for listStatus, has a serious issue. 
FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default -- Key: HADOOP-9984 URL: https://issues.apache.org/jira/browse/HADOOP-9984 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.0-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths. -- This message was sent by Atlassian JIRA (v6.1#6144)
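The duplicate-entry problem described above is easy to reproduce outside Hadoop. A java.nio sketch (illustrative only, not the patch under review): with follow-and-resolve semantics a listing reports the link target's name, so a symlink pointing at a sibling yields a duplicate entry.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of follow-and-resolve listing semantics: a child
// symlink is replaced by its resolved target, which can collide with a
// sibling entry and break the unique-names invariant of a directory.
public class DuplicateDemo {
    public static List<String> listResolved(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) {
                // Resolve child symlinks to their targets, as the proposed
                // listStatus behavior would.
                names.add(Files.isSymbolicLink(p)
                        ? p.toRealPath().getFileName().toString()
                        : p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dup");
        Files.createFile(dir.resolve("a"));
        Files.createSymbolicLink(dir.resolve("c"), dir.resolve("a")); // c -> a
        System.out.println(listResolved(dir)); // "a" appears twice
    }
}
```

A consumer that copies entries by name from such a listing hits exactly the FileAlreadyExistsException failure mode described in the comment.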
[jira] [Commented] (HADOOP-9780) Filesystem and FileContext methods that follow symlinks should return unresolved paths
[ https://issues.apache.org/jira/browse/HADOOP-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789462#comment-13789462 ] Sanjay Radia commented on HADOOP-9780: -- The description and comments seem to be talking of two different things. # the description says that following the symlink breaks compatibility. (I disagree - symlinks are supposed to be transparent for the most part). # the comments discuss two things - RPC costs and client-context-sensitivity. ** If the symlink is relative with no scheme then the NN has the option of resolving it to reduce the RPCs. We should do this. This was discussed during the symlink design and was marked as a future optimization - we should have filed a jira for that at the time. ** If the symlink is fully qualified with a scheme then it needs to be resolved on the client side; an optimization can be done by the NN if the scheme and authority match those of the NN. The NN cannot resolve a fully qualified name that does not match its own scheme and authority, due to security. There is no client context-sensitivity unless different clients are using inconsistent DNS servers. Filesystem and FileContext methods that follow symlinks should return unresolved paths -- Key: HADOOP-9780 URL: https://issues.apache.org/jira/browse/HADOOP-9780 Project: Hadoop Common Issue Type: Sub-task Reporter: Colin Patrick McCabe Priority: Minor Currently, when you follow a symlink, you get back the resolved path, with all symlinks removed. For compatibility reasons, we might want to have the returned path be an unresolved path. Example: if you have: {code} /a -> b /b /b/c {code} {{getFileStatus(/a/c)}} will return a {{FileStatus}} object with a {{Path}} of {{/b/c}}. If we returned the unresolved path, that would be {{/a/c}} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-9780) Filesystem and FileContext methods that follow symlinks should return unresolved paths
[ https://issues.apache.org/jira/browse/HADOOP-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789464#comment-13789464 ] Sanjay Radia commented on HADOOP-9780: -- Hit the add button accidentally. If the issue is RPC costs and server-side resolution optimization then let's fix the jira title and comments. I suggest that we do that since the comments do not seem to match the title and description. Filesystem and FileContext methods that follow symlinks should return unresolved paths -- Key: HADOOP-9780 URL: https://issues.apache.org/jira/browse/HADOOP-9780 Project: Hadoop Common Issue Type: Sub-task Reporter: Colin Patrick McCabe Priority: Minor Currently, when you follow a symlink, you get back the resolved path, with all symlinks removed. For compatibility reasons, we might want to have the returned path be an unresolved path. Example: if you have: {code} /a -> b /b /b/c {code} {{getFileStatus(/a/c)}} will return a {{FileStatus}} object with a {{Path}} of {{/b/c}}. If we returned the unresolved path, that would be {{/a/c}} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-10020) disable symlinks temporarily
[ https://issues.apache.org/jira/browse/HADOOP-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-10020: -- Attachment: Hadoop-10020-3.patch disable symlinks temporarily Key: HADOOP-10020 URL: https://issues.apache.org/jira/browse/HADOOP-10020 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.2-beta Reporter: Colin Patrick McCabe Assignee: Sanjay Radia Priority: Blocker Attachments: Hadoop-10020-2.patch, Hadoop-10020-3.patch, Hadoop-10020.patch disable symlinks temporarily until we can make them production-ready in Hadoop 2.3 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-10020) disable symlinks temporarily
[ https://issues.apache.org/jira/browse/HADOOP-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-10020: -- Attachment: Hadoop-10020-4.patch Update patch against latest trunk disable symlinks temporarily Key: HADOOP-10020 URL: https://issues.apache.org/jira/browse/HADOOP-10020 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.2-beta Reporter: Colin Patrick McCabe Assignee: Sanjay Radia Priority: Blocker Attachments: Hadoop-10020-2.patch, Hadoop-10020-3.patch, Hadoop-10020-4.patch, Hadoop-10020.patch disable symlinks temporarily until we can make them production-ready in Hadoop 2.3 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-10020) disable symlinks temporarily
[ https://issues.apache.org/jira/browse/HADOOP-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787816#comment-13787816 ] Sanjay Radia commented on HADOOP-10020: --- As noted by Chris, the TestRetryCacheWithHA failure is not related to this patch; further, TestRetryCacheWithHA passed on my desktop. disable symlinks temporarily Key: HADOOP-10020 URL: https://issues.apache.org/jira/browse/HADOOP-10020 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.2-beta Reporter: Colin Patrick McCabe Assignee: Sanjay Radia Priority: Blocker Attachments: Hadoop-10020-2.patch, Hadoop-10020-3.patch, Hadoop-10020-4-forBranch2.1beta.patch, Hadoop-10020-4.patch, Hadoop-10020.patch disable symlinks temporarily until we can make them production-ready in Hadoop 2.3 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-10020) disable symlinks temporarily
[ https://issues.apache.org/jira/browse/HADOOP-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-10020: -- Status: Open (was: Patch Available) disable symlinks temporarily Key: HADOOP-10020 URL: https://issues.apache.org/jira/browse/HADOOP-10020 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.2-beta Reporter: Colin Patrick McCabe Assignee: Sanjay Radia Priority: Blocker Attachments: Hadoop-10020-2.patch, Hadoop-10020-3.patch, Hadoop-10020-4-forBranch2.1beta.patch, Hadoop-10020-4.patch, Hadoop-10020.patch disable symlinks temporarily until we can make them production-ready in Hadoop 2.3 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-10020) disable symlinks temporarily
[ https://issues.apache.org/jira/browse/HADOOP-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-10020: -- Attachment: Hadoop-10020-4-forBranch2.1beta.patch backported patch to branch-2.1beta. Main change is that RawLocalFileSystem does not have createSymlink. disable symlinks temporarily Key: HADOOP-10020 URL: https://issues.apache.org/jira/browse/HADOOP-10020 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.2-beta Reporter: Colin Patrick McCabe Assignee: Sanjay Radia Priority: Blocker Attachments: Hadoop-10020-2.patch, Hadoop-10020-3.patch, Hadoop-10020-4-forBranch2.1beta.patch, Hadoop-10020-4.patch, Hadoop-10020.patch disable symlinks temporarily until we can make them production-ready in Hadoop 2.3 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-9984: - Target Version/s: 2.3.0 (was: 2.1.2-beta) Making this a blocker on 2.3 and removing the blocker on 2.1.2-beta since Hadoop-10020 (disabling symlinks) has been committed to branch-2.1-beta. FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default -- Key: HADOOP-9984 URL: https://issues.apache.org/jira/browse/HADOOP-9984 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.0-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-10020) disable symlinks temporarily
[ https://issues.apache.org/jira/browse/HADOOP-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-10020: -- Assignee: Sanjay Radia Status: Patch Available (was: Open) disable symlinks temporarily Key: HADOOP-10020 URL: https://issues.apache.org/jira/browse/HADOOP-10020 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.2-beta Reporter: Colin Patrick McCabe Assignee: Sanjay Radia Priority: Blocker Attachments: Hadoop-10020.patch disable symlinks temporarily until we can make them production-ready in Hadoop 2.3 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-10020) disable symlinks temporarily
[ https://issues.apache.org/jira/browse/HADOOP-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-10020: -- Attachment: Hadoop-10020.patch Note this patch is not for trunk, merely for 2.2; submitted here to run it through the tests. disable symlinks temporarily Key: HADOOP-10020 URL: https://issues.apache.org/jira/browse/HADOOP-10020 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.2-beta Reporter: Colin Patrick McCabe Priority: Blocker Attachments: Hadoop-10020.patch disable symlinks temporarily until we can make them production-ready in Hadoop 2.3 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-10020) disable symlinks temporarily
[ https://issues.apache.org/jira/browse/HADOOP-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-10020: -- Attachment: Hadoop-10020-2.patch Updated patch which addresses Chris's feedback. Note FSNamesystem.java will not compile until it gets the new FileSystem.class in the jar. disable symlinks temporarily Key: HADOOP-10020 URL: https://issues.apache.org/jira/browse/HADOOP-10020 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.2-beta Reporter: Colin Patrick McCabe Assignee: Sanjay Radia Priority: Blocker Attachments: Hadoop-10020-2.patch, Hadoop-10020.patch disable symlinks temporarily until we can make them production-ready in Hadoop 2.3 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-10016) Distcp should support copy from a secure Hadoop 1 cluster to an insecure Hadoop 2 cluster
[ https://issues.apache.org/jira/browse/HADOOP-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786395#comment-13786395 ] Sanjay Radia commented on HADOOP-10016: --- Option (2), an artificial delegation token, is cleaner. Distcp should support copy from a secure Hadoop 1 cluster to an insecure Hadoop 2 cluster - Key: HADOOP-10016 URL: https://issues.apache.org/jira/browse/HADOOP-10016 Project: Hadoop Common Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Distcp should be able to copy from a secure cluster to an insecure cluster. This functionality is important for operators to migrate data to a new Hadoop installation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-10016) Distcp should support copy from a secure Hadoop 1 cluster to an insecure Hadoop 2 cluster
[ https://issues.apache.org/jira/browse/HADOOP-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786400#comment-13786400 ] Sanjay Radia commented on HADOOP-10016: --- Add a config flag for this so as to minimize potential issues in case there are parts of the stack that depend on the old behavior (though I doubt there are). Distcp should support copy from a secure Hadoop 1 cluster to an insecure Hadoop 2 cluster - Key: HADOOP-10016 URL: https://issues.apache.org/jira/browse/HADOOP-10016 Project: Hadoop Common Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Distcp should be able to copy from a secure cluster to an insecure cluster. This functionality is important for operators to migrate data to a new Hadoop installation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786437#comment-13786437 ] Sanjay Radia commented on HADOOP-9984: -- bq. We agreed that HADOOP-9972 ... ChrisN, I don't remember this decision from the call, and I don't remember this part either. Can we delay HADOOP-9972 to 2.3, since it does not break any APIs but simply adds new ones in a compatible fashion? FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default -- Key: HADOOP-9984 URL: https://issues.apache.org/jira/browse/HADOOP-9984 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.0-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786499#comment-13786499 ] Sanjay Radia commented on HADOOP-9984: -- Resolving symlinks has issues. Daryn has raised several, of which the one about duplicates is a showstopper. Another is where one may not have permission to follow the symlink; listStatus would then have to throw AccessControlException when a child reached via a symlink is not accessible. FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default -- Key: HADOOP-9984 URL: https://issues.apache.org/jira/browse/HADOOP-9984 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.0-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-10016) Distcp should support copy from a secure Hadoop 1 cluster to an insecure Hadoop 2 cluster
[ https://issues.apache.org/jira/browse/HADOOP-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786727#comment-13786727 ] Sanjay Radia commented on HADOOP-10016: --- Context: * Copying data from secure 2.x to insecure 2.x works because of RPC v9 (the reverse, I think, will also work). * We want to copy data via DistCp from insecure 1.x to secure 2.x - this does not work; the issue is similar to the next one. * We want to copy data via DistCp from secure 1.x to insecure 2.x - this fails as described below. Currently an insecure cluster returns *null* for getDelegationToken(). 2.x clients do freak out on this null token, but that is fixed by HADOOP-10017. The key problem is as follows. * The DistCp job runs in the secure 1.x cluster and tries to connect to the NN in the insecure 2.x cluster. * Because security is enabled (DistCp is running in the secure 1.x cluster), the client sees that it has no tokens for that cluster (recall none were obtained because null was returned); it then tries Kerberos-based authentication, which fails because it has no Kerberos credentials (it is running as an MR task); even the fallback to insecure does not work, because the failure happens *before* the RPC connection. Solution: have the NN in the insecure 2.x cluster return an artificial token. Distcp should support copy from a secure Hadoop 1 cluster to an insecure Hadoop 2 cluster - Key: HADOOP-10016 URL: https://issues.apache.org/jira/browse/HADOOP-10016 Project: Hadoop Common Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Distcp should be able to copy from a secure cluster to an insecure cluster. This functionality is important for operators to migrate data to a new Hadoop installation. -- This message was sent by Atlassian JIRA (v6.1#6144)
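The failure sequence described in that comment can be sketched as a small decision procedure. This is a toy model with hypothetical names, not Hadoop's actual client code; it only illustrates why the job fails before the RPC connection is made and why an artificial token unblocks it:

```java
// Toy model (hypothetical names) of the client-side authentication choice
// described above. In a secure cluster, a task with neither a token for the
// remote NN nor Kerberos credentials fails before the RPC connection is
// made, so the insecure fallback never gets a chance to run.
public class AuthDecision {
    public enum Auth { SIMPLE, TOKEN, KERBEROS, FAIL_BEFORE_CONNECT }

    public static Auth choose(boolean securityEnabled,
                              boolean hasTokenForCluster,
                              boolean hasKerberosCredentials) {
        if (!securityEnabled) return Auth.SIMPLE;         // insecure client: plain auth
        if (hasTokenForCluster) return Auth.TOKEN;        // normal MR-task path
        if (hasKerberosCredentials) return Auth.KERBEROS; // interactive user path
        return Auth.FAIL_BEFORE_CONNECT;                  // the DistCp failure mode
    }

    public static void main(String[] args) {
        // Secure 1.x task; insecure 2.x NN returned a null delegation token:
        System.out.println(choose(true, false, false)); // FAIL_BEFORE_CONNECT
        // With an artificial token handed out by the insecure NN:
        System.out.println(choose(true, true, false));  // TOKEN
    }
}
```

Under this model, the proposed artificial token simply makes the second argument true, which is why it sidesteps the Kerberos path entirely.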
[jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786864#comment-13786864 ] Sanjay Radia commented on HADOOP-9984: -- I would like to mark this jira as a non-blocker for 2.2 GA because we are going to disable symlinks in 2.2 GA via HADOOP-10020. Colin, are you okay with that? FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default -- Key: HADOOP-9984 URL: https://issues.apache.org/jira/browse/HADOOP-9984 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.1.0-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-10003) HarFileSystem.listLocatedStatus() fails
[ https://issues.apache.org/jira/browse/HADOOP-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783379#comment-13783379 ] Sanjay Radia commented on HADOOP-10003: --- +1. Create a new jira to update the har test to ensure that HarFileSystem implements every declared method of FileSystem (see TestFilterFileSystem#testFilterFileSystem(), which does a similar check); this ensures that when a new method is added to FileSystem, the test will catch whether HarFileSystem has been updated accordingly. HarFileSystem.listLocatedStatus() fails --- Key: HADOOP-10003 URL: https://issues.apache.org/jira/browse/HADOOP-10003 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.1.1-beta Reporter: Jason Dere Attachments: HADOOP-10003.1.patch, HADOOP-10003.2.patch, HADOOP-10003.3.patch, HADOOP-10003.4.patch, HADOOP-10003.5.patch, test.har.tar It looks like HarFileSystem.listLocatedStatus() doesn't work properly because it is inheriting FilterFileSystem's implementation. This is causing archive unit tests to fail in Hive when using hadoop 2.1.1. If HarFileSystem overrides listLocatedStatus() to use FileSystem's implementation, the Hive unit tests pass. -- This message was sent by Atlassian JIRA (v6.1#6144)
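The kind of completeness check being proposed (in the style of TestFilterFileSystem#testFilterFileSystem()) can be sketched with plain reflection. Base, Wrapper, and FullWrapper below are toy stand-ins for FileSystem and its wrappers, since the real Hadoop classes are not on this sketch's classpath:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Sketch of an override-completeness test: verify that a wrapper class
// re-declares every public method of its base, so a newly added base method
// is caught by the test instead of silently falling back to inheritance.
public class OverrideCheck {
    public static class Base {
        public String list(String path) { return "base:" + path; }
        public boolean delete(String path) { return true; }
    }

    public static class Wrapper extends Base {
        @Override
        public String list(String path) { return "wrapped:" + path; }
        // delete() is intentionally NOT overridden, so the check flags it.
    }

    public static class FullWrapper extends Base {
        @Override
        public String list(String path) { return "full:" + path; }
        @Override
        public boolean delete(String path) { return false; }
    }

    /** True iff sub declares its own version of every public method of base. */
    public static boolean overridesAll(Class<?> base, Class<?> sub) {
        for (Method m : base.getDeclaredMethods()) {
            if (!Modifier.isPublic(m.getModifiers())) continue;
            try {
                // Only methods declared directly on sub count as overrides.
                sub.getDeclaredMethod(m.getName(), m.getParameterTypes());
            } catch (NoSuchMethodException e) {
                return false; // inherited from base, not overridden
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(overridesAll(Base.class, Wrapper.class));     // false
        System.out.println(overridesAll(Base.class, FullWrapper.class)); // true
    }
}
```

The real test would run overridesAll(FileSystem.class, HarFileSystem.class) and fail the build whenever a FileSystem method is added without a matching HarFileSystem override.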
[jira] [Commented] (HADOOP-10012) Secure Oozie jobs fail with delegation token renewal exception in HA setup
[ https://issues.apache.org/jira/browse/HADOOP-10012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783439#comment-13783439 ] Sanjay Radia commented on HADOOP-10012: --- I am a little bit worried about the key name in the map
{code}
Text alias = new Text(HA_DT_SERVICE_PREFIX + "//" + specificToken.getService());
ugi.addToken(alias, specificToken);
{code}
The original code added it using the unchanged service name. Secure Oozie jobs fail with delegation token renewal exception in HA setup -- Key: HADOOP-10012 URL: https://issues.apache.org/jira/browse/HADOOP-10012 Project: Hadoop Common Issue Type: Bug Components: ha Affects Versions: 2.1.1-beta Reporter: Arpit Gupta Assignee: Suresh Srinivas Attachments: HADOOP-10012.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-10012) Secure Oozie jobs fail with delegation token renewal exception in HA setup
[ https://issues.apache.org/jira/browse/HADOOP-10012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783455#comment-13783455 ] Sanjay Radia commented on HADOOP-10012: --- Turns out the key name in the map is not used to look up a token when connecting to a service. Instead the token selector grabs all tokens and uses the service name *inside* the token:
{code}
for (Token<? extends TokenIdentifier> token : tokens) {
  if (kindName.equals(token.getKind())
      && service.equals(token.getService())) {
    return (Token<TokenIdent>) token;
  }
}
{code}
I think changing the key in the map should be okay. Daryn added this for debugging assistance - quote from IM: {quote} I figured it should have a unique name just in case, for some reason, the client really did have a token for the physical service. Plus to simplify debugging if something goes awry again. it won't break anything, because nothing really looks for a token by its key other than some mr/yarn stuff (grumble) {quote} +1 for the patch. Todd/Atm - didn't you run into this bug with CDH4 and CDH5 (even though CDH ships MR1, wouldn't it run into the same issue?) Secure Oozie jobs fail with delegation token renewal exception in HA setup -- Key: HADOOP-10012 URL: https://issues.apache.org/jira/browse/HADOOP-10012 Project: Hadoop Common Issue Type: Bug Components: ha Affects Versions: 2.1.1-beta Reporter: Arpit Gupta Assignee: Suresh Srinivas Attachments: HADOOP-10012.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
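The observation that the alias key is ignored on lookup can be demonstrated with a toy credentials map. Token and TokenLookup here are simplified stand-ins with hypothetical names, not Hadoop's actual Token and Credentials classes:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the selection behavior quoted above: tokens are stored under
// an alias, but lookup scans all tokens and matches on the kind and the
// service name stored *inside* the token, so renaming the alias is harmless.
public class TokenLookup {
    public static class Token {
        public final String kind;
        public final String service;
        public Token(String kind, String service) {
            this.kind = kind;
            this.service = service;
        }
    }

    private final Map<String, Token> tokens = new HashMap<>();

    public void addToken(String alias, Token t) { tokens.put(alias, t); }

    /** Mirrors the selector loop: the map keys (aliases) are never consulted. */
    public Token select(String kind, String service) {
        for (Token t : tokens.values()) {
            if (kind.equals(t.kind) && service.equals(t.service)) {
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TokenLookup creds = new TokenLookup();
        // Stored under a prefixed alias, as in the patch under review:
        creds.addToken("ha-prefix//nn1:8020",
                       new Token("HDFS_DELEGATION_TOKEN", "nn1:8020"));
        // Selection still succeeds: only the token's own fields matter.
        System.out.println(creds.select("HDFS_DELEGATION_TOKEN", "nn1:8020") != null); // true
    }
}
```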
[jira] [Commented] (HADOOP-9671) Improve Hadoop security - Use cases, Threat Model and Problems
[ https://issues.apache.org/jira/browse/HADOOP-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747755#comment-13747755 ] Sanjay Radia commented on HADOOP-9671: -- bq. A common token format with variable identity attributes to support fine-grained access control Can you expand on this and also give an example? I understand that the token will contain both the main principal and the group membership, based on the discussion on other jiras. Do you mean more than that? Improve Hadoop security - Use cases, Threat Model and Problems -- Key: HADOOP-9671 URL: https://issues.apache.org/jira/browse/HADOOP-9671 Project: Hadoop Common Issue Type: Improvement Reporter: Sanjay Radia -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9671) Improve Hadoop security - Use cases, Threat Model and Problems
[ https://issues.apache.org/jira/browse/HADOOP-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747757#comment-13747757 ] Sanjay Radia commented on HADOOP-9671: -- bq. Support proxy authentication: one Hadoop service can proxy authenticated client user to access other Hadoop service in a constrained way Hadoop supports this today. Did you want to do something different? Improve Hadoop security - Use cases, Threat Model and Problems -- Key: HADOOP-9671 URL: https://issues.apache.org/jira/browse/HADOOP-9671 Project: Hadoop Common Issue Type: Improvement Reporter: Sanjay Radia -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9671) Improve Hadoop security - Use cases, Threat Model and Problems
[ https://issues.apache.org/jira/browse/HADOOP-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HADOOP-9671: - Summary: Improve Hadoop security - Use cases, Threat Model and Problems (was: Improve Hadoop security - Use cases) Improve Hadoop security - Use cases, Threat Model and Problems -- Key: HADOOP-9671 URL: https://issues.apache.org/jira/browse/HADOOP-9671 Project: Hadoop Common Issue Type: Improvement Reporter: Sanjay Radia -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9671) Improve Hadoop security - Use cases, Threat Model and Problems
[ https://issues.apache.org/jira/browse/HADOOP-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13746984#comment-13746984 ] Sanjay Radia commented on HADOOP-9671: -- bq. My recollection of the consensus is this is an issue to collect use cases ... Looks like the sore point is calling this an umbrella jira. Sorry for doing that. Avik, Andrew, any comments on the content? I put a lot of effort into writing the use cases, threat model and problems; your feedback would be useful. I have updated the title slightly to reflect the content better. Improve Hadoop security - Use cases, Threat Model and Problems -- Key: HADOOP-9671 URL: https://issues.apache.org/jira/browse/HADOOP-9671 Project: Hadoop Common Issue Type: Improvement Reporter: Sanjay Radia -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9671) Improve Hadoop security - Use cases, Threat Model and Problems
[ https://issues.apache.org/jira/browse/HADOOP-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747079#comment-13747079 ] Sanjay Radia commented on HADOOP-9671: -- Kai, thanks for your comments. Wrt your use cases - some of them are not use cases but design choices. For example item 1, Users can authenticate using their own domain specific identity and receive an opaque token..., is a particular design choice (a good one). Items 2 and 5 are requirements or goals. Use cases can be derived from 3 and 4. Let me update the use cases with what I can extract from your comments. I will also try to generalize U2, U3, U4 and use them as variations of a common use case. I suspect you are after the use case that says that there are many base authentication providers and that they all can be used with appropriate plugins. Will get back to you on the rest of your comments after I finish digesting them. Can you please expand on your constraint: bq. Hadoop should only need to understand the common token and the new authentication method instead of concrete authentication mechanism I assume that the common token is the one issued by the newly proposed Hadoop Authentication Server (HAS). Do you mean that we need to replace the delegation tokens and the block tokens with it? What are the new authentication method and the concrete authentication mechanism? Improve Hadoop security - Use cases, Threat Model and Problems -- Key: HADOOP-9671 URL: https://issues.apache.org/jira/browse/HADOOP-9671 Project: Hadoop Common Issue Type: Improvement Reporter: Sanjay Radia -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9880) SASL changes from HADOOP-9421 breaks Secure HA NN
[ https://issues.apache.org/jira/browse/HADOOP-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742357#comment-13742357 ] Sanjay Radia commented on HADOOP-9880: -- Daryn, your hack is slightly more appealing. However, the client side does know how to deal with StandbyException (i.e., it retries on the other node). So we need to fix the client side to catch the InvalidToken, unwrap the cause, and then retry. BTW, HDFS-3083 has a test and we need to run that test against this patch to verify that we have not regressed. SASL changes from HADOOP-9421 breaks Secure HA NN -- Key: HADOOP-9880 URL: https://issues.apache.org/jira/browse/HADOOP-9880 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Kihwal Lee Assignee: Daryn Sharp Priority: Blocker Attachments: HADOOP-9880.patch buildSaslNegotiateResponse() will create a SaslRpcServer with TOKEN auth. When create() is called against it, secretManager.checkAvailableForRead() is called, which fails in HA standby. Thus HA standby nodes cannot be transitioned to active. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9880) SASL changes from HADOOP-9421 breaks Secure HA NN
[ https://issues.apache.org/jira/browse/HADOOP-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742651#comment-13742651 ] Sanjay Radia commented on HADOOP-9880: -- We also applied it and it works. Yes, it is unwrapping the exception; I did not read it carefully last night. We applied the test from HDFS-3083 and that test passes. SASL changes from HADOOP-9421 breaks Secure HA NN -- Key: HADOOP-9880 URL: https://issues.apache.org/jira/browse/HADOOP-9880 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Kihwal Lee Assignee: Daryn Sharp Priority: Blocker Attachments: HADOOP-9880.patch buildSaslNegotiateResponse() will create a SaslRpcServer with TOKEN auth. When create() is called against it, secretManager.checkAvailableForRead() is called, which fails in HA standby. Thus HA standby nodes cannot be transitioned to active. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9880) SASL changes from HADOOP-9421 breaks Secure HA NN
[ https://issues.apache.org/jira/browse/HADOOP-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742666#comment-13742666 ] Sanjay Radia commented on HADOOP-9880: -- +1, I will commit in a few minutes; thanks Daryn. SASL changes from HADOOP-9421 breaks Secure HA NN -- Key: HADOOP-9880 URL: https://issues.apache.org/jira/browse/HADOOP-9880 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Kihwal Lee Assignee: Daryn Sharp Priority: Blocker Attachments: HADOOP-9880.patch buildSaslNegotiateResponse() will create a SaslRpcServer with TOKEN auth. When create() is called against it, secretManager.checkAvailableForRead() is called, which fails in HA standby. Thus HA standby nodes cannot be transitioned to active. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9880) RPC Server should not unconditionally create SaslServer with Token auth.
[ https://issues.apache.org/jira/browse/HADOOP-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13741771#comment-13741771 ] Sanjay Radia commented on HADOOP-9880: -- We saw exactly the same error during a test this morning. The two jiras that caused this problem are the recent HADOOP-9421 and the earlier HDFS-3083. HADOOP-9421 improved the SASL protocol. ZKFC uses Kerberos, but the server side initiates the token-based challenge just in case the client wants tokens. As part of doing that, the server's secretManager.checkAvailableForRead() fails because the NN is in standby. It is really bizarre that there is a check for the server's state (active or standby) as part of SASL. This was introduced in HDFS-3083 to deal with a failover bug. In HDFS-3083, Aaron noted that he did not like the solution: I'm not in love with this solution, as it leaks abstractions all over the place. The abstraction-layer violation finally caught up with us. Turns out even prior to Daryn's HADOOP-9421, a similar problem could have occurred if the ZKFC had used Kerberos for the first connection and tokens for any subsequent connections. An immediate fix is required for what HADOOP-9421 broke, but I believe we also need to fix the fix that HDFS-3083 introduced - the abstraction-layer violations need to be cleaned up. RPC Server should not unconditionally create SaslServer with Token auth. Key: HADOOP-9880 URL: https://issues.apache.org/jira/browse/HADOOP-9880 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Kihwal Lee Priority: Blocker buildSaslNegotiateResponse() will create a SaslRpcServer with TOKEN auth. When create() is called against it, secretManager.checkAvailableForRead() is called, which fails in HA standby. Thus HA standby nodes cannot be transitioned to active. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9392) Token based authentication and Single Sign On
[ https://issues.apache.org/jira/browse/HADOOP-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734976#comment-13734976 ] Sanjay Radia commented on HADOOP-9392: -- Looks like we are mostly in agreement. However I do not agree with the following: bq. The 'A' of HAS could be explained as Authentication, Authorization, or Auditing or more of them, depending on HAS is provisioned with which role(s). In this way it's much flexible and better to evolve in future. I understand the notion of a central authentication server, and that is what you have explained in the design. I believe that most if not all of the authorization belongs closer to the resource servers being accessed. So for now let's just call this the hadoop-authentication-service. Later, if and when we have a design for centralized authorization, we can expand the scope of the service. I would like to change this jira's title to Hadoop Authentication Service. Also drop the SSO from the title, since that is not unique to the HAS - today's Kerberos/authentication service supports SSO just as the HAS will. Token based authentication and Single Sign On - Key: HADOOP-9392 URL: https://issues.apache.org/jira/browse/HADOOP-9392 Project: Hadoop Common Issue Type: New Feature Components: security Reporter: Kai Zheng Assignee: Kai Zheng Fix For: 3.0.0 Attachments: TokenAuth-breakdown.pdf, token-based-authn-plus-sso.pdf, token-based-authn-plus-sso-v2.0.pdf This is an umbrella entry for one of project Rhino’s topic, for details of project Rhino, please refer to https://github.com/intel-hadoop/project-rhino/. The major goal for this entry as described in project Rhino was “Core, HDFS, ZooKeeper, and HBase currently support Kerberos authentication at the RPC layer, via SASL. However this does not provide valuable attributes such as group membership, classification level, organizational identity, or support for user defined attributes. 
Hadoop components must interrogate external resources for discovering these attributes and at scale this is problematic. There is also no consistent delegation model. HDFS has a simple delegation capability, and only Oozie can take limited advantage of it. We will implement a common token based authentication framework to decouple internal user and service authentication from external mechanisms used to support it (like Kerberos)” We’d like to start our work from Hadoop-Common and try to provide common facilities by extending existing authentication framework which support: 1.Pluggable token provider interface 2.Pluggable token verification protocol and interface 3.Security mechanism to distribute secrets in cluster nodes 4.Delegation model of user authentication -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9797) Pluggable and compatible UGI change
[ https://issues.apache.org/jira/browse/HADOOP-9797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13735045#comment-13735045 ] Sanjay Radia commented on HADOOP-9797: -- [~daryn] bq. but a change this large might need to be decomposed into incremental steps. Pluggable and compatible UGI change --- Key: HADOOP-9797 URL: https://issues.apache.org/jira/browse/HADOOP-9797 Project: Hadoop Common Issue Type: Sub-task Components: security Reporter: Kai Zheng Assignee: Kai Zheng Labels: Rhino Fix For: 3.0.0 Attachments: HADOOP-9797-v1.patch As already widely discussed current UGI related classes needs to be improved in many aspects. This is to improve and make UGI so that it can be: * Pluggable, new authentication method with its login module can be dynamically registered and plugged without having to change the UGI class; * Extensible, login modules with their options can be dynamically extended and customized so that can be reusable elsewhere, like in TokenAuth; * No Kerberos relevant, remove any Kerberos relevant functionalities out of it to make it simple and suitable for other login mechanisms; * Of appropriate abstraction and API, with improved abstraction and API it’s possible to allow authentication implementations not using JAAS modules; * Compatible, should be compatible with previous deployment and authentication methods, so the existing APIs won’t be removed and some of them are just to be deprecated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9797) Pluggable and compatible UGI change
[ https://issues.apache.org/jira/browse/HADOOP-9797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13735050#comment-13735050 ] Sanjay Radia commented on HADOOP-9797: -- Oops, hit add by mistake. [~daryn] bq. but a change this large might need to be decomposed into incremental steps. Having read the patch, I agree with Daryn; can you split this jira into smaller ones and submit some updated patches please? Pluggable and compatible UGI change --- Key: HADOOP-9797 URL: https://issues.apache.org/jira/browse/HADOOP-9797 Project: Hadoop Common Issue Type: Sub-task Components: security Reporter: Kai Zheng Assignee: Kai Zheng Labels: Rhino Fix For: 3.0.0 Attachments: HADOOP-9797-v1.patch As already widely discussed current UGI related classes needs to be improved in many aspects. This is to improve and make UGI so that it can be: * Pluggable, new authentication method with its login module can be dynamically registered and plugged without having to change the UGI class; * Extensible, login modules with their options can be dynamically extended and customized so that can be reusable elsewhere, like in TokenAuth; * No Kerberos relevant, remove any Kerberos relevant functionalities out of it to make it simple and suitable for other login mechanisms; * Of appropriate abstraction and API, with improved abstraction and API it’s possible to allow authentication implementations not using JAAS modules; * Compatible, should be compatible with previous deployment and authentication methods, so the existing APIs won’t be removed and some of them are just to be deprecated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9392) Token based authentication and Single Sign On
[ https://issues.apache.org/jira/browse/HADOOP-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13735094#comment-13735094 ] Sanjay Radia commented on HADOOP-9392: -- bq. I would like to change this jira's title to Hadoop Authentication Service. ... Sorry, I had not noticed you had created Hadoop-9798. The title I suggested applies more to that Jira. So this jira is really about making Hadoop authentication pluggable beyond Kerberos and Hadoop-tokens. Token based authentication and Single Sign On - Key: HADOOP-9392 URL: https://issues.apache.org/jira/browse/HADOOP-9392 Project: Hadoop Common Issue Type: New Feature Components: security Reporter: Kai Zheng Assignee: Kai Zheng Fix For: 3.0.0 Attachments: TokenAuth-breakdown.pdf, token-based-authn-plus-sso.pdf, token-based-authn-plus-sso-v2.0.pdf This is an umbrella entry for one of project Rhino’s topic, for details of project Rhino, please refer to https://github.com/intel-hadoop/project-rhino/. The major goal for this entry as described in project Rhino was “Core, HDFS, ZooKeeper, and HBase currently support Kerberos authentication at the RPC layer, via SASL. However this does not provide valuable attributes such as group membership, classification level, organizational identity, or support for user defined attributes. Hadoop components must interrogate external resources for discovering these attributes and at scale this is problematic. There is also no consistent delegation model. HDFS has a simple delegation capability, and only Oozie can take limited advantage of it. 
We will implement a common token based authentication framework to decouple internal user and service authentication from external mechanisms used to support it (like Kerberos)” We’d like to start our work from Hadoop-Common and try to provide common facilities by extending existing authentication framework which support: 1.Pluggable token provider interface 2.Pluggable token verification protocol and interface 3.Security mechanism to distribute secrets in cluster nodes 4.Delegation model of user authentication -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
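The four facilities listed above could be sketched roughly as follows. This is an entirely hypothetical illustration of the "pluggable token provider interface" bullet; none of these type or method names come from an actual Hadoop API, and the real design was still under discussion in this jira.

```java
// Hypothetical sketch of a pluggable token provider registry.
// All names here are invented for illustration; the actual TokenAuth
// design in project Rhino may differ substantially.
import java.util.HashMap;
import java.util.Map;

interface TokenProvider {
    String name();
    byte[] issueToken(String principal);
    boolean verifyToken(byte[] token);
}

class TokenProviderRegistry {
    private final Map<String, TokenProvider> providers = new HashMap<>();

    // New authentication methods register themselves dynamically,
    // without any change to a central UGI-like class.
    void register(TokenProvider p) { providers.put(p.name(), p); }

    TokenProvider get(String name) { return providers.get(name); }
}

public class TokenProviderDemo {
    public static void main(String[] args) {
        TokenProviderRegistry registry = new TokenProviderRegistry();
        registry.register(new TokenProvider() {
            public String name() { return "demo"; }
            public byte[] issueToken(String principal) {
                // Toy token format; a real provider would sign its tokens.
                return ("token-for-" + principal).getBytes();
            }
            public boolean verifyToken(byte[] token) {
                return new String(token).startsWith("token-for-");
            }
        });
        TokenProvider p = registry.get("demo");
        System.out.println(p.verifyToken(p.issueToken("alice")));
    }
}
```

The point of the registry indirection is the first bullet above: services look up a provider by name at runtime, so adding a mechanism does not require editing the framework class.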
[jira] [Commented] (HADOOP-9797) Pluggable and compatible UGI change
[ https://issues.apache.org/jira/browse/HADOOP-9797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13735218#comment-13735218 ] Sanjay Radia commented on HADOOP-9797: -- Given that this jira is going to change a key part of the code, can you please add a comment on what you will be testing beyond the unit tests. For security, we have relied on a fair amount of manual testing. You should test the classic Kerberos case for both HDFS and MR, along with a trusted proxy (say Oozie). Pluggable and compatible UGI change --- Key: HADOOP-9797 URL: https://issues.apache.org/jira/browse/HADOOP-9797 Project: Hadoop Common Issue Type: Sub-task Components: security Reporter: Kai Zheng Assignee: Kai Zheng Labels: Rhino Fix For: 3.0.0 Attachments: HADOOP-9797-v1.patch As already widely discussed current UGI related classes needs to be improved in many aspects. This is to improve and make UGI so that it can be: * Pluggable, new authentication method with its login module can be dynamically registered and plugged without having to change the UGI class; * Extensible, login modules with their options can be dynamically extended and customized so that can be reusable elsewhere, like in TokenAuth; * No Kerberos relevant, remove any Kerberos relevant functionalities out of it to make it simple and suitable for other login mechanisms; * Of appropriate abstraction and API, with improved abstraction and API it’s possible to allow authentication implementations not using JAAS modules; * Compatible, should be compatible with previous deployment and authentication methods, so the existing APIs won’t be removed and some of them are just to be deprecated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9813) Fine-grained authorization library for HAS
[ https://issues.apache.org/jira/browse/HADOOP-9813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13735225#comment-13735225 ] Sanjay Radia commented on HADOOP-9813: -- I don't get this one. Can you give use cases and examples of policies for authorization. bq. Take HDFS for example, when a user is trying to access a file or a folder, name node will call into this library and pass the resource identifier and the rights needed. Are you assuming that ALL hadoop resources have global resource identifiers? Fine-grained authorization library for HAS -- Key: HADOOP-9813 URL: https://issues.apache.org/jira/browse/HADOOP-9813 Project: Hadoop Common Issue Type: Task Components: security Affects Versions: 3.0.0 Reporter: Jerry Chen Labels: Rhino This is to define and provide authorization enforcement library for Hadoop services. It provides the utilities to load and enforce security policies through related services provided by the Authorization Service of HAS. Hadoop components call these utilities to enforce the authorization policies. Take HDFS for example, when a user is trying to access a file or a folder, name node will call into this library and pass the resource identifier and the rights needed. The scope of this is as follows: * Define and implement authorization policy enforcement API to be utilized by Hadoop services to enforce authorization policies. * Define and implement authorization policy load and sync facilities. * Define and implement authorization policy evaluation engine. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
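As a purely illustrative reading of the NameNode example in the HADOOP-9813 description (a service passes a resource identifier and the rights needed, and the library evaluates loaded policies), the enforcement call might look like the toy sketch below. Every name here is invented; the jira does not define a concrete API, which is exactly what the comment above is asking to be clarified.

```java
// Hypothetical sketch of the policy-enforcement call described in
// HADOOP-9813. A real engine would also consider user and group
// attributes; this toy model keys only on the resource identifier.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class PolicyEngine {
    // resource identifier -> rights granted for that resource
    private final Map<String, Set<String>> policies = new HashMap<>();

    void grant(String resource, String right) {
        policies.computeIfAbsent(resource, r -> new HashSet<>()).add(right);
    }

    // The call a NameNode-like service would make before serving a request.
    boolean checkAccess(String resourceId, String rightNeeded) {
        return policies.getOrDefault(resourceId, new HashSet<>())
                       .contains(rightNeeded);
    }
}

public class PolicyEngineDemo {
    public static void main(String[] args) {
        PolicyEngine engine = new PolicyEngine();
        engine.grant("hdfs:/data/reports", "READ");
        System.out.println(engine.checkAccess("hdfs:/data/reports", "READ"));
        System.out.println(engine.checkAccess("hdfs:/data/reports", "WRITE"));
    }
}
```

Note that this sketch assumes exactly what the comment questions: that every resource has a global identifier the engine can key on.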
[jira] [Commented] (HADOOP-9820) RPCv9 wire protocol is insufficient to support multiplexing
[ https://issues.apache.org/jira/browse/HADOOP-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734064#comment-13734064 ] Sanjay Radia commented on HADOOP-9820: -- +1 with a minor nit in java comment. // decode message if it's SASL wrapped should be // Must be SASL wrapped, verify and decode. RPCv9 wire protocol is insufficient to support multiplexing --- Key: HADOOP-9820 URL: https://issues.apache.org/jira/browse/HADOOP-9820 Project: Hadoop Common Issue Type: Bug Components: ipc, security Affects Versions: 3.0.0, 2.1.0-beta Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: HADOOP-9820.patch, HADOOP-9820.patch RPCv9 is intended to allow future support of multiplexing. This requires all wire messages to be tagged with a RPC header so a demux can decode and route the messages accordingly. RPC ping packets and SASL QOP wrapped data is known to not be tagged with a header. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9820) RPCv9 wire protocol is insufficient to support multiplexing
[ https://issues.apache.org/jira/browse/HADOOP-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732156#comment-13732156 ] Sanjay Radia commented on HADOOP-9820: -- bq. Let's say we do throw an exception as suggested. If the server cannot unwrap the SASL data, ... Note the exception is not being thrown because the server cannot unwrap; the exception is being thrown because *currently* the only header that is acceptable when wrapping is enabled is the RPC-header-callId=sasl *with* the SASL-state=wrapped header. If you don't get that then throw the exception (which will go with its own response header). Later when we add multiplexing we will allow RPC-header-callId=sasl *with* start new rpc-stream and here comes its SASL-authenticate exchange. RPCv9 wire protocol is insufficient to support multiplexing --- Key: HADOOP-9820 URL: https://issues.apache.org/jira/browse/HADOOP-9820 Project: Hadoop Common Issue Type: Bug Components: ipc, security Affects Versions: 3.0.0, 2.1.0-beta Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: HADOOP-9820.patch RPCv9 is intended to allow future support of multiplexing. This requires all wire messages to be tagged with a RPC header so a demux can decode and route the messages accordingly. RPC ping packets and SASL QOP wrapped data is known to not be tagged with a header. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9820) RPCv9 wire protocol is insufficient to support multiplexing
[ https://issues.apache.org/jira/browse/HADOOP-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732137#comment-13732137 ] Sanjay Radia commented on HADOOP-9820: -- The RPC header and the SASL header after it (but before the wrapped data) are not wrapped. The wrapped reply also has unwrapped headers (RPC and SASL). So the exception (if say the RPC header or the SASL is incorrect) will pass through fine. Indeed that is the beauty of the headers to the wrapped data - it does allow throwing an exception at the outer layer. The only problem is that if there is an exception at the RPC layer (above the wrapped layer) then the client has to be able to unwrap in order to read the exception. RPCv9 wire protocol is insufficient to support multiplexing --- Key: HADOOP-9820 URL: https://issues.apache.org/jira/browse/HADOOP-9820 Project: Hadoop Common Issue Type: Bug Components: ipc, security Affects Versions: 3.0.0, 2.1.0-beta Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: HADOOP-9820.patch RPCv9 is intended to allow future support of multiplexing. This requires all wire messages to be tagged with a RPC header so a demux can decode and route the messages accordingly. RPC ping packets and SASL QOP wrapped data is known to not be tagged with a header. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9820) RPCv9 wire protocol is insufficient to support multiplexing
[ https://issues.apache.org/jira/browse/HADOOP-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732965#comment-13732965 ] Sanjay Radia commented on HADOOP-9820: -- bq. The more specific cases I had in mind: ... Server wants to send a non-sensitive control messages like is session alive or close session. Requiring non-sensitive messages to be wrapped/unwrapped seems overkill. I am in agreement with you here. But I was never proposing that we need to wrap such stuff in the future. Since you are responding to an issue I never raised, perhaps you are misreading my concern. All I am just saying: in SaslRpcClient#SaslRpcInputStream.readNextRpcPacket line 569: {code} if (headerBuilder.getCallId() != AuthProtocol.SASL.callId) {... throw an exception, perhaps close the connection with fatal exception {code} In the future when we have out-of-band messages we can enumerate the ones that are allowed. RPCv9 wire protocol is insufficient to support multiplexing --- Key: HADOOP-9820 URL: https://issues.apache.org/jira/browse/HADOOP-9820 Project: Hadoop Common Issue Type: Bug Components: ipc, security Affects Versions: 3.0.0, 2.1.0-beta Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: HADOOP-9820.patch RPCv9 is intended to allow future support of multiplexing. This requires all wire messages to be tagged with a RPC header so a demux can decode and route the messages accordingly. RPC ping packets and SASL QOP wrapped data is known to not be tagged with a header. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
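The guard being discussed in the comments above can be sketched as below. The constant mirrors Hadoop's AuthProtocol.SASL callId, but this is an illustrative reconstruction of the proposed check, not the actual SaslRpcClient patch.

```java
// Illustrative sketch of the check Sanjay proposes for
// SaslRpcClient#SaslRpcInputStream.readNextRpcPacket: on a SASL-wrapped
// connection, any packet whose RPC header does not carry the SASL callId
// is fatal. The constant value follows Hadoop's AuthProtocol.SASL, but
// the class and method names here are invented for illustration.
public class SaslHeaderCheck {
    static final int SASL_CALL_ID = -33; // AuthProtocol.SASL.callId in Hadoop

    /** Throws if a packet on a wrapped stream is not a SASL message. */
    static void verifySaslCallId(int callId) throws java.io.IOException {
        if (callId != SASL_CALL_ID) {
            // Currently the only acceptable header when wrapping is enabled;
            // future out-of-band message types would be enumerated here.
            throw new java.io.IOException(
                "Expected SASL callId " + SASL_CALL_ID + " but got " + callId);
        }
    }

    public static void main(String[] args) throws Exception {
        verifySaslCallId(SASL_CALL_ID); // legal wrapped packet: no exception
        boolean threw = false;
        try {
            verifySaslCallId(7);        // ordinary RPC callId: rejected
        } catch (java.io.IOException e) {
            threw = true;
        }
        System.out.println(threw);
    }
}
```

As the later comments note, throwing here (rather than silently accepting the packet) is safe because exception responses travel with their own unwrapped headers.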
[jira] [Commented] (HADOOP-9820) RPCv9 wire protocol is insufficient to support multiplexing
[ https://issues.apache.org/jira/browse/HADOOP-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732968#comment-13732968 ] Sanjay Radia commented on HADOOP-9820: -- +1 modulo my comment on the exception and my comment on the javadoc. I like Luke's nit. RPCv9 wire protocol is insufficient to support multiplexing --- Key: HADOOP-9820 URL: https://issues.apache.org/jira/browse/HADOOP-9820 Project: Hadoop Common Issue Type: Bug Components: ipc, security Affects Versions: 3.0.0, 2.1.0-beta Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: HADOOP-9820.patch RPCv9 is intended to allow future support of multiplexing. This requires all wire messages to be tagged with a RPC header so a demux can decode and route the messages accordingly. RPC ping packets and SASL QOP wrapped data is known to not be tagged with a header. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9820) RPCv9 wire protocol is insufficient to support multiplexing
[ https://issues.apache.org/jira/browse/HADOOP-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731144#comment-13731144 ] Sanjay Radia commented on HADOOP-9820: -- In SaslRpcClient#SaslRpcInputStream.readNextRpcPacket line 569: if (headerBuilder.getCallId() == AuthProtocol.SASL.callId) {... Since SaslRpcInputStream is only used when sasl-wrapped, shouldn't it throw an exception if the callId is not SASL.callId? RPCv9 wire protocol is insufficient to support multiplexing --- Key: HADOOP-9820 URL: https://issues.apache.org/jira/browse/HADOOP-9820 Project: Hadoop Common Issue Type: Bug Components: ipc, security Affects Versions: 3.0.0, 2.1.0-beta Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: HADOOP-9820.patch RPCv9 is intended to allow future support of multiplexing. This requires all wire messages to be tagged with a RPC header so a demux can decode and route the messages accordingly. RPC ping packets and SASL QOP wrapped data is known to not be tagged with a header. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9820) RPCv9 wire protocol is insufficient to support multiplexing
[ https://issues.apache.org/jira/browse/HADOOP-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731150#comment-13731150 ] Sanjay Radia commented on HADOOP-9820: -- You have optimized as per item 6 in your comment. Hence the javadocs for getInputStream and getOutputStream are incorrect. They should say something like: get the SASL-wrapped xxxputStream if SASL wrapping is enabled, otherwise return the original stream. RPCv9 wire protocol is insufficient to support multiplexing --- Key: HADOOP-9820 URL: https://issues.apache.org/jira/browse/HADOOP-9820 Project: Hadoop Common Issue Type: Bug Components: ipc, security Affects Versions: 3.0.0, 2.1.0-beta Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: HADOOP-9820.patch RPCv9 is intended to allow future support of multiplexing. This requires all wire messages to be tagged with a RPC header so a demux can decode and route the messages accordingly. RPC ping packets and SASL QOP wrapped data is known to not be tagged with a header. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9820) RPCv9 wire protocol is insufficient to support multiplexing
[ https://issues.apache.org/jira/browse/HADOOP-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731254#comment-13731254 ] Sanjay Radia commented on HADOOP-9820: -- bq. I did consider if an exception should be thrown. However, it would preclude the server sending any control messages to a given session. If that is the case then we should enumerate the messages explicitly in the code. However, non-SASL messages will have their own header and will be wrapped - they will be parsed by the next layer and the SASL layer will not see them. If you agree then at this stage throw the exception. RPCv9 wire protocol is insufficient to support multiplexing --- Key: HADOOP-9820 URL: https://issues.apache.org/jira/browse/HADOOP-9820 Project: Hadoop Common Issue Type: Bug Components: ipc, security Affects Versions: 3.0.0, 2.1.0-beta Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: HADOOP-9820.patch RPCv9 is intended to allow future support of multiplexing. This requires all wire messages to be tagged with a RPC header so a demux can decode and route the messages accordingly. RPC ping packets and SASL QOP wrapped data is known to not be tagged with a header. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9816) RPC Sasl QOP is broken
[ https://issues.apache.org/jira/browse/HADOOP-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730006#comment-13730006 ] Sanjay Radia commented on HADOOP-9816: -- +1 RPC Sasl QOP is broken -- Key: HADOOP-9816 URL: https://issues.apache.org/jira/browse/HADOOP-9816 Project: Hadoop Common Issue Type: Bug Components: ipc, security Affects Versions: 3.0.0, 2.1.0-beta, 2.3.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: HADOOP-9816.patch HADOOP-9421 broke the handling of SASL wrapping for RPC QOP integrity and privacy options. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira