[jira] [Commented] (HADOOP-8800) Dynamic Compress Stream

2012-09-14 Thread Dong Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455609#comment-13455609
 ] 

Dong Xiang commented on HADOOP-8800:


Looking forward to hearing more details about this proposed feature. 

 Dynamic Compress Stream
 ---

 Key: HADOOP-8800
 URL: https://issues.apache.org/jira/browse/HADOOP-8800
 Project: Hadoop Common
  Issue Type: New Feature
  Components: io
Affects Versions: 2.0.1-alpha
Reporter: yankay
  Labels: patch
   Original Estimate: 168h
  Remaining Estimate: 168h

 We use compression in MapReduce in some cases because it uses CPU to improve IO 
 throughput.
 But we can only set one compression algorithm in the configuration file. The Hadoop 
 cluster is changing all the time, so one compression algorithm may not work well in 
 all cases. 
 Why not provide an algorithm named dynamic? It could change the compression level and 
 algorithm dynamically based on performance. Like TCP, it starts up slowly and 
 tries to run faster and faster. It can make the IO faster by choosing a more 
 suitable compression algorithm.
 I will write a detailed design here and try to submit a patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455613#comment-13455613
 ] 

Colin Patrick McCabe commented on HADOOP-8806:
--

Alan, if you have a patch that actually fixes this, maybe you could share it 
with us?

After doing a little more research, it seems that {{rpath}} is not just for 
binaries.  Dynamic libraries can have it too.  Although the man page for 
{{ld.so}} only mentions it in the context of executables, it seems like it can 
be embedded into shared libraries as well.  Combine that with ${ORIGIN}, and at 
least in theory we could find {{libsnappy.so}} by using the path of 
{{libhadoop.so}}.

There's some discussion here: 
http://stackoverflow.com/questions/6323603/ld-using-rpath-origin-inside-a-shared-library-recursive

It's all a little undocumented and weird, and ${ORIGIN} is definitely Linux- 
(and maybe Solaris?) specific, but it might be better than {{LD_LIBRARY_PATH}}. 
 Maybe.
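
As a rough sketch of the {{java.library.path}} idea from the summary: resolve the 
library name to an absolute path on the Java side, then hand that path to the 
native code that calls {{dlopen}}, which does not consult {{java.library.path}} on 
its own. {{NativeLibFinder}} is a hypothetical helper for illustration, not actual 
libhadoop code.

{code}
import java.io.File;

// Hypothetical sketch: resolve a library name against java.library.path
// so an absolute path can be passed to the native dlopen() call.
public class NativeLibFinder {
  public static String resolve(String name) {         // e.g. "snappy"
    String mapped = System.mapLibraryName(name);      // "libsnappy.so" on Linux
    String dirs = System.getProperty("java.library.path", "");
    for (String dir : dirs.split(File.pathSeparator)) {
      File candidate = new File(dir, mapped);
      if (candidate.isFile()) {
        return candidate.getAbsolutePath();           // dlopen() this instead
      }
    }
    return null;                                      // fall back to the bare name
  }
}
{code}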

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8808) Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives

2012-09-14 Thread Hemanth Yamijala (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated HADOOP-8808:
-

Status: Patch Available  (was: Open)

 Update FsShell documentation to mention deprecation of some of the commands, 
 and mention alternatives
 -

 Key: HADOOP-8808
 URL: https://issues.apache.org/jira/browse/HADOOP-8808
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Reporter: Hemanth Yamijala
Assignee: Hemanth Yamijala
 Attachments: HADOOP-8808.patch


 In HADOOP-7286, we deprecated the following 3 commands dus, lsr and rmr, in 
 favour of du -s, ls -r and rm -r respectively. The FsShell documentation 
 should be updated to mention these, so that users can start switching. Also, 
 there are places where we refer to the deprecated commands as alternatives. 
 This can be changed as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8808) Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives

2012-09-14 Thread Hemanth Yamijala (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated HADOOP-8808:
-

Attachment: HADOOP-8808.patch

The attached patch adds a note about the deprecation to the commands dus, lsr 
and rmr. It also completes the documentation for the rm command. 
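
For reference, the replacement forms can also be exercised programmatically 
through {{FsShell}}. A minimal sketch (the path is made up, and it assumes the 
recursive listing flag is spelled {{-R}} in current shells):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.util.ToolRunner;

public class FsShellExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Replacements for the deprecated dus / lsr / rmr commands:
    ToolRunner.run(new FsShell(conf), new String[] {"-du", "-s", "/tmp/data"});
    ToolRunner.run(new FsShell(conf), new String[] {"-ls", "-R", "/tmp/data"});
    ToolRunner.run(new FsShell(conf), new String[] {"-rm", "-r", "/tmp/data"});
  }
}
{code}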

 Update FsShell documentation to mention deprecation of some of the commands, 
 and mention alternatives
 -

 Key: HADOOP-8808
 URL: https://issues.apache.org/jira/browse/HADOOP-8808
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Reporter: Hemanth Yamijala
Assignee: Hemanth Yamijala
 Attachments: HADOOP-8808.patch


 In HADOOP-7286, we deprecated the following 3 commands dus, lsr and rmr, in 
 favour of du -s, ls -r and rm -r respectively. The FsShell documentation 
 should be updated to mention these, so that users can start switching. Also, 
 there are places where we refer to the deprecated commands as alternatives. 
 This can be changed as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8808) Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives

2012-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455635#comment-13455635
 ] 

Hadoop QA commented on HADOOP-8808:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12545109/HADOOP-8808.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1456//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1456//console

This message is automatically generated.

 Update FsShell documentation to mention deprecation of some of the commands, 
 and mention alternatives
 -

 Key: HADOOP-8808
 URL: https://issues.apache.org/jira/browse/HADOOP-8808
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Reporter: Hemanth Yamijala
Assignee: Hemanth Yamijala
 Attachments: HADOOP-8808.patch


 In HADOOP-7286, we deprecated the following 3 commands dus, lsr and rmr, in 
 favour of du -s, ls -r and rm -r respectively. The FsShell documentation 
 should be updated to mention these, so that users can start switching. Also, 
 there are places where we refer to the deprecated commands as alternatives. 
 This can be changed as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8808) Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives

2012-09-14 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455637#comment-13455637
 ] 

Harsh J commented on HADOOP-8808:
-

One nit: these docs are currently not being published, as they have to be 
ported to the APT format.

 Update FsShell documentation to mention deprecation of some of the commands, 
 and mention alternatives
 -

 Key: HADOOP-8808
 URL: https://issues.apache.org/jira/browse/HADOOP-8808
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Reporter: Hemanth Yamijala
Assignee: Hemanth Yamijala
 Attachments: HADOOP-8808.patch


 In HADOOP-7286, we deprecated the following 3 commands dus, lsr and rmr, in 
 favour of du -s, ls -r and rm -r respectively. The FsShell documentation 
 should be updated to mention these, so that users can start switching. Also, 
 there are places where we refer to the deprecated commands as alternatives. 
 This can be changed as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Affects Version/s: (was: 2.0.0-alpha)

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang

 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Fix Version/s: 2.0.3-alpha

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Fix For: 2.0.3-alpha


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8791) rm Only deletes non empty directory and files.

2012-09-14 Thread Bertrand Dechoux (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455649#comment-13455649
 ] 

Bertrand Dechoux commented on HADOOP-8791:
--

I ran the same test on the 1.0.3 version and I can confirm that directories 
cannot be deleted; whether they are empty or not has no influence.

@Hemanth : so yes, the documentation should be updated to say only files.

Can someone check previous versions? Is this a regression, or documentation 
that was never correct? If it is a regression, should it be corrected now 
that this is the 1.0.3 (and I guess the 1.0) behaviour? 

I also tested on 1.0.3 whether the size of files has an impact; it doesn't.

Second question: if the observed behaviour is the 'correct' one, it would 
really beg for an 'rmdir' equivalent command for HDFS.
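
The shell behaviour maps onto the {{FileSystem}} API; a small sketch of the 
distinction under discussion (the path is illustrative):

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteSemantics {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path("/tmp/somedir");  // illustrative path
    try {
      // recursive=false: deletes a file, but refuses a non-empty directory
      // (an IOException on the 1.x line); this is what "fs -rm" exposes.
      fs.delete(dir, false);
    } catch (IOException refused) {
      // recursive=true: removes the directory and its contents ("fs -rmr").
      fs.delete(dir, true);
    }
  }
}
{code}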

 rm Only deletes non empty directory and files.
 

 Key: HADOOP-8791
 URL: https://issues.apache.org/jira/browse/HADOOP-8791
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Affects Versions: 1.0.3, 3.0.0
Reporter: Bertrand Dechoux
Assignee: Jing Zhao
  Labels: documentation
 Attachments: HADOOP-8791-branch-1.patch, HADOOP-8791-trunk.patch


 The documentation (1.0.3) describes the opposite of what rm does.
 It should be "Only delete files and empty directories."
 With regard to files, the size of the file should not matter, should it?
 Or I am totally misunderstanding the semantics of this command, and I am not 
 the only one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Attachment: HADOOP-8805.patch

Tested on hadoop-2.1.0 with bin/hdfs groups username
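
The protocol being moved is a small one; conceptually it has the shape sketched 
below (a sketch of the interface as described in the issue, assuming the method 
shape {{getGroupsForUser(String)}}, not the generated protobuf code):

{code}
import java.io.IOException;

// Shape of org.apache.hadoop.tools.GetUserMappingProtocol as exercised by
// "hdfs groups": a single RPC that maps a user to its group list. The
// protocol buffer translators being moved to Common wrap this call.
public interface GetUserMappingProtocol {
  String[] getGroupsForUser(String user) throws IOException;
}
{code}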

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8805.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Status: Patch Available  (was: Open)

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8805.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455651#comment-13455651
 ] 

Hadoop QA commented on HADOOP-8805:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12545117/HADOOP-8805.patch
  against trunk revision .

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1457//console

This message is automatically generated.

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8805.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment

2012-09-14 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455652#comment-13455652
 ] 

Luke Lu commented on HADOOP-8803:
-

bq. 1. No, more restrictive HDFS delegation token and Block Token are used to 
do byte-range access control, and new Block Token can reduce the damage when 
Block Token key is compromised.

Block tokens are ephemeral and expire in a few minutes as the shared DN secret 
is refreshed, unlike delegation tokens, which are typically renewed over a longer 
period and often stored in local storage. I feel that a delegation token 
with embedded authorization data (a la MS's PAC extension in Kerberos) is a 
useful addition, while a block token with a byte range seems redundant/overkill to 
me.

bq. I would like to test those kinds of job, do you guys have any examples of 
this kind of code I can try to run?

Any task that opens an HDFS file directly will break with the byte-range stuff, 
e.g. TestDFSIO.

bq. So for my work, extra workload is only happening when one mapper need to 
access data which is on more than one datanode. And I don't think that is 
always happening.

Replica selection is done on the DFSClient side, so the client gets the block 
locations of all the replicas and their block token (in your case, tokens). 
If you don't generate all the tokens for all the replicas, you'll likely have 
to do extra RPC calls, which is even worse.

bq. Another argument is that sharing the same key for all HDFS cluster is too 
risky. This overhead is something hadoop have to paid.

The shared key is in DN memory only and is constantly refreshed. Risk only comes 
from OS/software bugs, and unique keys don't help against those either, in the 
big scheme of things.

bq. if hadoop is running in public cloud, they are maybe running under 
different cloud provider, and OS may different and people who maintaining those 
machines are different.

It's highly unlikely that a single cluster would span multiple providers. A 
more likely scenario would be a cluster in a provider mirroring to a cluster in 
another provider. For cross-provider internet traffic, you'd better do TLS 
anyway if you care about security. 


 Make Hadoop run more securely in a public cloud environment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m

 I am a Ph.D. student at North Carolina State University. I am modifying 
 Hadoop's code (covering most parts of Hadoop, e.g. JobTracker, 
 TaskTracker, NameNode, DataNode) to achieve better security.
  
 My major goal is to make Hadoop run more securely in the Cloud 
 environment, especially a public Cloud environment. To achieve that, I 
 redesigned the current security mechanism to provide the following 
 properties:
 1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS 
 access control is at user or block granularity, e.g. the HDFS Delegation 
 Token only checks whether a file can be accessed by a certain user, and the 
 Block Token only proves which block or blocks can be accessed. I make Hadoop 
 do byte-granularity access control: each access party, user or task process, 
 can only access the bytes it minimally needs.
 2. I assume that in the public Cloud environment only the Namenode, secondary 
 Namenode, and JobTracker can be trusted. A large number of Datanodes and 
 TaskTrackers may be compromised, since some of them may be running in less 
 secure environments. So I redesigned the security mechanism to minimize the 
 damage an attacker can do.
  
 a. Redesign the Block Access Token to solve the widely-shared-key problem of 
 HDFS. In the original Block Access Token design, all of HDFS (Namenode and 
 Datanodes) share one master key to generate Block Access Tokens; if one 
 DataNode is compromised, the attacker can get the key and generate any Block 
 Access Token he or she wants.
  
 b. Redesign the HDFS Delegation Token to do fine-grained access control for 
 the TaskTracker and Map-Reduce Task processes on HDFS. 
  
 In Hadoop 0.20.204, all TaskTrackers can use their Kerberos credentials 
 to access any files for MapReduce on HDFS, so they have the same privilege as 
 the JobTracker to read or write tokens, copy job files, etc. However, if one 
 of them is compromised, every critical thing in the MapReduce directory (job 
 files, Delegation Tokens) is exposed to the attacker. I solve the problem by 
 making the JobTracker decide which TaskTracker can access which file in the 
 MapReduce directory on HDFS.
  
 For Task processes, once one gets an HDFS Delegation Token, it can access 

[jira] [Commented] (HADOOP-8791) rm Only deletes non empty directory and files.

2012-09-14 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455656#comment-13455656
 ] 

Hemanth Yamijala commented on HADOOP-8791:
--

The behaviour is the same with 1.0.2 as well.

 rm Only deletes non empty directory and files.
 

 Key: HADOOP-8791
 URL: https://issues.apache.org/jira/browse/HADOOP-8791
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Affects Versions: 1.0.3, 3.0.0
Reporter: Bertrand Dechoux
Assignee: Jing Zhao
  Labels: documentation
 Attachments: HADOOP-8791-branch-1.patch, HADOOP-8791-trunk.patch


 The documentation (1.0.3) describes the opposite of what rm does.
 It should be "Only delete files and empty directories."
 With regard to files, the size of the file should not matter, should it?
 Or I am totally misunderstanding the semantics of this command, and I am not 
 the only one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Status: Open  (was: Patch Available)

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8805.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8799) commons-lang version mismatch

2012-09-14 Thread Joel Costigliola (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455658#comment-13455658
 ] 

Joel Costigliola commented on HADOOP-8799:
--

Sorry, there is no problem; my mistake.

commons-lang was set to version 2.6 in some parent pom of my project, and it 
was overriding the 2.4 version deduced from the transitive dependency.

Sorry again! 

Joel

PS: I think it may still be useful to display the classpath used when running a 
job; if you think it is worth it I will create an issue for that.
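
As a stop-gap for the classpath-visibility point, a task can log what it 
actually sees; a minimal sketch ({{ShowClasspath}} is a made-up helper):

{code}
// Minimal way to surface the effective classpath from inside a job or task,
// useful for spotting version mismatches like the one described above.
public class ShowClasspath {
  public static void main(String[] args) {
    System.out.println(System.getProperty("java.class.path"));
    // The code source of a specific class pins down which jar actually won:
    System.out.println(org.apache.commons.lang.StringUtils.class
        .getProtectionDomain().getCodeSource().getLocation());
  }
}
{code}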


 commons-lang version mismatch
 -

 Key: HADOOP-8799
 URL: https://issues.apache.org/jira/browse/HADOOP-8799
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 1.0.3
Reporter: Joel Costigliola

 The hadoop install ships commons-lang-2.4.jar while the hadoop-core dependency 
 references commons-lang:jar:2.6, as shown in this maven dependency:tree 
 output extract.
 {noformat}
 org.apache.hadoop:hadoop-core:jar:1.0.3:provided
 +- commons-cli:commons-cli:jar:1.2:provided
 +- xmlenc:xmlenc:jar:0.52:provided
 +- commons-httpclient:commons-httpclient:jar:3.0.1:provided
 +- commons-codec:commons-codec:jar:1.4:provided
 +- org.apache.commons:commons-math:jar:2.1:provided
 +- commons-configuration:commons-configuration:jar:1.6:provided
 |  +- commons-collections:commons-collections:jar:3.2.1:provided
 |  +- commons-lang:commons-lang:jar:2.6:provided (version managed from 2.4)
 {noformat}
 Hadoop install libs should be consistent with hadoop-core maven dependencies.
 I found this error because I was using a feature available in 
 commons-lang 2.6 that was failing when executed in my hadoop cluster (but not 
 with my pigunit tests).
 One last remark: it would be nice to display the classpath used by the hadoop 
 cluster while executing a job, because these kinds of errors are not easy to 
 find.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8808) Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives

2012-09-14 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455659#comment-13455659
 ] 

Hemanth Yamijala commented on HADOOP-8808:
--

Wow, ok. Thanks for that info, Harsh. I couldn't find the equivalent of the fs 
shell guide among the new APT documentation files. Is it missing or 
intentionally removed? (In which case we can close this bug as won't-fix or 
some such.)

 Update FsShell documentation to mention deprecation of some of the commands, 
 and mention alternatives
 -

 Key: HADOOP-8808
 URL: https://issues.apache.org/jira/browse/HADOOP-8808
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Reporter: Hemanth Yamijala
Assignee: Hemanth Yamijala
 Attachments: HADOOP-8808.patch


 In HADOOP-7286, we deprecated the following 3 commands dus, lsr and rmr, in 
 favour of du -s, ls -r and rm -r respectively. The FsShell documentation 
 should be updated to mention these, so that users can start switching. Also, 
 there are places where we refer to the deprecated commands as alternatives. 
 This can be changed as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8800) Dynamic Compress Stream

2012-09-14 Thread yankay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yankay updated HADOOP-8800:
---

Description: 
h2. Introduction

This is a patch to provide a feature that lets a big file be transferred with 
different compression algorithms and levels dynamically, so that performance 
is better.

h2. Why

Compression is important when transferring big files. I found that we need 
different compression algorithms in different cases. I have tested some cases.

||Codec||Compression ratio||Compression throughput (MB/s)||Throughput in 100MB/s network||Throughput in 10MB/s network||
|ZLIB|35.80%|9.6|9.6|9.6|
|LZO|54.40%|101.7|101.7|18.38235294|
|LIBLZF|54.60%|134.3|134.3|18.3150183|
|QUICKLZ|54.90%|183.4|182.1493625|18.21493625|
|FASTLZ|56.20%|134.4|134.4|17.79359431|
|SNAPPY|59.80%|189|167.2240803|16.72240803|
|NONE|100%|300|100|10|

So there is no "perfect compression algorithm" that suits all cases. I want to 
build a dynamic "CompressOutputStream" that changes the compression algorithm 
at runtime.

In a busy cluster, the CPU and the network are changing all the time. There are 
some cases:

* The cluster is very large; the network between nodes is not the same. Some 
links are 10MB/s, some are 100MB/s. We cannot choose one perfect compression 
algorithm for a whole MapReduce job.
* The CPU can be used up when there is a "big job" while the network is free. 
The job should not compress any more.
* Some nodes in a cluster have high load while others do not. We should use 
different compression algorithms for them.

h2. What

My idea is to transfer a file in blocks. The first block may use a compression 
algorithm such as LZO. After transferring it, we can gather some information 
and decide which compression algorithm to use for the next block. Like TCP, it 
starts up slowly and tries to run faster and faster. It can make the IO faster 
by choosing a more suitable compression algorithm.

h2. Design

In a big file transfer, compression and the network both take time. Consider 
transferring a fixed-size file:
{code}
T = C/P + R/S
{code}

Define:
* T: the total time used
* C: the CPU cycles needed to compress the file
* P: the available CPU GHz for compressing
* R: the compression ratio
* S: the available speed (throughput) of the network (including decompression)

These variables are not fixed; each can change for several reasons:
* C: decided by the content, the algorithm, and the algorithm level
* P: decided by the CPU and the other processes on the machine
* R: decided by the content, the algorithm, and the algorithm level
* S: decided by the network, the pair's link, and the other processes on the machine

The file is transferred block by block. After a block is transferred, there is 
some information we can get:
* C/P: the time taken to compress
* R: the compression ratio for the block
* R/S: the time taken by the network

With that information and some reasonable assumptions, we can forecast each 
compression algorithm's performance. The reasonable assumptions are:

* Within one transfer, the content is similar.
* P and S are continuous, so we can assume the next P and S are the same as the 
current ones.
* C and R stay roughly proportional across algorithms. For example, LZO is 
always faster than ZLIB.

With the information and these assumptions, we can forecast the next block by:
* C2/P2 = (last C1/P1) * (avg C2/C1)
* R2/S2 = F(R1) / S1 (S1 is known)
* F(R1) = R1 * (avg R2/R1)

Then we know the time each compression algorithm would need, and choose the 
best one to compress the next block.
To maintain the averages, we must log some statistics. To compute a running 
average, we can pick an N = 3, 5, ... and update the average each time a new 
observation V arrives:
{code}
avg = (N-1)/N * avg + V/N
{code}

h2. Next Work

I will try to submit a patch later. Is there anyone interested in this?
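
A minimal sketch of the selection loop this design implies, assuming 
hypothetical measurement hooks (none of these class or field names come from 
Hadoop):

{code}
import java.util.List;

// Hypothetical adaptive chooser: after each block, fold the observed
// compress time and ratio into running averages via
// avg = (N-1)/N * avg + v/N, then pick the codec with the lowest
// forecast time T = C/P + R/S for the next block.
class AdaptiveCodecChooser {
  static final int N = 5;          // averaging window, per the description

  static class CodecStats {
    String name;
    double avgCompressSec;         // observed C/P, averaged over the window
    double avgRatio;               // observed R, averaged over the window
  }

  /** Running-average update: avg = (N-1)/N * avg + v/N. */
  static double update(double avg, double v) {
    return (N - 1.0) / N * avg + v / N;
  }

  /** Pick the codec minimizing forecast compress time plus network time. */
  static CodecStats choose(List<CodecStats> codecs, long blockBytes,
                           double networkBytesPerSec) {
    CodecStats best = null;
    double bestT = Double.MAX_VALUE;
    for (CodecStats c : codecs) {
      double t = c.avgCompressSec
          + (blockBytes * c.avgRatio) / networkBytesPerSec;
      if (t < bestT) { bestT = t; best = c; }
    }
    return best;
  }
}
{code}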



  was:

h2. Introduction

This is a patch to provide a feature that lets a big file be transferred with 
different compression algorithms and levels dynamically, so that performance 
is better.

h2. Why

Compression is important when transferring big files. I found that we need 
different compression algorithms in different cases. I have tested some cases.

||Codec||Compression ratio||Compression throughput (MB/s)||Throughput in 100MB/s network||Throughput in 10MB/s network||
|ZLIB|35.80%|9.6|9.6|9.6|
|LZO|54.40%|101.7|101.7|18.38235294|
|LIBLZF|54.60%|134.3|134.3|18.3150183|
|QUICKLZ|54.90%|183.4|182.1493625|18.21493625|
|FASTLZ|56.20%|134.4|134.4|17.79359431|
|SNAPPY|59.80%|189|167.2240803|16.72240803|
|NONE|100%|300|100|10|

So there is no "perfect compression algorithm" that suits all cases. I want to 
build a dynamic "CompressOutputStream" that changes the compression algorithm 
at runtime.

In a busy cluster, the CPU and the network are changing all the time. There are 
some cases:

* The cluster is very large; the network between nodes is 

[jira] [Updated] (HADOOP-8800) Dynamic Compress Stream

2012-09-14 Thread yankay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yankay updated HADOOP-8800:
---

Description: 

h2. Introduction

This is a patch to provide a feature that lets a big file be transferred with 
different compression algorithms and levels dynamically, so that performance 
is better.

h2. Why

Compression is important when transferring big files. I found that we need 
different compression algorithms in different cases. I have tested some cases.

||Codec||Compression ratio||Compression throughput (MB/s)||Throughput in 100MB/s network||Throughput in 10MB/s network||
|ZLIB|35.80%|9.6|9.6|9.6|
|LZO|54.40%|101.7|101.7|18.38235294|
|LIBLZF|54.60%|134.3|134.3|18.3150183|
|QUICKLZ|54.90%|183.4|182.1493625|18.21493625|
|FASTLZ|56.20%|134.4|134.4|17.79359431|
|SNAPPY|59.80%|189|167.2240803|16.72240803|
|NONE|100%|300|100|10|

So there is no "perfect compression algorithm" that suits all cases. I want to 
build a dynamic "CompressOutputStream" that changes the compression algorithm 
at runtime.

In a busy cluster, the CPU and the network are changing all the time. There are 
some cases:

* The cluster is very large; the network between nodes is not the same. Some 
links are 10MB/s, some are 100MB/s. We cannot choose one perfect compression 
algorithm for a whole MapReduce job.
* The CPU can be used up when there is a "big job" while the network is free. 
The job should not compress any more.
* Some nodes in a cluster have high load while others do not. We should use 
different compression algorithms for them.

h2. What

My idea is to transfer a file in blocks. The first block may use a compression 
algorithm such as LZO. After transferring it, we can gather some information 
and decide which compression algorithm to use for the next block. Like TCP, it 
starts up slowly and tries to run faster and faster. It can make the IO faster 
by choosing a more suitable compression algorithm.

h2. Design

In a big file transfer, compression and the network both take time. Consider 
transferring a fixed-size file:
{code}
T = C/P + R/S
{code}

Define:
* T: the total time used
* C: the CPU cycles needed to compress the file
* P: the available CPU GHz for compressing
* R: the compression ratio
* S: the available speed (throughput) of the network (including decompression)

These variables are not fixed; each can change for several reasons:
* C: decided by the content, the algorithm, and the algorithm level
* P: decided by the CPU and the other processes on the machine
* R: decided by the content, the algorithm, and the algorithm level
* S: decided by the network, the pair's link, and the other processes on the machine

The file is transferred block by block. After a block is transferred, there is 
some information we can get:
* C/P: the time taken to compress
* R: the compression ratio for the block
* R/S: the time taken by the network

With that information and some reasonable assumptions, we can forecast each 
compression algorithm's performance. The reasonable assumptions are:

* Within one transfer, the content is similar.
* P and S are continuous, so we can assume the next P and S are the same as the 
current ones.
* C and R stay roughly proportional across algorithms. For example, LZO is 
always faster than ZLIB.

With the information and these assumptions, we can forecast the next block by:
* C2/P2 = (last C1/P1) * (avg C2/C1)
* R2/S2 = F(R1) / S1 (S1 is known)
* F(R1) = R1 * (avg R2/R1)

Then we know the time each compression algorithm would need, and choose the 
best one to compress the next block.
To maintain the averages, we must log some statistics. To compute a running 
average, we can pick an N = 3, 5, ... and update the average each time a new 
observation V arrives:
{code}
avg = (N-1)/N * avg + V/N
{code}

h2. Next Work

I will try to submit a patch later. Is there anyone interested in this?



  was:
We use compression in MapReduce in some cases because it uses CPU to improve IO 
throughput.

But we can only set one compression algorithm in the configuration file. The 
Hadoop cluster is changing all the time, so one compression algorithm may not 
work well in all cases. 

Why not provide an algorithm named dynamic? It could change the compression 
level and algorithm dynamically based on performance. Like TCP, it starts up 
slowly and tries to run faster and faster. It can make the IO faster by 
choosing a more suitable compression algorithm.

I would write a detailed design here and try to submit a patch.


 Dynamic Compress Stream
 ---

 Key: HADOOP-8800
 URL: https://issues.apache.org/jira/browse/HADOOP-8800
 Project: Hadoop Common
  Issue Type: New Feature
  Components: io
Affects Versions: 2.0.1-alpha
Reporter: yankay
  Labels: patch
   Original Estimate: 168h
  Remaining Estimate: 168h

 h2. Introduction
 This is a patch to provide a feature that lets a big file be transferred with 
 different compression algorithms and levels 

[jira] [Updated] (HADOOP-8800) Dynamic Compress Stream

2012-09-14 Thread yankay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yankay updated HADOOP-8800:
---

Description: 
h2. Introduction

This is a patch to provide a feature that lets a big file be transferred with 
different compression algorithms and levels dynamically, so that performance 
is better.

h2. Why

Compression is important when transferring big files. I found that we need 
different compression algorithms in different cases. I have tested some cases.

||Codec||Compression ratio||Compression throughput (MB/s)||Throughput in 100MB/s network||Throughput in 10MB/s network||
|ZLIB|35.80%|9.6|9.6|9.6|
|LZO|54.40%|101.7|101.7|18.38235294|
|LIBLZF|54.60%|134.3|134.3|18.3150183|
|QUICKLZ|54.90%|183.4|182.1493625|18.21493625|
|FASTLZ|56.20%|134.4|134.4|17.79359431|
|SNAPPY|59.80%|189|167.2240803|16.72240803|
|NONE|100%|300|100|10|

So there is no "perfect compression algorithm" that suits all cases. I want to 
build a dynamic "CompressOutputStream" that changes the compression algorithm 
at runtime.

In a busy cluster, the CPU and the network are changing all the time. There are 
some cases:

* The cluster is very large; the network between nodes is not the same. Some 
links are 10MB/s, some are 100MB/s. We cannot choose one perfect compression 
algorithm for a whole MapReduce job.
* The CPU can be used up when there is a "big job" while the network is free. 
The job should not compress any more.
* Some nodes in a cluster have high load while others do not. We should use 
different compression algorithms for them.

h2. What

My idea is to transfer a file in blocks. The first block may use a compression 
algorithm such as LZO. After transferring it, we can gather some information 
and decide which compression algorithm to use for the next block. Like TCP, it 
starts up slowly and tries to run faster and faster. It can make the IO faster 
by choosing a more suitable compression algorithm.

h2. Design

In a big file transfer, compression and the network both take time. Consider 
transferring a fixed-size file:
{code}
T = C/P + R/S
{code}

Define:
* T: the total time used
* C: the CPU cycles needed to compress the file
* P: the available CPU GHz for compressing
* R: the compression ratio
* S: the available speed (throughput) of the network (including decompression)

These variables are not fixed; each can change for several reasons:
* C: decided by the content, the algorithm, and the algorithm level
* P: decided by the CPU and the other processes on the machine
* R: decided by the content, the algorithm, and the algorithm level
* S: decided by the network, the pair's link, and the other processes on the machine

The file is transferred block by block. After a block is transferred, there is 
some information we can get:
* C/P: the time taken to compress
* R: the compression ratio for the block
* R/S: the time taken by the network

With that information and some reasonable assumptions, we can forecast each 
compression algorithm's performance. The reasonable assumptions are:

* Within one transfer, the content is similar.
* P and S are continuous, so we can assume the next P and S are the same as the 
current ones.
* C and R stay roughly proportional across algorithms. For example, LZO is 
always faster than ZLIB.

With the information and these assumptions, we can forecast the next block by:
* C2/P2 = (last C1/P1) * (avg C2/C1)
* R2/S2 = F(R1) / S1 (S1 is known)
* F(R1) = R1 * (avg R2/R1)

Then we know the time each compression algorithm would need, and choose the 
best one to compress the next block.
To maintain the averages, we must log some statistics. To compute a running 
average, we can pick an N = 3, 5, ... and update the average each time a new 
observation V arrives:
{code}
avg = (N-1)/N * avg + V/N
{code}

h2. Next Work

I will try to submit a patch later. Is there anyone interested in this?



  was:
h2. Introduction

This is a patch to provide a feature that lets a big file be transferred with 
different compression algorithms and levels dynamically, so that performance 
is better.

h2. Why

Compression is important when transferring big files. I found that we need 
different compression algorithms in different cases. I have tested some cases.

||Codec||Compression ratio||Compression throughput (MB/s)||Throughput in 100MB/s network||Throughput in 10MB/s network||
|ZLIB|35.80%|9.6|9.6|9.6|
|LZO|54.40%|101.7|101.7|18.38235294|
|LIBLZF|54.60%|134.3|134.3|18.3150183|
|QUICKLZ|54.90%|183.4|182.1493625|18.21493625|
|FASTLZ|56.20%|134.4|134.4|17.79359431|
|SNAPPY|59.80%|189|167.2240803|16.72240803|
|NONE|100%|300|100|10|

So there is no "perfect compression algorithm" that suits all cases. I want to 
build a dynamic "CompressOutputStream" that changes the compression algorithm 
at runtime.

In a busy cluster, the CPU and the network are changing all the time. There are 
some cases:

* The cluster is very large; the network between nodes is not the 

[jira] [Updated] (HADOOP-8800) Dynamic Compress Stream

2012-09-14 Thread yankay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yankay updated HADOOP-8800:
---

Description: 
h2. Introduction

This is a patch to provide a feature that lets a big file be transferred with 
different compression algorithms and levels dynamically, so that performance 
is better.

h2. Why

Compression is important when transferring big files. I found that we need 
different compression algorithms in different cases. I have tested some cases.

||Codec||Compression ratio||Compression throughput (MB/s)||Throughput in 100MB/s network||Throughput in 10MB/s network||
|ZLIB|35.80%|9.6|9.6|9.6|
|LZO|54.40%|101.7|101.7|18.38235294|
|LIBLZF|54.60%|134.3|134.3|18.3150183|
|QUICKLZ|54.90%|183.4|182.1493625|18.21493625|
|FASTLZ|56.20%|134.4|134.4|17.79359431|
|SNAPPY|59.80%|189|167.2240803|16.72240803|
|NONE|100%|300|100|10|

So there is no "perfect compression algorithm" that suits all cases. I want to 
build a dynamic "CompressOutputStream" that changes the compression algorithm 
at runtime.

In a busy cluster, the CPU and the network are changing all the time. There are 
some cases:

* The cluster is very large; the network between nodes is not the same. Some 
links are 10MB/s, some are 100MB/s. We cannot choose one perfect compression 
algorithm for a whole MapReduce job.
* The CPU can be used up when there is a "big job" while the network is free. 
The job should not compress any more.
* Some nodes in a cluster have high load while others do not. We should use 
different compression algorithms for them.

h2. What

My idea is to transfer a file in blocks. The first block may use a compression 
algorithm such as LZO. After transferring it, we can gather some information 
and decide which compression algorithm to use for the next block. Like TCP, it 
starts up slowly and tries to run faster and faster. It can make the IO faster 
by choosing a more suitable compression algorithm.

h2. Design

In a big file transfer, compression and the network both take time. Consider 
transferring a fixed-size file:
{code}
T = C/P + R/S
{code}

Define:
* T: the total time used
* C: the CPU cycles needed to compress the file
* P: the available CPU GHz for compressing
* R: the compression ratio
* S: the available speed (throughput) of the network (including decompression)

These variables are not fixed; each can change for several reasons:
* C: decided by the content, the algorithm, and the algorithm level
* P: decided by the CPU and the other processes on the machine
* R: decided by the content, the algorithm, and the algorithm level
* S: decided by the network, the pair's link, and the other processes on the machine

The file is transferred block by block. After a block is transferred, there is 
some information we can get:
* C/P: the time taken to compress
* R: the compression ratio for the block
* R/S: the time taken by the network

With that information and some reasonable assumptions, we can forecast each 
compression algorithm's performance. The reasonable assumptions are:

* Within one transfer, the content is similar.
* P and S are continuous, so we can assume the next P and S are the same as the 
current ones.
* C and R stay roughly proportional across algorithms. For example, LZO is 
always faster than ZLIB.

With the information and these assumptions, we can forecast the next block by:
* C2/P2 = (last C1/P1) * (avg C2/C1)
* R2/S2 = F(R1) / S1 (S1 is known)
* F(R1) = R1 * (avg R2/R1)

Then we know the time each compression algorithm would need, and choose the 
best one to compress the next block.
To maintain the averages, we must log some statistics. To compute a running 
average, we can pick an N = 3, 5, ... and update the average each time a new 
observation V arrives:
{code}
avg = (N-1)/N * avg + V/N
{code}

h2. Next Work

I will try to submit a patch later. Is there anyone interested in this?



  was:
h2. Introduction

This is a patch to provide a feature that lets a big file be transferred with 
different compression algorithms and levels dynamically, so that performance 
is better.

h2. Why

Compression is important when transferring big files. I found that we need 
different compression algorithms in different cases. I have tested some cases.

||Codec||Compression ratio||Compression throughput (MB/s)||Throughput in 100MB/s network||Throughput in 10MB/s network||
|ZLIB|35.80%|9.6|9.6|9.6|
|LZO|54.40%|101.7|101.7|18.38235294|
|LIBLZF|54.60%|134.3|134.3|18.3150183|
|QUICKLZ|54.90%|183.4|182.1493625|18.21493625|
|FASTLZ|56.20%|134.4|134.4|17.79359431|
|SNAPPY|59.80%|189|167.2240803|16.72240803|
|NONE|100%|300|100|10|

So there is no "perfect compression algorithm" that suits all cases. I want to 
build a dynamic "CompressOutputStream" that changes the compression algorithm 
at runtime.

In a busy cluster, the CPU and the network are changing all the time. There are 
some cases:

* The cluster is very large; the network between nodes is not the same. 

[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment

2012-09-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455698#comment-13455698
 ] 

Steve Loughran commented on HADOOP-8803:


Are you proposing that Hadoop (more precisely HDFS) would run on public cloud 
infrastructure without any kind of network-layer protection? Because 
application-level security isn't enough to prevent things like DDoS attacks, 
and it opens up your entire cluster's dataset to 0-day exploits. 

Irrespective of what is done with Kerberos, byte-level security, etc., I would 
never tell anyone to bring up a Hadoop cluster on public infrastructure without 
isolating its network by way of iptables, a VPN or whatever, with proxies or 
access restricted to in-cluster hosts in a DMZ-style setup.

Some cloud infrastructures do let you specify the network structure (VMware- 
and VBox-based systems included), and you can do the same with KVM-based 
systems if the tooling is right (specifically the network drivers in the host 
system). Isolation must be at this level, not at the app layer, because you 
can never be 100% sure that you've fixed all security bugs.

Oh, and EC2 bills you for all net traffic that gets past the router specs 
you've declared, so if you do have a wide-open system then you get to pay 
a lot for all the traffic you are rejecting.

Limiting the access of a TT's spawned task(s) to the subset of a fileset they 
are working with seems like a good goal, but consider that the stream of work 
sent to a TT means that even a compromised machine would get at more data over 
time. If it's ruthless, it could signal fake job-completion events to get the 
data faster than the MR job would, so get more work sent to it, and so collect 
more data than other machines.

You also need to consider that the blocks inside the DN could be compromised; 
they'd all have to be encrypted by whatever was writing them, with the keys to 
decrypt them passed down to the tasks. 

In a cloud infrastructure the tactic you'd adopt for security relies on VM 
images: you'd roll the VM back to the previous image regularly, either every 59 
minutes (cost effective) or every job. You need to think about DN 
decommissioning here too, but it's a better story; it's the standard tactic for 
defending VMs in the DMZ from being compromised for any extended period of time.

 Make Hadoop run more securely in a public cloud environment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m

 I am a Ph.D. student at North Carolina State University. I am modifying 
 Hadoop's code (covering most parts of Hadoop, e.g. JobTracker, 
 TaskTracker, NameNode, DataNode) to achieve better security.
  
 My major goal is to make Hadoop run more securely in the Cloud 
 environment, especially a public Cloud environment. To achieve that, I 
 redesigned the current security mechanism to provide the following 
 properties:
 1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS 
 access control is at user or block granularity, e.g. the HDFS Delegation 
 Token only checks whether a file can be accessed by a certain user, and the 
 Block Token only proves which block or blocks can be accessed. I make Hadoop 
 do byte-granularity access control: each access party, user or task process, 
 can only access the bytes it minimally needs.
 2. I assume that in the public Cloud environment only the Namenode, secondary 
 Namenode, and JobTracker can be trusted. A large number of Datanodes and 
 TaskTrackers may be compromised, since some of them may be running in less 
 secure environments. So I redesigned the security mechanism to minimize the 
 damage an attacker can do.
  
 a. Redesign the Block Access Token to solve the widely-shared-key problem of 
 HDFS. In the original Block Access Token design, all of HDFS (Namenode and 
 Datanodes) share one master key to generate Block Access Tokens; if one 
 DataNode is compromised, the attacker can get the key and generate any Block 
 Access Token he or she wants.
  
 b. Redesign the HDFS Delegation Token to do fine-grained access control for 
 the TaskTracker and Map-Reduce Task processes on HDFS. 
  
 In Hadoop 0.20.204, all TaskTrackers can use their Kerberos credentials 
 to access any files for MapReduce on HDFS, so they have the same privilege as 
 the JobTracker to read or write tokens, copy job files, etc. However, if one 
 of them is compromised, every critical thing in the MapReduce directory (job 
 files, Delegation Tokens) is exposed to the attacker. I 

[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455699#comment-13455699
 ] 

Steve Loughran commented on HADOOP-8801:


Looks OK, but the patch should fix something the original got wrong: it should 
use {{Throwable.toString()}}, not {{Throwable.getMessage()}}.

If you want to see why, look at the source for {{NullPointerException}} and 
work out what message it prints...
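
To make the point concrete, a tiny illustration (not from the patch):

{code}
public class MessageVsToString {
  public static void main(String[] args) {
    Throwable t = new NullPointerException();  // no detail message set
    System.out.println(t.getMessage());        // prints: null
    System.out.println(t.toString());          // prints: java.lang.NullPointerException
  }
}
{code}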

 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 2.0.2-alpha

 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8797) automatically detect JAVA_HOME on Linux, report native lib path similar to class path

2012-09-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455700#comment-13455700
 ] 

Steve Loughran commented on HADOOP-8797:


-1

Detecting {{JAVA_HOME}} is notoriously brittle; it's been moved to Bigtop to 
let the RPM- and deb-specific installations deal with it in the ways that best 
suit their platforms, and even they have to struggle to keep up to date 
(BIGTOP-523). They can release on a faster cycle than Hadoop itself, which 
means that such problems are fixed faster.

The other reason for Bigtop hosting is that that is where the functional tests 
of the installation will go, and that's what you need to verify that the 
JAVA_HOME detection logic works (and indeed, any of the other Hadoop scripts). 

Gera, we appreciate the work you've done, but you'd be better off checking out 
Bigtop and seeing if there are things there that you need to fix for your 
installations. 

(And yes, everyone hates the inconsistent placement of Java versions on Linux, 
as well as the difference between {{JAVA_HOME}} and {{JDK_HOME}}.)

 automatically detect JAVA_HOME on Linux, report native lib path similar to 
 class path
 -

 Key: HADOOP-8797
 URL: https://issues.apache.org/jira/browse/HADOOP-8797
 Project: Hadoop Common
  Issue Type: Improvement
 Environment: Linux
Reporter: Gera Shegalov
Priority: Trivial
 Attachments: HADOOP-8797.patch


 Enhancement 1)
 iterate common java locations on Linux starting with Java7 down to Java6
 Enhancement 2)
 hadoop jnipath to print java.library.path similar to hadoop classpath

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455760#comment-13455760
 ] 

Hudson commented on HADOOP-8801:


Integrated in Hadoop-Hdfs-trunk #1165 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1165/])
HADOOP-8801. ExitUtil#terminate should capture the exception stack trace. 
Contributed by Eli Collins (Revision 1384435)

 Result = FAILURE
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384435
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ExitUtil.java


 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 2.0.2-alpha

 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified

2012-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455762#comment-13455762
 ] 

Hudson commented on HADOOP-8795:


Integrated in Hadoop-Hdfs-trunk #1165 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1165/])
HADOOP-8795. BASH tab completion doesn't look in PATH, assumes path to 
executable is specified. Contributed by Sean Mackrory. (Revision 1384436)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384436
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/contrib/bash-tab-completion/hadoop.sh


 BASH tab completion doesn't look in PATH, assumes path to executable is 
 specified
 -

 Key: HADOOP-8795
 URL: https://issues.apache.org/jira/browse/HADOOP-8795
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Affects Versions: 2.0.0-alpha
Reporter: Sean Mackrory
Assignee: Sean Mackrory
Priority: Minor
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8795.patch


 bash-tab-completion/hadoop.sh checks that the first token in the command is 
 an existing, executable file - which assumes that the path to the hadoop 
 executable is specified (or that it's in the working directory). If the 
 executable is somewhere else in PATH, tab completion will not work.
 I propose that the first token be passed through 'which' so that any 
 executables in the path also get detected. I've tested that this technique 
 will work in the event that relative and absolute paths are used as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout

2012-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455759#comment-13455759
 ] 

Hudson commented on HADOOP-8755:


Integrated in Hadoop-Hdfs-trunk #1165 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1165/])
HADOOP-8755. Print thread dump when tests fail due to timeout. Contributed 
by Andrey Klochkov. (Revision 1384627)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384627
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TestTimedOutTestsListener.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TimedOutTestsListener.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/pom.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/pom.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/pom.xml


 Print thread dump when tests fail due to timeout 
 -

 Key: HADOOP-8755
 URL: https://issues.apache.org/jira/browse/HADOOP-8755
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HADOOP-8755.patch, 
 HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, 
 HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch


 When a test fails due to timeout, it's often not clear what the root cause 
 is. See HDFS-3364 as an example.
 We can print a dump of all threads in this case; this may help find the 
 cause.

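The heart of such a dump is available from pure Java; a minimal sketch in the 
spirit of (but not copied from) the TimedOutTestsListener added here:

{noformat}
import java.util.Map;

public class ThreadDumpSketch {
  public static void main(String[] args) {
    // Thread.getAllStackTraces() snapshots every live thread's stack.
    for (Map.Entry<Thread, StackTraceElement[]> e
        : Thread.getAllStackTraces().entrySet()) {
      System.err.println("\"" + e.getKey().getName() + "\" " + e.getKey().getState());
      for (StackTraceElement frame : e.getValue()) {
        System.err.println("\tat " + frame);
      }
    }
  }
}
{noformat}
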
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8809) RPMs should skip useradds if the users already exist

2012-09-14 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-8809:
--

 Summary: RPMs should skip useradds if the users already exist
 Key: HADOOP-8809
 URL: https://issues.apache.org/jira/browse/HADOOP-8809
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Affects Versions: 1.0.3
Reporter: Steve Loughran
Priority: Minor


The hadoop.spec preinstall script creates users -but it does this even if they 
already exist. This may cause problems if the installation already has those 
users with different uids. A check with {{id}} can avoid this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout

2012-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455798#comment-13455798
 ] 

Hudson commented on HADOOP-8755:


Integrated in Hadoop-Mapreduce-trunk #1196 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1196/])
HADOOP-8755. Print thread dump when tests fail due to timeout. Contributed 
by Andrey Klochkov. (Revision 1384627)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384627
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TestTimedOutTestsListener.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TimedOutTestsListener.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/pom.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/pom.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/pom.xml


 Print thread dump when tests fail due to timeout 
 -

 Key: HADOOP-8755
 URL: https://issues.apache.org/jira/browse/HADOOP-8755
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HADOOP-8755.patch, 
 HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, 
 HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch


 When a test fails due to timeout, it's often not clear what the root cause 
 is. See HDFS-3364 as an example.
 We can print a dump of all threads in this case; this may help find the 
 cause.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455799#comment-13455799
 ] 

Hudson commented on HADOOP-8801:


Integrated in Hadoop-Mapreduce-trunk #1196 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1196/])
HADOOP-8801. ExitUtil#terminate should capture the exception stack trace. 
Contributed by Eli Collins (Revision 1384435)

 Result = SUCCESS
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384435
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ExitUtil.java


 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 2.0.2-alpha

 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified

2012-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455801#comment-13455801
 ] 

Hudson commented on HADOOP-8795:


Integrated in Hadoop-Mapreduce-trunk #1196 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1196/])
HADOOP-8795. BASH tab completion doesn't look in PATH, assumes path to 
executable is specified. Contributed by Sean Mackrory. (Revision 1384436)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384436
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/contrib/bash-tab-completion/hadoop.sh


 BASH tab completion doesn't look in PATH, assumes path to executable is 
 specified
 -

 Key: HADOOP-8795
 URL: https://issues.apache.org/jira/browse/HADOOP-8795
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Affects Versions: 2.0.0-alpha
Reporter: Sean Mackrory
Assignee: Sean Mackrory
Priority: Minor
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8795.patch


 bash-tab-completion/hadoop.sh checks that the first token in the command is 
 an existing, executable file - which assumes that the path to the hadoop 
 executable is specified (or that it's in the working directory). If the 
 executable is somewhere else in PATH, tab completion will not work.
 I propose that the first token be passed through 'which' so that any 
 executables in the path also get detected. I've tested that this technique 
 will work in the event that relative and absolute paths are used as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-14 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455813#comment-13455813
 ] 

Kihwal Lee commented on HADOOP-8806:


bq. These libraries can be bundled in the $HADOOP_ROOT/lib/native directory. 
For example, the -Dbundle.snappy build option copies libsnappy.so to this 
directory. However, snappy can't be loaded from this directory unless 
LD_LIBRARY_PATH is set to include this directory.

If this is only about MR jobs, isn't setting {{LD_LIBRARY_PATH}} in 
{{mapreduce.admin.user.env}} enough?

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455822#comment-13455822
 ] 

Steve Loughran commented on HADOOP-8801:


@Eli, this patch went in within 2 hours of being submitted. That effectively 
prevented review by anyone outside the PST timezone who keeps up to date with 
their JIRA issues.

While I celebrate a rapid integration of patches into the tree, I believe this 
devalues the RTC process, as people like myself can't review, even though -as I 
did belatedly comment- the patch should use {{toString()}} over 
{{getMessage()}}, because {{getMessage()}} has the right to return null.


# I think this is a bad precedent. It means there's nothing to stop me getting 
together with someone else in the EU and pushing through a set of changes 
before anyone notices. 
# As I said, the patch is inadequate. 

I don't want to revert the patch -it's in- but I'd like my feedback to be 
incorporated into a new JIRA, as {{s/getMessage()/toString()/}} enhances the 
value of the output even more. 

Do you want to do this? Or shall I? And in either case, can we have slightly 
more than 2h for review?

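For reference, the behaviour being argued for amounts to something like this 
sketch (not the committed patch): log the full stack trace, and render the 
throwable via {{toString()}} rather than {{getMessage()}}, since the latter may 
be null.

{noformat}
public final class ExitUtilSketch {
  public static void terminate(int status, Throwable t) {
    // String concatenation calls t.toString(), which never returns null.
    System.err.println("Terminating with status " + status + ": " + t);
    t.printStackTrace(System.err);
    System.exit(status);
  }
}
{noformat}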

 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 2.0.2-alpha

 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud envrionment

2012-09-14 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455830#comment-13455830
 ] 

Daryn Sharp commented on HADOOP-8803:
-

I've done a lot of token work, and have contemplated similar changes.  I think 
it's much more difficult than believed.  I've only quickly skimmed the 
discussion, so I apologize if I've missed some details or if I'm just repeating 
some of what's already been said.
# When blocks can shift during rebalancing, the client will have to request 
more tokens when the blocks move which will impact performance and complicate 
the client.
# Trying to restrict the hdfs tokens to a path sounds good on the surface, but 
is very tricky (if not impossible) to do w/o imposing a significant burden on 
users and the framework.
# How does the ipc layer know which path token to use to establish the 
connection?  I suppose it could randomly pick one for the given NN, and then 
every fs method will have to be modified to send the specific token for the 
path it's going to access.
# Hftp/webhdfs both implicitly acquire a token during initialization.  At this 
point it's not possible to know which paths will be later accessed.
# Symlinks, esp. via viewfs, pose interesting problems.  Viewfs currently 
acquires tokens for all mounts because it doesn't know which mounts might be 
indirectly referenced via a symlink.  Trying to discern and resolve all 
symlinks up front will be difficult.
# In general, it's not really feasible for the job submission client to know 
every path that will be accessed.
#* Jobs that submit jobs can't know all paths that will be accessed, esp. if 
the sub-job's paths are based on runtime logic of another job.
#* Should job submission be oozie, hive, pig, etc aware?
#* For instance, the oozie launcher is agnostic to the job launched in its 
task, so the user has to declare the NNs in the workflow conf if accessing 
anything other than the default NN.  This is different than actual job 
submission that generally knows the files/dirs that need to be accessed.  In 
practice, the user usually doesn't declare full paths in the conf setting for 
additional NNs.
#* Yarn and other parts of MR rely on the user's tokens to access files too.  
Should MR need to be yarn aware?
#* The job client won't know about the behavior of downstream pluggable items 
like shuffle handlers.
#* The job client won't know about dynamic runtime behaviors of a job.

All that said, maybe block tokens can be improved, but I don't see how path 
based tokens are feasible.
* If the token path is per-file, it would be a slew of tokens per job, and for 
the aforementioned reasons I just don't see how the job client can possibly 
calculate all the paths up-front.
* If the token path is a non-recursive directory it has similar problems to the 
per-file path approach.  Also, it won't allow a recursive listing of a 
directory, and jobs won't be able to access sub-directories created on the fly.
* If the token path is for recursive directory access, well, you know what 
everyone will do: request a token for /, negating any value of the path-based 
token.


 Make Hadoop running more secure public cloud envrionment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m

 I am a Ph.D. student at North Carolina State University. I am modifying 
 Hadoop's code (which includes most parts of Hadoop, e.g. JobTracker, 
 TaskTracker, NameNode, DataNode) to achieve better security.
  
 My major goal is to make Hadoop run more securely in the Cloud, 
 especially in public Cloud environments. In order to achieve 
 that, I redesigned the current security mechanism to achieve the following 
 properties:
 1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS 
 access control is at user or block granularity: the HDFS Delegation 
 Token only checks whether a file can be accessed by a certain user, and the 
 Block Token only proves which block or blocks can be accessed. I enable Hadoop 
 to do byte-granularity access control, so each access party, user or task 
 process, can only access the bytes it minimally needs.
 2. I assume that in the public Cloud environment only the Namenode, secondary 
 Namenode, and JobTracker can be trusted. A large number of Datanodes and 
 TaskTrackers may be compromised, because some of them may be running in less 
 secure environments. So I re-designed the security mechanism to minimize the 
 damage an attacker can do.
  
 a. Re-design the Block Access Token to solve wildly 

[jira] [Commented] (HADOOP-8808) Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives

2012-09-14 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455839#comment-13455839
 ] 

Harsh J commented on HADOOP-8808:
-

Not intentional; it was just missed as part of the port of the rest of the docs. 
I'm unable to recall the porting JIRA, but I'll comment back shortly with that. 
Sorry for the short response earlier, Hemanth.

 Update FsShell documentation to mention deprecation of some of the commands, 
 and mention alternatives
 -

 Key: HADOOP-8808
 URL: https://issues.apache.org/jira/browse/HADOOP-8808
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Reporter: Hemanth Yamijala
Assignee: Hemanth Yamijala
 Attachments: HADOOP-8808.patch


 In HADOOP-7286, we deprecated the following 3 commands dus, lsr and rmr, in 
 favour of du -s, ls -r and rm -r respectively. The FsShell documentation 
 should be updated to mention these, so that users can start switching. Also, 
 there are places where we refer to the deprecated commands as alternatives. 
 This can be changed as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8780) Update DeprecatedProperties apt file

2012-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455894#comment-13455894
 ] 

Hudson commented on HADOOP-8780:


Integrated in Hadoop-Hdfs-trunk-Commit #2795 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2795/])
HADOOP-8780. Update DeprecatedProperties apt file. Contributed by Ahmed 
Radwan (Revision 1384833)

 Result = SUCCESS
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384833
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/DeprecatedProperties.apt.vm
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ConfigUtil.java


 Update DeprecatedProperties apt file
 

 Key: HADOOP-8780
 URL: https://issues.apache.org/jira/browse/HADOOP-8780
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ahmed Radwan
Assignee: Ahmed Radwan
 Attachments: HADOOP-8780.patch, HADOOP-8780_rev2.patch, 
 HADOOP-8780_rev3.patch


 The current list of deprecated properties is not up-to-date. I'll upload 
 a patch momentarily.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8780) Update DeprecatedProperties apt file

2012-09-14 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-8780:
--

   Resolution: Fixed
Fix Version/s: 2.0.3-alpha
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks Ahmed!

 Update DeprecatedProperties apt file
 

 Key: HADOOP-8780
 URL: https://issues.apache.org/jira/browse/HADOOP-8780
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ahmed Radwan
Assignee: Ahmed Radwan
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8780.patch, HADOOP-8780_rev2.patch, 
 HADOOP-8780_rev3.patch


 The current list of deprecated properties is not up-to-date. I'll upload 
 a patch momentarily.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8780) Update DeprecatedProperties apt file

2012-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455903#comment-13455903
 ] 

Hudson commented on HADOOP-8780:


Integrated in Hadoop-Common-trunk-Commit #2732 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2732/])
HADOOP-8780. Update DeprecatedProperties apt file. Contributed by Ahmed 
Radwan (Revision 1384833)

 Result = SUCCESS
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384833
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/DeprecatedProperties.apt.vm
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ConfigUtil.java


 Update DeprecatedProperties apt file
 

 Key: HADOOP-8780
 URL: https://issues.apache.org/jira/browse/HADOOP-8780
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ahmed Radwan
Assignee: Ahmed Radwan
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8780.patch, HADOOP-8780_rev2.patch, 
 HADOOP-8780_rev3.patch


 The current list of deprecated properties is not up-to-date. I'll upload 
 a patch momentarily.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8780) Update DeprecatedProperties apt file

2012-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455941#comment-13455941
 ] 

Hudson commented on HADOOP-8780:


Integrated in Hadoop-Mapreduce-trunk-Commit #2756 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2756/])
HADOOP-8780. Update DeprecatedProperties apt file. Contributed by Ahmed 
Radwan (Revision 1384833)

 Result = FAILURE
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384833
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/DeprecatedProperties.apt.vm
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ConfigUtil.java


 Update DeprecatedProperties apt file
 

 Key: HADOOP-8780
 URL: https://issues.apache.org/jira/browse/HADOOP-8780
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ahmed Radwan
Assignee: Ahmed Radwan
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8780.patch, HADOOP-8780_rev2.patch, 
 HADOOP-8780_rev3.patch


 The current list of deprecated properties is not up-to-date. I'll upload 
 a patch momentarily.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

2012-09-14 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455944#comment-13455944
 ] 

Ivan Mitic commented on HADOOP-8731:


{quote}Can you please clarify the following scenario so that other folks 
reading this thread have it easy?
Directory A (perm for user Foo) contains directory B (perm for Everyone)
So contents of A will be private cache and contents of B will be public cache 
on Windows but not on Linux.
{quote}
Correct. 

The issue we are trying to mitigate on Windows comes from the default 
permissions. Specifically, the default permissions, in terms of the Unix mask, 
map to 700 on Windows. This means that by default others (the EVERYONE group) 
do not have r+x permissions. This further means that, if we have a path 
c:\some\path\file1 and a user wants to upload file1 to the public 
distributed cache, he has to change the permissions on the whole drive to do so. 
Now, to make the scenario more Windows friendly, we only require the user to 
change the permissions on file1 itself to make it public (more precisely, to 
give the EVERYONE group read permission on file1).

On Unix systems, given that default permissions are usually 775 or 755, the 
scenario is the complete opposite.

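To make the relaxed check concrete, here is a sketch of the file-only test 
being discussed (class and method names are illustrative, not from the attached 
patch):

{noformat}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

public class PublicCacheCheckSketch {
  // Look only at the file's own OTHER (EVERYONE) read bit, skipping the
  // usual +x walk up the parent directory chain.
  static boolean isPublicOnWindows(FileSystem fs, Path file) throws IOException {
    FileStatus status = fs.getFileStatus(file);
    return status.getPermission().getOtherAction().implies(FsAction.READ);
  }
}
{noformat}
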
 Public distributed cache support for Windows
 

 Key: HADOOP-8731
 URL: https://issues.apache.org/jira/browse/HADOOP-8731
 Project: Hadoop Common
  Issue Type: Bug
  Components: filecache
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: HADOOP-8731-PublicCache.patch


 A distributed cache file is considered public (sharable between MR jobs) if 
 OTHER has read permissions on the file and +x permissions all the way up in 
 the folder hierarchy. By default, Windows permissions are mapped to 700 all 
 the way up to the drive letter, and it is unreasonable to ask users to change 
 the permission on the whole drive to make the file public. IOW, it is hardly 
 possible to have a public distributed cache on Windows. 
 To enable the scenario and make it more Windows friendly, the criteria on 
 when a file is considered public should be relaxed. One proposal is to check 
 whether the user has given EVERYONE group permission on the file only (and 
 discard the +x check on parent folders).
 Security considerations for the proposal: Default permissions on Unix 
 platforms are usually 775 or 755 meaning that OTHER users can read and 
 list folders by default. What this also means is that Hadoop users have to 
 explicitly make the files private in order to make them private in the 
 cluster (please correct me if this is not the case in real life!). On 
 Windows, default permissions are 700. This means that by default all files 
 are private. In the new model, if users want to make them public, they have 
 to explicitly add EVERYONE group permissions on the file. 
 TestTrackerDistributedCacheManager fails because of this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Attachment: HADOOP-8805-v2.patch

Tested on 2.0.0-alpha

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Fix Version/s: (was: 2.0.3-alpha)
   2.0.0-alpha
   Status: Patch Available  (was: Open)

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Fix For: 2.0.0-alpha

 Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8791) rm Only deletes non empty directory and files.

2012-09-14 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455966#comment-13455966
 ] 

Harsh J commented on HADOOP-8791:
-

That doc is pretty old, so we may have regressed long ago as well. I suppose we 
could add a new rmdir in FsShell, as a client-side tool, instead of having rm 
delete directories - to stick to the usual convention.

I'd like Daryn's thoughts for a new command though.

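A sketch of what such a client-side rmdir could look like, leaning on the 
existing non-recursive delete semantics (illustrative only, not a proposed 
patch):

{noformat}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RmdirSketch {
  static boolean rmdir(FileSystem fs, Path dir) throws IOException {
    // isDirectory() on trunk; branch-1 spells it isDir().
    if (!fs.getFileStatus(dir).isDirectory()) {
      throw new IOException(dir + " is not a directory");
    }
    // recursive=false: succeeds only for empty directories; a non-empty
    // directory fails (or throws), matching the documented rm semantics.
    return fs.delete(dir, false);
  }
}
{noformat}
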
 rm Only deletes non empty directory and files.
 

 Key: HADOOP-8791
 URL: https://issues.apache.org/jira/browse/HADOOP-8791
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Affects Versions: 1.0.3, 3.0.0
Reporter: Bertrand Dechoux
Assignee: Jing Zhao
  Labels: documentation
 Attachments: HADOOP-8791-branch-1.patch, HADOOP-8791-trunk.patch


 The documentation (1.0.3) describes the opposite of what rm does.
 It should be "Only delete files and empty directories."
 With regard to files, the size of the file should not matter, should it?
 OR I am totally misunderstanding the semantic of this command and I am not 
 the only one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8799) commons-lang version mismatch

2012-09-14 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455970#comment-13455970
 ] 

Giridharan Kesavan commented on HADOOP-8799:


@joel, if you think this is invalid, could you please resolve the JIRA as 
invalid?

 commons-lang version mismatch
 -

 Key: HADOOP-8799
 URL: https://issues.apache.org/jira/browse/HADOOP-8799
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 1.0.3
Reporter: Joel Costigliola

 The hadoop install references commons-lang-2.4.jar while the hadoop-core 
 dependency references commons-lang:jar:2.6, as shown in this maven 
 dependency:tree output extract:
 {noformat}
 org.apache.hadoop:hadoop-core:jar:1.0.3:provided
 +- commons-cli:commons-cli:jar:1.2:provided
 +- xmlenc:xmlenc:jar:0.52:provided
 +- commons-httpclient:commons-httpclient:jar:3.0.1:provided
 +- commons-codec:commons-codec:jar:1.4:provided
 +- org.apache.commons:commons-math:jar:2.1:provided
 +- commons-configuration:commons-configuration:jar:1.6:provided
 |  +- commons-collections:commons-collections:jar:3.2.1:provided
 |  +- commons-lang:commons-lang:jar:2.6:provided (version managed from 2.4)
 {noformat}
 Hadoop install libs should be consistent with the hadoop-core maven dependencies.
 I found this error because I was using a feature available in 
 commons-lang 2.6 that was failing when executed in my hadoop cluster (but not 
 with my pigunit tests).
 One last remark: it would be nice to display the classpath used by the hadoop 
 cluster while executing a job, because these kinds of errors are not easy to 
 find.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-8799) commons-lang version mismatch

2012-09-14 Thread Joel Costigliola (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Costigliola resolved HADOOP-8799.
--

Resolution: Invalid

 commons-lang version mismatch
 -

 Key: HADOOP-8799
 URL: https://issues.apache.org/jira/browse/HADOOP-8799
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 1.0.3
Reporter: Joel Costigliola

 The hadoop install references commons-lang-2.4.jar while the hadoop-core 
 dependency references commons-lang:jar:2.6, as shown in this maven 
 dependency:tree output extract:
 {noformat}
 org.apache.hadoop:hadoop-core:jar:1.0.3:provided
 +- commons-cli:commons-cli:jar:1.2:provided
 +- xmlenc:xmlenc:jar:0.52:provided
 +- commons-httpclient:commons-httpclient:jar:3.0.1:provided
 +- commons-codec:commons-codec:jar:1.4:provided
 +- org.apache.commons:commons-math:jar:2.1:provided
 +- commons-configuration:commons-configuration:jar:1.6:provided
 |  +- commons-collections:commons-collections:jar:3.2.1:provided
 |  +- commons-lang:commons-lang:jar:2.6:provided (version managed from 2.4)
 {noformat}
 Hadoop install libs should be consistent with the hadoop-core maven dependencies.
 I found this error because I was using a feature available in 
 commons-lang 2.6 that was failing when executed in my hadoop cluster (but not 
 with my pigunit tests).
 One last remark: it would be nice to display the classpath used by the hadoop 
 cluster while executing a job, because these kinds of errors are not easy to 
 find.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455989#comment-13455989
 ] 

Hadoop QA commented on HADOOP-8805:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12545176/HADOOP-8805-v2.patch
  against trunk revision .

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1458//console

This message is automatically generated.

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Fix For: 2.0.0-alpha

 Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456004#comment-13456004
 ] 

Colin Patrick McCabe commented on HADOOP-8806:
--

bq. If this is only about MR jobs, isn't setting LD_LIBRARY_PATH in 
mapreduce.admin.user.env enough?

Yes, it is enough.  The question is whether we could do anything to make this 
easier.

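One possible way to make it easier, sketched from the Java side (hypothetical; 
the real lookup would sit next to the {{dlopen}} call in libhadoop's JNI code): 
resolve the library name against {{java.library.path}} and hand an absolute 
path to the native loader.

{noformat}
import java.io.File;

public class NativeLibResolver {
  // Returns an absolute path to e.g. "libsnappy.so", or null if not found.
  static String find(String libName) {
    String searchPath = System.getProperty("java.library.path", "");
    for (String dir : searchPath.split(File.pathSeparator)) {
      File candidate = new File(dir, libName);
      if (candidate.isFile()) {
        return candidate.getAbsolutePath();
      }
    }
    return null;
  }
}
{noformat}
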
 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout

2012-09-14 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456005#comment-13456005
 ] 

Aaron T. Myers commented on HADOOP-8755:


I've just sent the email to common-dev@ as discussed.

Thanks again, Andrey.

 Print thread dump when tests fail due to timeout 
 -

 Key: HADOOP-8755
 URL: https://issues.apache.org/jira/browse/HADOOP-8755
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HADOOP-8755.patch, 
 HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, 
 HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch


 When a test fails due to timeout, it's often not clear what the root cause 
 is. See HDFS-3364 as an example.
 We can print a dump of all threads in this case; this may help find the 
 cause.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-8734) LocalJobRunner does not support private distributed cache

2012-09-14 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved HADOOP-8734.
-

   Resolution: Fixed
Fix Version/s: 1-win
 Hadoop Flags: Reviewed

+1, looks good. Verified that the test fails without the code change and passes 
with.

Just committed this to branch-1-win. Thanks Ivan!

 LocalJobRunner does not support private distributed cache
 -

 Key: HADOOP-8734
 URL: https://issues.apache.org/jira/browse/HADOOP-8734
 Project: Hadoop Common
  Issue Type: Bug
  Components: filecache
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Fix For: 1-win

 Attachments: HADOOP-8734-LocalJobRunner.patch


 It seems that LocalJobRunner does not support private distributed cache. The 
 issue is more visible on Windows as all DC files are private by default (see 
 HADOOP-8731).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8810) when building with documentation build stops waiting for ENTER on terminal

2012-09-14 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HADOOP-8810:
--

 Summary: when building with documentation build stops waiting for 
ENTER on terminal
 Key: HADOOP-8810
 URL: https://issues.apache.org/jira/browse/HADOOP-8810
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.3-alpha


When building the docs with {{mvn clean package -Pdocs -DskipTests site site:stage 
-DstagingDirectory=/tmp/hadoop-site}} on OSX (and I've seen it a few times on 
Ubuntu as well), the build stops; if you press ENTER it continues. It happens 
twice.

I've traced this down to the exec-maven-plugin invocation of protoc for the 
hadoop-yarn-api module (and another YARN module I don't recall at the moment).

jstacking the Maven process, it seems the exec-maven-plugin has some locking 
issues consuming the STDOUT/STDERR of the process being executed.

I've converted the protoc invocation in hadoop-yarn-api to use the antrun 
plugin instead, and then another module running protoc via exec-maven-plugin 
hung.

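For readers unfamiliar with the failure mode: a child process blocks as soon as 
an OS pipe buffer fills if the parent isn't draining its stdout/stderr. A 
generic sketch of the usual remedy (illustrative; this is not the plugin's 
code):

{noformat}
import java.io.IOException;
import java.io.InputStream;

public class DrainingExec {
  static int run(String... cmd) throws IOException, InterruptedException {
    Process p = new ProcessBuilder(cmd).start();
    drain(p.getInputStream()); // stdout
    drain(p.getErrorStream()); // stderr
    return p.waitFor();        // safe: both pipes are being emptied
  }

  private static void drain(final InputStream in) {
    Thread t = new Thread(new Runnable() {
      public void run() {
        try {
          byte[] buf = new byte[4096];
          while (in.read(buf) != -1) { /* discard (or log) */ }
        } catch (IOException ignored) { }
      }
    });
    t.setDaemon(true);
    t.start();
  }
}
{noformat}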



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8810) when building with documentation build stops waiting for ENTER on terminal

2012-09-14 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456015#comment-13456015
 ] 

Alejandro Abdelnur commented on HADOOP-8810:


Converting all the protoc invocations to use the antrun plugin and the 
saveVersion.sh script fixes the problem.

 when building with documentation build stops waiting for ENTER on terminal
 --

 Key: HADOOP-8810
 URL: https://issues.apache.org/jira/browse/HADOOP-8810
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.3-alpha


 When building the docs with {{mvn clean package -Pdocs -DskipTests site 
 site:stage -DstagingDirectory=/tmp/hadoop-site}} on OSX (and I've seen it a 
 few times on Ubuntu as well), the build stops; if you press ENTER it 
 continues. It happens twice.
 I've traced this down to the exec-maven-plugin invocation of protoc for the 
 hadoop-yarn-api module (and another YARN module I don't recall at the moment).
 jstacking the Maven process, it seems the exec-maven-plugin has some locking 
 issues consuming the STDOUT/STDERR of the process being executed.
 I've converted the protoc invocation in hadoop-yarn-api to use the antrun 
 plugin instead, and then another module running protoc via exec-maven-plugin 
 hung.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456019#comment-13456019
 ] 

Todd Lipcon commented on HADOOP-8806:
-

It's not only about MR jobs. Apps like HBase and Flume can also depend on 
Snappy, and it would be nice to allow them to get everything they need by just 
setting the java.library.path to include the hadoop lib/native dir without also 
futzing with LD_LIBRARY_PATH.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456030#comment-13456030
 ] 

Colin Patrick McCabe commented on HADOOP-8806:
--

What do you think about the proposed RPATH hack?  Basically allowing us to find 
libraries relative to libhadoop.so.  I haven't tried it yet, but it seems like 
a good way to go if it works?

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Attachment: HADOOP-8805-v3.patch

Tested on the trunk (3.0.0-SNAPSHOT)

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Fix For: 2.0.0-alpha

 Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, 
 HADOOP-8805-v3.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Status: Open  (was: Patch Available)

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Fix For: 2.0.0-alpha

 Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, 
 HADOOP-8805-v3.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Fix Version/s: (was: 2.0.0-alpha)
   Status: Patch Available  (was: Open)

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, 
 HADOOP-8805-v3.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8810) when building with documentation build stops waiting for ENTER on terminal

2012-09-14 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HADOOP-8810:
---

Attachment: HADOOP-8810.patch

 when building with documentation build stops waiting for ENTER on terminal
 --

 Key: HADOOP-8810
 URL: https://issues.apache.org/jira/browse/HADOOP-8810
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8810.patch


 When building the docs with {{mvn clean package -Pdocs -DskipTests site 
 site:stage -DstagingDirectory=/tmp/hadoop-site}} on OSX (and I've seen it a 
 few times on Ubuntu as well), the build stops; if you press ENTER it 
 continues. It happens twice.
 I've traced this down to the exec-maven-plugin invocation of protoc for the 
 hadoop-yarn-api module (and another YARN module I don't recall at the moment).
 jstacking the Maven process, it seems the exec-maven-plugin has some locking 
 issues consuming the STDOUT/STDERR of the process being executed.
 I've converted the protoc invocation in hadoop-yarn-api to use the antrun 
 plugin instead, and then another module running protoc via exec-maven-plugin 
 hung.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8810) when building with documentation build stops waiting for ENTER on terminal

2012-09-14 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HADOOP-8810:
---

Status: Patch Available  (was: Open)

 when building with documentation build stops waiting for ENTER on terminal
 --

 Key: HADOOP-8810
 URL: https://issues.apache.org/jira/browse/HADOOP-8810
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8810.patch


 When building the docs with {{mvn clean package -Pdocs -DskipTests site 
 site:stage -DstagingDirectory=/tmp/hadoop-site}} on OSX (and I've seen it a 
 few times on Ubuntu as well), the build stops; if you press ENTER it 
 continues. It happens twice.
 I've traced this down to the exec-maven-plugin invocation of protoc for the 
 hadoop-yarn-api module (and another YARN module I don't recall at the moment).
 jstacking the Maven process, it seems the exec-maven-plugin has some locking 
 issues consuming the STDOUT/STDERR of the process being executed.
 I've converted the protoc invocation in hadoop-yarn-api to use the antrun 
 plugin instead, and then another module running protoc via exec-maven-plugin 
 hung.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456035#comment-13456035
 ] 

Todd Lipcon commented on HADOOP-8806:
-

Seems reasonable to me.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

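For reference, a rough Java-side sketch of what "search java.library.path"
could look like (class and method names are illustrative, not libhadoop code;
the rpath/${ORIGIN} approach discussed elsewhere in the thread is an
alternative that needs no Java change): scan the property's entries for the
bundled library and load it by absolute path, so {{LD_LIBRARY_PATH}} is not
needed.

{code}
import java.io.File;

public class NativeLibLocator {

  // Return the absolute path of libName under some java.library.path entry,
  // or null if it is not bundled there.
  public static String find(String libName) {
    String jlp = System.getProperty("java.library.path", "");
    for (String dir : jlp.split(File.pathSeparator)) {
      File candidate = new File(dir, libName);
      if (candidate.isFile()) {
        return candidate.getAbsolutePath();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // mapLibraryName("snappy") yields "libsnappy.so" on Linux.
    String path = find(System.mapLibraryName("snappy"));
    if (path != null) {
      System.load(path);            // absolute path; LD_LIBRARY_PATH not needed
    } else {
      System.loadLibrary("snappy"); // fall back to the JVM's default search
    }
  }
}
{code}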

[jira] [Created] (HADOOP-8811) Compile hadoop native library in FreeBSD

2012-09-14 Thread Radim Kolar (JIRA)
Radim Kolar created HADOOP-8811:
---

 Summary: Compile hadoop native library in FreeBSD
 Key: HADOOP-8811
 URL: https://issues.apache.org/jira/browse/HADOOP-8811
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0
Reporter: Radim Kolar
Priority: Critical
 Attachments: freebsd-native.txt

The native Hadoop library does not compile on FreeBSD because setnetgrent returns 
void and the assembler does not support SSE4 instructions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8811) Compile hadoop native library in FreeBSD

2012-09-14 Thread Radim Kolar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Radim Kolar updated HADOOP-8811:


Attachment: freebsd-native.txt

 Compile hadoop native library in FreeBSD
 

 Key: HADOOP-8811
 URL: https://issues.apache.org/jira/browse/HADOOP-8811
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
Reporter: Radim Kolar
Priority: Critical
  Labels: freebsd
 Attachments: freebsd-native.txt


 The native Hadoop library does not compile on FreeBSD because setnetgrent returns 
 void and the assembler does not support SSE4 instructions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8811) Compile hadoop native library in FreeBSD

2012-09-14 Thread Radim Kolar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Radim Kolar updated HADOOP-8811:


Target Version/s: 2.0.0-alpha, 0.23.1  (was: 0.23.1, 2.0.0-alpha)
  Status: Patch Available  (was: Open)

 Compile hadoop native library in FreeBSD
 

 Key: HADOOP-8811
 URL: https://issues.apache.org/jira/browse/HADOOP-8811
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0
Reporter: Radim Kolar
Priority: Critical
  Labels: freebsd
 Attachments: freebsd-native.txt


 The native Hadoop library does not compile on FreeBSD because setnetgrent returns 
 void and the assembler does not support SSE4 instructions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-14 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456039#comment-13456039
 ] 

Aaron T. Myers commented on HADOOP-8801:


Hey Steve, I'm sure Eli will be happy to address your feedback in a follow-up 
JIRA. That said, for very small and low-risk patches such as this, I don't 
think that waiting some arbitrary amount of time for someone who might never 
comment is productive for anyone. For larger, riskier, or more controversial 
patches, I'll often say something like "I'll commit this later today/tomorrow 
unless there are further comments."

 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 2.0.2-alpha

 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

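The requested fix is essentially to hand the throwable to the logger so the
full stack trace lands in the log before the JVM exits. A minimal sketch of
that shape (an illustration, not the attached hadoop-8801.txt):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public final class ExitUtilSketch {
  private static final Log LOG = LogFactory.getLog(ExitUtilSketch.class);

  public static void terminate(int status, Throwable t) {
    // Passing the throwable as the second argument makes commons-logging
    // print the full stack trace, which is what helps debug cases like
    // HDFS-3933, instead of just the (possibly null) message.
    LOG.fatal("Terminate called with status " + status, t);
    System.exit(status);
  }
}
{code}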

[jira] [Commented] (HADOOP-8810) when building with documentation build stops waiting for ENTER on terminal

2012-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456041#comment-13456041
 ] 

Hadoop QA commented on HADOOP-8810:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12545196/HADOOP-8810.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javac.  The patch appears to cause the build to fail.

Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1460//console

This message is automatically generated.

 when building with documentation build stops waiting for ENTER on terminal
 --

 Key: HADOOP-8810
 URL: https://issues.apache.org/jira/browse/HADOOP-8810
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8810.patch


 When building the docs with {{mvn clean package -Pdocs -DskipTests site site:stage 
 -DstagingDirectory=/tmp/hadoop-site}}, on OSX (and I've seen it a few times 
 in Ubuntu as well), the build stops; if you press ENTER, it continues. It 
 happens twice.
 I've traced this down to the exec-maven-plugin invocation of protoc for the 
 hadoop-yarn-api module (and another YARN module I don't recall at the moment).
 jstacking the Maven process suggests the exec-maven-plugin has some locking 
 issues consuming the STDOUT/STDERR of the process being executed.
 I've converted the protoc invocation in hadoop-yarn-api to use the antrun 
 plugin instead, and then another module running protoc via exec-maven-plugin 
 hung.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

2012-09-14 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456045#comment-13456045
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-8731:
-

Thanks for the explanation, Ivan.

Patch looks good overall. Two comments:
 - In your java comment for ancestorsHaveExecutePermissions(), please also 
mention that this change is only needed to enable LocalJobRunner to use the 
public dist-cache. I'd also like the subject of this ticket to be changed to 
"Public dist-cache support for LocalJobRunner on Windows".
 - The changes involving FileUtil.chmod() look spurious; can you explain 
those changes?

 Public distributed cache support for Windows
 

 Key: HADOOP-8731
 URL: https://issues.apache.org/jira/browse/HADOOP-8731
 Project: Hadoop Common
  Issue Type: Bug
  Components: filecache
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: HADOOP-8731-PublicCache.patch


 A distributed cache file is considered public (sharable between MR jobs) if 
 OTHER has read permissions on the file and +x permissions all the way up the 
 folder hierarchy. By default, Windows permissions are mapped to 700 all 
 the way up to the drive letter, and it is unreasonable to ask users to change 
 the permission on the whole drive to make the file public. IOW, it is hardly 
 possible to have a public distributed cache on Windows. 
 To enable the scenario and make it more Windows friendly, the criteria for 
 when a file is considered public should be relaxed. One proposal is to check 
 only whether the user has given the EVERYONE group permission on the file (and 
 discard the +x check on parent folders).
 Security considerations for the proposal: default permissions on Unix 
 platforms are usually 775 or 755, meaning that OTHER users can read and 
 list folders by default. What this also means is that Hadoop users have to 
 explicitly make files private in order to make them private in the 
 cluster (please correct me if this is not the case in real life!). On 
 Windows, default permissions are 700, meaning that by default all files 
 are private. In the new model, if users want to make them public, they have 
 to explicitly add EVERYONE group permissions on the file. 
 TestTrackerDistributedCacheManager fails because of this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

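To make the proposal concrete, here is a hedged sketch of the relaxed check
(my own illustration, not the attached HADOOP-8731-PublicCache.patch; it
assumes the Windows build surfaces the EVERYONE ACE through the "other"
permission bits):

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.Shell;

public class PublicCacheCheck {

  // On Windows: "other can read" on the file itself is enough.
  // On Unix: keep the ancestor +x walk in addition to the read check.
  static boolean isPublic(FileSystem fs, Path file) throws IOException {
    FsPermission perm = fs.getFileStatus(file).getPermission();
    boolean otherCanRead = perm.getOtherAction().implies(FsAction.READ);
    if (Shell.WINDOWS) {
      return otherCanRead;
    }
    return otherCanRead
        && ancestorsHaveExecutePermissions(fs, file.getParent());
  }

  static boolean ancestorsHaveExecutePermissions(FileSystem fs, Path dir)
      throws IOException {
    // getParent() returns null once we walk past the root.
    for (Path p = dir; p != null; p = p.getParent()) {
      FsPermission perm = fs.getFileStatus(p).getPermission();
      if (!perm.getOtherAction().implies(FsAction.EXECUTE)) {
        return false;
      }
    }
    return true;
  }
}
{code}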

[jira] [Commented] (HADOOP-8811) Compile hadoop native library in FreeBSD

2012-09-14 Thread Radim Kolar (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456046#comment-13456046
 ] 

Radim Kolar commented on HADOOP-8811:
-

patch freebsd-native.txt also applies cleanly to branch-0.23

 Compile hadoop native library in FreeBSD
 

 Key: HADOOP-8811
 URL: https://issues.apache.org/jira/browse/HADOOP-8811
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
Reporter: Radim Kolar
Priority: Critical
  Labels: freebsd
 Attachments: freebsd-native.txt


 The native Hadoop library does not compile on FreeBSD because setnetgrent returns 
 void and the assembler does not support SSE4 instructions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8734) LocalJobRunner does not support private distributed cache

2012-09-14 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456048#comment-13456048
 ] 

Ivan Mitic commented on HADOOP-8734:


Awesome, thanks!

 LocalJobRunner does not support private distributed cache
 -

 Key: HADOOP-8734
 URL: https://issues.apache.org/jira/browse/HADOOP-8734
 Project: Hadoop Common
  Issue Type: Bug
  Components: filecache
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Fix For: 1-win

 Attachments: HADOOP-8734-LocalJobRunner.patch


 It seems that LocalJobRunner does not support private distributed cache. The 
 issue is more visible on Windows as all DC files are private by default (see 
 HADOOP-8731).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

2012-09-14 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456050#comment-13456050
 ] 

Ivan Mitic commented on HADOOP-8731:


Thanks for reviewing Vinod!

bq. In your java comment for ancestorsHaveExecutePermissions(), please also 
mention that this change is only needed to enable LocalJobRunner to use the 
public dist-cache. I'd also like the subject of this ticket to be changed to 
"Public dist-cache support for LocalJobRunner on Windows".
The change does not apply only to LocalJobRunner, but to the distributed cache 
in general. I tried to explain above what the problem is and how I am trying to 
solve it; let me know if you need additional clarification.

bq. The changes involving FileUtil.chmod() look spurious; can you explain 
those changes?
Bikas asked the same question above :) Quoting my answer: {quote} The issue is 
that the right permissions are not set on files if I do not make this change. 
If you take a look at the previous FileUtil.chmod(), it only sets permissions 
for archives, but not for files. Now that I have moved it below, it sets the 
permissions for both files and archives. {quote}

Let me know if you have additional questions/comments.

 Public distributed cache support for Windows
 

 Key: HADOOP-8731
 URL: https://issues.apache.org/jira/browse/HADOOP-8731
 Project: Hadoop Common
  Issue Type: Bug
  Components: filecache
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: HADOOP-8731-PublicCache.patch


 A distributed cache file is considered public (sharable between MR jobs) if 
 OTHER has read permissions on the file and +x permissions all the way up the 
 folder hierarchy. By default, Windows permissions are mapped to 700 all 
 the way up to the drive letter, and it is unreasonable to ask users to change 
 the permission on the whole drive to make the file public. IOW, it is hardly 
 possible to have a public distributed cache on Windows. 
 To enable the scenario and make it more Windows friendly, the criteria for 
 when a file is considered public should be relaxed. One proposal is to check 
 only whether the user has given the EVERYONE group permission on the file (and 
 discard the +x check on parent folders).
 Security considerations for the proposal: default permissions on Unix 
 platforms are usually 775 or 755, meaning that OTHER users can read and 
 list folders by default. What this also means is that Hadoop users have to 
 explicitly make files private in order to make them private in the 
 cluster (please correct me if this is not the case in real life!). On 
 Windows, default permissions are 700, meaning that by default all files 
 are private. In the new model, if users want to make them public, they have 
 to explicitly add EVERYONE group permissions on the file. 
 TestTrackerDistributedCacheManager fails because of this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

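A tiny sketch of the reordering Ivan describes (hypothetical method and
variable names, not the actual patch): the chmod used to sit inside the
archive branch, so plain files kept whatever permissions they were copied
with; running it after the branch covers files and archives alike.

{code}
import java.io.File;
import java.io.IOException;

import org.apache.hadoop.fs.FileUtil;

public class LocalizePermissions {
  static void fixPermissions(File workFile, boolean isArchive)
      throws IOException, InterruptedException {
    if (isArchive) {
      // (expand the archive here, as the real localization code does)
    }
    // Now runs for plain files *and* expanded archives, not archives only.
    FileUtil.chmod(workFile.getAbsolutePath(), "ugo+rx");
  }
}
{code}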

[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

2012-09-14 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456057#comment-13456057
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-8731:
-

Apologies for repeating the questions, I overlooked your answers.

There are two cases:
 - In the case of a real cluster, and with HDFS, the definition of a public 
dist-cache file is one which is accessible to all users; and HDFS also has 
posix-style permissions. The method isPublic() is eventually used by the 
JobClient to figure out which of the user-needed artifacts are public and which 
are not. So in the distributed-cluster case with DFS, this definition of 
public-cache doesn't need to change, irrespective of whether you have Windows or 
Linux underneath.
 - If you are talking about a distributed MR cluster working on a local 
filesystem, then yes, your changes will be needed, but that mode is not a 
supported setup anyway and will most likely need many more changes besides yours.

Regarding the permissions-related changes:
 - I believe the TT absolutely needs to set ugo+rx for dirs containing expanded 
archives. This is needed to address artifacts which retain permissions from the 
original bits that a user uploads. So let's not move/change that code out of the 
archives code block.
 - And for files, can you tell me why the 2nd line in the code fragment shown 
below doesn't already do it correctly on Windows? It may in fact be because of 
some other bug, so I'm asking: is it not enough to set correct permissions on 
the file itself in the case of Windows? 
{code}
 ...
 sourceFs.copyToLocalFile(sourcePath, workFile);
 localFs.setPermission(workFile, permission);
 if (isArchive) {
 ...
{code}

 Public distributed cache support for Windows
 

 Key: HADOOP-8731
 URL: https://issues.apache.org/jira/browse/HADOOP-8731
 Project: Hadoop Common
  Issue Type: Bug
  Components: filecache
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: HADOOP-8731-PublicCache.patch


 A distributed cache file is considered public (sharable between MR jobs) if 
 OTHER has read permissions on the file and +x permissions all the way up the 
 folder hierarchy. By default, Windows permissions are mapped to 700 all 
 the way up to the drive letter, and it is unreasonable to ask users to change 
 the permission on the whole drive to make the file public. IOW, it is hardly 
 possible to have a public distributed cache on Windows. 
 To enable the scenario and make it more Windows friendly, the criteria for 
 when a file is considered public should be relaxed. One proposal is to check 
 only whether the user has given the EVERYONE group permission on the file (and 
 discard the +x check on parent folders).
 Security considerations for the proposal: default permissions on Unix 
 platforms are usually 775 or 755, meaning that OTHER users can read and 
 list folders by default. What this also means is that Hadoop users have to 
 explicitly make files private in order to make them private in the 
 cluster (please correct me if this is not the case in real life!). On 
 Windows, default permissions are 700, meaning that by default all files 
 are private. In the new model, if users want to make them public, they have 
 to explicitly add EVERYONE group permissions on the file. 
 TestTrackerDistributedCacheManager fails because of this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8811) Compile hadoop native library in FreeBSD

2012-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456059#comment-13456059
 ] 

Hadoop QA commented on HADOOP-8811:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12545197/freebsd-native.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1461//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1461//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1461//console

This message is automatically generated.

 Compile hadoop native library in FreeBSD
 

 Key: HADOOP-8811
 URL: https://issues.apache.org/jira/browse/HADOOP-8811
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
Reporter: Radim Kolar
Priority: Critical
  Labels: freebsd
 Attachments: freebsd-native.txt


 Native hadoop library do not compiles in FreeBSD because setnetgrent returns 
 void and assembler do not supports SSE4 instructions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment

2012-09-14 Thread Xianqing Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456075#comment-13456075
 ] 

Xianqing Yu commented on HADOOP-8803:
-

Hi Kingshuk,

In the long term, new features in Hadoop can potentially give other management 
systems built on Hadoop more flexibility in design and use. The core 
observation is that some improvements would be much easier to implement if we 
can do them in Hadoop's code (and those features can then be reused by other 
systems, which saves cost and time). For example, the byte-range restriction 
design in the Block Token can also be used by management systems outside 
Hadoop. Without this type of block token, programmers may need to design much 
more complex methods to achieve the same function, and companies have to pay 
in money and time for that. It is reasonable to think of Hadoop as being 
wrapped in middleware layers, but improving the Hadoop layer would give the 
upper and lower layers better flexibility, performance, or security. 

Cloud providers can add additional security layers, but all of those services 
cost extra. And even if cloud providers invest huge amounts of money in those 
areas, I think they still cannot guarantee that data is safe without a huge 
performance penalty. Considering the balance of cost, performance, and 
security, I decided to do some work in the Hadoop layer.

So what I believe is that a Hadoop which is better designed and more secure 
can reduce the difficulty of implementing the overall system, such as 
Hortonworks HDP and IBM BigInsights, and thus save time and cost. I think we 
should always look at what Hadoop users really need. Programmers writing 
systems based on Hadoop would like to see good features come out of Hadoop. 
That is why I posted my idea here: to discuss what the real needs and 
difficulties from industry are. 

 Make Hadoop run more securely in a public cloud environment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m

 I am a Ph.D student at North Carolina State University. I am modifying 
 Hadoop's code (including most parts of Hadoop, e.g. the JobTracker, 
 TaskTracker, NameNode, and DataNode) to achieve better security.
  
 My major goal is to make Hadoop run more securely in the Cloud 
 environment, especially in a public Cloud environment. In order to achieve 
 that, I redesigned the current security mechanism to achieve the following 
 properties:
 1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS 
 access control is based on user or block granularity, e.g. the HDFS Delegation 
 Token only checks whether the file can be accessed by a certain user, and the 
 Block Token only proves which block or blocks can be accessed. I make Hadoop 
 able to do byte-granularity access control: each accessing party, user or 
 task process, can only access the bytes it minimally needs.
 2. I assume that in the public Cloud environment, only the Namenode, secondary 
 Namenode, and JobTracker can be trusted. A large number of Datanodes and 
 TaskTrackers may be compromised because some of them may be running in less 
 secure environments. So I re-designed the security mechanism to minimize the 
 damage a hacker can do.
  
 a. Re-design the Block Access Token to solve the widely shared-key problem of 
 HDFS. In the original Block Access Token design, all of HDFS (Namenode and 
 Datanodes) shares one master key to generate Block Access Tokens; if one 
 DataNode is compromised by a hacker, the hacker can get the key and generate 
 any Block Access Token he or she wants.
  
 b. Re-design the HDFS Delegation Token to do fine-grained access control for 
 the TaskTracker and Map-Reduce Task processes on HDFS. 
  
 In Hadoop 0.20.204, all TaskTrackers can use their kerberos credentials 
 to access any files for MapReduce on HDFS. So they have the same privilege as 
 the JobTracker to read or write tokens, copy job files, etc. However, if one 
 of them is compromised, every critical item in the MapReduce directory (job 
 files, Delegation Tokens) is exposed to the attacker. I solve the problem by 
 making the JobTracker decide which TaskTracker can access which file in the 
 MapReduce directory on HDFS.
  
 For a Task process, once it gets an HDFS Delegation Token, it can access 
 everything belonging to that job or user on HDFS. By my design, it can only 
 access the bytes it needs from HDFS.
  
 There are some other security improvements, such as the TaskTracker cannot 
 learn certain information, like the blockID, from the Block Token (because it 
 is encrypted in my scheme), and HDFS can set up 

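To make the byte-range idea in point 1 concrete, here is a toy sketch
(entirely hypothetical; nothing like this exists in Hadoop 0.20.204): a block
token that carries the half-open byte range its holder may touch, so a
compromised DataNode or task can only reach the bytes it was actually granted.

{code}
public class ByteRangeBlockToken {
  private final long blockId;
  private final long startOffset;  // inclusive
  private final long endOffset;    // exclusive

  public ByteRangeBlockToken(long blockId, long startOffset, long endOffset) {
    this.blockId = blockId;
    this.startOffset = startOffset;
    this.endOffset = endOffset;
  }

  // DataNode-side check before serving a read of [off, off + len).
  public boolean permits(long requestedBlockId, long off, long len) {
    return requestedBlockId == blockId
        && off >= startOffset
        && len >= 0
        && off + len <= endOffset;
  }
}
{code}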
[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-14 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456092#comment-13456092
 ] 

Eli Collins commented on HADOOP-8801:
-

@Steve,

I'm happy to address your feedback in a follow up jira.

On a related note, would you be willing to make sure *all* your changes get 
code reviewed? Most of your recent changes (HADOOP-8064, HADOOP-7878, 
HADOOP-, HADOOP-7772, HADOOP-7727, HADOOP-7705, etc) have been committed 
without any code review or a +1 from another committer, which our project's 
policy requires as I understand it. I'm a little surprised you want more time 
to have this change code reviewed when you commit your own changes without any 
code review. Likewise, I don't want to revert these changes, but they should 
not have been committed without review and a +1 from another committer.

 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 2.0.2-alpha

 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-14 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-8806:
-

Attachment: HADOOP-8806.003.patch

* insert '$ORIGIN/' into the RPATH of {{libhadoop.so}} on Linux.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor
 Attachments: HADOOP-8806.003.patch


 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8806) libhadoop.so: dlopen should be better at locating libsnappy.so, etc.

2012-09-14 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-8806:
-

Description: 
libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
directory.  However, snappy can't be loaded from this directory unless 
{{LD_LIBRARY_PATH}} is set to include this directory.

Can we make this configuration just work without needing to rely on 
{{LD_LIBRARY_PATH}}?

  was:
libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
directory.  However, snappy can't be loaded from this directory unless 
{{LD_LIBRARY_PATH}} is set to include this directory.

Should we also search {{java.library.path}} when loading these libraries?

Summary: libhadoop.so: dlopen should be better at locating 
libsnappy.so, etc.  (was: libhadoop.so: search java.library.path when calling 
dlopen)

 libhadoop.so: dlopen should be better at locating libsnappy.so, etc.
 

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor
 Attachments: HADOOP-8806.003.patch


 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Can we make this configuration just work without needing to rely on 
 {{LD_LIBRARY_PATH}}?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8806) libhadoop.so: dlopen should be better at locating libsnappy.so, etc.

2012-09-14 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-8806:
-

Assignee: Colin Patrick McCabe
  Status: Patch Available  (was: Open)

 libhadoop.so: dlopen should be better at locating libsnappy.so, etc.
 

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HADOOP-8806.003.patch


 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Can we make this configuration just work without needing to rely on 
 {{LD_LIBRARY_PATH}}?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: dlopen should be better at locating libsnappy.so, etc.

2012-09-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456120#comment-13456120
 ] 

Colin Patrick McCabe commented on HADOOP-8806:
--

I have tested this patch with Yarn running TestDFSIO, and it works.  I can drop 
{{libsnappy.so}} into {{$HADOOP_ROOT/lib/native}} (the same directory that 
contains {{libhadoop.so}}) and everything just works.

No {{LD_LIBRARY_PATH}} required, and no code changes required.

 libhadoop.so: dlopen should be better at locating libsnappy.so, etc.
 

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HADOOP-8806.003.patch


 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Can we make this configuration just work without needing to rely on 
 {{LD_LIBRARY_PATH}}?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Status: Open  (was: Patch Available)

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, 
 HADOOP-8805-v3.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Status: Patch Available  (was: Open)

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, 
 HADOOP-8805-v3.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

2012-09-14 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456134#comment-13456134
 ] 

Ivan Mitic commented on HADOOP-8731:


Thanks Vinod, these are great comments! Some answers below.

{quote}In the case of a real cluster, and with HDFS, the definition of a public 
dist-cache file is one which is accessible to all users; and HDFS also has 
posix-style permissions. The method isPublic() is eventually used by the 
JobClient to figure out which of the user-needed artifacts are public and which 
are not. So in the distributed-cluster case with DFS, this definition of 
public-cache doesn't need to change, irrespective of whether you have Windows or 
Linux underneath.
{quote}
I agree, the JobClient will evaluate whether a file should be public or 
private. Now, if I understood things correctly, based on whether the file is 
marked public or private on the JobClient side, it will later be downloaded 
from DFS to the public or private LFS location on the TT machine. What we are 
proposing with this change is to change the logic on the JobClient side that 
determines whether the file is public or private. Given that all files are 
private by default on Windows, it would be a real challenge for users to upload 
a file to the public distributed cache if we kept the old model (see my previous 
comments). Does this make sense? Please do comment; maybe I just didn't 
understand correctly how the DC works.

bq. I believe the TT absolutely needs to set ugo+rx for dirs containing expanded 
archives. This is needed to address artifacts which retain permissions from the 
original bits that a user uploads. So let's not move/change that code out of the 
archives code block.
Ah, I didn't think of this. I will revert to the original chmod.
bq. And for files, can you tell me why the 2nd line in the code fragment shown 
below doesn't already do it correctly on Windows? It may in fact be because of 
some other bug, so I'm asking: is it not enough to set correct permissions on 
the file itself in the case of Windows? 
You're right; let me debug to see what the problem was here, I made this fix a 
while back.


 Public distributed cache support for Windows
 

 Key: HADOOP-8731
 URL: https://issues.apache.org/jira/browse/HADOOP-8731
 Project: Hadoop Common
  Issue Type: Bug
  Components: filecache
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: HADOOP-8731-PublicCache.patch


 A distributed cache file is considered public (sharable between MR jobs) if 
 OTHER has read permissions on the file and +x permissions all the way up the 
 folder hierarchy. By default, Windows permissions are mapped to 700 all 
 the way up to the drive letter, and it is unreasonable to ask users to change 
 the permission on the whole drive to make the file public. IOW, it is hardly 
 possible to have a public distributed cache on Windows. 
 To enable the scenario and make it more Windows friendly, the criteria for 
 when a file is considered public should be relaxed. One proposal is to check 
 only whether the user has given the EVERYONE group permission on the file (and 
 discard the +x check on parent folders).
 Security considerations for the proposal: default permissions on Unix 
 platforms are usually 775 or 755, meaning that OTHER users can read and 
 list folders by default. What this also means is that Hadoop users have to 
 explicitly make files private in order to make them private in the 
 cluster (please correct me if this is not the case in real life!). On 
 Windows, default permissions are 700, meaning that by default all files 
 are private. In the new model, if users want to make them public, they have 
 to explicitly add EVERYONE group permissions on the file. 
 TestTrackerDistributedCacheManager fails because of this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8809) RPMs should skip useradds if the users already exist

2012-09-14 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456145#comment-13456145
 ] 

Eli Collins commented on HADOOP-8809:
-

Bigtop has a fix for this, btw. Now that Bigtop is becoming a TLP, what do people 
think of just removing the packaging code so we don't maintain the same thing 
in two projects?

 RPMs should skip useradds if the users already exist
 

 Key: HADOOP-8809
 URL: https://issues.apache.org/jira/browse/HADOOP-8809
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Affects Versions: 1.0.3
Reporter: Steve Loughran
Priority: Minor

 The hadoop.spec preinstall script creates users - but it does this even if 
 they already exist. This may cause problems if the installation already 
 has those users with different uids. A check with {{id}} can avoid this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: dlopen should be better at locating libsnappy.so, etc.

2012-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456148#comment-13456148
 ] 

Hadoop QA commented on HADOOP-8806:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12545207/HADOOP-8806.003.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common:

  org.apache.hadoop.ha.TestZKFailoverController

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1462//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1462//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1462//console

This message is automatically generated.

 libhadoop.so: dlopen should be better at locating libsnappy.so, etc.
 

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HADOOP-8806.003.patch


 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Can we make this configuration just work without needing to rely on 
 {{LD_LIBRARY_PATH}}?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8733) TestStreamingTaskLog, TestJvmManager, TestLinuxTaskControllerLaunchArgs fail on Windows

2012-09-14 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456158#comment-13456158
 ] 

Ivan Mitic commented on HADOOP-8733:


Thanks again for reviewing Vinod!

bq. One minor point: In TestJvmManager, instead of creating a dummy file for 
WINDOWS, will it be possible to simulate the Child code like on Linux? Is 
{{final String jvmName = ManagementFactory.getRuntimeMXBean().getName();}} in 
Child.java the call that is used to send the pid from the Child to the TT? If 
so, we should just simulate that code.
I'm not sure I understand your comment. Do you mind clarifying a bit? 

Just to provide some context from my side: on Windows, we don't use the 
ProcessID to identify the task process; instead we use the attemptId string. 
This ID is tied to the child task using Windows 
[JobObjects|http://msdn.microsoft.com/en-us/library/windows/desktop/ms684161(v=vs.85).aspx],
 and the TT uses this ID to kill the task if needed.

Now, in TestJvmManager we are verifying that the task is killed properly, and 
on Windows there is no need to circulate the PID from the task back to the TT, 
as the TT already has this info. Hope this helps.

 TestStreamingTaskLog, TestJvmManager, TestLinuxTaskControllerLaunchArgs fail 
 on Windows
 ---

 Key: HADOOP-8733
 URL: https://issues.apache.org/jira/browse/HADOOP-8733
 Project: Hadoop Common
  Issue Type: Bug
  Components: test
Affects Versions: 1-win
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: HADOOP-8733-scripts.2.patch, 
 HADOOP-8733-scripts.2.patch, HADOOP-8733-scripts.patch


 Jira tracking test failures related to test .sh script dependencies. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Status: Open  (was: Patch Available)

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, 
 HADOOP-8805-v3.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Status: Patch Available  (was: Open)

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, 
 HADOOP-8805-v3.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Attachment: HADOOP-8805-v3.patch

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, 
 HADOOP-8805-v3.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-14 Thread Bo Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Wang updated HADOOP-8805:


Attachment: (was: HADOOP-8805-v3.patch)

 Move protocol buffer implementation of GetUserMappingProtocol from HDFS to 
 Common
 -

 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
 Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, 
 HADOOP-8805-v3.patch


 org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
 We should move the protocol buffer implementation from HDFS to Common so that 
 it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8812) ExitUtil#terminate should print Exception#toString

2012-09-14 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8812:
---

 Summary: ExitUtil#terminate should print Exception#toString 
 Key: HADOOP-8812
 URL: https://issues.apache.org/jira/browse/HADOOP-8812
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.2-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor


Per Steve's feedback: ExitUtil#terminate should print Exception#toString 
rather than use getMessage, as the latter may return null.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8812) ExitUtil#terminate should print Exception#toString

2012-09-14 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HADOOP-8812:


Attachment: hadoop-8812.txt

Patch attached.  Thanks Steve for the good suggestion.

 ExitUtil#terminate should print Exception#toString 
 ---

 Key: HADOOP-8812
 URL: https://issues.apache.org/jira/browse/HADOOP-8812
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.2-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor
 Attachments: hadoop-8812.txt


 Per Steve's feedback: ExitUtil#terminate should print Exception#toString 
 rather than use getMessage, as the latter may return null.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8812) ExitUtil#terminate should print Exception#toString

2012-09-14 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HADOOP-8812:


Status: Patch Available  (was: Open)

 ExitUtil#terminate should print Exception#toString 
 ---

 Key: HADOOP-8812
 URL: https://issues.apache.org/jira/browse/HADOOP-8812
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.2-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor
 Attachments: hadoop-8812.txt


 Per Steve's feedback: ExitUtil#terminate should print Exception#toString 
 rather than use getMessage, as the latter may return null.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8812) ExitUtil#terminate should print Exception#toString

2012-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456196#comment-13456196
 ] 

Hadoop QA commented on HADOOP-8812:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12545226/hadoop-8812.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1464//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1464//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1464//console

This message is automatically generated.

 ExitUtil#terminate should print Exception#toString 
 ---

 Key: HADOOP-8812
 URL: https://issues.apache.org/jira/browse/HADOOP-8812
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.2-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor
 Attachments: hadoop-8812.txt


 Per Steve's feedback: ExitUtil#terminate should print Exception#toString 
 rather than use getMessage, as the latter may return null.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8812) ExitUtil#terminate should print Exception#toString

2012-09-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456205#comment-13456205
 ] 

Todd Lipcon commented on HADOOP-8812:
-

Why not replace the whole thing with StringUtils.stringifyException?

 ExitUtil#terminate should print Exception#toString 
 ---

 Key: HADOOP-8812
 URL: https://issues.apache.org/jira/browse/HADOOP-8812
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.2-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor
 Attachments: hadoop-8812.txt


 Per Steve's feedback, ExitUtil#terminate should print Exception#toString
 rather than use getMessage, as the latter may return null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8457) Address file ownership issue for users in Administrators group on Windows.

2012-09-14 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456212#comment-13456212
 ] 

Sanjay Radia commented on HADOOP-8457:
--

+1

 Address file ownership issue for users in Administrators group on Windows.
 --

 Key: HADOOP-8457
 URL: https://issues.apache.org/jira/browse/HADOOP-8457
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 1.1.0
Reporter: Chuan Liu
Assignee: Ivan Mitic
Priority: Minor
 Attachments: HADOOP-8457-branch-1-win_Admins(2).patch, 
 HADOOP-8457-branch-1-win_Admins(3).patch, 
 HADOOP-8457-branch-1-win_Admins.patch


 On Linux, the initial owner of a file is its creator. (I think this is true
 in general; if there are exceptions, please let me know.) On Windows, a file
 created by a user in the Administrators group has the initial owner
 ‘Administrators’, i.e. the Administrators group is the initial owner of the
 file. This leads to an exception when we check file ownership in the
 SecureIOUtils.checkStat() method, which is why the method is disabled right
 now. We need to address this problem and enable the method on Windows.
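
 A hedged sketch of the kind of check this implies; the helper below is
 hypothetical (names and signature invented for illustration), not the
 attached patch:

 {code}
 // Hypothetical ownership check that tolerates the Administrators group
 // as an owner on Windows, where files created by administrator accounts
 // are owned by the group rather than the individual creator.
 import java.io.IOException;

 public class OwnershipCheck {
   private static final String ADMINISTRATORS = "Administrators";

   static void checkOwner(String actualOwner, String expectedOwner,
                          boolean isWindows) throws IOException {
     if (actualOwner.equals(expectedOwner)) {
       return;  // the common case on all platforms
     }
     if (isWindows && actualOwner.equals(ADMINISTRATORS)) {
       return;  // creator was an administrator; the group owns the file
     }
     throw new IOException("Owner '" + actualOwner
         + "' does not match expected owner '" + expectedOwner + "'");
   }
 }
 {code}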

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HADOOP-7930) Kerberos relogin interval in UserGroupInformation should be configurable

2012-09-14 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter reassigned HADOOP-7930:
-

Assignee: Robert Kanter

 Kerberos relogin interval in UserGroupInformation should be configurable
 

 Key: HADOOP-7930
 URL: https://issues.apache.org/jira/browse/HADOOP-7930
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 0.23.1, 0.24.0
Reporter: Alejandro Abdelnur
Assignee: Robert Kanter
 Fix For: 0.23.3, 0.24.0


 Currently the check done in the *hasSufficientTimeElapsed()* method is
 hardcoded to a 10-minute wait.
 The wait time should be driven by configuration, and its default value for
 clients should be 1 minute.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-7930) Kerberos relogin interval in UserGroupInformation should be configurable

2012-09-14 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated HADOOP-7930:
--

Status: Patch Available  (was: Open)

 Kerberos relogin interval in UserGroupInformation should be configurable
 

 Key: HADOOP-7930
 URL: https://issues.apache.org/jira/browse/HADOOP-7930
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 0.23.1, 0.24.0
Reporter: Alejandro Abdelnur
Assignee: Robert Kanter
 Fix For: 0.23.3, 0.24.0

 Attachments: HADOOP-7930.patch


 Currently the check done in the *hasSufficientTimeElapsed()* method is
 hardcoded to a 10-minute wait.
 The wait time should be driven by configuration, and its default value for
 clients should be 1 minute.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-7930) Kerberos relogin interval in UserGroupInformation should be configurable

2012-09-14 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated HADOOP-7930:
--

Attachment: HADOOP-7930.patch

I added a property called {{hadoop.kerberos.min.time.before.relogin}} that is 
used to specify the relogin interval.  
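
A hedged sketch of how such a property might be read; the units and default
below are assumptions (the default simply mirrors the previously hardcoded
10-minute wait), not necessarily what the patch does:

{code}
// Sketch only: a configurable relogin interval backed by Hadoop's
// Configuration. The property name comes from the comment above; the
// milliseconds unit and the default value are assumptions.
import org.apache.hadoop.conf.Configuration;

public class ReloginInterval {
  static final String MIN_TIME_BEFORE_RELOGIN =
      "hadoop.kerberos.min.time.before.relogin";

  static boolean hasSufficientTimeElapsed(Configuration conf,
                                          long lastLoginMillis) {
    long minMillis =
        conf.getLong(MIN_TIME_BEFORE_RELOGIN, 10 * 60 * 1000L);
    return System.currentTimeMillis() - lastLoginMillis >= minMillis;
  }
}
{code}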

 Kerberos relogin interval in UserGroupInformation should be configurable
 

 Key: HADOOP-7930
 URL: https://issues.apache.org/jira/browse/HADOOP-7930
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 0.23.1, 0.24.0
Reporter: Alejandro Abdelnur
Assignee: Robert Kanter
 Fix For: 0.23.3, 0.24.0

 Attachments: HADOOP-7930.patch


 Currently the check done in the *hasSufficientTimeElapsed()* method is
 hardcoded to a 10-minute wait.
 The wait time should be driven by configuration, and its default value for
 clients should be 1 minute.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8812) ExitUtil#terminate should print Exception#toString

2012-09-14 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HADOOP-8812:


Attachment: hadoop-8812.txt

That's better. Patch attached; it also fixes the same issue in a test while
we're at it.

 ExitUtil#terminate should print Exception#toString 
 ---

 Key: HADOOP-8812
 URL: https://issues.apache.org/jira/browse/HADOOP-8812
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.2-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor
 Attachments: hadoop-8812.txt, hadoop-8812.txt


 Per Steve's feedback, ExitUtil#terminate should print Exception#toString
 rather than use getMessage, as the latter may return null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8812) ExitUtil#terminate should print Exception#toString

2012-09-14 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456220#comment-13456220
 ] 

Eli Collins commented on HADOOP-8812:
-

Btw, terminate will LOG#fatal with the message, which is why I am just passing
the stringified exception.
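
The resulting calling pattern, as a hedged sketch assuming the
{{ExitUtil.terminate(int, String)}} overload:

{code}
// Sketch of the pattern described above: the message is logged at fatal
// level before exiting, so passing the stringified exception puts the
// full stack trace in the log.
import org.apache.hadoop.util.ExitUtil;
import org.apache.hadoop.util.StringUtils;

public class TerminateDemo {
  static void failFast(Throwable t) {
    ExitUtil.terminate(1, StringUtils.stringifyException(t));
  }
}
{code}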

 ExitUtil#terminate should print Exception#toString 
 ---

 Key: HADOOP-8812
 URL: https://issues.apache.org/jira/browse/HADOOP-8812
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.2-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor
 Attachments: hadoop-8812.txt, hadoop-8812.txt


 Per Steve's feedback, ExitUtil#terminate should print Exception#toString
 rather than use getMessage, as the latter may return null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8780) Update DeprecatedProperties apt file

2012-09-14 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456222#comment-13456222
 ] 

Eli Collins commented on HADOOP-8780:
-

Hey guys,

Looks like this change introduced a findbugs warning:
https://builds.apache.org/job/PreCommit-HADOOP-Build/1462//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html

 Update DeprecatedProperties apt file
 

 Key: HADOOP-8780
 URL: https://issues.apache.org/jira/browse/HADOOP-8780
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ahmed Radwan
Assignee: Ahmed Radwan
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8780.patch, HADOOP-8780_rev2.patch, 
 HADOOP-8780_rev3.patch


 The current list of deprecated properties is not up-to-date. I'll upload a
 patch momentarily.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: dlopen should be better at locating libsnappy.so, etc.

2012-09-14 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456228#comment-13456228
 ] 

Eli Collins commented on HADOOP-8806:
-

Overall seems reasonable.
- Why use RPATH instead of RUNPATH?
- Have you tested with libsnappy.so in $HADOOP_ROOT/lib/native as well as
installed in the system (i.e., in LD_LIBRARY_PATH)? A quick probe sketch
follows.
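
One hedged way to check which loading path is in effect, assuming Hadoop's
{{NativeCodeLoader}} is on the classpath:

{code}
// Quick probe: reports whether libhadoop.so was found, plus the two search
// paths involved. libsnappy.so is dlopen()ed by libhadoop at runtime, so
// it is subject to LD_LIBRARY_PATH rather than java.library.path.
import org.apache.hadoop.util.NativeCodeLoader;

public class NativeLoadProbe {
  public static void main(String[] args) {
    System.out.println("libhadoop loaded: "
        + NativeCodeLoader.isNativeCodeLoaded());
    System.out.println("java.library.path = "
        + System.getProperty("java.library.path"));
    System.out.println("LD_LIBRARY_PATH = "
        + System.getenv("LD_LIBRARY_PATH"));
  }
}
{code}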

 libhadoop.so: dlopen should be better at locating libsnappy.so, etc.
 

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HADOOP-8806.003.patch


 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Can we make this configuration just work without needing to rely on 
 {{LD_LIBRARY_PATH}}?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8780) Update DeprecatedProperties apt file

2012-09-14 Thread Ahmed Radwan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan updated HADOOP-8780:
-

Attachment: HADOOP-8780_amendment.patch

This is weird; I don't know why it wasn't showing in my report here:
https://issues.apache.org/jira/browse/HADOOP-8780?focusedCommentId=13454161page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13454161

I am attaching a minor amendment that takes care of this findbugs warning.

Eli, Tom mentioned earlier that Jenkins is not running test-patch because it
is timing out when running the HDFS tests. Any idea how to fix that so we can
get the test-patch report directly?

 Update DeprecatedProperties apt file
 

 Key: HADOOP-8780
 URL: https://issues.apache.org/jira/browse/HADOOP-8780
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ahmed Radwan
Assignee: Ahmed Radwan
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8780_amendment.patch, HADOOP-8780.patch, 
 HADOOP-8780_rev2.patch, HADOOP-8780_rev3.patch


 The current list of deprecated properties is not up-to-date. I'll upload a
 patch momentarily.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

