[jira] [Created] (HADOOP-8797) automatically detect JAVA_HOME on Linux, report native lib path similar to class path

2012-09-13 Thread Gera Shegalov (JIRA)
Gera Shegalov created HADOOP-8797:
-

 Summary: automatically detect JAVA_HOME on Linux, report native 
lib path similar to class path
 Key: HADOOP-8797
 URL: https://issues.apache.org/jira/browse/HADOOP-8797
 Project: Hadoop Common
  Issue Type: Improvement
 Environment: Linux
Reporter: Gera Shegalov
Priority: Trivial


Enhancement 1)
Iterate over common Java install locations on Linux, starting with Java 7 and 
falling back to Java 6.

Enhancement 2)
Add a hadoop jnipath command that prints java.library.path, similar to hadoop 
classpath.
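
For Enhancement 2, the command would essentially print the JVM's native 
library search path. A minimal sketch of what such a (hypothetical) jnipath 
entry point could look like:

{code}
// Hypothetical sketch: print the JVM's native library search path,
// one entry per line, analogous to what "hadoop classpath" does for jars.
public class JniPath {
  public static void main(String[] args) {
    String libPath = System.getProperty("java.library.path", "");
    for (String entry : libPath.split(java.io.File.pathSeparator)) {
      System.out.println(entry);
    }
  }
}
{code}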


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8797) automatically detect JAVA_HOME on Linux, report native lib path similar to class path

2012-09-13 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated HADOOP-8797:
--

Attachment: HADOOP-8797.patch

Please review this patch.

 automatically detect JAVA_HOME on Linux, report native lib path similar to 
 class path
 -

 Key: HADOOP-8797
 URL: https://issues.apache.org/jira/browse/HADOOP-8797
 Project: Hadoop Common
  Issue Type: Improvement
 Environment: Linux
Reporter: Gera Shegalov
Priority: Trivial
 Attachments: HADOOP-8797.patch


 Enhancement 1)
 Iterate over common Java install locations on Linux, starting with Java 7 
 and falling back to Java 6.
 Enhancement 2)
 Add a hadoop jnipath command that prints java.library.path, similar to 
 hadoop classpath.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8799) commons-lang version mismatch

2012-09-13 Thread Joel Costigliola (JIRA)
Joel Costigliola created HADOOP-8799:


 Summary: commons-lang version mismatch
 Key: HADOOP-8799
 URL: https://issues.apache.org/jira/browse/HADOOP-8799
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 1.0.3
Reporter: Joel Costigliola


The hadoop install ships commons-lang-2.4.jar, while the hadoop-core 
dependency references commons-lang:jar:2.6, as shown in this extract of maven 
dependency:tree output.

{noformat}
org.apache.hadoop:hadoop-core:jar:1.0.3:provided
+- commons-cli:commons-cli:jar:1.2:provided
+- xmlenc:xmlenc:jar:0.52:provided
+- commons-httpclient:commons-httpclient:jar:3.0.1:provided
+- commons-codec:commons-codec:jar:1.4:provided
+- org.apache.commons:commons-math:jar:2.1:provided
+- commons-configuration:commons-configuration:jar:1.6:provided
|  +- commons-collections:commons-collections:jar:3.2.1:provided
|  +- commons-lang:commons-lang:jar:2.6:provided (version managed from 2.4)
{noformat}

Hadoop install libs should be consistent with hadoop-core maven dependencies.

I found this error because I was using a feature available in commons-lang 2.6 
that was failing when executed in my hadoop cluster (but not in my pigunit 
tests).

One last remark: it would be nice to display the classpath used by the hadoop 
cluster while executing a job, because these kinds of errors are not easy to 
find.
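
One quick diagnostic for this kind of mismatch is to ask the classloader where 
commons-lang was actually loaded from; a minimal sketch:

{code}
// Prints the jar from which org.apache.commons.lang.StringUtils was loaded,
// revealing whether the cluster picked up commons-lang 2.4 or 2.6.
System.out.println(org.apache.commons.lang.StringUtils.class
    .getProtectionDomain().getCodeSource().getLocation());
{code}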



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8787) KerberosAuthenticationHandler should include missing property names in configuration

2012-09-13 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454880#comment-13454880
 ] 

Ted Malaska commented on HADOOP-8787:
-

Cool.  Thanks Alejandro.  I will get an updated patch soon.

 KerberosAuthenticationHandler should include missing property names in 
 configuration
 

 Key: HADOOP-8787
 URL: https://issues.apache.org/jira/browse/HADOOP-8787
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 1.0.3, 3.0.0, 2.0.1-alpha
Reporter: Todd Lipcon
Assignee: Ted Malaska
Priority: Minor
  Labels: newbie
 Attachments: HADOOP-8787-0.patch, HADOOP-8787-1.patch, 
 HADOOP-8787-2.patch


 Currently, if the spnego keytab is missing from the configuration, the user 
 gets an error like: javax.servlet.ServletException: Principal not defined in 
 configuration. This should be augmented to actually show the configuration 
 variable which is missing. Otherwise it is hard for a user to know what to 
 fix.
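
 With the augmentation, the handler would name the missing key. A minimal 
 sketch (the property constants shown are illustrative, not necessarily the 
 exact hadoop-auth names):

 {code}
 // Sketch: include the missing property's name in the exception so the user
 // knows which configuration key to set.
 String principal = config.getProperty(PRINCIPAL);  // PRINCIPAL = "kerberos.principal"
 if (principal == null || principal.trim().length() == 0) {
   throw new ServletException("Principal not defined in configuration"
       + " (missing property: " + configPrefix + PRINCIPAL + ")");
 }
 {code}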

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8800) Dynamic Compress Stream

2012-09-13 Thread yankay (JIRA)
yankay created HADOOP-8800:
--

 Summary: Dynamic Compress Stream
 Key: HADOOP-8800
 URL: https://issues.apache.org/jira/browse/HADOOP-8800
 Project: Hadoop Common
  Issue Type: New Feature
  Components: io
Affects Versions: 2.0.1-alpha
Reporter: yankay


We use compression in MapReduce in some cases because it uses CPU to improve 
IO throughput.

But we can only set one compression algorithm in the configuration file. The 
hadoop cluster is changing all the time, so one compression algorithm may not 
work well in all cases.

Why not provide an algorithm named dynamic? It could change the compression 
level and algorithm dynamically based on performance. Like TCP, it starts up 
slowly and tries to run faster and faster.

I will write a detailed design here and try to submit a patch.
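
As a rough illustration of the idea, here is a self-contained sketch (plain 
java.util.zip rather than a Hadoop CompressionCodec; the ramp-up policy is an 
assumption) that adjusts the deflate level based on observed throughput, 
slow-start style:

{code}
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

/** Sketch: adapt the deflate level to the observed compression throughput. */
public class AdaptiveDeflater {
  private int level = Deflater.BEST_SPEED;  // start cheap, like TCP slow start
  private long lastBytesPerSec = 0;

  public byte[] compress(byte[] input) {
    long start = System.nanoTime();
    Deflater deflater = new Deflater(level);
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    long elapsed = Math.max(1L, System.nanoTime() - start);
    long bytesPerSec = input.length * 1000000000L / elapsed;
    // Ramp the level up while throughput holds up; back off when it drops.
    if (bytesPerSec >= lastBytesPerSec && level < Deflater.BEST_COMPRESSION) {
      level++;
    } else if (level > Deflater.BEST_SPEED) {
      level--;
    }
    lastBytesPerSec = bytesPerSec;
    return out.toByteArray();
  }
}
{code}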

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8800) Dynamic Compress Stream

2012-09-13 Thread yankay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yankay updated HADOOP-8800:
---

Description: 
We use compression in MapReduce in some cases because it uses CPU to improve 
IO throughput.

But we can only set one compression algorithm in the configuration file. The 
hadoop cluster is changing all the time, so one compression algorithm may not 
work well in all cases.

Why not provide an algorithm named dynamic? It could change the compression 
level and algorithm dynamically based on performance. Like TCP, it starts up 
slowly and tries to run faster and faster. It can make the IO faster by 
choosing a more suitable compression algorithm.

I will write a detailed design here and try to submit a patch.

  was:
We use compression in MapReduce in some cases because it uses CPU to improve 
IO throughput.

But we can only set one compression algorithm in the configuration file. The 
hadoop cluster is changing all the time, so one compression algorithm may not 
work well in all cases.

Why not provide an algorithm named dynamic? It could change the compression 
level and algorithm dynamically based on performance. Like TCP, it starts up 
slowly and tries to run faster and faster.

I will write a detailed design here and try to submit a patch.


 Dynamic Compress Stream
 ---

 Key: HADOOP-8800
 URL: https://issues.apache.org/jira/browse/HADOOP-8800
 Project: Hadoop Common
  Issue Type: New Feature
  Components: io
Affects Versions: 2.0.1-alpha
Reporter: yankay
  Labels: patch
   Original Estimate: 168h
  Remaining Estimate: 168h

 We use compression in MapReduce in some cases because it uses CPU to improve 
 IO throughput.
 But we can only set one compression algorithm in the configuration file. The 
 hadoop cluster is changing all the time, so one compression algorithm may 
 not work well in all cases.
 Why not provide an algorithm named dynamic? It could change the compression 
 level and algorithm dynamically based on performance. Like TCP, it starts up 
 slowly and tries to run faster and faster. It can make the IO faster by 
 choosing a more suitable compression algorithm.
 I will write a detailed design here and try to submit a patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8787) KerberosAuthenticationHandler should include missing property names in configuration

2012-09-13 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HADOOP-8787:


Attachment: HADOOP-8787-3.patch

Applied changes based on review.

Major changes:
1. KerberosAuthenticationHandler can now get the config prefix from its 
properties.
2. AuthenticationFilter.getConfiguration now puts the config prefix into the 
newly created Properties object (see the sketch below).
3. Added additional tests covering the new KerberosAuthenticationHandler 
exceptions.
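
A rough sketch of change 2, assuming the hadoop-auth AuthenticationFilter API 
(the CONFIG_PREFIX constant and exact behavior here are assumptions, not the 
attached patch):

{code}
import java.util.Enumeration;
import java.util.Properties;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;

// Sketch: seed the returned Properties with the filter's config prefix so
// the handler can report fully-qualified names of missing properties.
protected Properties getConfiguration(String configPrefix,
    FilterConfig filterConfig) throws ServletException {
  Properties props = new Properties();
  props.setProperty(CONFIG_PREFIX, configPrefix);  // hypothetical constant
  Enumeration<?> names = filterConfig.getInitParameterNames();
  while (names.hasMoreElements()) {
    String name = (String) names.nextElement();
    if (name.startsWith(configPrefix)) {
      props.setProperty(name.substring(configPrefix.length()),
          filterConfig.getInitParameter(name));
    }
  }
  return props;
}
{code}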

 KerberosAuthenticationHandler should include missing property names in 
 configuration
 

 Key: HADOOP-8787
 URL: https://issues.apache.org/jira/browse/HADOOP-8787
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 1.0.3, 3.0.0, 2.0.1-alpha
Reporter: Todd Lipcon
Assignee: Ted Malaska
Priority: Minor
  Labels: newbie
 Attachments: HADOOP-8787-0.patch, HADOOP-8787-1.patch, 
 HADOOP-8787-2.patch, HADOOP-8787-3.patch


 Currently, if the spnego keytab is missing from the configuration, the user 
 gets an error like: javax.servlet.ServletException: Principal not defined in 
 configuration. This should be augmented to actually show the configuration 
 variable which is missing. Otherwise it is hard for a user to know what to 
 fix.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8787) KerberosAuthenticationHandler should include missing property names in configuration

2012-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454941#comment-13454941
 ] 

Hadoop QA commented on HADOOP-8787:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12544989/HADOOP-8787-3.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-common-project/hadoop-auth.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1452//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1452//console

This message is automatically generated.

 KerberosAuthenticationHandler should include missing property names in 
 configuration
 

 Key: HADOOP-8787
 URL: https://issues.apache.org/jira/browse/HADOOP-8787
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 1.0.3, 3.0.0, 2.0.1-alpha
Reporter: Todd Lipcon
Assignee: Ted Malaska
Priority: Minor
  Labels: newbie
 Attachments: HADOOP-8787-0.patch, HADOOP-8787-1.patch, 
 HADOOP-8787-2.patch, HADOOP-8787-3.patch


 Currently, if the spnego keytab is missing from the configuration, the user 
 gets an error like: javax.servlet.ServletException: Principal not defined in 
 configuration. This should be augmented to actually show the configuration 
 variable which is missing. Otherwise it is hard for a user to know what to 
 fix.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8734) LocalJobRunner does not support private distributed cache

2012-09-13 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454943#comment-13454943
 ] 

Ivan Mitic commented on HADOOP-8734:


Thanks Bikas.

bq. So if I understand this right, this fixes a generic deficiency in 
LocalJobRunner which wasn't showing up because, by default, files are public 
to read on the Linux FS, and so LocalJobRunner would not see issues in 
accessing the private distributed cache from the local FS.
Correct, this is how I see the problem.

bq. Also, this would make the change to TestMRWithDistributedCache unnecessary?
Given that I'm making a bug fix, I should also add a test case that catches 
the bug. In this case, it was enough to slightly modify one test to catch the 
bug. Makes sense?


 LocalJobRunner does not support private distributed cache
 -

 Key: HADOOP-8734
 URL: https://issues.apache.org/jira/browse/HADOOP-8734
 Project: Hadoop Common
  Issue Type: Bug
  Components: filecache
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: HADOOP-8734-LocalJobRunner.patch


 It seems that LocalJobRunner does not support private distributed cache. The 
 issue is more visible on Windows as all DC files are private by default (see 
 HADOOP-8731).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-13 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8801:
---

 Summary: ExitUtil#terminate should capture the exception stack 
trace
 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hadoop-8801.txt

ExitUtil#terminate(status,Throwable) should capture and log the stack trace of 
the given throwable. This will help debug issues like HDFS-3933.
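
A minimal sketch of the proposed overload (assuming ExitUtil's existing 
terminate(int, String) method and its commons-logging LOG; an illustration, 
not the attached patch):

{code}
// Sketch: pass the Throwable to the logger so its stack trace is captured,
// then delegate to the existing message-based terminate overload.
public static void terminate(int status, Throwable t) throws ExitException {
  LOG.fatal("Terminate called", t);  // logs the full stack trace of t
  terminate(status, t.getMessage());
}
{code}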

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-13 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HADOOP-8801:


Attachment: hadoop-8801.txt

Patch attached.

 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-13 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HADOOP-8801:


Status: Patch Available  (was: Open)

 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454972#comment-13454972
 ] 

Karthik Kambatla commented on HADOOP-8801:
--

+1

 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454999#comment-13454999
 ] 

Hadoop QA commented on HADOOP-8801:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12544995/hadoop-8801.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common:

  org.apache.hadoop.ha.TestZKFailoverController

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1453//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1453//console

This message is automatically generated.

 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8802) TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11

2012-09-13 Thread Amir Sanjar (JIRA)
Amir Sanjar created HADOOP-8802:
---

 Summary: TestUserGroupInformation testcase fails using IBM JDK 6.0 
SR11
 Key: HADOOP-8802
 URL: https://issues.apache.org/jira/browse/HADOOP-8802
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 1.0.3
 Environment: Built with IBM Java 6 SR11 SDK, Linux RHEL 6.2 64bit, 
x86_64
Reporter: Amir Sanjar
 Fix For: 1.0.3


Testsuite: org.apache.hadoop.security.TestUserGroupInformation
Tests run: 10, Failures: 0, Errors: 1, Time elapsed: 0.264 sec
- Standard Output ---
2012-09-13 10:57:59,771 WARN  conf.Configuration 
(Configuration.java:<clinit>(192)) - DEPRECATED: hadoop-site.xml found in the 
classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, 
mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, 
mapred-default.xml and hdfs-default.xml respectively
sanjar:sanjar dialout desktop_admin_r
-  ---

Testcase: testGetServerSideGroups took 0.036 sec
Caused an ERROR
expected:<d[ialout]> but was:<d[esktop_admin_r]>
at 
org.apache.hadoop.security.TestUserGroupInformation.testGetServerSideGroups(TestUserGroupInformation.java:108)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8802) TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11

2012-09-13 Thread Amir Sanjar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated HADOOP-8802:


Priority: Minor  (was: Major)

 TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11
 --

 Key: HADOOP-8802
 URL: https://issues.apache.org/jira/browse/HADOOP-8802
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 1.0.3
 Environment: Built with IBM Java 6 SR11 SDK, Linux RHEL 6.2 64bit, 
 x86_64
Reporter: Amir Sanjar
Priority: Minor
 Fix For: 1.0.3


 Testsuite: org.apache.hadoop.security.TestUserGroupInformation
 Tests run: 10, Failures: 0, Errors: 1, Time elapsed: 0.264 sec
 - Standard Output ---
 2012-09-13 10:57:59,771 WARN  conf.Configuration 
 (Configuration.java:<clinit>(192)) - DEPRECATED: hadoop-site.xml found in the 
 classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, 
 mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, 
 mapred-default.xml and hdfs-default.xml respectively
 sanjar:sanjar dialout desktop_admin_r
 -  ---
 Testcase: testGetServerSideGroups took 0.036 sec
   Caused an ERROR
 expected:<d[ialout]> but was:<d[esktop_admin_r]>
   at 
 org.apache.hadoop.security.TestUserGroupInformation.testGetServerSideGroups(TestUserGroupInformation.java:108)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8802) TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11

2012-09-13 Thread Amir Sanjar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated HADOOP-8802:


Attachment: HADOOP-8802.patch

This is a bug in the hadoop testcase: the order in which the group names are 
stored should be irrelevant, so the testcase should not assume a particular 
group order in this case.

Solution: validate the stored group names without enforcing the order:

{code}
for (int i = 0; i < gi.length; i++) {
  // assertEquals(groups.get(i), gi[i]);  // order-based check, removed
  assertTrue(groups.contains(gi[i]));     // solution: order-independent check
}
{code}
Note: this solution works on both IBM Java and Sun Java.

 TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11
 --

 Key: HADOOP-8802
 URL: https://issues.apache.org/jira/browse/HADOOP-8802
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 1.0.3
 Environment: Built with IBM Java 6 SR11 SDK, Linux RHEL 6.2 64bit, 
 x86_64
Reporter: Amir Sanjar
Priority: Minor
 Fix For: 1.0.3

 Attachments: HADOOP-8802.patch


 Testsuite: org.apache.hadoop.security.TestUserGroupInformation
 Tests run: 10, Failures: 0, Errors: 1, Time elapsed: 0.264 sec
 - Standard Output ---
 2012-09-13 10:57:59,771 WARN  conf.Configuration 
 (Configuration.java:<clinit>(192)) - DEPRECATED: hadoop-site.xml found in the 
 classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, 
 mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, 
 mapred-default.xml and hdfs-default.xml respectively
 sanjar:sanjar dialout desktop_admin_r
 -  ---
 Testcase: testGetServerSideGroups took 0.036 sec
   Caused an ERROR
 expected:<d[ialout]> but was:<d[esktop_admin_r]>
   at 
 org.apache.hadoop.security.TestUserGroupInformation.testGetServerSideGroups(TestUserGroupInformation.java:108)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8768) TestDistCp is @ignored

2012-09-13 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HADOOP-8768:


Priority: Critical  (was: Minor)

 TestDistCp is @ignored
 --

 Key: HADOOP-8768
 URL: https://issues.apache.org/jira/browse/HADOOP-8768
 Project: Hadoop Common
  Issue Type: Bug
  Components: test, tools/distcp
Affects Versions: 2.0.2-alpha
Reporter: Colin Patrick McCabe
Priority: Critical

 We should fix TestDistCp so that it actually runs, rather than being ignored.
 {code}
 @Ignore
 public class TestDistCp {
   private static final Log LOG = LogFactory.getLog(TestDistCp.class);
   private static List<Path> pathList = new ArrayList<Path>();
   ...
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment

2012-09-13 Thread Xianqing Yu (JIRA)
Xianqing Yu created HADOOP-8803:
---

 Summary: Make Hadoop run more securely in a public cloud environment
 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu


I have two major goals in this project.

One is to bring fine-grained access control to Hadoop. As of 0.20.204, Hadoop 
access control works at user or block granularity, e.g. the HDFS Delegation 
Token only checks whether a file can be accessed by a certain user, and the 
Block Token only proves which block or blocks can be accessed. I would like to 
make Hadoop able to do byte-granularity access control: each accessing party, 
user or task process, can only access the bytes it minimally needs.

The second is to make Hadoop work more securely in a Cloud environment, 
especially a public Cloud environment. The communication between hadoop's 
nodes should be protected, and if some nodes of hadoop are compromised, the 
damage should be minimized (e.g. the known widely-shared-key problem of the 
Block Access Token).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8768) TestDistCp is @ignored

2012-09-13 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455021#comment-13455021
 ] 

Eli Collins commented on HADOOP-8768:
-

I don't think so; I pinged MR-2765.

 TestDistCp is @ignored
 --

 Key: HADOOP-8768
 URL: https://issues.apache.org/jira/browse/HADOOP-8768
 Project: Hadoop Common
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.2-alpha
Reporter: Colin Patrick McCabe
Priority: Critical

 We should fix TestDistCp so that it actually runs, rather than being ignored.
 {code}
 @Ignore
 public class TestDistCp {
   private static final Log LOG = LogFactory.getLog(TestDistCp.class);
   private static List<Path> pathList = new ArrayList<Path>();
   ...
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

2012-09-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455024#comment-13455024
 ] 

Bikas Saha commented on HADOOP-8731:


Looks like the chmod fixes an existing generic bug.

Can you please clarify the following scenario so that other folks reading this 
thread can follow it easily?
Directory A (permissions for user Foo) contains directory B (permissions for 
Everyone). So the contents of A will be private cache and the contents of B 
will be public cache on Windows, but not on Linux.


 Public distributed cache support for Windows
 

 Key: HADOOP-8731
 URL: https://issues.apache.org/jira/browse/HADOOP-8731
 Project: Hadoop Common
  Issue Type: Bug
  Components: filecache
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: HADOOP-8731-PublicCache.patch


 A distributed cache file is considered public (sharable between MR jobs) if 
 OTHER has read permissions on the file and +x permissions all the way up in 
 the folder hierarchy. By default, Windows permissions are mapped to 700 all 
 the way up to the drive letter, and it is unreasonable to ask users to change 
 the permission on the whole drive to make the file public. IOW, it is hardly 
 possible to have public distributed cache on Windows. 
 To enable the scenario and make it more Windows friendly, the criteria on 
 when a file is considered public should be relaxed. One proposal is to check 
 whether the user has given EVERYONE group permission on the file only (and 
 discard the +x check on parent folders).
 Security considerations for the proposal: Default permissions on Unix 
 platforms are usually 775 or 755 meaning that OTHER users can read and 
 list folders by default. What this also means is that Hadoop users have to 
 explicitly make the files private in order to make them private in the 
 cluster (please correct me if this is not the case in real life!). On 
 Windows, default permissions are 700. This means that by default all files 
 are private. In the new model, if users want to make them public, they have 
 to explicitly add EVERYONE group permissions on the file. 
 TestTrackerDistributedCacheManager fails because of this issue.
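
 For context, the current "public" criterion amounts to roughly the following 
 sketch (using the org.apache.hadoop.fs permission APIs; an illustration, not 
 the exact TrackerDistributedCacheManager code):

 {code}
 import java.io.IOException;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.permission.FsAction;

 // Sketch: a cache file is considered public if OTHER can read it and every
 // ancestor directory grants OTHER execute (traversal) permission.
 static boolean isPublic(FileSystem fs, Path file) throws IOException {
   if (!fs.getFileStatus(file).getPermission().getOtherAction()
       .implies(FsAction.READ)) {
     return false;
   }
   for (Path dir = file.getParent(); dir != null; dir = dir.getParent()) {
     if (!fs.getFileStatus(dir).getPermission().getOtherAction()
         .implies(FsAction.EXECUTE)) {
       return false;  // the +x chain is broken somewhere above the file
     }
   }
   return true;
 }
 {code}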

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment

2012-09-13 Thread Xianqing Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455025#comment-13455025
 ] 

Xianqing Yu commented on HADOOP-8803:
-

I would like to discuss this topic with the hadoop community to see whether 
people want or need these features in a future Hadoop. Please post your 
thoughts here.

 Make Hadoop run more securely in a public cloud environment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m

 I have two major goals in this project.
 One is to bring fine-grained access control to Hadoop. As of 0.20.204, Hadoop 
 access control works at user or block granularity, e.g. the HDFS Delegation 
 Token only checks whether a file can be accessed by a certain user, and the 
 Block Token only proves which block or blocks can be accessed. I would like 
 to make Hadoop able to do byte-granularity access control: each accessing 
 party, user or task process, can only access the bytes it minimally needs.
 The second is to make Hadoop work more securely in a Cloud environment, 
 especially a public Cloud environment. The communication between hadoop's 
 nodes should be protected, and if some nodes of hadoop are compromised, the 
 damage should be minimized (e.g. the known widely-shared-key problem of the 
 Block Access Token).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8734) LocalJobRunner does not support private distributed cache

2012-09-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455043#comment-13455043
 ] 

Bikas Saha commented on HADOOP-8734:


Sorry. I got totally confused and misread the test file name in the patch. +1. 
Thanks!

 LocalJobRunner does not support private distributed cache
 -

 Key: HADOOP-8734
 URL: https://issues.apache.org/jira/browse/HADOOP-8734
 Project: Hadoop Common
  Issue Type: Bug
  Components: filecache
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: HADOOP-8734-LocalJobRunner.patch


 It seems that LocalJobRunner does not support private distributed cache. The 
 issue is more visible on Windows as all DC files are private by default (see 
 HADOOP-8731).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-7688) When a servlet filter throws an exception in init(..), the Jetty server failed silently.

2012-09-13 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455052#comment-13455052
 ] 

Uma Maheswara Rao G commented on HADOOP-7688:
-

Ported to branch-2. Committed revision 1384416.

 When a servlet filter throws an exception in init(..), the Jetty server 
 failed silently. 
 -

 Key: HADOOP-7688
 URL: https://issues.apache.org/jira/browse/HADOOP-7688
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 0.23.0, 0.24.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Uma Maheswara Rao G
 Fix For: 3.0.0

 Attachments: filter-init-exception-test.patch, 
 HADOOP-7688-branch-2.patch, HADOOP-7688.patch, 
 org.apache.hadoop.http.TestServletFilter-output.txt


 When a servlet filter throws a ServletException in init(..), the exception 
 is logged by Jetty but not re-thrown to the caller. As a result, the Jetty 
 server fails silently.
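
 One way to surface such failures (a sketch inside HttpServer#start(), 
 assuming Jetty 6's org.mortbay.jetty API and HttpServer's webServer field; 
 the exact check is an assumption, not necessarily the committed patch):

 {code}
 // Sketch: after starting Jetty, check whether any handler failed to
 // initialize (e.g. a filter threw in init(..)) and fail fast instead of
 // letting the server run silently in a broken state.
 webServer.start();
 Handler[] handlers = webServer.getHandlers();
 for (int i = 0; i < handlers.length; i++) {
   if (handlers[i].isFailed()) {
     throw new IOException(
         "Problem in starting http server. Server handlers failed");
   }
 }
 {code}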

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-7688) When a servlet filter throws an exception in init(..), the Jetty server failed silently.

2012-09-13 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HADOOP-7688:


Attachment: HADOOP-7688-branch-2.patch

Here is the ported patch.

 When a servlet filter throws an exception in init(..), the Jetty server 
 failed silently. 
 -

 Key: HADOOP-7688
 URL: https://issues.apache.org/jira/browse/HADOOP-7688
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 0.23.0, 0.24.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Uma Maheswara Rao G
 Fix For: 3.0.0

 Attachments: filter-init-exception-test.patch, 
 HADOOP-7688-branch-2.patch, HADOOP-7688.patch, 
 org.apache.hadoop.http.TestServletFilter-output.txt


 When a servlet filter throws a ServletException in init(..), the exception 
 is logged by Jetty but not re-thrown to the caller. As a result, the Jetty 
 server fails silently.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-13 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455056#comment-13455056
 ] 

Aaron T. Myers commented on HADOOP-8801:


+1, the patch looks good to me.

 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified

2012-09-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HADOOP-8795:
--

Assignee: Sean Mackrory

 BASH tab completion doesn't look in PATH, assumes path to executable is 
 specified
 -

 Key: HADOOP-8795
 URL: https://issues.apache.org/jira/browse/HADOOP-8795
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Sean Mackrory
Assignee: Sean Mackrory
 Attachments: HADOOP-8795.patch


 bash-tab-completion/hadoop.sh checks that the first token in the command is 
 an existing, executable file - which assumes that the path to the hadoop 
 executable is specified (or that it's in the working directory). If the 
 executable is somewhere else in PATH, tab completion will not work.
 I propose that the first token be passed through 'which' so that any 
 executables in the path also get detected. I've tested that this technique 
 will work in the event that relative and absolute paths are used as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified

2012-09-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HADOOP-8795:
---

  Component/s: scripts
 Priority: Minor  (was: Major)
 Target Version/s: 2.0.3-alpha
Affects Version/s: 2.0.0-alpha

+1, the patch looks good to me. I'll commit this momentarily.

 BASH tab completion doesn't look in PATH, assumes path to executable is 
 specified
 -

 Key: HADOOP-8795
 URL: https://issues.apache.org/jira/browse/HADOOP-8795
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Affects Versions: 2.0.0-alpha
Reporter: Sean Mackrory
Assignee: Sean Mackrory
Priority: Minor
 Attachments: HADOOP-8795.patch


 bash-tab-completion/hadoop.sh checks that the first token in the command is 
 an existing, executable file - which assumes that the path to the hadoop 
 executable is specified (or that it's in the working directory). If the 
 executable is somewhere else in PATH, tab completion will not work.
 I propose that the first token be passed through 'which' so that any 
 executables in the path also get detected. I've tested that this technique 
 will work in the event that relative and absolute paths are used as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-13 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HADOOP-8801:


  Resolution: Fixed
   Fix Version/s: 2.0.2-alpha
Target Version/s:   (was: 2.0.2-alpha)
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the reviews guys. I've committed this and merged to branch-2 and 
branch-2.0.2-alpha.

 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 2.0.2-alpha

 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455085#comment-13455085
 ] 

Hudson commented on HADOOP-8801:


Integrated in Hadoop-Common-trunk-Commit #2729 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2729/])
HADOOP-8801. ExitUtil#terminate should capture the exception stack trace. 
Contributed by Eli Collins (Revision 1384435)

 Result = SUCCESS
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384435
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ExitUtil.java


 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 2.0.2-alpha

 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified

2012-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455086#comment-13455086
 ] 

Hudson commented on HADOOP-8795:


Integrated in Hadoop-Common-trunk-Commit #2729 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2729/])
HADOOP-8795. BASH tab completion doesn't look in PATH, assumes path to 
executable is specified. Contributed by Sean Mackrory. (Revision 1384436)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384436
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/contrib/bash-tab-completion/hadoop.sh


 BASH tab completion doesn't look in PATH, assumes path to executable is 
 specified
 -

 Key: HADOOP-8795
 URL: https://issues.apache.org/jira/browse/HADOOP-8795
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Affects Versions: 2.0.0-alpha
Reporter: Sean Mackrory
Assignee: Sean Mackrory
Priority: Minor
 Attachments: HADOOP-8795.patch


 bash-tab-completion/hadoop.sh checks that the first token in the command is 
 an existing, executable file - which assumes that the path to the hadoop 
 executable is specified (or that it's in the working directory). If the 
 executable is somewhere else in PATH, tab completion will not work.
 I propose that the first token be passed through 'which' so that any 
 executables in the path also get detected. I've tested that this technique 
 will work in the event that relative and absolute paths are used as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455089#comment-13455089
 ] 

Hudson commented on HADOOP-8801:


Integrated in Hadoop-Hdfs-trunk-Commit #2792 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2792/])
HADOOP-8801. ExitUtil#terminate should capture the exception stack trace. 
Contributed by Eli Collins (Revision 1384435)

 Result = SUCCESS
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384435
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ExitUtil.java


 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 2.0.2-alpha

 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified

2012-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455090#comment-13455090
 ] 

Hudson commented on HADOOP-8795:


Integrated in Hadoop-Hdfs-trunk-Commit #2792 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2792/])
HADOOP-8795. BASH tab completion doesn't look in PATH, assumes path to 
executable is specified. Contributed by Sean Mackrory. (Revision 1384436)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384436
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/contrib/bash-tab-completion/hadoop.sh


 BASH tab completion doesn't look in PATH, assumes path to executable is 
 specified
 -

 Key: HADOOP-8795
 URL: https://issues.apache.org/jira/browse/HADOOP-8795
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Affects Versions: 2.0.0-alpha
Reporter: Sean Mackrory
Assignee: Sean Mackrory
Priority: Minor
 Attachments: HADOOP-8795.patch


 bash-tab-completion/hadoop.sh checks that the first token in the command is 
 an existing, executable file - which assumes that the path to the hadoop 
 executable is specified (or that it's in the working directory). If the 
 executable is somewhere else in PATH, tab completion will not work.
 I propose that the first token be passed through 'which' so that any 
 executables in the path also get detected. I've tested that this technique 
 will work in the event that relative and absolute paths are used as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified

2012-09-13 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HADOOP-8795:


  Resolution: Fixed
   Fix Version/s: 2.0.3-alpha
Target Version/s:   (was: 2.0.3-alpha)
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 BASH tab completion doesn't look in PATH, assumes path to executable is 
 specified
 -

 Key: HADOOP-8795
 URL: https://issues.apache.org/jira/browse/HADOOP-8795
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Affects Versions: 2.0.0-alpha
Reporter: Sean Mackrory
Assignee: Sean Mackrory
Priority: Minor
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8795.patch


 bash-tab-completion/hadoop.sh checks that the first token in the command is 
 an existing, executable file - which assumes that the path to the hadoop 
 executable is specified (or that it's in the working directory). If the 
 executable is somewhere else in PATH, tab completion will not work.
 I propose that the first token be passed through 'which' so that any 
 executables in the path also get detected. I've tested that this technique 
 will work in the event that relative and absolute paths are used as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified

2012-09-13 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455092#comment-13455092
 ] 

Aaron T. Myers commented on HADOOP-8795:


I've just committed this to trunk and branch-2. Thanks a lot for the 
contribution, Sean.

 BASH tab completion doesn't look in PATH, assumes path to executable is 
 specified
 -

 Key: HADOOP-8795
 URL: https://issues.apache.org/jira/browse/HADOOP-8795
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Affects Versions: 2.0.0-alpha
Reporter: Sean Mackrory
Assignee: Sean Mackrory
Priority: Minor
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8795.patch


 bash-tab-completion/hadoop.sh checks that the first token in the command is 
 an existing, executable file - which assumes that the path to the hadoop 
 executable is specified (or that it's in the working directory). If the 
 executable is somewhere else in PATH, tab completion will not work.
 I propose that the first token be passed through 'which' so that any 
 executables in the path also get detected. I've tested that this technique 
 will work in the event that relative and absolute paths are used as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8796) commands_manual.html link is broken

2012-09-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HADOOP-8796:
---

Target Version/s: 2.0.3-alpha
   Fix Version/s: (was: 2.0.2-alpha)

Thanks a lot for filing this issue, Roman.

In the future, please only set the fix version field once the patch has been 
committed. To indicate what branch you'd like to see this issue fixed on, 
please use the target version field.

 commands_manual.html link is broken
 ---

 Key: HADOOP-8796
 URL: https://issues.apache.org/jira/browse/HADOOP-8796
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.0.1-alpha
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
Priority: Minor

 If you go to http://hadoop.apache.org/docs/r2.0.0-alpha/ and click on Hadoop 
 Commands you are getting a broken link: 
 http://hadoop.apache.org/docs/r2.0.0-alpha/hadoop-project-dist/hadoop-common/commands_manual.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment

2012-09-13 Thread Xianqing Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianqing Yu updated HADOOP-8803:


Description: 
I am a Ph.D. student at North Carolina State University. I am modifying 
Hadoop's code (including most parts of Hadoop, e.g. JobTracker, 
TaskTracker, NameNode, DataNode) to achieve better security.
 
My major goal is to make Hadoop run more securely in the Cloud environment, 
especially a public Cloud environment. In order to achieve that, I redesigned 
the current security mechanism to achieve the following properties:

1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS 
access control works at user or block granularity, e.g. the HDFS Delegation 
Token only checks whether a file can be accessed by a certain user, and the 
Block Token only proves which block or blocks can be accessed. I make Hadoop 
able to do byte-granularity access control: each accessing party, user or 
task process, can only access the bytes it minimally needs.

2. I assume that in the public Cloud environment only the Namenode, secondary 
Namenode, and JobTracker can be trusted. A large number of Datanodes and 
TaskTrackers may be compromised, since some of them may be running in a less 
secure environment. So I redesigned the security mechanism to minimize the 
damage an attacker can do.
 
a. Redesign the Block Access Token to solve the widely-shared-key problem of 
HDFS. In the original Block Access Token design, all of HDFS (Namenode and 
Datanodes) shares one master key to generate Block Access Tokens; if one 
DataNode is compromised, the attacker can get the key and generate any Block 
Access Token he or she wants.
 
b. Redesign the HDFS Delegation Token to do fine-grained access control for 
TaskTracker and Map-Reduce Task processes on HDFS.
 
In Hadoop 0.20.204, all TaskTrackers can use their kerberos credentials to 
access any files for MapReduce on HDFS, so they have the same privilege as 
the JobTracker to read or write tokens, copy job files, etc. However, if one 
of them is compromised, every critical thing in the MapReduce directory (job 
file, Delegation Token) is exposed to the attacker. I solve this problem by 
making the JobTracker decide which TaskTracker can access which file in the 
MapReduce directory on HDFS.
 
For a Task process, once it gets an HDFS Delegation Token, it can access 
everything belonging to its job or user on HDFS. With my design, it can only 
access the bytes it needs from HDFS.
 
There are other security improvements as well, such as the TaskTracker not 
being able to learn information like the blockID from the Block Token 
(because it is encrypted in my design), and HDFS optionally setting up a 
secure channel to send data.
 
With those features, Hadoop can run much more securely in an uncertain 
environment such as a public Cloud. I have already started to test my 
prototype. I would like to know whether the community is interested in this 
work. Is it valuable work to contribute to production Hadoop?

  was:
I have two major goals in this project.

One is to bring fine-grained access control to Hadoop. As of 0.20.204, Hadoop 
access control works at user or block granularity, e.g. the HDFS Delegation 
Token only checks whether a file can be accessed by a certain user, and the 
Block Token only proves which block or blocks can be accessed. I would like to 
make Hadoop able to do byte-granularity access control: each accessing party, 
user or task process, can only access the bytes it minimally needs.

The second is to make Hadoop work more securely in a Cloud environment, 
especially a public Cloud environment. The communication between hadoop's 
nodes should be protected, and if some nodes of hadoop are compromised, the 
damage should be minimized (e.g. the known widely-shared-key problem of the 
Block Access Token).


 Make Hadoop run more securely in a public cloud environment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m


[jira] [Updated] (HADOOP-8763) Set group owner on Windows failed

2012-09-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-8763:


Description: 
RawLocalFileSystem.setOwner() method may incorrectly set the group owner of a 
file on Windows.

Specifically, the following function in the RawLocalFileSystem class will fail 
on Windows when username is null, i.e. when only group ownership is being set:
{code}
public void setOwner(Path p, String username, String groupname)
{code}
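
For illustration, a minimal null-safe sketch (a hypothetical helper, not the 
attached patch) of how the ownership argument for the underlying chown-style 
command could be assembled so that a group-only change is expressed correctly:

{code}
// Hypothetical helper, not the attached patch: build the "user:group"
// argument null-safely so that a group-only change (username == null)
// does not degenerate into the literal string "null:group".
private static String ownershipArg(String username, String groupname) {
  if (username == null && groupname == null) {
    throw new IllegalArgumentException(
        "username and groupname cannot both be null");
  }
  if (username == null) {
    return ":" + groupname;  // group-only change
  }
  if (groupname == null) {
    return username;         // user-only change
  }
  return username + ":" + groupname;
}
{code}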


  was:RawLocalFileSystem.setOwner() method may incorrectly set the group owner 
of a file on Windows.


 Set group owner on Windows failed
 -

 Key: HADOOP-8763
 URL: https://issues.apache.org/jira/browse/HADOOP-8763
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Chuan Liu
Assignee: Chuan Liu
Priority: Minor
 Fix For: 1-win

 Attachments: HADOOP-8763-branch-1-win-2.patch, 
 HADOOP-8763-branch-1-win.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8786) HttpServer continues to start even if AuthenticationFilter fails to init

2012-09-13 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455112#comment-13455112
 ] 

Uma Maheswara Rao G commented on HADOOP-8786:
-

Back-ported to branch-2; committed revision 1384456. Attached the ported patch.

 HttpServer continues to start even if AuthenticationFilter fails to init
 

 Key: HADOOP-8786
 URL: https://issues.apache.org/jira/browse/HADOOP-8786
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 1.2.0, 3.0.0, 2.0.1-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 3.0.0

 Attachments: HADOOP-8786-branch-2.patch, hadoop-8786.txt


 As seen in HDFS-3904, if the AuthenticationFilter fails to initialize, the 
 web server will continue to start up. We need to check for context 
 initialization errors after starting the server.
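
 As a sketch of the intended check (assuming the Jetty 6 {{WebAppContext}} 
 that HttpServer wraps; this is not necessarily the committed patch):

{code}
// Sketch, assuming Jetty 6's WebAppContext; not necessarily the committed
// patch. After start(), verify that the web app context actually
// initialized, and fail fast if a filter such as AuthenticationFilter
// threw during init.
webServer.start();
Throwable initError = webAppContext.getUnavailableException();
if (initError != null) {
  webServer.stop();
  throw new IOException("Unable to initialize WebAppContext", initError);
}
{code}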

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8786) HttpServer continues to start even if AuthenticationFilter fails to init

2012-09-13 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HADOOP-8786:


Attachment: HADOOP-8786-branch-2.patch

 HttpServer continues to start even if AuthenticationFilter fails to init
 

 Key: HADOOP-8786
 URL: https://issues.apache.org/jira/browse/HADOOP-8786
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 1.2.0, 3.0.0, 2.0.1-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 3.0.0

 Attachments: HADOOP-8786-branch-2.patch, hadoop-8786.txt



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8786) HttpServer continues to start even if AuthenticationFilter fails to init

2012-09-13 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455113#comment-13455113
 ] 

Uma Maheswara Rao G commented on HADOOP-8786:
-

I will port this to branch-1 sometime later.

 HttpServer continues to start even if AuthenticationFilter fails to init
 

 Key: HADOOP-8786
 URL: https://issues.apache.org/jira/browse/HADOOP-8786
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 1.2.0, 3.0.0, 2.0.1-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 3.0.0

 Attachments: HADOOP-8786-branch-2.patch, hadoop-8786.txt



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8804) Improve Web UIs when the wildcard address is used

2012-09-13 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8804:
---

 Summary: Improve Web UIs when the wildcard address is used
 Key: HADOOP-8804
 URL: https://issues.apache.org/jira/browse/HADOOP-8804
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.0-alpha, 1.0.0
Reporter: Eli Collins
Priority: Minor


When IPC addresses are bound to the wildcard (i.e. the default config), the 
NN and JT (and probably the RM etc.) Web UIs are a little goofy, e.g. 0 
Hadoop Map/Reduce Administration and NameNode '0.0.0.0:18021' (active). 
Let's improve them.
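
One plausible direction, as a minimal sketch (hypothetical helper, not 
existing Hadoop code): substitute the actual local hostname when the bound 
address is the wildcard, before rendering it in the UI.

{code}
// Hypothetical helper, not existing Hadoop code: render a bound address
// for the web UI, substituting the local hostname when the server is
// bound to the wildcard address (0.0.0.0).
static String addressForUi(InetSocketAddress addr) throws UnknownHostException {
  if (addr.getAddress() != null && addr.getAddress().isAnyLocalAddress()) {
    return InetAddress.getLocalHost().getHostName() + ":" + addr.getPort();
  }
  return addr.getHostName() + ":" + addr.getPort();
}
{code}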

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment

2012-09-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455120#comment-13455120
 ] 

Todd Lipcon commented on HADOOP-8803:
-

Hi Xianqing,

To me, the latter is much more interesting than the former. Byte-range access 
control implies either an incredibly large amount of meta-data per file, or 
implies HDFS having semantic understanding of the files it stores. Neither 
seems tenable given our architecture and design goals.

Increasing the granularity of access control provided by the token mechanisms 
could be useful, but you may run up against compatibility issues. It will 
require a bit of finesse to ensure that old clients continue to operate 
compatibly, etc. So, it may turn out to be interesting research, but don't be 
discouraged if the amount of work to make it feasible to commit into the 
mainline is too much to be worth it.

 Make Hadoop run more securely in a public cloud environment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455130#comment-13455130
 ] 

Hudson commented on HADOOP-8801:


Integrated in Hadoop-Mapreduce-trunk-Commit #2753 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2753/])
HADOOP-8801. ExitUtil#terminate should capture the exception stack trace. 
Contributed by Eli Collins (Revision 1384435)

 Result = FAILURE
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384435
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ExitUtil.java


 ExitUtil#terminate should capture the exception stack trace
 ---

 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 2.0.2-alpha

 Attachments: hadoop-8801.txt


 ExitUtil#terminate(status,Throwable) should capture and log the stack trace 
 of the given throwable. This will help debug issues like HDFS-3933.
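
 The gist of the change, as a sketch (assuming Hadoop's existing 
 {{StringUtils}} helper; details may differ from the committed patch):

{code}
// Sketch of the intent; may differ from the committed patch. Stringify the
// whole throwable, stack trace included, instead of passing only a message.
public static void terminate(int status, Throwable t) throws ExitException {
  terminate(status, StringUtils.stringifyException(t));
}
{code}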

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified

2012-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455131#comment-13455131
 ] 

Hudson commented on HADOOP-8795:


Integrated in Hadoop-Mapreduce-trunk-Commit #2753 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2753/])
HADOOP-8795. BASH tab completion doesn't look in PATH, assumes path to 
executable is specified. Contributed by Sean Mackrory. (Revision 1384436)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384436
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/contrib/bash-tab-completion/hadoop.sh


 BASH tab completion doesn't look in PATH, assumes path to executable is 
 specified
 -

 Key: HADOOP-8795
 URL: https://issues.apache.org/jira/browse/HADOOP-8795
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Affects Versions: 2.0.0-alpha
Reporter: Sean Mackrory
Assignee: Sean Mackrory
Priority: Minor
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8795.patch


 bash-tab-completion/hadoop.sh checks that the first token in the command is 
 an existing, executable file - which assumes that the path to the hadoop 
 executable is specified (or that it's in the working directory). If the 
 executable is somewhere else in PATH, tab completion will not work.
 I propose that the first token be passed through 'which' so that any 
 executables in the path also get detected. I've tested that this technique 
 will work in the event that relative and absolute paths are used as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8763) Set group owner on Windows failed

2012-09-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455156#comment-13455156
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-8763:
-

We can just leave the public constant Shell.SET_GROUP_COMMAND around, or 
deprecate it. I am okay leaving it around.

Not sure of your usage of asserts vs. exit codes, but in src/winutils/chown.c, 
instead of asserting on a zero-length string, shouldn't we log a message to 
stderr and return EXIT_FAILURE? Also, if both are empty, shouldn't you return 
EXIT_FAILURE as well?

bq. +On Linux, if a colon but no group name follows the user name, the group of\n\
+the files is changed to that user\'s login group.

This code won't be invoked on Linux because, ahem, this is winutils? In any 
case, that is not behaviour I know of; a chown user: filename shouldn't 
change the group name.

 Set group owner on Windows failed
 -

 Key: HADOOP-8763
 URL: https://issues.apache.org/jira/browse/HADOOP-8763
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Chuan Liu
Assignee: Chuan Liu
Priority: Minor
 Fix For: 1-win

 Attachments: HADOOP-8763-branch-1-win-2.patch, 
 HADOOP-8763-branch-1-win.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment

2012-09-13 Thread Xianqing Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455162#comment-13455162
 ] 

Xianqing Yu commented on HADOOP-8803:
-

Hi Todd,

Byte-range access control is mainly intended to minimize an attacker's damage 
if one TaskTracker is compromised. If a user reads his or her own file 
through DFSClient, nothing changes from the current behavior; byte-range 
access control comes into play when a TaskTracker or task tries to access 
HDFS. For instance, when the JobClient computes the FileSplits, it knows 
which bytes in HDFS will be used as input for each task, so it can ensure 
that the task process can later access only those bytes. A similar story 
holds for the TaskTracker. So the goal is to minimize what an attacker can 
obtain once one machine is compromised.

Compatibility is an issue, but I minimize it by applying byte-level access 
control only where necessary. Currently it is used only by tasks accessing 
their input files, and by TaskTrackers reading the job file, the job 
configuration file, and the Delegation Token. By the way, in my design each 
task uses a different HDFS Delegation Token, even within a single job, so by 
stealing one Delegation Token an attacker can only read the input data that 
that one task needed.
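
As a purely illustrative sketch of that flow (the names {{tokenMinter}}, 
{{mintRestricted}}, and {{taskCredentials}} are hypothetical, not Hadoop 
APIs): once getSplits() has run, one restricted token per task could be 
minted to cover exactly its split's byte range.

{code}
// Purely illustrative; tokenMinter, mintRestricted and taskCredentials are
// hypothetical names, not Hadoop APIs. After the JobClient computes the
// splits as usual, mint one delegation token per task that is restricted
// to exactly that task's input byte range.
for (FileSplit split : splits) {
  Token<?> perTaskToken = tokenMinter.mintRestricted(
      split.getPath(), split.getStart(), split.getLength());
  taskCredentials(split).addToken(
      new Text(split.getPath().toString()), perTaskToken);
}
{code}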

 Make Hadoop run more securely in a public cloud environment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HADOOP-8731) Public distributed cache support for Windows

2012-09-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned HADOOP-8731:
---

Assignee: Vinod Kumar Vavilapalli  (was: Ivan Mitic)

 Public distributed cache support for Windows
 

 Key: HADOOP-8731
 URL: https://issues.apache.org/jira/browse/HADOOP-8731
 Project: Hadoop Common
  Issue Type: Bug
  Components: filecache
Reporter: Ivan Mitic
Assignee: Vinod Kumar Vavilapalli
 Attachments: HADOOP-8731-PublicCache.patch


 A distributed cache file is considered public (sharable between MR jobs) if 
 OTHER has read permissions on the file and +x permissions all the way up in 
 the folder hierarchy. By default, Windows permissions are mapped to 700 all 
 the way up to the drive letter, and it is unreasonable to ask users to change 
 the permission on the whole drive to make the file public. IOW, it is hardly 
 possible to have public distributed cache on Windows. 
 To enable the scenario and make it more Windows friendly, the criteria on 
 when a file is considered public should be relaxed. One proposal is to check 
 whether the user has given EVERYONE group permission on the file only (and 
 discard the +x check on parent folders).
 Security considerations for the proposal: Default permissions on Unix 
 platforms are usually 775 or 755 meaning that OTHER users can read and 
 list folders by default. What this also means is that Hadoop users have to 
 explicitly make the files private in order to make them private in the 
 cluster (please correct me if this is not the case in real life!). On 
 Windows, default permissions are 700. This means that by default all files 
 are private. In the new model, if users want to make them public, they have 
 to explicitly add EVERYONE group permissions on the file. 
 TestTrackerDistributedCacheManager fails because of this issue.
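
 A sketch of the relaxed check being proposed (illustrative only; 
 {{ancestorsHaveExecute}} is a hypothetical stand-in for the existing 
 ancestor +x walk):

{code}
// Illustrative sketch of the proposal, not committed code. On Windows a
// file counts as public if EVERYONE (i.e. "other") can read it; on Unix
// the existing check also requires +x on all ancestor directories.
// ancestorsHaveExecute is a hypothetical stand-in for that existing walk.
static boolean isPublic(FileSystem fs, Path file) throws IOException {
  FsAction other = fs.getFileStatus(file).getPermission().getOtherAction();
  if (Shell.WINDOWS) {
    return other.implies(FsAction.READ);
  }
  return other.implies(FsAction.READ)
      && ancestorsHaveExecute(fs, file.getParent());
}
{code}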

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment

2012-09-13 Thread Xianqing Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455167#comment-13455167
 ] 

Xianqing Yu commented on HADOOP-8803:
-

Hi Todd,

Just to clarify my point: I don't need to modify the current HDFS file 
system, and I don't need per-file metadata. The only thing I modified is the 
access-control check function. The byte-level check is currently used only 
by the TaskTracker and tasks. The JobTracker knows which bytes need to be 
accessed by which TaskTracker, and the JobClient knows which bytes need to 
be accessed by which task. That is why I don't need per-file metadata.

 Make Hadoop run more securely in a public cloud environment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment

2012-09-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455181#comment-13455181
 ] 

Todd Lipcon commented on HADOOP-8803:
-

Oh, I see. So, you'd modify getBlockLocations() so that it returns a block 
token which is byte-range restricted?

 Make Hadoop run more securely in a public cloud environment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment

2012-09-13 Thread Xianqing Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455188#comment-13455188
 ] 

Xianqing Yu commented on HADOOP-8803:
-

Simply put, yes, but it is more than that. Not only does the Block Token 
carry the byte-range restriction; the Delegation Token, which is used to 
obtain the Block Token, also contains the byte-range information. The 
JobClient or JobTracker is the source that generates the byte-range 
information, and both authenticate themselves via Kerberos.
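
To make the shape of that concrete, a purely illustrative sketch (not Hadoop 
code) of a token identifier carrying a byte range, together with the check a 
DataNode could apply before serving a read:

{code}
// Purely illustrative, not Hadoop code: a token identifier carrying an
// allowed byte range, plus the check a DataNode could apply before
// serving a read request.
class ByteRangeTokenIdentifier {
  final long blockId;
  final long allowedOffset;  // first readable byte within the block
  final long allowedLength;  // number of readable bytes

  ByteRangeTokenIdentifier(long blockId, long offset, long length) {
    this.blockId = blockId;
    this.allowedOffset = offset;
    this.allowedLength = length;
  }

  boolean permitsRead(long reqOffset, long reqLength) {
    return reqOffset >= allowedOffset
        && reqOffset + reqLength <= allowedOffset + allowedLength;
  }
}
{code}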

 Make Hadoop run more securely in a public cloud environment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment

2012-09-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455221#comment-13455221
 ] 

Todd Lipcon commented on HADOOP-8803:
-

Makes sense, but I think it's going to be difficult to plumb through the 
various abstractions here in a clean way that doesn't introduce specific 
dependencies on FileInputFormat, etc. Will be interesting to see how it goes.

 Make Hadoop run more securely in a public cloud environment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment

2012-09-13 Thread Xianqing Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455267#comment-13455267
 ] 

Xianqing Yu commented on HADOOP-8803:
-

My code is indeed spread widely over Hadoop (JobTracker, TaskTracker, 
NameNode, DataNode, JobClient), but I don't need to change FileInputFormat. 
Instead, I let the JobClient compute the splits as usual and generate tokens 
according to the split results. In other words, I don't decide how the files 
are split; I use the split results to decide how the tokens are generated.

 Make Hadoop run more securely in a public cloud environment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HADOOP-8731) Public distributed cache support for Windows

2012-09-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned HADOOP-8731:
---

Assignee: Ivan Mitic  (was: Vinod Kumar Vavilapalli)

Reverting accidental assignment.

 Public distributed cache support for Windows
 

 Key: HADOOP-8731
 URL: https://issues.apache.org/jira/browse/HADOOP-8731
 Project: Hadoop Common
  Issue Type: Bug
  Components: filecache
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: HADOOP-8731-PublicCache.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8802) TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11

2012-09-13 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455273#comment-13455273
 ] 

Andy Isaacson commented on HADOOP-8802:
---

This was fixed on trunk with HADOOP-7290, just FYI.  Probably best to backport 
that patch wholesale.

 TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11
 --

 Key: HADOOP-8802
 URL: https://issues.apache.org/jira/browse/HADOOP-8802
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 1.0.3
 Environment: Built with the IBM Java 6 SR11 SDK, Linux RHEL 6.2 64-bit, 
 x86_64
Reporter: Amir Sanjar
Priority: Minor
 Fix For: 1.0.3

 Attachments: HADOOP-8802.patch


 Testsuite: org.apache.hadoop.security.TestUserGroupInformation
 Tests run: 10, Failures: 0, Errors: 1, Time elapsed: 0.264 sec
 - Standard Output ---
 2012-09-13 10:57:59,771 WARN  conf.Configuration 
 (Configuration.java:clinit(192)) - DEPRECATED: hadoop-site.xml found in the 
 classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, 
 mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, 
 mapred-default.xml and hdfs-default.xml respectively
 sanjar:sanjar dialout desktop_admin_r
 -  ---
 Testcase: testGetServerSideGroups took 0.036 sec
   Caused an ERROR
 expected:d[ialout] but was:d[esktop_admin_r]
   at 
 org.apache.hadoop.security.TestUserGroupInformation.testGetServerSideGroups(TestUserGroupInformation.java:108)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common

2012-09-13 Thread Bo Wang (JIRA)
Bo Wang created HADOOP-8805:
---

 Summary: Move protocol buffer implementation of 
GetUserMappingProtocol from HDFS to Common
 Key: HADOOP-8805
 URL: https://issues.apache.org/jira/browse/HADOOP-8805
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.0-alpha
Reporter: Bo Wang
Assignee: Bo Wang


org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. 
We should move the protocol buffer implementation from HDFS to Common so that 
it can also be used by YARN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment

2012-09-13 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455357#comment-13455357
 ] 

Owen O'Malley commented on HADOOP-8803:
---

The performance impact of trying to do fine-grained permission checks at the 
level of byte ranges of particular files will likely be too high to make them 
feasible. Further, almost all MapReduce map tasks require access to byte ranges 
outside of the file split byte range that they are assigned. The most you could 
realistically hope to accomplish for most jobs is to limit the job to 
particular directories or files. (Some jobs require more than this and so this 
would have to be optional.)

There is already a jira to change the block token protocol so that root is not 
required to run a datanode, which will significantly change how the block 
tokens are used.

In security work, generally you divide machines into security zones. All of the 
datanode/tasktracker machines are in the same zone, so segregating between 
them isn't very productive. In particular, the datanodes need the ability to be 
trusted by other datanodes. In the same way, the tasktracker is already running 
arbitrary user code and giving it the permissions assigned to the job. 
Enforcing better permission separation between the namenode/jobtracker and the 
datanodes/tasktrackers does make sense and could make the system stronger.


 Make Hadoop run more securely in a public cloud environment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8756) Fix SEGV when libsnappy is in java.library.path but not LD_LIBRARY_PATH

2012-09-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455371#comment-13455371
 ] 

Colin Patrick McCabe commented on HADOOP-8756:
--

Filed HADOOP-8806 to discuss searching {{java.library.path}}.

 Fix SEGV when libsnappy is in java.library.path but not LD_LIBRARY_PATH
 ---

 Key: HADOOP-8756
 URL: https://issues.apache.org/jira/browse/HADOOP-8756
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 2.0.2-alpha
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HADOOP-8756.002.patch, HADOOP-8756.003.patch, 
 HADOOP-8756.004.patch


 We use {{System.loadLibrary(snappy)}} from the Java side.  However in 
 libhadoop, we use {{dlopen}} to open libsnappy.so dynamically.  
 System.loadLibrary uses {{java.library.path}} to resolve libraries, and 
 {{dlopen}} uses {{LD_LIBRARY_PATH}} and the system paths to resolve 
 libraries.  Because of this, the two library loading functions can be at odds.
 We should fix this so we only load the library once, preferably using the 
 standard Java {{java.library.path}}.
 We should also log the search path(s) we use for {{libsnappy.so}} when 
 loading fails, so that it's easier to diagnose configuration issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455372#comment-13455372
 ] 

Colin Patrick McCabe commented on HADOOP-8806:
--

One issue is that if we ever move from using {{dlopen}} to linking directly 
against libsnappy or libz, this trick won't work. So maybe {{LD_LIBRARY_PATH}} 
is the better way after all? Hmm.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455375#comment-13455375
 ] 

Todd Lipcon commented on HADOOP-8806:
-

IMO we don't want to link against the system's libsnappy for the foreseeable 
future, since it isn't yet packaged widely enough by distributions. libz, being 
an ancient library, would be more reasonable to link against.

So, I think we should do one of:

1) Continue to use dlopen, but explicitly search the java.library.path. It's 
probably easy enough with JNI to grab this system property.
or 2) Statically link libsnappy.a into libhadoop at build time.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8786) HttpServer continues to start even if AuthenticationFilter fails to init

2012-09-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HADOOP-8786:


Target Version/s: 1.2.0
   Fix Version/s: 2.0.3-alpha

Adding 2.0.3 to the fix versions for Uma, who was having some JIRA troubles. 
Leaving this open for commit to branch-1.

 HttpServer continues to start even if AuthenticationFilter fails to init
 

 Key: HADOOP-8786
 URL: https://issues.apache.org/jira/browse/HADOOP-8786
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 1.2.0, 3.0.0, 2.0.1-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 3.0.0, 2.0.3-alpha

 Attachments: HADOOP-8786-branch-2.patch, hadoop-8786.txt


 As seen in HDFS-3904, if the AuthenticationFilter fails to initialize, the 
 web server will continue to start up. We need to check for context 
 initialization errors after starting the server.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud envrionment

2012-09-13 Thread Xianqing Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455420#comment-13455420
 ] 

Xianqing Yu commented on HADOOP-8803:
-

Hi Owen,

Thanks for your comments.

Performance is always a trade-off in security design. I am close to testing 
performance, but not there yet. I did consider the performance impact when I 
designed the whole thing. One example is that the byte-range restriction 
information is generated in the JobClient, then stored in the HDFS Delegation 
Token, then transferred to the Block Access Token, and finally checked in the 
DataNodes. The check itself is simple: the datanode only sends out the bytes 
that the byte-range defines. It takes less than 5 lines of Java code, and the 
extra workload is spread over the datanodes in the cluster. (Hadoop's original 
security checks are still performed as well; here I am only talking about the 
byte-range check.)

The second point you raised is very important. In my current implementation, 
two parts use the byte-range check. One is the TaskTracker when it wants to 
access the MapReduce directory on HDFS; I don't see any problem there yet. The 
other part, as you said, is task execution. If a task needs to access content 
beyond what its file split defines, that would be a problem. But I think 
byte-range access control is very important for many job programs, and, just 
as you suggested, we can include it as an option and leave the choice to 
Hadoop's users: better security, or code that is easier to write.

Thanks for pointing out the change to the block token protocol. I will take a 
look at that. That change should be included in 2.0.1, right?

Right, I only fully trust the Namenode and JobTracker. Datanodes and 
tasktrackers are in a less secure zone. The main reason is not only 
productivity, but also the potentially large number of datanodes/tasktrackers 
(which may increase the attack surface). 

As for datanodes needing the ability to be trusted by other datanodes, I 
decided to leave that work to kerberos. In fact, a datanode only trusts the 
Namenode, via kerberos. All other authentication is done using my Block Token.

 Make Hadoop running more secure public cloud envrionment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m

 I am a Ph.D. student at North Carolina State University. I am modifying 
 Hadoop's code (including most parts of Hadoop, e.g. JobTracker, 
 TaskTracker, NameNode, DataNode) to achieve better security.
  
 My major goal is to make Hadoop run more securely in the Cloud 
 environment, especially a public Cloud environment. To achieve that, I 
 redesigned the current security mechanism to provide the following 
 properties:
 1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS 
 access control is at user or block granularity, e.g. the HDFS Delegation 
 Token only checks whether a file can be accessed by a certain user, and 
 the Block Token only proves which block or blocks can be accessed. I make 
 Hadoop do byte-granularity access control: each access party, user, or 
 task process can only access the bytes it actually needs.
 2. I assume that in the public Cloud environment, only the Namenode, 
 secondary Namenode, and JobTracker can be trusted. A large number of 
 Datanodes and TaskTrackers may be compromised, since some of them may be 
 running in less secure environments. So I re-designed the security 
 mechanism to minimize the damage an attacker can do.
  
 a. Re-design the Block Access Token to solve the widely-shared-key problem 
 of HDFS. In the original Block Access Token design, all of HDFS (Namenode 
 and Datanodes) shares one master key to generate Block Access Tokens; if 
 one DataNode is compromised, the attacker can obtain the key and generate 
 any Block Access Token he or she wants.
  
 b. Re-design the HDFS Delegation Token to do fine-grained access control 
 for the TaskTracker and Map-Reduce Task processes on HDFS. 
  
 In Hadoop 0.20.204, all TaskTrackers can use their kerberos credentials 
 to access any files for MapReduce on HDFS, so they have the same 
 privileges as the JobTracker to read or write tokens, copy job files, etc. 
 However, if one of them is compromised, everything critical in the 
 MapReduce directory (job files, Delegation Tokens) is exposed to the 
 attacker. I solve this problem by having the JobTracker decide which 
 TaskTracker can access which file in the MapReduce directory on HDFS.
  
 For a Task process, once it gets an HDFS Delegation Token, it can access 
 everything belonging to its job or user on HDFS. With my design, it can 
 only access the bytes it needs from HDFS.

[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455426#comment-13455426
 ] 

Colin Patrick McCabe commented on HADOOP-8806:
--

What if we
* linked against the system {{libz.so}}
* statically linked in {{libsnappy.a}}

I think that would simplify things considerably and eliminate a lot of 
hair-pulling over obscure configuration issues.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455433#comment-13455433
 ] 

Todd Lipcon commented on HADOOP-8806:
-

Makes sense to me.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455434#comment-13455434
 ] 

Todd Lipcon commented on HADOOP-8806:
-

re static linking, we should make sure we don't accidentally export the 
libsnappy symbols, though -- we'd like someone to be able to pull in 
libhadoop.so but separately link their own snappy from somewhere else if they 
so choose.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout

2012-09-13 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455438#comment-13455438
 ] 

Aaron T. Myers commented on HADOOP-8755:


Not quite sure why test-patch didn't run on this latest patch. Regardless, I've 
just kicked Jenkins manually.

 Print thread dump when tests fail due to timeout 
 -

 Key: HADOOP-8755
 URL: https://issues.apache.org/jira/browse/HADOOP-8755
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov
 Attachments: HADOOP-8755.patch, HADOOP-8755.patch, 
 HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, 
 HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch


 When a test fails due to a timeout, it's often not clear what the root 
 cause is. See HDFS-3364 as an example.
 We can print a dump of all threads in this case; this may help in finding 
 the cause.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455440#comment-13455440
 ] 

Roman Shaposhnik commented on HADOOP-8806:
--

FYI: snappy is now pretty widely available. Even on CentOS 5: 
http://pkgs.org/search/?keyword=snappy

With that in mind, I'd rather link against it dynamically, especially if we 
are not getting rid of the dynamic aspect altogether (libz will remain 
dynamically linked).

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8799) commons-lang version mismatch

2012-09-13 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455452#comment-13455452
 ] 

Giridharan Kesavan commented on HADOOP-8799:


Could you please list the steps to repro? I checked ivy/libraries.properties 
and the commons-configuration pom for transitive dependencies; both have the 
commons-lang version set to 2.4. 

I tried a local build and this is what I see:
{quote}
[ivy:resolve]   found commons-el#commons-el;1.0 in maven2
[ivy:resolve]   found commons-configuration#commons-configuration;1.6 in maven2
[ivy:resolve]   found commons-collections#commons-collections;3.2.1 in maven2
[ivy:resolve]   found commons-lang#commons-lang;2.4 in maven2
{quote}

Could you please try this with clean .ivy2 and .m2 caches?

 commons-lang version mismatch
 -

 Key: HADOOP-8799
 URL: https://issues.apache.org/jira/browse/HADOOP-8799
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 1.0.3
Reporter: Joel Costigliola

 The hadoop install references commons-lang-2.4.jar while the hadoop-core 
 dependency references commons-lang:jar:2.6, as shown in this maven 
 dependency:tree output extract.
 {noformat}
 org.apache.hadoop:hadoop-core:jar:1.0.3:provided
 +- commons-cli:commons-cli:jar:1.2:provided
 +- xmlenc:xmlenc:jar:0.52:provided
 +- commons-httpclient:commons-httpclient:jar:3.0.1:provided
 +- commons-codec:commons-codec:jar:1.4:provided
 +- org.apache.commons:commons-math:jar:2.1:provided
 +- commons-configuration:commons-configuration:jar:1.6:provided
 |  +- commons-collections:commons-collections:jar:3.2.1:provided
 |  +- commons-lang:commons-lang:jar:2.6:provided (version managed from 2.4)
 {noformat}
 The Hadoop install libs should be consistent with the hadoop-core maven 
 dependencies.
 I found this error because I was using a feature available in commons-lang 
 2.6 that failed when executed in my hadoop cluster (but not in my pigunit 
 tests).
 One last remark: it would be nice to display the classpath used by the 
 hadoop cluster while executing a job, because these kinds of errors are 
 not easy to find.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8799) commons-lang version mismatch

2012-09-13 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated HADOOP-8799:
---

Component/s: build

 commons-lang version mismatch
 -

 Key: HADOOP-8799
 URL: https://issues.apache.org/jira/browse/HADOOP-8799
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 1.0.3
Reporter: Joel Costigliola

 The hadoop install references commons-lang-2.4.jar while the hadoop-core 
 dependency references commons-lang:jar:2.6, as shown in this maven 
 dependency:tree output extract.
 {noformat}
 org.apache.hadoop:hadoop-core:jar:1.0.3:provided
 +- commons-cli:commons-cli:jar:1.2:provided
 +- xmlenc:xmlenc:jar:0.52:provided
 +- commons-httpclient:commons-httpclient:jar:3.0.1:provided
 +- commons-codec:commons-codec:jar:1.4:provided
 +- org.apache.commons:commons-math:jar:2.1:provided
 +- commons-configuration:commons-configuration:jar:1.6:provided
 |  +- commons-collections:commons-collections:jar:3.2.1:provided
 |  +- commons-lang:commons-lang:jar:2.6:provided (version managed from 2.4)
 {noformat}
 The Hadoop install libs should be consistent with the hadoop-core maven 
 dependencies.
 I found this error because I was using a feature available in commons-lang 
 2.6 that failed when executed in my hadoop cluster (but not in my pigunit 
 tests).
 One last remark: it would be nice to display the classpath used by the 
 hadoop cluster while executing a job, because these kinds of errors are 
 not easy to find.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455459#comment-13455459
 ] 

Todd Lipcon commented on HADOOP-8806:
-

bq. FYI: snappy is now pretty widely available. Even on CentOS 5: 
http://pkgs.org/search/?keyword=snappy

Yeah, but only in EPEL/elforge, which is a little annoying (those repos are 
not always enabled on production systems).

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455470#comment-13455470
 ] 

Colin Patrick McCabe commented on HADOOP-8806:
--

bq. With that in mind, I'd rather link against it dynamically, especially if 
we are not getting rid of the dynamic aspect altogether (libz will remain 
dynamically linked).

I think a lot of people still don't have libsnappy installed, and they would 
perceive being required to install it to use libhadoop as a regression.  In 
contrast, nearly every system in existence has libz installed.

bq. re static linking, we should make sure we don't accidentally export the 
libsnappy symbols, though – we'd like someone to be able to pull in 
libhadoop.so but separately link their own snappy from somewhere else if they 
so choose.

I agree that we should not export the libsnappy symbols.  I mean really we 
should not be exporting any symbols except the ones that the JVM invokes.  But 
that's a bit of a separate issue.

bq. [epel discussion]

EPEL isn't officially supported by Red Hat, and a lot of systems don't install 
packages from there.  There are other third-party repos for Red Hat, and some 
of them conflict with one another, as I found out (but that's another story).

For now, I think we have to assume that most users will not have libsnappy 
provided by their base OS.  When that changes in a few years we can revisit 
this.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud envrionment

2012-09-13 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455473#comment-13455473
 ] 

Luke Lu commented on HADOOP-8803:
-

Hi Xianqing, if I understand your proposal correctly, you're essentially 
trying to do two things:

# A more restrictive HDFS delegation token, to reduce the damage in case a 
token is compromised. As Owen said, a byte-range check based on split info 
won't work for many mapreduce jobs, where splits cross record boundaries. 
Once you consider all the corner cases, the only thing that would always work 
is a file-level check. You would have to design an ACL language that covers 
all the cases, with a default that works for most of them. You'll need to 
account for all the distributed cache files as well.
# Unique secret keys for every datanode to generate block tokens, to reduce 
the damage in case a datanode is compromised. This means you need a block 
token per replica, unlike the current single block token for all replicas, 
which adds some overhead to normal operations.

A meta question, though: are you sure that all this machinery actually 
increases security in real (vs. theoretical) situations? Your implied 
assumption is that compromise/root escalation on a DN/TT is random and 
uniformly distributed. My guess is that this assumption is wrong. Security 
breaches these days are mostly due to zero-day bugs in system software, so 
the likelihood of all the TTs/DNs being compromised at the same time is 
extremely high, given that they likely all run the same OS/software versions.

 Make Hadoop running more secure public cloud envrionment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m

 I am a Ph.D. student at North Carolina State University. I am modifying 
 Hadoop's code (including most parts of Hadoop, e.g. JobTracker, 
 TaskTracker, NameNode, DataNode) to achieve better security.
  
 My major goal is to make Hadoop run more securely in the Cloud 
 environment, especially a public Cloud environment. To achieve that, I 
 redesigned the current security mechanism to provide the following 
 properties:
 1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS 
 access control is at user or block granularity, e.g. the HDFS Delegation 
 Token only checks whether a file can be accessed by a certain user, and 
 the Block Token only proves which block or blocks can be accessed. I make 
 Hadoop do byte-granularity access control: each access party, user, or 
 task process can only access the bytes it actually needs.
 2. I assume that in the public Cloud environment, only the Namenode, 
 secondary Namenode, and JobTracker can be trusted. A large number of 
 Datanodes and TaskTrackers may be compromised, since some of them may be 
 running in less secure environments. So I re-designed the security 
 mechanism to minimize the damage an attacker can do.
  
 a. Re-design the Block Access Token to solve the widely-shared-key problem 
 of HDFS. In the original Block Access Token design, all of HDFS (Namenode 
 and Datanodes) shares one master key to generate Block Access Tokens; if 
 one DataNode is compromised, the attacker can obtain the key and generate 
 any Block Access Token he or she wants.
  
 b. Re-design the HDFS Delegation Token to do fine-grained access control 
 for the TaskTracker and Map-Reduce Task processes on HDFS. 
  
 In Hadoop 0.20.204, all TaskTrackers can use their kerberos credentials 
 to access any files for MapReduce on HDFS, so they have the same 
 privileges as the JobTracker to read or write tokens, copy job files, etc. 
 However, if one of them is compromised, everything critical in the 
 MapReduce directory (job files, Delegation Tokens) is exposed to the 
 attacker. I solve this problem by having the JobTracker decide which 
 TaskTracker can access which file in the MapReduce directory on HDFS.
  
 For a Task process, once it gets an HDFS Delegation Token, it can access 
 everything belonging to its job or user on HDFS. With my design, it can 
 only access the bytes it needs from HDFS.
  
 There are some other security improvements, such as the TaskTracker not 
 being able to learn information like the blockID from the Block Token 
 (because it is encrypted in my design), and HDFS can optionally set up a 
 secure channel to send data.
  
 By those features, Hadoop can run much more securely in uncertain 
 environments such as a public Cloud. I have already started testing my 
 prototype. I want to know whether the community is interested in my work. 
 Is it worth contributing to production Hadoop?

[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455485#comment-13455485
 ] 

Roman Shaposhnik commented on HADOOP-8806:
--

bq. [epel discussion]

I think this is a bit of a red herring here. I'm confident that libsnappy 
will get into the distros with time, and when that happens we have to be able 
to offer a choice that is not "recompile your libhadoop.so". The current 
situation, where libsnappy.so gets bundled with hadoop as a separate object 
that can be ignored (if needed), is ideal from that standpoint. Statically 
linking all of it into libhadoop.so is a hammer that I'd rather not use right 
away. 

At this point the problem is that we've got 2 code paths: one in 
org.apache.hadoop.io.compress.snappy that does {{System.loadLibrary("snappy")}} 
and is fine, and the other one, apparently, in libhadoop.so that uses dlopen(). 
Would it be completely out of the question to focus on unifying the two?

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455495#comment-13455495
 ] 

Allen Wittenauer commented on HADOOP-8806:
--

bq.  However, snappy can't be loaded from this directory unless LD_LIBRARY_PATH 
is set to include this directory

Or, IIRC, dlopen will look in the shared library's run path (-rpath for those 
using GNU LD, -R for just about everyone else).  This is the preferred way to 
deal with this outside of Java.  See also the $ORIGIN 'macro', which makes the 
path dynamic based upon the executable's location.  With modern linkers there 
is really no reason to hard-code any paths or set LD_LIBRARY_PATH, thanks to 
these features, unless you are doing something absolutely crazy.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8797) automatically detect JAVA_HOME on Linux, report native lib path similar to class path

2012-09-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455498#comment-13455498
 ] 

Allen Wittenauer commented on HADOOP-8797:
--

bq.  iterate common java locations on Linux starting with Java7 down to Java6

FWIW, -1.

 automatically detect JAVA_HOME on Linux, report native lib path similar to 
 class path
 -

 Key: HADOOP-8797
 URL: https://issues.apache.org/jira/browse/HADOOP-8797
 Project: Hadoop Common
  Issue Type: Improvement
 Environment: Linux
Reporter: Gera Shegalov
Priority: Trivial
 Attachments: HADOOP-8797.patch


 Enhancement 1)
 iterate common java locations on Linux starting with Java7 down to Java6
 Enhancement 2)
 hadoop jnipath to print java.library.path similar to hadoop classpath

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud envrionment

2012-09-13 Thread Xianqing Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455509#comment-13455509
 ] 

Xianqing Yu commented on HADOOP-8803:
-

Hi Luke,

1. No, the more restrictive HDFS delegation token and Block Token are used 
for byte-range access control, and the new Block Token design reduces the 
damage when a Block Token key is compromised. As Owen said, I am thinking of 
offering both the file-level check and the byte-level check as options in the 
configuration file, so users can decide which level of security they want and 
which type of check is compatible with their code. I would like to test those 
kinds of jobs; do you have any examples of such code I can try to run?

2. Yes, I use unique keys. Right, extra block tokens are needed; each Block 
Token can only be used for one datanode. For example, if I want to access 
data stored on datanodes A and B, the Namenode needs to generate two Block 
Tokens and send them to me. This is the largest extra overhead in my design. 
But I think (please correct me if I am wrong) that in original Hadoop, when a 
job is running, the Namenode needs to generate a Block Token whenever a task 
process needs access, which means that for one job the Namenode performs as 
many Block Token generations as there are mappers. So in my work, the extra 
workload only occurs when one mapper needs to access data that lives on more 
than one datanode, and I don't think that is always the case. 

Another argument is that sharing the same key across the whole HDFS cluster 
is too risky; this overhead is a price Hadoop has to pay.

3. That is a very interesting question. In the security area, I think it is 
really hard to find a perfect solution, but we can always find a better way, 
and I do love that we can discuss the various possibilities here. Back to 
your question: zero-day breaches are a really big threat, and they depend on 
a lot of things which, as you said, are mostly beyond Hadoop itself. TTs/DNs 
may run the same OS/software version; however, if hadoop is running in a 
public cloud, they may be running under different cloud providers, the OSes 
may differ, and the people maintaining those machines are different.  

 Make Hadoop running more secure public cloud envrionment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m

 I am a Ph.D. student at North Carolina State University. I am modifying 
 Hadoop's code (including most parts of Hadoop, e.g. JobTracker, 
 TaskTracker, NameNode, DataNode) to achieve better security.
  
 My major goal is to make Hadoop run more securely in the Cloud 
 environment, especially a public Cloud environment. To achieve that, I 
 redesigned the current security mechanism to provide the following 
 properties:
 1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS 
 access control is at user or block granularity, e.g. the HDFS Delegation 
 Token only checks whether a file can be accessed by a certain user, and 
 the Block Token only proves which block or blocks can be accessed. I make 
 Hadoop do byte-granularity access control: each access party, user, or 
 task process can only access the bytes it actually needs.
 2. I assume that in the public Cloud environment, only the Namenode, 
 secondary Namenode, and JobTracker can be trusted. A large number of 
 Datanodes and TaskTrackers may be compromised, since some of them may be 
 running in less secure environments. So I re-designed the security 
 mechanism to minimize the damage an attacker can do.
  
 a. Re-design the Block Access Token to solve the widely-shared-key problem 
 of HDFS. In the original Block Access Token design, all of HDFS (Namenode 
 and Datanodes) shares one master key to generate Block Access Tokens; if 
 one DataNode is compromised, the attacker can obtain the key and generate 
 any Block Access Token he or she wants.
  
 b. Re-design the HDFS Delegation Token to do fine-grained access control 
 for the TaskTracker and Map-Reduce Task processes on HDFS. 
  
 In Hadoop 0.20.204, all TaskTrackers can use their kerberos credentials 
 to access any files for MapReduce on HDFS, so they have the same 
 privileges as the JobTracker to read or write tokens, copy job files, etc. 
 However, if one of them is compromised, everything critical in the 
 MapReduce directory (job files, Delegation Tokens) is exposed to the 
 attacker. I solve this problem by having the JobTracker decide which 
 TaskTracker can access which file in the MapReduce directory on HDFS.
  
 

[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455512#comment-13455512
 ] 

Colin Patrick McCabe commented on HADOOP-8806:
--

bq. [rpath discussion]

The problem is, we don't know at compile-time where libsnappy.so will be.  
Normally there's a make install step where rpaths get injected, but there is 
nothing like that for Hadoop.

Sadly, I have encountered an issue that I think puts the kibosh on the static 
libsnappy idea -- you cannot link a .a into a .so.  I don't know why I didn't 
think of that earlier.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455515#comment-13455515
 ] 

Allen Wittenauer commented on HADOOP-8806:
--

That's why $ORIGIN is a way out of this.  At install time, build a symlink at 
a known path to the out-of-the-way location.

bq.  you cannot link a .a into a .so

Sure you can.  You can always use ar to pull out the objects and then include 
them in your own library.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455519#comment-13455519
 ] 

Allen Wittenauer commented on HADOOP-8806:
--

(p.s., this is pretty much what the compiler does when you statically link...)

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout

2012-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455522#comment-13455522
 ] 

Hadoop QA commented on HADOOP-8755:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12544942/HADOOP-8755.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs-httpfs:

  org.apache.hadoop.hdfs.TestDatanodeBlockScanner

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1454//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1454//console

This message is automatically generated.

 Print thread dump when tests fail due to timeout 
 -

 Key: HADOOP-8755
 URL: https://issues.apache.org/jira/browse/HADOOP-8755
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov
 Attachments: HADOOP-8755.patch, HADOOP-8755.patch, 
 HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, 
 HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch


 When a test fails due to a timeout, it's often not clear what the root 
 cause is. See HDFS-3364 as an example.
 We can print a dump of all threads in this case; this may help in finding 
 the cause.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455525#comment-13455525
 ] 

Andy Isaacson commented on HADOOP-8806:
---

{quote}
bq. you cannot link a .a into a .so

Sure you can. You can always use ar to pull out the objects and then include 
them into your own library.
{quote}
Only if the objects were compiled with {{-fPIC}} and any other requirements 
are met.  My understanding is that PIC is still an issue in the amd64 ABI, but 
I'd have to go check to make sure...

I'd strongly recommend that we continue to dynamically link against 
libsnappy.so, using LD_LIBRARY_PATH if at all possible, but even parsing 
{{java.library.path}} and iterating it to dlopen would be OK.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8755) Print thread dump when tests fail due to timeout

2012-09-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HADOOP-8755:
---

Attachment: HADOOP-8755.patch

Patch looks great, except that it needs to include the Apache License header in 
the new files. Here's an updated patch that adds those.

+1, I'm going to commit this momentarily since the difference between this and 
the last is just comments.

 Print thread dump when tests fail due to timeout 
 -

 Key: HADOOP-8755
 URL: https://issues.apache.org/jira/browse/HADOOP-8755
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov
 Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HADOOP-8755.patch, 
 HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, 
 HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch


 When a test fails due to a timeout, it's often not clear what the root 
 cause is. See HDFS-3364 as an example.
 We can print a dump of all threads in this case; this may help in finding 
 the cause.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455527#comment-13455527
 ] 

Allen Wittenauer commented on HADOOP-8806:
--

The problem with LD_LIBRARY_PATH is that if you are running something that 
isn't Java, you may accidentally introduce a different/conflicting library 
than the one the compiled program is expecting. That's going to lead to some 
very strange errors for the user.  The other possibility is that the end user 
will override LD_LIBRARY_PATH themselves, which puts us back at the original 
problem.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8755) Print thread dump when tests fail due to timeout

2012-09-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HADOOP-8755:
---

   Resolution: Fixed
Fix Version/s: 2.0.3-alpha
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Andrey!

 Print thread dump when tests fail due to timeout 
 -

 Key: HADOOP-8755
 URL: https://issues.apache.org/jira/browse/HADOOP-8755
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HADOOP-8755.patch, 
 HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, 
 HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch


 When a test fails due to a timeout, it's often not clear what the root 
 cause is. See HDFS-3364 as an example.
 We can print a dump of all threads in this case; this may help in finding 
 the cause.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455531#comment-13455531
 ] 

Andy Isaacson commented on HADOOP-8806:
---

Another potential issue -- there is plenty of fun debugging waiting for the 
first developer who tries to have a dynamic libsnappy.so and a static 
snappy.a-in-libhadoop.so in the same executable.  Supposedly that scenario can 
be made to work, but I've had no end of trouble with similar scenarios 
previously.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud envrionment

2012-09-13 Thread Kingshuk Chatterjee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455532#comment-13455532
 ] 

Kingshuk Chatterjee commented on HADOOP-8803:
-

Xianqing, one thing we will need to evaluate is the business value of the 
proposed change. In my mind, Hadoop will slowly be used as an infrastructure 
piece of an overall enterprise-wide data management platform, instead of 
being accessed directly. Hortonworks HDP and IBM BigInsights are steps in 
that direction. I know that China Telecom is (or was) investigating the 
possibility of creating a data mining platform around Hadoop. Long story 
short, just as rarely anyone uses/accesses an RDBMS directly, Hadoop will 
also see itself being wrapped up in middleware layers. And when deployed in 
a cloud setting, there will undoubtedly be additional security layers at the 
physical, application, and network levels, supported and invested in by the 
cloud provider to ensure data security.

Needless to say, all these layers will add their own latency to data access.

So my question is: what business value can we expect to derive from this 
additional security feature in Hadoop? Granted it is open-source, and it's 
our collective sweat invested, but we will need to weigh what should be 
delegated to the product user and what should be built into the product. 

What do you think?

 Make Hadoop running more secure public cloud envrionment
 

 Key: HADOOP-8803
 URL: https://issues.apache.org/jira/browse/HADOOP-8803
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
  Labels: hadoop
   Original Estimate: 2m
  Remaining Estimate: 2m

 I am a Ph.D. student at North Carolina State University. I am modifying 
 Hadoop's code (covering most parts of Hadoop, e.g. the JobTracker, 
 TaskTracker, NameNode, and DataNode) to achieve better security.
  
 My main goal is to make Hadoop run more securely in cloud environments, 
 especially public clouds. To achieve that, I redesigned the current security 
 mechanism to provide the following properties:
 1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS 
 access control works at user or block granularity: the HDFS Delegation Token 
 only checks whether a file can be accessed by a certain user, and the Block 
 Token only proves which block or blocks can be accessed. I enable 
 byte-granularity access control in Hadoop, so each accessing party, user or 
 task process, can access only the bytes it minimally needs.
 2. I assume that in a public cloud environment only the NameNode, secondary 
 NameNode, and JobTracker can be trusted. A large number of DataNodes and 
 TaskTrackers may be compromised because some of them run in less secure 
 environments. So I redesigned the security mechanism to minimize the damage 
 an attacker can do.
  
 a. Redesign the Block Access Token to solve the widely-shared-key problem in 
 HDFS. In the original Block Access Token design, all of HDFS (NameNode and 
 DataNodes) shares one master key to generate Block Access Tokens; if one 
 DataNode is compromised, the attacker obtains the key and can generate any 
 Block Access Token he or she wants (a small sketch of this problem appears 
 below).
  
 b. Redesign the HDFS Delegation Token to do fine-grained access control for 
 the TaskTracker and MapReduce task processes on HDFS. 
  
 In Hadoop 0.20.204, all TaskTrackers can use their Kerberos credentials to 
 access any MapReduce files on HDFS, so they have the same privileges as the 
 JobTracker to read or write tokens, copy job files, etc. However, if one of 
 them is compromised, every critical item in the MapReduce directory (job 
 files, Delegation Tokens) is exposed to the attacker. I solve the problem by 
 letting the JobTracker decide which TaskTracker can access which file in the 
 MapReduce directory on HDFS.
  
 For a task process, once it gets the HDFS Delegation Token, it can access 
 everything belonging to that job or user on HDFS. With my design, it can 
 only access the bytes it needs from HDFS.
  
 There are other security improvements as well: for example, the TaskTracker 
 cannot learn information such as the blockID from the Block Token (because I 
 encrypt it), and HDFS can optionally set up a secure channel to send data.
  
 With those features, Hadoop can run much more securely in untrusted 
 environments such as a public cloud. I have already started testing my 
 prototype. I want to know whether the community is interested in this work. 
 Is it worthwhile to contribute to production Hadoop?
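
(As a minimal sketch of the shared-key problem in point (a), with hypothetical 
field names rather than the real Block Access Token layout: because verifying 
and minting a token use the same HMAC secret, any holder of the master key, 
including a compromised DataNode, can forge a token for any block.)

{noformat}
import java.nio.charset.StandardCharsets;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch only: one cluster-wide master key means token creation and token
// verification need the same secret, so any key holder can mint tokens.
public class SharedKeyTokenSketch {
  static byte[] mintToken(byte[] masterKey, String blockId, String user,
      long expiryMillis) throws Exception {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(masterKey, "HmacSHA1"));
    String payload = blockId + "|" + user + "|" + expiryMillis;
    return mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
  }
}
{noformat}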

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout

2012-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455535#comment-13455535
 ] 

Hudson commented on HADOOP-8755:


Integrated in Hadoop-Hdfs-trunk-Commit #2794 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2794/])
HADOOP-8755. Print thread dump when tests fail due to timeout. Contributed 
by Andrey Klochkov. (Revision 1384627)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384627
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TestTimedOutTestsListener.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TimedOutTestsListener.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/pom.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/pom.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/pom.xml


 Print thread dump when tests fail due to timeout 
 -

 Key: HADOOP-8755
 URL: https://issues.apache.org/jira/browse/HADOOP-8755
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HADOOP-8755.patch, 
 HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, 
 HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch


 When a test fails due to timeout it's often not clear what the root cause is. 
 See HDFS-3364 as an example.
 We can print a dump of all threads in this case; this may help in finding the 
 cause.
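
(As an illustration of the idea, not the attached patch: the JDK's 
ThreadMXBean can produce a jstack-style dump of every live thread. The class 
and method names below are ours.)

{noformat}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative sketch: print a stack dump of all live threads, including
// locked monitors and ownable synchronizers, similar to jstack output.
public class ThreadDumpSketch {
  public static void dumpAllThreads() {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    for (ThreadInfo info : bean.dumpAllThreads(true, true)) {
      System.err.print(info.toString());
    }
  }
}
{noformat}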

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout

2012-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455534#comment-13455534
 ] 

Hudson commented on HADOOP-8755:


Integrated in Hadoop-Common-trunk-Commit #2731 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2731/])
HADOOP-8755. Print thread dump when tests fail due to timeout. Contributed 
by Andrey Klochkov. (Revision 1384627)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384627
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TestTimedOutTestsListener.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TimedOutTestsListener.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/pom.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/pom.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/pom.xml


 Print thread dump when tests fail due to timeout 
 -

 Key: HADOOP-8755
 URL: https://issues.apache.org/jira/browse/HADOOP-8755
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HADOOP-8755.patch, 
 HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, 
 HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch


 When a test fails due to timeout it's often not clear what the root cause is. 
 See HDFS-3364 as an example.
 We can print a dump of all threads in this case; this may help in finding the 
 cause.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout

2012-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455540#comment-13455540
 ] 

Hudson commented on HADOOP-8755:


Integrated in Hadoop-Mapreduce-trunk-Commit #2755 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2755/])
HADOOP-8755. Print thread dump when tests fail due to timeout. Contributed 
by Andrey Klochkov. (Revision 1384627)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384627
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TestTimedOutTestsListener.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TimedOutTestsListener.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/pom.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/pom.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/pom.xml


 Print thread dump when tests fail due to timeout 
 -

 Key: HADOOP-8755
 URL: https://issues.apache.org/jira/browse/HADOOP-8755
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HADOOP-8755.patch, 
 HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, 
 HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch


 When a test fails due to timeout it's often not clear what the root cause is. 
 See HDFS-3364 as an example.
 We can print a dump of all threads in this case; this may help in finding the 
 cause.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455549#comment-13455549
 ] 

Colin Patrick McCabe commented on HADOOP-8806:
--

On x86_64, you cannot link a .a into a .so unless the .a was compiled with 
-fPIC.  Give it a try if you are curious.

The issue here, as I see it, is that a lot of people seem to want to put 
{{libsnappy.so}} in the same folder as {{libhadoop.so}}.  They believe that by 
doing this, we will use that library.  However, currently we do not.  So we 
need to eliminate that gap between people's expectations and reality 
somehow.

A lot of things have been proposed:

* we could manually search {{java.library.path}}, but that is more complex.  
Also, it doesn't work for shared libraries that we link against normally.  
Since every discussion we've ever had about {{dlopen}} has ended with "... and 
eventually, we won't have to do this", that seems like a major downside.

* we could add {{java.library.path}} to {{LD_LIBRARY_PATH}}.  That solves the 
problem for both dlopen'ed and normally linked shared libraries, but it 
requires some changes to initialization scripts.  Alan has argued that this may 
lead to unintended code being loaded.  However, if you can drop evil jars into 
{{java.library.path}}, you can already compromise the system, so this concern 
seems specious.  (You could also drop an evil {{libhadoop.so}} into 
{{java.library.path}}, if you have write access to that path.)  Basically, if 
you can write to {{java.library.path}}, you own the system, simple as 
that.

* we could use {{System.loadLibrary}} to load the shared library, and then use 
{{dlopen(RTLD_NOLOAD | RTLD_GLOBAL)}} to make the library's symbols accessible 
to {{libhadoop.so}} (see the sketch below).  This solves the problem with 
minimal code change, but it's Linux-specific and suffers from many of the same 
problems as the first solution.

* static linking was proposed, but it seems infeasible, so forget that.

I think I'm leaning towards solution #2, which would basically mean closing 
this JIRA as WONTFIX.
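
(For reference, a minimal sketch of the Java half of the third option; the 
class name is ours, and the native half appears only in comments. 
{{System.loadLibrary}} does search {{java.library.path}}, so the library gets 
mapped into the process; native code could then {{dlopen}} the already-loaded 
library with {{RTLD_NOLOAD | RTLD_GLOBAL}} to expose its symbols to 
{{libhadoop.so}} without a second filesystem search.)

{noformat}
// Hypothetical sketch of option #3's Java half. System.loadLibrary consults
// java.library.path, unlike a bare dlopen() inside libhadoop.so, which only
// consults the dynamic linker's own search path (LD_LIBRARY_PATH, ldconfig
// cache, etc.). After this runs, the native side would call
// dlopen("libsnappy.so.1", RTLD_NOLOAD | RTLD_GLOBAL) to re-export symbols.
public final class SnappyPreload {
  static final boolean LOADED;
  static {
    boolean ok;
    try {
      System.loadLibrary("snappy"); // resolves libsnappy.so via java.library.path
      ok = true;
    } catch (UnsatisfiedLinkError e) {
      ok = false; // fall back to libhadoop's existing dlopen behavior
    }
    LOADED = ok;
  }
  private SnappyPreload() {}
}
{noformat}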

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen

2012-09-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1342#comment-1342
 ] 

Allen Wittenauer commented on HADOOP-8806:
--

It's pretty clear that I'm not making my point given the summary, so I'm just 
going to let it drop and prepare yet another local patch to back this total 
mess out after it inevitably gets committed.

 libhadoop.so: search java.library.path when calling dlopen
 --

 Key: HADOOP-8806
 URL: https://issues.apache.org/jira/browse/HADOOP-8806
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

 libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}.  These 
 libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory.  For 
 example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this 
 directory.  However, snappy can't be loaded from this directory unless 
 {{LD_LIBRARY_PATH}} is set to include this directory.
 Should we also search {{java.library.path}} when loading these libraries?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8807) Update README and website to reflect HADOOP-8662

2012-09-13 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8807:
---

 Summary: Update README and website to reflect HADOOP-8662
 Key: HADOOP-8807
 URL: https://issues.apache.org/jira/browse/HADOOP-8807
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Reporter: Eli Collins


HADOOP-8662 removed the various tabs from the website. Our top-level README.txt 
and the generated docs still refer to them (e.g. hadoop.apache.org/core, /hdfs, 
etc.). Let's fix that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8791) rm Only deletes non empty directory and files.

2012-09-13 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455575#comment-13455575
 ] 

Hemanth Yamijala commented on HADOOP-8791:
--

It looks like rm cannot even delete empty directories. I tried this on both 
1.0.3 and trunk. We should modify the documentation to specify only that it 
deletes files, right?

 rm Only deletes non empty directory and files.
 

 Key: HADOOP-8791
 URL: https://issues.apache.org/jira/browse/HADOOP-8791
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Affects Versions: 1.0.3, 3.0.0
Reporter: Bertrand Dechoux
Assignee: Jing Zhao
  Labels: documentation
 Attachments: HADOOP-8791-branch-1.patch, HADOOP-8791-trunk.patch


 The documentation (1.0.3) describes the opposite of what rm does.
 It should be "Only delete files and empty directories."
 With regard to files, the size of the file should not matter, should it?
 Or I am totally misunderstanding the semantics of this command, and I am not 
 the only one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8808) Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives

2012-09-13 Thread Hemanth Yamijala (JIRA)
Hemanth Yamijala created HADOOP-8808:


 Summary: Update FsShell documentation to mention deprecation of 
some of the commands, and mention alternatives
 Key: HADOOP-8808
 URL: https://issues.apache.org/jira/browse/HADOOP-8808
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Reporter: Hemanth Yamijala
Assignee: Hemanth Yamijala


In HADOOP-7286, we deprecated the following three commands: dus, lsr and rmr, 
in favour of du -s, ls -r and rm -r respectively. The FsShell documentation 
should be updated to mention these, so that users can start switching. Also, 
there are places where we refer to the deprecated commands as alternatives. 
This can be changed as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8791) rm Only deletes non empty directory and files.

2012-09-13 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455586#comment-13455586
 ] 

Hemanth Yamijala commented on HADOOP-8791:
--

Also, I think the examples in the same documentation section might need an 
update to reflect that empty directories can't be removed.

 rm Only deletes non empty directory and files.
 

 Key: HADOOP-8791
 URL: https://issues.apache.org/jira/browse/HADOOP-8791
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Affects Versions: 1.0.3, 3.0.0
Reporter: Bertrand Dechoux
Assignee: Jing Zhao
  Labels: documentation
 Attachments: HADOOP-8791-branch-1.patch, HADOOP-8791-trunk.patch


 The documentation (1.0.3) describes the opposite of what rm does.
 It should be "Only delete files and empty directories."
 With regard to files, the size of the file should not matter, should it?
 Or I am totally misunderstanding the semantics of this command, and I am not 
 the only one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8733) TestStreamingTaskLog, TestJvmManager, TestLinuxTaskControllerLaunchArgs fail on Windows

2012-09-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455589#comment-13455589
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-8733:
-

Looks good overall, Ivan.

One minor point: in TestJvmManager, instead of creating a dummy file for 
WINDOWS, would it be possible to simulate the Child code as on Linux? Is 
{{final String jvmName = ManagementFactory.getRuntimeMXBean().getName();}} in 
Child.java the call that is used to send the pid from the Child to the TT? If 
so, we should just simulate that code.
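
(For context, a minimal sketch of what that call yields; it relies on the 
HotSpot convention, not guaranteed by the JVM spec, that 
{{RuntimeMXBean.getName()}} returns {{pid@hostname}}.)

{noformat}
import java.lang.management.ManagementFactory;

// Sketch: extract the pid from the JVM name, which on HotSpot JVMs has the
// form "pid@hostname". The format is not guaranteed, hence the fallback.
public class JvmPidSketch {
  public static void main(String[] args) {
    String jvmName = ManagementFactory.getRuntimeMXBean().getName();
    int at = jvmName.indexOf('@');
    String pid = (at > 0) ? jvmName.substring(0, at) : "unknown";
    System.out.println("pid = " + pid);
  }
}
{noformat}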

 TestStreamingTaskLog, TestJvmManager, TestLinuxTaskControllerLaunchArgs fail 
 on Windows
 ---

 Key: HADOOP-8733
 URL: https://issues.apache.org/jira/browse/HADOOP-8733
 Project: Hadoop Common
  Issue Type: Bug
  Components: test
Affects Versions: 1-win
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: HADOOP-8733-scripts.2.patch, 
 HADOOP-8733-scripts.2.patch, HADOOP-8733-scripts.patch


 JIRA tracking test failures related to test .sh script dependencies. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira