[jira] [Created] (HADOOP-8797) automatically detect JAVA_HOME on Linux, report native lib path similar to class path
Gera Shegalov created HADOOP-8797: - Summary: automatically detect JAVA_HOME on Linux, report native lib path similar to class path Key: HADOOP-8797 URL: https://issues.apache.org/jira/browse/HADOOP-8797 Project: Hadoop Common Issue Type: Improvement Environment: Linux Reporter: Gera Shegalov Priority: Trivial Enhancement 1) iterate over common Java install locations on Linux, trying Java 7 first and falling back to Java 6 Enhancement 2) add a hadoop jnipath command that prints java.library.path, similar to hadoop classpath -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
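For illustration, Enhancement 1's probing and Enhancement 2's printing might look like the following Java sketch. The candidate paths and the class name are hypothetical, not from the attached patch (the real change would more likely live in the bin/hadoop shell wrapper):

```java
import java.io.File;

class JavaHomeDetector {
    // Common JDK install locations on Linux, newest first (Java 7 before Java 6).
    // These paths are illustrative; real distributions vary.
    static final String[] DEFAULT_CANDIDATES = {
        "/usr/lib/jvm/java-7-openjdk-amd64",
        "/usr/lib/jvm/java-7-oracle",
        "/usr/lib/jvm/java-6-openjdk-amd64",
        "/usr/lib/jvm/java-6-sun",
    };

    /** Returns the first candidate directory that contains bin/java, or null. */
    static String detect(String[] candidates) {
        for (String dir : candidates) {
            if (new File(dir, "bin/java").isFile()) {
                return dir;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("Detected JAVA_HOME: " + detect(DEFAULT_CANDIDATES));
        // Enhancement 2: report the native library path, like `hadoop classpath`.
        System.out.println(System.getProperty("java.library.path"));
    }
}
```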
[jira] [Updated] (HADOOP-8797) automatically detect JAVA_HOME on Linux, report native lib path similar to class path
[ https://issues.apache.org/jira/browse/HADOOP-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated HADOOP-8797: -- Attachment: HADOOP-8797.patch Please review this patch automatically detect JAVA_HOME on Linux, report native lib path similar to class path - Key: HADOOP-8797 URL: https://issues.apache.org/jira/browse/HADOOP-8797 Project: Hadoop Common Issue Type: Improvement Environment: Linux Reporter: Gera Shegalov Priority: Trivial Attachments: HADOOP-8797.patch Enhancement 1) iterate common java locations on Linux starting with Java7 down to Java6 Enhancement 2) hadoop jnipath to print java.library.path similar to hadoop classpath -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8799) commons-lang version mismatch
Joel Costigliola created HADOOP-8799: Summary: commons-lang version mismatch Key: HADOOP-8799 URL: https://issues.apache.org/jira/browse/HADOOP-8799 Project: Hadoop Common Issue Type: Bug Affects Versions: 1.0.3 Reporter: Joel Costigliola The hadoop install ships commons-lang-2.4.jar while the hadoop-core dependency references commons-lang:jar:2.6, as shown in this extract of maven dependency:tree output.
{noformat}
org.apache.hadoop:hadoop-core:jar:1.0.3:provided
+- commons-cli:commons-cli:jar:1.2:provided
+- xmlenc:xmlenc:jar:0.52:provided
+- commons-httpclient:commons-httpclient:jar:3.0.1:provided
+- commons-codec:commons-codec:jar:1.4:provided
+- org.apache.commons:commons-math:jar:2.1:provided
+- commons-configuration:commons-configuration:jar:1.6:provided
|  +- commons-collections:commons-collections:jar:3.2.1:provided
|  +- commons-lang:commons-lang:jar:2.6:provided (version managed from 2.4)
{noformat}
Hadoop install libs should be consistent with hadoop-core maven dependencies. I found this error because I was using a feature available in commons-lang 2.6 that failed when executed on my hadoop cluster (but not in my PigUnit tests). One last remark: it would be nice to display the classpath used by the hadoop cluster while executing a job, because these kinds of errors are not easy to find. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
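The reporter's last remark (showing the classpath a job actually runs with) needs no special Hadoop support; any task can log the JVM property itself. A minimal sketch, with an illustrative class name:

```java
import java.io.File;

class ShowClasspath {
    /** The JVM's effective classpath; logging this from a task makes
     *  version mismatches like commons-lang 2.4 vs 2.6 easy to spot. */
    static String classpath() {
        return System.getProperty("java.class.path");
    }

    public static void main(String[] args) {
        for (String entry : classpath().split(File.pathSeparator)) {
            System.out.println(entry);  // one jar or directory per line
        }
    }
}
```

Dropping a call like this into a mapper's setup would have surfaced the stray commons-lang-2.4.jar immediately.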
[jira] [Commented] (HADOOP-8787) KerberosAuthenticationHandler should include missing property names in configuration
[ https://issues.apache.org/jira/browse/HADOOP-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454880#comment-13454880 ] Ted Malaska commented on HADOOP-8787: - Cool. Thanks Alejandro. I will get an updated patch up soon. KerberosAuthenticationHandler should include missing property names in configuration Key: HADOOP-8787 URL: https://issues.apache.org/jira/browse/HADOOP-8787 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 1.0.3, 3.0.0, 2.0.1-alpha Reporter: Todd Lipcon Assignee: Ted Malaska Priority: Minor Labels: newbie Attachments: HADOOP-8787-0.patch, HADOOP-8787-1.patch, HADOOP-8787-2.patch Currently, if the spnego keytab is missing from the configuration, the user gets an error like: javax.servlet.ServletException: Principal not defined in configuration. This should be augmented to actually show the configuration variable which is missing. Otherwise it is hard for a user to know what to fix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
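The improvement under review boils down to naming the missing key in the exception message. A hedged sketch of that pattern, not the actual KerberosAuthenticationHandler code, and using IllegalArgumentException instead of ServletException to stay dependency-free:

```java
import java.util.Properties;

class ConfigCheck {
    /** Returns the value for key, or throws with the missing key named,
     *  so the user knows exactly which property to fix. */
    static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "Principal not defined in configuration: property '" + key + "' is missing");
        }
        return value;
    }
}
```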
[jira] [Created] (HADOOP-8800) Dynamic Compress Stream
yankay created HADOOP-8800: -- Summary: Dynamic Compress Stream Key: HADOOP-8800 URL: https://issues.apache.org/jira/browse/HADOOP-8800 Project: Hadoop Common Issue Type: New Feature Components: io Affects Versions: 2.0.1-alpha Reporter: yankay We use compression in MapReduce in some cases because it spends CPU to improve IO throughput. But we can only set one compression algorithm in the configuration file. A Hadoop cluster's conditions change all the time, so a single compression algorithm may not work well in every case. Why not provide an algorithm named dynamic? It could change the compression level and algorithm dynamically based on observed performance. Like TCP, it would start slowly and then try to run faster and faster. I will write a detailed design here and try to submit a patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8800) Dynamic Compress Stream
[ https://issues.apache.org/jira/browse/HADOOP-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yankay updated HADOOP-8800: --- Description: We use compression in MapReduce in some cases because it spends CPU to improve IO throughput. But we can only set one compression algorithm in the configuration file. A Hadoop cluster's conditions change all the time, so a single compression algorithm may not work well in every case. Why not provide an algorithm named dynamic? It could change the compression level and algorithm dynamically based on observed performance. Like TCP, it would start slowly and then try to run faster and faster. It could make IO faster by choosing a more suitable compression algorithm. I will write a detailed design here and try to submit a patch. was: We use compress in MapReduce in some case because It use CPU to improve IO throughput. But we can only set one compress algorithm in configure file. The hadoop cluster is changing every time. So a compress algorithm may not work well in all case. Why not provide a algorithm named dynamic. It can change compress level and algorithm dynamic based on performance. Like tcp, it starts up slowly, and try run faster and faster. I would write a detail design here, and try to submit a patch. Dynamic Compress Stream --- Key: HADOOP-8800 URL: https://issues.apache.org/jira/browse/HADOOP-8800 Project: Hadoop Common Issue Type: New Feature Components: io Affects Versions: 2.0.1-alpha Reporter: yankay Labels: patch Original Estimate: 168h Remaining Estimate: 168h We use compression in MapReduce in some cases because it spends CPU to improve IO throughput. But we can only set one compression algorithm in the configuration file. A Hadoop cluster's conditions change all the time, so a single compression algorithm may not work well in every case. Why not provide an algorithm named dynamic? It could change the compression level and algorithm dynamically based on observed performance. Like TCP, it would start slowly and then try to run faster and faster. It could make IO faster by choosing a more suitable compression algorithm. I will write a detailed design here and try to submit a patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
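One way to read the proposal is as a feedback loop over the codec's compression level: probe upward while throughput improves, back off when it drops, much like TCP slow start. A toy sketch; the class name and the policy are illustrative, not from any patch:

```java
/** Toy controller for the "dynamic" idea: adjust the compression level
 *  based on the observed throughput of each chunk. */
class DynamicLevel {
    private int level = 1;       // start cheap, like TCP slow start
    private double best = 0.0;   // best throughput seen so far (bytes/sec)

    /** Feed back the last chunk's throughput; returns the level to use next. */
    int update(double throughput) {
        if (throughput >= best) {
            best = throughput;
            if (level < 9) level++;   // still improving: compress harder
        } else if (level > 1) {
            level--;                  // throughput dropped: back off
        }
        return level;
    }
}
```

A real implementation would also have to switch codecs, not just levels, and smooth the throughput samples, but the control loop would look similar.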
[jira] [Updated] (HADOOP-8787) KerberosAuthenticationHandler should include missing property names in configuration
[ https://issues.apache.org/jira/browse/HADOOP-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HADOOP-8787: Attachment: HADOOP-8787-3.patch Applied changes based on the review. Major changes: 1. KerberosAuthenticationHandler can now get config_prefix from properties. 2. AuthenticationFilter.getConfiguration will now put the config_prefix into the newly created properties object. 3. Also added tests for the new KerberosAuthenticationHandler exceptions. KerberosAuthenticationHandler should include missing property names in configuration Key: HADOOP-8787 URL: https://issues.apache.org/jira/browse/HADOOP-8787 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 1.0.3, 3.0.0, 2.0.1-alpha Reporter: Todd Lipcon Assignee: Ted Malaska Priority: Minor Labels: newbie Attachments: HADOOP-8787-0.patch, HADOOP-8787-1.patch, HADOOP-8787-2.patch, HADOOP-8787-3.patch Currently, if the spnego keytab is missing from the configuration, the user gets an error like: javax.servlet.ServletException: Principal not defined in configuration. This should be augmented to actually show the configuration variable which is missing. Otherwise it is hard for a user to know what to fix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8787) KerberosAuthenticationHandler should include missing property names in configuration
[ https://issues.apache.org/jira/browse/HADOOP-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454941#comment-13454941 ] Hadoop QA commented on HADOOP-8787: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12544989/HADOOP-8787-3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-auth. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1452//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1452//console This message is automatically generated. KerberosAuthenticationHandler should include missing property names in configuration Key: HADOOP-8787 URL: https://issues.apache.org/jira/browse/HADOOP-8787 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 1.0.3, 3.0.0, 2.0.1-alpha Reporter: Todd Lipcon Assignee: Ted Malaska Priority: Minor Labels: newbie Attachments: HADOOP-8787-0.patch, HADOOP-8787-1.patch, HADOOP-8787-2.patch, HADOOP-8787-3.patch Currently, if the spnego keytab is missing from the configuration, the user gets an error like: javax.servlet.ServletException: Principal not defined in configuration. This should be augmented to actually show the configuration variable which is missing. 
Otherwise it is hard for a user to know what to fix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8734) LocalJobRunner does not support private distributed cache
[ https://issues.apache.org/jira/browse/HADOOP-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454943#comment-13454943 ] Ivan Mitic commented on HADOOP-8734: Thanks Bikas. bq. So if I understand this right, this fixes a generic deficiency in LocalJobRunner which wasn't showing up because by default files are public to read on Linux FS and so LocalJobRunner would not see issues in accessing private distributed cache from the local FS. Correct, this is how I see the problem. bq. Also, this would make the change to TestMRWithDistributedCache unnecessary? Given that I'm making a bug fix I should also add a test case that catches the bug. In this case, it was enough to slightly modify one test to catch the bug. Make sense? LocalJobRunner does not support private distributed cache - Key: HADOOP-8734 URL: https://issues.apache.org/jira/browse/HADOOP-8734 Project: Hadoop Common Issue Type: Bug Components: filecache Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HADOOP-8734-LocalJobRunner.patch It seems that LocalJobRunner does not support private distributed cache. The issue is more visible on Windows as all DC files are private by default (see HADOOP-8731). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
Eli Collins created HADOOP-8801: --- Summary: ExitUtil#terminate should capture the exception stack trace Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
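The fix amounts to rendering the throwable's full stack trace into the message that gets logged before exiting. A standalone sketch of that formatting step (not the actual org.apache.hadoop.util.ExitUtil code; the class name here is illustrative):

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class TerminateMessage {
    /** Renders the exit status plus the full stack trace of t, so the
     *  log shows where the fatal error came from, not just that it happened. */
    static String format(int status, Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return "Exiting with status " + status + ": " + sw;
    }
}
```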
[jira] [Updated] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HADOOP-8801: Attachment: hadoop-8801.txt Patch attached. ExitUtil#terminate should capture the exception stack trace --- Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HADOOP-8801: Status: Patch Available (was: Open) ExitUtil#terminate should capture the exception stack trace --- Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454972#comment-13454972 ] Karthik Kambatla commented on HADOOP-8801: -- +1 ExitUtil#terminate should capture the exception stack trace --- Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454999#comment-13454999 ] Hadoop QA commented on HADOOP-8801: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12544995/hadoop-8801.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ha.TestZKFailoverController +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1453//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1453//console This message is automatically generated. ExitUtil#terminate should capture the exception stack trace --- Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8802) TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11
Amir Sanjar created HADOOP-8802: --- Summary: TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11 Key: HADOOP-8802 URL: https://issues.apache.org/jira/browse/HADOOP-8802 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 1.0.3 Environment: Build with IBM JAVA 6sr11 sdk, Linux RHEL 6.2 64bit, x86_64 Reporter: Amir Sanjar Fix For: 1.0.3 Testsuite: org.apache.hadoop.security.TestUserGroupInformation Tests run: 10, Failures: 0, Errors: 1, Time elapsed: 0.264 sec - Standard Output --- 2012-09-13 10:57:59,771 WARN conf.Configuration (Configuration.java:<clinit>(192)) - DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively sanjar:sanjar dialout desktop_admin_r - --- Testcase: testGetServerSideGroups took 0.036 sec Caused an ERROR expected:<d[ialout]> but was:<d[esktop_admin_r]> at org.apache.hadoop.security.TestUserGroupInformation.testGetServerSideGroups(TestUserGroupInformation.java:108) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8802) TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11
[ https://issues.apache.org/jira/browse/HADOOP-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Sanjar updated HADOOP-8802: Priority: Minor (was: Major) TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11 -- Key: HADOOP-8802 URL: https://issues.apache.org/jira/browse/HADOOP-8802 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 1.0.3 Environment: Build with IBM JAVA 6sr11 sdk, Linux RHEL 6.2 64bit, x86_64 Reporter: Amir Sanjar Priority: Minor Fix For: 1.0.3 Testsuite: org.apache.hadoop.security.TestUserGroupInformation Tests run: 10, Failures: 0, Errors: 1, Time elapsed: 0.264 sec - Standard Output --- 2012-09-13 10:57:59,771 WARN conf.Configuration (Configuration.java:<clinit>(192)) - DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively sanjar:sanjar dialout desktop_admin_r - --- Testcase: testGetServerSideGroups took 0.036 sec Caused an ERROR expected:<d[ialout]> but was:<d[esktop_admin_r]> at org.apache.hadoop.security.TestUserGroupInformation.testGetServerSideGroups(TestUserGroupInformation.java:108) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8802) TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11
[ https://issues.apache.org/jira/browse/HADOOP-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Sanjar updated HADOOP-8802: Attachment: HADOOP-8802.patch This is a bug in the Hadoop testcase: the order in which the group names are returned should be irrelevant, and the testcase should not assume a particular order in this case. Solution: validate the stored group names without enforcing the order:
{code}
for (int i = 0; i < gi.length; i++) {
  // assertEquals(groups.get(i), gi[i]);  // order-based check, removed
  assertTrue(groups.contains(gi[i]));     // solution
}
{code}
Note: this solution works on both IBM JAVA and SUN JAVA TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11 -- Key: HADOOP-8802 URL: https://issues.apache.org/jira/browse/HADOOP-8802 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 1.0.3 Environment: Build with IBM JAVA 6sr11 sdk, Linux RHEL 6.2 64bit, x86_64 Reporter: Amir Sanjar Priority: Minor Fix For: 1.0.3 Attachments: HADOOP-8802.patch Testsuite: org.apache.hadoop.security.TestUserGroupInformation Tests run: 10, Failures: 0, Errors: 1, Time elapsed: 0.264 sec - Standard Output --- 2012-09-13 10:57:59,771 WARN conf.Configuration (Configuration.java:<clinit>(192)) - DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively sanjar:sanjar dialout desktop_admin_r - --- Testcase: testGetServerSideGroups took 0.036 sec Caused an ERROR expected:<d[ialout]> but was:<d[esktop_admin_r]> at org.apache.hadoop.security.TestUserGroupInformation.testGetServerSideGroups(TestUserGroupInformation.java:108) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8768) TestDistCp is @ignored
[ https://issues.apache.org/jira/browse/HADOOP-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HADOOP-8768: Priority: Critical (was: Minor) TestDistCp is @ignored -- Key: HADOOP-8768 URL: https://issues.apache.org/jira/browse/HADOOP-8768 Project: Hadoop Common Issue Type: Bug Components: test, tools/distcp Affects Versions: 2.0.2-alpha Reporter: Colin Patrick McCabe Priority: Critical We should fix TestDistCp so that it actually runs, rather than being ignored.
{code}
@Ignore
public class TestDistCp {
  private static final Log LOG = LogFactory.getLog(TestDistCp.class);
  private static List<Path> pathList = new ArrayList<Path>();
  ...
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment
Xianqing Yu created HADOOP-8803: --- Summary: Make Hadoop run more securely in a public cloud environment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803 Project: Hadoop Common Issue Type: New Feature Components: fs, ipc, security Affects Versions: 0.20.204.0 Reporter: Xianqing Yu I have two major goals in this project. The first is to bring fine-grained access control to Hadoop. As of 0.20.204, Hadoop access control works at user or block granularity; e.g. the HDFS Delegation Token only checks whether a file can be accessed by a certain user, and the Block Token only proves which block or blocks can be accessed. I would like to make Hadoop capable of byte-granularity access control, so that each accessing party, whether a user or a task process, can only access the bytes it actually needs. The second is to make Hadoop work more securely in a cloud environment, especially a public cloud. The communication between Hadoop's nodes should be protected, and if some Hadoop nodes are compromised, the damage should be minimized (e.g. the widely known shared-key problem of the Block Access Token). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8768) TestDistCp is @ignored
[ https://issues.apache.org/jira/browse/HADOOP-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455021#comment-13455021 ] Eli Collins commented on HADOOP-8768: - I don't think so, I pinged MR-2765. TestDistCp is @ignored -- Key: HADOOP-8768 URL: https://issues.apache.org/jira/browse/HADOOP-8768 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 2.0.2-alpha Reporter: Colin Patrick McCabe Priority: Critical We should fix TestDistCp so that it actually runs, rather than being ignored.
{code}
@Ignore
public class TestDistCp {
  private static final Log LOG = LogFactory.getLog(TestDistCp.class);
  private static List<Path> pathList = new ArrayList<Path>();
  ...
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows
[ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455024#comment-13455024 ] Bikas Saha commented on HADOOP-8731: Looks like the chmod fixes an existing generic bug. Can you please clarify the following scenario so that other folks reading this thread have it easy? Directory A (perm for user Foo) contains directory B (perm for Everyone) So contents of A will be private cache and contents of B will be public cache on Windows but not on Linux. Public distributed cache support for Windows Key: HADOOP-8731 URL: https://issues.apache.org/jira/browse/HADOOP-8731 Project: Hadoop Common Issue Type: Bug Components: filecache Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HADOOP-8731-PublicCache.patch A distributed cache file is considered public (sharable between MR jobs) if OTHER has read permissions on the file and +x permissions all the way up in the folder hierarchy. By default, Windows permissions are mapped to 700 all the way up to the drive letter, and it is unreasonable to ask users to change the permission on the whole drive to make the file public. IOW, it is hardly possible to have public distributed cache on Windows. To enable the scenario and make it more Windows friendly, the criteria on when a file is considered public should be relaxed. One proposal is to check whether the user has given EVERYONE group permission on the file only (and discard the +x check on parent folders). Security considerations for the proposal: Default permissions on Unix platforms are usually 775 or 755 meaning that OTHER users can read and list folders by default. What this also means is that Hadoop users have to explicitly make the files private in order to make them private in the cluster (please correct me if this is not the case in real life!). On Windows, default permissions are 700. This means that by default all files are private. 
In the new model, if users want to make them public, they have to explicitly add EVERYONE group permissions on the file. TestTrackerDistributedCacheManager fails because of this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
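The Unix-side rule being discussed, that OTHER needs read permission on the file and execute permission on every ancestor directory, can be modeled with plain permission bits. A sketch with octal modes passed in explicitly; this is a model of the rule, not the Hadoop FsPermission API:

```java
class PublicCacheCheck {
    /** True if OTHER can read the file (mode & 04) and traverse
     *  every ancestor directory (mode & 01). */
    static boolean isPublic(int fileMode, int[] ancestorDirModes) {
        if ((fileMode & 04) == 0) {
            return false;               // OTHER cannot read the file itself
        }
        for (int dirMode : ancestorDirModes) {
            if ((dirMode & 01) == 0) {
                return false;           // OTHER cannot traverse this parent
            }
        }
        return true;
    }
}
```

Under typical Unix defaults (a 0644 file beneath 0755 directories) the file comes out public, while a Windows-style 0700 default anywhere in the chain makes it private, which is exactly the asymmetry the issue describes.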
[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455025#comment-13455025 ] Xianqing Yu commented on HADOOP-8803: - I would like to discuss this topic with the Hadoop community to see whether people want or need these features in future Hadoop releases. Please post your thoughts here. Make Hadoop run more securely in a public cloud environment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803 Project: Hadoop Common Issue Type: New Feature Components: fs, ipc, security Affects Versions: 0.20.204.0 Reporter: Xianqing Yu Labels: hadoop Original Estimate: 2m Remaining Estimate: 2m I have two major goals in this project. The first is to bring fine-grained access control to Hadoop. As of 0.20.204, Hadoop access control works at user or block granularity; e.g. the HDFS Delegation Token only checks whether a file can be accessed by a certain user, and the Block Token only proves which block or blocks can be accessed. I would like to make Hadoop capable of byte-granularity access control, so that each accessing party, whether a user or a task process, can only access the bytes it actually needs. The second is to make Hadoop work more securely in a cloud environment, especially a public cloud. The communication between Hadoop's nodes should be protected, and if some Hadoop nodes are compromised, the damage should be minimized (e.g. the widely known shared-key problem of the Block Access Token). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8734) LocalJobRunner does not support private distributed cache
[ https://issues.apache.org/jira/browse/HADOOP-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455043#comment-13455043 ] Bikas Saha commented on HADOOP-8734: Sorry. I got totally confused and misread the test file name in the patch. +1. Thanks! LocalJobRunner does not support private distributed cache - Key: HADOOP-8734 URL: https://issues.apache.org/jira/browse/HADOOP-8734 Project: Hadoop Common Issue Type: Bug Components: filecache Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HADOOP-8734-LocalJobRunner.patch It seems that LocalJobRunner does not support private distributed cache. The issue is more visible on Windows as all DC files are private by default (see HADOOP-8731). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7688) When a servlet filter throws an exception in init(..), the Jetty server failed silently.
[ https://issues.apache.org/jira/browse/HADOOP-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455052#comment-13455052 ] Uma Maheswara Rao G commented on HADOOP-7688: - Ported to branch-2. Committed revision 1384416. When a servlet filter throws an exception in init(..), the Jetty server failed silently. - Key: HADOOP-7688 URL: https://issues.apache.org/jira/browse/HADOOP-7688 Project: Hadoop Common Issue Type: Improvement Affects Versions: 0.23.0, 0.24.0 Reporter: Tsz Wo (Nicholas), SZE Assignee: Uma Maheswara Rao G Fix For: 3.0.0 Attachments: filter-init-exception-test.patch, HADOOP-7688-branch-2.patch, HADOOP-7688.patch, org.apache.hadoop.http.TestServletFilter-output.txt When a servlet filter throws a ServletException in init(..), the exception is logged by Jetty but not re-thrown to the caller. As a result, the Jetty server fails silently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
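The failure mode above can be sketched in miniature: if the start-up path catches a filter's init() exception and only logs it, start-up appears to succeed. The interface and method names below are illustrative stand-ins, not Jetty's or Hadoop's actual API.

```java
import java.util.List;

class FilterInitCheck {
    // Stand-in for a servlet filter; the name is hypothetical.
    interface SimpleFilter { void init() throws Exception; }

    // Mimics the reported bug: the init failure is swallowed and
    // start-up still reports success.
    static boolean startSwallowing(List<SimpleFilter> filters) {
        for (SimpleFilter f : filters) {
            try { f.init(); } catch (Exception e) { /* logged, then ignored */ }
        }
        return true; // "server is up" even though a filter failed
    }

    // The fixed behavior: surface the initialization error to the caller.
    static boolean startChecking(List<SimpleFilter> filters) {
        for (SimpleFilter f : filters) {
            try { f.init(); } catch (Exception e) { return false; }
        }
        return true;
    }

    public static void main(String[] args) {
        List<SimpleFilter> filters =
            List.of(() -> { throw new Exception("bad filter config"); });
        System.out.println(startSwallowing(filters)); // true  (silent failure)
        System.out.println(startChecking(filters));   // false (failure surfaced)
    }
}
```

The attached test patch presumably exercises the real HttpServer in the same spirit: start a server with a deliberately failing filter and assert that start-up fails loudly.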
[jira] [Updated] (HADOOP-7688) When a servlet filter throws an exception in init(..), the Jetty server failed silently.
[ https://issues.apache.org/jira/browse/HADOOP-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HADOOP-7688: Attachment: HADOOP-7688-branch-2.patch Here is the ported patch. When a servlet filter throws an exception in init(..), the Jetty server failed silently. - Key: HADOOP-7688 URL: https://issues.apache.org/jira/browse/HADOOP-7688 Project: Hadoop Common Issue Type: Improvement Affects Versions: 0.23.0, 0.24.0 Reporter: Tsz Wo (Nicholas), SZE Assignee: Uma Maheswara Rao G Fix For: 3.0.0 Attachments: filter-init-exception-test.patch, HADOOP-7688-branch-2.patch, HADOOP-7688.patch, org.apache.hadoop.http.TestServletFilter-output.txt When a servlet filter throws a ServletException in init(..), the exception is logged by Jetty but not re-thrown to the caller. As a result, the Jetty server fails silently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455056#comment-13455056 ] Aaron T. Myers commented on HADOOP-8801: +1, the patch looks good to me. ExitUtil#terminate should capture the exception stack trace --- Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
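The proposed behavior can be sketched as follows: render the throwable's full stack trace and log it before exiting, instead of logging only the message. This is a hedged illustration of the idea; the real ExitUtil lives in org.apache.hadoop.util and its implementation may differ.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class ExitUtilSketch {
    // Renders the complete stack trace of a throwable as a string,
    // so it can be logged before the JVM terminates.
    static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    // Sketch of terminate(status, Throwable): capture the trace, then exit.
    static void terminate(int status, Throwable t) {
        // Without the fix, only t.getMessage() survives; the trace is lost.
        System.err.println("Exiting with status " + status + ": " + stackTraceOf(t));
        System.exit(status);
    }
}
```

Logging the full trace is what makes post-mortem debugging of issues like HDFS-3933 possible, since the exit site is otherwise invisible in the logs.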
[jira] [Assigned] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified
[ https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers reassigned HADOOP-8795: -- Assignee: Sean Mackrory BASH tab completion doesn't look in PATH, assumes path to executable is specified - Key: HADOOP-8795 URL: https://issues.apache.org/jira/browse/HADOOP-8795 Project: Hadoop Common Issue Type: Bug Reporter: Sean Mackrory Assignee: Sean Mackrory Attachments: HADOOP-8795.patch bash-tab-completion/hadoop.sh checks that the first token in the command is an existing, executable file - which assumes that the path to the hadoop executable is specified (or that it's in the working directory). If the executable is somewhere else in PATH, tab completion will not work. I propose that the first token be passed through 'which' so that any executables in the path also get detected. I've tested that this technique will work in the event that relative and absolute paths are used as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified
[ https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HADOOP-8795: --- Component/s: scripts Priority: Minor (was: Major) Target Version/s: 2.0.3-alpha Affects Version/s: 2.0.0-alpha +1, the patch looks good to me. I'll commit this momentarily. BASH tab completion doesn't look in PATH, assumes path to executable is specified - Key: HADOOP-8795 URL: https://issues.apache.org/jira/browse/HADOOP-8795 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 2.0.0-alpha Reporter: Sean Mackrory Assignee: Sean Mackrory Priority: Minor Attachments: HADOOP-8795.patch bash-tab-completion/hadoop.sh checks that the first token in the command is an existing, executable file - which assumes that the path to the hadoop executable is specified (or that it's in the working directory). If the executable is somewhere else in PATH, tab completion will not work. I propose that the first token be passed through 'which' so that any executables in the path also get detected. I've tested that this technique will work in the event that relative and absolute paths are used as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HADOOP-8801: Resolution: Fixed Fix Version/s: 2.0.2-alpha Target Version/s: (was: 2.0.2-alpha) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the reviews guys. I've committed this and merged to branch-2 and branch-2.0.2-alpha. ExitUtil#terminate should capture the exception stack trace --- Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.0.2-alpha Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455085#comment-13455085 ] Hudson commented on HADOOP-8801: Integrated in Hadoop-Common-trunk-Commit #2729 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2729/]) HADOOP-8801. ExitUtil#terminate should capture the exception stack trace. Contributed by Eli Collins (Revision 1384435) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1384435 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ExitUtil.java ExitUtil#terminate should capture the exception stack trace --- Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.0.2-alpha Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified
[ https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455086#comment-13455086 ] Hudson commented on HADOOP-8795: Integrated in Hadoop-Common-trunk-Commit #2729 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2729/]) HADOOP-8795. BASH tab completion doesn't look in PATH, assumes path to executable is specified. Contributed by Sean Mackrory. (Revision 1384436) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1384436 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/contrib/bash-tab-completion/hadoop.sh BASH tab completion doesn't look in PATH, assumes path to executable is specified - Key: HADOOP-8795 URL: https://issues.apache.org/jira/browse/HADOOP-8795 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 2.0.0-alpha Reporter: Sean Mackrory Assignee: Sean Mackrory Priority: Minor Attachments: HADOOP-8795.patch bash-tab-completion/hadoop.sh checks that the first token in the command is an existing, executable file - which assumes that the path to the hadoop executable is specified (or that it's in the working directory). If the executable is somewhere else in PATH, tab completion will not work. I propose that the first token be passed through 'which' so that any executables in the path also get detected. I've tested that this technique will work in the event that relative and absolute paths are used as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455089#comment-13455089 ] Hudson commented on HADOOP-8801: Integrated in Hadoop-Hdfs-trunk-Commit #2792 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2792/]) HADOOP-8801. ExitUtil#terminate should capture the exception stack trace. Contributed by Eli Collins (Revision 1384435) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1384435 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ExitUtil.java ExitUtil#terminate should capture the exception stack trace --- Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.0.2-alpha Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified
[ https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455090#comment-13455090 ] Hudson commented on HADOOP-8795: Integrated in Hadoop-Hdfs-trunk-Commit #2792 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2792/]) HADOOP-8795. BASH tab completion doesn't look in PATH, assumes path to executable is specified. Contributed by Sean Mackrory. (Revision 1384436) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1384436 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/contrib/bash-tab-completion/hadoop.sh BASH tab completion doesn't look in PATH, assumes path to executable is specified - Key: HADOOP-8795 URL: https://issues.apache.org/jira/browse/HADOOP-8795 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 2.0.0-alpha Reporter: Sean Mackrory Assignee: Sean Mackrory Priority: Minor Attachments: HADOOP-8795.patch bash-tab-completion/hadoop.sh checks that the first token in the command is an existing, executable file - which assumes that the path to the hadoop executable is specified (or that it's in the working directory). If the executable is somewhere else in PATH, tab completion will not work. I propose that the first token be passed through 'which' so that any executables in the path also get detected. I've tested that this technique will work in the event that relative and absolute paths are used as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified
[ https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HADOOP-8795: Resolution: Fixed Fix Version/s: 2.0.3-alpha Target Version/s: (was: 2.0.3-alpha) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) BASH tab completion doesn't look in PATH, assumes path to executable is specified - Key: HADOOP-8795 URL: https://issues.apache.org/jira/browse/HADOOP-8795 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 2.0.0-alpha Reporter: Sean Mackrory Assignee: Sean Mackrory Priority: Minor Fix For: 2.0.3-alpha Attachments: HADOOP-8795.patch bash-tab-completion/hadoop.sh checks that the first token in the command is an existing, executable file - which assumes that the path to the hadoop executable is specified (or that it's in the working directory). If the executable is somewhere else in PATH, tab completion will not work. I propose that the first token be passed through 'which' so that any executables in the path also get detected. I've tested that this technique will work in the event that relative and absolute paths are used as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified
[ https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455092#comment-13455092 ] Aaron T. Myers commented on HADOOP-8795: I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Sean. BASH tab completion doesn't look in PATH, assumes path to executable is specified - Key: HADOOP-8795 URL: https://issues.apache.org/jira/browse/HADOOP-8795 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 2.0.0-alpha Reporter: Sean Mackrory Assignee: Sean Mackrory Priority: Minor Fix For: 2.0.3-alpha Attachments: HADOOP-8795.patch bash-tab-completion/hadoop.sh checks that the first token in the command is an existing, executable file - which assumes that the path to the hadoop executable is specified (or that it's in the working directory). If the executable is somewhere else in PATH, tab completion will not work. I propose that the first token be passed through 'which' so that any executables in the path also get detected. I've tested that this technique will work in the event that relative and absolute paths are used as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8796) commands_manual.html link is broken
[ https://issues.apache.org/jira/browse/HADOOP-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HADOOP-8796: --- Target Version/s: 2.0.3-alpha Fix Version/s: (was: 2.0.2-alpha) Thanks a lot for filing this issue, Roman. In the future, please only set the fix version field once the patch has been committed. To indicate what branch you'd like to see this issue fixed on, please use the target version field. commands_manual.html link is broken --- Key: HADOOP-8796 URL: https://issues.apache.org/jira/browse/HADOOP-8796 Project: Hadoop Common Issue Type: Bug Components: documentation Affects Versions: 2.0.1-alpha Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Priority: Minor If you go to http://hadoop.apache.org/docs/r2.0.0-alpha/ and click on Hadoop Commands you are getting a broken link: http://hadoop.apache.org/docs/r2.0.0-alpha/hadoop-project-dist/hadoop-common/commands_manual.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianqing Yu updated HADOOP-8803: Description: I am a Ph.D. student at North Carolina State University. I am modifying Hadoop's code (covering most parts of Hadoop, e.g. the JobTracker, TaskTracker, NameNode, and DataNode) to achieve better security. My major goal is to make Hadoop run more securely in a cloud environment, especially a public cloud. To achieve that, I redesigned the current security mechanism to provide the following properties: 1. Bring byte-level access control to HDFS. As of 0.20.204, HDFS access control operates at user or block granularity, e.g. the HDFS Delegation Token only checks whether a file can be accessed by a given user, and the Block Token only proves which block or blocks can be accessed. My design lets Hadoop do byte-granularity access control, so that each accessing party, user, or task process can access only the bytes it actually needs. 2. I assume that in a public cloud environment only the NameNode, secondary NameNode, and JobTracker can be trusted. A large number of DataNodes and TaskTrackers may be compromised because some of them run in less secure environments, so I redesigned the security mechanism to minimize the damage an attacker can do. a. Redesign the Block Access Token to solve the widely-shared-key problem of HDFS. In the original Block Access Token design, all of HDFS (NameNode and DataNodes) shares one master key to generate Block Access Tokens; if one DataNode is compromised, the attacker can obtain the key and generate any Block Access Token he or she wants. b. Redesign the HDFS Delegation Token to do fine-grained access control on HDFS for TaskTrackers and MapReduce task processes. In Hadoop 0.20.204, all TaskTrackers can use their Kerberos credentials to access any MapReduce files on HDFS, so they have the same privileges as the JobTracker to read or write tokens, copy job files, etc. If one of them is compromised, everything critical in the MapReduce directory (job files, Delegation Tokens) is exposed to the attacker. I solve this by letting the JobTracker decide which TaskTracker can access which file in the MapReduce directory on HDFS. For a task process, once it gets an HDFS Delegation Token it can currently access everything belonging to the job or user on HDFS; under my design it can access only the bytes it needs. There are other security improvements as well, such as preventing a TaskTracker from learning information like the blockID from the Block Token (because it is encrypted in my scheme), and HDFS can optionally set up a secure channel to send data. With these features, Hadoop can run much more securely in an uncertain environment such as a public cloud. I have already started to test my prototype. I would like to know whether the community is interested in my work. Is it worthwhile to contribute to production Hadoop? was: I have two major goals in the project. One is bring fine-grain access control to Hadoop. Based on 0.20.204, Hadoop access control is based on user or block granularity, e.g. HDFS Delegation Token only check if the file can be accessed by certain user or not, Block Token only proof which block or blocks can be accessed. I would like to make Hadoop can do byte-granularity access control, each access party, user or task process can only access the bytes she or he least needed. Second one is that make Hadoop work more secure in Cloud environment, especially in public Cloud environment. So the communication between hadoop's node should be protected. And if some nodes of hadoop is compromised, the damage should be minimized (e.g. known wildly shared-key problem of Block Access Token problem).
Make Hadoop running more secure public cloud envrionment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803 Project: Hadoop Common Issue Type: New Feature Components: fs, ipc, security Affects Versions: 0.20.204.0 Reporter: Xianqing Yu Labels: hadoop Original Estimate: 2m Remaining Estimate: 2m I am a Ph.D student in North Carolina State University. I am modifying the Hadoop's code (which including most parts of Hadoop, e.g. JobTracker, TaskTracker, NameNode, DataNode) to achieve better security. My major goal is that make Hadoop running more secure in the Cloud environment, especially for public Cloud environment. In order to achieve that, I redesign the currently security mechanism and achieve following proprieties: 1. Bring byte-level access control to Hadoop HDFS. Based on 0.20.204, HDFS access control is based on user or block granularity, e.g. HDFS Delegation Token only check if the file can be accessed
[jira] [Updated] (HADOOP-8763) Set group owner on Windows failed
[ https://issues.apache.org/jira/browse/HADOOP-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HADOOP-8763: Description: The RawLocalFileSystem.setOwner() method may incorrectly set the group owner of a file on Windows. Specifically, the following method in the RawLocalFileSystem class will fail on Windows when username is null, i.e. when only group ownership is being set. {code} public void setOwner(Path p, String username, String groupname) {code} was:RawLocalFileSystem.setOwner() method may incorrectly set the group owner of a file on Windows. Set group owner on Windows failed - Key: HADOOP-8763 URL: https://issues.apache.org/jira/browse/HADOOP-8763 Project: Hadoop Common Issue Type: Bug Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Fix For: 1-win Attachments: HADOOP-8763-branch-1-win-2.patch, HADOOP-8763-branch-1-win.patch The RawLocalFileSystem.setOwner() method may incorrectly set the group owner of a file on Windows. Specifically, the following method in the RawLocalFileSystem class will fail on Windows when username is null, i.e. when only group ownership is being set. {code} public void setOwner(Path p, String username, String groupname) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
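In setOwner()'s contract, a null username means "change only the group owner". The helper below is hypothetical (not Hadoop code); it only illustrates how an ownership spec can tolerate a null username so the group-only case does not fail, which is the scenario the issue describes.

```java
class SetOwnerSketch {
    // Builds a chown-style "user:group" spec, tolerating a null username
    // (group-only change) or a null groupname (user-only change).
    static String ownerSpec(String username, String groupname) {
        if (username == null && groupname == null) {
            throw new IllegalArgumentException(
                "username and groupname cannot both be null");
        }
        if (username == null) {
            return ":" + groupname; // group-only change
        }
        if (groupname == null) {
            return username;        // user-only change
        }
        return username + ":" + groupname;
    }
}
```

A naive implementation that always concatenates `username + ":" + groupname` produces the string "null:group" when username is null, which matches the kind of failure reported here.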
[jira] [Commented] (HADOOP-8786) HttpServer continues to start even if AuthenticationFilter fails to init
[ https://issues.apache.org/jira/browse/HADOOP-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455112#comment-13455112 ] Uma Maheswara Rao G commented on HADOOP-8786: - Back-ported to branch-2 Committed revision 1384456. Attached ported patch. HttpServer continues to start even if AuthenticationFilter fails to init Key: HADOOP-8786 URL: https://issues.apache.org/jira/browse/HADOOP-8786 Project: Hadoop Common Issue Type: Bug Affects Versions: 1.2.0, 3.0.0, 2.0.1-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 3.0.0 Attachments: HADOOP-8786-branch-2.patch, hadoop-8786.txt As seen in HDFS-3904, if the AuthenticationFilter fails to initialize, the web server will continue to start up. We need to check for context initialization errors after starting the server. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8786) HttpServer continues to start even if AuthenticationFilter fails to init
[ https://issues.apache.org/jira/browse/HADOOP-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HADOOP-8786: Attachment: HADOOP-8786-branch-2.patch HttpServer continues to start even if AuthenticationFilter fails to init Key: HADOOP-8786 URL: https://issues.apache.org/jira/browse/HADOOP-8786 Project: Hadoop Common Issue Type: Bug Affects Versions: 1.2.0, 3.0.0, 2.0.1-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 3.0.0 Attachments: HADOOP-8786-branch-2.patch, hadoop-8786.txt As seen in HDFS-3904, if the AuthenticationFilter fails to initialize, the web server will continue to start up. We need to check for context initialization errors after starting the server. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8786) HttpServer continues to start even if AuthenticationFilter fails to init
[ https://issues.apache.org/jira/browse/HADOOP-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455113#comment-13455113 ] Uma Maheswara Rao G commented on HADOOP-8786: - I will port this to branch-1 in some time later. HttpServer continues to start even if AuthenticationFilter fails to init Key: HADOOP-8786 URL: https://issues.apache.org/jira/browse/HADOOP-8786 Project: Hadoop Common Issue Type: Bug Affects Versions: 1.2.0, 3.0.0, 2.0.1-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 3.0.0 Attachments: HADOOP-8786-branch-2.patch, hadoop-8786.txt As seen in HDFS-3904, if the AuthenticationFilter fails to initialize, the web server will continue to start up. We need to check for context initialization errors after starting the server. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8804) Improve Web UIs when the wildcard address is used
Eli Collins created HADOOP-8804: --- Summary: Improve Web UIs when the wildcard address is used Key: HADOOP-8804 URL: https://issues.apache.org/jira/browse/HADOOP-8804 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.0-alpha, 1.0.0 Reporter: Eli Collins Priority: Minor When IPC addresses are bound to the wildcard (i.e. the default config) the NN, JT (and probably RM etc.) Web UIs are a little goofy, e.g. 0 Hadoop Map/Reduce Administration and NameNode '0.0.0.0:18021' (active). Let's improve them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
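One possible improvement, sketched here as an assumption rather than the committed fix: when the bound host is the wildcard, substitute the machine's hostname for display purposes so UI titles read "nn1.example.com:18021" instead of "0.0.0.0:18021".

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class DisplayAddress {
    // Returns a human-friendly host:port for UI titles. If the host is
    // the IPv4 wildcard, try to substitute the local hostname; fall back
    // to the raw wildcard if the hostname cannot be resolved.
    static String forDisplay(String host, int port) {
        if ("0.0.0.0".equals(host)) {
            try {
                host = InetAddress.getLocalHost().getHostName();
            } catch (UnknownHostException e) {
                // keep "0.0.0.0" rather than fail the UI
            }
        }
        return host + ":" + port;
    }
}
```

The display substitution is cosmetic only; the server still binds to the wildcard so it remains reachable on all interfaces.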
[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455120#comment-13455120 ] Todd Lipcon commented on HADOOP-8803: - Hi Xianqing, To me, the latter is much more interesting than the former. Byte-range access control implies either an incredibly large amount of metadata per file, or HDFS having semantic understanding of the files it stores. Neither seems tenable given our architecture and design goals. Increasing the granularity of access control provided by the token mechanisms could be useful, but you may run up against compatibility issues. It will require a bit of finesse to ensure that old clients continue to operate compatibly, etc. So, it may turn out to be interesting research, but don't be discouraged if the amount of work needed to make it feasible to commit into the mainline is too much to be worth it. Make Hadoop running more secure public cloud envrionment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803 Project: Hadoop Common Issue Type: New Feature Components: fs, ipc, security Affects Versions: 0.20.204.0 Reporter: Xianqing Yu Labels: hadoop Original Estimate: 2m Remaining Estimate: 2m I am a Ph.D. student at North Carolina State University. I am modifying Hadoop's code (which includes most parts of Hadoop, e.g. the JobTracker, TaskTracker, NameNode, and DataNode) to achieve better security. My major goal is to make Hadoop run more securely in the Cloud, especially in a public Cloud environment. To achieve that, I redesigned the current security mechanism to provide the following properties: 1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS access control is at user or block granularity; e.g. the HDFS Delegation Token only checks whether a file can be accessed by a certain user, and the Block Token only proves which block or blocks can be accessed. 
I make Hadoop able to do byte-granularity access control: each accessing party, user or task process, can only access the bytes it minimally needs. 2. I assume that in a public Cloud environment, only the NameNode, secondary NameNode, and JobTracker can be trusted. A large number of DataNodes and TaskTrackers may be compromised because some of them may be running in less secure environments. So I redesigned the security mechanism to minimize the damage an attacker can do. a. Redesign the Block Access Token to solve the widely-shared-key problem of HDFS. In the original Block Access Token design, all of HDFS (NameNode and DataNodes) shares one master key to generate Block Access Tokens; if one DataNode is compromised, the attacker can get the key and generate any Block Access Token he or she wants. b. Redesign the HDFS Delegation Token to do fine-grained access control for TaskTracker and MapReduce Task processes on HDFS. In Hadoop 0.20.204, all TaskTrackers can use their Kerberos credentials to access any MapReduce files on HDFS, so they have the same privilege as the JobTracker to read or write tokens, copy job files, etc. However, if one of them is compromised, everything critical in the MapReduce directory (job file, Delegation Token) is exposed to the attacker. I solve the problem by making the JobTracker decide which TaskTracker can access which file in the MapReduce directory on HDFS. As for the Task process, once it gets an HDFS Delegation Token it can currently access everything belonging to this job or user on HDFS; with my design, it can only access the bytes it needs. There are some other security improvements, such as the TaskTracker not being able to learn information like the block ID from the Block Token (because I encrypt it), and HDFS optionally setting up a secure channel to send data. With those features, Hadoop can run much more securely in uncertain environments such as a public Cloud. I have already started to test my prototype. 
I want to know whether the community is interested in my work. Is it valuable work to contribute to production Hadoop? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
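The byte-granularity access control described in the proposal could, for illustration, take the shape of a token that carries an allowed byte range which the serving side checks on every read. The following is a hypothetical sketch only, not the author's prototype or Hadoop code; the class and field names are invented:

```java
// Illustrative model of a byte-range-restricted access token: the token
// grants access to bytes in [start, end), and the server rejects any read
// whose requested window falls outside that range.
final class ByteRangeToken {
  final long start; // first permitted byte (inclusive)
  final long end;   // one past the last permitted byte (exclusive)

  ByteRangeToken(long start, long end) {
    this.start = start;
    this.end = end;
  }

  // Returns true iff a read of `length` bytes at `offset` stays inside
  // the range this token grants.
  boolean permits(long offset, long length) {
    return offset >= start && offset + length <= end;
  }
}
```

Under this model, compromising one task's token only exposes that task's input byte range, which is the damage-limiting property the proposal aims for.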
[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455130#comment-13455130 ] Hudson commented on HADOOP-8801: Integrated in Hadoop-Mapreduce-trunk-Commit #2753 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2753/]) HADOOP-8801. ExitUtil#terminate should capture the exception stack trace. Contributed by Eli Collins (Revision 1384435) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1384435 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ExitUtil.java ExitUtil#terminate should capture the exception stack trace --- Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.0.2-alpha Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
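The fix this issue asks for amounts to logging the throwable's full stack trace before the process exits, rather than losing it. A minimal, hypothetical sketch of capturing a stack trace as a string for logging (this is not the attached hadoop-8801.txt patch; the class name is invented):

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Sketch of the idea behind HADOOP-8801: render a Throwable's full stack
// trace to a String so a terminate() helper can log it before exiting,
// which makes failures like HDFS-3933 debuggable after the fact.
final class ExitUtilSketch {
  static String stackTraceOf(Throwable t) {
    StringWriter sw = new StringWriter();
    t.printStackTrace(new PrintWriter(sw, true));
    return sw.toString();
  }
}
```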
[jira] [Commented] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified
[ https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455131#comment-13455131 ] Hudson commented on HADOOP-8795: Integrated in Hadoop-Mapreduce-trunk-Commit #2753 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2753/]) HADOOP-8795. BASH tab completion doesn't look in PATH, assumes path to executable is specified. Contributed by Sean Mackrory. (Revision 1384436) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1384436 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/contrib/bash-tab-completion/hadoop.sh BASH tab completion doesn't look in PATH, assumes path to executable is specified - Key: HADOOP-8795 URL: https://issues.apache.org/jira/browse/HADOOP-8795 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 2.0.0-alpha Reporter: Sean Mackrory Assignee: Sean Mackrory Priority: Minor Fix For: 2.0.3-alpha Attachments: HADOOP-8795.patch bash-tab-completion/hadoop.sh checks that the first token in the command is an existing, executable file - which assumes that the path to the hadoop executable is specified (or that it's in the working directory). If the executable is somewhere else in PATH, tab completion will not work. I propose that the first token be passed through 'which' so that any executables in the path also get detected. I've tested that this technique will work in the event that relative and absolute paths are used as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8763) Set group owner on Windows failed
[ https://issues.apache.org/jira/browse/HADOOP-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455156#comment-13455156 ] Vinod Kumar Vavilapalli commented on HADOOP-8763: - We can just leave around the public constant Shell.SET_GROUP_COMMAND or deprecate it. I am okay leaving it around. Not sure of your usage of asserts vs. exit codes, but in src/winutils/chown.c, instead of asserts for zero-length strings, shouldn't we log a message to stderr and return EXIT_FAILURE? Also, if both are empty, shouldn't you return EXIT_FAILURE as well? bq. +On Linux, if a colon but no group name follows the user name, the group of\n\ +the files is changed to that user\'s login group. This code won't be invoked on Linux, because, ahem, this is winutils? In any case, that is not behaviour I know of; a chown user: filename shouldn't change the group name. Set group owner on Windows failed - Key: HADOOP-8763 URL: https://issues.apache.org/jira/browse/HADOOP-8763 Project: Hadoop Common Issue Type: Bug Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Fix For: 1-win Attachments: HADOOP-8763-branch-1-win-2.patch, HADOOP-8763-branch-1-win.patch RawLocalFileSystem.setOwner() may incorrectly set the group owner of a file on Windows. Specifically, the following function in the RawLocalFileSystem class will fail on Windows when username is null, i.e. when only setting group ownership. {code} public void setOwner(Path p, String username, String groupname) {code}
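To illustrate the null-username case the issue describes, here is a hypothetical sketch of building the "owner[:group]" operand that a chown-style command (such as winutils chown) receives. The class and method names are invented for illustration and are not Hadoop's actual code:

```java
// Builds a chown-style "owner[:group]" argument. The subtle case from
// HADOOP-8763: when username is null (change group ownership only), the
// argument must still carry the leading colon, e.g. ":hadoop" -- dropping
// it would silently skip the group change.
final class OwnerArgSketch {
  static String ownerArg(String username, String groupname) {
    StringBuilder sb = new StringBuilder();
    if (username != null) {
      sb.append(username);
    }
    if (groupname != null) {
      sb.append(':').append(groupname);
    }
    return sb.toString();
  }
}
```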
[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud envrionment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455162#comment-13455162 ] Xianqing Yu commented on HADOOP-8803: - Hi Todd, Byte-range access control is mainly used to minimize the attacker's damage if one TaskTracker is compromised. If a user uses DFSClient to read his or her file, that is no different from the current behavior. Byte-range access control is used when a TaskTracker or task tries to access HDFS. For instance, when the JobClient computes the FileSplits, it knows which bytes in HDFS will be used as input for each task, and it can then make sure that each task process can only access those bytes. The story is similar for the TaskTracker. So the goal is to minimize what an attacker can get once one machine is compromised. Compatibility is an issue, but I minimize it by using byte-level access control only where necessary. Currently it is only used by tasks to access the input file, and by TaskTrackers to read the job file, job configuration file, and Delegation Token. BTW, in my design each task uses a different HDFS Delegation Token, even within one job. So by stealing one Delegation Token, an attacker can only read the input data that one task needs. Make Hadoop running more secure public cloud envrionment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803
[jira] [Assigned] (HADOOP-8731) Public distributed cache support for Windows
[ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned HADOOP-8731: --- Assignee: Vinod Kumar Vavilapalli (was: Ivan Mitic) Public distributed cache support for Windows Key: HADOOP-8731 URL: https://issues.apache.org/jira/browse/HADOOP-8731 Project: Hadoop Common Issue Type: Bug Components: filecache Reporter: Ivan Mitic Assignee: Vinod Kumar Vavilapalli Attachments: HADOOP-8731-PublicCache.patch A distributed cache file is considered public (sharable between MR jobs) if OTHER has read permissions on the file and +x permissions all the way up in the folder hierarchy. By default, Windows permissions are mapped to 700 all the way up to the drive letter, and it is unreasonable to ask users to change the permission on the whole drive to make the file public. IOW, it is hardly possible to have public distributed cache on Windows. To enable the scenario and make it more Windows friendly, the criteria on when a file is considered public should be relaxed. One proposal is to check whether the user has given EVERYONE group permission on the file only (and discard the +x check on parent folders). Security considerations for the proposal: Default permissions on Unix platforms are usually 775 or 755 meaning that OTHER users can read and list folders by default. What this also means is that Hadoop users have to explicitly make the files private in order to make them private in the cluster (please correct me if this is not the case in real life!). On Windows, default permissions are 700. This means that by default all files are private. In the new model, if users want to make them public, they have to explicitly add EVERYONE group permissions on the file. TestTrackerDistributedCacheManager fails because of this issue. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
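The "public cache file" criterion discussed in HADOOP-8731 can be sketched as a simple permission-bit check: OTHER must be able to read the file, and every ancestor directory must grant OTHER execute (traverse). This is an illustrative model on numeric mode bits, not Hadoop's TrackerDistributedCacheManager implementation; the names are hypothetical:

```java
// Models the Unix-side rule the issue describes: a distributed cache file
// is "public" (sharable between MR jobs) iff OTHER can read the file
// (mode & 04) and every ancestor directory is traversable by OTHER
// (mode & 01). The Windows proposal in the issue would relax the
// ancestor-directory part of this check.
final class PublicCacheCheck {
  static boolean isPublic(int fileMode, int[] ancestorDirModes) {
    if ((fileMode & 04) == 0) {
      return false; // OTHER cannot read the file itself
    }
    for (int dirMode : ancestorDirModes) {
      if ((dirMode & 01) == 0) {
        return false; // some ancestor directory blocks OTHER traversal
      }
    }
    return true;
  }
}
```

This also illustrates why Windows defaults break the scheme: with everything effectively 700 up to the drive letter, the ancestor loop always fails.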
[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud envrionment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455167#comment-13455167 ] Xianqing Yu commented on HADOOP-8803: - Hi Todd, Just to clarify my point: I don't need to modify the current HDFS file system, and I don't need per-file metadata. The only thing I modified is the access-control check function. Byte-level checks are currently only used by the TaskTracker and task. The JobTracker knows which bytes need to be accessed by which TaskTracker, and the JobClient knows which bytes need to be accessed by which task. That is why I don't need per-file metadata. Make Hadoop running more secure public cloud envrionment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803
[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud envrionment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455181#comment-13455181 ] Todd Lipcon commented on HADOOP-8803: - Oh, I see. So, you'd modify getBlockLocations() so that it returns a block token which is byte-range restricted? Make Hadoop running more secure public cloud envrionment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803
[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud envrionment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455188#comment-13455188 ] Xianqing Yu commented on HADOOP-8803: - Simply said, yes, but more than that. Not only does the block token carry a byte-range restriction; the Delegation Token, which is used to get the block token, also carries byte-range information. The JobClient or JobTracker is the source that generates the byte-range information, and both authenticate themselves via Kerberos. Make Hadoop running more secure public cloud envrionment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803
[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud envrionment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455221#comment-13455221 ] Todd Lipcon commented on HADOOP-8803: - Makes sense, but I think it's going to be difficult to plumb through the various abstractions here in a clean way that doesn't introduce specific dependencies on FileInputFormat, etc. Will be interesting to see how it goes. Make Hadoop running more secure public cloud envrionment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803
[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud envrionment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455267#comment-13455267 ] Xianqing Yu commented on HADOOP-8803: - My code indeed spreads widely over Hadoop: the JobTracker, TaskTracker, NameNode, DataNode, and JobClient. But I don't need to change FileInputFormat. Instead, I let the JobClient run getSplits as usual and generate tokens according to the split result. In other words, I don't decide how to split the files; I use the split result to decide how to generate the tokens. Make Hadoop running more secure public cloud envrionment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803
[jira] [Assigned] (HADOOP-8731) Public distributed cache support for Windows
[ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned HADOOP-8731: --- Assignee: Ivan Mitic (was: Vinod Kumar Vavilapalli) Reverting accidental assignment. Public distributed cache support for Windows Key: HADOOP-8731 URL: https://issues.apache.org/jira/browse/HADOOP-8731 Project: Hadoop Common Issue Type: Bug Components: filecache Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HADOOP-8731-PublicCache.patch A distributed cache file is considered public (sharable between MR jobs) if OTHER has read permissions on the file and +x permissions all the way up in the folder hierarchy. By default, Windows permissions are mapped to 700 all the way up to the drive letter, and it is unreasonable to ask users to change the permission on the whole drive to make the file public. IOW, it is hardly possible to have public distributed cache on Windows. To enable the scenario and make it more Windows friendly, the criteria on when a file is considered public should be relaxed. One proposal is to check whether the user has given EVERYONE group permission on the file only (and discard the +x check on parent folders). Security considerations for the proposal: Default permissions on Unix platforms are usually 775 or 755 meaning that OTHER users can read and list folders by default. What this also means is that Hadoop users have to explicitly make the files private in order to make them private in the cluster (please correct me if this is not the case in real life!). On Windows, default permissions are 700. This means that by default all files are private. In the new model, if users want to make them public, they have to explicitly add EVERYONE group permissions on the file. TestTrackerDistributedCacheManager fails because of this issue. -- This message is automatically generated by JIRA. 
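The publicness criterion described above (OTHER has read permission on the file, and every ancestor directory grants OTHER +x) can be sketched as a standalone check. This is an approximation of the described behavior for illustration only; the function name is hypothetical, not Hadoop's actual TrackerDistributedCacheManager code:

```python
import os
import stat

def is_public_cache_file(path):
    """Return True if OTHER can read `path` and every ancestor
    directory grants OTHER execute (+x), i.e. the file would be
    considered 'public' under the criterion described above."""
    st = os.stat(path)
    if not (st.st_mode & stat.S_IROTH):
        return False  # file itself is not world-readable
    d = os.path.dirname(os.path.abspath(path))
    while True:
        if not (os.stat(d).st_mode & stat.S_IXOTH):
            return False  # an ancestor blocks traversal for OTHER
        parent = os.path.dirname(d)
        if parent == d:  # reached the filesystem root
            return True
        d = parent
```

On Windows-style default permissions (700 all the way up), this check fails at the first ancestor, which is exactly the problem the issue describes.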
[jira] [Commented] (HADOOP-8802) TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11
[ https://issues.apache.org/jira/browse/HADOOP-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455273#comment-13455273 ] Andy Isaacson commented on HADOOP-8802: --- This was fixed on trunk with HADOOP-7290, just FYI. Probably best to backport that patch wholesale. TestUserGroupInformation testcase fails using IBM JDK 6.0 SR11 -- Key: HADOOP-8802 URL: https://issues.apache.org/jira/browse/HADOOP-8802 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 1.0.3 Environment: Built with IBM Java 6 SR11 SDK, Linux RHEL 6.2 64bit, x86_64 Reporter: Amir Sanjar Priority: Minor Fix For: 1.0.3 Attachments: HADOOP-8802.patch Testsuite: org.apache.hadoop.security.TestUserGroupInformation Tests run: 10, Failures: 0, Errors: 1, Time elapsed: 0.264 sec - Standard Output --- 2012-09-13 10:57:59,771 WARN conf.Configuration (Configuration.java:<clinit>(192)) - DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively sanjar:sanjar dialout desktop_admin_r - --- Testcase: testGetServerSideGroups took 0.036 sec Caused an ERROR expected:<d[ialout]> but was:<d[esktop_admin_r]> at org.apache.hadoop.security.TestUserGroupInformation.testGetServerSideGroups(TestUserGroupInformation.java:108)
[jira] [Created] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
Bo Wang created HADOOP-8805: --- Summary: Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common Key: HADOOP-8805 URL: https://issues.apache.org/jira/browse/HADOOP-8805 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN.
[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud environment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455357#comment-13455357 ] Owen O'Malley commented on HADOOP-8803: --- The performance impact of trying to do fine grained permission checks at the level of byte ranges of particular files will likely be too high to make them feasible. Further, almost all MapReduce map tasks require access to byte ranges outside of the file split byte range that they are assigned. The most you could realistically hope to accomplish for most jobs is to limit the job to particular directories or files. (Some jobs require more than this and so this would have to be optional.) There is already a jira to change the block token protocol so that root is not required to run a datanode, which will significantly change how the block tokens are used. In security work, generally you divide machines into security zones. All of the datanode/tasktracker machines are in the same zone, so segregating between them isn't very productive. In particular, the datanodes need the ability to be trusted by other datanodes. In the same way, the tasktracker is already running arbitrary user code and giving it the permissions assigned to the job. Enforcing better permission separation between the namenode/jobtracker and the datanodes/tasktrackers does make sense and could make the system stronger. Make Hadoop running more secure public cloud environment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803 Project: Hadoop Common Issue Type: New Feature Components: fs, ipc, security Affects Versions: 0.20.204.0 Reporter: Xianqing Yu Labels: hadoop Original Estimate: 2m Remaining Estimate: 2m I am a Ph.D. student at North Carolina State University. I am modifying Hadoop's code (covering most parts of Hadoop, e.g. JobTracker, TaskTracker, NameNode, DataNode) to achieve better security.
My major goal is to make Hadoop run more securely in a cloud environment, especially a public cloud. To achieve that, I redesigned the current security mechanism to provide the following properties: 1. Byte-level access control in HDFS. In 0.20.204, HDFS access control works at user or block granularity: the HDFS Delegation Token only checks whether a given user may access a file, and the Block Token only proves which block or blocks may be accessed. My changes let Hadoop enforce byte-granularity access control, so each accessing party (a user or a task process) can read only the bytes it minimally needs. 2. I assume that in a public cloud environment only the NameNode, secondary NameNode, and JobTracker can be trusted; any of the many DataNodes and TaskTrackers may be compromised, since some of them may run in less secure environments. I therefore redesigned the security mechanism to minimize the damage an attacker can do, as described above: a redesigned Block Access Token that avoids the widely shared master key, and a redesigned HDFS Delegation Token that gives TaskTrackers and task processes only fine-grained access to the MapReduce directory on HDFS.
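The byte-range enforcement described in this proposal can be illustrated with a small sketch. All names here are hypothetical and this only mirrors the idea of a DataNode clamping a read to the byte window its token authorizes; it is not the actual patch:

```python
def check_byte_range(allowed_start, allowed_end, req_offset, req_length):
    """A token authorizes bytes [allowed_start, allowed_end).
    Clamp the requested read to that window; reject requests that
    fall entirely outside it. Returns the (offset, length) served."""
    req_end = req_offset + req_length
    start = max(allowed_start, req_offset)
    end = min(allowed_end, req_end)
    if start >= end:
        raise PermissionError("requested bytes outside authorized range")
    return start, end - start
```

As the proposal claims, the per-request check itself is only a few lines; the real cost questions are in generating and carrying the range information in the tokens.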
[jira] [Commented] (HADOOP-8756) Fix SEGV when libsnappy is in java.library.path but not LD_LIBRARY_PATH
[ https://issues.apache.org/jira/browse/HADOOP-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455371#comment-13455371 ] Colin Patrick McCabe commented on HADOOP-8756: -- Filed HADOOP-8806 to discuss searching {{java.library.path}} Fix SEGV when libsnappy is in java.library.path but not LD_LIBRARY_PATH --- Key: HADOOP-8756 URL: https://issues.apache.org/jira/browse/HADOOP-8756 Project: Hadoop Common Issue Type: Bug Components: native Affects Versions: 2.0.2-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HADOOP-8756.002.patch, HADOOP-8756.003.patch, HADOOP-8756.004.patch We use {{System.loadLibrary("snappy")}} from the Java side. However in libhadoop, we use {{dlopen}} to open libsnappy.so dynamically. System.loadLibrary uses {{java.library.path}} to resolve libraries, and {{dlopen}} uses {{LD_LIBRARY_PATH}} and the system paths to resolve libraries. Because of this, the two library loading functions can be at odds. We should fix this so we only load the library once, preferably using the standard Java {{java.library.path}}. We should also log the search path(s) we use for {{libsnappy.so}} when loading fails, so that it's easier to diagnose configuration issues.
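The loader mismatch described here can be reproduced outside Java with Python's ctypes, which on Linux is a thin wrapper over dlopen(3). This is an illustrative sketch only (the "private copy" soname below is hypothetical); it shows that dlopen by bare name consults LD_LIBRARY_PATH and the system linker cache, and knows nothing about a directory listed only in java.library.path:

```python
import ctypes

# 1. A bare soname is resolved via LD_LIBRARY_PATH and the system
#    linker cache -- zlib is present on essentially every Linux box,
#    so this succeeds without any explicit path.
libz = ctypes.CDLL("libz.so.1")
libz.zlibVersion.restype = ctypes.c_char_p
version = libz.zlibVersion().decode()

# 2. dlopen knows nothing about java.library.path: a library that
#    only lives in a directory listed there (but not in
#    LD_LIBRARY_PATH or the linker cache) is simply not found when
#    requested by bare name -- the failure mode this issue describes
#    for the bundled libsnappy.so.
try:
    ctypes.CDLL("libz-private-copy.so")  # hypothetical bundled name
    found = True
except OSError:
    found = False
```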
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455372#comment-13455372 ] Colin Patrick McCabe commented on HADOOP-8806: -- One issue is that if we ever move from using {{dlopen}} to linking directly against libsnappy or libz, this trick won't work. So maybe {{LD_LIBRARY_PATH}} is the better way after all? Hmm. libhadoop.so: search java.library.path when calling dlopen -- Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Priority: Minor libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Should we also search {{java.library.path}} when loading these libraries?
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455375#comment-13455375 ] Todd Lipcon commented on HADOOP-8806: - IMO we don't want to link against the system's libsnappy for the foreseeable future, since it's not yet widely packaged by distributions. libz, being an ancient library, would be more reasonable to link against. So, I think we should do one of: 1) Continue to use dlopen, but explicitly search java.library.path. It's probably easy enough with JNI to grab this system property. or 2) Statically link libsnappy.a into libhadoop at build time.
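Option 1 above (grab the path property and search its entries explicitly, logging the candidates on failure, as HADOOP-8756 also suggests) might look roughly like this sketch. The helper name is hypothetical, and ctypes.CDLL stands in for the dlopen call libhadoop would make from C:

```python
import ctypes
import os

def load_with_logging(libname, search_path):
    """Try to load `libname` from each directory in a colon-separated
    search path (analogous to java.library.path). On total failure,
    raise an error listing every candidate tried, so that
    misconfiguration is visible in the logs."""
    tried = []
    for d in search_path.split(os.pathsep):
        if not d:
            continue
        candidate = os.path.join(d, libname)
        tried.append(candidate)
        if os.path.exists(candidate):
            try:
                # ctypes.CDLL wraps dlopen(3) on Linux; an explicit
                # path bypasses the normal LD_LIBRARY_PATH search.
                return ctypes.CDLL(candidate)
            except OSError:
                pass  # exists but not loadable; keep searching
    raise OSError("could not load %s; searched: %s" % (libname, tried))
```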
[jira] [Updated] (HADOOP-8786) HttpServer continues to start even if AuthenticationFilter fails to init
[ https://issues.apache.org/jira/browse/HADOOP-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HADOOP-8786: Target Version/s: 1.2.0 Fix Version/s: 2.0.3-alpha Adding 2.0.3 to fix versions for Uma, who was having some JIRA troubles. Leaving open for commit to branch-1. HttpServer continues to start even if AuthenticationFilter fails to init Key: HADOOP-8786 URL: https://issues.apache.org/jira/browse/HADOOP-8786 Project: Hadoop Common Issue Type: Bug Affects Versions: 1.2.0, 3.0.0, 2.0.1-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 3.0.0, 2.0.3-alpha Attachments: HADOOP-8786-branch-2.patch, hadoop-8786.txt As seen in HDFS-3904, if the AuthenticationFilter fails to initialize, the web server will continue to start up. We need to check for context initialization errors after starting the server.
[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud environment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455420#comment-13455420 ] Xianqing Yu commented on HADOOP-8803: - Hi Owen, thanks for your comments. Performance is always a trade-off in security design. I am close to testing performance but have not done so yet, though I did consider the performance impact throughout the design. For example, the byte-range restriction information is generated in the JobClient, stored in the HDFS Delegation Token, transferred into the Block Access Token, and finally checked at the DataNodes. The check itself is simple: a DataNode only sends out the bytes that the byte range allows. It takes fewer than five lines of Java code to check, and the extra work is spread across the DataNodes in the cluster. (Hadoop's original security checks are still performed; here I am only describing the byte-range check.) Your second point is very important. My current implementation uses the byte-range check in two places. One is for the TaskTracker when it accesses the MapReduce directory on HDFS; I have not seen any problem there yet. The other, as you said, is task execution: if a task needs content beyond what its file split defines, that would be a problem. But I think byte-range access control is important for many job programs, and, as you suggest, we can make it optional and leave the choice to Hadoop users: better security versus easier-to-write code. Thanks for pointing out the change to the block token protocol; I will take a look. I think that change should be included in 2.0.1, right? Right, I only fully trust the NameNode and JobTracker; DataNodes and TaskTrackers are in a less secure zone. The main reason is not only productivity but also the potentially large number of DataNodes/TaskTrackers, which increases the attack surface.
About DataNodes needing the ability to be trusted by other DataNodes: I decided to leave that work to Kerberos. In fact, a DataNode only trusts the NameNode, via Kerberos; all other authentication is done using my Block Token.
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455426#comment-13455426 ] Colin Patrick McCabe commented on HADOOP-8806: -- What if we * linked against the system {{libz.so}} * statically linked in {{libsnappy.a}} I think that would simplify things considerably and eliminate a lot of hair-pulling over obscure configuration issues.
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455433#comment-13455433 ] Todd Lipcon commented on HADOOP-8806: - Makes sense to me.
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455434#comment-13455434 ] Todd Lipcon commented on HADOOP-8806: - re static linking, we should make sure we don't accidentally export the libsnappy symbols, though -- we'd like someone to be able to pull in libhadoop.so but separately link their own snappy from somewhere else if they so choose.
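The concern about accidentally exporting libsnappy symbols when statically linking is conventionally addressed with a GNU ld version script that keeps only the JNI entry points global. The file name and exact pattern below are hypothetical, not Hadoop's actual build configuration:

```
/* libhadoop.sym (hypothetical); link with:
   gcc -shared ... -Wl,--version-script=libhadoop.sym */
{
  global:
    Java_*;        /* JNI native method implementations */
    JNI_OnLoad;
    JNI_OnUnload;
  local:
    *;             /* everything else, including statically linked
                      libsnappy.a symbols, stays hidden */
};
```

With such a script, an application can load libhadoop.so and still link its own snappy without symbol clashes, which is exactly the property asked for above.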
[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout
[ https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455438#comment-13455438 ] Aaron T. Myers commented on HADOOP-8755: Not quite sure why test-patch didn't run on this latest patch. Regardless, I've just kicked Jenkins manually. Print thread dump when tests fail due to timeout - Key: HADOOP-8755 URL: https://issues.apache.org/jira/browse/HADOOP-8755 Project: Hadoop Common Issue Type: Improvement Components: test Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha Reporter: Andrey Klochkov Assignee: Andrey Klochkov Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch When a test fails due to a timeout, it's often not clear what the root cause is. See HDFS-3364 as an example. We can print a dump of all threads in this case; this may help in finding the cause.
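The idea behind this issue, ported to a hedged Python sketch (the actual patch targets Hadoop's Java test harness): when a watchdog timeout fires, dump every live thread's stack so the root cause of the hang is visible. All names below are illustrative:

```python
import sys
import threading
import traceback

def dump_all_threads(out=sys.stderr):
    """Print a stack trace for every live thread, similar in spirit
    to the thread dump this issue adds on test timeout."""
    names = {t.ident: t.name for t in threading.enumerate()}
    for ident, frame in sys._current_frames().items():
        print("Thread %s (id %s):" % (names.get(ident, "?"), ident), file=out)
        traceback.print_stack(frame, file=out)

def run_with_timeout(fn, seconds):
    """Run fn in a worker thread; if it is still running after the
    timeout, dump all stacks before reporting the failure."""
    worker = threading.Thread(target=fn, name="test-body", daemon=True)
    worker.start()
    worker.join(seconds)
    if worker.is_alive():
        dump_all_threads()
        return False  # timed out; stacks were dumped
    return True
```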
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455440#comment-13455440 ] Roman Shaposhnik commented on HADOOP-8806: -- FYI: snappy is now pretty widely available. Even on CentOS 5: http://pkgs.org/search/?keyword=snappy With that in mind, I'd rather link against it dynamically, especially if we are not getting rid of the dynamic aspect altogether (libz will remain dynamically linked).
[jira] [Commented] (HADOOP-8799) commons-lang version mismatch
[ https://issues.apache.org/jira/browse/HADOOP-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455452#comment-13455452 ] Giridharan Kesavan commented on HADOOP-8799: Could you please list the steps to reproduce? I checked ivy/libraries.properties and the commons-configuration pom for the transitive dependency; both have the commons-lang version set to 2.4. I tried a local build and this is what I see: {quote} [ivy:resolve] found commons-el#commons-el;1.0 in maven2 [ivy:resolve] found commons-configuration#commons-configuration;1.6 in maven2 [ivy:resolve] found commons-collections#commons-collections;3.2.1 in maven2 [ivy:resolve] found commons-lang#commons-lang;2.4 in maven2 {quote} Could you please try this with clean .ivy2 and .m2 caches?
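The kind of install-vs-declared mismatch reported in this issue can be caught mechanically. Below is a hedged sketch (the directory layout and the expected-version map are hypothetical inputs, e.g. taken from `mvn dependency:tree` output) that scans an install's lib directory and flags artifacts whose jar version differs from the declared one:

```python
import os
import re

def jar_versions(lib_dir):
    """Map artifact name -> version for files like commons-lang-2.4.jar."""
    pattern = re.compile(r"^(?P<artifact>[A-Za-z][\w.-]*?)-(?P<version>\d[\w.]*)\.jar$")
    found = {}
    for name in os.listdir(lib_dir):
        m = pattern.match(name)
        if m:
            found[m.group("artifact")] = m.group("version")
    return found

def find_mismatches(lib_dir, declared):
    """Compare installed jar versions against declared ones;
    returns {artifact: (installed, declared)} for every mismatch."""
    installed = jar_versions(lib_dir)
    return {a: (installed[a], v)
            for a, v in declared.items()
            if a in installed and installed[a] != v}
```

Run against a Hadoop 1.0.3 install with `{"commons-lang": "2.6"}` declared, a check like this would have surfaced the 2.4-vs-2.6 discrepancy described above.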
[jira] [Updated] (HADOOP-8799) commons-lang version mismatch
[ https://issues.apache.org/jira/browse/HADOOP-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giridharan Kesavan updated HADOOP-8799: --- Component/s: build
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455459#comment-13455459 ] Todd Lipcon commented on HADOOP-8806: - bq. FYI: snappy is now pretty widely available. Even on CentOS 5: http://pkgs.org/search/?keyword=snappy Yea, but only in EPEL/elforge, which is a little annoying (not always enabled in production systems)
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455470#comment-13455470 ] Colin Patrick McCabe commented on HADOOP-8806: -- bq. With that in mind, I'd rather link against it dynamically. Especially if we are not getting rid of dynamic aspect alltogether (libz will remain to be dynamically linked). I think a lot of people still don't have libsnappy installed, and they would perceive being required to install it to use libhadoop as a regression. In contrast, nearly every system in existence has libz installed. bq. re static linking, we should make sure we don't accidentally export the libsnappy symbols, though -- we'd like someone to be able to pull in libhadoop.so but separately link their own snappy from somewhere else if they so choose. I agree that we should not export the libsnappy symbols. Really, we should not be exporting any symbols except the ones that the JVM invokes, but that's a bit of a separate issue. bq. [epel discussion] EPEL isn't officially supported by Red Hat, and a lot of systems don't install packages from there. There are other third-party repos for Red Hat, and some of them conflict with one another, as I found out (but that's another story). For now, I think we have to assume that most users will not have libsnappy provided by their base OS. When that changes in a few years we can revisit this.
[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud envrionment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455473#comment-13455473 ] Luke Lu commented on HADOOP-8803: - Hi Xianqing, if I understand your proposal correctly, you're essentially trying to do two things: # A more restrictive HDFS delegation token to reduce the damage in case a token is compromised. As Owen said, a byte-range check based on split info won't work for many mapreduce jobs, where splits cross record boundaries. The only thing that would always work, once you consider all the corner cases, is a file-level check. You would have to design an ACL language to cover all cases, with a default that works for most of them, and account for all the distributed cache files as well. # Unique secret keys for every datanode to generate block tokens, to reduce the damage in case a datanode is compromised. This means you need a block token for each replica, unlike the current single block token for all replicas, which adds some overhead to normal operations. A meta question, though: are you sure all this machinery actually increases security in real (vs. theoretical) situations? Your implied assumption is that compromise/root escalation on DN/TT is random and uniformly distributed. My guess is that the assumption is wrong. Security breaches these days are mostly due to zero-day bugs in system software, so the likelihood of all the TT/DN being compromised at the same time is extremely high, given that they likely run the same OS/software versions. Make Hadoop running more secure public cloud envrionment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803 Project: Hadoop Common Issue Type: New Feature Components: fs, ipc, security Affects Versions: 0.20.204.0 Reporter: Xianqing Yu Labels: hadoop Original Estimate: 2m Remaining Estimate: 2m I am a Ph.D student at North Carolina State University. 
I am modifying Hadoop's code (covering most parts of Hadoop, e.g. JobTracker, TaskTracker, NameNode, DataNode) to achieve better security. My main goal is to make Hadoop run more securely in cloud environments, especially public clouds. To achieve that, I redesigned the current security mechanism to provide the following properties: 1. Bring byte-level access control to HDFS. As of 0.20.204, HDFS access control works at user or block granularity: the HDFS Delegation Token only checks whether a file can be accessed by a certain user, and the Block Token only proves which block or blocks can be accessed. I make Hadoop capable of byte-granularity access control, so each accessing party (user or task process) can only read the bytes it minimally needs. 2. I assume that in a public cloud environment only the NameNode, secondary NameNode, and JobTracker can be trusted. A large number of DataNodes and TaskTrackers may be compromised, since some of them may run in less secure environments. So I redesigned the security mechanism to minimize the damage an attacker can do. a. Redesign the Block Access Token to solve the widely-shared-key problem of HDFS. In the original Block Access Token design, all of HDFS (NameNode and DataNodes) shares one master key to generate Block Access Tokens; if one DataNode is compromised, the attacker can obtain the key and generate any Block Access Token he or she wants. b. Redesign the HDFS Delegation Token to do fine-grained access control for TaskTrackers and map-reduce task processes on HDFS. In Hadoop 0.20.204, all TaskTrackers can use their Kerberos credentials to access any MapReduce files on HDFS, so they have the same privilege as the JobTracker to read or write tokens, copy job files, etc. If one of them is compromised, everything critical in the MapReduce directory (job file, Delegation Token) is exposed to the attacker. 
I solve this by making the JobTracker decide which TaskTracker can access which file in the MapReduce directory on HDFS. For a task process, once it gets an HDFS Delegation Token it can currently access everything belonging to the job or user on HDFS; under my design it can only access the bytes it needs. There are some other security improvements, such as that the TaskTracker cannot learn information like the blockID from the Block Token (because it is encrypted), and HDFS can optionally set up a secure channel to send data. With those features, Hadoop can run much more securely in uncertain environments such as a public cloud. I have already started testing my prototype. I want to know whether the community is interested in my work. Is it valuable work to contribute to production Hadoop?
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455485#comment-13455485 ] Roman Shaposhnik commented on HADOOP-8806: -- bq. [epel discussion] I think this is a bit of a red herring here. I'm confident that libsnappy will get into the distros with time, and when that happens we have to be able to offer a choice that is not: recompile your libhadoop.so. The current situation, where libsnappy.so gets bundled with hadoop as a separate object that can be ignored (if needed), is ideal from that standpoint. Statically linking all of it into libhadoop.so is a hammer that I'd rather not use right away. At this point the problem is that we've got two code paths: one in org.apache.hadoop.io.compress.snappy that does System.loadLibrary("snappy") and is fine, and the other one, apparently, in libhadoop.so that uses dlopen(). Would it be completely out of the question to focus on unifying the two?
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455495#comment-13455495 ] Allen Wittenauer commented on HADOOP-8806: -- bq. However, snappy can't be loaded from this directory unless LD_LIBRARY_PATH is set to include this directory Or, IIRC, dlopen will look in the shared library's run path (-rpath for those using GNU ld, -R for just about everyone else). This is the preferred way to deal with this outside of Java. See also the $ORIGIN 'macro' to make the path dynamic based upon the executable's location. With modern linkers there is really no reason to hard-code any paths or set LD_LIBRARY_PATH, unless you are doing something truly crazy.
[jira] [Commented] (HADOOP-8797) automatically detect JAVA_HOME on Linux, report native lib path similar to class path
[ https://issues.apache.org/jira/browse/HADOOP-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455498#comment-13455498 ] Allen Wittenauer commented on HADOOP-8797: -- bq. iterate common java locations on Linux starting with Java7 down to Java6 FWIW, -1. automatically detect JAVA_HOME on Linux, report native lib path similar to class path - Key: HADOOP-8797 URL: https://issues.apache.org/jira/browse/HADOOP-8797 Project: Hadoop Common Issue Type: Improvement Environment: Linux Reporter: Gera Shegalov Priority: Trivial Attachments: HADOOP-8797.patch Enhancement 1) iterate common java locations on Linux starting with Java7 down to Java6 Enhancement 2) hadoop jnipath to print java.library.path similar to hadoop classpath
[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud envrionment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455509#comment-13455509 ] Xianqing Yu commented on HADOOP-8803: - Hi Luke, 1. No: the more restrictive HDFS delegation token and Block Token are used for byte-range access control, and the new Block Token reduces the damage when a Block Token key is compromised. As Owen said, I am thinking of offering both the file-level check and the byte-level check as options in the configuration file, so users can decide which level of security they want and which type of check is compatible with their code. I would like to test those kinds of jobs — do you have any examples of such code I can try to run? 2. Yes, I use unique keys. Right, extra block tokens are needed: each Block Token can only be used for one datanode. For example, if I want to access data stored on datanodes A and B, the NameNode needs to generate two Block Tokens and send them to me. This is the largest extra overhead in my design. But I think (please correct me if I am wrong) that in original Hadoop, while a job is running, the NameNode has to generate a Block Token whenever a task process needs access — so for one job the NameNode performs as many Block Token generations as there are mappers. In my design, extra work happens only when one mapper needs data that lives on more than one datanode, and I don't think that is always the case. Another argument is that sharing the same key across the whole HDFS cluster is too risky; this overhead is something Hadoop has to pay. 3. That is a very interesting question. In security it is really hard to find a perfect solution, but we can always find a better way, and I do love that we can discuss the various possibilities here. 
Back to your question: zero-day breaches are a really big threat, and that depends on a lot of things which, as you said, are mostly beyond Hadoop itself. The TT/DN may have the same OS/software version; however, if Hadoop is running in a public cloud, they may be running under different cloud providers, the OS may differ, and the people maintaining those machines are different.
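The per-datanode-key idea discussed above can be illustrated with a toy HMAC-based token. This is purely an illustration of the concept, not the actual Block Access Token format or the proposal's design; the class and method names are invented:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

public class BlockTokenSketch {
    /**
     * Toy illustration of per-datanode block tokens: the token is an HMAC
     * over the block id under a key shared only by the NameNode and ONE
     * datanode. A key stolen from datanode A therefore cannot forge tokens
     * that datanode B will accept, unlike a cluster-wide master key.
     */
    static byte[] mintToken(byte[] datanodeKey, String blockId) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(datanodeKey, "HmacSHA256"));
            return mac.doFinal(blockId.getBytes(StandardCharsets.UTF_8));
        } catch (java.security.GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Under this scheme the NameNode mints one token per replica location, which is the extra overhead Luke and Xianqing are weighing against the blast radius of a single shared key.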
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455512#comment-13455512 ] Colin Patrick McCabe commented on HADOOP-8806: -- bq. [rpath discussion] The problem is, we don't know at compile-time where libsnappy.so will be. Normally there's a make install step where rpaths get injected, but there is nothing like that for Hadoop. Sadly, I have encountered an issue that I think puts the kibosh on the static libsnappy idea: you cannot link a .a into a .so. I don't know why I didn't think of that earlier.
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455515#comment-13455515 ] Allen Wittenauer commented on HADOOP-8806: -- That's why $ORIGIN is a way out of this. At install time, build a symlink from a known path to the out-of-the-way location. bq. you cannot link a .a into a .so Sure you can. You can always use ar to pull out the objects and then include them in your own library.
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455519#comment-13455519 ] Allen Wittenauer commented on HADOOP-8806: -- (p.s., this is pretty much what the compiler does when you statically link...)
[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout
[ https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455522#comment-13455522 ] Hadoop QA commented on HADOOP-8755: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12544942/HADOOP-8755.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-httpfs: org.apache.hadoop.hdfs.TestDatanodeBlockScanner +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1454//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1454//console This message is automatically generated. Print thread dump when tests fail due to timeout - Key: HADOOP-8755 URL: https://issues.apache.org/jira/browse/HADOOP-8755 Project: Hadoop Common Issue Type: Improvement Components: test Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha Reporter: Andrey Klochkov Assignee: Andrey Klochkov Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch When a test fails due to timeout it's often not clear what is the root cause. See HDFS-3364 as an example. 
We can print a dump of all threads in this case; this may help find the cause.
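The core of the idea is simple enough to sketch: collect {{Thread.getAllStackTraces()}} and print it when the timeout fires. The committed TimedOutTestsListener is more elaborate than this; the class below is only an invented minimal illustration:

```java
import java.util.Map;

public class ThreadDumpSketch {
    /**
     * Minimal sketch of a thread dump: format every live thread and its
     * stack trace, the way a timeout listener would before failing a test.
     */
    static String buildThreadDump() {
        StringBuilder dump = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            dump.append('"').append(t.getName()).append("\" ")
                .append(t.isDaemon() ? "daemon " : "")
                .append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                dump.append("    at ").append(frame).append('\n');
            }
            dump.append('\n');
        }
        return dump.toString();
    }
}
```

A JUnit run listener could call this from its test-failure hook whenever the failure message indicates a timeout, and append the result to the failure report.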
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455525#comment-13455525 ] Andy Isaacson commented on HADOOP-8806: --- {quote} bq. you cannot link a .a into a .so Sure you can. You can always use ar to pull out the objects and then include them into your own library. {quote} Only if the objects were compiled with {{-fPIC}} and any other requirements are met. My understanding is that PIC is still an issue in the amd64 ABI, but I'd have to go check to make sure. I'd strongly recommend that we continue to dynamically link against libsnappy.so, using LD_LIBRARY_PATH if at all possible, but even parsing {{java.library.path}} and iterating over it to dlopen would be OK.
[jira] [Updated] (HADOOP-8755) Print thread dump when tests fail due to timeout
[ https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HADOOP-8755: --- Attachment: HADOOP-8755.patch Patch looks great, except that it needs to include the Apache License header in the new files. Here's an updated patch that adds those. +1, I'm going to commit this momentarily since the difference between this and the last is just comments.
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455527#comment-13455527 ] Allen Wittenauer commented on HADOOP-8806: -- The problem with LD_LIBRARY_PATH is that if you are running something that is not Java, you may accidentally introduce a different/conflicting library from the one the compiled program is expecting. That is going to lead to some very strange errors for the user. The other possibility is that the end user will override LD_LIBRARY_PATH themselves, which puts us back at the original problem.
[jira] [Updated] (HADOOP-8755) Print thread dump when tests fail due to timeout
[ https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HADOOP-8755: --- Resolution: Fixed Fix Version/s: 2.0.3-alpha Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Andrey!
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455531#comment-13455531 ] Andy Isaacson commented on HADOOP-8806: --- Another potential issue -- there is plenty of fun debugging waiting for the first developer who tries to have a dynamic libsnappy.so and a static snappy.a-in-libhadoop.so in the same executable. Supposedly that scenario can be made to work, but I've had no end of trouble with similar scenarios previously.
[jira] [Commented] (HADOOP-8803) Make Hadoop running more secure public cloud envrionment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455532#comment-13455532 ] Kingshuk Chatterjee commented on HADOOP-8803: - Xianqing, one thing that we will need to evaluate is the business value for the proposed change. In my mind, Hadoop will slowly be used as an infrastructure piece of an overall enterprise wide data management platform, instead of being accessed directly. Hortonworks HDP and IBM BigInsights are steps toward that direction. I know that China Telecom is (or, was) investigating the possibility of creating a data mining platform around Hadoop. Long story short, just as rarely anyone uses/accesses a RDBMS directly, Hadoop will also see itself being wrapped up in middleware layers. And when deployed in a cloud settings, there will undoubtedly additional security layers at physical, application, and network levels supported and invested by the cloud provider to ensure data security. Needless to say, all these layers will add their own latency to data access. So my question will be: What business value can we expect to derive from this additional security feature in Hadoop? Granted it is open-source, and its our collective sweat invested, but we will need to weigh in on what should be delegated to the product user, and what should be built into the product. What do you think? Make Hadoop running more secure public cloud envrionment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803 Project: Hadoop Common Issue Type: New Feature Components: fs, ipc, security Affects Versions: 0.20.204.0 Reporter: Xianqing Yu Labels: hadoop Original Estimate: 2m Remaining Estimate: 2m I am a Ph.D student in North Carolina State University. I am modifying the Hadoop's code (which including most parts of Hadoop, e.g. JobTracker, TaskTracker, NameNode, DataNode) to achieve better security. 
My major goal is that make Hadoop running more secure in the Cloud environment, especially for public Cloud environment. In order to achieve that, I redesign the currently security mechanism and achieve following proprieties: 1. Bring byte-level access control to Hadoop HDFS. Based on 0.20.204, HDFS access control is based on user or block granularity, e.g. HDFS Delegation Token only check if the file can be accessed by certain user or not, Block Token only proof which block or blocks can be accessed. I make Hadoop can do byte-granularity access control, each access party, user or task process can only access the bytes she or he least needed. 2. I assume that in the public Cloud environment, only Namenode, secondary Namenode, JobTracker can be trusted. A large number of Datanode and TaskTracker may be compromised due to some of them may be running under less secure environment. So I re-design the secure mechanism to make the damage the hacker can do to be minimized. a. Re-design the Block Access Token to solve wildly shared-key problem of HDFS. In original Block Access Token design, all HDFS (Namenode and Datanode) share one master key to generate Block Access Token, if one DataNode is compromised by hacker, the hacker can get the key and generate any Block Access Token he or she want. b. Re-design the HDFS Delegation Token to do fine-grain access control for TaskTracker and Map-Reduce Task process on HDFS. In the Hadoop 0.20.204, all TaskTrackers can use their kerberos credentials to access any files for MapReduce on HDFS. So they have the same privilege as JobTracker to do read or write tokens, copy job file, etc.. However, if one of them is compromised, every critical thing in MapReduce directory (job file, Delegation Token) is exposed to attacker. I solve the problem by making JobTracker to decide which TaskTracker can access which file in MapReduce Directory on HDFS. 
As for the task process, once it gets an HDFS Delegation Token it can access everything belonging to that job or user on HDFS. Under my design, it can only access the bytes it needs from HDFS. There are some other security improvements as well, such as the TaskTracker not being able to learn information like the block ID from the Block Token (because I encrypt it), and HDFS being able to set up a secure channel to send data as an option. With these features, Hadoop can run much more securely in uncertain environments such as a public cloud. I have already started testing my prototype. I want to know whether the community is interested in my work. Is this valuable work to contribute to production Hadoop? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
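The byte-granularity idea above can be illustrated with a minimal sketch. This is purely hypothetical code, not the author's actual design: the class name, fields, and check are invented to show what a token scoped to a single byte range of a single block might look like, with a DataNode checking it before serving data.

```java
// Hypothetical illustration of byte-granularity access control: a token
// authorizing reads of one byte range of one block. Not the HADOOP-8803 design.
public class ByteRangeToken {
    private final long blockId;
    private final long startOffset; // first authorized byte (inclusive)
    private final long endOffset;   // last authorized byte (inclusive)

    public ByteRangeToken(long blockId, long startOffset, long endOffset) {
        this.blockId = blockId;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    // A DataNode would reject any read falling outside the authorized range.
    public boolean permitsRead(long blockId, long offset, long length) {
        return blockId == this.blockId
                && length > 0
                && offset >= startOffset
                && offset + length - 1 <= endOffset;
    }

    public static void main(String[] args) {
        ByteRangeToken token = new ByteRangeToken(42L, 0L, 1023L);
        System.out.println(token.permitsRead(42L, 0L, 512L));   // fully in range
        System.out.println(token.permitsRead(42L, 900L, 512L)); // runs past the end
    }
}
```

The contrast with the existing Block Token is that the range check replaces an all-or-nothing per-block decision.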
[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout
[ https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455535#comment-13455535 ] Hudson commented on HADOOP-8755: Integrated in Hadoop-Hdfs-trunk-Commit #2794 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2794/]) HADOOP-8755. Print thread dump when tests fail due to timeout. Contributed by Andrey Klochkov. (Revision 1384627) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384627 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TestTimedOutTestsListener.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TimedOutTestsListener.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/pom.xml * /hadoop/common/trunk/hadoop-mapreduce-project/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/pom.xml Print thread dump when tests fail due to timeout - Key: HADOOP-8755 URL: https://issues.apache.org/jira/browse/HADOOP-8755 Project: Hadoop Common Issue Type: Improvement Components: test Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha Reporter: Andrey Klochkov Assignee: Andrey Klochkov Fix For: 2.0.3-alpha Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HADOOP-8755.patch, HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch When a test fails due to timeout it's often not clear what the root cause is. See HDFS-3364 as an example. We can print a dump of all threads in this case; this may help in finding the cause. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout
[ https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455534#comment-13455534 ] Hudson commented on HADOOP-8755: Integrated in Hadoop-Common-trunk-Commit #2731 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2731/]) HADOOP-8755. Print thread dump when tests fail due to timeout. Contributed by Andrey Klochkov. (Revision 1384627) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384627 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TestTimedOutTestsListener.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TimedOutTestsListener.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/pom.xml * /hadoop/common/trunk/hadoop-mapreduce-project/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/pom.xml Print thread dump when tests fail due to timeout - Key: HADOOP-8755 URL: https://issues.apache.org/jira/browse/HADOOP-8755 Project: Hadoop Common Issue Type: Improvement Components: test Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha Reporter: Andrey Klochkov Assignee: Andrey Klochkov Fix For: 2.0.3-alpha Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HADOOP-8755.patch, HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch When a test fails due to timeout it's often not clear what the root cause is. See HDFS-3364 as an example. We can print a dump of all threads in this case; this may help in finding the cause. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout
[ https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455540#comment-13455540 ] Hudson commented on HADOOP-8755: Integrated in Hadoop-Mapreduce-trunk-Commit #2755 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2755/]) HADOOP-8755. Print thread dump when tests fail due to timeout. Contributed by Andrey Klochkov. (Revision 1384627) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384627 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TestTimedOutTestsListener.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TimedOutTestsListener.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/pom.xml * /hadoop/common/trunk/hadoop-mapreduce-project/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/pom.xml Print thread dump when tests fail due to timeout - Key: HADOOP-8755 URL: https://issues.apache.org/jira/browse/HADOOP-8755 Project: Hadoop Common Issue Type: Improvement Components: test Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha Reporter: Andrey Klochkov Assignee: Andrey Klochkov Fix For: 2.0.3-alpha Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HADOOP-8755.patch, HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch When a test fails due to timeout it's often not clear what the root cause is. See HDFS-3364 as an example. We can print a dump of all threads in this case; this may help in finding the cause. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
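The HADOOP-8755 change above works by dumping all live threads when a test times out. A minimal sketch of the core mechanism, using the standard {{java.lang.management}} API (the class and method names here are illustrative, not the actual TimedOutTestsListener code):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative sketch: build a dump of all live threads, which is the core
// of what a timeout listener would print. Names are invented; see
// TimedOutTestsListener in hadoop-common for the real implementation.
public class ThreadDumpPrinter {
    public static String buildThreadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder dump = new StringBuilder();
        // Request locked monitors and synchronizers so deadlocks are visible.
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            dump.append(info.toString());
        }
        return dump.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildThreadDump());
    }
}
```

In the actual patch this kind of dump is hooked up as a JUnit run listener registered in each module's surefire configuration, which is why the commit touches several pom.xml files.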
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455549#comment-13455549 ] Colin Patrick McCabe commented on HADOOP-8806: -- On x86_64, you cannot link a .a into a .so unless the .a was compiled with -fPIC. Give it a try if you are curious. The issue here, as I see it, is that a lot of people seem to want to put {{libsnappy.so}} in the same folder as {{libhadoop.so}}. They believe that by doing this, we will use that library. However, currently we do not. So we need to eliminate that difference between people's expectations and reality somehow. A lot of things have been proposed: * we could manually search {{java.library.path}}, but that is more complex. Also, it doesn't work for shared libraries that we link against normally. Since every discussion we've ever had about {{dlopen}} has ended with "... and eventually, we won't have to do this", that seems like a major downside. * we could add {{java.library.path}} to {{LD_LIBRARY_PATH}}. That solves the problem for both dlopen'ed and normally linked shared libraries, but it requires some changes to initialization scripts. Alan has argued that this may lead to unintended code being loaded. However, if you can drop evil jars into the {{java.library.path}}, you can already compromise the system, so this seems specious. (You could also drop an evil {{libhadoop.so}} into {{java.library.path}}, if you have write access to that path.) Basically, if you can write to {{java.library.path}}, you own the system -- simple as that. * we could use {{System.loadLibrary}} to load the shared library, and then use {{dlopen(RTLD_NOLOAD | RTLD_GLOBAL)}} to make the library's symbols accessible to {{libhadoop.so}}. This solves the problem with minimal code change, but it's Linux-specific, and suffers from a lot of the same problems as the first solution. * static linking was proposed -- but it seems to be infeasible, so forget that. 
I think I'm leaning towards solution #2, which would basically mean closing this JIRA as WONTFIX. libhadoop.so: search java.library.path when calling dlopen -- Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Priority: Minor libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Should we also search {{java.library.path}} when loading these libraries? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
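The first proposal in the comment above (manually searching {{java.library.path}}) can be sketched in a few lines of Java. The class and method names below are invented for illustration; this is not libhadoop's actual loading code:

```java
import java.io.File;

// Illustrative sketch of proposal #1: walk the entries of java.library.path
// looking for a bundled native library, then load it by absolute path so the
// dynamic linker's own search (which ignores java.library.path) is bypassed.
public class NativeLibFinder {
    // Returns the absolute path of the named library file if some directory
    // on java.library.path contains it, or null if it is not found.
    public static String findOnLibraryPath(String libFileName) {
        String libraryPath = System.getProperty("java.library.path", "");
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, libFileName);
            if (candidate.isFile()) {
                return candidate.getAbsolutePath();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String path = findOnLibraryPath("libsnappy.so");
        if (path != null) {
            System.load(path); // absolute-path load, no further search
        }
    }
}
```

As the comment notes, this only helps for libraries opened explicitly (via dlopen or System.load); it does nothing for shared libraries that libhadoop.so links against normally, which is one argument for the LD_LIBRARY_PATH approach instead.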
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1342#comment-1342 ] Allen Wittenauer commented on HADOOP-8806: -- It's pretty clear that I'm not making my point given the summary, so I'm just going to let it drop and prepare yet another local patch to back this total mess out after it inevitably gets committed. libhadoop.so: search java.library.path when calling dlopen -- Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Priority: Minor libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Should we also search {{java.library.path}} when loading these libraries? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8807) Update README and website to reflect HADOOP-8662
Eli Collins created HADOOP-8807: --- Summary: Update README and website to reflect HADOOP-8662 Key: HADOOP-8807 URL: https://issues.apache.org/jira/browse/HADOOP-8807 Project: Hadoop Common Issue Type: Bug Components: documentation Reporter: Eli Collins HADOOP-8662 removed the various tabs from the website. Our top-level README.txt and the generated docs refer to them (e.g. hadoop.apache.org/core, /hdfs, etc.). Let's fix that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8791) rm Only deletes non empty directory and files.
[ https://issues.apache.org/jira/browse/HADOOP-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455575#comment-13455575 ] Hemanth Yamijala commented on HADOOP-8791: -- It looks like rm cannot even delete empty directories. Tried this on both 1.0.3 and trunk. We should modify the documentation to only specify that it deletes files, right? rm Only deletes non empty directory and files. Key: HADOOP-8791 URL: https://issues.apache.org/jira/browse/HADOOP-8791 Project: Hadoop Common Issue Type: Bug Components: documentation Affects Versions: 1.0.3, 3.0.0 Reporter: Bertrand Dechoux Assignee: Jing Zhao Labels: documentation Attachments: HADOOP-8791-branch-1.patch, HADOOP-8791-trunk.patch The documentation (1.0.3) is describing the opposite of what rm does. It should be "Only delete files and empty directories." With regard to files, the size of the file should not matter, should it? Or I am totally misunderstanding the semantics of this command, and I am not the only one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8808) Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives
Hemanth Yamijala created HADOOP-8808: Summary: Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives Key: HADOOP-8808 URL: https://issues.apache.org/jira/browse/HADOOP-8808 Project: Hadoop Common Issue Type: Bug Components: fs Reporter: Hemanth Yamijala Assignee: Hemanth Yamijala In HADOOP-7286, we deprecated the following 3 commands: dus, lsr and rmr, in favour of du -s, ls -r and rm -r, respectively. The FsShell documentation should be updated to mention these, so that users can start switching. Also, there are places where we refer to the deprecated commands as alternatives. This can be changed as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8791) rm Only deletes non empty directory and files.
[ https://issues.apache.org/jira/browse/HADOOP-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455586#comment-13455586 ] Hemanth Yamijala commented on HADOOP-8791: -- Also, I think the examples in the same documentation section might need an update to reflect that empty directories can't be removed. rm Only deletes non empty directory and files. Key: HADOOP-8791 URL: https://issues.apache.org/jira/browse/HADOOP-8791 Project: Hadoop Common Issue Type: Bug Components: documentation Affects Versions: 1.0.3, 3.0.0 Reporter: Bertrand Dechoux Assignee: Jing Zhao Labels: documentation Attachments: HADOOP-8791-branch-1.patch, HADOOP-8791-trunk.patch The documentation (1.0.3) is describing the opposite of what rm does. It should be "Only delete files and empty directories." With regard to files, the size of the file should not matter, should it? Or I am totally misunderstanding the semantics of this command, and I am not the only one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8733) TestStreamingTaskLog, TestJvmManager, TestLinuxTaskControllerLaunchArgs fail on Windows
[ https://issues.apache.org/jira/browse/HADOOP-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455589#comment-13455589 ] Vinod Kumar Vavilapalli commented on HADOOP-8733: - Looks good overall, Ivan. One minor point: In TestJvmManager, instead of creating a dummy file for WINDOWS, would it be possible to simulate the Child code as on Linux? Is {{final String jvmName = ManagementFactory.getRuntimeMXBean().getName();}} in Child.java the call that is used to send the pid from the Child to the TT? If so, we should just simulate that code. TestStreamingTaskLog, TestJvmManager, TestLinuxTaskControllerLaunchArgs fail on Windows --- Key: HADOOP-8733 URL: https://issues.apache.org/jira/browse/HADOOP-8733 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 1-win Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HADOOP-8733-scripts.2.patch, HADOOP-8733-scripts.2.patch, HADOOP-8733-scripts.patch Jira tracking test failures related to test .sh script dependencies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
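For reference, the {{getRuntimeMXBean().getName()}} call discussed above commonly returns a string of the form pid@hostname on HotSpot JVMs, though that format is a convention rather than a documented guarantee. A small sketch of extracting the pid (the class and helper names are invented for illustration):

```java
import java.lang.management.ManagementFactory;

// Illustrative sketch: extract the pid from RuntimeMXBean.getName(), whose
// "pid@hostname" format is a common JVM convention, not a documented guarantee.
public class PidFromJvmName {
    // Returns everything before the first '@', or the whole string if there
    // is no '@' (defensive fallback for JVMs using a different format).
    public static String pidOf(String jvmName) {
        int at = jvmName.indexOf('@');
        return at < 0 ? jvmName : jvmName.substring(0, at);
    }

    public static void main(String[] args) {
        String jvmName = ManagementFactory.getRuntimeMXBean().getName();
        System.out.println(pidOf(jvmName));
    }
}
```

A test simulating the Child side, as suggested in the comment, could call this kind of parsing on a fabricated "pid@hostname" string instead of creating a dummy file.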