[jira] [Commented] (HADOOP-8800) Dynamic Compress Stream
[ https://issues.apache.org/jira/browse/HADOOP-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455609#comment-13455609 ]

Dong Xiang commented on HADOOP-8800:

Looking forward to hearing more details about this proposed feature.

Dynamic Compress Stream
---
Key: HADOOP-8800
URL: https://issues.apache.org/jira/browse/HADOOP-8800
Project: Hadoop Common
Issue Type: New Feature
Components: io
Affects Versions: 2.0.1-alpha
Reporter: yankay
Labels: patch
Original Estimate: 168h
Remaining Estimate: 168h

We use compression in MapReduce in some cases because it trades CPU for better IO throughput. But we can only set one compression algorithm in the configuration file, and the state of a Hadoop cluster changes all the time, so a single algorithm may not work well in every case. Why not provide an algorithm named "dynamic" that changes the compression level and algorithm dynamically based on observed performance? Like TCP, it would start slowly and then try to run faster and faster, speeding up the IO by choosing a more suitable compression algorithm. I will write a detailed design here and try to submit a patch.
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455613#comment-13455613 ]

Colin Patrick McCabe commented on HADOOP-8806:
--

Alan, if you have a patch that actually fixes this, maybe you could share it with us?

After doing a little more research, it seems that {{rpath}} is not just for binaries. Dynamic libraries can have it too. Although the man page for {{ld.so}} only mentions it in the context of executables, it seems it can be embedded into shared libraries as well. Combine that with ${ORIGIN}, and at least in theory we could find {{libsnappy.so}} by using the path of {{libhadoop.so}}. There's some discussion here: http://stackoverflow.com/questions/6323603/ld-using-rpath-origin-inside-a-shared-library-recursive

It's all a little undocumented and weird, and ${ORIGIN} is definitely Linux- (and maybe Solaris?) specific, but it might be better than {{LD_LIBRARY_PATH}}. Maybe.

libhadoop.so: search java.library.path when calling dlopen
--
Key: HADOOP-8806
URL: https://issues.apache.org/jira/browse/HADOOP-8806
Project: Hadoop Common
Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor

libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Should we also search {{java.library.path}} when loading these libraries?
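For reference, a minimal Java sketch of what "search java.library.path" could mean here: resolve the bundled library against each entry of the property and hand an absolute path to {{System.load}}. The class name is hypothetical, and the real libhadoop lookup happens in native code via {{dlopen}}; this only illustrates the search the issue proposes.

{code}
import java.io.File;

/**
 * Illustrative sketch only: resolve a native library (e.g. libsnappy.so)
 * against the directories listed in java.library.path, so it can be loaded
 * without LD_LIBRARY_PATH being set.
 */
public class NativeLibResolver {
  /** Returns the absolute path of the first match, or null if not found. */
  public static String resolve(String libFileName) {
    String libPath = System.getProperty("java.library.path", "");
    for (String dir : libPath.split(File.pathSeparator)) {
      File candidate = new File(dir, libFileName);
      if (candidate.isFile()) {
        return candidate.getAbsolutePath();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // mapLibraryName("snappy") yields "libsnappy.so" on Linux.
    String path = resolve(System.mapLibraryName("snappy"));
    if (path != null) {
      System.load(path);   // absolute path, so no LD_LIBRARY_PATH needed
      System.out.println("Loaded " + path);
    } else {
      System.out.println("libsnappy not found on java.library.path");
    }
  }
}
{code}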
[jira] [Updated] (HADOOP-8808) Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives
[ https://issues.apache.org/jira/browse/HADOOP-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-8808:
-
Status: Patch Available (was: Open)

Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives
-
Key: HADOOP-8808
URL: https://issues.apache.org/jira/browse/HADOOP-8808
Project: Hadoop Common
Issue Type: Bug
Components: fs
Reporter: Hemanth Yamijala
Assignee: Hemanth Yamijala
Attachments: HADOOP-8808.patch

In HADOOP-7286, we deprecated the following three commands: dus, lsr and rmr, in favour of du -s, ls -R and rm -r respectively. The FsShell documentation should be updated to mention these, so that users can start switching. Also, there are places where we refer to the deprecated commands as alternatives. This can be changed as well.
[jira] [Updated] (HADOOP-8808) Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives
[ https://issues.apache.org/jira/browse/HADOOP-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-8808:
-
Attachment: HADOOP-8808.patch

The attached patch adds a note about the deprecation to the commands dus, lsr and rmr. It also completes the documentation for the rm command.

Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives
-
Key: HADOOP-8808
URL: https://issues.apache.org/jira/browse/HADOOP-8808
Project: Hadoop Common
Issue Type: Bug
Components: fs
Reporter: Hemanth Yamijala
Assignee: Hemanth Yamijala
Attachments: HADOOP-8808.patch

In HADOOP-7286, we deprecated the following three commands: dus, lsr and rmr, in favour of du -s, ls -R and rm -r respectively. The FsShell documentation should be updated to mention these, so that users can start switching. Also, there are places where we refer to the deprecated commands as alternatives. This can be changed as well.
[jira] [Commented] (HADOOP-8808) Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives
[ https://issues.apache.org/jira/browse/HADOOP-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455635#comment-13455635 ]

Hadoop QA commented on HADOOP-8808:
---

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12545109/HADOOP-8808.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1456//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1456//console

This message is automatically generated.

Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives
-
Key: HADOOP-8808
URL: https://issues.apache.org/jira/browse/HADOOP-8808
Project: Hadoop Common
Issue Type: Bug
Components: fs
Reporter: Hemanth Yamijala
Assignee: Hemanth Yamijala
Attachments: HADOOP-8808.patch

In HADOOP-7286, we deprecated the following three commands: dus, lsr and rmr, in favour of du -s, ls -R and rm -r respectively. The FsShell documentation should be updated to mention these, so that users can start switching. Also, there are places where we refer to the deprecated commands as alternatives. This can be changed as well.
[jira] [Commented] (HADOOP-8808) Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives
[ https://issues.apache.org/jira/browse/HADOOP-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455637#comment-13455637 ]

Harsh J commented on HADOOP-8808:
-

One nit: these docs are currently not being published, as they have yet to be ported to the APT format.

Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives
-
Key: HADOOP-8808
URL: https://issues.apache.org/jira/browse/HADOOP-8808
Project: Hadoop Common
Issue Type: Bug
Components: fs
Reporter: Hemanth Yamijala
Assignee: Hemanth Yamijala
Attachments: HADOOP-8808.patch

In HADOOP-7286, we deprecated the following three commands: dus, lsr and rmr, in favour of du -s, ls -R and rm -r respectively. The FsShell documentation should be updated to mention these, so that users can start switching. Also, there are places where we refer to the deprecated commands as alternatives. This can be changed as well.
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Wang updated HADOOP-8805:
-
Affects Version/s: (was: 2.0.0-alpha)

Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
-
Key: HADOOP-8805
URL: https://issues.apache.org/jira/browse/HADOOP-8805
Project: Hadoop Common
Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang

org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN.
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Wang updated HADOOP-8805:
-
Fix Version/s: 2.0.3-alpha

Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
-
Key: HADOOP-8805
URL: https://issues.apache.org/jira/browse/HADOOP-8805
Project: Hadoop Common
Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
Fix For: 2.0.3-alpha

org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN.
[jira] [Commented] (HADOOP-8791) rm Only deletes non empty directory and files.
[ https://issues.apache.org/jira/browse/HADOOP-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455649#comment-13455649 ]

Bertrand Dechoux commented on HADOOP-8791:
--

I ran the same test on the 1.0.3 version and I can confirm that directories cannot be deleted; whether they are empty or not has no influence... @Hemanth: so yes, the documentation should be updated to say only files. Can someone check previous versions? Is this a regression, or documentation that was never correct? If it is a regression, should it be corrected now that this is the 1.0.3 (and, I guess, the 1.0) behaviour? I also tested on 1.0.3 whether the size of files has an impact; it doesn't.

Second question: if the observed behaviour is the 'correct' one, it would really beg for an 'rmdir'-equivalent command for HDFS.

rm Only deletes non empty directory and files.

Key: HADOOP-8791
URL: https://issues.apache.org/jira/browse/HADOOP-8791
Project: Hadoop Common
Issue Type: Bug
Components: documentation
Affects Versions: 1.0.3, 3.0.0
Reporter: Bertrand Dechoux
Assignee: Jing Zhao
Labels: documentation
Attachments: HADOOP-8791-branch-1.patch, HADOOP-8791-trunk.patch

The documentation (1.0.3) describes the opposite of what rm does. It should be "Only deletes files and empty directories." With regard to files, the size of the file should not matter, should it? OR I am totally misunderstanding the semantics of this command, and I am not the only one.
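For anyone reproducing this, a small sketch against the local file system (illustrative only, not part of the attached patches). FsShell's rm sits on top of {{FileSystem#delete}}, and its recursive=false mode is exactly the behaviour under discussion: a file or an empty directory is removed, while a non-empty directory is rejected. Whether the shell layer adds its own directory check on top, as 1.0.x seems to, is what the comments above are trying to pin down.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Illustrative check of the rm semantics via the underlying
 * FileSystem#delete API with recursive=false.
 */
public class RmSemanticsCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);

    Path dir = new Path("/tmp/rm-check/emptydir");
    fs.mkdirs(dir);
    Path file = new Path("/tmp/rm-check/file.txt");
    fs.create(file).close();

    // recursive=false: files and empty directories are deletable;
    // a non-empty directory would fail instead.
    System.out.println("file deleted:      " + fs.delete(file, false));
    System.out.println("empty dir deleted: " + fs.delete(dir, false));
    fs.close();
  }
}
{code}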
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Wang updated HADOOP-8805:
-
Attachment: HADOOP-8805.patch

Tested on hadoop-2.1.0 with {{bin/hdfs groups username}}.

Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
-
Key: HADOOP-8805
URL: https://issues.apache.org/jira/browse/HADOOP-8805
Project: Hadoop Common
Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
Fix For: 2.0.3-alpha
Attachments: HADOOP-8805.patch

org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN.
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Wang updated HADOOP-8805:
-
Status: Patch Available (was: Open)

Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
-
Key: HADOOP-8805
URL: https://issues.apache.org/jira/browse/HADOOP-8805
Project: Hadoop Common
Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
Fix For: 2.0.3-alpha
Attachments: HADOOP-8805.patch

org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN.
[jira] [Commented] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455651#comment-13455651 ]

Hadoop QA commented on HADOOP-8805:
---

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12545117/HADOOP-8805.patch
against trunk revision .

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1457//console

This message is automatically generated.

Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
-
Key: HADOOP-8805
URL: https://issues.apache.org/jira/browse/HADOOP-8805
Project: Hadoop Common
Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
Fix For: 2.0.3-alpha
Attachments: HADOOP-8805.patch

org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN.
[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455652#comment-13455652 ]

Luke Lu commented on HADOOP-8803:
-

bq. 1. No, more restrictive HDFS delegation tokens and Block Tokens are used to do byte-range access control, and the new Block Token can reduce the damage when the Block Token key is compromised.

Block tokens are ephemeral and expire in a few minutes as the shared DN secret is refreshed, unlike delegation tokens, which are typically renewed over a longer period and often stored in local storage. I feel that a delegation token with embedded authorization data (a la MS's PAC extension in Kerberos) is a useful addition, while a block token with byte ranges seems redundant/overkill to me.

bq. I would like to test those kinds of jobs, do you guys have any examples of this kind of code I can try to run?

Any task that opens an HDFS file directly will break with the byte-range stuff, e.g. TestDFSIO.

bq. So for my work, extra workload only happens when one mapper needs to access data which is on more than one datanode. And I don't think that is always happening.

Replica selection is done at the DFSClient side, so the client gets the block locations of all the replicas and their block token - and in your case, tokens. If you don't generate all the tokens for all the replicas, you'll likely have to do extra RPC calls, which is even worse.

bq. Another argument is that sharing the same key for the whole HDFS cluster is too risky. This overhead is something hadoop has to pay.

The shared key is in DN memory only and constantly refreshed. Risk only comes from OS/software bugs, which don't help unique keys either, in the big scheme of things.

bq. if hadoop is running in a public cloud, the machines may be running under different cloud providers, the OS may differ, and the people maintaining those machines are different.

It's highly unlikely that a single cluster would span multiple providers. A more likely scenario would be a cluster in one provider mirroring to a cluster in another provider. For cross-provider internet traffic, you'd better do TLS anyway if you care about security.

Make Hadoop run more securely in a public cloud environment

Key: HADOOP-8803
URL: https://issues.apache.org/jira/browse/HADOOP-8803
Project: Hadoop Common
Issue Type: New Feature
Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
Labels: hadoop
Original Estimate: 2m
Remaining Estimate: 2m

I am a Ph.D student at North Carolina State University. I am modifying Hadoop's code (including most parts of Hadoop, e.g. JobTracker, TaskTracker, NameNode, DataNode) to achieve better security. My major goal is to make Hadoop run more securely in a cloud environment, especially a public cloud. To achieve that, I redesigned the current security mechanism to provide the following properties:

1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS access control is based on user or block granularity, e.g. the HDFS Delegation Token only checks whether a file can be accessed by a certain user, and the Block Token only proves which block or blocks can be accessed. I make Hadoop do byte-granularity access control: each accessing party, user or task process, can only access the bytes it minimally needs.

2. I assume that in the public cloud environment, only the Namenode, secondary Namenode and JobTracker can be trusted. A large number of Datanodes and TaskTrackers may be compromised, as some of them may be running in less secure environments. So I redesigned the security mechanism to minimize the damage an attacker can do.

a. Redesign the Block Access Token to solve HDFS's widely-shared-key problem. In the original Block Access Token design, all of HDFS (Namenode and Datanodes) shares one master key to generate Block Access Tokens; if one DataNode is compromised, the attacker can get the key and generate any Block Access Token he or she wants.

b. Redesign the HDFS Delegation Token to do fine-grained access control for TaskTrackers and Map-Reduce task processes on HDFS. In Hadoop 0.20.204, all TaskTrackers can use their kerberos credentials to access any MapReduce files on HDFS, so they have the same privilege as the JobTracker to read or write tokens, copy job files, etc. However, if one of them is compromised, every critical thing in the MapReduce directory (job file, Delegation Token) is exposed to the attacker. I solve the problem by making the JobTracker decide which TaskTracker can access which file in the MapReduce directory on HDFS. For a task process, once it gets an HDFS Delegation Token, it can access
[jira] [Commented] (HADOOP-8791) rm Only deletes non empty directory and files.
[ https://issues.apache.org/jira/browse/HADOOP-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455656#comment-13455656 ]

Hemanth Yamijala commented on HADOOP-8791:
--

The behaviour is the same with 1.0.2 as well.

rm Only deletes non empty directory and files.

Key: HADOOP-8791
URL: https://issues.apache.org/jira/browse/HADOOP-8791
Project: Hadoop Common
Issue Type: Bug
Components: documentation
Affects Versions: 1.0.3, 3.0.0
Reporter: Bertrand Dechoux
Assignee: Jing Zhao
Labels: documentation
Attachments: HADOOP-8791-branch-1.patch, HADOOP-8791-trunk.patch

The documentation (1.0.3) describes the opposite of what rm does. It should be "Only deletes files and empty directories." With regard to files, the size of the file should not matter, should it? OR I am totally misunderstanding the semantics of this command, and I am not the only one.
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Wang updated HADOOP-8805:
-
Status: Open (was: Patch Available)

Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
-
Key: HADOOP-8805
URL: https://issues.apache.org/jira/browse/HADOOP-8805
Project: Hadoop Common
Issue Type: Improvement
Reporter: Bo Wang
Assignee: Bo Wang
Fix For: 2.0.3-alpha
Attachments: HADOOP-8805.patch

org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN.
[jira] [Commented] (HADOOP-8799) commons-lang version mismatch
[ https://issues.apache.org/jira/browse/HADOOP-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455658#comment-13455658 ]

Joel Costigliola commented on HADOOP-8799:
--

Sorry, there is no problem, my mistake. commons-lang was set to version 2.6 in some parent pom of my project, and it was overriding the 2.4 version deduced from the transitive dependency.

Sorry again!

Joel

ps: I think it may still be useful to display the classpath used when running a job; if you think it is worth it, I will create an issue for that.

commons-lang version mismatch
-
Key: HADOOP-8799
URL: https://issues.apache.org/jira/browse/HADOOP-8799
Project: Hadoop Common
Issue Type: Bug
Components: build
Affects Versions: 1.0.3
Reporter: Joel Costigliola

The hadoop install references commons-lang-2.4.jar, while the hadoop-core dependency references commons-lang:jar:2.6, as shown in this maven dependency:tree output extract.
{noformat}
org.apache.hadoop:hadoop-core:jar:1.0.3:provided
+- commons-cli:commons-cli:jar:1.2:provided
+- xmlenc:xmlenc:jar:0.52:provided
+- commons-httpclient:commons-httpclient:jar:3.0.1:provided
+- commons-codec:commons-codec:jar:1.4:provided
+- org.apache.commons:commons-math:jar:2.1:provided
+- commons-configuration:commons-configuration:jar:1.6:provided
|  +- commons-collections:commons-collections:jar:3.2.1:provided
|  +- commons-lang:commons-lang:jar:2.6:provided (version managed from 2.4)
{noformat}
Hadoop install libs should be consistent with the hadoop-core maven dependencies. I found this error because I was using a feature available in commons-lang 2.6 that failed when executed on my hadoop cluster (but not in my pigunit tests). A last remark: it would be nice to display the classpath used by the hadoop cluster while executing a job, because these kinds of errors are not easy to find.
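On Joel's postscript: a tiny diagnostic of the kind he is suggesting. The class name is hypothetical and nothing like it ships with Hadoop 1.0.3; it simply prints the JVM classpath and the jar a class was actually loaded from, which is the quickest way to spot a 2.4-vs-2.6 mix-up like this one at runtime.

{code}
import java.security.CodeSource;

/**
 * Hypothetical diagnostic: print the JVM classpath and the jar a given
 * class was really loaded from on the cluster.
 */
public class ClasspathProbe {
  public static void main(String[] args) throws ClassNotFoundException {
    System.out.println("java.class.path = " + System.getProperty("java.class.path"));

    String className = args.length > 0 ? args[0] : "org.apache.commons.lang.StringUtils";
    Class<?> c = Class.forName(className);
    // The code source reveals which jar provided the class; it can be
    // null for classes loaded by the bootstrap class loader.
    CodeSource src = c.getProtectionDomain().getCodeSource();
    System.out.println(className + " loaded from "
        + (src != null ? src.getLocation() : "<bootstrap>"));
  }
}
{code}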
[jira] [Commented] (HADOOP-8808) Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives
[ https://issues.apache.org/jira/browse/HADOOP-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455659#comment-13455659 ]

Hemanth Yamijala commented on HADOOP-8808:
--

Wow, ok. Thanks for that info, Harsh. I couldn't find the equivalent of the fs shell guide among the new APT documentation files. Is it missing, or was it intentionally removed? (In which case we can close this bug as won't fix or some such.)

Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives
-
Key: HADOOP-8808
URL: https://issues.apache.org/jira/browse/HADOOP-8808
Project: Hadoop Common
Issue Type: Bug
Components: fs
Reporter: Hemanth Yamijala
Assignee: Hemanth Yamijala
Attachments: HADOOP-8808.patch

In HADOOP-7286, we deprecated the following three commands: dus, lsr and rmr, in favour of du -s, ls -R and rm -r respectively. The FsShell documentation should be updated to mention these, so that users can start switching. Also, there are places where we refer to the deprecated commands as alternatives. This can be changed as well.
[jira] [Updated] (HADOOP-8800) Dynamic Compress Stream
[ https://issues.apache.org/jira/browse/HADOOP-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yankay updated HADOOP-8800:
---
Description:

h2. Introduction

This patch provides a feature whereby a big file can be transferred with the compression algorithm and level changed dynamically, so that performance is better.

h2. Why

Compression is important when transferring big files. Different cases call for different compression algorithms. I have tested some cases:

||Algorithm||Compression ratio||Compression throughput (MB/s)||Throughput in 100MB/s network||Throughput in 10MB/s network||
|ZLIB|35.80%|9.6|9.6|9.6|
|LZO|54.40%|101.7|101.7|18.38235294|
|LIBLZF|54.60%|134.3|134.3|18.3150183|
|QUICKLZ|54.90%|183.4|182.1493625|18.21493625|
|FASTLZ|56.20%|134.4|134.4|17.79359431|
|SNAPPY|59.80%|189|167.2240803|16.72240803|
|NONE|100%|300|100|10|

So there is no "perfect compression algorithm" that suits all cases. I want to build a dynamic "CompressOutputStream" that changes the compression algorithm at runtime. In a busy cluster, CPU and network conditions change all the time. Some cases:

* The cluster is very large and the network between nodes is not uniform: some links are 10MB/s, some are 100MB/s. We cannot choose one perfect compression algorithm for a whole MapReduce job.
* The CPU can be used up by a "big job" while the network is free; the job should then stop compressing.
* Some nodes in a cluster have high load while others do not; they should use different compression algorithms.

h2. What

The idea is to transfer a file in blocks. The first block may use an algorithm such as LZO. After transferring it, we gather some information and decide which algorithm to use for the next block. Like TCP, it starts up slowly and tries to run faster and faster; it makes the IO faster by choosing a more suitable compression algorithm.

h2. Design

In a big file transfer, compression and the network both take time. Consider transferring a fixed-size file:

{code}
T = C/P + R/S
{code}

Definitions:

* T: the total time used
* C: the CPU cycles needed to compress the file
* P: the available CPU GHz for compressing
* R: the compression ratio
* S: the available speed (throughput) of the network, including decompression

These variables are not fixed; each changes for its own reasons:

* C: depends on the content, the algorithm and the algorithm level
* P: depends on the CPU and the other processes on the machine
* R: depends on the content, the algorithm and the algorithm level
* S: depends on the network, the pair's link and the processes on the machines

The file is transferred block by block. After a block is transferred, we know:

* C/P: the time taken to compress
* R: the compression ratio for the block
* R/S: the time taken by the network

With this information and some reasonable assumptions, we can forecast each compression algorithm's performance. The assumptions are:

* Within one transfer, the content is similar.
* P and S are continuous; we can assume the next P and S are the same as the current ones.
* C and R are roughly proportional across algorithms; for example, LZO is always faster than ZLIB.

With these, we can forecast the next block by:

* C2/P2 = (last C1/P1) * (avg C2/C1)
* R2/S2 = F(R1) / S1 (S1 is known)
* F(R1) = R1 * (avg R2/R1)

Then we know the time each compression algorithm needs, and we choose the best one to compress the next block. To keep the averages current, we log statistics: with a window N = 3, 5, ..., the average is updated each time new information arrives:

{code}
Avg V = (n-1)/n * Avg V + V/n
{code}

h2. Next Work

I will try to submit a patch later. Is anyone interested in this?

was: We use compression in MapReduce in some cases because it trades CPU for better IO throughput. But we can only set one compression algorithm in the configuration file, and the state of a Hadoop cluster changes all the time, so a single algorithm may not work well in every case. Why not provide an algorithm named "dynamic" that changes the compression level and algorithm dynamically based on observed performance? Like TCP, it would start slowly and then try to run faster and faster. I will write a detailed design here and try to submit a patch.
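The design above boils down to per-codec bookkeeping plus picking the minimum forecast time. A minimal, illustrative Java sketch of that selection logic follows; the class and method names are hypothetical (this is not the proposed patch), the smoothing uses the Avg V = (n-1)/n * Avg V + V/n rule from the description, and the seed numbers in main() are derived from the table above for a 64MB block.

{code}
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch of the adaptive codec selection described in the
 * issue: after each block, record compress time and ratio per codec,
 * smooth them, and pick the codec with the lowest forecast T = C/P + R/S
 * for the next block.
 */
public class DynamicCodecChooser {
  /** Running averages for one codec, smoothed with window size n. */
  static final class CodecStats {
    final int n;                 // smoothing window, e.g. 3 or 5
    double avgCompressSecs = 0;  // observed C/P
    double avgRatio = 1.0;       // observed R; pessimistic start

    CodecStats(int n) { this.n = n; }

    void record(double compressSecs, double ratio) {
      avgCompressSecs = (n - 1.0) / n * avgCompressSecs + compressSecs / n;
      avgRatio        = (n - 1.0) / n * avgRatio        + ratio        / n;
    }

    /** Forecast T = C/P + R/S for a block, given current net speed in bytes/s. */
    double forecastSecs(long blockBytes, double netBytesPerSec) {
      return avgCompressSecs + (blockBytes * avgRatio) / netBytesPerSec;
    }
  }

  private final Map<String, CodecStats> stats = new HashMap<>();

  public DynamicCodecChooser(String... codecs) {
    for (String c : codecs) stats.put(c, new CodecStats(5));
  }

  public void record(String codec, double compressSecs, double ratio) {
    stats.get(codec).record(compressSecs, ratio);
  }

  /** Choose the codec with the lowest forecast time for the next block. */
  public String chooseNext(long blockBytes, double netBytesPerSec) {
    String best = null;
    double bestT = Double.MAX_VALUE;
    for (Map.Entry<String, CodecStats> e : stats.entrySet()) {
      double t = e.getValue().forecastSecs(blockBytes, netBytesPerSec);
      if (t < bestT) { bestT = t; best = e.getKey(); }
    }
    return best;
  }

  public static void main(String[] args) {
    DynamicCodecChooser chooser = new DynamicCodecChooser("lzo", "zlib", "none");
    // Seed one 64MB-block observation per codec, using the table's numbers:
    // LZO ~101.7MB/s at ratio 0.544, ZLIB ~9.6MB/s at ratio 0.358.
    chooser.record("lzo",  0.63, 0.544);
    chooser.record("zlib", 6.67, 0.358);
    chooser.record("none", 0.0,  1.0);
    // Slow link: compression pays off (prints "lzo" with these seeds).
    System.out.println(chooser.chooseNext(64L << 20, 10e6));
    // Fast link: compression CPU becomes the bottleneck (prints "none").
    System.out.println(chooser.chooseNext(64L << 20, 100e6));
  }
}
{code}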
[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455698#comment-13455698 ]

Steve Loughran commented on HADOOP-8803:

Are you proposing that Hadoop (more precisely, HDFS) would run on public cloud infrastructure without any kind of network-layer protections? Because security alone isn't enough to prevent things like DDoS attacks, and it opens up your entire cluster's dataset to 0-day exploits. Irrespective of what is done with kerberos, byte-level security, etc., I would never tell anyone to bring up a Hadoop cluster on public infrastructure without isolating its network by way of iptables, VPN or whatever -with proxies, or access restricted to in-cluster hosts, in a DMZ-style setup. Some cloud infrastructures do let you specify the network structure (VMware- and VBox-based systems included), and you can do the same with KVM-based systems if the tooling is right (specifically, network drivers in the host system). Isolation must be at this level, not at the app layer, because you can never be 100% sure that you've fixed all security bugs. Oh, and EC2 bills you for all net traffic that gets past the router specs you've declared, so if you do have a wide-open system then you get to pay a lot for all the traffic you are rejecting.

I can see that your goal of limiting the access of TT-spawned tasks to the subset of a fileset that they are working with seems like a good one -but consider that, over time, the amount of work sent to a TT means that even a compromised machine would get at more data. If it's ruthless, it could signal fake job-completion events to get data faster than the MR job would, and so get more work sent to it, and so collect more data than other machines. You also need to consider that the blocks inside the DN could be compromised; they'd all have to be encrypted by whatever was writing them, with the keys to decrypt passed down to the tasks.

In a cloud infrastructure, the tactic you'd adopt for security relies on VM images -you'd roll the VM back to the previous image regularly, either every 59 minutes (cost effective) or every job. You need to think about DN decommissioning here too, but it's a better story -it's the standard tactic for defending VMs in the DMZ from being compromised for any extended period of time.

Make Hadoop run more securely in a public cloud environment

Key: HADOOP-8803
URL: https://issues.apache.org/jira/browse/HADOOP-8803
Project: Hadoop Common
Issue Type: New Feature
Components: fs, ipc, security
Affects Versions: 0.20.204.0
Reporter: Xianqing Yu
Labels: hadoop
Original Estimate: 2m
Remaining Estimate: 2m

I am a Ph.D student at North Carolina State University. I am modifying Hadoop's code (including most parts of Hadoop, e.g. JobTracker, TaskTracker, NameNode, DataNode) to achieve better security. My major goal is to make Hadoop run more securely in a cloud environment, especially a public cloud. To achieve that, I redesigned the current security mechanism to provide the following properties:

1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS access control is based on user or block granularity, e.g. the HDFS Delegation Token only checks whether a file can be accessed by a certain user, and the Block Token only proves which block or blocks can be accessed. I make Hadoop do byte-granularity access control: each accessing party, user or task process, can only access the bytes it minimally needs.

2. I assume that in the public cloud environment, only the Namenode, secondary Namenode and JobTracker can be trusted. A large number of Datanodes and TaskTrackers may be compromised, as some of them may be running in less secure environments. So I redesigned the security mechanism to minimize the damage an attacker can do.

a. Redesign the Block Access Token to solve HDFS's widely-shared-key problem. In the original Block Access Token design, all of HDFS (Namenode and Datanodes) shares one master key to generate Block Access Tokens; if one DataNode is compromised, the attacker can get the key and generate any Block Access Token he or she wants.

b. Redesign the HDFS Delegation Token to do fine-grained access control for TaskTrackers and Map-Reduce task processes on HDFS. In Hadoop 0.20.204, all TaskTrackers can use their kerberos credentials to access any MapReduce files on HDFS, so they have the same privilege as the JobTracker to read or write tokens, copy job files, etc. However, if one of them is compromised, every critical thing in the MapReduce directory (job file, Delegation Token) is exposed to the attacker. I
[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455699#comment-13455699 ]

Steve Loughran commented on HADOOP-8801:

Looks OK, but the patch should fix something the original got wrong -it should use {{Throwable.toString()}}, not {{Throwable.getMessage()}}. If you want to see why, look at the source for {{NullPointerException}} and work out what message it prints...

ExitUtil#terminate should capture the exception stack trace
---
Key: HADOOP-8801
URL: https://issues.apache.org/jira/browse/HADOOP-8801
Project: Hadoop Common
Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Fix For: 2.0.2-alpha
Attachments: hadoop-8801.txt

ExitUtil#terminate(status, Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933.
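A two-line demonstration of Steve's point (the class name is hypothetical): an exception constructed without a message makes {{getMessage()}} return null, while {{toString()}} always carries the class name.

{code}
/**
 * Why Throwable.toString() beats getMessage() here: for exceptions created
 * without a message, getMessage() returns null, so a log line built from it
 * loses the exception type entirely. toString() always includes the class name.
 */
public class MessageVsToString {
  public static void main(String[] args) {
    Throwable t = new NullPointerException();   // no message supplied
    System.out.println("getMessage(): " + t.getMessage()); // prints: getMessage(): null
    System.out.println("toString():   " + t);              // prints: java.lang.NullPointerException
  }
}
{code}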
[jira] [Commented] (HADOOP-8797) automatically detect JAVA_HOME on Linux, report native lib path similar to class path
[ https://issues.apache.org/jira/browse/HADOOP-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455700#comment-13455700 ]

Steve Loughran commented on HADOOP-8797:

-1

Detecting {{JAVA_HOME}} is notoriously brittle -it's been moved to bigtop to let the RPM- and deb-specific installations deal with it in the ways that best suit their platforms, but even they have to struggle to keep up to date (BIGTOP-523). They can release on a faster cycle than Hadoop itself, which means that such problems are fixed faster. The other reason for bigtop hosting is that that is where the functional tests of installation will go, and that's what you need to verify the JAVA_HOME detection logic (and indeed, any of the other Hadoop scripts).

Gera -we appreciate the work you've done, but you'd be better off checking out Bigtop and seeing if there are things there that you need to fix for your installations. (And yes, everyone hates the inconsistent placement of Java versions on Linux, as well as the difference between {{JAVA_HOME}} and {{JDK_HOME}}.)

automatically detect JAVA_HOME on Linux, report native lib path similar to class path
-
Key: HADOOP-8797
URL: https://issues.apache.org/jira/browse/HADOOP-8797
Project: Hadoop Common
Issue Type: Improvement
Environment: Linux
Reporter: Gera Shegalov
Priority: Trivial
Attachments: HADOOP-8797.patch

Enhancement 1) iterate common java locations on Linux, starting with Java 7 down to Java 6
Enhancement 2) hadoop jnipath to print java.library.path, similar to hadoop classpath
[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455760#comment-13455760 ]

Hudson commented on HADOOP-8801:

Integrated in Hadoop-Hdfs-trunk #1165 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1165/])
HADOOP-8801. ExitUtil#terminate should capture the exception stack trace. Contributed by Eli Collins (Revision 1384435)

Result = FAILURE
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384435
Files :
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ExitUtil.java

ExitUtil#terminate should capture the exception stack trace
---
Key: HADOOP-8801
URL: https://issues.apache.org/jira/browse/HADOOP-8801
Project: Hadoop Common
Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Fix For: 2.0.2-alpha
Attachments: hadoop-8801.txt

ExitUtil#terminate(status, Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933.
[jira] [Commented] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified
[ https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455762#comment-13455762 ]

Hudson commented on HADOOP-8795:

Integrated in Hadoop-Hdfs-trunk #1165 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1165/])
HADOOP-8795. BASH tab completion doesn't look in PATH, assumes path to executable is specified. Contributed by Sean Mackrory. (Revision 1384436)

Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384436
Files :
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/contrib/bash-tab-completion/hadoop.sh

BASH tab completion doesn't look in PATH, assumes path to executable is specified
-
Key: HADOOP-8795
URL: https://issues.apache.org/jira/browse/HADOOP-8795
Project: Hadoop Common
Issue Type: Bug
Components: scripts
Affects Versions: 2.0.0-alpha
Reporter: Sean Mackrory
Assignee: Sean Mackrory
Priority: Minor
Fix For: 2.0.3-alpha
Attachments: HADOOP-8795.patch

bash-tab-completion/hadoop.sh checks that the first token in the command is an existing, executable file - which assumes that the path to the hadoop executable is specified (or that it's in the working directory). If the executable is somewhere else in PATH, tab completion will not work. I propose that the first token be passed through 'which' so that any executables in the PATH also get detected. I've tested that this technique works when relative and absolute paths are used as well.
[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout
[ https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455759#comment-13455759 ] Hudson commented on HADOOP-8755: Integrated in Hadoop-Hdfs-trunk #1165 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1165/]) HADOOP-8755. Print thread dump when tests fail due to timeout. Contributed by Andrey Klochkov. (Revision 1384627) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1384627 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TestTimedOutTestsListener.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TimedOutTestsListener.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/pom.xml * /hadoop/common/trunk/hadoop-mapreduce-project/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/pom.xml Print thread dump when tests fail due to timeout - Key: HADOOP-8755 URL: https://issues.apache.org/jira/browse/HADOOP-8755 Project: Hadoop Common Issue Type: Improvement Components: test Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha Reporter: Andrey Klochkov Assignee: Andrey Klochkov Fix For: 2.0.3-alpha Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HADOOP-8755.patch, HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch When a test fails due to a timeout it's often not clear what the root cause is. See HDFS-3364 as an example. We can print a dump of all threads in this case; this may help in finding the cause. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
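The core of the idea is straightforward to sketch in Java (a simplified illustration, not the TimedOutTestsListener source):
{noformat}
// Simplified sketch: dump every live thread's stack when a test times out.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {
  public static void dumpAllThreads() {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    // true, true = include locked monitors and ownable synchronizers,
    // which is what makes deadlocks visible in the dump.
    for (ThreadInfo info : bean.dumpAllThreads(true, true)) {
      System.err.print(info);
    }
  }
}
{noformat}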
[jira] [Created] (HADOOP-8809) RPMs should skip useradds if the users already exist
Steve Loughran created HADOOP-8809: -- Summary: RPMs should skip useradds if the users already exist Key: HADOOP-8809 URL: https://issues.apache.org/jira/browse/HADOOP-8809 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 1.0.3 Reporter: Steve Loughran Priority: Minor The hadoop.spec preinstall script creates users - but it does this even if they already exist. This may cause problems if the installation already has those users with different uids. A check with {{id}} can avoid this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
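For illustration, the {{id}}-guarded form could look like this in the spec's %pre script (a sketch; the user and group names here are examples, not the actual contents of hadoop.spec):
{noformat}
%pre
# Skip account creation when the account already exists, so an
# installation that created the users with different uids is left alone.
getent group hadoop >/dev/null || groupadd -r hadoop
id mapred >/dev/null 2>&1 || useradd -r -g hadoop mapred
id hdfs   >/dev/null 2>&1 || useradd -r -g hadoop hdfs
{noformat}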
[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout
[ https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455798#comment-13455798 ] Hudson commented on HADOOP-8755: Integrated in Hadoop-Mapreduce-trunk #1196 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1196/]) HADOOP-8755. Print thread dump when tests fail due to timeout. Contributed by Andrey Klochkov. (Revision 1384627) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1384627 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TestTimedOutTestsListener.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TimedOutTestsListener.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/pom.xml * /hadoop/common/trunk/hadoop-mapreduce-project/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/pom.xml Print thread dump when tests fail due to timeout - Key: HADOOP-8755 URL: https://issues.apache.org/jira/browse/HADOOP-8755 Project: Hadoop Common Issue Type: Improvement Components: test Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha Reporter: Andrey Klochkov Assignee: Andrey Klochkov Fix For: 2.0.3-alpha Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HADOOP-8755.patch, HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch When a test fails due to a timeout it's often not clear what the root cause is. See HDFS-3364 as an example. We can print a dump of all threads in this case; this may help in finding the cause. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455799#comment-13455799 ] Hudson commented on HADOOP-8801: Integrated in Hadoop-Mapreduce-trunk #1196 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1196/]) HADOOP-8801. ExitUtil#terminate should capture the exception stack trace. Contributed by Eli Collins (Revision 1384435) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1384435 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ExitUtil.java ExitUtil#terminate should capture the exception stack trace --- Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.0.2-alpha Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8795) BASH tab completion doesn't look in PATH, assumes path to executable is specified
[ https://issues.apache.org/jira/browse/HADOOP-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455801#comment-13455801 ] Hudson commented on HADOOP-8795: Integrated in Hadoop-Mapreduce-trunk #1196 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1196/]) HADOOP-8795. BASH tab completion doesn't look in PATH, assumes path to executable is specified. Contributed by Sean Mackrory. (Revision 1384436) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1384436 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/contrib/bash-tab-completion/hadoop.sh BASH tab completion doesn't look in PATH, assumes path to executable is specified - Key: HADOOP-8795 URL: https://issues.apache.org/jira/browse/HADOOP-8795 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 2.0.0-alpha Reporter: Sean Mackrory Assignee: Sean Mackrory Priority: Minor Fix For: 2.0.3-alpha Attachments: HADOOP-8795.patch bash-tab-completion/hadoop.sh checks that the first token in the command is an existing, executable file - which assumes that the path to the hadoop executable is specified (or that it's in the working directory). If the executable is somewhere else in PATH, tab completion will not work. I propose that the first token be passed through 'which' so that any executables in the path also get detected. I've tested that this technique will work in the event that relative and absolute paths are used as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455813#comment-13455813 ] Kihwal Lee commented on HADOOP-8806: bq. These libraries can be bundled in the $HADOOP_ROOT/lib/native directory. For example, the -Dbundle.snappy build option copies libsnappy.so to this directory. However, snappy can't be loaded from this directory unless LD_LIBRARY_PATH is set to include this directory. If this is only about MR jobs, isn't setting {{LD_LIBRARY_PATH}} in {{mapreduce.admin.user.env}} enough? libhadoop.so: search java.library.path when calling dlopen -- Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Priority: Minor libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Should we also search {{java.library.path}} when loading these libraries? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
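For reference, Kihwal's suggestion amounts to a mapred-site.xml entry along these lines (the value shown is an illustrative path, not a recommended default):
{noformat}
<property>
  <name>mapreduce.admin.user.env</name>
  <!-- Prepend the bundled native dir to every task's environment. -->
  <value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native</value>
</property>
{noformat}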
[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455822#comment-13455822 ] Steve Loughran commented on HADOOP-8801: @Eli, this patch went in within 2 hours of being submitted. This effectively prevented review by anyone outside the PST timezone who keeps up to date with their JIRA issues. While I celebrate a rapid integration of patches into the tree, I believe this devalues the RTC process, as people like myself can't review, even though - as I did belatedly comment - the patch should use {{toString()}} over {{getMessage()}}, because {{getMessage()}} has the right to return null. # I think this is a bad precedent. It means there's nothing to stop me getting together with someone else in the EU and pushing through a set of changes before anyone notices. # As I said, the patch is inadequate. I don't want to revert the patch - it's in - but I'd like my feedback to be incorporated into a new JIRA, as {{s/getMessage()/toString()/}} enhances the value of the output even more. Do you want to do this? Or shall I? And in either case, can we have slightly more than 2h for review? ExitUtil#terminate should capture the exception stack trace --- Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.0.2-alpha Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
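For anyone skimming the thread, the difference Steve is pointing at fits in a few lines of Java (an illustrative snippet, not the ExitUtil code):
{noformat}
// getMessage() may legitimately be null; toString() never is, and it
// also carries the exception class name.
Throwable t = new NullPointerException();  // constructed without a message
System.err.println(t.getMessage());        // prints "null"
System.err.println(t);                     // prints "java.lang.NullPointerException"
{noformat}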
[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455830#comment-13455830 ] Daryn Sharp commented on HADOOP-8803: - I've done a lot of token work, and have contemplated similar changes. I think it's much more difficult than believed. I've only quickly skimmed the discussion, so I apologize if I've missed some details or if I'm just repeating some of what's already been said. # When blocks can shift during rebalancing, the client will have to request more tokens when the blocks move, which will impact performance and complicate the client. # Trying to restrict the hdfs tokens to a path sounds good on the surface, but is very tricky (if not impossible) to do w/o imposing a significant burden on users and the framework. # How does the ipc layer know which path token to use to establish the connection? I suppose it could randomly pick one for the given NN, and then every fs method will have to be modified to send the specific token for the path it's going to access. # Hftp/webhdfs both implicitly acquire a token during initialization. At this point it's not possible to know which paths will be accessed later. # Symlinks, esp. via viewfs, pose interesting problems. Viewfs currently acquires tokens for all mounts because it doesn't know which mounts might be indirectly referenced via a symlink. Trying to discover and resolve all symlinks up front will be difficult. # In general, it's not really feasible for the job submission client to know every path that will be accessed. #* Jobs that submit jobs can't know all paths that will be accessed, esp. if the sub-job's paths are based on the runtime logic of another job. #* Should job submission be oozie, hive, pig, etc. aware? #* For instance, the oozie launcher is agnostic to the job launched in its task, so the user has to declare the NNs in the workflow conf if accessing anything other than the default NN. This is different from actual job submission, which generally knows the files/dirs that need to be accessed. In practice, the user usually doesn't declare full paths in the conf setting for additional NNs. #* Yarn and other parts of MR rely on the user's tokens to access files too. Should MR need to be yarn aware? #* The job client won't know about the behavior of downstream pluggable items like shuffle handlers. #* The job client won't know about dynamic runtime behaviors of a job. All that said, maybe block tokens can be improved, but I don't see how path-based tokens are feasible. * If the token path is per-file, it would be a slew of tokens per job, and for the aforementioned reasons I just don't see how the job client can possibly calculate all the paths up-front. * If the token path is a non-recursive directory it has similar problems to the per-file path approach. Also, it won't allow a recursive listing of a directory, and jobs won't be able to access sub-directories created on the fly. * If the token path is for recursive directory access, well, you know what everyone will do: request a token for /, negating any value of the path-based token. Make Hadoop run more securely in a public cloud environment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803 Project: Hadoop Common Issue Type: New Feature Components: fs, ipc, security Affects Versions: 0.20.204.0 Reporter: Xianqing Yu Labels: hadoop Original Estimate: 2m Remaining Estimate: 2m I am a Ph.D. student at North Carolina State University.
I am modifying Hadoop's code (including most parts of Hadoop, e.g. JobTracker, TaskTracker, NameNode, DataNode) to achieve better security. My major goal is to make Hadoop run more securely in the Cloud environment, especially in a public Cloud environment. In order to achieve that, I redesign the current security mechanism to achieve the following properties: 1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS access control is based on user or block granularity, e.g. the HDFS Delegation Token only checks whether the file can be accessed by a certain user or not, and the Block Token only proves which block or blocks can be accessed. I make Hadoop able to do byte-granularity access control, so that each access party, user or task process, can only access the minimum bytes it needs. 2. I assume that in the public Cloud environment, only the Namenode, secondary Namenode, and JobTracker can be trusted. A large number of Datanodes and TaskTrackers may be compromised, since some of them may be running in a less secure environment. So I re-design the security mechanism to minimize the damage a hacker can do. a. Re-design the Block Access Token to solve wildly
[jira] [Commented] (HADOOP-8808) Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives
[ https://issues.apache.org/jira/browse/HADOOP-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455839#comment-13455839 ] Harsh J commented on HADOOP-8808: - Not intentional, was just missed as part of the port of the rest of the docs. I'm unable to recall the porting JIRA but I'll comment back shortly with that. Sorry for the short response earlier Hemanth. Update FsShell documentation to mention deprecation of some of the commands, and mention alternatives - Key: HADOOP-8808 URL: https://issues.apache.org/jira/browse/HADOOP-8808 Project: Hadoop Common Issue Type: Bug Components: fs Reporter: Hemanth Yamijala Assignee: Hemanth Yamijala Attachments: HADOOP-8808.patch In HADOOP-7286, we deprecated the following 3 commands dus, lsr and rmr, in favour of du -s, ls -r and rm -r respectively. The FsShell documentation should be updated to mention these, so that users can start switching. Also, there are places where we refer to the deprecated commands as alternatives. This can be changed as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8780) Update DeprecatedProperties apt file
[ https://issues.apache.org/jira/browse/HADOOP-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455894#comment-13455894 ] Hudson commented on HADOOP-8780: Integrated in Hadoop-Hdfs-trunk-Commit #2795 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2795/]) HADOOP-8780. Update DeprecatedProperties apt file. Contributed by Ahmed Radwan (Revision 1384833) Result = SUCCESS tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1384833 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/DeprecatedProperties.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ConfigUtil.java Update DeprecatedProperties apt file Key: HADOOP-8780 URL: https://issues.apache.org/jira/browse/HADOOP-8780 Project: Hadoop Common Issue Type: Bug Reporter: Ahmed Radwan Assignee: Ahmed Radwan Attachments: HADOOP-8780.patch, HADOOP-8780_rev2.patch, HADOOP-8780_rev3.patch The current list of deprecated properties is not up-to-date. I will upload a patch momentarily. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8780) Update DeprecatedProperties apt file
[ https://issues.apache.org/jira/browse/HADOOP-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated HADOOP-8780: -- Resolution: Fixed Fix Version/s: 2.0.3-alpha Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I just committed this. Thanks Ahmed! Update DeprecatedProperties apt file Key: HADOOP-8780 URL: https://issues.apache.org/jira/browse/HADOOP-8780 Project: Hadoop Common Issue Type: Bug Reporter: Ahmed Radwan Assignee: Ahmed Radwan Fix For: 2.0.3-alpha Attachments: HADOOP-8780.patch, HADOOP-8780_rev2.patch, HADOOP-8780_rev3.patch The current list of deprecated properties is not up-to-date. I will upload a patch momentarily. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8780) Update DeprecatedProperties apt file
[ https://issues.apache.org/jira/browse/HADOOP-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455903#comment-13455903 ] Hudson commented on HADOOP-8780: Integrated in Hadoop-Common-trunk-Commit #2732 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2732/]) HADOOP-8780. Update DeprecatedProperties apt file. Contributed by Ahmed Radwan (Revision 1384833) Result = SUCCESS tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1384833 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/DeprecatedProperties.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ConfigUtil.java Update DeprecatedProperties apt file Key: HADOOP-8780 URL: https://issues.apache.org/jira/browse/HADOOP-8780 Project: Hadoop Common Issue Type: Bug Reporter: Ahmed Radwan Assignee: Ahmed Radwan Fix For: 2.0.3-alpha Attachments: HADOOP-8780.patch, HADOOP-8780_rev2.patch, HADOOP-8780_rev3.patch The current list of deprecated properties is not up-to-date. I will upload a patch momentarily. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8780) Update DeprecatedProperties apt file
[ https://issues.apache.org/jira/browse/HADOOP-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455941#comment-13455941 ] Hudson commented on HADOOP-8780: Integrated in Hadoop-Mapreduce-trunk-Commit #2756 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2756/]) HADOOP-8780. Update DeprecatedProperties apt file. Contributed by Ahmed Radwan (Revision 1384833) Result = FAILURE tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1384833 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/DeprecatedProperties.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ConfigUtil.java Update DeprecatedProperties apt file Key: HADOOP-8780 URL: https://issues.apache.org/jira/browse/HADOOP-8780 Project: Hadoop Common Issue Type: Bug Reporter: Ahmed Radwan Assignee: Ahmed Radwan Fix For: 2.0.3-alpha Attachments: HADOOP-8780.patch, HADOOP-8780_rev2.patch, HADOOP-8780_rev3.patch The current list of deprecated properties is not up-to-date. I will upload a patch momentarily. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows
[ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455944#comment-13455944 ] Ivan Mitic commented on HADOOP-8731: {quote}Can you please clarify the following scenario so that other folks reading this thread have it easy? Directory A (perm for user Foo) contains directory B (perm for Everyone) So contents of A will be private cache and contents of B will be public cache on Windows but not on Linux. {quote} Correct. The issue we are trying to mitigate on Windows comes from the default permissions. Specifically, default permissions in terms of the Unix mask map to 700 on Windows. This means that by default others (the EVERYONE group) do not have r+x permissions. This further means that, if we have a path c:\some\path\file1 and a user wants to upload file1 to the public distributed cache, he has to change the permissions on the whole drive to do so. Now, to make the scenario more Windows friendly, we only require the user to change the permissions on file1 to make it public (more precisely, to give the EVERYONE group read permissions on file1). On Unix systems, given that default permissions are usually 775 or 755, the scenario is completely opposite. Public distributed cache support for Windows Key: HADOOP-8731 URL: https://issues.apache.org/jira/browse/HADOOP-8731 Project: Hadoop Common Issue Type: Bug Components: filecache Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HADOOP-8731-PublicCache.patch A distributed cache file is considered public (sharable between MR jobs) if OTHER has read permissions on the file and +x permissions all the way up in the folder hierarchy. By default, Windows permissions are mapped to 700 all the way up to the drive letter, and it is unreasonable to ask users to change the permission on the whole drive to make the file public. IOW, it is hardly possible to have a public distributed cache on Windows. To enable the scenario and make it more Windows friendly, the criteria for when a file is considered public should be relaxed. One proposal is to check whether the user has given the EVERYONE group permission on the file only (and discard the +x check on parent folders). Security considerations for the proposal: Default permissions on Unix platforms are usually 775 or 755, meaning that OTHER users can read and list folders by default. What this also means is that Hadoop users have to explicitly make the files private in order to make them private in the cluster (please correct me if this is not the case in real life!). On Windows, default permissions are 700. This means that by default all files are private. In the new model, if users want to make them public, they have to explicitly add EVERYONE group permissions on the file. TestTrackerDistributedCacheManager fails because of this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
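Expressed against Hadoop's public permission API, the relaxed check boils down to testing the OTHER read bit on the file itself (an illustrative sketch, not the attached patch):
{noformat}
// Sketch: treat a cache file as public when OTHER (which maps to the
// EVERYONE group on Windows) holds the read bit on the file itself,
// skipping the +x walk up the parent directories.
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.permission.FsAction;

class PublicCacheCheck {
  static boolean isPubliclyReadable(FileStatus status) {
    return status.getPermission().getOtherAction().implies(FsAction.READ);
  }
}
{noformat}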
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Wang updated HADOOP-8805: Attachment: HADOOP-8805-v2.patch Tested on 2.0.0-alpha Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common - Key: HADOOP-8805 URL: https://issues.apache.org/jira/browse/HADOOP-8805 Project: Hadoop Common Issue Type: Improvement Reporter: Bo Wang Assignee: Bo Wang Fix For: 2.0.3-alpha Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Wang updated HADOOP-8805: Fix Version/s: (was: 2.0.3-alpha) 2.0.0-alpha Status: Patch Available (was: Open) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common - Key: HADOOP-8805 URL: https://issues.apache.org/jira/browse/HADOOP-8805 Project: Hadoop Common Issue Type: Improvement Reporter: Bo Wang Assignee: Bo Wang Fix For: 2.0.0-alpha Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8791) rm Only deletes non empty directory and files.
[ https://issues.apache.org/jira/browse/HADOOP-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455966#comment-13455966 ] Harsh J commented on HADOOP-8791: - That doc is pretty old, so we may have regressed long ago as well. I suppose we could add a new rmdir in FsShell, as a client-side tool, instead of having rm delete directories - to stick to the usual convention. I'd like Daryn's thoughts on a new command, though. rm Only deletes non empty directory and files. Key: HADOOP-8791 URL: https://issues.apache.org/jira/browse/HADOOP-8791 Project: Hadoop Common Issue Type: Bug Components: documentation Affects Versions: 1.0.3, 3.0.0 Reporter: Bertrand Dechoux Assignee: Jing Zhao Labels: documentation Attachments: HADOOP-8791-branch-1.patch, HADOOP-8791-trunk.patch The documentation (1.0.3) describes the opposite of what rm does. It should say "Only delete files and empty directories." With regard to files, the size of the file should not matter, should it? Or I am totally misunderstanding the semantics of this command, and I am not the only one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8799) commons-lang version mismatch
[ https://issues.apache.org/jira/browse/HADOOP-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455970#comment-13455970 ] Giridharan Kesavan commented on HADOOP-8799: @joel, If you think this is invalid, could you please resolve the jira as invalid? commons-lang version mismatch - Key: HADOOP-8799 URL: https://issues.apache.org/jira/browse/HADOOP-8799 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 1.0.3 Reporter: Joel Costigliola The hadoop install references commons-lang-2.4.jar while the hadoop-core dependency references commons-lang:jar:2.6, as shown in this maven dependency:tree output extract. {noformat} org.apache.hadoop:hadoop-core:jar:1.0.3:provided +- commons-cli:commons-cli:jar:1.2:provided +- xmlenc:xmlenc:jar:0.52:provided +- commons-httpclient:commons-httpclient:jar:3.0.1:provided +- commons-codec:commons-codec:jar:1.4:provided +- org.apache.commons:commons-math:jar:2.1:provided +- commons-configuration:commons-configuration:jar:1.6:provided | +- commons-collections:commons-collections:jar:3.2.1:provided | +- commons-lang:commons-lang:jar:2.6:provided (version managed from 2.4) {noformat} Hadoop install libs should be consistent with the hadoop-core maven dependencies. I found this error because I was using a feature available in commons-lang 2.6 that failed when executed in my hadoop cluster (but not in my pigunit tests). One last remark: it would be nice to display the classpath used by the hadoop cluster while executing a job, because these kinds of errors are not easy to find. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
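For anyone hitting the same mismatch: the "(version managed from 2.4)" note in the tree above is produced by a dependencyManagement pin, which looks roughly like this (an illustrative pom.xml fragment, not Hadoop's actual build files):
{noformat}
<dependencyManagement>
  <dependencies>
    <!-- Forces every transitive commons-lang to a single version. -->
    <dependency>
      <groupId>commons-lang</groupId>
      <artifactId>commons-lang</artifactId>
      <version>2.6</version>
    </dependency>
  </dependencies>
</dependencyManagement>
{noformat}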
[jira] [Resolved] (HADOOP-8799) commons-lang version mismatch
[ https://issues.apache.org/jira/browse/HADOOP-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Costigliola resolved HADOOP-8799. -- Resolution: Invalid commons-lang version mismatch - Key: HADOOP-8799 URL: https://issues.apache.org/jira/browse/HADOOP-8799 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 1.0.3 Reporter: Joel Costigliola The hadoop install references commons-lang-2.4.jar while the hadoop-core dependency references commons-lang:jar:2.6, as shown in this maven dependency:tree output extract. {noformat} org.apache.hadoop:hadoop-core:jar:1.0.3:provided +- commons-cli:commons-cli:jar:1.2:provided +- xmlenc:xmlenc:jar:0.52:provided +- commons-httpclient:commons-httpclient:jar:3.0.1:provided +- commons-codec:commons-codec:jar:1.4:provided +- org.apache.commons:commons-math:jar:2.1:provided +- commons-configuration:commons-configuration:jar:1.6:provided | +- commons-collections:commons-collections:jar:3.2.1:provided | +- commons-lang:commons-lang:jar:2.6:provided (version managed from 2.4) {noformat} Hadoop install libs should be consistent with the hadoop-core maven dependencies. I found this error because I was using a feature available in commons-lang 2.6 that failed when executed in my hadoop cluster (but not in my pigunit tests). One last remark: it would be nice to display the classpath used by the hadoop cluster while executing a job, because these kinds of errors are not easy to find. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455989#comment-13455989 ] Hadoop QA commented on HADOOP-8805: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12545176/HADOOP-8805-v2.patch against trunk revision . -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1458//console This message is automatically generated. Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common - Key: HADOOP-8805 URL: https://issues.apache.org/jira/browse/HADOOP-8805 Project: Hadoop Common Issue Type: Improvement Reporter: Bo Wang Assignee: Bo Wang Fix For: 2.0.0-alpha Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456004#comment-13456004 ] Colin Patrick McCabe commented on HADOOP-8806: -- bq. If this is only about MR jobs, isn't setting LD_LIBRARY_PATH in mapreduce.admin.user.env enough? Yes, it is enough. The question is whether we could do anything to make this easier. libhadoop.so: search java.library.path when calling dlopen -- Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Priority: Minor libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Should we also search {{java.library.path}} when loading these libraries? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8755) Print thread dump when tests fail due to timeout
[ https://issues.apache.org/jira/browse/HADOOP-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456005#comment-13456005 ] Aaron T. Myers commented on HADOOP-8755: I've just sent the email to common-dev@ as discussed. Thanks again, Andrey. Print thread dump when tests fail due to timeout - Key: HADOOP-8755 URL: https://issues.apache.org/jira/browse/HADOOP-8755 Project: Hadoop Common Issue Type: Improvement Components: test Affects Versions: 1.0.3, 0.23.1, 2.0.0-alpha Reporter: Andrey Klochkov Assignee: Andrey Klochkov Fix For: 2.0.3-alpha Attachments: HADOOP-8755.patch, HADOOP-8755.patch, HADOOP-8755.patch, HDFS-3762-branch-0.23.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch, HDFS-3762.patch When a test fails due to a timeout it's often not clear what the root cause is. See HDFS-3364 as an example. We can print a dump of all threads in this case; this may help in finding the cause. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-8734) LocalJobRunner does not support private distributed cache
[ https://issues.apache.org/jira/browse/HADOOP-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved HADOOP-8734. - Resolution: Fixed Fix Version/s: 1-win Hadoop Flags: Reviewed +1, looks good. Verified that the test fails without the code change and passes with. Just committed this to branch-1-win. Thanks Ivan! LocalJobRunner does not support private distributed cache - Key: HADOOP-8734 URL: https://issues.apache.org/jira/browse/HADOOP-8734 Project: Hadoop Common Issue Type: Bug Components: filecache Reporter: Ivan Mitic Assignee: Ivan Mitic Fix For: 1-win Attachments: HADOOP-8734-LocalJobRunner.patch It seems that LocalJobRunner does not support private distributed cache. The issue is more visible on Windows as all DC files are private by default (see HADOOP-8731). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8810) when building with documentation build stops waiting for ENTER on terminal
Alejandro Abdelnur created HADOOP-8810: -- Summary: when building with documentation build stops waiting for ENTER on terminal Key: HADOOP-8810 URL: https://issues.apache.org/jira/browse/HADOOP-8810 Project: Hadoop Common Issue Type: Improvement Components: build Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.0.3-alpha When building the docs {{mvn clean package -Pdocs -DskipTests site site:stage -DstagingDirectory=/tmp/hadoop-site}}, on OSX (and I've seen it a few times in Ubuntu as well), the build stops; if you press ENTER, it continues. It happens twice. I've traced this down to the exec-maven-plugin invocation of protoc for the hadoop-yarn-api module (and another YARN module I don't recall at the moment). jstacking the Maven process, it seems the exec-maven-plugin has some locking issues consuming the STDOUT/STDERR of the process being executed. I've converted the protoc invocation in hadoop-yarn-api to use the antrun plugin instead, and then another module running protoc using exec-maven-plugin hung. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8810) when building with documentation build stops waiting for ENTER on terminal
[ https://issues.apache.org/jira/browse/HADOOP-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456015#comment-13456015 ] Alejandro Abdelnur commented on HADOOP-8810: Converting all the protoc invocations to use the antrun plugin and the saveVersion.sh script fixes the problem. when building with documentation build stops waiting for ENTER on terminal -- Key: HADOOP-8810 URL: https://issues.apache.org/jira/browse/HADOOP-8810 Project: Hadoop Common Issue Type: Improvement Components: build Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.0.3-alpha When building the docs {{mvn clean package -Pdocs -DskipTests site site:stage -DstagingDirectory=/tmp/hadoop-site}}, on OSX (and I've seen it a few times in Ubuntu as well), the build stops; if you press ENTER, it continues. It happens twice. I've traced this down to the exec-maven-plugin invocation of protoc for the hadoop-yarn-api module (and another YARN module I don't recall at the moment). jstacking the Maven process, it seems the exec-maven-plugin has some locking issues consuming the STDOUT/STDERR of the process being executed. I've converted the protoc invocation in hadoop-yarn-api to use the antrun plugin instead, and then another module running protoc using exec-maven-plugin hung. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
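The antrun conversion has this general shape (a sketch with placeholder paths and proto file names, not the attached patch):
{noformat}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <id>generate-protos</id>
      <phase>generate-sources</phase>
      <goals><goal>run</goal></goals>
      <configuration>
        <target>
          <!-- Ant's exec task consumes the child's stdout/stderr without
               the locking issue described above for exec-maven-plugin. -->
          <exec executable="protoc" failonerror="true">
            <arg value="--java_out=target/generated-sources/java"/>
            <arg value="src/main/proto/example.proto"/>
          </exec>
        </target>
      </configuration>
    </execution>
  </executions>
</plugin>
{noformat}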
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456019#comment-13456019 ] Todd Lipcon commented on HADOOP-8806: - It's not only about MR jobs. Apps like HBase and Flume can also depend on Snappy, and it would be nice to allow them to get everything they need by just setting the java.library.path to include the hadoop lib/native dir without also futzing with LD_LIBRARY_PATH libhadoop.so: search java.library.path when calling dlopen -- Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Priority: Minor libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Should we also search {{java.library.path}} when loading these libraries? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456030#comment-13456030 ] Colin Patrick McCabe commented on HADOOP-8806: -- What do you think about the proposed RPATH hack? Basically allowing us to find libraries relative to libhadoop.so. I haven't tried it yet, but it seems like a good way to go if it works? libhadoop.so: search java.library.path when calling dlopen -- Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Priority: Minor libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Should we also search {{java.library.path}} when loading these libraries? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
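In concrete terms, the hack would mean linking libhadoop.so with something like the following (untested and illustrative, as Colin says it may not pan out; OBJS stands in for the object files):
{noformat}
# Embed $ORIGIN so a later dlopen() of libsnappy.so also searches the
# directory that libhadoop.so itself was loaded from.
gcc -shared -o libhadoop.so ${OBJS} -Wl,-rpath,'$ORIGIN'
{noformat}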
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Wang updated HADOOP-8805: Attachment: HADOOP-8805-v3.patch Tested on the trunk (3.0.0-SNAPSHOT) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common - Key: HADOOP-8805 URL: https://issues.apache.org/jira/browse/HADOOP-8805 Project: Hadoop Common Issue Type: Improvement Reporter: Bo Wang Assignee: Bo Wang Fix For: 2.0.0-alpha Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, HADOOP-8805-v3.patch org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Wang updated HADOOP-8805: Status: Open (was: Patch Available) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common - Key: HADOOP-8805 URL: https://issues.apache.org/jira/browse/HADOOP-8805 Project: Hadoop Common Issue Type: Improvement Reporter: Bo Wang Assignee: Bo Wang Fix For: 2.0.0-alpha Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, HADOOP-8805-v3.patch org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Wang updated HADOOP-8805: Fix Version/s: (was: 2.0.0-alpha) Status: Patch Available (was: Open) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common - Key: HADOOP-8805 URL: https://issues.apache.org/jira/browse/HADOOP-8805 Project: Hadoop Common Issue Type: Improvement Reporter: Bo Wang Assignee: Bo Wang Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, HADOOP-8805-v3.patch org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8810) when building with documentation build stops waiting for ENTER on terminal
[ https://issues.apache.org/jira/browse/HADOOP-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated HADOOP-8810: --- Attachment: HADOOP-8810.patch when building with documentation build stops waiting for ENTER on terminal -- Key: HADOOP-8810 URL: https://issues.apache.org/jira/browse/HADOOP-8810 Project: Hadoop Common Issue Type: Improvement Components: build Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.0.3-alpha Attachments: HADOOP-8810.patch When building the docs {{mvn clean package -Pdocs -DskipTests site site:stage -DstagingDirectory=/tmp/hadoop-site}}, on OSX (and I've seen it a few times in Ubuntu as well), the build stops; if you press ENTER, it continues. It happens twice. I've traced this down to the exec-maven-plugin invocation of protoc for the hadoop-yarn-api module (and another YARN module I don't recall at the moment). jstacking the Maven process, it seems the exec-maven-plugin has some locking issues consuming the STDOUT/STDERR of the process being executed. I've converted the protoc invocation in hadoop-yarn-api to use the antrun plugin instead, and then another module running protoc using exec-maven-plugin hung. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8810) when building with documentation build stops waiting for ENTER on terminal
[ https://issues.apache.org/jira/browse/HADOOP-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated HADOOP-8810: --- Status: Patch Available (was: Open) when building with documentation build stops waiting for ENTER on terminal -- Key: HADOOP-8810 URL: https://issues.apache.org/jira/browse/HADOOP-8810 Project: Hadoop Common Issue Type: Improvement Components: build Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.0.3-alpha Attachments: HADOOP-8810.patch When building the docs {{mvn clean package -Pdocs -DskipTests site site:stage -DstagingDirectory=/tmp/hadoop-site}}, on OSX (and I've seen it a few times in Ubuntu as well), the build stops; if you press ENTER, it continues. It happens twice. I've traced this down to the exec-maven-plugin invocation of protoc for the hadoop-yarn-api module (and another YARN module I don't recall at the moment). jstacking the Maven process, it seems the exec-maven-plugin has some locking issues consuming the STDOUT/STDERR of the process being executed. I've converted the protoc invocation in hadoop-yarn-api to use the antrun plugin instead, and then another module running protoc using exec-maven-plugin hung. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456035#comment-13456035 ] Todd Lipcon commented on HADOOP-8806: - Seems reasonable to me. libhadoop.so: search java.library.path when calling dlopen -- Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Priority: Minor libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Should we also search {{java.library.path}} when loading these libraries? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8811) Compile hadoop native library in FreeBSD
Radim Kolar created HADOOP-8811: --- Summary: Compile hadoop native library in FreeBSD Key: HADOOP-8811 URL: https://issues.apache.org/jira/browse/HADOOP-8811 Project: Hadoop Common Issue Type: Bug Components: native Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Radim Kolar Priority: Critical Attachments: freebsd-native.txt The native hadoop library does not compile on FreeBSD because setnetgrent returns void and the assembler does not support SSE4 instructions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
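The setnetgrent half of the portability problem can be sketched as a guarded call (an illustration of the issue; the real fix is in the attached freebsd-native.txt):
{noformat}
#include <netdb.h>

/* glibc's setnetgrent() returns int, FreeBSD's returns void, so the
 * return value can only be consulted behind a platform guard. */
static int start_netgroup(const char *group) {
#if defined(__FreeBSD__)
  setnetgrent(group);          /* void return on FreeBSD; assume success */
  return 1;
#else
  return setnetgrent(group);   /* int return on glibc */
#endif
}
{noformat}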
[jira] [Updated] (HADOOP-8811) Compile hadoop native library in FreeBSD
[ https://issues.apache.org/jira/browse/HADOOP-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated HADOOP-8811: Attachment: freebsd-native.txt Compile hadoop native library in FreeBSD Key: HADOOP-8811 URL: https://issues.apache.org/jira/browse/HADOOP-8811 Project: Hadoop Common Issue Type: Bug Components: native Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 Reporter: Radim Kolar Priority: Critical Labels: freebsd Attachments: freebsd-native.txt The native hadoop library does not compile on FreeBSD because setnetgrent returns void and the assembler does not support SSE4 instructions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8811) Compile hadoop native library in FreeBSD
[ https://issues.apache.org/jira/browse/HADOOP-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated HADOOP-8811: Target Version/s: 2.0.0-alpha, 0.23.1 (was: 0.23.1, 2.0.0-alpha) Status: Patch Available (was: Open) Compile hadoop native library in FreeBSD Key: HADOOP-8811 URL: https://issues.apache.org/jira/browse/HADOOP-8811 Project: Hadoop Common Issue Type: Bug Components: native Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Radim Kolar Priority: Critical Labels: freebsd Attachments: freebsd-native.txt The native hadoop library does not compile on FreeBSD because setnetgrent returns void and the assembler does not support SSE4 instructions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456039#comment-13456039 ] Aaron T. Myers commented on HADOOP-8801: Hey Steve, I'm sure Eli will be happy to address your feedback in a follow-up JIRA. That said, for very small and low-risk patches such as this, I don't think that waiting some arbitrary amount of time for a comment that might never come is productive for anyone. For larger, riskier, or more controversial patches, I'll often say something like "I'll commit this later today/tomorrow unless there are further comments." ExitUtil#terminate should capture the exception stack trace --- Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.0.2-alpha Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8810) when building with documentation build stops waiting for ENTER on terminal
[ https://issues.apache.org/jira/browse/HADOOP-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456041#comment-13456041 ] Hadoop QA commented on HADOOP-8810: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12545196/HADOOP-8810.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javac. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1460//console This message is automatically generated. when building with documentation build stops waiting for ENTER on terminal -- Key: HADOOP-8810 URL: https://issues.apache.org/jira/browse/HADOOP-8810 Project: Hadoop Common Issue Type: Improvement Components: build Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.0.3-alpha Attachments: HADOOP-8810.patch When building the docs {{mvn clean package -Pdocs -DskipTests site site:stage -DstagingDirectory=/tmp/hadoop-site}}, on OSX (and I've seen it a few times in Ubuntu as well), the build stops; if you press ENTER, it continues. It happens twice. I've traced this down to the exec-maven-plugin invocation of protoc for the hadoop-yarn-api module (and another YARN module I don't recall at the moment). jstacking the Maven process, it seems the exec-maven-plugin has some locking issues consuming the STDOUT/STDERR of the process being executed. I've converted the protoc invocation in hadoop-yarn-api to use the antrun plugin instead, and then another module running protoc using exec-maven-plugin hung. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows
[ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456045#comment-13456045 ] Vinod Kumar Vavilapalli commented on HADOOP-8731: - Thanks for the explanation, Ivan. Patch looks good overall. Two comments: - In your java comment for ancestorsHaveExecutePermissions(), please also mention that this change is only needed to enable LocalJobRunner to use Public-dist-cache. I'd also like the subject of this ticket to be changed - Public dist-cache support for LocalJobRunner on Windows - The changes involving FileUtil.chmod() look spurious, can you explain those changes? Public distributed cache support for Windows Key: HADOOP-8731 URL: https://issues.apache.org/jira/browse/HADOOP-8731 Project: Hadoop Common Issue Type: Bug Components: filecache Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HADOOP-8731-PublicCache.patch A distributed cache file is considered public (sharable between MR jobs) if OTHER has read permissions on the file and +x permissions all the way up in the folder hierarchy. By default, Windows permissions are mapped to 700 all the way up to the drive letter, and it is unreasonable to ask users to change the permission on the whole drive to make the file public. IOW, it is hardly possible to have public distributed cache on Windows. To enable the scenario and make it more Windows friendly, the criteria on when a file is considered public should be relaxed. One proposal is to check whether the user has given EVERYONE group permission on the file only (and discard the +x check on parent folders). Security considerations for the proposal: Default permissions on Unix platforms are usually 775 or 755 meaning that OTHER users can read and list folders by default. What this also means is that Hadoop users have to explicitly make the files private in order to make them private in the cluster (please correct me if this is not the case in real life!). On Windows, default permissions are 700. This means that by default all files are private. In the new model, if users want to make them public, they have to explicitly add EVERYONE group permissions on the file. TestTrackerDistributedCacheManager fails because of this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
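For readers unfamiliar with the check under discussion, here is a simplified sketch of the rule "OTHER can read the file, and every ancestor directory grants OTHER +x". The method names follow the comment above, but the bodies are illustrative, not the actual TrackerDistributedCacheManager code:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

public class PublicCacheCheckSketch {
  // A file is "public" if OTHER can read it...
  static boolean isPublic(FileSystem fs, Path file) throws IOException {
    FsPermission perm = fs.getFileStatus(file).getPermission();
    if (!perm.getOtherAction().implies(FsAction.READ)) {
      return false;
    }
    return ancestorsHaveExecutePermissions(fs, file.getParent());
  }

  // ...and every directory from its parent up to the root grants OTHER +x.
  static boolean ancestorsHaveExecutePermissions(FileSystem fs, Path dir)
      throws IOException {
    for (Path cur = dir; cur != null; cur = cur.getParent()) {
      FsPermission perm = fs.getFileStatus(cur).getPermission();
      if (!perm.getOtherAction().implies(FsAction.EXECUTE)) {
        return false;  // a closed ancestor makes the file effectively private
      }
    }
    return true;
  }
}
{code}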
[jira] [Commented] (HADOOP-8811) Compile hadoop native library in FreeBSD
[ https://issues.apache.org/jira/browse/HADOOP-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456046#comment-13456046 ] Radim Kolar commented on HADOOP-8811: - The patch freebsd-native.txt also applies cleanly to branch-0.23 Compile hadoop native library in FreeBSD Key: HADOOP-8811 URL: https://issues.apache.org/jira/browse/HADOOP-8811 Project: Hadoop Common Issue Type: Bug Components: native Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 Reporter: Radim Kolar Priority: Critical Labels: freebsd Attachments: freebsd-native.txt The native Hadoop library does not compile on FreeBSD because setnetgrent returns void and the assembler does not support SSE4 instructions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8734) LocalJobRunner does not support private distributed cache
[ https://issues.apache.org/jira/browse/HADOOP-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456048#comment-13456048 ] Ivan Mitic commented on HADOOP-8734: Awesome, thanks! LocalJobRunner does not support private distributed cache - Key: HADOOP-8734 URL: https://issues.apache.org/jira/browse/HADOOP-8734 Project: Hadoop Common Issue Type: Bug Components: filecache Reporter: Ivan Mitic Assignee: Ivan Mitic Fix For: 1-win Attachments: HADOOP-8734-LocalJobRunner.patch It seems that LocalJobRunner does not support private distributed cache. The issue is more visible on Windows as all DC files are private by default (see HADOOP-8731). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows
[ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456050#comment-13456050 ] Ivan Mitic commented on HADOOP-8731: Thanks for reviewing, Vinod! bq. In your java comment for ancestorsHaveExecutePermissions(), please also mention that this change is only needed to enable LocalJobRunner to use Public-dist-cache. I'd also like the subject of this ticket to be changed - Public dist-cache support for LocalJobRunner on Windows The change does not apply only to LocalJobRunner, but to the distributed cache in general. I tried to explain above what the problem is and how I am trying to solve it; let me know if you need additional clarification. bq. The changes involving FileUtil.chmod() look spurious, can you explain those changes? Bikas asked the same question above :) Quoting my answer: {quote} The issue is that the right permissions are not set on files if I do not make this change. If you take a look at the previous FileUtils.chmod(), it only sets permissions for archives, but not for files. Now that I have moved it below, it sets the permissions for both files and archives. {quote} Let me know if you have additional questions/comments. Public distributed cache support for Windows Key: HADOOP-8731 URL: https://issues.apache.org/jira/browse/HADOOP-8731 Project: Hadoop Common Issue Type: Bug Components: filecache Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HADOOP-8731-PublicCache.patch A distributed cache file is considered public (sharable between MR jobs) if OTHER has read permissions on the file and +x permissions all the way up in the folder hierarchy. By default, Windows permissions are mapped to 700 all the way up to the drive letter, and it is unreasonable to ask users to change the permission on the whole drive to make the file public. IOW, it is hardly possible to have public distributed cache on Windows. To enable the scenario and make it more Windows friendly, the criteria on when a file is considered public should be relaxed. One proposal is to check whether the user has given EVERYONE group permission on the file only (and discard the +x check on parent folders). Security considerations for the proposal: Default permissions on Unix platforms are usually 775 or 755 meaning that OTHER users can read and list folders by default. What this also means is that Hadoop users have to explicitly make the files private in order to make them private in the cluster (please correct me if this is not the case in real life!). On Windows, default permissions are 700. This means that by default all files are private. In the new model, if users want to make them public, they have to explicitly add EVERYONE group permissions on the file. TestTrackerDistributedCacheManager fails because of this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows
[ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456057#comment-13456057 ] Vinod Kumar Vavilapalli commented on HADOOP-8731: - Apologies for repeating the questions; I overlooked your answers. There are two cases: - In the case of a real cluster, and with HDFS, the definition of a public dist-cache file is one which is accessible to all users; and HDFS also has POSIX-style permissions. The method isPublic() is eventually used by the JobClient to figure out which of the user-needed artifacts are public and which are not. So in the distributed-cluster case with DFS, this definition of public-cache doesn't need to change irrespective of whether you have Windows or Linux underneath. - If you are talking about a distributed MR cluster working on a local filesystem, yes, your changes will be needed, but that mode is not a supported setup anyway and will most likely need many more changes besides yours. Regarding the permissions-related changes: - I believe TT absolutely needs to set ugo+rx for dirs containing expanded archives. This is needed to address some of the artifacts which retain permissions from the original bits that a user uploads. So let's not move/change that code out of the archives code block. - And for files, can you tell me why the 2nd line in the code-fragment shown below doesn't already do it correctly on Windows? It may in fact be because of some other bug, so asking - is it not enough to set correct permissions on the file itself in case of Windows? {code} ... sourceFs.copyToLocalFile(sourcePath, workFile); localFs.setPermission(workFile, permission); if (isArchive) { ... {code} Public distributed cache support for Windows Key: HADOOP-8731 URL: https://issues.apache.org/jira/browse/HADOOP-8731 Project: Hadoop Common Issue Type: Bug Components: filecache Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HADOOP-8731-PublicCache.patch A distributed cache file is considered public (sharable between MR jobs) if OTHER has read permissions on the file and +x permissions all the way up in the folder hierarchy. By default, Windows permissions are mapped to 700 all the way up to the drive letter, and it is unreasonable to ask users to change the permission on the whole drive to make the file public. IOW, it is hardly possible to have public distributed cache on Windows. To enable the scenario and make it more Windows friendly, the criteria on when a file is considered public should be relaxed. One proposal is to check whether the user has given EVERYONE group permission on the file only (and discard the +x check on parent folders). Security considerations for the proposal: Default permissions on Unix platforms are usually 775 or 755 meaning that OTHER users can read and list folders by default. What this also means is that Hadoop users have to explicitly make the files private in order to make them private in the cluster (please correct me if this is not the case in real life!). On Windows, default permissions are 700. This means that by default all files are private. In the new model, if users want to make them public, they have to explicitly add EVERYONE group permissions on the file. TestTrackerDistributedCacheManager fails because of this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8811) Compile hadoop native library in FreeBSD
[ https://issues.apache.org/jira/browse/HADOOP-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456059#comment-13456059 ] Hadoop QA commented on HADOOP-8811: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12545197/freebsd-native.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1461//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/1461//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1461//console This message is automatically generated. Compile hadoop native library in FreeBSD Key: HADOOP-8811 URL: https://issues.apache.org/jira/browse/HADOOP-8811 Project: Hadoop Common Issue Type: Bug Components: native Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 Reporter: Radim Kolar Priority: Critical Labels: freebsd Attachments: freebsd-native.txt The native Hadoop library does not compile on FreeBSD because setnetgrent returns void and the assembler does not support SSE4 instructions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8803) Make Hadoop run more securely in a public cloud environment
[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456075#comment-13456075 ] Xianqing Yu commented on HADOOP-8803: - Hi Kingshuk, In the long term, new features in Hadoop can potentially give other management systems built on Hadoop more flexibility in design and use. The core observation is that some improvements would be much easier to implement if we can do them in Hadoop's code (and those features can be reused by other systems, which saves cost and time). For example, the byte-range restriction design in the Block Token can also be used by management systems outside Hadoop. But without this type of Block Token, programmers may need to design a much more complex mechanism to achieve the same function, so companies need to pay money and time for that. It is reasonable to think of Hadoop as wrapped in middleware layers, but improving the Hadoop layer would give the upper and lower layers better flexibility, performance, or security. Cloud providers can add additional security layers, but all of those services cost extra. And even if cloud providers invest huge amounts of money in those areas, I think they still cannot guarantee that data is safe without a huge performance penalty. Considering the balance of cost, performance, and security, I decided to do some work in the Hadoop layer. So what I believe is that a better-designed and more secure Hadoop can reduce the difficulty of implementing the overall system, such as Hortonworks HDP and IBM BigInsights, thus saving time and cost. I think we should always look at what Hadoop users really need. Programmers writing systems based on Hadoop would like to see some good features come out of Hadoop. That is why I want to post my idea here for discussion, to see what the real needs and difficulties from industry are. Make Hadoop run more securely in a public cloud environment Key: HADOOP-8803 URL: https://issues.apache.org/jira/browse/HADOOP-8803 Project: Hadoop Common Issue Type: New Feature Components: fs, ipc, security Affects Versions: 0.20.204.0 Reporter: Xianqing Yu Labels: hadoop Original Estimate: 2m Remaining Estimate: 2m I am a Ph.D. student at North Carolina State University. I am modifying Hadoop's code (including most parts of Hadoop, e.g. JobTracker, TaskTracker, NameNode, DataNode) to achieve better security. My major goal is to make Hadoop run more securely in the Cloud environment, especially the public Cloud environment. In order to achieve that, I redesign the current security mechanism and achieve the following properties: 1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS access control is based on user or block granularity, e.g. the HDFS Delegation Token only checks whether the file can be accessed by a certain user or not, and the Block Token only proves which block or blocks can be accessed. I make Hadoop do byte-granularity access control, so each accessing party, user or task process, can only access the bytes it minimally needs. 2. I assume that in the public Cloud environment, only the NameNode, secondary NameNode, and JobTracker can be trusted. A large number of DataNodes and TaskTrackers may be compromised, since some of them may be running in less secure environments. So I re-design the security mechanism to minimize the damage a hacker can do. a. Re-design the Block Access Token to solve the widely-shared-key problem of HDFS. 
In the original Block Access Token design, all of HDFS (NameNode and DataNodes) shares one master key to generate Block Access Tokens; if one DataNode is compromised by a hacker, the hacker can get the key and generate any Block Access Token he or she wants. b. Re-design the HDFS Delegation Token to do fine-grained access control for the TaskTracker and Map-Reduce Task processes on HDFS. In Hadoop 0.20.204, all TaskTrackers can use their Kerberos credentials to access any files for MapReduce on HDFS. So they have the same privileges as the JobTracker to read or write tokens, copy job files, etc. However, if one of them is compromised, every critical thing in the MapReduce directory (job files, Delegation Tokens) is exposed to the attacker. I solve the problem by making the JobTracker decide which TaskTracker can access which file in the MapReduce directory on HDFS. For the Task process, once it gets the HDFS Delegation Token, it can access everything belonging to this job or user on HDFS. In my design, it can only access the bytes it needs from HDFS. There are some other security improvements, such as the TaskTracker cannot learn some information like the blockID from the Block Token (because it is encrypted in my design), and HDFS can set up
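To picture the byte-range Block Token idea, here is a purely hypothetical sketch; the class, its fields, and the check are illustrative only and are not code from this proposal or from Hadoop:
{code}
// Hypothetical illustration: a token that authorizes reads only inside
// [startOffset, endOffset) of a single block, so a compromised holder
// cannot use it to reach the rest of the block.
public final class ByteRangeBlockTokenSketch {
  private final long blockId;
  private final long startOffset;  // first byte the holder may read
  private final long endOffset;    // one past the last readable byte

  public ByteRangeBlockTokenSketch(long blockId, long start, long end) {
    if (start < 0 || end < start) {
      throw new IllegalArgumentException("invalid byte range");
    }
    this.blockId = blockId;
    this.startOffset = start;
    this.endOffset = end;
  }

  // Would be consulted by a DataNode before serving a read request.
  public boolean permitsRead(long requestedBlockId, long offset, long length) {
    return requestedBlockId == blockId
        && offset >= startOffset
        && offset + length <= endOffset;
  }
}
{code}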
[jira] [Commented] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
[ https://issues.apache.org/jira/browse/HADOOP-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456092#comment-13456092 ] Eli Collins commented on HADOOP-8801: - @Steve, I'm happy to address your feedback in a follow-up jira. On a related note, would you be willing to make sure *all* your changes have been code reviewed? Most of your recent changes (HADOOP-8064, HADOOP-7878, HADOOP-, HADOOP-7772, HADOOP-7727, HADOOP-7705, etc) have been committed without any code review or a +1 from another committer, which, as I understand it, is against our project's policy. I'm a little surprised you want more time to have this change code reviewed when you commit your own changes without any code review. Likewise, I don't want to revert these changes, but they should not have been committed without review and a +1 from another committer. ExitUtil#terminate should capture the exception stack trace --- Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.0.2-alpha Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HADOOP-8806: - Attachment: HADOOP-8806.003.patch * insert '$ORIGIN/' into the RPATH of {{libhadoop.so}} on Linux. libhadoop.so: search java.library.path when calling dlopen -- Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Priority: Minor Attachments: HADOOP-8806.003.patch libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Should we also search {{java.library.path}} when loading these libraries? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8806) libhadoop.so: dlopen should be better at locating libsnappy.so, etc.
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HADOOP-8806: - Description: libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Can we make this configuration just work without needing to rely on {{LD_LIBRARY_PATH}}? was: libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Should we also search {{java.library.path}} when loading these libraries? Summary: libhadoop.so: dlopen should be better at locating libsnappy.so, etc. (was: libhadoop.so: search java.library.path when calling dlopen) libhadoop.so: dlopen should be better at locating libsnappy.so, etc. Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Priority: Minor Attachments: HADOOP-8806.003.patch libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Can we make this configuration just work without needing to rely on {{LD_LIBRARY_PATH}}? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8806) libhadoop.so: dlopen should be better at locating libsnappy.so, etc.
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HADOOP-8806: - Assignee: Colin Patrick McCabe Status: Patch Available (was: Open) libhadoop.so: dlopen should be better at locating libsnappy.so, etc. Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HADOOP-8806.003.patch libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Can we make this configuration just work without needing to rely on {{LD_LIBRARY_PATH}}? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8806) libhadoop.so: dlopen should be better at locating libsnappy.so, etc.
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456120#comment-13456120 ] Colin Patrick McCabe commented on HADOOP-8806: -- I have tested this patch with Yarn running TestDFSIO, and it works. I can drop {{libsnappy.so}} into {{$HADOOP_ROOT/lib/native}} (the same directory that contains {{libhadoop.so}}) and everything just works. No {{LD_LIBRARY_PATH}} required, and no code changes required. libhadoop.so: dlopen should be better at locating libsnappy.so, etc. Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HADOOP-8806.003.patch libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Can we make this configuration just work without needing to rely on {{LD_LIBRARY_PATH}}? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Wang updated HADOOP-8805: Status: Open (was: Patch Available) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common - Key: HADOOP-8805 URL: https://issues.apache.org/jira/browse/HADOOP-8805 Project: Hadoop Common Issue Type: Improvement Reporter: Bo Wang Assignee: Bo Wang Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, HADOOP-8805-v3.patch org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Wang updated HADOOP-8805: Status: Patch Available (was: Open) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common - Key: HADOOP-8805 URL: https://issues.apache.org/jira/browse/HADOOP-8805 Project: Hadoop Common Issue Type: Improvement Reporter: Bo Wang Assignee: Bo Wang Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, HADOOP-8805-v3.patch org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows
[ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456134#comment-13456134 ] Ivan Mitic commented on HADOOP-8731: Thanks Vinod, these are great comments! Some answers below. {quote}In the case of a real cluster, and with HDFS, the definition of a public dist-cache file is one which is accessible to all users; and HDFS also has POSIX-style permissions. The method isPublic() is eventually used by the JobClient to figure out which of the user-needed artifacts are public and which are not. So in the distributed-cluster case with DFS, this definition of public-cache doesn't need to change irrespective of whether you have Windows or Linux underneath. {quote} I agree, the JobClient will evaluate whether a file should be public or private. Now, if I understood things correctly, based on whether the file is marked public or private on the JobClient side, it will later be downloaded from DFS to the public or private LFS location on the TT machine. What we are proposing with this change is to change the logic on the JobClient side that determines whether the file is public or private. Given that all files are by default private on Windows, it would be a real challenge for users to upload a file to the public distributed cache if we keep the old model (see my previous comments). Does this make sense? Please do comment; maybe I just didn't understand correctly how DC works. bq. I believe TT absolutely needs to set ugo+rx for dirs containing expanded archives. This is needed to address some of the artifacts which retain permissions from the original bits that a user uploads. So let's not move/change that code out of the archives code block. Ah, I didn't think of this. Will revert to the original chmod. bq. And for files, can you tell me why the 2nd line in the code-fragment shown below doesn't already do it correctly on Windows? It may in fact be because of some other bug, so asking - is it not enough to set correct permissions on the file itself in case of Windows? You're right; let me debug to see what the problem was here. I made this fix a while back. Public distributed cache support for Windows Key: HADOOP-8731 URL: https://issues.apache.org/jira/browse/HADOOP-8731 Project: Hadoop Common Issue Type: Bug Components: filecache Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HADOOP-8731-PublicCache.patch A distributed cache file is considered public (sharable between MR jobs) if OTHER has read permissions on the file and +x permissions all the way up in the folder hierarchy. By default, Windows permissions are mapped to 700 all the way up to the drive letter, and it is unreasonable to ask users to change the permission on the whole drive to make the file public. IOW, it is hardly possible to have public distributed cache on Windows. To enable the scenario and make it more Windows friendly, the criteria on when a file is considered public should be relaxed. One proposal is to check whether the user has given EVERYONE group permission on the file only (and discard the +x check on parent folders). Security considerations for the proposal: Default permissions on Unix platforms are usually 775 or 755 meaning that OTHER users can read and list folders by default. What this also means is that Hadoop users have to explicitly make the files private in order to make them private in the cluster (please correct me if this is not the case in real life!). On Windows, default permissions are 700.
This means that by default all files are private. In the new model, if users want to make them public, they have to explicitly add EVERYONE group permissions on the file. TestTrackerDistributedCacheManager fails because of this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8809) RPMs should skip useradds if the users already exist
[ https://issues.apache.org/jira/browse/HADOOP-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456145#comment-13456145 ] Eli Collins commented on HADOOP-8809: - Bigtop has a fix for this btw. Now that Bigtop is becoming a TLP, what do people think of just removing the packaging code so we don't maintain the same thing in two projects? RPMs should skip useradds if the users already exist Key: HADOOP-8809 URL: https://issues.apache.org/jira/browse/HADOOP-8809 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 1.0.3 Reporter: Steve Loughran Priority: Minor The hadoop.spec preinstall script creates users, but it does this even if they already exist. This may cause problems if the installation already has those users with different uids. A check with {{id}} can avoid this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8806) libhadoop.so: dlopen should be better at locating libsnappy.so, etc.
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456148#comment-13456148 ] Hadoop QA commented on HADOOP-8806: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12545207/HADOOP-8806.003.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ha.TestZKFailoverController +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1462//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/1462//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1462//console This message is automatically generated. libhadoop.so: dlopen should be better at locating libsnappy.so, etc. Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HADOOP-8806.003.patch libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Can we make this configuration just work without needing to rely on {{LD_LIBRARY_PATH}}? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8733) TestStreamingTaskLog, TestJvmManager, TestLinuxTaskControllerLaunchArgs fail on Windows
[ https://issues.apache.org/jira/browse/HADOOP-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456158#comment-13456158 ] Ivan Mitic commented on HADOOP-8733: Thanks again for reviewing, Vinod! bq. One minor point: In TestJvmManager, instead of creating a dummy file for WINDOWS, would it be possible to simulate the Child code like on Linux? Is final String jvmName = ManagementFactory.getRuntimeMXBean().getName(); in Child.java the call that is used to send the pid from the Child to the TT? If so, we should just simulate that code. I'm not sure I understand your comment. Do you mind clarifying a bit? Just to provide some context from my side: on Windows, we don't use the process ID to identify the task process; instead we use the attemptId string. This ID is tied to the child task using Windows [JobObjects|http://msdn.microsoft.com/en-us/library/windows/desktop/ms684161(v=vs.85).aspx], and the TT uses this ID to kill the task if needed. Now, in TestJvmManager we are verifying that the task is killed properly, and on Windows there is no need to circulate the PID from the task back to the TT, as the TT has this info already. Hope this helps. TestStreamingTaskLog, TestJvmManager, TestLinuxTaskControllerLaunchArgs fail on Windows --- Key: HADOOP-8733 URL: https://issues.apache.org/jira/browse/HADOOP-8733 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 1-win Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HADOOP-8733-scripts.2.patch, HADOOP-8733-scripts.2.patch, HADOOP-8733-scripts.patch Jira tracking test failures related to test .sh script dependencies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Wang updated HADOOP-8805: Status: Open (was: Patch Available) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common - Key: HADOOP-8805 URL: https://issues.apache.org/jira/browse/HADOOP-8805 Project: Hadoop Common Issue Type: Improvement Reporter: Bo Wang Assignee: Bo Wang Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, HADOOP-8805-v3.patch org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Wang updated HADOOP-8805: Status: Patch Available (was: Open) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common - Key: HADOOP-8805 URL: https://issues.apache.org/jira/browse/HADOOP-8805 Project: Hadoop Common Issue Type: Improvement Reporter: Bo Wang Assignee: Bo Wang Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, HADOOP-8805-v3.patch org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Wang updated HADOOP-8805: Attachment: HADOOP-8805-v3.patch Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common - Key: HADOOP-8805 URL: https://issues.apache.org/jira/browse/HADOOP-8805 Project: Hadoop Common Issue Type: Improvement Reporter: Bo Wang Assignee: Bo Wang Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, HADOOP-8805-v3.patch org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8805) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common
[ https://issues.apache.org/jira/browse/HADOOP-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Wang updated HADOOP-8805: Attachment: (was: HADOOP-8805-v3.patch) Move protocol buffer implementation of GetUserMappingProtocol from HDFS to Common - Key: HADOOP-8805 URL: https://issues.apache.org/jira/browse/HADOOP-8805 Project: Hadoop Common Issue Type: Improvement Reporter: Bo Wang Assignee: Bo Wang Attachments: HADOOP-8805.patch, HADOOP-8805-v2.patch, HADOOP-8805-v3.patch org.apache.hadoop.tools.GetUserMappingProtocol is used in both HDFS and YARN. We should move the protocol buffer implementation from HDFS to Common so that it can also be used by YARN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8812) ExitUtil#terminate should print Exception#toString
Eli Collins created HADOOP-8812: --- Summary: ExitUtil#terminate should print Exception#toString Key: HADOOP-8812 URL: https://issues.apache.org/jira/browse/HADOOP-8812 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Per Steve's feedback on ExitUtil#terminate should print Exception#toString rather than use getMessage as the latter may return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8812) ExitUtil#terminate should print Exception#toString
[ https://issues.apache.org/jira/browse/HADOOP-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HADOOP-8812: Attachment: hadoop-8812.txt Patch attached. Thanks Steve for the good suggestion. ExitUtil#terminate should print Exception#toString --- Key: HADOOP-8812 URL: https://issues.apache.org/jira/browse/HADOOP-8812 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Attachments: hadoop-8812.txt Per Steve's feedback on ExitUtil#terminate should print Exception#toString rather than use getMessage as the latter may return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8812) ExitUtil#terminate should print Exception#toString
[ https://issues.apache.org/jira/browse/HADOOP-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HADOOP-8812: Status: Patch Available (was: Open) ExitUtil#terminate should print Exception#toString --- Key: HADOOP-8812 URL: https://issues.apache.org/jira/browse/HADOOP-8812 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Attachments: hadoop-8812.txt Per Steve's feedback on ExitUtil#terminate should print Exception#toString rather than use getMessage as the latter may return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8812) ExitUtil#terminate should print Exception#toString
[ https://issues.apache.org/jira/browse/HADOOP-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456196#comment-13456196 ] Hadoop QA commented on HADOOP-8812: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12545226/hadoop-8812.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1464//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/1464//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1464//console This message is automatically generated. ExitUtil#terminate should print Exception#toString --- Key: HADOOP-8812 URL: https://issues.apache.org/jira/browse/HADOOP-8812 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Attachments: hadoop-8812.txt Per Steve's feedback on ExitUtil#terminate should print Exception#toString rather than use getMessage as the latter may return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8812) ExitUtil#terminate should print Exception#toString
[ https://issues.apache.org/jira/browse/HADOOP-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456205#comment-13456205 ] Todd Lipcon commented on HADOOP-8812: - Why not replace the whole thing with StringUtils.stringifyException? ExitUtil#terminate should print Exception#toString --- Key: HADOOP-8812 URL: https://issues.apache.org/jira/browse/HADOOP-8812 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Attachments: hadoop-8812.txt Per Steve's feedback on ExitUtil#terminate should print Exception#toString rather than use getMessage as the latter may return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
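To illustrate Todd's point: Throwable#getMessage may be null, whereas StringUtils.stringifyException returns the exception's class name plus its full stack trace. A small standalone demo, not the HADOOP-8812 patch:
{code}
import org.apache.hadoop.util.StringUtils;

public class StringifyDemo {
  public static void main(String[] args) {
    Throwable t = new NullPointerException();  // no message on purpose
    System.out.println("getMessage(): " + t.getMessage());  // prints "null"
    System.out.println("toString():   " + t);               // class name only
    System.out.println("stringifyException():\n"
        + StringUtils.stringifyException(t));  // full stack trace
  }
}
{code}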
[jira] [Commented] (HADOOP-8457) Address file ownership issue for users in Administrators group on Windows.
[ https://issues.apache.org/jira/browse/HADOOP-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456212#comment-13456212 ] Sanjay Radia commented on HADOOP-8457: -- +1 Address file ownership issue for users in Administrators group on Windows. -- Key: HADOOP-8457 URL: https://issues.apache.org/jira/browse/HADOOP-8457 Project: Hadoop Common Issue Type: Bug Components: native Affects Versions: 1.1.0 Reporter: Chuan Liu Assignee: Ivan Mitic Priority: Minor Attachments: HADOOP-8457-branch-1-win_Admins(2).patch, HADOOP-8457-branch-1-win_Admins(3).patch, HADOOP-8457-branch-1-win_Admins.patch On Linux, the initial file owners are the creators. (I think this is true in general. If there are exceptions, please let me know.) On Windows, a file created by a user in the Administrators group has the initial owner ‘Administrators’, i.e. the Administrators group is the initial owner of the file. This leads to an exception when we check file ownership in the SecureIOUtils.checkStat() method. As a result, this method is disabled right now. We need to address this problem and enable the method on Windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HADOOP-7930) Kerberos relogin interval in UserGroupInformation should be configurable
[ https://issues.apache.org/jira/browse/HADOOP-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter reassigned HADOOP-7930: - Assignee: Robert Kanter Kerberos relogin interval in UserGroupInformation should be configurable Key: HADOOP-7930 URL: https://issues.apache.org/jira/browse/HADOOP-7930 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 0.23.1, 0.24.0 Reporter: Alejandro Abdelnur Assignee: Robert Kanter Fix For: 0.23.3, 0.24.0 Currently the check done in the *hasSufficientTimeElapsed()* method is hardcoded to a 10 min wait. The wait time should be driven by configuration, and its default value for clients should be 1 min. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-7930) Kerberos relogin interval in UserGroupInformation should be configurable
[ https://issues.apache.org/jira/browse/HADOOP-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated HADOOP-7930: -- Status: Patch Available (was: Open) Kerberos relogin interval in UserGroupInformation should be configurable Key: HADOOP-7930 URL: https://issues.apache.org/jira/browse/HADOOP-7930 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 0.23.1, 0.24.0 Reporter: Alejandro Abdelnur Assignee: Robert Kanter Fix For: 0.23.3, 0.24.0 Attachments: HADOOP-7930.patch Currently the check done in the *hasSufficientTimeElapsed()* method is hardcoded to a 10 min wait. The wait time should be driven by configuration, and its default value for clients should be 1 min. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-7930) Kerberos relogin interval in UserGroupInformation should be configurable
[ https://issues.apache.org/jira/browse/HADOOP-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated HADOOP-7930: -- Attachment: HADOOP-7930.patch I added a property called {{hadoop.kerberos.min.time.before.relogin}} that is used to specify the relogin interval. Kerberos relogin interval in UserGroupInformation should be configurable Key: HADOOP-7930 URL: https://issues.apache.org/jira/browse/HADOOP-7930 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 0.23.1, 0.24.0 Reporter: Alejandro Abdelnur Assignee: Robert Kanter Fix For: 0.23.3, 0.24.0 Attachments: HADOOP-7930.patch Currently the check done in the *hasSufficientTimeElapsed()* method is hardcoded to a 10 min wait. The wait time should be driven by configuration, and its default value for clients should be 1 min. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
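A rough sketch of how such a configurable interval could be read and used; the property name comes from the comment above, but the 60-second default, the unit, and the surrounding class are assumptions, not the actual UserGroupInformation change:
{code}
import org.apache.hadoop.conf.Configuration;

public class ReloginIntervalSketch {
  public static final String KERBEROS_MIN_TIME_BEFORE_RELOGIN =
      "hadoop.kerberos.min.time.before.relogin";

  private final long minMillisBeforeRelogin;
  private long lastReloginTime = 0;

  public ReloginIntervalSketch(Configuration conf) {
    // Read the interval (assumed here to be in seconds) and convert to ms;
    // the 60-second default is illustrative.
    minMillisBeforeRelogin =
        1000L * conf.getLong(KERBEROS_MIN_TIME_BEFORE_RELOGIN, 60);
  }

  // Replaces the old hardcoded 10-minute check.
  boolean hasSufficientTimeElapsed(long now) {
    return now - lastReloginTime >= minMillisBeforeRelogin;
  }
}
{code}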
[jira] [Updated] (HADOOP-8812) ExitUtil#terminate should print Exception#toString
[ https://issues.apache.org/jira/browse/HADOOP-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HADOOP-8812: Attachment: hadoop-8812.txt That's better. Patch attached; it fixes the same issue in a test while we're at it. ExitUtil#terminate should print Exception#toString --- Key: HADOOP-8812 URL: https://issues.apache.org/jira/browse/HADOOP-8812 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Attachments: hadoop-8812.txt, hadoop-8812.txt Per Steve's feedback on ExitUtil#terminate should print Exception#toString rather than use getMessage as the latter may return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8812) ExitUtil#terminate should print Exception#toString
[ https://issues.apache.org/jira/browse/HADOOP-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456220#comment-13456220 ] Eli Collins commented on HADOOP-8812: - Btw, terminate will LOG#fatal the message, which is why I am just passing the stringified exception. ExitUtil#terminate should print Exception#toString --- Key: HADOOP-8812 URL: https://issues.apache.org/jira/browse/HADOOP-8812 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Attachments: hadoop-8812.txt, hadoop-8812.txt Per Steve's feedback on ExitUtil#terminate should print Exception#toString rather than use getMessage as the latter may return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8780) Update DeprecatedProperties apt file
[ https://issues.apache.org/jira/browse/HADOOP-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456222#comment-13456222 ] Eli Collins commented on HADOOP-8780: - Hey guys, looks like this change introduced a findbugs warning: https://builds.apache.org/job/PreCommit-HADOOP-Build/1462//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Update DeprecatedProperties apt file Key: HADOOP-8780 URL: https://issues.apache.org/jira/browse/HADOOP-8780 Project: Hadoop Common Issue Type: Bug Reporter: Ahmed Radwan Assignee: Ahmed Radwan Fix For: 2.0.3-alpha Attachments: HADOOP-8780.patch, HADOOP-8780_rev2.patch, HADOOP-8780_rev3.patch The current list of deprecated properties is not up-to-date. I'll upload a patch momentarily. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8806) libhadoop.so: dlopen should be better at locating libsnappy.so, etc.
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456228#comment-13456228 ] Eli Collins commented on HADOOP-8806: - Overall seems reasonable. - Why use RPATH instead of RUNPATH? - Have you tested with libsnappy.so in $HADOOP_ROOT/lib/native as well as installed in the system (ie in LD_LIBRARY_PATH)? libhadoop.so: dlopen should be better at locating libsnappy.so, etc. Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HADOOP-8806.003.patch libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Can we make this configuration just work without needing to rely on {{LD_LIBRARY_PATH}}? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8780) Update DeprecatedProperties apt file
[ https://issues.apache.org/jira/browse/HADOOP-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Radwan updated HADOOP-8780: - Attachment: HADOOP-8780_amendment.patch This is weird; I don't know why it wasn't showing in my report here: https://issues.apache.org/jira/browse/HADOOP-8780?focusedCommentId=13454161&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13454161 I am attaching a minor amendment that takes care of this findbugs warning. Eli, Tom mentioned earlier that Jenkins is not running test-patch as it is timing out when running the HDFS tests. Any idea how to fix that so we can directly get the test-patch report? Update DeprecatedProperties apt file Key: HADOOP-8780 URL: https://issues.apache.org/jira/browse/HADOOP-8780 Project: Hadoop Common Issue Type: Bug Reporter: Ahmed Radwan Assignee: Ahmed Radwan Fix For: 2.0.3-alpha Attachments: HADOOP-8780_amendment.patch, HADOOP-8780.patch, HADOOP-8780_rev2.patch, HADOOP-8780_rev3.patch The current list of deprecated properties is not up-to-date. I'll upload a patch momentarily. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira