[jira] [Commented] (MAPREDUCE-6257) Document encrypted spills
[ https://issues.apache.org/jira/browse/MAPREDUCE-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661837#comment-14661837 ] Hudson commented on MAPREDUCE-6257: --- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #269 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/269/]) MAPREDUCE-6257. Document encrypted spills (Bibin A Chundatt via aw) (aw: rev fb1be0b3100cdd69f6dc1987585fcadd4e7c8a2a) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/EncryptedShuffle.md * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml Document encrypted spills - Key: MAPREDUCE-6257 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6257 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Reporter: Allen Wittenauer Assignee: Bibin A Chundatt Fix For: 3.0.0 Attachments: 0001-MAPREDUCE-6257.patch, 0002-MAPREDUCE-6257.patch, 0003-MAPREDUCE-6257.patch, EncryptedShuffle.html Encrypted spills appear to be completely undocumented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661841#comment-14661841 ] Hudson commented on MAPREDUCE-6443: --- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #269 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/269/]) MAPREDUCE-6443. Add JvmPauseMonitor to JobHistoryServer. Contributed by Robert Kanter. (junping_du: rev e73a928a6360f68aaee2ed58b3a8d180f4051407) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java * hadoop-mapreduce-project/CHANGES.txt Add JvmPauseMonitor to Job History Server - Key: MAPREDUCE-6443 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 2.8.0 Attachments: MAPREDUCE-6443.001.patch, MAPREDUCE-6443.002.patch We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
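For readers unfamiliar with the idea, a pause monitor of this kind is commonly built as a thread that sleeps for a fixed interval and reports when it wakes up much later than requested. The sketch below is only an illustration in plain Java. The class name PauseMonitorSketch, its thresholds, and its log message are hypothetical and are not taken from the Hadoop JvmPauseMonitor or from the patch attached to this issue.

```java
/**
 * Illustrative sleep-and-measure pause detector.
 * Hypothetical names and thresholds; not the Hadoop JvmPauseMonitor code.
 */
final class PauseMonitorSketch implements Runnable {
    private final long sleepMillis;
    private final long warnThresholdMillis;
    private volatile boolean running = true;

    PauseMonitorSketch(long sleepMillis, long warnThresholdMillis) {
        this.sleepMillis = sleepMillis;
        this.warnThresholdMillis = warnThresholdMillis;
    }

    /** Time spent asleep beyond the requested interval, in milliseconds. */
    static long extraSleepMillis(long startNanos, long endNanos, long sleepMillis) {
        return (endNanos - startNanos) / 1_000_000L - sleepMillis;
    }

    /** Asks the monitoring loop to exit after its current sleep. */
    void stop() {
        running = false;
    }

    @Override
    public void run() {
        while (running) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and leave the loop.
                Thread.currentThread().interrupt();
                return;
            }
            long extra = extraSleepMillis(start, System.nanoTime(), sleepMillis);
            if (extra > warnThresholdMillis) {
                // A long overshoot suggests the whole JVM (or host) was paused.
                System.err.println("Detected pause of approximately " + extra + " ms");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PauseMonitorSketch monitor = new PauseMonitorSketch(500, 1000);
        Thread thread = new Thread(monitor, "pause-monitor-sketch");
        thread.setDaemon(true);
        thread.start();
        Thread.sleep(2000);   // let the monitor run briefly
        monitor.stop();
        thread.join(1000);
    }
}
```

In a server process, a thread like this would typically be started with the service and stopped on shutdown. How the Job History Server wires in the real monitor is defined by the patch, not by this sketch.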
[jira] [Commented] (MAPREDUCE-6257) Document encrypted spills
[ https://issues.apache.org/jira/browse/MAPREDUCE-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661860#comment-14661860 ] Hudson commented on MAPREDUCE-6257: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #277 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/277/]) MAPREDUCE-6257. Document encrypted spills (Bibin A Chundatt via aw) (aw: rev fb1be0b3100cdd69f6dc1987585fcadd4e7c8a2a) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/EncryptedShuffle.md Document encrypted spills - Key: MAPREDUCE-6257 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6257 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Reporter: Allen Wittenauer Assignee: Bibin A Chundatt Fix For: 3.0.0 Attachments: 0001-MAPREDUCE-6257.patch, 0002-MAPREDUCE-6257.patch, 0003-MAPREDUCE-6257.patch, EncryptedShuffle.html Encrypted spills appear to be completely undocumented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (MAPREDUCE-6446) Support SSL for AM webapp
[ https://issues.apache.org/jira/browse/MAPREDUCE-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena moved YARN-3651 to MAPREDUCE-6446: --- Affects Version/s: (was: 2.7.0) 2.7.0 Component/s: (was: applications) (was: resourcemanager) resourcemanager Key: MAPREDUCE-6446 (was: YARN-3651) Project: Hadoop Map/Reduce (was: Hadoop YARN) Support SSL for AM webapp - Key: MAPREDUCE-6446 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6446 Project: Hadoop Map/Reduce Issue Type: Improvement Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Priority: Minor The application URL shown by the application CLI is wrong. Steps to reproduce == 1. Start an HA setup in insecure mode 2. Configure HTTPS_ONLY 3. Submit an application to the cluster 4. Execute the command ./yarn application -list 5. Observe the tracking URL shown {code} 15/05/15 13:34:38 INFO client.AHSProxy: Connecting to Application History server at /IP:45034 Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1 Application-Id --- Tracking-URL application_1431672734347_0003 *http://host-10-19-92-117:13013* {code} *Expected* https://IP:64323/proxy/application_1431672734347_0003 / -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661632#comment-14661632 ] Hudson commented on MAPREDUCE-6443: --- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #280 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/280/]) MAPREDUCE-6443. Add JvmPauseMonitor to JobHistoryServer. Contributed by Robert Kanter. (junping_du: rev e73a928a6360f68aaee2ed58b3a8d180f4051407) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java * hadoop-mapreduce-project/CHANGES.txt Add JvmPauseMonitor to Job History Server - Key: MAPREDUCE-6443 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 2.8.0 Attachments: MAPREDUCE-6443.001.patch, MAPREDUCE-6443.002.patch We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6257) Document encrypted spills
[ https://issues.apache.org/jira/browse/MAPREDUCE-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661628#comment-14661628 ] Hudson commented on MAPREDUCE-6257: --- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #280 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/280/]) MAPREDUCE-6257. Document encrypted spills (Bibin A Chundatt via aw) (aw: rev fb1be0b3100cdd69f6dc1987585fcadd4e7c8a2a) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/EncryptedShuffle.md * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * hadoop-mapreduce-project/CHANGES.txt Document encrypted spills - Key: MAPREDUCE-6257 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6257 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Reporter: Allen Wittenauer Assignee: Bibin A Chundatt Fix For: 3.0.0 Attachments: 0001-MAPREDUCE-6257.patch, 0002-MAPREDUCE-6257.patch, 0003-MAPREDUCE-6257.patch, EncryptedShuffle.html Encrypted spills appear to be completely undocumented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661642#comment-14661642 ] Hudson commented on MAPREDUCE-6443: --- FAILURE: Integrated in Hadoop-Yarn-trunk #1010 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1010/]) MAPREDUCE-6443. Add JvmPauseMonitor to JobHistoryServer. Contributed by Robert Kanter. (junping_du: rev e73a928a6360f68aaee2ed58b3a8d180f4051407) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java * hadoop-mapreduce-project/CHANGES.txt Add JvmPauseMonitor to Job History Server - Key: MAPREDUCE-6443 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 2.8.0 Attachments: MAPREDUCE-6443.001.patch, MAPREDUCE-6443.002.patch We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6257) Document encrypted spills
[ https://issues.apache.org/jira/browse/MAPREDUCE-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661638#comment-14661638 ] Hudson commented on MAPREDUCE-6257: --- FAILURE: Integrated in Hadoop-Yarn-trunk #1010 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1010/]) MAPREDUCE-6257. Document encrypted spills (Bibin A Chundatt via aw) (aw: rev fb1be0b3100cdd69f6dc1987585fcadd4e7c8a2a) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/EncryptedShuffle.md * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * hadoop-mapreduce-project/CHANGES.txt Document encrypted spills - Key: MAPREDUCE-6257 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6257 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Reporter: Allen Wittenauer Assignee: Bibin A Chundatt Fix For: 3.0.0 Attachments: 0001-MAPREDUCE-6257.patch, 0002-MAPREDUCE-6257.patch, 0003-MAPREDUCE-6257.patch, EncryptedShuffle.html Encrypted spills appear to be completely undocumented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6445) Shuffle hang
[ https://issues.apache.org/jira/browse/MAPREDUCE-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661599#comment-14661599 ] Peng Zhang commented on MAPREDUCE-6445: --- I found that most tasks of this job (94 of 100) failed as in MAPREDUCE-6303, so this may be related. I'll backport MAPREDUCE-6303 and test it on our cluster. Shuffle hang Key: MAPREDUCE-6445 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6445 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Our scale cluster has run for months with 2.6.0. 2 of 200 reduces hang on shuffle. Instance 1 log seems to loop on 1 map output: {noformat} 2015-08-06 21:54:14,649 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-132.bj:22408 to fetcher#1 2015-08-06 21:54:14,651 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193reduce=20map=attempt_1438689528746_10193_m_13_0,attempt_1438689528746_10193_m_20_0 sent hash and received reply 2015-08-06 21:54:14,651 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager returned status WAIT ...
2015-08-06 21:54:14,651 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-132.bj:22408 freed by fetcher#1 in 2ms 2015-08-06 21:54:14,651 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-132.bj:22408 with 2 to fetcher#5 2015-08-06 21:54:14,651 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-132.bj:22408 to fetcher#5 2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193reduce=20map=attempt_1438689528746_10193_m_13_0,attempt_1438689528746_10193_m_20_0 sent hash and received reply 2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 - MergeManager returned status WAIT ... 2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-132.bj:22408 freed by fetcher#5 in 4ms 2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-132.bj:22408 with 2 to fetcher#5 2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-132.bj:22408 to fetcher#5 2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193reduce=20map=attempt_1438689528746_10193_m_13_0,attempt_1438689528746_10193_m_20_0 sent hash and received reply 2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 - MergeManager returned status WAIT ... 
2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-132.bj:22408 freed by fetcher#5 in 5ms 2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-132.bj:22408 with 2 to fetcher#5 2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-132.bj:22408 to fetcher#5 {noformat} node 2 log seems like loop on 5 map output: {noformat} 2015-08-06 21:43:33,626 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-172.bj:22408 with 1 to fetcher#5 2015-08-06 21:43:33,626 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to node-172.bj:22408 to fetcher#5 2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193reduce=85map=attempt_1438689528746_10193_m_13_0,attempt_1438689528746_10193_m_20_0 sent hash and received reply 2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#3 - MergeManager returned status WAIT ... 2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-132.bj:22408 freed by fetcher#3 in 5ms 2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-179.bj:22408 with 1 to fetcher#3 2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to node-179.bj:22408 to fetcher#3 2015-08-06 21:43:33,627 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for
[jira] [Updated] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute
[ https://issues.apache.org/jira/browse/MAPREDUCE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote updated MAPREDUCE-6357: --- Attachment: (was: MAPREDUCE-6357-1.patch) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute -- Key: MAPREDUCE-6357 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6357 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Ivan Mitic Assignee: Dustin Cote After spending the afternoon debugging a user job where reduce tasks were failing on retry with the below exception, I think it would be worthwhile to add a note in the MultipleOutputs.write() documentation, saying that absolute paths may cause improper execution of tasks on retry or when MR speculative execution is enabled. {code} 2015-04-28 23:13:10,452 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: File already exists:wasb://full20150...@bgtstoragefull.blob.core.windows.net/user/hadoop/some/path/block-r-00299.bz2 at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1354) at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1195) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786) at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135) at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:475) at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:433) at com.ancestry.bigtree.hadoop.LevelReducer.processValue(LevelReducer.java:91) at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:69) at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:14) at 
org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) {code} As discussed in MAPREDUCE-3772, when the baseOutputPath passed to MultipleOutputs.write() is an absolute path (or more precisely a path that resolves outside of the job output-dir), the concept of output committing is not utilized. In this case, the user read thru the MultipleOutputs docs and was assuming that everything will be working fine, as there are blog posts saying that MultipleOutputs does handle output commit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
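The distinction described in this issue can be illustrated with a small plain-Java sketch. It assumes the job output directory is given as a URI and checks whether a baseOutputPath resolves inside it. The class and method names are hypothetical, java.net.URI stands in for Hadoop's own path handling, and the sample host and directories are made up. This is not how MultipleOutputs itself decides anything; it only shows which paths end up outside the directory that output committing covers.

```java
import java.net.URI;

/**
 * Hypothetical helper, for illustration only: reports whether a
 * baseOutputPath resolves inside the job output directory.
 */
final class BaseOutputPathCheck {

    static boolean staysInsideOutputDir(String jobOutputDir, String baseOutputPath) {
        // A trailing slash makes the URI resolve relative paths *under* the directory.
        String dir = jobOutputDir.endsWith("/") ? jobOutputDir : jobOutputDir + "/";
        URI root = URI.create(dir).normalize();
        // Absolute paths and full URIs replace the root; "../" segments are collapsed.
        URI resolved = root.resolve(baseOutputPath).normalize();
        return resolved.toString().startsWith(root.toString());
    }

    public static void main(String[] args) {
        String out = "hdfs://nn/user/hadoop/job-out";   // made-up output directory
        String[] candidates = {
            "levels/block",                   // relative: stays under the output dir
            "/user/hadoop/some/path/block",   // absolute: escapes the output dir
            "../other/block"                  // relative, but climbs out of the output dir
        };
        for (String candidate : candidates) {
            System.out.println(candidate + " -> inside output dir: "
                    + staysInsideOutputDir(out, candidate));
        }
    }
}
```

A job that needs retries or speculative execution to be safe would want every baseOutputPath to pass a check like this one, since per the description only paths under the job output directory benefit from output committing.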
[jira] [Updated] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute
[ https://issues.apache.org/jira/browse/MAPREDUCE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote updated MAPREDUCE-6357: --- Attachment: MAPREDUCE-6357-1.patch Submitting the javadoc changes. Please let me know if anything looks amiss. Thanks! MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute -- Key: MAPREDUCE-6357 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6357 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Ivan Mitic Assignee: Dustin Cote Attachments: MAPREDUCE-6357-1.patch After spending the afternoon debugging a user job where reduce tasks were failing on retry with the below exception, I think it would be worthwhile to add a note in the MultipleOutputs.write() documentation, saying that absolute paths may cause improper execution of tasks on retry or when MR speculative execution is enabled. {code} 2015-04-28 23:13:10,452 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: File already exists:wasb://full20150...@bgtstoragefull.blob.core.windows.net/user/hadoop/some/path/block-r-00299.bz2 at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1354) at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1195) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786) at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135) at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:475) at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:433) at com.ancestry.bigtree.hadoop.LevelReducer.processValue(LevelReducer.java:91) at
com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:69) at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:14) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) {code} As discussed in MAPREDUCE-3772, when the baseOutputPath passed to MultipleOutputs.write() is an absolute path (or more precisely a path that resolves outside of the job output-dir), the concept of output committing is not utilized. In this case, the user read thru the MultipleOutputs docs and was assuming that everything will be working fine, as there are blog posts saying that MultipleOutputs does handle output commit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute
[ https://issues.apache.org/jira/browse/MAPREDUCE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote updated MAPREDUCE-6357: --- Status: Open (was: Patch Available) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute -- Key: MAPREDUCE-6357 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6357 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Ivan Mitic Assignee: Dustin Cote Attachments: MAPREDUCE-6357-1.patch After spending the afternoon debugging a user job where reduce tasks were failing on retry with the below exception, I think it would be worthwhile to add a note in the MultipleOutputs.write() documentation, saying that absolute paths may cause improper execution of tasks on retry or when MR speculative execution is enabled. {code} 2015-04-28 23:13:10,452 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: File already exists:wasb://full20150...@bgtstoragefull.blob.core.windows.net/user/hadoop/some/path/block-r-00299.bz2 at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1354) at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1195) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786) at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135) at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:475) at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:433) at com.ancestry.bigtree.hadoop.LevelReducer.processValue(LevelReducer.java:91) at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:69) at 
com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:14) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) {code} As discussed in MAPREDUCE-3772, when the baseOutputPath passed to MultipleOutputs.write() is an absolute path (or more precisely a path that resolves outside of the job output-dir), the concept of output committing is not utilized. In this case, the user read thru the MultipleOutputs docs and was assuming that everything will be working fine, as there are blog posts saying that MultipleOutputs does handle output commit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6257) Document encrypted spills
[ https://issues.apache.org/jira/browse/MAPREDUCE-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661911#comment-14661911 ] Hudson commented on MAPREDUCE-6257: --- FAILURE: Integrated in Hadoop-Hdfs-trunk #2207 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2207/]) MAPREDUCE-6257. Document encrypted spills (Bibin A Chundatt via aw) (aw: rev fb1be0b3100cdd69f6dc1987585fcadd4e7c8a2a) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/EncryptedShuffle.md * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml Document encrypted spills - Key: MAPREDUCE-6257 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6257 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Reporter: Allen Wittenauer Assignee: Bibin A Chundatt Fix For: 3.0.0 Attachments: 0001-MAPREDUCE-6257.patch, 0002-MAPREDUCE-6257.patch, 0003-MAPREDUCE-6257.patch, EncryptedShuffle.html Encrypted spills appear to be completely undocumented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute
[ https://issues.apache.org/jira/browse/MAPREDUCE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661962#comment-14661962 ] Hadoop QA commented on MAPREDUCE-6357: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 46s | The applied patch generated 3 new checkstyle issues (total was 29, now 32). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 1m 47s | Tests passed in hadoop-mapreduce-client-core. 
| | | | 40m 43s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12749275/MAPREDUCE-6357-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b6265d3 | | checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5931/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-core.txt | | hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5931/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt | | Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5931/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5931/console | This message was automatically generated. MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute -- Key: MAPREDUCE-6357 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6357 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Ivan Mitic Assignee: Dustin Cote Attachments: MAPREDUCE-6357-1.patch After spending the afternoon debugging a user job where reduce tasks were failing on retry with the below exception, I think it would be worthwhile to add a note in the MultipleOutputs.write() documentation, saying that absolute paths may cause improper execution of tasks on retry or when MR speculative execution is enabled. 
{code} 2015-04-28 23:13:10,452 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: File already exists:wasb://full20150...@bgtstoragefull.blob.core.windows.net/user/hadoop/some/path/block-r-00299.bz2 at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1354) at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1195) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786) at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135) at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:475) at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:433) at com.ancestry.bigtree.hadoop.LevelReducer.processValue(LevelReducer.java:91) at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:69) at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:14) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) at
[jira] [Updated] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute
[ https://issues.apache.org/jira/browse/MAPREDUCE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Cote updated MAPREDUCE-6357:
---
Attachment: MAPREDUCE-6357-1.patch

Fixing a typo and checkstyle warning. No tests since this is a doc change.

MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute
--
Key: MAPREDUCE-6357
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6357
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: documentation
Affects Versions: 2.6.0
Reporter: Ivan Mitic
Assignee: Dustin Cote
Attachments: MAPREDUCE-6357-1.patch

After spending the afternoon debugging a user job where reduce tasks were failing on retry with the below exception, I think it would be worthwhile to add a note in the MultipleOutputs.write() documentation, saying that absolute paths may cause improper execution of tasks on retry or when MR speculative execution is enabled.

{code}
2015-04-28 23:13:10,452 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: File already exists:wasb://full20150...@bgtstoragefull.blob.core.windows.net/user/hadoop/some/path/block-r-00299.bz2
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1354)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1195)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
    at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:475)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:433)
    at com.ancestry.bigtree.hadoop.LevelReducer.processValue(LevelReducer.java:91)
    at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:69)
    at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:14)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
{code}

As discussed in MAPREDUCE-3772, when the baseOutputPath passed to MultipleOutputs.write() is an absolute path (or more precisely a path that resolves outside of the job output-dir), the concept of output committing is not utilized. In this case, the user read thru the MultipleOutputs docs and was assuming that everything will be working fine, as there are blog posts saying that MultipleOutputs does handle output commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
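The condition described in the report ("a path that resolves outside of the job output-dir") can be illustrated with a small check. This is only a sketch: it uses local java.nio paths as a stand-in for Hadoop paths, and the class name OutputPathSketch, the method resolvesInsideOutputDir and the example directories are hypothetical, not part of MultipleOutputs. It shows which baseOutputPath values would stay under the output directory, which is where, per the report, output committing applies.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: models the path rule with local paths, not the Hadoop API. */
public class OutputPathSketch {

    /**
     * Returns true when baseOutputPath, resolved against outputDir,
     * still lies inside outputDir. An absolute baseOutputPath replaces
     * outputDir entirely when resolved, so it normally lands outside.
     */
    static boolean resolvesInsideOutputDir(String outputDir, String baseOutputPath) {
        Path dir = Paths.get(outputDir).normalize();
        Path resolved = dir.resolve(baseOutputPath).normalize();
        return resolved.startsWith(dir);
    }

    public static void main(String[] args) {
        String outputDir = "/job/out"; // hypothetical job output directory

        // Relative path: stays under the output directory.
        System.out.println(resolvesInsideOutputDir(outputDir, "levels/block"));

        // Absolute path: resolves outside the output directory.
        System.out.println(resolvesInsideOutputDir(outputDir, "/user/hadoop/some/path/block"));

        // Relative path that climbs out of the output directory.
        System.out.println(resolvesInsideOutputDir(outputDir, "../elsewhere/block"));
    }
}
```

A job could run a check of this kind before calling MultipleOutputs.write(), so that paths falling outside the output directory are noticed before a retried or speculative task attempt hits "File already exists".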
[jira] [Updated] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute
[ https://issues.apache.org/jira/browse/MAPREDUCE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Cote updated MAPREDUCE-6357:
---
Status: Patch Available (was: Open)
[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662064#comment-14662064 ]

Eric Payne commented on MAPREDUCE-5870:
---
[~sunilg], for what it's worth, I have downloaded the latest patch (version 003) and tested and verified it in conjunction with the changes that were made for YARN-2003.

I performed the following sleep jobs with 10 tasks each. My one-node cluster can run 5 containers at once.
- I submit sleep job1 to the default queue, setting {{-Dmapreduce.job.priority=LOW}}
- Job1 starts running 5 containers and has 5 tasks pending.
- I submit sleep job2 to the default queue, setting {{-Dmapreduce.job.priority=HIGH}}
- All 10 job2 tasks are pending.
- Once tasks from job1 complete, job2 gets the containers. Although job1 has 5 tasks pending, the number of running tasks for job1 remains 0 until job2 has no more pending tasks and job2's running tasks begin to complete.
- At that point, job1's tasks begin again to receive containers.

I also verified that you can specify {{-Dmapreduce.job.priority=_number_}}, and the container allocations go to the higher numbered jobs.

Finally, I verified that if you make the priority higher than the cluster max, it silently sets the job priority to cluster max.

So, the bottom line is LGTM :-)
+1

Support for passing Job priority through Application Submission Context in Mapreduce Side
-
Key: MAPREDUCE-5870
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: client
Reporter: Sunil G
Assignee: Sunil G
Attachments: 0001-MAPREDUCE-5870.patch, 0002-MAPREDUCE-5870.patch, 0003-MAPREDUCE-5870.patch, Yarn-2002.1.patch

Job priority can be set from the client side as below [configuration and API].
a. JobConf.getJobPriority() and Job.setPriority(JobPriority priority)
b. We can also use the configuration mapreduce.job.priority.
Now this job priority can be passed in the Application Submission Context from the client side. Here we can reuse the MRJobConfig.PRIORITY configuration.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
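The behaviour verified in the comment above (freed containers go to the higher-priority job first, and a priority above the cluster maximum is silently capped) can be modelled with a toy allocator. This is a sketch under stated assumptions, not YARN's scheduler: the class PrioritySketch, its Job record, the numeric priorities 1 and 9 and the cluster maximum of 5 are all hypothetical values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of priority-ordered container allocation; not YARN code. */
public class PrioritySketch {

    /** Hypothetical job record: a numeric priority plus pending/running task counts. */
    static final class Job {
        final String name;
        final int priority;
        int pending;
        int running;

        Job(String name, int priority, int pending) {
            this.name = name;
            this.priority = priority;
            this.pending = pending;
        }
    }

    /** A requested priority above the cluster maximum is capped to the maximum. */
    static int effectivePriority(int requested, int clusterMax) {
        return Math.min(requested, clusterMax);
    }

    /** Hands free containers to jobs with pending tasks, highest priority first. */
    static void allocate(List<Job> jobs, int freeContainers) {
        List<Job> byPriority = new ArrayList<>(jobs);
        byPriority.sort(Comparator.comparingInt((Job j) -> j.priority).reversed());
        for (Job job : byPriority) {
            int granted = Math.min(freeContainers, job.pending);
            job.pending -= granted;
            job.running += granted;
            freeContainers -= granted;
        }
    }

    public static void main(String[] args) {
        int clusterMax = 5; // assumed cluster-wide maximum priority
        List<Job> jobs = new ArrayList<>();

        // job1 (low priority) is submitted alone and takes all 5 containers.
        Job job1 = new Job("job1", effectivePriority(1, clusterMax), 10);
        jobs.add(job1);
        allocate(jobs, 5);

        // job2 asks for priority 9, which is capped to the cluster maximum.
        Job job2 = new Job("job2", effectivePriority(9, clusterMax), 10);
        jobs.add(job2);

        // job1's 5 running tasks complete; the freed containers go to job2.
        job1.running = 0;
        allocate(jobs, 5);

        for (Job job : jobs) {
            System.out.println(job.name + " priority=" + job.priority
                    + " running=" + job.running + " pending=" + job.pending);
        }
    }
}
```

In this model job1 keeps its 5 pending tasks waiting until job2 has nothing left pending, which matches the sequence reported in the comment.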
[jira] [Commented] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute
[ https://issues.apache.org/jira/browse/MAPREDUCE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662080#comment-14662080 ]

Hadoop QA commented on MAPREDUCE-6357:
--
\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 16m 15s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 16s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 29s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests | 1m 52s | Tests passed in hadoop-mapreduce-client-core. |
| | | 40m 38s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749289/MAPREDUCE-6357-1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b6265d3 |
| hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5932/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5932/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5932/console |

This message was automatically generated.
[jira] [Commented] (MAPREDUCE-6257) Document encrypted spills
[ https://issues.apache.org/jira/browse/MAPREDUCE-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661887#comment-14661887 ]

Hudson commented on MAPREDUCE-6257:
---
FAILURE: Integrated in Hadoop-Mapreduce-trunk #2226 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2226/])
MAPREDUCE-6257. Document encrypted spills (Bibin A Chundatt via aw) (aw: rev fb1be0b3100cdd69f6dc1987585fcadd4e7c8a2a)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/EncryptedShuffle.md
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml

Document encrypted spills
-
Key: MAPREDUCE-6257
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6257
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: security
Reporter: Allen Wittenauer
Assignee: Bibin A Chundatt
Fix For: 3.0.0
Attachments: 0001-MAPREDUCE-6257.patch, 0002-MAPREDUCE-6257.patch, 0003-MAPREDUCE-6257.patch, EncryptedShuffle.html

Encrypted spills appear to be completely undocumented.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661915#comment-14661915 ]

Hudson commented on MAPREDUCE-6443:
---
FAILURE: Integrated in Hadoop-Hdfs-trunk #2207 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2207/])
MAPREDUCE-6443. Add JvmPauseMonitor to JobHistoryServer. Contributed by Robert Kanter. (junping_du: rev e73a928a6360f68aaee2ed58b3a8d180f4051407)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java

Add JvmPauseMonitor to Job History Server
-
Key: MAPREDUCE-6443
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: jobhistoryserver
Affects Versions: 2.8.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Fix For: 2.8.0
Attachments: MAPREDUCE-6443.001.patch, MAPREDUCE-6443.002.patch

We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History Server.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
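For readers unfamiliar with the idea behind a pause monitor, the general technique can be sketched as follows: a thread sleeps for a fixed interval and compares the time that actually elapsed with the time it asked for; a large overshoot suggests the whole JVM (or host) was stalled, for example by garbage collection. The code below is a standalone illustration of that technique, not the Hadoop JvmPauseMonitor class; the class name PauseMonitorSketch, its methods, and the 500 ms interval and 1000 ms threshold are assumptions made for the example.

```java
/** Standalone illustration of sleep-overshoot pause detection; not Hadoop code. */
public class PauseMonitorSketch {

    /** Extra time spent beyond the requested sleep, never negative. */
    static long extraPauseMillis(long elapsedMillis, long sleepMillis) {
        return Math.max(0L, elapsedMillis - sleepMillis);
    }

    /** True when the overshoot exceeds the given threshold. */
    static boolean isSuspectedPause(long elapsedMillis, long sleepMillis, long thresholdMillis) {
        return extraPauseMillis(elapsedMillis, sleepMillis) > thresholdMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        final long sleepMillis = 500;      // assumed polling interval
        final long thresholdMillis = 1000; // assumed reporting threshold

        // Poll a few times; a real monitor would loop in a daemon thread.
        for (int i = 0; i < 3; i++) {
            long start = System.nanoTime();
            Thread.sleep(sleepMillis);
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            if (isSuspectedPause(elapsedMillis, sleepMillis, thresholdMillis)) {
                System.out.println("Suspected pause of approximately "
                        + extraPauseMillis(elapsedMillis, sleepMillis) + " ms");
            }
        }
    }
}
```

Keeping the overshoot arithmetic in pure functions makes the detection rule testable without depending on real timing.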