[jira] [Updated] (MAPREDUCE-4911) Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated MAPREDUCE-4911:
--------------------------------------

    Attachment: MAPREDUCE-4911.3.patch

Fixed Javadoc warnings again.

    Key: MAPREDUCE-4911
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4911
    Project: Hadoop Map/Reduce
    Issue Type: Sub-task
    Components: client
    Affects Versions: trunk
    Reporter: Tsuyoshi OZAWA
    Assignee: Tsuyoshi OZAWA
    Attachments: MAPREDUCE-4911.2.patch, MAPREDUCE-4911.3.patch, MAPREDUCE-4911.patch

This JIRA adds a node-level aggregation flag feature (setLocalAggregation(boolean)) to JobConf. This task is a subtask of MAPREDUCE-4502.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4911) Add node-level aggregation flag feature(setNodeLevelAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated MAPREDUCE-4911:
--------------------------------------

    Summary: Add node-level aggregation flag feature (setNodeLevelAggregation(boolean)) to JobConf
    (was: Add node-level aggregation flag feature (setLocalAggregation(boolean)) to JobConf)
[jira] [Commented] (MAPREDUCE-4911) Add node-level aggregation flag feature(setNodeLevelAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558653#comment-13558653 ]

Hadoop QA commented on MAPREDUCE-4911:
--------------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12565750/MAPREDUCE-4911.3.patch
against trunk revision.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core and hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3257//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3257//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4952) FSSchedulerNode is always instantiated with a 0 virtual core capacity
[ https://issues.apache.org/jira/browse/MAPREDUCE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558658#comment-13558658 ]

Harsh J commented on MAPREDUCE-4952:
------------------------------------

One may also use More Actions -> Move to move incorrect JIRAs to their right project; URLs are automatically mapped to the new one, making the process more elegant. Just something for the future :-)

    Key: MAPREDUCE-4952
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4952
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: scheduler
    Affects Versions: 2.0.2-alpha
    Reporter: Sandy Ryza
    Assignee: Sandy Ryza

After YARN-2, FSSchedulerNode was not updated to initialize with the underlying RMNode's CPU capacity, and thus always has 0 virtual cores.
[jira] [Commented] (MAPREDUCE-4876) Adopt a Tuple MapReduce API instead of classic MapReduce one
[ https://issues.apache.org/jira/browse/MAPREDUCE-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558665#comment-13558665 ]

posa Wu commented on MAPREDUCE-4876:
------------------------------------

[~ivan.prado] Does Pangool support MultipleInputs and MultipleOutputs?

    Key: MAPREDUCE-4876
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4876
    Project: Hadoop Map/Reduce
    Issue Type: Wish
    Reporter: Iván de Prado
    Priority: Minor

After using MapReduce for many years, we have noticed that it lacks some important features: compound records, easy intra-reduce sorting, and join capabilities. We have elaborated a slightly modified MapReduce foundation to overcome these problems: Tuple MapReduce. You can see a full paper, published at ICDM 2012, that describes it at http://pangool.net/TupleMapReduce.pdf

The good news is:
1) No architectural changes on Hadoop are needed to embrace Tuple MapReduce.
2) Indeed, we have proven that it is possible to implement it on top of Hadoop. See the Pangool open source project (http://pangool.net/).
3) It performs very efficiently (http://pangool.net/benchmark.html).
4) It is compatible with the whole Hadoop stack: Writables, Serializers, Input/OutputFormats, etc.

We believe the Hadoop community could benefit from it in different ways:
1) By getting ideas for a future API redesign.
2) By adopting Pangool inside Hadoop.

Of course, we would help and contribute with any adaptation changes needed (not many, because, as said, everything is compatible with the existing MapReduce). Obviously, we prefer the second. But at least, we believe some good ideas can be obtained by looking at Tuple MapReduce and Pangool.

There are also other improvements in Pangool that would improve the Hadoop API:
1) Configuration by instance: passing parameters by constructor. For example, Pangool Input/OutputFormats can be configured by providing values to the constructor.
2) Stateful serialization. What is requested in https://issues.apache.org/jira/browse/MAPREDUCE-1462 is already supported by Pangool.
3) First-class multiple input/multiple output.

Well, we are open to the discussion and to contribute.
[jira] [Commented] (MAPREDUCE-4876) Adopt a Tuple MapReduce API instead of classic MapReduce one
[ https://issues.apache.org/jira/browse/MAPREDUCE-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558673#comment-13558673 ]

Iván de Prado commented on MAPREDUCE-4876:
------------------------------------------

[~posa88] Yes, Pangool offers multiple inputs. Example (from [http://pangool.net/userguide/TupleMrBuilder.html]):
{code:java}
...
mr.addInput(new Path(input1), new HadoopInputFormat(TextInputFormat.class), new UrlMapProcessor());
mr.addInput(new Path(input2), new HadoopInputFormat(TextInputFormat.class), new UrlProcessor());
...
{code}
Pangool also supports multiple outputs, with some advantages over Hadoop's multiple outputs because of the use of configuration by instance. See [http://pangool.net/userguide/named_outputs.html]. [Here|http://www.datasalt.com/2012/05/pangool-solr/] you can find an example of using Pangool's named outputs to generate different Solr indexes in one job.
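Pangool's actual tuple API aside, the "compound records with easy intra-reduce sorting" idea above can be sketched in plain Java. All names here are illustrative stand-ins, not the Pangool API: the point is that grouping tuples by one field while each group's values arrive sorted by a secondary field is something classic MapReduce needs manual secondary-sort plumbing for, but a tuple API can declare directly.

```java
import java.util.*;

public class TupleGroupSketch {
    // A "tuple" here is just (url, timestamp); a hypothetical stand-in for a compound record.
    record Visit(String url, long ts) {}

    // Group by url and sort each group by timestamp: the effect a tuple API
    // would declare as groupBy("url") plus orderBy("url", "ts").
    static Map<String, List<Long>> groupAndSort(List<Visit> visits) {
        Map<String, List<Long>> out = new TreeMap<>();
        for (Visit v : visits)
            out.computeIfAbsent(v.url(), k -> new ArrayList<>()).add(v.ts());
        out.values().forEach(Collections::sort);
        return out;
    }

    public static void main(String[] args) {
        List<Visit> visits = List.of(
            new Visit("a.com", 30), new Visit("b.com", 10), new Visit("a.com", 20));
        System.out.println(groupAndSort(visits)); // {a.com=[20, 30], b.com=[10]}
    }
}
```

In a real MapReduce job the grouping is done by the shuffle and the secondary sort by a composite key; the sketch only shows the contract the reducer sees.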
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558770#comment-13558770 ]

Avner BenHanoch commented on MAPREDUCE-4049:
--------------------------------------------

Hi Alejandro,

Thanks for your comprehensive answer and code. It is important for me to agree with you on all the details before I go to implementation. Kindly let me know if you agree on all the bullets below:
# I'll keep the consumer and producer in the same JIRA and submit one patch when all is ready (since this JIRA deals with a generic shuffle service as a set of two plugins). I'll do my best to finish as soon as possible, within a few days.
# The consumer will follow all our agreements so far.
# The producer will follow your code and remarks.
# Also, I understand that in _... remove that line and discover, instantiate and initialize the provider plugin_ you simply mean _... remove that line and call multiShuffleProviderPlugin.initialize()_.
# In addition, I'll move the current call _shuffleConsumerPlugin.destroy()_ from the _TaskTracker.close()_ method into the _TaskTracker.shutdown()_ method; this will match the move of _shuffleConsumerPlugin.initialize_ into the TT constructor.

Thanks, Avner

    Key: MAPREDUCE-4049
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
    Project: Hadoop Map/Reduce
    Issue Type: Sub-task
    Components: performance, task, tasktracker
    Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
    Reporter: Avner BenHanoch
    Assignee: Avner BenHanoch
    Labels: merge, plugin, rdma, shuffle
    Fix For: 3.0.0
    Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, MAPREDUCE-4049--branch-1.patch, mapreduce-4049.patch

Support a generic shuffle service as a set of two plugins: ShuffleProvider and ShuffleConsumer. This will satisfy the following needs:
# Better shuffle and merge performance. For example: we are working on a shuffle plugin that performs shuffle over RDMA on fast networks (10GbE, 40GbE, or InfiniBand) instead of the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also use a suitable merge approach during the intermediate merges, hence getting much better performance.
# Satisfy MAPREDUCE-3060: a generic shuffle service for avoiding a hidden dependency of the NodeManager on a specific version of the mapreduce shuffle (currently targeted to 0.24.0).

References:
# Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu of Auburn University and others: [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
# I am attaching two documents with a suggested top-level design for both plugins (currently based on the 1.0 branch).
# I am providing a link for downloading UDA, Mellanox's open source plugin that implements a generic shuffle service using RDMA and levitated merge. Note: at this phase, the code is in C++ through JNI and you should consider it beta only. Still, it can serve anyone who wants to implement or contribute to levitated merge. (Please be advised that levitated merge is mostly suited to very fast networks.) [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]
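The lifecycle agreement in the last two bullets (initialize in the TaskTracker constructor, destroy in TaskTracker.shutdown()) can be sketched with a minimal plugin interface. The interface and class names below are hypothetical stand-ins that mirror the names discussed, not the actual Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;

public class ShuffleLifecycleSketch {
    // Hypothetical plugin interface mirroring the ShuffleConsumerPlugin
    // lifecycle discussed above; not the real Hadoop interface.
    interface ShuffleConsumerPlugin {
        void initialize(); // called once, from the TaskTracker constructor
        void destroy();    // called once, from TaskTracker.shutdown()
    }

    static final List<String> events = new ArrayList<>();

    // Default HTTP-based implementation as a stand-in; an RDMA plugin
    // would implement the same interface.
    static class HttpShuffle implements ShuffleConsumerPlugin {
        public void initialize() { events.add("initialize"); }
        public void destroy()    { events.add("destroy"); }
    }

    public static void main(String[] args) {
        // In Hadoop the concrete class would be discovered from configuration;
        // here we instantiate directly.
        ShuffleConsumerPlugin plugin = new HttpShuffle();
        plugin.initialize(); // TaskTracker construction
        plugin.destroy();    // TaskTracker shutdown, paired with the constructor-time initialize
        System.out.println(events); // [initialize, destroy]
    }
}
```

Pairing destroy() with shutdown() rather than close() keeps the plugin's lifetime symmetric with the constructor-time initialize(), which is the point of bullet 5.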
[jira] [Commented] (MAPREDUCE-4952) FSSchedulerNode is always instantiated with a 0 virtual core capacity
[ https://issues.apache.org/jira/browse/MAPREDUCE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558936#comment-13558936 ]

Sandy Ryza commented on MAPREDUCE-4952:
---------------------------------------

Thanks Harsh, good to know.
[jira] [Commented] (MAPREDUCE-4271) Make TestCapacityScheduler more robust with non-Sun JDK
[ https://issues.apache.org/jira/browse/MAPREDUCE-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558967#comment-13558967 ]

Amir Sanjar commented on MAPREDUCE-4271:
----------------------------------------

This test case also fails with a non-IBM JDK; verified to fail with Oracle Java 7.

    Key: MAPREDUCE-4271
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4271
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: capacity-sched
    Affects Versions: 1.0.3
    Reporter: Luke Lu
    Assignee: Yu Gao
    Labels: alt-jdk, capacity
    Attachments: mapreduce-4271-branch-1.patch, test-afterepatch.result, test-beforepatch.result, test-patch.result

The capacity scheduler queue is initialized with a HashMap, the values of which are later added to a list (a queue for assigning tasks). TestCapacityScheduler depends on the order of that list and is hence not portable across JDKs.
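The portability problem described here is that HashMap iteration order is unspecified and differs between JDK implementations, so a test asserting on a list built from a HashMap is fragile. A minimal sketch of the fragile pattern and a deterministic fix (names are illustrative, not the scheduler's actual code):

```java
import java.util.*;

public class QueueOrderSketch {
    // Fragile: building the queue list from HashMap keys bakes in an
    // unspecified iteration order, the assumption TestCapacityScheduler made.
    static List<String> fromHashMap(Map<String, Integer> queues) {
        return new ArrayList<>(queues.keySet());
    }

    // Deterministic: sorting (or using a TreeMap/LinkedHashMap) makes the
    // order a defined property of the code instead of the JDK.
    static List<String> deterministic(Map<String, Integer> queues) {
        List<String> names = new ArrayList<>(queues.keySet());
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        Map<String, Integer> queues = new HashMap<>();
        queues.put("default", 50);
        queues.put("research", 30);
        queues.put("ops", 20);
        // fromHashMap(queues) may come back in any order depending on the JDK;
        // deterministic(queues) is the same on every JVM.
        System.out.println(deterministic(queues)); // [default, ops, research]
    }
}
```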
[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558987#comment-13558987 ]

Bikas Saha commented on MAPREDUCE-4951:
---------------------------------------

Will that differentiate between preemption killing and resource killing (e.g. out of memory)?

    Key: MAPREDUCE-4951
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4951
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: applicationmaster, mr-am, mrv2
    Affects Versions: 2.0.2-alpha
    Reporter: Sandy Ryza
    Assignee: Sandy Ryza
    Attachments: MAPREDUCE-4951.patch

When YARN reports a completed container to the MR AM, it always interprets it as a failure. This can lead to a job failing because too many of its tasks failed, when in fact they only failed because the scheduler preempted them. MR needs to recognize the special exit code value of -100 and interpret it as a container being killed instead of a container failure.
[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559023#comment-13559023 ]

Sandy Ryza commented on MAPREDUCE-4951:
---------------------------------------

I believe in that case the exit code will be FORCE_KILLED (137) or TERMINATED (143) (from ContainerExecutor.java).
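The distinction being discussed can be sketched with the exit-code values cited in the thread: -100 as YARN's framework-kill/preemption marker, and 137/143 from ContainerExecutor (SIGKILL and SIGTERM). The classifier below is illustrative only, not the MR AM's code; whether resource kills should count as failures is exactly Bikas's question, and here they are treated as failures:

```java
public class ContainerExitSketch {
    // Exit-code values taken from the thread above; the constant names are
    // illustrative stand-ins, not the actual YARN constants.
    static final int PREEMPTED    = -100; // container killed by the framework (e.g. preemption)
    static final int FORCE_KILLED = 137;  // 128 + SIGKILL(9), e.g. over memory limits
    static final int TERMINATED   = 143;  // 128 + SIGTERM(15)

    // Should the AM count this completed container against the task-failure limit?
    static boolean countsAsTaskFailure(int exitCode) {
        switch (exitCode) {
            case PREEMPTED:
                return false; // reschedule the attempt, don't penalize the job
            case FORCE_KILLED:
            case TERMINATED:
            default:
                return true;  // resource kills and ordinary failures still count
        }
    }

    public static void main(String[] args) {
        System.out.println(countsAsTaskFailure(-100)); // false
        System.out.println(countsAsTaskFailure(137));  // true
        System.out.println(countsAsTaskFailure(1));    // true
    }
}
```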
[jira] [Commented] (MAPREDUCE-4946) Type conversion of map completion events leads to performance problems with large jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559079#comment-13559079 ]

Hadoop QA commented on MAPREDUCE-4946:
--------------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12565590/MAPREDUCE-4946.patch
against trunk revision.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app and hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3258//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3258//console

    Key: MAPREDUCE-4946
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4946
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: mr-am
    Affects Versions: 2.0.2-alpha, 0.23.5
    Reporter: Jason Lowe
    Assignee: Jason Lowe
    Priority: Critical
    Attachments: MAPREDUCE-4946.patch

We've seen issues with large jobs (e.g.: 13,000 maps and 3,500 reduces) where reducers fail to connect back to the AM after being launched due to connection timeout. Looking at stack traces of the AM during this time we see a lot of IPC servers stuck waiting for a lock to get the application ID while type-converting the map completion events. What's odd is that normally getting the application ID should be very cheap, but in this case we're type-converting thousands of map completion events for *each* reducer connecting. That means we end up type-converting the map completion events over 45 million times during the lifetime of the example job (13,000 * 3,500). We either need to make the type conversion much cheaper (i.e.: lockless or at least read-write locked) or, even better, store the completion events in a form that does not require type conversion when serving them up to reducers.
[jira] [Commented] (MAPREDUCE-4946) Type conversion of map completion events leads to performance problems with large jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559107#comment-13559107 ]

Siddharth Seth commented on MAPREDUCE-4946:
-------------------------------------------

The change looks good to me. Jason, could you please post a patch for branch-0.23 as well?

Agreed, TaskUmbilical using TaskAttemptCompletionEvents seems like a longer-term change; the conversion ends up getting pushed to individual tasks, unless Task itself is changed to work with mrv2 constructs.
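The fix direction in the description, storing completion events in a form that needs no per-reducer conversion, amounts to convert-once-and-cache. A scaled-down sketch (hypothetical event types, not the actual MR classes): at the real job's scale the naive path performs 13,000 * 3,500 = 45.5 million conversions, while the cached path performs only 13,000.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;

public class CompletionEventCacheSketch {
    static final AtomicInteger conversions = new AtomicInteger();

    // Stand-ins for the internal (mrv2) and umbilical (mrv1) event types.
    record V2Event(int mapId) {}
    record V1Event(int mapId) {}

    static V1Event convert(V2Event e) {
        conversions.incrementAndGet(); // the expensive, lock-holding step
        return new V1Event(e.mapId());
    }

    static final List<V2Event> source = new ArrayList<>();
    static List<V1Event> cache = null;

    // Naive: convert every event for every reducer poll -> O(maps * reduces) conversions.
    static List<V1Event> serveNaive() {
        List<V1Event> out = new ArrayList<>();
        for (V2Event e : source) out.add(convert(e));
        return out;
    }

    // Cached: convert each event once, serve the cached list -> O(maps) conversions total.
    static List<V1Event> serveCached() {
        if (cache == null) {
            cache = new ArrayList<>();
            for (V2Event e : source) cache.add(convert(e));
        }
        return cache;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 13; i++) source.add(new V2Event(i)); // scaled-down "job"
        for (int r = 0; r < 4; r++) serveNaive();
        System.out.println(conversions.get()); // 52 = 13 maps * 4 reducer polls
        conversions.set(0);
        cache = null;
        for (int r = 0; r < 4; r++) serveCached();
        System.out.println(conversions.get()); // 13, regardless of reducer count
    }
}
```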
[jira] [Updated] (MAPREDUCE-4838) Add extra info to JH files
[ https://issues.apache.org/jira/browse/MAPREDUCE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated MAPREDUCE-4838:
-----------------------------------

    Attachment: MAPREDUCE-4838_1.patch

I've migrated the code that adds more task-info to JH from Hadoop-1 to Hadoop-2. There are the following major differences:

1) There is no JobInProgress (actually nearly empty) and TaskInProgress, where the locality and the avataar attributes were set and logged. Instead, avataar is now set in TaskImpl#addAndScheduleAttempt by judging whether there are other active task attempts, while locality is set in TaskAttemptImpl#ContainerAssignedTransition#transition by judging whether the assigned container's host is within the local host/rack list of the task attempt.
2) Workflow-related info is added in JobImpl. The function getWorkflowAdjacencies and its dependent functions are also imported in this class.
3) Locality has the same enum values as the NodeType of YARN, but I still created Locality because it represents an attribute of a task attempt.

The current trunk builds correctly with the patch applied. However, I still need to do some more work on the test cases. Arun and Sid, please have a look at the patch and give some comments. Thank you! Zhijie

    Key: MAPREDUCE-4838
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4838
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Reporter: Arun C Murthy
    Assignee: Zhijie Shen
    Attachments: MAPREDUCE-4838_1.patch, MAPREDUCE-4838.patch

It will be useful to add more task-info to JH for analytics.
[jira] [Created] (MAPREDUCE-4953) HadoopPipes misuses fprintf
Andy Isaacson created MAPREDUCE-4953:
-------------------------------------

    Summary: HadoopPipes misuses fprintf
    Key: MAPREDUCE-4953
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4953
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: pipes
    Affects Versions: 3.0.0, 2.0.3-alpha
    Reporter: Andy Isaacson
    Assignee: Andy Isaacson

{code}
[exec] /mnt/trunk/hadoop-tools/hadoop-pipes/src/main/native/pipes/impl/HadoopPipes.cc:130:58: warning: format not a string literal and no format arguments [-Wformat-security]
{code}
[jira] [Updated] (MAPREDUCE-4953) HadoopPipes misuses fprintf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Isaacson updated MAPREDUCE-4953:
-------------------------------------

    Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-4953) HadoopPipes misuses fprintf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Isaacson updated MAPREDUCE-4953:
-------------------------------------

    Attachment: mapreduce-4953.txt

Fix the warning by switching to fputs.
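For context on the bug class: `-Wformat-security` fires when a variable message is passed where fprintf expects a format string, so any `%` in the message is interpreted as a directive; the fputs fix prints the message as plain data. The same mistake can be made with Java's format APIs, so here is a hedged Java analogue (illustrative only, not the HadoopPipes code):

```java
public class FormatMisuseSketch {
    // Analogue of fprintf(stderr, msg): the message is used as the *format*,
    // so any '%' in it is interpreted as a directive and can blow up.
    static String risky(String msg) {
        return String.format(msg); // throws if msg contains e.g. "%s"
    }

    // Analogue of fputs(msg, stderr): the message is data, never interpreted.
    static String safe(String msg) {
        return String.format("%s", msg);
    }

    public static void main(String[] args) {
        String msg = "progress 100%s complete"; // accidental format directive
        System.out.println(safe(msg)); // progress 100%s complete
        try {
            risky(msg);
        } catch (java.util.IllegalFormatException e) {
            System.out.println("risky() failed: " + e.getClass().getSimpleName());
        }
    }
}
```

The rule in both languages is the same: never pass externally supplied text as a format string; pass it as an argument to a literal format, or skip formatting entirely.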
[jira] [Updated] (MAPREDUCE-4951) Container preemption interpreted as task failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-4951:
----------------------------------

    Attachment: MAPREDUCE-4951-1.patch
[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559160#comment-13559160 ]

Sandy Ryza commented on MAPREDUCE-4951:
---------------------------------------

The new patch includes a test and uses the constant from YarnConfiguration instead of a hardcoded -100.
[jira] [Commented] (MAPREDUCE-4953) HadoopPipes misuses fprintf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559162#comment-13559162 ]

Hadoop QA commented on MAPREDUCE-4953:
--

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12565858/mapreduce-4953.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in .

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3259//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3259//console

This message is automatically generated.
HadoopPipes misuses fprintf
---------------------------
    Key: MAPREDUCE-4953
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4953
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: pipes
    Affects Versions: 3.0.0, 2.0.3-alpha
    Reporter: Andy Isaacson
    Assignee: Andy Isaacson
    Attachments: mapreduce-4953.txt

{code}
[exec] /mnt/trunk/hadoop-tools/hadoop-pipes/src/main/native/pipes/impl/HadoopPipes.cc:130:58: warning: format not a string literal and no format arguments [-Wformat-security]
{code}
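The -Wformat-security warning flags a classic bug: data being passed as fprintf's format string. If the message happens to contain % conversions, fprintf reads nonexistent varargs, which is undefined behavior. A minimal sketch of the bug and the trivial fix; the function and variable names are illustrative, not the actual HadoopPipes.cc code:

```cpp
#include <cstdio>

// BUG (what the warning points at): the message itself is used as the
// format string, so a stray "%s" or "%n" inside it triggers undefined
// behavior, typically a crash:
//   fprintf(stderr, msg);
//
// FIX: route the message through a constant "%s" format. Returns the
// number of characters written, as fprintf does.
int logError(const char* msg) {
    return fprintf(stderr, "%s\n", msg);
}
```

With the fix, a message such as `"50%s"` is printed verbatim instead of being interpreted as a conversion specification.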
[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559170#comment-13559170 ]

Hadoop QA commented on MAPREDUCE-4951:
--

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12565866/MAPREDUCE-4951-1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3260//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3260//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4953) HadoopPipes misuses fprintf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559201#comment-13559201 ]

Andy Isaacson commented on MAPREDUCE-4953:
--

Since hitting the bug would probably segfault the Pipes app, and the fix is trivially correct, no tests are included.
[jira] [Commented] (MAPREDUCE-4838) Add extra info to JH files
[ https://issues.apache.org/jira/browse/MAPREDUCE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559422#comment-13559422 ]

Siddharth Seth commented on MAPREDUCE-4838:
---

Thanks for working on this, Zhijie.

bq. There's no JobInProgress (actually nearly empty) and TaskInProgress, where the locality and the avataar attributes are set and logged. Instead, avataar is now set in TaskImpl#addAndScheduleAttempt by judging whether there are other active task attempts, while locality is set in TaskAttemptImpl#ContainerAssignedTransition#transition by judging whether the assigned container's host is within the local host/rack list of the task attempt.

Right. I think you've got these, as well as the other changes, mapped to the correct places for trunk. Comments on the patch:
- Some lines exceed the 80-column limit. (Coding style guidelines at http://wiki.apache.org/hadoop/CodeReviewChecklist)
- Job-specific configuration strings like mapreduce.workflow.id etc. should be in MRJobConfig.
- TaskAttemptImpl - the default locality is set to NODE_LOCAL. It should be OFF_SWITCH to match branch-1.
- In TaskAttemptImpl, avoid resolving the host names multiple times.
- In TaskImpl, SPECULATIVE should be set when the RedundantScheduleTransition is taken, not for retries caused by FAILED / KILLED attempts. The same likely applies to the branch-1 patch.
- In the history events (JobSubmittedEvent etc.) - null check on the new strings. (The Utf8 constructor does not work with nulls.)
- The toString implementation in the history events, as well as the Rumen events, is not really needed. If implemented, they should include the additional fields.

The additional information may be useful to expose - via the UI at least - in which case it'll need to be exposed via TaskAttemptReport. This can be done in a follow-up jira. For now, the Locality and Avataar enums could be moved to mrv2 - the hadoop-mapreduce-client-common module (ref TaskType).
Add extra info to JH files
--------------------------
    Key: MAPREDUCE-4838
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4838
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Reporter: Arun C Murthy
    Assignee: Zhijie Shen
    Attachments: MAPREDUCE-4838_1.patch, MAPREDUCE-4838.patch

It will be useful to add more task-info to JH for analytics.
[jira] [Updated] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy
[ https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Radwan updated MAPREDUCE-4469:
--
    Attachment: MAPREDUCE-4469_rev5.patch

Here is the updated patch. Thanks, Todd and Phil, for your comments! I have updated the patch to get rid of the excludedPids caching, which could produce miscalculations due to pid recycling, as Todd highlighted. The patch also uses StringUtils.split(). Since getrusage only accounts for terminated children, its numbers would be missing important resource usage info for any currently running children that haven't terminated yet, so in my opinion we shouldn't use it. I have also added an update-interval property (in milliseconds instead of a skip count) that determines how often the resource usage counters are updated. Please take a look and let me know if you have any comments.

Resource calculation in child tasks is CPU-heavy
------------------------------------------------
    Key: MAPREDUCE-4469
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: performance, task
    Affects Versions: 1.0.3
    Reporter: Todd Lipcon
    Assignee: Ahmed Radwan
    Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch, MAPREDUCE-4469_rev3.patch, MAPREDUCE-4469_rev4.patch, MAPREDUCE-4469_rev5.patch

In doing some benchmarking on a hadoop-1 derived codebase, I noticed that each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed that it's spending a lot of time looping through all the files in /proc to calculate resource usage. As a test, I added a flag to disable use of the ResourceCalculatorPlugin within the tasks. On a CPU-bound 500G-sort workload, this improved total job runtime by about 10% (map slot-seconds by 14%, reduce slot-seconds by 8%).
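Ahmed's objection to getrusage follows from its POSIX semantics: RUSAGE_CHILDREN accumulates usage only for children that have terminated and been waited for, so a still-running child task is invisible to it. A small illustrative sketch (plain POSIX, not Hadoop code; the function name is made up):

```cpp
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that exits immediately, reaps it, then queries
// RUSAGE_CHILDREN. Only at the waitpid() call is the child's usage
// folded into the parent's RUSAGE_CHILDREN totals; while a child is
// still running, those totals exclude it -- which is why a monitor for
// live tasks cannot rely on getrusage and walks /proc instead.
bool childUsageQueryAfterReap() {
    pid_t pid = fork();
    if (pid < 0) return false;        // fork failed
    if (pid == 0) _exit(0);           // child: do nothing, terminate
    if (waitpid(pid, nullptr, 0) != pid) return false;  // reap child
    struct rusage ru;
    return getrusage(RUSAGE_CHILDREN, &ru) == 0;        // now visible
}
```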
[jira] [Commented] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy
[ https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559433#comment-13559433 ]

Hadoop QA commented on MAPREDUCE-4469:
--

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12565910/MAPREDUCE-4469_rev5.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3261//console

This message is automatically generated.