[jira] [Created] (MAPREDUCE-4866) ShuffleRamManager is limited to 2Gb of memory - we should increase that
Varene Olivier created MAPREDUCE-4866:

Summary: ShuffleRamManager is limited to 2Gb of memory - we should increase that
Key: MAPREDUCE-4866
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4866
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1
Affects Versions: 0.20.2
Environment: Linux, 64-bit CPU, more than 2 GB of memory for each reducer task
Reporter: Varene Olivier
Priority: Minor

In org.apache.hadoop.mapred.ReduceTask.java, the *ShuffleRamManager* can allocate at most 2 GB of memory during the shuffle phase. We should be able to allocate more, to take advantage of the full memory available on the servers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4866) ShuffleRamManager is limited to 2Gb of memory - we should increase that
[ https://issues.apache.org/jira/browse/MAPREDUCE-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varene Olivier updated MAPREDUCE-4866:
Attachment: MAPREDUCE-4866-INCOMPLETE.patch
Labels: patch

This patch is incomplete and does not work! Because the allocation of a byte array is limited to around Integer.MAX_VALUE elements, we need to find another way to allocate such a huge amount of memory. What about BigArrays? What do you think? Any proposals?
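The Integer.MAX_VALUE limit mentioned in the comment above follows from Java arrays being indexed by int. One possible direction, sketched here only as an illustration, is to spread the storage over several byte[] chunks and address it with a long offset. The class name ChunkedByteStore and its methods are hypothetical; this is neither the attached patch nor the BigArrays API the reporter refers to.

```java
/**
 * Hypothetical sketch: a byte store addressed by long offsets and backed by
 * several byte[] chunks, so the total capacity can exceed Integer.MAX_VALUE.
 */
final class ChunkedByteStore {
    private final byte[][] chunks;
    private final int chunkSize;
    private final long capacity;

    ChunkedByteStore(long capacity, int chunkSize) {
        if (capacity < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("capacity must be >= 0 and chunkSize > 0");
        }
        this.capacity = capacity;
        this.chunkSize = chunkSize;
        // Ceiling division: number of chunks needed to cover the capacity.
        long count = (capacity + chunkSize - 1) / chunkSize;
        if (count > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many chunks; use a larger chunkSize");
        }
        chunks = new byte[(int) count][];
        long remaining = capacity;
        for (int i = 0; i < chunks.length; i++) {
            // The last chunk may be shorter than chunkSize.
            int length = (int) Math.min(chunkSize, remaining);
            chunks[i] = new byte[length];
            remaining -= length;
        }
    }

    long capacity() {
        return capacity;
    }

    byte get(long index) {
        checkIndex(index);
        return chunks[(int) (index / chunkSize)][(int) (index % chunkSize)];
    }

    void put(long index, byte value) {
        checkIndex(index);
        chunks[(int) (index / chunkSize)][(int) (index % chunkSize)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index + " outside [0, " + capacity + ")");
        }
    }
}
```

In a real reducer the chunk size would likely be large (for example 1 << 30), and the JVM heap would still have to be sized to hold the total capacity.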
[jira] [Created] (MAPREDUCE-4867) reduces tasks won't start in certain circumstances
Vincent Behar created MAPREDUCE-4867:

Summary: reduces tasks won't start in certain circumstances
Key: MAPREDUCE-4867
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4867
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: scheduler
Affects Versions: 1.0.4
Reporter: Vincent Behar

The start of reduce tasks is conditioned by the value of mapred.reduce.slowstart.completed.maps. However, if the number of completed map tasks never reaches the configured value (for example because mapred.max.map.failures.percent has been set to a high value, to permit a job to have a lot of failed tasks), then the reduce tasks won't start. The job is still running, all map tasks are finished (either successfully or not), and all reduce tasks are still pending. The only thing one can do is kill the job.

There are 2 things that could be done:
- document the relation between mapred.max.map.failures.percent and mapred.reduce.slowstart.completed.maps: the rule to follow if you want to be sure that your reduce tasks will start is:
  mapred.reduce.slowstart.completed.maps * 100 < 100 - mapred.max.map.failures.percent
- fix JobInProgress.scheduleReduces() to return true if all map tasks are finished
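The two proposals in the report above can be illustrated with a small sketch. It assumes the rule reads "slowstart * 100 is less than 100 minus the failure percentage" (the comparison operator is not legible in the archived text, so "<" is an assumption), and it models the second proposal with a hypothetical helper, not the actual JobInProgress.scheduleReduces() code. All class and method names below are invented for illustration.

```java
/** Hypothetical helpers illustrating the two proposals; not Hadoop code. */
final class SlowstartSketch {

    /**
     * Proposal 1: configuration rule. Assumed reading:
     * mapred.reduce.slowstart.completed.maps * 100 < 100 - mapred.max.map.failures.percent
     */
    static boolean reducesGuaranteedToStart(double slowstartCompletedMaps,
                                            int maxMapFailuresPercent) {
        return slowstartCompletedMaps * 100.0 < 100.0 - maxMapFailuresPercent;
    }

    /**
     * Proposal 2: schedule reduces once every map has finished (successfully
     * or not), even if the slowstart threshold was never reached.
     */
    static boolean shouldScheduleReduces(int totalMaps, int succeededMaps,
                                         int failedMaps, double slowstartCompletedMaps) {
        if (succeededMaps + failedMaps >= totalMaps) {
            return true; // nothing left to wait for
        }
        return succeededMaps >= slowstartCompletedMaps * totalMaps;
    }
}
```

For example, with a slowstart value of 0.8 and a failure tolerance of 50 percent, the first check returns false, which matches the stuck-job situation described in the report.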
[jira] [Commented] (MAPREDUCE-4867) reduces tasks won't start in certain circumstances
[ https://issues.apache.org/jira/browse/MAPREDUCE-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527964#comment-13527964 ]

Jason Lowe commented on MAPREDUCE-4867:

I believe this is a duplicate of MAPREDUCE-2129 which was fixed in 1.1.0.
[jira] [Commented] (MAPREDUCE-4867) reduces tasks won't start in certain circumstances
[ https://issues.apache.org/jira/browse/MAPREDUCE-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527994#comment-13527994 ]

Vincent Behar commented on MAPREDUCE-4867:

Yes, it is a duplicate of MAPREDUCE-2129 (sorry, I didn't find it).
The fix has been applied to branch-1 and branch-1.1, but not to branch-1.0. Merging r1358233 (from branch-1) into branch-1.0 should be enough.
Thanks
[jira] [Commented] (MAPREDUCE-4867) reduces tasks won't start in certain circumstances
[ https://issues.apache.org/jira/browse/MAPREDUCE-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528008#comment-13528008 ]

Jason Lowe commented on MAPREDUCE-4867:

Adding Matt Foley, who is the release manager for Hadoop 1.x. He can comment on whether there are plans for another 1.0.x release and whether MAPREDUCE-2129 would be a good candidate.
[jira] [Updated] (MAPREDUCE-4703) Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated MAPREDUCE-4703:
Issue Type: Improvement (was: Bug)

Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped.

Key: MAPREDUCE-4703
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4703
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, test
Reporter: Ahmed Radwan
Assignee: Ahmed Radwan
Attachments: MAPREDUCE-4703_branch-1.patch, MAPREDUCE-4703_branch-1_rev2.patch, MAPREDUCE-4703.patch, MAPREDUCE-4703_rev2.patch, MAPREDUCE-4703_rev3.patch

The objective here is to enable restarting the cluster, after it has been stopped, using the same configurations/port numbers used before stopping.
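The objective stated above (restart with the same configuration and port numbers) can be pictured with a minimal sketch. It is not the MiniMRClientCluster API: the class RestartableClusterSketch, the configuration key, and the port-picking callback are all hypothetical, and no real sockets are opened. The only point is that a port is chosen on the first start, stored in the configuration, and reused by every later start.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntSupplier;

/** Hypothetical stand-in for a mini cluster that remembers its configuration. */
final class RestartableClusterSketch {
    // Invented key, used only inside this sketch.
    static final String PORT_KEY = "sketch.jobtracker.port";

    private final Map<String, String> conf = new HashMap<>();
    private final IntSupplier freePortPicker;
    private boolean running;

    RestartableClusterSketch(IntSupplier freePortPicker) {
        this.freePortPicker = freePortPicker;
    }

    void start() {
        if (running) {
            throw new IllegalStateException("already running");
        }
        // Pick a port only on the first start; later starts find it in conf.
        if (!conf.containsKey(PORT_KEY)) {
            conf.put(PORT_KEY, Integer.toString(freePortPicker.getAsInt()));
        }
        running = true;
    }

    void stop() {
        // The configuration is deliberately kept so a restart can reuse it.
        running = false;
    }

    void restart() {
        if (running) {
            stop();
        }
        start();
    }

    /** Valid only after the first start(). */
    int port() {
        return Integer.parseInt(conf.get(PORT_KEY));
    }

    boolean isRunning() {
        return running;
    }
}
```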
[jira] [Commented] (MAPREDUCE-4703) Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528120#comment-13528120 ]

Alejandro Abdelnur commented on MAPREDUCE-4703:

Thanks Ahmed. I've committed to trunk and branch-2 after running the tests.

However, when running the test with the branch-1 patch, the test fails with the following output. Would you please take a look at it? I'll hold off committing to branch-1 until this is addressed. Leaving the JIRA open as well.

{code}
Testcase: testJob took 14.387 sec
Testcase: testRestart took 7.333 sec
    Caused an ERROR
java.io.IOException: Call to localhost/127.0.0.1:59747 failed on local exception: java.io.EOFException
java.lang.RuntimeException: java.io.IOException: Call to localhost/127.0.0.1:59747 failed on local exception: java.io.EOFException
    at org.apache.hadoop.mapred.MiniMRCluster.waitUntilIdle(MiniMRCluster.java:325)
    at org.apache.hadoop.mapred.MiniMRCluster.init(MiniMRCluster.java:527)
    at org.apache.hadoop.mapred.MiniMRCluster.init(MiniMRCluster.java:465)
    at org.apache.hadoop.mapred.MiniMRCluster.init(MiniMRCluster.java:457)
    at org.apache.hadoop.mapred.MiniMRCluster.init(MiniMRCluster.java:449)
    at org.apache.hadoop.mapred.MiniMRCluster.init(MiniMRCluster.java:439)
    at org.apache.hadoop.mapred.MiniMRCluster.init(MiniMRCluster.java:429)
    at org.apache.hadoop.mapred.MiniMRClusterAdapter.restart(MiniMRClusterAdapter.java:80)
    at org.apache.hadoop.mapred.TestMiniMRClientCluster.testRestart(TestMiniMRClientCluster.java:109)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:59747 failed on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
    at org.apache.hadoop.ipc.Client.call(Client.java:1112)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at org.apache.hadoop.mapred.$Proxy10.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:411)
    at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:505)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:496)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:479)
    at org.apache.hadoop.mapred.MiniMRCluster.waitUntilIdle(MiniMRCluster.java:311)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
{code}
[jira] [Updated] (MAPREDUCE-4703) Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated MAPREDUCE-4703:
Affects Version/s: 2.0.3-alpha, 1.2.0
Fix Version/s: 2.0.3-alpha
[jira] [Commented] (MAPREDUCE-4703) Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528122#comment-13528122 ]

Hudson commented on MAPREDUCE-4703:

Integrated in Hadoop-trunk-Commit #3102 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3102/])
MAPREDUCE-4703. Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped. (ahmed.radwan via tucu) (Revision 1419618)

Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1419618
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRClientCluster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRClientClusterFactory.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRYarnClusterAdapter.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMiniMRClientCluster.java
[jira] [Commented] (MAPREDUCE-4703) Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528123#comment-13528123 ]

Ahmed Radwan commented on MAPREDUCE-4703:

Thanks Tucu! I'll take a look at this failure and get back to you.
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528271#comment-13528271 ]

Alejandro Abdelnur commented on MAPREDUCE-4049:

Avner,

If you look at the latest patches for MAPREDUCE-4807, MAPREDUCE-4812 & MAPREDUCE-4809, you'll see that they are limited to doing the same thing MAPREDUCE-4049 does: define an interface, make existing classes implement that interface, and instantiate those classes using ReflectionUtils.newInstance(). The only extra is a minor refactoring that has already been agreed to be OK and poses no risk. The bulk of the patches are testcases that, instead of only testing the pluggability, provide simple alternate implementations to show that the interfaces are adequate for that purpose.

In order to get this done, I encourage you, as a contributor, to look at the work proposed in MAPREDUCE-4807, MAPREDUCE-4812 & MAPREDUCE-4809 and provide feedback so we get things in the branch and in trunk.

plugin for generic shuffle service

Key: MAPREDUCE-4049
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
Assignee: Avner BenHanoch
Labels: merge, plugin, rdma, shuffle
Fix For: 3.0.0
Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch

Support generic shuffle service as a set of two plugins: ShuffleProvider & ShuffleConsumer.
This will satisfy the following needs:
# Better shuffle and merge performance. For example: we are working on a shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, or Infiniband) instead of using the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges, hence getting much better performance.
# Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden dependency of NodeManager with a specific version of mapreduce shuffle (currently targeted to 0.24.0).

References:
# Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
# I am attaching 2 documents with suggested Top Level Design for both plugins (currently, based on 1.0 branch)
# I am providing a link for downloading UDA - Mellanox's open source plugin that implements generic shuffle service using RDMA and levitated merge. Note: At this phase, the code is in C++ through JNI and you should consider it as beta only. Still, it can serve anyone who wants to implement or contribute to levitated merge. (Please be advised that levitated merge is mostly suited to very fast networks) - [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]
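The pluggability pattern described in the comment above (define an interface, have the existing class implement it, instantiate the configured class reflectively) can be sketched in plain Java. This is only an illustration under assumptions: it uses java.lang.reflect directly rather than Hadoop's ReflectionUtils.newInstance(), and the single-method interface, the HttpShuffleConsumerSketch class, and PluginLoaderSketch are hypothetical names that do not come from any of the patches.

```java
/** Hypothetical plugin interface; the real plugin contracts are defined in the patches. */
interface ShuffleConsumerSketch {
    String fetch(String mapOutputId);
}

/** Stand-in for the existing default implementation. */
final class HttpShuffleConsumerSketch implements ShuffleConsumerSketch {
    @Override
    public String fetch(String mapOutputId) {
        return "http:" + mapOutputId;
    }
}

/** Loads a plugin by class name, as a job configuration value might specify it. */
final class PluginLoaderSketch {
    static <T> T newInstance(String className, Class<T> pluginInterface) {
        try {
            Class<?> raw = Class.forName(className);
            // asSubclass throws ClassCastException if the class does not implement the interface.
            return raw.asSubclass(pluginInterface).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load plugin " + className, e);
        }
    }
}
```

A caller would pass the configured class name, for example PluginLoaderSketch.newInstance(HttpShuffleConsumerSketch.class.getName(), ShuffleConsumerSketch.class), and an alternate implementation such as an RDMA-based one would be swapped in by changing only that name.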
[jira] [Updated] (MAPREDUCE-4859) TestRecoveryManager fails on branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-4859:
Attachment: MAPREDUCE-4859.patch

I had a look at the failing tests. A couple of the tests were hanging because they don't wait for the jobs to complete, so the tasktrackers never exit and mini cluster shutdown waits forever. testJobResubmission was failing due to a race where the old TIP gets removed while the recovered TIP is running, so the TT thinks it has never completed. I also made the output directories unique since there were occasional clashes between tests despite the test directory being deleted each time.

Tests pass for me on Mac and Linux. Matt/Arun - can you see if the patch works for you please?

TestRecoveryManager fails on branch-1

Key: MAPREDUCE-4859
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4859
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Fix For: 1.1.2
Attachments: MAPREDUCE-4859.patch, MAPREDUCE-4859.patch

Looks like the tests are extremely flaky and just hang.
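Two of the fixes described above (waiting for jobs to complete before shutdown, and giving each test a unique output directory) follow a common test-hygiene pattern. The sketch below illustrates that pattern only; it is not taken from MAPREDUCE-4859.patch, and TestWaitSketch, waitFor, and uniqueOutputDir are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical test helpers; not code from the attached patch. */
final class TestWaitSketch {
    private static final AtomicInteger COUNTER = new AtomicInteger();

    /**
     * Polls the condition until it holds or the timeout expires.
     * Returns true if the condition became true, false on timeout or interrupt.
     */
    static boolean waitFor(BooleanSupplier done, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (done.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    /** Builds an output directory name that differs on every call within this JVM. */
    static String uniqueOutputDir(String testName) {
        return "output-" + testName + "-" + COUNTER.incrementAndGet();
    }
}
```

A test would call waitFor with a condition such as "the job reports completion" before shutting the mini cluster down, and fail the test if it returns false instead of hanging.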
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528317#comment-13528317 ]

Arun C Murthy commented on MAPREDUCE-4049:

Forgot to add over the weekend - the GM run finished slightly faster (13s) with this patch, on 300 nodes with ~1300 jobs. Overall runtime was 65 mins.
[jira] [Commented] (MAPREDUCE-4808) Allow reduce-side merge to be pluggable
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528334#comment-13528334 ]

Arun C Murthy commented on MAPREDUCE-4808:

Asokan, thanks for the clarification.

However, I'm still trying to understand what you are trying to achieve here. The original goal of the parent task (MAPREDUCE-2454) was to make 'sort pluggable'. We've accomplished that with MAPREDUCE-4807 and MAPREDUCE-4809. Now, are we done? If not, what else remains to achieve that? Do you need some special hook in the Reducer's merge for Syncsort?

As I've told you in person, when making sweeping changes to the framework it's better to focus on the 'goal' and make as few changes as possible to get there. We can always do more work and add more features, but let's do one thing at a time. We can add limit-N etc. separately; it just delays this jira - why do that?

Allow reduce-side merge to be pluggable

Key: MAPREDUCE-4808
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
Fix For: 2.0.3-alpha
Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch

Allow reduce-side merge to be pluggable for MAPREDUCE-2454
[jira] [Commented] (MAPREDUCE-4808) Allow reduce-side merge to be pluggable
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528346#comment-13528346 ]

Arun C Murthy commented on MAPREDUCE-4808:

To be clear, I'm not against newer features - I just want them done independently so we can close this out and be done with it.
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528376#comment-13528376 ]

Milind Bhandarkar commented on MAPREDUCE-4049:

Thanks for verifying, Arun. FWIW, we have been running with many earlier versions of this patch on our Greenplum Analytics Workbench 1000 node cluster since May 2012 (I think I had mentioned this to you and Chris Douglas during Hadoop Summit in June), and haven't found any issues with this patch so far. (See my comment above.)
[jira] [Updated] (MAPREDUCE-4594) Add init/shutdown methods to mapreduce Partitioner
[ https://issues.apache.org/jira/browse/MAPREDUCE-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar updated MAPREDUCE-4594:
Attachment: partitioner4.txt

Add init/shutdown methods to mapreduce Partitioner

Key: MAPREDUCE-4594
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4594
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: client
Affects Versions: trunk
Reporter: Radim Kolar
Attachments: partitioner1.txt, partitioner2.txt, partitioner2.txt, partitioner3.txt, partitioner4.txt

The Partitioner supports only the Configurable API, which can be used for basic init in setConf(). The problem is that there is no shutdown function.
I propose to use standard setup() & cleanup() functions like in mapper / reducer.
The use case is that I need to start and stop a spring context and a datagrid client.
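The proposal above (setup() and cleanup() hooks on the partitioner, as on mapper / reducer) can be illustrated with a small standalone sketch. It does not use Hadoop's Partitioner class and is not the attached patch: LifecyclePartitionerSketch, the hash-based subclass, and the driver are hypothetical, and the "resource" is just a boolean flag standing in for something like a spring context or datagrid client.

```java
/** Hypothetical partitioner base class with lifecycle hooks; not Hadoop's Partitioner. */
abstract class LifecyclePartitionerSketch<K, V> {
    /** Called once before the first getPartition() call. Default: no-op. */
    void setup() {
    }

    abstract int getPartition(K key, V value, int numPartitions);

    /** Called once after the last getPartition() call. Default: no-op. */
    void cleanup() {
    }
}

/** Example subclass that needs a resource to be opened and closed. */
final class HashLifecyclePartitionerSketch extends LifecyclePartitionerSketch<String, String> {
    boolean resourceOpen; // stands in for an external client or context

    @Override
    void setup() {
        resourceOpen = true;
    }

    @Override
    int getPartition(String key, String value, int numPartitions) {
        if (!resourceOpen) {
            throw new IllegalStateException("setup() was not called");
        }
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    void cleanup() {
        resourceOpen = false;
    }
}

/** Minimal driver showing the order in which a framework could call the hooks. */
final class PartitionerDriverSketch {
    static int[] run(LifecyclePartitionerSketch<String, String> partitioner,
                     String[] keys, int numPartitions) {
        partitioner.setup();
        try {
            int[] result = new int[keys.length];
            for (int i = 0; i < keys.length; i++) {
                result[i] = partitioner.getPartition(keys[i], "", numPartitions);
            }
            return result;
        } finally {
            partitioner.cleanup(); // runs even if partitioning throws
        }
    }
}
```

The try/finally in the driver is the design point: cleanup() is what setConf()-only initialization cannot offer, since Configurable has no matching shutdown call.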
[jira] [Commented] (MAPREDUCE-4594) Add init/shutdown methods to mapreduce Partitioner
[ https://issues.apache.org/jira/browse/MAPREDUCE-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528491#comment-13528491 ] Hadoop QA commented on MAPREDUCE-4594: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12560312/partitioner4.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3115//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3115//console This message is automatically generated. Add init/shutdown methods to mapreduce Partitioner -- Key: MAPREDUCE-4594 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4594 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: trunk Reporter: Radim Kolar Attachments: partitioner1.txt, partitioner2.txt, partitioner2.txt, partitioner3.txt, partitioner4.txt
[jira] [Commented] (MAPREDUCE-4396) Make LocalJobRunner work with private distributed cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528577#comment-13528577 ] Ivan Mitic commented on MAPREDUCE-4396: --- This was fixed with HADOOP-8734 in branch-1-win. Maybe just integrate the same patch into branch-1? Make LocalJobRunner work with private distributed cache --- Key: MAPREDUCE-4396 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4396 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 1.0.3 Reporter: Luke Lu Assignee: Yu Gao Priority: Minor Attachments: mapreduce-4396-branch-1.patch, test-afterpatch.result, test-beforepatch.result, test-patch.result Some LocalJobRunner-related unit tests fail if the user directory permissions and/or umask are too restrictive.
[jira] [Reopened] (MAPREDUCE-4549) Distributed cache conflicts breaks backwards compatability
[ https://issues.apache.org/jira/browse/MAPREDUCE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reopened MAPREDUCE-4549: --- Distributed cache conflicts breaks backwards compatability -- Key: MAPREDUCE-4549 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4549 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Priority: Critical Fix For: 0.23.5 Attachments: MAPREDUCE-4549-trunk.patch, MR-4549-branch-0.23.txt I recently put in MAPREDUCE-4503, which went a bit too far and broke backwards compatibility with 1.0 in distributed cache entries. Instead of changing the behavior of the distributed cache to more closely match 1.0 behavior, I want to just change the exception to a warning message informing users that it will become an error in 2.0.
[jira] [Updated] (MAPREDUCE-4549) Distributed cache conflicts breaks backwards compatability
[ https://issues.apache.org/jira/browse/MAPREDUCE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-4549: -- Affects Version/s: 2.0.2-alpha Fix Version/s: 2.0.3-alpha Distributed cache conflicts breaks backwards compatability -- Key: MAPREDUCE-4549 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4549 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3, 2.0.2-alpha Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: MAPREDUCE-4549-trunk.patch, MR-4549-branch-0.23.txt
[jira] [Updated] (MAPREDUCE-4549) Distributed cache conflicts breaks backwards compatability
[ https://issues.apache.org/jira/browse/MAPREDUCE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-4549: -- Attachment: MAPREDUCE-4549-trunk.patch Distributed cache conflicts breaks backwards compatability -- Key: MAPREDUCE-4549 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4549 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3, 2.0.2-alpha Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: MAPREDUCE-4549-trunk.patch, MR-4549-branch-0.23.txt
[jira] [Commented] (MAPREDUCE-1639) Grouping using hashing instead of sorting
[ https://issues.apache.org/jira/browse/MAPREDUCE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528596#comment-13528596 ] Jerry Chen commented on MAPREDUCE-1639: --- +1. I think this feature is valuable and I would like to take the time to work on it. The hash-based algorithm can be used both for group-by and for join; neither requires a global sort. Grouping using hashing instead of sorting - Key: MAPREDUCE-1639 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1639 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Joydeep Sen Sarma Most applications of map-reduce care about grouping and not sorting. Sorting is a (relatively expensive) way to achieve grouping. In order to achieve just grouping, one can: - replace the sort on the Mappers with a HashTable, and maintain lists of key-values against each hash bucket. - sort the key-value tuples inside each hash bucket before spilling or sending to the Reducer. Any time this is done, the Combiner can be invoked. - serialize the HashTable by hash-bucket id, so merges (of either spills or Map outputs) work similarly to today (at least there is no change in the overall compute complexity of the merge). Of course this hashtable has nothing to do with partitioning; it is just a replacement for the map-side sort. -- This is (pretty much) straight from the MARS project paper: http://www.cse.ust.hk/catalac/papers/mars_pact08.pdf. They report a 45% speedup in inverted index calculation using hashing instead of sorting (the reference implementation is NOT against Hadoop, though).
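The three steps in the issue description can be sketched roughly as follows. This is an illustrative, self-contained buffer rather than Hadoop code: HashGroupingBuffer, collect and spill are hypothetical names, and keys are plain strings for brevity.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical map-side buffer that groups by hash bucket instead of sorting globally. */
class HashGroupingBuffer {
    private final List<List<Map.Entry<String, Integer>>> buckets;

    HashGroupingBuffer(int numBuckets) {
        buckets = new ArrayList<>(numBuckets);
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    /** Step 1: append the pair to the list kept for its hash bucket (no sort). */
    void collect(String key, int value) {
        int bucket = (key.hashCode() & Integer.MAX_VALUE) % buckets.size();
        buckets.get(bucket).add(new SimpleEntry<>(key, value));
    }

    /**
     * Steps 2 and 3: sort only within each bucket, then emit buckets in
     * bucket-id order. Equal keys end up adjacent, which is all grouping
     * needs, and every spill uses the same bucket order so spills can still
     * be merged bucket by bucket.
     */
    List<Map.Entry<String, Integer>> spill() {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> bucket : buckets) {
            bucket.sort(Map.Entry.comparingByKey());
            out.addAll(bucket);
            bucket.clear();
        }
        return out;
    }
}
```

The output is grouped but not globally ordered: keys are sorted inside a bucket, while the order across buckets follows the hash.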
[jira] [Commented] (MAPREDUCE-3247) Add hash aggregation style data flow and/or new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528609#comment-13528609 ] Jerry Chen commented on MAPREDUCE-3247: --- Binglin, I noticed that you created this issue from MAPREDUCE-1639, though I think the two issues are more or less similar. There is also a lot of related work going on, such as MAPREDUCE-2454 and MAPREDUCE-4049. If you are not working on this, I would like to take the time to work on this feature. Add hash aggregation style data flow and/or new API --- Key: MAPREDUCE-3247 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3247 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task Affects Versions: 0.23.0 Reporter: Binglin Chang Labels: api, perfomance In many join/aggregation-like queries run on top of mapreduce, a sort is not needed; in fact a hash table based join/aggregation is more efficient. This is described in detail in Tenzing A SQL Implementation On The MapReduce Framework. There are two ways to support hash table based join/aggregation in hadoop mapreduce: # Only support no sort: the framework does nothing and just passes partitioned k/v pairs from mapper to reducer. The upper application uses a hash table in its mapper/reducer to do aggregation, and emits all hashtable entries in cleanup() of the mapper/reducer; this is how Google did it in Tenzing. The main problem is memory control of the hashtable. # Add a new fold API. It can coexist with the combiner/reducer API; users can use mapper-combiner-reducer or mapper-folder (maybe a bad name, better names are welcome). Like foldl in functional programming, folder should have the semantics: foldl folder z (x:xs) = foldl folder (folder z x) xs In this way, upper applications only need to provide the folder; the underlying framework creates and maintains the hashtable for key/value pairs, so it can be managed and optimized by the framework. For example, on the mapper side, we can pre-emit the entire hashtable, or use some policy such as a cache algorithm to emit part of the k/v pairs to free some memory, if the memory consumption reaches io.sort.mb.
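A rough sketch of the second option, the fold API, under stated assumptions: FoldAggregator, its constructor parameters and the maxEntries limit are hypothetical and stand in for whatever the framework would actually expose. An entry-count limit replaces real memory accounting against io.sort.mb, and the eviction policy shown is the simplest one mentioned above (pre-emit the entire hashtable).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

/** Hypothetical framework-managed hash table driven by a user-supplied folder. */
class FoldAggregator<K, V, A> {
    private final Map<K, A> table = new HashMap<>();
    private final A zero;                     // the "z" of foldl; must be immutable
    private final BiFunction<A, V, A> folder; // folder z x
    private final int maxEntries;             // stand-in for a memory limit
    private final BiConsumer<K, A> sink;      // receives emitted (key, partial) pairs

    FoldAggregator(A zero, BiFunction<A, V, A> folder,
                   int maxEntries, BiConsumer<K, A> sink) {
        this.zero = zero;
        this.folder = folder;
        this.maxEntries = maxEntries;
        this.sink = sink;
    }

    /** Folds one value into the accumulator kept for its key. */
    void accept(K key, V value) {
        if (!table.containsKey(key) && table.size() >= maxEntries) {
            emitAll(); // table is full: pre-emit everything to free memory
        }
        A acc = table.getOrDefault(key, zero);
        table.put(key, folder.apply(acc, value));
    }

    /** Emits every remaining partial result, as cleanup() would. */
    void close() {
        emitAll();
    }

    private void emitAll() {
        table.forEach(sink);
        table.clear();
    }
}
```

Because the table may be emitted early, a key can reach the sink more than once with partial accumulators, so the receiving side has to combine partials again, much as a reducer does after a combiner.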
[jira] [Created] (MAPREDUCE-4868) Allow multiple iteration for map
Jerry Chen created MAPREDUCE-4868: - Summary: Allow multiple iteration for map Key: MAPREDUCE-4868 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4868 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Jerry Chen Fix For: 3.0.0, 2.0.3-alpha Currently, the Mapper class allows advanced users to override the public void run(Context context) method for more control over the map the execution of the mapper, while the Context interface limits the operations over the data, which is the foundation of more control. One use case is a Hive optimization problem I am considering: I want to make two passes over the input data instead of using another job or task (which may slow down the whole process). Each pass does the same thing but with different parameters. This is a new paradigm of MapReduce usage and can be achieved easily by extending the Context interface a little with more control over the data, such as resetting the input.
[jira] [Updated] (MAPREDUCE-4868) Allow multiple iteration for map
[ https://issues.apache.org/jira/browse/MAPREDUCE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Chen updated MAPREDUCE-4868: -- Description: Currently, the Mapper class allows advanced users to override the public void run(Context context) method for more control over the execution of the mapper, while the Context interface limits the operations over the data, which is the foundation of more control. One use case is a Hive optimization problem I am considering: I want to make two passes over the input data instead of using another job or task (which may slow down the whole process). Each pass does the same thing but with different parameters. This is a new paradigm of MapReduce usage and can be achieved easily by extending the Context interface a little with more control over the data, such as resetting the input. was: Currently, the Mapper class allows advanced users to override the public void run(Context context) method for more control over the map the execution of the mapper, while the Context interface limits the operations over the data, which is the foundation of more control. One use case is a Hive optimization problem I am considering: I want to make two passes over the input data instead of using another job or task (which may slow down the whole process). Each pass does the same thing but with different parameters. This is a new paradigm of MapReduce usage and can be achieved easily by extending the Context interface a little with more control over the data, such as resetting the input. Allow multiple iteration for map Key: MAPREDUCE-4868 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4868 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Jerry Chen Fix For: 3.0.0, 2.0.3-alpha Original Estimate: 168h Remaining Estimate: 168h
[jira] [Commented] (MAPREDUCE-4868) Allow multiple iteration for map
[ https://issues.apache.org/jira/browse/MAPREDUCE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528629#comment-13528629 ] Radim Kolar commented on MAPREDUCE-4868: Did you try Spring Batch? You can boot it in setup() and do whatever you want with the data, including multiple steps and multithreading. Allow multiple iteration for map Key: MAPREDUCE-4868 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4868 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Jerry Chen Fix For: 3.0.0, 2.0.3-alpha Original Estimate: 168h Remaining Estimate: 168h
[jira] [Commented] (MAPREDUCE-4868) Allow multiple iteration for map
[ https://issues.apache.org/jira/browse/MAPREDUCE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528630#comment-13528630 ] Radim Kolar commented on MAPREDUCE-4868: You may also find this one handy: org.apache.hadoop.mapred.lib.ChainMapper Allow multiple iteration for map Key: MAPREDUCE-4868 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4868 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Jerry Chen Fix For: 3.0.0, 2.0.3-alpha Original Estimate: 168h Remaining Estimate: 168h
[jira] [Commented] (MAPREDUCE-4868) Allow multiple iteration for map
[ https://issues.apache.org/jira/browse/MAPREDUCE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528641#comment-13528641 ] Jerry Chen commented on MAPREDUCE-4868: --- Radim, thank you very much for your quick response. I checked ChainMapper and it turned out to be not quite the same thing. ChainMapper actually iterates over the map data only once, and each key/value goes through the chain of mappers. The difference here is that this change would enable the mapper to run multiple iterations over the input. At first glance, that seems to make no sense. But it does make sense when you consider the parameter data (not the input data) needed for each iteration and whether it is available. In the Hive optimization problem I mentioned above, the parameter data may not fit in memory, so we need to partition it, load one partition into memory at a time, and go over the input data once for each partition. This saves the complex reduce stage. Does this make sense, or is there another way that provides equivalent performance? Thanks again. Allow multiple iteration for map Key: MAPREDUCE-4868 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4868 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Jerry Chen Fix For: 3.0.0, 2.0.3-alpha Original Estimate: 168h Remaining Estimate: 168h
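The multi-pass idea discussed in this thread can be sketched without Hadoop as below. ResettableInput and its reset() method are hypothetical; the current Context interface has no such operation, which is exactly what the issue proposes to add. The filter inside the loop (keep records found in the loaded parameter partition) is only a placeholder for the real per-pass work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical input abstraction with the proposed reset capability. */
interface ResettableInput<T> {
    boolean next();  // advance to the next record, false at end of input
    T current();     // record at the current position
    void reset();    // rewind to the start of the split
}

/** In-memory implementation used only to make the sketch runnable. */
class ListInput<T> implements ResettableInput<T> {
    private final List<T> records;
    private int pos = -1;

    ListInput(List<T> records) {
        this.records = records;
    }

    public boolean next() {
        if (pos + 1 >= records.size()) {
            return false;
        }
        pos++;
        return true;
    }

    public T current() {
        return records.get(pos);
    }

    public void reset() {
        pos = -1;
    }
}

class MultiPassMapper {
    /**
     * One pass over the input per parameter partition, so only one partition
     * of the parameter data has to be held in memory at a time.
     */
    static List<String> run(ResettableInput<String> input,
                            List<Set<String>> paramPartitions) {
        List<String> out = new ArrayList<>();
        for (Set<String> params : paramPartitions) {
            input.reset(); // start the next pass from the first record
            while (input.next()) {
                String record = input.current();
                if (params.contains(record)) {
                    out.add(record); // placeholder for the real map logic
                }
            }
        }
        return out;
    }
}
```

The trade-off is that the input is read once per partition, which is presumably only worthwhile when that is cheaper than running an extra job or a reduce stage.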
[jira] [Commented] (MAPREDUCE-4868) Allow multiple iteration for map
[ https://issues.apache.org/jira/browse/MAPREDUCE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528642#comment-13528642 ] Radim Kolar commented on MAPREDUCE-4868: If you want multiple passes then go for Spring Batch. All you need to write are HDFS reader and writer drivers for Spring Batch. It's about 20 lines of code each. Allow multiple iteration for map Key: MAPREDUCE-4868 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4868 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Jerry Chen Fix For: 3.0.0, 2.0.3-alpha Original Estimate: 168h Remaining Estimate: 168h
[jira] [Commented] (MAPREDUCE-4868) Allow multiple iteration for map
[ https://issues.apache.org/jira/browse/MAPREDUCE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528664#comment-13528664 ] Jerry Chen commented on MAPREDUCE-4868: --- It appears to me that Spring Batch is another batch processing infrastructure. We are seeking to solve the problem within the context of MapReduce, and to empower MapReduce in a reasonable manner, rather than simply hooking into another batch processing system. Allow multiple iteration for map Key: MAPREDUCE-4868 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4868 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Jerry Chen Fix For: 3.0.0, 2.0.3-alpha Original Estimate: 168h Remaining Estimate: 168h
[jira] [Commented] (MAPREDUCE-4848) TaskAttemptContext cast error during AM recovery
[ https://issues.apache.org/jira/browse/MAPREDUCE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528685#comment-13528685 ] Jerry Chen commented on MAPREDUCE-4848: --- Hi Jason, I looked into this problem and this is a bug in the RecoveryService of MRv2. The cause is that the RecoveryService does not consider the committer type (new-API committer or old-API committer). I can submit a patch for this issue soon. TaskAttemptContext cast error during AM recovery Key: MAPREDUCE-4848 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4848 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am Affects Versions: 0.23.4 Reporter: Jason Lowe Recently saw an AM that failed and tried to recover, but the subsequent attempt quickly exited with its own failure during recovery:
{noformat}
2012-12-05 02:33:36,752 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.ClassCastException: org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl cannot be cast to org.apache.hadoop.mapred.TaskAttemptContext
	at org.apache.hadoop.mapred.OutputCommitter.recoverTask(OutputCommitter.java:284)
	at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$InterceptingEventHandler.handle(RecoveryService.java:361)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$ContainerAssignedTransition.transition(TaskAttemptImpl.java:1211)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$ContainerAssignedTransition.transition(TaskAttemptImpl.java:1177)
	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:958)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:926)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:918)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$RecoveryDispatcher.realDispatch(RecoveryService.java:285)
	at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$RecoveryDispatcher.dispatch(RecoveryService.java:281)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:619)
2012-12-05 02:33:36,752 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
{noformat}
The RM then launched a third AM attempt which succeeded. The third attempt saw basically no progress after parsing the history file from the second attempt and ran the job again from scratch.
[jira] [Updated] (MAPREDUCE-4869) TestMapReduceChildJVM fails in branch-trunk-win
[ https://issues.apache.org/jira/browse/MAPREDUCE-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated MAPREDUCE-4869: - Target Version/s: trunk-win TestMapReduceChildJVM fails in branch-trunk-win --- Key: MAPREDUCE-4869 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4869 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: trunk-win Reporter: Chris Nauroth Assignee: Chris Nauroth The YARN-233 patch for getting YARN working on Windows forgot to include a corresponding change in {{TestMapReduceChildJVM}}, so the test is failing now.
[jira] [Moved] (MAPREDUCE-4869) TestMapReduceChildJVM fails in branch-trunk-win
[ https://issues.apache.org/jira/browse/MAPREDUCE-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth moved HADOOP-9130 to MAPREDUCE-4869: -- Component/s: (was: test) test Target Version/s: (was: trunk-win) Affects Version/s: (was: trunk-win) trunk-win Key: MAPREDUCE-4869 (was: HADOOP-9130) Project: Hadoop Map/Reduce (was: Hadoop Common) TestMapReduceChildJVM fails in branch-trunk-win --- Key: MAPREDUCE-4869 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4869 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: trunk-win Reporter: Chris Nauroth Assignee: Chris Nauroth
[jira] [Updated] (MAPREDUCE-4869) TestMapReduceChildJVM fails in branch-trunk-win
[ https://issues.apache.org/jira/browse/MAPREDUCE-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated MAPREDUCE-4869: - Attachment: MAPREDUCE-4869-branch-trunk-win.1.patch The attached patch updates the test to remove exec and perform platform-specific escaping of environment variable references. With YARN-233, the exec is now inserted on the container side, because this is an OS-specific command. With this patch, the test passes on Mac and Windows. TestMapReduceChildJVM fails in branch-trunk-win --- Key: MAPREDUCE-4869 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4869 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: trunk-win Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: MAPREDUCE-4869-branch-trunk-win.1.patch
[jira] [Commented] (MAPREDUCE-4810) Add admin command options for ApplicationMaster
[ https://issues.apache.org/jira/browse/MAPREDUCE-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528766#comment-13528766 ] Jerry Chen commented on MAPREDUCE-4810: --- +1. It makes sense to separate stable confs from those that may vary per job. The question is how strong the need is to set per-job confs (such as heap size) on an MRAppMaster, which only manages the running of tasks and does not run tasks itself. What I can think of is that different jobs may differ in the number of tasks, which may lead to different levels of memory consumption. Are there any other use cases that have a specific need for the MRAppMaster heap size? Add admin command options for ApplicationMaster --- Key: MAPREDUCE-4810 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4810 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster Affects Versions: 2.0.2-alpha, 0.23.4 Reporter: Jason Lowe Priority: Minor It would be nice if the MR ApplicationMaster had the notion of admin options in addition to the existing user options much like we have for map and reduce tasks, e.g.: mapreduce.admin.map.child.java.opts vs. mapreduce.map.java.opts. This allows site-wide configuration options for MR AMs but still allows a user to easily override the heap size of the AM without worrying about dropping other admin-specified options.
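One way the admin/user split described in this issue could work is simple concatenation, with the admin options first and the user options last. The sketch below assumes that approach; the two property keys are made-up placeholders (the issue does not name the AM keys), and a plain Map stands in for a Hadoop Configuration.

```java
import java.util.Map;

/** Hypothetical helper that builds the effective AM JVM options. */
class AmCommandOpts {
    // Placeholder keys for illustration only; not real Hadoop property names.
    static final String ADMIN_OPTS_KEY = "example.am.admin-command-opts";
    static final String USER_OPTS_KEY = "example.am.command-opts";

    /**
     * Admin options come first and user options last, so a user can set only
     * the heap size without repeating (or dropping) the site-wide options.
     */
    static String effectiveOpts(Map<String, String> conf) {
        String admin = conf.getOrDefault(ADMIN_OPTS_KEY, "").trim();
        String user = conf.getOrDefault(USER_OPTS_KEY, "").trim();
        return (admin + " " + user).trim();
    }
}
```

For example, an admin value of -Djava.net.preferIPv4Stack=true combined with a user value of -Xmx2g yields both flags on the command line. Whether a user flag overrides a conflicting admin flag depends on how the JVM resolves repeated options, so that part is an assumption to verify for the JVM in use.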
[jira] [Commented] (MAPREDUCE-4816) JobImpl Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED
[ https://issues.apache.org/jira/browse/MAPREDUCE-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528775#comment-13528775 ] Jerry Chen commented on MAPREDUCE-4816: --- Hi Jason, I looked into this issue. A job in the FAILED state should ignore further JOB_TASK_ATTEMPT_COMPLETED events, but in version 0.23.5 the transition from FAILED on JOB_TASK_ATTEMPT_COMPLETED is not declared in the state machine, so the exception is thrown. Besides JOB_TASK_ATTEMPT_COMPLETED, other events such as JOB_TASK_COMPLETED and JOB_MAP_TASK_RESCHEDULED can also arrive in the FAILED state and should be declared in the state machine as well. I checked trunk: the JobState - JobStateInternal refactoring there appears to have already fixed the transition problem described above. If necessary, I can backport the transition fixes to versions such as 2.0.2-alpha.
JobImpl Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED --- Key: MAPREDUCE-4816 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4816 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 0.23.5 Reporter: Jason Lowe
Saw this in an AM log of a task that had failed:
{noformat}
2012-11-21 23:26:44,533 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:690)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:113)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:904)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:900)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:619)
{noformat}
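The failure mode discussed in this thread (an event arriving in a state that declares no transition for it) can be reproduced with a small stand-in state machine. This is a simplified sketch, not Hadoop's StateMachineFactory or JobImpl: the class JobStateSketch, the JOB_FAIL event, and the use of IllegalStateException are assumptions made for illustration. Only the three JOB_* event names that are declared as ignorable come from the comment above.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for a job state machine; not the Hadoop implementation.
final class JobStateSketch {
    enum State { RUNNING, FAILED }

    // JOB_FAIL is a hypothetical event used here only to reach the FAILED state.
    enum Event { JOB_TASK_ATTEMPT_COMPLETED, JOB_TASK_COMPLETED, JOB_MAP_TASK_RESCHEDULED, JOB_FAIL }

    // Events that each state explicitly declares as "ignore and stay in this state".
    private final Map<State, Set<Event>> ignored = new EnumMap<>(State.class);
    private State current = State.RUNNING;

    JobStateSketch(boolean declareIgnoredAtFailed) {
        ignored.put(State.RUNNING, EnumSet.noneOf(Event.class));
        ignored.put(State.FAILED, declareIgnoredAtFailed
                ? EnumSet.of(Event.JOB_TASK_ATTEMPT_COMPLETED,
                             Event.JOB_TASK_COMPLETED,
                             Event.JOB_MAP_TASK_RESCHEDULED)
                : EnumSet.noneOf(Event.class));
    }

    /** Applies one event and returns the resulting state. */
    State handle(Event event) {
        if (ignored.get(current).contains(event)) {
            return current; // declared self-transition with no side effects
        }
        if (current == State.RUNNING) {
            if (event == Event.JOB_FAIL) {
                current = State.FAILED;
            }
            return current; // other events leave the job running in this sketch
        }
        // No transition declared for this (state, event) pair.
        throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
}
```

Constructed with declareIgnoredAtFailed set to false, the sketch throws when JOB_TASK_ATTEMPT_COMPLETED arrives after the job has failed, which mirrors the logged error. With true, the late event is silently dropped and the job stays in FAILED.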