[jira] [Commented] (MAPREDUCE-5649) Reduce cannot use more than 2G memory for the final merge
[ https://issues.apache.org/jira/browse/MAPREDUCE-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841409#comment-13841409 ]

Milind Bhandarkar commented on MAPREDUCE-5649:
----------------------------------------------

Folks, is there any reason behind limiting this amount of memory to 2GB?

Reduce cannot use more than 2G memory for the final merge
---------------------------------------------------------

Key: MAPREDUCE-5649
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5649
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: trunk
Reporter: stanley shi

In the org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.java file, in the finalMerge method:

    int maxInMemReduce = (int)Math.min(
        Runtime.getRuntime().maxMemory() * maxRedPer, Integer.MAX_VALUE);

This means that no matter how much memory the user has, the reducer will not retain more than 2G of data in memory before the reduce phase starts.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
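The cap described in MAPREDUCE-5649 can be reproduced in isolation. The following is a minimal sketch, not the MergeManagerImpl code: the class name MergeLimitSketch, the method names, and the sample heap size and maxRedPer value are assumptions chosen for illustration. It shows that clamping to Integer.MAX_VALUE before the int cast holds the result at roughly 2 GiB, while the same arithmetic kept in a long does not.

```java
// Hypothetical illustration; names and sample values are not from Hadoop.
final class MergeLimitSketch {

    // Mirrors the shape of the expression quoted in the issue:
    // the product is clamped to Integer.MAX_VALUE before the int cast.
    static int clampedLimit(long maxHeapBytes, float maxRedPer) {
        return (int) Math.min(maxHeapBytes * maxRedPer, Integer.MAX_VALUE);
    }

    // Same arithmetic kept in a long, so heaps above 2 GiB are not capped.
    static long unclampedLimit(long maxHeapBytes, float maxRedPer) {
        return (long) (maxHeapBytes * maxRedPer);
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024; // assumed 8 GiB reducer heap
        float maxRedPer = 0.5f;              // assumed retain fraction
        System.out.println("int limit:  " + clampedLimit(heap, maxRedPer));
        System.out.println("long limit: " + unclampedLimit(heap, maxRedPer));
    }
}
```

Whether widening the variable to a long is the right fix depends on how the value is consumed downstream in the merge code, which this sketch does not model.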
[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595526#comment-13595526 ]

Milind Bhandarkar commented on MAPREDUCE-2911:
----------------------------------------------

The community just created a huge issue for me to make this available to the community, by naming us anti-community. So, while I am trying to make this available to the community, I now have a few more obstacles to overcome. Please bear with me, or better still, try to get the community to stop their bile-spewing against us, so that we can navigate through this mess.

Hamster: Hadoop And Mpi on the same cluSTER
-------------------------------------------

Key: MAPREDUCE-2911
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: mrv2
Affects Versions: 0.23.0
Environment: All Unix-Environments
Reporter: Milind Bhandarkar
Assignee: Ralph H Castain
Original Estimate: 336h
Remaining Estimate: 336h

MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI applications on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528376#comment-13528376 ]

Milind Bhandarkar commented on MAPREDUCE-4049:
----------------------------------------------

Thanks for verifying, Arun. FWIW, we have been running with many earlier versions of this patch on our Greenplum Analytics Workbench 1000-node cluster since May 2012 (I think I had mentioned this to you and Chris Douglas during Hadoop Summit in June), and haven't found any issues with this patch so far. (See my comment above.)

plugin for generic shuffle service
----------------------------------

Key: MAPREDUCE-4049
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
Assignee: Avner BenHanoch
Labels: merge, plugin, rdma, shuffle
Fix For: 3.0.0
Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch

Support generic shuffle service as a set of two plugins: ShuffleProvider & ShuffleConsumer.
This will satisfy the following needs:
# Better shuffle and merge performance. For example: we are working on a shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, or Infiniband) instead of using the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges, yielding much better performance.
# Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden dependency of NodeManager with a specific version of mapreduce shuffle (currently targeted to 0.24.0).

References:
# Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
# I am attaching 2 documents with suggested Top Level Design for both plugins (currently, based on 1.0 branch)
# I am providing a link for downloading UDA - Mellanox's open source plugin that implements generic shuffle service using RDMA and levitated merge. Note: At this phase, the code is in C++ through JNI and you should consider it as beta only. Still, it can serve anyone that wants to implement or contribute to levitated merge. (Please be advised that levitated merge is mostly suited to very fast networks) - [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]
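The two-plugin split in the description can be pictured as a pair of small interfaces whose implementation is chosen by class name. The sketch below is an illustration under heavy assumptions, not the API from the attached patches: the interface methods, the class names (ShufflePluginSketch, HttpShuffleProvider, HttpShuffleConsumer), and the idea of passing the class name as a plain string are all hypothetical.

```java
// Hypothetical illustration of a pluggable shuffle; none of these types
// or method signatures are taken from the MAPREDUCE-4049 patches.
final class ShufflePluginSketch {

    // Serving side: hands out map output for a partition (assumed shape).
    interface ShuffleProvider {
        String transport();
    }

    // Reducer side: fetches and merges map output (assumed shape).
    interface ShuffleConsumer {
        String transport();
    }

    // Stand-ins for the built-in HTTP shuffle.
    static final class HttpShuffleProvider implements ShuffleProvider {
        public String transport() { return "http"; }
    }

    static final class HttpShuffleConsumer implements ShuffleConsumer {
        public String transport() { return "http"; }
    }

    // Instantiates a plugin by class name and checks that it has the
    // expected plugin type; a wrong type fails with ClassCastException.
    static <T> T load(String className, Class<T> type) {
        try {
            Class<?> raw = Class.forName(className);
            return type.cast(raw.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load plugin " + className, e);
        }
    }

    public static void main(String[] args) {
        ShuffleConsumer consumer =
            load(HttpShuffleConsumer.class.getName(), ShuffleConsumer.class);
        System.out.println("consumer transport: " + consumer.transport());
    }
}
```

An RDMA-based pair would plug in the same way by naming different classes, which is the property the comparison against the default plugin in this thread relies on.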
[jira] [Created] (MAPREDUCE-4459) Allow ad-placement on the JobTracker /RM UI
Milind Bhandarkar created MAPREDUCE-4459: Summary: Allow ad-placement on the JobTracker /RM UI Key: MAPREDUCE-4459 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4459 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mr-am Affects Versions: 2.0.0-alpha Environment: all, especially private clusters Reporter: Milind Bhandarkar Priority: Minor A lot of Hadoop map-reduce users spend a lot of time staring at the jobtracker webUI to check if their job has been scheduled, and checking the progress. An easy way to monetize these eyeballs is to allow ad-placement on this page. This will attract public-cloud IaaS companies such as AWS, Google Compute Engine, Microsoft Azure etc to place ads on that page, such as Waiting for your job to be scheduled on your company's Hadoop cluster ? You can create your own cluster and run your jobs fast, without waiting. This will allow major Hadoop installations to offload some of their load to public IaaS clouds, and in addition, create an ad-revenue source for themselves. And not only that, based on the demographic (mostly male, mostly starved of all the real-world fun) of users of these Hadoop clusters, there could be very targeted ads to be placed on this page. (Please consider this as an extension to HADOOP-8607). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283826#comment-13283826 ] Milind Bhandarkar commented on MAPREDUCE-4049: -- Luke, the original Auburn Univ work was done (with Mellanox support) in version 0.20.x. However, Avner ported it to 1.0.x, which is the patch attached here. We have been testing it at Greenplum (with both default shuffle plugin, and Mellanox's Unstructured Data Accelerator (UDA) plugin,) and haven't found any issues so far. plugin for generic shuffle service -- Key: MAPREDUCE-4049 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049 Project: Hadoop Map/Reduce Issue Type: Improvement Components: performance, task, tasktracker Affects Versions: 1.1.0, 1.0.3, 2.0.0-alpha, 3.0.0 Reporter: Avner BenHanoch Labels: merge, plugin, rdma, shuffle Attachments: HADOOP-1.0.2.patch, HADOOP-1.0.x.patch, HADOOP-1.0.x.patch, Hadoop Shuffle Consumer Plugin TLD.rtf, Hadoop Shuffle Provider Plugin TLD.rtf, MAPREDUCE-4049-branch-1.0.2.patch, mapred-site.xml, mapred.diff, src.tgz, test.diff Support generic shuffle service as set of two plugins: ShuffleProvider ShuffleConsumer. This will satisfy the following needs: # Better shuffle and merge performance. For example: we are working on shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, or Infiniband) instead of using the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges. Hence, getting much better performance. # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden dependency of NodeManager with a specific version of mapreduce shuffle (currently targeted to 0.24.0). References: # Hadoop Acceleration through Network Levitated Merging, by Prof. 
Weikuan Yu from Auburn University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf] # I am attaching 2 documents with suggested Top Level Design for both plugins (currently, based on 1.0 branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283846#comment-13283846 ] Milind Bhandarkar commented on MAPREDUCE-4049: -- Luke, I do not know of any tests done with 0.20.203 (Avner do you have any numbers?). The tests that compared vanilla 1.0.x with pluggable shuffle (with default plugin) do not show any measurable difference. plugin for generic shuffle service -- Key: MAPREDUCE-4049 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049 Project: Hadoop Map/Reduce Issue Type: Improvement Components: performance, task, tasktracker Affects Versions: 1.1.0, 1.0.3, 2.0.0-alpha, 3.0.0 Reporter: Avner BenHanoch Labels: merge, plugin, rdma, shuffle Attachments: HADOOP-1.0.2.patch, HADOOP-1.0.x.patch, HADOOP-1.0.x.patch, Hadoop Shuffle Consumer Plugin TLD.rtf, Hadoop Shuffle Provider Plugin TLD.rtf, MAPREDUCE-4049-branch-1.0.2.patch, mapred-site.xml, mapred.diff, src.tgz, test.diff Support generic shuffle service as set of two plugins: ShuffleProvider ShuffleConsumer. This will satisfy the following needs: # Better shuffle and merge performance. For example: we are working on shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, or Infiniband) instead of using the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges. Hence, getting much better performance. # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden dependency of NodeManager with a specific version of mapreduce shuffle (currently targeted to 0.24.0). References: # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf] # I am attaching 2 documents with suggested Top Level Design for both plugins (currently, based on 1.0 branch) -- This message is automatically generated by JIRA. 
[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277916#comment-13277916 ]

Milind Bhandarkar commented on MAPREDUCE-2911:
----------------------------------------------

I am excited to report that, thanks to great efforts by Ralph Castain and Wangda Tan, Hamster (i.e. OpenMPI on Yarn) now works flawlessly, and is scheduled to be merged to OpenMPI trunk soon. This effort was equivalent to building a second floor on a mobile home while it was hurtling down the freeway at 65 MPH :-) Thanks to both Ralph & Wangda.

According to Ralph: "Lots of cleanup and documentation to do, and performance sucks per HPC standards. But at least it works!"

To my knowledge, this is the first application framework implemented in C that uses the multi-lingual protobuf APIs for Yarn. (For secure environments, a small java-based shim is needed.) Also, it is encouraging that no changes were needed in Yarn to make resource allocation work for MPI. (MPI as a standard came along in 1994, 18 years before Yarn was designed.)

Currently, using MPI-IO functionality in MPI requires a shared posix file-system mounted on every node. However, this will change in the future. For some distributed file systems (*cough*), which offer a posix interface, MPI-IO works today.

Once it is decided whether BigTop can include Non-ASF packages, we plan to work with the BigTop community to integrate OpenMPI (new BSD-licensed) in the big data stack.

I am closing this issue as fixed.

Hamster: Hadoop And Mpi on the same cluSTER
-------------------------------------------

Key: MAPREDUCE-2911
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: mrv2
Affects Versions: 0.23.0
Environment: All Unix-Environments
Reporter: Milind Bhandarkar
Assignee: Ralph H Castain
Fix For: 0.24.0
Original Estimate: 336h
Remaining Estimate: 336h

MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI applications on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI.
[jira] [Resolved] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar resolved MAPREDUCE-2911. -- Resolution: Fixed Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Ralph H Castain Fix For: 0.24.0 Original Estimate: 336h Remaining Estimate: 336h MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI application on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3065) ApplicationMaster killed by NodeManager due to excessive virtual memory consumption
[ https://issues.apache.org/jira/browse/MAPREDUCE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13112741#comment-13112741 ] Milind Bhandarkar commented on MAPREDUCE-3065: -- @Chris, glad to know that it worked! Ironically, I had discovered this issue with RHEL6 when working on another project earlier this year at LinkedIn :-) ApplicationMaster killed by NodeManager due to excessive virtual memory consumption --- Key: MAPREDUCE-3065 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3065 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 0.24.0 Reporter: Chris Riccomini Hey Vinod, OK, so I have a little more clarity into this. When I bump my resource request for my AM to 4096, it runs. The important line in the NM logs is: 2011-09-21 13:43:44,366 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(402)) - Memory usage of ProcessTree 25656 for container-id container_1316637655278_0001_01_01 : Virtual 2260938752 bytes, limit : 4294967296 bytes; Physical 120860672 bytes, limit -1 bytes The thing to note is the virtual memory, which is off the charts, even though my physical memory is almost nothing (12 megs). I'm still poking around the code, but I am noticing that there are two checks in the NM, one for virtual mem, and one for physical mem. The virtual memory check appears to be toggle-able, but is presumably defaulted to on. At this point I'm trying to figure out exactly what the VMEM check is for, why YARN thinks my app is taking 2 gigs, and how to fix this. Cheers, Chris From: Chris Riccomini [criccom...@linkedin.com] Sent: Wednesday, September 21, 2011 1:42 PM To: mapreduce-...@hadoop.apache.org Subject: Re: ApplicationMaster Memory Usage For the record, I bumped to 4096 for memory resource request, and it works. :( On 9/21/11 1:32 PM, Chris Riccomini criccom...@linkedin.com wrote: Hey Vinod, So, I ran my application master directly from the CLI. 
I commented out the YARN-specific code. It runs fine without leaking memory.

I then ran it from YARN, with all YARN-specific code commented out. It again ran fine.

I then uncommented JUST my registerWithResourceManager call. It then fails with OOM after a few seconds. I call registerWithResourceManager, and then go into a while(true) { println(yeh) sleep(1000) }. Doing this prints:

yeh
yeh
yeh
yeh
yeh

At which point, it dies, and, in the NodeManager, I see:

2011-09-21 13:24:51,036 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:isProcessTreeOverLimit(289)) - Process tree for container: container_1316626117280_0005_01_01 has processes older than 1 iteration running over the configured limit. Limit=2147483648, current usage = 2192773120
2011-09-21 13:24:51,037 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(453)) - Container [pid=23852,containerID=container_1316626117280_0005_01_01] is running beyond memory-limits. Current usage : 2192773120bytes. Limit : 2147483648bytes. Killing container.
Dump of the process-tree for container_1316626117280_0005_01_01 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 23852 20570 23852 23852 (bash) 0 0 108638208 303 /bin/bash -c java -Xmx512M -cp './package/*' kafka.yarn.ApplicationMaster /home/criccomi/git/kafka-yarn/dist/kafka-streamer.tgz 5 1 1316626117280 com.linkedin.TODO 1 1>/tmp/logs/application_1316626117280_0005/container_1316626117280_0005_01_000001/stdout 2>/tmp/logs/application_1316626117280_0005/container_1316626117280_0005_01_000001/stderr
|- 23855 23852 23852 23852 (java) 81 4 2084134912 14772 java -Xmx512M -cp ./package/* kafka.yarn.ApplicationMaster /home/criccomi/git/kafka-yarn/dist/kafka-streamer.tgz 5 1 1316626117280 com.linkedin.TODO 1

2011-09-21 13:24:51,037 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - Removed ProcessTree with root 23852

Either something is leaking in YARN, or my registerWithResourceManager code (see below) is doing something funky.

I'm trying to avoid going through all the pain of attaching a remote debugger. Presumably things aren't leaking in YARN, which means it's likely that I'm doing something wrong in my registration code.

Incidentally, my NodeManager is running with 1000 megs. My application master memory is set to 2048, and my -Xmx setting is 512M.

Cheers,
Chris

From: Vinod Kumar
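The numbers in the NodeManager log above are internally consistent: the VMEM_USAGE(BYTES) values of the two processes in the dump add up to the reported usage, which exceeds the 2147483648-byte limit by roughly 43 MiB. The sketch below only redoes that arithmetic. It is a simplified assumption of what the check amounts to, not the ContainersMonitorImpl implementation (which, per the log message, also distinguishes processes "older than 1 iteration"); the class and method names are hypothetical.

```java
// Hypothetical illustration; not the ContainersMonitorImpl implementation.
final class VmemCheckSketch {

    // Total virtual memory of a process tree, in bytes.
    static long totalVmem(long... perProcessVmemBytes) {
        long total = 0;
        for (long v : perProcessVmemBytes) {
            total += v;
        }
        return total;
    }

    // Simplified check: the tree is over the limit when its total exceeds it.
    static boolean isOverLimit(long usageBytes, long limitBytes) {
        return usageBytes > limitBytes;
    }

    public static void main(String[] args) {
        long bashProc = 108_638_208L;   // VMEM_USAGE(BYTES) of pid 23852
        long javaProc = 2_084_134_912L; // VMEM_USAGE(BYTES) of pid 23855
        long limit = 2_147_483_648L;    // Limit reported in the WARN line

        long usage = totalVmem(bashProc, javaProc);
        System.out.println("usage=" + usage + " over=" + isOverLimit(usage, limit));
        System.out.println("excess bytes=" + (usage - limit));
    }
}
```

With the 4294967296-byte limit from the later run (the 4096 resource request), the same usage is under the limit, which matches the report that the larger request works.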
[jira] [Commented] (MAPREDUCE-3060) Generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109872#comment-13109872 ] Milind Bhandarkar commented on MAPREDUCE-3060: -- +1 ! This makes a lot of optimized third party plugins possible. Generic shuffle service --- Key: MAPREDUCE-3060 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3060 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.0 Reporter: Luke Lu Labels: shuffle Fix For: 0.24.0 When I was talking to Owen about HADOOP-2600, we came across (again, talked about it with Chris before) the shuffle dependency issue. NodeManager currently has an implicit (hidden by the service plugin mechanism) dependency of a specific version of mapreduce shuffle. While this works in many cases, as long as we don't change shuffle headers and the usage of mapred security tokens, it's a hack to make things work none the less. It's generally agreed upon that nodemanager should only load generic services that are mapreduce framework neutral. In this particular case, the right solution seems to be a generic shuffle handler that can serve data for a particular partition securely. The ShuffleHandler currently only depends on mapreduce for task tokens and shuffle header, which is only used for writing data, i.e., the shuffle handler has no semantic dependency on mapreduce. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2268) With JVM reuse, JvmManager doesn't delete last workdir properly
[ https://issues.apache.org/jira/browse/MAPREDUCE-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13108197#comment-13108197 ] Milind Bhandarkar commented on MAPREDUCE-2268: -- jvm reuse has very limited use in practice, especially on a multi-tenant cluster. Also, with security enabled, jvm reuse tends to get used even more rarely. In the 0.22 release notes, we should make a note of this, and should make this jira not-a-blocker. Thoughts ? With JVM reuse, JvmManager doesn't delete last workdir properly --- Key: MAPREDUCE-2268 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2268 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Fix For: 0.22.0 In JvmManager, when a Jvm exits, it tries to delete the workdir for {{initalContext.task}} which is null, hence throwing NPE. Currently this NPE is swallowed into the abyss. We should catch exceptions out of the JvmRunner thread, add a test case that verifies this functionality, and fix this code to properly grab the last task. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
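The description's suggestion to catch exceptions escaping the JvmRunner thread can be illustrated with plain java.lang.Thread facilities. This is a sketch under assumptions, not the JvmManager code or the fix for this issue: the class and thread names are hypothetical, and the deliberately thrown NullPointerException only stands in for the null task dereference.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration; not the JvmManager code or the committed fix.
final class RunnerExceptionSketch {

    // Runs the body on a new thread and returns whatever escaped it, or null.
    static Throwable runAndCapture(Runnable body) {
        AtomicReference<Throwable> escaped = new AtomicReference<>();
        Thread runner = new Thread(body, "jvm-runner-sketch");
        // Without a per-thread handler the exception only reaches the thread
        // group's default handling, and the caller never sees it.
        runner.setUncaughtExceptionHandler((thread, error) -> escaped.set(error));
        runner.start();
        try {
            runner.join(); // the handler has finished once join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return escaped.get();
    }

    public static void main(String[] args) {
        Throwable t = runAndCapture(() -> {
            Object task = null; // stands in for the null task
            task.toString();    // throws NullPointerException
        });
        System.out.println("captured: " + t);
    }
}
```

A test case along the lines the issue asks for could assert on the captured throwable instead of relying on log output.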
[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101330#comment-13101330 ] Milind Bhandarkar commented on MAPREDUCE-2911: -- Sorry folks. I got distracted this week by some mind-numbing non-technical stuff. Progress on hamster was slow, as a result. Since I will be travelling next week, hoping to find some time to work on it :-) Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 0.23.0 Original Estimate: 336h Remaining Estimate: 336h MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI application on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2948) Hadoop streaming test failure, post MR-2767
Hadoop streaming test failure, post MR-2767 --- Key: MAPREDUCE-2948 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2948 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.22.0, 0.23.0, 0.24.0 Environment: All Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 0.22.0, 0.23.0, 0.24.0 After removing LinuxTaskController in MAPREDUCE-2767, one of the tests in contrib/streaming: TestStreamingAsDifferentUser.java is failing since it imports import org.apache.hadoop.mapred.ClusterWithLinuxTaskController. Patch forthcoming. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (MAPREDUCE-2948) Hadoop streaming test failure, post MR-2767
[ https://issues.apache.org/jira/browse/MAPREDUCE-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-2948 started by Milind Bhandarkar. Hadoop streaming test failure, post MR-2767 --- Key: MAPREDUCE-2948 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2948 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.22.0, 0.23.0, 0.24.0 Environment: All Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 0.22.0, 0.23.0, 0.24.0 Original Estimate: 1h Remaining Estimate: 1h After removing LinuxTaskController in MAPREDUCE-2767, one of the tests in contrib/streaming: TestStreamingAsDifferentUser.java is failing since it imports import org.apache.hadoop.mapred.ClusterWithLinuxTaskController. Patch forthcoming. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2948) Hadoop streaming test failure, post MR-2767
[ https://issues.apache.org/jira/browse/MAPREDUCE-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099612#comment-13099612 ] Milind Bhandarkar commented on MAPREDUCE-2948: -- Thanks for doing it while I had stepped out for a meeting, Mahadev :-) Hadoop streaming test failure, post MR-2767 --- Key: MAPREDUCE-2948 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2948 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.22.0, 0.23.0, 0.24.0 Environment: All Reporter: Milind Bhandarkar Assignee: Mahadev konar Fix For: 0.22.0, 0.23.0, 0.24.0 Attachments: MAPREDUCE-2948-0.22.patch, MAPREDUCE-2948.patch Original Estimate: 1h Remaining Estimate: 1h After removing LinuxTaskController in MAPREDUCE-2767, one of the tests in contrib/streaming: TestStreamingAsDifferentUser.java is failing since it imports import org.apache.hadoop.mapred.ClusterWithLinuxTaskController. Patch forthcoming. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2929) Move task-controller from bin to libexec
[ https://issues.apache.org/jira/browse/MAPREDUCE-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098297#comment-13098297 ] Milind Bhandarkar commented on MAPREDUCE-2929: -- linux task-controller is scheduled to be removed from 0.23. (See MAPREDUCE-2767). Move task-controller from bin to libexec Key: MAPREDUCE-2929 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2929 Project: Hadoop Map/Reduce Issue Type: Bug Components: task-controller Affects Versions: 0.20.204.0, 0.23.0 Environment: Java, Redhat 5.6 Reporter: Eric Yang Linux task-controller is hard coded to $HADOOP_HOME/bin. Ideally, it should be moved to $HADOOP_PREFIX/libexec for ant binary layout, or the updated file structure layout for trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Attachment: (was: MR2767-trunk.patch) Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0, 0.23.0, 0.24.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0, 0.23.0, 0.24.0 Attachments: MR2767-22.patch, MR2767-23.patch, MR2767-trunk.patch, TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Status: Open (was: Patch Available) Cancelling, and re-making patch-available. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0, 0.23.0, 0.24.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0, 0.23.0, 0.24.0 Attachments: MR2767-22.patch, MR2767-23.patch, MR2767-trunk.patch, TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Attachment: MR2767-trunk.patch Removing conflict in trunk due to recent commits. Re-did patch for trunk. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0, 0.23.0, 0.24.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0, 0.23.0, 0.24.0 Attachments: MR2767-22.patch, MR2767-23.patch, MR2767-trunk.patch, TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Status: Patch Available (was: Open) Making patch available for trunk. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0, 0.23.0, 0.24.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0, 0.23.0, 0.24.0 Attachments: MR2767-22.patch, MR2767-23.patch, MR2767-trunk.patch, TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096109#comment-13096109 ] Milind Bhandarkar commented on MAPREDUCE-2767: -- Re: FindBugs warnings. These are not new. In fact none of these directories (all in mrv2 code) have been touched by the patch. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0, 0.23.0, 0.24.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0, 0.23.0, 0.24.0 Attachments: MR2767-22.patch, MR2767-23.patch, MR2767-trunk.patch, TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095556#comment-13095556 ] Milind Bhandarkar commented on MAPREDUCE-2767: -- @Arun, LTC changes made in 0.20.203 have not propagated to 0.23 and trunk ? I thought the race condition fix is already in trunk/0.23, no ? Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767new.patch, TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095561#comment-13095561 ] Milind Bhandarkar commented on MAPREDUCE-2767: -- Aha. Okay, will do. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767new.patch, TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Attachment: (was: MR2767new.patch) Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Attachment: MR2767-trunk.patch MR2767-23.patch MR2767-22.patch Attaching patches for 0.22, 0.23, and trunk. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767-22.patch, MR2767-23.patch, MR2767-trunk.patch, TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Affects Version/s: 0.24.0 0.23.0 Fix Version/s: 0.24.0 0.23.0 Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0, 0.23.0, 0.24.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0, 0.23.0, 0.24.0 Attachments: MR2767-22.patch, MR2767-23.patch, MR2767-trunk.patch, TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094719#comment-13094719 ] Milind Bhandarkar commented on MAPREDUCE-2767: -- For some time after the merge of three projects (common, hdfs, mapreduce), the test-patch ant target was broken: it tried to apply the patch from inside the mapreduce directory (where build.xml exists), whereas the patch is generated from the top level. This never got fixed in the 0.22 branch, so my ant test-patch fails in the patch application step. After manually applying the patch, I ran ant test in the mapreduce directory, and I got only unrelated failures. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767.patch There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094766#comment-13094766 ] Milind Bhandarkar commented on MAPREDUCE-2911: -- @Arun, the direct-launch method requires that certain environment variables be set so that a.out can access them. At a minimum, the number of nodes and the host and port of the head node (i.e. the process with rank 0) need to be available to all processes. Thus, we will have a tiny process that sets these environment variables and launches a.out. When a.out calls MPI_Init(), the MPI library code will read these env vars, wait till all the processes have reported to the head node, and start execution. Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 0.23.0 Original Estimate: 336h Remaining Estimate: 336h MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI application on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
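The "tiny process" in the comment above could look roughly like the following. This is only a sketch of the idea, not the Hamster launcher: the variable names (HAMSTER_NUM_NODES, HAMSTER_HEAD_HOST, HAMSTER_HEAD_PORT) and both function names are hypothetical, and the variables that an MPI library actually reads at MPI_Init() time are not stated in this thread.

```python
import os
import subprocess


def build_mpi_env(num_nodes, head_host, head_port, base_env=None):
    """Return a copy of the environment with the rendezvous details added.

    The three variable names are made up for illustration; a real MPI
    runtime defines its own names.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HAMSTER_NUM_NODES"] = str(num_nodes)
    env["HAMSTER_HEAD_HOST"] = head_host
    env["HAMSTER_HEAD_PORT"] = str(head_port)
    return env


def launch(argv, num_nodes, head_host, head_port):
    """Run the MPI executable (e.g. ["./a.out"]) with the extra variables set.

    Returns the child's exit status so the caller can report it.
    """
    env = build_mpi_env(num_nodes, head_host, head_port)
    return subprocess.call(argv, env=env)
```

Something like launch(["./a.out"], 4, "node0", 5000) would then start a.out with the three variables visible to it. How the head node's host and port reach each container in the first place is the open question raised later in this thread.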
[jira] [Work started] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-2911 started by Milind Bhandarkar. Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 0.23.0 Original Estimate: 336h Remaining Estimate: 336h MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI application on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094771#comment-13094771 ] Milind Bhandarkar commented on MAPREDUCE-2911: -- @Luke 1. Communication among processes in Hadoop, i.e. map output that gets consumed by reduce input, is not encrypted. I think un-encrypted communication among MPI processes should be acceptable. 2. mpiexec is used by MPI-2, and OpenMPI supports that. Can you elaborate on your third point ? Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 0.23.0 Original Estimate: 336h Remaining Estimate: 336h MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI application on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Attachment: MR2767new.patch New patch that fixes the build.xml merge issues. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767.patch, MR2767new.patch There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094786#comment-13094786 ] Milind Bhandarkar commented on MAPREDUCE-2911: -- @Arun, I do not see how a ContainerLaunchContext can get the hostname and port of the 0'th container (which is the head node). (I remember Jerry had worked around this problem by making the JobClient as a 0th process. But having a gateway execute heavy-duty code is not good.) Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 0.23.0 Original Estimate: 336h Remaining Estimate: 336h MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI application on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094790#comment-13094790 ] Milind Bhandarkar commented on MAPREDUCE-2911: -- @Luke, how do you prevent map tasks opening sockets, receiving connections, and communicating with each other in Hadoop ? Isn't that the same case here ? Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 0.23.0 Original Estimate: 336h Remaining Estimate: 336h MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI application on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Attachment: (was: MR2767.patch) Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767new.patch There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Attachment: TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt testlog.txt Attaching the log of ant test. There were two failures: testMiniMRChildTask, and testDFSIO. testDFSIO is definitely unrelated. I have attached testMiniMRChildTask log, which looks unrelated as well. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767new.patch, TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094908#comment-13094908 ] Milind Bhandarkar commented on MAPREDUCE-2767: -- Re: the comment by Hadoop QA: this patch is not for trunk. It's only for 0.22 branch. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767new.patch, TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 0.23.0 MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI application on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093491#comment-13093491 ] Milind Bhandarkar commented on MAPREDUCE-2911: -- Where should I place it in the source hierarchy? Also, I am currently working off trunk. In case I get busy with other stuff, I do not want it to be a blocker for 0.23.0. What's the timeline for the 0.23.0 release? I know that I won't be able to make it work on Windows in the first version. I hope that does not become a blocker, too. Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 0.23.0 Original Estimate: 336h Remaining Estimate: 336h MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI application on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093502#comment-13093502 ] Milind Bhandarkar commented on MAPREDUCE-2911: --

The design is deliberately kept simple. One script,

    start-mpi -np numnodes -out hdfs://user/milind/nodes.lst

starts the application master, which requests numnodes containers from the resource manager and waits until all of those containers become available. The job client polls for the application master to write a file called nodes.lst in the specified location on HDFS. As containers become available, the application master spawns the OpenMPI runtime environment daemon (orted) in each of those containers. When the job client notices that nodes.lst is available on HDFS, it downloads it to a local directory and exits.

MPI jobs are launched with the regular:

    mpirun -np numnodes -nodes nodes.lst executable

Multiple MPI jobs can be launched in the same virtual MPI cluster created by the start-mpi script. After all MPI jobs are done, the cluster is dismantled with:

    stop-mpi nodes.lst

(The first line of nodes.lst contains the application master location and port.)

Currently, there is no authentication for MPI job submission on the cluster started by the user. Thus, anyone can submit MPI jobs to any virtual MPI cluster. (I promise to add authentication in the next version.) Also, if any of the containers (running orte) exits abnormally, the entire virtual MPI cluster is terminated. (This limitation will be removed in the next version.)

There is one issue I am currently facing. I need at most one MPI container per physical node (until I figure out how to avoid port conflicts etc). Any input regarding how to achieve that is welcome. My code walkthrough of the resource manager did not suggest anything obvious.
Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 0.23.0 Original Estimate: 336h Remaining Estimate: 336h MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI application on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
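The job-client side of the design described in the comment above (poll until nodes.lst appears, then read the application master location from its first line) can be sketched as follows. This is an illustration under stated assumptions, not the Hamster code. The thread only says that the first line of nodes.lst holds the application master location and port, so the "host:port" first-line format and one-hostname-per-line body assumed here are guesses, and the names wait_until and parse_nodes_lst are hypothetical. The existence check is passed in as a callable so that the sketch does not depend on any particular HDFS client; in practice it might wrap a command such as hadoop fs -test -e.

```python
import time


def wait_until(predicate, timeout_s=60.0, interval_s=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def parse_nodes_lst(text):
    """Split nodes.lst into ((am_host, am_port), [node, ...]).

    Assumes the first non-empty line is "host:port" and the remaining
    non-empty lines are node names; both are assumptions for illustration.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("nodes.lst is empty")
    host, sep, port = lines[0].rpartition(":")
    if not sep or not host:
        raise ValueError("first line must look like host:port")
    return (host, int(port)), lines[1:]
```

A client would call wait_until with a predicate that checks for the file, download it, and hand its contents to parse_nodes_lst; stop-mpi could use the same parser to find the application master to contact.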
[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093505#comment-13093505 ] Milind Bhandarkar commented on MAPREDUCE-2911: -- @Arun I will try my best to get the first version into 0.23.0 (but as noted above there will be a huge security hole.) Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 0.23.0 Original Estimate: 336h Remaining Estimate: 336h MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI application on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093506#comment-13093506 ] Milind Bhandarkar commented on MAPREDUCE-2911: -- @Arun, hadoop-openmpi-client makes most sense (however, it also contains an app master.) Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 0.23.0 Original Estimate: 336h Remaining Estimate: 336h MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI application on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093512#comment-13093512 ] Milind Bhandarkar commented on MAPREDUCE-2911: -- Just realized that if I make nodes.lst permissions 600, no other user will be able to accidentally submit jobs to the virtual MPI cluster (but malicious users can check the RM UI to see MPI AMs, and recreate nodes.lst.) Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 0.23.0 Original Estimate: 336h Remaining Estimate: 336h MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI application on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
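The permission change mentioned in the comment above (making nodes.lst mode 600) could be sketched in Java roughly as follows. This is only an illustration, not code from the Hamster patch: the class name NodesListPermissions and its methods are hypothetical, a temporary file stands in for nodes.lst, and a POSIX file system is assumed. As the comment itself points out, this guards against accidental submissions only, since the file's contents can be reconstructed from the RM UI.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical helper for illustration; not part of Hadoop or the Hamster patch.
class NodesListPermissions {

    // Restrict the file to owner read/write only (mode 600) and return the
    // resulting mode as an "rwx"-style string. Assumes a POSIX file system.
    static String restrictToOwner(Path file) {
        try {
            Files.setPosixFilePermissions(file, PosixFilePermissions.fromString("rw-------"));
            return PosixFilePermissions.toString(Files.getPosixFilePermissions(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo on a throwaway temp file standing in for nodes.lst.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("nodes", ".lst");
            try {
                return restrictToOwner(tmp);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling NodesListPermissions.restrictToOwner on the real file's path right after it is written would keep other users from reading it by accident.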
[jira] [Commented] (MAPREDUCE-2863) Support web-services for RM NM
[ https://issues.apache.org/jira/browse/MAPREDUCE-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090019#comment-13090019 ] Milind Bhandarkar commented on MAPREDUCE-2863: -- Hey guys, in the tried and tested traditions of Apache (i.e. the Apache way), why can't we wait until a patch is actually posted to discuss its merits? I mean, that way has worked so far in Apache, right? Wait a few days for a patch, and then let's discuss. (By the way, based on the premature negativity on this jira, I would really like already-completed work, i.e. Hoop, to get accepted by the community before uploading my patch, since this jira and Hoop have lots of common dependencies and even code. So Alejandro, you can really do the community a great service by uploading your Hoop patches first.) Support web-services for RM NM Key: MAPREDUCE-2863 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2863 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2, nodemanager, resourcemanager Reporter: Arun C Murthy Assignee: Milind Bhandarkar It will be very useful for RM and NM to support web-services to export json/xml. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (MAPREDUCE-2863) Support web-services for RM NM
[ https://issues.apache.org/jira/browse/MAPREDUCE-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar reassigned MAPREDUCE-2863: Assignee: Milind Bhandarkar Support web-services for RM NM Key: MAPREDUCE-2863 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2863 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2, nodemanager, resourcemanager Reporter: Arun C Murthy Assignee: Milind Bhandarkar It will be very useful for RM and NM to support web-services to export json/xml. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2863) Support web-services for RM NM
[ https://issues.apache.org/jira/browse/MAPREDUCE-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087977#comment-13087977 ] Milind Bhandarkar commented on MAPREDUCE-2863: -- I plan to use Jersey (http://jersey.java.net/) for this, since that's what I am familiar with and have used in the past. Jersey is available under two licenses: CDDL 1.1 and GPL 2 with CPE. Is either of these acceptable for use in Apache Hadoop? I see that Jersey is being used in two other Apache projects: Apache Camel and Apache ActiveMQ. If the Jersey license is not an issue, then I can start building this using Jersey. Support web-services for RM NM Key: MAPREDUCE-2863 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2863 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2, nodemanager, resourcemanager Reporter: Arun C Murthy Assignee: Milind Bhandarkar It will be very useful for RM and NM to support web-services to export json/xml. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2863) Support web-services for RM NM
[ https://issues.apache.org/jira/browse/MAPREDUCE-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087996#comment-13087996 ] Milind Bhandarkar commented on MAPREDUCE-2863: -- Alejandro, that's exactly what I plan to do. Any idea when Hoop will be committed ? Support web-services for RM NM Key: MAPREDUCE-2863 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2863 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2, nodemanager, resourcemanager Reporter: Arun C Murthy Assignee: Milind Bhandarkar It will be very useful for RM and NM to support web-services to export json/xml. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080496#comment-13080496 ] Milind Bhandarkar commented on MAPREDUCE-2767: -- That will cause test-failures etc. I think removing it entirely is the cleanest way of turning it off. --- Milind Bhandarkar (typing on glass, please ignore spelling mistakes) Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767.patch There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Status: Patch Available (was: Open) Submitting for tests to run. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767.patch There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Status: Open (was: Patch Available) Cancelling patch, and re-submitting to see if the (now available) jenkins picks it up. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767.patch There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Status: Open (was: Patch Available) Following @atm's directions. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767.patch There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Attachment: MR2767.patch Attaching the same patch as before for jenkins to pick up. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767.patch There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Attachment: (was: MR2767.patch) Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767.patch There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Status: Patch Available (was: Open) Dear jenkins, please pick up the patch. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767.patch There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Attachment: MR2767.patch Removed LinuxTaskController, associated tests, and C++ files. Modified build.xml to not build task-controller. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767.patch There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated MAPREDUCE-2767: - Status: Patch Available (was: Open) Patch submitted for hudson testing. Tested locally. Remove Linux task-controller from 0.22 branch - Key: MAPREDUCE-2767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Priority: Blocker Fix For: 0.22.0 Attachments: MR2767.patch There's a potential security hole in the task-controller as it stands. Based on the discussion on general@, removing task-controller from the 0.22 branch will pave way for 0.22.0 release. (This was done for the 0.21.0 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the task-controller when it is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2729) Reducers are always counted having pending tasks even if they can't be scheduled yet because not enough of their mappers have completed
[ https://issues.apache.org/jira/browse/MAPREDUCE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071333#comment-13071333 ] Milind Bhandarkar commented on MAPREDUCE-2729: -- It would be good to have a notion of a ready task, which is separate from a pending task. Reducers are always counted having pending tasks even if they can't be scheduled yet because not enough of their mappers have completed - Key: MAPREDUCE-2729 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2729 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.20.205.0 Environment: 0.20.1xx-Secondary Reporter: Sherry Chen Assignee: Sherry Chen Fix For: 0.20.205.0 In the capacity scheduler, the number of users in a queue needing slots is calculated based on whether the users' jobs have any pending tasks. This works fine for map tasks. However, for reduce tasks, jobs do not need reduce slots until the minimum number of map tasks has completed. Here, we add a check for whether a reduce is ready to schedule (i.e. whether the job has completed enough map tasks) when we increment the number of users in a queue needing reduce slots. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
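The distinction between a pending and a ready reduce task, as raised in the comment above, might look like the following. This is a minimal sketch under stated assumptions, not the capacity scheduler's code: JobProgress, ReduceDemand, and the minCompletedMapFraction threshold are hypothetical names introduced for illustration.

```java
import java.util.List;

// Hypothetical model of a job's progress; field names are illustrative only.
class JobProgress {
    final String user;
    final int totalMaps;
    final int completedMaps;
    final int pendingReduces;

    JobProgress(String user, int totalMaps, int completedMaps, int pendingReduces) {
        this.user = user;
        this.totalMaps = totalMaps;
        this.completedMaps = completedMaps;
        this.pendingReduces = pendingReduces;
    }

    // "Pending": reduces exist that have not been scheduled yet.
    boolean hasPendingReduces() {
        return pendingReduces > 0;
    }

    // "Ready": pending, and enough maps have completed for reduces to start.
    // minCompletedMapFraction is an assumed threshold in [0, 1].
    boolean hasReadyReduces(double minCompletedMapFraction) {
        if (!hasPendingReduces()) {
            return false;
        }
        int needed = (int) Math.ceil(totalMaps * minCompletedMapFraction);
        return completedMaps >= needed;
    }
}

class ReduceDemand {
    // Count distinct users with at least one job whose reduces are ready,
    // rather than merely pending.
    static long usersNeedingReduceSlots(List<JobProgress> jobs, double minCompletedMapFraction) {
        return jobs.stream()
                .filter(j -> j.hasReadyReduces(minCompletedMapFraction))
                .map(j -> j.user)
                .distinct()
                .count();
    }
}
```

With this split, a user whose only job has pending reduces but too few completed maps is not counted as needing reduce slots.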
[jira] Created: (MAPREDUCE-1917) Semantics of map.input.bytes is not consistent
Semantics of map.input.bytes is not consistent -- Key: MAPREDUCE-1917 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1917 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task Environment: All Reporter: Milind Bhandarkar Assignee: Arun C Murthy The map.input.bytes counter is updated by the RecordReader. For sequence files, it is the size of the raw data, which may be compressed. For text files, it is the size of the uncompressed data. For PigStorage, it is always 0. This request is to give this counter consistent semantics. Since HDFS_BYTES_READ already shows the raw split size read by the mapper, MAP_INPUT_BYTES should be the size of the uncompressed data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1922) Counters for data-local and rack-local tasks should be replaced by bytes-read-local and bytes-read-rack
Counters for data-local and rack-local tasks should be replaced by bytes-read-local and bytes-read-rack --- Key: MAPREDUCE-1922 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1922 Project: Hadoop Map/Reduce Issue Type: Improvement Environment: All Reporter: Milind Bhandarkar Assignee: Arun C Murthy As more and more applications use the combine file input format (to reduce the number of mappers), formats with column groups implemented as different HDFS files (zebra, hbase), and composite input formats (map-side joins), data-locality and rack-locality lose their meaning. (A map task reading only one column group, say 20% of its input, locally and 80% remotely still gets flagged as a data-local map.) So, my suggestion is to drop these counters and replace them with HDFS_LOCAL_BYTES_READ, HDFS_RACK_BYTES_READ, and HDFS_TOTAL_BYTES_READ. These counters will make it easier to reason about read performance for maps. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
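A sketch of how the proposed byte-based counters could be accumulated is given here. It is illustrative only: LocalityByteCounters and its Locality enum are hypothetical names, and it assumes the rack counter covers bytes read from another node on the same rack, excluding node-local bytes, which the issue text does not specify.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical accumulator for illustration; not Hadoop's counter API.
class LocalityByteCounters {

    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_RACK }

    private final Map<Locality, Long> bytes = new EnumMap<>(Locality.class);

    // Record n bytes read from a replica at the given distance.
    void record(Locality where, long n) {
        bytes.merge(where, n, Long::sum);
    }

    // Corresponds to the proposed HDFS_LOCAL_BYTES_READ.
    long hdfsLocalBytesRead() {
        return bytes.getOrDefault(Locality.NODE_LOCAL, 0L);
    }

    // Corresponds to the proposed HDFS_RACK_BYTES_READ (assumed to exclude node-local bytes).
    long hdfsRackBytesRead() {
        return bytes.getOrDefault(Locality.RACK_LOCAL, 0L);
    }

    // Corresponds to the proposed HDFS_TOTAL_BYTES_READ.
    long hdfsTotalBytesRead() {
        long total = 0L;
        for (long n : bytes.values()) {
            total += n;
        }
        return total;
    }

    // Fraction of all bytes that were read node-locally; 0 when nothing was read.
    double localFraction() {
        long total = hdfsTotalBytesRead();
        return total == 0 ? 0.0 : (double) hdfsLocalBytesRead() / total;
    }
}
```

For the map task in the issue's example (20% of input read locally, 80% remotely), localFraction() reports 0.2, whereas a per-task data-local flag would report the task as fully local.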
[jira] Created: (MAPREDUCE-1805) Document configurations parameters that are read by MapReduce framework and cannot be changed per job
Document configurations parameters that are read by MapReduce framework and cannot be changed per job - Key: MAPREDUCE-1805 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1805 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.20.2 Environment: All Reporter: Milind Bhandarkar From the documentation in mapred-default.xml, it is not apparent whether configuration parameters (such as mapred.tasktracker.map.tasks.maximum) can be specified per-job, or whether they are read by the framework at start-up and can never be changed. It would be helpful to annotate the default configuration file with this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented
[ https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831537#action_12831537 ] Milind Bhandarkar commented on MAPREDUCE-326: - Back to a low-level binary API: the proposal here isn't to deprecate any higher level APIs, but rather to add a new lower-level API that we can implement both the current APIs and new APIs atop. This should in fact help us to preserve high-level API compatibility longer, since the mapreduce kernel will be independent of the high-level API. +1 !! I have always thought of hadoop MR APIs as assembly language, and gradually no one will use it directly. The low-level APIs will be great for Pig, Hive, HBase and other high-level languages to translate to, without making compromises for efficiency. The lowest level map-reduce APIs should be byte oriented Key: MAPREDUCE-326 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: eric baldeschwieler As discussed here: https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237 The templates, serializers and other complexities that allow map-reduce to use arbitrary types complicate the design and lead to lots of object creates and other overhead that a byte oriented design would not suffer. I believe the lowest level implementation of hadoop map-reduce should have byte string oriented APIs (for keys and values). This API would be more performant, simpler and more easily cross language. The existing API could be maintained as a thin layer on top of the leaner API. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
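The layering discussed above, a byte-oriented lowest level with the typed API kept as a thin wrapper on top, can be sketched as follows. None of these types exist in Hadoop: RawCollector, RawMapper, TypedMapper, and TypedMapperAdapter are hypothetical names, and the encoder and decoder functions stand in for whatever serialization a higher-level API would plug in.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Lowest layer (hypothetical): keys and values are opaque byte strings.
interface RawCollector {
    void collect(byte[] key, byte[] value);
}

interface RawMapper {
    void map(byte[] key, byte[] value, RawCollector out);
}

// Higher layer (hypothetical): a typed mapper, implemented atop the raw one.
interface TypedMapper<K, V> {
    void map(K key, V value, BiConsumer<K, V> out);
}

// Thin adapter: decode on the way in, encode on the way out.
class TypedMapperAdapter<K, V> implements RawMapper {
    private final TypedMapper<K, V> inner;
    private final Function<byte[], K> keyDecoder;
    private final Function<byte[], V> valueDecoder;
    private final Function<K, byte[]> keyEncoder;
    private final Function<V, byte[]> valueEncoder;

    TypedMapperAdapter(TypedMapper<K, V> inner,
                       Function<byte[], K> keyDecoder, Function<byte[], V> valueDecoder,
                       Function<K, byte[]> keyEncoder, Function<V, byte[]> valueEncoder) {
        this.inner = inner;
        this.keyDecoder = keyDecoder;
        this.valueDecoder = valueDecoder;
        this.keyEncoder = keyEncoder;
        this.valueEncoder = valueEncoder;
    }

    @Override
    public void map(byte[] key, byte[] value, RawCollector out) {
        inner.map(keyDecoder.apply(key), valueDecoder.apply(value),
                (k, v) -> out.collect(keyEncoder.apply(k), valueEncoder.apply(v)));
    }

    // Convenience factory for UTF-8 string keys and values.
    static RawMapper forStrings(TypedMapper<String, String> inner) {
        Function<byte[], String> decode = b -> new String(b, StandardCharsets.UTF_8);
        Function<String, byte[]> encode = s -> s.getBytes(StandardCharsets.UTF_8);
        return new TypedMapperAdapter<>(inner, decode, decode, encode, encode);
    }
}
```

In such a design the framework kernel would depend only on RawMapper, so tools like Pig or Hive could target the byte-level interface directly while existing typed code goes through an adapter.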
[jira] Commented: (MAPREDUCE-1185) URL to JT webconsole for running job and job history should be the same
[ https://issues.apache.org/jira/browse/MAPREDUCE-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782062#action_12782062 ] Milind Bhandarkar commented on MAPREDUCE-1185: -- I think the approach of including the job history file name in the URL from the beginning will cause more headaches, since the job history file name includes some things that are unparseable by humans. It may be easier and more human-friendly to translate the job id internally to the history file name, and return the content of the job history. This will require a map between job ids and file names to be kept inside the jobtracker, but that should not be too big, since the entries can be removed when job history is purged periodically. Makes sense? In any case, Hadoop 0.21 will have a different human-friendly file naming scheme, when this can go away. URL to JT webconsole for running job and job history should be the same --- Key: MAPREDUCE-1185 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1185 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Reporter: Sharad Agarwal Assignee: Sharad Agarwal Attachments: 1185_v1.patch, 1185_v2.patch, 1185_v3.patch, 1185_v4.patch The tracking URL for running jobs differs from the one for retired jobs. This creates a problem for clients that cache the running-job URL, because it becomes invalid as soon as the job is retired. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
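The job-id-to-history-file map suggested in the comment above could be sketched along these lines. This is a hedged illustration, not the jobtracker's code or any of the attached patches: JobHistoryIndex and its method names are hypothetical, and the hooks for job retirement and history purging are assumed to be called by the surrounding system.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory index for illustration; not part of the jobtracker.
class JobHistoryIndex {

    // job id -> history file name (the file name is opaque to callers)
    private final Map<String, String> fileByJobId = new ConcurrentHashMap<>();

    // Assumed hook: called when a job is retired and its history file is written.
    void jobRetired(String jobId, String historyFileName) {
        fileByJobId.put(jobId, historyFileName);
    }

    // Translate a human-friendly job id to the history file, if still retained.
    Optional<String> historyFileFor(String jobId) {
        return Optional.ofNullable(fileByJobId.get(jobId));
    }

    // Assumed hook: called when the periodic purge deletes a job's history.
    void historyPurged(String jobId) {
        fileByJobId.remove(jobId);
    }

    int size() {
        return fileByJobId.size();
    }
}
```

Because entries are dropped on purge, the map's size is bounded by the number of retained history files, and a single URL keyed by job id can keep working after the job is retired.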
[jira] Commented: (MAPREDUCE-1016) Make the format of the Job History be JSON instead of Avro binary
[ https://issues.apache.org/jira/browse/MAPREDUCE-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12758048#action_12758048 ] Milind Bhandarkar commented on MAPREDUCE-1016: -- Oh Thank you Owen ! Thank you, thank you, thank you ! Make the format of the Job History be JSON instead of Avro binary - Key: MAPREDUCE-1016 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1016 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Owen O'Malley Assignee: Doug Cutting Fix For: 0.21.0, 0.22.0 I forgot that one of the features that would be nice is to off load the job history display from the JobTracker. That will be a lot easier, if the job history is stored in JSON. Therefore, I think we should change the storage now to prevent incompatibilities later. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-989) Allow segregation of DistributedCache for maps and reduces
[ https://issues.apache.org/jira/browse/MAPREDUCE-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12758062#action_12758062 ] Milind Bhandarkar commented on MAPREDUCE-989: - If as eric suggests, the tasks themselves request the cached files needed (presumably in the configure method of the user-supplied mapper / reducer), then we lose an opportunity of overlapping populating cache for reducers with fetching map outputs. My request for different configuration variables for map and reduce tasks for cache is consistent with the basic observation that map and reduce runtime requirements are different. This observation has resulted in several additions to configuration variables lately, such as specifying different child.java.opts, specifying different ulimits, specifying different task runners etc for these two types of tasks. So, it is imperative that users provide different cache files and archives for different tasks too. This cannot be in the user-provided code, because otherwise, hadoop streaming, and pipes, and pig will have to be modified to implement that functionality in the wrappers they provide. Having one implementation provided by the framework seems to me the best way to go. Allow segregation of DistributedCache for maps and reduces -- Key: MAPREDUCE-989 URL: https://issues.apache.org/jira/browse/MAPREDUCE-989 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: Arun C Murthy Applications might have differing needs for files in the DistributedCache wrt maps and reduces. We should allow them to specify them separately. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
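The framework-side approach argued for above, separate configuration variables for map and reduce cache files, might be sketched like this. It is an assumption-laden example: the property names under "example.cache.files" are invented for illustration and are not real Hadoop configuration keys, and java.util.Properties stands in for the job configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical resolver for illustration; the keys below are NOT real Hadoop keys.
class TaskCacheFiles {

    enum TaskType { MAP, REDUCE }

    static final String SHARED_KEY = "example.cache.files";
    static final String MAP_KEY = "example.cache.files.map";
    static final String REDUCE_KEY = "example.cache.files.reduce";

    // Files to localize for a task: the shared list plus the per-type list.
    static List<String> filesFor(Properties conf, TaskType type) {
        List<String> result = new ArrayList<>(split(conf.getProperty(SHARED_KEY)));
        String key = (type == TaskType.MAP) ? MAP_KEY : REDUCE_KEY;
        result.addAll(split(conf.getProperty(key)));
        return result;
    }

    // Split a comma-separated value, ignoring blanks; a missing key yields an empty list.
    private static List<String> split(String csv) {
        List<String> out = new ArrayList<>();
        if (csv == null) {
            return out;
        }
        for (String part : csv.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }
}
```

Since the lists are resolved from configuration before the task starts, wrappers such as streaming, pipes, or Pig would not need their own logic, and reduce-side files could in principle be fetched while map outputs are still being copied.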
[jira] Commented: (MAPREDUCE-257) Preventing node from swapping
[ https://issues.apache.org/jira/browse/MAPREDUCE-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730539#action_12730539 ] Milind Bhandarkar commented on MAPREDUCE-257: - I have seen this in one of our production clusters. The java task itself is killed due to memory limits, but there is a runaway task consuming lots of memory. So, I think killing the entire process tree did not work. Preventing node from swapping - Key: MAPREDUCE-257 URL: https://issues.apache.org/jira/browse/MAPREDUCE-257 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Hong Tang When a node swaps, it slows everything: maps running on that node, reducers fetching output from the node, and DFS clients reading from the DN. We should just treat it the same way as if OS exhausts memory and kill some tasks to free up memory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.