[jira] [Commented] (MAPREDUCE-5649) Reduce cannot use more than 2G memory for the final merge

2013-12-06 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841409#comment-13841409
 ] 

Milind Bhandarkar commented on MAPREDUCE-5649:
--

Folks, is there any reason behind limiting this amount of memory to 2GB?

 Reduce cannot use more than 2G memory  for the final merge
 --

 Key: MAPREDUCE-5649
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5649
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: trunk
Reporter: stanley shi

 In org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.java, in the
 finalMerge method:
   int maxInMemReduce = (int)Math.min(
       Runtime.getRuntime().maxMemory() * maxRedPer, Integer.MAX_VALUE);

 This means that no matter how much memory the user has, the reducer will not
 retain more than 2G of data in memory before the reduce phase starts.
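
The effect of the quoted expression can be reproduced outside Hadoop. The following is a minimal sketch, not the MergeManagerImpl code itself: the class and method names are hypothetical, and maxRedPer is assumed to be a float fraction of the heap. It shows that the (int) cast combined with Integer.MAX_VALUE pins the budget at 2^31 - 1 bytes once heap * maxRedPer exceeds that value, whereas keeping the value in a long would not.

```java
// Sketch only: reproduces the arithmetic of the quoted line outside Hadoop.
// FinalMergeCap, cappedBudget and uncappedBudget are hypothetical names.
class FinalMergeCap {

    // Same shape as the quoted expression: multiply by a float fraction,
    // clamp against Integer.MAX_VALUE, then narrow to int.
    static int cappedBudget(long maxMemoryBytes, float maxRedPer) {
        return (int) Math.min(maxMemoryBytes * maxRedPer, Integer.MAX_VALUE);
    }

    // Hypothetical alternative: keep the budget as a long so it is not
    // limited by the range of int.
    static long uncappedBudget(long maxMemoryBytes, float maxRedPer) {
        return (long) (maxMemoryBytes * maxRedPer);
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024; // assumed 8 GiB heap
        float maxRedPer = 0.9f;              // assumed buffer fraction

        // The int-based budget saturates at Integer.MAX_VALUE.
        System.out.println(cappedBudget(heap, maxRedPer)); // 2147483647

        // The long-based budget scales with the heap.
        System.out.println(uncappedBudget(heap, maxRedPer));
    }
}
```

Whether lifting the cap is safe depends on how the rest of the merge code uses the value; the sketch only demonstrates where the 2G ceiling comes from.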



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2013-03-06 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595526#comment-13595526
 ] 

Milind Bhandarkar commented on MAPREDUCE-2911:
--

The community just created a huge issue for me in making this available to the 
community, by naming us anti-community. So, while I am trying to make this 
available to the community, I now have a few more obstacles to overcome. 
Please bear with me, or better still, try to get the community to stop their 
bile-spewing against us, so that we can navigate through this mess.





 Hamster: Hadoop And Mpi on the same cluSTER
 ---

 Key: MAPREDUCE-2911
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv2
Affects Versions: 0.23.0
 Environment: All Unix-Environments
Reporter: Milind Bhandarkar
Assignee: Ralph H Castain
   Original Estimate: 336h
  Remaining Estimate: 336h

 MPI is commonly used for many machine-learning applications. OpenMPI 
 (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the 
 past, running MPI applications on a Hadoop cluster was achieved using Hadoop 
 Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was 
 kludgy. After the resource-manager separation from JobTracker in Hadoop, we 
 have all the tools needed to make MPI a first-class citizen on a Hadoop 
 cluster. I am currently working on the patch to make MPI an 
 application-master. Initial version of this patch will be available soon 
 (hopefully before September 10.) This jira will track the development of 
 Hamster: The application master for MPI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-12-10 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528376#comment-13528376
 ] 

Milind Bhandarkar commented on MAPREDUCE-4049:
--

Thanks for verifying, Arun. FWIW, we have been running with many earlier 
versions of this patch on our Greenplum Analytics Workbench 1000 node cluster 
since May 2012 (I think I had mentioned this to you and Chris Douglas during 
Hadoop Summit in June), and haven't found any issues with this patch so far. 
(See my comment above.)

 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
Assignee: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Fix For: 3.0.0

 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch


 Support a generic shuffle service as a set of two plugins: ShuffleProvider &
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on a
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE,
 or Infiniband) instead of using the current HTTP shuffle. Based on the fast
 RDMA shuffle, the plugin can also utilize a suitable merge approach during
 the intermediate merges, and hence get much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
 dependency of NodeManager with a specific version of mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 from Auburn University with others, 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with suggested Top Level Design for both plugins 
 (currently, based on 1.0 branch)
 # I am providing a link for downloading UDA - Mellanox's open source plugin
 that implements a generic shuffle service using RDMA and levitated merge.
 Note: At this phase, the code is in C++ through JNI and you should consider
 it as beta only.  Still, it can serve anyone who wants to implement or
 contribute to levitated merge. (Please be advised that levitated merge is
 mostly suited to very fast networks) -
 [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]
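
To make the two-plugin idea concrete, here is a rough sketch of how a consumer-side plugin could be selected by class name, with a built-in default. Everything in it is hypothetical: the interface, the class names, and the configuration key are invented for illustration and are not the API in the attached patches. It only illustrates the general pattern of a pluggable shuffle implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plugin contract; the real ShuffleConsumer interface in the
// patch may look quite different.
interface ShuffleConsumerSketch {
    String transport();
}

// Hypothetical default implementation standing in for the HTTP shuffle.
class HttpShuffleConsumerSketch implements ShuffleConsumerSketch {
    public String transport() {
        return "http";
    }
}

class ShufflePluginLoaderSketch {
    // Invented configuration key, used only by this sketch.
    static final String CONSUMER_KEY = "sketch.shuffle.consumer.class";

    // Instantiates the configured plugin class, or the default when unset.
    static ShuffleConsumerSketch load(Map<String, String> conf) {
        String className = conf.getOrDefault(
                CONSUMER_KEY, HttpShuffleConsumerSketch.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(ShuffleConsumerSketch.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Cannot load shuffle plugin " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // With nothing configured, the default consumer is used.
        System.out.println(load(conf).transport());
    }
}
```

An RDMA-based consumer would then be just another class implementing the same interface, chosen through configuration rather than by changing the framework code.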



[jira] [Created] (MAPREDUCE-4459) Allow ad-placement on the JobTracker /RM UI

2012-07-18 Thread Milind Bhandarkar (JIRA)
Milind Bhandarkar created MAPREDUCE-4459:


 Summary: Allow ad-placement on the JobTracker /RM UI
 Key: MAPREDUCE-4459
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4459
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am
Affects Versions: 2.0.0-alpha
 Environment: all, especially private clusters
Reporter: Milind Bhandarkar
Priority: Minor


A lot of Hadoop map-reduce users spend a lot of time staring at the jobtracker 
webUI to check if their job has been scheduled, and checking the progress. An 
easy way to monetize these eyeballs is to allow ad-placement on this page. This 
will attract public-cloud IaaS companies such as AWS, Google Compute Engine, 
Microsoft Azure, etc. to place ads on that page, such as "Waiting for your job 
to be scheduled on your company's Hadoop cluster? You can create your own 
cluster and run your jobs fast, without waiting."

This will allow major Hadoop installations to offload some of their load to 
public IaaS clouds, and in addition, create an ad-revenue source for themselves.

And not only that, based on the demographic (mostly male, mostly starved of all 
the real-world fun) of users of these Hadoop clusters, there could be very 
targeted ads to be placed on this page.

(Please consider this as an extension to HADOOP-8607).





[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-05-25 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283826#comment-13283826
 ] 

Milind Bhandarkar commented on MAPREDUCE-4049:
--

Luke,

The original Auburn Univ work was done (with Mellanox support) in version 
0.20.x. However, Avner ported it to 1.0.x, which is the patch attached here. We 
have been testing it at Greenplum (with both the default shuffle plugin and 
Mellanox's Unstructured Data Accelerator (UDA) plugin), and haven't found any 
issues so far.

 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance, task, tasktracker
Affects Versions: 1.1.0, 1.0.3, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Attachments: HADOOP-1.0.2.patch, HADOOP-1.0.x.patch, 
 HADOOP-1.0.x.patch, Hadoop Shuffle Consumer Plugin TLD.rtf, Hadoop Shuffle 
 Provider Plugin TLD.rtf, MAPREDUCE-4049-branch-1.0.2.patch, mapred-site.xml, 
 mapred.diff, src.tgz, test.diff


 Support a generic shuffle service as a set of two plugins: ShuffleProvider &
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on a
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE,
 or Infiniband) instead of using the current HTTP shuffle. Based on the fast
 RDMA shuffle, the plugin can also utilize a suitable merge approach during
 the intermediate merges, and hence get much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
 dependency of NodeManager with a specific version of mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 from Auburn University with others, 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with suggested Top Level Design for both plugins 
 (currently, based on 1.0 branch)





[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-05-25 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283846#comment-13283846
 ] 

Milind Bhandarkar commented on MAPREDUCE-4049:
--

Luke, I do not know of any tests done with 0.20.203 (Avner, do you have any 
numbers?). The tests that compared vanilla 1.0.x with pluggable shuffle (with 
the default plugin) do not show any measurable difference.

 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance, task, tasktracker
Affects Versions: 1.1.0, 1.0.3, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Attachments: HADOOP-1.0.2.patch, HADOOP-1.0.x.patch, 
 HADOOP-1.0.x.patch, Hadoop Shuffle Consumer Plugin TLD.rtf, Hadoop Shuffle 
 Provider Plugin TLD.rtf, MAPREDUCE-4049-branch-1.0.2.patch, mapred-site.xml, 
 mapred.diff, src.tgz, test.diff


 Support a generic shuffle service as a set of two plugins: ShuffleProvider &
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on a
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE,
 or Infiniband) instead of using the current HTTP shuffle. Based on the fast
 RDMA shuffle, the plugin can also utilize a suitable merge approach during
 the intermediate merges, and hence get much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
 dependency of NodeManager with a specific version of mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 from Auburn University with others, 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with suggested Top Level Design for both plugins 
 (currently, based on 1.0 branch)





[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2012-05-17 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277916#comment-13277916
 ] 

Milind Bhandarkar commented on MAPREDUCE-2911:
--

I am excited to report that, thanks to great efforts by Ralph Castain and 
Wangda Tan, Hamster (i.e. OpenMPI on Yarn) now works flawlessly, and is 
scheduled to be merged to OpenMPI trunk soon. This effort was equivalent to 
building a second floor on a mobile home while it was hurtling down the freeway 
at 65 MPH :-) Thanks to both Ralph & Wangda.

According to Ralph:

Lots of cleanup and documentation to do, and performance sucks per HPC
standards. But at least it works!

To my knowledge, this is the first application framework implemented in C that 
uses the multi-lingual protobuf APIs for Yarn. (For secure environments, a 
small java-based shim is needed.)

Also, it is encouraging that no changes were needed in Yarn to make resource 
allocation work for MPI. (MPI as a standard came along in 1994, 18 years before 
Yarn was designed.)

Currently, using MPI-IO functionality in MPI requires a shared POSIX 
file system mounted on every node. However, this will change in the future. For 
some distributed file systems (*cough*) that offer a POSIX interface, MPI-IO 
works today.

Once it is decided whether BigTop can include Non-ASF packages, we plan to work 
with BigTop community to integrate OpenMPI (new BSD-licensed) in the big data 
stack.

I am closing this issue as fixed.

 Hamster: Hadoop And Mpi on the same cluSTER
 ---

 Key: MAPREDUCE-2911
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv2
Affects Versions: 0.23.0
 Environment: All Unix-Environments
Reporter: Milind Bhandarkar
Assignee: Ralph H Castain
 Fix For: 0.24.0

   Original Estimate: 336h
  Remaining Estimate: 336h

 MPI is commonly used for many machine-learning applications. OpenMPI 
 (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the 
 past, running MPI applications on a Hadoop cluster was achieved using Hadoop 
 Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was 
 kludgy. After the resource-manager separation from JobTracker in Hadoop, we 
 have all the tools needed to make MPI a first-class citizen on a Hadoop 
 cluster. I am currently working on the patch to make MPI an 
 application-master. Initial version of this patch will be available soon 
 (hopefully before September 10.) This jira will track the development of 
 Hamster: The application master for MPI.





[jira] [Resolved] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2012-05-17 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar resolved MAPREDUCE-2911.
--

Resolution: Fixed

 Hamster: Hadoop And Mpi on the same cluSTER
 ---

 Key: MAPREDUCE-2911
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv2
Affects Versions: 0.23.0
 Environment: All Unix-Environments
Reporter: Milind Bhandarkar
Assignee: Ralph H Castain
 Fix For: 0.24.0

   Original Estimate: 336h
  Remaining Estimate: 336h

 MPI is commonly used for many machine-learning applications. OpenMPI 
 (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the 
 past, running MPI applications on a Hadoop cluster was achieved using Hadoop 
 Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was 
 kludgy. After the resource-manager separation from JobTracker in Hadoop, we 
 have all the tools needed to make MPI a first-class citizen on a Hadoop 
 cluster. I am currently working on the patch to make MPI an 
 application-master. Initial version of this patch will be available soon 
 (hopefully before September 10.) This jira will track the development of 
 Hamster: The application master for MPI.





[jira] [Commented] (MAPREDUCE-3065) ApplicationMaster killed by NodeManager due to excessive virtual memory consumption

2011-09-22 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112741#comment-13112741
 ] 

Milind Bhandarkar commented on MAPREDUCE-3065:
--

@Chris, glad to know that it worked! Ironically, I had discovered this issue 
with RHEL6 when working on another project earlier this year at LinkedIn :-)

 ApplicationMaster killed by NodeManager due to excessive virtual memory 
 consumption
 ---

 Key: MAPREDUCE-3065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.24.0
Reporter: Chris Riccomini

  Hey Vinod,
  
  OK, so I have a little more clarity into this.
  
  When I bump my resource request for my AM to 4096, it runs. The important 
  line in the NM logs is:
  
  2011-09-21 13:43:44,366 INFO  monitor.ContainersMonitorImpl 
  (ContainersMonitorImpl.java:run(402)) - Memory usage of ProcessTree 25656 
  for container-id container_1316637655278_0001_01_01 : Virtual 
  2260938752 bytes, limit : 4294967296 bytes; Physical 120860672 bytes, limit 
  -1 bytes
  
  The thing to note is the virtual memory, which is off the charts, even 
  though my physical memory is almost nothing (about 115 MB). I'm still poking 
  around the code, but I am noticing that there are two checks in the NM, one 
  for virtual mem, and one for physical mem. The virtual memory check appears 
  to be toggle-able, but is presumably defaulted to on.
  
  At this point I'm trying to figure out exactly what the VMEM check is for, 
  why YARN thinks my app is taking 2 gigs, and how to fix this.
  
  Cheers,
  Chris
  
  From: Chris Riccomini [criccom...@linkedin.com]
  Sent: Wednesday, September 21, 2011 1:42 PM
  To: mapreduce-...@hadoop.apache.org
  Subject: Re: ApplicationMaster Memory Usage
  
  For the record, I bumped to 4096 for memory resource request, and it works.
  :(
  
  
  On 9/21/11 1:32 PM, Chris Riccomini criccom...@linkedin.com wrote:
  
  Hey Vinod,
  
  So, I ran my application master directly from the CLI. I commented out the
  YARN-specific code. It runs fine without leaking memory.
  
  I then ran it from YARN, with all YARN-specific code commented out. It again
  ran fine.
  
  I then uncommented JUST my registerWithResourceManager call. It then fails
  with OOM after a few seconds. I call registerWithResourceManager, and then go
  into a while(true) { println("yeh") sleep(1000) }. Doing this prints:
  
  yeh
  yeh
  yeh
  yeh
  yeh
  
  At which point, it dies, and, in the NodeManager, I see:
  
  2011-09-21 13:24:51,036 WARN  monitor.ContainersMonitorImpl
  (ContainersMonitorImpl.java:isProcessTreeOverLimit(289)) - Process tree for
  container: container_1316626117280_0005_01_01 has processes older than 
  1
  iteration running over the configured limit. Limit=2147483648, current 
  usage =
  2192773120
  2011-09-21 13:24:51,037 WARN  monitor.ContainersMonitorImpl
  (ContainersMonitorImpl.java:run(453)) - Container
  [pid=23852,containerID=container_1316626117280_0005_01_01] is running
  beyond memory-limits. Current usage : 2192773120bytes. Limit :
  2147483648bytes. Killing container.
  Dump of the process-tree for container_1316626117280_0005_01_01 :
  |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
  SYSTEM_TIME(MILLIS)
  VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
  |- 23852 20570 23852 23852 (bash) 0 0 108638208 303 /bin/bash -c java 
  -Xmx512M
  -cp './package/*' kafka.yarn.ApplicationMaster
  /home/criccomi/git/kafka-yarn/dist/kafka-streamer.tgz 5 1 1316626117280
  com.linkedin.TODO 1
  1/tmp/logs/application_1316626117280_0005/container_1316626117280_0005_01_000
  001/stdout
  2/tmp/logs/application_1316626117280_0005/container_1316626117280_0005_01_000
  001/stderr
  |- 23855 23852 23852 23852 (java) 81 4 2084134912 14772 java -Xmx512M -cp
  ./package/* kafka.yarn.ApplicationMaster
  /home/criccomi/git/kafka-yarn/dist/kafka-streamer.tgz 5 1 1316626117280
  com.linkedin.TODO 1
  2011-09-21 13:24:51,037 INFO  monitor.ContainersMonitorImpl
  (ContainersMonitorImpl.java:run(463)) - Removed ProcessTree with root 23852
  
  Either something is leaking in YARN, or my registerWithResourceManager code
  (see below) is doing something funky.
  
  I'm trying to avoid going through all the pain of attaching a remote debugger.
  Presumably things aren't leaking in YARN, which means it's likely that I'm
  doing something wrong in my registration code.
  
  Incidentally, my NodeManager is running with 1000 megs. My application master
  memory is set to 2048, and my -Xmx setting is 512M.
  
  Cheers,
  Chris
  
  From: Vinod Kumar 
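
The numbers in the logs above can be checked with a little arithmetic. This is a simplified sketch with hypothetical helper names, assuming the limit is simply the requested memory in megabytes converted to bytes, which matches the logged limits (2048 gives 2147483648, 4096 gives 4294967296). It is not the NodeManager's actual check, which, judging by the log message, also considers how long the processes have been running.

```java
// Sketch only: re-checks the figures from the NodeManager log lines above.
// VmemLimitSketch and its methods are hypothetical names.
class VmemLimitSketch {

    // Converts a memory request in megabytes to bytes.
    static long megabytesToBytes(long megabytes) {
        return megabytes * 1024L * 1024L;
    }

    // Simplified comparison: true when virtual memory usage exceeds the limit.
    static boolean isOverLimit(long vmemUsageBytes, long limitBytes) {
        return vmemUsageBytes > limitBytes;
    }

    public static void main(String[] args) {
        // First run: 2048 MB request, 2192773120 bytes of virtual memory in use.
        long limit2048 = megabytesToBytes(2048);
        System.out.println(limit2048);                          // 2147483648
        System.out.println(isOverLimit(2192773120L, limit2048)); // true
        System.out.println(2192773120L - limit2048);             // 45289472

        // Second run: 4096 MB request, 2260938752 bytes of virtual memory in use.
        long limit4096 = megabytesToBytes(4096);
        System.out.println(limit4096);                          // 4294967296
        System.out.println(isOverLimit(2260938752L, limit4096)); // false
    }
}
```

In other words, the container was killed for exceeding its virtual memory limit by roughly 43 MB, even though the JVM heap (-Xmx512M) and the resident memory were far below 2 GB.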

[jira] [Commented] (MAPREDUCE-3060) Generic shuffle service

2011-09-21 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109872#comment-13109872
 ] 

Milind Bhandarkar commented on MAPREDUCE-3060:
--

+1! This makes a lot of optimized third-party plugins possible.

 Generic shuffle service
 ---

 Key: MAPREDUCE-3060
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3060
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Luke Lu
  Labels: shuffle
 Fix For: 0.24.0


 When I was talking to Owen about HADOOP-2600, we came across (again, talked 
 about it with Chris before) the shuffle dependency issue. NodeManager 
 currently has an implicit (hidden by the service plugin mechanism) dependency 
 on a specific version of mapreduce shuffle. While this works in many cases, 
 as long as we don't change shuffle headers and the usage of mapred security 
 tokens, it's a hack to make things work nonetheless. It's generally agreed 
 upon that nodemanager should only load generic services that are mapreduce 
 framework neutral.
 In this particular case, the right solution seems to be a generic shuffle 
 handler that can serve data for a particular partition securely. The 
 ShuffleHandler currently only depends on mapreduce for task tokens and 
 shuffle header, which is only used for writing data, i.e., the shuffle 
 handler has no semantic dependency on mapreduce.





[jira] [Commented] (MAPREDUCE-2268) With JVM reuse, JvmManager doesn't delete last workdir properly

2011-09-19 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108197#comment-13108197
 ] 

Milind Bhandarkar commented on MAPREDUCE-2268:
--

JVM reuse has very limited use in practice, especially on a multi-tenant 
cluster. Also, with security enabled, JVM reuse tends to get used even more 
rarely. We should note this in the 0.22 release notes, and should 
make this jira not-a-blocker. Thoughts?

 With JVM reuse, JvmManager doesn't delete last workdir properly
 ---

 Key: MAPREDUCE-2268
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2268
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.22.0


 In JvmManager, when a Jvm exits, it tries to delete the workdir for 
 {{initalContext.task}} which is null, hence throwing NPE. Currently this NPE 
 is swallowed into the abyss.
 We should catch exceptions out of the JvmRunner thread, add a test case that 
 verifies this functionality, and fix this code to properly grab the last task.
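
The "swallowed" failure mode and the suggested fix can be illustrated in plain Java. The sketch below is not JvmManager code; the class and method names are hypothetical. It shows one way (an uncaught-exception handler) to surface an exception such as an NPE thrown on a runner thread, so that a test can assert on it instead of losing it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch only: RunnerFailureCapture is a hypothetical helper, not part of Hadoop.
class RunnerFailureCapture {

    // Runs the task on its own thread and returns the Throwable that escaped
    // from it, or null if the task completed normally.
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread runner = new Thread(task, "jvm-runner-sketch");
        // Without a handler, the exception would at best be printed to stderr.
        runner.setUncaughtExceptionHandler((thread, error) -> failure.set(error));
        runner.start();
        try {
            runner.join(); // wait for the runner thread to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failure.get();
    }

    public static void main(String[] args) {
        Throwable captured = runAndCapture(() -> {
            Object task = null; // stands in for a null task reference
            task.hashCode();    // throws NullPointerException
        });
        System.out.println(captured instanceof NullPointerException); // true
    }
}
```

A test case along these lines would fail loudly when the cleanup path dereferences a null task, which is the behavior the description asks for.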





[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2011-09-09 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101330#comment-13101330
 ] 

Milind Bhandarkar commented on MAPREDUCE-2911:
--

Sorry folks. I got distracted this week by some mind-numbing non-technical 
stuff. Progress on Hamster was slow as a result. Since I will be travelling 
next week, I am hoping to find some time to work on it :-)

 Hamster: Hadoop And Mpi on the same cluSTER
 ---

 Key: MAPREDUCE-2911
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv2
Affects Versions: 0.23.0
 Environment: All Unix-Environments
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 0.23.0

   Original Estimate: 336h
  Remaining Estimate: 336h

 MPI is commonly used for many machine-learning applications. OpenMPI 
 (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the 
 past, running MPI applications on a Hadoop cluster was achieved using Hadoop 
 Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was 
 kludgy. After the resource-manager separation from JobTracker in Hadoop, we 
 have all the tools needed to make MPI a first-class citizen on a Hadoop 
 cluster. I am currently working on the patch to make MPI an 
 application-master. Initial version of this patch will be available soon 
 (hopefully before September 10.) This jira will track the development of 
 Hamster: The application master for MPI.





[jira] [Created] (MAPREDUCE-2948) Hadoop streaming test failure, post MR-2767

2011-09-07 Thread Milind Bhandarkar (JIRA)
Hadoop streaming test failure, post MR-2767
---

 Key: MAPREDUCE-2948
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2948
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.22.0, 0.23.0, 0.24.0
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 0.22.0, 0.23.0, 0.24.0


After removing LinuxTaskController in MAPREDUCE-2767, one of the tests in 
contrib/streaming, TestStreamingAsDifferentUser.java, is failing since it 
imports org.apache.hadoop.mapred.ClusterWithLinuxTaskController. Patch 
forthcoming.





[jira] [Work started] (MAPREDUCE-2948) Hadoop streaming test failure, post MR-2767

2011-09-07 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAPREDUCE-2948 started by Milind Bhandarkar.

 Hadoop streaming test failure, post MR-2767
 ---

 Key: MAPREDUCE-2948
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2948
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.22.0, 0.23.0, 0.24.0
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 0.22.0, 0.23.0, 0.24.0

   Original Estimate: 1h
  Remaining Estimate: 1h

 After removing LinuxTaskController in MAPREDUCE-2767, one of the tests in 
 contrib/streaming, TestStreamingAsDifferentUser.java, is failing since it 
 imports org.apache.hadoop.mapred.ClusterWithLinuxTaskController. Patch 
 forthcoming.





[jira] [Commented] (MAPREDUCE-2948) Hadoop streaming test failure, post MR-2767

2011-09-07 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099612#comment-13099612
 ] 

Milind Bhandarkar commented on MAPREDUCE-2948:
--

Thanks for doing it while I had stepped out for a meeting, Mahadev :-)

 Hadoop streaming test failure, post MR-2767
 ---

 Key: MAPREDUCE-2948
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2948
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.22.0, 0.23.0, 0.24.0
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Mahadev konar
 Fix For: 0.22.0, 0.23.0, 0.24.0

 Attachments: MAPREDUCE-2948-0.22.patch, MAPREDUCE-2948.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 After removing LinuxTaskController in MAPREDUCE-2767, one of the tests in 
 contrib/streaming, TestStreamingAsDifferentUser.java, is failing since it 
 imports org.apache.hadoop.mapred.ClusterWithLinuxTaskController. Patch 
 forthcoming.





[jira] [Commented] (MAPREDUCE-2929) Move task-controller from bin to libexec

2011-09-06 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098297#comment-13098297
 ] 

Milind Bhandarkar commented on MAPREDUCE-2929:
--

The Linux task-controller is scheduled to be removed from 0.23 (see 
MAPREDUCE-2767).

 Move task-controller from bin to libexec
 

 Key: MAPREDUCE-2929
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2929
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task-controller
Affects Versions: 0.20.204.0, 0.23.0
 Environment: Java, Redhat 5.6
Reporter: Eric Yang

 Linux task-controller is hard coded to $HADOOP_HOME/bin.  Ideally, it should 
 be moved to $HADOOP_PREFIX/libexec for ant binary layout, or the updated 
 file structure layout for trunk.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-09-02 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Attachment: (was: MR2767-trunk.patch)

 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0, 0.23.0, 0.24.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0, 0.23.0, 0.24.0

 Attachments: MR2767-22.patch, MR2767-23.patch, MR2767-trunk.patch, 
 TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-09-02 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Status: Open  (was: Patch Available)

Cancelling, and re-making patch-available.

 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0, 0.23.0, 0.24.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0, 0.23.0, 0.24.0

 Attachments: MR2767-22.patch, MR2767-23.patch, MR2767-trunk.patch, 
 TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-09-02 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Attachment: MR2767-trunk.patch

Removing conflict in trunk due to recent commits. Re-did patch for trunk.

 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0, 0.23.0, 0.24.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0, 0.23.0, 0.24.0

 Attachments: MR2767-22.patch, MR2767-23.patch, MR2767-trunk.patch, 
 TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-09-02 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Status: Patch Available  (was: Open)

Making patch available for trunk.

 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0, 0.23.0, 0.24.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0, 0.23.0, 0.24.0

 Attachments: MR2767-22.patch, MR2767-23.patch, MR2767-trunk.patch, 
 TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Commented] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-09-02 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096109#comment-13096109
 ] 

Milind Bhandarkar commented on MAPREDUCE-2767:
--

Re: FindBugs warnings. These are not new. In fact none of these directories 
(all in mrv2 code) have been touched by the patch.

 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0, 0.23.0, 0.24.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0, 0.23.0, 0.24.0

 Attachments: MR2767-22.patch, MR2767-23.patch, MR2767-trunk.patch, 
 TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Commented] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-09-01 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095556#comment-13095556
 ] 

Milind Bhandarkar commented on MAPREDUCE-2767:
--

@Arun, have the LTC changes made in 0.20.203 not propagated to 0.23 and trunk? I 
thought the race condition fix was already in trunk/0.23, no?

 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0

 Attachments: MR2767new.patch, 
 TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Commented] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-09-01 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095561#comment-13095561
 ] 

Milind Bhandarkar commented on MAPREDUCE-2767:
--

Aha. Okay, will do.

 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0

 Attachments: MR2767new.patch, 
 TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-09-01 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Attachment: (was: MR2767new.patch)

 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0

 Attachments: TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, 
 testlog.txt


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-09-01 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Attachment: MR2767-trunk.patch
MR2767-23.patch
MR2767-22.patch

Attaching patches for 0.22, 0.23, and trunk.

 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0

 Attachments: MR2767-22.patch, MR2767-23.patch, MR2767-trunk.patch, 
 TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-09-01 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Affects Version/s: 0.24.0
   0.23.0
Fix Version/s: 0.24.0
   0.23.0

 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0, 0.23.0, 0.24.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0, 0.23.0, 0.24.0

 Attachments: MR2767-22.patch, MR2767-23.patch, MR2767-trunk.patch, 
 TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Commented] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-31 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094719#comment-13094719
 ] 

Milind Bhandarkar commented on MAPREDUCE-2767:
--

For some time after the merge of three projects (common, hdfs, mapreduce), the 
test-patch ant target was broken, since it tried to apply the patch from inside 
the mapreduce directory (where build.xml exists), whereas the patch is 
generated from the top level. This never got fixed in the 0.22 branch, so 
my ant test-patch is failing in the patch application step.

After manually applying the patch, I ran ant test in mapreduce directory, and I 
get only unrelated failures.
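
To illustrate the path mismatch described above, here is a minimal sketch, not the actual test-patch fix. It assumes a unified diff whose file headers carry a leading "mapreduce/" component (the prefix and the function name are hypothetical) and strips that component so the patch could be applied from inside the subdirectory. It is roughly what raising the strip level of patch by one would do, and it does not guard against hunk lines that happen to start with "--- " or "+++ ".

```python
def strip_leading_dir(patch_text, prefix="mapreduce/"):
    """Drop `prefix` from the paths in unified-diff file header lines.

    Assumption: the patch was generated from the top-level directory, so
    every header path starts with the subproject directory name.
    """
    out = []
    for line in patch_text.splitlines():
        if line.startswith(("--- ", "+++ ")):
            marker, path = line[:4], line[4:]
            if path.startswith(prefix):
                path = path[len(prefix):]
            line = marker + path
        out.append(line)
    return "\n".join(out)
```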


 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0

 Attachments: MR2767.patch


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2011-08-31 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094766#comment-13094766
 ] 

Milind Bhandarkar commented on MAPREDUCE-2911:
--

@Arun, the direct-launch method requires certain environment variables to be 
set so that a.out can read them. At a minimum, the number of nodes and the host 
and port of the head node (i.e. the process with rank 0) need to be available 
to all processes. Thus, we will have a tiny process that sets these environment 
variables and launches a.out. When a.out calls MPI_Init(), the MPI library code 
will read these env vars, wait till all the processes have reported to the head 
node, and start execution.
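
A rough sketch of such a tiny launcher process follows. The variable names HAMSTER_NUM_NODES, HAMSTER_HEAD_HOST and HAMSTER_HEAD_PORT are made up for illustration; the names a real MPI library expects will differ.

```python
import os
import subprocess


def build_mpi_env(num_nodes, head_host, head_port, base_env=None):
    """Return a copy of the environment with the launch parameters added.

    The three variable names are hypothetical placeholders.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HAMSTER_NUM_NODES"] = str(num_nodes)
    env["HAMSTER_HEAD_HOST"] = head_host
    env["HAMSTER_HEAD_PORT"] = str(head_port)
    return env


def launch(executable, args, env):
    """Run the MPI executable (e.g. a.out) with the prepared environment."""
    return subprocess.call([executable, *args], env=env)
```

For example, launch("./a.out", [], build_mpi_env(4, "node0", 5000)) would start a.out with the three variables visible to it.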

 Hamster: Hadoop And Mpi on the same cluSTER
 ---

 Key: MAPREDUCE-2911
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv2
Affects Versions: 0.23.0
 Environment: All Unix-Environments
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 0.23.0

   Original Estimate: 336h
  Remaining Estimate: 336h

 MPI is commonly used for many machine-learning applications. OpenMPI 
 (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the 
 past, running MPI application on a Hadoop cluster was achieved using Hadoop 
 Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was 
 kludgy. After the resource-manager separation from JobTracker in Hadoop, we 
 have all the tools needed to make MPI a first-class citizen on a Hadoop 
 cluster. I am currently working on the patch to make MPI an 
 application-master. Initial version of this patch will be available soon 
 (hopefully before September 10.) This jira will track the development of 
 Hamster: The application master for MPI.





[jira] [Work started] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2011-08-31 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAPREDUCE-2911 started by Milind Bhandarkar.

 Hamster: Hadoop And Mpi on the same cluSTER
 ---

 Key: MAPREDUCE-2911
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv2
Affects Versions: 0.23.0
 Environment: All Unix-Environments
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 0.23.0

   Original Estimate: 336h
  Remaining Estimate: 336h

 MPI is commonly used for many machine-learning applications. OpenMPI 
 (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the 
 past, running MPI application on a Hadoop cluster was achieved using Hadoop 
 Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was 
 kludgy. After the resource-manager separation from JobTracker in Hadoop, we 
 have all the tools needed to make MPI a first-class citizen on a Hadoop 
 cluster. I am currently working on the patch to make MPI an 
 application-master. Initial version of this patch will be available soon 
 (hopefully before September 10.) This jira will track the development of 
 Hamster: The application master for MPI.





[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2011-08-31 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094771#comment-13094771
 ] 

Milind Bhandarkar commented on MAPREDUCE-2911:
--

@Luke

1. Communication among processes in Hadoop, i.e. map output that gets consumed 
by reduce input, is not encrypted. I think un-encrypted communication among MPI 
processes should be acceptable.
2. mpiexec is used by MPI-2, and OpenMPI supports that.

Can you elaborate on your third point ?

 Hamster: Hadoop And Mpi on the same cluSTER
 ---

 Key: MAPREDUCE-2911
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv2
Affects Versions: 0.23.0
 Environment: All Unix-Environments
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 0.23.0

   Original Estimate: 336h
  Remaining Estimate: 336h

 MPI is commonly used for many machine-learning applications. OpenMPI 
 (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the 
 past, running MPI application on a Hadoop cluster was achieved using Hadoop 
 Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was 
 kludgy. After the resource-manager separation from JobTracker in Hadoop, we 
 have all the tools needed to make MPI a first-class citizen on a Hadoop 
 cluster. I am currently working on the patch to make MPI an 
 application-master. Initial version of this patch will be available soon 
 (hopefully before September 10.) This jira will track the development of 
 Hamster: The application master for MPI.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-31 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Attachment: MR2767new.patch

New patch that fixes the build.xml merge issues.

 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0

 Attachments: MR2767.patch, MR2767new.patch


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2011-08-31 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094786#comment-13094786
 ] 

Milind Bhandarkar commented on MAPREDUCE-2911:
--

@Arun, I do not see how a ContainerLaunchContext can get the hostname and port 
of the 0'th container (which is the head node). (I remember Jerry had worked 
around this problem by making the JobClient the 0th process. But having a 
gateway execute heavy-duty code is not good.)

 Hamster: Hadoop And Mpi on the same cluSTER
 ---

 Key: MAPREDUCE-2911
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv2
Affects Versions: 0.23.0
 Environment: All Unix-Environments
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 0.23.0

   Original Estimate: 336h
  Remaining Estimate: 336h

 MPI is commonly used for many machine-learning applications. OpenMPI 
 (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the 
 past, running MPI application on a Hadoop cluster was achieved using Hadoop 
 Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was 
 kludgy. After the resource-manager separation from JobTracker in Hadoop, we 
 have all the tools needed to make MPI a first-class citizen on a Hadoop 
 cluster. I am currently working on the patch to make MPI an 
 application-master. Initial version of this patch will be available soon 
 (hopefully before September 10.) This jira will track the development of 
 Hamster: The application master for MPI.





[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2011-08-31 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094790#comment-13094790
 ] 

Milind Bhandarkar commented on MAPREDUCE-2911:
--

@Luke, how do you prevent map tasks from opening sockets, receiving 
connections, and communicating with each other in Hadoop? Isn't that the same 
case here?

 Hamster: Hadoop And Mpi on the same cluSTER
 ---

 Key: MAPREDUCE-2911
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv2
Affects Versions: 0.23.0
 Environment: All Unix-Environments
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 0.23.0

   Original Estimate: 336h
  Remaining Estimate: 336h

 MPI is commonly used for many machine-learning applications. OpenMPI 
 (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the 
 past, running MPI application on a Hadoop cluster was achieved using Hadoop 
 Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was 
 kludgy. After the resource-manager separation from JobTracker in Hadoop, we 
 have all the tools needed to make MPI a first-class citizen on a Hadoop 
 cluster. I am currently working on the patch to make MPI an 
 application-master. Initial version of this patch will be available soon 
 (hopefully before September 10.) This jira will track the development of 
 Hamster: The application master for MPI.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-31 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Attachment: (was: MR2767.patch)

 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0

 Attachments: MR2767new.patch


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-31 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Attachment: TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt
testlog.txt

Attaching the log of ant test. There were two failures: testMiniMRChildTask, 
and testDFSIO. testDFSIO is definitely unrelated. I have attached 
testMiniMRChildTask log, which looks unrelated as well.

 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0

 Attachments: MR2767new.patch, 
 TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Commented] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-31 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094908#comment-13094908
 ] 

Milind Bhandarkar commented on MAPREDUCE-2767:
--

Re: the comment by Hadoop QA: this patch is not for trunk. It's only for 0.22 
branch.

 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0

 Attachments: MR2767new.patch, 
 TEST-org.apache.hadoop.mapred.TestMiniMRChildTask.txt, testlog.txt


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
 well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Created] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2011-08-30 Thread Milind Bhandarkar (JIRA)
Hamster: Hadoop And Mpi on the same cluSTER
---

 Key: MAPREDUCE-2911
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv2
Affects Versions: 0.23.0
 Environment: All Unix-Environments
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 0.23.0


MPI is commonly used for many machine-learning applications. OpenMPI 
(http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the 
past, running MPI application on a Hadoop cluster was achieved using Hadoop 
Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was 
kludgy. After the resource-manager separation from JobTracker in Hadoop, we 
have all the tools needed to make MPI a first-class citizen on a Hadoop 
cluster. I am currently working on the patch to make MPI an application-master. 
Initial version of this patch will be available soon (hopefully before 
September 10.) This jira will track the development of Hamster: The application 
master for MPI.





[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2011-08-30 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093491#comment-13093491
 ] 

Milind Bhandarkar commented on MAPREDUCE-2911:
--

Where should I place it in the source hierarchy? Also, I am currently working 
off the trunk. In case I get busy with other stuff, I do not want it to be a 
blocker for 0.23.0. What's the timeline for the 0.23.0 release? I know that I 
won't be able to make it work on Windows in the first version. I hope that does 
not become a blocker, too.

 Hamster: Hadoop And Mpi on the same cluSTER
 ---

 Key: MAPREDUCE-2911
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv2
Affects Versions: 0.23.0
 Environment: All Unix-Environments
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 0.23.0

   Original Estimate: 336h
  Remaining Estimate: 336h

 MPI is commonly used for many machine-learning applications. OpenMPI 
 (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the 
 past, running MPI application on a Hadoop cluster was achieved using Hadoop 
 Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was 
 kludgy. After the resource-manager separation from JobTracker in Hadoop, we 
 have all the tools needed to make MPI a first-class citizen on a Hadoop 
 cluster. I am currently working on the patch to make MPI an 
 application-master. Initial version of this patch will be available soon 
 (hopefully before September 10.) This jira will track the development of 
 Hamster: The application master for MPI.





[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2011-08-30 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093502#comment-13093502
 ] 

Milind Bhandarkar commented on MAPREDUCE-2911:
--

The design is deliberately kept simple.

One script, start-mpi -np numnodes -out hdfs://user/milind/nodes.lst, starts 
the application master, which requests numnodes containers from the resource 
manager and waits until all of those containers become available. The job 
client polls for the application master to write a file called nodes.lst to 
the specified location on HDFS.

As containers become available, the application master spawns the OpenMPI 
runtime environment daemon (orted) in each of those containers.

When the job client notices that nodes.lst is available on HDFS, it downloads 
it to a local directory and exits.

MPI jobs are launched with the regular mpirun command:

mpirun -np numnodes -nodes nodes.lst executable

Multiple MPI jobs can be launched in the same virtual MPI cluster created by 
the start-mpi script.

After all MPI jobs are done, the cluster is dismantled with

stop-mpi nodes.lst

(first line of nodes.lst contains application master location and port.)
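
As a rough illustration of how a client such as stop-mpi might read nodes.lst, 
here is a minimal Java sketch. Only the fact that the first line carries the 
application master location and port comes from the description above; the 
host:port syntax, the one-host-per-line layout of the remaining lines, and the 
NodesList class itself are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for nodes.lst. The "host:port" first line and the
// one-host-per-line body are assumed formats, not taken from the patch.
final class NodesList {
    final String masterHost;
    final int masterPort;
    final List<String> nodes;

    private NodesList(String masterHost, int masterPort, List<String> nodes) {
        this.masterHost = masterHost;
        this.masterPort = masterPort;
        this.nodes = nodes;
    }

    static NodesList parse(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("nodes.lst is empty");
        }
        // First line: application master location and port.
        String first = lines.get(0).trim();
        int colon = first.lastIndexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("expected host:port, got: " + first);
        }
        String host = first.substring(0, colon);
        int port = Integer.parseInt(first.substring(colon + 1));

        // Remaining non-blank lines: one node per line (assumed).
        List<String> nodes = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                nodes.add(trimmed);
            }
        }
        return new NodesList(host, port, nodes);
    }

    public static void main(String[] args) {
        NodesList parsed = parse(Arrays.asList("am.example.com:4242", "node1", "node2"));
        System.out.println(parsed.masterHost + ":" + parsed.masterPort + " -> " + parsed.nodes);
    }
}
```

A stop-mpi implementation would then contact masterHost:masterPort to ask the 
application master to shut the virtual cluster down.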

Currently, there is no authentication for MPI job submission on the cluster 
started by the user. Thus, anyone can submit MPI jobs to any virtual MPI 
cluster. (I promise to add authentication in the next version.)

Also, if any of the containers (running orted) exits abnormally, the entire 
virtual MPI cluster is terminated. (This limitation will be removed in the next 
version.)

There is one issue I am currently facing. I need at most one MPI container per 
physical node (until I figure out how to avoid port conflicts etc.). Any input 
on how to achieve that is welcome. My code walkthrough of the resource 
manager did not suggest anything obvious.
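
One possible application-side workaround, sketched below in Java, is to accept 
only the first container allocated on each host and hand the rest back. This 
is not a resource-manager feature; the Container class here is a hypothetical 
stand-in rather than a Hadoop type, and the sketch assumes the application 
master is able to release surplus containers and request replacements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OnePerHost {
    // Hypothetical stand-in for an allocated container; not a Hadoop class.
    static final class Container {
        final String id;
        final String host;

        Container(String id, String host) {
            this.id = id;
            this.host = host;
        }
    }

    // Keeps the first container seen on each host and appends every later
    // container on an already-used host to 'surplus' so it can be released.
    static List<Container> keepFirstPerHost(List<Container> allocated, List<Container> surplus) {
        Set<String> seenHosts = new HashSet<>();
        List<Container> kept = new ArrayList<>();
        for (Container c : allocated) {
            if (seenHosts.add(c.host)) {
                kept.add(c);
            } else {
                surplus.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Container> surplus = new ArrayList<>();
        List<Container> kept = keepFirstPerHost(Arrays.asList(
                new Container("c1", "hostA"),
                new Container("c2", "hostA"),
                new Container("c3", "hostB")), surplus);
        System.out.println("kept=" + kept.size() + " surplus=" + surplus.size());
    }
}
```

The cost of this approach is extra allocation round trips whenever the 
scheduler places several containers on one host.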






[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2011-08-30 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093505#comment-13093505
 ] 

Milind Bhandarkar commented on MAPREDUCE-2911:
--

@Arun I will try my best to get the first version into 0.23.0 (but as noted 
above there will be a huge security hole.)





[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2011-08-30 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093506#comment-13093506
 ] 

Milind Bhandarkar commented on MAPREDUCE-2911:
--

@Arun, hadoop-openmpi-client makes the most sense (however, it also contains an 
app master.)





[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

2011-08-30 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093512#comment-13093512
 ] 

Milind Bhandarkar commented on MAPREDUCE-2911:
--

Just realized that if I make nodes.lst permissions 600, no other user will be 
able to accidentally submit jobs to the virtual MPI cluster (but malicious 
users can check the RM UI to see MPI AMs, and recreate nodes.lst.)
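
For the local copy of nodes.lst, restricting permissions to 600 could look 
like the Java sketch below. It uses the standard java.nio.file API and assumes 
a POSIX file system; the class and method names are made up for the example. 
For the copy on HDFS, Hadoop's FileSystem.setPermission would be the analogous 
call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

final class NodesListPermissions {
    // Restricts a local file to owner read/write, i.e. mode 600.
    static void restrictToOwner(Path file) throws IOException {
        Files.setPosixFilePermissions(file, PosixFilePermissions.fromString("rw-------"));
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the downloaded nodes.lst.
        Path nodesList = Files.createTempFile("nodes", ".lst");
        restrictToOwner(nodesList);
        // Prints rw------- on a POSIX file system.
        System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(nodesList)));
        Files.delete(nodesList);
    }
}
```

As the comment above notes, this only guards against accidental submissions; 
it does nothing against a user who reconstructs the file contents.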





[jira] [Commented] (MAPREDUCE-2863) Support web-services for RM NM

2011-08-24 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090019#comment-13090019
 ] 

Milind Bhandarkar commented on MAPREDUCE-2863:
--

Hey guys, in the tried and tested tradition of Apache (i.e. the Apache way), 
why can't we wait until a patch is actually posted to discuss its merits? I 
mean, that way has worked so far in Apache, right? Wait a few days for a 
patch, and then let's discuss.

(By the way, based on the premature negativity on this jira, I would really, 
really like work that is already done, i.e. Hoop, to get accepted by the 
community before I upload my patch, since this jira and Hoop have lots of 
common dependencies and even code. So, Alejandro, you can do the community a 
great service by uploading your Hoop patches first.)

 Support web-services for RM  NM
 

 Key: MAPREDUCE-2863
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2863
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2, nodemanager, resourcemanager
Reporter: Arun C Murthy
Assignee: Milind Bhandarkar

 It will be very useful for RM and NM to support web-services to export 
 json/xml.





[jira] [Assigned] (MAPREDUCE-2863) Support web-services for RM NM

2011-08-19 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar reassigned MAPREDUCE-2863:


Assignee: Milind Bhandarkar





[jira] [Commented] (MAPREDUCE-2863) Support web-services for RM NM

2011-08-19 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087977#comment-13087977
 ] 

Milind Bhandarkar commented on MAPREDUCE-2863:
--

I plan to use Jersey (http://jersey.java.net/) for this, since that's what I am 
familiar with and have used in the past. Jersey is available under two 
licenses: CDDL 1.1 and GPL 2 with the Classpath Exception (CPE). Is either of 
these acceptable for use in Apache Hadoop?

I see that Jersey is being used in two other Apache projects: Apache Camel and 
Apache ActiveMQ.

If the Jersey license is not an issue, then I can start building this using 
Jersey.





[jira] [Commented] (MAPREDUCE-2863) Support web-services for RM NM

2011-08-19 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087996#comment-13087996
 ] 

Milind Bhandarkar commented on MAPREDUCE-2863:
--

Alejandro, that's exactly what I plan to do. Any idea when Hoop will be 
committed ?





[jira] [Commented] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-06 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080496#comment-13080496
 ] 

Milind Bhandarkar commented on MAPREDUCE-2767:
--

That will cause test failures, among other problems. I think removing it 
entirely is the cleanest way of turning it off.


---
Milind Bhandarkar
(typing on glass, please ignore spelling mistakes)




 Remove Linux task-controller from 0.22 branch
 -

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0

 Attachments: MR2767.patch


 There's a potential security hole in the task-controller as it stands. Based 
 on the discussion on general@, removing task-controller from the 0.22 branch 
 will pave the way for the 0.22.0 release. (This was done for the 0.21.0 
 release as well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
 task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-05 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Status: Patch Available  (was: Open)

Submitting for tests to run.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-05 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Status: Open  (was: Patch Available)

Cancelling patch, and re-submitting to see if the (now available) jenkins picks 
it up.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-05 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Status: Open  (was: Patch Available)

Following @atm's directions.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-05 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Attachment: MR2767.patch

Attaching the same patch as before for jenkins to pick up.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-05 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Attachment: (was: MR2767.patch)





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-05 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Status: Patch Available  (was: Open)

Dear jenkins, please pick up the patch.





[jira] [Created] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-02 Thread Milind Bhandarkar (JIRA)
Remove Linux task-controller from 0.22 branch
-

 Key: MAPREDUCE-2767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
Priority: Blocker
 Fix For: 0.22.0


There's a potential security hole in the task-controller as it stands. Based on 
the discussion on general@, removing task-controller from the 0.22 branch will 
pave the way for the 0.22.0 release. (This was done for the 0.21.0 release as 
well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-02 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Attachment: MR2767.patch

Removed LinuxTaskController, associated tests, and C++ files. Modified 
build.xml to not build task-controller.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-02 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Status: Patch Available  (was: Open)

Patch submitted for hudson testing. Tested locally.





[jira] [Commented] (MAPREDUCE-2729) Reducers are always counted having pending tasks even if they can't be scheduled yet because not enough of their mappers have completed

2011-07-26 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071333#comment-13071333
 ] 

Milind Bhandarkar commented on MAPREDUCE-2729:
--

It would be good to have a notion of a ready task, which is separate from a 
pending task.
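
To make the distinction concrete, here is a minimal Java sketch of counting 
users who need reduce slots by "ready" rather than "pending" reduces. The 
JobProgress class, its fields, and the single minMapsBeforeReduce threshold 
are assumptions for illustration and do not mirror the capacity scheduler's 
actual classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReduceReadiness {
    // Hypothetical snapshot of one job's progress; field names are illustrative.
    static final class JobProgress {
        final int completedMaps;
        final int pendingReduces;
        final int minMapsBeforeReduce;

        JobProgress(int completedMaps, int pendingReduces, int minMapsBeforeReduce) {
            this.completedMaps = completedMaps;
            this.pendingReduces = pendingReduces;
            this.minMapsBeforeReduce = minMapsBeforeReduce;
        }

        // Pending: reduces exist that have not been scheduled yet.
        boolean hasPendingReduces() {
            return pendingReduces > 0;
        }

        // Ready: pending, and enough maps have completed to start them.
        boolean hasReadyReduces() {
            return hasPendingReduces() && completedMaps >= minMapsBeforeReduce;
        }
    }

    // Counts users with at least one job whose reduces are ready to schedule.
    static int usersNeedingReduceSlots(Map<String, List<JobProgress>> jobsByUser) {
        int users = 0;
        for (List<JobProgress> jobs : jobsByUser.values()) {
            for (JobProgress job : jobs) {
                if (job.hasReadyReduces()) {
                    users++;
                    break;
                }
            }
        }
        return users;
    }

    public static void main(String[] args) {
        Map<String, List<JobProgress>> jobsByUser = new LinkedHashMap<>();
        jobsByUser.put("alice", Arrays.asList(new JobProgress(10, 4, 5))); // ready
        jobsByUser.put("bob", Arrays.asList(new JobProgress(1, 4, 5)));    // pending only
        System.out.println(usersNeedingReduceSlots(jobsByUser)); // prints 1
    }
}
```

Counting by pending reduces alone would report two users here, which is the 
over-count this issue describes.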

 Reducers are always counted having pending tasks even if they can't be 
 scheduled yet because not enough of their mappers have completed
 -

 Key: MAPREDUCE-2729
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2729
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.20.205.0
 Environment: 0.20.1xx-Secondary
Reporter: Sherry Chen
Assignee: Sherry Chen
 Fix For: 0.20.205.0


 In the capacity scheduler, the number of users in a queue needing slots is 
 calculated based on whether the users' jobs have any pending tasks.
 This works fine for map tasks. However, for reduce tasks, jobs do not need 
 reduce slots until the minimum number of map tasks have been completed.
 Here, we add a check for whether a reduce is ready to schedule (i.e. if a job 
 has completed enough map tasks) when we increment the number of users in a 
 queue needing reduce slots.





[jira] Created: (MAPREDUCE-1917) Semantics of map.input.bytes is not consistent

2010-07-06 Thread Milind Bhandarkar (JIRA)
Semantics of map.input.bytes is not consistent
--

 Key: MAPREDUCE-1917
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1917
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Arun C Murthy


The map.input.bytes counter is updated by the RecordReader. For sequence files, 
it is the size of the raw data, which may be compressed. For text files, it is 
the size of the uncompressed data. For PigStorage, it is always 0. This request 
is to give this counter consistent semantics. Since HDFS_BYTES_READ already 
shows the raw split size read by the mapper, MAP_INPUT_BYTES should be the size 
of the uncompressed data.
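
The gap between the two candidate meanings can be seen in a small Java sketch 
that uses gzip from the standard library as a stand-in codec. The class and 
method names are invented for the example and nothing here is Hadoop API: the 
stored length corresponds to what HDFS_BYTES_READ would report for a compressed 
input, and the decompressed length is what this request proposes for 
MAP_INPUT_BYTES.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class InputByteCounts {
    // Compresses data in memory with gzip.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    // Counts the bytes a reader sees after decompression.
    static long uncompressedSize(byte[] gzipped) throws IOException {
        long total = 0;
        byte[] buffer = new byte[4096];
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            text.append("hadoop\n");
        }
        byte[] stored = gzip(text.toString().getBytes(StandardCharsets.UTF_8));
        System.out.println("raw bytes read from storage: " + stored.length);
        System.out.println("uncompressed bytes seen by the mapper: " + uncompressedSize(stored));
    }
}
```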

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1922) Counters for data-local and rack-local tasks should be replaced by bytes-read-local and bytes-read-rack

2010-07-06 Thread Milind Bhandarkar (JIRA)
Counters for data-local and rack-local tasks should be replaced by 
bytes-read-local and bytes-read-rack
---

 Key: MAPREDUCE-1922
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1922
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Arun C Murthy


As more and more applications use the combine file input format (to reduce the 
number of mappers), formats with column groups implemented as different HDFS 
files (Zebra, HBase), and composite input formats (map-side joins), 
data-locality and rack-locality lose their meaning. (A map task reading only 
one column group, say 20% of its input, locally and 80% remotely still gets 
flagged as a data-local map.)

So, my suggestion is to drop these counters and replace them with 
HDFS_LOCAL_BYTES_READ, HDFS_RACK_BYTES_READ, and HDFS_TOTAL_BYTES_READ. These 
counters will make it easier to reason about read performance for maps.
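
A minimal Java sketch of how such byte counters might be tallied for the 
20%/80% example follows. The three counters correspond to the names proposed 
above; the Locality enum, the Read class, and the choice to count rack-local 
bytes separately from node-local bytes are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class LocalityBytes {
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_RACK }

    // Hypothetical record of one read performed by a map task.
    static final class Read {
        final Locality locality;
        final long bytes;

        Read(Locality locality, long bytes) {
            this.locality = locality;
            this.bytes = bytes;
        }
    }

    // Stand-ins for HDFS_LOCAL_BYTES_READ, HDFS_RACK_BYTES_READ and
    // HDFS_TOTAL_BYTES_READ; rack bytes exclude node-local bytes (assumed).
    static final class Counters {
        long localBytes;
        long rackBytes;
        long totalBytes;

        void record(Read read) {
            totalBytes += read.bytes;
            if (read.locality == Locality.NODE_LOCAL) {
                localBytes += read.bytes;
            } else if (read.locality == Locality.RACK_LOCAL) {
                rackBytes += read.bytes;
            }
        }
    }

    static Counters tally(List<Read> reads) {
        Counters counters = new Counters();
        for (Read read : reads) {
            counters.record(read);
        }
        return counters;
    }

    public static void main(String[] args) {
        // One column group read locally (20 bytes), the rest off-rack (80 bytes).
        Counters c = tally(Arrays.asList(
                new Read(Locality.NODE_LOCAL, 20),
                new Read(Locality.OFF_RACK, 80)));
        // Prints local=20 rack=0 total=100
        System.out.println("local=" + c.localBytes + " rack=" + c.rackBytes + " total=" + c.totalBytes);
    }
}
```

With the current task-level counters this task would simply be flagged 
data-local; the byte counters show that only a fifth of its input was.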




[jira] Created: (MAPREDUCE-1805) Document configurations parameters that are read by MapReduce framework and cannot be changed per job

2010-05-20 Thread Milind Bhandarkar (JIRA)
Document configurations parameters that are read by MapReduce framework and 
cannot be changed per job
-

 Key: MAPREDUCE-1805
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1805
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.20.2
 Environment: All
Reporter: Milind Bhandarkar


From the documentation in mapred-default.xml, it is not apparent whether the 
configuration parameters (such as mapred.tasktracker.map.tasks.maximum) can 
be specified per job, or whether these parameters are read by the framework at 
start-up and can never be changed. It would be helpful to annotate the default 
configuration file with this information.




[jira] Commented: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented

2010-02-09 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831537#action_12831537
 ] 

Milind Bhandarkar commented on MAPREDUCE-326:
-

 Back to a low-level binary API: the proposal here isn't to deprecate any 
 higher level APIs, but rather to add a new lower-level API that we can 
 implement both the current APIs and new APIs atop. This should in fact help 
 us to preserve high-level API compatibility longer, since the mapreduce 
 kernel will be independent of the high-level API.

+1 !!

I have always thought of the Hadoop MR APIs as assembly language, and gradually 
no one will use them directly. The low-level APIs will be great for Pig, Hive, 
HBase and other high-level languages to translate to, without compromising 
efficiency.
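
The layering being discussed can be sketched in Java as follows. All interface 
names here are hypothetical and are not the API proposed on this jira: the 
lowest layer sees only byte arrays, and a typed layer (strings, in this 
example) is a thin adapter on top of it.

```java
import static java.nio.charset.StandardCharsets.UTF_8;

import java.util.ArrayList;
import java.util.List;

final class ByteOrientedMapSketch {
    // Lowest layer: keys and values are opaque byte strings.
    interface RawCollector {
        void collect(byte[] key, byte[] value);
    }

    interface RawMapper {
        void map(byte[] key, byte[] value, RawCollector out);
    }

    // Higher-level, typed layer (hypothetical) built on the raw one.
    interface TextCollector {
        void collect(String key, String value);
    }

    interface TextMapper {
        void map(String key, String value, TextCollector out);
    }

    // Adapts a typed mapper to the byte-oriented API by decoding its
    // inputs and encoding its outputs as UTF-8.
    static RawMapper adapt(TextMapper typed) {
        return (key, value, out) -> typed.map(
                new String(key, UTF_8),
                new String(value, UTF_8),
                (k, v) -> out.collect(k.getBytes(UTF_8), v.getBytes(UTF_8)));
    }

    public static void main(String[] args) {
        List<String> output = new ArrayList<>();
        RawCollector sink = (k, v) -> output.add(new String(k, UTF_8) + "\t" + new String(v, UTF_8));

        // Emits each word of the input line with a count of "1".
        TextMapper wordCounter = (key, line, out) -> {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    out.collect(word, "1");
                }
            }
        };

        adapt(wordCounter).map("0".getBytes(UTF_8), "to be or not".getBytes(UTF_8), sink);
        System.out.println(output);
    }
}
```

A Pig or Hive backend could target RawMapper directly and skip the 
decode/encode step that the adapter pays for.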

 The lowest level map-reduce APIs should be byte oriented
 

 Key: MAPREDUCE-326
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: eric baldeschwieler

 As discussed here:
 https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
 The templates, serializers and other complexities that allow map-reduce to 
 use arbitrary types complicate the design and lead to lots of object creates 
 and other overhead that a byte oriented design would not suffer.  I believe 
 the lowest level implementation of hadoop map-reduce should have byte string 
 oriented APIs (for keys and values).  This API would be more performant, 
 simpler and more easily cross language.
 The existing API could be maintained as a thin layer on top of the leaner API.




[jira] Commented: (MAPREDUCE-1185) URL to JT webconsole for running job and job history should be the same

2009-11-24 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782062#action_12782062
 ] 

Milind Bhandarkar commented on MAPREDUCE-1185:
--

I think the approach of including the job history file name in the URL from 
the beginning will cause more headaches, since the job history file name 
includes some things that are unparseable by humans. It may be easier and more 
human-friendly to translate the job id internally to the history file name, and 
return the content of the job history. This will require a map between job ids 
and file names to be kept inside the jobtracker, but that should not be too 
big, since the entries can be removed when job history is purged periodically. 
Does that make sense?

In any case, Hadoop 0.21 will have a different, human-friendly file naming 
scheme, at which point this can go away.
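
A minimal Java sketch of the in-memory map suggested above is shown here. The 
class and method names are hypothetical and the job ids and file names in main 
are invented for the example; this is not JobTracker code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical index from job id to history file name, kept in memory so a
// stable job-id URL can be resolved to the history file on request.
final class JobHistoryIndex {
    private final Map<String, String> fileByJobId = new ConcurrentHashMap<>();

    // Called when a job is retired and its history file name is known.
    void recordRetired(String jobId, String historyFileName) {
        fileByJobId.put(jobId, historyFileName);
    }

    // Returns the history file name, or null if the job is unknown or purged.
    String lookup(String jobId) {
        return fileByJobId.get(jobId);
    }

    // Called by the periodic history purge so the map stays small.
    void purge(String jobId) {
        fileByJobId.remove(jobId);
    }

    int size() {
        return fileByJobId.size();
    }

    public static void main(String[] args) {
        JobHistoryIndex index = new JobHistoryIndex();
        index.recordRetired("job_0001", "history-file-for-job_0001");
        System.out.println(index.lookup("job_0001"));
        index.purge("job_0001");
        System.out.println(index.lookup("job_0001")); // prints null
    }
}
```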

 URL to JT webconsole for running job and job history should be the same
 ---

 Key: MAPREDUCE-1185
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1185
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Reporter: Sharad Agarwal
Assignee: Sharad Agarwal
 Attachments: 1185_v1.patch, 1185_v2.patch, 1185_v3.patch, 
 1185_v4.patch


 The tracking URL for running jobs differs from the one for retired jobs. 
 This creates a problem for clients that cache the running-job URL, because 
 it becomes invalid soon afterwards, when the job is retired.




[jira] Commented: (MAPREDUCE-1016) Make the format of the Job History be JSON instead of Avro binary

2009-09-21 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12758048#action_12758048
 ] 

Milind Bhandarkar commented on MAPREDUCE-1016:
--

Oh, thank you, Owen! Thank you, thank you, thank you!

 Make the format of the Job History be JSON instead of Avro binary
 -

 Key: MAPREDUCE-1016
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1016
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Doug Cutting
 Fix For: 0.21.0, 0.22.0


 I forgot that one of the features that would be nice is to offload the job 
 history display from the JobTracker. That will be a lot easier if the job 
 history is stored in JSON. Therefore, I think we should change the storage 
 now to prevent incompatibilities later.




[jira] Commented: (MAPREDUCE-989) Allow segregation of DistributedCache for maps and reduces

2009-09-21 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12758062#action_12758062
 ] 

Milind Bhandarkar commented on MAPREDUCE-989:
-

If, as Eric suggests, the tasks themselves request the cached files they need 
(presumably in the configure method of the user-supplied mapper / reducer), 
then we lose the opportunity to overlap populating the cache for reducers with 
fetching map outputs.

My request for separate cache configuration variables for map and reduce tasks 
is consistent with the basic observation that map and reduce runtime 
requirements differ. This observation has led to several recent additions to 
the configuration variables, such as specifying different child.java.opts, 
different ulimits, and different task runners for these two types of tasks. 
So it is imperative that users be able to provide different cache files and 
archives for the different task types too.

This cannot live in user-provided code, because otherwise Hadoop streaming, 
pipes, and Pig would all have to be modified to implement that functionality in 
the wrappers they provide. Having one implementation provided by the framework 
seems to me the best way to go.
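
As a rough illustration of per-task-type cache settings resolved by the framework rather than by user code, here is a sketch that uses plain java.util.Properties in place of the Hadoop configuration. The property names (example.cache.files.map, example.cache.files.reduce) and the class PerTaskCacheConfig are hypothetical and are not existing Hadoop configuration variables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch only: the keys below are invented for illustration.
class PerTaskCacheConfig {

    enum TaskType { MAP, REDUCE }

    static final String MAP_CACHE_FILES = "example.cache.files.map";
    static final String REDUCE_CACHE_FILES = "example.cache.files.reduce";

    // Framework-side lookup: returns the cache files configured for one
    // task type, parsed from a comma-separated property value.
    static List<String> cacheFilesFor(TaskType type, Properties conf) {
        String key = (type == TaskType.MAP) ? MAP_CACHE_FILES : REDUCE_CACHE_FILES;
        List<String> files = new ArrayList<>();
        for (String entry : conf.getProperty(key, "").split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                files.add(trimmed);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(MAP_CACHE_FILES, "dictionary.txt, filters.jar");
        conf.setProperty(REDUCE_CACHE_FILES, "lookup-table.zip");
        System.out.println("map:    " + cacheFilesFor(TaskType.MAP, conf));
        System.out.println("reduce: " + cacheFilesFor(TaskType.REDUCE, conf));
    }
}
```

Because the lookup depends only on the task type and the job configuration, the framework could begin populating the reducers' cache while map outputs are still being fetched, and wrappers such as streaming or pipes would need no changes.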

 Allow segregation of DistributedCache for maps and reduces
 --

 Key: MAPREDUCE-989
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-989
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Reporter: Arun C Murthy

 Applications might have differing needs for files in the DistributedCache 
 with respect to maps and reduces. We should allow them to specify these 
 separately.




[jira] Commented: (MAPREDUCE-257) Preventing node from swapping

2009-07-13 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730539#action_12730539
 ] 

Milind Bhandarkar commented on MAPREDUCE-257:
-

I have seen this in one of our production clusters. The Java task itself is 
killed due to memory limits, but a runaway task is still left consuming lots 
of memory. So, I think killing the entire process tree did not work.

 Preventing node from swapping
 -

 Key: MAPREDUCE-257
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-257
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Hong Tang

 When a node swaps, it slows everything: maps running on that node, reducers 
 fetching output from the node, and DFS clients reading from the DN. We should 
 treat swapping the same way as the OS running out of memory, and kill some 
 tasks to free up memory.
