Re: Assigning reduce tasks to specific nodes

2012-12-08 Thread Jean-Marc Spaggiari
Hi Tsuyoshi, Which version of Hadoop is that for? I think it's for 0.2x.x, right? I'm not able to find this class in 1.0.x. Thanks, JM 2012/12/8, Tsuyoshi OZAWA : > Hi Hiroyuki, > > Lately I've changed scheduler for improving hadoop, so I may help you. > > RMContainerAllocator#handleEven

Re: Assigning reduce tasks to specific nodes

2012-12-08 Thread Tsuyoshi OZAWA
Hi Hiroyuki, I've recently been modifying the scheduler to improve Hadoop, so I may be able to help. RMContainerAllocator#handleEvent assigns MapTasks to allocated containers. You can implement a semi-strict (best-effort allocation) mode by hacking there. Note, however, that allocation of containers is done by Res

Re: Assigning reduce tasks to specific nodes

2012-12-07 Thread Jean-Marc Spaggiari
Hi Hiroyuki, Have you made any progress on that? I'm also looking for a way to assign specific Map tasks to specific nodes (I want the Map to run where the data is). JM 2012/12/1, Michael Segel : > I haven't thought about reducers but in terms of mappers you need to > override the data locality

Re: Assigning reduce tasks to specific nodes

2012-12-01 Thread Michael Segel
I haven't thought about reducers but in terms of mappers you need to override the data locality so that it thinks that the node where you want to send the data exists. Again, not really recommended since it will kill performance unless the compute time is at least an order of magnitude greater

Re: Assigning reduce tasks to specific nodes

2012-12-01 Thread Harsh J
Yes, scheduling is done on a Tasktracker heartbeat basis, so it is certainly possible to do absolutely strict scheduling (although be aware of the condition of failing/unavailable tasktrackers). Mohit's suggestion is somewhat like what you desire (delay scheduling in fair scheduler config) - but s
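The heartbeat-driven strict assignment described here can be sketched as a toy model. This is not Hadoop's TaskScheduler API: the class `StrictHeartbeatSketch` and its methods `pin`, `onHeartbeat`, and `pendingCount` are hypothetical names chosen for illustration, and the model assumes at most one pinned task per host. The host names follow the xxx000/xxx001 example from the original question.

```java
import java.util.HashMap;
import java.util.Map;

public class StrictHeartbeatSketch {
    // Pending tasks keyed by the only host allowed to run them.
    // Hypothetical model: at most one pinned task per host.
    private final Map<String, String> pendingByHost = new HashMap<>();

    // Pin a task to exactly one host.
    void pin(String taskId, String host) {
        pendingByHost.put(host, taskId);
    }

    // Called once per heartbeat from `host`. Returns the task pinned to
    // that host and removes it from the pending set, or null if none.
    String onHeartbeat(String host) {
        return pendingByHost.remove(host);
    }

    // Number of tasks still waiting for their host to report in.
    int pendingCount() {
        return pendingByHost.size();
    }

    public static void main(String[] args) {
        StrictHeartbeatSketch scheduler = new StrictHeartbeatSketch();
        scheduler.pin("reduce-0", "xxx000");
        scheduler.pin("reduce-1", "xxx001");

        System.out.println(scheduler.onHeartbeat("xxx002")); // nothing pinned to this host
        System.out.println(scheduler.onHeartbeat("xxx000")); // the task pinned to xxx000
        System.out.println(scheduler.pendingCount());        // tasks still waiting
    }
}
```

In this model a task pinned to a host that never sends a heartbeat stays pending indefinitely, which is the failing/unavailable tasktracker condition mentioned above.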

Re: Assigning reduce tasks to specific nodes

2012-12-01 Thread Hiroyuki Yamada
Thank you all for the comments. >you ought to make sure your scheduler also does non-strict scheduling of data >local tasks for jobs that don't require such strictness I just want to confirm one thing: if I write my own scheduler, is it possible to do "strict" scheduling? Thanks On Thu, Nov

Re: Assigning reduce tasks to specific nodes

2012-11-28 Thread Mohit Anchlia
Look at the locality delay parameter. On Nov 28, 2012, at 8:44 PM, Harsh J wrote: > None of the current schedulers are "strict" in the sense of "do not > schedule the task if such a tasktracker is not available". That has > never been a requirement for Map/Reduce programs and no

Re: Assigning reduce tasks to specific nodes

2012-11-28 Thread Harsh J
None of the current schedulers are "strict" in the sense of "do not schedule the task if such a tasktracker is not available". That has never been a requirement for Map/Reduce programs, nor should it be. I feel if you want some code to run individually on all nodes for whatever reason, you may as

Re: Assigning reduce tasks to specific nodes

2012-11-28 Thread Hiroyuki Yamada
Thank you all for the comments and advice. I know it is not recommended to assign mapper locations myself, but in some cases one mapper needs to run on each node, so I need a strict way to do it. So, locations are taken care of by the JobTracker (scheduler), but not strictly. An

Re: Assigning reduce tasks to specific nodes

2012-11-28 Thread Michael Segel
Mappers? Uhm... yes you can do it. Yes it is non-trivial. Yes, it is not recommended. I think we talk a bit about this in an InfoQ article written by Boris Lublinsky. It's kind of wild when your entire cluster map goes red in ganglia... :-) On Nov 28, 2012, at 2:41 AM, Harsh J wrote: > Hi,

Re: Assigning reduce tasks to specific nodes

2012-11-28 Thread JAY
Seems like Hadoop is not optimal for this, since it's designed to scale machines anonymously. On Nov 27, 2012, at 11:08 PM, Harsh J wrote: > This is not supported/available currently even in MR2, but take a look at > https://issues.apache.org/jira/browse/MAPREDUCE-199. > > > On Wed, Nov 28,

Re: Assigning reduce tasks to specific nodes

2012-11-28 Thread Harsh J
Hi, Mapper scheduling is indeed influenced by the results returned by the InputSplit's getLocations(). The map task itself does not care about deserializing the location information, as it is of no use to it. The location information is vital to the scheduler (or in 0.20.2, the JobTracker), where i
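How location hints influence, but do not dictate, mapper placement can be sketched without Hadoop on the classpath. This is a simplified model, not Hadoop's scheduler code: `Split` is a hypothetical stand-in for an input split whose `locations` list plays the role of the hosts reported by `getLocations()`, and `pickSplit` is an assumed helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LocalityHintSketch {
    // Hypothetical stand-in for an input split: an id plus preferred hosts.
    static final class Split {
        final String id;
        final List<String> locations;

        Split(String id, String... locations) {
            this.id = id;
            this.locations = Arrays.asList(locations);
        }
    }

    // Called when `host` reports free capacity. Prefers a pending split
    // that lists `host` as a location; otherwise falls back to the first
    // pending split (best effort). Returns null when nothing is pending.
    static Split pickSplit(String host, List<Split> pending) {
        Iterator<Split> it = pending.iterator();
        while (it.hasNext()) {
            Split candidate = it.next();
            if (candidate.locations.contains(host)) {
                it.remove();
                return candidate;
            }
        }
        return pending.isEmpty() ? null : pending.remove(0);
    }

    public static void main(String[] args) {
        List<Split> pending = new ArrayList<>();
        pending.add(new Split("split-0", "node-a"));
        pending.add(new Split("split-1", "node-b"));

        System.out.println(pickSplit("node-b", pending).id); // data-local match
        System.out.println(pickSplit("node-c", pending).id); // non-local fallback
    }
}
```

The fallback branch is what makes the hints non-strict: a host with no local split still receives work rather than sitting idle.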

Re: Assigning reduce tasks to specific nodes

2012-11-27 Thread Hiroyuki Yamada
Hi Harsh, Thank you for the information. I understand the current situation. What about mappers? As far as I tested, location information in InputSplit is ignored in 0.20.2, so there seems to be no easy way to assign mappers to specific nodes. (I checked the source earlier and noticed that l

Re: Assigning reduce tasks to specific nodes

2012-11-27 Thread Harsh J
This is not supported/available currently even in MR2, but take a look at https://issues.apache.org/jira/browse/MAPREDUCE-199. On Wed, Nov 28, 2012 at 9:34 AM, Hiroyuki Yamada wrote: > Hi, > > I am wondering how I can assign reduce tasks to specific nodes. > What I want to do is, for example,

Assigning reduce tasks to specific nodes

2012-11-27 Thread Hiroyuki Yamada
Hi, I am wondering how I can assign reduce tasks to specific nodes. What I want to do is, for example, assign the reducer that produces part-0 to node xxx000, the one that produces part-1 to node xxx001, and so on. I think it's about task assignment scheduling but I am not sure where to customize to achie