Re: Keeping Map-Tasks alive

2012-08-06 Thread Yaron Gonen
Thanks. As I see it, this cannot be done in the MapReduce 1 framework without changing the TaskTracker and JobTracker. The problem is that I'm not familiar with YARN at all... it might be possible there. Thanks again!

Re: Keeping Map-Tasks alive

2012-08-05 Thread Harsh J
Ah, my bad - I skipped over the K-Means part of your original post. There currently isn't a way to do this with the existing MR framework and APIs: a Reducer is initiated upon map completion, and the task JVM is torn down after the maps end. Perhaps you can use YARN to write something of what you need.

Re: Keeping Map-Tasks alive

2012-08-05 Thread Yaron Gonen
Thanks for the fast reply, but I don't see how a custom record reader will help. Consider k-means again: the mappers need to stand by until all the reducers finish calculating the new clusters' centers. Only then, after the reducers finish their work, should the standby mappers come back to life and process their same input again.

Re: Keeping Map-Tasks alive

2012-08-05 Thread Harsh J
Sure you can, as we provide pluggable code points via the API. Just write a custom record reader that doubles the work (the first round reads the actual input; the second round reads your known output and reiterates). In the mapper, separate the first- and second-round logic via a flag.
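Harsh's suggestion can be sketched framework-independently. The snippet below is a hedged illustration of the two-round idea, not actual Hadoop `RecordReader` code: a generator plays the role of the record reader, yielding the real input first and a second round of known records afterward, and the "mapper" branches on a round flag as suggested. All names and the per-round transformations are illustrative.

```python
# Framework-independent sketch (plain Python, no Hadoop) of the
# "record reader that doubles the work" idea from the reply above.

def two_round_reader(actual_input, second_round_input):
    """Yield (round_flag, record) pairs: round 1 streams the real input,
    round 2 re-feeds previously known records for reiteration."""
    for record in actual_input:
        yield (1, record)
    for record in second_round_input:
        yield (2, record)

def mapper(round_flag, record):
    # Separate first- and second-round logic via the flag, as suggested.
    if round_flag == 1:
        return ("first", record * 2)   # illustrative first-round work
    return ("second", record + 1)      # illustrative second-round work

results = [mapper(flag, rec) for flag, rec in two_round_reader([1, 2], [10])]
```

In a real job, the second round's input would come from wherever the first pass's output was written, so the mapper effectively runs twice over related data without the framework keeping the task alive between rounds.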

Keeping Map-Tasks alive

2012-08-05 Thread Yaron Gonen
Hi, Is there a way to keep a map-task alive after it has finished its work, so it can later perform another task on the same input? For example, consider the k-means clustering algorithm (see the k-means description and the Hadoop implementation).
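For context on why k-means motivates this question: the common MR-era workaround (rather than keeping map tasks alive) is a driver loop that launches one full map/reduce pass per iteration over the same input. The following is a minimal plain-Python sketch of that structure, not Hadoop code; the 1-D points, the distance metric, and the function names are all illustrative assumptions.

```python
# Hedged sketch of iterative k-means as repeated "map" and "reduce"
# passes driven by an outer loop - one conceptual MR job per iteration.

def assign(points, centers):
    """'Map' phase: assign each point to its nearest center's cluster."""
    clusters = {i: [] for i in range(len(centers))}
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        clusters[nearest].append(p)
    return clusters

def recompute(clusters, centers):
    """'Reduce' phase: new center = mean of each cluster's points."""
    return [sum(ps) / len(ps) if ps else centers[i]
            for i, ps in sorted(clusters.items())]

def kmeans(points, centers, iterations=10):
    # Driver loop: each pass rereads the same input, which is exactly
    # the repeated work the original question hopes to avoid.
    for _ in range(iterations):
        centers = recompute(assign(points, centers), centers)
    return centers
```

Because every iteration must rescan the full input, a mechanism for keeping mappers (and their cached input) alive between iterations would save significant I/O, which is what this thread is asking about.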