Is there a way to preempt the initial set of reduce tasks?

2008-07-16 Thread Murali Krishna
Hi, I have to run a small MR job while there is a bigger job already running. The first job takes around 20 hours to finish and the second 1 hour. The second job will be given a higher priority. The problem here is that the first set of reducers of job1 will be occupying all the slots and will

RE: Is there a way to preempt the initial set of reduce tasks?

2008-07-16 Thread Goel, Ankur
I presume that the initial set of reducers of job1 are taking fairly long to complete, thereby denying the reducers of job2 a chance to run. I don't see a provision in Hadoop to preempt a running task. This looks like an enhancement to task tracker scheduling where running tasks are preempted

Re: Is there a way to preempt the initial set of reduce tasks?

2008-07-16 Thread Amar Kamat
I think the JobTracker can easily detect this, i.e. the case where a high-priority job is starved because there are no free slots/resources. Preemption should probably kick in where tasks from a low-priority job might get scheduled even though the high-priority job has some tasks to run. Amar Goel, Ankur

Re: Is there a way to preempt the initial set of reduce tasks?

2008-07-16 Thread Amar Kamat
Goel, Ankur wrote: OK, in that case bumping up the priority of job2 to a level higher than job1 before running job2 should actually fix the starvation issue. @Ankur, preemption across jobs with different priorities is still not there in Hadoop. Hence job1 will succeed before job2 because of

RE: Is there a way to preempt the initial set of reduce tasks?

2008-07-16 Thread Vivek Ratan
There are a few different issues at play here. - It seems like you're facing a problem only because the reducers of Job1 are long-running (somebody else pointed this out too). Once a reducer of Job1 finishes, that slot will go to a reducer of Job2 in today's Hadoop. Can you confirm that is
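
The behaviour discussed in this thread can be illustrated with a toy model. The sketch below is not Hadoop code and does not use any Hadoop API; `simulate`, the job dictionaries, the slot count, and the time units are all hypothetical names and values chosen for illustration. It assumes a non-preemptive scheduler in which a freed slot always goes to the highest-priority job that still has tasks waiting (a lower `priority` number means more urgent).

```python
import heapq


def simulate(num_slots, jobs):
    """Toy non-preemptive priority scheduler (illustrative only, not Hadoop).

    Each job is a dict with keys: name, priority (lower = more urgent),
    submit (submission time), num_tasks, duration (per-task run time).
    Returns (first_start, finish): dicts mapping job name to the time its
    first task started and the time its last task ended.
    """
    pending = {j["name"]: j["num_tasks"] for j in jobs}
    first_start, finish = {}, {}
    running = []  # min-heap of (end_time, job_name)
    free = num_slots
    now = 0
    while any(pending.values()) or running:
        # Release slots held by tasks that have ended by now.
        while running and running[0][0] <= now:
            end, name = heapq.heappop(running)
            free += 1
            finish[name] = max(finish.get(name, 0), end)
        # Hand free slots to submitted jobs, most urgent first.
        # Running tasks are never interrupted: there is no preemption.
        ready = sorted(
            (j for j in jobs if j["submit"] <= now and pending[j["name"]] > 0),
            key=lambda j: j["priority"],
        )
        for j in ready:
            while free > 0 and pending[j["name"]] > 0:
                free -= 1
                pending[j["name"]] -= 1
                first_start.setdefault(j["name"], now)
                heapq.heappush(running, (now + j["duration"], j["name"]))
        # Jump to the next event: a task ending or a job being submitted.
        next_times = [t for t, _ in running[:1]]
        next_times += [j["submit"] for j in jobs if j["submit"] > now]
        if not next_times:
            break
        now = min(next_times)
    return first_start, finish


# Hypothetical workload: job1 grabs both slots before job2 is submitted.
jobs = [
    {"name": "job1", "priority": 2, "submit": 0, "num_tasks": 4, "duration": 10},
    {"name": "job2", "priority": 1, "submit": 1, "num_tasks": 2, "duration": 1},
]
first_start, finish = simulate(2, jobs)
print(first_start, finish)
```

In this made-up workload, job2 is submitted at time 1 with the higher priority but cannot start until the first job1 tasks release their slots at time 10. After that, job2 is served ahead of job1's remaining tasks. Raising the priority therefore helps only once a slot frees up, which is why long-running initial reducers are the crux of the problem.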