We have a cluster that runs both quick and long-running jobs. The long-running
jobs have mappers that take 1+ hours and reducers that take 1+ hours, and they
can consume all the resources on our cluster.

We want a configuration where the quick jobs get resources and are not blocked
by the long-running jobs. We are using the Fair scheduler, with the long jobs
in one queue and the short jobs in another.

It seems that preemption with minResources is the only way to get the quick
jobs any cluster time. But not only do they preempt tasks that have been
running for a while (gripe: the preemptor does not seem to kill the jobs that
have been running the shortest), but if they preempt a reducer, the mappers
need to be rerun because their data is lost, and the reducers sit waiting,
wasting resources and causing more preemption. This creates a nasty cycle, and
the job never finishes.
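To make the gripe concrete, here is a rough sketch of the victim-selection
order I would expect. It is hypothetical: it is not the Fair scheduler's actual
code, and the `Task` record and `pick_victims` function are made up for
illustration. It prefers mappers over reducers (since, in our experience, a
killed reducer forces map reruns) and, within each kind, kills the
shortest-running task first so the least work is thrown away.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    # Hypothetical task record; these fields are illustrative, not a Hadoop API.
    task_id: str
    is_reducer: bool
    start_time: float  # seconds when the task started
    memory_mb: int     # memory the task currently holds


def pick_victims(tasks: List[Task], needed_mb: int, now: float) -> List[Task]:
    """Choose tasks to preempt until at least needed_mb has been freed.

    Sort key: mappers (False) before reducers (True), then by elapsed
    running time ascending, so the most recently started task goes first.
    """
    ordered = sorted(tasks, key=lambda t: (t.is_reducer, now - t.start_time))
    victims: List[Task] = []
    freed = 0
    for task in ordered:
        if freed >= needed_mb:
            break  # enough resources reclaimed; leave the rest running
        victims.append(task)
        freed += task.memory_mb
    return victims
```

With two mappers (one started an hour ago, one ten minutes ago) and one reducer
started a couple of minutes ago, asking for 1500 MB would kill the two mappers,
newest first, and leave the reducer alone.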

I am looking for suggestions on how to make this work. Is there a way to have
the mappers keep their intermediate data until the job finishes? Is there a way
to have the preemptor kill the shortest-running jobs? Or am I approaching this
entirely wrong?

Thanks in advance for any and all advice.

Cheers,

Randy

PS: Getting a larger cluster is not an option.
