Hi,
Shivaram and I stumbled across this problem a few weeks ago, and AFAIK
there is no nice solution. We worked around it by avoiding jobs that have
tasks with two locality levels.
To fix this problem, we really need to fix the underlying problem in the
scheduling code, which
Hi Mridul,
In the case Shivaram and I saw, and based on my understanding of Ma chong's
description, I don't think that completely fixes the problem.
To be very concrete, suppose your job has two tasks, t1 and t2, and they
each have input data (in HDFS) on h1 and h2, respectively, and that h1 and
This sounds like it may be exactly the problem we've been having (and about
which I recently posted on the user list).
Is there any way of monitoring its attempts to wait, give up, and try
another level?
In general, I'm trying to figure out why we can have repeated identical
jobs, the
In the specific example stated, the user had two tasksets, if I
understood correctly. The first taskset reads off the DB (DFS in your
example), does some filtering, etc., and caches the result.
The second works off the cached data (which now has PROCESS_LOCAL
locality) to do map, group, etc.
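To make the "wait, give up, try another level" behaviour concrete, here is a minimal sketch of delay scheduling for a single taskset. It is a simplified, hypothetical model loosely based on the idea behind Spark's TaskSetManager, not Spark's actual code. The class `DelayScheduler`, its fields, and the per-level `waits` values (standing in for the `spark.locality.wait*` settings) are all assumptions for illustration. The `log` list records each time the taskset gives up on a level, which is the kind of trace the monitoring question above is asking for.

```python
from dataclasses import dataclass, field
from typing import List

# Locality levels, most local first (names borrowed from Spark for readability).
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]


@dataclass
class DelayScheduler:
    """Hypothetical, simplified model of per-taskset delay scheduling."""
    waits: List[float]          # assumed seconds to wait at each level before relaxing
    current: int = 0            # index into LEVELS of the currently allowed level
    last_launch: float = 0.0    # time of the last launch (or last level change)
    log: List[str] = field(default_factory=list)

    def allowed_level(self, now: float) -> str:
        # Relax one level per expired wait, recording every "give up".
        while (self.current < len(LEVELS) - 1
               and now - self.last_launch >= self.waits[self.current]):
            self.log.append(
                f"t={now}: gave up on {LEVELS[self.current]}, "
                f"trying {LEVELS[self.current + 1]}")
            self.last_launch += self.waits[self.current]
            self.current += 1
        return LEVELS[self.current]

    def on_launch(self, level: str, now: float) -> None:
        # Launching a task at some level moves back to that level
        # and restarts the wait timer from the launch time.
        self.current = LEVELS.index(level)
        self.last_launch = now
```

In this model every launch restarts the timer, so a taskset that keeps launching tasks at a more local level can stay there for a long time before its other tasks are allowed to run at a less local level.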
The