Keys distribution insights

Flavio Pompermaier Mon, 05 Jun 2017 03:02:54 -0700

Hi everybody,
in my job I have a groupReduce operator with parallelism 4, and one of its
subtasks takes far longer than the others.
My guess is that the keys assigned to that slot have much more data to
reduce (and are thus computationally heavier within the groupReduce
operator).
What I'm trying to understand is which keys are assigned to that slot: is
there any way (from the JobManager UI or from the logs) to investigate the
key distribution (which, according to the plan visualization, is the result
of a hash partition)?


Best,
Flavio

