On 28 Dec 2017, at 19:25, Patrick Alwell <palw...@hortonworks.com> wrote:
> You are using groupByKey(); have you thought of an alternative like 
> aggregateByKey() or combineByKey() to reduce shuffling?

Indeed, I am aware of this. I do have a groupByKey() that is difficult to 
avoid, but the problem occurs after that stage.
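
For anyone following along, this is the shape of the rewrite Patrick is 
suggesting, sketched for a simple sum (the RDD and its contents are made up 
for illustration):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("sketch").getOrCreate()
    // Hypothetical example data: RDD[(String, Int)]
    val pairs = spark.sparkContext.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

    // groupByKey() ships every single value across the network:
    val summedGrouped = pairs.groupByKey().mapValues(_.sum)

    // aggregateByKey() combines map-side first, so far less is shuffled:
    val summedAggregated = pairs.aggregateByKey(0)(
      (acc, v) => acc + v, // fold a value into the per-partition accumulator
      (a, b) => a + b      // merge accumulators across partitions
    )

Of course this only works when the downstream logic can be phrased as an 
aggregation, which in my case it sadly cannot.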

> Dynamic allocation is great, but sometimes I've found explicitly setting the 
> number of executors, cores per executor, and memory per executor to be a 
> better alternative.

I will try with dynamic allocation off.
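
Concretely, I plan to test with something along these lines (every number 
below is a placeholder I still need to size for our cluster, and the class 
and jar names are made up):

    # Dynamic allocation off, fixed sizing instead.
    spark-submit \
      --conf spark.dynamicAllocation.enabled=false \
      --num-executors 10 \
      --executor-cores 4 \
      --executor-memory 8g \
      --class com.example.MyJob \
      my-job.jar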

> Take a look at the yarn logs as well for the particular executor in question. 
> Executors can run multiple tasks, and will often fail if they have more 
> tasks than available threads.

The trouble is that there is nothing significant in the logs (read: nothing 
clear enough for me to understand!). Is there any special message I could 
grep for?
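
The only generic patterns I know to look for would be along these lines (the 
application id is a placeholder); if there is anything more specific to this 
failure mode, I would love to hear it:

    yarn logs -applicationId application_1514000000000_0042 \
      | grep -iE 'ExecutorLostFailure|OutOfMemoryError|killed by YARN'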

> [...] https://spark.apache.org/docs/latest/tuning.html#level-of-parallelism
> [...] https://spark.apache.org/docs/latest/hardware-provisioning.html

Thanks for the pointers -- will have a look!
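
From a quick skim of the level-of-parallelism page, one thing I can try right 
away is passing an explicit partition count to the wide operations; the docs 
recommend roughly 2-3 tasks per CPU core. A sketch, with a placeholder count:

    // pairs as above; 400 is a made-up number to be sized to the cluster
    val grouped = pairs.groupByKey(numPartitions = 400)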

Jeroen


