Thanks for all the help, guys.

> When you do a shuffle from N map partitions to M reduce partitions, there
> are N * M output blocks created and each one is tracked.
Okay, that makes sense. I have a few questions then... If there are N * M
output blocks, does that mean each machine will (generally) be responsible
for (N * M) / (number of machines) blocks, and so the BlockManager data
structures would hold proportionally less data as machines are added?
(Turns out 7,000 * 7,000 / 5 = 9.8 million, which is in the ballpark of the
estimated 5 million or so entries that were in the BlockManager heap dump;
back-of-envelope math in the P.S. below.)

> Since you only have a few machines, you don't "need" the
> extra partitions to add more parallelism to the reduce.

True, I am perhaps overly cautious about favoring more partitions... It
seems like previously, when Spark shuffled a partition in ShuffleMapTask,
it buffered all the data in memory. So, even if you only had 5 machines, it
was important to have lots of tiny slices of data, rather than a few big
ones, to avoid OOMEs.

...but (I had forgotten this) I believe that's no longer the case, and
Spark/ShuffleMapTask can now fully stream partitions? If so, that seems
like a big deal: one of my first patches, which made the default
partitioner prefer the max number of partitions, no longer makes as much
sense, and spark.default.parallelism just became a whole lot more useful.
As in, it should always be set. (We're not setting it, currently; see the
P.P.S. for how I plan to.)

I'll give this another go later tonight/tomorrow with fewer partitions and
see what happens.

Thanks again!

- Stephen
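P.S. For anyone following along, here's the back-of-envelope math above as
a quick Scala snippet. The 7,000 / 7,000 / 5 numbers are from our job;
everything else is just arithmetic, not anything Spark reports directly:

    // Rough estimate of shuffle blocks tracked per node: N * M blocks
    // total, spread (roughly evenly) across the cluster.
    val mapPartitions    = 7000L // N
    val reducePartitions = 7000L // M
    val machines         = 5
    val totalBlocks   = mapPartitions * reducePartitions // 49,000,000
    val blocksPerNode = totalBlocks / machines           // ~9,800,000
    println("~" + blocksPerNode + " shuffle blocks tracked per node")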

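P.P.S. In case it helps anyone else, this is roughly how I'm planning to
set spark.default.parallelism. The property name is real; the master URL,
app name, and the "80" (a small multiple of total cores, assuming 8 cores
per machine, which is a guess) are just placeholders:

    // Set before constructing the SparkContext so the default partitioner
    // picks it up. Package name depends on Spark version ("spark" in the
    // old 0.x releases, "org.apache.spark" later).
    import spark.SparkContext

    System.setProperty("spark.default.parallelism", "80") // ~2x total cores
    val sc = new SparkContext("spark://master:7077", "shuffle-test")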