Hi Spark devs,
I was looking into the memory usage of shuffle, and one annoying thing about
the default compression codec (LZF) is that the implementation we use
allocates buffers pretty generously. In a simple experiment, I found
that creating 1000 LZFOutputStreams allocated 198976424 bytes.
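That works out to just under 200 KB retained per stream. A quick arithmetic sketch of the figure (the total byte count is taken from the experiment above; the class and method names are illustrative):

```java
public class LzfBufferMath {
    // Per-stream footprint implied by the experiment: total bytes / stream count.
    static long perStreamBytes(long totalBytes, int streams) {
        return totalBytes / streams;
    }

    public static void main(String[] args) {
        long perStream = perStreamBytes(198_976_424L, 1_000);
        System.out.println(perStream + " bytes (~" + perStream / 1024
                + " KiB) per LZFOutputStream");
    }
}
```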
We tried a lower block size for LZF, but it barfed all over the place.
Snappy was the way to go for our jobs.
Regards,
Mridul
On Mon, Jul 14, 2014 at 12:31 PM, Reynold Xin r...@databricks.com wrote:
> Hi Spark devs,
> I was looking into the memory usage of shuffle and one annoying thing is
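Switching codecs as Mridul describes is a one-line configuration change. A sketch for `spark-defaults.conf` (the full codec class name is used here; exact accepted values depend on the Spark version):

```
# Use Snappy instead of the default LZF codec for compressing
# internal data such as shuffle outputs.
spark.io.compression.codec  org.apache.spark.io.SnappyCompressionCodec
```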
Just a comment from the peanut gallery, but these buffers are a real
PITA for us as well. Probably 75% of our non-user-error job failures
are related to them.
Just naively, what about not doing compression on the fly? E.g. during
the shuffle just write straight to disk, uncompressed?
For us, we
Stephen,
Often the shuffle is bound by writes to disk, so even if disks have enough
space to store the uncompressed data, the shuffle can complete faster by
writing less data.
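That trade-off can be sanity-checked with rough numbers. All throughput and ratio figures below are assumed for illustration only:

```java
public class ShuffleIoSketch {
    // Seconds to write `dataMB` uncompressed at the given disk throughput.
    static double rawSeconds(double dataMB, double diskMBps) {
        return dataMB / diskMBps;
    }

    // Seconds to compress and then write: CPU time at `codecMBps` plus the
    // smaller write. The two phases are summed rather than pipelined, so
    // this is a conservative estimate.
    static double compressedSeconds(double dataMB, double codecMBps,
                                    double ratio, double diskMBps) {
        return dataMB / codecMBps + (dataMB / ratio) / diskMBps;
    }

    public static void main(String[] args) {
        double dataMB = 10 * 1024;  // a 10 GB shuffle (illustrative)
        System.out.printf("uncompressed: %.1fs, compressed: %.1fs%n",
                rawSeconds(dataMB, 100.0),
                compressedSeconds(dataMB, 400.0, 3.0, 100.0));
    }
}
```

Even with the compression CPU cost charged up front, the smaller write wins whenever the disk is the bottleneck.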
Reynold,
This isn't a big help in the short term, but if we switch to a sort-based
shuffle, we'll only need a single
You can actually turn off shuffle compression by setting spark.shuffle.compress
to false. Try that out; there will still be some buffers for the various
OutputStreams, but they should be smaller.
Matei
On Jul 14, 2014, at 3:30 PM, Stephen Haberman stephen.haber...@gmail.com wrote:
> Just a
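Matei's suggestion above would look like this in `spark-defaults.conf` (a sketch; `spark.shuffle.compress` defaults to true, and this setting affects only shuffle outputs):

```
# Disable compression of map output files written during the shuffle.
# Broadcast and RDD compression settings are unaffected.
spark.shuffle.compress  false
```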
Copying Jon here since he worked on the LZF library at Ning.
Jon - any comments on this topic?
On Mon, Jul 14, 2014 at 3:54 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
> You can actually turn off shuffle compression by setting
> spark.shuffle.compress to false. Try that out, there will
Maybe we could try LZ4 [1], which has better performance and a smaller memory
footprint than LZF and Snappy. In fast scan mode, its performance is 1.5 - 2x
higher than LZF's [2],
and the memory used is 10x smaller (16k vs 190k).
[1] https://github.com/jpountz/lz4-java
[2]
Is the held memory due to just instantiating the LZFOutputStream? If so,
I'm surprised, and I'd consider that a bug.
I suspect the held memory may be due to a SoftReference - memory would be
released under enough memory pressure.
Finally, is it necessary to keep 1000 (or more) decoders active?
One of the core problems here is the number of open streams we have, which
is (# cores * # reduce partitions), which can easily climb into the tens of
thousands for large jobs. This is a more general problem that we are
planning on fixing for our largest shuffles, as even moderate buffer sizes
can
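To put rough numbers on the (# cores * # reduce partitions) figure: the core and partition counts below are assumed for illustration, and the per-stream buffer sizes are the figures quoted earlier in the thread:

```java
public class ShuffleStreamMath {
    // One open stream per (core, reduce partition) pair, as described above.
    static long openStreams(int cores, int reducePartitions) {
        return (long) cores * reducePartitions;
    }

    // Total buffer footprint in MB given a per-stream buffer size in KB.
    static long bufferMB(long streams, long perStreamKB) {
        return streams * perStreamKB / 1024;
    }

    public static void main(String[] args) {
        long streams = openStreams(16, 2000);  // assumed large-job shape
        System.out.println(streams + " open streams");
        System.out.println(bufferMB(streams, 190) + " MB at ~190 KB/stream (LZF)");
        System.out.println(bufferMB(streams, 16) + " MB at ~16 KB/stream (LZ4)");
    }
}
```

At tens of thousands of streams, the per-stream buffer size dominates the memory budget, which is why the codec's allocation behavior matters so much here.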