Hi Bijay,

Shuffle Spill (Disk) is the total number of bytes written to disk by
records spilled during the shuffle.  Shuffle Spill (Memory) is the amount
of space those same records occupied in memory, in deserialized form,
before they were spilled.  The two differ because the serialized format is
more compact, and the on-disk version can be compressed as well.
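
To get a feel for why the two numbers can be so far apart (roughly 36 to 1
in the figures quoted below), here is a minimal Python sketch. It is only an
analogy and not Spark's code: pickle stands in for the serializer, zlib for
the spill compression codec, and the records and the in_memory_size helper
are made up for illustration.

```python
import pickle
import sys
import zlib

# Hypothetical stand-in for shuffle records: (key, value) pairs.
records = [(i, "value-%d" % (i % 10)) for i in range(10000)]


def in_memory_size(recs):
    """Rough estimate of the deserialized, in-memory footprint in bytes."""
    total = sys.getsizeof(recs)
    for key, value in recs:
        # Count the tuple object plus the two objects it points to.
        total += sys.getsizeof((key, value))
        total += sys.getsizeof(key) + sys.getsizeof(value)
    return total


# Analogue of Shuffle Spill (Memory): size of the live objects.
memory_bytes = in_memory_size(records)

# Serializing alone already drops the per-object overhead.
serialized = pickle.dumps(records)

# Analogue of Shuffle Spill (Disk): serialized, then compressed.
disk_bytes = len(zlib.compress(serialized))

print("in memory :", memory_bytes)
print("serialized:", len(serialized))
print("on disk   :", disk_bytes)
```

The exact ratio depends on the data, the serializer and the codec, so the
sizes printed here say nothing about what a real job will report.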

-Sandy

On Mon, Mar 23, 2015 at 5:29 PM, Bijay Pathak <bijay.pat...@cloudwick.com>
wrote:

> Hello,
>
> I am running TeraSort <https://github.com/ehiggs/spark-terasort> on
> 100GB of data. The final metrics I am getting on Shuffle Spill are:
>
> Shuffle Spill(Memory): 122.5 GB
> Shuffle Spill(Disk): 3.4 GB
>
> What's the difference and relation between these two metrics? Does this
> mean 122.5 GB was spilled from memory during the shuffle?
>
> thank you,
> bijay
>
