Re: Why does spark write huge file into temporary local disk even without on-disk persist or checkpoint?

Peng Cheng Wed, 11 Feb 2015 07:56:22 -0800

You are right. I've checked the overall stage metrics, and it looks like the
largest shuffle write is over 9 GB. The partition completed successfully,
but its spill file can't be removed until all the other partitions have
finished. It's very likely caused by a stupid mistake in my design: a lookup
table grows constantly in a loop, and every time it is unioned with a new
increment, both of them are reshuffled and the partitioner reverts to None.
This can never be efficient with the existing API.
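To see why this pattern blows up, here is a back-of-the-envelope cost model (not Spark code). It assumes every increment has the same size and that shuffle cost is proportional to the number of records moved. If the partitioner is lost after each union, the whole table (old rows plus the new increment) is reshuffled on every iteration, so total shuffle volume grows quadratically with the number of iterations. If the partitioner were preserved, only each new increment would need to be shuffled. The function name `shuffled_records` is purely illustrative.

```python
def shuffled_records(increment_sizes, partitioner_kept):
    """Total records shuffled while growing a lookup table by repeated union.

    Hypothetical model: when the partitioner is lost, each iteration
    reshuffles the entire table (old rows plus the new increment); when it
    is kept, only the new increment has to be shuffled.
    """
    table_size = 0
    total = 0
    for inc in increment_sizes:
        table_size += inc  # the table after this union
        total += inc if partitioner_kept else table_size
    return total


# Ten iterations, each adding one million records.
increments = [1_000_000] * 10
print(shuffled_records(increments, partitioner_kept=False))  # 55000000
print(shuffled_records(increments, partitioner_kept=True))   # 10000000
```

With ten equal increments, losing the partitioner means moving 5.5 times as many records, and the gap keeps widening as the loop runs longer.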
