Hi All,
I'm wondering: why does shuffle in Spark write shuffle data to disk by
default?
On Stack Overflow, someone said it's for fault tolerance, but a node
going down is the most common cause of failure, and writing to that
node's local disk doesn't help with fault tolerance in that case either.
So why not use a ramdisk by default instead of disk?
I think for most jobs the bottleneck isn't writing shuffle data to disk,
since the shuffle data has to be "shuffled", i.e. sent across the network,
anyway.
You can always point Spark at a ramdisk yourself. Requiring a ramdisk by
default would significantly complicate configuration and platform
portability.
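For example, here is a minimal sketch of one way to do that: mount a
tmpfs (RAM-backed) filesystem and point spark.local.dir (the setting
Spark uses for shuffle and spill scratch space) at it. The mount point,
size, and application jar below are hypothetical; adjust them for your
cluster, and note the ramdisk must exist at the same path on every node.

```shell
# Create a RAM-backed scratch directory (hypothetical path/size).
sudo mkdir -p /mnt/spark-ramdisk
sudo mount -t tmpfs -o size=16g tmpfs /mnt/spark-ramdisk

# Tell Spark to put shuffle/spill files there instead of local disk.
spark-submit \
  --conf spark.local.dir=/mnt/spark-ramdisk \
  --class com.example.MyApp \
  my-app.jar
```

Keep in mind that shuffle data held in tmpfs competes with executors for
the same physical RAM, and is lost on node failure just like disk data
on that node.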