I think for most jobs the bottleneck isn't writing shuffle data to disk,
since shuffle data also has to be repartitioned and sent across the network,
and that network transfer usually dominates.
You can always point Spark at a ramdisk yourself. Requiring a ramdisk by
default would significantly complicate configuration and platform portability.
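For example, Spark's scratch and shuffle files go wherever `spark.local.dir`
points, so directing it at a tmpfs mount such as `/dev/shm` keeps shuffle
writes in memory. A minimal sketch (assumes the nodes run Linux with a
tmpfs-backed `/dev/shm` large enough to hold the job's shuffle data):

```properties
# spark-defaults.conf -- keep shuffle/scratch files on a tmpfs-backed ramdisk.
# /dev/shm is the conventional tmpfs mount on most Linux distributions;
# shuffle data that outgrows it will fail the job rather than spill to disk.
spark.local.dir    /dev/shm/spark
```

The same setting can also be passed per job, e.g.
`spark-submit --conf spark.local.dir=/dev/shm/spark ...`, which is one reason
it is left as an opt-in rather than a default.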
On Mon, Nov 23, 2015 at 5:36 PM, huan zhang wrote:
> Hi All,
> I'm wondering why shuffle in Spark writes shuffle data to disk by
> default?
> On Stack Overflow, someone said it's for fault tolerance, but a node
> going down is the most common kind of fault, and data written to that
> node's local disk can't be recovered in that case either.
> So why not use a ramdisk by default instead of SSD or HDD only?
>
> Thanks
> Hubert Zhang
>