why does shuffle in spark write shuffle data to disk by default?

2015-11-23 Thread huan zhang
Hi All, I'm wondering why shuffle in Spark writes shuffle data to disk by default. On Stack Overflow, someone said it's used for FTS, but a node going down is the most common cause of a fault, and writing to disk cannot provide FTS in that case either. So why not use ramdisk as the default instead of

Re: why does shuffle in spark write shuffle data to disk by default?

2015-11-23 Thread Reynold Xin
I think for most jobs the bottleneck isn't in writing shuffle data to disk, since shuffle data needs to be "shuffled" and sent across the network. You can always use a ramdisk yourself. Requiring ramdisk by default would significantly complicate configuration and platform portability. On Mon,
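
Note on "use a ramdisk yourself": one common way to do this is to point Spark's scratch directory, which holds shuffle files, at a memory-backed filesystem through the spark.local.dir setting. The sketch below only builds the spark-submit arguments. The /dev/shm mount point, the spark-local subdirectory, the helper name ramdisk_conf_args, and my_job.py are assumptions for illustration, not something taken from this thread. Cluster managers may override this setting (SPARK_LOCAL_DIRS in standalone mode, LOCAL_DIRS on YARN), so treat it as a starting point rather than a recipe.

```python
def ramdisk_conf_args(ramdisk_root="/dev/shm", subdir="spark-local"):
    """Build spark-submit arguments that place Spark scratch space on a ramdisk.

    ramdisk_root and subdir are hypothetical defaults; use whatever tmpfs
    mount exists on your worker nodes.
    """
    # spark.local.dir is Spark's scratch-space setting; shuffle (map output)
    # files are written under it.
    local_dir = ramdisk_root.rstrip("/") + "/" + subdir
    return ["--conf", "spark.local.dir=" + local_dir]


if __name__ == "__main__":
    # my_job.py is a placeholder application name.
    print(" ".join(["spark-submit"] + ramdisk_conf_args() + ["my_job.py"]))
```

Anything written there counts against the node's RAM, so a large shuffle can exhaust memory where a disk-backed directory would not.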