Most shuffle files actually stay in the OS's buffer/disk cache, so they are still effectively in memory. If you are concerned about performance, you need to compare end-to-end performance holistically rather than judging one stage such as the shuffle. You could take a look at this talk:
https://spark-summit.org/2015/events/towards-benchmarking-modern-distributed-streaming-systems/

On Tue, Jul 21, 2015 at 11:57 AM, Abhishek R. Singh wrote:

> Is it fair to say that Storm stream processing is completely in memory,
> whereas spark streaming would take a disk hit because of how shuffle works?
>
> Does spark streaming try to avoid disk usage out of the box?
>
> -Abhishek-
