If you have enough memory, you can put the temporary work directory in tempfs (in memory file system).
On Wed, Jun 10, 2015 at 8:43 PM, Corey Nolet <cjno...@gmail.com> wrote: > Ok so it is the case that small shuffles can be done without hitting any > disk. Is this the same case for the aux shuffle service in yarn? Can that be > done without hitting disk? > > On Wed, Jun 10, 2015 at 9:17 PM, Patrick Wendell <pwend...@gmail.com> wrote: >> >> In many cases the shuffle will actually hit the OS buffer cache and >> not ever touch spinning disk if it is a size that is less than memory >> on the machine. >> >> - Patrick >> >> On Wed, Jun 10, 2015 at 5:06 PM, Corey Nolet <cjno...@gmail.com> wrote: >> > So with this... to help my understanding of Spark under the hood- >> > >> > Is this statement correct "When data needs to pass between multiple >> > JVMs, a >> > shuffle will always hit disk"? >> > >> > On Wed, Jun 10, 2015 at 10:11 AM, Josh Rosen <rosenvi...@gmail.com> >> > wrote: >> >> >> >> There's a discussion of this at >> >> https://github.com/apache/spark/pull/5403 >> >> >> >> >> >> >> >> On Wed, Jun 10, 2015 at 7:08 AM, Corey Nolet <cjno...@gmail.com> wrote: >> >>> >> >>> Is it possible to configure Spark to do all of its shuffling FULLY in >> >>> memory (given that I have enough memory to store all the data)? >> >>> >> >>> >> >>> >> >> >> > > > --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org