Re: Fully in-memory shuffles

Davies Liu Wed, 10 Jun 2015 21:51:43 -0700

If you have enough memory, you can put the temporary work directory on
tmpfs (an in-memory file system).
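
A minimal sketch of how one might check this on a Linux worker before
pointing Spark at it. It assumes /proc/mounts is available and that
/dev/shm is the candidate directory; the helper names fs_type and
spark_conf_flag are made up for illustration and are not part of Spark.
The only Spark-specific piece is the spark.local.dir property, which
controls where shuffle and spill files are written.

```python
import os
import shutil


def fs_type(path, mounts_text):
    """Return the filesystem type of the mount containing `path`.

    `path` should be absolute; `mounts_text` is the content of
    /proc/mounts (Linux only). Returns None if nothing matches.
    """
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, kind = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest mount point that is a prefix of the path;
        # on ties the later entry wins, since it shadows the earlier one.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, kind
    return best_type


def spark_conf_flag(path, mounts_text):
    """Return a spark-submit flag if `path` is on tmpfs, else None."""
    if fs_type(path, mounts_text) == "tmpfs":
        return "--conf spark.local.dir=" + path
    return None


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    candidate = "/dev/shm"  # assumption: usually a tmpfs mount on Linux
    if os.path.isdir(candidate):
        with open("/proc/mounts") as f:
            mounts = f.read()
        free_gib = shutil.disk_usage(candidate).free / 2**30
        print("%s: free=%.1f GiB" % (candidate, free_gib))
        print(spark_conf_flag(os.path.realpath(candidate), mounts))
```

Keep in mind that tmpfs pages count against the machine's RAM, so the
free space reported here competes with executor memory. Also, on YARN
the cluster manager's own local directories (LOCAL_DIRS) override
spark.local.dir, so there the tmpfs path would have to be configured on
the NodeManager side instead.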


On Wed, Jun 10, 2015 at 8:43 PM, Corey Nolet <cjno...@gmail.com> wrote:
> OK, so small shuffles can be done without hitting any disk. Is the same
> true for the aux shuffle service in YARN? Can that be done without
> hitting disk?
>
> On Wed, Jun 10, 2015 at 9:17 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>
>> In many cases the shuffle will actually hit the OS buffer cache and
>> never touch spinning disk if it is smaller than the memory on the
>> machine.
>>
>> - Patrick
>>
>> On Wed, Jun 10, 2015 at 5:06 PM, Corey Nolet <cjno...@gmail.com> wrote:
>> > So with this... to help my understanding of Spark under the hood-
>> >
>> > Is this statement correct: "When data needs to pass between multiple
>> > JVMs, a shuffle will always hit disk"?
>> >
>> > On Wed, Jun 10, 2015 at 10:11 AM, Josh Rosen <rosenvi...@gmail.com>
>> > wrote:
>> >>
>> >> There's a discussion of this at
>> >> https://github.com/apache/spark/pull/5403
>> >>
>> >>
>> >>
>> >> On Wed, Jun 10, 2015 at 7:08 AM, Corey Nolet <cjno...@gmail.com> wrote:
>> >>>
>> >>> Is it possible to configure Spark to do all of its shuffling FULLY in
>> >>> memory (given that I have enough memory to store all the data)?
>> >>>
>> >>>
>> >>>
>> >>
>> >
>
>

