Yes, this is very much "use at your own risk".  That said, at Yahoo we did
something very similar to this on all of the YARN nodes and saw a decent
performance uplift, even with HDFS running on the same nodes. I think we
just changed the page-cache flush time to 30 minutes, but it was a long
time ago so the details are a bit fuzzy. Spark only guarantees that the
data has been handed off to the OS. The OS should not lose the data unless
there is a bug in the OS or the entire node crashes.  If the node crashes,
all the containers will go down with it.  YARN has no way to recover a
crashed container; it just reruns it from the beginning, so it does not
matter if the data is lost when the node crashes.
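
For reference, the knobs involved here are the kernel's dirty-page
writeback sysctls. Something along these lines would push the flush delay
out to roughly 30 minutes (a sketch of the general idea only; I don't
recall our exact values, so treat the numbers as placeholders):

  # Let dirty pages age in the page cache for up to 30 minutes
  # (units are centiseconds: 180000 = 30 min; the kernel default is 3000)
  sysctl -w vm.dirty_expire_centisecs=180000
  # Wake the background writeback threads less often (default 500 = 5 sec)
  sysctl -w vm.dirty_writeback_centisecs=6000
  # Let a larger share of RAM hold dirty pages before writeback kicks in
  sysctl -w vm.dirty_background_ratio=50
  sysctl -w vm.dirty_ratio=80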

I don't remember all of the details of what we did around software
upgrades or RAID setups for the root/OS partitions. You need to be careful
if you are modifying anything that is not ephemeral on the node, but most
things should be ephemeral in production on these types of systems. You
might mess up the file system a bit on a crash and need to run fsck to
recover, but we had that automated, and most groups now run on VMs anyway,
so you can throw away the old VM and start over.  It is especially worth
it if crashes are rare and you would otherwise have a lot of memory
sitting idle.

On Fri, Aug 20, 2021 at 10:05 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Hi Bobby,
>
> On this statement of yours if I may:
>
> ... If you really want to you can configure the pagecache to not spill to
> disk until absolutely necessary. That should get you really close to pure
> in-memory processing, so long as you have enough free memory on the host to
> support it.
>
> I would not particularly recommend that. Bearing in mind that we are
> dealing with edge cases here, recovering from an error in such a case can
> be more costly than simply using disk space.
>
>
> HTH
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 20 Aug 2021 at 15:28, Bobby Evans <bo...@apache.org> wrote:
>
>> On the data path, Spark will write to a local disk when it runs out of
>> memory and needs to spill or when doing a shuffle with the default shuffle
>> implementation.  The spilling is a good thing because it lets you process
>> data that is too large to fit in memory.  It is not great because the
>> processing slows down a lot when that happens, but slow is better than
>> crashing in many cases. The default shuffle implementation will
>> always write out to disk.  This again is good in that it allows you to
>> process more data on a single box than can fit in memory. It is bad when
>> the shuffle data could fit in memory but ends up being written to disk
>> anyway.  On Linux the data is written into the page cache and will
>> be flushed to disk in the background when memory is needed or after a set
>> amount of time. If your query is fast and is shuffling little data, then it
>> is likely that your query is running all in memory.  All of the shuffle
>> reads and writes are probably going directly to the page cache and the disk
>> is not involved at all. If you really want to you can configure the
>> pagecache to not spill to disk until absolutely necessary. That should get
>> you really close to pure in-memory processing, so long as you have enough
>> free memory on the host to support it.
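>>
>> As a concrete sketch of that last point (hypothetical path and size,
>> and note this trades cluster RAM for speed): one way to keep shuffle and
>> spill files off the physical disk entirely is to point spark.local.dir
>> at a tmpfs mount.
>>
>>   # RAM-backed scratch space for Spark (example path and size)
>>   sudo mkdir -p /mnt/spark-tmpfs
>>   sudo mount -t tmpfs -o size=64g tmpfs /mnt/spark-tmpfs
>>   # have Spark put shuffle and spill files there
>>   spark-submit --conf spark.local.dir=/mnt/spark-tmpfs ...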
>>
>> Bobby
>>
>>
>>
>> On Fri, Aug 20, 2021 at 7:57 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Well, I don't know what having an "in-memory Spark only" setup is going
>>> to achieve. The Spark GUI shows the amount of disk usage pretty well, and
>>> memory is used first by default.
>>>
>>> Spark is no different from any other predominantly in-memory application.
>>> Effectively it is doing the classical disk-based Hadoop map-reduce
>>> operation "in memory" to speed up processing, but it is still an
>>> application on top of the OS.  So, like most applications, there are
>>> states of Spark, the running code, and the OS(s) where disk usage will be
>>> needed.
>>>
>>> This is akin to swap space on the OS itself, and I quote: "Swap space is used when
>>> your operating system decides that it needs physical memory for active
>>> processes and the amount of available (unused) physical memory is
>>> insufficient. When this happens, inactive pages from the physical
>>> memory are then moved into the swap space, freeing up that physical memory
>>> for other uses"
>>>
>>>  free
>>>                total        used        free      shared  buff/cache   available
>>>  Mem:       65659732    30116700     1429436     2341772    34113596    32665372
>>>  Swap:     104857596      550912   104306684
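>>>
>>> As an aside, how eagerly the kernel pushes pages out to that swap space
>>> is tunable via vm.swappiness (the usual default is 60; lower values make
>>> the kernel prefer dropping page cache over swapping anonymous memory):
>>>
>>>  sysctl vm.swappiness
>>>  sudo sysctl -w vm.swappiness=10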
>>>
>>> HTH
>>>
>>>
>>> On Fri, 20 Aug 2021 at 12:50, Jacek Laskowski <ja...@japila.pl> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've been exploring BlockManager and the stores for a while now and am
>>>> tempted to say that a memory-only Spark setup would be possible (except
>>>> shuffle blocks). Is this correct?
>>>>
>>>> What about shuffle blocks? Do they have to be stored on disk (in
>>>> DiskStore)?
>>>>
>>>> I think broadcast variables are in-memory first, so unless an on-disk
>>>> storage level is explicitly used (by Spark devs), there's no reason not
>>>> to have Spark in-memory only.
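>>>>
>>>> To make that explicit choice concrete, a sketch (assuming some rdd is in
>>>> scope; RDD.cache() already defaults to MEMORY_ONLY, while Dataset.cache()
>>>> defaults to MEMORY_AND_DISK):
>>>>
>>>>   import org.apache.spark.storage.StorageLevel
>>>>
>>>>   // Partitions that don't fit in memory are recomputed on access
>>>>   // rather than spilled, so the DiskStore never sees this cached data.
>>>>   rdd.persist(StorageLevel.MEMORY_ONLY)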
>>>>
>>>> (I was told that one of the differences between Trino/Presto and Spark
>>>> SQL is that Trino keeps all processing in-memory only and will blow up
>>>> when it runs out, while Spark uses disk to avoid OOMEs.)
>>>>
>>>> Pozdrawiam,
>>>> Jacek Laskowski
>>>> ----
>>>> https://about.me/JacekLaskowski
>>>> "The Internals Of" Online Books <https://books.japila.pl/>
>>>> Follow me on https://twitter.com/jaceklaskowski
>>>>
>>>
