I was wondering whether most OSes take the opportunity, when they have idle I/O time, to write copies of some memory blocks, like disk cache pages, out to swap, so that getting the memory back is faster if it's needed for a more active use (user programs, hotter I/O blocks, etc.). In effect, a pre-emptive strike before being forced to swap or drop cache.

><> ... Jack
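I don't know of a mainstream kernel that does exactly that pre-emptive copy, but the closest Linux knob is vm.swappiness, which biases page reclaim toward swapping out idle anonymous pages versus dropping page cache (it doesn't pre-stage cache into swap, though). A minimal check, assuming a Linux /proc filesystem:

```python
# Read the vm.swappiness tunable (Linux-specific path).
# Higher values bias the kernel toward swapping out idle anonymous
# pages rather than dropping page cache when memory is reclaimed.
with open("/proc/sys/vm/swappiness") as f:
    swappiness = int(f.read().strip())
print(swappiness)
```

The same value can be read or set with `sysctl vm.swappiness`; historically it ranges 0-100 (newer kernels accept higher values).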
On Sat, Sep 5, 2009 at 10:06 AM, Doug Hughes <[email protected]> wrote:
> Edward Ned Harvey wrote:
>>> Since swap is _extremely_ expensive to use, you don't actually want to
>>> use much, if any, in an HPC cluster.
>>
>> I know this seems counterintuitive, but I have experience to the
>> contrary. In traditional thinking, swap is slower, so you don't want to
>> use it; in modern thinking, having swap available boosts your system's
>> performance because the system can trade swap for cache.
>>
>> Here's the reasoning:
>>
>> At all times, the kernel will grow to fill the maximum available RAM
>> (buffering/caching disk reads and writes). Obviously, the more memory
>> available the better, and the less required in user space the better.
>> This means the kernel is always choosing which disk blocks to keep in
>> cache; as user processes grow, whatever it deems the least valuable
>> cached disk block is dropped from RAM.
>>
>> If you have plenty of swap available, it gives the kernel another
>> degree of freedom. The kernel now has the option to page out some idle
>> process that it deems less valuable than the cached disk blocks.
>>
>> If you run "free" or "top" on your system (assuming Linux), soon after
>> booting you'll see lots of free memory. But if your system has been up
>> for a week, you'll see zero free memory, with the rest consumed by
>> buffers.
>>
>> While there is still "free" memory available, you get no performance
>> boost from having swap (and obviously there would be no reason to
>> consume any). But after the system has been up for a long time and the
>> kernel has filled all the RAM with buffers, then you get a performance
>> boost by using swap.
>
> This is very tricky and presumes that your local disk is faster than
> your back-end storage, which is not necessarily the case.
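Ned's "zero free memory after a week" observation comes down to simple accounting: `free` counts page cache as used, even though the kernel can drop it on demand. A sketch with hypothetical numbers (all values in MB; the names mirror the /proc/meminfo counters):

```python
# Hypothetical /proc/meminfo-style counters, in MB, for a box that
# has been up a while (names mirror MemTotal, MemFree, Buffers, Cached).
mem_total = 16384
mem_free = 120        # near zero: the kernel has grown into all the RAM
buffers = 900
cached = 11000        # page cache the kernel can drop under pressure

# What `free` reports as "free" vs. what is effectively reclaimable:
reclaimable = buffers + cached
effectively_available = mem_free + reclaimable
print(effectively_available)  # 12020 with these numbers
```

With swap in the picture, the kernel gets a third choice beyond "keep cache" or "drop cache": page out an idle process and keep the hotter cached blocks instead.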
> A local disk cache can be your friend or your enemy depending on your
> job load and your architecture. If you have a big honking storage farm
> serving your HPC cluster with lots of memory, you can serve things at
> nearly wire speed.
>
> Again, this depends on many factors. In our HPC workload, swap is never
> used except for a very rare series of jobs computing force fields
> between molecules, and it's extremely painful there, so they tune their
> workload very carefully.
>
> _______________________________________________
> Tech mailing list
> [email protected]
> http://lopsa.org/cgi-bin/mailman/listinfo/tech
> This list provided by the League of Professional System Administrators
> http://lopsa.org/
