On Sat, 5 Sep 2009, Jack wrote:

I was wondering if most OSes take the opportunity, when they have
'idle I/O time', to move copies of some memory blocks, like disk caches,
out to swap, so that getting the memory back is faster if it is needed
for a 'more active' use (user programs, 'hotter' I/O blocks, etc.).
That way the OS makes a 'pre-emptive strike' before being forced to swap or drop cache.

Linux will write dirty pages of cache out to disk after a little while (the default is something like 5 seconds), so that it can throw those pages away later without having to write them to disk first if it needs the memory, and so that the data reaches permanent media and survives a crash.
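
For anyone who wants to poke at the knobs involved, the behaviour is controlled by the vm.dirty_* sysctls (exact defaults vary a bit by kernel version; this is just a sketch of where to look):

    # how often the writeback/flusher threads wake up (centiseconds; 500 = 5 seconds)
    sysctl vm.dirty_writeback_centisecs
    # how old a dirty page must be before it gets written back (3000 = 30 seconds)
    sysctl vm.dirty_expire_centisecs
    # force all dirty pages out to disk right now
    sync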

David Lang

... Jack



On Sat, Sep 5, 2009 at 10:06 AM, Doug Hughes <[email protected]> wrote:
Edward Ned Harvey wrote:
since swap is _extremely_ expensive to use, you don't actually want to
use much, if any, in an HPC cluster.



I know this seems counterintuitive, but I have experience to the
contrary.  In traditional thinking, of course, swap is slower so you don't
want to use it; but in modern thinking, having swap available boosts your
system performance because the system can trade swap for cache.

Here's the reasoning:

At all times, the kernel's cache will grow to fill the available RAM
(buffering/caching disk reads and writes).  So obviously the more memory
available, the better, and the less required in user space the better...
but this means the kernel is constantly choosing which disk blocks to
keep in cache: as user processes grow, whichever cached disk blocks are
deemed least valuable get dropped from RAM.

If you have plenty of swap available, it gives the kernel another degree of
freedom to work with.  The kernel now has the option of paging out some
idle process that it deems less valuable than the cached disk blocks.
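
On Linux, the knob that biases this trade-off is vm.swappiness (0-100; the usual default is 60, though distributions differ).  A rough sketch of inspecting and tuning it, run as root:

    # current value; higher means more willing to page out idle anonymous
    # memory in order to keep cached disk blocks
    sysctl vm.swappiness
    # temporarily favour cache over idle process pages
    sysctl -w vm.swappiness=80
    # persist the setting across reboots
    echo 'vm.swappiness = 80' >> /etc/sysctl.conf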

If you run "free" or "top" on your system (assuming Linux), soon after
booting you'll see lots of free memory.  But if your system has been up for
a week, you'll see nearly zero free memory, with the rest consumed by
buffers and cache.

While there is still "free" memory available, you get no performance boost
from having swap available (and obviously there would be no reason to
consume any swap).  But after the system has been up for a long time,
and the kernel has filled all the RAM with buffers and cache, then you get a
performance boost by using swap.
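
To make that concrete, here is roughly what "free -m" shows on a long-running box (made-up numbers purely for illustration; older procps versions split "buff/cache" into separate "buffers" and "cached" columns):

    $ free -m
                  total    used    free   shared  buff/cache   available
    Mem:          64299    2100     412      150       61786       61400
    Swap:          8191     340    7851

The "free" column is near zero, but nearly all of the "buff/cache" memory is reclaimable; "available" is the number that tells you how much a new process could actually get.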


This is very tricky, and it presumes that your local disk is faster than
your back-end storage, which is not necessarily the case. A local disk
cache can be your friend or your enemy depending on your job load and
your architecture. If you have a big honking storage farm with lots of
memory serving your HPC cluster, you can serve things at nearly wire speed.

Again, this depends upon many factors. In our HPC workload, swap is
never used except for a very rare series of jobs computing force fields
between molecules, and it's extremely painful there, so they tune their
workload very carefully.
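
A quick way to confirm whether a node is actually paging under a given job is to watch the si/so columns in vmstat (just a sanity-check habit, not anything specific to our setup):

    # report every 5 seconds; sustained nonzero si (swap-in) / so (swap-out)
    # means the job is actively paging and will hurt
    vmstat 5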


_______________________________________________
Tech mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/

