On Fri, 4 Sep 2009, Matthias Birkner wrote:

> At $work we've been having a discussion about what the right amount of
> swap is for a given amount of RAM in our standard linux image and I'm
> looking for additional input.
>
> The "old school" conventional wisdom says "swap = 2x RAM". The more
> modern conventional wisdom seems to vary from "swap = 1x RAM + 4G" to
> "swap = 4G regardless of RAM".
in part this depends heavily on the virtual memory design of the *nix system that you are using. some systems allocate a page in swap for every page of virtual address space you can ever use (including the ones that you have real memory for), and for those the total address space available to your kernel is equal to your swap size (even if it's less than your ram size). so for those, swap needs to be at least as big as ram, and bigger if you actually want to page anything out (if you had 1G of ram and 512M of swap you would only ever be able to use 512M of ram). other systems use pages of swap in addition to pages of memory, and so your total address space is swap + ram.

the current linux VM system is in the second category, so you only need as much swap as you want to allow the system to use. since swap is _extremely_ expensive to use, you don't actually want to use much, if any, in an HPC cluster.

HOWEVER, there is the issue of memory overcommit and how you choose to deal with it.

Linux frequently uses a feature called 'Copy On Write' (COW) where instead of copying a page of memory it marks the page read-only and COW, and allows multiple processes to keep accessing it. if any of the processes tries to make a change, it triggers a page fault, the kernel copies the page, and life continues. this is a HUGE win for almost all systems.

for example, if you are running firefox and it is using 1.5G of ram and you click on a pdf file, firefox downloads the file and then starts your pdf reader. to do this it first forks a copy of itself, and then executes the pdf reader. between the time that it does the fork and makes the exec call to start the pdf reader, you technically have two identical copies of firefox in ram, each needing 1.5G of ram. with COW you end up only using a few K of ram for this instead of having to really allocate and copy the 1.5G of ram.

because of this feature, you can have a lot more address space in use than you actually have memory for (with the firefox example above, COW lets you do it in 2G of ram, while without COW you would need 3.5G of ram + swap). the bad thing is that real memory can get used long after the fork or malloc completed successfully, whenever a shared page finally gets written to. that additional memory use could push you into swap or run you out of memory entirely.

by default linux allows overcommit, and if you actually run out of memory it triggers the Out Of Memory killer (OOM killer), which tries to figure out what to kill to try and keep the system running (and as with any heuristic, sometimes it works, sometimes it doesn't).

you can change this default to disable overcommit, in which case the kernel has to be able to fully back every possible COW split. if you don't have enough swap allocated to cover those splits, the system will reject the malloc, EVEN IF YOU HAVE UNUSED RAM.

so you need to either allow overcommit (which can kill processes at unexpected times when you run out of ram), or disable overcommit and have 'enough' swap (which runs the risk of pushing you into swap and bringing your system to a crawl).
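
to make the fork/exec part of that concrete, here's a rough C sketch (my own illustration, nothing from the actual firefox code; the 512M buffer and /bin/true are just stand-ins for firefox and the pdf reader). the parent allocates and touches a big buffer, forks, and the child immediately execs something else. with COW the fork itself copies almost nothing, but the kernel still has to decide whether to promise that it could back a full second copy of that buffer if both processes started writing to it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    size_t big = 512UL * 1024 * 1024;   /* pretend this is firefox's 1.5G */
    char *buf = malloc(big);
    if (buf == NULL) {
        perror("malloc");
        return 1;
    }
    memset(buf, 'x', big);              /* touch it so it is really resident */

    pid_t pid = fork();                 /* COW: no copy of buf is made here */
    if (pid == 0) {
        /* child: replace ourselves with something tiny; buf is simply dropped */
        execl("/bin/true", "true", (char *)NULL);
        _exit(127);                     /* only reached if the exec fails */
    }
    waitpid(pid, NULL, 0);
    free(buf);
    return 0;
}

watch it in top, or run it under /usr/bin/time -v, and you'll see the fork doesn't come close to doubling the resident memory.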
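and a second rough sketch (again mine, purely illustrative) of what the overcommit knob changes. the real setting is vm.overcommit_memory (0 = the default heuristic, 1 = always overcommit, 2 = don't overcommit, with the limit controlled by vm.overcommit_ratio). the program below grabs address space in chunks without ever writing to it:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (f != NULL) {
        int mode = 0;
        if (fscanf(f, "%d", &mode) == 1)
            printf("vm.overcommit_memory = %d\n", mode);
        fclose(f);
    }

    /* grab address space in 256M chunks but never write to it (so nothing
     * ever has to back it), stopping at 64G just to keep the demo bounded.
     * with overcommit on (mode 0 or 1) this usually sails far past ram +
     * swap.  with overcommit off (mode 2) the mallocs start failing once
     * the commit limit (swap + ram * overcommit_ratio%) is reached, even
     * if the machine still has plenty of free ram.  we leak on purpose;
     * the process exits right after. */
    size_t chunk = 256UL * 1024 * 1024;
    unsigned long long total = 0;       /* bytes of address space obtained */
    while (total < (64ULL << 30) && malloc(chunk) != NULL)
        total += chunk;

    printf("got %llu G of address space before malloc said no\n", total >> 30);
    return 0;
}

if you watch CommitLimit and Committed_AS in /proc/meminfo while it runs you can see the accounting happen. obviously don't do this on a production node; under mode 2 it will chew up the commit limit until it exits.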
personally, I choose to leave overcommit on, and have a small amount of swap, no matter how much ram I have. for historical reasons (since 2G used to be the limit on swap partition size), I have fallen into the habit of creating a 2G swap partition on all my systems.

If I was going to change it, I would probably shrink it down (by the time a system is using 512M to 1G of swap, it's probably slowed to unusable levels anyway, and I would just as soon have the system crash so that my clustering HA solution can kick in instead).

David Lang

> So if you're running/managing a Linux HPC cluster, or you have strong
> opinions on the subject, or you just want to comment :), I'd love to hear
> your thoughts.
>
> Some info about our environment... We have several HPC clusters scattered
> around the globe with anywhere from 100 to somewhat over 1000 systems in
> each cluster. Workload in the clusters is managed using LSF and typically
> they are configured to have one job-slot per cpu. The memory configs in
> each system range from 4G RAM up to 512G. Not sure if the OS version
> matters but in case it does, we're primarily running RHEL4u5 and starting
> a migration to RHEL5u3.
>
> Thanks much,
> Matt
>
> ===========================================================
> "If they are the pillars of our community,
>  We better keep a sharp eye on the roof."
> ===========================================================

_______________________________________________
Tech mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
