Oracle frequently recommends vm.swappiness = 0 to get well-behaved RAC nodes. Otherwise the kernel starts paging out memory you usually want to keep resident in favor of a larger filesystem cache.
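A minimal sketch for checking a node before changing the setting. It assumes a Linux host, where the live value is exposed at /proc/sys/vm/swappiness (the same number `sysctl vm.swappiness` reports). The function names are hypothetical, and actually applying the change requires root.

```python
from pathlib import Path

# Linux exposes the live value here; `sysctl vm.swappiness` reports the same number.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(text: str) -> int:
    """Parse the contents of /proc/sys/vm/swappiness (a single integer)."""
    return int(text.strip())


def read_swappiness(path: Path = SWAPPINESS_PATH) -> int:
    """Read the current vm.swappiness value (Linux only)."""
    return parse_swappiness(path.read_text())


def swappiness_advice(value: int) -> str:
    """Suggest the RAC-style setting of 0; applying it requires root."""
    if value == 0:
        return "vm.swappiness is already 0"
    # Runtime change:    sysctl -w vm.swappiness=0
    # Persistent change: add "vm.swappiness = 0" to /etc/sysctl.conf
    return f"vm.swappiness is {value}; set it to 0 with: sysctl -w vm.swappiness=0"
```

On a node, `print(swappiness_advice(read_swappiness()))` reports the current value; with the kernel default of 60 it suggests the `sysctl -w` command shown in the comments.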
There is also a vm parameter that controls the minimum size of the free list (vm.min_free_kbytes); you might want to increase that a bit. Also, look into hosting your JVM heaps on huge pages: they can't be paged out, and they will help the JVM perform better too.

On Dec 8, 2012, at 6:09 PM, Robert Dyer <rd...@iastate.edu> wrote:

> Has anyone experienced a TaskTracker/DataNode behaving like the attached
> image?
>
> This was during a MR job (which runs often). Note the extremely high System
> CPU time. Upon investigating I saw that out of 64GB ram the system had
> allocated almost 45GB to cache!
>
> I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_cache ; sync" which is
> roughly where the graph goes back to normal (much lower System, much higher
> User).
>
> This has happened a few times.
>
> I have tried playing with the sysctl vm.swappiness value (default of 60) by
> setting it to 30 (which it was at when the graph was collected) and now to
> 10. I am not sure that helps.
>
> Any ideas? Anyone else run into this before?
>
> 24 cores
> 64GB ram
> 4x2TB sata3 hdd
>
> Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb heap) on
> this machine.
>
> 24 map slots (1gb heap each), no reducers.
>
> Also running HBase 0.94.2 with a RS (8gb ram) on this machine.
> <cpu-use.png>
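For the huge-pages suggestion, a rough sizing sketch for the node described in the quoted message. It assumes 2 MB huge pages (the usual x86_64 default; check Hugepagesize in /proc/meminfo), treats the RegionServer's 8 GB as its heap, and assumes all 24 map slots may run at once. Only heap is counted, not other JVM memory. The function name is hypothetical.

```python
import math

# Assumption: 2 MB huge pages (x86_64 default; see Hugepagesize in /proc/meminfo).
HUGEPAGE_MB = 2


def hugepages_for_heaps(heaps_mb: list[int], page_mb: int = HUGEPAGE_MB) -> int:
    """Huge pages needed to back the given JVM heaps (heap only, no non-heap memory)."""
    # Round each heap up separately, since each JVM maps its own heap.
    return sum(math.ceil(h / page_mb) for h in heaps_mb)


# Heaps from the quoted message, in MB: RegionServer 8 GB, DataNode 2 GB,
# TaskTracker 2 GB, and 24 map slots at 1 GB each.
node_heaps = [8192, 2048, 2048] + [1024] * 24

pages = hugepages_for_heaps(node_heaps)
print(f"vm.nr_hugepages = {pages}")                            # vm.nr_hugepages = 18432
print(f"reserved: {pages * HUGEPAGE_MB // 1024} GB of 64 GB")  # reserved: 36 GB of 64 GB
```

Pages reserved through vm.nr_hugepages are removed from the pool used for ordinary allocations and the page cache, so on this node roughly 36 of the 64 GB would no longer compete with the cache. The HotSpot JVMs would then be started with -XX:+UseLargePages; depending on the JDK version, this may also require kernel.shmmax and vm.hugetlb_shm_group to be set.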
