On Mon, Dec 10, 2012 at 1:23 PM, Andy Isaacson <a...@cloudera.com> wrote:
> What kernel did you see this on? Was there significant swap traffic
> (si/so in vmstat output) during the high-system-time period?

It's an older kernel, Fedora 15.

Linux XXXXX 2.6.43.8-1.fc15.x86_64 #1 SMP Mon Jun 4 20:33:44 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

The next time it happens I'll take a look at the vmstat output; I do not
have that log for this last occurrence.

> BTW, you don't need to nor do you want to run sync(1) when
> manipulating drop_caches, it just causes additional noise and
> slowdown. drop_caches doesn't have any impact on correctness; it won't
> cause data loss (by dropping a dirty page or whatever). I've had sync
> calls take 10 minutes to complete, so the unnecessary impact can be
> significant.
>
> -andy
>
> On Sat, Dec 8, 2012 at 4:09 PM, Robert Dyer <rd...@iastate.edu> wrote:
> > Has anyone experienced a TaskTracker/DataNode behaving like the
> > attached image?
> >
> > This was during a MR job (which runs often). Note the extremely high
> > System CPU time. Upon investigating I saw that out of 64GB ram the
> > system had allocated almost 45GB to cache!
> >
> > I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_caches ; sync"
> > which is roughly where the graph goes back to normal (much lower
> > System, much higher User).
> >
> > This has happened a few times.
> >
> > I have tried playing with the sysctl vm.swappiness value (default of
> > 60) by setting it to 30 (which it was at when the graph was collected)
> > and now to 10. I am not sure that helps.
> >
> > Any ideas? Anyone else run into this before?
> >
> > 24 cores
> > 64GB ram
> > 4x2TB sata3 hdd
> >
> > Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb
> > heap) on this machine.
> >
> > 24 map slots (1gb heap each), no reducers.
> >
> > Also running HBase 0.94.2 with a RS (8gb ram) on this machine.

-- 
Robert Dyer
rd...@iastate.edu