Re: OOM problems with 2.6.11-rc4
On Wed, Apr 13, 2005 at 03:47:40PM +0200, Andrea Arcangeli wrote:
> On Fri, Mar 18, 2005 at 11:12:18AM -0500, Noah Meyerhans wrote:
> > Well, that's certainly an interesting question. The filesystem is IBM's
> > JFS. If you tell me that's part of the problem, I'm not likely to
> > disagree. 8^)
>
> It would be nice if you could reproduce with ext3 or reiserfs (if with
> ext3, after applying the memleak fix from Andrew that was found in this
> same thread ;). The output below makes it look like a jfs problem.
>
> 830696 830639  99%    0.80K 207674        4    830696K jfs_ip

I'll see what I can do. It may be difficult to move all the data to a
different filesystem; there are multiple terabytes in use. I'll refer
the JFS developers to this thread, too; they may be able to shed some
light on it.

Thanks.
noah

--
Noah Meyerhans                  System Administrator
MIT Computer Science and Artificial Intelligence Laboratory

signature.asc
Description: Digital signature
Re: OOM problems with 2.6.11-rc4
On Fri, Mar 18, 2005 at 11:12:18AM -0500, Noah Meyerhans wrote:
> Well, that's certainly an interesting question. The filesystem is IBM's
> JFS. If you tell me that's part of the problem, I'm not likely to
> disagree. 8^)

It would be nice if you could reproduce with ext3 or reiserfs (if with
ext3, after applying the memleak fix from Andrew that was found in this
same thread ;). The output below makes it look like a jfs problem.

830696 830639  99%    0.80K 207674        4    830696K jfs_ip

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: OOM problems with 2.6.11-rc4
Hi Andrew, Andrea, et al. Sorry for taking a while to get back to you
on this. Thanks a lot for the work you've already put into this.

We built a 2.6.11.4 kernel with Andrea's first patch for this problem
(the patch is included at the end of this mail, just to make sure you
know which one I'm referring to). We had also switched back to
overcommit mode 0. More comments follow inline...

On Tue, Mar 15, 2005 at 03:46:08PM -0800, Andrew Morton wrote:
> > Active:12382 inactive:280459 dirty:214 writeback:0 unstable:0 free:2299
> > slab:220221 mapped:12256 pagetables:122
>
> Vast amounts of slab - presumably inode and dentries.
>
> What sort of local filesystems are in use?

Well, that's certainly an interesting question. The filesystem is IBM's
JFS. If you tell me that's part of the problem, I'm not likely to
disagree. 8^)

> Can you take a copy of /proc/slabinfo when the backup has run for a while,
> send it?

We triggered a backup process, and I watched slabtop and /proc/meminfo
while it was running, right up until the time the OOM killer was
triggered. Unfortunately I didn't get a copy of slabinfo. Hopefully the
slabtop and meminfo output help a bit, though.
Here are the last three seconds' worth of /proc/meminfo:

Fri Mar 18 10:41:08 EST 2005
MemTotal:      2074660 kB
MemFree:          8492 kB
Buffers:         19552 kB
Cached:        1132916 kB
SwapCached:       3672 kB
Active:          55040 kB
Inactive:      1136024 kB
HighTotal:     1179072 kB
HighFree:          576 kB
LowTotal:       895588 kB
LowFree:          7916 kB
SwapTotal:     3615236 kB
SwapFree:      3609168 kB
Dirty:              68 kB
Writeback:           0 kB
Mapped:          43744 kB
Slab:           861952 kB
CommitLimit:   4652564 kB
Committed_AS:    53272 kB
PageTables:        572 kB
VmallocTotal:   114680 kB
VmallocUsed:      6700 kB
VmallocChunk:   107964 kB

Fri Mar 18 10:41:10 EST 2005
MemTotal:      2074660 kB
MemFree:          8236 kB
Buffers:         19512 kB
Cached:        1132884 kB
SwapCached:       3672 kB
Active:          54708 kB
Inactive:      1136288 kB
HighTotal:     1179072 kB
HighFree:          576 kB
LowTotal:       895588 kB
LowFree:          7660 kB
SwapTotal:     3615236 kB
SwapFree:      3609168 kB
Dirty:              68 kB
Writeback:           0 kB
Mapped:          43744 kB
Slab:           862216 kB
CommitLimit:   4652564 kB
Committed_AS:    53272 kB
PageTables:        572 kB
VmallocTotal:   114680 kB
VmallocUsed:      6700 kB
VmallocChunk:   107964 kB

MemTotal:      2074660 kB
MemFree:          8620 kB
Buffers:         19388 kB
Cached:        1132552 kB
SwapCached:       3780 kB
Active:          56200 kB
Inactive:      1134388 kB
HighTotal:     1179072 kB
HighFree:          960 kB
LowTotal:       895588 kB
LowFree:          7660 kB
SwapTotal:     3615236 kB
SwapFree:      3609204 kB
Dirty:             104 kB
Writeback:           0 kB
Mapped:          43572 kB
Slab:           862484 kB
CommitLimit:   4652564 kB
Committed_AS:    53100 kB
PageTables:        564 kB
VmallocTotal:   114680 kB
VmallocUsed:      6700 kB
VmallocChunk:   107964 kB

Here are the top few entries from the last page of slabtop:

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
830696 830639  99%    0.80K 207674        4    830696K jfs_ip
129675   4841   3%    0.05K   1729       75      6916K buffer_head
 39186  35588  90%    0.27K   2799       14     11196K radix_tree_node
  5983   2619  43%    0.12K    193       31       772K size-128
  4860   4728  97%    0.05K     60       81       240K journal_head
  4403   4403 100%    0.03K     37      119       148K size-32
  4164   4161  99%    1.00K   1041        4      4164K size-1024
  3857   1552  40%    0.13K    133       29       532K dentry_cache
  3355   1781  53%    0.06K     55       61       220K size-64
  3103   3026  97%    0.04K     29      107       116K sysfs_dir_cache
  2712   2412  88%    0.02K     12      226        48K dm_io
  2712   2412  88%    0.02K     12      226        48K dm_tio

> Does increasing /proc/sys/vm/vfs_cache_pressure help? If you're watching
> /proc/meminfo you should be able to observe the effect of that upon the
> Slab: figure.

It doesn't have any noticeable effect on the stability of the machine.
I set it to 1 but within a few hours the machine had crashed again. We
weren't able to capture all of the console messages prior to the crash.
Here are some of them. Note that, again, the last memory dump was
manually triggered via SysRq:

nactive:132kB present:16384kB pages_scanned:1589 all_unreclaimable? yes
lowmem_reserve[]: 0 880 2031
Normal free:3752kB min:3756kB low:4692kB high:5632kB active:9948kB
inactive:9648kB present:901120kB pages_scanned:20640 all_unreclaimable? yes
lowmem_reserve[]: 0 0 9212
HighMem free:960kB min:512kB low:640kB high:768kB active:45132kB
inactive:1125920kB present:1179136kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB
1*2048kB 0*4096kB = 3588kB
Normal: 0*4kB 1*8kB 0*16kB
Re: OOM problems with 2.6.11-rc4
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
> On Wed, Mar 16, 2005 at 04:04:35AM -0800, Andrew Morton wrote:
> > > +			if (!reclaim_state->reclaimed_slab &&
> > > +			    zone->pages_scanned >= (zone->nr_active +
> > > +						    zone->nr_inactive) * 4)
> > >  				zone->all_unreclaimable = 1;
> >
> > That might not change anything because we clear ->all_unreclaimable in
> > free_page_bulk(). [..]
>
> Really? free_page_bulk is called inside shrink_slab, and so it's overwritten
> later by all_unreclaimable. Otherwise how could all_unreclaimable be set
> in the first place if a single page freed by shrink_slab would be enough
> to clear it?
>
>	shrink_slab
>		all_unreclaimable = 0
>	zone->pages_scanned >= (zone->nr_active [..]
>		all_unreclaimable = 1
>
>	try_to_free_pages
>		all_unreclaimable == 1
>		oom

Spose so.

> I was also considering changing shrink_slab to return a progress retval,
> but then I noticed I could get away with a one-liner fix ;).
>
> Your fix is better but it should be mostly equivalent in practice. I
> liked the dontrylock not risking to go oom, the one-liner couldn't
> handle that ;).

It has a problem. If ZONE_DMA is really, really oom, kswapd will sit
there freeing up ZONE_NORMAL slab objects and not setting
all_unreclaimable. We'll end up using tons of CPU and reclaiming lots
of slab in response to a ZONE_DMA oom.

I'm thinking that the most accurate way of fixing this and also
avoiding the "we're fragmenting slab but not actually freeing pages
yet" problem is:

- change task_struct->reclaim_state so that it has an array of booleans
  (one per zone)

- in kmem_cache_free, work out which zone the object corresponds to and
  set the boolean in current->reclaim_state which corresponds to that
  zone.

- in balance_pgdat(), inspect this zone's boolean to see if we're
  making any forward progress with slab freeing.

Probably we can do the work in kmem_cache_free() at the place where we
spill the slab magazine, to optimise things a bit. I haven't looked at
it.

But that has a problem too.
Some other task might be freeing objects into the relevant zone instead
of this one.

So maybe a better approach would be to add a "someone freed something"
counter to the zone structure. That would be incremented whenever
anyone frees a page for a slab object. Then in balance_pgdat we take a
look at that before and after performing the LRU and slab scans. If it
incremented, don't set all_unreclaimable.

And still keep the free_pages_bulk code there as the code which takes
us _out_ of the all_unreclaimable state.

It's tricky.
Re: OOM problems with 2.6.11-rc4
On Wed, Mar 16, 2005 at 04:04:35AM -0800, Andrew Morton wrote:
> > +			if (!reclaim_state->reclaimed_slab &&
> > +			    zone->pages_scanned >= (zone->nr_active +
> > +						    zone->nr_inactive) * 4)
> >  				zone->all_unreclaimable = 1;
>
> That might not change anything because we clear ->all_unreclaimable in
> free_page_bulk(). [..]

Really? free_page_bulk is called inside shrink_slab, and so it's
overwritten later by all_unreclaimable. Otherwise how could
all_unreclaimable be set in the first place if a single page freed by
shrink_slab would be enough to clear it?

	shrink_slab
		all_unreclaimable = 0
	zone->pages_scanned >= (zone->nr_active [..]
		all_unreclaimable = 1

	try_to_free_pages
		all_unreclaimable == 1
		oom

I was also considering changing shrink_slab to return a progress
retval, but then I noticed I could get away with a one-liner fix ;).

Your fix is better but it should be mostly equivalent in practice. I
liked the dontrylock not risking to go oom, the one-liner couldn't
handle that ;).

thanks!
Re: OOM problems with 2.6.11-rc4
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
> This below is an untested attempt at bringing dquot a bit more in line
> with the API, to make the whole thing a bit more consistent,

Like this? (Noah, don't bother testing this one)

Fix some bugs spotted by Andrea Arcangeli <[EMAIL PROTECTED]>:

- When we added /proc/sys/vm/vfs_cache_pressure we forgot to allow it
  to tune the dquot and mbcache slabs as well.

- Reduce lock contention in shrink_dqcache_memory().

- Use dqstats.free_dquots in shrink_dqcache_memory(): this is the count
  of reclaimable objects.

Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 25-akpm/fs/dquot.c   |   12 +++++-------
 25-akpm/fs/mbcache.c |    2 +-
 2 files changed, 6 insertions(+), 8 deletions(-)

diff -puN fs/dquot.c~slab-shrinkers-use-vfs_cache_pressure fs/dquot.c
--- 25/fs/dquot.c~slab-shrinkers-use-vfs_cache_pressure	2005-03-16 04:22:01.0 -0800
+++ 25-akpm/fs/dquot.c	2005-03-16 04:27:09.0 -0800
@@ -505,14 +505,12 @@ static void prune_dqcache(int count)

 static int shrink_dqcache_memory(int nr, unsigned int gfp_mask)
 {
-	int ret;
-
-	spin_lock(&dq_list_lock);
-	if (nr)
+	if (nr) {
+		spin_lock(&dq_list_lock);
 		prune_dqcache(nr);
-	ret = dqstats.allocated_dquots;
-	spin_unlock(&dq_list_lock);
-	return ret;
+		spin_unlock(&dq_list_lock);
+	}
+	return (dqstats.free_dquots / 100) * sysctl_vfs_cache_pressure;
 }

 /*
diff -puN fs/mbcache.c~slab-shrinkers-use-vfs_cache_pressure fs/mbcache.c
--- 25/fs/mbcache.c~slab-shrinkers-use-vfs_cache_pressure	2005-03-16 04:22:01.0 -0800
+++ 25-akpm/fs/mbcache.c	2005-03-16 04:24:43.0 -0800
@@ -225,7 +225,7 @@ mb_cache_shrink_fn(int nr_to_scan, unsig
 				   e_lru_list), gfp_mask);
 	}
 out:
-	return count;
+	return (count / 100) * sysctl_vfs_cache_pressure;
 }
_
Re: OOM problems with 2.6.11-rc4
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
> -	ret = dqstats.allocated_dquots;
> +	ret = (dqstats.free_dquots / 100) * sysctl_vfs_cache_pressure;

Oh I see. Yes, using .allocated_dquots was wrong.
Re: OOM problems with 2.6.11-rc4
Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> Still, I think it would make more sense to return a success indication
> from shrink_slab() if we actually freed any slab objects. That will
> prevent us from incorrectly going all_unreclaimable if all we happen
> to be doing is increasing slab internal fragmentation.
>
> We could do that kludgily by re-polling the shrinker but it would be
> better to return a second value from all the shrinkers.

This is the kludgy version.

--- 25/mm/vmscan.c~vmscan-notice-slab-shrinking	2005-03-16 04:12:49.0 -0800
+++ 25-akpm/mm/vmscan.c	2005-03-16 04:14:02.0 -0800
@@ -180,17 +180,20 @@ EXPORT_SYMBOL(remove_shrinker);
  * `lru_pages' represents the number of on-LRU pages in all the zones which
  * are eligible for the caller's allocation attempt.  It is used for balancing
  * slab reclaim versus page reclaim.
+ *
+ * Returns the number of slab objects which we shrunk.
  */
 static int shrink_slab(unsigned long scanned, unsigned int gfp_mask,
 			unsigned long lru_pages)
 {
 	struct shrinker *shrinker;
+	int ret = 0;

 	if (scanned == 0)
 		scanned = SWAP_CLUSTER_MAX;

 	if (!down_read_trylock(&shrinker_rwsem))
-		return 0;
+		return 1;	/* Assume we'll be able to shrink next time */

 	list_for_each_entry(shrinker, &shrinker_list, list) {
 		unsigned long long delta;
@@ -209,10 +212,14 @@ static int shrink_slab(unsigned long sca
 		while (total_scan >= SHRINK_BATCH) {
 			long this_scan = SHRINK_BATCH;
 			int shrink_ret;
+			int nr_before;

+			nr_before = (*shrinker->shrinker)(0, gfp_mask);
 			shrink_ret = (*shrinker->shrinker)(this_scan, gfp_mask);
 			if (shrink_ret == -1)
 				break;
+			if (shrink_ret < nr_before)
+				ret += nr_before - shrink_ret;
 			mod_page_state(slabs_scanned, this_scan);
 			total_scan -= this_scan;

@@ -222,7 +229,7 @@ static int shrink_slab(unsigned long sca
 		shrinker->nr += total_scan;
 	}
 	up_read(&shrinker_rwsem);
-	return 0;
+	return ret;
 }

 /* Called without lock on whether page is mapped, so answer is unstable */
@@ -1077,6 +1084,7 @@ scan:
 		 */
 		for (i = 0; i <= end_zone; i++) {
 			struct zone *zone = pgdat->node_zones + i;
+			int nr_slab;

 			if (zone->present_pages == 0)
 				continue;
@@ -1098,14 +1106,15 @@ scan:
 			sc.swap_cluster_max = nr_pages? nr_pages : SWAP_CLUSTER_MAX;
 			shrink_zone(zone, &sc);
 			reclaim_state->reclaimed_slab = 0;
-			shrink_slab(sc.nr_scanned, GFP_KERNEL, lru_pages);
+			nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,
+						lru_pages);
 			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 			total_reclaimed += sc.nr_reclaimed;
 			total_scanned += sc.nr_scanned;
 			if (zone->all_unreclaimable)
 				continue;
-			if (zone->pages_scanned >= (zone->nr_active +
-							zone->nr_inactive) * 4)
+			if (nr_slab == 0 && zone->pages_scanned >=
+				(zone->nr_active + zone->nr_inactive) * 4)
 				zone->all_unreclaimable = 1;
 			/*
 			 * If we've done a decent amount of scanning and
_
Re: OOM problems with 2.6.11-rc4
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
> the VM is setting all_unreclaimable on the
> normal zone without any care about the progress we're making at freeing
> the slab.

Urgh, I didn't notice that all_unreclaimable is set.

> Beware, this is absolutely untested and it may not be enough. Perhaps
> there are more bugs in the same area (the shrink_slab itself seems
> overkill complicated for no good reason and different methods return
> random stuff, dcache returns a percentage of the free entries, dquot
> instead returns the allocated inuse entries too which makes the whole
> API look unreliable).

No, the two functions are equivalent for the default value of
vfs_cache_pressure (100) - it's not a percentage. It's just that we
forgot about the quota cache when adding the tunable. And mbcache, come
to that.

> Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>
>
> --- x/mm/vmscan.c.~1~	2005-03-14 05:02:17.0 +0100
> +++ x/mm/vmscan.c	2005-03-16 01:28:16.0 +0100
> @@ -1074,8 +1074,9 @@ scan:
>  			total_scanned += sc.nr_scanned;
>  			if (zone->all_unreclaimable)
>  				continue;
> -			if (zone->pages_scanned >= (zone->nr_active +
> -							zone->nr_inactive) * 4)

A change we made a while back effectively doubles the rate at which
pages_scanned gets incremented here (we now account for the active list
as well as the inactive list). So this should be *8 to make it more
equivalent to the old code. Not that this is likely to make much
difference.

> +			if (!reclaim_state->reclaimed_slab &&
> +			    zone->pages_scanned >= (zone->nr_active +
> +						    zone->nr_inactive) * 4)
>  				zone->all_unreclaimable = 1;

That might not change anything because we clear ->all_unreclaimable in
free_page_bulk(). Although that is behind the per-cpu-pages, so there
will be some lag. And this change will cause us to not bale out of
reclaim.

Still, I think it would make more sense to return a success indication
from shrink_slab() if we actually freed any slab objects. That will
prevent us from incorrectly going all_unreclaimable if all we happen to
be doing is increasing slab internal fragmentation.

We could do that kludgily by re-polling the shrinker but it would be
better to return a second value from all the shrinkers.

> --- x/fs/dquot.c.~1~	2005-03-08 01:02:13.0 +0100
> +++ x/fs/dquot.c	2005-03-16 01:18:19.0 +0100
> @@ -510,7 +510,7 @@ static int shrink_dqcache_memory(int nr,
>  	spin_lock(&dq_list_lock);
>  	if (nr)
>  		prune_dqcache(nr);
> -	ret = dqstats.allocated_dquots;
> +	ret = (dqstats.free_dquots / 100) * sysctl_vfs_cache_pressure;
>  	spin_unlock(&dq_list_lock);
>  	return ret;
> }

yup.
Re: OOM problems with 2.6.11-rc4
On Wed, Mar 16, 2005 at 01:31:34AM +0100, Andrea Arcangeli wrote:
> In short I think we can start by trying this fix (which has some risk,
> since now it might become harder to detect an oom condition, but I don't

Some testing shows that oom conditions are still detected fine (I
expected this but I wasn't completely sure until I tested it ;). Now
the main question is whether this is enough to fix your problem or if
there are more hidden bugs in the same area.
Re: OOM problems with 2.6.11-rc4
On Tue, Mar 15, 2005 at 03:44:13PM -0500, Noah Meyerhans wrote:
> Hello. We have a server, currently running 2.6.11-rc4, that is
> experiencing similar OOM problems to those described at
> http://groups-beta.google.com/group/fa.linux.kernel/msg/9633559fea029f6e
> and discussed further by several developers here (the summary is at
> http://www.kerneltraffic.org/kernel-traffic/kt20050212_296.html#6) We
> are running 2.6.11-rc4 because it contains the patches that Andrea
> mentioned in the kerneltraffic link. The problem was present in 2.6.10
> as well. We can try newer 2.6 kernels if it helps.

Thanks for testing the new code, but unfortunately the problem you're
facing is a different one. It's still definitely another VM bug though.

While looking after your bug I identified for sure a bug in how the VM
sets all_unreclaimable: the VM is setting all_unreclaimable on the
normal zone without any care about the progress we're making at freeing
the slab. Once all_unreclaimable is set, it's pretty much too late to
try not to go OOM. all_unreclaimable truly means OOM, so we must be
extremely careful when we set it (for sure the slab progress must be
taken into account). We also want kswapd to help us in freeing the slab
in the background instead of erroneously giving up if some slab cache
is still freeable.

Once all_unreclaimable is set, shrink_caches will stop calling
shrink_zone for anything but the lowest prio, and this will lead to
sc.nr_scanned being small, which in turn leads to shrink_slab getting a
small parameter too.

In short I think we can start by trying this fix (which has some risk,
since now it might become harder to detect an oom condition, but I
don't see many other ways to keep the slab progress in account without
major changes). Perhaps another way would be to check for
total_reclaimed < SWAP_CLUSTER_MAX, but the one I used in the patch is
much safer for your purposes (even if less safe in terms of not running
into live locks).

Beware, this is absolutely untested and it may not be enough. Perhaps
there are more bugs in the same area (the shrink_slab itself seems
overkill complicated for no good reason and different methods return
random stuff, dcache returns a percentage of the free entries, dquot
instead returns the allocated inuse entries too which makes the whole
API look unreliable).

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

--- x/mm/vmscan.c.~1~	2005-03-14 05:02:17.0 +0100
+++ x/mm/vmscan.c	2005-03-16 01:28:16.0 +0100
@@ -1074,8 +1074,9 @@ scan:
 			total_scanned += sc.nr_scanned;
 			if (zone->all_unreclaimable)
 				continue;
-			if (zone->pages_scanned >= (zone->nr_active +
-							zone->nr_inactive) * 4)
+			if (!reclaim_state->reclaimed_slab &&
+			    zone->pages_scanned >= (zone->nr_active +
+						    zone->nr_inactive) * 4)
 				zone->all_unreclaimable = 1;
 			/*
 			 * If we've done a decent amount of scanning and

This below is an untested attempt at bringing dquot a bit more in line
with the API, to make the whole thing a bit more consistent, though I
doubt you're using quotas, so it's only the above one that's going to
be interesting for you to test.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

--- x/fs/dquot.c.~1~	2005-03-08 01:02:13.0 +0100
+++ x/fs/dquot.c	2005-03-16 01:18:19.0 +0100
@@ -510,7 +510,7 @@ static int shrink_dqcache_memory(int nr,
 	spin_lock(&dq_list_lock);
 	if (nr)
 		prune_dqcache(nr);
-	ret = dqstats.allocated_dquots;
+	ret = (dqstats.free_dquots / 100) * sysctl_vfs_cache_pressure;
 	spin_unlock(&dq_list_lock);
 	return ret;
 }

Let us know if this helps in any way or not. Thanks!
Re: OOM problems with 2.6.11-rc4
Noah Meyerhans <[EMAIL PROTECTED]> wrote:
>
> Active:12382 inactive:280459 dirty:214 writeback:0 unstable:0 free:2299
> slab:220221 mapped:12256 pagetables:122

Vast amounts of slab - presumably inode and dentries.

What sort of local filesystems are in use?

Can you take a copy of /proc/slabinfo when the backup has run for a
while, send it?

It's useful to run `watch -n1 cat /proc/meminfo', see what the various
caches are doing during the operation. Also, run slabtop if you have
it. Or bloatmeter
(http://www.zip.com.au/~akpm/linux/patches/stuff/bloatmon and
http://www.zip.com.au/~akpm/linux/patches/stuff/bloatmeter).

The thing to watch for here is the internal fragmentation of the slab
caches:

	dentry_cache: 76505KB 82373KB 92.87

93% is good. Sometimes it gets much worse - very regular directory
patterns can trigger high fragmentation levels.

Does increasing /proc/sys/vm/vfs_cache_pressure help? If you're
watching /proc/meminfo you should be able to observe the effect of that
upon the Slab: figure.
Re: OOM problems with 2.6.11-rc4
On Tue, 2005-03-15 at 16:56 -0500, Sean wrote:
> On Tue, March 15, 2005 3:44 pm, Noah Meyerhans said:
> > The machine in question is a dual Xeon system with 2 GB of RAM, 3.5 GB
> > of swap, and several TB of NFS exported filesystems. One notable point
> > is that this machine has been running in overcommit mode 2
> > (/proc/sys/vm/overcommit_memory = 2) and the OOM killer is still being
> > triggered, which is allegedly not supposed to be possible according to
> > the kerneltraffic.org document above. We had been running in overcommit
> > mode 0 until about a month ago, and experienced similar OOM problems
> > then as well.
>
> We're seeing this on our dual Xeon box too, with 4 GB of RAM and 2 GB
> of swap (no NFS) using the stock RHEL 4 kernel. The only thing that
> seems to keep it from happening is setting
> /proc/sys/vm/vfs_cache_pressure to 1.

I suspect I hit this too on a smaller (UP) machine with 512MB RAM/512MB
swap while stress testing RT stuff with dbench and massively parallel
makes. The OOM seemed to trigger way before the machine filled up swap.
I dismissed it at the time, but maybe there's something there.

Lee
Re: OOM problems with 2.6.11-rc4
On Tue, March 15, 2005 3:44 pm, Noah Meyerhans said:
> Hello. We have a server, currently running 2.6.11-rc4, that is
> experiencing similar OOM problems to those described at
> http://groups-beta.google.com/group/fa.linux.kernel/msg/9633559fea029f6e
> and discussed further by several developers here (the summary is at
> http://www.kerneltraffic.org/kernel-traffic/kt20050212_296.html#6) We
> are running 2.6.11-rc4 because it contains the patches that Andrea
> mentioned in the kerneltraffic link. The problem was present in 2.6.10
> as well. We can try newer 2.6 kernels if it helps.
>
> The machine in question is a dual Xeon system with 2 GB of RAM, 3.5 GB
> of swap, and several TB of NFS exported filesystems. One notable point
> is that this machine has been running in overcommit mode 2
> (/proc/sys/vm/overcommit_memory = 2) and the OOM killer is still being
> triggered, which is allegedly not supposed to be possible according to
> the kerneltraffic.org document above. We had been running in overcommit
> mode 0 until about a month ago, and experienced similar OOM problems
> then as well.

We're seeing this on our dual Xeon box too, with 4 GB of RAM and 2 GB
of swap (no NFS) using the stock RHEL 4 kernel. The only thing that
seems to keep it from happening is setting
/proc/sys/vm/vfs_cache_pressure to 1.

Sean
OOM problems with 2.6.11-rc4
Hello. We have a server, currently running 2.6.11-rc4, that is
experiencing similar OOM problems to those described at
http://groups-beta.google.com/group/fa.linux.kernel/msg/9633559fea029f6e
and discussed further by several developers here (the summary is at
http://www.kerneltraffic.org/kernel-traffic/kt20050212_296.html#6). We
are running 2.6.11-rc4 because it contains the patches that Andrea
mentioned in the kerneltraffic link. The problem was present in 2.6.10
as well. We can try newer 2.6 kernels if it helps.

The machine in question is a dual Xeon system with 2 GB of RAM, 3.5 GB
of swap, and several TB of NFS exported filesystems. One notable point
is that this machine has been running in overcommit mode 2
(/proc/sys/vm/overcommit_memory = 2) and the OOM killer is still being
triggered, which is allegedly not supposed to be possible according to
the kerneltraffic.org document above. We had been running in overcommit
mode 0 until about a month ago, and experienced similar OOM problems
then as well.

The problem can be somewhat reliably triggered by running our backup
software on a particular filesystem. The backup software attempts to
keep the entire file list in memory, and this filesystem contains
several million files, so lots of memory is being allocated. The server
experienced these problems today and we captured the kernel output,
which is included below. Note that this machine has not used very much
swap at all, and we've never observed it completely running out of
swap.

Note that in this kernel output, the last memory dump is from the magic
SysRq key. By the time we've reached this point, the machine is
unresponsive and our next action is to trigger a sync+reboot via the
SysRq key.

File content:

057 slab:220275 mapped:12395 pagetables:118
DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:696kB
present:16384kB pages_scanned:1203 all_unreclaimable? yes
lowmem_reserve[]: 0 880 2031
Normal free:3744kB min:3756kB low:4692kB high:5632kB active:0kB
inactive:368kB present:901120kB pages_scanned:683 all_unreclaimable? yes
lowmem_reserve[]: 0 0 9212
HighMem free:896kB min:512kB low:640kB high:768kB active:50076kB
inactive:1121156kB present:1179136kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 1*4kB 2*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB
1*2048kB 0*4096kB = 3588kB
Normal: 0*4kB 10*8kB 1*16kB 2*32kB 0*64kB 0*128kB 0*256kB 1*512kB
1*1024kB 1*2048kB 0*4096kB = 3744kB
HighMem: 82*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB
0*1024kB 0*2048kB 0*4096kB = 896kB
Swap cache: add 2582, delete 2011, find 276/524, race 0+0
Free swap  = 3610572kB
Total swap = 3615236kB
Out of Memory: Killed process 1188 (exim).
oom-killer: gfp_mask=0xd0
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
Free pages:        9196kB (1856kB HighMem)
Active:12382 inactive:280459 dirty:214 writeback:0 unstable:0 free:2299
slab:220221 mapped:12256 pagetables:122
DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:736kB
present:16384kB pages_scanned:5706 all_unreclaimable? yes
lowmem_reserve[]: 0 880 2031
Normal free:3752kB min:3756kB low:4692kB high:5632kB active:0kB
inactive:368kB present:901120kB pages_scanned:6943 all_unreclaimable? yes
lowmem_reserve[]: 0 0 9212
HighMem free:1856kB min:512kB low:640kB high:768kB active:49528kB
inactive:1120732kB present:1179136kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 3*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB
1*2048kB 0*4096kB = 3588kB
Normal: 0*4kB 11*8kB 1*16kB 2*32kB 0*64kB 0*128kB 0*256kB 1*512kB
1*1024kB 1*2048kB 0*4096kB = 3752kB
HighMem: 204*4kB 36*8kB 9*16kB 3*32kB 0*64kB 0*128kB 0*256kB 1*512kB
0*1024kB 0*2048kB 0*4096kB = 1856kB
Swap cache: add 2582, delete 2011, find 276/524, race 0+0
Free swap  = 3610572kB
Total swap = 3615236kB
Out of Memory: Killed process 17905 (terad).
oom-killer: gfp_mask=0xd0
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
Free pages:       21804kB (14464kB HighMem)
Active:9243 inactive:280452 dirty:214 writeback:0 unstable:0 free:5451
slab:220222 mapped:9110 pagetables:115
DMA free:3588kB min:68kB low:84kB high:100kB active:28kB inactive:708kB
present:16384kB pages_scanned:5739 all_unreclaimable? yes
lowmem_reserve[]: 0 880 2031
Normal free:3752kB min:3756kB low:4692kB high:5632kB active:0kB
inactive:368kB present:901120kB pages_scanned:6943 all_unreclaimable? yes
lowmem_reserve[]: 0 0 9212
HighMem free:14464kB min:512kB low:640kB high:768kB active:36944kB
inactive:1120732kB present:11