Re: OOM problems with 2.6.11-rc4
On Wed, Apr 13, 2005 at 03:47:40PM +0200, Andrea Arcangeli wrote:
> On Fri, Mar 18, 2005 at 11:12:18AM -0500, Noah Meyerhans wrote:
> > Well, that's certainly an interesting question. The filesystem is IBM's
> > JFS. If you tell me that's part of the problem, I'm not likely to
> > disagree. 8^)
>
> It would be nice if you could reproduce with ext3 or reiserfs (if with
> ext3, after applying the memleak fix from Andrew that was found in this
> same thread ;). The output below makes it look like a jfs problem.
>
> 830696 830639  99%    0.80K 207674        4    830696K jfs_ip

I'll see what I can do. It may be difficult to move all the data to a
different filesystem; there are multiple terabytes in use. I'll refer
the JFS developers to this thread, too; they may be able to shed some
light on it.

Thanks.
noah

--
Noah Meyerhans                  System Administrator
MIT Computer Science and Artificial Intelligence Laboratory

signature.asc
Description: Digital signature
Re: OOM problems with 2.6.11-rc4
On Fri, Mar 18, 2005 at 11:12:18AM -0500, Noah Meyerhans wrote:
> Well, that's certainly an interesting question. The filesystem is IBM's
> JFS. If you tell me that's part of the problem, I'm not likely to
> disagree. 8^)

It would be nice if you could reproduce with ext3 or reiserfs (if with
ext3, after applying the memleak fix from Andrew that was found in this
same thread ;). The output below makes it look like a jfs problem.

830696 830639  99%    0.80K 207674        4    830696K jfs_ip

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: OOM problems with 2.6.11-rc4
Hi Andrew, Andrea, et al. Sorry for taking a while to get back to you
on this. Thanks a lot for the work you've already put into this.

We built a 2.6.11.4 kernel with Andrea's first patch for this problem
(the patch is included at the end of this mail, just to make sure you
know which one I'm referring to). We had also switched back to
overcommit mode 0. More comments follow inline...

On Tue, Mar 15, 2005 at 03:46:08PM -0800, Andrew Morton wrote:
> > Active:12382 inactive:280459 dirty:214 writeback:0 unstable:0 free:2299
> > slab:220221 mapped:12256 pagetables:122
>
> Vast amounts of slab - presumably inode and dentries.
>
> What sort of local filesystems are in use?

Well, that's certainly an interesting question. The filesystem is IBM's
JFS. If you tell me that's part of the problem, I'm not likely to
disagree. 8^)

> Can you take a copy of /proc/slabinfo when the backup has run for a while,
> send it?

We triggered a backup process, and I watched slabtop and /proc/meminfo
while it was running, right up until the time the OOM killer was
triggered. Unfortunately I didn't get a copy of slabinfo. Hopefully the
slabtop and meminfo output help a bit, though.
Here are the last three seconds' worth of /proc/meminfo:

Fri Mar 18 10:41:08 EST 2005
MemTotal:      2074660 kB
MemFree:          8492 kB
Buffers:         19552 kB
Cached:        1132916 kB
SwapCached:       3672 kB
Active:          55040 kB
Inactive:      1136024 kB
HighTotal:     1179072 kB
HighFree:          576 kB
LowTotal:       895588 kB
LowFree:          7916 kB
SwapTotal:     3615236 kB
SwapFree:      3609168 kB
Dirty:              68 kB
Writeback:           0 kB
Mapped:          43744 kB
Slab:           861952 kB
CommitLimit:   4652564 kB
Committed_AS:    53272 kB
PageTables:        572 kB
VmallocTotal:   114680 kB
VmallocUsed:      6700 kB
VmallocChunk:   107964 kB

Fri Mar 18 10:41:10 EST 2005
MemTotal:      2074660 kB
MemFree:          8236 kB
Buffers:         19512 kB
Cached:        1132884 kB
SwapCached:       3672 kB
Active:          54708 kB
Inactive:      1136288 kB
HighTotal:     1179072 kB
HighFree:          576 kB
LowTotal:       895588 kB
LowFree:          7660 kB
SwapTotal:     3615236 kB
SwapFree:      3609168 kB
Dirty:              68 kB
Writeback:           0 kB
Mapped:          43744 kB
Slab:           862216 kB
CommitLimit:   4652564 kB
Committed_AS:    53272 kB
PageTables:        572 kB
VmallocTotal:   114680 kB
VmallocUsed:      6700 kB
VmallocChunk:   107964 kB

MemTotal:      2074660 kB
MemFree:          8620 kB
Buffers:         19388 kB
Cached:        1132552 kB
SwapCached:       3780 kB
Active:          56200 kB
Inactive:      1134388 kB
HighTotal:     1179072 kB
HighFree:          960 kB
LowTotal:       895588 kB
LowFree:          7660 kB
SwapTotal:     3615236 kB
SwapFree:      3609204 kB
Dirty:             104 kB
Writeback:           0 kB
Mapped:          43572 kB
Slab:           862484 kB
CommitLimit:   4652564 kB
Committed_AS:    53100 kB
PageTables:        564 kB
VmallocTotal:   114680 kB
VmallocUsed:      6700 kB
VmallocChunk:   107964 kB

Here are the top few entries from the last page of slabtop:

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
830696 830639  99%    0.80K 207674        4    830696K jfs_ip
129675   4841   3%    0.05K   1729       75      6916K buffer_head
 39186  35588  90%    0.27K   2799       14     11196K radix_tree_node
  5983   2619  43%    0.12K    193       31       772K size-128
  4860   4728  97%    0.05K     60       81       240K journal_head
  4403   4403 100%    0.03K     37      119       148K size-32
  4164   4161  99%    1.00K   1041        4      4164K size-1024
  3857   1552  40%    0.13K    133       29       532K dentry_cache
  3355   1781  53%    0.06K     55       61       220K size-64
  3103   3026  97%    0.04K     29      107       116K sysfs_dir_cache
  2712   2412  88%    0.02K     12      226        48K dm_io
  2712   2412  88%    0.02K     12      226        48K dm_tio

> Does increasing /proc/sys/vm/vfs_cache_pressure help? If you're watching
> /proc/meminfo you should be able to observe the effect of that upon the
> Slab: figure.

It doesn't have any noticeable effect on the stability of the machine.
I set it to 1 but within a few hours the machine had crashed again. We
weren't able to capture all of the console messages prior to the crash.
Here are some of them. Note that, again, the last memory dump was
manually triggered via SysRq:

nactive:132kB present:16384kB pages_scanned:1589 all_unreclaimable? yes
lowmem_reserve[]: 0 880 2031
Normal free:3752kB min:3756kB low:4692kB high:5632kB active:9948kB
inactive:9648kB present:901120kB pages_scanned:20640 all_unreclaimable? yes
lowmem_reserve[]: 0 0 9212
HighMem free:960kB min:512kB low:640kB high:768kB active:45132kB
inactive:1125920kB present:1179136kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB
1*2048kB 0*4096kB = 3588kB
Normal: 0*4kB 1*8kB 0*16kB
Re: OOM problems with 2.6.11-rc4
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
> On Wed, Mar 16, 2005 at 04:04:35AM -0800, Andrew Morton wrote:
> > > +			if (!reclaim_state->reclaimed_slab &&
> > > +			    zone->pages_scanned >= (zone->nr_active +
> > > +						    zone->nr_inactive) * 4)
> > >  				zone->all_unreclaimable = 1;
> >
> > That might not change anything because we clear ->all_unreclaimable in
> > free_page_bulk(). [..]
>
> Really? free_page_bulk is called inside shrink_slab, and so it's overwritten
> later by all_unreclaimable. Otherwise how could all_unreclaimable be set
> in the first place if a single page freed by shrink_slab would be enough
> to clear it?
>
>	shrink_slab
>		all_unreclaimable = 0
>	zone->pages_scanned >= (zone->nr_active [..]
>		all_unreclaimable = 1
>
>	try_to_free_pages
>		all_unreclaimable == 1
>		oom

Spose so.

> I was also considering changing shrink_slab to return a progress retval,
> but then I noticed I could get away with a one-liner fix ;).
>
> Your fix is better but it should be mostly equivalent in practice. I
> liked the dontrylock not risking to go oom, the one-liner couldn't
> handle that ;).

It has a problem. If ZONE_DMA is really, really oom, kswapd will sit
there freeing up ZONE_NORMAL slab objects and not setting
all_unreclaimable. We'll end up using tons of CPU and reclaiming lots
of slab in response to a ZONE_DMA oom.

I'm thinking that the most accurate way of fixing this and also
avoiding the "we're fragmenting slab but not actually freeing pages
yet" problem is:

- change task_struct->reclaim_state so that it has an array of booleans
  (one per zone)

- in kmem_cache_free, work out which zone the object corresponds to and
  set the boolean in current->reclaim_state which corresponds to that
  zone.

- in balance_pgdat(), inspect this zone's boolean to see if we're
  making any forward progress with slab freeing.

Probably we can do the work in kmem_cache_free() at the place where we
spill the slab magazine, to optimise things a bit. I haven't looked at
it.

But that has a problem too.
Some other task might be freeing objects into the relevant zone instead
of this one.

So maybe a better approach would be to add a "someone freed something"
counter to the zone structure. That would be incremented whenever
anyone frees a page for a slab object. Then in balance_pgdat we take a
look at that before and after performing the LRU and slab scans. If it
incremented, don't set all_unreclaimable.

And still keep the free_pages_bulk code there as the code which takes
us _out_ of the all_unreclaimable state.

It's tricky.
Re: OOM problems with 2.6.11-rc4
On Wed, Mar 16, 2005 at 04:04:35AM -0800, Andrew Morton wrote:
> > +			if (!reclaim_state->reclaimed_slab &&
> > +			    zone->pages_scanned >= (zone->nr_active +
> > +						    zone->nr_inactive) * 4)
> >  				zone->all_unreclaimable = 1;
>
> That might not change anything because we clear ->all_unreclaimable in
> free_page_bulk(). [..]

Really? free_page_bulk is called inside shrink_slab, and so it's
overwritten later by all_unreclaimable. Otherwise how could
all_unreclaimable be set in the first place if a single page freed by
shrink_slab would be enough to clear it?

	shrink_slab
		all_unreclaimable = 0
	zone->pages_scanned >= (zone->nr_active [..]
		all_unreclaimable = 1

	try_to_free_pages
		all_unreclaimable == 1
		oom

I was also considering changing shrink_slab to return a progress
retval, but then I noticed I could get away with a one-liner fix ;).

Your fix is better but it should be mostly equivalent in practice. I
liked the dontrylock not risking to go oom, the one-liner couldn't
handle that ;).

thanks!
Re: OOM problems with 2.6.11-rc4
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
> This below is an untested attempt at bringing dquot a bit more in line
> with the API, to make the whole thing a bit more consistent,

Like this? (Noah, don't bother testing this one)

Fix some bugs spotted by Andrea Arcangeli <[EMAIL PROTECTED]>:

- When we added /proc/sys/vm/vfs_cache_pressure we forgot to allow it
  to tune the dquot and mbcache slabs as well.

- Reduce lock contention in shrink_dqcache_memory().

- Use dqstats.free_dquots in shrink_dqcache_memory(): this is the count
  of reclaimable objects.

Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 25-akpm/fs/dquot.c   |   12 +++++-------
 25-akpm/fs/mbcache.c |    2 +-
 2 files changed, 6 insertions(+), 8 deletions(-)

diff -puN fs/dquot.c~slab-shrinkers-use-vfs_cache_pressure fs/dquot.c
--- 25/fs/dquot.c~slab-shrinkers-use-vfs_cache_pressure	2005-03-16 04:22:01.0 -0800
+++ 25-akpm/fs/dquot.c	2005-03-16 04:27:09.0 -0800
@@ -505,14 +505,12 @@ static void prune_dqcache(int count)

 static int shrink_dqcache_memory(int nr, unsigned int gfp_mask)
 {
-	int ret;
-
-	spin_lock(&dq_list_lock);
-	if (nr)
+	if (nr) {
+		spin_lock(&dq_list_lock);
 		prune_dqcache(nr);
-	ret = dqstats.allocated_dquots;
-	spin_unlock(&dq_list_lock);
-	return ret;
+		spin_unlock(&dq_list_lock);
+	}
+	return (dqstats.free_dquots / 100) * sysctl_vfs_cache_pressure;
 }

 /*
diff -puN fs/mbcache.c~slab-shrinkers-use-vfs_cache_pressure fs/mbcache.c
--- 25/fs/mbcache.c~slab-shrinkers-use-vfs_cache_pressure	2005-03-16 04:22:01.0 -0800
+++ 25-akpm/fs/mbcache.c	2005-03-16 04:24:43.0 -0800
@@ -225,7 +225,7 @@ mb_cache_shrink_fn(int nr_to_scan, unsig
 				   e_lru_list), gfp_mask);
 	}
 out:
-	return count;
+	return (count / 100) * sysctl_vfs_cache_pressure;
 }
_
Re: OOM problems with 2.6.11-rc4
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
> -	ret = dqstats.allocated_dquots;
> +	ret = (dqstats.free_dquots / 100) * sysctl_vfs_cache_pressure;

Oh I see. Yes, using .allocated_dquots was wrong.
Re: OOM problems with 2.6.11-rc4
Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> Still, I think it would make more sense to return a success indication
> from shrink_slab() if we actually freed any slab objects. That will
> prevent us from incorrectly going all_unreclaimable if all we happen
> to be doing is increasing slab internal fragmentation.
>
> We could do that kludgily by re-polling the shrinker but it would be
> better to return a second value from all the shrinkers.

This is the kludgy version.

--- 25/mm/vmscan.c~vmscan-notice-slab-shrinking	2005-03-16 04:12:49.0 -0800
+++ 25-akpm/mm/vmscan.c	2005-03-16 04:14:02.0 -0800
@@ -180,17 +180,20 @@ EXPORT_SYMBOL(remove_shrinker);
  * `lru_pages' represents the number of on-LRU pages in all the zones which
  * are eligible for the caller's allocation attempt.  It is used for balancing
  * slab reclaim versus page reclaim.
+ *
+ * Returns the number of slab objects which we shrunk.
  */
 static int shrink_slab(unsigned long scanned, unsigned int gfp_mask,
 			unsigned long lru_pages)
 {
 	struct shrinker *shrinker;
+	int ret = 0;

 	if (scanned == 0)
 		scanned = SWAP_CLUSTER_MAX;

 	if (!down_read_trylock(&shrinker_rwsem))
-		return 0;
+		return 1;	/* Assume we'll be able to shrink next time */

 	list_for_each_entry(shrinker, &shrinker_list, list) {
 		unsigned long long delta;
@@ -209,10 +212,14 @@ static int shrink_slab(unsigned long sca
 		while (total_scan >= SHRINK_BATCH) {
 			long this_scan = SHRINK_BATCH;
 			int shrink_ret;
+			int nr_before;

+			nr_before = (*shrinker->shrinker)(0, gfp_mask);
 			shrink_ret = (*shrinker->shrinker)(this_scan, gfp_mask);
 			if (shrink_ret == -1)
 				break;
+			if (shrink_ret < nr_before)
+				ret += nr_before - shrink_ret;
 			mod_page_state(slabs_scanned, this_scan);
 			total_scan -= this_scan;

@@ -222,7 +229,7 @@ static int shrink_slab(unsigned long sca
 		shrinker->nr += total_scan;
 	}
 	up_read(&shrinker_rwsem);
-	return 0;
+	return ret;
 }

 /* Called without lock on whether page is mapped, so answer is unstable */
@@ -1077,6 +1084,7 @@ scan:
 		 */
 		for (i = 0; i <= end_zone; i++) {
 			struct zone *zone = pgdat->node_zones + i;
+			int nr_slab;

 			if (zone->present_pages == 0)
 				continue;
@@ -1098,14 +1106,15 @@ scan:
 			sc.swap_cluster_max = nr_pages? nr_pages : SWAP_CLUSTER_MAX;
 			shrink_zone(zone, &sc);
 			reclaim_state->reclaimed_slab = 0;
-			shrink_slab(sc.nr_scanned, GFP_KERNEL, lru_pages);
+			nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,
+						lru_pages);
 			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 			total_reclaimed += sc.nr_reclaimed;
 			total_scanned += sc.nr_scanned;
 			if (zone->all_unreclaimable)
 				continue;
-			if (zone->pages_scanned >= (zone->nr_active +
-							zone->nr_inactive) * 4)
+			if (nr_slab == 0 && zone->pages_scanned >=
+				(zone->nr_active + zone->nr_inactive) * 4)
 				zone->all_unreclaimable = 1;
 			/*
 			 * If we've done a decent amount of scanning and
_
Re: OOM problems with 2.6.11-rc4
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
> the VM is setting all_unreclaimable on the
> normal zone without any care about the progress we're making at freeing
> the slab.

Urgh, I didn't notice that all_unreclaimable is set.

> Beware, this is absolutely untested and it may not be enough. Perhaps
> there are more bugs in the same area (the shrink_slab itself seems
> overkill complicated for no good reason and different methods return
> random stuff, dcache returns a percentage of the free entries, dquot
> instead returns the allocated inuse entries too which makes the whole
> API look unreliable).

No, the two functions are equivalent for the default value of
vfs_cache_pressure (100) - it's not a percentage. It's just that we
forgot about the quota cache when adding the tunable. And mbcache, come
to that.

> Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>
>
> --- x/mm/vmscan.c.~1~	2005-03-14 05:02:17.0 +0100
> +++ x/mm/vmscan.c	2005-03-16 01:28:16.0 +0100
> @@ -1074,8 +1074,9 @@ scan:
>  			total_scanned += sc.nr_scanned;
>  			if (zone->all_unreclaimable)
>  				continue;
> -			if (zone->pages_scanned >= (zone->nr_active +
> -							zone->nr_inactive) * 4)

A change we made a while back effectively doubles the rate at which
pages_scanned gets incremented here (we now account for the active list
as well as the inactive list). So this should be *8 to make it more
equivalent to the old code. Not that this is likely to make much
difference.

> +			if (!reclaim_state->reclaimed_slab &&
> +			    zone->pages_scanned >= (zone->nr_active +
> +						    zone->nr_inactive) * 4)
>  				zone->all_unreclaimable = 1;

That might not change anything because we clear ->all_unreclaimable in
free_page_bulk(). Although that is behind the per-cpu-pages, so there
will be some lag. And this change will cause us to not bale out of
reclaim.

Still, I think it would make more sense to return a success indication
from shrink_slab() if we actually freed any slab objects. That will
prevent us from incorrectly going all_unreclaimable if all we happen to
be doing is increasing slab internal fragmentation.

We could do that kludgily by re-polling the shrinker but it would be
better to return a second value from all the shrinkers.

> --- x/fs/dquot.c.~1~	2005-03-08 01:02:13.0 +0100
> +++ x/fs/dquot.c	2005-03-16 01:18:19.0 +0100
> @@ -510,7 +510,7 @@ static int shrink_dqcache_memory(int nr,
>  	spin_lock(&dq_list_lock);
>  	if (nr)
>  		prune_dqcache(nr);
> -	ret = dqstats.allocated_dquots;
> +	ret = (dqstats.free_dquots / 100) * sysctl_vfs_cache_pressure;
>  	spin_unlock(&dq_list_lock);
>  	return ret;
> }

yup.
Re: OOM problems with 2.6.11-rc4
On Wed, Mar 16, 2005 at 01:31:34AM +0100, Andrea Arcangeli wrote:
> In short I think we can start by trying this fix (which has some risk,
> since now it might become harder to detect an oom condition, but I don't

Some testing shows that oom conditions are still detected fine (I
expected this but I wasn't completely sure until I tested it ;). Now
the main question is whether this is enough to fix your problem or if
there are more hidden bugs in the same area.
Re: OOM problems with 2.6.11-rc4
On Tue, Mar 15, 2005 at 03:44:13PM -0500, Noah Meyerhans wrote:
> Hello. We have a server, currently running 2.6.11-rc4, that is
> experiencing similar OOM problems to those described at
> http://groups-beta.google.com/group/fa.linux.kernel/msg/9633559fea029f6e
> and discussed further by several developers here (the summary is at
> http://www.kerneltraffic.org/kernel-traffic/kt20050212_296.html#6) We
> are running 2.6.11-rc4 because it contains the patches that Andrea
> mentioned in the kerneltraffic link. The problem was present in 2.6.10
> as well. We can try newer 2.6 kernels if it helps.

Thanks for testing the new code, but unfortunately the problem you're
facing is a different one. It's still definitely another VM bug though.

While looking after your bug I identified for sure a bug in how the VM
sets all_unreclaimable: the VM is setting all_unreclaimable on the
normal zone without any care about the progress we're making at freeing
the slab. Once all_unreclaimable is set, it's pretty much too late to
try not to go OOM. all_unreclaimable truly means OOM, so we must be
extremely careful when we set it (for sure the slab progress must be
taken into account). We also want kswapd to help us in freeing the slab
in the background instead of erroneously giving up if some slab cache
is still freeable.

Once all_unreclaimable is set, shrink_caches will stop calling
shrink_zone for anything but the lowest prio, and this will lead to
sc.nr_scanned being small, which in turn leads to shrink_slab getting a
small parameter too.

In short I think we can start by trying this fix (which has some risk,
since now it might become harder to detect an oom condition, but I
don't see many other ways to keep the slab progress in account without
major changes). Perhaps another way would be to check for
total_reclaimed < SWAP_CLUSTER_MAX, but the one I used in the patch is
much safer for your purposes (even if less safe in terms of not running
into live locks).

Beware, this is absolutely untested and it may not be enough. Perhaps
there are more bugs in the same area (the shrink_slab itself seems
overkill complicated for no good reason and different methods return
random stuff, dcache returns a percentage of the free entries, dquot
instead returns the allocated inuse entries too which makes the whole
API look unreliable).

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

--- x/mm/vmscan.c.~1~	2005-03-14 05:02:17.0 +0100
+++ x/mm/vmscan.c	2005-03-16 01:28:16.0 +0100
@@ -1074,8 +1074,9 @@ scan:
 			total_scanned += sc.nr_scanned;
 			if (zone->all_unreclaimable)
 				continue;
-			if (zone->pages_scanned >= (zone->nr_active +
-							zone->nr_inactive) * 4)
+			if (!reclaim_state->reclaimed_slab &&
+			    zone->pages_scanned >= (zone->nr_active +
+						    zone->nr_inactive) * 4)
 				zone->all_unreclaimable = 1;
 			/*
 			 * If we've done a decent amount of scanning and

This below is an untested attempt at bringing dquot a bit more in line
with the API, to make the whole thing a bit more consistent, though I
doubt you're using quotas, so it's only the above one that's going to
be interesting for you to test.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

--- x/fs/dquot.c.~1~	2005-03-08 01:02:13.0 +0100
+++ x/fs/dquot.c	2005-03-16 01:18:19.0 +0100
@@ -510,7 +510,7 @@ static int shrink_dqcache_memory(int nr,
 	spin_lock(&dq_list_lock);
 	if (nr)
 		prune_dqcache(nr);
-	ret = dqstats.allocated_dquots;
+	ret = (dqstats.free_dquots / 100) * sysctl_vfs_cache_pressure;
 	spin_unlock(&dq_list_lock);
 	return ret;
 }

Let us know if this helps in any way or not. Thanks!
Re: OOM problems with 2.6.11-rc4
Noah Meyerhans <[EMAIL PROTECTED]> wrote:
>
> Active:12382 inactive:280459 dirty:214 writeback:0 unstable:0 free:2299
> slab:220221 mapped:12256 pagetables:122

Vast amounts of slab - presumably inode and dentries.

What sort of local filesystems are in use?

Can you take a copy of /proc/slabinfo when the backup has run for a
while, send it?

It's useful to run `watch -n1 cat /proc/meminfo', see what the various
caches are doing during the operation. Also, run slabtop if you have
it. Or bloatmeter
(http://www.zip.com.au/~akpm/linux/patches/stuff/bloatmon and
http://www.zip.com.au/~akpm/linux/patches/stuff/bloatmeter).

The thing to watch for here is the internal fragmentation of the slab
caches:

	dentry_cache: 76505KB 82373KB 92.87

93% is good. Sometimes it gets much worse - very regular directory
patterns can trigger high fragmentation levels.

Does increasing /proc/sys/vm/vfs_cache_pressure help? If you're
watching /proc/meminfo you should be able to observe the effect of that
upon the Slab: figure.
Re: OOM problems with 2.6.11-rc4
On Tue, 2005-03-15 at 16:56 -0500, Sean wrote:
> On Tue, March 15, 2005 3:44 pm, Noah Meyerhans said:
> > The machine in question is a dual Xeon system with 2 GB of RAM, 3.5 GB
> > of swap, and several TB of NFS exported filesystems. One notable point
> > is that this machine has been running in overcommit mode 2
> > (/proc/sys/vm/overcommit_memory = 2) and the OOM killer is still being
> > triggered, which is allegedly not supposed to be possible according to
> > the kerneltraffic.org document above. We had been running in overcommit
> > mode 0 until about a month ago, and experienced similar OOM problems
> > then as well.
>
> We're seeing this on our dual Xeon box too, with 4 GB of RAM and 2 GB
> of swap (no NFS) using the stock RHEL 4 kernel. The only thing that
> seems to keep it from happening is setting
> /proc/sys/vm/vfs_cache_pressure to 1.

I suspect I hit this too on a smaller (UP) machine with 512MB RAM/512MB
swap while stress testing RT stuff with dbench and massively parallel
makes. The OOM seemed to trigger way before the machine filled up swap.
I dismissed it at the time, but maybe there's something there.

Lee
Re: OOM problems with 2.6.11-rc4
On Tue, March 15, 2005 3:44 pm, Noah Meyerhans said:
> Hello. We have a server, currently running 2.6.11-rc4, that is
> experiencing similar OOM problems to those described at
> http://groups-beta.google.com/group/fa.linux.kernel/msg/9633559fea029f6e
> and discussed further by several developers here (the summary is at
> http://www.kerneltraffic.org/kernel-traffic/kt20050212_296.html#6) We
> are running 2.6.11-rc4 because it contains the patches that Andrea
> mentioned in the kerneltraffic link. The problem was present in 2.6.10
> as well. We can try newer 2.6 kernels if it helps.
>
> The machine in question is a dual Xeon system with 2 GB of RAM, 3.5 GB
> of swap, and several TB of NFS exported filesystems. One notable point
> is that this machine has been running in overcommit mode 2
> (/proc/sys/vm/overcommit_memory = 2) and the OOM killer is still being
> triggered, which is allegedly not supposed to be possible according to
> the kerneltraffic.org document above. We had been running in overcommit
> mode 0 until about a month ago, and experienced similar OOM problems
> then as well.

We're seeing this on our dual Xeon box too, with 4 GB of RAM and 2 GB
of swap (no NFS) using the stock RHEL 4 kernel. The only thing that
seems to keep it from happening is setting
/proc/sys/vm/vfs_cache_pressure to 1.

Sean
OOM problems with 2.6.11-rc4
Hello. We have a server, currently running 2.6.11-rc4, that is
experiencing similar OOM problems to those described at
http://groups-beta.google.com/group/fa.linux.kernel/msg/9633559fea029f6e
and discussed further by several developers here (the summary is at
http://www.kerneltraffic.org/kernel-traffic/kt20050212_296.html#6). We
are running 2.6.11-rc4 because it contains the patches that Andrea
mentioned in the kerneltraffic link. The problem was present in 2.6.10
as well. We can try newer 2.6 kernels if it helps.

The machine in question is a dual Xeon system with 2 GB of RAM, 3.5 GB
of swap, and several TB of NFS exported filesystems. One notable point
is that this machine has been running in overcommit mode 2
(/proc/sys/vm/overcommit_memory = 2) and the OOM killer is still being
triggered, which is allegedly not supposed to be possible according to
the kerneltraffic.org document above. We had been running in overcommit
mode 0 until about a month ago, and experienced similar OOM problems
then as well.

The problem can be somewhat reliably triggered by running our backup
software on a particular filesystem. The backup software attempts to
keep the entire file list in memory, and this filesystem contains
several million files, so lots of memory is being allocated. The server
experienced these problems today and we captured the kernel output,
which is included below. Note that this machine has not used very much
swap at all, and we've never observed it completely running out of
swap.

Note that in this kernel output, the last memory dump is from the magic
SysRq key. By the time we've reached this point, the machine is
unresponsive and our next action is to trigger a sync+reboot via the
SysRq key.

File content:

057 slab:220275 mapped:12395 pagetables:118
DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:696kB
present:16384kB pages_scanned:1203 all_unreclaimable? yes
lowmem_reserve[]: 0 880 2031
Normal free:3744kB min:3756kB low:4692kB high:5632kB active:0kB
inactive:368kB present:901120kB pages_scanned:683 all_unreclaimable? yes
lowmem_reserve[]: 0 0 9212
HighMem free:896kB min:512kB low:640kB high:768kB active:50076kB
inactive:1121156kB present:1179136kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 1*4kB 2*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB
1*2048kB 0*4096kB = 3588kB
Normal: 0*4kB 10*8kB 1*16kB 2*32kB 0*64kB 0*128kB 0*256kB 1*512kB
1*1024kB 1*2048kB 0*4096kB = 3744kB
HighMem: 82*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB
0*1024kB 0*2048kB 0*4096kB = 896kB
Swap cache: add 2582, delete 2011, find 276/524, race 0+0
Free swap  = 3610572kB
Total swap = 3615236kB
Out of Memory: Killed process 1188 (exim).
oom-killer: gfp_mask=0xd0
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
Free pages:        9196kB (1856kB HighMem)
Active:12382 inactive:280459 dirty:214 writeback:0 unstable:0 free:2299
slab:220221 mapped:12256 pagetables:122
DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:736kB
present:16384kB pages_scanned:5706 all_unreclaimable? yes
lowmem_reserve[]: 0 880 2031
Normal free:3752kB min:3756kB low:4692kB high:5632kB active:0kB
inactive:368kB present:901120kB pages_scanned:6943 all_unreclaimable? yes
lowmem_reserve[]: 0 0 9212
HighMem free:1856kB min:512kB low:640kB high:768kB active:49528kB
inactive:1120732kB present:1179136kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 3*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB
1*2048kB 0*4096kB = 3588kB
Normal: 0*4kB 11*8kB 1*16kB 2*32kB 0*64kB 0*128kB 0*256kB 1*512kB
1*1024kB 1*2048kB 0*4096kB = 3752kB
HighMem: 204*4kB 36*8kB 9*16kB 3*32kB 0*64kB 0*128kB 0*256kB 1*512kB
0*1024kB 0*2048kB 0*4096kB = 1856kB
Swap cache: add 2582, delete 2011, find 276/524, race 0+0
Free swap  = 3610572kB
Total swap = 3615236kB
Out of Memory: Killed process 17905 (terad).
oom-killer: gfp_mask=0xd0
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
Free pages:       21804kB (14464kB HighMem)
Active:9243 inactive:280452 dirty:214 writeback:0 unstable:0 free:5451
slab:220222 mapped:9110 pagetables:115
DMA free:3588kB min:68kB low:84kB high:100kB active:28kB inactive:708kB
present:16384kB pages_scanned:5739 all_unreclaimable? yes
lowmem_reserve[]: 0 880 2031
Normal free:3752kB min:3756kB low:4692kB high:5632kB active:0kB
inactive:368kB present:901120kB pages_scanned:6943 all_unreclaimable? yes
lowmem_reserve[]: 0 0 9212
HighMem free:14464kB min:512kB low:640kB high:768kB active:36944kB
inactive:1120732kB present:11