Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-13 Thread Robert Hancock

Zhao Forrest wrote:

These 2 kernel options are turned on by default in my kernel. Here's
snip from .config
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
CONFIG_NUMA=y
CONFIG_K8_NUMA=y



Does this fix it?

--- fs/buffer.c~2007-02-01 12:00:34.0 +0100
+++ fs/buffer.c 2007-04-11 12:35:48.0 +0200
@@ -3029,6 +3029,8 @@ out:
struct buffer_head *next = bh->b_this_page;
free_buffer_head(bh);
bh = next;
+
+   cond_resched();
} while (bh != buffers_to_free);
}
return ret;


So far I have run the test with the patched kernel for 6 rounds and
didn't see the soft lockup. I think this patch should fix the problem.
But what still confuses me is: why do we need to invoke cond_resched()
explicitly when CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT_BKL are
both turned on? From my understanding these two options should make
scheduling happen even when the CPU is under heavy load.


No, only CONFIG_PREEMPT will do that.
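
A minimal sketch of why, paraphrasing the 2.6.18-era definitions rather
than quoting them verbatim:

/*
 * CONFIG_PREEMPT_VOLUNTARY only turns the existing might_sleep()
 * annotations into reschedule points; it adds no new ones:
 */
#ifdef CONFIG_PREEMPT_VOLUNTARY
# define might_resched() cond_resched()
#else
# define might_resched() do { } while (0)
#endif

/*
 * CONFIG_PREEMPT_BKL only makes the big kernel lock sleepable, so BKL
 * holders can be scheduled away; it does not preempt other kernel code.
 * Only CONFIG_PREEMPT arms involuntary preemption: the timer interrupt
 * sets TIF_NEED_RESCHED and the interrupt-return path reschedules as
 * soon as preempt_count() drops to zero.  The buffer_head free loop
 * above contains neither might_sleep() nor cond_resched(), so with only
 * VOLUNTARY it runs to completion before anything else gets the CPU.
 */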

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-12 Thread Peter Zijlstra
On Thu, 2007-04-12 at 00:55 -0700, Andrew Morton wrote:
> On Thu, 12 Apr 2007 09:39:25 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> 
> > On Wed, 2007-04-11 at 15:30 -0700, Andrew Morton wrote:
> > 
> > > There used to be a cond_resched() in invalidate_mapping_pages() which 
> > > would
> > > have prevented this, but I rudely removed it to support
> > > /proc/sys/vm/drop_caches (which needs to call invalidate_inode_pages()
> > > under spinlock).
> > > 
> > > We could resurrect that cond_resched() by passing in some flag, I guess. 
> > > Or change the code to poke the softlockup detector.  The former would be
> > > better.
> > 
> > cond_resched() is conditional on __resched_legal(0)
> 
> What's __resched_legal()?

Hmm, I was looking at the RHEL-5 sources; it seems this function was
removed later (commit 9414232fa0cc28e2f51b8c76d260f2748f7953fc).

> > which should take
> > care of being called under a spinlock.
> 
> We only increment preempt_count() in spin_lock() if CONFIG_PREEMPT.

Right, ok, that makes the whole thing fall apart indeed.
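
For reference, the check in question looks roughly like this in the
RHEL-5 / 2.6.18-era scheduler (paraphrased, not verbatim source):

static inline int __resched_legal(int expected_preempt_count)
{
        if (unlikely(preempt_count() != expected_preempt_count))
                return 0;
        if (unlikely(system_state != SYSTEM_RUNNING))
                return 0;
        return 1;
}

/*
 * The catch Andrew points out: with !CONFIG_PREEMPT, spin_lock() does
 * not touch preempt_count(), so a caller holding a spinlock (the
 * drop_caches path) still sees preempt_count() == 0, __resched_legal(0)
 * passes, and cond_resched() can schedule away with the lock held.  The
 * check only "knows" about held spinlocks when CONFIG_PREEMPT is set.
 */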

> > Index: linux-2.6-mm/mm/truncate.c
> > ===
> > --- linux-2.6-mm.orig/mm/truncate.c
> > +++ linux-2.6-mm/mm/truncate.c
> > @@ -292,6 +292,8 @@ unsigned long invalidate_mapping_pages(s
> > pgoff_t index;
> > int lock_failed;
> >  
> > +   cond_resched();
> > +
> > lock_failed = TestSetPageLocked(page);
> 
> Is deadlocky on the drop_caches path and if CONFIG_PREEMPT we'll get
> scheduling-in-spinlock warnings.

*nod*

> For the blkdev_close() path the change is unneeded if CONFIG_PREEMPT and
> will fix things if !CONFIG_PREEMPT.
> 
> We can presumably just remove the invalidate_mapping_pages() call from the
> kill_bdev() path (at least) - kill_bdev()'s truncate_inode_pages() will do
> the same thing.

Certainly looks that way, I'll see if I can spot a hole in that.

> It might be time to remove that unused-for-six-years destroy_dirty_buffers
> too.

ok.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-12 Thread Andrew Morton
On Thu, 12 Apr 2007 09:39:25 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> On Wed, 2007-04-11 at 15:30 -0700, Andrew Morton wrote:
> 
> > There used to be a cond_resched() in invalidate_mapping_pages() which would
> > have prevented this, but I rudely removed it to support
> > /proc/sys/vm/drop_caches (which needs to call invalidate_inode_pages()
> > under spinlock).
> > 
> > We could resurrect that cond_resched() by passing in some flag, I guess. 
> > Or change the code to poke the softlockup detector.  The former would be
> > better.
> 
> cond_resched() is conditional on __resched_legal(0)

What's __resched_legal()?

> which should take
> care of being called under a spinlock.

We only increment preempt_count() in spin_lock() if CONFIG_PREEMPT.

> so I guess we can just reinstate the call in invalidate_mapping_pages()
> 
> (still waiting on the compile to finish...)
> ---
> invalidate_mapping_pages() is called under locks (usually preemptible)
> but can do a _lot_ of work; stick in a voluntary preemption point to
> avoid excessive latencies (over 10 seconds was reported by softlockup).
> 
> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
> ---
>  mm/truncate.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: linux-2.6-mm/mm/truncate.c
> ===
> --- linux-2.6-mm.orig/mm/truncate.c
> +++ linux-2.6-mm/mm/truncate.c
> @@ -292,6 +292,8 @@ unsigned long invalidate_mapping_pages(s
>   pgoff_t index;
>   int lock_failed;
>  
> + cond_resched();
> +
>   lock_failed = TestSetPageLocked(page);

Is deadlocky on the drop_caches path and if CONFIG_PREEMPT we'll get
scheduling-in-spinlock warnings.

For the blkdev_close() path the change is unneeded if CONFIG_PREEMPT and
will fix things if !CONFIG_PREEMPT.

We can presumably just remove the invalidate_mapping_pages() call from the
kill_bdev() path (at least) - kill_bdev()'s truncate_inode_pages() will do
the same thing.

It might be time to remove that unused-for-six-years destroy_dirty_buffers
too.
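
For reference, the kill_bdev() being discussed looks roughly like this
in the 2.6.18-era tree (paraphrased from memory, not verbatim source):

static void kill_bdev(struct block_device *bdev)
{
        /* walks the whole pagecache of the bdev and frees its
         * buffer_heads, the path that shows up in the soft-lockup trace;
         * the '1' is the long-unused destroy_dirty_buffers argument
         * mentioned above */
        invalidate_bdev(bdev, 1);

        /* walks the same pagecache again and throws everything away,
         * which is why the invalidate step looks removable */
        truncate_inode_pages(bdev->bd_inode->i_mapping, 0);
}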

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-12 Thread Peter Zijlstra
On Wed, 2007-04-11 at 15:30 -0700, Andrew Morton wrote:

> There used to be a cond_resched() in invalidate_mapping_pages() which would
> have prevented this, but I rudely removed it to support
> /proc/sys/vm/drop_caches (which needs to call invalidate_inode_pages()
> under spinlock).
> 
> We could resurrect that cond_resched() by passing in some flag, I guess. 
> Or change the code to poke the softlockup detector.  The former would be
> better.

cond_resched() is conditional on __resched_legal(0), which should take
care of being called under a spinlock.

so I guess we can just reinstate the call in invalidate_mapping_pages()

(still waiting on the compile to finish...)
---
invalidate_mapping_pages() is called under locks (usually preemptible)
but can do a _lot_ of work; stick in a voluntary preemption point to
avoid excessive latencies (over 10 seconds was reported by softlockup).

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 mm/truncate.c |2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6-mm/mm/truncate.c
===
--- linux-2.6-mm.orig/mm/truncate.c
+++ linux-2.6-mm/mm/truncate.c
@@ -292,6 +292,8 @@ unsigned long invalidate_mapping_pages(s
pgoff_t index;
int lock_failed;
 
+   cond_resched();
+
lock_failed = TestSetPageLocked(page);
 
/*


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-12 Thread Zhao Forrest

On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:

On Wed, 2007-04-11 at 17:53 +0800, Zhao Forrest wrote:
> I got some new information:
> Before soft lockup message is out, we have:
> [EMAIL PROTECTED] home]# cat /proc/slabinfo |grep buffer_head
> buffer_head   10927942 10942560120   321 : tunables   32
> 168 : slabdata 341955 341955  6 : globalstat 37602996 11589379
> 11743736  01 6918 12166031 1013708
> : cpustat 35254590 2350698 13610965 907286
>
> Then after buffer_head is freed, we have:
> [EMAIL PROTECTED] home]# cat /proc/slabinfo |grep buffer_head
> buffer_head 9542  36384120   321 : tunables   32   16
>   8 : slabdata   1137   1137245 : globalstat 37602996 11589379
> 11743736  01 6983 20507478
> 1708818 : cpustat 35254625 2350704 16027174 1068367
>
> Does this huge number of buffer_head cause the soft lockup?


__blkdev_put() takes the BKL and bd_mutex
invalidate_mapping_pages() tries to take the PageLock

But no other locks seem held while free_buffer_head() is called

All these locks are preemptible (CONFIG_PREEMPT_BKL?=y) and should not
hog the cpu like that, what preemption mode have you got selected?
(CONFIG_PREEMPT_VOLUNTARY?=y)

These 2 kernel options are turned on by default in my kernel. Here's
snip from .config
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
CONFIG_NUMA=y
CONFIG_K8_NUMA=y



Does this fix it?

--- fs/buffer.c~2007-02-01 12:00:34.0 +0100
+++ fs/buffer.c 2007-04-11 12:35:48.0 +0200
@@ -3029,6 +3029,8 @@ out:
struct buffer_head *next = bh->b_this_page;
free_buffer_head(bh);
bh = next;
+
+   cond_resched();
} while (bh != buffers_to_free);
}
return ret;


So far I have run the test with the patched kernel for 6 rounds and
didn't see the soft lockup. I think this patch should fix the problem.
But what still confuses me is: why do we need to invoke cond_resched()
explicitly when CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT_BKL are
both turned on? From my understanding these two options should make
scheduling happen even when the CPU is under heavy load.

Thanks,
Forrest
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-11 Thread Andrew Morton
On Wed, 11 Apr 2007 14:17:04 +0800
"Zhao Forrest" <[EMAIL PROTECTED]> wrote:

> We're using RHEL5 with kernel version 2.6.18-8.el5.
> When doing a stress test on a raw device for about 3-4 hours, we found
> the soft lockup message in dmesg.
> I know we're not reporting the bug on the latest kernel, but does any
> expert know whether this is a known issue in this old kernel? Or why
> kmem_cache_free occupies the CPU for more than 10 seconds?
> 
> Please let me know if you need any information.
> 
> Thanks,
> Forrest
> --
> BUG: soft lockup detected on CPU#1!
> 
> Call Trace:
>  IRQ   [800b2c93] softlockup_tick+0xdb/0xed
>  [800933df] update_process_times+0x42/0x68
>  [80073d97] smp_local_timer_interrupt+0x23/0x47
>  [80074459] smp_apic_timer_interrupt+0x41/0x47
>  [8005bcc2] apic_timer_interrupt+0x66/0x6c
>  EOI   [80007660] kmem_cache_free+0x1c0/0x1cb
>  [800262ee] free_buffer_head+0x2a/0x43
>  [80027110] try_to_free_buffers+0x89/0x9d
>  [80043041] invalidate_mapping_pages+0x90/0x15f
>  [800d4a77] kill_bdev+0xe/0x21
>  [800d4f9d] __blkdev_put+0x4f/0x169
>  [80012281] __fput+0xae/0x198
>  [80023647] filp_close+0x5c/0x64
>  [800384f9] put_files_struct+0x6c/0xc3
>  [80014f01] do_exit+0x2d2/0x8b1
>  [80046eb6] cpuset_exit+0x0/0x6c
>  [8002abd7] get_signal_to_deliver+0x427/0x456
>  [80059122] do_notify_resume+0x9c/0x7a9
>  [80086c6d] default_wake_function+0x0/0xe
>  [800b1fd8] audit_syscall_exit+0x2cd/0x2ec
>  [8005b362] int_signal+0x12/0x17

I think there's nothing unusual happening here - you closed the device and
the kernel has to remove a tremendous number of pagecache pages, and that
simply takes a long time.

How much memory does the machine have?

There used to be a cond_resched() in invalidate_mapping_pages() which would
have prevented this, but I rudely removed it to support
/proc/sys/vm/drop_caches (which needs to call invalidate_inode_pages()
under spinlock).

We could resurrect that cond_resched() by passing in some flag, I guess. 
Or change the code to poke the softlockup detector.  The former would be
better.
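
A hypothetical sketch of the flag idea (names invented here for
illustration, not a submitted patch): callers that may hold a spinlock,
like drop_caches, would pass 0 for can_resched, while the blkdev-close
path would pass 1 and get its voluntary preemption point back.

static unsigned long invalidate_mapping_pages_flagged(struct address_space *mapping,
                                                      pgoff_t start, unsigned long nr_pages,
                                                      int can_resched)
{
        unsigned long invalidated = 0;
        unsigned long i;

        for (i = 0; i < nr_pages; i++) {
                if (can_resched)
                        cond_resched();  /* only when no spinlock can be held */

                /* try-lock the page at start + i and invalidate it, the way
                 * invalidate_mapping_pages() does today (elided in this
                 * sketch; the real function walks the pagecache with a
                 * pagevec instead of touching every index) */
                invalidated++;  /* placeholder for the real per-page work */
        }
        return invalidated;
}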

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-11 Thread Badari Pulavarty
On Wed, 2007-04-11 at 18:10 +0800, Zhao Forrest wrote:
> On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
> > > I'm confused - which end of ths stack is up?
> > >
> > > cpuset_exit doesn't call do_exit, rather it's the other
> > > way around.  But put_files_struct doesn't call do_exit,
> > > rather do_exit calls __exit_files calls put_files_struct.
> >
> > I'm guessing its x86_64 which generates crap traces.
> >
> Yes, it's x86_64. Is there a reliable way to generate stack traces under 
> x86_64?
> Can enabling "[ ] Compile the kernel with frame pointers" help?

CONFIG_UNWIND_INFO=y
CONFIG_STACK_UNWIND=y

should help.

Thanks,
Badari



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-11 Thread Ken Chen

On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:

On Wed, 2007-04-11 at 17:53 +0800, Zhao Forrest wrote:
> I got some new information:
> Before soft lockup message is out, we have:
> [EMAIL PROTECTED] home]# cat /proc/slabinfo |grep buffer_head
> buffer_head   10927942 10942560120   321 : tunables   32
> 168 : slabdata 341955 341955  6 : globalstat 37602996 11589379
> 11743736  01 6918 12166031 1013708
> : cpustat 35254590 2350698 13610965 907286
>
> Then after buffer_head is freed, we have:
> [EMAIL PROTECTED] home]# cat /proc/slabinfo |grep buffer_head
> buffer_head 9542  36384120   321 : tunables   32   16
>   8 : slabdata   1137   1137245 : globalstat 37602996 11589379
> 11743736  01 6983 20507478
> 1708818 : cpustat 35254625 2350704 16027174 1068367
>
> Does this huge number of buffer_head cause the soft lockup?

__blkdev_put() takes the BKL and bd_mutex
invalidate_mapping_pages() tries to take the PageLock

But no other locks seem held while free_buffer_head() is called

All these locks are preemptible (CONFIG_PREEMPT_BKL?=y) and should not
hog the cpu like that, what preemption mode have you got selected?
(CONFIG_PREEMPT_VOLUNTARY?=y)


also, you can try this patch:

http://groups.google.com/group/linux.kernel/browse_thread/thread/7086e4b9d5504dc9/c608bfea4614b07e?lnk=gst&q=+Big+kernel+lock+contention+in+do_open&rnum=1#c608bfea4614b07e
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-11 Thread Chuck Ebbert
Zhao Forrest wrote:
> On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
>> On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
>> > I'm confused - which end of this stack is up?
>> >
>> > cpuset_exit doesn't call do_exit, rather it's the other
>> > way around.  But put_files_struct doesn't call do_exit,
>> > rather do_exit calls __exit_files calls put_files_struct.
>>
>> I'm guessing it's x86_64 which generates crap traces.
>>
> Yes, it's x86_64. Is there a reliable way to generate stack traces under
> x86_64?
> Can enabling "[ ] Compile the kernel with frame pointers" help?

It will help a little, but not much because the stack backtrace
code ignores the frame pointers. But it will prevent tail calls,
making it somewhat easier to make sense of the data.

This should be a FAQ item somewhere...
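
Roughly what the fallback trace code does (a sketch, not the actual
arch/x86_64 source): it scans raw stack words and prints anything that
points into kernel text, so stale return addresses left over from
earlier calls (the cpuset_exit and default_wake_function entries above,
for instance) get printed interleaved with the real call chain, and
frame pointers do not help because this path never follows them.

static void sketch_show_trace(unsigned long *sp, unsigned long *stack_top)
{
        while (sp < stack_top) {
                unsigned long addr = *sp++;

                if (__kernel_text_address(addr))
                        printk(" [<%016lx>]\n", addr);  /* may be a stale leftover */
        }
}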

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-11 Thread Peter Zijlstra
On Wed, 2007-04-11 at 17:53 +0800, Zhao Forrest wrote:
> I got some new information:
> Before soft lockup message is out, we have:
> [EMAIL PROTECTED] home]# cat /proc/slabinfo |grep buffer_head
> buffer_head   10927942 10942560120   321 : tunables   32
> 168 : slabdata 341955 341955  6 : globalstat 37602996 11589379
> 11743736  01 6918 12166031 1013708
> : cpustat 35254590 2350698 13610965 907286
> 
> Then after buffer_head is freed, we have:
> [EMAIL PROTECTED] home]# cat /proc/slabinfo |grep buffer_head
> buffer_head 9542  36384120   321 : tunables   32   16
>   8 : slabdata   1137   1137245 : globalstat 37602996 11589379
> 11743736  01 6983 20507478
> 1708818 : cpustat 35254625 2350704 16027174 1068367
> 
> Does this huge number of buffer_head cause the soft lockup?


__blkdev_put() takes the BKL and bd_mutex
invalidate_mapping_pages() tries to take the PageLock

But no other locks seem held while free_buffer_head() is called

All these locks are preemptible (CONFIG_PREEMPT_BKL?=y) and should not
hog the cpu like that, what preemption mode have you got selected?
(CONFIG_PREEMPT_VOLUNTARY?=y)
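
For reference, the path from the backtrace earlier in the thread, with
those locks marked (nesting is approximate):

  __blkdev_put()                   takes the BKL and bdev->bd_mutex
    kill_bdev()
      invalidate_mapping_pages()   try-locks each page (TestSetPageLocked)
        try_to_free_buffers()
          free_buffer_head()       -> kmem_cache_free(), once per buffer_head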

Does this fix it?

--- fs/buffer.c~2007-02-01 12:00:34.0 +0100
+++ fs/buffer.c 2007-04-11 12:35:48.0 +0200
@@ -3029,6 +3029,8 @@ out:
struct buffer_head *next = bh->b_this_page;
free_buffer_head(bh);
bh = next;
+
+   cond_resched();
} while (bh != buffers_to_free);
}
return ret;




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-11 Thread Peter Zijlstra
On Wed, 2007-04-11 at 18:10 +0800, Zhao Forrest wrote:
> On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
> > > I'm confused - which end of ths stack is up?
> > >
> > > cpuset_exit doesn't call do_exit, rather it's the other
> > > way around.  But put_files_struct doesn't call do_exit,
> > > rather do_exit calls __exit_files calls put_files_struct.
> >
> > I'm guessing its x86_64 which generates crap traces.
> >
> Yes, it's x86_64. Is there a reliable way to generate stack traces under 
> x86_64?
> Can enabling "[ ] Compile the kernel with frame pointers" help?

Sometimes; the best option is to redo the undo of the DWARF-based stack
unwinder (i.e. put it back in).

But as you said, that _huge_ number of buffers might be the issue. I'm
looking through the code paths from __blkdev_put() on downwards; I
suspect we hold a single lock somewhere...

So hold on with patching the unwinder back in (unless of course you
fancy doing so :-)



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-11 Thread Zhao Forrest

On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:

On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
> I'm confused - which end of this stack is up?
>
> cpuset_exit doesn't call do_exit, rather it's the other
> way around.  But put_files_struct doesn't call do_exit,
> rather do_exit calls __exit_files calls put_files_struct.

I'm guessing it's x86_64 which generates crap traces.


Yes, it's x86_64. Is there a reliable way to generate stack traces under x86_64?
Can enabling "[ ] Compile the kernel with frame pointers" help?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-11 Thread Peter Zijlstra
On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
> I'm confused - which end of this stack is up?
> 
> cpuset_exit doesn't call do_exit, rather it's the other
> way around.  But put_files_struct doesn't call do_exit,
> rather do_exit calls __exit_files calls put_files_struct.

I'm guessing it's x86_64 which generates crap traces.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-11 Thread Zhao Forrest

I got some new information:
Before soft lockup message is out, we have:
[EMAIL PROTECTED] home]# cat /proc/slabinfo |grep buffer_head
buffer_head   10927942 10942560120   321 : tunables   32
168 : slabdata 341955 341955  6 : globalstat 37602996 11589379
11743736  01 6918 12166031 1013708
: cpustat 35254590 2350698 13610965 907286

Then after buffer_head is freed, we have:
[EMAIL PROTECTED] home]# cat /proc/slabinfo |grep buffer_head
buffer_head 9542  36384120   321 : tunables   32   16
 8 : slabdata   1137   1137245 : globalstat 37602996 11589379
11743736  01 6983 20507478
1708818 : cpustat 35254625 2350704 16027174 1068367

Does this huge number of buffer_head cause the soft lockup?
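
(A rough sanity check using only the figures above: 10,927,942 active
buffer_head objects at 120 bytes each is on the order of 1.3 GB of
buffer_head slab, and the drop to 9,542 afterwards means roughly 10.9
million free_buffer_head()/kmem_cache_free() calls in a single
invalidate_mapping_pages() pass, so staying on the CPU past the
10-second softlockup threshold is quite plausible if nothing in that
loop ever reschedules.)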

Thanks,
Forrest

On 4/11/07, Pekka Enberg <[EMAIL PROTECTED]> wrote:

On 4/11/07, Zhao Forrest <[EMAIL PROTECTED]> wrote:
> We're using RHEL5 with kernel version 2.6.18-8.el5.
> When doing a stress test on a raw device for about 3-4 hours, we found
> the soft lockup message in dmesg.
> I know we're not reporting the bug on the latest kernel, but does any
> expert know whether this is a known issue in this old kernel? Or why
> kmem_cache_free occupies the CPU for more than 10 seconds?

Sounds like slab corruption. CONFIG_DEBUG_SLAB should tell you more.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-11 Thread Paul Jackson
 [80038e33] put_files_struct+0x6c/0xc3
 [8001543d] do_exit+0x2d2/0x8b1
 [80047932] cpuset_exit+0x0/0x6c

I'm confused - which end of this stack is up?

cpuset_exit doesn't call do_exit, rather it's the other
way around.  But put_files_struct doesn't call do_exit,
rather do_exit calls __exit_files calls put_files_struct.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-11 Thread Zhao Forrest

I enabled CONFIG_DEBUG_SLAB, but I don't get any extra debug messages
related to slab.
Is there another switch that I need to turn on?

Thanks,
Forrest

BUG: soft lockup detected on CPU#6!

Call Trace:
 IRQ   [800b3834] softlockup_tick+0xdb/0xed
 [80093edf] update_process_times+0x42/0x68
 [80074897] smp_local_timer_interrupt+0x23/0x47
 [80074f59] smp_apic_timer_interrupt+0x41/0x47
 [8005c7c2] apic_timer_interrupt+0x66/0x6c
 EOI   [80043b09] invalidate_mapping_pages+0xe1/0x15f
 [80043afa] invalidate_mapping_pages+0xd2/0x15f
 [800d5bea] kill_bdev+0xe/0x21
 [800d6110] __blkdev_put+0x4f/0x169
 [80012785] __fput+0xae/0x198
 [80023ca6] filp_close+0x5c/0x64
 [80038e33] put_files_struct+0x6c/0xc3
 [8001543d] do_exit+0x2d2/0x8b1
 [80047932] cpuset_exit+0x0/0x6c
 [8002b30f] get_signal_to_deliver+0x427/0x456
 [80059b9e] do_notify_resume+0x9c/0x7a9
 [8008776d] default_wake_function+0x0/0xe
 [800b2b79] audit_syscall_exit+0x2cd/0x2ec
 [8005be62] int_signal+0x12/0x17


On 4/11/07, Pekka Enberg <[EMAIL PROTECTED]> wrote:

On 4/11/07, Zhao Forrest <[EMAIL PROTECTED]> wrote:
> We're using RHEL5 with kernel version 2.6.18-8.el5.
> When doing a stress test on a raw device for about 3-4 hours, we found
> the soft lockup message in dmesg.
> I know we're not reporting the bug on the latest kernel, but does any
> expert know whether this is a known issue in this old kernel? Or why
> kmem_cache_free occupies the CPU for more than 10 seconds?

Sounds like slab corruption. CONFIG_DEBUG_SLAB should tell you more.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-11 Thread Pekka Enberg

On 4/11/07, Zhao Forrest <[EMAIL PROTECTED]> wrote:

We're using RHEL5 with kernel version 2.6.18-8.el5.
When doing a stress test on a raw device for about 3-4 hours, we found
the soft lockup message in dmesg.
I know we're not reporting the bug on the latest kernel, but does any
expert know whether this is a known issue in this old kernel? Or why
kmem_cache_free occupies the CPU for more than 10 seconds?


Sounds like slab corruption. CONFIG_DEBUG_SLAB should tell you more.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-11 Thread Zhao Forrest

Hi,

We're using RHEL5 with kernel version 2.6.18-8.el5.
When doing a stress test on a raw device for about 3-4 hours, we found
the soft lockup message in dmesg.
I know we're not reporting the bug on the latest kernel, but does any
expert know whether this is a known issue in this old kernel? Or why
kmem_cache_free occupies the CPU for more than 10 seconds?

Please let me know if you need any information.

Thanks,
Forrest
--
BUG: soft lockup detected on CPU#1!

Call Trace:
 IRQ   [800b2c93] softlockup_tick+0xdb/0xed
 [800933df] update_process_times+0x42/0x68
 [80073d97] smp_local_timer_interrupt+0x23/0x47
 [80074459] smp_apic_timer_interrupt+0x41/0x47
 [8005bcc2] apic_timer_interrupt+0x66/0x6c
 EOI   [80007660] kmem_cache_free+0x1c0/0x1cb
 [800262ee] free_buffer_head+0x2a/0x43
 [80027110] try_to_free_buffers+0x89/0x9d
 [80043041] invalidate_mapping_pages+0x90/0x15f
 [800d4a77] kill_bdev+0xe/0x21
 [800d4f9d] __blkdev_put+0x4f/0x169
 [80012281] __fput+0xae/0x198
 [80023647] filp_close+0x5c/0x64
 [800384f9] put_files_struct+0x6c/0xc3
 [80014f01] do_exit+0x2d2/0x8b1
 [80046eb6] cpuset_exit+0x0/0x6c
 [8002abd7] get_signal_to_deliver+0x427/0x456
 [80059122] do_notify_resume+0x9c/0x7a9
 [80086c6d] default_wake_function+0x0/0xe
 [800b1fd8] audit_syscall_exit+0x2cd/0x2ec
 [8005b362] int_signal+0x12/0x17
--
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

