Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
Zhao Forrest wrote:
> These 2 kernel options are turned on by default in my kernel. Here's a snip from .config:
>
> # CONFIG_PREEMPT_NONE is not set
> CONFIG_PREEMPT_VOLUNTARY=y
> # CONFIG_PREEMPT is not set
> CONFIG_PREEMPT_BKL=y
> CONFIG_NUMA=y
> CONFIG_K8_NUMA=y
>
> > Does this fix it?
> >
> > --- fs/buffer.c~	2007-02-01 12:00:34.0 +0100
> > +++ fs/buffer.c	2007-04-11 12:35:48.0 +0200
> > @@ -3029,6 +3029,8 @@ out:
> >  		struct buffer_head *next = bh->b_this_page;
> >  		free_buffer_head(bh);
> >  		bh = next;
> > +
> > +		cond_resched();
> >  	} while (bh != buffers_to_free);
> >  }
> >  return ret;
>
> So far I have run the test with the patched kernel for 6 rounds and didn't see the soft lockup. I think this patch should fix the problem. But what still confuses me is why we need to invoke cond_resched() voluntarily, since CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT_BKL are both turned on? From my understanding these 2 options should make scheduling happen even if the CPU is under heavy load.

No, only CONFIG_PREEMPT will do that.

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On Thu, 2007-04-12 at 00:55 -0700, Andrew Morton wrote:
> On Thu, 12 Apr 2007 09:39:25 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > On Wed, 2007-04-11 at 15:30 -0700, Andrew Morton wrote:
> > > There used to be a cond_resched() in invalidate_mapping_pages() which would have prevented this, but I rudely removed it to support /proc/sys/vm/drop_caches (which needs to call invalidate_inode_pages() under spinlock).
> > >
> > > We could resurrect that cond_resched() by passing in some flag, I guess. Or change the code to poke the softlockup detector. The former would be better.
> >
> > cond_resched() is conditional on __resched_legal(0)
>
> What's __resched_legal()?

Hmm, I was looking at the RHEL-5 sources; it seems this function was removed later (commit 9414232fa0cc28e2f51b8c76d260f2748f7953fc).

> > which should take care of being called under a spinlock.
>
> We only increment preempt_count() in spin_lock() if CONFIG_PREEMPT.

Right, ok, that makes the whole thing fall apart indeed.

> > Index: linux-2.6-mm/mm/truncate.c
> > ===
> > --- linux-2.6-mm.orig/mm/truncate.c
> > +++ linux-2.6-mm/mm/truncate.c
> > @@ -292,6 +292,8 @@ unsigned long invalidate_mapping_pages(s
> >  	pgoff_t index;
> >  	int lock_failed;
> >
> > +	cond_resched();
> > +
> >  	lock_failed = TestSetPageLocked(page);
>
> Is deadlocky on the drop_caches path, and if CONFIG_PREEMPT we'll get scheduling-in-spinlock warnings.

*nod*

> For the blkdev_close() path the change is unneeded if CONFIG_PREEMPT and will fix things if !CONFIG_PREEMPT.
>
> We can presumably just remove the invalidate_mapping_pages() call from the kill_bdev() path (at least) - kill_bdev()'s truncate_inode_pages() will do the same thing.

Certainly looks that way, I'll see if I can spot a hole in that.

> It might be time to remove that unused-for-six-years destroy_dirty_buffers too.

ok.
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On Thu, 12 Apr 2007 09:39:25 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-04-11 at 15:30 -0700, Andrew Morton wrote:
> > There used to be a cond_resched() in invalidate_mapping_pages() which would have prevented this, but I rudely removed it to support /proc/sys/vm/drop_caches (which needs to call invalidate_inode_pages() under spinlock).
> >
> > We could resurrect that cond_resched() by passing in some flag, I guess. Or change the code to poke the softlockup detector. The former would be better.
>
> cond_resched() is conditional on __resched_legal(0)

What's __resched_legal()?

> which should take care of being called under a spinlock.

We only increment preempt_count() in spin_lock() if CONFIG_PREEMPT.

> so I guess we can just reinstate the call in invalidate_mapping_pages()
>
> (still waiting on the compile to finish...)
>
> ---
> invalidate_mapping_pages() is called under locks (usually preemptable) but can do a _lot_ of work; stick in a voluntary preemption point to avoid excessive latencies (over 10 seconds was reported by softlockup).
>
> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
> ---
>  mm/truncate.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> Index: linux-2.6-mm/mm/truncate.c
> ===
> --- linux-2.6-mm.orig/mm/truncate.c
> +++ linux-2.6-mm/mm/truncate.c
> @@ -292,6 +292,8 @@ unsigned long invalidate_mapping_pages(s
>  	pgoff_t index;
>  	int lock_failed;
>
> +	cond_resched();
> +
>  	lock_failed = TestSetPageLocked(page);

Is deadlocky on the drop_caches path, and if CONFIG_PREEMPT we'll get scheduling-in-spinlock warnings.

For the blkdev_close() path the change is unneeded if CONFIG_PREEMPT and will fix things if !CONFIG_PREEMPT.

We can presumably just remove the invalidate_mapping_pages() call from the kill_bdev() path (at least) - kill_bdev()'s truncate_inode_pages() will do the same thing.

It might be time to remove that unused-for-six-years destroy_dirty_buffers too.
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On Wed, 2007-04-11 at 15:30 -0700, Andrew Morton wrote:
> There used to be a cond_resched() in invalidate_mapping_pages() which would have prevented this, but I rudely removed it to support /proc/sys/vm/drop_caches (which needs to call invalidate_inode_pages() under spinlock).
>
> We could resurrect that cond_resched() by passing in some flag, I guess. Or change the code to poke the softlockup detector. The former would be better.

cond_resched() is conditional on __resched_legal(0), which should take care of being called under a spinlock.

so I guess we can just reinstate the call in invalidate_mapping_pages()

(still waiting on the compile to finish...)

---
invalidate_mapping_pages() is called under locks (usually preemptable) but can do a _lot_ of work; stick in a voluntary preemption point to avoid excessive latencies (over 10 seconds was reported by softlockup).

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 mm/truncate.c | 2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6-mm/mm/truncate.c
===
--- linux-2.6-mm.orig/mm/truncate.c
+++ linux-2.6-mm/mm/truncate.c
@@ -292,6 +292,8 @@ unsigned long invalidate_mapping_pages(s
 	pgoff_t index;
 	int lock_failed;
 
+	cond_resched();
+
 	lock_failed = TestSetPageLocked(page);
 	/*
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-04-11 at 17:53 +0800, Zhao Forrest wrote:
> > I got some new information. Before the soft lockup message is out, we have:
> >
> > [EMAIL PROTECTED] home]# cat /proc/slabinfo | grep buffer_head
> > buffer_head 10927942 10942560 120 32 1 : tunables 32 16 8 : slabdata 341955 341955 6 : globalstat 37602996 11589379 11743736 0 1 6918 12166031 1013708 : cpustat 35254590 2350698 13610965 907286
> >
> > Then after the buffer_heads are freed, we have:
> >
> > [EMAIL PROTECTED] home]# cat /proc/slabinfo | grep buffer_head
> > buffer_head 9542 36384 120 32 1 : tunables 32 16 8 : slabdata 1137 1137 245 : globalstat 37602996 11589379 11743736 0 1 6983 20507478 1708818 : cpustat 35254625 2350704 16027174 1068367
> >
> > Does this huge number of buffer_heads cause the soft lockup?
>
> __blkdev_put() takes the BKL and bd_mutex; invalidate_mapping_pages() tries to take the page lock. But no other locks seem held while free_buffer_head() is called.
>
> All these locks are preemptible (CONFIG_PREEMPT_BKL=y?) and should not hog the CPU like that; what preemption mode have you got selected? (CONFIG_PREEMPT_VOLUNTARY=y?)

These 2 kernel options are turned on by default in my kernel. Here's a snip from .config:

# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
CONFIG_NUMA=y
CONFIG_K8_NUMA=y

> Does this fix it?
>
> --- fs/buffer.c~	2007-02-01 12:00:34.0 +0100
> +++ fs/buffer.c	2007-04-11 12:35:48.0 +0200
> @@ -3029,6 +3029,8 @@ out:
>  		struct buffer_head *next = bh->b_this_page;
>  		free_buffer_head(bh);
>  		bh = next;
> +
> +		cond_resched();
>  	} while (bh != buffers_to_free);
>  }
>  return ret;

So far I have run the test with the patched kernel for 6 rounds and didn't see the soft lockup. I think this patch should fix the problem. But what still confuses me is why we need to invoke cond_resched() voluntarily, since CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT_BKL are both turned on? From my understanding these 2 options should make scheduling happen even if the CPU is under heavy load.

Thanks,
Forrest
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On Wed, 11 Apr 2007 14:17:04 +0800 "Zhao Forrest" <[EMAIL PROTECTED]> wrote:
> We're using RHEL5 with kernel version 2.6.18-8.el5. When doing a stress test on a raw device for about 3-4 hours, we found the soft lockup message in dmesg. I know we're not reporting the bug on the latest kernel, but does any expert know if this is a known issue in the old kernel? Or why does kmem_cache_free occupy the CPU for more than 10 seconds?
>
> Please let me know if you need any information.
>
> Thanks,
> Forrest
> --
> BUG: soft lockup detected on CPU#1!
>
> Call Trace:
>  [] softlockup_tick+0xdb/0xed
>  [] update_process_times+0x42/0x68
>  [] smp_local_timer_interrupt+0x23/0x47
>  [] smp_apic_timer_interrupt+0x41/0x47
>  [] apic_timer_interrupt+0x66/0x6c
>  [] kmem_cache_free+0x1c0/0x1cb
>  [] free_buffer_head+0x2a/0x43
>  [] try_to_free_buffers+0x89/0x9d
>  [] invalidate_mapping_pages+0x90/0x15f
>  [] kill_bdev+0xe/0x21
>  [] __blkdev_put+0x4f/0x169
>  [] __fput+0xae/0x198
>  [] filp_close+0x5c/0x64
>  [] put_files_struct+0x6c/0xc3
>  [] do_exit+0x2d2/0x8b1
>  [] cpuset_exit+0x0/0x6c
>  [] get_signal_to_deliver+0x427/0x456
>  [] do_notify_resume+0x9c/0x7a9
>  [] default_wake_function+0x0/0xe
>  [] audit_syscall_exit+0x2cd/0x2ec
>  [] int_signal+0x12/0x17

I think there's nothing unusual happening here - you closed the device and the kernel has to remove a tremendous number of pagecache pages, and that simply takes a long time. How much memory does the machine have?

There used to be a cond_resched() in invalidate_mapping_pages() which would have prevented this, but I rudely removed it to support /proc/sys/vm/drop_caches (which needs to call invalidate_inode_pages() under spinlock).

We could resurrect that cond_resched() by passing in some flag, I guess. Or change the code to poke the softlockup detector. The former would be better.
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On Wed, 2007-04-11 at 18:10 +0800, Zhao Forrest wrote:
> On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
> > > I'm confused - which end of this stack is up?
> > >
> > > cpuset_exit doesn't call do_exit, rather it's the other way around. But put_files_struct doesn't call do_exit, rather do_exit calls __exit_files, which calls put_files_struct.
> >
> > I'm guessing it's x86_64, which generates crap traces.
>
> Yes, it's x86_64. Is there a reliable way to generate stack traces under x86_64? Can enabling "[ ] Compile the kernel with frame pointers" help?

CONFIG_UNWIND_INFO=y
CONFIG_STACK_UNWIND=y

should help.

Thanks,
Badari
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-04-11 at 17:53 +0800, Zhao Forrest wrote:
> > I got some new information. Before the soft lockup message is out, we have:
> >
> > [EMAIL PROTECTED] home]# cat /proc/slabinfo | grep buffer_head
> > buffer_head 10927942 10942560 120 32 1 : tunables 32 16 8 : slabdata 341955 341955 6 : globalstat 37602996 11589379 11743736 0 1 6918 12166031 1013708 : cpustat 35254590 2350698 13610965 907286
> >
> > Then after the buffer_heads are freed, we have:
> >
> > [EMAIL PROTECTED] home]# cat /proc/slabinfo | grep buffer_head
> > buffer_head 9542 36384 120 32 1 : tunables 32 16 8 : slabdata 1137 1137 245 : globalstat 37602996 11589379 11743736 0 1 6983 20507478 1708818 : cpustat 35254625 2350704 16027174 1068367
> >
> > Does this huge number of buffer_heads cause the soft lockup?
>
> __blkdev_put() takes the BKL and bd_mutex; invalidate_mapping_pages() tries to take the page lock. But no other locks seem held while free_buffer_head() is called.
>
> All these locks are preemptible (CONFIG_PREEMPT_BKL=y?) and should not hog the CPU like that; what preemption mode have you got selected? (CONFIG_PREEMPT_VOLUNTARY=y?)

also, you can try this patch:
http://groups.google.com/group/linux.kernel/browse_thread/thread/7086e4b9d5504dc9/c608bfea4614b07e?lnk=gst=+Big+kernel+lock+contention+in+do_open=1#c608bfea4614b07e
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
Zhao Forrest wrote:
> On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
> > > I'm confused - which end of this stack is up?
> > >
> > > cpuset_exit doesn't call do_exit, rather it's the other way around. But put_files_struct doesn't call do_exit, rather do_exit calls __exit_files, which calls put_files_struct.
> >
> > I'm guessing it's x86_64, which generates crap traces.
>
> Yes, it's x86_64. Is there a reliable way to generate stack traces under x86_64? Can enabling "[ ] Compile the kernel with frame pointers" help?

It will help a little, but not much, because the stack backtrace code ignores the frame pointers. But it will prevent tail calls, making it somewhat easier to make sense of the data.

This should be a FAQ item somewhere...
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On Wed, 2007-04-11 at 17:53 +0800, Zhao Forrest wrote:
> I got some new information. Before the soft lockup message is out, we have:
>
> [EMAIL PROTECTED] home]# cat /proc/slabinfo | grep buffer_head
> buffer_head 10927942 10942560 120 32 1 : tunables 32 16 8 : slabdata 341955 341955 6 : globalstat 37602996 11589379 11743736 0 1 6918 12166031 1013708 : cpustat 35254590 2350698 13610965 907286
>
> Then after the buffer_heads are freed, we have:
>
> [EMAIL PROTECTED] home]# cat /proc/slabinfo | grep buffer_head
> buffer_head 9542 36384 120 32 1 : tunables 32 16 8 : slabdata 1137 1137 245 : globalstat 37602996 11589379 11743736 0 1 6983 20507478 1708818 : cpustat 35254625 2350704 16027174 1068367
>
> Does this huge number of buffer_heads cause the soft lockup?

__blkdev_put() takes the BKL and bd_mutex; invalidate_mapping_pages() tries to take the page lock. But no other locks seem held while free_buffer_head() is called.

All these locks are preemptible (CONFIG_PREEMPT_BKL=y?) and should not hog the CPU like that; what preemption mode have you got selected? (CONFIG_PREEMPT_VOLUNTARY=y?)

Does this fix it?

--- fs/buffer.c~	2007-02-01 12:00:34.0 +0100
+++ fs/buffer.c	2007-04-11 12:35:48.0 +0200
@@ -3029,6 +3029,8 @@ out:
 		struct buffer_head *next = bh->b_this_page;
 		free_buffer_head(bh);
 		bh = next;
+
+		cond_resched();
 	} while (bh != buffers_to_free);
 }
 return ret;
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On Wed, 2007-04-11 at 18:10 +0800, Zhao Forrest wrote:
> On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
> > > I'm confused - which end of this stack is up?
> > >
> > > cpuset_exit doesn't call do_exit, rather it's the other way around. But put_files_struct doesn't call do_exit, rather do_exit calls __exit_files, which calls put_files_struct.
> >
> > I'm guessing it's x86_64, which generates crap traces.
>
> Yes, it's x86_64. Is there a reliable way to generate stack traces under x86_64? Can enabling "[ ] Compile the kernel with frame pointers" help?

Sometimes; the best is to redo the undo of the dwarf-based stack unwinder.

But as you said, that _huge_ number of buffers might be the issue. I'm looking through the codepaths from __blkdev_put on downwards; I suspect we hold a single lock somewhere,...

So hold on with patching the unwinder back in (unless of course you fancy doing so :-)
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
> > I'm confused - which end of this stack is up?
> >
> > cpuset_exit doesn't call do_exit, rather it's the other way around. But put_files_struct doesn't call do_exit, rather do_exit calls __exit_files, which calls put_files_struct.
>
> I'm guessing it's x86_64, which generates crap traces.

Yes, it's x86_64. Is there a reliable way to generate stack traces under x86_64? Can enabling "[ ] Compile the kernel with frame pointers" help?
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
> I'm confused - which end of this stack is up?
>
> cpuset_exit doesn't call do_exit, rather it's the other way around. But put_files_struct doesn't call do_exit, rather do_exit calls __exit_files, which calls put_files_struct.

I'm guessing it's x86_64, which generates crap traces.
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
I got some new information. Before the soft lockup message is out, we have:

[EMAIL PROTECTED] home]# cat /proc/slabinfo | grep buffer_head
buffer_head 10927942 10942560 120 32 1 : tunables 32 16 8 : slabdata 341955 341955 6 : globalstat 37602996 11589379 11743736 0 1 6918 12166031 1013708 : cpustat 35254590 2350698 13610965 907286

Then after the buffer_heads are freed, we have:

[EMAIL PROTECTED] home]# cat /proc/slabinfo | grep buffer_head
buffer_head 9542 36384 120 32 1 : tunables 32 16 8 : slabdata 1137 1137 245 : globalstat 37602996 11589379 11743736 0 1 6983 20507478 1708818 : cpustat 35254625 2350704 16027174 1068367

Does this huge number of buffer_heads cause the soft lockup?

Thanks,
Forrest

On 4/11/07, Pekka Enberg <[EMAIL PROTECTED]> wrote:
> On 4/11/07, Zhao Forrest <[EMAIL PROTECTED]> wrote:
> > We're using RHEL5 with kernel version 2.6.18-8.el5. When doing a stress test on a raw device for about 3-4 hours, we found the soft lockup message in dmesg. I know we're not reporting the bug on the latest kernel, but does any expert know if this is a known issue in the old kernel? Or why does kmem_cache_free occupy the CPU for more than 10 seconds?
>
> Sounds like slab corruption. CONFIG_DEBUG_SLAB should tell you more.
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
> [] put_files_struct+0x6c/0xc3
> [] do_exit+0x2d2/0x8b1
> [] cpuset_exit+0x0/0x6c

I'm confused - which end of this stack is up?

cpuset_exit doesn't call do_exit, rather it's the other way around. But put_files_struct doesn't call do_exit, rather do_exit calls __exit_files, which calls put_files_struct.

--
I won't rest till it's the best ... Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
I enabled CONFIG_DEBUG_SLAB, but don't get any extra debug messages related to slab. Is there another switch that I need to turn on?

Thanks,
Forrest

BUG: soft lockup detected on CPU#6!

Call Trace:
 [] softlockup_tick+0xdb/0xed
 [] update_process_times+0x42/0x68
 [] smp_local_timer_interrupt+0x23/0x47
 [] smp_apic_timer_interrupt+0x41/0x47
 [] apic_timer_interrupt+0x66/0x6c
 [] invalidate_mapping_pages+0xe1/0x15f
 [] invalidate_mapping_pages+0xd2/0x15f
 [] kill_bdev+0xe/0x21
 [] __blkdev_put+0x4f/0x169
 [] __fput+0xae/0x198
 [] filp_close+0x5c/0x64
 [] put_files_struct+0x6c/0xc3
 [] do_exit+0x2d2/0x8b1
 [] cpuset_exit+0x0/0x6c
 [] get_signal_to_deliver+0x427/0x456
 [] do_notify_resume+0x9c/0x7a9
 [] default_wake_function+0x0/0xe
 [] audit_syscall_exit+0x2cd/0x2ec
 [] int_signal+0x12/0x17

On 4/11/07, Pekka Enberg <[EMAIL PROTECTED]> wrote:
> On 4/11/07, Zhao Forrest <[EMAIL PROTECTED]> wrote:
> > We're using RHEL5 with kernel version 2.6.18-8.el5. When doing a stress test on a raw device for about 3-4 hours, we found the soft lockup message in dmesg. I know we're not reporting the bug on the latest kernel, but does any expert know if this is a known issue in the old kernel? Or why does kmem_cache_free occupy the CPU for more than 10 seconds?
>
> Sounds like slab corruption. CONFIG_DEBUG_SLAB should tell you more.
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On 4/11/07, Zhao Forrest <[EMAIL PROTECTED]> wrote: We're using RHEL5 with kernel version 2.6.18-8.el5. When doing a stress test on raw device for about 3-4 hours, we found the soft lockup message in dmesg. I know we're not reporting the bug on the latest kernel, but does any expert know if this is the known issue in old kernel? Or why kmem_cache_free occupy CPU for more than 10 seconds? Sounds like slab corruption. CONFIG_DEBUG_SLAB should tell you more. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Why kmem_cache_free occupy CPU for more than 10 seconds?
Hi, We're using RHEL5 with kernel version 2.6.18-8.el5. When doing a stress test on raw device for about 3-4 hours, we found the soft lockup message in dmesg. I know we're not reporting the bug on the latest kernel, but does any expert know if this is the known issue in old kernel? Or why kmem_cache_free occupy CPU for more than 10 seconds? Please let me know if you need any information. Thanks, Forrest -- BUG: soft lockup detected on CPU#1! Call Trace: [] softlockup_tick+0xdb/0xed [] update_process_times+0x42/0x68 [] smp_local_timer_interrupt+0x23/0x47 [] smp_apic_timer_interrupt+0x41/0x47 [] apic_timer_interrupt+0x66/0x6c [] kmem_cache_free+0x1c0/0x1cb [] free_buffer_head+0x2a/0x43 [] try_to_free_buffers+0x89/0x9d [] invalidate_mapping_pages+0x90/0x15f [] kill_bdev+0xe/0x21 [] __blkdev_put+0x4f/0x169 [] __fput+0xae/0x198 [] filp_close+0x5c/0x64 [] put_files_struct+0x6c/0xc3 [] do_exit+0x2d2/0x8b1 [] cpuset_exit+0x0/0x6c [] get_signal_to_deliver+0x427/0x456 [] do_notify_resume+0x9c/0x7a9 [] default_wake_function+0x0/0xe [] audit_syscall_exit+0x2cd/0x2ec [] int_signal+0x12/0x17 -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Why kmem_cache_free occupy CPU for more than 10 seconds?
Hi, We're using RHEL5 with kernel version 2.6.18-8.el5. When doing a stress test on a raw device for about 3-4 hours, we found a soft lockup message in dmesg. I know we're not reporting the bug on the latest kernel, but does any expert know whether this is a known issue in the old kernel? Or why does kmem_cache_free occupy the CPU for more than 10 seconds? Please let me know if you need any information. Thanks, Forrest

--
BUG: soft lockup detected on CPU#1!
Call Trace: IRQ
 [800b2c93] softlockup_tick+0xdb/0xed
 [800933df] update_process_times+0x42/0x68
 [80073d97] smp_local_timer_interrupt+0x23/0x47
 [80074459] smp_apic_timer_interrupt+0x41/0x47
 [8005bcc2] apic_timer_interrupt+0x66/0x6c
EOI
 [80007660] kmem_cache_free+0x1c0/0x1cb
 [800262ee] free_buffer_head+0x2a/0x43
 [80027110] try_to_free_buffers+0x89/0x9d
 [80043041] invalidate_mapping_pages+0x90/0x15f
 [800d4a77] kill_bdev+0xe/0x21
 [800d4f9d] __blkdev_put+0x4f/0x169
 [80012281] __fput+0xae/0x198
 [80023647] filp_close+0x5c/0x64
 [800384f9] put_files_struct+0x6c/0xc3
 [80014f01] do_exit+0x2d2/0x8b1
 [80046eb6] cpuset_exit+0x0/0x6c
 [8002abd7] get_signal_to_deliver+0x427/0x456
 [80059122] do_notify_resume+0x9c/0x7a9
 [80086c6d] default_wake_function+0x0/0xe
 [800b1fd8] audit_syscall_exit+0x2cd/0x2ec
 [8005b362] int_signal+0x12/0x17
--
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On 4/11/07, Zhao Forrest <[EMAIL PROTECTED]> wrote:
> We're using RHEL5 with kernel version 2.6.18-8.el5. When doing a stress test on raw device for about 3-4 hours, we found the soft lockup message in dmesg. I know we're not reporting the bug on the latest kernel, but does any expert know if this is the known issue in old kernel? Or why kmem_cache_free occupy CPU for more than 10 seconds?

Sounds like slab corruption. CONFIG_DEBUG_SLAB should tell you more.
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
I enabled CONFIG_DEBUG_SLAB, but don't get any extra debug messages related to slab. Is there another switch that I need to turn on? Thanks, Forrest

BUG: soft lockup detected on CPU#6!
Call Trace: IRQ
 [800b3834] softlockup_tick+0xdb/0xed
 [80093edf] update_process_times+0x42/0x68
 [80074897] smp_local_timer_interrupt+0x23/0x47
 [80074f59] smp_apic_timer_interrupt+0x41/0x47
 [8005c7c2] apic_timer_interrupt+0x66/0x6c
EOI
 [80043b09] invalidate_mapping_pages+0xe1/0x15f
 [80043afa] invalidate_mapping_pages+0xd2/0x15f
 [800d5bea] kill_bdev+0xe/0x21
 [800d6110] __blkdev_put+0x4f/0x169
 [80012785] __fput+0xae/0x198
 [80023ca6] filp_close+0x5c/0x64
 [80038e33] put_files_struct+0x6c/0xc3
 [8001543d] do_exit+0x2d2/0x8b1
 [80047932] cpuset_exit+0x0/0x6c
 [8002b30f] get_signal_to_deliver+0x427/0x456
 [80059b9e] do_notify_resume+0x9c/0x7a9
 [8008776d] default_wake_function+0x0/0xe
 [800b2b79] audit_syscall_exit+0x2cd/0x2ec
 [8005be62] int_signal+0x12/0x17

On 4/11/07, Pekka Enberg <[EMAIL PROTECTED]> wrote:
> On 4/11/07, Zhao Forrest <[EMAIL PROTECTED]> wrote:
> > We're using RHEL5 with kernel version 2.6.18-8.el5. When doing a stress test on raw device for about 3-4 hours, we found the soft lockup message in dmesg. I know we're not reporting the bug on the latest kernel, but does any expert know if this is the known issue in old kernel? Or why kmem_cache_free occupy CPU for more than 10 seconds?
>
> Sounds like slab corruption. CONFIG_DEBUG_SLAB should tell you more.
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
>  [80038e33] put_files_struct+0x6c/0xc3
>  [8001543d] do_exit+0x2d2/0x8b1
>  [80047932] cpuset_exit+0x0/0x6c

I'm confused - which end of this stack is up? cpuset_exit doesn't call do_exit, rather it's the other way around. But put_files_struct doesn't call do_exit, rather do_exit calls __exit_files, which calls put_files_struct.

-- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
I got some new information. Before the soft lockup message is out, we have:

[EMAIL PROTECTED] home]# cat /proc/slabinfo |grep buffer_head
buffer_head 10927942 10942560 120 32 1 : tunables 32 16 8 : slabdata 341955 341955 6 : globalstat 37602996 11589379 11743736 0 1 6918 12166031 1013708 : cpustat 35254590 2350698 13610965 907286

Then after the buffer_heads are freed, we have:

[EMAIL PROTECTED] home]# cat /proc/slabinfo |grep buffer_head
buffer_head 9542 36384 120 32 1 : tunables 32 16 8 : slabdata 1137 1137 245 : globalstat 37602996 11589379 11743736 0 1 6983 20507478 1708818 : cpustat 35254625 2350704 16027174 1068367

Does this huge number of buffer_heads cause the soft lockup? Thanks, Forrest

On 4/11/07, Pekka Enberg <[EMAIL PROTECTED]> wrote:
> On 4/11/07, Zhao Forrest <[EMAIL PROTECTED]> wrote:
> > We're using RHEL5 with kernel version 2.6.18-8.el5. When doing a stress test on raw device for about 3-4 hours, we found the soft lockup message in dmesg. I know we're not reporting the bug on the latest kernel, but does any expert know if this is the known issue in old kernel? Or why kmem_cache_free occupy CPU for more than 10 seconds?
>
> Sounds like slab corruption. CONFIG_DEBUG_SLAB should tell you more.
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
> I'm confused - which end of this stack is up? cpuset_exit doesn't call do_exit, rather it's the other way around. But put_files_struct doesn't call do_exit, rather do_exit calls __exit_files calls put_files_struct.

I'm guessing it's x86_64, which generates crap traces.
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> I'm guessing it's x86_64, which generates crap traces.

Yes, it's x86_64. Is there a reliable way to generate stack traces under x86_64? Can enabling "[ ] Compile the kernel with frame pointers" help?
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On Wed, 2007-04-11 at 18:10 +0800, Zhao Forrest wrote:
> Yes, it's x86_64. Is there a reliable way to generate stack traces under x86_64? Can enabling "[ ] Compile the kernel with frame pointers" help?

Sometimes; the best is to redo the undo of the dwarf-based stack unwinder. But as you said, that _huge_ number of buffers might be the issue. I'm looking through the codepaths from __blkdev_put on downwards; I suspect we hold a single lock somewhere,... So hold on with patching the unwinder back in (unless of course you fancy doing so :-)
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On Wed, 2007-04-11 at 17:53 +0800, Zhao Forrest wrote:
> I got some new information: Before soft lockup message is out, we have:
>
> [EMAIL PROTECTED] home]# cat /proc/slabinfo |grep buffer_head
> buffer_head 10927942 10942560 120 32 1 : tunables 32 16 8 : slabdata 341955 341955 6 : globalstat 37602996 11589379 11743736 0 1 6918 12166031 1013708 : cpustat 35254590 2350698 13610965 907286
>
> Then after buffer_head is freed, we have:
>
> [EMAIL PROTECTED] home]# cat /proc/slabinfo |grep buffer_head
> buffer_head 9542 36384 120 32 1 : tunables 32 16 8 : slabdata 1137 1137 245 : globalstat 37602996 11589379 11743736 0 1 6983 20507478 1708818 : cpustat 35254625 2350704 16027174 1068367
>
> Does this huge number of buffer_head cause the soft lockup?

__blkdev_put() takes the BKL and bd_mutex; invalidate_mapping_pages() tries to take the PageLock. But no other locks seem held while free_buffer_head() is called. All these locks are preemptible (CONFIG_PREEMPT_BKL=y?) and should not hog the cpu like that; what preemption mode have you got selected? (CONFIG_PREEMPT_VOLUNTARY=y?)

Does this fix it?

--- fs/buffer.c~	2007-02-01 12:00:34.0 +0100
+++ fs/buffer.c	2007-04-11 12:35:48.0 +0200
@@ -3029,6 +3029,8 @@ out:
 			struct buffer_head *next = bh->b_this_page;
 			free_buffer_head(bh);
 			bh = next;
+
+			cond_resched();
 		} while (bh != buffers_to_free);
 	}
 	return ret;
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
Zhao Forrest wrote:
> Yes, it's x86_64. Is there a reliable way to generate stack traces under x86_64? Can enabling "[ ] Compile the kernel with frame pointers" help?

It will help a little, but not much, because the stack backtrace code ignores the frame pointers. But it will prevent tail calls, making it somewhat easier to make sense of the data. This should be a FAQ item somewhere...
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-04-11 at 17:53 +0800, Zhao Forrest wrote:
> > Does this huge number of buffer_head cause the soft lockup?
>
> __blkdev_put() takes the BKL and bd_mutex; invalidate_mapping_pages() tries to take the PageLock. But no other locks seem held while free_buffer_head() is called. All these locks are preemptible (CONFIG_PREEMPT_BKL=y?) and should not hog the cpu like that; what preemption mode have you got selected? (CONFIG_PREEMPT_VOLUNTARY=y?)

also, you can try this patch: http://groups.google.com/group/linux.kernel/browse_thread/thread/7086e4b9d5504dc9/c608bfea4614b07e?lnk=gst&q=+Big+kernel+lock+contention+in+do_open&rnum=1#c608bfea4614b07e
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On Wed, 2007-04-11 at 18:10 +0800, Zhao Forrest wrote:
> Yes, it's x86_64. Is there a reliable way to generate stack traces under x86_64? Can enabling "[ ] Compile the kernel with frame pointers" help?

CONFIG_UNWIND_INFO=y
CONFIG_STACK_UNWIND=y

should help. Thanks, Badari
Re: Why kmem_cache_free occupy CPU for more than 10 seconds?
On Wed, 11 Apr 2007 14:17:04 +0800 Zhao Forrest <[EMAIL PROTECTED]> wrote:
> We're using RHEL5 with kernel version 2.6.18-8.el5. When doing a stress test on raw device for about 3-4 hours, we found the soft lockup message in dmesg. I know we're not reporting the bug on the latest kernel, but does any expert know if this is the known issue in old kernel? Or why kmem_cache_free occupy CPU for more than 10 seconds? Please let me know if you need any information. Thanks, Forrest
>
> BUG: soft lockup detected on CPU#1!
> Call Trace: IRQ
>  [800b2c93] softlockup_tick+0xdb/0xed
>  [800933df] update_process_times+0x42/0x68
>  [80073d97] smp_local_timer_interrupt+0x23/0x47
>  [80074459] smp_apic_timer_interrupt+0x41/0x47
>  [8005bcc2] apic_timer_interrupt+0x66/0x6c
> EOI
>  [80007660] kmem_cache_free+0x1c0/0x1cb
>  [800262ee] free_buffer_head+0x2a/0x43
>  [80027110] try_to_free_buffers+0x89/0x9d
>  [80043041] invalidate_mapping_pages+0x90/0x15f
>  [800d4a77] kill_bdev+0xe/0x21
>  [800d4f9d] __blkdev_put+0x4f/0x169
>  [80012281] __fput+0xae/0x198
>  [80023647] filp_close+0x5c/0x64
>  [800384f9] put_files_struct+0x6c/0xc3
>  [80014f01] do_exit+0x2d2/0x8b1
>  [80046eb6] cpuset_exit+0x0/0x6c
>  [8002abd7] get_signal_to_deliver+0x427/0x456
>  [80059122] do_notify_resume+0x9c/0x7a9
>  [80086c6d] default_wake_function+0x0/0xe
>  [800b1fd8] audit_syscall_exit+0x2cd/0x2ec
>  [8005b362] int_signal+0x12/0x17

I think there's nothing unusual happening here - you closed the device and the kernel has to remove a tremendous number of pagecache pages, and that simply takes a long time. How much memory does the machine have?

There used to be a cond_resched() in invalidate_mapping_pages() which would have prevented this, but I rudely removed it to support /proc/sys/vm/drop_caches (which needs to call invalidate_inode_pages() under spinlock). We could resurrect that cond_resched() by passing in some flag, I guess. Or change the code to poke the softlockup detector. The former would be better.