Re: 2.6.22-rc6 spurious hangs
>>> Thomas, any chance you could try the patch below?
>> I'm still testing but I couldn't break it until now.
> Great, thanks a lot Thomas!

The box is still running without a problem; it seems the bug is fixed.

Thanks a lot,
Thomas
--
keep mailinglists in english, feel free to send PM in german
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc6 spurious hangs
> Thomas, any chance you could try the patch below? It is very, very stupid,
> it was done without any understanding of this code, and of course it is
> completely untested. I doubt very much it is correct, and even if it is
> correct it is definitely not good. It would be great if Dmitry can take a
> look.

I'm still testing but I couldn't break it until now. And I didn't find any drawbacks yet.

Thomas
--
keep mailinglists in english, feel free to send PM in german
Re: 2.6.22-rc6 spurious hangs
>> Jun 28 19:23:03 pearl cinergyt2_query_rc+0x0/0x2e9 [cinergyT2]
>
> cinergyt2_query_rc() hangs. I'll try to look tomorrow, but I know nothing
> about drivers/media/dvb/.

Does this mean the problem is in the cinergyt2 driver?

I'm having similar problems with another box, but with different hardware. While my laptop is used as a test system, the other one is used as a production TV recorder. I hoped we could trace the bug on the test system and fix the production one at the same time. :-/

The other box ("silver") is a desktop with two Hauppauge Nova-T DVB-T PCI cards and one (analog) Hauppauge WinTV PVR-350. Silver only hangs if the (digital) recording process has too much priority (silver is running 2.6.21.5-cfs-v17 +squashfs +ivtv):

As I wanted to give as much priority to the recording process as possible, I first ran dvbd as SCHED_RR. This hung the box quite often, sometimes after an uptime of several minutes, sometimes after two weeks. I switched to -ck and ran dvbd as SCHED_ISO, which worked without *any* problem for about 18 months. As -ck is discontinued, I switched to CFS and the box hung again (twice until I understood why) when dvbd was running at nice -15.

ATM dvbd runs at nice -12, but yesterday, during an rsync transfer of several >4G files, a recording was broken. 29 seconds of the recorded stream are lost because the system load was at 5 for about three hours. Perhaps the 29 missing seconds were caused not by too little CPU time but by the heavy IO of rsync. On the other hand, dvbd is also running at IO realtime prio 4 (ionice) while rsync ran as IO normal.

Any hints?

Thomas
--
keep mailinglists in english, feel free to send PM in german
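For readers who want to reproduce the priority settings described in this mail, here is a rough sketch of how the command lines could be assembled. It is only an illustration: `priority_argv` is a hypothetical helper (it is not part of dvbd), and it assumes the usual `nice`, `chrt` and `ionice` tools are installed. It only builds the argument list and does not start anything.

```python
import shlex


def priority_argv(command, nice=None, rr_priority=None, io_realtime_level=None):
    """Prefix `command` (a list of strings) with priority wrappers.

    nice              -> "nice -n <value>"       (CPU niceness)
    rr_priority       -> "chrt --rr <priority>"  (SCHED_RR realtime policy)
    io_realtime_level -> "ionice -c 1 -n <level>" (realtime I/O class)
    """
    argv = list(command)
    if nice is not None:
        argv = ["nice", "-n", str(nice)] + argv
    if rr_priority is not None:
        argv = ["chrt", "--rr", str(rr_priority)] + argv
    if io_realtime_level is not None:
        argv = ["ionice", "-c", "1", "-n", str(io_realtime_level)] + argv
    return argv


if __name__ == "__main__":
    # The setup from the mail: nice -12 plus I/O realtime prio 4.
    print(shlex.join(priority_argv(["dvbd"], nice=-12, io_realtime_level=4)))
```

Negative nice values and the realtime classes normally need root. Combining `nice` with `rr_priority` is allowed by the helper, but the nice value has no effect on scheduling while the process runs under SCHED_RR.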
Re: 2.6.22-rc6 spurious hangs
> Could you also show the result of sysrq-T ?

I was so happy that I could trigger it that fast ...
... that I forgot to press Alt-Sysrq-t before reboot. :-(

But, I could trigger it again. :-)

This time I can offer:
- Debug output from Oleg's patch (11x, every 30s)
- Alt-Sysrq-t (3x, about 30s between them)

There is no lockdep stuff but lockdep must have been running. It's enabled and did not fire before the bug was triggered.

The logfile is attached. (yes it is, I checked twice)

Thomas

messages.gz
Description: application/gzip
Re: 2.6.22-rc6 spurious hangs
Here is the logfile.

Thomas
--
keep mailinglists in english, feel free to send PM in german

messages.gz
Description: application/gzip
Re: 2.6.22-rc6 spurious hangs
>>>> As Ingo told me I ran 'echo t > /proc/sysrq-trigger' this time. The
>>>> corresponding part of my syslogs is attached, as well as my kernel config.
>>> Could you try the patch below? It dumps some info when flush_workqueue()
>>> hangs.
>> I'm compiling a patched kernel right now. As I wrote in my former mail the
>> whole thing is not easy to trigger. So it can take some time to get the info.
>
> Forgot to say, if you manage to trigger the hang, please wait a couple of
> minutes to collect more info from flush_wait().

Seems today is my lucky day: I triggered it in just a few minutes.

The logfile is attached.

Thomas
--
keep mailinglists in english, feel free to send PM in german
Re: 2.6.22-rc6 spurious hangs
>> As Ingo told me I ran 'echo t > /proc/sysrq-trigger' this time. The
>> corresponding part of my syslogs is attached, as well as my kernel config.
>
> Could you try the patch below? It dumps some info when flush_workqueue()
> hangs.

I'm compiling a patched kernel right now. As I wrote in my former mail, the whole thing is not easy to trigger. So it can take some time to get the info.

Thanks so far,
Thomas
--
keep mailinglists in english, feel free to send PM in german
2.6.22-rc6 spurious hangs
Hi there ...

I'm observing infrequent hangs with Linux 2.6. I can't tell when exactly it happened the first time, I think somewhere around 2.6.16 or 2.6.17. I see it about once or twice a month, with absolutely nothing in the logs.

So far I asked for help:
- in the -ck list
  Mon Sep 4 10:22:06 EST 2006, [ck] ck-patches seem to break DVB-T drivers
  (see http://bhhdoa.org.au/pipermail/ck/2006-September/thread.html#6385)
- in the linux-dvb list
  Wed Sep 6 19:02:29 CEST 2006, [linux-dvb] driver problems when using ck-patchset
  (http://www.linuxtv.org/pipermail/linux-dvb/2006-September/thread.html#12649)
- in the DaLUG (german, currently no archive)
  14.09.2006

But nobody could help me so far.

Here is what I do:

I was running different kernels with different patchsets. It happened in the past on -ck kernels (staircase), the vanilla scheduler and cfs. As far as I can remember the following patches were always applied: squashfs and vesa-tng.

Currently I'm running 2.6.22-rc6 with cfs-v18, vesa-tng and an XFS-lockdep patch:
  http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-v2.6.22-rc6-v18.patch
  http://dev.gentoo.org/~spock/projects/vesafb-tng/archive/vesafb-tng-1.0-rc2-2.6.20-rc2.patch
  see http://marc.info/?l=linux-kernel&m=118286232709378&w=2

I also installed these kernel modules via gentoo portage:
  ati-drivers-8.37.6-r1
  fuse-2.6.4-r1
  kqemu-1.3.0_pre11
  truecrypt-4.3
kqemu and truecrypt weren't loaded, but ati-drivers and fuse were.

The box I talk about is an IBM T41p with a 1.7GHz Pentium M and 512MB RAM. The distribution in use is gentoo, quite up to date. Attached to the box is a USB 2.0 DVB-T receiver (Cinergy T², Terratec).

In rare cases the keyboard stops working when the T² stops streaming DVB to the box. It happens when I record the stream to disk as well as when I stream it to mplayer.

If the end of streaming is caused by a keypress, 'q' or 'enter' in mplayer, that key gets stuck. It's repeated until I reboot the box. If the recording was scheduled and stops by itself, no more keys are recognized. The keyboard is dead, both the laptop's own and the attached USB keyboard. Magic SysRq keys are still working.

I can still use the mouse to move windows around, start new xterms via icewm's panel or copy and paste single characters from an xterm to other xterms. I can also close most of the open windows, for example firefox and most xterms. I cannot close an xterm which is started as 'xterm -e top' by icewm, or a vncviewer. Both windows stay open but lose their content.

If a root shell is open I can enter 'reboot' or 'halt', but most of the time this doesn't reboot or halt. I get the message for an upcoming shutdown in all xterms but the box doesn't come down. The system load continuously increases but there is nothing in top that shows why.

Ingo Molnar told me to enable CONFIG_PROVE_LOCKING but xfs triggers it long before the box hangs. I tested the patch mentioned above but it was triggered by xfs again, see [1], and I didn't reboot between this and the last hang.

[1] http://marc.info/?l=linux-kernel&m=118295294529681&w=2

As Ingo told me I ran 'echo t > /proc/sysrq-trigger' this time. The corresponding part of my syslogs is attached, as well as my kernel config.

Another thing I observed with the T² is that it doesn't work if it's already connected when the laptop boots up. I need to power off, disconnect and boot. If I connect the T² after bootup it works. I can also rmmod its driver when it's not in use. If I boot the box with the T² connected I cannot use it, the blue LED in the T² is always off and I cannot rmmod the driver. (I don't know whether I ever tried to rmmod the driver before I tried to use the T².)

Please CC me as I'm not subscribed to the list.

Thomas
--
keep mailinglists in english, feel free to send PM in german

messages.gz
Description: application/gzip

config.gz
Description: application/gzip
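The 'echo t > /proc/sysrq-trigger' step mentioned above asks the kernel to dump the state of all tasks into the kernel log, which is what ends up in the attached syslog. A minimal sketch of the same action from a script follows. `trigger_sysrq` is a hypothetical helper name, and the `path` parameter exists only so the function can be tried against an ordinary file; the real file needs root and a kernel built with CONFIG_MAGIC_SYSRQ.

```python
import os
import tempfile


def trigger_sysrq(key, path="/proc/sysrq-trigger"):
    """Write a single SysRq command character to the trigger file.

    't' requests a dump of all tasks; the output goes to the kernel
    log (dmesg / syslog), not to the caller.
    """
    if len(key) != 1:
        raise ValueError("SysRq command must be exactly one character")
    with open(path, "w") as trigger:
        trigger.write(key)


if __name__ == "__main__":
    # Dry run against a scratch file instead of the real trigger.
    fd, scratch = tempfile.mkstemp()
    os.close(fd)
    trigger_sysrq("t", path=scratch)
    with open(scratch) as check:
        print(repr(check.read()))
    os.remove(scratch)
```

When the keyboard is dead but a root shell can still be reached by pasting with the mouse, this is equivalent to pressing Alt-SysRq-t.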
Re: [BUG] Lockdep warning with XFS on 2.6.22-rc6
> Patch below should fix this (untested).

Just tested 2.6.22-rc6: message is gone when patch is applied. But deleting some directories in /var/tmp (which lives on xfs) I got:

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.

Thomas
--
keep mailinglists in english, feel free to send PM in german
Re: BUG: held lock freed!
> I'll run memtest86+ this night and post the results tomorrow.

Memtest86+ did not show any problems:
  time    8h
  pass    24
  errors  0

Please remember to CC me as I'm not subscribed to the list.

Thomas
--
keep mailinglists in english, feel free to send PM in german
Re: BUG: held lock freed!
>> I removed xfs from my system. The first reboot after replacing xfs with
>> ext3 brought me
>
> Perhaps this is a curse that falls on those who desert XFS ;)

My laptop sometimes 'kicks' its keyboard. That means no key is working any more, the mouse is ok and I can copy and paste single characters in xterms to enter some commands. Pasting 'reboot' in a root-xterm does not work. AFAIR it always happens when a DVB-T recording ends. But, even on daily recordings, only about once a month.

Ingo Molnar told me to activate CONFIG_PROVE_LOCKING. Since then I had

=============================================
[ INFO: possible recursive locking detected ]
2.6.21.5-cfs-v17 #5
---------------------------------------------
xauth/6510 is trying to acquire lock:
 (&(&ip->i_lock)->mr_lock){}, at: [<e16e0715>] xfs_ilock+0x47/0x67 [xfs]

but task is already holding lock:
 (&(&ip->i_lock)->mr_lock){}, at: [<e16e0715>] xfs_ilock+0x47/0x67 [xfs]

other info that might help us debug this:
2 locks held by xauth/6510:
 #0:  (&inode->i_mutex){--..}, at: [<c016477c>] open_namei+0xe2/0x555
 #1:  (&(&ip->i_lock)->mr_lock){}, at: [<e16e0715>] xfs_ilock+0x47/0x67 [xfs]

stack backtrace:
 [<c0136026>] __lock_acquire+0x11e/0xb23
 [<c0136dd9>] lock_acquire+0x56/0x6e
 [<e16e0715>] xfs_ilock+0x47/0x67 [xfs]
 [<c012fe10>] down_write+0x2e/0x46
 [<e16e0715>] xfs_ilock+0x47/0x67 [xfs]
 [<e16e0715>] xfs_ilock+0x47/0x67 [xfs]
 [<e16e0fa3>] xfs_iget_core+0x291/0x579 [xfs]
 [<e16e1312>] xfs_iget+0x87/0xfd [xfs]
 [<e16f7665>] xfs_trans_iget+0xe6/0x151 [xfs]
 [<e16e4a53>] xfs_ialloc+0xb2/0x479 [xfs]
 [<e16f7fda>] xfs_dir_ialloc+0x7b/0x29d [xfs]
 [<c012fe10>] down_write+0x2e/0x46
 [<e16fda9b>] xfs_create+0x31c/0x5d6 [xfs]
 [<e1707676>] xfs_vn_mknod+0x19b/0x2ce [xfs]
 [<c01621f6>] vfs_create+0xa5/0xeb
 [<c0164811>] open_namei+0x177/0x555
 [<c015a8ab>] get_unused_fd+0x1f/0xb4
 [<c015ab5c>] do_filp_open+0x25/0x39
 [<c02e57da>] _spin_unlock+0x14/0x1c
 [<c015a936>] get_unused_fd+0xaa/0xb4
 [<c015abb2>] do_sys_open+0x42/0xc3
 [<c015ac6c>] sys_open+0x1c/0x1e
 [<c0103bca>] sysenter_past_esp+0x5f/0x99
 =======================

or

=============================================
[ INFO: possible recursive locking detected ]
2.6.21.5-cfs-v17 #5
---------------------------------------------
dotlockfile/6467 is trying to acquire lock:
 (&(&ip->i_lock)->mr_lock){}, at: [<e15a6715>] xfs_ilock+0x47/0x67 [xfs]

but task is already holding lock:
 (&(&ip->i_lock)->mr_lock){}, at: [<e15a6715>] xfs_ilock+0x47/0x67 [xfs]

other info that might help us debug this:
2 locks held by dotlockfile/6467:
 #0:  (&inode->i_mutex){--..}, at: [<c016477c>] open_namei+0xe2/0x555
 #1:  (&(&ip->i_lock)->mr_lock){}, at: [<e15a6715>] xfs_ilock+0x47/0x67

stack backtrace:
 [<c0136026>] __lock_acquire+0x11e/0xb23
 [<c0136dd9>] lock_acquire+0x56/0x6e
 [<e15a6715>] xfs_ilock+0x47/0x67 [xfs]
 [<c012fe10>] down_write+0x2e/0x46
 [<e15a6715>] xfs_ilock+0x47/0x67 [xfs]
 [<e15a6715>] xfs_ilock+0x47/0x67 [xfs]
 [<e15a6fa3>] xfs_iget_core+0x291/0x579 [xfs]
 [<e15a7312>] xfs_iget+0x87/0xfd [xfs]
 [<e15bd665>] xfs_trans_iget+0xe6/0x151 [xfs]
 [<e15aaa53>] xfs_ialloc+0xb2/0x479 [xfs]
 [<e15bdfda>] xfs_dir_ialloc+0x7b/0x29d [xfs]
 [<c012fe10>] down_write+0x2e/0x46
 [<e15c3a9b>] xfs_create+0x31c/0x5d6 [xfs]
 [<e15cd676>] xfs_vn_mknod+0x19b/0x2ce [xfs]
 [<c01621f6>] vfs_create+0xa5/0xeb
 [<c0164811>] open_namei+0x177/0x555
 [<c015a8ab>] get_unused_fd+0x1f/0xb4
 [<c015ab5c>] do_filp_open+0x25/0x39
 [<c02e57da>] _spin_unlock+0x14/0x1c
 [<c015a936>] get_unused_fd+0xaa/0xb4
 [<c015abb2>] do_sys_open+0x42/0xc3
 [<c015ac6c>] sys_open+0x1c/0x1e
 [<c0103bca>] sysenter_past_esp+0x5f/0x99
 =======================

in dmesg right after bootup. Ingo said that xfs used to have problems with lockdep. This doesn't mean there's anything wrong with XFS, but lockdep turns itself off after it finds the first locking problem. So I formatted the data partition as ext3, which gave me the previously posted info.

> Odd. I can't see any error at the shmem_delete_inode end nor at the
> free_fdtable_rcu end. It seems to be some kind of corruption, whereby
> free_fdtable_rcu is kfree'ing some memory (perhaps fdt->open_fds),
> but the address kfreed is that of the shmem_sb_info in which it has
> just acquired a spinlock at the top of the stack.

There was a typo in my mail, I'm running 2.6.21.5-cfs-v18, not 2.6.22, sorry for that.

> It could come about through a single-bit error, and I was going to
> suggest that you give memtest86+ a good run overnight. And still do
> suggest that, though we seem to have rather too much of a coincidence
> for it to be a likely explanation. But I've no other ideas, sorry.

I'll run memtest86+ this night and post the results tomorrow.

Please remember to CC me as I'm not subscribed to the list.

Thomas
--
keep mailinglists in english, feel free to send PM in german
BUG: held lock freed!
Hi there ...

I removed xfs from my system. The first reboot after replacing xfs with ext3 brought me

Jun 26 08:43:17 pearl =========================
Jun 26 08:43:17 pearl [ BUG: held lock freed! ]
Jun 26 08:43:17 pearl -------------------------
Jun 26 08:43:17 pearl udevd/3064 is freeing memory c16fbe40-c16fbe7f, with a lock still held there!
Jun 26 08:43:17 pearl (&sbinfo->stat_lock){--..}, at: [<c0157eb1>] shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl 1 lock held by udevd/3064:
Jun 26 08:43:17 pearl #0:  (&sbinfo->stat_lock){--..}, at: [<c0157eb1>] shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl
Jun 26 08:43:17 pearl stack backtrace:
Jun 26 08:43:17 pearl [<c0135331>] debug_check_no_locks_freed+0xe7/0x11a
Jun 26 08:43:17 pearl [<c0158649>] kfree+0x45/0x7f
Jun 26 08:43:17 pearl [<c016c3d2>] free_fdtable_rcu+0x3a/0x70
Jun 26 08:43:17 pearl [<c012a95f>] __rcu_process_callbacks+0xfd/0x165
Jun 26 08:43:17 pearl [<c012a9d6>] rcu_process_callbacks+0xf/0x1e
Jun 26 08:43:17 pearl [<c0120c69>] tasklet_action+0x3d/0x68
Jun 26 08:43:17 pearl [<c0120b9e>] __do_softirq+0x41/0x92
Jun 26 08:43:17 pearl [<c0120c16>] do_softirq+0x27/0x3d
Jun 26 08:43:17 pearl [<c0120fae>] irq_exit+0x35/0x64
Jun 26 08:43:17 pearl [<c0106108>] do_IRQ+0x7e/0x92
Jun 26 08:43:17 pearl [<c0104634>] common_interrupt+0x24/0x34
Jun 26 08:43:17 pearl [<c010463e>] common_interrupt+0x2e/0x34
Jun 26 08:43:17 pearl [<c0136513>] lock_acquire+0x68/0x6e
Jun 26 08:43:17 pearl [<c0157eb1>] shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl [<c02e4daa>] _spin_lock+0x29/0x34
Jun 26 08:43:17 pearl [<c0157eb1>] shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl [<c0157eb1>] shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl [<c0157df0>] shmem_delete_inode+0x0/0xda
Jun 26 08:43:17 pearl [<c016b2a8>] generic_delete_inode+0x8c/0xf4
Jun 26 08:43:17 pearl [<c016aa33>] iput+0x60/0x62
Jun 26 08:43:17 pearl [<c01636d7>] do_unlinkat+0xbe/0x132
Jun 26 08:43:17 pearl [<c0103bfa>] sysenter_past_esp+0x8f/0x99
Jun 26 08:43:17 pearl [<c0135227>] trace_hardirqs_on+0x11e/0x141
Jun 26 08:43:17 pearl [<c0103bca>] sysenter_past_esp+0x5f/0x99
Jun 26 08:43:17 pearl =======================

But it only came once, several reboots after that were ok.

I changed my kernel config today: e1000 is now "=y" (was "=m"), I removed PCMCIA as I do not use it and some other modules complained about it, and I added CONFIG_HIGHMEM4G=y (was CONFIG_NOHIGHMEM=y).

The running kernel is 2.6.22.5 +cfs +squashfs. My distribution is gentoo (x86), quite up to date, udev is 104-r12.

Please CC me as I'm not subscribed to the list.

Thomas
--
keep mailinglists in english, feel free to send PM in german
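When comparing reports like the one above, it can help to reduce a backtrace to a list of (address, symbol, offset, size, module) entries. The following is a sketch of one way to do that, not a tool used in this thread: `Frame`, `FRAME_RE` and `parse_frames` are hypothetical names. It assumes frames look like `[<c0158649>] kfree+0x45/0x7f`, optionally followed by a module in brackets, and it also accepts frames whose address was lost in archiving (an empty `[]`). It handles one report at a time.

```python
import re
from typing import List, NamedTuple, Optional

# One frame: "[<addr>] symbol+0xoffset/0xsize [module]"; addr and module optional.
FRAME_RE = re.compile(
    r"\[(?:<(?P<addr>[0-9a-f]+)>)?\]\s+"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


class Frame(NamedTuple):
    address: Optional[int]  # None when the archive stripped the address
    symbol: str
    offset: int             # offset of the return address inside the function
    size: int               # total size of the function
    module: Optional[str]   # e.g. "xfs"; None for built-in code


def parse_frames(text: str) -> List[Frame]:
    """Return the frames listed after 'stack backtrace:' in a single report."""
    # Skip the "lock ... at: [<...>] func" lines that precede the backtrace.
    _, found, tail = text.partition("stack backtrace:")
    body = tail if found else text
    frames = []
    for match in FRAME_RE.finditer(body):
        addr = match.group("addr")
        frames.append(Frame(
            int(addr, 16) if addr else None,
            match.group("symbol"),
            int(match.group("offset"), 16),
            int(match.group("size"), 16),
            match.group("module"),
        ))
    return frames


if __name__ == "__main__":
    sample = (
        "Jun 26 08:43:17 pearl stack backtrace:\n"
        "Jun 26 08:43:17 pearl [<c0135331>] debug_check_no_locks_freed+0xe7/0x11a\n"
        "Jun 26 08:43:17 pearl [<c0158649>] kfree+0x45/0x7f\n"
    )
    for frame in parse_frames(sample):
        print(frame.symbol, hex(frame.offset))
```

Because the syslog prefix is ignored, the same function works on raw dmesg output and on /var/log/messages excerpts.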
BUG: held lock freed!
Hi there ... I removed xfs from my system. The first reboot after replacing xfs with ext3 brought be Jun 26 08:43:17 pearl = Jun 26 08:43:17 pearl [ BUG: held lock freed! ] Jun 26 08:43:17 pearl - Jun 26 08:43:17 pearl udevd/3064 is freeing memory c16fbe40-c16fbe7f, with a lock still held there! Jun 26 08:43:17 pearl (sbinfo-stat_lock){--..}, at: [c0157eb1] shmem_delete_inode+0xc1/0xda Jun 26 08:43:17 pearl 1 lock held by udevd/3064: Jun 26 08:43:17 pearl #0: (sbinfo-stat_lock){--..}, at: [c0157eb1] shmem_delete_inode+0xc1/0xda Jun 26 08:43:17 pearl Jun 26 08:43:17 pearl stack backtrace: Jun 26 08:43:17 pearl [c0135331] debug_check_no_locks_freed+0xe7/0x11a Jun 26 08:43:17 pearl [c0158649] kfree+0x45/0x7f Jun 26 08:43:17 pearl [c016c3d2] free_fdtable_rcu+0x3a/0x70 Jun 26 08:43:17 pearl [c012a95f] __rcu_process_callbacks+0xfd/0x165 Jun 26 08:43:17 pearl [c012a9d6] rcu_process_callbacks+0xf/0x1e Jun 26 08:43:17 pearl [c0120c69] tasklet_action+0x3d/0x68 Jun 26 08:43:17 pearl [c0120b9e] __do_softirq+0x41/0x92 Jun 26 08:43:17 pearl [c0120c16] do_softirq+0x27/0x3d Jun 26 08:43:17 pearl [c0120fae] irq_exit+0x35/0x64 Jun 26 08:43:17 pearl [c0106108] do_IRQ+0x7e/0x92 Jun 26 08:43:17 pearl [c0104634] common_interrupt+0x24/0x34 Jun 26 08:43:17 pearl [c010463e] common_interrupt+0x2e/0x34 Jun 26 08:43:17 pearl [c0136513] lock_acquire+0x68/0x6e Jun 26 08:43:17 pearl [c0157eb1] shmem_delete_inode+0xc1/0xda Jun 26 08:43:17 pearl [c02e4daa] _spin_lock+0x29/0x34 Jun 26 08:43:17 pearl [c0157eb1] shmem_delete_inode+0xc1/0xda Jun 26 08:43:17 pearl [c0157eb1] shmem_delete_inode+0xc1/0xda Jun 26 08:43:17 pearl [c0157df0] shmem_delete_inode+0x0/0xda Jun 26 08:43:17 pearl [c016b2a8] generic_delete_inode+0x8c/0xf4 Jun 26 08:43:17 pearl [c016aa33] iput+0x60/0x62 Jun 26 08:43:17 pearl [c01636d7] do_unlinkat+0xbe/0x132 Jun 26 08:43:17 pearl [c0103bfa] sysenter_past_esp+0x8f/0x99 Jun 26 08:43:17 pearl [c0135227] trace_hardirqs_on+0x11e/0x141 Jun 26 08:43:17 pearl [c0103bca] 
sysenter_past_esp+0x5f/0x99 Jun 26 08:43:17 pearl === But it only came once, several reboots after that were ok. I changed my kernel config today: e1000 is now =y (was =m), I removed PCMCIA as I do not use it and some other modules complained about it, and I added CONFIG_HIGHMEM4G=y (was CONFIG_NOHIGHMEM=y) The running kernel is 2.6.22.5 +cfs +squashfs. My distribution is gentoo (x86), quite up to date, udev is 104-r12. Please CC me as I'm not subscribed to the list. Thomas -- keep mailinglists in english, feel free to send PM in german - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: held lock freed!
>> I removed xfs from my system. The first reboot after replacing xfs with ext3 brought be

> Perhaps this is a curse that falls on those who desert XFS ;)

My laptop sometimes 'kicks' its keyboard. That means no key works any more; the mouse is OK, and I can copy and paste single characters in xterms to enter some commands. Pasting 'reboot' into a root xterm does not work. AFAIR it always happens when a DVB-T recording ends, but, even with daily recordings, only about once a month.

Ingo Molnar told me to activate CONFIG_PROVE_LOCKING. Since then I have had

=============================================
[ INFO: possible recursive locking detected ]
2.6.21.5-cfs-v17 #5
---------------------------------------------
xauth/6510 is trying to acquire lock:
 ((ip->i_lock)->mr_lock){}, at: [e16e0715] xfs_ilock+0x47/0x67 [xfs]

but task is already holding lock:
 ((ip->i_lock)->mr_lock){}, at: [e16e0715] xfs_ilock+0x47/0x67 [xfs]

other info that might help us debug this:
2 locks held by xauth/6510:
 #0: (inode->i_mutex){--..}, at: [c016477c] open_namei+0xe2/0x555
 #1: ((ip->i_lock)->mr_lock){}, at: [e16e0715] xfs_ilock+0x47/0x67 [xfs]

stack backtrace:
 [c0136026] __lock_acquire+0x11e/0xb23
 [c0136dd9] lock_acquire+0x56/0x6e
 [e16e0715] xfs_ilock+0x47/0x67 [xfs]
 [c012fe10] down_write+0x2e/0x46
 [e16e0715] xfs_ilock+0x47/0x67 [xfs]
 [e16e0715] xfs_ilock+0x47/0x67 [xfs]
 [e16e0fa3] xfs_iget_core+0x291/0x579 [xfs]
 [e16e1312] xfs_iget+0x87/0xfd [xfs]
 [e16f7665] xfs_trans_iget+0xe6/0x151 [xfs]
 [e16e4a53] xfs_ialloc+0xb2/0x479 [xfs]
 [e16f7fda] xfs_dir_ialloc+0x7b/0x29d [xfs]
 [c012fe10] down_write+0x2e/0x46
 [e16fda9b] xfs_create+0x31c/0x5d6 [xfs]
 [e1707676] xfs_vn_mknod+0x19b/0x2ce [xfs]
 [c01621f6] vfs_create+0xa5/0xeb
 [c0164811] open_namei+0x177/0x555
 [c015a8ab] get_unused_fd+0x1f/0xb4
 [c015ab5c] do_filp_open+0x25/0x39
 [c02e57da] _spin_unlock+0x14/0x1c
 [c015a936] get_unused_fd+0xaa/0xb4
 [c015abb2] do_sys_open+0x42/0xc3
 [c015ac6c] sys_open+0x1c/0x1e
 [c0103bca] sysenter_past_esp+0x5f/0x99
 =======================

or

=============================================
[ INFO: possible recursive locking detected ]
2.6.21.5-cfs-v17 #5
---------------------------------------------
dotlockfile/6467 is trying to acquire lock:
 ((ip->i_lock)->mr_lock){}, at: [e15a6715] xfs_ilock+0x47/0x67 [xfs]

but task is already holding lock:
 ((ip->i_lock)->mr_lock){}, at: [e15a6715] xfs_ilock+0x47/0x67 [xfs]

other info that might help us debug this:
2 locks held by dotlockfile/6467:
 #0: (inode->i_mutex){--..}, at: [c016477c] open_namei+0xe2/0x555
 #1: ((ip->i_lock)->mr_lock){}, at: [e15a6715] xfs_ilock+0x47/0x67

stack backtrace:
 [c0136026] __lock_acquire+0x11e/0xb23
 [c0136dd9] lock_acquire+0x56/0x6e
 [e15a6715] xfs_ilock+0x47/0x67 [xfs]
 [c012fe10] down_write+0x2e/0x46
 [e15a6715] xfs_ilock+0x47/0x67 [xfs]
 [e15a6715] xfs_ilock+0x47/0x67 [xfs]
 [e15a6fa3] xfs_iget_core+0x291/0x579 [xfs]
 [e15a7312] xfs_iget+0x87/0xfd [xfs]
 [e15bd665] xfs_trans_iget+0xe6/0x151 [xfs]
 [e15aaa53] xfs_ialloc+0xb2/0x479 [xfs]
 [e15bdfda] xfs_dir_ialloc+0x7b/0x29d [xfs]
 [c012fe10] down_write+0x2e/0x46
 [e15c3a9b] xfs_create+0x31c/0x5d6 [xfs]
 [e15cd676] xfs_vn_mknod+0x19b/0x2ce [xfs]
 [c01621f6] vfs_create+0xa5/0xeb
 [c0164811] open_namei+0x177/0x555
 [c015a8ab] get_unused_fd+0x1f/0xb4
 [c015ab5c] do_filp_open+0x25/0x39
 [c02e57da] _spin_unlock+0x14/0x1c
 [c015a936] get_unused_fd+0xaa/0xb4
 [c015abb2] do_sys_open+0x42/0xc3
 [c015ac6c] sys_open+0x1c/0x1e
 [c0103bca] sysenter_past_esp+0x5f/0x99
 =======================

in dmesg right after bootup. Ingo said that XFS used to have problems with lockdep. That doesn't mean there's anything wrong with XFS, but lockdep turns itself off after it finds the first locking problem. So I formatted the data partition as ext3, which gave me the info I posted earlier.

> Odd. I can't see any error at the shmem_delete_inode end nor at the free_fdtable_rcu end. It seems to be some kind of corruption, whereby free_fdtable_rcu is kfree'ing some memory (perhaps fdt->open_fds), but the address kfreed is that of the shmem_sb_info in which it has just acquired a spinlock at the top of the stack.

There was a typo in my mail: I'm running 2.6.21.5-cfs-v18, not 2.6.22. Sorry for that.

> It could come about through a single-bit error, and I was going to suggest that you give memtest86+ a good run overnight. And still do suggest that, though we seem to have rather too much of a coincidence for it to be a likely explanation. But I've no other ideas, sorry.

I'll run memtest86+ tonight and post the results tomorrow.

Please remember to CC me as I'm not subscribed to the list.

Thomas
-- 
keep mailinglists in english, feel free to send PM in german
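The point above, that lockdep turns itself off after the first report, can be illustrated with a toy model. This is only a sketch and not the kernel's lockdep: the class name ToyLockValidator, its methods, and the lock-class strings are made up for illustration. It assumes that the checker tracks lock classes rather than individual lock instances, so taking a second lock of a class that is already held is reported once, after which nothing further is reported.

```python
class ToyLockValidator:
    """Toy class-based lock checker (hypothetical; not the kernel's lockdep)."""

    def __init__(self):
        self.enabled = True   # checking is on until the first report
        self.held = []        # lock classes currently held, in acquisition order
        self.reports = []     # human-readable reports produced so far

    def acquire(self, lock_class, instance):
        # Flag a second acquisition of a class that is already held,
        # regardless of whether it is the same lock instance.
        if self.enabled and lock_class in self.held:
            self.reports.append(
                f"possible recursive locking detected: {lock_class} ({instance})"
            )
            self.enabled = False  # the first report disables further checking
        self.held.append(lock_class)

    def release(self, lock_class):
        self.held.remove(lock_class)


def demo():
    v = ToyLockValidator()
    v.acquire("inode->i_mutex", "instance A")
    v.acquire("ip->i_lock", "instance A")
    v.acquire("ip->i_lock", "instance B")  # same class, different instance
    v.release("ip->i_lock")
    v.release("ip->i_lock")
    v.release("inode->i_mutex")
    # A later problem of the same kind goes unreported: checking is off.
    v.acquire("other_lock", "instance C")
    v.acquire("other_lock", "instance D")
    return v
```

Calling demo() yields exactly one report, for the second "ip->i_lock" acquisition; the later "other_lock" nesting is silently ignored. That is why a first report from one subsystem can hide problems elsewhere until it is out of the way.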
BUG: at fs/inotify.c:172 set_dentry_child_flags()
Hi there ...

I upgraded from 2.6.19-ck2 to 2.6.21.5-cfs-v17 three days ago. Since then I have had the following in my /var/log/messages, up to 30 times a day:

Jun 22 13:37:18 silver BUG: at fs/inotify.c:172 set_dentry_child_flags()
Jun 22 13:37:18 silver [c0184b6d] set_dentry_child_flags+0xc5/0x174
Jun 22 13:37:18 silver [c0184ccc] remove_watch_no_event+0x6f/0x71
Jun 22 13:37:18 silver [c0185255] inotify_destroy+0x5d/0xa9
Jun 22 13:37:18 silver [c0185af5] inotify_release+0x14/0x5c
Jun 22 13:37:18 silver [c015fd76] __fput+0x16a/0x17b
Jun 22 13:37:18 silver [c015e5c6] filp_close+0x43/0x6d
Jun 22 13:37:18 silver [c011fe4e] close_files+0x71/0x80
Jun 22 13:37:18 silver [c011feba] put_files_struct+0x27/0x56
Jun 22 13:37:18 silver [c0120850] do_exit+0x12a/0x40a
Jun 22 13:37:18 silver [c015f038] sys_read+0x47/0x76
Jun 22 13:37:18 silver [c0120b74] do_group_exit+0x24/0x75
Jun 22 13:37:18 silver [c0103f08] syscall_call+0x7/0xb
Jun 22 13:37:18 silver [c03f] svcauth_gss_release+0x33d/0x371
Jun 22 13:37:18 silver =======================

Please CC me as I'm not subscribed to the list.

Thomas
-- 
keep mailinglists in english, feel free to send PM in german
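To put a number on "up to 30 times a day", the BUG lines could be tallied per day straight from the log. The following is a minimal sketch, assuming the log lines have the same shape as the excerpt in this message (month, day, time, hostname, then "BUG: at file:line function()"); the function name count_bugs_per_day and the regular expression are illustrative and may need adjusting for other syslog formats.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt, e.g.
# "Jun 22 13:37:18 silver BUG: at fs/inotify.c:172 set_dentry_child_flags()"
BUG_LINE = re.compile(
    r"^(\w{3}\s+\d+) \d\d:\d\d:\d\d (\S+) BUG: at (\S+) (\S+)\(\)"
)


def count_bugs_per_day(lines):
    """Count BUG reports per (day, source location); backtrace frames are skipped."""
    counts = Counter()
    for line in lines:
        m = BUG_LINE.match(line)
        if m:
            day, _host, location, _func = m.groups()
            counts[(day, location)] += 1
    return counts
```

For example, count_bugs_per_day(open("/var/log/messages")) would return a Counter keyed by pairs such as ("Jun 22", "fs/inotify.c:172").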