Re: 2.6.22-rc6 spurious hangs

2007-07-01 Thread Thomas Sattler
>>> Thomas, any chance you could try the patch below?
>> I'm still testing but I couldn't break it until now.
> Great, thanks a lot Thomas!
The box is still running without a problem;
it seems the bug is fixed.

Thanks a lot,
Thomas

-- 
keep mailinglists in english, feel free to send PM in german
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc6 spurious hangs

2007-07-01 Thread Thomas Sattler
> Thomas, any chance you could try the patch below? It is very, very stupid,
> it was done without any understanding of this code, and of course it is
> completely untested. I doubt very much it is correct, and even if it is
> correct it is definitely not good. It would be great if Dmitry can take a 
> look.

I'm still testing, but I haven't been able to break it so far.
And I haven't found any drawbacks yet.

Thomas

-- 
keep mailinglists in english, feel free to send PM in german


Re: 2.6.22-rc6 spurious hangs

2007-06-29 Thread Thomas Sattler
>> Jun 28 19:23:03 pearl cinergyt2_query_rc+0x0/0x2e9 [cinergyT2]
> 
> cinergyt2_query_rc() hangs. I'll try to look tomorrov, but I know nothing
> about drivers/media/dvb/.

Does this mean the problem is in the cinergyt2 driver? I'm having similar
problems with another box, but with different hardware. While my laptop is
used as a test system, the other one is used as a 'production' TV recorder.
I had hoped we could trace the bug on the test system and fix the production
one at the same time. :-/

The other box ("silver") is a desktop with two Hauppauge Nova-T DVB-T
PCI cards and one (analog) Hauppauge WinTV PVR-350. Silver only hangs if
the (digital) recording process has too much priority. (Silver is running
2.6.21.5-cfs-v17 +squashfs +ivtv.)

As I wanted to give as much priority to the recording process as possible,
I first ran dvbd as SCHED_RR. This hung the box quite often, sometimes
after an uptime of several minutes, sometimes after two weeks.

I switched to -ck and ran dvbd as SCHED_ISO, which worked without *any*
problem for about 18 months. As -ck is discontinued, I switched to CFS and
the box hung again (twice until I understood why) when dvbd was running at
nice -15.

ATM dvbd runs at nice -12, but yesterday, during an rsync transfer of
several >4G files, a recording was broken. 29 seconds of the recorded
stream are lost because the system load was at 5 for about three hours.

Perhaps the 29 missing seconds were caused not by too little CPU time but by
the heavy I/O of rsync. On the other hand, dvbd is also running at I/O
realtime prio 4 (ionice), while rsync ran with normal I/O priority.
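
For reference, the three priority settings mentioned in this mail (SCHED_RR,
a negative nice value and the realtime I/O class) correspond to the chrt,
renice and ionice tools. The following is only a sketch: the helper names and
the SCHED_RR priority of 10 are assumptions made for illustration, the
commands need root, and by default the script just prints what it would run.

```shell
#!/bin/sh
# Print a command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# CPU nice value and I/O class as described in the mail.
# Check your renice man page for whether -n is absolute or relative.
set_recorder_prio() {
    pid=$1
    run renice -n -12 -p "$pid"      # CPU: nice -12
    run ionice -c 1 -n 4 -p "$pid"   # I/O: realtime class, level 4
}

# SCHED_RR as tried first; the priority 10 is an assumed value.
set_recorder_rr() {
    run chrt -r -p 10 "$1"
}

# Usage (dry run):  set_recorder_prio "$(pidof dvbd)"
```

Running it with DRY_RUN=0 as root would apply the settings; given the hangs
described above, SCHED_RR in particular should be tried with care.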

Any hints?
Thomas

-- 
keep mailinglists in english, feel free to send PM in german


Re: 2.6.22-rc6 spurious hangs

2007-06-28 Thread Thomas Sattler
> Could you also show the result of sysrq-T ?
I was so happy that I could trigger it that fast ...
... that I forgot to press Alt-Sysrq-t before reboot.
:-(

But, I could trigger it again. :-)

This time I can offer:

 - Debug output from Oleg's patch (11x, every 30s)
 - Alt-Sysrq-t (3x, about 30s between them)

There is no lockdep output, but lockdep must have
been running: it is enabled and did not fire
before the bug was triggered.

The logfile is attached.
(yes it is, I checked twice)

Thomas



messages.gz
Description: application/gzip


Re: 2.6.22-rc6 spurious hangs

2007-06-28 Thread Thomas Sattler
Here is the logfile.

Thomas

-- 
keep mailinglists in english, feel free to send PM in german


messages.gz
Description: application/gzip


Re: 2.6.22-rc6 spurious hangs

2007-06-28 Thread Thomas Sattler
>>>> As Ingo told me I run 'echo t > /proc/sysrq-trigger' this time. The
>>>> corresponding part of my syslogs is attached, as well as my kernel config.
>>> Could you try the patch below? It dumps some info when flush_workqueue()
>>> hangs.
>> I'm compiling a patched kernel right now. As I wrote in my former mail the
>> whole thing not easy to trigger. So it can take some time to get the info.
> 
> Forgot to say, if you manage to trigger the hang, please wait a couple of
> minutes to collect more info from flush_wait().

Seems today is my lucky day: I triggered it in just a few minutes.

The logfile is attached.

Thomas

-- 
keep mailinglists in english, feel free to send PM in german


Re: 2.6.22-rc6 spurious hangs

2007-06-28 Thread Thomas Sattler
>> As Ingo told me I run 'echo t > /proc/sysrq-trigger' this time. The
>> corresponding part of my syslogs is attached, as well as my kernel config.
> 
> Could you try the patch below? It dumps some info when flush_workqueue()
> hangs.

I'm compiling a patched kernel right now. As I wrote in my previous mail, the
whole thing is not easy to trigger, so it can take some time to get the info.

Thanks so far,
Thomas

-- 
keep mailinglists in english, feel free to send PM in german


2.6.22-rc6 spurious hangs

2007-06-28 Thread Thomas Sattler
Hi there ...

I'm observing occasional hangs with Linux 2.6. I can't tell exactly when it
happened the first time; I think somewhere around 2.6.16 or 2.6.17. I
see it about once or twice a month, with absolutely nothing in the logs.
So far I have asked for help:

 - in the -ck list

Mon Sep 4 10:22:06 EST 2006, [ck] ck-patches seem to break DVB-T drivers
(see http://bhhdoa.org.au/pipermail/ck/2006-September/thread.html#6385)

 - in the linux-dvb list

Wed Sep 6 19:02:29 CEST 2006, [linux-dvb] driver problems when using
ck-patchset
(http://www.linuxtv.org/pipermail/linux-dvb/2006-September/thread.html#12649)

 - in the DaLUG (german, currently no archive) 14.09.2006


But nobody could help me so far.

Here is what I do:

I have run different kernels with different patchsets. It has happened on
-ck kernels (staircase), the vanilla scheduler and CFS. As far
as I can remember, the following patches were always applied: squashfs
and vesa-tng.

Currently I'm running 2.6.22-rc6 with cfs-v18, vesa-tng and an
XFS-lockdep patch:

http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-v2.6.22-rc6-v18.patch
http://dev.gentoo.org/~spock/projects/vesafb-tng/archive/vesafb-tng-1.0-rc2-2.6.20-rc2.patch
see http://marc.info/?l=linux-kernel&m=118286232709378&w=2

I also installed these kernel modules via gentoo portage:

ati-drivers-8.37.6-r1
fuse-2.6.4-r1
kqemu-1.3.0_pre11
truecrypt-4.3

kqemu and truecrypt weren't loaded, but ati-drivers and fuse were.

The box I'm talking about is an IBM T41p with a 1.7GHz Pentium M and 512MB
RAM. The distribution in use is Gentoo, quite up to date. Attached to the box
is a USB 2.0 DVB-T receiver (Cinergy T², Terratec).

In rare cases the keyboard stops working when the T² stops streaming DVB
to the box. It happens when I record the stream to disk as well as when
I stream it to mplayer.

If the end of streaming is caused by a keypress ('q' or 'enter' in mplayer),
that key gets stuck: it is repeated until I reboot the box.

If the recording was scheduled and stops by itself, no more keys are
recognized. The keyboard is dead, both the laptop's own and the attached
USB keyboard. Magic SysRq keys still work.

I can still use the mouse to move windows around, start new xterms via
icewm's panel or copy and paste single characters from an xterm to other
xterms.

I can also close most of the open windows, for example firefox and most
xterms. I cannot close an xterm which is started as 'xterm -e top' by
icewm or a vncviewer. Both windows stay open but lose their content.

If a root shell is open I can enter 'reboot' or 'halt', but most of the
time this doesn't reboot or halt the box. I get the message about an upcoming
shutdown in all xterms, but the box doesn't go down.

The system load continuously increases, but top shows nothing that explains
why.

Ingo Molnar told me to enable CONFIG_PROVE_LOCKING, but xfs triggers it
long before the box hangs. I tested the patch mentioned above, but lockdep
was triggered by xfs again, see [1], and I didn't reboot between this and the
last hang. [1] http://marc.info/?l=linux-kernel&m=118295294529681&w=2

As Ingo told me, I ran 'echo t > /proc/sysrq-trigger' this time. The
corresponding part of my syslog is attached, as well as my kernel config.
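
For anyone who wants to repeat this step, here is a minimal sketch of taking
such a task dump. The function names and the default log file name are
invented for illustration. As far as I know, the /proc/sys/kernel/sysrq
bitmask (1 = everything, 8 = debugging dumps) only restricts the keyboard
combination, while writing to /proc/sysrq-trigger as root is always allowed.

```shell
#!/bin/sh
# Succeed if the given /proc/sys/kernel/sysrq value permits the
# Alt-SysRq-t task dump from the keyboard (1 = all, bit 8 = dumps).
sysrq_keyboard_dump_allowed() {
    mask=$1
    [ "$mask" -eq 1 ] || [ $(( mask & 8 )) -ne 0 ]
}

# Trigger a task dump and save the kernel log. Needs root.
# The default file name is a placeholder, not from the original mail.
dump_tasks() {
    echo t > /proc/sysrq-trigger
    dmesg > "${1:-sysrq-t.log}"
}

# Usage (as root):
#   sysrq_keyboard_dump_allowed "$(cat /proc/sys/kernel/sysrq)" \
#       || echo "Alt-SysRq-t is disabled on the keyboard"
#   dump_tasks /tmp/sysrq-t.log
```

The task dump can be long, so if the kernel log buffer is small the syslog
file may hold a more complete copy than dmesg.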


Another thing I observed with the T² is that it doesn't work if it's
already connected when the laptop boots up. I need to power off,
disconnect and boot. If I connect the T² after bootup it works. I can
also rmmod its driver when it's not in use.

If I boot the box with the T² connected I cannot use it, the blue LED in
the T² is always off and I cannot rmmod the driver. (I don't know
whether I ever tried to rmmod the driver before I tried to use the T².)


Please CC me as I'm not subscribed to the list.

Thomas

-- 
keep mailinglists in english, feel free to send PM in german


messages.gz
Description: application/gzip


config.gz
Description: application/gzip


Re: [BUG] Lockdep warning with XFS on 2.6.22-rc6

2007-06-27 Thread Thomas Sattler
> Patch below should fix this (untested).
Just tested 2.6.22-rc6: the message is gone when the patch is applied. But
when deleting some directories in /var/tmp (which lives on xfs), I got:

  BUG: MAX_LOCK_DEPTH too low!
  turning off the locking correctness validator.

Thomas

-- 
keep mailinglists in english, feel free to send PM in german



Re: BUG: held lock freed!

2007-06-27 Thread Thomas Sattler
> I'll run memtest86+ this night and post the results tomorrow.

Memtest86+ did not show any problems:

time   8h
pass   24
errors  0

Please remember to CC me as I'm not subscribed to the list.

Thomas

-- 
keep mailinglists in english, feel free to send PM in german


Re: BUG: held lock freed!

2007-06-26 Thread Thomas Sattler
>> I removed xfs from my system. The first reboot after replacing xfs with
>> ext3 brought be
> 
> Perhaps this is a curse that falls on those who desert XFS ;)

My laptop sometimes 'kicks' its keyboard. That means no key works
any more; the mouse is OK and I can copy and paste single characters in
xterms to enter some commands. Pasting 'reboot' into a root xterm does not
work.

AFAIR it always happens when a DVB-T recording ends. But, even on daily
recordings, only about once a month. Ingo Molnar told me to activate
CONFIG_PROVE_LOCKING. Since then I had

=
[ INFO: possible recursive locking detected ]
2.6.21.5-cfs-v17 #5
-
xauth/6510 is trying to acquire lock:
 (&(&ip->i_lock)->mr_lock){}, at: [<e16e0715>] xfs_ilock+0x47/0x67 [xfs]

but task is already holding lock:
 (&(&ip->i_lock)->mr_lock){}, at: [<e16e0715>] xfs_ilock+0x47/0x67 [xfs]

other info that might help us debug this:
2 locks held by xauth/6510:
 #0:  (&inode->i_mutex){--..}, at: [<c016477c>] open_namei+0xe2/0x555
 #1:  (&(&ip->i_lock)->mr_lock){}, at: [<e16e0715>]
xfs_ilock+0x47/0x67 [xfs]

stack backtrace:
 [<c0136026>] __lock_acquire+0x11e/0xb23
 [<c0136dd9>] lock_acquire+0x56/0x6e
 [<e16e0715>] xfs_ilock+0x47/0x67 [xfs]
 [<c012fe10>] down_write+0x2e/0x46
 [<e16e0715>] xfs_ilock+0x47/0x67 [xfs]
 [<e16e0715>] xfs_ilock+0x47/0x67 [xfs]
 [<e16e0fa3>] xfs_iget_core+0x291/0x579 [xfs]
 [<e16e1312>] xfs_iget+0x87/0xfd [xfs]
 [<e16f7665>] xfs_trans_iget+0xe6/0x151 [xfs]
 [<e16e4a53>] xfs_ialloc+0xb2/0x479 [xfs]
 [<e16f7fda>] xfs_dir_ialloc+0x7b/0x29d [xfs]
 [<c012fe10>] down_write+0x2e/0x46
 [<e16fda9b>] xfs_create+0x31c/0x5d6 [xfs]
 [<e1707676>] xfs_vn_mknod+0x19b/0x2ce [xfs]
 [<c01621f6>] vfs_create+0xa5/0xeb
 [<c0164811>] open_namei+0x177/0x555
 [<c015a8ab>] get_unused_fd+0x1f/0xb4
 [<c015ab5c>] do_filp_open+0x25/0x39
 [<c02e57da>] _spin_unlock+0x14/0x1c
 [<c015a936>] get_unused_fd+0xaa/0xb4
 [<c015abb2>] do_sys_open+0x42/0xc3
 [<c015ac6c>] sys_open+0x1c/0x1e
 [<c0103bca>] sysenter_past_esp+0x5f/0x99
 ===

or

=
[ INFO: possible recursive locking detected ]
2.6.21.5-cfs-v17 #5
-
dotlockfile/6467 is trying to acquire lock:
 (&(&ip->i_lock)->mr_lock){}, at: [] xfs_ilock+0x47/0x67 [xfs]

but task is already holding lock:
 (&(&ip->i_lock)->mr_lock){}, at: [] xfs_ilock+0x47/0x67 [xfs]

other info that might help us debug this:
2 locks held by dotlockfile/6467:
 #0:  (&inode->i_mutex){--..}, at: [] open_namei+0xe2/0x555
 #1:  (&(&ip->i_lock)->mr_lock){}, at: [] xfs_ilock+0x47/0x67

stack backtrace:
 [] __lock_acquire+0x11e/0xb23
 [] lock_acquire+0x56/0x6e
 [] xfs_ilock+0x47/0x67 [xfs]
 [] down_write+0x2e/0x46
 [] xfs_ilock+0x47/0x67 [xfs]
 [] xfs_ilock+0x47/0x67 [xfs]
 [] xfs_iget_core+0x291/0x579 [xfs]
 [] xfs_iget+0x87/0xfd [xfs]
 [] xfs_trans_iget+0xe6/0x151 [xfs]
 [] xfs_ialloc+0xb2/0x479 [xfs]
 [] xfs_dir_ialloc+0x7b/0x29d [xfs]
 [] down_write+0x2e/0x46
 [] xfs_create+0x31c/0x5d6 [xfs]
 [] xfs_vn_mknod+0x19b/0x2ce [xfs]
 [] vfs_create+0xa5/0xeb
 [] open_namei+0x177/0x555
 [] get_unused_fd+0x1f/0xb4
 [] do_filp_open+0x25/0x39
 [] _spin_unlock+0x14/0x1c
 [] get_unused_fd+0xaa/0xb4
 [] do_sys_open+0x42/0xc3
 [] sys_open+0x1c/0x1e
 [] sysenter_past_esp+0x5f/0x99
 ===

in dmesg right after bootup. Ingo said that xfs used to have problems
with lockdep, and that this doesn't mean there's anything wrong with
XFS, but that lockdep turns itself off after it finds the first locking
problem. So I formatted the data partition as ext3, which gave me the
previously posted info.

> Odd.  I can't see any error at the shmem_delete_inode end nor at the
> free_fdtable_rcu end.  It seems to be some kind of corruption, whereby
> free_fdtable_rcu is kfree'ing some memory (perhaps fdt->open_fds),
> but the address kfreed is that of the shmem_sb_info in which it has
> just acquired a spinlock at the top of the stack.

There was a typo in my mail: I'm running 2.6.21.5-cfs-v18, not 2.6.22.
Sorry for that.

> It could come about through a single-bit error, and I was going to
> suggest that you give memtest86+ a good run overnight.  And still do
> suggest that, though we seem to have rather too much of a coincidence
> for it to be a likely explanation.  But I've no other ideas, sorry.

I'll run memtest86+ tonight and post the results tomorrow.

Please remember to CC me as I'm not subscribed to the list.

Thomas

-- 
keep mailinglists in english, feel free to send PM in german


BUG: held lock freed!

2007-06-26 Thread Thomas Sattler
Hi there ...

I removed xfs from my system. The first reboot after replacing xfs with
ext3 brought me

Jun 26 08:43:17 pearl =
Jun 26 08:43:17 pearl [ BUG: held lock freed! ]
Jun 26 08:43:17 pearl -
Jun 26 08:43:17 pearl udevd/3064 is freeing memory c16fbe40-c16fbe7f,
with a lock still held there!
Jun 26 08:43:17 pearl (&sbinfo->stat_lock){--..}, at: [<c0157eb1>]
shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl 1 lock held by udevd/3064:
Jun 26 08:43:17 pearl #0:  (&sbinfo->stat_lock){--..}, at: [<c0157eb1>]
shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl
Jun 26 08:43:17 pearl stack backtrace:
Jun 26 08:43:17 pearl [<c0135331>] debug_check_no_locks_freed+0xe7/0x11a
Jun 26 08:43:17 pearl [<c0158649>] kfree+0x45/0x7f
Jun 26 08:43:17 pearl [<c016c3d2>] free_fdtable_rcu+0x3a/0x70
Jun 26 08:43:17 pearl [<c012a95f>] __rcu_process_callbacks+0xfd/0x165
Jun 26 08:43:17 pearl [<c012a9d6>] rcu_process_callbacks+0xf/0x1e
Jun 26 08:43:17 pearl [<c0120c69>] tasklet_action+0x3d/0x68
Jun 26 08:43:17 pearl [<c0120b9e>] __do_softirq+0x41/0x92
Jun 26 08:43:17 pearl [<c0120c16>] do_softirq+0x27/0x3d
Jun 26 08:43:17 pearl [<c0120fae>] irq_exit+0x35/0x64
Jun 26 08:43:17 pearl [<c0106108>] do_IRQ+0x7e/0x92
Jun 26 08:43:17 pearl [<c0104634>] common_interrupt+0x24/0x34
Jun 26 08:43:17 pearl [<c010463e>] common_interrupt+0x2e/0x34
Jun 26 08:43:17 pearl [<c0136513>] lock_acquire+0x68/0x6e
Jun 26 08:43:17 pearl [<c0157eb1>] shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl [<c02e4daa>] _spin_lock+0x29/0x34
Jun 26 08:43:17 pearl [<c0157eb1>] shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl [<c0157eb1>] shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl [<c0157df0>] shmem_delete_inode+0x0/0xda
Jun 26 08:43:17 pearl [<c016b2a8>] generic_delete_inode+0x8c/0xf4
Jun 26 08:43:17 pearl [<c016aa33>] iput+0x60/0x62
Jun 26 08:43:17 pearl [<c01636d7>] do_unlinkat+0xbe/0x132
Jun 26 08:43:17 pearl [<c0103bfa>] sysenter_past_esp+0x8f/0x99
Jun 26 08:43:17 pearl [<c0135227>] trace_hardirqs_on+0x11e/0x141
Jun 26 08:43:17 pearl [<c0103bca>] sysenter_past_esp+0x5f/0x99
Jun 26 08:43:17 pearl ===
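
The syslog prefix on every line makes a trace like this harder to read. A
minimal sketch for stripping it, assuming the classic "Mon DD HH:MM:SS host"
prefix seen in this log (the function name is made up for illustration):

```shell
#!/bin/sh
# Strip the "Mon DD HH:MM:SS host " syslog prefix from each input line.
# Lines without such a prefix (e.g. wrapped continuations) pass unchanged.
strip_syslog_prefix() {
    sed -E 's/^[A-Z][a-z]{2} +[0-9]{1,2} [0-9:]{8} [^ ]+ ?//'
}

# Usage:  gzip -dc messages.gz | strip_syslog_prefix
```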

But it only appeared once; several reboots after that were OK. I changed my
kernel config today: e1000 is now "=y" (was "=m"), I removed PCMCIA as I
do not use it and some other modules complained about it, and I added
CONFIG_HIGHMEM4G=y (was CONFIG_NOHIGHMEM=y).
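
To double-check which of these options a given kernel was built with, a small
helper along the following lines may be useful. It is only a sketch: the
function name is invented, and reading /proc/config.gz assumes the kernel was
built with CONFIG_IKCONFIG_PROC; otherwise point it at the .config file of the
kernel tree.

```shell
#!/bin/sh
# Print the line for one option from a kernel config file (plain or gzipped).
# Disabled options appear as "# CONFIG_FOO is not set" and are printed as-is.
config_get() {
    file=$1
    opt=$2
    case $file in
        *.gz) gzip -dc "$file" ;;
        *)    cat "$file" ;;
    esac | grep -E "^(# )?${opt}(=| is not set)"
}

# Usage:
#   config_get /proc/config.gz CONFIG_HIGHMEM4G
#   config_get /usr/src/linux/.config CONFIG_PROVE_LOCKING
```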

The running kernel is 2.6.22.5 +cfs +squashfs. My distribution is gentoo
(x86), quite up to date, udev is 104-r12.

Please CC me as I'm not subscribed to the list.

Thomas

-- 
keep mailinglists in english, feel free to send PM in german


Re: BUG: held lock freed!

2007-06-26 Thread Thomas Sattler
>> I removed xfs from my system. The first reboot after replacing xfs with
>> ext3 brought be
>
> Perhaps this is a curse that falls on those who desert XFS ;)

My laptop sometimes 'kicks' its keyboard. That means no key works any
more, but the mouse is OK and I can copy and paste single characters in
xterms to enter some commands. Pasting 'reboot' into a root xterm does
not work.

AFAIR it always happens when a DVB-T recording ends, but, even with daily
recordings, only about once a month. Ingo Molnar told me to activate
CONFIG_PROVE_LOCKING. Since then I have had

=
[ INFO: possible recursive locking detected ]
2.6.21.5-cfs-v17 #5
-
xauth/6510 is trying to acquire lock:
 (&(&ip->i_lock)->mr_lock){}, at: [e16e0715] xfs_ilock+0x47/0x67 [xfs]

but task is already holding lock:
 (&(&ip->i_lock)->mr_lock){}, at: [e16e0715] xfs_ilock+0x47/0x67 [xfs]

other info that might help us debug this:
2 locks held by xauth/6510:
 #0:  (&inode->i_mutex){--..}, at: [c016477c] open_namei+0xe2/0x555
 #1:  (&(&ip->i_lock)->mr_lock){}, at: [e16e0715] xfs_ilock+0x47/0x67 [xfs]

stack backtrace:
 [c0136026] __lock_acquire+0x11e/0xb23
 [c0136dd9] lock_acquire+0x56/0x6e
 [e16e0715] xfs_ilock+0x47/0x67 [xfs]
 [c012fe10] down_write+0x2e/0x46
 [e16e0715] xfs_ilock+0x47/0x67 [xfs]
 [e16e0715] xfs_ilock+0x47/0x67 [xfs]
 [e16e0fa3] xfs_iget_core+0x291/0x579 [xfs]
 [e16e1312] xfs_iget+0x87/0xfd [xfs]
 [e16f7665] xfs_trans_iget+0xe6/0x151 [xfs]
 [e16e4a53] xfs_ialloc+0xb2/0x479 [xfs]
 [e16f7fda] xfs_dir_ialloc+0x7b/0x29d [xfs]
 [c012fe10] down_write+0x2e/0x46
 [e16fda9b] xfs_create+0x31c/0x5d6 [xfs]
 [e1707676] xfs_vn_mknod+0x19b/0x2ce [xfs]
 [c01621f6] vfs_create+0xa5/0xeb
 [c0164811] open_namei+0x177/0x555
 [c015a8ab] get_unused_fd+0x1f/0xb4
 [c015ab5c] do_filp_open+0x25/0x39
 [c02e57da] _spin_unlock+0x14/0x1c
 [c015a936] get_unused_fd+0xaa/0xb4
 [c015abb2] do_sys_open+0x42/0xc3
 [c015ac6c] sys_open+0x1c/0x1e
 [c0103bca] sysenter_past_esp+0x5f/0x99
 ===

or

=
[ INFO: possible recursive locking detected ]
2.6.21.5-cfs-v17 #5
-
dotlockfile/6467 is trying to acquire lock:
 (&(&ip->i_lock)->mr_lock){}, at: [e15a6715] xfs_ilock+0x47/0x67 [xfs]

but task is already holding lock:
 (&(&ip->i_lock)->mr_lock){}, at: [e15a6715] xfs_ilock+0x47/0x67 [xfs]

other info that might help us debug this:
2 locks held by dotlockfile/6467:
 #0:  (&inode->i_mutex){--..}, at: [c016477c] open_namei+0xe2/0x555
 #1:  (&(&ip->i_lock)->mr_lock){}, at: [e15a6715] xfs_ilock+0x47/0x67

stack backtrace:
 [c0136026] __lock_acquire+0x11e/0xb23
 [c0136dd9] lock_acquire+0x56/0x6e
 [e15a6715] xfs_ilock+0x47/0x67 [xfs]
 [c012fe10] down_write+0x2e/0x46
 [e15a6715] xfs_ilock+0x47/0x67 [xfs]
 [e15a6715] xfs_ilock+0x47/0x67 [xfs]
 [e15a6fa3] xfs_iget_core+0x291/0x579 [xfs]
 [e15a7312] xfs_iget+0x87/0xfd [xfs]
 [e15bd665] xfs_trans_iget+0xe6/0x151 [xfs]
 [e15aaa53] xfs_ialloc+0xb2/0x479 [xfs]
 [e15bdfda] xfs_dir_ialloc+0x7b/0x29d [xfs]
 [c012fe10] down_write+0x2e/0x46
 [e15c3a9b] xfs_create+0x31c/0x5d6 [xfs]
 [e15cd676] xfs_vn_mknod+0x19b/0x2ce [xfs]
 [c01621f6] vfs_create+0xa5/0xeb
 [c0164811] open_namei+0x177/0x555
 [c015a8ab] get_unused_fd+0x1f/0xb4
 [c015ab5c] do_filp_open+0x25/0x39
 [c02e57da] _spin_unlock+0x14/0x1c
 [c015a936] get_unused_fd+0xaa/0xb4
 [c015abb2] do_sys_open+0x42/0xc3
 [c015ac6c] sys_open+0x1c/0x1e
 [c0103bca] sysenter_past_esp+0x5f/0x99
 ===

in dmesg right after bootup. Ingo said that XFS used to have problems
with lockdep; this doesn't mean there's anything wrong with XFS, but
lockdep turns itself off after it finds the first locking problem. So I
formatted the data partition as ext3, which gave me the previously
posted info.
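
The two lockdep reports above appear to differ only in the task name and in the load address of the xfs module. A minimal sketch for checking that, assuming the traces are available as plain text in the bracketed format shown in this mail; the helper names `normalize` and `same_call_path` are made up for illustration and are not part of any kernel tool:

```python
import re

# One backtrace frame, e.g. " [e16e0715] xfs_ilock+0x47/0x67 [xfs]".
# The bracketed hex address may or may not be wrapped in <>.
FRAME_RE = re.compile(
    r"\[<?[0-9a-f]+>?\]\s+(?P<sym>[\w.]+)\+(?P<off>0x[0-9a-f]+)/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)

def normalize(trace):
    """Return (symbol, offset, module) per frame, dropping the raw address."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((m.group("sym"), m.group("off"), m.group("mod") or ""))
    return frames

def same_call_path(trace_a, trace_b):
    """True if both traces list the same frames once addresses are ignored."""
    return normalize(trace_a) == normalize(trace_b)
```

Passing only the "stack backtrace:" part of each report keeps the comparison to the call path itself.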

> Odd.  I can't see any error at the shmem_delete_inode end nor at the
> free_fdtable_rcu end.  It seems to be some kind of corruption, whereby
> free_fdtable_rcu is kfree'ing some memory (perhaps fdt->open_fds),
> but the address kfreed is that of the shmem_sb_info in which it has
> just acquired a spinlock at the top of the stack.

There was a typo in my mail: I'm running 2.6.21.5-cfs-v18, not 2.6.22.
Sorry for that.

> It could come about through a single-bit error, and I was going to
> suggest that you give memtest86+ a good run overnight.  And still do
> suggest that, though we seem to have rather too much of a coincidence
> for it to be a likely explanation.  But I've no other ideas, sorry.

I'll run memtest86+ tonight and post the results tomorrow.
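
On the single-bit theory: if both the expected and the actually freed pointer values were known, one could check whether they differ in exactly one bit. A small sketch of that check; the function name and the two example addresses are hypothetical and not taken from the logs:

```python
def differs_by_one_bit(a, b):
    """True if integers a and b differ in exactly one bit position."""
    diff = a ^ b                      # bits that differ between a and b
    # A value with exactly one bit set is non-zero and loses that bit
    # when ANDed with (itself - 1).
    return diff != 0 and (diff & (diff - 1)) == 0

# Hypothetical pointer values, for illustration only.
expected = 0xC1234560
observed = 0xC1234D60
print(differs_by_one_bit(expected, observed))
```

A result of True would be consistent with a flipped bit in memory, but would not prove it.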

Please remember to CC me as I'm not subscribed to the list.

Thomas

-- 
keep mailinglists in english, feel free to send PM in german


BUG: at fs/inotify.c:172 set_dentry_child_flags()

2007-06-22 Thread Thomas Sattler
Hi there ...

I upgraded from 2.6.19-ck2 to 2.6.21.5-cfs-v17 three days ago.

Since then I have had the following in my /var/log/message, up to
30 times a day:

Jun 22 13:37:18 silver BUG: at fs/inotify.c:172 set_dentry_child_flags()
Jun 22 13:37:18 silver [c0184b6d] set_dentry_child_flags+0xc5/0x174
Jun 22 13:37:18 silver [c0184ccc] remove_watch_no_event+0x6f/0x71
Jun 22 13:37:18 silver [c0185255] inotify_destroy+0x5d/0xa9
Jun 22 13:37:18 silver [c0185af5] inotify_release+0x14/0x5c
Jun 22 13:37:18 silver [c015fd76] __fput+0x16a/0x17b
Jun 22 13:37:18 silver [c015e5c6] filp_close+0x43/0x6d
Jun 22 13:37:18 silver [c011fe4e] close_files+0x71/0x80
Jun 22 13:37:18 silver [c011feba] put_files_struct+0x27/0x56
Jun 22 13:37:18 silver [c0120850] do_exit+0x12a/0x40a
Jun 22 13:37:18 silver [c015f038] sys_read+0x47/0x76
Jun 22 13:37:18 silver [c0120b74] do_group_exit+0x24/0x75
Jun 22 13:37:18 silver [c0103f08] syscall_call+0x7/0xb
Jun 22 13:37:18 silver [c03f] svcauth_gss_release+0x33d/0x371
Jun 22 13:37:18 silver ===
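
To put a number on "up to 30 times a day", the warnings could be counted per day straight from the syslog file. A rough sketch, assuming the "Mon DD HH:MM:SS host message" line layout shown above; the function name and the default marker string are illustrative choices:

```python
from collections import Counter

def bug_reports_per_day(lines, marker="BUG: at fs/inotify.c"):
    """Count log lines containing marker, keyed by 'Mon DD'."""
    counts = Counter()
    for line in lines:
        if marker in line:
            fields = line.split()
            if len(fields) >= 2:
                counts[" ".join(fields[:2])] += 1   # e.g. "Jun 22"
    return counts
```

Only the first line of each report contains the marker, so every backtrace is counted once.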

Please CC me as I'm not subscribed to the list.

Thomas

-- 
keep mailinglists in english, feel free to send PM in german