Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD

2007-03-08 Thread Alan Stern
On Wed, 7 Mar 2007, Pete Zaitcev wrote:

> On Wed, 7 Mar 2007 17:18:29 -0500 (EST), Alan Stern <[EMAIL PROTECTED]> wrote:
> 
> > I've never heard of a process failing to show up in a SysRq-t listing.  It 
> > suggests something is wrong with the process management in the kernel you 
> > were using.  That leads me to think a non -mm kernel might give more 
> > informative results.
> 
> I think, if a process is looping, it's not shown in SysRq-t. So maybe
> khubd is on a CPU.

You mean, if it is currently running?  I don't believe that.  A simple 
test comparison shows every process listed in "ps -A" also listed in 
SysRq-t.
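
For anyone who wants to repeat that comparison, here is a minimal Python sketch. It assumes each task header in the SysRq-t dump is a single line that starts with the task name and ends with "(L-TLB)" or "(NOTLB)", as in the dump excerpt posted earlier in this thread. The function names are made up for illustration, and task names containing spaces are not handled.

```python
import re

# Assumed header shape, taken from the dump excerpt in this thread:
#   "<name> <state> <pc> ... (L-TLB)"  or  "... (NOTLB)"
TASK_LINE = re.compile(r"^(\S+)\s+\S\s+.*\((?:L-TLB|NOTLB)\)\s*$")


def tasks_in_sysrq_dump(dump_text):
    """Collect the task names that have a header line in a SysRq-t dump."""
    names = set()
    for line in dump_text.splitlines():
        match = TASK_LINE.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


def missing_from_dump(ps_names, dump_text):
    """Return the names reported by ps that never appear in the dump."""
    seen = tasks_in_sysrq_dump(dump_text)
    return sorted(name for name in set(ps_names) if name not in seen)
```

Feeding it the command names from "ps -A" and the text of dmesg after a SysRq-t should return an empty list on a healthy system. Kernel log timestamps or other line prefixes would need to be stripped first.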

> In RHEL we have a patch for SysRq-w, which showed all CPU states by
> way of a special IPI (unless looping with interrupts disabled, of course).
> But this capability seems a bit degraded in stock SysRq-w. It might not
> catch this (it does not seem to for me in 2.6.20).
> 
> Another possibility is, something killed khubd. It's only a process
> after all. Remember how we had grief with it being killed by "telinit 1".

If it was killed then it wouldn't show up in "ps" or as a directory under 
/proc.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD

2007-03-07 Thread Pete Zaitcev
On Wed, 7 Mar 2007 17:18:29 -0500 (EST), Alan Stern <[EMAIL PROTECTED]> wrote:

> I've never heard of a process failing to show up in a SysRq-t listing.  It 
> suggests something is wrong with the process management in the kernel you 
> were using.  That leads me to think a non -mm kernel might give more 
> informative results.

I think, if a process is looping, it's not shown in SysRq-t. So maybe
khubd is on a CPU.

In RHEL we have a patch for SysRq-w, which showed all CPU states by
way of a special IPI (unless looping with interrupts disabled, of course).
But this capability seems a bit degraded in stock SysRq-w. It might not
catch this (it does not seem to for me in 2.6.20).

Another possibility is, something killed khubd. It's only a process
after all. Remember how we had grief with it being killed by "telinit 1".

-- Pete


Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD

2007-03-07 Thread Alan Stern
On Wed, 7 Mar 2007, Eric Buddington wrote:

> On Wed, Mar 07, 2007 at 03:22:05PM -0500, Alan Stern wrote:
> > On Wed, 7 Mar 2007, Eric Buddington wrote:
> > 
> > > > > So SysRq-t doesn't show anything about khubd, and SysRq-p doesn't give
> > > > > me anything at all. What else can I try?
> > 
> > How about SysRq-r?
> 
> SysRq : Keyboard mode set to XLATE

Whoops, I was thinking of SysRq-p, which you have already tried.

> > These problems start with some USB resets, right?  Did they occur with 
> > earlier kernel versions, or is this new behavior?
> 
> Yes, the problem starts with USB resets (or USB errors that trigger a reset)
> 
> > How often does the problem occur?
> 
> Recently, the USB drive has choked up after several hours of moderate
> use. However, before this instance, it would consistently hang up my
> watchdog process and force a system reboot (no idea why; the watchdog
> process didn't use this drive at all). This may have changed when I
> upgraded from 2.6.20-rc6-mm3 to 2.6.20-mm2, but my sample size is too
> small to be sure.

What about earlier kernels?  Does 2.6.19 work any better?

What about non -mm kernels, like 2.6.21-rc2?

I've never heard of a process failing to show up in a SysRq-t listing.  It 
suggests something is wrong with the process management in the kernel you 
were using.  That leads me to think a non -mm kernel might give more 
informative results.

And as long as you're testing, you might as well also turn on 
CONFIG_USB_DEBUG.

Alan Stern



Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD

2007-03-07 Thread Eric Buddington
On Wed, Mar 07, 2007 at 03:22:05PM -0500, Alan Stern wrote:
> On Wed, 7 Mar 2007, Eric Buddington wrote:
> 
> > > > So SysRq-t doesn't show anything about khubd, and SysRq-p doesn't give
> > > > me anything at all. What else can I try?
> 
> How about SysRq-r?

SysRq : Keyboard mode set to XLATE

> These problems start with some USB resets, right?  Did they occur with 
> earlier kernel versions, or is this new behavior?

Yes, the problem starts with USB resets (or USB errors that trigger a reset)

> How often does the problem occur?

Recently, the USB drive has choked up after several hours of moderate
use. However, before this instance, it would consistently hang up my
watchdog process and force a system reboot (no idea why; the watchdog
process didn't use this drive at all). This may have changed when I
upgraded from 2.6.20-rc6-mm3 to 2.6.20-mm2, but my sample size is too
small to be sure.

-Eric



Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD

2007-03-07 Thread Alan Stern
On Wed, 7 Mar 2007, Eric Buddington wrote:

> > > So SysRq-t doesn't show anything about khubd, and SysRq-p doesn't give
> > > me anything at all. What else can I try?

How about SysRq-r?

> > I'm baffled.  khubd should have shown up as the process with ID 163.  Is 
> > that process listed under a different name?
> 
> It does show up under /proc/163, for whatever that's worth.

Maybe that means the process is dying but isn't completely dead yet, and 
it's stuck running something inside the reiser4 driver.  Unless we can 
find out what it is, though, there isn't much we can do.

> Going through the list of processes dumped by SysRq-t, here are the
> ones I didn't knowingly start myself:
> 
> aio/0
> ata/0
> ata_aux
> ent:md1.
> events/0
> ib_addr
> ib_cm/0
> ib_mcast
> iw_cm_wq
> kacpid
> kblockd/0
> kcryptd/0
> khelper
> khpsbpkt
> kmirrord
> kmpathd/0
> kprefetchd
> kpsmoused
> kseriod
> ksnapd
> ksuspend_usbd
> kswapd0
> kthread
> ktxnmgrd:md1:
> ktxnmgrd:sda1
> md2_raid1
> pdflush
> rdma_cm
> reiserfs/0
> scsi_eh_0 
> watchdog/0

Most of those (maybe all of them) are built-in kernel threads.

> And khubd is still showing up as a major CPU consumer.  Interestingly,
> ent:sda1! is also absent from the SysRq-t listing, though present (and
> using lots of CPU) according to 'top' (ps, oddly, hangs in 'D' state
> after listing some processes).

Something is badly messed up somewhere, but I have no idea what it could 
be.

These problems start with some USB resets, right?  Did they occur with 
earlier kernel versions, or is this new behavior?

How often does the problem occur?

Alan Stern



Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD

2007-03-07 Thread Eric Buddington
On Wed, Mar 07, 2007 at 11:03:21AM -0500, Alan Stern wrote:
> On Tue, 6 Mar 2007, Eric Buddington wrote:
> 
> > On Tue, Mar 06, 2007 at 01:34:41PM -0500, Alan Stern wrote:
> > > The stack trace didn't include the khubd process at all.  Probably that 
> > > means it had already died.
> > 
> > No, it's still there. I ran 'echo t >/proc/sysrq-trigger' again, and
> > khubd did not show up in dmesg:
> > 
> > -bash-2.05b# echo t >/proc/sysrq-trigger
> > -bash-2.05b# dmesg | grep khub
> > -bash-2.05b# dmesg | grep SysRq
> > SysRq : Show State
> > SysRq : Show State
> > -bash-2.05b# ps ax | grep khubd
> >   163 ?        R<   633:41 [khubd]
> > -bash-2.05b# echo p >/proc/sysrq-trigger
> > -bash-2.05b# dmesg | tail -2
> >  ===
> > SysRq : Show Regs
> > -bash-2.05b#
> > 
> > So SysRq-t doesn't show anything about khubd, and SysRq-p doesn't give
> > me anything at all. What else can I try?
> 
> I'm baffled.  khubd should have shown up as the process with ID 163.  Is 
> that process listed under a different name?

It does show up under /proc/163, for whatever that's worth.
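
Since /proc/163 is still there, its stat file is one way to see what state the kernel reports for the task and how much CPU time it has accumulated, without going through ps. Below is a minimal Python sketch for reading it. The field positions follow the proc(5) layout (state is field 3, utime field 14, stime field 15); the function names are made up for illustration.

```python
def parse_proc_stat(stat_line):
    """Parse one line of /proc/<pid>/stat into a small dict.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the first '(' and the last ')'.
    """
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    rest = stat_line[rparen + 1:].split()  # rest[0] is field 3 (state)
    return {
        "pid": int(stat_line[:lparen]),
        "comm": stat_line[lparen + 1:rparen],
        "state": rest[0],
        "ppid": int(rest[1]),
        "utime": int(rest[11]),  # field 14, in clock ticks
        "stime": int(rest[12]),  # field 15, in clock ticks
    }


def read_task(pid):
    """Read and parse /proc/<pid>/stat for a live task."""
    with open(f"/proc/{pid}/stat") as handle:
        return parse_proc_stat(handle.read())
```

Calling read_task(163) twice a few seconds apart and comparing stime would show whether the thread is really burning system time.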

Going through the list of processes dumped by SysRq-t, here are the
ones I didn't knowingly start myself:

aio/0
ata/0
ata_aux
ent:md1.
events/0
ib_addr
ib_cm/0
ib_mcast
iw_cm_wq
kacpid
kblockd/0
kcryptd/0
khelper
khpsbpkt
kmirrord
kmpathd/0
kprefetchd
kpsmoused
kseriod
ksnapd
ksuspend_usbd
kswapd0
kthread
ktxnmgrd:md1:
ktxnmgrd:sda1
md2_raid1
pdflush
rdma_cm
reiserfs/0
scsi_eh_0 
watchdog/0

And khubd is still showing up as a major CPU consumer.  Interestingly,
ent:sda1! is also absent from the SysRq-t listing, though present (and
using lots of CPU) according to 'top' (ps, oddly, hangs in 'D' state
after listing some processes).

-Eric


Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD

2007-03-07 Thread Alan Stern
On Tue, 6 Mar 2007, Eric Buddington wrote:

> On Tue, Mar 06, 2007 at 01:34:41PM -0500, Alan Stern wrote:
> > The stack trace didn't include the khubd process at all.  Probably that 
> > means it had already died.
> 
> No, it's still there. I ran 'echo t >/proc/sysrq-trigger' again, and
> khubd did not show up in dmesg:
> 
> -bash-2.05b# echo t >/proc/sysrq-trigger
> -bash-2.05b# dmesg | grep khub
> -bash-2.05b# dmesg | grep SysRq
> SysRq : Show State
> SysRq : Show State
> -bash-2.05b# ps ax | grep khubd
>   163 ?        R<   633:41 [khubd]
> -bash-2.05b# echo p >/proc/sysrq-trigger
> -bash-2.05b# dmesg | tail -2
>  ===
> SysRq : Show Regs
> -bash-2.05b#
> 
> So SysRq-t doesn't show anything about khubd, and SysRq-p doesn't give
> me anything at all. What else can I try?

I'm baffled.  khubd should have shown up as the process with ID 163.  Is 
that process listed under a different name?

Alan Stern



Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD

2007-03-06 Thread Eric Buddington
On Tue, Mar 06, 2007 at 01:34:41PM -0500, Alan Stern wrote:
> The stack trace didn't include the khubd process at all.  Probably that 
> means it had already died.

No, it's still there. I ran 'echo t >/proc/sysrq-trigger' again, and
khubd did not show up in dmesg:

-bash-2.05b# echo t >/proc/sysrq-trigger
-bash-2.05b# dmesg | grep khub
-bash-2.05b# dmesg | grep SysRq
SysRq : Show State
SysRq : Show State
-bash-2.05b# ps ax | grep khubd
  163 ?        R<   633:41 [khubd]
-bash-2.05b# echo p >/proc/sysrq-trigger
-bash-2.05b# dmesg | tail -2
 ===
SysRq : Show Regs
-bash-2.05b#

So SysRq-t doesn't show anything about khubd, and SysRq-p doesn't give
me anything at all. What else can I try?

-Eric



Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD

2007-03-06 Thread Alan Stern
On Tue, 6 Mar 2007, Eric Buddington wrote:

> On Tue, Mar 06, 2007 at 10:36:20AM -0500, Alan Stern wrote:
> > On Tue, 6 Mar 2007, Oliver Neukum wrote:
> > 
> > > > On Tuesday, 6 March 2007 05:13, Eric Buddington wrote:
> > > > reiser4[khubd(163)]: commit_current_atom 
> > > > (fs/reiser4/txnmgr.c:1049)[nikita-3176]:
> > > > WARNING: Flushing like mad: 16384
> > > > reiser4[khubd(163)]: commit_current_atom 
> > > > (fs/reiser4/txnmgr.c:1049)[nikita-3176]:
> > > > WARNING: Flushing like mad: 32768
> > > > ...
> > > > 
> > > > Most problematically, khubd and ent:sda1! are conspiring to suck 100%
> > > > CPU time, even after powering off the drive. A bunch of processes are
> > > > stuck in 'D' state, possibly because they're trying to access the dead
> > > > disk, which won't umount ("device is busy").
> > > 
> > > It looks like khubd allocates memory and enters reiser4. Possibly we have
> > > GFP_KERNEL in khubd where we should have GFP_NOIO or reiser4 has
> > > a problem dealing with IO failures.
> > 
> > A more complete stack trace (for example, Alt-SysRq-T) would help.

The stack trace didn't include the khubd process at all.  Probably that 
means it had already died.

On the good side, if khubd is dead, it can't be using up 100% of the CPU 
time!  :-)

You need to find out somehow what khubd is doing while it is so busy.  It 
could easily be, like Oliver suggested, that reiser4 is unable to handle 
I/O failures and gets stuck.

Alan Stern



Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD

2007-03-06 Thread Alan Stern
On Tue, 6 Mar 2007, Oliver Neukum wrote:

> > On Tuesday, 6 March 2007 05:13, Eric Buddington wrote:
> > reiser4[khubd(163)]: commit_current_atom 
> > (fs/reiser4/txnmgr.c:1049)[nikita-3176]:
> > WARNING: Flushing like mad: 16384
> > reiser4[khubd(163)]: commit_current_atom 
> > (fs/reiser4/txnmgr.c:1049)[nikita-3176]:
> > WARNING: Flushing like mad: 32768
> > ...
> > 
> > Most problematically, khubd and ent:sda1! are conspiring to suck 100%
> > CPU time, even after powering off the drive. A bunch of processes are
> > stuck in 'D' state, possibly because they're trying to access the dead
> > disk, which won't umount ("device is busy").
> 
> It looks like khubd allocates memory and enters reiser4. Possibly we have
> GFP_KERNEL in khubd where we should have GFP_NOIO or reiser4 has
> a problem dealing with IO failures.

A more complete stack trace (for example, Alt-SysRq-T) would help.

Alan Stern



Re: khubd and ent:sda1 sucking CPU with reiser4 + USB HD

2007-03-06 Thread Oliver Neukum
On Tuesday, 6 March 2007 05:13, Eric Buddington wrote:
> reiser4[khubd(163)]: commit_current_atom 
> (fs/reiser4/txnmgr.c:1049)[nikita-3176]:
> WARNING: Flushing like mad: 16384
> reiser4[khubd(163)]: commit_current_atom 
> (fs/reiser4/txnmgr.c:1049)[nikita-3176]:
> WARNING: Flushing like mad: 32768
> ...
> 
> Most problematically, khubd and ent:sda1! are conspiring to suck 100%
> CPU time, even after powering off the drive. A bunch of processes are
> stuck in 'D' state, possibly because they're trying to access the dead
> disk, which won't umount ("device is busy").

It looks like khubd allocates memory and enters reiser4. Possibly we have
GFP_KERNEL in khubd where we should have GFP_NOIO or reiser4 has
a problem dealing with IO failures.

Regards
Oliver



khubd and ent:sda1 sucking CPU with reiser4 + USB HD

2007-03-05 Thread Eric Buddington
Kernel 2.6.20-mm2 on an Athlon XP.

Caveat: I know my USB cabling may cause the initial failure, but I
think the system should be failing more gracefully.

After hours of backing up to my external USB drive, I see in dmesg:

APIC error on CPU0: 00(02)
APIC error on CPU0: 02(02)
APIC error on CPU0: 02(02)
...
usb 1-6.2: reset high speed USB device using ehci_hcd and address 7
usb 1-6.2: device descriptor read/64, error -110
usb 1-6.2: device descriptor read/64, error -110
...
sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
sd 0:0:0:0: SCSI error: return code = 0x0005
end_request: I/O error, dev sda, sector 612518599
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: SCSI error: return code = 0x0001
end_request: I/O error, dev sda, sector 612518839
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
...
reiser4[khubd(163)]: commit_current_atom 
(fs/reiser4/txnmgr.c:1049)[nikita-3176]:
WARNING: Flushing like mad: 16384
reiser4[khubd(163)]: commit_current_atom 
(fs/reiser4/txnmgr.c:1049)[nikita-3176]:
WARNING: Flushing like mad: 32768
...

Most problematically, khubd and ent:sda1! are conspiring to suck 100%
CPU time, even after powering off the drive. A bunch of processes are
stuck in 'D' state, possibly because they're trying to access the dead
disk, which won't umount ("device is busy").
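
To list which processes are stuck, a minimal Python sketch like the one below can filter saved "ps ax" output for rows whose STAT column starts with 'D'. It assumes the usual five-column layout (PID TTY STAT TIME COMMAND), and the function name is made up for illustration.

```python
def uninterruptible(ps_output):
    """Return (pid, command) for `ps ax` rows whose STAT starts with 'D'."""
    rows = []
    for line in ps_output.splitlines():
        # Assumed columns: PID TTY STAT TIME COMMAND (command may hold spaces).
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header line or something unexpected
        pid, _tty, stat, _time, command = fields
        if stat.startswith("D"):
            rows.append((int(pid), command))
    return rows
```

The output of "ps ax" can be captured to a file and passed in as a string. Note that ps itself reads /proc, so it can block as well if a task it inspects is wedged.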

I have not been able to test this setup with the ub driver yet (is it
possible with ub as a module, and usb-storage built in?).

I don't know how repeatable this is, but the processes are still going
and I'm happy to provide more diagnostics if you tell me what's
useful. I'll reboot tomorrow to get use of the drive again.

dmesg output from alt-sysrq-[wpp] is attached.

I have the impression that reiser4 development is cool these days, and
I'll likely reformat and try ext3 if nobody asks for more tests with
reiser4.

-Eric
SysRq : Show Blocked State

                         free                        sibling
  task             PC    stack   pid father child younger older
kswapd0   D EB3F9BDC 0   195  6   196   165 (L-TLB)
   f7c61db4 0046 c06fc3e0 eb3f9bdc 0001116d 0096 db4e5f3c 0009 
   c1b9c070 eb3fa997 0001116d 00185c88 c1b9c164 eb3f9fef 0001116d f34a9240 
   f7c61e04 c126a560 f7c61dd0 f7c61dd8 c04be097 0001 c1b9c070 c0115fb9 
Call Trace:
 [] wait_for_completion+0x5a/0x83
 [] default_wake_function+0x0/0xc
 [] write_page_by_ent+0xdc/0xe6
 [] shrink_inactive_list+0x38a/0x712
 [] isolate_lru_pages+0x4e/0x195
 [] shrink_active_list+0x2fd/0x305
 [] shrink_zone+0xbe/0xdf
 [] kswapd+0x26e/0x391
 [] autoremove_wake_function+0x0/0x35
 [] kswapd+0x0/0x391
 [] kthread+0xa0/0xc9
 [] kthread+0x0/0xc9
 [] kernel_thread_helper+0x7/0x10
 ===
X D D44363E0 0  3946   3244  4146   (NOTLB)
   f3bc3d0c 3082 1502a8c0 d44363e0 3286 f4218800 d1593934 0009 
   f3ba6ad0 f67dce8a 000111e7 0006cd95 f3ba6bc4 f67dce8a 000111e7 f3cbe140 
   f3bc3d5c c137acc0 f3bc3d28 f3bc3d30 c04be097 0001 f3ba6ad0 c0115fb9 
Call Trace:
 [] wait_for_completion+0x5a/0x83
 [] default_wake_function+0x0/0xc
 [] write_page_by_ent+0xdc/0xe6
 [] shrink_inactive_list+0x38a/0x712
 [] shrink_active_list+0x2fd/0x305
 [] shrink_zone+0xbe/0xdf
 [] try_to_free_pages+0x148/0x240
 [] __alloc_pages+0x18f/0x28c
 [] __pte_alloc+0xd/0x58
 [] move_page_tables+0x92/0x1e6
 [] do_mremap+0x395/0x502
 [] sys_mremap+0x35/0x50
 [] sysenter_past_esp+0x5d/0x81
 ===
gtk-gnutella  D 2C679F64 0  4162   4158  41844166  4161 (NOTLB)
   f2881cfc 0086 f2881cdc 2c679f64 f2881cdc c06f5220  0009 
   f41f1540 01d88b49 00011179 7a0b f41f1634 00011197 c06f5220 f39d6ba0 
   f2881d4c c122f520 f2881d18 f2881d20 c04be097 0001 f41f1540 c0115fb9 
Call Trace:
 [] wait_for_completion+0x5a/0x83
 [] default_wake_function+0x0/0xc
 [] write_page_by_ent+0xdc/0xe6
 [] shrink_inactive_list+0x38a/0x712
 [] page_referenced_one+0x9b/0xb2
 [] page_referenced+0x63/0xc7
 [] shrink_active_list+0x2fd/0x305
 [] shrink_zone+0xbe/0xdf
 [] try_to_free_pages+0x148/0x240
 [] __alloc_pages+0x18f/0x28c
 [] vma_merge+0xf8/0x17f
 [] __handle_mm_fault+0x378/0x772
 [] do_page_fault+0x20a/0x512
 [] do_page_fault+0x0/0x512
 [] error_code+0x74/0x7c
 ===
mkzftree  D 0012 0  7247  1  7589  7854 (NOTLB)
   e5c75ad8 0086 0046 0012  0046 c226f960 0009 
   e62a2030 e4fe6a99 0001116e 0bf6 e62a2124 2967d2dd 0001116e e8775c60 
   dde78344 d2a36d80 dde7836c c226f960 c01af79c  e62a2030 c0126000 
Call Trace:
 [] reiser4_go_to_sleep+0x52/0x6b
 [] autoremove_wake_function+0x0/0x35
 [] capture_fuse_wait+0x80/0xd5
 [] wait_for_fusion+0x0/0x17
 [] reiser4_try_capture+0x40a/0x461
 [] longterm_lock_znode+0x1df/0x28b
 [] coord_by_han