Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD
On Wed, 7 Mar 2007, Pete Zaitcev wrote:

> On Wed, 7 Mar 2007 17:18:29 -0500 (EST), Alan Stern <[EMAIL PROTECTED]> wrote:
>
> > I've never heard of a process failing to show up in a SysRq-t listing. It
> > suggests something is wrong with the process management in the kernel you
> > were using. That leads me to think a non -mm kernel might give more
> > informative results.
>
> I think, if a process is looping, it's not shown in SysRq-t. So maybe
> khubd is on a CPU.

You mean, if it is currently running? I don't believe that. A simple test
comparison shows every process listed in "ps -A" also listed in SysRq-t.

> In RHEL we have a patch for SysRq-w, which showed all CPU states by
> way of a special IPI (unless looping with interrupts disabled, of course).
> But this capability seems a bit degraded in stock SysRq-w. It might not
> catch this (it does not seem to for me in 2.6.20).
>
> Another possibility is, something killed khubd. It's only a process
> after all. Remember how we had grief with it being killed by "telinit 1".

If it was killed then it wouldn't show up in "ps" or as a directory under
/proc.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
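The "ps -A" versus SysRq-t comparison mentioned above can be scripted. What follows is only a rough sketch, not anything posted in the thread: the function names are made up for illustration, and the regular expression assumes the task header layout of a 2.6.20-era dump (name, state, PC or the word "running", free stack, pid, ..., ending in "(L-TLB)" or "(NOTLB)"). Other kernels print this differently.

```python
import re

# Assumed shape of a task header line in a 2.6.20-era SysRq-t dump, e.g.
#   "kswapd0       D EB3F9BDC     0   195      6 ... (L-TLB)"
# Columns: name, state, PC (or "running"), free stack, pid, then the rest.
TASK_LINE = re.compile(
    r"^(\S+)\s+([A-Z])\s+(?:running|[0-9A-Fa-f]+)\s+\d+\s+(\d+)\s.*"
    r"\((?:L-TLB|NOTLB)\)$"
)


def sysrq_pids(dump):
    """Collect the PIDs that have a task header line in the SysRq-t text."""
    pids = set()
    for line in dump.splitlines():
        match = TASK_LINE.match(line.strip())
        if match:
            pids.add(int(match.group(3)))
    return pids


def ps_pids(ps_output):
    """Map PID -> command name from `ps -A` style output (PID in column 1)."""
    result = {}
    for line in ps_output.splitlines():
        fields = line.split()
        # The header line ("PID TTY TIME CMD") is skipped by the isdigit test.
        if fields and fields[0].isdigit():
            result[int(fields[0])] = fields[-1]
    return result


def missing_from_sysrq(ps_output, dump):
    """Return the processes that ps lists but the SysRq-t dump does not."""
    seen = sysrq_pids(dump)
    return {pid: name for pid, name in ps_pids(ps_output).items()
            if pid not in seen}
```

To use it, save the output of "ps -A" and the dmesg text captured right after "echo t >/proc/sysrq-trigger", then pass both strings to missing_from_sysrq(). An empty result means every process was listed. Processes that start or exit between the two captures will show up as false positives.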
Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD
On Wed, 7 Mar 2007 17:18:29 -0500 (EST), Alan Stern <[EMAIL PROTECTED]> wrote:

> I've never heard of a process failing to show up in a SysRq-t listing. It
> suggests something is wrong with the process management in the kernel you
> were using. That leads me to think a non -mm kernel might give more
> informative results.

I think, if a process is looping, it's not shown in SysRq-t. So maybe
khubd is on a CPU.

In RHEL we have a patch for SysRq-w, which showed all CPU states by
way of a special IPI (unless looping with interrupts disabled, of course).
But this capability seems a bit degraded in stock SysRq-w. It might not
catch this (it does not seem to for me in 2.6.20).

Another possibility is, something killed khubd. It's only a process
after all. Remember how we had grief with it being killed by "telinit 1".

-- Pete
Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD
On Wed, 7 Mar 2007, Eric Buddington wrote:

> On Wed, Mar 07, 2007 at 03:22:05PM -0500, Alan Stern wrote:
> > On Wed, 7 Mar 2007, Eric Buddington wrote:
> >
> > > > > So SysRq-t doesn't show anything about khubd, and SysRq-p doesn't give
> > > > > me anything at all. What else can I try?
> >
> > How about SysRq-r?
>
> SysRq : Keyboard mode set to XLATE

Whoops, I was thinking of SysRq-p, which you have already tried.

> > These problems start with some USB resets, right? Did they occur with
> > earlier kernel versions, or is this new behavior?
>
> Yes, the problem starts with USB resets (or USB errors that trigger a reset)
>
> > How often does the problem occur?
>
> Recently, the USB drive has choked up after several hours of moderate
> use. However, before this instance, it would consistently hang up my
> watchdog process and force a system reboot (no idea why; the watchdog
> process didn't use this drive at all). This may have changed when I
> upgraded from 2.6.20-rc6-mm3 to 2.6.20-mm2, but my sample size is too
> small to be sure.

What about earlier kernels? Does 2.6.19 work any better? What about
non -mm kernels, like 2.6.21-rc2?

I've never heard of a process failing to show up in a SysRq-t listing. It
suggests something is wrong with the process management in the kernel you
were using. That leads me to think a non -mm kernel might give more
informative results.

And as long as you're testing, you might as well also turn on
CONFIG_USB_DEBUG.

Alan Stern
Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD
On Wed, Mar 07, 2007 at 03:22:05PM -0500, Alan Stern wrote:

> On Wed, 7 Mar 2007, Eric Buddington wrote:
>
> > > > So SysRq-t doesn't show anything about khubd, and SysRq-p doesn't give
> > > > me anything at all. What else can I try?
>
> How about SysRq-r?

SysRq : Keyboard mode set to XLATE

> These problems start with some USB resets, right? Did they occur with
> earlier kernel versions, or is this new behavior?

Yes, the problem starts with USB resets (or USB errors that trigger a reset)

> How often does the problem occur?

Recently, the USB drive has choked up after several hours of moderate
use. However, before this instance, it would consistently hang up my
watchdog process and force a system reboot (no idea why; the watchdog
process didn't use this drive at all). This may have changed when I
upgraded from 2.6.20-rc6-mm3 to 2.6.20-mm2, but my sample size is too
small to be sure.

-Eric
Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD
On Wed, 7 Mar 2007, Eric Buddington wrote:

> > > So SysRq-t doesn't show anything about khubd, and SysRq-p doesn't give
> > > me anything at all. What else can I try?

How about SysRq-r?

> > I'm baffled. khubd should have shown up as the process with ID 163. Is
> > that process listed under a different name?
>
> It does show up under /proc/163, for whatever that's worth.

Maybe that means the process is dying but isn't completely dead yet, and
it's stuck running something inside the reiser4 driver. Unless we can find
out what it is, though, there isn't much we can do.

> Going through the list of processes dumped by SysRq-t, here are the
> ones I didn't knowingly start myself:
>
> aio/0
> ata/0
> ata_aux
> ent:md1.
> events/0
> ib_addr
> ib_cm/0
> ib_mcast
> iw_cm_wq
> kacpid
> kblockd/0
> kcryptd/0
> khelper
> khpsbpkt
> kmirrord
> kmpathd/0
> kprefetchd
> kpsmoused
> kseriod
> ksnapd
> ksuspend_usbd
> kswapd0
> kthread
> ktxnmgrd:md1:
> ktxnmgrd:sda1
> md2_raid1
> pdflush
> rdma_cm
> reiserfs/0
> scsi_eh_0
> watchdog/0

Most of those (maybe all of them) are built-in kernel threads.

> And khubd is still showing up as a major CPU consumer. Interestingly,
> ent:sda1! is also absent from the SysRq-t listing, though present (and
> using lots of CPU) according to 'top' (ps, oddly, hangs in 'D' state
> after listing some processes).

Something is badly messed up somewhere, but I have no idea what it could
be.

These problems start with some USB resets, right? Did they occur with
earlier kernel versions, or is this new behavior?

How often does the problem occur?

Alan Stern
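One way to tell a process that is "dying but not completely dead" from one that is still spinning is to sample /proc/<pid>/stat twice and see whether its state is Z or X, or whether its CPU time keeps growing. The following is a minimal sketch under that assumption. The helper names are invented for illustration, and the field positions follow the documented /proc/<pid>/stat layout (state is field 3, utime and stime are fields 14 and 15).

```python
def parse_stat(stat_text):
    """Return (pid, comm, state, cpu_ticks) from the text of /proc/<pid>/stat."""
    # comm sits in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' rather than on whitespace.
    head, _, tail = stat_text.rpartition(")")
    pid_text, _, comm = head.partition("(")
    fields = tail.split()
    state = fields[0]                                # field 3: R, S, D, Z, ...
    cpu_ticks = int(fields[11]) + int(fields[12])    # fields 14+15: utime+stime
    return int(pid_text), comm, state, cpu_ticks


def read_stat(pid):
    """Read and parse /proc/<pid>/stat; raises OSError if the PID is gone."""
    with open(f"/proc/{pid}/stat") as stat_file:
        return parse_stat(stat_file.read())


def classify(before, after):
    """Compare two samples of the same PID taken a few seconds apart."""
    if after[2] in ("Z", "X"):
        return "dead or dying"
    if after[3] > before[3]:
        return "alive and consuming CPU"
    return "alive but idle"
```

For example, calling read_stat(163) twice about five seconds apart and passing both results to classify() would show whether PID 163 is still accumulating CPU time. If read_stat() raises because /proc/163 no longer exists, the process has really gone away.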
Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD
On Wed, Mar 07, 2007 at 11:03:21AM -0500, Alan Stern wrote:

> On Tue, 6 Mar 2007, Eric Buddington wrote:
>
> > On Tue, Mar 06, 2007 at 01:34:41PM -0500, Alan Stern wrote:
> > > The stack trace didn't include the khubd process at all. Probably that
> > > means it had already died.
> >
> > No, it's still there. I ran 'echo t >/proc/sysrq-trigger' again, and
> > khubd did not show up in dmesg:
> >
> > -bash-2.05b# echo t >/proc/sysrq-trigger
> > -bash-2.05b# dmesg | grep khub
> > -bash-2.05b# dmesg | grep SysRq
> > SysRq : Show State
> > SysRq : Show State
> > -bash-2.05b# ps ax | grep khubd
> > 163 ? R< 633:41 [khubd]
> > -bash-2.05b# echo p >/proc/sysrq-trigger
> > -bash-2.05b# dmesg | tail -2
> > ===
> > SysRq : Show Regs
> > -bash-2.05b#
> >
> > So SysRq-t doesn't show anything about khubd, and SysRq-p doesn't give
> > me anything at all. What else can I try?
>
> I'm baffled. khubd should have shown up as the process with ID 163. Is
> that process listed under a different name?

It does show up under /proc/163, for whatever that's worth.

Going through the list of processes dumped by SysRq-t, here are the
ones I didn't knowingly start myself:

aio/0
ata/0
ata_aux
ent:md1.
events/0
ib_addr
ib_cm/0
ib_mcast
iw_cm_wq
kacpid
kblockd/0
kcryptd/0
khelper
khpsbpkt
kmirrord
kmpathd/0
kprefetchd
kpsmoused
kseriod
ksnapd
ksuspend_usbd
kswapd0
kthread
ktxnmgrd:md1:
ktxnmgrd:sda1
md2_raid1
pdflush
rdma_cm
reiserfs/0
scsi_eh_0
watchdog/0

And khubd is still showing up as a major CPU consumer. Interestingly,
ent:sda1! is also absent from the SysRq-t listing, though present (and
using lots of CPU) according to 'top' (ps, oddly, hangs in 'D' state
after listing some processes).

-Eric
Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD
On Tue, 6 Mar 2007, Eric Buddington wrote:

> On Tue, Mar 06, 2007 at 01:34:41PM -0500, Alan Stern wrote:
> > The stack trace didn't include the khubd process at all. Probably that
> > means it had already died.
>
> No, it's still there. I ran 'echo t >/proc/sysrq-trigger' again, and
> khubd did not show up in dmesg:
>
> -bash-2.05b# echo t >/proc/sysrq-trigger
> -bash-2.05b# dmesg | grep khub
> -bash-2.05b# dmesg | grep SysRq
> SysRq : Show State
> SysRq : Show State
> -bash-2.05b# ps ax | grep khubd
> 163 ? R< 633:41 [khubd]
> -bash-2.05b# echo p >/proc/sysrq-trigger
> -bash-2.05b# dmesg | tail -2
> ===
> SysRq : Show Regs
> -bash-2.05b#
>
> So SysRq-t doesn't show anything about khubd, and SysRq-p doesn't give
> me anything at all. What else can I try?

I'm baffled. khubd should have shown up as the process with ID 163. Is
that process listed under a different name?

Alan Stern
Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD
On Tue, Mar 06, 2007 at 01:34:41PM -0500, Alan Stern wrote:

> The stack trace didn't include the khubd process at all. Probably that
> means it had already died.

No, it's still there. I ran 'echo t >/proc/sysrq-trigger' again, and
khubd did not show up in dmesg:

-bash-2.05b# echo t >/proc/sysrq-trigger
-bash-2.05b# dmesg | grep khub
-bash-2.05b# dmesg | grep SysRq
SysRq : Show State
SysRq : Show State
-bash-2.05b# ps ax | grep khubd
163 ? R< 633:41 [khubd]
-bash-2.05b# echo p >/proc/sysrq-trigger
-bash-2.05b# dmesg | tail -2
===
SysRq : Show Regs
-bash-2.05b#

So SysRq-t doesn't show anything about khubd, and SysRq-p doesn't give
me anything at all. What else can I try?

-Eric
Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD
On Tue, 6 Mar 2007, Eric Buddington wrote:

> On Tue, Mar 06, 2007 at 10:36:20AM -0500, Alan Stern wrote:
> > On Tue, 6 Mar 2007, Oliver Neukum wrote:
> > >
> > > On Tuesday, 6 March 2007 05:13, Eric Buddington wrote:
> > > > reiser4[khubd(163)]: commit_current_atom
> > > > (fs/reiser4/txnmgr.c:1049)[nikita-3176]:
> > > > WARNING: Flushing like mad: 16384
> > > > reiser4[khubd(163)]: commit_current_atom
> > > > (fs/reiser4/txnmgr.c:1049)[nikita-3176]:
> > > > WARNING: Flushing like mad: 32768
> > > > ...
> > > >
> > > > Most problematically, khubd and ent:sda1! are conspiring to suck 100%
> > > > CPU time, even after powering off the drive. A bunch of processes are
> > > > stuck in 'D' state, possibly because they're trying to access the dead
> > > > disk, which won't umount ("device is busy").
> > >
> > > It looks like khubd allocates memory and enters reiser4. Possibly we have
> > > GFP_KERNEL in khubd where we should have GFP_NOIO or reiser4 has
> > > a problem dealing with IO failures.
> >
> > A more complete stack trace (for example, Alt-SysRq-T) would help.

The stack trace didn't include the khubd process at all. Probably that
means it had already died.

On the good side, if khubd is dead, it can't be using up 100% of the CPU
time! :-)

You need to find out somehow what khubd is doing while it is so busy. It
could easily be, like Oliver suggested, that reiser4 is unable to handle
I/O failures and gets stuck.

Alan Stern
Re: [linux-usb-devel] khubd and ent:sda1 sucking CPU with reiser4 + USB HD
On Tue, 6 Mar 2007, Oliver Neukum wrote:

> On Tuesday, 6 March 2007 05:13, Eric Buddington wrote:
> > reiser4[khubd(163)]: commit_current_atom
> > (fs/reiser4/txnmgr.c:1049)[nikita-3176]:
> > WARNING: Flushing like mad: 16384
> > reiser4[khubd(163)]: commit_current_atom
> > (fs/reiser4/txnmgr.c:1049)[nikita-3176]:
> > WARNING: Flushing like mad: 32768
> > ...
> >
> > Most problematically, khubd and ent:sda1! are conspiring to suck 100%
> > CPU time, even after powering off the drive. A bunch of processes are
> > stuck in 'D' state, possibly because they're trying to access the dead
> > disk, which won't umount ("device is busy").
>
> It looks like khubd allocates memory and enters reiser4. Possibly we have
> GFP_KERNEL in khubd where we should have GFP_NOIO or reiser4 has
> a problem dealing with IO failures.

A more complete stack trace (for example, Alt-SysRq-T) would help.

Alan Stern
Re: khubd and ent:sda1 sucking CPU with reiser4 + USB HD
On Tuesday, 6 March 2007 05:13, Eric Buddington wrote:

> reiser4[khubd(163)]: commit_current_atom
> (fs/reiser4/txnmgr.c:1049)[nikita-3176]:
> WARNING: Flushing like mad: 16384
> reiser4[khubd(163)]: commit_current_atom
> (fs/reiser4/txnmgr.c:1049)[nikita-3176]:
> WARNING: Flushing like mad: 32768
> ...
>
> Most problematically, khubd and ent:sda1! are conspiring to suck 100%
> CPU time, even after powering off the drive. A bunch of processes are
> stuck in 'D' state, possibly because they're trying to access the dead
> disk, which won't umount ("device is busy").

It looks like khubd allocates memory and enters reiser4. Possibly we have
GFP_KERNEL in khubd where we should have GFP_NOIO or reiser4 has
a problem dealing with IO failures.

Regards
Oliver
khubd and ent:sda1 sucking CPU with reiser4 + USB HD
Kernel 2.6.20-mm2 on an Athlon XP.

Caveat: I know my USB cabling may cause the initial failure, but I think
the system should be failing more gracefully.

After hours of backing up to my external USB drive, I see in dmesg:

APIC error on CPU0: 00(02)
APIC error on CPU0: 02(02)
APIC error on CPU0: 02(02)
...
usb 1-6.2: reset high speed USB device using ehci_hcd and address 7
usb 1-6.2: device descriptor read/64, error -110
usb 1-6.2: device descriptor read/64, error -110
...
sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
sd 0:0:0:0: SCSI error: return code = 0x0005
end_request: I/O error, dev sda, sector 612518599
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: SCSI error: return code = 0x0001
end_request: I/O error, dev sda, sector 612518839
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
...
reiser4[khubd(163)]: commit_current_atom (fs/reiser4/txnmgr.c:1049)[nikita-3176]:
WARNING: Flushing like mad: 16384
reiser4[khubd(163)]: commit_current_atom (fs/reiser4/txnmgr.c:1049)[nikita-3176]:
WARNING: Flushing like mad: 32768
...

Most problematically, khubd and ent:sda1! are conspiring to suck 100%
CPU time, even after powering off the drive. A bunch of processes are
stuck in 'D' state, possibly because they're trying to access the dead
disk, which won't umount ("device is busy").

I have not been able to test this setup with the ub driver yet (is it
possible with ub as a module, and usb-storage built in?).

I don't know how repeatable this is, but the processes are still going
and I'm happy to provide more diagnostics if you tell me what's
useful. I'll reboot tomorrow to get use of the drive again.

dmesg output from alt-sysrq-[wpp] is attached.

I have the impression that reiser4 development is cool these days, and
I'll likely reformat and try ext3 if nobody asks for more tests with
reiser4.
-Eric

SysRq : Show Blocked State
                         free                        sibling
  task             PC      stack   pid father child younger older
kswapd0 D EB3F9BDC 0 195 6 196 165 (L-TLB)
f7c61db4 0046 c06fc3e0 eb3f9bdc 0001116d 0096 db4e5f3c 0009 c1b9c070 eb3fa997 0001116d 00185c88 c1b9c164 eb3f9fef 0001116d f34a9240 f7c61e04 c126a560 f7c61dd0 f7c61dd8 c04be097 0001 c1b9c070 c0115fb9
Call Trace:
[] wait_for_completion+0x5a/0x83
[] default_wake_function+0x0/0xc
[] write_page_by_ent+0xdc/0xe6
[] shrink_inactive_list+0x38a/0x712
[] isolate_lru_pages+0x4e/0x195
[] shrink_active_list+0x2fd/0x305
[] shrink_zone+0xbe/0xdf
[] kswapd+0x26e/0x391
[] autoremove_wake_function+0x0/0x35
[] kswapd+0x0/0x391
[] kthread+0xa0/0xc9
[] kthread+0x0/0xc9
[] kernel_thread_helper+0x7/0x10
===
X D D44363E0 0 3946 3244 4146 (NOTLB)
f3bc3d0c 3082 1502a8c0 d44363e0 3286 f4218800 d1593934 0009 f3ba6ad0 f67dce8a 000111e7 0006cd95 f3ba6bc4 f67dce8a 000111e7 f3cbe140 f3bc3d5c c137acc0 f3bc3d28 f3bc3d30 c04be097 0001 f3ba6ad0 c0115fb9
Call Trace:
[] wait_for_completion+0x5a/0x83
[] default_wake_function+0x0/0xc
[] write_page_by_ent+0xdc/0xe6
[] shrink_inactive_list+0x38a/0x712
[] shrink_active_list+0x2fd/0x305
[] shrink_zone+0xbe/0xdf
[] try_to_free_pages+0x148/0x240
[] __alloc_pages+0x18f/0x28c
[] __pte_alloc+0xd/0x58
[] move_page_tables+0x92/0x1e6
[] do_mremap+0x395/0x502
[] sys_mremap+0x35/0x50
[] sysenter_past_esp+0x5d/0x81
===
gtk-gnutella D 2C679F64 0 4162 4158 41844166 4161 (NOTLB)
f2881cfc 0086 f2881cdc 2c679f64 f2881cdc c06f5220 0009 f41f1540 01d88b49 00011179 7a0b f41f1634 00011197 c06f5220 f39d6ba0 f2881d4c c122f520 f2881d18 f2881d20 c04be097 0001 f41f1540 c0115fb9
Call Trace:
[] wait_for_completion+0x5a/0x83
[] default_wake_function+0x0/0xc
[] write_page_by_ent+0xdc/0xe6
[] shrink_inactive_list+0x38a/0x712
[] page_referenced_one+0x9b/0xb2
[] page_referenced+0x63/0xc7
[] shrink_active_list+0x2fd/0x305
[] shrink_zone+0xbe/0xdf
[] try_to_free_pages+0x148/0x240
[] __alloc_pages+0x18f/0x28c
[] vma_merge+0xf8/0x17f
[] __handle_mm_fault+0x378/0x772
[] do_page_fault+0x20a/0x512
[] do_page_fault+0x0/0x512
[] error_code+0x74/0x7c
===
mkzftree D 0012 0 7247 1 7589 7854 (NOTLB)
e5c75ad8 0086 0046 0012 0046 c226f960 0009 e62a2030 e4fe6a99 0001116e 0bf6 e62a2124 2967d2dd 0001116e e8775c60 dde78344 d2a36d80 dde7836c c226f960 c01af79c e62a2030 c0126000
Call Trace:
[] reiser4_go_to_sleep+0x52/0x6b
[] autoremove_wake_function+0x0/0x35
[] capture_fuse_wait+0x80/0xd5
[] wait_for_fusion+0x0/0x17
[] reiser4_try_capture+0x40a/0x461
[] longterm_lock_znode+0x1df/0x28b
[] coord_by_han