On Thu, Aug 12, 2010 at 4:54 PM, Ryan C. Underwood <[email protected]> wrote:
>
> I have a system which acts as a NAT router (Ethernet) to share a CDMA
> modem (USB). The same system runs the AFS client which talks to AFS
> fileservers over the internet.
>
> Occasionally the modem is knocked offline, and when this happens the
> Linux USB driver resets the modem. Whenever the modem is knocked
> offline temporarily even once, the /afs mount and all processes that
> were accessing it at the time that it was disconnected permanently hang
> until the system is rebooted.
>
> The kernel logs show hung_task messages always similar to the following,
> always hanging in afs_PutVCache on each process accessing AFS at the
> time:
>
> [ 4440.472856] INFO: task perl:21072 blocked for more than 120 seconds.
> [ 4440.472861] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4440.472866] perl D ffff88008dc8d0b8 0 21072 21071 0x00000000
> [ 4440.472877] ffff8800921b1b58 0000000000000086 ffff880000000000 0000000000015900
> [ 4440.472887] ffff8800921b1fd8 0000000000015900 ffff8800921b1fd8 ffff8800902196e0
> [ 4440.472897] 0000000000015900 0000000000015900 ffff8800921b1fd8 0000000000015900
> [ 4440.472907] Call Trace:
> [ 4440.472962] [<ffffffffa0638089>] ? afs_PutVCache+0x79/0x140 [openafs]
> [ 4440.472973] [<ffffffff8158730f>] __mutex_lock_slowpath+0xff/0x190
> [ 4440.472982] [<ffffffff815871eb>] mutex_lock+0x2b/0x50
> [ 4440.472991] [<ffffffff8115d7b7>] do_lookup+0x107/0x280
> [ 4440.473000] [<ffffffff8115e1de>] link_path_walk+0x12e/0xab0
> [ 4440.473009] [<ffffffff8115e613>] link_path_walk+0x563/0xab0
> [ 4440.473016] [<ffffffff8115ecc7>] path_walk+0x67/0xe0
> [ 4440.473023] [<ffffffff8115ee9b>] do_path_lookup+0x5b/0xa0
> [ 4440.473031] [<ffffffff8115fb67>] user_path_at+0x57/0xa0
> [ 4440.473039] [<ffffffff81155c4c>] vfs_fstatat+0x3c/0x80
> [ 4440.473047] [<ffffffff81155d6b>] vfs_stat+0x1b/0x20
> [ 4440.473054] [<ffffffff81155d94>] sys_newstat+0x24/0x50
> [ 4440.473063] [<ffffffff8158c46e>] ? do_page_fault+0x15e/0x350
> [ 4440.473071] [<ffffffff81588fb5>] ? page_fault+0x25/0x30
> [ 4440.473080] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
>
> Kernel is 2.6.35-13 and OpenAFS is 1.5.75 from the Ubuntu repository.
>
> I don't know if it helps, but here is the output of cmdebug -long some
> time after the hang:
>
> $ cmdebug localhost -long
> Lock afs_xvcache status: (none_waiting)
> Lock afs_xdcache status: (none_waiting)
> Lock afs_xserver status: (none_waiting)
> Lock afs_xvcb status: (none_waiting)
> Lock afs_xbrs status: (none_waiting)
> Lock afs_xcell status: (none_waiting)
> Lock afs_xconn status: (none_waiting)
> Lock afs_xuser status: (none_waiting)
> Lock afs_xvolume status: (none_waiting)
> Lock puttofile status: (none_waiting)
> Lock afs_ftf status: (none_waiting)
> Lock afs_xcbhash status: (none_waiting)
> Lock afs_xaxs status: (none_waiting)
> Lock afs_xinterface status: (none_waiting)
> Lock afs_xosi status: (none_waiting)
> Lock afs_xsrvAddr status: (none_waiting)
> Lock afs_xvreclaim status: (none_waiting)
> Lock afsdb_client_loc status: (none_waiting)
> Lock afsdb_req_lock status: (none_waiting)
> Lock afs_discon_lock status: (none_waiting, 1 read_locks(pid:0))
> Lock afs_disconDirtyL status: (none_waiting)
> Lock afs_discon_vc_di status: (none_waiting)
> Lock dynroot status: (none_waiting)
> Lock icequake.net status: (none_waiting)
> ** Cache entry @ 0x8dc8c000 for 0.1.1.1 [dynroot]
> 2048 bytes DV 3 refcnt 3
> callback 00000000 expires 0
> 0 opens 0 writers
> volume root
> states (0x5), stat'd, read-only
> ** Cache entry @ 0x8dc8d400 for 2.536870916.1.1 [icequake.net]
> locks: (writer_waiting, write_locked(pid:18986 at:54), 1 waiters)
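The "locks:" line of the icequake.net cache entry above records which process holds the write lock on that vcache. As a hedged sketch, the holder pid can be pulled out of saved cmdebug -long output and then looked up through /proc/<pid>/comm on Linux. The function names write_lock_holders and process_name are hypothetical, and the regular expression assumes the "write_locked(pid:N at:M)" format seen in this particular output rather than any documented cmdebug format:

```python
import re

# Assumed format, taken from the cmdebug -long output above:
#   write_locked(pid:18986 at:54)
LOCK_RE = re.compile(r"write_locked\(pid:(\d+) at:(\d+)\)")


def write_lock_holders(text):
    """Return (pid, at) pairs for every write-locked entry in cmdebug output."""
    return [(int(pid), int(at)) for pid, at in LOCK_RE.findall(text)]


def process_name(pid):
    """Return the command name of pid from /proc (Linux only), or None if the
    process no longer exists or /proc is unavailable."""
    try:
        with open(f"/proc/{pid}/comm") as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Two lines copied from the cmdebug output quoted above.
    sample = (
        "** Cache entry @ 0x8dc8d400 for 2.536870916.1.1 [icequake.net]\n"
        "locks: (writer_waiting, write_locked(pid:18986 at:54), 1 waiters)\n"
    )
    for pid, at in write_lock_holders(sample):
        print(pid, at, process_name(pid))
```

This only identifies the holder while it is still running; if the pid has already exited, process_name returns None, and the ps or kernel hung_task output captured at the time of the hang is the better source.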

ok, but what was this pid?

you'll want 1.5.76 shortly, for other reasons.

-- 
Derrick
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
