On Thu, Aug 12, 2010 at 4:54 PM, Ryan C. Underwood
<[email protected]> wrote:
>
> I have a system which acts as a NAT router (Ethernet) to share a CDMA
> modem (USB).  The same system runs the AFS client which talks to AFS
> fileservers over the internet.
>
> Occasionally the modem is knocked offline, and when this happens the
> Linux USB driver resets the modem.  Whenever the modem is knocked
> offline temporarily even once, the /afs mount and all processes that
> were accessing it at the time that it was disconnected permanently hang
> until the system is rebooted.
>
> The kernel logs show hung_task messages always similar to the following,
> always hanging in afs_PutVCache on each process accessing AFS at the
> time:
>
> [ 4440.472856] INFO: task perl:21072 blocked for more than 120 seconds.
> [ 4440.472861] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4440.472866] perl          D ffff88008dc8d0b8     0 21072  21071 0x00000000
> [ 4440.472877]  ffff8800921b1b58 0000000000000086 ffff880000000000 0000000000015900
> [ 4440.472887]  ffff8800921b1fd8 0000000000015900 ffff8800921b1fd8 ffff8800902196e0
> [ 4440.472897]  0000000000015900 0000000000015900 ffff8800921b1fd8 0000000000015900
> [ 4440.472907] Call Trace:
> [ 4440.472962]  [<ffffffffa0638089>] ? afs_PutVCache+0x79/0x140 [openafs]
> [ 4440.472973]  [<ffffffff8158730f>] __mutex_lock_slowpath+0xff/0x190
> [ 4440.472982]  [<ffffffff815871eb>] mutex_lock+0x2b/0x50
> [ 4440.472991]  [<ffffffff8115d7b7>] do_lookup+0x107/0x280
> [ 4440.473000]  [<ffffffff8115e1de>] link_path_walk+0x12e/0xab0
> [ 4440.473009]  [<ffffffff8115e613>] link_path_walk+0x563/0xab0
> [ 4440.473016]  [<ffffffff8115ecc7>] path_walk+0x67/0xe0
> [ 4440.473023]  [<ffffffff8115ee9b>] do_path_lookup+0x5b/0xa0
> [ 4440.473031]  [<ffffffff8115fb67>] user_path_at+0x57/0xa0
> [ 4440.473039]  [<ffffffff81155c4c>] vfs_fstatat+0x3c/0x80
> [ 4440.473047]  [<ffffffff81155d6b>] vfs_stat+0x1b/0x20
> [ 4440.473054]  [<ffffffff81155d94>] sys_newstat+0x24/0x50
> [ 4440.473063]  [<ffffffff8158c46e>] ? do_page_fault+0x15e/0x350
> [ 4440.473071]  [<ffffffff81588fb5>] ? page_fault+0x25/0x30
> [ 4440.473080]  [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
>
> Kernel is 2.6.35-13 and OpenAFS is 1.5.75 from the ubuntu repository.
>
> I don't know if it helps, but here is the output of cmdebug -long some
> time after the hang:
>
> $ cmdebug localhost -long
> Lock afs_xvcache status: (none_waiting)
> Lock afs_xdcache status: (none_waiting)
> Lock afs_xserver status: (none_waiting)
> Lock afs_xvcb status: (none_waiting)
> Lock afs_xbrs status: (none_waiting)
> Lock afs_xcell status: (none_waiting)
> Lock afs_xconn status: (none_waiting)
> Lock afs_xuser status: (none_waiting)
> Lock afs_xvolume status: (none_waiting)
> Lock puttofile status: (none_waiting)
> Lock afs_ftf status: (none_waiting)
> Lock afs_xcbhash status: (none_waiting)
> Lock afs_xaxs status: (none_waiting)
> Lock afs_xinterface status: (none_waiting)
> Lock afs_xosi status: (none_waiting)
> Lock afs_xsrvAddr status: (none_waiting)
> Lock afs_xvreclaim status: (none_waiting)
> Lock afsdb_client_loc status: (none_waiting)
> Lock afsdb_req_lock status: (none_waiting)
> Lock afs_discon_lock status: (none_waiting, 1 read_locks(pid:0))
> Lock afs_disconDirtyL status: (none_waiting)
> Lock afs_discon_vc_di status: (none_waiting)
> Lock dynroot status: (none_waiting)
> Lock icequake.net status: (none_waiting)
> ** Cache entry @ 0x8dc8c000 for 0.1.1.1 [dynroot]
>            2048 bytes  DV            3  refcnt     3
>    callback 00000000   expires 0
>    0 opens     0 writers
>    volume root
>    states (0x5), stat'd, read-only
> ** Cache entry @ 0x8dc8d400 for 2.536870916.1.1 [icequake.net]
>    locks: (writer_waiting, write_locked(pid:18986 at:54), 1 waiters)

OK, but what was this pid (18986, the one holding the write lock)?
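
For anyone following along, here is a rough sketch of how the lock-holder pids could be pulled out of saved cmdebug output. It assumes only the annotation format visible in the quoted output (`write_locked(pid:N at:M)` and `read_locks(pid:N)`); the function name `lock_holders` and the `sample` text are made up for illustration and are not part of OpenAFS.

```python
import re

# Matches the lock-holder annotations seen in the quoted cmdebug output,
# e.g. "write_locked(pid:18986 at:54)" or "1 read_locks(pid:0)".
# Assumption: other cmdebug versions may format these differently.
HOLDER_RE = re.compile(r"(write_locked|read_locks)\(pid:(\d+)")

def lock_holders(cmdebug_text):
    """Return (kind, pid) pairs for every lock holder named in the text."""
    return [(kind, int(pid)) for kind, pid in HOLDER_RE.findall(cmdebug_text)]

# Hypothetical sample: three lines copied from the output quoted above.
sample = """\
Lock afs_discon_lock status: (none_waiting, 1 read_locks(pid:0))
** Cache entry @ 0x8dc8d400 for 2.536870916.1.1 [icequake.net]
   locks: (writer_waiting, write_locked(pid:18986 at:54), 1 waiters)
"""

for kind, pid in lock_holders(sample):
    print(kind, pid)
```

Any pid reported this way can then be checked with `ps -p <pid>` while the process still exists, to see what was holding the lock.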

you'll want 1.5.76 shortly, for other reasons.

-- 
Derrick
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
