Revisiting this, since you said it's still happening:

On Wed, Jan 27, 2010 at 12:26 PM, Derrick Brashear <[email protected]> wrote:
> On Wed, Jan 27, 2010 at 12:10 PM, Adam Megacz <[email protected]> wrote:
>>
>> Derrick Brashear <[email protected]> writes:
>>>> I might be able to try that, but it will take a few days.
>>>
>>> if true, you should see output in cmdebug now
>>
>> Okay, I just caught it red-handed. Can anybody help with reading the
>> tea leaves here?
>>
>> meg...@quine:~$cmdebug localhost
>> Lock afs_xvcache status: (none_waiting, write_locked(pid:11013 at:335))
>
> writelocked = (0 == NBObtainWriteLock(&afs_xvcache, 335));
>
> in afs_vop_reclaim
>
> xvreclaim not held, which means we're presumably in afs_FlushVCache.
>
>> Lock afs_xserver status: (none_waiting, 1 read_locks(pid:0))
>
> somewhere has afs_xserver read locked. for obvious reasons we can't
> track these. no one's blocked on it.
>
>> Lock afs_xvcb status: (writer_waiting, write_locked(pid:0 at:273), 1
>> waiters)
>
> ObtainWriteLock(&afs_xvcb, 273);
>
> is in afs_FlushVCBs (called with lockit true). assuming you're not
> running disconnected and actively trying to disconnect, this is the
> system daemon which does this (afs_Daemon). that also explains
> "pid:0". We don't know who's waiting, but only this, QueueVCB and
> RemoveVCB actually *get* afs_xvcb.
>
> So, let's be clever. FlushVCache? Calls QueueVCB. So we can assume
> it's blocking.
>
> So then the question is why FlushVCBs is blocking you. well, you said
> you had multihomed fileservers.
>
> RXAFS_GiveUpCallBacks is called here. you didn't perchance grab
> rxdebug output for the client at this point? (no is fine, this is
> probably the answer)
>
> so, presumably (and now from memory, i'm not looking at the code) you
> block for like a minute while it times out a fileserver, then it fails
> over to another address, afs_Analyze returns shouldretry=1, you look,
> afs_ConnByHost probably gets the other address, and the loop proceeds
> and wins.

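If it helps with reading the cmdebug lock lines quoted above the next time this happens, here is a rough Python sketch that pulls out the holder pid, the "at:" lock site number, and the waiter count. The line format is assumed purely from the three lines pasted in this thread, not from the cmdebug source, and the names LockStatus, parse_lock_line and contended are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Patterns inferred only from the three "Lock ... status:" lines quoted in
# this thread; real cmdebug output may have variants not handled here.
LOCK_RE = re.compile(r"^Lock\s+(?P<name>\S+)\s+status:\s+\((?P<body>.*)\)\s*$")
WRITE_RE = re.compile(r"write_locked\(pid:(?P<pid>\d+)\s+at:(?P<at>\d+)\)")
READ_RE = re.compile(r"(?P<n>\d+)\s+read_locks\(pid:(?P<pid>\d+)\)")
WAITERS_RE = re.compile(r"(?P<n>\d+)\s+waiters")


@dataclass
class LockStatus:
    """Hypothetical container for one parsed lock line."""
    name: str
    wait_state: str                   # e.g. "none_waiting", "writer_waiting"
    writer_pid: Optional[int] = None  # pid holding the write lock, if any
    writer_at: Optional[int] = None   # the "at:" lock site number, if any
    readers: int = 0
    waiters: int = 0


def parse_lock_line(line: str) -> Optional[LockStatus]:
    """Parse one lock line; return None if it does not look like one."""
    # Collapse whitespace so a line that was wrapped by a mailer still parses
    # once the caller has joined the pieces.
    m = LOCK_RE.match(" ".join(line.split()))
    if m is None:
        return None
    body = m.group("body")
    status = LockStatus(name=m.group("name"),
                        wait_state=body.split(",", 1)[0].strip())
    w = WRITE_RE.search(body)
    if w:
        status.writer_pid = int(w.group("pid"))
        status.writer_at = int(w.group("at"))
    r = READ_RE.search(body)
    if r:
        status.readers = int(r.group("n"))
    n = WAITERS_RE.search(body)
    if n:
        status.waiters = int(n.group("n"))
    return status


def contended(lines: List[str]) -> List[LockStatus]:
    """Return the locks that report someone waiting on them."""
    parsed = [parse_lock_line(l) for l in lines]
    return [s for s in parsed
            if s is not None
            and (s.waiters > 0 or s.wait_state != "none_waiting")]


# The three lines from this thread, with the wrapped one rejoined.
SAMPLE = [
    "Lock afs_xvcache status: (none_waiting, write_locked(pid:11013 at:335))",
    "Lock afs_xserver status: (none_waiting, 1 read_locks(pid:0))",
    "Lock afs_xvcb status: (writer_waiting, write_locked(pid:0 at:273), 1 waiters)",
]

if __name__ == "__main__":
    for s in contended(SAMPLE):
        print(s.name, "held by pid", s.writer_pid, "at", s.writer_at,
              "with", s.waiters, "waiter(s)")
```

On the sample, the only lock this flags is afs_xvcb, which matches the reading above; the "at:" number still has to be looked up in the source by hand.
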
Ok, so, can you gather

rxdebug (hungclient) 7001

and perhaps a couple minutes of

tcpdump -s 1500 -n -w /tmp/packets host (hungclient) and port 7001

at this point? (specify an ethernet interface with -i if it's not the
default that's your upstream)
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
