Revisiting this, since you said it's still happening:

On Wed, Jan 27, 2010 at 12:26 PM, Derrick Brashear <[email protected]> wrote:
> On Wed, Jan 27, 2010 at 12:10 PM, Adam Megacz <[email protected]> wrote:
>>
>> Derrick Brashear <[email protected]> writes:
>>>> I might be able to try that, but it will take a few days.
>>>
>>> if true, you should see output in cmdebug now
>>
>> Okay, I just caught it red-handed.  Can anybody help with reading the
>> tea leaves here?
>>
>>  meg...@quine:~$cmdebug localhost
>>  Lock afs_xvcache status: (none_waiting, write_locked(pid:11013 at:335))
>
>       writelocked = (0 == NBObtainWriteLock(&afs_xvcache, 335));
>
> in afs_vop_reclaim
>
> xvreclaim not held, which means we're presumably in afs_FlushVCache.
>
>>  Lock afs_xserver status: (none_waiting, 1 read_locks(pid:0))
>
> something, somewhere, has afs_xserver read-locked. for obvious reasons
> we can't track these. no one's blocked on it.
>
>>  Lock afs_xvcb status: (writer_waiting, write_locked(pid:0 at:273), 1 waiters)
>
>        ObtainWriteLock(&afs_xvcb, 273);
>
> is in afs_FlushVCBs (called with lockit true). assuming you're not
> running disconnected and actively trying to disconnect, this is the
> system daemon which does this (afs_Daemon). that also explains
> "pid:0". We don't know who's waiting, but only this, QueueVCB and
> RemoveVCB actually *get* afs_xvcb.
>
> So, let's be clever. FlushVCache? Calls QueueVCB. So we can assume
> that's the waiter: it holds afs_xvcache and is blocked on afs_xvcb.
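
The reading above (find a lock that is write-locked and has waiters, then work out who could be waiting on it) can be roughed out in a few lines of Python. This is only a sketch: the patterns are modeled on the three status lines quoted in this thread, not on the cmdebug source, and parse_lock_line and contended are made-up helper names.

```python
import re

# Patterns modeled only on the three status lines quoted in this thread;
# real cmdebug output may contain other forms this sketch does not handle.
LOCK_RE = re.compile(r"Lock\s+(?P<name>\w+)\s+status:\s+\((?P<body>.*)\)\s*$")
WRITE_RE = re.compile(r"write_locked\(pid:(?P<pid>\d+)\s+at:(?P<at>\d+)\)")
READ_RE = re.compile(r"(?P<n>\d+)\s+read_locks")
WAITERS_RE = re.compile(r"(?P<n>\d+)\s+waiters")


def parse_lock_line(line):
    """Return a dict describing one 'Lock ... status: (...)' line, or None."""
    m = LOCK_RE.search(line)
    if m is None:
        return None
    body = m.group("body")
    info = {"name": m.group("name"), "writer": None, "readers": 0, "waiters": 0}
    w = WRITE_RE.search(body)
    if w:
        info["writer"] = {"pid": int(w.group("pid")), "at": int(w.group("at"))}
    r = READ_RE.search(body)
    if r:
        info["readers"] = int(r.group("n"))
    q = WAITERS_RE.search(body)
    if q:
        info["waiters"] = int(q.group("n"))
    return info


def contended(lines):
    """Names of locks that are write-locked and have at least one waiter."""
    parsed = (parse_lock_line(line) for line in lines)
    return [p["name"] for p in parsed if p and p["writer"] and p["waiters"] > 0]


sample = [
    "Lock afs_xvcache status: (none_waiting, write_locked(pid:11013 at:335))",
    "Lock afs_xserver status: (none_waiting, 1 read_locks(pid:0))",
    "Lock afs_xvcb status: (writer_waiting, write_locked(pid:0 at:273), 1 waiters)",
]
print(contended(sample))  # ['afs_xvcb']
```

Only afs_xvcb comes out as contended, which is why the rest of the analysis chases whoever holds it. Note that the status line has to be on a single line; a mail-wrapped line will not match.
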
>
> So then the question is why FlushVCBs is blocking you. well, you said
> you had multihomed fileservers.
>
> RXAFS_GiveUpCallBacks is called here. you didn't perchance grab
> rxdebug output for the client at this point? (no is fine, this is
> probably the answer)
>
> so, presumably (and now from memory, i'm not looking at the code) you
> block for about a minute while it times out one fileserver address, then
> it fails over to another address: afs_Analyze returns shouldretry=1, you
> loop, afs_ConnByHost probably gets the other address, and the loop
> proceeds and wins.
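
To make the suspected timing concrete, here is a toy model of that retry loop in Python. It is not OpenAFS code and does not model what afs_Analyze or afs_ConnByHost really do; the function name, the addresses (taken from the documentation range) and the 60 second timeout (from "about a minute" above) are all assumptions for illustration.

```python
def give_up_callbacks(addresses, reachable, timeout_s=60.0):
    """Toy model: try each server address in order until one answers.

    Every unreachable address costs a full timeout, and in the scenario
    described above the caller is still holding afs_xvcb for that whole time.
    Returns (address_that_answered_or_None, seconds_spent_blocked).
    """
    blocked = 0.0
    for addr in addresses:
        if addr in reachable:
            return addr, blocked
        blocked += timeout_s  # wait out the dead address, then retry the next
    return None, blocked


# A multihomed fileserver whose first address is unreachable from the client.
winner, blocked = give_up_callbacks(["192.0.2.1", "192.0.2.2"], {"192.0.2.2"})
print(winner, blocked)  # 192.0.2.2 60.0
```

If that is what's going on, everything queued behind afs_xvcb stalls for roughly one timeout per dead address and then recovers on its own.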

Ok, so, can you gather

  rxdebug (hungclient) 7001

and perhaps a couple of minutes of

  tcpdump -s 1500 -n -w /tmp/packets host (hungclient) and port 7001

at this point?
(Specify an ethernet interface with -i if your upstream isn't on the
default interface.)
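
If it's easier to have this ready for the next time it hangs, the two commands can be assembled by a small helper. This is a sketch only: capture_commands and as_shell are made-up names, "hungclient" is a placeholder for the real client hostname, and the code only builds the command lines (it does not run them, since tcpdump generally needs root).

```python
import shlex


def capture_commands(client, port=7001, iface=None, pcap="/tmp/packets"):
    """Build argv lists for the rxdebug and tcpdump invocations asked for above."""
    rxdebug = ["rxdebug", client, str(port)]
    tcpdump = ["tcpdump", "-s", "1500", "-n", "-w", pcap]
    if iface is not None:
        # Only needed when the upstream interface is not the default one.
        tcpdump += ["-i", iface]
    tcpdump += ["host", client, "and", "port", str(port)]
    return rxdebug, tcpdump


def as_shell(argv):
    """Render an argv list as a copy-and-paste shell command line."""
    return " ".join(shlex.quote(arg) for arg in argv)


rx, dump = capture_commands("hungclient")
print(as_shell(rx))
print(as_shell(dump))
```

Passing iface="eth1" (a hypothetical interface name) adds "-i eth1" ahead of the filter expression.
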
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
