On Tue, Feb 08, 2005 at 05:28:10PM -0500, Jeffrey Hutzelman wrote:
> If you are noticing actual problems, rather than just messages in your 
> FileLog, please describe what they are.

Sorry, it was a bit late yesterday. I realized afterwards that I had not
actually described any problem.

The problem is an apparently hanging file server. The 1.2.11 client on Linux
2.4.21 sends FetchStatus requests on behalf of different uids in high volume.
The RX challenge/response proceeds, as do the file server's ptserver queries.
The file server also sends WhoAreYou requests back to the client, which are
normally answered very quickly; the routine in afs_callback.c does not have
to work particularly hard. But after some requests have been processed
successfully, a WhoAreYou call seems to get stuck in the client. The server
is forced to keep re-sending the query, in my case for about 90 seconds or
even longer. The client sends RX-level acks, but no WhoAreYou reply.
Meanwhile other FetchStatus requests pile up, all apparently waiting in a
queue for their chance to issue their own WhoAreYou requests.
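
The pile-up described above looks like head-of-line blocking. Below is a
minimal sketch of that effect, assuming the pending requests really are served
strictly one at a time behind the stuck WhoAreYou call (that is an inference
from the symptoms, not something confirmed in the code). This is a toy model,
not OpenAFS code: the function name completion_times and the 0.01 second reply
time are made up for illustration; only the 90 second figure comes from the
observation above.

```python
# Toy model only: not OpenAFS code. Names and the 0.01 s reply time are
# illustrative assumptions; 90 s is the stall observed in the report.

def completion_times(callback_delays, server_timeout):
    """Return the time at which each queued request finishes.

    callback_delays[i] is how long the client takes to answer the
    WhoAreYou callback for request i; None means it never answers.
    Requests are served strictly one at a time (assumption), so each
    one waits for everything ahead of it in the queue.
    """
    now = 0.0
    finished = []
    for delay in callback_delays:
        if delay is None or delay > server_timeout:
            # The server keeps retrying until it gives up.
            now += server_timeout
        else:
            now += delay
        finished.append(now)
    return finished


if __name__ == "__main__":
    # Three quick replies, one stuck callback, then three more quick ones.
    delays = [0.01, 0.01, 0.01, None, 0.01, 0.01, 0.01]
    for i, t in enumerate(completion_times(delays, server_timeout=90.0)):
        print(f"request {i}: done at {t:.2f}s")
```

In this model every request queued behind the stuck callback is delayed by
the full timeout even though its own callback would be answered immediately,
which matches the "nothing happens, then everything resumes" pattern.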

While things are in this state, a lot of RX ack traffic is going on, but
nothing else: no CPU load on either the client or the server, no disk
activity, and no network traffic that might even remotely create a bottleneck.

When the server finally gives up, normal processing resumes. I built a
version of afs_callback.c without the ObtainReadLock, hard-coding the
information requested, but this did not help; the client seems to be stuck on
another lock. Not asking for the WhoAreYou at all obviously helped, but this
can at best be called a hack.

Volker
