Robin Manke-Cassidy wrote:
> We are running a Citrix farm using AFS as the main storage for the
> users. We have been experiencing some stability issues with the
> client. I am looking for anyone that is using terminal server or Citrix
> in a high volume situation. Here are the symptoms that we are experiencing:
>
> 1. No one can get to the S drive, even after a log out and relaunch of
> an app. In this case, we've found that the Service is running, but is
> completely non-responsive. AFS space is at this point completely
> unusable, and a service restart works only 10-20% of the time.
That indicates either a deadlock or that the SMB client has dropped the
connection to the "AFS" file service. If it is a deadlock, it is a bug
that needs to be fixed. You can obtain a minidump of the process with
"fs minidump" and have it examined offline to determine whether it is in
fact a deadlock.

If it is not a deadlock, it could be that too many previous SMB requests
took longer than the SMB client's 45 second timeout period, in which
case the client backs off to reduce load on the SMB server. What errors
or warnings are you seeing in the Windows Application Event Log?

In the last couple of weeks I have implemented deadlock detection code
within afsd_service.exe. Earlier today I sent Jack Hsu a link to a
private build that implements many fixes based upon the potential
deadlocks that the lock order validation code identified.

> 2. The client seems to stop responding while folks are on the server.
> Those that attached before the failure seem to be ok.

Normally I would say this sounds odd but ...

> Anyone new coming on a server is without an S: drive.

Are attempts to communicate with the "AFS" file service failing with an
authentication failure? Perhaps "wrong password" or something else? I
ask because I fixed a bug last week that leaked memory when the SMB
client attempted to authenticate and failed. The leaked memory was
allocated by the LSA, so it could have resulted in the LSA running out
of memory. If so, there would be errors logged to the afsd log files if
"fs trace" was activated. This fix is also in the private build I
pointed Jack Hsu at.

> 75-90% of the time, a service restart fixes the issue.

But not the rest of the time. What are the error conditions? What does
the afsd_init.log file report? Is the "AFS" netbios name being
registered in the failure case?

> 3. One or two people on a server are unable to get to their S: drive.
> All existing with an S: drive and most to all new sessions get the S:
> drive.
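As a sketch, the diagnostics described above could be gathered along
these lines from a cmd prompt on an affected Citrix server ("fs
minidump" and "fs trace" are the OpenAFS for Windows commands mentioned
in this thread; the exact file locations depend on how the client's log
directory is configured):

```shell
rem Capture a minidump of the afsd_service.exe process so it can be
rem examined offline for a deadlock:
fs minidump

rem Turn on in-memory tracing, reproduce the failure, then flush the
rem trace buffer out to the afsd log files:
fs trace -on
rem ... reproduce the hang here ...
fs trace -dump
```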
> This appears to be caused by the client not initializing
> correctly within the session startup, and is resolved nearly 100% of the
> time with a logout and relaunch of the app.

What is "client" in this context?

> The S: drive is the AFS mounted volume.

When you execute "NET USE", what is S: mapped to? Is it the freelance
root.afs volume? The cell's root.afs volume? A per-user home directory?
What does "fs examine \\afs\<cell>#<volume>\" report as the status for
the volume in question?

---

More general questions:

What version of OpenAFS are you using?

How have you tuned the client? The default values are not appropriate
for a multi-user system.

Is this 32-bit or 64-bit Citrix? 64-bit is strongly recommended as the
maximum cache size on 32-bit systems is ~1GB.

Are there communication problems between the Citrix machines and the AFS
file servers? We have been seeing problems recently with Rx jumbograms
and networks that are less than friendly to fragmented UDP packets.

Jeffrey Altman
Secure Endpoints Inc.
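P.S. For item 3, the drive-mapping check might look like the following
from within an affected session (the "<cell>" and "<volume>" tokens are
placeholders you would replace with your own cell and volume names; this
is a sketch, not verbatim output):

```shell
rem Show what S: is currently mapped to (freelance root.afs, the
rem cell's root.afs, or a per-user home volume):
net use S:

rem Then check the status of the volume behind that mapping:
fs examine \\afs\<cell>#<volume>\
```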