Robin Manke-Cassidy wrote:
> We are running a Citrix farm using AFS as the main storage for the
> users. We have been experiencing some stability issues with the
> client. I am looking for anyone that is using terminal server or Citrix
> in a high volume situation. Here are the symptoms that we are experiencing:
>
> 1. No one can get to the S drive, even after a log out and relaunch of
> an app. In this case, we've found that the Service is running, but is
> completely non-responsive. AFS space is at this point completely
> unusable, and a service restart works only 10-20% of the time.
That indicates either a deadlock or that the SMB client has dropped the
connection to the "AFS" file service. If it is a deadlock, it is a bug
that needs to be fixed. You can obtain a minidump of the process with
"fs minidump" and have it examined offline to determine whether it is in
fact a deadlock.

If it is not a deadlock, it could be that too many previous SMB requests
took longer than the SMB client's 45 second timeout period, in which
case the client backs off to reduce load on the SMB server. What errors
or warnings are you seeing in the Windows Application Event Log?

In the last couple of weeks I have implemented deadlock detection code
within afsd_service.exe. Earlier today I sent Jack Hsu a link to a
private build that implements many fixes based upon the potential
deadlocks that the lock order validation code identified.

> 2. The client seems to stop responding while folks are on the server.
> Those that attached before the failure seem to be ok.

Normally I would say this sounds odd but ...

> Anyone new coming on a server is without an S: drive.

Are attempts to communicate with the "AFS" file service failing with an
authentication failure? Perhaps "wrong password" or something else? I
ask because I fixed a bug last week that leaked memory when the SMB
client attempted to authenticate and failed. The leaked memory was
allocated by the LSA, so it could have resulted in the LSA running out
of memory. If so, there would be errors logged to the afsd log files if
"fs trace" was activated. This fix is also in the private build I
pointed Jack Hsu at.

> 75-90% of the time, a service restart fixes the issue.

But not the rest of the time. What are the error conditions? What does
the afsd_init.log file report? Is the "AFS" netbios name being
registered in the failure case?

> 3. One or two people on a server are unable to get to their S: drive.
> All existing with an S: drive and most to all new sessions get the S:
> drive.
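As a sketch, the diagnostics described above could be gathered along
these lines from a cmd prompt on an affected Citrix server ("fs
minidump" and "fs trace" are the OpenAFS for Windows commands mentioned
in this thread; the exact file locations depend on how the client's log
directory is configured):

```shell
rem Capture a minidump of the afsd_service.exe process so it can be
rem examined offline for a deadlock:
fs minidump

rem Turn on in-memory tracing, reproduce the failure, then flush the
rem trace buffer out to the afsd log files:
fs trace -on
rem ... reproduce the hang here ...
fs trace -dump
```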
> This appears to be caused by the client not initializing
> correctly within the session startup, and is resolved nearly 100% of the
> time with a logout and relaunch of the app.

What is "client" in this context?

> The S: drive is the AFS mounted volume.

When you execute "NET USE", what is S: mapped to? Is it the freelance
root.afs volume? The cell's root.afs volume? A per-user home directory?
What does "fs examine \\afs\<cell>#<volume>\" report as the status for
the volume in question?

---

More general questions:

What version of OpenAFS are you using?

How have you tuned the client? The default values are not appropriate
for a multi-user system.

Is this 32-bit or 64-bit Citrix? 64-bit is strongly recommended as the
maximum cache size on 32-bit systems is ~1GB.

Are there communication problems between the Citrix machines and the AFS
file servers? We have been seeing problems recently with Rx jumbograms
and networks that are less than friendly to fragmented UDP packets.

Jeffrey Altman
Secure Endpoints Inc.
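P.S. For item 3, the drive-mapping check might look like the following
from within an affected session (the "<cell>" and "<volume>" tokens are
placeholders you would replace with your own cell and volume names; this
is a sketch, not verbatim output):

```shell
rem Show what S: is currently mapped to (freelance root.afs, the
rem cell's root.afs, or a per-user home volume):
net use S:

rem Then check the status of the volume behind that mapping:
fs examine \\afs\<cell>#<volume>\
```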