Nice script, Dan! I was going to suggest running tcpdump to see if one client is accounting for most of the traffic. Some misconfiguration or a hardware problem out at the client end can definitely cause a headache for a server. (I dimly recall finding a client system that appeared to have two different AFS clients installed and running, or trying to run, at the same time, causing a nasty load on a server.)
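To follow up on the tcpdump idea, here is a minimal sketch that tallies which client addresses send the most packets to a fileserver. It assumes you capture with something like `tcpdump -n -l -c 20000 udp port 7000` on the server (7000 is the standard fileserver UDP port) and feed the text output in line by line; the regex is an assumption about tcpdump's `-n` output format, and `top_talkers` is a hypothetical helper, not part of any AFS tool.

```python
import re
from collections import Counter

# Assumed shape of tcpdump -n output lines, e.g.
#   "04:15:01.000001 IP 10.0.0.5.7001 > 10.0.0.1.7000: rx data ..."
PKT = re.compile(
    r"\bIP (?P<src>\d{1,3}(?:\.\d{1,3}){3})\.\d+ > "
    r"(?P<dst>\d{1,3}(?:\.\d{1,3}){3})\.(?P<dport>\d+):")


def top_talkers(lines, server_port=7000, top=10):
    """Count packets addressed to the fileserver port, grouped by source IP."""
    counts = Counter()
    for line in lines:
        m = PKT.search(line)
        # Only client->server packets; server replies go to other ports.
        if m and int(m.group("dport")) == server_port:
            counts[m.group("src")] += 1
    return counts.most_common(top)
```

For example, `top_talkers(sys.stdin)` in a small wrapper script reading the piped tcpdump output prints the busiest clients first. Packet counts are only a proxy for load, but one host dominating the list is usually worth a closer look.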
Best regards,
Anne

________________________________
From: Dan Van Der Ster <[email protected]>
To: "<[email protected]>" <[email protected]>
Cc: "<[email protected]>" <[email protected]>
Sent: Friday, August 16, 2013 4:15 AM
Subject: Re: [OpenAFS] Investigating 'calls waiting' from rxdebug

Hi,

Whenever we get waiting calls, it is almost always caused by one or two users hammering a fileserver from batch jobs. To find the culprit(s), you could try debugging the fileserver by sending it the TSTP signal:
http://rzdocs.uni-hohenheim.de/afs_3.6/debug/fs/fileserver.html

We have a script that enables debugging for 3 seconds, then parses the output to make a nice summary. It has some dependencies on our local Perl mgmt API, but perhaps you can adapt it to work for you. I copied it here: http://pastebin.com/B6De4idS

Cheers,
Dan

On Aug 16, 2013, at 4:33 AM, [email protected] wrote:

> Hi.
>
> In the past week we have had two frustrating periods of significant
> performance problems in our AFS cell. The first one lasted for maybe
> two hours, at which point it seemed the culprit was something
> odd-looking on two of our remote-access Linux servers. I rebooted those
> servers, and the performance problems disappeared. That sounds good,
> but I was so busy investigating various red herrings that the
> performance problems might have stopped 15-20 minutes earlier, and I
> just didn't notice until after I had done that reboot. This incident,
> by itself, is not too worrisome.
>
> On Wednesday the significant (but intermittent) performance problems
> returned, and there was nothing particularly odd-looking on any
> machines I could see. Based on some Google searches, we zeroed in on
> the fact that one of our file servers was reporting rather high values
> for 'calls waiting for a thread' in the output of
> 'rxdebug $fileserver -rxstats'. The other file servers almost always
> reported zero calls waiting, but on this one file server the value
> tended to range between 5 and 50.
> Occasionally it got over 100. And the higher the value, the more
> likely we were to see performance problems on a wide variety of AFS
> clients.
>
> Googling some more showed that many people had reported that this
> value was indeed a good indicator of performance problems. And looking
> in log files on the file servers, we saw a few (but not many) messages
> which pointed us to problems in our network. Most of those looked like
> minor problems; one or two were more significant and were magnified by
> some heavy network traffic which happened to be going on at the time.
> We fixed all of those, and actually shut down the process which was
> (legitimately) doing a lot of network I/O. These were all good things
> to do, and none of them made a bit of difference to the values we saw
> for 'calls waiting' on that file server, or to the very frustrating
> hangs we were seeing on AFS clients.
>
> And then at 7:07am this morning, the problem disappeared. Completely.
> The 'calls waiting' value on that server has not gone above zero for
> the entire rest of the day. So, the immediate crisis is over.
> Everything is working fine.
>
> But my question is: if this returns, how can I track down what is
> *causing* the calls-waiting value to climb? We had over 100
> workstations using AFS at the time, scattered all around campus. I did
> a variety of things to try to pinpoint the culprit, but didn't have
> much luck.
>
> So, given a streak of high values for 'calls waiting', how can I track
> that down to a specific client (or clients), or maybe a specific AFS
> volume?
>
> --
> Garance Alistair Drosehn
> Senior Systems Programmer
> RPI; Troy NY
>
>
> _______________________________________________
> OpenAFS-info mailing list
> [email protected]
> https://lists.openafs.org/mailman/listinfo/openafs-info
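The approach Dan describes (watch for a streak of waiting calls, briefly raise the fileserver debug level, then summarize the log by client) might be sketched as below. This is not Dan's pastebin script. It assumes the TSTP/HUP behaviour from the page he links (each TSTP raises the debug level one step, HUP resets it), and it assumes that FileLog lines at a raised level look like `SRXAFS_FetchData, Fid = ..., Host 10.0.0.5:7001, Id 1234`; check that against your own FileLog, and raise `steps` if no per-host lines appear. All function names, the threshold, and the streak length are hypothetical.

```python
import os
import re
import signal
import subprocess
import time
from collections import Counter

# rxdebug prints a line of the form "N calls waiting for a thread".
WAITING = re.compile(r"(\d+) calls waiting for a thread")

# ASSUMED FileLog line shape at a raised debug level (verify locally):
#   "... SRXAFS_FetchData, Fid = 536870915.2.3, Host 10.0.0.5:7001, Id 1234"
RPC_LINE = re.compile(r"(?P<rpc>\w*AFS_\w+),.*?Host (?P<host>\d+(?:\.\d+){3}):\d+")


def calls_waiting(rxdebug_output):
    """Return the 'calls waiting for a thread' count, or None if absent."""
    m = WAITING.search(rxdebug_output)
    return int(m.group(1)) if m else None


def read_calls_waiting(server):
    """Run 'rxdebug <server> -rxstats' and extract the waiting-call count."""
    result = subprocess.run(["rxdebug", server, "-rxstats"],
                            capture_output=True, text=True, check=True)
    return calls_waiting(result.stdout)


def sustained(samples, threshold=20, streak=3):
    """True if the last `streak` samples are all >= `threshold` (hypothetical defaults)."""
    recent = samples[-streak:]
    return len(recent) == streak and all(
        s is not None and s >= threshold for s in recent)


def debug_window(pid, seconds=3.0, steps=1, kill=os.kill, sleep=time.sleep):
    """Send TSTP `steps` times, wait `seconds`, then send HUP to reset the level."""
    try:
        for _ in range(steps):
            kill(pid, signal.SIGTSTP)
        sleep(seconds)
    finally:
        # Always try to reset, so the server is not left logging heavily.
        kill(pid, signal.SIGHUP)


def summarize(log_lines, top=5):
    """Count logged RPCs per client host and per RPC name."""
    hosts, rpcs = Counter(), Counter()
    for line in log_lines:
        m = RPC_LINE.search(line)
        if m:
            hosts[m.group("host")] += 1
            rpcs[m.group("rpc")] += 1
    return hosts.most_common(top), rpcs.most_common(top)
```

In use, you would poll `read_calls_waiting` every few seconds into a list, and once `sustained` returns true, note the size of the fileserver's FileLog (often `/usr/afs/logs/FileLog` on a Transarc-style install, but that depends on your layout), call `debug_window` with the fileserver's PID, read the lines appended since, and pass them to `summarize`. The `kill` and `sleep` parameters exist only so the signalling logic can be exercised without a live server.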
