On Wed, Feb 4, 2009 at 4:38 PM, Will Maier <[email protected]> wrote:
> Hi folks-
>
> In the past, we've observed prolonged periods where one or more of
> our servers would report more than 200 calls waiting for a thread.
> This occurred again this morning and lasted for about four hours.

Can you run

bos status (fileserverhost) fs -long

and post that information? However, lots of bugs which would affect this
have been fixed since 1.4.1, which is ancient.

> While the server reported the blocked calls, top showed that the
> fileserver was pegged at >= 100% CPU and FileLog (with verbosity
> increased via SIGTSTP) showed a huge number of SAFS_FetchStatuses
> (and very little else).
>
> During this time, I also noticed that the number of blocked calls
> seemed to oscillate between 0 and ~220 over a period of about 100
> seconds (with ~1300 total clients according to the hosts.dump file).
> This made me wonder if there wasn't some component that was
> periodically clearing the backlog and, if so, if the period might be
> easily modifiable.
>
> This condition tends to coincide with a large number of batch jobs
> that, unfortunately, must get some of their shared libraries,
> binaries and configuration/seed files from our AFS cell. We've done
> as much as we can to limit the amount of data in AFS that these jobs
> require, but we still observe blocked calls, especially when a large
> number of jobs spin up at approximately the same time. It's also
> possible that the jobs are overwhelming the clients' caches, which
> could conceivably cause extra/spurious calls to the server. Is this
> a possibility?
>
> If the periodicity of the backlog's level is a red herring, is there
> something else we might consider?

Yes. OpenAFS 1.4.8.

--
Derrick
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
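For anyone who wants to record how the backlog rises and falls over time (the 0 to ~220 oscillation described above), here is a rough sketch that polls rxdebug and logs the count. It is not from the original thread. It assumes rxdebug is on the PATH, that the fileserver listens on port 7000, and that rxdebug prints a line of the form "N calls waiting for a thread"; the function names and the sampling interval are made up for illustration.

```python
import re
import subprocess
import time
from typing import Optional

# Assumed rxdebug output line, e.g. "212 calls waiting for a thread".
WAITING_RE = re.compile(r"^\s*(\d+) calls waiting for a thread", re.MULTILINE)


def parse_waiting_calls(rxdebug_output: str) -> Optional[int]:
    """Return the blocked-call count from rxdebug text, or None if absent."""
    match = WAITING_RE.search(rxdebug_output)
    return int(match.group(1)) if match else None


def sample_waiting_calls(host: str, port: int = 7000) -> Optional[int]:
    """Run rxdebug once against host:port and return the blocked-call count."""
    result = subprocess.run(
        ["rxdebug", host, str(port), "-noconns"],
        capture_output=True,
        text=True,
        timeout=30,
        check=False,  # a failed probe yields None rather than an exception
    )
    return parse_waiting_calls(result.stdout)


def monitor(host: str, interval: float = 5.0, samples: int = 60) -> None:
    """Print one timestamped sample every `interval` seconds (hypothetical defaults)."""
    for _ in range(samples):
        count = sample_waiting_calls(host)
        print(f"{time.strftime('%H:%M:%S')} waiting={count}")
        time.sleep(interval)
```

Calling monitor("fileserverhost") with the default 5-second interval covers five minutes, which should be enough to see a ~100 second cycle if one exists. Redirect the output to a file to compare it against FileLog timestamps.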
