Instead of discussing infinite possibilities, it would be more useful to discuss particular use cases. As much as I dislike Andrei Maslennikov's CMS and ATLAS results, the biggest struggle those in the OpenAFS community encounter when trying to address them is finding out what the parameters of the CMS and ATLAS jobs are. Earlier this month at DESY we finally learned enough of the details to understand some of the issues.
What are the batch jobs that are bringing a file server to a halt? We know that they all take place within a single volume, but there is a lot we don't know:

 . How are the file servers configured?
 . How many client machines are issuing requests against the volume?
 . How many client processes?
 . How many client PAGs are involved, since that affects the maximum number of outstanding parallel RPCs?
 . How are the client machines configured?
 . Are the jobs more like CMS (large sequential reads), like ATLAS (small random seeks within very large files), or something different?
 . Are the jobs read-only or read/write?
 . If read/write, are the jobs creating and removing large numbers of files in a common set of directories?
 . If read/write, are the jobs competing for access to a common set of files?
 . Does data, once read or written, get used again on the same client?

Separate and apart from the bottlenecks in the file server, it is really important to understand the requirements of the executing job and how the configuration of both the client cache manager and the file server will affect it. It is also critical to understand how the AFS cache coherency model is going to impact these jobs.

AFS is a caching file system, so it must ensure that cache coherency is maintained. The most critical aspect of this is data visibility. A file system is an inter-process communication path that can be used in parallel with other inter-process communication mechanisms. A client that performs a file creation or a data store must not be told that the operation has completed until all of the other clients accessing that directory or file have been told that their cached data is invalid. Otherwise, a message transmitted by process A after it performed the data-changing RPC could be received by process B before B received the cache invalidation, and that race would cause process B to read stale data.
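To make the ordering requirement concrete, here is a minimal toy model of the coherency rule described above. It is a sketch, not OpenAFS code: the `Client` and `Server` classes and their methods are invented for illustration, and the real callback protocol is far more involved.

```python
# Toy model of AFS-style cache coherency (all names hypothetical).
# The rule being illustrated: a store must not complete for the
# writer until every other client caching the file has been told
# its copy is invalid.

class Client:
    def __init__(self, name):
        self.name = name
        self.cache = {}          # path -> data cached from the server

    def read(self, server, path):
        # Serve from cache if still present, otherwise fetch.
        if path not in self.cache:
            self.cache[path] = server.data[path]
        return self.cache[path]

class Server:
    def __init__(self):
        self.data = {}
        self.clients = []

    def store(self, writer, path, data):
        self.data[path] = data
        # Break the callback on *every other* client before the
        # store RPC is allowed to complete.
        for c in self.clients:
            if c is not writer:
                c.cache.pop(path, None)
        # Only now may the writer's store() return, so any message
        # process A sends afterwards cannot race a stale cache on B.

server = Server()
a, b = Client("A"), Client("B")
server.clients = [a, b]

server.data["/afs/f"] = "old"
b.read(server, "/afs/f")            # B caches "old"
server.store(a, "/afs/f", "new")    # invalidates B before returning
print(b.read(server, "/afs/f"))     # B re-fetches and sees "new"
```

If the invalidation loop ran *after* `store()` returned, A could message B "go read the file" while B still held "old" in cache, which is exactly the race described above.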
In exactly the same way that CMS and ATLAS jobs tune their datasets to optimize for certain conditions, it is appropriate to optimize the datasets stored in AFS to be aware of the cache coherency requirements. I fear that the problem being faced here is primarily due to the enforcement of cache coherency. In particular, are these jobs designed in such a manner that more clients are accessing and modifying a given directory, or subset of directories, than there are threads in the file server? Is the file server grinding to a halt because each store operation requires that every client taking part in the job be notified of the change before the next one can begin to execute? More information about the jobs would really be helpful.

Jeffrey Altman
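A back-of-the-envelope count shows why this pattern scales so badly. The numbers below are illustrative assumptions, not measurements from the CMS or ATLAS jobs; the point is only that invalidation traffic grows with the product of writers and cached copies.

```python
# Rough model: if N clients all hold callbacks on files in one shared
# directory, each store by any one of them forces the file server to
# break the callback on the other N - 1 clients before that store
# completes. (Illustrative arithmetic only.)

def callback_breaks(n_clients, n_stores):
    """Total invalidation messages the server must deliver."""
    return n_stores * (n_clients - 1)

# Hypothetical example: 200 batch clients each storing once into a
# common directory.
n = 200
breaks = callback_breaks(n_clients=n, n_stores=n)
print(breaks)   # 200 * 199 = 39800 messages, each of which must be
                # delivered before the store that triggered it returns
```

With a file server thread pool far smaller than N, those breaks serialize behind one another, which matches the "grinding to a halt" behavior described above.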
