>>>>> "Jeffrey" == Jeffrey E Sussna <[EMAIL PROTECTED]> writes:

Jeffrey> Here is a more detailed description of my potential AFS usage
Jeffrey> model. I would appreciate any concrete feedback on AFS's
Jeffrey> suitability for it:
Jeffrey>
Jeffrey> Basic model: process A writes file X. Process B and C read
Jeffrey> file X. Process C writes file Y. Processes D, E, and F read
Jeffrey> file Y. Process F writes file Z. A->B->C->D->E->F constitutes
Jeffrey> a "job". At the end of the job X, Y, and Z can all be deleted,
Jeffrey> and should be, as each machine will otherwise pile up a few
Jeffrey> gigs of data per day. Things get interesting if a job crashes.
Jeffrey> I'd rather not have to grunge around hundreds of machines
Jeffrey> looking for leftover files that are safe to delete, or
Jeffrey> maintain global state to do it for me. The entire system runs
Jeffrey> inside a private network. Many jobs run in parallel, but there
Jeffrey> is no interaction between jobs. I am also investigating
Jeffrey> publish-subscribe middleware such as Tibco and Talarian.

I'll give you my take on this...

First of all, let me arrogantly claim to have built one of (if not the) world's most mission-critical, industrial-strength AFS infrastructures in existence. Transarc certainly used to moan about the fact that we have pushed AFS into unseen territory here. Morgan Stanley's entire distributed systems infrastructure is based on AFS. Our UNIX desktops and servers are almost all dataless AFS clients, with both the operating systems and the production applications run from binaries, scripts, and config files loaded from RO AFS volumes. For RO data, you simply can't beat AFS for scalability and reliability. We can debate that if you like...

However, when it comes to RW data, the product is somewhat weak. As Russ Allbery pointed out, the AFS client is MP-safe (meaning it works), but it is far from MP-fast. It is essentially single-threaded (we can pick nits here, but note the word "essentially").
On some platforms (most notably IRIX, in my experience), performance isn't just flat as you increase the number of processors; it actually degrades severely due to lock contention. As long as each individual client in your architecture will not be performing lots of parallel reads/writes of large volumes of data into AFS, such that the load on a single client's AFS cache manager stays reasonable, you'll be OK. Even if you have thousands of clients, the fileserver itself deals with such volumes of RW data fairly well, since the fileserver is multi-threaded (as of AFS 3.5, I think). Still, use of RW data in AFS must be designed cautiously.

Having said all of that, I think that the filesystem should be used for files. Distributed filesystems are not very good IPC mechanisms, and you should probably step back and ask yourself whether this is the right architecture to begin with. Sure, you can probably make it work, but my intuition (and 7+ years of experience with AFS, and 15+ years of experience with *ultra* large scale systems) says otherwise.

All of the above statements are obviously somewhat general, and we are really speaking rather qualitatively. To make quantitative arguments requires a much closer look at the architecture you're trying to engineer.

_______________________________________________
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo.cgi/openafs-info
