Esther Filderman <[EMAIL PROTECTED]> wrote:
> On 8/7/06, John Hascall <[EMAIL PROTECTED]> wrote:
>>
>> 1) Stability.  The uptime of our DB servers is years,
>> we can only dream of that for our fileservers.
>
> I'm currently running a mix.  My primary KDC and "lowest IP" DB server
> is a non-fileserver machine.  The other two boxes do both.
>
> In addition to uptime, we also have the added stability of being able
> to take down the KDC without interrupting volume access.  This is very
> very nice.
Umm, am I missing something?  One of the major reasons I use AFS is the
"vos move" command.  And it was my understanding that AFS can handle
server outages without breaking.  Do you all have different experiences?
If AFS can't handle a server outage (especially a planned one) there is
no point in using it.

I patch and reboot all of our AFS servers about once a month to ensure
that they have the latest operating system patches.  I usually also
upgrade to the latest 1.4.x release (just installed 1.4.2b3 on a system
today.)

>> 2) Restart speed.  Waiting for a pile of disk to
>> fsck to get your DB servers up and running again
>> is suboptimal.
>
> Again, having one machine as a DB-non-fileserver helps this greatly.
>
> We also run with --fast-restart compiled in.  This is a pushme-pullyou.
> Basically all fast-restart does is skip the salvaging.  Now we have
> volumes crapping themselves here and there.  [Thank you, Fortran, you
> %*%()#.  Ahem.]

I also run with fast-restart.  Have not had any reported problems with
volumes crapping out.  And I generally vos move everything off of a
fileserver before planned restarts, so there is nothing there for the
salvager to keep offline.

> We're starting a routine of monthly salvages for each server to try to
> combat this.

Do salvages touch the volumes themselves, or is it just a partition
level thing?  I.e. if I vos move volumes off of the partitions and mkfs
them monthly, do I still need to worry about salvaging periodically?

>> 3) Load.  A busy fileserver on the same machine as your
>> DB server can slow your whole cell.
>
> Cannot argue with this.

Luckily, load isn't an issue for us yet, but I do see that as a valid
point for some cells.

>> 4) Simplicity.  When something is amiss with a machine,
>> the less things a machine is doing, the less things
>> to check and the less likely it is the result of
>> some weird interaction.
>
> This is also why I advocate turning off everything else possible on an
> AFS server.  No AFS client.
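[The evacuate-before-restart routine mentioned above can be sketched roughly
as follows.  This is a hypothetical sketch, not the poster's actual script:
the server names, the partition letter, and the stubbed-out volume list are
all made up, and the real "vos listvol" call is left in a comment so the
sketch runs without an AFS installation.]

```shell
#!/bin/sh
# Sketch: print the "vos move" commands needed to evacuate every volume
# from a fileserver before a planned restart, so the salvager has
# nothing left to examine afterwards.  All names below are placeholders.

SRC=afs1.example.com   # server about to be rebooted (hypothetical)
DST=afs2.example.com   # server temporarily holding the volumes (hypothetical)
PART=a                 # vice partition letter on both servers (/vicepa)

list_volumes() {
    # In a real cell this would be something like:
    #   vos listvol "$SRC" -quiet | awk 'NF {print $1}'
    # Stubbed with fake volume names so the sketch is self-contained.
    printf '%s\n' user.alice user.bob proj.data
}

evacuate() {
    for vol in $(list_volumes); do
        # Drop the leading "echo" to actually perform the moves.
        echo vos move "$vol" "$SRC" "$PART" "$DST" "$PART" -localauth
    done
}

evacuate
```

[For the monthly-salvage routine quoted above, a per-partition salvage can be
requested with something like "bos salvage -server afs1.example.com
-partition /vicepa"; the salvager is invoked per partition but examines the
volume headers it finds there.]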
> Turn off everything you can.  Outside
> of AFS's own ports we have ntp and scp/ssh allowed in & out and that's
> about it.

Oh yes.  I don't run anything else on my AFS servers or KDCs.  I'd hate
to see a flaw in OpenAFS compromise a KDC and thus I keep them separate.
Although our (currently non-existent) DR plans might have a KDC and AFS
server on the same machine, possibly in a Solaris zone.

>> Reasons for joining them would be (in my mind):
>>
>> 1) Cost.  Fewer machines == Less cost
>>    (however, you can easily run the DB servers
>>    low-cost, even hand-me-down boxes).
>
> My current DB-non-fileserver box was plucked out of the garbage.  I'm
> serious.

All of our AFS servers were donated to us from various places.

>> 2) Space, power, cooling.  Either you have these or you don't.
>>
>> 3) You got a really small cell, so it doesn't matter.
>
> Arguably I have, well, a mid-sized cell.  I'm supporting a fairly
> small number of frequently active users [maybe 250 on a good day],
> maybe 2000 total real users.  I don't think I've cracked 1T in used
> space yet.  A sizeable chunk of my volumes are stuffed with research
> databases and videos.
>
> Yet I find that the more servers you have the more stable you are.
> The more machines you have, the less one machine's impact is felt.
>
> My cell used to be three machines, all DB & fileservers together,
> about 300G in use.  When one machine went down 1/3 of the cell was
> inaccessible.  TOTAL MESS.
>
> Now I have 5 machines.  Not as good as I'd like, but still muuuuch
> more stable.

Yes, I've noticed that things are more stable now that we have 5 servers
instead of 3.  But I think that is actually due to improvements in the
AFS code, not because of the number of machines.

<<CDC
--
Christopher D. Clausen  [EMAIL PROTECTED]  SysAdmin
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
