How do you handle 'vacation'[1] coverage in a solo shop?

I'm the only sysadmin for a research lab, and I'm soliciting creative suggestions for ways to provide in-depth sysadmin coverage when I'm not available.
We're a small group (~35 people), but have a reasonably complex environment (a 600-core HPC cluster, infrastructure machines using RHCS HA clustering for critical services, ~45TB of SAN storage accessible via GPFS and NFS, a bunch of web services within the lab, etc). Thankfully, our lab is behind a corporate firewall--we have no public-internet facing equipment, so security and network complexity are not major issues.

The researchers in the lab are very technical. One or two people have been trained to provide some assistance with system issues, but it's not part of their daily job description or core competence. It's difficult to address the big gap between "simple and routine" and "critical but rare" when preparing people with no system administration background.

The easy things have already been taken care of -- I'm happy to say that most routine sysadmin tasks are either automated, well documented, or can be deferred.

However, there will inevitably be complex issues that arise when I'm not available. During past vacations there have been data center fires, data center power outages, storage array failures, etc. You know, the kind of "interesting" events that are almost impossible to document in advance and which really take a combination of general experience in system administration and knowledge of the specific environment to resolve quickly and efficiently.

If you're in a solo or small environment, how do you deal with this kind of thing?

Thanks,

Mark

[1] "vacation" sounds so much nicer than "hit by a bus", don't you think?

-----
Mark Bergman    Biker, Rock Climber, Unix mechanic, IATSE #1 Stagehand

http://wwwkeys.pgp.net:11371/pks/lookup?op=get&search=bergman%40merctech.com

I want a newsgroup with a infinite S/N ratio! Now taking CFV on:
rec.motorcycles.stagehands.pet-bird-owners.pinballers.unix-supporters
15+ So Far--Want to join?
Check out: http://www.panix.com/~bergman

_______________________________________________
Tech mailing list
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
