How do you handle 'vacation'[1] coverage in a solo shop?

I'm the only sysadmin for a research lab, and I'm soliciting creative
suggestions for ways to provide in-depth sysadmin coverage when I'm
not available.

We're a small group (~35 people), but have a reasonably complex
environment (a 600-core HPC cluster, infrastructure machines using RHCS
HA clustering for critical services, ~45TB of SAN storage accessible via
GPFS and NFS, a bunch of web services within the lab, etc.). Thankfully,
our lab is behind a corporate firewall--we have no public-internet-facing
equipment, so security and network complexity are not major issues.

The researchers in the lab are very technical.  One or two people have
been trained to provide some assistance with system issues, but it's not
part of their daily job description or core competence.  It's difficult
to address the big gap between "simple and routine" and "critical but
rare" when preparing people with no system administration background.

The easy things have already been taken care of -- I'm happy to say
that most routine sysadmin tasks are either automated, well documented,
or can be deferred.

However, there will inevitably be complex issues that arise when I'm not
available. During past vacations there have been data center fires,
data center power outages, storage array failures, etc. You know, the
kind of "interesting" events that are almost impossible to document
in advance and which really take a combination of general experience
in system administration and knowledge of the specific environment to
resolve quickly and efficiently.


If you're in a solo or small environment, how do you deal with this kind
of thing?

Thanks,

Mark

[1] "vacation" sounds so much nicer than "hit by a bus", don't you think?


-----
Mark Bergman    Biker, Rock Climber, Unix mechanic, IATSE #1 Stagehand

http://wwwkeys.pgp.net:11371/pks/lookup?op=get&search=bergman%40merctech.com

I want a newsgroup with an infinite S/N ratio! Now taking CFV on:
rec.motorcycles.stagehands.pet-bird-owners.pinballers.unix-supporters
15+ So Far--Want to join? Check out: http://www.panix.com/~bergman 
_______________________________________________
Tech mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
