hi bruce,
ha-oscar will do this for you. see http://xcr.cenit.latech.edu/ha-oscar/
one drawback with this version of ha-oscar is that it does not maintain state - that means you will lose jobs already started as well as jobs in the queue. we are working on an active/active version in which nothing is lost other than the capacity of the machine that is down - it will be some time (not an immediate release). until then you can use the current ha-oscar version - it drops directly onto a standard oscar install.
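to make the point about state concrete, here is a toy sketch (not ha-oscar's actual code; the class names, the three-missed-heartbeats threshold, and the job names are all made up for illustration). it shows a standby being promoted after missed heartbeats and starting with an empty queue, because nothing was replicated to it.

```python
from dataclasses import dataclass, field


@dataclass
class HeadNode:
    """Hypothetical head node that keeps its job queue in local memory only."""
    name: str
    alive: bool = True
    queue: list = field(default_factory=list)


class FailoverMonitor:
    """Promotes the standby after `max_missed` consecutive missed heartbeats."""

    def __init__(self, primary, standby, max_missed=3):
        self.primary = primary
        self.standby = standby
        self.active = primary
        self.max_missed = max_missed  # assumed threshold, for illustration
        self.missed = 0

    def heartbeat(self):
        """Run one heartbeat check and return the currently active node."""
        if self.active is self.primary:
            if self.primary.alive:
                self.missed = 0
            else:
                self.missed += 1
                if self.missed >= self.max_missed:
                    # No state is copied: the standby keeps its own (empty) queue.
                    self.active = self.standby
        return self.active


primary = HeadNode("head1", queue=["job-a", "job-b"])
standby = HeadNode("head2")
monitor = FailoverMonitor(primary, standby)

primary.alive = False  # simulate the head node dying
for _ in range(3):
    active = monitor.heartbeat()

print(active.name, active.queue)
```

the queued jobs stay behind on the dead primary; avoiding that loss would require replicating the queue to the standby (or sharing it), which is the kind of state handling the current version does not do.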
stephen
Bruce Becker wrote:
Hello OSCAR friends
I am in the sad situation of having a very sick head node, which tends to die on me at the most inconvenient times. I don't know whether it's a hardware or software problem, but the machine is nearing the end of her life, so it's not impossible that the motherboard is going wonky or something.
My question to you guys is this : How to eliminate that single point of failure ?
At the moment, this machine is not only our head node, but also the ssh-gateway, website and firewall, and many monitoring services like gmond, gmetad and clumon run on it as well. I would like to move most of the services off the head node, and leave only those necessary for computing, like lamd, etc. My idea then is to have a separate machine act as a router to and from our cluster, which is also a remote logger and polls the monitoring feeds from ganglia, etc. on the cluster head node. The cluster head node can then be virtualised... I was thinking of a couple of boxes running a MOSIX kernel on the same switch... We need to eliminate the point of failure and have some sort of fail-over mechanism. Note that my head node doesn't die because of an overloaded system; the failures are pretty much random at this stage...
Can I do something like this with HA-OSCAR ? Can I mix HA-OSCAR and FAT-OSCAR ? Does anyone else have this head-node-dies-and-life-turns-to-hell problem ? If so, how did you get around it ?
Comments welcome ! Thanks Bruce
--
------------------------------------------------------------------------
Stephen L. Scott, Ph.D.                  voice: 865-574-3144
Oak Ridge National Laboratory            fax: 865-576-5491
P. O. Box 2008, Bldg. 5600, MS-6016      [EMAIL PROTECTED]
Oak Ridge, TN 37831-6016                 http://www.csm.ornl.gov/~sscott/
------------------------------------------------------------------------
_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
