Hello Bruce,

I don't think that openMOSIX will do you any good. In fact, IIRC, if the computer from which the MOSIX processes were started dies, then those processes die too. It is not a solution for high availability but mainly for load balancing.
HA-OSCAR is meant to address exactly the kind of problem you have, so you should try it. It is meant to be used with "regular" OSCAR and it provides HA for the master node only. You will certainly need a couple of additional pieces of hardware if you want real, complete redundancy.

Please have a look at the HA-OSCAR webpage: http://xcr.cenit.latech.edu/ha-oscar/

Ben

On Thu, 14 Oct 2004 13:55:57 +0200 (SAST), Bruce Becker <[EMAIL PROTECTED]> wrote:
> Hello OSCAR friends
>
> I am in the sad situation of having a very sick head node, which
> tends to die on me at the most inconvenient times. I don't know whether
> it's a hardware or software problem, but the machine is nearing the end
> of her life, so it's not impossible that the motherboard is going wonky or
> something.
>
> My question to you guys is this :
> How to eliminate that single point of failure ?
>
> At the moment, this machine is not only our head node, but also the
> ssh-gateway, website, firewall, and many other monitoring services
> like gmond, gmetad, clumon run on that machine. I would like to move most
> of the services off of the head node, and leave only those necessary for
> computing, like lamd, etc. My idea then is to have a separate machine act
> as a router to and from our cluster, which is also a remote logger and
> polls the monitoring feeds from ganglia, etc on the cluster head node. The
> cluster head node can then be virtualised... I was thinking a couple of
> boxes running MOSIX kernel on the same switch... We need to eliminate the
> point of failure and have some sort of fail-over mechanism. Note that my
> head node doesn't die because of an overloaded system, the failures are
> pretty much random at this stage...
>
> Can I do something like this with HA-OSCAR ? Can I mix HA-OSCAR and
> FAT-OSCAR ? Does anyone else have this
> head-node-dies-and-life-turns-to-hell problem ? If so, how did you
> get around it ?
>
> Comments welcome !
> Thanks
> Bruce
>
> --
> Bruce Becker
> UCT-CERN Research Center - University of Cape Town
> Private Bag RONDEBOSCH 7700
> tel :(w) +27 21 650 3356 | (m) +27 82 537 9425 | (f) +27 21 650 3342
> IM : AIM/Jabber - brucellino | Yahoo - uctbruce
> WEB :http://hep.phy.uct.ac.za/~becker
> "Viel hilft viel"
>
> -------------------------------------------------------
> This SF.net email is sponsored by: IT Product Guide on ITManagersJournal
> Use IT products in your business? Tell us what you think of them. Give us
> Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more
> http://productguide.itmanagersjournal.com/guidepromo.tmpl
> _______________________________________________
> Oscar-users mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/oscar-users
>

--
Benoit des Ligneris Ph. D.
President of Revolution Linux http://www.revolutionlinux.com/
OSCAR Chair http://oscar.openclustergroup.org/
EduLinux Project Lead http://www.edulinux.org/
