>>> Israel Brewster <isr...@ravnalaska.net> wrote on 17.11.2016 at 18:37 in message
<751f1bd6-8434-4ad9-b77f-10eddfe28...@ravnalaska.net>:
> I have a resource that is set up as a clone set across my cluster, partly for
> pseudo-load balancing (if someone wants to perform an action that will take a
> lot of resources, I can have them do it on a different node than the primary
> one), but also simply because the resource can take several seconds to start,
> and by having it already running as a clone set, I can fail over in the time
> it takes to move an IP resource - essentially zero downtime.
>
> This is all well and good, but I ran into a problem the other day where the
> process on one of the nodes stopped working properly. Pacemaker caught the
> issue and tried to fix it by restarting the resource, but it was unable to
> because the old instance hadn't actually exited completely and was still
> tying up the TCP port, thereby preventing the new instance that Pacemaker
> launched from starting.
>
> So this leaves me with two questions:
>
> 1) Is there a way to set up a "kill script", such that before trying to
> launch a new copy of a process, Pacemaker will run this script, which would
> be responsible for making sure that there are no other instances of the
> process running?
>
> 2) Even in the above situation, where Pacemaker couldn't launch a good copy
> of the resource on the one node, the situation could have been easily
> "resolved" by Pacemaker moving the virtual IP resource to another node where
> the cloned resource was running correctly, and notifying me of the problem. I
> know how to create colocation constraints in general, but how do I make a
> colocation constraint with a cloned resource where I just need the virtual IP
> running on *any* node where the clone is working properly? Or is it the same
> as any other colocation constraint, and Pacemaker is simply smart enough to
> both try to restart the failed resource and move the virtual IP resource at
> the same time?
I wonder: wouldn't a monitor operation that reports the resource as running as
long as the port is occupied resolve both issues?

> As an addendum to question 2, I'd be interested in any methods there may be
> to be notified of changes in the cluster state, specifically things like when
> a resource fails on a node. My current nagios/icinga setup doesn't catch that
> when Pacemaker properly moves the resource to a different node, because the
> resource remains up (which, of course, is the whole point), but it would
> still be good to know something happened, so I could look into it and see if
> something needs to be fixed on the failed node to allow the resource to run
> there properly.
>
> Thanks!
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------
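To sketch what I mean, untested: a custom OCF agent could keep reporting
"running" from its monitor action while anything still listens on the
service's port, so that Pacemaker always runs stop (the supported place for
"kill script" logic, since a restart is always stop-then-start) before any new
start. The parameter names OCF_RESKEY_port and OCF_RESKEY_daemon and the use
of ss(8) are assumptions for illustration, not anything your agent already
provides:

# Fragment of a hypothetical custom OCF agent. ocf-shellfuncs supplies
# the OCF_SUCCESS/OCF_NOT_RUNNING/OCF_ERR_GENERIC return codes.
: "${OCF_FUNCTIONS_DIR:=/usr/lib/ocf/lib/heartbeat}"
. "${OCF_FUNCTIONS_DIR}/ocf-shellfuncs"

port_busy() {
    # True while any socket still listens on the configured port.
    ss -ltn "( sport = :${OCF_RESKEY_port} )" | grep -q LISTEN
}

myservice_monitor() {
    # Report "running" as long as the port is occupied, so Pacemaker
    # calls stop (which may escalate to a kill) before starting anew.
    if port_busy; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_NOT_RUNNING"
}

myservice_stop() {
    # The stop action doubles as the "kill script": terminate any
    # stragglers still holding the port, escalating to SIGKILL.
    pkill -f "$OCF_RESKEY_daemon" 2>/dev/null
    sleep 2
    if port_busy; then
        pkill -9 -f "$OCF_RESKEY_daemon" 2>/dev/null
        sleep 1
    fi
    port_busy && return "$OCF_ERR_GENERIC"
    return "$OCF_SUCCESS"
}

If stop still cannot free the port, returning OCF_ERR_GENERIC lets the
cluster escalate (ultimately to fencing) rather than start a doomed copy.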
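On question 2: colocating with a clone works like any other colocation. The
IP is placed on some node with an active clone instance, and if the instance
under the IP fails for good, the IP moves. A rough pcs sketch, with
virtual-ip and myservice-clone as placeholder resource ids:

# Keep the IP on a node where a clone instance is active, and start
# the clone instance before the IP lands there. Ids are placeholders.
pcs constraint colocation add virtual-ip with myservice-clone INFINITY
pcs constraint order start myservice-clone then start virtual-ip

With a mandatory (INFINITY) score, a node whose clone instance can no longer
run is ineligible for the IP, so Pacemaker relocates the IP while recovery of
the failed instance proceeds independently.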
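For the addendum: if the cluster runs Pacemaker 1.1.15 or later, the alerts
feature calls an external script on events such as resource failures; on
older versions, a ClusterMon resource using crm_mon's external-agent mode is
the usual workaround. A sketch using one of the sample agents shipped with
Pacemaker (paths, the alert id, and the recipient address are examples, and
the pcs alert syntax varies a little between pcs versions - check pcs alert
help):

# Copy a sample alert agent shipped with Pacemaker and register it.
cp /usr/share/pacemaker/alerts/alert_smtp.sh.sample /usr/local/bin/alert_smtp.sh
chmod +x /usr/local/bin/alert_smtp.sh
pcs alert create path=/usr/local/bin/alert_smtp.sh id=mail-on-events
pcs alert recipient add mail-on-events value=you@example.com

That would get you a mail for every failure and recovery, even when the
cluster handles the failover cleanly and nagios/icinga sees nothing.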