Re: [ClusterLabs] heartbeat/anything Resource Agent : "wait for proper service before ending the start operation"
On Friday 13 April 2018 at 11:59 +0200, Oyvind Albrigtsen wrote:
> On 13/04/18 11:53 +0200, Nicolas Huillard wrote:
> > On Friday 13 April 2018 at 11:15 +0200, Oyvind Albrigtsen wrote:
> > The issue here is that the monitor will at first return a "fail", which Pacemaker considers fatal unless the start-failure-is-fatal property is set to false, and that may come with side effects.
> > That's what I do now with a ping RA inserted before the service which may fail if the interface is not UP. It works, but triggers some "fail" events which are not really "fails" but "not started yet".
>
> You might try setting it to e.g. "sleep 30; " and see if that works.

I'm using resource-agents package 4.0.0, and just noticed that what I was thinking about was implemented more recently in:
https://github.com/ClusterLabs/resource-agents/commit/ee099d62c23e0afd0442a4febde80412b8ac22f1#diff-07b3e128cbd8576888076cc71c00233b

I'll use that one, thanks!

--
Nicolas Huillard
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] heartbeat/anything Resource Agent : "wait for proper service before ending the start operation"
On 13/04/18 11:53 +0200, Nicolas Huillard wrote:
> On Friday 13 April 2018 at 11:15 +0200, Oyvind Albrigtsen wrote:
> > On 13/04/18 11:07 +0200, Nicolas Huillard wrote:
> > > One of my resources is a pppd process, which is started with the heartbeat/anything RA. That RA just spawns the pppd process with the correct parameters and returns OCF_SUCCESS if the process started.
> > > The problem is that the service provided by pppd is only available after some time (a few seconds to 30s), i.e. when it has successfully negotiated a connection. At this time, the interface it creates is UP.
> > >
> > > The issue here is that other resources that depend on this connection are started by Pacemaker just after it starts pppd, thus before the interface is UP. This creates various problems.
> > >
> > > I figured that fixing this would require adding a monitor call inside the start operation, and waiting for a successful monitor before returning OCF_SUCCESS, within the start timeout.
> > >
> > > Is this a correct approach?
> > > Is there some other standard way to fix this, like a "wait for condition" Resource Agent?
> >
> > You could try using the monitor_hook parameter to check the status,
>
> The issue here is that the monitor will at first return a "fail", which Pacemaker considers fatal unless the start-failure-is-fatal property is set to false, and that may come with side effects.
> That's what I do now with a ping RA inserted before the service which may fail if the interface is not UP. It works, but triggers some "fail" events which are not really "fails" but "not started yet".

You might try setting it to e.g. "sleep 30; " and see if that works.

> > or use the Delay agent between the anything resource and the other resources.
>
> I'll try this. Hoping a sensible delay can be derived from the logs.
>
> Thanks,
>
> --
> Nicolas Huillard
Re: [ClusterLabs] heartbeat/anything Resource Agent : "wait for proper service before ending the start operation"
On Friday 13 April 2018 at 11:15 +0200, Oyvind Albrigtsen wrote:
> On 13/04/18 11:07 +0200, Nicolas Huillard wrote:
> > One of my resources is a pppd process, which is started with the heartbeat/anything RA. That RA just spawns the pppd process with the correct parameters and returns OCF_SUCCESS if the process started.
> > The problem is that the service provided by pppd is only available after some time (a few seconds to 30s), i.e. when it has successfully negotiated a connection. At this time, the interface it creates is UP.
> >
> > The issue here is that other resources that depend on this connection are started by Pacemaker just after it starts pppd, thus before the interface is UP. This creates various problems.
> >
> > I figured that fixing this would require adding a monitor call inside the start operation, and waiting for a successful monitor before returning OCF_SUCCESS, within the start timeout.
> >
> > Is this a correct approach?
> > Is there some other standard way to fix this, like a "wait for condition" Resource Agent?
>
> You could try using the monitor_hook parameter to check the status,

The issue here is that the monitor will at first return a "fail", which Pacemaker considers fatal unless the start-failure-is-fatal property is set to false, and that may come with side effects.
That's what I do now with a ping RA inserted before the service which may fail if the interface is not UP. It works, but triggers some "fail" events which are not really "fails" but "not started yet".

> or use the Delay agent between the anything resource and the other resources.

I'll try this. Hoping a sensible delay can be derived from the logs.

Thanks,

--
Nicolas Huillard
Re: [ClusterLabs] heartbeat/anything Resource Agent : "wait for proper service before ending the start operation"
On 13/04/18 11:07 +0200, Nicolas Huillard wrote:
> Hello all,
>
> One of my resources is a pppd process, which is started with the heartbeat/anything RA. That RA just spawns the pppd process with the correct parameters and returns OCF_SUCCESS if the process started.
> The problem is that the service provided by pppd is only available after some time (a few seconds to 30s), i.e. when it has successfully negotiated a connection. At this time, the interface it creates is UP.
>
> The issue here is that other resources that depend on this connection are started by Pacemaker just after it starts pppd, thus before the interface is UP. This creates various problems.
>
> I figured that fixing this would require adding a monitor call inside the start operation, and waiting for a successful monitor before returning OCF_SUCCESS, within the start timeout.
>
> Is this a correct approach?
> Is there some other standard way to fix this, like a "wait for condition" Resource Agent?

You could try using the monitor_hook parameter to check the status, or use the Delay agent between the anything resource and the other resources.

> Using Pacemaker 1.1.16 on Debian stretch.
>
> --
> Nicolas Huillard
[ClusterLabs] heartbeat/anything Resource Agent : "wait for proper service before ending the start operation"
Hello all,

One of my resources is a pppd process, which is started with the heartbeat/anything RA. That RA just spawns the pppd process with the correct parameters and returns OCF_SUCCESS if the process started.
The problem is that the service provided by pppd is only available after some time (a few seconds to 30s), i.e. when it has successfully negotiated a connection. At this time, the interface it creates is UP.

The issue here is that other resources that depend on this connection are started by Pacemaker just after it starts pppd, thus before the interface is UP. This creates various problems.

I figured that fixing this would require adding a monitor call inside the start operation, and waiting for a successful monitor before returning OCF_SUCCESS, within the start timeout.

Is this a correct approach?
Is there some other standard way to fix this, like a "wait for condition" Resource Agent?

Using Pacemaker 1.1.16 on Debian stretch.

--
Nicolas Huillard
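
For illustration only, the "monitor inside start" idea from the message above could be sketched roughly as follows. This is not the code of the heartbeat/anything agent (which is a shell script); it is a minimal Python sketch, and the names start_and_wait and interface_is_up, the polling interval, and the sysfs-based check are all assumptions made for the example. Only the exit codes OCF_SUCCESS (0) and OCF_ERR_GENERIC (1) are taken from the OCF conventions.

```python
import os
import time
from typing import Callable

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1

IFF_UP = 0x1  # Linux interface flag bit for "administratively up"


def interface_is_up(name: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Hypothetical monitor check: True if <sysfs_root>/<name>/flags has IFF_UP set."""
    try:
        with open(os.path.join(sysfs_root, name, "flags")) as f:
            return bool(int(f.read().strip(), 16) & IFF_UP)
    except (OSError, ValueError):
        # Interface does not exist yet, or the file is unreadable.
        return False


def start_and_wait(
    spawn: Callable[[], None],
    monitor: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Spawn the service, then poll monitor() until it succeeds or timeout_s elapses."""
    spawn()
    deadline = clock() + timeout_s
    while True:
        if monitor():
            return OCF_SUCCESS  # service is really available
        if clock() >= deadline:
            return OCF_ERR_GENERIC  # never became available in time
        sleep(poll_interval_s)
```

With something like this, start would call start_and_wait(spawn_pppd, lambda: interface_is_up("ppp0"), timeout_s=...), where spawn_pppd and the interface name ppp0 are placeholders. The internal timeout would presumably be chosen somewhat shorter than the start timeout configured in Pacemaker, so that the agent returns its own error code before the cluster gives up on the operation.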