On Thu, 12 May 2016 11:24:33 -0500, Ken Gaillot <[email protected]> wrote:
> On 05/09/2016 06:36 PM, Jehan-Guillaume de Rorthais wrote:
> > On Mon, 9 May 2016 17:40:19 -0500,
> > Ken Gaillot <[email protected]> wrote:
> >
> >> On 05/04/2016 11:47 AM, Adam Spiers wrote:
> >>> Ken Gaillot <[email protected]> wrote:
> >>>> On 05/04/2016 08:49 AM, Klaus Wenninger wrote:
> >>>>> On 05/04/2016 02:09 PM, Adam Spiers wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> As discussed with Ken and Andrew at the OpenStack summit last week, we
> >>>>>> would like Pacemaker to be extended to export the current failcount as
> >>>>>> an environment variable to OCF RA scripts when they are invoked with
> >>>>>> 'start' or 'stop' actions. This would mean that if you have
> >>>>>> start-failure-is-fatal=false and migration-threshold=3 (say), then you
> >>>>>> would be able to implement a different behaviour for the third and
> >>>>>> final 'stop' of a service executed on a node, which is different to
> >>>>>> the previous 'stop' actions executed just prior to attempting a
> >>>>>> restart of the service. (In the non-clone case, this would happen
> >>>>>> just before migrating the service to another node.)
> >>>>> So what you actually want to know is how much headroom
> >>>>> there still is until the resource would be migrated.
> >>>>> So wouldn't it then be much handier if we passed not
> >>>>> the failcount but rather the headroom?
> >>>>
> >>>> Yes, that's the plan: pass a new environment variable with
> >>>> (migration-threshold - fail-count) when recovering a resource. I haven't
> >>>> worked out the exact behavior yet, but that's the idea. I do hope to get
> >>>> this in 1.1.15 since it's a small change.
> >>>>
> >>>> The advantage over using crm_failcount is that it will be limited to the
> >>>> current recovery attempt, and it will calculate the headroom as you say,
> >>>> rather than the raw failcount.
> >>>
> >>> Headroom sounds more usable, but if it's not significant extra work,
> >>> why not pass both? It could come in handy, even if only for more
> >>> informative logging from the RA.
> >>>
> >>> Thanks a lot!
> >>
> >> Here is what I'm testing currently:
> >>
> >> - When the cluster recovers a resource, the resource agent's stop action
> >> will get a new variable, OCF_RESKEY_CRM_meta_recovery_left =
> >> migration-threshold - fail-count on the local node.
> >>
> >> - The variable is not added for any action other than stop.
> >
> > If the resource is a multistate one, the recover action will do a
> > demote->stop->start->promote. What if the failure occurs during the first
> > demote call and a new transition tries to demote again first? I suppose
> > this new variable should appear at least in the demote and stop actions to
> > cover such a situation, shouldn't it?
>
> Good question. I can easily imagine a "lightweight stop", but I can't
> think of a practical use for a "lightweight demote". If someone has a
> scenario where that would be useful, I can look at adding it.

PostgreSQL does not support a demote operation. To "demote" a PostgreSQL
master instance, we must stop it, then start it as a slave.

But my point was mostly that before doing a stop, we must first do a
demote. I think this future variable should be available during each
action involved in the recovery process.

Regards,
-- 
Jehan-Guillaume de Rorthais
Dalibo

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
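To make the "lightweight stop" idea concrete, here is a minimal Python sketch of how an agent's stop action might consume the proposed headroom variable. It assumes the semantics Ken described as still under test (OCF_RESKEY_CRM_meta_recovery_left = migration-threshold - fail-count on the local node, set only when recovering), and it further assumes that a value of 0 or less means the resource is about to move off the node. The helper names `recovery_left` and `choose_stop_mode`, and the "lightweight"/"full" modes, are hypothetical illustrations, not part of Pacemaker or the OCF API.

```python
import os
from typing import Mapping, Optional

# Variable name as described in this thread. It was still being tested and was
# only passed to the stop action, so treat it as provisional (assumption).
RECOVERY_LEFT_VAR = "OCF_RESKEY_CRM_meta_recovery_left"


def recovery_left(env: Mapping[str, str]) -> Optional[int]:
    """Return the remaining headroom (migration-threshold minus fail-count),
    or None if the variable is absent or not an integer."""
    raw = env.get(RECOVERY_LEFT_VAR)
    if raw is None:
        # Not a recovery, or a Pacemaker version without this variable.
        return None
    try:
        return int(raw)
    except ValueError:
        # Unexpected value: behave as if the variable were absent.
        return None


def choose_stop_mode(env: Mapping[str, str]) -> str:
    """Hypothetical policy: 'lightweight' if the cluster will restart the
    resource on this node, 'full' if this may be the last local stop."""
    left = recovery_left(env)
    if left is None or left <= 0:
        # No headroom information, or no local retries left: stop cleanly.
        return "full"
    # Headroom remains, so the cluster should try to start it here again.
    return "lightweight"


if __name__ == "__main__":
    mode = choose_stop_mode(os.environ)
    # A real agent would now perform the stop and exit with OCF_SUCCESS (0).
    print(f"stop mode: {mode}")
```

Under these assumptions, with migration-threshold=3 successive local failures would give headroom values of 3 - 1 = 2, 3 - 2 = 1 and 3 - 3 = 0, so the first two stops would be lightweight and the third a full stop. If the variable were also passed to demote, as suggested above for multistate resources, the same check could be reused there.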
