[ClusterLabs] Perl Modules for resource agents (was: Resource Agent language discussion)
On Thu, 20 Aug 2015 18:21:01 +0200, Jehan-Guillaume de Rorthais wrote:

> On Thu, 20 Aug 2015 15:05:24 +1000, Andrew Beekhof wrote:

[...]

> > > What I was discussing here was:
> > >
> > > * if not using bash, is there any trap we should avoid that is already
> > >   addressed in the ocf-shellfuncs library?
> >
> > No, you just might have to re-implement some things. Particularly
> > logging.
>
> Ok, that was my conclusion so far. I'll have a look at the logging funcs
> then.
>
> > > * is there a chance a perl version of such library would be accepted
> > >   upstream?
> >
> > Depends if you're volunteering to maintain it too :)
>
> I do. I'll have to do it on my own for my RA anyway.

Months are flying! Already 3 of them since my last answer...

I spent some time porting the "ocf-shellfuncs", "ocf-returncodes" and
"ocf-directories" shell scripts to Perl modules called "OCF_Functions.pm",
"OCF_ReturnCodes.pm" and "OCF_Directories.pm". They are currently hiding in
our pgsql-resource-agent repository under the "multistate/lib" folder. See:

https://github.com/dalibo/pgsql-resource-agent/

They are used by the "pgsqlms" resource agent available in the
"multistate/script" folder. They are supposed to live in
"$OCF_ROOT/lib/heartbeat/". The pgsqlms agent has been tested again and
again in various failure situations under CentOS 6 and CentOS 7. The
modules seem to behave correctly.

Before considering pushing them out to a dedicated repository (or
upstream?) where maintaining them would be easier, I would like to hear
some feedback about them.

First, OCF_Functions does not implement all the shell functions available
in ocf-shellfuncs. As a first step, I focused on a simple module supporting
the popular functions we actually needed for our own agent. Let me know if
I forgot a function that MUST be in this first version.

Second, "OCF_Directories.pm" is actually generated from
"OCF_Directories.pm.PL".
Because I cannot rely on the upstream autogen/configure to detect the
distribution-specific destination folders, I wrote a wrapper in
"multistate/Build.PL" around the "ocf-directories" shell script to export
these variables to a temp file. Then, when "building" the module,
OCF_Directories.pm.PL reads this temp file to produce the final
distribution-dependent "OCF_Directories.pm". I don't like stuffing too much
shell into Perl scripts, but it really resembles the autogen/configure
process at the end of the day, and this piece of code only runs at build
time.

Cleaner ways would be to:

* generate OCF_Directories.pm from the upstream ./configure, which already
  has all the logic
* re-implement the logic to find the appropriate destination folders in
  "Build.PL". I am currently not able to follow this solution, as reverse
  engineering the autogen/configure process seems pretty difficult and time
  consuming.

The libs are currently auto-installed with our pgsqlms agent following the
fairly standard way to install Perl modules and scripts:

  perl Build.PL
  perl Build
  perl Build install

Any feedback, advice, patches, etc. would be appreciated!

PS: files are in attachment for ease of review.

Regards,
--
Jehan-Guillaume de Rorthais
Dalibo

Build.PL Description: Perl program
OCF_Directories.pm.PL Description: Perl program
OCF_Functions.pm Description: Perl program
OCF_ReturnCodes.pm Description: Perl program

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
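The Build.PL wrapper described above could look roughly like this — a minimal shell sketch, not the actual code from the repository. The default ocf-directories path and the variable list (HA_DIR, HA_RCDIR, HA_VARLIB, OCF_ROOT) are assumptions to adjust per distribution:

```shell
#!/bin/sh
# Sketch: source the ocf-directories shell script and dump its
# distribution-specific variables, one KEY=VALUE per line, into a file
# that the OCF_Directories.pm.PL generator can later read.
OCF_DIRS=${OCF_DIRS:-/usr/lib/ocf/lib/heartbeat/ocf-directories}

dump_ocf_dirs() {
    # Source in a subshell so the caller's environment stays clean.
    (
        . "$OCF_DIRS" || exit 1
        for v in HA_DIR HA_RCDIR HA_VARLIB OCF_ROOT; do
            # print e.g. "HA_DIR=/etc/ha.d"
            eval "echo \"$v=\$$v\""
        done
    )
}

# During the build, something like:
if [ -r "$OCF_DIRS" ]; then
    dump_ocf_dirs > /tmp/ocf-directories.env
fi
```

The generated file can then be parsed line by line from Perl to emit the final OCF_Directories.pm.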
Re: [ClusterLabs] [ClusterLabs Developers] Perl Modules for resource agents (was: Resource Agent language discussion)
A quick top-post. The project moved to its own repository. See:

https://github.com/dalibo/PAF/

Any feedback on the perl modules and related questions below would still be
quite appreciated :)

Regards,

On Thu, 26 Nov 2015 01:13:36 +0100, Jehan-Guillaume de Rorthais wrote:

> Months are flying! Already 3 of them since my last answer...
>
> I spent some time porting the "ocf-shellfuncs", "ocf-returncodes" and
> "ocf-directories" shell scripts to Perl modules called
> "OCF_Functions.pm", "OCF_ReturnCodes.pm" and "OCF_Directories.pm". They
> are currently hiding in our pgsql-resource-agent repository under the
> "multistate/lib" folder. See:
>
> https://github.com/dalibo/pgsql-resource-agent/
>
> They are used by the "pgsqlms" resource agent available in the
> "multistate/script" folder. They are supposed to live in
> "$OCF_ROOT/lib/heartbeat/". The pgsqlms agent has been tested again and
> again in various failure situations under CentOS 6 and CentOS 7. The
> modules seem to behave correctly.
>
> Before considering pushing them out to a dedicated repository (or
> upstream?) where maintaining them would be easier, I would like to hear
> some feedback about them.
>
> First, OCF_Functions does not implement all the shell functions available
> in ocf-shellfuncs. As a first step, I focused on a simple module
> supporting the popular functions we actually needed for our own agent.
> Let me know if I forgot a function that MUST be in this first version.
>
> Second, "OCF_Directories.pm" is actually generated from
> "OCF_Directories.pm.PL". Because I cannot rely on the upstream
> autogen/configure to detect the distribution-specific destination
> folders, I wrote a wrapper in "multistate/Build.PL" around the
> "ocf-directories" shell script to export these variables to a temp file.
> Then, when "building" the module, OCF_Directories.pm.PL reads this temp
> file to produce the final distribution-dependent "OCF_Directories.pm".
> I don't like stuffing too much shell into Perl scripts, but it really
> resembles the autogen/configure process at the end of the day, and this
> piece of code only runs at build time.
>
> Cleaner ways would be to:
>
> * generate OCF_Directories.pm from the upstream ./configure, which
>   already has all the logic
> * re-implement the logic to find the appropriate destination folders in
>   "Build.PL". I am currently not able to follow this solution, as
>   reverse engineering the autogen/configure process seems pretty
>   difficult and time consuming.
>
> The libs are currently auto-installed with our pgsqlms agent following
> the fairly standard way to install Perl modules and scripts:
>
>   perl Build.PL
>   perl Build
>   perl Build install
>
> Any feedback, advice, patches, etc. would be appreciated!
>
> PS: files are in attachment for ease of review.
[ClusterLabs] why and when a call of crm_attribute can be delayed ?
Hi all,

I am facing a strange issue with attrd while doing some testing on a
three-node cluster with the pgsqlms RA [1].

pgsqld is my pgsqlms resource in the cluster. pgsql-ha is the master/slave
setup on top of pgsqld.

Before triggering a failure, here was the situation:

* centos1: pgsql-ha slave
* centos2: pgsql-ha slave
* centos3: pgsql-ha master

Then we triggered a failure: the node centos3 was killed using

  echo c > /proc/sysrq-trigger

In this situation, PEngine provides a transition where:

* centos3 is fenced
* pgsql-ha on centos2 is promoted

During the pre-promote notify action in the pgsqlms RA, each remaining
slave sets a node attribute called lsn_location, see:

https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1504

  crm_attribute -l reboot -t status --node "$nodename" \
      --name lsn_location --update "$node_lsn"

During the promote action in the pgsqlms RA, the RA checks the
lsn_location of all the nodes to make sure the local one is higher than or
equal to all the others. See:

https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1292

This is where we face an attrd behavior we don't understand.

Although we can see in the log that the RA was able to set its local
"lsn_location", during the promote action the RA was unable to read its
local "lsn_location" back:

  pgsqlms(pgsqld)[9003]: 2016/04/22_14:46:16 INFO: pgsql_notify: promoting instance on node "centos2"
  pgsqlms(pgsqld)[9003]: 2016/04/22_14:46:16 INFO: pgsql_notify: current node LSN: 0/1EE24000

[...]
  pgsqlms(pgsqld)[9023]: 2016/04/22_14:46:16 CRIT: pgsql_promote: can not get current node LSN location

  Apr 22 14:46:16 [5864] centos2 lrmd: notice: operation_finished: pgsqld_promote_0:9023:stderr [ Error performing operation: No such device or address ]

  Apr 22 14:46:16 [5864] centos2 lrmd: info: log_finished: finished - rsc:pgsqld action:promote call_id:211 pid:9023 exit-code:1 exec-time:107ms queue-time:0ms

The error comes from:

https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1320

**After** this error, we can see in the log file that attrd sets the
"lsn_location" of centos2:

  Apr 22 14:46:16 [5865] centos2 attrd: info: attrd_peer_update: Setting lsn_location[centos2]: (null) -> 0/1EE24000 from centos2

  Apr 22 14:46:16 [5865] centos2 attrd: info: write_attribute: Write out of 'lsn_location' delayed: update 189 in progress

As I understand it, the call of crm_attribute during the pre-promote
notification was taken into account AFTER the "promote" action, leading to
this error. Am I right?

Why and how could this happen? Could it come from the dampen parameter? We
did not set any dampen anywhere; is there a default value in the cluster
setup? Could we avoid this behavior?

Please find attached a tarball with:

* all cluster logfiles from the three nodes
* the content of /var/lib/pacemaker from the three nodes:
  * CIBs
  * PEngine transitions

Regards,

[1] https://github.com/dalibo/PAF

--
Jehan-Guillaume de Rorthais
Dalibo
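Given that the attribute write may still be in flight when promote runs, one defensive pattern (a sketch, not PAF's actual code) is to poll the attribute with a bounded retry loop before giving up; the retry count and delay below are arbitrary:

```shell
# Sketch: retry reading a node attribute that attrd may not have written
# out yet. Prints the value and returns 0 once readable, returns 1 after
# "tries" failed attempts.
wait_for_attribute() {
    attr=$1 tries=${2:-5} delay=${3:-1}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -G/--query reads the value; -q prints only the value itself
        if val=$(crm_attribute -l reboot -t status \
                 --name "$attr" --query -q 2>/dev/null); then
            echo "$val"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical use in a promote action:
#   lsn=$(wait_for_attribute lsn_location 10 1) || return "$OCF_ERR_GENERIC"
```

This only masks the race on the local node; values set by other nodes may still take longer to propagate.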
Re: [ClusterLabs] FR: send failcount to OCF RA start/stop actions
On Wed, 4 May 2016 13:09:04 +0100, Adam Spiers wrote:

> Hi all,

Hello,

> As discussed with Ken and Andrew at the OpenStack summit last week, we
> would like Pacemaker to be extended to export the current failcount as
> an environment variable to OCF RA scripts when they are invoked with
> 'start' or 'stop' actions. This would mean that if you have
> start-failure-is-fatal=false and migration-threshold=3 (say), then you
> would be able to implement a different behaviour for the third and
> final 'stop' of a service executed on a node, which is different to
> the previous 'stop' actions executed just prior to attempting a
> restart of the service. (In the non-clone case, this would happen
> just before migrating the service to another node.)
>
> One use case for this is to invoke "nova service-disable" if Pacemaker
> fails to restart the nova-compute service on an OpenStack compute
> node.
>
> Is it feasible to squeeze this in before the 1.1.15 release?

Wouldn't it be possible for the RA to get its current failcount with the
following command?

  crm_failcount --resource "$OCF_RESOURCE_INSTANCE" -G

Moreover, how would you know that the previous failures all came from the
start action? I suppose you would have to track the failcount internally
yourself, wouldn't you? Maybe you could track failures in some fashion
using private attributes (e.g. start_attempt and last_start_ts)?

Regards,
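The private-attribute tracking suggested above could be sketched roughly as follows; `start_attempt` is a hypothetical attribute name, and `attrd_updater -p` marks the attribute private so it is never written to the CIB:

```shell
# Sketch: count consecutive start attempts in a private node attribute,
# so a later stop action can tell how many starts already failed.
bump_start_attempt() {
    # read the current counter (empty if the attribute does not exist yet)
    n=$(attrd_updater -n start_attempt -Q 2>/dev/null \
        | sed -n 's/.*value="\([^"]*\)".*/\1/p')
    n=$(( ${n:-0} + 1 ))
    attrd_updater -n start_attempt -U "$n" -p
    echo "$n"
}

clear_start_attempt() {
    # call after a successful start, so only consecutive failures count
    attrd_updater -n start_attempt -D -p
}
```

The start action would call bump_start_attempt on entry and clear_start_attempt on success; a timestamp attribute (the last_start_ts idea) could be maintained the same way.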
Re: [ClusterLabs] why and when a call of crm_attribute can be delayed ?
Le Wed, 4 May 2016 09:55:34 -0500, Ken Gaillot a écrit : > On 04/25/2016 05:02 AM, Jehan-Guillaume de Rorthais wrote: > > Hi all, > > > > I am facing a strange issue with attrd while doing some testing on a three > > node cluster with the pgsqlms RA [1]. > > > > pgsqld is my pgsqlms resource in the cluster. pgsql-ha is the master/slave > > setup on top of pgsqld. > > > > Before triggering a failure, here was the situation: > > > > * centos1: pgsql-ha slave > > * centos2: pgsql-ha slave > > * centos3: pgsql-ha master > > > > Then we triggered a failure: the node centos3 has been kill using > > > > echo c > /proc/sysrq-trigger > > > > In this situation, PEngine provide a transition where : > > > > * centos3 is fenced > > * pgsql-ha on centos2 is promoted > > > > During the pre-promote notify action in the pgsqlms RA, each remaining > > slave are setting a node attribute called lsn_location, see: > > > > https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1504 > > > > crm_attribute -l reboot -t status --node "$nodename" \ > > --name lsn_location --update "$node_lsn" > > > > During the promotion action in the pgsqlms RA, the RA check the > > lsn_location of the all the nodes to make sure the local one is higher or > > equal to all others. See: > > > > https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1292 > > > > This is where we face a attrd behavior we don't understand. > > > > Despite we can see in the log the RA was able to set its local > > "lsn_location", during the promotion action, the RA was unable to read its > > local lsn_location": > > > > pgsqlms(pgsqld)[9003]: 2016/04/22_14:46:16 > > INFO: pgsql_notify: promoting instance on node "centos2" > > > > pgsqlms(pgsqld)[9003]: 2016/04/22_14:46:16 > > INFO: pgsql_notify: current node LSN: 0/1EE24000 > > > > [...] 
> > > > pgsqlms(pgsqld)[9023]: 2016/04/22_14:46:16 > > CRIT: pgsql_promote: can not get current node LSN location > > > > Apr 22 14:46:16 [5864] centos2 lrmd: > > notice: operation_finished: pgsqld_promote_0:9023:stderr > > [ Error performing operation: No such device or address ] > > > > Apr 22 14:46:16 [5864] centos2 lrmd: > > info: log_finished: finished - rsc:pgsqld > > action:promote call_id:211 pid:9023 exit-code:1 exec-time:107ms > > queue-time:0ms > > > > The error comes from: > > > > https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1320 > > > > **After** this error, we can see in the log file attrd set the > > "lsn_location" of centos2: > > > > Apr 22 14:46:16 [5865] centos2 > > attrd: info: attrd_peer_update: > > Setting lsn_location[centos2]: (null) -> 0/1EE24000 from centos2 > > > > Apr 22 14:46:16 [5865] centos2 > > attrd: info: write_attribute: > > Write out of 'lsn_location' delayed:update 189 in progress > > > > > > As I understand it, the call of crm_attribute during pre-promote > > notification has been taken into account AFTER the "promote" action, > > leading to this error. Am I right? > > > > Why and how this could happen? Could it comes from the dampen parameter? We > > did not set any dampen anywhere, is there a default value in the cluster > > setup? Could we avoid this behavior? > > Unfortunately, that is expected. Both the cluster's call of the RA's > notify action, and the RA's call of crm_attribute, are asynchronous. So > there is no guarantee that anything done by the pre-promote notify will > be complete (or synchronized across other cluster nodes) by the time the > promote action is called. Ok, thank you for this explanation. It helps. > There would be no point in the pre-promote notify waiting for the > attribute value to be retrievable, because the cluster isn't going to > wait for the pre-promote notify to finish before calling promote. Oh, this is surprising. 
I thought the pseudo action "*_confirmed-pre_notify_demote_0" in the
transition graph was a wait for each resource clone's return code before
going on with the transition. The graph is confusing: if the cluster isn't
going to wait for the pre-promote notify to finish before calling promote,
I suppose some arrows should point directly from the start (or
post-start-notify?) action to the promote action then, shouldn't they?

This is quite worrying as our RA relies a lot on notifications. For
instance, we try to recover a PostgreSQL instance during pre-start or
pre-demote if we detect a recover action...
Re: [ClusterLabs] FR: send failcount to OCF RA start/stop actions
Le Mon, 9 May 2016 17:40:19 -0500, Ken Gaillot a écrit : > On 05/04/2016 11:47 AM, Adam Spiers wrote: > > Ken Gaillot wrote: > >> On 05/04/2016 08:49 AM, Klaus Wenninger wrote: > >>> On 05/04/2016 02:09 PM, Adam Spiers wrote: > >>>> Hi all, > >>>> > >>>> As discussed with Ken and Andrew at the OpenStack summit last week, we > >>>> would like Pacemaker to be extended to export the current failcount as > >>>> an environment variable to OCF RA scripts when they are invoked with > >>>> 'start' or 'stop' actions. This would mean that if you have > >>>> start-failure-is-fatal=false and migration-threshold=3 (say), then you > >>>> would be able to implement a different behaviour for the third and > >>>> final 'stop' of a service executed on a node, which is different to > >>>> the previous 'stop' actions executed just prior to attempting a > >>>> restart of the service. (In the non-clone case, this would happen > >>>> just before migrating the service to another node.) > >>> So what you actually want to know is how much headroom > >>> there still is till the resource would be migrated. > >>> So wouldn't it then be much more catchy if we don't pass > >>> the failcount but rather the headroom? > >> > >> Yes, that's the plan: pass a new environment variable with > >> (migration-threshold - fail-count) when recovering a resource. I haven't > >> worked out the exact behavior yet, but that's the idea. I do hope to get > >> this in 1.1.15 since it's a small change. > >> > >> The advantage over using crm_failcount is that it will be limited to the > >> current recovery attempt, and it will calculate the headroom as you say, > >> rather than the raw failcount. > > > > Headroom sounds more usable, but if it's not significant extra work, > > why not pass both? It could come in handy, even if only for more > > informative logging from the RA. > > > > Thanks a lot! 
>
> Here is what I'm testing currently:
>
> - When the cluster recovers a resource, the resource agent's stop action
>   will get a new variable, OCF_RESKEY_CRM_meta_recovery_left =
>   migration-threshold - fail-count on the local node.
>
> - The variable is not added for any action other than stop.

If the resource is a multistate one, the recovery will do a
demote->stop->start->promote. What if the failure occurs during the first
demote call and a new transition tries to demote first again? I suppose
this new variable should appear at least in the demote and stop actions to
cover such situations, shouldn't it?

Regards,
--
Jehan-Guillaume de Rorthais
Dalibo
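A stop action using the proposed variable might branch like this — a purely hypothetical sketch, since the variable name and its exact semantics were still being worked out at this point in the thread:

```shell
# Sketch: react differently depending on the proposed
# OCF_RESKEY_CRM_meta_recovery_left (migration-threshold - fail-count,
# set only when the stop is part of a recovery).
pgsql_stop() {
    case "${OCF_RESKEY_CRM_meta_recovery_left:-}" in
        "")
            echo "plain stop"                  # not part of a recovery
            ;;
        0)
            # no restart attempts left on this node: the resource is about
            # to move away, e.g. run "nova service-disable" here
            echo "final stop"
            ;;
        *)
            echo "stop before restart attempt" # the cluster will retry here
            ;;
    esac
}
```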
[ClusterLabs] notify action asynchronous ? (was: why and when a call of crm_attribute can be delayed ?)
Le Sun, 8 May 2016 16:35:25 +0200, Jehan-Guillaume de Rorthais a écrit : > Le Sat, 7 May 2016 00:27:04 +0200, > Jehan-Guillaume de Rorthais a écrit : > > > Le Wed, 4 May 2016 09:55:34 -0500, > > Ken Gaillot a écrit : > ... > > > There would be no point in the pre-promote notify waiting for the > > > attribute value to be retrievable, because the cluster isn't going to > > > wait for the pre-promote notify to finish before calling promote. > > > > Oh, this is surprising. I thought the pseudo action > > "*_confirmed-pre_notify_demote_0" in the transition graph was a wait for > > each resource clone return code before going on with the transition. The > > graph is confusing, if the cluster isn't going to wait for the pre-promote > > notify to finish before calling promote, I suppose some arrows should point > > directly from start (or post-start-notify?) action directly to the promote > > action then, isn't it? > > > > This is quite worrying as our RA rely a lot on notifications. As instance, > > we try to recover a PostgreSQL instance during pre-start or pre-demote if we > > detect a recover action... > > I'm coming back on this point. > > Looking at this documentation page: > http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-config-testing-changes.html > > I can read "Arrows indicate ordering dependencies". > > Looking at the transition graph I am studying (see attachment, a simple > master resource move), I still don't understand how the cluster isn't going to > wait for a pre-promote notify to finish before calling promote. > > So either I misunderstood your words or I miss something else important, which > is quite possible as I am fairly new to this word. Anyway, I try to make a > RA as robust as possible and any lights/docs are welcome! I tried to trigger this potential asynchronous behavior of the notify action, but couldn't observe it. 
I added a different sleep period in the notify action for each node of my
cluster:

* 10s for hanode1
* 15s for hanode2
* 20s for hanode3

The master was on hanode1 and the DC was hanode1. While moving the master
resource to hanode2, I can see in the log files that the DC always waits
for the rc of hanode3 before triggering the next action in the transition.

So, **in practice**, it seems the notify action is synchronous. In theory
now, I still wonder if I misunderstood your words...

Regards,
Re: [ClusterLabs] notify action asynchronous ?
Le Thu, 12 May 2016 11:11:15 -0500, Ken Gaillot a écrit : > On 05/12/2016 04:37 AM, Jehan-Guillaume de Rorthais wrote: > > Le Sun, 8 May 2016 16:35:25 +0200, > > Jehan-Guillaume de Rorthais a écrit : > > > >> Le Sat, 7 May 2016 00:27:04 +0200, > >> Jehan-Guillaume de Rorthais a écrit : > >> > >>> Le Wed, 4 May 2016 09:55:34 -0500, > >>> Ken Gaillot a écrit : > >> ... > >>>> There would be no point in the pre-promote notify waiting for the > >>>> attribute value to be retrievable, because the cluster isn't going to > >>>> wait for the pre-promote notify to finish before calling promote. > >>> > >>> Oh, this is surprising. I thought the pseudo action > >>> "*_confirmed-pre_notify_demote_0" in the transition graph was a wait for > >>> each resource clone return code before going on with the transition. The > >>> graph is confusing, if the cluster isn't going to wait for the pre-promote > >>> notify to finish before calling promote, I suppose some arrows should > >>> point directly from start (or post-start-notify?) action directly to the > >>> promote action then, isn't it? > >>> > >>> This is quite worrying as our RA rely a lot on notifications. As instance, > >>> we try to recover a PostgreSQL instance during pre-start or pre-demote if > >>> we detect a recover action... > >> > >> I'm coming back on this point. > >> > >> Looking at this documentation page: > >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-config-testing-changes.html > >> > >> I can read "Arrows indicate ordering dependencies". > >> > >> Looking at the transition graph I am studying (see attachment, a simple > >> master resource move), I still don't understand how the cluster isn't > >> going to wait for a pre-promote notify to finish before calling promote. > >> > >> So either I misunderstood your words or I miss something else important, > >> which is quite possible as I am fairly new to this word. 
> >> Anyway, I try to make a RA as robust as possible and any pointers/docs
> >> are welcome!
> >
> > I tried to trigger this potential asynchronous behavior of the notify
> > action, but couldn't observe it.
> >
> > I added a different sleep period in the notify action for each node of
> > my cluster:
> > * 10s for hanode1
> > * 15s for hanode2
> > * 20s for hanode3
> >
> > The master was on hanode1 and the DC was hanode1. While moving the
> > master resource to hanode2, I can see in the log files that the DC
> > always waits for the rc of hanode3 before triggering the next action
> > in the transition.
> >
> > So, **in practice**, it seems the notify action is synchronous. In
> > theory now, I still wonder if I misunderstood your words...
>
> I think you're right, and I was mistaken.

OK

> The asynchronicity most likely comes purely from crm_attribute not
> waiting for the value to be set and propagated to all nodes.

Yes, I'll deal with that in our RA. Thank you for this confirmation.

> I think I was confusing clone notifications with the new alerts feature,
> which is asynchronous. We named that "alerts" to try to avoid such
> confusion, but my brain hasn't gotten the memo yet ;)

Heh, OK, no problem.

Regards,
--
Jehan-Guillaume de Rorthais
Dalibo
Re: [ClusterLabs] FR: send failcount to OCF RA start/stop actions
Le Thu, 12 May 2016 11:24:33 -0500, Ken Gaillot a écrit : > On 05/09/2016 06:36 PM, Jehan-Guillaume de Rorthais wrote: > > Le Mon, 9 May 2016 17:40:19 -0500, > > Ken Gaillot a écrit : > > > >> On 05/04/2016 11:47 AM, Adam Spiers wrote: > >>> Ken Gaillot wrote: > >>>> On 05/04/2016 08:49 AM, Klaus Wenninger wrote: > >>>>> On 05/04/2016 02:09 PM, Adam Spiers wrote: > >>>>>> Hi all, > >>>>>> > >>>>>> As discussed with Ken and Andrew at the OpenStack summit last week, we > >>>>>> would like Pacemaker to be extended to export the current failcount as > >>>>>> an environment variable to OCF RA scripts when they are invoked with > >>>>>> 'start' or 'stop' actions. This would mean that if you have > >>>>>> start-failure-is-fatal=false and migration-threshold=3 (say), then you > >>>>>> would be able to implement a different behaviour for the third and > >>>>>> final 'stop' of a service executed on a node, which is different to > >>>>>> the previous 'stop' actions executed just prior to attempting a > >>>>>> restart of the service. (In the non-clone case, this would happen > >>>>>> just before migrating the service to another node.) > >>>>> So what you actually want to know is how much headroom > >>>>> there still is till the resource would be migrated. > >>>>> So wouldn't it then be much more catchy if we don't pass > >>>>> the failcount but rather the headroom? > >>>> > >>>> Yes, that's the plan: pass a new environment variable with > >>>> (migration-threshold - fail-count) when recovering a resource. I haven't > >>>> worked out the exact behavior yet, but that's the idea. I do hope to get > >>>> this in 1.1.15 since it's a small change. > >>>> > >>>> The advantage over using crm_failcount is that it will be limited to the > >>>> current recovery attempt, and it will calculate the headroom as you say, > >>>> rather than the raw failcount. > >>> > >>> Headroom sounds more usable, but if it's not significant extra work, > >>> why not pass both? 
> >>> It could come in handy, even if only for more informative logging
> >>> from the RA.
> >>>
> >>> Thanks a lot!
> >>
> >> Here is what I'm testing currently:
> >>
> >> - When the cluster recovers a resource, the resource agent's stop
> >>   action will get a new variable, OCF_RESKEY_CRM_meta_recovery_left =
> >>   migration-threshold - fail-count on the local node.
> >>
> >> - The variable is not added for any action other than stop.
> >
> > If the resource is a multistate one, the recovery will do a
> > demote->stop->start->promote. What if the failure occurs during the
> > first demote call and a new transition tries to demote first again? I
> > suppose this new variable should appear at least in the demote and
> > stop actions to cover such situations, shouldn't it?
>
> Good question. I can easily imagine a "lightweight stop", but I can't
> think of a practical use for a "lightweight demote". If someone has a
> scenario where that would be useful, I can look at adding it.

PostgreSQL does not support the demote action. To "demote" a PostgreSQL
master instance, we must stop it, then start it as a slave. But my point
was mostly that before doing a stop, we must first do a demote. I think
this future variable should be available during all the actions involved
in the recovery process.

Regards,
--
Jehan-Guillaume de Rorthais
Dalibo
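The stop-then-start-as-slave demotion described above could be sketched like this; `pg_stop`, `enable_standby` and `pg_start` are hypothetical helpers (wrapping e.g. `pg_ctl` and the standby configuration), not PAF's actual functions:

```shell
# Sketch: "demote" for an engine with no online demotion — stop the
# master, reconfigure it as a standby, then start it again. Each helper
# is assumed to return non-zero on failure.
pgsql_demote() {
    pg_stop        || return 1
    enable_standby || return 1   # e.g. drop in a standby/recovery config
    pg_start       || return 1
}
```

This is why a failed "demote" may actually be a failed stop or a failed start, and why the recovery-related variable would be useful in all three actions.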
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Thu, 19 May 2016 10:53:31 -0500, Ken Gaillot wrote:

> A recent thread discussed a proposed new feature, a new environment
> variable that would be passed to resource agents, indicating whether a
> stop action was part of a recovery.
>
> Since that thread was long and covered a lot of topics, I'm starting a
> new one to focus on the core issue remaining:
>
> The original idea was to pass the number of restarts remaining before
> the resource will no longer be started on the same node. This involves
> calculating (fail-count - migration-threshold), and that implies certain
> limitations: (1) it will only be set when the cluster checks
> migration-threshold; (2) it will only be set for the failed resource
> itself, not for other resources that may be recovered due to
> dependencies on it.
>
> Ulrich Windl proposed an alternative: setting a boolean value instead. I
> forgot to cc the list on my reply, so I'll summarize now: We would set a
> new variable like OCF_RESKEY_CRM_recovery=true whenever a start is
> scheduled after a stop on the same node in the same transition. This
> would avoid the corner cases of the previous approach; instead of being
> tied to migration-threshold, it would be set whenever a recovery was
> being attempted, for any reason. And with this approach, it should be
> easier to set the variable for all actions on the resource
> (demote/stop/start/promote), rather than just the stop.

I can see the value of having such a variable during various actions.
However, we can also deduce that the transition is a recovery during the
notify actions, using the notify variables (the only information we lack
is the order of the actions). A more flexible approach would be to make
sure the notify variables are always available during the whole transition
for **all** actions, not just notify. It seems like that is already the
case, but a recent discussion emphasized that this is just a side effect
of the current implementation. I understand this as: they were sometimes
available outside of notifications "by accident".

Also, I can see the benefit of having the remaining attempts for the
current action before hitting the migration-threshold. I might
misunderstand something here, but it seems to me these are two different
pieces of information.

Basically, what we need is a better understanding of the transition itself
from the RA actions. If you are still brainstorming on this, as an RA dev,
what I would suggest is:

* provide and enforce the notify variables in all actions
* add the order of the actions during the current transition to these
  variables, using e.g. OCF_RESKEY_CRM_meta_notify_*_actionid
* add a new variable with the remaining action attempts before migration.
  This one has the advantage of surviving the transition breakage when a
  failure occurs.

As a second step, we would be able to provide some helper functions in
ocf-shellfuncs (and in my Perl module equivalent) to compute whether the
transition is a switchover, a failover, a recovery, etc., based on the
notify variables. Presently, I am detecting such scenarios directly in my
RA during the notify actions, and tracking them as private attributes to
be aware of the situation during the real actions (demote and stop). See:

https://github.com/dalibo/PAF/blob/952cb3cf2f03aad18fbeafe3a91f997a56c3b606/script/pgsqlms#L95

Regards,
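Deducing a recovery from the notify variables, as described above, could look like this simplified sketch (the real detection in pgsqlms is more involved; the `*_uname` variables are the space-separated node lists Pacemaker passes to clone notify actions):

```shell
# Sketch: a recovery shows up as the local node being present in both
# the stop and start node lists of the same transition.
is_recover() {
    node=$1
    # node must appear in the list of nodes where an instance stops...
    case " ${OCF_RESKEY_CRM_meta_notify_stop_uname:-} " in
        *" $node "*) : ;;
        *) return 1 ;;
    esac
    # ...and in the list of nodes where an instance starts again
    case " ${OCF_RESKEY_CRM_meta_notify_start_uname:-} " in
        *" $node "*) return 0 ;;
    esac
    return 1
}
```

As noted in the text, this tells you nothing about the *order* of the actions, which is exactly the missing piece.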
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Thu, 19 May 2016 13:15:20 -0500, Ken Gaillot wrote:

> On 05/19/2016 11:43 AM, Jehan-Guillaume de Rorthais wrote:
>> Le Thu, 19 May 2016 10:53:31 -0500, Ken Gaillot a écrit :
>>
>>> [...]
>>
>> I can see the value of having such variable during various actions.
>> However, we can also deduce the transition is a recovering during the
>> notify actions with the notify variables (the only information we lack is
>> the order of the actions).
>> A most flexible approach would be to make sure
>> the notify variables are always available during the whole transaction for
>> **all** actions, not just notify. It seems like it's already the case, but
>> a recent discussion emphase this is just a side effect of the current
>> implementation. I understand this as they were sometime available outside
>> of notification "by accident".
>
> It does seem that a recovery could be implied from the
> notify_{start,stop}_uname variables, but notify variables are only set
> for clones that support the notify action. I think the goal here is to
> work with any resource type. Even for clones, if they don't otherwise
> need notifications, they'd have to add the overhead of notify calls on
> all instances, that would do nothing.

Exactly: notify variables are only available for clones at present. What I was suggesting is that the notify variables be always available, whether the resource is a clone, a multi-state or a standard one. And I wasn't saying the notify *action* should be activated all the time for all resources: the notify switch for clone/ms resources could be kept false by default, so the notify action itself is still not called during transitions.

> > Also, I can see the benefit of having the remaining attempt for the current
> > action before hitting the migration-threshold. I might misunderstand
> > something here, but it seems to me both informations are different.
>
> I think the use cases that have been mentioned would all be happy with
> just the boolean. Does anyone need the actual count, or just whether
> this is a stop-start vs a full stop?

I was thinking of a use case where a graceful demote or stop action fails multiple times, and we give the RA a chance to choose another method to stop the resource before it requires a migration. For instance, PostgreSQL has three different kinds of stop, the last one not being graceful, but still better than a kill -9.
> The problem with the migration-threshold approach is that there are
> recoveries that will be missed because they don't involve
> migration-threshold. If the count is really needed, the
> migration-threshold approach is necessary, but if recovery is the really
> interesting information, then a boolean would be more accurate.

I think I misunderstood the original use case you are trying to address. It seems to me we are talking about two different features.

>> Basically, what we need is a better understanding of the transition itself
>> from the RA actions.
>>
>> If you are still brainstorming on this, as a RA dev, what I would
>> suggest is:
>>
>> * provide and enforce the notify variables in all actions
>> * add the action
Re: [ClusterLabs] Antw: Re: Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Fri, 20 May 2016 08:39:42 +0200, "Ulrich Windl" wrote:

> >>> Jehan-Guillaume de Rorthais schrieb am 19.05.2016 um 21:29 in
> Nachricht <20160519212947.6cc0fd7b@firost>:
> [...]
> > I was thinking of a use case where a graceful demote or stop action failed
> > multiple times and to give a chance to the RA to choose another method to
> > stop the resource before it requires a migration. As instance, PostgreSQL
> > has 3 different kind of stop, the last one being not graceful, but still
> > better than a kill -9.
>
> For example the Xen RA tries a clean shutdown with a timeout of about 2/3 of
> the timeout; if it fails it shuts the VM down the hard way.

Reading the Xen RA, I see they added a shutdown timeout escalation parameter. This is a reasonable solution, but isn't it possible to get the action timeout directly? I looked for such information in the past with no success.

> I don't know Postgres in detail, but I could imagine a three step approach:
> 1) Shutdown after current operations have finished
> 2) Shutdown regardless of pending operations (doing rollbacks)
> 3) Shutdown the hard way, requiring recovery on the next start (I think in
> Oracle this is called a "shutdown abort")

Exactly.

> Depending on the scenario one may start at step 2)

Indeed.

> [...]
> I think RAs should not rely on "stop" being called multiple times for a
> resource to be stopped.

OK, so the RA should take care of its own escalation within a single action.

Thanks,
Re: [ClusterLabs] Antw: Re: Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Fri, 20 May 2016 11:12:28 +0200, "Ulrich Windl" wrote:

> >>> Jehan-Guillaume de Rorthais schrieb am 20.05.2016 um 09:59 in
> Nachricht <20160520095934.029c1822@firost>:
> [...]
> > Reading the Xen RA, I see they added a shutdown timeout escalation
> > parameter.
>
> Not quite:
>
> if [ -n "$OCF_RESKEY_shutdown_timeout" ]; then
>   timeout=$OCF_RESKEY_shutdown_timeout
> elif [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
>   # Allow 2/3 of the action timeout for the orderly shutdown
>   # (The origin unit is ms, hence the conversion)
>   timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
> else
>   timeout=60
> fi
>
> > This is a reasonable solution, but isn't it possible to get the action
> > timeout directly? I looked for such information in the past with no
> > success.
>
> See above.

Gosh, this is embarrassing... how could we miss that? Thank you for pointing this out!
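Putting the two threads together — an escalation handled inside a single stop action, budgeted from OCF_RESKEY_CRM_meta_timeout as in the Xen RA snippet quoted above — could look roughly like this. This is a sketch, not the actual pgsql or pgsqlms implementation: try_stop is a placeholder, and the real commands (e.g. pg_ctl with its smart/fast/immediate modes) would go where the echo is.

```shell
#!/bin/bash
# Sketch of an escalating stop within a single "stop" action.
# OCF_RESKEY_CRM_meta_timeout is in milliseconds; like the Xen RA, keep
# only 2/3 of the action timeout for the graceful attempts (/1500 converts
# ms to seconds AND takes the 2/3 share in one step). 90000 ms is just a
# fallback for illustration.
timeout=$(( ${OCF_RESKEY_CRM_meta_timeout:-90000} / 1500 ))

try_stop() {
    # $1: stop method label, $2: per-attempt budget in seconds.
    # Placeholder: a real RA would run the stop command under this budget
    # and return non-zero when it did not terminate the resource in time.
    echo "trying '$1' stop (up to ${2}s)"
}

# Most graceful method first; stop escalating as soon as one succeeds.
for method in smart fast immediate; do
    try_stop "$method" "$(( timeout / 3 ))" && break
done
```

With the default 90 s action timeout this leaves 60 s for graceful attempts, 20 s per method, before the cluster itself escalates to fencing.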
Re: [ClusterLabs] Pacemaker not invoking monitor after $interval
On Fri, 20 May 2016 11:33:39, "Felix Zachlod (Lists)" wrote:

> Hello!
>
> I am currently working on a cluster setup which includes several resources
> with "monitor interval=XXs" set. As far as I understand this should run the
> monitor action on the resource agent every XX seconds. But it seems it
> doesn't.

How do you know it doesn't? Are you looking at crm_mon or at the log files? If you are looking at crm_mon, the output is not updated unless some changes are applied to the CIB or a transition is in progress.

> Actually monitor is only invoked in special condition, e.g. cleanup,
> start and so on, but never for a running (or stopped) resource. So it won't
> detect any resource failures, unless a manual action takes place. It won't
> update master preference either when set in the monitor action.
>
> Are there any special conditions under which the monitor will not be
> executed?

Could you provide us with your Pacemaker setup?

> (Cluster IS managed though)

Resources can be unmanaged individually as well.

Regards,

--
Jehan-Guillaume de Rorthais
Dalibo
Re: [ClusterLabs] crm_attribute bug in 1.1.15-rc1
On Fri, 20 May 2016 15:31:16 +0300, Andrey Rogovsky wrote:

> Hi!
> I can't get an attribute value:
> /usr/sbin/crm_attribute -q --type nodes --node-uname $HOSTNAME --attr-name
> master-pgsqld --get-value
> Error performing operation: No such device or address
>
> This value is present:
> crm_mon -A1 | grep master-pgsqld
> + master-pgsqld: 1001
> + master-pgsqld: 1000
> + master-pgsqld: 1

Use crm_master to get master scores easily.

> I use 1.1.15-rc1
> dpkg -l | grep pacemaker-cli-utils
> ii pacemaker-cli-utils 1.1.15-rc1 amd64
>    Command line interface utilities for Pacemaker
>
> Also non-integer values work fine:
> /usr/sbin/crm_attribute -q --type nodes --node-uname $HOSTNAME --attr-name
> pgsql-data-status --get-value
> STREAMING|ASYNC

I'm very confused. It sounds like you are mixing two different resource agents for PostgreSQL. I recognize the master scores set for your master resource by the pgsqlms RA (PAF project), and the data-status attribute from the pgsql RA...

> I think this patch
> https://github.com/ClusterLabs/pacemaker/commit/26d34a9171bddae67c56ebd8c2513ea8fa770204?diff=unified#diff-55bc49a57c12093902e3842ce349a71fR269
> is not applied in 1.1.15-rc1?
>
> How can I get an integer value from a node attribute?

With the correct name for the given attribute.

Regards,

--
Jehan-Guillaume de Rorthais
Dalibo
Re: [ClusterLabs] crm_attribute bug in 1.1.15-rc1
On Mon, 23 May 2016 09:28:41 +0300, Andrey Rogovsky wrote:

> I tried crm_master, but it does not work:
> # LC_ALL=C /usr/sbin/crm_master -q -t nodes --node-uname $HOSTNAME
> --attr-name master-pgsqld --get-value
> crm_master: invalid option -- 't'
> crm_master: unrecognized option '--node-uname'
> crm_master: unrecognized option '--attr-name'
> crm_master - A convenience wrapper for crm_attribute

I tried this with success:

  crm_master -r pgsqld -N hanode1 -Q

The name you have to give is that of the resource being cloned, not the master resource's name.

> 2016-05-20 16:40 GMT+03:00 Jehan-Guillaume de Rorthais:
> [...]
--
Jehan-Guillaume de Rorthais
Dalibo
Re: [ClusterLabs] crm_attribute bug in 1.1.15-rc1
On Mon, 23 May 2016 11:36:37 +0300, Andrey Rogovsky wrote:

> Hi
> This is not working for me:
> # crm_master -r pgsqld -N $HOSTNAME $HOSTNAME -Q
> Error performing operation: No such device or address

This should be:

  crm_master -r pgsqld -N $HOSTNAME -Q

(assuming your resource name is "pgsqld")

> 2016-05-23 11:19 GMT+03:00 Jehan-Guillaume de Rorthais:
> [...]
--
Jehan-Guillaume de Rorthais
Dalibo
Re: [ClusterLabs] crm_attribute bug in 1.1.15-rc1
On Mon, 23 May 2016 12:31:37 +0300, Andrey Rogovsky wrote:

> This is not working:
> # crm_master -r master-pgsqld -N $HOSTNAME -Q
> Error performing operation: No such device or address

As I wrote: you must use the name of the resource that is cloned by your master resource.

Could you show us your configuration, please?

> 2016-05-23 11:46 GMT+03:00 Jehan-Guillaume de Rorthais:
> [...]
--
Jehan-Guillaume de Rorthais
Dalibo
Re: [ClusterLabs] crm_attribute bug in 1.1.15-rc1
OK, you were trying with the attribute name. I wrote that you have to use the **resource** name. Your command should be (again):

  crm_master -r pgsqld -N $HOSTNAME -Q

Or simply this, if you want to check the score on the local node:

  crm_master -r pgsqld -Q

Moreover, you should really consider doing some cleanup in your attributes: "pgsql-data-status" and "maintenance" definitely do not come from the PAF project.

On Mon, 23 May 2016 12:44:29 +0300, Andrey Rogovsky wrote:

> Stack: corosync
> Current DC: b (version 1.1.12-cdf310a) - partition with quorum
> Last updated: Mon May 23 12:43:57 2016
> Last change: Wed May 4 12:15:06 2016 via crm_attribute on c
>
> 3 nodes and 7 resources configured
>
> Online: [ a b c ]
>
> Resource Group: master
>     pgsql-master-ip (ocf::heartbeat:IPaddr2): Started a
> Master/Slave Set: msPostgresql [pgsqld]
>     Masters: [ a ]
>     Slaves: [ b c ]
> Clone Set: WebFarm [apache]
>     Started: [ a b c ]
>
> Node Attributes:
> * Node a:
>     + maintenance       : off
>     + master-pgsqld     : 1001
>     + pgsql-data-status : STREAMING|ASYNC
> * Node b:
>     + maintenance       : off
>     + master-pgsqld     : 1000
>     + pgsql-data-status : LATEST
> * Node c:
>     + maintenance       : off
>     + master-pgsqld     : 1
>     + pgsql-data-status : STREAMING|ASYNC
>
> 2016-05-23 12:35 GMT+03:00 Jehan-Guillaume de Rorthais:
> [...]
Re: [ClusterLabs] Antw: Using pacemaker for manual failover only?
On Tue, 24 May 2016 07:49:16 +0200, "Ulrich Windl" wrote:

> >>> "Stephano-Shachter, Dylan" schrieb am 23.05.2016 um 21:03 in Nachricht :
> [...]
> > I would like for the cluster to do nothing when a node fails unexpectedly.
> [...]
> So this means you only want the cluster to do something, if the node fails as
> part of a planned maintenance? Then you need no cluster at all! (MHO)

I can see the use case for this. I have already faced situations where customers wanted a one-step procedure to fail over manually to the other side. I call this the big-red-button failover.

Writing your own custom shell script to fail over a resource is actually really complex. There are so many ways it can fail, and only one way to do it properly. And of course, no two architectures are the same, so we quickly end up with custom scripts everywhere. And we are not even speaking of fencing yet...

Being able to set up and test a cluster that handles all the machinery needed to move your resources correctly is much more comfortable and safe.
Re: [ClusterLabs] Using pacemaker for manual failover only?
On Tue, 24 May 2016 01:53:22 -0400, Digimer wrote:

> On 23/05/16 03:03 PM, Stephano-Shachter, Dylan wrote:
> > Hello,
> >
> > I am using pacemaker 1.1.14 with pcs 0.9.149. I have successfully
> > configured pacemaker for highly available nfs with drbd. Pacemaker
> > allows me to easily failover without interrupting nfs connections. I,
> > however, am only interested in failing over manually (currently I use
> > "pcs resource move --master"). I would like for
> > the cluster to do nothing when a node fails unexpectedly.
> >
> > Right now the solution I am going with is to run
> > "pcs property set is-managed-default=no"
> > until I need to failover, at which point I set is-managed-default=yes,
> > then failover, then set it back to no.
> >
> > While this method works for me, it can be unpredictable if people run
> > move commands at the wrong time.
> >
> > Is there a way to disable automatic failover permanently while still
> > allowing manual failover (with "pcs resource move" or with something
> > else)?

Try setting up your cluster without the "interval" parameter on the monitor action. The resource will be probed around the target action (start/promote, I suppose), but then it should not get monitored anymore.
Re: [ClusterLabs] crm_attribute bug in 1.1.15-rc1
On Mon, 23 May 2016 19:21:23 +0300, Andrey Rogovsky wrote:

> Hi
> Any idea why it does not work on my cluster?

OK, I think I understood the problem. By default, crm_master uses "forever" as the attribute lifetime, so my earlier commands were incomplete for reading the live master score set by the RA itself. Try the following command:

  crm_master -l reboot -r pgsqld -Q

or

  crm_master -l reboot -r pgsqld -N $NODENAME -Q

> 2016-05-23 19:00 GMT+03:00 Jehan-Guillaume de Rorthais:
>
> > Le Mon, 23 May 2016 15:42:55 +0300, Andrey Rogovsky a écrit :
> >
> > > Hi
> > > Your commands do not work:
> > > root@c:~# crm_master -r pgsqld -N $HOSTNAME -Q
> > > Error performing operation: No such device or address
> > > root@c:~# crm_master -r pgsqld -Q
> > > Error performing operation: No such device or address
> >
> > It works here:
> >
> > root@srv1:~$ crm_master -r pgsqld -Q
> > 1
> > root@srv1:~$ crm_master -r pgsqld -N srv2 -Q
> > 1001
> >
> > [...]
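To sum this thread up: the working and failing invocations differ only in the lifetime flag. A small sketch, only assembling the command lines as strings to make the difference visible (build_query is a hypothetical helper and srv1 a placeholder node name, not part of the original exchange):

```shell
#!/bin/sh
# Build the crm_master query for a master score. The score is written by
# the RA as a transient attribute, so the default lifetime ("forever")
# misses it: the "-l reboot" flag is what fixes the queries above.
build_query() {
    res="$1"         # name of the cloned resource (NOT the master resource)
    node="${2:-}"    # optional node name; defaults to the local node
    if [ -n "$node" ]; then
        echo "crm_master -l reboot -r $res -N $node -Q"
    else
        echo "crm_master -l reboot -r $res -Q"
    fi
}

build_query pgsqld        # query the local node's score
build_query pgsqld srv1   # query an explicit node's score
```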
Re: [ClusterLabs] Minimum configuration for dynamically adding a node to a cluster
On 8 June 2016 13:36:03 GMT+02:00, Nikhil Utane wrote:

> Hi,
>
> Would like to know the best and easiest way to add a new node to an already
> running cluster.
>
> Our limitation:
> 1) pcsd cannot be used since (as per my understanding) it communicates over
> ssh which is prevented.

As far as I remember, the pcsd daemons use their own TCP port (not the SSH one) and communicate with each other using HTTP queries (over SSL, I suppose). As far as I understand, crmsh uses SSH, not pcsd.
Re: [ClusterLabs] Recovering after split-brain
On Mon, 20 Jun 2016 19:00:12 +0530, Nikhil Utane wrote:

> Hi,
>
> For our solution we are making a conscious choice to not use quorum/fencing
> as for us service availability is more important than having 2 nodes take
> up the same active role. Split-brain is not an issue for us (at least I
> think that way) since we have a second line of defense. We have clients who
> can connect to only one of the two active nodes. So in that sense, even if
> we end up with 2 nodes becoming active, since the clients can connect to
> only 1 of the active nodes, we should not have any issue.

I've heard this kind of argument many times in the field, but sooner or later these clusters actually hit a split-brain scenario with clients connected on both sides, some very bad corruption, data loss, etc.

Never underestimate the chaos: it will always find a way to surprise you. If there is a breach somewhere, sooner or later everything will blow up.

Regards,

--
Jehan-Guillaume de Rorthais
Dalibo
Re: [ClusterLabs] ocf scripts shell and local variables
On Mon, 29 Aug 2016 10:02:28 -0500 Ken Gaillot wrote: > On 08/29/2016 09:43 AM, Dejan Muhamedagic wrote: ... >> I doubt that we could do a moderately complex shell scripts >> without capability of limiting the variables' scope and retaining >> sanity at the same time. > > This prefixing approach would definitely be ugly, and it would violate > best practices on shells that do support local, but it should be feasible. > > I'd argue that anything moderately complex should be converted to python > (maybe after we have OCF 2.0, and some good python bindings ...). For what it's worth, I already raised this discussion some months ago, as we wrote some Perl modules equivalent to ocf-shellfuncs, ocf-returncodes and ocf-directories. See: Subject: [ClusterLabs Developers] Perl Modules for resource agents (was: Resource Agent language discussion) Date: Thu, 26 Nov 2015 01:13:36 +0100 I don't want to start a flamewar about languages here; this is not about that. Maybe it would be a good time to include libraries for various languages in the official source? At least for ocf-directories, which is quite simple but often tied to the configure options of each distro. We had to build an ugly wrapper around the ocf-directories library at build time to produce our OCF_Directories.pm module on various distros. Regards,
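The prefixing workaround Ken mentions can be sketched in plain POSIX shell. This is only an illustration (the function and variable names are made up, not taken from any real resource agent): every would-be local variable carries its function's name as a prefix, so functions cannot clobber each other's state even though all variables are technically global.

```shell
#!/bin/sh
# Emulating "local" by prefixing: "local" is not in POSIX, so a strictly
# portable script cannot rely on it.  With a function-name prefix, each
# function's variables stay out of the others' way.

foo_check() {
    foo_check_msg="foo ok"
    echo "$foo_check_msg"
}

bar_check() {
    bar_check_msg="bar ok"
    echo "$bar_check_msg"
    foo_check
    # foo_check did not touch bar_check_msg:
    echo "$bar_check_msg"
}

bar_check
```

The cost, as noted above, is ugliness and discipline: nothing enforces the prefix, so one forgotten prefix silently reintroduces the very collision the convention tries to prevent.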
Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
On Thu, 8 Sep 2016 15:55:50 +0900 Digimer wrote: > On 08/09/16 03:47 PM, Ulrich Windl wrote: > Shermal Fernando schrieb am 08.09.2016 um > 06:41 in > > Nachricht > > <8ce6e8d87f896546b9c65ed80d30a4336578c...@lg-spmb-mbx02.lseg.stockex.local>: > >> The whole cluster will fail if the DC (crm daemon) is frozen due to CPU > >> starvation or hanging while trying to perform a IO operation. > >> Please share some thoughts on this issue. > > > > What is "the whole cluster will fail"? If the DC times out, some recovery > > will take place. > > Yup. The starved node should be declared lost by corosync, the remaining > nodes reform and if they're still quorate, the hung node should be > fenced. Recovery occur and life goes on. +1 And fencing might either come from outside, or just from the server itself using watchdog. -- Jehan-Guillaume (ioguix) de Rorthais Dalibo ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
On Thu, 8 Sep 2016 08:58:15 + Shermal Fernando wrote: > Hi Jehan-Guillaume, > > Does this means watchdog will serf-terminate the machine when the crm daemon > is frozen? This means that if the machine is under such a load that Pacemaker is not able to feed the watchdog, the watchdog will fence the machine itself. > -Original Message- > From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com] > Sent: Thursday, September 08, 2016 12:52 PM > To: Digimer > Cc: Cluster Labs - All topics related to open-source clustering welcomed > Subject: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster > decisions are delayed infinitely > > On Thu, 8 Sep 2016 15:55:50 +0900 > Digimer wrote: > > > On 08/09/16 03:47 PM, Ulrich Windl wrote: > > >>>> Shermal Fernando schrieb am > > >>>> 08.09.2016 um > > >>>> 06:41 in > > > Nachricht > > > <8ce6e8d87f896546b9c65ed80d30a4336578c...@lg-spmb-mbx02.lseg.stockex.local>: > > >> The whole cluster will fail if the DC (crm daemon) is frozen due to > > >> CPU starvation or hanging while trying to perform a IO operation. > > >> Please share some thoughts on this issue. > > > > > > What is "the whole cluster will fail"? If the DC times out, some > > > recovery will take place. > > > > Yup. The starved node should be declared lost by corosync, the > > remaining nodes reform and if they're still quorate, the hung node > > should be fenced. Recovery occur and life goes on. > > +1 > > And fencing might either come from outside, or just from the server itself > using watchdog.
Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
On Thu, 8 Sep 2016 09:51:27 + Shermal Fernando wrote: > Hi Jehan-Guillaume, > > Sorry for disturbing you. This is really important for us to pass this test > on the pacemaker resiliency and robustness. To my understanding, it's the > pacemakerd who feeds the watchdog. If only the crmd is hung, fencing will not > work. Am I correct here? I guess yes. I am talking of a scenario where the server is under a high load (fork bomb, swap storm, ...), not only crmd being hung for some reasons. > -Original Message----- > From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com] > Sent: Thursday, September 08, 2016 3:12 PM > To: Shermal Fernando > Cc: Cluster Labs - All topics related to open-source clustering welcomed > Subject: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster > decisions are delayed infinitely > > On Thu, 8 Sep 2016 08:58:15 + > Shermal Fernando wrote: > > > Hi Jehan-Guillaume, > > > > Does this means watchdog will serf-terminate the machine when the crm > > daemon is frozen? > > This means that if the machine is under such a load that PAcemaker is not > able to feed the watchdog, the watchdog will fence the machine itself. > > > -Original Message- > > From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com] > > Sent: Thursday, September 08, 2016 12:52 PM > > To: Digimer > > Cc: Cluster Labs - All topics related to open-source clustering > > welcomed > > Subject: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, > > cluster decisions are delayed infinitely > > > > On Thu, 8 Sep 2016 15:55:50 +0900 > > Digimer wrote: > > > > > On 08/09/16 03:47 PM, Ulrich Windl wrote: > > > >>>> Shermal Fernando schrieb am > > > >>>> 08.09.2016 um > > > >>>> 06:41 in > > > > Nachricht > > > > <8ce6e8d87f896546b9c65ed80d30a4336578c...@lg-spmb-mbx02.lseg.stockex.local>: > > > >> The whole cluster will fail if the DC (crm daemon) is frozen due > > > >> to CPU starvation or hanging while trying to perform a IO operation. 
> > > >> Please share some thoughts on this issue. > > > > > > > > What is "the whole cluster will fail"? If the DC times out, some > > > > recovery will take place. > > > > > > Yup. The starved node should be declared lost by corosync, the > > > remaining nodes reform and if they're still quorate, the hung node > > > should be fenced. Recovery occur and life goes on. > > > > +1 > > > > And fencing might either come from outside, or just from the server > > itself using watchdog. ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Replicated PGSQL woes
On Thu, 13 Oct 2016 10:05:33 -0800 Israel Brewster wrote: > On Oct 13, 2016, at 9:41 AM, Ken Gaillot wrote: > > > > On 10/13/2016 12:04 PM, Israel Brewster wrote: [...] > >> But whatever- this is a cluster, it doesn't really matter which node > >> things are running on, as long as they are running. So the cluster is > >> working - postgresql starts, the master process is on the same node as > >> the IP, you can connect, etc, everything looks good. Obviously the next > >> thing to try is failover - should the master node fail, the slave node > >> should be promoted to master. So I try testing this by shutting down the > >> cluster on the primary server: "pcs cluster stop" > >> ...and nothing happens. The master shuts down (uncleanly, I might add - > >> it leaves behind a lock file that prevents it from starting again until > >> I manually remove said lock file), but the slave is never promoted to > > > > This definitely needs to be corrected. What creates the lock file, and > > how is that entity managed? > > The lock file entity is created/managed by the postgresql process itself. On > launch, postgres creates the lock file to say it is running, and deletes said > lock file when it shuts down. To my understanding, its role in life is to > prevent a restart after an unclean shutdown so the admin is reminded to make > sure that the data is in a consistent state before starting the server again. What is the name of this lock file? Where is it? PostgreSQL does not create a lock file. It creates a "postmaster.pid" file, but it does not forbid a startup if the new process doesn't find another process with the PID and shm shown in the postmaster.pid. As far as I know, the pgsql resource agent creates such a lock file on promote and deletes it on graceful stop. If the PostgreSQL instance couldn't be stopped correctly, the lock file stays and the RA refuses to start it the next time. [...] > >> What can I do to fix this? What troubleshooting steps can I follow? Thanks.
I cannot find the result of the stop operation in your log files; maybe the log from CentTest2 would be more useful. But I can find this: Oct 13 08:29:41 CentTest1 pengine[30095]: notice: Scheduling Node centtest2.ravnalaska.net for shutdown ... Oct 13 08:29:41 CentTest1 pengine[30095]: notice: Scheduling Node centtest2.ravnalaska.net for shutdown Which means the stop operation probably raised an error, leading to a fencing of the node. In this circumstance, I bet PostgreSQL wasn't able to stop correctly and the lock file stayed in place. Could you please show us your full cluster setup? By the way, did you have a look at the PAF project? http://dalibo.github.io/PAF/ http://dalibo.github.io/PAF/documentation.html The v1.1 version for EL6 is not ready yet, but you might want to give it a try: https://github.com/dalibo/PAF/tree/v1.1 I would recommend EL7 and PAF 2.0: published, packaged, ready to use. Regards, -- Jehan-Guillaume (ioguix) de Rorthais Dalibo
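The lock-file behaviour described in this exchange can be sketched as follows. This is a simplified illustration of the principle only, with made-up function names and a temp-dir lock path; the real pgsql resource agent keeps its lock under its own tmp directory and returns proper OCF error codes rather than plain 0/1.

```shell
#!/bin/sh
# Sketch: the RA lays down a lock file on promote and removes it only on
# a graceful stop.  If a start finds the lock still present, a previous
# stop must have been unclean, so the start is refused until an admin
# checks the data and removes the lock by hand.

PGSQL_LOCK="${TMPDIR:-/tmp}/PGSQL.lock.$$"

pgsql_promote() {
    touch "$PGSQL_LOCK"          # lock created on promote...
}

pgsql_stop_graceful() {
    rm -f "$PGSQL_LOCK"          # ...removed only on a clean stop
}

pgsql_start() {
    if [ -f "$PGSQL_LOCK" ]; then
        echo "lock file present: previous stop was unclean, refusing to start"
        return 1                 # would be OCF_ERR_GENERIC in a real agent
    fi
    echo "starting"
    return 0
}

# Demo: promote, then "crash" (no graceful stop), then try to start.
pgsql_promote
pgsql_start || echo "start refused"
pgsql_stop_graceful              # the admin's manual cleanup
pgsql_start
```

This also shows why a fenced node (no clean stop) comes back with the lock still in place, exactly as observed in the logs above.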
Re: [ClusterLabs] Replicated PGSQL woes
On Thu, 13 Oct 2016 14:11:06 -0800 Israel Brewster wrote: > On Oct 13, 2016, at 1:56 PM, Jehan-Guillaume de Rorthais > wrote: > > > > On Thu, 13 Oct 2016 10:05:33 -0800 > > Israel Brewster wrote: > > > >> On Oct 13, 2016, at 9:41 AM, Ken Gaillot wrote: > >>> > >>> On 10/13/2016 12:04 PM, Israel Brewster wrote: > > [...] > > > >>>> But whatever- this is a cluster, it doesn't really matter which node > >>>> things are running on, as long as they are running. So the cluster is > >>>> working - postgresql starts, the master process is on the same node as > >>>> the IP, you can connect, etc, everything looks good. Obviously the next > >>>> thing to try is failover - should the master node fail, the slave node > >>>> should be promoted to master. So I try testing this by shutting down the > >>>> cluster on the primary server: "pcs cluster stop" > >>>> ...and nothing happens. The master shuts down (uncleanly, I might add - > >>>> it leaves behind a lock file that prevents it from starting again until > >>>> I manually remove said lock file), but the slave is never promoted to > >>> > >>> This definitely needs to be corrected. What creates the lock file, and > >>> how is that entity managed? > >> > >> The lock file entity is created/managed by the postgresql process itself. > >> On launch, postgres creates the lock file to say it is running, and > >> deletes said lock file when it shuts down. To my understanding, its role > >> in life is to prevent a restart after an unclean shutdown so the admin is > >> reminded to make sure that the data is in a consistent state before > >> starting the server again. > > > > What is the name of this lock file? Where is it? > > > > PostgreSQL does not create lock file. It creates a "postmaster.pid" file, > > but it does not forbid a startup if the new process doesn't find another > > process with the pid and shm shown in the postmaster.pid. 
> > > > As far as I know, the pgsql resource agent create such a lock file on > > promote and delete it on graceful stop. If the PostgreSQL instance couldn't > > be stopped correctly, the lock files stays and the RA refuse to start it > > the next time. > > Ah, you're right. Looking auth the RA I see where it creates the file in > question. The delete appears to be in the pgsql_real_stop() function (which > makes sense), wrapped in an if block that checks for $1 being master and > $OCF_RESKEY_CRM_meta_notify_slave_uname being a space. Throwing a little > debugging code in there I see that when it hits that block on a cluster stop, > $OCF_RESKEY_CRM_meta_notify_slave_uname is centtest1.ravnalaska.net > <http://centtest1.ravnalaska.net/>, not a space, so the lock file is not > removed: > > if [ "$1" = "master" -a "$OCF_RESKEY_CRM_meta_notify_slave_uname" = " " > ]; then ocf_log info "Removing $PGSQL_LOCK." > rm -f $PGSQL_LOCK > fi > > It doesn't look like there is anywhere else where the file would be removed. This is quite wrong to me for two reasons (I'll try to be clear): 1) the resource agent (RA) makes sure the timeline (TL) will not be incremented during promotion. As there is no documentation about that, I'm pretty sure this contortion comes from limitations in very old versions of PostgreSQL (<= 9.1): * a slave wasn't able to cross a timeline (TL) from streaming replication, only from WAL archives. That means crossing a TL required restarting the slave, or cutting the streaming replication temporarily to force it to get back to the archives * moreover, it was possible for a standby to miss some transactions after a clean master shutdown. That means the old master couldn't get back to the cluster as a slave safely, as the TL is still the same...
See slide 35->37: http://www.slideshare.net/takmatsuo/2012929-pg-study-16012253 In my understanding, that's why we make sure there's no slave around before shutting down the master: should the master come back later cleanly, we make sure no one could have been promoted in the meantime. Note that considering this issue and how the RA tries to avoid it, this test on slaves being shut down before the master is quite weak anyway... Last but not least, the two PostgreSQL limitations the RA is messing with were fixed a long time ago, in 9.3: * https://www.postgresql.org/docs/current/static/release-9-3.html#AEN138909 * https://git.postgresql.org/gitweb/?p=postgresql.gi
Re: [ClusterLabs] Antw: Re: Replicated PGSQL woes
On Fri, 14 Oct 2016 09:59:04 +0200 "Ulrich Windl" wrote: > >>> Jehan-Guillaume de Rorthais schrieb am 13.10.2016 um > >>> 23:56 in > Nachricht <20161013235606.007018eb@firost>: > > [...] > > As far as I know, the pgsql resource agent create such a lock file on > > promote > > and delete it on graceful stop. If the PostgreSQL instance couldn't be > > stopped > > correctly, the lock files stays and the RA refuse to start it the next > > time. > > As a note: We once had the case of a very old stale PID file, where a valid > start was denied, because the PID existed, but belonged to a completely > different process in the meantime (on a busy server). That's why stale PID > files should be deleted; specifically they shouldn't survive a reboot ;-) As far as I understand this logic, it has changed now. The PID file of PostgreSQL contains the PID **and** shmid created by the last postmaster started. During a fresh start, if the postmaster.pid file exists, it checks if the PID AND the shmid still exist on the system and some processes are still connected to it. See CreateLockFile in src/backend/utils/init/miscinit.c: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/init/miscinit.c;h=22b046e006e50be49c0615d271ed8963c97192c2;hb=HEAD#l758 > You can conclude from a missing PID that the process is not running with that > PID, but you cannot conclude from an existing PID that it's still the same > process ;-) At least, what I just described checks if the existing PID is owned by the same user using a kill -0. Cheers :) ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
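The `kill -0` trick mentioned at the end of this message is worth spelling out: signal 0 delivers nothing, it only performs the existence and permission checks, so it answers "does this PID exist and may I signal it (same user, or root)?" in a single call. A small illustrative sketch:

```shell
#!/bin/sh
# kill -0 sends no signal: it only checks that the target PID exists and
# that the caller is allowed to signal it.  This is the check a daemon
# can use against a PID found in a stale pid file.

pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Our own shell is obviously alive and signalable:
pid_alive $$ && echo "pid $$ is alive"

# A process that has already exited and been reaped:
sh -c 'exit 0' &
dead_pid=$!
wait "$dead_pid"
pid_alive "$dead_pid" || echo "pid $dead_pid is stale"
```

As Ulrich's point implies, this only proves *some* signalable process owns the PID: after a reboot (or long enough uptime) the PID may belong to an unrelated process, which is why PostgreSQL additionally cross-checks the shared-memory segment recorded in postmaster.pid.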
Re: [ClusterLabs] Replicated PGSQL woes
On Fri, 14 Oct 2016 08:10:08 -0800 Israel Brewster wrote: > On Oct 14, 2016, at 1:39 AM, Jehan-Guillaume de Rorthais > wrote: > > > > On Thu, 13 Oct 2016 14:11:06 -0800 > > Israel Brewster wrote: [...] > > I **guess** if you really want a shutdown to occurs, I meant «failover» here, not shutdown, sorry. > > you need to simulate a real failure, not shutting down the first node > > cleanly. Try to kill corosync. > > From an academic standpoint the result of that test (which, incidentally, > were the same as the results of every other test I've done) are interesting, > however from a practical standpoint I'm not sure it helps much - most of the > "failures" that I experience are intentional: I want to fail over to the > other machine so I can run some software updates, reboot for whatever reason, > shutdown temporarily to upgrade the hardware, or whatever. As far as I know, this is a switchover. A planned switchover. And you should definitely check PAF then :) [...] -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] Replicated PGSQL woes
On Wed, 19 Oct 2016 19:44:14 +0900 Keisuke MORI wrote: > 2016-10-14 18:39 GMT+09:00 Jehan-Guillaume de Rorthais : > > On Thu, 13 Oct 2016 14:11:06 -0800 > > Israel Brewster wrote: > > > >> On Oct 13, 2016, at 1:56 PM, Jehan-Guillaume de Rorthais > >> wrote: > >> > > >> > On Thu, 13 Oct 2016 10:05:33 -0800 > >> > Israel Brewster wrote: > >> > > >> >> On Oct 13, 2016, at 9:41 AM, Ken Gaillot wrote: > >> >>> > >> >>> On 10/13/2016 12:04 PM, Israel Brewster wrote: > >> > [...] > >> > > >> >>>> But whatever- this is a cluster, it doesn't really matter which node > >> >>>> things are running on, as long as they are running. So the cluster is > >> >>>> working - postgresql starts, the master process is on the same node as > >> >>>> the IP, you can connect, etc, everything looks good. Obviously the > >> >>>> next thing to try is failover - should the master node fail, the > >> >>>> slave node should be promoted to master. So I try testing this by > >> >>>> shutting down the cluster on the primary server: "pcs cluster stop" > >> >>>> ...and nothing happens. The master shuts down (uncleanly, I might add > >> >>>> - it leaves behind a lock file that prevents it from starting again > >> >>>> until I manually remove said lock file), but the slave is never > >> >>>> promoted to > >> >>> > >> >>> This definitely needs to be corrected. What creates the lock file, and > >> >>> how is that entity managed? > >> >> > >> >> The lock file entity is created/managed by the postgresql process > >> >> itself. On launch, postgres creates the lock file to say it is running, > >> >> and deletes said lock file when it shuts down. To my understanding, its > >> >> role in life is to prevent a restart after an unclean shutdown so the > >> >> admin is reminded to make sure that the data is in a consistent state > >> >> before starting the server again. > >> > > >> > What is the name of this lock file? Where is it? > >> > > >> > PostgreSQL does not create lock file. 
It creates a "postmaster.pid" file, > >> > but it does not forbid a startup if the new process doesn't find another > >> > process with the pid and shm shown in the postmaster.pid. > >> > > >> > As far as I know, the pgsql resource agent create such a lock file on > >> > promote and delete it on graceful stop. If the PostgreSQL instance > >> > couldn't be stopped correctly, the lock files stays and the RA refuse to > >> > start it the next time. > >> > >> Ah, you're right. Looking auth the RA I see where it creates the file in > >> question. The delete appears to be in the pgsql_real_stop() function (which > >> makes sense), wrapped in an if block that checks for $1 being master and > >> $OCF_RESKEY_CRM_meta_notify_slave_uname being a space. Throwing a little > >> debugging code in there I see that when it hits that block on a cluster > >> stop, $OCF_RESKEY_CRM_meta_notify_slave_uname is centtest1.ravnalaska.net > >> <http://centtest1.ravnalaska.net/>, not a space, so the lock file is not > >> removed: > >> > >> if [ "$1" = "master" -a "$OCF_RESKEY_CRM_meta_notify_slave_uname" = " > >> " ]; then ocf_log info "Removing $PGSQL_LOCK." > >> rm -f $PGSQL_LOCK > >> fi > >> > >> It doesn't look like there is anywhere else where the file would be > >> removed. > > > > This is quite wrong to me for two reasons (I'll try to be clear): > > > > 1) the resource agent (RA) make sure the timeline (TL) will not be > > incremented during promotion. > > > > As there is no documentation about that, I'm pretty sure this contortion > > comes from limitations in very old versions of PostgreSQL (<= 9.1): > > > > * a slave wasn't able to cross a timeline (TL) from streaming replication, > > only from WAL archives. That means crossing a TL was requiring to > > restart the slave or cutting the streaming rep temporary to force it to get > > back to the archives > > * moreover, it was possible a standby miss some transactions
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On Mon, 7 Nov 2016 10:12:04 +0100 Klaus Wenninger wrote: > On 11/07/2016 08:41 AM, Ulrich Windl wrote: > Ken Gaillot schrieb am 04.11.2016 um 22:37 in > Nachricht > > <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>: > >> On 11/04/2016 02:29 AM, Ulrich Windl wrote: > >> Ken Gaillot schrieb am 03.11.2016 um 17:08 in > >>> Nachricht > >>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>: ... > >> Another possible use would be for a cron that needs to know whether a > >> particular resource is running, and an attribute query is quicker and > >> easier than something like parsing crm_mon output or probing the service. > > crm_mon reads parts of the CIB; crm_attribute also does, I guess, so > > besides of lacking options and inefficient implementation, why should one > > be faster than the other? > > attrd_updater doesn't go for the CIB AFAIK, attrd_updater actually goes to the CIB, unless you set "--private" since 1.1.13: https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177 ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On Mon, 7 Nov 2016 09:31:20 -0600 Ken Gaillot wrote: > On 11/07/2016 03:47 AM, Klaus Wenninger wrote: > > On 11/07/2016 10:26 AM, Jehan-Guillaume de Rorthais wrote: > >> On Mon, 7 Nov 2016 10:12:04 +0100 > >> Klaus Wenninger wrote: > >> > >>> On 11/07/2016 08:41 AM, Ulrich Windl wrote: > >>>>>>> Ken Gaillot schrieb am 04.11.2016 um 22:37 in > >>>>>>> Nachricht > >>>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>: > >>>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote: > >>>>>>>>> Ken Gaillot schrieb am 03.11.2016 um 17:08 > >>>>>>>>> in > >>>>>> Nachricht > >>>>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>: > >> ... > >>>>> Another possible use would be for a cron that needs to know whether a > >>>>> particular resource is running, and an attribute query is quicker and > >>>>> easier than something like parsing crm_mon output or probing the > >>>>> service. > >>>> crm_mon reads parts of the CIB; crm_attribute also does, I guess, so > >>>> besides of lacking options and inefficient implementation, why should one > >>>> be faster than the other? > >>> attrd_updater doesn't go for the CIB > >> AFAIK, attrd_updater actually goes to the CIB, unless you set "--private" > >> since 1.1.13: > >> https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177 > > That prevents values being stored in the CIB. attrd_updater should > > always talk to attrd as I got it ... > > It's a bit confusing: Both crm_attribute and attrd_updater will > ultimately affect both attrd and the CIB in most cases, but *how* they > do so is different. crm_attribute modifies the CIB, and lets attrd pick > up the change from there; attrd_updater notifies attrd, and lets attrd > modify the CIB. > > The difference is subtle. > > With corosync 2, attrd only modifies "transient" node attributes (which > stay in effect till the next reboot), not "permanent" attributes. So why "--private" is not compatible with corosync 1.x as attrd_updater only set "transient" attributes anyway? 
How and where are private attributes stored? > So crm_attribute must be used if you want to set a permanent attribute. > crm_attribute also has the ability to modify cluster properties and > resource defaults, as well as node attributes. > > On the other hand, by contacting attrd directly, attrd_updater can > change an attribute's "dampening" (how often it is flushed to the CIB), > and it can (as mentioned above) set "private" attributes that are never > written to the CIB (and thus never cause the cluster to re-calculate > resource placement). Interesting, thank you for the clarification. As I understand it, it comes down to: crm_attribute -> CIB <-(poll/notify?) attrd attrd_updater -> attrd -> CIB Just a quick question about this: is it possible to set a "dampening" high enough that attrd never flushes it to the CIB (a kind of private attribute too)? Regards,
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On Mon, 7 Nov 2016 12:39:32 -0600 Ken Gaillot wrote: > On 11/07/2016 12:03 PM, Jehan-Guillaume de Rorthais wrote: > > On Mon, 7 Nov 2016 09:31:20 -0600 > > Ken Gaillot wrote: > > > >> On 11/07/2016 03:47 AM, Klaus Wenninger wrote: > >>> On 11/07/2016 10:26 AM, Jehan-Guillaume de Rorthais wrote: > >>>> On Mon, 7 Nov 2016 10:12:04 +0100 > >>>> Klaus Wenninger wrote: > >>>> > >>>>> On 11/07/2016 08:41 AM, Ulrich Windl wrote: > >>>>>>>>> Ken Gaillot schrieb am 04.11.2016 um 22:37 in > >>>>>>>>> Nachricht > >>>>>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>: > >>>>>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote: > >>>>>>>>>>> Ken Gaillot schrieb am 03.11.2016 um 17:08 > >>>>>>>>>>> in > >>>>>>>> Nachricht > >>>>>>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>: > >>>> ... > >>>>>>> Another possible use would be for a cron that needs to know whether a > >>>>>>> particular resource is running, and an attribute query is quicker and > >>>>>>> easier than something like parsing crm_mon output or probing the > >>>>>>> service. > >>>>>> crm_mon reads parts of the CIB; crm_attribute also does, I guess, so > >>>>>> besides of lacking options and inefficient implementation, why should > >>>>>> one be faster than the other? > >>>>> attrd_updater doesn't go for the CIB > >>>> AFAIK, attrd_updater actually goes to the CIB, unless you set "--private" > >>>> since 1.1.13: > >>>> https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177 > >>> That prevents values being stored in the CIB. attrd_updater should > >>> always talk to attrd as I got it ... > >> > >> It's a bit confusing: Both crm_attribute and attrd_updater will > >> ultimately affect both attrd and the CIB in most cases, but *how* they > >> do so is different. crm_attribute modifies the CIB, and lets attrd pick > >> up the change from there; attrd_updater notifies attrd, and lets attrd > >> modify the CIB. > >> > >> The difference is subtle. 
> >> > >> With corosync 2, attrd only modifies "transient" node attributes (which > >> stay in effect till the next reboot), not "permanent" attributes. > > > > So why "--private" is not compatible with corosync 1.x as attrd_updater > > only set "transient" attributes anyway? > > Corosync 1 does not support certain reliability guarantees required by > the current attrd, so when building against the corosync 1 libraries, > pacemaker will install "legacy" attrd instead. The difference is mainly > that the current attrd can guarantee atomic updates to attribute values. > attrd_updater actually can set permanent attributes when used with > legacy attrd. OK, I understand now. > > How and where private attributes are stored? > > They are kept in memory only, in attrd. Of course, attrd is clustered, > so they are kept in sync across all nodes. OK, that was my guess. > >> So crm_attribute must be used if you want to set a permanent attribute. > >> crm_attribute also has the ability to modify cluster properties and > >> resource defaults, as well as node attributes. > >> > >> On the other hand, by contacting attrd directly, attrd_updater can > >> change an attribute's "dampening" (how often it is flushed to the CIB), > >> and it can (as mentioned above) set "private" attributes that are never > >> written to the CIB (and thus never cause the cluster to re-calculate > >> resource placement). > > > > Interesting, thank you for the clarification. > > > > As I understand it, it resumes to: > > > > crm_attribute -> CIB <-(poll/notify?) attrd > > attrd_updater -> attrd -> CIB > > Correct. On startup, attrd registers with CIB to be notified of all changes. > > > Just a quick question about this, is it possible to set a "dampening" high > > enough so attrd never flush it to the CIB (kind of private attribute too)? > > I'd expect that to work, if the dampening interval was higher than the > lifetime of the cluster being up. Interesting. 
> It's also possible to abuse attrd to create a kind of private attribute > by using a node name that doesn't exist and never will. :) This ability > is intentionally allowed, so you can set attributes for nodes that the > current partition isn't aware of, or nodes that are planned to be added > later, but only attributes for known nodes will be written to the CIB. Again, interesting. I'll do some tests on my RA, as I need clustered private attributes and was not able to get them under the old stack (Debian < 8 or RHEL < 7). Thank you very much for your answers! Regards,
Re: [ClusterLabs] Pacemaker 1.1.16 released
On Wed, 30 Nov 2016 14:05:19 -0600 Ken Gaillot wrote: > ClusterLabs is proud to announce the latest release of the Pacemaker > cluster resource manager, version 1.1.15. 1.1.6 I guess ;) But congrats anyway ! > * Previously, the OCF_RESKEY_CRM_meta_notify_active_* variables were not > properly passed to multistate resources with notification enabled. This > has been fixed. To help resource agents detect when the fix is > available, the CRM feature set has been incremented. (Whenever the > feature set changes, mixed-version clusters are supported only during > rolling upgrades -- nodes with an older version will not be allowed to > rejoin once they shut down.) * how can we get the "CRM feature set" version from the RA? * when was this "CRM feature set" introduced in Pacemaker? Regards, -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] Pacemaker 1.1.16 released
On 1 December 2016 17:39:45 GMT+01:00, Ken Gaillot wrote: >On 12/01/2016 10:13 AM, Jehan-Guillaume de Rorthais wrote: >> On Wed, 30 Nov 2016 14:05:19 -0600 >> Ken Gaillot wrote: >> >>> ClusterLabs is proud to announce the latest release of the Pacemaker >>> cluster resource manager, version 1.1.15. >> >> 1.1.6 I guess ;) > >Whoops! > >Well now I don't feel so bad since the correction is wrong too ;) Argh !! :-D What about my questions below BTW ? :-P >> But congrats anyway ! >> >>> * Previously, the OCF_RESKEY_CRM_meta_notify_active_* variables were not >>> properly passed to multistate resources with notification enabled. >>> This has been fixed. To help resource agents detect when the fix is >>> available, the CRM feature set has been incremented. (Whenever the >>> feature set changes, mixed-version clusters are supported only during >>> rolling upgrades -- nodes with an older version will not be allowed to >>> rejoin once they shut down.) >> >> * how could we get the "CRM feature set" version from the RA? >> * when this "CRM feature set" has been introduced in Pacemaker? >> >> Regards, >> -- Sent from my phone
Re: [ClusterLabs] Pacemaker 1.1.16 released
On Fri, 2 Dec 2016 13:44:59 -0600 Ken Gaillot wrote: > On 12/01/2016 11:58 AM, Jehan-Guillaume de Rorthais wrote: > > > > > > On 1 December 2016 17:39:45 GMT+01:00, Ken Gaillot > > wrote: > >> On 12/01/2016 10:13 AM, Jehan-Guillaume de Rorthais wrote: > >>> On Wed, 30 Nov 2016 14:05:19 -0600 > >>> Ken Gaillot wrote: > >>> > >>>> ClusterLabs is proud to announce the latest release of the Pacemaker > >>>> cluster resource manager, version 1.1.15. > >>> > >>> 1.1.6 I guess ;) > >> > >> Whoops! > >> > >> Well now I don't feel so bad since the correction is wrong too ;) > > > > Argh !! :-D > > > > What about my questions below BTW ? :-P > > > >>> But congrats anyway ! > >>> > >>>> * Previously, the OCF_RESKEY_CRM_meta_notify_active_* variables were not > >>>> properly passed to multistate resources with notification enabled. > >>>> This has been fixed. To help resource agents detect when the fix is > >>>> available, the CRM feature set has been incremented. (Whenever the > >>>> feature set changes, mixed-version clusters are supported only during > >>>> rolling upgrades -- nodes with an older version will not be allowed to > >>>> rejoin once they shut down.) > >>> > >>> * how could we get the "CRM feature set" version from the RA? > > it's passed as an environment variable > > >>> * when this "CRM feature set" has been introduced in Pacemaker? > > always, see http://wiki.clusterlabs.org/wiki/ReleaseCalendar Thank you for the answers! Regards,
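The "it's passed as an environment variable" answer refers to the CRM feature set that Pacemaker exports to resource agents. As a rough sketch of how an RA could gate on it, assuming the variable is named OCF_RESKEY_crm_feature_set (the helper name below is made up; version strings are compared component by component):

```python
import os

def feature_set_at_least(required, env=None):
    """Return True if the CRM feature set passed to the RA is >= required.

    Assumes Pacemaker exports it as OCF_RESKEY_crm_feature_set; an absent
    variable is treated as the lowest possible version.
    """
    env = os.environ if env is None else env
    current = env.get("OCF_RESKEY_crm_feature_set", "0")
    as_tuple = lambda v: tuple(int(p) for p in v.split("."))
    return as_tuple(current) >= as_tuple(required)
```

An agent could then branch on something like feature_set_at_least(...) before relying on the fixed notify variables, whatever feature set number that fix shipped with.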
[ClusterLabs] setting up SBD_WATCHDOG_TIMEOUT, stonith-timeout and stonith-watchdog-timeout
Hello, While setting these various parameters, I couldn't find documentation and details about them. Below are some questions. Considering the watchdog module used on a server is set up with a 30s timer (let's call it the wdt, the "watchdog timer"), how should "SBD_WATCHDOG_TIMEOUT", "stonith-timeout" and "stonith-watchdog-timeout" be set? Here is my thinking so far: "SBD_WATCHDOG_TIMEOUT < wdt". The sbd daemon should reset the timer before the wdt expires so the server stays alive. Online resources and default values are usually "SBD_WATCHDOG_TIMEOUT=5s" and "wdt=30s". But what if sbd fails to reset the timer multiple times (e.g. because of excessive load, a swap storm, etc.)? The server will not reset before random*SBD_WATCHDOG_TIMEOUT or wdt, right? "stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT". I'm not quite sure what stonith-watchdog-timeout is. Is it the maximum time stonithd waits after it asked for a node fencing before it considers the watchdog was actually triggered and the node reset, even with no confirmation? I suppose "stonith-watchdog-timeout" is mostly useful to stonithd, right? "stonith-watchdog-timeout < stonith-timeout". I understand the stonith action timeout should be at least greater than the wdt so stonithd will not raise a timeout before the wdt had a chance to expire and reset the node. Is that right? Any other comments? Regards, -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] setting up SBD_WATCHDOG_TIMEOUT, stonith-timeout and stonith-watchdog-timeout
On Thu, 8 Dec 2016 11:47:20 +0100 Jehan-Guillaume de Rorthais wrote: > Hello, > > While setting these various parameters, I couldn't find documentation and > details about them. Below are some questions. > > Considering the watchdog module used on a server is set up with a 30s timer > (let's call it the wdt, the "watchdog timer"), how should > "SBD_WATCHDOG_TIMEOUT", "stonith-timeout" and "stonith-watchdog-timeout" be > set? > > Here is my thinking so far: > > "SBD_WATCHDOG_TIMEOUT < wdt". The sbd daemon should reset the timer before the > wdt expires so the server stays alive. Online resources and default values are > usually "SBD_WATCHDOG_TIMEOUT=5s" and "wdt=30s". But what if sbd fails to > reset the timer multiple times (e.g. because of excessive load, a swap storm, > etc.)? The server will not reset before random*SBD_WATCHDOG_TIMEOUT or wdt, > right? > > "stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT". I'm not quite sure what > stonith-watchdog-timeout is. Is it the maximum time stonithd waits > after it asked for a node fencing before it considers the watchdog was > actually triggered and the node reset, even with no confirmation? I suppose > "stonith-watchdog-timeout" is mostly useful to stonithd, right? > > "stonith-watchdog-timeout < stonith-timeout". I understand the stonith action > timeout should be at least greater than the wdt so stonithd will not raise a > timeout before the wdt had a chance to expire and reset the node. Is that > right? Anyone on these questions? I am currently writing some more doc/cookbook for the PAF project[1], I would prefer to be sure of what is written there :) [1] http://dalibo.github.io/PAF/documentation.html Regards,
[ClusterLabs] [Announce] PostgreSQL Automatic Failover (PAF) v2.1 rc2 released
= Announce = PostgreSQL Automatic Failover (PAF) v2.1_rc2 has been released on December 17th 2016 under the PostgreSQL licence. Any kind of testing and reporting is highly appreciated and welcome. Only one harmless bug has been fixed since v2.1_rc1. Many thanks to YanChii for the report and pull request on GitHub: * fix: timeout given to pg_ctl was in ms instead of sec, by YanChii We plan to release v2.1.0 on the 22nd of December! To download the sources, an RPM or a DEB package, see: https://github.com/dalibo/PAF/releases/tag/v2.1_rc2 = Introduction = PAF is a new PostgreSQL resource agent for Pacemaker. Its original aim is to keep a clear distinction between Pacemaker administration and PostgreSQL administration, to keep things simple, documented and yet powerful. PAF has very few parameters to set up and very few requirements: * setting up streaming replication by yourself * requires "application_name=$hostname" in primary_conninfo * requires "recovery_target_timeline = 'latest'" * requires "standby_mode = on" It deals with the start/stop order of the PostgreSQL clusters in the replication cluster, the failover obviously, the switchover (the role swapping between the master and one of its slaves), etc. = Links = Source code and releases are available on GitHub: * https://github.com/dalibo/PAF/ * https://github.com/dalibo/PAF/releases Documentation, quick starts, cookbook and support as well: * http://dalibo.github.io/PAF/ * http://dalibo.github.io/PAF/documentation.html * https://github.com/dalibo/PAF/issues Please use the pgsql-general mailing list if you have questions. Any feedback, bug report or patch is welcome. Regards, -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] setting up SBD_WATCHDOG_TIMEOUT, stonith-timeout and stonith-watchdog-timeout
On Wed, 14 Dec 2016 14:52:41 +0100 Klaus Wenninger wrote: > On 12/14/2016 01:26 PM, Jehan-Guillaume de Rorthais wrote: > > On Thu, 8 Dec 2016 11:47:20 +0100 > > Jehan-Guillaume de Rorthais wrote: > > > >> Hello, > >> > >> While setting these various parameters, I couldn't find documentation and > >> details about them. Below are some questions. > >> > >> Considering the watchdog module used on a server is set up with a 30s timer > >> (let's call it the wdt, the "watchdog timer"), how should > >> "SBD_WATCHDOG_TIMEOUT", "stonith-timeout" and "stonith-watchdog-timeout" be > >> set? > >> > >> Here is my thinking so far: > >> > >> "SBD_WATCHDOG_TIMEOUT < wdt". The sbd daemon should reset the timer before > >> the wdt expires so the server stays alive. Online resources and default > >> values are usually "SBD_WATCHDOG_TIMEOUT=5s" and "wdt=30s". But what if > >> sbd fails to reset the timer multiple times (e.g. because of excessive > >> load, a swap storm, etc.)? The server will not reset before > >> random*SBD_WATCHDOG_TIMEOUT or wdt, right? > > SBD_WATCHDOG_TIMEOUT (e.g. in /etc/sysconfig/sbd) is already the > > timeout the hardware watchdog is configured to by sbd-daemon. Oh, ok, I did not realize sbd was actually setting the hardware watchdog timeout itself based on this variable. After some quick search to make sure I understand it right, I suppose it is done there? https://github.com/ClusterLabs/sbd/blob/172dcd03eaf26503a10a18501aa1b9f30eed7ee2/src/sbd-common.c#L123 > sbd-daemon is triggering faster - timeout_loop defaults to 1s but > is configurable. > > SBD_WATCHDOG_TIMEOUT (and maybe the loop timeout as well > but significantly shorter should be sufficient) > has to be configured so that failing to trigger within time means > a failure with high enough certainty or the machine showing > comparable response-times would anyway violate timing requirements > of the services running on itself and in the cluster. OK. So I understand now why 5s is fine as a default value then.
> Have in mind that sbd-daemon defaults to running realtime-scheduled > and thus is gonna be more responsive than the usual services > on the system. Although you of course have to consider that > the watchers (child-processes of sbd that are observing e.g. > the block-device(s), corosync, pacemaker_remoted or > pacemaker node-health) might be significantly less responsive > due to their communication partners. I'm not sure yet that I clearly understand the mechanism and interactions of sbd with other daemons. So far, I understood that Pacemaker/stonithd was able to poke sbd to ask it to trigger a node reset through the wd device. I'm very new to this area and I still lack documentation on it. > >> "stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT". I'm not quite sure what > >> is stonith-watchdog-timeout. Is it the maximum time to wait from stonithd > >> after it asked for a node fencing before it considers the watchdog was > >> actually triggered and the node reset, even with no confirmation? I > >> suppose "stonith-watchdog-timeout" is mostly useful to stonithd, right? > > Yes, the time we can assume a node to be killed by the hardware-watchdog... > Double the hardware-watchdog-timeout is a good choice. OK, thank you > >> "stonith-watchdog-timeout < stonith-timeout". I understand the stonith > >> action timeout should be at least greater than the wdt so stonithd will > >> not raise a timeout before the wdt had a chance to expire and reset the > >> node. Is it right? > > stonith-timeout is the cluster-wide default to wait for stonith-devices > to carry out their duty. In the sbd-case without a block-device (sbd used > for pacemaker to be observed by a hardware-watchdog) it shouldn't > play a role.
I thought self-fencing through sbd/wd was carried out by stonithd because of such messages in my PoC log files: stonith-ng: notice: unpack_config: Relying on watchdog integration for fencing That's why I thought "stonith-timeout" might have a role there, as it looks like a stonith device then... Out of pure technical interest, some more input or documentation to read about how it works would be really appreciated. > When a block-device is being used it guards the > communication with the fence-agent communicating with the > block-device. OK Thank you for your help! -- Jehan-Guillaume de Rorthais Dalibo
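Putting the answers in this thread together, the timeout ordering can be expressed as a small sanity check. This is only my reading of the discussion (sbd's loop timeout well below SBD_WATCHDOG_TIMEOUT, which is itself the hardware watchdog timeout sbd programs into the device, and stonith-watchdog-timeout at about double that value); the function name is made up:

```python
def check_sbd_timeouts(loop_timeout, sbd_watchdog_timeout,
                       stonith_watchdog_timeout):
    """Return a list of problems with the given timeout values (in seconds).

    sbd tickles the watchdog every loop_timeout seconds, so it must stay
    well below sbd_watchdog_timeout (the hardware watchdog timeout, which
    sbd itself programs into the device). stonith-watchdog-timeout is how
    long the cluster waits before assuming the watchdog killed the node;
    double the hardware timeout is the recommended value.
    """
    problems = []
    if loop_timeout >= sbd_watchdog_timeout:
        problems.append("loop timeout must be well below SBD_WATCHDOG_TIMEOUT")
    if stonith_watchdog_timeout < 2 * sbd_watchdog_timeout:
        problems.append("stonith-watchdog-timeout should be at least twice "
                        "SBD_WATCHDOG_TIMEOUT")
    return problems
```

With the defaults discussed here (loop timeout 1s, SBD_WATCHDOG_TIMEOUT 5s, stonith-watchdog-timeout 10s) the check passes.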
Re: [ClusterLabs] setting up SBD_WATCHDOG_TIMEOUT, stonith-timeout and stonith-watchdog-timeout
On Mon, 19 Dec 2016 13:37:09 +0100 Klaus Wenninger wrote: > On 12/17/2016 11:55 PM, Jehan-Guillaume de Rorthais wrote: > > On Wed, 14 Dec 2016 14:52:41 +0100 > > Klaus Wenninger wrote: > > > >> On 12/14/2016 01:26 PM, Jehan-Guillaume de Rorthais wrote: > >>> On Thu, 8 Dec 2016 11:47:20 +0100 > >>> Jehan-Guillaume de Rorthais wrote: > >>> > >>>> Hello, > >>>> > >>>> While setting this various parameters, I couldn't find documentation and > >>>> details about them. Bellow some questions. > >>>> > >>>> Considering the watchdog module used on a server is set up with a 30s > >>>> timer (lets call it the wdt, the "watchdog timer"), how should > >>>> "SBD_WATCHDOG_TIMEOUT", "stonith-timeout" and "stonith-watchdog-timeout" > >>>> be set? > >>>> > >>>> Here is my thinking so far: > >>>> > >>>> "SBD_WATCHDOG_TIMEOUT < wdt". The sbd daemon should reset the timer > >>>> before the wdt expire so the server stay alive. Online resources and > >>>> default values are usually "SBD_WATCHDOG_TIMEOUT=5s" and "wdt=30s". But > >>>> what if sbd fails to reset the timer multiple times (eg. because of > >>>> excessive load, swap storm etc)? The server will not reset before > >>>> random*SBD_WATCHDOG_TIMEOUT or wdt, right? > >> SBD_WATCHDOG_TIMEOUT (e.g. in /etc/sysconfig/sbd) is already the > >> timeout the hardware watchdog is configured to by sbd-daemon. > > Oh, ok, I did not realized sbd was actually setting the hardware watchdog > > timeout itself based on this variable. After some quick search to make sure > > I understand it right, I suppose it is done there? > > https://github.com/ClusterLabs/sbd/blob/172dcd03eaf26503a10a18501aa1b9f30eed7ee2/src/sbd-common.c#L123 > > > >> sbd-daemon is triggering faster - timeout_loop defaults to 1s but > >> is configurable. 
> >> > >> SBD_WATCHDOG_TIMEOUT (and maybe the loop timeout as well > >> but significantly shorter should be sufficient) > >> has to be configured so that failing to trigger within time means > >> a failure with high enough certainty or the machine showing > >> comparable response-times would anyway violate timing requirements > >> of the services running on itself and in the cluster. > > OK. So I understand now why 5s is fine as a default value then. > > > >> Have in mind that sbd-daemon defaults to running realtime-scheduled > >> and thus is gonna be more responsive than the usual services > >> on the system. Although you of course have to consider that > >> the watchers (child-processes of sbd that are observing e.g. > >> the block-device(s), corosync, pacemaker_remoted or > >> pacemaker node-health) might be significantly less responsive > >> due to their communication partners. > > I'm not sure yet to understand clearly the mechanism and interactions of sbd > > with other daemons. So far, I understood that Pacemaker/stonithd was able to > > poke sbd to ask it to trigger a node reset through the wd device. I'm very > > new to this area and I still lack documentation on it. > > Pacemaker is setting the node unclean which pacemaker-watcher > (one of sbd daemons) sees as it is connected to the cib. > This is why the mechanism is working (sort of - see the discussion > in my pull request in the sbd-repo) on nodes without stonithd as > well (remote-nodes). > If you are running sbd with a block-device there is of course this > way of communication as well between pacemaker and sbd. > (e.g. via fence_sbd fence-agent) > Be aware that there are different levels of support for these > features in the distributions. (RHEL more on the watchdog-side, > SLES more on the block-device side ... roughly as far as I got it) OK, I have a better understanding of the need for the various sbd watchers and how it all works.
> >>>> "stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT". I'm not quite sure > >>>> what is stonith-watchdog-timeout. Is it the maximum time to wait from > >>>> stonithd after it asked for a node fencing before it considers the > >>>> watchdog was actually triggered and the node reseted, even with no > >>>> confirmation? I suppose "stonith-watchdog-timeout" is mostly useful to > >>>> stonithd, right? > >> Yes, the time we ca
Re: [ClusterLabs] Status and help with pgsql RA
On Fri, 6 Jan 2017 13:47:34 -0600 Ken Gaillot wrote: > On 12/28/2016 02:24 PM, Nils Carlson wrote: > > Hi, > > > > I am looking to set up postgresql in high-availability and have been > > comparing the guide at > > http://wiki.clusterlabs.org/wiki/PgSQL_Replicated_Cluster with the > > contents of the pgsql resource agent on github. It seems that there have > > been substantial improvements in the resource agent regarding the use of > > replication slots. > > > > Could anybody look at updating the guide, or just sending it out in an > > e-mail to help spread knowledge? > > > > The replications slots with pacemaker look really cool, if I've > > understood things right there should be no need for manual work after > > node recovery with the replication slots (though there is a risk of a > > full disk)? > > > > All help, tips and guidance much appreciated. > > > > Cheers, > > Nils > > Hmm, that wiki page could definitely use updating. I'm personally not > familiar with pgsql, so hopefully someone else can chime in. > > Another user on this list has made an alternative resource agent that > you might want to check out: > > http://lists.clusterlabs.org/pipermail/users/2016-December/004740.html Indeed, but PAF does not support replication slots itself. As far as I understand the pgsql RA, the parameter "replication_slot_name" is only useful to generate the slot names, create them on all nodes and add the "primary_slot_name" parameter in the generated recovery.conf files for each node. Other admin considerations are in the hands of the user. As PAF does not take care of the recovery.conf files at all, it is quite easy to get the same behavior: just create the required slots on all nodes by hand and set up the recovery.conf files accordingly.
Regards,
Re: [ClusterLabs] [Question] About a change of crm_failcount.
On Fri, 3 Feb 2017 09:45:18 -0600 Ken Gaillot wrote: > On 02/02/2017 12:33 PM, Ken Gaillot wrote: > > On 02/02/2017 12:23 PM, renayama19661...@ybb.ne.jp wrote: > >> Hi All, > >> > >> By the next correction, the user was not able to set a value except zero > >> in crm_failcount. > >> > >> - [Fix: tools: implement crm_failcount command-line options correctly] > >>- > >> https://github.com/ClusterLabs/pacemaker/commit/95db10602e8f646eefed335414e40a994498cafd#diff-6e58482648938fd488a920b9902daac4 > >> > >> However, pgsql RA sets INFINITY in a script. > >> > >> ``` > >> (snip) > >> CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount" > >> (snip) > >> ocf_exit_reason "My data is newer than new master's one. New > >> master's location : $master_baseline" exec_with_retry 0 $CRM_FAILCOUNT -r > >> $OCF_RESOURCE_INSTANCE -U $NODENAME -v INFINITY return $OCF_ERR_GENERIC > >> (snip) > >> ``` > >> > >> There seems to be the influence only in pgsql somehow or other. > >> > >> Can you revise it to set a value except zero in crm_failcount? > >> We make modifications to use crm_attribute in pgsql RA if we cannot revise > >> it. > >> > >> Best Regards, > >> Hideo Yamauchi. > > > > Hmm, I didn't realize that was used. I changed it because it's not a > > good idea to set fail-count without also changing last-failure and > > having a failed op in the LRM history. I'll have to think about what the > > best alternative is. > > Having a resource agent modify its own fail count is not a good idea, > and could lead to unpredictable behavior. I didn't realize the pgsql > agent did that. > > I don't want to re-enable the functionality, because I don't want to > encourage more agents doing this. > > There are two alternatives the pgsql agent can choose from: > > 1. Return a "hard" error such as OCF_ERR_ARGS or OCF_ERR_PERM. When > Pacemaker gets one of these errors from an agent, it will ban the > resource from that node (until the failure is cleared). > > 2. Use crm_resource --ban instead. 
This would ban the resource from that > node until the user removes the ban with crm_resource --clear (or by > deleting the ban constraint from the configuration). > > I'd recommend #1 since it does not require any pacemaker-specific tools. > > We can make sure resource-agents has a fix for this before we release a > new version of Pacemaker. We'll have to publicize as much as possible to > pgsql users that they should upgrade resource-agents before or at the > same time as pacemaker. I see the alternative PAF agent has the same > usage, so it will need to be updated, too. Yes, I was following this conversation. I'll do the fix on our side. Thank you!
Re: [ClusterLabs] Fence agent for VirtualBox
Hi, On Mon, 6 Feb 2017 14:20:45 +0100 Marek Grac wrote: > I don't have one. But I see a lot of questions about fence_vbox in the last > days, is there any new material that references it? Here is a script a colleague of mine wrote (based on fence_virsh) to be able to fence a vbox VM: https://gist.github.com/marco44/2a4e5213a328829acee60015bf9b5671 He wrote it to be able to build PoC clusters using vbox. It has not been tested in production, but it worked like a charm during some workshops so far. Regards, -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] Antw: Re: [Question] About a change of crm_failcount.
On Thu, 9 Feb 2017 19:24:22 +0900 (JST) renayama19661...@ybb.ne.jp wrote: > Hi Ken, > > > > 1. Return a "hard" error such as OCF_ERR_ARGS or OCF_ERR_PERM. When > > Pacemaker gets one of these errors from an agent, it will ban the > > resource from that node (until the failure is cleared). > > The first suggestion does not work well. > > Even if this returns OCF_ERR_ARGS and OCF_ERR_PERM, it seems to be > pre_promote(notify) handling of RA. Pacemaker does not record the notify(pre > promote) error in CIB. > > * https://github.com/ClusterLabs/pacemaker/blob/master/crmd/lrm.c#L2411 > > Because it is not recorded in CIB, there cannot be the thing that pengine > works as "hard error". Indeed. That's why PAF uses private attributes to pass information between actions. We detect the failure during the notify as well, but raise the error during the promotion itself. See how I dealt with this in PAF: https://github.com/ioguix/PAF/commit/6123025ff7cd9929b56c9af2faaefdf392886e68 As private attributes do not work on older stacks, you could rely on a local temp file in $HA_RSCTMP as well. > > 2. Use crm_resource --ban instead. > > The second suggestion works well. > I intend to adopt the second suggestion. > > As other methods, you think crm_resource -F to be available, but what do you > think? I think that last-failure does not have a problem either to let you > handle pseudotrouble if it is crm_resource -F. > > I think whether crm_resource -F is available, but adopt crm_resource -B > because RA wants to completely stop pgsql resource. > > ``` @pgsql RA > > pgsql_pre_promote() { > (snip) > if [ "$cmp_location" != "$my_master_baseline" ]; then > ocf_exit_reason "My data is newer than new master's one. 
New > master's location : $master_baseline" exec_with_retry 0 $CRM_RESOURCE -B -r > $OCF_RESOURCE_INSTANCE -N $NODENAME -Q return $OCF_ERR_GENERIC > fi > (snip) > CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount" > CRM_RESOURCE="${HA_SBIN_DIR}/crm_resource" > ``` > > I test movement a little more and send a patch. I suppose crm_resource -F will just raise the failcount, break the current transition and the CRM will recompute another transition paying attention to your "failed" resource (will it try to recover it? retry the previous transition again?). I would bet on crm_resource -B. > - Original Message - > > From: Ulrich Windl > > To: users@clusterlabs.org; kgail...@redhat.com > > Cc: > > Date: 2017/2/6, Mon 17:44 > > Subject: [ClusterLabs] Antw: Re: [Question] About a change of crm_failcount. > > > >>>> Ken Gaillot schrieb am 02.02.2017 um > > 19:33 in Nachricht > > <91a83571-9930-94fd-e635-962830671...@redhat.com>: > >> On 02/02/2017 12:23 PM, renayama19661...@ybb.ne.jp wrote: > >>> Hi All, > >>> > >>> By the next correction, the user was not able to set a value except > > zero in > >> crm_failcount. > >>> > >>> - [Fix: tools: implement crm_failcount command-line options correctly] > >>> - > >> > > https://github.com/ClusterLabs/pacemaker/commit/95db10602e8f646eefed335414e40 > > > >> a994498cafd#diff-6e58482648938fd488a920b9902daac4 > >>> > >>> However, pgsql RA sets INFINITY in a script. > >>> > >>> ``` > >>> (snip) > >>> CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount" > >>> (snip) > >>> ocf_exit_reason "My data is newer than new master's one. > > New master's > >> location : $master_baseline" > >>> exec_with_retry 0 $CRM_FAILCOUNT -r $OCF_RESOURCE_INSTANCE -U > > $NODENAME -v > >> INFINITY > >>> return $OCF_ERR_GENERIC > >>> (snip) > >>> ``` > >>> > >>> There seems to be the influence only in pgsql somehow or other. > >>> > >>> Can you revise it to set a value except zero in crm_failcount? 
> >>> We make modifications to use crm_attribute in pgsql RA if we cannot > > revise > >> it. > >>> > >>> Best Regards, > >>> Hideo Yamauchi. > >> > >> Hmm, I didn't realize that was used. I changed it because it's not > > a > >> good idea to set fail-count without also changing last-failure and > >> having a failed op in the LRM history. I'll have to think about what > > the > >> best alternative is. > > > > The question also is whether the RA can achieve the same effect otherwise. I > > thought CRM sets the failcount, not the RA... > > -- Jehan-Guillaume de Rorthais Dalibo
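The "local temp file in $HA_RSCTMP" workaround mentioned in this thread (carrying a flag from the pre-promote notify to the promote action on stacks without private attributes) could look roughly like this. The helper names and file layout are made up for illustration; including the resource instance name in the path avoids collisions between two resources of the same type on one node:

```python
import os

def _flag_path(rsc_tmp, instance, flag):
    # One file per resource instance, so two resources of the same type
    # on the same node cannot interfere with each other.
    return os.path.join(rsc_tmp, "%s-%s" % (instance, flag))

def raise_flag(rsc_tmp, instance, flag):
    # Called from the pre-promote notify when a problem is detected.
    open(_flag_path(rsc_tmp, instance, flag), "w").close()

def consume_flag(rsc_tmp, instance, flag):
    """Return True if the flag was raised, clearing it on the way out.

    Called from the promote action, which then returns an error itself
    instead of letting the notify try (and fail) to report it.
    """
    path = _flag_path(rsc_tmp, instance, flag)
    if os.path.exists(path):
        os.unlink(path)
        return True
    return False
```

A real agent would of course do this in its own language (shell or Perl) with $HA_RSCTMP and $OCF_RESOURCE_INSTANCE as the directory and instance name.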
Re: [ClusterLabs] [Question] About a change of crm_failcount.
On Thu, 09 Feb 2017 18:04:41 +0100 wf...@niif.hu (Ferenc Wágner) wrote: > Jehan-Guillaume de Rorthais writes: > > > PAF use private attribute to give informations between actions. We > > detect the failure during the notify as well, but raise the error > > during the promotion itself. See how I dealt with this in PAF: > > > > https://github.com/ioguix/PAF/commit/6123025ff7cd9929b56c9af2faaefdf392886e68 > > > > This is the first time I hear about private attributes. Since they > could come useful one day, I'd like to understand them better. After > some reading, they seem to be node attributes, not resource attributes. > This may be irrelevant for PAF, but doesn't it mean that two resources > of the same type on the same node would interfere with each other? > Also, your _set_priv_attr could fall into an infinite loop if another > instance used it at the inappropriate moment. Do I miss something here? No, you are perfectly right. We are aware of this, this is something we need to fix in the next release of PAF (I was actually discussing this with a user 2 days ago on IRC :)). Thank you for the report! -- Jehan-Guillaume de Rorthais Dalibo
[ClusterLabs] pending actions
Hi, Occasionally, I find my cluster with one pending action not being executed for some minutes (I guess until the "PEngine Recheck Timer" elapses). Running "crm_simulate -SL" shows the pending actions. I'm still confused about how it can happen, why it happens and how to avoid this. Earlier today, I started my test cluster with 3 nodes and a master/slave resource[1], all with positive master scores (1001, 1000 and 990), and the cluster kept the promote action as a pending action for 15 minutes. You will find attached the first 3 pengine inputs executed after the cluster startup. What are the consequences if I set cluster-recheck-interval to 30s, for instance? Thanks in advance for your insights :) Regards, [1] here is the setup: http://dalibo.github.io/PAF/Quick_Start-CentOS-7.html#cluster-resource-creation-and-management -- Jehan-Guillaume de Rorthais Dalibo [Attachments: pe-input-417.bz2, pe-input-418.bz2, pe-input-419.bz2]
Re: [ClusterLabs] Antw: Surprising semantics of location constraints with INFINITY score
Hi, > >>> Kristoffer Grönlund wrote on 11.04.2017 at 15:30 > >>> in > message <87lgr7kr64@suse.com>: > > Hi all, > > > > I discovered today that a location constraint with score=INFINITY > > doesn't actually restrict resources to running only on particular > > nodes. From what I can tell, the constraint assigns the score to that > > node, but doesn't change scores assigned to other nodes. So if the node > > in question happens to be offline, the resource will be started on any > > other node. AFAIU, this behavior is expected when you set up your cluster with the Opt-In strategy: http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#_deciding_which_nodes_a_resource_can_run_on -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] Antw: Surprising semantics of location constraints with INFINITY score
On Wed, 12 Apr 2017 09:19:06 +0200 Kristoffer Grönlund wrote: > Jehan-Guillaume de Rorthais writes: > > > Hi, > > > >> >>> Kristoffer Grönlund wrote on 11.04.2017 at 15:30 > >> >>> in > >> message <87lgr7kr64@suse.com>: > >> > Hi all, > >> > > >> > I discovered today that a location constraint with score=INFINITY > >> > doesn't actually restrict resources to running only on particular > >> > nodes. From what I can tell, the constraint assigns the score to that > >> > node, but doesn't change scores assigned to other nodes. So if the node > >> > in question happens to be offline, the resource will be started on any > >> > other node. > > > > AFAIU, this behavior is expected when you set up your cluster with the > > Opt-In strategy: > > > > http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#_deciding_which_nodes_a_resource_can_run_on > > > > No, this is the behavior of an Opt-Out cluster. So it seems you are > under the same misconception as I was. :) Sorry, my previous mail could be read in two different ways... I meant: «set up your cluster with the Opt-In strategy if you want this behavior». :)
[ClusterLabs] checking all procs on system enough during stop action?
Hi all, In the PostgreSQL Automatic Failover (PAF) project, one of the most frequent pieces of negative feedback we get is how difficult it is to experiment with it, because fencing occurs way too frequently. I am currently hunting down this kind of useless fencing to make life easier. It occurs to me that a frequent cause of fencing is that during the stop action, we check the status of the PostgreSQL instance using our monitor function before trying to stop the resource. If the function does not return OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we just raise an error, leading to a fencing. See: https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301 I am considering adding a check to determine whether the instance is stopped even if the monitor action returns an error. The idea would be to parse **all** the local processes, looking for at least one pair of "/proc/<pid>/{exe,cwd}" entries related to the PostgreSQL instance we want to stop. If none are found, we consider the instance is not running. Gracefully or not, we just know it is down and we can return OCF_SUCCESS. Just for completeness, the piece of code would be:

    my @pids;
    foreach my $f (glob "/proc/[0-9]*") {
        push @pids => basename($f)
            if -r $f
               and basename( readlink( "$f/exe" ) ) eq "postgres"
               and readlink( "$f/cwd" ) eq $pgdata;
    }

It feels safe enough to me. The only risk I could think of is in a shared disk cluster with multiple nodes accessing the same data in RW (such a setup can fail in so many ways :)). However, PAF is not supposed to work in such a context, so I can live with this. Do you guys have any advice? Do you see some drawbacks? Hazards? Thanks in advance! -- Jehan-Guillaume de Rorthais Dalibo
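[Editor's note: for readers less comfortable with Perl, here is an illustrative Python port of the same /proc scan. It is not part of PAF; the function name and return convention are invented for this sketch.]

```python
import os

def postgres_pids(pgdata):
    """Return the PIDs of "postgres" processes whose cwd is the given PGDATA.

    Mirrors the Perl snippet above: walk /proc/<pid>, compare the basename
    of the 'exe' symlink target and the 'cwd' symlink target.
    """
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        base = os.path.join("/proc", entry)
        try:
            exe = os.path.basename(os.readlink(os.path.join(base, "exe")))
            cwd = os.readlink(os.path.join(base, "cwd"))
        except OSError:
            # process vanished mid-scan, or its /proc entry is not readable;
            # equivalent to the "-r $f" guard in the Perl version
            continue
        if exe == "postgres" and cwd == pgdata:
            pids.append(int(entry))
    return pids
```

An empty list means no postmaster or backend is attached to that PGDATA, so the stop action could safely report the instance as down.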
Re: [ClusterLabs] [ClusterLabs Developers] checking all procs on system enough during stop action?
On Mon, 24 Apr 2017 17:08:15 +0200 Lars Ellenberg wrote: > On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais wrote: > > Hi all, > > > > In the PostgreSQL Automatic Failover (PAF) project, one of most frequent > > negative feedback we got is how difficult it is to experience with it > > because of fencing occurring way too frequently. I am currently hunting > > this kind of useless fencing to make life easier. > > > > It occurs to me, a frequent reason of fencing is because during the stop > > action, we check the status of the PostgreSQL instance using our monitor > > function before trying to stop the resource. If the function does not return > > OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we just raise an error, > > leading to a fencing. See: > > https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301 > > > > I am considering adding a check to define if the instance is stopped even > > if the monitor action returns an error. The idea would be to parse **all** > > the local processes looking for at least one pair of > > "/proc//{comm,cwd}" related to the PostgreSQL instance we want to > > stop. If none are found, we consider the instance is not running. > > Gracefully or not, we just know it is down and we can return OCF_SUCCESS. > > > > Just for completeness, the piece of code would be: > > > >my @pids; > >foreach my $f (glob "/proc/[0-9]*") { > >push @pids => basename($f) > >if -r $f > >and basename( readlink( "$f/exe" ) ) eq "postgres" > >and readlink( "$f/cwd" ) eq $pgdata; > >} > > > > I feels safe enough to me. The only risk I could think of is in a shared > > disk cluster with multiple nodes accessing the same data in RW (such setup > > can fail in so many ways :)). However, PAF is not supposed to work in such > > context, so I can live with this. > > > > Do you guys have some advices? Do you see some drawbacks? Hazards? > > Isn't that the wrong place to "fix" it? 
> Why did your _monitor return something "weird"? Because this _monitor is the one called by the monitor action. It determines whether an instance is running and whether it is healthy. Take the scenario where the slave instance has crashed: 1/ the monitor action raises an OCF_ERR_GENERIC 2/ Pacemaker tries a recovery of the resource (stop->start) 3/ the stop action fails because _monitor says the resource is crashed 4/ Pacemaker fences the node. > What did it return? Either OCF_ERR_GENERIC or OCF_FAILED_MASTER, for instance. > Should you not fix it there? Fixing this in the monitor action? This would bloat the code of this function. We would have to add a special code path in there to determine whether it is called as a real monitor action or just as a status check for other actions. But anyway, here or there, I would have to add this piece of code looking at each process. In your opinion, is it safe enough? Do you see any hazard with it? > Just thinking out loud. Thank you, it helps :)
Re: [ClusterLabs] [ClusterLabs Developers] checking all procs on system enough during stop action?
On Mon, 24 Apr 2017 17:52:09 +0200 Jan Pokorný wrote: > On 24/04/17 17:32 +0200, Jehan-Guillaume de Rorthais wrote: > > On Mon, 24 Apr 2017 17:08:15 +0200 > > Lars Ellenberg wrote: > > > >> On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais > >> wrote: > >>> Hi all, > >>> > >>> In the PostgreSQL Automatic Failover (PAF) project, one of most frequent > >>> negative feedback we got is how difficult it is to experience with it > >>> because of fencing occurring way too frequently. I am currently hunting > >>> this kind of useless fencing to make life easier. > >>> > >>> It occurs to me, a frequent reason of fencing is because during the stop > >>> action, we check the status of the PostgreSQL instance using our monitor > >>> function before trying to stop the resource. If the function does not > >>> return OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we just raise > >>> an error, leading to a fencing. See: > >>> https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301 > >>> > >>> I am considering adding a check to define if the instance is stopped even > >>> if the monitor action returns an error. The idea would be to parse **all** > >>> the local processes looking for at least one pair of > >>> "/proc//{comm,cwd}" related to the PostgreSQL instance we want to > >>> stop. If none are found, we consider the instance is not running. > >>> Gracefully or not, we just know it is down and we can return OCF_SUCCESS. > >>> > >>> Just for completeness, the piece of code would be: > >>> > >>>my @pids; > >>>foreach my $f (glob "/proc/[0-9]*") { > >>>push @pids => basename($f) > >>>if -r $f > >>>and basename( readlink( "$f/exe" ) ) eq "postgres" > >>>and readlink( "$f/cwd" ) eq $pgdata; > >>>} > >>> > >>> I feels safe enough to me. > > > > [...] > > > > But anyway, here or there, I would have to add this piece of code looking at > > each processes. According to you, is it safe enough? 
Do you see some hazard > > with it? > > Just for the sake of completeness, there's a race condition, indeed, > in multiple repeated path traversals (without being fixed of particular > entry inode), which can be interleaved with new postgres process being > launched anew (or what not). But that may happen even before the code > in question is executed -- naturally not having a firm grip on the > process is open to such possible issues, so this is just an aside. Indeed, a new process can appear right after the glob lists them. However, in a Pacemaker cluster, only Pacemaker should be responsible for starting the resource. PostgreSQL is not able to restart itself on its own. I don't want to rely on the existence or content of postmaster.pid (the PostgreSQL pid file), nor track the postmaster pid from the RA itself. Far too many race conditions and too much complexity appear when I start thinking about it. Thank you for your answer! -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] [ClusterLabs Developers] checking all procs on system enough during stop action?
On Mon, 24 Apr 2017 11:27:51 -0500 Ken Gaillot wrote: > On 04/24/2017 10:32 AM, Jehan-Guillaume de Rorthais wrote: > > On Mon, 24 Apr 2017 17:08:15 +0200 > > Lars Ellenberg wrote: > > > >> On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais > >> wrote: > >>> Hi all, > >>> > >>> In the PostgreSQL Automatic Failover (PAF) project, one of most frequent > >>> negative feedback we got is how difficult it is to experience with it > >>> because of fencing occurring way too frequently. I am currently hunting > >>> this kind of useless fencing to make life easier. > >>> > >>> It occurs to me, a frequent reason of fencing is because during the stop > >>> action, we check the status of the PostgreSQL instance using our monitor > >>> function before trying to stop the resource. If the function does not > >>> return OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we just raise > >>> an error, leading to a fencing. See: > >>> https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301 > >>> > >>> I am considering adding a check to define if the instance is stopped even > >>> if the monitor action returns an error. The idea would be to parse **all** > >>> the local processes looking for at least one pair of > >>> "/proc//{comm,cwd}" related to the PostgreSQL instance we want to > >>> stop. If none are found, we consider the instance is not running. > >>> Gracefully or not, we just know it is down and we can return OCF_SUCCESS. > >>> > >>> Just for completeness, the piece of code would be: > >>> > >>>my @pids; > >>>foreach my $f (glob "/proc/[0-9]*") { > >>>push @pids => basename($f) > >>>if -r $f > >>>and basename( readlink( "$f/exe" ) ) eq "postgres" > >>>and readlink( "$f/cwd" ) eq $pgdata; > >>>} > >>> > >>> I feels safe enough to me. The only risk I could think of is in a shared > >>> disk cluster with multiple nodes accessing the same data in RW (such setup > >>> can fail in so many ways :)). 
However, PAF is not supposed to work in such > >>> context, so I can live with this. > >>> > >>> Do you guys have some advices? Do you see some drawbacks? Hazards? > >> > >> Isn't that the wrong place to "fix" it? > >> Why did your _monitor return something "weird"? > > > > Because this _monitor is the one called by the monitor action. It is able to > > define if an instance is running and if it feels good. > > > > Take the scenario where the slave instance is crashed: > > 1/ the monitor action raise an OCF_ERR_GENERIC > > 2/ Pacemaker tries a recover of the resource (stop->start) > > 3/ the stop action fails because _monitor says the resource is crashed > > 4/ Pacemaker fence the node. > > > >> What did it return? > > > > Either OCF_ERR_GENERIC or OCF_FAILED_MASTER as instance. > > > >> Should you not fix it there? > > > > fixing this in the monitor action? This would bloat the code of this > > function. We would have to add a special code path in there to define if it > > is called as a real monitor action or just as a status one for other > > actions. > > > > But anyway, here or there, I would have to add this piece of code looking at > > each processes. According to you, is it safe enough? Do you see some hazard > > with it? > > > >> Just thinking out loud. > > > > Thank you, it helps :) > > It feels odd that there is a situation where monitor should return an > error (instead of "not running"), but stop should return OK. > > I think the question is whether the service can be considered cleanly > stopped at that point -- i.e. whether it's safe for another node to > become master, and safe to try starting the crashed service again on the > same node. > > If it's cleanly stopped, the monitor should probably return "not > running". Pacemaker will already compare that result against the > expected state, and recover appropriately if needed. 
From the old OCF dev guide, the advice is to do everything possible to stop the resource, even killing it: http://www.linux-ha.org/doc/dev-guides/_literal_stop_literal_action.html «It is important t
Re: [ClusterLabs] Coming in Pacemaker 1.1.17: start a node in standby
On Tue, 25 Apr 2017 10:02:21 +0200 Lars Ellenberg wrote: > On Mon, Apr 24, 2017 at 03:08:55PM -0500, Ken Gaillot wrote: > > Hi all, > > > > Pacemaker 1.1.17 will have a feature that people have occasionally asked > > for in the past: the ability to start a node in standby mode. > > > I seem to remember that at some deployment, > we set the node instance attribute standby=on, always, > and took it out of standby using the node_state transient_attribute :-) > > As in > # crm node standby ava > > > > ... > > > ... This solution seems much more elegant and obvious to me. A CLI (crm_standby?) interface would be ideal. It feels weird to mix setup interfaces (through crm_standby or through the config file) to manipulate the same node attribute. Isn't it possible to set the standby instance attribute of a node **before** it is added to the cluster? > # crm node status-attr ava set standby off >crm-debug-origin="do_update_resource" join="member" expected="member"> ... > > > ... > > > > It is not really straightforward to understand why you need to edit a second, different nvpair to exit the standby mode... :/
Re: [ClusterLabs] Coming in Pacemaker 1.1.17: start a node in standby
On Tue, 25 Apr 2017 10:33:13 +0200 Lars Ellenberg wrote: > On Tue, Apr 25, 2017 at 10:27:43AM +0200, Jehan-Guillaume de Rorthais wrote: > > On Tue, 25 Apr 2017 10:02:21 +0200 > > Lars Ellenberg wrote: > > > > > On Mon, Apr 24, 2017 at 03:08:55PM -0500, Ken Gaillot wrote: > > > > Hi all, > > > > > > > > Pacemaker 1.1.17 will have a feature that people have occasionally asked > > > > for in the past: the ability to start a node in standby mode. > > > > > > > > > I seem to remember that at some deployment, > > > we set the node instance attribute standby=on, always, > > > and took it out of standby using the node_state transient_attribute :-) > > > > > > As in > > > # crm node standby ava > > > > > > > > > > > > ... > > > > > > > > > ... > > > > This solution seems much more elegant and obvious to me. A cli > > (crm_standby?) interface would be ideal. > > > > It feels weird to mix setup interfaces (through crm_standby or through the > > config file) to manipulate the same node attribute. Isn't it possible to set > > the standby instance attribute of a node **before** it is added to the > > cluster? > > > # crm node status-attr ava set standby off > > >> > crm-debug-origin="do_update_resource" join="member" expected="member"> ... > > > > > > > > > ... > > > > > > > > > > > > > > > > It is not really straight forward to understand why you need to edit a > > second different nvpair to exit the standby mode... :/ > > Well, you want the "persistent" setting "on", > and override it with a "transient" setting "off". Quick questions: * is this what happens in the CIB when you call crm_standby? * is it possible to do the opposite: a persistent setting "off", overridden with a transient setting "on"? > That's how to do it in pacemaker. OK > But yes, what exactly has ever been "obvious" in pacemaker, > before you knew? :-)(or HA in general, to be fair) Sure, the more I dig, the more I learn about it...
Re: [ClusterLabs] Coming in Pacemaker 1.1.17: start a node in standby
On Thu, 27 Apr 2017 16:07:11 +0200 Lars Ellenberg wrote: > On Thu, Apr 27, 2017 at 09:19:55AM +0200, Jehan-Guillaume de Rorthais wrote: > > > > > I seem to remember that at some deployment, > > > > > we set the node instance attribute standby=on, always, > > > > > and took it out of standby using the node_state > > > > > transient_attribute :-) > > > > > > > > > > As in > > > > > # crm node standby ava > > > > > > # crm node status-attr ava set standby off > > > > Well, you want the "persistent" setting "on", > > > and override it with a "transient" setting "off". > > > Quick questions: > > > > * is it what happen in the CIB when you call crm_standby? > > crm_standby --node emma --lifetime reboot --update off > crm_standby --node emma --lifetime forever --update on > > (or -n -l -v, > default node is current node, > default lifetime is forever) > > > * is it possible to do the opposite? persistent setting "off" and > > override it with the transient setting? > > see above, also man crm_standby, > which again is only a wrapper around crm_attribute. Thank you!
Re: [ClusterLabs] How to check if a resource on a cluster node is really back on after a crash
Hi Ludovic, On Thu, 11 May 2017 22:00:12 +0200 Ludovic Vaugeois-Pepin wrote: > I translated the a Postgresql multi state RA (https://github.com/dalibo/PAF) > in Python (https://github.com/ulodciv/deploy_cluster), and I have been > editing it heavily. Could you please provide this feedback to the upstream project (or here :))? * what did you improve in PAF? * what did you change in PAF? * why did you translate PAF to Python? Any advantages? A lot of time and research has been dedicated to this project. PAF is a pure open source project. We would love some feedback and contributors to keep improving it. Do not hesitate to open issues on the PAF project if you need to discuss improvements. Regards, -- Jehan-Guillaume de Rorthais Dalibo
[ClusterLabs] PostgreSQL Automatic Failover (PAF) v2.2rc1 released
Hi all, The first v2.2 release candidate of the PAF resource agent for Pacemaker was released yesterday. The changelog since 2.1 includes: * new: support PostgreSQL 10 * new: add the maxlag parameter to exclude lagging slaves from promotion, by Thomas Reiss * new: support for multiple pgsqlms resources in the same cluster * new: provide comprehensive error messages to crm_mon * fix: follow the resource agent man page naming policy and section * fix: add documentation to the pgsqlms man page * fix: do not rely on crm_failcount, as suggested on the clusterlabs lists * misc: improve the RPM packaging * misc: check Pacemaker compatibility and resource setup * misc: improve the election process by including timeline comparison * misc: various code cleanup, factorization and module improvements You'll find the tarball, packages and details there: https://github.com/dalibo/PAF/releases/tag/v2.2_rc1 Any contribution, testing and feedback is appreciated and welcome. Regards, -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] Antw: Re: Antw: Is there a way to ignore a single monitoring timeout
On Fri, 01 Sep 2017 09:07:16 +0200 "Ulrich Windl" wrote: > >>> Klechomir wrote on 01.09.2017 at 08:48 in message > <9f043557-233d-6c1c-b46d-63f8c2ee5...@gmail.com>: > > Hi Ulrich, > > Have to disagree here. > > > > I have cases, when for an unknown reason a single monitoring request > > never returns result. > > So having bigger timeouts doesn't resolve this problem. > > But if your monitor hangs instead of giving a result, you also cannot ignore > the result that isn't there! OTOH: Isn't the operation timeout for monitors > that hang? If the monitor is killed, it returns an implicit status (it > failed). I agree. It seems to me the problem comes from either the resource agent or the resource itself. Presently, this issue bothers the cluster stack, but sooner or later it will blow up somewhere else. Track down where the issue comes from, and fix it. -- Jehan-Guillaume de Rorthais Dalibo
[ClusterLabs] Moving PAF to clusterlabs ?
Hi All, I am currently thinking about moving the RA PAF (PostgreSQL Automatic Failover) out of the Dalibo organisation on Github. Code and website. The point here is to encourage the community to contribute to the project (not only in code, but feedback, use, doc, etc), stating that this is a truly community-oriented project. Not just a beast from Dalibo for Dalibo. Moreover, this project already receives a significant amount of contributions from people outside of Dalibo. I would love to see it in the ClusterLabs organization, either as a specific project or directly in the resource-agents one. However, I'm not sure what it would require, how it might conflict with the existing pgsql RA, whether it should move to its own namespace, etc. I already explained some years ago why we built this RA instead of using the existing one. See thread [1]. Currently, PAF links are: * homepage: http://dalibo.github.io/PAF/documentation.html * lists: * pgsql-general https://www.postgresql.org/list/pgsql-general/ * users@clusterlabs.org * code: https://github.com/dalibo/PAF/ * issues: https://github.com/dalibo/PAF/issues * license: PostgreSQL (BSD-like) * language: perl * packages: RPM and Deb (available on github or included in the PGDG repository) Note that part of the project (some perl modules) might be pushed to resource-agents independently, see [2]. Two years later, I'm still around on this project. Obviously, I'll keep maintaining it on Dalibo's time and my personal time. Thoughts? [1] http://lists.clusterlabs.org/pipermail/developers/2015-August/66.html [2] http://lists.clusterlabs.org/pipermail/developers/2015-August/68.html Regards, -- Jehan-Guillaume de Rorthais Dalibo
[ClusterLabs] PostgreSQL Automatic Failover (PAF) v2.2.0
PostgreSQL Automatic Failover (PAF) v2.2.0 has been released on September 12th 2017 under the PostgreSQL licence. See: https://github.com/dalibo/PAF/releases/tag/v2.2.0 PAF is a PostgreSQL resource agent for Pacemaker. Its original aim is to keep a clear separation between Pacemaker administration and PostgreSQL administration, and to keep things simple, documented and yet powerful. This release features: * support for PostgreSQL 10 * a new "maxlag" parameter to exclude lagging slaves from promotion * the ability to deal with multiple PostgreSQL instances in the same cluster * comprehensive error messages directly in crm_mon! Source code and releases are available on github: * https://github.com/dalibo/PAF/ * https://github.com/dalibo/PAF/releases Documentation, procedures and community support as well: * http://dalibo.github.io/PAF/ * http://dalibo.github.io/PAF/documentation.html * https://github.com/dalibo/PAF/issues Please use the pgsql-gene...@postgresql.org or users@clusterlabs.org mailing lists if you have questions. Any feedback is welcome. Regards, -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] PostgreSQL Automatic Failover (PAF) v2.2.0
On Tue, 12 Sep 2017 08:02:00 -0700 Digimer wrote: > On 2017-09-12 07:48 AM, Jehan-Guillaume de Rorthais wrote: > > PostgreSQL Automatic Failover (PAF) v2.2.0 has been released on September > > 12th 2017 under the PostgreSQL licence. > > > > See: https://github.com/dalibo/PAF/releases/tag/v2.2.0 > > > > PAF is a PostgreSQL resource agent for Pacemaker. Its original aim is to > > keep it clear between the Pacemaker administration and the PostgreSQL one, > > to keep things simple, documented and yet powerful. > > > > This release features: > > > > * the support of PostgreSQL 10 > > * a new "maxlag" parameter to exclude lagging slaves from promotion > > * ability to deal with multiple PostgreSQL instances in the same cluster > > * comprehensive error messages directly in crm_mon! > > > > Source code and releases are available on github: > > > > * https://github.com/dalibo/PAF/ > > * https://github.com/dalibo/PAF/releases > > > > Documentation, procedures, community support as well: > > > > * http://dalibo.github.io/PAF/ > > * http://dalibo.github.io/PAF/documentation.html > > * https://github.com/dalibo/PAF/issues > > > > Please, use the pgsql-gene...@postgresql.org or users@clusterlabs.org > > mailing lists if you have questions. > > > > Any feedback is welcomed. > > > > Regards, > > Congrats! > > Planning to move this under the Clusterlabs github group? Yes! I'm not sure how long, and for how many answers, I should wait to reach a community agreement. But the first answers are encouraging :) Regards, -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] Moving PAF to clusterlabs ?
On Fri, 08 Sep 2017 22:41:47 +0200 Kristoffer Grönlund wrote: > Jehan-Guillaume de Rorthais writes: > > > Hi All, > > > > I am currently thinking about moving the RA PAF (PostgreSQL Automatic > > Failover) out of the Dalibo organisation on Github. Code and website. > > [snip] > > > Note that part of the project (some perl modules) might be pushed to > > resource-agents independently, see [2]. Two years after, I'm still around on > > this project. Obviously, I'll keep maintaining it on my Dalibo's and > > personal time. > > > > Thoughts? > > Hi, > > I for one would be happy to see it included in the resource-agents > repository. If people are worried about the additional dependency on > perl, we can just add a --without-perl flag (or something along those > lines) to the Makefile. You are talking about the modules, right? > We already have different agents for the same application but with > different contexts so this wouldn't be anything new. You are talking about integrating PAF itself now, right? :) Pushing PAF into resource-agents is challenging with regard to: * disabling perl from autoconf? * OCF_Directories.pm generation from autoconf? The current generation is an ugly hack around "$OCF_ROOT/lib/heartbeat/ocf-directories" * changing the RA script name? * moving the existing PAF documentation somewhere (currently hosted in gh pages) I'm not used to autoconf, but I can give it a try. Based on the feedback, I would either request a move of the repo to clusterlabs or create a new branch in resource-agents to start working on the integration. Thanks! -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] New website design and new-new logo
On Wed, 20 Sep 2017 21:25:51 -0400 Digimer wrote: > On 2017-09-20 07:53 PM, Ken Gaillot wrote: > > Hi everybody, > > > > We've started a major update of the ClusterLabs web design. The main > > goal (besides making it look more modern) is to make the top-level more > > about all ClusterLabs projects rather than Pacemaker-specific. It's > > also much more mobile-friendly. > > > > We've also updated our new logo -- Kristoffer Grönlund had a > > professional designer look at the one he created. I hope everyone likes > > the end result. It's simpler, cleaner and friendlier. Really nice, I like it! Thanks to both of you! > > Check it out at https://clusterlabs.org/ > > This is excellent! > > Can I recommend an additional category? It would be nice to have a > "Projects" link that provided a list of projects that fall under the > clusterlabs umbrella, with a brief blurb and a link to each. Speaking of another category, maybe it could reference some more community blogs? Or even add a planet? (planet.postgresql.org is pretty popular in the pgsql community). I'm sure you guys have some people around writing posts about news, features, tech previews, etc. Off the top of my head, I can think of RH, Suse, Linbit, Unixarena, Hastexo, Alteeve, ... ++
[ClusterLabs] Moving PAF to clusterlabs (was: PostgreSQL Automatic Failover (PAF) v2.2.0)
Hi All, Sorry, this discussion has spanned two different threads over time... Renaming back to the original subject. On Wed, 13 Sep 2017 08:03:14 -0700 Digimer wrote: > On 2017-09-13 07:15 AM, Jehan-Guillaume de Rorthais wrote: > > On Tue, 12 Sep 2017 08:02:00 -0700 > > Digimer wrote: > > > >> On 2017-09-12 07:48 AM, Jehan-Guillaume de Rorthais wrote: > >>> PostgreSQL Automatic Failover (PAF) v2.2.0 has been released on September > >>> 12th 2017 under the PostgreSQL licence. [...] > >> Congrats! > >> > >> Planning to move this under the Clusterlabs github group? > > > > Yes! > > > > I'm not sure how long and how many answers I should wait for to reach a > > community agreement. But first answers are encouraging :) > > We chatted about this at the Summit last week, and the only real concern > about adding new projects was having some confidence the program won't > be abandoned and having a way to remove it, if so. So I don't see an issue. > > Andrew, who has ability to add projects? Is there anything I can do on my side to help move this forward? In the «Danger zone» of the project on github, I have the following: « Transfer ownership Transfer this repository to another user or to an organization where you have the ability to create repositories. » However, the docs go further and say I can initiate the transfer, and the new owner will receive an email to acknowledge it within the next 24h. See: https://help.github.com/articles/about-repository-transfers/ So, if we agree to move the project, when should I initiate the transfer? Cheers, -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] PostgreSQL Automatic Failover (PAF) v2.2.0
On Thu, 5 Oct 2017 19:04:52 +0200 Valentin Vidic wrote: > On Tue, Sep 12, 2017 at 04:48:19PM +0200, Jehan-Guillaume de Rorthais wrote: > > PostgreSQL Automatic Failover (PAF) v2.2.0 has been released on September > > 12th 2017 under the PostgreSQL licence. > > > > See: https://github.com/dalibo/PAF/releases/tag/v2.2.0 > > > > PAF is a PostgreSQL resource agent for Pacemaker. Its original aim is to > > keep it clear between the Pacemaker administration and the PostgreSQL one, > > to keep things simple, documented and yet powerful. > > Do you think it might be possible to integrate the PostgreSQL > replication with pgbouncer for a transparent failover? The idea > would be to pause the clients in pgbouncer while moving the > replication master so no queries would fail. It doesn't seem impossible, however I'm not sure of the complexity around this. You would have to either hack PAF to detect failover/migration, or create a new RA that would always be part of the transition, requiring your PAF RA to tell whether it is moving elsewhere or not. It feels like the complexity is quite high and would require some expert advice about Pacemaker internals to avoid wrong or unrelated behaviors or race conditions. But, before going further, you need to realize a failover will never be transparent. Especially one that would trigger randomly, outside of your control. Even if you can pause (or suspend) client activity, their old sessions on the old master are lost, including their running transactions, prepared transactions, prepared queries, session parameters, cursors, portals, etc. It could be surprising to receive an error from the backend when you commit a nonexistent transaction. Even without all these features, in a very trivial OLTP context, it could be surprising if you use asynchronous replication and a couple of transactions are missing on the new master. Nonetheless, the project seems interesting.
-- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] PostgreSQL Automatic Failover (PAF) v2.2.0
On Thu, 5 Oct 2017 21:24:36 +0200 Valentin Vidic wrote: > On Thu, Oct 05, 2017 at 08:55:59PM +0200, Jehan-Guillaume de Rorthais wrote: > > It doesn't seems impossible, however I'm not sure of the complexity around > > this. > > > > You would have to either hack PAF and detect failover/migration or create a > > new RA that would always be part of the transition implying your PAF RA to > > define if it is moving elsewhere or not. > > > > It feels the complexity is quite high and would require some expert advices > > about Pacemaker internals to avoid wrong or unrelated behaviors or race > > conditions. > > > > But, before going farther, you need to realize a failover will never be > > transparent. Especially one that would trigger randomly outside of your > > control. > > Yes, I was thinking more about manual failover, for example to upgrade > the postgresql master. RA for pgbouncer would wait for all active > queries to finish and queue all new queries. Once there is nothing > running on the master anymore, another slave is activated and pgbouncer > would than resume queries there. OK. Then for a manual and controlled switchover, I suppose the best option is to keep things simple and add two more steps to your blueprint: * one to pause the client connections before the "pcs resource move --master --wait [node]" * one to resume them as soon as the "pcs resource move" finishes. Obviously, this could be scripted to make the controls, checks and actions faster.
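The two extra steps above can be put together in a small procedure. This is only a hypothetical sketch: it assumes pgbouncer's admin console listening on port 6432 and a PAF master resource named "pgsql-ha" — both names are placeholders, not from the thread.

```shell
# Pause client activity: pgbouncer lets active queries finish and
# queues new ones.
psql -h 127.0.0.1 -p 6432 -U pgbouncer -d pgbouncer -c "PAUSE;"

# Move the master role and wait for the transition to finish.
pcs resource move pgsql-ha target-node --master --wait

# Repoint pgbouncer at the new master if needed, then resume the
# queued clients.
psql -h 127.0.0.1 -p 6432 -U pgbouncer -d pgbouncer -c "RESUME;"

# Don't forget to remove the location constraint created by "move".
pcs resource clear pgsql-ha
```

Each step can fail independently, so a real script should check return codes before moving on to the next command.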
Re: [ClusterLabs] Antw: Re: questions about startup fencing
On Fri, 01 Dec 2017 16:34:08 -0600 Ken Gaillot wrote: > On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote: > > > > > > > Kristoffer Gronlund wrote: > > > > Adam Spiers writes: > > > > > > > > > - The whole cluster is shut down cleanly. > > > > > > > > > > - The whole cluster is then started up again. (Side question: > > > > > what > > > > > happens if the last node to shut down is not the first to > > > > > start up? > > > > > How will the cluster ensure it has the most recent version of > > > > > the > > > > > CIB? Without that, how would it know whether the last man > > > > > standing > > > > > was shut down cleanly or not?) > > > > > > > > This is my opinion, I don't really know what the "official" > > > > pacemaker > > > > stance is: There is no such thing as shutting down a cluster > > > > cleanly. A > > > > cluster is a process stretching over multiple nodes - if they all > > > > shut > > > > down, the process is gone. When you start up again, you > > > > effectively have > > > > a completely new cluster. > > > > > > Sorry, I don't follow you at all here. When you start the cluster > > > up > > > again, the cluster config from before the shutdown is still there. > > > That's very far from being a completely new cluster :-) > > > > The problem is you cannot "start the cluster" in pacemaker; you can > > only "start nodes". The nodes will come up one by one. As opposed (as > > I had said) to HP Sertvice Guard, where there is a "cluster formation > > timeout". That is, the nodes wait for the specified time for the > > cluster to "form". Then the cluster starts as a whole. Of course that > > only applies if the whole cluster was down, not if a single node was > > down. > > I'm not sure what that would specifically entail, but I'm guessing we > have some of the pieces already: > > - Corosync has a wait_for_all option if you want the cluster to be > unable to have quorum at start-up until every node has joined. 
I don't > think you can set a timeout that cancels it, though. > > - Pacemaker will wait dc-deadtime for the first DC election to > complete. (if I understand it correctly ...) > > - Higher-level tools can start or stop all nodes together (e.g. pcs has > pcs cluster start/stop --all). Based on this discussion, I have some questions about pcs: * how is it shutting down the cluster when issuing "pcs cluster stop --all"? * is any race condition possible where the CIB would record only one node up before the last one shuts down? * will the cluster start safely? IIRC, crmsh does not implement the full cluster shutdown, only one node shut down at a time. Is it because Pacemaker has no way to shut down the whole cluster by stopping all resources everywhere, forbidding failovers in the process? Is it required to include a bunch of "pcs resource disable " before shutting down the cluster? Thanks, -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] Antw: Re: questions about startup fencing
On Mon, 4 Dec 2017 12:31:06 +0100 Tomas Jelinek wrote: > Dne 4.12.2017 v 10:36 Jehan-Guillaume de Rorthais napsal(a): > > On Fri, 01 Dec 2017 16:34:08 -0600 > > Ken Gaillot wrote: > > > >> On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote: > >>> > >>> > >>>> Kristoffer Gronlund wrote: > >>>>> Adam Spiers writes: > >>>>> > >>>>>> - The whole cluster is shut down cleanly. > >>>>>> > >>>>>> - The whole cluster is then started up again. (Side question: > >>>>>> what > >>>>>> happens if the last node to shut down is not the first to > >>>>>> start up? > >>>>>> How will the cluster ensure it has the most recent version of > >>>>>> the > >>>>>> CIB? Without that, how would it know whether the last man > >>>>>> standing > >>>>>> was shut down cleanly or not?) > >>>>> > >>>>> This is my opinion, I don't really know what the "official" > >>>>> pacemaker > >>>>> stance is: There is no such thing as shutting down a cluster > >>>>> cleanly. A > >>>>> cluster is a process stretching over multiple nodes - if they all > >>>>> shut > >>>>> down, the process is gone. When you start up again, you > >>>>> effectively have > >>>>> a completely new cluster. > >>>> > >>>> Sorry, I don't follow you at all here. When you start the cluster > >>>> up > >>>> again, the cluster config from before the shutdown is still there. > >>>> That's very far from being a completely new cluster :-) > >>> > >>> The problem is you cannot "start the cluster" in pacemaker; you can > >>> only "start nodes". The nodes will come up one by one. As opposed (as > >>> I had said) to HP Sertvice Guard, where there is a "cluster formation > >>> timeout". That is, the nodes wait for the specified time for the > >>> cluster to "form". Then the cluster starts as a whole. Of course that > >>> only applies if the whole cluster was down, not if a single node was > >>> down. 
> >> > >> I'm not sure what that would specifically entail, but I'm guessing we > >> have some of the pieces already: > >> > >> - Corosync has a wait_for_all option if you want the cluster to be > >> unable to have quorum at start-up until every node has joined. I don't > >> think you can set a timeout that cancels it, though. > >> > >> - Pacemaker will wait dc-deadtime for the first DC election to > >> complete. (if I understand it correctly ...) > >> > >> - Higher-level tools can start or stop all nodes together (e.g. pcs has > >> pcs cluster start/stop --all). > > > > Based on this discussion, I have some questions about pcs: > > > > * how is it shutting down the cluster when issuing "pcs cluster stop > > --all"? > > First, it sends a request to each node to stop pacemaker. The requests > are sent in parallel which prevents resources from being moved from node > to node. Once pacemaker stops on all nodes, corosync is stopped on all > nodes in the same manner. What if, for some external reason (load, network, whatever), one node is slower than the others and starts reacting late? Sending queries in parallel doesn't feel safe enough in regard to all the race conditions that can occur at the same time. Am I missing something?
Re: [ClusterLabs] Antw: Re: questions about startup fencing
On Mon, 4 Dec 2017 16:50:47 +0100 Tomas Jelinek wrote: > Dne 4.12.2017 v 14:21 Jehan-Guillaume de Rorthais napsal(a): > > On Mon, 4 Dec 2017 12:31:06 +0100 > > Tomas Jelinek wrote: > > > >> Dne 4.12.2017 v 10:36 Jehan-Guillaume de Rorthais napsal(a): > >>> On Fri, 01 Dec 2017 16:34:08 -0600 > >>> Ken Gaillot wrote: > >>> > >>>> On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote: > >>>>> > >>>>> > >>>>>> Kristoffer Gronlund wrote: > >>>>>>> Adam Spiers writes: > >>>>>>> > >>>>>>>> - The whole cluster is shut down cleanly. > >>>>>>>> > >>>>>>>> - The whole cluster is then started up again. (Side question: > >>>>>>>> what > >>>>>>>> happens if the last node to shut down is not the first to > >>>>>>>> start up? > >>>>>>>> How will the cluster ensure it has the most recent version of > >>>>>>>> the > >>>>>>>> CIB? Without that, how would it know whether the last man > >>>>>>>> standing > >>>>>>>> was shut down cleanly or not?) > >>>>>>> > >>>>>>> This is my opinion, I don't really know what the "official" > >>>>>>> pacemaker > >>>>>>> stance is: There is no such thing as shutting down a cluster > >>>>>>> cleanly. A > >>>>>>> cluster is a process stretching over multiple nodes - if they all > >>>>>>> shut > >>>>>>> down, the process is gone. When you start up again, you > >>>>>>> effectively have > >>>>>>> a completely new cluster. > >>>>>> > >>>>>> Sorry, I don't follow you at all here. When you start the cluster > >>>>>> up > >>>>>> again, the cluster config from before the shutdown is still there. > >>>>>> That's very far from being a completely new cluster :-) > >>>>> > >>>>> The problem is you cannot "start the cluster" in pacemaker; you can > >>>>> only "start nodes". The nodes will come up one by one. As opposed (as > >>>>> I had said) to HP Sertvice Guard, where there is a "cluster formation > >>>>> timeout". That is, the nodes wait for the specified time for the > >>>>> cluster to "form". Then the cluster starts as a whole. 
Of course that > >>>>> only applies if the whole cluster was down, not if a single node was > >>>>> down. > >>>> > >>>> I'm not sure what that would specifically entail, but I'm guessing we > >>>> have some of the pieces already: > >>>> > >>>> - Corosync has a wait_for_all option if you want the cluster to be > >>>> unable to have quorum at start-up until every node has joined. I don't > >>>> think you can set a timeout that cancels it, though. > >>>> > >>>> - Pacemaker will wait dc-deadtime for the first DC election to > >>>> complete. (if I understand it correctly ...) > >>>> > >>>> - Higher-level tools can start or stop all nodes together (e.g. pcs has > >>>> pcs cluster start/stop --all). > >>> > >>> Based on this discussion, I have some questions about pcs: > >>> > >>> * how is it shutting down the cluster when issuing "pcs cluster stop > >>> --all"? > >> > >> First, it sends a request to each node to stop pacemaker. The requests > >> are sent in parallel which prevents resources from being moved from node > >> to node. Once pacemaker stops on all nodes, corosync is stopped on all > >> nodes in the same manner. > > > > What if for some external reasons one node is slower (load, network, > > whatever) than the others and start reacting ? Sending queries in parallel > > doesn't feels safe enough in regard with all the race conditions that can > > occurs in the same time. > > > > Am I missing something ? > > > > If a node gets the request later than others, some resources may be > moved to it before it starts shutting down pacemaker as well. Pcs waits > for all nodes to shutdown pacemaker before it moves to shutting down > corosync. This way, quorum is maintained the whole time pacemaker is > shutting down and therefore no services are blocked from stopping due to > lack of quorum. OK, so if admins or RAs expect to start in the same conditions the cluster was shut down in, we have to take care of the shutdown ourselves by hand.
Disabling the resources before shutting down might be the best option in this situation, as the CRM will take care of switching things off correctly in a proper transition. That's fine by me, as a cluster shutdown should be part of a controlled procedure. I suppose I have to update my online docs now. Thank you for your answers! -- Jehan-Guillaume de Rorthais Dalibo
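The controlled procedure described above could be sketched as follows. This is an illustrative sketch only: "pgsql-ha" is a placeholder multistate resource name, not something defined in the thread.

```shell
# Stop the resource everywhere in one proper transition, with no
# failover attempted in between.
pcs resource disable pgsql-ha --wait

# Only then stop the cluster stack on all nodes.
pcs cluster stop --all

# On the next full start, bring all nodes up first, then re-enable
# the resource so it starts under known conditions.
pcs cluster start --all
pcs resource enable pgsql-ha
```

The `--wait` on the disable step makes pcs block until the transition has actually completed, so the cluster stack is only stopped once all resources are down.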
Re: [ClusterLabs] Antw: Re: questions about startup fencing
On Tue, 5 Dec 2017 10:05:03 +0100 Tomas Jelinek wrote: > Dne 4.12.2017 v 17:21 Jehan-Guillaume de Rorthais napsal(a): > > On Mon, 4 Dec 2017 16:50:47 +0100 > > Tomas Jelinek wrote: > > > >> Dne 4.12.2017 v 14:21 Jehan-Guillaume de Rorthais napsal(a): > >>> On Mon, 4 Dec 2017 12:31:06 +0100 > >>> Tomas Jelinek wrote: > >>> > >>>> Dne 4.12.2017 v 10:36 Jehan-Guillaume de Rorthais napsal(a): > >>>>> On Fri, 01 Dec 2017 16:34:08 -0600 > >>>>> Ken Gaillot wrote: > >>>>> > >>>>>> On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote: > >>>>>>> > >>>>>>> > >>>>>>>> Kristoffer Gronlund wrote: > >>>>>>>>> Adam Spiers writes: > >>>>>>>>> > >>>>>>>>>> - The whole cluster is shut down cleanly. > >>>>>>>>>> > >>>>>>>>>> - The whole cluster is then started up again. (Side question: > >>>>>>>>>> what > >>>>>>>>>> happens if the last node to shut down is not the first to > >>>>>>>>>> start up? > >>>>>>>>>> How will the cluster ensure it has the most recent version of > >>>>>>>>>> the > >>>>>>>>>> CIB? Without that, how would it know whether the last man > >>>>>>>>>> standing > >>>>>>>>>> was shut down cleanly or not?) > >>>>>>>>> > >>>>>>>>> This is my opinion, I don't really know what the "official" > >>>>>>>>> pacemaker > >>>>>>>>> stance is: There is no such thing as shutting down a cluster > >>>>>>>>> cleanly. A > >>>>>>>>> cluster is a process stretching over multiple nodes - if they all > >>>>>>>>> shut > >>>>>>>>> down, the process is gone. When you start up again, you > >>>>>>>>> effectively have > >>>>>>>>> a completely new cluster. > >>>>>>>> > >>>>>>>> Sorry, I don't follow you at all here. When you start the cluster > >>>>>>>> up > >>>>>>>> again, the cluster config from before the shutdown is still there. > >>>>>>>> That's very far from being a completely new cluster :-) > >>>>>>> > >>>>>>> The problem is you cannot "start the cluster" in pacemaker; you can > >>>>>>> only "start nodes". The nodes will come up one by one. 
As opposed (as > >>>>>>> I had said) to HP Sertvice Guard, where there is a "cluster formation > >>>>>>> timeout". That is, the nodes wait for the specified time for the > >>>>>>> cluster to "form". Then the cluster starts as a whole. Of course that > >>>>>>> only applies if the whole cluster was down, not if a single node was > >>>>>>> down. > >>>>>> > >>>>>> I'm not sure what that would specifically entail, but I'm guessing we > >>>>>> have some of the pieces already: > >>>>>> > >>>>>> - Corosync has a wait_for_all option if you want the cluster to be > >>>>>> unable to have quorum at start-up until every node has joined. I don't > >>>>>> think you can set a timeout that cancels it, though. > >>>>>> > >>>>>> - Pacemaker will wait dc-deadtime for the first DC election to > >>>>>> complete. (if I understand it correctly ...) > >>>>>> > >>>>>> - Higher-level tools can start or stop all nodes together (e.g. pcs has > >>>>>> pcs cluster start/stop --all). > >>>>> > >>>>> Based on this discussion, I have some questions about pcs: > >>>>> > >>>>> * how is it shutting down the cluster when issuing "pcs cluster stop > >>>>> --all"? > >>>> > >>>> First, it sends a request to
Re: [ClusterLabs] Antw: Re: Antw: Re: questions about startup fencing
On Tue, 05 Dec 2017 08:59:55 -0600 Ken Gaillot wrote: > On Tue, 2017-12-05 at 14:47 +0100, Ulrich Windl wrote: > > > > > Tomas Jelinek schrieb am 04.12.2017 um > > > > > 16:50 in Nachricht > > > > <3e60579c-0f4d-1c32-70fc-d207e0654...@redhat.com>: > > > Dne 4.12.2017 v 14:21 Jehan-Guillaume de Rorthais napsal(a): > > > > On Mon, 4 Dec 2017 12:31:06 +0100 > > > > Tomas Jelinek wrote: > > > > > > > > > Dne 4.12.2017 v 10:36 Jehan-Guillaume de Rorthais napsal(a): > > > > > > On Fri, 01 Dec 2017 16:34:08 -0600 > > > > > > Ken Gaillot wrote: > > > > > > > > > > > > > On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Kristoffer Gronlund wrote: > > > > > > > > > > Adam Spiers writes: > > > > > > > > > > > > > > > > > > > > > - The whole cluster is shut down cleanly. > > > > > > > > > > > > > > > > > > > > > > - The whole cluster is then started up > > > > > > > > > > > again. (Side question: > > > > > > > > > > > what > > > > > > > > > > > happens if the last node to shut down is not > > > > > > > > > > > the first to > > > > > > > > > > > start up? > > > > > > > > > > > How will the cluster ensure it has the most > > > > > > > > > > > recent version of > > > > > > > > > > > the > > > > > > > > > > > CIB? Without that, how would it know whether > > > > > > > > > > > the last man > > > > > > > > > > > standing > > > > > > > > > > > was shut down cleanly or not?) > > > > > > > > > > > > > > > > > > > > This is my opinion, I don't really know what the > > > > > > > > > > "official" > > > > > > > > > > pacemaker > > > > > > > > > > stance is: There is no such thing as shutting down a > > > > > > > > > > cluster > > > > > > > > > > cleanly. A > > > > > > > > > > cluster is a process stretching over multiple nodes - > > > > > > > > > > if they all > > > > > > > > > > shut > > > > > > > > > > down, the process is gone. 
When you start up again, > > > > > > > > > > you > > > > > > > > > > effectively have > > > > > > > > > > a completely new cluster. > > > > > > > > > > > > > > > > > > Sorry, I don't follow you at all here. When you start > > > > > > > > > the cluster > > > > > > > > > up > > > > > > > > > again, the cluster config from before the shutdown is > > > > > > > > > still there. > > > > > > > > > That's very far from being a completely new cluster :-) > > > > > > > > > > > > > > > > The problem is you cannot "start the cluster" in > > > > > > > > pacemaker; you can > > > > > > > > only "start nodes". The nodes will come up one by one. As > > > > > > > > opposed (as > > > > > > > > I had said) to HP Sertvice Guard, where there is a > > > > > > > > "cluster formation > > > > > > > > timeout". That is, the nodes wait for the specified time > > > > > > > > for the > > > > > > > > cluster to "form". Then the cluster starts as a whole. Of > > > > > > > > course that > > > > > > > > only applies if the whole cluster was down, not if a > > > > > > > > single node was > > > > > > > > down. > > > > > > > > > > > > > > I'm not sure what that would specifically entail, but I'
Re: [ClusterLabs] Does anyone use clone instance constraints from pacemaker-next schema?
On Wed, 10 Jan 2018 12:23:59 -0600 Ken Gaillot wrote: ... > My question is: has anyone used or tested this, or is anyone interested > in this? We won't promote it to the default schema unless it is tested. > > My feeling is that it is more likely to be confusing than helpful, and > there are probably ways to achieve any reasonable use case with > existing syntax. For what it's worth, I tried to implement such a solution to dispatch multiple IP addresses to slaves in a 1-master 2-slaves cluster. It is quite time-consuming to wrap one's head around the side effects of colocation, scores and stickiness. My various tests show everything seems to behave correctly now, but I don't feel really 100% confident about my setup. I agree that there are ways to achieve such a use case with existing syntax. But this is quite confusing as well. For instance, I experienced a master relocation when messing with a slave to make sure its IP would move to the other slave node... I don't remember exactly what my error was, but I could easily dig for it if needed. I feel like it fits in the same area as the usability of Pacemaker: making it easier to understand. See the recent discussion around the gocardless war story. My tests were mostly for lab, demo and tutorial purposes. I don't have a specific field use case. But if at some point this feature is officially promoted as a preview, I'll give it some testing and report here (provided I'm actually aware some feedback is requested ;)).
Re: [ClusterLabs] Changes coming in Pacemaker 2.0.0
On Wed, 10 Jan 2018 16:10:50 -0600 Ken Gaillot wrote: > Pacemaker 2.0 will be a major update whose main goal is to remove > support for deprecated, legacy syntax, in order to make the code base > more maintainable into the future. There will also be some changes to > default configuration behavior, and the command-line tools. > > I'm hoping to release the first release candidate in the next couple of > weeks. Great news! Congrats. > We'll have a longer than usual rc phase to allow for plenty of > testing. > > A thoroughly detailed list of changes will be maintained on the > ClusterLabs wiki: > > https://wiki.clusterlabs.org/wiki/Pacemaker_2.0_Changes > > These changes are not final, and we can restore functionality if there > is a strong need for it. Most user-visible changes are complete (in the > 2.0 branch on github); major changes are still expected, but primarily > to the C API. > > Some highlights: > > * Only Corosync version 2 will be supported as the underlying cluster > layer. Support for Heartbeat and Corosync 1 is removed. (Support for > the new kronosnet layer will be added in a future version.) I thought (according to some conference slides from Sept 2017) knet was mostly related to corosync directly. Is there some visible impact on Pacemaker too?
Re: [ClusterLabs] Antw: Re: Antw: Changes coming in Pacemaker 2.0.0
On Thu, 11 Jan 2018 18:32:35 +0300 Andrei Borzenkov wrote: > On Thu, Jan 11, 2018 at 2:52 PM, Ulrich Windl > wrote: > > > > > Andrei Borzenkov schrieb am 11.01.2018 um 12:41 > in > > Nachricht > > : > >> On Thu, Jan 11, 2018 at 10:54 AM, Ulrich Windl > >> wrote: > >>> Hi! > >>> > >>> On the tool changes, I'd prefer --move and --un-move as pair over --move > >>> and --clear > >> ("clear" is less expressive IMHO). > >> > >> --un-move is really wrong semantically. You do not "unmove" resource - > >> you "clear" constraints that were created. Whether this actually > >> results in any "movement" is unpredictable (easily). > > > > You undo what "move" does: "un-move". With your argument, "move" is just as > > bad: Why not "--forbid-host" and "--allow-host" then? > > That would be less confusing as it sounds more declarative and matches > what actually happens - setting configuration parameter instead of > initiating some action. For what it's worth, while using crmsh, I always have to explain to people or customers that: * we should issue an "unmigrate" to remove the constraint as soon as the resource can get back to the original node, or get off the current node if needed (depending on the -inf or +inf location constraint issued) * this will not migrate the resource back if it's sticky enough on the current node. See: http://clusterlabs.github.io/PAF/Debian-8-admin-cookbook.html#swapping-master-and-slave-roles-between-nodes This is counter-intuitive, indeed. I prefer the pcs interface using the move/clear actions.
Re: [ClusterLabs] Antw: Re: Antw: Changes coming in Pacemaker 2.0.0
On Thu, 11 Jan 2018 17:04:35 +0100 Kristoffer Grönlund wrote: > Jehan-Guillaume de Rorthais writes: > > > > > For what is worth, while using crmsh, I always have to explain to > > people or customers that: > > > > * we should issue an "unmigrate" to remove the constraint as soon as the > > resource can get back to the original node or get off the current node if > > needed (depending on the -inf or +inf constraint location issued) > > * this will not migrate back the resource if it's sticky enough on the > > current node. > > > > See: > > http://clusterlabs.github.io/PAF/Debian-8-admin-cookbook.html#swapping-master-and-slave-roles-between-nodes > > > > This is counter-intuitive, indeed. I prefer the pcs interface using > > the move/clear actions. > > No need! You can use crm rsc move / crm rsc clear. In fact, "unmove" is > just a backwards-compatibility alias for clear in crmsh. Thanks for the tip! I'll update the docs next week.
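For reference, the equivalent command pairs discussed in this subthread can be laid out side by side (resource and node names are placeholders):

```shell
# pcs: move, then drop the location constraint that "move" created
pcs resource move dummy node2
pcs resource clear dummy

# crmsh: same semantics; "unmove"/"unmigrate" are kept only as
# backwards-compatibility aliases for "clear"
crm resource move dummy node2
crm resource clear dummy
```

In both tools, "clear" only removes the constraint; whether the resource actually moves back afterwards depends on its stickiness on the current node.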
Re: [ClusterLabs] Does anyone use clone instance constraints from pacemaker-next schema?
On Thu, 11 Jan 2018 12:00:25 -0600 Ken Gaillot wrote: > On Thu, 2018-01-11 at 20:11 +0300, Andrei Borzenkov wrote: > > 11.01.2018 19:21, Ken Gaillot пишет: > > > On Thu, 2018-01-11 at 01:16 +0100, Jehan-Guillaume de Rorthais > > > wrote: > > > > On Wed, 10 Jan 2018 12:23:59 -0600 > > > > Ken Gaillot wrote: > > > > ... > > > > > My question is: has anyone used or tested this, or is anyone > > > > > interested > > > > > in this? We won't promote it to the default schema unless it is > > > > > tested. > > > > > > > > > > My feeling is that it is more likely to be confusing than > > > > > helpful, > > > > > and > > > > > there are probably ways to achieve any reasonable use case with > > > > > existing syntax. > > > > > > > > For what it worth, I tried to implement such solution to dispatch > > > > mulitple > > > > IP addresses to slaves in a 1 master 2 slaves cluster. This is > > > > quite > > > > time > > > > consuming to wrap its head around sides effects with colocation, > > > > scores and > > > > stickiness. My various tests shows everything sounds to behave > > > > correctly now, > > > > but I don't feel really 100% confident about my setup. > > > > > > > > I agree that there are ways to achieve such a use case with > > > > existing > > > > syntax. > > > > But this is quite confusing as well. As instance, I experienced a > > > > master > > > > relocation when messing with a slave to make sure its IP would > > > > move > > > > to the > > > > other slave node...I don't remember exactly what was my error, > > > > but I > > > > could > > > > easily dig for it if needed. > > > > > > > > I feel like it fits in the same area that the usability of > > > > Pacemaker. > > > > Making it > > > > easier to understand. See the recent discussion around the > > > > gocardless > > > > war story. > > > > > > > > My tests was mostly for labs, demo and tutorial purpose. I don't > > > > have > > > > a > > > > specific field use case. 
But if at some point this feature is > > > > promoted > > > > officially as preview, I'll give it some testing and report here > > > > (barring the > > > > fact I'm actually aware some feedback are requested ;)). > > > > > > It's ready to be tested now -- just do this: > > > > > > cibadmin --upgrade > > > cibadmin --modify --xml-text '<cib validate-with="pacemaker-next"/>' > > > > > > Then use constraints like: > > > > > > <rsc_colocation ... rsc="rsc1" > > > with-rsc="clone1" with-rsc-instance="1" /> > > > > > > <rsc_colocation ... rsc="rsc2" > > > with-rsc="clone1" with-rsc-instance="2" /> > > > > > > to colocate rsc1 and rsc2 with separate instances of clone1. There > > > is > > > no way to know *which* instance of clone1 will be 1, 2, etc.; this > > > just > > > allows you to ensure the colocations are separate. > > > > > > > Is it possible to designate master/slave as well? > > If you mean constrain one resource to the master, and a bunch of other > resources to the slaves, then no, this new syntax doesn't support that. > But it should be possible with existing syntax, by constraining with > role=master or role=slave, then anticolocating the resources with each > other. > Oh, wait, this is a deal breaker then... This was exactly my use case: * giving a specific IP address to the master * providing various IP addresses to slaves I suppose I'm stuck with the existing syntax then.
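The "existing syntax" approach Ken describes — role-based colocations plus an anticolocation between the slave IPs — could be sketched with pcs roughly like this. All resource names and the -1000 score are illustrative assumptions, not from the thread.

```shell
# Tie one IP to the master role of the multistate resource.
pcs constraint colocation add master-ip with master pgsql-ha INFINITY

# Tie the other IPs to slave instances.
pcs constraint colocation add slave-ip1 with slave pgsql-ha INFINITY
pcs constraint colocation add slave-ip2 with slave pgsql-ha INFINITY

# Anticolocate the slave IPs with each other so each slave node gets
# at most one; a finite negative score (rather than -INFINITY) still
# allows both IPs on one node if only one slave survives.
pcs constraint colocation add slave-ip1 with slave-ip2 -1000
```

As discussed earlier in the thread, getting the interaction of these scores with stickiness right takes careful testing; a change on one slave can otherwise ripple into an unwanted master relocation.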
Re: [ClusterLabs] Changes coming in Pacemaker 2.0.0
On Mon, 15 Jan 2018 11:05:52 -0600 Ken Gaillot wrote: > On Thu, 2018-01-11 at 10:24 -0600, Ken Gaillot wrote: > > On Thu, 2018-01-11 at 01:21 +0100, Jehan-Guillaume de Rorthais wrote: > > > On Wed, 10 Jan 2018 16:10:50 -0600 > > > Ken Gaillot wrote: > > > > > > > Pacemaker 2.0 will be a major update whose main goal is to remove > > > > support for deprecated, legacy syntax, in order to make the code > > > > base > > > > more maintainable into the future. There will also be some > > > > changes > > > > to > > > > default configuration behavior, and the command-line tools. > > > > > > > > I'm hoping to release the first release candidate in the next > > > > couple of > > > > weeks. > > > > > > Great news! Congrats. > > > > > > > We'll have a longer than usual rc phase to allow for plenty of > > > > testing. > > > > > > > > A thoroughly detailed list of changes will be maintained on the > > > > ClusterLabs wiki: > > > > > > > > https://wiki.clusterlabs.org/wiki/Pacemaker_2.0_Changes > > > > > > > > These changes are not final, and we can restore functionality if > > > > there > > > > is a strong need for it. Most user-visible changes are complete > > > > (in > > > > the > > > > 2.0 branch on github); major changes are still expected, but > > > > primarily > > > > to the C API. > > > > > > > > Some highlights: > > > > > > > > * Only Corosync version 2 will be supported as the underlying > > > > cluster > > > > layer. Support for Heartbeat and Corosync 1 is removed. (Support > > > > for > > > > the new kronosnet layer will be added in a future version.) > > > > > > I thought (according to some conference slides from sept 2017) knet > > > was mostly > > > related to corosync directly? Is there some visible impact on > > > Pacemaker too? > > > > You're right -- it's more accurate to say that corosync 3 will > > support > > knet, and I'm not yet aware whether the corosync 3 API will require > > any > > changes in Pacemaker. > > Good news! 
The corosync developers say that knet support will be > completely transparent to pacemaker, so pacemaker already supports > corosync 3. :-) Perfect! Thanks for the confirmation and feedback :) ++ ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Opinions wanted: another logfile question for Pacemaker 2.0
On Mon, 15 Jan 2018 11:19:27 -0600 Ken Gaillot wrote: > On Mon, 2018-01-15 at 18:08 +0100, Klaus Wenninger wrote: > > On 01/15/2018 05:51 PM, Ken Gaillot wrote: > > > Currently, Pacemaker will use the same detail log as corosync if > > > one is > > > specified (as "logfile:" in the "logging {...}" section of > > > corosync.conf). > > > > > > The corosync developers think that is a bad idea, and would like > > > pacemaker 2.0 to always use its own log. > > > > > > Corosync and pacemaker both use libqb to write to the logfile. > > > libqb > > > doesn't have any locking mechanism, so there could theoretically be > > > some conflicting writes, though we don't see any in practice. > > > > > > Does anyone have a strong opinion on this one way or the other? Do > > > you > > > like having pacemaker and corosync detail messages in one logfile, > > > or > > > would you prefer separate logfiles? > > > > I'm aware that a log-entry from one source (like corosync) appearing > > before an entry from another source (like pacemaker) doesn't > > necessarily > > mean that this correctly reflects their sequence in time but usually > > it is working fairly well. > > With timestamps of 1 second granularity in 2 files we would be > > definitely > > off worse. > > Please correct me if timestamping is configurable already but if not > > I would say we should either have at least the possibility to log > > into > > a single file or we should have timestamping with a granularity at > > least 3 magnitudes finer. (configurable timestamps as in pacemaker- > > alerts > > might be a solution) > > > > Regards, > > Klaus > > Configurable timestamps (or even just switching to high-precision > timestamps permanently) sounds great. However that would require > changes in libqb. > > You can already get configurable timestamps in the syslog, via the > native syslog functionality. That is more reliable than relying on the > corosync vs pacemaker ordering in the detail log. 
> > Better support in libqb for writing to the same logfile might be > interesting, but it would require both high-precision timestamps and > locking, which could impact performance. I'd prefer high-precision > timestamps in separate files. Log files are quite difficult for novices to understand, especially figuring out which piece of the stack is talking compared to the other ones around it. Moreover, it's not natural to explain that logs from Pacemaker are written to the corosync.log file on some distros. Splitting the Corosync and Pacemaker logs makes sense to me. So does having better timestamp precision. Handling multiple logs is easy from the system point of view anyway. And I agree most of this job can be forwarded to syslog or journald.
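For context, the two logs under discussion are configured in two different places. A minimal sketch of what the separated setup could look like, assuming the usual file locations (paths are illustrative, double-check your distribution):

```
# corosync.conf -- corosync keeps its own detail log
logging {
    to_syslog: yes
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    timestamp: on
}

# /etc/sysconfig/pacemaker -- point pacemaker at a separate detail log
PCMK_logfile=/var/log/pacemaker.log
```

With "logfile:" left out of corosync.conf's logging section, the situation described above (pacemaker reusing corosync's logfile) does not arise in the first place.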
Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology
FWIW, below is my opinion about this On Tue, 16 Jan 2018 16:33:56 -0600 Ken Gaillot wrote: [...] > I think the term "stateful resource" is a better substitute for > "master/slave resource". That would mainly be a documentation change. +1 > A bigger question is what to call the two roles. "Master" and "Slave" > would be continue to be accepted for backward compatibility for a long > time. Some suggestions: > * master/worker, master/replicant, primary/backup: I'd like to avoid > terms like these. OCF and Pacemaker are application-agnostic, whereas > these terms imply particular functionality and are often associated > with particular software. -1 for them as well. > * primary/secondary: Widely used, but sometimes associated with > particular software. +1 > * promoted with either unpromoted, demoted, default, or started: All > OCF and Pacemaker actually care about is whether the resource agent has > been called with the promote action. "Promoted" is good, but the other > role is less obvious. Started and Promoted might do the job, and sound agnostic with regard to application terminology. I don't have a strong argument for picking between primary/secondary and started/promoted. The first might be easier for most people to understand without further explanation of Pacemaker's internal mechanisms, though. [...] I suppose this should be reflected in pcs/crmsh as well at some point. ++
[ClusterLabs] Misunderstanding or bug in crm_simulate output
Hi list, I was explaining how to use crm_simulate to a colleague when he pointed out an unexpected and buggy output. Here are some simple steps to reproduce: $ pcs cluster setup --name usecase srv1 srv2 srv3 $ pcs cluster start --all $ pcs property set stonith-enabled=false $ pcs resource create dummy1 ocf:heartbeat:Dummy \ state=/tmp/dummy1.state\ op monitor interval=10s\ meta migration-threshold=3 resource-stickiness=1 Now, we are injecting 2 monitor soft errors, triggering 2 local recoveries (stop/start): $ crm_simulate -S -L -i dummy1_monitor_10@srv1=1 -O /tmp/step1.xml $ crm_simulate -S -x /tmp/step1.xml -i dummy1_monitor_10@srv1=1 -O /tmp/step2.xml So far so good. A third soft error on monitor pushes dummy1 out of srv1, as expected. However, the final status of the cluster shows dummy1 as started on both srv1 and srv2! $ crm_simulate -S -x /tmp/step2.xml -i dummy1_monitor_10@srv1=1 -O /tmp/step3.xml Current cluster status: Online: [ srv1 srv2 srv3 ] dummy1 (ocf::heartbeat:Dummy): Started srv1 Performing requested modifications + Injecting dummy1_monitor_10@srv1=1 into the configuration + Injecting attribute fail-count-dummy1=value++ into /node_state '1' + Injecting attribute last-failure-dummy1=1516287891 into /node_state '1' Transition Summary: * Recover dummy1 ( srv1 -> srv2 ) Executing cluster transition: * Cluster action: clear_failcount for dummy1 on srv1 * Resource action: dummy1 stop on srv1 * Resource action: dummy1 cancel=10 on srv1 * Pseudo action: all_stopped * Resource action: dummy1 start on srv2 * Resource action: dummy1 monitor=1 on srv2 Revised cluster status: Online: [ srv1 srv2 srv3 ] dummy1 (ocf::heartbeat:Dummy): Started[ srv1 srv2 ] I suppose this is a bug from crm_simulate? Why is it considering dummy1 started on srv1 when the transition execution stopped it on srv1?
Taking the step3.xml output of this weird result forces the cluster to stop dummy1 everywhere and start it on srv2 only: $ crm_simulate -S -x /tmp/step3.xml Current cluster status: Online: [ srv1 srv2 srv3 ] dummy1 (ocf::heartbeat:Dummy): Started[ srv1 srv2 ] Transition Summary: * Move dummy1 ( srv1 -> srv2 ) Executing cluster transition: * Resource action: dummy1 stop on srv2 * Resource action: dummy1 stop on srv1 * Pseudo action: all_stopped * Resource action: dummy1 start on srv2 * Resource action: dummy1 monitor=1 on srv2 Revised cluster status: Online: [ srv1 srv2 srv3 ] dummy1 (ocf::heartbeat:Dummy): Started srv2 Thoughts?
Re: [ClusterLabs] Misunderstanding or bug in crm_simulate output
On Thu, 18 Jan 2018 10:54:33 -0600 Ken Gaillot wrote: > On Thu, 2018-01-18 at 16:15 +0100, Jehan-Guillaume de Rorthais wrote: > > Hi list, > > > > I was explaining how to use crm_simulate to a colleague when he > > pointed to me a > > non expected and buggy output. > > > > Here are some simple steps to reproduce: > > > > $ pcs cluster setup --name usecase srv1 srv2 srv3 > > $ pcs cluster start --all > > $ pcs property set stonith-enabled=false > > $ pcs resource create dummy1 ocf:heartbeat:Dummy \ > > state=/tmp/dummy1.state\ > > op monitor interval=10s\ > > meta migration-threshold=3 resource-stickiness=1 > > > > Now, we are injecting 2 monitor soft errors, triggering 2 local > > recovery > > (stop/start): > > > > $ crm_simulate -S -L -i dummy1_monitor_10@srv1=1 -O /tmp/step1.xml > > $ crm_simulate -S -x /tmp/step1.xml -i dummy1_monitor_10@srv1=1 > > -O /tmp/step2.xml > > > > > > So far so good. A third soft error on monitor push dummy1 out of > > srv1, this > > was expected. However, the final status of the cluster shows dummy1 > > as > > started on both srv1 and srv2! 
> > > > $ crm_simulate -S -x /tmp/step2.xml -i dummy1_monitor_10@srv1=1 > > -O /tmp/step3.xml > > > > Current cluster status: > > Online: [ srv1 srv2 srv3 ] > > > > dummy1 (ocf::heartbeat:Dummy): Started srv1 > > > > Performing requested modifications > > + Injecting dummy1_monitor_10@srv1=1 into the configuration > > + Injecting attribute fail-count-dummy1=value++ into /node_state > > '1' > > + Injecting attribute last-failure-dummy1=1516287891 into > > /node_state '1' > > > > Transition Summary: > > * Recoverdummy1 ( srv1 -> srv2 ) > > > > Executing cluster transition: > > * Cluster action: clear_failcount for dummy1 on srv1 > > * Resource action: dummy1 stop on srv1 > > * Resource action: dummy1 cancel=10 on srv1 > > * Pseudo action: all_stopped > > * Resource action: dummy1 start on srv2 > > * Resource action: dummy1 monitor=1 on srv2 > > > > Revised cluster status: > > Online: [ srv1 srv2 srv3 ] > > > > dummy1 (ocf::heartbeat:Dummy): Started[ srv1 srv2 ] > > > > I suppose this is a bug from crm_simulate? Why is it considering > > dummy1 is > > started on srv1 when the transition execution stopped it on srv1? > > It's definitely a bug, either in crm_simulate or the policy engine > itself. Can you attach step2.xml? Sure, please find step2.xml attached.
Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology
promote" action must move an > > instance from the default role to the non-default role, and a > > successful "demote" action must move an instance from the non-default > > role to the default role. > > > > So, it's very generic from the cluster's point of view. > > > > > > Too bad "roleful" isn't a word ;-) > > > > > > > > As you mentioned, "state" can more broadly refer to started, > > > > stopped, > > > > etc., but pacemaker does consider "started in slave role" and > > > > "started > > > > in master role" as extensions of this, so I don't think > > > > "stateful" > > > > is > > > > too far off the mark. > > > > > > Maybe also state the purpose of having different roles here, and > > > define what a role as opposed to a state is. > > > > That's part of the problem -- the purpose is entirely up to the > > specific application. > > > > Some use it for a master copy of data vs a replicated copy of data, a > > read/write instance vs a read-only instance, a coordinating function > > vs > > an executing function, an active instance vs a hot-spare instance, > > etc. > > > > That's why I like "promoted"/"started" -- it most directly implies > > "whatever role you get after promote" vs "whatever role you get after > > start". > > > > It would even be easy to think of the pacemaker daemons themselves as > > clones. The crmd would be a stateful clone whose non-default role is > > the DC. The attrd would be a stateful clone whose non-default role is > > the writer. (It might be "fun" to represent the daemons as resources > > one day ...) > > > > > > Separately, clones (whether stateful or not) may be anonymous or > > > > unique > > > > (i.e. whether it makes sense to start more than one instance on > > > > the > > > > same node), which confuses things further. > > > > > > "anonymous clone" should be defined also, just as unique: Aren't > > > all > > > configured resources "unique" (i.e. being different from each > > > other)? 
> > > > > > > I'm curious about more than two roles, multiple "masters" and > > > multiple "slaves". > > > > It's a common model to have one database master and a bunch of > > replicants, and with most databases having good active/active support > > these says, it's becoming more common to have multiple masters, with > > or > > without separate replicants. It's also common for one coordinator > > with > > multiple workers. > > > > > > > > Regards, > > > Ulrich -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology
On Thu, 25 Jan 2018 10:03:34 +0100 Ivan Devát wrote: > > I think there's enough sentiment for "promoted"/"started" as the role > > names, since it most directly reflects how pacemaker uses them. > > > Just a question. > The property "role" of a resource operation can have values: "Stopped", > "Started" and in the case of multi-state resources, "Slave" and "Master". > > What does it mean when the value is "Started"? Does it mean either > "Slave" or "Master" or does it mean just "Slave"? This has been discussed on this list, not sure when... "Started" has the exact same meaning as "Slave" (and the other way around). If I remember correctly, a patch fixed some output about this a few versions ago.
Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology
On Thu, 25 Jan 2018 11:28:16 +0100 Jehan-Guillaume de Rorthais wrote: > On Thu, 25 Jan 2018 10:03:34 +0100 > Ivan Devát wrote: > > > > I think there's enough sentiment for "promoted"/"started" as the role > > > names, since it most directly reflects how pacemaker uses them. > > > > > Just a question. > > The property "role" of a resource operation can have values: "Stopped", > > "Started" and in the case of multi-state resources, "Slave" and "Master". > > > > What does it mean when the value is "Started"? Does it mean either > > "Slave" or "Master" or does it mean just "Slave"? > > This has been discussed on this list, not sure when... > > "Started" has the exact same meaning than "Slave" (and the other way around). > If I remember correctly, a patch fixed some output about this some versions > ago. See commits: ab44df4b6b2f2af6cf94a50833b4e3acc3718c72 4acd6327a949a3836fa7bb1851f758d4474cd05d
Re: [ClusterLabs] Misunderstanding or bug in crm_simulate output
On Wed, 24 Jan 2018 17:42:56 -0600 Ken Gaillot wrote: > On Fri, 2018-01-19 at 00:37 +0100, Jehan-Guillaume de Rorthais wrote: > > On Thu, 18 Jan 2018 10:54:33 -0600 > > Ken Gaillot wrote: > > > > > On Thu, 2018-01-18 at 16:15 +0100, Jehan-Guillaume de Rorthais > > > wrote: > > > > Hi list, > > > > > > > > I was explaining how to use crm_simulate to a colleague when he > > > > pointed to me a > > > > non expected and buggy output. > > > > > > > > Here are some simple steps to reproduce: > > > > > > > > $ pcs cluster setup --name usecase srv1 srv2 srv3 > > > > $ pcs cluster start --all > > > > $ pcs property set stonith-enabled=false > > > > $ pcs resource create dummy1 ocf:heartbeat:Dummy \ > > > > state=/tmp/dummy1.state\ > > > > op monitor interval=10s\ > > > > meta migration-threshold=3 resource-stickiness=1 > > > > > > > > Now, we are injecting 2 monitor soft errors, triggering 2 local > > > > recovery > > > > (stop/start): > > > > > > > > $ crm_simulate -S -L -i dummy1_monitor_10@srv1=1 -O > > > > /tmp/step1.xml > > > > $ crm_simulate -S -x /tmp/step1.xml -i dummy1_monitor_10@srv1=1 > > > > -O /tmp/step2.xml > > > > > > > > > > > > So far so good. A third soft error on monitor push dummy1 out of > > > > srv1, this > > > > was expected. However, the final status of the cluster shows > > > > dummy1 > > > > as > > > > started on both srv1 and srv2! 
> > > > > > > > $ crm_simulate -S -x /tmp/step2.xml -i dummy1_monitor_10@srv1=1 > > > > -O /tmp/step3.xml > > > > > > > > Current cluster status: > > > > Online: [ srv1 srv2 srv3 ] > > > > > > > > dummy1 (ocf::heartbeat:Dummy): Started srv1 > > > > > > > > Performing requested modifications > > > > + Injecting dummy1_monitor_10@srv1=1 into the configuration > > > > + Injecting attribute fail-count-dummy1=value++ into > > > > /node_state > > > > '1' > > > > + Injecting attribute last-failure-dummy1=1516287891 into > > > > /node_state '1' > > > > > > > > Transition Summary: > > > > * Recoverdummy1 ( srv1 -> srv2 ) > > > > > > > > Executing cluster transition: > > > > * Cluster action: clear_failcount for dummy1 on srv1 > > > > * Resource action: dummy1 stop on srv1 > > > > * Resource action: dummy1 cancel=10 on srv1 > > > > * Pseudo action: all_stopped > > > > * Resource action: dummy1 start on srv2 > > > > * Resource action: dummy1 monitor=1 on srv2 > > > > > > > > Revised cluster status: > > > > Online: [ srv1 srv2 srv3 ] > > > > > > > > dummy1 (ocf::heartbeat:Dummy): Started[ srv1 > > > > srv2 ] > > > > > > > > I suppose this is a bug from crm_simulate? Why is it considering > > > > dummy1 is > > > > started on srv1 when the transition execution stopped it on > > > > srv1? > > > > > > It's definitely a bug, either in crm_simulate or the policy engine > > > itself. Can you attach step2.xml? > > > > Sure, please, find in attachment step2.xml. > > I can reproduce the issue with 1.1.16 but not 1.1.17 or later, so > whatever it was, it got fixed. Interesting. I did some quick searching and suspected the bug came from somewhere around "fake_transition.c". I didn't have time to dig very far though. Moreover, I am studying the code from master, not 1.1.16, so what a waste of time if it's already fixed on the master branch :) Too bad most stable distros are still shipping 1.1.16 (Debian, CentOS, Ubuntu). I have another question related to my tests with regard to this subject.
Keep in mind this is still with 1.1.16, but it might not be related to the version. While testing on a live cluster, without crm_simulate, to see what really happens there, I found in the log file that the pengine was producing 2 distinct transitions. Find the log file in attachment. As far as I understand it, I suppose that: * the crmd on srv3/DC receives the failure result from srv1 for the monitor on dummy1 * c
Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology
On Thu, 25 Jan 2018 15:21:30 -0500 Digimer wrote: > On 2018-01-25 01:28 PM, Ken Gaillot wrote: > > On Thu, 2018-01-25 at 13:06 -0500, Digimer wrote: > >> On 2018-01-25 11:11 AM, Ken Gaillot wrote: > >>> On Wed, 2018-01-24 at 20:58 +0100, Jehan-Guillaume de Rorthais > >>> wrote: > >>>> On Wed, 24 Jan 2018 13:28:03 -0600 > >>>> Ken Gaillot wrote: > >>>> > >>>>> I think there's enough sentiment for "promoted"/"started" as > >>>>> the > >>>>> role > >>>>> names, since it most directly reflects how pacemaker uses them. > >>>>> > >>>>> For the resources themselves, how about "binary clones"? > >>>> > >>>> I'm not sure to understand what your question is about. > >>>> > >>>> If it is related to how the RA are designated between the ones > >>>> able > >>>> to > >>>> promote/demote and the other ones, this does not reflect to me > >>>> the > >>>> resource can > >>>> be either started or promoted. Moreover, I suppose this kind of > >>>> resources are > >>>> not always binary clones. The states might be purely logical. > >>>> > >>>> Multistate sounds the best option to me. Simple. > >>>> > >>>> If you need some more options, I would pick: clustered resource. > >>>> > >>>> We could argue simple clones might be "clustered resource" as > >>>> well, > >>>> but they > >>>> are not supposed to be related to each other as a > >>>> primary/promoted > >>>> resource and > >>>> a secondary/standby resource are. 
> >>> > >>> Zeroing in on this question, which does everyone prefer: > >>> > >>> * "Binary clones" (in the sense of "one of two roles", but not very > >>> obvious) > >>> > >>> * "Stateful clones" (potentially confusing with anonymous vs unique > >>> clones, and all resources have state) > >>> > >>> * "Multistate clones" (less confusing with anonymous vs unique, and > >>> already in current use in documentation, but still all resources > >>> have > >>> multiple possible states) > >>> > >>> * "Promotable clones" (consistent with "promote" theme, but the > >>> word > >>> looks odd, and confusing with whether an individual instance is > >>> eligible to be promoted) > >>> > >>> * "Promotion clones" (also consistent, but sounds odd and not > >>> particularly obvious) > >> > >> I don't want to push my preferences here, but I wanted to suggest > >> that > >> something that sounds a bit on now will sound normal over time. > >> > >> I will point out, though, that spell check doesn't complain about > >> 'Binary' and 'Promotion'. > >> > >> If I can throw another suggestion in (without offering preference for > >> it > >> myself), 'dual-state clones'? The reasoning is that, though three > >> words > >> instead of two, spell-check likes it, it sounds OK on day one (from a > >> language perspective) and it reflects that the clone has only one of > >> two > >> states. > > > > Or "dual-role". > > > > Binary/dual/multi all have the issue that all resources have multiple > > states (stopped, started, etc.). Not a deal-breaker, but a factor to > > consider. > > > > What we're trying to represent is: clone resources that have an > > additional possible role that pacemaker manages via the promote/demote > > actions. > > > > I go back and forth between options. "Multistate" would be OK, > > especially since it's already used in some places. "Promotable" is > > probably most accurate. > > If the thing only has two states; "dual-role" is perfect. 
If the thing > can have 3+ states, "multistate" is perfect. In 1.1.x, there are exactly three states: stopped, started, promoted. In 1.1.x, there are exactly two roles: slave and master. Using this terminology, "dual-role" would make sense. However, note that "target-role" could be set to "Stopped"
Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology
On Fri, 26 Jan 2018 12:41:51 +0300 Vladislav Bogdanov wrote: > 25.01.2018 21:28, Ken Gaillot wrote: > > [...] > > >> If I can throw another suggestion in (without offering preference for > >> it > >> myself), 'dual-state clones'? The reasoning is that, though three > >> words > >> instead of two, spell-check likes it, it sounds OK on day one (from a > >> language perspective) and it reflects that the clone has only one of > >> two > >> states. > > > > Or "dual-role". > > Btw, is the word 'tri-state' or 'tristate' usable in contemporary English? > Some online translators accept it, but google isn't. Exactly, I was thinking about this one as well but forgot to suggest it. What about "binary-state" and "ternary-state"? Or "binary-role" and "ternary-role"?
Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology
On Fri, 26 Jan 2018 17:10:14 +0100 Klaus Wenninger wrote: > On 01/26/2018 04:37 PM, Ken Gaillot wrote: > > On Fri, 2018-01-26 at 09:07 +0100, Jehan-Guillaume de Rorthais wrote: > >> On Thu, 25 Jan 2018 15:21:30 -0500 > >> Digimer wrote: > >> > >>> On 2018-01-25 01:28 PM, Ken Gaillot wrote: > >>>> On Thu, 2018-01-25 at 13:06 -0500, Digimer wrote: > >>>>> On 2018-01-25 11:11 AM, Ken Gaillot wrote: > >>>>>> On Wed, 2018-01-24 at 20:58 +0100, Jehan-Guillaume de > >>>>>> Rorthais > >>>>>> wrote: > >>>>>>> On Wed, 24 Jan 2018 13:28:03 -0600 > >>>>>>> Ken Gaillot wrote: > >>>>>>> > >>>>>>>> I think there's enough sentiment for "promoted"/"started" > >>>>>>>> as > >>>>>>>> the > >>>>>>>> role > >>>>>>>> names, since it most directly reflects how pacemaker uses > >>>>>>>> them. > > Doh! I forgot that there is a key difference between "started" and > > "slave". If you set target-role="Started" for a multistate clone, > > Pacemaker is allowed to bring it up in master or slave mode, whereas > > specifying "Master" or "Slave" specifically only allows that one role. > > > > So, we can't use "started" to replace "slave". > > Question is if it has to be like that. > Intention is obviously to prevent that pacemaker doesn't promote > the resource. Could it be a meta-attribute "inhibit-promotion" > as well. Or any other means to inhibit promotion - you are > getting my idea ... Or maybe introduce something more intuitive like "active"? target-role would have: stopped, started, promoted or active (started or promoted).
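To make the target-role distinction being discussed concrete, here is roughly what the 1.1.x CIB syntax looks like; the ids and resource names below are made up for illustration. With target-role="Started" the cluster may bring the instance up in either role, while "Slave" or "Master" pins it to that single role:

```xml
<master id="pgsql-ha">
  <meta_attributes id="pgsql-ha-meta_attributes">
    <!-- "Started" allows either role; "Slave" or "Master" allows only that one -->
    <nvpair id="pgsql-ha-target-role" name="target-role" value="Started"/>
  </meta_attributes>
  <primitive id="pgsqld" class="ocf" provider="heartbeat" type="pgsqlms"/>
</master>
```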
Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology
On Fri, 26 Jan 2018 09:37:39 -0600 Ken Gaillot wrote: ... > > All RA > > must implement the first two states "stopped" and "started". The > > cases where RA > > is promotable should then be called..."promotable" I suppose. > > > > However, why exactly should we find a terminology for RA implementing > > the > > "promote" action? Should we find some new terminology for RA > > implementing > > optional actions like "notify" ("notifiable") or "validate-all" > > ("validatable") > > action as well? (ok, these are not roles, but I try to explain my > > point :)) > > > > Now the terminology is common using "stopped, started, promoted", > > should we stop considering "multistate RA" as special cases? We could > > simply > > explain RAs are implementing all or a subset of the actions/roles > > Pacemaker > > offers and give documentation details related to roles properties, > > variables > > and transitions, isn't it? > > We will need some way of specifying it in the syntax, and that will > become its name ... or whatever. (We can't > simply go by what's in the RA metadata for technical reasons not worth > going into here.) That makes me wonder how pcs and crm syntax should evolve as well from the user point of view... :/ > > > "Promotable" is certainly accurate, but I have a (very mild) > > > concern > > > about how easily it is understood by non-native English speakers. > > > Perhaps someone who speaks English as a second language could chime > > > in > > > on that? > > > > "Promotable" sounds accurate to me (native french speaker). > > That's good to know. I think we have at least narrowed it down to > "multistate" or "promotable" :-) > > Unfortunately the role names are now back in play as well. "Promoted" > and "unpromoted"? > > E.g.: > > Instead of a single "started" role, a promotable clone resource must be > in one of two roles when active: "promoted" or "unpromoted". What about my previous suggestion to use "active" to match either "Started" or "Promoted" state? 
Re: [ClusterLabs] Does CMAN Still Not Support Multiple CoroSync Rings?
On Tue, 13 Feb 2018 21:38:06 + Eric Robinson wrote: > Thanks for the suggestion everyone. I'll give that a try. Sorry, I'm late on this, but I wrote a quick start doc describing this (among other things) some time ago. See the following chapter: https://clusterlabs.github.io/PAF/Quick_Start-CentOS-6.html#cluster-creation The other chapters might not be useful to you. Do not hesitate to give feedback if something changed or doesn't work anymore. This was based on CentOS 6.7. Cheers, -- Jehan-Guillaume de Rorthais Dalibo
Re: [ClusterLabs] Does CMAN Still Not Support Multiple CoroSync Rings?
On Wed, 14 Feb 2018 23:11:49 + Eric Robinson wrote: > > > Thanks for the suggestion everyone. I'll give that a try. > > > > Sorry, I'm late on this, but I wrote a quick start doc describing this > > (amongs other things) some time ago. See the following chapter: > > > > https://clusterlabs.github.io/PAF/Quick_Start-CentOS-6.html#cluster- > > creation > > > > I scanned through that page but I did not see where it talks about setting up > multiple corosync rings. Quoting the page: « If you have an alternative network available (this is highly recommended), you can use the following syntax: pcs cluster setup --name cluster_pgsql srv1,srv1-alt srv2,srv2-alt srv3,srv3-alt If your version of pcs does not support it (ie. CentOS 6.6 and bellow), you can fallback on the old but useful ccs command: pcs cluster setup --name cluster_pgsql srv1 srv2 srv3 ccs -f /etc/cluster/cluster.conf --addalt srv1 srv1-alt ccs -f /etc/cluster/cluster.conf --addalt srv2 srv2-alt ccs -f /etc/cluster/cluster.conf --addalt srv3 srv3-alt pcs cluster sync »
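For what it's worth, the ccs --addalt calls quoted above should leave a cluster.conf fragment along these lines, giving each node a second ring interface (a sketch with most attributes omitted; verify against your generated file):

```xml
<clusternodes>
  <clusternode name="srv1" nodeid="1">
    <!-- alternate name used for the redundant (second) corosync ring -->
    <altname name="srv1-alt"/>
  </clusternode>
  <clusternode name="srv2" nodeid="2">
    <altname name="srv2-alt"/>
  </clusternode>
  <clusternode name="srv3" nodeid="3">
    <altname name="srv3-alt"/>
  </clusternode>
</clusternodes>
```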
Re: [ClusterLabs] Re: Re: Re: How to configure to make each slave resource has one VIP
Hi guys, A few months ago, I started a new chapter on this exact subject for "PAF - Cluster administration under CentOS" (see: https://clusterlabs.github.io/PAF/CentOS-7-admin-cookbook.html) Please find my draft attached. All feedback, fixes, comments and intensive tests are welcome! ## Adding IPs on slave nodes In this chapter, we are using a three node cluster with one PostgreSQL master instance and two standby instances. As usual, we start from the cluster created in the quick start documentation: * one master resource called `pgsql-ha` * an IP address called `pgsql-master-ip` linked to the `pgsql-ha` master role See the [Quick Start CentOS 7]({{ site.baseurl}}/Quick_Start-CentOS-7.html#cluster-resources) for more information. We want to create two IP addresses with the following properties: * start on a standby node * avoid starting on the same standby node as the other one * move to the available standby node should a failure occur on the other one * move to the master if there is no standby alive To make this possible, we have to play with the resources' co-location scores. First, let's add two `IPaddr2` resources called `pgsql-ip-stby1` and `pgsql-ip-stby2` holding IP addresses `192.168.122.49` and `192.168.122.48`: ~~~ # pcs resource create pgsql-ip-stby1 ocf:heartbeat:IPaddr2 \ cidr_netmask=24 ip=192.168.122.49 op monitor interval=10s # pcs resource create pgsql-ip-stby2 ocf:heartbeat:IPaddr2 \ cidr_netmask=24 ip=192.168.122.48 op monitor interval=10s ~~~ We want both IP addresses to avoid co-locating with each other. We add a co-location constraint so `pgsql-ip-stby2` avoids `pgsql-ip-stby1` with a score of `-5`: ~~~ # pcs constraint colocation add pgsql-ip-stby2 with pgsql-ip-stby1 -5 ~~~ > **NOTE**: that means the cluster manager has to start `pgsql-ip-stby1` first > to decide where `pgsql-ip-stby2` should start according to the new scores in > the cluster.
> Also, that means that whenever you move `pgsql-ip-stby1` to another node,
> the cluster might have to stop `pgsql-ip-stby2` first and restart it
> elsewhere depending on the new scores.
{: .notice}

Now, we add similar co-location constraints to define that each IP address prefers to run on a node with a slave of `pgsql-ha`:

~~~
# pcs constraint colocation add pgsql-ip-stby1 with slave pgsql-ha 10
# pcs constraint order start pgsql-ha then start pgsql-ip-stby1 kind=Mandatory
# pcs constraint colocation add pgsql-ip-stby2 with slave pgsql-ha 10
# pcs constraint order start pgsql-ha then start pgsql-ip-stby2 kind=Mandatory
~~~
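Once the constraints above are in place, they can be double-checked from any node. A short sketch, assuming the standard Pacemaker and pcs tooling (illustrative only, as both commands need a live cluster): the first lists every configured constraint with its score, the second shows the allocation scores the policy engine computes per resource and node from the live CIB.

~~~
# pcs constraint show --full
# crm_simulate -sL
~~~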
Re: [ClusterLabs] Re: Re: Re: How to configure to make each slave resource has one VIP
On Wed, 7 Mar 2018 01:27:16 + 范国腾 wrote:

> Thank you, Rorthais,
>
> I read the link and it is very helpful.

Did you read the draft I attached to the email? It was the main purpose of my answer: helping you with IPs on slaves. It seems to me your mail is reporting different issues than the original subject.

> There are some issues that I have met when I installed the cluster.

I suppose this is another subject and we should open a new thread with the appropriate subject.

> 1. “pcs cluster stop” could not stop the cluster sometimes.

You would have to give some more details about the context where "pcs cluster stop" timed out.

> 2. when I upgrade the PAF, I could just replace the pgsqlms file. When I
> upgrade the postgres, I just replace the /usr/local/pgsql/.

I believe both actions are documented with best practices in the links I gave you.

> 3. If the cluster does not stop normally, the pgcontroldata status is not
> "SHUTDOWN", then the PAF would not start the postgresql any more, so I
> normally change the pgsqlms as below after installing the PAF.
> [...]

This should be discussed to understand the exact context before considering your patch. At first glance, your patch seems quite dangerous as it bypasses the sanity checks.

Please, could you start a new thread with a proper subject and add extensive information about this issue? You could open a new issue on the PAF repository as well: https://github.com/ClusterLabs/PAF/issues

Regards,
Re: [ClusterLabs] Re: Re: Re: How to configure to make each slave resource has one VIP
On Thu, 8 Mar 2018 01:45:43 + 范国腾 wrote:

> Sorry, Rorthais, I thought that the link and the attachment were the same
> document yesterday.

No problem. For your information, I merged the draft into the official documentation yesterday.

> I just read the attachment and that is exactly what I asked
> originally.

Excellent! Glad it could help.

> I have two questions on the following two commands:
> # pcs constraint colocation add pgsql-ip-stby1 with slave pgsql-ha 10
> Q: Does the score 10 mean "move to the master if there is no standby
> alive"?

Kind of. It actually says nothing about moving to the master. It just says the slave IPs should prefer to co-locate with a slave. If the slave nodes are down or in standby, the IP "can" move to the master as nothing forbids it.

In fact, while writing this sentence, I realize there is nothing to push the slave IPs to the master if the other nodes are up but the pgsql-ha slaves are stopped or banned. The configuration I provided was incomplete.

1. I added the missing constraints in the doc online
2. notice I raised all the scores so they are higher than the stickiness

See: https://clusterlabs.github.io/PAF/CentOS-7-admin-cookbook.html#adding-ips-on-slaves-nodes

Sorry for this :/

> # pcs constraint order start pgsql-ha then start pgsql-ip-stby1 kind=Mandatory
> Q: I did not set the order and I did not find the issue until now. Should I add
> this constraint? What will happen if I miss it?

The IP address can start before PostgreSQL is up on the node. You will see client connections being rejected with the error "PostgreSQL is not listening on host [...]".

> Here is what I did now:
> pcs resource create pgsql-slave-ip1 ocf:heartbeat:IPaddr2 ip=192.168.199.186
> nic=enp3s0f0 cidr_netmask=24 op monitor interval=10s;
> pcs resource create pgsql-slave-ip2 ocf:heartbeat:IPaddr2 ip=192.168.199.187
> nic=enp3s0f0 cidr_netmask=24 op monitor interval=10s;
> pcs constraint colocation add pgsql-slave-ip1 with pgsql-ha

It is missing the score and the role.
Without a role specification, it can co-locate with the master or a slave with no preference.

> pcs constraint colocation add pgsql-slave-ip2 with pgsql-ha

Same: it is missing the score and the role.

> pcs constraint colocation set pgsql-slave-ip1 pgsql-slave-ip2
> pgsql-master-ip setoptions score=-1000

The score seems too high in my opinion, compared to the other ones. You should probably remove all the colocation constraints and try with the ones I pushed online.

Regards,

> -Original Message-
> From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
> Sent: 7 March 2018 16:29
> To: 范国腾
> Cc: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] Re: Re: Re: How to configure to make each slave
> resource has one VIP
> [...]

--
Jehan-Guillaume de Rorthais
Dalibo
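Since the constraints quoted above are missing the score and the role, a corrected version for those resource names would look roughly like this (a sketch; the `pgsql-slave-ip1`/`pgsql-slave-ip2` names come from the quoted message, and the scores follow the pattern confirmed later in this thread for `pgsql-ip-stby1`):

~~~
# pcs constraint colocation add pgsql-slave-ip1 with slave pgsql-ha 100
# pcs constraint colocation add pgsql-slave-ip1 with pgsql-ha 50
# pcs constraint colocation add pgsql-slave-ip2 with slave pgsql-ha 100
# pcs constraint colocation add pgsql-slave-ip2 with pgsql-ha 50
~~~

The `slave` rule makes each IP prefer a node hosting a standby; the weaker role-less rule lets it fall back to the master when no standby is alive.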
Re: [ClusterLabs] Re: Re: Re: How to configure to make each slave resource has one VIP
On Fri, 9 Mar 2018 00:54:00 + 范国腾 wrote:

> Thanks Rorthais, got it. The following commands make sure that it moves
> to the master if there is no standby alive:
>
> pcs constraint colocation add pgsql-ip-stby1 with slave pgsql-ha 100
> pcs constraint colocation add pgsql-ip-stby1 with pgsql-ha 50

Exactly.
Re: [ClusterLabs] state file not created for Stateful resource agent
On Tue, 20 Mar 2018 13:00:49 -0500 Ken Gaillot wrote:

> On Sat, 2018-03-17 at 15:35 +0530, ashutosh tiwari wrote:
> > Hi,
> >
> > We have a two node active/standby cluster with a dummy Stateful
> > resource (pacemaker/Stateful).
> >
> > We observed that in case one node is up with the master resource and the
> > other node is booted up, the state file for the dummy resource is not
> > created on the node coming up.
> >
> > /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']:
> > name="master-unicloud" value="5"/>
> > Mar 17 12:22:29 [24875] tigana lrmd: notice: operation_finished:
> > unicloud_start_0:25729:stderr [ /usr/lib/ocf/resource.d/pw/uc: line 94:
> > /var/run/uc/role: No such file or directory ]
>
> The resource agent is ocf:pw:uc -- I assume this is a local
> customization of the ocf:pacemaker:Stateful agent?
>
> It looks to me like the /var/run/uc directory is not being created on
> the second node. /var/run is a memory filesystem, so it's wiped at
> every reboot, and any directories need to be created (as root) before
> they are used, every boot.
>
> ocf:pacemaker:Stateful puts its state file directly in /var/run to
> avoid needing to create any directories. You can change that by setting
> the "state" parameter, but in that case you have to make sure the
> directory you specify exists beforehand.

Another way to create the folder at each boot is to ask systemd, e.g.:

cat <<EOF > /etc/tmpfiles.d/ocf-pw-uc.conf
# Directory for ocf:pw:uc resource agent
d /var/run/uc 0700 root root - -
EOF

Adjust the rights and owner to suit your needs.
To take this file into consideration immediately, without rebooting the server, run the following command:

systemd-tmpfiles --create /etc/tmpfiles.d/ocf-pw-uc.conf

Regards,

--
Jehan-Guillaume de Rorthais
Dalibo
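For reference, here is a self-contained sketch of what that tmpfiles.d entry amounts to, exercised against throwaway temporary directories so it runs unprivileged (the paths and the manual `install -d` step are for illustration only; on a real systemd host you would write to /etc/tmpfiles.d/ and let systemd-tmpfiles create /var/run/uc):

```shell
# Write a tmpfiles.d-style entry (same format as the conf above)
# into a temporary directory instead of /etc/tmpfiles.d/:
conf_dir=$(mktemp -d)
cat <<'EOF' > "$conf_dir/ocf-pw-uc.conf"
# Directory for ocf:pw:uc resource agent
d /var/run/uc 0700 root root - -
EOF

# On a systemd host you would now run:
#   systemd-tmpfiles --create "$conf_dir/ocf-pw-uc.conf"
# The equivalent effect, done by hand against a temp root for illustration:
run_root=$(mktemp -d)
install -d -m 0700 "$run_root/uc"

# Check the resulting permissions (octal mode of the new directory)
stat -c '%a' "$run_root/uc"   # prints 700
```

The point of the tmpfiles.d mechanism is exactly this `install -d` step done automatically at every boot, before the resource agent needs the directory.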