Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-06-04 Thread Andrei Borzenkov
04.06.2018 18:53, Casey & Gina wrote: >> There are different code paths when RA is called automatically by >> resource manager and when RA is called manually by crm_resource. The >> latter did not export this environment variable until 1.1.17. So >> documentation is correct in that you do not need

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-06-04 Thread Casey & Gina
> There are different code paths when RA is called automatically by > resource manager and when RA is called manually by crm_resource. The > latter did not export this environment variable until 1.1.17. So > documentation is correct in that you do not need 1.1.17 to use RA > normally, as part of pa

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-06-01 Thread Andrei Borzenkov
On Fri, Jun 1, 2018 at 12:22 AM, Casey & Gina wrote: > >> pacemaker is too old. The error most likely comes from missing >> OCF_RESKEY_crm_feature_set which is exported by crm_resource starting >> with 1.1.17. I am not that familiar with debian packaging, but I'd >> expect resource-agents-paf requ

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-31 Thread Casey & Gina
> Quick look at PAF manual gives > > you need to rebuild the PostgreSQL instance on the failed node > > did you do it? I am not intimately familiar with Postgres, but in this > case I expect that you need to make database on node B secondary (slave, > whatever it is called) to new master on node

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-31 Thread Andrei Borzenkov
31.05.2018 19:20, Casey & Gina wrote: >> There is no "master node" in pacemaker. There is master/slave >> resource so at best it is "node on which specific resource has >> master role". And we have no way to know on which node your >> resource had master role when you did it. Please be mor

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-31 Thread Casey & Gina
> There is no "master node" in pacemaker. There is master/slave resource > so at best it is "node on which specific resource has master role". > And we have no way to know on which node your resource had master > role when you did it. Please be more specific, otherwise it is hard to > impo

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-30 Thread Andrei Borzenkov
31.05.2018 01:30, Casey & Gina wrote: >> In this case, the agent is returning "master (failed)", which does not >> mean that it previously failed when it was master -- it means it is >> currently running as master, in a failed condition. > > Well, it surely is NOT running. So the likely problem i

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-30 Thread Casey & Gina
> In this case, the agent is returning "master (failed)", which does not > mean that it previously failed when it was master -- it means it is > currently running as master, in a failed condition. Well, it surely is NOT running. So the likely problem is the way it's doing this check? I see a lo

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-29 Thread Ken Gaillot
On Tue, 2018-05-29 at 13:09 -0600, Casey & Gina wrote: > > On May 27, 2018, at 2:28 PM, Ken Gaillot > > wrote: > > > > Pacemaker isn't fencing because the start failed, at least not > > directly: > > > > > May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine: info: > > > determine_op_status: Oper

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-29 Thread Ken Gaillot
On Tue, 2018-05-29 at 15:56 -0600, Casey & Gina wrote: > > On May 27, 2018, at 2:28 PM, Ken Gaillot > > wrote: > > > > > May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine: info: > > > determine_op_status: Operation monitor found resource postgresql- > > > 10- > > > main:2 active on d-gp2-dbpg0-

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-29 Thread Casey & Gina
> On May 27, 2018, at 2:28 PM, Ken Gaillot wrote: > >> May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine: info: >> determine_op_status: Operation monitor found resource postgresql-10- >> main:2 active on d-gp2-dbpg0-2 > >> May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine: notice: >> LogAction

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-29 Thread Casey & Gina
> On May 27, 2018, at 2:28 PM, Ken Gaillot wrote: > > Pacemaker isn't fencing because the start failed, at least not > directly: > >> May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine: info: >> determine_op_status: Operation monitor found resource postgresql-10- >> main:2 active on d-gp2-dbpg0

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-27 Thread Ken Gaillot
On Wed, 2018-05-23 at 14:22 -0600, Casey & Gina wrote: > I have pcsd set to auto-start at boot, but not pacemaker or > corosync.  After I power off the node in vSphere, the node is fenced > and then powered back on.  I see it show up in `pcs status` with PCSD > Status of Online after a few seconds

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-25 Thread Casey Allen Shobe
Any advice about how to fix this? I've been struggling to get things working for weeks now and I think this is the final stumbling block I need to figure out. On May 23, 2018, at 2:22 PM, Casey & Gina wrote: >>> So now my concern is this - our VMs are distributed across 32 hosts. One >>> c

[ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-23 Thread Casey & Gina
>> So now my concern is this - our VMs are distributed across 32 hosts. One >> condition we were hoping to handle was when one of those host machines >> fails, due to bad memory or something else, as it is likely that not all of >> the nodes within a cluster are residing on the same VM host (t