Re: [ClusterLabs] Fence agent executing thousands of API calls per hour

2018-08-02 Thread Casey Allen Shobe
Eh?  Then why does pcs config show nothing new after that command is run?

> On Aug 1, 2018, at 3:08 PM, Ken Gaillot  wrote:
> 
> A resource can have more than one monitor, so that command by itself
> just adds a second monitor. You have to delete the original one
> separately with pcs resource op remove.
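
For concreteness, the two-step sequence Ken describes would look roughly like this, assuming the stonith resource from this thread is named vmware_fence and its original monitor ran at a 60-second interval (the name and interval are assumptions):

# List the operations currently defined on the resource; each monitor shows
# up with its own interval (newer pcs: "pcs resource config vmware_fence").
pcs resource show vmware_fence

# Remove the original monitor (matched by its interval) so only the newly
# added one remains, then confirm with "pcs config".
pcs resource op remove vmware_fence monitor interval=60s
pcs config
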
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Fence agent ends up stopped with no clear reason why

2018-08-02 Thread Casey Allen Shobe
The fence device starts fine, and randomly fails some time later.  In the first 
message I sent, you can see that the failure message had a different date/time 
on each cluster node, but ultimately it failed on all nodes.  My second E-mail 
on this thread has the log data from one node attached...

> On Aug 1, 2018, at 3:03 PM, Ken Gaillot  wrote:
> 
> For fence devices, a start first registers the device with stonithd
> (which should never fail). There should be a log message from stonithd
> like "Added 'vmware_fence' to the device list". The cluster then does
> an initial monitor. That is most likely what failed.
> 
> If you're lucky, the fence agent logged some detail about why that
> monitor failed, or has a debug option to do so.
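
One hedged way to get that detail is to run the agent's monitor action by hand on the node where it failed. The agent name (fence_vmware_soap) and the connection parameters below are assumptions; substitute whatever the vmware_fence device is actually configured with:

# Invoke the monitor action directly with verbose output; a non-zero exit
# status here is exactly what pacemaker records as a failed monitor.
fence_vmware_soap --ip=vcenter.example.com --ssl --username=fenceuser \
        --password=secret --action=monitor --verbose
echo $?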


Re: [ClusterLabs] crmsh 3.0 availability for RHEL 7.4?

2018-08-02 Thread FeldHost™ Admin
https://software.opensuse.org/download.html?project=network%3Aha-clustering%3AStable=crmsh

Best regards, Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail: supp...@feldhost.cz

www.feldhost.cz - FeldHost™ – We tailor hosting services to you. Do you have
specific requirements? We can handle them.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, PSČ 142 00
Company ID (IČ): 290 60 958, VAT ID (DIČ): CZ290 60 958
File C 200350, registered at the Municipal Court in Prague

Bank: Fio banka a.s.
Account number: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010  0024 0033 0446

> On 2 Aug 2018, at 20:55, Ron Kerry  wrote:
> 
> Is there a pre-packaged rpm available that will work on RHEL 7.4?
> 
> Right now all I have is a much older crmsh-2.1+git98 package. This does not 
> work for configuration purposes on RHEL 7.4. I get these sorts of errors 
> trying to configure resources.
> 
> [root@node2 ~]# crm configure edit
> ERROR: CIB not supported: validator 'pacemaker-2.10', release '3.0.14'
> ERROR: You may try the upgrade command
> ERROR: configure: Missing requirements
> 
> The upgrade command, of course, does not work. I can configure things with 
> pcs, but I already have a pre-defined crm style configuration text file with 
> all my primitives, groups, clones and constraints that I would like to be 
> able to use directly.
> 
> -- 
> 
> Ron Kerry
> ron.ke...@hpe.com
> 


[ClusterLabs] crmsh 3.0 availability for RHEL 7.4?

2018-08-02 Thread Ron Kerry

Is there a pre-packaged rpm available that will work on RHEL 7.4?

Right now all I have is a much older crmsh-2.1+git98 package. This does 
not work for configuration purposes on RHEL 7.4. I get these sorts of 
errors trying to configure resources.


[root@node2 ~]# crm configure edit
ERROR: CIB not supported: validator 'pacemaker-2.10', release '3.0.14'
ERROR: You may try the upgrade command
ERROR: configure: Missing requirements

The upgrade command, of course, does not work. I can configure things 
with pcs, but I already have a pre-defined crm style configuration text 
file with all my primitives, groups, clones and constraints that I would 
like to be able to use directly.
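
For what it's worth, once a crmsh new enough for that validator is installed (e.g. from the openSUSE network:ha-clustering repository linked in the earlier reply), an existing crm-style file can be loaded directly; a minimal sketch, with mycluster.crm as a hypothetical file name:

# Merge the primitives, groups, clones and constraints from the file into
# the live CIB, then review the result.
crm configure load update mycluster.crm
crm configure show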


--

Ron Kerry
ron.ke...@hpe.com



[ClusterLabs] monitor IP address

2018-08-02 Thread Aurelien Kempiak

Hello,

I'm using OpenVPN in failover mode, and it manages a ucarp VIP for this.
The problem is that there is only one managed VIP (let's say I use it for
the WAN side), not two (one for WAN and one for LAN).
My idea is to use Corosync/Pacemaker to monitor the OpenVPN ucarp (WAN)
VIP and bring up my Pacemaker-managed (LAN) VIP depending on whether the
WAN VIP is present, that's all.
I read a lot of documentation but I really could not find how to monitor
the presence of an IP address. Is that even possible?


I would be glad if someone can tell me ;)

Best regards.
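
If "presence" here means reachability of the WAN VIP from the cluster nodes, the usual building block is the ocf:pacemaker:ping agent, which publishes a node attribute that a location rule can test; a rough sketch in pcs syntax (addresses, resource names and intervals are placeholders; if the VIP instead needs to be detected as held locally by ucarp, a different check is needed):

# Clone a ping resource that probes the WAN VIP and records the result in
# the "pingd" node attribute on every node.
pcs resource create p_ping_wan ocf:pacemaker:ping \
        host_list=203.0.113.1 dampen=10s multiplier=1000 \
        op monitor interval=10s --clone

# The LAN VIP, forbidden wherever the WAN VIP is unreachable.
pcs resource create p_vip_lan ocf:heartbeat:IPaddr2 \
        ip=192.168.1.10 cidr_netmask=24 op monitor interval=10s
pcs constraint location p_vip_lan rule score=-INFINITY \
        pingd lt 1 or not_defined pingd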

--



*Aurélien* *Kempiak*
*System & Network Engineer*

*Landline:* 03 59 82 20 05

125 Avenue de la République 59110 La Madeleine
12 rue Marivaux 75002 Paris






Re: [ClusterLabs] Why Won't Resources Move?

2018-08-02 Thread Ken Gaillot
On Thu, 2018-08-02 at 02:25 +, Eric Robinson wrote:
> > > The message likely came from the resource agent calling
> > > crm_attribute
> > > to set a node attribute. That message usually means the cluster
> > > isn't
> > > running on that node, so it's highly suspect. The cib might have
> > > crashed, which should be in the log as well. I'd look into that
> > > first.
> > 
> > 
> > I rebooted the server and afterwards I'm still getting tons of
> > these...
> > 
> > Aug  2 01:43:40 001db01a drbd(p_drbd1)[18628]: ERROR: ha02_mysql: Called /usr/sbin/crm_master -Q -l reboot -v 1
> > Aug  2 01:43:40 001db01a drbd(p_drbd0)[18627]: ERROR: ha01_mysql: Called /usr/sbin/crm_master -Q -l reboot -v 1
> > Aug  2 01:43:40 001db01a drbd(p_drbd0)[18627]: ERROR: ha01_mysql: Exit code 107
> > Aug  2 01:43:40 001db01a drbd(p_drbd1)[18628]: ERROR: ha02_mysql: Exit code 107
> > Aug  2 01:43:40 001db01a drbd(p_drbd0)[18627]: ERROR: ha01_mysql: Command output:
> > Aug  2 01:43:40 001db01a drbd(p_drbd1)[18628]: ERROR: ha02_mysql: Command output:
> > Aug  2 01:43:40 001db01a lrmd[2025]:  notice: p_drbd0_monitor_6:18627:stderr [ Error signing on to the CIB service: Transport endpoint is not connected ]
> > Aug  2 01:43:40 001db01a lrmd[2025]:  notice: p_drbd1_monitor_6:18628:stderr [ Error signing on to the CIB service: Transport endpoint is not connected ]
> > 
> > 
> 
> Ken, 
> 
> Ironically, while researching this problem, I ran across the same
> question being asked back in November of 2017, and you made the same
> comment back then.
> 
> https://lists.clusterlabs.org/pipermail/users/2017-November/013975.html
> 
> And the solution turned out to be the same for me as it was for that
> guy. On the node where I was getting the errors, SELINUX was
> enforcing. I set it to permissive and the errors went away. 
> 
> --Eric
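
For reference, the check and the switch to permissive described above are roughly:

# Check the current SELinux mode, switch the running system to permissive,
# and make the change persist across reboots.
getenforce
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config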

LOL, glad the Internet's memory is better than mine :-)

-- 
Ken Gaillot 


Re: [ClusterLabs] Antw: Re: Why Won't Resources Move?

2018-08-02 Thread Eric Robinson
> Hi!
> 
> I'm not familiar with Red Hat, but is this normal?:
> 
> > >   corosync: active/disabled
> > >   pacemaker: active/disabled
> 
> Regards,
> Ulrich

That's the default after a new install. I had not enabled them to start 
automatically yet. 
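
When the time comes to enable them, one way is:

# Enable corosync and pacemaker at boot on all cluster nodes
# (equivalent to running "systemctl enable corosync pacemaker" on each node).
pcs cluster enable --all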


> 
> >>> Eric Robinson  wrote on 2018-08-02 at 03:44 in
> message
>  rd03.prod.outlook.com>
> 
> >>  -Original Message-
> >> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Ken
> Gaillot
> >> Sent: Wednesday, August 01, 2018 2:17 PM
> >> To: Cluster Labs - All topics related to open-source clustering
> >> welcomed 
> >> Subject: Re: [ClusterLabs] Why Won't Resources Move?
> >>
> >> On Wed, 2018-08-01 at 03:49 +, Eric Robinson wrote:
> >> > I have what seems to be a healthy cluster, but I can’t get
> >> > resources to move.
> >> >
> >> > Here’s what’s installed…
> >> >
> >> > [root@001db01a cluster]# yum list installed|egrep "pacem|coro"
> >> > corosync.x86_64  2.4.3-2.el7_5.1 @updates
> >> > corosynclib.x86_64   2.4.3-2.el7_5.1 @updates
> >> > pacemaker.x86_64 1.1.18-11.el7_5.3 @updates
> >> > pacemaker-cli.x86_64 1.1.18-11.el7_5.3 @updates
> >> > pacemaker-cluster-libs.x86_641.1.18-11.el7_5.3 @updates
> >> > pacemaker-libs.x86_641.1.18-11.el7_5.3 @updates
> >> >
> >> > Cluster status looks good…
> >> >
> >> > [root@001db01b cluster]# pcs status Cluster name: 001db01ab
> >> > Stack: corosync
> >> > Current DC: 001db01b (version 1.1.18-11.el7_5.3-2b07d5c5a9) -
> >> > partition with quorum Last updated: Wed Aug  1 03:44:47 2018 Last
> >> > change: Wed Aug  1 03:22:18 2018 by root via cibadmin on 001db01a
> >> >
> >> > 2 nodes configured
> >> > 11 resources configured
> >> >
> >> > Online: [ 001db01a 001db01b ]
> >> >
> >> > Full list of resources:
> >> >
> >> > p_vip_clust01  (ocf::heartbeat:IPaddr2):   Started 001db01b
> >> > p_azip_clust01 (ocf::heartbeat:AZaddr2):   Started 001db01b
> >> > Master/Slave Set: ms_drbd0 [p_drbd0]
> >> >  Masters: [ 001db01b ]
> >> >  Slaves: [ 001db01a ]
> >> > Master/Slave Set: ms_drbd1 [p_drbd1]
> >> >  Masters: [ 001db01b ]
> >> >  Slaves: [ 001db01a ]
> >> > p_fs_clust01   (ocf::heartbeat:Filesystem):Started 001db01b
> >> > p_fs_clust02   (ocf::heartbeat:Filesystem):Started 001db01b
> >> > p_vip_clust02  (ocf::heartbeat:IPaddr2):   Started 001db01b
> >> > p_azip_clust02 (ocf::heartbeat:AZaddr2):   Started 001db01b
> >> > p_mysql_001(lsb:mysql_001):Started 001db01b
> >> >
> >> > Daemon Status:
> >> >   corosync: active/disabled
> >> >   pacemaker: active/disabled
> >> >   pcsd: active/enabled
> >> >
> >> > Constraints look like this…
> >> >
> >> > [root@001db01b cluster]# pcs constraint Location Constraints:
> >> > Ordering Constraints:
> >> >   promote ms_drbd0 then start p_fs_clust01 (kind:Mandatory)
> >> >   promote ms_drbd1 then start p_fs_clust02 (kind:Mandatory)
> >> >   start p_fs_clust01 then start p_vip_clust01 (kind:Mandatory)
> >> >   start p_vip_clust01 then start p_azip_clust01 (kind:Mandatory)
> >> >   start p_fs_clust02 then start p_vip_clust02 (kind:Mandatory)
> >> >   start p_vip_clust02 then start p_azip_clust02 (kind:Mandatory)
> >> >   start p_vip_clust01 then start p_mysql_001 (kind:Mandatory)
> >> > Colocation Constraints:
> >> >   p_azip_clust01 with p_vip_clust01 (score:INFINITY)
> >> >   p_fs_clust01 with ms_drbd0 (score:INFINITY) (with-rsc-role:Master)
> >> >   p_fs_clust02 with ms_drbd1 (score:INFINITY) (with-rsc-role:Master)
> >> >   p_vip_clust01 with p_fs_clust01 (score:INFINITY)
> >> >   p_vip_clust02 with p_fs_clust02 (score:INFINITY)
> >> >   p_azip_clust02 with p_vip_clust02 (score:INFINITY)
> >> >   p_mysql_001 with p_vip_clust01 (score:INFINITY) Ticket Constraints:
> >> >
> >> > But when I issue a move command, nothing at all happens.
> >> >
> >> > I see this in the log on one node…
> >> >
> >> > Aug 01 03:21:57 [16550] 001db01bcib: info:
> >> > cib_perform_op:  ++ /cib/configuration/constraints:  <rsc_location
> >> > id="cli-prefer-ms_drbd0" rsc="ms_drbd0" role="Started"
> >> > node="001db01a" score="INFINITY"/>
> >> > Aug 01 03:21:57 [16550] 001db01bcib: info:
> >> > cib_process_request: Completed cib_modify operation for section
> >> > constraints: OK (rc=0, origin=001db01a/crm_resource/4,
> >> > version=0.138.0)
> >> > Aug 01 03:21:57 [16555] 001db01b   crmd: info:
> >> > abort_transition_graph:  Transition aborted by rsc_location.cli-
> >> > prefer-ms_drbd0 'create': Configuration change | cib=0.138.0
> >> > source=te_update_diff:456 path=/cib/configuration/constraints
> >> > complete=true
> >> >
> >> > And I see this in the log on the other node…
> >> >
> >> > notice: p_drbd1_monitor_6:69196:stderr [ Error signing on to
> >> > the CIB service: Transport endpoint is not connected ]
> >>
> >> The message likely came from the resource agent calling crm_attribute
> >> to
> set
> >> a node attribute. That 

Re: [ClusterLabs] Antw: Re: Why Won't Resources Move?

2018-08-02 Thread Andrei Borzenkov


Sent from my iPhone

> On 2 Aug 2018, at 9:27, Ulrich Windl  wrote:
> 
> Hi!
> 
> I'm not familiar with Red Hat, but is this normal?:
> 
>>>  corosync: active/disabled
>>>  pacemaker: active/disabled
> 

Some administrators prefer to start the cluster stack manually, so it may be 
intentional.

> Regards,
> Ulrich
> 
 Eric Robinson  wrote on 2018-08-02 at 03:44 in
> message
> 
> 
>>> -Original Message-
>>> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Ken
> Gaillot
>>> Sent: Wednesday, August 01, 2018 2:17 PM
>>> To: Cluster Labs - All topics related to open-source clustering welcomed
>>> 
>>> Subject: Re: [ClusterLabs] Why Won't Resources Move?
>>> 
 On Wed, 2018-08-01 at 03:49 +, Eric Robinson wrote:
 I have what seems to be a healthy cluster, but I can’t get resources
 to move.
 
 Here’s what’s installed…
 
 [root@001db01a cluster]# yum list installed|egrep "pacem|coro"
 corosync.x86_64  2.4.3-2.el7_5.1 @updates
 corosynclib.x86_64   2.4.3-2.el7_5.1 @updates
 pacemaker.x86_64 1.1.18-11.el7_5.3 @updates
 pacemaker-cli.x86_64 1.1.18-11.el7_5.3 @updates
 pacemaker-cluster-libs.x86_641.1.18-11.el7_5.3 @updates
 pacemaker-libs.x86_641.1.18-11.el7_5.3 @updates
 
 Cluster status looks good…
 
 [root@001db01b cluster]# pcs status
 Cluster name: 001db01ab
 Stack: corosync
 Current DC: 001db01b (version 1.1.18-11.el7_5.3-2b07d5c5a9) -
 partition with quorum Last updated: Wed Aug  1 03:44:47 2018 Last
 change: Wed Aug  1 03:22:18 2018 by root via cibadmin on 001db01a
 
 2 nodes configured
 11 resources configured
 
 Online: [ 001db01a 001db01b ]
 
 Full list of resources:
 
 p_vip_clust01  (ocf::heartbeat:IPaddr2):   Started 001db01b
 p_azip_clust01 (ocf::heartbeat:AZaddr2):   Started 001db01b
 Master/Slave Set: ms_drbd0 [p_drbd0]
 Masters: [ 001db01b ]
 Slaves: [ 001db01a ]
 Master/Slave Set: ms_drbd1 [p_drbd1]
 Masters: [ 001db01b ]
 Slaves: [ 001db01a ]
 p_fs_clust01   (ocf::heartbeat:Filesystem):Started 001db01b
 p_fs_clust02   (ocf::heartbeat:Filesystem):Started 001db01b
 p_vip_clust02  (ocf::heartbeat:IPaddr2):   Started 001db01b
 p_azip_clust02 (ocf::heartbeat:AZaddr2):   Started 001db01b
 p_mysql_001(lsb:mysql_001):Started 001db01b
 
 Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
 
 Constraints look like this…
 
 [root@001db01b cluster]# pcs constraint Location Constraints:
 Ordering Constraints:
  promote ms_drbd0 then start p_fs_clust01 (kind:Mandatory)
  promote ms_drbd1 then start p_fs_clust02 (kind:Mandatory)
  start p_fs_clust01 then start p_vip_clust01 (kind:Mandatory)
  start p_vip_clust01 then start p_azip_clust01 (kind:Mandatory)
  start p_fs_clust02 then start p_vip_clust02 (kind:Mandatory)
  start p_vip_clust02 then start p_azip_clust02 (kind:Mandatory)
  start p_vip_clust01 then start p_mysql_001 (kind:Mandatory)
 Colocation Constraints:
  p_azip_clust01 with p_vip_clust01 (score:INFINITY)
  p_fs_clust01 with ms_drbd0 (score:INFINITY) (with-rsc-role:Master)
  p_fs_clust02 with ms_drbd1 (score:INFINITY) (with-rsc-role:Master)
  p_vip_clust01 with p_fs_clust01 (score:INFINITY)
  p_vip_clust02 with p_fs_clust02 (score:INFINITY)
  p_azip_clust02 with p_vip_clust02 (score:INFINITY)
  p_mysql_001 with p_vip_clust01 (score:INFINITY) Ticket Constraints:
 
 But when I issue a move command, nothing at all happens.
 
 I see this in the log on one node…
 
 Aug 01 03:21:57 [16550] 001db01bcib: info:
 cib_perform_op:  ++ /cib/configuration/constraints:  <rsc_location
 id="cli-prefer-ms_drbd0" rsc="ms_drbd0" role="Started"
 node="001db01a" score="INFINITY"/>
 Aug 01 03:21:57 [16550] 001db01bcib: info:
 cib_process_request: Completed cib_modify operation for section
 constraints: OK (rc=0, origin=001db01a/crm_resource/4,
 version=0.138.0)
 Aug 01 03:21:57 [16555] 001db01b   crmd: info:
 abort_transition_graph:  Transition aborted by rsc_location.cli-
 prefer-ms_drbd0 'create': Configuration change | cib=0.138.0
 source=te_update_diff:456 path=/cib/configuration/constraints
 complete=true
 
 And I see this in the log on the other node…
 
 notice: p_drbd1_monitor_6:69196:stderr [ Error signing on to the
 CIB service: Transport endpoint is not connected ]
>>> 
>>> The message likely came from the resource agent calling crm_attribute to
> set
>>> a node attribute. That message usually means the cluster isn't running on 
>> that
>>> node, so it's highly suspect. The cib might have crashed, which should be
> in 
>> the

[ClusterLabs] Antw: Re: Why Won't Resources Move?

2018-08-02 Thread Ulrich Windl
Hi!

I'm not familiar with Red Hat, but is this normal?:

> >   corosync: active/disabled
> >   pacemaker: active/disabled

Regards,
Ulrich

>>> Eric Robinson  wrote on 2018-08-02 at 03:44 in
message


>>  -Original Message-
>> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Ken
Gaillot
>> Sent: Wednesday, August 01, 2018 2:17 PM
>> To: Cluster Labs - All topics related to open-source clustering welcomed
>> 
>> Subject: Re: [ClusterLabs] Why Won't Resources Move?
>> 
>> On Wed, 2018-08-01 at 03:49 +, Eric Robinson wrote:
>> > I have what seems to be a healthy cluster, but I can’t get resources
>> > to move.
>> >
>> > Here’s what’s installed…
>> >
>> > [root@001db01a cluster]# yum list installed|egrep "pacem|coro"
>> > corosync.x86_64  2.4.3-2.el7_5.1 @updates
>> > corosynclib.x86_64   2.4.3-2.el7_5.1 @updates
>> > pacemaker.x86_64 1.1.18-11.el7_5.3 @updates
>> > pacemaker-cli.x86_64 1.1.18-11.el7_5.3 @updates
>> > pacemaker-cluster-libs.x86_641.1.18-11.el7_5.3 @updates
>> > pacemaker-libs.x86_641.1.18-11.el7_5.3 @updates
>> >
>> > Cluster status looks good…
>> >
>> > [root@001db01b cluster]# pcs status
>> > Cluster name: 001db01ab
>> > Stack: corosync
>> > Current DC: 001db01b (version 1.1.18-11.el7_5.3-2b07d5c5a9) -
>> > partition with quorum Last updated: Wed Aug  1 03:44:47 2018 Last
>> > change: Wed Aug  1 03:22:18 2018 by root via cibadmin on 001db01a
>> >
>> > 2 nodes configured
>> > 11 resources configured
>> >
>> > Online: [ 001db01a 001db01b ]
>> >
>> > Full list of resources:
>> >
>> > p_vip_clust01  (ocf::heartbeat:IPaddr2):   Started 001db01b
>> > p_azip_clust01 (ocf::heartbeat:AZaddr2):   Started 001db01b
>> > Master/Slave Set: ms_drbd0 [p_drbd0]
>> >  Masters: [ 001db01b ]
>> >  Slaves: [ 001db01a ]
>> > Master/Slave Set: ms_drbd1 [p_drbd1]
>> >  Masters: [ 001db01b ]
>> >  Slaves: [ 001db01a ]
>> > p_fs_clust01   (ocf::heartbeat:Filesystem):Started 001db01b
>> > p_fs_clust02   (ocf::heartbeat:Filesystem):Started 001db01b
>> > p_vip_clust02  (ocf::heartbeat:IPaddr2):   Started 001db01b
>> > p_azip_clust02 (ocf::heartbeat:AZaddr2):   Started 001db01b
>> > p_mysql_001(lsb:mysql_001):Started 001db01b
>> >
>> > Daemon Status:
>> >   corosync: active/disabled
>> >   pacemaker: active/disabled
>> >   pcsd: active/enabled
>> >
>> > Constraints look like this…
>> >
>> > [root@001db01b cluster]# pcs constraint Location Constraints:
>> > Ordering Constraints:
>> >   promote ms_drbd0 then start p_fs_clust01 (kind:Mandatory)
>> >   promote ms_drbd1 then start p_fs_clust02 (kind:Mandatory)
>> >   start p_fs_clust01 then start p_vip_clust01 (kind:Mandatory)
>> >   start p_vip_clust01 then start p_azip_clust01 (kind:Mandatory)
>> >   start p_fs_clust02 then start p_vip_clust02 (kind:Mandatory)
>> >   start p_vip_clust02 then start p_azip_clust02 (kind:Mandatory)
>> >   start p_vip_clust01 then start p_mysql_001 (kind:Mandatory)
>> > Colocation Constraints:
>> >   p_azip_clust01 with p_vip_clust01 (score:INFINITY)
>> >   p_fs_clust01 with ms_drbd0 (score:INFINITY) (with-rsc-role:Master)
>> >   p_fs_clust02 with ms_drbd1 (score:INFINITY) (with-rsc-role:Master)
>> >   p_vip_clust01 with p_fs_clust01 (score:INFINITY)
>> >   p_vip_clust02 with p_fs_clust02 (score:INFINITY)
>> >   p_azip_clust02 with p_vip_clust02 (score:INFINITY)
>> >   p_mysql_001 with p_vip_clust01 (score:INFINITY) Ticket Constraints:
>> >
>> > But when I issue a move command, nothing at all happens.
>> >
>> > I see this in the log on one node…
>> >
>> > Aug 01 03:21:57 [16550] 001db01bcib: info:
>> > cib_perform_op:  ++ /cib/configuration/constraints:  <rsc_location
>> > id="cli-prefer-ms_drbd0" rsc="ms_drbd0" role="Started"
>> > node="001db01a" score="INFINITY"/>
>> > Aug 01 03:21:57 [16550] 001db01bcib: info:
>> > cib_process_request: Completed cib_modify operation for section
>> > constraints: OK (rc=0, origin=001db01a/crm_resource/4,
>> > version=0.138.0)
>> > Aug 01 03:21:57 [16555] 001db01b   crmd: info:
>> > abort_transition_graph:  Transition aborted by rsc_location.cli-
>> > prefer-ms_drbd0 'create': Configuration change | cib=0.138.0
>> > source=te_update_diff:456 path=/cib/configuration/constraints
>> > complete=true
>> >
>> > And I see this in the log on the other node…
>> >
>> > notice: p_drbd1_monitor_6:69196:stderr [ Error signing on to the
>> > CIB service: Transport endpoint is not connected ]
>> 
>> The message likely came from the resource agent calling crm_attribute to
set
>> a node attribute. That message usually means the cluster isn't running on 
> that
>> node, so it's highly suspect. The cib might have crashed, which should be
in 
> the
>> log as well. I'd look into that first.
> 
> 
> I rebooted the server and afterwards I'm still getting tons of these...
> 
> Aug  2 01:43:40 001db01a drbd(p_drbd1)[18628]: ERROR: ha02_mysql: Called 
> 

[ClusterLabs] Antw: Re: Fence agent ends up stopped with no clear reason why

2018-08-02 Thread Ulrich Windl
Hi!

I think "Processing failed op start for vmware_fence on q-gp2-dbpg57-3:
unknown error (1)" is the reason. You should investigate why it could not be
started.

Regards,
Ulrich
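
Once the underlying start problem is understood, the recorded failures can be inspected and cleared so the cluster retries the device; a small sketch using the resource id from the thread:

# Show where and how often vmware_fence has failed, then clear the failure
# history so pacemaker attempts to start it again.
pcs resource failcount show vmware_fence
pcs resource cleanup vmware_fence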

>>> Casey Allen Shobe  wrote on 2018-08-01 at 21:43
in message <1abda8cb-59c0-467c-b540-1ff498430...@icloud.com>:
> Here is the corosync.log for the first host in the list at the indicated 
> time.  Not sure what it's doing or why ‑ all cluster nodes were up and
running 
> the entire time...no fencing events.
> 
> Jul 30 21:46:30 [3878] q‑gp2‑dbpg57‑1cib: info: cib_perform_op: 
   
> Diff: ‑‑‑ 0.700.4 2
> Jul 30 21:46:30 [3878] q‑gp2‑dbpg57‑1cib: info: cib_perform_op: 
   
> Diff: +++ 0.700.5 (null)
> Jul 30 21:46:30 [3878] q‑gp2‑dbpg57‑1cib: info: cib_perform_op: 
   
> +  /cib:  @num_updates=5
> Jul 30 21:46:30 [3878] q‑gp2‑dbpg57‑1cib: info: cib_perform_op: 
   
> +  
>
/cib/status/node_state[@id='3']/lrm[@id='3']/lrm_resources/lrm_resource[@id='
> vmware_fence']/lrm_rsc_op[@id='vmware_fence_last_0']:  
> @operation_key=vmware_fence_start_0, @operation=start, 
> @transition‑key=42:5084:0:68fc0c5a‑8a09‑4d53‑90d5‑c1a237542060, 
> @transition‑magic=4:1;42:5084:0:68fc0c5a‑8a09‑4d53‑90d5‑c1a237542060, 
> @call‑id=42, @rc‑code=1, @op‑status=4, @exec‑time=1510
> Jul 30 21:46:30 [3878] q‑gp2‑dbpg57‑1cib: info: cib_perform_op: 
   
> +  
>
/cib/status/node_state[@id='3']/lrm[@id='3']/lrm_resources/lrm_resource[@id='
> vmware_fence']/lrm_rsc_op[@id='vmware_fence_last_failure_0']:  
> @operation_key=vmware_fence_start_0, @operation=start, 
> @transition‑key=42:5084:0:68fc0c5a‑8a09‑4d53‑90d5‑c1a237542060, 
> @transition‑magic=4:1;42:5084:0:68fc0c5a‑8a09‑4d53‑90d5‑c1a237542060, 
> @call‑id=42, @interval=0, @last‑rc‑change=1532987187, @exec‑time=1510, 
> @op‑digest=8653f310a5c96a63ab95a
> Jul 30 21:46:30 [3878] q‑gp2‑dbpg57‑1cib: info: 
> cib_process_request:Completed cib_modify operation for section 
> status: OK (rc=0, origin=q‑gp2‑dbpg57‑3/crmd/32, version=0.700.5)
> Jul 30 21:46:30 [3883] q‑gp2‑dbpg57‑1   crmd:   notice: 
> abort_transition_graph: Transition aborted by vmware_fence_start_0 
> 'modify' on q‑gp2‑dbpg57‑3: Event failed 
> (magic=4:1;42:5084:0:68fc0c5a‑8a09‑4d53‑90d5‑c1a237542060, cib=0.700.5, 
> source=match_graph_event:381, 0)
> Jul 30 21:46:30 [3883] q‑gp2‑dbpg57‑1   crmd: info: 
> abort_transition_graph: Transition aborted by vmware_fence_start_0 
> 'modify' on q‑gp2‑dbpg57‑3: Event failed 
> (magic=4:1;42:5084:0:68fc0c5a‑8a09‑4d53‑90d5‑c1a237542060, cib=0.700.5, 
> source=match_graph_event:381, 0)
> Jul 30 21:46:30 [3883] q‑gp2‑dbpg57‑1   crmd:   notice: run_graph:  
> Transition 5084 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=1, 
> Source=/var/lib/pacemaker/pengine/pe‑input‑729.bz2): Complete
> Jul 30 21:46:30 [3883] q‑gp2‑dbpg57‑1   crmd: info: 
> do_state_transition:State transition S_TRANSITION_ENGINE ‑> 
> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jul 30 21:46:30 [3878] q‑gp2‑dbpg57‑1cib: info: 
> cib_process_request:Forwarding cib_modify operation for section 
> status to master (origin=local/attrd/46)
> Jul 30 21:46:30 [3882] q‑gp2‑dbpg57‑1pengine: info: 
> determine_online_status_fencing:Node q‑gp2‑dbpg57‑1 is active
> Jul 30 21:46:30 [3882] q‑gp2‑dbpg57‑1pengine: info: 
> determine_online_status:Node q‑gp2‑dbpg57‑1 is online
> Jul 30 21:46:30 [3882] q‑gp2‑dbpg57‑1pengine: info: 
> determine_online_status_fencing:Node q‑gp2‑dbpg57‑3 is active
> Jul 30 21:46:30 [3882] q‑gp2‑dbpg57‑1pengine: info: 
> determine_online_status:Node q‑gp2‑dbpg57‑3 is online
> Jul 30 21:46:30 [3882] q‑gp2‑dbpg57‑1pengine: info: 
> determine_online_status_fencing:Node q‑gp2‑dbpg57‑2 is active
> Jul 30 21:46:30 [3882] q‑gp2‑dbpg57‑1pengine: info: 
> determine_online_status:Node q‑gp2‑dbpg57‑2 is online
> Jul 30 21:46:30 [3882] q‑gp2‑dbpg57‑1pengine: info: 
> determine_op_status:Operation monitor found resource 
> postgresql‑master‑vip active on q‑gp2‑dbpg57‑1
> Jul 30 21:46:30 [3882] q‑gp2‑dbpg57‑1pengine: info: 
> determine_op_status:Operation monitor found resource 
> postgresql‑master‑vip active on q‑gp2‑dbpg57‑1
> Jul 30 21:46:30 [3882] q‑gp2‑dbpg57‑1pengine: info: 
> determine_op_status:Operation monitor found resource 
> postgresql‑10‑main:0 active in master mode on q‑gp2‑dbpg57‑1
> Jul 30 21:46:30 [3882] q‑gp2‑dbpg57‑1pengine: info: 
> determine_op_status:Operation monitor found resource 
> postgresql‑10‑main:0 active in master mode on q‑gp2‑dbpg57‑1
> Jul 30 21:46:30 [3882] q‑gp2‑dbpg57‑1pengine: info: 
> determine_op_status:Operation monitor found resource 
> postgresql‑10‑main:1 active on q‑gp2‑dbpg57‑3
> Jul 30 21:46:30 [3882] q‑gp2‑dbpg57‑1pengine: info: 
> determine_op_status: