On 09/19/2016 02:31 AM, Auer, Jens wrote:
> Hi,
>
> I am not sure that pacemaker should do any fencing here. In my setting,
> corosync is configured to use a back-to-back connection for heartbeats.
> This is a different subnet than the one used by the ping resource that
> checks the network connectivity and detects a failure. In my test, I bring
> down the network device used by ping and this triggers the failover. The
> node status is known to pacemaker since it still receives heartbeats, so
> this is only a resource failure. I asked about fencing conditions a few
> days ago, and was basically assured that a resource failure should not
> trigger STONITH actions unless explicitly configured.
Is the network interface being taken down here the one used for corosync
communication? If so, that is a node-level failure, and pacemaker will fence.

There is a distinction between DRBD fencing and pacemaker fencing. The DRBD
configuration is designed so that DRBD's fencing method goes through
pacemaker. When DRBD is configured with 'fencing resource-only;' and
'fence-peer "/usr/lib/drbd/crm-fence-peer.sh";', and DRBD detects a network
outage, it tries to add a constraint that prevents the other node from
becoming master, and it removes the constraint when connectivity is restored.

I am not familiar with all the under-the-hood details, but IIUC, if pacemaker
actually fences the node, then the other node can still take over the DRBD
device. But if there is a network outage and no pacemaker fencing, you will
see the behavior you describe -- DRBD prevents master takeover, to avoid
stale data being used.

> I am also wondering why this is "sticky". After a failover test the DRBD
> resources are not working even if I restart the cluster on all nodes.
>
> Best wishes,
>   Jens
>
> --
> Dr. Jens Auer | CGI | Software Engineer
> CGI Deutschland Ltd. & Co. KG
> Rheinstraße 95 | 64295 Darmstadt | Germany
> T: +49 6151 36860 154
> [email protected]
> Our mandatory disclosures pursuant to § 35a GmbHG / §§ 161, 125a HGB can
> be found at de.cgi.com/pflichtangaben.
>
> CONFIDENTIALITY NOTICE: Proprietary/Confidential information belonging to
> CGI Group Inc. and its affiliates may be contained in this message. If you
> are not a recipient indicated or intended in this message (or responsible
> for delivery of this message to such person), or you think for any reason
> that this message may have been addressed to you in error, you may not use
> or copy or deliver this message to anyone else. In such case, you should
> destroy this message and are asked to notify the sender by reply e-mail.
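On the "sticky" part: crm-fence-peer.sh leaves its constraint in the CIB,
where it survives cluster restarts, until the after-resync-target handler
(crm-unfence-peer.sh) removes it after a successful resync. A sketch of how
to check for and clear a leftover constraint by hand with pcs (the constraint
id is the one reported in the log later in this thread):

```shell
# List all constraints with their IDs; a leftover DRBD fence constraint
# shows up with an id of the form drbd-fence-by-handler-<res>-<ms-res>.
pcs constraint --full

# If the peer is healthy and connected again but the constraint was never
# removed, it can be deleted manually (id taken from the log below):
pcs constraint remove drbd-fence-by-handler-shared_fs-drbd1_sync
```

Only do this once you have verified both DRBD peers are Connected/UpToDate;
removing the constraint while the peer really is stale defeats the fencing.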
>
>> -----Original Message-----
>> From: Ken Gaillot [mailto:[email protected]]
>> Sent: 16 September 2016 17:56
>> To: [email protected]
>> Subject: Re: [ClusterLabs] No DRBD resource promoted to master in
>> Active/Passive setup
>>
>> On 09/16/2016 10:02 AM, Auer, Jens wrote:
>>> Hi,
>>>
>>> I have an Active/Passive configuration with a drbd master/slave resource:
>>>
>>> MDA1PFP-S01 14:40:27 1803 0 ~ # pcs status
>>> Cluster name: MDA1PFP
>>> Last updated: Fri Sep 16 14:41:18 2016
>>> Last change: Fri Sep 16 14:39:49 2016 by root via cibadmin on MDA1PFP-PCS01
>>> Stack: corosync
>>> Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
>>> 2 nodes and 7 resources configured
>>>
>>> Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
>>>
>>> Full list of resources:
>>>
>>>  Master/Slave Set: drbd1_sync [drbd1]
>>>      Masters: [ MDA1PFP-PCS02 ]
>>>      Slaves: [ MDA1PFP-PCS01 ]
>>>  mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS02
>>>  Clone Set: ping-clone [ping]
>>>      Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
>>>  ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS02
>>>  shared_fs (ocf::heartbeat:Filesystem): Started MDA1PFP-PCS02
>>>
>>> PCSD Status:
>>>   MDA1PFP-PCS01: Online
>>>   MDA1PFP-PCS02: Online
>>>
>>> Daemon Status:
>>>   corosync: active/disabled
>>>   pacemaker: active/disabled
>>>   pcsd: active/enabled
>>>
>>> MDA1PFP-S01 14:41:19 1804 0 ~ # pcs resource --full
>>>  Master: drbd1_sync
>>>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>>>   Resource: drbd1 (class=ocf provider=linbit type=drbd)
>>>    Attributes: drbd_resource=shared_fs
>>>    Operations: start interval=0s timeout=240 (drbd1-start-interval-0s)
>>>                promote interval=0s timeout=90 (drbd1-promote-interval-0s)
>>>                demote interval=0s timeout=90 (drbd1-demote-interval-0s)
>>>                stop interval=0s timeout=100 (drbd1-stop-interval-0s)
>>>                monitor interval=60s (drbd1-monitor-interval-60s)
>>>  Resource: mda-ip (class=ocf provider=heartbeat type=IPaddr2)
>>>   Attributes: ip=192.168.120.20 cidr_netmask=32 nic=bond0
>>>   Operations: start interval=0s timeout=20s (mda-ip-start-interval-0s)
>>>               stop interval=0s timeout=20s (mda-ip-stop-interval-0s)
>>>               monitor interval=1s (mda-ip-monitor-interval-1s)
>>>  Clone: ping-clone
>>>   Resource: ping (class=ocf provider=pacemaker type=ping)
>>>    Attributes: dampen=5s multiplier=1000 host_list=pf-pep-dev-1 timeout=1 attempts=3
>>>    Operations: start interval=0s timeout=60 (ping-start-interval-0s)
>>>                stop interval=0s timeout=20 (ping-stop-interval-0s)
>>>                monitor interval=1 (ping-monitor-interval-1)
>>>  Resource: ACTIVE (class=ocf provider=heartbeat type=Dummy)
>>>   Operations: start interval=0s timeout=20 (ACTIVE-start-interval-0s)
>>>               stop interval=0s timeout=20 (ACTIVE-stop-interval-0s)
>>>               monitor interval=10 timeout=20 (ACTIVE-monitor-interval-10)
>>>  Resource: shared_fs (class=ocf provider=heartbeat type=Filesystem)
>>>   Attributes: device=/dev/drbd1 directory=/shared_fs fstype=xfs
>>>   Operations: start interval=0s timeout=60 (shared_fs-start-interval-0s)
>>>               stop interval=0s timeout=60 (shared_fs-stop-interval-0s)
>>>               monitor interval=20 timeout=40 (shared_fs-monitor-interval-20)
>>>
>>> MDA1PFP-S01 14:41:35 1805 0 ~ # pcs constraint --full
>>> Location Constraints:
>>>   Resource: mda-ip
>>>     Enabled on: MDA1PFP-PCS01 (score:50) (id:location-mda-ip-MDA1PFP-PCS01-50)
>>>     Constraint: location-mda-ip
>>>       Rule: score=-INFINITY boolean-op=or (id:location-mda-ip-rule)
>>>         Expression: pingd lt 1 (id:location-mda-ip-rule-expr)
>>>         Expression: not_defined pingd (id:location-mda-ip-rule-expr-1)
>>> Ordering Constraints:
>>>   start ping-clone then start mda-ip (kind:Optional) (id:order-ping-clone-mda-ip-Optional)
>>>   promote drbd1_sync then start shared_fs (kind:Mandatory) (id:order-drbd1_sync-shared_fs-mandatory)
>>> Colocation Constraints:
>>>   ACTIVE with mda-ip (score:INFINITY) (id:colocation-ACTIVE-mda-ip-INFINITY)
>>>   drbd1_sync with mda-ip (score:INFINITY) (rsc-role:Master) (with-rsc-role:Started) (id:colocation-drbd1_sync-mda-ip-INFINITY)
>>>   shared_fs with drbd1_sync (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-shared_fs-drbd1_sync-INFINITY)
>>>
>>> The cluster starts fine, except that resources do not start on the
>>> preferred host. I asked about this in a separate question to keep things
>>> separated. The status after starting is:
>>> Last updated: Fri Sep 16 14:39:57 2016
>>> Last change: Fri Sep 16 14:39:49 2016 by root via cibadmin on MDA1PFP-PCS01
>>> Stack: corosync
>>> Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
>>> 2 nodes and 7 resources configured
>>>
>>> Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
>>>
>>>  Master/Slave Set: drbd1_sync [drbd1]
>>>      Masters: [ MDA1PFP-PCS02 ]
>>>      Slaves: [ MDA1PFP-PCS01 ]
>>>  mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS02
>>>  Clone Set: ping-clone [ping]
>>>      Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
>>>  ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS02
>>>  shared_fs (ocf::heartbeat:Filesystem): Started MDA1PFP-PCS02
>>>
>>> From this state, I did two tests to simulate a cluster failover:
>>> 1. Shut down the cluster node with the master with pcs cluster stop
>>> 2. Disable the network device for the virtual IP with ifdown and wait
>>>    until ping detects it
>>>
>>> In both cases, the failover is executed but drbd is not promoted
>>> to master on the new active node:
>>> Last updated: Fri Sep 16 14:43:33 2016
>>> Last change: Fri Sep 16 14:43:31 2016 by root via cibadmin on MDA1PFP-PCS01
>>> Stack: corosync
>>> Current DC: MDA1PFP-PCS01 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
>>> 2 nodes and 7 resources configured
>>>
>>> Online: [ MDA1PFP-PCS01 ]
>>> OFFLINE: [ MDA1PFP-PCS02 ]
>>>
>>>  Master/Slave Set: drbd1_sync [drbd1]
>>>      Slaves: [ MDA1PFP-PCS01 ]
>>>  mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS01
>>>  Clone Set: ping-clone [ping]
>>>      Started: [ MDA1PFP-PCS01 ]
>>>  ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS01
>>>
>>> I was able to trace this to the fencing in the drbd configuration:
>>> MDA1PFP-S01 14:41:44 1806 0 ~ # cat /etc/drbd.d/shared_fs.res
>>> resource shared_fs {
>>>   disk /dev/mapper/rhel_mdaf--pf--pep--1-drbd;
>>>   disk {
>>>     fencing resource-only;
>>>   }
>>>   handlers {
>>>     fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>>>     after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>>>   }
>>>   device /dev/drbd1;
>>>   meta-disk internal;
>>>   on MDA1PFP-S01 {
>>>     address 192.168.123.10:7789;
>>>   }
>>>   on MDA1PFP-S02 {
>>>     address 192.168.123.11:7789;
>>>   }
>>> }
>>
>> This coordinates fencing between DRBD and pacemaker. You still have to
>> configure fencing in pacemaker. If pacemaker can't fence the unseen node,
>> it can't be sure it's safe to bring up master.
>>
>>> I am using drbd 8.4.7, drbd utils 8.9.5 and pacemaker 2.3.4-7.el7 with
>>> corosync 0.9.143-15.el7 from the CentOS 7 repositories.
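Ken's point above, that fencing must also be configured in pacemaker, could
look roughly like the sketch below for a two-node cluster with IPMI-capable
hardware. The fence agent fence_ipmilan is real, but the device names,
addresses, and credentials here are made-up placeholders; substitute whatever
fence agent and parameters match the actual hardware:

```shell
# Hypothetical example: one fence_ipmilan stonith device per node.
# All IP addresses and credentials below are placeholders.
pcs stonith create fence-s01 fence_ipmilan \
    pcmk_host_list=MDA1PFP-PCS01 ipaddr=192.168.125.10 \
    login=admin passwd=secret action=reboot
pcs stonith create fence-s02 fence_ipmilan \
    pcmk_host_list=MDA1PFP-PCS02 ipaddr=192.168.125.11 \
    login=admin passwd=secret action=reboot

# Keep each fence device away from the node it is meant to kill.
pcs constraint location fence-s01 avoids MDA1PFP-PCS01
pcs constraint location fence-s02 avoids MDA1PFP-PCS02

# Pacemaker only uses the devices if stonith is enabled cluster-wide.
pcs property set stonith-enabled=true
```

With working stonith, pacemaker can fence the unseen peer after the network
outage, and the surviving node can then safely promote DRBD to master.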
>>>
>>> MDA1PFP-S01 15:00:20 1841 0 ~ # drbdadm --version
>>> DRBDADM_BUILDTAG=GIT-hash:\ 5d50d9fb2a967d21c0f5746370ccc066d3a67f7d\ build\ by\ mockbuild@\,\ 2016-01-12\ 12:46:45
>>> DRBDADM_API_VERSION=1
>>> DRBD_KERNEL_VERSION_CODE=0x080407
>>> DRBDADM_VERSION_CODE=0x080905
>>> DRBDADM_VERSION=8.9.5
>>>
>>> If I disable the fencing scripts, everything works as expected. If they
>>> are enabled, no node is promoted to master after failover. It seems to be
>>> a sticky modification, because once a failover is simulated with the
>>> fencing scripts activated I cannot get the cluster to work anymore. Even
>>> removing the setting from the DRBD configuration does not help.
>>>
>>> I captured the complete log from /var/log/messages from cluster start
>>> to failover if that helps:
>>> MDA1PFP-S01 14:48:37 1807 0 ~ # cat /var/log/messages
>>> Sep 16 14:40:16 MDA1PFP-S01 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="13857" x-info="http://www.rsyslog.com"] start
>>> Sep 16 14:40:16 MDA1PFP-S01 rsyslogd-2221: module 'imuxsock' already in this config, cannot be added [try http://www.rsyslog.com/e/2221 ]
>>> Sep 16 14:40:16 MDA1PFP-S01 systemd: Stopping System Logging Service...
>>> Sep 16 14:40:16 MDA1PFP-S01 systemd: Starting System Logging Service...
>>> Sep 16 14:40:16 MDA1PFP-S01 systemd: Started System Logging Service.
>>> Sep 16 14:40:27 MDA1PFP-S01 systemd: Started Corosync Cluster Engine.
>>> Sep 16 14:40:27 MDA1PFP-S01 systemd: Started Pacemaker High Availability Cluster Manager.
>>> Sep 16 14:43:30 MDA1PFP-S01 crmd[13130]: notice: Operation ACTIVE_start_0: ok (node=MDA1PFP-PCS01, call=33, rc=0, cib-update=22, confirmed=true)
>>> Sep 16 14:43:30 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=32, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:30 MDA1PFP-S01 IPaddr2(mda-ip)[15321]: INFO: Adding inet address 192.168.120.20/32 with broadcast address 192.168.120.255 to device bond0
>>> Sep 16 14:43:30 MDA1PFP-S01 avahi-daemon[912]: Registering new address record for 192.168.120.20 on bond0.IPv4.
>>> Sep 16 14:43:30 MDA1PFP-S01 IPaddr2(mda-ip)[15321]: INFO: Bringing device bond0 up
>>> Sep 16 14:43:30 MDA1PFP-S01 kernel: block drbd1: peer( Primary -> Secondary )
>>> Sep 16 14:43:30 MDA1PFP-S01 IPaddr2(mda-ip)[15321]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.120.20 bond0 192.168.120.20 auto not_used not_used
>>> Sep 16 14:43:30 MDA1PFP-S01 crmd[13130]: notice: Operation mda-ip_start_0: ok (node=MDA1PFP-PCS01, call=35, rc=0, cib-update=24, confirmed=true)
>>> Sep 16 14:43:30 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=36, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:30 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=38, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:30 MDA1PFP-S01 kernel: drbd shared_fs: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
>>> Sep 16 14:43:30 MDA1PFP-S01 kernel: drbd shared_fs: ack_receiver terminated
>>> Sep 16 14:43:30 MDA1PFP-S01 kernel: drbd shared_fs: Terminating drbd_a_shared_f
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: Connection closed
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: conn( TearDown -> Unconnected )
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: receiver terminated
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: Restarting receiver thread
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: receiver (re)started
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: conn( Unconnected -> WFConnection )
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=39, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=40, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: helper command: /sbin/drbdadm fence-peer shared_fs
>>> Sep 16 14:43:31 MDA1PFP-S01 crm-fence-peer.sh[15569]: invoked for shared_fs
>>> Sep 16 14:43:31 MDA1PFP-S01 crm-fence-peer.sh[15569]: INFO peer is not reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-shared_fs-drbd1_sync'
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: helper command: /sbin/drbdadm fence-peer shared_fs exit code 5 (0x500)
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: fence-peer helper returned 5 (peer is unreachable, assumed to be dead)
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: pdsk( DUnknown -> Outdated )
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: block drbd1: role( Secondary -> Primary )
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: block drbd1: new current UUID B1FC3E9C008711DD:C02542C7B26F9B28:BCC6102B1FD69768:BCC5102B1FD69768
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_promote_0: ok (node=MDA1PFP-PCS01, call=41, rc=0, cib-update=26, confirmed=true)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=42, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Our peer on the DC (MDA1PFP-PCS02) is dead
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK origin=peer_update_callback ]
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
>>> Sep 16 14:43:31 MDA1PFP-S01 attrd[13128]: notice: crm_update_peer_proc: Node MDA1PFP-PCS02[2] - state is now lost (was member)
>>> Sep 16 14:43:31 MDA1PFP-S01 attrd[13128]: notice: Removing all MDA1PFP-PCS02 attributes for attrd_peer_change_cb
>>> Sep 16 14:43:31 MDA1PFP-S01 attrd[13128]: notice: Lost attribute writer MDA1PFP-PCS02
>>> Sep 16 14:43:31 MDA1PFP-S01 attrd[13128]: notice: Removing MDA1PFP-PCS02/2 from the membership list
>>> Sep 16 14:43:31 MDA1PFP-S01 attrd[13128]: notice: Purged 1 peers with id=2 and/or uname=MDA1PFP-PCS02 from the membership cache
>>> Sep 16 14:43:31 MDA1PFP-S01 stonith-ng[13125]: notice: crm_update_peer_proc: Node MDA1PFP-PCS02[2] - state is now lost (was member)
>>> Sep 16 14:43:31 MDA1PFP-S01 stonith-ng[13125]: notice: Removing MDA1PFP-PCS02/2 from the membership list
>>> Sep 16 14:43:31 MDA1PFP-S01 stonith-ng[13125]: notice: Purged 1 peers with id=2 and/or uname=MDA1PFP-PCS02 from the membership cache
>>> Sep 16 14:43:31 MDA1PFP-S01 cib[13124]: notice: crm_update_peer_proc: Node MDA1PFP-PCS02[2] - state is now lost (was member)
>>> Sep 16 14:43:31 MDA1PFP-S01 cib[13124]: notice: Removing MDA1PFP-PCS02/2 from the membership list
>>> Sep 16 14:43:31 MDA1PFP-S01 cib[13124]: notice: Purged 1 peers with id=2 and/or uname=MDA1PFP-PCS02 from the membership cache
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: warning: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Notifications disabled
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
>>> Sep 16 14:43:31 MDA1PFP-S01 pengine[13129]: notice: On loss of CCM Quorum: Ignore
>>> Sep 16 14:43:31 MDA1PFP-S01 pengine[13129]: notice: Demote drbd1:0 (Master -> Slave MDA1PFP-PCS01)
>>> Sep 16 14:43:31 MDA1PFP-S01 pengine[13129]: notice: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-414.bz2
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Initiating action 55: notify drbd1_pre_notify_demote_0 on MDA1PFP-PCS01 (local)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=43, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Initiating action 8: demote drbd1_demote_0 on MDA1PFP-PCS01 (local)
>>> Sep 16 14:43:31 MDA1PFP-S01 systemd-udevd: error: /dev/drbd1: Wrong medium type
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: block drbd1: role( Primary -> Secondary )
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: block drbd1: bitmap WRITE of 0 pages took 0 jiffies
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
>>> Sep 16 14:43:31 MDA1PFP-S01 systemd-udevd: error: /dev/drbd1: Wrong medium type
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_demote_0: ok (node=MDA1PFP-PCS01, call=44, rc=0, cib-update=49, confirmed=true)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Initiating action 56: notify drbd1_post_notify_demote_0 on MDA1PFP-PCS01 (local)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=45, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Initiating action 10: monitor drbd1_monitor_60000 on MDA1PFP-PCS01 (local)
>>> Sep 16 14:43:31 MDA1PFP-S01 corosync[13019]: [TOTEM ] A new membership (192.168.121.10:988) was formed. Members left: 2
>>> Sep 16 14:43:31 MDA1PFP-S01 corosync[13019]: [QUORUM] Members[1]: 1
>>> Sep 16 14:43:31 MDA1PFP-S01 corosync[13019]: [MAIN  ] Completed service synchronization, ready to provide service.
>>> Sep 16 14:43:31 MDA1PFP-S01 pacemakerd[13113]: notice: crm_reap_unseen_nodes: Node MDA1PFP-PCS02[2] - state is now lost (was member)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: crm_reap_unseen_nodes: Node MDA1PFP-PCS02[2] - state is now lost (was member)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: warning: No match for shutdown action on 2
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Stonith/shutdown of MDA1PFP-PCS02 not matched
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Transition aborted: Node failure (source=peer_update_callback:252, 0)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Transition 0 (Complete=10, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-414.bz2): Complete
>>> Sep 16 14:43:31 MDA1PFP-S01 pengine[13129]: notice: On loss of CCM Quorum: Ignore
>>> Sep 16 14:43:31 MDA1PFP-S01 pengine[13129]: notice: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-415.bz2
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Transition 1 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-415.bz2): Complete
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>>> Sep 16 14:48:48 MDA1PFP-S01 chronyd[909]: Source 62.116.162.126 replaced with 46.182.19.75
>>>
>>> Any help appreciated,
>>>   Jens
>>>
>>> --
>>> Jens Auer | CGI | Software-Engineer
>>> CGI (Germany) GmbH & Co. KG
>>> Rheinstraße 95 | 64295 Darmstadt | Germany
>>> T: +49 6151 36860 154
>>> [email protected] <mailto:[email protected]>
>>> Our mandatory disclosures pursuant to § 35a GmbHG / §§ 161, 125a HGB can
>>> be found at de.cgi.com/pflichtangaben <http://de.cgi.com/pflichtangaben>.

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
