On 09/19/2016 02:31 AM, Auer, Jens wrote:
> Hi,
>
> I am not sure that pacemaker should do any fencing here. In my setting,
> corosync is configured to use a back-to-back connection for heartbeats.
> This is a different subnet than the one used by the ping resource that
> checks the network connectivity and detects a failure. In my test, I bring
> down the network device used by ping and this triggers the failover. The
> node status is known to pacemaker since it still receives heartbeats, so
> this is only a resource failure. I asked about fencing conditions a few
> days ago, and was basically assured that a resource failure should not
> trigger STONITH actions unless explicitly configured.
Is the network interface being taken down here the one used for corosync
communication? If so, that is a node-level failure, and pacemaker will fence.

There is a distinction between DRBD fencing and pacemaker fencing. The DRBD
configuration is designed so that DRBD's fencing method goes through
pacemaker. When DRBD is configured with 'fencing resource-only;' and
'fence-peer "/usr/lib/drbd/crm-fence-peer.sh";', and DRBD detects a network
outage, it tries to add a constraint that prevents the other node from
becoming master, and it removes the constraint when connectivity is restored.

I am not familiar with all the under-the-hood details, but IIUC, if pacemaker
actually fences the node, then the other node can still take over the DRBD
device. But if there is a network outage and no pacemaker fencing, you will
see the behavior you describe -- DRBD prevents master takeover, to avoid
stale data being used.

> I am also wondering why this is "sticky". After a failover test the DRBD
> resources are not working even if I restart the cluster on all nodes.
>
> Best wishes,
>   Jens
>
> --
> Dr. Jens Auer | CGI | Software Engineer
> CGI Deutschland Ltd. & Co. KG
> Rheinstraße 95 | 64295 Darmstadt | Germany
> T: +49 6151 36860 154
> [email protected]
> Our mandatory disclosures pursuant to § 35a GmbHG / §§ 161, 125a HGB can
> be found at de.cgi.com/pflichtangaben.
>
> CONFIDENTIALITY NOTICE: Proprietary/Confidential information belonging to
> CGI Group Inc. and its affiliates may be contained in this message. If you
> are not a recipient indicated or intended in this message (or responsible
> for delivery of this message to such person), or you think for any reason
> that this message may have been addressed to you in error, you may not use
> or copy or deliver this message to anyone else. In such case, you should
> destroy this message and are asked to notify the sender by reply e-mail.
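On the "sticky" part: crm-fence-peer.sh leaves its constraint in the CIB,
where it survives cluster restarts, until the after-resync-target handler
(crm-unfence-peer.sh) removes it after a successful resync. A sketch of how
to check for and clear a leftover constraint by hand with pcs (the constraint
id is the one reported in the log later in this thread):

```shell
# List all constraints with their IDs; a leftover DRBD fence constraint
# shows up with an id of the form drbd-fence-by-handler-<res>-<ms-res>.
pcs constraint --full

# If the peer is healthy and connected again but the constraint was never
# removed, it can be deleted manually (id taken from the log below):
pcs constraint remove drbd-fence-by-handler-shared_fs-drbd1_sync
```

Only do this once you have verified both DRBD peers are Connected/UpToDate;
removing the constraint while the peer really is stale defeats the fencing.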
>
>> -----Original Message-----
>> From: Ken Gaillot [mailto:[email protected]]
>> Sent: 16 September 2016 17:56
>> To: [email protected]
>> Subject: Re: [ClusterLabs] No DRBD resource promoted to master in
>> Active/Passive setup
>>
>> On 09/16/2016 10:02 AM, Auer, Jens wrote:
>>> Hi,
>>>
>>> I have an Active/Passive configuration with a drbd master/slave resource:
>>>
>>> MDA1PFP-S01 14:40:27 1803 0 ~ # pcs status
>>> Cluster name: MDA1PFP
>>> Last updated: Fri Sep 16 14:41:18 2016
>>> Last change: Fri Sep 16 14:39:49 2016 by root via cibadmin on MDA1PFP-PCS01
>>> Stack: corosync
>>> Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
>>> 2 nodes and 7 resources configured
>>>
>>> Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
>>>
>>> Full list of resources:
>>>
>>>  Master/Slave Set: drbd1_sync [drbd1]
>>>      Masters: [ MDA1PFP-PCS02 ]
>>>      Slaves: [ MDA1PFP-PCS01 ]
>>>  mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS02
>>>  Clone Set: ping-clone [ping]
>>>      Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
>>>  ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS02
>>>  shared_fs (ocf::heartbeat:Filesystem): Started MDA1PFP-PCS02
>>>
>>> PCSD Status:
>>>   MDA1PFP-PCS01: Online
>>>   MDA1PFP-PCS02: Online
>>>
>>> Daemon Status:
>>>   corosync: active/disabled
>>>   pacemaker: active/disabled
>>>   pcsd: active/enabled
>>>
>>> MDA1PFP-S01 14:41:19 1804 0 ~ # pcs resource --full
>>>  Master: drbd1_sync
>>>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>>>   Resource: drbd1 (class=ocf provider=linbit type=drbd)
>>>    Attributes: drbd_resource=shared_fs
>>>    Operations: start interval=0s timeout=240 (drbd1-start-interval-0s)
>>>                promote interval=0s timeout=90 (drbd1-promote-interval-0s)
>>>                demote interval=0s timeout=90 (drbd1-demote-interval-0s)
>>>                stop interval=0s timeout=100 (drbd1-stop-interval-0s)
>>>                monitor interval=60s (drbd1-monitor-interval-60s)
>>>  Resource: mda-ip (class=ocf provider=heartbeat type=IPaddr2)
>>>   Attributes: ip=192.168.120.20 cidr_netmask=32 nic=bond0
>>>   Operations: start interval=0s timeout=20s (mda-ip-start-interval-0s)
>>>               stop interval=0s timeout=20s (mda-ip-stop-interval-0s)
>>>               monitor interval=1s (mda-ip-monitor-interval-1s)
>>>  Clone: ping-clone
>>>   Resource: ping (class=ocf provider=pacemaker type=ping)
>>>    Attributes: dampen=5s multiplier=1000 host_list=pf-pep-dev-1 timeout=1 attempts=3
>>>    Operations: start interval=0s timeout=60 (ping-start-interval-0s)
>>>                stop interval=0s timeout=20 (ping-stop-interval-0s)
>>>                monitor interval=1 (ping-monitor-interval-1)
>>>  Resource: ACTIVE (class=ocf provider=heartbeat type=Dummy)
>>>   Operations: start interval=0s timeout=20 (ACTIVE-start-interval-0s)
>>>               stop interval=0s timeout=20 (ACTIVE-stop-interval-0s)
>>>               monitor interval=10 timeout=20 (ACTIVE-monitor-interval-10)
>>>  Resource: shared_fs (class=ocf provider=heartbeat type=Filesystem)
>>>   Attributes: device=/dev/drbd1 directory=/shared_fs fstype=xfs
>>>   Operations: start interval=0s timeout=60 (shared_fs-start-interval-0s)
>>>               stop interval=0s timeout=60 (shared_fs-stop-interval-0s)
>>>               monitor interval=20 timeout=40 (shared_fs-monitor-interval-20)
>>>
>>> MDA1PFP-S01 14:41:35 1805 0 ~ # pcs constraint --full
>>> Location Constraints:
>>>   Resource: mda-ip
>>>     Enabled on: MDA1PFP-PCS01 (score:50) (id:location-mda-ip-MDA1PFP-PCS01-50)
>>>     Constraint: location-mda-ip
>>>       Rule: score=-INFINITY boolean-op=or (id:location-mda-ip-rule)
>>>         Expression: pingd lt 1 (id:location-mda-ip-rule-expr)
>>>         Expression: not_defined pingd (id:location-mda-ip-rule-expr-1)
>>> Ordering Constraints:
>>>   start ping-clone then start mda-ip (kind:Optional) (id:order-ping-clone-mda-ip-Optional)
>>>   promote drbd1_sync then start shared_fs (kind:Mandatory) (id:order-drbd1_sync-shared_fs-mandatory)
>>> Colocation Constraints:
>>>   ACTIVE with mda-ip (score:INFINITY) (id:colocation-ACTIVE-mda-ip-INFINITY)
>>>   drbd1_sync with mda-ip (score:INFINITY) (rsc-role:Master) (with-rsc-role:Started) (id:colocation-drbd1_sync-mda-ip-INFINITY)
>>>   shared_fs with drbd1_sync (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-shared_fs-drbd1_sync-INFINITY)
>>>
>>> The cluster starts fine, except that resources do not start on the
>>> preferred host. I asked about this in a separate question to keep things
>>> separated. The status after starting is:
>>> Last updated: Fri Sep 16 14:39:57 2016
>>> Last change: Fri Sep 16 14:39:49 2016 by root via cibadmin on MDA1PFP-PCS01
>>> Stack: corosync
>>> Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
>>> 2 nodes and 7 resources configured
>>>
>>> Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
>>>
>>>  Master/Slave Set: drbd1_sync [drbd1]
>>>      Masters: [ MDA1PFP-PCS02 ]
>>>      Slaves: [ MDA1PFP-PCS01 ]
>>>  mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS02
>>>  Clone Set: ping-clone [ping]
>>>      Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
>>>  ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS02
>>>  shared_fs (ocf::heartbeat:Filesystem): Started MDA1PFP-PCS02
>>>
>>> From this state, I did two tests to simulate a cluster failover:
>>> 1. Shut down the cluster node with the master with pcs cluster stop
>>> 2. Disable the network device for the virtual IP with ifdown and wait
>>>    until ping detects it
>>>
>>> In both cases, the failover is executed but drbd is not promoted
>>> to master on the new active node:
>>> Last updated: Fri Sep 16 14:43:33 2016
>>> Last change: Fri Sep 16 14:43:31 2016 by root via cibadmin on MDA1PFP-PCS01
>>> Stack: corosync
>>> Current DC: MDA1PFP-PCS01 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
>>> 2 nodes and 7 resources configured
>>>
>>> Online: [ MDA1PFP-PCS01 ]
>>> OFFLINE: [ MDA1PFP-PCS02 ]
>>>
>>>  Master/Slave Set: drbd1_sync [drbd1]
>>>      Slaves: [ MDA1PFP-PCS01 ]
>>>  mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS01
>>>  Clone Set: ping-clone [ping]
>>>      Started: [ MDA1PFP-PCS01 ]
>>>  ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS01
>>>
>>> I was able to trace this to the fencing in the drbd configuration:
>>> MDA1PFP-S01 14:41:44 1806 0 ~ # cat /etc/drbd.d/shared_fs.res
>>> resource shared_fs {
>>>   disk /dev/mapper/rhel_mdaf--pf--pep--1-drbd;
>>>   disk {
>>>     fencing resource-only;
>>>   }
>>>   handlers {
>>>     fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>>>     after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>>>   }
>>>   device /dev/drbd1;
>>>   meta-disk internal;
>>>   on MDA1PFP-S01 {
>>>     address 192.168.123.10:7789;
>>>   }
>>>   on MDA1PFP-S02 {
>>>     address 192.168.123.11:7789;
>>>   }
>>> }
>>
>> This coordinates fencing between DRBD and pacemaker. You still have to
>> configure fencing in pacemaker. If pacemaker can't fence the unseen node,
>> it can't be sure it's safe to bring up master.
>>
>>> I am using drbd 8.4.7, drbd utils 8.9.5 and pacemaker 2.3.4-7.el7 with
>>> corosync 0.9.143-15.el7 from the CentOS 7 repositories.
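Ken's point above, that fencing must also be configured in pacemaker, could
look roughly like the sketch below for a two-node cluster with IPMI-capable
hardware. The fence agent fence_ipmilan is real, but the device names,
addresses, and credentials here are made-up placeholders; substitute whatever
fence agent and parameters match the actual hardware:

```shell
# Hypothetical example: one fence_ipmilan stonith device per node.
# All IP addresses and credentials below are placeholders.
pcs stonith create fence-s01 fence_ipmilan \
    pcmk_host_list=MDA1PFP-PCS01 ipaddr=192.168.125.10 \
    login=admin passwd=secret action=reboot
pcs stonith create fence-s02 fence_ipmilan \
    pcmk_host_list=MDA1PFP-PCS02 ipaddr=192.168.125.11 \
    login=admin passwd=secret action=reboot

# Keep each fence device away from the node it is meant to kill.
pcs constraint location fence-s01 avoids MDA1PFP-PCS01
pcs constraint location fence-s02 avoids MDA1PFP-PCS02

# Pacemaker only uses the devices if stonith is enabled cluster-wide.
pcs property set stonith-enabled=true
```

With working stonith, pacemaker can fence the unseen peer after the network
outage, and the surviving node can then safely promote DRBD to master.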
>>>
>>> MDA1PFP-S01 15:00:20 1841 0 ~ # drbdadm --version
>>> DRBDADM_BUILDTAG=GIT-hash:\ 5d50d9fb2a967d21c0f5746370ccc066d3a67f7d\ build\ by\ mockbuild@\,\ 2016-01-12\ 12:46:45
>>> DRBDADM_API_VERSION=1
>>> DRBD_KERNEL_VERSION_CODE=0x080407
>>> DRBDADM_VERSION_CODE=0x080905
>>> DRBDADM_VERSION=8.9.5
>>>
>>> If I disable the fencing scripts, everything works as expected. If they
>>> are enabled, no node is promoted to master after failover. It seems to be
>>> a sticky modification, because once a failover is simulated with the
>>> fencing scripts activated I cannot get the cluster to work anymore. Even
>>> removing the setting from the DRBD configuration does not help.
>>>
>>> I captured the complete log from /var/log/messages from cluster start
>>> to failover if that helps:
>>> MDA1PFP-S01 14:48:37 1807 0 ~ # cat /var/log/messages
>>> Sep 16 14:40:16 MDA1PFP-S01 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="13857" x-info="http://www.rsyslog.com"] start
>>> Sep 16 14:40:16 MDA1PFP-S01 rsyslogd-2221: module 'imuxsock' already in this config, cannot be added [try http://www.rsyslog.com/e/2221 ]
>>> Sep 16 14:40:16 MDA1PFP-S01 systemd: Stopping System Logging Service...
>>> Sep 16 14:40:16 MDA1PFP-S01 systemd: Starting System Logging Service...
>>> Sep 16 14:40:16 MDA1PFP-S01 systemd: Started System Logging Service.
>>> Sep 16 14:40:27 MDA1PFP-S01 systemd: Started Corosync Cluster Engine.
>>> Sep 16 14:40:27 MDA1PFP-S01 systemd: Started Pacemaker High Availability Cluster Manager.
>>> Sep 16 14:43:30 MDA1PFP-S01 crmd[13130]: notice: Operation ACTIVE_start_0: ok (node=MDA1PFP-PCS01, call=33, rc=0, cib-update=22, confirmed=true)
>>> Sep 16 14:43:30 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=32, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:30 MDA1PFP-S01 IPaddr2(mda-ip)[15321]: INFO: Adding inet address 192.168.120.20/32 with broadcast address 192.168.120.255 to device bond0
>>> Sep 16 14:43:30 MDA1PFP-S01 avahi-daemon[912]: Registering new address record for 192.168.120.20 on bond0.IPv4.
>>> Sep 16 14:43:30 MDA1PFP-S01 IPaddr2(mda-ip)[15321]: INFO: Bringing device bond0 up
>>> Sep 16 14:43:30 MDA1PFP-S01 kernel: block drbd1: peer( Primary -> Secondary )
>>> Sep 16 14:43:30 MDA1PFP-S01 IPaddr2(mda-ip)[15321]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.120.20 bond0 192.168.120.20 auto not_used not_used
>>> Sep 16 14:43:30 MDA1PFP-S01 crmd[13130]: notice: Operation mda-ip_start_0: ok (node=MDA1PFP-PCS01, call=35, rc=0, cib-update=24, confirmed=true)
>>> Sep 16 14:43:30 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=36, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:30 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=38, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:30 MDA1PFP-S01 kernel: drbd shared_fs: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
>>> Sep 16 14:43:30 MDA1PFP-S01 kernel: drbd shared_fs: ack_receiver terminated
>>> Sep 16 14:43:30 MDA1PFP-S01 kernel: drbd shared_fs: Terminating drbd_a_shared_f
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: Connection closed
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: conn( TearDown -> Unconnected )
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: receiver terminated
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: Restarting receiver thread
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: receiver (re)started
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: conn( Unconnected -> WFConnection )
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=39, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=40, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: helper command: /sbin/drbdadm fence-peer shared_fs
>>> Sep 16 14:43:31 MDA1PFP-S01 crm-fence-peer.sh[15569]: invoked for shared_fs
>>> Sep 16 14:43:31 MDA1PFP-S01 crm-fence-peer.sh[15569]: INFO peer is not reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-shared_fs-drbd1_sync'
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: helper command: /sbin/drbdadm fence-peer shared_fs exit code 5 (0x500)
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: fence-peer helper returned 5 (peer is unreachable, assumed to be dead)
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: drbd shared_fs: pdsk( DUnknown -> Outdated )
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: block drbd1: role( Secondary -> Primary )
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: block drbd1: new current UUID B1FC3E9C008711DD:C02542C7B26F9B28:BCC6102B1FD69768:BCC5102B1FD69768
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_promote_0: ok (node=MDA1PFP-PCS01, call=41, rc=0, cib-update=26, confirmed=true)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=42, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Our peer on the DC (MDA1PFP-PCS02) is dead
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK origin=peer_update_callback ]
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
>>> Sep 16 14:43:31 MDA1PFP-S01 attrd[13128]: notice: crm_update_peer_proc: Node MDA1PFP-PCS02[2] - state is now lost (was member)
>>> Sep 16 14:43:31 MDA1PFP-S01 attrd[13128]: notice: Removing all MDA1PFP-PCS02 attributes for attrd_peer_change_cb
>>> Sep 16 14:43:31 MDA1PFP-S01 attrd[13128]: notice: Lost attribute writer MDA1PFP-PCS02
>>> Sep 16 14:43:31 MDA1PFP-S01 attrd[13128]: notice: Removing MDA1PFP-PCS02/2 from the membership list
>>> Sep 16 14:43:31 MDA1PFP-S01 attrd[13128]: notice: Purged 1 peers with id=2 and/or uname=MDA1PFP-PCS02 from the membership cache
>>> Sep 16 14:43:31 MDA1PFP-S01 stonith-ng[13125]: notice: crm_update_peer_proc: Node MDA1PFP-PCS02[2] - state is now lost (was member)
>>> Sep 16 14:43:31 MDA1PFP-S01 stonith-ng[13125]: notice: Removing MDA1PFP-PCS02/2 from the membership list
>>> Sep 16 14:43:31 MDA1PFP-S01 stonith-ng[13125]: notice: Purged 1 peers with id=2 and/or uname=MDA1PFP-PCS02 from the membership cache
>>> Sep 16 14:43:31 MDA1PFP-S01 cib[13124]: notice: crm_update_peer_proc: Node MDA1PFP-PCS02[2] - state is now lost (was member)
>>> Sep 16 14:43:31 MDA1PFP-S01 cib[13124]: notice: Removing MDA1PFP-PCS02/2 from the membership list
>>> Sep 16 14:43:31 MDA1PFP-S01 cib[13124]: notice: Purged 1 peers with id=2 and/or uname=MDA1PFP-PCS02 from the membership cache
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: warning: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Notifications disabled
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
>>> Sep 16 14:43:31 MDA1PFP-S01 pengine[13129]: notice: On loss of CCM Quorum: Ignore
>>> Sep 16 14:43:31 MDA1PFP-S01 pengine[13129]: notice: Demote drbd1:0 (Master -> Slave MDA1PFP-PCS01)
>>> Sep 16 14:43:31 MDA1PFP-S01 pengine[13129]: notice: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-414.bz2
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Initiating action 55: notify drbd1_pre_notify_demote_0 on MDA1PFP-PCS01 (local)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=43, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Initiating action 8: demote drbd1_demote_0 on MDA1PFP-PCS01 (local)
>>> Sep 16 14:43:31 MDA1PFP-S01 systemd-udevd: error: /dev/drbd1: Wrong medium type
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: block drbd1: role( Primary -> Secondary )
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: block drbd1: bitmap WRITE of 0 pages took 0 jiffies
>>> Sep 16 14:43:31 MDA1PFP-S01 kernel: block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
>>> Sep 16 14:43:31 MDA1PFP-S01 systemd-udevd: error: /dev/drbd1: Wrong medium type
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_demote_0: ok (node=MDA1PFP-PCS01, call=44, rc=0, cib-update=49, confirmed=true)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Initiating action 56: notify drbd1_post_notify_demote_0 on MDA1PFP-PCS01 (local)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS01, call=45, rc=0, cib-update=0, confirmed=true)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Initiating action 10: monitor drbd1_monitor_60000 on MDA1PFP-PCS01 (local)
>>> Sep 16 14:43:31 MDA1PFP-S01 corosync[13019]: [TOTEM ] A new membership (192.168.121.10:988) was formed. Members left: 2
>>> Sep 16 14:43:31 MDA1PFP-S01 corosync[13019]: [QUORUM] Members[1]: 1
>>> Sep 16 14:43:31 MDA1PFP-S01 corosync[13019]: [MAIN  ] Completed service synchronization, ready to provide service.
>>> Sep 16 14:43:31 MDA1PFP-S01 pacemakerd[13113]: notice: crm_reap_unseen_nodes: Node MDA1PFP-PCS02[2] - state is now lost (was member)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: crm_reap_unseen_nodes: Node MDA1PFP-PCS02[2] - state is now lost (was member)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: warning: No match for shutdown action on 2
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Stonith/shutdown of MDA1PFP-PCS02 not matched
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Transition aborted: Node failure (source=peer_update_callback:252, 0)
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Transition 0 (Complete=10, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-414.bz2): Complete
>>> Sep 16 14:43:31 MDA1PFP-S01 pengine[13129]: notice: On loss of CCM Quorum: Ignore
>>> Sep 16 14:43:31 MDA1PFP-S01 pengine[13129]: notice: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-415.bz2
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: Transition 1 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-415.bz2): Complete
>>> Sep 16 14:43:31 MDA1PFP-S01 crmd[13130]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>>> Sep 16 14:48:48 MDA1PFP-S01 chronyd[909]: Source 62.116.162.126 replaced with 46.182.19.75
>>>
>>> Any help appreciated,
>>>   Jens
>>>
>>> --
>>> Jens Auer | CGI | Software-Engineer
>>> CGI (Germany) GmbH & Co. KG
>>> Rheinstraße 95 | 64295 Darmstadt | Germany
>>> T: +49 6151 36860 154
>>> [email protected] <mailto:[email protected]>
>>> Our mandatory disclosures pursuant to § 35a GmbHG / §§ 161, 125a HGB can
>>> be found at de.cgi.com/pflichtangaben <http://de.cgi.com/pflichtangaben>.

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
