On 03/10/2016 09:49 AM, Lorand Kelemen wrote:
> Dear List,
>
> After the creation and testing of a simple 2-node active-passive
> drbd+postfix cluster, nearly everything works flawlessly (standby, failure
> of a filesystem resource + failover, split-brain + manual recovery).
> However, when deliberately killing the postfix instance, after reaching the
> migration threshold failover does not occur, and the resources revert to
> the Stopped state (except the master-slave drbd resource, which works as
> expected).
>
> Ordering and colocation are configured, STONITH and quorum are disabled;
> the goal is to always have one node running all the resources, and at any
> sign of error it should fail over to the passive node, nothing fancy.
>
> Is my configuration wrong or am I hitting a bug?
>
> All software is from the CentOS 7 + ELRepo repositories.

With these versions, you can set "two_node: 1" in /etc/corosync/corosync.conf
(which will be done automatically if you used "pcs cluster setup" initially),
and then you don't need to ignore quorum in pacemaker.
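For reference, that's just the quorum section of corosync.conf -- a minimal
sketch:

    quorum {
        provider: corosync_votequorum
        # two_node implies wait_for_all: both nodes must be seen once at
        # startup, but quorum is kept when one node later fails
        two_node: 1
    }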
> Regarding STONITH: the machines are running on free ESXi instances on
> separate machines, so the VMware fencing agents won't work, because in the
> free version the API is read-only.
> Still trying to figure out a way to go; until then, manual recovery + huge
> ARP cache times on the upstream firewall...
>
> Please find the pe-input*.bz2 files attached, logs and config below. The
> situation: on node mail1 postfix was killed 3 times (migration-threshold);
> it should have failed over to mail2.
> When killing a filesystem resource three times, this happens flawlessly.
>
> Thanks for your input!
>
> Best regards,
> Lorand
>
>
> Cluster Name: mailcluster
> Corosync Nodes:
>  mail1 mail2
> Pacemaker Nodes:
>  mail1 mail2
>
> Resources:
>  Group: network-services
>   Resource: virtualip-1 (class=ocf provider=heartbeat type=IPaddr2)
>    Attributes: ip=10.20.64.10 cidr_netmask=24 nic=lan0
>    Operations: start interval=0s timeout=20s (virtualip-1-start-interval-0s)
>                stop interval=0s timeout=20s (virtualip-1-stop-interval-0s)
>                monitor interval=30s (virtualip-1-monitor-interval-30s)
>  Master: spool-clone
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>   Resource: spool (class=ocf provider=linbit type=drbd)
>    Attributes: drbd_resource=spool
>    Operations: start interval=0s timeout=240 (spool-start-interval-0s)
>                promote interval=0s timeout=90 (spool-promote-interval-0s)
>                demote interval=0s timeout=90 (spool-demote-interval-0s)
>                stop interval=0s timeout=100 (spool-stop-interval-0s)
>                monitor interval=10s (spool-monitor-interval-10s)
>  Master: mail-clone
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>   Resource: mail (class=ocf provider=linbit type=drbd)
>    Attributes: drbd_resource=mail
>    Operations: start interval=0s timeout=240 (mail-start-interval-0s)
>                promote interval=0s timeout=90 (mail-promote-interval-0s)
>                demote interval=0s timeout=90 (mail-demote-interval-0s)
>                stop interval=0s timeout=100 (mail-stop-interval-0s)
>                monitor interval=10s (mail-monitor-interval-10s)
>  Group: fs-services
>   Resource: fs-spool (class=ocf provider=heartbeat type=Filesystem)
>    Attributes: device=/dev/drbd0 directory=/var/spool/postfix fstype=ext4 options=nodev,nosuid,noexec
>    Operations: start interval=0s timeout=60 (fs-spool-start-interval-0s)
>                stop interval=0s timeout=60 (fs-spool-stop-interval-0s)
>                monitor interval=20 timeout=40 (fs-spool-monitor-interval-20)
>   Resource: fs-mail (class=ocf provider=heartbeat type=Filesystem)
>    Attributes: device=/dev/drbd1 directory=/var/spool/mail fstype=ext4 options=nodev,nosuid,noexec
>    Operations: start interval=0s timeout=60 (fs-mail-start-interval-0s)
>                stop interval=0s timeout=60 (fs-mail-stop-interval-0s)
>                monitor interval=20 timeout=40 (fs-mail-monitor-interval-20)
>  Group: mail-services
>   Resource: postfix (class=ocf provider=heartbeat type=postfix)
>    Operations: start interval=0s timeout=20s (postfix-start-interval-0s)
>                stop interval=0s timeout=20s (postfix-stop-interval-0s)
>                monitor interval=45s (postfix-monitor-interval-45s)
>
> Stonith Devices:
> Fencing Levels:
>
> Location Constraints:
> Ordering Constraints:
>   start network-services then promote mail-clone (kind:Mandatory) (id:order-network-services-mail-clone-mandatory)
>   promote mail-clone then promote spool-clone (kind:Mandatory) (id:order-mail-clone-spool-clone-mandatory)
>   promote spool-clone then start fs-services (kind:Mandatory) (id:order-spool-clone-fs-services-mandatory)
>   start fs-services then start mail-services (kind:Mandatory) (id:order-fs-services-mail-services-mandatory)
> Colocation Constraints:
>   network-services with spool-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-network-services-spool-clone-INFINITY)
>   network-services with mail-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-network-services-mail-clone-INFINITY)
>   network-services with fs-services (score:INFINITY) (id:colocation-network-services-fs-services-INFINITY)
>   network-services with mail-services (score:INFINITY) (id:colocation-network-services-mail-services-INFINITY)

I'm not sure whether it's causing your issue, but I would make the
constraints reflect the logical relationships better. For example,
network-services only needs to be colocated with mail-services logically;
it's mail-services that needs to be with fs-services, and fs-services that
needs to be with the spool-clone/mail-clone masters. In other words, don't
make the highest-level resource depend on everything else; make each level
depend on the level below it.

Also, I would guess that the virtual IP only needs to be ordered before
mail-services, and mail-clone and spool-clone could both be ordered before
fs-services, rather than ordering mail-clone before spool-clone.
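Untested, but with your resource names that restructuring could look
something like this in pcs (after removing the existing constraints with
"pcs constraint remove <id>", using the ids shown above):

    # Each level is colocated with the level below it
    pcs constraint colocation add fs-services with master spool-clone INFINITY
    pcs constraint colocation add fs-services with master mail-clone INFINITY
    pcs constraint colocation add mail-services with fs-services INFINITY
    pcs constraint colocation add network-services with mail-services INFINITY

    # Promote both DRBD masters before mounting the filesystems, mount
    # before starting postfix, and bring the IP up before the mail services
    pcs constraint order promote spool-clone then start fs-services
    pcs constraint order promote mail-clone then start fs-services
    pcs constraint order start fs-services then start mail-services
    pcs constraint order start network-services then start mail-services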
> Resources Defaults:
>  migration-threshold: 3
> Operations Defaults:
>  on-fail: restart
>
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: mailcluster
>  cluster-recheck-interval: 5min
>  dc-version: 1.1.13-10.el7_2.2-44eb2dd
>  default-resource-stickiness: infinity
>  have-watchdog: false
>  last-lrm-refresh: 1457613674
>  no-quorum-policy: ignore
>  pe-error-series-max: 1024
>  pe-input-series-max: 1024
>  pe-warn-series-max: 1024
>  stonith-enabled: false
>
>
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.197.15 2
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.197.16 (null)
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=16
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_failure_0']: @transition-key=4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @transition-magic=0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @call-id=1274, @last-rc-change=1457613440
> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info: abort_transition_graph: Transition aborted by postfix_monitor_45000 'modify' on mail1: Inactive graph (magic=0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, cib=0.197.16, source=process_graph_event:598, 1)
> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info: update_failcount: Updating failcount for postfix on mail1 after failed monitor: rc=7 (update=value++, time=1457613440)
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: attrd_client_update: Expanded fail-count-postfix=value++ to 3
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail1/crmd/196, version=0.197.16)
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: attrd_peer_update: Setting fail-count-postfix[mail1]: 2 -> 3 from mail2
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: write_attribute: Sent update 400 with 2 changes for fail-count-postfix, id=<n/a>, set=(null)
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/400)
> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info: process_graph_event: Detected action (1234.4) postfix_monitor_45000.1274=not running: failed
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: attrd_peer_update: Setting last-failure-postfix[mail1]: 1457613347 -> 1457613440 from mail2
> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: write_attribute: Sent update 401 with 2 changes for last-failure-postfix, id=<n/a>, set=(null)
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.197.16 2
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.197.17 (null)
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=17
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-fail-count-postfix']: @value=3
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail2/attrd/400, version=0.197.17)
> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info: abort_transition_graph: Transition aborted by status-1-fail-count-postfix, fail-count-postfix=3: Transient attribute change (modify cib=0.197.17, source=abort_unless_down:319, path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-fail-count-postfix'], 1)
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 400 for fail-count-postfix: OK (0)
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 400 for fail-count-postfix[mail1]=3: OK (0)
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 400 for fail-count-postfix[mail2]=(null): OK (0)
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/401)
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.197.17 2
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.197.18 (null)
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=18
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix']: @value=1457613440
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail2/attrd/401, version=0.197.18)
> Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 401 for last-failure-postfix: OK (0)
> Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 401 for last-failure-postfix[mail1]=1457613440: OK (0)
> Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 401 for last-failure-postfix[mail2]=1457610376: OK (0)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: abort_transition_graph: Transition aborted by status-1-last-failure-postfix, last-failure-postfix=1457613440: Transient attribute change (modify cib=0.197.18, source=abort_unless_down:319, path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix'], 1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_online_status: Node mail1 is online
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_online_status: Node mail2 is online
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource mail:0 active in master mode on mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource spool:0 active in master mode on mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-spool active on mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-mail active on mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for postfix on mail1: not running (7)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource spool:1 active in master mode on mail2
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource mail:1 active in master mode on mail2
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: network-services
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_print: virtualip-1 (ocf::heartbeat:IPaddr2): Started mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: clone_print: Master/Slave Set: spool-clone [spool]
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: short_print: Masters: [ mail1 ]
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: short_print: Slaves: [ mail2 ]
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: clone_print: Master/Slave Set: mail-clone [mail]
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: short_print: Masters: [ mail1 ]
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: short_print: Slaves: [ mail2 ]
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: fs-services
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_print: fs-spool (ocf::heartbeat:Filesystem): Started mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_print: fs-mail (ocf::heartbeat:Filesystem): Started mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: mail-services
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_print: postfix (ocf::heartbeat:postfix): FAILED mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: get_failcount_full: postfix has failed 3 times on mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning: common_apply_stickiness: Forcing postfix away from mail1 after 3 failures (max=3)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: master_color: Promoting mail:0 (Master mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: master_color: mail-clone: Promoted 1 instances of a possible 1 to master
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: master_color: Promoting spool:0 (Master mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: master_color: spool-clone: Promoted 1 instances of a possible 1 to master
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: rsc_merge_weights: postfix: Rolling back scores from virtualip-1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_color: Resource virtualip-1 cannot run anywhere
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: RecurringOp: Start recurring monitor (45s) for postfix on mail2
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: LogActions: Stop virtualip-1 (mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave spool:0 (Master mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave spool:1 (Slave mail2)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave mail:0 (Master mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave mail:1 (Slave mail2)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: LogActions: Stop fs-spool (Started mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: LogActions: Stop fs-mail (Started mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: LogActions: Stop postfix (Started mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: process_pe_message: Calculated Transition 1235: /var/lib/pacemaker/pengine/pe-input-302.bz2
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: handle_response: pe_calc calculation pe_calc-dc-1457613441-3756 is obsolete
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_online_status: Node mail1 is online
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_online_status: Node mail2 is online
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource mail:0 active in master mode on mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource spool:0 active in master mode on mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-spool active on mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-mail active on mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for postfix on mail1: not running (7)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource spool:1 active in master mode on mail2
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource mail:1 active in master mode on mail2
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: network-services
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_print: virtualip-1 (ocf::heartbeat:IPaddr2): Started mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: clone_print: Master/Slave Set: spool-clone [spool]
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: short_print: Masters: [ mail1 ]
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: short_print: Slaves: [ mail2 ]
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: clone_print: Master/Slave Set: mail-clone [mail]
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: short_print: Masters: [ mail1 ]
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: short_print: Slaves: [ mail2 ]
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: fs-services
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_print: fs-spool (ocf::heartbeat:Filesystem): Started mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_print: fs-mail (ocf::heartbeat:Filesystem): Started mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: mail-services
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_print: postfix (ocf::heartbeat:postfix): FAILED mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: get_failcount_full: postfix has failed 3 times on mail1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning: common_apply_stickiness: Forcing postfix away from mail1 after 3 failures (max=3)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: master_color: Promoting mail:0 (Master mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: master_color: mail-clone: Promoted 1 instances of a possible 1 to master
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: master_color: Promoting spool:0 (Master mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: master_color: spool-clone: Promoted 1 instances of a possible 1 to master
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: rsc_merge_weights: postfix: Rolling back scores from virtualip-1
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_color: Resource virtualip-1 cannot run anywhere
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: RecurringOp: Start recurring monitor (45s) for postfix on mail2
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: LogActions: Stop virtualip-1 (mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave spool:0 (Master mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave spool:1 (Slave mail2)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave mail:0 (Master mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave mail:1 (Slave mail2)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: LogActions: Stop fs-spool (Started mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: LogActions: Stop fs-mail (Started mail1)
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: LogActions: Stop postfix (Started mail1)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: process_pe_message: Calculated Transition 1236: /var/lib/pacemaker/pengine/pe-input-303.bz2
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: do_te_invoke: Processing graph 1236 (ref=pe_calc-dc-1457613441-3757) derived from /var/lib/pacemaker/pengine/pe-input-303.bz2
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: te_rsc_command: Initiating action 12: stop virtualip-1_stop_0 on mail1
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: te_rsc_command: Initiating action 5: stop postfix_stop_0 on mail1
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.197.18 2
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.197.19 (null)
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=19
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='virtualip-1']/lrm_rsc_op[@id='virtualip-1_last_0']: @operation_key=virtualip-1_stop_0, @operation=stop, @transition-key=12:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @transition-magic=0:0;12:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @call-id=1276, @last-run=1457613441, @last-rc-change=1457613441, @exec-time=66
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail1/crmd/197, version=0.197.19)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: match_graph_event: Action virtualip-1_stop_0 (12) confirmed on mail1 (rc=0)
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.197.19 2
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.197.20 (null)
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=20
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_0']: @operation_key=postfix_stop_0, @operation=stop, @transition-key=5:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @transition-magic=0:0;5:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @call-id=1278, @last-run=1457613441, @last-rc-change=1457613441, @exec-time=476
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: match_graph_event: Action postfix_stop_0 (5) confirmed on mail1 (rc=0)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: te_rsc_command: Initiating action 79: stop fs-mail_stop_0 on mail1
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail1/crmd/198, version=0.197.20)
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.197.20 2
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.197.21 (null)
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=21
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='fs-mail']/lrm_rsc_op[@id='fs-mail_last_0']: @operation_key=fs-mail_stop_0, @operation=stop, @transition-key=79:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @transition-magic=0:0;79:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @call-id=1280, @last-run=1457613441, @last-rc-change=1457613441, @exec-time=88, @queue-time=1
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail1/crmd/199, version=0.197.21)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: match_graph_event: Action fs-mail_stop_0 (79) confirmed on mail1 (rc=0)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: te_rsc_command: Initiating action 77: stop fs-spool_stop_0 on mail1
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.197.21 2
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.197.22 (null)
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=22
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='fs-spool']/lrm_rsc_op[@id='fs-spool_last_0']: @operation_key=fs-spool_stop_0, @operation=stop, @transition-key=77:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @transition-magic=0:0;77:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @call-id=1282, @last-run=1457613441, @last-rc-change=1457613441, @exec-time=86
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail1/crmd/200, version=0.197.22)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: match_graph_event: Action fs-spool_stop_0 (77) confirmed on mail1 (rc=0)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: warning: run_graph: Transition 1236 (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-303.bz2): Terminated
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: warning: te_graph_trigger: Transition failed: terminated
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: print_graph: Graph 1236 with 12 actions: batch-limit=12 jobs, network-delay=0ms
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: print_synapse: [Action 16]: Completed pseudo op network-services_stopped_0 on N/A (priority: 0, waiting: none)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: print_synapse: [Action 15]: Completed pseudo op network-services_stop_0 on N/A (priority: 0, waiting: none)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: print_synapse: [Action 12]: Completed rsc op virtualip-1_stop_0 on mail1 (priority: 0, waiting: none)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: print_synapse: [Action 84]: Completed pseudo op fs-services_stopped_0 on N/A (priority: 0, waiting: none)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: print_synapse: [Action 83]: Completed pseudo op fs-services_stop_0 on N/A (priority: 0, waiting: none)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: print_synapse: [Action 77]: Completed rsc op fs-spool_stop_0 on mail1 (priority: 0, waiting: none)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: print_synapse: [Action 79]: Completed rsc op fs-mail_stop_0 on mail1 (priority: 0, waiting: none)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: print_synapse: [Action 90]: Completed pseudo op mail-services_stopped_0 on N/A (priority: 0, waiting: none)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: print_synapse: [Action 89]: Completed pseudo op mail-services_stop_0 on N/A (priority: 0, waiting: none)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: print_synapse: [Action 86]: Pending rsc op postfix_monitor_45000 on mail2 (priority: 0, waiting: none)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: print_synapse: * [Input 85]: Unresolved dependency rsc op postfix_start_0 on mail2
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: print_synapse: [Action 5]: Completed rsc op postfix_stop_0 on mail1 (priority: 0, waiting: none)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: print_synapse: [Action 8]: Completed pseudo op all_stopped on N/A (priority: 0, waiting: none)
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: do_log: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Mar 10 13:37:26 [7415] HWJ-626.domain.local cib: info: cib_process_ping: Reporting our current digest to mail2: 3896ee29cdb6ba128330b0ef6e41bd79 for 0.197.22 (0x1544a30 0)

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
