Dear friends,

We have the following configuration:

CentOS 7, Pacemaker 0.9.152 and Corosync 2.4.0, storage on DRBD, and STONITH enabled with APC PDU devices.

I have a Windows VM configured as a cluster resource with the following attributes:

Resource: WindowSentinelOne_res (class=ocf provider=heartbeat type=VirtualDomain)
 Attributes: hypervisor=qemu:///system config=/opt/customer_vms/conf/WindowSentinelOne/WindowSentinelOne.xml migration_transport=ssh
 Utilization: cpu=8 hv_memory=8192
 Operations: start interval=0s timeout=120s (WindowSentinelOne_res-start-interval-0s)
             stop interval=0s timeout=120s (WindowSentinelOne_res-stop-interval-0s)
             monitor interval=10s timeout=30s (WindowSentinelOne_res-monitor-interval-10s)

Under some circumstances (which I am still trying to identify) the VM fails and disappears from the output of virsh list --all, and Pacemaker also reports the VM as stopped.

If I run pcs resource cleanup windows_wm everything is OK again, but I can't identify the reason for the failure.

For example, when I shut down the VM from inside the guest (a normal Windows shutdown), the cluster reports the following:

WindowSentinelOne_res    (ocf::heartbeat:VirtualDomain): Started sgw-02 (failure ignored)

Failed Actions:
* WindowSentinelOne_res_monitor_10000 on sgw-02 'not running' (7): call=67, status=complete, exitreason='none',
    last-rc-change='Mon Jun 25 07:41:37 2018', queued=0ms, exec=0ms.


My questions are:

1) Why is the VM shutdown reported as a failed action by the cluster? It is a legitimate operation during the VM life cycle.

2) Why is the resource sometimes marked as stopped (while the VM is healthy), so that it needs a cleanup?

3) I can't understand the corosync logs. During the VM shutdown, the corosync log shows the following:


Jun 25 07:41:37 [5140] sgw-02       crmd:     info: process_lrm_event:    Result of monitor operation for WindowSentinelOne_res on sgw-02: 7 (not running) | call=67 key=WindowSentinelOne_res_monitor_10000 confirmed=false cib-update=36
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Forwarding cib_modify operation for section status to all (origin=local/crmd/36)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: --- 0.4704.67 2
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: +++ 0.4704.68 (null)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib:  @num_updates=68
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib/status/node_state[@id='2']: @crm-debug-origin=do_update_resource
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    ++ /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='WindowSentinelOne_res']: <lrm_rsc_op id="WindowSentinelOne_res_last_failure_0" operation_key="WindowSentinelOne_res_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="84:3:0:f910c793-a714-4e24-80d1-b0ec66275491" transition-magic="0:7;84:3:0:f910c793-a714-4e24-80d1-b0ec66275491" on_node="sgw-02" cal
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Completed cib_modify operation for section status: OK (rc=0, origin=sgw-02/crmd/36, version=0.4704.68)
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_peer_update:    Setting fail-count-WindowSentinelOne_res[sgw-02]: (null) -> 1 from sgw-01
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: write_attribute:    Sent update 10 with 1 changes for fail-count-WindowSentinelOne_res, id=<n/a>, set=(null)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Forwarding cib_modify operation for section status to all (origin=local/attrd/10)
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_peer_update:    Setting last-failure-WindowSentinelOne_res[sgw-02]: (null) -> 1529912497 from sgw-01
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: write_attribute:    Sent update 11 with 1 changes for last-failure-WindowSentinelOne_res, id=<n/a>, set=(null)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Forwarding cib_modify operation for section status to all (origin=local/attrd/11)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: --- 0.4704.68 2
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: +++ 0.4704.69 (null)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib:  @num_updates=69
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    ++ /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']: <nvpair id="status-2-fail-count-WindowSentinelOne_res" name="fail-count-WindowSentinelOne_res" value="1"/>
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Completed cib_modify operation for section status: OK (rc=0, origin=sgw-02/attrd/10, version=0.4704.69)
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 10 for fail-count-WindowSentinelOne_res: OK (0)
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 10 for fail-count-WindowSentinelOne_res[sgw-02]=1: OK (0)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: --- 0.4704.69 2
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: +++ 0.4704.70 (null)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib:  @num_updates=70
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    ++ /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']: <nvpair id="status-2-last-failure-WindowSentinelOne_res" name="last-failure-WindowSentinelOne_res" value="1529912497"/>
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Completed cib_modify operation for section status: OK (rc=0, origin=sgw-02/attrd/11, version=0.4704.70)
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 11 for last-failure-WindowSentinelOne_res: OK (0)
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 11 for last-failure-WindowSentinelOne_res[sgw-02]=1529912497: OK (0)
Jun 25 07:41:42 [5130] sgw-02        cib:     info: cib_process_ping:    Reporting our current digest to sgw-01: 3e27415fcb003ef3373b47ffa6c5f358 for 0.4704.70 (0x7faac1729720 0)
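
As far as I can tell, the only values that actually change in these lines are the fail-count and last-failure attributes for the resource. Below is a minimal Python sketch, assuming the attrd line format shown above, that pulls those attribute updates out of the log and converts the last-failure epoch value to a readable time. The names ATTRD_RE, parse_attrd_updates and epoch_to_utc are my own and are not part of Pacemaker.

```python
import re
from datetime import datetime, timezone

# Matches attrd lines of the form seen in the log above:
#   attrd_peer_update: Setting <attribute>[<node>]: <old> -> <new> from <peer>
# (the pattern is an assumption based only on the lines in this mail)
ATTRD_RE = re.compile(
    r"attrd_peer_update:\s+Setting (?P<attr>[\w-]+)\[(?P<node>[\w-]+)\]: "
    r"(?P<old>\S+) -> (?P<new>\S+) from (?P<peer>\S+)"
)


def parse_attrd_updates(lines):
    """Return (attribute, node, old value, new value) for each attrd update line."""
    updates = []
    for line in lines:
        match = ATTRD_RE.search(line)
        if match:
            updates.append(
                (match["attr"], match["node"], match["old"], match["new"])
            )
    return updates


def epoch_to_utc(value):
    """Convert an epoch-seconds string to 'YYYY-MM-DD HH:MM:SS' in UTC."""
    moment = datetime.fromtimestamp(int(value), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Two lines copied from the log above
sample = [
    "Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_peer_update:    "
    "Setting fail-count-WindowSentinelOne_res[sgw-02]: (null) -> 1 from sgw-01",
    "Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_peer_update:    "
    "Setting last-failure-WindowSentinelOne_res[sgw-02]: (null) -> 1529912497 from sgw-01",
]

for attr, node, old, new in parse_attrd_updates(sample):
    if attr.startswith("last-failure-"):
        print(f"{attr} on {node}: {old} -> {new} ({epoch_to_utc(new)} UTC)")
    else:
        print(f"{attr} on {node}: {old} -> {new}")
```

The last-failure value 1529912497 converts to 2018-06-25 07:41:37 UTC, which matches the last-rc-change shown in the failed action above if the nodes' clocks are set to UTC.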

Sincerely,

Vaggelis Papastavros

