Thanks, your explanation is very helpful, considering that it happens rarely and only on the first boot after the VMs are created.

On Mon, May 22, 2017 at 9:34 PM, Ken Gaillot <kgail...@redhat.com> wrote:

> On 05/19/2017 02:03 PM, Radoslaw Garbacz wrote:
> > Hi,
> >
> > I have some more information regarding this issue (pacemaker debug logs).
> >
> > First, some probably important facts I had not mentioned:
> > 1) this happens rarely
> > 2) this happens only on first boot
> > 3) turning on debug in corosync/pacemaker significantly reduced the
> > frequency of this happening, i.e. without debug every ~7th cluster
> > creation, with debug every ~66th cluster creation.
> >
> > This is a 3-node cluster on Azure Cloud, and it does not seem like the
> > resource agent is reporting an error, because all nodes log proper "not
> > running" results:
> >
> > The resource in question is named "dbx_head_head".
> >
> > node1)
> > May 19 13:15:41 [6872] olegdbx39-vm-0 stonith-ng: debug: xml_patch_version_check: Can apply patch 2.5.32 to 2.5.31
> > head.ocf.sh(dbx_head_head)[7717]: 2017/05/19_13:15:42 DEBUG: head_monitor: return 7
> > May 19 13:15:42 [6873] olegdbx39-vm-0 lrmd: debug: operation_finished: dbx_head_head_monitor_0:7717 - exited with rc=7
> > May 19 13:15:42 [6873] olegdbx39-vm-0 lrmd: debug: operation_finished: dbx_head_head_monitor_0:7717:stderr [ -- empty -- ]
> > May 19 13:15:42 [6873] olegdbx39-vm-0 lrmd: debug: operation_finished: dbx_head_head_monitor_0:7717:stdout [ -- empty -- ]
> > May 19 13:15:42 [6873] olegdbx39-vm-0 lrmd: debug: log_finished: finished - rsc:dbx_head_head action:monitor call_id:14 pid:7717 exit-code:7 exec-time:932ms queue-time:0ms
> >
> > node2)
> > May 19 13:15:41 [6266] olegdbx39-vm000002 stonith-ng: debug: xml_patch_version_check: Can apply patch 2.5.31 to 2.5.30
> > head.ocf.sh(dbx_head_head)[6485]: 2017/05/19_13:15:41 DEBUG: head_monitor: return 7
> > May 19 13:15:41 [6267] olegdbx39-vm000002 lrmd: debug: operation_finished: dbx_head_head_monitor_0:6485 - exited with rc=7
> > May 19 13:15:41 [6267] olegdbx39-vm000002 lrmd: debug: operation_finished: dbx_head_head_monitor_0:6485:stderr [ -- empty -- ]
> > May 19 13:15:41 [6267] olegdbx39-vm000002 lrmd: debug: operation_finished: dbx_head_head_monitor_0:6485:stdout [ -- empty -- ]
> > May 19 13:15:41 [6267] olegdbx39-vm000002 lrmd: debug: log_finished: finished - rsc:dbx_head_head action:monitor call_id:14 pid:6485 exit-code:7 exec-time:790ms queue-time:0ms
> > May 19 13:15:41 [6266] olegdbx39-vm000002 stonith-ng: debug: xml_patch_version_check: Can apply patch 2.5.32 to 2.5.31
> > May 19 13:15:41 [6266] olegdbx39-vm000002 stonith-ng: debug: xml_patch_version_check: Can apply patch 2.5.33 to 2.5.32
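(For anyone skimming the logs: the "return 7" / rc=7 above is the standard
OCF_NOT_RUNNING exit code, i.e. a clean "resource is stopped" answer to a
probe. A minimal sketch of what a monitor action like head_monitor boils
down to; the pgrep test is only an assumption about our agent, the exit
codes are the standard OCF values:

    #!/bin/sh
    head_monitor() {
        # Hypothetical liveness check -- the real head.ocf.sh test differs
        if pgrep -f "dbx_head" >/dev/null 2>&1; then
            return 0    # OCF_SUCCESS: resource is running
        fi
        return 7        # OCF_NOT_RUNNING: what all three nodes report here
    }

So the agents themselves answer correctly; the error shows up later, on the
pacemaker side.)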
> >
> > node3)
> > == the logs here are different - there is no probing, just a stop attempt
> > (with the proper exit code) ==
> >
> > == reporting a non-existing resource ==
> >
> > May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: debug: process_lrmd_message: Processed lrmd_rsc_info operation from d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=0, notify=0
> > May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: debug: process_lrmd_message: Processed lrmd_rsc_exec operation from d2c8a871-410a-4006-be52-ee684c0a5f38: rc=10, reply=1, notify=0
> > May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: debug: log_execute: executing - rsc:dbx_first_datas action:monitor call_id:10
> > May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: info: process_lrmd_get_rsc_info: Resource 'dbx_head_head' not found (2 active resources)
>
> FYI, this is normal. It just means the lrmd hasn't been asked to do
> anything with this resource before, so it's not found in the lrmd's memory.
>
> > May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: debug: process_lrmd_message: Processed lrmd_rsc_info operation from d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=0, notify=0
> > May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: info: process_lrmd_rsc_register: Added 'dbx_head_head' to the rsc list (3 active resources)
> > May 19 13:15:40 [6293] olegdbx39-vm000003 lrmd: debug: process_lrmd_message: Processed lrmd_rsc_register operation from d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=1, notify=1
> > May 19 13:15:29 [6292] olegdbx39-vm000003 stonith-ng: debug: xml_patch_version_check: Can apply patch 2.5.9 to 2.5.8
> > May 19 13:15:40 [6292] olegdbx39-vm000003 stonith-ng: debug: xml_patch_version_check: Can apply patch 2.5.10 to 2.5.9
> > May 19 13:15:40 [6292] olegdbx39-vm000003 stonith-ng: debug: xml_patch_version_check: Can apply patch 2.5.11 to 2.5.10
> > May 19 13:15:40 [6292] olegdbx39-vm000003 stonith-ng: debug: xml_patch_version_check: Can apply patch 2.5.12 to 2.5.11
> > May 19 13:15:40 [6293] olegdbx39-vm000003 lrmd: debug: process_lrmd_message: Processed lrmd_rsc_info operation from d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=0, notify=0
> >
> > == and much later it tries to stop the resource on node3, even though it
> > has never been started, nor probed here ==
>
> At any given time, one node in the cluster has been elected DC. The DC
> schedules all actions. So, the logs on the DC (not this node) will have
> the explanation for why there was no probe, and why there was a stop.
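(Noting this for my own debugging and for the archives: the current DC can
be found with the standard tooling, e.g.:

    crmadmin -D                      # ask the cluster which node is the DC
    crm_mon -1 | grep 'Current DC'   # or read it off the status summary

I will check the pengine/crmd logs on that node for this transition.)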
>
> > May 19 13:15:45 [6292] olegdbx39-vm000003 stonith-ng: debug: xml_patch_version_check: Can apply patch 2.5.59 to 2.5.58
> > May 19 13:15:45 [6293] olegdbx39-vm000003 lrmd: debug: process_lrmd_message: Processed lrmd_rsc_info operation from 069e399c-6c10-47cf-b55f-de46985562b3: rc=0, reply=0, notify=0
> > May 19 13:15:45 [6293] olegdbx39-vm000003 lrmd: debug: process_lrmd_message: Processed lrmd_rsc_exec operation from 069e399c-6c10-47cf-b55f-de46985562b3: rc=77, reply=1, notify=0
> > May 19 13:15:45 [6293] olegdbx39-vm000003 lrmd: info: log_execute: executing - rsc:dbx_head_head action:stop call_id:77
> > head.ocf.sh(dbx_head_head)[7202]: 2017/05/19_13:15:45 INFO: head_stop: Stopping resource
> > head.ocf.sh(dbx_head_head)[7202]: 2017/05/19_13:15:47 DEBUG: head_monitor: return 7
> > head.ocf.sh(dbx_head_head)[7202]: 2017/05/19_13:15:47 INFO: Resource is already stopped
> > May 19 13:15:47 [6293] olegdbx39-vm000003 lrmd: debug: operation_finished: dbx_head_head_stop_0:7202 - exited with rc=0
> > May 19 13:15:47 [6293] olegdbx39-vm000003 lrmd: debug: operation_finished: dbx_head_head_stop_0:7202:stderr [ -- empty -- ]
> > May 19 13:15:47 [6293] olegdbx39-vm000003 lrmd: debug: operation_finished: dbx_head_head_stop_0:7202:stdout [ -- empty -- ]
> > May 19 13:15:47 [6293] olegdbx39-vm000003 lrmd: info: log_finished: finished - rsc:dbx_head_head action:stop call_id:77 pid:7202 exit-code:0 exec-time:1359ms queue-time:0ms
> > May 19 13:15:47 [6292] olegdbx39-vm000003 stonith-ng: debug: xml_patch_version_check: Can apply patch 2.5.60 to 2.5.59
> >
> >
> > node1)
> > == reports an error on node3, even though the resource has never been
> > configured, probed, or started there ==
> >
> > The error is reported on node1 as having happened on node3 (output from
> > 'crm_mon'):
> >
> > Failed Actions:
> > * dbx_head_head_monitor_0 on olegdbx39-vm000003 'not configured' (6):
> >     call=999999999, status=Error, exitreason='none',
> >     last-rc-change='Fri May 19 13:15:40 2017', queued=0ms, exec=0ms
>
> The combination of 'not configured' and call=999999999 has only one
> possible source: the crmd could not register the resource with the lrmd.
>
> Usually this means the resource definition in the CIB is inherently
> invalid. However, it can also be reported when there is any kind of
> error in the communication between the two daemons: the connection
> dropped, the machine was out of memory, the communication timed out
> (e.g. due to high load), etc.
>
> The "Added 'dbx_head_head' to the rsc list" message means that the lrmd
> did receive and complete the register request, so the problem occurred
> sometime between there and the crmd getting the result back.
>
> This explains why you don't see a probe being executed. The crmd
> couldn't register the resource with the lrmd, so it immediately failed
> the probe on its side.
>
> The root cause is a mystery at this point, but hopefully that gives you
> some hints as to what to look for. Check the logs on the DC, and check
> the host's system logs for any signs of trouble at that time.
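(To rule out the "inherently invalid definition" case, the CIB can be
validated directly; these are standard Pacemaker tools:

    crm_verify --live-check          # validate the currently running CIB
    cibadmin --query > /tmp/cib.xml  # dump the CIB for offline inspection
    crm_verify --xml-file /tmp/cib.xml -V

If these come back clean, the transient crmd<->lrmd communication failure
Ken describes looks like the more likely explanation.)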
>
> > == Node1 logs "not configured" even though all return codes logged
> > earlier are OK; below are the logs for the thread which reports this
> > error: ==
> >
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info: crm_xml_cleanup: Cleaning up memory from libxml2
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info: main: Starting crm_mon
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: cib_native_signon_raw: Connection to CIB successful
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: unpack_config: STONITH timeout: 60000
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: unpack_config: STONITH of failed nodes is disabled
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: unpack_config: Concurrent fencing is disabled
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: unpack_config: Stop all active resources: false
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: unpack_config: Cluster is symmetric - resources can run anywhere by default
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: unpack_config: Default stickiness: 0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: notice: unpack_config: On loss of CCM Quorum: Ignore
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info: determine_online_status: Node olegdbx39-vm000003 is online
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info: determine_online_status: Node olegdbx39-vm-0 is online
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info: determine_online_status: Node olegdbx39-vm000002 is online
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: determine_op_status: dbx_head_head_monitor_0 on olegdbx39-vm000003 returned 'not configured' (6) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning: unpack_rsc_op_failure: Processing failed op monitor for dbx_head_head on olegdbx39-vm000003: not configured (6)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: error: unpack_rsc_op: Preventing dbx_head_head from re-starting anywhere: operation monitor failed 'not configured' (6)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: determine_op_status: dbx_head_head_monitor_0 on olegdbx39-vm000003 returned 'not configured' (6) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning: unpack_rsc_op_failure: Processing failed op monitor for dbx_head_head on olegdbx39-vm000003: not configured (6)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: error: unpack_rsc_op: Preventing dbx_head_head from re-starting anywhere: operation monitor failed 'not configured' (6)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_bind_mounts_nodes on olegdbx39-vm000003 to dbx_bind_mounts_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: determine_op_status: dbx_bind_mounts_nodes_monitor_0 on olegdbx39-vm000003 returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning: unpack_rsc_op_failure: Processing failed op monitor for dbx_bind_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: determine_op_status: dbx_bind_mounts_nodes_monitor_0 on olegdbx39-vm000003 returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning: unpack_rsc_op_failure: Processing failed op monitor for dbx_bind_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_swap_nodes on olegdbx39-vm000003 to dbx_swap_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: determine_op_status: dbx_swap_nodes_monitor_0 on olegdbx39-vm000003 returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning: unpack_rsc_op_failure: Processing failed op monitor for dbx_swap_nodes:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_nfs_nodes on olegdbx39-vm000003 to dbx_nfs_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: determine_op_status: dbx_nfs_nodes_monitor_0 on olegdbx39-vm000003 returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning: unpack_rsc_op_failure: Processing failed op monitor for dbx_nfs_nodes:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_first_datas on olegdbx39-vm000003 to dbx_first_datas:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: determine_op_status: dbx_first_datas_monitor_0 on olegdbx39-vm000003 returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning: unpack_rsc_op_failure: Processing failed op monitor for dbx_first_datas:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: determine_op_status: dbx_first_datas_monitor_0 on olegdbx39-vm000003 returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning: unpack_rsc_op_failure: Processing failed op monitor for dbx_first_datas:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: determine_op_status: dbx_first_head_monitor_0 on olegdbx39-vm000003 returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning: unpack_rsc_op_failure: Processing failed op monitor for dbx_first_head on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [6872] olegdbx39-vm-0 stonith-ng: debug: xml_patch_version_check: Can apply patch 2.5.47 to 2.5.46
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: determine_op_status: dbx_first_head_monitor_0 on olegdbx39-vm000003 returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning: unpack_rsc_op_failure: Processing failed op monitor for dbx_first_head on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_mounts_nodes on olegdbx39-vm000003 to dbx_mounts_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: determine_op_status: dbx_mounts_nodes_monitor_0 on olegdbx39-vm000003 returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning: unpack_rsc_op_failure: Processing failed op monitor for dbx_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: determine_op_status: dbx_mounts_nodes_monitor_0 on olegdbx39-vm000003 returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning: unpack_rsc_op_failure: Processing failed op monitor for dbx_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_nfs_mounts_datas on olegdbx39-vm000003 to dbx_nfs_mounts_datas:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: determine_op_status: dbx_nfs_mounts_datas_monitor_0 on olegdbx39-vm000003 returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning: unpack_rsc_op_failure: Processing failed op monitor for dbx_nfs_mounts_datas:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_ready_primary on olegdbx39-vm000003 to dbx_ready_primary:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_first_datas on olegdbx39-vm-0 to dbx_first_datas:1
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_swap_nodes on olegdbx39-vm-0 to dbx_swap_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_mounts_nodes on olegdbx39-vm-0 to dbx_mounts_nodes:1
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_bind_mounts_nodes on olegdbx39-vm-0 to dbx_bind_mounts_nodes:1
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_nfs_mounts_datas on olegdbx39-vm-0 to dbx_nfs_mounts_datas:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_nfs_nodes on olegdbx39-vm-0 to dbx_nfs_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_ready_primary on olegdbx39-vm-0 to dbx_ready_primary:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_first_datas on olegdbx39-vm000002 to dbx_first_datas:1
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_swap_nodes on olegdbx39-vm000002 to dbx_swap_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_nfs_mounts_datas on olegdbx39-vm000002 to dbx_nfs_mounts_datas:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_mounts_nodes on olegdbx39-vm000002 to dbx_mounts_nodes:1
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_bind_mounts_nodes on olegdbx39-vm000002 to dbx_bind_mounts_nodes:1
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_nfs_nodes on olegdbx39-vm000002 to dbx_nfs_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug: find_anonymous_clone: Internally renamed dbx_ready_primary on olegdbx39-vm000002 to dbx_ready_primary:0
> > [...]
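(The 'unknown (189)' entries are probe results carried over in the CIB
status section; Ken explains their meaning further down. Once the root
cause is addressed, these failed-op entries can be cleared so the scheduler
stops acting on them; both are standard Pacemaker commands, resource and
node names taken from the logs above:

    # clear failure history for the head resource on all nodes
    crm_resource --cleanup --resource dbx_head_head
    # or only on the node that reported the bogus probe result
    crm_resource --cleanup --resource dbx_head_head --node olegdbx39-vm000003
)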
> >
> > Thanks in advance,
> >
> >
> > On Thu, May 18, 2017 at 4:37 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> >
> > On 05/17/2017 06:10 PM, Radoslaw Garbacz wrote:
> > > Hi,
> > >
> > > I have a question regarding the '<lrm_resource><lrm_rsc_op>' op-status
> > > attribute getting the value 4.
> > >
> > > In my case I see strange behavior: resources get these "monitor"
> > > operation entries in the CIB with op-status=4, yet the operations do
> > > not seem to have been called (exec-time=0).
> > >
> > > What does 'op-status' = 4 mean?
> >
> > The action had an error status.
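(For reference while reading these entries: the op-status values map to the
lrmd's execution-status codes. This table is my reading of the Pacemaker
1.1 sources, so worth double-checking against your build:

    # op-status values seen in lrm_rsc_op entries
    op_status_name() {
        case "$1" in
            -1) echo "pending" ;;
             0) echo "done" ;;
             1) echo "cancelled" ;;
             2) echo "timeout" ;;
             3) echo "not supported" ;;
             4) echo "error" ;;
             *) echo "other/unknown ($1)" ;;
        esac
    }

So op-status=4 only says "the operation ended in error"; the rc-code
carries the agent-level OCF result, as Ken explains next.)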
> >
> > > I would appreciate some elaboration regarding this, since it is
> > > interpreted by pacemaker as an error, which causes logs:
> > > crm_mon: error: unpack_rsc_op: Preventing dbx_head_head from
> > > re-starting anywhere: operation monitor failed 'not configured' (6)
> >
> > The rc-code="6" is the more interesting number; it's the result returned
> > by the resource agent. As you can see above, it means "not configured".
> > What that means exactly is up to the resource agent's interpretation.
> >
> > > and I am pretty sure the resource agent was not called (no logs,
> > > exec-time=0)
> >
> > Normally this could only come from the resource agent.
> >
> > However, there are two cases where pacemaker generates this error itself:
> > if the resource definition in the CIB is invalid; and if your version of
> > pacemaker was compiled with support for reading sensitive parameter
> > values from a file, but that file could not be read.
> >
> > It doesn't sound like your case is either one of those though, since
> > they would prevent the resource from even starting. Most likely it's
> > coming from the resource agent. I'd look at the resource agent source
> > code and see where it can return OCF_ERR_CONFIGURED.
> >
> > > There are two aspects of this:
> > >
> > > 1) a harmless one (pacemaker does not seem to bother about it), which
> > > I guess indicates cancelled monitoring operations:
> > > op-status=4, rc-code=189
> >
> > This error means the connection between the crmd and lrmd daemons was
> > lost -- most commonly, that shows up for operations that were pending at
> > shutdown.
> >
> > > * Example:
> > > <lrm_rsc_op id="dbx_first_datas_last_failure_0"
> > >     operation_key="dbx_first_datas_monitor_0" operation="monitor"
> > >     crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
> > >     transition-key="38:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> > >     transition-magic="4:189;38:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> > >     on_node="olegdbx61-vm000001" call-id="10" rc-code="189" op-status="4"
> > >     interval="0" last-run="1495057378" last-rc-change="1495057378"
> > >     exec-time="0" queue-time="0" op-digest="f6bd1386a336e8e6ee25ecb651a9efb6"/>
> > >
> > > 2) an error-level one (op-status=4, rc-code=6), which generates logs:
> > > crm_mon: error: unpack_rsc_op: Preventing dbx_head_head from
> > > re-starting anywhere: operation monitor failed 'not configured' (6)
> > >
> > > * Example:
> > > <lrm_rsc_op id="dbx_head_head_last_failure_0"
> > >     operation_key="dbx_head_head_monitor_0" operation="monitor"
> > >     crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
> > >     transition-key="39:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> > >     transition-magic="4:6;39:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> > >     on_node="olegdbx61-vm000001" call-id="999999999" rc-code="6"
> > >     op-status="4" interval="0" last-run="1495057389"
> > >     last-rc-change="1495057389" exec-time="0" queue-time="0"
> > >     op-digest="60cdc9db1c5b77e8dba698d3d0c8cda8"/>
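(A small aside that helped me read these entries: transition-magic looks
like the op-status and rc prepended to the transition-key, i.e.
op-status:rc;action-id:transition-id:expected-rc:te-uuid. That is my
reading of the examples above rather than anything authoritative. A
throwaway decoder:

    #!/bin/sh
    # Split a transition-magic string into its assumed fields.
    decode_magic() {
        magic=$1
        status=${magic%%:*}    # op-status (4 = error)
        rest=${magic#*:}
        rc=${rest%%;*}         # agent/OCF return code
        key=${rest#*;}         # action:transition:expected-rc:te-uuid
        echo "op-status=$status rc=$rc expected-rc=$(echo "$key" | cut -d: -f3)"
    }
    decode_magic "4:6;39:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
    # prints: op-status=4 rc=6 expected-rc=7

So the failed probe recorded op-status 4 (error) with rc 6 ('not
configured') where rc 7 ('not running') was expected.)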
> >
> > > Could it be some hardware (VM hypervisor) issue?
> > >
> > > Thanks in advance,
> > >
> > > --
> > > Best Regards,
> > >
> > > Radoslaw Garbacz
> > > XtremeData Incorporated

--
Best Regards,

Radoslaw Garbacz
XtremeData Incorporated

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org