[ClusterLabs] Strange monitor return code log for LSB resource

Harvey Shepherd Tue, 25 Jun 2019 06:53:36 -0700

Hi All,


I have a two-node cluster running Pacemaker 2.0.2 with around 20 resources
configured. Most of them are LSB resources, but there are also a few OCF ones.
One of the LSB resources is controlled via an init script called "logging" and
runs only on the master node. Its CIB configuration is as follows:


    <primitive id="logging" class="lsb" type="logging">
      <operations>
        <op name="monitor" interval="10s" id="logging-monitor-10s"/>
        <op name="start" interval="0" id="logging-start-30s"/>
        <op name="stop" interval="0" on-fail="restart" id="logging-stop-30s"/>
      </operations>
    </primitive>


There is also a global op_defaults entry that sets the default operation timeout:


  <op_defaults>
    <meta_attributes id="op-options">
      <nvpair name="timeout" value="30s" id="op-options-timeout"/>
    </meta_attributes>
  </op_defaults>
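None of the three logging operations sets its own timeout, so each one should
pick up the 30s value from op_defaults. Here is a small sanity check of that
reading of the two fragments, using only the Python standard library.
effective_timeouts is a hypothetical helper. It deliberately ignores rules and
the other places Pacemaker can take an operation timeout from, so it checks the
XML only and does not model Pacemaker's own lookup.

```python
import xml.etree.ElementTree as ET

# The two CIB fragments quoted above.
PRIMITIVE = """
<primitive id="logging" class="lsb" type="logging">
  <operations>
    <op name="monitor" interval="10s" id="logging-monitor-10s"/>
    <op name="start" interval="0" id="logging-start-30s"/>
    <op name="stop" interval="0" on-fail="restart" id="logging-stop-30s"/>
  </operations>
</primitive>
"""

OP_DEFAULTS = """
<op_defaults>
  <meta_attributes id="op-options">
    <nvpair name="timeout" value="30s" id="op-options-timeout"/>
  </meta_attributes>
</op_defaults>
"""


def effective_timeouts(primitive_xml, op_defaults_xml):
    """Hypothetical helper: return {op name: timeout} for one primitive.

    An op's own timeout attribute wins; otherwise the op_defaults timeout is
    used. Rules, op-level meta_attributes and other CIB sources are ignored.
    """
    default_timeout = None
    for nvpair in ET.fromstring(op_defaults_xml.strip()).iter("nvpair"):
        if nvpair.get("name") == "timeout":
            default_timeout = nvpair.get("value")

    timeouts = {}
    for op in ET.fromstring(primitive_xml.strip()).iter("op"):
        timeouts[op.get("name")] = op.get("timeout", default_timeout)
    return timeouts


print(effective_timeouts(PRIMITIVE, OP_DEFAULTS))
# {'monitor': '30s', 'start': '30s', 'stop': '30s'}
```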


All of the other LSB resources are configured the same way and seem to work
correctly, but for the logging resource I see the following recurring log
entries:


Jun 25 13:16:22 ctr_qemu pacemaker-execd     [1234] (recurring_action_timer)    debug: Scheduling another invocation of logging_status_10000
Jun 25 13:16:22 ctr_qemu pacemaker-execd     [1234] (operation_finished)        debug: logging_status_10000:8423 - exited with rc=0
Jun 25 13:16:22 ctr_qemu pacemaker-execd     [1234] (operation_finished)        debug: logging_status_10000:8423:stdout [ Remote syslog service is running ]
Jun 25 13:16:22 ctr_qemu pacemaker-execd     [1234] (log_finished)      debug: finished - rsc:logging action:monitor call_id:123 pid:8423 exit-code:0 exec-time:0ms queue-time:0ms
Jun 25 13:16:23 ctr_qemu crm_resource        [8436] (determine_op_status)       debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
Jun 25 13:16:23 ctr_qemu crm_resource        [8436] (unpack_rsc_op_failure)     warning: Processing failed monitor of logging on primary: not running | rc=7
Jun 25 13:16:30 ctr_qemu crm_resource        [8571] (determine_op_status)       debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
Jun 25 13:16:30 ctr_qemu crm_resource        [8571] (unpack_rsc_op_failure)     warning: Processing failed monitor of logging on primary: not running | rc=7
Jun 25 13:16:32 ctr_qemu pacemaker-execd     [1234] (recurring_action_timer)    debug: Scheduling another invocation of logging_status_10000
Jun 25 13:16:32 ctr_qemu pacemaker-execd     [1234] (operation_finished)        debug: logging_status_10000:8670 - exited with rc=0
Jun 25 13:16:32 ctr_qemu pacemaker-execd     [1234] (operation_finished)        debug: logging_status_10000:8670:stdout [ Remote syslog service is running ]
Jun 25 13:16:32 ctr_qemu pacemaker-execd     [1234] (log_finished)      debug: finished - rsc:logging action:monitor call_id:123 pid:8670 exit-code:0 exec-time:0ms queue-time:0ms
Jun 25 13:16:33 ctr_qemu crm_resource        [8683] (determine_op_status)       debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
Jun 25 13:16:33 ctr_qemu crm_resource        [8683] (unpack_rsc_op_failure)     warning: Processing failed monitor of logging on primary: not running | rc=7
Jun 25 13:16:40 ctr_qemu crm_resource        [8818] (determine_op_status)       debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
Jun 25 13:16:40 ctr_qemu crm_resource        [8818] (unpack_rsc_op_failure)     warning: Processing failed monitor of logging on primary: not running | rc=7
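Grouped by process, the excerpt splits cleanly. Every result reported by
pacemaker-execd (PID 1234) is rc=0, while every rc=7 comes from a crm_resource
process with a different PID each time (8436, 8571, 8683, 8818). The Python
sketch below does that grouping on a subset of the lines. The regular
expressions are assumptions fitted to this one-record-per-line layout and are
not a general Pacemaker log parser. tally is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Result-bearing entries from the excerpt above, one log record per line.
LOG = """\
Jun 25 13:16:22 ctr_qemu pacemaker-execd     [1234] (operation_finished)        debug: logging_status_10000:8423 - exited with rc=0
Jun 25 13:16:23 ctr_qemu crm_resource        [8436] (determine_op_status)       debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
Jun 25 13:16:30 ctr_qemu crm_resource        [8571] (determine_op_status)       debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
Jun 25 13:16:32 ctr_qemu pacemaker-execd     [1234] (operation_finished)        debug: logging_status_10000:8670 - exited with rc=0
Jun 25 13:16:33 ctr_qemu crm_resource        [8683] (determine_op_status)       debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
Jun 25 13:16:40 ctr_qemu crm_resource        [8818] (determine_op_status)       debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
"""

# <month> <day> <hh:mm:ss> <host> <daemon> [<pid>] (<function>) <level>: <message>
RECORD = re.compile(
    r"^\w{3} +\d+ [\d:]{8} \S+ (?P<daemon>\S+) +\[(?P<pid>\d+)\] "
    r"\((?P<func>\w+)\) +\w+: (?P<msg>.*)$"
)
# Either "rc=N" (execd) or "'<text>' (N) instead of" (crm_resource).
RC = re.compile(r"rc=(\d+)|'[^']*' \((\d+)\) instead of")


def tally(log_text):
    """Map each daemon name to a list of (pid, rc) pairs for result lines."""
    result = defaultdict(list)
    for line in log_text.splitlines():
        record = RECORD.match(line)
        if not record:
            continue  # not in the expected layout
        rc = RC.search(record.group("msg"))
        if rc:
            code = int(rc.group(1) or rc.group(2))
            result[record.group("daemon")].append((int(record.group("pid")), code))
    return dict(result)


for daemon, results in tally(LOG).items():
    print(daemon, results)
```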

Pacemaker reports failed resource actions, but the fail-count is not
incremented for the resource:


Migration Summary:
* Node primary:
* Node secondary:

Failed Resource Actions:
* logging_monitor_10000 on primary 'not running' (7): call=119, 
status=complete, exitreason='',
    last-rc-change='Tue Jun 25 13:13:12 2019', queued=0ms, exec=0ms


When I run the LSB script's status action manually, it always exits with
return code 0, indicating that the resource is running. So my questions are:


1. Why does Pacemaker seem to run a monitor operation in parallel with a
status operation, with conflicting results? A monitor operation returning 7
("not running") would only make sense for an OCF resource, but the resource is
clearly defined as LSB in the CIB.

2. Why does the status operation always return 0 (running) while the monitor
operation always returns 7 (not running)?

3. Why is the fail-count not incremented even though failures are being
logged?
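Some background that may help when reading the logs. For LSB resources,
Pacemaker performs a monitor operation by running the init script's status
action and translating the LSB exit code into an OCF code. For example, LSB 3
("program is not running") is reported as OCF 7 ("not running"). The Python
sketch below reproduces that check by hand. The translation table is my reading
of Pacemaker's services library and should be verified against the 2.0.2
source, especially the Pacemaker-specific 150/151 codes. check_status is a
hypothetical helper, and /etc/init.d/logging is an assumed script path.

```python
import subprocess

# LSB "status" exit codes and the OCF codes Pacemaker is assumed to map them
# to for a monitor; verify this table against the Pacemaker 2.0.2 source.
LSB_STATUS_TO_OCF = {
    0: 0,    # program is running                     -> OCF_SUCCESS
    1: 7,    # dead, but pid file exists              -> OCF_NOT_RUNNING
    2: 7,    # dead, but lock file exists             -> OCF_NOT_RUNNING
    3: 7,    # program is not running                 -> OCF_NOT_RUNNING
    150: 5,  # Pacemaker-specific: not installed      -> OCF_ERR_INSTALLED
    151: 4,  # Pacemaker-specific: insufficient priv. -> OCF_ERR_PERM
}
OCF_ERR_GENERIC = 1  # anything else, e.g. LSB 4 "status unknown"


def lsb_status_to_ocf(lsb_rc):
    """Translate an LSB status exit code into the OCF code used for monitor."""
    return LSB_STATUS_TO_OCF.get(lsb_rc, OCF_ERR_GENERIC)


def check_status(command, runs=5, timeout=30):
    """Run an LSB status command several times; return (lsb_rc, ocf_rc) pairs.

    `command` is an argument list such as ["/etc/init.d/logging", "status"]
    (assumed path). `timeout` mirrors the 30s op_defaults timeout; a hung
    script raises subprocess.TimeoutExpired.
    """
    results = []
    for _ in range(runs):
        proc = subprocess.run(command, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout)
        results.append((proc.returncode, lsb_status_to_ocf(proc.returncode)))
    return results
```

On the node running the resource, check_status(["/etc/init.d/logging",
"status"]) should return only (0, 0) pairs if the script behaves as described
above.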


I would really appreciate any pointers that anyone could give me. Perhaps I've 
made an error in the configuration.



Thanks,

Harvey Shepherd

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
