Hi All,

I have a two-node cluster running Pacemaker 2.0.2 with around 20 resources 
configured. Most of them are LSB resources, but there are also a few OCF ones. 
One of the LSB resources is controlled via an init script called "logging" and 
runs only on the master node. Its CIB configuration is as follows:

    <primitive id="logging" class="lsb" type="logging">
        <op name="monitor" interval="10s" id="logging-monitor-10s"/>
        <op name="start" interval="0" id="logging-start-30s"/>
        <op name="stop" interval="0" on-fail="restart" id="logging-stop-30s"/>

There is a global setting which sets the default timeout:

    <meta_attributes id="op-options">
      <nvpair name="timeout" value="30s" id="op-options-timeout"/>

All of the other LSB resources are configured in the same way, and seem to work 
correctly, but for some reason I see the following recurring logs for the 
logging resource:

Jun 25 13:16:22 ctr_qemu pacemaker-execd     [1234] (recurring_action_timer) debug: Scheduling another invocation of logging_status_10000
Jun 25 13:16:22 ctr_qemu pacemaker-execd     [1234] (operation_finished) debug: logging_status_10000:8423 - exited with rc=0
Jun 25 13:16:22 ctr_qemu pacemaker-execd     [1234] (operation_finished) debug: logging_status_10000:8423:stdout [ Remote syslog service is running ]
Jun 25 13:16:22 ctr_qemu pacemaker-execd     [1234] (log_finished) debug: finished - rsc:logging action:monitor call_id:123 pid:8423 exit-code:0 exec-time:0ms queue-time:0ms
Jun 25 13:16:23 ctr_qemu crm_resource        [8436] (determine_op_status) debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
Jun 25 13:16:23 ctr_qemu crm_resource        [8436] (unpack_rsc_op_failure) warning: Processing failed monitor of logging on primary: not running | rc=7
Jun 25 13:16:30 ctr_qemu crm_resource        [8571] (determine_op_status) debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
Jun 25 13:16:30 ctr_qemu crm_resource        [8571] (unpack_rsc_op_failure) warning: Processing failed monitor of logging on primary: not running | rc=7
Jun 25 13:16:32 ctr_qemu pacemaker-execd     [1234] (recurring_action_timer) debug: Scheduling another invocation of logging_status_10000
Jun 25 13:16:32 ctr_qemu pacemaker-execd     [1234] (operation_finished) debug: logging_status_10000:8670 - exited with rc=0
Jun 25 13:16:32 ctr_qemu pacemaker-execd     [1234] (operation_finished) debug: logging_status_10000:8670:stdout [ Remote syslog service is running ]
Jun 25 13:16:32 ctr_qemu pacemaker-execd     [1234] (log_finished) debug: finished - rsc:logging action:monitor call_id:123 pid:8670 exit-code:0 exec-time:0ms queue-time:0ms
Jun 25 13:16:33 ctr_qemu crm_resource        [8683] (determine_op_status) debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
Jun 25 13:16:33 ctr_qemu crm_resource        [8683] (unpack_rsc_op_failure) warning: Processing failed monitor of logging on primary: not running | rc=7
Jun 25 13:16:40 ctr_qemu crm_resource        [8818] (determine_op_status) debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
Jun 25 13:16:40 ctr_qemu crm_resource        [8818] (unpack_rsc_op_failure) warning: Processing failed monitor of logging on primary: not running | rc=7

Pacemaker is reporting failed resource actions, but fail-count is not 
incremented for the resource:

Migration Summary:
* Node primary:
* Node secondary:

Failed Resource Actions:
* logging_monitor_10000 on primary 'not running' (7): call=119, status=complete, exitreason='',
    last-rc-change='Tue Jun 25 13:13:12 2019', queued=0ms, exec=0ms

I have run the LSB script manually and it always exits with a return code of 
0, correctly indicating that the resource is running. So my questions are:

1. Why does Pacemaker seem to be running a monitor operation in parallel with a 
status operation, with conflicting results? A monitor operation returning 7 
"not running" would only make sense for an OCF resource, but it is clearly 
defined as LSB in the CIB.

2. Why does the status operation always return 0 (running) while the monitor 
operation always returns 7 (not running)?

3. Why is fail-count not being incremented even though failures are being 
reported?
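
For anyone wanting to repeat the manual check mentioned above, it can be 
sketched roughly as follows. This is only an illustration: the helper names 
lsb_status_name and check_lsb_status are made up for this post, and the path 
/etc/init.d/logging in the usage note is an assumption about where the init 
script lives. The code-to-meaning mapping follows the LSB definition of the 
"status" action.

```shell
#!/bin/sh
# Hypothetical helper: describe an LSB "status" exit code in words.
lsb_status_name() {
    case "$1" in
        0) echo "running" ;;
        1) echo "dead, pid file exists" ;;
        2) echo "dead, lock file exists" ;;
        3) echo "not running" ;;
        4) echo "status unknown" ;;
        *) echo "reserved or application-specific" ;;
    esac
}

# Hypothetical helper: run "<script> status", discard its output,
# and print the raw exit code together with its LSB meaning.
check_lsb_status() {
    "$1" status >/dev/null 2>&1
    rc=$?
    echo "rc=$rc ($(lsb_status_name "$rc"))"
}
```

Usage, assuming the script path: check_lsb_status /etc/init.d/logging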

I would really appreciate any pointers that anyone could give me. Perhaps I've 
made an error in the configuration.


Harvey Shepherd