On Sep 8, 2008, at 10:11 AM, Junko IKEDA wrote:

Hi,

This is a request about showing monitor failures (NG) in crm_mon.
One dummy resource is running like this:

# crm_mon -fot -i1 -r
Node: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3): online
Node: node-a (b3852a23-c10b-440a-a8e0-263b0185d657): online

dummy   (ocf::heartbeat:Dummy): Started node-a

Operations:
* Node node-b:
* Node node-a:
  dummy:
   + start: rc=0 (ok)
   + monitor: interval=10000ms rc=0 (ok)



Remove its state file, so that dummy fails over.

# rm -f /var/run/heartbeat/rsctmp/Dummy-dummy.state
Node: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3): online
Node: node-a (b3852a23-c10b-440a-a8e0-263b0185d657): online

dummy   (ocf::heartbeat:Dummy): Started node-b

Operations:
* Node node-b:
  dummy:
   + start: rc=0 (ok)
   + monitor: interval=10000ms rc=0 (ok)
* Node node-a:
  dummy:  fail-count=1
   + start: rc=0 (ok)
   + monitor: interval=10000ms rc=7 (not running)
   + stop: rc=0 (ok)

Failed actions:
   dummy_monitor_10000 (node=node-a, call=4, rc=7): complete



After that, the node on which the resource is now running (node-b) is stopped manually.

# service heartbeat stop
Node: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3): OFFLINE
Node: node-a (b3852a23-c10b-440a-a8e0-263b0185d657): online

Operations:
* Node node-b:
  dummy:
   + start: rc=0 (ok)
   + monitor: interval=10000ms rc=0 (ok)
   + stop: rc=0 (ok)
* Node node-a:
  dummy:  fail-count=1
   + start: rc=0 (ok)
   + stop: rc=0 (ok)

At this point, node-a's monitor failure (NG) disappears from crm_mon.

Because it is no longer in the current start/stop series for the resource.


It might be expected behavior for now,

It is.


but it would be convenient if crm_mon could keep showing some past failures.

It can't display them forever. They are not (and should not be) kept in the CIB forever, as that would cause the CIB size to explode.
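
If a longer failure history is needed, one option is to record it outside the CIB. The following is only a rough sketch, not something crm_mon provides: it reads crm_mon-style text on stdin, picks out the entries under "Failed actions:" in the layout shown above, and appends them to a log file with a timestamp. The function names and the default log path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): print the entries listed
# under the "Failed actions:" heading of crm_mon-style text read from stdin.
extract_failed_actions() {
    awk '
        /^Failed actions:/ { in_block = 1; next }   # heading starts the section
        in_block && /^[ \t]*$/ { in_block = 0 }     # a blank line ends it
        in_block { sub(/^[ \t]+/, ""); print }      # strip the indentation
    '
}

# Hypothetical helper: prefix each failure with the current time and append
# it to a log file. The default path is an assumption; pass another as $1.
log_failed_actions() {
    log_file=${1:-/var/log/crm_failed_actions.log}
    extract_failed_actions | while IFS= read -r entry; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$entry"
    done >> "$log_file"
}
```

It could be fed from a one-shot crm_mon run, for example `crm_mon -1 | log_failed_actions`, assuming the installed crm_mon has the one-shot option. Each run appends whatever failures are visible at that moment, so polling repeatedly will log the same failure more than once; de-duplication is left out of the sketch.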


_______________________________________________
Pacemaker mailing list
[email protected]
http://list.clusterlabs.org/mailman/listinfo/pacemaker
