Re: [Pacemaker] Failure after intermittent network outage

Pavel Levshin Thu, 10 Mar 2011 04:08:26 -0800

Hi,

No, I think you've missed the point. The RA did not answer at all. The monitor actions were lost due to a cluster transition:

Mar 1 11:16:00 wapgw1-log crmd: [24547]: info: do_lrm_rsc_op: Performing key=33:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd op=p-drbd-mdirect1-1:0_monitor_0 )
Mar 1 11:16:00 wapgw1-log crmd: [24547]: info: do_lrm_rsc_op: Discarding attempt to perform action monitor on p-drbd-mdirect1-1:0 in state S_ELECTION
Mar 1 11:16:00 wapgw1-log crmd: [24547]: info: send_direct_ack: ACK'ing resource op p-drbd-mdirect1-1:0_monitor_0 from 33:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd: lrm_invoke-lrmd-1298967360-58
Mar 1 11:16:00 wapgw1-log crmd: [24547]: info: process_te_message: Processing (N)ACK lrm_invoke-lrmd-1298967360-58 from wapgw1-log
Mar 1 11:16:00 wapgw1-log crmd: [24547]: info: process_graph_event: Action p-drbd-mdirect1-1:0_monitor_0/33 (4:99;33:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd) initiated by a different transitioner
Mar 1 11:16:00 wapgw1-log crmd: [24547]: info: abort_transition_graph: process_graph_event:456 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=p-drbd-mdirect1-1:0_monitor_0, magic=4:99;33:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd) : Foreign event


So the RA never had a chance to answer anything.

Apart from this, should I fake all the RAs that are supposed to be unused on particular nodes in the cluster? That seems to me like only a partial solution.

Suppose that I want to run virtual machine "X" on hardware nodes A and B, and VM "Y" on nodes B and C. With DRBD this is a very common configuration, because "X" cannot access its disk device on hardware node "C". Currently, I must configure both "X" and "Y" on every hardware node, or the RA will fail with status "not configured". This is not a minimal configuration, so it is more error-prone than necessary.

I would be happy to tell the cluster never to touch resource "X" on node C in this case. What do you think?



10.03.2011 14:09, Andrew Beekhof wrote:

Your basic problem is this...

Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op:
Processing failed op vm-mproxy1-1_monitor_0 on wapgw1-log: unknown
error (1)

We asked what state the resource was in and it replied "arrrggghhhh"
instead of "not installed".
Had it replied with not installed, we'd have no reason to call stop or
fence the node to try and clean it up.
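
To illustrate the distinction drawn above, here is a minimal sketch in Python of a probe handler that answers "not installed" rather than a generic error when the resource is not defined on the node. The exit-code values follow the OCF resource agent convention; the function name `probe`, its parameters, and the idea of checking a definition path are assumptions made for illustration, not the code of any actual agent.

```python
import os

# Exit codes from the OCF resource agent convention.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1    # logged by the policy engine as "unknown error"
OCF_ERR_INSTALLED = 5  # "not installed"
OCF_NOT_RUNNING = 7


def probe(config_path, is_running):
    """Return an OCF exit code for a probe of one resource.

    config_path: hypothetical path to the resource's definition on this node.
    is_running:  hypothetical callable returning True if the resource is active.
    """
    if not os.path.exists(config_path):
        # The resource is not defined on this node: say so explicitly
        # instead of failing with a generic error.
        return OCF_ERR_INSTALLED
    try:
        return OCF_SUCCESS if is_running() else OCF_NOT_RUNNING
    except Exception:
        # Any unexpected failure while checking state is a generic error.
        return OCF_ERR_GENERIC
```

This only covers the case where the agent is actually invoked; it does not help when the probe is discarded before reaching the agent, as in the crmd log quoted earlier in this message.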



--
Pavel Levshin //flicker


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
