hi,

after adding a second ring to corosync.conf
the problem seems to be gone.

after killing corosync the node is fenced by
the other node.  after reboot the cluster is
fully operational.

is it essential to have at least 2 rings?

maybe there is a network timing problem (but i can't see any
error messages).
the interface on ring 0 (192.168.20.171) is a bridge.
the interface on ring 1 (10.10.10.171) is a normal ethernet interface.
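
in case it helps, the second ring was added roughly like this in
/etc/corosync/corosync.conf (the bindnetaddr values match my two
subnets; rrp_mode "passive" is my assumption from the corosync docs,
and the elided options are left out here):

totem {
        rrp_mode: passive
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.20.0
                ...
        }
        interface {
                ringnumber: 1
                bindnetaddr: 10.10.10.0
                ...
        }
}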


regards
ulrich

[root@pcmk1 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID -1424709440
RING ID 0
        id      = 192.168.20.171
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.10.10.171
        status  = ring 1 active with no faults
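
as far as i know, if one ring ever shows as FAULTY in the output
above, it can be re-enabled cluster-wide with:

[root@pcmk1 ~]# corosync-cfgtool -r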


On Tue, 2012-07-17 at 15:24 +0200, Ulrich Leodolter wrote:
> hi,
> 
> i have setup a very basic 2-node cluster on RHEL 6.3
> first thing i tried was to setup stonith/fencing_ipmilan
> resource.
> 
> fencing seems to work,  if i kill corosync on one node
> it is restarted (ipmi reboot) by the other node.  
> 
> but after restart the cluster doesn't come back to normal
> operation,  it looks like pacemakerd hangs and the
> node status is offline.
> 
> i found only one way to fix the problem:
> 
> killall -9 pacemakerd
> service pacemakerd start
> 
> after that both nodes are online.  below you can see my
> cluster configuration and the corosync.log messages which
> repeat forever when pacemakerd hangs.
> 
> i am new to pacemaker and followed the "Clusters from Scratch"
> guide for the first setup.   information about fence_ipmilan
> is from google :-)
> 
> can you give me tips? what is wrong with this basic cluster
> config?  i don't want to add more resources (kvm virtual
> machines) until fencing is configured correctly.
> 
> thx
> ulrich
> 
> 
> 
> [root@pcmk1 ~]# crm configure show
> node pcmk1 \
>       attributes standby="off"
> node pcmk2 \
>       attributes standby="off"
> primitive p_stonith_pcmk1 stonith:fence_ipmilan \
>       params auth="password" ipaddr="192.168.120.171" passwd="xxx" lanplus="true" login="pcmk" timeout="20s" power_wait="5s" verbose="true" pcmk_host_check="static-list" pcmk_host_list="pcmk1" \
>       meta target-role="started"
> primitive p_stonith_pcmk2 stonith:fence_ipmilan \
>       params auth="password" ipaddr="192.168.120.172" passwd="xxx" lanplus="true" login="pcmk" timeout="20s" power_wait="5s" verbose="true" pcmk_host_check="static-list" pcmk_host_list="pcmk2" \
>       meta target-role="started"
> location loc_p_stonith_pcmk1_pcmk1 p_stonith_pcmk1 -inf: pcmk1
> location loc_p_stonith_pcmk2_pcmk2 p_stonith_pcmk2 -inf: pcmk2
> property $id="cib-bootstrap-options" \
>       expected-quorum-votes="2" \
>       dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
>       no-quorum-policy="ignore" \
>       cluster-infrastructure="openais"
> rsc_defaults $id="rsc-options" \
>       resource-stickiness="200"
> 
> 
> /var/log/cluster/corosync.log:
> 
> Jul 13 11:29:41 [1859] pcmk2       crmd:     info: do_dc_release:       DC role released
> Jul 13 11:29:41 [1859] pcmk2       crmd:     info: do_te_control:       Transitioner is now inactive
> Jul 13 11:29:41 [1854] pcmk2        cib:     info: set_crm_log_level:   New log level: 3 0
> Jul 13 11:30:01 [1859] pcmk2       crmd:     info: crm_timer_popped:    Election Trigger (I_DC_TIMEOUT) just popped (20000ms)
> Jul 13 11:30:01 [1859] pcmk2       crmd:  warning: do_log:      FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jul 13 11:30:01 [1859] pcmk2       crmd:   notice: do_state_transition:         State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Jul 13 11:30:01 [1859] pcmk2       crmd:     info: do_election_count_vote:      Election 8 (owner: pcmk1) lost: vote from pcmk1 (Uptime)
> Jul 13 11:30:01 [1859] pcmk2       crmd:   notice: do_state_transition:         State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> 
> 



_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
