Hi,
I am trying to use Pacemaker with corosync and am facing the following issues.
I would like to know whether these are due to misconfiguration or whether they
are known issues.
I have two nodes in the cluster: VIP-1 and VIP-2.
The corosync version is:
Corosync Cluster Engine, version '1.2.7' SVN revision '3008'
==================================================================
The crm_mon output is:
============
Last updated: Thu Feb 24 17:44:33 2011
Stack: openais
Current DC: VIP-1 - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
3 Resources configured.
============
Online: [ VIP-1 VIP-2 ]
ClusterIP (ocf::heartbeat:IPaddr2): Started VIP-1
WebSite (ocf::heartbeat:apache): Started VIP-1
My_Tomcat (ocf::heartbeat:tomcat): Started VIP-1
==================================================================
My configuration is:
[root@VIP-1 local]# crm configure show
node VIP-1
node VIP-2
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="172.16.201.23" cidr_netmask="32" \
op monitor interval="5s"
primitive My_Tomcat ocf:heartbeat:tomcat \
params catalina_home="/root/Softwares/apache-tomcat-6.0.26" \
java_home="/root/Softwares/Java/linux/jdk1.6.0_21" \
op monitor interval="5s"
primitive WebSite ocf:heartbeat:apache \
params configfile="/etc/httpd/conf/httpd.conf" \
op monitor interval="5s"
property $id="cib-bootstrap-options" \
dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1298547656"
rsc_defaults $id="rsc-options" \
resource-stickiness="2"
=======================================================================
Issue 1)
I observed that if any service is manually shut down on VIP-1, Pacemaker
restarts it on the same node.
In the logs, I can see this:
=================================================================================================================
Feb 24 18:14:32 VIP-1 pengine: [28098]: info: get_failcount: My_Tomcat has
failed 35 times on VIP-1
Feb 24 18:14:32 VIP-1 pengine: [28098]: notice: common_apply_stickiness:
My_Tomcat can fail 999965 more times on VIP-1 before being forced off
==================================================================================================================
I have not configured the service to be restarted an INFINITY number of times
on VIP-1, so is this the default behavior?
Is there any configuration to tell Pacemaker to restart the service only twice
on VIP-1 and, if it still does not start, to start it on VIP-2 instead?
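From what I have read, the migration-threshold resource meta attribute might be
the relevant setting, but I am not sure. An untested sketch, assuming
migration-threshold is indeed the right knob here:

# Cluster-wide default: force a resource off a node after 2 failures there
crm configure rsc_defaults migration-threshold="2"
# Or per resource, e.g. only for My_Tomcat:
crm resource meta My_Tomcat set migration-threshold 2

Would either of these give the behavior I described?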
Issue 2)
I have changed the exit codes in the Apache and Tomcat RA scripts so that they
return error code 2 if the monitor action fails. The change is roughly like the
sketch below.
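This is paraphrased, not the exact RA code; $STATUSURL, $OCF_SUCCESS, and
$OCF_ERR_GENERIC are names the stock apache RA and ocf-shellfuncs already
define:

apache_monitor() {
    # Simplified: probe the status URL; report failure with exit code 2.
    if ! wget -q -O /dev/null "$STATUSURL"; then
        return 2    # was: return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}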
Now, if I manually stop the service, it is not restarted on VIP-1 but is
started on VIP-2.
The fail count of that service on VIP-1 shows as 1.
Now, if I manually take the service down on VIP-2, it does not get started on
VIP-1 until I clean up the resource.
So, is this known behavior, or have I missed some configuration?
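At the moment I recover by hand with:

crm resource cleanup My_Tomcat

I also came across the failure-timeout meta attribute and wonder whether it
would make the fail count expire on its own; another untested sketch:

# Guess: expire recorded failures after 60 seconds
crm configure rsc_defaults failure-timeout="60s"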
Let me know if you need more information.
Thanks,
Amit