Hi!

I have a small corosync/pacemaker-based cluster consisting of 4 nodes. 2 
nodes are in standby mode; the other 2 actually handle all the resources. 

corosync ver. 1.4.7-1
pacemaker ver. 1.1.11
OS: Ubuntu 12.04

In our production environment, which has plenty of free RAM, CPU, etc., 
everything works well. When I switch one node off, all the resources move 
to the other without any problems, and vice versa. That's what I need :)

Our staging environment has rather weak hardware (that's OK, it's just staging 
:) ) and is rather busy. Sometimes it doesn't even have enough CPU or disk 
speed to be stable. When that happens, some of the cluster resources fail (which I 
consider normal), but I also see the following crm output:

Node db-node1: standby
Node db-node2: standby
Online: [ lb-node1 lb-node2 ]

 Pgpool2        (ocf::heartbeat:pgpool):        FAILED (unmanaged) [ lb-node2 lb-node1 ]
 Resource Group: IPGroup
     FailoverIP1        (ocf::heartbeat:IPaddr2):       Started [ lb-node2 lb-node1 ]

As you can see, the ocf::heartbeat:IPaddr2 resource is started on both nodes 
(lb-node2 and lb-node1), but I can't figure out how that could happen. 
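
In case it helps to spot this condition automatically, here is a minimal Python sketch that scans status text in the format shown above and lists every resource reported on more than one node. It is only an illustration: the function name find_multi_active and the regular expression are my own assumptions based solely on the output pasted here, and other Pacemaker versions may print a different layout.

```python
import re

# Sample text copied from the status output above (mail line-wrapping kept).
SAMPLE = """\
Node db-node1: standby
Node db-node2: standby
Online: [ lb-node1 lb-node2 ]

 Pgpool2        (ocf::heartbeat:pgpool):        FAILED (unmanaged) [ lb-node2
lb-node1 ]
 Resource Group: IPGroup
     FailoverIP1        (ocf::heartbeat:IPaddr2):       Started [ lb-node2
lb-node1 ]
"""

# Assumed layout: name (agent): state [ node node ... ]
# The state may not contain ':' or '[', so a match for a resource that is
# listed without brackets cannot run on into the next resource's node list.
RESOURCE_RE = re.compile(r"(\S+) \(([^)]+)\): ([^\[:]+?) \[ ([^\]]+?) \]")


def find_multi_active(status_text):
    """Return {resource: (state, [nodes])} for resources listed on 2+ nodes."""
    # Collapse all whitespace to undo line wrapping and column padding.
    flat = re.sub(r"\s+", " ", status_text)
    found = {}
    for name, _agent, state, nodes in RESOURCE_RE.findall(flat):
        node_list = nodes.split()
        if len(node_list) > 1:
            found[name] = (state, node_list)
    return found


if __name__ == "__main__":
    for name, (state, nodes) in find_multi_active(SAMPLE).items():
        print(f"{name}: {state} on {', '.join(nodes)}")
```

Run against the sample, it flags both Pgpool2 and FailoverIP1, which matches what I see by eye. It only reads text, so it does nothing to prevent the situation.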

This is the output of my crm configure show:

node db-node1 \
        attributes standby=on
node db-node2 \
        attributes standby=on
node lb-node1
node lb-node2
primitive Cachier ocf:site:cachier \
        op monitor interval=10s timeout=30s depth=10 \
        meta target-role=Started
primitive FailoverIP1 IPaddr2 \
        params ip=111.22.33.44 cidr_netmask=32 iflabel=FAILOVER \
        op monitor interval=30s
primitive Mailer ocf:site:mailer \
        meta target-role=Started \
        op monitor interval=10s timeout=30s depth=10
primitive Memcached memcached \
        op monitor interval=10s timeout=30s depth=10 \
        meta target-role=Started
primitive Nginx nginx \
        params status10url="/nginx_status" testclient=curl port=8091 \
        op monitor interval=10s timeout=30s depth=10 \
        op start interval=0 timeout=40s \
        op stop interval=0 timeout=60s \
        meta target-role=Started
primitive Pgpool2 pgpool \
        params checkmethod=pid \
        op monitor interval=30s \
        op start interval=0 timeout=40s \
        op stop interval=0 timeout=60s
group IPGroup FailoverIP1 \
        meta target-role=Started
colocation ip-with-cachier inf: Cachier IPGroup
colocation ip-with-mailer inf: Mailer IPGroup
colocation ip-with-memcached inf: Memcached IPGroup
colocation ip-with-nginx inf: Nginx IPGroup
colocation ip-with-pgpool inf: Pgpool2 IPGroup
order cachier-after-ip inf: IPGroup Cachier
order mailer-after-ip inf: IPGroup Mailer
order memcached-after-ip inf: IPGroup Memcached
order nginx-after-ip inf: IPGroup Nginx
order pgpool-after-ip inf: IPGroup Pgpool2
property cib-bootstrap-options: \
        expected-quorum-votes=4 \
        stonith-enabled=false \
        default-resource-stickiness=100 \
        maintenance-mode=false \
        dc-version=1.1.10-9d39a6b \
        cluster-infrastructure="classic openais (with plugin)" \
        last-lrm-refresh=1422438144


So the question is: does my config allow a resource like 
ocf::heartbeat:IPaddr2 to be started on multiple nodes simultaneously? Is it 
something that can normally happen? Or is it happening because of the shortage 
of computing power which I described earlier? :)
How can I prevent this from happening? Is this a case that is normally 
supposed to be solved by STONITH?

Thanks in advance.

--
Best regards,
Sergey Arlashin
