[ClusterLabs] Opt-in cluster shows resources stopped where no nodes should be considered

2016-03-04 Thread Martin Schlegel
Hello all

While our cluster seems to be working just fine, I have noticed something in the
crm_mon output that I don't quite understand, and it is throwing off my
monitoring a bit, since stopped resources could mean something is wrong. I was
hoping somebody could help me understand what it means. It seems to be related
to the fact that I am using remote nodes, but I cannot wrap my head around it.

What I am seeing are 3 additional, unexpected lines in the "crm_mon -1rR" output
listing my "p_pgcPgbouncer_test" resources as stopped, even though, to my mind,
there should not be any more nodes to consider (opt-in cluster, see location
rules). At the same time this is not happening to my p_pgsqln resources, as
shown at the top of the crm_mon output.

The important crm_mon -1rR output lines further below are marked with arrows
("-> ... <---").


Some background on the policy:
We are running an asymmetric / opt-in cluster (property symmetric-cluster=false).


The cluster's main purpose is to manage a replicating master/slave database
across 3+ nodes, running strictly on nodes pg1, pg2 and pg3 per location rule
l_pgs_resources.

We also have 2 remote nodes, pgalog1 & pgalog2, defined to control database
connection pooler resources (p_pgcPgbouncer_test) and facilitate client
connection rerouting as per location rule l_pgc_resources.


crm_mon -1rR output:

Last updated: Fri Mar  4 09:56:02 2016  Last change: Fri Mar  4 09:55:47
2016 by root via cibadmin on pg1
Stack: corosync
Current DC: pg1 (1) (version 1.1.14-70404b0) - partition with quorum
5 nodes and 29 resources configured

Online: [ pg1 (1) pg2 (2) pg3 (3) ]
RemoteOnline: [ pgalog1 pgalog2 ]

Full list of resources:

 Master/Slave Set: ms_pgsqln [p_pgsqln]
     p_pgsqln   (ocf::heartbeat:pgsqln):        Master pg3
     p_pgsqln   (ocf::heartbeat:pgsqln):        Started pg1
     p_pgsqln   (ocf::heartbeat:pgsqln):        Started pg2
-> NO additional lines here <---
     Masters: [ pg3 ]
     Stopped: [ pg1 pg2 ]
[...]
 pgalog1    (ocf::pacemaker:remote):        Started pg1
 pgalog2    (ocf::pacemaker:remote):        Started pg3
 Clone Set: cl_pgcPgbouncer [p_pgcPgbouncer_test]
     p_pgcPgbouncer_test    (ocf::heartbeat:pgbouncer):     Started pgalog1
     p_pgcPgbouncer_test    (ocf::heartbeat:pgbouncer):     Started pgalog2
->   p_pgcPgbouncer_test    (ocf::heartbeat:pgbouncer):     Stopped <---
->   p_pgcPgbouncer_test    (ocf::heartbeat:pgbouncer):     Stopped <---
->   p_pgcPgbouncer_test    (ocf::heartbeat:pgbouncer):     Stopped <---
     Started: [ pgalog1 pgalog2 ]



Here are the most important parts of the configuration as shown in "crm
configure show":

[...]
primitive pgalog1 ocf:pacemaker:remote \
params server=pgalog1 port=3121 \
meta target-role=Started
primitive pgalog2 ocf:pacemaker:remote \
params server=pgalog2 port=3121 \
meta target-role=Started
[...]
location l_pgc_resources { cl_pgcPgbouncer } resource-discovery=exclusive \
rule #uname eq pgalog1 \
rule #uname eq pgalog2

location l_pgs_resources { cl_pgsServices1 ms_pgsqln p_pgsBackupjob pgalog1
pgalog2 } resource-discovery=exclusive \
rule #uname eq pg1 \
rule #uname eq pg2 \
rule #uname eq pg3
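One guess on my side (an assumption, not something I have confirmed): since
clone-max defaults to the number of nodes in the cluster, the 5 nodes here
(3 cluster nodes + 2 remotes) would give cl_pgcPgbouncer 5 potential instances,
of which only 2 are placeable, leaving 3 to be reported as Stopped. If that is
the cause, capping the clone's instance count should make the extra lines
disappear, e.g.:

```shell
# Hypothetical tuning (untested here): limit the pgbouncer clone to two
# instances, one per remote node, so crm_mon has no leftover instances
# to report as Stopped.
crm configure clone cl_pgcPgbouncer p_pgcPgbouncer_test \
    meta clone-max=2 clone-node-max=1
```

I would be glad to hear whether this reasoning holds.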

[...]
property cib-bootstrap-options: \
symmetric-cluster=false \
[...]


Regards,
Martin Schlegel

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Removing node from pacemaker.

2016-03-04 Thread Andrei Maruha
I have tried it on my cluster; "crm node delete" just removes the node from
the CIB without updating corosync.conf.


After restarting the pacemaker service you will get something like this:
Online: [ node1 ]
OFFLINE: [ node2 ]


BTW, you will get the same state after a pacemaker restart if you remove a
node from corosync.conf and do not call "crm corosync reload".


On 03/04/2016 12:07 PM, Dejan Muhamedagic wrote:

Hi,

On Thu, Mar 03, 2016 at 03:20:56PM +0300, Andrei Maruha wrote:

Hi,
Usually I use the following steps to delete node from the cluster:
1. #crm corosync del-node 
2. #crm_node -R node --force
3. #crm corosync reload

I'd expect all this to be wrapped in "crm node delete". Isn't
that the case?

Also, is "corosync reload" really required after node removal?

Thanks,

Dejan


Instead of steps 1 and 2 you can delete the node from the corosync config
manually and run:
#corosync-cfgtool -R
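Putting the steps above together, the full removal sequence as I understand it
would look like this ("node2" is just a placeholder for the node being removed):

```shell
# 1. Remove the node from corosync.conf via crmsh
crm corosync del-node node2

# 2. Remove the node from Pacemaker's CIB
crm_node -R node2 --force

# 3. Tell the running corosync daemons to re-read corosync.conf
crm corosync reload        # equivalent to: corosync-cfgtool -R
```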

On 03/03/2016 02:44 PM, Somanath Jeeva wrote:

Hi,

I am trying to remove a node from the pacemaker/corosync cluster using the
command "crm_node -R dl360x4061 --force".

Though this command removes the node from the cluster, it appears as offline
after a pacemaker/corosync restart on the nodes that are online.

Is there any other command to completely delete the node from the
pacemaker/corosync cluster?

Pacemaker and Corosync versions:

PACEMAKER=1.1.10

COROSYNC=1.4.1

Regards

Somanath Thilak J




