Re: [ClusterLabs] How to force remove a cluster node?

2017-04-19 Thread Scott Greenlese
Tomas,

Yes, I have an IBM internal build we're using for KVM on System Z. I
tried the --force option and, while it didn't complain about the flag
itself, it didn't work either (as expected, per bug
https://bugzilla.redhat.com/show_bug.cgi?id=1225423), so
maybe it is accepted as a valid option but just not honored in this build.

[root@zs95kj VD]# date; pcs cluster node remove zs95KLpcs1 --force
Wed Apr 19 10:14:10 EDT 2017
Error: pcsd is not running on zs95KLpcs1
[root@zs95kj VD]#

Hopefully we'll roll in pcs-0.9.143-15.el7_2.1 with our next release of
KVM.

In the meantime, thanks very much for all the valuable feedback.  I'm good
to go for now with the workaround.

Scott

Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie,
N.Y.
  INTERNET:  swgre...@us.ibm.com




From:   Tomas Jelinek <tojel...@redhat.com>
To: users@clusterlabs.org
Date:   04/19/2017 03:25 AM
Subject:    Re: [ClusterLabs] How to force remove a cluster node?



On 18.4.2017 at 19:52, Scott Greenlese wrote:
> My thanks to both Ken Gaillot and Tomas Jelinek for the workaround. The
> procedure(s) worked like a champ.
>
> I just have a few side comments / observations ...
>
> First - Tomas, in the bugzilla you show this error message on your
> cluster remove command, directing you to use the --force option:
>
> [root@rh72-node1:~]# pcs cluster node remove rh72-node3
> Error: pcsd is not running on rh72-node3, use --force to override
>
> When I issue the cluster remove, I do not get any reference to the
> --force option in the error message:
>
> [root@zs93kl ]# pcs cluster node remove zs95KLpcs1
> Error: pcsd is not running on zs95KLpcs1
> [root@zs93kl ]#
>
> The man page doesn't mention --force at my level.

The man page doesn't mention --force for most commands in which --force
can be used. One shouldn't really make any conclusions from that.

> Is this a feature added after pcs-0.9.143-15.el7_2.ibm.2.s390x ?

The feature has been backported to pcs-0.9.143-15.el7_2.1. I cannot
really check if it is present in pcs-0.9.143-15.el7_2.ibm.2.s390x
because I don't have access to that particular build. Based on the name
I would say it was build internally at IBM. However, if the error
message doesn't suggest using --force, then the feature is most likely
not present in that build.

>
> Also, in your workaround procedure, you have me do: 'pcs cluster
> localnode remove <node>'.
> However, I'm wondering why the 'localnode' option is not in the pcs man
> page for the pcs cluster command?
> The command / option worked great, just curious why it's not
> documented ...

It's an internal pcs command which is not meant to be run by users. It
exists mostly for the sake of the current pcs/pcsd architecture ("pcs
cluster node" calls pcsd instance on all nodes over network and pcsd
instance runs "pcs cluster localnode" to do the actual job) and is
likely to be removed in the future. It is useful for the workaround as
the check whether all nodes are running is done in the "pcs cluster
node" command.

Regards,
Tomas

>
> [root@zs93kl ]# pcs cluster localnode remove zs93kjpcs1
> zs93kjpcs1: successfully removed!
>
> My man page level:
>
> [root@zs93kl VD]# rpm -q --whatprovides /usr/share/man/man8/pcs.8.gz
> pcs-0.9.143-15.el7_2.ibm.2.s390x
> [root@zs93kl VD]#
>
> Thanks again,
>
> Scott G.
>
> Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.
> INTERNET: swgre...@us.ibm.com
>
>
>
> From: Tomas Jelinek <tojel...@redhat.com>
> To: users@clusterlabs.org
> Date: 04/18/2017 09:04 AM
> Subject: Re: [ClusterLabs] How to force remove a cluster node?
>
> 
>
>
>
> On 17.4.2017 at 17:28, Ken Gaillot wrote:
>> On 04/13/2017 01:11 PM, Scott Greenlese wrote:
>>> Hi,
>>>
>>> I need to remove some nodes from my existing pacemaker cluster which are
>>> currently unbootable / unreachable.
>>>
>>> Referenced
>>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusternodemanage-HAAR.html#s2-noderemove-HAAR
>>>
>>> *4.4.4. Removing Cluster Nodes*
>>> The following command shuts down the specified node and removes it from
>>> the cluster configuration file, corosync.conf, on all of the other nodes
>>> in the cluster. For information on removing all information about the
>>> cluster from the cluster nodes entirely, thereby destroying the cluster
>>> permanently, refer to _Section 4.6, “Removing the Cluster Configuration”_.

Re: [ClusterLabs] How to force remove a cluster node?

2017-04-19 Thread Tomas Jelinek

On 18.4.2017 at 19:52, Scott Greenlese wrote:

My thanks to both Ken Gaillot and Tomas Jelinek for the workaround. The
procedure(s) worked like a champ.

I just have a few side comments / observations ...

First - Tomas, in the bugzilla you show this error message on your
cluster remove command, directing you to use the --force option:

[root@rh72-node1:~]# pcs cluster node remove rh72-node3
Error: pcsd is not running on rh72-node3, use --force to override

When I issue the cluster remove, I do not get any reference to the
--force option in the error message:

[root@zs93kl ]# pcs cluster node remove zs95KLpcs1
Error: pcsd is not running on zs95KLpcs1
[root@zs93kl ]#

The man page doesn't mention --force at my level.


The man page doesn't mention --force for most commands in which --force 
can be used. One shouldn't really make any conclusions from that.



Is this a feature added after pcs-0.9.143-15.el7_2.ibm.2.s390x ?


The feature has been backported to pcs-0.9.143-15.el7_2.1. I cannot 
really check if it is present in pcs-0.9.143-15.el7_2.ibm.2.s390x 
because I don't have access to that particular build. Based on the name 
I would say it was build internally at IBM. However, if the error 
message doesn't suggest using --force, then the feature is most likely 
not present in that build.




Also, in your workaround procedure, you have me do: 'pcs cluster
localnode remove <node>'.
However, I'm wondering why the 'localnode' option is not in the pcs man page
for the pcs cluster command?
The command / option worked great, just curious why it's not documented ...


It's an internal pcs command which is not meant to be run by users. It 
exists mostly for the sake of the current pcs/pcsd architecture ("pcs 
cluster node" calls pcsd instance on all nodes over network and pcsd 
instance runs "pcs cluster localnode" to do the actual job) and is 
likely to be removed in the future. It is useful for the workaround as 
the check whether all nodes are running is done in the "pcs cluster 
node" command.


Regards,
Tomas



[root@zs93kl ]# pcs cluster localnode remove zs93kjpcs1
zs93kjpcs1: successfully removed!

My man page level:

[root@zs93kl VD]# rpm -q --whatprovides /usr/share/man/man8/pcs.8.gz
pcs-0.9.143-15.el7_2.ibm.2.s390x
[root@zs93kl VD]#

Thanks again,

Scott G.

Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.
INTERNET: swgre...@us.ibm.com




From: Tomas Jelinek <tojel...@redhat.com>
To: users@clusterlabs.org
Date: 04/18/2017 09:04 AM
Subject: Re: [ClusterLabs] How to force remove a cluster node?





On 17.4.2017 at 17:28, Ken Gaillot wrote:

On 04/13/2017 01:11 PM, Scott Greenlese wrote:

Hi,

I need to remove some nodes from my existing pacemaker cluster which are
currently unbootable / unreachable.

Referenced


https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusternodemanage-HAAR.html#s2-noderemove-HAAR


*4.4.4. Removing Cluster Nodes*
The following command shuts down the specified node and removes it from
the cluster configuration file, corosync.conf, on all of the other nodes
in the cluster. For information on removing all information about the
cluster from the cluster nodes entirely, thereby destroying the cluster
permanently, refer to _Section 4.6, “Removing the Cluster
Configuration”_


<https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusterremove-HAAR.html#s2-noderemove-HAAR>.


pcs cluster node remove /node/

I ran the command with the cluster active on 3 of the 5 available
cluster nodes (with quorum). The command fails with:

[root@zs90KP VD]# date;*pcs cluster node remove zs93kjpcs1*
Thu Apr 13 13:40:59 EDT 2017
*Error: pcsd is not running on zs93kjpcs1*


The node was not removed:

[root@zs90KP VD]# pcs status |less
Cluster name: test_cluster_2
Last updated: Thu Apr 13 14:08:15 2017 Last change: Wed Apr 12 16:40:26
2017 by root via cibadmin on zs93KLpcs1
Stack: corosync
Current DC: zs90kppcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) -
partition with quorum
45 nodes and 180 resources configured

Node zs95KLpcs1: UNCLEAN (offline)
Online: [ zs90kppcs1 zs93KLpcs1 zs95kjpcs1 ]
*OFFLINE: [ zs93kjpcs1 ]*


Is there a way to force remove a node that's no longer bootable? If not,
what's the procedure for removing a rogue cluster node?

Thank you...

Scott Greenlese ... KVM on System Z - Solutions Test, IBM

Poughkeepsie, N.Y.

INTERNET: swgre...@us.ibm.com


Yes, the pcs command is just a convenient shorthand for a series of
commands. You want to ensure pacemaker and corosync are stopped on the
node to be removed (in the general case, obviously already done in this
case), remove the node from corosync.conf and restart corosync on all
other nodes, then run "crm_node -R <node>" on any one active node.

Re: [ClusterLabs] How to force remove a cluster node?

2017-04-18 Thread Scott Greenlese
My thanks to both Ken Gaillot and Tomas Jelinek for the workaround.   The
procedure(s) worked like a champ.

I just have a few side comments / observations ...

First - Tomas,  in the bugzilla you show this error message on your cluster
remove command, directing you to use the --force option:

[root@rh72-node1:~]# pcs cluster node remove rh72-node3
Error: pcsd is not running on rh72-node3, use --force to override

When I issue the cluster remove, I do not get any reference to the --force
option in the error message:

[root@zs93kl ]# pcs cluster node remove  zs95KLpcs1
Error: pcsd is not running on zs95KLpcs1
[root@zs93kl ]#

The man page doesn't mention --force at my level.  Is this a feature added
after pcs-0.9.143-15.el7_2.ibm.2.s390x ?

Also, in your workaround procedure, you have me do: 'pcs cluster localnode
remove <node>'.
However, I'm wondering why the 'localnode' option is not in the pcs man page
for the pcs cluster command?
The command / option worked great, just curious why it's not documented ...

[root@zs93kl ]# pcs cluster localnode remove zs93kjpcs1
zs93kjpcs1: successfully removed!

My man page level:

[root@zs93kl VD]# rpm -q --whatprovides /usr/share/man/man8/pcs.8.gz
pcs-0.9.143-15.el7_2.ibm.2.s390x
[root@zs93kl VD]#

Thanks again,

Scott G.

Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie,
N.Y.
  INTERNET:  swgre...@us.ibm.com




From:   Tomas Jelinek <tojel...@redhat.com>
To: users@clusterlabs.org
Date:   04/18/2017 09:04 AM
Subject:    Re: [ClusterLabs] How to force remove a cluster node?



On 17.4.2017 at 17:28, Ken Gaillot wrote:
> On 04/13/2017 01:11 PM, Scott Greenlese wrote:
>> Hi,
>>
>> I need to remove some nodes from my existing pacemaker cluster which are
>> currently unbootable / unreachable.
>>
>> Referenced
>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusternodemanage-HAAR.html#s2-noderemove-HAAR
>>
>> *4.4.4. Removing Cluster Nodes*
>> The following command shuts down the specified node and removes it from
>> the cluster configuration file, corosync.conf, on all of the other nodes
>> in the cluster. For information on removing all information about the
>> cluster from the cluster nodes entirely, thereby destroying the cluster
>> permanently, refer to _Section 4.6, “Removing the Cluster
>> Configuration”_
>> <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusterremove-HAAR.html#s2-noderemove-HAAR>.
>>
>> pcs cluster node remove /node/
>>
>> I ran the command with the cluster active on 3 of the 5 available
>> cluster nodes (with quorum). The command fails with:
>>
>> [root@zs90KP VD]# date;*pcs cluster node remove zs93kjpcs1*
>> Thu Apr 13 13:40:59 EDT 2017
>> *Error: pcsd is not running on zs93kjpcs1*
>>
>>
>> The node was not removed:
>>
>> [root@zs90KP VD]# pcs status |less
>> Cluster name: test_cluster_2
>> Last updated: Thu Apr 13 14:08:15 2017 Last change: Wed Apr 12 16:40:26
>> 2017 by root via cibadmin on zs93KLpcs1
>> Stack: corosync
>> Current DC: zs90kppcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) -
>> partition with quorum
>> 45 nodes and 180 resources configured
>>
>> Node zs95KLpcs1: UNCLEAN (offline)
>> Online: [ zs90kppcs1 zs93KLpcs1 zs95kjpcs1 ]
>> *OFFLINE: [ zs93kjpcs1 ]*
>>
>>
>> Is there a way to force remove a node that's no longer bootable? If not,
>> what's the procedure for removing a rogue cluster node?
>>
>> Thank you...
>>
>> Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.
>> INTERNET: swgre...@us.ibm.com
>
> Yes, the pcs command is just a convenient shorthand for a series of
> commands. You want to ensure pacemaker and corosync are stopped on the
> node to be removed (in the general case, obviously already done in this
> case), remove the node from corosync.conf and restart corosync on all
> other nodes, then run "crm_node -R <node>" on any one active node.

Hi Scott,

It is possible to remove an offline node from a cluster with upstream
pcs 0.9.154 or RHEL pcs-0.9.152-5 (available in RHEL7.3) or newer.

If you have an older version, here's a workaround:
1. run 'pcs cluster localnode remove <node>' on all remaining nodes
2. run 'pcs cluster reload corosync' on one node
3. run 'crm_node -R <node> --force' on one node
It's basically the same procedure Ken described.

See https://bugzilla.redhat.com/show_bug.cgi?id=1225423 for more details.

Regards,
Tomas

Re: [ClusterLabs] How to force remove a cluster node?

2017-04-18 Thread Tomas Jelinek

On 17.4.2017 at 17:28, Ken Gaillot wrote:

On 04/13/2017 01:11 PM, Scott Greenlese wrote:

Hi,

I need to remove some nodes from my existing pacemaker cluster which are
currently unbootable / unreachable.

Referenced
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusternodemanage-HAAR.html#s2-noderemove-HAAR

*4.4.4. Removing Cluster Nodes*
The following command shuts down the specified node and removes it from
the cluster configuration file, corosync.conf, on all of the other nodes
in the cluster. For information on removing all information about the
cluster from the cluster nodes entirely, thereby destroying the cluster
permanently, refer to _Section 4.6, “Removing the Cluster
Configuration”_
<https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusterremove-HAAR.html#s2-noderemove-HAAR>.

pcs cluster node remove /node/

I ran the command with the cluster active on 3 of the 5 available
cluster nodes (with quorum). The command fails with:

[root@zs90KP VD]# date;*pcs cluster node remove zs93kjpcs1*
Thu Apr 13 13:40:59 EDT 2017
*Error: pcsd is not running on zs93kjpcs1*


The node was not removed:

[root@zs90KP VD]# pcs status |less
Cluster name: test_cluster_2
Last updated: Thu Apr 13 14:08:15 2017 Last change: Wed Apr 12 16:40:26
2017 by root via cibadmin on zs93KLpcs1
Stack: corosync
Current DC: zs90kppcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) -
partition with quorum
45 nodes and 180 resources configured

Node zs95KLpcs1: UNCLEAN (offline)
Online: [ zs90kppcs1 zs93KLpcs1 zs95kjpcs1 ]
*OFFLINE: [ zs93kjpcs1 ]*


Is there a way to force remove a node that's no longer bootable? If not,
what's the procedure for removing a rogue cluster node?

Thank you...

Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.
INTERNET: swgre...@us.ibm.com


Yes, the pcs command is just a convenient shorthand for a series of
commands. You want to ensure pacemaker and corosync are stopped on the
node to be removed (in the general case, obviously already done in this
case), remove the node from corosync.conf and restart corosync on all
other nodes, then run "crm_node -R <node>" on any one active node.


Hi Scott,

It is possible to remove an offline node from a cluster with upstream 
pcs 0.9.154 or RHEL pcs-0.9.152-5 (available in RHEL7.3) or newer.


If you have an older version, here's a workaround (sketched below):
1. run 'pcs cluster localnode remove <node>' on all remaining nodes
2. run 'pcs cluster reload corosync' on one node
3. run 'crm_node -R <node> --force' on one node
It's basically the same procedure Ken described.

See https://bugzilla.redhat.com/show_bug.cgi?id=1225423 for more details.
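
A minimal sketch of the three steps, assuming the dead node is zs93kjpcs1,
the remaining nodes are the three online ones named in this thread, and
root ssh access between them:

    # step 1: on all remaining nodes, drop the dead node from corosync.conf
    for n in zs90kppcs1 zs93KLpcs1 zs95kjpcs1; do
        ssh root@"$n" pcs cluster localnode remove zs93kjpcs1
    done

    # step 2: on one node, make corosync re-read the updated membership
    pcs cluster reload corosync

    # step 3: on one node, tell pacemaker to forget the node
    crm_node -R zs93kjpcs1 --force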

Regards,
Tomas



Re: [ClusterLabs] How to force remove a cluster node?

2017-04-17 Thread Ken Gaillot
On 04/13/2017 01:11 PM, Scott Greenlese wrote:
> Hi,
> 
> I need to remove some nodes from my existing pacemaker cluster which are
> currently unbootable / unreachable.
> 
> Referenced
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusternodemanage-HAAR.html#s2-noderemove-HAAR
> 
> *4.4.4. Removing Cluster Nodes*
> The following command shuts down the specified node and removes it from
> the cluster configuration file, corosync.conf, on all of the other nodes
> in the cluster. For information on removing all information about the
> cluster from the cluster nodes entirely, thereby destroying the cluster
> permanently, refer to _Section 4.6, “Removing the Cluster
> Configuration”_
> <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusterremove-HAAR.html#s2-noderemove-HAAR>.
> 
> pcs cluster node remove /node/
> 
> I ran the command with the cluster active on 3 of the 5 available
> cluster nodes (with quorum). The command fails with:
> 
> [root@zs90KP VD]# date;*pcs cluster node remove zs93kjpcs1*
> Thu Apr 13 13:40:59 EDT 2017
> *Error: pcsd is not running on zs93kjpcs1*
> 
> 
> The node was not removed:
> 
> [root@zs90KP VD]# pcs status |less
> Cluster name: test_cluster_2
> Last updated: Thu Apr 13 14:08:15 2017 Last change: Wed Apr 12 16:40:26
> 2017 by root via cibadmin on zs93KLpcs1
> Stack: corosync
> Current DC: zs90kppcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) -
> partition with quorum
> 45 nodes and 180 resources configured
> 
> Node zs95KLpcs1: UNCLEAN (offline)
> Online: [ zs90kppcs1 zs93KLpcs1 zs95kjpcs1 ]
> *OFFLINE: [ zs93kjpcs1 ]*
> 
> 
> Is there a way to force remove a node that's no longer bootable? If not,
> what's the procedure for removing a rogue cluster node?
> 
> Thank you...
> 
> Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.
> INTERNET: swgre...@us.ibm.com

Yes, the pcs command is just a convenient shorthand for a series of
commands. You want to ensure pacemaker and corosync are stopped on the
node to be removed (in the general case, obviously already done in this
case), remove the node from corosync.conf and restart corosync on all
other nodes, then run "crm_node -R <node>" on any one active node.
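
A minimal sketch of those manual steps (the corosync.conf edit is safest
done by hand; zs93kjpcs1 stands in for the node being removed):

    # on EVERY remaining node: delete the dead node's "node { ... }" stanza
    # from the nodelist in the config, then restart corosync
    vi /etc/corosync/corosync.conf
    systemctl restart corosync

    # on any ONE active node: remove the node from pacemaker's view
    crm_node -R zs93kjpcs1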




[ClusterLabs] How to force remove a cluster node?

2017-04-13 Thread Scott Greenlese

Hi,

I need to remove some nodes from my existing pacemaker cluster which are
currently unbootable / unreachable.

Referenced
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusternodemanage-HAAR.html#s2-noderemove-HAAR

4.4.4. Removing Cluster Nodes
The following command shuts down the specified node and removes it from the
cluster configuration file, corosync.conf, on all of the other nodes in the
cluster. For information on removing all information about the cluster from
the cluster nodes entirely, thereby destroying the cluster permanently,
refer to Section 4.6, “Removing the Cluster Configuration”.
pcs cluster node remove node

I ran the command with the cluster active on 3 of the 5 available cluster
nodes (with quorum).  The command fails with:

[root@zs90KP VD]# date;pcs cluster node remove zs93kjpcs1
Thu Apr 13 13:40:59 EDT 2017
Error: pcsd is not running on zs93kjpcs1


The node was not removed:

[root@zs90KP VD]# pcs status |less
Cluster name: test_cluster_2
Last updated: Thu Apr 13 14:08:15 2017  Last change: Wed Apr 12
16:40:26 2017 by root via cibadmin on zs93KLpcs1
Stack: corosync
Current DC: zs90kppcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) - partition
with quorum
45 nodes and 180 resources configured

Node zs95KLpcs1: UNCLEAN (offline)
Online: [ zs90kppcs1 zs93KLpcs1 zs95kjpcs1 ]
OFFLINE: [ zs93kjpcs1 ]


Is there a way to force remove a node that's no longer bootable? If not,
what's the procedure for removing a rogue cluster node?

Thank you...

Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie,
N.Y.
  INTERNET:  swgre...@us.ibm.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org