Re: [ClusterLabs] fence_apc delay?

2016-09-06 Thread Ken Gaillot
On 09/06/2016 10:20 AM, Dan Swartzendruber wrote:
> On 2016-09-06 10:59, Ken Gaillot wrote:
> 
> [snip]
> 
>> I thought power-wait was intended for this situation, where the node's
>> power supply can survive a brief outage, so a delay is needed to ensure
>> it drains. In any case, I know people are using it for that.
>>
>> Are there any drawbacks to using power-wait for this purpose, even if
>> that wasn't its original intent? Is it just that the "on" will get the
>> delay as well?
> 
> I can't speak to the first part of your question, but for me the second
> part is a definite YES.  The issue is that I want a delay long enough
> to be sure the host is D E A D and no longer writing to the pool; but
> that delay is now doubled, and if it gets "too long", vSphere guests
> can start getting disk I/O errors...

Ah, Marek's suggestions are the best way out, then. Fence agents are
usually simple shell scripts, so adding a power-wait-off option
shouldn't be difficult.
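
For illustration, here is a minimal sketch of what such a workaround
could look like as a wrapper script (everything here is hypothetical:
the wrapper, the option name, and the delay value; the only assumption
taken from the agents themselves is that they accept key=value options
on stdin):

    #!/bin/sh
    # Hypothetical wrapper around fence_apc: add a settle delay after a
    # successful OFF only, so the ON half of a reboot is not slowed down.
    EXTRA_OFF_DELAY=7              # seconds; tune to your PSU drain time
    ARGS=$(cat)                    # fence agents read key=value pairs on stdin
    echo "$ARGS" | /usr/sbin/fence_apc
    RC=$?
    ACTION=$(echo "$ARGS" | sed -n 's/^action=//p')
    if [ "$RC" -eq 0 ] && [ "$ACTION" = "off" ]; then
        sleep "$EXTRA_OFF_DELAY"   # let the node's power supply drain
    fi
    exit $RC

A wrapper like this passes every other action straight through, so
power-wait keeps its stock meaning for the underlying agent.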

>>> *) Configure the fence device to not use reboot, but OFF then ON.
>>> This is the same as the situation where there are multiple power
>>> circuits; you have to switch them all OFF and afterwards turn them ON.
>>
>> FYI, no special configuration is needed for this with recent pacemaker
>> versions. If multiple devices are listed in a topology level, pacemaker
>> will automatically convert reboot requests into all-off-then-all-on.
> 
> My understanding was that this applies only to 1.1.14 and later?  My
> CentOS 7 host has pacemaker 1.1.13 :(

Correct -- but most OS distributions, including CentOS, backport
specific bugfixes and features from later versions. In this case, as
long as you've applied updates (pacemaker-1.1.13-10 or later), you've
got it.
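
To check, something along these lines (the -10 build is the version
mentioned above; exact release suffixes vary by rebuild):

    rpm -q pacemaker    # pacemaker-1.1.13-10 or later carries the backport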


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] fence_apc delay?

2016-09-06 Thread Ken Gaillot
On 09/05/2016 09:38 AM, Marek Grac wrote:
> Hi,
> 
> On Mon, Sep 5, 2016 at 3:46 PM, Dan Swartzendruber wrote:
> 
> ...
> Marek, thanks.  I have tested repeatedly (8 or so times, with disk
> writes in progress) with 5-7 seconds and have had no corruption.  My
> only issue with using power_wait here (possibly I am misunderstanding
> this) is that the default action is 'reboot', which I *think* is
> 'power off, then power on', i.e. two operations against the fencing
> device.  The only place I need a delay, though, is after the power-off
> operation - a delay after power-on is just wasted time during which
> the resource is offline before the other node takes it over.  Am I
> misunderstanding this?  Thanks!
> 
> 
> You are right. Default sequence for reboot is:
> 
> get status, power off, delay(power-wait), get status [repeat until OFF],
> power on, delay(power-wait), get status [repeat until ON].
> 
> The power-wait was introduced because some devices respond with strange
> values when they are asked too soon after a power change. It was not
> intended to be used in the way you propose. Possible solutions:

I thought power-wait was intended for this situation, where the node's
power supply can survive a brief outage, so a delay is needed to ensure
it drains. In any case, I know people are using it for that.

Are there any drawbacks to using power-wait for this purpose, even if
that wasn't its original intent? Is it just that the "on" will get the
delay as well?

> *) Configure the fence device to not use reboot, but OFF then ON.
> This is the same as the situation where there are multiple power
> circuits; you have to switch them all OFF and afterwards turn them ON.

FYI, no special configuration is needed for this with recent pacemaker
versions. If multiple devices are listed in a topology level, pacemaker
will automatically convert reboot requests into all-off-then-all-on.
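
As a sketch, with two hypothetical PDU devices already configured as
stonith resources (names invented for illustration), one topology level
covering both looks like:

    # both outlets in level 1 for node1; a reboot request then becomes
    # apc-pdu-a off, apc-pdu-b off, then apc-pdu-a on, apc-pdu-b on
    pcs stonith level add 1 node1 apc-pdu-a,apc-pdu-b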

> *) Add a new option, power-wait-off, that will be used only in the OFF
> case (and will override power-wait). It should be quite easy to do.
> Just send us a PR.
> 
> m,



Re: [ClusterLabs] What cib_stats line means in logfile

2016-09-06 Thread Ken Gaillot
On 09/05/2016 03:59 PM, Jan Pokorný wrote:
> On 05/09/16 21:26 +0200, Jan Pokorný wrote:
>> On 25/08/16 17:55 +0200, Sébastien Emeriau wrote:
>>> When I check my corosync.log I see this line:
>>>
>>> info: cib_stats: Processed 1 operations (1.00us average, 0%
>>> utilization) in the last 10min
>>>
>>> What does it mean (CPU load or just information)?
>>
>> These are just periodically emitted diagnostic summaries (every 10
>> minutes by default, and only if any operations were observed at all)
>> that were once considered useful; this was later reconsidered, leading
>> to their complete removal:
>>
>> https://github.com/ClusterLabs/pacemaker/commit/73e8c89#diff-37b681fa792dfc09ec67bb0d64eb55feL306
>>
>> Honestly, using a Pacemaker as old as 1.1.8 (released 4 years ago)
> 
> actually, it must have been even older than that (I'm afraid to ask).
> 
>> would be a bigger concern for me.  Plenty of important fixes
>> (as well as enhancements) have been added since then...
> 
> P.S. I checked my mailbox, which aggregates plentiful sources such as
> this list and various GitHub notifications, and found one other trace
> of such an outdated version this year, plus two more last year(!).

My guess is Debian -- the pacemaker package stagnated in Debian for a
long time, so the stock Debian packages were at 1.1.7 as late as wheezy
(initially released in 2013, but it's LTS until 2018). Then, pacemaker
was dropped entirely from jessie.

Recent versions are once again actively maintained in Debian
backports/unstable, so the situation should improve from here on out,
but I bet a lot of Debian boxes still run wheezy or earlier.




[ClusterLabs] clustering fiber channel pools on multiple wwns

2016-09-06 Thread Gabriele Bulfon
Hi,
on illumos, I have a way to cluster one ZFS pool across two nodes, by
moving the IP, the pool, and its shares to the other node all at once.
This works for iSCSI too: the target's IP is migrated together with the
pool, so the iSCSI resource is still there, running on the same IP
(just on a different node).
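
(For reference, the "move everything at once" pattern described above
maps onto a resource group in Pacemaker terms. A hypothetical sketch
only -- Gabriele's stack is illumos-based and unspecified, and all
names, addresses and agents below are assumptions, including the
availability of the ZFS and iSCSITarget resource agents:

    pcs resource create tank-ip ocf:heartbeat:IPaddr2 ip=192.168.1.50 cidr_netmask=24
    pcs resource create tank-pool ocf:heartbeat:ZFS pool=tank
    pcs resource create tank-tgt ocf:heartbeat:iSCSITarget iqn=iqn.2016-09.com.example:tank
    pcs resource group add tank-group tank-pool tank-ip tank-tgt

A group starts its members in order and moves them between nodes as one
unit, which is what makes the iSCSI case work.)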
Now I am thinking of doing the same with Fibre Channel: two nodes, each
with its own QLogic FC card connected to an FC switch, and VMware
clients connected to the same switch through their own FC cards.
I can't see how I can do this with FC, because with iSCSI I can migrate
the hosting IP, but with FC I can't migrate the hosting WWN!
What I need is to tell VMware that the target volume may be reachable
on two different WWNs, so that a failing WWN triggers a retry on the
other WWN: the pool and shared volumes will be moving from one WWN to
the other.
Am I dreaming??
Gabriele

Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon


Re: [ClusterLabs] unable to add removed node to cluster

2016-09-06 Thread Tomas Jelinek

Hi,

On 6.9.2016 at 08:12, Omar Jaber wrote:

> Hi,
>
> I created a cluster containing three nodes. Then I removed one of the
> nodes by running the "pcs cluster destroy" command.


This is the root cause of your problem.  "pcs cluster destroy" only
wipes out the cluster configuration from a node; it does not tell the
rest of the cluster that the node was removed.  Use "pcs cluster node
remove" to remove a node from a cluster.




> The node was stopped from the cluster, but when I try to rejoin the
> node by running these commands:
>
> 1- systemctl start pcsd.service
> 2- systemctl start pcsd.service
> (on the removed node)
>
> 3- pcs cluster auth
> 4- pcs cluster node add
> (on a node in the existing cluster)
>
> the output from the last command is:
>
> Error: unable to add hostname1 on hostname2 - Error connecting to
> hostname2 - (HTTP error: 400)
> Error: unable to add hostname1 on hostname3 - Error connecting to
> hostname3 - (HTTP error: 400)
> Error: unable to add hostname1 on hostname1 - Error connecting to
> hostname2 - (HTTP error: 400)
> Error: Unable to update any nodes


This most probably fails because the node you want to add is still
present in the cluster configuration on the remaining nodes.  You can
get detailed info by running "pcs cluster node add  --debug".

Now, to fix that, run "pcs cluster localnode remove " on the
two remaining nodes.  Then you can add the removed node back to the
cluster.
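
Put together, the recovery could look like this (node names are
hypothetical; node1 is the removed node, node2 and node3 remain):

    # on node2 and on node3: drop the stale entry for node1
    pcs cluster localnode remove node1
    # then, from node2 (or node3): authenticate node1 and add it back
    pcs cluster auth node1
    pcs cluster node add node1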



Regards,
Tomas

> Any idea what is the problem?
>
> Thanks
>
> Omar Jaber