Re: [ClusterLabs] Pacemake/Corosync good fit for embedded product?

2018-04-12 Thread Klaus Wenninger
our services that require a much > faster switch over time it would appear we need something propriety. > > Regards > David > > On 12 April 2018 at 02:56, Klaus Wenninger <kwenn...@redhat.com > <mailto:kwenn...@redhat.com>> wrote: > > On 04/11/2018 10:44

Re: [ClusterLabs] Pacemake/Corosync good fit for embedded product?

2018-04-11 Thread Klaus Wenninger
On 04/11/2018 10:44 AM, Jan Friesse wrote: > David, > >> Hi, >> >> We are planning on creating a HA product in an active/standby >> configuration >> whereby the standby unit needs to take over from the active unit very >> fast >> (<50ms including all services restored). >> >> We are able to do

Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-11 Thread Klaus Wenninger
On 04/11/2018 01:14 AM, Andrew Beekhof wrote: > > > On Wed, Apr 11, 2018 at 12:42 AM, Ken Gaillot > wrote: > > On Tue, 2018-04-10 at 08:50 +0200, Jehan-Guillaume de Rorthais wrote: > > On Tue, 10 Apr 2018 00:54:01 +0200 > > Jan Pokorný

Re: [ClusterLabs] Pacemaker's additional services for distributed applications (Was: Possible idea for 2.0.0: renaming the Pacemaker daemons)

2018-04-10 Thread Klaus Wenninger
On 04/10/2018 11:27 AM, Jan Pokorný wrote: > On 06/04/18 12:24 +0200, Jan Pokorný wrote: >> On 06/04/18 09:09 +0200, Kristoffer Grönlund wrote: > The idea is to provide a more generalized key-value store that > other applications built on top of pacemaker can use. Something > like a

Re: [ClusterLabs] How to cancel a fencing request?

2018-04-10 Thread Klaus Wenninger
Ken Gaillot <kgail...@redhat.com> wrote: >>> >>>> On Tue, 2018-04-03 at 21:46 +0200, Klaus Wenninger wrote: >>>>> On 04/03/2018 05:43 PM, Ken Gaillot wrote:   >>>>>> On Tue, 2018-04-03 at 07:36 +0200, Klaus Wenninger wrote:   >

Re: [ClusterLabs] How to cancel a fencing request?

2018-04-05 Thread Klaus Wenninger
On 04/05/2018 06:45 AM, Andrei Borzenkov wrote: > 04.04.2018 01:35, Ken Gaillot wrote: >> On Tue, 2018-04-03 at 21:46 +0200, Klaus Wenninger wrote: > ... >>>>> -inf constraints like that should effectively prevent >>>>> stonith-actions from being

Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-03 Thread Klaus Wenninger
On 04/03/2018 11:35 PM, Ken Gaillot wrote: > On Tue, 2018-04-03 at 08:33 +0200, Kristoffer Grönlund wrote: >> Ken Gaillot writes: >> I would vote against PREFIX-configd as compared to other cluster software, I would expect that daemon name to refer to a

Re: [ClusterLabs] How to cancel a fencing request?

2018-04-03 Thread Klaus Wenninger
On 04/03/2018 05:43 PM, Ken Gaillot wrote: > On Tue, 2018-04-03 at 07:36 +0200, Klaus Wenninger wrote: >> On 04/02/2018 04:02 PM, Ken Gaillot wrote: >>> On Mon, 2018-04-02 at 10:54 +0200, Jehan-Guillaume de Rorthais >>> wrote: >>>> On Sun, 1 Apr 2018 09:01

Re: [ClusterLabs] Announcing the first ClusterLabs video karaoke contest!

2018-04-03 Thread Klaus Wenninger
On 04/03/2018 09:52 AM, Christine Caulfield wrote: > On 03/04/18 07:14, Klaus Wenninger wrote: >> On 04/02/2018 02:57 AM, Digimer wrote: >>> On 2018-04-01 05:30 PM, Ken Gaillot wrote: >>>> In honor of the recent 10th anniversary of the first public release of >

Re: [ClusterLabs] Announcing the first ClusterLabs video karaoke contest!

2018-04-03 Thread Klaus Wenninger
On 04/02/2018 02:57 AM, Digimer wrote: > On 2018-04-01 05:30 PM, Ken Gaillot wrote: >> In honor of the recent 10th anniversary of the first public release of >> Pacemaker, ClusterLabs is proud to announce its first video karaoke >> contest! >> >> To participate, simply record video of yourself

Re: [ClusterLabs] How to cancel a fencing request?

2018-04-02 Thread Klaus Wenninger
On 04/02/2018 04:02 PM, Ken Gaillot wrote: > On Mon, 2018-04-02 at 10:54 +0200, Jehan-Guillaume de Rorthais wrote: >> On Sun, 1 Apr 2018 09:01:15 +0300 >> Andrei Borzenkov wrote: >> >>> 31.03.2018 23:29, Jehan-Guillaume de Rorthais wrote: Hi all, I experienced

Re: [ClusterLabs] Antw: Re: single node fails to start the ocfs2 resource

2018-03-14 Thread Klaus Wenninger
> cluster and no downtime would occur. I had meant (a little bit provocative ;-) ) consider if you need the resources to be started via a resource-manager at all. Klaus > > -- > Regards, > Muhammad Sharfuddin > > On 3/13/2018 11:20 PM, Andrei Borzenkov wrote: >> 13.03.2018 17:

Re: [ClusterLabs] Antw: Re: single node fails to start the ocfs2 resource

2018-03-13 Thread Klaus Wenninger
. Yes I know. And what I tried to point out is that "no-quorum-policy=ignore" is dangerous for services that do require a resource-manager. If you don't have any of those go with a systemd startup. Regards, Klaus > > -- > Regards, > Muhammad Sharfuddin > > On 3/13/2018 7

Re: [ClusterLabs] Antw: Re: single node fails to start the ocfs2 resource

2018-03-13 Thread Klaus Wenninger
handle cases where one node doesn't come up after a full shutdown of all nodes, you probably could go for a setup with qdevice. Regards, Klaus > > -- > Regards, > Muhammad Sharfuddin > > On 3/13/2018 6:16 PM, Klaus Wenninger wrote: >> On 03/13/2018 02:03 PM, Muhammad

Re: [ClusterLabs] Antw: Re: single node fails to start the ocfs2 resource

2018-03-13 Thread Klaus Wenninger
rted every 5-10 minutes, till I >>> kill the mount.ocfs2 process: >>> >>>   dlm_controld[17655]: 62627 fence work wait for quorum >>>       dlm_controld[17655]: 62680 BFA9FF042AA045F4822C2A6A06020EE9 wait >>> for quorum >>> >>> I

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Klaus Wenninger
On 03/12/2018 04:17 PM, Valentin Vidic wrote: > On Mon, Mar 12, 2018 at 01:58:21PM +0100, Klaus Wenninger wrote: >> But isn't dlm directly interfering with corosync so >> that it would get the quorum state from there? >> As you have 2-node set probably on a 2-node-cluster

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Klaus Wenninger
s get fixed automatically, i.e ocfs2 > resources mounts. > > -- > Regards, > Muhammad Sharfuddin > > On 3/12/2018 5:25 PM, Klaus Wenninger wrote: >> Hi Muhammad! >> >> Could you be a little bit more elaborate on your fencing-setup! >> I read about you

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Klaus Wenninger
Hi Muhammad! Could you be a little bit more elaborate on your fencing-setup! I read about you using SBD but I don't see any sbd-fencing-resource. For the case you wanted to use watchdog-fencing with SBD this would require stonith-watchdog-timeout property to be set. But watchdog-fencing relies on

Re: [ClusterLabs] copy file

2018-03-07 Thread Klaus Wenninger
On 03/07/2018 10:03 AM, Mevo Govo wrote: > Thanks for advices, I will try! > lados. > > 2018-03-05 23:29 GMT+01:00 Ken Gaillot >: > > On Mon, 2018-03-05 at 15:09 +0100, Mevo Govo wrote: > > Hi, > > I am new in pacemaker. I think, I

Re: [ClusterLabs] watchdog module for sbd fencing on hyper-v

2018-02-26 Thread Klaus Wenninger
On 02/26/2018 07:11 PM, Muhammad Sharfuddin wrote: > On 2/26/2018 10:39 PM, Klaus Wenninger wrote: > >> On 02/26/2018 06:16 PM, Muhammad Sharfuddin wrote: >>> Linux kernel version 4.x >>> >>> OS: SLES 12 SP3. >>> >>> which watchdog module sh

Re: [ClusterLabs] watchdog module for sbd fencing on hyper-v

2018-02-26 Thread Klaus Wenninger
On 02/26/2018 06:16 PM, Muhammad Sharfuddin wrote: > Linux kernel version 4.x > > OS: SLES 12 SP3. > > which watchdog module should be use on virtual machines running on MS > hyper-v ? afaik there is no virtual watchdog device available with hyper-v. Thus the only possibility left for using sbd

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread Klaus Wenninger
On 02/22/2018 02:55 PM, j...@disroot.org wrote: > Hi, > > I am trying to configure the failure-timeout for stonith, but I only can do > it for the other resources. > When try to enable it for stonith, I get this error: "Error: resource > option(s): 'failure-timeout', are not recognized for

Re: [ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

2018-02-13 Thread Klaus Wenninger
On 02/13/2018 01:28 PM, Maxim wrote: > 13.02.2018 14:03, Klaus Wenninger wrote: >> - fencing helps you turning the  'maybe the node is down - it doesn't > > respond within x milli-seconds' into certainty that your node is dead > > and won't interfere with the rest of the cl

Re: [ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

2018-02-13 Thread Klaus Wenninger
On 02/13/2018 11:46 AM, Maxim wrote: > 12.02.2018 19:31, Digimer wrote: >> Without fencing, all bets are  off. Please enable it and see if the > > issue remains > Seems, i know [in theory] about the fencing ability and its importance > (although I've never configured it so far). > But i don't

Re: [ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

2018-02-12 Thread Klaus Wenninger
On 02/12/2018 04:34 PM, Maxim wrote: > 12.02.2018 16:15, Klaus Wenninger wrote: >> On 02/12/2018 01:02 PM, Maxim  wrote: > > fencing-disabled is probably due to it being a test-setup ... RHEL 6 > > pcs being made for configuring a cman-pacemaker-setup I'm not sure if >

Re: [ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

2018-02-12 Thread Klaus Wenninger
On 02/12/2018 01:02 PM, Maxim wrote: > Hello, > > [Sorry for a message duplication. Web mail client ruined the > formatting of the previous e-mail =( ] > > There is a simple configuration of two cluster nodes (built via RHEL 6 > pcs interface) with multiple master/slave resources, disabled fencing

Re: [ClusterLabs] How to create the stonith resource in virtualbox

2018-02-08 Thread Klaus Wenninger
On 02/08/2018 02:05 PM, Andrei Borzenkov wrote: > On Thu, Feb 8, 2018 at 5:51 AM, 范国腾 wrote: >> Hello, >> >> I setup the pacemaker cluster using virtualbox. There are three nodes. The >> OS is centos7, the /dev/sdb is the shared storage(three nodes use the same >> disk

Re: [ClusterLabs] One volume is trimmable but the other is not?

2018-01-29 Thread Klaus Wenninger
On 01/29/2018 07:51 PM, Eric Robinson wrote: > > > >> -Original Message- >> From: Klaus Wenninger [mailto:kwenn...@redhat.com] >> Sent: Friday, January 26, 2018 11:38 AM >> To: users@clusterlabs.org >> Subject: Re: [ClusterLabs] One volume is trimmab

Re: [ClusterLabs] One volume is trimmable but the other is not?

2018-01-26 Thread Klaus Wenninger
On 01/26/2018 07:45 PM, Eric Robinson wrote: >>> I sent this to the drbd list too, but it’s possible that someone here >>> may know. >>> >>> >>> >>> This is a WEIRD one. >>> >>> >>> >>> Why would one drbd volume be trimmable and the other one not? >>> >> iirc drbd stores some of the config in the

Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology

2018-01-26 Thread Klaus Wenninger
On 01/26/2018 04:37 PM, Ken Gaillot wrote: > On Fri, 2018-01-26 at 09:07 +0100, Jehan-Guillaume de Rorthais wrote: >> On Thu, 25 Jan 2018 15:21:30 -0500 >> Digimer wrote: >> >>> On 2018-01-25 01:28 PM, Ken Gaillot wrote: On Thu, 2018-01-25 at 13:06 -0500, Digimer wrote:  

Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology

2018-01-26 Thread Klaus Wenninger
On 01/25/2018 09:21 PM, Digimer wrote: > On 2018-01-25 01:28 PM, Ken Gaillot wrote: >> On Thu, 2018-01-25 at 13:06 -0500, Digimer wrote: >>> On 2018-01-25 11:11 AM, Ken Gaillot wrote: On Wed, 2018-01-24 at 20:58 +0100, Jehan-Guillaume de Rorthais wrote: > On Wed, 24 Jan 2018 13:28:03

Re: [ClusterLabs] One volume is trimmable but the other is not?

2018-01-26 Thread Klaus Wenninger
On 01/25/2018 11:45 PM, Eric Robinson wrote: > > I sent this to the drbd list too, but it’s possible that someone here > may know. > >   > > This is a WEIRD one. > >   > > Why would one drbd volume be trimmable and the other one not? > iirc drbd stores some of the config in the meta-data as well

Re: [ClusterLabs] Antw: Clone resource active only if all nodes are active

2018-01-22 Thread Klaus Wenninger
On 01/22/2018 02:55 PM, alu...@poczta.onet.pl wrote: > Unfortunately the clone-min meta attribute was only introduced in 1.1.14, but I am using > 1.1.12. > > Are there any other options to have such a type of resource? If not, I > have to update the pacemaker software. Another possibility that comes to mind would be 3 primitives

[ClusterLabs] Clusterstack Dependency on SBD with systemd

2018-01-22 Thread Klaus Wenninger
Hi! Wanted to pick up where we dropped the issue before Christmas. Observations as a result of a startup-timeout issue led to the awareness that the way we are starting SBD together with the rest of the Clusterstack doesn't handle SBD startup-issues properly. SBD is integrated in the

Re: [ClusterLabs] “pcs --debug” does not work

2018-01-22 Thread Klaus Wenninger
On 01/22/2018 01:06 PM, Tomas Jelinek wrote: > On 22.1.2018 at 11:01, Klaus Wenninger wrote: >> On 01/22/2018 10:46 AM, Tomas Jelinek wrote: >>> Hello, >>> >>> the --debug option is supposed to be used as an additional option for >>> a pcs com

Re: [ClusterLabs] “pcs --debug” does not work

2018-01-22 Thread Klaus Wenninger
On 01/22/2018 10:46 AM, Tomas Jelinek wrote: > Hello, > > the --debug option is supposed to be used as an additional option for > a pcs command, for example: > > pcs status --debug > pcs --debug resource create dummy ocf:pacemaker:Dummy > > It writes what a command is doing in the background -

Re: [ClusterLabs] Opinions wanted: another logfile question for Pacemaker 2.0

2018-01-15 Thread Klaus Wenninger
On 01/15/2018 06:08 PM, Klaus Wenninger wrote: > On 01/15/2018 05:51 PM, Ken Gaillot wrote: >> Currently, Pacemaker will use the same detail log as corosync if one is >> specified (as "logfile:" in the "logging {...}" section of >> corosync.conf). >>

Re: [ClusterLabs] Opinions wanted: another logfile question for Pacemaker 2.0

2018-01-15 Thread Klaus Wenninger
On 01/15/2018 05:51 PM, Ken Gaillot wrote: > Currently, Pacemaker will use the same detail log as corosync if one is > specified (as "logfile:" in the "logging {...}" section of > corosync.conf). > > The corosync developers think that is a bad idea, and would like > pacemaker 2.0 to always use its

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Changes coming in Pacemaker 2.0.0

2018-01-15 Thread Klaus Wenninger
On 01/15/2018 10:17 AM, Ulrich Windl wrote: > >>>> Klaus Wenninger <kwenn...@redhat.com> wrote on 15.01.2018 at 08:41 in > message <a5a3bfc9-ddfb-5cb6-797d-afb69e585...@redhat.com>: >> On 01/15/2018 08:33 AM, Ulrich Windl wrote: >>>>>> Ken

Re: [ClusterLabs] Antw: Re: Antw: Changes coming in Pacemaker 2.0.0

2018-01-14 Thread Klaus Wenninger
On 01/15/2018 08:33 AM, Ulrich Windl wrote: > Ken Gaillot wrote on 11.01.2018 at 17:37 in message > <1515688644.12807.1.ca...@redhat.com>: >> On Thu, 2018-01-11 at 08:54 +0100, Ulrich Windl wrote: >>> On "--crm_xml -> --xml-text": Why not simply "--xml" (XML IS

Re: [ClusterLabs] Wrong sbd.service dependencies

2017-12-17 Thread Klaus Wenninger
On 12/17/2017 06:10 PM, Andrei Borzenkov wrote: > 17.12.2017 15:20, Gao,Yan wrote: >> On 2017/12/16 16:59, Andrei Borzenkov wrote: >>> 04.12.2017 21:55, Andrei Borzenkov wrote: >>> ... >> I tried it (on openSUSE Tumbleweed which is what I have at hand, it >> has >> SBD 1.3.0) and with

Re: [ClusterLabs] How to make a node persistent active in pacemaker-corosync?

2017-12-11 Thread Klaus Wenninger
On 12/10/2017 12:10 PM, Ricardo Cristian Ramirez wrote: > Hi all, > > I have an active-passive pacemaker-corosync configuration. > > When a node is powered up before the other one, it becomes active, and > the node, which is powered up second, becomes passive. > > (For a node, being active means

Re: [ClusterLabs] Corosync quorum vs. pacemaker quorum confusion

2017-12-06 Thread Klaus Wenninger
On 12/06/2017 08:03 PM, Ken Gaillot wrote: > On Sun, 2017-12-03 at 14:03 +0300, Andrei Borzenkov wrote: >> I assumed that with corosync 2.x quorum is maintained by corosync and >> pacemaker simply gets yes/no. Apparently this is more complicated. > It shouldn't be, but everything in HA-land is

Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-04 Thread Klaus Wenninger
On 12/04/2017 04:02 PM, Kristoffer Grönlund wrote: > Tomas Jelinek writes: > >>> * how is it shutting down the cluster when issuing "pcs cluster stop --all"? >> First, it sends a request to each node to stop pacemaker. The requests >> are sent in parallel which prevents

Re: [ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-11-30 Thread Klaus Wenninger
On 11/30/2017 01:41 PM, Ulrich Windl wrote: > "Gao,Yan" wrote on 30.11.2017 at 11:48 in message > : >> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: >>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with

Re: [ClusterLabs] pacemaker self stonith

2017-11-30 Thread Klaus Wenninger
On 11/30/2017 11:41 AM, Hauke Homburg wrote: > Hello List, > > I am searching for a possibility for a pacemaker node to stonith itself. > > The reason is that I need to check whether the pacemaker node can still reach the > network outside the local network, because of a network outage. > > I can't connect to an ILO

Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Klaus Wenninger
On 11/29/2017 09:09 PM, Kristoffer Grönlund wrote: > Adam Spiers writes: > >> OK, so reading between the lines, if we don't want our cluster's >> latest config changes accidentally discarded during a complete cluster >> reboot, we should ensure that the last man standing is also

Re: [ClusterLabs] cluster with two ESX server

2017-11-29 Thread Klaus Wenninger
On 11/29/2017 08:24 PM, Andrei Borzenkov wrote: > 29.11.2017 20:14, Klaus Wenninger wrote: >> On 11/28/2017 07:41 PM, Andrei Borzenkov wrote: >>> 28.11.2017 10:45, Ramann, Björn wrote: >>>> hi@all, >>>> >>>> in my configuration, the 1st N

Re: [ClusterLabs] cluster with two ESX server

2017-11-29 Thread Klaus Wenninger
On 11/28/2017 07:41 PM, Andrei Borzenkov wrote: > 28.11.2017 10:45, Ramann, Björn wrote: >> hi@all, >> >> in my configuration, the 1st Node run on ESX1, the second run on ESX2. Now >> I'm looking for a way to configure the cluster fence/stonith with two ESX >> server - is this possible? > if you

Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Klaus Wenninger
On 11/29/2017 04:23 PM, Kristoffer Grönlund wrote: > Adam Spiers writes: > >> - The whole cluster is shut down cleanly. >> >> - The whole cluster is then started up again. (Side question: what >> happens if the last node to shut down is not the first to start up? >> How

Re: [ClusterLabs] Antw: cluster with two ESX server

2017-11-28 Thread Klaus Wenninger
On 11/28/2017 10:18 AM, Ulrich Windl wrote: > > >> hi@all, >> >> in my configuration, the 1st Node run on ESX1, the second run on ESX2. Now >> I'm looking for a way to configure the cluster fence/stonith with two ESX >> server - is this possible? Haven't tried it but don't see obvious reasons

[ClusterLabs] sbd v1.3.1

2017-11-06 Thread Klaus Wenninger
Hi sbd - developers & users! Thanks to everybody for contributing to tests and further development. I tried to quickly summarize the changes in the repo since it was labeled v1.3.0: - Add commands to test/query watchdogs - Allow 2-node-operation with a single shared-disk - Overhaul of the

Re: [ClusterLabs] Pacemaker resource start delay when there are another resource is starting

2017-11-06 Thread Klaus Wenninger
Hi! Not saying that the use of start-delay in the monitor-operations is a good thing. It should in most cases be definitely better to delay the return of start till a monitor would succeed. Have seen discussion about deprecating start-delay - don't know the current state though. But this case -

Re: [ClusterLabs] Transition aborted when disabling resource

2017-09-28 Thread Klaus Wenninger
On 09/28/2017 11:09 AM, Roberto Muñoz Gomez wrote: > > > It is common to get a "Transition aborted" error when try to disable > > a resource? > > Yes, "Transition aborted" is not an error (notice the log is at > "info:" > level), just an indication that something in the

Re: [ClusterLabs] if resourceA starts @nodeA then start resource[xy] @node[xy]

2017-09-26 Thread Klaus Wenninger
On 09/26/2017 02:06 PM, lejeczek wrote: > hi fellas > > can something like in the subject pacemaker do? And if yes then how to > do it? You could bind ResourceA to nodeA and resource[xy] to node[xy] via location constraints. Afterwards you could make resource[xy] depend on ResourceA - without

Re: [ClusterLabs] can't create master/slave resource

2017-09-20 Thread Klaus Wenninger
ist: Users@clusterlabs.org > <mailto:Users@clusterlabs.org> > http://lists.clusterlabs.org/mailman/listinfo/users > <http://lists.clusterlabs.org/mailman/listinfo/users> > > Project Home: http://www.clusterlabs.org > Getting started: > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pd

Re: [ClusterLabs] Pacemaker 1.1.18 deprecation warnings

2017-09-19 Thread Klaus Wenninger
On 09/18/2017 07:48 PM, Ken Gaillot wrote: > As discussed at the recent ClusterLabs Summit, I plan to start the > release cycle for Pacemaker 1.1.18 soon. > > There will be the usual bug fixes and a few small new features, but the > main goal will be to provide a final 1.1 release that Pacemaker

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-11 Thread Klaus Wenninger
On 09/11/2017 12:32 PM, Jan Friesse wrote: > Ferenc, > >> wf...@niif.hu (Ferenc Wágner) writes: >> >>> Jan Friesse writes: >>> wf...@niif.hu writes: > In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day > (in August; in May, it happened 0-2

Re: [ClusterLabs] XenServer guest and host watchdog

2017-09-08 Thread Klaus Wenninger
On 09/08/2017 02:57 PM, Mark Syms wrote: > > Hi Klaus, > >   > > Good to meet you and everyone else at the summit. > >   > > As we discussed regarding the handling of watchdog in XenServer, both > guest and host, I’ve had a discussion with our subject matter expert > (Andrew, cc’d) on this topic.

Re: [ClusterLabs] Clusterlabs Summit: Expect rain tomorrow

2017-09-05 Thread Klaus Wenninger
On 09/06/2017 06:59 AM, Digimer wrote: > On 2017-09-06 01:30 AM, Adam Spiers wrote: >> Kristoffer Gronlund wrote: >>> Hey everyone! >>> >>> I am going to try to be at the event area at 8 in the morning tomorrow, >>> and I wouldn't recommend showing up earlier than that. The

Re: [ClusterLabs] pacemaker fencing

2017-09-05 Thread Klaus Wenninger
On 09/05/2017 02:43 PM, Papastavros Vaggelis wrote: > Dear friends , > > I have two_nodes (sgw-01 and sgw-02) HA cluster integrated with two > APC PDUs as fence devices > > 1) pcs stonith create node2-psu fence_apc ipaddr="10.158.0.162" > login="apc" passwd="apc" port="1" pcmk_host_list="sgw-02" >

Re: [ClusterLabs] Pacemaker stopped monitoring the resource

2017-09-05 Thread Klaus Wenninger
.sandvine.com>    cib: info: > cib_perform_op:    ++ > /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']: >   > value="1"/> > > I was suspecting around the highlighted parts of the logs above.  > After 09:10

Re: [ClusterLabs] Pacemaker stopped monitoring the resource

2017-09-02 Thread Klaus Wenninger
On 09/01/2017 11:45 PM, Ken Gaillot wrote: > On Fri, 2017-09-01 at 15:06 +0530, Abhay B wrote: >> Are you sure the monitor stopped? Pacemaker only logs >> recurring monitors >> when the status changes. Any successful monitors after this >> wouldn't be >>

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Klaus Wenninger
On 08/31/2017 11:58 PM, Ferenc Wágner wrote: > Klaus Wenninger <kwenn...@redhat.com> writes: > >> Just seen that you are hosting VMs which might make you use KSM ... >> Don't fully remember at the moment but I have some memory of >> issues with KSM and page-lo

Re: [ClusterLabs] Antw: Re: Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-30 Thread Klaus Wenninger
On 08/30/2017 12:35 PM, Ulrich Windl wrote: Ferenc Wágner wrote on 30.08.2017 at 10:51 in message > <87h8wpa16c.fsf...@lant.ki.iif.hu>: > > [...] >> Do you mean the LVM metadata read latency as seen by the LVM tools or >> that of a mirrored data region? > [...] > >

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-30 Thread Klaus Wenninger
On 08/30/2017 08:54 AM, Jan Friesse wrote: > Ferenc, > >> Jan Friesse writes: >> >>> wf...@niif.hu writes: >>> Jan Friesse writes: > wf...@niif.hu writes: > >> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a >>

Re: [ClusterLabs] RHEL 7.4 cluster cannot commit suicide (sbd)

2017-08-27 Thread Klaus Wenninger
On 08/26/2017 09:53 PM, strahil nikolov wrote: > Hello everyone, > > as this is my first usage (writing) to mailing list , please excuse me. > > Here is the reason I'm writing to you. I have 3 VM machines (kvm/qemu), > watchdog of type 'i6300esb' with RHEL 7.4 and iscsi target as a shared >

Re: [ClusterLabs] Antw: Re: Antw: DRBD and SSD TRIM - Slow! -- RESOLVED!

2017-08-17 Thread Klaus Wenninger
On 08/03/2017 07:52 PM, Eric Robinson wrote: > For anyone else who has this problem, we have reduced the time required to > trim a 1.3TB volume from 3 days to 1.5 minutes. > > Initially, we had used mdraid to build a raid0 array with a 32K chunk size. > We initialized it as a drbd disk, synced

Re: [ClusterLabs] dry-run an alert?

2017-08-14 Thread Klaus Wenninger
On 08/07/2017 07:02 PM, Ken Gaillot wrote: > On Mon, 2017-08-07 at 17:48 +0100, lejeczek wrote: >> hi everyone >> >> I wonder, is it possible to dry-run an alert agent? Test it >> somehow without the actual event taking place? >> >> >> many thanks. >> L. > There's no special tool to do so, but it

Re: [ClusterLabs] Antw: Growing a cluster from 1 node without fencing

2017-08-14 Thread Klaus Wenninger
On 08/14/2017 03:12 PM, Edwin Török wrote: > On 14/08/17 13:46, Klaus Wenninger wrote: > > How does your /etc/sysconfig/sbd look like? > > With just that pcs-command you get some default-config with > > watchdog-only-support. > > It currently looks like this: > >

Re: [ClusterLabs] Antw: Re: Notification agent and Notification recipients

2017-08-14 Thread Klaus Wenninger
simple: crmd of node1 just didn't have anything to do with shifting the resource from node2 -> node3. There is no additional information passed between the nodes just to create a full set of notifications on every node. If you want to have a full log (or whatever your alert-agent is doing)

Re: [ClusterLabs] Antw: Growing a cluster from 1 node without fencing

2017-08-14 Thread Klaus Wenninger
On 08/14/2017 12:20 PM, Ulrich Windl wrote: > Hi! > > Have you tried studying the logs? Usually you get useful information from > there (to share!). > > Regards, > Ulrich > Edwin Török wrote on 14.08.2017 at 11:51 in > message

Re: [ClusterLabs] Antw: Re: Notification agent and Notification recipients

2017-08-14 Thread Klaus Wenninger
On 08/14/2017 12:32 PM, Sriram wrote: > Hi Ken, > > I used the alerts as well, seems to be not working. > > Please check the below configuration > [root@node1 alerts]# pcs config show > Cluster Name: > Corosync Nodes: > Pacemaker Nodes: > node1 node2 node3 > > Resources: > Resource: TRR

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Klaus Wenninger
958 > C 200350 registered at the Municipal Court in Prague > > Bank: Fio banka a.s. > Account number: 2400330446/2010 > BIC: FIOBCZPPXX > IBAN: CZ82 2010 0024 0033 0446 > >> On 24 Jul 2017, at 21:16, Klaus Wenninger <kwenn...@redhat.com >> <mailto:kwenn...@redhat.com

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Klaus Wenninger
ssions for ‘hacluster’, should help. > If you don't see your fencing device, assuming after some time that the corresponding node will probably be down is quite risky in my opinion. But why not ensure it is down using a watchdog? > > > Thanx. > > > > > > *From:*Kl

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Klaus Wenninger
orum in your 2-node-setup with e.g. qdevice or using a shared disk with sbd (not directly pacemaker quorum here but similar thing handled inside sbd). > Since the alerts are issued from ‘hacluster’ login, sudo permissions > for ‘hacluster’ needs to be configured. > > >

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Klaus Wenninger
čtu: 2400330446/2010 > BIC: FIOBCZPPXX > IBAN: CZ82 2010 0024 0033 0446 > >> On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenn...@redhat.com >> <mailto:kwenn...@redhat.com>> wrote: >> >> On 07/24/2017 05:15 PM, Tomer Azran wrote: >>> I s

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Klaus Wenninger
reliably and that the other node is assuming it to be down after a timeout you configured using cluster property stonith-watchdog-timeout. > > > From: Klaus Wenninger > Sent: Monday, July 24, 18:28 > Subject: Re: [ClusterLabs] Two nodes cluster issue > To: Cluster Labs - All topics relate

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Klaus Wenninger
> > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- Klaus Wenninger Senior Software Engineer, EMEA ENG Openstack Infrastructure Red Hat kwenn...@redhat.com

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Klaus Wenninger
oc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org <http://bugs.clusterlabs.org/> > > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home

Re: [ClusterLabs] (no subject)

2017-07-20 Thread Klaus Wenninger
On 07/20/2017 07:21 AM, ArekW wrote: > Hi, How to properly unset a value with pcs? Set to false or null gives error: > > # pcs stonith update vbox-fencing verbose=false --force > or > # pcs stonith update vbox-fencing verbose= --force The latter should be fine actually. Regards, Klaus > > Jul 20

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-14 Thread Klaus Wenninger
Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- Klaus Wenninger Senior Softw

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-06 Thread Klaus Wenninger
On 07/06/2017 04:48 PM, Ken Gaillot wrote: > On 07/06/2017 09:26 AM, Klaus Wenninger wrote: >> On 07/06/2017 04:20 PM, Cesar Hernandez wrote: >>>> If node2 is getting the notification of its own fencing, it wasn't >>>> successfully fenced. Successful fenc

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-06 Thread Klaus Wenninger
ul 5 10:41:59 node2 attrd[606]: notice: Invoking handler for signal 15: > Terminated > Jul 5 10:41:59 node2 attrd[606]: notice: Exiting... > Jul 5 10:41:59 node2 corosync[585]: [pcmk ] info: pcmk_ipc_exit: Client > attrd (conn=0x2280ef0, async-conn=0x2280ef0) left > > > ______

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-05 Thread Klaus Wenninger
On 07/05/2017 04:50 PM, Cesar Hernandez wrote: >> Not a good idea probably - and the reason for what you are experiencing ;-) >> If you have problems starting the nodes within a certain time-window >> disabling startup-fencing might be an option to consider although dangerous. >> But you

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-05 Thread Klaus Wenninger
On 07/05/2017 04:22 PM, Cesar Hernandez wrote: > >> Are you logging which ones went OK and which failed. >> The script returns negatively if both go wrong? > The script always returns OK Not a good idea probably - and the reason for what you are experiencing ;-) If you have problems starting the

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-05 Thread Klaus Wenninger
On 07/05/2017 08:50 AM, Cesar Hernandez wrote: >> Might be kind of a strange race as well ... but without knowing what the >> script actually does ... >> > The script first tries to reboot the node using ssh, something like ssh $NODE > reboot -f, then runs a remote reboot using AWS api Are you

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-04 Thread Klaus Wenninger
On 07/04/2017 04:52 PM, Cesar Hernandez wrote: >> The first line is the consequence of the 2nd. >> And the 1st says that node2 just has seen some fencing-resource >> positively reporting to have fenced himself - which >> is why crmd is exiting in a way that it is not respawned >> by pacemakerd. >

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-04 Thread Klaus Wenninger
On 07/04/2017 03:28 PM, Cesar Hernandez wrote: >> Agreed, I don't think it's multicast vs unicast. >> >> I can't see from this what's going wrong. Possibly node1 is trying to >> re-fence node2 when it comes back. Check that the fencing resources are >> configured correctly, and check whether node1

Re: [ClusterLabs] Question about STONITH for VM HA cluster in shared hosts environment

2017-06-30 Thread Klaus Wenninger
On 06/29/2017 07:23 PM, Ken Gaillot wrote: > On 06/29/2017 12:08 PM, Digimer wrote: >> On 29/06/17 12:39 PM, Andrés Pozo Muñoz wrote: >>> Hi all, >>> >>> I am a newbie to Pacemaker and I can't find the perfect solution for my >>> problem (probably I'm missing something), maybe someone can give me

Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-06-28 Thread Klaus Wenninger
On 06/28/2017 05:29 PM, Klaus Wenninger wrote: > On 05/08/2017 09:20 PM, Lentes, Bernd wrote: >> Hi, >> >> i remember that digimer often campaigns for a fence delay in a 2-node >> cluster. >> E.g. here: >> http://oss.clusterlabs.org/pipermail/pacema

Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-06-28 Thread Klaus Wenninger
On 05/08/2017 09:20 PM, Lentes, Bernd wrote: > Hi, > > i remember that digimer often campaigns for a fence delay in a 2-node > cluster. > E.g. here: > http://oss.clusterlabs.org/pipermail/pacemaker/2013-July/019228.html > In my eyes it makes sense, so i try to establish that. I have two HP

Re: [ClusterLabs] ClusterIP won't return to recovered node

2017-06-28 Thread Klaus Wenninger
On 06/27/2017 09:22 PM, Dan Ragle wrote: > > > On 6/19/2017 5:32 AM, Klaus Wenninger wrote: >> On 06/16/2017 09:08 PM, Ken Gaillot wrote: >>> On 06/16/2017 01:18 PM, Dan Ragle wrote: >>>> >>>> On 6/12/2017 10:30 AM, Ken Gaillot wrote: >

Re: [ClusterLabs] ClusterIP won't return to recovered node

2017-06-19 Thread Klaus Wenninger
On 06/16/2017 09:08 PM, Ken Gaillot wrote: > On 06/16/2017 01:18 PM, Dan Ragle wrote: >> >> On 6/12/2017 10:30 AM, Ken Gaillot wrote: >>> On 06/12/2017 09:23 AM, Klaus Wenninger wrote: >>>> On 06/12/2017 04:02 PM, Ken Gaillot wrote: >>>>> On 06/1

Re: [ClusterLabs] ClusterIP won't return to recovered node

2017-06-12 Thread Klaus Wenninger

Re: [ClusterLabs] ClusterIP won't return to recovered node

2017-06-12 Thread Klaus Wenninger
On 06/12/2017 03:24 PM, Dan Ragle wrote: > > > On 6/12/2017 2:03 AM, Klaus Wenninger wrote: >> On 06/10/2017 05:53 PM, Dan Ragle wrote: >>> >>> >>> On 5/25/2017 5:33 PM, Ken Gaillot wrote: >>>> On 05/24/2017 12:27 PM, Dan Ragle wrote: >>

Re: [ClusterLabs] cluster setup for nodes at KVM guest

2017-06-08 Thread Klaus Wenninger
On 06/09/2017 07:43 AM, ashutosh tiwari wrote: > Hi, > > We have two node cluster(ACTIVE/STANDBY). > Recently we moved these nodes to KVM. > > When we create a private virtual network and use this vnet for > assigning cluster interfaces then things work as expected and both the > nodes are able

Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-17 Thread Klaus Wenninger
On 05/17/2017 03:33 PM, Lentes, Bernd wrote: > > - On May 17, 2017, at 2:58 PM, Klaus Wenninger kwenn...@redhat.com wrote: > > >>> I don't see that. >> fence_* are the RHCS-style fence-agents coming mainly from >> https://github.com/ClusterLabs/fence-agents.

Re: [ClusterLabs] Pacemaker's "stonith too many failures" log is not accurate

2017-05-17 Thread Klaus Wenninger
On 05/17/2017 11:28 AM, 井上 和徳 wrote: > Hi, > I'm testing Pacemaker-1.1.17-rc1. > The number of failures in "Too many failures (10) to fence" log does not > match the number of actual failures. Well it kind of does as after 10 failures it doesn't try fencing again so that is what failures stay at

Re: [ClusterLabs] pacemaker remote node offline after reboot

2017-05-15 Thread Klaus Wenninger
On 05/15/2017 01:16 PM, Ignazio Cassano wrote: > Hello, cluster-recheck-interval=1min. > > > When I use the syntax: > " pcs resource create computenode1 ocf:pacemaker:remote" > > the name is resolved in /etc/hosts Just wanted to know if you have it in /etc/hosts ... > > When I use the syntax: >
