Re: [ClusterLabs] Wireshark dissector for corosync/totem packes.

2015-10-22 Thread Christine Caulfield
On 20/10/15 21:20, Vallevand, Mark K wrote: > OK, I guess that the data is encrypted in the corosync/token packet. > IIRC you can tell wireshark what the encryption key is and it will decrypt the packets for you. later: https://github.com/masatake/wireshark-plugin-rhcs Chrissie > > Regard

Re: [ClusterLabs] required nodes for quorum policy

2015-11-10 Thread Christine Caulfield
On 09/11/15 22:20, Radoslaw Garbacz wrote: > Hi, > > I have a question regarding the policy to check for cluster quorum for > corosync+pacemaker. > > As far as I know at present it is always (expected_votes)/2 + 1. Seems > like "qdiskd" has an option to change it, but it is not clear to me if >
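
The majority rule quoted in the message above, (expected_votes)/2 + 1 with integer division, can be illustrated with a small sketch. This is only an illustration of the arithmetic; the function name `quorum_threshold` is hypothetical and is not part of corosync or pacemaker.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum under the simple majority rule.

    Hypothetical helper for illustration only: expected_votes // 2 + 1.
    """
    if expected_votes < 1:
        raise ValueError("expected_votes must be at least 1")
    return expected_votes // 2 + 1


if __name__ == "__main__":
    # Show the threshold for a few small cluster sizes.
    for votes in (2, 3, 4, 5):
        print(f"expected_votes={votes} -> quorum at {quorum_threshold(votes)} votes")
```

Under this rule a two-node cluster needs both votes, which is why two-node setups usually need special handling.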

Re: [ClusterLabs] Anyone successfully install Pacemaker/Corosync on Freebsd?

2015-12-22 Thread Christine Caulfield
On 21/12/15 16:12, Ken Gaillot wrote: > On 12/19/2015 04:56 PM, mike wrote: >> Hi All, >> >> just curious if anyone has had any luck at one point installing >> Pacemaker and Corosync on FreeBSD. I have to install from source of >> course and I've run into an issue when running ./configure while try

[ClusterLabs] [Announce] libqb 10.rc1 release

2016-01-14 Thread Christine Caulfield
I am pleased to announce the 1.0rc1 release of libqb This is a bugfix release and I hope to work towards a full 1.0 release in February or March depending on feedback. Changes from 0.17.2 Improvements to build process includes: Fix format string for C++11 ipc: Prevent fd and memory

[ClusterLabs] [Announce] libqb 10.rc2 release

2016-02-02 Thread Christine Caulfield
I am pleased to announce the second 1.0 release candidate release of libqb. Huge thanks to all those who have contributed to this release. This is a bugfix release and I hope to work towards a full 1.0 release in late February or early March depending on feedback. Changes from 1.0rc2 are mainly f

Re: [ClusterLabs] [Announce] libqb 1.0rc2 release (fixed subject)

2016-02-04 Thread Christine Caulfield
On 03/02/16 17:45, Jan Pokorný wrote: > On 02/02/16 11:05 +0000, Christine Caulfield wrote: >> I am pleased to announce the second 1.0 release candidate release of >> libqb. Huge thanks to all those who have contributed to this release. > > IIUIC, good news is that so f

Re: [ClusterLabs] Too quick node reboot leads to failed corosync assert on other node(s)

2016-02-18 Thread Christine Caulfield
On 12/02/16 15:51, Jan Friesse wrote: > Michal, > > >> Hello. >> >> The subject is just a hypothesis that I'd like to confirm/discuss here. >> >> TL;DR Token timeout shouldn't be greater than reboot cycle, is that >> correct? >> > > actually when I was reading your mail it was like "this sounds

[ClusterLabs] [Announce] libqb 10.rc3 release

2016-02-25 Thread Christine Caulfield
I am pleased to announce the third 1.0 release candidate release of libqb. Huge thanks to all those who have contributed to this release. There are a couple of tiny things to tidy up before 1.0 so I'm going to try and close the release off by the end of next week (5th March) unless anything seriou

Re: [ClusterLabs] [Announce] libqb 10.rc3 release

2016-03-01 Thread Christine Caulfield
On 26/02/16 07:58, Jan Pokorný wrote: > On 26/02/16 14:27 +0900, Keisuke MORI wrote: >> As of libqb-1.0rc3, Pacemaker fails to build upon it with the gcc >> warnings as below. >> There was no such a problem until 1.0rc2, and it seems that the >> changes in the pull request #175 is related. >> >> ht

[ClusterLabs] [Announce] libqb 10.rc4 release

2016-03-19 Thread Christine Caulfield
This is a bugfix release and a potential 1.0 candidate. There are no actual code changes in this release, most of the patches are to the build system. Thanks to Jan Pokorný for, er, all of them. I've bumped the library soname to 0.18.0 which should really have happened last time. Changes from 1.0

Re: [ClusterLabs] Antw: Re: reproducible split brain

2016-03-21 Thread Christine Caulfield
On 19/03/16 15:43, Digimer wrote: > On 19/03/16 10:10 AM, Dennis Jacobfeuerborn wrote: >> On 18.03.2016 00:50, Digimer wrote: >>> On 17/03/16 07:30 PM, Christopher Harvey wrote: On Thu, Mar 17, 2016, at 06:24 PM, Ken Gaillot wrote: > On 03/17/2016 05:10 PM, Christopher Harvey wrote: >>

[ClusterLabs] [Announce] libqb 1.0 release

2016-04-01 Thread Christine Caulfield
I am very pleased to announce the 1.0 release of libqb This release is identical to 1.0rc4 but with the doxygen generation fixed. Huge thanks to all of the people who have contributed to this release. Chrissie The current release tarball is here: https://github.com/ClusterLabs/libqb/release

Re: [ClusterLabs] [Announce] libqb 1.0 release

2016-04-03 Thread Christine Caulfield
On 01/04/16 16:57, Digimer wrote: > On 01/04/16 08:59 AM, Christine Caulfield wrote: >> I am very pleased to announce the 1.0 release of libqb >> >> This release is identical to 1.0rc4 but with the doxygen generation fixed. >> >> Huge thanks to all of the pe

Re: [ClusterLabs] Corosync 2.3.6 is available at corosync.org!

2016-06-16 Thread Christine Caulfield
On 16/06/16 13:22, Vladislav Bogdanov wrote: > Hi, > > 16.06.2016 14:09, Jan Friesse wrote: >> I am pleased to announce the latest maintenance release of Corosync >> 2.3.6 available immediately from our website at >> http://build.clusterlabs.org/corosync/releases/. >

Re: [ClusterLabs] Corosync 2.3.6 is available at corosync.org!

2016-06-16 Thread Christine Caulfield
On 16/06/16 13:54, Vladislav Bogdanov wrote: > 16.06.2016 15:28, Christine Caulfield wrote: >> On 16/06/16 13:22, Vladislav Bogdanov wrote: >>> Hi, >>> >>> 16.06.2016 14:09, Jan Friesse wrote: >>>> I am pleased to announce the latest maintenance rele

Re: [ClusterLabs] Corosync 2.3.6 is available at corosync.org!

2016-06-16 Thread Christine Caulfield
On 16/06/16 13:54, Vladislav Bogdanov wrote: > 16.06.2016 15:28, Christine Caulfield wrote: >> On 16/06/16 13:22, Vladislav Bogdanov wrote: >>> Hi, >>> >>> 16.06.2016 14:09, Jan Friesse wrote: >>>> I am pleased to announce the latest maintenance rele

Re: [ClusterLabs] Corosync 2.3.6 is available at corosync.org!

2016-06-16 Thread Christine Caulfield
On 16/06/16 14:09, Vladislav Bogdanov wrote: > 16.06.2016 16:04, Christine Caulfield wrote: >> On 16/06/16 13:54, Vladislav Bogdanov wrote: >>> 16.06.2016 15:28, Christine Caulfield wrote: >>>> On 16/06/16 13:22, Vladislav Bogdanov wrote: >>>>> Hi,

Re: [ClusterLabs] DLM standalone without crm ?

2016-06-27 Thread Christine Caulfield
On 26/06/16 14:47, Lentes, Bernd wrote: > > > - On 26 Jun 2016 at 7:59, Ferenc Wágner wf...@niif.hu wrote: > >> "Lentes, Bernd" writes: >> >>> wf...@niif.hu writes: >>> "Lentes, Bernd" writes: > is it possible to have a DLM running without CRM? Yes. You'll need to

Re: [ClusterLabs] DLM standalone without crm ?

2016-06-27 Thread Christine Caulfield
On 27/06/16 15:27, Lentes, Bernd wrote: > > - On 27 Jun 2016 at 9:04, Christine Caulfield ccaul...@redhat.com wrote: > >> On 26/06/16 14:47, Lentes, Bernd wrote: >>> >>> >>> - On 26 Jun 2016 at 7:59, Ferenc Wágner wf...@niif.hu wrote: >

Re: [ClusterLabs] Limiting number of nodes that can join into a cluster

2016-06-28 Thread Christine Caulfield
On 28/06/16 13:27, Nikhil Utane wrote: > Hi Klaus, > > I am using multicast to avoid having to configure the host names. > To be honest, if you're serious about keeping the number of nodes down, then careful management is the way to do it; looking for a technical fix is not the answer. Yes, you

Re: [ClusterLabs] corosync just listening on 127.0.0.1

2016-07-04 Thread Christine Caulfield
On 01/07/16 18:31, Lentes, Bernd wrote: > Hi, > > I'm currently establishing a two-node cluster and I'm playing around with it. > I have two nodes, both have a bond-device. It is intended for DRBD, MySQL > replication and the inter-cluster-communication. > Each bond has a private IP-address (192.

Re: [ClusterLabs] Antw: Re: corosync just listening on 127.0.0.1

2016-07-04 Thread Christine Caulfield
On 04/07/16 10:28, Ulrich Windl wrote: >>>> Christine Caulfield wrote on 04.07.2016 at 09:26 in > message <577a0f98.5020...@redhat.com>: > [...] >>> ... >>> interface { >>> >>> ringnumber: 0 >>>

Re: [ClusterLabs] Antw: Re: corosync just listening on 127.0.0.1

2016-07-04 Thread Christine Caulfield
On 04/07/16 10:35, Christine Caulfield wrote: > On 04/07/16 10:28, Ulrich Windl wrote: >>>>> Christine Caulfield wrote on 04.07.2016 at 09:26 >>>>> in >> message <577a0f98.5020...@redhat.com>: >> [...] >>>> ...

Re: [ClusterLabs] agent ocf:pacemaker:controld

2016-07-18 Thread Christine Caulfield
On 18/07/16 07:59, Da Shi Cao wrote: > dlm_controld is very tightly coupled with cman. I have built a cluster purely > with pacemaker+corosync+fence_sanlock. But if agent ocf:pacemaker:controld is > desired, dlm_controld must exist! I can only find it in cman. > Can the command dlm_controld be ob

Re: [ClusterLabs] Reliability questions on the new QDevices in uneven node count Setups

2016-07-25 Thread Christine Caulfield
On 25/07/16 14:29, Thomas Lamprecht wrote: > Hi all, > > I'm currently testing the new features of corosync 2.4, especially > qdevices. > First tests show quite nice results, like having quorum on a single node > left out of a three node cluster. > > But what I'm a bit worrying about is what happ

Re: [ClusterLabs] Reliability questions on the new QDevices in uneven node count Setups

2016-07-25 Thread Christine Caulfield
On 25/07/16 14:51, Christine Caulfield wrote: > On 25/07/16 14:29, Thomas Lamprecht wrote: >> Hi all, >> >> I'm currently testing the new features of corosync 2.4, especially >> qdevices. >> First tests show quite nice results, like having quorum on a s

Re: [ClusterLabs] Reliability questions on the new QDevices in uneven node count Setups

2016-07-25 Thread Christine Caulfield
On 25/07/16 16:27, Klaus Wenninger wrote: > On 07/25/2016 04:56 PM, Thomas Lamprecht wrote: >> Thanks for the fast reply :) >> >> >> On 07/25/2016 03:51 PM, Christine Caulfield wrote: >>> On 25/07/16 14:29, Thomas Lamprecht wrote: >>>> Hi all, >

Re: [ClusterLabs] corosync-quorum tool, output name key on Name column if set?

2016-09-20 Thread Christine Caulfield
On 20/09/16 10:46, Thomas Lamprecht wrote: > Hi, > > when I'm using corosync-quorumtool [-l] and have my ring0_addr set to an > IP address, > which does not resolve to a hostname, I get the nodes' IP addresses for > the 'Name' column. > > As I'm using the nodelist.node.X.name key to set the name of

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Christine Caulfield
On 10/10/16 05:51, Eric Robinson wrote: > I have about a dozen corosync+pacemaker clusters and I am just now getting > around to understanding timeouts. > > Most of my corosync.conf files look something like this: > > version:2 > token: 5000 > token_retra

Re: [ClusterLabs] Establishing Timeouts

2016-10-11 Thread Christine Caulfield
On 10/10/16 19:35, Eric Robinson wrote: > Basically, when we turn off a switch, I want to keep the cluster from failing > over before Linux bonding has had a chance to recover. > > I'm mostly interested in preventing false-positive cluster failovers that > might occur during manual network maint

Re: [ClusterLabs] Antw: Re: Establishing Timeouts

2016-10-11 Thread Christine Caulfield
On 11/10/16 08:22, Vladislav Bogdanov wrote: > 11.10.2016 09:31, Ulrich Windl wrote: > Klaus Wenninger wrote on 10.10.2016 at > 20:04 in >> message <936e4d4b-df5c-246d-4552-5678653b3...@redhat.com>: >>> On 10/10/2016 06:58 PM, Eric Robinson wrote: Thanks for the clarification. So

[ClusterLabs] [corosync] Master branch

2016-10-11 Thread Christine Caulfield
and allows dynamic reconfiguration of interfaces. It also fixes the ifup/ifdown and 127.0.0.1 binding problems that have plagued corosync/openais from day 1 Signed-off-by: Christine Caulfield Chrissie ___ Users mailing list: Users@cl

Re: [ClusterLabs] [corosync] Master branch

2016-10-11 Thread Christine Caulfield
On 11/10/16 12:07, Dennis Jacobfeuerborn wrote: > On 11.10.2016 12:42, Christine Caulfield wrote: >> I've just committed a big patch to the master branch of corosync - it is >> now all very experimental, and existing pull requests against master >> might need to be checke

[ClusterLabs] libqb 1.0.1 release

2016-11-24 Thread Christine Caulfield
I am very pleased to announce the 1.0.1 release of libqb This is a bugfix release with mainly lots of small amendments. Low: ipc_shm: fix superfluous NULL check log: Don't overwrite valid tags Low: further avoid magic in qblog.h by using named constants Low: log: check for appropriate space when

Re: [ClusterLabs] Corosync maximum nodes

2017-01-30 Thread Christine Caulfield
On 27/01/17 09:43, Гюльнара Невежина wrote: > Hello! > I'm very sorry to disturb you with such question but I can't find > information if there is maximum nodes' limit in corosync? I've found a > bug report https://bugzilla.redhat.com/show_bug.cgi?id=905296#c5 with > "Corosync has hardcoded maximum

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-15 Thread Christine Caulfield
On 15/02/17 14:50, Jan Friesse wrote: >> Hi all, >> >> Corosync Cluster Engine, version '2.3.4' >> Copyright (c) 2006-2009 Red Hat, Inc. >> >> Today I found corosync consuming 100% cpu. Strace showed following: >> >> write(7, "\v\0\0\0", 4) = -1 EAGAIN (Resource >> temporarily unava

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-16 Thread Christine Caulfield
On 16/02/17 03:51, cys wrote: > At 2017-02-15 23:13:08, "Christine Caulfield" wrote: >> >> Yes, it seems that some corosync SEGVs trigger this obscure bug in >> libqb. I've chased a few possible causes and none have been fruitful. >> >> If you ge

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-16 Thread Christine Caulfield
't work (it was worth a try!) Thanks Chrissie > Unfortunately corosync was restarted yesterday, and I can't get the blackbox > dump covering the day the incident occurred. > > At 2017-02-16 16:00:05, "Christine Caulfield" wrote: >> On 16/02/17 03:51, cys wrote:

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-20 Thread Christine Caulfield
work corruption or on-wire incompatibilities. Has it happened before? Chrissie > At 2017-02-16 19:38:03, "Christine Caulfield" wrote: >> On 16/02/17 09:31, cys wrote: >>> The attachment includes coredump and logs just before corosync went wrong. >>

Re: [ClusterLabs] Q: cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying

2017-03-03 Thread Christine Caulfield
On 03/03/17 12:59, Ulrich Windl wrote: > Hello! > > After Update and reboot of 2nd of three nodes (SLES11 SP4) I see a > "cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying" message > when I expected the node to join the cluster. What can be the reasons for > this? > In fact t

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-03-09 Thread Christine Caulfield
On 08/03/17 11:04, cys wrote: > At 2017-02-21 00:24:33, "Christine Caulfield" wrote: >> Thanks, I can read that core now. It's something odd happening in the >> sync() code that I can't quite diagnose without the blackbox. We've only >> ever se

Re: [ClusterLabs] corosync cannot acquire quorum

2017-03-13 Thread Christine Caulfield
On 11/03/17 02:50, cys wrote: > We have a cluster containing 3 nodes(nodeA, nodeB, nodeC). > After nodeA is taken offline(by ifdown, this may be not right?), ifdown isn't right, no. You need to do a physical cable pull or use iptables to simulate loss of traffic; ifdown does odd things to corosyn

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-03-14 Thread Christine Caulfield
On 11/03/17 01:32, cys wrote: > At 2017-03-09 18:25:59, "Christine Caulfield" wrote: >> Thanks. Oddly that looks like a totally different incident to the core >> file we had last time. That seemed to be in a node state transition >> whereas this is in stable runnin

Re: [ClusterLabs] Three node cluster becomes completely fenced if one node leaves

2017-03-29 Thread Christine Caulfield
On 24/03/17 20:44, Seth Reid wrote: > I have a three node Pacemaker/GFS2 cluster on Ubuntu 16.04. It's not in > production yet because I'm having a problem during fencing. When I > disable the network interface of any one machine, If you mean by using ifdown or similar then ... don't do that. A pr

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-18 Thread Christine Caulfield
> > This isn't the first time this has come up, so I decided to elaborate on > this email by writing an article on the topic. > > It's a first-draft so there are likely spelling/grammar mistakes. > However, the body is done. > > https://www.alteeve.com/w/The_2-Node_Myth > An excellent article

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-18 Thread Christine Caulfield
On 18/04/17 15:02, Digimer wrote: > On 18/04/17 10:00 AM, Digimer wrote: >> On 18/04/17 03:47 AM, Ulrich Windl wrote: >> Digimer wrote on 16.04.2017 at 20:17 in message >>> <12cde13f-8bad-a2f1-6834-960ff3afc...@alteeve.ca>: On 16/04/17 01:53 PM, Eric Robinson wrote: > I was readin

[ClusterLabs] [announce] libqb 1.0.2 release

2017-05-19 Thread Christine Caulfield
I am pleased to announce the 1.0.2 release of libqb This is mainly a bug-fix release to 1.0.1. There is one new feature added and that is the option to use filesystem sockets (as opposed to the more usual abstract sockets) on Linux. CI: make travis watch for the issue CI: travis: fix dh

Re: [ClusterLabs] how to sync data using cmap between cluster

2017-05-25 Thread Christine Caulfield
On 25/05/17 15:48, Rui Feng wrote: > Hi, > > I have a test based on corosync 2.3.4, and find the data stored by > cmap( corosync-cmapctl -s test i8 1) which can't be sync to other > node. > Could somebody give some comment or solution for it, thanks! > > cmap isn't replicated across the clus

Re: [ClusterLabs] Introducing the Anvil! Intelligent Availability platform

2017-07-06 Thread Christine Caulfield
On 05/07/17 14:55, Ken Gaillot wrote: > Wow! I'm looking forward to the September summit talk. > Me too! Congratulations on the release :) Chrissie > On 07/05/2017 01:52 AM, Digimer wrote: >> Hi all, >> >> I suspect by now, many of you here have heard me talk about the Anvil! >> intellige

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-12 Thread Christine Caulfield
On 12/10/17 11:54, Jan Friesse wrote: > Jonathan, > >> >> >> On 12/10/17 07:48, Jan Friesse wrote: >>> Jonathan, >>> I believe main "problem" is votequorum ability to work during sync >>> phase (votequorum is only one service with this ability, see >>> votequorum_overview.8 section VIRTUAL SYNCHRO

[ClusterLabs] [Announce] libqb 1.0.3 release

2017-12-21 Thread Christine Caulfield
We are pleased to announce the release of libqb 1.0.3 Source code is available at: https://github.com/ClusterLabs/libqb/releases/download/v1.0.3/libqb-1.0.3.tar.xz This is mainly a bug-fix release to 1.0.2 Christine Caulfield (6): tests: Fix signal handling in check_ipc.c test: Disable

[ClusterLabs] [corosync] Document on configuring corosync3 with knet

2018-01-16 Thread Christine Caulfield
Hi All, To get people started with the new things going on with kronosnet and corosync3, I've written a document which explains what you can do with the new configuration options, how to set up multiple links and much, much more. It might be helpful for people who want to write configuration tool

Re: [ClusterLabs] [corosync] Document on configuring corosync3 with knet

2018-03-02 Thread Christine Caulfield
On 16/01/18 13:46, Christine Caulfield wrote: > Hi All, > > To get people started with the new things going on with kronosnet and > corosync3, I've written a document which explains what you can do with > the new configuration options, how to set up multiple links and much,

Re: [ClusterLabs] corosync 2.4 CPG config change callback

2018-03-13 Thread Christine Caulfield
On 09/03/18 16:26, Jan Friesse wrote: > Thomas, > >> Hi, >> >> On 3/7/18 1:41 PM, Jan Friesse wrote: >>> Thomas, >>> First thanks for your answer! On 3/7/18 11:16 AM, Jan Friesse wrote: > > ... > >> TotemConfchgCallback: ringid (1.1436) >> active processors 3: 1 2 3 >> EXIT >> Fin

Re: [ClusterLabs] Announcing the first ClusterLabs video karaoke contest!

2018-04-03 Thread Christine Caulfield
On 03/04/18 07:14, Klaus Wenninger wrote: > On 04/02/2018 02:57 AM, Digimer wrote: >> On 2018-04-01 05:30 PM, Ken Gaillot wrote: >>> In honor of the recent 10th anniversary of the first public release of >>> Pacemaker, ClusterLabs is proud to announce its first video karaoke >>> contest! >>> >>> To

Re: [ClusterLabs] Failure of preferred node in a 2 node cluster

2018-04-29 Thread Christine Caulfield
On 29/04/18 13:22, Andrei Borzenkov wrote: > 29.04.2018 04:19, Wei Shan wrote: >> Hi, >> >> I'm using Redhat Cluster Suite 7 with watchdog timer based fence agent. I >> understand this is a really bad setup but this is what the end-user wants. >> >> ATB => auto_tie_breaker >> >> "When the auto_tie_b

Re: [ClusterLabs] corosync not able to form cluster

2018-06-07 Thread Christine Caulfield
On 07/06/18 09:21, Prasad Nagaraj wrote: > Hi - I am running corosync on  3 nodes of CentOS release 6.9 (Final). > Corosync version is  corosync-1.4.7. > The nodes are not seeing each other and not able to form memberships. > What I see is continuous message about " A processor joined or left the >

Re: [ClusterLabs] corosync not able to form cluster

2018-06-07 Thread Christine Caulfield
> length 332 > 10:25:30.910820 IP 172.22.0.4.34060 > 172.22.0.11.netsupport: UDP, > length 376 > 10:25:30.923403 IP 172.22.0.13.57332 > 172.22.0.11.netsupport: UDP, > length 332 > 10:25:30.946507 IP 172.22.0.11.54545 > 172.22.0.4.netsupport: UDP, >

Re: [ClusterLabs] corosync not able to form cluster

2018-06-07 Thread Christine Caulfield
On 07/06/18 15:24, Prasad Nagaraj wrote: > > No iptables or otherwise firewalls are setup on these nodes. > > One observation is that each node sends messages on with its own ring > sequence number which is not converging.. I have seen that in a good > cluster, when nodes respond with same sequen

Re: [ClusterLabs] corosync not able to form cluster

2018-06-07 Thread Christine Caulfield
Thu, 7 Jun 2018, 8:03 pm Christine Caulfield, <ccaul...@redhat.com> wrote: > > On 07/06/18 15:24, Prasad Nagaraj wrote: > > > > No iptables or otherwise firewalls are setup on these nodes. > > > > One observation is that each nod

Re: [ClusterLabs] corosync not able to form cluster

2018-06-08 Thread Christine Caulfield
it never gets out of the JOIN "Jun 07 16:55:37 corosync [TOTEM ] entering GATHER state from 11." process so something is wrong on that node, either a rogue routing table entry, dangling iptables rule or even a broken NIC. Chrissie > Thanks! > > On Thu, Jun 7, 2018 at 8:43 P

Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-19 Thread Christine Caulfield
On 19/06/18 02:46, Jason Gauthier wrote: > Greetings, > >I've just discovered corosync-qdevice and corosync-qnet. > (Thanks Ken Gaillot) . Set up was pretty quick. > > I enabled qnet off cluster. I followed the steps presented by > corosync-qdevice-net-certutil.However, when running > co

Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-19 Thread Christine Caulfield
On 19/06/18 11:44, Jason Gauthier wrote: > On Tue, Jun 19, 2018 at 3:25 AM Christine Caulfield > wrote: >> >> On 19/06/18 02:46, Jason Gauthier wrote: >>> Greetings, >>> >>>I've just discovered corosync-qdevice and corosync-qnet. >>>

Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-21 Thread Christine Caulfield
On 19/06/18 18:47, Jason Gauthier wrote: > On Tue, Jun 19, 2018 at 6:58 AM Christine Caulfield > wrote: >> >> On 19/06/18 11:44, Jason Gauthier wrote: >>> On Tue, Jun 19, 2018 at 3:25 AM Christine Caulfield >>> wrote: >>>> >>>

Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-21 Thread Christine Caulfield
On 21/06/18 12:05, Jason Gauthier wrote: > On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield > wrote: >> >> On 19/06/18 18:47, Jason Gauthier wrote: >>> On Tue, Jun 19, 2018 at 6:58 AM Christine Caulfield >>> wrote: >>>> >>>> O

Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-21 Thread Christine Caulfield
On 21/06/18 14:27, Christine Caulfield wrote: > On 21/06/18 12:05, Jason Gauthier wrote: >> On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield >> wrote: >>> >>> On 19/06/18 18:47, Jason Gauthier wrote: >>>> On Tue, Jun 19, 2018 at 6:58 AM Christine

Re: [ClusterLabs] Upgrade corosync problem

2018-06-22 Thread Christine Caulfield
On 21/06/18 16:16, Salvatore D'angelo wrote: > Hi, > > I upgraded my PostgreSQL/Pacemaker cluster with these versions. > Pacemaker 1.1.14 -> 1.1.18 > Corosync 2.3.5 -> 2.4.4 > Crmsh 2.2.0 -> 3.0.1 > Resource agents 3.9.7 -> 4.1.1 > > I started on a first node  (I am trying one node at a time upgr

Re: [ClusterLabs] Upgrade corosync problem

2018-06-22 Thread Christine Caulfield
  } >         node { >                 ring0_addr: pg2 >                 ring1_addr: pg2p >                 nodeid: 2 >         } >         node { >                 ring0_addr: pg3 >                 ring1_addr: pg3p >                 nodeid: 3 >         } > }

Re: [ClusterLabs] Upgrade corosync problem

2018-06-22 Thread Christine Caulfield
e moment, let's get things mostly working first. If you enable debug logging in corosync.conf: logging { to_syslog: yes debug: on } Then see what happens and post the syslog file that has all of the corosync messages in it, we'll take it from there. Chrissie >> On 22 Ju

Re: [ClusterLabs] Upgrade corosync problem

2018-06-25 Thread Christine Caulfield
ection FAILED: Resource temporarily unavailable (11) [17323] pg1 corosyncerror [QB] Error in connection setup (17324-17334-23): Resource temporarily unavailable (11) [17323] pg1 corosyncdebug [QB] qb_ipcs_disconnect(17324-17334-23) state:0 is /dev/shm full? Chrissie > > >>
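
The question raised in this thread, whether /dev/shm is full, can also be checked from a script. The following is a minimal sketch using only the Python standard library; the helper name `usage_percent` is hypothetical, and the percentage it prints may differ slightly from what `df` reports because the two compute usage differently.

```python
import os
import shutil


def usage_percent(path: str) -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    shm = "/dev/shm"  # the mount discussed in the thread
    if os.path.isdir(shm):
        print(f"{shm}: {usage_percent(shm):.0f}% used")
    else:
        print(f"{shm} is not present on this system")
```

A value close to 100% would be consistent with the "Resource temporarily unavailable" errors quoted above, though it does not prove that is the cause.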

Re: [ClusterLabs] Upgrade corosync problem

2018-06-26 Thread Christine Caulfield
On 25/06/18 20:41, Salvatore D'angelo wrote: > Hi, > > Let me add here one important detail. I use Docker for my test with 5 > containers deployed on my Mac. > Basically the team that worked on this project installed the cluster on soft > layer bare metal. > The PostgreSQL cluster was hard to te

Re: [ClusterLabs] Upgrade corosync problem

2018-06-26 Thread Christine Caulfield
64M 11M 54M 16% /dev/shm > > but I do not know how to do that. Any suggestion? > According to google, you just add a new line to /etc/fstab for /dev/shm tmpfs /dev/shm tmpfs defaults,size=512m 0 0 Chrissie >> On 26 Jun 2018, at 09:48, Christine Caulfield

Re: [ClusterLabs] Upgrade corosync problem

2018-06-26 Thread Christine Caulfield
the log file >> before start corosync so it does not contains lines of previous >> executions. >> >> >> But the command: >> corosync-quorumtool -ps >> >> still give: >> Cannot initialize QUORUM service >> >> Consider that f

Re: [ClusterLabs] Upgrade corosync problem

2018-06-26 Thread Christine Caulfield
rding to the code. > Anyone can help? > Have you tried downgrading libqb to the previous version to see if it still happens? Chrissie >> On 26 Jun 2018, at 11:56, Christine Caulfield > <ccaul...@redhat.com> wrote: >> >> On 26/06/18 10:35,

Re: [ClusterLabs] Upgrade corosync problem

2018-06-26 Thread Christine Caulfield
On 26/06/18 11:24, Salvatore D'angelo wrote: > Hi, > > I have tried with: > 0.16.0.real-1ubuntu4 > 0.16.0.real-1ubuntu5 > > which version should I try? Hmm, both of those are actually quite old! Maybe a newer one? Chrissie > >> On 26 Jun 2018, at 12:03,

Re: [ClusterLabs] Upgrade corosync problem

2018-06-26 Thread Christine Caulfield
n google but results were > quite confusing. > It's pretty unlikely to be the crypto libraries. It's almost certainly in libqb, with a small possibility of corosync. Which versions did you have that worked (libqb and corosync)? Chrissie > >> On 26 Jun 2018

Re: [ClusterLabs] Upgrade corosync problem

2018-06-29 Thread Christine Caulfield
On 27/06/18 08:35, Salvatore D'angelo wrote: > Hi, > > Thanks for reply and detailed explanation. I am not using the > --network=host option. > I have a docker image based on Ubuntu 14.04 where I only deploy this > additional software: > > *RUN apt-get update && apt-get install -y wget git xz-uti

Re: [ClusterLabs] Upgrade corosync problem

2018-07-01 Thread Christine Caulfield
On 29/06/18 17:20, Jan Pokorný wrote: > On 29/06/18 10:00 +0100, Christine Caulfield wrote: >> On 27/06/18 08:35, Salvatore D'angelo wrote: >>> One thing that I do not understand is that I tried to compare corosync >>> 2.3.5 (the old version that work

Re: [ClusterLabs] Upgrade corosync problem

2018-07-03 Thread Christine Caulfield
On 03/07/18 07:53, Jan Pokorný wrote: > On 02/07/18 17:19 +0200, Salvatore D'angelo wrote: >> Today I tested the two suggestions you gave me. Here what I did. >> In the script where I create my 5 machines cluster (I use three >> nodes for pacemaker PostgreSQL cluster and two nodes for glusterfs >>

Re: [ClusterLabs] Found libqb issue that affects pacemaker 1.1.18

2018-07-06 Thread Christine Caulfield
On 06/07/18 10:09, Salvatore D'angelo wrote: > I closed the issue. > Libqb uses tagging and people should not download the Source code (zip) >  or Source > code (tar.gz) . > The foll

Re: [ClusterLabs] Upgrade corosync problem

2018-07-06 Thread Christine Caulfield
__verbose != __stop___verbose' failed.* > > anything is logged (even in debug mode). > > I do not understand why installing libqb during the normal upgrade > process fails while if I upgrade it after the > crmsh/pacemaker/corosync/resourceagents upgrade it works fine.  >

Re: [ClusterLabs] short circuiting the corosync token timeout

2018-08-13 Thread Christine Caulfield
On 13/08/18 09:00, Jan Friesse wrote: > Chris Walker wrote: >> Hello, >> >> Before Pacemaker can declare a node as 'offline', the Corosync layer >> must first declare that the node is no longer part of the cluster >> after waiting a full token timeout.  For example, if I manually >> STONITH a n

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-24 Thread Christine Caulfield
On 24/09/18 13:12, Ferenc Wágner wrote: > Jan Friesse writes: > >> Have you had a time to play with packaging current alpha to find out >> if there are no issues? I had no problems with Fedora, but Debian has >> a lot of patches, and I would be really grateful if we could reduce >> them a lot - s

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Christine Caulfield
On 26/09/18 09:21, Ferenc Wágner wrote: > Jan Friesse writes: > >> wagner.fer...@kifu.gov.hu writes: >> >>> triggered by your favourite IPC mechanism (SIGHUP and SIGUSRx are common >>> choices, but logging.* cmap keys probably fit Corosync better). That >>> would enable proper log rotation. >> >

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Christine Caulfield
On 27/09/18 12:52, Ferenc Wágner wrote: > Christine Caulfield writes: > >> I'm looking into new features for libqb and the option in >> https://github.com/ClusterLabs/libqb/issues/142#issuecomment-76206425 >> looks like a good option to me. > > It f

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Christine Caulfield
On 27/09/18 16:01, Ken Gaillot wrote: > On Thu, 2018-09-27 at 09:58 -0500, Ken Gaillot wrote: >> On Thu, 2018-09-27 at 15:32 +0200, Ferenc Wágner wrote: >>> Christine Caulfield writes: >>> >>>> TBH I would be quite happy to leave this to logrotate but the

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-28 Thread Christine Caulfield
On 27/09/18 20:16, Ferenc Wágner wrote: > Christine Caulfield writes: > >> I'm also looking into high-res timestamps for logfiles too. > > Wouldn't that be a useful option for the syslog output as well? I'm > sometimes concerned by the batching effect

Re: [ClusterLabs] Antw: Re: Corosync 3 release plans?

2018-10-01 Thread Christine Caulfield
On 01/10/18 07:45, Ulrich Windl wrote: >>>> Ferenc Wágner schrieb am 27.09.2018 um 21:16 > in > Nachricht <87zhw23g5p@lant.ki.iif.hu>: >> Christine Caulfield writes: >> >>> I'm also looking into high-res timestamps for logfiles too. >

Re: [ClusterLabs] Antw: Corosync 3.0.0 is available at corosync.org!

2018-12-17 Thread Christine Caulfield
On 17/12/2018 09:34, Ulrich Windl wrote: Jan Friesse schrieb am 14.12.2018 um 15:06 in > Nachricht > <991569e4-2430-30f1-1bbc-827be7637...@redhat.com>: > [...] >> - UDP/UDPU transports are still present, but support only a single ring >> (RRP is gone in favor of Knet) and don't support encr

Re: [ClusterLabs] Corosync 3.0.0 is available at corosync.org!

2018-12-17 Thread Christine Caulfield
On 17/12/2018 12:14, Jan Pokorný wrote: > On 17/12/18 10:04 +0000, Christine Caulfield wrote: >> On 17/12/2018 09:34, Ulrich Windl wrote: >>> I wonder: Is there a migration script that can convert corosync.conf files? >>> At least you have a few version components i

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-15 Thread Christine Caulfield
On 14/02/2019 17:33, Edwin Török wrote: > Hello, > > We were testing corosync 2.4.3/libqb 1.0.1-6/sbd 1.3.1/gfs2 on 4.19 and > noticed a fundamental problem with realtime priorities: > - corosync runs on CPU3, and interrupts for the NIC used by corosync are > also routed to CPU3 > - corosync runs

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-15 Thread Christine Caulfield
On 15/02/2019 10:56, Edwin Török wrote: > On 15/02/2019 09:31, Christine Caulfield wrote: >> On 14/02/2019 17:33, Edwin Török wrote: >>> Hello, >>> >>> We were testing corosync 2.4.3/libqb 1.0.1-6/sbd 1.3.1/gfs2 on 4.19 and >>> noticed a fundamental pro

Re: [ClusterLabs] Network split during corosync startup results in split brain

2015-07-20 Thread Christine Caulfield
That's very interesting, and worrying. Can you send me the full logs please (just the corosync ones if they're separated, I don't think pacemaker is involved here). If you still have one node in that state (or can reproduce it) then the output of corosync-cmapctl on both nodes would also be helpfu

[ClusterLabs] [Announce] libqb v0.17.2 release

2015-08-24 Thread Christine Caulfield
This is mainly a bug fix release, but also includes a new split-logging feature. Changes v0.17.1 - v0.17.2 Implement extended information logging (aka split logging) switch libtool soname versioning from -version-number to -version-info High: loop: fixes resource starvation in mainloo

Re: [ClusterLabs] Cluster.conf

2015-08-25 Thread Christine Caulfield
On 25/08/15 14:14, Streeter, Michelle N wrote: > I am using pcs but it does nothing with the cluster.conf file. Also, I am > currently required to use rhel6.6. > > I have not been able to find any documentation on what is required in the > cluster.conf file under the newer versions of pacem

Re: [ClusterLabs] Detect when corosync has started

2015-09-01 Thread Christine Caulfield
On 28/08/15 16:37, Brian Campbell wrote: > I'm running Corosync/Pacemaker on an Ubuntu derivative, and to make > them easier to manage wrote upstart jobs to start them up rather than > using the init scripts. > > After doing so, my config scripts, which start up corosync and > pacemaker and then c

Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables

2015-09-14 Thread Christine Caulfield
On 12/09/15 19:15, Noel Kuntze wrote: > > Hello Digimer, > >> I am a strong believer in "keep it as simple as possible". In a case >> like this, it's hard to argue that any option is simple, but given that >> RRP is baked into the HA stack, I decided to trust it over QoS. I am >> perfectly open t

Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables

2015-09-14 Thread Christine Caulfield
On 14/09/15 12:45, Noel Kuntze wrote: > > Hello Christine, > >> I think it's worth mentioning here that corosync already sets its >> packets to TC_INTERACTIVE (which DLM does not), so they should not need >> too much messing around with in iptables/qdisc > > If that is the case, then why do the

Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables

2015-09-14 Thread Christine Caulfield
On 14/09/15 15:24, Noel Kuntze wrote: > > Hello Christine, > > Do you have a pointer for me where to look in the source? > Searching for TC_INTERACTIVE in the Corosync sources on Github yielded no > results. > How the scheduler handles the packets depends on the settings and type of it, > so yes

Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables

2015-09-15 Thread Christine Caulfield
On 15/09/15 01:01, Digimer wrote: > On 14/09/15 10:46 AM, Noel Kuntze wrote: >> >> Hello Christine, >> >> I googled a bit and some doc[1] says that TC_PRIO_INTERACTIVE maps to value >> 6, whatever that is. >> Assuming that value of 6 is the same as the "priority value", Corosync >> traffic should
