[ClusterLabs] [Announce] libqb 2.0.8 released

2023-07-21 Thread Christine caulfield
We are pleased to announce the release of libqb 2.0.8 https://github.com/ClusterLabs/libqb/releases/tag/v2.0.8 The main purpose of this release is to fix a potential memory overwrite caused by very long log messages, so an upgrade is recommended.

[ClusterLabs] [Announce] libqb 2.0.7 released

2023-06-07 Thread Christine caulfield
We are pleased to announce the release of libqb 2.0.7 https://github.com/ClusterLabs/libqb/releases/tag/v2.0.7 This release mainly fixes build and test issues (especially building with -j which is now supported), but there are a few obscure bugfixes in here too that are worthwhile upgrading t

Re: [ClusterLabs] pacemaker-fenced /dev/shm errors

2023-03-27 Thread Christine caulfield
On 27/03/2023 07:48, d tbsky wrote: Hi: the cluster is running under RHEL 9.0 elements. today I saw log report strange errors like below: Mar 27 13:07:06.287 example.com pacemaker-fenced[2405] (qb_sys_mmap_file_open) error: couldn't allocate file /dev/shm/qb-2405-2403-12-A9UUaJ/qb-re

Re: [ClusterLabs] pacemaker-remoted /dev/shm errors

2023-03-06 Thread Christine caulfield
Hi, The error is coming from libqb - which is what manages the local IPC connections between local clients and the server. I'm the libqb maintainer but I've never seen that error before! Is there anything unusual about the setup on this node? Like filesystems on NFS or some other networked f

Re: [ClusterLabs] corosync not starting

2022-06-28 Thread Christine caulfield
On 27/06/2022 17:10, Sridhar K wrote: Hi Team, corosync not starting, getting below error any port number which I can do telnet and check similar to that of 2224 for pcs The error message from Corosync is "no interfaces defined" - so it looks like the node(s) being

Re: [ClusterLabs] No node name in corosync-cmapctl output

2022-06-01 Thread Christine caulfield
On 01/06/2022 11:17, Jan Friesse wrote: On 31/05/2022 16:28, Andreas Hasenack wrote: Hi, On Tue, May 31, 2022 at 1:35 PM Jan Friesse wrote: Hi, On 31/05/2022 15:16, Andreas Hasenack wrote: Hi, corosync 3.1.6 pacemaker 2.1.2 crmsh 4.3.1 TL;DR I only seem to get a "name" attribute in the "

Re: [ClusterLabs] Corosync Transport- Knet Vs UDPU

2022-03-27 Thread Christine caulfield
On 28/03/2022 03:30, Somanath Jeeva via Users wrote: Hi , I am upgrading from corosync 2.x/pacemaker 1.x to corosync 3.x/pacemaker 2.1.x In our use case we are using a 2 node corosync/pacemaker cluster. In corosync 2.x version I was using udpu as transport method. In the corosync 3.x , as p

[ClusterLabs] [Announce] libqb 2.0.6 released

2022-03-23 Thread Christine caulfield
A quick update to 2.0.5 that fixes the tests and RPM building. * The new ipc_sock tests need to be run as root, as otherwise each sub-test will time out, making the run-time huge. * Make sure that the libstat_wrapper.so library is included in the libqb-tests RPM (when built). If you don'

[ClusterLabs] [Announce] libqb 2.0.5 released

2022-03-21 Thread Christine caulfield
We are pleased to announce the release of libqb 2.0.5 The headline feature of this release is the addition of the new qb_ipcc_connect_async() API call, but there are lots of smaller fixes that should be helpful. Chrissie Caulfield (7): ipcc: Add an async connect API (#450) Tidy some scripts (

[ClusterLabs] [Announce] libqb 2.0.4 released

2021-11-15 Thread Christine caulfield
We are pleased to announce the release of libqb 2.0.4 Source code is available at: https://github.com/ClusterLabs/libqb/releases/ Please use the signed .tar.gz or .tar.xz files with the version number in rather than the github-generated "Source Code" ones. The most important fix in this release

Re: [ClusterLabs] Two node cluster without fencing and no split brain?

2021-07-21 Thread Christine caulfield
On 21/07/2021 09:50, Frank D. Engel, Jr. wrote: OpenVMS can do this sort of thing without a requirement for fencing (you still need a third disk as a quorum device in a 2-node cluster), but Linux (at least in its current form) cannot. From what I can tell the fencing requirements in the Linux s

[ClusterLabs] [Announce] libqb 2.0.3 released

2021-03-03 Thread Christine Caulfield
We are pleased to announce the release of libqb 2.0.3. This is the latest stable release of libqb Source code is available at: https://github.com/ClusterLabs/libqb/releases/download/2.0.3/libqb-2.0.3.tar.xz Please use the signed .tar.gz or .tar.xz files with the version number in rather than th

Re: [ClusterLabs] Q: effieciently collecting some cluster facts

2021-02-24 Thread Christine Caulfield
The most efficient way of getting corosync facts about nodes/quorum is to use the votequorum API. See /usr/include/corosync/votequorum.h and, in the corosync sources tarball, tests/testvotequorum1.c Chrissie On 25/02/2021 07:16, Ulrich Windl wrote: Hi! I'm thinking about some simple cluster s

Re: [ClusterLabs] corosync.conf is missing, I did not delete manually. what should I do?

2021-02-16 Thread Christine Caulfield
If you ran pcs cluster destroy then, yes, that will delete corosync.conf (at least it did when I just tried it) - which seems reasonable behaviour to me. If you want it back then you should either rerun pcs to create the cluster again or rescue the file from system backups, I suppose. Chrissie

Re: [ClusterLabs] Corosync node gets unique Ring ID

2021-01-27 Thread Christine Caulfield
A few things really stand out from this report, I think the inconsistent ring_id is just a symptom. It worries me that corosync-quorumtool behaves differently on some nodes - some show names, some just IP addresses. That could be a cause of some inconsistency. Also the messages " Jan 26 02:1

[ClusterLabs] {announce] [Alpha] Rust bindings for Corosync libraries

2021-01-20 Thread Christine Caulfield
I don't know how many/few people will be interested in this, but I have been working on some Rust bindings for the corosync libraries: cpg, cfg, cmap, quorum & votequorum. They are currently in Alpha stage but all features are (I think) implemented and seem to work. There's a little more work

Re: [ClusterLabs] Running shell command on remote node via corosync messaging infrastructure

2021-01-04 Thread Christine Caulfield
On 04/01/2021 13:19, Klaus Wenninger wrote: On 1/4/21 1:50 PM, Christine Caulfield wrote: On 04/01/2021 09:21, Klaus Wenninger wrote: On 1/4/21 8:36 AM, Christine Caulfield wrote: On 18/12/2020 20:41, Andrei Borzenkov wrote: 18.12.2020 21:54, Ken Gaillot wrote: On Fri, 2020-12-18 at

Re: [ClusterLabs] Running shell command on remote node via corosync messaging infrastructure

2021-01-04 Thread Christine Caulfield
On 04/01/2021 09:21, Klaus Wenninger wrote: On 1/4/21 8:36 AM, Christine Caulfield wrote: On 18/12/2020 20:41, Andrei Borzenkov wrote: 18.12.2020 21:54, Ken Gaillot wrote: On Fri, 2020-12-18 at 17:51 +, Animesh Pande wrote: Hello, Is there a tool that would allow for commands to be

Re: [ClusterLabs] Running shell command on remote node via corosync messaging infrastructure

2021-01-03 Thread Christine Caulfield
On 18/12/2020 20:41, Andrei Borzenkov wrote: 18.12.2020 21:54, Ken Gaillot wrote: On Fri, 2020-12-18 at 17:51 +, Animesh Pande wrote: Hello, Is there a tool that would allow for commands to be run on remote nodes in the cluster through the corosync messaging layer? I have a cluster confi

[ClusterLabs] [Announce] libqb 2.0.2 released

2020-12-03 Thread Christine Caulfield
IN32 (#424) doxygen2man: Fix a couple of covscan-detected errors (#425) cov: Quieten some covscan warnings (#427) Christine Caulfield (1): lib: Update library version for 2.0.2 release Hideo Yamauchi (1): ipcs : Decrease log level. (#426) wferi (1): doc related fi

Re: [ClusterLabs] Antw: [EXT] Re: Q: cryptic messages from "QB"

2020-11-26 Thread Christine Caulfield
On 25/11/2020 13:04, Ulrich Windl wrote: Christine Caulfield wrote on 25.11.2020 at 10:17 in message <56738406-9222-a9f3-c57c-e30400a0b...@redhat.com>: On 25/11/2020 08:45, Ulrich Windl wrote: Hi! Setting up a cluster in SLES15 SP2, I wonder about a few log messages: 1) what do

Re: [ClusterLabs] Q: cryptic messages from "QB"

2020-11-25 Thread Christine Caulfield
On 25/11/2020 08:45, Ulrich Windl wrote: Hi! Setting up a cluster in SLES15 SP2, I wonder about a few log messages: 1) what does "QB" stand for? 2) When QB talks about "server", does it mean "service"? Examples: corosync[7982]: [QB] server name: cmap corosync[7982]: [QB] server nam

[ClusterLabs] [Announce] libqb 2.0.1 released

2020-07-29 Thread Christine Caulfield
We are pleased to announce the release of libqb 2.0.1. This is the latest stable release of libqb Source code is available at: https://github.com/ClusterLabs/libqb/releases/download/2.0.1/libqb-2.0.1.tar.xz Please use the signed .tar.gz or .tar.xz files with the version number in rather than the

Re: [ClusterLabs] clusterlabs.github.io

2020-06-29 Thread Christine Caulfield
On 29/06/2020 10:27, Jehan-Guillaume de Rorthais wrote: > On Mon, 29 Jun 2020 09:27:00 +0100 > Christine Caulfield wrote: > >> Is anyone (else) using this? > > I do: https://clusterlabs.github.io/PAF/ > >> We publish the libqb man pages to clusterlabs.github.i

[ClusterLabs] clusterlabs.github.io

2020-06-29 Thread Christine Caulfield
Is anyone (else) using this? We publish the libqb man pages to clusterlabs.github.io/libqb but I can't see any other clusterlabs projects using it (just by adding, eg, /pacemaker to the hostname). With libqb 2.0.1 having actual man pages installed with it - which seems far more useful to me - I

Re: [ClusterLabs] Linux 8.2 - high totem token requires manual setting of ping_interval and ping_timeout

2020-06-26 Thread Christine Caulfield
On 26/06/2020 07:56, Jan Friesse wrote: > Robert, > thank you for the info/report. More comments inside. > >> All, >> Hello.  Hope all is well.   I have been researching Oracle Linux 8.2 >> and ran across a situation that is not well documented.   I decided to >> provide some details to the commun

[ClusterLabs] libqb 2.0.0 released

2020-05-04 Thread Christine Caulfield
We are pleased to announce the release of libqb 2.0.0. This is the latest stable release of libqb Source code is available at: https://github.com/ClusterLabs/libqb/releases/download/2.0.0/libqb-2.0.0.tar.xz Please use the signed .tar.gz or .tar.xz files with the version number in rather than the

[ClusterLabs] [Announce] libqb 1.0.6 released

2020-04-29 Thread Christine Caulfield
version number in rather than the github-generated "Source Code" ones. Chrissie Shortlog: Christine Caulfield (3): bump version for 1.0.6 Backported fixes to allow applications to compile using gcc10 (#392) Fix error in CI tests - make distcheck Jan Pokorný (9): tests: ipc: avoid pro

[ClusterLabs] [Announce] libqb 1.9.1 released

2020-03-18 Thread Christine Caulfield
We are pleased to announce the release of libqb 1.9.1 - this is a release candidate for a future 2.0 release Source code is available at: https://github.com/ClusterLabs/libqb/releases/download/1.9.0/libqb-1.9.1.tar.xz Please use the signed .tar.gz or .tar.xz files with the version number in rath

Re: [ClusterLabs] [Announce] libqb 1.9.0 released

2020-01-13 Thread Christine Caulfield
On 13/12/2019 15:00, Yan Gao wrote: > Hi Christine, > > Congratulations and thanks for the release! > > As previously brought from: > https://github.com/ClusterLabs/libqb/issues/338#issuecomment-503155816 > > , the master branch has this too: > > https://github.com/ClusterLabs/libqb/commit/6a4

Re: [ClusterLabs] [Announce] libqb 1.9.0 released

2020-01-06 Thread Christine Caulfield
rLabs/libqb/pull/349 > > Does it mean the master branch is somehow not impacted by the issues, or > some other solutions are being sought there? Thanks. > > Regards, >Yan > > > > On 12/12/19 5:37 PM, christine caulfield wrote: >> We are pleased to announc

[ClusterLabs] [Announce] libqb 1.9.0 released

2019-12-12 Thread christine caulfield
We are pleased to announce the release of libqb 1.9.0 - this is a release candidate for a future 2.0 release Source code is available at: https://github.com/ClusterLabs/libqb/releases/download/1.9.0/libqb-1.9.0.tar.xz Please use the signed .tar.gz or .tar.xz files with the version number in ra

Re: [ClusterLabs] corosync 3.0.1 on Debian/Buster reports some MTU errors

2019-11-21 Thread christine caulfield
On 18/11/2019 21:31, Jean-Francois Malouin wrote: Hi, Maybe not directly a pacemaker question but maybe some of you have seen this problem: A 2 node pacemaker cluster running corosync-3.0.1 with dual communication ring sometimes reports errors like this in the corosync log file: [KNET ] pmtud

Re: [ClusterLabs] Announcing ClusterLabs Summit 2020

2019-11-12 Thread christine caulfield
On 11/11/2019 13:21, Thomas Lamprecht wrote: On 11/5/19 3:07 AM, Ken Gaillot wrote: Hi all, A reminder: We are still interested in ideas for talks, and rough estimates of potential attendees. "Maybe" is perfectly fine at this stage. It will let us negotiate hotel rates and firm up the location

Re: [ClusterLabs] DLM in the cluster can tolerate more than one node failure at the same time?

2019-10-23 Thread christine caulfield
On 22/10/2019 07:15, Gang He wrote: Hi List, I remember that master node has the full copy for one DLM lock resource and the other nodes have their own lock status, then if one node is failed(or fenced), the DLM lock status can be recovered from the remained node quickly. My question is, if th

Re: [ClusterLabs] [Announce] libqb 1.0.5 release

2019-04-25 Thread Christine Caulfield
We are pleased to announce the release of libqb 1.0.5 Source code is available at: https://github.com/ClusterLabs/libqb/releases/download/v1.0.5/libqb-1.0.5.tar.xz Please use the signed .tar.gz or .tar.xz files with the version number in rather than the github-generated "Source Code" ones. This

[ClusterLabs] [Announce] libqb 1.0.4 release

2019-04-15 Thread Christine Caulfield
We are pleased to announce the release of libqb 1.0.4 Source code is available at: https://github.com/ClusterLabs/libqb/releases/download/v1.0.4/libqb-1.0.4.tar.xz Please use the signed .tar.gz or .tar.xz files with the version number in rather than the github-generated "Source Code" ones. This

Re: [ClusterLabs] Why do clusters have a name?

2019-03-28 Thread Christine Caulfield
On 26/03/2019 20:12, Brian Reichert wrote: > This will sound like a dumb question: > > The manpage for pcs(8) implies that to set up a cluster, one needs > to provide a name. > > Why do clusters have names? > > Is there a use case wherein there would be multiple clusters visible > in an administ

Re: [ClusterLabs] Can subsequent rings be added to established cluster?

2019-02-25 Thread Christine Caulfield
On 21/02/2019 18:33, lejeczek wrote: > hi guys > > as per the subject. > > Would there be some nice docs/howto? Or maybe it's just standard op > procedure? > With corosync 3 you can add links (similar to rings from the user POV) dynamically just by adding the necessary ringX_addr entries to c

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-18 Thread Christine Caulfield
On 15/02/2019 16:58, Edwin Török wrote: > On 15/02/2019 16:08, Christine Caulfield wrote: >> On 15/02/2019 13:06, Edwin Török wrote: >>> I tried again with 'debug: trace', lots of process pause here: >>> https://clbin.com/ZUHpd >>> >>> A

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-15 Thread Christine Caulfield
On 15/02/2019 13:06, Edwin Török wrote: > > > On 15/02/2019 11:12, Christine Caulfield wrote: >> On 15/02/2019 10:56, Edwin Török wrote: >>> On 15/02/2019 09:31, Christine Caulfield wrote: >>>> On 14/02/2019 17:33, Edwin Török wrote: >>>>> Hell

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-15 Thread Christine Caulfield
On 15/02/2019 10:56, Edwin Török wrote: > On 15/02/2019 09:31, Christine Caulfield wrote: >> On 14/02/2019 17:33, Edwin Török wrote: >>> Hello, >>> >>> We were testing corosync 2.4.3/libqb 1.0.1-6/sbd 1.3.1/gfs2 on 4.19 and >>> noticed a fundamental pro

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-15 Thread Christine Caulfield
On 14/02/2019 17:33, Edwin Török wrote: > Hello, > > We were testing corosync 2.4.3/libqb 1.0.1-6/sbd 1.3.1/gfs2 on 4.19 and > noticed a fundamental problem with realtime priorities: > - corosync runs on CPU3, and interrupts for the NIC used by corosync are > also routed to CPU3 > - corosync runs

Re: [ClusterLabs] Corosync 3.0.0 is available at corosync.org!

2018-12-17 Thread Christine Caulfield
On 17/12/2018 12:14, Jan Pokorný wrote: > On 17/12/18 10:04 +0000, Christine Caulfield wrote: >> On 17/12/2018 09:34, Ulrich Windl wrote: >>> I wonder: Is there a migration script that can converts corosync.conf files? >>> At least you have a few version components i

Re: [ClusterLabs] Antw: Corosync 3.0.0 is available at corosync.org!

2018-12-17 Thread Christine Caulfield
On 17/12/2018 09:34, Ulrich Windl wrote: Jan Friesse wrote on 14.12.2018 at 15:06 in > message > <991569e4-2430-30f1-1bbc-827be7637...@redhat.com>: > [...] >> ‑ UDP/UDPU transports are still present, but supports only single ring >> (RRP is gone in favor of Knet) and doesn't support encr

Re: [ClusterLabs] Antw: Re: Corosync 3 release plans?

2018-10-01 Thread Christine Caulfield
On 01/10/18 07:45, Ulrich Windl wrote: >>>> Ferenc Wágner wrote on 27.09.2018 at 21:16 > in > message <87zhw23g5p@lant.ki.iif.hu>: >> Christine Caulfield writes: >> >>> I'm also looking into high‑res timestamps for logfiles too. >

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-28 Thread Christine Caulfield
On 27/09/18 20:16, Ferenc Wágner wrote: > Christine Caulfield writes: > >> I'm also looking into high-res timestamps for logfiles too. > > Wouldn't that be a useful option for the syslog output as well? I'm > sometimes concerned by the batching effect

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Christine Caulfield
On 27/09/18 16:01, Ken Gaillot wrote: > On Thu, 2018-09-27 at 09:58 -0500, Ken Gaillot wrote: >> On Thu, 2018-09-27 at 15:32 +0200, Ferenc Wágner wrote: >>> Christine Caulfield writes: >>> >>>> TBH I would be quite happy to leave this to logrotate but the

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Christine Caulfield
On 27/09/18 12:52, Ferenc Wágner wrote: > Christine Caulfield writes: > >> I'm looking into new features for libqb and the option in >> https://github.com/ClusterLabs/libqb/issues/142#issuecomment-76206425 >> looks like a good option to me. > > It f

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Christine Caulfield
On 26/09/18 09:21, Ferenc Wágner wrote: > Jan Friesse writes: > >> wagner.fer...@kifu.gov.hu writes: >> >>> triggered by your favourite IPC mechanism (SIGHUP and SIGUSRx are common >>> choices, but logging.* cmap keys probably fit Corosync better). That >>> would enable proper log rotation. >> >

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-24 Thread Christine Caulfield
On 24/09/18 13:12, Ferenc Wágner wrote: > Jan Friesse writes: > >> Have you had a time to play with packaging current alpha to find out >> if there are no issues? I had no problems with Fedora, but Debian has >> a lot of patches, and I would be really grateful if we could reduce >> them a lot - s

Re: [ClusterLabs] short circuiting the corosync token timeout

2018-08-13 Thread Christine Caulfield
On 13/08/18 09:00, Jan Friesse wrote: > Chris Walker wrote: >> Hello, >> >> Before Pacemaker can declare a node as 'offline', the Corosync layer >> must first declare that the node is no longer part of the cluster >> after waiting a full token timeout. For example, if I manually >> STONITH a n

Re: [ClusterLabs] Upgrade corosync problem

2018-07-06 Thread Christine Caulfield
__verbose != __stop___verbose' failed.* > > anything is logged (even in debug mode). > > I do not understand why installing libqb during the normal upgrade > process fails while if I upgrade it after the > crmsh/pacemaker/corosync/resourceagents upgrade it works fine.  >

Re: [ClusterLabs] Found libqb issue that affects pacemaker 1.1.18

2018-07-06 Thread Christine Caulfield
On 06/07/18 10:09, Salvatore D'angelo wrote: > I closed the issue. > Libqb uses tagging and people should not download the Source code (zip) >  or Source > code (tar.gz) . > The foll

Re: [ClusterLabs] Upgrade corosync problem

2018-07-03 Thread Christine Caulfield
On 03/07/18 07:53, Jan Pokorný wrote: > On 02/07/18 17:19 +0200, Salvatore D'angelo wrote: >> Today I tested the two suggestions you gave me. Here what I did. >> In the script where I create my 5 machines cluster (I use three >> nodes for pacemaker PostgreSQL cluster and two nodes for glusterfs >>

Re: [ClusterLabs] Upgrade corosync problem

2018-07-01 Thread Christine Caulfield
On 29/06/18 17:20, Jan Pokorný wrote: > On 29/06/18 10:00 +0100, Christine Caulfield wrote: >> On 27/06/18 08:35, Salvatore D'angelo wrote: >>> One thing that I do not understand is that I tried to compare corosync >>> 2.3.5 (the old version that work

Re: [ClusterLabs] Upgrade corosync problem

2018-06-29 Thread Christine Caulfield
On 27/06/18 08:35, Salvatore D'angelo wrote: > Hi, > > Thanks for reply and detailed explaination. I am not using the > —network=host option. > I have a docker image based on Ubuntu 14.04 where I only deploy this > additional software: > > *RUN apt-get update && apt-get install -y wget git xz-uti

Re: [ClusterLabs] Upgrade corosync problem

2018-06-26 Thread Christine Caulfield
n google but results where > quite confusing. > It's pretty unlikely to be the crypto libraries. It's almost certainly in libqb, with a small possibility that it's in corosync. Which versions did you have that worked (libqb and corosync)? Chrissie > >> On 26 Jun 2018

Re: [ClusterLabs] Upgrade corosync problem

2018-06-26 Thread Christine Caulfield
On 26/06/18 11:24, Salvatore D'angelo wrote: > Hi, > > I have tried with: > 0.16.0.real-1ubuntu4 > 0.16.0.real-1ubuntu5 > > which version should I try? Hmm both of those are actually quite old! maybe a newer one? Chrissie > >> On 26 Jun 2018, at 12:03,

Re: [ClusterLabs] Upgrade corosync problem

2018-06-26 Thread Christine Caulfield
rding to the code. > Anyone can help? > Have you tried downgrading libqb to the previous version to see if it still happens? Chrissie >> On 26 Jun 2018, at 11:56, Christine Caulfield > <mailto:ccaul...@redhat.com>> wrote: >> >> On 26/06/18 10:35,

Re: [ClusterLabs] Upgrade corosync problem

2018-06-26 Thread Christine Caulfield
the log file >> before start corosync so it does not contains lines of previous >> executions. >> >> >> But the command: >> corosync-quorumtool -ps >> >> still give: >> Cannot initialize QUORUM service >> >> Consider that f

Re: [ClusterLabs] Upgrade corosync problem

2018-06-26 Thread Christine Caulfield
64M 11M 54M 16% /dev/shm > > but I do not know how to do that. Any suggestion? > According to Google, you just add a new line to /etc/fstab for /dev/shm: tmpfs /dev/shm tmpfs defaults,size=512m 0 0 Chrissie >> On 26 Jun 2018, at 09:48, Christine Caulfield &g

Re: [ClusterLabs] Upgrade corosync problem

2018-06-26 Thread Christine Caulfield
On 25/06/18 20:41, Salvatore D'angelo wrote: > Hi, > > Let me add here one important detail. I use Docker for my test with 5 > containers deployed on my Mac. > Basically the team that worked on this project installed the cluster on soft > layer bare metal. > The PostgreSQL cluster was hard to te

Re: [ClusterLabs] Upgrade corosync problem

2018-06-25 Thread Christine Caulfield
ection FAILED: Resource temporarily unavailable (11) [17323] pg1 corosyncerror [QB] Error in connection setup (17324-17334-23): Resource temporarily unavailable (11) [17323] pg1 corosyncdebug [QB] qb_ipcs_disconnect(17324-17334-23) state:0 is /dev/shm full? Chrissie > > >>
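
The question in the message above ("is /dev/shm full?") can be answered with df, or with a few lines of Python. What follows is only a minimal sketch using the standard library; the helper names and the 90% threshold are illustrative assumptions, not part of libqb or corosync.

```python
import shutil


def shm_usage(path="/dev/shm"):
    """Return (used_bytes, total_bytes, percent_used) for the filesystem at path."""
    usage = shutil.disk_usage(path)
    percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return usage.used, usage.total, percent


def is_nearly_full(path="/dev/shm", threshold=90.0):
    """True when the filesystem at path is at or above threshold percent used.

    The 90% default is an arbitrary illustrative value, not a libqb limit.
    """
    return shm_usage(path)[2] >= threshold
```

Calling shm_usage() on a cluster node reports how much of the tmpfs is in use, and is_nearly_full() gives a quick yes/no before looking further at the IPC errors.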

Re: [ClusterLabs] Upgrade corosync problem

2018-06-22 Thread Christine Caulfield
e moment, lets get things mostly working first. If you enable debug logging in corosync.conf: logging { to_syslog: yes debug: on } Then see what happens and post the syslog file that has all of the corosync messages in it, we'll take it from there. Chrissie >> On 22 Ju
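
To confirm that the logging section suggested above is really in effect on a node, one option is to read corosync.conf back and look at the debug key. The sketch below is a deliberately simplified reader written for illustration: it assumes one item per line (as in a normal corosync.conf, not the flattened form shown in this archive snippet), and the function names are hypothetical rather than any corosync tool or API.

```python
def parse_sections(text):
    """Parse 'name { key: value ... }' blocks into nested dicts (simplified).

    Sections are stored as lists because some, such as node, may repeat.
    """
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.endswith("{"):
            child = {}
            stack[-1].setdefault(line[:-1].strip(), []).append(child)
            stack.append(child)
        elif line == "}":
            if len(stack) > 1:
                stack.pop()
        elif ":" in line:
            key, value = line.split(":", 1)
            stack[-1][key.strip()] = value.strip()
    return root


def debug_enabled(text):
    """True if any top-level logging section sets debug to on or trace."""
    for section in parse_sections(text).get("logging", []):
        if section.get("debug") in ("on", "trace"):
            return True
    return False
```

Passing the contents of the node's corosync.conf to debug_enabled() should return True once the change has been made.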

Re: [ClusterLabs] Upgrade corosync problem

2018-06-22 Thread Christine Caulfield
  } >         node { >                 ring0_addr: pg2 >                 ring1_addr: pg2p >                 nodeid: 2 >         } >         node { >                 ring0_addr: pg3 >                 ring1_addr: pg3p >                 nodeid: 3 >         } > }

Re: [ClusterLabs] Upgrade corosync problem

2018-06-22 Thread Christine Caulfield
On 21/06/18 16:16, Salvatore D'angelo wrote: > Hi, > > I upgraded my PostgreSQL/Pacemaker cluster with these versions. > Pacemaker 1.1.14 -> 1.1.18 > Corosync 2.3.5 -> 2.4.4 > Crmsh 2.2.0 -> 3.0.1 > Resource agents 3.9.7 -> 4.1.1 > > I started on a first node  (I am trying one node at a time upgr

Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-21 Thread Christine Caulfield
On 21/06/18 14:27, Christine Caulfield wrote: > On 21/06/18 12:05, Jason Gauthier wrote: >> On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield >> wrote: >>> >>> On 19/06/18 18:47, Jason Gauthier wrote: >>>> On Tue, Jun 19, 2018 at 6:58 AM Christine

Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-21 Thread Christine Caulfield
On 21/06/18 12:05, Jason Gauthier wrote: > On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield > wrote: >> >> On 19/06/18 18:47, Jason Gauthier wrote: >>> On Tue, Jun 19, 2018 at 6:58 AM Christine Caulfield >>> wrote: >>>> >>>> O

Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-21 Thread Christine Caulfield
On 19/06/18 18:47, Jason Gauthier wrote: > On Tue, Jun 19, 2018 at 6:58 AM Christine Caulfield > wrote: >> >> On 19/06/18 11:44, Jason Gauthier wrote: >>> On Tue, Jun 19, 2018 at 3:25 AM Christine Caulfield >>> wrote: >>>> >>>&

Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-19 Thread Christine Caulfield
On 19/06/18 11:44, Jason Gauthier wrote: > On Tue, Jun 19, 2018 at 3:25 AM Christine Caulfield > wrote: >> >> On 19/06/18 02:46, Jason Gauthier wrote: >>> Greetings, >>> >>>I've just discovered corosync-qdevice and corosync-qnet. >>>

Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-19 Thread Christine Caulfield
On 19/06/18 02:46, Jason Gauthier wrote: > Greetings, > >I've just discovered corosync-qdevice and corosync-qnet. > (Thanks Ken Gaillot) . Set up was pretty quick. > > I enabled qnet off cluster. I followed the steps presented by > corosync-qdevice-net-certutil.However, when running > co

Re: [ClusterLabs] corosync not able to form cluster

2018-06-08 Thread Christine Caulfield
it never gets out of the JOIN "Jun 07 16:55:37 corosync [TOTEM ] entering GATHER state from 11." process so something is wrong on that node, either a rogue routing table entry, dangling iptables rule or even a broken NIC. Chrissie > Thanks! > > On Thu, Jun 7, 2018 at 8:43 P

Re: [ClusterLabs] corosync not able to form cluster

2018-06-07 Thread Christine Caulfield
Thu, 7 Jun 2018, 8:03 pm Christine Caulfield, <mailto:ccaul...@redhat.com>> wrote: > > On 07/06/18 15:24, Prasad Nagaraj wrote: > > > > No iptables or otherwise firewalls are setup on these nodes. > > > > One observation is that each nod

Re: [ClusterLabs] corosync not able to form cluster

2018-06-07 Thread Christine Caulfield
On 07/06/18 15:24, Prasad Nagaraj wrote: > > No iptables or otherwise firewalls are setup on these nodes. > > One observation is that each node sends messages on with its own ring > sequence number which is not converging.. I have seen that in a good > cluster, when nodes respond with same sequen

Re: [ClusterLabs] corosync not able to form cluster

2018-06-07 Thread Christine Caulfield
> length 332 > 10:25:30.910820 IP 172.22.0.4.34060 > 172.22.0.11.netsupport: UDP, > length 376 > 10:25:30.923403 IP 172.22.0.13.57332 > 172.22.0.11.netsupport: UDP, > length 332 > 10:25:30.946507 IP 172.22.0.11.54545 > 172.22.0.4.netsupport: UDP, >

Re: [ClusterLabs] corosync not able to form cluster

2018-06-07 Thread Christine Caulfield
On 07/06/18 09:21, Prasad Nagaraj wrote: > Hi - I am running corosync on  3 nodes of CentOS release 6.9 (Final). > Corosync version is  corosync-1.4.7. > The nodes are not seeing each other and not able to form memberships. > What I see is continuous message about " A processor joined or left the >

Re: [ClusterLabs] Failure of preferred node in a 2 node cluster

2018-04-29 Thread Christine Caulfield
On 29/04/18 13:22, Andrei Borzenkov wrote: > 29.04.2018 04:19, Wei Shan wrote: >> Hi, >> >> I'm using Redhat Cluster Suite 7 with watchdog timer based fence agent. I >> understand this is a really bad setup but this is what the end-user wants. >> >> ATB => auto_tie_breaker >> >> "When the auto_tie_b

Re: [ClusterLabs] Announcing the first ClusterLabs video karaoke contest!

2018-04-03 Thread Christine Caulfield
On 03/04/18 07:14, Klaus Wenninger wrote: > On 04/02/2018 02:57 AM, Digimer wrote: >> On 2018-04-01 05:30 PM, Ken Gaillot wrote: >>> In honor of the recent 10th anniversary of the first public release of >>> Pacemaker, ClusterLabs is proud to announce its first video karaoke >>> contest! >>> >>> To

Re: [ClusterLabs] corosync 2.4 CPG config change callback

2018-03-13 Thread Christine Caulfield
On 09/03/18 16:26, Jan Friesse wrote: > Thomas, > >> Hi, >> >> On 3/7/18 1:41 PM, Jan Friesse wrote: >>> Thomas, >>> First thanks for your answer! On 3/7/18 11:16 AM, Jan Friesse wrote: > > ... > >> TotemConfchgCallback: ringid (1.1436) >> active processors 3: 1 2 3 >> EXIT >> Fin

Re: [ClusterLabs] [corosync] Document on configuring corosync3 with knet

2018-03-02 Thread Christine Caulfield
On 16/01/18 13:46, Christine Caulfield wrote: > Hi All, > > To get people started with the new things going on with kronosnet and > corosync3, I've written a document which explains what you can do with > the new configuration options, how to set up multiple links and much,

[ClusterLabs] [corosync] Document on configuring corosync3 with knet

2018-01-16 Thread Christine Caulfield
Hi All, To get people started with the new things going on with kronosnet and corosync3, I've written a document which explains what you can do with the new configuration options, how to set up multiple links and much, much more. It might be helpful for people who want to write configuration tool

[ClusterLabs] [Announce] libqb 1.0.3 release

2017-12-21 Thread Christine Caulfield
We are pleased to announce the release of libqb 1.0.3 Source code is available at: https://github.com/ClusterLabs/libqb/releases/download/v1.0.3/libqb-1.0.3.tar.xz This is mainly a bug-fix release to 1.0.2 Christine Caulfield (6): tests: Fix signal handling in check_ipc.c test: Disable

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-12 Thread Christine Caulfield
On 12/10/17 11:54, Jan Friesse wrote: > Jonathan, > >> >> >> On 12/10/17 07:48, Jan Friesse wrote: >>> Jonathan, >>> I believe main "problem" is votequorum ability to work during sync >>> phase (votequorum is only one service with this ability, see >>> votequorum_overview.8 section VIRTUAL SYNCHRO

Re: [ClusterLabs] Introducing the Anvil! Intelligent Availability platform

2017-07-06 Thread Christine Caulfield
On 05/07/17 14:55, Ken Gaillot wrote: > Wow! I'm looking forward to the September summit talk. > Me too! Congratulations on the release :) Chrissie > On 07/05/2017 01:52 AM, Digimer wrote: >> Hi all, >> >> I suspect by now, many of you here have heard me talk about the Anvil! >> intellige

Re: [ClusterLabs] how to sync data using cmap between cluster

2017-05-25 Thread Christine Caulfield
On 25/05/17 15:48, Rui Feng wrote: > Hi, > > I have a test based on corosync 2.3.4, and find the data stored by > cmap( corosync-cmapctl -s test i8 1) which can't be sync to other > node. > Could somebody give some comment or solution for it, thanks! > > cmap isn't replicated across the clus

[ClusterLabs] [announce] libqb 1.0.2 release

2017-05-19 Thread Christine Caulfield
I am pleased to announce the 1.0.2 release of libqb This is mainly a bug-fix release to 1.0.1. There is one new feature added and that is the option to use filesystem sockets (as opposed to the more usual abstract sockets) on Linux. CI: make travis watch for the issue CI: travis: fix dh

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-18 Thread Christine Caulfield
On 18/04/17 15:02, Digimer wrote: > On 18/04/17 10:00 AM, Digimer wrote: >> On 18/04/17 03:47 AM, Ulrich Windl wrote: >> Digimer schrieb am 16.04.2017 um 20:17 in Nachricht >>> <12cde13f-8bad-a2f1-6834-960ff3afc...@alteeve.ca>: On 16/04/17 01:53 PM, Eric Robinson wrote: > I was readin

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-18 Thread Christine Caulfield
> > This isn't the first time this has come up, so I decided to elaborate on > this email by writing an article on the topic. > > It's a first-draft so there are likely spelling/grammar mistakes. > However, the body is done. > > https://www.alteeve.com/w/The_2-Node_Myth > An excellent article

Re: [ClusterLabs] Three node cluster becomes completely fenced if one node leaves

2017-03-29 Thread Christine Caulfield
On 24/03/17 20:44, Seth Reid wrote: > I have a three node Pacemaker/GFS2 cluster on Ubuntu 16.04. Its not in > production yet because I'm having a problem during fencing. When I > disable the network interface of any one machine, If you mean by using ifdown or similar then ... don't do that. A pr

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-03-14 Thread Christine Caulfield
On 11/03/17 01:32, cys wrote: > At 2017-03-09 18:25:59, "Christine Caulfield" wrote: >> Thanks. Oddly that looks like a totally different incident to the core >> file we had last time. That seemed to be in a node state transition >> whereas this is in stable runnin

Re: [ClusterLabs] corosync cannot acquire quorum

2017-03-13 Thread Christine Caulfield
On 11/03/17 02:50, cys wrote: > We have a cluster containing 3 nodes(nodeA, nodeB, nodeC). > After nodeA is taken offline(by ifdown, this may be not right?), ifdown isn't right, no. You need to do a physical cable pull or use iptables to simulate loss of traffic; ifdown does odd things to corosyn

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-03-09 Thread Christine Caulfield
On 08/03/17 11:04, cys wrote: > At 2017-02-21 00:24:33, "Christine Caulfield" wrote: >> Thanks, I can read that core now. It's something odd happening in the >> sync() code that I can't quite diagnose without the blackbox. We've only >> ever se

Re: [ClusterLabs] Q: cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying

2017-03-03 Thread Christine Caulfield
On 03/03/17 12:59, Ulrich Windl wrote: > Hello! > > After Update and reboot of 2nd of three nodes (SLES11 SP4) I see a > "cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying" message > when I expected the node to join the cluster. What can be the reasons for > this? > In fact t

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-20 Thread Christine Caulfield
work corruption or on-wire incompatibilities. Has it happened before? Chrissie > At 2017-02-16 19:38:03, "Christine Caulfield" wrote: >> On 16/02/17 09:31, cys wrote: >>> The attachment includes coredump and logs just before corosync went wrong. >>>

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-16 Thread Christine Caulfield
't work (it was worth a try!) Thanks Chrissie > Unfortunately corosync was restarted yesterday, and I can't get the blackbox > dump covering the day the incident occurred. > > At 2017-02-16 16:00:05, "Christine Caulfield" wrote: >> On 16/02/17 03:51, cys wrote: >

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-16 Thread Christine Caulfield
On 16/02/17 03:51, cys wrote: > At 2017-02-15 23:13:08, "Christine Caulfield" wrote: >> >> Yes, it seems that some corosync SEGVs trigger this obscure bug in >> libqb. I've chased a few possible causes and none have been fruitful. >> >> If you ge

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-15 Thread Christine Caulfield
On 15/02/17 14:50, Jan Friesse wrote: >> Hi all, >> >> Corosync Cluster Engine, version '2.3.4' >> Copyright (c) 2006-2009 Red Hat, Inc. >> >> Today I found corosync consuming 100% cpu. Strace showed following: >> >> write(7, "\v\0\0\0", 4) = -1 EAGAIN (Resource >> temporarily unava

Re: [ClusterLabs] Corosync maximum nodes

2017-01-30 Thread Christine Caulfield
On 27/01/17 09:43, Гюльнара Невежина wrote: > Hello! > I'm very sorry to disturb you with such question but I can't find > information if there is maximum nodes' limit in corosync? I've found a > bug report https://bugzilla.redhat.com/show_bug.cgi?id=905296#c5 with > "Corosync has hardcoded maximum

[ClusterLabs] libqb 1.0.1 release

2016-11-24 Thread Christine Caulfield
I am very pleased to announce the 1.0.1 release of libqb This is a bugfix release with mainly lots of small amendments. Low: ipc_shm: fix superfluous NULL check log: Don't overwrite valid tags Low: further avoid magic in qblog.h by using named constants Low: log: check for appropriate space when
