Re: [Openais] new config system

2008-03-26 Thread David Teigland
On Wed, Mar 26, 2008 at 11:57:59AM -0400, Lon Hohberger wrote: On Wed, 2008-03-26 at 10:32 -0500, David Teigland wrote: [1] Just to be clear, the meta-configuration idea is where a variety of config files can be used to populate a central config-file-agnostic repository. A single

Re: [Openais] new config system

2008-03-26 Thread David Teigland
On Wed, Mar 26, 2008 at 10:32:54AM -0500, David Teigland wrote: A while back I drew this diagram to show what we were aiming to design, in broad terms, for the next generation aisexec/cman config system: http://people.redhat.com/teigland/cman3.jpg I think perhaps that diagram attempts

Re: [Openais] logsys patch

2008-07-02 Thread David Teigland
On Tue, Jul 01, 2008 at 03:11:26PM -0700, Steven Dake wrote: Dave, Your patch looks reasonable but has a few issues which need to be addressed. It doesn't address the setting of logsys_subsys_id but defines it. I want to avoid the situation where logsys_subsys_id is defined, but then not

Re: [Openais] Split brain when using EVS library

2008-09-09 Thread David Teigland
On Tue, Sep 09, 2008 at 12:27:34PM +0200, Arne Eriksson R wrote: Hi, We have a cluster with 6 processors using openais stable version 0.80.3. For some reason our cluster splits up into two rings. Scenario is: node1(n1) n2 n3 n4 n5 n6 are in the ring. Suddenly the ring splits into two

Re: [Openais] [RFC] simple blackbox

2008-10-09 Thread David Teigland
Wow that is a complicated solution. I thought that simple and blackbox went well together. Completely agree, too complex. The logging code I copy into all the daemons I write is at the opposite end of the spectrum; I doubt it's possible to be much simpler. (I copy it everywhere because it's

Re: [Openais] [Cluster-devel] cluster/logging settings

2008-11-04 Thread David Teigland
On Thu, Oct 30, 2008 at 11:26:14PM -0700, Steven Dake wrote: There are two types of messages. Those intended for users/admins and those intended for developers. Both of these message types should always be recorded *somewhere*. The entire concept of LOG_LEVEL_DEBUG is dubious to me. If

Re: [Openais] [Cluster-devel] cluster/logging settings

2008-11-04 Thread David Teigland
On Tue, Nov 04, 2008 at 02:58:47PM -0600, David Teigland wrote: the cluster.conf logging/ section? My suggestion is: syslog_level=foo logfile_level=bar FWIW, I'm not set on this if someone has a better suggestion. I just want something unambiguous. debug=on has been shown to mean

Re: [Openais] automake merged into corosync

2009-03-10 Thread David Teigland
On Tue, Mar 10, 2009 at 01:41:57AM -0700, Steven Dake wrote: ./autogen.sh ./configure make make install DESTDIR=/ Any chance that install could default to DESTDIR=/ ? Dave ___ Openais mailing list Openais@lists.linux-foundation.org

Re: [Openais] [CRASH] corosync crash under load

2009-03-17 Thread David Teigland
On Tue, Mar 17, 2009 at 02:18:58PM +, Chrissie Caulfield wrote: I had three GFS filesystems all mounted on 13 nodes. When I went to umount them I got the following crash on 5 nodes of the system: (gdb) bt #0 0x7f21baeb0f05 in raise () from /lib64/libc.so.6 #1

Re: [Openais] detecting cpg joiners

2009-04-09 Thread David Teigland
On Thu, Apr 09, 2009 at 01:50:18PM +0200, Andrew Beekhof wrote: For added fun, a node that restarts quickly enough (think a VM) won't even appear to have left (or rejoined) the cluster. At the next totem confchg event, It will simply just be there again with no indication that anything

Re: [Openais] howto distribute data accross all nodes?

2009-04-09 Thread David Teigland
On Thu, Apr 09, 2009 at 09:00:08PM +0200, Dietmar Maurer wrote: If new, normal read/write messages to the replicated state continue while the new node is syncing the pre-existing state, the new node needs to save those operations to apply after it's synced. Ah, that probably works. But

Re: [Openais] detecting cpg joiners

2009-04-09 Thread David Teigland
On Thu, Apr 09, 2009 at 10:12:43PM +0200, Andrew Beekhof wrote: On Thu, Apr 9, 2009 at 20:49, Joel Becker joel.bec...@oracle.com wrote: On Thu, Apr 09, 2009 at 01:50:18PM +0200, Andrew Beekhof wrote: For added fun, a node that restarts quickly enough (think a VM) won't even appear to have

Re: [Openais] detecting cpg joiners

2009-04-13 Thread David Teigland
On Thu, Apr 09, 2009 at 06:02:38PM -0700, Steven Dake wrote: The issue that Dave is talking about I believe is described in the following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=489451 No, not at all. IMO you should get a leave event for any process that leaves the process

Re: [Openais] detecting cpg joiners

2009-04-13 Thread David Teigland
On Mon, Apr 13, 2009 at 12:10:33PM -0700, Steven Dake wrote: On Mon, 2009-04-13 at 13:35 -0500, David Teigland wrote: 0. configure token timeout to some long time that is longer than all the following steps take 1. cluster members are nodeid's: 1,2,3,4 2. cpg foo has

Re: [Openais] howto distribute data accross all nodes?

2009-04-14 Thread David Teigland
On Tue, Apr 14, 2009 at 02:05:10PM +0200, Dietmar Maurer wrote: So CPG provide a framework to implement distributed finite state machines (DFSM). But there is no standard way to get the initial state of the DFSM. Almost all applications need to get the initial state, so I wonder if it would

[Openais] improvements and optimizations

2009-04-14 Thread David Teigland
From one lone, biased, user's point of view, optimized malloc and memcpy are uninteresting -- message throughput isn't what I'm looking for. Are there others out there who see this as important? I *would* be interested in seeing improvements in the following areas: . message latency, if that's

Re: [Openais] improvements and optimizations

2009-04-14 Thread David Teigland
On Tue, Apr 14, 2009 at 01:18:14PM -0700, Steven Dake wrote: . message latency, if that's even possible Reducing the time a token is held reduces latency. So memcpy and malloc specials does reduce latency. I don't have measures of how much, however. That would be interesting to measure

[Openais] delayed shutdown

2009-04-15 Thread David Teigland
If I run 'cman_tool leave' on four nodes in parallel, node1 will leave right away, but the other three nodes don't leave until the token timeout expires for node1 causing a confchg for it, after which the other three all leave right away. This has only been annoying me recently, so I think it

Re: [Openais] Partition Recovery and CPG

2009-04-16 Thread David Teigland
On Thu, Apr 16, 2009 at 12:38:19PM +0200, Dietmar Maurer wrote: Let's assume the cluster is partitioned: Part1: node1 node2 node3 Part2: node4 node5 After recovery, what join/leave messages do I receive with a CPG: A.) JOIN: node4 node5 or B.) JOIN: node1 node2 node3 or anything

Re: [Openais] howto distribute data accross all nodes? Reply-To:

2009-04-20 Thread David Teigland
On Fri, Apr 17, 2009 at 10:56:47PM -0700, Steven Dake wrote: On Sat, 2009-04-18 at 07:49 +0200, Dietmar Maurer wrote: like a 'merge' function? Seems the algorithm for checkpoint recovery always uses the state from the node with the lowest processor id? Yes that is right. So if

Re: [Openais] howto distribute data accross all nodes?

2009-04-20 Thread David Teigland
On Sat, Apr 18, 2009 at 07:49:12AM +0200, Dietmar Maurer wrote: like a 'merge' function? Seems the algorithm for checkpoint recovery always uses the state from the node with the lowest processor id? Yes that is right. So if I have the following cluster: Part1: node2 node3 node4

Re: [Openais] Partition Recovery and CPG

2009-04-20 Thread David Teigland
On Sat, Apr 18, 2009 at 09:37:26AM +0200, Dietmar Maurer wrote: Yes, forcing the losers to reset and start from scratch is a must, but we end up doing that a layer above corosync. That means the losers often reappear again through corosync/cpg prior to being forced out. Are you talking

Re: [Openais] howto distribute data accross all nodes?

2009-04-20 Thread David Teigland
On Sat, Apr 18, 2009 at 03:55:57AM -0700, Steven Dake wrote: On Sat, 2009-04-18 at 12:47 +0200, Dietmar Maurer wrote: At least the SA Forum does not mention such strange behavior. Isn't that a serious bug? Yes, I'd consider it a serious bug. Consider 2 Partitions with one checkpoint:

Re: [Openais] [PATCH] corosync/trunk: add logging backward compatibility config layer

2009-04-21 Thread David Teigland
On Tue, Apr 21, 2009 at 07:43:04PM +0200, Fabio M. Di Nitto wrote: On Tue, 2009-04-21 at 08:51 -0500, Ryan O'Hara wrote: On Tue, Apr 21, 2009 at 06:06:25AM +0200, Fabio M. Di Nitto wrote: Hi guys, in order to match the new logging config spec, 2 logging config keywords had to be

Re: [Openais] [Corosync] Patch - Decouple shutdown ordering from objdb position

2009-04-29 Thread David Teigland
On Wed, Apr 29, 2009 at 02:28:05PM +0200, Andrew Beekhof wrote: At the moment, startup and shutdown ordering is controlled by the plugin's position in an objdb list. This is particularly problematic for cluster resource managers which must be unloaded/stopped first. The reason for this

Re: [Openais] Partition Recovery and CPG

2009-04-30 Thread David Teigland
On Thu, Apr 16, 2009 at 12:29:27PM -0500, David Teigland wrote: VS guarantees that all cpg members will see the same sequence of messages and configuration changes, i.e. history of events. If a cpg is partitioned, that immediately violates VS. One part must be killed so that the remaining

Re: [Openais] detecting cpg joiners

2009-05-06 Thread David Teigland
On Mon, Apr 13, 2009 at 02:17:00PM -0500, David Teigland wrote: On Mon, Apr 13, 2009 at 12:10:33PM -0700, Steven Dake wrote: On Mon, 2009-04-13 at 13:35 -0500, David Teigland wrote: 0. configure token timeout to some long time that is longer than all the following steps take 1

Re: [Openais] detecting cpg joiners

2009-05-06 Thread David Teigland
On Wed, May 06, 2009 at 02:10:27PM -0700, Steven Dake wrote: On Wed, 2009-05-06 at 15:04 -0500, David Teigland wrote: On Mon, Apr 13, 2009 at 02:17:00PM -0500, David Teigland wrote: On Mon, Apr 13, 2009 at 12:10:33PM -0700, Steven Dake wrote: On Mon, 2009-04-13 at 13:35 -0500, David

[Openais] saCkptSectionIterationNext() error

2009-05-06 Thread David Teigland
I think we may have lost something in transit between irc/email/svn, Mar 26 16:10:20 dct confchg, node1 create ckpt, node2 open ckpt, node2 read ckpt - fail Mar 26 16:10:46 dct nodeid 1 creates the ckpt Mar 26 16:13:42 dct saCkptCheckpointOpen() works,

Re: [Openais] saCkptSectionIterationNext() error

2009-05-07 Thread David Teigland
On Thu, May 07, 2009 at 12:46:33AM -0700, Steven Dake wrote: On Wed, 2009-05-06 at 16:26 -0500, David Teigland wrote: I think we may have lost something in transit between irc/email/svn, Mar 26 16:10:20 dct confchg, node1 create ckpt, node2 open ckpt, node2 read

[Openais] cpg_dispatch BAD_HANDLE

2009-05-07 Thread David Teigland
I recently started getting BAD_HANDLE errors from cpg_dispatch() when leaving a cpg: - cpg_leave() - cpg_dispatch(handle, CPG_DISPATCH_ALL) - dispatch executes a confchg for the leave - dispatch returns 9 It doesn't break anything, but I'd like to avoid adding code to detect when I should or

Re: [Openais] [PATCH] fix delayed shutdown

2009-05-18 Thread David Teigland
On Mon, May 18, 2009 at 01:44:50PM +0100, Chrissie Caulfield wrote: Steven Dake wrote: I don't think this will be backwards compatible with whitetank. IMO use the memb_join_message_send function as outlined. If you can show it works with whitetank then looks good for commit. OK,

Re: [Openais] [corosync + openais] [patch] Dispatch return bad handle - proposed solution

2009-05-19 Thread David Teigland
On Tue, May 19, 2009 at 03:40:53PM +0200, Jan Friesse wrote: Hi, attached are proposed solution to *dispatch* functions, which returns CS_ERR_BAD_HANDLE (AIS_ERR_BAD_HANDLE (9)). David, can you please test them, and give results? Thanks, I tried the corosync patch, and cpg_dispatch error 9

Re: [Openais] [corosync trunk] fix confchg races in cpg

2009-05-28 Thread David Teigland
On Thu, May 21, 2009 at 07:36:28AM -0700, Steven Dake wrote: It is possible with 3+ nodes joining or leaving at same time for a configuration change to be delivered to the user which it is not meant for. This patch solves that problem. ack, using this patch I can't reproduce the problem

Re: [Openais] [corosync] [patch] - ckpt solution - Change of Makefile.am

2009-05-29 Thread David Teigland
On Wed, May 27, 2009 at 04:15:52PM +0200, Jan Friesse wrote: Hi, included is patch for Makefile.am of corosync, so coroipcc.o is no longer included in lib... directly, but rather *.so is a dependency, so ipc_hdb is no longer in multiple *.so and multiple times in binary what causes problem.

[Openais] new test prog

2009-05-29 Thread David Teigland
I wrote a new program cpgx to test the virtual synchrony guarantees of corosync and cpg, http://fedorapeople.org/gitweb?p=teigland/public_git/dct-stuff.git;a=summary It joins a cpg, then randomly sends messages, leaves or exits, and repeats. This all creates a random sequence of messages and

Re: [Openais] cpgx stuck

2009-06-03 Thread David Teigland
On Wed, Jun 03, 2009 at 04:28:27PM -0500, David Teigland wrote: Running cpgx -d1 on four nodes, where -d1 causes the test to periodically kill and restart corosync. When this kill/restart happens on one node, others are typically exiting/joining the cpg at the same time. The result

Re: [Openais] call for roadmap features for future releases

2009-06-22 Thread David Teigland
On Mon, Jun 22, 2009 at 09:26:06AM -0700, Steven Dake wrote: On Mon, 2009-06-22 at 10:59 -0500, David Teigland wrote: On Sat, Jun 20, 2009 at 11:51:40AM -0700, Steven Dake wrote: I invite all of our contributors to help define the X.Y roadmap of both corosync and openais. Please submit

Re: [Openais] change startup notice to Corosync Cluster Engine

2009-06-23 Thread David Teigland
On Mon, Jun 22, 2009 at 10:48:18PM -0700, Steven Dake wrote: While you're there, perhaps knock down the level of those messages so we don't see it all in /var/log/messages every time? Jun 22 14:58:12 bull-01 corosync[2343]: [MAIN ] Corosync Executive Service RELEASE 'trunk' Jun 22 14:58:12

Re: [Openais] [corosync] [patch] - Fix problems with long token timeout and cpg

2009-07-01 Thread David Teigland
On Wed, Jul 01, 2009 at 06:21:14PM +0200, Jan Friesse wrote: Included patch should fix https://bugzilla.redhat.com/show_bug.cgi?id=506255 . David, I hope it will fix problem for you. It's based on simple idea of adding node startup timestamp at the end of cpg_join (and joinlist) calls. If

Re: [Openais] [corosync] [patch] - Fix problems with long token timeout and cpg

2009-07-01 Thread David Teigland
On Wed, Jul 01, 2009 at 01:46:03PM -0500, David Teigland wrote: other nodes should immediately recognize it has previously failed and process a complete failure for it. i.e. the full equivalent to what apps (using any api's) would see if the node had failed via normal token timeout. Dave

Re: [Openais] [corosync] [patch] - Fix problems with long token timeout and cpg

2009-07-02 Thread David Teigland
On Thu, Jul 02, 2009 at 11:09:26AM -0700, Steven Dake wrote: On Thu, 2009-07-02 at 09:27 -0500, David Teigland wrote: On Thu, Jul 02, 2009 at 01:15:18PM +0200, Jan Friesse wrote: David Teigland wrote: On Wed, Jul 01, 2009 at 01:46:03PM -0500, David Teigland wrote: other nodes should

Re: [Openais] [corosync] - Allow only one connection per (node, pid, grp)

2009-07-20 Thread David Teigland
On Mon, Jul 20, 2009 at 10:03:36AM +0200, Jan Friesse wrote: Patch solves problem, when one process connect multiple times to one group by disallow this situation. Please see patch comment for more informations. David, do you agree, that this is how cpg should behave, or you would rather

[Openais] correlating events

2009-08-31 Thread David Teigland
Here are two related and troublesome problems that would be nice to fix, probably in future versions -- they probably can't be fixed maintaining existing apis and protocols (although adding new api's to help with them might be nice if possible). 1. correlating events from different services

Re: [Openais] correlating events

2009-08-31 Thread David Teigland
On Mon, Aug 31, 2009 at 02:28:33PM -0700, Steven Dake wrote: On Mon, 2009-08-31 at 15:44 -0500, David Teigland wrote: Here are two related and troublesome problems that would be nice to fix, probably in future versions -- they probably can't be fixed maintaining existing apis and protocols

Re: [Openais] correlating events

2009-09-02 Thread David Teigland
On Mon, Aug 31, 2009 at 02:28:33PM -0700, Steven Dake wrote: On Mon, 2009-08-31 at 15:44 -0500, David Teigland wrote: Here are two related and troublesome problems that would be nice to fix, probably in future versions -- they probably can't be fixed maintaining existing apis and protocols

Re: [Openais] correlating events

2009-09-11 Thread David Teigland
On Thu, Sep 10, 2009 at 04:11:28PM -0700, Steven Dake wrote: IMO the proper way to do this is to ensure whatever ringid was delivered in a callback to the application is the current ring id returned by the api. This gets rid of any races you describe above. I can't really think of any races

Re: [Openais] cherrypicking into flatiron discussion - post 1.1.0

2009-09-21 Thread David Teigland
On Mon, Sep 21, 2009 at 08:35:33AM -0700, Steven Dake wrote: 4) flatiron to trail trunk with bug resolution It appears waiting months to cherrypick patches doesn't produce a high quality flatiron that people can use continuously. I'm open to suggestions. One option is to set some time

[Openais] [PATCH] corosync/trunk QUORUM log message

2009-12-03 Thread David Teigland
This puts multiple nodeids on each [QUORUM] Members line instead of putting each nodeid on a separate line. With more than a few nodes the excessive lines become a real nuisance, and anyone up around 32 nodes may literally be scrolling through hundreds of those lines. Index: vsf_quorum.c
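The idea described above, several nodeids per log line rather than one line each, can be sketched roughly as follows. This is not the patch itself (which is cut off in this listing); `format_member_line` and its parameters are hypothetical names chosen for illustration, and the real change lives in vsf_quorum.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not taken from the actual patch.
 * Writes up to per_line nodeids from ids[start..count) into buf,
 * separated by single spaces. Returns the index of the first nodeid
 * that was NOT written, so the caller can continue from there.
 * If buf is too small for even one id, the return value equals start. */
size_t format_member_line(char *buf, size_t buflen,
                          const unsigned int *ids, size_t count,
                          size_t start, size_t per_line)
{
    assert(buf != NULL && buflen > 0 && per_line > 0);

    size_t used = 0;
    size_t i = start;

    buf[0] = '\0';
    while (i < count && i - start < per_line) {
        int n = snprintf(buf + used, buflen - used, "%s%u",
                         (i == start) ? "" : " ", ids[i]);
        if (n < 0 || (size_t)n >= buflen - used) {
            /* The id did not fit: cut it off and leave it for the next line. */
            buf[used] = '\0';
            break;
        }
        used += (size_t)n;
        i++;
    }
    return i;
}
```

A caller would loop while the returned index is below `count`, logging one `[QUORUM] Members` line per call. With 32 nodes and `per_line` set to 8, that is four lines instead of 32.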

[Openais] corosync-objctl **binary**

2010-01-12 Thread David Teigland
corosync-objctl used to print a lot of useful information which now appears only as **binary**. Is there a way to get that back? Perhaps two output modes, one where it prints binary values in hex and another where it makes a best effort to interpret and print the values in a useful form? Dave
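As a rough illustration of the first output mode suggested above (printing binary values in hex), here is a minimal sketch. It is not corosync-objctl code; `hex_encode` is a hypothetical helper, and how the tool would obtain the value bytes from objdb is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of corosync-objctl.
 * Encodes as many bytes of data as fit into out as lowercase hex,
 * always NUL-terminating. Returns the number of input bytes encoded. */
size_t hex_encode(const unsigned char *data, size_t len,
                  char *out, size_t outlen)
{
    static const char digits[] = "0123456789abcdef";
    size_t i = 0;

    assert(data != NULL && out != NULL && outlen > 0);

    /* Each byte needs two characters, plus one for the trailing NUL. */
    while (i < len && 2 * i + 2 < outlen) {
        out[2 * i]     = digits[data[i] >> 4];
        out[2 * i + 1] = digits[data[i] & 0x0f];
        i++;
    }
    out[2 * i] = '\0';
    return i;
}
```

The second mode, a best-effort interpretation of the value, would need type information for each key, which this sketch does not attempt.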

Re: [Openais] corosync-objctl **binary**

2010-01-13 Thread David Teigland
On Wed, Jan 13, 2010 at 02:49:53PM +1100, Angus Salkeld wrote: On Wed, Jan 13, 2010 at 6:06 AM, David Teigland teigl...@redhat.com wrote: corosync-objctl used to print a lot of useful information which now appears only as **binary**. Is there a way to get that back? Perhaps two output

[Openais] [QUORUM] This node is within the primary component and will provide service.

2010-02-19 Thread David Teigland
The corosync logs are so full of these messages that they end up being unhelpful. I think they could be made very helpful, though, if they were printed when the quorum state changed. Dave Index: exec/vsf_quorum.c === ---
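The suggestion above, logging only when the quorum state changes, amounts to remembering the last reported state. A minimal sketch follows, assuming a hypothetical `struct quorum_logger` and function name. The actual patch to exec/vsf_quorum.c is truncated in this listing and may look quite different, and the non-quorate message text here is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical state holder; zero-initialize before first use. */
struct quorum_logger {
    bool have_state;   /* false until the first state has been logged */
    bool quorate;      /* last state that was logged */
};

/* Writes a [QUORUM] line to out only if the state differs from the
 * last one logged (or nothing was logged yet). Returns true if a
 * line was written. */
bool log_quorum_if_changed(struct quorum_logger *lg, bool quorate, FILE *out)
{
    assert(lg != NULL && out != NULL);

    if (lg->have_state && lg->quorate == quorate)
        return false;              /* same state as before: stay quiet */

    lg->have_state = true;
    lg->quorate = quorate;
    fprintf(out, "[QUORUM] %s\n",
            quorate
            ? "This node is within the primary component and will provide service."
            : "This node is NOT within the primary component (illustrative text).");
    return true;
}
```

Calling this on every membership event logs one line per transition, so the log shows when quorum was gained or lost rather than repeating the current state.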

Re: [Openais] corosync - CPG callback with totem ringid + members

2010-02-22 Thread David Teigland
On Mon, Feb 22, 2010 at 06:00:21PM +0100, Jan Friesse wrote: Related to https://bugzilla.redhat.com/show_bug.cgi?id=529424 Patch implements new callback with current totem ring id and members. Included is modified testcpg using functionality. As required, callback is delivered AFTER all

Re: [Openais] does self-fencing makes sense?

2010-02-22 Thread David Teigland
On Fri, Feb 19, 2010 at 03:31:10PM -0700, Steven Dake wrote: There are millions of lines of C code involved in directing a power fencing device to fence a node. Generally in this case, the system directing the fencing is operating from a known good state. There are several hundred lines of

Re: [Openais] corosync - CPG callback with totem ringid + members

2010-02-26 Thread David Teigland
On Mon, Feb 22, 2010 at 06:00:21PM +0100, Jan Friesse wrote: +struct cpg_ring_id { + uint32_t nodeid; + uint64_t seq; +}; What do you think about combining this patch with the other patch that adds cpg_ringid_get()? It's troublesome to combine the two patches to test. +typedef void

Re: [Openais] corosync - CPG callback with totem ringid + members

2010-03-02 Thread David Teigland
On Tue, Mar 02, 2010 at 11:10:49AM +0100, Jan Friesse wrote: I'll give you example. Let's say, you have 3 nodes (a,b,c). B,C are joined in group EXAMPLE. Now, A will fall ... you will not get normal confchg, because A was not in group. Now on B, you will run new cpg process joined to group. If

Re: [Openais] corosync - CPG model_init + callback with totem ringid and members

2010-04-07 Thread David Teigland
On Tue, Apr 06, 2010 at 02:05:00PM +0200, Jan Friesse wrote: Same patch but rebased on top of Steve's change (today trunk). Thanks, this is mostly working well, but I've found one problem, and one additional thing I need (mentioned on irc already): 1. When a node joins, I get the totem callback

Re: [Openais] corosync - CPG model_init + callback with totem ringid and members

2010-04-08 Thread David Teigland
On Thu, Apr 08, 2010 at 04:15:06PM +0100, Christine Caulfield wrote: On 08/04/10 15:57, Jan Friesse wrote: Included is patch solving 2nd problem. In first problem, I agree with Chrissie, and really don't have any single idea how to make regular confchg precede totem_confchg. We can't.

Re: [Openais] corosync - CPG model_init + callback with totem ringid and members

2010-04-08 Thread David Teigland
On Thu, Apr 08, 2010 at 04:57:22PM +0200, Jan Friesse wrote: Included is patch solving 2nd problem. Thanks, it works for me. In first problem, I agree with Chrissie, and really don't have any single idea how to make regular confchg precede totem_confchg. I've stepped through things and it

Re: [Openais] corosync - CPG model_init + callback with totem ringid and members

2010-04-09 Thread David Teigland
On Fri, Apr 09, 2010 at 09:33:30AM +0200, Jan Friesse wrote: Dave, Oh, and I may have just invented a time machine by merging partitioned clusters! 1270661597 cluster node 1 added seq 2128 1270661597 fenced:daemon conf 3 1 0 memb 1 2 4 join 1 left 1270661597 cpg_mcast_joined retried 4

[Openais] stuck on sem_timedwait

2010-04-13 Thread David Teigland
When corosync exits, my application (fenced) gets stuck. # strace -p 2005 Process 2005 attached - interrupt to quit restart_syscall(... resuming interrupted call ...) = -1 ETIMEDOUT (Connection timed out) poll([{fd=14, events=0}], 1, 0) = 1 ([{fd=14, revents=POLLNVAL}])

Re: [Openais] stuck on sem_timedwait

2010-04-14 Thread David Teigland
On Wed, Apr 14, 2010 at 12:57:14PM +0200, Jan Friesse wrote: David, in that case, corosync exits (so it is really not running) or not? Yep, the corosync process is gone. David Teigland wrote: When corosync exits, my application (fenced) gets stuck. # strace -p 2005 Process 2005

Re: [Openais] corosync - CPG model_init + callback with totem ringid and members

2010-04-19 Thread David Teigland
On Thu, Apr 08, 2010 at 04:57:22PM +0200, Jan Friesse wrote: commit 0d509f4bf23f618c940c3bcdd7cf0e97faf64876 Author: Jan Friesse jfrie...@redhat.com Date: Thu Apr 8 16:48:45 2010 +0200 CPG model_initialize and ringid + members callback Patch adds new function to initialize

[Openais] segfault in objdb

2010-04-21 Thread David Teigland
I'm using trunk svnversion 2770. I ran 'service cman start' on four nodes, which I do all the time, and one segfaulted here, Core was generated by `corosync -f'. Program terminated with signal 11, Segmentation fault. #0 0x7f1437774eb9 in object_find_next (

Re: [Openais] [PATCH orosync] select a new sync member if the node with the lowest nodeid has left.

2010-04-22 Thread David Teigland
On Thu, Apr 22, 2010 at 11:06:19AM +1000, Angus Salkeld wrote: Problem: Under certain circumstances cpg does not send group leave messages. With a big token timeout (tested with token == 5min). 1 start all nodes 2 start ./test/testcpg on all nodes 2 go to the node with the lowest nodeid

Re: [Openais] [PATCH orosync] select a new sync member if the node with the lowest nodeid has left.

2010-04-22 Thread David Teigland
On Thu, Apr 22, 2010 at 04:35:08PM -0500, David Teigland wrote: On Thu, Apr 22, 2010 at 11:06:19AM +1000, Angus Salkeld wrote: Problem: Under certain circumstances cpg does not send group leave messages. With a big token timeout (tested with token == 5min). 1 start all nodes 2

[Openais] [TOTEM ] A processor joined or left the membership and a new membership was formed.

2010-04-23 Thread David Teigland
I'm always looking for ways to make debugging/diagnosing corosync easier since it's notoriously difficult. I've always just ignored the messages in the subject line; they seem more or less equivalent to "something happened". (The length of corosync messages tend to be inversely proportional to

Re: [Openais] Announcing Corosync 1.3.0

2011-01-13 Thread David Teigland
On Thu, Jan 13, 2011 at 08:09:13AM -0700, Steven Dake wrote: On 01/13/2011 08:03 AM, Lars Marowsky-Bree wrote: On 2010-12-01T14:18:25, Steven Dake sd...@redhat.com wrote: Corosync 1.3.0 is available for immediate download from our website. This version brings many enhancements to the

Re: [Openais] cpg behavior on transitional membership change

2011-09-02 Thread David Teigland
On Fri, Sep 02, 2011 at 10:30:53AM -0700, Steven Dake wrote: On 09/02/2011 12:59 AM, Vladislav Bogdanov wrote: Hi all, I'm trying to further investigate problem I described at https://www.redhat.com/archives/cluster-devel/2011-August/msg00133.html The main problem for me there is