Steven Dake wrote:
When a process pauses for longer than the token timeout, the other
processors in the system form a new ring. The remaining processor then
eventually reschedules and processes the pending membership multicast
messages in its kernel queues. This wreaks havoc on the
Chrissie Caulfield wrote:
Steven Dake wrote:
On Thu, 2009-06-25 at 16:25 +0100, Chrissie Caulfield wrote:
Steven Dake wrote:
The change is a good idea, but I'd rather not stick more error code
handling requirements on the user. Can't we just use try again instead?
Hmmm
ERR_INTERRUPT
Currently coroipcc detects EINTR returns from poll() etc and simply
retries the operation without informing the clients.
I think the clients need to know a signal has been detected. Many
daemons trap SIGINT to help them shutdown cleanly, and this used to
work. Now they get the signal delivered
think. Also I
don't really think that ERR_TRY_AGAIN properly captures what has happened.
But I'm not going to get anal about this. Does anyone else have an opinion?
Chrissie
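The two behaviours under discussion can be sketched as follows. This is a minimal illustration, not the coroipcc code: the function names and the `SKETCH_*` result codes are hypothetical and do not correspond to the real corosync error values.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical result codes for this sketch; not the corosync error codes. */
enum sketch_result {
	SKETCH_OK = 0,
	SKETCH_TIMEOUT,
	SKETCH_ERR_INTERRUPT,
	SKETCH_ERR_LIBRARY
};

/* Behaviour described as current: swallow EINTR and poll again. */
enum sketch_result wait_retry_sketch(struct pollfd *pfd, int timeout_ms)
{
	int res;

	do {
		res = poll(pfd, 1, timeout_ms);
	} while (res == -1 && errno == EINTR);

	if (res == -1)
		return SKETCH_ERR_LIBRARY;
	return res == 0 ? SKETCH_TIMEOUT : SKETCH_OK;
}

/* Proposed behaviour: report to the caller that a signal interrupted the wait. */
enum sketch_result wait_report_sketch(struct pollfd *pfd, int timeout_ms)
{
	int res = poll(pfd, 1, timeout_ms);

	if (res == -1)
		return errno == EINTR ? SKETCH_ERR_INTERRUPT : SKETCH_ERR_LIBRARY;
	return res == 0 ? SKETCH_TIMEOUT : SKETCH_OK;
}
```

With the second variant, a daemon that traps SIGINT gets control back and can shut down cleanly instead of staying blocked inside the library.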
On Thu, 2009-06-25 at 10:03 +0100, Chrissie Caulfield wrote:
Currently coroipcc detects EINTR returns from poll() etc
Fabio M. Di Nitto wrote:
gcc -DHAVE_CONFIG_H -I. -I../include/corosync -I../include -I../include
-I/usr/include/nss -I/usr/include/nspr -fPIC -g -O2 -O3 -ggdb3
-Wall -Wshadow -Wmissing-prototypes -Wmissing-declarations
-Wstrict-prototypes -Wdeclaration-after-statement -Wpointer-arith
There is a bad comparison in corosync-cfgtool -a which means that it
will never display any output!
The attached patch fixes.
Thanks to Dave for finding this
--
Chrissie
Index: tools/corosync-cfgtool.c
===
---
Steven Dake wrote:
totemip uses a GNU glibc-ism (s6_addr16 in the in6_addr struct) which is
not available on non-GLIBC systems (solaris specifically).
This patch changes the compare to compare 8 bits at a time instead of
16.
Chrissie, is there a reason the compare was being done 16 bits at
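A portable byte-at-a-time comparison of the kind described might look like the sketch below. It relies only on the `s6_addr` member, which POSIX requires in `struct in6_addr`; the function name is hypothetical and this is not the totemip patch itself.

```c
#include <stddef.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: compares two IPv6 addresses 8 bits at a time.
 * Returns -1, 0 or 1, ordering the addresses by their network-order bytes.
 */
int in6_addr_compare_sketch(const struct in6_addr *a, const struct in6_addr *b)
{
	size_t i;

	for (i = 0; i < sizeof(a->s6_addr); i++) {
		if (a->s6_addr[i] != b->s6_addr[i])
			return a->s6_addr[i] < b->s6_addr[i] ? -1 : 1;
	}
	return 0;
}
```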
Steven Dake wrote:
good for merge
regards
-steve
Committed revision 2207.
On Thu, 2009-06-04 at 14:58 +0100, Chrissie Caulfield wrote:
Currently corosync-keygen will fail if /etc/ais exists. This is absurd
surely. I appreciate that it needs to create the directory if it doesn't
exist
Andrew Beekhof wrote:
This is a re-post of an earlier patch that decouples shutdown/startup
order from the objdb order.
This is needed as the objdb order will change as modules are
loaded/unloaded and is also set up to unload non-default services last
(which is the opposite of what something
Andrew Beekhof wrote:
Some minor fixes that allow building on OSX.
Please ACK/NACK
It works for me.
ACK
--
Chrissie
___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais
Mansuri Yunus wrote:
Hi All,
I am getting below error
Jun 2 14:56:48 openais[4417]: [MAIN ] Received message has invalid
digest... ignoring.
Jun 2 14:56:48 openais[4417]: [MAIN ] Invalid packet data
The gfs cluster is running fine but every 20 Sec I am getting this error
Message
Steven Dake wrote:
good for merge
thanks for doing the port
Committed revision 2202.
regards
-steve
On Wed, 2009-05-27 at 11:03 +0100, Chrissie Caulfield wrote:
Steven Dake wrote:
Instead of specifying mcastaddr, broadcast: yes can be set in
openais.conf to allow openais to use
Jan Friesse wrote:
Attached is the second version of the patch. I tested this 300% harder than
the previous patch, so I hope it will work.
ACK. This fixes a similar problem I was seeing too.
David Teigland wrote:
On Wed, May 27, 2009 at 04:15:52PM +0200, Jan Friesse wrote:
Hi,
included is
Steven Dake wrote:
Instead of specifying mcastaddr, broadcast: yes can be set in
openais.conf to allow openais to use the broadcast address instead of a
multicast address.
This patch ports almost effortlessly to corosync. And it works too :-)
Chrissie
Index: include/corosync/totem/totem.h
The deliver_fn in the YKD quorum plugin has not been adapted for the new
tpg API. This patch fixes:
--
Chrissie
Index: exec/vsf_ykd.c
===
--- exec/vsf_ykd.c (revision 2194)
+++ exec/vsf_ykd.c (working copy)
@@ -336,14 +336,14 @@
Steven Dake wrote:
I don't think this will be backwards compatible with whitetank. IMO use
the memb_join_message_send function as outlined. If you can show it
works with whitetank then looks good for commit.
OK, here's a new patch that doesn't create a new message type. The
reason I had
It's very useful when reading debug logs, to know which members of the
cluster are included in quorum. Also it's possible that someone might
not have Clm loaded and that's about the only subsystem that prints out
members lists after a transition.
This patch adds a member list print to quorum.
Steven Dake wrote:
good for merge.
Thanks :-)
On Tue, 2009-05-12 at 09:57 +0100, Chrissie Caulfield wrote:
Sorry to be anal, but things like this really get on my nerves after a
while...
This patch converts votequorum to use the proper message delivery API
rather than the tpg_ calls. It also removes a lot of mess around those
calls, such as headers and things that were needed for cman
compatibility but which we will not need.
It also fixes some handle changes that did not get
can build it.
Regards
-steve
On Thu, 2009-04-23 at 11:46 +0100, Chrissie Caulfield wrote:
Jim Meyering wrote:
Chrissie Caulfield wrote:
This pair of patches is to allow people to upgrade the cryptographic
code from the built-in SOBER128 in openais to libnss in corosync.
Hi Chrissie
Committed revision 2131.
Steven Dake wrote:
good for merge
thanks
-steve
On Thu, 2009-04-23 at 14:58 +0100, Chrissie Caulfield wrote:
This patch removes the (now redundant) call to sync_primary_callback_fn
when quorum changes.
As we established a while ago, quorum is independent
This pair of patches is to allow people to upgrade the cryptographic
code from the built-in SOBER128 in openais to libnss in corosync.
The patch for openais is needed to make it work with corosync - it
simply adds code to allow it to handle the extended packets that
corosync sends with the crypto
Fabio M. Di Nitto wrote:
Hi,
several times it's been reported that make install will override
the installed configuration.
This is wrong.
Instead of shipping/installing corosync.conf (which needs to be customized
in 99.9% of cases), ship it as corosync.conf.example and let the
Fabio M. Di Nitto wrote:
On Wed, 2009-04-15 at 14:35 -0500, David Teigland wrote:
If I run 'cman_tool leave' on four nodes in parallel, node1 will leave right
away, but the other three nodes don't leave until the token timeout expires
for node1 causing a confchg for it, after which the other
Joel Becker wrote:
Steve, Dave, etc,
Someone told me a while back that a node joining a cpg group
would be by its lonesome in the join message. That is, when the node
gets its first confchg, it will be the only node in the list of joins.
I've been using this to detect the first joiner
Steven Dake wrote:
On Tue, 2009-04-07 at 16:53 +0200, Jan Friesse wrote:
Attached is a workaround for the cpg.c bug causing sig...
Why do I call it a workaround and not a patch? Because it's not a real patch
solving the root of the problem.
This doesn't solve the root of the problem.
IMO there is a design
Jim Meyering wrote:
While outlining an API change for coroipcc_dispatch_recv,
I noticed that locks are released in two cases in confdb.c,
for which they are not released in nearly identical code in cfg.c's
corosync_cfg_dispatch function.
I don't know enough about the surrounding code and
Dietmar Maurer wrote:
I am playing around with corosync/openais and clvmd-openais. So far it
works. But when I stop the corosync process (or if it gets stopped by a
SIGSEGV), clvmd-openais is completely unusable.
Like any openais client, clvmd simply connects to the corosync
process at
Jan Friesse wrote:
Attached patch solves problem with running corosync as ais user.
Main problem was hidden in reading aisexec section. If this section
exists in corosync.conf, everything works, but in other cases,
main_config->uid/gid are initialized to 0 (so only root:root) can run
Steven Dake wrote:
Chrissie/others,
Dave reported a bug today which is basically that sync is broken. I
found the root cause to be some integration issues between quorum and
sync.
After thinking about it today, I came to the conclusion that sync should
always happen regardless of quorum
This patch includes the MSG_NOSIGNAL patch posted by Steven earlier but
adds a couple of other things needed to get IPC working on Mac OS/X.
--
Chrissie
Index: exec/coroipcs.c
===
--- exec/coroipcs.c (revision 1909)
+++
The IPC system simply concatenates SOCKETDIR with run/socketname so if
the user forgets to add a trailing slash to the name: eg
./configure --with-socket-dir=/var/run
then the socket is created as /var/runcorosync.ipc
This patch adds the slash into the name generation printf.
--
Chrissie
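The fix can be illustrated with a small sketch. The function name and signature are hypothetical; only the idea of putting the slash into the format string comes from the message above.

```c
#include <stdio.h>

/*
 * Hypothetical helper: builds "<socketdir>/<name>" into buf.
 * Returns 0 on success, -1 if the result did not fit or formatting failed.
 */
int build_socket_path_sketch(char *buf, size_t len,
	const char *socketdir, const char *name)
{
	/* The '/' lives in the format, so "/var/run" no longer yields "/var/runcorosync.ipc". */
	int n = snprintf(buf, len, "%s/%s", socketdir, name);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

If the user did supply a trailing slash, the result contains a doubled slash in the middle of the path (for example `/var/run//corosync.ipc`), which POSIX pathname resolution treats the same as a single one.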
For interest here's my cpgbench results for a 2.33Ghz Core2 Duo Macbook
pro with 3GB RAM
171372 messages received 1000 bytes per write 10.000 Seconds runtime
17137.464 TP/s 17.137 MB/s.
164848 messages received 2000 bytes per write 10.000 Seconds runtime
16485.027 TP/s 32.970 MB/s.
154322
Jim Meyering wrote:
Chrissie Caulfield wrote:
This patch includes the MSG_NOSIGNAL patch posted by Steven earlier but
adds a couple of other things needed to get IPC working on Mac OS/X.
Hi Chrissie,
This looks fine.
A question and a suggestion:
Index: exec/main.c
Committed revision 1910.
Chrissie Caulfield wrote:
The IPC system simply concatenates SOCKETDIR with run/socketname so if
the user forgets to add a trailing slash to the name: eg
./configure --with-socket-dir=/var/run
then the socket is created as /var/runcorosync.ipc
This patch adds
Committed revision 1911.
I had to use
#if MSG_NOSIGNAL == 0
rather than
#ifndef because it's defined to be zero to make the IPC code work.
Chrissie
Chrissie Caulfield wrote:
Jim Meyering wrote:
Chrissie Caulfield wrote:
This patch includes the MSG_NOSIGNAL patch posted by Steven earlier
Fabio M. Di Nitto wrote:
Set the log level to default with all the other services.
ACK
That sounds like debugging code that got checked in by mistake !
Thanks,
Chrissie
Fabio M. Di Nitto wrote:
On Tue, 2009-03-17 at 09:18 +, Chrissie Caulfield wrote:
Fabio M. Di Nitto wrote:
Patch in attachment removes hardcoded /var and uses localstatedir
instead.
My personal preference would be to split out the sockets and the real
'state'. But I seem
Fabio M. Di Nitto wrote:
Patch in attachment removes hardcoded /var and uses localstatedir
instead.
My personal preference would be to split out the sockets and the real
'state'. But I seem to be in a minority here.
Chrissie
Fabio M. Di Nitto wrote:
As requested by Christine,
add support for --with-socket-dir.
The patch in attachment changes the configure system to understand the
option, propagate SOCKETDIR in Makefile.am and into pkg-config snippet
for external entities.
It's my duty to ACK this patch ;-)
I had three GFS filesystems all mounted on 13 nodes. When I went to
umount them I got the following crash on 5 nodes of the system:
(gdb) bt
#0 0x7f21baeb0f05 in raise () from /lib64/libc.so.6
#1 0x7f21baeb2a73 in abort () from /lib64/libc.so.6
#2 0x7f21baef0438 in
Priyanka Ranjan wrote:
Hi All,
I am using a SUSE 11 (openais + pacemaker) cluster. Do we have any way to
get the nodeid, or any other way to uniquely identify a node in the cluster?
Hi,
You can get the local nodeid using the libcpg call cpg_local_get()
Chrissie
Vivek Purohit wrote:
Hi Steve,
I have been able to use openAis from root
but was unable to run aisexec as UIDGID
ais.
I simply made a group ais and added one user
in this group as ais.
Then logged out from root and tried to run aisexec
from ais, but was unable to run as some error
ACK for the .c file changes
(I don't feel competent to check the makefile pieces... so I'll trust
you on those)
Chrissie
Fabio M. Di Nitto wrote:
Hi guys,
this one needs an ACK since it changes a few lines of code around.
Patch does:
configure.ac:
- Fix white space for --help.
-
Steven Dake wrote:
Merged 1743.
Does this fix the other issues Dave was seeing?
No, I don't think so. I'm looking into those now. I discovered that one
while I was investigating dave's troubles!
On Mon, 2009-03-09 at 16:13 +, Chrissie Caulfield wrote:
CPG has a stupid typo which
Quorum checks the ring ID is new before initiating a sync. Unfortunately
it copies the ring ID BEFORE checking it so there is always a match.
Sigh
This patch fixes it.
Chrissie
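The ordering bug described above can be sketched like this. The struct layout, the variable and the function names are assumptions made for illustration; this is not the vsf_quorum.c code.

```c
/* Hypothetical ring ID layout for this sketch. */
struct ring_id_sketch {
	unsigned int rep;
	unsigned long long seq;
};

static struct ring_id_sketch saved_ring_id;

static int ring_id_equal(const struct ring_id_sketch *a,
	const struct ring_id_sketch *b)
{
	return a->rep == b->rep && a->seq == b->seq;
}

/* Buggy order: copy first, then compare. The IDs always match. */
int ring_changed_buggy(const struct ring_id_sketch *new_id)
{
	saved_ring_id = *new_id;
	return !ring_id_equal(&saved_ring_id, new_id);
}

/* Fixed order: compare against the previous ID, then remember the new one. */
int ring_changed_fixed(const struct ring_id_sketch *new_id)
{
	int changed = !ring_id_equal(&saved_ring_id, new_id);

	saved_ring_id = *new_id;
	return changed;
}
```

With the buggy order the function never reports a change, so a sync that depends on it would never be started.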
Index: vsf_quorum.c
===
--- vsf_quorum.c (revision
--
Chrissie
Index: test/testconfdb.c
===
--- test/testconfdb.c (revision 1809)
+++ test/testconfdb.c (working copy)
@@ -115,7 +115,7 @@
char error_string[1024];
/* Add a scratch object and put some keys into it */
- res =
Steven Dake wrote:
Patch looks good minus the bit about strdup() of the config_iface. The
definition for that can instead be changed to const (which i have done
many patches ago) and it should compile without warning.
Actually that fixes another bug, where the later code will do a strtok
and
CPG has a stupid typo which stops confchg from working correctly if a
node has a nodeid 0xff
This bug is fixed in corosync after version 0.92.
Chrissie
Index: exec/cpg.c
===
--- exec/cpg.c (revision 1741)
+++ exec/cpg.c
that is there.
Regards
-steve
On Fri, 2009-02-27 at 11:30 +, Chrissie Caulfield wrote:
The IPC patch broke CFG shutdown in several places, this patch fixes
all of them.
In particular, cfg_try_shutdown asks all applications that are
registered for callbacks if they approve the shutdown
Chrissie Caulfield wrote:
The IPC patch broke CFG shutdown in several places, this patch fixes
all of them.
In particular, cfg_try_shutdown asks all applications that are
registered for callbacks if they approve the shutdown. This caused a bit
of a re-entrancy problem because it also
The current object database allows duplicate key names per object. This
is a bit of a nightmare to manage and provides no useful functionality
that I can see. Making keys unique has been discussed on IRC several
times and there seem to be no objections...so here is the patch:
Note that I have
Committed revision 1783.
Steven Dake wrote:
thanks for the work
good for merge
Regards
-steve
On Thu, 2009-02-26 at 13:22 +, Chrissie Caulfield wrote:
The current object database allows duplicate key names per object. This
is a bit of a nightmare to manage and provides no useful
Chrissie Caulfield wrote:
OK,
I've fixed the quorum & votequorum libraries (there was an '&' missing
in the connect call!). As it stands they still crash corosync though,
because it doesn't validate the module ID. As quorum & votequorum are
not loaded by default, ipc dereferences a NULL
When a quorum device registers it tells the corosync quorum engine of
the new quorum which then tries to do a new sync(). But that's no use
because the nodelist and ring_id is identical to before. Also it can try
and register while a sync is already in operation ... which gets it
awfully stuck!
-02-18 at 14:11 +, Chrissie Caulfield wrote:
Steven Dake wrote:
try using touch on the file coroipc.h
I don't think that's enough (well, it isn't, I tried!) - looking at the
diff:
Index: include/corosync/coroipc.h
This seems to be biting a lot of people, so I propose that cpg is
allowed to send messages on an inquorate cluster
--
Chrissie
Index: services/cpg.c
===
--- services/cpg.c (revision 1767)
+++ services/cpg.c (working copy)
@@ -263,6
Committed revision 1771.
Steven Dake wrote:
good for merge
thanks!
-steve
On Thu, 2009-02-19 at 15:33 +, Chrissie Caulfield wrote:
This seems to be biting a lot of people, so I propose that cpg is
allowed to send messages on an inquorate cluster
, 2009-02-18 at 13:32 +, Chrissie Caulfield wrote:
Steven Dake wrote:
Here is the first stab at the whitetank IPC forward port into corosync.
Currently CPG and EVS appear to work properly while other services have
various segfaults with the test cases.
Patches appreciated:)
I can't get
Committed revision 1759.
Thanks,
Steven Dake wrote:
great work patch good for merge
thanks
-steve
On Fri, 2009-02-13 at 09:11 +, Chrissie Caulfield wrote:
I think it's handy to have a _local_get() call in most libraries. Lots
of applications only use one library and most need
Here's an updated patch that works against the latest whitetank (with
new IPC system) version 1698.
Chrissie
Steven Dake wrote:
I will commit.
Thanks for the patch.
Regards
-steve
On Wed, 2009-01-28 at 14:12 +, Chrissie Caulfield wrote:
As promised here's the patch to add
Alan Conway wrote:
Chrissie Caulfield wrote:
Alan Conway wrote:
I've got a cluster app that works fine with corosync alone. However if
I start corosync via cman I get errno 11 - access denied from
cpg_init. Is there some different security rule when cman is in play?
cman enables quorum
As promised here's the patch to add a limited subset of confdb
facilities to whitetank.
--
Chrissie
Index: test/Makefile
===
--- test/Makefile (revision 1672)
+++ test/Makefile (working copy)
@@ -33,7 +33,7 @@
#
include
The attached patch fixes two memory leaks in confdb & objdb.
The one in objdb occurred because object_find_destroy wasn't implemented!
The one in confdb occurred because object_find_destroy wasn't called if
object_find_next returned an error the first time it was invoked (ie
there were no subobjects)
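The confdb leak pattern, and the shape of the fix, can be sketched with a stand-in iterator. Every name below is hypothetical; the real objdb find API has different signatures, and the handle counter exists only to make the leak observable.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an objdb find handle. */
struct find_handle_sketch {
	int remaining;
};

/* Number of find handles created but not yet destroyed. */
static int live_find_handles;

struct find_handle_sketch *find_create_sketch(int subobjects)
{
	struct find_handle_sketch *h = malloc(sizeof(*h));

	if (h != NULL) {
		h->remaining = subobjects;
		live_find_handles++;
	}
	return h;
}

/* Returns 0 while subobjects remain, -1 when there are none left. */
int find_next_sketch(struct find_handle_sketch *h)
{
	if (h->remaining == 0)
		return -1;
	h->remaining--;
	return 0;
}

void find_destroy_sketch(struct find_handle_sketch *h)
{
	free(h);
	live_find_handles--;
}

int count_subobjects_sketch(int subobjects)
{
	struct find_handle_sketch *h = find_create_sketch(subobjects);
	int count = 0;

	if (h == NULL)
		return -1;
	while (find_next_sketch(h) == 0)
		count++;
	/* Reached on every path, including when the very first find_next fails. */
	find_destroy_sketch(h);
	return count;
}
```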
Fabio M. Di Nitto wrote:
hi,
when changing logging configuration at runtime (such as on reload
operations), we want to make sure to release the fd associated with the
logfile every time it is not required.
There are 2 cases that we need to handle:
- if the user decides to stop logging
Fabio M. Di Nitto wrote:
Hi,
the main logsys config was performed within main(). This is not really
correct.
the 3 calls to logsys_config belong in the readlogging function for 2
reasons: coherency with all subsystems logsys_config init and reload
operation.
the call to
Fabio M. Di Nitto wrote:
Hi,
<warning>this looks like a heavy patch</warning>
The only reason why main.c carries all this code and knowledge about
user/group id is to be able to drop privileges (that in the actual code
is still disabled anyway), and pass the group id to the IPC init system.
Committed revision 1741.
And I also changed the copyright dates on those files too.
Steven Dake wrote:
:) good for merge
thanks
-steve
On Fri, 2009-01-23 at 10:50 +, Chrissie Caulfield wrote:
The attached patch fixes two memory leaks in confdb & objdb.
The one in objdb occurred because
Steven Dake wrote:
Without calling endpwent & friends, the calls that retrieve the password
entry will leak memory permanently on some platforms.
That sounds like a bug in libc. And having seen a few libcs in my time I
am unsurprised...
On Fri, 2009-01-23 at 14:05 +, Chrissie Caulfield
Committed revision 1725.
Steven Dake wrote:
good for merge
regards
-steve
On Tue, 2009-01-13 at 11:12 +, Chrissie Caulfield wrote:
Some library functions in cfg.c are missing pthread_mutex_lock calls.
The offending function has subsequently been copied and pasted to make
new
OK, here's a patch to tidy up all the hideous bicapitalised names in the
CFG service
--
Chrissie
Index: services/cfg.c
===
--- services/cfg.c (revision 1726)
+++ services/cfg.c (working copy)
@@ -1,6 +1,6 @@
/*
* Copyright (c)
Some library functions in cfg.c are missing pthread_mutex_lock calls.
The offending function has subsequently been copied and pasted to make
new functions and propagated the error!
This patch adds the missing calls.
--
Chrissie
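The locking pattern the patch restores looks roughly like the sketch below. The struct, its field names and the function are hypothetical stand-ins rather than the lib/cfg.c code; the counter increment stands in for sending a request and reading the reply.

```c
#include <pthread.h>

/* Hypothetical per-connection instance. */
struct cfg_inst_sketch {
	pthread_mutex_t response_mutex;
	int requests_sent;
};

/* Returns 0 on success, -1 if the mutex could not be taken. */
int cfg_request_sketch(struct cfg_inst_sketch *inst)
{
	if (pthread_mutex_lock(&inst->response_mutex) != 0)
		return -1;

	/* Critical section: in the real library, send the request and wait for the reply. */
	inst->requests_sent++;

	pthread_mutex_unlock(&inst->response_mutex);
	return 0;
}
```

Any function copied from one that lacks the lock/unlock pair inherits the same race, which is how the error spread.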
Index: lib/cfg.c
Andrew Beekhof wrote:
On Jan 8, 2009, at 7:04 AM, Steven Dake wrote:
On Wed, 2009-01-07 at 15:10 +0100, Andrew Beekhof wrote:
On Jan 7, 2009, at 2:58 PM, Steven Dake wrote:
On Wed, 2009-01-07 at 13:55 +0100, Andrew Beekhof wrote:
On Jan 7, 2009, at 1:43 PM, Steven Dake wrote:
Yes.
We
Steven Dake wrote:
On Wed, 2009-01-07 at 20:08 +0100, Geert Jansen wrote:
Hi,
calling cpg_membership_get() results in a segmentation fault on my
system (latest Fedora 10, latest Corosync SVN). Upon inspecting the
code i noticed that:
* It assumes that group_name is an input parameter,
Due to a typo or maybe a cut-n-pasteo in the Makefile, make install
always overwrites an installed corosync.conf
The attached patch fixes this.
--
Chrissie
Index: Makefile
===
--- Makefile (revision 1717)
+++ Makefile (working copy)
Committed revision 1723.
Steven Dake wrote:
my guess is typo and oversight :)
please merge
thanks again!
-steve
On Thu, 2009-01-08 at 11:14 +, Chrissie Caulfield wrote:
Due to a typo or maybe a cut-n-pasteo in the Makefile, make install
always overwrites an installed corosync.conf
Angus Salkeld wrote:
Coverity says it might be possible for buf to be NULL when
dereferenced in send_group_list_callbacks().
I have just added a continue.
I reckon it's impossible for buf to be NULL at that point. It's
allocated in the realloc() above it and that can only be skipped if
there
Committed revision 1712.
Steven Dake wrote:
patch looks good for commit.
Not sure why this was changed.
Regards
-steve
On Wed, 2008-12-17 at 14:44 +, Chrissie Caulfield wrote:
Despite what it says in the logs, if there is an error corosync always
exits with status code 1
If objdb is reloaded, then we re-parse the logging options.
This allows logging to be enabled/disabled without restarting corosync
--
Chrissie
Index: exec/mainconfig.c
===
--- exec/mainconfig.c (revision 1708)
+++