Re: [Openais] Pb with last corosync releases available for RHEL5 ...

2010-05-24 Thread Vladislav Bogdanov
Hi all, Sorry for being out of References, just subscribed. On Fri, 2010-05-21 at 16:19 +0200, Alain.Moulle wrote: Hi These new releases of corosync do not start successfully on RHEL5: corosync-1.2.2-1.1.el5 corosynclib-1.2.2-1.1.el5 I've attached the message trace. whereas on same

Re: [Openais] Pb with last corosync releases available for RHEL5 ...

2010-05-24 Thread Vladislav Bogdanov
25.05.2010 02:01, Steven Dake wrote: On 05/24/2010 02:51 PM, Vladislav Bogdanov wrote: Hi all, Sorry for being out of References, just subscribed. On Fri, 2010-05-21 at 16:19 +0200, Alain.Moulle wrote: Hi These new releases of corosync do not start successfully on RHEL5

Re: [Openais] corosync trunk - resolve problems with failed to receive logic

2010-07-22 Thread Vladislav Bogdanov
:21:4f:2a:7d Aggregator ID: 1 - Regards -steve On Wed, Jul 21, 2010 at 10:44 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 03.06.2010 22:42, Steven Dake wrote: The failed to receive logic in totem

Re: [Openais] corosync trunk - resolve problems with failed to receive logic

2010-07-24 Thread Vladislav Bogdanov
. BTW, I don't know where that infinite loop in fplay originates, but I suspect it could come from data that is possibly broken... 22.07.2010 14:48, Vladislav Bogdanov wrote: 22.07.2010 08:50, Steven Dake wrote: On Wed, Jul 21, 2010 at 11:21 PM, Steven Dake steven.d...@gmail.com

[Openais] corosync and openais initscripts

2010-08-24 Thread Vladislav Bogdanov
Hi all, is there any rationale behind having # chkconfig: - 20 20 in the initscript? Shouldn't it be # chkconfig: - 20 80? Now both corosync and openais stop too early (20), and (should be) unable to stop if pacemaker ver1 API is used (pacemaker has 90 90, but this is for another mailing list).
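A minimal sketch of the ordering question raised in this message, assuming the usual SysV convention that the two numbers on a chkconfig line are the start and stop priorities and that stop scripts run in ascending priority order. The helper names `parse_chkconfig` and `stops_before` are hypothetical and are not part of chkconfig or corosync; the header values are the ones quoted in the message.

```python
import re

# "# chkconfig: <runlevels> <start priority> <stop priority>"
CHKCONFIG_RE = re.compile(r"^#\s*chkconfig:\s*(\S+)\s+(\d+)\s+(\d+)\s*$")


def parse_chkconfig(line):
    """Return (runlevels, start_priority, stop_priority) from a chkconfig header line."""
    match = CHKCONFIG_RE.match(line.strip())
    if match is None:
        raise ValueError("not a chkconfig header: %r" % line)
    runlevels, start, stop = match.groups()
    return runlevels, int(start), int(stop)


def stops_before(first, second):
    """True if `first` is stopped before `second` (lower stop priority runs earlier)."""
    return first[2] < second[2]


# Values quoted in the message; the service names are only labels.
corosync_current = parse_chkconfig("# chkconfig: - 20 20")
corosync_proposed = parse_chkconfig("# chkconfig: - 20 80")
pacemaker = parse_chkconfig("# chkconfig: - 90 90")
```

With these values, `stops_before(corosync_current, pacemaker)` is true, which matches the complaint that corosync stops too early. Note that `stops_before(corosync_proposed, pacemaker)` is also true under this model, since 80 is still below pacemaker's stop priority of 90, which may be what the aside about "another mailing list" refers to.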

Re: [Openais] [ANNOUNCE] corosync 1.2.8 released

2010-08-31 Thread Vladislav Bogdanov
Hi, 31.08.2010 22:22, Steven Dake wrote: I am pleased to announce Corosync 1.2.8 is available for immediate download from our website. Initscript doesn't seem to be fixed yet. http://marc.info/?l=openais&m=128271460429681&w=2

Re: [Openais] init script runlevel at 20/20 instead of 20/80 debate

2010-09-03 Thread Vladislav Bogdanov
03.09.2010 11:16, Fabio M. Di Nitto wrote: On 9/3/2010 10:00 AM, Keisuke MORI wrote: 2010/9/3 Fabio M. Di Nitto fabbi...@fabbione.net: so the current init script has: # chkconfig: - 20 20 and that is definitely wrong. It must have slipped through the cracks when we re-did the init script a

Re: [Openais] init script runlevel at 20/20 instead of 20/80 debate

2010-09-03 Thread Vladislav Bogdanov
03.09.2010 19:01, Steven Dake wrote: Agree with Fabio, this sounds too complicated and works around a problem which can be addressed with distro tools. Also pcmkv0 is legacy, so we would be working around a problem which doesn't exist upstream. I don't know enough about the LSB stanzas,

Re: [Openais] superfluous dependency in corosync spec file

2010-10-12 Thread Vladislav Bogdanov
12.10.2010 20:43, Fabio M. Di Nitto wrote: ... Also you might want to notice that there is no way any of the corosync library can be of any use on a system without corosync main package. Then what is the reason to have both corosync and corosynclib packages rather than one monolithic corosync

Re: [Openais] superfluous dependency in corosync spec file

2010-10-12 Thread Vladislav Bogdanov
13.10.2010 07:14, Fabio M. Di Nitto wrote: On 10/13/2010 12:14 AM, Vadym Chepkov wrote: Also you might want to notice that there is no way any of the corosync library can be of any use on a system without corosync main package. Of course there is. What if you just compile pacemaker, for

Re: [Openais] superfluous dependency in corosync spec file

2010-10-13 Thread Vladislav Bogdanov
13.10.2010 11:41, Fabio M. Di Nitto wrote: ... It is absolutely legal to have both heartbeat and corosync packages installed even with that Provides: cluster-engine virtual dependencies as long as they do not have Conflicts: or Obsoletes: on each other or some filesystem-level conflicts. So,

Re: [Openais] [Corosync] [FreeBSD] Fix client deconnection when Corosync is compiled with --enable-debug

2010-11-08 Thread Vladislav Bogdanov
08.11.2010 14:47, Jerome Flesch wrote: Hello, On FreeBSD, when a client is disconnected by Corosync, there is an assert() that fails on the client side. The assert assumes that, since the poll() call was ok, recvmsg() will also be ok. However, this is not true on BSD systems. This is not
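The point made in this message, that a successful poll() does not guarantee the following receive succeeds, can be illustrated with a small sketch. This is not the Corosync client code and not the patch under discussion; it is a standalone Python illustration using only the standard `select` and `socket` modules, and the function name `recv_after_poll` is hypothetical. It treats a failed or empty receive as a normal outcome instead of asserting.

```python
import select
import socket


def recv_after_poll(sock, timeout_ms=1000, bufsize=4096):
    """Poll for readability, then receive without assuming the receive must succeed.

    Returns the received bytes, b"" if the peer has gone away,
    or None if nothing usable arrived within the timeout.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    events = poller.poll(timeout_ms)
    if not events:
        return None  # timed out, nothing to read
    try:
        # Readiness reported by poll() is only a hint; the receive can still fail.
        return sock.recv(bufsize)
    except (BlockingIOError, InterruptedError):
        return None  # spurious wakeup or interrupted call, caller may retry
    except ConnectionError:
        return b""  # treat a reset connection like a disconnect
```

A caller would then branch on the three possible results (data, disconnect, retry) rather than aborting when the receive does not return data.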

[Openais] Assert at totemsrp.c:1194 after FAILED TO RECEIVE

2010-11-23 Thread Vladislav Bogdanov
Hi Steven, hi all. I often see this assert on one of the nodes after I stop corosync on another node in a newly set up 4-node cluster. #0 0x7f51953e49a5 in raise () from /lib64/libc.so.6 #1 0x7f51953e6185 in abort () from /lib64/libc.so.6 #2 0x7f51953dd935 in __assert_fail () from

Re: [Openais] Assert at totemsrp.c:1194 after FAILED TO RECEIVE

2010-11-24 Thread Vladislav Bogdanov
, in kernel or in corosync itself. Best, Vladislav Steven Dake sd...@redhat.com wrote: On 11/23/2010 03:53 AM, Vladislav Bogdanov wrote: Hi Steven, hi all. I often see this assert on one of the nodes after I stop corosync on another node in a newly set up 4-node cluster. #0 0x7f51953e49a5

Re: [Openais] Assert at totemsrp.c:1194 after FAILED TO RECEIVE

2010-11-30 Thread Vladislav Bogdanov
Hi Steven, did you have a chance to look at this issue? 24.11.2010 20:28, Steven Dake wrote: ... What could be the reason for this? Bug, switches, memory errors? The FAILED TO RECEIVE indicates the node didn't receive any multicast packets for long periods (switch problem). Given the

Re: [Openais] Assert at totemsrp.c:1194 after FAILED TO RECEIVE

2010-12-01 Thread Vladislav Bogdanov
01.12.2010 16:32, Dejan Muhamedagic wrote: Hi, On Tue, Nov 23, 2010 at 12:53:42PM +0200, Vladislav Bogdanov wrote: Hi Steven, hi all. I often see this assert on one of the nodes after I stop corosync on another node in a newly set up 4-node cluster. Does the assert happen on a node lost

Re: [Openais] Changing IP addresses without restarting Corosync?

2011-01-21 Thread Vladislav Bogdanov
21.01.2011 11:48, JiaQiang Xu wrote: Hi folks, Recently I came up with a requirement for changing IP addresses used by the UDPU transport without restarting Corosync. I need a tool to add/remove IP addresses for UDPU transport dynamically. Does Corosync support this feature now? Or is

Re: [Openais] firewire

2011-03-09 Thread Vladislav Bogdanov
08.03.2011 23:40, ray klassen wrote: one other thing. in this configuration, corosync has to be shot in the head itself to stop. /etc/init.d/corosync stop results in something like Waiting for corosync services to stop and lines and lines of dots. Kill -9 is the only way, it seems. Did

Re: [Openais] Corosync/openais segfault

2011-04-07 Thread Vladislav Bogdanov
05.04.2011 19:34, Steven Dake wrote: On 04/05/2011 09:53 AM, Vladislav Bogdanov wrote: 05.04.2011 19:41, Steven Dake wrote: This could be one of two things. Either a bug in the lock service around reference counting, or a known issue we have resolved with recursion that causes stack

Re: [Openais] Assert in openais-1.1.4/services/lck.c:2768

2011-04-12 Thread Vladislav Bogdanov
Hi, still having this. Steve, can you please take a quick look? Is that corosync or lck? The only change in corosync compared to the previous report is that the changeset for the recv-flush fix is applied now. Best, Vladislav 24.03.2011 12:06, Vladislav Bogdanov wrote: Hi, just hit assert (on several

[Openais] [TOTEM ] Process pause detected for XXX ms, flushing membership messages.

2011-07-05 Thread Vladislav Bogdanov
Hi all, For the last few days I have seen the following messages in the logs: [TOTEM ] Process pause detected for XXX ms, flushing membership messages. After that the ring is quickly re-established. DLM/clvmd notices this and switches to kern_stop waiting for fencing to be done. Although what dlm_tool ls provides is really

Re: [Openais] [TOTEM ] Process pause detected for XXX ms, flushing membership messages.

2011-07-05 Thread Vladislav Bogdanov
05.07.2011 19:10, Steven Dake wrote: On 07/05/2011 07:26 AM, Vladislav Bogdanov wrote: Hi all, For the last few days I have seen the following messages in the logs: [TOTEM ] Process pause detected for XXX ms, flushing membership messages. After that the ring is quickly re-established. DLM/clvmd notices

Re: [Openais] [TOTEM ] Process pause detected for XXX ms, flushing membership messages.

2011-07-05 Thread Vladislav Bogdanov
05.07.2011 22:13, Steven Dake wrote: On 07/05/2011 11:08 AM, Vladislav Bogdanov wrote: 05.07.2011 20:25, Steven Dake wrote: On 07/05/2011 10:08 AM, Vladislav Bogdanov wrote: 05.07.2011 19:10, Steven Dake wrote: On 07/05/2011 07:26 AM, Vladislav Bogdanov wrote: Hi all, Last days I see

Re: [Openais] [TOTEM ] Process pause detected for XXX ms, flushing membership messages.

2011-07-08 Thread Vladislav Bogdanov
I checked the archives and found a patch from some time ago that was never merged. It wasn't verified to resolve the pause timeout problem but it could indeed solve the problem. It wasn't merged because we lacked verification it resolved the problem. Great, I'll try it in next few days,

[Openais] cpg behavior on transitional membership change

2011-09-02 Thread Vladislav Bogdanov
Hi all, I'm trying to further investigate the problem I described at https://www.redhat.com/archives/cluster-devel/2011-August/msg00133.html The main problem for me there is that pacemaker first sees transitional membership with left nodes, then it sees stable membership with those nodes returned

Re: [Openais] cpg behavior on transitional membership change

2011-09-02 Thread Vladislav Bogdanov
Hi Steve, 02.09.2011 20:30, Steven Dake wrote: On 09/02/2011 12:59 AM, Vladislav Bogdanov wrote: ... I'm trying to further investigate the problem I described at https://www.redhat.com/archives/cluster-devel/2011-August/msg00133.html The main problem for me there is that pacemaker first sees

Re: [Openais] cpg behavior on transitional membership change

2011-09-02 Thread Vladislav Bogdanov
02.09.2011 20:55, David Teigland wrote: [snip] I really can't make any sense of the report, sorry. Maybe reproduce it without pacemaker, and then describe the specific steps to create the issue and resulting symptoms. After that we can determine what logs, if any, would be useful. I

Re: [Openais] cpg behavior on transitional membership change

2011-09-03 Thread Vladislav Bogdanov
Hi Jiaju, 03.09.2011 19:52, Jiaju Zhang wrote: On Fri, Sep 02, 2011 at 10:12:11PM +0300, Vladislav Bogdanov wrote: 02.09.2011 20:55, David Teigland wrote: [snip] I really can't make any sense of the report, sorry. Maybe reproduce it without pacemaker, and then describe the specific steps