Re: [Pacemaker] cib: ERROR: send_ais_message: Not connected to AIS

2014-04-14 Thread Marco Felettigh
On Mon, 14 Apr 2014 14:40:43 +1000 Andrew Beekhof wrote: > > On 11 Apr 2014, at 10:54 pm, Marco Felettigh wrote: > > > On Fri, 11 Apr 2014 17:17:57 +1000 > > Andrew Beekhof wrote: > > > >> > >> On 8 Apr 2014, at 8:37 pm, ma...@nucleus.it wrote: > >> > >>> On Tue, 8 Apr 2014 10:49:16 +1000

Re: [Pacemaker] cib: ERROR: send_ais_message: Not connected to AIS

2014-04-13 Thread Andrew Beekhof
On 11 Apr 2014, at 10:54 pm, Marco Felettigh wrote: > On Fri, 11 Apr 2014 17:17:57 +1000 > Andrew Beekhof wrote: > >> >> On 8 Apr 2014, at 8:37 pm, ma...@nucleus.it wrote: >> >>> On Tue, 8 Apr 2014 10:49:16 +1000 >>> Andrew Beekhof wrote: >>> On 7 Apr 2014, at 8:46 pm, ma...@nuc

Re: [Pacemaker] cib: ERROR: send_ais_message: Not connected to AIS

2014-04-11 Thread Marco Felettigh
On Fri, 11 Apr 2014 17:17:57 +1000 Andrew Beekhof wrote: > > On 8 Apr 2014, at 8:37 pm, ma...@nucleus.it wrote: > > > On Tue, 8 Apr 2014 10:49:16 +1000 > > Andrew Beekhof wrote: > > > >> > >> On 7 Apr 2014, at 8:46 pm, ma...@nucleus.it wrote: > >> > >>> Hi, > >>> in a production environment

Re: [Pacemaker] cib: ERROR: send_ais_message: Not connected to AIS

2014-04-11 Thread Andrew Beekhof
On 8 Apr 2014, at 8:37 pm, ma...@nucleus.it wrote: > On Tue, 8 Apr 2014 10:49:16 +1000 > Andrew Beekhof wrote: > >> >> On 7 Apr 2014, at 8:46 pm, ma...@nucleus.it wrote: >> >>> Hi, >>> in a production environment with 2 nodes (nodeA, nodeB) we had a >>> hardware failure, so we restarted the

Re: [Pacemaker] cib: ERROR: send_ais_message: Not connected to AIS

2014-04-07 Thread Andrew Beekhof
On 7 Apr 2014, at 8:46 pm, ma...@nucleus.it wrote: > Hi, > in a production environment with 2 nodes (nodeA, nodeB) we had a > hardware failure, so we restarted nodeB. > After nodeB came back up we restarted corosync/pacemaker on it > but for 2 days now the corosync/pacemaker stuf

[Pacemaker] cib: ERROR: send_ais_message: Not connected to AIS

2014-04-07 Thread marco
Hi, in a production environment with 2 nodes (nodeA, nodeB) we had a hardware failure, so we restarted nodeB. After nodeB came back up we restarted corosync/pacemaker on it, but for 2 days now the corosync/pacemaker stuff has been looping. crm_mon NodeA: Stack: openais Current DC: nodeA

Re: [Pacemaker] cib connection error

2013-09-23 Thread Andrew Beekhof
On 24/09/2013, at 2:09 AM, Халезов Иван wrote: > Hi all, > > I use pacemaker 1.1.9 with corosync 2.3, both built from source. > My OS is CentOS 6.4 x86_64 > > I have about 30 resources of one type managed by my own resource agent. It is > necessary for the resource agent to know the utilization pa

[Pacemaker] cib connection error

2013-09-23 Thread Халезов Иван
Hi all, I use pacemaker 1.1.9 with corosync 2.3, both built from source. My OS is CentOS 6.4 x86_64. I have about 30 resources of one type managed by my own resource agent. It is necessary for the resource agent to know the utilization parameter of the configured resource. I query for this parameter

Re: [Pacemaker] CIB verification failure with any change via crmsh

2013-01-24 Thread Dejan Muhamedagic
On Thu, Jan 24, 2013 at 09:10:33AM +0100, Jacek Konieczny wrote: > On Thu, 24 Jan 2013 09:04:14 +0100 > Jacek Konieczny wrote: > > I should probably upgrade my CIB somehow > > Indeed. 'cibadmin --upgrade --force' solved my problem. > Thanks for all the hints. crm(live)configure# help upgrade If

Re: [Pacemaker] CIB verification failure with any change via crmsh

2013-01-24 Thread Dejan Muhamedagic
On Thu, Jan 24, 2013 at 09:04:14AM +0100, Jacek Konieczny wrote: > Hi, > > On Wed, 23 Jan 2013 18:52:20 +0100 > Dejan Muhamedagic wrote: > > > > > > > > > > Not sure if an id can start with a digit. > > Corosync node IDs are always digits-only. > > > This should really work with ver

Re: [Pacemaker] CIB verification failure with any change via crmsh

2013-01-24 Thread Jacek Konieczny
On Thu, 24 Jan 2013 09:04:14 +0100 Jacek Konieczny wrote: > I should probably upgrade my CIB somehow Indeed. 'cibadmin --upgrade --force' solved my problem. Thanks for all the hints. Greets, Jacek
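A small Python sketch of how one might check, before and after running `cibadmin --upgrade --force`, which schema a CIB declares. It assumes the XML comes from `cibadmin --query` and reads the `validate-with` attribute of the `<cib>` root element; the helper names and the `pacemaker-1.2` threshold are hypothetical, chosen only for illustration and not taken from this thread.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def cib_schema(cib_xml: str) -> Optional[str]:
    """Return the validate-with attribute of the <cib> root element, if present."""
    root = ET.fromstring(cib_xml)
    if root.tag != "cib":
        raise ValueError(f"expected a <cib> root element, got <{root.tag}>")
    return root.get("validate-with")


def schema_version(schema: str) -> Optional[Tuple[int, int]]:
    """Parse names such as 'pacemaker-1.0' into (1, 0); return None otherwise."""
    match = re.fullmatch(r"pacemaker-(\d+)\.(\d+)", schema)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def looks_outdated(cib_xml: str, minimum: Tuple[int, int] = (1, 2)) -> Optional[bool]:
    """True if the CIB declares a pacemaker-X.Y schema older than `minimum`.

    Returns None when the schema is missing or not of the form pacemaker-X.Y,
    because this sketch cannot tell how such names compare.
    The default minimum is a hypothetical threshold, not an official one.
    """
    schema = cib_schema(cib_xml)
    if schema is None:
        return None
    version = schema_version(schema)
    if version is None:
        return None
    return version < minimum
```

Feeding it the output of `cibadmin --query` before and after the upgrade should show the declared schema changing; a `None` result simply means the sketch does not know how to rank that schema name.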

Re: [Pacemaker] CIB verification failure with any change via crmsh

2013-01-24 Thread Jacek Konieczny
Hi, On Wed, 23 Jan 2013 18:52:20 +0100 Dejan Muhamedagic wrote: > > > > > > Not sure if an id can start with a digit. Corosync node IDs are always digits-only. > This should really work with versions >= v1.2.4 Yeah… I have looked into the crmsh code and it has explicit support for
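Whether an id may start with a digit comes down to XML naming rules: values of the XML type `ID` must be NCNames, and an NCName cannot begin with a digit. A rough Python check, assuming ASCII-only ids (the real NCName production also admits many non-ASCII characters, and whether a given CIB attribute is typed as `ID` depends on the schema version in use):

```python
import re

# Simplified, ASCII-only approximation of the XML NCName production:
# first character a letter or underscore, then letters, digits, '.', '-', '_'.
_ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(value: str) -> bool:
    """Return True if `value` is a syntactically valid (ASCII) NCName."""
    return _ASCII_NCNAME.fullmatch(value) is not None


if __name__ == "__main__":
    # A digits-only value, like a corosync node id, fails the check.
    for candidate in ("12345", "node-12345", "_n1"):
        print(candidate, is_ascii_ncname(candidate))
```

A digits-only value fails the check, which is consistent with a schema that types node ids as XML `ID` rejecting corosync's numeric node ids.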

Re: [Pacemaker] CIB verification failure with any change via crmsh

2013-01-23 Thread Dejan Muhamedagic
Hi, On Wed, Jan 23, 2013 at 04:31:20PM +0100, Jacek Konieczny wrote: > Hi, > > I have recently upgraded Pacemaker on one of my clusters from > 1.0.something to 1.1.8 and installed crmsh to manage it as I used to. > > crmsh mostly works for me, until I try to change the configuration with > 'crm

Re: [Pacemaker] CIB verification failure with any change via crmsh

2013-01-23 Thread Jacek Konieczny
On Wed, 23 Jan 2013 16:44:45 +0100 Lars Marowsky-Bree wrote: > On 2013-01-23T16:31:20, Jacek Konieczny wrote: > > > I have recently upgraded Pacemaker on one of my clusters from > > 1.0.something to 1.1.8 and installed crmsh to manage it as I used > > to. > > It'd be helpful if you mentioned w

Re: [Pacemaker] CIB verification failure with any change via crmsh

2013-01-23 Thread Lars Marowsky-Bree
On 2013-01-23T16:31:20, Jacek Konieczny wrote: > I have recently upgraded Pacemaker on one of my clusters from > 1.0.something to 1.1.8 and installed crmsh to manage it as I used to. It'd be helpful if you mentioned which crmsh version you installed. The errors you get suggest you need to update

[Pacemaker] CIB verification failure with any change via crmsh

2013-01-23 Thread Jacek Konieczny
Hi, I have recently upgraded Pacemaker on one of my clusters from 1.0.something to 1.1.8 and installed crmsh to manage it as I used to. crmsh mostly works for me, until I try to change the configuration with 'crm configure'. Any change, even a trivial one, shows verification errors and fails to commit:

Re: [Pacemaker] CIB not saved

2012-03-29 Thread Fiorenza Meini
Normally we log an error at startup if we can't write there... did this not happen? Yes, it happened. I saw a warning while writing CIB..bu

Re: [Pacemaker] CIB not saved

2012-03-29 Thread Andrew Beekhof
On Thu, Mar 29, 2012 at 8:45 PM, Fiorenza Meini wrote: > On 29/03/2012 10:12, Rasto Levrinc wrote: > >> On Thu, Mar 29, 2012 at 9:54 AM, Fiorenza Meini wrote: >>> >>> Hi there, >>> a strange thing happened to my two-node cluster: I rebooted both machines >>> at >>> the same time, and when the OS we

Re: [Pacemaker] CIB not saved

2012-03-29 Thread Fiorenza Meini
On 29/03/2012 10:12, Rasto Levrinc wrote: On Thu, Mar 29, 2012 at 9:54 AM, Fiorenza Meini wrote: Hi there, a strange thing happened to my two-node cluster: I rebooted both machines at the same time, and when the OS came back up, no resources were configured anymore, as if it were a fresh installatio

Re: [Pacemaker] CIB not saved

2012-03-29 Thread Rasto Levrinc
On Thu, Mar 29, 2012 at 9:54 AM, Fiorenza Meini wrote: > Hi there, > a strange thing happened to my two-node cluster: I rebooted both machines at > the same time, and when the OS came back up, no resources were configured > anymore, as if it were a fresh installation. Why? > It was explained to me that the

[Pacemaker] CIB not saved

2012-03-29 Thread Fiorenza Meini
Hi there, a strange thing happened to my two-node cluster: I rebooted both machines at the same time, and when the OS came back up, no resources were configured anymore, as if it were a fresh installation. Why? It was explained to me that the configuration of resources managed by pacemaker should be in
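Since the replies in this thread point at the cluster being unable to write its on-disk CIB, a quick Python pre-flight check may help. This is a sketch: the directory is passed in rather than hard-coded (the path in the message above is truncated), and `os.access` tests the real uid/gid of whoever runs the script, so it should be run as the account the cib daemon uses (for example `hacluster`, as in the ownership fix discussed elsewhere on this page). The function name is hypothetical.

```python
import os
from typing import List


def cib_dir_problems(path: str) -> List[str]:
    """List reasons why the current user could not create files in `path`.

    An empty list means no problem was detected. os.access() checks the real
    uid/gid, so run this as the user the cib daemon runs as.
    """
    if not os.path.exists(path):
        return [f"{path} does not exist"]
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    problems = []
    # Creating a file in a directory needs both write and search (execute) permission.
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable by uid {os.getuid()}")
    if not os.access(path, os.X_OK):
        problems.append(f"{path} is not searchable by uid {os.getuid()}")
    return problems
```

Running it as root will usually report nothing, because root bypasses ordinary permission checks; that is why the choice of user matters.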

Re: [Pacemaker] cib not connected

2011-11-01 Thread Andrew Beekhof
On Tue, Oct 25, 2011 at 4:08 AM, Proskurin Kirill wrote: > Hello. > > corosync-1.4.1 > pacemaker-1.1.5 > pacemaker runs with "ver: 1" > > I ran into a strange problem. Hope someone can help me. > > I have a 9-node cluster. All was fine until I needed to reboot a node. > After the reboot it doesn't want to come

[Pacemaker] cib not connected

2011-10-24 Thread Proskurin Kirill
Hello. corosync-1.4.1 pacemaker-1.1.5 pacemaker runs with "ver: 1" I ran into a strange problem. Hope someone can help me. I have a 9-node cluster. All was fine until I needed to reboot a node. After the reboot it doesn't want to come back to the cluster, failing with a "not in our membership" error. It happens with othe

Re: [Pacemaker] cib

2010-10-05 Thread Shravan Mishra
I really appreciate your response. I just wanted to close this thread by saying that we were able to figure out the problem. Since pacemaker was running on other virtual machines but not on our appliance, the problem was clearly our runtime environment. It turns out that our libxml2 library on our appli

Re: [Pacemaker] cib

2010-10-05 Thread Andrew Beekhof
On Fri, Oct 1, 2010 at 3:45 PM, Shravan Mishra wrote: > Hi, > > Just a quick question, who generates the very first cib.xml when > pacemaker processes are initialized? The cib > > Thanks > Shravan > > On Thu, Sep 30, 2010 at 4:22 AM, Andrew Beekhof wrote: >> On Tue, Sep 28, 2010 at 11:47 AM, An

Re: [Pacemaker] cib

2010-10-01 Thread Shravan Mishra
Hi, Just a quick question, who generates the very first cib.xml when pacemaker processes are initialized? Thanks Shravan On Thu, Sep 30, 2010 at 4:22 AM, Andrew Beekhof wrote: > On Tue, Sep 28, 2010 at 11:47 AM, Andrew Beekhof wrote: >> On Mon, Sep 27, 2010 at 6:26 AM, Shravan Mishra >> wrote

Re: [Pacemaker] cib

2010-09-30 Thread Andrew Beekhof
On Tue, Sep 28, 2010 at 11:47 AM, Andrew Beekhof wrote: > On Mon, Sep 27, 2010 at 6:26 AM, Shravan Mishra > wrote: >> Thanks Raoul for the response. >> >> Changing the permission to hacluster:haclient did stop that error. >> >> Now I'm hitting another problem whereby cib is failing to start > > V

Re: [Pacemaker] cib

2010-09-29 Thread Shravan Mishra
Some more info: root 14170 14166 0 12:23 ? 00:00:00 /usr/lib64/heartbeat/stonithd nobody 14172 14166 0 12:23 ? 00:00:00 /usr/lib64/heartbeat/lrmd 82 14173 14166 0 12:23 ? 00:00:00 /usr/lib64/heartbeat/attrd 82 14174 14166 0 12:23 ? 00:00:00 /usr/l

Re: [Pacemaker] cib

2010-09-29 Thread Shravan Mishra
Hi, I did a bt on the core, this is what I found: == Core was generated by `/usr/lib64/heartbeat/cib'. Program terminated with signal 11, Segmentation fault. [New process 12340] #0 0x7f23acc553fa in strncmp () from /lib64/libc.so.6 (gdb) bt #0 0x7f23acc553fa in strncmp () fro

Re: [Pacemaker] cib

2010-09-28 Thread Shravan Mishra
Sorry forgot to attach my corosync.conf. = totem { version: 2 # token: 3000 # token_retransmits_before_loss_const: 10 # join: 60 # consensus: 1500 # vsftype: none # max_messages: 20 # clear_node_high_bit: yes secauth: off t

Re: [Pacemaker] cib

2010-09-28 Thread Andrew Beekhof
On Mon, Sep 27, 2010 at 6:26 AM, Shravan Mishra wrote: > Thanks Raoul for the response. > > Changing the permission to hacluster:haclient did stop that error. > > Now I'm hitting another problem whereby cib is failing to start Very strange logs. Which distribution is this? What does your corosync

Re: [Pacemaker] cib

2010-09-25 Thread Raoul Bhatia [IPAX]
On 24.09.2010 21:41, Shravan Mishra wrote: crmd[20612]: 2010/09/24_15:29:57 ERROR: crm_log_init_worker: Cannot change active directory to /var/lib/heartbeat/cores/hacluster: Permission denied (13) ls -ald /var/lib/heartbeat/cores/hacluster /var/lib/heartbeat/cores/ /var/lib/heartbeat/ /var/lib
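The `ls -ald` check Raoul asks for walks every directory from `/` down to the one crmd cannot enter. Below is a Python sketch of the same walk that reports mode, owner and group for each component. It assumes a Unix system with the `pwd` and `grp` modules, and the helper names are hypothetical.

```python
import grp
import os
import pwd
import stat
from typing import List, Tuple


def path_chain(path: str) -> List[str]:
    """Return `path` and all of its parent directories, starting from '/'."""
    path = os.path.abspath(path)
    chain = [path]
    while True:
        parent = os.path.dirname(path)
        if parent == path:
            break
        chain.append(parent)
        path = parent
    return list(reversed(chain))


def _name(lookup, ident: int) -> str:
    """Map a uid/gid to a name, falling back to the number if it is unknown."""
    try:
        return lookup(ident)[0]
    except KeyError:
        return str(ident)


def describe_chain(path: str) -> List[Tuple[str, str, str, str]]:
    """(path, mode, owner, group) per component; stops at the first unusable one."""
    rows = []
    for component in path_chain(path):
        try:
            st = os.stat(component)
        except FileNotFoundError:
            rows.append((component, "missing", "-", "-"))
            break
        except PermissionError:
            rows.append((component, "inaccessible", "-", "-"))
            break
        rows.append((component, stat.filemode(st.st_mode),
                     _name(pwd.getpwuid, st.st_uid), _name(grp.getgrgid, st.st_gid)))
    return rows


if __name__ == "__main__":
    for row in describe_chain("/var/lib/heartbeat/cores/hacluster"):
        print(*row)
```

The first component whose owner, group or mode does not let `hacluster` through is the one to fix.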

[Pacemaker] cib

2010-09-24 Thread Shravan Mishra
Hi All, We recently upgraded to /usr/sbin/corosync -v Corosync Cluster Engine, version '1.2.1' SVN revision '2723:2724' Copyright (c) 2006-2009 Red Hat, Inc. In my logs I see the following lines: crmd[20612]: 2010/09/24_15:29:57 ERROR: crm_log_init_worker: Cannot change active directory to /var

Re: [Pacemaker] cib fails to start until host is rebooted

2010-09-17 Thread Michael Smith
Andrew Beekhof wrote: I spoke to Steve, and the only thing he could come up with was that the group might not be correct. When the cluster is in this state, please run: ps x -o pid,euser,ruser,egroup,rgroup,command And compare it to the "normal" output. Also, confirm that there is only one

Re: [Pacemaker] cib fails to start until host is rebooted

2010-09-17 Thread Andrew Beekhof
I spoke to Steve, and the only thing he could come up with was that the group might not be correct. When the cluster is in this state, please run: ps x -o pid,euser,ruser,egroup,rgroup,command And compare it to the "normal" output. Also, confirm that there is only one group named haclient, an
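Andrew's suggestion to confirm that there is only one `haclient` group can be scripted. Below is a minimal Python sketch that counts matching entries in text in `/etc/group` format. It only sees whatever text it is given. Groups served by NSS sources such as LDAP appear in `getent group` output instead. That output uses the same line format and could be passed to the same function. The function names are hypothetical.

```python
from typing import List


def group_entries(group_text: str, name: str) -> List[str]:
    """Return every line in /etc/group format whose group name is `name`."""
    matches = []
    for line in group_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Fields are name:password:gid:members; only the first one matters here.
        if line.split(":", 1)[0] == name:
            matches.append(line)
    return matches


def group_gids(group_text: str, name: str) -> List[str]:
    """Return the gid field of each matching entry (more than one is suspicious)."""
    return [entry.split(":")[2] for entry in group_entries(group_text, name)
            if entry.count(":") >= 2]
```

For example, reading `/etc/group` and calling `group_entries(text, "haclient")` should return exactly one line on a healthy node. Two entries with different gids would explain credentials that match on one check and fail on another.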

Re: [Pacemaker] cib fails to start until host is rebooted

2010-09-07 Thread Michael Smith
Michael Smith wrote: On Mon, 6 Sep 2010, Andrew Beekhof wrote: Is /dev/shm full (or not mounted) by any chance? No - I tried clearing that out, too. And corosync is actually running? Yes, it's logging "[IPC ] Invalid IPC credentials." when cib tries to connect. For what it's worth, I h

Re: [Pacemaker] cib fails to start until host is rebooted

2010-09-06 Thread Michael Smith
On Mon, 6 Sep 2010, Andrew Beekhof wrote: > >> Is /dev/shm full (or not mounted) by any chance? > > > > No - I tried clearing that out, too. > > And corosync is actually running? Yes, it's logging "[IPC ] Invalid IPC credentials." when cib tries to connect. Mike

Re: [Pacemaker] cib fails to start until host is rebooted

2010-09-05 Thread Andrew Beekhof
On Thu, Sep 2, 2010 at 2:18 PM, Michael Smith wrote: > On Thu, 2 Sep 2010, Andrew Beekhof wrote: > >> On Mon, Aug 30, 2010 at 10:04 PM, Michael Smith wrote: >> > Hi, >> > >> > I have a pacemaker/corosync setup on a bunch of fully patched SLES11 SP1 >> > systems. On one of the systems, if I /etc/i

Re: [Pacemaker] cib fails to start until host is rebooted

2010-09-02 Thread Michael Smith
On Thu, 2 Sep 2010, Andrew Beekhof wrote: > On Mon, Aug 30, 2010 at 10:04 PM, Michael Smith wrote: > > Hi, > > > > I have a pacemaker/corosync setup on a bunch of fully patched SLES11 SP1 > > systems. On one of the systems, if I /etc/init.d/openais stop, then > > /etc/init.d/openais start, pacema

Re: [Pacemaker] cib fails to start until host is rebooted

2010-09-01 Thread Andrew Beekhof
On Mon, Aug 30, 2010 at 10:04 PM, Michael Smith wrote: > Hi, > > I have a pacemaker/corosync setup on a bunch of fully patched SLES11 SP1 > systems. On one of the systems, if I /etc/init.d/openais stop, then > /etc/init.d/openais start, pacemaker fails to come up: Is /dev/shm full (or not mounted
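Andrew's first question, whether `/dev/shm` is full or not mounted, can be answered from Python with the standard library alone. Below is a sketch. The `ShmStatus` name is hypothetical. `/dev/shm` is only the default, and any directory can be passed.

```python
import os
import shutil
from typing import NamedTuple


class ShmStatus(NamedTuple):
    mounted: bool      # True if the path is a mount point of its own
    free_bytes: int    # space available to unprivileged users
    total_bytes: int


def shm_status(path: str = "/dev/shm") -> ShmStatus:
    """Report whether `path` is a mount point and how much space is left on it.

    Raises FileNotFoundError if the path does not exist at all.
    """
    usage = shutil.disk_usage(path)
    return ShmStatus(os.path.ismount(path), usage.free, usage.total)
```

If `mounted` is False, the free-space figures describe the filesystem that merely contains the `/dev/shm` directory, so they say nothing about shared memory.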

[Pacemaker] cib fails to start until host is rebooted

2010-08-30 Thread Michael Smith
Hi, I have a pacemaker/corosync setup on a bunch of fully patched SLES11 SP1 systems. On one of the systems, if I /etc/init.d/openais stop, then /etc/init.d/openais start, pacemaker fails to come up: Aug 30 15:48:09 xen-test1 cib: [5858]: info: crm_cluster_connect: Connecting to OpenAIS Aug

Re: [Pacemaker] CIB write-to-disk bug?

2010-04-08 Thread Andrew Beekhof
On Fri, Apr 2, 2010 at 4:16 PM, Alan Robertson wrote: >> Do it again, with higher log level.  Sorry, no time right now to rebuild >> your exact thing with your exact gcc and stuff to look at your core file. > > You can just download the RPM and extract the objects.  That's what I used. Spend half

Re: [Pacemaker] CIB write-to-disk bug?

2010-04-02 Thread Lars Ellenberg
On Fri, Apr 02, 2010 at 08:16:32AM -0600, Alan Robertson wrote: > >Do it again, with higher log level. Sorry, no time right now to rebuild > >your exact thing with your exact gcc and stuff to look at your core file. > > You can just download the RPM and extract the objects. That's what I used.

Re: [Pacemaker] CIB write-to-disk bug?

2010-04-02 Thread Alan Robertson
Lars Ellenberg wrote: On Thu, Apr 01, 2010 at 08:27:02AM -0600, Alan Robertson wrote: Lars Ellenberg wrote: On Thu, Apr 01, 2010 at 12:12:47AM -0600, Alan Robertson wrote: OK Since there was no ssh-as-root between the cluster nodes, I didn't send all the logs along from every node in the

Re: [Pacemaker] CIB write-to-disk bug?

2010-04-01 Thread Lars Ellenberg
On Thu, Apr 01, 2010 at 08:27:02AM -0600, Alan Robertson wrote: > Lars Ellenberg wrote: > >On Thu, Apr 01, 2010 at 12:12:47AM -0600, Alan Robertson wrote: > >>OK > >> > >>Since there was no ssh-as-root between the cluster nodes, I didn't > >>send all the logs along from every node in the cluste

Re: [Pacemaker] CIB write-to-disk bug?

2010-04-01 Thread Alan Robertson
Florian Haas wrote: On 2010-04-01 16:27, Alan Robertson wrote: None of them verified. All the nodes in the cluster failed the test at the same time - and now I have no official CIBs on disk - on any cluster nodes... I sent Andrew all the CIBs, and all the core files, and basically everything

Re: [Pacemaker] CIB write-to-disk bug?

2010-04-01 Thread Florian Haas
On 2010-04-01 16:27, Alan Robertson wrote: > None of them verified. All the nodes in the cluster failed the test at > the same time - and now I have no official CIBs on disk - on any cluster > nodes... I sent Andrew all the CIBs, and all the core files, and > basically everything under /var/lib/

Re: [Pacemaker] CIB write-to-disk bug?

2010-04-01 Thread Alan Robertson
Lars Ellenberg wrote: On Thu, Apr 01, 2010 at 12:12:47AM -0600, Alan Robertson wrote: OK Since there was no ssh-as-root between the cluster nodes, I didn't send all the logs along from every node in the cluster - and it didn't occur to me to look at all of them. However, the problem has go

Re: [Pacemaker] CIB write-to-disk bug?

2010-04-01 Thread Lars Ellenberg
On Thu, Apr 01, 2010 at 12:12:47AM -0600, Alan Robertson wrote: > OK > > Since there was no ssh-as-root between the cluster nodes, I didn't > send all the logs along from every node in the cluster - and it > didn't occur to me to look at all of them. > > However, the problem has gotten curios

Re: [Pacemaker] CIB write-to-disk bug?

2010-03-31 Thread Alan Robertson
OK Since there was no ssh-as-root between the cluster nodes, I didn't send all the logs along from every node in the cluster - and it didn't occur to me to look at all of them. However, the problem has gotten curioser and curioser - because ALL the nodes in the cluster reported the same

[Pacemaker] CIB write-to-disk bug?

2010-03-31 Thread Alan Robertson
Hi, I've run into what looks at first blush to be a CIB bug in writing to disk. The key messages from this incident are these: Mar 31 19:02:52 vhost0384 cib: [13294]: ERROR: validate_cib_digest: Digest comparision failed: expected 316049fa7ee8d2e107573ce7cded07cf (/var/lib/heartbeat/crm/cib.
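The log line compares an expected digest with a calculated one. Its 32 hexadecimal digits are consistent with an MD5 digest. Below is a generic Python sketch of that expected-versus-calculated comparison. It is not Pacemaker's implementation. It hashes raw file bytes, whereas the cib daemon may compute its digest over its own serialization of the XML. A mismatch from this sketch would therefore not by itself reproduce the error above.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex-encoded MD5 digest of `data` (32 lowercase hex digits)."""
    return hashlib.md5(data).hexdigest()


def digest_matches(data: bytes, expected_hex: str) -> bool:
    """Compare a freshly calculated digest with a stored one, ignoring case and whitespace."""
    return md5_hex(data) == expected_hex.strip().lower()
```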

Re: [Pacemaker] cib and attrd processes segfault

2010-02-17 Thread Alessandro Federico
> > > Please > - enable coredumps (set "ulimit -c unlimited" at the top of the > corosync init file) > - use hb_report to create a support tarball covering the problem > - attach the tarball to a new bug: > http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker > That's the minimu
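`ulimit -c unlimited` at the top of an init script works because resource limits are inherited by every process the script starts. The same idea can be sketched in Python for a hypothetical launcher that raises the soft core-file limit before starting the daemon. An unprivileged process can raise the soft limit only as far as the hard limit, so this yields "unlimited" only when the hard limit already is unlimited.

```python
import resource
from typing import Tuple


def raise_core_limit() -> Tuple[int, int]:
    """Raise this process's soft RLIMIT_CORE to its hard limit and return the new pair.

    Child processes started afterwards (e.g. via subprocess or exec) inherit it,
    which is what `ulimit -c unlimited` in an init script relies on.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

If the returned soft limit equals `resource.RLIM_INFINITY`, children can write full core files. Otherwise the hard limit has to be raised by root first.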

Re: [Pacemaker] cib and attrd processes segfault

2010-02-16 Thread Andrew Beekhof
On Tue, Feb 16, 2010 at 4:36 PM, Alessandro Federico wrote: > Hi all, > we have just installed the latest versions of pacemaker/corosync software: > cluster-glue-1.0.3-1.el5.x86_64 > cluster-glue-libs-1.0.3-1.el5.x86_64 > corosync-1.2.0-1.el5.x86_64 > corosynclib-1.2.0-1.el5.x86_64 > heartbeat-3.0

[Pacemaker] cib and attrd processes segfault

2010-02-16 Thread Alessandro Federico
Hi all, we have just installed the latest versions of pacemaker/corosync software: cluster-glue-1.0.3-1.el5.x86_64 cluster-glue-libs-1.0.3-1.el5.x86_64 corosync-1.2.0-1.el5.x86_64 corosynclib-1.2.0-1.el5.x86_64 heartbeat-3.0.2-2.el5.x86_64 heartbeat-libs-3.0.2-2.el5.x86_64 pacemaker-1.0.7-4.el5.x

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-22 Thread Andrew Beekhof
And you'll also want this patch for the crmd diff -r 4619c842d58c crmd/callbacks.c --- a/crmd/callbacks.c Fri May 22 16:52:14 2009 +0200 +++ b/crmd/callbacks.c Fri May 22 21:34:12 2009 +0200 @@ -179,7 +179,6 @@ crmd_ha_msg_callback(HA_Message *hamsg, } else { crmd_ha_msg_fil

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-20 Thread Nikola Ciprich
On Wed, May 20, 2009 at 02:02:52PM +0200, Andrew Beekhof wrote: > Ah, well that was pretty obvious. > /me humbly apologizes for such a stupid error. Hi and thanks! no problem > (It wasn't caught by my own valgrind testing because this function is > specific to heartbeat based clusters) don't worr

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-20 Thread Andrew Beekhof
Ah, well that was pretty obvious. /me humbly apologizes for such a stupid error. (It wasn't caught by my own valgrind testing because this function is specific to heartbeat based clusters) Try this: diff -r ea5d0b58c0be cib/callbacks.c --- a/cib/callbacks.c Wed May 20 11:56:39 2009 +0200 +++

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-19 Thread Andrew Beekhof
I'll take a look at the valgrind data. Thanks! On Tue, May 19, 2009 at 6:39 PM, Nikola Ciprich wrote: > Hello, > sorry to bother again. I've discovered why valgrind didn't > find anything. It is important to stop the process in order to > have valgrind finish the analysis. And it seems that ther

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-19 Thread Nikola Ciprich
Hello, sorry to bother again. I've discovered why valgrind didn't find anything. It is important to stop the process in order to have valgrind finish the analysis. And it seems that there really are leaks not only in cib, but also in attrd and crmd. I just took a quick look at the code reporte

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-18 Thread Andrew Beekhof
On Sat, May 16, 2009 at 10:33 PM, Nikola Ciprich wrote: > Hi guys, > I was able to enable valgrind on our production cluster today, > but unfortunately only on the secondary node; I'll hopefully be allowed to enable > it on the primary node next weekend. > Unfortunately it seems that valgrind p

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-16 Thread Nikola Ciprich
Hi guys, I was able to enable valgrind on our production cluster today, but unfortunately only on the secondary node; I'll hopefully be allowed to enable it on the primary node next weekend. Unfortunately it seems that valgrind probably won't be of much help here. I've got some output from it,

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-14 Thread Nikola Ciprich
Hi guys, sooo I've got valgrind grinding:) I had some trouble getting the latest stuff working, so I used heartbeat-2.99.2 with Dejan's (fixed) patch and --enable-valgrind --with-valgrind-log="--log-file=/tmp/crm-%p.valgrind" and recompiled pacemaker-1.0.3 (without openais as Andrew suggested).
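With `--log-file=/tmp/crm-%p.valgrind`, valgrind writes one log per process and replaces `%p` with the process id. Valgrind prints its leak summary only when the process exits, as Nikola reports in a later message. A log from a daemon that is still running will therefore have no summary yet. Below is a Python sketch that collects the "definitely lost" figure from each log. The regular expression assumes valgrind's usual `definitely lost: N bytes in M blocks` wording. The function names are hypothetical.

```python
import glob
import re
from typing import Dict, Optional, Tuple

# Matches e.g. "==4242==    definitely lost: 1,024 bytes in 3 blocks"
_DEFINITELY_LOST = re.compile(r"definitely lost: ([\d,]+) bytes in ([\d,]+) blocks")


def definitely_lost(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (bytes, blocks) from the last 'definitely lost' line, or None if absent."""
    matches = _DEFINITELY_LOST.findall(log_text)
    if not matches:
        return None  # no leak summary yet, e.g. the process is still running
    nbytes, nblocks = matches[-1]
    return int(nbytes.replace(",", "")), int(nblocks.replace(",", ""))


def scan_logs(pattern: str = "/tmp/crm-*.valgrind") -> Dict[str, Optional[Tuple[int, int]]]:
    """Map each log file matching `pattern` to its 'definitely lost' figures."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as fh:
            results[path] = definitely_lost(fh.read())
    return results


if __name__ == "__main__":
    for path, lost in scan_logs().items():
        print(path, "no summary yet" if lost is None else f"{lost[0]} bytes in {lost[1]} blocks")
```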

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-14 Thread Andrew Beekhof
On Thu, May 14, 2009 at 3:58 PM, Nikola Ciprich wrote: > Hi, > Dejan, thanks a lot, I compiled your version, but crmd from the shipped pacemaker > keeps segfaulting > with it, and I'm unable to rebuild pacemaker against this heartbeat to get the > -debug package. > Compilation fails with: > > plugin.c: In

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-14 Thread Nikola Ciprich
Hi, Dejan, thanks a lot, I compiled your version, but crmd from the shipped pacemaker keeps segfaulting with it, and I'm unable to rebuild pacemaker against this heartbeat to get the -debug package. Compilation fails with: plugin.c: In function 'check_message_sanity': plugin.c:1190: warning: format '%d' ex

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-13 Thread Andrew Beekhof
On Wed, May 13, 2009 at 7:41 PM, Dejan Muhamedagic wrote: > Hi, > > On Wed, May 13, 2009 at 05:36:40PM +0200, Nikola Ciprich wrote: >> > holy ! >> yes! exactly! :) >> >> > sure >> > in theory you can just add "crm valgrind" instead of "crm yes" in ha.cf >> >> hmm, I tried that now, but all I got i

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-13 Thread Dejan Muhamedagic
Hi, On Wed, May 13, 2009 at 05:36:40PM +0200, Nikola Ciprich wrote: > > holy ! > yes! exactly! :) > > > sure > > in theory you can just add "crm valgrind" instead of "crm yes" in ha.cf > > hmm, I tried that now, but all I got is: > May 13 16:46:16 faxb heartbeat: [1655]: ERROR: Heartbeat was not

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-13 Thread Nikola Ciprich
> holy ! yes! exactly! :) > sure > in theory you can just add "crm valgrind" instead of "crm yes" in ha.cf hmm, I tried that now, but all I got is: May 13 16:46:16 faxb heartbeat: [1655]: ERROR: Heartbeat was not compiled with --enable-libc-malloc, "crm valgrind" is therefor not supported. So I

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-13 Thread Andrew Beekhof
On May 13, 2009, at 8:28 AM, Nikola Ciprich wrote: Hello, I reported this some time ago; a few days ago I updated my system to pacemaker-1.0.3 + related packages. Unfortunately the cib process still seems to be leaking, i.e. its RSS memory usage is constantly growing. This means we have t

[Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-12 Thread Nikola Ciprich
Hello, I reported this some time ago; a few days ago I updated my system to pacemaker-1.0.3 + related packages. Unfortunately the cib process still seems to be leaking, i.e. its RSS memory usage is constantly growing. This means we have to restart the whole heartbeat service approximately once ever
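On Linux, the symptom reported here, a steadily growing RSS for cib, can be tracked between restarts by sampling the `VmRSS` field of `/proc/<pid>/status`. Below is a Python sketch. `steadily_growing` is a deliberately crude, hypothetical heuristic. Growing RSS alone does not prove a leak. The valgrind runs discussed in this thread are the real test.

```python
import re
from typing import Optional, Sequence

# /proc/<pid>/status contains a line like "VmRSS:\t   51200 kB"
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def vmrss_kb(status_text: str) -> Optional[int]:
    """Extract the resident set size in kB, or None (kernel threads have no VmRSS)."""
    match = _VMRSS.search(status_text)
    return int(match.group(1)) if match else None


def read_vmrss_kb(pid: int) -> Optional[int]:
    """Read VmRSS for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return vmrss_kb(fh.read())


def steadily_growing(samples: Sequence[int]) -> bool:
    """True if samples never decrease and the last is larger than the first."""
    if len(samples) < 2:
        return False
    never_shrinks = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    return never_shrinks and samples[-1] > samples[0]
```

Calling `read_vmrss_kb` on the cib pid once an hour and feeding the readings to `steadily_growing` gives an early warning before the restart that the report describes becomes necessary.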