-----Original Message-----
From: Andrew Beekhof [mailto:[email protected]]
Sent: Tuesday, August 11, 2015 2:49 AM
To: Cluster Labs - All topics related to open-source clustering welcomed <[email protected]>
Subject: Re: [ClusterLabs] Memory leak in crm_mon ?

> On 10 Aug 2015, at 5:33 pm, Attila Megyeri <[email protected]> wrote:
>
> Hi!
>
> We are building a new cluster on top of pacemaker/corosync and several times
> during the past days we noticed that „crm_mon -Af” used up all the
> memory+swap and caused high CPU usage. Killing the process solves the issue.
>
> We are using the binary package versions available in the latest Ubuntu
> Trusty, namely:
>
> crmsh                1.2.5+hg1034-1ubuntu4
> pacemaker            1.1.10+git20130802-1ubuntu2.3
> pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.3
> corosync             2.3.3-1ubuntu1
>
> Kernel is 3.13.0-46-generic
>
> Looking back at some „atop” data, the CPU went to 100% many times during the
> last couple of days, at various times, most often at exactly midnight
> (strange).
>
> 08.05 14:00
> 08.06 21:41
> 08.07 00:00
> 08.07 00:00
> 08.08 00:00
> 08.09 06:27
>
> I checked the corosync log and syslog, but did not find any correlation
> between the log entries and these specific times.
> For most of the time, the node running crm_mon was the DC as well, not
> running any resources (e.g. a pairless node for quorum).
>
> We have another running system, where everything works perfectly, although
> it is almost the same:
>
> crmsh                1.2.5+hg1034-1ubuntu4
> pacemaker            1.1.10+git20130802-1ubuntu2.1
> pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.1
> corosync             2.3.3-1ubuntu1
>
> Kernel is 3.13.0-8-generic
>
> Is this perhaps a known issue?

Possibly, that version is over 2 years old.

> Any hints?

Getting something a little more recent would be the best place to start

Thanks Andrew,

I tried to upgrade to 1.1.12 using the packages available at https://launchpad.net/~syseleven-platform . In the first attempt I upgraded a single node to see how it works out, but I ended up with errors like

Could not establish cib_rw connection: Connection refused (111)

I have disabled the firewall, with no change. The node appears to be running but does not see any of the other nodes.
On the other nodes I see this node as UNCLEAN. (I assume corosync is fine, but pacemaker is not.) I use udpu for the transport.

Am I doing something wrong? I tried to look for some howtos on upgrading, but the only thing I found was the rather outdated http://clusterlabs.org/wiki/Upgrade

Could you please direct me to a howto/guide on how to perform the upgrade? Or am I facing a compatibility issue, so that I should extract the whole CIB, upgrade all nodes and reconfigure the cluster from scratch?

(The cluster is meant to go live in 2 days... :) )

Thanks a lot in advance

>
> Thanks!
> _______________________________________________
> Users mailing list: [email protected]
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
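
Until the crm_mon memory growth is understood, one possible stopgap is to watch the process's resident memory and react before it exhausts memory and swap. The following is only a minimal sketch, assuming a Linux host where /proc/<pid>/status is readable. The function names and the limit value are hypothetical and are not part of pacemaker or crm_mon; the PID of crm_mon has to be supplied by the caller.

```python
import re
from typing import Optional


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status.

    Returns None when no VmRSS line is present (e.g. for kernel threads).
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def over_limit(status_text: str, limit_kb: int) -> bool:
    """Return True when the resident set size exceeds limit_kb."""
    rss_kb = parse_vmrss_kb(status_text)
    return rss_kb is not None and rss_kb > limit_kb


def read_status(pid: int) -> str:
    """Read /proc/<pid>/status for the given PID (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return handle.read()
```

A periodic job could call over_limit(read_status(pid), limit_kb) with a limit chosen for the node (hypothetical example: 512 * 1024 for 512 MB) and log or restart crm_mon when it returns True. This only contains the symptom; it does not address the underlying cause.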
