> On 17 Aug 2015, at 4:35 pm, Ulrich Windl <[email protected]> wrote:
>
>>>> Andrew Beekhof <[email protected]> wrote on 17.08.2015 at 00:08 in message
> <[email protected]>:
>
>>> On 16 Aug 2015, at 9:41 pm, Attila Megyeri <[email protected]> wrote:
>>>
>>> Hi Andrew,
>>>
>>> I managed to isolate / reproduce the issue. You might want to take a look,
>>> as it might be present in 1.1.12 as well.
>>>
>>> I monitor my cluster from putty, mainly this way:
>>> - I have a putty (Windows client) session that connects via SSH to the box and
>>>   authenticates using a public key as a non-root user.
>>> - It immediately sends a "sudo crm_mon -Af" command, so with a single click
>>>   I have a nice view of what the cluster is doing.
>>
>> Perhaps add -1 to the option list.
>> The root cause seems to be that closing the putty window doesn’t actually
>> kill the process running inside it.
>
> Sorry, the root cause seems to be that crm_mon happily writes to a closed
> filehandle (I guess). If crm_mon handled that error by exiting the loop,
> there would be no need for putty to kill any process.

No, if you want a process to die you need to kill it.

>
>>
>>>
>>> Whenever I close this putty window (terminate the app), the crm_mon process
>>> goes to 100% CPU usage, starts to leak, consumes all memory within a few
>>> hours, and then destroys the whole cluster.
>>> This does not happen if I leave crm_mon with Ctrl-C.
>>>
>>> I can reproduce this 100% with crm_mon 1.1.10, with the mainstream Ubuntu
>>> trusty packages.
>>> This might be related to how sudo executes crm_mon, and what it signals to
>>> crm_mon when it gets terminated.
>>>
>>> Now I know what I need to pay attention to in order to avoid this problem,
>>> but you might want to check whether this issue is still present.
>>>
>>> Thanks,
>>> Attila
>>>
>>> -----Original Message-----
>>> From: Attila Megyeri [mailto:[email protected]]
>>> Sent: Friday, August 14, 2015 12:40 AM
>>> To: Cluster Labs - All topics related to open-source clustering welcomed <[email protected]>
>>> Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
>>>
>>> -----Original Message-----
>>> From: Andrew Beekhof [mailto:[email protected]]
>>> Sent: Tuesday, August 11, 2015 2:49 AM
>>> To: Cluster Labs - All topics related to open-source clustering welcomed <[email protected]>
>>> Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
>>>
>>>> On 10 Aug 2015, at 5:33 pm, Attila Megyeri <[email protected]> wrote:
>>>>
>>>> Hi!
>>>>
>>>> We are building a new cluster on top of pacemaker/corosync, and several times
>>>> during the past days we noticed that „crm_mon -Af” used up all the
>>>> memory+swap and caused high CPU usage. Killing the process solves the issue.
>>>>
>>>> We are using the binary package versions available in the latest Ubuntu
>>>> trusty, namely:
>>>>
>>>> crmsh                1.2.5+hg1034-1ubuntu4
>>>> pacemaker            1.1.10+git20130802-1ubuntu2.3
>>>> pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.3
>>>> corosync             2.3.3-1ubuntu1
>>>>
>>>> Kernel is 3.13.0-46-generic
>>>>
>>>> Looking back at some „atop” data, the CPU went to 100% many times during the
>>>> last couple of days, at various times, more often around midnight exactly
>>>> (strange).
>>>>
>>>> 08.05 14:00
>>>> 08.06 21:41
>>>> 08.07 00:00
>>>> 08.07 00:00
>>>> 08.08 00:00
>>>> 08.09 06:27
>>>>
>>>> Checked the corosync log and syslog, but did not find any correlation
>>>> between the entries in the logs around the specific times.
>>>> For most of the time, the node running crm_mon was the DC as well – not
>>>> running any resources (e.g. a pairless node for quorum).
>>>>
>>>> We have another running system, where everything works perfectly, although it
>>>> is almost the same:
>>>>
>>>> crmsh                1.2.5+hg1034-1ubuntu4
>>>> pacemaker            1.1.10+git20130802-1ubuntu2.1
>>>> pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.1
>>>> corosync             2.3.3-1ubuntu1
>>>>
>>>> Kernel is 3.13.0-8-generic
>>>>
>>>> Is this perhaps a known issue?
>>>
>>> Possibly, that version is over 2 years old.
>>>
>>>> Any hints?
>>>
>>> Getting something a little more recent would be the best place to start
>>>
>>> Thanks Andrew,
>>>
>>> I tried to upgrade to 1.1.12 using the packages available at
>>> https://launchpad.net/~syseleven-platform . In the first attempt I upgraded a
>>> single node, to see how it works out, but I ended up with errors like
>>>
>>> Could not establish cib_rw connection: Connection refused (111)
>>>
>>> I have disabled the firewall, no changes. The node appears to be running but
>>> does not see any of the other nodes. On the other nodes I see this node as an
>>> UNCLEAN one. (I assume corosync is fine, but pacemaker not)
>>> I use udpu for the transport.
>>>
>>> Am I doing something wrong? I tried to look for some howtos on upgrading, but
>>> the only thing I found was the rather outdated
>>> http://clusterlabs.org/wiki/Upgrade
>>>
>>> Could you please direct me to some howto/guide on how to perform the
>>> upgrade?
>>>
>>> Or am I facing some compatibility issue, so I should extract the whole cib,
>>> upgrade all nodes and reconfigure the cluster from scratch? (The cluster
>>> is meant to go live in 2 days... :) )
>>>
>>> Thanks a lot in advance
>>>
>>>>
>>>> Thanks!
>>>> _______________________________________________
>>>> Users mailing list: [email protected]
>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
