>>> Andrew Beekhof <[email protected]> wrote on 17.08.2015 at 00:08 in message <[email protected]>:
>> On 16 Aug 2015, at 9:41 pm, Attila Megyeri <[email protected]> wrote:
>>
>> Hi Andrew,
>>
>> I managed to isolate / reproduce the issue. You might want to take a look,
>> as it might be present in 1.1.12 as well.
>>
>> I monitor my cluster from putty, mainly this way:
>> - I have a putty (Windows client) session that connects via SSH to the box
>>   and authenticates using a public key as a non-root user.
>> - It immediately sends a "sudo crm_mon -Af" command, so with a single click
>>   I have a nice view of what the cluster is doing.
>
> Perhaps add -1 to the option list.
> The root cause seems to be that closing the putty window doesn’t actually
> kill the process running inside it.

Sorry, the root cause seems to be that crm_mon happily writes to a closed
filehandle (I guess). If crm_mon handled that error by exiting the loop,
there would be no need for putty to kill any process.

>
>>
>> Whenever I close this putty window (terminate the app), the crm_mon process
>> goes to 100% CPU usage, starts to leak, consumes all memory within a few
>> hours, and then destroys the whole cluster.
>> This does not happen if I leave crm_mon with Ctrl-C.
>>
>> I can reproduce this 100% with crm_mon 1.1.10, with the mainstream ubuntu
>> trusty packages.
>> This might be related to how sudo executes crm_mon, and what it signals to
>> crm_mon when it gets terminated.
>>
>> Now I know what I need to pay attention to in order to avoid this problem,
>> but you might want to check whether this issue is still present.
>>
>>
>> Thanks,
>> Attila
>>
>>
>> -----Original Message-----
>> From: Attila Megyeri [mailto:[email protected]]
>> Sent: Friday, August 14, 2015 12:40 AM
>> To: Cluster Labs - All topics related to open-source clustering welcomed <[email protected]>
>> Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
>>
>>
>> -----Original Message-----
>> From: Andrew Beekhof [mailto:[email protected]]
>> Sent: Tuesday, August 11, 2015 2:49 AM
>> To: Cluster Labs - All topics related to open-source clustering welcomed <[email protected]>
>> Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
>>
>>
>>> On 10 Aug 2015, at 5:33 pm, Attila Megyeri <[email protected]> wrote:
>>>
>>> Hi!
>>>
>>> We are building a new cluster on top of pacemaker/corosync, and several
>>> times during the past days we noticed that "crm_mon -Af" used up all the
>>> memory+swap and caused high CPU usage. Killing the process solves the issue.
>>>
>>> We are using the binary package versions available in the latest ubuntu
>>> trusty, namely:
>>>
>>> crmsh                1.2.5+hg1034-1ubuntu4
>>> pacemaker            1.1.10+git20130802-1ubuntu2.3
>>> pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.3
>>> corosync             2.3.3-1ubuntu1
>>>
>>> Kernel is 3.13.0-46-generic
>>>
>>> Looking back at some "atop" data, the CPU went to 100% many times during
>>> the last couple of days, at various times, most often at exactly midnight
>>> (strange).
>>>
>>> 08.05 14:00
>>> 08.06 21:41
>>> 08.07 00:00
>>> 08.07 00:00
>>> 08.08 00:00
>>> 08.09 06:27
>>>
>>> I checked the corosync log and syslog, but did not find any correlation
>>> between the entries in the logs around those specific times.
>>> For most of the time, the node running crm_mon was the DC as well, not
>>> running any resources (e.g. a pairless node for quorum).
>>>
>>>
>>> We have another running system, where everything works perfectly, although
>>> it is almost the same:
>>>
>>> crmsh                1.2.5+hg1034-1ubuntu4
>>> pacemaker            1.1.10+git20130802-1ubuntu2.1
>>> pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.1
>>> corosync             2.3.3-1ubuntu1
>>>
>>> Kernel is 3.13.0-8-generic
>>>
>>>
>>> Is this perhaps a known issue?
>>
>> Possibly, that version is over 2 years old.
>>
>>> Any hints?
>>
>> Getting something a little more recent would be the best place to start
>>
>> Thanks Andrew,
>>
>> I tried to upgrade to 1.1.12 using the packages available at
>> https://launchpad.net/~syseleven-platform . In the first attempt I upgraded
>> a single node to see how it works out, but I ended up with errors like
>>
>> Could not establish cib_rw connection: Connection refused (111)
>>
>> I have disabled the firewall, no changes. The node appears to be running but
>> does not see any of the other nodes. On the other nodes I see this node as
>> UNCLEAN. (I assume corosync is fine, but pacemaker is not.)
>> I use udpu for the transport.
>>
>> Am I doing something wrong? I tried to look for some howtos on upgrading,
>> but the only thing I found was the rather outdated
>> http://clusterlabs.org/wiki/Upgrade
>>
>> Could you please direct me to some howto/guide on how to perform the
>> upgrade?
>>
>> Or am I facing some compatibility issue, so that I should extract the whole
>> cib, upgrade all nodes and reconfigure the cluster from scratch? (The
>> cluster is meant to go live in 2 days... :) )
>>
>> Thanks a lot in advance
>>
>>
>>>
>>> Thanks!
>>> _______________________________________________
>>> Users mailing list: [email protected]
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
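
To illustrate the remark above about exiting the loop once the output is gone,
here is a minimal sketch in Python. It is not crm_mon's code (crm_mon is
written in C, and how it actually handles write errors is not shown in this
thread); run_monitor, render and max_iterations are hypothetical names made up
for the example. The only point is that a failed write (EPIPE for a closed
pipe, typically EIO for a hung-up terminal) ends the loop instead of being
retried forever.

```python
import errno
import os
import time


def run_monitor(fd, render, interval=0.0, max_iterations=None):
    """Repeatedly write render() output to fd; stop as soon as a write fails.

    Hypothetical helper for illustration only, not part of pacemaker.
    Returns "write-failed" if the output went away, "done" otherwise.
    """
    count = 0
    while max_iterations is None or count < max_iterations:
        data = render().encode()
        try:
            os.write(fd, data)
        except OSError as exc:
            # EPIPE: the reader closed a pipe; EIO: the terminal hung up;
            # EBADF: the descriptor itself was closed.
            if exc.errno in (errno.EPIPE, errno.EIO, errno.EBADF):
                return "write-failed"
            raise
        count += 1
        time.sleep(interval)
    return "done"


if __name__ == "__main__":
    # Simulate a closed viewer: a pipe whose read end is already gone.
    read_end, write_end = os.pipe()
    os.close(read_end)
    print(run_monitor(write_end, lambda: "cluster status\n", max_iterations=5))
    os.close(write_end)
```

Running crm_mon with -1, as suggested above, sidesteps the loop entirely
because the status is printed once and the process exits.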
