Jan, thank you for the help. I take it, then, that my understanding of what was happening is correct.
Thank you,
Kostia

On Fri, Feb 19, 2016 at 12:11 PM, Jan Friesse <[email protected]> wrote:

>> Jan, thank you for the answer. I still have a few questions.
>>
>> Also, FYI, this happened on a cluster with just one node.
>> The cluster is designed to be a two-node cluster, with the possibility to
>> work even with only one node.
>>
>>> and if corosync was not scheduled for more than token timeout "Process
>>> pause detected for ..." message is displayed and new membership is
>>> formed.
>>
>> That means the other nodes will call STONITH to fence that node, right?
>> Although it didn't happen that way, because there was no other "online"
>> node in the cluster.
>> And because I didn't configure my fencing device to accept a node list,
>> it was called by the cluster.
>> And the call was successful because of the logic - the other node was not
>> connected to the physical STONITH device, which equals a "successful
>> STONITH".
>> And then that Pacemaker logic kicked in to shut itself down (the "crit:"
>> message in the log).
>>
>>> There is really no help. It's best to make sure corosync is scheduled
>>> regularly.
>>
>> I may sound silly, but how can I do that?
>
> It's actually very hard to say. Pauses like 30 sec are really unusual and
> shouldn't happen (especially with RT scheduling). It usually happens on a
> VM where the host is overcommitted. I don't think that's your case.
> Another problem may be thrashing (which in theory may be the case because
> of the "Out of memory: Kill process 8674 ..." message).
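By the way, to rule out the scheduling and thrashing theories on my side, this is roughly how I plan to check the node the next time it is under such load. It is only a sketch using standard tools (chrt, free, vmstat), not something taken from the corosync documentation, so please correct me if there is a better way:

# scheduling policy and priority of the corosync main process;
# with RT scheduling in effect this should report SCHED_RR and a
# non-zero priority rather than SCHED_OTHER
chrt -p "$(pidof corosync)"

# memory pressure at the same moment; non-zero si/so columns in vmstat
# mean the box is swapping, so thrashing would be a real possibility
free -m
vmstat 1 10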
>> I did:
>>
>> # grep -n --color "Process pause detected for\|Corosync main process was
>>   not scheduled for\|invoked oom-killer\|kernel: Out of memory: Kill
>>   process\|kernel: Killed process\|RA:\|error:\|fence\|STONITH"
>>   B5-2U-205-LS.log > B5-2U-205-LS.log.cluster_excerpt
>>
>> and attached it to the letter.
>>
>> I see a complete disaster in the syslog.
>> Correct me if I am wrong, but here I will try to analyze what happened to
>> the cluster.
>>
>> At some point, when the system had already been working under really high
>> load for about 16 hours, Corosync started to report "Corosync main
>> process was not scheduled for ...".
>> Which means that Corosync wasn't scheduled by the OS often enough, so it
>> couldn't detect membership changes (token timeout).
>> Then, after a few such messages which appeared almost in a row, the
>> monitor operation of a few resources failed.
>> Question: in the log, the entries from the resource agent are shown
>> first, and then "lrmd" reports a timeout problem, like this:
>>
>> Jan 29 07:00:19 B5-2U-205-LS diskHelper(sm0dh)[18835]: WARNING: RA:
>> [monitor] : got rc=1
>> Jan 29 07:00:32 B5-2U-205-LS lrmd[3012]: notice: operation_finished:
>> sm0dh_monitor_30000:18803:stderr [ Failed to get properties: Connection
>> timed out ]
>>
>> Does it mean that the monitor failed because of a timeout?
>>
>> Two minutes later Corosync started to report another message, "Process
>> pause detected for ...".
>> Which means that Corosync was not scheduled for more than a token timeout.
>> Then five minutes later I have this line in the log:
>>
>> Jan 29 07:05:54 B5-2U-205-LS kernel: stonithd invoked oom-killer:
>> gfp_mask=0x201da, order=0, oom_score_adj=0
>>
>> Which I assume stands for "stonithd tried to allocate some memory, and
>> the kernel decided to run the oom-killer, because there was not enough
>> available memory". Am I right here?
>> Why did stonithd become active at that time?
>> Was it because of the "Process pause detected for ..." Corosync message?
>> What did stonithd actually aim to do?
>>
>> Then the oom-killer kills one of the heavy processes.
>> Then systemd-journal requests memory (?) and another heavy resource goes
>> down, killed by the oom-killer.
>>
>> Both killed resources were under Pacemaker's control.
>> Other processes managed by Pacemaker report monitor timeouts (?).
>> Then one of them times out on its "stop" operation, and so Pacemaker
>> requests the node to be STONITHed.
>> There is only one node in the cluster, and the only running resource is
>> not designed (properly - it doesn't have a "node-list") to kill the node
>> on which it runs.
>> And because there is no other node physically connected to the fencing
>> device, STONITH reports success.
>> Pacemaker's internal check kicks in (thank you, guys!) and Pacemaker
>> shuts itself down.
>>
>> Please correct me if I am wrong in this log analysis. I just want to
>> level up my understanding of what is happening here.
>> As a general question, did this all happen because of the following?
>>
>> For some reason Corosync started to experience a lack of processor time
>> (scheduling).
>> That is why monitor operations started to time out.
>> Then, after the "Process pause detected for ..." message, I assume the
>> node should be STONITHed by the other node, but there is no other node,
>> so what should happen in that case?
>> Then for some reason "stonithd" triggered the "oom-killer", which killed
>> one of the managed resources. Why?
>> The monitor function timed out for all resources continuously.
>> Eventually the "stop" function timed out for one of the resources, which
>> is why Pacemaker eventually shut down.
>>
>> Please correct me in case I am wrong anywhere in my assumptions.
>>
>> Thank you for spending your precious time reading all this =)
>> Hope for some help here =)
>>
>> Thank you,
>> Kostia
>>
>> On Wed, Feb 17, 2016 at 6:47 PM, Jan Friesse <[email protected]> wrote:
>>
>>> Kostiantyn Ponomarenko napsal(a):
>>>
>>>> Thank you for the suggestion.
>>>> The OS is Debian 8. All packages are built by myself:
>>>> libqb-0.17.2
>>>> corosync-2.3.5
>>>> cluster-glue-1.0.12
>>>> pacemaker-1.1.13
>>>>
>>>> It is really important for me to understand what is happening with the
>>>> cluster under the high load.
>>>
>>> For corosync it's really simple. Corosync has to be scheduled by the OS
>>> regularly (more often than its current token timeout) to be able to
>>> detect membership changes and send/receive messages (cpg). If it's not
>>> scheduled, membership is not up to date, and eventually, when it's
>>> finally scheduled, it logs the "process was not scheduled for ... ms"
>>> message (a warning for the user), and if corosync was not scheduled for
>>> more than the token timeout, the "Process pause detected for ..."
>>> message is displayed and a new membership is formed. Other nodes (if
>>> scheduled regularly) see the non-regularly-scheduled node as dead.
>>>
>>>> So I would appreciate any help here =)
>>>
>>> There is really no help. It's best to make sure corosync is scheduled
>>> regularly.
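Regarding the oom-killer part above, one thing I am considering on my side (my own idea, not something suggested in this thread, and it does not fix the underlying memory pressure) is to make the cluster daemons less attractive targets for the OOM killer, assuming nothing in the stack already adjusts these values:

# lower the OOM badness score of corosync and pacemakerd;
# the valid range is -1000..1000, where -1000 exempts a process completely
echo -500 > /proc/$(pidof corosync)/oom_score_adj
echo -500 > /proc/$(pidof pacemakerd)/oom_score_adj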
>>>> Thank you,
>>>> Kostia
>>>>
>>>> On Wed, Feb 17, 2016 at 5:02 PM, Greg Woods <[email protected]> wrote:
>>>>
>>>>> On Wed, Feb 17, 2016 at 3:30 AM, Kostiantyn Ponomarenko <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Jan 29 07:00:43 B5-2U-205-LS corosync[2742]: [MAIN ] Corosync main
>>>>>> process was not scheduled for 12483.7363 ms (threshold is 800.0000
>>>>>> ms). Consider token timeout increase.
>>>>>
>>>>> I was having this problem as well. You don't say which version of
>>>>> corosync you are running or on what OS, but on CentOS 7 there is an
>>>>> available update that looks like it might address this (it has to do
>>>>> with scheduling). We haven't gotten around to actually applying it yet
>>>>> because it will require some downtime on production services (we do
>>>>> have a few node-locked VMs in our cluster), and it only happens when
>>>>> the system is under very high load, so I can't say for sure the update
>>>>> will fix the issue, but it might be worth looking into.
>>>>>
>>>>> --Greg
>>>
>>> This update sets round-robin realtime scheduling for corosync by
>>> default. The same can be achieved without the update by editing
>>> /etc/sysconfig/corosync and changing the COROSYNC_OPTIONS line to
>>> something like COROSYNC_OPTIONS="-r"
>>>
>>> Regards,
>>>   Honza
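And since the log line above suggests "Consider token timeout increase", my understanding is that the knob is the "token" value in the totem section of /etc/corosync/corosync.conf. A rough sketch of what I intend to try (the 10000 ms value is just an example I picked, not a recommendation from this thread):

# show the token timeout corosync is currently running with
corosync-cmapctl | grep totem.token

# then set e.g. "token: 10000" inside the totem { } block of
# /etc/corosync/corosync.conf and restart corosync on every node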
_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
