I pasted the wrong entry from "journalctl --since="2016-01-19" --until="2016-01-20"" earlier. The correct one is:
Jan 19 23:42:24 A2-2U12-302-LS ntpd[2204]: 0.0.0.0 c61c 0c clock_step -43194.111405 s
Jan 19 11:42:29 A2-2U12-302-LS ntpd[2204]: 0.0.0.0 c614 04 freq_mode
Jan 19 11:42:29 A2-2U12-302-LS systemd[1]: Time has been changed

Yes, the very first monitor operation was successful. But I still have a question. The first monitor op was at [Jan 19 23:42:16] and the failure was considered by Pacemaker at [Jan 19 12:57:53]. So the first one still remains older, and here I don't understand why Pacemaker considers it failed?
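As a sanity check on the size of that step (plain shell arithmetic, nothing Pacemaker-specific; just to show the jump matches the journal timestamps):

    $ echo '43194 / 3600' | bc -l
    11.99833333333333333333

In other words, the clock was stepped back almost exactly 12 hours, which is why entries stamped 23:42 are immediately followed by entries stamped 11:42 in the same journal.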
Thank you,
Kostia

On Tue, Jan 19, 2016 at 8:02 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> On 01/19/2016 10:30 AM, Kostiantyn Ponomarenko wrote:
> > The resource that wasn't running, but was reported as running, is
> > "adminServer".
> >
> > Here is a brief chronological description:
> >
> > [Jan 19 23:42:16] The first time Pacemaker triggers its monitor function - line #1107 (those lines are from its Resource Agent).
> > [Jan 19 23:42:16] Then Pacemaker starts the resource - line #1191.
> > [Jan 19 11:42:53] The first failure is reported by the monitor operation at line #1543.
> > [Jan 19 11:42:53] The fail-count is set, but I don't see any attempt from Pacemaker to "start" the resource - the start function is not called (from the logs) - line #1553.
> > [Jan 19 12:27:56] Then adminServer's monitor operation keeps returning $OCF_NOT_RUNNING - starting at line #1860.
> > [Jan 19 12:57:53] Then the expired fail-count is cleared at line #1969.
> > [Jan 19 12:57:53] Another call of the monitor function happens at line #2038.
> > [Jan 19 12:57:53] I assume that line #2046 means "not running" (?).
> > [Jan 19 12:57:53] The "stop" function is called - line #2150.
> > [Jan 19 12:57:53] The "start" function is called and the resource is successfully started - line #2164.
> >
> > The time change occurred while the cluster was starting; I can see this from "journalctl --since="2016-01-19" --until="2016-01-20"":
> >
> > Jan 19 23:10:39 A2-2U12-302-LS ntpd[2210]: 0.0.0.0 c61c 0c clock_step -43193.793349 s
> > Jan 19 11:10:45 A2-2U12-302-LS ntpd[2210]: 0.0.0.0 c614 04 freq_mode
> > Jan 19 11:10:45 A2-2U12-302-LS systemd[1]: Time has been changed
> >
> > I am attaching corosync.log.
>
> The time change is interesting. I suspect what's happening is that
> pacemaker considers the failed monitor "older" than the original
> successful one, and so ignores it.
>
> In general, we don't support large clock shifts in a running cluster (a
> topic that has come up before on this list). But if you can reproduce
> the behavior with 1.1.14, feel free to open a bug report. It might be
> worth revisiting to see if there is anything we can do about it.
>
> > Thank you,
> > Kostia
> >
> > On Tue, Jan 19, 2016 at 5:17 PM, Bogdan Dobrelya <bdobre...@mirantis.com>
> > wrote:
> >
> >> On 19.01.2016 16:13, Ken Gaillot wrote:
> >>> On 01/19/2016 06:49 AM, Kostiantyn Ponomarenko wrote:
> >>>> One of the resources in my cluster is not actually running, but "crm_mon" shows it with the "Started" status.
> >>>> Its resource agent's monitor function returns "$OCF_NOT_RUNNING", but Pacemaker does not react to this in any way - crm_mon shows the resource as Started.
> >>>> I couldn't find an explanation for this behavior, so I suppose it is a bug, is it?
> >>>
> >>> That is unexpected. Can you post the configuration and logs from around
> >>> the time of the issue?
> >>
> >> Oh, sorry, I forgot to mention the related thread [0]. That is exactly
> >> the case I reported there. It looks the same, so I thought you had just
> >> updated my thread :)
> >>
> >> These may perhaps be merged.
> >>
> >> [0] http://clusterlabs.org/pipermail/users/2016-January/002035.html
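For anyone digging into a similar situation, a few commands make it easier to see what Pacemaker has actually recorded for the resource. This is only a sketch - "adminServer" is the resource name from the logs above, and exact option spellings differ a bit between Pacemaker versions:

    # one-shot status including operation history and fail counts
    crm_mon -1 -o -f

    # query the recorded fail count for one resource on the local node
    crm_failcount -r adminServer -G

    # clear the stale failure history so the cluster re-probes the resource
    crm_resource --cleanup --resource adminServer

Clearing the failure history with crm_resource --cleanup is usually the quickest way to get the cluster to re-check a resource whose recorded state looks out of date.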
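And for completeness, this is roughly what "the monitor function returns $OCF_NOT_RUNNING" means on the resource-agent side. A minimal sketch only, not the author's actual agent - the process name adminserverd is made up for illustration:

    #!/bin/sh
    # Pull in the standard OCF definitions (OCF_SUCCESS=0, OCF_NOT_RUNNING=7, ...)
    : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
    . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

    adminserver_monitor() {
        # Hypothetical liveness check; the real agent's test will differ.
        if pgrep -x adminserverd >/dev/null 2>&1; then
            return $OCF_SUCCESS      # Pacemaker records the resource as running
        else
            return $OCF_NOT_RUNNING  # Pacemaker should record it as stopped
        fi
    }

When a recurring monitor returns $OCF_NOT_RUNNING, Pacemaker is expected to record a failure and recover the resource, which is what eventually happened at 12:57:53 in the logs above.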