On 30/05/2019 14:06, Jan Pokorný wrote:
> On 30/05/19 11:01 +0100, lejeczek wrote:
>> On 29/05/2019 21:04, Ken Gaillot wrote:
>>> On Wed, 2019-05-29 at 17:28 +0100, lejeczek wrote:
>>>> and:
>>>> $ systemctl status -l pacemaker.service
>>>> ● pacemaker.service - Pacemaker High Availability Cluster Manager
>>>>    Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
>>>>    Active: active (running) since Wed 2019-05-29 17:21:45 BST; 7s ago
>>>>      Docs: man:pacemakerd
>>>>            https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html
>>>>  Main PID: 51617 (pacemakerd)
>>>>     Tasks: 1
>>>>    Memory: 3.3M
>>>>    CGroup: /system.slice/pacemaker.service
>>>>            └─51617 /usr/sbin/pacemakerd -f
>>>>
>>>> May 29 17:21:45 rider.private pacemakerd[51617]: notice: Tracking existing pengine process (pid=51528)
>>>> May 29 17:21:45 rider.private pacemakerd[51617]: notice: Tracking existing lrmd process (pid=51542)
>>>> May 29 17:21:45 rider.private pacemakerd[51617]: notice: Tracking existing stonithd process (pid=51558)
>>>> May 29 17:21:45 rider.private pacemakerd[51617]: notice: Tracking existing attrd process (pid=51559)
>>>> May 29 17:21:45 rider.private pacemakerd[51617]: notice: Tracking existing cib process (pid=51560)
>>>> May 29 17:21:45 rider.private pacemakerd[51617]: notice: Tracking existing crmd process (pid=51566)
>>>> May 29 17:21:45 rider.private pacemakerd[51617]: notice: Quorum acquired
>>>> May 29 17:21:45 rider.private pacemakerd[51617]: notice: Node whale.private state is now member
>>>> May 29 17:21:45 rider.private pacemakerd[51617]: notice: Node swir.private state is now member
>>>> May 29 17:21:45 rider.private pacemakerd[51617]: notice: Node rider.private state is now member
>
> I grok that you've, in parallel, started asking about this part also
> on the systemd ML, and I redirected that thread here (but my message
> still hasn't arrived here, being stuck in the moderation queue, since
> I use different addresses on these two lists -- you can still respond
> right away to it, as it is readily available via said systemd list;
> just make sure you only target users@cl.o, as it was really unrelated
> to systemd).
>
> In a nutshell, we want to know how you got into a situation where
> entirely detached subdaemons would be floating around in your
> environment prior to starting pacemaker.service (or after stopping
> it). That's rather unexpected.
>
> If you can dig up traces of any pacemaker-associated processes
> (search pattern: pacemaker*|attrd|cib|crmd|lrmd|stonithd|pengine)
> dying (plus the messages logged immediately before that, if any),
> it could help us diagnose your situation.
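[Editorially, the log search Jan suggests could be sketched roughly as below. The log excerpt is made up for illustration (the first line is hypothetical, not from this thread); on a real node you would feed the pipeline `journalctl -b` output or /var/log/messages instead.]

```shell
# Rough sketch of the suggested search, run against a made-up log
# excerpt; on a real node, replace the hard-coded $log with e.g.
# `journalctl -b` or the contents of /var/log/messages.
pattern='pacemaker[a-z]*|attrd|cib|crmd|lrmd|stonithd|pengine'

log='May 29 17:20:01 rider.private crmd[51566]: error: child exited with signal 11
May 29 17:20:02 rider.private sshd[1234]: Accepted publickey for root
May 29 17:21:45 rider.private pacemakerd[51617]: notice: Quorum acquired'

# Keep pacemaker-family lines, then narrow to hints of abnormal exit:
printf '%s\n' "$log" | grep -E "$pattern" | grep -Ei 'exit|signal|killed|segfault'
```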
I think... it was the time. I cannot afford to investigate it by
reverting to the state in which it failed, but it should be easy for a
developer to reproduce in a lab: time was not in sync between the three
nodes, with a few minutes' discrepancy between them. On the one node
with the crippled systemd service I was getting:

$ pcs status --all
Error: cluster is not currently running on this node

If it really is time, then maybe some checks should be put in place
(pacemaker/corosync). Everybody knows how vital time is for everything,
but sometimes it escapes our attention, and such checks would be of
great help.

many thanks, L

p.s. !!!! be aware of the time, always !!!
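[Editorially, the kind of drift check L is asking for could be sketched as below. The timestamps are hard-coded for illustration; in practice you would gather them with something like `ssh "$node" date +%s`, or compare `chronyc tracking` output on each node. The 120-second threshold is an arbitrary example, not anything pacemaker or corosync mandates.]

```shell
# Hypothetical pre-flight check: flag any node whose clock drifts more
# than $threshold seconds from the reference node. The epoch values
# below are invented for illustration; collect real ones with e.g.
# `ssh "$node" date +%s`.
threshold=120          # seconds; arbitrary example value
ref=1559146905         # reference node (say, rider.private)

for t in 1559146905 1559146910 1559147150; do
    drift=$(( t - ref ))
    drift=${drift#-}   # absolute value
    if [ "$drift" -gt "$threshold" ]; then
        echo "clock drift of ${drift}s exceeds ${threshold}s"
    fi
done
```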
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/