On 30/05/2019 14:06, Jan Pokorný wrote:
> On 30/05/19 11:01 +0100, lejeczek wrote:
>> On 29/05/2019 21:04, Ken Gaillot wrote:
>>> On Wed, 2019-05-29 at 17:28 +0100, lejeczek wrote:
>>>> and:
>>>> $ systemctl status -l pacemaker.service 
>>>> ● pacemaker.service - Pacemaker High Availability Cluster Manager
>>>>    Loaded: loaded (/usr/lib/systemd/system/pacemaker.service;
>>>> disabled; vendor preset: disabled)
>>>>    Active: active (running) since Wed 2019-05-29 17:21:45 BST; 7s ago
>>>>      Docs: man:pacemakerd
>>>>            
>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html
>>>>  Main PID: 51617 (pacemakerd)
>>>>     Tasks: 1
>>>>    Memory: 3.3M
>>>>    CGroup: /system.slice/pacemaker.service
>>>>            └─51617 /usr/sbin/pacemakerd -f
>>>>
>>>> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking 
>>>> existing pengine process (pid=51528)
>>>> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking 
>>>> existing lrmd process (pid=51542)
>>>> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking 
>>>> existing stonithd process (pid=51558)
>>>> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking 
>>>> existing attrd process (pid=51559)
>>>> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking 
>>>> existing cib process (pid=51560)
>>>> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking 
>>>> existing crmd process (pid=51566)
>>>> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Quorum acquired
>>>> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Node 
>>>> whale.private state is now member
>>>> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Node 
>>>> swir.private state is now member
>>>> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Node 
>>>> rider.private state is now member
> I grok that you've, in parallel, started asking about this part also
> on the systemd ML, and I redirected that thread here (but my message
> still didn't hit here for being stuck in the moderation queue, since
> I use different addresses on these two lists -- you can still respond
> right away to that as readily available via said systemd list, just
> make sure you only target users@cl.o, it was really unrelated to
> systemd).
>
> In a nutshell, we want to know how you get into such situation that
> entirely detached subdaemons would be floating around in your environment,
> prior to starting pacemaker.service (or after stopping it).
> That's rather unexpected.
> If you can dig up traces of any pacemaker associated processes
> (search pattern: pacemaker*|attrd|cib|crmd|lrmd|stonithd|pengine)
> dying (+ the messages logged immediately before that if at all),
> it could help us diagnose your situation.
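
(For what it's worth, a rough way to pull such traces out of the journal
might be something like the following -- assuming journald keeps the logs;
the time window and the second filter are just examples:)

$ journalctl --since "2 days ago" \
    | grep -E 'pacemaker|attrd|cib|crmd|lrmd|stonithd|pengine' \
    | grep -Ei 'signal|segfault|terminat|exit'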

I think... it was the time. I cannot afford to investigate it by trying
to revert to the state in which it failed, but it should be easy for a
developer to reproduce in a lab: the time was not in sync between the
three nodes, with a discrepancy of a few minutes between them.
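
(Purely a sketch of what I mean -- "node2" is a placeholder, and skewing
the clock by hand like this assumes NTP is switched off on that node first:)

$ ssh node2 'timedatectl set-ntp false && date --set="$(date --date="3 minutes ago")"'
$ pcs cluster start --all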

On the one node where the systemd service was crippled, I was getting:

$ pcs status --all
Error: cluster is not currently running on this node

If it really was the time, then maybe some checks should be put in place
(in pacemaker/corosync). Everybody knows how vital time synchronization
is, but sometimes it escapes our attention, and such checks would be of
great value.
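
(Just a sketch of the kind of check I have in mind, assuming chronyd is
the time source -- the one-second threshold is arbitrary:)

$ chronyc tracking | awk '/^System time/ { if ($4 > 1) print "WARNING: clock offset is", $4, "seconds" }'
$ chronyc waitsync 3 0.5 || echo "clock not in sync, think twice before starting the cluster"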

many thanks, L

P.S. Be aware of the time, always!!!

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
