On Mon, Jan 8, 2018 at 9:51 AM, Nish Aravamudan <nish.aravamu...@canonical.com> wrote:
> On Mon, Jan 8, 2018 at 8:48 AM, Victor Tapia <victor.ta...@canonical.com> wrote:
>> As mentioned by Mario @ #10, stopping corosync while pacemaker runs
>> throws the same error as the upgrade. Syslog from Xenial +
>> corosync=2.3.5-3ubuntu1:
>>
>> Jan  8 16:24:37 xenial-corosync systemd[1]: Stopping Pacemaker High Availability Cluster Manager...
>> Jan  8 16:24:37 xenial-corosync pacemakerd[28747]:   notice: Invoking handler for signal 15: Terminated
>> Jan  8 16:24:37 xenial-corosync crmd[28753]:   notice: Invoking handler for signal 15: Terminated
>> Jan  8 16:24:37 xenial-corosync crmd[28753]:   notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
>> Jan  8 16:24:37 xenial-corosync pengine[28752]:   notice: Delaying fencing operations until there are resources to manage
>> Jan  8 16:24:37 xenial-corosync pengine[28752]:   notice: Scheduling Node xenial-corosync for shutdown
>> Jan  8 16:24:37 xenial-corosync pengine[28752]:   notice: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-52.bz2
>> Jan  8 16:24:37 xenial-corosync crmd[28753]:   notice: Transition 1 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-52.bz2): Complete
>> Jan  8 16:24:37 xenial-corosync crmd[28753]:   notice: Disconnecting from Corosync
>> Jan  8 16:24:37 xenial-corosync cib[28748]:  warning: new_event_notification (28748-28753-12): Broken pipe (32)
>> Jan  8 16:24:37 xenial-corosync pengine[28752]:   notice: Invoking handler for signal 15: Terminated
>> Jan  8 16:24:37 xenial-corosync attrd[28751]:   notice: Invoking handler for signal 15: Terminated
>> Jan  8 16:24:37 xenial-corosync lrmd[28750]:   notice: Invoking handler for signal 15: Terminated
>> Jan  8 16:24:37 xenial-corosync stonith-ng[28749]:   notice: Invoking handler for signal 15: Terminated
>> Jan  8 16:24:37 xenial-corosync cib[28748]:   notice: Invoking handler for signal 15: Terminated
>> Jan  8 16:24:37 xenial-corosync cib[28748]:   notice: Disconnecting from Corosync
>> Jan  8 16:24:37 xenial-corosync cib[28748]:   notice: Disconnecting from Corosync
>> Jan  8 16:24:37 xenial-corosync systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
>>
>>
>> Pacemakerd shuts down by sending SIGTERM to its components, but after the
>> install, corosync does not start pacemaker again. BTW, "systemctl restart
>> corosync" restarts both services perfectly.
>>
>> I think that option A from James Page (#11) is the way to go.
>
> I took a quick look at a LXD container after seeing Felipe and
> Victor's posts. It seems like this is a bug in the xenial (at least)
> systemd unit files:
>
> # grep pacemaker /lib/systemd/system/corosync.service
> #  pacemaker.service, and if you want to exert the watchdog when a
>
> # grep corosync /lib/systemd/system/pacemaker.service
> After=corosync.service
> Requires=corosync.service
> # ExecStopPost=/bin/sh -c 'pidof crmd || killall -TERM corosync'
>
> So, what I see is that corosync.service has no dependency on
> pacemaker.service (in the file).
>
> pacemaker.service will start after corosync.service. And when
> pacemaker.service is shut down, that will happen before corosync.service
> is stopped. Additionally, if pacemaker.service is started, then
> corosync.service is started as well.
>
> Note that nothing specifies what Felipe said: there is no guarantee that
> pacemaker is started, restarted, etc. when corosync is.
>
> I think the next step is to look at Bionic's systemd services
> (probably newer) or upstream's and see if there is a difference, or
> new dependencies added there.

Or perhaps ask upstream what they think is providing this assurance in
their systemd files, because I'm not seeing it.

If we have a hard dependency between pacemaker and corosync, then I
think we might need a PartOf= directive, to ensure the two services
always go through their state transitions together.
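
A minimal sketch of what that could look like as a local drop-in,
assuming the xenial unit names above. The file is hypothetical (nothing
the package ships); the path is the one `systemctl edit pacemaker.service`
creates for overrides:

```ini
# /etc/systemd/system/pacemaker.service.d/override.conf
# Hypothetical local override, not part of the packaged unit files.
[Unit]
# PartOf=: when corosync.service is stopped or restarted, systemd
# propagates that stop/restart to pacemaker.service.
# It does NOT start pacemaker.service when corosync.service is started
# on its own.
PartOf=corosync.service
```

After a `systemctl daemon-reload`, `systemctl show -p PartOf
pacemaker.service` should list corosync.service. Per systemd.unit(5),
PartOf= only covers stop and restart, so by itself it may not address
the upgrade case, where corosync is stopped and later started again
without pacemaker.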

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1740892

Title:
  corosync upgrade on 2018-01-02 caused pacemaker to fail

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-hacluster/+bug/1740892/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
