Bug#1025618: cloud-init and firewalld systemd unit files have ordering cycles

2023-01-17 Thread Ross Vandegrift
On Fri, Dec 16, 2022 at 03:48:00PM -0800, Ross Vandegrift wrote:
> At a high level the issue is: firewalld.service forces network-pre.target after
> sysinit.target, but cloud-init.service forces the other way around.  In detail,
> using < to represent Before, the imposed orderings look like:
> 
> - from firewalld:
>   sysinit.target < dbus.service < firewalld.service < network-pre.target
> - from cloud-init:
>   cloud-init-local.service < network-pre.target <
>   systemd-networkd-wait-online.service < cloud-init.service < sysinit.target
> 
> There's a few approaches to resolving this.  As far as I can tell, the only
> immediately viable one (at the bottom) requires users to manually fix this
> and accept some trade-offs.  Anyone have any better ideas?

We discussed this issue at the recent cloud-team meeting and came up
with some revised options.

> Modify firewalld to run before sysinit.target
> ---------------------------------------------
[snip]

This one still seems impossible.

> Modify cloud-init to run after sysinit.target
> ---------------------------------------------
[snip]

The main downside of this one is that cloud-init would run too late to
configure block devices.  Since this feature didn't always work well, we
wondered whether we'd only be affecting a non-working feature.

However, I've confirmed that cloud-init's block device setup is working
well on AWS at least.  So I think this would break working cloud-init
features.  IMO, that means it is not viable.

> Locally override firewalld.service's order
> ------------------------------------------
[snip]

This remains unattractive since unsuspecting users will be left with
broken images and no clear path to fix the problem.


Modify dbus to run later
------------------------

We discussed a way to improve things by shuffling dbus later, but I didn't
take good enough notes, and I can't reconstruct the details.  Sorry for
forgetting - Bastian, do you recall the details?


Add Breaks or Conflicts to prevent coinstallation
-------------------------------------------------

None of the alternatives seems reasonable, and installing cloud-init and
firewalld together cannot produce a working Debian image.  So we should
prevent this state.

We thought Conflicts might be required because once both are unpacked,
the problematic cycle technically exists.  Though it may not cause harm
unless both services are (re-)started simultaneously.

Ross



Bug#1025618: cloud-init and firewalld systemd unit files have ordering cycles

2022-12-16 Thread Ross Vandegrift
On Mon, Dec 12, 2022 at 05:41:46PM -0800, Noah Meyerhans wrote:
> On 12/12/2022 6:44 AM, Sam Hartman wrote:
> >  >> From my quick read: Michael Biebl proposes dropping
> >  >> network-pre.target
> >  Ross> from cloud-init's After=, and replacing it with each of the
> >  Ross> config backends that cloud-init supports.  This sounds pretty
> >  Ross> reasonable, but also like something that upstream should
> >  Ross> address first.
> > 
> > Why wait for upstream?
> > It's a bug affecting Debian users, our systemd maintainer has a solution
> > that you (and I) think is reasonable.
> > The symptom is quite serious.
> > We often make changes before upstream in situations like that,
> > especially when the alternative is:
> > 
> >  Ross> Should we consider adding "Conflicts: firewalld" to cloud-init
> >  Ross> before the freeze?  That's not optimal of course, but it'd
> >  Ross> prevent a user from ending up in this situation for now.
> > 
> > I'd much rather see Debian local changes than conflicts.
> 
> We should simply move this discussion to an upstream pull request rather
> than wait passively for their response. I agree that diverging from upstream
> is preferable to unnecessary conflicts, but it shouldn't be done without
> first consulting with upstream on our proposed solution.

I played with the suggested solution and was unable to get it working:
cloud-init.service doesn't have a /direct/ Before=network-pre.target to remove.
The ordering is implicit in the combination of units.

I suspect Michael knew that when he made the suggestion, but I had to
play with it for a few hours first. :)

At a high level the issue is: firewalld.service forces network-pre.target after
sysinit.target, but cloud-init.service forces the other way around.  In detail,
using < to represent Before, the imposed orderings look like:

- from firewalld:
  sysinit.target < dbus.service < firewalld.service < network-pre.target
- from cloud-init:
  cloud-init-local.service < network-pre.target <
  systemd-networkd-wait-online.service < cloud-init.service < sysinit.target
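
As a cross-check, the two chains can be fed to a small cycle detector.
This is only a sketch in Python: the edges are copied from the chains
listed here rather than read from the real unit files, and the helper
names (build_graph, find_cycle) are made up for illustration.

```python
# Ordering chains as given in this message; "a < b" means a is ordered Before b.
FIREWALLD_CHAIN = [
    "sysinit.target", "dbus.service", "firewalld.service", "network-pre.target",
]
CLOUD_INIT_CHAIN = [
    "cloud-init-local.service", "network-pre.target",
    "systemd-networkd-wait-online.service", "cloud-init.service",
    "sysinit.target",
]


def build_graph(chains):
    """Turn each chain into Before edges: unit -> set of units ordered after it."""
    graph = {}
    for chain in chains:
        for earlier, later in zip(chain, chain[1:]):
            graph.setdefault(earlier, set()).add(later)
            graph.setdefault(later, set())
    return graph


def find_cycle(graph):
    """Depth-first search; return one cycle as a list of units, or None."""
    visiting, done = [], set()

    def visit(unit):
        if unit in done:
            return None
        if unit in visiting:
            # Reached a unit that is still on the current path: that is a cycle.
            return visiting[visiting.index(unit):] + [unit]
        visiting.append(unit)
        for nxt in sorted(graph[unit]):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(unit)
        return None

    for unit in sorted(graph):
        cycle = visit(unit)
        if cycle:
            return cycle
    return None


cycle = find_cycle(build_graph([FIREWALLD_CHAIN, CLOUD_INIT_CHAIN]))
print(" < ".join(cycle) if cycle else "no cycle")
```

Either chain on its own is acyclic; the loop only shows up once both are
combined, which fits the ordering being implicit in the combination of
units.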

There's a few approaches to resolving this.  As far as I can tell, the only
immediately viable one (at the bottom) requires users to manually fix this
and accept some trade-offs.  Anyone have any better ideas?



Modify firewalld to run before sysinit.target
---------------------------------------------

This would let cloud-init and firewalld agree to do network-pre.target before
sysinit.target.

This is probably not possible since firewalld requires dbus, which starts after
sysinit.target.  There's a thread at [1] about why moving firewalld to be an
early boot service is difficult.
   

Modify cloud-init to run after sysinit.target
---------------------------------------------

This would let cloud-init and firewalld agree to do network-pre.target after
sysinit.target.  This might not be advisable (see comments in [1] about running
network management services in late boot), but it looks like this is how RHEL
does it [2].

From [3], I think cloud-init.service added Before=basic.target (which
eventually became Before=sysinit.target) to ensure cloud-init-configured block
device mounts were ready early enough in the boot process.  The network needs
to be online for this, since some block device config can come from network
sources.  So changing this in the Debian package seems risky to me.


Locally override firewalld.service's order
------------------------------------------

If you need to use both together, create an override unit that removes
Before=network-pre.target.  This eliminates the cycle by allowing cloud-init's
order to win, but the network will be up without firewalld for a period.
Unfortunately, dependencies can't be removed in a drop-in, so I think you need
to copy the unit to /etc/systemd/system and modify it.
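
A rough sketch of the "copy and modify" step, in Python.  It only edits
text: drop_ordering is a hypothetical helper, and SAMPLE_UNIT is an
invented excerpt for illustration, not the contents of the packaged
firewalld.service.

```python
def drop_ordering(unit_text, key, unit_name):
    """Return unit_text with unit_name removed from every `key=` line.

    A line that lists only unit_name is dropped entirely; all other
    lines are returned unchanged.
    """
    out = []
    for line in unit_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            kept = [u for u in value.split() if u != unit_name]
            if not kept:
                continue
            line = f"{key}={' '.join(kept)}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical excerpt for illustration; the real unit file may differ.
SAMPLE_UNIT = """\
[Unit]
Description=example firewall daemon
Before=network-pre.target
Wants=network-pre.target
After=dbus.service

[Service]
ExecStart=/usr/sbin/firewalld
"""

print(drop_ordering(SAMPLE_UNIT, "Before", "network-pre.target"), end="")
```

In practice the input would be the packaged unit and the output would be
written to /etc/systemd/system/firewalld.service, followed by
`systemctl daemon-reload`.  Only the Before= line is touched, so the
trade-off described here still applies.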

Ross

[1] - https://lists.freedesktop.org/archives/systemd-devel/2022-March/047538.html
[2] - https://github.com/canonical/cloud-init/blob/main/systemd/cloud-init.service.tmpl#L4-L6
[3] - https://github.com/canonical/cloud-init/commit/80f5ec4be0f781b26eca51d90d51abfab396b3f6



Bug#1025618: cloud-init and firewalld systemd unit files have ordering cycles

2022-12-12 Thread Noah Meyerhans

On 12/12/2022 6:44 AM, Sam Hartman wrote:

>  >> From my quick read: Michael Biebl proposes dropping
>  >> network-pre.target
>  Ross> from cloud-init's After=, and replacing it with each of the
>  Ross> config backends that cloud-init supports.  This sounds pretty
>  Ross> reasonable, but also like something that upstream should
>  Ross> address first.
>
> Why wait for upstream?
> It's a bug affecting Debian users, our systemd maintainer has a solution
> that you (and I) think is reasonable.
> The symptom is quite serious.
> We often make changes before upstream in situations like that,
> especially when the alternative is:
>
>  Ross> Should we consider adding "Conflicts: firewalld" to cloud-init
>  Ross> before the freeze?  That's not optimal of course, but it'd
>  Ross> prevent a user from ending up in this situation for now.
>
> I'd much rather see Debian local changes than conflicts.


We should simply move this discussion to an upstream pull request rather 
than wait passively for their response. I agree that diverging from 
upstream is preferable to unnecessary conflicts, but it shouldn't be 
done without first consulting with upstream on our proposed solution.


noah



Bug#1025618: cloud-init and firewalld systemd unit files have ordering cycles

2022-12-12 Thread Sam Hartman
> "Ross" == Ross Vandegrift writes:

>> From my quick read: Michael Biebl proposes dropping
>> network-pre.target
Ross> from cloud-init's After=, and replacing it with each of the
Ross> config backends that cloud-init supports.  This sounds pretty
Ross> reasonable, but also like something that upstream should
Ross> address first.

Why wait for upstream?
It's a bug affecting Debian users, our systemd maintainer has a solution
that you (and I) think is reasonable.
The symptom is quite serious.
We often make changes before upstream in situations like that,
especially when the alternative is:

Ross> Should we consider adding "Conflicts: firewalld" to cloud-init
Ross> before the freeze?  That's not optimal of course, but it'd
Ross> prevent a user from ending up in this situation for now.

I'd much rather see Debian local changes than conflicts.



Bug#1025618: cloud-init and firewalld systemd unit files have ordering cycles

2022-12-08 Thread Guillaume Knispel
Hi Ross,

> Should we consider adding "Conflicts: firewalld" to cloud-init before
> the freeze?  That's not optimal of course, but it'd prevent a user from
> ending up in this situation for now.

Is there a way to bypass "Conflicts" and install such packages anyway,
in case the user finds a way to customize the configuration in a way that
fits their needs?

For example, on one of my systems I kept cloud-init installed for now, but
I disabled and masked cloud-init.service.

Cheers!
Guillaume



Bug#1025618: cloud-init and firewalld systemd unit files have ordering cycles

2022-12-07 Thread Ross Vandegrift
Control: forwarded -1 https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1956629

Hi Guillaume,

On Tue, Dec 06, 2022 at 06:26:26PM +0100, Guillaume Knispel wrote:
> firewalld and cloud-init have ordering cycles between their systemd unit
> files, leading to more or less broken boot results when both are installed
> and active, because at each boot systemd decides to skip a
> non-deterministically chosen service (not necessarily cloud-init or
> firewalld) to break the cycle.

Thanks for bringing this to our attention.  There's a few useful
discussions:

https://github.com/firewalld/firewalld/issues/414
https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1956629

From my quick read: Michael Biebl proposes dropping network-pre.target
from cloud-init's After=, and replacing it with each of the config
backends that cloud-init supports.  This sounds pretty reasonable, but
also like something that upstream should address first.

Should we consider adding "Conflicts: firewalld" to cloud-init before
the freeze?  That's not optimal of course, but it'd prevent a user from
ending up in this situation for now.

Thanks,
Ross



Bug#1025618: cloud-init and firewalld systemd unit files have ordering cycles

2022-12-06 Thread Guillaume Knispel
Package: cloud-init
Version: 20.4.1-2+deb11u1
Severity: important
X-Debbugs-Cc: xi...@australdx.fr

Dear Maintainer,

firewalld and cloud-init have ordering cycles between their systemd unit
files, leading to more or less broken boot results when both are installed
and active, because at each boot systemd decides to skip a
non-deterministically chosen service (not necessarily cloud-init or
firewalld) to break the cycle.

I'm not sure whether either firewalld or cloud-init is more at fault
(maybe in not respecting some systemd rules?), so I'm also opening a
duplicate of this bug for the other package.

This can have various but potentially serious consequences, depending on
what should be, but is not, started.
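
The job that was dropped on a given boot can be pulled out of the journal
text with a short filter.  This is a minimal sketch in Python, assuming
the messages carry no timestamp or host prefix (for example, output of
`journalctl -b -o cat`); deleted_jobs is a made-up helper name, and the
sample lines are copied from traces in this report.

```python
import re

# Matches the message systemd logs when it drops a job to break a cycle.
DELETED_JOB = re.compile(
    r"^(?P<reporter>\S+): Job (?P<job>\S+)/start deleted to break ordering cycle"
)


def deleted_jobs(journal_text):
    """Return (reporting unit, deleted unit) pairs found in journal_text."""
    pairs = []
    for line in journal_text.splitlines():
        match = DELETED_JOB.match(line.strip())
        if match:
            pairs.append((match["reporter"], match["job"]))
    return pairs


# Lines taken from the traces in this report.
SAMPLE = """\
sysinit.target: Found ordering cycle on cloud-init.service/start
sysinit.target: Job cloud-init.service/start deleted to break ordering cycle starting with sysinit.target/start
firewalld.service: Job dbus.socket/start deleted to break ordering cycle starting with firewalld.service/start
"""

for reporter, job in deleted_jobs(SAMPLE):
    print(f"{job} was skipped (cycle reported on {reporter})")
```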

Examples of boot traces of this issue happening:

* example 1:
sysinit.target: Found ordering cycle on cloud-init.service/start
sysinit.target: Found dependency on networking.service/start
sysinit.target: Found dependency on network-pre.target/start
sysinit.target: Found dependency on firewalld.service/start
sysinit.target: Found dependency on basic.target/start
sysinit.target: Found dependency on sockets.target/start
sysinit.target: Found dependency on uuidd.socket/start
sysinit.target: Found dependency on sysinit.target/start
sysinit.target: Job cloud-init.service/start deleted to break ordering cycle starting with sysinit.target/start

* example 2:
sysinit.target: Found ordering cycle on cloud-init.service/start
sysinit.target: Found dependency on networking.service/start
sysinit.target: Found dependency on network-pre.target/start
sysinit.target: Found dependency on firewalld.service/start
sysinit.target: Found dependency on dbus.service/start
sysinit.target: Found dependency on sysinit.target/start
sysinit.target: Job cloud-init.service/start deleted to break ordering cycle starting with sysinit.target/start

* example 3:
firewalld.service: Found ordering cycle on dbus.socket/start
firewalld.service: Found dependency on sysinit.target/start
firewalld.service: Found dependency on cloud-init.service/start
firewalld.service: Found dependency on networking.service/start
firewalld.service: Found dependency on network-pre.target/start
firewalld.service: Found dependency on firewalld.service/start
firewalld.service: Job dbus.socket/start deleted to break ordering cycle starting with firewalld.service/start

* example 4:
firewalld.service: Found ordering cycle on dbus.service/start
firewalld.service: Found dependency on sysinit.target/start
firewalld.service: Found dependency on cloud-init.service/start
firewalld.service: Found dependency on networking.service/start
firewalld.service: Found dependency on network-pre.target/start
firewalld.service: Found dependency on firewalld.service/start
firewalld.service: Job dbus.service/start deleted to break ordering cycle starting with firewalld.service/start
basic.target: Found ordering cycle on sysinit.target/start
basic.target: Found dependency on cloud-init.service/start
basic.target: Found dependency on networking.service/start
basic.target: Found dependency on network-pre.target/start
basic.target: Found dependency on firewalld.service/start
basic.target: Found dependency on basic.target/start
basic.target: Job cloud-init.service/start deleted to break ordering cycle starting with basic.target/start

* example 5:
basic.target: Found ordering cycle on sockets.target/start
basic.target: Found dependency on uuidd.socket/start
basic.target: Found dependency on sysinit.target/start
basic.target: Found dependency on cloud-init.service/start
basic.target: Found dependency on networking.service/start
basic.target: Found dependency on network-pre.target/start
basic.target: Found dependency on firewalld.service/start
basic.target: Found dependency on dbus.service/start
basic.target: Found dependency on basic.target/start
basic.target: Job sockets.target/start deleted to break ordering cycle starting with basic.target/start
firewalld.service: Found ordering cycle on dbus.socket/start
firewalld.service: Found dependency on sysinit.target/start
firewalld.service: Found dependency on cloud-init.service/start
firewalld.service: Found dependency on networking.service/start
firewalld.service: Found dependency on network-pre.target/start
firewalld.service: Found dependency on firewalld.service/start
firewalld.service: Job dbus.socket/start deleted to break ordering cycle starting with firewalld.service/start

* example 6:
networking.service: Found ordering cycle on network-pre.target/start
networking.service: Found dependency on firewalld.service/start
networking.service: Found dependency on dbus.service/start
networking.service: Found dependency on basic.target/start
networking.service: Found dependency on sockets.target/start
networking.service: Found dependency on uuidd.socket/start
networking.service: Found dependency on sysinit.target/start
networking.service: Found dependency on cloud-init.service/start
networking.service: Found dependency on networking.service/star