Re: [systemd-devel] learning how to run systemd in a container, journal shows errors I would like to understand what they mean and why

2022-03-28 Thread Daniel Walsh

On 3/25/22 17:11, Michal Koutný wrote:

Hello Masber.

On Fri, Mar 25, 2022 at 11:52:33AM +, masber masber wrote:

I have a k8s cluster with docker as container runtime and I am trying
to make systemd work.
I read this doc
https://developers.redhat.com/blog/2016/09/13/running-systemd-in-a-non-privileged-container#enter_oci_hooks
and I have systemd running in a container.

Note the article is almost six years old. Plenty of things have been
implemented and configs have changed since then.


Mar 25 11:24:31 nid001002-cluster-1 systemd[1]: Failed to reset devices.list on /kubepods/burstable/podcd69d169-d610-4af7-895a-eb86ee74ed49/4caa4403b8b6d263012e95ca51357ab0bb46fb3bc7a23221115d22efb757cc9c/system.slice/etc-resolv.conf.mount: Operation not permitted

I would like to ask the meaning of this message and how to solve it (if 
possible)

This message says that the containerized systemd attempted to set some
cgroup attributes (in this case device access rules via the devices
controller, i.e. the DeviceAllow= directive) but failed.
Effectively, it could mean your container failed to make itself more
secure, but it should not affect functionality (judging from what you
provided here).

You say you run this in an unprivileged container. A responsible runtime
would not set up access to v1 controllers (devices is v1 only), so EPERM
is more or less expected. For unprivileged containers, I'd suggest you
switch the host into unified cgroup mode (and consequently the container
too). That should resolve the reported problem, but there may still be
something else that breaks your containerized systemd.

HTH,
Michal

I would also advise you to switch to Podman, which makes running
containerized systemd easy and is fully supported.




Re: [systemd-devel] version bump of minimal kernel version supported by systemd?

2022-03-28 Thread Lennart Poettering
On Do, 24.03.22 10:28, Luca Boccassi (bl...@debian.org) wrote:

> > What I am trying to say is that it would actually help us a lot if
> > we'd not just be able to take cgroupv2 for granted but to take a
> > reasonably complete cgroupv2 for granted.
> >
> > Lennart
> >
> > --
> > Lennart Poettering, Berlin
>
> Yes, that does sound like worth exploring - our README doesn't document
> it though, do we have a list of required controllers and when they were
> introduced?

So I'd argue cgroupsv2 was pretty useless before 4.15, since it lacked
the cpu controller, which I'd argue is actually the one that matters
most. Hence, before 4.15 cgroupsv2 was an experiment, not something
you could actually deploy.

Some other interesting milestones:

* kcmp → 3.5
* renameat2 on all relevant file systems → 4.0
* pids controller in cgroupv1 → 4.3
* pids controller in cgroupv2 → 4.5
* cgroup namespaces → 4.6
* statx → 4.11
* pidfd → 5.3

This is just some quick search through man pages. There might be a lot
of other stuff that would make sense for us to be able to rely on.
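
As a rough illustration of how such a baseline could be checked, the sketch below encodes the milestones from this message (plus the 4.15 cpu controller mentioned above) and reports which ones a given kernel release string predates. The table and the helper names parse_release and missing_features are assumptions made for this example. This is not something systemd itself does, and distribution kernels with backports may support a feature earlier than the upstream version suggests.

```python
import platform
import re

# Feature -> first upstream kernel version, as listed in the message above.
MILESTONES = {
    "kcmp": (3, 5),
    "renameat2 on all relevant file systems": (4, 0),
    "pids controller in cgroupv1": (4, 3),
    "pids controller in cgroupv2": (4, 5),
    "cgroup namespaces": (4, 6),
    "statx": (4, 11),
    "cpu controller in cgroupv2": (4, 15),
    "pidfd": (5, 3),
}


def parse_release(release):
    """Extract (major, minor) from a release string such as '5.10.0-13-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def missing_features(release):
    """Return the milestone names (sorted) that `release` is too old for."""
    version = parse_release(release)
    return sorted(name for name, needed in MILESTONES.items() if version < needed)


if __name__ == "__main__":
    # platform.release() returns the running kernel's release string on Linux.
    print(missing_features(platform.release()))
```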

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] unable to attach pid to service delegated directory in unified mode after restart

2022-03-28 Thread Lennart Poettering
On Do, 24.03.22 14:32, Benjamin Berg (benja...@sipsolutions.net) wrote:

> Hi,
>
> On Thu, 2022-03-24 at 12:40 +0100, Felip Moll wrote:
> > False, the JobRemoved signal returns the id, job, unit and result. To
> > wait for JobRemoved only needs a matching rule for this signal. The
> > matching rule can just contain the path. In fact, nothing else than
> > strings can be matched in a rule, so I may be only able to use the
> > path.
>
> I think you need to add a wildcard match before the job is created
> (i.e. before StartTransientUnit). Otherwise registering the match rule
> (using the job's object path) will race with systemd signalling that
> the job has completed.

Correct.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] unable to attach pid to service delegated directory in unified mode after restart

2022-03-28 Thread Lennart Poettering
On Do, 24.03.22 00:45, Felip Moll (fe...@schedmd.com) wrote:

> Hi, some days ago we were talking about this:
>
>
> > > Problem number two, there's a significant delay since when creating the
> > > scope, until it is ready and the pid attached into it. The only way it
> > > worked was to put a 'sleep' after the dbus call and make my process wait
> > > for the async call to dbus to be materialized. This is really
> > > un-elegant.
> >
> > If you want to synchronize in the cgroup creation to complete just
> > wait for the JobRemoved bus signal for the job returned by
> > StartTransientUnit().
> >
> >
> StartTransientUnit returns a string to a job object path. To call
> JobRemoved I need the job id, so the easier way to get it is to strip the
> last part of the returned string from StartTransientUnit job object path.
> Am I right?

JobRemoved is a signal, not a method call, i.e. not something you
call but something you are notified about. It originates from an object,
and objects have object paths in D-Bus.

> Once I have the job id, I can then subscribe to JobRemoved bus signal for
> the recently created job, but what happens if during the time I am
> obtaining the ID or parsing the output, the job is finished? Will I lose
> the signal?

Yes. D-Bus sucks that way. You have to subscribe to all jobs first, and
then filter out the ones you don't want.

> What is the correct order of doing a StartTransientUnit and wait for the
> job to be finished (done, failed, whatever) ?

First subscribe to JobRemoved, then issue StartTransientUnit, and then
wait until you see JobRemoved for the unit you just started.
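
The ordering can be sketched independently of any particular D-Bus binding. In the example below, the handler records every JobRemoved signal it sees, so a job that finishes before StartTransientUnit has even returned its object path is not lost. The class JobRemovedTracker and the callables subscribe and start_transient_unit are hypothetical placeholders for whatever your bus library provides. The handler arguments follow the id, job, unit and result fields mentioned earlier in this thread.

```python
import threading


class JobRemovedTracker:
    """Buffers JobRemoved signals so that none is missed (hypothetical helper)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._results = {}  # job object path -> result string

    def on_job_removed(self, job_id, job_path, unit, result):
        # Called for *every* JobRemoved signal; filtering happens in wait_for().
        with self._cond:
            self._results[str(job_path)] = str(result)
            self._cond.notify_all()

    def wait_for(self, job_path, timeout=None):
        # Returns at once if the signal already arrived, otherwise blocks.
        with self._cond:
            if not self._cond.wait_for(lambda: job_path in self._results, timeout):
                raise TimeoutError("no JobRemoved seen for %s" % job_path)
            return self._results.pop(job_path)


def start_and_wait(subscribe, start_transient_unit, timeout=30.0):
    """subscribe(handler) and start_transient_unit() are supplied by the caller.

    Both are assumptions of this sketch: the first must register `handler`
    for all JobRemoved signals, the second must issue StartTransientUnit
    and return the job object path.
    """
    tracker = JobRemovedTracker()
    subscribe(tracker.on_job_removed)            # 1. subscribe first
    job_path = start_transient_unit()            # 2. then start the unit
    return tracker.wait_for(job_path, timeout)   # 3. wait for our own job
```

The tracker keeps results for unrelated jobs as well, so a long-running program would want to prune the buffer. The handler is also expected to run on a different thread than the one blocked in wait_for().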

Lennart

--
Lennart Poettering, Berlin