Re: [systemd-devel] How can we debug systemd-gpt-auto-generator failures?

2022-07-28 Thread Kevin P. Fleming
Thanks! I hadn't paid any attention to that issue since I'm not using
btrfs, but it seems that the root cause is the same.

On Thu, Jul 28, 2022 at 9:31 AM Lennart Poettering
 wrote:
>
> On Thu, 28.07.22 07:40, Kevin P. Fleming (ke...@km6g.us) wrote:
>
> > Thanks for that, it did indeed produce some output, but unfortunately
> > it doesn't seem to lead anywhere specific :-)
> >
> > root@edge21-a:~# SYSTEMD_LOG_LEVEL=debug SYSTEMD_LOG_TARGET=console
> > LIBBLKID_DEBUG=all
> > /usr/lib/systemd/system-generators/systemd-gpt-auto-generator
> > Found container virtualization none.
> > Disabling root partition auto-detection, root= is defined.
> > Disabling root partition auto-detection, root= is defined.
> > Failed to open device: No such device
> >
> > Adding strace to the command provides something more useful:
> >
> > openat(AT_FDCWD, "/", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 3
> > openat(3, "sys", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 4
> > fstat(4, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
> > close(3) = 0
> > openat(4, "dev", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 3
> > fstat(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > close(4) = 0
> > openat(3, "block", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 4
> > fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > close(3) = 0
> > openat(4, "0:0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = -1 ENOENT (No such file or directory)
> > close(4)
> >
> > So it's trying to open() /sys/dev/block/0:0, but my system does not
> > have that device file. The only files in /sys/dev/block are 8:0
> > through 8:3.
>
> → https://github.com/systemd/systemd/issues/22504
>
> Lennart
>
> --
> Lennart Poettering, Berlin


Re: [systemd-devel] How can we debug systemd-gpt-auto-generator failures?

2022-07-28 Thread Lennart Poettering
On Thu, 28.07.22 07:40, Kevin P. Fleming (ke...@km6g.us) wrote:

> Thanks for that, it did indeed produce some output, but unfortunately
> it doesn't seem to lead anywhere specific :-)
>
> root@edge21-a:~# SYSTEMD_LOG_LEVEL=debug SYSTEMD_LOG_TARGET=console
> LIBBLKID_DEBUG=all
> /usr/lib/systemd/system-generators/systemd-gpt-auto-generator
> Found container virtualization none.
> Disabling root partition auto-detection, root= is defined.
> Disabling root partition auto-detection, root= is defined.
> Failed to open device: No such device
>
> Adding strace to the command provides something more useful:
>
> openat(AT_FDCWD, "/", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 3
> openat(3, "sys", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 4
> fstat(4, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
> close(3) = 0
> openat(4, "dev", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 3
> fstat(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> close(4) = 0
> openat(3, "block", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 4
> fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> close(3) = 0
> openat(4, "0:0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = -1 ENOENT (No such file or directory)
> close(4)
>
> So it's trying to open() /sys/dev/block/0:0, but my system does not
> have that device file. The only files in /sys/dev/block are 8:0
> through 8:3.

→ https://github.com/systemd/systemd/issues/22504

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] How can we debug systemd-gpt-auto-generator failures?

2022-07-28 Thread Kevin P. Fleming
Thanks for that, it did indeed produce some output, but unfortunately
it doesn't seem to lead anywhere specific :-)

root@edge21-a:~# SYSTEMD_LOG_LEVEL=debug SYSTEMD_LOG_TARGET=console
LIBBLKID_DEBUG=all
/usr/lib/systemd/system-generators/systemd-gpt-auto-generator
Found container virtualization none.
Disabling root partition auto-detection, root= is defined.
Disabling root partition auto-detection, root= is defined.
Failed to open device: No such device

Adding strace to the command provides something more useful:

openat(AT_FDCWD, "/", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 3
openat(3, "sys", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 4
fstat(4, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
close(3) = 0
openat(4, "dev", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
close(4) = 0
openat(3, "block", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 4
fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
close(3) = 0
openat(4, "0:0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = -1 ENOENT (No such file or directory)
close(4)

So it's trying to open() /sys/dev/block/0:0, but my system does not
have that device file. The only files in /sys/dev/block are 8:0
through 8:3.
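For readers following along, the MAJ:MIN symlinks under /sys/dev/block can be enumerated with a small helper like the one below (a sketch; the helper name and the optional directory argument are made up for illustration):

```shell
# Print each "MAJ:MIN -> resolved device path" entry under a sysfs-style
# directory laid out like /sys/dev/block (symlinks named major:minor).
list_block_majmin() {
    dir="${1:-/sys/dev/block}"
    for dev in "$dir"/*; do
        [ -e "$dev" ] || continue
        printf '%s -> %s\n' "${dev##*/}" "$(readlink -f "$dev")"
    done
}

# On a plain SATA/SCSI system this prints only 8:* (sd*) entries; a 0:0
# entry would only exist if a block device with major number 0 were present.
list_block_majmin
```

`lsblk -o NAME,MAJ:MIN` shows the same mapping from the other direction, which can help confirm which device number the generator was looking for.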

On Thu, Jul 28, 2022 at 7:17 AM Mantas Mikulėnas  wrote:
>
> On Thu, Jul 28, 2022 at 1:51 PM Kevin P. Fleming  wrote:
>>
>> I've got two systems that report a failure (exit code 1) every time
>> systemd-gpt-auto-generator is run. There are a small number of reports
>> of this affecting other users too:
>>
>> https://bugs.archlinux.org/task/73168
>>
>> This *may* be related to the use of ZFS, although I've got a
>> half-dozen systems using ZFS and only two of them have this issue.
>>
>> Running the generator from the command line also produces exit code 1,
>> but doesn't produce any output. Is there any practical way to debug
>> this failure?
>
>
> Start with SYSTEMD_LOG_LEVEL=debug (and SYSTEMD_LOG_TARGET=console if you
> want to run the tool from the CLI; otherwise it logs to the journal). If it
> only fails as part of a daemon-reexec, use `systemctl set-environment` (I'm
> not sure if `systemctl log-level` has any effect on non-PID 1 processes).
> Since it uses libblkid, there's also LIBBLKID_DEBUG=all.
>
> --
> Mantas Mikulėnas


Re: [systemd-devel] How can we debug systemd-gpt-auto-generator failures?

2022-07-28 Thread Mantas Mikulėnas
On Thu, Jul 28, 2022 at 1:51 PM Kevin P. Fleming  wrote:

> I've got two systems that report a failure (exit code 1) every time
> systemd-gpt-auto-generator is run. There are a small number of reports
> of this affecting other users too:
>
> https://bugs.archlinux.org/task/73168
>
> This *may* be related to the use of ZFS, although I've got a
> half-dozen systems using ZFS and only two of them have this issue.
>
> Running the generator from the command line also produces exit code 1,
> but doesn't produce any output. Is there any practical way to debug
> this failure?
>

Start with SYSTEMD_LOG_LEVEL=debug (and SYSTEMD_LOG_TARGET=console if you
want to run the tool from the CLI; otherwise it logs to the journal). If it
only fails as part of a daemon-reexec, use `systemctl set-environment` (I'm
not sure if `systemctl log-level` has any effect on non-PID 1 processes).
Since it uses libblkid, there's also LIBBLKID_DEBUG=all.
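Putting that advice together, a by-hand run might look like the sketch below (the generator path is distro-dependent; this is the common location on Debian- and Fedora-style layouts):

```shell
# Run systemd-gpt-auto-generator by hand with debug logging on the console.
gen=/usr/lib/systemd/system-generators/systemd-gpt-auto-generator
if [ -x "$gen" ]; then
    SYSTEMD_LOG_LEVEL=debug SYSTEMD_LOG_TARGET=console LIBBLKID_DEBUG=all "$gen"
    echo "exit code: $?"
else
    echo "generator not found at $gen"
fi
```

Prefixing the invocation with `strace -f -e trace=%file` narrows the trace to filesystem syscalls, which is roughly how the openat() walk above was captured.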

-- 
Mantas Mikulėnas


[systemd-devel] How can we debug systemd-gpt-auto-generator failures?

2022-07-28 Thread Kevin P. Fleming
I've got two systems that report a failure (exit code 1) every time
systemd-gpt-auto-generator is run. There are a small number of reports
of this affecting other users too:

https://bugs.archlinux.org/task/73168

This *may* be related to the use of ZFS, although I've got a
half-dozen systems using ZFS and only two of them have this issue.

Running the generator from the command line also produces exit code 1,
but doesn't produce any output. Is there any practical way to debug
this failure?


Re: [systemd-devel] Antw: [EXT] Re: Feedback sought: can we drop cgroupv1 support soon?

2022-07-28 Thread Lennart Poettering
On Thu, 28.07.22 09:48, Ulrich Windl (ulrich.wi...@rz.uni-regensburg.de) wrote:

> Hi!
>
> What about making cgroup1 support _configurable_ as a first step?
> So maybe people could try how well things work when there is no cgroups v1
> support in systemd.

It's already runtime-configurable, via the kernel command line option
systemd.unified_cgroup_hierarchy=yes|no.
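For anyone unsure which mode a given system booted into, the filesystem type mounted on /sys/fs/cgroup is a quick tell; the sketch below assumes GNU coreutils `stat` (`cgroup2fs` indicates the unified v2 hierarchy, `tmpfs` the legacy/hybrid v1 layout):

```shell
# Report which cgroup hierarchy is mounted at /sys/fs/cgroup.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null)
case "$fstype" in
    cgroup2fs) echo "unified (cgroup v2)" ;;
    tmpfs)     echo "legacy or hybrid (cgroup v1)" ;;
    *)         echo "undetermined: '$fstype'" ;;
esac
```

On a system booted with systemd.unified_cgroup_hierarchy=yes this should take the cgroup2fs branch.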

Lennart

--
Lennart Poettering, Berlin


[systemd-devel] Antw: [EXT] Re: [systemd-devel] Feedback sought: can we drop cgroupv1 support soon?

2022-07-28 Thread Ulrich Windl
>>> Lennart Poettering  wrote on 22.07.2022 at 17:35 in message:
> On Fri, 22.07.22 12:15, Lennart Poettering (mzerq...@0pointer.de) wrote:
> 
>> > I guess that would mean holding on to cgroup1 support until EOY 2023
>> > or thereabout?
>>
>> That does sound OK to me. We can mark it deprecated before though,
>> i.e. generate warnings, and remove it from docs, as long as the actual
>> code stays around until then.

I would not remove it from the docs, but declare it obsolete/deprecated
instead.
I think "undocumented" features are a bad thing.

> 
> So I prepped a PR now that documents the EOY 2023 date:
> 
> https://github.com/systemd/systemd/pull/24086 
> 
> That way we shouldn't forget about this, and it will remind us that we
> still actually need to do it then.
> 
> Lennart
> 
> --
> Lennart Poettering, Berlin


[systemd-devel] Antw: [EXT] Re: [systemd-devel] Feedback sought: can we drop cgroupv1 support soon?

2022-07-28 Thread Ulrich Windl
>>> Lennart Poettering  wrote on 22.07.2022 at 12:15 in message:
> On Thu, 21.07.22 16:24, Stéphane Graber (stgra...@ubuntu.com) wrote:
> 
>> Hey there,
>>
>> I believe Christian may have relayed some of this already but on my
>> side, as much as I can sympathize with the annoyance of having to
>> support both cgroup1 and cgroup2 side by side, I feel that we're sadly
>> nowhere near the cut off point.
>>
>> From what I can gather from various stats we have, over 90% of LXD
>> users are still on distributions relying on CGroup1.
>> That's because most of them are using LTS releases of server
>> distributions and those only somewhat recently made the jump to
>> cgroup2:
>>  - RHEL 9 in May 2022
>>  - Ubuntu 22.04 LTS in April 2022
>>  - Debian 11 in August 2021
>>
>> OpenSUSE is still on cgroup1 by default in 15.4 for some reason.
>> All this is also excluding our two largest users, Chromebooks and QNAP
>> NASes, neither of them made the switch yet.
> 
> At some point I feel no sympathy there. If google/qnap/suse still are
> stuck in cgroupv1 land, then that's on them, we shouldn't allow
> ourselves to be held hostage by that.
> 
> I mean, that Google isn't forward looking in these things is well
> known, but I am a bit surprised SUSE is still so far back.

Well, openSUSE is actually rather equivalent to SLES 15 (which has existed
for some years now). I guess they didn't want to switch within a major
release. Everybody is free to file an "enhancement" request at openSUSE's
Bugzilla, however.
...

Regards,
Ulrich


[systemd-devel] Antw: [EXT] Re: Feedback sought: can we drop cgroupv1 support soon?

2022-07-28 Thread Ulrich Windl
Hi!

What about making cgroup1 support _configurable_ as a first step?
So maybe people could try how well things work when there is no cgroups v1
support in systemd.

Regards,
Ulrich

>>> Stéphane Graber  wrote on 21.07.2022 at 22:24 in message:
> Hey there,
> 
> I believe Christian may have relayed some of this already but on my
> side, as much as I can sympathize with the annoyance of having to
> support both cgroup1 and cgroup2 side by side, I feel that we're sadly
> nowhere near the cut off point.
> 
> From what I can gather from various stats we have, over 90% of LXD
> users are still on distributions relying on CGroup1.
> That's because most of them are using LTS releases of server
> distributions and those only somewhat recently made the jump to
> cgroup2:
>  - RHEL 9 in May 2022
>  - Ubuntu 22.04 LTS in April 2022
>  - Debian 11 in August 2021
> 
> OpenSUSE is still on cgroup1 by default in 15.4 for some reason.
> All this is also excluding our two largest users, Chromebooks and QNAP
> NASes, neither of them made the switch yet.
> 
> I honestly wouldn't hold up deprecating cgroup1 waiting for those few
> to wake up and transition.
> Both ChromeOS and QNAP can very quickly roll it out to all their users
> should they want to.
> It's a bit trickier for OpenSUSE as it's used as the basis for SLES
> and so those enterprise users are unlikely to see cgroup2 any time
> soon.
> 
> Now all of this is a problem because:
>  - Our users are slow to upgrade. It's common for them to skip an
> entire LTS release and those that upgrade every time will usually wait
> 6 months to a year prior to upgrading to a new release.
>  - This deprecation would prevent users of anything but the most
> recent release from running any newer containers. As it's common to
> switch to newer containers before upgrading the host, this would cause
> some issues.
>  - Unfortunately the reverse is a problem too. RHEL 7 and derivatives
> are still very common as a container workload, as is Ubuntu 16.04 LTS.
> Unfortunately those releases ship with a systemd version that does not
> boot under cgroup2.
> 
> That last issue has been biting us a bit recently but it's something
> that one can currently workaround by forcing systemd back into hybrid
> mode on the host.
> With the deprecation of cgroup1, this won't be possible anymore. You
> simply won't be able to have both CentOS7 and Fedora XYZ running in
> containers on the same system as one will only work on cgroup1 and the
> other only on cgroup2.
> 
> Now this doesn't bother me at all for anything that's end of life, but
> RHEL 7 is only reaching EOL in June 2024 and while Ubuntu 16.04 is
> officially EOL, Canonical provides extended support (ESM) on it until
> April 2026.
> 
> 
> So given all that, my 2 cents would be that ideally systemd should
> keep supporting cgroup1 until June 2024 or shortly before that given
> the usual lag between releasing systemd and it being adopted by Linux
> distros. This would allow for most distros to have made it through two
> long term releases shipping with cgroup2, making sure the vast
> majority of users will finally be on cgroup2 and will also allow for
> those cgroup1-only workloads to have gone away.
> 
> I guess that would mean holding on to cgroup1 support until EOY 2023
> or thereabout?
> 
> Stéphane
> 
> On Thu, Jul 21, 2022 at 5:55 AM Christian Brauner 
wrote:
>>
>> [Cc Stéphane and Serge]
>>
>> On Thu, Jul 21, 2022 at 11:03:49AM +0200, Lennart Poettering wrote:
>> > Heya!
>> >
>> > It's currently a terrible mess having to support both cgroupsv1 and
>> > cgroupsv2 in our codebase.
>> >
>> > cgroupsv2 first entered the kernel in 2014, i.e. *eight* years ago
>> > (kernel 3.16). We soon intend to raise the baseline for systemd to
>> > kernel 4.3 (because we want to be able to rely on the existence of
>> > ambient capabilities), but that also means, that all kernels we intend
>> > to support have a well-enough working cgroupv2 implementation.
>> >
>> > Hence, I'd love to drop the cgroupv1 support from our tree entirely,
>> > and simplify and modernize our codebase to go cgroupv2-only. Before we
>> > do that I'd like to seek feedback on this though, given this is not
>> > purely a thing between the kernel and systemd — this does leak into
>> > some userspace, that operates on cgroups directly.
>> >
>> > Specifically, legacy container infra (i.e. docker/moby) for the
>> > longest time was cgroupsv1-only. But as I understand it has since been
>> > updated, to cgroupsv2 too.
>> >
>> > Hence my question: is there a strong community of people who insist on
>> > using newest systemd while using legacy container infra? Anyone else
>> > has a good reason to stick with cgroupsv1 but really wants newest
>> > systemd?
>> >
>> > The time when we'll drop cgroupv1 support *will* come eventually
>> > either way, but what's still up for discussion is to determine
>> > precisely when. Hence, please let us know!
>>
>> In general, I wouldn't mind