On 04. 06. 25 at 5:33, Chengen Du wrote:
Hi DM developers,
On Tue, May 13, 2025 at 10:35 PM Lennart Poettering
<lenn...@poettering.net> wrote:
On Di, 13.05.25 16:08, Chengen Du (chengen...@canonical.com) wrote:
Hi,
Apologies for including everyone in this message, but I’d like to bring
your attention to a fix [1], which may require your input.
As mentioned in my comments there: we can certainly enable the locking
stuff again for DM block devices too, but only if DM maintainers sign
off that this is OK. Hence ping the DM people about this, otherwise we
won't move on this.
To mitigate such issues, systemd-udevd normally acquires a LOCK_SH|LOCK_NB
using flock on the main block device before processing.
However, commit #e918a1b5a94f (udev: exclude device-mapper from block
device ownership event locking) disabled this behavior for device-mapper
devices, which appears to be the root cause of the boot hang with encrypted
swap.
iirc dm for some reason is allergic to us taking a bsd lock, because
they don't want us to hold an fd open while the udev rules run
(because bsd locking implies holding an fd open as long as the lock is
kept).
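
For readers unfamiliar with the mechanism discussed here, below is a minimal Python sketch of what a non-blocking shared BSD lock involves. It is not systemd-udevd's code; the helper name try_shared_lock is hypothetical. It only illustrates that the lock lives on an open file descriptor, so holding the lock means holding the file (or device node) open.

```python
import fcntl
import os


def try_shared_lock(path):
    """Open path read-only and try a non-blocking shared BSD lock.

    Returns the open fd on success, or None if another open file
    description already holds a conflicting (exclusive) lock.
    The lock exists only while the fd stays open: os.close(fd) drops it.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Same flag combination as mentioned in the thread.
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds LOCK_EX; do not wait, just give up.
        os.close(fd)
        return None
    return fd
```

Any user of this helper has to keep the returned descriptor open for the whole time the lock is needed, which is exactly the property that conflicts with tools that require the device not to be open.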
But only the DM people can shed some light on this. if they are fine
these days if we relax this then we can certainly cover their stuff
via the locking, too.
Apologies for reaching out again, but may I kindly ask for your input
on this issue?
Your assistance would be greatly appreciated to help move things forward.
Hi
We have looked over the issue, which most likely originates in uevents lost
during the switch from initramfs to rootfs and should possibly be addressed
by a new socket flag.
But anyway, let's look at the current locking mechanism.
For lvm2 to be able to 'deactivate' a DM device, the device must NOT be
open - so taking a lock on an open descriptor in order to deactivate a DM
device is likely not going to work.
lvm2 could, however, possibly be enhanced to at least grab these bsd locks
when processing a PV - that does not look like a problematic part.
But adding bsd locks when processing DM devices (active LVs) does not look
like a trivial task. There are DM devices which are 'private' to the DM stack
itself (e.g. for a cached raid LV, a single public DM device may have tens of
'private' DM devices associated with it in the device tree), and lvm2 does not
expect anyone to be using any of them - so any 'device stack tree'
manipulation basically aborts when an unexpected user is present. (Public
availability of these 'private' devices is nevertheless useful for various
'recovery/debugging' purposes - so there is a very good reason why all devices
are present in the user's /dev/ directory - but an administrator should not
blindly open them.)
For protection against udev access to these private devices, we originally
used some uevent flags - those, however, were not 'permanent': if udev was
restarted with a cleared database, all this info was lost (this is one of the
reasons we asked for this DM exception in the past). Later on we added the
UUID '-suffix' solution, but it does not yet 'decorate' all device types, and
although we are now trying to add the suffix everywhere, it is not a simple
task. So some near-future version of lvm2 will likely be better - and once
such a newer version of lvm2 is on the system, and the udev tool chain makes
no access to any device with a UUID '-suffix', we can possibly reconsider
this DM exception and see whether we can make it work somehow.
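
As a rough illustration of the UUID '-suffix' idea, here is a small Python sketch of how a tool might recognise a suffixed DM UUID and skip the device. It is not lvm2 or udev code. The function name is hypothetical, and the layout it assumes ("LVM-" followed by a 64-character identifier and an optional "-suffix") is an assumption made for this example, not something stated in this thread.

```python
LVM_PREFIX = "LVM-"
# Assumption for this sketch: 32-char VG id + 32-char LV id, no hyphens.
LVM_ID_LEN = 64


def lvm_uuid_suffix(dm_uuid):
    """Return the '-suffix' part of an LVM DM UUID, or None if absent.

    Non-LVM UUIDs and LVM UUIDs without a suffix both yield None.
    """
    if not dm_uuid.startswith(LVM_PREFIX):
        return None
    rest = dm_uuid[len(LVM_PREFIX):]
    # A suffix, if present, follows the fixed-length id after one hyphen.
    if len(rest) <= LVM_ID_LEN or rest[LVM_ID_LEN] != "-":
        return None
    return rest[LVM_ID_LEN + 1:] or None
```

A scanning tool could then leave alone any device for which lvm_uuid_suffix() returns a non-None value, which is the kind of behaviour the paragraph above asks of the udev tool chain.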
Yet - for the locking itself - I'd probably see the use of a separate locking
dir in /run as a more usable approach, as the case where a device needs to be
'removed/instantiated/....' cannot be 'lock protected' if the device itself
must be held open.
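
The separate-lock-directory idea could look roughly like the Python sketch below. This is only an illustration of the suggestion, not an existing udev or lvm2 interface. The helper names, the "major:minor.lock" file naming and the directory passed in are all hypothetical. The point is that the lock is taken on a plain file named after the device, so the device node itself is never opened.

```python
import fcntl
import os


def lock_path_for(lock_dir, major, minor):
    """Hypothetical naming scheme: one lock file per device number."""
    return os.path.join(lock_dir, f"{major}:{minor}.lock")


def acquire_device_lock(lock_dir, major, minor, exclusive=False):
    """Try to lock a device by number without opening the device node.

    Returns an fd for the lock file on success, None if the lock is
    currently held in a conflicting mode. Closing the fd releases it.
    """
    os.makedirs(lock_dir, exist_ok=True)
    path = lock_path_for(lock_dir, major, minor)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

With such a scheme a device could be removed or re-created while the lock is held, because the open count of the device itself is not affected. All cooperating tools would of course have to agree on the directory and the naming.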
But as a short-term solution, we would rather need to see the actual, exact
problem that seems to be caused by the missing locking - as it could possibly
be something unrelated to this locking...
Regards
Zdenek