[systemd-devel] RFC: one more time: SCSI device identification

2021-03-29 Thread Martin Wilck
Hello,

[sorry for cross-posting, I think this is relevant to multiple
communities.]

I'm referring to the recent discussion about SCSI device identification
for multipath-tools
(https://listman.redhat.com/archives/dm-devel/2021-March/msg00332.html).

As you all know, there are different designators to identify SCSI LUNs,
and the specs don't mandate priorities for devices that support
multiple designator types. There are various implementations for device
identification, which use different priorities (summarized below).

It's highly desirable to clean up this confusion and settle on a single
authority and a single priority order. I believe this authority should
be the kernel.

OTOH, changing device WWIDs is highly dangerous for production systems.
The WWID is prominently used in multipath-tools, but also in lots of
other important places such as fstab, grub.cfg, dracut, etc. No doubt
we'll be stuck with the different algorithms for years, especially
for LTS distributions. But perhaps we can figure out a long-term exit
strategy?

The kernel's preference for type 8 designators (see below) is in
contrast with the established user space algorithms, which determine
SCSI WWIDs on production systems in practice. User space can try to
adapt to the kernel logic, but it will necessarily be a slow and
painful path if we want to avoid breaking user setups.

In principle, I believe the kernel is "right" to prefer type 8. But
because the "wwid" attribute isn't actually used for device
identification today, changing the kernel logic would be less prone to
regressions than changing user space, even if it violates the principle
that the kernel's user space API must remain stable.

Would it be an option to modify the kernel logic?

If we can't, I think we should start with making the "wwid" attribute
part of the udev rule logic, and letting distros configure whether the
kernel logic or the traditional udev logic would be used.
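
To make the idea of a distro-configurable switch concrete, here is a
minimal sketch in Python rather than udev rule syntax. It only assumes
that the kernel exposes the "wwid" attribute under
/sys/block/<dev>/device/wwid; the function names, the "policy" parameter
and its values are hypothetical and exist only for illustration.

```python
from pathlib import Path
from typing import Optional


def read_kernel_wwid(dev: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Read the kernel's "wwid" attribute for a SCSI block device, if present."""
    attr = Path(sysfs_root) / "block" / dev / "device" / "wwid"
    try:
        value = attr.read_text().strip()
    except OSError:
        # Attribute missing or unreadable (e.g. not a SCSI device).
        return None
    return value or None


def select_wwid(dev: str, udev_wwid: Optional[str],
                policy: str = "udev", sysfs_root: str = "/sys") -> Optional[str]:
    """Pick a WWID according to a (hypothetical) distro-wide policy.

    policy == "kernel": prefer the kernel's wwid, fall back to the udev value.
    anything else:      prefer the traditional udev value, fall back to the kernel.
    """
    kernel_wwid = read_kernel_wwid(dev, sysfs_root)
    if policy == "kernel":
        return kernel_wwid or udev_wwid
    return udev_wwid or kernel_wwid
```

With the default policy nothing changes for existing setups; a distro
would only get the kernel's choice after opting in explicitly.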

Please tell me your thoughts on this matter.

Regards,
Martin

PS: Incomplete list of algorithms for SCSI designator priorities:

The kernel ("wwid" sysfs attribute) prefers "SCSI name string" (type 8)
designators over other types
(https://elixir.bootlin.com/linux/latest/A/ident/designator_prio).

The current set of udev rules in sg3_utils
(https://github.com/hreinecke/sg3_utils/blob/master/scripts/55-scsi-sg3_id.rules)
doesn't use the kernel's wwid attribute; it parses VPD 83 and 80
instead and prioritizes types 36, 35, 32, and 2 over type 8.

udev's "scsi_id" tool, historically the first attempt to implement a
priority for this, doesn't look at the SCSI name attribute at all:
https://github.com/systemd/systemd/blob/main/src/udev/scsi_id/scsi_serial.c

There's a "fallback" logic in multipath-tools in case udev doesn't
provide a WWID:
https://github.com/opensvc/multipath-tools/blob/a41a61e8482def33e3ca8c9e3639ad2c37611551/libmultipath/discovery.c#L1040
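
To illustrate why the differing priorities matter, here is a small
Python sketch. The type labels ("8", "36", ...) are the ones used in the
list above. The two orders are simplified: the real implementations
handle more designator types, and the order of the non-type-8 entries in
KERNEL_ORDER is an assumption made for illustration only. The sample LUN
and its identifiers are made up.

```python
from typing import Mapping, Optional, Sequence

# Simplified priority orders based on the summary above (see caveats in the text).
KERNEL_ORDER = ("8", "36", "35", "32", "2")      # type 8 first
SG3_UTILS_ORDER = ("36", "35", "32", "2", "8")   # type 8 last


def pick_wwid(designators: Mapping[str, str],
              order: Sequence[str]) -> Optional[str]:
    """Return the first designator that the device provides, following `order`."""
    for dtype in order:
        value = designators.get(dtype)
        if value:
            return value
    return None


# Hypothetical LUN that exposes both an NAA designator and a SCSI name string.
lun = {
    "36": "naa.600a0b80000f1e2d0000000000000001",
    "8": "iqn.2021-03.com.example:lun1",
}

kernel_wwid = pick_wwid(lun, KERNEL_ORDER)
udev_wwid = pick_wwid(lun, SG3_UTILS_ORDER)

print(kernel_wwid)               # the SCSI name string
print(udev_wwid)                 # the NAA designator
print(kernel_wwid == udev_wwid)  # False: same LUN, two different WWIDs
```

A device that supports only one designator type gets the same WWID from
both orders; the mismatch shows up only when several types are present.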

-- 
Dr. Martin Wilck, Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Waiting udev jobs

2021-03-29 Thread Luca Boccassi
On Sat, 2021-03-27 at 22:20 -0700, Alan Perry wrote:
> On 3/27/21 5:38 AM, Lennart Poettering wrote:
> > On Fr, 26.03.21 23:24, Alan Perry (al...@snowmoose.com) wrote:
> > 
> > > I occasionally see a problem where systemd-analyze reports that boot
> > > did not complete and it is suggested that I use systemctl list-jobs
> > > to find out more. That shows a .device service job and some sub-jobs
> > > (associated with udev rules) all waiting. They will wait for literal
> > > days in this state. When I accessed the system, it wasn’t apparent
> > > what the jobs were waiting on since all of the device symlinks and
> > > such were there and working. The systemctl status of the .device
> > > service was alive.
> > > 
> > > Any suggestions on what is going on and/or how to figure out what is
> > > going on?
> > > 
> > > If you have followed my posts here previously, it should come as no
> > > surprise that the device that I observed this happen with was one of
> > > the emmc boot devices.
> > This is not enough information. Please provide "systemctl status" info
> > on the relevant units and jobs, please provide a dump of the output.
> 
> I don't have access to that info at the moment. IIRC ...
> 
> dev-disk-by\x2dpath-platform\x2d68cf1000.sdhci\x2dboot0.device and 
> sys-devices-platform…mc0:0001-block-mmcblk0-mmcblk0boot0.device returned 
> "Active: inactive (dead)" and not much else.
> 
> dev-mmcblk0boot0.device returned "Active: active (plugged)". There was 
> more, but I don't remember what else.
> 
> 
> > And most importantly, always start with the systemd version number you
> > are using,
> 
> v247 plus patches

And for completeness, the patches on top are just backports from yours
truly, nothing weird, and nothing touching udev or rules apart from
this:

https://github.com/systemd/systemd/commit/8db704b28b4fd4d13e

> >   and whether you have any weird udev rules or so, or just
> > plain upstream stuff.
> 
> plain, upstream rules.
> 
> 
> I am trying to figure out what to look at when I have access to the 
> system exhibiting the problem that I am trying to resolve.
> 
> 
> alan
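
For the next time the system is reachable, the information requested
above could be gathered in one go. The following is only a sketch: the
helper names are hypothetical, and the unit names are the ones quoted
from memory earlier in this thread, so they may need adjusting.

```python
import subprocess
from typing import List


def diagnostic_commands(units: List[str]) -> List[List[str]]:
    """Build the commands asked for in this thread: version, jobs, unit status."""
    cmds = [
        ["systemctl", "--version"],
        ["systemctl", "list-jobs", "--no-pager"],
    ]
    cmds += [["systemctl", "status", "--no-pager", unit] for unit in units]
    return cmds


def collect(units: List[str]) -> str:
    """Run each command and return the combined output as one text dump.

    Requires systemctl to be installed; non-zero exit codes are kept in the
    dump instead of raising, since "systemctl status" exits non-zero for
    inactive units.
    """
    chunks = []
    for argv in diagnostic_commands(units):
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        chunks.append("$ " + " ".join(argv) + "\n" + result.stdout + result.stderr)
    return "\n".join(chunks)


# Unit names as recalled in the thread (raw strings keep the \x2d escapes intact).
UNITS = [
    r"dev-disk-by\x2dpath-platform\x2d68cf1000.sdhci\x2dboot0.device",
    "dev-mmcblk0boot0.device",
]
```

Calling collect(UNITS) on the affected machine and attaching the result
should cover the version number, the waiting jobs, and the unit status.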

