Re: [systemd-devel] RFC: one more time: SCSI device identification

2021-04-26 Thread Martin Wilck
On Mon, 2021-04-26 at 13:14 +0200, Ulrich Windl wrote:
> > > 
> > 
> > While we're at it, I'd like to mention another issue: WWID changes.
> > 
> > This is a big problem for multipathd. The gist is that the device
> > identification attributes in sysfs only change after rescanning the
> > device. Thus if a user changes LUN assignments on a storage system,
> > it can happen that a direct INQUIRY returns a different WWID than in
> > sysfs, which is fatal. If we plan to rely more on sysfs for device
> > identification in the future, the problem gets worse. 
> 
> I think many devices rely on the fact that they are identified by
> Vendor/model/serial_nr, because in most professional SAN storage
> systems you can pre-set the serial number to a custom value; so if you
> want a new disk (maybe a snapshot) to be compatible with the old one,
> just assign the same serial number. I guess that's the idea behind.

What you are saying sounds dangerous to me. If a snapshot has the same
WWID as the device it's a snapshot of, it must not be exposed to any
host at the same time as its origin; otherwise the host may happily
combine it with the origin into one multipath map, and data corruption
will almost certainly result.

My argument is about how the host is supposed to deal with a WWID
change if it happens. Here, "WWID change" means that a given H:C:T:L
suddenly exposes different device designators than it used to, while
this device is in use by a host. Here, too, data corruption is
imminent, and can happen in a blink of an eye. To avoid this, several
things are needed:

 1) the host needs to get notified about the change (likely by a UA of
some sort),
 2) the kernel needs to react to the notification immediately, e.g. by
blocking IO to the device,
 3) userspace tooling such as udev or multipathd needs to figure out
how to deal with the situation cleanly, and eventually unblock it.

Wrt 1), we can only hope that it's the case. But 2) and 3) need work,
afaics.
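
To make the failure mode concrete, here is a minimal Python sketch of
the comparison userspace would have to make: the WWID cached in sysfs
versus a freshly obtained one. The sysfs path follows the
block/foo/device/wwid layout discussed in this thread; query_device is
a hypothetical callable standing in for a direct INQUIRY (it is not an
existing API), and the helper names are mine.

```python
from pathlib import Path
from typing import Callable, Optional


def read_sysfs_wwid(dev: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the WWID cached in sysfs for a SCSI disk such as "sda"."""
    path = Path(sysfs_root) / "block" / dev / "device" / "wwid"
    try:
        return path.read_text().strip()
    except OSError:
        return None  # attribute missing, or the device is gone


def wwid_changed(
    dev: str,
    query_device: Callable[[str], Optional[str]],
    sysfs_root: str = "/sys",
) -> bool:
    """True if the fresh WWID differs from the one cached in sysfs.

    query_device is a hypothetical hook that would issue a direct
    INQUIRY and return the WWID as a string, or None on failure.
    """
    cached = read_sysfs_wwid(dev, sysfs_root)
    fresh = query_device(dev)
    # Only report a change when both values are known and disagree.
    return cached is not None and fresh is not None and cached != fresh
```

Note that this only detects the mismatch after the fact; it does
nothing about 2), i.e. IO that is already in flight to the wrong
device.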

Martin

-- 
Dr. Martin Wilck, Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Antw: [EXT] Re: systemctl reboot get terminated by signal 15

2021-04-26 Thread Ulrich Windl
>>> Lennart Poettering wrote on 23.04.2021 at 21:33 in message:
> On Fr, 23.04.21 14:15, Pengpeng Sun (pengpe...@vmware.com) wrote:
> 
>> Hi Lennart,
>>
>> The issue reproduced at 2021-04-22T15:45:30.230Z. I ran 'sudo journalctl
>> -b'; the log began at Apr 22 15:45:48, which is later than the issue
>> reproduced. How can I get earlier and more detailed systemd logs?
> 
> man 5 journald.conf
> 
> Maybe your distro didn't enable persistent storage of journald, and
> thus journald uses only in-memory storage in /run, and is thus
> constrained by its diminutive size?

On SLES it seems to be enough to run "mkdir /var/log/journal" and
restart journald.
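
As a rough illustration of why creating the directory is enough: with
journald's default Storage=auto, logs go to /var/log/journal if that
directory exists and to /run/log/journal otherwise. The Python sketch
below mirrors only that rule; it assumes Storage=auto and does not
parse journald.conf, and the function name is mine.

```python
from pathlib import Path


def journal_storage_dir(root: str = "/") -> Path:
    """Directory journald would write to, assuming Storage=auto."""
    persistent = Path(root) / "var/log/journal"
    volatile = Path(root) / "run/log/journal"
    # Storage=auto: persistent only if the directory already exists.
    return persistent if persistent.is_dir() else volatile
```

Calling journal_storage_dir() before and after the mkdir shows the
switch from the volatile to the persistent location.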

> 
> Lennart
> 
> ‑‑
> Lennart Poettering, Berlin





[systemd-devel] Antw: [EXT] Re: RFC: one more time: SCSI device identification

2021-04-26 Thread Ulrich Windl
>>> Martin Wilck wrote on 23.04.2021 at 12:28 in message:
> On Thu, 2021-04-22 at 21:40 -0400, Martin K. Petersen wrote:
>> 
>> Martin,
>> 
>> > I suppose 99.9% of users never bother with customizing the udev
>> > rules.
>> 
>> Except for the other 99.9%, perhaps? :) We definitely have many users
>> that tweak udev storage rules for a variety of reasons. Including
>> being able to use RII for LUN naming purposes.
>> 
>> > But we can actually combine both approaches. If "wwid" yields a good
>> > value most of the time (which is true IMO), we could make user space
>> > rely on it by default, and make it possible to set an udev property
>> > (e.g. ENV{ID_LEGACY}="1") to tell udev rules to determine WWID
>> > differently. User-space apps like multipath could check the ID_LEGACY
>> > property to determine whether or not reading the "wwid" attribute
>> > would be consistent with udev. That would simplify matters a lot for
>> > us (Ben, do you agree?), without the need of adding endless BLIST
>> > entries.
>> 
>> That's fine with me.
>> 
>> > AFAICT, no major distribution uses "wwid" for this purpose (yet).
>> 
>> We definitely have users that currently rely on wwid, although
>> probably not through standard distro scripts.
>> 
>> > In a recent discussion with Hannes, the idea came up that the
>> > priority of "SCSI name string" designators should actually depend on
>> > their subtype. "naa." name strings should map to the respective NAA
>> > descriptors, and "eui.", likewise (only "iqn." descriptors have no
>> > binary counterpart; we thought they should rather be put below NAA,
>> > prio-wise).
>> 
>> I like what NVMe did wrt. to exporting eui, nguid, uuid separately
>> from the best-effort wwid. That's why I suggested separate sysfs files
>> for the various page 0x83 descriptors. I like the idea of being able to
>> explicitly ask for an eui if that's what I need. But that appears to be
>> somewhat orthogonal to your request.
>> 
>> > I wonder if you'd agree with a change made that way for "wwid". I
>> > suppose you don't. I'd then propose to add a new attribute following
>> > this logic. It could simply be an additional attribute with a
>> > different name. Or this new attribute could be a property of the
>> > block device rather than the SCSI device, like NVMe does it
>> > (/sys/block/nvme0n2/wwid).
>> 
>> That's fine. I am not a big fan of the idea that block/foo/wwid and
>> block/foo/device/wwid could end up being different. But I do think
>> that from a userland tooling perspective the consistency with NVMe is
>> more important.
> 
> OK, then here's the plan: Change SCSI (block) device identification to
> work similar to NVMe (in addition to what we have now).
> 
>  1. add a new sysfs attribute for SCSI block devices as
> /sys/block/sd$X/wwid, the value derived similar to the current "wwid"
> SCSI device attribute, but using the same prio for SCSI name strings as
> for their binary counterparts, as described above.
> 
>  2. add "naa" and "eui" attributes, too, for user-space applications
> that are interested in these specific attributes.
> Fixme: should we differentiate between different "naa" or eui subtypes,
> like "naa_regext", "eui64" or similar? If the device defines multiple
> "naa" designators, which one should we choose?
> 
>  3. Change udev rules such that they primarily look at the attribute in
> 1.) on new installations, and introduce a variable ID_LEGACY to tell the
> rules to fall back to the current algorithm. I suppose it makes sense
> to have at least ID_VENDOR and ID_PRODUCT available when making this
> decision, so that it doesn't have to be a global setting on a given
> host.
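
For illustration, the selection rule in 1. could look roughly like the
Python sketch below: SCSI name strings are classified by their "naa." /
"eui." / "iqn." prefix and then ranked like their binary counterparts.
The numeric priorities, the (kind, value) input format and the function
names are my assumptions, not the kernel's actual table; the thread
only says that iqn should rank below NAA, and putting it below EUI as
well is a guess.

```python
from typing import Iterable, Optional, Tuple

# Illustrative priorities only (higher wins); not the kernel's table.
PRIO = {"naa": 3, "eui": 2, "iqn": 1}


def designator_class(kind: str, value: str) -> Optional[str]:
    """Classify a designator; kind is "naa", "eui" or "name_string"."""
    if kind in ("naa", "eui"):
        return kind  # binary designator, ranked by its own type
    if kind == "name_string":
        # A name string inherits the rank of the type named by its prefix.
        prefix = value.split(".", 1)[0].lower()
        if prefix in PRIO:
            return prefix
    return None  # unknown designator type, ignored here


def pick_wwid(designators: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the value of the highest-priority designator, if any."""
    best, best_prio = None, 0
    for kind, value in designators:
        cls = designator_class(kind, value)
        if cls is not None and PRIO[cls] > best_prio:
            best, best_prio = value, PRIO[cls]  # first seen wins ties
    return best
```

With this ranking, a device reporting an "iqn." name string, a "naa."
name string and a binary EUI designator would be identified by the
"naa." value.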
> 
> While we're at it, I'd like to mention another issue: WWID changes.
> 
> This is a big problem for multipathd. The gist is that the device
> identification attributes in sysfs only change after rescanning the
> device. Thus if a user changes LUN assignments on a storage system, 
> it can happen that a direct INQUIRY returns a different WWID than in
> sysfs, which is fatal. If we plan to rely more on sysfs for device
> identification in the future, the problem gets worse. 

I think many devices rely on the fact that they are identified by
Vendor/model/serial_nr, because in most professional SAN storage systems you
can pre-set the serial number to a custom value; so if you want a new disk
(maybe a snapshot) to be compatible with the old one, just assign the same
serial number. I guess that's the idea behind it.

> 
> I wonder if there's a chance that future kernels would automatically
> update the attributes if a corresponding UNIT ATTENTION condition such
> as INQUIRY DATA HAS CHANGED is received (*), or if we can find some
> other way to avoid data corruption resulting from writing to the wrong
> device.
> 
> Regards,
> Martin
> 
> (*) I've been told that WWID changes can happen even without receiving
> an UA. But in that case I'm inclined to put the blame on