Re: Introduce Storage Instantiation Daemon - Fedora 33 System-Wide Change proposal

2020-07-01 Thread Peter Rajnoha
On 7/1/20 2:03 PM, Neal Gompa wrote:
> On Wed, Jul 1, 2020 at 8:00 AM Peter Rajnoha wrote:
>>
>> On 6/30/20 9:35 PM, Igor Raits wrote:
>>> On Tue, 2020-06-30 at 15:18 -0400, Ben Cotton wrote:
>>>> https://fedoraproject.org/wiki/Changes/SID
>>>
>>>> == Summary ==
>>>> Introduce Storage Instantiation Daemon (SID) that aims to provide a
>>>> central event-driven engine to write modules for identifying specific
>>>> Linux storage devices, their dependencies, collecting information and
>>>> state tracking while being aware of device groups forming layers and
>>>> layers forming whole stacks or simply creating custom groups of
>>>> enumerated devices. SID
>>>> will provide mechanisms to retrieve and query collected information
>>>> and a possibility to bind predefined or custom triggers with actions
>>>> for each group.
>>>
>>>> == Owner ==
>>>> * Name: [[User:prajnoha | Peter Rajnoha]]
>>>> * Email: prajn...@redhat.com
>>>
>>>> == Detailed Description ==
>>>> Over the years, various storage subsystems have been installing hooks
>>>> within udev rules and calling out numerous external commands for them
>>>> to be able to react to events like device presence, removal or a
>>>> change in general. However, this approach ended up with very complex
>>>> rules that are hard to maintain and debug if we are considering
>>>> storage setups where we build layers consisting of several underlying
>>>> devices (horizontal scope) and where we can stack one layer on top of
>>>> another (vertical scope), building up diverse storage stacks where we
>>>> also need to track progression of states either at device level or
>>>> group level.
>>>
>>>> SID extends udevd functionality here in a way that it incorporates a
>>>> notion of device grouping directly in its core which helps with
>>>> tracking devices in storage subsystems like LVM, multipath, MD...
>>>> Also, it provides its own database where records are separated into
>>>> per-device, per-module, global or udev namespace. The udev namespace
>>>> keeps per-device records that are imported from and/or exported to
>>>> the udev environment, and this is used as a compatible communication
>>>> channel with udevd. The records can be marked with restriction flags
>>>> that aid record separation and prevent other modules from reading,
>>>> writing or creating a record with the same key, hence making sure
>>>> that only a single module can create the records with certain keys
>>>> (reserving a key).
>>>
>>>> Currently, the SID project provides a companion command called 'usid'
>>>> which is used for communication between udev and SID itself. After
>>>> calling the usid command in a udev rule, device processing is
>>>> transferred to SID and SID strictly separates the processing into
>>>> discrete phases (device identification, pre-scan, device scan,
>>>> post-scan). Within these phases, it is possible to decide whether the
>>>> next phase is executed and it is possible to schedule delayed actions
>>>> or set records in the database that can fire triggers with associated
>>>> actions or records which are then exported to the udev environment
>>>> (mainly for backwards compatibility and for other udev rules to have
>>>> a chance to react). The scheduled actions and triggers are executed
>>>> out of udev context, hence not delaying the udev processing itself
>>>> and alleviating issues with udev timeouts where unnecessary work is
>>>> done.
>>>
>>>> A module writer can hook into the processing phases and use SID's API
>>>> to access the database as well as set the triggers with actions or
>>>> schedule separate actions and mark devices as ready or not for use in
>>>> next layers. The database can be used within any phase to retrieve
>>>> and store key-value records (where value could be any binary value in
>>>> general) and the records can be marked as transient (only available
>>>> during processing phases for current event) or persistent so they can
>>>> be accessed while processing subsequent events.
>>>
>>>> == Benefit to Fedora ==
>>>> The main benefit is all about centralizing the solution to solve
>>>> issues that storage sub

Re: Introduce Storage Instantiation Daemon - Fedora 33 System-Wide Change proposal

2020-07-01 Thread Peter Rajnoha
On 7/1/20 9:50 AM, Zbigniew Jędrzejewski-Szmek wrote:
> On Tue, Jun 30, 2020 at 03:18:57PM -0400, Ben Cotton wrote:
>> == Benefit to Fedora ==
>> The main benefit is all about centralizing the solution to solve
>> issues that storage subsystem maintainers have been hitting with udev,
>> that is:
>>
>> * providing a central infrastructure for storage event processing,
>> currently targeted at udev events
>>
>> * improving the way storage events and their sequences are recognized
>> and for which complex udev rules were applied before
>>
>> * single notion of device readiness shared among various storage
>> subsystems (single API to set the state instead of setting various
>> variables by different subsystems)
>>
>> * providing more enhanced possibilities to store and retrieve
>> storage-device-related records when compared to udev database
>>
>> * direct support for generic device grouping (matching
>> subsystem-related groups like LVM, multipath, MD... or creating
>> arbitrary groups of devices)
>>
>> * centralized solution for scheduling triggers with associated actions
>> defined on groups of storage devices
> 
> This sounds interesting. Assembling complex storage from udev rules is
> not easy, in particular because while it is easy to collect devices
> and handle the case where all awaited devices have been detected, it's
> much harder to do timeouts or partial assembly or conditional
> handling. A daemon can listen to hotplug events, keep an internal
> state, and take decisions based on configuration, time and events.
> 

Exactly, that's also one of the areas we'd like to cover here: partial
activations based on policies. This is hard to do within pure udev, or at
least, at the moment, we'd need to combine several *external* pieces
besides udev to make this work at all. SID will try to provide the
infrastructure to implement this in one place.
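
To make "partial activation based on a policy" a bit more concrete, here is
a minimal Python sketch. It is not SID code and does not use SID's API; the
names ActivationPolicy, DeviceGroup and decide() are made up for this
example, and the policy fields (expected, minimum, timeout) are assumptions
about what such a policy could contain.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ActivationPolicy:
    """Hypothetical policy, not taken from SID."""
    expected: int    # number of members in a complete group
    minimum: int     # members needed for a degraded (partial) activation
    timeout: float   # seconds to wait for the complete set


@dataclass
class DeviceGroup:
    name: str
    policy: ActivationPolicy
    first_seen: Optional[float] = None
    members: Set[str] = field(default_factory=set)

    def add(self, device: str, now: float) -> None:
        """Record a newly detected member; the clock starts at the first one."""
        if self.first_seen is None:
            self.first_seen = now
        self.members.add(device)

    def decide(self, now: float) -> str:
        """Return 'activate', 'activate-partial' or 'wait'."""
        if len(self.members) >= self.policy.expected:
            return "activate"
        if self.first_seen is None:
            return "wait"
        timed_out = now - self.first_seen >= self.policy.timeout
        if timed_out and len(self.members) >= self.policy.minimum:
            return "activate-partial"
        return "wait"
```

The point of the sketch is that the decision needs state (who has shown up
so far) and time (how long we have been waiting), and a long-running daemon
can keep both, while a udev rule only sees the single event it is
processing.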

> OTOH, based on this description, SID seems to want to take on some
> bigger role, e.g. by providing an alternate execution and device
> description mechanism. That sounds unnecessary (since udev does that
> part reasonably well) and complex (also because support would have to
> be added to consumers who currently get this data from udev). I would
> love to see a daemon to handle storage devices, but with close
> cooperation with udev and filling in the bits that udev cannot provide.
>
Not quite. If it sounds like SID is taking over most of udev's responsibility,
then no. It's trying to build on top of it, still considering udev as the
low-level layer for event processing based on simple rules. SID then adds the
abstraction that we need mainly for storage: the grouping part, the state
recording part and the delayed trigger/action part.
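
The three parts named above (grouping, state recording, delayed
trigger/action) can be sketched roughly like this in Python. This is only
an illustration and not how SID is implemented; GroupTracker and its
methods are hypothetical names.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set

Action = Callable[[str, Set[str]], None]


class GroupTracker:
    """Hypothetical illustration of grouping, state and delayed actions."""

    def __init__(self) -> None:
        self.groups: Dict[str, Set[str]] = defaultdict(set)  # group -> devices
        self.state: Dict[str, str] = {}                       # device -> state
        self.triggers: Dict[str, Action] = {}                 # group -> action
        self.pending: Deque[str] = deque()                    # groups to act on

    def on_group_change(self, group: str, action: Action) -> None:
        """Bind an action to a group; it runs later, not during the event."""
        self.triggers[group] = action

    def handle_event(self, device: str, group: str, state: str) -> None:
        """Fast path: only record the facts and queue the trigger."""
        self.groups[group].add(device)
        self.state[device] = state
        if group in self.triggers and group not in self.pending:
            self.pending.append(group)

    def run_pending(self) -> int:
        """Slow path: run queued actions outside of event handling."""
        ran = 0
        while self.pending:
            group = self.pending.popleft()
            self.triggers[group](group, set(self.groups[group]))
            ran += 1
        return ran
```

handle_event() stands for the part that has to finish quickly while the
event is being processed; run_pending() stands for the work that can be
done afterwards, such as the activation itself.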

The issue with udev is that it concentrates on single-device processing and
on the current state (yes, we have IMPORT{db}, but that's good for simple
records only). But this is OK as it is a low-level tool.

Also, udev's primary job is to record these single-device properties and then
to create the /dev content so these devs are accessible. But there are actions
we don't need to execute within udev context at all, e.g. the device
activation itself. And there are other details where udev falls short, like
the udev rule language itself: if you need to define more complex logic, you
need to call out to external commands to do that (and that is just another
fork, just another delay). Even comparing the values of two variables is not
possible in udev (you can compare only with a literal constant).

With SID, for backwards compatibility and for udev db readers, we still have
the possibility to export selected information from SID to the udev db, if
needed (importing and exporting from/to the udev environment is just about
using a dedicated namespace we have in the SID db). But I think storage
subsystems would go for SID directly if it provides this domain-specific
information - it's just adding more details to what udev can see.
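
As a rough illustration of the dedicated namespace and of reserving keys,
here is a small Python sketch. It is an assumption-laden toy, not the SID
database: RecordStore, its methods and the key EXAMPLE_READY are invented
for this example, and the only thing taken from the proposal is the list of
namespaces (per-device, per-module, global, udev).

```python
from typing import Dict, Tuple

NAMESPACES = ("device", "module", "global", "udev")


class RecordStore:
    """Hypothetical key-value store with namespaces and reserved keys."""

    def __init__(self) -> None:
        # (namespace, scope, key) -> value; scope is e.g. a device name
        self._records: Dict[Tuple[str, str, str], bytes] = {}
        # (namespace, key) -> module that reserved the key
        self._reserved: Dict[Tuple[str, str], str] = {}

    def reserve(self, module: str, namespace: str, key: str) -> None:
        """Reserve a key so that only this module may write it."""
        owner = self._reserved.setdefault((namespace, key), module)
        if owner != module:
            raise PermissionError(f"{key!r} is already reserved by {owner}")

    def set(self, module: str, namespace: str, scope: str,
            key: str, value: bytes) -> None:
        if namespace not in NAMESPACES:
            raise ValueError(f"unknown namespace {namespace!r}")
        owner = self._reserved.get((namespace, key))
        if owner not in (None, module):
            raise PermissionError(f"{key!r} is reserved by {owner}")
        self._records[(namespace, scope, key)] = value

    def export_udev(self, device: str) -> Dict[str, str]:
        """Return only the udev-namespace records for one device.

        Values are assumed to be text here, since they end up in the
        udev environment.
        """
        return {
            key: value.decode()
            for (namespace, scope, key), value in self._records.items()
            if namespace == "udev" and scope == device
        }
```

Only what a module puts into the "udev" namespace would be visible to udev
db readers in this model; everything else stays internal to the daemon.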

What I would like to see in the future, though, is closer cooperation between
udevd and SID, in a way where udevd could still record those simple generic
single-device properties as it does today and, if it sees that this is a
device that falls under a certain domain (like "storage" here), udevd itself
can contact the domain-specific daemon/resource for more information and then
provide that through its interface. Similar logic could apply for the
"network" domain, etc. All these domain-specific external resources could be
registered with udevd. But this is for a later time and much more discussion...

>> * adding a centralized solution for delayed actions on storage devices
>> and groups of devices (avoiding unnecessary work done within udev
>> context and hence avoiding frequent udev timeouts when processing
>> events for such devices)
> I don't think such timeouts are common. Currently the default worker
> timeout is 180s, and this should be enough to handle any device hotplug
> event. And if

Re: Introduce Storage Instantiation Daemon - Fedora 33 System-Wide Change proposal

2020-07-01 Thread Peter Rajnoha
On 6/30/20 9:35 PM, Igor Raits wrote:
> On Tue, 2020-06-30 at 15:18 -0400, Ben Cotton wrote:
>> https://fedoraproject.org/wiki/Changes/SID
> 
>> == Summary ==
>> Introduce Storage Instantiation Daemon (SID) that aims to provide a
>> central event-driven engine to write modules for identifying specific
>> Linux storage devices, their dependencies, collecting information and
>> state tracking while being aware of device groups forming layers and
>> layers forming whole stacks or simply creating custom groups of
>> enumerated devices. SID
>> will provide mechanisms to retrieve and query collected information
>> and a possibility to bind predefined or custom triggers with actions
>> for each group.
> 
>> == Owner ==
>> * Name: [[User:prajnoha | Peter Rajnoha]]
>> * Email: prajn...@redhat.com
> 
>> == Detailed Description ==
>> Over the years, various storage subsystems have been installing hooks
>> within udev rules and calling out numerous external commands for them
>> to be able to react to events like device presence, removal or a
>> change in general. However, this approach ended up with very complex
>> rules that are hard to maintain and debug if we are considering
>> storage setups where we build layers consisting of several underlying
>> devices (horizontal scope) and where we can stack one layer on top of
>> another (vertical scope), building up diverse storage stacks where we
>> also need to track progression of states either at device level or
>> group level.
> 
>> SID extends udevd functionality here in a way that it incorporates a
>> notion of device grouping directly in its core which helps with
>> tracking devices in storage subsystems like LVM, multipath, MD...
>> Also, it provides its own database where records are separated into
>> per-device, per-module, global or udev namespace. The udev namespace
>> keeps per-device records that are imported from and/or exported to
>> the udev environment, and this is used as a compatible communication
>> channel with udevd. The records can be marked with restriction flags
>> that aid record separation and prevent other modules from reading,
>> writing or creating a record with the same key, hence making sure
>> that only a single module can create the records with certain keys
>> (reserving a key).
> 
>> Currently, the SID project provides a companion command called 'usid'
>> which is used for communication between udev and SID itself. After
>> calling the usid command in a udev rule, device processing is
>> transferred to SID and SID strictly separates the processing into
>> discrete phases (device identification, pre-scan, device scan,
>> post-scan). Within these phases, it is possible to decide whether the
>> next phase is executed and it is possible to schedule delayed actions
>> or set records in the database that can fire triggers with associated
>> actions or records which are then exported to the udev environment
>> (mainly for backwards compatibility and for other udev rules to have
>> a chance to react). The scheduled actions and triggers are executed
>> out of udev context, hence not delaying the udev processing itself
>> and alleviating issues with udev timeouts where unnecessary work is
>> done.
> 
>> A module writer can hook into the processing phases and use SID's API
>> to access the database as well as set the triggers with actions or
>> schedule separate actions and mark devices as ready or not for use in
>> next layers. The database can be used within any phase to retrieve
>> and store key-value records (where value could be any binary value in
>> general) and the records can be marked as transient (only available
>> during processing phases for current event) or persistent so they can
>> be accessed while processing subsequent events.
> 
>> == Benefit to Fedora ==
>> The main benefit is all about centralizing the solution to solve
>> issues that storage subsystem maintainers have been hitting with
>> udev, that is:
> 
>> * providing a central infrastructure for storage event processing,
>> currently targeted at udev events
> 
>> * improving the way storage events and their sequences are recognized
>> and for which complex udev rules were applied before
> 
>> * single notion of device readiness shared among various storage
>> subsystems (single API to set the state instead of setting various
>> variables by different subsystems)
> 
>> * providing more enhanced possibilities to store and retrieve
>> storage-device-related records when compared to udev datab