subject:"\[RFC\] fs\: add userspace critical mounts event support"

Re: [RFC] fs: add userspace critical mounts event support

2016-11-30 Thread Johannes Berg

On Tue, 2016-11-29 at 22:37 +0100, Luis R. Rodriguez wrote:
> On Tue, Nov 29, 2016 at 10:10:56PM +0100, Tom Gundersen wrote:
> > 
> > On Tue, Nov 15, 2016 at 10:28 AM, Johannes Berg
> >  wrote:
> > > 
> > > My argument basically goes like this:
> > > 
> > > First, given good drivers (i.e. using request_firmware_nowait())
> > > putting firmware even for a built-in driver into initramfs or not
> > > should be a system integrator decision. If they don't need the
> > > device
> > > that early, it should be possible for them to delay it. Or,
> > > perhaps, if
> > > the firmware is too big, etc. I'm sure we can all come up with
> > > more
> > > examples of why you'd want to do it one way or another.
> > 
> > This is how I understood the the situation, but I never quite
> > bought
> > it. What is wrong with the kernel saying "you must put your module
> > and
> > your firmware together"? Sure, people may want to do things
> > differently, but what is the real blocker?
> 
> 0) Firmware upgrades are possible
> 1) Some firmware is optional
> 2) Firmware licenses may often not be GPLv2 compatible
> 3) Some firmwares may be stupid large (remote-proc) as such
>    neither built-in firmware nor using the firmware in initramfs
>    is reasonable.

4) "firmware" may be on a separate flash partition because it's really
   calibration data required by the (wifi) chip/driver
   (to allow kernel updates without having to taylor the kernel image
   to each and every device!!

johannes

Re: [RFC] fs: add userspace critical mounts event support

2016-11-29 Thread Luis R. Rodriguez

On Wed, Nov 09, 2016 at 03:21:07AM -0800, Andy Lutomirski wrote:
> On Wed, Nov 9, 2016 at 1:13 AM, Daniel Wagner
>  wrote:
> > [CC: added Harald]
> >
> > As Harald pointed out over a beer yesterday evening, there is at least
> > one more reason why UMH isn't obsolete. The ordering of the firmware loading
> > might be of important. Say you want to greet the user with a splash screen
> > really early on, the graphic card firmware should be loaded first. Also the
> > automotive world has this fancy requirement that rear camera must be on the
> > screen within 2 seconds. So controlling the firmware loading order is of
> > importance (e.g. also do not overcommit the I/O bandwith not so important
> > firmwares). A user space helper is able to prioritize the request
> > accordingly the use case.
> 
> That seems like a valid problem, but I don't think that UMH adequately
> solves it.  Sure, loading firmware in the right order avoids a >2sec
> delay due to firmware loading, but what happens when you have a slow
> USB device that *doesn't* need firmware plugged in to your car's shiny
> USB port when you start the car?
> 
> It seems to me that this use case requires explicit control over
> device probing and, if that gets added, you get your firmware ordering
> for free (just probe the important devices first).

In theory this is correct, the problem comes with the flexibility we have
created with pivot_root() and friends (another is mount on /lib/firmware) which
enables system integrators to pick and choose the "real rootfs" to be a few
layers away from the first fs picked up by the kernel. In providing this
flexibility we did not envision nor have devised signals to enable a
deterministic lookup due to the requirements such lookups might have --
in this case the requirements are that direct fs is ready and kosher
all the paths possible for firmware are ready. As you can imagine first race is
not only an issue for firmware but a generic issue.

The generic race on the fs lookup requires a fs availability event, and
addressing fs suspend. I'll note that the race on init is addressed today
*only* by the firmware UMH (its UMH is kobject uevent and optionally a custom
binary) by using the UMH lock. During a cleanup by Daniel recently I
realized it was bogus to use the UMH of the UMH was not used, turns out
this would still expose the direct FS lookup to a race though. This
begs the question if the UMH lock either be removed / shared with the
other kernel UMHs or a generic solution provided for direct fs lookup
with some requirements specified.

This is all a mess so I've documented each component and issues / ideas
we've discussed so far separately, the firmware UMH (which we should
probably rebrand to firmware kobject uevent helper to avoid confusion)
[0], the real kernel usermode helper [1], the new common kernel file
loader [2]

[0] https://kernelnewbies.org/KernelProjects/firmware-class-enhancements
[1] https://kernelnewbies.org/KernelProjects/usermode-helper-enhancements
[2] https://kernelnewbies.org/KernelProjects/common-kernel-loader

  Luis

Re: [RFC] fs: add userspace critical mounts event support

2016-11-29 Thread Luis R. Rodriguez

On Tue, Nov 29, 2016 at 10:10:56PM +0100, Tom Gundersen wrote:
> On Tue, Nov 15, 2016 at 10:28 AM, Johannes Berg
>  wrote:
> > My argument basically goes like this:
> >
> > First, given good drivers (i.e. using request_firmware_nowait())
> > putting firmware even for a built-in driver into initramfs or not
> > should be a system integrator decision. If they don't need the device
> > that early, it should be possible for them to delay it. Or, perhaps, if
> > the firmware is too big, etc. I'm sure we can all come up with more
> > examples of why you'd want to do it one way or another.
> 
> This is how I understood the the situation, but I never quite bought
> it. What is wrong with the kernel saying "you must put your module and
> your firmware together"? Sure, people may want to do things
> differently, but what is the real blocker?

0) Firmware upgrades are possible
1) Some firmware is optional
2) Firmware licenses may often not be GPLv2 compatible
3) Some firmwares may be stupid large (remote-proc) as such
   neither built-in firmware nor using the firmware in initramfs
   is reasonable.

But note that Johannes' main point was that today only a few
properly constructed drivers use async fw request, and furthermore
given the lack of a deterministic final rootfs signal his proposal
was to address the lack of semantics available between kernel and
userspcae available for this with a firmware kobject uevent fallback
helper. This fallback kobject uevent helper would not reply firmly against
files not found until it knows all rootfs firmware paths are ready.

> Fundamentally, it seems to me that if a module needs firmware, it
> makes no sense to make the module available before the firmware. I'm
> probably missing something though :)

You are right but just consider all the above.

  Luis

Re: [RFC] fs: add userspace critical mounts event support

2016-11-29 Thread Tom Gundersen

On Tue, Nov 15, 2016 at 10:28 AM, Johannes Berg
 wrote:
> My argument basically goes like this:
>
> First, given good drivers (i.e. using request_firmware_nowait())
> putting firmware even for a built-in driver into initramfs or not
> should be a system integrator decision. If they don't need the device
> that early, it should be possible for them to delay it. Or, perhaps, if
> the firmware is too big, etc. I'm sure we can all come up with more
> examples of why you'd want to do it one way or another.

This is how I understood the the situation, but I never quite bought
it. What is wrong with the kernel saying "you must put your module and
your firmware together"? Sure, people may want to do things
differently, but what is the real blocker?

Fundamentally, it seems to me that if a module needs firmware, it
makes no sense to make the module available before the firmware. I'm
probably missing something though :)

Cheers,

Tom

Re: [RFC] fs: add userspace critical mounts event support

2016-11-15 Thread Johannes Berg

On Tue, 2016-11-08 at 23:47 +0100, Luis R. Rodriguez wrote:

> This issue still stands. At Plumbers Johannes Berg did indicate to me
> he had a simple elegant solution in mind. He suggested that since the
> usermode helper was available, he had added support to be able to
> differentiate async firmware request calls form sync requests,

For reference:

commit e9045f9178f3e3445a3a5b85206f8681b3869562
Author: Johannes Berg 
Date:   Mon Mar 29 17:57:20 2010 +0200

firmware class: export nowait to userspace

> it can determine we're initramfs. The semantics issue is the same
> though, is there a generic way to determine we're initramfs ? What if
> we move multiple levels? Anyway -- provided we could figure this out,
> userspace would simply yield and wait until the real rootfs is met. 

One way or another we have to have this kind of information somewhere.
I don't actually know how/where though.

> Upon pivot_root() the assumption is all previous udev events pending
> would be re-triggered and finally udev could finally confirm/deny if
> the firmware was present.

The retriggering is already the case, as far as I know, if only to load
modules that weren't part of initramfs.

> note Johannes was asking for *all* async firmware requests to always
> rely on the kernel syfs UMH fallback -- this suggestion is against
> the direction we've been taking to eventually compartamentalize the
> kernel UMH code, so whatever we decide to do, lets please take a
> breather and seriously address this properly *with* systemd folks.

I was saying that because that's the only way you can actually rely on
this functionality as a system integrator. If drivers have to opt in or
can opt out then you'll always end up chasing the drivers around.

My argument basically goes like this:

First, given good drivers (i.e. using request_firmware_nowait())
putting firmware even for a built-in driver into initramfs or not
should be a system integrator decision. If they don't need the device
that early, it should be possible for them to delay it. Or, perhaps, if
the firmware is too big, etc. I'm sure we can all come up with more
examples of why you'd want to do it one way or another.

Second, all of this can be solved in other ways by adding logic to the
kernel, like the rejected proposal to add a "rootfs available" bit
somewhere, that would cause async requests to behave similarly within
the kernel (don't return "not found" until they time out or this bit is
set, and retry loading when the bit gets set)

Third, having this in place can be more friendly to users who play with
kernel compilation, modules, etc. This is a fringe group in some ways,
but it is (was?) actually a relatively common complaint that drivers
built into the kernel wouldn't work - we'd always have to direct users
to do magic steps like rebuilding initramfs with the right options etc.

johannes

Re: [RFC] fs: add userspace critical mounts event support

2016-11-09 Thread Luis R. Rodriguez

On Wed, Nov 9, 2016 at 3:21 AM, Andy Lutomirski  wrote:
> On Wed, Nov 9, 2016 at 1:13 AM, Daniel Wagner
>  wrote:
>> [CC: added Harald]
>>
>> As Harald pointed out over a beer yesterday evening, there is at least
>> one more reason why UMH isn't obsolete. The ordering of the firmware loading
>> might be of important. Say you want to greet the user with a splash screen
>> really early on, the graphic card firmware should be loaded first. Also the
>> automotive world has this fancy requirement that rear camera must be on the
>> screen within 2 seconds. So controlling the firmware loading order is of
>> importance (e.g. also do not overcommit the I/O bandwith not so important
>> firmwares). A user space helper is able to prioritize the request
>> accordingly the use case.
>
> That seems like a valid problem, but I don't think that UMH adequately
> solves it.  Sure, loading firmware in the right order avoids a >2sec
> delay due to firmware loading, but what happens when you have a slow
> USB device that *doesn't* need firmware plugged in to your car's shiny
> USB port when you start the car?
>
> It seems to me that this use case requires explicit control over
> device probing and, if that gets added, you get your firmware ordering
> for free (just probe the important devices first).

Agreed, we could prioritize requests but actually these days we don't
queue requests we simply go through them serially. We could add later
a priority aspect for async requests but its not clear this is needed
yet.

As Andy notes most of these requirements could be solved through
alternative means. One is loading your module really early so stuffing
it into for instance initramfs and if using systmed for instance
listing it on a say /etc/modules-load.d/load.conf file, otherwise
priority of loading the module is set through the actual default init
level associated for the module, if using module_init() that's just
device_initcall() and these are all probed serially. If your driver is
built-in and needs to be after rootfs_initcall() we don't provide
finer granularity for ordering other than just device_initcall() and
late_initcall(). In the future we could extend this easily but I
really do want a valid use case which we can't handle with today's
infrastructure. If this is a special driver and we are certain it must
load really early and first than others for *all* system cases when
the driver is needed, it may be worth looking into this as a new
category. If so let me know and I think we can work on a solution. If
you have systemd though I think using something like
/etc/modules-load.d/load.conf should suffice.

  Luis

Re: [RFC] fs: add userspace critical mounts event support

2016-11-09 Thread Luis R. Rodriguez

On Tue, Nov 8, 2016 at 2:47 PM, Luis R. Rodriguez  wrote:
> Whatever the outcome of this discussion is -- Johannes seemed to *want*
> to further use the UMH by default on *all* async alls... even if the
> driver did not explicitly requested it -- I'm concerned about this given
> all the above and the existing flip/flop on systemd for it. Whatever
> we try to dream up here, please consider all the above as well.

One addition to this: the current API does not always require the UMH
firmware fallback, for most distributions that do not enable
CONFIG_FW_LOADER_USER_HELPER_FALLBACK but do enable
CONFIG_FW_LOADER_USER_HELPER we only require the UMH firmware fallback
*iff* the driver explicitly requests it. For kernels with
CONFIG_FW_LOADER_USER_HELPER_FALLBACK enabled we *always* use the UMH
fallback. By fallback note that this means its used only if the first
direct filesystem request failed. For further details on complexities
of the UMH refer to two ongoing threads [0] [1] about it.

Johannes, you seemed to note you added some uevent classifier for
async requests, I checked and it seems this was with commit
e9045f9178f3e ("firmware class: export nowait to userspace") was this
the change you were referring to ? Even with all these complexities
annotated, do you still believe we need the UMH always for all async
calls ?

[0] https://lkml.kernel.org/r/20161109211741.gi13...@wotan.suse.de
[1] https://lkml.kernel.org/r/20161109220210.gj13...@wotan.suse.de

  Luis

Re: [RFC] fs: add userspace critical mounts event support

2016-11-09 Thread Andy Lutomirski

On Wed, Nov 9, 2016 at 1:13 AM, Daniel Wagner
 wrote:
> [CC: added Harald]
>
> As Harald pointed out over a beer yesterday evening, there is at least
> one more reason why UMH isn't obsolete. The ordering of the firmware loading
> might be of important. Say you want to greet the user with a splash screen
> really early on, the graphic card firmware should be loaded first. Also the
> automotive world has this fancy requirement that rear camera must be on the
> screen within 2 seconds. So controlling the firmware loading order is of
> importance (e.g. also do not overcommit the I/O bandwith not so important
> firmwares). A user space helper is able to prioritize the request
> accordingly the use case.

That seems like a valid problem, but I don't think that UMH adequately
solves it.  Sure, loading firmware in the right order avoids a >2sec
delay due to firmware loading, but what happens when you have a slow
USB device that *doesn't* need firmware plugged in to your car's shiny
USB port when you start the car?

It seems to me that this use case requires explicit control over
device probing and, if that gets added, you get your firmware ordering
for free (just probe the important devices first).

--Andy

Re: [RFC] fs: add userspace critical mounts event support

2016-11-09 Thread Daniel Wagner


[CC: added Harald]

On 11/08/2016 11:47 PM, Luis R. Rodriguez wrote:

On Wed, Oct 05, 2016 at 09:46:33PM +0200, Luis R. Rodriguez wrote:

On Wed, Oct 05, 2016 at 11:08:06AM -0700, Linus Torvalds wrote:

On Wed, Oct 5, 2016 at 11:00 AM, Luis R. Rodriguez  wrote:

On Tue, Sep 13, 2016 at 09:38:17PM -0500, Rob Landley wrote:


I did some shuffling around of those code to make initmpfs work, does
anybody know why initramfs extraction _before_ we initialize drivers
would be a bad thing?


No, but it seems sensible to me, if its done before do_initcalls()
that should resolve the race for initramfs users


initramfs should already be set up before drivers are.


Actually you are right, the issue would only be for old initrd, for initramfs
we populate that via rootfs_initcall(populate_rootfs), so as long as drivers
in question use an init level beyond rootfs's we're good there.


Exactly what is it that has trouble right now?


It would seem then that the only current stated race possible should
then be non-initramfs users.


Or:

a) initramfs users that include a driver but do not include the firmware
into initramfs

b) driver is built-in but firmware is not in initrafms (Johannes reports
   this causes driver failure on intel wireless for instance, and I guess
   you need to reload)


One example if very large firmware for
remote-proc, whereby an initramfs is just not practical or desirable.


This issue still stands. At Plumbers Johannes Berg did indicate to me
he had a simple elegant solution in mind. He suggested that since the
usermode helper was available, he had added support to be able to
differentiate async firmware request calls form sync requests, and that
userspace should not return an error *iff* the request made was async and
it can determine we're initramfs. The semantics issue is the same though,
is there a generic way to determine we're initramfs ? What if we move
multiple levels? Anyway -- provided we could figure this out, userspace
would simply yield and wait until the real rootfs is met. Upon pivot_root()
the assumption is all previous udev events pending would be re-triggered
and finally udev could finally confirm/deny if the firmware was present.
This would *also* allow you to stuff your firmware whever, however big
it was. This however relied on the userspace firmware loading support,
it turns out that (I think because of an incorrect negative backlash
back in the day over blaming this over booting issues due to the timeout
whereas the real issue was the kmod timeout was affecting our long
standing serial init()/probe()) the systemd userspace firmware laoding
support was removed from systemd udev in 2014 by Kay via commit
be2ea723b1d023b3d ("udev: remove userspace firmware loading support").

Systemd might *still* be able to provide a solution here, however I will
note Johannes was asking for *all* async firmware requests to always
rely on the kernel syfs UMH fallback -- this suggestion is against the
direction we've been taking to eventually compartamentalize the kernel UMH
code, so whatever we decide to do, lets please take a breather and seriously
address this properly *with* systemd folks.

A side race discussed at Plumbers worth mentioning given its related to the
UMH was inspired by Jiri's talk on the abuse of the freezer for kthreads --
and his suggestion to use freeze_super(). Currently the UMH lock is used
for the UMH but as I have noted in Daniel Wagner's recent patches to
give some love to this code and further compartamentalize the UMH --
the UMH lock was originally added to help avoid drivers use the firmware
API on resume, given the races. The firmware cache solution implemented by
Ming Lei years ago helped address this, whereby devm helpers are used
based on the requested firmware and prior to suspend we cache all required
firmware for drivers so that upon resume calls would work without the
effective race present. This mitigated the actual races / issues with
drivers, but they must not use the firmware API on suspend/resume. Since
this solution *kills* all pending UMH caller on suspend obviously this
means on suspend using request_firmware*() API and expecting it to work
is brutally dumb as we will eventually kill any pending requests. This
is a long winded way to say that if you rely on the UMH for firmware
you must figure out your own proactive firmware cache solution and
must definitely not request firmware on suspend. Two things then:

1) I've been brainstorming with Daniel how to use freeze_super() to
   replace the now invalid UMH lock -- its purpose only helps races
   on boot, for the fallback case to the UMH. But note most distributions
   disable forcing it always on, so these days we *only* rely on the UMH
   as a fallback if the driver explicitly requested it

2) Drivers relying on the UMH very likely have a broken cache solution
   if they are doing this on suspend

Whatever the outcome of this discussion is -- Johannes seemed to *want*
to

Re: [RFC] fs: add userspace critical mounts event support

2016-11-08 Thread Luis R. Rodriguez

On Wed, Oct 05, 2016 at 09:46:33PM +0200, Luis R. Rodriguez wrote:
> On Wed, Oct 05, 2016 at 11:08:06AM -0700, Linus Torvalds wrote:
> > On Wed, Oct 5, 2016 at 11:00 AM, Luis R. Rodriguez  
> > wrote:
> > > On Tue, Sep 13, 2016 at 09:38:17PM -0500, Rob Landley wrote:
> > >
> > >> I did some shuffling around of those code to make initmpfs work, does
> > >> anybody know why initramfs extraction _before_ we initialize drivers
> > >> would be a bad thing?
> > >
> > > No, but it seems sensible to me, if its done before do_initcalls()
> > > that should resolve the race for initramfs users
> > 
> > initramfs should already be set up before drivers are.
> 
> Actually you are right, the issue would only be for old initrd, for initramfs
> we populate that via rootfs_initcall(populate_rootfs), so as long as drivers
> in question use an init level beyond rootfs's we're good there.
> 
> > Exactly what is it that has trouble right now?
> 
> It would seem then that the only current stated race possible should
> then be non-initramfs users.

Or:

a) initramfs users that include a driver but do not include the firmware
into initramfs

b) driver is built-in but firmware is not in initrafms (Johannes reports
   this causes driver failure on intel wireless for instance, and I guess
   you need to reload)

> One example if very large firmware for
> remote-proc, whereby an initramfs is just not practical or desirable.

This issue still stands. At Plumbers Johannes Berg did indicate to me
he had a simple elegant solution in mind. He suggested that since the
usermode helper was available, he had added support to be able to
differentiate async firmware request calls form sync requests, and that
userspace should not return an error *iff* the request made was async and
it can determine we're initramfs. The semantics issue is the same though,
is there a generic way to determine we're initramfs ? What if we move
multiple levels? Anyway -- provided we could figure this out, userspace
would simply yield and wait until the real rootfs is met. Upon pivot_root()
the assumption is all previous udev events pending would be re-triggered
and finally udev could finally confirm/deny if the firmware was present.
This would *also* allow you to stuff your firmware whever, however big
it was. This however relied on the userspace firmware loading support,
it turns out that (I think because of an incorrect negative backlash
back in the day over blaming this over booting issues due to the timeout
whereas the real issue was the kmod timeout was affecting our long
standing serial init()/probe()) the systemd userspace firmware laoding
support was removed from systemd udev in 2014 by Kay via commit
be2ea723b1d023b3d ("udev: remove userspace firmware loading support").

Systemd might *still* be able to provide a solution here, however I will
note Johannes was asking for *all* async firmware requests to always
rely on the kernel syfs UMH fallback -- this suggestion is against the
direction we've been taking to eventually compartamentalize the kernel UMH
code, so whatever we decide to do, lets please take a breather and seriously
address this properly *with* systemd folks.

A side race discussed at Plumbers worth mentioning given its related to the
UMH was inspired by Jiri's talk on the abuse of the freezer for kthreads --
and his suggestion to use freeze_super(). Currently the UMH lock is used
for the UMH but as I have noted in Daniel Wagner's recent patches to
give some love to this code and further compartamentalize the UMH --
the UMH lock was originally added to help avoid drivers use the firmware
API on resume, given the races. The firmware cache solution implemented by
Ming Lei years ago helped address this, whereby devm helpers are used
based on the requested firmware and prior to suspend we cache all required
firmware for drivers so that upon resume calls would work without the
effective race present. This mitigated the actual races / issues with
drivers, but they must not use the firmware API on suspend/resume. Since
this solution *kills* all pending UMH caller on suspend obviously this
means on suspend using request_firmware*() API and expecting it to work
is brutally dumb as we will eventually kill any pending requests. This
is a long winded way to say that if you rely on the UMH for firmware
you must figure out your own proactive firmware cache solution and
must definitely not request firmware on suspend. Two things then:

1) I've been brainstorming with Daniel how to use freeze_super() to
   replace the now invalid UMH lock -- its purpose only helps races
   on boot, for the fallback case to the UMH. But note most distributions
   disable forcing it always on, so these days we *only* rely on the UMH
   as a fallback if the driver explicitly requested it

2) Drivers relying on the UMH very likely have a broken cache solution
   if they are doing this on suspend

Whatever the outcome of this discussion is -- Johannes seemed to

Re: [RFC] fs: add userspace critical mounts event support

2016-10-05 Thread Luis R. Rodriguez

On Wed, Oct 05, 2016 at 11:08:06AM -0700, Linus Torvalds wrote:
> On Wed, Oct 5, 2016 at 11:00 AM, Luis R. Rodriguez  wrote:
> > On Tue, Sep 13, 2016 at 09:38:17PM -0500, Rob Landley wrote:
> >
> >> I did some shuffling around of those code to make initmpfs work, does
> >> anybody know why initramfs extraction _before_ we initialize drivers
> >> would be a bad thing?
> >
> > No, but it seems sensible to me, if its done before do_initcalls()
> > that should resolve the race for initramfs users
> 
> initramfs should already be set up before drivers are.

Actually you are right, the issue would only be for old initrd, for initramfs
we populate that via rootfs_initcall(populate_rootfs), so as long as drivers
in question use an init level beyond rootfs's we're good there.

> Exactly what is it that has trouble right now?

It would seem then that the only current stated race possible should
then be non-initramfs users. One example if very large firmware for
remote-proc, whereby an initramfs is just not practical or desirable.

> The gating issue for initramfs is that technically the filesystem
> setup needs to be done, which means that it currently ends up being
> populated _fairly_ late in the initcall series, but certainly before
> drivers. But since initramfs really only needs very limited filesystem
> functionality, I assume Rob had few problems with just moving it
> earlier.
> 
> Still, what kind of ordering issues did people have? What is it that
> needs to load files even before driver init? Some crazy subsystem?

No, I think this is just about non-initramfs users now, if we disregard
old initrd users. Bjorn, Marc, correct me if I'm wrong, as I think its
so far you both who have seemed to run into race issues and have then
ended up trying to look for hacks to address this race or considered using
the usermode helper (which we're trying to minimize users for). Daniel
seems to note a lot of video drivers use firmware on probe as well so
there's a potential issue for those users if they don't use initramfs.

  Luis

Re: [RFC] fs: add userspace critical mounts event support

2016-10-05 Thread Linus Torvalds

On Wed, Oct 5, 2016 at 11:00 AM, Luis R. Rodriguez  wrote:
> On Tue, Sep 13, 2016 at 09:38:17PM -0500, Rob Landley wrote:
>
>> I did some shuffling around of those code to make initmpfs work, does
>> anybody know why initramfs extraction _before_ we initialize drivers
>> would be a bad thing?
>
> No, but it seems sensible to me, if its done before do_initcalls()
> that should resolve the race for initramfs users

initramfs should already be set up before drivers are. Exactly what is
it that has trouble right now?

The gating issue for initramfs is that technically the filesystem
setup needs to be done, which means that it currently ends up being
populated _fairly_ late in the initcall series, but certainly before
drivers. But since initramfs really only needs very limited filesystem
functionality, I assume Rob had few problems with just moving it
earlier.

Still, what kind of ordering issues did people have? What is it that
needs to load files even before driver init? Some crazy subsystem?

   Linus

Re: [RFC] fs: add userspace critical mounts event support

2016-10-05 Thread Luis R. Rodriguez

On Tue, Sep 13, 2016 at 09:38:17PM -0500, Rob Landley wrote:
> On 09/02/2016 07:20 PM, Luis R. Rodriguez wrote:
> > kernel_read_file_from_path() can try to read a file from
> > the system's filesystem. This is typically done for firmware
> > for instance, which lives in /lib/firmware. One issue with
> > this is that the kernel cannot know for sure when the real
> > final /lib/firmare/ is ready, and even if you use initramfs
> > drivers are currently initialized *first* prior to the initramfs
> > kicking off.
> 
> Why?

do_initcalls() is called prior to prepare_namespace(), other than that
we have no strict rules over where the real rootfs should be, and since
we have pivot_root() its up to userspace to decide when/how the real
rootfs goes. This and the fact that its also up to userspace to design
what files to place in initramfs of further rootfs -- only userspace
will know for sure when all firmware for all drivers is really ready.

> > During init we run through all init calls first
> > (do_initcalls()) and finally the initramfs is processed via
> > prepare_namespace():
> 
> What's the downside of moving initramfs cpio extraction earlier in the boot?

That would help users of initrafms, some folks seem to not want to use
initramfs, one of such users are that of the large firmwares for remote-proc
(Documentation/remoteproc.txt), we're talking about over 200 MiB for some
firmware for example.

> I did some shuffling around of those code to make initmpfs work, does
> anybody know why initramfs extraction _before_ we initialize drivers
> would be a bad thing?

No, but it seems sensible to me, if its done before do_initcalls()
that should resolve the race for initramfs users but -- so long
as the drivers that need firmware early are dumped into initramfs.
We have no assurances/warnings for this, but we can add such things
if we want them. This would not resolve the race for non-initramfs
users / pivot_root() changes.

  Luis

Re: [RFC] fs: add userspace critical mounts event support

2016-10-05 Thread Luis R. Rodriguez

On Tue, Oct 04, 2016 at 05:32:22PM -0700, Linus Torvalds wrote:
> On Tue, Oct 4, 2016 at 5:24 PM, Luis R. Rodriguez  wrote:
> >
> > Note that the races are beyond firmware, so all
> > kernel_read_file_from_path() users, as such re-using such old /sys/
> > interafeces for firmware will not suffice to cover all ground now for
> > the same race for other possible users.
> 
> Blah blah blah.
> 
> The reason I've hated this whole discussion is that it's full of
> "let's re-architect everything", and then it has these horribly warty
> interfaces.

To be clear, kernel_read_file_from_path() was an agreed upon strategy
about 1 year ago at the Linux Security summit as we found different
kernel implementations for the same exact task, reading files from
the filesystem -- my point here was simply that acknowledging that the
race on early init and driver's init / probe for firmware is implicating
that the race is *also* possible for the other kernel-read-from-fs points.
Its not clear to me what your grudge here is other than the proposal
for a solution in this patch is not what we want.

> It's classic second-system syndrome.
> 
> Just do *one* thing, and do it well. Don't change anything else. Don't
> force existign drivers to use new interfaces. Don't over-architect,
> and don't do stupid interfaces.

If there is a race for the other users and we want to avoid wrapping
a solution for it to the other callers without doing any vetting for
correctness then so be it, but to disregard completely seems error-prone.
I accept that thinking about such other users may complicate a solution
for firmware and if you prefer we just separate the race solution for
both that's fine.

> If user-space mounts a new filesystem (or just unpacks files from a
> tar-file that has firmware images in it, for chissake), that is not
> some magical "critical mount event". The whole concept is just stupid.
> Is it a "mount event" when the user downloads a new firmware image
> from the internet?
> 
> HELL NO.

We've gotten passed that the original implementation proposed is not what we
want, let's move on.

> But what is equally stupid is to then dismiss simple models because
> some totally unrelated "beyond firmware" issue.

I have not heard back from the other stakeholders using
kernel_read_file_from_path() and possible races for them. You seem to suggest
to ignore those possible theoretical races in the name of a simple solution for
firmware. Fine.

> Anything that is "beyond firmware" shouldn't even be discussed, for
> chrissake! It has nothing what-so-ever to do with firmware loading. If
> there ends up being some common helper functions, and shared code,
> that *still* doesn't make it so.

My point was to raise the flag of the possible races on the other call sites
where we read files directly from the kernel, that's all, if we agree we really
don't care for that fine.

> Basic rules of thumb:
> 
>  (a) don't over-design
> 
>  (b) don't have stupid illogical interfaces
> 
>  (c) don't conflate different issues just because you think they may
> have shared code.
> 
>  (4) be consistent. Don't make up new interfaces, and most certainly
> do *NOT* dismiss something just because it's what we have done before.
> 
> That's it.

OK..

  Luis

Re: [RFC] fs: add userspace critical mounts event support

2016-10-04 Thread Linus Torvalds

On Tue, Oct 4, 2016 at 6:48 PM, Josh Triplett  wrote:
>
>I definitely don't think it
> should be a system-wide "mount event"; it should be a per-device "go
> direct-load your firmware" poke from userspace.

I don't disagree with that kind of interface. We already have things
like "rescan" for PCI bus devices to force a bus rescan. Iit's a
simple device attribute. Having a similar thing to trigger firmware
reload for a driver sounds entirely sane.

  Linus

Re: [RFC] fs: add userspace critical mounts event support

2016-10-04 Thread Josh Triplett

On Tue, Oct 04, 2016 at 05:12:58PM -0700, Linus Torvalds wrote:
> On Tue, Oct 4, 2016 at 5:00 PM, Luis R. Rodriguez  wrote:
> > I am not sure how/why a firmware loading daemon would be a better
> > idea now. What Marc describes that Josh proposed with signals for
> > userspcae seems more aligned with what we likely need
> 
> Quite frankly, I doubt you want a signal.
> 
> You will want to have some way to specify where the firmware files
> are. Right now we have "fw_path[]" which is hardcoded except for the
> first entry that can be set as a module parameter. But you'd probably
> want to expand on that, which implies some /sys or /proc interface.
> 
> And once you do that, wouldn't it make more sense to just make the
> "update the firmware path /proc/sys/kernel/fw_path file" make things
> re-search for firmware?

That could work, but it seems like overkill to allow changing the path,
rather than the simpler interface of just telling the one driver "go
ahead and direct-load your firmware now".  I definitely don't think it
should be a system-wide "mount event"; it should be a per-device "go
direct-load your firmware" poke from userspace.  That would solve the
"build-in the driver so it can start waking up slow monitors, but wait
to load the firmware until you have a filesystem" problem.  (And it
would avoid creating some unusual driver-specific late-firmware-load
mechanism.)

That said, the Chrome OS folks apparently have some mechanism where they
mount a tmpfs over /lib/firmware to let userspace choose firmware at
runtime, so perhaps the path-changing mechanism would help there.  Kees?

Re: [RFC] fs: add userspace critical mounts event support

2016-10-04 Thread Linus Torvalds

On Tue, Oct 4, 2016 at 5:24 PM, Luis R. Rodriguez  wrote:
>
> Note that the races are beyond firmware, so all
> kernel_read_file_from_path() users, as such re-using such old /sys/
> interafeces for firmware will not suffice to cover all ground now for
> the same race for other possible users.

Blah blah blah.

The reason I've hated this whole discussion is that it's full of
"let's re-architect everything", and then it has these horribly warty
interfaces. It's classic second-system syndrome.

Just do *one* thing, and do it well. Don't change anything else. Don't
force existign drivers to use new interfaces. Don't over-architect,
and don't do stupid interfaces.

If user-space mounts a new filesystem (or just unpacks files from a
tar-file that has firmware images in it, for chissake), that is not
some magical "critical mount event". The whole concept is just stupid.
Is it a "mount event" when the user downloads a new firmware image
from the internet?

HELL NO.

But what is equally stupid is to then dismiss simple models because
some totally unrelated "beyond firmware" issue.

Anything that is "beyond firmware" shouldn't even be discussed, for
chrissake! It has nothing what-so-ever to do with firmware loading. If
there ends up being some common helper functions, and shared code,
that *still* doesn't make it so.

Basic rules of thumb:

 (a) don't over-design

 (b) don't have stupid illogical interfaces

 (c) don't conflate different issues just because you think they may
have shared code.

 (4) be consistent. Don't make up new interfaces, and most certainly
do *NOT* dismiss something just because it's what we have done before.

That's it.

Linus

Re: [RFC] fs: add userspace critical mounts event support

2016-10-04 Thread Luis R. Rodriguez

On Tue, Oct 4, 2016 at 5:12 PM, Linus Torvalds
 wrote:
> On Tue, Oct 4, 2016 at 5:00 PM, Luis R. Rodriguez  wrote:
>>
>> I am not sure how/why a firmware loading daemon would be a better
>> idea now. What Marc describes that Josh proposed with signals for
>> userspcae seems more aligned with what we likely need
>
> Quite frankly, I doubt you want a signal.
>
> You will want to have some way to specify where the firmware files
> are. Right now we have "fw_path[]" which is hardcoded except for the
> first entry that can be set as a module parameter. But you'd probably
> want to expand on that, which implies some /sys or /proc interface.
>
> And once you do that, wouldn't it make more sense to just make the
> "update the firmware path /proc/sys/kernel/fw_path file" make things
> re-search for firmware?

We can, but re-searching for firmware assumes we cache pending
firmware, we currently don't, we just either process sync or async
firmware requests.

> In other words, the interface has to be something *sensible*. Not some
> idiotic ad-hoc "send a signal" (of which that stupid original patch
> was just a very odd example).

Note that the races are beyond firmware, so all
kernel_read_file_from_path() users, as such re-using such old /sys/
interafeces for firmware will not suffice to cover all ground now for
the same race for other possible users.

 Luis

Re: [RFC] fs: add userspace critical mounts event support

2016-10-04 Thread Linus Torvalds

On Tue, Oct 4, 2016 at 5:00 PM, Luis R. Rodriguez  wrote:
>
> I am not sure how/why a firmware loading daemon would be a better
> idea now. What Marc describes that Josh proposed with signals for
> userspcae seems more aligned with what we likely need

Quite frankly, I doubt you want a signal.

You will want to have some way to specify where the firmware files
are. Right now we have "fw_path[]" which is hardcoded except for the
first entry that can be set as a module parameter. But you'd probably
want to expand on that, which implies some /sys or /proc interface.

And once you do that, wouldn't it make more sense to just make the
"update the firmware path /proc/sys/kernel/fw_path file" make things
re-search for firmware?

In other words, the interface has to be something *sensible*. Not some
idiotic ad-hoc "send a signal" (of which that stupid original patch
was just a very odd example).

  Linus

Re: [RFC] fs: add userspace critical mounts event support

2016-10-04 Thread Luis R. Rodriguez

On Sat, Sep 24, 2016 at 10:41:46AM -0700, Dmitry Torokhov wrote:
> On Fri, Sep 23, 2016 at 6:37 PM, Herbert, Marc  wrote:
> > On 03/09/2016 11:10, Dmitry Torokhov wrote:
> >> I was thinking if we kernel could post
> >> "conditions" (maybe simple stings) that it waits for, and userspace
> >> could unlock these "conditions". One of them might be "firmware
> >> available".
> >
> > On idea offered by Josh Triplett that seems to overlap with this one
> > is to have something similar to the (deprecated) userhelper with
> > *per-blob* requests and notifications except for one major difference:
> > userspace would not anymore be in charge of *providing* the blob but
> > would instead only *signal* when a given blob becomes available and is
> > either found or found missing. Then the kernel loads the blob _by
> > itself_; unlike the userhelper. No new “critical filesystem” concept
> > and a *per-blob basis*, allowing any variation of blob locations
> > across any number of initramfs and filesystems.
> >
> 
> Really, I do not quite understand why people have issues with usermode
> helper/uevents.

One reason is you'd have to implement your own cache for suspend/resume.

> It used to work reasonably well (if you were using
> request_firmware_nowait()), as the kernel would post the request and
> then, when userspace was ready[^Hier], uevents would be processed and
> firmware would be loaded. We had a timeout of 60(?) seconds by
> default, but that would be adjusted as systems needed.

The issue with the timeout was kernel developers *assumed* module init
and probe were detached, and saying 'thou shall not load firmware on
probe' seems actually like a more radical change than just saying
'thou shall load firmware on init'. I'll note that as it stands
its the right thing to complain about these users only because we
lack the semantics to ensure correctness if used on init or probe.
The timeout incurred huge latencies for optional firmwares, and
while we had a new API added to avoid the wait on optional firmware,
that obviously still leaved the races as possible. We now have async
probe which *does* enable some original misconceptions by kernel
developers, but by now other issues have also been found on the
usermode helper, the cache was one, another one was a recent discusion
over the user of the UMH lock with the assumption this was providing
a sort of safeguard on early boot use -- it does not, for the same
exact reasons why a UMH lock does not suffice to avoid all possible
rootfs races. For this later issue refer to a recent discussion in
review with Daniel Wagner's patches.

> Unfortunately it all broke when udev started insisting [1] on
> servicing some uevents in strict sequence, which resulted in boot
> stalls.

That was not the only issue... another implicit issue was that
you are reducing the number of possible supported number of
devices Linux supports per module by the timeout, it would
depend on the combine time it takes to both init and probe.
Some drivers are super complex and even if you *don't* have
firmware requirements and say burn the firmware onto a device
we found that *probe* alone was taking a long long time on some
device drivers -- check out cxgb4 driver, where one device actually
ends up loading about 4 subdevices underneath it. Yes that's a mess
and the driver needs a major rewrite to address this in a clean way
but that takes time. Its no trivial pursuit. The umh timeout then
would not be implicated anymore *but* since systemd implemented the
timeout in general for kmod loading it did mean system was limiting
them Linux drivers and how much devices a driver can support
depending on this timeout value. At SUSE we solved this by lifting
this timeout for kmod workers for now. A long term goal here, which
could help, is also to just detach init and probes, so we give to
system what it originally thought. Summary of this all is here:

http://www.do-not-panic.com/2015/12/linux-asynchronous-probe.html

I have some code that starts to enable some of this on systemd/kmod
but it still needs some more testing before I post.

> Maybe the ultimate answer is to write a firmware loading
> daemon that would also listen to netlink events and do properly what
> udev refused to be doing?

Meh, in the wireless subsystem we devised our own file loader,
check CRDA. That worked for us since we needed to optionally
enable digital RSA signed file checking, but long term our
experience is that this is pointless. So we're going to phase
that out in favor of using the firmware API for the file loading
of this file, and support then digital signatures on the firmware.

I am not sure how/why a firmware loading daemon would be a better
idea now. What Marc describes that Josh proposed with signals for
userspcae seems more aligned with what we likely need -- but note
that since we now use a shared common API for kernel reads from a
path via kernel_read_file_from_path() we'd probably want something

Re: [RFC] fs: add userspace critical mounts event support

2016-09-24 Thread Dmitry Torokhov

On Fri, Sep 23, 2016 at 6:37 PM, Herbert, Marc  wrote:
> On 03/09/2016 11:10, Dmitry Torokhov wrote:
>> I was thinking if we kernel could post
>> "conditions" (maybe simple stings) that it waits for, and userspace
>> could unlock these "conditions". One of them might be "firmware
>> available".
>
> On idea offered by Josh Triplett that seems to overlap with this one
> is to have something similar to the (deprecated) userhelper with
> *per-blob* requests and notifications except for one major difference:
> userspace would not anymore be in charge of *providing* the blob but
> would instead only *signal* when a given blob becomes available and is
> either found or found missing. Then the kernel loads the blob _by
> itself_; unlike the userhelper. No new “critical filesystem” concept
> and a *per-blob basis*, allowing any variation of blob locations
> across any number of initramfs and filesystems.
>

Really, I do not quite understand why people have issues with usermode
helper/uevents. It used to work reasonably well (if you were using
request_firmware_nowait()), as the kernel would post the request and
then, when userspace was ready[^Hier], uevents would be processed and
firmware would be loaded. We had a timeout of 60(?) seconds by
default, but that would be adjusted as systems needed.

Unfortunately it all broke when udev started insisting [1] on
servicing some uevents in strict sequence, which resulted in boot
stalls. Maybe the ultimate answer is to write a firmware loading
daemon that would also listen to netlink events and do properly what
udev refused to be doing? The distribution would know when it is ready
to service firmware requests (and thus when to start this daemon), and
we would have the freedom of having drivers both built-in and as
modules and bulding firmware into kernel, intiramfs or keep on a
"real" fs available at later time.

Thanks.

-- 
Dmitry

[1] https://lwn.net/Articles/518942/

Re: [RFC] fs: add userspace critical mounts event support

2016-09-23 Thread Herbert, Marc

On 03/09/2016 11:10, Dmitry Torokhov wrote:
> I was thinking if we kernel could post
> "conditions" (maybe simple stings) that it waits for, and userspace
> could unlock these "conditions". One of them might be "firmware
> available".

On idea offered by Josh Triplett that seems to overlap with this one
is to have something similar to the (deprecated) userhelper with
*per-blob* requests and notifications except for one major difference:
userspace would not anymore be in charge of *providing* the blob but
would instead only *signal* when a given blob becomes available and is
either found or found missing. Then the kernel loads the blob _by
itself_; unlike the userhelper. No new “critical filesystem” concept
and a *per-blob basis*, allowing any variation of blob locations
across any number of initramfs and filesystems.

Could this one fly?

Re: [RFC] fs: add userspace critical mounts event support

2016-09-13 Thread Rob Landley

On 09/02/2016 07:20 PM, Luis R. Rodriguez wrote:
> kernel_read_file_from_path() can try to read a file from
> the system's filesystem. This is typically done for firmware
> for instance, which lives in /lib/firmware. One issue with
> this is that the kernel cannot know for sure when the real
> final /lib/firmare/ is ready, and even if you use initramfs
> drivers are currently initialized *first* prior to the initramfs
> kicking off.

Why?

> During init we run through all init calls first
> (do_initcalls()) and finally the initramfs is processed via
> prepare_namespace():

What's the downside of moving initramfs cpio extraction earlier in the boot?

I did some shuffling around of those code to make initmpfs work, does
anybody know why initramfs extraction _before_ we initialize drivers
would be a bad thing? (The cpio is in memory, either linked into the
kernel or from the bootloader. No drivers are needed to extract it,
that's sort of the point.)

The only things I can think of are memory churn (large contiguous
physical page allocations), or if a driver somehow got us access to more
physical memory?

Rob

Re: [RFC] fs: add userspace critical mounts event support

2016-09-06 Thread Luis R. Rodriguez

On Tue, Sep 06, 2016 at 03:28:47PM -0700, Bjorn Andersson wrote:
> On Tue 06 Sep 14:52 PDT 2016, Luis R. Rodriguez wrote:
> 
> > We already have MODULE_FIRMWARE(), we could have MODULE_FIRMWARE_REQ() or
> > something like it to help annotate the the driver was only functional with 
> > the
> > firmware, punt things to kmod to deal with the requirements.
> 
> That implies that a single driver will only use a single version of the
> firmware. 

Today firmware requests are done manually by the driver, in the future it should
be possible to specify just an array of firmwares and on that list specify
which firmware is optional, and perhaps if you need at last one. Then you treat
optional firmware upgrades as optional -- and only treat fatal conditions as
such. Today drivers manage all this on their own.  This is something we can
later do as we have the flexible firmware API in place, but for now -- you are
right. There is no clear way to extract the semantics of what firmware
requirements really are in an easy way.

> There are cases where we want a single driver to load firmware
> depending on e.g. hardware revisions,

Dynamic firmware names -- indeed. Good point.

> or previous firmware version and
> there are cases where we want to load firmware based on requested
> use-cases.

True.

  Luis

Re: [RFC] fs: add userspace critical mounts event support

2016-09-06 Thread Luis R. Rodriguez

On Tue, Sep 06, 2016 at 02:50:51PM -0700, Linus Torvalds wrote:
> On Tue, Sep 6, 2016 at 2:11 PM, Bjorn Andersson
>  wrote:
> > On Tue 06 Sep 11:32 PDT 2016, Linus Torvalds wrote:
> >
> >> On Tue, Sep 6, 2016 at 10:46 AM, Bjorn Andersson
> >> Nobody has actually answered the "why don't we just tie the firmware
> >> and module together" question.
> >
> > The answer to this depends on the details of the suggestion; but
> > generally there's a much stronger bond between the kernel and the driver
> > than between the driver and the firmware in my cases.
> 
> I call BS.
> 
> Let me be very clear. I'm not applying that shit-for-brains stupid
> patch, and will not be pulling it unless somebody tricks me into it.

Great thanks. Please note that the only reason I proposed this was to
get the ball rolling in terms of finding a solution to the problem for why
some folks claim they *need* the firmware usermode helper. Granted, upstream
we only have 2 explicit users left, I'm told some out-of-tree users still
need and use the usermode helper.

They claim that without it there is the race between /lib/firmware being ready
and driver asking for the firmware. I was told there were quite a bit
of out-of-tree hacks to address this without using the usermode helper,
the goal of this patch was to create the discussion needed to a proper
resolution to this.

Given that upon inspection -- I've determined that even if you stuff the
firmware into initramfs you may still run into the race (the commit log of this
patch explains that) and that using initramfs was the alternative solution we
expected folks to use -- the only current deterministic sure way a driver can
depend on a firmware and avoid the race is to use CONFIG_EXTRA_FIRMWARE. This
is an issue.

> Because all these arguments make no sense at all.
> 
> If the driver doesn't work without the firmware, then anybody who
> distributes the driver binary without a firmware is just
> *fundamentally* doing something insane.

Some companies only redistribute firmware binaries to specific entities due to
avoid expanding to whom a patent grant is given to. That is, not every company
writing drivers or pushing out binary drivers is willing to dish out the
firmware as per the terms in linux-firmware.

> You may do it for *development* purposes, but doing so for actual *use* would
> be entirely pointless.

To be fair we haven't been explicit about our requirements for firmware_class
and expectations about what we expect for firmware for driver in Linux. This
has all been rather loose.

Furthermore the race issues we have found in firmware_class have only come about
through introspection, and I've been slowly documenting all this tribal 
knowledge.

> See my point? If a distribution is distributing the driver without the
> firmware, then what the hell is the point of such a thing?

Agreed.

> But even if you decide to do that for some odd reason, the patch is
> still just stupid. Instead of adding some crazy infrastructure for
> "now I've mounted everything", you could equally well just
> 
>  (a) make the driver fail the module load if it cannot find a firmware binary

Not sure if its clear but:

0) this is not just about firmware anymore
1) there is a race between using kernel_read_file_from_path() and
   having the filesystem that should be present on be ready;
2) Only *userspace* can know for sure what the real valid filesystem for
   files read from kernel_read_file_from_path() can be ready, so only userspace
   can tell the kernel if a read from kernel_read_file_from_path() is
   at a certain point in time valid.

>  (b) after user space has mounted everything, just do "insmod -a"
> again (or insmod just that driver).

I'm happy to document this as the resolution... I have a feeling some folks
will not like it. We also have built-in drivers to consider, what do we
advise for that? Keep in mind only CONFIG_EXTRA_FIRMWARE is deterministically
safe.

> See? The point is, this "generic" hacky interface is just stupid. It's
> not adding any value. If you add user space "I'm ready now" points
> anyway, you might as well make those points do the right thing and
> just load the module that is now loadable.

This is not about firmware anymore though, and we need to address built-in.

> We could mark such "late loading" modules explicitly if people want
> to, so that you can automate the whole thing about delaying the
> loading in user space.

Now *that* could actually help, for instance add another late
init call which would be called after initramfs and friends -- perhaps
way after prepare_namespace() -- by then we would ensure userspace has
all critical fs mounted. The problem though is we'd still need a way
for userspace to tell the kernel that all critical fs are mounted
as only userspace can know this for sure.

When is that done? How would the kernel know?

We do have PROBE_PREFER_ASYNCHRONOUS, but that does not provide any
guarantees over ready

Re: [RFC] fs: add userspace critical mounts event support

2016-09-06 Thread Luis R. Rodriguez

On Tue, Sep 06, 2016 at 11:32:05AM -0700, Linus Torvalds wrote:
> On Tue, Sep 6, 2016 at 10:46 AM, Bjorn Andersson
>  wrote:
> >
> > Linus, I reversed the order of your questions/answers to fit my answer
> > better.
> 
> Nobody has actually answered the "why don't we just tie the firmware

This is not about firmware anymore. The fact that we were reading files from
the filesystem in different ways called for a generic API, that was done by
Mimi with the new generic kernel_read_file_from_path(), the fact that there
were race concerns with firmware means such races are also in theory possible
outside of firmware.

> and module together" question.

In terms of the firmware, licensing is one reason I'm aware of. Another reason
is updates to firmware files may not implicate a driver update -- its pointless
to rebuild a kernel if all you need is a firmware update.

For both modules and built-in we already have the option to bundle firmware
into initramfs, and CONFIG_EXTRA_FIRMWARE, some folks do not want to use these.
Bjorn noted a few reasons why. Here's the full list of reasons I've heard why
folks shy away from these:

  o Licensing
  o You still want the ability to do updates
  o The size of the files can be huge (remoteproc is talking about 10 MiB files)

Do we want to bring the firmware closer together than what we allow for 
modules? If
so it may make sense to use a modpost command to try to bundle module and
firmware together, the build system could use the MODULE_FIRMWARE() for that.
It does mean you now have a build dependency for linux-firmware on the kernel.
Do we want that?

> Really. If the driver doesn't work without the firmware, then why the
> hell is it separated from it in the first place?

This I agree with, although you still have to consider you may want to enable
updates for firmware without a driver update.

  Luis

Re: [RFC] fs: add userspace critical mounts event support

2016-09-06 Thread Bjorn Andersson

On Tue 06 Sep 14:52 PDT 2016, Luis R. Rodriguez wrote:

> We already have MODULE_FIRMWARE(), we could have MODULE_FIRMWARE_REQ() or
> something like it to help annotate the the driver was only functional with the
> firmware, punt things to kmod to deal with the requirements.

That implies that a single driver will only use a single version of the
firmware. There are cases where we want a single driver to load firmware
depending on e.g. hardware revisions, or previous firmware version and
there are cases where we want to load firmware based on requested
use-cases.

Regards,
Bjorn

Re: [RFC] fs: add userspace critical mounts event support

2016-09-06 Thread Luis R. Rodriguez

On Sat, Sep 03, 2016 at 11:10:02AM -0700, Dmitry Torokhov wrote:
> On Sat, Sep 3, 2016 at 11:01 AM, Linus Torvalds
>  wrote:
> > On Sat, Sep 3, 2016 at 10:49 AM, Dmitry Torokhov
> >  wrote:
> >>
> >> Unfortunately module loading and availability of firmware is very
> >> loosely coupled.
> >
> > The whole "let's add a new magical proc entry 

To be fair it was using a generic sysfs entry.

> > to say that all
> > filesystems are mounted" is all about the user space coupling them.
> 
> I was thinking if we kernel could post "conditions" (maybe simple strings)
> that it waits for, and userspace could unlock these "conditions". One of them
> might be "firmware available".

Dmitry, you seem to be suggesting a generic kernel-userspace "registry" for
intertwined dependencies that the kernel needs and only userspace can provide,
is that right?

If so it sounds overly complicated to resolve this. Do we have other uses
outside of kernel_read_file_from_path() you had in mind for this?

> > I'm just saying that if user space must know about when firmware is
> > ready to be loaded anyway, just do it right. Not with some hacky "you
> > can now do random things" flag.

Note, this isn't just about firmware since we now have a generic API to read
files from the kernel, so kernel_read_file_from_path(). Firmware is now just
*one* user case. Also a complexity here is this is not just for modules but
also for built-in drivers as well. And you want the option to update the
files without the driver.

The proposed solution provides a generic broad syfs entry for letting userspace
inform the kernel when files from the filesystems where it would typically read
from (I'm calling them critical filesystems for lack of a better term) can be
accessible. Something more specific requires a bit more thought given this is
not anymore about just firmware, must also address built-in drivers, and allow
for updates.

> >  But by having user space actually say
> > "put this module together with this firmware".

Keep in mind this isn't about just firmware anymore, and we have to consider
built-in drivers as well. But if we were to just consider firmware... for the
sake of following with examples.

How would this registry work?

We already have MODULE_FIRMWARE(), we could have MODULE_FIRMWARE_REQ() or
something like it to help annotate the the driver was only functional with the
firmware, punt things to kmod to deal with the requirements.  kmod would only
load the driver if the firmware is present as well, otherwise it could just
return a new -ENOFIRMWARE and try to defer module loading for a later time, it
would load the driver once and if the firmware becomes available... This all
seems rather hacky.

For built-in drivers.. the vmlinux would have the associated firmware reqs, it
would still need a way to let userspace know when such firmware is ready. What
kernel <-> userspace API would we want to use for this ? Or as you note we
could just prevent such drivers to be built-in. I'd be happy with this, however
I don't think the distributions using non-modular kernels will be.

Now whatever we come up with recall this isn't just about firmware anymore.
Which is why I ended up bundling all these requirements up into one generic
"kernel reads file from filesystem" requirement. I can see the syfs approach
being considered hacky -- but I gladly welcome an actual alternative
suggestion.

> > If you just put the two pieces together, then the module "will just work".
> >
> > And quite frankly, that sounds like a much better maintenance practice
> > anyway. If some module doesn't work without the firmware, then dammit,
> > the two *should* go together. Putting them in two different places
> > would be *INSANE*.
> 
> Quite often it does until it does not. Most of the touch controllers
> work just fine until some event (abrupt cutting of power for example)
> where nvram gets corrupted and they come up in bootloader mode. It is
> just an example.

You want to also opt-in for updates, you don't want to require re-building
a driver for a firmware fix, for instance.

 Luis

Re: [RFC] fs: add userspace critical mounts event support

2016-09-06 Thread Linus Torvalds

On Tue, Sep 6, 2016 at 2:11 PM, Bjorn Andersson
 wrote:
> On Tue 06 Sep 11:32 PDT 2016, Linus Torvalds wrote:
>
>> On Tue, Sep 6, 2016 at 10:46 AM, Bjorn Andersson
>> Nobody has actually answered the "why don't we just tie the firmware
>> and module together" question.
>
> The answer to this depends on the details of the suggestion; but
> generally there's a much stronger bond between the kernel and the driver
> than between the driver and the firmware in my cases.

I call BS.

Let me be very clear. I'm not applying that shit-for-brains stupid
patch, and will not be pulling it unless somebody tricks me into it.

Because all these arguments make no sense at all.

If the driver doesn't work without the firmware, then anybody who
distributes the driver binary without a firmware is just
*fundamentally* doing something insane. You may do it for
*development* purposes, but doing so for actual *use* would be
entirely pointless.

See my point? If a distribution is distributing the driver without the
firmware, then what the hell is the point of such a thing?

But even if you decide to do that for some odd reason, the patch is
still just stupid. Instead of adding some crazy infrastructure for
"now I've mounted everything", you could equally well just

 (a) make the driver fail the module load if it cannot find a firmware binary

 (b) after user space has mounted everything, just do "insmod -a"
again (or insmod just that driver).

See? The point is, this "generic" hacky interface is just stupid. It's
not adding any value. If you add user space "I'm ready now" points
anyway, you might as well make those points do the right thing and
just load the module that is now loadable.

We could mark such "late loading" modules explicitly if people want
to, so that you can automate the whole thing about delaying the
loading in user space.

At no point does it make sense to say "I have now mounted all the
important filesystems". Maybe the firmware is extracted later by user
space downloading it from the internet, and the module will then work
only after that point"./

This whole "I have mounted important filesystems" is just pure and
utter garbage. Stop pushing this shit.

 Linus

Re: [RFC] fs: add userspace critical mounts event support

2016-09-06 Thread Bjorn Andersson

On Tue 06 Sep 11:32 PDT 2016, Linus Torvalds wrote:

> On Tue, Sep 6, 2016 at 10:46 AM, Bjorn Andersson
>  wrote:
> >
> > Linus, I reversed the order of your questions/answers to fit my answer
> > better.
> 
> Nobody has actually answered the "why don't we just tie the firmware
> and module together" question.
> 

The answer to this depends on the details of the suggestion; but
generally there's a much stronger bond between the kernel and the driver
than between the driver and the firmware in my cases.

E.g. we have a single remoteproc driver loading and controlling the
Hexagon DSP found in several Qualcomm platforms, so a single kernel
binary could (practically) load hundreds of variants of the firmware.

Both the kernel binary and the firmware in this example are side-loaded
onto the device during development - independently of each other, as
they are developed by different teams (or maybe even different
companies).

I assume that you're not suggesting to actually tie the module together,
as that would be practically difficult and a waste of resources.

Which leaves us with the suggestion that we should store the kernel
module with the firmware file, which is just infeasible from a few
practical reasons - again mostly related to the development flow and how
the files are contained on the devices.

> Really. If the driver doesn't work without the firmware, then why the
> hell is it separated from it in the first place?
> 

In several cases we have a single remoteproc driver controlling several
different co-processors. Further more with the aspiration of being able
to run the same kernel binary (including modules) on more than one
product this is simply not feasible.

As I said above, beyond development there are hundreds of variants of
these firmware files in products - each weighting in at 10-50MB.

The firmware loading part (remoteproc) doesn't care about these
differences and the functional drivers attaching to the services
provided by the firmware can handle the differences between them.

> The hack is a hack, and it just sounds *stupid*.
> 

This I totally agree with.

Regards,
Bjorn

Re: [RFC] fs: add userspace critical mounts event support

2016-09-06 Thread Bjorn Andersson

On Fri 02 Sep 21:11 PDT 2016, Linus Torvalds wrote:

Linus, I reversed the order of your questions/answers to fit my answer
better.

> On Fri, Sep 2, 2016 at 5:20 PM, Luis R. Rodriguez  wrote:
> >
> > Thoughts ?
> What are the drivers that need this, and why can't those drivers just
> be fixed to not do braindead things?
> 

I have several cases where remoteproc drivers are used boot DSPs upon
boot of the device, but the firmware files are way too big for being
stored in initramfs and all consumers of the provided services are
(semi-) probable as the remote processor is booted.

I.e. we need some way to figure out when these files become available so
we can bring these remote processors up.

> It's basically the kernel giving up, and relying on user space to give
> a single flag, and it's broken nasty crap.  Worse, it's broken nasty
> crap with a user interface, so we'll be stuck with it forever. Please
> no.
> 

There are several cases where there granularity of a single flag is not
enough and we do already have a working mechanism for relying on user
space to inform the kernel that firmware is available:
CONFIG_FW_LOADER_USER_HELPER_FALLBACK

Another available solution is, as you say, using kernel modules.  But I
really do not like the deployment issues that comes with kernel modules
during development. (The firmware and remoteproc driver normally does
not have the same flow through a development process)

> 
> I really think this is a horrible hack.
> 

I agree, but that said, I would appreciate a automagical mechanism that
would relieve user space from having to signal to the kernel that the
firmware partition has been mounted.

Regards,
Bjorn

Re: [RFC] fs: add userspace critical mounts event support

2016-09-06 Thread Linus Torvalds

On Tue, Sep 6, 2016 at 10:46 AM, Bjorn Andersson
 wrote:
>
> Linus, I reversed the order of your questions/answers to fit my answer
> better.

Nobody has actually answered the "why don't we just tie the firmware
and module together" question.

Really. If the driver doesn't work without the firmware, then why the
hell is it separated from it in the first place?

The hack is a hack, and it just sounds *stupid*.

   Linus

Re: [RFC] fs: add userspace critical mounts event support

2016-09-03 Thread Dmitry Torokhov

On Sat, Sep 3, 2016 at 11:01 AM, Linus Torvalds
 wrote:
> On Sat, Sep 3, 2016 at 10:49 AM, Dmitry Torokhov
>  wrote:
>>
>> Unfortunately module loading and availability of firmware is very
>> loosely coupled.
>
> The whole "let's add a new magical proc entry to say that all
> filesystems are mounted" is all about the user space coupling them.

I will be first to say that the proposed *implementation* is nowhere
near what should be accepted. I was thinking if we kernel could post
"conditions" (maybe simple stings) that it waits for, and userspace
could unlock these "conditions". One of them might be "firmware
available".

>
> I'm just saying that if user space must know about when firmware is
> ready to be loaded anyway, just do it rigth. Not with some hacky "you
> can now do random things" flag. But by having user space actually say
> "put this module together with this firmware".
>
> If you just put the two pieces together, then the module "will just work".
>
> And quite frankly, that sounds like a much better maintenance practice
> anyway. If some module doesn't work without the firmware, then dammit,
> the two *should* go together. Putting them in two different places
> would be *INSANE*.

Quite often it does until it does not. Most of the touch controllers
work just fine until some event (abrupt cutting of power for example)
where nvram gets corrupted and they come up in bootloader mode. It is
just an example.

Thanks.

-- 
Dmitry

Re: [RFC] fs: add userspace critical mounts event support

2016-09-03 Thread Linus Torvalds

On Sat, Sep 3, 2016 at 10:49 AM, Dmitry Torokhov
 wrote:
>
> Unfortunately module loading and availability of firmware is very
> loosely coupled.

The whole "let's add a new magical proc entry to say that all
filesystems are mounted" is all about the user space coupling them.

I'm just saying that if user space must know about when firmware is
ready to be loaded anyway, just do it rigth. Not with some hacky "you
can now do random things" flag. But by having user space actually say
"put this module together with this firmware".

If you just put the two pieces together, then the module "will just work".

And quite frankly, that sounds like a much better maintenance practice
anyway. If some module doesn't work without the firmware, then dammit,
the two *should* go together. Putting them in two different places
would be *INSANE*.

Linus

Re: [RFC] fs: add userspace critical mounts event support

2016-09-03 Thread Dmitry Torokhov

On Fri, Sep 02, 2016 at 09:41:18PM -0700, Linus Torvalds wrote:
> On Sep 2, 2016 9:20 PM, "Dmitry Torokhov"  wrote:
> >
> > Like what? Some devices do need to have firmware loaded so we know
> > their capabilities, so we really can't push the firmware loading into
> > "open".
> 
> So you
> (a) document that

Document that device may come up half-broken? Not sure how that would
help end user.

> (b) make the driver only build as a module

Unfortunately module loading and availability of firmware is very
loosely coupled. Of course, if you only load modules from the same
partition that your firmware is on you can get away with it, but if some
of the modules are in initramfs and firmware is on final root fs then
it still does not work. And populating also initramfs with firmware that
might be used once in a 1000 boots is somewhat wasteful. That is not
talking about systems that do not wish to use modules for one reason or
another, or even more esoteric setups where non-essential for boot
firmware can be mounted later over nfs, etc, etc.

> (c) make sure the module and the firmware go together

I do not think it is always possible. Quite often it is though, at the
expense of increasing kernel/initramfs size.

> 
> End of problem.
> 
> Why make up random interfaces for crazy stuff?

Because we want a solution that works well for all cases, simple and
complex. This includes allowing drivers to be built into the kernel but
allow them waiting for additional data (config/firmware) that may become
available later in the game. We just need to be able to tell them when
it does not make sense to wait anymore as the data they want is not
coming, and do it more reliably then simply declaring 10 or 30 or 300
seconds time out.

Thanks.

-- 
Dmitry

Re: [RFC] fs: add userspace critical mounts event support

2016-09-03 Thread Linus Torvalds

On Sep 2, 2016 9:20 PM, "Dmitry Torokhov"  wrote:
>
> Like what? Some devices do need to have firmware loaded so we know
> their capabilities, so we really can't push the firmware loading into
> "open".

So you
(a) document that
(b) make the driver only build as a module
(c) make sure the module and the firmware go together

End of problem.

Why make up random interfaces for crazy stuff?

 Linus

Re: [RFC] fs: add userspace critical mounts event support

2016-09-02 Thread Dmitry Torokhov

On Fri, Sep 2, 2016 at 9:11 PM, Linus Torvalds
 wrote:
> On Fri, Sep 2, 2016 at 5:20 PM, Luis R. Rodriguez  wrote:
>>
>> Thoughts ?
>
> I really think this is a horrible hack.
>
> It's basically the kernel giving up, and relying on user space to give
> a single flag, and it's broken nasty crap.  Worse, it's broken nasty
> crap with a user interface, so we'll be stuck with it forever. Please
> no.

I agree that interface is bad, but I do believe we need something like this...

>
> What are the drivers that need this, and why can't those drivers just
> be fixed to not do braindead things?

Like what? Some devices do need to have firmware loaded so we know
their capabilities, so we really can't push the firmware loading into
"open". If your touch controller for some reason decided to crap over
it's nvram and comes in bootloader mode it is nice for it to slurp in
config data or firmware so use does not have to trigger update
manually. And while the controller is in bootloader mode we can't
create input device because we do not know what capabilities to
declare.

These devices we want to probe asynchronously and simply tell firmware
loader to wait for firmware to become available. The problem we do not
know when to give up, since we do not know where the firmware might
be. But userspace knows and can tel us.

Thanks.

-- 
Dmitry

Re: [RFC] fs: add userspace critical mounts event support

2016-09-02 Thread Linus Torvalds

On Fri, Sep 2, 2016 at 5:20 PM, Luis R. Rodriguez  wrote:
>
> Thoughts ?

I really think this is a horrible hack.

It's basically the kernel giving up, and relying on user space to give
a single flag, and it's broken nasty crap.  Worse, it's broken nasty
crap with a user interface, so we'll be stuck with it forever. Please
no.

What are the drivers that need this, and why can't those drivers just
be fixed to not do braindead things?

Linus

[RFC] fs: add userspace critical mounts event support

2016-09-02 Thread Luis R. Rodriguez

kernel_read_file_from_path() can try to read a file from
the system's filesystem. This is typically done for firmware
for instance, which lives in /lib/firmware. One issue with
this is that the kernel cannot know for sure when the real
final /lib/firmare/ is ready, and even if you use initramfs
drivers are currently initialized *first* prior to the initramfs
kicking off. During init we run through all init calls first
(do_initcalls()) and finally the initramfs is processed via
prepare_namespace():

do_basic_setup() {
   ...
   driver_init();
   ...
   do_initcalls();
   ...
}

kernel_init_freeable() {
   ...
   do_basic_setup();
   ...
   if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) {
  ramdisk_execute_command = NULL;
  prepare_namespace();
   }
}

This leaves a possible race between loading drivers and any uses
of kernel_read_file_from_path(). Because pivot_root() can be used,
this allows userspace further possibilities in terms of defining
when a kernel critical filesystem should be ready by.

We define kernel critical filesystems as filesystems which the
kernel needs for kernel_read_file_from_path(). Since only userspace
can know when kernel critical filesystems are mounted and ready,
let userspace notify the kernel of this, and enable a new kernel
configuration which lets the kernel wait for this event before
enabling reads from kernel_read_file_from_path().

A default timeout of 10s is used for now. You can override this
through the kernel-parameters using critical_mounts_timeout_ms=T
where T is in ms. cat /sys/kernel/critical_mounts_timeout_ms the
current system value.

When userspace is ready it can simply:

  echo 1 > /sys/kernel/critical_mounts_ready

Signed-off-by: Luis R. Rodriguez 
---

Note, this still leaves the puzzle of the fact that initramfs may carry
some firmware, and some drivers may be OK in using firmware from there,
the wait stuff would just get in the way. To address this I think can
perhaps instead check *one* for the file, and if its present immediately
give it back, we'd only resort to the wait in cases of failure.

Another thing -- other than firmware we have:

security/integrity/ima/ima_fs.c:rc = kernel_read_file_from_path(path, 
, , 0, READING_POLICY);
sound/oss/sound_firmware.h: err = kernel_read_file_from_path((char *)fn, 
(void **)fp, ,

What paths are these? So we can document the current uses in the Kconfig
at least.

Thoughts ?

 Documentation/kernel-parameters.txt |  6 +++
 drivers/base/Kconfig| 48 +++
 fs/exec.c   |  3 ++
 include/linux/fs.h  |  8 
 kernel/ksysfs.c | 77 +
 5 files changed, 142 insertions(+)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 8ccacc44622a..1af89faa9fc9 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -849,6 +849,12 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
It will be ignored when crashkernel=X,high is not used
or memory reserved is below 4G.
 
+   critical_mounts_timeout_ms=T[KNL] timeout in ms
+   Format: 
+   Use this to override the kernel's default timeout for
+   waiting for critical system mount points to become
+   available.
+
cryptomgr.notests
 [KNL] Disable crypto self-tests
 
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 12b4f5551501..21576c0a4898 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -25,6 +25,54 @@ config UEVENT_HELPER_PATH
  via /proc/sys/kernel/hotplug or via /sys/kernel/uevent_helper
  later at runtime.
 
+config CRITICAL_MOUNTS_WAIT
+   bool "Enable waiting for critical-filesystems-ready notification"
+   default n
+   help
+ Kernel subsystems and device drivers often need to read files
+ from the filesystem, however in doing this races are possible at
+ bootup -- the subsystem requesting the file might look for it in /
+ early in boot, but if we haven't yet mounted the real root
+ filesystem we'll just tell the subsystem the file is not present and
+ it will fail. Furthermore what path to the filesystem is used varies
+ depending on the subsystem. To help the kernel we provide the option
+ to let the kernel wait for all critical filesystems to mounted and
+ ready before letting the kernel start trying to read files from the
+ systems' filesystem. Since pivot_root() can be used and therefore a
+ system might be configured to change its / filesystem at bootup as
+ many times as it wishes, only userspace can realy know exactly when
+ all critical filesystems are

39 matches

Mail list logo