Re: [RFC] fs: add userspace critical mounts event support
On Tue, 2016-11-29 at 22:37 +0100, Luis R. Rodriguez wrote: > On Tue, Nov 29, 2016 at 10:10:56PM +0100, Tom Gundersen wrote: > > > > On Tue, Nov 15, 2016 at 10:28 AM, Johannes Berg > >wrote: > > > > > > My argument basically goes like this: > > > > > > First, given good drivers (i.e. using request_firmware_nowait()) > > > putting firmware even for a built-in driver into initramfs or not > > > should be a system integrator decision. If they don't need the > > > device > > > that early, it should be possible for them to delay it. Or, > > > perhaps, if > > > the firmware is too big, etc. I'm sure we can all come up with > > > more > > > examples of why you'd want to do it one way or another. > > > > This is how I understood the the situation, but I never quite > > bought > > it. What is wrong with the kernel saying "you must put your module > > and > > your firmware together"? Sure, people may want to do things > > differently, but what is the real blocker? > > 0) Firmware upgrades are possible > 1) Some firmware is optional > 2) Firmware licenses may often not be GPLv2 compatible > 3) Some firmwares may be stupid large (remote-proc) as such > neither built-in firmware nor using the firmware in initramfs > is reasonable. 4) "firmware" may be on a separate flash partition because it's really calibration data required by the (wifi) chip/driver (to allow kernel updates without having to taylor the kernel image to each and every device!! johannes
Re: [RFC] fs: add userspace critical mounts event support
On Wed, Nov 09, 2016 at 03:21:07AM -0800, Andy Lutomirski wrote: > On Wed, Nov 9, 2016 at 1:13 AM, Daniel Wagner >wrote: > > [CC: added Harald] > > > > As Harald pointed out over a beer yesterday evening, there is at least > > one more reason why UMH isn't obsolete. The ordering of the firmware loading > > might be of important. Say you want to greet the user with a splash screen > > really early on, the graphic card firmware should be loaded first. Also the > > automotive world has this fancy requirement that rear camera must be on the > > screen within 2 seconds. So controlling the firmware loading order is of > > importance (e.g. also do not overcommit the I/O bandwith not so important > > firmwares). A user space helper is able to prioritize the request > > accordingly the use case. > > That seems like a valid problem, but I don't think that UMH adequately > solves it. Sure, loading firmware in the right order avoids a >2sec > delay due to firmware loading, but what happens when you have a slow > USB device that *doesn't* need firmware plugged in to your car's shiny > USB port when you start the car? > > It seems to me that this use case requires explicit control over > device probing and, if that gets added, you get your firmware ordering > for free (just probe the important devices first). In theory this is correct, the problem comes with the flexibility we have created with pivot_root() and friends (another is mount on /lib/firmware) which enables system integrators to pick and choose the "real rootfs" to be a few layers away from the first fs picked up by the kernel. In providing this flexibility we did not envision nor have devised signals to enable a deterministic lookup due to the requirements such lookups might have -- in this case the requirements are that direct fs is ready and kosher all the paths possible for firmware are ready. As you can imagine first race is not only an issue for firmware but a generic issue. The generic race on the fs lookup requires a fs availability event, and addressing fs suspend. I'll note that the race on init is addressed today *only* by the firmware UMH (its UMH is kobject uevent and optionally a custom binary) by using the UMH lock. During a cleanup by Daniel recently I realized it was bogus to use the UMH of the UMH was not used, turns out this would still expose the direct FS lookup to a race though. This begs the question if the UMH lock either be removed / shared with the other kernel UMHs or a generic solution provided for direct fs lookup with some requirements specified. This is all a mess so I've documented each component and issues / ideas we've discussed so far separately, the firmware UMH (which we should probably rebrand to firmware kobject uevent helper to avoid confusion) [0], the real kernel usermode helper [1], the new common kernel file loader [2] [0] https://kernelnewbies.org/KernelProjects/firmware-class-enhancements [1] https://kernelnewbies.org/KernelProjects/usermode-helper-enhancements [2] https://kernelnewbies.org/KernelProjects/common-kernel-loader Luis
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Nov 29, 2016 at 10:10:56PM +0100, Tom Gundersen wrote: > On Tue, Nov 15, 2016 at 10:28 AM, Johannes Berg >wrote: > > My argument basically goes like this: > > > > First, given good drivers (i.e. using request_firmware_nowait()) > > putting firmware even for a built-in driver into initramfs or not > > should be a system integrator decision. If they don't need the device > > that early, it should be possible for them to delay it. Or, perhaps, if > > the firmware is too big, etc. I'm sure we can all come up with more > > examples of why you'd want to do it one way or another. > > This is how I understood the the situation, but I never quite bought > it. What is wrong with the kernel saying "you must put your module and > your firmware together"? Sure, people may want to do things > differently, but what is the real blocker? 0) Firmware upgrades are possible 1) Some firmware is optional 2) Firmware licenses may often not be GPLv2 compatible 3) Some firmwares may be stupid large (remote-proc) as such neither built-in firmware nor using the firmware in initramfs is reasonable. But note that Johannes' main point was that today only a few properly constructed drivers use async fw request, and furthermore given the lack of a deterministic final rootfs signal his proposal was to address the lack of semantics available between kernel and userspcae available for this with a firmware kobject uevent fallback helper. This fallback kobject uevent helper would not reply firmly against files not found until it knows all rootfs firmware paths are ready. > Fundamentally, it seems to me that if a module needs firmware, it > makes no sense to make the module available before the firmware. I'm > probably missing something though :) You are right but just consider all the above. Luis
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Nov 15, 2016 at 10:28 AM, Johannes Bergwrote: > My argument basically goes like this: > > First, given good drivers (i.e. using request_firmware_nowait()) > putting firmware even for a built-in driver into initramfs or not > should be a system integrator decision. If they don't need the device > that early, it should be possible for them to delay it. Or, perhaps, if > the firmware is too big, etc. I'm sure we can all come up with more > examples of why you'd want to do it one way or another. This is how I understood the the situation, but I never quite bought it. What is wrong with the kernel saying "you must put your module and your firmware together"? Sure, people may want to do things differently, but what is the real blocker? Fundamentally, it seems to me that if a module needs firmware, it makes no sense to make the module available before the firmware. I'm probably missing something though :) Cheers, Tom
Re: [RFC] fs: add userspace critical mounts event support
On Tue, 2016-11-08 at 23:47 +0100, Luis R. Rodriguez wrote: > This issue still stands. At Plumbers Johannes Berg did indicate to me > he had a simple elegant solution in mind. He suggested that since the > usermode helper was available, he had added support to be able to > differentiate async firmware request calls form sync requests, For reference: commit e9045f9178f3e3445a3a5b85206f8681b3869562 Author: Johannes BergDate: Mon Mar 29 17:57:20 2010 +0200 firmware class: export nowait to userspace > it can determine we're initramfs. The semantics issue is the same > though, is there a generic way to determine we're initramfs ? What if > we move multiple levels? Anyway -- provided we could figure this out, > userspace would simply yield and wait until the real rootfs is met. One way or another we have to have this kind of information somewhere. I don't actually know how/where though. > Upon pivot_root() the assumption is all previous udev events pending > would be re-triggered and finally udev could finally confirm/deny if > the firmware was present. The retriggering is already the case, as far as I know, if only to load modules that weren't part of initramfs. > note Johannes was asking for *all* async firmware requests to always > rely on the kernel syfs UMH fallback -- this suggestion is against > the direction we've been taking to eventually compartamentalize the > kernel UMH code, so whatever we decide to do, lets please take a > breather and seriously address this properly *with* systemd folks. I was saying that because that's the only way you can actually rely on this functionality as a system integrator. If drivers have to opt in or can opt out then you'll always end up chasing the drivers around. My argument basically goes like this: First, given good drivers (i.e. using request_firmware_nowait()) putting firmware even for a built-in driver into initramfs or not should be a system integrator decision. If they don't need the device that early, it should be possible for them to delay it. Or, perhaps, if the firmware is too big, etc. I'm sure we can all come up with more examples of why you'd want to do it one way or another. Second, all of this can be solved in other ways by adding logic to the kernel, like the rejected proposal to add a "rootfs available" bit somewhere, that would cause async requests to behave similarly within the kernel (don't return "not found" until they time out or this bit is set, and retry loading when the bit gets set) Third, having this in place can be more friendly to users who play with kernel compilation, modules, etc. This is a fringe group in some ways, but it is (was?) actually a relatively common complaint that drivers built into the kernel wouldn't work - we'd always have to direct users to do magic steps like rebuilding initramfs with the right options etc. johannes
Re: [RFC] fs: add userspace critical mounts event support
On Wed, Nov 9, 2016 at 3:21 AM, Andy Lutomirskiwrote: > On Wed, Nov 9, 2016 at 1:13 AM, Daniel Wagner > wrote: >> [CC: added Harald] >> >> As Harald pointed out over a beer yesterday evening, there is at least >> one more reason why UMH isn't obsolete. The ordering of the firmware loading >> might be of important. Say you want to greet the user with a splash screen >> really early on, the graphic card firmware should be loaded first. Also the >> automotive world has this fancy requirement that rear camera must be on the >> screen within 2 seconds. So controlling the firmware loading order is of >> importance (e.g. also do not overcommit the I/O bandwith not so important >> firmwares). A user space helper is able to prioritize the request >> accordingly the use case. > > That seems like a valid problem, but I don't think that UMH adequately > solves it. Sure, loading firmware in the right order avoids a >2sec > delay due to firmware loading, but what happens when you have a slow > USB device that *doesn't* need firmware plugged in to your car's shiny > USB port when you start the car? > > It seems to me that this use case requires explicit control over > device probing and, if that gets added, you get your firmware ordering > for free (just probe the important devices first). Agreed, we could prioritize requests but actually these days we don't queue requests we simply go through them serially. We could add later a priority aspect for async requests but its not clear this is needed yet. As Andy notes most of these requirements could be solved through alternative means. One is loading your module really early so stuffing it into for instance initramfs and if using systmed for instance listing it on a say /etc/modules-load.d/load.conf file, otherwise priority of loading the module is set through the actual default init level associated for the module, if using module_init() that's just device_initcall() and these are all probed serially. If your driver is built-in and needs to be after rootfs_initcall() we don't provide finer granularity for ordering other than just device_initcall() and late_initcall(). In the future we could extend this easily but I really do want a valid use case which we can't handle with today's infrastructure. If this is a special driver and we are certain it must load really early and first than others for *all* system cases when the driver is needed, it may be worth looking into this as a new category. If so let me know and I think we can work on a solution. If you have systemd though I think using something like /etc/modules-load.d/load.conf should suffice. Luis
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Nov 8, 2016 at 2:47 PM, Luis R. Rodriguezwrote: > Whatever the outcome of this discussion is -- Johannes seemed to *want* > to further use the UMH by default on *all* async alls... even if the > driver did not explicitly requested it -- I'm concerned about this given > all the above and the existing flip/flop on systemd for it. Whatever > we try to dream up here, please consider all the above as well. One addition to this: the current API does not always require the UMH firmware fallback, for most distributions that do not enable CONFIG_FW_LOADER_USER_HELPER_FALLBACK but do enable CONFIG_FW_LOADER_USER_HELPER we only require the UMH firmware fallback *iff* the driver explicitly requests it. For kernels with CONFIG_FW_LOADER_USER_HELPER_FALLBACK enabled we *always* use the UMH fallback. By fallback note that this means its used only if the first direct filesystem request failed. For further details on complexities of the UMH refer to two ongoing threads [0] [1] about it. Johannes, you seemed to note you added some uevent classifier for async requests, I checked and it seems this was with commit e9045f9178f3e ("firmware class: export nowait to userspace") was this the change you were referring to ? Even with all these complexities annotated, do you still believe we need the UMH always for all async calls ? [0] https://lkml.kernel.org/r/20161109211741.gi13...@wotan.suse.de [1] https://lkml.kernel.org/r/20161109220210.gj13...@wotan.suse.de Luis
Re: [RFC] fs: add userspace critical mounts event support
On Wed, Nov 9, 2016 at 1:13 AM, Daniel Wagnerwrote: > [CC: added Harald] > > As Harald pointed out over a beer yesterday evening, there is at least > one more reason why UMH isn't obsolete. The ordering of the firmware loading > might be of important. Say you want to greet the user with a splash screen > really early on, the graphic card firmware should be loaded first. Also the > automotive world has this fancy requirement that rear camera must be on the > screen within 2 seconds. So controlling the firmware loading order is of > importance (e.g. also do not overcommit the I/O bandwith not so important > firmwares). A user space helper is able to prioritize the request > accordingly the use case. That seems like a valid problem, but I don't think that UMH adequately solves it. Sure, loading firmware in the right order avoids a >2sec delay due to firmware loading, but what happens when you have a slow USB device that *doesn't* need firmware plugged in to your car's shiny USB port when you start the car? It seems to me that this use case requires explicit control over device probing and, if that gets added, you get your firmware ordering for free (just probe the important devices first). --Andy
Re: [RFC] fs: add userspace critical mounts event support
[CC: added Harald] On 11/08/2016 11:47 PM, Luis R. Rodriguez wrote: On Wed, Oct 05, 2016 at 09:46:33PM +0200, Luis R. Rodriguez wrote: On Wed, Oct 05, 2016 at 11:08:06AM -0700, Linus Torvalds wrote: On Wed, Oct 5, 2016 at 11:00 AM, Luis R. Rodriguezwrote: On Tue, Sep 13, 2016 at 09:38:17PM -0500, Rob Landley wrote: I did some shuffling around of those code to make initmpfs work, does anybody know why initramfs extraction _before_ we initialize drivers would be a bad thing? No, but it seems sensible to me, if its done before do_initcalls() that should resolve the race for initramfs users initramfs should already be set up before drivers are. Actually you are right, the issue would only be for old initrd, for initramfs we populate that via rootfs_initcall(populate_rootfs), so as long as drivers in question use an init level beyond rootfs's we're good there. Exactly what is it that has trouble right now? It would seem then that the only current stated race possible should then be non-initramfs users. Or: a) initramfs users that include a driver but do not include the firmware into initramfs b) driver is built-in but firmware is not in initrafms (Johannes reports this causes driver failure on intel wireless for instance, and I guess you need to reload) One example if very large firmware for remote-proc, whereby an initramfs is just not practical or desirable. This issue still stands. At Plumbers Johannes Berg did indicate to me he had a simple elegant solution in mind. He suggested that since the usermode helper was available, he had added support to be able to differentiate async firmware request calls form sync requests, and that userspace should not return an error *iff* the request made was async and it can determine we're initramfs. The semantics issue is the same though, is there a generic way to determine we're initramfs ? What if we move multiple levels? Anyway -- provided we could figure this out, userspace would simply yield and wait until the real rootfs is met. Upon pivot_root() the assumption is all previous udev events pending would be re-triggered and finally udev could finally confirm/deny if the firmware was present. This would *also* allow you to stuff your firmware whever, however big it was. This however relied on the userspace firmware loading support, it turns out that (I think because of an incorrect negative backlash back in the day over blaming this over booting issues due to the timeout whereas the real issue was the kmod timeout was affecting our long standing serial init()/probe()) the systemd userspace firmware laoding support was removed from systemd udev in 2014 by Kay via commit be2ea723b1d023b3d ("udev: remove userspace firmware loading support"). Systemd might *still* be able to provide a solution here, however I will note Johannes was asking for *all* async firmware requests to always rely on the kernel syfs UMH fallback -- this suggestion is against the direction we've been taking to eventually compartamentalize the kernel UMH code, so whatever we decide to do, lets please take a breather and seriously address this properly *with* systemd folks. A side race discussed at Plumbers worth mentioning given its related to the UMH was inspired by Jiri's talk on the abuse of the freezer for kthreads -- and his suggestion to use freeze_super(). Currently the UMH lock is used for the UMH but as I have noted in Daniel Wagner's recent patches to give some love to this code and further compartamentalize the UMH -- the UMH lock was originally added to help avoid drivers use the firmware API on resume, given the races. The firmware cache solution implemented by Ming Lei years ago helped address this, whereby devm helpers are used based on the requested firmware and prior to suspend we cache all required firmware for drivers so that upon resume calls would work without the effective race present. This mitigated the actual races / issues with drivers, but they must not use the firmware API on suspend/resume. Since this solution *kills* all pending UMH caller on suspend obviously this means on suspend using request_firmware*() API and expecting it to work is brutally dumb as we will eventually kill any pending requests. This is a long winded way to say that if you rely on the UMH for firmware you must figure out your own proactive firmware cache solution and must definitely not request firmware on suspend. Two things then: 1) I've been brainstorming with Daniel how to use freeze_super() to replace the now invalid UMH lock -- its purpose only helps races on boot, for the fallback case to the UMH. But note most distributions disable forcing it always on, so these days we *only* rely on the UMH as a fallback if the driver explicitly requested it 2) Drivers relying on the UMH very likely have a broken cache solution if they are doing this on suspend Whatever the outcome of this discussion is -- Johannes seemed to *want* to
Re: [RFC] fs: add userspace critical mounts event support
On Wed, Oct 05, 2016 at 09:46:33PM +0200, Luis R. Rodriguez wrote: > On Wed, Oct 05, 2016 at 11:08:06AM -0700, Linus Torvalds wrote: > > On Wed, Oct 5, 2016 at 11:00 AM, Luis R. Rodriguez> > wrote: > > > On Tue, Sep 13, 2016 at 09:38:17PM -0500, Rob Landley wrote: > > > > > >> I did some shuffling around of those code to make initmpfs work, does > > >> anybody know why initramfs extraction _before_ we initialize drivers > > >> would be a bad thing? > > > > > > No, but it seems sensible to me, if its done before do_initcalls() > > > that should resolve the race for initramfs users > > > > initramfs should already be set up before drivers are. > > Actually you are right, the issue would only be for old initrd, for initramfs > we populate that via rootfs_initcall(populate_rootfs), so as long as drivers > in question use an init level beyond rootfs's we're good there. > > > Exactly what is it that has trouble right now? > > It would seem then that the only current stated race possible should > then be non-initramfs users. Or: a) initramfs users that include a driver but do not include the firmware into initramfs b) driver is built-in but firmware is not in initrafms (Johannes reports this causes driver failure on intel wireless for instance, and I guess you need to reload) > One example if very large firmware for > remote-proc, whereby an initramfs is just not practical or desirable. This issue still stands. At Plumbers Johannes Berg did indicate to me he had a simple elegant solution in mind. He suggested that since the usermode helper was available, he had added support to be able to differentiate async firmware request calls form sync requests, and that userspace should not return an error *iff* the request made was async and it can determine we're initramfs. The semantics issue is the same though, is there a generic way to determine we're initramfs ? What if we move multiple levels? Anyway -- provided we could figure this out, userspace would simply yield and wait until the real rootfs is met. Upon pivot_root() the assumption is all previous udev events pending would be re-triggered and finally udev could finally confirm/deny if the firmware was present. This would *also* allow you to stuff your firmware whever, however big it was. This however relied on the userspace firmware loading support, it turns out that (I think because of an incorrect negative backlash back in the day over blaming this over booting issues due to the timeout whereas the real issue was the kmod timeout was affecting our long standing serial init()/probe()) the systemd userspace firmware laoding support was removed from systemd udev in 2014 by Kay via commit be2ea723b1d023b3d ("udev: remove userspace firmware loading support"). Systemd might *still* be able to provide a solution here, however I will note Johannes was asking for *all* async firmware requests to always rely on the kernel syfs UMH fallback -- this suggestion is against the direction we've been taking to eventually compartamentalize the kernel UMH code, so whatever we decide to do, lets please take a breather and seriously address this properly *with* systemd folks. A side race discussed at Plumbers worth mentioning given its related to the UMH was inspired by Jiri's talk on the abuse of the freezer for kthreads -- and his suggestion to use freeze_super(). Currently the UMH lock is used for the UMH but as I have noted in Daniel Wagner's recent patches to give some love to this code and further compartamentalize the UMH -- the UMH lock was originally added to help avoid drivers use the firmware API on resume, given the races. The firmware cache solution implemented by Ming Lei years ago helped address this, whereby devm helpers are used based on the requested firmware and prior to suspend we cache all required firmware for drivers so that upon resume calls would work without the effective race present. This mitigated the actual races / issues with drivers, but they must not use the firmware API on suspend/resume. Since this solution *kills* all pending UMH caller on suspend obviously this means on suspend using request_firmware*() API and expecting it to work is brutally dumb as we will eventually kill any pending requests. This is a long winded way to say that if you rely on the UMH for firmware you must figure out your own proactive firmware cache solution and must definitely not request firmware on suspend. Two things then: 1) I've been brainstorming with Daniel how to use freeze_super() to replace the now invalid UMH lock -- its purpose only helps races on boot, for the fallback case to the UMH. But note most distributions disable forcing it always on, so these days we *only* rely on the UMH as a fallback if the driver explicitly requested it 2) Drivers relying on the UMH very likely have a broken cache solution if they are doing this on suspend Whatever the outcome of this discussion is -- Johannes seemed to
Re: [RFC] fs: add userspace critical mounts event support
On Wed, Oct 05, 2016 at 11:08:06AM -0700, Linus Torvalds wrote: > On Wed, Oct 5, 2016 at 11:00 AM, Luis R. Rodriguezwrote: > > On Tue, Sep 13, 2016 at 09:38:17PM -0500, Rob Landley wrote: > > > >> I did some shuffling around of those code to make initmpfs work, does > >> anybody know why initramfs extraction _before_ we initialize drivers > >> would be a bad thing? > > > > No, but it seems sensible to me, if its done before do_initcalls() > > that should resolve the race for initramfs users > > initramfs should already be set up before drivers are. Actually you are right, the issue would only be for old initrd, for initramfs we populate that via rootfs_initcall(populate_rootfs), so as long as drivers in question use an init level beyond rootfs's we're good there. > Exactly what is it that has trouble right now? It would seem then that the only current stated race possible should then be non-initramfs users. One example if very large firmware for remote-proc, whereby an initramfs is just not practical or desirable. > The gating issue for initramfs is that technically the filesystem > setup needs to be done, which means that it currently ends up being > populated _fairly_ late in the initcall series, but certainly before > drivers. But since initramfs really only needs very limited filesystem > functionality, I assume Rob had few problems with just moving it > earlier. > > Still, what kind of ordering issues did people have? What is it that > needs to load files even before driver init? Some crazy subsystem? No, I think this is just about non-initramfs users now, if we disregard old initrd users. Bjorn, Marc, correct me if I'm wrong, as I think its so far you both who have seemed to run into race issues and have then ended up trying to look for hacks to address this race or considered using the usermode helper (which we're trying to minimize users for). Daniel seems to note a lot of video drivers use firmware on probe as well so there's a potential issue for those users if they don't use initramfs. Luis
Re: [RFC] fs: add userspace critical mounts event support
On Wed, Oct 5, 2016 at 11:00 AM, Luis R. Rodriguezwrote: > On Tue, Sep 13, 2016 at 09:38:17PM -0500, Rob Landley wrote: > >> I did some shuffling around of those code to make initmpfs work, does >> anybody know why initramfs extraction _before_ we initialize drivers >> would be a bad thing? > > No, but it seems sensible to me, if its done before do_initcalls() > that should resolve the race for initramfs users initramfs should already be set up before drivers are. Exactly what is it that has trouble right now? The gating issue for initramfs is that technically the filesystem setup needs to be done, which means that it currently ends up being populated _fairly_ late in the initcall series, but certainly before drivers. But since initramfs really only needs very limited filesystem functionality, I assume Rob had few problems with just moving it earlier. Still, what kind of ordering issues did people have? What is it that needs to load files even before driver init? Some crazy subsystem? Linus
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Sep 13, 2016 at 09:38:17PM -0500, Rob Landley wrote: > On 09/02/2016 07:20 PM, Luis R. Rodriguez wrote: > > kernel_read_file_from_path() can try to read a file from > > the system's filesystem. This is typically done for firmware > > for instance, which lives in /lib/firmware. One issue with > > this is that the kernel cannot know for sure when the real > > final /lib/firmare/ is ready, and even if you use initramfs > > drivers are currently initialized *first* prior to the initramfs > > kicking off. > > Why? do_initcalls() is called prior to prepare_namespace(), other than that we have no strict rules over where the real rootfs should be, and since we have pivot_root() its up to userspace to decide when/how the real rootfs goes. This and the fact that its also up to userspace to design what files to place in initramfs of further rootfs -- only userspace will know for sure when all firmware for all drivers is really ready. > > During init we run through all init calls first > > (do_initcalls()) and finally the initramfs is processed via > > prepare_namespace(): > > What's the downside of moving initramfs cpio extraction earlier in the boot? That would help users of initrafms, some folks seem to not want to use initramfs, one of such users are that of the large firmwares for remote-proc (Documentation/remoteproc.txt), we're talking about over 200 MiB for some firmware for example. > I did some shuffling around of those code to make initmpfs work, does > anybody know why initramfs extraction _before_ we initialize drivers > would be a bad thing? No, but it seems sensible to me, if its done before do_initcalls() that should resolve the race for initramfs users but -- so long as the drivers that need firmware early are dumped into initramfs. We have no assurances/warnings for this, but we can add such things if we want them. This would not resolve the race for non-initramfs users / pivot_root() changes. Luis
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Oct 04, 2016 at 05:32:22PM -0700, Linus Torvalds wrote: > On Tue, Oct 4, 2016 at 5:24 PM, Luis R. Rodriguezwrote: > > > > Note that the races are beyond firmware, so all > > kernel_read_file_from_path() users, as such re-using such old /sys/ > > interafeces for firmware will not suffice to cover all ground now for > > the same race for other possible users. > > Blah blah blah. > > The reason I've hated this whole discussion is that it's full of > "let's re-architect everything", and then it has these horribly warty > interfaces. To be clear, kernel_read_file_from_path() was an agreed upon strategy about 1 year ago at the Linux Security summit as we found different kernel implementations for the same exact task, reading files from the filesystem -- my point here was simply that acknowledging that the race on early init and driver's init / probe for firmware is implicating that the race is *also* possible for the other kernel-read-from-fs points. Its not clear to me what your grudge here is other than the proposal for a solution in this patch is not what we want. > It's classic second-system syndrome. > > Just do *one* thing, and do it well. Don't change anything else. Don't > force existign drivers to use new interfaces. Don't over-architect, > and don't do stupid interfaces. If there is a race for the other users and we want to avoid wrapping a solution for it to the other callers without doing any vetting for correctness then so be it, but to disregard completely seems error-prone. I accept that thinking about such other users may complicate a solution for firmware and if you prefer we just separate the race solution for both that's fine. > If user-space mounts a new filesystem (or just unpacks files from a > tar-file that has firmware images in it, for chissake), that is not > some magical "critical mount event". The whole concept is just stupid. > Is it a "mount event" when the user downloads a new firmware image > from the internet? > > HELL NO. We've gotten passed that the original implementation proposed is not what we want, let's move on. > But what is equally stupid is to then dismiss simple models because > some totally unrelated "beyond firmware" issue. I have not heard back from the other stakeholders using kernel_read_file_from_path() and possible races for them. You seem to suggest to ignore those possible theoretical races in the name of a simple solution for firmware. Fine. > Anything that is "beyond firmware" shouldn't even be discussed, for > chrissake! It has nothing what-so-ever to do with firmware loading. If > there ends up being some common helper functions, and shared code, > that *still* doesn't make it so. My point was to raise the flag of the possible races on the other call sites where we read files directly from the kernel, that's all, if we agree we really don't care for that fine. > Basic rules of thumb: > > (a) don't over-design > > (b) don't have stupid illogical interfaces > > (c) don't conflate different issues just because you think they may > have shared code. > > (4) be consistent. Don't make up new interfaces, and most certainly > do *NOT* dismiss something just because it's what we have done before. > > That's it. OK.. Luis
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Oct 4, 2016 at 6:48 PM, Josh Triplettwrote: > >I definitely don't think it > should be a system-wide "mount event"; it should be a per-device "go > direct-load your firmware" poke from userspace. I don't disagree with that kind of interface. We already have things like "rescan" for PCI bus devices to force a bus rescan. Iit's a simple device attribute. Having a similar thing to trigger firmware reload for a driver sounds entirely sane. Linus
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Oct 04, 2016 at 05:12:58PM -0700, Linus Torvalds wrote: > On Tue, Oct 4, 2016 at 5:00 PM, Luis R. Rodriguezwrote: > > I am not sure how/why a firmware loading daemon would be a better > > idea now. What Marc describes that Josh proposed with signals for > > userspcae seems more aligned with what we likely need > > Quite frankly, I doubt you want a signal. > > You will want to have some way to specify where the firmware files > are. Right now we have "fw_path[]" which is hardcoded except for the > first entry that can be set as a module parameter. But you'd probably > want to expand on that, which implies some /sys or /proc interface. > > And once you do that, wouldn't it make more sense to just make the > "update the firmware path /proc/sys/kernel/fw_path file" make things > re-search for firmware? That could work, but it seems like overkill to allow changing the path, rather than the simpler interface of just telling the one driver "go ahead and direct-load your firmware now". I definitely don't think it should be a system-wide "mount event"; it should be a per-device "go direct-load your firmware" poke from userspace. That would solve the "build-in the driver so it can start waking up slow monitors, but wait to load the firmware until you have a filesystem" problem. (And it would avoid creating some unusual driver-specific late-firmware-load mechanism.) That said, the Chrome OS folks apparently have some mechanism where they mount a tmpfs over /lib/firmware to let userspace choose firmware at runtime, so perhaps the path-changing mechanism would help there. Kees?
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Oct 4, 2016 at 5:24 PM, Luis R. Rodriguezwrote: > > Note that the races are beyond firmware, so all > kernel_read_file_from_path() users, as such re-using such old /sys/ > interafeces for firmware will not suffice to cover all ground now for > the same race for other possible users. Blah blah blah. The reason I've hated this whole discussion is that it's full of "let's re-architect everything", and then it has these horribly warty interfaces. It's classic second-system syndrome. Just do *one* thing, and do it well. Don't change anything else. Don't force existign drivers to use new interfaces. Don't over-architect, and don't do stupid interfaces. If user-space mounts a new filesystem (or just unpacks files from a tar-file that has firmware images in it, for chissake), that is not some magical "critical mount event". The whole concept is just stupid. Is it a "mount event" when the user downloads a new firmware image from the internet? HELL NO. But what is equally stupid is to then dismiss simple models because some totally unrelated "beyond firmware" issue. Anything that is "beyond firmware" shouldn't even be discussed, for chrissake! It has nothing what-so-ever to do with firmware loading. If there ends up being some common helper functions, and shared code, that *still* doesn't make it so. Basic rules of thumb: (a) don't over-design (b) don't have stupid illogical interfaces (c) don't conflate different issues just because you think they may have shared code. (4) be consistent. Don't make up new interfaces, and most certainly do *NOT* dismiss something just because it's what we have done before. That's it. Linus
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Oct 4, 2016 at 5:12 PM, Linus Torvaldswrote: > On Tue, Oct 4, 2016 at 5:00 PM, Luis R. Rodriguez wrote: >> >> I am not sure how/why a firmware loading daemon would be a better >> idea now. What Marc describes that Josh proposed with signals for >> userspcae seems more aligned with what we likely need > > Quite frankly, I doubt you want a signal. > > You will want to have some way to specify where the firmware files > are. Right now we have "fw_path[]" which is hardcoded except for the > first entry that can be set as a module parameter. But you'd probably > want to expand on that, which implies some /sys or /proc interface. > > And once you do that, wouldn't it make more sense to just make the > "update the firmware path /proc/sys/kernel/fw_path file" make things > re-search for firmware? We can, but re-searching for firmware assumes we cache pending firmware, we currently don't, we just either process sync or async firmware requests. > In other words, the interface has to be something *sensible*. Not some > idiotic ad-hoc "send a signal" (of which that stupid original patch > was just a very odd example). Note that the races are beyond firmware, so all kernel_read_file_from_path() users, as such re-using such old /sys/ interafeces for firmware will not suffice to cover all ground now for the same race for other possible users. Luis
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Oct 4, 2016 at 5:00 PM, Luis R. Rodriguezwrote: > > I am not sure how/why a firmware loading daemon would be a better > idea now. What Marc describes that Josh proposed with signals for > userspcae seems more aligned with what we likely need Quite frankly, I doubt you want a signal. You will want to have some way to specify where the firmware files are. Right now we have "fw_path[]" which is hardcoded except for the first entry that can be set as a module parameter. But you'd probably want to expand on that, which implies some /sys or /proc interface. And once you do that, wouldn't it make more sense to just make the "update the firmware path /proc/sys/kernel/fw_path file" make things re-search for firmware? In other words, the interface has to be something *sensible*. Not some idiotic ad-hoc "send a signal" (of which that stupid original patch was just a very odd example). Linus
Re: [RFC] fs: add userspace critical mounts event support
On Sat, Sep 24, 2016 at 10:41:46AM -0700, Dmitry Torokhov wrote: > On Fri, Sep 23, 2016 at 6:37 PM, Herbert, Marcwrote: > > On 03/09/2016 11:10, Dmitry Torokhov wrote: > >> I was thinking if we kernel could post > >> "conditions" (maybe simple stings) that it waits for, and userspace > >> could unlock these "conditions". One of them might be "firmware > >> available". > > > > On idea offered by Josh Triplett that seems to overlap with this one > > is to have something similar to the (deprecated) userhelper with > > *per-blob* requests and notifications except for one major difference: > > userspace would not anymore be in charge of *providing* the blob but > > would instead only *signal* when a given blob becomes available and is > > either found or found missing. Then the kernel loads the blob _by > > itself_; unlike the userhelper. No new “critical filesystem” concept > > and a *per-blob basis*, allowing any variation of blob locations > > across any number of initramfs and filesystems. > > > > Really, I do not quite understand why people have issues with usermode > helper/uevents. One reason is you'd have to implement your own cache for suspend/resume. > It used to work reasonably well (if you were using > request_firmware_nowait()), as the kernel would post the request and > then, when userspace was ready[^Hier], uevents would be processed and > firmware would be loaded. We had a timeout of 60(?) seconds by > default, but that would be adjusted as systems needed. The issue with the timeout was kernel developers *assumed* module init and probe were detached, and saying 'thou shall not load firmware on probe' seems actually like a more radical change than just saying 'thou shall load firmware on init'. I'll note that as it stands its the right thing to complain about these users only because we lack the semantics to ensure correctness if used on init or probe. The timeout incurred huge latencies for optional firmwares, and while we had a new API added to avoid the wait on optional firmware, that obviously still leaved the races as possible. We now have async probe which *does* enable some original misconceptions by kernel developers, but by now other issues have also been found on the usermode helper, the cache was one, another one was a recent discusion over the user of the UMH lock with the assumption this was providing a sort of safeguard on early boot use -- it does not, for the same exact reasons why a UMH lock does not suffice to avoid all possible rootfs races. For this later issue refer to a recent discussion in review with Daniel Wagner's patches. > Unfortunately it all broke when udev started insisting [1] on > servicing some uevents in strict sequence, which resulted in boot > stalls. That was not the only issue... another implicit issue was that you are reducing the number of possible supported number of devices Linux supports per module by the timeout, it would depend on the combine time it takes to both init and probe. Some drivers are super complex and even if you *don't* have firmware requirements and say burn the firmware onto a device we found that *probe* alone was taking a long long time on some device drivers -- check out cxgb4 driver, where one device actually ends up loading about 4 subdevices underneath it. Yes that's a mess and the driver needs a major rewrite to address this in a clean way but that takes time. Its no trivial pursuit. The umh timeout then would not be implicated anymore *but* since systemd implemented the timeout in general for kmod loading it did mean system was limiting them Linux drivers and how much devices a driver can support depending on this timeout value. At SUSE we solved this by lifting this timeout for kmod workers for now. A long term goal here, which could help, is also to just detach init and probes, so we give to system what it originally thought. Summary of this all is here: http://www.do-not-panic.com/2015/12/linux-asynchronous-probe.html I have some code that starts to enable some of this on systemd/kmod but it still needs some more testing before I post. > Maybe the ultimate answer is to write a firmware loading > daemon that would also listen to netlink events and do properly what > udev refused to be doing? Meh, in the wireless subsystem we devised our own file loader, check CRDA. That worked for us since we needed to optionally enable digital RSA signed file checking, but long term our experience is that this is pointless. So we're going to phase that out in favor of using the firmware API for the file loading of this file, and support then digital signatures on the firmware. I am not sure how/why a firmware loading daemon would be a better idea now. What Marc describes that Josh proposed with signals for userspcae seems more aligned with what we likely need -- but note that since we now use a shared common API for kernel reads from a path via kernel_read_file_from_path() we'd probably want something
Re: [RFC] fs: add userspace critical mounts event support
On Fri, Sep 23, 2016 at 6:37 PM, Herbert, Marcwrote: > On 03/09/2016 11:10, Dmitry Torokhov wrote: >> I was thinking if we kernel could post >> "conditions" (maybe simple stings) that it waits for, and userspace >> could unlock these "conditions". One of them might be "firmware >> available". > > On idea offered by Josh Triplett that seems to overlap with this one > is to have something similar to the (deprecated) userhelper with > *per-blob* requests and notifications except for one major difference: > userspace would not anymore be in charge of *providing* the blob but > would instead only *signal* when a given blob becomes available and is > either found or found missing. Then the kernel loads the blob _by > itself_; unlike the userhelper. No new “critical filesystem” concept > and a *per-blob basis*, allowing any variation of blob locations > across any number of initramfs and filesystems. > Really, I do not quite understand why people have issues with usermode helper/uevents. It used to work reasonably well (if you were using request_firmware_nowait()), as the kernel would post the request and then, when userspace was ready[^Hier], uevents would be processed and firmware would be loaded. We had a timeout of 60(?) seconds by default, but that would be adjusted as systems needed. Unfortunately it all broke when udev started insisting [1] on servicing some uevents in strict sequence, which resulted in boot stalls. Maybe the ultimate answer is to write a firmware loading daemon that would also listen to netlink events and do properly what udev refused to be doing? The distribution would know when it is ready to service firmware requests (and thus when to start this daemon), and we would have the freedom of having drivers both built-in and as modules and bulding firmware into kernel, intiramfs or keep on a "real" fs available at later time. Thanks. -- Dmitry [1] https://lwn.net/Articles/518942/
Re: [RFC] fs: add userspace critical mounts event support
On 03/09/2016 11:10, Dmitry Torokhov wrote: > I was thinking if we kernel could post > "conditions" (maybe simple stings) that it waits for, and userspace > could unlock these "conditions". One of them might be "firmware > available". On idea offered by Josh Triplett that seems to overlap with this one is to have something similar to the (deprecated) userhelper with *per-blob* requests and notifications except for one major difference: userspace would not anymore be in charge of *providing* the blob but would instead only *signal* when a given blob becomes available and is either found or found missing. Then the kernel loads the blob _by itself_; unlike the userhelper. No new “critical filesystem” concept and a *per-blob basis*, allowing any variation of blob locations across any number of initramfs and filesystems. Could this one fly?
Re: [RFC] fs: add userspace critical mounts event support
On 09/02/2016 07:20 PM, Luis R. Rodriguez wrote: > kernel_read_file_from_path() can try to read a file from > the system's filesystem. This is typically done for firmware > for instance, which lives in /lib/firmware. One issue with > this is that the kernel cannot know for sure when the real > final /lib/firmare/ is ready, and even if you use initramfs > drivers are currently initialized *first* prior to the initramfs > kicking off. Why? > During init we run through all init calls first > (do_initcalls()) and finally the initramfs is processed via > prepare_namespace(): What's the downside of moving initramfs cpio extraction earlier in the boot? I did some shuffling around of those code to make initmpfs work, does anybody know why initramfs extraction _before_ we initialize drivers would be a bad thing? (The cpio is in memory, either linked into the kernel or from the bootloader. No drivers are needed to extract it, that's sort of the point.) The only things I can think of are memory churn (large contiguous physical page allocations), or if a driver somehow got us access to more physical memory? Rob
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Sep 06, 2016 at 03:28:47PM -0700, Bjorn Andersson wrote: > On Tue 06 Sep 14:52 PDT 2016, Luis R. Rodriguez wrote: > > > We already have MODULE_FIRMWARE(), we could have MODULE_FIRMWARE_REQ() or > > something like it to help annotate the the driver was only functional with > > the > > firmware, punt things to kmod to deal with the requirements. > > That implies that a single driver will only use a single version of the > firmware. Today firmware requests are done manually by the driver, in the future it should be possible to specify just an array of firmwares and on that list specify which firmware is optional, and perhaps if you need at last one. Then you treat optional firmware upgrades as optional -- and only treat fatal conditions as such. Today drivers manage all this on their own. This is something we can later do as we have the flexible firmware API in place, but for now -- you are right. There is no clear way to extract the semantics of what firmware requirements really are in an easy way. > There are cases where we want a single driver to load firmware > depending on e.g. hardware revisions, Dynamic firmware names -- indeed. Good point. > or previous firmware version and > there are cases where we want to load firmware based on requested > use-cases. True. Luis
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Sep 06, 2016 at 02:50:51PM -0700, Linus Torvalds wrote: > On Tue, Sep 6, 2016 at 2:11 PM, Bjorn Andersson >wrote: > > On Tue 06 Sep 11:32 PDT 2016, Linus Torvalds wrote: > > > >> On Tue, Sep 6, 2016 at 10:46 AM, Bjorn Andersson > >> Nobody has actually answered the "why don't we just tie the firmware > >> and module together" question. > > > > The answer to this depends on the details of the suggestion; but > > generally there's a much stronger bond between the kernel and the driver > > than between the driver and the firmware in my cases. > > I call BS. > > Let me be very clear. I'm not applying that shit-for-brains stupid > patch, and will not be pulling it unless somebody tricks me into it. Great thanks. Please note that the only reason I proposed this was to get the ball rolling in terms of finding a solution to the problem for why some folks claim they *need* the firmware usermode helper. Granted, upstream we only have 2 explicit users left, I'm told some out-of-tree users still need and use the usermode helper. They claim that without it there is the race between /lib/firmware being ready and driver asking for the firmware. I was told there were quite a bit of out-of-tree hacks to address this without using the usermode helper, the goal of this patch was to create the discussion needed to a proper resolution to this. Given that upon inspection -- I've determined that even if you stuff the firmware into initramfs you may still run into the race (the commit log of this patch explains that) and that using initramfs was the alternative solution we expected folks to use -- the only current deterministic sure way a driver can depend on a firmware and avoid the race is to use CONFIG_EXTRA_FIRMWARE. This is an issue. > Because all these arguments make no sense at all. > > If the driver doesn't work without the firmware, then anybody who > distributes the driver binary without a firmware is just > *fundamentally* doing something insane. Some companies only redistribute firmware binaries to specific entities due to avoid expanding to whom a patent grant is given to. That is, not every company writing drivers or pushing out binary drivers is willing to dish out the firmware as per the terms in linux-firmware. > You may do it for *development* purposes, but doing so for actual *use* would > be entirely pointless. To be fair we haven't been explicit about our requirements for firmware_class and expectations about what we expect for firmware for driver in Linux. This has all been rather loose. Furthermore the race issues we have found in firmware_class have only come about through introspection, and I've been slowly documenting all this tribal knowledge. > See my point? If a distribution is distributing the driver without the > firmware, then what the hell is the point of such a thing? Agreed. > But even if you decide to do that for some odd reason, the patch is > still just stupid. Instead of adding some crazy infrastructure for > "now I've mounted everything", you could equally well just > > (a) make the driver fail the module load if it cannot find a firmware binary Not sure if its clear but: 0) this is not just about firmware anymore 1) there is a race between using kernel_read_file_from_path() and having the filesystem that should be present on be ready; 2) Only *userspace* can know for sure what the real valid filesystem for files read from kernel_read_file_from_path() can be ready, so only userspace can tell the kernel if a read from kernel_read_file_from_path() is at a certain point in time valid. > (b) after user space has mounted everything, just do "insmod -a" > again (or insmod just that driver). I'm happy to document this as the resolution... I have a feeling some folks will not like it. We also have built-in drivers to consider, what do we advise for that? Keep in mind only CONFIG_EXTRA_FIRMWARE is deterministically safe. > See? The point is, this "generic" hacky interface is just stupid. It's > not adding any value. If you add user space "I'm ready now" points > anyway, you might as well make those points do the right thing and > just load the module that is now loadable. This is not about firmware anymore though, and we need to address built-in. > We could mark such "late loading" modules explicitly if people want > to, so that you can automate the whole thing about delaying the > loading in user space. Now *that* could actually help, for instance add another late init call which would be called after initramfs and friends -- perhaps way after prepare_namespace() -- by then we would ensure userspace has all critical fs mounted. The problem though is we'd still need a way for userspace to tell the kernel that all critical fs are mounted as only userspace can know this for sure. When is that done? How would the kernel know? We do have PROBE_PREFER_ASYNCHRONOUS, but that does not provide any guarantees over ready
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Sep 06, 2016 at 11:32:05AM -0700, Linus Torvalds wrote: > On Tue, Sep 6, 2016 at 10:46 AM, Bjorn Andersson >wrote: > > > > Linus, I reversed the order of your questions/answers to fit my answer > > better. > > Nobody has actually answered the "why don't we just tie the firmware This is not about firmware anymore. The fact that we were reading files from the filesystem in different ways called for a generic API, that was done by Mimi with the new generic kernel_read_file_from_path(), the fact that there were race concerns with firmware means such races are also in theory possible outside of firmware. > and module together" question. In terms of the firmware, licensing is one reason I'm aware of. Another reason is updates to firmware files may not implicate a driver update -- its pointless to rebuild a kernel if all you need is a firmware update. For both modules and built-in we already have the option to bundle firmware into initramfs, and CONFIG_EXTRA_FIRMWARE, some folks do not want to use these. Bjorn noted a few reasons why. Here's the full list of reasons I've heard why folks shy away from these: o Licensing o You still want the ability to do updates o The size of the files can be huge (remoteproc is talking about 10 MiB files) Do we want to bring the firmware closer together than what we allow for modules? If so it may make sense to use a modpost command to try to bundle module and firmware together, the build system could use the MODULE_FIRMWARE() for that. It does mean you now have a build dependency for linux-firmware on the kernel. Do we want that? > Really. If the driver doesn't work without the firmware, then why the > hell is it separated from it in the first place? This I agree with, although you still have to consider you may want to enable updates for firmware without a driver update. Luis
Re: [RFC] fs: add userspace critical mounts event support
On Tue 06 Sep 14:52 PDT 2016, Luis R. Rodriguez wrote: > We already have MODULE_FIRMWARE(), we could have MODULE_FIRMWARE_REQ() or > something like it to help annotate the the driver was only functional with the > firmware, punt things to kmod to deal with the requirements. That implies that a single driver will only use a single version of the firmware. There are cases where we want a single driver to load firmware depending on e.g. hardware revisions, or previous firmware version and there are cases where we want to load firmware based on requested use-cases. Regards, Bjorn
Re: [RFC] fs: add userspace critical mounts event support
On Sat, Sep 03, 2016 at 11:10:02AM -0700, Dmitry Torokhov wrote: > On Sat, Sep 3, 2016 at 11:01 AM, Linus Torvalds >wrote: > > On Sat, Sep 3, 2016 at 10:49 AM, Dmitry Torokhov > > wrote: > >> > >> Unfortunately module loading and availability of firmware is very > >> loosely coupled. > > > > The whole "let's add a new magical proc entry To be fair it was using a generic sysfs entry. > > to say that all > > filesystems are mounted" is all about the user space coupling them. > > I was thinking if we kernel could post "conditions" (maybe simple strings) > that it waits for, and userspace could unlock these "conditions". One of them > might be "firmware available". Dmitry, you seem to be suggesting a generic kernel-userspace "registry" for intertwined dependencies that the kernel needs and only userspace can provide, is that right? If so it sounds overly complicated to resolve this. Do we have other uses outside of kernel_read_file_from_path() you had in mind for this? > > I'm just saying that if user space must know about when firmware is > > ready to be loaded anyway, just do it right. Not with some hacky "you > > can now do random things" flag. Note, this isn't just about firmware since we now have a generic API to read files from the kernel, so kernel_read_file_from_path(). Firmware is now just *one* user case. Also a complexity here is this is not just for modules but also for built-in drivers as well. And you want the option to update the files without the driver. The proposed solution provides a generic broad syfs entry for letting userspace inform the kernel when files from the filesystems where it would typically read from (I'm calling them critical filesystems for lack of a better term) can be accessible. Something more specific requires a bit more thought given this is not anymore about just firmware, must also address built-in drivers, and allow for updates. > > But by having user space actually say > > "put this module together with this firmware". Keep in mind this isn't about just firmware anymore, and we have to consider built-in drivers as well. But if we were to just consider firmware... for the sake of following with examples. How would this registry work? We already have MODULE_FIRMWARE(), we could have MODULE_FIRMWARE_REQ() or something like it to help annotate the the driver was only functional with the firmware, punt things to kmod to deal with the requirements. kmod would only load the driver if the firmware is present as well, otherwise it could just return a new -ENOFIRMWARE and try to defer module loading for a later time, it would load the driver once and if the firmware becomes available... This all seems rather hacky. For built-in drivers.. the vmlinux would have the associated firmware reqs, it would still need a way to let userspace know when such firmware is ready. What kernel <-> userspace API would we want to use for this ? Or as you note we could just prevent such drivers to be built-in. I'd be happy with this, however I don't think the distributions using non-modular kernels will be. Now whatever we come up with recall this isn't just about firmware anymore. Which is why I ended up bundling all these requirements up into one generic "kernel reads file from filesystem" requirement. I can see the syfs approach being considered hacky -- but I gladly welcome an actual alternative suggestion. > > If you just put the two pieces together, then the module "will just work". > > > > And quite frankly, that sounds like a much better maintenance practice > > anyway. If some module doesn't work without the firmware, then dammit, > > the two *should* go together. Putting them in two different places > > would be *INSANE*. > > Quite often it does until it does not. Most of the touch controllers > work just fine until some event (abrupt cutting of power for example) > where nvram gets corrupted and they come up in bootloader mode. It is > just an example. You want to also opt-in for updates, you don't want to require re-building a driver for a firmware fix, for instance. Luis
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Sep 6, 2016 at 2:11 PM, Bjorn Anderssonwrote: > On Tue 06 Sep 11:32 PDT 2016, Linus Torvalds wrote: > >> On Tue, Sep 6, 2016 at 10:46 AM, Bjorn Andersson >> Nobody has actually answered the "why don't we just tie the firmware >> and module together" question. > > The answer to this depends on the details of the suggestion; but > generally there's a much stronger bond between the kernel and the driver > than between the driver and the firmware in my cases. I call BS. Let me be very clear. I'm not applying that shit-for-brains stupid patch, and will not be pulling it unless somebody tricks me into it. Because all these arguments make no sense at all. If the driver doesn't work without the firmware, then anybody who distributes the driver binary without a firmware is just *fundamentally* doing something insane. You may do it for *development* purposes, but doing so for actual *use* would be entirely pointless. See my point? If a distribution is distributing the driver without the firmware, then what the hell is the point of such a thing? But even if you decide to do that for some odd reason, the patch is still just stupid. Instead of adding some crazy infrastructure for "now I've mounted everything", you could equally well just (a) make the driver fail the module load if it cannot find a firmware binary (b) after user space has mounted everything, just do "insmod -a" again (or insmod just that driver). See? The point is, this "generic" hacky interface is just stupid. It's not adding any value. If you add user space "I'm ready now" points anyway, you might as well make those points do the right thing and just load the module that is now loadable. We could mark such "late loading" modules explicitly if people want to, so that you can automate the whole thing about delaying the loading in user space. At no point does it make sense to say "I have now mounted all the important filesystems". Maybe the firmware is extracted later by user space downloading it from the internet, and the module will then work only after that point"./ This whole "I have mounted important filesystems" is just pure and utter garbage. Stop pushing this shit. Linus
Re: [RFC] fs: add userspace critical mounts event support
On Tue 06 Sep 11:32 PDT 2016, Linus Torvalds wrote: > On Tue, Sep 6, 2016 at 10:46 AM, Bjorn Andersson >wrote: > > > > Linus, I reversed the order of your questions/answers to fit my answer > > better. > > Nobody has actually answered the "why don't we just tie the firmware > and module together" question. > The answer to this depends on the details of the suggestion; but generally there's a much stronger bond between the kernel and the driver than between the driver and the firmware in my cases. E.g. we have a single remoteproc driver loading and controlling the Hexagon DSP found in several Qualcomm platforms, so a single kernel binary could (practically) load hundreds of variants of the firmware. Both the kernel binary and the firmware in this example are side-loaded onto the device during development - independently of each other, as they are developed by different teams (or maybe even different companies). I assume that you're not suggesting to actually tie the module together, as that would be practically difficult and a waste of resources. Which leaves us with the suggestion that we should store the kernel module with the firmware file, which is just infeasible from a few practical reasons - again mostly related to the development flow and how the files are contained on the devices. > Really. If the driver doesn't work without the firmware, then why the > hell is it separated from it in the first place? > In several cases we have a single remoteproc driver controlling several different co-processors. Further more with the aspiration of being able to run the same kernel binary (including modules) on more than one product this is simply not feasible. As I said above, beyond development there are hundreds of variants of these firmware files in products - each weighting in at 10-50MB. The firmware loading part (remoteproc) doesn't care about these differences and the functional drivers attaching to the services provided by the firmware can handle the differences between them. > The hack is a hack, and it just sounds *stupid*. > This I totally agree with. Regards, Bjorn
Re: [RFC] fs: add userspace critical mounts event support
On Fri 02 Sep 21:11 PDT 2016, Linus Torvalds wrote: Linus, I reversed the order of your questions/answers to fit my answer better. > On Fri, Sep 2, 2016 at 5:20 PM, Luis R. Rodriguezwrote: > > > > Thoughts ? > What are the drivers that need this, and why can't those drivers just > be fixed to not do braindead things? > I have several cases where remoteproc drivers are used boot DSPs upon boot of the device, but the firmware files are way too big for being stored in initramfs and all consumers of the provided services are (semi-) probable as the remote processor is booted. I.e. we need some way to figure out when these files become available so we can bring these remote processors up. > It's basically the kernel giving up, and relying on user space to give > a single flag, and it's broken nasty crap. Worse, it's broken nasty > crap with a user interface, so we'll be stuck with it forever. Please > no. > There are several cases where there granularity of a single flag is not enough and we do already have a working mechanism for relying on user space to inform the kernel that firmware is available: CONFIG_FW_LOADER_USER_HELPER_FALLBACK Another available solution is, as you say, using kernel modules. But I really do not like the deployment issues that comes with kernel modules during development. (The firmware and remoteproc driver normally does not have the same flow through a development process) > > I really think this is a horrible hack. > I agree, but that said, I would appreciate a automagical mechanism that would relieve user space from having to signal to the kernel that the firmware partition has been mounted. Regards, Bjorn
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Sep 6, 2016 at 10:46 AM, Bjorn Anderssonwrote: > > Linus, I reversed the order of your questions/answers to fit my answer > better. Nobody has actually answered the "why don't we just tie the firmware and module together" question. Really. If the driver doesn't work without the firmware, then why the hell is it separated from it in the first place? The hack is a hack, and it just sounds *stupid*. Linus
Re: [RFC] fs: add userspace critical mounts event support
On Sat, Sep 3, 2016 at 11:01 AM, Linus Torvaldswrote: > On Sat, Sep 3, 2016 at 10:49 AM, Dmitry Torokhov > wrote: >> >> Unfortunately module loading and availability of firmware is very >> loosely coupled. > > The whole "let's add a new magical proc entry to say that all > filesystems are mounted" is all about the user space coupling them. I will be first to say that the proposed *implementation* is nowhere near what should be accepted. I was thinking if we kernel could post "conditions" (maybe simple stings) that it waits for, and userspace could unlock these "conditions". One of them might be "firmware available". > > I'm just saying that if user space must know about when firmware is > ready to be loaded anyway, just do it rigth. Not with some hacky "you > can now do random things" flag. But by having user space actually say > "put this module together with this firmware". > > If you just put the two pieces together, then the module "will just work". > > And quite frankly, that sounds like a much better maintenance practice > anyway. If some module doesn't work without the firmware, then dammit, > the two *should* go together. Putting them in two different places > would be *INSANE*. Quite often it does until it does not. Most of the touch controllers work just fine until some event (abrupt cutting of power for example) where nvram gets corrupted and they come up in bootloader mode. It is just an example. Thanks. -- Dmitry
Re: [RFC] fs: add userspace critical mounts event support
On Sat, Sep 3, 2016 at 10:49 AM, Dmitry Torokhovwrote: > > Unfortunately module loading and availability of firmware is very > loosely coupled. The whole "let's add a new magical proc entry to say that all filesystems are mounted" is all about the user space coupling them. I'm just saying that if user space must know about when firmware is ready to be loaded anyway, just do it rigth. Not with some hacky "you can now do random things" flag. But by having user space actually say "put this module together with this firmware". If you just put the two pieces together, then the module "will just work". And quite frankly, that sounds like a much better maintenance practice anyway. If some module doesn't work without the firmware, then dammit, the two *should* go together. Putting them in two different places would be *INSANE*. Linus
Re: [RFC] fs: add userspace critical mounts event support
On Fri, Sep 02, 2016 at 09:41:18PM -0700, Linus Torvalds wrote: > On Sep 2, 2016 9:20 PM, "Dmitry Torokhov"wrote: > > > > Like what? Some devices do need to have firmware loaded so we know > > their capabilities, so we really can't push the firmware loading into > > "open". > > So you > (a) document that Document that device may come up half-broken? Not sure how that would help end user. > (b) make the driver only build as a module Unfortunately module loading and availability of firmware is very loosely coupled. Of course, if you only load modules from the same partition that your firmware is on you can get away with it, but if some of the modules are in initramfs and firmware is on final root fs then it still does not work. And populating also initramfs with firmware that might be used once in a 1000 boots is somewhat wasteful. That is not talking about systems that do not wish to use modules for one reason or another, or even more esoteric setups where non-essential for boot firmware can be mounted later over nfs, etc, etc. > (c) make sure the module and the firmware go together I do not think it is always possible. Quite often it is though, at the expense of increasing kernel/initramfs size. > > End of problem. > > Why make up random interfaces for crazy stuff? Because we want a solution that works well for all cases, simple and complex. This includes allowing drivers to be built into the kernel but allow them waiting for additional data (config/firmware) that may become available later in the game. We just need to be able to tell them when it does not make sense to wait anymore as the data they want is not coming, and do it more reliably then simply declaring 10 or 30 or 300 seconds time out. Thanks. -- Dmitry
Re: [RFC] fs: add userspace critical mounts event support
On Sep 2, 2016 9:20 PM, "Dmitry Torokhov"wrote: > > Like what? Some devices do need to have firmware loaded so we know > their capabilities, so we really can't push the firmware loading into > "open". So you (a) document that (b) make the driver only build as a module (c) make sure the module and the firmware go together End of problem. Why make up random interfaces for crazy stuff? Linus
Re: [RFC] fs: add userspace critical mounts event support
On Fri, Sep 2, 2016 at 9:11 PM, Linus Torvaldswrote: > On Fri, Sep 2, 2016 at 5:20 PM, Luis R. Rodriguez wrote: >> >> Thoughts ? > > I really think this is a horrible hack. > > It's basically the kernel giving up, and relying on user space to give > a single flag, and it's broken nasty crap. Worse, it's broken nasty > crap with a user interface, so we'll be stuck with it forever. Please > no. I agree that interface is bad, but I do believe we need something like this... > > What are the drivers that need this, and why can't those drivers just > be fixed to not do braindead things? Like what? Some devices do need to have firmware loaded so we know their capabilities, so we really can't push the firmware loading into "open". If your touch controller for some reason decided to crap over it's nvram and comes in bootloader mode it is nice for it to slurp in config data or firmware so use does not have to trigger update manually. And while the controller is in bootloader mode we can't create input device because we do not know what capabilities to declare. These devices we want to probe asynchronously and simply tell firmware loader to wait for firmware to become available. The problem we do not know when to give up, since we do not know where the firmware might be. But userspace knows and can tel us. Thanks. -- Dmitry
Re: [RFC] fs: add userspace critical mounts event support
On Fri, Sep 2, 2016 at 5:20 PM, Luis R. Rodriguezwrote: > > Thoughts ? I really think this is a horrible hack. It's basically the kernel giving up, and relying on user space to give a single flag, and it's broken nasty crap. Worse, it's broken nasty crap with a user interface, so we'll be stuck with it forever. Please no. What are the drivers that need this, and why can't those drivers just be fixed to not do braindead things? Linus
[RFC] fs: add userspace critical mounts event support
kernel_read_file_from_path() can try to read a file from the system's filesystem. This is typically done for firmware for instance, which lives in /lib/firmware. One issue with this is that the kernel cannot know for sure when the real final /lib/firmare/ is ready, and even if you use initramfs drivers are currently initialized *first* prior to the initramfs kicking off. During init we run through all init calls first (do_initcalls()) and finally the initramfs is processed via prepare_namespace(): do_basic_setup() { ... driver_init(); ... do_initcalls(); ... } kernel_init_freeable() { ... do_basic_setup(); ... if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) { ramdisk_execute_command = NULL; prepare_namespace(); } } This leaves a possible race between loading drivers and any uses of kernel_read_file_from_path(). Because pivot_root() can be used, this allows userspace further possibilities in terms of defining when a kernel critical filesystem should be ready by. We define kernel critical filesystems as filesystems which the kernel needs for kernel_read_file_from_path(). Since only userspace can know when kernel critical filesystems are mounted and ready, let userspace notify the kernel of this, and enable a new kernel configuration which lets the kernel wait for this event before enabling reads from kernel_read_file_from_path(). A default timeout of 10s is used for now. You can override this through the kernel-parameters using critical_mounts_timeout_ms=T where T is in ms. cat /sys/kernel/critical_mounts_timeout_ms the current system value. When userspace is ready it can simply: echo 1 > /sys/kernel/critical_mounts_ready Signed-off-by: Luis R. Rodriguez--- Note, this still leaves the puzzle of the fact that initramfs may carry some firmware, and some drivers may be OK in using firmware from there, the wait stuff would just get in the way. To address this I think can perhaps instead check *one* for the file, and if its present immediately give it back, we'd only resort to the wait in cases of failure. Another thing -- other than firmware we have: security/integrity/ima/ima_fs.c:rc = kernel_read_file_from_path(path, , , 0, READING_POLICY); sound/oss/sound_firmware.h: err = kernel_read_file_from_path((char *)fn, (void **)fp, , What paths are these? So we can document the current uses in the Kconfig at least. Thoughts ? Documentation/kernel-parameters.txt | 6 +++ drivers/base/Kconfig| 48 +++ fs/exec.c | 3 ++ include/linux/fs.h | 8 kernel/ksysfs.c | 77 + 5 files changed, 142 insertions(+) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 8ccacc44622a..1af89faa9fc9 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -849,6 +849,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. It will be ignored when crashkernel=X,high is not used or memory reserved is below 4G. + critical_mounts_timeout_ms=T[KNL] timeout in ms + Format: + Use this to override the kernel's default timeout for + waiting for critical system mount points to become + available. + cryptomgr.notests [KNL] Disable crypto self-tests diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 12b4f5551501..21576c0a4898 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -25,6 +25,54 @@ config UEVENT_HELPER_PATH via /proc/sys/kernel/hotplug or via /sys/kernel/uevent_helper later at runtime. +config CRITICAL_MOUNTS_WAIT + bool "Enable waiting for critical-filesystems-ready notification" + default n + help + Kernel subsystems and device drivers often need to read files + from the filesystem, however in doing this races are possible at + bootup -- the subsystem requesting the file might look for it in / + early in boot, but if we haven't yet mounted the real root + filesystem we'll just tell the subsystem the file is not present and + it will fail. Furthermore what path to the filesystem is used varies + depending on the subsystem. To help the kernel we provide the option + to let the kernel wait for all critical filesystems to mounted and + ready before letting the kernel start trying to read files from the + systems' filesystem. Since pivot_root() can be used and therefore a + system might be configured to change its / filesystem at bootup as + many times as it wishes, only userspace can realy know exactly when + all critical filesystems are