Re: udev breakages - was: Re: Need of an .async_probe() type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers
On Wed, Oct 3, 2012 at 12:12 AM, Greg KH gre...@linuxfoundation.org wrote:

 Mauro, what version of udev are you using that is still showing this
 issue?

 Kay, didn't you resolve this already?  If not, what was the reason why?

It's the same in the current release, we still haven't wrapped our
head around how to fix it/work around it.

Unlike what the heated and pretty uncivilized and rude emails here
claim, udev does not dead-lock or break things, it's just slow.
The modprobe event handling runs into a ~30 second event timeout.
Everything is still fully functional though, there's only this delay.

Udev ensures full dependency resolution between parent and child
events. Parent events have to finish the event handling and have to
return, before child event handlers are started. We need to ensure
such things so that (among other things) disk events have finished
their operations before the partition events are started, so they can
rely on and access their fully set-up parent devices.

What happens here is that the module_init() call blocks in a userspace
transaction, creating a child event that is not started until the
parent event has finished. The event handler for modprobe times out,
and only then does the child event load the firmware.
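
The ordering rule described above can be sketched as a toy model. This is
not udev's actual code: the Event class, the can_start() helper and the
device paths are made up for illustration, and the rule is reduced to "an
event waits for earlier, unfinished events of its ancestor devices".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Event:
    """Hypothetical stand-in for a queued uevent (not a udev structure)."""
    seqnum: int
    devpath: str
    finished: bool = False


def is_ancestor(parent: str, child: str) -> bool:
    """True if `parent` is a strict ancestor of `child` in the device tree."""
    return child.startswith(parent.rstrip("/") + "/")


def can_start(event: Event, queue: Sequence[Event]) -> bool:
    """An event may start only when no earlier, unfinished event
    belongs to one of its ancestor devices."""
    return not any(
        other.seqnum < event.seqnum
        and not other.finished
        and is_ancestor(other.devpath, event.devpath)
        for other in queue
    )


# Made-up paths: the parent event runs modprobe, the child is the
# firmware request created while module_init() is still blocked.
parent = Event(1, "/devices/pci0000:00/usb1/1-1")
child = Event(2, "/devices/pci0000:00/usb1/1-1/firmware/1-1")
sibling = Event(3, "/devices/pci0000:00/usb1/1-2")
queue = [parent, child, sibling]
```

In this model the child event stays blocked for as long as the parent's
handler has not returned, while the unrelated sibling can run in parallel.
That is the loop described here: the parent waits for the firmware, and the
firmware event waits for the parent.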

Having a kernel module rely on a running and fully functional
userspace to return from module_init() is surely a broken driver
model, at least it's not how things should work. If userspace does not
respond to firmware requests, module_init() locks up until the
firmware timeout happens.

This all is not so much about how probe() should behave, it's about a
fragile dependency on a specific userspace transaction to link a
loadable module into the kernel. Drivers should avoid such loops for
many reasons. Also, it's unclear in many cases how such a model should
work at all if the module is compiled in and initialized when no
userspace is running.

If that unfortunate module_init() lockup can't be solved properly in
the kernel, we need to find out if we need to make the modprobe
handling in udev async, or let firmware events bypass dependency
resolving. As mentioned, we haven't decided as of now which road to
take here.

Thanks,
Kay
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: udev breakages - was: Re: Need of an .async_probe() type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers
On Wed, Oct 3, 2012 at 6:57 PM, Greg KH gre...@linuxfoundation.org wrote:

 It's the same in the current release, we still haven't wrapped our
 head around how to fix it/work around it.

 Ick, as this is breaking people's previously-working machines, shouldn't
 this be resolved quickly?

Nothing really breaks. It's slow, and it will surely be fixed once
we know what the right fix is, which we haven't sorted out at this
moment.

 module_init() can do lots of bad things, sleeping, asking for
 firmware, and lots of other things.  To have userspace block because of
 this doesn't seem very wise.

Not saying that it is right or nice, but it's the kernel itself that
blocks. Run init=/bin/sh and do a modprobe of one of these drivers and
it hangs un-interruptible until the kernel's internal firmware loading
request times out, just because userspace is not there.

 But previously this all just worked as we ran 'modprobe' in a new
 thread/process right?

No, we used to un-queue events which had a timeout specified in the
environment. That code caused other issues and was removed.

 it can do without worrying about stopping anything else in the system that
 might want to happen at the same time (like load multiple modules in a row).

It should not be an issue, the serialization is strictly parent -
child, everything else runs in parallel.

 If that unfortunate module_init() lockup can't be solved properly in
 the kernel, we need to find out if we need to make the modprobe
 handling in udev async, or let firmware events bypass dependency
 resolving. As mentioned, we haven't decided as of now which road to
 take here.

 It's not a lockup, there have never been rules about what a driver could
 and could not do in its module_init() function.  Sure, there are some
 not-nice drivers out there, but don't halt the whole system just because
 of them.

It is a kind of lock up, just try modprobe with the init=/bin/sh boot.

 I recommend making module loading async, like it used to be, and then
 all should be fine, right?

That's the current idea, and Tom is looking into what it could look like.

I also have no issues at all if the kernel does load the firmware from
the filesystem on its own; it sounds like the simplest and most robust
solution from a general look at the problem. It would also make the
difference between in-kernel firmware and out-of-kernel firmware less
visible, which sounds good.
Honestly, requiring firmware-loading userspace-transactions to
successfully link a module into the kernel sounds like a pretty bad
idea to start with. Unlike module loading, which needs the depmod alias
database and userspace configuration, firmware loading involves no
policy where userspace would add any value to that step.

Thanks,
Kay


Re: udev breakages - was: Re: Need of an .async_probe() type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers
On Wed, Oct 3, 2012 at 10:39 PM, Linus Torvalds
torva...@linux-foundation.org wrote:
 On Wed, Oct 3, 2012 at 12:50 PM, Greg KH gre...@linuxfoundation.org wrote:

 Ok, like this?

 This looks good to me.  Having udev do firmware loading and tying it to
 the driver model may have not been such a good idea so many years ago.
 Doing it this way makes more sense.

 Ok, I wish this had been getting more testing in Linux-next or
 something, but I suspect that what I'll do is to commit this patch
 asap, and then commit another patch that turns off udev firmware
 loading entirely for the synchronous firmware loading case.

 Why? Just to get more testing, and seeing if there are reports of
 breakage. Maybe some udev out there has a different search path (or
 because udev runs in a different filesystem namespace or whatever), in
 which case running udev as a fallback would otherwise hide the fact
 that the direct kernel firmware loading isn't working.

 Ok? Comments?

The current udev directory search order is:
  /lib/firmware/updates/$(uname -r)/
  /lib/firmware/updates/
  /lib/firmware/$(uname -r)/
  /lib/firmware/

There is no commonly known /firmware directory.

http://cgit.freedesktop.org/systemd/systemd/tree/src/udev/udev-builtin-firmware.c#n100
http://cgit.freedesktop.org/systemd/systemd/tree/configure.ac#n548
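
As a rough illustration of that search order, here is a sketch of a lookup
that walks the four directories and returns the first match. It is neither
udev's code nor the kernel patch discussed in this thread. The function
names and the root parameter are assumptions of the sketch, and
platform.release() stands in for $(uname -r).

```python
import os
import platform
from typing import List, Optional


def firmware_search_dirs(release: str, root: str = "/lib/firmware") -> List[str]:
    """Return the four directories in the order listed above."""
    return [
        os.path.join(root, "updates", release),
        os.path.join(root, "updates"),
        os.path.join(root, release),
        root,
    ]


def find_firmware(name: str, release: Optional[str] = None,
                  root: str = "/lib/firmware") -> Optional[str]:
    """Return the path of the first existing file called `name`, or None."""
    release = release or platform.release()  # same string as `uname -r` on Linux
    for directory in firmware_search_dirs(release, root):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

The root parameter exists only so the lookup can be tried against a scratch
directory. A file under updates/ shadows a file of the same name directly
under the root, which is the point of the ordering.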

Kay


Re: udev breakages - was: Re: Need of an .async_probe() type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers
On Wed, Oct 3, 2012 at 11:05 PM, Greg KH gre...@linuxfoundation.org wrote:

 As for the firmware path, maybe we should
 change that to be modified by userspace (much like /sbin/hotplug was) in
 a proc file so that distros can override the location if they need to.

If that's needed, a CONFIG_FIRMWARE_PATH= with the array of locations
would probably be sufficient.

Like udev's defaults here:
  http://cgit.freedesktop.org/systemd/systemd/tree/configure.ac#n550

Kay


Re: udev breakages - was: Re: Need of an .async_probe() type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers
On Thu, Oct 4, 2012 at 12:58 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
 That said, there's clearly enough variation here that I think that for
 now I won't take the step to disable the udev part. I'll do the patch
 to support direct filesystem firmware loading using the udev default
 paths, and that hopefully fixes the particular case people see with
 media modules.

If that approach looks like it works out, please aim for full
in-kernel-*only* support. I would absolutely like to get udev entirely
out of the sick game of firmware loading here. I would welcome it if we
do not fall back to the blocking, timeout-based behaviour again.

The whole story would be contained entirely in the kernel, and we get
rid of the rather fragile userspace transaction to execute
module_init(), where the kernel has no idea if userspace is even up to
ever responding to its requests.

There would be no coordination with userspace tools needed, which
sounds like a better fit in the way we develop things with the loosely
coupled kernel - udev requirements.

If that works out, it would be a bit like devtmpfs, which turned out to be
very simple, reliable and absolutely the right thing we could do to
primarily manage /dev content.

The whole dance with the fake firmware struct device, which has a 60
second timeout to wait for userspace, is a long story of weird
failures in various respects.

It would not only solve the unfortunate modprobe lockup with
init=/bin/sh we see here. Big servers with an insane number of
devices also happen to run into the 60 second timeout, because udev, which
runs with 4000-8000 threads in parallel handling things like 30,000
disks, is not scheduled in time to fulfill network card firmware
requests. It would be nice if we didn't have that arbitrary timeout at
all.

Having any timeout at all to answer the simple question of whether a file
stored in the rootfs exists should be a hint that there is something
really wrong with the model in which stuff is done.

Thanks,
Kay


Re: [PATCH RFC 3/4] em28xx: Workaround for new udev versions

2012-06-26 Thread Kay Sievers
On Tue, 2012-06-26 at 18:07 -0300, Mauro Carvalho Chehab wrote:
 Em 26-06-2012 17:40, Greg KH escreveu:
  On Tue, Jun 26, 2012 at 04:34:21PM -0300, Mauro Carvalho Chehab wrote:
  New udev-182 seems to be buggy: even when usermode is enabled, it
  insists that probe must defer any firmware requests.
  So, drivers with firmware need to defer probe for the first
  driver's core request, otherwise a useless penalty of 30 seconds
  happens, as udev will refuse to load any firmware.
  
  Shouldn't you fix udev, if it really is a problem here?  Papering over
  userspace bugs in the kernel isn't usually a good thing to do, as odds
  are, it will hit some other driver sometime, right?
 
 That's my opinion too, but Kay seems to think otherwise. In his opinion,
 waiting for firmware during module_init() is something that was never
 allowed in the device model.

No, that's not at all a udev *bug*, the changelog in this patch is just
plain wrong. It's just udev making noise about a broken driver behavior.
And it's the messenger, not the problem.

Kernel modules must not block in module_init() in a userspace
transaction (fw load) that can take an unpredictable amount of time. It
results in broken suspend/resume paths, or broken compiled-in module
behaviour, and a modprobe process which hangs uninterruptibly until
the firmware timeout happens.

Uevents have dependencies. If a parent device event calls modprobe, the
child device event it creates will wait for the parent event to finish, but
if the parent blocks in modprobe, it will not finish and we run into the
deadlock udev complains about.

Udev used to work around that; we turned that workaround into the logged
error we see now. Again, uninterruptible blocking of module_init() in an
in-kernel callout-to-userspace is not proper driver behavior, and needs
to be changed.

Thanks,
Kay



Re: [Q] udev and soc-camera

2010-01-28 Thread Kay Sievers
On Thu, Jan 28, 2010 at 00:25, Valentin Longchamp
valentin.longch...@epfl.ch wrote:
 I have a system that is built with OpenEmbedded where I use a mt9t031 camera
 with the soc-camera framework. The mt9t031 works ok with the current kernel
 and system.

 However, udev does not create the /dev/video0 device node. I have to create
 it manually with mknod and then it works well. If I unbind the device on the
 soc-camera bus (and then eventually rebind it), udev then creates the node
 correctly. This looks like a timing issue at coldstart.

 OpenEmbedded currently builds udev 141 and I am using kernel 2.6.33-rc5 (but
 this was already like that with earlier kernels).

 Is this problem something known or has at least someone already experienced
 that problem ?

You need to run udevadm trigger as the bootstrap/coldplug step,
after you have started udev. Devices which are already there at that
time will not get nodes created by udev; it only handles new devices
it sees events for. The trigger will tell the kernel to send all events again.

Or just use the kernel's devtmpfs, and all this should work, even
without udev, if you do not have any other needs than plain device
nodes.

Kay