Bug#317333: The case of udev and the missing /dev/input/mice

2005-09-21 Thread Kay Sievers
On Wed, Sep 21, 2005 at 03:38:06AM +0100, Scott James Remnant wrote:
 Background: in the upcoming Ubuntu 5.10 we've been having some problems
 with /dev/input/mice not being created on startup despite the mousedev
 module being hard-loaded early in the boot sequence.
 (http://bugzilla.ubuntu.com/show_bug.cgi?id=12915 for those interested).
 
 Debian has had similar problems too (http://bugs.debian.org/317333) and
 found that starting udevd earlier manually seemed to fix it.

Yes, that's a good way to fix it.

 After much debugging, I've finally figured out what's going on ... it's
 a bit of a story, but here goes...

Great, we finally have an idea why this happens. Thanks for finding that
out.

 On receiving the netlink event for the printer port, udevd disables
 receipt of any sequence numbered events from udevsend (ie. those that
 will almost certainly be duplicated over the netlink socket).
 Unfortunately this includes all the udevsend events we're about to receive
 from the processes that backed off a second or so while fighting over
 who got to start udevd[1].
 
 These udevsend processes deliver their events to udevd, which cheerfully
 ignores them because it thinks it's going to get another copy over the
 netlink socket any second now.  Unfortunately the netlink event has
 already been and gone, and we just ignored an event we weren't supposed
 to.
 
 
 The two problems as I see them are:
 
 1) The fact that receiving a netlink event disables sequence numbered
    udevsend events, when there's already code to deal with de-duping
    events anyway.  Is there actually any need for this additional check,
    can't we just queue both events and have them ignored by
    msg_queue_insert() ?
 
 2) That this ignoring of events is done at receipt, rather than in queue
    order.  This means that the later parport_pc netlink event is able
    to disable queueing of udevsend events with a lower sequence number.
 
 I can envisage that #1 is necessary in case the time between receiving
 the udevsend and netlink event is so long that we've already processed
 and removed one of the events by the time the second is queued.

Yes, that was the reason for ignoring the incoming messages.

 In which case the problem becomes fixing #2, however unless the kernel
 promises strict ordering of events over the netlink socket (which I
 doubt, otherwise it wouldn't need sequence numbers)

Netlink events are always in the right order. The SEQNUM is only needed
for the forked events.

 we can't assume
 that we've received all of the pre-netlink events we are going to.

Right, as /proc/sys/kernel/hotplug events are forked processes, you will
never know when and in which order they will arrive.

 I suspect the right solution is actually to implement history of what
 events we've already processed, and de-dupe them that way; rather than
 ignoring messages on receipt.

We could just accept all udevsend events with a lower sequence number than
that of the first netlink event; that may fix it.
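
A minimal sketch of that receipt policy, written as a toy model rather than
udevd's actual code. The class and method names are hypothetical, and the
real daemon tracks more state than a single dictionary.

```python
class Receiver:
    """Toy model of the proposed receipt policy; not udevd's real structures."""

    def __init__(self):
        self.first_netlink_seqnum = None  # unknown until netlink delivers something
        self.queued = {}                  # seqnum -> source that delivered it first

    def on_netlink(self, seqnum):
        # Remember the sequence number of the very first netlink event.
        if self.first_netlink_seqnum is None:
            self.first_netlink_seqnum = seqnum
        self.queued.setdefault(seqnum, "netlink")

    def on_udevsend(self, seqnum):
        first = self.first_netlink_seqnum
        # Before any netlink event, or for events older than the first one,
        # the udevsend copy is the only copy we will ever get: accept it.
        if first is None or seqnum < first:
            self.queued.setdefault(seqnum, "udevsend")
            return True
        # From the first netlink seqnum onwards a netlink copy is expected,
        # so the udevsend copy can be ignored.
        return False
```

With the first netlink event at sequence number 12, `on_udevsend(10)` is
accepted and `on_udevsend(13)` is ignored.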

The right solution is to start udevd as one of the first things
after taking over control from the kernel. This way you will only catch
the events for the last non-driver-core subsystem, the input layer.

Once the input layer is fixed, the need for udevsend will go away
completely, and /proc/sys/kernel/hotplug should be disabled when taking
over control from the kernel; it is only needed in the initramfs.
After input is fixed, the whole event reordering and timeout handling
will be removed from udevd and we need to start udevd manually anyway.

Kay





Bug#317333: The case of udev and the missing /dev/input/mice

2005-09-21 Thread Scott James Remnant
On Wed, 2005-09-21 at 11:51 +0200, Kay Sievers wrote:

 On Wed, Sep 21, 2005 at 03:38:06AM +0100, Scott James Remnant wrote:
  Background: in the upcoming Ubuntu 5.10 we've been having some problems
  with /dev/input/mice not being created on startup despite the mousedev
  module being hard-loaded early in the boot sequence.
  (http://bugzilla.ubuntu.com/show_bug.cgi?id=12915 for those interested).
  
  Debian has had similar problems too (http://bugs.debian.org/317333) and
  found that starting udevd earlier manually seemed to fix it.
 
 Yes, that's a good way to fix it.
 
One thing I'd like to see changed in udevd is to move the
init_udevd_socket() and init_uevent_netlink_sock() calls to above the
daemonization; that way when you call udevd --daemon from the init
script, you *know* that the next command may cause a netlink event.

Right now there's an unknown amount of time between calling udevd
--daemon and being able to safely modprobe.

This'd also mean that udevd could exit with an error status if it's
unable to create the necessary sockets; rather than the child exiting
and the status being lost.
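
The ordering proposed above can be sketched roughly as follows. This is an
illustration only: it uses a plain Unix datagram socket at a filesystem path
as a stand-in for udevd's sockets, the function names are hypothetical, and
the detach step is far simpler than a real daemonization.

```python
import os
import socket
import sys


def bind_event_socket(path):
    """Create and bind the local datagram socket; raises OSError on failure."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    return sock


def fork_and_detach():
    """Simplified detach: the parent exits, the child starts a new session."""
    if os.fork() > 0:
        os._exit(0)  # parent returns control to the init script
    os.setsid()


def start(path, daemonize=fork_and_detach):
    """Bind first, detach second.

    Returns the bound socket, or None if binding failed. In the failure case
    nothing has been forked yet, so the caller can still exit non-zero and
    the init script sees the error.
    """
    try:
        sock = bind_event_socket(path)
    except OSError as err:
        print(f"cannot bind {path}: {err}", file=sys.stderr)
        return None
    daemonize()  # by the time the parent exits, the socket already exists
    return sock
```

Because the socket is bound before the parent exits, events sent right after
the command returns have somewhere to queue.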

Scott
-- 
Scott James Remnant


Bug#317333: The case of udev and the missing /dev/input/mice

2005-09-20 Thread Scott James Remnant
Background: in the upcoming Ubuntu 5.10 we've been having some problems
with /dev/input/mice not being created on startup despite the mousedev
module being hard-loaded early in the boot sequence.
(http://bugzilla.ubuntu.com/show_bug.cgi?id=12915 for those interested).

Debian has had similar problems too (http://bugs.debian.org/317333) and
found that starting udevd earlier manually seemed to fix it.


After much debugging, I've finally figured out what's going on ... it's
a bit of a story, but here goes...


Your system boots up and gets to the S:S20modules-init-tools stage;
that's where we read /etc/modules and modprobe the modules in order.
Now modprobe is basically just a kernel request, and these days tends to
return pretty quickly to userspace without blocking for everything to
happen.

Deep Black Magic happens inside the kernel, and once it's done it
generates a series of hotplug events which it passes back to userspace
through two means: by running the program specified
in /proc/sys/kernel/hotplug with an interesting environment, and also
through a netlink socket.

/proc/sys/kernel/hotplug is udevsend, a tool that gathers up this
environment and sends it over a local socket to the udevd process that
marshals all of these events.  If there's no daemon listening it tries
to start one up, and will retry sending the event for a while until it
gets to the other end.
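
The send-and-retry behaviour described above might look roughly like this.
It is a sketch under assumptions: the socket type, the retry count, the delay
and the `start_daemon` callback are all hypothetical and are not taken from
udevsend itself.

```python
import socket
import time


def send_with_retry(path, payload, start_daemon, attempts=5, delay=0.1):
    """Send one event datagram, starting the daemon if nobody is listening.

    Returns True once the datagram has been handed to the listener's socket,
    or False if every attempt failed.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        for attempt in range(attempts):
            try:
                sock.sendto(payload, path)
                return True
            except OSError:
                # No listener yet: try to start one on the first failure,
                # then back off and retry. Several senders may race here.
                if attempt == 0:
                    start_daemon()
                time.sleep(delay)
        return False
    finally:
        sock.close()
```

The back-off in the loop is what lets an event arrive noticeably later than
the kernel generated it.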

Now we have a whole bunch of udevsend processes all run at pretty much
the same time; all of these try to start up udevd, and all of the udevd
processes try to bind to the local socket to receive events on.  One of
them wins, the rest die and go away.  A little time passes, by which time
all of the running udevsend processes will have dispatched their events
to this udevd, which will marshal them.

This udevd _also_ begins listening on the netlink socket, as it's a
better way to get events from the kernel than having it execute
something which mucks around with IPC to get it to us.

Meanwhile the kernel is happily generating both /proc/sys/kernel/hotplug
and netlink events for what's happening on the box, in fact it's been
doing this all the time udevd has been getting its clothes on.

If the module sequence loaded is something like psmouse, mousedev, ...,
lp (exactly as it is in breezy machines that have been upgraded from
warty/hoary[0]) you may find that the first netlink event you receive is
actually for the printer port.

But that's ok, we had udevsend events for the rest...

Well, that's the theory; sadly here's the practice.

On receiving the netlink event for the printer port, udevd disables
receipt of any sequence numbered events from udevsend (ie. those that
will almost certainly be duplicated over the netlink socket).
Unfortunately this includes all the udevsend events we're about to receive
from the processes that backed off a second or so while fighting over
who got to start udevd[1].

These udevsend processes deliver their events to udevd, which cheerfully
ignores them because it thinks it's going to get another copy over the
netlink socket any second now.  Unfortunately the netlink event has
already been and gone, and we just ignored an event we weren't supposed
to.
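
The failure can be reproduced with a small replay model. This is only a
sketch of the behaviour described above: the sequence numbers and arrival
order are made up for illustration, and the filter is a simplification of
whatever udevd really does on receipt.

```python
def receive(arrivals):
    """Replay (source, seqnum) arrivals through the receipt-time filter."""
    netlink_seen = False
    queued = []
    dropped = []
    for source, seqnum in arrivals:
        if source == "netlink":
            # The first netlink event switches off sequence-numbered
            # udevsend events from here on.
            netlink_seen = True
            queued.append(seqnum)
        elif netlink_seen:
            # Ignored at receipt, whatever its sequence number.
            dropped.append(seqnum)
        else:
            queued.append(seqnum)
    return sorted(queued), sorted(dropped)


arrivals = [
    ("udevsend", 1),  # hypothetical: psmouse, delivered promptly
    ("netlink", 3),   # hypothetical: printer port, first event seen on netlink
    ("udevsend", 2),  # hypothetical: mousedev, from a udevsend that backed off
    ("udevsend", 3),  # duplicate of the netlink event, safe to drop
]
queued, dropped = receive(arrivals)
```

Here `queued` ends up as `[1, 3]` and `dropped` as `[2, 3]`. Event 2 was sent
on netlink before udevd was listening, so its udevsend copy was the only one,
and it is lost.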


The two problems as I see them are:

1) The fact that receiving a netlink event disables sequence numbered
   udevsend events, when there's already code to deal with de-duping
   events anyway.  Is there actually any need for this additional check,
   can't we just queue both events and have them ignored by
   msg_queue_insert() ?

2) That this ignoring of events is done at receipt, rather than in queue
   order.  This means that the later parport_pc netlink event is able
   to disable queueing of udevsend events with a lower sequence number.

I can envisage that #1 is necessary in case the time between receiving
the udevsend and netlink event is so long that we've already processed
and removed one of the events by the time the second is queued.  In
which case the problem becomes fixing #2, however unless the kernel
promises strict ordering of events over the netlink socket (which I
doubt, otherwise it wouldn't need sequence numbers), we can't assume
that we've received all of the pre-netlink events we are going to.

I suspect the right solution is actually to implement history of what
events we've already processed, and de-dupe them that way; rather than
ignoring messages on receipt.
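
One way to picture the history idea is a set of sequence numbers that have
already been accepted. This is a sketch with hypothetical names, not a patch
against udevd; a real implementation would also need to bound the history,
which this one does not.

```python
class EventHistory:
    """Remember accepted sequence numbers and reject repeats from any source."""

    def __init__(self):
        self.seen = set()  # grows without bound in this sketch

    def accept(self, seqnum):
        """Return True the first time a seqnum is offered, False afterwards."""
        if seqnum in self.seen:
            return False
        self.seen.add(seqnum)
        return True


def replay(arrivals):
    """Feed (source, seqnum) arrivals through the history; return accepted ones."""
    history = EventHistory()
    return [(source, seqnum) for source, seqnum in arrivals
            if history.accept(seqnum)]
```

Replaying a late udevsend event with a lower sequence number than the first
netlink event keeps it, while the second copy of any event is dropped
regardless of which channel delivered it first.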

Scott

[0] A common fix has been to simply install breezy fresh; this happens
to change the /etc/modules order slightly and thus hide the bug.
[1] And if we deliberately start udevd before we begin any of this
module loading, it sees the netlink event, and thus again hides the 
bug.
-- 
Scott James Remnant