Hi Konstantin,

On 01/18/15 11:54, Konstantin Belousov wrote:
On Fri, Jan 16, 2015 at 10:00:11AM +0100, Hans Petter Selasky wrote:
Hi Konstantin,

On 01/16/15 09:03, Konstantin Belousov wrote:
My opinion is that you should have tried to handle the issue at the driver
level, instead of making this devfs issue.  I.e., if you already have
cdev node with the correct name, driver should have replaced the private
data to point to new device.

I don't think this can be implemented in a clean way, because of a
locking order reversal. And if you try to avoid the LOR, you end up with
the race. Checkmate, sort of ;-)  Let me explain:

Thread 1:
usb_sx_lock();
    cdev = Look in freelist for existing device();
else
    cdev = make_dev();
usb_sx_unlock();

Thread 2:
usb_sx_lock();
put cdev on freelist
usb_sx_unlock();

Thread 3:
usb_sx_lock();
cdev = remove first entry in freelist
usb_sx_unlock();

/*
 * XXX because USB needs to call destroy_dev() unlocked we
 * are racing with Thread 1 again
 */
destroy_dev(cdev);
I do not understand the 'freelist' part.

You have some usb (or whatever other kind) device, which is represented by
some data structure.  This data structure references cdev.  In reverse,
cdev points to this data structure by si_drv1 or similar driver-owned
member of struct cdev.

The structures naming usb devices have some mapping to/from usb bus addresses.
When a device is destroyed or created due to an external plug event, there is
some mechanism that ensures mapping consistency.  It could be a lock, it
could be a single-threaded process which discovers the bus, or something
else, I do not know.

When the structure notes that the device at some address goes out and comes
back in, it can look up the corresponding cdev and just change the si_drv1
pointer to point to the new device.

What about the "devd" events that notify user-space applications that a new character device is present in the system or is being destroyed?


I find it very strange that you need a sleepless operation which can
_both_ destroy and create a cdev. At least one of the two operations,
creation or destruction, must sleep, at least in the current devfs
design. It could be made sleepless, by making the VFS-visible part even
less connected to the cdev part, but this is not how it (can) currently work.

I think it would be good practice if all resources needed for creating a character device could be allocated prior to registration. That means we should first allocate all the resources needed, and then register, rather than doing both in a single function.

For example:

Once make_dev() has returned, we can get a "d_open" callback. But "si_drv1/2" is always set by drivers _after_ "make_dev()" has returned! This causes a race we could simply avoid by splitting up make_dev() like I suggest, instead of putting more logic and code inside the drivers to handle the race!



This would also close a window where the /dev node is non-existent or
operates erroneously.

I'm not saying I plan to, but I think "cdevs" at some point need to
understand mutexes and locks. That means, like with other APIs in the
kernel, we can associate a lock with the "cdev", and this lock is then
used to ensure an atomic shutdown of the system in a non-blocking
fashion. In my past experience, multithreaded APIs should be
implemented at a high level like this:

NON-BLOCKING methods:
lock(); **
xxx_start();
xxx_stop();
unlock();

BLOCKING methods:
setup(); // init
unsetup(); // drain

** Any callbacks should always be called locked.

In devfs there was no non-blocking stop before I added the delist_dev()
function.


You do not understand my point.

I object against imposing one additional global reference on all cdevs
just to cope with the delist hack.  See the patch at the end of the message.

It's fine by me.

WRT destroy_dev_sched_cb() calling delist_dev(): even after calling
delist_dev(), the node still exists in /dev. It is only removed
after the populate loop is run sometime later. The dev_sched() KPI is
inherently racy; drivers must handle the races for other reasons anyway.

The populate loop is all running under the dev_lock() from what I can
see and make_dev() is also keeping the same lock when inserting and
removing new cdevs. The populate loop should always give a consistent
No, populate loop is not run under dev_mtx.

Are you sure:

static int
devfs_populate_loop(struct devfs_mount *dm, int cleanup)
{
        struct cdev_priv *cdp;
        struct devfs_dirent *de;
        struct devfs_dirent *dd, *dt;
        struct cdev *pdev;
        int de_flags, depth, j;
        char *q, *s;

        sx_assert(&dm->dm_lock, SX_XLOCKED);
        dev_lock();
          ^^^^ what is this ?
        TAILQ_FOREACH(cdp, &cdevp_list, cdp_list) {

It would not be able to
allocate memory, even in M_NOWAIT fashion.  The dev_mtx is after the
vm and vnode locks.  Only the iteration over the cdevp_list is protected
by dev_mtx, which is dropped right after something which requires an
action is found on the list. Populate() needs some way to avoid
reading inconsistent data from the cdev layer, but there is no
guarantee that we are always up to date.

view of the character devices available, and I don't see how "cdev"
structures without the CDP_ACTIVE flag can appear with recently created
ones, even if the name is the same.
It simply cannot, because it is not synchronized with device
creation/destruction.



Lookup (or any other VFS-level code) is not protected by dev_mtx; it
is only the dm lock which ensures that the populate loop is not run in
parallel, modifying direntries.

See comment above.



That system functions can still call into the dangling read/write/ioctl
functions is another story, and that is why I tell, that in order to
simplify this teardown, we possibly should associate a client selectable
lock with each CDEV, for teardown purposes. Like done for several years
in the callout and USB APIs and possibly many other places.
There is the d_purge method, which provides the tear-down semantic for
drivers which need it.  Only tty uses the method AFAIK.

Devfs ensures that no client code is active when cdev is removed for real.

--HPS
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
