Hi Konstantin,
On 01/18/15 12:31, Konstantin Belousov wrote:
On Sun, Jan 18, 2015 at 12:08:26PM +0100, Hans Petter Selasky wrote:
Hi Konstantin,
On 01/18/15 11:54, Konstantin Belousov wrote:
On Fri, Jan 16, 2015 at 10:00:11AM +0100, Hans Petter Selasky wrote:
Hi Konstantin,
What about events for "devd" telling user-space applications that a new
character device is present in the system or is being destroyed?
It is up to the driver to decide. If wanted, it could post the pair of
destroy/create itself (e.g., if there is some state which requires
the userspace to reset), or it could decide not to. The latter is
reasonable if the destroy/create is believed to be caused by a glitch
rather than an actual replug.
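For illustration, posting such a pair from the driver could go through
the devctl notification KPI, roughly like this (a sketch; the "mydev0"
node name is made up):

#include <sys/param.h>
#include <sys/bus.h>

static void
mydev_post_replug_events(void)
{
        /*
         * Post a synthetic DESTROY/CREATE pair so userspace sees a
         * replug even though the cdev itself was never recreated.
         */
        devctl_notify("DEVFS", "CDEV", "DESTROY", "cdev=mydev0");
        devctl_notify("DEVFS", "CDEV", "CREATE", "cdev=mydev0");
}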
If that path is chosen, I would really prefer that kern_conf.c expose
an API for this. Consider the following case:
You have a daemon that is started and stopped based upon "devd" DEVFS
create/destroy events. The daemon is started when the create event is
received. Then the kernel side calls destroy_dev(); because the
application still keeps the file handle open, destroy_dev() never
sends the destroy event, and the application then never receives the
destroy event --- another example of a totally invisible deadlock.
I find it very strange that you need a sleepless operation which can
_both_ destroy and create a cdev. At least one operation of creation or
My simple argument against sleepable operations inside the devfs
functions used to register and unregister character devices is that
they make it impossible to perform an atomic shutdown of a kernel
application which involves devfs and other kernel subsystems. Let me
explain.
It is not enough that devfs works alone; it needs to work together
with other kernel APIs as well. Let's make up an example: a kernel
module uses three different APIs, say USB, the callout subsystem and
devfs.
Devfs is calling USB to start and stop USB transfers.
The callout subsystem is calling selwakeup() and waking up devfs.
The USB subsystem is calling selwakeup() and waking up devfs as well.
The USB subsystem is also calling destroy_dev().
This is a very high-level architectural thought and I hope you get it:
How can you tear down a kernel application using three different
subsystems in the kernel _atomically_ without having lingering callbacks?
To make the shutdown sequence atomic, we might want to use a mutex. Now,
if destroy_dev() is sleepable, we need to drop that mutex when
destroying the character device, and then, whoops, the shutdown sequence
is no longer atomic. That's the root of the problem, and please think
beyond devfs alone: the problem happens when devfs is used together with
other APIs and applications in the kernel.
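To make this concrete, here is a sketch of the detach path I have in
mind (all names made up); note where the mutex has to be dropped:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/conf.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/callout.h>
#include <dev/usb/usb.h>
#include <dev/usb/usbdi.h>

struct mydev_softc {                    /* hypothetical per-device state */
        struct mtx      sc_mtx;         /* also the USB xfer mutex */
        struct callout  sc_callout;
        struct usb_xfer *sc_xfer[1];
        struct cdev     *sc_cdev;
};

static void
mydev_detach(struct mydev_softc *sc)
{
        mtx_lock(&sc->sc_mtx);
        callout_stop(&sc->sc_callout);          /* non-blocking stop */
        usbd_transfer_stop(sc->sc_xfer[0]);     /* non-blocking stop */
        mtx_unlock(&sc->sc_mtx);

        /*
         * destroy_dev() may sleep, so it cannot run under sc_mtx.
         * In this window a callout or USB callback can still fire
         * and find the cdev half torn down.
         */
        destroy_dev(sc->sc_cdev);
}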
destruction must sleep, at least in the current devfs design. It could
be made sleepless by making the VFS-visible part even less connected to
the cdev part, but this is not how it currently works (or can work).
I think it would be good practice if all resources needed for creating
a character device could be allocated prior to registration. That means
allocation and registration should not be one single function: first
allocate all the needed resources, then register.
For example:
Once make_dev() has returned, we can get a "d_open" callback. But
"si_drv1/2" is always set by drivers _after_ make_dev() has
returned! This causes a race we could simply avoid by splitting
make_dev() like I suggest, instead of putting more logic and code
inside the drivers to handle the race!
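A sketch of the race and of the split I am suggesting; make_dev_alloc()
and make_dev_register() are made-up names:

/* Today: the cdev is live before the driver can set its softc. */
sc->sc_cdev = make_dev(&mydev_cdevsw, 0, UID_ROOT, GID_WHEEL,
    0600, "mydev0");
sc->sc_cdev->si_drv1 = sc;      /* too late: d_open may already have run */

/* Proposed: allocate first, register last (hypothetical KPI). */
cdev = make_dev_alloc(&mydev_cdevsw, 0, UID_ROOT, GID_WHEEL,
    0600, "mydev0");            /* allocated, but not yet in /dev */
cdev->si_drv1 = sc;             /* safe: no d_open can happen yet */
make_dev_register(cdev);        /* now the node becomes visible */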
I wanted to tunnel the si_drv values through make_dev().
I.e., there could appear yet another variant of make_dev() which takes
the initialization parameters for the si_* driver vars.
I just did not have the time/motivation to do that pass.
I might be interested in giving you a hand there, but not right now.
This would also close the window where the /dev node is non-existent or
operates erroneously.
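One possible shape for such a variant, sketched with an args-struct
style (the names below are illustrative, not an existing KPI at the
time of writing):

struct make_dev_args args;      /* hypothetical argument structure */
int error;

make_dev_args_init(&args);
args.mda_devsw = &mydev_cdevsw;
args.mda_uid = UID_ROOT;
args.mda_gid = GID_WHEEL;
args.mda_mode = 0600;
args.mda_si_drv1 = sc;          /* set before the node is visible */
error = make_dev_s(&args, &sc->sc_cdev, "mydev0");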
I'm not saying I plan to do so, but I think "cdevs" at some point need
to understand mutexes and locks. That means that, like with other APIs
in the kernel, we can associate a lock with the "cdev", and this lock
is then used to ensure an atomic shutdown of the system in a
non-blocking fashion. In my past experience, multithreaded APIs should
at a high level be implemented like this:
NON-BLOCKING methods:

        lock(); **
        xxx_start();
        xxx_stop();
        unlock();

BLOCKING methods:

        setup();        // init
        unsetup();      // drain

** Any callbacks should always be called locked.
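As a concrete reference, the callout(9) KPI already follows this shape:
callout_stop() is the non-blocking stop that is safe under the lock,
while callout_drain() is the blocking unsetup. A minimal sketch:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/callout.h>

struct mydev_softc {            /* hypothetical driver state */
        struct mtx      sc_mtx;
        struct callout  sc_timer;
};

static void
mydev_init(struct mydev_softc *sc)
{
        mtx_init(&sc->sc_mtx, "mydev", NULL, MTX_DEF);
        /* Callbacks run with sc_mtx held, per the ** rule above. */
        callout_init_mtx(&sc->sc_timer, &sc->sc_mtx, 0);
}

static void
mydev_stop(struct mydev_softc *sc)
{
        mtx_lock(&sc->sc_mtx);
        callout_stop(&sc->sc_timer);    /* non-blocking, under the lock */
        mtx_unlock(&sc->sc_mtx);
}

static void
mydev_unsetup(struct mydev_softc *sc)
{
        callout_drain(&sc->sc_timer);   /* blocking drain, lock not held */
        mtx_destroy(&sc->sc_mtx);
}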
In devfs there was no non-blocking stop before I added the delist_dev()
function.
You do not understand my point.
I object to imposing one additional global reference on all cdevs
just to cope with the delist hack. See the patch at the end of the message.
It's fine by me.
WRT destroy_dev_sched_cb() calling delist_dev(): even after calling
delist_dev(), the node still exists in /dev. It is only removed
after the populate loop is run sometime later. The dev_sched() KPI is
inherently racy; drivers must handle the races for other reasons.
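For reference, deferred destruction looks roughly like this in use (a
sketch with made-up driver names); the driver must tolerate d_* calls
until the callback fires:

#include <sys/param.h>
#include <sys/conf.h>
#include <sys/malloc.h>

struct mydev_softc {            /* hypothetical driver state */
        struct cdev *sc_cdev;
};

static void
mydev_cdev_gone(void *arg)
{
        struct mydev_softc *sc = arg;

        /* No more d_* method calls can be in flight at this point. */
        free(sc, M_DEVBUF);
}

static void
mydev_detach(struct mydev_softc *sc)
{
        /*
         * Returns immediately; the /dev node lingers until the
         * populate loop runs, and the callback fires later still.
         */
        destroy_dev_sched_cb(sc->sc_cdev, mydev_cdev_gone, sc);
}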
The populate loop is all running under the dev_lock() from what I can
see, and make_dev() is also holding the same lock when inserting and
removing new cdevs. The populate loop should always give a consistent ...
No, populate loop is not run under dev_mtx.
Are you sure:
Yes, I am sure. I explained it in the next sentences, which you have
broken off from the first one in the citation.
"devfs_populate()" is called exclusivly locked by "dm_lock" and doesn't
return until an atomic TAILQ_FOREACH() of the "devp_list" under
"dev_lock()" is consistent - right? So the VFS node tree will not be
inconsistent under "dm_lock" either - right? Or is there something which
I don't see?
void
devfs_populate(struct devfs_mount *dm)
{
        unsigned gen;

        sx_assert(&dm->dm_lock, SX_XLOCKED);
        gen = devfs_generation;
        if (dm->dm_generation == gen)
                return;
It would not be able to
allocate memory, even in M_NOWAIT fashion. The dev_mtx is after the
vm and vnode locks in the lock order. Only the iteration over the
cdevp_list is protected by dev_mtx, which is dropped right after
something which requires an action is found on the list. Populate()
needs some way to avoid reading inconsistent data from the cdev layer,
but there is no guarantee that we are always up to date.
^^^^^^^^^
Are you taking the sleepable "dm_lock" into account?
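To make sure we are talking about the same shape, here is how I read
the loop (a simplified sketch; needs_action()/do_action() are stand-ins
for the real checks in devfs_populate_loop()):

#include <sys/param.h>
#include <sys/queue.h>
#include <sys/conf.h>
#include <fs/devfs/devfs.h>
#include <fs/devfs/devfs_int.h>

static int needs_action(struct devfs_mount *dm, struct cdev_priv *cdp);
static void do_action(struct devfs_mount *dm, struct cdev_priv *cdp);

static int
populate_loop_sketch(struct devfs_mount *dm)
{
        struct cdev_priv *cdp;

        dev_lock();
        TAILQ_FOREACH(cdp, &cdevp_list, cdp_list) {
                if (!needs_action(dm, cdp))     /* stand-in predicate */
                        continue;
                /*
                 * Drop dev_mtx before doing anything that can
                 * allocate memory or take vnode locks.
                 */
                dev_unlock();
                do_action(dm, cdp);             /* may sleep */
                return (1);     /* caller rescans until generation settles */
        }
        dev_unlock();
        return (0);
}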
--HPS