[ ... ]
> > What's the advantage ? You don't save anything. Typing 30 characters,
> > maybe. For what ? Purely academic interest ?
> 
> You save multiple calls to a sub-allocator.  In my case, there would need
> to be a kmem_alloc and a kmem_free.  Yes, as I mentioned above, I could
> cache-allocate the condition variable, but why bother when I can simply
> init and destroy it before and after use on the call stack?
> 
> > Best Regards,
> > FrankH.
> 
> Likewise :-)


If saving calls to any allocator is of paramount importance, then
why not just allocate and initialize (to save cv_init/cv_destroy as well)
a large-enough array of condition variables in the driver's _init() ?
Or, as you've said, create a kmem cache with pre-initialized ones ?
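To make the pre-initialized-pool idea concrete, here's a minimal userland sketch using POSIX pthread_cond_t as a stand-in for kernel condvars. In an actual driver this would more likely be a kmem cache whose constructor does the cv_init(); all names here (cv_pool_init, cv_pool_get, cv_pool_put, NPOOL) are invented for illustration.

```c
/*
 * Userland analogue of a pool of pre-initialized condition variables.
 * Pay the init cost once at load time; the hot path only grabs and
 * returns entries.  NPOOL would be sized for the expected number of
 * concurrently-outstanding requests.
 */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NPOOL 16

static pthread_cond_t  pool[NPOOL];
static int             in_use[NPOOL];
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* _init()-time work: all cv initialization happens up front. */
void
cv_pool_init(void)
{
	for (int i = 0; i < NPOOL; i++)
		pthread_cond_init(&pool[i], NULL);
}

/* Grab a pre-initialized cv; no init cost on the hot path. */
pthread_cond_t *
cv_pool_get(void)
{
	pthread_cond_t *cv = NULL;

	pthread_mutex_lock(&pool_lock);
	for (int i = 0; i < NPOOL; i++) {
		if (!in_use[i]) {
			in_use[i] = 1;
			cv = &pool[i];
			break;
		}
	}
	pthread_mutex_unlock(&pool_lock);
	return (cv);			/* NULL if the pool is exhausted */
}

/* Return it to the pool; no destroy cost either. */
void
cv_pool_put(pthread_cond_t *cv)
{
	pthread_mutex_lock(&pool_lock);
	in_use[cv - pool] = 0;
	pthread_mutex_unlock(&pool_lock);
}
```

The same structure falls out of kmem_cache_create() with a constructor/destructor pair doing cv_init()/cv_destroy(), which is what "a kmem cache with pre-initialized ones" would mean in practice.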

How many concurrently-outstanding requests do you expect ?
What's your average wait time ? Your desired dispatch latency ?

I mean: You obviously want to minimize latency. But I have to wonder:
if this is such an extremely low-latency read() entry point, why not
spinlocks ? Putting the thread to sleep in the first place and resuming
it again takes more time than the cached kmem_alloc/free anyway.
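For what "spin instead of sleep" means in miniature, here is a sketch busy-waiting on a C11 atomic flag (a userland stand-in for a kernel spin construct, not any particular kernel API). It only makes sense when the wait is known to be very short, since the waiter burns a CPU the whole time instead of sleeping:

```c
/*
 * Spin-wait analogue: the consumer polls an atomic flag instead of
 * blocking on a cv.  No sleep/wakeup (context switch) cost at all,
 * at the price of a busy CPU while waiting.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

atomic_bool data_ready = false;

/* Producer / interrupt side: publish the data, release the spinner. */
void
mark_ready(void)
{
	atomic_store_explicit(&data_ready, true, memory_order_release);
}

/* Consumer side: no cv, no sleep; just spin until the flag flips. */
void
spin_until_ready(void)
{
	while (!atomic_load_explicit(&data_ready, memory_order_acquire))
		;			/* burns CPU; keep waits short */
}
```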

In other words: Yes, you could use a "permanent frame" on the stack
as a source for such a cv. I'm still not sure why doing that would
be an advantage over cached pre-creation. It definitely has the
problem that the scope/usability of such a cv is severely limited
to the call stack that contains it. I.e. the creator may never
leave the kernel while this cv exists (which makes e.g. async reads
in the driver impossible). If it does, and some consumer (i.e. the
"notifier") still holds a reference, extremely hard-to-debug stack
corruption problems will result. A request-notify-based mechanism
(where you alloc-submit-wait-free, one item at a time) is far more
flexible - and far less error-prone should anybody ever decide to
"unrestrict" the codeflow ...

After all, there must be more "state" to such a request than just
the cv-to-wait-on. Unless you can really make sure that all this
state comes from that initial frame (and isn't e.g. retrieved
from userspace, requiring a possibly-blocking copyin/uiomove),
I don't see where the gains from using locals instead of
pre-initialized globals would come from.

Did you try to measure latency with the different solutions ?

DTrace's timestamping mechanism makes it quite easy to measure the
times between:

        - the app doing the read syscall and entering mydrv:read()
        - entering mydrv:read() and issuing the call to cv_wait()
        - returning from cv_wait() and returning from mydrv:read()
        - returning from mydrv:read() and returning from the syscall

i.e. do something like:

#!/usr/sbin/dtrace -s

syscall::read:entry
/* add some condition to check you're reading from your device */
{
        self->syscalltime = timestamp;
        self->traced = 1;
}

fbt:mydrv:read:entry
/self->traced/
{
        self->tracecv = 1;
        self->readtime = timestamp;
}

fbt::cv_wait:entry
/self->tracecv/
{
        self->cvtime = timestamp;
}

fbt::cv_wait:return
/self->tracecv/
{
        self->cvrettime = timestamp;
        self->tracecv = 0;
}

fbt:mydrv:read:return
/self->traced/
{
        self->readrettime = timestamp;
}

syscall::read:return
/self->traced/
{
        @times["systoread"] = quantize(self->readtime - self->syscalltime);
        @times["readtocv"] = quantize(self->cvtime - self->readtime);
        @times["cvwait"] = quantize(self->cvrettime - self->cvtime);
        @times["readret"] = quantize(timestamp - self->readrettime);
        /* clear the thread-locals so they don't accumulate */
        self->syscalltime = 0;
        self->readtime = 0;
        self->cvtime = 0;
        self->cvrettime = 0;
        self->readrettime = 0;
        self->traced = 0;
}


This should make it possible to measure and determine what's quickest,
and whether/where optimizations make sense.

Best regards,
FrankH.

_______________________________________________
opensolaris-code mailing list
[email protected]
https://opensolaris.org:444/mailman/listinfo/opensolaris-code
