[email protected] wrote on Thu, 08 Jan 2009 18:18 -0600:
> Good to hear from you.  I think I understand what you're describing, but 
> I want to make sure.  It probably seems like I'm parroting what you just 
> told me back to you, sorry about that.
>
> Each method that doesn't already use file descriptors (tcp) creates a  
> pipe, and hands back one end of the pipe to the BMI generic code.  The  
> method then registers a callback to the underlying networking api, which 
> writes to its end of the pipe (an operation id or something).  The BMI 
> generic code maps the fds that changed, and for each in turn calls their 
> completion calls.  Is that the idea?
>
> For methods like GM that can't asynchronously notify via a callback, a  
> separate thread would have to poll and write to its pipe on changes.

Yes.  In some of these methods like IB, you get a native fd from the
OS, just like TCP does.  They don't need to construct this extra
pipe.  They just hand back the fd that should be checked for
readability.  Your pipe mechanism can be used with anything that
needs a non-fd-based notification mechanism, and is a good general
fallback.
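
A minimal sketch of that arrangement, in Python for brevity rather
than the C that BMI is written in.  The names PipeNotifier and
ready_methods are hypothetical and not part of BMI; the "tcp" and
"gm" labels are just stand-ins.  A method with a native fd hands it
back directly, a method without one hands back the read end of a
pipe, and the generic code waits on all of them at once:

```python
import os
import select


class PipeNotifier:
    """Hypothetical fallback for a method that has no native fd."""

    def __init__(self):
        # The read end goes to the generic code; the method keeps the write end.
        self.read_fd, self.write_fd = os.pipe()

    def notify(self):
        # Called from the method's callback, or from its polling thread.
        os.write(self.write_fd, b"\x01")

    def drain(self):
        # Call only after the read end was reported readable.
        os.read(self.read_fd, 4096)

    def close(self):
        os.close(self.read_fd)
        os.close(self.write_fd)


def ready_methods(fd_to_method, timeout_s):
    """Return the names of methods whose fd is readable within timeout_s."""
    readable, _, _ = select.select(list(fd_to_method), [], [], timeout_s)
    return [fd_to_method[fd] for fd in readable]
```

The generic code would then call the completion routine only for the
methods that ready_methods returns, and skip the rest.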

> This does solve the problem that I don't have to test a method if  
> nothing is ready, so I skip needlessly waiting up to the timeout for  
> that method.  But what if two methods both have work to be done?  So  
> let's say I poll in the BMI generic code, and discover that work for both 
> tcp and ib can be done, so I first call the completion call for tcp, and 
> then the completion call for ib.  The completed ib operations are still 
> held in the completion list while the tcp method is doing its work, and 
> don't get returned to the job layer (or flow) until the tcp completion 
> call returns.  The callback idea attempts to address this, as the 
> completed operations get notified via callback pretty much right away.

I see what you mean.  First, I'm not convinced that you'll get a lot
of cases where both methods will have something to do when you go
look.  The server code ends up back in BMI_test* often, so hopefully
will see no more than one method that wants attention.  Second, if
you see two methods, and call the TCP testcontext first, it doesn't
take too long, so the maximum time that IB will have to wait is
pretty small.  TCP isn't doing reads and writes in testcontext, just
sticking entries in the list.

If this aspect really bothers you, perhaps you could sort the
methods by their "fast"-ness, and service the active fds
preferentially from those first, returning immediately without
calling into the slow method(s).  The caller will end up back in
BMI_test* again soon, then you can call into the slow method if
there's nothing better to do.  Always have to worry about starvation
in such cases, though.
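
One way the selection could look, again as a rough Python sketch and
not anything that exists in BMI.  The function pick_methods, the
skipped counter, and the max_skips threshold are all assumptions made
up for illustration, as is the ordering that puts "ib" ahead of
"tcp".  The fastest ready method is serviced on every pass, and a
slower ready method is let in once it has been passed over max_skips
times in a row, which bounds the starvation:

```python
def pick_methods(ready, priority, skipped, max_skips=8):
    """Choose which ready methods to service on this pass.

    ready:     set of method names whose fds are active
    priority:  list of all method names, fastest first (assumed ordering)
    skipped:   dict of name -> consecutive passes skipped while ready;
               updated in place
    max_skips: a ready method passed over this many times gets serviced
    """
    ordered = [m for m in priority if m in ready]
    if not ordered:
        return []

    # Always service the fastest method that has work.
    chosen = [ordered[0]]

    # Slower methods wait, unless they have waited too long already.
    for m in ordered[1:]:
        skipped[m] = skipped.get(m, 0) + 1
        if skipped[m] >= max_skips:
            chosen.append(m)

    # Anything serviced now starts counting from zero again.
    for m in chosen:
        skipped[m] = 0
    return chosen
```

With max_skips=2 and both methods ready on every pass, the slow
method would be serviced on every second pass.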

I think it's cleaner from an API view not to require BMI consumers
to register a callback, particularly a synchronous callback that is
only used to return values from a running BMI_test*.  That holds
especially if the performance optimization here is rare and small.
But you've certainly been in this code more recently than I.

                -- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
