Re: [Pvfs2-developers] fix BMI multiplexing of multiple methods

Sam Lang Thu, 08 Jan 2009 16:19:05 -0800


On Jan 8, 2009, at 4:19 PM, Pete Wyckoff wrote:

[email protected] wrote on Wed, 07 Jan 2009 16:06 -0600:

Right now if multiple methods are enabled in BMI, we tend to get poor
performance from the "fast" network, because BMI_testcontext iterates
through all the active methods calling testcontext for each one.  It
tries to be smart about which methods get scheduled ;-) to prevent
starvation, but it treats all the methods fairly, which tends to make
tcp (the slow one) hog the time spent in testcontext.  I have a few
ideas for this, so I'll go ahead and propose them and let you all shoot
them down or propose others.


I've always been fond of a third Option:  CENTRALIZED_POLLING.  All
BMI methods are changed to hand back an fd to some core BMI routine.
Individual BMI methods do not poll their devices.  The core BMI
routine sticks all fds in a single select() or epoll(), and when it
gets one that triggers, calls back into the appropriate BMI method
to do its business.  No need to balance across all the methods.
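
A minimal sketch of the centralized polling idea described above, using
poll(). This is not BMI code: central_register(), central_poll(),
drain_and_count() and struct method_reg are hypothetical names invented for
illustration, and a real version would need error handling and
deregistration.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_METHODS 8

/* Hypothetical callback type: invoked when a method's fd becomes readable. */
typedef void (*ready_cb)(int fd, void *arg);

/* Hypothetical registration record, one per method. */
struct method_reg {
    int fd;
    ready_cb on_ready;
    void *arg;
};

static struct method_reg methods[MAX_METHODS];
static int n_methods;

/* A method hands its fd and callback to the core routine. */
static int central_register(int fd, ready_cb cb, void *arg)
{
    assert(cb != NULL);
    if (n_methods >= MAX_METHODS)
        return -1;
    methods[n_methods].fd = fd;
    methods[n_methods].on_ready = cb;
    methods[n_methods].arg = arg;
    n_methods++;
    return 0;
}

/* Wait on every registered fd at once and call back only into the methods
 * whose fd triggered. Returns the number of methods called, 0 on timeout,
 * or -1 on error. */
static int central_poll(int timeout_ms)
{
    struct pollfd pfds[MAX_METHODS];
    int i, ready, called = 0;

    for (i = 0; i < n_methods; i++) {
        pfds[i].fd = methods[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    ready = poll(pfds, (nfds_t)n_methods, timeout_ms);
    if (ready <= 0)
        return ready;
    for (i = 0; i < n_methods; i++) {
        if (pfds[i].revents & POLLIN) {
            methods[i].on_ready(pfds[i].fd, methods[i].arg);
            called++;
        }
    }
    return called;
}

/* Example callback: consume one byte and bump a counter passed via arg. */
static void drain_and_count(int fd, void *arg)
{
    char byte;
    if (read(fd, &byte, 1) == 1)
        (*(int *)arg)++;
}
```

With two fds registered and only one readable, central_poll(0) calls back
into that one method and never touches the idle one, which is the point of
the proposal: no per-method timeout is spent on a method with nothing to do.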

This can work today with TCP obviously, and IB with some minor
manipulation.  GM cannot fit in such a framework, I believe, being
completely poll-driven.  MX should work however, I think.  If a
method wants to poll for a bit after getting an fd trigger, it can
get away with that.

This is how pretty much all externally driven server applications
work today.  Lots of threads are still not as good a way to manage
concurrency.


Hi Pete,

Good to hear from you. I think I understand what you're describing,
but I want to make sure. It probably seems like I'm parroting what
you just told me back to you, sorry about that.

Each method that doesn't already use file descriptors (unlike tcp)
creates a pipe, and hands back one end of the pipe to the BMI generic
code. The method then registers a callback to the underlying networking
api, which writes to its end of the pipe (an operation id or something).
The BMI generic code maps the fds that changed, and for each in turn
calls their completion calls. Is that the idea?
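
A rough sketch of the pipe hand-off described above, assuming the operation
id is a 64-bit integer written whole to the pipe. struct notify_pipe,
notify_pipe_init(), notify_complete() and notify_next() are hypothetical
names, not part of BMI or of any network API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical pair of pipe ends: rd goes to the generic code,
 * wr stays with the method. */
struct notify_pipe {
    int rd;
    int wr;
};

/* The method creates the pipe; the read end is what it would hand back. */
static int notify_pipe_init(struct notify_pipe *np)
{
    int fds[2];

    assert(np != NULL);
    if (pipe(fds) < 0)
        return -1;
    np->rd = fds[0];
    np->wr = fds[1];
    return 0;
}

/* Would be called from the network API's completion callback: push the
 * id of the finished operation. An 8-byte write is well under PIPE_BUF,
 * so POSIX guarantees it is not interleaved with other writers. */
static int notify_complete(struct notify_pipe *np, uint64_t op_id)
{
    ssize_t n = write(np->wr, &op_id, sizeof(op_id));
    return n == (ssize_t)sizeof(op_id) ? 0 : -1;
}

/* Would be called by the generic code once select()/poll() reports the
 * read end readable: pull one completed operation id. */
static int notify_next(struct notify_pipe *np, uint64_t *op_id)
{
    ssize_t n = read(np->rd, op_id, sizeof(*op_id));
    return n == (ssize_t)sizeof(*op_id) ? 0 : -1;
}
```

Ids come out of the pipe in the order the callback wrote them, so the
generic code sees completions in completion order for that method.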

For methods like GM that can't asynchronously notify via a callback, a
separate thread would have to poll and write to its pipe on changes.
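
A sketch of such a polling thread, with a stand-in device rather than any
real GM call. struct fake_dev and poller_thread() are hypothetical; the
"device" here is just an atomic counter of completions that the thread
checks and converts into bytes on the pipe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for a poll-only device. */
struct fake_dev {
    atomic_int pending; /* completions not yet reported to the pipe */
    atomic_int stop;    /* set to 1 to ask the poller to exit */
    int wr;             /* write end of the method's pipe */
};

/* Poll the device in a loop and write one byte to the pipe for each
 * completion found, so the generic code can wait on the read end. */
static void *poller_thread(void *arg)
{
    struct fake_dev *dev = arg;
    struct timespec nap = { 0, 1000000 }; /* 1 ms between polls */

    assert(dev != NULL);
    while (!atomic_load(&dev->stop)) {
        int n = atomic_exchange(&dev->pending, 0);
        while (n-- > 0) {
            uint8_t one = 1;
            if (write(dev->wr, &one, 1) != 1)
                return NULL;
        }
        /* A real poller might spin or use the device's own wait call. */
        nanosleep(&nap, NULL);
    }
    return NULL;
}
```

Build with -pthread. The cost is one extra thread per poll-only method and
a sleep interval that trades latency against CPU use.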

This does solve the problem that I don't have to test a method if
nothing is ready, so I skip needlessly waiting up to the timeout for
that method. But what if two methods both have work to be done? So
let's say I poll in the BMI generic code, and discover that work for
both tcp and ib can be done, so I first call the completion call for
tcp, and then the completion call for ib. The completed ib operations
are still held in the completion list while the tcp method is doing
its work, and don't get returned to the job layer (or flow) until the
tcp completion call returns. The callback idea attempts to address
this, as the completed operations get notified via callback pretty
much right away.


-sam



Tweaking construct_poll_plan() is doomed to fail.  I've tried.
Maybe I misunderstood your CALLBACK option and you're thinking like
this too.

                -- Pete


_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
