On Jan 8, 2009, at 4:19 PM, Pete Wyckoff wrote:
[email protected] wrote on Wed, 07 Jan 2009 16:06 -0600:
Right now if multiple methods are enabled in BMI, we tend to get poor
performance from the "fast" network, because BMI_testcontext iterates
through all the active methods calling testcontext for each one. It
tries to be smart about which methods get scheduled ;-) to prevent
starvation, but it treats all the methods fairly, which tends to make
tcp (the slow one) hog the time spent in testcontext. I have a few
ideas for this, so I'll go ahead and propose them and let you all
shoot them down or propose others.
I've always been fond of a third Option: CENTRALIZED_POLLING. All
BMI methods are changed to hand back an fd to some core BMI routine.
Individual BMI methods do not poll their devices. The core BMI
routine sticks all fds in a single select() or epoll(), and when it
gets one that triggers, calls back into the appropriate BMI method
to do its business. No need to balance across all the methods.
This can work today with TCP obviously, and IB with some minor
manipulation. GM cannot fit in such a framework, I believe, being
completely poll-driven. MX should work however, I think. If a
method wants to poll for a bit after getting an fd trigger, it can
get away with that.
This is how pretty much all externally driven server applications
work today. Lots of threads are still not as good a way to manage
concurrency.
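
A minimal sketch of the centralized polling idea described above, written in Python rather than BMI's C, and assuming every method can expose a readable fd. CentralPoller, register_method, test_once and make_method are hypothetical names for illustration, not BMI functions, and pipes stand in for the real tcp and ib devices.

```python
import os
import selectors


class CentralPoller:
    """Hypothetical stand-in for the core BMI routine: one poll over all fds."""

    def __init__(self):
        self._sel = selectors.DefaultSelector()

    def register_method(self, fd, on_ready):
        # on_ready is the callback into the method that owns this fd.
        self._sel.register(fd, selectors.EVENT_READ, on_ready)

    def test_once(self, timeout):
        """Wait up to `timeout` seconds, then call back into each method
        whose fd triggered. Methods with nothing ready are never called."""
        fired = []
        for key, _events in self._sel.select(timeout):
            fired.append(key.data(key.fd))
        return fired

    def close(self):
        self._sel.close()


def make_method(name):
    """Build a toy method backed by a pipe: (read_fd, write_fd, handler)."""
    read_fd, write_fd = os.pipe()

    def handler(fd):
        # The method "does its business" here; the toy just drains the fd.
        return (name, os.read(fd, 4096))

    return read_fd, write_fd, handler


def demo():
    poller = CentralPoller()
    tcp_r, tcp_w, tcp_handler = make_method("tcp")
    ib_r, ib_w, ib_handler = make_method("ib")
    poller.register_method(tcp_r, tcp_handler)
    poller.register_method(ib_r, ib_handler)

    os.write(ib_w, b"op-1")                  # only ib has work pending
    result = poller.test_once(timeout=1.0)   # calls ib's handler only
    idle = poller.test_once(timeout=0)       # nothing left to do

    poller.close()
    for fd in (tcp_r, tcp_w, ib_r, ib_w):
        os.close(fd)
    return result, idle


if __name__ == "__main__":
    print(demo())
```

In this sketch the idle tcp method is never entered during the pass, so there is nothing to balance across methods.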
Hi Pete,
Good to hear from you. I think I understand what you're describing,
but I want to make sure. It probably seems like I'm parroting what
you just told me back to you, sorry about that.
Each method that doesn't already use file descriptors (as tcp does)
creates a pipe, and hands back one end of the pipe to the BMI generic
code. The method then registers a callback with the underlying
networking API, which writes to its end of the pipe (an operation id or
something). The BMI generic code maps the fds that changed back to
their methods, and for each in turn calls that method's completion
call. Is that the idea?
For methods like GM that can't asynchronously notify via a callback, a
separate thread would have to poll and write to its pipe on changes.
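
A rough sketch of that poll-thread bridge, again in Python for brevity and under stated assumptions: PollOnlyDevice is a hypothetical stand-in for a GM-like device that can only be polled, start_bridge is an invented helper, and operation ids are assumed to fit in one byte.

```python
import os
import select
import threading
import time


class PollOnlyDevice:
    """Hypothetical device with no fd and no callbacks; it can only be polled."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = []

    def complete(self, op_id):
        # Simulates the hardware finishing an operation.
        with self._lock:
            self._done.append(op_id)

    def poll(self):
        # Returns and clears the operations completed since the last poll.
        with self._lock:
            done, self._done = self._done, []
        return done


def start_bridge(device, stop, interval=0.001):
    """Start a thread that polls `device` and writes one byte per completed
    operation to a pipe. Returns the pipe's read end and the thread."""
    read_fd, write_fd = os.pipe()

    def run():
        while not stop.is_set():
            for op_id in device.poll():
                os.write(write_fd, bytes([op_id]))  # op ids assumed < 256
            time.sleep(interval)
        os.close(write_fd)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return read_fd, thread


def demo():
    device = PollOnlyDevice()
    stop = threading.Event()
    read_fd, thread = start_bridge(device, stop)

    # Nothing has completed yet, so the fd is not readable.
    idle, _, _ = select.select([read_fd], [], [], 0.05)

    device.complete(7)
    ready, _, _ = select.select([read_fd], [], [], 2.0)
    op_ids = list(os.read(read_fd, 16)) if ready else []

    stop.set()
    thread.join()
    os.close(read_fd)
    return idle, ready == [read_fd], op_ids


if __name__ == "__main__":
    print(demo())
```

The generic code only ever sees the read end of the pipe, so a poll-driven method looks the same to it as tcp does, at the cost of one extra thread spinning on the device.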
This does solve one problem: I no longer have to test a method when
nothing is ready, so I skip needlessly waiting up to the timeout for
that method. But what if two methods both have work to be done? So
let's say I poll in the BMI generic code, and discover that work for
both tcp and ib can be done, so I first call the completion call for
tcp, and then the completion call for ib. The completed ib operations
are still held in the completion list while the tcp method is doing
its work, and don't get returned to the job layer (or flow) until the
tcp completion call returns. The callback idea attempts to address
this, as the completed operations get notified via callback pretty
much right away.
-sam
Tweaking construct_poll_plan() is doomed to fail. I've tried.
Maybe I misunderstood your CALLBACK option and you're thinking like
this too.
-- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers