Re: [Pvfs2-developers] bmi testcontext/testunexpected

Phil Carns Tue, 06 Jan 2009 12:50:40 -0800

Yeah, I don't particularly like adding special cases either.

I feel like making the consumer play with timeouts or use an extra thread would be just as much of a hack/workaround, though. It's just moving the problem elsewhere.

Fundamentally it seems more like a BMI API flaw. It would have made more sense (for example) if unexpected messages were assigned to a specific context and the testunexpected() and testcontext() functions were combined. The consumer could then use a single test call to retrieve both unexpected and normal messages at once if they are in the same context (as in the pvfs2-server use case). Testing on a different context would ignore the presence of unexpected messages (as in the problem-triggering use case here).

There are other ways to deal with it, that's just an example. We just need the API to better express the intention of the caller (preferably in one function) so that BMI doesn't have to optimize by guessing about what else is going on.
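
As a rough illustration of the combined-call idea described above, here is a minimal sketch in C. Every name in it (sketch_completion, sketch_test, sketch_demo) is hypothetical and invented for this example; none of it is the existing BMI API. It assumes each completion, expected or unexpected, is tagged with exactly one context, and that a single test call returns everything pending in the context it is given.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types and names throughout: this is not the BMI API. */
enum sketch_kind { SKETCH_EXPECTED, SKETCH_UNEXPECTED };

struct sketch_completion {
    int context_id;         /* context the completion is assigned to */
    enum sketch_kind kind;  /* expected reply or unexpected request */
    int consumed;           /* nonzero once handed back to a caller */
};

/*
 * Single test call: hand back every pending completion assigned to
 * context_id, whatever its kind. Completions in other contexts are
 * left alone, so they cannot make this call return early.
 * Returns the number of entries written to out.
 */
size_t sketch_test(struct sketch_completion *pending, size_t npending,
                   int context_id,
                   struct sketch_completion **out, size_t max_out)
{
    size_t found = 0;

    assert(pending != NULL && out != NULL);
    for (size_t i = 0; i < npending && found < max_out; i++) {
        if (!pending[i].consumed && pending[i].context_id == context_id) {
            pending[i].consumed = 1;
            out[found++] = &pending[i];
        }
    }
    return found;
}

/*
 * Demo: context 1 plays the server role and owns the unexpected
 * message; context 2 is a client-only context.
 */
void sketch_demo(size_t *server_count, size_t *client_count)
{
    struct sketch_completion pending[] = {
        { 1, SKETCH_UNEXPECTED, 0 },
        { 1, SKETCH_EXPECTED,   0 },
        { 2, SKETCH_EXPECTED,   0 },
    };
    struct sketch_completion *out[4];

    /* The client-side test sees only its own expected completion. */
    *client_count = sketch_test(pending, 3, 2, out, 4);
    /* The server-side test gets both kinds in one call. */
    *server_count = sketch_test(pending, 3, 1, out, 4);
}
```

In this sketch the caller's intent is carried entirely by the context argument, so the library has no need to guess whether a separate call for unexpected messages will follow.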

That is more work than just adding a flag, though :) It probably depends on if we think the use case is going to be around long enough to justify tweaking the API.


-Phil

Sam Lang wrote:

I've committed the set_info fix for this. I'm not crazy about it, but it should work for now. In the long term, we should probably move away from method-specific hacks like this. I.e. it should be up to the API consumer (our server) to adjust timeouts or call testunexpected in a separate thread.
Nawab, in the zoidfs init code after initializing BMI you need to call:

int check = 0;
BMI_set_info(0, BMI_TCP_CHECK_UNEXPECTED, &check);

-sam

On Dec 23, 2008, at 2:01 PM, Phil Carns wrote:
Sam Lang wrote:
Hi All,
I think Nawab has found a bug (or untested code path) in the BMI tcp method. He's running a daemon that both receives unexpected requests (as a server), and receives expected responses (as a client).

In the BMI_testcontext call, if there aren't any completed (expected) operations, and there are completed unexpected receives, we return immediately, assuming that BMI_testunexpected will be called in turn. I think the idea here is that we want to keep our latency down for unexpected messages, instead of doing work on expected messages while unexpected messages are waiting in the hopper. But the daemon is single-threaded, and making blocking PVFS_sys_* calls, so we essentially spin forever calling BMI_testcontext over and over.

I'm not sure of the best way to fix this. Easy fixes would be to remove the check for completed unexpected receives, and/or do tcp_do_work for a shorter timeout.

It seems like we have a special case for blocking PVFS_sys_* calls. We want to ignore unexpected receives just in that case, and actually call tcp_do_work. In other contexts, I think we want the behavior that we have now, where we assume that a BMI_testunexpected call will follow a BMI_testcontext call. We could modify the testcontext call to take a separate parameter, but that seems messy. We might also be able to handle this with separate BMI contexts somehow...

I haven't dug in the code yet to see if I see any more elegant way to handle it, but I wanted to mention that if you want to add a special flag to toggle the behavior, it might be better to just set it globally with the set_info() function rather than modifying the testcontext() api. That way you don't have to change any of the other BMI methods. There are already a couple of similar set_info() calls to toggle BMI behavior for different use cases.
-Phil


_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
