I've committed the set_info fix for this. I'm not crazy about it, but
it should work for now. In the long term, we should probably move
away from method-specific hacks like this; that is, it should be up to
the API consumer (our server) to adjust timeouts or call
BMI_testunexpected in a separate thread.
Nawab, in the zoidfs init code, after initializing BMI, you need to call:

    int check = 0;
    BMI_set_info(0, BMI_TCP_CHECK_UNEXPECTED, &check);

-sam
On Dec 23, 2008, at 2:01 PM, Phil Carns wrote:
Sam Lang wrote:
Hi All,
I think Nawab has found a bug (or untested code path) in the BMI
tcp method. He's running a daemon that both receives unexpected
requests (as a server), and receives expected responses (as a
client).
In the BMI_testcontext call, if there aren't any completed
(expected) operations, and there are completed unexpected receives,
we return immediately, assuming that BMI_testunexpected will be
called in turn. I think the idea here is that we want to keep our
latency down for unexpected messages, instead of doing work on
expected messages while unexpected messages are waiting in the
hopper. But the daemon is single-threaded and makes blocking
PVFS_sys_* calls, so BMI_testunexpected is never called and we
essentially spin forever calling BMI_testcontext over and over.
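
To make the spin concrete, here is a toy model of the early return described above. It is only a sketch: struct toy_bmi and every toy_* function are made up for illustration and are not the real BMI tcp code, and the blocking wait is capped at max_iters so the demonstration terminates.

```c
#include <stddef.h>

/* Hypothetical stand-in for the tcp method's internal queues. */
struct toy_bmi {
    int completed_expected;   /* finished expected operations */
    int pending_unexpected;   /* completed unexpected receives not yet collected */
    int network_polls;        /* how many times the network was actually polled */
    int responses_in_flight;  /* expected responses that a poll would complete */
};

/* Model of testcontext: returns the number of completed expected ops. */
static int toy_testcontext(struct toy_bmi *b)
{
    int n;

    /* Early return: assume the caller will call testunexpected next. */
    if (b->completed_expected == 0 && b->pending_unexpected > 0) {
        return 0;
    }
    /* Otherwise do real work (the stand-in for tcp_do_work). */
    if (b->completed_expected == 0) {
        b->network_polls++;
        if (b->responses_in_flight > 0) {
            b->responses_in_flight--;
            b->completed_expected++;
        }
    }
    n = b->completed_expected;
    b->completed_expected = 0;
    return n;
}

/* Model of testunexpected: drains the unexpected queue. */
static int toy_testunexpected(struct toy_bmi *b)
{
    int n = b->pending_unexpected;
    b->pending_unexpected = 0;
    return n;
}

/* Model of a blocking call that only ever calls testcontext.
 * Returns the iteration on which it completed, or -1 if it never did. */
static int toy_blocking_wait(struct toy_bmi *b, int max_iters)
{
    int i;
    for (i = 1; i <= max_iters; i++) {
        if (toy_testcontext(b) > 0) {
            return i;
        }
    }
    return -1;
}
```

With one unexpected receive pending and one response in flight, toy_blocking_wait never polls the network and never completes; once toy_testunexpected drains the queue, the same wait finishes on its first iteration.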
I'm not sure of the best way to fix this. Easy fixes would be to
remove the check for completed unexpected receives, and/or do
tcp_do_work for a shorter timeout.
It seems like we have a special case for blocking PVFS_sys_*
calls. We want to ignore unexpected receives just in that case,
and actually call tcp_do_work. In other contexts, I think we want
the behavior that we have now, where we assume that a
BMI_testunexpected call will follow a BMI_testcontext call. We
could modify the testcontext call to take a separate parameter, but
that seems messy. We might also be able to handle this with
separate BMI contexts somehow...
I haven't dug into the code yet to see whether there is a more elegant way
to handle it, but I wanted to mention that if you want to add a
special flag to toggle the behavior, it might be better to just set
it globally with the set_info() function rather than modifying the
testcontext() API. That way you don't have to change any of the
other BMI methods. There are already a couple of similar set_info()
calls to toggle BMI behavior for different use cases.
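
As a rough illustration of the global-toggle idea, the sketch below models a set_info-style option that switches the early return on and off. All names here (toy_set_info, TOY_CHECK_UNEXPECTED, toy_should_return_early) are hypothetical and only mimic the shape of such a call; the real option handling in the tcp method may look quite different.

```c
#include <stddef.h>

/* Hypothetical option id, standing in for a method-specific set_info key. */
enum toy_option { TOY_CHECK_UNEXPECTED = 1 };

/* Global toggle; the default of 1 keeps the existing behavior. */
static int toy_check_unexpected = 1;

/* Model of a set_info-style call: param points at an int holding the new value.
 * Returns 0 on success, -1 for an unknown option or a NULL param. */
static int toy_set_info(int option, void *param)
{
    switch (option) {
    case TOY_CHECK_UNEXPECTED:
        if (param == NULL) {
            return -1;
        }
        toy_check_unexpected = *(int *)param;
        return 0;
    default:
        return -1;
    }
}

/* Decision made at the top of the (modeled) testcontext call:
 * returns 1 if it should return early without polling the network. */
static int toy_should_return_early(int completed_expected, int pending_unexpected)
{
    return toy_check_unexpected
        && completed_expected == 0
        && pending_unexpected > 0;
}
```

Because the toggle lives behind the set_info-style call, the modeled testcontext signature stays the same, and a caller that makes blocking calls can clear the flag once at init time.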
-Phil
_______________________________________________
Pvfs2-developers mailing list
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers