Yeah, I don't particularly like adding special cases either.
I feel like making the consumer play with timeouts or use an extra
thread would be just as much of a hack/workaround, though. It's just
moving the problem elsewhere.
Fundamentally it seems more like a BMI API flaw. It would have made
more sense (for example) if unexpected messages were assigned to a
specific context and the testunexpected() and testcontext() functions
were combined. The consumer could then use a single test call to
retrieve both unexpected and normal messages at once if they are in the
same context (as in the pvfs2-server use case). Testing on a different
context would ignore the presence of unexpected messages (as in the
problem-triggering use case here).
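To make the combined-call idea concrete, here is a minimal sketch in C. It is a toy model, not BMI code: toy_queue, toy_completion and toy_test are hypothetical names invented for illustration, and the real BMI structures and signatures are different. It only shows the shape of the proposal: every completion, unexpected or not, carries a context, and one test call returns whatever belongs to the context being tested.

```c
#include <stddef.h>

/* Hypothetical types for illustration only; not the real BMI API. */
enum toy_kind { TOY_EXPECTED, TOY_UNEXPECTED };

struct toy_completion {
    enum toy_kind kind; /* expected or unexpected message */
    int context;        /* context the message was assigned to */
    int tag;            /* arbitrary identifier for the example */
};

struct toy_queue {
    struct toy_completion items[16];
    int taken[16];      /* nonzero once an item has been returned */
    size_t count;       /* number of valid entries in items */
};

/*
 * Hypothetical combined test call: return up to max completions that
 * belong to ctx, regardless of kind. Completions assigned to other
 * contexts are left alone, so they never affect this caller.
 */
static size_t toy_test(struct toy_queue *q, int ctx,
                       struct toy_completion *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < q->count && n < max; i++) {
        if (!q->taken[i] && q->items[i].context == ctx) {
            out[n++] = q->items[i];
            q->taken[i] = 1;
        }
    }
    return n;
}
```

In this model a server that puts unexpected and normal messages in the same context gets both from a single toy_test() call, while a caller testing a different context never sees the unexpected ones.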
There are other ways to deal with it; that's just one example. We just
need the API to better express the intention of the caller (preferably
in one function) so that BMI doesn't have to optimize by guessing about
what else is going on.
That is more work than just adding a flag, though :) It probably
depends on whether we think the use case is going to be around long
enough to justify tweaking the API.
-Phil
Sam Lang wrote:
I've committed the set_info fix for this. I'm not crazy about it, but
it should work for now. In the long term, we should probably move away
from method-specific hacks like this. That is, it should be up to the
API consumer (our server) to adjust timeouts or call testunexpected in
a separate thread.
Nawab, in the zoidfs init code, after initializing BMI, you need to call:
    int check = 0;  /* 0 disables the unexpected-message check */
    BMI_set_info(0, BMI_TCP_CHECK_UNEXPECTED, &check);
-sam
On Dec 23, 2008, at 2:01 PM, Phil Carns wrote:
Sam Lang wrote:
Hi All,
I think Nawab has found a bug (or untested code path) in the BMI tcp
method. He's running a daemon that both receives unexpected requests
(as a server), and receives expected responses (as a client).
In the BMI_testcontext call, if there aren't any completed (expected)
operations, and there are completed unexpected receives, we return
immediately, assuming that BMI_testunexpected will be called in
turn. I think the idea here is that we want to keep our latency down
for unexpected messages, instead of doing work on expected messages
while unexpected messages are waiting in the hopper. But the daemon
is single-threaded and makes blocking PVFS_sys_* calls, so
BMI_testunexpected is never called in between and we essentially spin
forever calling BMI_testcontext over and over.
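The spin can be reproduced with a small toy model in C. This is a sketch under assumptions, not the actual TCP method code: toy_method, toy_do_work, toy_testcontext and toy_blocking_wait are hypothetical stand-ins, and the check_unexpected field only mimics the early return described above.

```c
#include <stdbool.h>

/* Toy stand-in for the method's state; not the real BMI structures. */
struct toy_method {
    int completed_expected;   /* finished expected operations */
    int completed_unexpected; /* unexpected receives nobody has claimed */
    int pending_on_wire;      /* expected responses needing network progress */
    bool check_unexpected;    /* models the early-return check */
};

/* Stand-in for tcp_do_work: complete one pending expected operation. */
static void toy_do_work(struct toy_method *m)
{
    if (m->pending_on_wire > 0) {
        m->pending_on_wire--;
        m->completed_expected++;
    }
}

/* Stand-in for BMI_testcontext: returns the number of expected ops reaped. */
static int toy_testcontext(struct toy_method *m)
{
    if (m->completed_expected == 0) {
        /* Early return: assume the caller will call testunexpected next. */
        if (m->check_unexpected && m->completed_unexpected > 0)
            return 0;
        toy_do_work(m);
    }
    int n = m->completed_expected;
    m->completed_expected = 0;
    return n;
}

/*
 * Stand-in for a blocking call that only polls testcontext.
 * Returns the number of polls needed, or -1 if max_polls is exhausted.
 */
static int toy_blocking_wait(struct toy_method *m, int max_polls)
{
    for (int i = 1; i <= max_polls; i++) {
        if (toy_testcontext(m) > 0)
            return i;
    }
    return -1;
}
```

With one unclaimed unexpected receive and one response still on the wire, toy_blocking_wait() never completes while check_unexpected is true, because toy_do_work() is never reached. With the check off, the first poll does the work and returns.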
I'm not sure of the best way to fix this. Easy fixes would be to
remove the check for completed unexpected receives, and/or run
tcp_do_work with a shorter timeout.
It seems like we have a special case for blocking PVFS_sys_* calls.
We want to ignore unexpected receives just in that case, and actually
call tcp_do_work. In other contexts, I think we want the behavior
that we have now, where we assume that a BMI_testunexpected call will
follow a BMI_testcontext call. We could modify the testcontext call
to take a separate parameter, but that seems messy. We might also be
able to handle this with separate BMI contexts somehow...
I haven't dug into the code yet to see whether there is a more elegant
way to handle it, but I wanted to mention that if you want to add a
special flag to toggle the behavior, it might be better to just set it
globally with the set_info() function rather than modifying the
testcontext() API. That way you don't have to change any of the other
BMI methods. There are already a couple of similar set_info() calls to
toggle BMI behavior for different use cases.
-Phil