Hi All,
I found a race condition that might lead to this warning message, after which
some requests might be left unfinished forever in the BMI layer. I have
attached a patch to fix it.
Here are the details:
The problem is that the correctness of BMI depends on the order of two
events: the completion event of the work request for the MSG_CTS message, and
the arrival of the message with type MSG_RTS_DONE.
The state transitions of a recv request would then be:
RQ_RTS_WAITING_CTS_SEND_COMPLETION -> RQ_RTS_WAITING_RTS_DONE ->
RQ_RTS_WAITING_USER_TEST (the recv request is completed).
However, if the MSG_RTS_DONE message arrives first, there is no request in the
state RQ_RTS_WAITING_RTS_DONE, so the handler cannot advance the state machine.
As a result, a log message is printed and the message is dropped (this has no
impact on the existing requests).
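In other words, the old logic assumes a fixed order of the two events. A
minimal sketch of that assumption (the handler names are made up, and 'state'
stands for the request's recv state field):

    /* sketch of the current strict ordering; handler names are
     * hypothetical */
    static void cts_send_completed(int *state)
    {
        /* the only legal next state: wait for MSG_RTS_DONE */
        *state = RQ_RTS_WAITING_RTS_DONE;
    }

    static void rts_done_arrived(int *state)
    {
        /* assumes cts_send_completed() already ran; if MSG_RTS_DONE
         * arrives first, the lookup finds no request in state
         * RQ_RTS_WAITING_RTS_DONE and the message is dropped instead */
        *state = RQ_RTS_WAITING_USER_TEST;
    }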
So I want to change the logic as follows, so that the correctness of BMI no
longer depends on the order of these events:
1. First, recall the definitions of the state flags:
   RQ_RTS_WAITING_CTS_SEND_COMPLETION = 0x20,
   RQ_RTS_WAITING_RTS_DONE = 0x40,
   RQ_RTS_WAITING_USER_TEST = 0x80,
2. When MSG_CTS is sent, set the state of the request to
(RQ_RTS_WAITING_CTS_SEND_COMPLETION | RQ_RTS_WAITING_RTS_DONE |
RQ_RTS_WAITING_USER_TEST)
3. When the work request for MSG_CTS completes, do the following:
   if (request->state & RQ_RTS_WAITING_CTS_SEND_COMPLETION)
       request->state &= ~RQ_RTS_WAITING_CTS_SEND_COMPLETION;
4. When a message of type MSG_RTS_DONE arrives, do the following:
   for (all recv requests where (mop_id == request->id &&
        (request->state & RQ_RTS_WAITING_RTS_DONE))) {
       request->state &= ~RQ_RTS_WAITING_RTS_DONE;
   }
5. So regardless of the order of steps 3 and 4, the state of the request
finally reaches RQ_RTS_WAITING_USER_TEST, as expected; a consolidated sketch
of steps 2-4 follows this list.
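Putting these steps together, here is a minimal sketch of the idea (the
handler names and the struct are hypothetical stand-ins for the real BMI ib
code; the flag values are the ones defined in step 1):

    #include <stdint.h>

    enum {
        RQ_RTS_WAITING_CTS_SEND_COMPLETION = 0x20,
        RQ_RTS_WAITING_RTS_DONE            = 0x40,
        RQ_RTS_WAITING_USER_TEST           = 0x80,
    };

    /* hypothetical request type; the real one carries much more state */
    struct recv_req {
        uint64_t id;      /* matches the mop_id in MSG_RTS_DONE */
        int      state;   /* OR of the RQ_RTS_* flags above */
    };

    /* step 2: when MSG_CTS is posted, wait on all three conditions */
    static void cts_sent(struct recv_req *request)
    {
        request->state = RQ_RTS_WAITING_CTS_SEND_COMPLETION |
                         RQ_RTS_WAITING_RTS_DONE |
                         RQ_RTS_WAITING_USER_TEST;
    }

    /* step 3: the work request for MSG_CTS completed */
    static void cts_send_completed(struct recv_req *request)
    {
        if (request->state & RQ_RTS_WAITING_CTS_SEND_COMPLETION)
            request->state &= ~RQ_RTS_WAITING_CTS_SEND_COMPLETION;
    }

    /* step 4: a MSG_RTS_DONE for this mop_id arrived */
    static void rts_done_arrived(struct recv_req *request, uint64_t mop_id)
    {
        if (request->id == mop_id &&
            (request->state & RQ_RTS_WAITING_RTS_DONE))
            request->state &= ~RQ_RTS_WAITING_RTS_DONE;
    }

    /* step 5: whichever of steps 3 and 4 runs first, the state ends up
     * as exactly RQ_RTS_WAITING_USER_TEST, and the test path can then
     * complete the request */

Since the two clear operations touch different bits, each event can update the
state without caring whether the other event has already happened.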
Best Regards,
Jingwang.
Here are some mails from the list, reported by someone else, which I think are
related to this issue.
troy at scl.ameslab.gov wrote on Fri, 22 Feb 2008 14:11 -0600:
> We just had this occur..
>
> Is this really a valid assert? Are there any other valid states that can
> cause a transition to RQ_RTS_WAITING_USER_TEST besides
> RQ_RTS_WAITING_RTS_DONE?
>
> [D 02/22 13:44] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
> [E 02/22 13:44] max send/recv sge 14 15
> [E 02/22 13:52] Error: encourage_recv_incoming: mop_id 10164ae0 in RTS_DONE
> message not found.
The other side did an RDMA, then sent a message saying "the rdma is
done" and referencing the given mop_id. This side is complaining
it doesn't know about any such mop_id. There's really not much else
to do here but die. How did this side forget about the mop_id? Did
the other side send a duplicate done message? Any of these things
would be bugs.
Perhaps a cancelled message on the receiver might lead to some sort
of breakage here. You probably would have logs talking about that.
You could add more debug to this loop
    rq = NULL;
    qlist_for_each_entry(rqt, &ib_device->recvq, list) {
        if (rqt->c == c && rqt->rts_mop_id == mh_rts_done.mop_id &&
            rqt->state.recv == RQ_RTS_WAITING_RTS_DONE) {
            rq = rqt;
            break;
        }
    }
to see if it knows about the mop_id but is in the wrong state. Be
sure not to break, then, as multiple rqt may have the same mop_id,
but no more than one should be waiting for the rts done.
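For example, the loop with that extra debugging might look like this (a
sketch: the log text is made up, while gossip_err() and llu() are the logging
helpers this code uses elsewhere):

    /* report every rqt that matches the mop_id, whatever its state; do
     * not break, since several rqt may share the mop_id, but at most
     * one should be waiting for the rts done */
    rq = NULL;
    qlist_for_each_entry(rqt, &ib_device->recvq, list) {
        if (rqt->c == c && rqt->rts_mop_id == mh_rts_done.mop_id) {
            gossip_err("%s: mop_id %llx found in state %d\n",
                       __func__, llu(mh_rts_done.mop_id),
                       rqt->state.recv);
            if (rqt->state.recv == RQ_RTS_WAITING_RTS_DONE)
                rq = rqt;
        }
    }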
-- Pete
Attachment: fix_missing_rts_done.patch
