Hi All,


I found a race condition that can lead to this warning message and leave some 
requests unfinished forever in the BMI layer. I have attached a patch that 
fixes it.



Here are the details:



The problem is that the correctness of BMI depends on the order of two 
events: the completion event of the work request for the MSG_CTS message, and 
the arrival of the message of type MSG_RTS_DONE.



So the state transitions of a recv request would be:

RQ_RTS_WAITING_CTS_SEND_COMPLETION -> RQ_RTS_WAITING_RTS_DONE -> 
RQ_RTS_WAITING_USER_TEST (recv request is completed)



However, if the MSG_RTS_DONE message arrives first, there is no request in the 
state RQ_RTS_WAITING_RTS_DONE, so the state machine cannot advance. As a 
result, a log message is printed and the MSG_RTS_DONE message is dropped (it 
has no impact on the existing requests), leaving the recv request waiting 
forever.
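
For reference, the drop happens because the lookup on the MSG_RTS_DONE path 
only matches requests that are already in RQ_RTS_WAITING_RTS_DONE. A 
simplified sketch of that path (the loop is the one from ib.c quoted in 
Pete's mail below; the error() and llu() calls are my guess at how the quoted 
log line is produced):

    rq = NULL;
    qlist_for_each_entry(rqt, &ib_device->recvq, list) {
        if (rqt->c == c && rqt->rts_mop_id == mh_rts_done.mop_id &&
            rqt->state.recv == RQ_RTS_WAITING_RTS_DONE) {
            rq = rqt;
            break;
        }
    }
    if (rq == NULL) {
        /* The CTS send completion has not been reaped yet, so the
         * request is still in RQ_RTS_WAITING_CTS_SEND_COMPLETION and
         * nothing matched: the MSG_RTS_DONE is logged and dropped. */
        error("%s: mop_id %llx in RTS_DONE message not found",
              __func__, llu(mh_rts_done.mop_id));
    }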



So I want to change the logic to the following, so that the correctness of BMI 
does not depend on the order of these events (a consolidated C sketch follows 
the list):

1.      First, the definitions of the state flags:

    RQ_RTS_WAITING_CTS_SEND_COMPLETION = 0x20,

    RQ_RTS_WAITING_RTS_DONE = 0x40,

    RQ_RTS_WAITING_USER_TEST = 0x80,

2.      When MSG_CTS is sent, set the state of the request to 
(RQ_RTS_WAITING_CTS_SEND_COMPLETION | RQ_RTS_WAITING_RTS_DONE | 
RQ_RTS_WAITING_USER_TEST)

3.      When the work request for MSG_CTS completes, do the following:

if (request->state & RQ_RTS_WAITING_CTS_SEND_COMPLETION)

    request->state &= ~RQ_RTS_WAITING_CTS_SEND_COMPLETION;

4.      When a message of type MSG_RTS_DONE arrives, do the following:

for (all recv requests where (request->rts_mop_id == mop_id && 
request->state & RQ_RTS_WAITING_RTS_DONE)) {

    request->state &= ~RQ_RTS_WAITING_RTS_DONE;

}

5.      So regardless of the order of steps 3 and 4, both bits are eventually 
cleared and the request finally reaches the state RQ_RTS_WAITING_USER_TEST, 
as expected.
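
Putting the steps together, here is a minimal C sketch of the proposed 
handling (the struct and function names are illustrative only, not the ones 
in the attached patch):

    /* State flags from step 1; a request may carry several at once. */
    enum rq_state {
        RQ_RTS_WAITING_CTS_SEND_COMPLETION = 0x20,
        RQ_RTS_WAITING_RTS_DONE            = 0x40,
        RQ_RTS_WAITING_USER_TEST           = 0x80
    };

    /* Illustrative request type; the real one lives in the BMI IB code. */
    struct recv_request {
        unsigned int state;
    };

    /* Step 2: when MSG_CTS is sent, arm all three flags at once. */
    static void on_cts_sent(struct recv_request *request)
    {
        request->state = RQ_RTS_WAITING_CTS_SEND_COMPLETION
                       | RQ_RTS_WAITING_RTS_DONE
                       | RQ_RTS_WAITING_USER_TEST;
    }

    /* Step 3: the work request for MSG_CTS completed. */
    static void on_cts_send_completion(struct recv_request *request)
    {
        request->state &= ~RQ_RTS_WAITING_CTS_SEND_COMPLETION;
    }

    /* Step 4: a MSG_RTS_DONE arrived for this request's mop_id. */
    static void on_rts_done(struct recv_request *request)
    {
        request->state &= ~RQ_RTS_WAITING_RTS_DONE;
    }

    /* Step 5: in either order, once both bits are cleared only
     * RQ_RTS_WAITING_USER_TEST remains, and the user's test call can
     * complete the request. */
    static int ready_for_user_test(const struct recv_request *request)
    {
        return request->state == RQ_RTS_WAITING_USER_TEST;
    }

Because each event only clears its own bit, neither handler needs to know 
whether the other has already run.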



Best Regards,

Jingwang.



Here are some mails from the list, reported by someone else, which I think 
are related to this issue.



troy at scl.ameslab.gov wrote on Fri, 22 Feb 2008 14:11 -0600:

> We just had this occur..
>
> Is this really a valid assert? Are there any other valid states that can
> cause a transition to RQ_RTS_WAITING_USER_TEST besides
> RQ_RTS_WAITING_RTS_DONE?
>
> [D 02/22 13:44] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
> [E 02/22 13:44] max send/recv sge 14 15
> [E 02/22 13:52] Error: encourage_recv_incoming: mop_id 10164ae0 in RTS_DONE
> message not found.



The other side did an RDMA, then sent a message saying "the rdma is
done" and referencing the given mop_id.  This side is complaining
it doesn't know about any such mop_id.  There's really not much else
to do here but die.  How did this side forget about the mop_id?  Did
the other side send a duplicate done message?  Any of these things
would be bugs.



Perhaps a cancelled message on the receiver might lead to some sort
of breakage here.  You probably would have logs talking about that.



You could add more debug to this loop



        rq = NULL;
        qlist_for_each_entry(rqt, &ib_device->recvq, list) {
            if (rqt->c == c && rqt->rts_mop_id == mh_rts_done.mop_id &&
                rqt->state.recv == RQ_RTS_WAITING_RTS_DONE) {
                rq = rqt;
                break;
            }
        }



to see if it knows about the mop_id but is in the wrong state.  Be
sure not to break, then, as multiple rqt may have the same mop_id,
but no more than one should be waiting for the rts done.



               -- Pete
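
Along the lines Pete suggests, the extra-debug variant of that loop might 
look roughly like this (my sketch; the debug() and llu() helpers are 
assumptions about what ib.c provides):

    rq = NULL;
    qlist_for_each_entry(rqt, &ib_device->recvq, list) {
        if (rqt->c != c || rqt->rts_mop_id != mh_rts_done.mop_id)
            continue;
        if (rqt->state.recv == RQ_RTS_WAITING_RTS_DONE) {
            /* No break: keep scanning so that wrong-state duplicates
             * show up too, though at most one entry should match. */
            rq = rqt;
        } else {
            /* Same mop_id, wrong state: the interesting case. */
            debug(0, "%s: mop_id %llx found, but in state %d",
                  __func__, llu(mh_rts_done.mop_id), rqt->state.recv);
        }
    }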

Attachment: fix_missing_rts_done.patch
