I added some debugging changes to the MAD layer, and after many test runs I captured the output shown below.

In short, a send work request is completing, but the MAD associated with that request has already been freed or corrupted. The "not at head of send queue" errors are likely a consequence of the corrupted MAD.
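To make the three cases in the output easier to follow, here is a minimal sketch of the kind of check the debug messages imply: a completion is matched against the list of pending sends and classified as at the head, later in the queue, or not on the queue at all. The struct, enum, and function names are hypothetical and are not taken from the ib_mad code; the real handler may work differently.

```c
#include <stddef.h>

/* Hypothetical stand-in for a posted send; NOT the real ib_mad structure. */
struct send_entry {
    unsigned long wr_id;        /* work request id reported by the completion */
    struct send_entry *next;    /* next pending send, NULL at the tail */
};

enum comp_match {
    COMP_AT_HEAD,               /* expected case: sends complete in order */
    COMP_LATER_IN_QUEUE,        /* "not at head of send queue ... found later in queue" */
    COMP_NOT_ON_QUEUE           /* "not on send queue" */
};

/* Classify a completion by where its wr_id sits in the pending-send list. */
static enum comp_match classify_completion(const struct send_entry *head,
                                           unsigned long wr_id)
{
    const struct send_entry *e;

    if (head != NULL && head->wr_id == wr_id)
        return COMP_AT_HEAD;

    /* Walk the rest of the list looking for the completed request. */
    for (e = head ? head->next : NULL; e != NULL; e = e->next)
        if (e->wr_id == wr_id)
            return COMP_LATER_IN_QUEUE;

    return COMP_NOT_ON_QUEUE;
}
```

If the entry the completion points at has already been freed, a lookup like this would be comparing against stale or poisoned memory, which would be consistent with completions landing in the second and third cases.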

- Sean

cmpost: starting client
cmpost: connecting
cmpost: connect time: 4072000 us
cmpost: waiting to disconnect
cmpost: test complete
send_comp_handler - completion for MAD not on send queue QP: 1
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not on send queue QP: 1
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not on send queue QP: 1
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
send_comp_handler - completion for MAD not at head of send queue QP: 1
found later in queue
Slab corruption: start=dfc9ddd8, len=256
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01305f8>](worker_thread+0x1a8/0x230)
000: 60 d3 c9 df 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Prev obj: start=dfc9dccc, len=256
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<f8c6c6eb>](ib_mad_send_done_handler+0x2b/0x120 [ib_mad])
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=dfc9dee4, len=256
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<f8c6c6eb>](ib_mad_send_done_handler+0x2b/0x120 [ib_mad])
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
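For anyone reading the slab dump: 0x6b is the fill byte the kernel's slab debugging writes into freed objects, so any other value in a freed object means something wrote to it after the free. In the corrupted object the first four bytes (60 d3 c9 df) no longer hold the poison value, which on a little-endian machine would read as 0xdfc9d360; that looks like a pointer stored into freed memory, though that is only an interpretation. A minimal user-space sketch of the same check follows. The function name is hypothetical, and it ignores the distinct end-of-object marker byte that the real allocator uses.

```c
#include <stddef.h>

#define POISON_FREE_BYTE 0x6b   /* fill value seen throughout the dump */

/*
 * Return the offset of the first byte that is not the poison value,
 * or -1 if the whole buffer still holds the poison pattern.
 * Simplification: the real slab code also places a different marker
 * byte at the end of the object, which this sketch does not model.
 */
static long first_unpoisoned(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (buf[i] != POISON_FREE_BYTE)
            return (long)i;
    return -1;
}
```

Run against the first line of the corrupted object, this would report offset 0, while the neighbouring objects in the dump would come back clean.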

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
