Jeff Carr wrote:
I didn't notice this error. In any case, it was something I did wrong; I went back and did a simple check with your code and it is ok. I do notice though that you can generate:

May 5 16:31:50 localhost kernel: ib_mthca 0000:09:00.0: 1a0084/0: error CQE -> QPN 1a0406, WQE @ 00000042
May 5 16:31:50 localhost kernel: [ 0] 001a0406
May 5 16:31:50 localhost kernel: [ 4] 00001aed
May 5 16:31:50 localhost kernel: [ 8] 00000004
May 5 16:31:50 localhost kernel: [ c] 00003800
May 5 16:31:50 localhost kernel: [10] 128a0000
May 5 16:31:50 localhost kernel: [14] 00000000
May 5 16:31:50 localhost kernel: [18] 00000042
May 5 16:31:50 localhost kernel: [1c] ff000000


if you up the message_count to 0x1000. I'm guessing this is just some normal overrun error though.

It's taken me a while to look at this, but I think that this is a real error.

Cmpost sets the CQ size too small, which can lead to a CQ overrun. The number of CQEs should have been message_count * 2, rather than just message_count. A CQ size of message_count is fine on the client side, which receives all messages before sending. But on the server side, receive completions can begin arriving before all of the send completions have been reaped, so up to message_count * 2 entries may be outstanding at once.
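
To make the sizing concrete, here is a minimal sketch of the arithmetic. It is not the actual cmpost change. The helper name cq_entries_needed is hypothetical, and it assumes a single CQ collects both send and receive completions, with one completion per posted work request.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper, not taken from cmpost.
 * Assumption: one CQ is shared by the send and receive queues, and
 * every posted send and every posted receive generates one CQE.
 */
static int cq_entries_needed(int message_count)
{
    /* Guard against negative counts and against overflow when doubling. */
    assert(message_count >= 0 && message_count <= INT_MAX / 2);

    /* Worst case: message_count send CQEs plus message_count receive CQEs. */
    return message_count * 2;
}

/*
 * The result would be passed as the "cqe" argument when creating the CQ,
 * for example (sketch only, needs <infiniband/verbs.h> and a real device
 * context; "ctx" is a placeholder):
 *
 *     cq = ibv_create_cq(ctx, cq_entries_needed(message_count),
 *                        NULL, NULL, 0);
 */
```

With message_count set to 0x1000, as in the failing run, this asks for 0x2000 entries instead of 0x1000.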

Thanks for the info. I've submitted a change that should fix this.

- Sean
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general