Roland,

I am trying to write a user-level application that receives multicast UD packets. I am seeing about 1-2% packet loss between the send side and the receive side, apparently independent of the packet rate at low rates. (Heavily traced sends and receives at very low rates still drop packets, even though more receives are posted than packets are sent.) I have a couple of questions:

1. Are there any race issues with ibv_get_cq_event? The example code (ud_pingpong) seems to imply that the correct sequence is:

Start:
            Call ibv_get_cq_event
            Call ibv_ack_cq_events              <- anywhere, so long as it happens before destroy_cq
            Call ibv_req_notify_cq
            Call ibv_poll_cq                    <- just once, rather than the usual loop until empty, according to the example
            Goto start

In the old days we ran a notify thread that called request notify and then polled until the poll came back empty, in order to prevent a race.
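
The difference between polling once and polling until empty can be illustrated with a toy model. This is only a sketch and not libibverbs code: every `toy_*` name is hypothetical, and it assumes that a notification request is one-shot and fires only for a completion that arrives after the request was made.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a completion queue; NOT a libibverbs type. */
struct toy_cq {
    size_t queued;  /* completions waiting to be polled */
    bool   armed;   /* notification requested and not yet fired */
    size_t events;  /* events waiting on the completion channel */
};

/* A completion arrives: queue it, and fire one event only if armed. */
void toy_complete(struct toy_cq *cq)
{
    cq->queued++;
    if (cq->armed) {
        cq->armed = false;
        cq->events++;
    }
}

/* Poll up to max entries; returns how many were taken. */
size_t toy_poll(struct toy_cq *cq, size_t max)
{
    size_t n = cq->queued < max ? cq->queued : max;
    cq->queued -= n;
    return n;
}

/* Handle one event: re-arm, then poll once (drain == false) or poll
 * until empty (drain == true). Returns the completions consumed, or 0
 * if no event is pending (a real get-event call would block instead). */
size_t toy_handle_event(struct toy_cq *cq, bool drain)
{
    if (cq->events == 0)
        return 0;
    cq->events--;
    cq->armed = true;
    size_t total = toy_poll(cq, 1);
    while (drain && cq->queued > 0)
        total += toy_poll(cq, 1);
    return total;
}

/* Deliver a burst of completions before the handler runs, handle every
 * pending event, and return how many completions are still queued. */
size_t toy_stranded_after_burst(size_t burst, bool drain)
{
    struct toy_cq cq = { .queued = 0, .armed = true, .events = 0 };
    for (size_t i = 0; i < burst; i++)
        toy_complete(&cq);
    while (cq.events > 0)
        toy_handle_event(&cq, drain);
    return cq.queued;
}
```

In this model a burst of three completions raises a single event, so the poll-once handler leaves two entries queued while the drain variant leaves none. The leftover entries are delayed rather than lost: they stay in the queue until a later poll picks them up.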

2. When I post, say, 500 receive buffers, send, say, 200 send buffers, and tag each send with a sequence number, I often see one or two sequence numbers missing on the receive side at the poll_cq interface. I have checked at both the post and poll interfaces on the send side that all the correct sequence numbers went out. I am not sure how this can be possible, regardless of the notification scheme used.
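
The receive-side check described above can be sketched as follows. This is a minimal sketch, not the actual test code: `find_missing_seq` is a hypothetical helper, and it assumes the sequence numbers run from 0 to `expected - 1` and have already been copied out of the received packets into an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: marks which sequence numbers in [0, expected)
 * were received and writes the ones that never arrived into missing[]
 * (up to missing_cap entries). Returns the total number missing.
 * Duplicates and out-of-range values in received[] are ignored. */
size_t find_missing_seq(const uint32_t *received, size_t n_received,
                        uint32_t expected,
                        uint32_t *missing, size_t missing_cap)
{
    /* One flag per expected sequence number; allocate at least 1 byte. */
    unsigned char *seen = calloc(expected ? expected : 1, 1);
    assert(seen != NULL);

    for (size_t i = 0; i < n_received; i++)
        if (received[i] < expected)
            seen[received[i]] = 1;

    size_t count = 0;
    for (uint32_t s = 0; s < expected; s++) {
        if (!seen[s]) {
            if (count < missing_cap)
                missing[count] = s;
            count++;
        }
    }

    free(seen);
    return count;
}
```

Running the same helper over the sequence numbers seen at the send-side completion queue would confirm that every send completed, so that any gap reported on the receive side arose after the send completed.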

I would love for this to be a programming error in my code, but I can’t figure out how I could mess things up between post_send on the send side and poll_cq on the receive side. I see the same behavior between two systems and in loopback between two ports on the same HCA.

Please let me know if this rings any bells.

Bob Pearson

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
