Sean Hefty wrote:
> The multicast module should work in this specific case, since the only client is ipoib, and ipoib first leaves the group before re-joining.

I think there's a race here. If ipoib leaves and then re-joins quickly enough, the join request will be processed before the leave. The join is then fulfilled locally, and no additional join MAD is sent to the SA. (Processing the leave immediately doesn't fix the problem in the general case, where a group may have multiple users.)
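
A minimal Python sketch of this race, assuming a simplified model in which the multicast module keeps a per-group count of local users and sends a join MAD only for the first one. The McastGroup class and its users/join_mads fields are hypothetical; this is not the actual multicast module code.

```python
class McastGroup:
    """Hypothetical model of one group in a local multicast module.

    It only tracks the number of local users and how many join MADs
    were sent to the SA.
    """

    def __init__(self):
        self.users = 0      # local references to the group
        self.join_mads = 0  # join MADs sent to the SA

    def join(self):
        self.users += 1
        if self.users == 1:
            self.join_mads += 1  # first local user: send a join MAD
        # otherwise the join is fulfilled locally, with no MAD

    def leave(self):
        self.users -= 1
        # the last local user would send a leave MAD here


group = McastGroup()
group.join()  # initial ipoib join: one MAD

# ipoib issues a leave and then a join, but the join is processed first.
processing_order = ["join", "leave"]
for request in processing_order:
    getattr(group, request)()

print(group.users, group.join_mads)  # 1 1: the re-join sent no new MAD
```

Because the join overtakes the queued leave, the user count never drops to zero, so the SA never sees a new join request.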

A temporary fix would be to always send a MAD, even when the join could be fulfilled locally. But I'm looking at having the multicast module re-join the group itself when it receives an event. That raises the possibility that the new join request fails, in which case the multicast module would need to report to its users that their membership is no longer active.
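
A rough sketch of both ideas, again using a hypothetical McastGroup model rather than real module code: every join sends a MAD, and an event triggers a re-join whose failure is reported through a caller-supplied notify callback. The sa_accepts flag is an assumed stand-in for the SA's response.

```python
class McastGroup:
    """Hypothetical model: joins always send a MAD, and an event
    triggers a re-join whose failure is reported to users."""

    def __init__(self, sa_accepts=True):
        self.users = 0
        self.join_mads = 0
        self.active = False
        self.sa_accepts = sa_accepts  # stand-in for the SA's response

    def _send_join_mad(self):
        self.join_mads += 1
        return self.sa_accepts

    def join(self):
        # Temporary fix: send a MAD even if other local users exist.
        self.users += 1
        self.active = self._send_join_mad()
        return self.active

    def on_event(self, notify):
        # Re-join after an event; tell users if the membership was lost.
        if self.users == 0:
            return
        ok = self._send_join_mad()
        if self.active and not ok:
            notify("membership no longer active")
        self.active = ok


group = McastGroup()
group.join()
group.join()            # the second join still sends a MAD
print(group.join_mads)  # 2

group.sa_accepts = False  # the SA now rejects the re-join
lost = []
group.on_event(lost.append)
print(group.active, lost)  # False ['membership no longer active']
```

Always sending a MAD trades extra SA traffic for correctness, while re-joining on an event means users can no longer assume a membership stays active once granted.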

Another problem arises if some nodes have joined as NonMembers or SendOnlyNonMembers: the SM will not create the multicast group when they try to re-join. This creates a race in which NonMembers and SendOnlyNonMembers fail to re-join until one of the FullMembers has re-joined and the group exists again.
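
A small sketch of this ordering problem, assuming an SM that has lost its groups and creates a group only on a FullMember join. The SubnetManager class, node names, and "mgid-1" are hypothetical. The JoinState bit values follow my reading of the MCMemberRecord encoding (FullMember = bit 0, NonMember = bit 1, SendOnlyNonMember = bit 2); treat them as an assumption.

```python
# JoinState bits, modeled on the MCMemberRecord JoinState field (assumed).
FULL_MEMBER = 0x1
NON_MEMBER = 0x2
SEND_ONLY_NON_MEMBER = 0x4


class SubnetManager:
    """Hypothetical model of an SM that has lost its multicast groups."""

    def __init__(self):
        self.groups = set()

    def join(self, mgid, join_state):
        if mgid not in self.groups:
            # Only a FullMember join may create the group.
            if not join_state & FULL_MEMBER:
                return False
            self.groups.add(mgid)
        return True


sm = SubnetManager()
rejoins = [
    ("node-a", SEND_ONLY_NON_MEMBER),
    ("node-b", NON_MEMBER),
    ("node-c", FULL_MEMBER),
    ("node-a", SEND_ONLY_NON_MEMBER),  # retry after the FullMember re-joined
]
results = [(node, sm.join("mgid-1", state)) for node, state in rejoins]
print(results)
# [('node-a', False), ('node-b', False), ('node-c', True), ('node-a', True)]
```

Whether node-a and node-b succeed depends only on whether their attempts land before or after node-c's, which is the race described above.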

- Sean

_______________________________________________
openib-general mailing list
http://openib.org/mailman/listinfo/openib-general