>I might be missing your point but UD is unreliable so the sends can be
>dropped. The delay/retry is to make sure the join does occur,

This is different from a dropped request or reply. In this case, the receiver does get a reply, but the reply is a failure from the SA to join the group. For example, a NonMember tries to re-join before the FullMember that would create the group has done so. The result is that requests that receive a reply also need to be retried, with the retry timeout depending on when some remote node in the fabric creates the group.

>> So, the only safe thing to do is for all multicast clients to detach from all
>> multicast groups, destroy all address handles,
>
>Why all groups ?

Because the SM has lost track of every group that existed in the fabric, so all of those groups must be recreated, potentially with different MLIDs.

>> possibly wait for a new group to be created, and then start all over again.
>
>Start what all over again ?

I meant attach the QP to the new group and allocate a new address handle.

This is a general comment, and not directed at anyone specific, but is this really the architecture and implementation that we want to aim for? I really think that we need to look at solutions that don't break existing communication unless the links providing that communication actually go down, even if this means extending the architecture.

- Sean
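A minimal sketch, in Python, of the join retry described above. It treats a dropped request or reply (no answer) and an explicit SA failure (the group does not exist yet because no FullMember has created it) as separate outcomes, and it retries both with capped exponential backoff. Everything here is hypothetical and for illustration only: `JoinStatus`, `join_with_retry`, the backoff parameters, and the sample MLID value are assumptions. `send_join` stands in for whatever actually issues the SA join request. The sketch does not call any real libibverbs, librdmacm, or SA MAD API.

```python
import time
from enum import Enum, auto
from typing import Callable, Optional, Union


class JoinStatus(Enum):
    """Hypothetical outcomes of one SA multicast join attempt."""
    TIMEOUT = auto()   # request or reply was dropped on the unreliable UD path
    REJECTED = auto()  # the SA replied, but the group does not exist (yet)


# An int result is the MLID the SA assigned to the group.
JoinResult = Union[JoinStatus, int]


def join_with_retry(
    send_join: Callable[[], JoinResult],
    max_attempts: int = 8,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Retry a join until the SA returns an MLID or the attempts run out.

    Both TIMEOUT and REJECTED are retried. Only REJECTED depends on some
    other node in the fabric (e.g. a FullMember) creating the group meanwhile.
    Returns the MLID on success, or None if every attempt failed.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        result = send_join()
        if isinstance(result, int):
            # Joined: the caller would now attach the QP and create an AH.
            return result
        if attempt + 1 < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    return None


if __name__ == "__main__":
    # Simulated SA: one lost reply, two rejections while the group is
    # missing, then success once a FullMember has created it.
    replies = iter([JoinStatus.TIMEOUT, JoinStatus.REJECTED,
                    JoinStatus.REJECTED, 0xC001])
    waits = []
    mlid = join_with_retry(lambda: next(replies), sleep=waits.append)
    print(hex(mlid), waits)
```

Injecting `sleep` keeps the backoff observable in a simulation. Since the rejection case may not clear until a remote node acts, a real client would likely want a much longer overall bound there than for a simple dropped packet.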
