On Wed, 2006-06-07 at 22:48, Sean Hefty wrote:
> >I might be missing your point but UD is unreliable so the sends can be
> >dropped. The delay/retry is to make sure the join does occur,
>
> This is different than a dropped request or reply. In this case, the receiver
> gets a reply, but it will be a failure from the SA to join the group.

By receiver, I think you are referring to the SA requester. Yes, the SA
would reject the request with status ERR_REQ_INSUFFICIENT_COMPONENTS.

> For example, a NonMember tries to re-join before a FullMember which would have
> created the group does. The result is that requests that receive a reply also
> need to be retried, with the timeout dependent on some remote node in the
> fabric creating the group.

and it is unknown when such a multicast registration (to create the
group) would occur. So the proper timeout is unknown. That's why IPoIB
has a couple of different strategies for handling this, depending on the
JoinState.

> >> So, the only safe thing to do is for all multicast clients to detach from
> >> all multicast groups, destroy all address handles,
> >
> >Why all groups?
>
> Because the SM has lost track that any groups in the fabric existed, so those
> groups must be recreated, all potentially with different mlids.

Yes, in the case of client reregister.

> >> possibly wait for a new group to be created, and then start all over again.
> >
> >Start what all over again?
>
> I meant attach the QP to the new group and allocate a new address handle.

Couldn't it modify the old one as an alternative strategy?

> This is a general comment, and not directed at anyone specific,

Don't worry. I'm not taking it personally. Just want to give you my
$0.02 worth on what I think you are saying below:

> but is this
> really the architecture and implementation that we want to aim for? I really
> think that we need to look at solutions that don't break existing
> communication, unless the links providing that communication actually go
> down, even if this means extending the architecture.

If this comment is directed at the client reregister mechanism, you
should note that when this was brought up there was resistance to it
based on the recommendation (probably not a strong enough word for this)
that SMs be redundant in the subnet.
There was a fair bit of anecdotal evidence that this was not how they
were being used at the time but it may have been a chicken and egg
problem.

-- Hal

> - Sean
