Re: [ofa-general] [RFC] the never ending search for SA scalability
On 8/9/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> I'd like to propose the following change as a simple solution for handling SA
> scalability problems:
>
> Modify the ib_sa module to support an SA LID that's separate from the SM LID.
>
> This concept is supported by the spec through SA redirection; however, I
> propose that we also allow the SA LID to be set manually by an administrator.
> Additional details are below.
>
> ---
>
> The SA LID can be set to a local or remote LID - it doesn't matter to the
> kernel. All SA MADs (PR queries, MC joins, event registration, etc.) would be
> sent to that destination for processing.
>
> Initially, I envision a user space library capable of responding to PR
> queries, but it could be expanded to respond to other types of requests. How
> the library responds to requests (forwarding them to the SM/SA, using lookup
> tables, etc.) is outside the scope of the proposal.

Other than PRs, what SA requests are planned to be handled without
reforwarding to the "real" SM/SA? Is it just PRs? Even PRs in QoS mode
will be a challenge and likely be forwarded.

-- Hal

> - Sean

___
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [RFC] the never ending search for SA scalability
Roland Dreier wrote:
>> My initial hope is that we can get away with replicating the data, and
>> using the existing SA protocol to keep it in sync. For the example
>> you gave, SA LID X could forward the set to the master SA before
>> recording it locally. SA LID Y would check with the master SA for the
>> data if it weren't found local.
>
> In any case, I don't see what having a mechanism for manual SA
> redirect buys you. I'm probably missing the point, but obviously
> you're not planning on having sysadmins manually set the SA LID on
> each node, which implies having some automatic agent that can set the
> SA LID. And given the existence of that agent, I don't see the
> difficulty in putting that agent in the real SA and using the existing
> SA redirect protocol to handle setting the SA LID.

Sean,

Knowing the long way you have come in trying to solve the
order-n-squared-load-on-the-SA-with-all-to-all-path-query-on-mpi-job-startup
problem, I really liked your idea, which allows for implementing what
you have called the "local sa" as a user space service which, --if--
installed and running, offloads the SA in the mentioned scenario.

From many aspects which were discussed over the previous threads, it
--really-- makes a difference whether the "local sa" resides in the
kernel or in user space, and my take is to put it in user space.

Roland,

What this mechanism buys Sean is solving the PR problem. Indeed,
sysadmins would have to install/enable the user space rpm that does the
job if they want this feature. Other than that, I don't see any manual
work: a possible design I see here is that the "user space SA" would,
for each port on which it does PR offload, use the SM LID as the actual
SA LID with which to replicate/invalidate/etc. PR info and to which to
"proxy" non-PR queries.

Indeed, looking ahead, this package may or may not become the basis for
a real distributed SA for IB clusters with many thousands of ports.
However, assuming it's possible to implement the concept with only a
small change in the kernel ib_sa (i.e. conditioned on a module param
etc.), I would not block this approach at this stage, and would let Sean
suggest a concrete design.

Or.
Re: [ofa-general] [RFC] the never ending search for SA scalability
> > Don't you need a distributed SA for your idea to work? Otherwise what
> > happens if node A sets a service record at SA LID X, and then node B
> > sends a query for that service record at SA LID Y?
>
> My initial hope is that we can get away with replicating the data, and
> using the existing SA protocol to keep it in sync. For the example
> you gave, SA LID X could forward the set to the master SA before
> recording it locally. SA LID Y would check with the master SA for the
> data if it weren't found local.

You would need a protocol to invalidate any locally cached service
records when a service record is deleted. And similarly for
everything else that needs to be kept coherent. That sounds like what
I would call a distributed SA.

In any case, I don't see what having a mechanism for manual SA
redirect buys you. I'm probably missing the point, but obviously
you're not planning on having sysadmins manually set the SA LID on
each node, which implies having some automatic agent that can set the
SA LID. And given the existence of that agent, I don't see the
difficulty in putting that agent in the real SA and using the existing
SA redirect protocol to handle setting the SA LID.

 - R.
Re: [ofa-general] [RFC] the never ending search for SA scalability
On 8/13/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> > In terms of the redirection fields (for the local subnet), right ?
>
> Yes - any field in ClassPortInfo named Redirect*. I don't see that it
> needs to be limited to the local subnet.
>
> > I was referring to the implications on the partition configuration in
> > terms of the ports for the ib_sa module.
>
> I don't know that an implementation needs to handle different PKeys or
> GRH fields right away, but I don't believe that general SA redirection
> prohibits this. (I really only care about the LID at the moment.)

Even in this case, it requires that the port(s) running the ib_sa module
be full members of the default partition, whereas before they might have
been limited members.

> - Sean
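
The full-versus-limited membership point can be illustrated with a small
sketch. It relies on the IB partitioning rule that the high-order bit of
the 16-bit P_Key marks full membership, and that two ports can talk only
if their low 15 bits match and at least one of them is a full member. The
function and constant names below are illustrative only, not taken from
any existing module.

```python
FULL_MEMBER_BIT = 0x8000  # high-order bit of the 16-bit P_Key
PKEY_BASE_MASK = 0x7FFF   # low 15 bits identify the partition

DEFAULT_FULL = 0xFFFF     # default partition, full member
DEFAULT_LIMITED = 0x7FFF  # default partition, limited member


def is_full_member(pkey: int) -> bool:
    """Return True if the P_Key carries the full-membership bit."""
    return bool(pkey & FULL_MEMBER_BIT)


def can_communicate(pkey_a: int, pkey_b: int) -> bool:
    """Same partition, and at least one side must be a full member."""
    same_partition = (pkey_a & PKEY_BASE_MASK) == (pkey_b & PKEY_BASE_MASK)
    return same_partition and (is_full_member(pkey_a) or is_full_member(pkey_b))


# A limited-member client can reach an SA port that is a full member...
client_to_full_sa = can_communicate(DEFAULT_LIMITED, DEFAULT_FULL)
# ...but not a redirected SA sitting on another limited-member port.
client_to_limited_sa = can_communicate(DEFAULT_LIMITED, DEFAULT_LIMITED)
```

In other words, if ordinary nodes are limited members of the default
partition, whichever port answers SA MADs on the SA's behalf has to be a
full member for those nodes to reach it.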
Re: [ofa-general] [RFC] the never ending search for SA scalability
> In terms of the redirection fields (for the local subnet), right ?

Yes - any field in ClassPortInfo named Redirect*. I don't see that it
needs to be limited to the local subnet.

> I was referring to the implications on the partition configuration in
> terms of the ports for the ib_sa module.

I don't know that an implementation needs to handle different PKeys or
GRH fields right away, but I don't believe that general SA redirection
prohibits this. (I really only care about the LID at the moment.)

- Sean
Re: [ofa-general] [RFC] the never ending search for SA scalability
On 8/13/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> > There are implications on the "deployment" model for this based on the PKey.
>
> The intent is to match ClassPortInfo fields.

In terms of the redirection fields (for the local subnet), right ?

I was referring to the implications on the partition configuration in
terms of the ports for the ib_sa module.

> - Sean
Re: [ofa-general] [RFC] the never ending search for SA scalability
On 8/13/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> > In that mode, I suppose it also requires an admin to reset it when the
> > node for the ib_sa module fails.
>
> Not necessarily. See C13-43.1.1:
>
> When a request for a particular MADHeader:MgmtClass has been redirected
> to another location, that location shall continue to service requests
> for the MADHeader:MgmtClass until either the location becomes inoperable
> for some reason or the requests are redirected again away from that
> location.
>
> Separately from admin or SA controlled redirection, the ib_sa module
> will need to determine when to fail back to the master SM/SA. I would
> do this based on X number of retries/timeouts.

Yes, that was what I was getting at.

> - Sean
Re: [ofa-general] [RFC] the never ending search for SA scalability
> Don't you need a distributed SA for your idea to work? Otherwise what
> happens if node A sets a service record at SA LID X, and then node B
> sends a query for that service record at SA LID Y?

My initial hope is that we can get away with replicating the data, and
using the existing SA protocol to keep it in sync. For the example you
gave, SA LID X could forward the set to the master SA before recording
it locally. SA LID Y would check with the master SA for the data if it
weren't found local.

- Sean
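
A minimal sketch of the replication idea described above, assuming plain
in-memory dictionaries stand in for the SA databases. The class and
method names (MasterSA, LocalSA, set, get) are hypothetical and only
illustrate the forward-on-set, ask-the-master-on-miss flow. It
deliberately does not address deletion or updates, so a copy cached at Y
can go stale.

```python
class MasterSA:
    """Stand-in for the master SA: the authoritative record store."""

    def __init__(self):
        self.records = {}

    def set(self, key, value):
        self.records[key] = value

    def get(self, key):
        return self.records.get(key)


class LocalSA:
    """Stand-in for a redirected SA (e.g. at SA LID X or Y)."""

    def __init__(self, master):
        self.master = master
        self.cache = {}

    def set(self, key, value):
        # Forward the set to the master SA before recording it locally.
        self.master.set(key, value)
        self.cache[key] = value

    def get(self, key):
        # Answer from the local copy if present, else ask the master SA.
        if key in self.cache:
            return self.cache[key]
        value = self.master.get(key)
        if value is not None:
            self.cache[key] = value
        return value


master = MasterSA()
sa_x = LocalSA(master)  # node A registers its service record here
sa_y = LocalSA(master)  # node B queries here

sa_x.set("service-record", "node A")
found_at_y = sa_y.get("service-record")
```

Here node B's query at Y misses locally, is satisfied from the master,
and is then kept in Y's local copy.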
Re: [ofa-general] [RFC] the never ending search for SA scalability
> In that mode, I suppose it also requires an admin to reset it when the
> node for the ib_sa module fails.

Not necessarily. See C13-43.1.1:

    When a request for a particular MADHeader:MgmtClass has been
    redirected to another location, that location shall continue to
    service requests for the MADHeader:MgmtClass until either the
    location becomes inoperable for some reason or the requests are
    redirected again away from that location.

Separately from admin or SA controlled redirection, the ib_sa module
will need to determine when to fail back to the master SM/SA. I would
do this based on X number of retries/timeouts.

- Sean
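
One way the "X number of retries/timeouts" fail-back could look, as a
rough sketch only. The class name, the max_timeouts parameter and its
default of 3 are assumptions for illustration; nothing here reflects
actual ib_sa code.

```python
class SADestination:
    """Tracks where SA MADs are sent and when to fail back to the SM LID."""

    def __init__(self, sm_lid, max_timeouts=3):
        self.sm_lid = sm_lid              # master SM/SA LID (fallback)
        self.redirect_lid = None          # redirected SA LID, if any
        self.max_timeouts = max_timeouts  # the "X" in the text (assumed value)
        self.timeouts = 0                 # consecutive timeouts so far

    def redirect(self, lid):
        """Admin- or SA-controlled redirection to a new SA LID."""
        self.redirect_lid = lid
        self.timeouts = 0

    def current_lid(self):
        return self.redirect_lid if self.redirect_lid is not None else self.sm_lid

    def on_response(self):
        # Any successful response resets the consecutive-timeout count.
        self.timeouts = 0

    def on_timeout(self):
        if self.redirect_lid is None:
            return  # already talking to the master SM/SA
        self.timeouts += 1
        if self.timeouts >= self.max_timeouts:
            # Treat the redirected location as inoperable and fail back.
            self.redirect_lid = None
            self.timeouts = 0


dest = SADestination(sm_lid=1, max_timeouts=3)
dest.redirect(42)
for _ in range(3):
    dest.on_timeout()
lid_after_failures = dest.current_lid()
```

Counting consecutive rather than total timeouts is a design choice of the
sketch: an occasional lost MAD does not push a healthy redirected SA out
of service.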
Re: [ofa-general] [RFC] the never ending search for SA scalability
> There are implications on the "deployment" model for this based on the PKey.

The intent is to match ClassPortInfo fields.

- Sean
Re: [ofa-general] [RFC] the never ending search for SA scalability
> > My first reaction to this is to wonder why we wouldn't just use
> > redirection as already specified in the IB architecture? It seems
> > that having something that has to be set manually and that breaks all
> > communication if it is set wrong is a bad idea.
>
> This requires changes to the SAs to send the redirect message in
> response to a query initiated by the host. I was trying to avoid
> creating an actual distributed SA.

Don't you need a distributed SA for your idea to work? Otherwise what
happens if node A sets a service record at SA LID X, and then node B
sends a query for that service record at SA LID Y?

 - R.
Re: [ofa-general] [RFC] the never ending search for SA scalability
> My first reaction to this is to wonder why we wouldn't just use
> redirection as already specified in the IB architecture? It seems
> that having something that has to be set manually and that breaks all
> communication if it is set wrong is a bad idea.

This requires changes to the SAs to send the redirect message in
response to a query initiated by the host. I was trying to avoid
creating an actual distributed SA.

- Sean
Re: [ofa-general] [RFC] the never ending search for SA scalability
> >This concept is supported by the spec through SA redirection; however, I
> >propose that we also allow the SA LID to be set manually by an
> >administrator.
>
> Is the following an acceptable approach for this?
>
> Add a class device file to the ib_sa that allows setting all SA redirection
> parameters (SL, LID, PKey, QP - with processing similar to SRP's add_target).
>
> /sys/class/infiniband_sa/sa-mthca-0/redirect_sa

My first reaction to this is to wonder why we wouldn't just use
redirection as already specified in the IB architecture? It seems
that having something that has to be set manually and that breaks all
communication if it is set wrong is a bad idea.

 - R.
Re: [ofa-general] [RFC] the never ending search for SA scalability
On 8/9/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> I'd like to propose the following change as a simple solution for handling SA
> scalability problems:
>
> Modify the ib_sa module to support an SA LID that's separate from the SM LID.
>
> This concept is supported by the spec through SA redirection; however, I
> propose that we also allow the SA LID to be set manually by an administrator.

In that mode, I suppose it also requires an admin to reset it when the
node for the ib_sa module fails.

> Additional details are below.
>
> ---
>
> The SA LID can be set to a local or remote LID - it doesn't matter to the
> kernel. All SA MADs (PR queries, MC joins, event registration, etc.) would be
> sent to that destination for processing.
>
> Initially, I envision a user space library capable of responding to PR
> queries, but it could be expanded to respond to other types of requests. How
> the library responds to requests (forwarding them to the SM/SA, using lookup
> tables, etc.) is outside the scope of the proposal.

Currently outside the scope of the proposal is how a failure of that SA
cache node is handled and how nodes learn the SA redirection information
(other than it being admin'd currently).

> - Sean
Re: [ofa-general] [RFC] the never ending search for SA scalability
On 8/10/07, Sean Hefty <[EMAIL PROTECTED]> wrote:
> >I'd like to propose the following change as a simple solution for handling SA
> >scalability problems:
> >
> >Modify the ib_sa module to support an SA LID that's separate from the SM LID.
> >
> >This concept is supported by the spec through SA redirection; however, I
> >propose that we also allow the SA LID to be set manually by an administrator.
>
> Roland,
>
> Is the following an acceptable approach for this?
>
> Add a class device file to the ib_sa that allows setting all SA redirection
> parameters (SL, LID, PKey, QP

There are implications on the "deployment" model for this based on the PKey.

> - with processing similar to SRP's add_target).
>
> /sys/class/infiniband_sa/sa-mthca-0/redirect_sa
>
> Alternatively, I can use a module parameter to redirect only the SA LID.
>
> - Sean
RE: [ofa-general] [RFC] the never ending search for SA scalability
>I'd like to propose the following change as a simple solution for handling SA
>scalability problems:
>
>Modify the ib_sa module to support an SA LID that's separate from the SM LID.
>
>This concept is supported by the spec through SA redirection; however, I
>propose that we also allow the SA LID to be set manually by an administrator.

Roland,

Is the following an acceptable approach for this?

Add a class device file to the ib_sa that allows setting all SA redirection
parameters (SL, LID, PKey, QP - with processing similar to SRP's add_target).

/sys/class/infiniband_sa/sa-mthca-0/redirect_sa

Alternatively, I can use a module parameter to redirect only the SA LID.

- Sean
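
To make the redirect_sa idea concrete, here is a rough user-space sketch
of the kind of parsing such a file might need, assuming a comma-separated
key=value string in the spirit of SRP's add_target. The option names
(lid, sl, pkey, qp), the number format, and the rule that only lid is
mandatory are all assumptions made for illustration; the proposal does
not define a format.

```python
ALLOWED_KEYS = {"sl", "lid", "pkey", "qp"}  # assumed option names
REQUIRED_KEYS = ("lid",)                    # assumed: only the LID is mandatory


def parse_redirect(text: str) -> dict:
    """Parse e.g. 'lid=0x12,sl=0,pkey=0xffff,qp=1' into a dict of ints."""
    params = {}
    for token in text.strip().split(","):
        key, sep, value = token.partition("=")
        key = key.strip()
        if not sep or key not in ALLOWED_KEYS:
            raise ValueError(f"unknown or malformed option: {token!r}")
        # Base 0 accepts both decimal ("5") and prefixed hex ("0x12").
        params[key] = int(value, 0)
    for key in REQUIRED_KEYS:
        if key not in params:
            raise ValueError(f"missing required option: {key}")
    return params


redirect = parse_redirect("lid=0x12,sl=0,pkey=0xffff,qp=1")
```

Rejecting unknown keys outright, rather than ignoring them, is one way to
reduce the risk of a mistyped setting silently taking effect.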
