On Thu, 2021-06-17 at 15:49 +0000, Pritchard Jr., Howard via ofiwg wrote:
> Hi All,
>
> For cray aries network the auth key is handled by two external
> widgets:
> 1. part of job launching procedure either with aprun or slurm, or
> 2. there's an rdma credentials server an application can use -
> https://cug.org/proceedings/cug2016_proceedings/includes/files/pap108s2-file1.pdf
> I think mercury and some other libfabric consumers have used that.

This is getting a little far afield from libfabric, but Michael Heinz
might appreciate a more concrete example of a higher-level library
(mercury) using auth_key to manage RDMA credentials.

A service provider acquires a credential allowing other processes not
in this context (e.g. aprun or srun) to communicate with it:
https://github.com/mochi-hpc/mochi-ssg/blob/main/tests/ssg-launch-group-drc.c#L172

The provider shares that credential with the other provider processes
via MPI
https://github.com/mochi-hpc/mochi-ssg/blob/main/tests/ssg-launch-group-drc.c#L175
or PMIx
https://github.com/mochi-hpc/mochi-ssg/blob/main/tests/ssg-launch-group-drc.c#L203

We turn the 'credential' into a 'cookie'
https://github.com/mochi-hpc/mochi-ssg/blob/main/tests/ssg-launch-group-drc.c#L231

And stash that string-type cookie into Mercury's "auth_key"
https://github.com/mochi-hpc/mochi-ssg/blob/main/tests/ssg-launch-group-drc.c#L235

The provider saves a little blob of state, containing information such
as the network address of the provider and this credential.

Clients of this provider load up this blob, obtain the credential, and
inform mercury of the "auth_key" to use for communication:
https://github.com/mochi-hpc/mochi-ssg/blob/main/tests/ssg-observe-group-drc.c#L112

Now that I write this all out it sounds kind of convoluted, but it
turns out to be more portable than relying on Cray "aprun" protection
domains.

==rob

> In both cases it's an external agent that is handling this.
>
> I believe for HPE slingshot11 there's a pmix plugin that will do 1
> (not sure about that though)
>
> Howard
>
>
> On 6/17/21, 8:57 AM, "ofiwg on behalf of Hefty, Sean" <
> [email protected] on behalf of [email protected]
> > wrote:
>
> > Thanks for the reply, Sean.
> >
> > I agree that the auth_key needs to come from something at a
> > higher level.
> > I've been experimenting with Intel MPI, though, and I can't figure
> > out how to get it to generate one - the auth_key fields in the
> > domain and ep attributes are null when I see them.
> > I've ended up using a shell variable passed in on the mpirun
> > command but I feel like that should be the fallback rather than
> > the only solution.
>
> I don't know how Intel MPI handles job keys.  But having MPI
> generate a key doesn't seem any better than libfabric generating one,
> unless you're including mpirun or the start-up as part of
> MPI.  I'll forward your email separately to one of the MPI
> developers.
>
> - Sean
> _______________________________________________
> ofiwg mailing list
> [email protected]
> https://lists.openfabrics.org/mailman/listinfo/ofiwg
