Yo Ralph -- I see this was committed https://svn.open-mpi.org/trac/ompi/changeset/24197. Do you want to add a blurb in README about it, and/or have this executable compiled as part of the PSM MTL and then installed into $bindir (maybe named ompi-psm-keygen)?
Right now, it's only compiled as part of "make check" and not installed, right?

On Dec 30, 2010, at 5:07 PM, Ralph Castain wrote:

> Run the program only once - it can be in the prolog of the job if you like.
> The output value needs to be in the env of every rank.
>
> You can reuse the value as many times as you like - it doesn't have to be
> unique for each job. There is nothing magic about the value itself.
>
> On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote:
>
>> How early does this need to run? Can I run it as part of a task
>> prolog, or does it need to be the shell env for each rank? And does
>> it need to run on one node or all the nodes in the job?
>>
>> On Thu, Dec 30, 2010 at 8:54 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> Well, I couldn't do it as a patch - it proved too complicated, as the PSM
>>> system looks for the value early in the boot procedure.
>>>
>>> What I can do is give you the attached key generator program. It outputs
>>> the envar required to run your program. So if you run the attached program
>>> and then export the output into your environment, you should be okay. Looks
>>> like this:
>>>
>>> $ ./psm_keygen
>>> OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954
>>> $
>>>
>>> You compile the program with the usual mpicc.
>>>
>>> Let me know if this solves the problem (or not).
>>> Ralph
>>>
>>> On Dec 30, 2010, at 11:18 AM, Michael Di Domenico wrote:
>>>
>>>> Sure, I'll give it a go.
>>>>
>>>> On Thu, Dec 30, 2010 at 5:53 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Ah, yes - that is going to be a problem. The PSM key gets generated by
>>>>> mpirun as it is shared info - i.e., every proc has to get the same value.
>>>>>
>>>>> I can create a patch that will do this for the srun direct-launch
>>>>> scenario, if you want to try it. Would be later today, though.
>>>>>
>>>>> On Dec 30, 2010, at 10:31 AM, Michael Di Domenico wrote:
>>>>>
>>>>>> Well, maybe not hooray, yet.
>>>>>> I might have jumped the gun a bit; it's
>>>>>> looking like srun works in general, but perhaps not with PSM.
>>>>>>
>>>>>> With PSM I get this error (at least now I know what I changed):
>>>>>>
>>>>>> Error obtaining unique transport key from ORTE
>>>>>> (orte_precondition_transports not present in the environment)
>>>>>> PML add procs failed
>>>>>> --> Returned "Error" (-1) instead of "Success" (0)
>>>>>>
>>>>>> Turn off PSM and srun works fine.
>>>>>>
>>>>>> On Thu, Dec 30, 2010 at 5:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>> Hooray!
>>>>>>>
>>>>>>> On Dec 30, 2010, at 9:57 AM, Michael Di Domenico wrote:
>>>>>>>
>>>>>>>> I think I take it all back. I just tried it again and it seems to
>>>>>>>> work now. I'm not sure what I changed (between my first and this
>>>>>>>> msg), but it does appear to work now.
>>>>>>>>
>>>>>>>> On Thu, Dec 30, 2010 at 4:31 PM, Michael Di Domenico
>>>>>>>> <mdidomeni...@gmail.com> wrote:
>>>>>>>>> Yes, that's true; error messages help. I was hoping there was some
>>>>>>>>> documentation to see what I've done wrong. I can't easily cut and
>>>>>>>>> paste errors from my cluster.
>>>>>>>>>
>>>>>>>>> Here's a snippet (hand typed) of the error message, but it does look
>>>>>>>>> like a rank communications error:
>>>>>>>>>
>>>>>>>>> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
>>>>>>>>> contact information is unknown in file rml_oob_send.c at line 145.
>>>>>>>>>
>>>>>>>>> *** MPI_INIT failure message (snipped) ***
>>>>>>>>> orte_grpcomm_modex failed
>>>>>>>>> --> Returned "A message is attempting to be sent to a process whose
>>>>>>>>> contact information is unknown" (-117) instead of "Success" (0)
>>>>>>>>>
>>>>>>>>> This msg repeats for each rank, and ultimately hangs the srun, which I
>>>>>>>>> have to Ctrl-C and terminate.
>>>>>>>>>
>>>>>>>>> I have MpiPorts defined in my Slurm config, and running srun with
>>>>>>>>> --resv-ports does show the SLURM_RESV_PORTS environment variable
>>>>>>>>> getting passed to the shell.
>>>>>>>>>
>>>>>>>>> On Thu, Dec 23, 2010 at 8:09 PM, Ralph Castain <r...@open-mpi.org>
>>>>>>>>> wrote:
>>>>>>>>>> I'm not sure there is any documentation yet - not much clamor for
>>>>>>>>>> it. :-/
>>>>>>>>>>
>>>>>>>>>> It would really help if you included the error message. Otherwise,
>>>>>>>>>> all I can do is guess, which wastes both of our time :-(
>>>>>>>>>>
>>>>>>>>>> My best guess is that the port reservation didn't get passed down to
>>>>>>>>>> the MPI procs properly - but that's just a guess.
>>>>>>>>>>
>>>>>>>>>> On Dec 23, 2010, at 12:46 PM, Michael Di Domenico wrote:
>>>>>>>>>>
>>>>>>>>>>> Can anyone point me towards the most recent documentation for using
>>>>>>>>>>> srun and Open MPI?
>>>>>>>>>>>
>>>>>>>>>>> I followed what I found on the web with enabling the MpiPorts config
>>>>>>>>>>> in Slurm and using the --resv-ports switch, but I'm getting an error
>>>>>>>>>>> from Open MPI during setup.
>>>>>>>>>>>
>>>>>>>>>>> I'm using Slurm 2.1.15 and Open MPI 1.5 w/PSM.
>>>>>>>>>>>
>>>>>>>>>>> I'm sure I'm missing a step.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> users mailing list
>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/