Should have also warned you: you'll need to configure OMPI with --with-devel-headers to get this program to build and run.

On Dec 30, 2010, at 1:54 PM, Ralph Castain wrote:

> Well, I couldn't do it as a patch - it proved too complicated, as the psm system
> looks for the value early in the boot procedure.
>
> What I can do is give you the attached key generator program. It outputs the
> envar required to run your program. So if you run the attached program and
> then export the output into your environment, you should be okay. Looks like
> this:
>
> $ ./psm_keygen
> OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954
> $
>
> You compile the program with the usual mpicc.
>
> Let me know if this solves the problem (or not).
> Ralph
>
> <psm_keygen.c>
>
> On Dec 30, 2010, at 11:18 AM, Michael Di Domenico wrote:
>
>> Sure, I'll give it a go.
>>
>> On Thu, Dec 30, 2010 at 5:53 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> Ah, yes - that is going to be a problem. The PSM key gets generated by
>>> mpirun as it is shared info - i.e., every proc has to get the same value.
>>>
>>> I can create a patch that will do this for the srun direct-launch scenario,
>>> if you want to try it. Would be later today, though.
>>>
>>>
>>> On Dec 30, 2010, at 10:31 AM, Michael Di Domenico wrote:
>>>
>>>> Well, maybe not hooray yet. I might have jumped the gun a bit; it's
>>>> looking like srun works in general, but perhaps not with PSM.
>>>>
>>>> With PSM I get this error (at least now I know what I changed):
>>>>
>>>> Error obtaining unique transport key from ORTE
>>>> (orte_precondition_transports not present in the environment)
>>>> PML add procs failed
>>>> --> Returned "Error" (-1) instead of "Success" (0)
>>>>
>>>> Turn off PSM and srun works fine.
>>>>
>>>>
>>>> On Thu, Dec 30, 2010 at 5:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Hooray!
>>>>>
>>>>> On Dec 30, 2010, at 9:57 AM, Michael Di Domenico wrote:
>>>>>
>>>>>> I think I take it all back. I just tried it again and it seems to
>>>>>> work now. I'm not sure what I changed (between my first message and
>>>>>> this one), but it does appear to work now.
>>>>>>
>>>>>> On Thu, Dec 30, 2010 at 4:31 PM, Michael Di Domenico
>>>>>> <mdidomeni...@gmail.com> wrote:
>>>>>>> Yes, that's true, error messages help. I was hoping there was some
>>>>>>> documentation to see what I've done wrong. I can't easily cut and
>>>>>>> paste errors from my cluster.
>>>>>>>
>>>>>>> Here's a snippet (hand typed) of the error message, but it does look
>>>>>>> like a rank communications error:
>>>>>>>
>>>>>>> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
>>>>>>> contact information is unknown in file rml_oob_send.c at line 145.
>>>>>>> *** MPI_INIT failure message (snipped) ***
>>>>>>> orte_grpcomm_modex failed
>>>>>>> --> Returned "A message is attempting to be sent to a process whose
>>>>>>> contact information is unknown" (-117) instead of "Success" (0)
>>>>>>>
>>>>>>> This message repeats for each rank, and ultimately the srun hangs, so I
>>>>>>> have to Ctrl-C it to terminate.
>>>>>>>
>>>>>>> I have mpiports defined in my slurm config, and running srun with
>>>>>>> --resv-ports does show the SLURM_RESV_PORTS environment variable
>>>>>>> getting passed to the shell.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Dec 23, 2010 at 8:09 PM, Ralph Castain <r...@open-mpi.org>
>>>>>>> wrote:
>>>>>>>> I'm not sure there is any documentation yet - not much clamor for it.
>>>>>>>> :-/
>>>>>>>>
>>>>>>>> It would really help if you included the error message. Otherwise, all
>>>>>>>> I can do is guess, which wastes both of our time :-(
>>>>>>>>
>>>>>>>> My best guess is that the port reservation didn't get passed down to
>>>>>>>> the MPI procs properly - but that's just a guess.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Dec 23, 2010, at 12:46 PM, Michael Di Domenico wrote:
>>>>>>>>
>>>>>>>>> Can anyone point me towards the most recent documentation for using
>>>>>>>>> srun and Open MPI?
>>>>>>>>>
>>>>>>>>> I followed what I found on the web, enabling the MpiPorts config
>>>>>>>>> in slurm and using the --resv-ports switch, but I'm getting an error
>>>>>>>>> from Open MPI during setup.
>>>>>>>>>
>>>>>>>>> I'm using Slurm 2.1.15 and Open MPI 1.5 w/PSM.
>>>>>>>>>
>>>>>>>>> I'm sure I'm missing a step.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> us...@open-mpi.org
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users