Yo Ralph --

I see this was committed https://svn.open-mpi.org/trac/ompi/changeset/24197.  
Do you want to add a blurb in README about it, and/or have this executable 
compiled as part of the PSM MTL and then installed into $bindir (maybe named 
ompi-psm-keygen)?  

Right now, it's only compiled as part of "make check" and not installed, right?
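
(If we do install it, a minimal Makefile.am sketch for the PSM MTL might look 
something like the below -- names are just guesses and I haven't tried it:

    bin_PROGRAMS = ompi-psm-keygen
    ompi_psm_keygen_SOURCES = psm_keygen.c

i.e., presumably move it out of check_PROGRAMS and into bin_PROGRAMS so that 
"make install" picks it up.)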



On Dec 30, 2010, at 5:07 PM, Ralph Castain wrote:

> Run the program only once - it can be in the prolog of the job if you like. 
> The output value needs to be in the env of every rank.
> 
> You can reuse the value as many times as you like - it doesn't have to be 
> unique for each job. There is nothing magic about the value itself.
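> 
> For example (rough sketch -- the path and app name are made up), in a batch 
> script before the srun:
> 
>   $ export `./psm_keygen`
>   $ srun -n 16 ./your_mpi_app
> 
> srun should propagate that environment to every rank, so they all see the 
> same value.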
> 
> On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote:
> 
>> How early does this need to run?  Can I run it as part of a task
>> prolog, or does it need to be in the shell env for each rank?  And does
>> it need to run on one node or all the nodes in the job?
>> 
>> On Thu, Dec 30, 2010 at 8:54 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> Well, I couldn't do it as a patch - it proved too complicated, as the psm 
>>> system looks for the value early in the boot procedure.
>>> 
>>> What I can do is give you the attached key generator program. It outputs 
>>> the envar required to run your program. So if you run the attached program 
>>> and then export the output into your environment, you should be okay. Looks 
>>> like this:
>>> 
>>> $ ./psm_keygen
>>> OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954
>>> $
>>> 
>>> You compile the program with the usual mpicc.
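>>> 
>>> For example (assuming the attached source is named psm_keygen.c):
>>> 
>>> $ mpicc -o psm_keygen psm_keygen.c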
>>> 
>>> Let me know if this solves the problem (or not).
>>> Ralph
>>> 
>>> 
>>> 
>>> 
>>> On Dec 30, 2010, at 11:18 AM, Michael Di Domenico wrote:
>>> 
>>>> Sure, I'll give it a go
>>>> 
>>>> On Thu, Dec 30, 2010 at 5:53 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Ah, yes - that is going to be a problem. The PSM key gets generated by 
>>>>> mpirun as it is shared info - i.e., every proc has to get the same value.
>>>>> 
>>>>> I can create a patch that will do this for the srun direct-launch 
>>>>> scenario, if you want to try it. Would be later today, though.
>>>>> 
>>>>> 
>>>>> On Dec 30, 2010, at 10:31 AM, Michael Di Domenico wrote:
>>>>> 
>>>>>> Well, maybe not hooray yet.  I might have jumped the gun a bit; it's
>>>>>> looking like srun works in general, but perhaps not with PSM.
>>>>>> 
>>>>>> With PSM I get this error (at least now I know what I changed):
>>>>>> 
>>>>>> Error obtaining unique transport key from ORTE
>>>>>> (orte_precondition_transports not present in the environment)
>>>>>> PML add procs failed
>>>>>> --> Returned "Error" (-1) instead of "Success" (0)
>>>>>> 
>>>>>> Turn off PSM and srun works fine
>>>>>> 
>>>>>> 
>>>>>> On Thu, Dec 30, 2010 at 5:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>> Hooray!
>>>>>>> 
>>>>>>> On Dec 30, 2010, at 9:57 AM, Michael Di Domenico wrote:
>>>>>>> 
>>>>>>>> I think I take it all back.  I just tried it again and it seems to
>>>>>>>> work now.  I'm not sure what I changed (between my first message and
>>>>>>>> this one), but it does appear to work now.
>>>>>>>> 
>>>>>>>> On Thu, Dec 30, 2010 at 4:31 PM, Michael Di Domenico
>>>>>>>> <mdidomeni...@gmail.com> wrote:
>>>>>>>>> Yes, that's true, error messages help.  I was hoping there was some
>>>>>>>>> documentation to see what I've done wrong.  I can't easily cut and
>>>>>>>>> paste errors from my cluster.
>>>>>>>>> 
>>>>>>>>> Here's a snippet (hand-typed) of the error message; it does look
>>>>>>>>> like a rank communications error:
>>>>>>>>> 
>>>>>>>>> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
>>>>>>>>> contact information is unknown in file rml_oob_send.c at line 145.
>>>>>>>>> *** MPI_INIT failure message (snipped) ***
>>>>>>>>> orte_grpcomm_modex failed
>>>>>>>>> --> Returned "A message is attempting to be sent to a process whose
>>>>>>>>> contact information is unknown" (-117) instead of "Success" (0)
>>>>>>>>> 
>>>>>>>>> This msg repeats for each rank, and ultimately hangs the srun, which
>>>>>>>>> I have to Ctrl-C to terminate
>>>>>>>>> 
>>>>>>>>> I have mpiports defined in my slurm config, and running srun with
>>>>>>>>> --resv-ports does show the SLURM_RESV_PORTS environment variable
>>>>>>>>> getting passed to the shell
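>>>>>>>>> 
>>>>>>>>> Roughly (hand-typed again, and the port range is made up):
>>>>>>>>> 
>>>>>>>>> slurm.conf:  MpiParams=ports=12000-12999
>>>>>>>>> 
>>>>>>>>> $ srun --resv-ports -n 1 env | grep SLURM_RESV_PORTS
>>>>>>>>> SLURM_RESV_PORTS=12000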
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Thu, Dec 23, 2010 at 8:09 PM, Ralph Castain <r...@open-mpi.org> 
>>>>>>>>> wrote:
>>>>>>>>>> I'm not sure there is any documentation yet - not much clamor for 
>>>>>>>>>> it. :-/
>>>>>>>>>> 
>>>>>>>>>> It would really help if you included the error message. Otherwise, 
>>>>>>>>>> all I can do is guess, which wastes both of our time :-(
>>>>>>>>>> 
>>>>>>>>>> My best guess is that the port reservation didn't get passed down to 
>>>>>>>>>> the MPI procs properly - but that's just a guess.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Dec 23, 2010, at 12:46 PM, Michael Di Domenico wrote:
>>>>>>>>>> 
>>>>>>>>>>> Can anyone point me towards the most recent documentation for using
>>>>>>>>>>> srun and openmpi?
>>>>>>>>>>> 
>>>>>>>>>>> I followed what I found on the web, enabling the MpiPorts config
>>>>>>>>>>> in slurm and using the --resv-ports switch, but I'm getting an error
>>>>>>>>>>> from openmpi during setup.
>>>>>>>>>>> 
>>>>>>>>>>> I'm using Slurm 2.1.15 and Openmpi 1.5 w/PSM
>>>>>>>>>>> 
>>>>>>>>>>> I'm sure I'm missing a step.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

