Well, maybe not hooray yet.  I might have jumped the gun a bit: it
looks like srun works in general, but perhaps not with PSM.

With PSM I get this error (at least now I know what I changed):

Error obtaining unique transport key from ORTE
(orte_precondition_transports not present in the environment)
PML add procs failed
--> Returned "Error" (-1) instead of "Success" (0)

Turn off PSM and srun works fine.
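
A minimal sketch of one thing to try, not a confirmed fix: the error
says orte_precondition_transports is missing from the environment, so
the launcher could supply it before calling srun.  The variable name
(Open MPI's usual OMPI_MCA_ prefix plus the parameter name) and the key
format (two 16-digit hex words joined by a dash) are assumptions here,
as are the helper names make_transport_key and srun_env.

```python
import os
import secrets


def make_transport_key() -> str:
    """Return a random key shaped like <16 hex digits>-<16 hex digits>."""
    # Assumed format: two 64-bit values, zero-padded hex, joined by a dash.
    return f"{secrets.randbits(64):016x}-{secrets.randbits(64):016x}"


def srun_env(base_env=None, disable_psm=False):
    """Build the environment for launching an Open MPI job with srun."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumed variable name: the parameter from the error message with
    # the OMPI_MCA_ prefix.  An existing value is left untouched.
    env.setdefault("OMPI_MCA_orte_precondition_transports",
                   make_transport_key())
    if disable_psm:
        # Exclude the PSM MTL, i.e. the "turn off PSM" case.
        env["OMPI_MCA_mtl"] = "^psm"
    return env
```

The key is generated once in the launching process, so every rank that
inherits this environment from srun would see the same value, e.g.
subprocess.run(["srun", "--resv-ports", "-n", "4", "./a.out"],
env=srun_env()), where ./a.out stands in for the MPI binary.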


On Thu, Dec 30, 2010 at 5:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Hooray!
>
> On Dec 30, 2010, at 9:57 AM, Michael Di Domenico wrote:
>
>> I think I take it all back.  I just tried it again and it seems to
>> work now.  I'm not sure what I changed (between my first message and
>> this one), but it does appear to work now.
>>
>> On Thu, Dec 30, 2010 at 4:31 PM, Michael Di Domenico
>> <mdidomeni...@gmail.com> wrote:
>>> Yes, that's true, error messages help.  I was hoping there was some
>>> documentation to see what I've done wrong.  I can't easily cut and
>>> paste errors from my cluster.
>>>
>>> Here's a snippet (hand typed) of the error message, but it does look
>>> like a rank communications error:
>>>
>>> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
>>> contact information is unknown in file rml_oob_send.c at line 145.
>>> *** MPI_INIT failure message (snipped) ***
>>> orte_grpcomm_modex failed
>>> --> Returned "A message is attempting to be sent to a process whose
>>> contact information is unknown" (-117) instead of "Success" (0)
>>>
>>> This message repeats for each rank and ultimately hangs the srun,
>>> which I have to terminate with Ctrl-C.
>>>
>>> I have MpiPorts defined in my slurm config, and running srun with
>>> --resv-ports does show the SLURM_RESV_PORTS environment variable
>>> getting passed to the shell.
>>>
>>>
>>> On Thu, Dec 23, 2010 at 8:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> I'm not sure there is any documentation yet - not much clamor for it. :-/
>>>>
>>>> It would really help if you included the error message. Otherwise, all I 
>>>> can do is guess, which wastes both of our time :-(
>>>>
>>>> My best guess is that the port reservation didn't get passed down to the 
>>>> MPI procs properly - but that's just a guess.
>>>>
>>>>
>>>> On Dec 23, 2010, at 12:46 PM, Michael Di Domenico wrote:
>>>>
>>>>> Can anyone point me towards the most recent documentation for using
>>>>> srun and openmpi?
>>>>>
>>>>> I followed what I found on the web with enabling the MpiPorts config
>>>>> in slurm and using the --resv-ports switch, but I'm getting an error
>>>>> from openmpi during setup.
>>>>>
>>>>> I'm using Slurm 2.1.15 and Openmpi 1.5 w/PSM
>>>>>
>>>>> I'm sure I'm missing a step.
>>>>>
>>>>> Thanks
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
