Thank you, Ralph.
It works.
:)
On Wed, Dec 29, 2010 at 4:23 PM, Ralph Castain wrote:
> Both look perfectly right to me. The difference is only because your
> "success" one still has the ssh session active.
>
> It looks to me like something is preventing communication when the ssh
> session is t
Yes, that's true, error messages help. I was hoping there was some
documentation to see what I've done wrong. I can't easily cut and
paste errors from my cluster.
Here's a snippet (hand typed) of the error message; it does look
like a rank communications error:
ORTE_ERROR_LOG: A message is at
I think I take it all back. I just tried it again and it seems to
work now. I'm not sure what I changed (between my first and this
msg), but it does appear to work now.
On Thu, Dec 30, 2010 at 4:31 PM, Michael Di Domenico
wrote:
> Yes that's true, error messages help. I was hoping there was so
Hooray!
On Dec 30, 2010, at 9:57 AM, Michael Di Domenico wrote:
> I think i take it all back. I just tried it again and it seems to
> work now. I'm not sure what I changed (between my first and this
> msg), but it does appear to work now.
>
> On Thu, Dec 30, 2010 at 4:31 PM, Michael Di Domenic
Well, maybe not hooray yet. I might have jumped the gun a bit. It's
looking like srun works in general, but perhaps not with PSM.
With PSM I get this error (at least now I know what I changed):
Error obtaining unique transport key from ORTE
(orte_precondition_transports not present in the environ
Ah, yes - that is going to be a problem. The PSM key gets generated by mpirun
as it is shared info - i.e., every proc has to get the same value.
I can create a patch that will do this for the srun direct-launch scenario, if
you want to try it. Would be later today, though.
On Dec 30, 2010, at
Sure, I'll give it a go.
On Thu, Dec 30, 2010 at 5:53 PM, Ralph Castain wrote:
> Ah, yes - that is going to be a problem. The PSM key gets generated by mpirun
> as it is shared info - i.e., every proc has to get the same value.
>
> I can create a patch that will do this for the srun direct-launch
Well, I couldn't do it as a patch - proved too complicated as the psm system
looks for the value early in the boot procedure.
What I can do is give you the attached key generator program. It outputs the
envar required to run your program. So if you run the attached program and then
export the o
Should have also warned you: you'll need to configure OMPI --with-devel-headers
to get this program to build/run.
On Dec 30, 2010, at 1:54 PM, Ralph Castain wrote:
> Well, I couldn't do it as a patch - proved too complicated as the psm system
> looks for the value early in the boot procedure.
How early does this need to run? Can I run it as part of a task
prolog, or does it need to be the shell env for each rank? And does
it need to run on one node or all the nodes in the job?
On Thu, Dec 30, 2010 at 8:54 PM, Ralph Castain wrote:
> Well, I couldn't do it as a patch - proved too compl
Run the program only once - it can be in the prolog of the job if you like. The
output value needs to be in the env of every rank.
You can reuse the value as many times as you like - it doesn't have to be
unique for each job. There is nothing magic about the value itself.
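
For readers without the attached program, the idea above (generate one value, put it in the environment of every rank) can be sketched roughly as follows. This is not the attached key generator. The variable name OMPI_MCA_orte_precondition_transports and the value layout of two 16-digit hex fields joined by a dash are assumptions based on the error text earlier in the thread, so check them against your Open MPI version before relying on them.

```python
import secrets

# Assumed name: the "orte_precondition_transports" parameter from the error
# message, with the usual OMPI_MCA_ environment prefix. Not confirmed here.
ENV_NAME = "OMPI_MCA_orte_precondition_transports"


def make_transport_key() -> str:
    """Return a key in the assumed form: two 64-bit hex fields joined by '-'."""
    high = secrets.randbits(64)
    low = secrets.randbits(64)
    return f"{high:016x}-{low:016x}"


def export_line(key: str) -> str:
    """Format a shell 'export' line that a job prolog or wrapper can evaluate."""
    return f"export {ENV_NAME}={key}"


if __name__ == "__main__":
    # Run once per job; every rank must then see the same value.
    print(export_line(make_transport_key()))
```

Evaluating the printed line in the submitting shell, or in a job prolog, before calling srun would be one way to get the same value to every rank, assuming srun is passing the caller's environment through to the tasks.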
On Dec 30, 2010, at 2: