Hmmm...yes, I guess we did get off-track then. This solution is exactly what I
proposed in the first response to your thread, and was repeated by others later
on. :-/
So long as mpirun is executed on the node where the "sister mom" is located,
and as long as your script "B" does -not- include an
On Apr 4, 2011, at 10:38 AM, Laurence Marks wrote:
> Thanks, I think we may have a miscommunication here; I assume that on
> the computer where they have disabled rsh and ssh, they have "something"
> to communicate with, so we don't need to use pbsdsh.
Clarification in terminology:
Thanks, I think we may have a miscommunication here; I assume that on
the computer where they have disabled rsh and ssh, they have "something"
to communicate with, so we don't need to use pbsdsh. If they don't,
there is not much a lowly user like me can do.
I think we can close this, since like
I apologize - I realized late last night that I had a typo in my recommended
command. It should read:
mpirun -mca plm rsh -mca plm_rsh_agent pbsdsh -mca ras ^tm --machinefile m1
^^^
Also, if you know that #procs <= #cores on your nodes,
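(For context, a minimal sketch of how that corrected command could sit inside a
Torque batch script; the resource request, the use of $PBS_NODEFILE to build m1,
and the application name ./my_app are assumptions, not from this thread:)

  #!/bin/sh
  #PBS -l nodes=2:ppn=4
  cd $PBS_O_WORKDIR
  # build the machinefile m1 from the nodes Torque assigned, one name per line
  sort -u $PBS_NODEFILE > m1
  # launch through pbsdsh instead of rsh/ssh, with the tm allocator disabled
  mpirun -mca plm rsh -mca plm_rsh_agent pbsdsh -mca ras ^tm --machinefile m1 ./my_app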
On Apr 3, 2011, at 5:25 PM, Laurence Marks wrote:
> Thanks. I will test this tomorrow.
>
> Many people run Wien2k with openmpi, as you say; I only became aware of
> the issue of Wien2k (and perhaps other codes) leaving orphaned
> processes still running a few days ago. I also know someone who wants
> to run Wien2k on a system where both rsh and ssh are disabled.
And, before someone wonders: while Wien2k is a commercial code, it is
about 500 EUR for a lifetime licence, so this is not the same as VASP or
Gaussian, which cost serious money. I have no financial interest in the
code, but like many others I help make it better (semi-GNU).
On Sun, Apr 3, 2011 at 6:25
Thanks. I will test this tomorrow.
Many people run Wien2k with openmpi, as you say; I only became aware of
the issue of Wien2k (and perhaps other codes) leaving orphaned
processes still running a few days ago. I also know someone who wants
to run Wien2k on a system where both rsh and ssh are disabled.
On Apr 3, 2011, at 4:37 PM, Laurence Marks wrote:
> On Sun, Apr 3, 2011 at 5:08 PM, Reuti wrote:
>> Am 03.04.2011 um 23:59 schrieb David Singleton:
>>
>>> On 04/04/2011 12:56 AM, Ralph Castain wrote:
What I still don't understand is why you are trying to do it this way.
On Sun, Apr 3, 2011 at 5:08 PM, Reuti wrote:
> Am 03.04.2011 um 23:59 schrieb David Singleton:
>
>> On 04/04/2011 12:56 AM, Ralph Castain wrote:
>>>
>>> What I still don't understand is why you are trying to do it this way. Why
>>> not just run
>>>
>>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN
>>> /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def
On Apr 3, 2011, at 4:08 PM, Reuti wrote:
> Am 03.04.2011 um 23:59 schrieb David Singleton:
>
>> On 04/04/2011 12:56 AM, Ralph Castain wrote:
>>>
>>> What I still don't understand is why you are trying to do it this way. Why
>>> not just run
>>>
>>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN
>>> /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def
On Apr 3, 2011, at 3:22 PM, Reuti wrote:
> Am 03.04.2011 um 22:57 schrieb Ralph Castain:
>
>> On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote:
>>
>
> I am not using that computer. A scenario that I have come across is
> that when an msub job is killed because it has exceeded its walltime,
> MPI tasks spawned by ssh may not be terminated because (so I am told)
> Torque does not know about ssh tasks launched from a task.
Works great for me...sleep is dead every time.
On Apr 3, 2011, at 3:13 PM, David Singleton wrote:
>
>> You can prove this to yourself rather easily. Just ssh to a remote node and
>> execute any command that lingers for a while - say something simple like
>> "sleep". Then kill the ssh and do a "ps" on the remote node. I guarantee
>> that the command will have died.
Am 03.04.2011 um 23:59 schrieb David Singleton:
> On 04/04/2011 12:56 AM, Ralph Castain wrote:
>>
>> What I still don't understand is why you are trying to do it this way. Why
>> not just run
>>
>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN
>> /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def
On 04/04/2011 12:56 AM, Ralph Castain wrote:
What I still don't understand is why you are trying to do it this way. Why not
just run
time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN
/home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def
where machineN contains the names of the nodes to run on.
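(A machinefile of that kind is plain text, one hostname per line; Open MPI also
accepts an optional slot count per host. The hostnames below are placeholders:)

  node01 slots=4
  node02 slots=4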
Am 03.04.2011 um 22:57 schrieb Ralph Castain:
> On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote:
>
I am not using that computer. A scenario that I have come across is
that when an msub job is killed because it has exceeded its walltime,
MPI tasks spawned by ssh may not be terminated because (so I am told)
Torque does not know about ssh tasks launched from a task.
>
> It most certainly will! That mpirun on nodeB is executing under the ssh from
> nodeA, so when that ssh session is killed, it automatically kills everything
> run underneath it. And when mpirun dies, so does the job it was running, as
> per above.
> You can prove this to yourself rather easily.
You can prove this to yourself rather easily. Just ssh to a remote node and execute any command
that lingers for a while - say something simple like "sleep". Then kill the ssh and do a
"ps" on the remote node. I guarantee that the command will have died.
Hmm ...
vayu1:~ > ssh v37 sleep
On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote:
>>>
>>> I am not using that computer. A scenario that I have come across is
>>> that when an msub job is killed because it has exceeded its walltime,
>>> MPI tasks spawned by ssh may not be terminated because (so I am told)
>>> Torque does not know about ssh tasks launched from a task.
On Sun, Apr 3, 2011 at 11:41 AM, Ralph Castain wrote:
>
> On Apr 3, 2011, at 9:34 AM, Laurence Marks wrote:
>
>> On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote:
>>>
>>> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>>>
Let me expand on this slightly (in response to Ralph Castain's posting
-- I had digest mode set).
On Apr 3, 2011, at 9:34 AM, Laurence Marks wrote:
> On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote:
>>
>> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>>
>>> Let me expand on this slightly (in response to Ralph Castain's posting
>>> -- I had digest mode set). As currently constructed a shellscript in
>>> Wien2k (www.wien2k.at) launches a series of tasks using
>>>
>>> ($remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]") >>.time1_$loop &
On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote:
>
> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>
>> Let me expand on this slightly (in response to Ralph Castain's posting
>> -- I had digest mode set). As currently constructed a shellscript in
>> Wien2k (www.wien2k.at) launches a series of tasks using
>>
>> ($remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]") >>.time1_$loop &
On Apr 3, 2011, at 9:12 AM, Reuti wrote:
> Am 03.04.2011 um 16:56 schrieb Ralph Castain:
>
>> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>>
>>> Let me expand on this slightly (in response to Ralph Castain's posting
>>> -- I had digest mode set). As currently constructed a shellscript in
>>> Wien2k (www.wien2k.at) launches a series of tasks using
>>>
>>> ($remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]") >>.time1_$loop &
Am 03.04.2011 um 16:56 schrieb Ralph Castain:
> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>
>> Let me expand on this slightly (in response to Ralph Castain's posting
>> -- I had digest mode set). As currently constructed a shellscript in
>> Wien2k (www.wien2k.at) launches a series of tasks using
>>
>> ($remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]") >>.time1_$loop &
On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
> Let me expand on this slightly (in response to Ralph Castain's posting
> -- I had digest mode set). As currently constructed a shellscript in
> Wien2k (www.wien2k.at) launches a series of tasks using
>
> ($remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]") >>.time1_$loop &
Let me expand on this slightly (in response to Ralph Castain's posting
-- I had digest mode set). As currently constructed a shellscript in
Wien2k (www.wien2k.at) launches a series of tasks using
($remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]") >>.time1_$loop &
where the
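(A minimal csh sketch of that launch pattern; the loop bounds and the $machine
array are assumptions, while $remote, $t, $ttt, $lockfile and $loop stand for
the Wien2k script variables quoted above:)

  # launch one background task per machine; each removes its lock file
  # when it finishes, and timing output is appended to .time1_$loop
  foreach p (1 2 3)
    ($remote $machine[$p] "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]") >>.time1_$loop &
  end
  wait   # block until every background task has completed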
I'm afraid I have no idea what you are talking about. Are you saying you are
launching OMPI processes via mpirun, but with "pbsdsh" as the plm_rsh_agent???
That would be a very bad idea. If you are running under Torque, then let mpirun
"do the right thing" and use its Torque-based launcher.
On
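(Under Torque that usually means running mpirun with no rsh-agent override at
all; whether the TM support is compiled in can be checked with ompi_info. The
process count and application name below are assumptions:)

  # confirm the Torque/TM components are built into this Open MPI install
  ompi_info | grep tm
  # then, inside the job script, simply:
  mpirun -np 16 ./my_app   # mpirun picks up the Torque allocation itself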
I have a problem which may or may not be an openmpi issue, but since this
list was useful before with a race condition, I am posting here.
I am trying to use pbsdsh as an ssh replacement, pushed by the sysadmins
because Torque does not know about ssh tasks launched from a task. In a simple
case, a script launches three
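(For reference, a minimal sketch of pbsdsh used as a remote-shell replacement
inside a Torque job; the target host and the command are assumptions:)

  # run a command on one specific host of the current job's allocation
  pbsdsh -h node02 /bin/sh -c "cd $PBS_O_WORKDIR; ./task.sh"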