I'm afraid I have no idea what you are talking about. Are you saying you are
launching OMPI processes via mpirun, but with "pbsdsh" as the plm_rsh_agent???
That would be a very bad idea. If you are running under Torque, then let mpirun
"do the right thing" and use its Torque-based launcher.
Let me expand on this slightly (in response to Ralph Castain's posting
-- I had digest mode set). As currently constructed, a shell script in
Wien2k (www.wien2k.at) launches a series of tasks using
($remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]") >>.time1_$loop &
where the
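The pattern in that command can be sketched in POSIX `sh` as a hedged illustration (Wien2k's own scripts are csh, so this is a translation, not the Wien2k code). Each task runs in a background subshell, removes its lock file when it finishes, and appends its output to a shared file; the parent then polls until no lock file remains. All names here (`launch_task`, `run_all`, `REMOTE`, `TASK_HOST`, `.lock_task_N`, `.time1_demo`) are hypothetical, and leaving `REMOTE` empty runs the tasks locally so the sketch can be tried without ssh.

```shell
#!/bin/sh
# Sketch of the lock-file launch pattern, translated to POSIX sh.
# All names are hypothetical; this is not Wien2k's own (csh) code.
REMOTE=${REMOTE-}                  # e.g. "ssh"; empty means run locally
TASK_HOST=${TASK_HOST:-localhost}  # host passed to $REMOTE

# launch_task HOST CMD LOCK OUT: create LOCK, run CMD on HOST in the
# background, remove LOCK when CMD ends, and append all output to OUT.
# The remote case assumes the same directory exists on HOST (shared FS).
launch_task() {
  _host=$1 _cmd=$2 _lock=$3 _out=$4
  : > "$_lock"
  if [ -n "$REMOTE" ]; then
    # $REMOTE is left unquoted so that e.g. "ssh -x" splits into words.
    ($REMOTE "$_host" "cd '$PWD'; $_cmd; rm -f '$_lock'") >> "$_out" 2>&1 &
  else
    (sh -c "cd '$PWD'; $_cmd; rm -f '$_lock'") >> "$_out" 2>&1 &
  fi
}

# run_all OUT CMD...: start one task per CMD, then poll until no lock is left.
run_all() {
  _all_out=$1; shift
  rm -f .lock_task_*               # clear stale locks from an earlier run
  _n=0
  for _c in "$@"; do
    _n=$((_n + 1))
    launch_task "$TASK_HOST" "$_c" ".lock_task_$_n" "$_all_out"
  done
  # Only the lock files are watched, not the processes themselves.
  while ls .lock_task_* >/dev/null 2>&1; do
    sleep 1
  done
}

# Example: two local tasks whose output is collected in .time1_demo.
rm -f .time1_demo
run_all .time1_demo "echo task one" "echo task two"
cat .time1_demo
```

Note that in this sketch the parent tracks only lock files; it holds no handle on the processes started on the remote node.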
On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
> Let me expand on this slightly (in response to Ralph Castain's posting
> -- I had digest mode set). As currently constructed, a shell script in
> Wien2k (www.wien2k.at) launches a series of tasks using
>
> ($remote $remotemachine "cd $PWD;$t
On 03.04.2011, at 16:56, Ralph Castain wrote:
> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>
>> Let me expand on this slightly (in response to Ralph Castain's posting
>> -- I had digest mode set). As currently constructed, a shell script in
>> Wien2k (www.wien2k.at) launches a series of
On Apr 3, 2011, at 9:12 AM, Reuti wrote:
> On 03.04.2011, at 16:56, Ralph Castain wrote:
>
>> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>>
>>> Let me expand on this slightly (in response to Ralph Castain's posting
>>> -- I had digest mode set). As currently constructed, a shell script in
On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote:
>
> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>
>> Let me expand on this slightly (in response to Ralph Castain's posting
>> -- I had digest mode set). As currently constructed, a shell script in
>> Wien2k
On Apr 3, 2011, at 9:34 AM, Laurence Marks wrote:
> On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote:
>>
>> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>>
>>> Let me expand on this slightly (in response to Ralph Castain's posting
>>> -- I had digest mode set). As
On Sun, Apr 3, 2011 at 11:41 AM, Ralph Castain wrote:
>
> On Apr 3, 2011, at 9:34 AM, Laurence Marks wrote:
>
>> On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote:
>>>
>>> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>>>
Let me expand on this
On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote:
>>>
>>> I am not using that computer. A scenario that I have come across is
>>> that when an msub job is killed because it has exceeded its walltime,
>>> MPI tasks spawned by ssh may not be terminated because (so I am told)
>>> Torque does not
You can prove this to yourself rather easily. Just ssh to a remote node and execute any command
that lingers for a while, say something simple like "sleep". Then kill the ssh and do a
"ps" on the remote node. I guarantee that the command will have died.
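Ralph's experiment needs a remote node. A purely local experiment in the same spirit can be sketched in POSIX `sh`: a parent shell starts a long `sleep`, only the parent is killed, and the script then checks whether the child is still alive. On a typical Unix system the child is simply reparented and keeps running, because killing a process does not by itself signal its children. Whether ssh's teardown delivers a signal to the remote command (for example SIGHUP when a terminal is involved) is the separate question being argued here, and this sketch does not answer it. The variable names are arbitrary.

```shell
#!/bin/sh
# Local illustration only: no ssh, mpirun or Torque is involved.
# A parent shell starts a long-running child; then only the parent is killed.

pidfile=$(mktemp)

# Parent: start "sleep 60" in the background, record its PID, wait for it.
sh -c 'sleep 60 & echo $! > "$1"; wait' sh "$pidfile" &
parent=$!

# Wait until the parent has written the child's PID.
while [ ! -s "$pidfile" ]; do sleep 1; done
child=$(cat "$pidfile")

kill "$parent"                      # SIGTERM to the parent shell only
wait "$parent" 2>/dev/null || true

if kill -0 "$child" 2>/dev/null; then
  result=orphaned                   # child outlived its parent
  kill "$child"                     # clean up the orphan explicitly
else
  result=died
fi
rm -f "$pidfile"
echo "child $child: $result"
```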
H ...
vayu1:~ > ssh v37 sleep
>
> It most certainly will! That mpirun on nodeB is executing under the ssh from
> nodeA, so when that ssh session is killed, it automatically kills everything
> run underneath it. And when mpirun dies, so does the job it was running, as
> per above.
> You can prove this to yourself rather easily.
On 03.04.2011, at 22:57, Ralph Castain wrote:
> On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote:
>
I am not using that computer. A scenario that I have come across is
that when an msub job is killed because it has exceeded its walltime,
MPI tasks spawned by ssh may not be
On 04/04/2011 12:56 AM, Ralph Castain wrote:
What I still don't understand is why you are trying to do it this way. Why not
just run
time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN \
    /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def
where machineN contains the names
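For reference, Open MPI's `-machinefile` is a synonym for `-hostfile`, and a hostfile lists one host per line, optionally followed by `slots=N`. Under Torque, `$PBS_NODEFILE` points at a node file with one line per allocated slot. The hypothetical helper below, a sketch only, collapses such a file into `host slots=N` lines; as Ralph notes above, an mpirun built with Torque support reads the allocation itself, so no machinefile is needed in that case.

```shell
#!/bin/sh
# make_hostfile NODEFILE: collapse a node file with one hostname per slot
# (the layout of Torque's $PBS_NODEFILE) into Open MPI "host slots=N" lines.
# The function name is hypothetical.
make_hostfile() {
  sort "$1" | uniq -c | awk '{ print $2 " slots=" $1 }'
}

# Example with a made-up node file: n01 has two slots, n02 has one.
nodes=$(mktemp)
printf 'n01\nn02\nn01\n' > "$nodes"
hostfile=$(make_hostfile "$nodes")
rm -f "$nodes"
printf '%s\n' "$hostfile"
```

In a job script one might write `make_hostfile "$PBS_NODEFILE" > .machineN` before calling mpirun with `-machinefile .machineN`.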
On 03.04.2011, at 23:59, David Singleton wrote:
> On 04/04/2011 12:56 AM, Ralph Castain wrote:
>>
>> What I still don't understand is why you are trying to do it this way. Why
>> not just run
>>
>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN
>>
Works great for me...sleep is dead every time.
On Apr 3, 2011, at 3:13 PM, David Singleton wrote:
>
>> You can prove this to yourself rather easily. Just ssh to a remote node and
>> execute any command that lingers for a while, say something simple like
>> "sleep". Then kill the ssh and do a
On Apr 3, 2011, at 3:22 PM, Reuti wrote:
> On 03.04.2011, at 22:57, Ralph Castain wrote:
>
>> On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote:
>>
>
> I am not using that computer. A scenario that I have come across is
> that when an msub job is killed because it has exceeded its
On Apr 3, 2011, at 4:08 PM, Reuti wrote:
> On 03.04.2011, at 23:59, David Singleton wrote:
>
>> On 04/04/2011 12:56 AM, Ralph Castain wrote:
>>>
>>> What I still don't understand is why you are trying to do it this way. Why
>>> not just run
>>>
>>> time mpirun -v -x LD_LIBRARY_PATH -x PATH
On Sun, Apr 3, 2011 at 5:08 PM, Reuti wrote:
> On 03.04.2011, at 23:59, David Singleton wrote:
>
>> On 04/04/2011 12:56 AM, Ralph Castain wrote:
>>>
>>> What I still don't understand is why you are trying to do it this way. Why
>>> not just run
>>>
>>> time mpirun -v
On Apr 3, 2011, at 4:37 PM, Laurence Marks wrote:
> On Sun, Apr 3, 2011 at 5:08 PM, Reuti wrote:
>> On 03.04.2011, at 23:59, David Singleton wrote:
>>
>>> On 04/04/2011 12:56 AM, Ralph Castain wrote:
What I still don't understand is why you are trying to
Thanks. I will test this tomorrow.
Many people run Wien2k with Open MPI, as you say; I only became aware of
the issue of Wien2k (and perhaps other codes) leaving orphaned
processes still running a few days ago. I also know someone who wants
to run Wien2k on a system where both rsh and ssh are
And, before someone wonders: while Wien2k is a commercial code, it is
about 500 EUR for a lifetime licence, so this is not the same as VASP or
Gaussian, which cost $. And I have no financial interest in the
code, but like many others I help make it better (semi gnu).
On Sun, Apr 3, 2011 at 6:25
On Apr 3, 2011, at 5:25 PM, Laurence Marks wrote:
> Thanks. I will test this tomorrow.
>
> Many people run Wien2k with Open MPI, as you say; I only became aware of
> the issue of Wien2k (and perhaps other codes) leaving orphaned
> processes still running a few days ago. I also know someone who