Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
I'm afraid I have no idea what you are talking about. Are you saying you are launching OMPI processes via mpirun, but with "pbsdsh" as the plm_rsh_agent??? That would be a very bad idea. If you are running under Torque, then let mpirun "do the right thing" and use its Torque-based launcher. On
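A minimal sketch of what letting mpirun "do the right thing" under Torque might look like, assuming an Open MPI build that includes Torque (tm) support. The resource requests, the helper `count_slots`, and the program name `./my_mpi_program` are placeholders for illustration and do not come from the thread.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:10:00
# Hypothetical Torque job script; resource values and program name are placeholders.

# Count the slots Torque granted: the node file has one line per slot.
count_slots() {
    grep -c . "$1"
}

# Only launch when actually running inside a Torque job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    # No -machinefile and no rsh agent: a Torque-aware mpirun reads the
    # allocation itself and starts the remote processes through Torque.
    mpirun -np "$(count_slots "$PBS_NODEFILE")" ./my_mpi_program
fi
```

Whether a given installation has Torque support can usually be checked with `ompi_info | grep tm`; if nothing is listed, this sketch does not apply as written.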

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
Let me expand on this slightly (in response to Ralph Castain's posting -- I had digest mode set). As currently constructed a shellscript in Wien2k (www.wien2k.at) launches a series of tasks using ($remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]") >>.time1_$loop & where the

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote: > Let me expand on this slightly (in response to Ralph Castain's posting > -- I had digest mode set). As currently constructed a shellscript in > Wien2k (www.wien2k.at) launches a series of tasks using > > ($remote $remotemachine "cd $PWD;$t

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Reuti
On 03.04.2011 at 16:56, Ralph Castain wrote: > On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote: > >> Let me expand on this slightly (in response to Ralph Castain's posting >> -- I had digest mode set). As currently constructed a shellscript in >> Wien2k (www.wien2k.at) launches a series of

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 9:12 AM, Reuti wrote: > On 03.04.2011 at 16:56, Ralph Castain wrote: > >> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote: >> >>> Let me expand on this slightly (in response to Ralph Castain's posting >>> -- I had digest mode set). As currently constructed a shellscript in

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote: > > On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote: > >> Let me expand on this slightly (in response to Ralph Castain's posting >> -- I had digest mode set). As currently constructed a shellscript in >> Wien2k

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 9:34 AM, Laurence Marks wrote: > On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote: >> >> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote: >> >>> Let me expand on this slightly (in response to Ralph Castain's posting >>> -- I had digest mode set). As

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
On Sun, Apr 3, 2011 at 11:41 AM, Ralph Castain wrote: > > On Apr 3, 2011, at 9:34 AM, Laurence Marks wrote: > >> On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote: >>> >>> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote: >>> Let me expand on this

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote: >>> >>> I am not using that computer. A scenario that I have come across is >>> that when a msub job is killed because it has exceeded its Walltime >>> mpi tasks spawned by ssh may not be terminated because (so I am told) >>> Torque does not

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread David Singleton
You can prove this to yourself rather easily. Just ssh to a remote node and execute any command that lingers for a while - say something simple like "sleep". Then kill the ssh and do a "ps" on the remote node. I guarantee that the command will have died. H ... vayu1:~ > ssh v37 sleep
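The experiment quoted above can be sketched as a small script. This is only an illustration under stated assumptions: the host name `nodeB`, the 300-second duration, the `RUN_EXPERIMENT` switch, and the function names are hypothetical, and the script assumes the same user name on both machines. No expected result is shown, because the outcome may depend on how ssh and the shells are configured on both ends.

```shell
#!/bin/sh
# Sketch of the ssh/sleep experiment. REMOTE_HOST and DURATION are placeholders.
REMOTE_HOST="${REMOTE_HOST:-nodeB}"
DURATION="${DURATION:-300}"

# Build the remote command that lists any surviving "sleep <duration>".
# The [s] in the pattern keeps grep from matching its own command line.
remote_check_cmd() {
    printf 'ps -u %s -o pid,args | grep "[s]leep %s"' "$1" "$2"
}

run_experiment() {
    # Start a long-running command on the remote node through ssh.
    ssh "$REMOTE_HOST" "sleep $DURATION" &
    ssh_pid=$!
    sleep 2
    # Kill the local ssh client, as a batch system might at walltime.
    kill "$ssh_pid"
    sleep 2
    # Look for the remote sleep; no output means it is gone.
    ssh "$REMOTE_HOST" "$(remote_check_cmd "$(id -un)" "$DURATION")"
}

# Only contact a real host when explicitly requested.
if [ "${RUN_EXPERIMENT:-0}" = "1" ]; then
    run_experiment
fi
```

Running it as `RUN_EXPERIMENT=1 REMOTE_HOST=<node> sh script.sh` prints any `sleep` that outlived the killed ssh session, which is the orphaned-process case discussed in this thread.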

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
> > It most certainly will! That mpirun on nodeB is executing under the ssh from > nodeA, so when that ssh session is killed, it automatically kills everything > run underneath it. And when mpirun dies, so does the job it was running, as > per above. > You can prove this to yourself rather easily.

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Reuti
On 03.04.2011 at 22:57, Ralph Castain wrote: > On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote: > I am not using that computer. A scenario that I have come across is that when a msub job is killed because it has exceeded its Walltime mpi tasks spawned by ssh may not be

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread David Singleton
On 04/04/2011 12:56 AM, Ralph Castain wrote: What I still don't understand is why you are trying to do it this way. Why not just run time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def where machineN contains the names

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Reuti
On 03.04.2011 at 23:59, David Singleton wrote: > On 04/04/2011 12:56 AM, Ralph Castain wrote: >> >> What I still don't understand is why you are trying to do it this way. Why >> not just run >> >> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN >>

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
Works great for me...sleep is dead every time. On Apr 3, 2011, at 3:13 PM, David Singleton wrote: > >> You can prove this to yourself rather easily. Just ssh to a remote node and >> execute any command that lingers for a while - say something simple like >> "sleep". Then kill the ssh and do a

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 3:22 PM, Reuti wrote: > On 03.04.2011 at 22:57, Ralph Castain wrote: > >> On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote: >> > > I am not using that computer. A scenario that I have come across is > that when a msub job is killed because it has exceeded its

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 4:08 PM, Reuti wrote: > On 03.04.2011 at 23:59, David Singleton wrote: > >> On 04/04/2011 12:56 AM, Ralph Castain wrote: >>> >>> What I still don't understand is why you are trying to do it this way. Why >>> not just run >>> >>> time mpirun -v -x LD_LIBRARY_PATH -x PATH

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
On Sun, Apr 3, 2011 at 5:08 PM, Reuti wrote: > On 03.04.2011 at 23:59, David Singleton wrote: > >> On 04/04/2011 12:56 AM, Ralph Castain wrote: >>> >>> What I still don't understand is why you are trying to do it this way. Why >>> not just run >>> >>> time mpirun -v

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 4:37 PM, Laurence Marks wrote: > On Sun, Apr 3, 2011 at 5:08 PM, Reuti wrote: >> On 03.04.2011 at 23:59, David Singleton wrote: >> >>> On 04/04/2011 12:56 AM, Ralph Castain wrote: What I still don't understand is why you are trying to

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
Thanks. I will test this tomorrow. Many people run Wien2k with openmpi as you say; I only became aware a few days ago of the issue of Wien2k (and perhaps other codes) leaving orphaned processes still running. I also know someone who wants to run Wien2k on a system where both rsh and ssh are

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
And, before someone wonders, while Wien2k is a commercial code it is about 500 Eu for a lifetime licence so this is not the same as Vasp or Gaussian which cost $. And, I have no financial interest in the code, but like many others help make it better (semi gnu). On Sun, Apr 3, 2011 at 6:25

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 5:25 PM, Laurence Marks wrote: > Thanks. I will test this tomorrow. > > Many people run Wien2k with openmpi as you say; I only became aware of > the issue of Wien2k (and perhaps other codes) leaving orphaned > processes still running a few days ago. I also know someone who