On Apr 3, 2011, at 9:12 AM, Reuti wrote:

> Am 03.04.2011 um 16:56 schrieb Ralph Castain:
> 
>> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>> 
>>> Let me expand on this slightly (in response to Ralph Castain's posting
>>> -- I had digest mode set). As currently constructed, a shell script in
>>> Wien2k (www.wien2k.at) launches a series of tasks using
>>> 
>>> ($remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]") >> .time1_$loop &
>>> 
>>> where the standard setting for "remote" is "ssh", "remotemachine" is the
>>> appropriate host, "t" is "time", and "ttt" is a concatenation of
>>> commands -- for instance, when using 2 cores on 1 node for Task1, 2
>>> cores on each of 2 nodes for Task2, and 2 cores on 1 node for Task3:
>>> 
>>> Task1:
>>> mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machine1
>>> /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def
>>> Task2:
>>> mpirun -v -x LD_LIBRARY_PATH -x PATH -np 4 -machinefile .machine2
>>> /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_2.def
>>> Task3:
>>> mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machine3
>>> /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_3.def
>>> 
>>> This is a stable script; it works under SGI, Linux, MVAPICH, and many
>>> others using ssh or rsh (although I've never myself used it with rsh).
>>> It is general purpose, i.e., it will run just 1 task on 8x8
>>> nodes/cores, or 8 parallel tasks on 8 nodes each with 8 cores, or any
>>> scatter of nodes/cores.
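>>> 
>>> For concreteness, a minimal sketch of that launch pattern in plain bash
>>> ($task_hosts, $lockfiles, and $loop are placeholders here, not Wien2k's
>>> actual variable names):
>>> 
>>> p=1
>>> for host in $task_hosts; do            # one host (set) per task
>>>   ttt="mpirun -np 2 -machinefile .machine$p lapw1Q_mpi lapw1Q_$p.def"
>>>   ssh $host "cd $PWD; time $ttt; rm -f .lock_${lockfiles[$p]}" >> .time1_$loop &
>>>   p=$((p+1))
>>> done
>>> wait                                   # all tasks run concurrently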
>>> 
>>> According to some, ssh is becoming obsolete within supercomputers, and
>>> the "replacement", at least under Torque, is pbsdsh.
>> 
>> Somebody is playing an April Fools joke on you. The majority of 
>> supercomputers use ssh as their sole launch mechanism, and I have seen no
>> indication that anyone intends to change that situation. That said, Torque 
>> is certainly popular and a good environment.
> 
> I operate my Linux clusters without `ssh` or `rsh`. I use SGE's `qrsh` 
> instead. How else would you get tight integration with correct accounting and 
> job control? This might be different when you have an AIX or NEC SX 
> machine, as they provide additional control mechanisms.
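> 
> (For reference, tight integration typically replaces the ssh call with a 
> small wrapper along these lines -- the wrapper itself is hypothetical, but 
> `qrsh -inherit` is SGE's real mechanism for starting a process on a node 
> already allocated to the job:)
> 
> #!/bin/sh
> # hypothetical drop-in for "remote": run a command on an allocated node
> host=$1; shift
> exec qrsh -inherit -V $host "$@"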

Like I said, the majority of supercomputers use ssh as their sole -launch- 
mechanism. They use a variety of methods for resource management, which is a 
separate issue.

I'm not arguing the good/bad of any arrangement. We support quite a few, as you 
know :-)

Just saying that the notion that ssh is going away isn't supported by the facts.

> 
> -- Reuti
> 
> 
>>> Getting pbsdsh to work is
>>> certainly not as simple as the documentation I've seen suggests. To get
>>> it to even partially work, I am using for "remote" a script "pbsh" which
>>> writes exports of HOME, PATH, LD_LIBRARY_PATH, etc.,
>>> as well as the PBS environment variables listed at the bottom of
>>> http://www.bear.bham.ac.uk/bluebear/pbsdsh.shtml plus PBS_NODEFILE, into
>>> an executable bash file $PBS_O_WORKDIR/.tmp_$1, followed by the relevant
>>> command, and then runs
>>> 
>>> pbsdsh -h $1 /bin/bash -lc " $PBS_O_WORKDIR/.tmp_$1  "
>>> 
>>> This works fine so long as Task2 does not span 2 nodes (probably 3 as
>>> well, though I've not tested that). If it does, there is a communications
>>> failure, and nothing is launched on the 2nd node of Task2.
>>> 
>>> I'm including the script below, as maybe some other environment
>>> variables are needed, or some should not be there, in order to
>>> properly rebuild the environment so things will work. (And yes, I know
>>> there should be tests to see whether the variables are set, and so
>>> forth; this is not so clean -- it is just an initial version.)
>> 
>> By providing all those PBS-related envars to OMPI, you are causing OMPI to 
>> think it should use Torque as the launch mechanism. Unfortunately, that 
>> won't work in this scenario.
>> 
>> When you start a Torque job (get an allocation etc.), Torque puts you on one 
>> of the allocated nodes and creates a "sister mom" on that node. This is your 
>> job's "master node". All Torque-based launches must take place from that 
>> location.
>> 
>> So when you pbsdsh to another node and attempt to execute mpirun with those 
>> envars set, mpirun attempts to contact the local "sister mom" so it can 
>> order the launch of any daemons on other nodes -- only the "sister mom" 
>> isn't there! So the connection fails and mpirun aborts.
>> 
>> If mpirun is -only- launching procs on the local node, then it doesn't need 
>> to launch another daemon (as mpirun will host the local procs itself), and 
>> so it doesn't attempt to contact the "sister mom" and the comm failure 
>> doesn't occur.
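>> 
>> (As an aside -- untested in your setup, but if you do need to run mpirun 
>> from a pbsdsh'd node, Open MPI lets you select the launcher explicitly, 
>> so you could force the ssh/rsh launcher instead of the Torque one:)
>> 
>> mpirun -mca plm rsh -x LD_LIBRARY_PATH -x PATH -np 4 -machinefile .machine2 \
>>     /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_2.def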
>> 
>> What I still don't understand is why you are trying to do it this way. Why 
>> not just run
>> 
>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN 
>> /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_N.def
>> 
>> where .machineN contains the names of the nodes where you want the MPI apps 
>> to execute? mpirun will only execute apps on those nodes, so this 
>> accomplishes the same thing as your script -- only with a lot less pain.
>> 
>> Your script would just contain a sequence of these commands, each with its 
>> number of procs and machinefile as required.
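>> 
>> For example, a sketch only (paths and machinefiles copied from your 
>> Task1-Task3 above; the .time1_N output names are just illustrative), if 
>> the three tasks are to run concurrently as in your script:
>> 
>> #!/bin/bash
>> cd $PBS_O_WORKDIR
>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machine1 \
>>     /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def > .time1_1 &
>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 4 -machinefile .machine2 \
>>     /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_2.def > .time1_2 &
>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machine3 \
>>     /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_3.def > .time1_3 &
>> wait    # let all three run concurrently, then wait for completion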
>> 
>> Actually, it would be pretty much identical to the script I use when doing 
>> scaling tests...
>> 
>> 
>>> 
>>> ----------
>>> # Script to replace ssh by pbsdsh
>>> # Beta version, April 2011, L. D. Marks
>>> #
>>> # Remove old file -- needed !
>>> rm -f $PBS_O_WORKDIR/.tmp_$1
>>> 
>>> # Create a script that exports the environment we have
>>> # This may not be enough
>>> echo '#!/bin/bash'                             > $PBS_O_WORKDIR/.tmp_$1
>>> # (the '#!' must be quoted, or bash treats the rest of the line as a
>>> # comment and the file is never created)
>>> echo source $HOME/.bashrc                       >> $PBS_O_WORKDIR/.tmp_$1
>>> echo cd $PBS_O_WORKDIR                          >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PATH=$PBS_O_PATH                    >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export TMPDIR=$TMPDIR                      >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export SCRATCH=$SCRATCH                    >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export LD_LIBRARY_PATH=$LD_LIBRARY_PATH    >> $PBS_O_WORKDIR/.tmp_$1
>>> 
>>> # Open MPI needs to have this defined, even if we don't use it
>>> echo export PBS_NODEFILE=$PBS_NODEFILE >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_ENVIRONMENT=$PBS_ENVIRONMENT    >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_JOBCOOKIE=$PBS_JOBCOOKIE        >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_JOBID=$PBS_JOBID                >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_JOBNAME=$PBS_JOBNAME            >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_MOMPORT=$PBS_MOMPORT            >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_NODENUM=$PBS_NODENUM            >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_HOME=$PBS_O_HOME              >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_HOST=$PBS_O_HOST              >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_LANG=$PBS_O_LANG              >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_LOGNAME=$PBS_O_LOGNAME        >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_MAIL=$PBS_O_MAIL              >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_PATH=$PBS_O_PATH              >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_QUEUE=$PBS_O_QUEUE            >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_SHELL=$PBS_O_SHELL            >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_WORKDIR=$PBS_O_WORKDIR        >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_QUEUE=$PBS_QUEUE                >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_TASKNUM=$PBS_TASKNUM            >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_VNODENUM=$PBS_VNODENUM          >> $PBS_O_WORKDIR/.tmp_$1
>>> 
>>> # Now the command we want to run
>>> echo $2 >> $PBS_O_WORKDIR/.tmp_$1
>>> 
>>> # Make it executable
>>> chmod a+x $PBS_O_WORKDIR/.tmp_$1
>>> 
>>> pbsdsh -h $1 /bin/bash -lc " $PBS_O_WORKDIR/.tmp_$1  "
>>> 
>>> #Cleanup if needed (commented out for debugging)
>>> #rm $PBS_O_WORKDIR/.tmp_$1
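>>> 
>>> # Note: instead of one echo per variable, the whole export block above
>>> # could be generated by letting bash quote the values itself (a sketch,
>>> # untested here; %q also protects values that contain spaces):
>>> #
>>> #   for v in PATH LD_LIBRARY_PATH TMPDIR SCRATCH $(compgen -v PBS_); do
>>> #       printf 'export %s=%q\n' "$v" "${!v}" >> $PBS_O_WORKDIR/.tmp_$1
>>> #   done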
>>> 
>>> 
>>> On Sat, Apr 2, 2011 at 9:36 PM, Laurence Marks <l-ma...@northwestern.edu> 
>>> wrote:
>>>> I have a problem which may or may not be an Open MPI issue, but since
>>>> this list was helpful before with a race condition, I am posting here.
>>>> 
>>>> I am trying to use pbsdsh as an ssh replacement, pushed by the sysadmins
>>>> because Torque does not know about ssh tasks launched from within a job.
>>>> In a simple case, a script launches three mpi tasks in parallel:
>>>> 
>>>> Task1: NodeA
>>>> Task2: NodeB and NodeC
>>>> Task3: NodeD
>>>> 
>>>> (some cores on each, all handled correctly). Reproducibly (though with
>>>> different nodes and numbers of cores), Task1 and Task3 work fine, and the
>>>> mpi task starts on NodeB, but nothing starts on NodeC; it appears that
>>>> NodeC does not communicate. It does not have to be this exact layout; it
>>>> could be
>>>> 
>>>> Task1: NodeA NodeB
>>>> Task2: NodeC NodeD
>>>> 
>>>> Here NodeC will start, but it looks as if NodeD never starts anything.
>>>> I've also run it with 4 tasks (1, 3, and 4 work), and if Task2 only uses
>>>> one node (the number of cores does not matter), it is fine.
>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Laurence Marks
>>> Department of Materials Science and Engineering
>>> MSE Rm 2036 Cook Hall
>>> 2220 N Campus Drive
>>> Northwestern University
>>> Evanston, IL 60208, USA
>>> Tel: (847) 491-3996 Fax: (847) 491-7820
>>> email: L-marks at northwestern dot edu
>>> Web: www.numis.northwestern.edu
>>> Chair, Commission on Electron Crystallography of IUCR
>>> www.numis.northwestern.edu/
>>> Research is to see what everybody else has seen, and to think what
>>> nobody else has thought
>>> Albert Szent-Györgyi
>>> 
>> 
> 

