On Apr 3, 2011, at 9:12 AM, Reuti wrote:

> On 03.04.2011, at 16:56, Ralph Castain wrote:
>
>> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>>
>>> Let me expand on this slightly (in response to Ralph Castain's posting
>>> -- I had digest mode set). As currently constructed, a shell script in
>>> Wien2k (www.wien2k.at) launches a series of tasks using
>>>
>>> ($remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]") >> .time1_$loop &
>>>
>>> where the standard setting for "remote" is "ssh", remotemachine is the
>>> appropriate host, "t" is "time" and "ttt" is a concatenation of
>>> commands. For instance, when using 2 cores on one node for Task1,
>>> 2 cores on 2 nodes for Task2 and 2 cores on 1 node for Task3:
>>>
>>> Task1:
>>> mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machine1 /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def
>>> Task2:
>>> mpirun -v -x LD_LIBRARY_PATH -x PATH -np 4 -machinefile .machine2 /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_2.def
>>> Task3:
>>> mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machine3 /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_3.def
>>>
>>> This is a stable script; it works under SGI, Linux, MVAPICH and many
>>> others using ssh or rsh (although I've never myself used it with rsh).
>>> It is general purpose, i.e. it will run just 1 task on 8x8 nodes/cores,
>>> or 8 parallel tasks on 8 nodes each with 8 cores, or any scatter of
>>> nodes/cores.
>>>
>>> According to some, ssh is becoming obsolete within supercomputers and
>>> the "replacement" is pbsdsh, at least under Torque.
>>
>> Somebody is playing an April Fools joke on you. The majority of
>> supercomputers use ssh as their sole launch mechanism, and I have seen
>> no indication that anyone intends to change that situation. That said,
>> Torque is certainly popular and a good environment.
>
> I operate my Linux clusters without `ssh` or `rsh`. I use SGE's `qrsh`
> instead. How will you get a tight integration with correct accounting
> and job control otherwise? This might be different when you have an AIX
> or NEC SX machine, as they provide additional control mechanisms.
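The launch line quoted above is compact; as a minimal sketch it amounts to the following (hypothetical hosts and a simplified loop; only the inner command mirrors the quoted Wien2k line, and the qrsh alternative reflects Reuti's SGE setup rather than anything in Wien2k):

----------
#!/bin/bash
# Sketch of the Wien2k-style parallel launch discussed in this thread.
remote="ssh"            # under SGE tight integration one could instead try: remote="qrsh -inherit"
t="time"
loop=1

hosts=(node01 node02 node03)   # hypothetical hosts for Task1..Task3
cmds=("mpirun -np 2 -machinefile .machine1 ./lapw1Q_mpi lapw1Q_1.def"
      "mpirun -np 4 -machinefile .machine2 ./lapw1Q_mpi lapw1Q_2.def"
      "mpirun -np 2 -machinefile .machine3 ./lapw1Q_mpi lapw1Q_3.def")

for p in 0 1 2; do
    remotemachine=${hosts[$p]}
    ttt=${cmds[$p]}
    # run each task remotely, in the background, appending its timing output
    ($remote $remotemachine "cd $PWD; $t $ttt; rm -f .lock_$p") >> .time1_$loop &
done
wait   # block until all parallel tasks have finished
----------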
Like I said, the majority of supercomputers use ssh as their sole -launch-
mechanism. They use a variety of methods for resource management, which is
a separate issue. I'm not arguing the good/bad of any arrangement. We
support quite a few, as you know :-) Just saying that the notion that ssh
is going away isn't supported by the facts.

>
> -- Reuti
>
>
>>> Getting pbsdsh to work is certainly not as simple as the documentation
>>> I've seen suggests. To get it to even partially work, I am using for
>>> "remote" a script "pbsh" which writes HOME, PATH, LD_LIBRARY_PATH etc.,
>>> as well as the PBS environment variables listed at the bottom of
>>> http://www.bear.bham.ac.uk/bluebear/pbsdsh.shtml plus PBS_NODEFILE, to
>>> an executable bash file $PBS_O_WORKDIR/.tmp_$1, followed by the
>>> relevant command, and then runs
>>>
>>> pbsdsh -h $1 /bin/bash -lc " $PBS_O_WORKDIR/.tmp_$1 "
>>>
>>> This works fine so long as Task2 does not span 2 nodes (probably 3 as
>>> well, I've not tested this). If it does, there is a communications
>>> failure and nothing is launched on the 2nd node of Task2.
>>>
>>> I'm including the script below, as maybe there are some other
>>> environment variables needed, or some that should not be there, in
>>> order to properly rebuild the environment so things will work. (And
>>> yes, I know there should be tests to see if the variables are set
>>> first and so forth; this is not so clean, just an initial version.)
>>
>> By providing all those PBS-related envars to OMPI, you are causing OMPI
>> to think it should use Torque as the launch mechanism. Unfortunately,
>> that won't work in this scenario.
>>
>> When you start a Torque job (get an allocation etc.), Torque puts you on
>> one of the allocated nodes and creates a "sister mom" on that node. This
>> is your job's "master node". All Torque-based launches must take place
>> from that location.
>>
>> So when you pbsdsh to another node and attempt to execute mpirun with
>> those envars set, mpirun attempts to contact the local "sister mom" so
>> it can order the launch of any daemons on other nodes....only the
>> "sister mom" isn't there! So the connection fails and mpirun aborts.
>>
>> If mpirun is -only- launching procs on the local node, then it doesn't
>> need to launch another daemon (as mpirun will host the local procs
>> itself), and so it doesn't attempt to contact the "sister mom" and the
>> comm failure doesn't occur.
>>
>> What I still don't understand is why you are trying to do it this way.
>> Why not just run
>>
>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def
>>
>> where machineN contains the names of the nodes where you want the MPI
>> apps to execute? mpirun will only execute apps on those nodes, so this
>> accomplishes the same thing as your script - only with a lot less pain.
>>
>> Your script would just contain a sequence of these commands, each with
>> its number of procs and machinefile as required.
>>
>> Actually, it would be pretty much identical to the script I use when
>> doing scaling tests...
>>
>>
>>>
>>> ----------
>>> # Script to replace ssh by pbsdsh
>>> # Beta version, April 2011, L. D. Marks
>>> #
>>> # Remove old file -- needed!
>>> rm -f $PBS_O_WORKDIR/.tmp_$1
>>>
>>> # Create a script that exports the environment we have
>>> # This may not be enough
>>> echo '#!/bin/bash' > $PBS_O_WORKDIR/.tmp_$1
>>> echo source $HOME/.bashrc >> $PBS_O_WORKDIR/.tmp_$1
>>> echo cd $PBS_O_WORKDIR >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PATH=$PBS_O_PATH >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export TMPDIR=$TMPDIR >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export SCRATCH=$SCRATCH >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export LD_LIBRARY_PATH=$LD_LIBRARY_PATH >> $PBS_O_WORKDIR/.tmp_$1
>>>
>>> # Open MPI needs to have this defined, even if we don't use it
>>> echo export PBS_NODEFILE=$PBS_NODEFILE >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_ENVIRONMENT=$PBS_ENVIRONMENT >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_JOBCOOKIE=$PBS_JOBCOOKIE >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_JOBID=$PBS_JOBID >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_JOBNAME=$PBS_JOBNAME >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_MOMPORT=$PBS_MOMPORT >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_NODENUM=$PBS_NODENUM >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_HOME=$PBS_O_HOME >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_HOST=$PBS_O_HOST >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_LANG=$PBS_O_LANG >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_LOGNAME=$PBS_O_LOGNAME >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_MAIL=$PBS_O_MAIL >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_PATH=$PBS_O_PATH >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_QUEUE=$PBS_O_QUEUE >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_SHELL=$PBS_O_SHELL >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_O_WORKDIR=$PBS_O_WORKDIR >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_QUEUE=$PBS_QUEUE >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_TASKNUM=$PBS_TASKNUM >> $PBS_O_WORKDIR/.tmp_$1
>>> echo export PBS_VNODENUM=$PBS_VNODENUM >> $PBS_O_WORKDIR/.tmp_$1
>>>
>>> # Now the command we want to run
>>> echo $2 >> $PBS_O_WORKDIR/.tmp_$1
>>>
>>> # Make it executable
>>> chmod a+x $PBS_O_WORKDIR/.tmp_$1
>>>
>>> pbsdsh -h $1 /bin/bash -lc " $PBS_O_WORKDIR/.tmp_$1 "
>>>
>>> # Cleanup if needed (commented out for debugging)
>>> #rm $PBS_O_WORKDIR/.tmp_$1
>>>
>>>
>>> On Sat, Apr 2, 2011 at 9:36 PM, Laurence Marks <l-ma...@northwestern.edu> wrote:
>>>> I have a problem which may or may not be openmpi-related, but since
>>>> this list was useful before with a race condition I am posting.
>>>>
>>>> I am trying to use pbsdsh as an ssh replacement, pushed by the
>>>> sysadmins because Torque does not know about ssh tasks launched from
>>>> a task. In a simple case, a script launches three mpi tasks in
>>>> parallel,
>>>>
>>>> Task1: NodeA
>>>> Task2: NodeB and NodeC
>>>> Task3: NodeD
>>>>
>>>> (some cores on each, all handled correctly). Reproducibly (but with
>>>> different nodes and numbers of cores), Task1 and Task3 work fine; the
>>>> mpi task starts on NodeB but nothing starts on NodeC, and it appears
>>>> that NodeC does not communicate. It does not have to be this exact
>>>> layout; it could be
>>>>
>>>> Task1: NodeA NodeB
>>>> Task2: NodeC NodeD
>>>>
>>>> Here NodeC will start and it looks as if NodeD never starts anything.
>>>> I've also run it with 4 tasks (1, 3 and 4 work), and if Task2 only
>>>> uses one node (the number of cores does not matter) it is fine.
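For comparison with the pbsh wrapper above, here is a minimal sketch of the alternative Ralph describes: a job script containing just a sequence of mpirun invocations with per-task machinefiles, all launched from the job's master node with no pbsdsh or ssh wrapper. The mpirun options, machinefile names and lapw1Q_mpi paths are taken from the thread; the PBS directives, redirections and host layout are illustrative assumptions.

----------
#!/bin/bash
#PBS -l nodes=4:ppn=2       # illustrative resource request
#PBS -N lapw1_parallel
cd $PBS_O_WORKDIR

# Per-task machinefiles name the hosts each mpirun should use, e.g.
#   .machine1: NodeA (x2)   .machine2: NodeB (x2), NodeC (x2)   .machine3: NodeD (x2)

# Run the three tasks concurrently from the master node; Open MPI places the
# ranks on the hosts named in each machinefile. They could equally be run one
# after the other by dropping the trailing '&' and the 'wait'.
(time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machine1 \
    /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def) >> .time1_1 2>&1 &
(time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 4 -machinefile .machine2 \
    /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_2.def) >> .time1_2 2>&1 &
(time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machine3 \
    /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_3.def) >> .time1_3 2>&1 &
wait   # block until all three tasks have completed
----------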
>>>>
>>>> --
>>>> Laurence Marks
>>>> Department of Materials Science and Engineering
>>>> MSE Rm 2036 Cook Hall
>>>> 2220 N Campus Drive
>>>> Northwestern University
>>>> Evanston, IL 60208, USA
>>>> Tel: (847) 491-3996  Fax: (847) 491-7820
>>>> email: L-marks at northwestern dot edu
>>>> Web: www.numis.northwestern.edu
>>>> Chair, Commission on Electron Crystallography of IUCR
>>>> www.numis.northwestern.edu/
>>>> Research is to see what everybody else has seen, and to think what
>>>> nobody else has thought
>>>> -- Albert Szent-Györgyi
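One loose end implicit in the discussion: the .machineN files have to name hosts that were actually allocated to the job. A minimal sketch of generating them from $PBS_NODEFILE inside the job script, splitting the allocated slots 2/4/2 to match the Task1-Task3 example earlier in the thread (the helper itself is illustrative and is not part of Wien2k or of any script posted here):

----------
#!/bin/bash
# Illustrative helper: split the Torque-provided slot list into the
# per-task machinefiles used by the mpirun commands above.
cd $PBS_O_WORKDIR

# $PBS_NODEFILE lists one host name per allocated slot (one per line),
# e.g. NodeA NodeA NodeB NodeB NodeC NodeC NodeD NodeD.
slots=($(cat "$PBS_NODEFILE"))

printf '%s\n' "${slots[@]:0:2}" > .machine1   # 2 slots for Task1
printf '%s\n' "${slots[@]:2:4}" > .machine2   # 4 slots for Task2
printf '%s\n' "${slots[@]:6:2}" > .machine3   # 2 slots for Task3
----------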