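Anthony,

This sounds as if the environment variable OMP_NUM_THREADS was not sent to the second node. The run script exports it only in the shell on the node where you submit; passing it on to the processes started on other nodes is the job of the mpirun command. You might need to give mpirun a particular option so that it forwards the variable.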
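A minimal sketch of what such an option could look like, assuming Open MPI (as stated in the quoted message): its mpirun accepts -x VARNAME to export a variable from the launching shell to the processes it starts on other hosts. Without it, the remote process may never see OMP_NUM_THREADS, and OpenMP runtimes typically then default to one thread per hardware thread, which would be consistent with the 16 threads seen on the second node. The value 1 and the names MPI_ENV_FLAGS and CMD are illustrative assumptions, not part of simfactory; the @...@ tokens are simfactory placeholders left unexpanded. The script only prints the command, so it can be tried without a cluster.

```shell
#!/bin/sh
# Value simfactory would substitute for @NUM_THREADS@
# (1 in the quoted submit command; an assumption for this sketch).
export OMP_NUM_THREADS=1

# Open MPI's mpirun forwards a variable to remote processes when it is named with -x.
# MPI_ENV_FLAGS is a hypothetical helper variable used only in this sketch.
MPI_ENV_FLAGS="-x OMP_NUM_THREADS"

# Build the mpirun line from the quoted run script with the extra flag added.
# It is printed rather than executed; the @...@ placeholders stay unexpanded.
CMD="mpirun --hostfile /home/mpiuser/mpi-hosts ${MPI_ENV_FLAGS} -np @NUM_PROCS@ @EXECUTABLE@ -L 3 @PARFILE@"
echo "$CMD"
```

In the run script itself, the corresponding change would be adding -x OMP_NUM_THREADS to the existing mpirun line. Whether the variable actually arrives can be checked by launching a trivial command such as env through mpirun with the same hostfile and looking for OMP_NUM_THREADS in the output from each host.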
-erik

On Thu, Jan 30, 2020 at 22:17 Shoup, Anthony <[email protected]> wrote:

> Hi all,
>
> I am running ETK (2019_10) on a home-built cluster consisting of two nodes
> (8 cores, 16 threads, 64GB 4.3 GHz each). I just finished my second node
> and am trying to run a simulation (BBHMedRes) over both nodes. For starters
> I am just running one process (one thread per process) on each node. When
> I execute my simfactory submit command, I get one process with one thread
> on the node I submitted the simulation on. However, I get one process with
> 16 threads on the second node, which I don't want. When I run on just the
> first node, the number of processes and threads per process I get are just
> what I specify in the simfactory submit command. If I submit the
> simulation on the second node and just run on the second node, I get
> processes/threads just what I specify in the simfactory submit command. It's
> only when I run on multiple nodes that I don't get the # of processes/threads
> that I specify. Is there something I am doing wrong? I am using OpenMPI.
>
> Thanks for any help, Tony...
>
> Relevant data is:
>
> 1. RunScript:
>
> #!/bin/sh
>
> # This runscript is used internally by simfactory as a template during the
> # sim setup and sim setup-silent commands
> # Edit at your own risk
>
> echo "Preparing:"
> set -x # Output commands
> set -e # Abort on errors
>
> cd @RUNDIR@-active
>
> echo "Checking:"
> pwd
> hostname
> date
>
> echo "Environment:"
> export CACTUS_NUM_PROCS=@NUM_PROCS@
> export CACTUS_NUM_THREADS=@NUM_THREADS@
> export GMON_OUT_PREFIX=gmon.out
> export OMP_NUM_THREADS=@NUM_THREADS@
> env | sort > SIMFACTORY/ENVIRONMENT
>
> echo "Starting:"
> export CACTUS_STARTTIME=$(date +%s)
>
> if [ ${CACTUS_NUM_PROCS} = 1 ]; then
>     if [ @RUNDEBUG@ -eq 0 ]; then
>         @EXECUTABLE@ -L 3 @PARFILE@
>     else
>         gdb --args @EXECUTABLE@ -L 3 @PARFILE@
>     fi
> else
>     mpirun --hostfile /home/mpiuser/mpi-hosts -np @NUM_PROCS@ @EXECUTABLE@ -L 3 @PARFILE@
> fi
>
> echo "Stopping:"
> date
> echo "Done."
>
> 2. mpi-hosts file:
>
> localhost slots=1
> RZNode2 slots=1
>
> 3. simfactory submit command:
>
> ./simfactory/bin/sim submit BBHMedRes --parfile=par/BBHMedRes.par --procs=2 --num-smt=1 --num-threads=1 --ppn-used=1 --ppn=1 --walltime=99:0:0 | cat
>
> 4. Machine file on first node (RZNode1):
>
> [RZNode1]
>
> # This machine description file is used internally by simfactory as a template
> # during the sim setup and sim setup-silent commands
> # Edit at your own risk
> # Machine description
> nickname = RZNode1
> name = RZNode1
> location = somewhere
> description = Whatever
> status = personal
>
> # Access to this machine
> hostname = RZNode1
> aliaspattern = ^generic\.some\.where$
>
> # Source tree management
> sourcebasedir = /home/Cactus
> optionlist = generic.cfg
> submitscript = generic.sub
> runscript = generic.run
> make = make -j@MAKEJOBS@
> basedir = /home/mpiuser/simulations
> ppn = 1 # was 16
> max-num-threads = 1 # was 16
> num-threads = 1 # was 16
> nodes = 2
> submit = exec nohup @SCRIPTFILE@ < /dev/null > @RUNDIR@/@[email protected] 2> @RUNDIR@/@[email protected] & echo $!
> getstatus = ps @JOB_ID@
> stop = kill @JOB_ID@
> submitpattern = (.*)
> statuspattern = "^ *@JOB_ID@ "
> queuedpattern = $^
> runningpattern = ^
> holdingpattern = $^
> exechost = echo localhost
> exechostpattern = (.*)
> stdout = cat @[email protected]
> stderr = cat @[email protected]
> stdout-follow = tail -n 100 -f @[email protected] @[email protected]
>
> 5. Machine file on second node (RZNode2):
>
> [RZNode2]
>
> # This machine description file is used internally by simfactory as a template
> # during the sim setup and sim setup-silent commands
> # Edit at your own risk
> # Machine description
> nickname = RZNode2
> name = RZNode2
> location = somewhere
> description = Whatever
> status = personal
>
> # Access to this machine
> hostname = RZNode2
> aliaspattern = ^generic\.some\.where$
>
> # Source tree management
> sourcebasedir = /home/ET_2019_10
> optionlist = generic.cfg
> submitscript = generic.sub
> runscript = generic.run
> make = make -j@MAKEJOBS@
> basedir = /home/mpiuser/simulations
> ppn = 1
> max-num-threads = 1
> num-threads = 1
> nodes = 1
> submit = exec nohup @SCRIPTFILE@ < /dev/null > @RUNDIR@/@[email protected] 2> @RUNDIR@/@[email protected] & echo $!
> getstatus = ps @JOB_ID@
> stop = kill @JOB_ID@
> submitpattern = (.*)
> statuspattern = "^ *@JOB_ID@ "
> queuedpattern = $^
> runningpattern = ^
> holdingpattern = $^
> exechost = echo localhost
> exechostpattern = (.*)
> stdout = cat @[email protected]
> stderr = cat @[email protected]
> stdout-follow = tail -n 100 -f @[email protected] @[email protected]
>
> *Anthony Shoup* PhD, Senior Lecturer
> College of Arts & Sciences, College of Engineering Departments of
> Physics, Astronomy, EEIC
> 315 Science Bldg. | 4250 Campus Dr. Lima, OH 45807
> 419-995-8018 Office | 419-516-2257 Mobile
> *[email protected]*
> <https://email.osu.edu/owa/redir.aspx?C=j5WpnJiBk0W5oVlCbtvB-xiCkA_lbdEIi9hlk7ByHiG7ARrxjwDFmAW8S_XespJbMLJRblY5JKc.&URL=mailto%3ashoup.31%40osu.edu>
> *osu.edu*
> <https://email.osu.edu/owa/redir.aspx?C=j5WpnJiBk0W5oVlCbtvB-xiCkA_lbdEIi9hlk7ByHiG7ARrxjwDFmAW8S_XespJbMLJRblY5JKc.&URL=http%3a%2f%2fosu.edu>
> _______________________________________________
> Users mailing list
> [email protected]
> http://lists.einsteintoolkit.org/mailman/listinfo/users

--
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
