Hi Erik,

Thanks for the info. I will try that.

Tony…
From: Erik Schnetter <[email protected]>
Sent: Friday, January 31, 2020 9:01 AM
To: Shoup, Anthony <[email protected]>
Cc: Einstein Toolkit Users <[email protected]>
Subject: Re: [Users] Getting too many threads per process started on "remote" nodes

Anthony

This sounds as if the environment variable OMP_NUM_THREADS was not sent to the second node. This would be the fault of the mpirun command. You might need to use a particular option.

-erik

On Thu, Jan 30, 2020 at 22:17 Shoup, Anthony <[email protected]> wrote:

Hi all,

I am running ETK (2019_10) on a home-built cluster consisting of two nodes (8 cores, 16 threads, 64 GB, 4.3 GHz each). I just finished my second node and am trying to run a simulation (BBHMedRes) over both nodes. For starters I am just running one process (one thread per process) on each node. When I execute my simfactory submit command, I get one process with one thread on the node I submitted the simulation on. However, I get one process with 16 threads on the second node, which I don't want. When I run on just the first node, the number of processes and threads per process I get is exactly what I specify in the simfactory submit command. If I submit the simulation on the second node and run only on the second node, I also get the processes/threads I specify in the simfactory submit command. It's only when I run on multiple nodes that I don't get the number of processes/threads that I specify. Is there something I am doing wrong? I am using OpenMPI.

Thanks for any help,
Tony...

Relevant data is:

1. RunScript:

#!/bin/sh

# This runscript is used internally by simfactory as a template during the
# sim setup and sim setup-silent commands
# Edit at your own risk

echo "Preparing:"
set -x                          # Output commands
set -e                          # Abort on errors

cd @RUNDIR@-active

echo "Checking:"
pwd
hostname
date

echo "Environment:"
export CACTUS_NUM_PROCS=@NUM_PROCS@
export CACTUS_NUM_THREADS=@NUM_THREADS@
export GMON_OUT_PREFIX=gmon.out
export OMP_NUM_THREADS=@NUM_THREADS@
env | sort > SIMFACTORY/ENVIRONMENT

echo "Starting:"
export CACTUS_STARTTIME=$(date +%s)

if [ ${CACTUS_NUM_PROCS} = 1 ]; then
    if [ @RUNDEBUG@ -eq 0 ]; then
        @EXECUTABLE@ -L 3 @PARFILE@
    else
        gdb --args @EXECUTABLE@ -L 3 @PARFILE@
    fi
else
    mpirun --hostfile /home/mpiuser/mpi-hosts -np @NUM_PROCS@ @EXECUTABLE@ -L 3 @PARFILE@
fi

echo "Stopping:"
date

echo "Done."

2. mpi-hosts file:

localhost slots=1
RZNode2 slots=1

3. simfactory submit command:

./simfactory/bin/sim submit BBHMedRes --parfile=par/BBHMedRes.par --procs=2 --num-smt=1 --num-threads=1 --ppn-used=1 --ppn=1 --walltime=99:0:0 | cat

4. Machine file on first node (RZNode1):

[RZNode1]

# This machine description file is used internally by simfactory as a template
# during the sim setup and sim setup-silent commands
# Edit at your own risk

# Machine description
nickname = RZNode1
name = RZNode1
location = somewhere
description = Whatever
status = personal

# Access to this machine
hostname = RZNode1
aliaspattern = ^generic\.some\.where$

# Source tree management
sourcebasedir = /home/Cactus
optionlist = generic.cfg
submitscript = generic.sub
runscript = generic.run
make = make -j@MAKEJOBS@
basedir = /home/mpiuser/simulations
ppn = 1 # was 16
max-num-threads = 1 # was 16
num-threads = 1 # was 16
nodes = 2
submit = exec nohup @SCRIPTFILE@ < /dev/null > @RUNDIR@/@SIMULATION_NAME@.out 2> @RUNDIR@/@SIMULATION_NAME@.err & echo $!
getstatus = ps @JOB_ID@
stop = kill @JOB_ID@
submitpattern = (.*)
statuspattern = "^ *@JOB_ID@ "
queuedpattern = $^
runningpattern = ^
holdingpattern = $^
exechost = echo localhost
exechostpattern = (.*)
stdout = cat @SIMULATION_NAME@.out
stderr = cat @SIMULATION_NAME@.err
stdout-follow = tail -n 100 -f @SIMULATION_NAME@.out @SIMULATION_NAME@.err

5. Machine file on second node (RZNode2):

[RZNode2]

# This machine description file is used internally by simfactory as a template
# during the sim setup and sim setup-silent commands
# Edit at your own risk

# Machine description
nickname = RZNode2
name = RZNode2
location = somewhere
description = Whatever
status = personal

# Access to this machine
hostname = RZNode2
aliaspattern = ^generic\.some\.where$

# Source tree management
sourcebasedir = /home/ET_2019_10
optionlist = generic.cfg
submitscript = generic.sub
runscript = generic.run
make = make -j@MAKEJOBS@
basedir = /home/mpiuser/simulations
ppn = 1
max-num-threads = 1
num-threads = 1
nodes = 1
submit = exec nohup @SCRIPTFILE@ < /dev/null > @RUNDIR@/@SIMULATION_NAME@.out 2> @RUNDIR@/@SIMULATION_NAME@.err & echo $!
getstatus = ps @JOB_ID@
stop = kill @JOB_ID@
submitpattern = (.*)
statuspattern = "^ *@JOB_ID@ "
queuedpattern = $^
runningpattern = ^
holdingpattern = $^
exechost = echo localhost
exechostpattern = (.*)
stdout = cat @SIMULATION_NAME@.out
stderr = cat @SIMULATION_NAME@.err
stdout-follow = tail -n 100 -f @SIMULATION_NAME@.out @SIMULATION_NAME@.err

Anthony Shoup PhD, Senior Lecturer
College of Arts & Sciences, College of Engineering
Departments of Physics, Astronomy, EEIC
315 Science Bldg. | 4250 Campus Dr.
Lima, OH 45807
419-995-8018 Office | 419-516-2257 Mobile
[email protected]
osu.edu

_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users

--
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
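One way to check Erik's diagnosis before changing the runscript is sketched below, assuming Open MPI's mpirun (which Tony reports using). The helper script report-threads.sh and its function report_threads are hypothetical, not part of simfactory or the Einstein Toolkit. Each process prints its node name, its Open MPI rank (read from OMPI_COMM_WORLD_RANK, which Open MPI sets in every process it launches), and the value of OMP_NUM_THREADS it actually sees.

```shell
#!/bin/sh
# report-threads.sh: hypothetical diagnostic helper (not part of simfactory
# or the Einstein Toolkit). Prints the node name, the Open MPI rank, and the
# OpenMP thread count that this process inherited from its environment.

report_threads() {
    # Open MPI's mpirun sets OMPI_COMM_WORLD_RANK in each launched process;
    # when the script runs outside mpirun it is absent, so fall back to "none".
    rank="${OMPI_COMM_WORLD_RANK:-none}"
    # Print "unset" when OMP_NUM_THREADS is unset or empty, so a variable
    # that did not reach this process stands out in the output.
    threads="${OMP_NUM_THREADS:-unset}"
    # uname -n prints the node name (POSIX), like hostname in the runscript.
    printf 'host=%s rank=%s OMP_NUM_THREADS=%s\n' "$(uname -n)" "$rank" "$threads"
}

report_threads
```

The script could be launched with the same hostfile, for example OMP_NUM_THREADS=1 mpirun --hostfile /home/mpiuser/mpi-hosts -np 2 ./report-threads.sh (it must be executable and present at the same path on both nodes). If the rank on RZNode2 reports OMP_NUM_THREADS=unset, the variable is not being forwarded; with it unset, the OpenMP runtime typically starts one thread per logical CPU, which would be consistent with the 16 threads seen there. Open MPI's -x option exports a named environment variable to the processes it starts, so adding -x OMP_NUM_THREADS to the test command, and correspondingly to the runscript (mpirun --hostfile /home/mpiuser/mpi-hosts -np @NUM_PROCS@ -x OMP_NUM_THREADS @EXECUTABLE@ -L 3 @PARFILE@), is one option worth trying. Other MPI implementations use different options for this.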
