Hi Ian and Erik,

Setting export OMP_NUM_THREADS=1 did the trick! I'm now up and running.
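In case it is useful to anyone searching the archive later, the relevant
part of my PBS submission script now looks roughly like the following.
The walltime and the executable and parameter file names are just
placeholders for my actual setup, so adjust as appropriate:

#!/bin/bash
#PBS -l nodes=1:ppn=4
#PBS -l walltime=12:00:00

cd $PBS_O_WORKDIR
cat $PBS_NODEFILE

# One OpenMP thread per MPI process, four MPI processes in total.
export OMP_NUM_THREADS=1
# Optional cross-check so Carpet aborts if the counts don't match.
export CACTUS_NUM_THREADS=1
export CACTUS_NUM_PROCS=4

mpirun -np 4 ./exe/cactus_sim bbh.par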
Thank you very much for helping me out!

Gwyneth

On Sun, Feb 5, 2017 at 9:24 PM, Ian Hinder <[email protected]> wrote:

> On 5 Feb 2017, at 18:09, Gwyneth Allwright <[email protected]> wrote:
>
> > Hi Ian and Erik,
> >
> > Thank you very much for all the advice and pointers so far!
> >
> > I didn't compile the ET myself; it was done by an HPC engineer. He is
> > unfamiliar with Cactus and started off not using a config file, so he
> > had to troubleshoot his way through the compilation process. We are
> > both scratching our heads about what the issue with mpirun could be.
> >
> > I suspect he didn't set MPI_DIR, so I'm going to suggest that he fixes
> > that and see if recompiling takes care of things.
> >
> > The scheduler automatically terminates jobs that run on too many
> > processors. For my simulation, this appears to happen as soon as
> > TwoPunctures starts generating the initial data. I then get error
> > messages of the form: "Job terminated as it used more cores (17.6)
> > than requested (4)." (I switched from requesting 3 processors to
> > requesting 4.) The number of cores it tries to use appears to differ
> > from run to run.
> >
> > The parameter file uses Carpet. It generates the following output
> > (when I request 4 processors):
> >
> > INFO (Carpet): MPI is enabled
> > INFO (Carpet): Carpet is running on 4 processes
> > INFO (Carpet): This is process 0
> > INFO (Carpet): OpenMP is enabled
> > INFO (Carpet): This process contains 16 threads, this is thread 0
> > INFO (Carpet): There are 64 threads in total
> > INFO (Carpet): There are 16 threads per process
>
> It looks like mpirun has started the 4 processes that you asked for,
> and each of those processes has started 16 threads. The ET uses OpenMP
> threads by default. You need to set the environment variable
> OMP_NUM_THREADS to the number of threads you want per process. If you
> just want 4 MPI processes, each with one thread, then you can try
> putting
>
> export OMP_NUM_THREADS=1
>
> before your mpirun command. On Linux, OMP_NUM_THREADS defaults to the
> number of "hardware threads" in the system (which will likely be the
> number of cores multiplied by 2, if hyperthreading is enabled). So a
> single process that supports OpenMP will use all the cores available.
> If you want to have more than one MPI process using OpenMP on the same
> node, you will have to restrict the number of threads per process.
>
> Carpet has a couple of environment variables which it uses to
> cross-check that you have the number of MPI processes and threads that
> you were expecting. To help with debugging, you can set
>
> export CACTUS_NUM_THREADS=1
> export CACTUS_NUM_PROCS=4
>
> if you want 4 processes with one thread each. This won't affect the
> number of threads or processes, but it will allow Carpet to check that
> what you intended matches reality. In this case, it should abort with
> an error (or in older versions of Carpet, output a warning), since
> while you have 4 processes, each one has 16 threads, not 1.
>
> > Mpirun gives me the following information for the node allocation:
> > slots=4, max_slots=0, slots_inuse=0, state=UP.
> >
> > The tree view of the processes looks like this:
> >
> >   PID TTY   STAT   TIME COMMAND
> > 19503 ?     S      0:00 sshd: allgwy001@pts/7
> > 19504 pts/7 Ss     0:00  \_ -bash
> >  6047 pts/7 R+     0:00      \_ ps -u allgwy001 f
>
> This is not showing the Cactus or mpirun process at all; something is
> wrong. Was Cactus running when you typed this? Were you logged in to
> the node that it was running on?
> > Adding "cat $PBS_NODEFILE" to my PBS script didn't seem to produce > anything, although I could be doing something stupid. I'm very new to the > syntax! > > > That's odd. > > -- > Ian Hinder > http://members.aei.mpg.de/ianhin > > Disclaimer - University of Cape Town This e-mail is subject to UCT > policies and e-mail disclaimer published on our website at > http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from +27 > 21 650 9111 <+27%2021%20650%209111>. If this e-mail is not related to the > business of UCT, it is sent by the sender in an individual capacity. Please > report security incidents or abuse via [email protected] >
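P.S. Just to check my understanding of the hybrid case: if I later want
each MPI process to run several OpenMP threads on the same node, I gather
the idea is to keep processes times threads equal to the number of cores
I request from the scheduler. For example, assuming the node really does
have 16 cores available and requested (and again with a placeholder
executable and parameter file), something like:

# 4 MPI processes x 4 OpenMP threads each = 16 cores on one node.
export OMP_NUM_THREADS=4
export CACTUS_NUM_THREADS=4
export CACTUS_NUM_PROCS=4

mpirun -np 4 ./exe/cactus_sim bbh.par

Please correct me if that's not right.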
