Hello Shamim,
The error says that you're calling MPI with the wrong parameters,
specifically -npernode. Since you're using Slurm, MPI should be smart
enough that you don't need to pass -n or -npernode. How did you get a
runscript and submitscript for this machine? Did you create them yourself?
--Steve
On 5/1/2024 6:54 AM, Shamim Haque 1910511 wrote:
Hi all,
I am attempting an ETK installation on the KALINGA cluster at NISER, India.
This cluster has 40 procs per node and uses the SLURM workload manager.
I compiled ETK with gcc-7.5 and openmpi-4.0.5 (the machinefile,
optionlist, submitscript and runscript are attached). The installation
seems to work, since I can run the test parfiles for TOV stars and BNS mergers.
I tried to run a simulation with procs=160 (nodes 4) and num-threads=1
but got this error (error file also attached):
+ mpiexec -n 640 -npernode 40.0 /home/kamal/simulations/dx25_r500_rg7_t30_p640-1_2/SIMFACTORY/exe/cactus_sim -L 3 /home/kamal/simulations/dx25_r500_rg7_t30_p640-1_2/output-0000/eos20_dx25_r500_rg7.par
----------------------------------------------------------------------------
Open MPI has detected that a parameter given to a command line
option does not match the expected format:
Option: npernode
Param: 40.0
This is frequently caused by omitting to provide the parameter
to an option that requires one. Please check the command line and try
again.
----------------------------------------------------------------------------
Strangely, this error is intermittent. Most of the time it does not
appear and the simulation runs fine (with no changes to the scripts or
the simfactory command). In fact, this exact simulation has run fine
before. Since I cannot find the source of the issue, I am also unable to
reproduce the error deliberately, but it does occur occasionally.
My MPI command in the runscript looks like this:
time mpiexec -n @NUM_PROCS@ -npernode @(@PPN_USED@ / @NUM_THREADS@)@ @EXECUTABLE@ -L 3 @PARFILE@
If I replace @(@PPN_USED@ / @NUM_THREADS@)@ with a hard-coded value,
the script always works. My simfactory command looks like this:
./simfactory/bin/sim create-submit dx25_r500_rg7_t30_p640-1_2 --parfile=par-smooth/scale_test/eos20_dx25_r500_rg7.par --queue=large1 --procs=640 --num-threads=1 --walltime=00:45:00
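One possible cause worth checking, offered as a sketch rather than a confirmed diagnosis: assuming simfactory expands @(...)@ by evaluating the enclosed text as a Python expression, "/" is true division under Python 3, so 40 / 1 becomes 40.0, which is exactly the value Open MPI rejects for -npernode. The toy expander below is hypothetical (it is not simfactory's own code). It reproduces the failing command line and shows what floor division "//" would produce instead.

```python
import re


def expand(template, values):
    """Hypothetical toy expander, not simfactory's implementation:
    substitute @NAME@ placeholders, then evaluate each @( ... )@
    as a Python expression (mirroring the assumed behaviour)."""
    for name, value in values.items():
        template = template.replace("@" + name + "@", str(value))
    # eval() is acceptable here only because the input is our own fixed string.
    return re.sub(r"@\((.*?)\)@", lambda m: str(eval(m.group(1))), template)


# Values from the failing run: 640 procs, 40 procs per node, 1 thread.
VALUES = {"NUM_PROCS": 640, "PPN_USED": 40, "NUM_THREADS": 1}

# Current runscript: "/" is true division in Python 3 and yields a float.
TRUE_DIV = "mpiexec -n @NUM_PROCS@ -npernode @(@PPN_USED@ / @NUM_THREADS@)@"
# Candidate change: "//" is floor division and yields an int for int operands.
FLOOR_DIV = "mpiexec -n @NUM_PROCS@ -npernode @(@PPN_USED@ // @NUM_THREADS@)@"

if __name__ == "__main__":
    print(expand(TRUE_DIV, VALUES))   # mpiexec -n 640 -npernode 40.0
    print(expand(FLOOR_DIV, VALUES))  # mpiexec -n 640 -npernode 40
```

Under Python 2, 40 / 1 evaluates to the integer 40, so if different submissions happened to run simfactory under different Python interpreters, that could explain why the error appears only occasionally; this is an unverified guess.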
I don't understand how to solve this issue, so any help would be
appreciated. Please let me know if you need more information.
Thank you.
Regards
Shamim Haque
Senior Research Fellow (SRF)
Department of Physics
IISER Bhopal
_______________________________________________
Users mailing list
Users@einsteintoolkit.org
http://lists.einsteintoolkit.org/mailman/listinfo/users