Hello Maya,

Sorry for the long delay in responding.
> My name is Maya Baireddy. I am working on my school research project trying
> to run the simulation of BNS merger on amarel supercomputer from my local
> university.

Thanks for contacting me. I will try and see if I can provide some help.

> Could you please help me to start my simulation on SLURM. I have followed
> the ETK gallery example for BNS simulation steps 1-5. But I am not able to
> proceed to successfully create a machine to run the simulation.
>
> I run the following steps
> /home/sb1554/BNS/simfactory/bin/sim create bns --parfile
> /home/sb1554/BNS/bns.par --machine slurmbns
>
> srun bns.sh -o slurm.bns.%N.%j.out

Looks ok to me.

> and got the error:
>
> *** An error occurred in MPI_Init_thread
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)

With errors like this my first guess is usually that the MPI stack used to
compile and the one used at runtime are not the same. SLURM in particular can
be dicey in that respect since it can try to interface with MPI directly.

> I am attaching my machine, submit script, run script, log files. Thank you.
> I would appreciate any pointers from you. Or if you could point me to the
> right person.
>
> I was trying to post this on EKT forum, but need one credit to post.

To the Gitter chat? I thought that was open to all with no requirements?
This one: https://gitter.im/EinsteinToolkit/EinsteinToolkit

The mailing list users@einsteintoolkit.org was unavailable for a while due to
required maintenance. You would have to sign up to post; otherwise the post
will be held for "moderator approval" (which it will receive, it just may
take a little bit of time).

Now for the actual question. The machine ini file (slurmbns.ini-machine.txt)
looks strange (e.g. it contains two [slurmbns] sections). The line

  submit = sbatch /home/sb1554/BNS/simfactory/mdb/runscripts/slurmbns.run

will always make it use the file
"/home/sb1554/BNS/simfactory/mdb/runscripts/slurmbns.run" as the job script
passed to SLURM. The "submit" entry should be just "sbatch" without the extra
file name.

There is no "envsetup" section, so you will have to make sure yourself that
the same modules (in particular MPI modules) are loaded when you compile and
when you submit the job; otherwise SLURM (and srun) may use the wrong MPI
stack.

The run script "slurmbns.ini-runscript.txt" is also strange since it contains
a "/home/sb1554/BNS/simfactory/bin/sim run", which would again call the
runscript. Instead it should contain the "srun" call. The submit script is
also strange since it should end with a line calling "sim run".

It may be best to first directly add the srun command to the SLURM batch file
(the "submitscript", mostly, since it has the SBATCH headers), set the
headers by hand, load the modules, and call srun from there. This should make
it look very similar to an MPI+OpenMP "hybrid" parallelization example submit
script that your cluster admins may provide, along the lines of the sketch
below.
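For concreteness, here is a minimal sketch of such a hand-written batch file.
The partition name, module names, node and thread counts, and the run
directory and executable name are assumptions; you would replace them with
Amarel's actual values and with wherever your Cactus executable ended up:

  #!/bin/bash
  #SBATCH --job-name=bns
  #SBATCH --partition=main            # assumed partition name
  #SBATCH --nodes=2
  #SBATCH --ntasks-per-node=4         # MPI ranks per node
  #SBATCH --cpus-per-task=8           # OpenMP threads per rank
  #SBATCH --time=24:00:00
  #SBATCH --output=slurm.bns.%N.%j.out

  # Load the SAME modules that were loaded when Cactus was compiled,
  # otherwise srun may pick up the wrong MPI stack. The module names
  # here are placeholders; check "module avail" on Amarel.
  module load gcc openmpi

  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

  # "cactus_sim" and the run directory are assumed names; use your
  # actual executable and an empty working directory.
  cd /home/sb1554/BNS/run
  srun /home/sb1554/BNS/exe/cactus_sim -L 3 /home/sb1554/BNS/bns.par

Once that runs, moving the same settings back into the simfactory files is
mostly a matter of splitting them between the machine ini file, the run
script, and the submit script.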
For SLURM based machines, the machine ini files usually look very similar. In
your case I would suggest taking a look at, say, the ones for the Delta
cluster at NCSA:

https://bitbucket.org/simfactory/simfactory2/src/master/mdb/machines/delta.ini
https://bitbucket.org/simfactory/simfactory2/src/master/mdb/optionlists/delta.cfg
https://bitbucket.org/simfactory/simfactory2/src/master/mdb/runscripts/delta.run
https://bitbucket.org/simfactory/simfactory2/src/master/mdb/submitscripts/delta.sub
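Translated to your machine file, the fixes above would make the relevant
entries of slurmbns.ini look roughly like this. This is a sketch only: the
module names in "envsetup" are placeholders that must match whatever you load
when compiling, and delta.ini above shows the full set of entries a machine
file usually carries:

  [slurmbns]
  # keep a single [slurmbns] section; remove the duplicated one
  submit   = sbatch
  # load the same modules at submit/run time as at compile time;
  # module names are placeholders, check "module avail" on Amarel
  envsetup = module load gcc openmpi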
You may also want to call in to the Einstein Toolkit weekly call on Thursday
(the Gitter chat may also work for real time communication).

Yours,
Roland

--
My email is as private as my paper mail. I therefore support encrypting and
signing email messages. Get my PGP key from http://pgp.mit.edu .