Hi Ian,

Thanks for your quick reply!

A couple of other people here have successfully used the toolkit with this version of OpenMPI, so I'm not sure whether it is the culprit. No other version of OpenMPI is currently available on the cluster, but I could ask for that to be remedied.
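For what it's worth, this is how I checked what is installed (assuming the cluster's environment modules; the exact module names are a guess):

module avail 2>&1 | grep -i mpi
mpirun --version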

I have tried --procs=48 and --num-threads=2, but I see the same problem.
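Concretely, that was the same submit command as before with only the process counts changed (and a new simulation name, since simfactory won't reuse an existing one):

./simfactory/bin/sim submit mctest48 --configuration=mclachlantest_mpidebug --parfile=par/qc0-mclachlan.par --procs=48 --num-threads=2 --walltime=10:0:0 --queue=normal --machine=lengau-intel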

As you suggested, I wiped the configuration and recompiled it from scratch to double-check. I have also tried the May 16 version, but to no avail.

Could something have gone wrong during compilation, even though it finishes successfully with "done"?

Cheers,

Chris


On 05/15/2017 09:10 PM, Ian Hinder wrote:

On 15 May 2017, at 16:49, Chris Stevens <[email protected]> wrote:

Hi there,

I am new to Cactus, and have been having trouble getting the qc0-mclachlan.par test file to run. I have compiled the latest version of Cactus successfully on the CHPC cluster in South Africa.

I have attached the .out and .err files for the run, along with my machine file, optionlist, and run and submit scripts. The submit command was:

./simfactory/bin/sim submit mctest --configuration=mclachlantest_mpidebug --parfile=par/qc0-mclachlan.par --procs=240 --num-threads=12 --walltime=10:0:0 --queue=normal --machine=lengau-intel

Using --mca orte_base_help_aggregate 0 in the mpirun command in the runscript, the error is:

[cnode0823:136405] *** An error occurred in MPI_Comm_create_keyval
[cnode0823:136405] *** reported by process [476512257,3]
[cnode0823:136405] *** on communicator MPI_COMM_WORLD
[cnode0823:136405] *** MPI_ERR_ARG: invalid argument of some other kind
[cnode0823:136405] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[cnode0823:136405] ***    and potentially your MPI job)
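
For reference, the mpirun line in my runscript looks roughly like this (a sketch using the usual simfactory placeholders; my actual runscript may differ slightly):

mpirun --mca orte_base_help_aggregate 0 -np @NUM_PROCS@ @EXECUTABLE@ @PARFILE@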

I unfortunately have no idea where to go from here, and some help would be greatly appreciated! I hope I have attached enough information.

Hi Chris,

Welcome to Cactus!  (meant in a friendly sense, not sarcastic!)

I cannot see anything wrong, and I've never seen this error before. It's a mystery. Have you tried running on fewer MPI processes? I wonder if something is going wrong because the problem size is too small for the number of processes. This *shouldn't* cause a problem, but it's something to try. Is there another MPI implementation, or another version of OpenMPI, available on the machine? Maybe it's a bug in OpenMPI 1.8.8.
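
One thing that would rule Cactus in or out: call MPI_Comm_create_keyval from a minimal standalone program. If that also fails, the MPI installation itself is broken. An untested sketch (note that with the default MPI_ERRORS_ARE_FATAL handler, a failing call will abort with the same message you saw rather than return an error code):

cat > keyval_test.c << 'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int keyval = MPI_KEYVAL_INVALID;
  MPI_Init(&argc, &argv);
  /* The same call that fails inside Cactus */
  if (MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                             &keyval, NULL) == MPI_SUCCESS) {
    printf("MPI_Comm_create_keyval OK (keyval=%d)\n", keyval);
    MPI_Comm_free_keyval(&keyval);
  } else {
    printf("MPI_Comm_create_keyval failed\n");
  }
  MPI_Finalize();
  return 0;
}
EOF
mpicc keyval_test.c -o keyval_test
mpirun -np 4 ./keyval_test

If this aborts with the same MPI_ERR_ARG, the problem is in the MPI stack rather than in Cactus.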

When faced with such strange behaviour, it's always worth wiping the configuration and rebuilding it, just in case it was not built cleanly. For example, while developing the optionlist you might have partially built the configuration, then corrected something, leaving the configuration with a mixture of the two versions. You can wipe it with rm -rf configs/mclachlantest_mpidebug, though I suspect that the debug version was built cleanly, so this is unlikely to be the problem.
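
In full, something like this (the optionlist name is a placeholder; use whatever you originally built with):

rm -rf configs/mclachlantest_mpidebug
./simfactory/bin/sim build mclachlantest_mpidebug --optionlist=<your-optionlist>.cfg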

--
Ian Hinder
http://members.aei.mpg.de/ianhin


--
Dr Chris Stevens
Department of Mathematics
Rhodes University
Room 5
Ph: +27 46 603 8932
