Hello everyone,

I am trying to debug the MPI functionality on our local cluster. I use Open MPI 3.0, and the executable was compiled with PGI 10.9. The executable is a regional air quality model called "CAMx", which is widely used in our community. Our local setup has one node (npsx2) with 24 CPUs and 24 GB of memory, and three nodes (npsx4, npsx5, npsx6) with 40 CPUs and 65 GB of memory each. All nodes run CentOS 6.5. I used the command "lstopo" to print the CPU topology and have attached its output below.
I can run the CAMx benchmark case using all the available CPUs across the nodes with the command

  mpirun -np 72 --hostfile [mynodes.txt] [myexe]

and the outputs are identical to the benchmark outputs.

Then I moved on to my own case. CAMx supports both MPI and OpenMP to speed up the computation. Previously our group only used OpenMP, and it worked smoothly. Now I am trying to run it with MPI. The weird thing is that if I run with 4 CPUs, the run completes and the results are correct, but if I run with 5 CPUs, it gets stuck at a certain time step and sits idle seemingly forever. Furthermore, if I run with 6 or more CPUs, it crashes within the first few time steps and reports a segmentation fault.

My case has 5 times as many total grid cells as the benchmark case, so my first guess was a memory problem. However, I get the same error pattern on npsx2, which has less total memory, and on npsx5, which has more: it works with 4 CPUs, hangs with 5 CPUs, and crashes with 6 CPUs.

I looked through previous posts for hints but did not find anything particularly insightful. I also used valgrind to try to debug the executable on npsx5:

  valgrind mpirun -np 6 [myexe]

It crashed; the log file is attached below, but I cannot find any clue in it about how to solve the problem. I would appreciate your help troubleshooting this if you have time.

Thanks for your attention; I look forward to your suggestions.

Best regards,
zhangrui
log.npsx5.segmentation_fault
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users