Folks,

Thanks for your help and prompt replies. We appreciate all the support we get from the community.
S.

--
Si Hammond
Scalable Computer Architectures
Sandia National Laboratories, NM, USA

On 12/4/18, 6:57 PM, "users on behalf of Gilles Gouaillardet" <users-boun...@lists.open-mpi.org on behalf of gil...@rist.or.jp> wrote:

Thanks for the report. As far as I am concerned, this is a bug in the IMB benchmark, and I have issued a PR to fix it:

https://github.com/intel/mpi-benchmarks/pull/11

Meanwhile, you can manually download and apply the patch at

https://github.com/intel/mpi-benchmarks/pull/11.patch

Cheers,

Gilles

On 12/4/2018 4:41 AM, Hammond, Simon David via users wrote:

> Hi Open MPI Users,
>
> I just wanted to report a bug we have seen with Open MPI 3.1.3 and 4.0.0
> when using the Intel 2019 Update 1 compilers on our Skylake/Omni-Path-1
> cluster. The bug occurs when running the GitHub master src_c variant of
> the Intel MPI Benchmarks.
>
> Configuration:
>
> ./configure --prefix=/home/projects/x86-64-skylake/openmpi/3.1.3/intel/19.1.144 \
>     --with-slurm --with-psm2 \
>     CC=/home/projects/x86-64/intel/compilers/2019/compilers_and_libraries_2019.1.144/linux/bin/intel64/icc \
>     CXX=/home/projects/x86-64/intel/compilers/2019/compilers_and_libraries_2019.1.144/linux/bin/intel64/icpc \
>     FC=/home/projects/x86-64/intel/compilers/2019/compilers_and_libraries_2019.1.144/linux/bin/intel64/ifort \
>     --with-zlib=/home/projects/x86-64/zlib/1.2.11 \
>     --with-valgrind=/home/projects/x86-64/valgrind/3.13.0
>
> The operating system is RedHat 7.4, and we use a local build of GCC 7.2.0
> to provide the (C++) header files for the Intel compilers. Everything
> makes correctly and passes "make check" without any issues.
> We then compile IMB and run IMB-MPI1 on 24 nodes and get the following:
>
> #----------------------------------------------------------------
> # Benchmarking Reduce_scatter
> # #processes = 64
> # ( 1088 additional processes waiting in MPI_Barrier)
> #----------------------------------------------------------------
>        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>             0         1000         0.18         0.19         0.18
>             4         1000         7.39        10.37         8.68
>             8         1000         7.84        11.14         9.23
>            16         1000         8.50        12.37        10.14
>            32         1000        10.37        14.66        12.15
>            64         1000        13.76        18.82        16.17
>           128         1000        21.63        27.61        24.87
>           256         1000        39.98        47.27        43.96
>           512         1000        72.93        78.59        75.15
>          1024         1000       147.21       152.98       149.94
>          2048         1000       413.41       426.90       420.15
>          4096         1000       421.28       442.58       434.52
>          8192         1000       418.31       450.20       438.51
>         16384         1000      1082.85      1221.44      1140.92
>         32768         1000      2434.11      2529.90      2476.72
>         65536          640      5469.57      6048.60      5687.08
>        131072          320     11702.94     12435.06     12075.07
>        262144          160     19214.42     20433.83     19883.80
>        524288           80     49462.22     53896.43     52101.56
>       1048576           40    119422.53    131922.20    126920.99
>       2097152           20    256345.97    288185.72    275767.05
> [node06:351648] *** Process received signal ***
> [node06:351648] Signal: Segmentation fault (11)
> [node06:351648] Signal code: Invalid permissions (2)
> [node06:351648] Failing at address: 0x7fdb6efc4000
> [node06:351648] [ 0] /lib64/libpthread.so.0(+0xf5e0)[0x7fdb8646c5e0]
> [node06:351648] [ 1] ./IMB-MPI1(__intel_avx_rep_memcpy+0x140)[0x415380]
> [node06:351648] [ 2] /home/projects/x86-64-skylake/openmpi/3.1.3/intel/19.1.144/lib/libopen-pal.so.40(opal_datatype_copy_content_same_ddt+0xca)[0x7fdb858d847a]
> [node06:351648] [ 3] /home/projects/x86-64-skylake/openmpi/3.1.3/intel/19.1.144/lib/libmpi.so.40(ompi_coll_base_reduce_scatter_intra_ring+0x3f9)[0x7fdb86c43b29]
> [node06:351648] [ 4] /home/projects/x86-64-skylake/openmpi/3.1.3/intel/19.1.144/lib/libmpi.so.40(PMPI_Reduce_scatter+0x1d7)[0x7fdb86c1de67]
> [node06:351648] [ 5] ./IMB-MPI1[0x40d624]
> [node06:351648] [ 6] ./IMB-MPI1[0x407d16]
> [node06:351648] [ 7] ./IMB-MPI1[0x403356]
> [node06:351648] [ 8] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fdb860bbc05]
> [node06:351648] [ 9] ./IMB-MPI1[0x402da9]
> [node06:351648] *** End of error message ***
> [node06:351649] *** Process received signal ***
> [node06:351649] Signal: Segmentation fault (11)
> [node06:351649] Signal code: Invalid permissions (2)
> [node06:351649] Failing at address: 0x7f9b19c6f000
> [node06:351649] [ 0] /lib64/libpthread.so.0(+0xf5e0)[0x7f9b311295e0]
> [node06:351649] [ 1] ./IMB-MPI1(__intel_avx_rep_memcpy+0x140)[0x415380]
> [node06:351649] [ 2] /home/projects/x86-64-skylake/openmpi/3.1.3/intel/19.1.144/lib/libopen-pal.so.40(opal_datatype_copy_content_same_ddt+0xca)[0x7f9b3059547a]
> [node06:351649] [ 3] /home/projects/x86-64-skylake/openmpi/3.1.3/intel/19.1.144/lib/libmpi.so.40(ompi_coll_base_reduce_scatter_intra_ring+0x3f9)[0x7f9b31900b29]
> [node06:351649] [ 4] /home/projects/x86-64-skylake/openmpi/3.1.3/intel/19.1.144/lib/libmpi.so.40(PMPI_Reduce_scatter+0x1d7)[0x7f9b318dae67]
> [node06:351649] [ 5] ./IMB-MPI1[0x40d624]
> [node06:351649] [ 6] ./IMB-MPI1[0x407d16]
> [node06:351649] [node06:351657] *** Process received signal ***
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
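Gilles' workaround amounts to applying a unified diff against the IMB source tree with `patch -p1` (GitHub serves PR patches in `-p1` format). The sketch below demonstrates the mechanics on a toy file standing in for the real IMB sources; the directory name `imb-demo` and file `src/example.c` are placeholders, not part of the actual benchmark tree:

```shell
# Toy stand-in for the mpi-benchmarks checkout.
mkdir -p imb-demo/src && cd imb-demo
printf 'buggy line\n' > src/example.c

# A unified diff in the same a/ b/ (-p1) format GitHub serves for PR patches.
cat > 11.patch <<'EOF'
--- a/src/example.c
+++ b/src/example.c
@@ -1 +1 @@
-buggy line
+fixed line
EOF

# In the real tree you would instead fetch the patch first, e.g.:
#   curl -LO https://github.com/intel/mpi-benchmarks/pull/11.patch
patch -p1 < 11.patch
cat src/example.c   # prints "fixed line"
```

The `-p1` flag strips the leading `a/`/`b/` path component so the diff applies from the top of the source tree.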