Re: [OMPI users] Segmentation fault when using 31 or 32 ranks
Jeff and Steven,

Thanks for your help. I downloaded the nightly snapshot and it fixes the problem. I need to do more testing tomorrow and I will report back if any issues arise.

Thanks again.

T.

On 10/07/2019 18:44, Jeff Squyres (jsquyres) via users wrote:
> It might be worth trying the latest v4.0.x nightly snapshot -- we just updated the internal PMIx on the v4.0.x branch:
>
> https://www.open-mpi.org/nightly/v4.0.x/
>
> On Jul 10, 2019, at 1:29 PM, Steven Varga via users wrote:
>> Hi, I am fighting a similar problem. Did you try updating PMIx to the most recent 3.1.3 series release?
>>
>> On Wed, Jul 10, 2019, 12:24 Raymond Arter via users, wrote:
>>> Hi,
>>>
>>> I have the following issue with version 4.0.1 when running on a node with two 16-core CPUs (Intel Xeon Gold 6142) installed. Running with 30 ranks or fewer is fine, and running 33 or above gives the "not enough slots" message, which is expected.
>>>
>>> However, using 31 or 32 ranks results in the following error:
>>>
>>> [nodek19:391429] *** Process received signal ***
>>> [nodek19:391429] Signal: Segmentation fault (11)
>>> [nodek19:391429] Signal code: Address not mapped (1)
>>> [nodek19:391429] Failing at address: 0x7fa34954d008
>>> [nodek19:391429] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7fa348dfc5d0]
>>> [nodek19:391429] [ 1] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(pmix_gds_ds21_lock_init+0x11a)[0x7fa345ded16a]
>>> [nodek19:391429] [ 2] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmca_common_dstore.so.1(pmix_common_dstor_init+0x833)[0x7fa3493c8df3]
>>> [nodek19:391429] [ 3] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(+0x1e14)[0x7fa345dece14]
>>> [nodek19:391429] [ 4] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_gds_base_select+0x108)[0x7fa345b73fe8]
>>> [nodek19:391429] [ 5] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_rte_init+0x7c3)[0x7fa345b30f83]
>>> [nodek19:391429] [ 6] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_PMIx_Init+0x168)[0x7fa345aefd08]
>>> [nodek19:391429] [ 7] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_pmix_pmix3x.so(pmix3x_client_init+0xbb)[0x7fa345bc4fdb]
>>> [nodek19:391429] [ 8] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_ess_pmi.so(+0x1ad6)[0x7fa3467f2ad6]
>>> [nodek19:391429] [ 9] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libopen-rte.so.40(orte_init+0x291)[0x7fa348780b21]
>>> [nodek19:391429] [10] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(ompi_mpi_init+0x264)[0x7fa349058a24]
>>> [nodek19:391429] [11] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(MPI_Init+0x99)[0x7fa349088b89]
>>> [nodek19:391429] [12] mpitest[0x4007fe]
>>> [nodek19:391429] [13] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa348a423d5]
>>> [nodek19:391429] [14] mpitest[0x400729]
>>> [nodek19:391429] *** End of error message ***
>>>
>>> Furthermore, when using computers with two Intel Xeon Gold 6132 (14 cores) or 6126 (12 cores), the issue doesn't occur. I'm able to use all the cores, 28 and 24 respectively. Version 3.1.4 works across all three computers without issue.
>>>
>>> Any comments would be appreciated.
>>>
>>> Regards,
>>>
>>> T.

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
Re: [OMPI users] Segmentation fault when using 31 or 32 ranks
It might be worth trying the latest v4.0.x nightly snapshot -- we just updated the internal PMIx on the v4.0.x branch:

https://www.open-mpi.org/nightly/v4.0.x/

> On Jul 10, 2019, at 1:29 PM, Steven Varga via users wrote:
>
> Hi, I am fighting a similar problem. Did you try updating PMIx to the most recent 3.1.3 series release?
>
> On Wed, Jul 10, 2019, 12:24 Raymond Arter via users, wrote:
>> Hi,
>>
>> I have the following issue with version 4.0.1 when running on a node with two 16-core CPUs (Intel Xeon Gold 6142) installed. Running with 30 ranks or fewer is fine, and running 33 or above gives the "not enough slots" message, which is expected.
>>
>> However, using 31 or 32 ranks results in the following error:
>>
>> [nodek19:391429] *** Process received signal ***
>> [nodek19:391429] Signal: Segmentation fault (11)
>> [nodek19:391429] Signal code: Address not mapped (1)
>> [nodek19:391429] Failing at address: 0x7fa34954d008
>> [nodek19:391429] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7fa348dfc5d0]
>> [nodek19:391429] [ 1] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(pmix_gds_ds21_lock_init+0x11a)[0x7fa345ded16a]
>> [nodek19:391429] [ 2] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmca_common_dstore.so.1(pmix_common_dstor_init+0x833)[0x7fa3493c8df3]
>> [nodek19:391429] [ 3] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(+0x1e14)[0x7fa345dece14]
>> [nodek19:391429] [ 4] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_gds_base_select+0x108)[0x7fa345b73fe8]
>> [nodek19:391429] [ 5] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_rte_init+0x7c3)[0x7fa345b30f83]
>> [nodek19:391429] [ 6] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_PMIx_Init+0x168)[0x7fa345aefd08]
>> [nodek19:391429] [ 7] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_pmix_pmix3x.so(pmix3x_client_init+0xbb)[0x7fa345bc4fdb]
>> [nodek19:391429] [ 8] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_ess_pmi.so(+0x1ad6)[0x7fa3467f2ad6]
>> [nodek19:391429] [ 9] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libopen-rte.so.40(orte_init+0x291)[0x7fa348780b21]
>> [nodek19:391429] [10] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(ompi_mpi_init+0x264)[0x7fa349058a24]
>> [nodek19:391429] [11] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(MPI_Init+0x99)[0x7fa349088b89]
>> [nodek19:391429] [12] mpitest[0x4007fe]
>> [nodek19:391429] [13] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa348a423d5]
>> [nodek19:391429] [14] mpitest[0x400729]
>> [nodek19:391429] *** End of error message ***
>>
>> Furthermore, when using computers with two Intel Xeon Gold 6132 (14 cores) or 6126 (12 cores), the issue doesn't occur. I'm able to use all the cores, 28 and 24 respectively. Version 3.1.4 works across all three computers without issue.
>>
>> Any comments would be appreciated.
>>
>> Regards,
>>
>> T.

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] Segmentation fault when using 31 or 32 ranks
Hi, I am fighting a similar problem. Did you try updating PMIx to the most recent 3.1.3 series release?

On Wed, Jul 10, 2019, 12:24 Raymond Arter via users, <users@lists.open-mpi.org> wrote:

> Hi,
>
> I have the following issue with version 4.0.1 when running on a node with two 16-core CPUs (Intel Xeon Gold 6142) installed. Running with 30 ranks or fewer is fine, and running 33 or above gives the "not enough slots" message, which is expected.
>
> However, using 31 or 32 ranks results in the following error:
>
> [nodek19:391429] *** Process received signal ***
> [nodek19:391429] Signal: Segmentation fault (11)
> [nodek19:391429] Signal code: Address not mapped (1)
> [nodek19:391429] Failing at address: 0x7fa34954d008
> [nodek19:391429] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7fa348dfc5d0]
> [nodek19:391429] [ 1] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(pmix_gds_ds21_lock_init+0x11a)[0x7fa345ded16a]
> [nodek19:391429] [ 2] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmca_common_dstore.so.1(pmix_common_dstor_init+0x833)[0x7fa3493c8df3]
> [nodek19:391429] [ 3] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(+0x1e14)[0x7fa345dece14]
> [nodek19:391429] [ 4] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_gds_base_select+0x108)[0x7fa345b73fe8]
> [nodek19:391429] [ 5] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_rte_init+0x7c3)[0x7fa345b30f83]
> [nodek19:391429] [ 6] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_PMIx_Init+0x168)[0x7fa345aefd08]
> [nodek19:391429] [ 7] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_pmix_pmix3x.so(pmix3x_client_init+0xbb)[0x7fa345bc4fdb]
> [nodek19:391429] [ 8] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_ess_pmi.so(+0x1ad6)[0x7fa3467f2ad6]
> [nodek19:391429] [ 9] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libopen-rte.so.40(orte_init+0x291)[0x7fa348780b21]
> [nodek19:391429] [10] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(ompi_mpi_init+0x264)[0x7fa349058a24]
> [nodek19:391429] [11] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(MPI_Init+0x99)[0x7fa349088b89]
> [nodek19:391429] [12] mpitest[0x4007fe]
> [nodek19:391429] [13] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa348a423d5]
> [nodek19:391429] [14] mpitest[0x400729]
> [nodek19:391429] *** End of error message ***
>
> Furthermore, when using computers with two Intel Xeon Gold 6132 (14 cores) or 6126 (12 cores), the issue doesn't occur. I'm able to use all the cores, 28 and 24 respectively. Version 3.1.4 works across all three computers without issue.
>
> Any comments would be appreciated.
>
> Regards,
>
> T.
[OMPI users] Segmentation fault when using 31 or 32 ranks
Hi,

I have the following issue with version 4.0.1 when running on a node with two 16-core CPUs (Intel Xeon Gold 6142) installed. Running with 30 ranks or fewer is fine, and running 33 or above gives the "not enough slots" message, which is expected.

However, using 31 or 32 ranks results in the following error:

[nodek19:391429] *** Process received signal ***
[nodek19:391429] Signal: Segmentation fault (11)
[nodek19:391429] Signal code: Address not mapped (1)
[nodek19:391429] Failing at address: 0x7fa34954d008
[nodek19:391429] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7fa348dfc5d0]
[nodek19:391429] [ 1] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(pmix_gds_ds21_lock_init+0x11a)[0x7fa345ded16a]
[nodek19:391429] [ 2] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmca_common_dstore.so.1(pmix_common_dstor_init+0x833)[0x7fa3493c8df3]
[nodek19:391429] [ 3] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(+0x1e14)[0x7fa345dece14]
[nodek19:391429] [ 4] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_gds_base_select+0x108)[0x7fa345b73fe8]
[nodek19:391429] [ 5] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_rte_init+0x7c3)[0x7fa345b30f83]
[nodek19:391429] [ 6] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_PMIx_Init+0x168)[0x7fa345aefd08]
[nodek19:391429] [ 7] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_pmix_pmix3x.so(pmix3x_client_init+0xbb)[0x7fa345bc4fdb]
[nodek19:391429] [ 8] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_ess_pmi.so(+0x1ad6)[0x7fa3467f2ad6]
[nodek19:391429] [ 9] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libopen-rte.so.40(orte_init+0x291)[0x7fa348780b21]
[nodek19:391429] [10] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(ompi_mpi_init+0x264)[0x7fa349058a24]
[nodek19:391429] [11] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(MPI_Init+0x99)[0x7fa349088b89]
[nodek19:391429] [12] mpitest[0x4007fe]
[nodek19:391429] [13] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa348a423d5]
[nodek19:391429] [14] mpitest[0x400729]
[nodek19:391429] *** End of error message ***

Furthermore, when using computers with two Intel Xeon Gold 6132 (14 cores) or 6126 (12 cores), the issue doesn't occur. I'm able to use all the cores, 28 and 24 respectively. Version 3.1.4 works across all three computers without issue.

Any comments would be appreciated.

Regards,

T.
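For readers following the backtrace in the original report, here is a rough sketch of how the frame lines can be parsed to find the first frame that carries a symbol name, which is usually the most useful starting point when reporting a crash like this. This is only an illustration: the regular expression and the helper names (`parse_frames`, `first_named_frame`) are made up for this note and are not part of Open MPI or PMIx, and the frame format is assumed from the lines shown in the report (only a few of them are copied here).

```python
import re

# A few frame lines copied from the report above (not the full trace).
BACKTRACE = """\
[nodek19:391429] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7fa348dfc5d0]
[nodek19:391429] [ 1] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(pmix_gds_ds21_lock_init+0x11a)[0x7fa345ded16a]
[nodek19:391429] [ 2] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmca_common_dstore.so.1(pmix_common_dstor_init+0x833)[0x7fa3493c8df3]
[nodek19:391429] [11] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(MPI_Init+0x99)[0x7fa349088b89]
[nodek19:391429] [12] mpitest[0x4007fe]
"""

# Assumed layout: "[ N] object(symbol+offset)[address]", where the
# "(symbol+offset)" part may be missing or may have an empty symbol.
FRAME_RE = re.compile(
    r"\[\s*(?P<index>\d+)\]\s+"                               # frame number
    r"(?P<object>[^\s(\[]+)"                                  # library or executable
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-f]+)\))?"   # optional symbol+offset
    r"\[(?P<address>0x[0-9a-f]+)\]"                           # absolute address
)


def parse_frames(text):
    """Return one dict per backtrace frame found in text."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append({
                "index": int(match["index"]),
                "object": match["object"].rsplit("/", 1)[-1],  # file name only
                "symbol": match["symbol"] or None,             # None if unnamed
            })
    return frames


def first_named_frame(frames):
    """Return the lowest-numbered frame that has a symbol name, or None."""
    for frame in frames:
        if frame["symbol"]:
            return frame
    return None


if __name__ == "__main__":
    frame = first_named_frame(parse_frames(BACKTRACE))
    print(frame["index"], frame["object"], frame["symbol"])
```

On the lines above, the first named frame is frame 1, `pmix_gds_ds21_lock_init` in `mca_gds_ds21.so`, and the trace is entered through `MPI_Init`. That points at the PMIx component bundled with Open MPI rather than at the test program, and is consistent with the suggestions in this thread to try a newer PMIx.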