Hi, I am fighting a similar issue. Did you try updating PMIx to the most
recent 3.1.3 series release?

On Wed, Jul 10, 2019, 12:24 Raymond Arter via users, <
users@lists.open-mpi.org> wrote:

> Hi,
>
> I have the following issue with Open MPI version 4.0.1 when running on a
> node with two 16-core CPUs (Intel Xeon Gold 6142) installed. Running with
> 30 ranks or fewer is fine, and running 33 or above gives the "not enough
> slots" message, which is expected.
>
> However, using 31 or 32 ranks results in the following error:
>
> [nodek19:391429] *** Process received signal ***
> [nodek19:391429] Signal: Segmentation fault (11)
> [nodek19:391429] Signal code: Address not mapped (1)
> [nodek19:391429] Failing at address: 0x7fa34954d008
> [nodek19:391429] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7fa348dfc5d0]
> [nodek19:391429] [ 1]
> /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(pmix_gds_ds21_lock_init+0x11a)[0x7fa345ded16a]
> [nodek19:391429] [ 2]
> /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmca_common_dstore.so.1(pmix_common_dstor_init+0x833)[0x7fa3493c8df3]
> [nodek19:391429] [ 3]
> /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(+0x1e14)[0x7fa345dece14]
> [nodek19:391429] [ 4]
> /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_gds_base_select+0x108)[0x7fa345b73fe8]
> [nodek19:391429] [ 5]
> /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_rte_init+0x7c3)[0x7fa345b30f83]
> [nodek19:391429] [ 6]
> /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_PMIx_Init+0x168)[0x7fa345aefd08]
> [nodek19:391429] [ 7]
> /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_pmix_pmix3x.so(pmix3x_client_init+0xbb)[0x7fa345bc4fdb]
> [nodek19:391429] [ 8]
> /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_ess_pmi.so(+0x1ad6)[0x7fa3467f2ad6]
> [nodek19:391429] [ 9]
> /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libopen-rte.so.40(orte_init+0x291)[0x7fa348780b21]
> [nodek19:391429] [10]
> /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(ompi_mpi_init+0x264)[0x7fa349058a24]
> [nodek19:391429] [11]
> /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(MPI_Init+0x99)[0x7fa349088b89]
> [nodek19:391429] [12] mpitest[0x4007fe]
> [nodek19:391429] [13]
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa348a423d5]
> [nodek19:391429] [14] mpitest[0x400729]
> [nodek19:391429] *** End of error message ***
>
>
> Furthermore, when using computers with two Intel Xeon Gold 6132 (14 cores)
> or 6126 (12 cores), the issue doesn't occur. I'm able to use all the cores,
> 28 and 24 respectively. Version 3.1.4 works across all three computers
> without issue.
>
> Any comments would be appreciated.
>
> Regards,
>
> T.
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
