On 13/10/20 16:33, Jeff Squyres (jsquyres) wrote:
> That's odd. What version of Open MPI are you using?
The version is 3.1.3, as packaged in Debian Buster. I don't know Open MPI
(or MPI in general) very well. Some time ago I had to add an
"mtl = psm2" line to /etc/openmpi/openmpi-mca-params.conf.

Another oddity: I've had the same problem on other nodes, where it got
"solved" (or, more likely, just masked) by simply installing gdb. While
trying to debug the issue, I noticed that once gdb was installed I could
no longer reproduce the problem. Unfortunately, gdb is already installed
on this server, and apparently it doesn't help with debugging the issue.

>> On Oct 13, 2020, at 6:34 AM, Diego Zuccato via users
>> <users@lists.open-mpi.org> wrote:
>>
>> Hello all.
>>
>> I have a problem on a server: launching a job with mpirun fails if I
>> request all 32 CPUs (threads, since HT is enabled) but succeeds if I
>> only request 30.
>>
>> The test code is really minimal:
>> -8<--
>> #include "mpi.h"
>> #include <stdio.h>
>> #include <stdlib.h>
>> #define MASTER 0
>>
>> int main (int argc, char *argv[])
>> {
>>     int numtasks, taskid, len;
>>     char hostname[MPI_MAX_PROCESSOR_NAME];
>>     MPI_Init(&argc, &argv);
>>     // int provided=0;
>>     // MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>>     //printf("MPI provided threads: %d\n", provided);
>>     MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
>>
>>     if (taskid == MASTER)
>>         printf("This is an MPI parallel code for Hello World with no communication\n");
>>     //MPI_Barrier(MPI_COMM_WORLD);
>>
>>
>>     MPI_Get_processor_name(hostname, &len);
>>
>>     printf("Hello from task %d on %s!\n", taskid, hostname);
>>
>>     if (taskid == MASTER)
>>         printf("MASTER: Number of MPI tasks is: %d\n", numtasks);
>>
>>     MPI_Finalize();
>>
>>     printf("END OF CODE from task %d\n", taskid);
>> }
>> -8<--
>> (the commented section is a leftover of one of the tests).
>>
>> The error is:
>> -8<--
>> [str957-bl0-03:19637] *** Process received signal ***
>> [str957-bl0-03:19637] Signal: Segmentation fault (11)
>> [str957-bl0-03:19637] Signal code: Address not mapped (1)
>> [str957-bl0-03:19637] Failing at address: 0x7ffff7fac008
>> [str957-bl0-03:19637] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7ffff7e92730]
>> [str957-bl0-03:19637] [ 1] /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x2936)[0x7ffff646d936]
>> [str957-bl0-03:19637] [ 2] /usr/lib/x86_64-linux-gnu/libmca_common_dstore.so.1(pmix_common_dstor_init+0x9d3)[0x7ffff6444733]
>> [str957-bl0-03:19637] [ 3] /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x25b4)[0x7ffff646d5b4]
>> [str957-bl0-03:19637] [ 4] /usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_gds_base_select+0x12e)[0x7ffff659346e]
>> [str957-bl0-03:19637] [ 5] /usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_rte_init+0x8cd)[0x7ffff654b88d]
>> [str957-bl0-03:19637] [ 6] /usr/lib/x86_64-linux-gnu/libpmix.so.2(PMIx_Init+0xdc)[0x7ffff6507d7c]
>> [str957-bl0-03:19637] [ 7] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext2x.so(ext2x_client_init+0xc4)[0x7ffff6603fe4]
>> [str957-bl0-03:19637] [ 8] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so(+0x2656)[0x7ffff7fb1656]
>> [str957-bl0-03:19637] [ 9] /usr/lib/x86_64-linux-gnu/libopen-rte.so.40(orte_init+0x29a)[0x7ffff7c1c11a]
>> [str957-bl0-03:19637] [10] /usr/lib/x86_64-linux-gnu/libmpi.so.40(ompi_mpi_init+0x252)[0x7ffff7eece62]
>> [str957-bl0-03:19637] [11] /usr/lib/x86_64-linux-gnu/libmpi.so.40(MPI_Init+0x6e)[0x7ffff7f1b17e]
>> [str957-bl0-03:19637] [12] ./mpitest-debug(+0x11c6)[0x5555555551c6]
>> [str957-bl0-03:19637] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7ffff7ce309b]
>> [str957-bl0-03:19637] [14] ./mpitest-debug(+0x10da)[0x5555555550da]
>> [str957-bl0-03:19637] *** End of error message ***
>> -8<--
>>
>> I'm using Debian stable packages.
>> On other servers there is no problem
>> (but there was in the past, and it got "solved" by just installing gdb).
>>
>> Any hints?
>>
>> TIA
>>
>> --
>> Diego Zuccato
>> DIFA - Dip. di Fisica e Astronomia
>> Servizi Informatici
>> Alma Mater Studiorum - Università di Bologna
>> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>> tel.: +39 051 20 95786
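A minimal diagnostic sketch, assuming Linux with glibc: since the failure appears only when all 32 CPUs (hardware threads, with HT enabled) are requested, it can help to confirm how many logical CPUs the node actually exposes. The helper names online_logical_cpus() and configured_logical_cpus() are hypothetical; they only wrap sysconf() with _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF, which glibc supports but POSIX does not require.

```c
#include <unistd.h>

/* Hypothetical helper: number of logical CPUs currently online.
 * On a Hyper-Threading node this counts hardware threads, not cores.
 * _SC_NPROCESSORS_ONLN is supported by glibc but is not required by POSIX.
 * Returns -1 if the value cannot be determined. */
long online_logical_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Hypothetical helper: number of logical CPUs configured in the system,
 * including any that are currently offline. Returns -1 on failure. */
long configured_logical_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_CONF);
}
```

Printing both values from a small main on the affected node should report 32 online logical CPUs if all hardware threads are visible to the OS; this only checks the CPU count and says nothing by itself about the PMIx crash.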