I have tried valgrind 3.17.0 and openmpi 4.0.2, and it works.

Do you know of any reported bugs with that specific Open MPI
version (4.0.5)?
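
One observation that may help narrow this down: the first unhandled byte reported in the log below is 0x62. On x86-64 that byte is the EVEX prefix, which introduces AVX-512 encoded instructions, and as far as I know Valgrind 3.17.0 does not decode those. So it may be that the 4.0.5 library was built with AVX-512 enabled while the 4.0.2 one was not; this is only an assumption, I have not verified it. A minimal sketch of the check in C (starts_with_evex_prefix and reported_bytes are hypothetical names of mine, not part of Valgrind or Open MPI):

```c
#include <assert.h>
#include <stddef.h>

/* The ten bytes printed after "unhandled instruction bytes:" in the log. */
const unsigned char reported_bytes[] = {
    0x62, 0xF2, 0x7D, 0x08, 0x7C, 0xC5, 0xC5, 0xF9, 0xD6, 0x43
};

/* Hypothetical helper: returns 1 when the first byte is 0x62, which in
 * 64-bit mode is the EVEX prefix (AVX-512 encodings), and 0 otherwise.
 * It only inspects the leading byte; it is not a full instruction decoder. */
int starts_with_evex_prefix(const unsigned char *bytes, size_t len)
{
    assert(bytes != NULL || len == 0);
    return len > 0 && bytes[0] == 0x62;
}
```

Calling starts_with_evex_prefix(reported_bytes, sizeof reported_bytes) gives 1 for the bytes above, which is at least consistent with the instruction being AVX-512 rather than a bad jump.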

Regards,

Federico Tesser



On Wed, 07 Jul 2021 10:25:52 +0200
 "TESSER FEDERICO" <federico.tes...@polito.it> wrote:
Good morning.

I have installed valgrind 3.17.0 after loading the module for openmpi 4.0.5, so the build found the "MPI2-compliant mpicc and mpi.h...".

However, running even a simple program like this one:



#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {

    MPI_Init(NULL, NULL);

    int world_size;
    int world_rank;
    int name_len;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    MPI_Finalize();

    return 0;
}



will produce the following errors:



==113228== Memcheck, a memory error detector
==113228== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==113228== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==113228== Command: ./pure_mpi_valgrind_try/a.out
==113228==
valgrind MPI wrappers 113228: Active for pid 113228
valgrind MPI wrappers 113228: Try MPIWRAP_DEBUG=help for possible options
vex amd64->IR: unhandled instruction bytes: 0x62 0xF2 0x7D 0x8 0x7C 0xC5 0xC5 0xF9 0xD6 0x43
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==113228== valgrind: Unrecognised instruction at address 0x5c79318.
==113228==    at 0x5C79318: opal_pointer_array_init (in /usr/local/openmpi-4.0.5/lib/libopen-pal.so.40.20.5)
==113228==    by 0x5CA4BDB: mca_base_var_init (in /usr/local/openmpi-4.0.5/lib/libopen-pal.so.40.20.5)
==113228==    by 0x5C82F11: opal_init_util (in /usr/local/openmpi-4.0.5/lib/libopen-pal.so.40.20.5)
==113228==    by 0x5157FD9: ompi_mpi_init (ompi_mpi_init.c:428)
==113228==    by 0x50FB3A8: PMPI_Init (pinit.c:69)
==113228==    by 0x4E4BC26: PMPI_Init (libmpiwrap.c:2288)
==113228==    by 0x10893B: main (main.c:6)
==113228== Your program just tried to execute an instruction that Valgrind
==113228== did not recognise. There are two possible reasons for this.
==113228== 1. Your program has a bug and erroneously jumped to a non-code
==113228==    location. If you are running Memcheck and you just saw a
==113228==    warning about a bad jump, it's probably your program's fault.
==113228== 2. The instruction is legitimate but Valgrind doesn't handle it,
==113228==    i.e. it's Valgrind's fault. If you think this is the case or
==113228==    you are not sure, please let us know and we'll try to fix it.
==113228== Either way, Valgrind will now raise a SIGILL signal which will
==113228== probably kill your program.
==113228==
==113228== Process terminating with default action of signal 4 (SIGILL): dumping core
==113228==  Illegal opcode at address 0x5C79318
==113228==    at 0x5C79318: opal_pointer_array_init (in /usr/local/openmpi-4.0.5/lib/libopen-pal.so.40.20.5)
==113228==    by 0x5CA4BDB: mca_base_var_init (in /usr/local/openmpi-4.0.5/lib/libopen-pal.so.40.20.5)
==113228==    by 0x5C82F11: opal_init_util (in /usr/local/openmpi-4.0.5/lib/libopen-pal.so.40.20.5)
==113228==    by 0x5157FD9: ompi_mpi_init (ompi_mpi_init.c:428)
==113228==    by 0x50FB3A8: PMPI_Init (pinit.c:69)
==113228==    by 0x4E4BC26: PMPI_Init (libmpiwrap.c:2288)
==113228==    by 0x10893B: main (main.c:6)
slurmstepd: error: *** JOB 159641 ON node01 CANCELLED AT 2021-07-07T10:21:29 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: Timed out waiting for job step to complete
slurmstepd: error: *** STEP 159641.0 ON node01 CANCELLED AT 2021-07-07T10:22:48 ***



What am I doing wrong?

Regards,

Federico Tesser



_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
