Re: [Valgrind-users] Problem with valgrind-3.17.0 and openmpi-4.0.5

TESSER FEDERICO Wed, 07 Jul 2021 02:49:42 -0700

I have tried valgrind 3.17.0 and openmpi 4.0.2, and it works.

Do you know if there are any reported bugs with that specific version?


Regards,

Federico Tesser



On Wed, 07 Jul 2021 10:25:52 +0200
 "TESSER FEDERICO" <federico.tes...@polito.it> wrote:

Good morning.
I have installed valgrind 3.17.0, having previously loaded the module for openmpi 4.0.5, so it found the "MPI2-compliant mpicc and mpi.h...".
However, trying to run just a simple program like this one:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {

    MPI_Init(NULL, NULL);

    int world_size;
    int world_rank;
    int name_len;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Get_processor_name(processor_name, &name_len);
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    MPI_Finalize();
    return 0;
}



will produce the following errors:



==113228== Memcheck, a memory error detector
==113228== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==113228== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==113228== Command: ./pure_mpi_valgrind_try/a.out
==113228==
valgrind MPI wrappers 113228: Active for pid 113228
valgrind MPI wrappers 113228: Try MPIWRAP_DEBUG=help for possible options
vex amd64->IR:   unhandled instruction bytes: 0x62 0xF2 0x7D 0x8 0x7C 0xC5 0xC5 0xF9 0xD6 0x43
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==113228== valgrind: Unrecognised instruction at address 0x5c79318.
==113228==    at 0x5C79318: opal_pointer_array_init (in /usr/local/openmpi-4.0.5/lib/libopen-pal.so.40.20.5)
==113228==    by 0x5CA4BDB: mca_base_var_init (in /usr/local/openmpi-4.0.5/lib/libopen-pal.so.40.20.5)
==113228==    by 0x5C82F11: opal_init_util (in /usr/local/openmpi-4.0.5/lib/libopen-pal.so.40.20.5)
==113228==    by 0x5157FD9: ompi_mpi_init (ompi_mpi_init.c:428)
==113228==    by 0x50FB3A8: PMPI_Init (pinit.c:69)
==113228==    by 0x4E4BC26: PMPI_Init (libmpiwrap.c:2288)
==113228==    by 0x10893B: main (main.c:6)
==113228== Your program just tried to execute an instruction that Valgrind
==113228== did not recognise.  There are two possible reasons for this.
==113228== 1. Your program has a bug and erroneously jumped to a non-code
==113228==    location.  If you are running Memcheck and you just saw a
==113228==    warning about a bad jump, it's probably your program's fault.
==113228== 2. The instruction is legitimate but Valgrind doesn't handle it,
==113228==    i.e. it's Valgrind's fault.  If you think this is the case or
==113228==    you are not sure, please let us know and we'll try to fix it.
==113228== Either way, Valgrind will now raise a SIGILL signal which will
==113228== probably kill your program.
==113228==
==113228== Process terminating with default action of signal 4 (SIGILL): dumping core
==113228==  Illegal opcode at address 0x5C79318
==113228==    at 0x5C79318: opal_pointer_array_init (in /usr/local/openmpi-4.0.5/lib/libopen-pal.so.40.20.5)
==113228==    by 0x5CA4BDB: mca_base_var_init (in /usr/local/openmpi-4.0.5/lib/libopen-pal.so.40.20.5)
==113228==    by 0x5C82F11: opal_init_util (in /usr/local/openmpi-4.0.5/lib/libopen-pal.so.40.20.5)
==113228==    by 0x5157FD9: ompi_mpi_init (ompi_mpi_init.c:428)
==113228==    by 0x50FB3A8: PMPI_Init (pinit.c:69)
==113228==    by 0x4E4BC26: PMPI_Init (libmpiwrap.c:2288)
==113228==    by 0x10893B: main (main.c:6)
slurmstepd: error: *** JOB 159641 ON node01 CANCELLED AT 2021-07-07T10:21:29 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: Timed out waiting for job step to complete
slurmstepd: error: *** STEP 159641.0 ON node01 CANCELLED AT 2021-07-07T10:22:48 ***

What am I doing wrong?

Regards,

Federico Tesser




_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
