Hi,
I'm new to Valgrind. My goal is to investigate a possible memory problem in a large parallel MPI+OpenMP code.

I've cloned Valgrind from git and built it with GCC 7.3 and Open MPI 3.1 for mpicc (my application is built with the same environment). I'm using these two configure options:
--enable-only64bit
--with-mpicc=$(which mpicc)

"mpirun -np 8 my_application" runs on my fat node. I use only a few processes for the test, and they use nearly 60 GB of the more than 1 TB of RAM. The application fails after a few tens of iterations.

"mpirun -np 8 valgrind /bin/hostname" works too, so Valgrind seems to work with MPI 3.1 compiled with GCC 7.3.

But "mpirun -np 8 valgrind ./my_application" immediately fails with:

Program received signal SIGILL: Illegal instruction.

Backtrace for this error:
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F 0x5 0x25 0xA8 0x18 0x0
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==377969== valgrind: Unrecognised instruction at address 0xabf9581.
==377969== at 0xABF9581: opal_pointer_array_construct (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==377969== by 0xAC1BA78: mca_base_var_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==377969== by 0xABFDE39: opal_init_util (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==377969== by 0x911AD60: ompi_mpi_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==377969== by 0x914BB34: PMPI_Init_thread (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==377969== by 0x8E97C1F: MPI_INIT_THREAD (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi_mpifh.so.40.11.2)
==377969== by 0x543066: __mpi_m_MOD_init_mpi (mpi_m.f90:140)
==377969== by 0x411447: __yales2_m_MOD_init_yales2_env (yales2_m.f90:511)
==377969== by 0x411595: __yales2_m_MOD_run_yales2 (yales2_m.f90:378)
==377969== by 0x40B9E0: MAIN__ (3D_cylinder.f90:20)
==377969== by 0x40B9E0: main (3D_cylinder.f90:8)
==377969== Your program just tried to execute an instruction that Valgrind
==377969== did not recognise. There are two possible reasons for this.
==377969== 1. Your program has a bug and erroneously jumped to a non-code
==377969== location. If you are running Memcheck and you just saw a
==377969== warning about a bad jump, it's probably your program's fault.
==377969== 2. The instruction is legitimate but Valgrind doesn't handle it,
==377969== i.e. it's Valgrind's fault. If you think this is the case or
==377969== you are not sure, please let us know and we'll try to fix it.
==377969== Either way, Valgrind will now raise a SIGILL signal which will
==377969== probably kill your program.

Maybe I've missed something? I'm using the master branch. The VALGRIND_3_16_BRANCH branch, which I also tested, does not build:

make: *** No rule to make target 'exp-sgcheck.supp', needed by 'default.supp'. Stop.
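For anyone triaging a similar report, the "unhandled instruction bytes" line from the log above can be decoded with a short script. This is a minimal sketch, not part of Valgrind: it assumes the bytes come from 64-bit code, where a leading 0x62 is the EVEX prefix (62 P0 P1 P2) used by AVX-512 instructions. The names `parse_bytes`, `describe_evex` and `LOG_LINE` are hypothetical, and only the main prefix fields are decoded.

```python
# Hypothetical helper (not part of Valgrind): decode the EVEX prefix of the
# bytes printed by "vex amd64->IR: unhandled instruction bytes: ...".
# Assumption: 64-bit code, where 0x62 starts an EVEX prefix 62 P0 P1 P2,
# followed by the opcode byte.
import re
from typing import Dict, List

LOG_LINE = ("vex amd64->IR: unhandled instruction bytes: "
            "0x62 0xF1 0xFD 0x8 0x6F 0x5 0x25 0xA8 0x18 0x0")


def parse_bytes(line: str) -> List[int]:
    """Return the hex byte values listed after 'unhandled instruction bytes:'."""
    _, _, tail = line.partition("unhandled instruction bytes:")
    return [int(tok, 16) for tok in re.findall(r"0x[0-9A-Fa-f]+", tail)]


def describe_evex(code: List[int]) -> Dict[str, object]:
    """Decode the main EVEX prefix fields, or report that there is no EVEX prefix."""
    if len(code) < 5 or code[0] != 0x62:
        return {"evex": False}
    p0, p1, p2, opcode = code[1], code[2], code[3], code[4]
    maps = {1: "0F", 2: "0F38", 3: "0F3A"}          # P0 bits 1:0 (mm)
    prefixes = {0: None, 1: "66", 2: "F3", 3: "F2"}  # P1 bits 1:0 (pp)
    lengths = {0: 128, 1: 256, 2: 512}              # P2 bits 6:5 (L'L)
    return {
        "evex": True,
        "opcode_map": maps.get(p0 & 0x03, "reserved"),
        "W": (p1 >> 7) & 1,                          # P1 bit 7
        "pp": prefixes[p1 & 0x03],
        "vector_length_bits": lengths.get((p2 >> 5) & 0x03),
        "opcode": opcode,
    }


if __name__ == "__main__":
    info = describe_evex(parse_bytes(LOG_LINE))
    for key, value in info.items():
        print(f"{key}: {value:#x}" if key == "opcode" else f"{key}: {value}")
```

On the bytes above, the sketch reports an EVEX prefix with opcode map 0F, implied 66 prefix, W=1, 128-bit vector length and opcode 0x6f, which matches the EVEX.128.66.0F.W1 6F encoding of VMOVDQA64, an AVX-512 instruction.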
Thanks for your help,
Patrick

_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users