On 19/05/2020 at 14:32, Julian Seward wrote:
>
> Greetings.
>
> A first release candidate for 3.16.0 is available at
> https://sourceware.org/pub/valgrind/valgrind-3.16.0.RC2.tar.bz2
> (md5 = 21ac87434ed32bcfe5ea86a0978440ba)
>
> Please give it a try on platforms that are important for you. If no serious
> issues are reported, the 3.16.0 final release will happen on 25 May, that is,
> next Monday.
>
> J

Hi all,

valgrind-3.16.0.RC2 does not work for me (nor did the previous version on this server).

_*My Fortran test program (error-free, I think) is as simple as:*_

PROGRAM reduce
  USE mpi
  IMPLICIT NONE
  INTEGER :: me, ncpus, ierr
  REAL :: buff, resu=0

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD,me,ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD,ncpus,ierr)
  buff=1
  CALL MPI_ALLREDUCE(buff,resu,1,MPI_REAL,MPI_SUM,MPI_COMM_WORLD,ierr)
  if (me == 0 ) WRITE(6,'(a,i0,2(a,f14.6))') 'On ',me,' I have ',buff,' and got ',resu
  CALL MPI_FINALIZE(ierr)
END PROGRAM reduce

_*Compilation with:*_

mpifort reduce.F90 -o reduce

mpifort --show
/opt/GCC73/bin/gfortran -I/opt/openmpi-GCC73/v3.1.x-20181010/include -pthread -I/opt/openmpi-GCC73/v3.1.x-20181010/lib -Wl,-rpath -Wl,/opt/openmpi-GCC73/v3.1.x-20181010/lib -Wl,--enable-new-dtags -L/opt/openmpi-GCC73/v3.1.x-20181010/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi

_*OS is*_ CentOS Linux release 7.7.1908 (Core)

_*Valgrind compiled with gcc 7.3, configure options are:*_

./configure --enable-only64bit --with-mpicc=$(which mpicc) --prefix=/robin/data/begou/VALGRIND/valgrind-binaries

*Hardware is:*

Dell PowerEdge R940
4 x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (launched Q3 2017)
1.5 TB of RAM

_*Compiler*_ is gcc (GCC) 7.3.0 (January 2018)

_*OpenMPI*_ is v3.1.x from the git repo as of 2018/10/10 (because a patch was needed at that time), compiled with gcc 7.3.0

*configure options are:*

'--prefix=/opt/openmpi-GCC73/v3.1.x-20181010' '--enable-mpirun-prefix-by-default' '--disable-dlopen' '--enable-mca-no-build=openib' '--without-verbs' '--enable-mpi-cxx' '--without-slurm' '--enable-mpi-thread-multiple'

_*Error is:*_

[begou@grivola TESTS]$ valgrind --version
valgrind-3.16.0.RC2
[begou@grivola TESTS]$ mpirun -np 2 valgrind ./reduce
==306850== Memcheck, a memory error detector
==306850== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==306850== Using Valgrind-3.16.0.RC2 and LibVEX; rerun with -h for copyright info
==306850== Command: ./reduce
==306850==
==306851== Memcheck, a memory error detector
==306851== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==306851== Using Valgrind-3.16.0.RC2 and LibVEX; rerun with -h for copyright info
==306851== Command: ./reduce
==306851==
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F 0x5 0x25 0xA8 0x18 0x0
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==306850== valgrind: Unrecognised instruction at address 0x6ddf581.
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F 0x5 0x25 0xA8 0x18 0x0
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==306851== valgrind: Unrecognised instruction at address 0x6ddf581.
==306851==    at 0x6DDF581: opal_pointer_array_construct (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x6E01A78: mca_base_var_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x6DE3E39: opal_init_util (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x552ED60: ompi_mpi_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306851==    by 0x555F9ED: PMPI_Init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306851==    by 0x52ABBB7: PMPI_INIT (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi_mpifh.so.40.11.2)
==306851==    by 0x400CDD: MAIN__ (in /HA/sources/begou/TESTS/reduce)
==306851==    by 0x400E8C: main (in /HA/sources/begou/TESTS/reduce)
==306851== Your program just tried to execute an instruction that Valgrind
==306851== did not recognise. There are two possible reasons for this.
==306851== 1. Your program has a bug and erroneously jumped to a non-code
==306851==    location. If you are running Memcheck and you just saw a
==306851==    warning about a bad jump, it's probably your program's fault.
==306851== 2. The instruction is legitimate but Valgrind doesn't handle it,
==306851==    i.e. it's Valgrind's fault. If you think this is the case or
==306851==    you are not sure, please let us know and we'll try to fix it.
==306851== Either way, Valgrind will now raise a SIGILL signal which will
==306851== probably kill your program.
==306850==    at 0x6DDF581: opal_pointer_array_construct (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x6E01A78: mca_base_var_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x6DE3E39: opal_init_util (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x552ED60: ompi_mpi_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306850==    by 0x555F9ED: PMPI_Init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306850==    by 0x52ABBB7: PMPI_INIT (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi_mpifh.so.40.11.2)
==306850==    by 0x400CDD: MAIN__ (in /HA/sources/begou/TESTS/reduce)
==306850==    by 0x400E8C: main (in /HA/sources/begou/TESTS/reduce)
==306850== Your program just tried to execute an instruction that Valgrind
==306850== did not recognise. There are two possible reasons for this.
==306850== 1. Your program has a bug and erroneously jumped to a non-code
==306850==    location. If you are running Memcheck and you just saw a
==306850==    warning about a bad jump, it's probably your program's fault.
==306850== 2. The instruction is legitimate but Valgrind doesn't handle it,
==306850==    i.e. it's Valgrind's fault. If you think this is the case or
==306850==    you are not sure, please let us know and we'll try to fix it.
==306850== Either way, Valgrind will now raise a SIGILL signal which will
==306850== probably kill your program.

Program received signal SIGILL: Illegal instruction.

Program received signal SIGILL: Illegal instruction.

Backtrace for this error:

Backtrace for this error:
#0 0x66ce3af in ???
#0 0x66ce3af in ???
#1 0x6ddf581 in ???
#2 0x6e01a78 in ???
#3 0x6de3e39 in ???
#1 0x6ddf581 in ???
#4 0x552ed60 in ???
#5 0x555f9ed in ???
#6 0x52abbb7 in ???
#2 0x6e01a78 in ???
#7 0x400cdd in ???
#8 0x400e8c in ???
#9 0x66ba504 in ???
#10 0x400c18 in ???
#3 0x6de3e39 in ???
#11 0xffffffffffffffff in ???
#4 0x552ed60 in ???
#5 0x555f9ed in ???
#6 0x52abbb7 in ???
#7 0x400cdd in ???
#8 0x400e8c in ???
==306851==
==306851== Process terminating with default action of signal 4 (SIGILL)
#9 0x66ba504 in ???
==306851==    at 0x648B4BB: raise (in /usr/lib64/libpthread-2.17.so)
==306851==    by 0x66CE3AF: ??? (in /usr/lib64/libc-2.17.so)
==306851==    by 0x6DDF580: opal_pointer_array_construct (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x6E01A78: mca_base_var_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x6DE3E39: opal_init_util (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x552ED60: ompi_mpi_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306851==    by 0x555F9ED: PMPI_Init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306851==    by 0x52ABBB7: PMPI_INIT (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi_mpifh.so.40.11.2)
==306851==    by 0x400CDD: MAIN__ (in /HA/sources/begou/TESTS/reduce)
==306851==    by 0x400E8C: main (in /HA/sources/begou/TESTS/reduce)
#10 0x400c18 in ???
#11 0xffffffffffffffff in ???
==306850==
==306850== Process terminating with default action of signal 4 (SIGILL)
==306850==    at 0x648B4BB: raise (in /usr/lib64/libpthread-2.17.so)
==306850==    by 0x66CE3AF: ??? (in /usr/lib64/libc-2.17.so)
==306850==    by 0x6DDF580: opal_pointer_array_construct (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x6E01A78: mca_base_var_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x6DE3E39: opal_init_util (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x552ED60: ompi_mpi_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306850==    by 0x555F9ED: PMPI_Init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306850==    by 0x52ABBB7: PMPI_INIT (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi_mpifh.so.40.11.2)
==306850==    by 0x400CDD: MAIN__ (in /HA/sources/begou/TESTS/reduce)
==306850==    by 0x400E8C: main (in /HA/sources/begou/TESTS/reduce)
==306851==
==306851== HEAP SUMMARY:
==306851==    in use at exit: 8,830 bytes in 65 blocks
==306851==    total heap usage: 123 allocs, 58 frees, 90,778 bytes allocated
==306851==
==306850==
==306850== HEAP SUMMARY:
==306850==    in use at exit: 8,830 bytes in 65 blocks
==306850==    total heap usage: 123 allocs, 58 frees, 90,778 bytes allocated
==306850==
==306851== LEAK SUMMARY:
==306851==    definitely lost: 0 bytes in 0 blocks
==306851==    indirectly lost: 0 bytes in 0 blocks
==306851==    possibly lost: 0 bytes in 0 blocks
==306851==    still reachable: 8,830 bytes in 65 blocks
==306851==    suppressed: 0 bytes in 0 blocks
==306851== Rerun with --leak-check=full to see details of leaked memory
==306851==
==306851== For lists of detected and suppressed errors, rerun with: -s
==306851== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==306850== LEAK SUMMARY:
==306850==    definitely lost: 0 bytes in 0 blocks
==306850==    indirectly lost: 0 bytes in 0 blocks
==306850==    possibly lost: 0 bytes in 0 blocks
==306850==    still reachable: 8,830 bytes in 65 blocks
==306850==    suppressed: 0 bytes in 0 blocks
==306850== Rerun with --leak-check=full to see details of leaked memory
==306850==
==306850== For lists of detected and suppressed errors, rerun with: -s
==306850== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node grivola exited on signal 4 (Illegal instruction).

Thanks

Patrick
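
A note on the unhandled bytes reported in the log: the leading 0x62 byte is, in 64-bit mode, the start of an EVEX prefix, the encoding used by AVX-512 instructions, and the Xeon Gold 6148 supports AVX-512. The following is a minimal sketch, assuming the EVEX field layout described in the Intel SDM, that decodes the first five reported bytes. The function name `decode_evex` is hypothetical and only for illustration; it is not part of Valgrind or Open MPI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode the main EVEX prefix fields of an instruction.
# Arguments: the first five instruction bytes (0x62, P0, P1, P2, opcode).
# The field layout is an assumption taken from the Intel SDM EVEX description.
decode_evex() {
    local b0=$(( $1 )) p0=$(( $2 )) p1=$(( $3 )) p2=$(( $4 )) opcode=$(( $5 ))
    if (( b0 != 0x62 )); then
        echo "not an EVEX prefix"
        return 1
    fi
    local map=$(( p0 & 0x7 ))         # opcode map: 1 = 0F, 2 = 0F38, 3 = 0F3A
    local w=$(( (p1 >> 7) & 1 ))      # EVEX.W (operand-size / opcode extension)
    local pp=$(( p1 & 0x3 ))          # implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2
    local ll=$(( (p2 >> 5) & 0x3 ))   # vector length: 0 = 128, 1 = 256, 2 = 512 bits
    local aaa=$(( p2 & 0x7 ))         # opmask register (k0 means no masking)
    printf 'map=%d W=%d pp=%d LL=%d mask=k%d opcode=0x%02X\n' \
        "$map" "$w" "$pp" "$ll" "$aaa" "$opcode"
}

# Bytes taken from the "unhandled instruction bytes" line in the log.
decode_evex 0x62 0xF1 0xFD 0x8 0x6F
```

For the bytes in the log this prints `map=1 W=1 pp=1 LL=0 mask=k0 opcode=0x6F`, i.e. an EVEX.128.66.0F.W1 6F encoding, which the Intel SDM lists as VMOVDQA64 with a 128-bit operand. That would be consistent with case 2 of the Valgrind message (a legitimate instruction that Valgrind does not handle), although only a disassembly of libopen-pal.so at the reported address would confirm it.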

_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users