On 19/05/2020 at 14:32, Julian Seward wrote:
>
> Greetings.
>
> A first release candidate for 3.16.0 is available at
> https://sourceware.org/pub/valgrind/valgrind-3.16.0.RC2.tar.bz2
> (md5 = 21ac87434ed32bcfe5ea86a0978440ba)
>
> Please give it a try on platforms that are important for you.  If no
> serious issues are reported, the 3.16.0 final release will happen on
> 25 May, that is, next Monday.
>
> J

Hi all,

valgrind-3.16.0.RC2 doesn't work for me (nor did the previous version on this
server).

_*My Fortran test program (error-free, I think) is as simple as:*_

    PROGRAM reduce
    USE mpi
    IMPLICIT NONE

    INTEGER :: me,  ncpus, ierr
    REAL :: buff,  resu=0

    CALL MPI_INIT(ierr)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD,me,ierr)
    CALL MPI_COMM_SIZE(MPI_COMM_WORLD,ncpus,ierr)
    buff=1

    ! Sum buff over all ranks; every rank receives the total in resu
    CALL MPI_ALLREDUCE(buff,resu,1,MPI_REAL,MPI_SUM,MPI_COMM_WORLD,ierr)
    if (me == 0 ) WRITE(6,'(a,i0,2(a,f14.6))') 'On ',me,' I have ',buff,' and got ',resu

    CALL MPI_FINALIZE(ierr)

    END PROGRAM reduce

_*Compilation with:*_

    mpifort reduce.F90 -o reduce

where mpifort --show reports:

    /opt/GCC73/bin/gfortran -I/opt/openmpi-GCC73/v3.1.x-20181010/include -pthread -I/opt/openmpi-GCC73/v3.1.x-20181010/lib -Wl,-rpath -Wl,/opt/openmpi-GCC73/v3.1.x-20181010/lib -Wl,--enable-new-dtags -L/opt/openmpi-GCC73/v3.1.x-20181010/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi

_*OS is *_CentOS Linux release 7.7.1908 (Core)

_*Valgrind compiled with gcc7.3, configure options are:*_

    ./configure --enable-only64bit --with-mpicc=$(which mpicc) \
        --prefix=/robin/data/begou/VALGRIND/valgrind-binaries

_*Hardware is:*_

    Dell Poweredge R940

    4 x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (launched Q3 2017)

    1.5 TB of RAM

_*Compiler*_ is gcc (GCC) 7.3.0 (January 2018)

_*OpenMPI*_ is

    v3.1.x from the git repository as of 2018/10/10 (a patch was needed at
    that time), compiled with gcc 7.3.0

    *configure options are:*
    '--prefix=/opt/openmpi-GCC73/v3.1.x-20181010'
    '--enable-mpirun-prefix-by-default' '--disable-dlopen'
    '--enable-mca-no-build=openib' '--without-verbs' '--enable-mpi-cxx'
    '--without-slurm' '--enable-mpi-thread-multiple'

_*Error is:*_

[begou@grivola TESTS]$valgrind --version
valgrind-3.16.0.RC2

[begou@grivola TESTS]$mpirun -np 2 valgrind ./reduce
==306850== Memcheck, a memory error detector
==306850== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==306850== Using Valgrind-3.16.0.RC2 and LibVEX; rerun with -h for copyright info
==306850== Command: ./reduce
==306850==
==306851== Memcheck, a memory error detector
==306851== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==306851== Using Valgrind-3.16.0.RC2 and LibVEX; rerun with -h for copyright info
==306851== Command: ./reduce
==306851==
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F 0x5 0x25 0xA8 0x18 0x0
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==306850== valgrind: Unrecognised instruction at address 0x6ddf581.
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F 0x5 0x25 0xA8 0x18 0x0
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==306851== valgrind: Unrecognised instruction at address 0x6ddf581.
==306851==    at 0x6DDF581: opal_pointer_array_construct (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x6E01A78: mca_base_var_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x6DE3E39: opal_init_util (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x552ED60: ompi_mpi_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306851==    by 0x555F9ED: PMPI_Init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306851==    by 0x52ABBB7: PMPI_INIT (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi_mpifh.so.40.11.2)
==306851==    by 0x400CDD: MAIN__ (in /HA/sources/begou/TESTS/reduce)
==306851==    by 0x400E8C: main (in /HA/sources/begou/TESTS/reduce)
==306851== Your program just tried to execute an instruction that Valgrind
==306851== did not recognise.  There are two possible reasons for this.
==306851== 1. Your program has a bug and erroneously jumped to a non-code
==306851==    location.  If you are running Memcheck and you just saw a
==306851==    warning about a bad jump, it's probably your program's fault.
==306851== 2. The instruction is legitimate but Valgrind doesn't handle it,
==306851==    i.e. it's Valgrind's fault.  If you think this is the case or
==306851==    you are not sure, please let us know and we'll try to fix it.
==306851== Either way, Valgrind will now raise a SIGILL signal which will
==306851== probably kill your program.
==306850==    at 0x6DDF581: opal_pointer_array_construct (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x6E01A78: mca_base_var_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x6DE3E39: opal_init_util (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x552ED60: ompi_mpi_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306850==    by 0x555F9ED: PMPI_Init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306850==    by 0x52ABBB7: PMPI_INIT (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi_mpifh.so.40.11.2)
==306850==    by 0x400CDD: MAIN__ (in /HA/sources/begou/TESTS/reduce)
==306850==    by 0x400E8C: main (in /HA/sources/begou/TESTS/reduce)
==306850== Your program just tried to execute an instruction that Valgrind
==306850== did not recognise.  There are two possible reasons for this.
==306850== 1. Your program has a bug and erroneously jumped to a non-code
==306850==    location.  If you are running Memcheck and you just saw a
==306850==    warning about a bad jump, it's probably your program's fault.
==306850== 2. The instruction is legitimate but Valgrind doesn't handle it,
==306850==    i.e. it's Valgrind's fault.  If you think this is the case or
==306850==    you are not sure, please let us know and we'll try to fix it.
==306850== Either way, Valgrind will now raise a SIGILL signal which will
==306850== probably kill your program.

Program received signal SIGILL: Illegal instruction.

Program received signal SIGILL: Illegal instruction.

Backtrace for this error:

Backtrace for this error:
#0  0x66ce3af in ???
#0  0x66ce3af in ???
#1  0x6ddf581 in ???
#2  0x6e01a78 in ???
#3  0x6de3e39 in ???
#1  0x6ddf581 in ???
#4  0x552ed60 in ???
#5  0x555f9ed in ???
#6  0x52abbb7 in ???
#2  0x6e01a78 in ???
#7  0x400cdd in ???
#8  0x400e8c in ???
#9  0x66ba504 in ???
#10  0x400c18 in ???
#3  0x6de3e39 in ???
#11  0xffffffffffffffff in ???
#4  0x552ed60 in ???
#5  0x555f9ed in ???
#6  0x52abbb7 in ???
#7  0x400cdd in ???
#8  0x400e8c in ???
==306851==
==306851== Process terminating with default action of signal 4 (SIGILL)
#9  0x66ba504 in ???
==306851==    at 0x648B4BB: raise (in /usr/lib64/libpthread-2.17.so)
==306851==    by 0x66CE3AF: ??? (in /usr/lib64/libc-2.17.so)
==306851==    by 0x6DDF580: opal_pointer_array_construct (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x6E01A78: mca_base_var_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x6DE3E39: opal_init_util (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x552ED60: ompi_mpi_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306851==    by 0x555F9ED: PMPI_Init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306851==    by 0x52ABBB7: PMPI_INIT (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi_mpifh.so.40.11.2)
==306851==    by 0x400CDD: MAIN__ (in /HA/sources/begou/TESTS/reduce)
==306851==    by 0x400E8C: main (in /HA/sources/begou/TESTS/reduce)
#10  0x400c18 in ???
#11  0xffffffffffffffff in ???
==306850==
==306850== Process terminating with default action of signal 4 (SIGILL)
==306850==    at 0x648B4BB: raise (in /usr/lib64/libpthread-2.17.so)
==306850==    by 0x66CE3AF: ??? (in /usr/lib64/libc-2.17.so)
==306850==    by 0x6DDF580: opal_pointer_array_construct (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x6E01A78: mca_base_var_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x6DE3E39: opal_init_util (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x552ED60: ompi_mpi_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306850==    by 0x555F9ED: PMPI_Init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306850==    by 0x52ABBB7: PMPI_INIT (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi_mpifh.so.40.11.2)
==306850==    by 0x400CDD: MAIN__ (in /HA/sources/begou/TESTS/reduce)
==306850==    by 0x400E8C: main (in /HA/sources/begou/TESTS/reduce)
==306851==
==306851== HEAP SUMMARY:
==306851==     in use at exit: 8,830 bytes in 65 blocks
==306851==   total heap usage: 123 allocs, 58 frees, 90,778 bytes allocated
==306851==
==306850==
==306850== HEAP SUMMARY:
==306850==     in use at exit: 8,830 bytes in 65 blocks
==306850==   total heap usage: 123 allocs, 58 frees, 90,778 bytes allocated
==306850==
==306851== LEAK SUMMARY:
==306851==    definitely lost: 0 bytes in 0 blocks
==306851==    indirectly lost: 0 bytes in 0 blocks
==306851==      possibly lost: 0 bytes in 0 blocks
==306851==    still reachable: 8,830 bytes in 65 blocks
==306851==         suppressed: 0 bytes in 0 blocks
==306851== Rerun with --leak-check=full to see details of leaked memory
==306851==
==306851== For lists of detected and suppressed errors, rerun with: -s
==306851== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==306850== LEAK SUMMARY:
==306850==    definitely lost: 0 bytes in 0 blocks
==306850==    indirectly lost: 0 bytes in 0 blocks
==306850==      possibly lost: 0 bytes in 0 blocks
==306850==    still reachable: 8,830 bytes in 65 blocks
==306850==         suppressed: 0 bytes in 0 blocks
==306850== Rerun with --leak-check=full to see details of leaked memory
==306850==
==306850== For lists of detected and suppressed errors, rerun with: -s
==306850== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node grivola exited on signal 4 (Illegal instruction).


Thanks

Patrick

_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
