It took a long time to run - but valgrind gave no errors.

balay@petsc-02:/scratch/balay/petsc/src/mat/tests$ valgrind --tool=memcheck ./ex238 -mat_block_size 15
==8099== Memcheck, a memory error detector
==8099== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==8099== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==8099== Command: ./ex238 -mat_block_size 15
==8099==
==8099== Warning: set address range perms: large range [0x59e43040, 0xeabb0bc0) (undefined)
==8099== Warning: set address range perms: large range [0x59e43028, 0xeabb0bd8) (noaccess)
==8099==
==8099== HEAP SUMMARY:
==8099==     in use at exit: 121 bytes in 2 blocks
==8099==   total heap usage: 535 allocs, 533 frees, 2,457,025,714 bytes allocated
==8099==
==8099== LEAK SUMMARY:
==8099==    definitely lost: 0 bytes in 0 blocks
==8099==    indirectly lost: 0 bytes in 0 blocks
==8099==      possibly lost: 0 bytes in 0 blocks
==8099==    still reachable: 121 bytes in 2 blocks
==8099==         suppressed: 0 bytes in 0 blocks
==8099== Rerun with --leak-check=full to see details of leaked memory
==8099==
==8099== For counts of detected and suppressed errors, rerun with: -v
==8099== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Satish

On Tue, 23 Feb 2021, Barry Smith wrote:

>
> Satish,
>
> Thanks for running this, but it is the 15 that is breaking, not the 12
> :-). It is crashing inside building the matrix on Solaris with memory
> corruption. But I am having trouble getting it to cause problems elsewhere.
>
> Barry
>
> I think it is just code what was not previously properly tested in the
> nightly builds, the code has been around for a while. Or could be a bug in my
> test program.
>
>
>
> > On Feb 22, 2021, at 10:31 PM, Satish Balay <[email protected]> wrote:
> >
> > I get the following with a debug build.
> >
> >>>>>>>>>
> > balay@petsc-02:/scratch/balay/petsc/src/mat/tests$ make ex238
> > gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
> > -fstack-protector -fvisibility=hidden -g3 -fPIC -Wall -Wwrite-strings
> > -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector
> > -fvisibility=hidden -g3 -I/scratch/balay/petsc/include
> > -I/scratch/balay/petsc/arch-linux-c-debug/include ex238.c
> > -Wl,-rpath,/scratch/balay/petsc/arch-linux-c-debug/lib
> > -L/scratch/balay/petsc/arch-linux-c-debug/lib
> > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/7
> > -L/usr/lib/gcc/x86_64-linux-gnu/7 -Wl,-rpath,/usr/lib/x86_64-linux-gnu
> > -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu
> > -L/lib/x86_64-linux-gnu -lpetsc -llapack -lblas -lpthread -lm -lX11
> > -lstdc++ -ldl -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++
> > -ldl -o ex238
> > balay@petsc-02:/scratch/balay/petsc/src/mat/tests$ valgrind --tool=memcheck
> > ./ex238 -mat_block_size 12
> > ==34355== Memcheck, a memory error detector
> > ==34355== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> > ==34355== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
> > ==34355== Command: ./ex238 -mat_block_size 12
> > ==34355==
> > ==34355== Warning: set address range perms: large range [0x59e43040,
> > 0xb696a840) (undefined)
> > <<<<<<<<
> >
> > Hang?
> > takes a long time. try a different example
> >
> >>>>>>>
> > balay@petsc-02:/scratch/balay/petsc/src/mat/tests$ make ex237
> > gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
> > -fstack-protector -fvisibility=hidden -g3 -fPIC -Wall -Wwrite-strings
> > -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector
> > -fvisibility=hidden -g3 -I/scratch/balay/petsc/include
> > -I/scratch/balay/petsc/arch-linux-c-debug/include ex237.c
> > -Wl,-rpath,/scratch/balay/petsc/arch-linux-c-debug/lib
> > -L/scratch/balay/petsc/arch-linux-c-debug/lib
> > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/7
> > -L/usr/lib/gcc/x86_64-linux-gnu/7 -Wl,-rpath,/usr/lib/x86_64-linux-gnu
> > -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu
> > -L/lib/x86_64-linux-gnu -lpetsc -llapack -lblas -lpthread -lm -lX11
> > -lstdc++ -ldl -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++
> > -ldl -o ex237
> > balay@petsc-02:/scratch/balay/petsc/src/mat/tests$ valgrind --tool=memcheck
> > -q ./ex237 -f
> > /scratch/balay/petsc/share/petsc/datafiles/matrices/spd-real-int32-float64
> > Benchmarking MatMult: with A seqaij 12x12
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x2
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x4
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x8
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x16
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x32
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x64
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x128
> > balay@petsc-02:/scratch/balay/petsc/src/mat/tests$
> >
> > <<<<<<<<<
> >
> > So the likely issue is - this opt build with '-march=native' [perhaps this
> > valgrind version is older than the cpu].
> >
> > Ok try an optimized build on an older CPU - aka es [@gce]
> >
> >>>>>>
> >
> > balay@es:/scratch/balay/petsc/src/mat/tests$ make ex237
> > gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
> > -fstack-protector -fvisibility=hidden -march=native -O3 -fPIC -Wall
> > -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector
> > -fvisibility=hidden -march=native -O3 -I/scratch/balay/petsc/include
> > -I/scratch/balay/petsc/arch-linux-c-opt/include ex237.c
> > -Wl,-rpath,/scratch/balay/petsc/arch-linux-c-opt/lib
> > -L/scratch/balay/petsc/arch-linux-c-opt/lib
> > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/7
> > -L/usr/lib/gcc/x86_64-linux-gnu/7 -Wl,-rpath,/usr/lib/x86_64-linux-gnu
> > -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu
> > -L/lib/x86_64-linux-gnu -lpetsc -llapack -lblas -lpthread -lm -lX11
> > -lstdc++ -ldl -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++
> > -ldl -o ex237
> > balay@es:/scratch/balay/petsc/src/mat/tests$ valgrind --tool=memcheck -q
> > ./ex237 -f
> > /scratch/balay/petsc/share/petsc/datafiles/matrices/spd-real-int32-float64
> > Benchmarking MatMult: with A seqaij 12x12
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x2
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x4
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x8
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x16
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x32
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x64
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x128
> > balay@es:/scratch/balay/petsc/src/mat/tests$
> >
> > <<<<<<<
> >
> > Satish
> >
> >
> >
> > On Mon, 22 Feb 2021, Barry Smith wrote:
> >
> >>
> >> I knew they hate Macs but now Linux? Any trustworthy machines to run
> >> valgrind?
> >>
> >>
> >> $ petscmpiexec -valgrind -n 1 ./ex238 -mat_block_size 12
> >> ==14144==
> >> ==14144== Process terminating with default action of signal 4 (SIGILL)
> >> ==14144==  Illegal opcode at address 0x4F808A9
> >> ==14144==    at 0x4F808A9: PetscSetDisplay (in
> >> /scratch/bsmith/petsc/arch-add-baij-12/lib/libpetsc.so.3.014.4)
> >> ==14144==    by 0x4F086BD: PetscOptionsCheckInitial_Private (in
> >> /scratch/bsmith/petsc/arch-add-baij-12/lib/libpetsc.so.3.014.4)
> >> ==14144==    by 0x4F0D5BC: PetscInitialize (in
> >> /scratch/bsmith/petsc/arch-add-baij-12/lib/libpetsc.so.3.014.4)
> >> ==14144==    by 0x108D0E: main (in
> >> /scratch/bsmith/petsc/src/mat/tests/ex238)
> >> Illegal instruction (core dumped)
> >> /scratch/bsmith/petsc/src/mat/tests (barry/2021-02-12/add-baij-12=)
> >> arch-add-baij-12
> >> $ echo $PETSC_OPTIONS
> >>
> >> /scratch/bsmith/petsc/src/mat/tests (barry/2021-02-12/add-baij-12=)
> >> arch-add-baij-12
> >> $ hostname
> >> petsc-02
> >> /scratch/bsmith/petsc/src/mat/tests (barry/2021-02-12/add-baij-12=)
> >> arch-add-baij-12
> >> $ uname -a
> >> Linux petsc-02 4.15.0-135-generic #139-Ubuntu SMP Mon Jan 18 17:38:24 UTC
> >> 2021 x86_64 x86_64 x86_64 GNU/Linux
> >> /scratch/bsmith/petsc/src/mat/tests (barry/2021-02-12/add-baij-12=)
> >> arch-add-baij-12
> >> $ which valgrind
> >> /usr/bin/valgrind
> >>
> >> $ make ex237
> >> gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
> >> -fstack-protector -fvisibility=hidden -march=native -O3 -fPIC -Wall
> >> -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
> >> -fstack-protector -fvisibility=hidden -march=native -O3
> >> -I/scratch/bsmith/petsc/include
> >> -I/scratch/bsmith/petsc/arch-add-baij-12/include ex237.c
> >> -Wl,-rpath,/scratch/bsmith/petsc/arch-add-baij-12/lib
> >> -L/scratch/bsmith/petsc/arch-add-baij-12/lib -lpetsc -llapack -lblas
> >> -lpthread -lm -lX11 -lquadmath -ldl -o ex237
> >> /scratch/bsmith/petsc/src/mat/tests (barry/2021-02-12/add-baij-12=)
> >> arch-add-baij-12
> >> $ petscmpiexec -valgrind -n 1 ./ex237
> >> ==14841==
> >> ==14841== Process terminating with default action of signal 4 (SIGILL)
> >> ==14841==  Illegal opcode at address 0x4F808A9
> >> ==14841==    at 0x4F808A9: PetscSetDisplay (in
> >> /scratch/bsmith/petsc/arch-add-baij-12/lib/libpetsc.so.3.014.4)
> >> ==14841==    by 0x4F086BD: PetscOptionsCheckInitial_Private (in
> >> /scratch/bsmith/petsc/arch-add-baij-12/lib/libpetsc.so.3.014.4)
> >> ==14841==    by 0x4F0D5BC: PetscInitialize (in
> >> /scratch/bsmith/petsc/arch-add-baij-12/lib/libpetsc.so.3.014.4)
> >> ==14841==    by 0x109DE0: main (in
> >> /scratch/bsmith/petsc/src/mat/tests/ex237)
> >> Illegal instruction (core dumped)
> >> /scratch/bsmith/petsc/src/mat/tests (barry/2021-02-12/add-baij-12=)
> >> arch-add-baij-12
> >>
> >>
> >
>
