[valgrind] [Bug 413251] Compilation error using GCC 7.4.0 & OpenMPI 4.0.2
https://bugs.kde.org/show_bug.cgi?id=413251 --- Comment #6 from Carl Ponder --- I wouldn't have a clue how. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 413251] Compilation error using GCC 7.4.0 & OpenMPI 4.0.2
https://bugs.kde.org/show_bug.cgi?id=413251

Carl Ponder changed:

           What    |Removed    |Added
             Status|NEEDSINFO  |REPORTED
         Resolution|NOT A BUG  |---

--- Comment #4 from Carl Ponder ---
(I had put this into the "NEEDSINFO" state, but evidently that means "NEEDSINFO" from me, not from you!)
I'm going to hold this open because MPI-1 compatibility support might not always be available in OpenMPI, and you ought to consider adjusting your interface for future use.
[valgrind] [Bug 413251] Compilation error using GCC 7.4.0 & OpenMPI 4.0.2
https://bugs.kde.org/show_bug.cgi?id=413251

Carl Ponder changed:

           What    |Removed    |Added
         Resolution|---        |NOT A BUG
             Status|REPORTED   |NEEDSINFO

--- Comment #2 from Carl Ponder ---
Building OpenMPI with "--enable-mpi1-compatibility" appears to solve the problem for both Valgrind and PNetCDF. I'm going to close this issue now. It looks like OpenMPI broke compatibility going from 4.0.1 to 4.0.2.
Can anyone comment on the Valgrind dependency, though?
[valgrind] [Bug 413251] Compilation error using GCC 7.4.0 & OpenMPI 4.0.2
https://bugs.kde.org/show_bug.cgi?id=413251

Carl Ponder changed:

           What    |Removed    |Added
                 CC|           |cpon...@nvidia.com

--- Comment #1 from Carl Ponder ---
This is the 3.15.0 release, not just a circa-3.15 snapshot from the SVN repository. Shouldn't you update the Version list?
[valgrind] [Bug 413251] New: Compilation error using GCC 7.4.0 & OpenMPI 4.0.2
https://bugs.kde.org/show_bug.cgi?id=413251

            Bug ID: 413251
           Summary: Compilation error using GCC 7.4.0 & OpenMPI 4.0.2
           Product: valgrind
           Version: 3.15 SVN
          Platform: Ubuntu Packages
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: general
          Assignee: jsew...@acm.org
          Reporter: cpon...@nvidia.com
  Target Milestone: ---

SUMMARY
I get these errors in the build:

Making all in mpi
make[2]: Entering directory '/gpfs/fs1/SHARE/Utils/Valgrind/3.15.0/GCC-7.4.0_OpenMPI-4.0.2/distro/mpi'
/gpfs/fs1/SHARE/Utils/OpenMPI/4.0.2/GCC-7.4.0_CUDA-10.1.243.0_418.87.00_UCX-2019-10-19_HWLoc-2.1.0_ZLib-1.2.11_NUMActl-2.0.13/bin/mpicc -I../include -g -O -fno-omit-frame-pointer -Wall -fpic -m64 -Wno-deprecated-declarations -MT libmpiwrap_amd64_linux_so-libmpiwrap.o -MD -MP -MF .deps/libmpiwrap_amd64_linux_so-libmpiwrap.Tpo -c -o libmpiwrap_amd64_linux_so-libmpiwrap.o `test -f 'libmpiwrap.c' || echo './'`libmpiwrap.c
In file included from libmpiwrap.c:116:0:
libmpiwrap.c: In function ‘showTy’:
libmpiwrap.c:281:19: error: expected expression before ‘_Static_assert’
   else if (ty == MPI_UB) fprintf(f,"UB");
                  ^
libmpiwrap.c:282:19: error: expected expression before ‘_Static_assert’
   else if (ty == MPI_LB) fprintf(f,"LB");
                  ^
libmpiwrap.c: In function ‘showCombiner’:
libmpiwrap.c:354:12: error: expected expression before ‘_Static_assert’
      case MPI_COMBINER_HVECTOR_INTEGER: fprintf(f, "HVECTOR_INTEGER"); break;
           ^
libmpiwrap.c:354:12: error: expected expression before ‘_Static_assert’
libmpiwrap.c:354:40: error: expected expression before ‘:’ token
      case MPI_COMBINER_HVECTOR_INTEGER: fprintf(f, "HVECTOR_INTEGER"); break;
                                       ^
In file included from libmpiwrap.c:116:0:
libmpiwrap.c:359:12: error: expected expression before ‘_Static_assert’
      case MPI_COMBINER_HINDEXED_INTEGER: fprintf(f, "HINDEXED_INTEGER"); break;
           ^
libmpiwrap.c:359:12: error: expected expression before ‘_Static_assert’
libmpiwrap.c:359:41: error: expected expression before ‘:’ token
      case MPI_COMBINER_HINDEXED_INTEGER: fprintf(f, "HINDEXED_INTEGER"); break;
                                        ^
In file included from libmpiwrap.c:116:0:
libmpiwrap.c:366:12: error: expected expression before ‘_Static_assert’
      case MPI_COMBINER_STRUCT_INTEGER: fprintf(f, "STRUCT_INTEGER"); break;
           ^
In file included from libmpiwrap.c:116:0:
libmpiwrap.c:366:12: error: expected expression before ‘_Static_assert’
      case MPI_COMBINER_STRUCT_INTEGER: fprintf(f, "STRUCT_INTEGER"); break;
           ^
libmpiwrap.c:366:12: error: expected expression before ‘_Static_assert’
libmpiwrap.c:366:39: error: expected expression before ‘:’ token
      case MPI_COMBINER_STRUCT_INTEGER: fprintf(f, "STRUCT_INTEGER"); break;
                                      ^
libmpiwrap.c: In function ‘extentOfTy’:
libmpiwrap.c:462:8: warning: implicit declaration of function ‘PMPI_Type_extent’; did you mean ‘MPI_Type_extent’? [-Wimplicit-function-declaration]
   r = PMPI_Type_extent(ty, );
       ^~~~
       MPI_Type_extent
In file included from libmpiwrap.c:116:0:
libmpiwrap.c: In function ‘walk_type’:
libmpiwrap.c:736:17: error: expected expression before ‘_Static_assert’
      if (ty == MPI_LB || ty == MPI_UB)
                ^
Makefile:645: recipe for target 'libmpiwrap_amd64_linux_so-libmpiwrap.o' failed
make[2]: *** [libmpiwrap_amd64_linux_so-libmpiwrap.o] Error 1
make[2]: Leaving directory '/gpfs/fs1/SHARE/Utils/Valgrind/3.15.0/GCC-7.4.0_OpenMPI-4.0.2/distro/mpi'
Makefile:841: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/gpfs/fs1/SHARE/Utils/Valgrind/3.15.0/GCC-7.4.0_OpenMPI-4.0.2/distro'
Makefile:710: recipe for target 'all' failed
make: *** [all] Error 2

STEPS TO REPRODUCE
I'm using these configuration parameters:

+ ./configure --with-mpicc=/gpfs/fs1/SHARE/Utils/OpenMPI/4.0.2/GCC-7.4.0_CUDA-10.1.243.0_418.87.00_UCX-2019-10-19_HWLoc-2.1.0_ZLib-1.2.11_NUMActl-2.0.13/bin/mpicc --prefix=/gpfs/fs1/SHARE/Utils/Valgrind/3.15.0/GCC-7.4.0_OpenMPI-4.0.2

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Ubuntu 18.04
OpenMPI: 4.0.2

ADDITIONAL INFORMATION
I believe the list of constants changed between OpenMPI 4.0.1 and 4.0.2. I'm seeing similar breakages building the latest PNetCDF. I suspect there may be a flag to get around this, though.
[valgrind] [Bug 371966] No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966

--- Comment #24 from Carl Ponder <cpon...@nvidia.com> ---
I can upload an executable, or I can give you the source code for the test and instructions on how to build and run it. You'd still need to have the PGI runtime installed; I can help you get a demo copy if you need one.
About the zeroing of the space: (a) I can see there's nonzero junk in the array, and (b) PGI insists that they don't zero out stack arrays. Why do you keep insisting that they do? NVIDIA owns PGI, and I've been in weekly con-calls with their compiler developers for the last 5 years.
[valgrind] [Bug 371966] No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966

--- Comment #22 from Carl Ponder <cpon...@nvidia.com> ---
I know they're not zeroing out the space.
As for trying to intercept the subroutine call: I've worked a little at the coregrind/m_syswrap level, but those wrappers only intercept system calls, right? And you're saying there's no analogous convention that would let me intercept calls into the PGI runtime and record the uninitialized-data state, right?
[valgrind] [Bug 371966] No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966 --- Comment #18 from Carl Ponder <cpon...@nvidia.com> --- PGI confirms that this call to "__builtin_aa" is what's bumping the stack pointer. It's a subroutine inside the PGI runtime. Does valgrind have a way for us to intercept this subroutine-call and then mark the array-elements as being uninitialized? I think this would solve the problem for us.
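On the question in comment #18: rather than intercepting "__builtin_aa" itself, one possible workaround, assuming the test program can be changed and Valgrind's headers are installed, is to mark the array undefined explicitly with Memcheck's VALGRIND_MAKE_MEM_UNDEFINED client request (from <valgrind/memcheck.h>) right after the array is allocated. The request does not change the bytes, and it does nothing when the program runs outside Valgrind. The helper mark_undefined and the HAVE_MEMCHECK_H fallback are hypothetical names used only for this sketch; a Fortran caller would reach the helper through an ISO_C_BINDING interface, which is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Use Memcheck's client request when <valgrind/memcheck.h> is available;
   otherwise define a no-op stand-in so this file still compiles.
   HAVE_MEMCHECK_H is a hypothetical macro local to this sketch. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif
#ifndef HAVE_MEMCHECK_H
#  define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) ((void)(addr), (void)(len), 0)
#endif

/* Hypothetical helper, meant to be called right after a stack array is
   allocated: tells Memcheck that [p, p + nbytes) holds no defined data yet.
   The bytes themselves are left untouched, and outside Valgrind the
   request does nothing. */
void mark_undefined(void *p, size_t nbytes)
{
    assert(p != NULL || nbytes == 0);
    (void)VALGRIND_MAKE_MEM_UNDEFINED(p, nbytes);
}
```

For an array x(1:N) of 4-byte integers the call would cover 4*N bytes; after that, only the elements the program actually writes should count as defined, so a later use of x(6:10) in the test case should be reported regardless of how the compiler moved the stack pointer.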
[valgrind] [Bug 371966] No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966

--- Comment #17 from Carl Ponder <cpon...@nvidia.com> ---
I uploaded the two assembly files. From the "sdiff", I think this is where the allocations differ:

      -Mnostack_arrays                     -Mstack_arrays
      ---
494   ..Dcfi3:                             ..Dcfi3:
495       subq    $48, %rsp            |       subq    $32, %rsp
496       movq    %rbx, -24(%rbp)      |       movq    %rbx, -16(%rbp)
497       movq    %r12, -32(%rbp)      |       movq    %r12, -24(%rbp)
498       movq    %r13, -40(%rbp)      |       movq    %r13, -32(%rbp)
499   ## lineno: 38                        ## lineno: 38
500       movq    %rdi, %rbx                   movq    %rdi, %rbx
501       movl    (%rbx), %eax                 movl    (%rbx), %eax
502       movl    %eax, -16(%rbp)      |       movl    %eax, -8(%rbp)
503       movslq  -16(%rbp), %rax      |       movslq  -8(%rbp), %rdi
504       movq    %rax, -8(%rbp)       |       shlq    $2, %rdi
505       leaq    -8(%rbp), %rdi       |       call    __builtin_aa
          xorl    %eax, %eax           <
          movl    $.C2_299, %esi       <
          call    pgf90_auto_alloc04   <
          movq    %rax, %r12                   movq    %rax, %r12

(I'm including the line numbers, up to the point where they correspond between the two files.)
I'm guessing that pgf90_auto_alloc04 and __builtin_aa are performing the allocations; I'll check with PGI on this.
[valgrind] [Bug 371966] No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966 --- Comment #16 from Carl Ponder <cpon...@nvidia.com> --- Created attachment 102409 --> https://bugs.kde.org/attachment.cgi?id=102409=edit Assembly generated with stack arrays, where valgrind doesn't work
[valgrind] [Bug 371966] No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966 --- Comment #15 from Carl Ponder <cpon...@nvidia.com> --- Created attachment 102408 --> https://bugs.kde.org/attachment.cgi?id=102408=edit Assembly generated without stack-arrays, where valgrind works
[valgrind] [Bug 371966] No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966

--- Comment #13 from Carl Ponder <cpon...@nvidia.com> ---
Given that there's junk in the array, I know that the contents aren't being zeroed out, and the PGI people confirm that arrays allocated under -Mstack_arrays are not initialized. How does valgrind recognize that an array is being initialized under these circumstances? Is it following the control flow instruction by instruction?
[valgrind] [Bug 371966] No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966

--- Comment #11 from Carl Ponder <cpon...@nvidia.com> ---
Back to comment #9: there *is* no instruction initializing the array, which is why it has some junk entries, even though valgrind doesn't report them. According to the PGI people, the -Mstack_arrays flag causes local arrays to be allocated on the stack, so the allocation is just a matter of adjusting the stack pointer rather than invoking "malloc" or an equivalent.
Does valgrind work by intercepting the malloc calls and then tabulating the uninitialized memory cells? And if the arrays are allocated off the stack in gfortran or gcc, how would valgrind keep track of this?
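On the questions in comments #11 and #13: Memcheck does not rely only on intercepting malloc. It keeps shadow definedness state for every byte of memory, marks heap blocks undefined when its replacement allocator hands them out, marks stack memory undefined when it sees the stack pointer move down, propagates definedness through every instruction it executes, and reports when an undefined value is actually used (for example in a conditional branch). The C sketch below is a deliberately simplified toy model of that bookkeeping, with one flag per byte instead of per-bit V bits and invented function names; it is not Memcheck's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MEM_SIZE 64

/* Toy shadow memory: one "defined" flag per byte. Real Memcheck keeps
   per-bit V bits plus separate addressability (A) state. */
typedef struct {
    unsigned char data[TOY_MEM_SIZE];
    bool defined[TOY_MEM_SIZE];
    size_t sp;   /* toy stack pointer; the stack grows downward */
    int errors;  /* number of uses of undefined bytes seen so far */
} ToyMachine;

void toy_init(ToyMachine *m)
{
    memset(m, 0, sizeof *m);
    m->sp = TOY_MEM_SIZE;
}

/* Moving SP down exposes bytes whose contents are stale junk, so their
   shadow state is reset to "undefined". */
size_t toy_stack_alloc(ToyMachine *m, size_t len)
{
    assert(len <= m->sp);
    m->sp -= len;
    for (size_t i = m->sp; i < m->sp + len; i++)
        m->defined[i] = false;
    return m->sp;
}

/* Moving SP back up releases the frame. The toy leaves the junk bytes in
   place; real Memcheck also tracks whether such bytes are still addressable. */
void toy_stack_free(ToyMachine *m, size_t len)
{
    assert(m->sp + len <= TOY_MEM_SIZE);
    m->sp += len;
}

/* Storing a defined value makes the destination byte defined. */
void toy_store(ToyMachine *m, size_t addr, unsigned char value)
{
    assert(addr < TOY_MEM_SIZE);
    m->data[addr] = value;
    m->defined[addr] = true;
}

/* Using a byte (say, in a branch condition) is where an error is counted. */
unsigned char toy_use(ToyMachine *m, size_t addr)
{
    assert(addr < TOY_MEM_SIZE);
    if (!m->defined[addr])
        m->errors++;
    return m->data[addr];
}
```

In this model, writing x(1:5) and then using all ten elements counts five errors, and junk left behind in a reused stack frame is still flagged because the allocation step reset its shadow state. The open question in this report is why, with -Mstack_arrays, the monitor xb dumps in this thread show the array bytes as defined instead.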
[valgrind] [Bug 371966] No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966

--- Comment #10 from Carl Ponder <cpon...@nvidia.com> ---
Stopping at line 70 puts it right after the array-allocation but before the array-writes are happening:

   62   implicit none
   63   integer, intent(in) :: N
   64   integer ( kind = 4 ) i
   65   integer ( kind = 4 ) :: x(1:N)
   66
   67   !
   68   ! X = { 0, 1, 2, 3, 4, ?a, ?b, ?c, ?d, ?e }.
   69   !
   70   do i = 1, 5

The data-state still says initialized, even though the array contains junk values:

(gdb) print x
$2 = (40, 0, 117993993, 0, 117993992, 0, 69349896, 0, 19, 0)
(gdb) print
$3 = (PTR TO -> ( integer (10))) 0xffeffed90
(gdb) monitor xb 0xffeffed90 40
                00    00    00    00    00    00    00    00
0xFFEFFED90:  0x28  0x00  0x00  0x00  0x00  0x00  0x00  0x00
                00    00    00    00    00    00    00    00
0xFFEFFED98:  0x09  0x72  0x08  0x07  0x00  0x00  0x00  0x00
                00    00    00    00    00    00    00    00
0xFFEFFEDA0:  0x08  0x72  0x08  0x07  0x00  0x00  0x00  0x00
                00    00    00    00    00    00    00    00
0xFFEFFEDA8:  0x08  0x32  0x22  0x04  0x00  0x00  0x00  0x00
                00    00    00    00    00    00    00    00
0xFFEFFEDB0:  0x13  0x00  0x00  0x00  0x00  0x00  0x00  0x00

I'm checking with the compiler guys on this.
[valgrind] [Bug 371966] No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966

--- Comment #8 from Carl Ponder <cpon...@nvidia.com> ---
If I *don't* compile with the -Mstack_arrays, I get this at line 77 instead:

(gdb) print x
$1 = (0, 1, 2, 3, 4, 0, 0, 0, 0, 0)
(gdb) print
$2 = (PTR TO -> ( integer (10))) 0x70881d0
(gdb) monitor xb 0x70881d0 40
                00    00    00    00    00    00    00    00
0x70881D0:    0x00  0x00  0x00  0x00  0x01  0x00  0x00  0x00
                00    00    00    00    00    00    00    00
0x70881D8:    0x02  0x00  0x00  0x00  0x03  0x00  0x00  0x00
                00    00    00    00    ff    ff    ff    ff
0x70881E0:    0x04  0x00  0x00  0x00  0x00  0x00  0x00  0x00
                ff    ff    ff    ff    ff    ff    ff    ff
0x70881E8:    0x00  0x00  0x00  0x00  0x00  0x00  0x00  0x00
                ff    ff    ff    ff    ff    ff    ff    ff
0x70881F0:    0x00  0x00  0x00  0x00  0x00  0x00  0x00  0x00
[valgrind] [Bug 371966] No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966

--- Comment #7 from Carl Ponder <cpon...@nvidia.com> ---
Ok here's better -- I can see the data if I compile using "-O0 -g" rather than "-O0 -gopt", which I'd assumed would be the same thing. Here's what I'm seeing in the step-through: at line 77, the array contains

(gdb) print x
$1 = (0, 1, 2, 3, 4, 0, 69349896, 0, 19, 0)

where x(6:10) are uninitialized values. Here are the bits for the 40-byte range of x:

(gdb) print
$6 = (PTR TO -> ( integer (10))) 0xffeffed90
(gdb) monitor xb 0xffeffed90 40
                00    00    00    00    00    00    00    00
0xFFEFFED90:  0x00  0x00  0x00  0x00  0x01  0x00  0x00  0x00
                00    00    00    00    00    00    00    00
0xFFEFFED98:  0x02  0x00  0x00  0x00  0x03  0x00  0x00  0x00
                00    00    00    00    00    00    00    00
0xFFEFFEDA0:  0x04  0x00  0x00  0x00  0x00  0x00  0x00  0x00
                00    00    00    00    00    00    00    00
0xFFEFFEDA8:  0x08  0x32  0x22  0x04  0x00  0x00  0x00  0x00
                00    00    00    00    00    00    00    00
0xFFEFFEDB0:  0x13  0x00  0x00  0x00  0x00  0x00  0x00  0x00

This doesn't look right to me, given that x(4) is assigned but x(8) is not:

(gdb) print x(4)
$18 = 3
(gdb) print (4)
$19 = (PTR TO -> ( integer )) 0xffeffed9c
(gdb) monitor xb 0xffeffed9c 4
                00    00    00    00
0xFFEFFED9C:  0x03  0x00  0x00  0x00
(gdb) print x(8)
$20 = 0
(gdb) print (8)
$21 = (PTR TO -> ( integer )) 0xffeffedac
(gdb) monitor xb 0xffeffedac 4
                00    00    00    00
0xFFEFFEDAC:  0x00  0x00  0x00  0x00

Based on the explanation in the document, I would expect all the bytes to be assigned FF for X(1:5) and 00 for the rest.
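When reading monitor xb output, each pair of lines shows the definedness (V) bits of eight bytes on the first line and the byte values on the second. In Memcheck a V bit of 0 means defined, so 00 is a fully defined byte and ff a fully undefined one. The small C helper below, whose name defined_elements is hypothetical, turns such V-bit bytes into a per-element verdict for 4-byte integers; it is only a reading aid for the dumps in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the V-bit bytes printed by Memcheck's
   "monitor xb" (0x00 = defined, 0xff = undefined; partially defined bytes
   show other values), decide whether each elem_size-byte element is fully
   defined. Writes 1/0 per element into defined_out and returns the number
   of fully defined elements. */
size_t defined_elements(const unsigned char *vbits, size_t nbytes,
                        size_t elem_size, int *defined_out)
{
    assert(elem_size > 0 && nbytes % elem_size == 0);
    size_t count = 0;
    for (size_t e = 0; e < nbytes / elem_size; e++) {
        int ok = 1;
        for (size_t b = 0; b < elem_size; b++)
            if (vbits[e * elem_size + b] != 0x00)
                ok = 0; /* any set V bit: element not fully defined */
        defined_out[e] = ok;
        count += (size_t)ok;
    }
    return count;
}
```

Fed the V bits from the comment #8 dump (built without -Mstack_arrays), this reports x(1:5) defined and x(6:10) undefined; fed the all-00 V bits from the -Mstack_arrays dumps, it reports all ten elements defined, which is the anomaly this bug is about.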
[valgrind] [Bug 371966] No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966

--- Comment #4 from Carl Ponder <cpon...@nvidia.com> ---
Can you please list out the commands more precisely? I ran these commands in one window:

  module purge
  module load pgi/16.9
  module load gcc/4.8.5
  module load valgrind
  pgfortran -o test03.pgi test03.f90 -O0 -gopt -Mstack_arrays
  valgrind --tool=memcheck --vgdb=full --vgdb-error=0 test03.pgi

Then in the second window I ran these commands:

  module purge
  module load pgi/16.9
  module load gcc/4.8.5
  module load valgrind
  gdb test03.pgi
  target remote | vgdb
  b 77
  c

so far so good. But now:

  print N

gives

  Cannot access memory at address 0x4011a000

Why is this? And

  print x(1)

gives

  value being subranged must be in memory

And

  xb 0x4011a000

gives

  Undefined command: "xb".  Try "help".
[valgrind] [Bug 371966] No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966

--- Comment #3 from Carl Ponder <cpon...@nvidia.com> ---
This "pgfortran" is the PGI Fortran compiler. What puzzles me is why valgrind finds more uninitialized array elements when I compile with gfortran than with pgfortran, and if I use

  pgfortran -O0 -gopt -Mstack_arrays ...

valgrind doesn't find any uninitialized array elements at all.
So this "gdb+vgdb" setup will show me the valgrind internal tables that keep track of what's initialized and what isn't?
[valgrind] [Bug 371966] No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966

--- Comment #1 from Carl Ponder <cpon...@nvidia.com> ---
I attached the test-case here. You can reproduce the issue as follows:

  pgfortran -o test03.pgi test03.f90 -O0 -gopt
  valgrind test03.pgi    # 12 errors.
  pgfortran -o test03.pgi test03.f90 -O0 -gopt -Mstack_arrays
  valgrind test03.pgi    # 0 errors.

I'm using the PGI 16.9 compiler running on CentOS 7.2. The valgrind was built with GCC 4.8.5.
[valgrind] [Bug 371966] New: No uninitialised values reported with PGI -Mstack_arrays
https://bugs.kde.org/show_bug.cgi?id=371966

            Bug ID: 371966
           Summary: No uninitialised values reported with PGI -Mstack_arrays
           Product: valgrind
           Version: 3.11.0
          Platform: unspecified
                OS: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: NOR
         Component: memcheck
          Assignee: jsew...@acm.org
          Reporter: cpon...@nvidia.com
  Target Milestone: ---

Created attachment 101954
  --> https://bugs.kde.org/attachment.cgi?id=101954=edit
Simple Fortran test-case using array with dynamic bound.

I have a simple Fortran test-case that allocates an array and uses uninitialized values from it. Using the PGI compiler, if I compile it using the -Mstack_arrays option, valgrind reports 0 errors.
I also have a HUGE program (WRF) where valgrind is likewise not reporting anything in spite of the fact that uninitialized array-elements are being used, so I'm trying to track down issues like this one.
Can you guys explain what's going on? I'm also checking with PGI on this.