[Bug tree-optimization/98254] Failure to optimize simple pattern for __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98254 --- Comment #4 from rguenther at suse dot de --- On December 12, 2020 8:36:07 PM GMT+01:00, "jakub at gcc dot gnu.org" wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98254 > >--- Comment #3 from Jakub Jelinek --- >(In reply to rguent...@suse.de from comment #2) >> Should already be handled by vectorizing the CTOR. > >I've tried: > >typedef int __attribute__((vector_size(16))) V; > >V >foo (short *a) >{ > return (V){a[0], a[1], a[2], a[3]}; >} > >V >bar (int *a) >{ > return (V){a[0], a[1], a[2], a[3]}; >} > >and we don't do a vector (unaligned) read even in bar with -O3 >-fno-tree-slp-vectorize, it is just SLP vectorization that makes it >vectorize. >If we should handle foo as convertvector, we should handle bar in the >same spot >as vector load from memory. I see. Forwprop handles some conversions already, would need to check what is missing for this case.
[Bug target/98259] New: [11 Regression] error: 'void verify_insn_chain()' causes a section type conflict with 'void init_rtl_bb_info(basic_block)'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98259

Bug ID: 98259
Summary: [11 Regression] error: 'void verify_insn_chain()' causes a section type conflict with 'void init_rtl_bb_info(basic_block)'
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: doko at debian dot org
Target Milestone: ---

seen with 20201212 with a profiled bootstrap on arm-linux-gnueabihf:

../../src/gcc/symtab.c: In static member function 'static void symtab_node::verify_symtab_nodes()':
../../src/gcc/symtab.c:1349:1: error: 'void symtab_node::verify()' causes a section type conflict with 'void symbol_table::symtab_initialize_asm_name_hash()'
 1349 | symtab_node::verify (void)
      | ^~~
../../src/gcc/symtab.c:259:1: note: 'void symbol_table::symtab_initialize_asm_name_hash()' was declared here
  259 | symbol_table::symtab_initialize_asm_name_hash (void)
      | ^~~~
make[5]: *** [Makefile:1123: symtab.o] Error 1
make[5]: *** Waiting for unfinished jobs
../../src/gcc/cfgrtl.c: In function 'void cfg_layout_finalize()':
../../src/gcc/cfgrtl.c:4057:1: error: 'void verify_insn_chain()' causes a section type conflict with 'void init_rtl_bb_info(basic_block)'
 4057 | verify_insn_chain (void)
      | ^
../../src/gcc/cfgrtl.c:5134:1: note: 'void init_rtl_bb_info(basic_block)' was declared here
 5134 | init_rtl_bb_info (basic_block bb)
      | ^~~~
make[5]: *** [Makefile:1123: cfgrtl.o] Error 1

gcc is configured with:

--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --with-gcc-major-version-only --program-prefix= --enable-shared --enable-linker-build-id --disable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --disable-werror --enable-multilib --enable-checking=yes --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf

build target: profiledbootstrap-lean
[Bug libgomp/98258] New: Can't compile programs for both OpenMP (CPU) + OpenACC (GPU)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98258 Bug ID: 98258 Summary: Can't compile programs for both OpenMP (CPU) + OpenACC (GPU) Product: gcc Version: 10.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: mehdi.chinoune at hotmail dot com CC: jakub at gcc dot gnu.org Target Milestone: --- Trying to use OpenMP (CPU) for some parts and OpenACC (GPU) for others. I got: mkoffload: fatal error: either '-fopenacc' or '-fopenmp' must be set Another use is for multi-GPU programming, where OpenMP is used to distribute work among different GPUs
[Bug libgomp/95150] Some offloaded programs crash with openmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

Chinoune changed:

What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|--- |WONTFIX

--- Comment #8 from Chinoune ---
Adding "parallel do" to openmp directive solves the problem. The crash reappears with "collapse(2)" with both OpenMP and OpenACC.

program main
  implicit none
  integer, parameter :: sp = selected_real_kind(6,37)
  real(sp), allocatable :: a(:,:), b(:,:), c(:,:)
  character( len=5 ) :: val
  integer :: n, l, m
  integer :: i, j, k
  integer :: t1, t2
  real(sp) :: tic
  !
  call get_command_argument( 1, val )
  read( val, *) n
  l = n
  m = n
  !
  call system_clock( t1, tic)
  !
  allocate( a(l,m), b(m,n), c(l,n) )
  !
  call random_number(a)
  call random_number(b)
  c = 0._sp
  !
  !$acc data copyin(a,b) copy(c)
  !$acc parallel loop collapse(3)
  !$omp target teams distribute parallel do collapse(3) map( to:a,b ) map( tofrom:c )
  do j = 1, n
    do k = 1, m
      do i = 1, l
        c(i,j) = a(i,k)*b(k,j) + c(i,j)
      end do
    end do
  end do
  !$acc end data
  !
  call system_clock(t2)
  print*, n, (t2-t1)/tic, sum(c)
  !
end program main

$ gfortran -O3 -fopenmp -foffload=nvptx-none matmul.f90 -o test.x
$ for i in {1..5}; do ./test.x $((512*2**$i)); done
1024 0.28788 268377424.
2048 7.4010E-02 0.
4096 0.17002 0.
8192 0.57401 0.
16384 2.1049 0.
[Bug libgomp/95150] Some offloaded programs crash with openmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150 Chinoune changed: What|Removed |Added Known to fail|10.1.0 |10.2.0 Keywords||openacc Version|10.1.0 |10.2.0 --- Comment #7 from Chinoune --- with OpenACC, I got a similar message: libgomp: cuStreamSynchronize error: the launch timed out and was terminated
[Bug gcov-profile/98257] New: Replace Donald B. Johnson's cycle enumeration with iterative loop finding
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98257

Bug ID: 98257
Summary: Replace Donald B. Johnson's cycle enumeration with iterative loop finding
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: gcov-profile
Assignee: unassigned at gcc dot gnu.org
Reporter: i at maskray dot me
CC: marxin at gcc dot gnu.org
Target Milestone: ---

gcov used _J. C. Tiernan, An Efficient Search Algorithm to Find the Elementary Circuits of a Graph, Comm ACM 1970_. The worst-case time bound is exponential in the number of elementary circuits. It enumerated cycles (aka simple circuits, aka elementary circuits) and performed cycle cancelling.

In 2016, the resolution to PR67992 switched to Donald B. Johnson's algorithm to improve performance. The theoretical time complexity is $O((V+E)(c+1))$ where $c$ is the number of cycles, which is exponential in the size of the graph. (Boost attributed the algorithm to K. A. Hawick and H. A. James, and gcov inherited this name. However, that paper did not improve Johnson's algorithm.)

Actually every step of cycle cancelling decreases the count of at least one arc to 0, so there are at most $O(E)$ cycles. The resolution to PR90380 skipped non-positive arcs and decreased the time complexity to $O(V*E^2)$ (in theory it could be $O(E^2)$ but the implementation has a linear scan).

This is all unnecessary. We can just iteratively find cycles (using the classical tri-color DFS) and perform cycle cancelling. There are at most $O(E)$ cycles and the overall time complexity is $O(E^2)$.

(We are processing a reducible flow graph (there is no intuitive cycle count for an irreducible flow graph). Every natural loop is identified by a back edge. By constructing a dominator tree, finding back edges, identifying natural loops and clearing the arc counters (we will compute incoming counts so we clear counters to prevent duplicates), the time complexity can be decreased to $O(depthOfNestedLoops*E)$. In practice, the semi-NCA algorithm (time complexity: $O(V^2)$, but considered faster than the almost linear Lengauer-Tarjan algorithm) is not difficult to implement, but identifying natural loops is troublesome. So the method is not useful.)
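The iterative approach proposed in this report (tri-color DFS to find one cycle, cancel it, repeat) can be sketched in C as follows. This is only an illustration and not gcov's code: the `struct arc` layout, the fixed-size arrays, and the names `add_arc`, `dfs`, `cancel_cycles` and `example_total` are all assumptions made for the example. For brevity the DFS scans the whole arc array at every vertex; adjacency lists would be needed to reach the complexity discussed above.

```c
#include <assert.h>
#include <string.h>

#define MAX_V 16
#define MAX_E 64

/* Hypothetical arc representation: source, destination, execution count. */
struct arc { int src, dst; long count; };

static struct arc arcs[MAX_E];
static int n_arcs, n_vertices;

enum { WHITE, GREY, BLACK };
static int color[MAX_V];
static int path[MAX_V];   /* indices of the arcs on the current DFS path */
static int path_len;
static int cycle_start;   /* the cycle is path[cycle_start .. path_len-1] */

static void add_arc(int src, int dst, long count)
{
    assert(n_arcs < MAX_E);
    arcs[n_arcs].src = src;
    arcs[n_arcs].dst = dst;
    arcs[n_arcs].count = count;
    n_arcs++;
}

/* Tri-color DFS over arcs with a positive count.  Returns 1 as soon as an
   arc into a GREY vertex (a back edge) closes a cycle.  */
static int dfs(int u)
{
    color[u] = GREY;
    for (int i = 0; i < n_arcs; i++) {
        if (arcs[i].src != u || arcs[i].count <= 0)
            continue;
        int v = arcs[i].dst;
        path[path_len++] = i;
        if (color[v] == GREY) {
            /* Walk back to the path arc that leaves v.  */
            cycle_start = path_len - 1;
            while (arcs[path[cycle_start]].src != v)
                cycle_start--;
            return 1;
        }
        if (color[v] == WHITE && dfs(v))
            return 1;
        path_len--;
    }
    color[u] = BLACK;
    return 0;
}

/* Repeatedly find one cycle, subtract its minimum arc count from every arc
   on it, and accumulate that minimum.  Each round zeroes at least one arc,
   so there are at most n_arcs rounds.  */
static long cancel_cycles(void)
{
    long total = 0;
    for (;;) {
        int found = 0;
        memset(color, 0, sizeof color);   /* all vertices WHITE again */
        for (int s = 0; s < n_vertices && !found; s++) {
            path_len = 0;
            if (color[s] == WHITE)
                found = dfs(s);
        }
        if (!found)
            return total;
        long min = arcs[path[cycle_start]].count;
        for (int i = cycle_start; i < path_len; i++)
            if (arcs[path[i]].count < min)
                min = arcs[path[i]].count;
        for (int i = cycle_start; i < path_len; i++)
            arcs[path[i]].count -= min;
        total += min;
    }
}

/* Made-up example graph: a self-loop on vertex 2 and a loop 1 -> 2 -> 1.  */
static long example_total(void)
{
    n_arcs = 0;
    n_vertices = 4;
    add_arc(0, 1, 1);
    add_arc(1, 2, 10);
    add_arc(2, 2, 5);
    add_arc(2, 1, 9);
    add_arc(2, 3, 1);
    return cancel_cycles();
}
```

In the example graph the self-loop is cancelled first (minimum 5), then the loop 1 -> 2 -> 1 (minimum 9), after which no arc with a positive count lies on a cycle.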
[Bug fortran/98253] Conflicting random_seed/random_init results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98253 --- Comment #9 from Steve Kargl --- On Sat, Dec 12, 2020 at 11:55:41PM +, damian at sourceryinstitute dot org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98253 > > Damian Rouson changed: > >What|Removed |Added > > Resolution|--- |FIXED > Status|WAITING |RESOLVED > > --- Comment #8 from Damian Rouson --- > Steve, one more question. How do you interpret the second sentence in the > text > that I originally quoted: "In each execution of the program with the same > execution environment, if the invoking image index value in the initial team > is > the same, the value for PUT shall be the same." This is in 16.9.155 Case (i) > describing the relationship between random_init and random_seed. I originally > interpreted this quote to mean that each image would use the same seed each > time the program runs, which would be a constraint on the PRNG. I'm now > thinking that the reference to PUT implies that the user is setting the seed > and this is saying that the program must set the same seed each a given image > executes, but that seems like an odd constraint so I'm probably still horribly > confused. Feel free to mark this issue as invalid if this is starting to seem > like a waste of time. I'm just trying to understand. > > Either way, an image number is defined for all programs whether or not there > are coarrays anywhere in the program and whether or not the program is ever > executed in multiple images -- for example, this_image() is just an intrinsic > function rather than a (hypothetical) "coarray" intrinsic function. This > point > is most meaningful with a compiler like the Cray compiler, which requires no > special flags to compile a program that invokes this_image(). In some sense, > all Fortran programs are now parallel programs whether the user takes > advantage > of that fact in any explicit way or not. I suspect that's the reason that > IMAGE_DISTINCT is not optional. 
Possibly the committee deemed it better to > require users to specify the desired behavior in multi-image execution. Even > libraries that were never designed in any way to exploit parallelism can be > linked into parallel programs so it seems better to have developers of such a > library specify the desired behavior if their code is ultimately linked into a > parallel program -- analogous to requiring that code be thread-safe even if > the > code makes no explicit use of multi-threading. >

It's been a while since I implemented random_init() and thought about the combinations for the two arguments. Personally, I think the standard is flawed. If someone wants to review the wording of the standard and the implementation details of random_init(), I am certainly not going to object.
[Bug fortran/98253] Conflicting random_seed/random_init results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98253 Damian Rouson changed: What|Removed |Added Resolution|--- |FIXED Status|WAITING |RESOLVED --- Comment #8 from Damian Rouson --- Steve, one more question. How do you interpret the second sentence in the text that I originally quoted: "In each execution of the program with the same execution environment, if the invoking image index value in the initial team is the same, the value for PUT shall be the same." This is in 16.9.155 Case (i) describing the relationship between random_init and random_seed. I originally interpreted this quote to mean that each image would use the same seed each time the program runs, which would be a constraint on the PRNG. I'm now thinking that the reference to PUT implies that the user is setting the seed and this is saying that the program must set the same seed each a given image executes, but that seems like an odd constraint so I'm probably still horribly confused. Feel free to mark this issue as invalid if this is starting to seem like a waste of time. I'm just trying to understand. Either way, an image number is defined for all programs whether or not there are coarrays anywhere in the program and whether or not the program is ever executed in multiple images -- for example, this_image() is just an intrinsic function rather than a (hypothetical) "coarray" intrinsic function. This point is most meaningful with a compiler like the Cray compiler, which requires no special flags to compile a program that invokes this_image(). In some sense, all Fortran programs are now parallel programs whether the user takes advantage of that fact in any explicit way or not. I suspect that's the reason that IMAGE_DISTINCT is not optional. Possibly the committee deemed it better to require users to specify the desired behavior in multi-image execution. 
Even libraries that were never designed in any way to exploit parallelism can be linked into parallel programs so it seems better to have developers of such a library specify the desired behavior if their code is ultimately linked into a parallel program -- analogous to requiring that code be thread-safe even if the code makes no explicit use of multi-threading.
[Bug middle-end/98227] [11 Regression] ICE: tree check: expected tree that contains 'decl common' structure, have 'constructor' in get_section, at varasm.c:297 on riscv64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98227 --- Comment #5 from Jim Wilson --- My bootstrap with ada succeeded. I used the same configure options except for --prefix. make check is still running.
[Bug fortran/98253] Conflicting random_seed/random_init results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98253 --- Comment #7 from Damian Rouson --- I agree that it would have been better for image_distinct to be optional. I co-hosted the 2018 WG5 meeting at which there were lengthy discussions around random number generation. I don't recall whether making that argument optional was discussed. I assume it wouldn't break any existing code to make it optional in a future standard.
[Bug tree-optimization/98256] [11 Regression] ICE at -Os and above: verify_ssa failed since r11-5957
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98256 Jakub Jelinek changed: What|Removed |Added Priority|P3 |P1 Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2020-12-12 Target Milestone|--- |11.0 Ever confirmed|0 |1 Summary|ICE at -Os and above: |[11 Regression] ICE at -Os |verify_ssa failed |and above: verify_ssa ||failed since r11-5957 CC||jakub at gcc dot gnu.org Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
[Bug fortran/98253] Conflicting random_seed/random_init results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98253

--- Comment #6 from kargl at gcc dot gnu.org ---
(In reply to Dominique d'Humieres from comment #4)
> Invalid expectation?

Not sure. This long response was composed before I saw Damian's reply.

At the risk of starting an existential argument, I'll provide my understanding of the situation. Prior to random_init(), Fortran had random_seed(). When J3 added random_number() and random_seed() to the Fortran standard, the individual who wrote the specification forgot to include a statement about the state of the PRNG if random_seed() was not called. So, this simple program

   program foo
   real r
   call random_number(r)
   print *, r
   end program foo

when compiled with ifort would give a different PRN on each invocation. A long time ago, when compiled with gfortran, the program always gave the same PRN. (Janne changed gfortran's behavior when he replaced the KISS PRNG with xshiro++.) The problem was that ifort used a different processor-dependent set of seeds on each invocation whereas gfortran used the same processor-dependent set of seeds on each invocation. Both behaviors are standard conforming.

To resolve the problem, J3 could not select one behavior over the other without causing problems with a conforming program that relied on the old behavior. Steve Lionel wrote the specification for random_init() in hopes of fixing shortcomings of random_seed(). Unfortunately, he (and/or J3) decided to conflate behavior for coarray programs into the specification.

Consider the simple non-coarray program:

   % cat u.f90
   program foo
   call random_init(repeatable=.false., image_distinct=.false.)
   do i = 1, 3
      call random_number(r)
      print '(F8.5)', r
   end do
   print *
   call random_init(repeatable=.false., image_distinct=.false.)
   do i = 1, 3
      call random_number(r)
      print '(F8.5)', r
   end do
   end program foo
   % gfcx -o z u.f90 && ./z
    0.78330
    0.40072
    0.22728

    0.44823
    0.12879
    0.50003

Exactly the behavior one would expect. Reseeding the PRNG uses a new set of processor-dependent seeds. If 'z' is run again, a different set of processor-dependent seeds is used. Now change the code to have 'repeatable=.true.'

   % gfcx -o z u.f90 && ./z
    0.67367
    0.06375
    0.69694

    0.67367
    0.06375
    0.69694

Exactly what one expects. When the second random_init() is called, the PRNG is re-initialized with the original set of processor-dependent seeds. What happens if the executable is run again? Well,

   % ./z
    0.34318
    0.90421
    0.38122

    0.34318
    0.90421
    0.38122

A different set of processor-dependent seeds is used to initially seed the PRNG, and when random_init() is called a second time, it uses that "different set of processor-dependent seeds" to re-initialize the PRNG.

So, where does existentialism enter into the issue? When executable './z' is run the following occurs:

   1) 'image0' is instantiated
   2) random_init() is called
   3) the do-loop executes
   4) 'image0' is terminated.

A year later, when './z' is run again, the following occurs:

   a) 'image0' is instantiated
   b) random_init() is called
   c) the do-loop executes
   d) 'image0' is terminated.

When 'image0' in a) is instantiated, 'image0' in 1) no longer exists. There is no way to determine what set of processor-dependent seeds was used for random_init() in 2) when random_init() is called in b). It does not matter what value is assigned to image_distinct in the above code. Is 'image0' in a) the same as 'image0' in 1) or are these images distinct?

IMO, image_distinct only applies when more than one image is instantiated during the execution of a co-array program. image_distinct should have been an optional argument.

Suppose you have a program that has num_images() return a value of 2. You execute that program and the following occurs:

   I)   'image0' is instantiated
   II)  'image1' is instantiated
   III) one or more images call random_init()
   IV)  work is done
   V)   one or more images call random_init(), again.
   VI)  image1 terminates
   VII) image0 terminates

It is here that image_distinct can affect the seeding of the PRNG. When I developed random_init(), I spent a few days getting OpenCoarrays installed on my system. I then spent some time trying to get a reasonable approach for dealing with images (I don't remember any consideration of teams). The comment in libgfortran/intrinsics/random_init.f90 details what happens with combinations of 'repeatable' and 'image_distinct'.
[Bug fortran/98253] Conflicting random_seed/random_init results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98253 --- Comment #5 from Damian Rouson --- Steve, thanks for all the time you put into implementing random_init and responding to this PR. My confusion stemmed from the first sentence that I quoted from the standard. It states that the provided random_init call is equivalent to a processor-dependent random_seed call so I was attempting to replace my two random_seed calls with one random_init call. I see now that such a replacement only works if one knows the correct, processor-dependent seed values, but I also understand now that it would be pointless to do what I'm trying to do. Because the matching seeds would be processor-dependent, the code wouldn't be portable. On a related note, I've been trying over time to evolve away from using "coarray" as the blanket term for all parallel features. Fortran now has so many parallel features that don't necessarily involve coarrays. The IMAGE_DISTINCT argument is one small example so I don't think IMAGE_DISTINCT necessarily has anything to do with coarrays, but it does have to do with multi-image execution.
[Bug fortran/90207] Debugging generated tree code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90207 --- Comment #5 from Thomas Koenig --- https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561720.html allows debugging of the generated variables.
[Bug libgomp/95150] Some offloaded programs crash with openmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150 Chinoune changed: What|Removed |Added Resolution|WONTFIX |--- Status|RESOLVED|UNCONFIRMED --- Comment #6 from Chinoune --- Reopen, as I have reproduced the same crash with another GPU.
[Bug tree-optimization/98256] New: ICE at -Os and above: verify_ssa failed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98256

Bug ID: 98256
Summary: ICE at -Os and above: verify_ssa failed
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: zhendong.su at inf dot ethz.ch
Target Milestone: ---

[547] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/11.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap --prefix=/local/suz-local/software/local/gcc-trunk --enable-languages=c,c++ --disable-werror --enable-multilib --with-system-zlib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.0 20201212 (experimental) [master revision ff2dfdef2f2:87144b47033:815eb852a2d293331eba2e241a986b8641d4da1f] (GCC)
[548] %
[548] % gcctk -O1 -c small.c
[549] %
[549] % gcctk -Os -c small.c
small.c: In function ‘g’:
small.c:3:6: error: definition in block 2 follows the use
    3 | void g() { f(1 && ~a / b); }
      |      ^
for SSA_NAME: b.1_3 in statement:
_8 = .ADD_OVERFLOW (a.0_1, b.1_3);
during GIMPLE pass: widening_mul
small.c:3:6: internal compiler error: verify_ssa failed
0xfa18ab verify_ssa(bool, bool)
        ../../gcc-trunk/gcc/tree-ssa.c:1214
0xc26fe7 execute_function_todo
        ../../gcc-trunk/gcc/passes.c:2049
0xc27d92 execute_todo
        ../../gcc-trunk/gcc/passes.c:2096
Please submit a full bug report, with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
[550] %
[550] % cat small.c
extern void f (int);
unsigned a, b;
void g() { f(1 && ~a / b); }
[Bug tree-optimization/98255] [10/11 Regression] wrong code at -Os and above with -fPIC on x86_64-pc-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98255 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org, ||jamborm at gcc dot gnu.org Summary|wrong code at -Os and above |[10/11 Regression] wrong |with -fPIC on |code at -Os and above with |x86_64-pc-linux-gnu |-fPIC on ||x86_64-pc-linux-gnu Last reconfirmed||2020-12-12 Target Milestone|--- |10.3 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW --- Comment #1 from Jakub Jelinek --- Started with r10-917-g3b47da42de621c6c3bf7d2f9245df989aa7eb5a1
[Bug tree-optimization/98254] Failure to optimize simple pattern for __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98254

--- Comment #3 from Jakub Jelinek ---
(In reply to rguent...@suse.de from comment #2)
> Should already be handled by vectorizing the CTOR.

I've tried:

typedef int __attribute__((vector_size(16))) V;

V
foo (short *a)
{
  return (V){a[0], a[1], a[2], a[3]};
}

V
bar (int *a)
{
  return (V){a[0], a[1], a[2], a[3]};
}

and we don't do a vector (unaligned) read even in bar with -O3 -fno-tree-slp-vectorize, it is just SLP vectorization that makes it vectorize.
If we should handle foo as convertvector, we should handle bar in the same spot as vector load from memory.
[Bug c/98252] gcc 10 unaligned copy (with tree-loop-vectorize) produce wrong result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98252 Jakub Jelinek changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #3 from Jakub Jelinek --- When there is UB, you can't make any assumptions, the program can do anything after it reaches it.
[Bug tree-optimization/98255] New: wrong code at -Os and above with -fPIC on x86_64-pc-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98255

Bug ID: 98255
Summary: wrong code at -Os and above with -fPIC on x86_64-pc-linux-gnu
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: zhendong.su at inf dot ethz.ch
Target Milestone: ---

[510] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/11.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap --prefix=/local/suz-local/software/local/gcc-trunk --enable-languages=c,c++ --disable-werror --enable-multilib --with-system-zlib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.0 20201212 (experimental) [master revision ff2dfdef2f2:87144b47033:815eb852a2d293331eba2e241a986b8641d4da1f] (GCC)
[511] %
[511] % gcctk -Os small.c; ./a.out
[512] %
[512] % gcctk -Os -fPIC small.c
[513] % ./a.out
Segmentation fault
[514] %
[514] % cat small.c
struct a {
  volatile unsigned b;
  unsigned c;
};
int d, *e, h, k, l;
static struct a f;
long g;
static unsigned i = 4294967294;
volatile int j;
long m() {
  char n[4][4][3] = {{{9, 2, 8}, {9, 2, 8}, {9, 2, 8}, {9}}, {{8}}, {{8}}, {{2}}};
  while (d) {
    for (; f.c < 4; f.c++) {
      *e = 0;
      h = n[f.c + 4][0][d];
    }
    while (g)
      return n[0][3][i];
    while (1) {
      if (k) {
        j = 0;
        if (j)
          continue;
      }
      if (l)
        break;
    }
  }
  return 0;
}
int main() {
  m();
  return 0;
}
[Bug fortran/97455] ICE on invalid code (wrong pointer assignment) in SELECT TYPE construct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97455 Dominique d'Humieres changed: What|Removed |Added Last reconfirmed||2020-12-12 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW --- Comment #1 from Dominique d'Humieres --- Confirmed since at least GCC7. Note pr86551 is now fixed.
[Bug c/98252] gcc 10 unaligned copy (with tree-loop-vectorize) produce wrong result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98252 --- Comment #2 from Azat --- >If you compile your testcase with -fsanitize=undefined, you'll see that it >invokes UB. Jakub, Indeed I saw them, but is there any explanation (except "UB") why it does copy by 16 if the memory overlaps?
[Bug fortran/86551] [OOP] ICE on invalid code with select type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86551 --- Comment #5 from Dominique d'Humieres --- The ICE is gone for GCC10.2.1 and 11.0.
[Bug fortran/98253] Conflicting random_seed/random_init results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98253 Dominique d'Humieres changed: What|Removed |Added Last reconfirmed||2020-12-12 Status|UNCONFIRMED |WAITING Ever confirmed|0 |1 --- Comment #4 from Dominique d'Humieres --- Invalid expectation?
[Bug tree-optimization/98254] Failure to optimize simple pattern for __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98254 --- Comment #2 from rguenther at suse dot de --- On December 12, 2020 7:27:01 PM GMT+01:00, "jakub at gcc dot gnu.org" wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98254 > >Jakub Jelinek changed: > > What|Removed |Added > > CC||jakub at gcc dot gnu.org, > ||rguenth at gcc dot gnu.org > >--- Comment #1 from Jakub Jelinek --- >Guess a task for SLP vectorization. Should already be handled by vectorizing the CTOR.
[Bug c/98252] gcc 10 unaligned copy (with tree-loop-vectorize) produce wrong result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98252 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #1 from Jakub Jelinek --- If you compile your testcase with -fsanitize=undefined, you'll see that it invokes UB.
[Bug tree-optimization/98254] Failure to optimize simple pattern for __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98254 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org, ||rguenth at gcc dot gnu.org --- Comment #1 from Jakub Jelinek --- Guess a task for SLP vectorization.
[Bug fortran/98022] [9/10/11 Regression] ICE in gfc_assign_data_value, at fortran/data.c:468 since r9-3803-ga5fbc2f36a291cbe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98022 --- Comment #9 from Steve Kargl --- On Sat, Dec 12, 2020 at 05:54:43PM +, pault at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98022 > > --- Comment #8 from Paul Thomas --- > The example that you give shows that setting the undefined part to zero > certainly is not correct. I updated my tree for the commit and am only just > now > rebuilding. It'll be tomorrow before I put this right. > > I guess that this is in the category of invalid but not forbidden. It's in the > same category as: > complex :: a, b > a%im = 1.0 > b = a > print *, a, b > end > Yes, it's invalid under the same portion of section 19 I quoted earlier. 'a' is undefined because 'a%re' is undefined. I cannot find anything in the Standard that requires an error or a warning message.
[Bug target/92729] [avr] Convert the backend to MODE_CC so it can be kept in future releases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92729 --- Comment #43 from abebeos at lazaridis dot com --- The patch is now (after further validation zero regressions within gcc/g++ testsuite in 2 different test-setups) "out there": https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561718.html My understanding of the process tells me: - the relevant maintainers decide about the patch. - if merged, then this issue can be closed. then... - the bounty backers (and only they) decide about the claims to the bounty. https://github.com/bountysource/core/wiki/Frequently-Asked-Questions#how-are-claims-processed 38 backers - it looks quite impossible for one malicious claimant to cheat the system.
[Bug fortran/98253] Conflicting random_seed/random_init results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98253

--- Comment #3 from kargl at gcc dot gnu.org ---
Third thought. Here are the programs you meant to write (without error checking such as how_to_use_random_init must be run before how_to_seed_with_random_seed_like_random_init).

program how_to_use_random_init
   implicit none
   integer fd, i, n
   integer, allocatable :: seeds(:)
   real r
   call random_init(repeatable=.true., image_distinct=.true.)
   call random_seed(size=n)
   allocate(seeds(n))
   call random_seed(get=seeds)
   open(newunit=fd,file='seed.cache',access='stream',status='replace')
   write(fd) seeds
   close(fd)
   do i=1,5
      call random_number(r)
      print *,r
   end do
end program how_to_use_random_init

program how_to_seed_with_random_seed_like_random_init
   implicit none
   integer fd, i, n
   integer, allocatable :: seeds(:)
   real r
   call random_seed(size=n)
   allocate(seeds(n))
   open(newunit=fd,file='seed.cache',access='stream',status='old')
   read(fd) seeds
   close(fd)
   call random_seed(put=seeds)
   do i=1,5
      call random_number(r)
      print *,r
   end do
end program how_to_seed_with_random_seed_like_random_init
[Bug fortran/98022] [9/10/11 Regression] ICE in gfc_assign_data_value, at fortran/data.c:468 since r9-3803-ga5fbc2f36a291cbe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98022 --- Comment #8 from Paul Thomas --- The example that you give shows that setting the undefined part to zero certainly is not correct. I updated my tree for the commit and am only just now rebuilding. It'll be tomorrow before I put this right. I guess that this is in the category of invalid but not forbidden. It's in the same category as: complex :: a, b a%im = 1.0 b = a print *, a, b end Thanks Paul
[Bug fortran/98253] Conflicting random_seed/random_init results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98253 --- Comment #2 from kargl at gcc dot gnu.org --- On 2nd thought. Of course, the results are different. In your first example, you have call random_init(repeatable=.true., image_distinct=.true.) which gets you processor-dependent seeds. In your second example, you have call random_seed(size=n) call random_seed(put=[(i,i=1,n)]) that is not processor-dependent. You are explicitly seeding the PRNG.
[Bug tree-optimization/98254] New: Failure to optimize simple pattern for __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98254

            Bug ID: 98254
           Summary: Failure to optimize simple pattern for __builtin_convertvector
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

typedef int32_t __attribute__((vector_size(16))) v4i32;
typedef int16_t __attribute__((vector_size(8))) v4i16;

v4i32 f(short *a)
{
    return (v4i32){a[0], a[1], a[2], a[3]};
}

This can be optimized to `return __builtin_convertvector(*(v4i16 *)a, v4i32);` (or at least, something very close to that, if aliasing is to be taken into account). LLVM does this transformation, but GCC does not.
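The equivalence claimed in the report can be checked by hand. The following is only a sketch, assuming a compiler that provides __builtin_convertvector (GCC 9 or later, or Clang); the function names are made up for this illustration, and memcpy is used for the load so that the sketch does not depend on the alignment or aliasing of the `short` pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int32_t __attribute__((vector_size(16))) v4i32;
typedef int16_t __attribute__((vector_size(8))) v4i16;

/* Element-by-element constructor, as in the report. */
v4i32 widen_scalar(const int16_t *a)
{
    return (v4i32){a[0], a[1], a[2], a[3]};
}

/* Hand-written target form: one 8-byte load, then one vector conversion.
   memcpy stands in for *(v4i16 *)a to sidestep aliasing and alignment. */
v4i32 widen_convert(const int16_t *a)
{
    v4i16 narrow;
    memcpy(&narrow, a, sizeof narrow);
    return __builtin_convertvector(narrow, v4i32);
}

/* Returns 1 when both forms produce the same four lanes. */
int widen_forms_agree(const int16_t *a)
{
    v4i32 x = widen_scalar(a);
    v4i32 y = widen_convert(a);
    for (int i = 0; i < 4; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}

/* Small self-check over values that exercise sign extension. */
void widen_self_check(void)
{
    const int16_t in[4] = {-32768, -1, 0, 32767};
    assert(widen_forms_agree(in));
}
```

Comparing the assembly of widen_scalar and widen_convert at -O2 is one way to see whether a given compiler already performs the transformation.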
[Bug fortran/98253] Conflicting random_seed/random_init results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98253

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kargl at gcc dot gnu.org

--- Comment #1 from kargl at gcc dot gnu.org ---
Of course, the results are different. When I wrote random_init(), I asked several times on the J3 list what image_distinct meant. No one would provide an answer. I concluded that image_distinct only affects co-array programs. You can read the long comment in libgfortran/intrinsics/random_init.f90.
[Bug libstdc++/98003] FAIL: 27_io/basic_syncbuf/sync_ops/1.cc (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98003 --- Comment #5 from dave.anglin at bell dot net --- There is no --as-needed support. I think either approach would simplify things as most targets don't need to link against libatomic.
[Bug fortran/98022] [9/10/11 Regression] ICE in gfc_assign_data_value, at fortran/data.c:468 since r9-3803-ga5fbc2f36a291cbe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98022

--- Comment #7 from Steve Kargl ---
On Sat, Dec 12, 2020 at 04:02:54PM +, pault at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98022
>
> --- Comment #6 from Paul Thomas ---
> (In reply to kargl from comment #4)
> > (In reply to Paul Thomas from comment #3)
> >
> > > function kn1() result(hm2)
> > >   complex :: hm(1:2), hm2(1:2)
> > >   data (hm(md)%re, md=1,2)/1.0, 2.0/
> > >   hm2 = hm
> > > end function kn1
> >
> > Are you sure that this is valid Fortran? I cannot
> > find anything in the Fortran standard that says hm%im
> > is defined. Thus, 'hm2=hm' is referencing a variable
> > that is no completely defined.
> >
> >
> > 19.6.1 Definition of objects and subobjects
> >
> > 2 Arrays, including sections, and variables of derived, character,
> > or complex type are objects that consist of zero or more subobjects.
> > Associations may be established between variables and subobjects and
> > between subobjects of different variables. These subobjects may become
> > defined or undefined.
> >
> > 5 A complex or character scalar object is defined if and only if all
> > of its subobjects are defined.
>
> Hi Steve,
>
> I saw your comment a bit too late. I think that you are correct. I guess that,
> at very least, I should not zero out the undefined part of the complex object?
> That way it would be equivalent to using assignment to achieve the same thing
> or to partially define a derived type.
>
> I'll post on clf.
>

I recall looking at this PR a long time ago, but came up empty with ideas on how to fix it. You've made some progress. It gets messy (at least to me) to determine if it is valid, and comes from reading 8.6.7 "Data statement", carefully. One gets to

   A data-stmt-constant other than boz-literal-constant, null-init, or initial-data-target shall be compatible with its corresponding variable according to the rules of intrinsic assignment (10.2.1.2).
   The variable is initially defined with the value specified by the data-stmt-constant; if necessary, the value is converted according to the rules of intrinsic assignment (10.2.1.3) to a value that agrees in type, type parameters, and shape with the variable.

Now, we go to "what is a variable?"

   R902 variable is designator

   R901 designator is ...
                   or complex-part-designator
                   ...

   R915 complex-part-designator is designator % RE
                                or designator % IM

In your example, hm%re is real, so the rules for intrinsic assignment to a real apply. Of course, I could be wrong.
[Bug ada/98228] [11 Regression] ICE: Assert_Failure atree.adb:931: Error detected at s-gearop.adb:382:34 [a-ngrear.adb:313:7 [a-nllrar.ads:18:1]] on s390x-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98228 --- Comment #3 from Matthias Klose --- I still see this with 20201212, 54f75d8fb3f:a415eda93e0:cc9b9c0b68233d38a26f7acd68cc5f9a8fc4d994
[Bug fortran/98253] New: Conflicting random_seed/random_init results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98253

            Bug ID: 98253
           Summary: Conflicting random_seed/random_init results
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: damian at sourceryinstitute dot org
  Target Milestone: ---

16.9.155 Case (i) in the Fortran 2018 standard states

   CALL RANDOM_INIT (REPEATABLE=true, IMAGE_DISTINCT=true) is equivalent to invoking RANDOM_SEED with a processor-dependent value for PUT that is different on every invoking image. In each execution of the program with the same execution environment, if the invoking image index value in the initial team is the same, the value for PUT shall be the same.

but the two programs below give different results.

% cat random_init.f90
  implicit none
  integer i
  real r
  call random_init(repeatable=.true., image_distinct=.true.)
  do i=1,5
    call random_number(r)
    print *,r
  end do
end

% cat random_seed.f90
  implicit none
  integer i, n
  real r
  call random_seed(size=n)
  call random_seed(put=[(i,i=1,n)])
  do i=1,5
    call random_number(r)
    print *,r
  end do
end

% /usr/local/Cellar/gnu/11.0.0/bin/gfortran random_init.f90
% ./a.out
  0.731217086
  0.652637541
  0.381399393
  0.817764997
  0.394176722

% /usr/local/Cellar/gnu/11.0.0/bin/gfortran random_seed.f90
% ./a.out
  0.471070886
  0.117344737
  0.357547939
  0.318134785
  0.696753800

% /usr/local/Cellar/gnu/11.0.0/bin/gfortran --version
GNU Fortran (GCC) 11.0.0 20200804 (experimental)
[Bug fortran/98022] [9/10/11 Regression] ICE in gfc_assign_data_value, at fortran/data.c:468 since r9-3803-ga5fbc2f36a291cbe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98022

--- Comment #6 from Paul Thomas ---
(In reply to kargl from comment #4)
> (In reply to Paul Thomas from comment #3)
>
> > function kn1() result(hm2)
> >   complex :: hm(1:2), hm2(1:2)
> >   data (hm(md)%re, md=1,2)/1.0, 2.0/
> >   hm2 = hm
> > end function kn1
>
> Are you sure that this is valid Fortran? I cannot
> find anything in the Fortran standard that says hm%im
> is defined. Thus, 'hm2=hm' is referencing a variable
> that is no completely defined.
>
>
> 19.6.1 Definition of objects and subobjects
>
> 2 Arrays, including sections, and variables of derived, character,
> or complex type are objects that consist of zero or more subobjects.
> Associations may be established between variables and subobjects and
> between subobjects of different variables. These subobjects may become
> defined or undefined.
>
> 5 A complex or character scalar object is defined if and only if all
> of its subobjects are defined.

Hi Steve,

I saw your comment a bit too late. I think that you are correct. I guess that, at very least, I should not zero out the undefined part of the complex object? That way it would be equivalent to using assignment to achieve the same thing or to partially define a derived type.

I'll post on clf.

Cheers

Paul
[Bug fortran/98022] [9/10/11 Regression] ICE in gfc_assign_data_value, at fortran/data.c:468 since r9-3803-ga5fbc2f36a291cbe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98022

--- Comment #5 from CVS Commits ---
The master branch has been updated by Paul Thomas:

https://gcc.gnu.org/g:ff2dfdef2f2e01c579dd280daa1d81fbeb4d7ac5

commit r11-5959-gff2dfdef2f2e01c579dd280daa1d81fbeb4d7ac5
Author: Paul Thomas
Date:   Sat Dec 12 14:01:08 2020 +

    Fortran: Enable inquiry references in data statements [PR98022].

    2020-12-12  Paul Thomas

    gcc/fortran
            PR fortran/98022
            * data.c (gfc_assign_data_value): Handle inquiry references in
            the data statement object list.

    gcc/testsuite/
            PR fortran/98022
            * gfortran.dg/data_inquiry_ref.f90: New test.
[Bug fortran/97920] [FINAL] -O2 segment fault due to extend derive type's member being partially allocated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97920

Thomas Koenig changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|WAITING                     |RESOLVED

--- Comment #3 from Thomas Koenig ---
Paul is correct, the state of the pointers is undefined. What you can do to correct this is to use

module m
  type t1
    real, dimension(:), pointer :: a => NULL()
  contains
    final :: t1f
  end type

  type, extends(t1) :: t2
    real, dimension(:), pointer :: b => NULL()
  contains
    final :: t2f
  end type

which will then run as expected.
[Bug tree-optimization/96685] Failure to optimize not+sub to add+not
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96685

--- Comment #7 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:0bd675183d94e6bca100c3aaaf87ee9676fb3c26

commit r11-5958-g0bd675183d94e6bca100c3aaaf87ee9676fb3c26
Author: Jakub Jelinek
Date:   Sat Dec 12 14:49:57 2020 +0100

    match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]

    This patch adds the ~(X - Y) -> ~X + Y simplification requested
    in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
    be safely negated).

    The first two simplify blocks are what has been requested in the PR
    and that makes the first testcase pass.
    Unfortunately, that change also breaks the second testcase, because
    while the same expressions appearing in the same stmt and split
    across multiple stmts has been folded (not really) before, with
    this optimization fold-const.c optimizes ~X + Y further into
    (Y - X) - 1 in fold_binary_loc associate: code, but we have nothing
    like that in GIMPLE and so end up with different expressions.

    The last simplify is an attempt to deal with just this case,
    had to rule out there the Y == -1U case, because then we
    reached infinite recursion as ~X + -1U was canonicalized by
    the pattern into (-1U - X) + -1U but there is a canonicalization
    -1 - A -> ~A that turns it back. Furthermore, had to make it
    #if GIMPLE only, because it otherwise resulted in infinite recursion
    when interacting with the associate: optimization.
    The end result is that we pass all 3 testcases and thus canonicalize
    the 3 possible forms of writing the same thing.

    2020-12-12  Jakub Jelinek

            PR tree-optimization/96685
            * match.pd (~(X - Y) -> ~X + Y): New optimization.
            (~X + Y -> (Y - X) - 1): Likewise.

            * gcc.dg/tree-ssa/pr96685-1.c: New test.
            * gcc.dg/tree-ssa/pr96685-2.c: New test.
            * gcc.dg/tree-ssa/pr96685-3.c: New test.
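For readers who want to convince themselves that the three forms named in the commit message are the same value, here is a small sketch in plain C. It is not taken from the patch or its testcases; the function names are invented for this illustration, and it relies only on unsigned wraparound, where ~z equals -z - 1.

```c
#include <assert.h>
#include <limits.h>

/* Form requested in the PR: ~(X - Y). */
unsigned not_of_difference(unsigned x, unsigned y) { return ~(x - y); }

/* Form after the first new simplification: ~X + Y. */
unsigned not_plus(unsigned x, unsigned y) { return ~x + y; }

/* Form that fold-const.c reaches: (Y - X) - 1. */
unsigned difference_minus_one(unsigned x, unsigned y) { return (y - x) - 1u; }

/* Constant variant from the commit message: ~(X + C) versus ~X + (-C). */
unsigned not_of_sum(unsigned x, unsigned c) { return ~(x + c); }
unsigned not_plus_negated(unsigned x, unsigned c) { return ~x + (0u - c); }

/* Compares all forms on a handful of edge values; aborts on a mismatch. */
void check_not_sub_forms(void)
{
    static const unsigned samples[] = {
        0u, 1u, 2u, UINT_MAX / 2u, UINT_MAX / 2u + 1u, UINT_MAX - 1u, UINT_MAX
    };
    const int n = (int)(sizeof samples / sizeof samples[0]);

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            unsigned x = samples[i], y = samples[j];
            assert(not_of_difference(x, y) == not_plus(x, y));
            assert(not_plus(x, y) == difference_minus_one(x, y));
            assert(not_of_sum(x, y) == not_plus_negated(x, y));
        }
    }
}
```

The check uses unsigned operands on purpose: for signed types the rewrite is only safe where negation and the intermediate results cannot overflow, which is the caveat the commit message attaches to the constant case.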
[Bug tree-optimization/96272] Failure to optimize overflow check
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96272

--- Comment #7 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:fe78528c05fdd562f21e12675781473b0fbe892e

commit r11-5957-gfe78528c05fdd562f21e12675781473b0fbe892e
Author: Jakub Jelinek
Date:   Sat Dec 12 14:48:47 2020 +0100

    widening_mul: Recognize another form of ADD_OVERFLOW [PR96272]

    The following patch recognizes another form of hand written
    __builtin_add_overflow (this time _p), in particular when
    the code does unsigned
    if (x > ~0U - y)
    or
    if (x <= ~0U - y)
    it can be optimized (if the subtraction turned into ~y is single use)
    into
    if (__builtin_add_overflow_p (x, y, 0U))
    or
    if (!__builtin_add_overflow_p (x, y, 0U))
    and generate better code, e.g. for the first function in the testcase:
    -       movl    %esi, %eax
            addl    %edi, %esi
    -       notl    %eax
    -       cmpl    %edi, %eax
    -       movl    $-1, %eax
    -       cmovnb  %esi, %eax
    +       jc      .L3
    +       movl    %esi, %eax
    +       ret
    +.L3:
    +       orl     $-1, %eax
            ret
    on x86_64. As for the jumps vs. conditional move case, that is some CE
    issue with complex branch patterns we should fix up no matter what, but
    in this case I'm actually not sure if branchy code isn't better, overflow
    is something that isn't that common.

    2020-12-12  Jakub Jelinek

            PR tree-optimization/96272
            * tree-ssa-math-opts.c (uaddsub_overflow_check_p): Add OTHER argument.
            Handle BIT_NOT_EXPR.
            (match_uaddsub_overflow): Optimize unsigned a > ~b into
            __imag__ .ADD_OVERFLOW (a, b).
            (math_opts_dom_walker::after_dom_children): Call match_uaddsub_overflow
            even for BIT_NOT_EXPR.

            * gcc.dg/tree-ssa/pr96272.c: New test.
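The hand-written check described in the commit message can be compared against an overflow builtin directly. The following is a sketch under stated assumptions, not the testcase from the patch: the function names are hypothetical, the saturating-add shape is only a guess suggested by the assembly above, and the three-argument __builtin_add_overflow is used instead of the _p variant so that the sum is available as well.

```c
#include <assert.h>
#include <limits.h>

/* Hand-written overflow test: x + y wraps exactly when x > ~0U - y. */
unsigned sat_add_by_hand(unsigned x, unsigned y)
{
    if (x > ~0U - y)
        return ~0U;          /* saturate on overflow */
    return x + y;
}

/* The same function written with the overflow builtin. */
unsigned sat_add_builtin(unsigned x, unsigned y)
{
    unsigned sum;
    if (__builtin_add_overflow(x, y, &sum))
        return ~0U;          /* saturate on overflow */
    return sum;
}

/* Compares both versions on a few edge values; aborts on a mismatch. */
void check_sat_add(void)
{
    static const unsigned samples[] = {
        0u, 1u, UINT_MAX / 2u, UINT_MAX / 2u + 1u, UINT_MAX - 1u, UINT_MAX
    };
    const int n = (int)(sizeof samples / sizeof samples[0]);

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            assert(sat_add_by_hand(samples[i], samples[j]) ==
                   sat_add_builtin(samples[i], samples[j]));
}
```

With the patch applied, one would expect both functions to compile to similar code that tests the carry flag after the add, but that depends on the compiler version and target.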
[Bug c/98252] New: gcc 10 unaligned copy (with tree-loop-vectorize) produce wrong result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98252

            Bug ID: 98252
           Summary: gcc 10 unaligned copy (with tree-loop-vectorize) produce wrong result
           Product: gcc
           Version: 10.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: a3at.mail at gmail dot com
  Target Milestone: ---

Created attachment 49750
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49750&action=edit
test case

The attachment contains an example with two functions:
- incremental_copy_fast_path
- incremental_copy_fast_path_safe

When compiled with -O1 -ftree-loop-vectorize, the safe variant (incremental_copy_fast_path_safe) works correctly, while the other (incremental_copy_fast_path) does not. The problem appears to be that it copies 16 bytes at a time (movdqu+movups), which does not look correct, since the source data may be changed by the copy itself (the memory regions overlap).

Is this a problem in the code, caused by UB in the unaligned store/load? Or is it a compiler issue?

Thanks in advance!
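To illustrate why copying 16 bytes at a time is not interchangeable with a byte-by-byte copy when the regions overlap, here is a small sketch. It does not reproduce the attachment: the function names and the 34-byte buffer are hypothetical, and the chunked version is written out by hand rather than produced by the vectorizer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Forward copy, one byte at a time. When dst overlaps src from above,
   each store is visible to later loads, so a short pattern is repeated. */
void copy_bytewise(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Copy in 16-byte chunks: every chunk is loaded in full before it is
   stored, which mimics a 16-byte vector load followed by a vector store. */
void copy_chunked16(char *dst, const char *src, size_t n)
{
    while (n >= 16) {
        char tmp[16];
        memcpy(tmp, src, 16);   /* whole chunk read first */
        memcpy(dst, tmp, 16);   /* then written */
        src += 16;
        dst += 16;
        n -= 16;
    }
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Fills buf with "ab" followed by 32 '.' characters and a terminator. */
void fill_seed(char buf[35])
{
    memset(buf, '.', 34);
    buf[0] = 'a';
    buf[1] = 'b';
    buf[34] = '\0';
}

/* Returns 1 if the two copies disagree on an overlapping region. */
int overlapping_copies_differ(void)
{
    char a[35], b[35];
    fill_seed(a);
    fill_seed(b);
    copy_bytewise(a + 2, a, 32);    /* a becomes "ababab..." throughout */
    copy_chunked16(b + 2, b, 32);   /* b becomes "abab" followed by '.' */
    assert(memcmp(a, "abababab", 8) == 0);
    assert(memcmp(b, "abab....", 8) == 0);
    return memcmp(a, b, 34) != 0;
}
```

Whether the behaviour in the report is a compiler bug or undefined behaviour in the test case depends on the types and pointer casts used in the attachment, which this sketch deliberately avoids by working on plain char buffers.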