[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #15 from jakub at gcc dot gnu dot org 2010-06-25 08:13 --- Created an attachment (id=21000) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21000action=view) gcc46-pr43866.patch Here is a complete fix. This includes what the earlier two patches did, plus for nested tree_unswitch_single_loop call in the first pass optimizes all conditions using entry tests, then if there are some possible unswitching condition candidates does a still reachable loop bb discovery and only considers bbs in still unreachable bbs. Not copying the unreachable blocks would be IMHO harder. The current pass was trying to be very simple, not modify the original loop, just duplicate it and add a guard condition in front of the two loops, then in the nested call just simplify conditions using the entry checks and leave all the cleanups to cfg cleanup after the pass (e.g. to avoid redoing loop discovery etc.). With the patch pr43866.s is actually smaller, eventhough more loops are unswitched (previously some of the 15 unswitched loops were on 0 != 0 or 1 != 0 conditions and thus thrown away immediately). -- jakub at gcc dot gnu dot org changed: What|Removed |Added Attachment #20997|0 |1 is obsolete|| Attachment #20998|0 |1 is obsolete|| AssignedTo|unassigned at gcc dot gnu |jakub at gcc dot gnu dot org |dot org | Status|UNCONFIRMED |ASSIGNED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #16 from rakdver at gcc dot gnu dot org 2010-06-25 09:04 --- Now, in the first loop if we decide to unswitch on cond3, it transforms this into: ... If cond3 tests some variable that is initialized only if cond1 is false, this unswitching (besides not being very useful because cond3 is never tested when cond1 is false in the original program) results in jumps on uninitialized values comparison. while unswitching on cond3 indeed does not make sense, I don't think it can result in a jump on uninitialized value comparison. Unswitching only works if the arguments of the comparison are ssa names whose values are computed outside of the loop (it does not do invariant motion by itself). So, the computations based on uninitialized values would have to be introduced by some other pass (possibly invariant motion). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #17 from rakdver at gcc dot gnu dot org 2010-06-25 09:12 --- (In reply to comment #16) Now, in the first loop if we decide to unswitch on cond3, it transforms this into: ... If cond3 tests some variable that is initialized only if cond1 is false, this unswitching (besides not being very useful because cond3 is never tested when cond1 is false in the original program) results in jumps on uninitialized values comparison. while unswitching on cond3 indeed does not make sense, I don't think it can result in a jump on uninitialized value comparison. Unswitching only works if the arguments of the comparison are ssa names whose values are computed outside of the loop (it does not do invariant motion by itself). So, the computations based on uninitialized values would have to be introduced by some other pass (possibly invariant motion). unless the arguments of the comparison are completely uninitialized (their values are not defined in the function and are not parameters of the function), of course -- is that the case here? We should avoid unswitching on such conditions (in addition to the proposed patch), since the comparison could be unreachable due to some reason that cannot be detected in compile-time. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #18 from jakub at gcc dot gnu dot org 2010-06-25 09:14 --- That part comes from the questionable testcase, which does a_sp = matrix%local_data_sp before the loop unconditionally, eventhough matrix%local_data_sp is uninitialized unless use_sp is .true. Without the unswitching the values from the uninitialized struct matrix%local_data_sp are read, but not tested when use_sp is .false. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #19 from jakub at gcc dot gnu dot org 2010-06-25 09:19 --- The compiler doesn't know that matrix%local_data_sp is uninitialized (at least unless it would propagate that info in some IPA pass), matrix%local_data_sp is a field in what an arguments points at (and the argument is known to be non-NULL). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #20 from jv244 at cam dot ac dot uk 2010-06-25 09:28 --- (In reply to comment #18) That part comes from the questionable testcase, which does a_sp = matrix%local_data_sp before the loop unconditionally, eventhough matrix%local_data_sp is uninitialized unless use_sp is .true. thanks for looking into the bug report. Actually the testcase can be turned in 'unquestionable' by adding a 'NULLIFY(a%local_data_sp)' in the main program. At that point the pointer assignment (a_sp = matrix%local_data_sp) makes a_sp defined, but valgrind still yields an error. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #21 from jakub at gcc dot gnu dot org 2010-06-25 09:38 --- True. Most of the fields of the pointer are still uninitialized (NULLIFY only clears the data field). I'm afraid when NULLIFY doesn't clear the whole struct there will be always a possibility valgrind might complain, the tree optimizers see the comparison as not having any side-effects, not possibly trapping etc., so some pass might exchange the two comparisons somewhere. But worse case it would be just a valgrind warning, conditional jump based on uninitialized values which in the end wouldn't have any runtime visible effect. This patch doesn't try to do anything about that, just improve unswitching such that it chooses conditions that it can usefully unswitch on (and not unswitch on always false/true conditions). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #22 from jakub at gcc dot gnu dot org 2010-06-25 12:11 --- Subject: Bug 43866 Author: jakub Date: Fri Jun 25 12:10:42 2010 New Revision: 161375 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=161375 Log: PR middle-end/43866 * tree-ssa-loop-unswitch.c (tree_may_unswitch_on): If stmt is always true or always false, return NULL_TREE. (tree_unswitch_single_loop): Optimize conditions even when reaching max-unswitch-level parameter. If num 0, optimize first all conditions using entry checks, then do still reachable block discovery and consider only conditions in still reachable basic blocks in the loop. * gfortran.dg/pr43866.f90: New test. Added: trunk/gcc/testsuite/gfortran.dg/pr43866.f90 Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-loop-unswitch.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #23 from jakub at gcc dot gnu dot org 2010-06-25 12:30 --- Fixed on the trunk so far. -- jakub at gcc dot gnu dot org changed: What|Removed |Added Known to fail|4.3.1 4.4.0 4.5.0 4.6.0 |4.3.1 4.4.0 4.5.0 Known to work|4.1.2 |4.1.2 4.6.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #10 from jakub at gcc dot gnu dot org 2010-06-24 13:31 --- This looks like a bug in the source to me, certainly not a middle-end or tree-optimization bug. The a_sp = matrix%local_data_sp assignment when matrix%local_data_sp is uninitialized means you are copying uninitialized data. Either nullify it in the caller or do this assignment only if (use_sp). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #11 from jakub at gcc dot gnu dot org 2010-06-24 16:21 --- Created an attachment (id=20997) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20997action=view) unswitch-tuplification.patch While the validity of the testcase is questionable, what unswitching does looks certainly wrong. I'm attaching two small patches that fix things I've found while reading tree-ssa-loop-unswitch. The real problem for this testcase though seems to be that there is no unreachable block removal in between the unswitching levels, so it can very well choose to unswitch on a condition that is never reachable in the unswitched loop. Say there is while (...) { if (cond1) { if (cond2) A else B } else { if (cond3) C else D } E } The first level chooses to unswitch on the cond1 condition, so we have: if (cond1) while (...) { if (1 != 0) { if (cond2) A else B } else { if (cond3) C else D } E } else while (...) { if (0 != 0) { if (cond2) A else B } else { if (cond3) C else D } E } Now, in the first loop if we decide to unswitch on cond3, it transforms this into: if (cond1) { if (cond3) while (...) { if (1 != 0) { if (cond2) A else B } else { if (1 != 0) C else D } E } else while (...) { if (1 != 0) { if (cond2) A else B } else { if (0 != 0) C else D } E } } else while (...) { if (0 != 0) { if (cond2) A else B } else { if (cond3) C else D } E } If cond3 tests some variable that is initialized only if cond1 is false, this unswitching (besides not being very useful because cond3 is never tested when cond1 is false in the original program) results in jumps on uninitialized values comparison. Anyway, this first patch tries to fix a bug caused by tuplification. Before tuplification, COND_EXPR_COND would be 0 or 1 when optimized, but 0 != 0 or 1 != 0 condition never satisfies integer_zerop nor integer_nonzerop. We could fold it, but we already have predicates for this stuff. This patch actually causes more unswitching to happen, because it never unswitches on 0 != 0 or 1 != 0 conditions. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #12 from jakub at gcc dot gnu dot org 2010-06-24 16:24 --- Created an attachment (id=20998) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20998action=view) level-opt.patch When we reach the max-unswitch-level param, the function returns immediately, which means the conditions the last unswitching optimized aren't simplified (likely that happens in some later optimization passes, but unswitching has code to handle it, so I think it won't hurt to do it immediately). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #13 from jakub at gcc dot gnu dot org 2010-06-24 16:47 --- For the last issue, perhaps doing delete_unreachable_blocks in between the levels wouldn't work too well, loops would need to be discovered again etc. Maybe we can just do something similar to find_unreachable_blocks limited only to the loop bbs and ignoring false resp. true edges if gimple_cond_{true,false}_p on the last stmt and not consider as loop unswitching candidate any bb's that aren't reachable from the header in the loop. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #14 from rguenth at gcc dot gnu dot org 2010-06-24 20:49 --- Both patches look ok. For the remaining issue it would probably work to not copy unreachable blocks during unswitching? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #8 from jv244 at cam dot ac dot uk 2010-05-22 14:09 --- (In reply to comment #7) 4.5.1, using '-O2 -funswitch-loops' obviously '-O2 -funswitch-loops -fbounds-check' -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #9 from rguenth at gcc dot gnu dot org 2010-05-22 18:14 --- GCC 4.3.5 is being released, adjusting target milestone. -- rguenth at gcc dot gnu dot org changed: What|Removed |Added Target Milestone|4.3.5 |4.3.6 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
-- rguenth at gcc dot gnu dot org changed: What|Removed |Added Priority|P3 |P2 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #7 from jv244 at cam dot ac dot uk 2010-04-26 11:06 --- I have been trying to reduce more, but this is the smallest so far, seems like it needs the derived types, the 2D arrays, the pointers. I've reduced this with 4.5.1, using '-O2 -funswitch-loops' MODULE M1 IMPLICIT NONE TYPE cp_fm_type REAL(KIND=4), DIMENSION(:,:), POINTER :: local_data_sp REAL(KIND=8), DIMENSION(:,:), POINTER :: local_data END TYPE CONTAINS SUBROUTINE cp_fm_upper_to_full(matrix,nrow_global,ncol_global,use_sp) TYPE(cp_fm_type), POINTER :: matrix INTEGER, INTENT(IN) :: ncol_global, nrow_global INTEGER :: irow_global, icol_global LOGICAL :: use_sp REAL(KIND = 4), DIMENSION(:,:), POINTER :: a_sp REAL(KIND = 8), DIMENSION(:,:), POINTER :: a a = matrix%local_data a_sp = matrix%local_data_sp DO irow_global=1,nrow_global DO icol_global=irow_global+1,ncol_global IF(use_sp) THEN a_sp(icol_global,irow_global)=a_sp(irow_global,icol_global) ELSE a(icol_global,irow_global)=a(irow_global,icol_global) ENDIF ENDDO ENDDO END SUBROUTINE cp_fm_upper_to_full END MODULE M1 USE M1 TYPE(cp_fm_type), POINTER :: a INTEGER, PARAMETER :: N=17 ALLOCATE(a) ALLOCATE(a%local_data(N,N)) a%local_data=0 CALL cp_fm_upper_to_full(a,N,N,.FALSE.) END -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866
[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops
--- Comment #6 from jv244 at cam dot ac dot uk 2010-04-23 17:29 --- both testcases work with 4.1.2. I've also checked various versions of valgrind, which all report consistent results. Also 4.5 only fails on the original testcase. for reference, -v gives: gcc-4.5/bin/gfortran-4.5 -O3 -fbounds-check -g -static -v t.f90 Driving: gcc-4.5/bin/gfortran-4.5 -O3 -fbounds-check -g -static -v t.f90 -lgfortran -lm Using built-in specs. COLLECT_GCC=gcc-4.5/bin/gfortran-4.5 COLLECT_LTO_WRAPPER=/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: /lustre/ESPFS/scratch/pci/vondele/gcc-4.5/gcc-4.5.0/configure --prefix=/home/pci/vondele/gcc-4.5 --enable-languages=c,c++,fortran --program-suffix=-4.5 --disable-multilib --with-gmp=/home/pci/vondele/gcc-4.5 --with-mpfr=/home/pci/vondele/gcc-4.5 --with-mpc=/home/pci/vondele/gcc-4.5 Thread model: posix gcc version 4.5.0 (GCC) COLLECT_GCC_OPTIONS='-O3' '-fbounds-check' '-g' '-static' '-v' '-mtune=generic' '-march=x86-64' /panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/f951 t.f90 -quiet -dumpbase t.f90 -mtune=generic -march=x86-64 -auxbase t -g -O3 -version -fbounds-check -fintrinsic-modules-path /panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/finclude -o /tmp/cc4wFHAz.s GNU Fortran (GCC) version 4.5.0 (x86_64-unknown-linux-gnu) compiled by GNU C version 4.5.0, GMP version 4.2.4, MPFR version 2.4.1, MPC version 0.8 GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 GNU Fortran (GCC) version 4.5.0 (x86_64-unknown-linux-gnu) compiled by GNU C version 4.5.0, GMP version 4.2.4, MPFR version 2.4.1, MPC version 0.8 GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 COLLECT_GCC_OPTIONS='-O3' '-fbounds-check' '-g' '-static' '-v' '-mtune=generic' '-march=x86-64' as -V -Qy --64 -o /tmp/ccvIH9et.o /tmp/cc4wFHAz.s GNU assembler version 2.16.91.0.5 (x86_64-suse-linux) using BFD version 2.16.91.0.5 20051219 (SUSE Linux) COMPILER_PATH=/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/:/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../libexec/gcc/ LIBRARY_PATH=/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/:/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/:/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/../../../:/lib/:/usr/lib/ COLLECT_GCC_OPTIONS='-O3' '-fbounds-check' '-g' '-static' '-v' '-mtune=generic' '-march=x86-64' /panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/collect2 -m elf_x86_64 -static /usr/lib/../lib64/crt1.o /usr/lib/../lib64/crti.o /panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/crtbeginT.o -L/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0 -L/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc -L/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/../../.. /tmp/ccvIH9et.o -lgfortran -lm --start-group -lgcc -lgcc_eh -lc --end-group /panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/crtend.o /usr/lib/../lib64/crtn.o -- jv244 at cam dot ac dot uk changed: What|Removed |Added Known to fail|4.3.1 4.4.0 4.6.0 |4.3.1 4.4.0 4.5.0 4.6.0 Known to work||4.1.2 Summary|wrong code with -fbounds- |[4.3/4.4/4.5/4.6 Regression] |check -funswitch-loops |wrong code with -fbounds- ||check -funswitch-loops Target Milestone|--- |4.3.5 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866