[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-06-25 Thread jakub at gcc dot gnu dot org


--- Comment #15 from jakub at gcc dot gnu dot org  2010-06-25 08:13 ---
Created an attachment (id=21000)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21000action=view)
gcc46-pr43866.patch

Here is a complete fix.

This includes what the earlier two patches did, plus for nested
tree_unswitch_single_loop call in the first pass optimizes all conditions using
entry tests, then if there are some possible unswitching condition candidates
does a still reachable loop bb discovery and only considers bbs in still
unreachable bbs.

Not copying the unreachable blocks would be IMHO harder.  The current pass was
trying to be very simple, not modify the original loop, just duplicate it and
add a guard condition in front of the two loops, then in the nested call just
simplify conditions using the entry checks and leave all the cleanups to cfg
cleanup after the pass (e.g. to avoid redoing loop discovery etc.).

With the patch pr43866.s is actually smaller, eventhough more loops are
unswitched (previously some of the 15 unswitched loops were on 0 != 0 or 1 != 0
conditions and thus thrown away immediately).


-- 

jakub at gcc dot gnu dot org changed:

   What|Removed |Added

  Attachment #20997|0   |1
is obsolete||
  Attachment #20998|0   |1
is obsolete||
 AssignedTo|unassigned at gcc dot gnu   |jakub at gcc dot gnu dot org
   |dot org |
 Status|UNCONFIRMED |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-06-25 Thread rakdver at gcc dot gnu dot org


--- Comment #16 from rakdver at gcc dot gnu dot org  2010-06-25 09:04 
---
 Now, in the first loop if we decide to unswitch on cond3, it transforms this
 into:
...
 If cond3 tests some variable that is initialized only if cond1 is false, this
 unswitching (besides not being very useful because cond3 is never tested when
 cond1 is false in the original program) results in jumps on uninitialized
 values comparison.

while unswitching on cond3 indeed does not make sense, I don't think it can
result in a jump on uninitialized value comparison.  Unswitching only works if
the arguments of the comparison are ssa names whose values are computed outside
of the loop (it does not do invariant motion by itself).  So, the computations
based on uninitialized values would have to be introduced by some other pass
(possibly invariant motion).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-06-25 Thread rakdver at gcc dot gnu dot org


--- Comment #17 from rakdver at gcc dot gnu dot org  2010-06-25 09:12 
---
(In reply to comment #16)
  Now, in the first loop if we decide to unswitch on cond3, it transforms this
  into:
 ...
  If cond3 tests some variable that is initialized only if cond1 is false, 
  this
  unswitching (besides not being very useful because cond3 is never tested 
  when
  cond1 is false in the original program) results in jumps on uninitialized
  values comparison.
 
 while unswitching on cond3 indeed does not make sense, I don't think it can
 result in a jump on uninitialized value comparison.  Unswitching only works if
 the arguments of the comparison are ssa names whose values are computed 
 outside
 of the loop (it does not do invariant motion by itself).  So, the computations
 based on uninitialized values would have to be introduced by some other pass
 (possibly invariant motion).

unless the arguments of the comparison are completely uninitialized (their
values are not defined in the function and are not parameters of the function),
of course -- is that the case here?  We should avoid unswitching on such
conditions (in addition to the proposed patch), since the comparison could be
unreachable due to some reason that cannot be detected in compile-time.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-06-25 Thread jakub at gcc dot gnu dot org


--- Comment #18 from jakub at gcc dot gnu dot org  2010-06-25 09:14 ---
That part comes from the questionable testcase, which does
a_sp = matrix%local_data_sp
before the loop unconditionally, eventhough matrix%local_data_sp is
uninitialized unless use_sp is .true.
Without the unswitching the values from the uninitialized struct
matrix%local_data_sp are read, but not tested when use_sp is .false.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-06-25 Thread jakub at gcc dot gnu dot org


--- Comment #19 from jakub at gcc dot gnu dot org  2010-06-25 09:19 ---
The compiler doesn't know that matrix%local_data_sp is uninitialized (at least
unless it would propagate that info in some IPA pass), matrix%local_data_sp is
a field in what an arguments points at (and the argument is known to be
non-NULL).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-06-25 Thread jv244 at cam dot ac dot uk


--- Comment #20 from jv244 at cam dot ac dot uk  2010-06-25 09:28 ---
(In reply to comment #18)
 That part comes from the questionable testcase, which does
 a_sp = matrix%local_data_sp
 before the loop unconditionally, eventhough matrix%local_data_sp is
 uninitialized unless use_sp is .true.

thanks for looking into the bug report. Actually the testcase can be turned in
'unquestionable' by adding a 'NULLIFY(a%local_data_sp)' in the main program. At
that point the pointer assignment (a_sp = matrix%local_data_sp) makes a_sp
defined, but valgrind still yields an error.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-06-25 Thread jakub at gcc dot gnu dot org


--- Comment #21 from jakub at gcc dot gnu dot org  2010-06-25 09:38 ---
True.  Most of the fields of the pointer are still uninitialized (NULLIFY only
clears the data field).
I'm afraid when NULLIFY doesn't clear the whole struct there will be always a
possibility valgrind might complain, the tree optimizers see the comparison as
not having any side-effects, not possibly trapping etc., so some pass might
exchange the two comparisons somewhere.  But worse case it would be just a
valgrind warning, conditional jump based on uninitialized values which in the
end wouldn't have any runtime visible effect.

This patch doesn't try to do anything about that, just improve unswitching such
that it chooses conditions that it can usefully unswitch on (and not unswitch
on always false/true conditions).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-06-25 Thread jakub at gcc dot gnu dot org


--- Comment #22 from jakub at gcc dot gnu dot org  2010-06-25 12:11 ---
Subject: Bug 43866

Author: jakub
Date: Fri Jun 25 12:10:42 2010
New Revision: 161375

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=161375
Log:
PR middle-end/43866
* tree-ssa-loop-unswitch.c (tree_may_unswitch_on): If stmt is always
true or always false, return NULL_TREE.
(tree_unswitch_single_loop): Optimize conditions even when reaching
max-unswitch-level parameter.  If num  0, optimize first all
conditions
using entry checks, then do still reachable block discovery and
consider
only conditions in still reachable basic blocks in the loop.

* gfortran.dg/pr43866.f90: New test.

Added:
trunk/gcc/testsuite/gfortran.dg/pr43866.f90
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-loop-unswitch.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-06-25 Thread jakub at gcc dot gnu dot org


--- Comment #23 from jakub at gcc dot gnu dot org  2010-06-25 12:30 ---
Fixed on the trunk so far.


-- 

jakub at gcc dot gnu dot org changed:

   What|Removed |Added

  Known to fail|4.3.1 4.4.0 4.5.0 4.6.0 |4.3.1 4.4.0 4.5.0
  Known to work|4.1.2   |4.1.2 4.6.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-06-24 Thread jakub at gcc dot gnu dot org


--- Comment #10 from jakub at gcc dot gnu dot org  2010-06-24 13:31 ---
This looks like a bug in the source to me, certainly not a middle-end or
tree-optimization bug.
The
a_sp = matrix%local_data_sp
assignment when
matrix%local_data_sp
is uninitialized means you are copying uninitialized data.  Either nullify
it in the caller or do this assignment only if (use_sp).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-06-24 Thread jakub at gcc dot gnu dot org


--- Comment #11 from jakub at gcc dot gnu dot org  2010-06-24 16:21 ---
Created an attachment (id=20997)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20997action=view)
unswitch-tuplification.patch

While the validity of the testcase is questionable, what unswitching does looks
certainly wrong.

I'm attaching two small patches that fix things I've found while reading
tree-ssa-loop-unswitch.

The real problem for this testcase though seems to be that there is no
unreachable block removal in between the unswitching levels, so it can very
well choose to unswitch on a condition that is never reachable in the
unswitched loop.
Say there is
while (...)
  {
if (cond1)
  {
if (cond2)
  A
else
  B
  }
else
  {
if (cond3)
  C
else
  D
  }
E
  }

The first level chooses to unswitch on the cond1 condition, so we have:
if (cond1)  
  while (...)
{  
  if (1 != 0)
{
  if (cond2)
A
  else
B
}
  else
{
  if (cond3)
C
  else
D
}
  E
}
else
  while (...) 
{
  if (0 != 0)
{ 
  if (cond2)
A
  else
B 
}
  else
{
  if (cond3)
C
  else
D
}
  E
}

Now, in the first loop if we decide to unswitch on cond3, it transforms this
into:
if (cond1)
  {
if (cond3)
  while (...)
{
  if (1 != 0)
{
  if (cond2)
A
  else
B 
}
  else
{ 
  if (1 != 0)
C
  else
D 
}
  E  
}
else   
  while (...)
{
  if (1 != 0)
{
  if (cond2)
A
  else
B 
}
  else
{ 
  if (0 != 0)
C
  else
D 
}
  E  
}
  }
else
  while (...)
{
  if (0 != 0)
{
  if (cond2)
A
  else
B 
}
  else
{ 
  if (cond3)
C
  else
D 
}
  E  
}  

If cond3 tests some variable that is initialized only if cond1 is false, this
unswitching (besides not being very useful because cond3 is never tested when
cond1 is false in the original program) results in jumps on uninitialized
values comparison.

Anyway, this first patch tries to fix a bug caused by tuplification.  Before
tuplification, COND_EXPR_COND would be 0 or 1 when optimized, but 0 != 0 or 1
!= 0 condition never satisfies integer_zerop nor integer_nonzerop.  We could
fold it, but we already have predicates for this stuff.

This patch actually causes more unswitching to happen, because it never
unswitches on 0 != 0 or 1 != 0 conditions.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-06-24 Thread jakub at gcc dot gnu dot org


--- Comment #12 from jakub at gcc dot gnu dot org  2010-06-24 16:24 ---
Created an attachment (id=20998)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20998action=view)
level-opt.patch

When we reach the max-unswitch-level param, the function returns immediately,
which means the conditions the last unswitching optimized aren't simplified
(likely that happens in some later optimization passes, but unswitching has
code to handle it, so I think it won't hurt to do it immediately).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-06-24 Thread jakub at gcc dot gnu dot org


--- Comment #13 from jakub at gcc dot gnu dot org  2010-06-24 16:47 ---
For the last issue, perhaps doing delete_unreachable_blocks in between the
levels wouldn't work too well, loops would need to be discovered again etc.
Maybe we can just do something similar to find_unreachable_blocks
limited only to the loop bbs and ignoring false resp. true edges if
gimple_cond_{true,false}_p on the last stmt and not consider as loop
unswitching candidate any bb's that aren't reachable from the header in the
loop.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-06-24 Thread rguenth at gcc dot gnu dot org


--- Comment #14 from rguenth at gcc dot gnu dot org  2010-06-24 20:49 
---
Both patches look ok.  For the remaining issue it would probably work to
not copy unreachable blocks during unswitching?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-05-22 Thread jv244 at cam dot ac dot uk


--- Comment #8 from jv244 at cam dot ac dot uk  2010-05-22 14:09 ---
(In reply to comment #7)
 4.5.1, using '-O2 -funswitch-loops'

obviously '-O2 -funswitch-loops -fbounds-check'


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-05-22 Thread rguenth at gcc dot gnu dot org


--- Comment #9 from rguenth at gcc dot gnu dot org  2010-05-22 18:14 ---
GCC 4.3.5 is being released, adjusting target milestone.


-- 

rguenth at gcc dot gnu dot org changed:

   What|Removed |Added

   Target Milestone|4.3.5   |4.3.6


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-05-14 Thread rguenth at gcc dot gnu dot org


-- 

rguenth at gcc dot gnu dot org changed:

   What|Removed |Added

   Priority|P3  |P2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-04-26 Thread jv244 at cam dot ac dot uk


--- Comment #7 from jv244 at cam dot ac dot uk  2010-04-26 11:06 ---
I have been trying to reduce more, but this is the smallest so far, seems like
it needs the derived types, the 2D arrays, the pointers. I've reduced this with
4.5.1, using '-O2 -funswitch-loops'

MODULE M1
  IMPLICIT NONE
  TYPE cp_fm_type
 REAL(KIND=4), DIMENSION(:,:), POINTER :: local_data_sp
 REAL(KIND=8), DIMENSION(:,:), POINTER :: local_data
  END TYPE
CONTAINS
  SUBROUTINE cp_fm_upper_to_full(matrix,nrow_global,ncol_global,use_sp)
TYPE(cp_fm_type), POINTER  :: matrix
INTEGER, INTENT(IN) :: ncol_global, nrow_global
INTEGER :: irow_global, icol_global
LOGICAL :: use_sp
REAL(KIND = 4), DIMENSION(:,:), POINTER :: a_sp
REAL(KIND = 8), DIMENSION(:,:), POINTER :: a

a = matrix%local_data
a_sp = matrix%local_data_sp
DO irow_global=1,nrow_global
   DO icol_global=irow_global+1,ncol_global
  IF(use_sp) THEN
 a_sp(icol_global,irow_global)=a_sp(irow_global,icol_global)
  ELSE
 a(icol_global,irow_global)=a(irow_global,icol_global)
  ENDIF
   ENDDO
ENDDO
  END SUBROUTINE cp_fm_upper_to_full
END MODULE M1

  USE M1
  TYPE(cp_fm_type), POINTER :: a
  INTEGER, PARAMETER :: N=17
  ALLOCATE(a)
  ALLOCATE(a%local_data(N,N))
  a%local_data=0
  CALL cp_fm_upper_to_full(a,N,N,.FALSE.)
END


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866



[Bug middle-end/43866] [4.3/4.4/4.5/4.6 Regression] wrong code with -fbounds-check -funswitch-loops

2010-04-23 Thread jv244 at cam dot ac dot uk


--- Comment #6 from jv244 at cam dot ac dot uk  2010-04-23 17:29 ---
both testcases work with 4.1.2. I've also checked various versions of valgrind,
which all report consistent results. Also 4.5 only fails on the original
testcase. for reference, -v gives:


 gcc-4.5/bin/gfortran-4.5 -O3 -fbounds-check -g -static -v t.f90
Driving: gcc-4.5/bin/gfortran-4.5 -O3 -fbounds-check -g -static -v t.f90
-lgfortran -lm
Using built-in specs.
COLLECT_GCC=gcc-4.5/bin/gfortran-4.5
COLLECT_LTO_WRAPPER=/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /lustre/ESPFS/scratch/pci/vondele/gcc-4.5/gcc-4.5.0/configure
--prefix=/home/pci/vondele/gcc-4.5 --enable-languages=c,c++,fortran
--program-suffix=-4.5 --disable-multilib --with-gmp=/home/pci/vondele/gcc-4.5
--with-mpfr=/home/pci/vondele/gcc-4.5 --with-mpc=/home/pci/vondele/gcc-4.5
Thread model: posix
gcc version 4.5.0 (GCC)
COLLECT_GCC_OPTIONS='-O3' '-fbounds-check' '-g' '-static' '-v' '-mtune=generic'
'-march=x86-64'

/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/f951
t.f90 -quiet -dumpbase t.f90 -mtune=generic -march=x86-64 -auxbase t -g -O3
-version -fbounds-check -fintrinsic-modules-path
/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/finclude
-o /tmp/cc4wFHAz.s
GNU Fortran (GCC) version 4.5.0 (x86_64-unknown-linux-gnu)
compiled by GNU C version 4.5.0, GMP version 4.2.4, MPFR version 2.4.1,
MPC version 0.8
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU Fortran (GCC) version 4.5.0 (x86_64-unknown-linux-gnu)
compiled by GNU C version 4.5.0, GMP version 4.2.4, MPFR version 2.4.1,
MPC version 0.8
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
COLLECT_GCC_OPTIONS='-O3' '-fbounds-check' '-g' '-static' '-v' '-mtune=generic'
'-march=x86-64'
 as -V -Qy --64 -o /tmp/ccvIH9et.o /tmp/cc4wFHAz.s
GNU assembler version 2.16.91.0.5 (x86_64-suse-linux) using BFD version
2.16.91.0.5 20051219 (SUSE Linux)
COMPILER_PATH=/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/:/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../libexec/gcc/
LIBRARY_PATH=/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/:/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/:/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-O3' '-fbounds-check' '-g' '-static' '-v' '-mtune=generic'
'-march=x86-64'

/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/collect2
-m elf_x86_64 -static /usr/lib/../lib64/crt1.o /usr/lib/../lib64/crti.o
/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/crtbeginT.o
-L/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0
-L/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc
-L/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/../../../../lib64
-L/lib/../lib64 -L/usr/lib/../lib64
-L/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/../../..
/tmp/ccvIH9et.o -lgfortran -lm --start-group -lgcc -lgcc_eh -lc --end-group
/panfs/panfs0.ften.es.hpcn.uzh.ch/es/home/pci/vondele/gcc-4.5/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.5.0/crtend.o
/usr/lib/../lib64/crtn.o


-- 

jv244 at cam dot ac dot uk changed:

   What|Removed |Added

  Known to fail|4.3.1 4.4.0 4.6.0   |4.3.1 4.4.0 4.5.0 4.6.0
  Known to work||4.1.2
Summary|wrong code with -fbounds-   |[4.3/4.4/4.5/4.6 Regression]
   |check -funswitch-loops  |wrong code with -fbounds-
   ||check -funswitch-loops
   Target Milestone|--- |4.3.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43866