[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-07 Thread uros at gcc dot gnu dot org


--- Comment #11 from uros at gcc dot gnu dot org  2006-09-07 17:45 ---
Subject: Bug 28946

Author: uros
Date: Thu Sep  7 17:45:48 2006
New Revision: 116756

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=116756
Log:
PR target/28946
* config/i386/i386.md (*ashldi3_cconly_rex64, *ashlsi3_cconly,
*ashlhi3_cconly, *ashlqi3_cconly, *ashrdi3_one_bit_cconly_rex64,
*ashrdi3_cconly_rex64, *ashrsi3_one_bit_cconly, *ashrsi3_cconly,
*ashrhi3_one_bit_cconly, *ashrhi3_cconly,
*ashrqi3_one_bit_cconly, *ashrqi3_cconly,
*lshrdi3_cconly_one_bit_rex64, *lshrdi3_cconly_rex64,
*lshrsi3_one_bit_cconly, *lshrsi3_cconly,
*lshrhi3_one_bit_cconly, *lshrhi3_cconly,
*lshrqi2_one_bit_cconly, *lshrqi2_cconly: New patterns to
implement only CC setting effects of shift instructions.

testsuite/ChangeLog:

   PR target/28946
   * gcc.target/i386/pr28946.c: New test.


Added:
trunk/gcc/testsuite/gcc.target/i386/pr28946.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.md
trunk/gcc/testsuite/ChangeLog


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-06 Thread uros at kss-loka dot si


--- Comment #9 from uros at kss-loka dot si  2006-09-06 11:33 ---
Patch at http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00162.html implements
missing i386.md RTL patterns. This is i386 target-specific fix for this bug.

The patch was bootstrapped on i686-pc-linux-gnu and x86_64-pc-linux-gnu,
regtested for c,c++ and fortran.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

URL|http://gcc.gnu.org/ml/gcc-  |http://gcc.gnu.org/ml/gcc-
   |patches/2006-   |patches/2006-
   |09/msg00137.html|09/msg00162.html


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-06 Thread hjl at lucon dot org


--- Comment #10 from hjl at lucon dot org  2006-09-06 14:26 ---
The proposed patch will slow down Core and Core 2 by 70-100% in some testcases
due to partial flag register stall. I have a followup patch to implement
TARGET_PARTIAL_FLAG_REG_STALL.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-06 Thread mmitchel at gcc dot gnu dot org


-- 

mmitchel at gcc dot gnu dot org changed:

   What|Removed |Added

   Priority|P3  |P2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-05 Thread uros at kss-loka dot si


--- Comment #4 from uros at kss-loka dot si  2006-09-05 06:20 ---
(In reply to comment #2)
 It is entirely coincident. For some processors, it is an optimization to avoid
 partial flag register stall. When it is fixed, it should be reenabled with a
 new flag, something like TARGET_PARTIAL_FLAG_REG_STALL.

There is TARGET_USE_INCDEC flag that already implements your suggestion.

From predicates.md:
  /* On Pentium4, the inc and dec operations causes extra dependency on flag
 registers, since carry flag is not set.  */
  if (!TARGET_USE_INCDEC  !optimize_size)

If used elsewhere, this flag should perhaps be renamed to proposed
TARGET_PARTIAL_FLAG_REG_STALL.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC||uros at kss-loka dot si


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-05 Thread uros at kss-loka dot si


--- Comment #5 from uros at kss-loka dot si  2006-09-05 09:35 ---
The problem here is following:

We already have the patterns, that would satisfy combined instruction
(*lshrsi3_cmp) in above testcase. However, combiner rejects combined
instruction because the register that holds shifted result is unused!

The problematic part is in combine.c, around line 2236 (please read the
comment, which describes exactly the situation we have here). This part of code
is activated only when the register that holds the result of arith operation is
keept alive. This is quite strange - even if the result is unused, resulting
code will be still smaller as we avoid extra CC setting instruction.

The patch bellow (currently under testing, but so far OK) forces generation of
combined instruction even if the arithmetic result is unused.

Index: combine.c
===
--- combine.c   (revision 116691)
+++ combine.c   (working copy)
@@ -2244,7 +2244,7 @@
  needed, and make the PARALLEL by just replacing I2DEST in I3SRC with
  I2SRC.  Later we will make the PARALLEL that contains I2.  */

-  if (i1 == 0  added_sets_2  GET_CODE (PATTERN (i3)) == SET
+  if (i1 == 0  GET_CODE (PATTERN (i3)) == SET
GET_CODE (SET_SRC (PATTERN (i3))) == COMPARE
XEXP (SET_SRC (PATTERN (i3)), 1) == const0_rtx
rtx_equal_p (XEXP (SET_SRC (PATTERN (i3)), 0), i2dest))
@@ -2254,6 +2254,13 @@
   enum machine_mode compare_mode;
 #endif

+  /* To force generation of the combined comparison and arithmetic
+operation PARALLEL, pretend that the set in I2 is to be used,
+even if it is dead after I2. This results in better generated
+code, as only CC setting arithmetic instruction will be
+emitted in conditionals.  */
+  added_sets_2 = 1;
+
   newpat = PATTERN (i3);
   SUBST (XEXP (SET_SRC (newpat), 0), i2src);


Compiling testcase with this patch results in following code:

fct:
movl 4(%esp), %eax
shrl $5, %eax
je  .L2
jmp fct1
.p2align 4,,7
.L2:
jmp fct2


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-05 Thread uros at kss-loka dot si


--- Comment #6 from uros at kss-loka dot si  2006-09-05 11:45 ---
Patch at http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00137.html

BTW: This patch eliminates 869 test instructions in povray-3.6.1 compile.
(And my test raytraced pictures are still correct.)


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |uros at kss-loka dot si
   |dot org |
URL||http://gcc.gnu.org/ml/gcc-
   ||patches/2006-
   ||09/msg00137.html
 Status|NEW |ASSIGNED
   Keywords||patch
   Last reconfirmed|2006-09-04 16:50:06 |2006-09-05 11:45:14
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-05 Thread uros at kss-loka dot si


--- Comment #7 from uros at kss-loka dot si  2006-09-05 13:43 ---
Hm, proposed patch now generates worse code for following test:

extern int fnc1(void);
extern int fnc2(void);

int test(int x)
{
if (x  0x02)
 return fnc1();
else if (x  0x01)
 return fnc2();
else
 return 0;
}

It generates:

test:
movl 4(%esp), %edx
movl %edx, %eax
andl $2, %eax
jne .L10
andl $1, %edx
jne .L11
xorl %eax, %eax
ret
.p2align 4,,7
.L11:
.p2align 4,,8
jmp fnc2
.p2align 4,,7
.L10:
.p2align 4,,7
jmp fnc1

due to marking %eax live in first comparison, and is used instead of test,
and a regmove is emitted before comparison. Ideally gcc should generate:

test:
movl 4(%esp), %eax
testl  $2, %eax
jne .L6
andl $1, %eax
jne .L7
xorl %eax, %eax
ret
.p2align 2,,3
.L7:
jmp fnc2
.p2align 2,,3
.L6:
jmp fnc1


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-05 Thread hjl at lucon dot org


--- Comment #8 from hjl at lucon dot org  2006-09-05 14:54 ---
TARGET_PARTIAL_FLAG_REG_STALL and TARGET_USE_INCDEC are totally different.
TARGET_USE_INCDEC favors inc/dec over add/sub while
TARGET_PARTIAL_FLAG_REG_STALL
adds test after shift.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-04 Thread pinskia at gcc dot gnu dot org


--- Comment #1 from pinskia at gcc dot gnu dot org  2006-09-04 16:50 ---
Confirmed, a regression from 2.95.3 which almost means the new ia32 back-end
caused this.


-- 

pinskia at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever Confirmed|0   |1
  GCC build triplet|i486-linux-gnu  |
   GCC host triplet|i486-linux-gnu  |
  Known to fail||4.0.0 4.1.0 4.2.0 3.0.4
   ||3.2.3 3.3.3
  Known to work||2.95.3
   Last reconfirmed|-00-00 00:00:00 |2006-09-04 16:50:06
   date||
Summary|assembler shifts set the|[4.0/4.1/4.2 Regression]
   |flag ZF, no need to re-test |assembler shifts set the
   |to zero |flag ZF, no need to re-test
   ||to zero
   Target Milestone|--- |4.0.4


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-04 Thread hjl at lucon dot org


--- Comment #2 from hjl at lucon dot org  2006-09-04 17:49 ---
It is entirely coincident. For some processors, it is an optimization to avoid
partial flag register stall. When it is fixed, it should be reenabled with a
new flag, something like TARGET_PARTIAL_FLAG_REG_STALL.


-- 

hjl at lucon dot org changed:

   What|Removed |Added

 CC||hjl at lucon dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-04 Thread dann at godzilla dot ics dot uci dot edu


--- Comment #3 from dann at godzilla dot ics dot uci dot edu  2006-09-04 
17:56 ---
This specific case can probably be solved at the tree level by changing the
test:

(nb  5) != 0
to 
nb  32


-- 

dann at godzilla dot ics dot uci dot edu changed:

   What|Removed |Added

 CC||dann at godzilla dot ics dot
   ||uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946