[Bug target/29377] Build for h8300-elf crashes on 64bit hosts due to int/HWI mismatch

2006-10-28 Thread uros at kss-loka dot si


--- Comment #3 from uros at kss-loka dot si  2006-10-28 09:43 ---
Fixed on 4.3 mainline


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |uros at kss-loka dot si
   |dot org |
 Status|UNCONFIRMED |ASSIGNED
 Ever Confirmed|0   |1
  Known to fail||4.2.0
  Known to work||4.3.0
   Last reconfirmed|0000-00-00 00:00:00 |2006-10-28 09:43:15
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29377



[Bug target/29377] Build for h8300-elf crashes on 64bit hosts due to int/HWI mismatch

2006-10-28 Thread uros at kss-loka dot si


--- Comment #5 from uros at kss-loka dot si  2006-10-28 10:04 ---
Fixed for 4.1.2.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

  Known to work|4.3.0   |4.1.2 4.3.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29377



[Bug fortran/24518] Intrinsic MOD incorrect for large arg1/arg2 and slow.

2006-10-26 Thread uros at kss-loka dot si


--- Comment #13 from uros at kss-loka dot si  2006-10-26 22:22 ---
Just some performance numbers (sorry for the C testcase...) on x86_64:
--cut here--
#include <math.h>
#include <stdio.h>

int main()
{
  double x;
  double t = 0.0;

  for (x = 1000.0; x > 0.0; x -= 1.0)
    t += fmod (x, 1.7e-8);

  printf("%f\n", t);

  return 0;
}
--cut here--

[EMAIL PROTECTED] x86_64-test]$ gcc -march=k8 -O2 -lm mod.c
[EMAIL PROTECTED] x86_64-test]$ time ./a.out
0.089927

real    0m4.304s
user    0m4.294s
sys     0m0.009s
[EMAIL PROTECTED] x86_64-test]$ gcc -march=k8 -O2 -lm -mfpmath=387 mod.c
[EMAIL PROTECTED] x86_64-test]$ time ./a.out
0.089927

real    0m0.351s
user    0m0.349s
sys     0m0.002s

I know that this measurement depends on the library implementation, but this is
the current situation: the tests above show that intrinsic MOD is 12.3 _times_
faster.
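
For reference, the 12.3 figure follows from the user times above (4.294s for
the default build against 0.349s with -mfpmath=387). A minimal sketch of the
arithmetic; the helper name speedup() is made up for illustration:

```c
#include <assert.h>

/* Ratio of two run times measured in the same unit.
   Hypothetical helper, only used to reproduce the quoted figure.  */
double speedup(double slow, double fast)
{
    assert(fast > 0.0);
    return slow / fast;
}
```

speedup(4.294, 0.349) comes out at roughly 12.3; the wall-clock times
(4.304s against 0.351s) give a very similar ratio.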


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24518



[Bug fortran/24518] Intrinsic MOD incorrect for large arg1/arg2 and slow.

2006-10-25 Thread uros at kss-loka dot si


--- Comment #6 from uros at kss-loka dot si  2006-10-25 07:33 ---
Revision 118024 clears the way for MOD and MODULO implementation:
http://gcc.gnu.org/ml/gcc-cvs/2006-10/msg00703.html

BTW: I don't know the Fortran requirements, but built-in functions produce faster
code if errno is not needed. -fno-math-errno should be used in this case.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC||uros at kss-loka dot si


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24518



[Bug fortran/24518] Intrinsic MOD incorrect for large arg1/arg2 and slow.

2006-10-25 Thread uros at kss-loka dot si


--- Comment #8 from uros at kss-loka dot si  2006-10-25 11:48 ---
(In reply to comment #7)

> Just to be sure I understand: we are guaranteed that BUILT_IN_REMAINDER{F,,L}
> and BUILT_IN_FMOD{F,,L} are always available, right?

Yes. The expansion does not depend on -ffast-math anymore. However, the named
pattern should be present in .md files. Currently, i386 provides a named pattern
for -mfpmath=387, but not for -mfpmath=sse. In the latter case, expansion will
fall back to a normal library call.

> gfortran doesn't have a need for errno to be set after math functions are
> called. However, we do want to have correct results in all cases: Inf,
> NaN, subnormals, etc. From my reading of the manual, -fno-math-errno would
> imply that we do not get such correct results, am I right?

Fortunately, no. The result will be correct. You can see the effect of
-fno-math-errno at http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01158.html.
Fixup code detects NaN (as an abnormal return from the builtin function) and
calls the library function in order to set the global variable errno. If errno
is not needed (as I suspect is the case with the Fortran libraries), the fixup
code is not needed either, so -fno-math-errno should be used.
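
The control flow of that fixup can be sketched in plain C. This is only an
illustration under stated assumptions: inline_fmod() and library_fmod() are
made-up stand-ins for the inline expansion and the libm call, and the
arithmetic inside them is deliberately simplistic (small quotients only).
It is not the code that GCC emits.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for the inline expansion: never touches errno.
   Hypothetical and only valid for small quotients; NOT a real fmod.  */
static double inline_fmod(double x, double y)
{
    double zero = 0.0;
    if (x != x || y != y || y == 0.0)
        return zero / zero;             /* NaN marks the abnormal case */
    double q = x / y;
    assert(q > -9.0e18 && q < 9.0e18);  /* keep the cast below defined */
    return x - (double)(long long)q * y;
}

/* Stand-in for the library call; its extra job here is to set errno.  */
static double library_fmod(double x, double y)
{
    if (y == 0.0)
        errno = EDOM;
    return inline_fmod(x, y);
}

/* Fast path first; fall back to the "library" only on a NaN result.  */
double fmod_with_errno(double x, double y)
{
    double r = inline_fmod(x, y);
    if (r != r)                         /* NaN: run the fixup */
        r = library_fmod(x, y);
    return r;
}
```

With -fno-math-errno the NaN check and the fallback call simply disappear,
which is where the faster code comes from.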


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24518



[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2006-10-25 Thread uros at kss-loka dot si


--- Comment #6 from uros at kss-loka dot si  2006-10-25 12:04 ---
(In reply to comment #5)
> With more registers (x86_64) the stack moves are gone, but: (!)
>
> (testing done on AMD Athlon fam 15 model 35 stepping 2)

On Xeon 3.6, SSE is now faster:

gcc -O2 -march=pentium4 -mfpmath=387 pr19780.c 
time ./a.out
Start?
Stop!
Result = 0.00, 0.00, 1.00

real    0m0.805s
user    0m0.804s
sys     0m0.000s

gcc -O2 -march=pentium4 -mfpmath=sse pr19780.c 
time ./a.out
Start?
Stop!
Result = 0.00, 0.00, 1.00

real    0m0.707s
user    0m0.704s
sys     0m0.004s

vendor_id   : GenuineIntel
cpu family  : 15
model   : 4
model name  : Intel(R) Xeon(TM) CPU 3.60GHz
stepping: 10
cpu MHz : 3600.970
cache size  : 2048 KB

The question is now, why is Athlon so slow with SFmode SSE?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19780



[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2006-10-25 Thread uros at kss-loka dot si


--- Comment #7 from uros at kss-loka dot si  2006-10-25 12:18 ---
(In reply to comment #6)

> On Xeon 3.6, SSE is now faster:

... but for -ffast-math:

SSE: user    0m0.756s
x87: user    0m0.612s

Yes, x87 is faster for -ffast-math by some 20%.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19780



[Bug fortran/24518] Intrinsic MOD incorrect for large arg1/arg2 and slow.

2006-10-25 Thread uros at kss-loka dot si


--- Comment #10 from uros at kss-loka dot si  2006-10-25 14:16 ---
(In reply to comment #9)

>> In the latter case, expansion will fall back to a normal library call.
>
> OK. So on a system where the math library doesn't have remainderl, for example,
> we shouldn't use BUILT_IN_REMAINDERL or it will be missing at link-time? If
> that's the case, then we can't implement MOD/MODULO with these built-ins.

You can check for TARGET_C99_FUNCTIONS before they are used.

>> Fortunately, no. The result will be correct. You can see the effect of
>> -fno-math-errno at http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01158.html.
>
> And now, a harder question: could we activate no-math-errno on a per-call
> basis? That is, have the front-end emit a call to BUILT_IN_FOO and specify
> that, for this call, errno doesn't have to be set?

errno expansion for this particular built-in is inhibited in line 1995 of
builtins.c. For a per-call basis, we would need an extra argument to the
expand_builtin() function to disable errno expansion. However, the rationale
for this is unclear to me. IMO, either we use errno or we don't.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24518



[Bug target/27440] [4.0/4.1/4.2 regression] code quality regression due to ivopts

2006-10-10 Thread uros at kss-loka dot si


--- Comment #7 from uros at kss-loka dot si  2006-10-10 14:48 ---
(In reply to comment #6)
> Confirmed (as in comment #1).  With -Os instead of -O2 we even produce
>
> .L3:
>         movl    %ebx, -4(%edx)

The -4(...) part comes from PR 24669.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27440



[Bug target/28924] x86 sync builtins fail for char and short memory operands

2006-10-07 Thread uros at kss-loka dot si


--- Comment #8 from uros at kss-loka dot si  2006-10-07 06:12 ---
Testcase was committed to trunk and 4.1 branch, and now passes everywhere.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to fail|4.1.0 4.2.0 |4.1.0
  Known to work||4.2.0 4.1.2
 Resolution||FIXED
   Target Milestone|--- |4.2.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28924



[Bug target/29377] New: Build for h8300-elf crashes on 64bit hosts due to int/HWI mismatch

2006-10-07 Thread uros at kss-loka dot si
Build for h8300-elf target crashes on 64bit hosts with:

../../gcc-svn/trunk/gcc/libgcc2.c: In function '__muldi3':
../../gcc-svn/trunk/gcc/libgcc2.c:542: error: unrecognizable insn:
(insn 234 233 235 2 ../../gcc-svn/trunk/gcc/libgcc2.c:533 (set (reg:HI 3 r3)
        (const_int 4294967214 [0xffffffae])) -1 (nil)
    (nil))
../../gcc-svn/trunk/gcc/libgcc2.c:542: internal compiler error: in
extract_insn, at recog.c:2077
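
The constant is what makes the insn unrecognizable: a const_int is expected to
be sign-extended from the width of its mode, and 4294967214 (0xffffffae) is
presumably what -82 looks like after passing through an unsigned 32-bit value
on its way into a 64-bit HOST_WIDE_INT. A standalone sketch of the arithmetic
(plain C, not GCC internals; sign_extend() is a made-up helper that mimics
what trunc_int_for_mode() is for):

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low "bits" bits of v to 64 bits, i.e. bring the value
   into the canonical form for an integer constant of that width.  */
int64_t sign_extend(int64_t v, int bits)
{
    assert(bits > 0 && bits < 64);
    uint64_t mask = (UINT64_C(1) << bits) - 1;
    uint64_t sign = UINT64_C(1) << (bits - 1);
    uint64_t low = (uint64_t)v & mask;
    return (int64_t)(low ^ sign) - (int64_t)sign;
}
```

For a 16-bit (HImode) constant, sign_extend(4294967214, 16) gives -82, which
is the value the insn should have carried.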


-- 
   Summary: Build for h8300-elf crashes on 64bit hosts due to
int/HWI mismatch
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Keywords: build
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
 GCC build triplet: x86_64-pc-linux-gnu
  GCC host triplet: x86_64-pc-linux-gnu
GCC target triplet: h8300-elf


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29377



[Bug target/29377] Build for h8300-elf crashes on 64bit hosts due to int/HWI mismatch

2006-10-07 Thread uros at kss-loka dot si


--- Comment #1 from uros at kss-loka dot si  2006-10-07 07:51 ---
Proposed patch at http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00337.html


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

URL||http://gcc.gnu.org/ml/gcc-
   ||patches/2006-
   ||10/msg00337.html
   Keywords||patch


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29377



[Bug target/28924] x86 sync builtins fail for char and short memory operands

2006-10-06 Thread uros at kss-loka dot si


--- Comment #4 from uros at kss-loka dot si  2006-10-06 08:27 ---
Please note, that in addition to
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00250.html,
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00244.html is also needed.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28924



[Bug target/29337] -mfpmath=387 doesn't use fistp for double-to-integer conversion

2006-10-05 Thread uros at kss-loka dot si


--- Comment #8 from uros at kss-loka dot si  2006-10-05 07:08 ---

> try -O2 -msse2, you get:
> _Z8todoubledd:
>         subl    $12, %esp
>         fldl    24(%esp)
>         faddl   16(%esp)
>         fstpl   (%esp)
>         movsd   (%esp), %xmm0
>         addl    $12, %esp
>         cvttsd2si       %xmm0, %eax
>         ret
>
>
> Though I think the movsd should not be there but that is a different
> issue.

This is PR 19398. I have a patch that adds a bunch of peephole2 patterns to
address this particular issue. The patch is already approved and waits for
stage1.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29337



[Bug target/29347] i386 mode switching clobbers fp exception handling bits

2006-10-05 Thread uros at kss-loka dot si


--- Comment #2 from uros at kss-loka dot si  2006-10-05 07:51 ---
(In reply to comment #0)
> The mode switching for floating point rounding that the i386 backend does
> does not actually place mode switches, but rather the calculation of values
> used for mode switches.  Not only does that defeat the purpose of doing
> lazy code motion of the mode switches themselves (this problem could easily
> be remedied by handling the actual mode switches as a separate entity),
> it also leads to information in the floating point control register being
> clobbered if the user changes it (e.g. with feenableexcept:
> http://www.gnu.org/software/libc/manual/html_node/Control-Functions.html)
> between the calculation of the value used for a mode switch, and the point
> where a mode switch actually takes place.
 

Please note that the gcc i386 description is missing an FP control register
definition, so the x86_fnstcw_1 and x86_fldcw_1 patterns are totally wrong: they
handle the control register, not the status register. Once that is fixed, we can
add the correct clobber to the x87 FP-to-int instructions.

Regarding the calculation of mode-switching values: please note that x87
arithmetic instructions depend on the control word. Currently, this is solved by
setting and restoring the control word just before/after the fist instruction;
otherwise (use (reg:HI FPCW_REG)) has to be added to all affected instructions.
I think that it has to be added anyway if fesetround() is to be used.

Some time ago, I had a patch that added FPCW_REG to i386.h; I'll see if I can
still find it.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29347



[Bug target/29337] -mfpmath=387 doesn't use fistp for double-to-integer conversion

2006-10-04 Thread uros at kss-loka dot si


--- Comment #3 from uros at kss-loka dot si  2006-10-04 06:46 ---

> I'm afraid you're missing my point.
> The problem is that for 64-bit and 32-bit floating-point to integer
> conversion, x86 (32bit) target uses fistp* whereas x86_64 (64-bit) target
> uses cvt* WHEN -mfpmath=387.
> This defeats the purpose of the option -mfpmath=387 which is supposed to make
> floating-point computations to use 387, instead of SSE2.

If SSE is available, then SSE cvt* is used in order to avoid long control-word
setting sequences. This is cheaper even if we have to move the value from an x87
register, as cvt* can handle mem->reg transformations.

If you really need the fistp* sequence, you can try -mno-sse2 (you can't just
disable SSE on an x86_64 target) or perhaps use -msse3, where the fisttp insn
will be generated.

That said, I wonder where excess precision effects come into play here. We
are talking about a truncate-to-integer instruction, so I would really like to
see an example of this effect.
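
For context, the control-word sequences exist because a C conversion must
truncate toward zero, while plain fistp rounds according to the current x87
rounding mode; cvttsd2si and fisttp truncate directly. A minimal sketch of the
semantics that every one of these code sequences has to implement (the
function name is made up for illustration):

```c
#include <assert.h>
#include <limits.h>

/* A C cast from double to int always truncates toward zero, regardless of
   the floating-point rounding mode that is currently in effect.  */
int add_and_truncate(double a, double b)
{
    double sum = a + b;
    /* Out-of-range conversions are undefined, so reject them up front.  */
    assert(sum > (double)INT_MIN - 1.0 && sum < (double)INT_MAX + 1.0);
    return (int)sum;
}
```

add_and_truncate(1.5, 1.4) is 2 and add_and_truncate(-1.5, -1.4) is -2,
whichever instruction the compiler picks.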
 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29337



[Bug target/29300] FAIL: gcc.dg/pthread-init-[12].c (test for excess errors)

2006-10-03 Thread uros at kss-loka dot si


--- Comment #1 from uros at kss-loka dot si  2006-10-03 07:04 ---
Similar problems were recently fixed for solaris and glibc-2.3.5. It looks like
hpux needs a fixinclude hack that would cure these errors/warnings, something
like:

http://gcc.gnu.org/ml/gcc-patches/2006-09/msg01317.html
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg9.html

and perhaps

http://gcc.gnu.org/ml/gcc-patches/2006-10/msg9.html

Confirmed, as gcc.dg/pthread-* tests were introduced in order to catch problems
as described in the bug description.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever Confirmed|0   |1
   Last reconfirmed|0000-00-00 00:00:00 |2006-10-03 07:04:17
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29300



[Bug target/29169] sse3-not-fisttp.c scan-assembler-not fisttp FAILs on i386-pc-solaris2.10

2006-09-23 Thread uros at kss-loka dot si


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |uros at kss-loka dot si
   |dot org |
URL||http://gcc.gnu.org/ml/gcc-
   ||patches/2006-
   ||09/msg01012.html
 Status|NEW |ASSIGNED
   Keywords||patch
   Last reconfirmed|2006-09-21 17:52:55 |2006-09-23 13:36:43
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29169



[Bug target/29169] sse3-not-fisttp.c scan-assembler-not fisttp FAILs on i386-pc-solaris2.10

2006-09-23 Thread uros at kss-loka dot si


--- Comment #4 from uros at kss-loka dot si  2006-09-23 14:41 ---
Fixed.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution||FIXED
   Target Milestone|--- |4.2.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29169



[Bug target/28946] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-19 Thread uros at kss-loka dot si


--- Comment #14 from uros at kss-loka dot si  2006-09-19 11:31 ---
Fixed everywhere.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to fail|4.0.0 3.0.4 3.2.3 3.3.3 |3.0.4 3.2.3 3.3.3
  Known to work|2.95.3 4.2.0 4.1.2  |2.95.3 4.2.0 4.1.2 4.0.4
 Resolution||FIXED
Summary|[4.0 Only] assembler shifts |assembler shifts set the
   |set the flag ZF, no need to |flag ZF, no need to re-test
   |re-test to zero |to zero


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/26968] [4.1 Regression] HDF5 1.7.52 test segfaults with 4.1.0, fine with 4.0.2 (regression)

2006-09-07 Thread uros at kss-loka dot si


--- Comment #9 from uros at kss-loka dot si  2006-09-07 06:58 ---
I have built and run a testsuite of HDF5 library on i686-pc-linux-gnu with:

gcc version 4.2.0 20060906 (experimental)

hdf5-1.6.5 (production):
(CFLAGS=-fno-strict-aliasing is needed before configure)
All tests PASS with default compile flags out of the box.

hdf5-1.8.0-alpha4:
All tests PASS with default compile flags out of the box.

I guess this bugreport can be considered as 4.1 regression only.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

  Component|middle-end  |target
 GCC target triplet||i386-pc-linux-gnu
  Known to work||4.2.0
Summary|[4.1/4.2 Regression] HDF5   |[4.1 Regression] HDF5 1.7.52
   |1.7.52 test segfaults with  |test segfaults with 4.1.0,
   |4.1.0, fine with 4.0.2  |fine with 4.0.2 (regression)
   |(regression)|


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26968



[Bug target/28924] x86 sync builtins fail for char and short memory operands

2006-09-07 Thread uros at kss-loka dot si


--- Comment #3 from uros at kss-loka dot si  2006-09-08 05:47 ---
I have been playing with following patch to optabs.c that forces operands in
functions expand_sync_operation(), expand_sync_fetch_operation() and
expand_sync_lock_test_and_set() into registers through subregs of word-mode
temp registers. The testcase in the description is then expanded as:

;; __sync_fetch_and_add_1 (&s, 255) [tail call]
(insn 10 8 11 (set (reg:SI 58)
        (const_int 255 [0xff])) -1 (nil)
    (nil))

(insn 11 10 0 (parallel [
            (set (mem/v:QI (symbol_ref:SI ("s") <var_decl 0x402410b0 s>) [-1 S1 A8])
                (unspec_volatile:QI [
                        (plus:QI (mem/v:QI (symbol_ref:SI ("s") <var_decl 0x402410b0 s>) [-1 S1 A8])
                            (subreg:QI (reg:SI 58) 0))
                    ] 13))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

and RTL optimizers are able to optimize this back into:

(insn:HI 11 8 12 2 (parallel [
            (set (mem/v:QI (symbol_ref:SI ("s") <var_decl 0x402410b0 s>) [-1 S1 A8])
                (unspec_volatile:QI [
                        (plus:QI (mem/v:QI (symbol_ref:SI ("s") <var_decl 0x402410b0 s>) [-1 S1 A8])
                            (const_int -1 [0x]))
                    ] 13))
            (clobber (reg:CC 17 flags))
        ]) 924 {sync_addqi} (insn_list:REG_DEP_TRUE 10 (nil))
    (nil))

This results in expected asm code:

tests:
        lock
        addb    $-1, s
        ret

However, the patch does not cover all backup code-paths in sync_* expanders, so
in some cases an integer argument can still be forced into register in the
wrong way.
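
The 255 in the first dump and the -1 in the optimized insn denote the same
QImode bit pattern, which is why the final addb $-1 is the expected result.
A small standalone check of that equivalence (plain C; add_qi() is a made-up
helper that models an 8-bit add, not anything from optabs.c):

```c
#include <assert.h>
#include <stdint.h>

/* Add "addend" to an 8-bit value and keep only the low 8 bits,
   the way a QImode add (addb) does.  */
uint8_t add_qi(uint8_t value, int addend)
{
    assert(addend >= -255 && addend <= 255);
    return (uint8_t)((unsigned)value + (unsigned)addend);
}
```

For every 8-bit value v, add_qi(v, 255) and add_qi(v, -1) give the same
result; only the sign-extended form -1 is the canonical QImode constant.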

--cut here--
Index: optabs.c
===================================================================
--- optabs.c (revision 116739)
+++ optabs.c (working copy)
@@ -6023,7 +6023,7 @@
       if (GET_MODE (val) != VOIDmode && GET_MODE (val) != mode)
         val = convert_modes (mode, GET_MODE (val), val, 1);
       if (!insn_data[icode].operand[1].predicate (val, mode))
-        val = force_reg (mode, val);
+        val = gen_lowpart (mode, copy_to_mode_reg (word_mode, val));

       insn = GEN_FCN (icode) (mem, val);
       if (insn)
@@ -6156,7 +6156,7 @@
       if (GET_MODE (val) != VOIDmode && GET_MODE (val) != mode)
         val = convert_modes (mode, GET_MODE (val), val, 1);
       if (!insn_data[icode].operand[2].predicate (val, mode))
-        val = force_reg (mode, val);
+        val = gen_lowpart (mode, copy_to_mode_reg (word_mode, val));

       insn = GEN_FCN (icode) (target, mem, val);
       if (insn)
@@ -6243,7 +6243,7 @@
       if (GET_MODE (val) != VOIDmode && GET_MODE (val) != mode)
         val = convert_modes (mode, GET_MODE (val), val, 1);
       if (!insn_data[icode].operand[2].predicate (val, mode))
-        val = force_reg (mode, val);
+        val = gen_lowpart (mode, copy_to_mode_reg (word_mode, val));

       insn = GEN_FCN (icode) (target, mem, val);
       if (insn)
--cut here--


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28924



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-06 Thread uros at kss-loka dot si


--- Comment #9 from uros at kss-loka dot si  2006-09-06 11:33 ---
Patch at http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00162.html implements
missing i386.md RTL patterns. This is i386 target-specific fix for this bug.

The patch was bootstrapped on i686-pc-linux-gnu and x86_64-pc-linux-gnu,
regtested for c,c++ and fortran.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

URL|http://gcc.gnu.org/ml/gcc-  |http://gcc.gnu.org/ml/gcc-
   |patches/2006-   |patches/2006-
   |09/msg00137.html|09/msg00162.html


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-05 Thread uros at kss-loka dot si


--- Comment #4 from uros at kss-loka dot si  2006-09-05 06:20 ---
(In reply to comment #2)
> It is entirely coincident. For some processors, it is an optimization to avoid
> partial flag register stall. When it is fixed, it should be reenabled with a
> new flag, something like TARGET_PARTIAL_FLAG_REG_STALL.

There is TARGET_USE_INCDEC flag that already implements your suggestion.

From predicates.md:
  /* On Pentium4, the inc and dec operations causes extra dependency on flag
     registers, since carry flag is not set.  */
  if (!TARGET_USE_INCDEC && !optimize_size)

If used elsewhere, this flag should perhaps be renamed to proposed
TARGET_PARTIAL_FLAG_REG_STALL.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC||uros at kss-loka dot si


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-05 Thread uros at kss-loka dot si


--- Comment #5 from uros at kss-loka dot si  2006-09-05 09:35 ---
The problem here is the following:

We already have the patterns that would satisfy the combined instruction
(*lshrsi3_cmp) in the above testcase. However, the combiner rejects the combined
instruction because the register that holds the shifted result is unused!

The problematic part is in combine.c, around line 2236 (please read the
comment, which describes exactly the situation we have here). This part of the
code is activated only when the register that holds the result of the arithmetic
operation is kept alive. This is quite strange: even if the result is unused,
the resulting code will still be smaller, as we avoid an extra CC-setting
instruction.

The patch below (currently under testing, but so far OK) forces generation of
the combined instruction even if the arithmetic result is unused.

Index: combine.c
===================================================================
--- combine.c   (revision 116691)
+++ combine.c   (working copy)
@@ -2244,7 +2244,7 @@
      needed, and make the PARALLEL by just replacing I2DEST in I3SRC with
      I2SRC.  Later we will make the PARALLEL that contains I2.  */

-  if (i1 == 0 && added_sets_2 && GET_CODE (PATTERN (i3)) == SET
+  if (i1 == 0 && GET_CODE (PATTERN (i3)) == SET
       && GET_CODE (SET_SRC (PATTERN (i3))) == COMPARE
       && XEXP (SET_SRC (PATTERN (i3)), 1) == const0_rtx
       && rtx_equal_p (XEXP (SET_SRC (PATTERN (i3)), 0), i2dest))
@@ -2254,6 +2254,13 @@
       enum machine_mode compare_mode;
 #endif

+      /* To force generation of the combined comparison and arithmetic
+         operation PARALLEL, pretend that the set in I2 is to be used,
+         even if it is dead after I2. This results in better generated
+         code, as only CC setting arithmetic instruction will be
+         emitted in conditionals.  */
+      added_sets_2 = 1;
+
       newpat = PATTERN (i3);
       SUBST (XEXP (SET_SRC (newpat), 0), i2src);


Compiling testcase with this patch results in following code:

fct:
        movl    4(%esp), %eax
        shrl    $5, %eax
        je      .L2
        jmp     fct1
        .p2align 4,,7
.L2:
        jmp     fct2
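
For illustration, a source fragment of roughly this shape leads to the
shift-then-branch sequence above. It is a sketch, not the testcase from the
PR (which is not reproduced in this comment); the bodies of fct1() and fct2()
are stand-in stubs so that the fragment is self-contained:

```c
static int last_called;   /* records which stub ran; illustration only */

static void fct1(void) { last_called = 1; }   /* stand-in stub */
static void fct2(void) { last_called = 2; }   /* stand-in stub */

void fct(unsigned int a)
{
    /* Only the zero/non-zero outcome of the shift is used, so the ZF set
       by shrl can feed the branch directly; no separate test is needed.  */
    if (a >> 5)
        fct1();
    else
        fct2();
}
```

With fct1() and fct2() declared extern instead of defined locally, the two
calls become the tail-call jumps seen in the listing.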


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-05 Thread uros at kss-loka dot si


--- Comment #6 from uros at kss-loka dot si  2006-09-05 11:45 ---
Patch at http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00137.html

BTW: This patch eliminates 869 test instructions in povray-3.6.1 compile.
(And my test raytraced pictures are still correct.)


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |uros at kss-loka dot si
   |dot org |
URL||http://gcc.gnu.org/ml/gcc-
   ||patches/2006-
   ||09/msg00137.html
 Status|NEW |ASSIGNED
   Keywords||patch
   Last reconfirmed|2006-09-04 16:50:06 |2006-09-05 11:45:14
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-05 Thread uros at kss-loka dot si


--- Comment #7 from uros at kss-loka dot si  2006-09-05 13:43 ---
Hm, the proposed patch now generates worse code for the following test:

extern int fnc1(void);
extern int fnc2(void);

int test(int x)
{
        if (x & 0x02)
                return fnc1();
        else if (x & 0x01)
                return fnc2();
        else
                return 0;
}

It generates:

test:
movl 4(%esp), %edx
movl %edx, %eax
andl $2, %eax
jne .L10
andl $1, %edx
jne .L11
xorl %eax, %eax
ret
.p2align 4,,7
.L11:
.p2align 4,,8
jmp fnc2
.p2align 4,,7
.L10:
.p2align 4,,7
jmp fnc1

due to %eax being marked live in the first comparison: andl is used instead of
testl, and a regmove is emitted before the comparison. Ideally gcc should generate:

test:
movl 4(%esp), %eax
testl  $2, %eax
jne .L6
andl $1, %eax
jne .L7
xorl %eax, %eax
ret
.p2align 2,,3
.L7:
jmp fnc2
.p2align 2,,3
.L6:
jmp fnc1


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946



[Bug libgomp/28926] FAIL: libgomp.c/ordered-1.c execution test

2006-09-03 Thread uros at kss-loka dot si


--- Comment #1 from uros at kss-loka dot si  2006-09-04 05:49 ---
The problem is that RH8.0 defines SYS_gettid and SYS_futex in headers although
the futex syscall is not really supported in the kernel. The build process
detects this and issues a warning to configure with --disable-linux-futex, but
still defaults to using the futex syscall.

Perhaps the futex support detection logic in libgomp/configure.ac (around line
200) should be reversed, so that futexes are not used by default, but are used
if all tests pass.

Anyway, --disable-linux-futex works for me.
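
The mismatch between headers and kernel only shows up at run time. A minimal
sketch of such a probe, assuming a Linux system; it is not taken from
libgomp/configure.ac, and futex_available() is a made-up name:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() in <unistd.h> */
#endif
#include <errno.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/syscall.h>
#include <linux/futex.h>
#endif

/* Returns 1 if the running kernel accepts the futex syscall, 0 if it
   answers ENOSYS, and -1 when built where SYS_futex is not defined.  */
int futex_available(void)
{
#if defined(__linux__) && defined(SYS_futex)
    int word = 0;
    /* FUTEX_WAKE on a word nobody waits on is harmless: it wakes no one.  */
    long rc = syscall(SYS_futex, &word, FUTEX_WAKE, 1, NULL, NULL, 0);
    if (rc == -1 && errno == ENOSYS)
        return 0;
    return 1;
#else
    return -1;
#endif
}
```

On the RH8.0 system described above such a probe would be expected to return
0 even though the headers define SYS_futex.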


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28926



[Bug target/28909] Missed optimization with x86 sync builtins

2006-09-01 Thread uros at kss-loka dot si


--- Comment #2 from uros at kss-loka dot si  2006-09-01 10:18 ---
Patch at http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00010.html


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |uros at kss-loka dot si
   |dot org |
URL||http://gcc.gnu.org/ml/gcc-
   ||patches/2006-
   ||09/msg00010.html
 Status|NEW |ASSIGNED
   Last reconfirmed|2006-08-31 03:13:03 |2006-09-01 10:18:03
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28909



[Bug target/28924] New: x86 sync builtins fail for char and short memory operands

2006-09-01 Thread uros at kss-loka dot si
The following testcase ICEs with current mainline:

--cut here--
char c;

void testc(void)
{
  (void) __sync_fetch_and_add(&c, -1);
}

short s;

void tests(void)
{
  (void) __sync_fetch_and_add(&s, -1);
}
--cut here--

inc.c: In function 'tests':
inc.c:13: error: unrecognizable insn:
(insn 10 8 11 3 (set (reg:HI 58)
        (const_int 65535 [0xffff])) -1 (nil)
    (nil))
inc.c:13: internal compiler error: in extract_insn, at recog.c:2077
Please submit a full bug report,

and:

inc.c: In function 'testc':
inc.c:6: error: unrecognizable insn:
(insn 7 5 8 3 (set (reg:QI 58)
        (const_int 255 [0xff])) -1 (nil)
    (nil))
inc.c:6: internal compiler error: in extract_insn, at recog.c:2077
Please submit a full bug report,

The ICE happens for all optimization levels, also for unsigned c and s
variables. I have checked the __sync_fetch_and_add() and __sync_fetch_and_sub()
builtins, but due to the nature of the error all other sync_* builtins may be
affected.
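
For reference, the behaviour expected from these builtins on narrow operands
is sketched below: each call returns the old value and stores the old value
minus one. The wrapper names are made up for illustration, and the code
relies on the GCC-style __sync builtins being available in the compiler:

```c
/* Atomically decrement *p and return the value it held before.  */
char fetch_and_dec_char(char *p)
{
    return __sync_fetch_and_add(p, -1);
}

/* Same operation for a 16-bit operand.  */
short fetch_and_dec_short(short *p)
{
    return __sync_fetch_and_add(p, -1);
}
```

On x86 each wrapper should boil down to a single locked instruction on the
byte or word operand.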


-- 
   Summary: x86 sync builtins fail for char and short memory
operands
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28924



[Bug libgomp/28926] New: FAIL: libgomp.c/ordered-1.c execution test

2006-09-01 Thread uros at kss-loka dot si
libgomp.c/ordered-1.c and libgomp.c/ordered-3.c currently time out on my system
(RedHat 8.0 with 2.4.18-14, i686) due to the unimplemented futex syscall.

strace of the produced binary shows endless "Function not implemented" lines.
This is the beginning:

rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
write(4, [EMAIL PROTECTED]@[EMAIL PROTECTED]@\340\370\377\277\0\0\0..., 148) =
148
rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
rt_sigsuspend([] <unfinished ...>
--- SIGRTMIN (Real-time signal 0) ---
<... rt_sigsuspend resumed> )   = -1 EINTR (Interrupted system call)
sigreturn() = ? (mask now [RTMIN])
futex(0x40019458, FUTEX_WAIT, 0, NULL)  = -1 ENOSYS (Function not implemented)
futex(0x40019458, FUTEX_WAIT, 0, NULL)  = -1 ENOSYS (Function not implemented)
futex(0x40019458, FUTEX_WAIT, 0, NULL)  = -1 ENOSYS (Function not implemented)
futex(0x40019458, FUTEX_WAIT, 0, NULL)  = -1 ENOSYS (Function not implemented)
futex(0x40019458, FUTEX_WAIT, 0, NULL)  = -1 ENOSYS (Function not implemented)
futex(0x40019458, FUTEX_WAIT, 0, NULL)  = -1 ENOSYS (Function not implemented)
...

Breaking execution in the middle produces following backtrace:

Program received signal SIGINT, Interrupt.
[Switching to Thread 8192 (LWP 5941)]
0x40017c83 in gomp_sem_wait_slow (sem=0x804b09c) at
../../../gcc-svn/trunk/libgomp/config/linux/x86/futex.h:73
in ../../../gcc-svn/trunk/libgomp/config/linux/x86/futex.h

(gdb) bt
#0  0x40017c83 in gomp_sem_wait_slow (sem=0x804b09c) at
../../../gcc-svn/trunk/libgomp/config/linux/x86/futex.h:73
#1  0x400167ce in gomp_ordered_sync () at
../../../gcc-svn/trunk/libgomp/config/linux/sem.h:46
#2  0x40016412 in gomp_loop_ordered_static_next (istart=0xb8e8,
iend=0xb8e4) at ../../../gcc-svn/trunk/libgomp/loop.c:307
#3  0x08048b45 in f_static_1 (dummy=0x0) at ordered-1.c:72


-- 
   Summary: FAIL: libgomp.c/ordered-1.c execution test
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28926



[Bug tree-optimization/28915] [4.2 regression] ICE: tree check: expected class 'constant', have 'declaration' (var_decl) in build_vector, at tree.c:973

2006-08-31 Thread uros at kss-loka dot si


--- Comment #3 from uros at kss-loka dot si  2006-08-31 19:15 ---
Confirmed on x86_64.

Backtrace:

(gdb) bt
#0  build_vector (type=0x2db3e6e0, vals=0x2db37cc0) at
../../gcc-svn/trunk/gcc/tree.c:973
#1  0x007b829d in force_const_mem (mode=V2DImode, x=0x2da089e0) at
../../gcc-svn/trunk/gcc/varasm.c:3229
#2  0x005d496a in emit_move_insn (x=0x2db309a0, y=0x2da089e0)
at ../../gcc-svn/trunk/gcc/expr.c:3288
#3  0x006b2ec6 in gen_vec_initv2di (operand0=0x2db309a0,
operand1=0x2da089d0) at ../../gcc-svn/trunk/gcc/config/i386/sse.md:3678
#4  0x005c9e37 in store_constructor (exp=0x2db37900,
target=0x2db309a0, cleared=0, size=16) at
../../gcc-svn/trunk/gcc/expr.c:5431
#5  0x005ce327 in expand_expr_real_1 (exp=0x2db37900,
target=0x2db309a0, tmode=V2DImode, modifier=EXPAND_NORMAL,
alt_rtl=0x7fcf5800) at ../../gcc-svn/trunk/gcc/expr.c:7142
#6  0x005d40cf in expand_expr_real (exp=0x2db37900,
target=0x2db309a0, tmode=V2DImode, modifier=EXPAND_NORMAL,
alt_rtl=0x7fcf5800) at ../../gcc-svn/trunk/gcc/expr.c:6706
#7  0x005c7264 in store_expr (exp=0x2db37900,
target=0x2db309a0, call_param_p=0) at ../../gcc-svn/trunk/gcc/expr.c:4370
#8  0x005c8397 in expand_assignment (to=0x2db3e0b0,
from=0x2db37900) at ../../gcc-svn/trunk/gcc/expr.c:4249
#9  0x005cc403 in expand_expr_real_1 (exp=0x2db3c140, target=0x0,
tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0) at
../../gcc-svn/trunk/gcc/expr.c:8603
#10 0x005d40cf in expand_expr_real (exp=0x2db3c140,
target=0x2d956400, tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0) at
../../gcc-svn/trunk/gcc/expr.c:6706

At the point of ICE, value dumps to:

var_decl 0x2db3ea50 D.1935
type vector_type 0x2db3e6e0
type integer_type 0x2d961630 long int public DI
size integer_cst 0x2d951db0 constant invariant 64
unit size integer_cst 0x2d951de0 constant invariant 8
align 64 symtab 0 alias set -1 precision 64 min integer_cst
0x2d951d20 -9223372036854775808 max integer_cst 0x2d951d50
9223372036854775807
pointer_to_this pointer_type 0x2d974a50
V2DI
size integer_cst 0x2d96c0f0 constant invariant 128
unit size integer_cst 0x2d96c120 constant invariant 16
align 128 symtab 0 alias set -1 nunits 2
V2DI file xskat-xdial.c line 16 size integer_cst 0x2d96c0f0 128 unit
size integer_cst 0x2d96c120 16
align 128
(const:DI (plus:DI (symbol_ref:DI (lanip) [flags 0x40] var_decl
0x2db1cbb0 lanip)
(const_int 40 [0x28])))


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever Confirmed|0   |1
   Last reconfirmed|-00-00 00:00:00 |2006-08-31 19:15:44
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28915



[Bug rtl-optimization/21676] [4.0/4.1/4.2 Regression] Optimizer regression: SciMark sparse matrix benchmark

2006-08-29 Thread uros at kss-loka dot si


--- Comment #10 from uros at kss-loka dot si  2006-08-29 06:12 ---
(In reply to comment #9)
> Fixed on the mainline by:
> http://gcc.gnu.org/ml/gcc-patches/2006-08/msg01036.html

Not really, the above patch fixed only one of three problems. The other two
remain, namely:

- ivopts problem (see comment #6)
- -march=pentium4 (see comment #8)

I'll try to find out which option causes the problems described in comment #8.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

Summary|[4.0/4.1 Regression]|[4.0/4.1/4.2 Regression]
   |Optimizer regression:   |Optimizer regression:
   |SciMark sparse matrix   |SciMark sparse matrix
   |benchmark   |benchmark


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21676



[Bug rtl-optimization/21676] [4.0/4.1/4.2 Regression] Optimizer regression: SciMark sparse matrix benchmark

2006-08-17 Thread uros at kss-loka dot si


--- Comment #7 from uros at kss-loka dot si  2006-08-17 07:21 ---
(In reply to comment #6)

> I think that remaining time difference is due to strange loop above innermost:

... due to strange _header_ above innermost loop ...

The problem is that we load zero in both arms of the if.

This is what I get in .099t.optimized (using gcc-4.2 -O2 -fno-ivopts):

<L1>:;
  r.0 = (unsigned int) r;
  D.1556 = r.0 * 4;
  rowR = *((int *) D.1556 + row);
  rowRp1 = *((int *) D.1556 + row + 4B);
  if (rowR < rowRp1) goto <L41>; else goto <L42>;

<L42>:;
  sum = 0.0;
  goto <bb 5> (<L4>);

<L41>:;
  i = rowR;
  sum = 0.0;

The assignment to sum should be moved before the if...

With SSE, the zero load somehow gets CSE'd during the RTL passes:

.L8:
        movl    20(%ebp), %edx
        movapd  %xmm2, %xmm1
        movl    (%edx,%ebx,4), %eax
        movl    4(%edx,%ebx,4), %ecx
        cmpl    %ecx, %eax
        jge     .L11
        movl    %eax, %edx
        .p2align 4,,7
.L12:


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21676



[Bug rtl-optimization/21676] [4.0/4.1/4.2 Regression] Optimizer regression: SciMark sparse matrix benchmark

2006-08-17 Thread uros at kss-loka dot si


--- Comment #8 from uros at kss-loka dot si  2006-08-17 07:45 ---
It is also interesting that -march=pentium4 produces the following de-optimized
code, adding a couple more instructions and wasting the %eax register:

.L8:
        leal    (%ebx,%ebx), %eax
        movl    40(%esp), %edx
        movl    (%edx,%eax,2), %edx
        movl    %edx, (%esp)
        movl    40(%esp), %edx
        movl    4(%edx,%eax,2), %ecx
        movapd  %xmm2, %xmm1
        cmpl    %ecx, (%esp)
        jge     .L11
        movl    (%esp), %edx
.L12:

Some additional timings (gcc-4.2 -O2 -fomit-frame-pointer):

-march=pentium4: 0m2.756s
-march=pentium4 -fno-ivopts: 0m2.500s
-march=pentium4 -fno-ivopts -mfpmath=sse: 0m2.461s
-msse2 -fno-ivopts -mfpmath=sse: 0m2.311s

In the last case, the generated code is identical to the code generated by
gcc-3.2:

.L8:
        movl    36(%esp), %edx
        movapd  %xmm2, %xmm1
        movl    (%edx,%ebx,4), %eax
        movl    4(%edx,%ebx,4), %ecx
        cmpl    %ecx, %eax
        jge     .L11
        movl    %eax, %edx
        .p2align 4,,7
.L12:
        movl    (%edi,%edx,4), %eax
        movsd   (%esi,%eax,8), %xmm0
        mulsd   (%ebp,%edx,8), %xmm0
        addl    $1, %edx
        cmpl    %edx, %ecx
        addsd   %xmm0, %xmm1
        jg      .L12


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21676



[Bug rtl-optimization/21676] [4.0/4.1/4.2 Regression] Optimizer regression: SciMark sparse matrix benchmark

2006-08-16 Thread uros at kss-loka dot si


--- Comment #6 from uros at kss-loka dot si  2006-08-16 12:15 ---
IMO the problem here is in IVopts. Using gcc-3.x, the innermost loop compiles
to:

.L15:
        movl    (%edi,%edx,4), %eax
        fldl    (%ebp,%edx,8)
        addl    $1, %edx
        fmull   (%esi,%eax,8)
        cmpl    %ecx, %edx
        faddp   %st, %st(1)
        jl      .L15

and with current SVN gcc-4.2 into:

.L12:
        movl    (%ecx), %eax
        fldl    (%ebp,%eax,8)
        fmull   (%edx)
        faddp   %st, %st(1)
        addl    $1, %ebx
        addl    $4, %ecx
        addl    $8, %edx
        cmpl    %esi, %ebx
        jne     .L12

Adding -fno-ivopts, this loop gets compiled into:

.L12:
        movl    (%edi,%edx,4), %eax
        fldl    (%esi,%eax,8)
        fmull   (%ebp,%edx,8)
        faddp   %st, %st(1)
        addl    $1, %edx
        cmpl    %edx, %ecx
        jg      .L12

Timings (-O3 -march=pentium4 -fomit-frame-pointer):

gcc-3.2: 0m2.301s
gcc-4.2: 0m2.713s
gcc-4.2 + -fno-ivopts: 0m2.473s

with:

gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7)
gcc version 4.2.0 20060816 (experimental)
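
For reference, the kernel under discussion has roughly the following shape at
source level. This is only a sketch reconstructed from the assembly shown in
this report, not the benchmark source itself; the function name and the names
row, col, val, x and y are assumptions. It initializes sum once, before the
row-range test, and uses indexed loads in the innermost loop, as in the
gcc-3.x and -fno-ivopts code:

```c
#include <stddef.h>

/* Sketch of a compressed-sparse-row matrix-vector product.
   All names here are assumptions, not taken from the benchmark source.  */
void
sparse_matmult (size_t nrows, double *y, const double *val,
                const int *row, const int *col, const double *x)
{
  for (size_t r = 0; r < nrows; r++)
    {
      double sum = 0.0;         /* one zero load, before the range test */
      int rowR = row[r];
      int rowRp1 = row[r + 1];

      /* Innermost loop: col[i] selects the element of x, val[i] is the
         matrix entry; an empty row (rowR >= rowRp1) skips the loop.  */
      for (int i = rowR; i < rowRp1; i++)
        sum += x[col[i]] * val[i];

      y[r] = sum;
    }
}
```

With row = {0, 2, 2, 3}, the middle row is empty and only stores the initial
zero, which is the case handled by the jge branch in the listings.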

I think that remaining time difference is due to strange loop above innermost:
gcc-3.2:

        fld     %st(0)
.L16:
        movl    36(%esp), %eax
        fld     %st(0)
        movl    4(%eax,%ebx,4), %ecx
        movl    (%eax,%ebx,4), %edx
        cmpl    %ecx, %edx
        jge     .L23
.L15:
        movl    (%edi,%edx,4), %eax
        fldl    (%ebp,%edx,8)
        addl    $1, %edx
        fmull   (%esi,%eax,8)
        cmpl    %ecx, %edx
        faddp   %st, %st(1)
        jl      .L15
.L23:
        movl    28(%esp), %eax
        fstpl   (%eax,%ebx,8)
        addl    $1, %ebx
        cmpl    24(%esp), %ebx
        jl      .L16


gcc-4.2:

.L8:
        movl    36(%esp), %edx
        movl    (%edx,%edi,4), %eax
        movl    4(%edx,%edi,4), %esi
        fldz
        cmpl    %esi, %eax
        jge     .L11
        fstp    %st(0)
        movl    40(%esp), %ebx
        leal    (%ebx,%eax,4), %ecx
        movl    32(%esp), %ebx
        leal    (%ebx,%eax,8), %edx
        fldz
        xorl    %ebx, %ebx
        subl    %eax, %esi
.L12:
        movl    (%ecx), %eax
        fldl    (%ebp,%eax,8)
        fmull   (%edx)
        faddp   %st, %st(1)
        addl    $1, %ebx
        addl    $4, %ecx
        addl    $8, %edx
        cmpl    %esi, %ebx
        jne     .L12
.L11:
        movl    28(%esp), %eax
        fstpl   (%eax,%edi,8)
        addl    $1, %edi
        cmpl    24(%esp), %edi
        jne     .L8


and gcc-4.2 -fno-ivopts:

.L8:
        leal    (%ebx,%ebx), %eax
        movl    40(%esp), %edx
        movl    (%edx,%eax,2), %edx
        movl    %edx, (%esp)
        movl    40(%esp), %edx
        movl    4(%edx,%eax,2), %ecx
        fldz
        cmpl    %ecx, (%esp)
        jge     .L11
        fstp    %st(0)
        movl    (%esp), %edx
        fldz
.L12:
        movl    (%edi,%edx,4), %eax
        fldl    (%esi,%eax,8)
        fmull   (%ebp,%edx,8)
        faddp   %st, %st(1)
        addl    $1, %edx
        cmpl    %edx, %ecx
        jg      .L12
.L11:
        movl    32(%esp), %ecx
        fstpl   (%ecx,%ebx,8)
        addl    $1, %ebx
        cmpl    %ebx, 28(%esp)
        jg      .L8


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC||uros at kss-loka dot si
 Status|UNCONFIRMED |NEW
 Ever Confirmed|0   |1
   Last reconfirmed|-00-00 00:00:00 |2006-08-16 12:15:56
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21676



[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-11 Thread uros at kss-loka dot si


--- Comment #64 from uros at kss-loka dot si  2006-08-11 09:18 ---
Slightly offtopic, but to put some numbers to comment #8 and comment #11,
equivalent SSE code now reaches only 50% of x87 single performance and 60% of
x87 double performance on AMD x86_64:


ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

[float] -O2 -mfpmath=sse -march=k8:
atlasmm       60   1000  0.273  1582.66
[float] -O2 -mfpmath=387 -march=k8:
atlasmm       60   1000  0.138  3130.91

[double] -O2 -mfpmath=sse -march=k8:
atlasmm       60   1000  0.252  1714.54
[double] -O2 -mfpmath=387 -march=k8:
atlasmm       60   1000  0.152  2842.55

This effect was first observed in PR19780.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827



[Bug middle-end/28685] New: Multiple comparisons are not simplified

2006-08-10 Thread uros at kss-loka dot si
These two testcases should produce equivalent code:

int test(int a, int b)
{
  int lt = a < b;
  int eq = a == b;

  return (lt || eq);
}

int test_(int a, int b)
{
  return (a < b || a == b);
}

However, the optimized tree code is:

;; Function test (test)

Analyzing Edge Insertions.
test (a, b)
{
<bb 2>:
  return (a == b | a < b) != 0;

}

;; Function test_ (test_)

Analyzing Edge Insertions.
test_ (a, b)
{
<bb 2>:
  return a <= b;

}

And the resulting x86_64 asm is suboptimal for the test() function:

test:
        cmpl    %esi, %edi
        sete    %dl
        cmpl    %esi, %edi
        setl    %al
        orl     %edx, %eax
        movzbl  %al, %eax
        ret

test_:
        xorl    %eax, %eax
        cmpl    %esi, %edi
        setle   %al
        ret
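
That both forms compute the same value as a single a <= b can be checked by
brute force. The sketch below is only an illustration: it repeats the two
testcases and compares them over a small range of inputs. The helper name
forms_agree and the choice of range are assumptions made for the example.

```c
/* The two testcases from this report.  */
int test (int a, int b)
{
  int lt = a < b;
  int eq = a == b;

  return (lt || eq);
}

int test_ (int a, int b)
{
  return (a < b || a == b);
}

/* Hypothetical helper: returns 1 if test(), test_() and a <= b agree
   for every pair (a, b) with -limit <= a, b <= limit, else 0.  */
int forms_agree (int limit)
{
  for (int a = -limit; a <= limit; a++)
    for (int b = -limit; b <= limit; b++)
      if (test (a, b) != (a <= b) || test_ (a, b) != (a <= b))
        return 0;
  return 1;
}
```

Since the results are identical, the difference between the two asm listings
is purely a missed simplification.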


-- 
   Summary: Multiple comparisons are not simplified
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
 GCC build triplet: x86_64-pc-linux-gnu
  GCC host triplet: x86_64-pc-linux-gnu
GCC target triplet: x86_64-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28685



[Bug middle-end/28411] gfortran: Internal error: Illegal instruction

2006-07-18 Thread uros at kss-loka dot si


--- Comment #4 from uros at kss-loka dot si  2006-07-18 07:29 ---
This is the backtrace for the testcase in comment #3:

#1  0x0827ae67 in fold_binary_to_constant (code=TRUNC_MOD_EXPR,
type=0x402473f4, op0=0x402d9438, op1=0x0) at
../../gcc-svn/trunk/gcc/fold-const.c:12314
#2  0x08174b25 in constant_multiple_of (type=0x402473f4, top=0x402d9438,
bot=0x0) at ../../gcc-svn/trunk/gcc/tree-ssa-loop-ivopts.c:2623
#3  0x081799d1 in get_computation_cost (data=0xb704, use=0x8706e70,
cand=0x8707358, address_p=0 '\0', depends_on=0xb5f4) at
../../gcc-svn/trunk/gcc/tree-ssa-loop-ivopts.c:3758
#4  0x0817a364 in determine_use_iv_cost (data=0xb704, use=0x8706e70,
cand=0x8707358) at ../../gcc-svn/trunk/gcc/tree-ssa-loop-ivopts.c:3901
#5  0x0817d41e in determine_use_iv_costs (data=0xb704) at
../../gcc-svn/trunk/gcc/tree-ssa-loop-ivopts.c:4128
#6  0x0817f3ac in tree_ssa_iv_optimize_loop (data=0xb704, loop=Variable
loop is not available.

constant_multiple_of() is calling fold_binary_to_constant() here:

  if (!zero_p (fold_binary_to_constant (TRUNC_MOD_EXPR, type, top, bot)))
    return NULL_TREE;

As can be seen from the backtrace above, the bot operand is NULL, and this
triggers an assert in fold_binary().


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28411



[Bug tree-optimization/28411] gfortran: Internal error: Illegal instruction

2006-07-18 Thread uros at kss-loka dot si


--- Comment #5 from uros at kss-loka dot si  2006-07-18 08:06 ---
This error can be tracked down to fold_negate_expr() returning NULL_TREE via
this path:

(a) constant_multiple_of() calls fold_unary_to_constant():

  /* If BOT seems to be negative, try dividing by -BOT instead, and negate
     the result afterwards.  */
  if (tree_int_cst_sign_bit (bot))
    {
      negate = true;
      bot = fold_unary_to_constant (NEGATE_EXPR, type, bot);
    }

(b) fold_unary_to_constant() calls fold_unary()

(c) fold_unary() calls fold_negate_expr() for NEGATE_EXPR:

    case NEGATE_EXPR:
      tem = fold_negate_expr (arg0);
      if (tem)
        return fold_convert (type, tem);
      return NULL_TREE;

(d) fold_negate_expr() returns NULL_TREE, because:

    case INTEGER_CST:
      tem = fold_negate_const (t, type);
      if (! TREE_OVERFLOW (tem)
          || TYPE_UNSIGNED (type)
          || ! flag_trapv)
        return tem;
      break;
      ...

    default:
      break;
    }

  return NULL_TREE;
}

From here, I don't know what a correct solution would be...
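
To illustrate the shape of the problem outside of GCC: negating the most
negative value of a signed type overflows, so a negation helper can fail, and
a caller that ignores the failure ends up with a missing operand. The sketch
below is a standalone model in plain C, not GCC code and not a proposed fix;
negate_checked and is_multiple_of are hypothetical names chosen for the
example.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores -v in *out and returns true, or returns
   false when -v is not representable (v == LONG_MIN).  */
bool
negate_checked (long v, long *out)
{
  if (v == LONG_MIN)
    return false;
  *out = -v;
  return true;
}

/* Hypothetical caller modelled on step (a): returns 1 if TOP is a
   multiple of BOT, 0 if it is not, and -1 if that cannot be decided.  */
int
is_multiple_of (long top, long bot)
{
  if (bot == 0)
    return -1;

  if (bot < 0)
    {
      long neg;

      /* Check the result of the negation instead of using it blindly.  */
      if (!negate_checked (bot, &neg))
        return -1;
      bot = neg;
    }

  return top % bot == 0;
}
```

In this model, is_multiple_of (8, LONG_MIN) gives up with -1 instead of
continuing with an operand that was never produced.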


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC||uros at kss-loka dot si


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28411



[Bug target/26949] [4.2 regression] worse code generated for -march=pentium4

2006-07-06 Thread uros at kss-loka dot si


--- Comment #1 from uros at kss-loka dot si  2006-07-06 08:23 ---
This problem appears to be fixed in gcc version 4.2.0 20060705 (experimental).
The generated asm for the loop is now:

-O2 -march=pentium4 -fno-tree-ch:

        jmp     .L2
.L3:
        movl    %esi, -4(%edx)
        addl    $1, %eax
.L2:
        addl    $4, %edx
        cmpl    %ecx, %eax
        jle     .L3

-O2 -march=i686 -fno-tree-ch:

        jmp     .L2
        .p2align 4,,7
.L3:
        movl    %ebx, -4(%ecx)
        addl    $1, %edx
.L2:
        addl    $4, %ecx
        cmpl    %eax, %edx
        jle     .L3

Closing the bug as FIXED.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26949



[Bug target/26949] [4.2 regression] worse code generated for -march=pentium4

2006-07-06 Thread uros at kss-loka dot si


--- Comment #2 from uros at kss-loka dot si  2006-07-06 08:24 ---
Closing it for real...


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26949



[Bug middle-end/28252] pow(x,1/3.0) should be converted to cbrt(x)

2006-07-05 Thread uros at kss-loka dot si


--- Comment #2 from uros at kss-loka dot si  2006-07-05 08:25 ---
Created an attachment (id=11824)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11824&action=view)
Patch to implement pow(x,1.0/3.0) = cbrt(x) optimization

I have the patch that implements the optimization ready, just waiting for the
mainline to open again. Should I post it to gcc-patches anyway?

2006-07-05  Uros Bizjak  [EMAIL PROTECTED]

* builtins.c (fold_builtin): Fold pow(x,1.0/3.0) as cbrt(x) if
flag_unsafe_math_optimizations is set.

testsuite:

* gcc.dg/builtins-8.c: Also check pow(x,1.0/3.0) to cbrt(x)
transformation.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28252



[Bug middle-end/28252] pow(x,1/3.0) should be converted to cbrt(x)

2006-07-05 Thread uros at kss-loka dot si


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |uros at kss-loka dot si
   |dot org |
 Status|NEW |ASSIGNED
   Last reconfirmed|2006-07-04 22:52:33 |2006-07-05 08:26:53
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28252



[Bug tree-optimization/27474] ICE: tree check: expected ssa_name, have struct_field_tag in verify_ssa, at tree-ssa.c:776

2006-07-05 Thread uros at kss-loka dot si


--- Comment #4 from uros at kss-loka dot si  2006-07-05 10:10 ---
This still fails with current mainline gcc.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

   Last reconfirmed|2006-05-08 07:45:56 |2006-07-05 10:10:38
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27474



[Bug middle-end/24929] long long shift/mask operations should be better optimized

2006-06-27 Thread uros at kss-loka dot si


--- Comment #5 from uros at kss-loka dot si  2006-06-27 10:12 ---
(In reply to comment #4)

> which may be optimal.

movzbl  18(%esp), %eax

could be used in this particular case.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24929



[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-26 Thread uros at kss-loka dot si


--- Comment #20 from uros at kss-loka dot si  2006-06-26 06:31 ---
(In reply to comment #15)

> Can someone tell me if anyone is looking into this problem with the hopes of
> fixing it?  I just noticed that despite the posted code demonstrating the
> problem, and verification on: Pentium Pro, Pentium III, Pentium 4e, Pentium-D,
> Athlon-64 X2 and Opteron, it is still marked as new, and no one is assigned
> to look at it  . . .

Hm, I tried your single testcase (SSE) on:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping        : 9
cpu MHz         : 3191.917
cache size      : 512 KB

And the results are a bit surprising (this is the exact output of your test):

/usr/local.uros/gcc34/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -msse2
-mfpmath=sse -DTYPE=float -c mmbench.c
/usr/local.uros/gcc34/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -msse2
-mfpmath=sse -c sgemm_atlas.c
/usr/local.uros/gcc34/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -msse2
-mfpmath=sse -o xsmm_gcc mmbench.o sgemm_atlas.o
rm -f *.o
/usr/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -msse2 -mfpmath=sse
-DTYPE=float -c mmbench.c
/usr/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -msse2 -mfpmath=sse -c
sgemm_atlas.c
/usr/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -msse2 -mfpmath=sse -o
xsmm_gc4 mmbench.o sgemm_atlas.o
rm -f *.o
echo GCC 3.x single performance:
GCC 3.x single performance:
./xsmm_gcc
ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

atlasmm       60   1000  0.141  3072.00

echo GCC 4.x single performance:
GCC 4.x single performance:
./xsmm_gc4
ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

atlasmm       60   1000  0.141  3072.00

where:

gcc (GCC) 3.4.6 was tested against gcc version 4.2.0 20060608
(experimental)

FYI: there is another pathological testcase (PR target/19780), where SSE code
is 30% slower on AMD64, despite the fact that for SSE, 16 xmm registers were
available and _no_ memory was accessed in a for loop.

> The reason I ask is that I am preparing the next stable release of ATLAS, and
> I'm getting close to having to make a decision on what compilers I will
> support.
> If someone is working feverishly in the background, I will be sure to wait
> for it, in the hopes that there'll be a fix that will allow me to use
> gcc 4, which I think will be what most of my users want.  If this problem
> is not being looked into, I should not delay the ATLAS release for it, and
> just require my users to install gcc 3 in order to get decent performance.
>
> I realize you guys are busy, and fp performance is probably not your main
> concern, so hopefully this message sounds more like a request for info on what
> is going on, than a bitch about help that I'm getting for free :)

Without any other information available, I can only speculate that perhaps
gcc4 code does not fully utilize the multiple FP pipelines in the processors
you listed.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827



[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-26 Thread uros at kss-loka dot si


--- Comment #22 from uros at kss-loka dot si  2006-06-27 05:49 ---
(In reply to comment #21)

> Note that you are running the opposite of my test case: SSE vs SSE rather than
> x87 vs x87.  This whole bug report is about x87 performance.  You can get more
> detail on why I want x87 in my messages above, particularly comment #11, but
> single precision is indeed the place where SSE cannot compete with the x87
> unit.  To see it, put the flags back the way I had them in the attachment, and
> you'll see that gcc 3 is much faster.  Also, you should find in single

Hm, these are x87 results:

/usr/local.uros/gcc34/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -DTYPE=float
-c mmbench.c
/usr/local.uros/gcc34/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -c
sgemm_atlas.c
/usr/local.uros/gcc34/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -o xsmm_gcc
mmbench.o sgemm_atlas.o
rm -f *.o
/usr/local.uros/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -DTYPE=float -c
mmbench.c
/usr/local.uros/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -c sgemm_atlas.c
/usr/local.uros/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -o xsmm_gc4
mmbench.o sgemm_atlas.o
rm -f *.o
echo GCC 3.x single performance:
GCC 3.x single performance:
./xsmm_gcc
ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

atlasmm       60   1000  0.141  3072.00

echo GCC 4.x single performance:
GCC 4.x single performance:
./xsmm_gc4
ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

atlasmm       60   1000  0.143  3029.92


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827



[Bug c++/28041] [gomp] ICE in g++.dg/gomp/atomic-[4,5,9].C

2006-06-19 Thread uros at kss-loka dot si


--- Comment #1 from uros at kss-loka dot si  2006-06-19 08:56 ---
Works OK with gcc version 4.2.0 20060619 (experimental).


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28041



[Bug c++/28041] New: [gomp] ICE in g++.dg/gomp/atomic-[4,5,9].C

2006-06-15 Thread uros at kss-loka dot si
The compilation crashes in

/* Gimplify an OMP_ATOMIC statement.  */

static enum gimplify_status
gimplify_omp_atomic (tree *expr_p, tree *pre_p)
{
  tree addr = TREE_OPERAND (*expr_p, 0);
  tree rhs = TREE_OPERAND (*expr_p, 1);
   tree type = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (addr)));
  HOST_WIDE_INT index;


Program received signal SIGSEGV, Segmentation fault.
0x001bb40c in gimplify_omp_atomic (expr_p=0xff0edf48, pre_p=0xffbed64c) at
/export/home/uros/gcc-svn/trunk/gcc/gimplify.c:5148
(gdb) bt
#0  0x001bb40c in gimplify_omp_atomic (expr_p=0xff0edf48, pre_p=0xffbed64c) at
/export/home/uros/gcc-svn/trunk/gcc/gimplify.c:5148
#1  0x001bd2a0 in gimplify_expr (expr_p=0xff0edf48, pre_p=0xffbed64c,
post_p=0xffbed648, gimple_test_f=0x1aa228 <is_gimple_stmt>, fallback=fb_none)
at /export/home/uros/gcc-svn/trunk/gcc/gimplify.c:5646
#2  0x001b6078 in gimplify_statement_list (expr_p=0xffbed6f8) at
/export/home/uros/gcc-svn/trunk/gcc/tree-iterator.h:86
#3  0x001bcdfc in gimplify_expr (expr_p=0xff115970, pre_p=0xffbed784,
post_p=0xffbed780, gimple_test_f=0x1aa228 <is_gimple_stmt>, fallback=fb_none)
at /export/home/uros/gcc-svn/trunk/gcc/gimplify.c:5595
#4  0x001bde60 in gimplify_body (body_p=0xff115970, fndecl=0xff115900,
do_parms=1 '\001') at 
/export/home/uros/gcc-svn/trunk/gcc/gimplify.c:6113
#5  0x001be2a8 in gimplify_function_tree (fndecl=0xff115900) at
/export/home/uros/gcc-svn/trunk/gcc/gimplify.c:6189
#6  0x0017917c in c_genericize (fndecl=0xff115900) at
/export/home/uros/gcc-svn/trunk/gcc/c-gimplify.c:106
#7  0x00143864 in cp_genericize (fndecl=0xff115900) at
/export/home/uros/gcc-svn/trunk/gcc/cp/cp-gimplify.c:739
#8  0x00045a74 in finish_function (flags=0) at
/export/home/uros/gcc-svn/trunk/gcc/cp/decl.c:11130


-- 
   Summary: [gomp] ICE in g++.dg/gomp/atomic-[4,5,9].C
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
 GCC build triplet: sparc-sun-solaris2.8
  GCC host triplet: sparc-sun-solaris2.8
GCC target triplet: sparc-sun-solaris2.8


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28041



[Bug target/28007] sse autovectorizer emits wrong code involving shifts

2006-06-13 Thread uros at kss-loka dot si


--- Comment #5 from uros at kss-loka dot si  2006-06-13 07:44 ---
Similar problem was solved for gcc-4.1 in PR target/22480.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28007



[Bug target/27790] [4.1 Regression] Unrecognizable insn with -ftree-vectorize -O1 -msse2

2006-06-07 Thread uros at kss-loka dot si


--- Comment #9 from uros at kss-loka dot si  2006-06-07 07:05 ---
Fixed on 4.1 branch.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27790



[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2006-06-02 Thread uros at kss-loka dot si


--- Comment #2 from uros at kss-loka dot si  2006-06-02 10:04 ---
(In reply to comment #1)
> There is nothing special about reassociation at all.  In fact what you are
> seeing is register allocator going funky.  This what you get with x87.

This is also what you get with SSE.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27855



[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-01 Thread uros at kss-loka dot si


--- Comment #9 from uros at kss-loka dot si  2006-06-01 08:43 ---
The benchmark run on a Pentium4 3.2G/800MHz FSB (32bit):

vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping        : 9
cpu MHz         : 3191.917
cache size      : 512 KB

shows even more interesting results:

gcc version 3.4.6
vs.
gcc version 4.2.0 20060601 (experimental)

-fomit-frame-pointer -O -msse2 -mfpmath=sse

GCC 3.x performance:
./xmm_gcc
ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

atlasmm       60   1000  0.162  2664.87

GCC 4.x performance:
./xmm_gc4
ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

atlasmm       60   1000  0.164  2633.13

and

-fomit-frame-pointer -O -mfpmath=387

GCC 3.x performance:
./xmm_gcc
ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

atlasmm       60   1000  0.160  2697.37

GCC 4.x performance:
./xmm_gc4
ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

atlasmm       60   1000  0.164  2633.15

There is a small performance drop on gcc-4.x, but nothing critical.

I can confirm that the code indeed runs 50% slower on a 64-bit Athlon. Perhaps
the problem is in the order of instructions (Software Optimization Guide for
AMD Athlon 64, Section 10.2). The gcc-3.4 code looks similar to the example of
how things should be done, and the gcc-4.2 code looks similar to the example
of how things should _NOT_ be done.

BTW: Did you try to run the benchmark on AMD target with -march=k8? The effects
of this flag are devastating on Pentium4 CPU:

-O -msse2 -mfpmath=sse -march=k8

./xmm_gcc
ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

atlasmm       60   1000  0.836   516.79

GCC 4.x performance:
./xmm_gc4
ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

atlasmm       60   1000  0.287  1504.66


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever Confirmed|0   |1
   Last reconfirmed|-00-00 00:00:00 |2006-06-01 08:43:34
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827



[Bug tree-optimization/27855] New: reassociation pass produces ~30% slower matrix multiplication code

2006-06-01 Thread uros at kss-loka dot si
The testcase from PR target/27827 shows another problem, this time with
-ffast-math. The runtime performance of -ffast-math code drops by ~30%.

The problem could be traced down to the reassociation tree pass, because the
performance jumps back when the flag_unsafe_math_optimizations switch is
disabled by replacing every occurrence in tree-ssa-reassoc.c with
(flag_unsafe_math_optimizations && 0).

To see the problem, -funsafe-math-optimizations should be added to MMFLAGS in
target/27827 example Makefile:

MM4FLAGS = $(GMMFLAGS) -funsafe-math-optimizations


Current mainline gcc produces code with following results:

-O -mfpmath=387

ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

atlasmm       60   1000  0.260  1663.04

-O -msse2 -mfpmath=sse

ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

atlasmm       60   1000  0.229  1890.47


gcc with disabled reassoc pass for floating point values:

-O -mfpmath=387

ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

atlasmm       60   1000  0.162  2664.87

-O -msse2 -mfpmath=sse

ALGORITHM     NB   REPS   TIME   MFLOPS
=========  =====  =====  =====  =======

atlasmm       60   1000  0.164  2633.15
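
As background on why reassociation of floating-point values sits behind
-funsafe-math-optimizations: floating-point addition is not associative, so
regrouping operands can change the computed value as well as the instruction
order. A minimal sketch, assuming IEEE double arithmetic; the function names
are made up for the example, and the volatile temporaries are there only to
force each intermediate to be rounded to double:

```c
/* Computes (a + b) + c, rounding each step to double.  */
double
sum_left (double a, double b, double c)
{
  volatile double t = a + b;
  volatile double r = t + c;
  return r;
}

/* Computes a + (b + c), rounding each step to double.  */
double
sum_right (double a, double b, double c)
{
  volatile double t = b + c;
  volatile double r = a + t;
  return r;
}
```

For a = 1e17, b = -1e17, c = 1.0 the first grouping returns 1.0, while the
second returns 0.0, because -1e17 + 1.0 rounds back to -1e17.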


-- 
   Summary: reassociation pass produces ~30% slower matrix
multiplication code
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
OtherBugsDependingO 27827
 nThis:


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27855



[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-05-31 Thread uros at kss-loka dot si


--- Comment #7 from uros at kss-loka dot si  2006-05-31 10:56 ---
IMO the fact that gcc 3.x beats 4.x on this code could be attributed to pure
luck.

Looking into 3.x RTL, these things can be observed:

Instruction that multiplies pA0 and rB0 is described as:

__.20.combine:

(insn 75 73 76 2 (set (reg:DF 84)
(mult:DF (mem:DF (reg/v/f:DI 70 [ pA0 ]) [0 S8 A64])
(reg/v:DF 78 [ rB0 ]))) 551 {*fop_df_comm_nosse} (insn_list 65
(nil))
(nil))

At this point, the first input operand does not satisfy the operand constraint,
so the register allocator loads the memory operand into a register:

__.25.greg:

(insn 703 73 75 2 (set (reg:DF 8 st [84])
(mem:DF (reg/v/f:DI 0 ax [orig:70 pA0 ] [70]) [0 S8 A64])) 96
{*movdf_integer} (nil)
(nil))

(insn 75 703 76 2 (set (reg:DF 8 st [84])
(mult:DF (reg:DF 8 st [84])
(reg/v:DF 9 st(1) [orig:78 rB0 ] [78]))) 551 {*fop_df_comm_nosse}
(insn_list 65 (nil))
(nil))

This RTL produces following asm sequence:

        fldl    (%rax)          #* pA0
        fmul    %st(1), %st     #


In 4.x case, we have:

__.127r.combine:

(insn 60 58 61 4 (set (reg:DF 207)
(mult:DF (reg/v:DF 187 [ rB0 ])
(mem:DF (plus:DI (reg/v/f:DI 178 [ pA0.161 ])
(const_int 960 [0x3c0])) [0 S8 A64]))) 591
{*fop_df_comm_i387} (nil)
(nil))

This instruction almost satisfies the operand constraints, and the register
allocator produces:

__.138r.greg:

(insn 470 58 60 5 (set (reg:DF 12 st(4) [207])
(reg/v:DF 8 st [orig:187 rB0 ] [187])) 94 {*movdf_integer} (nil)
(nil))

(insn 60 470 61 5 (set (reg:DF 12 st(4) [207])
(mult:DF (reg:DF 12 st(4) [207])
(mem:DF (plus:DI (reg/v/f:DI 0 ax [orig:178 pA0.161 ] [178])
(const_int 960 [0x3c0])) [0 S8 A64]))) 591
{*fop_df_comm_i387} (nil)
(nil))

Stack handling then fixes this RTL to:

__.151r.stack:

(insn 470 58 60 4 (set (reg:DF 8 st)
(reg:DF 8 st)) 94 {*movdf_integer} (nil)
(nil))

(insn 60 470 61 4 (set (reg:DF 8 st)
(mult:DF (reg:DF 8 st)
(mem:DF (plus:DI (reg/v/f:DI 0 ax [orig:178 pA0.161 ] [178])
(const_int 960 [0x3c0])) [0 S8 A64]))) 591
{*fop_df_comm_i387} (nil)
(nil))


From your measurements, it looks like instead of:

        fld     %st(0)          #
        fmull   (%rax)          #* pA0.161

it is faster to emit

        fldl    (%rax)          #* pA0
        fmul    %st(1), %st     #,


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC||uros at kss-loka dot si


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827



[Bug target/27790] [4.1/4.2 Regression] Unrecognizable insn with -ftree-vectorize -O1 -msse2

2006-05-29 Thread uros at kss-loka dot si


--- Comment #3 from uros at kss-loka dot si  2006-05-29 10:29 ---
I'm testing a patch.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |uros at kss-loka dot si
   |dot org |
 Status|NEW |ASSIGNED
   Last reconfirmed|2006-05-29 04:28:52 |2006-05-29 10:29:47
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27790



[Bug target/27790] [4.1/4.2 Regression] Unrecognizable insn with -ftree-vectorize -O1 -msse2

2006-05-29 Thread uros at kss-loka dot si


--- Comment #5 from uros at kss-loka dot si  2006-05-29 11:52 ---
(In reply to comment #4)

> pr27790.patch
>
> This seems to work for me.

In the V4SImode case above, there is

emit_insn (gen_subv4si3 (t1, cop0, cop1));

The subv4si3 insn also needs cop0 in a register:

(define_expand "sub<mode>3"
  [(set (match_operand:SSEMODEI 0 "register_operand" "")
        (minus:SSEMODEI (match_operand:SSEMODEI 1 "register_operand" "")
                        (match_operand:SSEMODEI 2 "nonimmediate_operand" "")))]
  "TARGET_SSE2"
  "ix86_fixup_binary_operands_no_copy (MINUS, <MODE>mode, operands);")


 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27790



[Bug tree-optimization/27638] New: Strange initialization of uninitialized structure part

2006-05-17 Thread uros at kss-loka dot si
This testcase generates some kind of initialization of the uninitialized part
of the structure:

--cut here--
struct ret_struct
{
  int long_buf[10];
};

struct ret_struct
strc_test (int i)
{
  struct ret_struct ret;
  ret.long_buf[0] = i;
  ret.long_buf[1] = 0x2;
  ret.long_buf[2] = 0x3;
  ret.long_buf[3] = 0x4;
  ret.long_buf[4] = 0x5;
  ret.long_buf[5] = 0x6;
  return ret;
}
--cut here--

gcc -O2 -fverbose-asm -fomit-frame-pointer:

--cut here--
strc_test:
        movl    4(%esp), %eax   # D.1563, D.1563
        movl    8(%esp), %edx   # i, i
        movl    %eax, 36(%eax)  # ret$long_buf$9, <result>.long_buf   !!
        movl    %eax, 32(%eax)  # ret$long_buf$8, <result>.long_buf   !!
        movl    %eax, 28(%eax)  # ret$long_buf$7, <result>.long_buf   !!
        movl    %eax, 24(%eax)  # ret$long_buf$6, <result>.long_buf   !!
        movl    $6, 20(%eax)    #, <result>.long_buf
        movl    $5, 16(%eax)    #, <result>.long_buf
        movl    $4, 12(%eax)    #, <result>.long_buf
        movl    $3, 8(%eax)     #, <result>.long_buf
        movl    $2, 4(%eax)     #, <result>.long_buf
        movl    %edx, (%eax)    # i, <result>.long_buf
        ret     $4              #
--cut here--

These extra variables can be seen in the _.optimized tree dump:

--cut here--
;; Function strc_test (strc_test)

Analyzing Edge Insertions.
strc_test (i)
{
  int ret$long_buf$9;
  int ret$long_buf$8;
  int ret$long_buf$7;
  int ret$long_buf$6;

<bb 2>:
  <retval>.long_buf[9] = ret$long_buf$9;
  <retval>.long_buf[8] = ret$long_buf$8;
  <retval>.long_buf[7] = ret$long_buf$7;
  <retval>.long_buf[6] = ret$long_buf$6;
  <retval>.long_buf[5] = 6;
  <retval>.long_buf[4] = 5;
  <retval>.long_buf[3] = 4;
  <retval>.long_buf[2] = 3;
  <retval>.long_buf[1] = 2;
  <retval>.long_buf[0] = i;
  return <retval>;

}
--cut here--

Using -Wall, gcc correctly warns about uninitialized part
(why extra 'u' in index?):

t.c:16: warning: 'ret.long_buf[9u]' is used uninitialized in this function
t.c:16: warning: 'ret.long_buf[8u]' is used uninitialized in this function
t.c:16: warning: 'ret.long_buf[7u]' is used uninitialized in this function
t.c:16: warning: 'ret.long_buf[6u]' is used uninitialized in this function

IMO emitting some sort of initialization in this case is not needed.
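
For anyone who hits these warnings in their own code, here is a minimal sketch
of one way to avoid returning indeterminate elements: initialize the whole
aggregate first. The names strc_test_zeroed and check_tail_is_zero are
illustrative assumptions and not part of the testcase. This only sidesteps the
warnings in user code; it does not address the extra stores reported here.

```c
#include <assert.h>

struct ret_struct
{
  int long_buf[10];
};

/* Hypothetical variant of strc_test: the initializer zeroes every
   element, so long_buf[6] .. long_buf[9] are defined on return.  */
struct ret_struct
strc_test_zeroed (int i)
{
  struct ret_struct ret = { { 0 } };
  ret.long_buf[0] = i;
  ret.long_buf[1] = 0x2;
  ret.long_buf[2] = 0x3;
  ret.long_buf[3] = 0x4;
  ret.long_buf[4] = 0x5;
  ret.long_buf[5] = 0x6;
  return ret;
}

/* Illustrative self-check: the tail that the original testcase left
   uninitialized now reads back as zero.  */
void
check_tail_is_zero (void)
{
  struct ret_struct r = strc_test_zeroed (1);
  assert (r.long_buf[6] == 0 && r.long_buf[9] == 0);
}
```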


-- 
   Summary: Strange initialization of uninitialized structure part
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27638



[Bug target/26726] -fivopts producing out of bounds array refs

2006-05-13 Thread uros at kss-loka dot si


--- Comment #14 from uros at kss-loka dot si  2006-05-13 08:46 ---
(In reply to comment #13)
> This is now a target specific problem, on i?86 and x86_64 we are left with an
> offset of -4B and so referencing a[5] in the exit condition.

This is PR target/24669.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

  BugsThisDependsOn||24669


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26726



[Bug target/27277] [4.2 Regression] standard i387 constant loading insns (fldz, fld1) are not generated anymore

2006-05-08 Thread uros at kss-loka dot si


--- Comment #6 from uros at kss-loka dot si  2006-05-08 06:12 ---
Fixed.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27277



[Bug tree-optimization/27474] New: ICE: tree check: expected ssa_name, have struct_field_tag in verify_ssa, at tree-ssa.c:776

2006-05-07 Thread uros at kss-loka dot si
This ICE happens during compilation of PovRay-3.6.1 with -msse2
-ftree-vectorize (also on x86_64). The ICE is in express.cpp.

The reduced testcase is attached, this is the failure with -O -ftree-vectorize:
g++ -O -ftree-vectorize -m32 -msse2 reduced.cpp
reduced.cpp: In function ‘void pov::Parse_Num_Factor(double*, int*)’:
reduced.cpp:94: internal compiler error: tree check: expected ssa_name, have
struct_field_tag in verify_ssa, at tree-ssa.c:776
Please submit a full bug report,
[etc]

The same failure happens with g++ -O -ftree-vectorize on x86_64.


-- 
   Summary: ICE: tree check: expected ssa_name, have
struct_field_tag in verify_ssa, at tree-ssa.c:776
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27474



[Bug tree-optimization/27474] ICE: tree check: expected ssa_name, have struct_field_tag in verify_ssa, at tree-ssa.c:776

2006-05-07 Thread uros at kss-loka dot si


--- Comment #1 from uros at kss-loka dot si  2006-05-07 19:30 ---
Created an attachment (id=11396)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11396&action=view)
Reduced cpp testcase

The testcase, reduced with Delta.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27474



[Bug target/27277] New: standard i387 constant loading insns (fldz, fld1) are not generated anymore

2006-04-24 Thread uros at kss-loka dot si
It looks like the standard i387 constant loading insns are not generated
anymore. This testcase:

--cut here--
double test(void)
{
   return 1.0;
}
--cut here--

generates (gcc -O2 -fomit-frame-pointer):
test:
        flds    .LC0            # fld1 should be here
        ret

.LC0:
        .long   1065353216

The problem is in the extendsfdf2 expander, which expects a CONST_DOUBLE as
operand[1] in order to generate a simple constant move instruction. For some
reason the constant is pushed to the constant pool (as an SFmode constant), so
the expander receives (reg:SF 60) as operand[1]. The following RTL sequence is
produced:

(insn 9 8 10 (set (reg:SF 60)
        (mem/u/c/i:SF (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [2 S4 A32])) -1
(nil)
    (expr_list:REG_EQUAL (const_double:SF 1.0e+0 [0x0.8p+1])
        (nil)))

(insn 10 9 11 (set (reg:DF 58 [ result ])
        (float_extend:DF (reg:SF 60))) -1 (nil)
    (expr_list:REG_EQUAL (const_double:DF 1.0e+0 [0x0.8p+1])
        (nil)))

This sequence corresponds to the final asm:

test:
        flds    .LC0    # 16    *extendsfdf2_i387/1     [length = 6]
        ret             # 30    return_internal [length = 1]

The same problem arises for other i387 constants.
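
As a side note on the constant pool entry: the value 1065353216 stored at .LC0
is the single-precision bit pattern of 1.0 (0x3f800000), which is why the load
is an SFmode flds followed by an extension. A minimal sketch to confirm this,
assuming float is a 32-bit IEEE-754 type (as on i386); the helper names
float_bits and check_lc0_value are illustrative only:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the bit pattern of F.  Assumes a 32-bit IEEE-754 float.  */
uint32_t
float_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Illustrative check: the .long emitted at .LC0 is 1.0f.  */
void
check_lc0_value (void)
{
  assert (sizeof (float) == sizeof (uint32_t));
  assert (float_bits (1.0f) == 1065353216u);
}
```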


-- 
   Summary: standard i387 constant loading insns (fldz, fld1) are
not generated anymore
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
GCC target triplet: i386-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27277



[Bug middle-end/27134] [4.1 regression] ICE with floor and -ffast-math

2006-04-16 Thread uros at kss-loka dot si


--- Comment #7 from uros at kss-loka dot si  2006-04-16 11:22 ---
Fixed.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to work|4.2.0   |4.2.0 4.1.1
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27134



[Bug middle-end/27134] [4.1 regression] ICE with floor and -ffast-math

2006-04-14 Thread uros at kss-loka dot si


--- Comment #5 from uros at kss-loka dot si  2006-04-14 07:18 ---
Fixed on SVN head.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

  Known to work||4.2.0
Summary|[4.1/4.2 regression] ICE|[4.1 regression] ICE with
   |with floor and -ffast-math  |floor and -ffast-math


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27134



[Bug middle-end/27134] [4.1/4.2 regression] ICE with floor and -ffast-math

2006-04-12 Thread uros at kss-loka dot si


--- Comment #3 from uros at kss-loka dot si  2006-04-12 17:54 ---
> There seems to be something wrong with -ffast-math and floor.

I have done some analysis on this. Start from expand_builtin_int_roundingfn()
in builtins.c, where we fall back to the FP rounding optab.

fallback_fndecl from mathfn_builtin looks like:

 function_decl 0x2d992200 __builtin_floor
type function_type 0x2d9756e0
type real_type 0x2d970420 double DF
size integer_cst 0x2d951d80 constant invariant 64
unit size integer_cst 0x2d951db0 constant invariant 8
align 64 symtab 0 alias set -1 precision 64
pointer_to_this pointer_type 0x2d970630
QI
size integer_cst 0x2d9517e0 constant invariant 8
unit size integer_cst 0x2d951810 constant invariant 1
align 8 symtab 0 alias set -1
arg-types tree_list 0x2d9740f0 value real_type 0x2d970420
double
chain tree_list 0x2d96be10 value void_type 0x2d9700b0
void
pointer_to_this pointer_type 0x2dabad10
readonly used nothrow public external built-in decl_6 QI file built-in
line 0
built-in BUILT_IN_NORMAL:BUILT_IN_FLOOR attributes tree_list
0x2d9918d0
(mem:QI (symbol_ref:DI (floor) [flags 0x41] function_decl 0x2d992200
__builtin_floor) [0 S1 A8]) chain function_decl 0x2d992300 floor



After that, build_function_call_expr() is called, with an argument list:

 tree_list 0x2dabf180
value float_expr 0x2d95b240
type real_type 0x2d970420 double DF
size integer_cst 0x2d951d80 constant invariant 64
unit size integer_cst 0x2d951db0 constant invariant 8
align 64 symtab 0 alias set -1 precision 64
pointer_to_this pointer_type 0x2d970630

arg 0 parm_decl 0x2d958780 i type integer_type 0x2d9604d0
int
used SI file pr27134.c line 5
size integer_cst 0x2d951bd0 constant invariant 32
unit size integer_cst 0x2d9516f0 constant invariant 4
align 32 context function_decl 0x2daa1600 foo initial
integer_type 0x2d9604d0 int
(reg/v:SI 59 [ i ]) arg-type integer_type 0x2d9604d0 int

This is simplified in fold_build3() to:

 nop_expr 0x2dac5300
type real_type 0x2d970420 double DF
size integer_cst 0x2d951d80 constant invariant 64
unit size integer_cst 0x2d951db0 constant invariant 8
align 64 symtab 0 alias set -1 precision 64
pointer_to_this pointer_type 0x2d970630

arg 0 float_expr 0x2d95b240 type real_type 0x2d970420 double

arg 0 parm_decl 0x2d958780 i type integer_type 0x2d9604d0
int
used SI file pr27134.c line 5
size integer_cst 0x2d951bd0 constant invariant 32
unit size integer_cst 0x2d9516f0 constant invariant 4
align 32 context function_decl 0x2daa1600 foo initial
integer_type 0x2d9604d0 int
(reg/v:SI 59 [ i ]) arg-type integer_type 0x2d9604d0 int
incoming-rtl (reg:SI 5 di [ i ])

It looks to me that fold_convert3() is trying to kill
(int) __builtin_lfloor ((double) i), where i is an integer argument.

Uros.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |uros at kss-loka dot si
   |dot org |
 Status|NEW |ASSIGNED
   Last reconfirmed|2006-04-12 14:59:12 |2006-04-12 17:54:41
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27134



[Bug middle-end/27139] New: Optimize double INT->FP->INT conversions

2006-04-12 Thread uros at kss-loka dot si
This testcase:

int test (int a)
{
  return (double) a;
}

Produces:

        cvtsi2sd        %edi, %xmm0
        cvttsd2si       %xmm0, %eax
        ret

However, the following code does the same (at least for -ffast-math):

        movl    %edi, %eax
        ret
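
To illustrate why the two sequences should agree: assuming a 32-bit int and an
IEEE-754 double (53-bit significand), every int value is exactly representable
as a double, so the round trip is the identity. A minimal sketch that checks
the extremes; check_roundtrip is an illustrative name, not part of the
testcase:

```c
#include <assert.h>
#include <limits.h>

/* Same shape as the testcase: int -> double -> int.  */
int
test (int a)
{
  return (double) a;
}

/* Illustrative check at the extremes of the int range.  Assumes a
   32-bit int and an IEEE-754 double.  */
void
check_roundtrip (void)
{
  assert (test (0) == 0);
  assert (test (-1) == -1);
  assert (test (INT_MAX) == INT_MAX);
  assert (test (INT_MIN) == INT_MIN);
}
```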


-- 
   Summary: Optimize double INT->FP->INT conversions
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27139



[Bug middle-end/27069] -ffast-math crash

2006-04-07 Thread uros at kss-loka dot si


--- Comment #14 from uros at kss-loka dot si  2006-04-07 06:10 ---
This is a duplicate of PR 26869.

*** This bug has been marked as a duplicate of 26869 ***


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution||DUPLICATE


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27069



[Bug middle-end/26869] [4.1/4.2 Regression] Segfault in find_lattice_value() for complex operands.

2006-04-07 Thread uros at kss-loka dot si


--- Comment #3 from uros at kss-loka dot si  2006-04-07 06:10 ---
*** Bug 27069 has been marked as a duplicate of this bug. ***


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC||nuno dot bandeira at ist dot
   ||utl dot pt


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26869



[Bug rtl-optimization/15187] Inefficient if optimization with -O2 -ffast-math

2006-03-29 Thread uros at kss-loka dot si


--- Comment #12 from uros at kss-loka dot si  2006-03-29 14:08 ---
(In reply to comment #11)
> it looks like 4.1.1 and 4.2.0 still produce unoptimal code.
>
> test:   pushl   %ebp
>         movl    %esp, %ebp
>         fldl    8(%ebp)
>         fldz
>         fcomip  %st(1), %st
>         jae     .L2
>         popl    %ebp
>         fcos
>         ret
>
> .L2:    popl    %ebp
>         fsin
>         ret

No, this code is optimal. Please compare the code above to the code in
description, where fcos is calculated even if x = 0.0


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15187



[Bug middle-end/26869] New: Segfault in find_lattice_value() for complex operands.

2006-03-25 Thread uros at kss-loka dot si
This testcase segfaults in find_lattice_value() in tree-complex.c line 116:

_Complex float f (_Complex float b, _Complex float c)
{
  _Complex float a = 1.0 + 0.0i;
  return a / c;
}

gcc -O x.c
x.c: In function ‘f’:
x.c:2: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

(gdb) bt
#0 find_lattice_value (t=0x0) at ../../gcc-svn/trunk/gcc/tree-complex.c:116
#1 0x00853e50 in set_component_ssa_name (ssa_name=0x0, imag_p=0 '\0',
value=0x2d95b600) at ../../gcc-svn/trunk/gcc/tree-complex.c:485
#2 0x00854126 in update_complex_components_on_edge (e=0x2d95b300,
lhs=0x0, r=Variable r is not available.
) at ../../gcc-svn/trunk/gcc/tree-complex.c:608
#3 0x008579eb in tree_lower_complex () at
../../gcc-svn/trunk/gcc/tree-complex.c:658


-- 
   Summary: Segfault in find_lattice_value() for complex operands.
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
 GCC build triplet: x86_64-pc-linux-gnu
  GCC host triplet: x86_64-pc-linux-gnu
GCC target triplet: x86_64-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26869



[Bug middle-end/26717] [4.2 Regression] complex/complex gives a REAL_CST

2006-03-23 Thread uros at kss-loka dot si


--- Comment #7 from uros at kss-loka dot si  2006-03-23 10:33 ---
Patch at http://gcc.gnu.org/ml/gcc-patches/2006-03/msg01435.html


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |uros at kss-loka dot si
   |dot org |
URL||http://gcc.gnu.org/ml/gcc-
   ||patches/2006-
   ||03/msg01435.html
 Status|NEW |ASSIGNED
   Last reconfirmed|2006-03-16 16:21:05 |2006-03-23 10:33:46
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26717



[Bug target/13685] Building simple test application with -march=pentium3 -Os gives SIGSEGV (unaligned sse instruction)

2006-02-22 Thread uros at kss-loka dot si


--- Comment #17 from uros at kss-loka dot si  2006-02-22 10:15 ---
Works OK with gcc-4.2 and -Os -msse -fomit-frame-pointer.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC||uros at kss-loka dot si


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13685



[Bug driver/26274] New: gcc --target-help segfaults

2006-02-13 Thread uros at kss-loka dot si
gcc segfaults in print_filtered_help when invoked with --target-help option:

Starting program: /export/home/uros/gcc-build/gcc/cc1 --target-help

Target specific options:

Program received signal SIGSEGV, Segmentation fault.
0x0838547b in print_filtered_help (flag=4194304)
at ../../gcc-svn/trunk/gcc/opts.c:1335
1335  memset (printed, 0, cl_options_count);
(gdb) bt
#0  0x0838547b in print_filtered_help (flag=4194304)
at ../../gcc-svn/trunk/gcc/opts.c:1335
#1  0x0838635e in decode_options (argc=2, argv=0xba24)
at ../../gcc-svn/trunk/gcc/opts.c:746
#2  0x083eb77b in toplev_main (argc=2, argv=0xba24)
at ../../gcc-svn/trunk/gcc/toplev.c:1970


print_filtered_help (unsigned int flag)
{
  unsigned int i, len, filter, indent = 0;
  bool duplicates = false;
  const char *help, *opt, *tab;
  static char *printed;

  if (flag == CL_COMMON || flag == CL_TARGET)
    {
      filter = flag;
      if (!printed)
        printed = xmalloc (cl_options_count);
      memset (printed, 0, cl_options_count);
    }
  else


-- 
   Summary: gcc --target-help segfaults
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: driver
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26274



[Bug target/17390] missing floating point compare optimization

2006-01-18 Thread uros at kss-loka dot si


--- Comment #8 from uros at kss-loka dot si  2006-01-18 09:50 ---
(In reply to comment #7)

> Hmm, I get (but that looks like different branch predictions):

It looks like your default is -mtune=pentium.

> _testf:
>         fldl    4(%esp)
>         ftst
>         fnstsw  %ax
>         testb   $64, %ah
>         jne     L10
>         ftst
>         fnstsw  %ax
>         fstp    %st(0)
>         testb   $69, %ah
>         jne     L5
>         fld1
>         ret
>         .align 2,0x90
> L10:
>         fstp    %st(0)
>         fldz
>         ret
> L5:
>         flds    LC2
>         ret

With the proposed patch, this code is compiled to (-O2 -ffast-math
-mtune=pentium -fomit-frame-pointer):

testf:
        fldl   4(%esp)
        ftst
        fnstsw %ax
        fstp   %st(0)
        testb  $64, %ah
        jne .L10
        testb  $69, %ah
        jne .L5
        fld1
        ret
        .p2align 4,,7
.L10:
        fldz
        ret
.L5:
        flds   .LC2
        ret

and for -mtune=i686:

testf:
        fldl   4(%esp)
        ftst
        fnstsw %ax
        fstp   %st(0)
        sahf
        je .L10
        jbe .L5
        fld1
        ret
        .p2align 4,,7
.L10:
        fldz
        .p2align 4,,8
        ret
.L5:
        flds   .LC2
        .p2align 4,,4
        ret

BTW: I'll attach a patch, rediffed to current SVN.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17390



[Bug target/17390] missing floating point compare optimization

2006-01-18 Thread uros at kss-loka dot si


--- Comment #9 from uros at kss-loka dot si  2006-01-18 09:53 ---
Created an attachment (id=10666)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10666&action=view)
patch to SVN GCC: (GNU) 4.2.0 20060117 (experimental)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17390



[Bug regression/25531] New: [4.0/4.1/4.2 Regression]: Handling of __attribute__ ((alias ("foo+X")))

2005-12-22 Thread uros at kss-loka dot si
A regression with __attribute__ ((alias ("foo+X"))) breaks newlib builds. The
testcase is distilled from newlib-1.13.0/newlib/libc/ctype/ctype_.c:

--cut here--
static const char _foo_b[4] = {
  'a', 'b', 'c', 'd'
};

extern const char _foo_[4] __attribute__ ((alias ("_foo_b+2")));
--cut here--

gcc-3.4:
 ~/gcc-build-34/gcc/cc1 x.c
 test

 more x.s
        .file   "x.c"
        .section        .rodata
        .type   _foo_b, @object
        .size   _foo_b, 4
_foo_b:
        .byte   97
        .byte   98
        .byte   99
        .byte   100
.globl _foo_
        .set    _foo_,_foo_b+2
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4.5 20051110 (prerelease)"

gcc-4.x:
~/gcc-build/gcc/cc1 x.c
x.c:5: error: '_foo_' aliased to undefined symbol '_foo_b+2'


-- 
   Summary: [4.0/4.1/4.2 Regression]: Handling of __attribute__
((alias ("foo+X")))
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: regression
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25531



[Bug target/24475] gcc.dg/tls/pr24428.c execution test and gcc.dg/tls/pr24428-2.c execution test fail on IA32

2005-12-01 Thread uros at kss-loka dot si


--- Comment #10 from uros at kss-loka dot si  2005-12-02 06:59 ---
Fixed on 4.1 and mainline.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution||FIXED
   Target Milestone|--- |4.1.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24475



[Bug tree-optimization/20219] Missed optimisation sin / tan --> cos

2005-11-27 Thread uros at kss-loka dot si


--- Comment #3 from uros at kss-loka dot si  2005-11-28 07:20 ---
Reopened to ...


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|WONTFIX |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20219



[Bug middle-end/20219] Missed optimisation sin / tan --> cos

2005-11-27 Thread uros at kss-loka dot si


--- Comment #5 from uros at kss-loka dot si  2005-11-28 07:32 ---
... close as FIXED.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
  Component|tree-optimization   |middle-end
 Resolution||FIXED
   Target Milestone|--- |4.2.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20219



[Bug target/24476] [4.1/4.2 Regression] gcc.dg/tls/pr24428.c execution test and gcc.dg/tls/pr24428-2.c execution test fail on IA64

2005-11-24 Thread uros at kss-loka dot si


--- Comment #2 from uros at kss-loka dot si  2005-11-24 08:09 ---
The testsuite patch that fixes IA32 tests (and should also fix IA64 issues
reported here) is at http://gcc.gnu.org/ml/gcc-patches/2005-11/msg01059.html.

Patch is still waiting for review, however I can't test it on IA64.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

  BugsThisDependsOn||24475


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24476



[Bug rtl-optimization/24995] [4.1/4.2 Regression] gcc.dg/vect/vect-10.c fails for -march=athlon

2005-11-24 Thread uros at kss-loka dot si


--- Comment #2 from uros at kss-loka dot si  2005-11-24 10:19 ---
This also fails for i686-pc-linux-gnu with '-march=athlon'.

The patch at http://gcc.gnu.org/ml/gcc-patches/2005-11/msg01648.html fixes the
x86_64-pc-linux-gnu failure in the original report and the -march=athlon
failure.

FWIW, -fomit-frame-pointer also fixes these failures.

This PR is a duplicate of 24982.

*** This bug has been marked as a duplicate of 24982 ***


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 GCC target triplet|x86_64-linux-gnu|i686-pc-linux-gnu
 Resolution||DUPLICATE
Summary|[4.1/4.2 Regression]|[4.1/4.2 Regression]
   |gcc.dg/vect/vect-10.c fails |gcc.dg/vect/vect-10.c fails
   |on x86_64 with -m32 |for -march=athlon


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24995



[Bug target/24982] [4.1/4.2 Regression] Bootstrap failure with ICE in refers_to_regno_for_reload_p

2005-11-24 Thread uros at kss-loka dot si


--- Comment #5 from uros at kss-loka dot si  2005-11-24 10:19 ---
*** Bug 24995 has been marked as a duplicate of this bug. ***


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC||pinskia at gcc dot gnu dot
   ||org
Bug 24982 depends on bug 24995, which changed state.

Bug 24995 Summary: [4.1/4.2 Regression] gcc.dg/vect/vect-10.c fails for 
-march=athlon
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24995

   What|Old Value   |New Value

 Status|NEW |RESOLVED
 Resolution||DUPLICATE

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24982



[Bug rtl-optimization/24982] [4.1/4.2 Regression] Bootstrap failure with ICE in refers_to_regno_for_reload_p

2005-11-24 Thread uros at kss-loka dot si


--- Comment #6 from uros at kss-loka dot si  2005-11-24 10:24 ---
(In reply to comment #4)
> I've proposed a patch to this PR in
>
> http://gcc.gnu.org/ml/gcc-patches/2005-11/msg01648.html
>
> Does it solve PR 24995?

Yes, both the x86_64 and the -march=athlon failures.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC||rth at gcc dot gnu dot org
  BugsThisDependsOn|24995   |
URL||http://gcc.gnu.org/ml/gcc-
   ||patches/2005-
   ||11/msg01648.html
 Status|UNCONFIRMED |NEW
  Component|target  |rtl-optimization
 Ever Confirmed|0   |1
   GCC host triplet|sh4-*-linux-gnu |
 GCC target triplet|sh4-*-linux-gnu |
   Keywords||patch
   Last reconfirmed|-00-00 00:00:00 |2005-11-24 10:24:02
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24982



[Bug rtl-optimization/24982] [4.1/4.2 Regression] Bootstrap failure with ICE in refers_to_regno_for_reload_p

2005-11-24 Thread uros at kss-loka dot si


--- Comment #9 from uros at kss-loka dot si  2005-11-24 14:40 ---
Critical, according to comment #7 and #8.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC||uros at kss-loka dot si
   Severity|normal  |critical
   Priority|P3  |P1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24982



[Bug target/24475] gcc.dg/tls/pr24428.c execution test and gcc.dg/tls/pr24428-2.c execution test fail on IA32

2005-11-15 Thread uros at kss-loka dot si


--- Comment #6 from uros at kss-loka dot si  2005-11-15 08:13 ---
Perhaps a runtime check should be added to target-supports.exp
(check_effective_target_tls-runtime, perhaps) that would check if the system is
capable of running tls enabled binaries. Alternatively, my proposed patch
(http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00963.html) could try to run the
tls testcase, instead of just compiling it.

However, by adding { dg-require-effective-target tls-runtime }, runtime tests
will also be skipped on a system that is otherwise able to compile testcases.

The job of the compiler is IMO to compile sources correctly, and the purpose of
a runtime test is to check if the system is able to run testcases. The runtime
failure reported here just says that the tested system is not able to run the
testcase and that the system should be upgraded/fixed.
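
For illustration, the body of such a runtime probe could be as small as the
sketch below. The names tls_value, tls_probe and check_tls_probe are
hypothetical, and whether a single-threaded access like this is a sufficient
test on every system is an assumption; the existing effective-target checks in
target-supports.exp are written in Tcl and would wrap a program like this one.

```c
#include <assert.h>

/* __thread is the GNU C thread-local storage qualifier.  */
static __thread int tls_value = 40;

/* Hypothetical probe body: modify a TLS variable and return it.  */
int
tls_probe (void)
{
  tls_value += 2;
  return tls_value;
}

/* Illustrative check: the store to the TLS variable is visible on the
   next read in the same thread.  */
void
check_tls_probe (void)
{
  int before = tls_value;
  assert (tls_probe () == before + 2);
}
```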


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC||uros at kss-loka dot si


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24475



[Bug target/24475] gcc.dg/tls/pr24428.c execution test and gcc.dg/tls/pr24428-2.c execution test fail on IA32

2005-11-15 Thread uros at kss-loka dot si


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC|uros at kss-loka dot si |
 AssignedTo|unassigned at gcc dot gnu   |uros at kss-loka dot si
   |dot org |
URL||http://gcc.gnu.org/ml/gcc-
   ||patches/2005-
   ||11/msg01059.html
 Status|UNCONFIRMED |ASSIGNED
 Ever Confirmed|0   |1
   Keywords||patch
   Last reconfirmed|-00-00 00:00:00 |2005-11-15 13:43:11
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24475



[Bug libgomp/24797] Segfault in libgomp.c/nested-1.c

2005-11-13 Thread uros at kss-loka dot si


--- Comment #2 from uros at kss-loka dot si  2005-11-14 07:13 ---
Fixed by Jakub's patch.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED
   Target Milestone|--- |4.1.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24797



[Bug rtl-optimization/15439] ICE with -fschedule-insns2 -fsched2-use-traces

2005-11-11 Thread uros at kss-loka dot si


--- Comment #4 from uros at kss-loka dot si  2005-11-11 08:20 ---
This is in fact a duplicate of PR 19340. Fixed in 3.4.5.

*** This bug has been marked as a duplicate of 19340 ***


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to work|4.0.0   |4.0.0 3.4.5
 Resolution||DUPLICATE
   Target Milestone|--- |3.4.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15439



[Bug target/19340] Compilation SEGFAULTs with -O1 -fschedule-insns2 -fsched2-use-traces on an x86 architecture.

2005-11-11 Thread uros at kss-loka dot si


--- Comment #10 from uros at kss-loka dot si  2005-11-11 08:20 ---
*** Bug 15439 has been marked as a duplicate of this bug. ***


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC||coyote at coyotegulch dot
   ||com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19340



[Bug libgomp/24797] New: Segfault in libgomp.c/nested-1.c

2005-11-11 Thread uros at kss-loka dot si
Hello!

Testcase libgomp.c/nested-1.c currently segfaults when run on
i686-pc-linux-gnu (pentium4) with non-TLS libc (Redhat 8.0):

(gdb) run
[Thread debugging using libthread_db enabled]
[New Thread 8192 (LWP 700)]
[New Thread 16385 (LWP 702)]
[New Thread 8194 (LWP 703)]
[New Thread 16387 (LWP 704)]
[New Thread 24580 (LWP 705)]
[New Thread 32773 (LWP 706)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 32773 (LWP 706)]
0x42073fe0 in _int_free () from /lib/i686/libc.so.6
(gdb) 

Backtrace:

#6  0x420da1ca in thread_start () from libc.so.6
#5  0x4005fa45 in pthread_start_thread_event () from libpthread.so.0
#4  0x4005f94d in pthread_start_thread () from libpthread.so.0
#3  0x4005e65a in __pthread_do_exit () from libpthread.so.0
#2  0x4006233c in __pthread_destroy_specifics () from libpthread.so.0
#1  0x42074a2c in free () from libc.so.6
#0  0x42073fe0 in _int_free () from libc.so.6


-- 
   Summary: Segfault in libgomp.c/nested-1.c
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24797



[Bug rtl-optimization/24319] [3.4/4.0/4.1 regression] amd64 register spill error with -fschedule-insns

2005-11-09 Thread uros at kss-loka dot si


--- Comment #6 from uros at kss-loka dot si  2005-11-09 15:27 ---
The problem is caused by the combination of (1) x86_64 parameter passing
convention, (2) x86 instructions that _require_ parameters in specific
registers and (3) sched1 scheduling pass.

ad 1)

x86_64 passes function parameters in registers in the order defined in the
x86_64_int_parameter_registers[] array:

  5 /*RDI*/, 4 /*RSI*/, 1 /*RDX*/, 2 /*RCX*/,
  FIRST_REX_INT_REG /*R8 */, FIRST_REX_INT_REG + 1 /*R9 */

Additionally, RAX is used as a hidden argument register.
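
As a quick illustration of that ordering, here is a sketch that maps an integer
argument index to its register. The register names and the helpers
int_arg_register and check_first_two_args are illustrative assumptions; GCC
itself stores register numbers, as in the array quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Integer argument registers, in the order of the list above.  */
static const char *const int_arg_regs[] =
  { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/* Return the register that carries integer argument N (0-based), or
   NULL when the argument would not be passed in a register.  */
const char *
int_arg_register (size_t n)
{
  size_t count = sizeof int_arg_regs / sizeof int_arg_regs[0];
  return n < count ? int_arg_regs[n] : NULL;
}

/* Illustrative check: the first two arguments land in di and si.  */
void
check_first_two_args (void)
{
  assert (strcmp (int_arg_register (0), "rdi") == 0);
  assert (strcmp (int_arg_register (1), "rsi") == 0);
}
```

For a two-argument call this gives rdi and rsi, the two registers that the
moves in front of the call have to set up.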

In the original example, the call sequence to memory_to_string is constructed as:

(insn 17 15 18 0 (set (reg:DI 4 si)
(reg:DI 61)) 81 {*movdi_1_rex64} (insn_list:REG_DEP_TRUE 15 (nil))
(expr_list:REG_DEAD (reg:DI 61)
(nil)))

(insn 18 17 19 0 (set (reg:DI 5 di [ c_string ])
(reg/v/f:DI 60 [ c_string ])) 81 {*movdi_1_rex64} (nil)
(expr_list:REG_DEAD (reg/v/f:DI 60 [ c_string ])
(nil)))

(call_insn 19 18 20 0 (set (reg:DI 0 ax)
(call (mem:QI (symbol_ref:DI (memory_to_string) [flags 0x3]
function_decl 0x4044f080 memory_to_string) [0 S1 A8])
(const_int 0 [0x0]))) 732 {*call_value_0_rex64}
(insn_list:REG_DEP_TRUE 17 (insn_list:REG_DEP_TRUE 18 (nil)))
(expr_list:REG_DEAD (reg:DI 4 si)
(expr_list:REG_DEAD (reg:DI 5 di [ c_string ])
(expr_list:REG_EH_REGION (const_int 0 [0x0])
(nil
(expr_list:REG_DEP_TRUE (use (reg:DI 5 di [ c_string ]))
(expr_list:REG_DEP_TRUE (use (reg:DI 4 si))
(nil


ad 2)

Please note that this sequence can be found just after the *strlenqi_rex_1
mega-pattern. This pattern requires parameters to be put in exactly defined
registers:

(define_insn "*strlenqi_rex_1"
  [(set (match_operand:DI 0 "register_operand" "=&c")
        (unspec:DI [(mem:BLK (match_operand:DI 5 "register_operand" "1"))
                    (match_operand:QI 2 "register_operand" "a")
                    (match_operand:DI 3 "immediate_operand" "i")
                    (match_operand:DI 4 "register_operand" "0")] UNSPEC_SCAS))
   (use (reg:SI DIRFLAG_REG))
   (clobber (match_operand:DI 1 "register_operand" "=D"))
   (clobber (reg:CC FLAGS_REG))]

However, at the time of the sched1 pass (before reload), hard registers are not
known yet. We have the following RTL pattern just above the memory_to_string
call sequence (reg_notes are not shown for clarity):

(insn 13 12 14 0 (parallel [
(set (reg:DI 63)
(unspec:DI [
(mem:BLK (reg/f:DI 65 [ c_string ]) [0 A8])
(reg:QI 67)
(const_int 1 [0x1])
(reg:DI 66)
] 20))
(use (reg:SI 19 dirflag))
(clobber (reg/f:DI 65 [ c_string ]))
(clobber (reg:CC 17 flags))
]) 511 {*strlenqi_rex_1}


ad 3)

Sched1 pass is free to move (insn 17) and (insn 18) before (insn 13), as it
doesn't recognize register allocation conflicts between these instructions.
Following that move, reload has no registers to spill and ICEs.

The testcase from comment #3 ICEs with:
error: unable to find a register to spill in class ‘AREG’

Here, the same problem could be observed. As foo is missing a prototype,
hidden RAX register gets allocated in addition to RDI:

(insn 20 18 21 0 (set (reg:DI 5 di)
(reg:DI 61)) 81 {*movdi_1_rex64} (insn_list:REG_DEP_TRUE 18 (nil))
(expr_list:REG_DEAD (reg:DI 61)
(nil)))

(insn 21 20 22 0 (set (reg:QI 0 ax)
(const_int 0 [0x0])) 55 {*movqi_1} (nil)
(nil))

(call_insn 22 21 23 0 (set (reg:SI 0 ax)
(call (mem:QI (symbol_ref:DI (foo) [flags 0x41] function_decl
0x402cbd80 foo) [0 S1 A8])
(const_int 0 [0x0]))) 732 {*call_value_0_rex64}
(insn_list:REG_DEP_TRUE 20 (insn_list:REG_DEP_TRUE 21 (nil)))
(expr_list:REG_DEAD (reg:DI 5 di)
(nil))
(expr_list:REG_DEP_TRUE (use (reg:QI 0 ax))
(expr_list:REG_DEP_TRUE (use (reg:DI 5 di))
(nil

This AX register is then moved before strlenqi_rex_1 pattern and this blocks
the AX register. (BTW: If prototype of foo is added, this particular testcase
compiles OK.)

One possible fix to this problem would be not to schedule instructions that
have assigned hard registers (move insns in the above case). Considering the
number of x86 instructions that require fixed registers, I would suggest that
bugmasters raise the priority of this bug.

The x86 backend should not have these problems, but using -mregparm=X I think
it could also be tricked into this sort of ICE.

(BTW: I have added Jim Wilson to the CC of this bug, as he is the current
maintainer of the insn scheduling pass code. Perhaps he has some ideas on how
to solve this problem.)


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu dot

[Bug target/24315] [3.4 Regression] amd64 fails -fpeephole2

2005-11-09 Thread uros at kss-loka dot si


--- Comment #17 from uros at kss-loka dot si  2005-11-10 07:31 ---
Fixed on 3.4 branch.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to fail|3.3.5 4.0.2 3.4.5   |3.3.5 4.0.2
  Known to work|3.2.3 4.1.0 4.0.3   |3.2.3 4.1.0 4.0.3 3.4.5
 Resolution||FIXED
   Target Milestone|4.0.3   |3.4.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24315



[Bug target/19340] Compilation SEGFAULTs with -O1 -fschedule-insns2 -fsched2-use-traces on an x86 architecture.

2005-11-09 Thread uros at kss-loka dot si


--- Comment #9 from uros at kss-loka dot si  2005-11-10 07:33 ---
Fixed on 3.4 branch.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

  Known to work|4.0.3 4.1.0 |4.0.3 4.1.0 3.4.5
   Target Milestone|4.0.3   |3.4.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19340



[Bug target/19340] Compilation SEGFAULTs with -O1 -fschedule-insns2 -fsched2-use-traces on an x86 architecture.

2005-11-08 Thread uros at kss-loka dot si


--- Comment #7 from uros at kss-loka dot si  2005-11-08 08:12 ---
Fixed on mainline and 4.0 branch.


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to fail|3.4.0 4.0.0 |3.4.0
  Known to work||4.0.3 4.1.0
 Resolution||FIXED
   Target Milestone|--- |4.0.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19340



[Bug c/24101] [3.4/4.0/4.1 Regression] Segfault with preprocessed source

2005-11-08 Thread uros at kss-loka dot si


--- Comment #9 from uros at kss-loka dot si  2005-11-08 10:04 ---
Patch here: http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00498.html


-- 

uros at kss-loka dot si changed:

   What|Removed |Added

URL||http://gcc.gnu.org/ml/gcc-
   ||patches/2005-
   ||11/msg00498.html
   Keywords||patch


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24101



[Bug target/24265] [4.1 Regression] ICE: in extract_insn, at recog.c:2084 with -O -fgcse -fmove-loop-invariants -mtune=pentiumpro

2005-11-08 Thread uros at kss-loka dot si


--- Comment #7 from uros at kss-loka dot si  2005-11-08 12:40 ---
Created an attachment (id=10173)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10173&action=view)
Patch to fix the ice

This patch fixes the failure for me, but...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24265



[Bug target/24265] [4.1 Regression] ICE: in extract_insn, at recog.c:2084 with -O -fgcse -fmove-loop-invariants -mtune=pentiumpro

2005-11-08 Thread uros at kss-loka dot si


--- Comment #8 from uros at kss-loka dot si  2005-11-08 12:53 ---
> This patch fixes the failure for me, but...

... we actually gain nothing here.

From .loop2_done, we have the following sequence, where the mem->reg load is
pushed out of the loop:

(insn 21 16 39 0 (set (reg:DF 64)
(mem/u/c/i:DF (symbol_ref/u:SI (*.LC0) [flags 0x2]) [0 S8 A64])) -1
(nil)
(nil))
;; End of basic block 0, registers live:
 (nil)

(note 39 21 17 NOTE_INSN_LOOP_BEG)

;; Start of basic block 1, registers live: (nil)
(code_label 17 39 18 1 2  [1 uses])

(note 18 17 47 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

(insn 47 18 22 1 (set (mem:DF (plus:SI (reg/f:SI 7 sp)
(const_int 8 [0x8])) [0 S8 A32])
(reg:DF 64)) -1 (nil)
(nil))


However, in .postreload, the insn 21 (now insn 53) is moved back _into_ the
loop (why?):

(note 21 16 39 0 NOTE_INSN_DELETED)
;; End of basic block 0, registers live:
 6 [bp] 7 [sp] 16 [argp] 20 [frame] 60 64

(note 39 21 17 NOTE_INSN_LOOP_BEG)

;; Start of basic block 1, registers live: 6 [bp] 7 [sp] 59 60 64
(code_label 17 39 18 1 2  [1 uses])

(note 18 17 53 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

(insn 53 18 47 1 (set (reg:DF 8 st)
(mem/u/c/i:DF (symbol_ref/u:SI (*.LC0) [flags 0x2]) [0 S8 A64])) 63
{*movdf_nointeger} (nil)
(nil))

(insn 47 53 54 1 (set (mem:DF (plus:SI (reg/f:SI 7 sp)
(const_int 8 [0x8])) [0 S8 A32])
(reg:DF 8 st)) 63 {*movdf_nointeger} (nil)
(nil))


The proposed patch thus only fixes the damage. Otherwise, all this register
moving/copying doesn't gain anything, as reload moves the load back into the
loop on its own.

Also, REG_EQUAL notes are lost (before and after the patch). This results in
the following asm:

...
movl $-1717986918, 8(%esp)
movl $1070176665, 12(%esp)
fldl -16(%ebp)
fstpl   (%esp)
call dset
movl $1, %ebx
.L2:
fldl .LC0   <-- reload moves this insn back into the loop
fstpl   8(%esp)
fldl -16(%ebp)
fstpl   (%esp)
call dset
incl %ebx
cmpl $4, %ebx
jne  .L2
addl $36, %esp
...



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24265



[Bug target/24265] [4.1 Regression] ICE: in extract_insn, at recog.c:2084 with -O -fgcse -fmove-loop-invariants -mtune=pentiumpro

2005-11-08 Thread uros at kss-loka dot si


--- Comment #9 from uros at kss-loka dot si  2005-11-08 13:23 ---
Bah... set_unique_reg_note is needed:

  /* If new move insn is invalid (i.e. move of const_double to
     387 stack register), force constant into memory.  */
  if (recog_memoized (inv->insn) == -1)
    {
      rtx src = SET_SRC (set);

      if (GET_CODE (src) == CONST_DOUBLE)
        {
          SET_SRC (set) = validize_mem (force_const_mem (mode, src));
          set_unique_reg_note (inv->insn, REG_EQUAL, src);
        }
    }

to produce:

movl $1, %ebx
.L2:
movl $-1717986918, 8(%esp)
movl $1070176665, 12(%esp)
fldl -16(%ebp)
fstpl   (%esp)
call dset
addl $1, %ebx
cmpl $4, %ebx
jne  .L2
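
For readers wondering about the two immediates in the asm above: they are the
low and high 32-bit words of the DFmode constant, stored at 8(%esp) and
12(%esp). A small standalone check (not part of GCC; the helper name
double_from_halves is made up here, and it assumes a host where double is
IEEE-754 binary64 with the same byte order as 64-bit integers, as on x86)
shows how the two words combine:

```c
#include <stdint.h>
#include <string.h>

/* Rebuild a double from the two 32-bit immediates: LO is the word
   stored at the lower address, HI the word stored 4 bytes above it.
   Assumes IEEE-754 binary64 and matching integer/float byte order.  */
static double
double_from_halves (int32_t lo, int32_t hi)
{
  uint64_t bits = ((uint64_t) (uint32_t) hi << 32) | (uint32_t) lo;
  double d;

  memcpy (&d, &bits, sizeof d);  /* reinterpret the bit pattern */
  return d;
}
```

Calling double_from_halves (-1717986918, 1070176665) gives the bit pattern
0x3FC999999999999A, i.e. the double nearest to 0.2, so with the REG_EQUAL note
in place the constant is stored directly as two integer immediates instead of
being loaded from .LC0.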



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24265


