[Bug target/29377] Build for h8300-elf crashes on 64bit hosts due to int/HWI mismatch
--- Comment #3 from uros at kss-loka dot si 2006-10-28 09:43 ---
Fixed on 4.3 mainline

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |uros at kss-loka dot si
                   |dot org                     |
             Status|UNCONFIRMED                 |ASSIGNED
     Ever Confirmed|0                           |1
      Known to fail|                            |4.2.0
      Known to work|                            |4.3.0
   Last reconfirmed|0000-00-00 00:00:00         |2006-10-28 09:43:15
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29377
[Bug target/29377] Build for h8300-elf crashes on 64bit hosts due to int/HWI mismatch
--- Comment #5 from uros at kss-loka dot si 2006-10-28 10:04 --- Fixed for 4.1.2. -- uros at kss-loka dot si changed: What|Removed |Added Known to work|4.3.0 |4.1.2 4.3.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29377
[Bug fortran/24518] Intrinsic MOD incorrect for large arg1/arg2 and slow.
--- Comment #13 from uros at kss-loka dot si 2006-10-26 22:22 ---
Just some performance numbers (sorry for the C testcase...) on x86_64:

--cut here--
#include <math.h>
#include <stdio.h>

int main()
{
  double x;
  double t = 0.0;

  for (x = 1000.0; x > 0.0; x -= 1.0)
    t += fmod (x, 1.7e-8);

  printf("%f\n", t);
  return 0;
}
--cut here--

$ gcc -march=k8 -O2 -lm mod.c
$ time ./a.out
0.089927

real    0m4.304s
user    0m4.294s
sys     0m0.009s

$ gcc -march=k8 -O2 -lm -mfpmath=387 mod.c
$ time ./a.out
0.089927

real    0m0.351s
user    0m0.349s
sys     0m0.002s

I know that this measurement depends on the library implementation, but this is the current situation, where the above tests show that intrinsic MOD is 12.3 _times_ faster.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24518
[Bug fortran/24518] Intrinsic MOD incorrect for large arg1/arg2 and slow.
--- Comment #6 from uros at kss-loka dot si 2006-10-25 07:33 --- Revision 118024 clears the way for MOD and MODULO implementation: http://gcc.gnu.org/ml/gcc-cvs/2006-10/msg00703.html BTW: I don't know fortran requirements, but built-in functions produce faster code if errno is not needed. -mno-math-errno should be used in this case. -- uros at kss-loka dot si changed: What|Removed |Added CC||uros at kss-loka dot si http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24518
[Bug fortran/24518] Intrinsic MOD incorrect for large arg1/arg2 and slow.
--- Comment #8 from uros at kss-loka dot si 2006-10-25 11:48 ---
(In reply to comment #7)

> Just to be sure I understand: we are guaranteed that
> BUILT_IN_REMAINDER{F,,L} and BUILT_IN_FMOD{F,,L} are always available, right?

Yes. The expansion does not depend on -ffast-math anymore. However, the named pattern should be present in the .md files. Currently, i386 provides a named pattern for -mfpmath=387, but not for -mfpmath=sse. In the latter case, expansion will fall back to a normal library call.

> gfortran doesn't have a need for errno to be set after math functions are
> called. However, we do want to have correct results in all cases: Inf, NaN,
> subnormals, etc. From my reading of the manual, -fno-math-errno would imply
> that we do not get such correct results, am I right?

Fortunately, no. The result will be correct. You can see the effect of -fno-math-errno at http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01158.html. The fixup code detects NaN (as an abnormal return from the builtin function) and calls the library function in order to set the global variable errno. If errno is not needed (as I suspect is the case with fortran libraries), the fixup code is not needed either, so -fno-math-errno should be used.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24518
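The fixup scheme described in this comment can be sketched in plain C. This is only an illustration of the control flow (fast path, NaN check, fallback call that sets errno), not GCC's actual expansion: inline_fmod and library_fmod are hypothetical stand-ins written for this example, and the toy remainder only handles small finite inputs.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the inline (builtin) expansion: computes a
   truncating remainder for small finite inputs and never touches errno.  */
static double inline_fmod(double x, double y)
{
    volatile double zero = 0.0;
    if (y == 0.0)
        return zero / zero;            /* NaN, errno untouched */
    return x - (double) (long long) (x / y) * y;
}

/* Hypothetical stand-in for the library call, which also reports the error.  */
static double library_fmod(double x, double y)
{
    double r = inline_fmod(x, y);
    if (r != r)
        errno = EDOM;
    return r;
}

/* Fast path plus fixup: only an abnormal (NaN) result pays for the call.  */
static double fmod_with_errno(double x, double y)
{
    double r = inline_fmod(x, y);
    if (r != r)                        /* NaN is the only value != itself */
        r = library_fmod(x, y);
    return r;
}

/* Small self-check of the sketch.  */
void check_fmod_fixup(void)
{
    errno = 0;
    assert(fmod_with_errno(5.5, 2.0) == 1.5);
    assert(errno == 0);                /* normal result: no library call */
    double r = fmod_with_errno(1.0, 0.0);
    assert(r != r);
    assert(errno == EDOM);             /* NaN result: fixup set errno */
}
```

With -fno-math-errno the NaN check and the fallback call would simply be omitted, which is why the result stays the same and only the errno side effect is lost.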
[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse
--- Comment #6 from uros at kss-loka dot si 2006-10-25 12:04 ---
(In reply to comment #5)

> With more registers (x86_64) the stack moves are gone, but: (!)
> (testing done on AMD Athlon fam 15 model 35 stepping 2)

On Xeon 3.6, SSE is now faster:

gcc -O2 -march=pentium4 -mfpmath=387 pr19780.c
time ./a.out
Start?
Stop!
Result = 0.00, 0.00, 1.00

real    0m0.805s
user    0m0.804s
sys     0m0.000s

gcc -O2 -march=pentium4 -mfpmath=sse pr19780.c
time ./a.out
Start?
Stop!
Result = 0.00, 0.00, 1.00

real    0m0.707s
user    0m0.704s
sys     0m0.004s

vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 3.60GHz
stepping        : 10
cpu MHz         : 3600.970
cache size      : 2048 KB

The question is now, why is Athlon so slow with SFmode SSE?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19780
[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse
--- Comment #7 from uros at kss-loka dot si 2006-10-25 12:18 ---
(In reply to comment #6)
On Xeon 3.6, SSE is now faster:

... but for -ffast-math:

SSE: user 0m0.756s
x87: user 0m0.612s

Yes, x87 is faster for -ffast-math by some 20%.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19780
[Bug fortran/24518] Intrinsic MOD incorrect for large arg1/arg2 and slow.
--- Comment #10 from uros at kss-loka dot si 2006-10-25 14:16 ---
(In reply to comment #9)

> > In the latter case, expansion will fall back to a normal library call.
>
> OK. So on a system where the math library doesn't have remainderl, for
> example, we shouldn't use BUILT_IN_REMAINDERL or it will be missing at
> link-time? If that's the case, then we can't implement MOD/MODULO with
> these built-ins.

You can check for TARGET_C99_FUNCTIONS before they are used.

> > Fortunately, no. The result will be correct. You can see the effect of
> > -fno-math-errno at
> > http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01158.html.
>
> And now, a harder question: could we activate no-math-errno on a per-call
> basis? That is, have the front-end emit a call to BUILT_IN_FOO and specify
> that, for this call, errno doesn't have to be set?

errno expansion for this particular built-in is inhibited in line 1995 of builtins.c. To do this on a per-call basis, we would need an argument to the expand_builtin() function that disables errno expansion. However, the rationale for this is unclear to me. IMO - either we use errno, or we don't.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24518
[Bug target/27440] [4.0/4.1/4.2 regression] code quality regression due to ivopts
--- Comment #7 from uros at kss-loka dot si 2006-10-10 14:48 ---
(In reply to comment #6)

> Confirmed (as in comment #1). With -Os instead of -O2 we even produce
>
> .L3:
>         movl    %ebx, -4(%edx)

The -4(...) part comes from PR 24669.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27440
[Bug target/28924] x86 sync builtins fail for char and short memory operands
--- Comment #8 from uros at kss-loka dot si 2006-10-07 06:12 ---
The testcase was committed to trunk and the 4.1 branch, and now passes everywhere.

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
      Known to fail|4.1.0 4.2.0                 |4.1.0
      Known to work|                            |4.2.0 4.1.2
         Resolution|                            |FIXED
   Target Milestone|---                         |4.2.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28924
[Bug target/29377] New: Build for h8300-elf crashes on 64bit hosts due to int/HWI mismatch
Build for h8300-elf target crashes on 64bit hosts with: ../../gcc-svn/trunk/gcc/libgcc2.c: In function '__muldi3': ../../gcc-svn/trunk/gcc/libgcc2.c:542: error: unrecognizable insn: (insn 234 233 235 2 ../../gcc-svn/trunk/gcc/libgcc2.c:533 (set (reg:HI 3 r3) (const_int 4294967214 [0xffae])) -1 (nil) (nil)) ../../gcc-svn/trunk/gcc/libgcc2.c:542: internal compiler error: in extract_insn, at recog.c:2077 -- Summary: Build for h8300-elf crashes on 64bit hosts due to int/HWI mismatch Product: gcc Version: 4.2.0 Status: UNCONFIRMED Keywords: build Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: uros at kss-loka dot si GCC build triplet: x86_64-pc-linux-gnu GCC host triplet: x86_64-pc-linux-gnu GCC target triplet: h8300-elf http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29377
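The rejected insn carries (const_int 4294967214) for an HImode register. That number is 0xffffffae, whose low 16 bits (0xffae) read as -82 when treated as a signed 16-bit value. The sketch below shows, under stated assumptions, how such a value can appear when a 32-bit quantity is widened without sign extension on a host whose wide integer has 64 bits, and how sign-extending from the mode's width gives the expected constant. The type hwi and both helper names are made up for this illustration; they are not GCC's own code, and the report does not confirm that this is the exact path taken.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a "host wide int" on a 64-bit host.  */
typedef int64_t hwi;

/* Widening through a 32-bit unsigned type loses the sign on a 64-bit hwi.  */
static hwi widen_via_unsigned(int32_t v)
{
    return (hwi) (uint32_t) v;
}

/* Sign-extend the low BITS bits of VALUE (1 <= BITS <= 64) into a full hwi.  */
static hwi sign_extend_low_bits(uint64_t value, unsigned bits)
{
    if (bits >= 64)
        return (hwi) value;
    uint64_t mask = (UINT64_C(1) << bits) - 1;
    uint64_t sign = UINT64_C(1) << (bits - 1);
    uint64_t low = value & mask;
    /* (low ^ sign) - sign maps the range [sign, mask] onto negative values.  */
    return (hwi) (low ^ sign) - (hwi) sign;
}

/* Small self-check of the sketch.  */
void check_hwi_examples(void)
{
    /* -82 widened the wrong way gives the constant from the error message.  */
    assert(widen_via_unsigned(-82) == INT64_C(4294967214));
    /* Sign-extending from 16 bits (HImode) recovers -82.  */
    assert(sign_extend_low_bits(UINT64_C(4294967214), 16) == -82);
}
```

On a host where the wide integer is only 32 bits, the first conversion would still yield -82, which would be consistent with the crash showing up only on 64-bit hosts.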
[Bug target/29377] Build for h8300-elf crashes on 64bit hosts due to int/HWI mismatch
--- Comment #1 from uros at kss-loka dot si 2006-10-07 07:51 ---
Proposed patch at http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00337.html

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://gcc.gnu.org/ml/gcc-
                   |                            |patches/2006-
                   |                            |10/msg00337.html
           Keywords|                            |patch

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29377
[Bug target/28924] x86 sync builtins fail for char and short memory operands
--- Comment #4 from uros at kss-loka dot si 2006-10-06 08:27 --- Please note, that in addition to http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00250.html, http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00244.html is also needed. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28924
[Bug target/29337] -mfpmath=387 doesn't use fistp for double-to-integer conversion
--- Comment #8 from uros at kss-loka dot si 2006-10-05 07:08 ---

> try -O2 -msse2, you get:
>
> _Z8todoubledd:
>         subl    $12, %esp
>         fldl    24(%esp)
>         faddl   16(%esp)
>         fstpl   (%esp)
>         movsd   (%esp), %xmm0
>         addl    $12, %esp
>         cvttsd2si       %xmm0, %eax
>         ret
>
> Though I think the movsd should not be there but that is a different issue.

This is PR 19398. I have a patch that adds a bunch of peephole2 patterns to address this particular issue. The patch is already approved and waits for stage1.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29337
[Bug target/29347] i386 mode switching clobbers fp exception handling bits
--- Comment #2 from uros at kss-loka dot si 2006-10-05 07:51 ---
(In reply to comment #0)

> The mode switching for floating point rounding that the i386 backend does
> does not actually place mode switches, but rather the calculation of values
> used for mode switches. Not only does that defeat the purpose of doing lazy
> code motion of the mode switches themselves (this problem could easily be
> remedied by handling the actual mode switches as a separate entity), it also
> leads to information in the floating point control register being clobbered
> if the user changes it (e.g. with feenableexcept:
> http://www.gnu.org/software/libc/manual/html_node/Control-Functions.html)
> between the calculation of the value used for a mode switch, and the point
> where a mode switch actually takes place.

Please note that the gcc i386 description is missing an FP control register definition, so the x86_fnstcw_1 and x86_fldcw_1 patterns are totally wrong - they handle the control register, not the status register. Once that is defined, we can add the correct clobber to the x87 FP-to-int instructions.

Regarding the calculation of mode-switching values: please note that x87 arithmetic instructions depend on the control word. Currently, this is solved by setting and restoring the control word just before/after the fist instruction; otherwise (use (reg:HI FPCW_REG)) has to be added to all affected instructions. I think that it has to be added anyway, if fesetround() is to be used.

Some time ago, I had a patch that added FPCW_REG to i386.h. I'll see if I can still find it.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29347
[Bug target/29337] -mfpmath=387 doesn't use fistp for double-to-integer conversion
--- Comment #3 from uros at kss-loka dot si 2006-10-04 06:46 ---

> I'm afraid you're missing my point. The problem is that for 64-bit and
> 32-bit floating-point to integer conversion, x86 (32bit) target uses fistp*
> whereas x86_64 (64-bit) target uses cvt* WHEN -mfpmath=387. This defeats
> the purpose of the option -mfpmath=387 which is supposed to make
> floating-point computations to use 387, instead of SSE2.

If SSE is available, then SSE cvt* is used in order to avoid long control-word setting sequences. This is cheaper even if we have to move the value from an x87 register, as cvt* can handle mem-reg transformations.

If you really need the fistp* sequence, you can try -mno-sse2 (you can't just disable sse on an x86_64 target) or perhaps use -msse3, where the fisttp insn will be generated.

That said, I wonder where excess precision effects come into play here. We are talking about a truncate-to-integer instruction, so I would really like to see an example of this effect.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29337
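For reference, the operation under discussion is the ordinary C conversion from double to int, which must truncate toward zero. The x87 fistp instruction rounds according to the current control word (round-to-nearest by default), which is why a control-word switch is needed around it, whereas cvttsd2si and fisttp always truncate. A minimal sketch of the C semantics; sum_to_int is a name made up here, loosely modelled on the add-and-convert shape of the assembly quoted for this PR:

```c
#include <assert.h>

/* Add two doubles and convert to int.  C requires truncation toward zero,
   independent of the current floating-point rounding mode.  */
static int sum_to_int(double a, double b)
{
    return (int) (a + b);
}

/* Small self-check of the sketch.  */
void check_sum_to_int(void)
{
    assert(sum_to_int(1.5, 1.25) == 2);     /* 2.75 truncates to 2 */
    assert(sum_to_int(-1.5, -1.25) == -2);  /* -2.75 truncates to -2, not -3 */
}
```

Compiling such a function with the option combinations mentioned above (-mfpmath=387, -mno-sse2, -msse3) and reading the generated assembly is one way to see which conversion instruction is selected.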
[Bug target/29300] FAIL: gcc.dg/pthread-init-[12].c (test for excess errors)
--- Comment #1 from uros at kss-loka dot si 2006-10-03 07:04 ---
Similar problems were recently fixed for solaris and glibc-2.3.5. It looks like hpux needs a fixinclude hack that would cure these errors/warnings, something like:

http://gcc.gnu.org/ml/gcc-patches/2006-09/msg01317.html
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg9.html

and perhaps

http://gcc.gnu.org/ml/gcc-patches/2006-10/msg9.html

Confirmed, as gcc.dg/pthread-* tests were introduced in order to catch problems like the one described in the bug description.

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2006-10-03 07:04:17
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29300
[Bug target/29169] sse3-not-fisttp.c scan-assembler-not fisttp FAILs on i386-pc-solaris2.10
-- uros at kss-loka dot si changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |uros at kss-loka dot si |dot org | URL||http://gcc.gnu.org/ml/gcc- ||patches/2006- ||09/msg01012.html Status|NEW |ASSIGNED Keywords||patch Last reconfirmed|2006-09-21 17:52:55 |2006-09-23 13:36:43 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29169
[Bug target/29169] sse3-not-fisttp.c scan-assembler-not fisttp FAILs on i386-pc-solaris2.10
--- Comment #4 from uros at kss-loka dot si 2006-09-23 14:41 --- Fixed. -- uros at kss-loka dot si changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED Target Milestone|--- |4.2.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29169
[Bug target/28946] assembler shifts set the flag ZF, no need to re-test to zero
--- Comment #14 from uros at kss-loka dot si 2006-09-19 11:31 --- Fixed everywhere. -- uros at kss-loka dot si changed: What|Removed |Added Status|ASSIGNED|RESOLVED Known to fail|4.0.0 3.0.4 3.2.3 3.3.3 |3.0.4 3.2.3 3.3.3 Known to work|2.95.3 4.2.0 4.1.2 |2.95.3 4.2.0 4.1.2 4.0.4 Resolution||FIXED Summary|[4.0 Only] assembler shifts |assembler shifts set the |set the flag ZF, no need to |flag ZF, no need to re-test |re-test to zero |to zero http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946
[Bug target/26968] [4.1 Regression] HDF5 1.7.52 test segfaults with 4.1.0, fine with 4.0.2 (regression)
--- Comment #9 from uros at kss-loka dot si 2006-09-07 06:58 ---
I have built and run the testsuite of the HDF5 library on i686-pc-linux-gnu with:

gcc version 4.2.0 20060906 (experimental)

hdf5-1.6.5 (production):
(CFLAGS=-fno-strict-aliasing is needed before configure)
All tests PASS with default compile flags out of the box.

hdf5-1.8.0-alpha4:
All tests PASS with default compile flags out of the box.

I guess this bug report can be considered a 4.1 regression only.

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |target
 GCC target triplet|                            |i386-pc-linux-gnu
      Known to work|                            |4.2.0
            Summary|[4.1/4.2 Regression] HDF5   |[4.1 Regression] HDF5 1.7.52
                   |1.7.52 test segfaults with  |test segfaults with 4.1.0,
                   |4.1.0, fine with 4.0.2      |fine with 4.0.2 (regression)
                   |(regression)                |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26968
[Bug target/28924] x86 sync builtins fail for char and short memory operands
--- Comment #3 from uros at kss-loka dot si 2006-09-08 05:47 ---
I have been playing with the following patch to optabs.c that forces operands in functions expand_sync_operation(), expand_sync_fetch_operation() and expand_sync_lock_test_and_set() into registers through subregs of word-mode temp registers. The testcase in the description is then expanded as:

;; __sync_fetch_and_add_1 (&s, 255) [tail call]

(insn 10 8 11 (set (reg:SI 58)
        (const_int 255 [0xff])) -1 (nil)
    (nil))

(insn 11 10 0 (parallel [
            (set (mem/v:QI (symbol_ref:SI ("s") <var_decl 0x402410b0 s>) [-1 S1 A8])
                (unspec_volatile:QI [
                        (plus:QI (mem/v:QI (symbol_ref:SI ("s") <var_decl 0x402410b0 s>) [-1 S1 A8])
                            (subreg:QI (reg:SI 58) 0))
                    ] 13))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

and RTL optimizers are able to optimize this back into:

(insn:HI 11 8 12 2 (parallel [
            (set (mem/v:QI (symbol_ref:SI ("s") <var_decl 0x402410b0 s>) [-1 S1 A8])
                (unspec_volatile:QI [
                        (plus:QI (mem/v:QI (symbol_ref:SI ("s") <var_decl 0x402410b0 s>) [-1 S1 A8])
                            (const_int -1 [0x]))
                    ] 13))
            (clobber (reg:CC 17 flags))
        ]) 924 {sync_addqi} (insn_list:REG_DEP_TRUE 10 (nil))
    (nil))

This results in the expected asm code:

tests:
        lock
        addb    $-1, s
        ret

However, the patch does not cover all backup code paths in the sync_* expanders, so in some cases an integer argument can still be forced into a register in the wrong way.

--cut here--
Index: optabs.c
===================================================================
--- optabs.c    (revision 116739)
+++ optabs.c    (working copy)
@@ -6023,7 +6023,7 @@
   if (GET_MODE (val) != VOIDmode && GET_MODE (val) != mode)
     val = convert_modes (mode, GET_MODE (val), val, 1);
   if (!insn_data[icode].operand[1].predicate (val, mode))
-    val = force_reg (mode, val);
+    val = gen_lowpart (mode, copy_to_mode_reg (word_mode, val));
 
   insn = GEN_FCN (icode) (mem, val);
   if (insn)
@@ -6156,7 +6156,7 @@
   if (GET_MODE (val) != VOIDmode && GET_MODE (val) != mode)
     val = convert_modes (mode, GET_MODE (val), val, 1);
   if (!insn_data[icode].operand[2].predicate (val, mode))
-    val = force_reg (mode, val);
+    val = gen_lowpart (mode, copy_to_mode_reg (word_mode, val));
 
   insn = GEN_FCN (icode) (target, mem, val);
   if (insn)
@@ -6243,7 +6243,7 @@
   if (GET_MODE (val) != VOIDmode && GET_MODE (val) != mode)
     val = convert_modes (mode, GET_MODE (val), val, 1);
   if (!insn_data[icode].operand[2].predicate (val, mode))
-    val = force_reg (mode, val);
+    val = gen_lowpart (mode, copy_to_mode_reg (word_mode, val));
 
   insn = GEN_FCN (icode) (target, mem, val);
   if (insn)
--cut here--

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28924
[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero
--- Comment #9 from uros at kss-loka dot si 2006-09-06 11:33 --- Patch at http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00162.html implements missing i386.md RTL patterns. This is i386 target-specific fix for this bug. The patch was bootstrapped on i686-pc-linux-gnu and x86_64-pc-linux-gnu, regtested for c,c++ and fortran. -- uros at kss-loka dot si changed: What|Removed |Added URL|http://gcc.gnu.org/ml/gcc- |http://gcc.gnu.org/ml/gcc- |patches/2006- |patches/2006- |09/msg00137.html|09/msg00162.html http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946
[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero
--- Comment #4 from uros at kss-loka dot si 2006-09-05 06:20 ---
(In reply to comment #2)

> It is entirely coincident. For some processors, it is an optimization to
> avoid partial flag register stall. When it is fixed, it should be reenabled
> with a new flag, something like TARGET_PARTIAL_FLAG_REG_STALL.

There is a TARGET_USE_INCDEC flag that already implements your suggestion. From predicates.md:

  /* On Pentium4, the inc and dec operations causes extra dependency on flag
     registers, since carry flag is not set.  */
  if (!TARGET_USE_INCDEC && !optimize_size)

If used elsewhere, this flag should perhaps be renamed to the proposed TARGET_PARTIAL_FLAG_REG_STALL.

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at kss-loka dot si

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946
[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero
--- Comment #5 from uros at kss-loka dot si 2006-09-05 09:35 ---
The problem here is the following:

We already have the patterns that would satisfy the combined instruction (*lshrsi3_cmp) in the above testcase. However, the combiner rejects the combined instruction because the register that holds the shifted result is unused!

The problematic part is in combine.c, around line 2236 (please read the comment, which describes exactly the situation we have here). This part of the code is activated only when the register that holds the result of the arithmetic operation is kept alive. This is quite strange - even if the result is unused, the resulting code will still be smaller, as we avoid the extra CC setting instruction.

The patch below (currently under testing, but so far OK) forces generation of the combined instruction even if the arithmetic result is unused.

Index: combine.c
===================================================================
--- combine.c   (revision 116691)
+++ combine.c   (working copy)
@@ -2244,7 +2244,7 @@
      needed, and make the PARALLEL by just replacing I2DEST in I3SRC with
      I2SRC.  Later we will make the PARALLEL that contains I2.  */
 
-  if (i1 == 0 && added_sets_2 && GET_CODE (PATTERN (i3)) == SET
+  if (i1 == 0 && GET_CODE (PATTERN (i3)) == SET
       && GET_CODE (SET_SRC (PATTERN (i3))) == COMPARE
       && XEXP (SET_SRC (PATTERN (i3)), 1) == const0_rtx
       && rtx_equal_p (XEXP (SET_SRC (PATTERN (i3)), 0), i2dest))
@@ -2254,6 +2254,13 @@
       enum machine_mode compare_mode;
 #endif
 
+      /* To force generation of the combined comparison and arithmetic
+        operation PARALLEL, pretend that the set in I2 is to be used,
+        even if it is dead after I2.  This results in better generated
+        code, as only CC setting arithmetic instruction will be
+        emitted in conditionals.  */
+      added_sets_2 = 1;
+
       newpat = PATTERN (i3);
       SUBST (XEXP (SET_SRC (newpat), 0), i2src);
 

Compiling the testcase with this patch results in the following code:

fct:
        movl    4(%esp), %eax
        shrl    $5, %eax
        je      .L2
        jmp     fct1
        .p2align 4,,7
.L2:
        jmp     fct2

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946
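The testcase itself is not quoted in this comment. Judging only from the assembly shown for fct (a right shift by 5 followed directly by a conditional jump), it plausibly looks something like the sketch below. The exact source, the parameter type, and the stub bodies of fct1 and fct2 are assumptions made here so that the example is self-contained.

```c
#include <assert.h>

/* Stub callees; in the PR testcase these are presumably external.  */
static int fct1(void) { return 1; }
static int fct2(void) { return 2; }

/* On x86 the shift already sets ZF, so the branch can use its flags
   directly and no separate test instruction is needed.  */
static int fct(unsigned int x)
{
    if (x >> 5)
        return fct1();
    else
        return fct2();
}

/* Small self-check of the sketch.  */
void check_fct(void)
{
    assert(fct(31) == 2);   /* 31 >> 5 == 0: takes the fct2 branch */
    assert(fct(32) == 1);   /* 32 >> 5 == 1: takes the fct1 branch */
}
```

Note that the shifted value is used only for the comparison, which is exactly the "result is unused" case the combiner was rejecting.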
[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero
--- Comment #6 from uros at kss-loka dot si 2006-09-05 11:45 --- Patch at http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00137.html BTW: This patch eliminates 869 test instructions in povray-3.6.1 compile. (And my test raytraced pictures are still correct.) -- uros at kss-loka dot si changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |uros at kss-loka dot si |dot org | URL||http://gcc.gnu.org/ml/gcc- ||patches/2006- ||09/msg00137.html Status|NEW |ASSIGNED Keywords||patch Last reconfirmed|2006-09-04 16:50:06 |2006-09-05 11:45:14 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946
[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero
--- Comment #7 from uros at kss-loka dot si 2006-09-05 13:43 ---
Hm, the proposed patch now generates worse code for the following test:

extern int fnc1(void);
extern int fnc2(void);

int test(int x)
{
  if (x & 0x02)
    return fnc1();
  else if (x & 0x01)
    return fnc2();
  else
    return 0;
}

It generates:

test:
        movl    4(%esp), %edx
        movl    %edx, %eax
        andl    $2, %eax
        jne     .L10
        andl    $1, %edx
        jne     .L11
        xorl    %eax, %eax
        ret
        .p2align 4,,7
.L11:
        .p2align 4,,8
        jmp     fnc2
        .p2align 4,,7
.L10:
        .p2align 4,,7
        jmp     fnc1

because %eax is marked live in the first comparison: "and" is used instead of "test", and a regmove is emitted before the comparison. Ideally gcc should generate:

test:
        movl    4(%esp), %eax
        testl   $2, %eax
        jne     .L6
        andl    $1, %eax
        jne     .L7
        xorl    %eax, %eax
        ret
        .p2align 2,,3
.L7:
        jmp     fnc2
        .p2align 2,,3
.L6:
        jmp     fnc1

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946
[Bug libgomp/28926] FAIL: libgomp.c/ordered-1.c execution test
--- Comment #1 from uros at kss-loka dot si 2006-09-04 05:49 ---
The problem is that RH8.0 defines SYS_gettid and SYS_futex in its headers although the futex syscall is not really supported in the kernel. The build process detects this and issues a warning to configure with --disable-linux-futex, but still defaults to using the futex syscall.

Perhaps the futex support detection logic in libgomp/configure.ac (around line 200) should be reversed, so that it defaults to not using futexes and enables them only if all tests pass.

Anyway, --disable-linux-futex works for me.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28926
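The point that a header definition of SYS_futex says nothing about the running kernel can be illustrated with a runtime probe. This is a minimal sketch, assuming a Linux system that provides <linux/futex.h>; futex_usable is a name made up for this illustration, and this is not the check that libgomp's configure script performs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 0 if the kernel rejects the futex syscall with ENOSYS,
   1 otherwise.  FUTEX_WAKE with a count of 0 wakes nobody, so the
   probe has no side effects on a kernel that supports futexes.  */
static int futex_usable(void)
{
    int word = 0;
    long rc = syscall(SYS_futex, &word, FUTEX_WAKE, 0, NULL, NULL, 0);
    if (rc == -1 && errno == ENOSYS)
        return 0;
    return 1;
}

/* Small self-check of the sketch.  */
void check_futex_probe(void)
{
    int usable = futex_usable();
    assert(usable == 0 || usable == 1);
}
```

A "default to off, enable only when the probe succeeds" policy is the reversal suggested above; whether the probe belongs at configure time or at run time is a separate design question.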
[Bug target/28909] Missed optimization with x86 sync builtins
--- Comment #2 from uros at kss-loka dot si 2006-09-01 10:18 --- Patch at http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00010.html -- uros at kss-loka dot si changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |uros at kss-loka dot si |dot org | URL||http://gcc.gnu.org/ml/gcc- ||patches/2006- ||09/msg00010.html Status|NEW |ASSIGNED Last reconfirmed|2006-08-31 03:13:03 |2006-09-01 10:18:03 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28909
[Bug target/28924] New: x86 sync builtins fail for char and short memory operands
The following testcases ICE with current mainline:

--cut here--
char c;

void testc(void)
{
  (void) __sync_fetch_and_add(&c, -1);
}

short s;

void tests(void)
{
  (void) __sync_fetch_and_add(&s, -1);
}
--cut here--

inc.c: In function 'tests':
inc.c:13: error: unrecognizable insn:
(insn 10 8 11 3 (set (reg:HI 58)
        (const_int 65535 [0xffff])) -1 (nil)
    (nil))
inc.c:13: internal compiler error: in extract_insn, at recog.c:2077
Please submit a full bug report,

and:

inc.c: In function 'testc':
inc.c:6: error: unrecognizable insn:
(insn 7 5 8 3 (set (reg:QI 58)
        (const_int 255 [0xff])) -1 (nil)
    (nil))
inc.c:6: internal compiler error: in extract_insn, at recog.c:2077
Please submit a full bug report,

The ICE happens at all optimization levels, and also for unsigned c and s variables. I have checked the __sync_fetch_and_add() and __sync_fetch_and_sub() builtins, but due to the nature of the error all other sync_* builtins may be affected.

--
           Summary: x86 sync builtins fail for char and short memory operands
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: uros at kss-loka dot si
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28924
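For context, this is how the builtins are expected to behave once they compile: __sync_fetch_and_add returns the old value and updates memory atomically, and adding -1 to a narrow object must leave -1 in it, not 255 or 65535 as in the constants shown in the ICE. A small sketch that uses only the GCC __sync builtin named in the report; the helper names are made up here, and signed char is used so that the check does not depend on the signedness of plain char.

```c
#include <assert.h>

static signed char c;
static short s;

/* Atomically add -1; each helper returns the value before the update.  */
static signed char dec_char(void)  { return __sync_fetch_and_add(&c, -1); }
static short       dec_short(void) { return __sync_fetch_and_add(&s, -1); }

/* Small self-check of the sketch.  */
void check_sync_narrow(void)
{
    c = 0;
    s = 0;
    assert(dec_char() == 0);    /* old value is returned */
    assert(c == -1);            /* memory holds -1, i.e. 0xff as a byte */
    assert(dec_short() == 0);
    assert(s == -1);            /* memory holds -1, i.e. 0xffff */
}
```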
[Bug libgomp/28926] New: FAIL: libgomp.c/ordered-1.c execution test
libgomp.c/ordered-1.c and libgomp.c/ordered-3.c currently time out on my system (RedHat 8.0 with 2.4.18-14, i686) due to the unimplemented FUTEX syscall. strace of the produced binary shows endless "Function not implemented" lines. This is the beginning:

rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
write(4, [EMAIL PROTECTED]@[EMAIL PROTECTED]@\340\370\377\277\0\0\0..., 148) = 148
rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
rt_sigsuspend([] <unfinished ...>
--- SIGRTMIN (Real-time signal 0) ---
<... rt_sigsuspend resumed> ) = -1 EINTR (Interrupted system call)
sigreturn() = ? (mask now [RTMIN])
futex(0x40019458, FUTEX_WAIT, 0, NULL) = -1 ENOSYS (Function not implemented)
futex(0x40019458, FUTEX_WAIT, 0, NULL) = -1 ENOSYS (Function not implemented)
futex(0x40019458, FUTEX_WAIT, 0, NULL) = -1 ENOSYS (Function not implemented)
futex(0x40019458, FUTEX_WAIT, 0, NULL) = -1 ENOSYS (Function not implemented)
futex(0x40019458, FUTEX_WAIT, 0, NULL) = -1 ENOSYS (Function not implemented)
futex(0x40019458, FUTEX_WAIT, 0, NULL) = -1 ENOSYS (Function not implemented)
...

Breaking execution in the middle produces the following backtrace:

Program received signal SIGINT, Interrupt.
[Switching to Thread 8192 (LWP 5941)]
0x40017c83 in gomp_sem_wait_slow (sem=0x804b09c)
    at ../../../gcc-svn/trunk/libgomp/config/linux/x86/futex.h:73
in ../../../gcc-svn/trunk/libgomp/config/linux/x86/futex.h
(gdb) bt
#0  0x40017c83 in gomp_sem_wait_slow (sem=0x804b09c)
    at ../../../gcc-svn/trunk/libgomp/config/linux/x86/futex.h:73
#1  0x400167ce in gomp_ordered_sync ()
    at ../../../gcc-svn/trunk/libgomp/config/linux/sem.h:46
#2  0x40016412 in gomp_loop_ordered_static_next (istart=0xb8e8, iend=0xb8e4)
    at ../../../gcc-svn/trunk/libgomp/loop.c:307
#3  0x08048b45 in f_static_1 (dummy=0x0) at ordered-1.c:72

--
           Summary: FAIL: libgomp.c/ordered-1.c execution test
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: uros at kss-loka dot si
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28926
[Bug tree-optimization/28915] [4.2 regression] ICE: tree check: expected class 'constant', have 'declaration' (var_decl) in build_vector, at tree.c:973
--- Comment #3 from uros at kss-loka dot si 2006-08-31 19:15 --- Confirmed on x86_64. Backtrace: (gdb) bt #0 build_vector (type=0x2db3e6e0, vals=0x2db37cc0) at ../../gcc-svn/trunk/gcc/tree.c:973 #1 0x007b829d in force_const_mem (mode=V2DImode, x=0x2da089e0) at ../../gcc-svn/trunk/gcc/varasm.c:3229 #2 0x005d496a in emit_move_insn (x=0x2db309a0, y=0x2da089e0) at ../../gcc-svn/trunk/gcc/expr.c:3288 #3 0x006b2ec6 in gen_vec_initv2di (operand0=0x2db309a0, operand1=0x2da089d0) at ../../gcc-svn/trunk/gcc/config/i386/sse.md:3678 #4 0x005c9e37 in store_constructor (exp=0x2db37900, target=0x2db309a0, cleared=0, size=16) at ../../gcc-svn/trunk/gcc/expr.c:5431 #5 0x005ce327 in expand_expr_real_1 (exp=0x2db37900, target=0x2db309a0, tmode=V2DImode, modifier=EXPAND_NORMAL, alt_rtl=0x7fcf5800) at ../../gcc-svn/trunk/gcc/expr.c:7142 #6 0x005d40cf in expand_expr_real (exp=0x2db37900, target=0x2db309a0, tmode=V2DImode, modifier=EXPAND_NORMAL, alt_rtl=0x7fcf5800) at ../../gcc-svn/trunk/gcc/expr.c:6706 #7 0x005c7264 in store_expr (exp=0x2db37900, target=0x2db309a0, call_param_p=0) at ../../gcc-svn/trunk/gcc/expr.c:4370 #8 0x005c8397 in expand_assignment (to=0x2db3e0b0, from=0x2db37900) at ../../gcc-svn/trunk/gcc/expr.c:4249 #9 0x005cc403 in expand_expr_real_1 (exp=0x2db3c140, target=0x0, tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0) at ../../gcc-svn/trunk/gcc/expr.c:8603 #10 0x005d40cf in expand_expr_real (exp=0x2db3c140, target=0x2d956400, tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0) at ../../gcc-svn/trunk/gcc/expr.c:6706 At the point of ICE, value dumps to: var_decl 0x2db3ea50 D.1935 type vector_type 0x2db3e6e0 type integer_type 0x2d961630 long int public DI size integer_cst 0x2d951db0 constant invariant 64 unit size integer_cst 0x2d951de0 constant invariant 8 align 64 symtab 0 alias set -1 precision 64 min integer_cst 0x2d951d20 -9223372036854775808 max integer_cst 0x2d951d50 9223372036854775807 pointer_to_this pointer_type 0x2d974a50 V2DI size integer_cst 0x2d96c0f0 
constant invariant 128 unit size integer_cst 0x2d96c120 constant invariant 16 align 128 symtab 0 alias set -1 nunits 2 V2DI file xskat-xdial.c line 16 size integer_cst 0x2d96c0f0 128 unit size integer_cst 0x2d96c120 16 align 128 (const:DI (plus:DI (symbol_ref:DI (lanip) [flags 0x40] var_decl 0x2db1cbb0 lanip) (const_int 40 [0x28]))) -- uros at kss-loka dot si changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2006-08-31 19:15:44 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28915
[Bug rtl-optimization/21676] [4.0/4.1/4.2 Regression] Optimizer regression: SciMark sparse matrix benchmark
--- Comment #10 from uros at kss-loka dot si 2006-08-29 06:12 ---
(In reply to comment #9)

> Fixed on the mainline by:
> http://gcc.gnu.org/ml/gcc-patches/2006-08/msg01036.html

Not really, the above patch fixed only one of three problems. The other two remain, that is:

- ivopts problem (see comment #6)
- -march=pentium4 (see comment #8)

I'll try to see which option causes the problems described in comment #8.

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.0/4.1 Regression]        |[4.0/4.1/4.2 Regression]
                   |Optimizer regression:       |Optimizer regression:
                   |SciMark sparse matrix       |SciMark sparse matrix
                   |benchmark                   |benchmark

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21676
[Bug rtl-optimization/21676] [4.0/4.1/4.2 Regression] Optimizer regression: SciMark sparse matrix benchmark
--- Comment #7 from uros at kss-loka dot si 2006-08-17 07:21 ---
(In reply to comment #6)

> I think that remaining time difference is due to strange loop above
> innermost:

... due to strange _header_ above innermost loop ...

The problem is that we load zero in both arms of the if. This is what I get in .099t.optimized (using gcc-4.2 -O2 -fno-ivopts):

<L1>:;
  r.0 = (unsigned int) r;
  D.1556 = r.0 * 4;
  rowR = *((int *) D.1556 + row);
  rowRp1 = *((int *) D.1556 + row + 4B);
  if (rowR < rowRp1) goto <L41>; else goto <L42>;

<L42>:;
  sum = 0.0;
  goto <bb 5> (<L4>);

<L41>:;
  i = rowR;
  sum = 0.0;

The assignment to sum should be moved before the if...

SSE is somehow able to CSE the zero load during RTL:

.L8:
        movl    20(%ebp), %edx
        movapd  %xmm2, %xmm1
        movl    (%edx,%ebx,4), %eax
        movl    4(%edx,%ebx,4), %ecx
        cmpl    %ecx, %eax
        jge     .L11
        movl    %eax, %edx
        .p2align 4,,7
.L12:

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21676
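For readers without the benchmark source at hand, the loop nest behind this dump is a compressed-row sparse matrix-vector product. The sketch below is a reconstruction from the variable names visible in the dump (row, rowR, rowRp1, sum, i); the function name, the other array names (val, col, x, y) and the exact signature are assumptions, not the benchmark's verbatim code. It shows the shape the comment asks for: sum is zeroed once, before the rowR < rowRp1 test.

```c
#include <assert.h>

/* y = A*x for an M-row matrix A in compressed-row form (assumed layout):
   the nonzeros of row r are val[row[r]] .. val[row[r+1]-1], with their
   column indices in col[].  */
static void sparse_matmult(int M, double *y, const double *val,
                           const int *row, const int *col, const double *x)
{
    for (int r = 0; r < M; r++) {
        double sum = 0.0;       /* single zero load, ahead of the loop test */
        int rowR = row[r];
        int rowRp1 = row[r + 1];
        for (int i = rowR; i < rowRp1; i++)
            sum += x[col[i]] * val[i];
        y[r] = sum;
    }
}

/* Small self-check: [[1 2],[0 3]] times (1, 1) is (3, 3).  */
void check_sparse_matmult(void)
{
    const double val[] = { 1.0, 2.0, 3.0 };
    const int col[] = { 0, 1, 1 };
    const int row[] = { 0, 2, 3 };
    const double x[] = { 1.0, 1.0 };
    double y[2];

    sparse_matmult(2, y, val, row, col, x);
    assert(y[0] == 3.0 && y[1] == 3.0);
}
```

An empty row (rowR == rowRp1) skips the inner loop entirely and stores 0.0, which is the path where the duplicated zero load in the dump shows up.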
[Bug rtl-optimization/21676] [4.0/4.1/4.2 Regression] Optimizer regression: SciMark sparse matrix benchmark
--- Comment #8 from uros at kss-loka dot si 2006-08-17 07:45 ---
Also interesting is that -march=pentium4 produces the following de-optimized code, adding a couple more instructions and wasting the %eax register:

.L8:
        leal    (%ebx,%ebx), %eax
        movl    40(%esp), %edx
        movl    (%edx,%eax,2), %edx
        movl    %edx, (%esp)
        movl    40(%esp), %edx
        movl    4(%edx,%eax,2), %ecx
        movapd  %xmm2, %xmm1
        cmpl    %ecx, (%esp)
        jge     .L11
        movl    (%esp), %edx
.L12:

Some additional timings (gcc-4.2 -O2 -fomit-frame-pointer):

-march=pentium4:                            0m2.756s
-march=pentium4 -fno-ivopts:                0m2.500s
-march=pentium4 -fno-ivopts -mfpmath=sse:   0m2.461s
-msse2 -fno-ivopts -mfpmath=sse:            0m2.311s

In the last case, the generated code is equal to the code generated by gcc-3.2:

.L8:
        movl    36(%esp), %edx
        movapd  %xmm2, %xmm1
        movl    (%edx,%ebx,4), %eax
        movl    4(%edx,%ebx,4), %ecx
        cmpl    %ecx, %eax
        jge     .L11
        movl    %eax, %edx
        .p2align 4,,7
.L12:
        movl    (%edi,%edx,4), %eax
        movsd   (%esi,%eax,8), %xmm0
        mulsd   (%ebp,%edx,8), %xmm0
        addl    $1, %edx
        cmpl    %edx, %ecx
        addsd   %xmm0, %xmm1
        jg      .L12

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21676
[Bug rtl-optimization/21676] [4.0/4.1/4.2 Regression] Optimizer regression: SciMark sparse matrix benchmark
--- Comment #6 from uros at kss-loka dot si 2006-08-16 12:15 ---
IMO the problem here is in IVopts. Using gcc-3.x, the innermost loop compiles to:

.L15:
        movl    (%edi,%edx,4), %eax
        fldl    (%ebp,%edx,8)
        addl    $1, %edx
        fmull   (%esi,%eax,8)
        cmpl    %ecx, %edx
        faddp   %st, %st(1)
        jl      .L15

and with current SVN gcc-4.2 into:

.L12:
        movl    (%ecx), %eax
        fldl    (%ebp,%eax,8)
        fmull   (%edx)
        faddp   %st, %st(1)
        addl    $1, %ebx
        addl    $4, %ecx
        addl    $8, %edx
        cmpl    %esi, %ebx
        jne     .L12

Adding -fno-ivopts, this loop gets compiled into:

.L12:
        movl    (%edi,%edx,4), %eax
        fldl    (%esi,%eax,8)
        fmull   (%ebp,%edx,8)
        faddp   %st, %st(1)
        addl    $1, %edx
        cmpl    %edx, %ecx
        jg      .L12

Timings (-O3 -march=pentium4 -fomit-frame-pointer):

gcc-3.2:                 0m2.301s
gcc-4.2:                 0m2.713s
gcc-4.2 + -fno-ivopts:   0m2.473s

with:
gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7)
gcc version 4.2.0 20060816 (experimental)

I think that remaining time difference is due to strange loop above innermost:

gcc-3.2:

        fld     %st(0)
.L16:
        movl    36(%esp), %eax
        fld     %st(0)
        movl    4(%eax,%ebx,4), %ecx
        movl    (%eax,%ebx,4), %edx
        cmpl    %ecx, %edx
        jge     .L23
.L15:
        movl    (%edi,%edx,4), %eax
        fldl    (%ebp,%edx,8)
        addl    $1, %edx
        fmull   (%esi,%eax,8)
        cmpl    %ecx, %edx
        faddp   %st, %st(1)
        jl      .L15
.L23:
        movl    28(%esp), %eax
        fstpl   (%eax,%ebx,8)
        addl    $1, %ebx
        cmpl    24(%esp), %ebx
        jl      .L16

gcc-4.2:

.L8:
        movl    36(%esp), %edx
        movl    (%edx,%edi,4), %eax
        movl    4(%edx,%edi,4), %esi
        fldz
        cmpl    %esi, %eax
        jge     .L11
        fstp    %st(0)
        movl    40(%esp), %ebx
        leal    (%ebx,%eax,4), %ecx
        movl    32(%esp), %ebx
        leal    (%ebx,%eax,8), %edx
        fldz
        xorl    %ebx, %ebx
        subl    %eax, %esi
.L12:
        movl    (%ecx), %eax
        fldl    (%ebp,%eax,8)
        fmull   (%edx)
        faddp   %st, %st(1)
        addl    $1, %ebx
        addl    $4, %ecx
        addl    $8, %edx
        cmpl    %esi, %ebx
        jne     .L12
.L11:
        movl    28(%esp), %eax
        fstpl   (%eax,%edi,8)
        addl    $1, %edi
        cmpl    24(%esp), %edi
        jne     .L8

and gcc-4.2 -fno-ivopts:

.L8:
        leal    (%ebx,%ebx), %eax
        movl    40(%esp), %edx
        movl    (%edx,%eax,2), %edx
        movl    %edx, (%esp)
        movl    40(%esp), %edx
        movl    4(%edx,%eax,2), %ecx
        fldz
        cmpl    %ecx, (%esp)
        jge     .L11
        fstp    %st(0)
        movl    (%esp), %edx
        fldz
.L12:
        movl    (%edi,%edx,4), %eax
        fldl    (%esi,%eax,8)
        fmull   (%ebp,%edx,8)
        faddp   %st, %st(1)
        addl    $1, %edx
        cmpl    %edx, %ecx
        jg      .L12
.L11:
        movl    32(%esp), %ecx
        fstpl   (%ecx,%ebx,8)
        addl    $1, %ebx
        cmpl    %ebx, 28(%esp)
        jg      .L8

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
                 CC|                            |uros at kss-loka dot si
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2006-08-16 12:15:56
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21676
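
To make the inner-loop listings in comment #6 easier to compare, here is a C sketch of the two loop shapes they appear to correspond to. It assumes the kernel is a sparse dot product of the form sum += x[col[i]] * val[i], which is what the addressing modes in the assembly suggest; the names dot_indexed, dot_pointers, val, col and x are hypothetical and are not taken from the SciMark source.

```c
#include <assert.h>
#include <stddef.h>

/* One induction variable with indexed addressing: the shape of the
   gcc-3.x and gcc-4.2 -fno-ivopts inner loops. */
double dot_indexed(const double *val, const int *col, const double *x,
                   int begin, int end)
{
    double sum = 0.0;
    assert(val != NULL && col != NULL && x != NULL);
    for (int i = begin; i < end; i++)
        sum += x[col[i]] * val[i];
    return sum;
}

/* A separate counter plus two running pointers: the shape of the
   gcc-4.2 inner loop with ivopts enabled (three adds per iteration). */
double dot_pointers(const double *val, const int *col, const double *x,
                    int begin, int end)
{
    double sum = 0.0;
    assert(val != NULL && col != NULL && x != NULL);
    if (begin >= end)                 /* corresponds to the jge around the loop */
        return sum;
    const int *cp = col + begin;      /* stepped by one int per iteration */
    const double *vp = val + begin;   /* stepped by one double per iteration */
    for (int n = 0; n != end - begin; n++) {
        sum += x[*cp] * *vp;
        cp++;
        vp++;
    }
    return sum;
}
```

Both functions compute the same value; the second simply carries three induction variables instead of one, which matches the extra addl instructions in the ivopts listing.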
[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #64 from uros at kss-loka dot si 2006-08-11 09:18 ---
Slightly offtopic, but to put some numbers to comment #8 and comment #11, equivalent SSE code now reaches only 50% of x87 single performance and 60% of x87 double performance on AMD x86_64:

ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========

[float] -O2 -mfpmath=sse -march=k8:
  atlasmm     60   1000       0.273     1582.66

[float] -O2 -mfpmath=387 -march=k8:
  atlasmm     60   1000       0.138     3130.91

[double] -O2 -mfpmath=sse -march=k8:
  atlasmm     60   1000       0.252     1714.54

[double] -O2 -mfpmath=387 -march=k8:
  atlasmm     60   1000       0.152     2842.55

This effect was first observed in PR19780.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
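
The 50% and 60% figures in comment #64 follow directly from the MFLOPS column. A short C sketch of the arithmetic; the helper name ratio_percent is hypothetical, and the inputs are the MFLOPS values quoted in the comment:

```c
#include <assert.h>

/* Throughput of one configuration relative to a baseline, in percent. */
double ratio_percent(double mflops, double baseline_mflops)
{
    assert(baseline_mflops > 0.0);
    return 100.0 * mflops / baseline_mflops;
}
```

ratio_percent(1582.66, 3130.91) gives roughly 50.5 for single precision and ratio_percent(1714.54, 2842.55) gives roughly 60.3 for double precision.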
[Bug middle-end/28685] New: Multiple comparisons are not simplified
These two testcases should produce equivalent code:

int test(int a, int b)
{
  int lt = a < b;
  int eq = a == b;

  return (lt || eq);
}

int test_(int a, int b)
{
  return (a < b || a == b);
}

However, the optimized tree code is:

;; Function test (test)

Analyzing Edge Insertions.
test (a, b)
{
bb 2:
  return (a == b | a < b) != 0;
}

;; Function test_ (test_)

Analyzing Edge Insertions.
test_ (a, b)
{
bb 2:
  return a <= b;
}

And the resulting x86_64 asm is suboptimal for the test() function:

test:
        cmpl    %esi, %edi
        sete    %dl
        cmpl    %esi, %edi
        setl    %al
        orl     %edx, %eax
        movzbl  %al, %eax
        ret

test_:
        xorl    %eax, %eax
        cmpl    %esi, %edi
        setle   %al
        ret

--
           Summary: Multiple comparisons are not simplified
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: uros at kss-loka dot si
 GCC build triplet: x86_64-pc-linux-gnu
  GCC host triplet: x86_64-pc-linux-gnu
GCC target triplet: x86_64-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28685
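
The equivalence this report relies on can be checked directly. A minimal sketch in C, assuming the comparisons in the two testcases are `<` (which is consistent with the setl/setle instructions in the listing); test and test_ are from the report, while same_result is a hypothetical helper added only for illustration:

```c
#include <assert.h>

/* Comparison results are stored in temporaries first. */
int test(int a, int b)
{
    int lt = a < b;
    int eq = a == b;
    return (lt || eq);
}

/* The same condition written as a single expression. */
int test_(int a, int b)
{
    return (a < b || a == b);
}

/* Hypothetical helper: both forms must agree with a plain a <= b. */
int same_result(int a, int b)
{
    int expected = a <= b;
    assert(test(a, b) == expected);
    assert(test_(a, b) == expected);
    return expected;
}
```

Since both functions are equivalent to a <= b for all inputs, one would expect a single compare followed by setle in both cases.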
[Bug middle-end/28411] gfortran: Internal error: Illegal instruction
--- Comment #4 from uros at kss-loka dot si 2006-07-18 07:29 ---
This is the backtrace for the testcase in comment #3:

#1  0x0827ae67 in fold_binary_to_constant (code=TRUNC_MOD_EXPR, type=0x402473f4, op0=0x402d9438, op1=0x0)
    at ../../gcc-svn/trunk/gcc/fold-const.c:12314
#2  0x08174b25 in constant_multiple_of (type=0x402473f4, top=0x402d9438, bot=0x0)
    at ../../gcc-svn/trunk/gcc/tree-ssa-loop-ivopts.c:2623
#3  0x081799d1 in get_computation_cost (data=0xb704, use=0x8706e70, cand=0x8707358, address_p=0 '\0', depends_on=0xb5f4)
    at ../../gcc-svn/trunk/gcc/tree-ssa-loop-ivopts.c:3758
#4  0x0817a364 in determine_use_iv_cost (data=0xb704, use=0x8706e70, cand=0x8707358)
    at ../../gcc-svn/trunk/gcc/tree-ssa-loop-ivopts.c:3901
#5  0x0817d41e in determine_use_iv_costs (data=0xb704)
    at ../../gcc-svn/trunk/gcc/tree-ssa-loop-ivopts.c:4128
#6  0x0817f3ac in tree_ssa_iv_optimize_loop (data=0xb704, loop=Variable loop is not available.

constant_multiple_of() is calling fold_binary_to_constant() here:

  if (!zero_p (fold_binary_to_constant (TRUNC_MOD_EXPR, type, top, bot)))
    return NULL_TREE;

As can be seen from the backtrace above, the bot operand is NULL, and this triggers an assert in fold_binary().

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28411
[Bug tree-optimization/28411] gfortran: Internal error: Illegal instruction
--- Comment #5 from uros at kss-loka dot si 2006-07-18 08:06 ---
This error can be tracked down to fold_negate_expr() returning NULL_TREE via this path:

(a) constant_multiple_of() calls fold_unary_to_constant():

  /* If BOT seems to be negative, try dividing by -BOT instead, and negate
     the result afterwards.  */
  if (tree_int_cst_sign_bit (bot))
    {
      negate = true;
      bot = fold_unary_to_constant (NEGATE_EXPR, type, bot);
    }

(b) fold_unary_to_constant() calls fold_unary()

(c) fold_unary() calls fold_negate_expr() for NEGATE_EXPR:

    case NEGATE_EXPR:
      tem = fold_negate_expr (arg0);
      if (tem)
        return fold_convert (type, tem);
      return NULL_TREE;

(d) fold_negate_expr() returns NULL_TREE, because:

    case INTEGER_CST:
      tem = fold_negate_const (t, type);
      if (! TREE_OVERFLOW (tem)
          || TYPE_UNSIGNED (type)
          || ! flag_trapv)
        return tem;
      break;
    ...
    default:
      break;
    }

  return NULL_TREE;
}

From here, I don't know what a correct solution would be...

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
                 CC|                            |uros at kss-loka dot si

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28411
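
The failure mode in step (d) has a direct analogue in plain C: negating the most negative value of a signed type overflows, so a helper that refuses to fold in that case has to report failure, and its caller has to check for it before using the result. The following is only an illustrative sketch, not GCC code and not a proposed fix; try_negate and negated_divisor are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Negate *value in place. Returns 1 on success, or 0 if the negation
   would overflow (the analogue of returning NULL_TREE). */
int try_negate(int *value)
{
    assert(value != NULL);
    if (*value == INT_MIN)
        return 0;               /* -INT_MIN is not representable */
    *value = -*value;
    return 1;
}

/* Caller in the style of constant_multiple_of(): if the divisor is
   negative, try to use -bot, and give up cleanly when that fails
   instead of passing an invalid operand on. Returns 1 on success. */
int negated_divisor(int bot, int *out, int *negate)
{
    assert(out != NULL && negate != NULL);
    *negate = 0;
    if (bot < 0) {
        if (!try_negate(&bot))
            return 0;           /* bail out; bot must not be used */
        *negate = 1;
    }
    *out = bot;
    return 1;
}
```

Whether bailing out at this point is the right behaviour for the ivopts code is exactly the question the comment leaves open.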
[Bug target/26949] [4.2 regression] worse code generated for -march=pentium4
--- Comment #1 from uros at kss-loka dot si 2006-07-06 08:23 ---
This problem appears to be fixed in gcc version 4.2.0 20060705 (experimental). The generated asm for the loop is now:

-O2 -march=pentium4 -fno-tree-ch:

        jmp     .L2
.L3:
        movl    %esi, -4(%edx)
        addl    $1, %eax
.L2:
        addl    $4, %edx
        cmpl    %ecx, %eax
        jle     .L3

-O2 -march=i686 -fno-tree-ch:

        jmp     .L2
        .p2align 4,,7
.L3:
        movl    %ebx, -4(%ecx)
        addl    $1, %edx
.L2:
        addl    $4, %ecx
        cmpl    %eax, %edx
        jle     .L3

Closing the bug as FIXED.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26949
[Bug target/26949] [4.2 regression] worse code generated for -march=pentium4
--- Comment #2 from uros at kss-loka dot si 2006-07-06 08:24 ---
Closing it for real...

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26949
[Bug middle-end/28252] pow(x,1/3.0) should be converted to cbrt(x)
--- Comment #2 from uros at kss-loka dot si 2006-07-05 08:25 ---
Created an attachment (id=11824)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11824&action=view)
Patch to implement pow(x,1.0/3.0) = cbrt(x) optimization

I have the patch that implements the optimization ready, just waiting for the mainline to open again. Should I post it to gcc-patches anyway?

2006-07-05  Uros Bizjak  [EMAIL PROTECTED]

        * builtins.c (fold_builtin): Fold pow(x,1.0/3.0) as cbrt(x) if
        flag_unsafe_math_optimizations is set.

testsuite:

        * gcc.dg/builtins-8.c: Also check pow(x,1.0/3.0) to cbrt(x)
        transformation.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28252
[Bug middle-end/28252] pow(x,1/3.0) should be converted to cbrt(x)
--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
         AssignedTo|unassigned at gcc dot gnu   |uros at kss-loka dot si
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2006-07-04 22:52:33         |2006-07-05 08:26:53
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28252
[Bug tree-optimization/27474] ICE: tree check: expected ssa_name, have struct_field_tag in verify_ssa, at tree-ssa.c:776
--- Comment #4 from uros at kss-loka dot si 2006-07-05 10:10 ---
This still fails with current mainline gcc.

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
   Last reconfirmed|2006-05-08 07:45:56         |2006-07-05 10:10:38
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27474
[Bug middle-end/24929] long long shift/mask operations should be better optimized
--- Comment #5 from uros at kss-loka dot si 2006-06-27 10:12 ---
(In reply to comment #4)
> which may be optimal.

movzbl 18(%esp), %eax could be used in this particular case.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24929
[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #20 from uros at kss-loka dot si 2006-06-26 06:31 ---
(In reply to comment #15)
> Can someone tell me if anyone is looking into this problem with the hopes of
> fixing it? I just noticed that despite the posted code demonstrating the
> problem, and verification on: Pentium Pro, Pentium III, Pentium 4e, Pentium-D,
> Athlon-64 X2 and Opteron, it is still marked as new, and no one is assigned to
> look at it . . .

Hm, I tried your single testcase (SSE) on:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping        : 9
cpu MHz         : 3191.917
cache size      : 512 KB

And the results are a bit surprising (this is the exact output of your test):

/usr/local.uros/gcc34/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -msse2 -mfpmath=sse -DTYPE=float -c mmbench.c
/usr/local.uros/gcc34/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -msse2 -mfpmath=sse -c sgemm_atlas.c
/usr/local.uros/gcc34/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -msse2 -mfpmath=sse -o xsmm_gcc mmbench.o sgemm_atlas.o
rm -f *.o
/usr/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -msse2 -mfpmath=sse -DTYPE=float -c mmbench.c
/usr/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -msse2 -mfpmath=sse -c sgemm_atlas.c
/usr/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -msse2 -mfpmath=sse -o xsmm_gc4 mmbench.o sgemm_atlas.o
rm -f *.o
echo GCC 3.x single performance:
GCC 3.x single performance:
./xsmm_gcc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.141     3072.00

echo GCC 4.x single performance:
GCC 4.x single performance:
./xsmm_gc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.141     3072.00

where gcc (GCC) 3.4.6 was tested against gcc version 4.2.0 20060608 (experimental).

FYI: there is another pathological testcase (PR target/19780), where SSE code is 30% slower on AMD64, despite the fact that for SSE, 16 xmm registers were available and _no_ memory was accessed in a for loop.

> The reason I ask is that I am preparing the next stable release of ATLAS, and
> I'm getting close to having to make a decision on what compilers I will
> support. If someone is working feverishly in the background, I will be sure
> to wait for it, in the hopes that there'll be a fix that will allow me to use
> gcc 4, which I think will be what most of my users want. If this problem is
> not being looked into, I should not delay the ATLAS release for it, and just
> require my users to install gcc 3 in order to get decent performance. I
> realize you guys are busy, and fp performance is probably not your main
> concern, so hopefully this message sounds more like a request for info on what
> is going on, than a bitch about help that I'm getting for free :)

Without any other information available, I can only speculate that perhaps gcc4 code does not fully utilize multiple FP pipelines in the processors you listed.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #22 from uros at kss-loka dot si 2006-06-27 05:49 ---
(In reply to comment #21)
> Note that you are running the opposite of my test case: SSE vs SSE rather than
> x87 vs x87. This whole bug report is about x87 performance. You can get more
> detail on why I want x87 in my messages above, particularly comment #11, but
> single precision is indeed the place where SSE cannot compete with the x87
> unit. To see it, put the flags back the way I had them in the attachment, and
> you'll see that gcc 3 is much faster. Also, you should find in single

Hm, these are x87 results:

/usr/local.uros/gcc34/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -DTYPE=float -c mmbench.c
/usr/local.uros/gcc34/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -c sgemm_atlas.c
/usr/local.uros/gcc34/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -o xsmm_gcc mmbench.o sgemm_atlas.o
rm -f *.o
/usr/local.uros/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -DTYPE=float -c mmbench.c
/usr/local.uros/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -c sgemm_atlas.c
/usr/local.uros/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -o xsmm_gc4 mmbench.o sgemm_atlas.o
rm -f *.o
echo GCC 3.x single performance:
GCC 3.x single performance:
./xsmm_gcc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.141     3072.00

echo GCC 4.x single performance:
GCC 4.x single performance:
./xsmm_gc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.143     3029.92

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
[Bug c++/28041] [gomp] ICE in g++.dg/gomp/atomic-[4,5,9].C
--- Comment #1 from uros at kss-loka dot si 2006-06-19 08:56 ---
Works OK with gcc version 4.2.0 20060619 (experimental).

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28041
[Bug c++/28041] New: [gomp] ICE in g++.dg/gomp/atomic-[4,5,9].C
The compilation crashes in

/* Gimplify an OMP_ATOMIC statement.  */

static enum gimplify_status
gimplify_omp_atomic (tree *expr_p, tree *pre_p)
{
  tree addr = TREE_OPERAND (*expr_p, 0);
  tree rhs = TREE_OPERAND (*expr_p, 1);
  tree type = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (addr)));
  HOST_WIDE_INT index;

Program received signal SIGSEGV, Segmentation fault.
0x001bb40c in gimplify_omp_atomic (expr_p=0xff0edf48, pre_p=0xffbed64c)
    at /export/home/uros/gcc-svn/trunk/gcc/gimplify.c:5148
(gdb) bt
#0  0x001bb40c in gimplify_omp_atomic (expr_p=0xff0edf48, pre_p=0xffbed64c)
    at /export/home/uros/gcc-svn/trunk/gcc/gimplify.c:5148
#1  0x001bd2a0 in gimplify_expr (expr_p=0xff0edf48, pre_p=0xffbed64c, post_p=0xffbed648, gimple_test_f=0x1aa228 <is_gimple_stmt>, fallback=fb_none)
    at /export/home/uros/gcc-svn/trunk/gcc/gimplify.c:5646
#2  0x001b6078 in gimplify_statement_list (expr_p=0xffbed6f8)
    at /export/home/uros/gcc-svn/trunk/gcc/tree-iterator.h:86
#3  0x001bcdfc in gimplify_expr (expr_p=0xff115970, pre_p=0xffbed784, post_p=0xffbed780, gimple_test_f=0x1aa228 <is_gimple_stmt>, fallback=fb_none)
    at /export/home/uros/gcc-svn/trunk/gcc/gimplify.c:5595
#4  0x001bde60 in gimplify_body (body_p=0xff115970, fndecl=0xff115900, do_parms=1 '\001')
    at /export/home/uros/gcc-svn/trunk/gcc/gimplify.c:6113
#5  0x001be2a8 in gimplify_function_tree (fndecl=0xff115900)
    at /export/home/uros/gcc-svn/trunk/gcc/gimplify.c:6189
#6  0x0017917c in c_genericize (fndecl=0xff115900)
    at /export/home/uros/gcc-svn/trunk/gcc/c-gimplify.c:106
#7  0x00143864 in cp_genericize (fndecl=0xff115900)
    at /export/home/uros/gcc-svn/trunk/gcc/cp/cp-gimplify.c:739
#8  0x00045a74 in finish_function (flags=0)
    at /export/home/uros/gcc-svn/trunk/gcc/cp/decl.c:11130

--
           Summary: [gomp] ICE in g++.dg/gomp/atomic-[4,5,9].C
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: uros at kss-loka dot si
 GCC build triplet: sparc-sun-solaris2.8
  GCC host triplet: sparc-sun-solaris2.8
GCC target triplet: sparc-sun-solaris2.8

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28041
[Bug target/28007] sse autovectorizer emits wrong code involving shifts
--- Comment #5 from uros at kss-loka dot si 2006-06-13 07:44 --- Similar problem was solved for gcc-4.1 in PR target/22480. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28007
[Bug target/27790] [4.1 Regression] Unrecognizable insn with -ftree-vectorize -O1 -msse2
--- Comment #9 from uros at kss-loka dot si 2006-06-07 07:05 ---
Fixed on 4.1 branch.

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27790
[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code
--- Comment #2 from uros at kss-loka dot si 2006-06-02 10:04 ---
(In reply to comment #1)
> There is nothing special about reassociation at all. In fact what you are
> seeing is register allocator going funky. This what you get with x87.

This is also what you get with SSE.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27855
[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #9 from uros at kss-loka dot si 2006-06-01 08:43 ---
The benchmark run on a Pentium4 3.2G/800MHz FSB (32bit):

vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping        : 9
cpu MHz         : 3191.917
cache size      : 512 KB

shows even more interesting results:

gcc version 3.4.6
vs.
gcc version 4.2.0 20060601 (experimental)

-fomit-frame-pointer -O -msse2 -mfpmath=sse

GCC 3.x performance:
./xmm_gcc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.162     2664.87

GCC 4.x performance:
./xmm_gc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.164     2633.13

and

-fomit-frame-pointer -O -mfpmath=387

GCC 3.x performance:
./xmm_gcc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.160     2697.37

GCC 4.x performance:
./xmm_gc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.164     2633.15

There is a small performance drop on gcc-4.x, but nothing critical.

I can confirm that the code indeed runs 50% slower on a 64bit athlon. Perhaps the problem is in the order of instructions (Software Optimization Guide for AMD Athlon 64, Section 10.2). The gcc-3.4 code looks similar to the example of how things should be, and the gcc-4.2 code looks similar to the example of how things should _NOT_ be.

BTW: Did you try to run the benchmark on AMD target with -march=k8? The effects of this flag are devastating on a Pentium4 CPU:

-O -msse2 -mfpmath=sse -march=k8

./xmm_gcc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.836      516.79

GCC 4.x performance:
./xmm_gc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.287     1504.66

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2006-06-01 08:43:34
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
[Bug tree-optimization/27855] New: reassociation pass produces ~30% slower matrix multiplication code
The testcase from PR target/27827 shows another problem, this time with -ffast-math. The runtime performance of -ffast-math code drops by ~30%. The problem could be traced down to the reassociation tree pass, because the performance jumps back when the flag_unsafe_math_optimizations switch is disabled by changing every occurrence in tree-ssa-reassoc.c to (flag_unsafe_math_optimizations && 0).

To see the problem, -funsafe-math-optimizations should be added to MMFLAGS in the target/27827 example Makefile:

MM4FLAGS = $(GMMFLAGS) -funsafe-math-optimizations

Current mainline gcc produces code with the following results:

-O -mfpmath=387
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.260     1663.04

-O -msse2 -mfpmath=sse
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.229     1890.47

gcc with disabled reassoc pass for floating point values:

-O -mfpmath=387
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.162     2664.87

-O -msse2 -mfpmath=sse
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.164     2633.15

--
           Summary: reassociation pass produces ~30% slower matrix multiplication code
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: uros at kss-loka dot si
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
OtherBugsDependingOnThis: 27827

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27855
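
As background on why floating-point reassociation sits behind flag_unsafe_math_optimizations in the first place: floating-point addition is not associative, so regrouping operands can change the result. A small illustration in C, unrelated to the ATLAS kernel itself; the function names are hypothetical.

```c
#include <assert.h>

/* (a + b) + c: the grouping as written in the source. */
double sum_left(double a, double b, double c)
{
    double t = a + b;
    return t + c;
}

/* a + (b + c): a regrouping that a reassociation pass could choose. */
double sum_right(double a, double b, double c)
{
    double t = b + c;
    return a + t;
}

/* Self-check with operands where the grouping matters: 1.0 is
   absorbed when it is added to -1e20 first. */
void check_grouping(void)
{
    assert(sum_left(1e20, -1e20, 1.0) == 1.0);
    assert(sum_right(1e20, -1e20, 1.0) == 0.0);
}
```

The report above is about speed rather than accuracy, but this is the reason the pass only touches floating-point values when unsafe math optimizations are enabled.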
[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #7 from uros at kss-loka dot si 2006-05-31 10:56 ---
IMO the fact that gcc 3.x beats 4.x on this code could be attributed to pure luck.

Looking into 3.x RTL, these things can be observed:

Instruction that multiplies pA0 and rB0 is described as:

__.20.combine:

(insn 75 73 76 2 (set (reg:DF 84)
        (mult:DF (mem:DF (reg/v/f:DI 70 [ pA0 ]) [0 S8 A64])
            (reg/v:DF 78 [ rB0 ]))) 551 {*fop_df_comm_nosse} (insn_list 65 (nil))
    (nil))

At this point, first input operand does not satisfy the operand constraint, so register allocator pushes memory operand into the register:

__.25.greg:

(insn 703 73 75 2 (set (reg:DF 8 st [84])
        (mem:DF (reg/v/f:DI 0 ax [orig:70 pA0 ] [70]) [0 S8 A64])) 96 {*movdf_integer} (nil)
    (nil))

(insn 75 703 76 2 (set (reg:DF 8 st [84])
        (mult:DF (reg:DF 8 st [84])
            (reg/v:DF 9 st(1) [orig:78 rB0 ] [78]))) 551 {*fop_df_comm_nosse} (insn_list 65 (nil))
    (nil))

This RTL produces following asm sequence:

        fldl    (%rax)          #* pA0
        fmul    %st(1), %st     #

In 4.x case, we have:

__.127r.combine:

(insn 60 58 61 4 (set (reg:DF 207)
        (mult:DF (reg/v:DF 187 [ rB0 ])
            (mem:DF (plus:DI (reg/v/f:DI 178 [ pA0.161 ])
                    (const_int 960 [0x3c0])) [0 S8 A64]))) 591 {*fop_df_comm_i387} (nil)
    (nil))

This instruction almost satisfies operand constraint, and register allocator produces:

__.138r.greg:

(insn 470 58 60 5 (set (reg:DF 12 st(4) [207])
        (reg/v:DF 8 st [orig:187 rB0 ] [187])) 94 {*movdf_integer} (nil)
    (nil))

(insn 60 470 61 5 (set (reg:DF 12 st(4) [207])
        (mult:DF (reg:DF 12 st(4) [207])
            (mem:DF (plus:DI (reg/v/f:DI 0 ax [orig:178 pA0.161 ] [178])
                    (const_int 960 [0x3c0])) [0 S8 A64]))) 591 {*fop_df_comm_i387} (nil)
    (nil))

Stack handling then fixes this RTL to:

__.151r.stack:

(insn 470 58 60 4 (set (reg:DF 8 st)
        (reg:DF 8 st)) 94 {*movdf_integer} (nil)
    (nil))

(insn 60 470 61 4 (set (reg:DF 8 st)
        (mult:DF (reg:DF 8 st)
            (mem:DF (plus:DI (reg/v/f:DI 0 ax [orig:178 pA0.161 ] [178])
                    (const_int 960 [0x3c0])) [0 S8 A64]))) 591 {*fop_df_comm_i387} (nil)
    (nil))

From your measurement, it looks that instead of:

        fld     %st(0)          #
        fmull   (%rax)          #* pA0.161

it is faster to emit

        fldl    (%rax)          #* pA0
        fmul    %st(1), %st     #,

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
                 CC|                            |uros at kss-loka dot si

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
[Bug target/27790] [4.1/4.2 Regression] Unrecognizable insn with -ftree-vectorize -O1 -msse2
--- Comment #3 from uros at kss-loka dot si 2006-05-29 10:29 ---
I'm testing a patch.

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
         AssignedTo|unassigned at gcc dot gnu   |uros at kss-loka dot si
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2006-05-29 04:28:52         |2006-05-29 10:29:47
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27790
[Bug target/27790] [4.1/4.2 Regression] Unrecognizable insn with -ftree-vectorize -O1 -msse2
--- Comment #5 from uros at kss-loka dot si 2006-05-29 11:52 ---
(In reply to comment #4)
> pr27790.patch
> This seems to work for me.

In the V4SImode case above, there is

  emit_insn (gen_subv4si3 (t1, cop0, cop1));

The subv4si insn also needs cop0 in the register:

(define_expand "sub<mode>3"
  [(set (match_operand:SSEMODEI 0 "register_operand" "")
        (minus:SSEMODEI (match_operand:SSEMODEI 1 "register_operand" "")
                        (match_operand:SSEMODEI 2 "nonimmediate_operand" "")))]
  "TARGET_SSE2"
  "ix86_fixup_binary_operands_no_copy (MINUS, <MODE>mode, operands);")

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27790
[Bug tree-optimization/27638] New: Strange initialization of uninitialized structure part
This testcase generates some kind of initialization of the uninitialized part of the structure:

--cut here--
struct ret_struct
{
  int long_buf[10];
};

struct ret_struct
strc_test (int i)
{
  struct ret_struct ret;

  ret.long_buf[0] = i;
  ret.long_buf[1] = 0x2;
  ret.long_buf[2] = 0x3;
  ret.long_buf[3] = 0x4;
  ret.long_buf[4] = 0x5;
  ret.long_buf[5] = 0x6;

  return ret;
}
--cut here--

gcc -O2 -fverbose-asm -fomit-frame-pointer:

--cut here--
strc_test:
        movl    4(%esp), %eax   # D.1563, D.1563
        movl    8(%esp), %edx   # i, i
        movl    %eax, 36(%eax)  # ret$long_buf$9, result.long_buf !!
        movl    %eax, 32(%eax)  # ret$long_buf$8, result.long_buf !!
        movl    %eax, 28(%eax)  # ret$long_buf$7, result.long_buf !!
        movl    %eax, 24(%eax)  # ret$long_buf$6, result.long_buf !!
        movl    $6, 20(%eax)    #, result.long_buf
        movl    $5, 16(%eax)    #, result.long_buf
        movl    $4, 12(%eax)    #, result.long_buf
        movl    $3, 8(%eax)     #, result.long_buf
        movl    $2, 4(%eax)     #, result.long_buf
        movl    %edx, (%eax)    # i, result.long_buf
        ret     $4              #
--cut here--

These extra variables can be seen in the _.optimized tree dump:

--cut here--
;; Function strc_test (strc_test)

Analyzing Edge Insertions.
strc_test (i)
{
  int ret$long_buf$9;
  int ret$long_buf$8;
  int ret$long_buf$7;
  int ret$long_buf$6;

bb 2:
  retval.long_buf[9] = ret$long_buf$9;
  retval.long_buf[8] = ret$long_buf$8;
  retval.long_buf[7] = ret$long_buf$7;
  retval.long_buf[6] = ret$long_buf$6;
  retval.long_buf[5] = 6;
  retval.long_buf[4] = 5;
  retval.long_buf[3] = 4;
  retval.long_buf[2] = 3;
  retval.long_buf[1] = 2;
  retval.long_buf[0] = i;
  return retval;

}
--cut here--

Using -Wall, gcc correctly warns about the uninitialized part (why the extra 'u' in the index?):

t.c:16: warning: 'ret.long_buf[9u]' is used uninitialized in this function
t.c:16: warning: 'ret.long_buf[8u]' is used uninitialized in this function
t.c:16: warning: 'ret.long_buf[7u]' is used uninitialized in this function
t.c:16: warning: 'ret.long_buf[6u]' is used uninitialized in this function

IMO emitting some sort of initialization in this case is not needed.

--
           Summary: Strange initialization of uninitialized structure part
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: uros at kss-loka dot si
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27638
[Bug target/26726] -fivopts producing out of bounds array refs
--- Comment #14 from uros at kss-loka dot si 2006-05-13 08:46 ---
(In reply to comment #13)
> This is now a target specific problem, on i?86 and x86_64 we are left with an
> offset of -4B and so referencing a[5] in the exit condition.

This is PR target/24669.

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
  BugsThisDependsOn|                            |24669

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26726
[Bug target/27277] [4.2 Regression] standard i387 constant loading insns (fldz, fld1) are not generated anymore
--- Comment #6 from uros at kss-loka dot si 2006-05-08 06:12 ---
Fixed.

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27277
[Bug tree-optimization/27474] New: ICE: tree check: expected ssa_name, have struct_field_tag in verify_ssa, at tree-ssa.c:776
This ICE happens during compilation of PovRay-3.6.1 with -msse2 -ftree-vectorize (also on x86_64). The ICE is in express.cpp.

The reduced testcase is attached; this is the failure with -O -ftree-vectorize:

g++ -O -ftree-vectorize -m32 -msse2 reduced.cpp

reduced.cpp: In function void pov::Parse_Num_Factor(double*, int*):
reduced.cpp:94: internal compiler error: tree check: expected ssa_name, have struct_field_tag in verify_ssa, at tree-ssa.c:776
Please submit a full bug report, [etc]

The same failure happens with g++ -O -ftree-vectorize on x86_64.

--
           Summary: ICE: tree check: expected ssa_name, have struct_field_tag in verify_ssa, at tree-ssa.c:776
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: uros at kss-loka dot si
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27474
[Bug tree-optimization/27474] ICE: tree check: expected ssa_name, have struct_field_tag in verify_ssa, at tree-ssa.c:776
--- Comment #1 from uros at kss-loka dot si 2006-05-07 19:30 ---
Created an attachment (id=11396)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11396&action=view)
Reduced cpp testcase

The testcase, reduced with Delta.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27474
[Bug target/27277] New: standard i387 constant loading insns (fldz, fld1) are not generated anymore
It looks like standard i387 constant loading insns are not generated anymore. This testcase:

--cut here--
double test(void)
{
  return 1.0;
}
--cut here--

generates (gcc -O2 -fomit-frame-pointer):

test:
        flds    .LC0            <- fld1 should be here
        ret

.LC0:
        .long   1065353216

The problem is in the extendsfdf2 expander, which expects a CONST_DOUBLE as operand[1] to generate a simple constant move instruction. The constant is pushed to the constant pool (as an SFmode constant) for some reason, so the expander receives a (reg:SF 60) as operand[1].

The following RTL sequence is produced:

(insn 9 8 10 (set (reg:SF 60)
        (mem/u/c/i:SF (symbol_ref/u:SI (*.LC0) [flags 0x2]) [2 S4 A32])) -1 (nil)
    (expr_list:REG_EQUAL (const_double:SF 1.0e+0 [0x0.8p+1])
        (nil)))

(insn 10 9 11 (set (reg:DF 58 [ result ])
        (float_extend:DF (reg:SF 60))) -1 (nil)
    (expr_list:REG_EQUAL (const_double:DF 1.0e+0 [0x0.8p+1])
        (nil)))

This sequence corresponds to the final asm:

test:
        flds    .LC0    # 16    *extendsfdf2_i387/1     [length = 6]
        ret             # 30    return_internal [length = 1]

The same problem arises for other i387 constants.

--
           Summary: standard i387 constant loading insns (fldz, fld1) are not generated anymore
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: uros at kss-loka dot si
GCC target triplet: i386-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27277
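
The value 1065353216 emitted for .LC0 in this report is the IEEE 754 single-precision encoding of 1.0 (0x3f800000), which is consistent with the constant having been placed in the pool in SFmode and loaded with flds. This can be checked with a few lines of C; float_bits is a hypothetical helper, and the sketch assumes that float is a 32-bit IEEE 754 type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float (assumes IEEE 754 binary32). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);   /* well-defined way to reinterpret */
    return bits;
}
```

Under that assumption, float_bits(1.0f) equals 1065353216, the same number that appears after .long in the listing.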
[Bug middle-end/27134] [4.1 regression] ICE with floor and -ffast-math
--- Comment #7 from uros at kss-loka dot si 2006-04-16 11:22 ---
Fixed.

--
uros at kss-loka dot si changed:

           What    |Removed                     |Added
             Status|ASSIGNED                    |RESOLVED
      Known to work|4.2.0                       |4.2.0 4.1.1
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27134
[Bug middle-end/27134] [4.1 regression] ICE with floor and -ffast-math
--- Comment #5 from uros at kss-loka dot si 2006-04-14 07:18 --- Fixed on SVN head. -- uros at kss-loka dot si changed: What|Removed |Added Known to work||4.2.0 Summary|[4.1/4.2 regression] ICE|[4.1 regression] ICE with |with floor and -ffast-math |floor and -ffast-math http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27134
[Bug middle-end/27134] [4.1/4.2 regression] ICE with floor and -ffast-math
--- Comment #3 from uros at kss-loka dot si 2006-04-12 17:54 --- There seems to be something wrong with -ffast-math and floor. I have done some analysis on this. Start from expand_builtin_int_roundingfn() in the builtins.c source, where we fall back to the FP rounding optab. fallback_fndecl from mathfn_builtin looks like:
function_decl 0x2d992200 __builtin_floor type function_type 0x2d9756e0 type real_type 0x2d970420 double DF size integer_cst 0x2d951d80 constant invariant 64 unit size integer_cst 0x2d951db0 constant invariant 8 align 64 symtab 0 alias set -1 precision 64 pointer_to_this pointer_type 0x2d970630 QI size integer_cst 0x2d9517e0 constant invariant 8 unit size integer_cst 0x2d951810 constant invariant 1 align 8 symtab 0 alias set -1 arg-types tree_list 0x2d9740f0 value real_type 0x2d970420 double chain tree_list 0x2d96be10 value void_type 0x2d9700b0 void pointer_to_this pointer_type 0x2dabad10 readonly used nothrow public external built-in decl_6 QI file built-in line 0 built-in BUILT_IN_NORMAL:BUILT_IN_FLOOR attributes tree_list 0x2d9918d0 (mem:QI (symbol_ref:DI ("floor") [flags 0x41] <function_decl 0x2d992200 __builtin_floor>) [0 S1 A8]) chain function_decl 0x2d992300 floor
After that, build_function_call_expr() is called with an argument list:
tree_list 0x2dabf180 value float_expr 0x2d95b240 type real_type 0x2d970420 double DF size integer_cst 0x2d951d80 constant invariant 64 unit size integer_cst 0x2d951db0 constant invariant 8 align 64 symtab 0 alias set -1 precision 64 pointer_to_this pointer_type 0x2d970630 arg 0 parm_decl 0x2d958780 i type integer_type 0x2d9604d0 int used SI file pr27134.c line 5 size integer_cst 0x2d951bd0 constant invariant 32 unit size integer_cst 0x2d9516f0 constant invariant 4 align 32 context function_decl 0x2daa1600 foo initial integer_type 0x2d9604d0 int (reg/v:SI 59 [ i ]) arg-type integer_type 0x2d9604d0 int
This is simplified in fold_build3() to:
nop_expr 0x2dac5300 type real_type 0x2d970420 double DF size integer_cst 0x2d951d80 constant invariant 64 unit size integer_cst 0x2d951db0 constant invariant 8 align 64 symtab 0 alias set -1 precision 64 pointer_to_this pointer_type 0x2d970630 arg 0 float_expr 0x2d95b240 type real_type 0x2d970420 double arg 0 parm_decl 0x2d958780 i type integer_type 0x2d9604d0 int used SI file pr27134.c line 5 size integer_cst 0x2d951bd0 constant invariant 32 unit size integer_cst 0x2d9516f0 constant invariant 4 align 32 context function_decl 0x2daa1600 foo initial integer_type 0x2d9604d0 int (reg/v:SI 59 [ i ]) arg-type integer_type 0x2d9604d0 int incoming-rtl (reg:SI 5 di [ i ])
It looks to me that fold_convert3() is trying to kill (int) __builtin_lfloor ((double) i), where i is an integer argument. Uros. -- uros at kss-loka dot si changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |uros at kss-loka dot si |dot org | Status|NEW |ASSIGNED Last reconfirmed|2006-04-12 14:59:12 |2006-04-12 17:54:41 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27134
[Bug middle-end/27139] New: Optimize double INT->FP->INT conversions
This testcase:
int test (int a)
{
  return (double) a;
}
produces:
        cvtsi2sd        %edi, %xmm0
        cvttsd2si       %xmm0, %eax
        ret
However, the following code does the same (at least for -ffast-math):
        movl    %edi, %eax
        ret
--
Summary: Optimize double INT->FP->INT conversions
Product: gcc
Version: 4.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
GCC target triplet: x86_64-unknown-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27139
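The equivalence claimed in the report can be spot-checked in C. The sketch below assumes a 32-bit `int` and an IEEE-754 `double` with a 53-bit significand, in which case every `int` is exactly representable and the round trip should be the identity; the report itself only claims this for -ffast-math. The names `roundtrip` and `roundtrip_is_identity` are hypothetical, and `roundtrip` simply mirrors `test` from the testcase:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Same shape as the testcase: int -> double -> int. */
int roundtrip(int a)
{
    return (double) a;
}

/* Spot-check a few values, including the extremes of int.
   Returns 1 if the round trip left every sample unchanged. */
int roundtrip_is_identity(void)
{
    static const int samples[] = { 0, 1, -1, 42, 123456789, INT_MAX, INT_MIN };
    size_t i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        if (roundtrip(samples[i]) != samples[i])
            return 0;
    return 1;
}
```

This is only a sampling check, not a proof; an exhaustive loop over all `int` values is feasible on a 32-bit `int` but slower.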
[Bug middle-end/27069] -ffast-math crash
--- Comment #14 from uros at kss-loka dot si 2006-04-07 06:10 --- This is a duplicate of PR 26869. *** This bug has been marked as a duplicate of 26869 *** -- uros at kss-loka dot si changed: What|Removed |Added Status|WAITING |RESOLVED Resolution||DUPLICATE http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27069
[Bug middle-end/26869] [4.1/4.2 Regression] Segfault in find_lattice_value() for complex operands.
--- Comment #3 from uros at kss-loka dot si 2006-04-07 06:10 --- *** Bug 27069 has been marked as a duplicate of this bug. *** -- uros at kss-loka dot si changed: What|Removed |Added CC||nuno dot bandeira at ist dot ||utl dot pt http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26869
[Bug rtl-optimization/15187] Inefficient if optimization with -O2 -ffast-math
--- Comment #12 from uros at kss-loka dot si 2006-03-29 14:08 --- (In reply to comment #11) it looks like 4.1.1 and 4.2.0 still produce unoptimal code.
test:
        pushl   %ebp
        movl    %esp, %ebp
        fldl    8(%ebp)
        fldz
        fcomip  %st(1), %st
        jae     .L2
        popl    %ebp
        fcos
        ret
.L2:
        popl    %ebp
        fsin
        ret
No, this code is optimal. Please compare the code above to the code in the description, where fcos is calculated even if x = 0.0 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15187
[Bug middle-end/26869] New: Segfault in find_lattice_value() for complex operands.
This testcase segfaults in find_lattice_value() in tree-complex.c line 116:
_Complex float f (_Complex float b, _Complex float c)
{
  _Complex float a = 1.0 + 0.0i;
  return a / c;
}
gcc -O x.c
x.c: In function f:
x.c:2: internal compiler error: Segmentation fault
Please submit a full bug report, with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
(gdb) bt
#0 find_lattice_value (t=0x0) at ../../gcc-svn/trunk/gcc/tree-complex.c:116
#1 0x00853e50 in set_component_ssa_name (ssa_name=0x0, imag_p=0 '\0', value=0x2d95b600) at ../../gcc-svn/trunk/gcc/tree-complex.c:485
#2 0x00854126 in update_complex_components_on_edge (e=0x2d95b300, lhs=0x0, r=Variable r is not available. ) at ../../gcc-svn/trunk/gcc/tree-complex.c:608
#3 0x008579eb in tree_lower_complex () at ../../gcc-svn/trunk/gcc/tree-complex.c:658
--
Summary: Segfault in find_lattice_value() for complex operands.
Product: gcc
Version: 4.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
GCC build triplet: x86_64-pc-linux-gnu
GCC host triplet: x86_64-pc-linux-gnu
GCC target triplet: x86_64-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26869
[Bug middle-end/26717] [4.2 Regression] complex/complex gives a REAL_CST
--- Comment #7 from uros at kss-loka dot si 2006-03-23 10:33 --- Patch at http://gcc.gnu.org/ml/gcc-patches/2006-03/msg01435.html -- uros at kss-loka dot si changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |uros at kss-loka dot si |dot org | URL||http://gcc.gnu.org/ml/gcc- ||patches/2006- ||03/msg01435.html Status|NEW |ASSIGNED Last reconfirmed|2006-03-16 16:21:05 |2006-03-23 10:33:46 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26717
[Bug target/13685] Building simple test application with -march=pentium3 -Os gives SIGSEGV (unaligned sse instruction)
--- Comment #17 from uros at kss-loka dot si 2006-02-22 10:15 --- Works OK with gcc-4.2 and -Os -msse -fomit-frame-pointer. -- uros at kss-loka dot si changed: What|Removed |Added CC||uros at kss-loka dot si http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13685
[Bug driver/26274] New: gcc --target-help segfaults
gcc segfaults in print_filtered_help when invoked with the --target-help option:
Starting program: /export/home/uros/gcc-build/gcc/cc1 --target-help
Target specific options:
Program received signal SIGSEGV, Segmentation fault.
0x0838547b in print_filtered_help (flag=4194304) at ../../gcc-svn/trunk/gcc/opts.c:1335
1335 memset (printed, 0, cl_options_count);
(gdb) bt
#0 0x0838547b in print_filtered_help (flag=4194304) at ../../gcc-svn/trunk/gcc/opts.c:1335
#1 0x0838635e in decode_options (argc=2, argv=0xba24) at ../../gcc-svn/trunk/gcc/opts.c:746
#2 0x083eb77b in toplev_main (argc=2, argv=0xba24) at ../../gcc-svn/trunk/gcc/toplev.c:1970
print_filtered_help (unsigned int flag)
{
  unsigned int i, len, filter, indent = 0;
  bool duplicates = false;
  const char *help, *opt, *tab;
  static char *printed;
  if (flag == CL_COMMON || flag == CL_TARGET)
    {
      filter = flag;
      if (!printed)
        printed = xmalloc (cl_options_count);
      memset (printed, 0, cl_options_count);
    }
  else
--
Summary: gcc --target-help segfaults
Product: gcc
Version: 4.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: driver
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26274
[Bug target/17390] missing floating point compare optimization
--- Comment #8 from uros at kss-loka dot si 2006-01-18 09:50 --- (In reply to comment #7) Hmm, I get (but that looks like different branch predictions): It looks like your default is -mtune=pentium.
_testf:
        fldl    4(%esp)
        ftst
        fnstsw  %ax
        testb   $64, %ah
        jne     L10
        ftst
        fnstsw  %ax
        fstp    %st(0)
        testb   $69, %ah
        jne     L5
        fld1
        ret
        .align 2,0x90
L10:
        fstp    %st(0)
        fldz
        ret
L5:
        flds    LC2
        ret
With the proposed patch, this code is compiled to (-O2 -ffast-math -mtune=pentium -fomit-frame-pointer):
testf:
        fldl    4(%esp)
        ftst
        fnstsw  %ax
        fstp    %st(0)
        testb   $64, %ah
        jne     .L10
        testb   $69, %ah
        jne     .L5
        fld1
        ret
        .p2align 4,,7
.L10:
        fldz
        ret
.L5:
        flds    .LC2
        ret
and for -mtune=i686:
testf:
        fldl    4(%esp)
        ftst
        fnstsw  %ax
        fstp    %st(0)
        sahf
        je      .L10
        jbe     .L5
        fld1
        ret
        .p2align 4,,7
.L10:
        fldz
        .p2align 4,,8
        ret
.L5:
        flds    .LC2
        .p2align 4,,4
        ret
BTW: I'll attach a patch, rediffed to current SVN. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17390
[Bug target/17390] missing floating point compare optimization
--- Comment #9 from uros at kss-loka dot si 2006-01-18 09:53 --- Created an attachment (id=10666) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10666&action=view) patch to SVN GCC: (GNU) 4.2.0 20060117 (experimental) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17390
[Bug regression/25531] New: [4.0/4.1/4.2 Regression]: Handling of __attribute__ ((alias ("foo+X")))
A regression with __attribute__ ((alias ("foo+X"))) breaks newlib builds. The testcase is distilled from newlib-1.13.0/newlib/libc/ctype/ctype_.c:
--cut here--
static const char _foo_b[4] = { 'a', 'b', 'c', 'd' };
extern const char _foo_[4] __attribute__ ((alias ("_foo_b+2")));
--cut here--
gcc-3.4:
~/gcc-build-34/gcc/cc1 x.c test more x.s
        .file   "x.c"
        .section        .rodata
        .type   _foo_b, @object
        .size   _foo_b, 4
_foo_b:
        .byte   97
        .byte   98
        .byte   99
        .byte   100
        .globl  _foo_
        .set    _foo_,_foo_b+2
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4.5 20051110 (prerelease)"
gcc-4.x:
~/gcc-build/gcc/cc1 x.c
x.c:5: error: '_foo_' aliased to undefined symbol '_foo_b+2'
--
Summary: [4.0/4.1/4.2 Regression]: Handling of __attribute__ ((alias ("foo+X")))
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: regression
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25531
[Bug target/24475] gcc.dg/tls/pr24428.c execution test and gcc.dg/tls/pr24428-2.c execution test fail on IA32
--- Comment #10 from uros at kss-loka dot si 2005-12-02 06:59 --- Fixed on 4.1 and mainline. -- uros at kss-loka dot si changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED Target Milestone|--- |4.1.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24475
[Bug tree-optimization/20219] Missed optimisation sin / tan --> cos
--- Comment #3 from uros at kss-loka dot si 2005-11-28 07:20 --- Reopened to ... -- uros at kss-loka dot si changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|WONTFIX | http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20219
[Bug middle-end/20219] Missed optimisation sin / tan --> cos
--- Comment #5 from uros at kss-loka dot si 2005-11-28 07:32 --- ... close as FIXED. -- uros at kss-loka dot si changed: What|Removed |Added Status|REOPENED|RESOLVED Component|tree-optimization |middle-end Resolution||FIXED Target Milestone|--- |4.2.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20219
[Bug target/24476] [4.1/4.2 Regression] gcc.dg/tls/pr24428.c execution test and gcc.dg/tls/pr24428-2.c execution test fail on IA64
--- Comment #2 from uros at kss-loka dot si 2005-11-24 08:09 --- The testsuite patch that fixes IA32 tests (and should also fix IA64 issues reported here) is at http://gcc.gnu.org/ml/gcc-patches/2005-11/msg01059.html. Patch is still waiting for review, however I can't test it on IA64. -- uros at kss-loka dot si changed: What|Removed |Added BugsThisDependsOn||24475 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24476
[Bug rtl-optimization/24995] [4.1/4.2 Regression] gcc.dg/vect/vect-10.c fails for -march=athlon
--- Comment #2 from uros at kss-loka dot si 2005-11-24 10:19 --- This also fails for i686-pc-linux-gnu with '-march=athlon'. The patch at http://gcc.gnu.org/ml/gcc-patches/2005-11/msg01648.html fixes the x86_64-pc-linux-gnu failure in the original report and the -march=athlon failure. FWIW, -fomit-frame-pointer also fixes these failures. This PR is a duplicate of 24982. *** This bug has been marked as a duplicate of 24982 *** -- uros at kss-loka dot si changed: What|Removed |Added Status|NEW |RESOLVED GCC target triplet|x86_64-linux-gnu|i686-pc-linux-gnu Resolution||DUPLICATE Summary|[4.1/4.2 Regression]|[4.1/4.2 Regression] |gcc.dg/vect/vect-10.c fails |gcc.dg/vect/vect-10.c fails |on x86_64 with -m32 |for -march=athlon http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24995
[Bug target/24982] [4.1/4.2 Regression] Bootstrap failure with ICE in refers_to_regno_for_reload_p
--- Comment #5 from uros at kss-loka dot si 2005-11-24 10:19 --- *** Bug 24995 has been marked as a duplicate of this bug. *** -- uros at kss-loka dot si changed: What|Removed |Added CC||pinskia at gcc dot gnu dot ||org Bug 24982 depends on bug 24995, which changed state. Bug 24995 Summary: [4.1/4.2 Regression] gcc.dg/vect/vect-10.c fails for -march=athlon http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24995 What|Old Value |New Value Status|NEW |RESOLVED Resolution||DUPLICATE http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24982
[Bug rtl-optimization/24982] [4.1/4.2 Regression] Bootstrap failure with ICE in refers_to_regno_for_reload_p
--- Comment #6 from uros at kss-loka dot si 2005-11-24 10:24 --- (In reply to comment #4)
> I've proposed a patch to this PR in
> http://gcc.gnu.org/ml/gcc-patches/2005-11/msg01648.html Does it solve PR 24995?
Yes, both the x86_64 and the -march=athlon failures. -- uros at kss-loka dot si changed: What|Removed |Added CC||rth at gcc dot gnu dot org BugsThisDependsOn|24995 | URL||http://gcc.gnu.org/ml/gcc- ||patches/2005- ||11/msg01648.html Status|UNCONFIRMED |NEW Component|target |rtl-optimization Ever Confirmed|0 |1 GCC host triplet|sh4-*-linux-gnu | GCC target triplet|sh4-*-linux-gnu | Keywords||patch Last reconfirmed|-00-00 00:00:00 |2005-11-24 10:24:02 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24982
[Bug rtl-optimization/24982] [4.1/4.2 Regression] Bootstrap failure with ICE in refers_to_regno_for_reload_p
--- Comment #9 from uros at kss-loka dot si 2005-11-24 14:40 --- Critical, according to comment #7 and #8. -- uros at kss-loka dot si changed: What|Removed |Added CC||uros at kss-loka dot si Severity|normal |critical Priority|P3 |P1 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24982
[Bug target/24475] gcc.dg/tls/pr24428.c execution test and gcc.dg/tls/pr24428-2.c execution test fail on IA32
--- Comment #6 from uros at kss-loka dot si 2005-11-15 08:13 --- Perhaps a runtime check (check_effective_target_tls-runtime, say) should be added to target-supports.exp to check whether the system is capable of running TLS-enabled binaries. Alternatively, my proposed patch (http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00963.html) could try to run the TLS testcase instead of just compiling it. However, with { dg-require-effective-target tls-runtime } added, runtime tests will also be skipped on a system that is otherwise able to compile the testcases. The job of the compiler is IMO to compile sources correctly, and the purpose of a runtime test is to check whether the system is able to run the testcases. The runtime failure reported here just says that the tested system is not able to run the testcase and that the system should be upgraded/fixed. -- uros at kss-loka dot si changed: What|Removed |Added CC||uros at kss-loka dot si http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24475
[Bug target/24475] gcc.dg/tls/pr24428.c execution test and gcc.dg/tls/pr24428-2.c execution test fail on IA32
-- uros at kss-loka dot si changed: What|Removed |Added CC|uros at kss-loka dot si | AssignedTo|unassigned at gcc dot gnu |uros at kss-loka dot si |dot org | URL||http://gcc.gnu.org/ml/gcc- ||patches/2005- ||11/msg01059.html Status|UNCONFIRMED |ASSIGNED Ever Confirmed|0 |1 Keywords||patch Last reconfirmed|-00-00 00:00:00 |2005-11-15 13:43:11 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24475
[Bug libgomp/24797] Segfault in libgomp.c/nested-1.c
--- Comment #2 from uros at kss-loka dot si 2005-11-14 07:13 --- Fixed by Jakub's patch. -- uros at kss-loka dot si changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED Target Milestone|--- |4.1.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24797
[Bug rtl-optimization/15439] ICE with -fschedule-insns2 -fsched2-use-traces
--- Comment #4 from uros at kss-loka dot si 2005-11-11 08:20 --- This is in fact a duplicate of PR 19340. Fixed in 3.4.5. *** This bug has been marked as a duplicate of 19340 *** -- uros at kss-loka dot si changed: What|Removed |Added Status|ASSIGNED|RESOLVED Known to work|4.0.0 |4.0.0 3.4.5 Resolution||DUPLICATE Target Milestone|--- |3.4.5 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15439
[Bug target/19340] Compilation SEGFAULTs with -O1 -fschedule-insns2 -fsched2-use-traces on an x86 architecture.
--- Comment #10 from uros at kss-loka dot si 2005-11-11 08:20 --- *** Bug 15439 has been marked as a duplicate of this bug. *** -- uros at kss-loka dot si changed: What|Removed |Added CC||coyote at coyotegulch dot ||com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19340
[Bug libgomp/24797] New: Segfault in libgomp.c/nested-1.c
Hello! Testcase libgomp.c/nested-1.c currently segfaults when run on i686-pc-linux-gnu (pentium4) with a non-TLS libc (Redhat 8.0):
(gdb) run
[Thread debugging using libthread_db enabled]
[New Thread 8192 (LWP 700)]
[New Thread 16385 (LWP 702)]
[New Thread 8194 (LWP 703)]
[New Thread 16387 (LWP 704)]
[New Thread 24580 (LWP 705)]
[New Thread 32773 (LWP 706)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 32773 (LWP 706)]
0x42073fe0 in _int_free () from /lib/i686/libc.so.6
(gdb)
Backtrace:
#6 0x420da1ca in thread_start () from libc.so.6
#5 0x4005fa45 in pthread_start_thread_event () from libpthread.so.0
#4 0x4005f94d in pthread_start_thread () from libpthread.so.0
#3 0x4005e65a in __pthread_do_exit () from libpthread.so.0
#2 0x4006233c in __pthread_destroy_specifics () from libpthread.so.0
#1 0x42074a2c in free () from libc.so.6
#0 0x42073fe0 in _int_free () from libc.so.6
--
Summary: Segfault in libgomp.c/nested-1.c
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24797
[Bug rtl-optimization/24319] [3.4/4.0/4.1 regression] amd64 register spill error with -fschedule-insns
--- Comment #6 from uros at kss-loka dot si 2005-11-09 15:27 --- The problem is caused by the combination of (1) the x86_64 parameter passing convention, (2) x86 instructions that _require_ parameters in specific registers and (3) the sched1 scheduling pass.
ad 1) x86_64 passes function parameters in registers, in the order defined in the x86_64_int_parameter_registers[] array:
5 /*RDI*/, 4 /*RSI*/, 1 /*RDX*/, 2 /*RCX*/, FIRST_REX_INT_REG /*R8 */, FIRST_REX_INT_REG + 1 /*R9 */
Additionally, RAX is used as a hidden argument register. In the original example, the call sequence to memory_to_string is constructed as:
(insn 17 15 18 0 (set (reg:DI 4 si) (reg:DI 61)) 81 {*movdi_1_rex64} (insn_list:REG_DEP_TRUE 15 (nil)) (expr_list:REG_DEAD (reg:DI 61) (nil)))
(insn 18 17 19 0 (set (reg:DI 5 di [ c_string ]) (reg/v/f:DI 60 [ c_string ])) 81 {*movdi_1_rex64} (nil) (expr_list:REG_DEAD (reg/v/f:DI 60 [ c_string ]) (nil)))
(call_insn 19 18 20 0 (set (reg:DI 0 ax) (call (mem:QI (symbol_ref:DI ("memory_to_string") [flags 0x3] <function_decl 0x4044f080 memory_to_string>) [0 S1 A8]) (const_int 0 [0x0]))) 732 {*call_value_0_rex64} (insn_list:REG_DEP_TRUE 17 (insn_list:REG_DEP_TRUE 18 (nil))) (expr_list:REG_DEAD (reg:DI 4 si) (expr_list:REG_DEAD (reg:DI 5 di [ c_string ]) (expr_list:REG_EH_REGION (const_int 0 [0x0]) (nil (expr_list:REG_DEP_TRUE (use (reg:DI 5 di [ c_string ])) (expr_list:REG_DEP_TRUE (use (reg:DI 4 si)) (nil
ad 2) Please note that this sequence can be found just after the *strlenqi_rex_1 mega-pattern.
This pattern requires parameters to be put in exactly defined registers:
(define_insn "*strlenqi_rex_1" [(set (match_operand:DI 0 "register_operand" "=c") (unspec:DI [(mem:BLK (match_operand:DI 5 "register_operand" "1")) (match_operand:QI 2 "register_operand" "a") (match_operand:DI 3 "immediate_operand" "i") (match_operand:DI 4 "register_operand" "0")] UNSPEC_SCAS)) (use (reg:SI DIRFLAG_REG)) (clobber (match_operand:DI 1 "register_operand" "=D")) (clobber (reg:CC FLAGS_REG))]
However, at the time of the sched1 pass (before reload) hard registers are not known yet. We have the following RTL pattern just above the memory_to_string call sequence (reg_notes are not shown for clarity):
(insn 13 12 14 0 (parallel [ (set (reg:DI 63) (unspec:DI [ (mem:BLK (reg/f:DI 65 [ c_string ]) [0 A8]) (reg:QI 67) (const_int 1 [0x1]) (reg:DI 66) ] 20)) (use (reg:SI 19 dirflag)) (clobber (reg/f:DI 65 [ c_string ])) (clobber (reg:CC 17 flags)) ]) 511 {*strlenqi_rex_1}
ad 3) The sched1 pass is free to move (insn 17) and (insn 18) before (insn 13), as it doesn't recognize register allocation conflicts between these instructions. Following that move, reload has no registers to spill and ICEs. The testcase from comment #3 ICEs with:
error: unable to find a register to spill in class 'AREG'
Here, the same problem can be observed.
As foo is missing a prototype, the hidden RAX register gets allocated in addition to RDI:
(insn 20 18 21 0 (set (reg:DI 5 di) (reg:DI 61)) 81 {*movdi_1_rex64} (insn_list:REG_DEP_TRUE 18 (nil)) (expr_list:REG_DEAD (reg:DI 61) (nil)))
(insn 21 20 22 0 (set (reg:QI 0 ax) (const_int 0 [0x0])) 55 {*movqi_1} (nil) (nil))
(call_insn 22 21 23 0 (set (reg:SI 0 ax) (call (mem:QI (symbol_ref:DI ("foo") [flags 0x41] <function_decl 0x402cbd80 foo>) [0 S1 A8]) (const_int 0 [0x0]))) 732 {*call_value_0_rex64} (insn_list:REG_DEP_TRUE 20 (insn_list:REG_DEP_TRUE 21 (nil))) (expr_list:REG_DEAD (reg:DI 5 di) (nil)) (expr_list:REG_DEP_TRUE (use (reg:QI 0 ax)) (expr_list:REG_DEP_TRUE (use (reg:DI 5 di)) (nil
The set of the AX register is then moved before the strlenqi_rex_1 pattern, and this blocks the AX register. (BTW: If a prototype of foo is added, this particular testcase compiles OK.) One possible fix for this problem would be not to schedule instructions that have assigned hard registers (the move insns in the above case). Considering the number of x86 instructions that require fixed registers, I would suggest that the bugmasters raise the priority of this bug. The x86 backend should not have these problems, but using -mregparm=X I think it could also be tricked into this sort of ICE. (BTW: I have added Jim Wilson to the CC of this bug, as he is the current maintainer of the insn scheduling pass code. Perhaps he has some ideas on how to solve this problem.) -- uros at kss-loka dot si changed: What|Removed |Added CC||wilson at gcc dot gnu dot
[Bug target/24315] [3.4 Regression] amd64 fails -fpeephole2
--- Comment #17 from uros at kss-loka dot si 2005-11-10 07:31 --- Fixed on 3.4 branch. -- uros at kss-loka dot si changed: What|Removed |Added Status|ASSIGNED|RESOLVED Known to fail|3.3.5 4.0.2 3.4.5 |3.3.5 4.0.2 Known to work|3.2.3 4.1.0 4.0.3 |3.2.3 4.1.0 4.0.3 3.4.5 Resolution||FIXED Target Milestone|4.0.3 |3.4.5 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24315
[Bug target/19340] Compilation SEGFAULTs with -O1 -fschedule-insns2 -fsched2-use-traces on an x86 architecture.
--- Comment #9 from uros at kss-loka dot si 2005-11-10 07:33 --- Fixed on 3.4 branch. -- uros at kss-loka dot si changed: What|Removed |Added Known to work|4.0.3 4.1.0 |4.0.3 4.1.0 3.4.5 Target Milestone|4.0.3 |3.4.5 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19340
[Bug target/19340] Compilation SEGFAULTs with -O1 -fschedule-insns2 -fsched2-use-traces on an x86 architecture.
--- Comment #7 from uros at kss-loka dot si 2005-11-08 08:12 --- Fixed on mainline and 4.0 branch. -- uros at kss-loka dot si changed: What|Removed |Added Status|ASSIGNED|RESOLVED Known to fail|3.4.0 4.0.0 |3.4.0 Known to work||4.0.3 4.1.0 Resolution||FIXED Target Milestone|--- |4.0.3 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19340
[Bug c/24101] [3.4/4.0/4.1 Regression] Segfault with preprocessed source
--- Comment #9 from uros at kss-loka dot si 2005-11-08 10:04 --- Patch here: http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00498.html -- uros at kss-loka dot si changed: What|Removed |Added URL||http://gcc.gnu.org/ml/gcc- ||patches/2005- ||11/msg00498.html Keywords||patch http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24101
[Bug target/24265] [4.1 Regression] ICE: in extract_insn, at recog.c:2084 with -O -fgcse -fmove-loop-invariants -mtune=pentiumpro
--- Comment #7 from uros at kss-loka dot si 2005-11-08 12:40 --- Created an attachment (id=10173) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10173&action=view) Patch to fix the ICE This patch fixes the failure for me, but... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24265
[Bug target/24265] [4.1 Regression] ICE: in extract_insn, at recog.c:2084 with -O -fgcse -fmove-loop-invariants -mtune=pentiumpro
--- Comment #8 from uros at kss-loka dot si 2005-11-08 12:53 ---
> This patch fixes the failure for me, but...
... we actually gain nothing here. From .loop2_done, we have the following sequence, where the mem->reg load is pushed out of the loop:
(insn 21 16 39 0 (set (reg:DF 64) (mem/u/c/i:DF (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [0 S8 A64])) -1 (nil) (nil))
;; End of basic block 0, registers live: (nil)
(note 39 21 17 NOTE_INSN_LOOP_BEG)
;; Start of basic block 1, registers live: (nil)
(code_label 17 39 18 1 2 [1 uses])
(note 18 17 47 1 [bb 1] NOTE_INSN_BASIC_BLOCK)
(insn 47 18 22 1 (set (mem:DF (plus:SI (reg/f:SI 7 sp) (const_int 8 [0x8])) [0 S8 A32]) (reg:DF 64)) -1 (nil) (nil))
However, in .postreload, insn 21 (now insn 53) is moved back _into_ the loop (why?):
(note 21 16 39 0 NOTE_INSN_DELETED)
;; End of basic block 0, registers live: 6 [bp] 7 [sp] 16 [argp] 20 [frame] 60 64
(note 39 21 17 NOTE_INSN_LOOP_BEG)
;; Start of basic block 1, registers live: 6 [bp] 7 [sp] 59 60 64
(code_label 17 39 18 1 2 [1 uses])
(note 18 17 53 1 [bb 1] NOTE_INSN_BASIC_BLOCK)
(insn 53 18 47 1 (set (reg:DF 8 st) (mem/u/c/i:DF (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [0 S8 A64])) 63 {*movdf_nointeger} (nil) (nil))
(insn 47 53 54 1 (set (mem:DF (plus:SI (reg/f:SI 7 sp) (const_int 8 [0x8])) [0 S8 A32]) (reg:DF 8 st)) 63 {*movdf_nointeger} (nil) (nil))
The proposed patch thus only fixes the damage. Otherwise, all this register moving/copying doesn't gain anything, as reload fixes something on its own. Also, REG_EQUAL notes are lost (before and after the patch). This results in the following asm:
        ...
        movl    $-1717986918, 8(%esp)
        movl    $1070176665, 12(%esp)
        fldl    -16(%ebp)
        fstpl   (%esp)
        call    dset
        movl    $1, %ebx
.L2:
        fldl    .LC0            <-- reload moves this insn back into the loop
        fstpl   8(%esp)
        fldl    -16(%ebp)
        fstpl   (%esp)
        call    dset
        incl    %ebx
        cmpl    $4, %ebx
        jne     .L2
        addl    $36, %esp
        ...
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24265
[Bug target/24265] [4.1 Regression] ICE: in extract_insn, at recog.c:2084 with -O -fgcse -fmove-loop-invariants -mtune=pentiumpro
--- Comment #9 from uros at kss-loka dot si 2005-11-08 13:23 --- Bah... set_unique_reg_note is needed:
  /* If new move insn is invalid (i.e. move of const_double to 387 stack
     register), force constant into memory.  */
  if (recog_memoized (inv->insn) == -1)
    {
      rtx src = SET_SRC (set);
      if (GET_CODE (src) == CONST_DOUBLE)
        {
          SET_SRC (set) = validize_mem (force_const_mem (mode, src));
          set_unique_reg_note (inv->insn, REG_EQUAL, src);
        }
    }
to produce:
        movl    $1, %ebx
.L2:
        movl    $-1717986918, 8(%esp)
        movl    $1070176665, 12(%esp)
        fldl    -16(%ebp)
        fstpl   (%esp)
        call    dset
        addl    $1, %ebx
        cmpl    $4, %ebx
        jne     .L2
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24265
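For readers of the assembly above: the two movl immediates stored at 8(%esp) and 12(%esp) appear to be the low and high 32-bit halves of one double constant, written directly instead of being loaded from .LC0. The sketch below reassembles them, assuming an IEEE-754 binary64 `double` whose byte order matches that of 64-bit integers (as on x86); `double_from_halves` is a hypothetical helper, not a GCC function:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rebuild a double from the two 32-bit immediates, where lo is the
   word at the lower address (8(%esp)) and hi the word at 12(%esp).
   Assumes IEEE-754 binary64 and matching integer/float byte order. */
double double_from_halves(int32_t lo, int32_t hi)
{
    uint64_t bits = ((uint64_t)(uint32_t) hi << 32) | (uint32_t) lo;
    double d;

    memcpy(&d, &bits, sizeof d);   /* reinterpret the bits, no conversion */
    return d;
}

/* Hypothetical self-check: the pair from the asm decodes to 0.2. */
void check_halves(void)
{
    assert(double_from_halves(-1717986918, 1070176665) == 0.2);
}
```

Under these assumptions the pair decodes to 0x3FC999999999999A, the nearest double to 0.2; that this is the same value held in .LC0 is an inference from the asm, not something the comment states.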