[Bug target/52941] SH Target: Add support for movco.l / movli.l atomics on SH4A

2012-04-16 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52941

--- Comment #6 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-04-16 
22:37:31 UTC ---
(In reply to comment #5)
The patch looks just fine.  I don't mind whether those atomics are
fully optimized or not ATM.  Programs having atomics in the inner
loop are pathological in the first place, I think.


[Bug target/52941] SH Target: Add support for movco.l / movli.l atomics on SH4A

2012-04-16 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52941

--- Comment #8 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-04-17 
00:54:00 UTC ---
(In reply to comment #7)
> Created attachment 27173 [details]
> Proposed patch

Looks even better.

> Only one thing ... is it safe to do the
> @-r15, @+r15 stuff in the atomic sequence?  I remember there were some
> border cases where things would blow up, but can't recall.  I've also briefly
> checked with atomic vars being on the stack and it looks OK.

I don't know of any such restrictions, though my knowledge of
SH4A is very limited.  Perhaps some weird interaction of ll/sc
and cache?  Anyway, if it's only a border case, the patch is OK.
I'd like to pre-approve it.


[Bug target/52941] SH Target: Add support for movco.l / movli.l atomics on SH4A

2012-04-12 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52941

--- Comment #3 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-04-13 
03:29:25 UTC ---
(In reply to comment #2)
> One more thing regarding movco/movli ... do you think it's OK to use them also
> to do atomics on types < SImode?  As far as I can see it should be safe to do
> e.g. read SImode, modify QImode subreg, write-back SImode.

Yes, it'll make false-positive cases but would be safe.
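
As a rough illustration of the "read SImode, modify QImode, write back SImode" idea discussed above, here is a minimal sketch in portable C.  It uses GCC's __atomic builtins on an aligned 32-bit word as a stand-in for the movli.l/movco.l pair; the function name atomic_add_byte_via_word is hypothetical and this is not the code from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): atomically add DELTA to the
   byte at index BYTE_INDEX (0..3, in memory order) of the aligned 32-bit
   word *WORD, using only word-sized atomic operations.  Returns the
   previous value of that byte.  */
uint8_t atomic_add_byte_via_word(uint32_t *word, unsigned byte_index,
                                 uint8_t delta)
{
    /* "Load linked": read the whole word.  */
    uint32_t old_word = __atomic_load_n(word, __ATOMIC_RELAXED);

    for (;;) {
        uint8_t bytes[4];
        uint32_t new_word;
        uint8_t old_byte;

        /* Modify only the selected byte; the other three are copied back
           unchanged.  memcpy keeps this independent of endianness.  */
        memcpy(bytes, &old_word, sizeof bytes);
        old_byte = bytes[byte_index];
        bytes[byte_index] = (uint8_t)(old_byte + delta);
        memcpy(&new_word, bytes, sizeof new_word);

        /* "Store conditional": succeeds only if the whole word is still
           unchanged.  On failure old_word is refreshed and we retry.  */
        if (__atomic_compare_exchange_n(word, &old_word, new_word, 1,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
            return old_byte;
    }
}
```

Note that the retry is also taken when a different byte of the same word was modified in the meantime.  That is the false-positive case: extra retries, but no wrong result.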


[Bug target/52898] SH Target: Inefficient DImode comparisons

2012-04-11 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52898

--- Comment #3 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-04-12 
01:13:15 UTC ---
(In reply to comment #2)
I don't know about their history.  -mcbranchdi is enabled by default,
though.  See gcc/common/config/sh/sh-common.c:sh_option_optimization_table.
Unfortunately, it looks like -mcmpeqdi causes many new failures on trunk.


[Bug target/52941] SH Target: Add support for movco.l / movli.l atomics on SH4A

2012-04-11 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52941

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-04-12 
01:18:42 UTC ---
(In reply to comment #0)
> Other than that, should we add another option '-mhard-atomic' (which would
> enable the movco/movli atomics on SH4A and disable all atomic insns for
> non-SH4A)?

I think so.

> Actually, I think the options should be '-msp-atomic' and '-mmp-atomic', where
> '-msp-atomic' would be the current '-msoft-atomic'.

I don't think that -msp-atomic/-mmp-atomic are good names here.  The SP/MP
notion is not directly connected with the soft/hard implementation of atomics,
even if soft atomics are impossible on a real MP system.  Hard atomics
should work with both SP and MP.  I guess that the point is the necessity
of kernel (i.e. software) services.  If the atomics require kernel services,
they are soft atomics even if some of them utilize LL/SC-like insns.
If they don't require any kernel services, they are hard atomics.
Using -msp-atomic for soft atomics looks a bit misleading from this point
of view.
Perhaps an unsurprising way would be to enable movco/movli on SH4A with both
-msoft-atomic/-mhard-atomic if we can.


[Bug libstdc++/29366] atomics config for sh is weird

2012-04-11 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29366

--- Comment #3 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-04-12 
01:20:06 UTC ---
(In reply to comment #2)
> I think some of the problems will disappear once PR 52941 is done.
> After that, and having the new atomic builtins of 4.7, we could get rid of the
> config/cpu/sh/atomicity.h file completely, if I'm not mistaken.

Agreed.


[Bug target/48806] ICE in reload_cse_simplify_operands, at postreload.c:403

2012-03-22 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48806

--- Comment #6 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-22 
21:39:51 UTC ---
Author: kkojima
Date: Thu Mar 22 21:39:45 2012
New Revision: 185714

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185714
Log:
Backported from mainline
2012-03-02  Kaz Kojima  kkoj...@gcc.gnu.org

PR target/48596
PR target/48806
* config/sh/sh.c (sh_register_move_cost): Increase cost between
GENERAL_REGS and FP_REGS for SImode.


Modified:
branches/gcc-4_7-branch/gcc/ChangeLog
branches/gcc-4_7-branch/gcc/config/sh/sh.c


[Bug rtl-optimization/48596] [4.7/4.8 Regression] [SH] unable to find a register to spill in class 'FPUL_REGS'

2012-03-22 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48596

--- Comment #9 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-22 
21:39:51 UTC ---
Author: kkojima
Date: Thu Mar 22 21:39:45 2012
New Revision: 185714

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185714
Log:
Backported from mainline
2012-03-02  Kaz Kojima  kkoj...@gcc.gnu.org

PR target/48596
PR target/48806
* config/sh/sh.c (sh_register_move_cost): Increase cost between
GENERAL_REGS and FP_REGS for SImode.


Modified:
branches/gcc-4_7-branch/gcc/ChangeLog
branches/gcc-4_7-branch/gcc/config/sh/sh.c


[Bug rtl-optimization/48596] [4.7/4.8 Regression] [SH] unable to find a register to spill in class 'FPUL_REGS'

2012-03-22 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48596

Kazumoto Kojima kkojima at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED

--- Comment #10 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-22 
22:19:47 UTC ---
Fixed.


[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-19 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #35 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-20 
01:45:14 UTC ---
(In reply to comment #34)
> Interesting, thanks!  I'll also test your patch and send it around, OK?

OK, thanks!

> I'm a bit confused... was the issue caused by my patches for this PR, or by
> something else?

I guess that it was caused by other changes but was latent for a while.


[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-15 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #33 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-15 
07:52:21 UTC ---
(In reply to comment #31)
> Created attachment 26859 [details]
> testresult on sh4-unknown-linux-gnu [trunk revision 185088].

FYI, looking into the libstdc++ failures for sh4-unknown-linux-gnu,
it seems that a call insn was moved before the prologue frame insns,
which confuses the unwinder.  -fno-delayed-branch also stops
that swapping for these failing cases.  The patch below works for me.

* config/sh/sh.c (sh_expand_prologue): Emit blockage at the end
of prologue for unwinder and profiler.

--- ORIG/trunk/gcc/config/sh/sh.c2012-03-06 10:28:32.0 +0900
+++ trunk/gcc/config/sh/sh.c2012-03-14 20:22:15.0 +0900
@@ -7234,6 +7234,13 @@ sh_expand_prologue (void)
   emit_insn (gen_shcompact_incoming_args ());
 }

+  /* If we are profiling, make sure no instructions are scheduled before
+     the call to mcount.  Similarly if some call instructions are swapped
+     before frame related insns, it'll make unwinder confused because
+     currently SH has no unwind info for function epilogues.  */
+  if (crtl->profile || flag_exceptions || flag_unwind_tables)
+    emit_insn (gen_blockage ());
+
   if (flag_stack_usage_info)
 current_function_static_stack_size = stack_usage;
 }


[Bug target/52479] SH Target: SH4A DFmode fsca tests failing

2012-03-15 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52479

--- Comment #3 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-16 
03:21:08 UTC ---
There is no concrete definition of -ffast-math and users will have
different expectations.  Numerical programs for astrodynamics may
expect precision even with -ffast-math and OTOH there are many
programs which don't care about precision at all and prefer
sincos instead of sinfcosf.  I guess that the former doesn't make
much sense for SH4A in the first place and for users of the latter
with -ffast-math, the proposed change may be a bit surprising.
I have no strong opinion for this, though.  I also won't object to
the suggested patch.


[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-09 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #29 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-09 
08:40:32 UTC ---
(In reply to comment #28)
Regtest on sh4-unknown-linux-gnu has been done successfully.
Oleg, your patch is pre-approved.


[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-09 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #31 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-09 
10:36:31 UTC ---
Created attachment 26859
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26859
A test result

testresult on sh4-unknown-linux-gnu [trunk revision 185088].


[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-08 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #24 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-08 
11:11:32 UTC ---
(In reply to comment #23)
> Kaz, if you have some time, could you try it out in your setup, too please?

On trunk revision 185088, for sh4-unknown-linux-gnu, the result of
compare_tests is:

New tests that FAIL:

gfortran.dg/associated_4.f90  -O1  execution test
gfortran.dg/forall_4.f90  -O3 -fomit-frame-pointer  execution test
gfortran.dg/forall_4.f90  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
gfortran.dg/forall_4.f90  -O3 -fomit-frame-pointer -funroll-loops  execution test
gfortran.dg/forall_4.f90  -O3 -g  execution test

Old tests that failed, that have disappeared: (Eeek!)

22_locale/ctype/is/char/3.cc execution test
27_io/basic_filebuf/underflow/wchar_t/9178.cc execution test
gfortran.dg/widechar_intrinsics_6.f90  -Os  execution test


[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-08 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #25 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-08 
11:13:39 UTC ---
Created attachment 26854
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26854
worked .s file associated_4_good.s


[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-08 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #26 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-08 
11:16:39 UTC ---
Created attachment 26855
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26855
unworked .s file associated_4_bad.s

I've attached .s files against gfortran.dg/associated_4.f90 -O1 with
patched/unpatched compilers.


[Bug fortran/34040] Support for DOUBLE_TYPE_SIZE != 64 targets

2012-03-08 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34040

Kazumoto Kojima kkojima at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |olegendo at gcc dot gnu.org

--- Comment #12 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-08 
22:15:33 UTC ---
*** Bug 52535 has been marked as a duplicate of this bug. ***


[Bug libfortran/52535] SH Target: libfortran won't build for sub-targets where DFmode is set to SFmode?

2012-03-08 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52535

Kazumoto Kojima kkojima at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |DUPLICATE

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-08 
22:15:32 UTC ---
This is a known issue as PR34040.

*** This bug has been marked as a duplicate of bug 34040 ***


[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-08 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #28 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-09 
01:44:52 UTC ---
(In reply to comment #27)
> Created attachment 26858 [details]
> Patch for the patch

It looks like all Fortran regressions have gone away.  I'll run full tests
on sh4-unknown-linux-gnu.


[Bug target/52503] sh-wrs-vxworks: too many target masks

2012-03-07 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52503

--- Comment #3 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-07 
22:06:28 UTC ---
Author: kkojima
Date: Wed Mar  7 22:06:25 2012
New Revision: 185081

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185081
Log:
PR target/52503
* config/sh/sh.opt (msoft-atomic): Use Var instead of Mask.
* config/sh/linux.h (TARGET_DEFAULT): Remove MASK_SOFT_ATOMIC.
(SUBTARGET_OVERRIDE_OPTIONS): Define.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/linux.h
trunk/gcc/config/sh/sh.opt


[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-06 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #15 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-06 
08:49:27 UTC ---
(In reply to comment #14)
> I've run the testsuite on rev 184966 (without fortran though), but the failures
> that you've mentioned did not show up.  Usually when I rebuild the whole
> toolchain including newlib, I have C/CPP/CXXFLAGS_FOR_TARGET set to '-Os
> -mpretend-cmove'.  This time I removed those, but the results seem to be the
> same.  Could you also please try again?  This is suspicious...

I've seen the same failures on sh4-unknown-linux-gnu for trunk rev 184971.
After backing out the r184966 changes, they went away.  Weird.


[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-06 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #17 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-06 
10:36:01 UTC ---
Created attachment 26837
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26837
preprocessed file ctype_configure_char.i


[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-06 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #18 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-06 
10:37:13 UTC ---
Created attachment 26838
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26838
worked .s file ctype_configure_char_good.s


[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-06 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #19 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-06 
10:38:22 UTC ---
Created attachment 26839
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26839
unworked .s file ctype_configure_char_bad.s


[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-06 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #20 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-06 
10:40:31 UTC ---
(In reply to comment #16)
> Can we keep the r184966 changes anyways?  I will keep an eye on these failures
> whether I can reproduce them.  If you have some time, could you please send me
> the intermediate .i and .s files of the failing and passing version of the
> '22_locale/ctype/is/char/3.cc' test case?

I've confirmed that 22_locale/ctype/is/char/3.cc doesn't fail
when linked with a libstdc++.a which was built with the compiler
without the r184966 changes.  The .s files for 3.cc are the same with
both compilers.  It looks like the problematic object is
libstdc++-v3/src/c++98/ctype_configure_char.o because the error
went away when replacing it with the other one.  I've attached .i and
.s files for that file.  The options used are

COLLECT_GCC_OPTIONS='-shared-libgcc' '-B' '/exp/ldroot/dodes/xsh-gcc/./gcc'
'-nostdinc++'
'-L/exp/ldroot/dodes/xsh-gcc-orig/sh4-unknown-linux-gnu/libstdc++-v3/src'
'-L/exp/ldroot/dodes/xsh-gcc-orig/sh4-unknown-linux-gnu/libstdc++-v3/src/.libs'
'-B' '/usr/local/sh4-unknown-linux-gnu/bin/' '-B'
'/usr/local/sh4-unknown-linux-gnu/lib/' '-isystem'
'/usr/local/sh4-unknown-linux-gnu/include' '-isystem'
'/usr/local/sh4-unknown-linux-gnu/sys-include' '-I'
'/exp/ldroot/dodes/ORIG/trunk/libstdc++-v3/../libgcc' '-I'
'/exp/ldroot/dodes/xsh-gcc-orig/sh4-unknown-linux-gnu/libstdc++-v3/include/sh4-unknown-linux-gnu'
'-I'
'/exp/ldroot/dodes/xsh-gcc-orig/sh4-unknown-linux-gnu/libstdc++-v3/include'
'-I' '/exp/ldroot/dodes/ORIG/trunk/libstdc++-v3/libsupc++'
'-fno-implicit-templates' '-Wall' '-Wextra' '-Wwrite-strings' '-Wcast-qual'
'-Wabi' '-fdiagnostics-show-location=once' '-ffunction-sections'
'-fdata-sections' '-frandom-seed=ctype_configure_char.lo' '-g' '-O2' '-D'
'_GNU_SOURCE' '-S' '-fPIC' '-D' 'PIC' '-o'


[Bug target/52503] sh-wrs-vxworks: too many target masks

2012-03-06 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52503

--- Comment #2 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-06 
23:18:16 UTC ---
Created attachment 26845
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26845
A patch

config/sh/linux.h requires a few changes too.


[Bug target/48806] ICE in reload_cse_simplify_operands, at postreload.c:403

2012-03-05 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48806

--- Comment #4 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-06 
00:26:30 UTC ---
It looks like the testcase came from FreeBSD kernel code:
http://www.leidinger.net/FreeBSD/dox/net80211/html/d7/d8d/ieee80211__crypto__ccmp_8c_source.html

gcc.c-torture/execute/pr20527-1.c is an example of a gcc testcase
which includes a BSD-like license notice, though I'm not sure about
copyright issues.  The gcc list would be more appropriate for
the questions about copyrights.


[Bug target/52479] SH Target: SH4A DFmode fsca tests failing

2012-03-04 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52479

Kazumoto Kojima kkojima at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aoliva at gcc dot gnu.org

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-04 
13:46:58 UTC ---
I'd like to add Alex to the CC list.  Alex, what do you think?


[Bug target/52480] SH Target: SH4A movua.l does not work for big endian

2012-03-04 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52480

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-05 
05:30:18 UTC ---
Created attachment 26831
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26831
A possible patch

Looks to be a similar problem to PR52394.


[Bug target/52483] SH Target: Loads from volatile memory leave redundant sign/zero extensions

2012-03-04 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52483

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-05 
05:33:39 UTC ---
(In reply to comment #0)
> Maybe a few peepholes would help here?

Sure.  A peephole looks reasonable for this.


[Bug rtl-optimization/48596] [4.7/4.8 Regression] [SH] unable to find a register to spill in class 'FPUL_REGS'

2012-03-02 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48596

--- Comment #6 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-02 
23:59:16 UTC ---
Author: kkojima
Date: Fri Mar  2 23:59:08 2012
New Revision: 184844

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=184844
Log:
PR target/48596
PR target/48806
* config/sh/sh.c (sh_register_move_cost): Increase cost between
GENERAL_REGS and FP_REGS for SImode.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c


[Bug target/48806] ICE in reload_cse_simplify_operands, at postreload.c:403

2012-03-02 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48806

--- Comment #2 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-02 
23:59:17 UTC ---
Author: kkojima
Date: Fri Mar  2 23:59:08 2012
New Revision: 184844

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=184844
Log:
PR target/48596
PR target/48806
* config/sh/sh.c (sh_register_move_cost): Increase cost between
GENERAL_REGS and FP_REGS for SImode.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c


[Bug target/52441] SH Target: Double sign/zero extensions for function arguments

2012-03-01 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52441

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-01 
22:00:14 UTC ---
(In reply to comment #0)
> The sign/zero extensions in the caller (_xx) are not emitted when using the
> original Renesas ABI (-mrenesas), which is correct.

Correct for efficiency, but not for robustness :-)

> Maybe this double sign/zero extension has some historical reason for some ABI
> backwards compatibilities in the GNU SH ABI... but shouldn't it actually be
> safe to leave out the sign/zero extensions on one side of the function call
> (either caller or callee)?

I don't know of any historical reason, but x86 uses that double sign/zero
extension too.  It wouldn't be a safe ABI change.  There can exist
hand-written functions depending on that behavior.  It's too late to change
the default behavior, I think.  Of course, you can add a new -m option
or function attribute changing it, though it shouldn't be the default for
the non-Renesas ABI.
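
To make the robustness argument concrete, here is a small model in C of what can go wrong when only one side extends.  Everything in it is an assumption for illustration: argument and return registers are modeled as uint32_t values, and the function names are hypothetical.  It is not SH code and does not describe how GCC implements either ABI.

```c
#include <stdint.h>

/* Caller side, model 1: sign-extend the signed char argument to the
   full 32-bit register before the call.  */
uint32_t caller_extends(int8_t v)
{
    return (uint32_t)(int32_t)v;
}

/* Caller side, model 2: only the low byte is written; the upper 24 bits
   keep whatever stale value was in the register.  */
uint32_t caller_no_extend(uint32_t stale_register, int8_t v)
{
    return (stale_register & 0xFFFFFF00u) | (uint8_t)v;
}

/* Callee, e.g. hand-written assembly, that trusts the caller to have
   extended the argument: it uses all 32 bits as-is and returns arg + 1.  */
uint32_t callee_trusting(uint32_t arg_register)
{
    return arg_register + 1u;
}

/* Callee that sign-extends the low byte itself before using it.  */
uint32_t callee_defensive(uint32_t arg_register)
{
    uint32_t extended = ((arg_register & 0xFFu) ^ 0x80u) - 0x80u;
    return extended + 1u;
}
```

In this model, callee_trusting gives the expected result only when paired with caller_extends.  If callers stopped extending, such a callee would silently compute on the stale upper bits, which is the kind of breakage the comment above warns about.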


[Bug rtl-optimization/11736] Stackpointer messed up on SuperH

2012-03-01 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11736

--- Comment #9 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-03-01 
22:03:09 UTC ---
I think so too.


[Bug target/49468] SH Target: inefficient integer abs code

2012-02-29 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49468

--- Comment #9 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-02-29 
23:18:23 UTC ---
(In reply to comment #8)
Perhaps.  Anyway looks fine to me except one minor failure
on sh64-elf:

xsh64-elf-combined/combined/libgcc/libgcc2.c: In function '__powisf2':
xsh64-elf-combined/combined/libgcc/libgcc2.c:1779:1: error: unrecognizable
insn:
(insn 11 10 12 3 (set (reg:DI 170)
(abs:DI (reg:DI 169)))
xsh64-elf-combined/combined/libgcc/libgcc2.c:1770 -1
 (nil))
xsh64-elf-combined/combined/libgcc/libgcc2.c:1779:1: internal compiler error:
in extract_insn, at recog.c:2123

The failure went away when the new absdi2 expander was restricted to TARGET_SH1.

--- gcc/config/sh/sh.md~2012-02-29 10:52:16.0 +0900
+++ gcc/config/sh/sh.md2012-02-29 11:07:42.0 +0900
@@ -4538,7 +4538,7 @@ label:
   [(set (match_operand:DI 0 "arith_reg_dest" "")
 	(abs:DI (match_operand:DI 1 "arith_reg_operand" "")))
    (clobber (reg:SI T_REG))]
-  ""
+  "TARGET_SH1"
   "")

 (define_insn_and_split "*absdi2"


[Bug target/52394] SH Target: SH2A defunct bitops

2012-02-27 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52394

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-02-28 
00:37:56 UTC ---
I guess that these tests now require -fno-strict-volatile-bitfields,
though that isn't enough to avoid the failures.  It looks like something
goes wrong in expmed.c:{store, extract}_bit_field_1 and they decide
to use the slow fallback {store, extract}_fixed_bit_field instead of
generating insv/extv.

Here is the suspicious part of {store, extract}_bit_field_1:

  /* Now convert from counting within UNIT to counting in EXT_MODE.  */
  if (BYTES_BIG_ENDIAN && !MEM_P (xop0))
    xbitpos += GET_MODE_BITSIZE (ext_mode) - unit;

  unit = GET_MODE_BITSIZE (ext_mode);

  /* If BITS_BIG_ENDIAN is zero on a BYTES_BIG_ENDIAN machine, we count
     backwards from the size of the unit we are extracting from.
     Otherwise, we count bits from the most significant on a
     BYTES/BITS_BIG_ENDIAN machine.  */

  if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
    xbitpos = unit - bitsize - xbitpos;

In the problematic cases, xop0 is a QImode memory and ext_mode is SImode.
The initial value of unit is 8.  When the starting xbitpos is 3 and bitsize is
1, for example, these lines set xbitpos to 28!  There is no insv/extv which
inserts/extracts such a bit position for QImode memory, so maybe_expand_insn
for CODE_FOR_{insv, extv} fails.  Perhaps these parts should be something
like

  /* We have been counting XBITPOS within UNIT.
     Count instead within the size of the register.  */
  if (BYTES_BIG_ENDIAN && !MEM_P (xop0))
    xbitpos += GET_MODE_BITSIZE (op_mode) - unit;

  /* If BITS_BIG_ENDIAN is zero on a BYTES_BIG_ENDIAN machine, we count
     backwards from the size of the unit we are inserting into.
     Otherwise, we count bits from the most significant on a
     BYTES/BITS_BIG_ENDIAN machine.  */

  if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
    {
      if (!MEM_P (xop0))
	xbitpos = GET_MODE_BITSIZE (op_mode) - bitsize - xbitpos;
      else
	xbitpos = unit - bitsize - xbitpos;
    }

  unit = GET_MODE_BITSIZE (op_mode);

though I don't understand these routines well.
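
The arithmetic in the example above can be replayed outside the compiler.  The following is a minimal standalone sketch: the struct and both function names are hypothetical, the target macros and RTL predicates are replaced by plain int fields, and it assumes a big-endian SH configuration where BYTES_BIG_ENDIAN is 1 and BITS_BIG_ENDIAN is 0.

```c
/* Hypothetical stand-ins for the macros and state used in expmed.c.  */
struct bitpos_case {
    int bytes_big_endian;  /* models BYTES_BIG_ENDIAN */
    int bits_big_endian;   /* models BITS_BIG_ENDIAN */
    int is_mem;            /* models MEM_P (xop0) */
    int mode_bits;         /* models GET_MODE_BITSIZE (ext_mode or op_mode) */
    int unit;              /* initial unit, 8 for a QImode memory */
    int bitsize;           /* width of the bit-field */
    int xbitpos;           /* starting bit position within unit */
};

/* Mirrors the first quoted fragment: unit is widened to the mode size
   before the position is flipped, even for a memory operand.  */
int xbitpos_as_quoted(struct bitpos_case c)
{
    if (c.bytes_big_endian && !c.is_mem)
        c.xbitpos += c.mode_bits - c.unit;

    c.unit = c.mode_bits;

    if (c.bits_big_endian != c.bytes_big_endian)
        c.xbitpos = c.unit - c.bitsize - c.xbitpos;
    return c.xbitpos;
}

/* Mirrors the suggested fragment: a memory operand is flipped within its
   original unit, a register within the full mode size.  */
int xbitpos_as_suggested(struct bitpos_case c)
{
    if (c.bytes_big_endian && !c.is_mem)
        c.xbitpos += c.mode_bits - c.unit;

    if (c.bits_big_endian != c.bytes_big_endian) {
        if (!c.is_mem)
            c.xbitpos = c.mode_bits - c.bitsize - c.xbitpos;
        else
            c.xbitpos = c.unit - c.bitsize - c.xbitpos;
    }
    return c.xbitpos;
}
```

For the case described above (QImode memory, SImode ext_mode, unit 8, bitsize 1, xbitpos 3) the first function yields 28 and the second yields 4, which stays inside the byte.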


[Bug target/52049] SH Target: Inefficient constant address access

2012-01-29 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52049

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2012-01-30 
00:15:16 UTC ---
(In reply to comment #0)
> I'm not sure whether this is actually a problem of the SH back-end or of some
> middle-end passes.  It happens for all sub-targets and regardless of the
> endianness.

I've tried these cases on arm/thumb and got similar results
which do not look very good.  From the rtl dumps, it looks like a general
issue with postreload optimization on some targets.


[Bug target/50749] SH Target: Post-increment addressing used only for first memory access

2011-12-29 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #11 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-12-30 
03:24:01 UTC ---
(In reply to comment #10)
> If OK, I'd like to change it from target PR to middle-end PR.

Sure.


[Bug target/51244] SH Target: Inefficient conditional branch

2011-12-28 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #7 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-12-28 
22:25:48 UTC ---
(In reply to comment #3)
> I haven't run all tests on it yet, but CSiBE shows average code size reduction
> of approx. -0.1% for -m4* with some code size increases in some files.
> Would something like that be OK for stage 3?

Looks good, though not appropriate for stage 3, I think.


[Bug target/51340] SH Target: Make -mfused-madd enabled by default

2011-12-28 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51340

--- Comment #3 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-12-28 
22:31:27 UTC ---
(In reply to comment #2)
> Uhm, yes...
> The title should have been "Enable -mfused-madd by -ffast-math"

Do you mean something like this?

--- ORIG/trunk/gcc/config/sh/sh.c2011-12-03 10:03:41.0 +0900
+++ trunk/gcc/config/sh/sh.c2011-12-27 08:33:23.0 +0900
@@ -838,6 +838,11 @@ sh_option_override (void)
 align_functions = min_align;
 }

+  /* Default to use fmac insn when -ffast-math.  See PR target/29100.  */
+  if (global_options_set.x_TARGET_FMAC == 0
+      && fast_math_flags_set_p (&global_options))
+    TARGET_FMAC = 1;
+
   if (sh_fixed_range_str)
 sh_fix_range (sh_fixed_range_str);

> I don't know the exact semantics for the new patterns.  All I know is that
> rounding is supposed to be done only once after the two operations.  This is
> the case for the SH fmac insn.  Not sure whether this is enough though.

It seems that we can use the fma pattern, though it would be another issue.
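
For readers unfamiliar with the "rounding only once" property mentioned in the quoted text, here is a small sketch in C.  It does not use the SH fmac insn or the fma pattern; it only models the two behaviours, and both function names are hypothetical.  The single-rounding variant computes in double, which happens to be exact for the inputs discussed below but is not a general replacement for a real fused multiply-add.

```c
/* Multiply, round the product to float, then add: two roundings.
   The volatile temporary forces the intermediate rounding and keeps the
   compiler from contracting the expression into a fused operation.  */
float mul_add_two_roundings(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

/* Model of a single rounding at the end: do the arithmetic in double and
   round to float once.  Exact for the small example values used here;
   for arbitrary inputs the double sum may itself round.  */
float mul_add_one_rounding(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24.  Rounding it to float drops the 2^-24 term, so the first function returns 0 while the second returns 2^-24.  This is why enabling a fused insn can change results, and why tying it to -ffast-math is being discussed.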


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-12-12 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #21 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-12-12 
22:08:18 UTC ---
(In reply to comment #20)
> As far as I could observe it, this is mainly triggered by the following in
> sh_legitimate_index_p:
>
> +  if (mode == QImode && (unsigned) INTVAL (op) < 16)
> +    return true;

It seems that, with that hunk, recog.c:offsettable_address_addr_space_p
always returns true for V2SF mode.  Without that hunk, it returns false
for that case.  There are comments and lines in that function like

  /* Use QImode because an odd displacement may be automatically invalid
     for any wider mode.  But it should be valid for a single byte.  */
  good = (*addressp) (QImode, y, as);

where addressp is memory_address_addr_space_p, which returns true with
that hunk.
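
A simplified model of this interaction, written as standalone C, may help.  It assumes the usual SH encoding of mov.b/mov.w/mov.l @(disp,Rn), where the displacement is a 4-bit field scaled by the access size, and it reduces the offsettable check to "probe the last byte of the access as a QImode address".  The function names and the qimode_disp_allowed flag are hypothetical; this is not the GCC code.

```c
/* Nonzero if DISP can be encoded in the 4-bit scaled displacement field
   of a SIZE-byte access (SIZE is 1, 2 or 4).  */
int sh_disp_fits(unsigned size, long disp)
{
    if (size != 1 && size != 2 && size != 4)
        return 0;
    if (disp < 0 || disp % (long)size != 0)
        return 0;
    return disp / (long)size < 16;
}

/* Simplified model of the offsettable probe: take the displacement of the
   last byte of a MODE_SIZE-byte access and ask whether it is a valid
   QImode address.  QIMODE_DISP_ALLOWED models whether the back end
   accepts QImode displacement addressing at all (0 = without the hunk,
   1 = with it).  */
int offsettable_probe(long disp, unsigned mode_size, int qimode_disp_allowed)
{
    long last_byte = disp + (long)mode_size - 1;

    if (!qimode_disp_allowed)
        return 0;
    return sh_disp_fits(1, last_byte);
}
```

In this model an 8-byte V2SF access at displacement 0 probes byte displacement 7, which a QImode check with the hunk accepts.  That matches the observation above that the function starts returning true once QImode displacements are legitimate.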

> You mean, by giving the user the option to turn off displacement addressing for
> e.g. some specific files / modules by specifying -mno-preferdisp or something
> like that?  By anomalies do you mean code that gets worse because of too much
> pressure on R0 and all the reloads around it, or do you have any other bad use
> cases?

Yes and yes.  Although I didn't look at all the dis-improvements,
it looks like r0 pressure is the primary factor.

> Another thing I could try out is to have load/store insns that allow arbitrary
> operands in displacement addressing like on SH2A, and split them into two insns
> of one load/store and one reg-reg move after reload.  But that would probably
> require the R0 clobber in the expander which could make worse code in cases
> where displacement addressing is not used, I guess.
> Do you think this approach could make sense?

I guess that it could make worse code in some situations as you say.

> Yep, sure.  I've noticed that the latest version of the patch seems to fix some
> more testsuite failures.  I will investigate which hunk is responsible for the
> fixes so that could be pulled out from the patch.  OK?

Sounds great.


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-12-11 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #19 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-12-11 
23:57:13 UTC ---
(In reply to comment #18)
The results look way better now.  I've tested your latest patch for
sh4-unknown-linux-gnu and found no new regressions in the gcc testsuite.
CSiBE with -O2 -fpic on that target shows 144 improvements and
28 dis-improvements in size over 896 files.  The worst case is
-4.34783 net/ipv4/ip_forward 704 736
which looks like a case of high r0 register pressure.  The best one is
25.7426 arch/testplatform/kernel/traps 10160 8080
which looks very impressive.

   /* We want to enable the use of SUBREGs as a means to
  VEC_SELECT a single element of a vector.  */
+
+  /* This effectively disallows using GENERAL_REGS for SFmode vector subregs.
+ This can be problematic when SFmode vector subregs need to be accessed
+ on the stack with displacement addressing, as it happens with -O0.
+ Thus we allow the mode change for -O0.  */
   if (to == SFmode && VECTOR_MODE_P (from) && GET_MODE_INNER (from) == SFmode)
-    return (reg_classes_intersect_p (GENERAL_REGS, rclass));
+    return optimize ? (reg_classes_intersect_p (GENERAL_REGS, rclass)) : false;

Rather than that, I guess that QI/HImode disp addressing would
be an optimization unneeded for -O0 in the first place.  Perhaps
something like a -mpreferdisp option and a TARGET_PREFER_DISP macro
which are enabled by default but disabled at -O0 might help.  It'll
also help with some unfortunate anomalies for which those optimizations
generate worse code.

 There are probably smarter ways of doing what the patch does.  I have also
 tried out implementing it with predicates and constraints, few load/store insns
 and lots of alternatives in the insns.  However, reload would refuse to select
 the displacement addressing due to pressure on R0 in many cases.

Maybe.  Implementing it with predicates and constraints would be
smarter if possible but may be difficult because the register
allocator handles the m constraint specially.

 Would something like the attached patch be acceptable (after some cleanups)? 
 If so, I'd also start adding HImode displacement addressing support.

I think so, though we are in stage 3 and have to wait until trunk returns
to stage 1 or 2 before committing such changes.  You have time to
implement HImode support.
BTW, the whitespace changes, spelling fixes and other clean-ups which
are not essential for this work should be separated into another patch.


[Bug middle-end/51351] undefined reference to __sync_fetch_and_ior_4

2011-12-04 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51351

Kazumoto Kojima kkojima at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED

--- Comment #2 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-12-04 
11:45:25 UTC ---
Thanks for the quick fix!


[Bug target/50814] SH Target: SHAD / SHLD instructions not used on SH2A

2011-12-02 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50814

Kazumoto Kojima kkojima at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED

--- Comment #7 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-12-02 
23:42:56 UTC ---
Fixed on trunk.


[Bug target/51337] SH Target: Various testsuite ICEs for -m2a -O0

2011-12-02 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51337

Kazumoto Kojima kkojima at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED

--- Comment #2 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-12-02 
23:44:34 UTC ---
Fixed.


[Bug target/50814] SH Target: SHAD / SHLD instructions not used on SH2A

2011-12-01 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50814

--- Comment #6 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-12-01 
23:02:08 UTC ---
Author: kkojima
Date: Thu Dec  1 23:01:58 2011
New Revision: 181896

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181896
Log:
PR target/50814.
* config/sh/sh.c (expand_ashiftrt): Handle TARGET_SH2A same as
TARGET_SH3.
(shl_sext_kind): Likewise.
* config/sh/sh.h (SH_DYNAMIC_SHIFT_COST): Likewise.
* config/sh/sh.md (ashlsi3_sh2a, ashrsi3_sh2a, lshrsi3_sh2a):
Remove.
(ashlsi3_std): Handle TARGET_SH2A same as TARGET_SH3.
(ashlsi3): Likewise.
(ashrsi3_d): Likewise.
(lshrsi3_d): Likewise.
(lshrsi3): Likewise.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.h
trunk/gcc/config/sh/sh.md


[Bug target/51337] SH Target: Various testsuite ICEs for -m2a -O0

2011-11-29 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51337

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-11-29 
22:52:59 UTC ---
Author: kkojima
Date: Tue Nov 29 22:52:55 2011
New Revision: 181823

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181823
Log:
PR target/51337
* config/sh/sh.c (sh_secondary_reload): Add case when FPUL
register is being loaded from a pseudo in memory.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c


[Bug middle-end/51351] New: undefined reference to __sync_fetch_and_ior_4

2011-11-29 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51351

 Bug #: 51351
   Summary: undefined reference to __sync_fetch_and_ior_4
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kkoj...@gcc.gnu.org


On SH, there are libgomp test failures with

undefined reference to `__sync_fetch_and_ior_4'

The documentation refers to __sync_fetch_and_or but not to __sync_fetch_and_ior.
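
For reference, a minimal sketch of the documented builtin on a 4-byte object.  The helper name set_flags is hypothetical.  On a target without an inline expansion one would expect the reference to resolve to a size-suffixed routine named after __sync_fetch_and_or, not __sync_fetch_and_ior.

```c
#include <stdint.h>

/* Atomically OR MASK into *WORD and return the value *WORD held before.
   __sync_fetch_and_or is the documented GCC builtin; for a 4-byte operand
   a target without an inline pattern may end up calling an out-of-line
   routine with a _4 suffix.  */
static uint32_t
set_flags (uint32_t *word, uint32_t mask)
{
  return __sync_fetch_and_or (word, mask);
}
```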


[Bug target/50814] SH Target: SHAD / SHLD instructions not used on SH2A

2011-11-28 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50814

--- Comment #5 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-11-28 
13:43:16 UTC ---
BTW, when regtesting, I've found that there are many ICEs at -O0.
A typical one is gcc.c-torture/compile/2923-1.c with -m2a -O0:

...: error: insn does not satisfy its constraints:
(insn 142 34 35 (set (mem/c:SI (plus:SI (reg/f:SI 14 r14)
(const_int 36 [0x24])) [0 %sfp+-16 S4 A32])
(reg:SI 150 fpul)) ... {movsi_ie}
 (nil))
...: internal compiler error: in extract_constrain_insn_cached, at recog.c:2052

which is solved by the hunk in the patch against PR50751

--- gcc/config/sh/sh.c.orig2011-11-28 10:03:04.0 +0900
+++ gcc/config/sh/sh.c2011-11-28 15:09:01.0 +0900
@@ -12432,6 +12432,10 @@ sh_secondary_reload (bool in_p, rtx x, r
   if (rclass != GENERAL_REGS && REG_P (x)
       && TARGET_REGISTER_P (REGNO (x)))
     return GENERAL_REGS;
+  /* If here fall back to loading FPUL register through general regs.
+     Happens when FPUL has to be loaded from a reg allocated on the stack.  */
+  if (rclass == FPUL_REGS && !REG_P (x))
+    return GENERAL_REGS;
   return NO_REGS;
 }

Oleg, it seems that this is the right patch for the independent issue
described in your comment.  Could you please file it in Bugzilla
and propose the patch on the gcc-patches list?


[Bug target/51340] SH Target: Make -mfused-madd enabled by default

2011-11-28 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51340

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-11-28 
23:09:32 UTC ---
(In reply to comment #0)
 Is there any particular reason why this should not be enabled by
 default for SH targets that support the FMAC insn?

PR29100?

BTW, if SH fmac satisfies the semantics of a fused multiply-add
operation, the fmaf4 instruction pattern would be better now.


[Bug target/50749] SH Target: Post-increment addressing used only for first memory access

2011-11-28 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #9 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-11-28 
23:29:57 UTC ---
(In reply to comment #8)
 Specifying -fno-tree-forwprop doesn't seem to have any effect on these cases.

For that function, -fdump-tree-all shows that the tree loop ivopts
optimization does it.  Try -fno-tree-forwprop -fno-ivopts.


[Bug target/50814] SH Target: SHAD / SHLD instructions not used on SH2A

2011-11-27 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50814

--- Comment #3 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-11-28 
00:09:17 UTC ---
(In reply to comment #2)
 According to the SW manual document rej09b0051_sh2a.pdf the SHAD and SHLD insns
 have the same 2-byte format as on SH3:
 
 SHAD Rm, Rn: 01001100
 SHLD Rm, Rn: 01001101 
 
 Am I missing something there?

Ugh.  You are right.  I had thought so ever since sh2a support was
introduced at r85286.


[Bug target/50814] SH Target: SHAD / SHLD instructions not used on SH2A

2011-11-27 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50814

--- Comment #4 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-11-28 
04:31:51 UTC ---
Created attachment 25927
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25927
A patch

I'm testing the attached patch.


[Bug target/51244] SH Target: Inefficient conditional branch

2011-11-22 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-11-22 
22:33:43 UTC ---
  return (a != b || a != c) ? b : c;

The test_func_0_NG and test_func_1_NG cases are related to the target
implementation of cstoresi4.
The middle end expands a complex conditional jump into cstores and
simple conditional jumps.  For the expression a != b, SH's cstoresi4
implementation uses sh.c:sh_emit_compare_and_set, which generates
cmp/eq and movnegt insns because we have no cmp/ne insn.  Then we get
the sequence

  mov #-1,rn
  negc rn,rm
  tst #255,rm

which is essentially T_reg = T_reg.  Usually combine catches such
situations, but negc might be too complex for combine.
For this case, by replacing the current movnegt expander with an insn,
a splitter and a peephole, something like

(define_insn "movnegt"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
        (plus:SI (reg:SI T_REG) (const_int -1)))
   (clobber (match_scratch:SI 1 "=r"))
   (clobber (reg:SI T_REG))]
  ""
  "#"
  [(set_attr "length" "4")])

(define_split
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
        (plus:SI (reg:SI T_REG) (const_int -1)))
   (clobber (match_scratch:SI 1 "=r"))
   (clobber (reg:SI T_REG))]
  "reload_completed"
  [(set (match_dup 1) (const_int -1))
   (parallel [(set (match_dup 0)
                   (neg:SI (plus:SI (reg:SI T_REG)
                                    (match_dup 1))))
              (set (reg:SI T_REG)
                   (ne:SI (ior:SI (reg:SI T_REG) (match_dup 1))
                          (const_int 0)))])]
  "")

(define_peephole2
  [(set (match_operand:SI 1 "" "") (const_int -1))
   (parallel [(set (match_operand:SI 0 "" "")
                   (neg:SI (plus:SI (reg:SI T_REG)
                                    (match_dup 1))))
              (set (reg:SI T_REG)
                   (ne:SI (ior:SI (reg:SI T_REG) (match_dup 1))
                          (const_int 0)))])
   (set (reg:SI T_REG)
        (eq:SI (match_operand:QI 3 "" "") (const_int 0)))]
  "REGNO (operands[3]) == REGNO (operands[0])
   && peep2_reg_dead_p (3, operands[0])
   && peep2_reg_dead_p (3, operands[1])"
  [(const_int 0)]
  "")

the above useless sequence could be removed, though we would lose
the chance to CSE the -1 when the cstore value is
used.  This would produce slightly worse code for a loop like

int
foo (int *a, int x, int n)
{
  int i;
  int count = 0;

  for (i = 0; i < n; i++)
    count += (*(a + i) != x);

  return count;
}

though it may be relatively rare.
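
The claim above that the mov/negc/tst sequence amounts to T_reg = T_reg can be checked with a small C model.  This is only a sketch: it assumes negc computes 0 - rn - T and that tst sets T when the masked value is zero, and the function name is hypothetical.

```c
#include <stdint.h>

/* Model of the T bit through "mov #-1,rn; negc rn,rm; tst #255,rm".
   negc computes rm = 0 - rn - T and sets T to the borrow; tst then sets
   T when (rm & 255) is zero.  Returns the final T for an initial T.  */
static int
t_after_sequence (int t)
{
  uint32_t rn = 0xffffffffu;               /* mov #-1,rn */
  uint32_t rm = 0u - rn - (uint32_t) t;    /* negc: rm = 1 - T */
  /* The borrow produced by negc is not modelled because tst overwrites T.  */
  return (rm & 255u) == 0;                 /* tst #255,rm */
}
```

For both possible inputs the function returns its argument, which is why the three insns do no useful work on the T bit.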

BTW, OT, (a != b || a != c) ? b : c could be reduced to b, I think.
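
The reduction to b follows because the only way to reach the c arm is a == b and a == c, in which case c equals b anyway.  A brute-force sketch over a small range (function names are hypothetical):

```c
/* The expression from the test case, written out literally.  */
static int
select_original (int a, int b, int c)
{
  return (a != b || a != c) ? b : c;
}

/* Count the inputs in [-range, range]^3 where the expression differs
   from plain b.  */
static int
count_mismatches (int range)
{
  int mismatches = 0;
  for (int a = -range; a <= range; a++)
    for (int b = -range; b <= range; b++)
      for (int c = -range; c <= range; c++)
        if (select_original (a, b, c) != b)
          mismatches++;
  return mismatches;
}
```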

  return a >= 0 && b >= 0 ? c : d;

x >= 0 is expanded to a sequence like

  ra = not x
  rb = -31
  rc = ra >> (neg rb)
  T = (rc == 0)
  conditional jump

and combine tries to simplify it.  combine simplifies b >= 0
successfully into shll and bt but fails to simplify a >= 0.
It seems that combine doesn't do constant propagation well and
misses the constant -31.  In this case, a peephole like

(define_peephole2
  [(set (match_operand:SI 0 "arith_reg_dest" "")
        (not:SI (match_operand:SI 1 "arith_reg_operand" "")))
   (set (match_operand:SI 2 "arith_reg_dest" "") (const_int -31))
   (set (match_operand:SI 3 "arith_reg_dest" "")
        (lshiftrt:SI (match_dup 0) (neg:SI (match_dup 2))))
   (set (reg:SI T_REG)
        (eq:SI (match_operand:QI 4 "arith_reg_operand" "")
               (const_int 0)))
   (set (pc)
        (if_then_else (match_operator 5 "comparison_operator"
                        [(reg:SI T_REG) (const_int 0)])
                      (label_ref (match_operand 6 "" ""))
                      (pc)))]
  "REGNO (operands[3]) == REGNO (operands[4])
   && peep2_reg_dead_p (4, operands[0])
   && (peep2_reg_dead_p (4, operands[3])
       || rtx_equal_p (operands[2], operands[3]))
   && peep2_regno_dead_p (5, T_REG)"
  [(set (match_dup 2) (const_int -31))
   (set (reg:SI T_REG) (ge:SI (match_dup 1) (const_int 0)))
   (set (pc)
        (if_then_else (match_op_dup 7 [(reg:SI T_REG) (const_int 0)])
                      (label_ref (match_dup 6))
                      (pc)))]
  "
{
  operands[7] = gen_rtx_fmt_ee (reverse_condition (GET_CODE (operands[5])),
                                GET_MODE (operands[5]),
                                XEXP (operands[5], 0), XEXP (operands[5], 1));
}")

will be a workaround.  It isn't ideal, but better than nothing.
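
To see why the peephole has to reverse the branch condition, here is a small C sketch comparing the T bit of the expanded sequence with the T bit of the ge-based replacement.  It assumes 32-bit registers and a logical shift by 31; the function names are hypothetical.

```c
#include <stdint.h>

/* T bit as computed by the expanded sequence: ra = ~x, rc = ra >> 31
   (logical shift), T = (rc == 0).  */
static int
t_from_expanded_sequence (int32_t x)
{
  uint32_t ra = ~(uint32_t) x;
  uint32_t rc = ra >> 31;
  return rc == 0;
}

/* T bit as set by the replacement insn: T = (x >= 0).  */
static int
t_from_replacement (int32_t x)
{
  return x >= 0;
}
```

The two functions always return opposite values, so a branch that tested the old T must test the inverted condition on the new T, which is what reverse_condition arranges in the peephole.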

  return a == b ? test_sub0 (a, b) : test_sub1 (a, b);
  return a != b ? test_sub0 (a, b) : test_sub1 (a, b);

This case is interesting.  At -Os, the two calls are converted into
one computed goto.  A bit surprisingly, the conversion is done
as a side effect of the combine-stack-adjustments pass.  That pass
calls

  cleanup_cfg (flag_crossjumping ? CLEANUP_CROSSJUMP : 0);

and the cross jumping optimization merges two calls.
With -Os -fno-delayed-branch, the OK case is compiled to

test_func_3_OK:
mov r4,r1
cmp/eq  r5,r1
mov.l   .L4,r0
bf  .L3
mov r1,r5
mov.l   .L5,r0
bra .L3
nop
.L3:
jmp @r0
nop

and the NG case

test_func_3_NG:
mov r4,r1
cmp/eq  r5,r1
bt  .L2
mov.l   .L4,r0
bra .L3
nop
.L2:
mov.l   .L5,r0
mov r1,r5
.L3

[Bug target/51241] SH Target: Unnecessary sign/zero extensions

2011-11-20 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51241

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-11-21 
01:52:22 UTC ---
Please put the description of the problem into the trail itself
instead of in an attachment next time.

The problem looks like it can be split into several issues.

   mov.b   @r4+,r3 ! 40*extendqisi2_compact_mem_inc
   extu.b  r3,r3   ! 41*zero_extendqisi2_compact   --- extu.b r3, r0
   mov r3,r0   ! 75movsi_ie/2  --- ??

This mov insn is generated by reload.  After all, SH's and #imm,*
would be too restrictive.

   exts.b  r3,r3   ! 50*extendqisi2_compact--- ??
   and #127,r0 ! 45*andsi3_compact/1 --- makes extu.b useless
   cmp/pz  r3  ! 51cmpgesi_t/2

As you pointed out, if cmp/pz is placed just after the mov.b insn,
exts.b is not required.  I don't know whether such a pass exists or not.

   mov.l   @r4,r1  ! 7 movsi_ie/7  [length = 2]
   swap.w  r1,r1   ! 13rotlsi3_16  [length = 2]
   exts.w  r1,r1   ! 14*extendhisi2_compact/1  [length = 2]
   rts ! 21*return_i   [length = 2]
   mov.b   r1,@r5  ! 10*movqi/4[length = 2]

The sequence of swap.w and exts.w is generated by the ashrsi2_16 insn
and its splitter.  exts.w could be removed by the combine pass,
but the split is done after combine.  Perhaps by replacing
that insn and splitter with an expander like

(define_expand "ashrsi2_16"
  [(set (match_operand:SI 0 "arith_reg_dest" "")
        (rotate:SI (match_operand:SI 1 "arith_reg_operand" "")
                   (const_int 16)))
   (set (match_dup 0) (sign_extend:SI (match_dup 2)))]
  "TARGET_SH1"
  "operands[2] = gen_lowpart (HImode, operands[0]);")

the combine will do the work.
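
For readers unfamiliar with the idiom, a minimal C sketch of why a rotate by 16 followed by a 16-bit sign extension is an arithmetic shift right by 16.  The helper names mirror the SH mnemonics but are otherwise hypothetical, and the narrowing cast assumes the usual two's complement wrap-around that GCC provides.

```c
#include <stdint.h>

/* swap.w: exchange the upper and lower 16-bit halves (rotate by 16).  */
static uint32_t
swap_w (uint32_t x)
{
  return (x << 16) | (x >> 16);
}

/* exts.w: sign-extend the low 16 bits to 32 bits.  */
static int32_t
exts_w (uint32_t x)
{
  return (int32_t) (int16_t) (uint16_t) x;
}

/* Arithmetic shift right by 16 built from the two insns.  */
static int32_t
ashr16 (int32_t x)
{
  return exts_w (swap_w ((uint32_t) x));
}
```

If the value is only stored as a byte or halfword afterwards, the sign extension contributes nothing, which is the redundancy the report is about.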

   negc    r1,r1   ! 10negc[length = 2]
   extu.b  r1,r0   ! 12*zero_extendqisi2_compact   [length = 2]

Again, usually the combine pass can remove such extu.b.
Perhaps negc has a pattern

  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
        (neg:SI (plus:SI (reg:SI T_REG)
                         (match_operand:SI 1 "arith_reg_operand" "r"))))
   (set (reg:SI T_REG)
        (ne:SI (ior:SI (reg:SI T_REG) (match_dup 1))
               (const_int 0)))]

which would be too complex for combine.


[Bug target/50694] SH Target: SH2A little endian does not actually work

2011-11-13 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50694

--- Comment #10 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-11-13 
23:00:15 UTC ---
Author: kkojima
Date: Sun Nov 13 23:00:10 2011
New Revision: 181340

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181340
Log:
PR target/50694
* config/sh/sh.h (IS_LITTLE_ENDIAN_OPTION, UNSUPPORTED_SH2A):
New macros.
(DRIVER_SELF_SPECS): Use new macros to filter out
unsupported options taking the default configuration into
account.
* gcc.target/sh/pr21255-2-ml.c: Skip if -mb or -m5* is
specified.  Remove redundant runtime checks.
* gcc.target/sh/20080410-1.c: Skip if -mb is specified.
Allow for other than -m4.  Fix typos in comments.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.h
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/sh/20080410-1.c
trunk/gcc/testsuite/gcc.target/sh/pr21255-2-ml.c


[Bug target/22553] [4.4/4.5/4.6/4.7 regression] ICE building libstdc++

2011-11-09 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22553

--- Comment #20 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-11-09 
14:07:01 UTC ---
(In reply to comment #19)
 So I think the workaround from r105496 can be safely removed now and then close
 this bug as fixed since 4.3.0

I've confirmed that there are no ICEs on SH after reverting the
r105496 change, though I don't understand why the change pointed out
in #19 fixes the issue reported by Joern in
http://gcc.gnu.org/ml/gcc-patches/2005-09/msg01654.html


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-11-01 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #14 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-11-02 
00:57:59 UTC ---
(In reply to comment #13)
 Apparently this makes something believe that loading the FPUL register from a
 displacement address is possible, which is of course not the case.  However, I
 can't see any connection there...

The .ira dump would be your friend, though I suspect that your patch
triggered some other reload problem like PR48596.  Could you
try the change in #5 of that PR?


[Bug target/50749] SH Target: Post-increment addressing used only for first memory access

2011-10-30 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #7 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-30 
23:36:27 UTC ---
(In reply to comment #6)
 I wonder whether there might be something in the target code that suggests the
 early optimizers to do that?  I've tried playing with the TARGET_ADDRESS_COST
 hook but it didn't have any effect in this case.

-fdump-tree-all shows that forward propagation on SSA trees turns
those memory accesses into simple array accesses.  You can try
-fno-tree-forwprop and see the effect of that option.
It seems that there are no special knobs to control forwprop from
the target side.
The problem is that the SH target can't do those simple array accesses
well in QI/HImode because of the lack of displacement addressing
for those modes.


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-10-27 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #12 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-27 
22:30:39 UTC ---
It seems that base_reg+index_reg addressing requires special
handling in RA and the move insn like

(define_insn "*movqi_m_reg_reg_store"
  [(set (mem:QI (plus:SI (match_operand:SI 0 "arith_reg_operand" "%z")
                         (match_operand:SI 1 "arith_reg_operand" "r")))
        (match_operand:QI 2 "arith_reg_operand" "r"))]
  "TARGET_SH1"
  "mov.b	%2,@(%0,%1)"
  [(set_attr "type" "store")])

might be unexpected for RA.


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-10-26 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #8 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-27 
02:31:35 UTC ---
(In reply to comment #7)
 Created attachment 25622 [details]
 asmcons and ira pass log for the reload failure of z insn constraint

The original insn 13 was

(insn 13 12 14 3 (set (reg:SI 193)
(plus:SI (subreg:SI (reg:QI 191 [ MEM[(char *)buf1_4(D) + 4B] ]) 0)
(subreg:SI (reg:QI 192 [ MEM[(char *)buf0_1(D) + 5B] ]) 0)))

and RA chooses r1 and r0 as the registers into which the memory operands
will be loaded.  The problem is that we have no direct way to load buf1[4]
into r1.  In such a situation, a secondary reload is needed.  See
the description of TARGET_SECONDARY_RELOAD in the gcc manual.
Here is a trial:

--- ORIG/trunk/gcc/config/sh/sh.c2011-10-16 10:18:53.0 +0900
+++ trunk/gcc/config/sh/sh.c2011-10-27 10:13:21.0 +0900
@@ -12430,6 +12453,10 @@ sh_secondary_reload (bool in_p, rtx x, r
   if (rclass != GENERAL_REGS && REG_P (x)
       && TARGET_REGISTER_P (REGNO (x)))
     return GENERAL_REGS;
+  if (rclass == GENERAL_REGS && mode == QImode
+      && MEM_P (x) && GET_CODE (XEXP (x, 0)) == PLUS
+      && CONST_INT_P (XEXP (XEXP (x, 0), 1)))
+    return R0_REGS;
   return NO_REGS;
 }

The ICE for your testcase went away with it, though I've got

../../../INTEST/trunk/zlib/trees.c: In function 'send_tree':
../../../INTEST/trunk/zlib/trees.c:797:1: error: unable to find a register to
spill in class 'R0_REGS'
../../../INTEST/trunk/zlib/trees.c:797:1: error: this is the insn:
(insn 415 414 416 28 (set (mem:QI (plus:SI (reg/f:SI 6 r6 [orig:742
s_34(D)->pending_buf ] [742])
(reg:SI 7 r7 [orig:307 D.4248 ] [307])) [0 *D.4249_209+0 S1
A8])
(reg:QI 746 [ s_34(D)->bi_buf ]))
../../../INTEST/trunk/zlib/trees.c:780 206 {*movqi_m_reg_reg_store}
 (expr_list:REG_DEAD (reg:QI 746 [ s_34(D)->bi_buf ])
(expr_list:REG_DEAD (reg/f:SI 6 r6 [orig:742 s_34(D)->pending_buf ]
[742])
(expr_list:REG_DEAD (reg:SI 7 r7 [orig:307 D.4248 ] [307])
(nil)))))
../../../INTEST/trunk/zlib/trees.c:797:1: internal compiler error: in
spill_failure, at reload1.c:2118

when bootstrapping.


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-10-24 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #5 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-24 
23:05:08 UTC ---
(In reply to comment #4)
It seems that clobbering R0 in that expander is simply papering
over the real problem.  Although the reload issue is beyond me,
the .ira dump for that impossible insn which doesn't satisfy
the z constraint would be a starting point.


[Bug target/50694] SH Target: SH2A little endian does not actually work

2011-10-20 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50694

--- Comment #8 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-20 
22:40:27 UTC ---
(In reply to comment #7)

This problem doesn't require theoretical/mathematical
completeness.  There are many inappropriate combinations
of options and configurations which don't produce any warning
when running the compiler.  The important thing is to warn about the
most confusing ones from the user's point of view.  So your patch
in #6 or even the one-liner in #2 would be OK and enough for
this PR, I think.


[Bug target/50814] SH Target: SHAD / SHLD instructions not used on SH2A

2011-10-20 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50814

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-21 
00:24:36 UTC ---
(In reply to comment #0)
 It is also not clear to me why SH2A seems to require different handling for
 dynamic shifts than SH3 or SH4...

It will be slightly different because sh2a's shad/shld are 4-byte
insns.  Perhaps something like the patch below will work, though I
haven't tested it at all.

diff -up ORIG/gcc/config/sh/sh.h gcc/config/sh/sh.h
--- ORIG/gcc/config/sh/sh.h2011-04-23 09:43:19.0 +0900
+++ gcc/config/sh/sh.h2011-10-21 08:15:25.0 +0900
@@ -2371,7 +2371,8 @@ extern int current_function_interrupt;
 #define ACCUMULATE_OUTGOING_ARGS TARGET_ACCUMULATE_OUTGOING_ARGS

 #define SH_DYNAMIC_SHIFT_COST \
-  (TARGET_HARD_SH4 ? 1 : TARGET_SH3 ? (optimize_size ? 1 : 2) : 20)
+  (TARGET_HARD_SH4 ? 1 : TARGET_SH3 ? (optimize_size ? 1 : 2) \
+   : TARGET_SH2A ? 2 : 20)


 #define NUM_MODES_FOR_MODE_SWITCHING { FP_MODE_NONE }
diff -up ORIG/gcc/config/sh/sh.c gcc/config/sh/sh.c
--- ORIG/gcc/config/sh/sh.c2011-07-29 09:31:42.0 +0900
+++ gcc/config/sh/sh.c2011-10-21 09:03:36.0 +0900
@@ -3246,7 +3246,7 @@ expand_ashiftrt (rtx *operands)
   char func[18];
   int value;

-  if (TARGET_SH3)
+  if (TARGET_SH3 || TARGET_SH2A)
 {
   if (!CONST_INT_P (operands[2]))
 {
diff -up ORIG/gcc/config/sh/sh.md gcc/config/sh/sh.md
--- ORIG/gcc/config/sh/sh.md2011-08-02 09:47:17.0 +0900
+++ gcc/config/sh/sh.md2011-10-21 08:58:49.0 +0900
@@ -3424,15 +3424,6 @@ label:
 ;;
 ;; shift left

-(define_insn ashlsi3_sh2a
-  [(set (match_operand:SI 0 arith_reg_dest =r)
-(ashift:SI (match_operand:SI 1 arith_reg_operand 0)
-   (match_operand:SI 2 arith_reg_operand r)))]
-  TARGET_SH2A
-  shad%2,%0
-  [(set_attr type arith)
-   (set_attr length 4)])
-
 ;; This pattern is used by init_expmed for computing the costs of shift
 ;; insns.

@@ -3441,14 +3432,14 @@ label:
 (ashift:SI (match_operand:SI 1 arith_reg_operand 0,0,0,0)
(match_operand:SI 2 nonmemory_operand r,M,P27,?ri)))
(clobber (match_scratch:SI 3 =X,X,X,r))]
-  TARGET_SH3
+  (TARGET_SH3 || TARGET_SH2A)
|| (TARGET_SH1  satisfies_constraint_P27 (operands[2]))
   @
shld%2,%0
add%0,%0
shll%O2%0
#
-  TARGET_SH3
+  (TARGET_SH3 || TARGET_SH2A)
 reload_completed
 CONST_INT_P (operands[2])
 ! satisfies_constraint_P27 (operands[2])
@@ -3457,7 +3448,11 @@ label:
 [(set (match_dup 0) (ashift:SI (match_dup 1) (match_dup 3)))
  (clobber (match_dup 4))])]
   operands[4] = gen_rtx_SCRATCH (SImode);
-  [(set_attr length *,*,*,4)
+  [(set_attr_alternative length
+ [(if_then_else
+(ne (symbol_ref TARGET_SH2A) (const_int 0))
+(const_int 4) (const_int 2))
+ (const_int 2) (const_int 2) (const_int 4)])
(set_attr type dyn_shift,arith,arith,arith)])

 (define_insn ashlhi3_k
@@ -3584,15 +3579,6 @@ label:
 ; arithmetic shift right
 ;

-(define_insn ashrsi3_sh2a
-  [(set (match_operand:SI 0 arith_reg_dest =r)
-(ashiftrt:SI (match_operand:SI 1 arith_reg_operand 0)
-   (neg:SI (match_operand:SI 2 arith_reg_operand r]
-  TARGET_SH2A
-  shad%2,%0
-  [(set_attr type dyn_shift)
-   (set_attr length 4)])
-
 (define_insn ashrsi3_k
   [(set (match_operand:SI 0 arith_reg_dest =r)
 (ashiftrt:SI (match_operand:SI 1 arith_reg_operand 0)
@@ -3687,9 +3673,13 @@ label:
   [(set (match_operand:SI 0 arith_reg_dest =r)
 (ashiftrt:SI (match_operand:SI 1 arith_reg_operand 0)
  (neg:SI (match_operand:SI 2 arith_reg_operand r]
-  TARGET_SH3
+  TARGET_SH3 || TARGET_SH2A
   shad%2,%0
-  [(set_attr type dyn_shift)])
+  [(set_attr_alternative length
+ [(if_then_else
+(ne (symbol_ref TARGET_SH2A) (const_int 0))
+(const_int 4) (const_int 2))])
+   (set_attr type dyn_shift)])

 (define_insn ashrsi3_n
   [(set (reg:SI R4_REG)
@@ -3735,22 +3725,17 @@ label:

 ;; logical shift right

-(define_insn lshrsi3_sh2a
-  [(set (match_operand:SI 0 arith_reg_dest =r)
-(lshiftrt:SI (match_operand:SI 1 arith_reg_operand 0)
- (neg:SI (match_operand:SI 2 arith_reg_operand r]
-  TARGET_SH2A
-  shld%2,%0
-  [(set_attr type dyn_shift)
-   (set_attr length 4)])
-
 (define_insn lshrsi3_d
   [(set (match_operand:SI 0 arith_reg_dest =r)
 (lshiftrt:SI (match_operand:SI 1 arith_reg_operand 0)
  (neg:SI (match_operand:SI 2 arith_reg_operand r]
-  TARGET_SH3
+  TARGET_SH3 || TARGET_SH2A
   shld%2,%0
-  [(set_attr type dyn_shift)])
+  [(set_attr type dyn_shift)
+   (set_attr_alternative length
+ [(if_then_else
+(ne (symbol_ref TARGET_SH2A) (const_int 0))
+(const_int 4) (const_int 2))])])

 ;;  Only the single bit shift clobbers the T bit.


[Bug target/50749] SH Target: Post-increment addressing used only for first memory access

2011-10-19 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #4 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-19 
21:36:56 UTC ---
(In reply to comment #3)

USE_LOAD_POST_INCREMENT and USE_STORE_PRE_DECREMENT are used only
in move_by_pieces, which handles some block operations when
MOVE_BY_PIECES_P says OK.  They don't disable post_inc/pre_dec
addressing for SI/DImode in general, I think.  It seems that they
are 0 for SI/DImode because we have displacement addressing for
a limited size of memory chunk in these modes, though I may be wrong
about it.  I'm a bit curious to see what happens if they are changed
to non-zero for SI/DImode.


[Bug target/50694] SH Target: SH2A little endian does not actually work

2011-10-18 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50694

--- Comment #4 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-18 
22:24:32 UTC ---
(In reply to comment #3)
There are no real uses of SH1/SH2/SH2E/SH3E cores anymore, I think.
I agree that taking care of -m2e is not worth it.  Perhaps the same goes
for -m1.  Anyway, your change looks plausible to me.


[Bug target/50694] SH Target: SH2A little endian does not actually work

2011-10-18 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50694

--- Comment #6 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-18 
22:50:19 UTC ---
(In reply to comment #5)
 I'll send in a patch with a couple of other cosmetic changes later, OK?

Please go for it.


[Bug target/50694] SH Target: SH2A little endian does not actually work

2011-10-16 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50694

Kazumoto Kojima kkojima at gcc dot gnu.org changed:

   What|Removed |Added

   Priority|P3  |P4
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2011-10-16
 Ever Confirmed|0   |1

--- Comment #2 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-16 
23:28:48 UTC ---
(In reply to comment #1)
Ah.  One liner

-#define DRIVER_SELF_SPECS "%{m2a:%{ml:%eSH2a does not support little-endian}}"
+#define DRIVER_SELF_SPECS "%{m2a*:%{ml:%eSH2a does not support little-endian}}"

should work.
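
The difference between the two spec forms is that %{m2a:...} fires only for the exact switch, while %{m2a*:...} fires for any switch whose name starts with m2a, such as -m2a-nofpu.  A toy illustration in C follows; it is a plain string test with hypothetical function names and does not reproduce the driver's spec machinery.

```c
#include <string.h>

/* %{NAME:...} fires only when the switch name is exactly NAME.
   OPTION is given without the leading '-'.  */
static int
matches_exact_spec (const char *option, const char *name)
{
  return strcmp (option, name) == 0;
}

/* %{PREFIX*:...} fires for any switch whose name starts with PREFIX.  */
static int
matches_prefix_spec (const char *option, const char *prefix)
{
  return strncmp (option, prefix, strlen (prefix)) == 0;
}
```

With the original spec, -m2a-nofpu -ml slips through without the error; with the starred form it is caught.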


[Bug target/50749] SH Target: Post-increment addressing used only for first memory access

2011-10-16 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-16 
23:33:40 UTC ---
GCC converts ordinary memory accesses into post_inc/pre_dec accesses in
the auto_inc_dec pass.  I guess that the auto_inc_dec pass can't find
post_inc opportunities in that case because other tree/rtl optimizers
have already tweaked the code.  If this is the case, the problem would
not be target specific.


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-10-16 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-17 
00:29:55 UTC ---
This is a known issue.  See the comment just before sh.c:sh_legitimate_index_p.
Unfortunately, I guess this PR might be marked as WONTFIX.


[Bug target/50749] SH Target: Post-increment addressing used only for first memory access

2011-10-16 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #2 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-17 
00:32:39 UTC ---
*** Bug 50750 has been marked as a duplicate of this bug. ***


[Bug target/50750] SH Target: Pre-decrement addressing used only for first memory access

2011-10-16 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50750

Kazumoto Kojima kkojima at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||DUPLICATE

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-17 
00:32:39 UTC ---
Looks like a duplicate of PR50749.

*** This bug has been marked as a duplicate of bug 50749 ***


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-10-16 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #3 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-17 
00:51:15 UTC ---
(In reply to comment #2)
 Yeah, I know this has been around for a while.
 I'd like to take my chances anyway :)

Welcome to the spill-failure-for-class-'R0_REGS' club :-)


[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction

2011-10-14 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #12 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-14 
23:06:06 UTC ---
(In reply to comment #11)
 Created attachment 25491 [details]
 Proposed patch including test case

Looks fine.  Some very minor style nits:

 +  if (GET_CODE (XEXP (x, 0)) == AND  /* tst instruction.  */

This comment looks a bit bogus.  A full sentence comment would
be better.

 +
 +

There are some extra empty lines.  GNU/GCC coding style says
that only one empty line is needed.  I know that there are
extra empty lines already, but we should not add new ones :-)


[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction

2011-10-14 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #13 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-15 
02:32:56 UTC ---
Author: kkojima
Date: Sat Oct 15 02:32:53 2011
New Revision: 180020

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180020
Log:
PR target/49263
* config/sh/sh.h (ZERO_EXTRACT_ANDMASK): New macro.
* config/sh/sh.c (sh_rtx_costs): Add test instruction case.
* config/sh/sh.md (tstsi_t): Name existing insn.  Make inner
and instruction commutative.
(tsthi_t, tstqi_t, tstqi_t_zero, tstsi_t_and_not,
tstsi_t_zero_extract_eq, tstsi_t_zero_extract_xor,
tstsi_t_zero_extract_subreg_xor_little,
tstsi_t_zero_extract_subreg_xor_big): New insns.
(*movsicc_t_false, *movsicc_t_true): Replace space with tab in
asm output.
(*andsi_compact): Reorder alternatives so that K08 is considered
first.
* gcc.target/sh/pr49263.c: New.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.h
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog


[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction

2011-10-10 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #10 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-11 
01:47:03 UTC ---
(In reply to comment #9)
 3) only zero_extract special cases

looks to be dominant.

 I'm sorry, I forgot to mention that it was just a proof of concept hack
 of mine, just to see whether it has any chance to work at all.
 I think it would be better to change/fix the behavior of the combine pass
 in this regard, so that it tries matching combined patterns without
 sophisticated transformations. I will try asking on the gcc list about that.

I see.  I also expect that the experts have some idea for
this issue.

 I think it would be a bit too much checking out each individual pattern.

I don't think that it's too much.  Those numbers can be easily
collected for CSiBE.  If your patterns are named, you could
simply add -dap -save-temps to the compiler options which are
specified when running CSiBE's create-config and then get
the occurrences of testsi_6, for example, with something like
  grep testsi_6 `find . -name "*.s" -print` | wc -l
after running the CSiBE size test.


[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction

2011-10-09 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #8 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-10 
01:31:42 UTC ---
(In reply to comment #7)
 Option 2 seems more robust even if it seems less effective, what do you think?

Another combine pass to reduce size by less than 0.3% on one target
would not be acceptable, I guess.  ~10 new patterns would be
overkill for that result, though I still expect that a few
of those patterns were dominant.  Could you get numbers on which
patterns were used in the former option?


[Bug bootstrap/49486] [4.7 Regression] Bootstrap failure

2011-09-28 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49486

--- Comment #2 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-09-28 
21:43:06 UTC ---
Author: kkojima
Date: Wed Sep 28 21:43:01 2011
New Revision: 179320

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179320
Log:
PR target/49486
* config/sh/sh.md (negdi2): Move expansion into split to
allow more combination options.  Add T_REG clobber.
(abssi2): New expander.
(*negdi2, *abssi2, *negabssi2): New insns.
(cneg): Change from insn to insn_and_split.  Rename to
negsi_cond.  Add alternative for non-SH4.
* gcc.target/sh/pr49468-si.c: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr49468-si.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog


[Bug tree-optimization/50287] [4.7 Regression] FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c compilation, -O2 -flto

2011-09-06 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50287

--- Comment #6 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-09-07 
00:26:13 UTC ---
(In reply to comment #4)
 Testcase that fails on i686-linux for me.

FYI, the testcase is failing also for arm-eabi, mips-elf and sh-elf.


[Bug tree-optimization/50287] New: FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c compilation, -O2 -flto

2011-09-03 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50287

 Bug #: 50287
   Summary: FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c
compilation, -O2 -flto
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kkoj...@gcc.gnu.org
Target: arm-eabi sh*-*-*


Several gcc.c-torture/execute/builtins/*-chk.c tests fail for ARM and SH
with -O2 -flto:

gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c: In function
'__vsnprintf_chk':
gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c:398:1: error: number of
operands and imm-links don't agree in statement
# .MEM_57 = VDEF .MEM_22
ap = ap_18(D);
gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c:398:1: internal compiler
error: verify_ssa failed

A reduced testcase for arm-eabi:

static char buf[4096];

int __attribute__((format(printf,4,0)))
foo (char *str, unsigned int len, unsigned int size, const char *fmt,
 __builtin_va_list ap);

int
foo (char *str, unsigned int len,  unsigned int size, const char *fmt,
 __builtin_va_list ap)
{
  if (!size)
    return 0;

  if (size < len)
    bar (str, buf, size + 1);
  else
    bar (str, buf, len - 1);

  return 0;
}

It has started to fail after revision 178386.  It seems that
the fix for PR49886 reveals this issue.
-fno-partial-inlining makes the ICE go away.


[Bug target/50068] Invalid memory access in incr_ticks_for_insn

2011-08-17 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50068

--- Comment #6 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-08-17 
22:49:21 UTC ---
Author: kkojima
Date: Wed Aug 17 22:49:18 2011
New Revision: 177839

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=177839
Log:
PR target/50068
* config/sh/sh.c (sh_output_mi_thunk): Don't call dbr_schedule.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c


[Bug target/50068] Invalid memory access in incr_ticks_for_insn

2011-08-16 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50068

Kazumoto Kojima kkojima at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|shle--netbsdelf             |sh*-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011-08-16
                 CC|                            |kkojima at gcc dot gnu.org
     Ever Confirmed|0                           |1
      Known to fail|                            |4.4.6, 4.5.3, 4.6.1, 4.7.0

--- Comment #5 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-08-16 
13:00:12 UTC ---
I've added a gcc_assert (last_basic_block >= NUM_FIXED_BLOCKS) line
to init_resource_info and confirmed that trunk and all released branches
fail with the testcase given in #1 for sh4-unknown-linux-gnu.
Perhaps

  if (optimize > 0 && flag_delayed_branch)
    dbr_schedule (insns);

in sh.c:sh_output_mi_thunk might not be a big deal.  I'm testing
a patch which simply removes these lines.


[Bug rtl-optimization/49977] [4.7 Regression] CFI notes are missed for delayed slot

2011-08-09 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49977

Kazumoto Kojima kkojima at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED

--- Comment #11 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-08-10 
02:41:57 UTC ---
Now the testresult for hppa64-hp-hpux11.11 looks good
http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg00952.html

I'd like to close this PR.


[Bug rtl-optimization/49686] [4.7 Regression] CFI notes are missed for delayed slot

2011-08-04 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49686

--- Comment #6 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-08-04 
12:18:09 UTC ---
It seems that the problem comes back on trunk revision 177305
for SH.  There are many EH test failures which went away with
-fno-delayed-branch and the testcase in #1 is assembled to

foo:
.LFB0:
	tst	r4,r4
	bt/s	.L2
	sts.l	pr,@-r15
	mov.l	.L3,r0
	jsr	@r0
	nop

with -O1 -fexceptions -fnon-call-exceptions.


[Bug rtl-optimization/49686] [4.7 Regression] CFI notes are missed for delayed slot

2011-08-04 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49686

--- Comment #8 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-08-04 
13:57:14 UTC ---
Thanks for checking cris-elf.  I'd like to open a new PR.


[Bug rtl-optimization/49977] New: [4.7 Regression] CFI notes are missed for delayed slot

2011-08-04 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49977

   Summary: [4.7 Regression] CFI notes are missed for delayed slot
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kkoj...@gcc.gnu.org
CC: r...@gcc.gnu.org, h...@gcc.gnu.org
Target: sh4-unknown-linux-gnu, cris-elf


Many EH tests fail on SH and CRIS.  These failures went away with
-fno-delayed-branch on SH.  The symptoms are quite similar to those
of PR49686.


[Bug rtl-optimization/49977] [4.7 Regression] CFI notes are missed for delayed slot

2011-08-04 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49977

--- Comment #5 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-08-04 
21:18:55 UTC ---
(In reply to comment #2)
 Kaz, can you enumerate some specific tests that are now failing?

I've got

FAIL: gcc.dg/cleanup-10.c execution test
FAIL: gcc.dg/cleanup-11.c execution test

FAIL: g++.dg/eh/crossjump1.C execution test
FAIL: g++.dg/eh/unexpected1.C execution test
FAIL: g++.dg/ext/cleanup-10.C execution test
FAIL: g++.dg/ext/cleanup-11.C execution test
FAIL: g++.dg/torture/pr49115.C  -O1  execution test
...

A tiny testcase in #1 of PR49686

int foo (int a)
{
  if (a)
    bar ();
  return 1;
}

is again compiled to

foo:
.LFB0:
	tst	r4,r4
	bt/s	.L2
	sts.l	pr,@-r15
	mov.l	.L3,r0
	jsr	@r0
	nop

with -O1 -fexceptions -fnon-call-exceptions.


[Bug rtl-optimization/49982] New: [4.7 Regression] ICE in fixup_args_size_notes, at expr.c:3625

2011-08-04 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49982

   Summary: [4.7 Regression] ICE in fixup_args_size_notes, at
expr.c:3625
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kkoj...@gcc.gnu.org
CC: r...@gcc.gnu.org
Target: sh-*-*


For sh-elf, gcc.c-torture/compile/20030224-1.c fails at -O0 -m4 with ICE:

internal compiler error: in fixup_args_size_notes, at expr.c:3625

#0  fancy_abort (file=0x88edae8 ../../ORIG/trunk/gcc/expr.c, line=3625, 
function=0x88ee6af fixup_args_size_notes)
at ../../ORIG/trunk/gcc/diagnostic.c:893
#1  0x0828560f in fixup_args_size_notes (prev=0xb7f8e18c, last=0xb7f8e1b0, 
end_args_size=0) at ../../ORIG/trunk/gcc/expr.c:3625

where prev and last are

(gdb) call debug_rtx(prev)
(insn 183 182 304 6 (clobber (mem:BLK (reg/f:SI 15 r15) [0 A8]))
20030224-1.c:16 -1
 (nil))
(gdb) call debug_rtx(last)
(insn 184 305 185 6 (set (reg/f:SI 15 r15)
(reg/f:SI 15 r15)) 20030224-1.c:16 176 {movsi_ie}
 (expr_list:REG_ARGS_SIZE (const_int 0 [0])
(expr_list:REG_DEAD (reg:SI 76 fr12 [260])
            (nil))))

It seems that the latter (set stack_pointer_rtx stack_pointer_rtx)
insn confuses fixup_args_size_notes.  The patch below works for me.

--- ORIG/trunk/gcc/expr.c2011-08-04 10:13:24.0 +0900
+++ trunk/gcc/expr.c2011-08-04 20:53:14.0 +0900
@@ -3628,6 +3628,8 @@ fixup_args_size_notes (rtx prev, rtx las
            && XEXP (SET_SRC (set), 0) == stack_pointer_rtx
            && CONST_INT_P (XEXP (SET_SRC (set), 1)))
 this_delta = INTVAL (XEXP (SET_SRC (set), 1));
+  else if (SET_SRC (set) == stack_pointer_rtx)
+this_delta = 0;
   else
 saw_unknown = true;
 }
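
The effect of the added branch can be sketched outside of GCC.  This is a
minimal model, not the code in expr.c: the enum, the struct and the function
name are hypothetical stand-ins for the RTL checks in the hunk above, and
returning false plays the role of setting saw_unknown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of the source of a (set sp ...) insn.  */
enum sp_src_kind
{
  SP_PLUS_CONST,	/* (plus sp (const_int N)) */
  SP_ITSELF,		/* sp, i.e. (set sp sp) */
  SP_OTHER		/* anything else */
};

struct sp_set
{
  enum sp_src_kind kind;
  long offset;		/* only meaningful for SP_PLUS_CONST */
};

/* Store the stack adjustment of SET in *DELTA and return true if it is
   known; return false for the "saw_unknown" case.  */
bool
sp_set_delta (const struct sp_set *set, long *delta)
{
  assert (set != NULL && delta != NULL);

  switch (set->kind)
    {
    case SP_PLUS_CONST:
      *delta = set->offset;
      return true;
    case SP_ITSELF:
      /* The case added by the patch: a no-op move adjusts nothing.  */
      *delta = 0;
      return true;
    default:
      return false;
    }
}
```

With this model, an insn like insn 184 above falls into SP_ITSELF and
yields a known delta of 0 instead of being treated as unknown.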


[Bug rtl-optimization/48596] [4.7 Regression] [SH] unable to find a register to spill in class 'FPUL_REGS'

2011-08-02 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48596

--- Comment #5 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-08-02 
23:53:30 UTC ---
I was trying to find a way to solve it without penalizing -O2
or higher, though it's not easy for me.  It seems that
the target's register_move_cost is the way to discourage trying
to use FP registers for a pointer.  Unfortunately, Pmode is simply
SImode in our case, so this also discourages using an FP reg as
cheap storage for SImode.  I've tried

--- ORIG/trunk/gcc/config/sh/sh.c2011-08-01 09:22:27.0 +0900
+++ trunk/gcc/config/sh/sh.c2011-08-01 09:41:25.0 +0900
@@ -11472,8 +11472,18 @@ sh_register_move_cost (enum machine_mode
        && REGCLASS_HAS_GENERAL_REG (srcclass))
       || (REGCLASS_HAS_GENERAL_REG (dstclass)
           && REGCLASS_HAS_FP_REG (srcclass)))
-return ((TARGET_SHMEDIA ? 4 : TARGET_FMOVD ? 8 : 12)
-* ((GET_MODE_SIZE (mode) + 7) / 8U));
+{
+  if (TARGET_SHMEDIA)
+return 4 * ((GET_MODE_SIZE (mode) + 7) / 8U);
+  else
+{
+  /* Discourage trying to use fp regs for a pointer.  */
+  int addend = (mode == Pmode) ? 40 : 0;
+
+  return (((TARGET_FMOVD ? 8 : 12) + addend)
+  * ((GET_MODE_SIZE (mode) + 7) / 8U));
+}
+}

   if ((dstclass == FPUL_REGS
        && REGCLASS_HAS_GENERAL_REG (srcclass))

on the current trunk and observed some CSiBE testresults.  A bit
surprisingly, there are no code size regressions and one 2%
improvement for teem-1.6.0-src src/bane/gkmsTxf which reduces
to 3192 bytes from 3256 bytes.  Now I'm inclined to apply it
on trunk if it passes the bootstrap/regression/other tests.
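
To make the numbers in the patch concrete, here is a small stand-alone
sketch of the arithmetic.  It is not the GCC function: the name and the
three flag parameters are made up for illustration and stand in for
TARGET_SHMEDIA, TARGET_FMOVD and the test mode == Pmode.

```c
#include <assert.h>

/* Hypothetical model of the patched branch of sh_register_move_cost for
   moves between general and FP registers.  MODE_SIZE is in bytes.  */
unsigned
fp_gen_move_cost (unsigned mode_size, int shmedia, int fmovd, int is_pmode)
{
  /* Number of 8-byte units needed for the mode, rounded up.  */
  unsigned units = (mode_size + 7) / 8U;
  unsigned addend;

  assert (mode_size > 0);

  if (shmedia)
    return 4 * units;

  /* Discourage trying to use FP regs for a pointer.  */
  addend = is_pmode ? 40 : 0;
  return ((fmovd ? 8 : 12) + addend) * units;
}

/* Worked values: a 4-byte Pmode move costs 52 without fmovd and 48 with
   it; a non-Pmode 8-byte move keeps the old cost.  */
void
check_costs (void)
{
  assert (fp_gen_move_cost (4, 0, 0, 1) == 52);
  assert (fp_gen_move_cost (4, 0, 1, 1) == 48);
  assert (fp_gen_move_cost (8, 0, 0, 0) == 12);
}
```

Since Pmode is SImode here, every 4-byte SImode move between the two
register files gets the higher cost, not only real pointers, which is
the side effect mentioned above.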


[Bug target/49880] SuperH: ICE when -m4 is used with -mdiv=call-div1

2011-07-31 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49880

--- Comment #2 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-07-31 
23:01:17 UTC ---
Author: kkojima
Date: Sun Jul 31 23:01:14 2011
New Revision: 176990

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176990
Log:
PR target/49880
* config/sh/sh.md (udivsi3_i1): Enable for TARGET_DIVIDE_CALL_DIV1.
(divsi3_i1): Likewise.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.md


[Bug target/49880] SuperH: ICE when -m4 is used with -mdiv=call-div1

2011-07-28 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49880

Kazumoto Kojima kkojima at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|shle--netbsdelf             |sh*-*-*
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |ice-on-valid-code
   Last reconfirmed|                            |2011.07.28 22:50:01
                 CC|                            |kkojima at gcc dot gnu.org
               Host|i386--netbsdelf             |
     Ever Confirmed|0                           |1
      Known to fail|                            |4.2.5, 4.3.6, 4.4.7, 4.5.5,
                   |                            |4.6.2, 4.7.0
              Build|i386--netbsdelf             |

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-07-28 
22:50:01 UTC ---
I've confirmed that trunk and all released compilers fail with
-m4 -mdiv=call-div1.  I'm testing the patch below.

* config/sh/sh.md (udivsi3_i1): Enable for TARGET_DIVIDE_CALL_DIV1.
(divsi3_i1): Likewise.

--- ORIG/trunk/gcc/config/sh/sh.md2011-07-20 09:27:11.0 +0900
+++ trunk/gcc/config/sh/sh.md2011-07-28 06:49:41.0 +0900
@@ -1609,7 +1609,7 @@
    (clobber (reg:SI PR_REG))
    (clobber (reg:SI R4_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
-  "TARGET_SH1 && ! TARGET_SH4"
+  "TARGET_SH1 && (! TARGET_SH4 || TARGET_DIVIDE_CALL_DIV1)"
   "jsr	@%1%#"
   [(set_attr "type" "sfunc")
    (set_attr "needs_delay_slot" "yes")])
@@ -1815,7 +1815,7 @@
    (clobber (reg:SI R2_REG))
    (clobber (reg:SI R3_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
-  "TARGET_SH1 && ! TARGET_SH4"
+  "TARGET_SH1 && (! TARGET_SH4 || TARGET_DIVIDE_CALL_DIV1)"
   "jsr	@%1%#"
   [(set_attr "type" "sfunc")
    (set_attr "needs_delay_slot" "yes")])
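
The change to the insn condition can be checked with a small truth-table
sketch.  The function names and boolean parameters are hypothetical; they
only model the three target macros in the condition, and the sketch assumes
that -m4 -mdiv=call-div1 makes all three of them true.

```c
#include <assert.h>
#include <stdbool.h>

/* Old condition: "TARGET_SH1 && ! TARGET_SH4".  */
bool
old_cond (bool sh1, bool sh4, bool call_div1)
{
  (void) call_div1;	/* not consulted by the old condition */
  return sh1 && !sh4;
}

/* New condition:
   "TARGET_SH1 && (! TARGET_SH4 || TARGET_DIVIDE_CALL_DIV1)".  */
bool
new_cond (bool sh1, bool sh4, bool call_div1)
{
  return sh1 && (!sh4 || call_div1);
}

/* The failing combination: SH4 with call-div1 had no matching insn
   under the old condition, but has one under the new condition.  */
void
check_conditions (void)
{
  assert (!old_cond (true, true, true));
  assert (new_cond (true, true, true));
}
```

For every other combination of the flags the two conditions agree, so the
patch only enables the patterns for the case that used to ICE.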


[Bug rtl-optimization/49686] New: [4.7 Regression] CFI notes are missed for delayed slot

2011-07-09 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49686

   Summary: [4.7 Regression] CFI notes are missed for delayed slot
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: EH
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kkoj...@gcc.gnu.org
CC: r...@gcc.gnu.org
Target: sh4-unknown-linux-gnu


Many EH tests fail on SH after the recent dwarf2 clean up.  These
failures went away with -fno-delayed-branch.  A tiny testcase is

int foo (int a)
{
  if (a)
    bar ();
  return 1;
}

and with -O1 -fexceptions -fnon-call-exceptions, the assembler
output of the new compiler starts like

foo:
.LFB0:
	tst	r4,r4
	bt/s	.L2
	sts.l	pr,@-r15

while the old compiler outputs CFI for the last frame related
insn sts.l pr,@-r15 in the delayed slot:

foo:
.LFB0:
	tst	r4,r4
.LCFI0:
	bt/s	.L2
	sts.l	pr,@-r15

It seems that dwarf2out_frame_debug emits CFI notes in the middle
of the elements of a SEQUENCE, where they get lost.  The patch below
works for me.

--- ORIG/trunk/gcc/dwarf2cfi.c2011-07-09 14:42:50.0 +0900
+++ trunk/gcc/dwarf2cfi.c2011-07-09 14:46:18.0 +0900
@@ -2170,11 +2170,10 @@ dwarf2out_frame_debug_expr (rtx expr)
sets SP or FP (adjusting how we calculate the frame address) or saves a
register to the stack.  If INSN is NULL_RTX, initialize our state.

-   If AFTER_P is false, we're being called before the insn is emitted,
-   otherwise after.  Call instructions get invoked twice.  */
+   Notes are inserted at WHERE.  Call instructions get invoked twice.  */

 static void
-dwarf2out_frame_debug (rtx insn, bool after_p)
+dwarf2out_frame_debug (rtx insn, rtx where)
 {
   rtx note, n;
   bool handled_one = false;
@@ -2183,13 +2182,13 @@ dwarf2out_frame_debug (rtx insn, bool af
   /* Remember where we are to insert notes.  Do not separate tablejump
  insns from their ADDR_DIFF_VEC.  Putting the note after the VEC
  should be ok.  */
-  if (after_p)
+  if (insn == where)
 {
       if (!tablejump_p (insn, NULL, &cfi_insn))
-cfi_insn = insn;
+cfi_insn = where;
 }
   else
-cfi_insn = PREV_INSN (insn);
+cfi_insn = where;

   if (!NONJUMP_INSN_P (insn) || clobbers_queued_reg_save (insn))
 dwarf2out_flush_queued_reg_saves ();
@@ -2200,7 +2199,7 @@ dwarf2out_frame_debug (rtx insn, bool af
  matter if the stack pointer is not the CFA register anymore but
  is still used to save registers.  */
   if (!ACCUMULATE_OUTGOING_ARGS)
-dwarf2out_notice_stack_adjust (insn, after_p);
+dwarf2out_notice_stack_adjust (insn, (insn == where));
   cfi_insn = NULL;
   return;
 }
@@ -2434,7 +2433,7 @@ create_cfi_notes (void)

   if (BARRIER_P (insn))
 {
-  dwarf2out_frame_debug (insn, false);
+  dwarf2out_frame_debug (insn, PREV_INSN (insn));
   continue;
 }

@@ -2469,7 +2468,7 @@ create_cfi_notes (void)
   pat = PATTERN (insn);
       if (asm_noperands (pat) >= 0)
 {
-  dwarf2out_frame_debug (insn, false);
+  dwarf2out_frame_debug (insn, PREV_INSN (insn));
   continue;
 }

@@ -2477,14 +2476,14 @@ create_cfi_notes (void)
 {
   int i, n = XVECLEN (pat, 0);
       for (i = 1; i < n; ++i)
-dwarf2out_frame_debug (XVECEXP (pat, 0, i), false);
+dwarf2out_frame_debug (XVECEXP (pat, 0, i), PREV_INSN (insn));
 }

   if (CALL_P (insn)
   || find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL))
-dwarf2out_frame_debug (insn, false);
+dwarf2out_frame_debug (insn, PREV_INSN (insn));

-  dwarf2out_frame_debug (insn, true);
+  dwarf2out_frame_debug (insn, insn);
 }
 }


[Bug rtl-optimization/49686] [4.7 Regression] CFI notes are missed for delayed slot

2011-07-09 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49686

--- Comment #5 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-07-09 
21:13:52 UTC ---
Thanks for the quick fix!

(In reply to comment #1)
 Does the regression look something like this?

For sh, the failures were

FAIL: g++.dg/compat/eh/unexpected1 cp_compat_x_tst.o-cp_compat_y_tst.o execute 
FAIL: g++.dg/cpp0x/lambda/lambda-eh2.C execution test
FAIL: g++.dg/eh/crossjump1.C execution test
FAIL: g++.dg/eh/unexpected1.C execution test
FAIL: g++.dg/ext/cleanup-10.C execution test
FAIL: g++.dg/ext/cleanup-11.C execution test
FAIL: g++.dg/torture/pr49115.C  -O1  execution test
...

FAIL: 18_support/exception_ptr/lifespan.cc execution test
FAIL: 18_support/nested_exception/rethrow_if_nested.cc execution test
FAIL: 18_support/nested_exception/throw_with_nested.cc execution test
FAIL: 20_util/function/1.cc execution test
FAIL: 20_util/hash/chi2_quality.cc execution test
FAIL: 20_util/hash/quality.cc execution test
FAIL: 21_strings/basic_string/append/char/1.cc execution test
FAIL: 21_strings/basic_string/append/wchar_t/1.cc execution test
FAIL: 21_strings/basic_string/cons/char/1.cc execution test
FAIL: 21_strings/basic_string/cons/char/3.cc execution test
...


[Bug target/49468] SH Target: inefficient integer abs code

2011-06-27 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49468

--- Comment #5 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-06-27 
06:39:40 UTC ---
Argh, I also missed clobbers.  Looks fine to me now, except
that insn_and_split *negdi2 forgot to set constraints and
some minor coding style issues below.

The first comment should start with a capital letter and
end with a period.  Also please follow GCC C coding style
even for C program segments in .md files.  C lines in the patch
start with a tab instead of 2 spaces.  A long conditional
should be broken like

  (cond
   ? value0
   : value1)

instead of

  (cond ?
 value0 :
 value1)

Please use braces


{
  int low_word = ...
  ...

  emit_insn (...
  DONE;
})

instead of


  int low_word = ...
  ...

  emit_insn (...
  DONE;
)

especially when new variables are used, though those braces
aren't required with the current gen* tools.

 + emit_insn (gen_negsi_cond (operands[0], operands[1], operands[1], 
 + GEN_INT (1)));

The first line has an extra space after the last comma and
the indentation of the 2nd line doesn't match the GCC coding
standard.  BTW, you could use const[01]_rtx for GEN_INT ([01]):

  emit_insn (gen_negsi_cond (operands[0], operands[1], operands[1],
 const1_rtx));

There are similar extra white space + broken indentation issues:

 +(define_insn_and_split negsi_cond
 +  [(set (match_operand:SI 0 arith_reg_dest =r,r)
 + (if_then_else:SI (eq:SI (reg:SI T_REG) 
 + (match_operand:SI 3 
 const_int_operand M,N))
...
 +   emit_label_after (skip_neg_label, 
 + emit_insn (gen_negsi2 
 (operands[0], operands[1])));
...

Perhaps a mail or editor problem?


[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction

2011-06-26 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #6 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-06-27 
05:14:36 UTC ---
(In reply to comment #5)
 Anyway, why not just add all the currently known-to-work cases? What are your
 concerns regarding that? I can imagine that it is a maintenance burden to keep
 all those definitions and special cases in the MD up-to-date (bit rot etc). Do
 you have anything other than that in mind? 

Yep, maintenance burden, but I don't mean ack/nak for anything.
If it's fruitful enough, we should take that route.  When it
gives a 5% improvement in the usual working set like CSiBE,
hundreds of lines would be OK, but if it's ~0.5% or less, it doesn't
look worthwhile to add many patterns for that.

 Isn't there a way to tell the combine pass not to do so, but instead first
 look deeper at what is in the MD?

I don't know how to do it cleanly.

 I guess this might generate wrong code for e.g. if (x & -2). When x has any
 bits[31:1] set this must return true. The code after the peephole optimization
 will look only at the lower 8 bits and would possibly return false for x =
 0xFF00, which is wrong. So it should be satisfies_constraint_K08 only,
 shouldn't it?

You are right.  That peephole was simply 'something like this'.
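
The objection can be reproduced with a few lines of C.  This is only a
sketch: the function names are made up, and it assumes that the K08 check
means "the mask is an unsigned 8-bit value", which is what the tst #imm,r0
form can encode.

```c
#include <assert.h>
#include <stdint.h>

/* Source-level meaning of "if (x & -2)": true when any of bits [31:1]
   is set.  */
int
test_full_mask (uint32_t x)
{
  return (x & (uint32_t) -2) != 0;
}

/* Hypothetical model of a peephole that keeps only the low 8 bits of the
   mask, i.e. emits tst #0xFE,r0 for the test above.  */
int
test_truncated_mask (uint32_t x)
{
  return (x & 0xFEu) != 0;
}

/* Assumed meaning of the K08 check: the mask fits in 0..255.  */
int
mask_fits_imm8 (int32_t mask)
{
  return mask >= 0 && mask <= 255;
}

/* The value from the quoted comment: the two tests disagree for 0xFF00,
   and -2 is rejected as an 8-bit immediate while 0xFE is accepted.  */
void
check_example (void)
{
  assert (test_full_mask (0xFF00u));
  assert (!test_truncated_mask (0xFF00u));
  assert (!mask_fits_imm8 (-2));
  assert (mask_fits_imm8 (0xFE));
}
```

Restricting the peephole to masks accepted by mask_fits_imm8 is the
equivalent of using satisfies_constraint_K08 only.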


[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction

2011-06-22 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #4 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-06-22 
22:34:04 UTC ---
Yes, that peephole doesn't catch all the patterns which could
use tst #imm8,r0.  Perhaps it would be a good idea to get
numbers for a test like CSiBE with the vanilla compiler and with
the compiler patched with the new insns/peepholes.  If something
covers 80% of the possible cases in the usual working set, it would
be successful enough for such a micro-optimization, I guess.

The cost patch looks fine to me.  Could you propose it as a separate
patch on the gcc-patches list with an appropriate ChangeLog entry?
When proposing it, please mention how you've tested it.  Also
the numbers obtained with the patch are highly welcome.

BTW, do you have FSF copyright assignment for your GCC work?
Although the cost patch itself is essentially several lines which
doesn't require copyright assignment, the other changes you've
proposed clearly require the paper work, I think.


[Bug target/49468] SH Target: inefficient integer abs code

2011-06-22 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49468

Kazumoto Kojima kkojima at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #3 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-06-22 
22:37:28 UTC ---
On sh4-unknown-linux-gnu, this patch causes two new failures on
libstdc++ testsuite

FAIL: 27_io/basic_ostream/inserters_arithmetic/char/7.cc execution test
FAIL: 27_io/basic_ostream/inserters_arithmetic/wchar_t/7.cc execution test

I can't find any differences between the code generated for those
test cases by compilers with/without your patch, and the failures
go away if the tests are run with a libstdc++ library built
with the unpatched compiler.
So it seems that something in the libstdc++ library is miscompiled.
Weird and hard to see what is going on, ATM.


[Bug target/49307] [4.5/4.6/4.7 Regression] ICE in spill_failure, at reload1.c:2113

2011-06-16 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49307

--- Comment #4 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-06-16 
22:02:48 UTC ---
Author: kkojima
Date: Thu Jun 16 22:02:45 2011
New Revision: 175116

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175116
Log:
PR target/49307
* config/sh/sh.md (UNSPEC_CHKADD): New.
(chk_guard_add): New define_insn_and_split.
(symGOT_load): Use chk_guard_add instead of blockage.


Added:
branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/pr49307.c
Modified:
branches/gcc-4_6-branch/gcc/ChangeLog
branches/gcc-4_6-branch/gcc/config/sh/sh.md
branches/gcc-4_6-branch/gcc/testsuite/ChangeLog


[Bug target/49307] [4.5/4.6/4.7 Regression] ICE in spill_failure, at reload1.c:2113

2011-06-16 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49307

--- Comment #5 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-06-16 
22:08:23 UTC ---
Author: kkojima
Date: Thu Jun 16 22:08:20 2011
New Revision: 175118

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175118
Log:
PR target/49307
* config/sh/sh.md (UNSPEC_CHKADD): New.
(chk_guard_add): New define_insn_and_split.
(symGOT_load): Use chk_guard_add instead of blockage.


Added:
branches/gcc-4_5-branch/gcc/testsuite/gcc.dg/pr49307.c
Modified:
branches/gcc-4_5-branch/gcc/ChangeLog
branches/gcc-4_5-branch/gcc/config/sh/sh.md
branches/gcc-4_5-branch/gcc/testsuite/ChangeLog

