Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-20 Thread Georg-Johann Lay
Denis Chertykov schrieb:
 2011/4/17 Denis Chertykov cherty...@gmail.com:
 2011/4/15 Georg-Johann Lay a...@gjlay.de:
 Finally, I exposed alternative #3 of the insns to the register
 allocator, because it is not possible to distinguish between
 overlapping or non-overlapping regs, and #3 does not need a scratch.

 Ran C-testsuite with no regressions.
 Are you encountered any difference in code size ?
 
 I'm ask about code size because the IRA pass isn't work with
 `scratch:MODE' at all.
 This lead to bad/wrong register allocation in IRA pass.
 The reload pass will correct such a wrong allocation, but reload can't
 generate optimal code. (reload generate correct code).
 Because of that, may be you right and may be better to have
 (clobber (match_operand)) instead of (clobber (match_scratch...)).

So the conclusion is not to commit this patch and that the one-liner
already installed is sufficient to fix the ICE?

 
 Denis.
 



Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-20 Thread Denis Chertykov
2011/4/20 Georg-Johann Lay a...@gjlay.de:
 Denis Chertykov schrieb:
 2011/4/17 Denis Chertykov cherty...@gmail.com:
 2011/4/15 Georg-Johann Lay a...@gjlay.de:
 Finally, I exposed alternative #3 of the insns to the register
 allocator, because it is not possible to distinguish between
 overlapping or non-overlapping regs, and #3 does not need a scratch.

 Ran C-testsuite with no regressions.
 Are you encountered any difference in code size ?

 I'm ask about code size because the IRA pass isn't work with
 `scratch:MODE' at all.
 This lead to bad/wrong register allocation in IRA pass.
 The reload pass will correct such a wrong allocation, but reload can't
 generate optimal code. (reload generate correct code).
 Because of that, may be you right and may be better to have
 (clobber (match_operand)) instead of (clobber (match_scratch...)).

 So the conclusion is not to commit this patch and that the one-liner
 already installed is sufficient to fix the ICE?

Yes.

Denis.


Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-19 Thread Georg-Johann Lay
Denis Chertykov schrieb:
 2011/4/18 Georg-Johann Lay a...@gjlay.de:
 Denis Chertykov schrieb:
 2011/4/17 Denis Chertykov cherty...@gmail.com:
 2011/4/15 Georg-Johann Lay a...@gjlay.de:
 Finally, I exposed alternative #3 of the insns to the register
 allocator, because it is not possible to distinguish between
 overlapping or non-overlapping regs, and #3 does not need a scratch.

 Ran C-testsuite with no regressions.
 Are you encountered any difference in code size ?
 I'm ask about code size because the IRA pass isn't work with
 `scratch:MODE' at all.
 I wonder what the difference is; IRA could treat scratch:MODE just the
 same way as pseudo:MODE. Anyway, thanks for that insight. It's merely
 impossible to get the big picture of IRA/reload from digging sources.

 So that means IRA generates garbage for scratch and reload has to fix
 it, generate spill code if needed, etc?
 
 No garbage, IRA just ignore scratch.
 The reload pass will allocate scratch registers if it's possible.
 If it's impossible, then reload pass will spill and generate reload for
 insns with scratches.
 
 Does this mean that if IRA sees a pseudo together with constraint X
 it will allocate the pseudo, anyway?
 
 I think so, but better to test IRA.
 
 This lead to bad/wrong register allocation in IRA pass.
 The reload pass will correct such a wrong allocation, but reload can't
 generate optimal code. (reload generate correct code).
 Because of that, may be you right and may be better to have
 (clobber (match_operand)) instead of (clobber (match_scratch...)).
 I don't see effects on code size, at least for the binaries left by
 testsuite.

 Is there a policy how to drive size-performance tests for avr?
 Anatoly posted a result on code size some weeks ago (concerning move
 costs), it would be interesting to know his setup and what
 sources/options he uses. Using testsuite seems not appropriate to me,
 because it runs -O3, -finline etc. and most code is no real-world
 code, at least not for avr.
 
 IMHO libgcc is a good test case.

I don't see differences there, either. This is presumably because
rotate pattern is not used resp. used only very seldom.

 FYI, what I observe is a remarkable dependency on subreg lowering that
 can be generalized as this:
 * code size decreases for mode = SImode
 * code size increases for mode  SImode
 
 I think that DImode move instructions must be supported for representative
 test results for mode  SImode.
 
 
 Personally I'm don't want to support DImode for AVR.
 IMHO DImode is an overkill for 8bits controller.

Some people use 64bit on AVR. But to support a movdi would be very
painful, I think it would cause more harm like spill failures than good.

 Another one note:
 Few years ago I have played with early splitting of anything possible
 (move,add,sub,and,...). The results was very bad.
 It's happened because flow of splitted insns (8bits insns) becomes
 unreadable for most of GCC optimisation passes.
 May be splitting is appropriate before register allocation or even after
 reload pass.

The -f[no-]split-wide-types happens before reload, in .subreg1 (prior
to combine) resp. .subreg2 (after combine).

For 64bit, without subreg lowering, there is an insn like

(clobber (reg:DI 43 [ retval ]))

which causes bunch of moves and finally the setup of a frame pointer
even though nothing lives in the frame.

How can add, sub etc. be split? This would need an explicit
representation of carry.

 Denis.
 PS: about code size tests: try to email directly to Anatoly.
 PPS: I can try to call him.

Johann



Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-19 Thread Denis Chertykov
2011/4/19 Georg-Johann Lay a...@gjlay.de:
 How can add, sub etc. be split? This would need an explicit
 representation of carry.

Yes.

Look at http://gcc.gnu.org/ml/gcc/2005-03/msg00871.html

Denis.


Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-19 Thread Georg-Johann Lay
Denis Chertykov schrieb:
 2011/4/19 Georg-Johann Lay a...@gjlay.de:
 How can add, sub etc. be split? This would need an explicit
 representation of carry.
 
 Yes.
 
 Look at http://gcc.gnu.org/ml/gcc/2005-03/msg00871.html

Just skimmed the conversation. I thought about making AVR ISA's
effects on SREG explicit several times, but I always got stuck at some
point.

- It's not only about scheduling (which does not happen for avr) but
  also about moving instructions across jumps.

- Many transformations would happen before reload, but at these stages
the effect on SREG is not yet known in many cases. There is
sophisticated instruction output for many patterns, and their impact
on SREG/CC is not known before reload.

- Making CC explicit would render many single_set insns to PARALLELs
making the optimizers' live much harder or impossible. Imagine
instructions that could be combined. Explicit CC would clutter up
insns and combine won't try to transform the bulky patterns.

- Backend would be much more complicated, harder to maintain and
understand. Almost any insn would have to be changed.

Johann

 
 Denis.
 



Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-19 Thread Richard Earnshaw

On Tue, 2011-04-19 at 15:17 +0400, Denis Chertykov wrote:
 2011/4/19 Georg-Johann Lay a...@gjlay.de:
  Denis Chertykov schrieb:
  2011/4/19 Georg-Johann Lay a...@gjlay.de:
  How can add, sub etc. be split? This would need an explicit
  representation of carry.
 
  Yes.
 
  Look at http://gcc.gnu.org/ml/gcc/2005-03/msg00871.html
 
  Just skimmed the conversation. I thought about making AVR ISA's
  effects on SREG explicit several times, but I always got stuck at some
  point.
 
  - It's not only about scheduling (which does not happen for avr) but
   also about moving instructions across jumps.
 
  - Many transformations would happen before reload, but at these stages
  the effect on SREG is not yet known in many cases. There is
  sophisticated instruction output for many patterns, and their impact
  on SREG/CC is not known before reload.
 
  - Making CC explicit would render many single_set insns to PARALLELs
  making the optimizers' live much harder or impossible. Imagine
  instructions that could be combined. Explicit CC would clutter up
  insns and combine won't try to transform the bulky patterns.
 
  - Backend would be much more complicated, harder to maintain and
  understand. Almost any insn would have to be changed.
 
 Generally, I'm agree with you, the AVR port uses CC0 because of that.

Thumb-1 support in the ARM compiler has similar flag-setting properties
(ie most instructions set the condition codes).  It doesn't use CC0.  It
works because it doesn't model the condition code register at all, but
treats compare/branch sequences as indivisible operations.

R.




Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-18 Thread Georg-Johann Lay
Denis Chertykov schrieb:
 2011/4/17 Denis Chertykov cherty...@gmail.com:
 2011/4/15 Georg-Johann Lay a...@gjlay.de:
 Finally, I exposed alternative #3 of the insns to the register
 allocator, because it is not possible to distinguish between
 overlapping or non-overlapping regs, and #3 does not need a scratch.

 Ran C-testsuite with no regressions.
 Are you encountered any difference in code size ?
 
 I'm ask about code size because the IRA pass isn't work with
 `scratch:MODE' at all.

I wonder what the difference is; IRA could treat scratch:MODE just the
same way as pseudo:MODE. Anyway, thanks for that insight. It's merely
impossible to get the big picture of IRA/reload from digging sources.

So that means IRA generates garbage for scratch and reload has to fix
it, generate spill code if needed, etc?

Does this mean that if IRA sees a pseudo together with constraint X
it will allocate the pseudo, anyway?

 This lead to bad/wrong register allocation in IRA pass.
 The reload pass will correct such a wrong allocation, but reload can't
 generate optimal code. (reload generate correct code).
 Because of that, may be you right and may be better to have
 (clobber (match_operand)) instead of (clobber (match_scratch...)).

I don't see effects on code size, at least for the binaries left by
testsuite.

Is there a policy how to drive size-performance tests for avr?
Anatoly posted a result on code size some weeks ago (concerning move
costs), it would be interesting to know his setup and what
sources/options he uses. Using testsuite seems not appropriate to me,
because it runs -O3, -finline etc. and most code is no real-world
code, at least not for avr.

FYI, what I observe is a remarkable dependency on subreg lowering that
can be generalized as this:
* code size decreases for mode = SImode
* code size increases for mode  SImode

An example for gcc.target/avr/torture/pr41885.c -Os

-fno-split-wide-types

 0008 T rotl_16a
0008 0006 T rotl_16b
0014 0012 T rotl_32a
0026 0010 T rotl_32b
0036 0006 T rotl_32c
0042 0010 T rotl_32d
0052 0010 T rotl_32e
0062 0088 T rotl_64
0150 0106 T rotl_64a
0256 0082 T rotl_64b
0338 0096 T rotl_64c
0434 0098 T rotl_64d
0532 0100 T rotl_64e
0632 0102 T rotl_64f
0734 0104 T rotl_64g
0838 0110 T rotl_64h

-fsplit-wide-types

 0008 T rotl_16a
0008 0006 T rotl_16b
0014 0024 T rotl_32a
0038 0020 T rotl_32b
0058 0024 T rotl_32c
0082 0022 T rotl_32d
0104 0022 T rotl_32e
0126 0036 T rotl_64
0162 0050 T rotl_64a
0212 0026 T rotl_64b
0238 0036 T rotl_64c
0274 0038 T rotl_64d
0312 0040 T rotl_64e
0352 0042 T rotl_64f
0394 0044 T rotl_64g
0438 0038 T rotl_64h

Johann

 
 Denis.
 



Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-18 Thread Denis Chertykov
2011/4/18 Georg-Johann Lay a...@gjlay.de:
 Denis Chertykov schrieb:
 2011/4/17 Denis Chertykov cherty...@gmail.com:
 2011/4/15 Georg-Johann Lay a...@gjlay.de:
 Finally, I exposed alternative #3 of the insns to the register
 allocator, because it is not possible to distinguish between
 overlapping or non-overlapping regs, and #3 does not need a scratch.

 Ran C-testsuite with no regressions.
 Are you encountered any difference in code size ?

 I'm ask about code size because the IRA pass isn't work with
 `scratch:MODE' at all.

 I wonder what the difference is; IRA could treat scratch:MODE just the
 same way as pseudo:MODE. Anyway, thanks for that insight. It's merely
 impossible to get the big picture of IRA/reload from digging sources.

 So that means IRA generates garbage for scratch and reload has to fix
 it, generate spill code if needed, etc?

No garbage, IRA just ignore scratch.
The reload pass will allocate scratch registers if it's possible.
If it's impossible, then reload pass will spill and generate reload for
insns with scratches.

 Does this mean that if IRA sees a pseudo together with constraint X
 it will allocate the pseudo, anyway?

I think so, but better to test IRA.

 This lead to bad/wrong register allocation in IRA pass.
 The reload pass will correct such a wrong allocation, but reload can't
 generate optimal code. (reload generate correct code).
 Because of that, may be you right and may be better to have
 (clobber (match_operand)) instead of (clobber (match_scratch...)).

 I don't see effects on code size, at least for the binaries left by
 testsuite.

 Is there a policy how to drive size-performance tests for avr?
 Anatoly posted a result on code size some weeks ago (concerning move
 costs), it would be interesting to know his setup and what
 sources/options he uses. Using testsuite seems not appropriate to me,
 because it runs -O3, -finline etc. and most code is no real-world
 code, at least not for avr.

IMHO libgcc is a good test case.

 FYI, what I observe is a remarkable dependency on subreg lowering that
 can be generalized as this:
 * code size decreases for mode = SImode
 * code size increases for mode  SImode

I think that DImode move instructions must be supported for representative
test results for mode  SImode.


Personally I'm don't want to support DImode for AVR.
IMHO DImode is an overkill for 8bits controller.


Another one note:
Few years ago I have played with early splitting of anything possible
(move,add,sub,and,...). The results was very bad.
It's happened because flow of splitted insns (8bits insns) becomes
unreadable for most of GCC optimisation passes.
May be splitting is appropriate before register allocation or even after
reload pass.

Denis.
PS: about code size tests: try to email directly to Anatoly.
PPS: I can try to call him.


Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-18 Thread Denis Chertykov
2011/4/18 Denis Chertykov cherty...@gmail.com:
 2011/4/18 Georg-Johann Lay a...@gjlay.de:
 Few years ago I have played with early splitting of anything possible
 (move,add,sub,and,...). The results was very bad.
 It's happened because flow of splitted insns (8bits insns) becomes
 unreadable for most of GCC optimisation passes.
 May be splitting is appropriate before register allocation or even after
 reload pass.

May be something like this (early splitting) happened with the subreg
lowering.

Denis.


Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-17 Thread Denis Chertykov
2011/4/17 Denis Chertykov cherty...@gmail.com:
 2011/4/15 Georg-Johann Lay a...@gjlay.de:
 Finally, I exposed alternative #3 of the insns to the register
 allocator, because it is not possible to distinguish between
 overlapping or non-overlapping regs, and #3 does not need a scratch.

 Ran C-testsuite with no regressions.

 Are you encountered any difference in code size ?

I'm ask about code size because the IRA pass isn't work with
`scratch:MODE' at all.
This lead to bad/wrong register allocation in IRA pass.
The reload pass will correct such a wrong allocation, but reload can't
generate optimal code. (reload generate correct code).
Because of that, may be you right and may be better to have
(clobber (match_operand)) instead of (clobber (match_scratch...)).

Denis.


Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-15 Thread Georg-Johann Lay
Denis Chertykov schrieb:
 2011/4/14 Georg-Johann Lay a...@gjlay.de:
 Denis Chertykov schrieb:
 2011/4/14 Georg-Johann Lay a...@gjlay.de:
 The rotlmode3 expanders (mode \in {HI,SI,DI}) violates synopsis by
 using 4 operands instead of 3. This runs in ICE in top of
 optabs.c:maybe_gen_insn.

 The right way to do this is to use match_dup, not match_operand. So
 the fix is obvious.

 Regenerated avr-gcc and run against avr testsuite:

 gcc.target/avr/torture/pr41885.c generates these patterns

 Johann

 2011-04-14  Georg-Johann Lay  a...@gjlay.de

* config/avr/avr.md (rotlhi3, rotlsi3, rotldi3): Fix
expanders operand 3 from match_operand to match_dup.
 May be better to use (clobber (match_scratch ...)) here and in
 `*rotwmode' and `*rotbmode'.
 These are splitters that might split before reload, producing strange,
 non-matching insns like
  (set (scratch:QI) ...
 or crashing already in avr_rotate_bytes becase the split condition reads
  (reload_completed || MODEmode == DImode)
 
 Generally I'm agree with you, change match_operand to match_dup is easy.

I already submitted the easy patch to avoid the ICE.

 But IMHO the right way is:
  - the splitters and avr_rotate_bytes are wrong and must be fixed too.
  - operands[3] is a scratch register and right way is to use match_scratch.
 
 I can approve the patch.
 
 Denis.

Ok, here is the right-way patch.

The expander generates a SCRATCH instead of a pseudo, and in case of a
pre-reload split the SCRATCH is replaced with a pseudo because it is
not known if or if not the scratch will actually be needed.

avr_rotate_bytes now skips operations on non-REG 'scratch'.
Furthermore, I added an assertion that ensures that 'scratch' is
actually a REG when it is needed.

Besides that, I fixed indentation. I know it is not optimal to fix
indentation alongside with functionality... did it anyway :-)

Finally, I exposed alternative #3 of the insns to the register
allocator, because it is not possible to distinguish between
overlapping or non-overlapping regs, and #3 does not need a scratch.

Ran C-testsuite with no regressions.

Johann

2011-04-15  Georg-Johann Lay  a...@gjlay.de

* config/avr/avr.md (rotlmode3): Use SCRATCH instead
of REG in operand 3. Fix indentation. Unquote C snippet.
(*rotwmode,*rotbwmode): Ditto. Replace scratch
with pseudo for pre-reload splits. Use const_int_operand for
operand 2. Expose alternative 3 to register allocator.
* config/avr/avr.c (avr_rotate_bytes): Handle SCRATCH in
operands[3]. Use GNU indentation style.

Index: config/avr/avr.md
===
--- config/avr/avr.md	(Revision 172493)
+++ config/avr/avr.md	(Arbeitskopie)
@@ -1519,22 +1519,22 @@ (define_mode_attr rotsmode [(DI QI) (S
 
 (define_expand rotlmode3
   [(parallel [(set (match_operand:HIDI 0 register_operand )
-		   (rotate:HIDI (match_operand:HIDI 1 register_operand )
-(match_operand:VOID 2 const_int_operand )))
-		(clobber (match_dup 3))])]
+   (rotate:HIDI (match_operand:HIDI 1 register_operand )
+(match_operand:VOID 2 const_int_operand )))
+  (clobber (match_dup 3))])]
   
-  
-{
-  if (CONST_INT_P (operands[2])  0 == (INTVAL (operands[2]) % 8))
   {
-  if (AVR_HAVE_MOVW  0 == INTVAL (operands[2]) % 16)
-operands[3] = gen_reg_rtx (rotsmodemode);
-  else
-operands[3] = gen_reg_rtx (QImode);
-  }
-  else
-FAIL;
-})
+if (CONST_INT_P (operands[2])
+ 0 == INTVAL (operands[2]) % 8)
+  {
+if (AVR_HAVE_MOVW  0 == INTVAL (operands[2]) % 16)
+  operands[3] = gen_rtx_SCRATCH (rotsmodemode);
+else
+  operands[3] = gen_rtx_SCRATCH (QImode);
+  }
+else
+  FAIL;
+  })
 
 
 ;; Overlapping non-HImode registers often (but not always) need a scratch.
@@ -1545,35 +1545,49 @@ (define_expand rotlmode3
 
 ; Split word aligned rotates using scratch that is mode dependent.
 (define_insn_and_split *rotwmode
-  [(set (match_operand:HIDI 0 register_operand =r,r,#r)
-	(rotate:HIDI (match_operand:HIDI 1 register_operand 0,r,r)
-		 (match_operand 2 immediate_operand n,n,n)))
-   (clobber (match_operand:rotsmode 3 register_operand  =rotx ))]
-  (CONST_INT_P (operands[2]) 
- (0 == (INTVAL (operands[2]) % 16)  AVR_HAVE_MOVW))
+  [(set (match_operand:HIDI 0 register_operand  =r,r,r)
+(rotate:HIDI (match_operand:HIDI 1 register_operand  0,r,r)
+ (match_operand 2 const_int_operand  n,n,n)))
+   (clobber (match_scratch:rotsmode 3   =rotx))]
+  AVR_HAVE_MOVW
+0 == INTVAL (operands[2]) % 16
   #
(reload_completed || MODEmode == DImode)
   [(const_int 0)]
-  avr_rotate_bytes (operands);
-  DONE;
-)
+  {
+/* Split happens in .split1: Cook up a pseudo */
+if (SCRATCH == GET_CODE (operands[3])
+ !reload_completed)
+  {
+operands[3] = gen_reg_rtx (GET_MODE 

Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-14 Thread Georg-Johann Lay
Denis Chertykov schrieb:
 2011/4/14 Georg-Johann Lay a...@gjlay.de:
 The rotlmode3 expanders (mode \in {HI,SI,DI}) violates synopsis by
 using 4 operands instead of 3. This runs in ICE in top of
 optabs.c:maybe_gen_insn.

 The right way to do this is to use match_dup, not match_operand. So
 the fix is obvious.

 Regenerated avr-gcc and run against avr testsuite:

 gcc.target/avr/torture/pr41885.c generates these patterns

 Johann

 2011-04-14  Georg-Johann Lay  a...@gjlay.de

* config/avr/avr.md (rotlhi3, rotlsi3, rotldi3): Fix
expanders operand 3 from match_operand to match_dup.
 
 May be better to use (clobber (match_scratch ...)) here and in
 `*rotwmode' and `*rotbmode'.

These are splitters that might split before reload, producing strange,
non-matching insns like
 (set (scratch:QI) ...
or crashing already in avr_rotate_bytes becase the split condition reads
 (reload_completed || MODEmode == DImode)

By the way: this and other patterns implement DImode insns, but there
is *no* movdi in avr backend. That is strongly discouraged, i.e.
implementing insns for FOOmode if there in no movfoo insn that can do
reloading for that mode. It's tempting to omit movdi (because movdi
would be the most complicated DI insn) but it's unhealthy.

IMO the DI stuff should be removed / deactivated as long as there is
no proper support of DImode.

 
 Denis.