[Bug target/29560] [avr] Poor optimization for byte shifts

2011-08-10 Thread gjl at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560

--- Comment #10 from Georg-Johann Lay gjl at gcc dot gnu.org 2011-08-10 
08:58:07 UTC ---
Author: gjl
Date: Wed Aug 10 08:58:03 2011
New Revision: 177616

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=177616
Log:

PR target/29560
* config/avr/avr.md (*ashlhiqi3): New insn-and-split.
(*ashlextend_prefixqihiqi3): New insn-and-splits.
(*ashlextend_prefixqihiqi3.mem): New insn-and-splits.
Add peephole2 to map ashlhi3 to ashlqi3 if high part of
shift target is unused.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/avr/avr.md


[Bug target/29560] [avr] Poor optimization for byte shifts

2011-08-10 Thread gjl at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560

Georg-Johann Lay gjl at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED
  Known to fail|4.7.0   |

--- Comment #11 from Georg-Johann Lay gjl at gcc dot gnu.org 2011-08-10 
09:19:49 UTC ---
Closed as FIXED for 4.7.0 trunk with patch from comment #10.

The test case from attachment 24816 compiles with -Os -dp to:

shift1:
rjmp 2f ;  22*ashlqi3/1[length = 5]
1:lsl r24
2:dec r22
brpl 1b
ret ;  27return[length = 1]

shift2:
mov r25,r24 ;  19*movqi/1[length = 1]
rjmp 2f ;  15*ashlqi3/1[length = 5]
1:lsl r25
2:dec r22
brpl 1b
sts y,r25 ;  16*movqi/3[length = 2]
ret ;  22return[length = 1]

setpin:
in r25,44-0x20 ;  11*movqi/4[length = 1]
ldi r18,lo8(1) ;  13*movhi/4[length = 2]
ldi r19,hi8(1)
mov r0,r24 ;  49*ashlqi3/1[length = 5]
rjmp 2f
1:lsl r18
2:dec r0
brpl 1b
cpi r22,lo8(1) ;  7*cmpqi/3[length = 1]
brne .L4 ;  8branch[length = 1]
or r25,r18 ;  15iorqi3/1[length = 1]
out 44-0x20,r25 ;  17*movqi/3[length = 1]
ret ;  46return[length = 1]
.L4:
com r18 ;  27one_cmplqi2[length = 1]
and r18,r25 ;  28andqi3/1[length = 1]
out 44-0x20,r18 ;  30*movqi/3[length = 1]
ret ;  48return[length = 1]


Notice the *ashlqi3 insns.

The fix uses combine patterns and an RTL peephole to transform
16-bit shift to 8-bit shift if applicable.  There will be
situations where optimization opportunities are not noticed,
e.g. if a move occurs after the shift so that this move will
get a REG_UNUSED note but not the shift loop itself.

The fix just replaces 16-bit loop by 8-bit loop; there is no
back-propagation of the information.


[Bug target/29560] [avr] Poor optimization for byte shifts

2011-07-25 Thread gjl at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560

--- Comment #9 from Georg-Johann Lay gjl at gcc dot gnu.org 2011-07-25 
12:48:29 UTC ---
Created attachment 24827
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24827
Fix PR29560 by adding peephole2 pattern.


PR target/29560
* config/avr/avr.md: Add peephole2 to map ashlhi3 to ashlqi3 if
high part of shift target is unused.


[Bug target/29560] [avr] Poor optimization for byte shifts

2011-07-23 Thread gjl at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560

--- Comment #7 from Georg-Johann Lay gjl at gcc dot gnu.org 2011-07-23 
14:50:58 UTC ---
Created attachment 24816
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24816
Test case for 16-bit shifts that use low part only

Reopened this optimization issue.

For the attached test case, there are sometimes REG_UNUSED notes for the high
part like with avr-gcc-4.6.1 -Os -dP

shift1:
 ; (insn 8 4 17 (set (reg:HI 24 r24 [50])
 ; (ashift:HI (reg:HI 24 r24 [ x ])
 ; (reg/v:QI 22 r22 [orig:47 s ] [47]))) foo.c:3 68 {ashlhi3}
 ;  (expr_list:REG_DEAD (reg/v:QI 22 r22 [orig:47 s ] [47])
 ; (expr_list:REG_UNUSED (reg:QI 25 r25)
 ; (nil
rjmp 2f ;  8ashlhi3/1[length = 6]
1:lsl r24
rol r25
2:dec r22
brpl 1b

So that there is a way to map ashlhi3 to ashlqi3, i.e. 16-bit shift to a 8-bit
shift.

Unfortunately, these notes are not always present like in shift2:

shift2:
 ; (insn 15 4 8 (set (reg:HI 18 r18)
 ; (reg:HI 24 r24 [ x ])) foo.c:10 10 {*movhi}
 ;  (nil))
movw r18,r24 ;  15*movhi/1[length = 1]
 ; (insn 8 15 9 (set (reg:HI 18 r18)
 ; (ashift:HI (reg:HI 18 r18)
 ; (reg/v:QI 22 r22 [orig:46 s ] [46]))) foo.c:10 68 {ashlhi3}
 ;  (expr_list:REG_DEAD (reg/v:QI 22 r22 [orig:46 s ] [46])
 ; (nil)))
rjmp 2f ;  8ashlhi3/1[length = 6]
1:lsl r18
rol r19
2:dec r22
brpl 1b

The notes are not present in pass .split2 so a split won't help. The notes
appear to be available in .peephole2 so that could be a fix. Moreover, the
notes won't be back-propagated so that an optimization will cover at most one
insn: the shift insn.


[Bug target/29560] [avr] Poor optimization for byte shifts

2011-07-23 Thread gjl at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560

Georg-Johann Lay gjl at gcc dot gnu.org changed:

   What|Removed |Added

 Status|RESOLVED|NEW
 Resolution|INVALID |
   Target Milestone|--- |4.7.0

--- Comment #8 from Georg-Johann Lay gjl at gcc dot gnu.org 2011-07-23 
14:53:52 UTC ---
Set to NEW as explained in previous post.


[Bug target/29560] [avr] Poor optimization for byte shifts

2011-06-21 Thread gjl at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560

Georg-Johann Lay gjl at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||INVALID

--- Comment #6 from Georg-Johann Lay gjl at gcc dot gnu.org 2011-06-21 
18:24:44 UTC ---
Closing this PR as invalid.

As Andy already explained, for shift offsets 7 and 15 there has to be a
defined result because int for avr-gcc is 16 bits, and this is actually *not* a
byte shift.

With alternative code that tested for these offsets at runtime, jumped around
and loaded 0 if appropriate, you were not better of than with the current code
-- in the contrary.

Note that gcc deflates you function calls to just one instruction in main.

So maybe what you really want here is to make these functions static inline so
that you do no more see the code in the functions.  Presumably you always call
this functions with compile time constants.

If you really need to quench out the last tick and call with non-constants, you
could invent inline assembler with an instruction sequence that is similar to
(1  (R24  7)), perhaps together with __builtin_constant_p magic:

   LDI  R24, 1
   SBRC R24, 1
   LDI  R24, 4
   SBRC R24, 0
   LSL  R24
   SBRC R24, 2
   SWAP R24

Note that with -mint8, int is 8 bits.  That is still present as option, but no
more supported and perhaps non-functional.

--

Just for reference; here is a source without external #include:

#define PORTC (*((unsigned char volatile*) 53))

void setpin(unsigned char pin, unsigned char val)
{
if (val == 1) PORTC |= (1pin);
else PORTC = ~(1pin);
}

void setpin1(unsigned char pin, unsigned char val)
{
const unsigned char one = 1;
if (val == 1) PORTC |= (one(pin));
else PORTC = ~(onepin);
}

int main(void)
{
setpin(3, 1);
setpin1(3, 1);
return 0;
}


[Bug target/29560] [avr] Poor optimization for byte shifts

2010-02-18 Thread Rudolf dot Leitgeb at gmx dot at


--- Comment #5 from Rudolf dot Leitgeb at gmx dot at  2010-02-18 13:28 
---
Right away: I am NOT an expert in compiler or optimizer design, so please bear
with me.

I understand, that (((unsigned char)1)  ((unsigned char)something)) may need
more than 8 bits for representation and that gcc's default int size on the
ATmega platform is 16 bits. But the result is assigned to an 8 bit value. I
take it that there is no way to back propagate this potential optimization from
the assignment to the shifting step. If you look in my assembly code, r25 is
computed with great effort yet never transferred anywhere.

The trick with 7 is nice but introduces another unnecessary AND operation. And
it is only correct if the input numebr is guaranteed to be between 0 and 7. Am
I correct that this whole optimization issue comes from the fact that ATmega is
an 8 bit architecture yet gcc's int on that platform is 16 bit?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560



[Bug target/29560] [avr] Poor optimization for byte shifts

2009-12-30 Thread hutchinsonandy at gcc dot gnu dot org


--- Comment #4 from hutchinsonandy at gcc dot gnu dot org  2009-12-31 03:16 
---
The same occurs in 4.5

I think the bug  is invalid

The expression 1 pin will be promoted. This produces a defined result if
pin7 and 15
The expression can not then be lower to 8 bit shift - since a shift by 7 is
undefined.

If you use pin 7, the shift is indeed 8 bit shift. Though it still loads a int
(HIMODE) value of 1 that should have been reduced to QIMODE.

setpinx:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
mov r25,r24
andi r25,lo8(7)
ldi r18,lo8(1)
ldi r19,hi8(1)
mov r24,r18
rjmp 2f
1:  lsl r24
2:  dec r25
brpl 1b
cpi r22,lo8(1)
brne .L7
in r25,53-0x20
or r25,r24
out 53-0x20,r25
ret
.L7:
in r25,53-0x20
com r24
and r24,r25
out 53-0x20,r24
ret


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560



[Bug target/29560] [avr] Poor optimization for byte shifts

2009-08-24 Thread eric dot weddington at atmel dot com


--- Comment #3 from eric dot weddington at atmel dot com  2009-08-24 17:18 
---
Confirmed on 4.3.2.


-- 

eric dot weddington at atmel dot com changed:

   What|Removed |Added

   Severity|normal  |enhancement
 Status|UNCONFIRMED |NEW
 Ever Confirmed|0   |1
  GCC build triplet|powerpc-apple-darwin7.9.0   |
   GCC host triplet|powerpc-apple-darwin7.9.0   |
   Keywords||missed-optimization
  Known to fail||4.1.1 4.3.2
   Last reconfirmed|-00-00 00:00:00 |2009-08-24 17:18:30
   date||
Summary|Poor optimization for   |[avr] Poor optimization for
   |character shifts on Atmel   |byte shifts
   |AVR |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560