Re: [patch, avr, pr71676 and pr71678] Issues with casesi expand

2016-10-19 Thread Pitchumani Sivanupandi

On Wednesday 19 October 2016 07:51 PM, Georg-Johann Lay wrote:

On 17.10.2016 09:27, Pitchumani Sivanupandi wrote:

On Thursday 13 October 2016 08:42 PM, Georg-Johann Lay wrote:

On 13.10.2016 13:44, Pitchumani Sivanupandi wrote:

On Monday 26 September 2016 08:19 PM, Georg-Johann Lay wrote:

On 26.09.2016 15:19, Pitchumani Sivanupandi wrote:

Attached patch for PR71676 and PR71678.

PR71676 is for AVR target that generates wrong code when switch 
case index is

more than 16 bits.

Switch case index of larger than SImode are checked for out of 
range before
'casesi' expand. RTL expand of casesi gets index as SImode, but 
index is

compared in HImode and ignores upper 16bits.

Attached patch changes the expansion for casesi to make the index 
comparison

in SImode and code generation accordingly.

PR71678 is ICE because below pattern in 'casesi' is not recognized.
(set (reg:HI 47)
 (minus:HI (subreg:HI (subreg:SI (reg:DI 44) 0) 0)
   (reg:HI 45)))

Fix of PR71676 avoids the above pattern as it changes the comparison
to SImode.


But this means that all comparisons are now performed in SImode 
which is a
great performance loss for most programs which will switch on 
16-bit values.


IMO we need a less intrusive (w.r.t. performance) approach.


Yes.

I tried to split 'casesi' into several based on case values so that 
compare is

done
in less expensive modes (i.e. QI or HI). In few cases it is not 
possible

without
SImode subtract/ compare.

Pattern casesi will have index in SI mode. So, out of range checks 
will be

expensive
as most common uses (in AVR) of case values will be in QI/HI mode.

e.g.
  if case values in QI range
if upper three bytes index is set
  goto out_of_range

offset = index - lower_bound (QImode)
if offset > case_range   (QImode)
  goto out_of_range
goto jump_table + offset

  else if case values in HI range
if index[2,3] is set
  goto out_of_range

offset = index - lower_bound (HImode)
if offset > case_range   (HImode)
  goto out_of_range
goto jump_table + offset

This modification will not work for the negative index values. 
Because code to

check
upper bytes of index will be expensive than the SImode subtract/ 
compare.


So, I'm trying to update fix to have SImode subtract/ compare if 
the case

values include
negative integers. For, others will try to optimize as mentioned 
above. Is that

approach OK?


But the above code will be executed at run time and add even more 
overhead,
or am I missing something?  If you conclude statically at expand 
time from
the case ranges then we might hit a similar problem as with the 
original

subreg computation.


No. Lower bound and case range are const_int_operand, known at 
compile time.


Yes, but if the range if form 10 to 90, say, then you still don't know 
whether HImode and QImode is appropriate or not which adds to code 
size and register pressure.


As I mentioned earlier, I am working on a different approach which 
would revert your changes:  The casesi is basically unaltered (except 
for operand clean-ups and avoidance of clobbering subregs).


The ups of my approach are:

* The original value is known and whether is is QI or HI.

* The signedness is known which allows to use the maximum range of
  QI resp. HI depending on the sign.

* Also works on negative values.

* All is done at compile time, no need for extra code.

* No intermediate 32-bit values, no unnecessary increase of reg pressure.

* Optimization can be switched off by -fdisable if desired.

* Result can be seen in dumps.

The downsides are:

* Also needs some lines of code (~400).

* Makes assumptions on the anatomy of the code, i.e. extension
  precedes casesi.

First we should decide which route to follow as the changes are 
conflicting each other.  I have not so much time to work on the stuff 
but the results are promising.  If you are interested in the changes, 
I can supply it but it's all still work in progress.


Ok. I'll put this patch on hold for now.
Please share if you have draft version of your fix is ready.

Thanks.

Regards,
Pitchumani


Re: [patch, avr, pr71676 and pr71678] Issues with casesi expand

2016-10-19 Thread Georg-Johann Lay

On 17.10.2016 09:27, Pitchumani Sivanupandi wrote:

On Thursday 13 October 2016 08:42 PM, Georg-Johann Lay wrote:

On 13.10.2016 13:44, Pitchumani Sivanupandi wrote:

On Monday 26 September 2016 08:19 PM, Georg-Johann Lay wrote:

On 26.09.2016 15:19, Pitchumani Sivanupandi wrote:

Attached patch for PR71676 and PR71678.

PR71676 is for AVR target that generates wrong code when switch case index is
more than 16 bits.

Switch case index of larger than SImode are checked for out of range before
'casesi' expand. RTL expand of casesi gets index as SImode, but index is
compared in HImode and ignores upper 16bits.

Attached patch changes the expansion for casesi to make the index comparison
in SImode and code generation accordingly.

PR71678 is ICE because below pattern in 'casesi' is not recognized.
(set (reg:HI 47)
 (minus:HI (subreg:HI (subreg:SI (reg:DI 44) 0) 0)
   (reg:HI 45)))

Fix of PR71676 avoids the above pattern as it changes the comparison
to SImode.


But this means that all comparisons are now performed in SImode which is a
great performance loss for most programs which will switch on 16-bit values.

IMO we need a less intrusive (w.r.t. performance) approach.


Yes.

I tried to split 'casesi' into several based on case values so that compare is
done
in less expensive modes (i.e. QI or HI). In few cases it is not possible
without
SImode subtract/ compare.

Pattern casesi will have index in SI mode. So, out of range checks will be
expensive
as most common uses (in AVR) of case values will be in QI/HI mode.

e.g.
  if case values in QI range
if upper three bytes index is set
  goto out_of_range

offset = index - lower_bound (QImode)
if offset > case_range   (QImode)
  goto out_of_range
goto jump_table + offset

  else if case values in HI range
if index[2,3] is set
  goto out_of_range

offset = index - lower_bound (HImode)
if offset > case_range   (HImode)
  goto out_of_range
goto jump_table + offset

This modification will not work for the negative index values. Because code to
check
upper bytes of index will be expensive than the SImode subtract/ compare.

So, I'm trying to update fix to have SImode subtract/ compare if the case
values include
negative integers. For, others will try to optimize as mentioned above. Is that
approach OK?


But the above code will be executed at run time and add even more overhead,
or am I missing something?  If you conclude statically at expand time from
the case ranges then we might hit a similar problem as with the original
subreg computation.


No. Lower bound and case range are const_int_operand, known at compile time.


Yes, but if the range if form 10 to 90, say, then you still don't know whether 
HImode and QImode is appropriate or not which adds to code size and register 
pressure.


As I mentioned earlier, I am working on a different approach which would revert 
your changes:  The casesi is basically unaltered (except for operand clean-ups 
and avoidance of clobbering subregs).


The ups of my approach are:

* The original value is known and whether is is QI or HI.

* The signedness is known which allows to use the maximum range of
  QI resp. HI depending on the sign.

* Also works on negative values.

* All is done at compile time, no need for extra code.

* No intermediate 32-bit values, no unnecessary increase of reg pressure.

* Optimization can be switched off by -fdisable if desired.

* Result can be seen in dumps.

The downsides are:

* Also needs some lines of code (~400).

* Makes assumptions on the anatomy of the code, i.e. extension
  precedes casesi.

First we should decide which route to follow as the changes are conflicting 
each other.  I have not so much time to work on the stuff but the results are 
promising.  If you are interested in the changes, I can supply it but it's all 
still work in progress.


Johann


Tried to optimize code generated based on case values range.
Attached the revised patch.

Tested with avrtest, no regression found.

Is it OK?


Unfortunately, the generated code (setting cc0, a reg and pc) cannot be
wrapped into an unspec or parallel and then later be rectified...

I am thinking about a new avr target pass to tidy up the code if no 32-bit
computation is needed, but this will be some effort.

Ok.


Regards,
Pitchumani

gcc/ChangeLog

2016-10-17  Pitchumani Sivanupandi 

PR target/71676
PR target/71678
* config/avr/avr.md (casesi_index_qi, casesi_index_hi, casesi_index_si):
Add new expands, called by casesi based on case values range.

gcc/testsuite/ChangeLog

2016-10-17  Pitchumani Sivanupandi 

PR target/71676
PR target/71678
* gcc.target/avr/pr71676-1.c: New test.
* gcc.target/avr/pr71676.c: New test.
* gcc.target/avr/pr71678.c: New test.





Re: [patch, avr, pr71676 and pr71678] Issues with casesi expand

2016-10-17 Thread Pitchumani Sivanupandi

On Thursday 13 October 2016 08:42 PM, Georg-Johann Lay wrote:

On 13.10.2016 13:44, Pitchumani Sivanupandi wrote:

On Monday 26 September 2016 08:19 PM, Georg-Johann Lay wrote:

On 26.09.2016 15:19, Pitchumani Sivanupandi wrote:

Attached patch for PR71676 and PR71678.

PR71676 is for AVR target that generates wrong code when switch 
case index is

more than 16 bits.

Switch case index of larger than SImode are checked for out of 
range before
'casesi' expand. RTL expand of casesi gets index as SImode, but 
index is

compared in HImode and ignores upper 16bits.

Attached patch changes the expansion for casesi to make the index 
comparison

in SImode and code generation accordingly.

PR71678 is ICE because below pattern in 'casesi' is not recognized.
(set (reg:HI 47)
 (minus:HI (subreg:HI (subreg:SI (reg:DI 44) 0) 0)
   (reg:HI 45)))

Fix of PR71676 avoids the above pattern as it changes the comparison
to SImode.


But this means that all comparisons are now performed in SImode 
which is a
great performance loss for most programs which will switch on 16-bit 
values.


IMO we need a less intrusive (w.r.t. performance) approach.


Yes.

I tried to split 'casesi' into several based on case values so that 
compare is

done
in less expensive modes (i.e. QI or HI). In few cases it is not 
possible without

SImode subtract/ compare.

Pattern casesi will have index in SI mode. So, out of range checks 
will be

expensive
as most common uses (in AVR) of case values will be in QI/HI mode.

e.g.
  if case values in QI range
if upper three bytes index is set
  goto out_of_range

offset = index - lower_bound (QImode)
if offset > case_range   (QImode)
  goto out_of_range
goto jump_table + offset

  else if case values in HI range
if index[2,3] is set
  goto out_of_range

offset = index - lower_bound (HImode)
if offset > case_range   (HImode)
  goto out_of_range
goto jump_table + offset

This modification will not work for the negative index values. 
Because code to

check
upper bytes of index will be expensive than the SImode subtract/ 
compare.


So, I'm trying to update fix to have SImode subtract/ compare if the 
case

values include
negative integers. For, others will try to optimize as mentioned 
above. Is that

approach OK?


But the above code will be executed at run time and add even more 
overhead, or am I missing something?  If you conclude statically at 
expand time from the case ranges then we might hit a similar problem 
as with the original subreg computation.


No. Lower bound and case range are const_int_operand, known at compile time.
Tried to optimize code generated based on case values range.
Attached the revised patch.

Tested with avrtest, no regression found.

Is it OK?

Unfortunately, the generated code (setting cc0, a reg and pc) cannot 
be wrapped into an unspec or parallel and then later be rectified...


I am thinking about a new avr target pass to tidy up the code if no 
32-bit computation is needed, but this will be some effort.

Ok.


Regards,
Pitchumani

gcc/ChangeLog

2016-10-17  Pitchumani Sivanupandi 

PR target/71676
PR target/71678
* config/avr/avr.md (casesi_index_qi, casesi_index_hi, 
casesi_index_si):

Add new expands, called by casesi based on case values range.

gcc/testsuite/ChangeLog

2016-10-17  Pitchumani Sivanupandi 

PR target/71676
PR target/71678
* gcc.target/avr/pr71676-1.c: New test.
* gcc.target/avr/pr71676.c: New test.
* gcc.target/avr/pr71678.c: New test.

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 97f3561..b58db14 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -5152,16 +5152,95 @@
(set_attr "isa" "eijmp")
(set_attr "cc" "clobber")])
 
+; casesi for QI mode case values
+(define_expand "casesi_index_qi"
+  [(set (cc0)
+(compare:QI (subreg:QI (match_dup 0) 0)
+(match_operand 2 "const_int_operand" "")))
 
-(define_expand "casesi"
-  [(parallel [(set (match_dup 6)
-   (minus:HI (subreg:HI (match_operand:SI 0 "register_operand" "") 0)
- (match_operand:HI 1 "register_operand" "")))
-  (clobber (scratch:QI))])
-   (parallel [(set (cc0)
-   (compare (match_dup 6)
-(match_operand:HI 2 "register_operand" "")))
-  (clobber (match_scratch:QI 9 ""))])
+   (set (pc)
+(if_then_else (gtu (cc0)
+   (const_int 0))
+  (label_ref (match_operand 4 "" ""))
+  (pc)))
+
+   (set (match_dup 9)
+(match_dup 7))
+
+   (parallel [(set (pc)
+	   (unspec:HI [(match_dup 9)] UNSPEC_INDEX_JMP))
+  (use (label_ref (match_operand 3 "" "")))
+  (clobber (match_dup 9))
+  (clobber (match_dup 8))])]
+  ""
+  {
+operands[5] = gen_reg_rtx 

Re: [patch, avr, pr71676 and pr71678] Issues with casesi expand

2016-10-13 Thread Georg-Johann Lay

On 13.10.2016 13:44, Pitchumani Sivanupandi wrote:

On Monday 26 September 2016 08:19 PM, Georg-Johann Lay wrote:

On 26.09.2016 15:19, Pitchumani Sivanupandi wrote:

Attached patch for PR71676 and PR71678.

PR71676 is for AVR target that generates wrong code when switch case index is
more than 16 bits.

Switch case index of larger than SImode are checked for out of range before
'casesi' expand. RTL expand of casesi gets index as SImode, but index is
compared in HImode and ignores upper 16bits.

Attached patch changes the expansion for casesi to make the index comparison
in SImode and code generation accordingly.

PR71678 is ICE because below pattern in 'casesi' is not recognized.
(set (reg:HI 47)
 (minus:HI (subreg:HI (subreg:SI (reg:DI 44) 0) 0)
   (reg:HI 45)))

Fix of PR71676 avoids the above pattern as it changes the comparison
to SImode.


But this means that all comparisons are now performed in SImode which is a
great performance loss for most programs which will switch on 16-bit values.

IMO we need a less intrusive (w.r.t. performance) approach.


Yes.

I tried to split 'casesi' into several based on case values so that compare is
done
in less expensive modes (i.e. QI or HI). In few cases it is not possible without
SImode subtract/ compare.

Pattern casesi will have index in SI mode. So, out of range checks will be
expensive
as most common uses (in AVR) of case values will be in QI/HI mode.

e.g.
  if case values in QI range
if upper three bytes index is set
  goto out_of_range

offset = index - lower_bound (QImode)
if offset > case_range   (QImode)
  goto out_of_range
goto jump_table + offset

  else if case values in HI range
if index[2,3] is set
  goto out_of_range

offset = index - lower_bound (HImode)
if offset > case_range   (HImode)
  goto out_of_range
goto jump_table + offset

This modification will not work for the negative index values. Because code to
check
upper bytes of index will be expensive than the SImode subtract/ compare.

So, I'm trying to update fix to have SImode subtract/ compare if the case
values include
negative integers. For, others will try to optimize as mentioned above. Is that
approach OK?


But the above code will be executed at run time and add even more overhead, or 
am I missing something?  If you conclude statically at expand time from the 
case ranges then we might hit a similar problem as with the original subreg 
computation.


Unfortunately, the generated code (setting cc0, a reg and pc) cannot be wrapped 
into an unspec or parallel and then later be rectified...


I am thinking about a new avr target pass to tidy up the code if no 32-bit 
computation is needed, but this will be some effort.



Johann



Alternatively we can have flags to generate shorter code for 'casesi' using 
HImode
subtract/ compare. But correctness is not guaranteed (PR71676).

Regards,
Pitchumani






Re: [patch, avr, pr71676 and pr71678] Issues with casesi expand

2016-10-13 Thread Pitchumani Sivanupandi

On Monday 26 September 2016 08:19 PM, Georg-Johann Lay wrote:

On 26.09.2016 15:19, Pitchumani Sivanupandi wrote:

Attached patch for PR71676 and PR71678.

PR71676 is for AVR target that generates wrong code when switch case 
index is

more than 16 bits.

Switch case index of larger than SImode are checked for out of range 
before

'casesi' expand. RTL expand of casesi gets index as SImode, but index is
compared in HImode and ignores upper 16bits.

Attached patch changes the expansion for casesi to make the index 
comparison

in SImode and code generation accordingly.

PR71678 is ICE because below pattern in 'casesi' is not recognized.
(set (reg:HI 47)
 (minus:HI (subreg:HI (subreg:SI (reg:DI 44) 0) 0)
   (reg:HI 45)))

Fix of PR71676 avoids the above pattern as it changes the comparison
to SImode.


But this means that all comparisons are now performed in SImode which 
is a great performance loss for most programs which will switch on 
16-bit values.


IMO we need a less intrusive (w.r.t. performance) approach.


Yes.

I tried to split 'casesi' into several based on case values so that 
compare is done
in less expensive modes (i.e. QI or HI). In few cases it is not possible 
without

SImode subtract/ compare.

Pattern casesi will have index in SI mode. So, out of range checks will 
be expensive

as most common uses (in AVR) of case values will be in QI/HI mode.

e.g.
  if case values in QI range
if upper three bytes index is set
  goto out_of_range

offset = index - lower_bound (QImode)
if offset > case_range   (QImode)
  goto out_of_range
goto jump_table + offset

  else if case values in HI range
if index[2,3] is set
  goto out_of_range

offset = index - lower_bound (HImode)
if offset > case_range   (HImode)
  goto out_of_range
goto jump_table + offset

This modification will not work for the negative index values. Because 
code to check

upper bytes of index will be expensive than the SImode subtract/ compare.

So, I'm trying to update fix to have SImode subtract/ compare if the 
case values include
negative integers. For, others will try to optimize as mentioned above. 
Is that approach OK?


Alternatively we can have flags to generate shorter code for 'casesi' 
using HImode

subtract/ compare. But correctness is not guaranteed (PR71676).

Regards,
Pitchumani



Re: [patch, avr, pr71676 and pr71678] Issues with casesi expand

2016-09-26 Thread Georg-Johann Lay

On 26.09.2016 15:19, Pitchumani Sivanupandi wrote:

Attached patch for PR71676 and PR71678.

PR71676 is for AVR target that generates wrong code when switch case index is
more than 16 bits.

Switch case index of larger than SImode are checked for out of range before
'casesi' expand. RTL expand of casesi gets index as SImode, but index is
compared in HImode and ignores upper 16bits.

Attached patch changes the expansion for casesi to make the index comparison
in SImode and code generation accordingly.

PR71678 is ICE because below pattern in 'casesi' is not recognized.
(set (reg:HI 47)
 (minus:HI (subreg:HI (subreg:SI (reg:DI 44) 0) 0)
   (reg:HI 45)))

Fix of PR71676 avoids the above pattern as it changes the comparison
to SImode.


But this means that all comparisons are now performed in SImode which is a 
great performance loss for most programs which will switch on 16-bit values.


IMO we need a less intrusive (w.r.t. performance) approach.

Johann



Regtested using avrtest. No regression found.

If OK, could someone commit please?

Is this OK for gcc-5-branch?

Regards,
Pitchumani

gcc/ChangeLog

2016-09-26  Pitchumani Sivanupandi  

PR target/71676
PR target/71678
* config/avr/avr.md (casesi): Change index compare to SI mode.

gcc/testsuite/ChangeLog

2016-09-26  Pitchumani Sivanupandi  

PR target/71676
PR target/71678
* gcc.target/avr/pr71676-1.c: New test.
* gcc.target/avr/pr71676.c: New test.
* gcc.target/avr/pr71678.c: New test.