RE: [PATCH] PR rtl-optimization/106594: Preserve zero_extend in combine when cheap.

2023-03-06 Thread Tamar Christina via Gcc-patches
> Hi!
> 
> On Sun, Mar 05, 2023 at 03:33:40PM -0600, Segher Boessenkool wrote:
> > On Sun, Mar 05, 2023 at 08:43:20PM +, Tamar Christina wrote:
> > Yes, *look* better: I have seen no proof or indication that this would
> 
> ("looks", I cannot type, sorry)
> 
> > actually generate better code, not even on just aarch, let alone on
> > the majority of targets.  As I said I have a test running, you may be
> > lucky even :-)  It has to run for about six hours more and after that
> > it needs analysis still (a few more hours if it isn't obviously always
> > better or worse), so expect results tomorrow night at the earliest.
> 
> The results are in:
> 
> $ perl sizes.pl --percent C[12]
> C1C2
>alpha   7082243  100.066%
>  arc   4207975  100.015%
>  arm  11518624  100.008%
>arm64  24514565  100.067%
>armhf  16661684  100.098%
> csky   4031841  100.002%
> i386 0 0
> ia64  20354295  100.029%
> m68k   4394084  100.023%
>   microblaze   6549965  100.014%
> mips  10684680  100.024%
>   mips64   8171850  100.002%
>nios2   4356713  100.012%
> openrisc   5010570  100.003%
>   parisc   8406294  100.002%
> parisc64 0 0
>  powerpc  11104901   99.992%
>powerpc64  24532358  100.057%
>  powerpc64le  21293219  100.062%
>  riscv32   2028474  100.131%
>  riscv64   9515453  100.120%
> s390  20519612  100.279%
>   sh 0 0
>  shnommu   1840960  100.012%
>sparc   5314422  100.004%
>  sparc64   7964129   99.992%
>   x86_64 0 0
>   xtensa   2925723  100.070%
> 
> 
> C1 is the original, C2 with your patch.  These numbers are the code sizes of a
> Linux kernel, some defconfig for every arch.  This is a good measure of how
> effective combine was.
> 
> The patch is a tiny win for sparc64 and classic powerpc32 only, but bad
> everywhere else.  Look at that s390 number!  Or riscv, or most of the arm
> variants (including aarch64).
> 
> Do you want me to look in detail what causes this regression on some
> particular target, i.e. why we really still need the expand_compound
> functionality there?
> 

Hi,

Thanks for having a look! I think the Richards are exploring a different 
solution on the PR
so I don't think it's worth looking at now (maybe in stage-1?).  Thanks for 
checking though!

I Appreciate you all helping to get this fixed!

Kind Regards,
Tamar

> (Btw.  "0" means the target did not build.  For the x86 targets this is just 
> more
> -Werror madness that seeped in it seems.  For parisc64 and sh it is the choice
> of config.  Will fix.)
> 
> 
> Segher


Re: [PATCH] PR rtl-optimization/106594: Preserve zero_extend in combine when cheap.

2023-03-06 Thread Segher Boessenkool
Hi!

On Sun, Mar 05, 2023 at 03:33:40PM -0600, Segher Boessenkool wrote:
> On Sun, Mar 05, 2023 at 08:43:20PM +, Tamar Christina wrote:
> Yes, *look* better: I have seen no proof or indication that this would

("looks", I cannot type, sorry)

> actually generate better code, not even on just aarch, let alone on the
> majority of targets.  As I said I have a test running, you may be lucky
> even :-)  It has to run for about six hours more and after that it needs
> analysis still (a few more hours if it isn't obviously always better or
> worse), so expect results tomorrow night at the earliest.

The results are in:

$ perl sizes.pl --percent C[12]
C1C2
   alpha   7082243  100.066%
 arc   4207975  100.015%
 arm  11518624  100.008%
   arm64  24514565  100.067%
   armhf  16661684  100.098%
csky   4031841  100.002%
i386 0 0
ia64  20354295  100.029%
m68k   4394084  100.023%
  microblaze   6549965  100.014%
mips  10684680  100.024%
  mips64   8171850  100.002%
   nios2   4356713  100.012%
openrisc   5010570  100.003%
  parisc   8406294  100.002%
parisc64 0 0
 powerpc  11104901   99.992%
   powerpc64  24532358  100.057%
 powerpc64le  21293219  100.062%
 riscv32   2028474  100.131%
 riscv64   9515453  100.120%
s390  20519612  100.279%
  sh 0 0
 shnommu   1840960  100.012%
   sparc   5314422  100.004%
 sparc64   7964129   99.992%
  x86_64 0 0
  xtensa   2925723  100.070%


C1 is the original, C2 with your patch.  These numbers are the code
sizes of a Linux kernel, some defconfig for every arch.  This is a good
measure of how effective combine was.

The patch is a tiny win for sparc64 and classic powerpc32 only, but bad
everywhere else.  Look at that s390 number!  Or riscv, or most of the
arm variants (including aarch64).

Do you want me to look in detail what causes this regression on some
particular target, i.e. why we really still need the expand_compound
functionality there?

(Btw.  "0" means the target did not build.  For the x86 targets this is
just more -Werror madness that seeped in it seems.  For parisc64 and sh
it is the choice of config.  Will fix.)


Segher


Re: [PATCH] PR rtl-optimization/106594: Preserve zero_extend in combine when cheap.

2023-03-05 Thread Segher Boessenkool
Hi!

On Sun, Mar 05, 2023 at 08:43:20PM +, Tamar Christina wrote:
> > On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote:
> > > The regression was reported during stage-1. A patch was provided during
> > stage 1 and the discussions around combine stalled.
> > >
> > > The regression for AArch64 needs to be fixed in GCC 13. The hit is too 
> > > big just
> > to "take".
> > >
> > > So we need a way forward, even if it's stage-4.
> > Then it needs to be in a way that works within the design constraints of
> > combine.
> > 
> > As Segher has indicated, using a magic constant to say "this is always cheap
> > enough" isn't acceptable.  Furthermore, what this patch changes is combine's
> > internal canonicalization of extensions into shift pairs.
> > 
> > So I think another path forward needs to be found.  I don't see hacking up
> > expand_compound_operation is viable.
> 
> I'm not arguing at all about the merits of the patch. My argument was about 
> Segher saying he doesn't think this is a P1 regression or one that should be 
> addressed in stage-4.

That is not what I said (in the PR).  I said:
  Either this should not be P1, or the proposed patch is taking
  completely the wrong direction.  P1 means there is a regression.
  There is no regression in combine, in fact the proposed patch would
  *cause* regressions on many targets!
()

> We noticed and reported the regression early on during stage-1.  So I'm 
> unsure what else we should have done and it's not right to waive off fixing 
> it now, otherwise what's the point in us filing bug reports.

Something that fixes the regression is of course welcome.  But something
that *causes* many regressions is not.  Something that makes
compound_operation stuff better on all targets is more than welcome as
well, but *better* on *all* targets, not regressing most.  This really
is stage 1 material most likely.  I have been chipping away at this for
years, I don't expect any trivial patch can help, and it certainly won't
solve most problems here.

Maybe a target hook for this is best.  But not a completely ad-hoc one,
something usable and maintainable please.  So, it should say we do not
want certain kinds of code (or what kind of code we want instead), and
it should not add magic to the bowels of basic passes, magic that just
happens to make the code of particular testcases look better on a
particular target.

Yes, *look* better: I have seen no proof or indication that this would
actually generate better code, not even on just aarch, let alone on the
majority of targets.  As I said I have a test running, you may be lucky
even :-)  It has to run for about six hours more and after that it needs
analysis still (a few more hours if it isn't obviously always better or
worse), so expect results tomorrow night at the earliest.


Segher


RE: [PATCH] PR rtl-optimization/106594: Preserve zero_extend in combine when cheap.

2023-03-05 Thread Tamar Christina via Gcc-patches
> 
> On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote:
> >
> > The regression was reported during stage-1. A patch was provided during
> stage 1 and the discussions around combine stalled.
> >
> > The regression for AArch64 needs to be fixed in GCC 13. The hit is too big 
> > just
> to "take".
> >
> > So we need a way forward, even if it's stage-4.
> Then it needs to be in a way that works within the design constraints of
> combine.
> 
> As Segher has indicated, using a magic constant to say "this is always cheap
> enough" isn't acceptable.  Furthermore, what this patch changes is combine's
> internal canonicalization of extensions into shift pairs.
> 
> So I think another path forward needs to be found.  I don't see hacking up
> expand_compound_operation is viable.

I'm not arguing at all about the merits of the patch. My argument was about 
Segher saying he doesn't think this is a P1 regression or one that should be 
addressed in stage-4.

We noticed and reported the regression early on during stage-1.  So I'm unsure 
what else we should have done and it's not right to waive off fixing it now, 
otherwise what's the point in us filing bug reports.

Tamar.
> 
> Jeff


Re: [PATCH] PR rtl-optimization/106594: Preserve zero_extend in combine when cheap.

2023-03-05 Thread Jeff Law via Gcc-patches




On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote:


The regression was reported during stage-1. A patch was provided during stage 1 
and the discussions around combine stalled.

The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just to 
"take".

So we need a way forward, even if it's stage-4.
Then it needs to be in a way that works within the design constraints of 
combine.


As Segher has indicated, using a magic constant to say "this is always 
cheap enough" isn't acceptable.  Furthermore, what this patch changes is 
combine's internal canonicalization of extensions into shift pairs.


So I think another path forward needs to be found.  I don't see hacking 
up expand_compound_operation is viable.


Jeff


Re: [PATCH] PR rtl-optimization/106594: Preserve zero_extend in combine when cheap.

2023-03-05 Thread Tamar Christina via Gcc-patches


The regression was reported during stage-1. A patch was provided during stage 1 
and the discussions around combine stalled.

The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just 
to "take".

So we need a way forward, even if it's stage-4.

Thanks,
Tamar


From: Segher Boessenkool 
Sent: Saturday, March 4, 2023 10:17 PM
To: Roger Sayle 
Cc: 'GCC Patches' ; Tamar Christina 
; Richard Sandiford 
Subject: Re: [PATCH] PR rtl-optimization/106594: Preserve zero_extend in 
combine when cheap.

On Sat, Mar 04, 2023 at 06:32:15PM -, Roger Sayle wrote:
> This patch addresses PR rtl-optimization/106594, a P1 performance
> regression affecting aarch64.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.

It should be tested for performance everywhere else, too.

It can very easily result in worse code on some targets.  This kind of
thing really should be done in stage 1, not stage 4.

> PR rtl-optimization/106594
> * combine.cc (expand_compound_operation): Don't expand/transform
> ZERO_EXTEND or SIGN_EXTEND on targets where rtx_cost claims they are
> cheap.

That is not how combine works.  If the old code is more expensive than
what combine comes up with., it *should* transform it.  Magic cost
cutoffs are not okay anywhere in combine, either.

If expand_compound_operation and friends misbehave (not really an "if",
unfortunately), then please fix that, instead of randomly disabling
parts of combine?


Segher


Re: [PATCH] PR rtl-optimization/106594: Preserve zero_extend in combine when cheap.

2023-03-04 Thread Segher Boessenkool
On Sat, Mar 04, 2023 at 06:32:15PM -, Roger Sayle wrote:
> This patch addresses PR rtl-optimization/106594, a P1 performance
> regression affecting aarch64.
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.

It should be tested for performance everywhere else, too.

It can very easily result in worse code on some targets.  This kind of
thing really should be done in stage 1, not stage 4.

> PR rtl-optimization/106594
> * combine.cc (expand_compound_operation): Don't expand/transform
> ZERO_EXTEND or SIGN_EXTEND on targets where rtx_cost claims they are
> cheap.

That is not how combine works.  If the old code is more expensive than
what combine comes up with., it *should* transform it.  Magic cost
cutoffs are not okay anywhere in combine, either.

If expand_compound_operation and friends misbehave (not really an "if",
unfortunately), then please fix that, instead of randomly disabling
parts of combine?


Segher