Re: [PATCH, rs6000] Correct match pattern in pr56605.c

2022-04-10 Thread HAO CHEN GUI via Gcc-patches
Hi,

On 9/4/2022 上午 3:36, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Feb 28, 2022 at 11:17:27AM +0800, HAO CHEN GUI wrote:
>>   This patch corrects the match pattern in pr56605.c. The former pattern
>> is wrong and test case fails with GCC11. It should match following insn on
>> each subtarget after mode promotion is disabled. The patch need to be
>> backported to GCC11.
>>
>> //gimple
>> _17 = (unsigned int) _20;
>>  prolog_loop_niters.4_23 = _17 & 3;
>>
>> //rtl
>> (insn 19 18 20 2 (parallel [
>> (set (reg:CC 208)
>> (compare:CC (and:SI (subreg:SI (reg:DI 207) 0)
>> (const_int 3 [0x3]))
>> (const_int 0 [0])))
>> (set (reg:SI 129 [ prolog_loop_niters.5 ])
>> (and:SI (subreg:SI (reg:DI 207) 0)
>> (const_int 3 [0x3])))
>> ]) 197 {*andsi3_imm_mask_dot2}
>>
>>
>>   Bootstrapped and tested on powerpc64-linux BE/LE and AIX with no 
>> regressions.
>> Is this okay for trunk and GCC11? Any recommendations? Thanks a lot.
>>
>> ChangeLog
>> 2022-02-28 Haochen Gui 
>>
>> gcc/testsuite/
>>  PR target/102146
>>  * gcc.target/powerpc/pr56605.c: Correct match pattern in combine pass.
>>
>>
>> patch.diff
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr56605.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr56605.c
>> index fdedbfc573d..231d808aa99 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/pr56605.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr56605.c
>> @@ -11,5 +11,5 @@ void foo (short* __restrict sb, int* __restrict ia)
>>  ia[i] = (int) sb[i];
>>  }
>>
>> -/* { dg-final { scan-rtl-dump-times {\(compare:CC 
>> \((?:and|zero_extend):(?:DI) \((?:sub)?reg:[SD]I} 1 "combine" } } */
>> +/* { dg-final { scan-rtl-dump-times {\(compare:CC \(and:SI \(subreg:SI 
>> \(reg:DI} 1 "combine" } } */
> 
> The old pattern uses non-capturing braces here, which are required for
> ...-times to work correctly.  The zero_extend alternative is required as
> well, as is making the subreg optional (we have an actual reg in one of
> the cases currently).  What do you consider wrong about the old pattern,
> what in the generated code is different from what you expect?
> 
> It works correctly on p7 etc. btw; where do you see it fail?  p10?
> 
> 
I saw it failed with GCC11.

FAIL: gcc.target/powerpc/pr56605.c scan-rtl-dump-times combine "\\(compare:CC 
\\((?:and|zero_extend):(?:DI) \\((?:sub)?reg:[SD]I" 1

On ppc64le with GCC11, it should match following insn.

(compare:CC (and:SI (subreg:SI (reg:DI 208) 0)

With GCC12, it should match following insn.

(compare:CC (and:SI (subreg:SI (reg:DI 207) 0)

With GCC12 the pattern actually matches:

(compare:CC (and:DI (subreg:DI (reg:SI 136 [ niters.6 ]) 0)

So GCC12 doesn't fail the case. But it actually match wrong insn.
There is no such insn in GCC11 combine dump. So GCC11 hits the problem.

Thanks.

> Segher


[PATCH v2, pushed] rs6000/test: Adjust p9-vec-length-{full, epil}-7.c [PR103196]

2022-04-10 Thread Kewen.Lin via Gcc-patches
on 2022/4/8 10:34 PM, Segher Boessenkool wrote:
> Hi!
> 
> Thanks for investigating.
> 
> On Fri, Apr 08, 2022 at 03:25:51PM +0800, Kewen.Lin wrote:
>> on 2022/4/8 3:29 AM, Segher Boessenkool wrote:
>>> On Thu, Apr 07, 2022 at 09:19:51AM -0500, will schmidt wrote:
 On Mon, 2022-02-28 at 13:37 +0800, Kewen.Lin via Gcc-patches wrote:
> As PR103196 shows, p9-vec-length-full-7.c needs to be adjusted as the
> complete unrolling can happen on some of its loops.  This patch is to
> use pragma "GCC unroll 0" to disable all possible loop unrollings.
> Hope it can help the case not that fragile.

 ok

 Is the lack of effectiveness of "-fno-unroll-loops" otherwise
 understood, or is there further issue behind that option? 
>>>
>>> There is -fpeel-loops as well, and cunroll is independent of all of
>>> these as well?
>>
>> Yes, kind of.  As below code, unroll-loops and peel-loops only affect
>> whether it's acceptable to grow size.
>>
>>   /* Allow cunroll to grow size accordingly.  */
>>   if (!opts_set->x_flag_cunroll_grow_size)
>> opts->x_flag_cunroll_grow_size
>>   = (opts->x_flag_unroll_loops
>>  || opts->x_flag_peel_loops
>>  || opts->x_optimize >= 3);
>>
>> This flag_cunroll_grow_size doesn't gate the pass, it's only for
>> "may_increase_size".
> 
> Yuck.
> 
>>   unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size, 
>> true);
>>
>> The cunroll still can take effect if its transformation doesn't grow
>> size even if we specify -fno-unroll-loops and -fno-peel-loops.
> 
> Yeah.  And the pragma disables it, so we really should have a command
> line option that does the same (but globally), too.
> 
 I would
 expect the effect of the option, versus the pragma, two to roughly
 equivalent.   Obviously it is not.  :-)
>>>
>>> Yes, me too.  And I do not see what makes the difference, if it isn't
>>> the peel thing :-(
>>>
>>> Ke Wen, can you try with -fno-peel-loops please?
>>
>> I had a try and it didn't help as the cunroll pass doesn't grow size
>> for this case.  By the way, -fdisable-tree-cunroll does work.  But I
>> thought using pragma looks better as it's not an internal thing like
>> the disabling option and I also expect other future unrolling related
>> changes would respect it.
>>
>> Do you also prefer pragma, or want that disabling option?
> 
> That flag please.  And we should have a -fno-unroll that disables all
> separate kinds of unrolling, imo -- but that is a separate problem, not
> something to hold up your patch for.
> 
> Okay for trunk without the pragma.  Thanks!
> 

Thanks Segher!  I just committed the attached patch in r12-8075 which
follows your preference with -fdisable-tree-cunroll.

> (The pragma is a good solution if you want to disable unrolling for a
> very specific piece of code only, which is not the case here: the
> problem currently shows up in one place, but it could happen elsewhere!)
> 
> 

Yeah, but for this failure, all loops of the cases use the same macro
on which the proposed pragma is supposed to be used, so all loops should
be affected (I would expect it's the same as global effect).  :)
BR,
Kewen
From e2ecbeeb47d20d6679993a74064bc9efdb827eef Mon Sep 17 00:00:00 2001
From: Kewen Lin 
Date: Sun, 10 Apr 2022 21:31:09 -0500
Subject: [PATCH] rs6000/test: Adjust p9-vec-length-{full,epil}-7.c [PR103196]

As PR103196 shows, complete unrolling pass still takes effect even
if we have specified the option "-fno-unroll-loops".  The loops in
that case are not expected to be transformed by it, otherwise the
expected counts change.  This patch is to add the disabling option
to make them not sensitive to complete unrolling.

PR testsuite/103196

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/p9-vec-length-epil-7.c: Add option
-fdisable-tree-cunroll.
* gcc.target/powerpc/p9-vec-length-full-7.c: Likewise.
---
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c | 4 +++-
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c 
b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
index a27ee347ca1..011b731f7c5 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
@@ -1,5 +1,7 @@
 /* { dg-do compile { target { lp64 && powerpc_p9vector_ok } } } */
-/* { dg-options "-mdejagnu-cpu=power9 -O2 -ftree-vectorize 
-fno-vect-cost-model -fno-unroll-loops -ffast-math" } */
+/* Pass cunroll isn't disabled by -fno-unroll-loops, so use explicit
+   disabling option for it.  */
+/* { dg-options "-mdejagnu-cpu=power9 -O2 -ftree-vectorize 
-fno-vect-cost-model -fno-unroll-loops -ffast-math -fdisable-tree-cunroll" } */
 
 /* { dg-additional-options "--param=vect-partial-vector-usage=1" } */
 
diff --git 

[committed] Minor bfin codegen bugfix

2022-04-10 Thread Jeff Law via Gcc-patches
builtin-arith-overflow-3 has been failing on bfin-elf for a while. I've 
had a workaround installed on the tester for a few months and I finally 
got some time to dig into it over the weekend.


Ultimately I was able to nail the problem down to the representation of 
the rotate left by one position instruction.  I'm actually quite happy 
with that result as I was briefly worried it was a deeper problem in 
combine.


Essentially we're trying to synthesize a DImode shift using an SImode 
rotates through CC.  The key backend insn looks like:

(define_insn "rol_one"
  [(set (match_operand:SI 0 "register_operand" "+d")
    (ior:SI (ashift:SI (match_operand:SI 1 "register_operand" "d") 
(const_int 1))

    (zero_extend:SI (reg:BI REG_CC
   (set (reg:BI REG_CC)
    (zero_extract:BI (match_dup 1) (const_int 31) (const_int 0)))]
  ""
  "%0 = ROT %1 BY 1%!"
  [(set_attr "type" "dsp32shiftimm")])


What the 2nd set in the pattern wants to do is indicate that CC is set 
to the high bit from operand 1.  But the RTL pattern is completely bogus 
as it's asking for a 31 bit wide field starting at offset 0, which is 
just silly for BImode.  That bogus representation led combine to do some 
"unexpected" transformations.


The obvious fix is to use (const_int 1) (const_int 31) in the 2nd set.  
That indicates we want one bit starting at position 31.


With that pattern fixed, bfin returns to a normal state without needing 
my workaround.


Installed on the trunk,
Jeff


commit 8d331aab65488b3998bd106205bbe6cab5df31b5
Author: Jeff Law 
Date:   Sun Apr 10 23:02:48 2022 -0400

[committed] Minor bfin codegen bugfix

gcc/
* config/bfin/bfin.md (rol_one): Fix pattern to indicate the
sign bit of the source ends up in CC.

diff --git a/gcc/config/bfin/bfin.md b/gcc/config/bfin/bfin.md
index 0e44653d7cb..56b24726bc2 100644
--- a/gcc/config/bfin/bfin.md
+++ b/gcc/config/bfin/bfin.md
@@ -1741,7 +1741,7 @@
(ior:SI (ashift:SI (match_operand:SI 1 "register_operand" "d") 
(const_int 1))
(zero_extend:SI (reg:BI REG_CC
(set (reg:BI REG_CC)
-   (zero_extract:BI (match_dup 1) (const_int 31) (const_int 0)))]
+   (zero_extract:BI (match_dup 1) (const_int 1) (const_int 31)))]
   ""
   "%0 = ROT %1 BY 1%!"
   [(set_attr "type" "dsp32shiftimm")])


Re: [PATCH, rs6000] Correct match pattern in pr56605.c

2022-04-10 Thread HAO CHEN GUI via Gcc-patches
Hi,

On 9/4/2022 上午 12:48, will schmidt wrote:
> On Mon, 2022-02-28 at 11:17 +0800, HAO CHEN GUI via Gcc-patches wrote:
>> Hi,
>>   This patch corrects the match pattern in pr56605.c. The former pattern
>> is wrong and test case fails with GCC11. It should match following insn on
>> each subtarget after mode promotion is disabled. The patch need to be
>> backported to GCC11.
>>
> 
> Hi,
> 
> I note This patch appears to (partially?) address the P1 [11 regression] pr.  
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102146

There are two issues left in this PR. One is pr56605.c. My patch fixes it.
Another is prefix-no-update.c. The patch Segher proposed in 103197 could fix it.
Thanks.
> 
> 
> The issue makes reference to a different proposed patch 
> in issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103197
> titled  ppc inline expansion of memcpy/memmove should not use lxsibzx/stxsibx 
> for a single byte
> proposed patch named
> rs6000: Disparage lfiwzx and similar
> 
> I can't address any of the background or history there.  :-)
> 
> 
>> //gimple
>> _17 = (unsigned int) _20;
>>  prolog_loop_niters.4_23 = _17 & 3;
>>
>> //rtl
>> (insn 19 18 20 2 (parallel [
>> (set (reg:CC 208)
>> (compare:CC (and:SI (subreg:SI (reg:DI 207) 0)
>> (const_int 3 [0x3]))
>> (const_int 0 [0])))
>> (set (reg:SI 129 [ prolog_loop_niters.5 ])
>> (and:SI (subreg:SI (reg:DI 207) 0)
>> (const_int 3 [0x3])))
>> ]) 197 {*andsi3_imm_mask_dot2}
>>
>>
>>   Bootstrapped and tested on powerpc64-linux BE/LE and AIX with no 
>> regressions.
>> Is this okay for trunk and GCC11? Any recommendations? Thanks a lot.
>>
>> ChangeLog
>> 2022-02-28 Haochen Gui 
>>
>> gcc/testsuite/
>>  PR target/102146
>>  * gcc.target/powerpc/pr56605.c: Correct match pattern in combine pass.
>>
>>
>> patch.diff
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr56605.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr56605.c
>> index fdedbfc573d..231d808aa99 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/pr56605.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr56605.c
>> @@ -11,5 +11,5 @@ void foo (short* __restrict sb, int* __restrict ia)
>>  ia[i] = (int) sb[i];
>>  }
>>
>> -/* { dg-final { scan-rtl-dump-times {\(compare:CC 
>> \((?:and|zero_extend):(?:DI) \((?:sub)?reg:[SD]I} 1 "combine" } } */
>> +/* { dg-final { scan-rtl-dump-times {\(compare:CC \(and:SI \(subreg:SI 
>> \(reg:DI} 1 "combine" } } */
> 
> 
> SO with the update, (i squint so this is an approximate handwave) this
> drops the zero_extend and changes the destination type to be DI for the
> scan-rtl.This appears to match the rtl as mentioned in the patch
> comments.
> 
> 
>>
> 


Avoid overflow in ipa-modref-tree.cc

2022-04-10 Thread Jan Hubicka via Gcc-patches
Hi,
the testcase triggers ICE since computation overflows on two accesses
that are very far away d->b[-144115188075855873] and d->b[144678138029277184].
This patch makes the relevant part of modref to use poly_offset_int.

It is kind of weird to store bit offsets into poly_int64 but it is what
alias oracle does.  There are some other places where computations are
done that may overflow, I will fix them incrementally (it is not
completely mechanical due to signed/unsigned info).

Bootstrapped/regtested x86_64-linux.  I plan to commit it tomorrow if
there are no complains.

gcc/ChangeLog:

2022-04-11  Jan Hubicka  

* ipa-modref-tree.cc (modref_access_node::closer_pair_p): Use
poly_offset_int to avoid overflow.
(modref_access_node::update2): Likewise.

gcc/testsuite/ChangeLog:

2022-04-11  Jan Hubicka  

* gcc.c-torture/compile/103818.c: New test.

diff --git a/gcc/ipa-modref-tree.cc b/gcc/ipa-modref-tree.cc
index f19af8c2b55..44cb645954f 100644
--- a/gcc/ipa-modref-tree.cc
+++ b/gcc/ipa-modref-tree.cc
@@ -267,34 +267,42 @@ modref_access_node::closer_pair_p (const 
modref_access_node ,
 
 
   /* Now compute distance of the intervals.  */
-  poly_int64 dist1, dist2;
+  poly_offset_int dist1, dist2;
   if (known_le (offseta1, offsetb1))
 {
   if (!known_size_p (a1.max_size))
dist1 = 0;
   else
-   dist1 = offsetb1 - offseta1 - a1.max_size;
+   dist1 = (poly_offset_int)offsetb1
+   - (poly_offset_int)offseta1
+   - (poly_offset_int)a1.max_size;
 }
   else
 {
   if (!known_size_p (b1.max_size))
dist1 = 0;
   else
-   dist1 = offseta1 - offsetb1 - b1.max_size;
+   dist1 = (poly_offset_int)offseta1
+- (poly_offset_int)offsetb1
+- (poly_offset_int)b1.max_size;
 }
   if (known_le (offseta2, offsetb2))
 {
   if (!known_size_p (a2.max_size))
dist2 = 0;
   else
-   dist2 = offsetb2 - offseta2 - a2.max_size;
+   dist2 = (poly_offset_int)offsetb2
+   - (poly_offset_int)offseta2
+   - (poly_offset_int)a2.max_size;
 }
   else
 {
   if (!known_size_p (b2.max_size))
dist2 = 0;
   else
-   dist2 = offseta2 - offsetb2 - b2.max_size;
+   dist2 = offseta2
+   - (poly_offset_int)offsetb2
+   - (poly_offset_int)b2.max_size;
 }
   /* It may happen that intervals overlap in case size
  is different.  Prefer the overlap to non-overlap.  */
@@ -380,9 +388,16 @@ modref_access_node::update2 (poly_int64 parm_offset1,
 new_max_size = max_size2;
   else
 {
-  new_max_size = max_size2 + offset2 - offset1;
-  if (known_le (new_max_size, max_size1))
-   new_max_size = max_size1;
+  poly_offset_int s = (poly_offset_int)max_size2
+ + (poly_offset_int)offset2
+ - (poly_offset_int)offset1;
+  if (s.to_shwi (_max_size))
+   {
+ if (known_le (new_max_size, max_size1))
+   new_max_size = max_size1;
+   }
+  else
+   new_max_size = -1;
 }
 
   update (parm_offset1, offset1,
diff --git a/gcc/testsuite/gcc.c-torture/compile/103818.c 
b/gcc/testsuite/gcc.c-torture/compile/103818.c
new file mode 100644
index 000..8a0148e243a
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/103818.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target lp64 } } */
+struct a {
+  int b[0];
+} c(struct a *d) {
+  d->b[0] = d->b[-144115188075855873] + d->b[11] * d->b[2] +
+d->b[0] % d->b[1025] + d->b[5];
+  d->b[0] =
+  d->b[144678138029277184] + d->b[0] & d->b[-3] * d->b[053] + d->b[7] ^
+  d->b[-9] + d->b[14] + d->b[9] % d->b[49] + d->b[024] + d->b[82] &
+  d->b[4096];
+}
+void main() {}


Re: [PATCH v3] MIPS: IPL is 8bit in Cause and Status registers if TARGET_MCU

2022-04-10 Thread Maciej W. Rozycki
On Tue, 15 Mar 2022, YunQiang Su wrote:

> If MIPS MCU extension is enable, the IPL section in Cause and Status
> registers has been expand to 8bit instead of 6bit.
> 
> In Cause: the bits are 10-17.
> In Status: the bits are 10-16 and 18.
> 
> MD00834-2B-MUCON-AFP-01.03.pdf: P49 and P61.

 I can see you have committed this change, but I cannot see an approval 
posted to the mailing list.  Who has approved your change?

> diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> index a1c4b437cd4..91e1e964f94 100644
> --- a/gcc/config/mips/mips.cc
> +++ b/gcc/config/mips/mips.cc
> @@ -12254,10 +12254,22 @@ mips_expand_prologue (void)
> /* Insert the RIPL into our copy of SR (k1) as the new IPL.  */
> if (!cfun->machine->keep_interrupts_masked_p
> && cfun->machine->int_mask == INT_MASK_EIC)
> - emit_insn (gen_insvsi (gen_rtx_REG (SImode, K1_REG_NUM),
> -GEN_INT (6),
> + {
> +   emit_insn (gen_insvsi (gen_rtx_REG (SImode, K1_REG_NUM),
> +TARGET_MCU ? GEN_INT (7) : GEN_INT (6),
>  GEN_INT (SR_IPL),
>  gen_rtx_REG (SImode, K0_REG_NUM)));
> +   if (TARGET_MCU)
> + {
> +   emit_insn (gen_lshrsi3 (gen_rtx_REG (SImode, K0_REG_NUM),
> + gen_rtx_REG (SImode, K0_REG_NUM),
> + GEN_INT (7)));
> +   emit_insn (gen_insvsi (gen_rtx_REG (SImode, K1_REG_NUM),
> +GEN_INT (1),
> +GEN_INT (SR_IPL+8),
> +gen_rtx_REG (SImode, K0_REG_NUM)));
> + }
> + }

 While code generation has been corrected your change has code formatting 
issues which should have been addressed before committing.  A test case 
should have been made too.

  Maciej


New Swedish PO file for 'gcc' (version 12.1-b20220403)

2022-04-10 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

https://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-12.1-b20220403.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH, v2] PR fortran/105184 - ICE in gfc_array_init_size, at fortran/trans-array.cc:5841

2022-04-10 Thread Thomas Koenig via Gcc-patches



Hi Harald,


Regtested again with no new failures.  OK for mainline?


Looks good to me.  Thanks for the patch!

Best regards

Thomas


Re: [PATCH, v2] PR fortran/105184 - ICE in gfc_array_init_size, at fortran/trans-array.cc:5841

2022-04-10 Thread Paul Richard Thomas via Gcc-patches
Hi Harald,

It looks good to me - OK for mainline.

Thanks

Paul


On Fri, 8 Apr 2022 at 21:45, Harald Anlauf via Fortran 
wrote:

> Dear all,
>
> Am 06.04.22 um 22:30 schrieb Harald Anlauf via Fortran:
> > Dear all,
> >
> > the logic for checking the allocate-coshape-spec in an ALLOCATE
> > statement was sort of sideways, and allowed to pass invalid
> > specifications to the code generation.
> >
> > The fix seems obvious (to me).
>
> after submitting the previous patch, I found another invalid case with
> a missing lower bound which was still silently accepted.  The attached
> revised patch adds this check, improves the original error message to
> actually point to the coarray specification, and renames the testcase
> to align better with existing coarray testcases.
>
> > Regtested on x86_64-pc-linux-gnu.  OK for mainline?
> > (12 or wait for 13?).
>
> Regtested again with no new failures.  OK for mainline?
>
> Thanks,
> Harald
>


-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein