[Bug target/96866] ICE in print_operand_address, at config/rs6000/rs6000.c:13560

2024-04-25 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96866

--- Comment #3 from Jiu Fu Guo  ---

However, I'm wondering whether we could accept this code and handle it as
something like:

(insn 5 4 6 (set (reg/f:DI 118)
(mem/u/c:DI (unspec:DI [
(symbol_ref/u:DI ("*.LC0") [flags 0x2])
(reg:DI 2 2)
] UNSPEC_TOCREL) [2  S8 A8])) "t.c":8:8 -1
 (expr_list:REG_EQUAL (symbol_ref:DI ("x") [flags 0x80]  )
(nil)))

(insn 6 5 0 (parallel [
(asm_operands/v ("#%a0") ("") 0 [
(reg/f:DI 118)
]
 [
(asm_input:DI ("X") t.c:9)
]
 [] t.c:9)
(clobber (reg:SI 98 ca))
]) "t.c":9:3 -1
 (nil))

[Bug target/96866] ICE in print_operand_address, at config/rs6000/rs6000.c:13560

2024-04-25 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96866

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #2 from Jiu Fu Guo  ---
with -fPIC, the asm insn in RTL looks like:

(insn 8 7 0 (parallel [ 
(asm_operands/v ("#%a0") ("") 0 [   
(symbol_ref:DI ("x") [flags 0x80]  )
]   
 [  
(asm_input:DI ("X") t.c:9)  
]   
 [] t.c:9)  
(clobber (reg:SI 98 ca))
]) "t.c":9:3 -1 
 (nil))


Here operand 0 of the asm is "(symbol_ref:DI ("x")..)", which is not rejected as
an invalid address.
Some targets (e.g. x86_64) report an error (like "invalid constraints for
operand") for this code.

This PR mentions ice-on-invalid-code too :)

[Bug target/98140] Reused register by xsmincdp leads to wrong NaN propagation on Power9

2024-04-24 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98140

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #2 from Jiu Fu Guo  ---
(In reply to Alexander Grund from comment #1)
> It looks like this was fixed in 10.1 by this commit
> https://github.com/gcc-mirror/gcc/commit/
> 37e0df8a9be5a8232f4ccb73cdadb02121ba523f
...
> `HONOR_NANS (compare_mode)` case. However it still ignores signed zeros.

'xsmincdp' may be fine for zeros, so it seems the '!HONOR_SIGNED_ZEROS' check
is not needed.

> Maybe xsmindp would be a better fit as it preserves the signed zeros. Only
> downside I see is that it converts sNan to qNan which may be an issue.
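
To make the zero and NaN cases in this discussion concrete, here is a minimal sketch in plain C of the expression "a < b ? a : b". It assumes, without checking the ISA text, that this is the source-level expression the instruction is meant to implement; the names ternary_min and check_ternary_min are made up for this example and are not GCC code.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: the plain C "minimum" written as a ternary.  */
double
ternary_min (double a, double b)
{
  return a < b ? a : b;
}

/* Self-check of the corner cases: every comparison involving a NaN or
   two zeros is false, so the second operand is returned.  */
void
check_ternary_min (void)
{
  assert (ternary_min (NAN, 1.0) == 1.0);       /* NaN first: returns b */
  assert (isnan (ternary_min (1.0, NAN)));      /* NaN second: returns b */
  assert (signbit (ternary_min (0.0, -0.0)));   /* returns -0.0 */
  assert (!signbit (ternary_min (-0.0, 0.0)));  /* returns +0.0 */
}
```

The point of the sketch is only that the ternary form always yields its second operand when the comparison is false, which covers both the NaN and the signed-zero cases.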

[Bug target/114786] ICE in recog.cc: unrecognizable insn while compiling bcd-3.c for power pc

2024-04-22 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114786

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED
 CC||guojiufu at gcc dot gnu.org

--- Comment #1 from Jiu Fu Guo  ---
This seems a duplicate of PR100736.

*** This bug has been marked as a duplicate of bug 100736 ***

[Bug target/100736] ICE: unrecognizable insn

2024-04-22 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100736

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||pheeck at gcc dot gnu.org

--- Comment #7 from Jiu Fu Guo  ---
*** Bug 114786 has been marked as a duplicate of this bug. ***

[Bug target/95782] [s390] ICE in _cpp_pop_context

2024-03-27 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95782

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #3 from Jiu Fu Guo  ---
(In reply to Andrew Pinski from comment #2)
> The powerpc issue was fixed in GCC 13 (most likely by
> r13-220-g067fe66c8ba9b16feacf66fce9ae668091e42821 ).
> 
> s390 most likely needs the same treatment:
> ```
> [apinski@xeond2 rs6000]$ git diff ../s390/s390-c.cc
> diff --git a/gcc/config/s390/s390-c.cc b/gcc/config/s390/s390-c.cc
> index 8d3d1a467a8..8096b1ff7c1 100644
> --- a/gcc/config/s390/s390-c.cc
> +++ b/gcc/config/s390/s390-c.cc
> @@ -275,7 +275,7 @@ s390_macro_to_expand (cpp_reader *pfile, const cpp_token
> *tok)
>/* __vector long __bool a; */
>if (ident == C_CPP_HASHNODE (__bool_keyword))
> expand_bool_p = true;
> -  else
> +  else if (ident)
> {
>   /* Triggered with: __vector long long __bool a; */
>   do
> 
> ```
> 
> I cannot test this at all, and a similar testcase in PR 101168 should be
> added for s390.

Tested with a cross compiler: this change fixes the issue as expected.

[Bug testsuite/106879] [13 regression] new test case gcc.dg/vect/bb-slp-layout-19.c fails

2024-01-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106879

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #11 from Jiu Fu Guo  ---
(In reply to Richard Biener from comment #9)
> Fixed on trunk I suppose?  If so please also sync to the 13 branch and close
> this issue.

Thanks.

[Bug middle-end/29215] [4.3 Regression] extra store for memcpy

2024-01-03 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29215
Bug 29215 depends on bug 30271, which changed state.

Bug 30271 Summary: -mstrict-align can add an store extra for struct argument 
passing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30271

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug other/16996] [meta-bug] code size improvements

2024-01-03 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16996
Bug 16996 depends on bug 30271, which changed state.

Bug 30271 Summary: -mstrict-align can add an store extra for struct argument 
passing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30271

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug middle-end/101926] [meta-bug] struct/complex/other argument passing and return should be improved

2024-01-03 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101926
Bug 101926 depends on bug 30271, which changed state.

Bug 30271 Summary: -mstrict-align can add an store extra for struct argument 
passing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30271

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug target/30271] -mstrict-align can add an store extra for struct argument passing

2024-01-03 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30271

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #15 from Jiu Fu Guo  ---
Fix was committed.

[Bug middle-end/101926] [meta-bug] struct/complex/other argument passing and return should be improved

2024-01-03 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101926
Bug 101926 depends on bug 112525, which changed state.

Bug 112525 Summary: fail to eliminate unused store
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112525

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/112525] fail to eliminate unused store

2024-01-03 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112525

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #8 from Jiu Fu Guo  ---
Fix committed.

[Bug middle-end/113109] [14 Regression] g++ EH tests fail at execution time for cris-elf after r14-6674-g4759383245ac97

2023-12-23 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113109

--- Comment #13 from Jiu Fu Guo  ---
(In reply to GCC Commits from comment #9)
> The master branch has been updated by Hans-Peter Nilsson :
> 
> https://gcc.gnu.org/g:3d03630b123411340e52d05124cb0cacfa1fc8b0
> 
> commit r14-6817-g3d03630b123411340e52d05124cb0cacfa1fc8b0
> Author: Hans-Peter Nilsson 
> Date:   Sun Dec 24 00:10:32 2023 +0100
> 
> CRIS: Fix PR middle-end/113109; "throw" failing
> 
> TL;DR: the "dse1" pass removed the eh-return-address store.  The
> PA also marks its EH_RETURN_HANDLER_RTX as volatile, for the same
> reason, as does visum.  See PR32769 - it's the same thing on PA.
> 
> Conceptually, it's logical that stores to incoming args are
> optimized out on the return path or if no loads are seen -
> at least before epilogue expansion, when the subsequent load
> isn't seen in the RTL, as is the case for the "dse1" pass.

Stores to the argp/frame area can be eliminated only if they are not used.
In this case, however, the store may be used by the EH handler, so it should
not be optimized out.

Thanks for catching and handling this quickly.

Happy holidays.

[Bug middle-end/113109] [14 Regression] g++ EH tests fail at execution time for cris-elf after r14-6674-g4759383245ac97

2023-12-23 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113109

--- Comment #8 from Jiu Fu Guo  ---
(In reply to Andrew Pinski from comment #6)
> So I did a quick audit of the EH_RETURN_HANDLER_RTX and most are registers
> rather than a memory location  . And the ones which were memory locations
> used either frame or stack pointer directly which seemed to not to be
> removed. I had originally was going to record my findings but then I saw the
> volatile for pa risc and deleted what I had wrote up.

I'm wondering if we need to revert r14-6674 to avoid this functionality issue,
and revisit/enhance the patch later.

[Bug middle-end/113109] [14 Regression] g++ EH tests fail at execution time for cris-elf after r14-6674-g4759383245ac97

2023-12-23 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113109

--- Comment #7 from Jiu Fu Guo  ---
(In reply to Hans-Peter Nilsson from comment #3)
> 
> I'm "guessing" that the problem with the patch, is that anything any port
> stores through a pointer based on virtual_incoming_args_rtx before
> returning, is now eliminated.

Oh, yes, this is a likely case that the patch does not handle well.

[Bug target/30271] -mstrict-align can an store extra for struct agrument passing

2023-12-08 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30271

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #13 from Jiu Fu Guo  ---
(In reply to Andrew Pinski from comment #10)
> (In reply to comment #9)
> > Andrew, 
> > 
> > What is your point here?
> 
> My point here is that currently we do:
>   gi->frame_related =
> (base == frame_pointer_rtx) || (base == hard_frame_pointer_rtx);
> 
> But if we change it to be:
>   gi->frame_related =
> (base == frame_pointer_rtx) || (base == hard_frame_pointer_rtx)
> || (base == arg_pointer_rtx && fixed_regs[ARG_POINTER_REGNUM]);
> 
> It would delete the store (at least in a 4.3 based compiler). 
> arg_pointer_rtx is the incoming argument space so if it is a fixed register
> it will be also frame related and we can safely delete the stores to this
> space.

https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639550.html uses
this idea too.  With it, the 'std' shown in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30271#c2 disappears.
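
The difference between the two conditions quoted above can be sketched as a toy predicate. This is only a model: the enum and function names are hypothetical stand-ins, not GCC's rtx objects or the real DSE data structures, and "arg_pointer_is_fixed" stands in for "fixed_regs[ARG_POINTER_REGNUM]".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the base registers named in the quoted code.  */
enum toy_base
{
  TOY_FRAME_POINTER,
  TOY_HARD_FRAME_POINTER,
  TOY_ARG_POINTER,
  TOY_OTHER
};

/* The rule as quoted: only frame-pointer-based groups are frame related.  */
bool
frame_related_current (enum toy_base base)
{
  return base == TOY_FRAME_POINTER || base == TOY_HARD_FRAME_POINTER;
}

/* The suggested rule: a fixed arg pointer also counts as frame related.  */
bool
frame_related_suggested (enum toy_base base, bool arg_pointer_is_fixed)
{
  return frame_related_current (base)
         || (base == TOY_ARG_POINTER && arg_pointer_is_fixed);
}

/* Self-check of where the two rules differ.  */
void
check_frame_related (void)
{
  assert (!frame_related_current (TOY_ARG_POINTER));
  assert (frame_related_suggested (TOY_ARG_POINTER, true));
  assert (!frame_related_suggested (TOY_ARG_POINTER, false));
}
```

Only the arg-pointer case changes; frame-pointer-based stores are treated the same by both rules.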

[Bug rtl-optimization/111971] [12/13/14 regression] ICE: maximum number of generated reload insns per insn achieved (90) since r12-6803-g85419ac59724b7

2023-11-22 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111971

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Jiu Fu Guo  ---
Fixed in the trunk.

[Bug rtl-optimization/112525] fail to eliminate unused store

2023-11-14 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112525

--- Comment #6 from Jiu Fu Guo  ---
(In reply to Jiu Fu Guo from comment #3)
> One possible method is fixing DSE to let is able to remove those 'store's.
> (but need to take care of the case that is using 'arg_pointer' to pass
> parameters.)
> 

Some 'store's to the incoming argument area (arg_pointer_rtx) may not be safe
to remove.
For example, for a call to memset on x86_64, the insns may be:
  134: [argp+0x8]=r134:SI
  135: [argp+0x4]=0x1
  136: [argp]=r132:SI
  137: ax:SI=call [`memset'] argc:0xc
  REG_CALL_DECL `memset'
  REG_EH_REGION 0

Insns 134/135/136 cannot be removed.

[Bug rtl-optimization/112525] fail to eliminate unused store

2023-11-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112525

--- Comment #3 from Jiu Fu Guo  ---
One possible method is to fix DSE so that it is able to remove those 'store's
(but we need to take care of the case where 'arg_pointer' is used to pass
parameters).


Another method: there is a patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634500.html which
introduces lighter-expander-sra (this patch is only for struct parameter now).
We may enhance this patch to avoid storing the unused parameters.

[Bug rtl-optimization/112525] fail to eliminate unused store

2023-11-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112525

--- Comment #1 from Jiu Fu Guo  ---
(In reply to Jiu Fu Guo from comment #0)
> For below code:
> ```
> typedef struct teststruct
> {
>   double d;
>   int arr[15]; /* for ppc64le example foo1, 14: foo is just blr. 15: foo has
> 8 'std's */
> } teststruct;

Here, if the code is "int arr[14];", the struct is passed entirely in registers
(in the foo function), and in the expander pass the registers are stored to the
frame area.
If the code is "int arr[15];", the struct is passed partly in registers and
partly in memory, and in the expander pass the register part is stored to the
area addressed via the arg pointer.

In the DSE pass, the 'dead' stores to the 'frame' area can be eliminated.
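
The 14-versus-15 boundary can be checked with a little size arithmetic. The sketch below assumes eight 8-byte register slots for parameters, which matches the eight 'std's of r3..r10 in the report but is otherwise an assumption of this example; the struct and function names are hypothetical, and a 4-byte int is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* The two variants of the struct from the test case.  */
struct with14 { double d; int arr[14]; };
struct with15 { double d; int arr[15]; };

/* Assumption: parameters use 8-byte slots and 8 slots fit in registers.  */
enum { SLOT_BYTES = 8, REGISTER_SLOTS = 8 };

/* Number of 8-byte slots needed for an object of the given size.  */
size_t
slots_needed (size_t bytes)
{
  return (bytes + SLOT_BYTES - 1) / SLOT_BYTES;
}

/* Nonzero if the object fits entirely in the assumed register slots.  */
int
fits_in_registers (size_t bytes)
{
  return slots_needed (bytes) <= REGISTER_SLOTS;
}

/* Self-check: arr[14] needs exactly 8 slots, arr[15] needs 9.  */
void
check_slots (void)
{
  assert (slots_needed (sizeof (struct with14)) == 8);
  assert (slots_needed (sizeof (struct with15)) == 9);
}
```

With arr[14] the struct is 64 bytes and fills the eight slots exactly; one more int spills into a ninth slot, which is the part that has to go through memory.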

[Bug rtl-optimization/112525] New: fail to eliminate unused store

2023-11-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112525

Bug ID: 112525
   Summary: fail to eliminate unused store
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

For the code below:
```
typedef struct teststruct
{
  double d;
  int arr[15]; /* for ppc64le example foo1, 14: foo is just blr. 15: foo has 8
'std's */
} teststruct;

int
foo (int a, teststruct p)
{
  if (a > 0)
    return 1;
  return 2;
}

void
foo1 (teststruct p)
{
}
```

Some instructions are generated to store "p" to the stack (to the area addressed
via the arg pointer/virtual_incoming_pointer).
But those stores are not eliminated.

For example, on ppc64le, the code below is generated:
```
foo1:
.LFB1:
.cfi_startproc
std 3,32(1)
std 4,40(1)
std 5,48(1)
std 6,56(1)
std 7,64(1)
std 8,72(1)
std 9,80(1)
std 10,88(1)
blr
```
Those 'std's are actually dead.

[Bug testsuite/112340] [14 regression] assembler instruction counts off for gcc.target/powerpc/pr106550_1.c

2023-11-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112340

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Jiu Fu Guo  ---
Case updated.

[Bug rtl-optimization/111971] ICE: maximum number of generated reload insns per insn achieved (90)

2023-10-25 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111971

--- Comment #5 from Jiu Fu Guo  ---
With a bisect, the result shows "85419ac59724b7ce710ebb4acf03dbd747edeea3 is
the first bad commit".

[Bug target/108220] ICE: maximum number of generated reload insns per insn achieved (90)

2023-10-24 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108220

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #2 from Jiu Fu Guo  ---
This may be the same issue as PR111971.

It is ok if it is "register long d asm ("r0") = 0x24;".

The 'd' is 'long long' DImode(64bits), but with -m32, "r0" is not 64bits
without -mpowerpc64.  This may be a reason for the issue.

[Bug rtl-optimization/111971] ICE: maximum number of generated reload insns per insn achieved (90)

2023-10-24 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111971

--- Comment #2 from Jiu Fu Guo  ---
It seems gcc11 is ok.

[Bug rtl-optimization/111971] ICE: maximum number of generated reload insns per insn achieved (90)

2023-10-24 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111971

--- Comment #1 from Jiu Fu Guo  ---
This issue can be reproduced on 'ppc64' BE machine with -m32.

[Bug rtl-optimization/111971] New: ICE: maximum number of generated reload insns per insn achieved (90)

2023-10-24 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111971

Bug ID: 111971
   Summary: ICE: maximum number of generated reload insns per insn
achieved (90)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

For the below code, an ICE occurs when built with "-m32 -O2".
```
void
foo (unsigned long long *a)
{
  register long long d asm ("r0") = 0x24;
  long long n;
  asm ("mr %0, %1" : "=r"(n) : "r"(d));
  *a++ = n;
}

```
---
8 | }
  | ^
0x207c4ca3 __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
0x207c4f07 generic_start_main
../csu/libc-start.c:360
0x207c4f07 __libc_start_main_impl
../sysdeps/unix/sysv/linux/powerpc/libc-start.c:109
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.



It is ok if the declaration is "register long d asm ("r0") = 0x24;".

The 'd' is 'long long' DImode (64 bits), but with -m32, "r0" is not 64 bits wide
without -mpowerpc64.  So this code could be considered invalid in some
respects.
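
The width mismatch can be written down as simple arithmetic. This is a sketch under the assumption stated in the report, namely that a general register is 32 bits wide with -m32 and no -mpowerpc64; regs_needed is a hypothetical helper, not a GCC function.

```c
#include <assert.h>

/* Hypothetical helper: how many registers of REG_BITS are needed to hold
   a value of VALUE_BITS.  */
unsigned
regs_needed (unsigned value_bits, unsigned reg_bits)
{
  assert (reg_bits != 0);
  return (value_bits + reg_bits - 1) / reg_bits;
}
```

Under that assumption a 64-bit 'long long' needs regs_needed (64, 32) == 2 registers, so it cannot live in "r0" alone, while a 32-bit 'long' needs only one.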

[Bug target/111778] PowerPC constant code change uses an undefined shift

2023-10-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111778

--- Comment #2 from Jiu Fu Guo  ---
Thanks so much for reporting this issue, and thanks for tracking it down!

For the code, if 'lz' is 0, it is correct to return false.

[Bug target/94395] Powerpc suboptimal 64-bit constant generation near large values with few bits set

2023-10-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94395

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED
 CC||guojiufu at gcc dot gnu.org

--- Comment #3 from Jiu Fu Guo  ---
After r14-4470, the trunk generates better code for this case.

[Bug target/94393] Powerpc suboptimal 64-bit constant comparison

2023-10-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94393

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 CC||guojiufu at gcc dot gnu.org
 Status|NEW |RESOLVED

--- Comment #9 from Jiu Fu Guo  ---
After r14-4470, trunk generates better code for this case.

[Bug target/93176] PPC: inefficient 64-bit constant consecutive ones

2023-10-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93176

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #13 from Jiu Fu Guo  ---
Patches are committed for using "li/lis;rldicl/rldicr/rldic" to construct
constants.

[Bug target/106708] [rs6000] 64bit constant generation with oris xoris

2023-10-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106708

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from Jiu Fu Guo  ---
Patch ready on the trunk.

[Bug target/108338] use mtvsrws for lowpart DI->SF conversion on P9

2023-10-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108338

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Jiu Fu Guo  ---
Fixed.

[Bug target/111597] pattern "(T)(A)+cst -->(T)(A+cst)" cause suboptimal for ppc64

2023-09-26 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111597

--- Comment #3 from Jiu Fu Guo  ---
(In reply to Jiu Fu Guo from comment #0)
> In match.pd there is a pattern:
> /* ((T)(A)) + CST -> (T)(A + CST)  */
> #if GIMPLE
>   (simplify
>(plus (convert:s SSA_NAME@0) INTEGER_CST@1)
> (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
>  && TREE_CODE (type) == INTEGER_TYPE
>  && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
>  && int_fits_type_p (@1, TREE_TYPE (@0)))
>  /* Perform binary operation inside the cast if the constant fits
> and (A + CST)'s range does not overflow.  *
> 
> But this pattern seems not in favor of all targets. 
> For example, the below code hits this pattern, 
> 
> long foo1 (int x)
> {
>   if (x>1000)
> return 0;
>   int x1 = x +1;
>   return (long) x1 + 40;
> 
> }

For this code, without the pattern "((T)(A)) + CST -> (T)(A + CST)",
the final gimple code is:
  x1_4 = x_3(D) + 1;
  _1 = (long int) x1_4;
  _5 = _1 + 40;

With the pattern,
the final gimple code is:
  _4 = x_2(D) + 41;
  _3 = (long int) _4;

So, the pattern itself looks "reasonable".
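
As a source-level illustration of the two gimple sequences above, the sketch below writes both forms in C and relies on the same "x <= 1000" guard as the test case, so that neither int addition can overflow. The function names are made up for this example.

```c
#include <assert.h>

/* Form without the pattern: add 1 in int, widen, then add 40 in long.  */
long
add_after_widen (int x)
{
  assert (x <= 1000);           /* same guard as in the test case */
  int x1 = x + 1;
  return (long) x1 + 40;
}

/* Form with the pattern: do the whole addition in int, then widen.  */
long
add_before_widen (int x)
{
  assert (x <= 1000);           /* x + 41 cannot overflow under this guard */
  int sum = x + 41;
  return (long) sum;
}
```

Both functions return the same value for every x that passes the guard; the difference discussed in this PR is only in where the sign extension ends up in the generated code.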

[Bug target/111597] pattern "(T)(A)+cst -->(T)(A+cst)" cause suboptimal for ppc64

2023-09-26 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111597

--- Comment #1 from Jiu Fu Guo  ---
However, even without this pattern in match.pd, the generated sign
extension (extsw) is still there.
So, I am just wondering whether this sub-optimal asm code could be fixed in
another way.

[Bug target/111597] New: pattern "(T)(A)+cst -->(T)(A+cst)" cause suboptimal for ppc64

2023-09-26 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111597

Bug ID: 111597
   Summary: pattern "(T)(A)+cst -->(T)(A+cst)" cause suboptimal
for ppc64
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
      Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

In match.pd there is a pattern:
/* ((T)(A)) + CST -> (T)(A + CST)  */
#if GIMPLE
  (simplify
   (plus (convert:s SSA_NAME@0) INTEGER_CST@1)
(if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
 && TREE_CODE (type) == INTEGER_TYPE
 && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
 && int_fits_type_p (@1, TREE_TYPE (@0)))
 /* Perform binary operation inside the cast if the constant fits
and (A + CST)'s range does not overflow.  *

But this pattern does not seem to be favorable for all targets.
For example, the code below hits this pattern:

long foo1 (int x)
{
  if (x>1000)
    return 0;
  int x1 = x +1;
  return (long) x1 + 40;
}

Compile with "-O2 -S", on ppc64le, the generated asm is:
cmpwi 0,3,1000
bgt 0,.L3
addi 3,3,41
extsw 3,3 ;; this is suboptimal
blr
.p2align 4,,15
.L3:
li 3,0
blr
--
But for the code below, the generated asm seems better (without the extsw):
long foo1 (int x)
{
  if (x>1000)
    return 0;
  return (long) x + 40; 
}

cmpwi 0,3,1000
bgt 0,.L3
addi 3,3,40
blr
.p2align 4,,15
.L3:
li 3,0
blr

[Bug tree-optimization/111495] [14 regression] ICE in lower_bound, at value-range.h:1078 when building LLVM 17.0.1 since r14-3644-g1aceceb1e2d6e8

2023-09-21 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111495

--- Comment #5 from Jiu Fu Guo  ---
(In reply to Andrew Pinski from comment #3)
> I suspect r14-4192-g4d80863d7f93c0a839d1fe5 fixed this ...

Yes, I reproduced this issue on ppc64le, and the fix r14-4192 seems to work
fine.

[Bug tree-optimization/111482] [14 Regression] ice in lower_bound with -O3

2023-09-21 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111482

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Jiu Fu Guo  ---
Fix via r14-4192-g4d80863d7f93c0a839d1fe5dc59be83153e89110.

[Bug c++/111482] [14 Regression] ice in lower_bound with -O3

2023-09-19 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111482

--- Comment #7 from Jiu Fu Guo  ---
This is caused by a missing check of a vr's "undefined_p".

In the pattern "(X + C) / N", 

(if (exact_mod (c)
...
  && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr1) 
...)
   (plus (op @0 @2) { wide_int_to_tree (type, plus_op1 (c)); })
   (if (TYPE_UNSIGNED (type) && c.sign_mask () < 0
&& exact_mod (-c)
/* unsigned "X-(-C)" doesn't underflow.  */
&& wi::geu_p (vr0.lower_bound (), -c))

In the "(if (exact_mod (c)" part, the code "overflow_free_p (vr0, vr1)" checks
that vr0/vr1 are defined.
But in the "else" part, "if (... && wi::geu_p (vr0.lower_bound (), -c)", vr0 is
not checked with undefined_p.
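
The "else" branch can be modeled with a toy range type. This is a sketch, not the match.pd code: struct toy_urange and all function names are hypothetical, the rewritten form "x / n - (-c) / n" is my reading of what the branch produces, and the undefined check is placed where this comment says it is missing.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of a value range: a lower bound that is only
   meaningful when the range is not undefined.  */
struct toy_urange { bool undefined; unsigned lower; };

/* May unsigned "(x + c) / n" with a "negative" c be rewritten?  */
bool
can_rewrite (const struct toy_urange *vr0, unsigned c, unsigned n)
{
  unsigned neg_c = 0u - c;
  if (vr0->undefined)                 /* the check this comment asks for */
    return false;
  if (c <= UINT_MAX / 2 || n == 0 || neg_c % n != 0)
    return false;
  return vr0->lower >= neg_c;         /* "X-(-C)" does not underflow */
}

unsigned
div_original (unsigned x, unsigned c, unsigned n)
{
  return (x + c) / n;
}

unsigned
div_rewritten (unsigned x, unsigned c, unsigned n)
{
  return x / n - (0u - c) / n;
}
```

For example, with c = -8 (as unsigned), n = 4 and a lower bound of 8, the two forms agree for every x in the range; with x = 4 the addition wraps and they differ, which is why the lower bound has to be read, and why it must not be read from an undefined range.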

[Bug c++/111482] [14 Regression] ice in lower_bound with -O3

2023-09-19 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111482

--- Comment #6 from Jiu Fu Guo  ---
I reproduced these issues, PR11148 and PR111355, on ppc64le too.

[Bug c++/111482] [14 Regression] ice in lower_bound with -O3

2023-09-19 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111482

--- Comment #5 from Jiu Fu Guo  ---
Thanks a lot for reporting this!

[Bug tree-optimization/111303] [14 Regression] ICE: in type, at value-range.h:869

2023-09-17 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111303

--- Comment #9 from Jiu Fu Guo  ---
(In reply to CVS Commits from comment #7) this comment should be linked to
PR111324.

[Bug middle-end/111324] More optimization about "(X * Y) / Y"

2023-09-17 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111324

--- Comment #7 from Jiu Fu Guo  ---
A comment https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111303#c7 
should be attached here.

[Bug tree-optimization/111303] [14 Regression] ICE: in type, at value-range.h:869

2023-09-17 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111303

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Jiu Fu Guo  ---
Fix is committed via r14-3913-g8d8bc560b6ab7f3153db23ffb37157528e5b2c9a.

[Bug middle-end/111324] More optimization about "(X * Y) / Y"

2023-09-17 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111324

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Jiu Fu Guo  ---
This can be handled by r14-3913-g8d8bc560b6ab7f3153db23ffb37157528e5b2c9a.

[Bug middle-end/111324] More optimization about "(X * Y) / Y"

2023-09-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111324

--- Comment #5 from Jiu Fu Guo  ---
(In reply to Andrew Pinski from comment #2)
> Confirmed. 
> 
> So using the local range in this case is ok. There might be only a few times
> we don't want to use it though in match.

Agree, "get_range_query" would be more useful for most cases.


Through a quick look at match.pd, there are another two patterns that use
"get_global_range_query".

I have some concerns about those patterns, so they may not need to be
updated.

* (T)(A)+cst -->(T)(A+cst): I'm wondering if this transformation is really in
favor of PPC.
e.g. "return (long) x1 + 40;" could need one fewer "extend-insn" than "return
(long)(x1 + 40);"

* For pattern "((x * cst) + cst1) * cst2": it seems this pattern does not
affect any cases. I mean this optimization is already done by other parts
(before match.pd).

[Bug middle-end/111324] More optimization about "(X * Y) / Y"

2023-09-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111324

--- Comment #4 from Jiu Fu Guo  ---
(In reply to Jiu Fu Guo from comment #3)
> A patch is posted:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629534.html
It is not for this PR. Sorry for typo.

[Bug middle-end/111324] More optimization about "(X * Y) / Y"

2023-09-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111324

--- Comment #3 from Jiu Fu Guo  ---
A patch is posted:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629534.html

[Bug middle-end/111324] More optimization about "(X * Y) / Y"

2023-09-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111324

--- Comment #1 from Jiu Fu Guo  ---
In match.pd, there is a pattern:

/* Simplify (t * 2) / 2) -> t.  */
(for div (trunc_div ceil_div floor_div round_div exact_div)
 (simplify
  (div (mult:c @0 @1) @1)
  (if (ANY_INTEGRAL_TYPE_P (type))
   (if (TYPE_OVERFLOW_UNDEFINED (type))
@0
#if GIMPLE
(with
 {
   bool overflowed = true;
   value_range vr0, vr1;
   if (INTEGRAL_TYPE_P (type)
   && get_global_range_query ()->range_of_expr (vr0, @0)
   && get_global_range_query ()->range_of_expr (vr1, @1)
   && !vr0.varying_p () && !vr0.undefined_p ()
   && !vr1.varying_p () && !vr1.undefined_p ())
 {

Here, "get_global_range_query" is able to get the value-range info for SSA
names.
But it does not handle the case in t.c, whereas "get_range_query" can handle it.

[Bug middle-end/111324] New: More optimization about "(X * Y) / Y"

2023-09-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111324

Bug ID: 111324
   Summary: More optimization about "(X * Y) / Y"
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

For case:
-- t.c
typedef unsigned int INT;

INT
foo (INT x, INT y)
{
  if (x > 100 || y > 100)
    return 0;
  return (x * y) / y;
}
-
gcc -O2 t.c -S -fdump-tree-optimized

   [local count: 467721933]:
  _1 = x_3(D) * y_4(D);
  _5 = _1 / y_4(D);

   [local count: 1073741824]:
  # _2 = PHI <0(2), _5(4), 0(3)>
  return _2;

While for the below case, it can be optimized.

--
typedef unsigned int INT;

INT
foo (INT x, INT y)
{
  if (x > 100 || y > 100)
    return 0;
  INT x1 = x + 1;
  INT y1 = y + 1;
  return (x1 * y1) / y1;
}
---

The "(x1 * y1) / y1" is optimized to "x1". 

   [local count: 467721933]:
  x1_4 = x_2(D) + 1;

   [local count: 1073741824]:
  # _1 = PHI <0(2), x1_4(4), 0(3)>
  return _1;
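
Why the simplification is valid for these inputs can be checked with the ranges from the test case. The sketch below is only an illustration with made-up helper names: it tests whether an unsigned product can wrap given upper bounds on both operands, and it requires a non-zero divisor, which the original test case does not guarantee.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if x * y cannot wrap for any x <= max_x and y <= max_y.  */
bool
product_cannot_wrap (unsigned max_x, unsigned max_y)
{
  return max_y == 0 || max_x <= UINT_MAX / max_y;
}

/* The expression from the test case, for a non-zero divisor.  */
unsigned
mul_then_div (unsigned x, unsigned y)
{
  assert (y != 0);
  return (x * y) / y;
}
```

With both operands at most 100 the product is at most 10000, so product_cannot_wrap (100, 100) holds and mul_then_div (x, y) equals x; for something like x = UINT_MAX and y = 2 the product wraps and the result is no longer x.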

[Bug c/111303] ICE: in type, at value-range.h:869

2023-09-06 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111303

--- Comment #4 from Jiu Fu Guo  ---
For the pattern: "(X + C) / N", "op (plus@3 @0 INTEGER_CST@1) INTEGER_CST@2)"
where "X" has value-range, and "X + C" does not overflow:

&& get_range_query (cfun)->range_of_expr (vr0, @0))
&& get_range_query (cfun)->range_of_expr (vr1, @1)
&& range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr1)

Then "@3" (that is, X+C) would usually have a value-range.
But in particular cases, like this PR, "vr3" is undefined.
The reason why "vr3" is undefined is shown below:


_3 = _2 + -5;
if (0 != 0)
  goto ; [34.00%]
else
  goto ; [66.00%]
;;  succ:   3
;;  4

;; basic block 3, loop depth 0
;;  pred:   2
_5 = _3 / 5; 
;;  succ:   4

The whole pattern "(_2 + -5 ) / 5" is in "bb 3", but "bb 3" is unreachable
(because "if (0 != 0)" is always false).
Since "get_range_query (cfun)->range_of_expr (vr3, @3)" is evaluated in "bb 3",
"range_of_expr" returns an undefined "vr3".

[Bug c/111303] ICE: in type, at value-range.h:869

2023-09-06 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111303

--- Comment #3 from Jiu Fu Guo  ---
In the pattern of match.pd, there is:

  && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr1)
  && get_range_query (cfun)->range_of_expr (vr3, @3)
  /* "X+C" and "X" are not of opposite sign.  */
  && (TYPE_UNSIGNED (type)
  || (vr0.nonnegative_p () && vr3.nonnegative_p ())
  || (vr0.nonpositive_p () && vr3.nonpositive_p ()))


For this case, "vr3" is "undefined_p", so "vr3.nonnegative_p ()" triggers the ICE.

Checking "!vr3.undefined_p ()" would be a safe fix for this ICE.
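
To illustrate the shape of such a guard outside of GCC, here is a toy C model (everything in it is hypothetical: "struct toy_range" and the toy_* helpers only stand in for the real range class, and the assert stands in for the internal check that fires). The point is only that the sign queries must not be reached when either range is undefined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a value range; this is not GCC's class.  */
struct toy_range
{
  bool undefined;   /* no value known, e.g. queried in unreachable code */
  long lo, hi;      /* bounds, meaningful only when !undefined */
};

static bool toy_nonnegative_p (const struct toy_range *r)
{
  assert (!r->undefined);   /* sign queries are invalid on an undefined range */
  return r->lo >= 0;
}

static bool toy_nonpositive_p (const struct toy_range *r)
{
  assert (!r->undefined);
  return r->hi <= 0;
}

/* "X + C" and "X" are not of opposite sign.  Bail out before any sign
   query when one of the ranges is undefined.  */
static bool toy_same_sign_p (const struct toy_range *vr0,
                             const struct toy_range *vr3)
{
  if (vr0->undefined || vr3->undefined)
    return false;
  return (toy_nonnegative_p (vr0) && toy_nonnegative_p (vr3))
         || (toy_nonpositive_p (vr0) && toy_nonpositive_p (vr3));
}
```

With the early return, an undefined range simply makes the condition false, so the simplification is skipped instead of aborting.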

[Bug c/111303] ICE: in type, at value-range.h:869

2023-09-06 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111303

Jiu Fu Guo  changed:

   What|Removed |Added

   Last reconfirmed||2023-09-06
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

[Bug c/111303] ICE: in type, at value-range.h:869

2023-09-06 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111303

Jiu Fu Guo  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |guojiufu at gcc dot 
gnu.org

--- Comment #2 from Jiu Fu Guo  ---
Thanks for reporting this!

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-09-03 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #26 from Jiu Fu Guo  ---
Patch was committed.

[Bug target/93176] PPC: inefficient 64-bit constant consecutive ones

2023-08-18 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93176

--- Comment #12 from Jiu Fu Guo  ---
Thanks a lot for asking!

The patch which handles this is submitted at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623519.html

I will ping this patch again.  If it is OK, I will commit it to trunk.

(The patch series for "li/lis; rldicl/rldicr/rldic" could also be committed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621000.html)

[Bug target/106460] internal compiler error: output_operand: invalid expression as operand on -O1

2023-07-18 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106460

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Jiu Fu Guo  ---
Fixed in trunk.

[Bug c++/101853] [12/13/14 Regression] g++.dg/modules/xtreme-header-5_b.C ICE

2023-05-17 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101853

--- Comment #17 from Jiu Fu Guo  ---
> But "nobody" counts that close, so better say "no xtreme-header-* failures
> since r13-5702-g72058eea9d407e".

:) Since these failures occur erratically, maybe reopen this or open a new
one if the failures are reproduced.

For example, two xtreme-header-5_ failures (not ICE) occur in Results for 14.0.0
20230518: https://gcc.gnu.org/pipermail/gcc-testresults/2023-May/784674.html.

[Bug c++/100052] [11/12/13/14 regression] ICE in compiling g++.dg/modules/xtreme-header-3_b.C after r11-8118

2023-05-17 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100052

--- Comment #15 from Jiu Fu Guo  ---
(In reply to seurer from comment #14)
> The failures occur erratically so one clean run doesn't mean much.  Scanning
> the test results mailing list I see failures for this just today in trunk.

Yep, thanks for the timely comment!  So it still seems erratic.

[Bug c++/101853] [12/13/14 Regression] g++.dg/modules/xtreme-header-5_b.C ICE

2023-05-17 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101853

--- Comment #14 from Jiu Fu Guo  ---
Pass on trunk, gcc-12, gcc-11 for xtreme-header-* cases:

make check-gcc-c++ RUNTESTFLAGS="--target_board=unix'{-m64}'
modules.exp=xtreme-header-*" 
=== g++ Summary ===

# of expected passes            72

[Bug c++/100052] [11/12/13/14 regression] ICE in compiling g++.dg/modules/xtreme-header-3_b.C after r11-8118

2023-05-17 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100052

--- Comment #13 from Jiu Fu Guo  ---
Pass on trunk, gcc-12, gcc-11 for xtreme-header-* cases:

make check-gcc-c++ RUNTESTFLAGS="--target_board=unix'{-m64}'
modules.exp=xtreme-header-*" 
=== g++ Summary ===

# of expected passes            72

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-05-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #24 from Jiu Fu Guo  ---
(In reply to Jiu Fu Guo from comment #23)
> /* Simplify ((t + -N*M) / N + M) -> t / N: (t + -C) >> N + (C>>N) ==> t >> N
> */
> (for div (trunc_div exact_div)
div is not used in this matcher yet.  Here rshift is used: "t/(1<<N)" is "t>>N".

>  (simplify
>   (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) &&
>(wi::to_wide (@3) << wi::to_wide (@2)) == -wi::to_wide (@1))
>(if (TYPE_OVERFLOW_UNDEFINED (type))
> (div @0 @2)
This should be "(rshift @0 @2)"; otherwise it will be wrong if
"TYPE_UNSIGNED (type)" is relaxed.

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-05-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #23 from Jiu Fu Guo  ---
/* Simplify ((t + -N*M) / N + M) -> t / N: (t + -C) >> N + (C>>N) ==> t >> N */
(for div (trunc_div exact_div)
 (simplify
  (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) &&
   (wi::to_wide (@3) << wi::to_wide (@2)) == -wi::to_wide (@1))
   (if (TYPE_OVERFLOW_UNDEFINED (type))
(div @0 @2)
#if GIMPLE
(with
 {
   bool overflowed = true;
   value_range vr0;
   if (get_range_query (cfun)->range_of_expr (vr0, @0)
   && !vr0.varying_p () && !vr0.undefined_p ())
 {
   wide_int wmin0 = vr0.lower_bound ();
   wide_int wmax0 = vr0.upper_bound ();
   wide_int w1 = -wi::to_wide (@1);
   wi::overflow_type min_ovf, max_ovf;
   wi::sub (wmin0, w1, TYPE_SIGN (type), &min_ovf);
   wi::sub (wmax0, w1, TYPE_SIGN (type), &max_ovf);
   if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
 overflowed = false;
 }
 }
(if (!overflowed)
 (rshift @0 @2)))
#endif
   

Got one match for the case.
Still checking whether the condition is safe and how to support other forms:
signed type, negative N, non-power-of-2 N, negative M ...
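
For the unsigned, power-of-2 form, the identity and its overflow condition can be checked with plain C. This is a sketch under stated assumptions: uint64_t stands in for the unsigned type, C == M << n, and the function names are invented for illustration. The fold is valid exactly when "t + -C" does not wrap, i.e. when t >= C.

```c
#include <assert.h>
#include <stdint.h>

/* Left-hand side of the pattern: ((t + -C) >> n) + M, with C == M << n.
   Unsigned arithmetic wraps, as it does in the GIMPLE of the testcase.  */
static uint64_t before_fold (uint64_t t, unsigned n, uint64_t m)
{
  assert (n < 64);
  uint64_t c = m << n;
  return ((t - c) >> n) + m;
}

/* Right-hand side of the pattern: t >> n.  */
static uint64_t after_fold (uint64_t t, unsigned n)
{
  assert (n < 64);
  return t >> n;
}

/* The fold is safe when subtracting C from t does not wrap.  */
static int fold_is_safe (uint64_t t, unsigned n, uint64_t m)
{
  assert (n < 64);
  return t >= (m << n);
}
```

With n == 2 and m == 1 this is the "x + 18446744073709551612ULL; >> 2; + 1" testcase: both sides agree for every x >= 4, while for x == 0 the left side wraps and the two results differ.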

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #22 from Jiu Fu Guo  ---
(In reply to Andrew Pinski from comment #21)
> (In reply to Jiu Fu Guo from comment #20)
> > Interesting thing:
> > the VR is always VR_VARYING, even for the below simple case:
> > 
> > typedef unsigned long INT;
> > INT __attribute__ ((noinline)) foo (INT x)
> > {
> >   if (x < 4)
> > return 0;
> >   INT a = x + 18446744073709551612ULL;
> >   INT b = a >> 2;
> >   return b + 1;
> > }
> 
> Yes that is because x does not have a "global" range.

I also tried "get_range_query (cfun)->range_of_expr (vr0, @0)".
> 
> You could try the following testcase:
> ```
> typedef unsigned long INT;
> INT __attribute__ ((noinline)) foo (INT x)
> {
>   if (x < 4)
> __builtin_unreachable();
>   INT a = x + 18446744073709551612ULL;
>   INT b = a >> 2;
>   return b + 1;
> }
> ```
> 
> Which gets a (global) range for x_1(D) during forwprop3:
>   # RANGE [irange] INT [4, +INF]
>   INTD.2750 x_1(D) = xD.2751;

Thanks so much!
"get_range_query (cfun)->range_of_expr (vr0, @0)" works for both cases!

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #20 from Jiu Fu Guo  ---
(In reply to Andrew Pinski from comment #19)
> Note in the loop case we know it does not wrap because there is a check
> already:
>[local count: 118111600]:
>   if (rows_8(D) > 3)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
> 
>[local count: 105119324]:
>   _13 = rows_8(D) + 18446744073709551612;
>   _15 = _13 / 4;
>   doloop.6_5 = _15 + 1;

Checking why below code does not work:
/* Simplify ((t + -N*M) / N + M) -> t / N: (t + -C) >> N + (C>>N) ==> t >> N */
(for div (trunc_div round_div)
 (simplify
  (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
  (if (ANY_INTEGRAL_TYPE_P (type) &&
   (wi::to_wide (@3) << wi::to_wide (@2)) == -wi::to_wide (@1))
   (if (TYPE_OVERFLOW_UNDEFINED (type))
(div @0 @2)
#if GIMPLE
(with
 {
   bool overflowed = true;
   value_range vr0;
   if (INTEGRAL_TYPE_P (type)
   && get_global_range_query ()->range_of_expr (vr0, @0)
   && !vr0.varying_p () && !vr0.undefined_p ())
 {
   wide_int wmin0 = vr0.lower_bound ();
   wide_int wmax0 = vr0.upper_bound ();
   wide_int w1 = -wi::to_wide (@1);
   wi::overflow_type min_ovf, max_ovf;
   wi::add (wmin0, w1, TYPE_SIGN (type), &min_ovf);
   wi::add (wmax0, w1, TYPE_SIGN (type), &max_ovf);
   if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
 overflowed = false;
 }
 }
(if (!overflowed)
 (div @0 @2)))
#endif
   


Interesting thing:
the VR is always VR_VARYING, even for the below simple case:

typedef unsigned long INT;
INT __attribute__ ((noinline)) foo (INT x)
{
  if (x < 4)
    return 0;
  INT a = x + 18446744073709551612ULL;
  INT b = a >> 2;
  return b + 1;
}

[Bug target/106708] [rs6000] 64bit constant generation with oris xoris

2023-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106708

--- Comment #3 from Jiu Fu Guo  ---
With the trunk, the code generated for
  *arg++ = 0x98765432ULL;
  *arg++ = 0x7cdeab55ULL;
is as expected.

For *arg++ = 0x6543ULL; the lis+xoris sequence is not committed to the trunk
yet.

[Bug target/93178] PPC: inefficient 64-bit constant generation if msb is off in low 16 bit

2023-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93178

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
 CC||guojiufu at gcc dot gnu.org

--- Comment #4 from Jiu Fu Guo  ---
On the trunk, the output is expected:
li 3,17185
oris 3,3,0x8765
blr

[Bug testsuite/108809] gcc.target/powerpc/builtins-5-p9-runnable.c fails on power 9 BE

2023-05-09 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108809

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #5 from Jiu Fu Guo  ---
Fixed in the trunk.

[Bug testsuite/106879] [13/14 regression] new test case gcc.dg/vect/bb-slp-layout-19.c fails

2023-04-18 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106879

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #5 from Jiu Fu Guo  ---
Like PR65484 and PR87306. 
On P7, the option -mno-allow-movmisalign is added during testing, which prevents
SLP from happening on this code.

[Bug target/108809] gcc.target/powerpc/builtins-5-p9-runnable.c fails on power 9 BE

2023-04-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108809

--- Comment #3 from Jiu Fu Guo  ---
There is a similar difference between the BE and LE views of the vector for vec_xst_len_r.
For: 
store_data_uc = (vector unsigned char){ 1, 2, 3, 4, 5, 6, 7, 8,
 9, 10, 11, 12, 13, 14, 15, 16 };
vec_xst_len_r(store_data_uc, address, size);

On BE, the 128-bit view (corresponding to store_data_uc) is
`0x102030405060708090a0b0c0d0e0f10`.
On LE, the 128-bit view is `0x100f0e0d0c0b0a090807060504030201`.
After right justification (with the left part cleared to 0s), the effective
bytes to be stored to the buffer on LE are those with the smaller values
(1, 2, ...); on BE, the bytes to be stored are those with the bigger values
(16, 15, ...).

BTW: the generated insn sequence aligns with the example implementation in
PVIPR.
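
The two 128-bit views can be reproduced with a small C sketch. It only models the byte-order view described above, not the actual instruction or built-in semantics, and the helper names are made up. Note that the output keeps the leading zero, so the BE string starts with "01".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format 16 bytes as the hex digits of one 128-bit value.
   big_endian != 0: bytes[0] is the most significant byte.
   big_endian == 0: bytes[15] is the most significant byte.  */
static void view_as_128bit (const unsigned char bytes[16], int big_endian,
                            char out[33])
{
  for (int i = 0; i < 16; i++)
    {
      int src = big_endian ? i : 15 - i;
      snprintf (out + 2 * i, 3, "%02x", bytes[src]);
    }
}

/* Keep only the SIZE least significant bytes of the 128-bit view
   ("right justified", upper bytes cleared).  out[15] receives the least
   significant byte, out[14] the next one, and so on.  */
static void right_justified (const unsigned char bytes[16], int big_endian,
                             int size, unsigned char out[16])
{
  assert (size >= 0 && size <= 16);
  memset (out, 0, 16);
  for (int i = 0; i < size; i++)
    {
      int src = big_endian ? 15 - i : i;   /* i-th least significant byte */
      out[15 - i] = bytes[src];
    }
}
```

For the bytes 1..16 and a size of 2, the model keeps {1, 2} in the LE view and {16, 15} in the BE view, which is the difference the test case runs into.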

[Bug target/108809] gcc.target/powerpc/builtins-5-p9-runnable.c fails on power 9 BE

2023-04-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108809

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #2 from Jiu Fu Guo  ---
(In reply to Kewen Lin from comment #1)
> It's very likely a test issue, may need to adjust some built-in function for
> endianness issue (as they have different element ordering on BE and LE).

Yep, it should be a test case issue caused by the difference between BE and LE:
For a buffer `\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022`,
after loading it into a vector right justified, the result differs between BE
and LE.
E.g. for a vector (128-bit view: 0x102030405060708),
the v16_int8 view on BE is v16_int8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8}; but the v16_int8 view on LE
is v16_int8 = {0x8, 0x7, 0x6, 0x5, 0x4, 0x3, 0x2, 0x1, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0}.

So, we would just need to update the expected result for BE in this test case.

[Bug target/108722] gcc.dg/analyzer/file-CWE-1341-example.c fails on power 9 BE

2023-04-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108722

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #1 from Jiu Fu Guo  ---
The case only checks [CWE-1341], which is `double-fclose`.
On the BE machine, however, a [CWE-415] message is also reported besides
[CWE-1341], because the attribute `malloc` is attached to fopen:
```
# 258 "/usr/include/stdio.h" 3 4
extern FILE *fopen (const char *__restrict __filename,
  const char *__restrict __modes)   
  __attribute__ ((__malloc__)) __attribute__ ((__malloc__ (fclose, 1))) ;

or say: __attribute_malloc__ __attr_dealloc_fclose __wur;
```

Because this case is not intended to test CWE-415, it may be ok to add
-Wno-analyzer-double-free for this case.

[Bug target/108338] New: use mtvsrws for lowpart DI->SF conversion on P9

2023-01-08 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108338

Bug ID: 108338
   Summary: use mtvsrws for lowpart DI->SF conversion on P9
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

In a mail-list discussion,
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609054.html, as Segher
points out, we could use 'mtvsrws' for the conversion from lowpart DI to SF on
P9;  and use 'mtvsrd' for the conversion from highpart of DI to SF.

float sf_from_di_off0 (long l)
{
  char buff[16];
  *(long*)buff = l;
  float f = *(float*)(buff);
  return f;
}

float sf_from_di_off4 (long l)
{
  char buff[16];
  *(long*)buff = l;
  float f = *(float*)(buff + 4);
  return f;
}

With trunk, -O2 -mcpu=power9,  the asm code could be optimized.

sldi 9,3,32
mtvsrd 1,9
xscvspdpn 1,1
blr
==> mtvsrws;xscvspdpn  (p9 LE) lowpart


srdi 3,3,32
sldi 9,3,32
mtvsrd 1,9
xscvspdpn 1,1
blr
==> (+ clean lowpart to 0) mtvsrd;xscvspdpn   highpart
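
For reference, the same byte-level reinterpretation can be written with memcpy instead of pointer casts. This is only a sketch: the helper name sf_from_di_at is made up, int64_t stands in for "long" on an LP64 target, and a 32-bit float is assumed. Which half of the value sits at offset 0 depends on the byte order (the low part on LE).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a float from the bytes of a 64-bit integer, starting at OFFSET
   (0 or 4).  Same idea as sf_from_di_off0/off4, without pointer casts.  */
static float sf_from_di_at (int64_t l, size_t offset)
{
  unsigned char buff[sizeof l];
  float f;
  assert (offset + sizeof f <= sizeof buff);
  memcpy (buff, &l, sizeof l);            /* like *(long *) buff = l */
  memcpy (&f, buff + offset, sizeof f);   /* like *(float *) (buff + offset) */
  return f;
}
```

Putting the same float bit pattern into both halves of the integer gives an input whose result does not depend on the byte order, which is handy for a quick check.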

[Bug target/103743] PPC: Inefficient equality compare for large 64-bit constants having only 16-bit relevant bits in high part

2023-01-08 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103743

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Jiu Fu Guo  ---
Fixed now.

[Bug middle-end/108073] [rs6000] sub-optimal float member accessing on struct parameter

2022-12-20 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108073

--- Comment #2 from Jiu Fu Guo  ---
(In reply to Surya Kumari Jangala from comment #1)
> Hi Jiu Fu Guo, are you working on this bug? If not, I would like to take
> this up.

Thanks for asking!
I drafted an experimental patch.  Any comments or refactoring are welcome :)

[Bug c/108073] New: [rs6000] sub-optimal float member accessing on struct parameter

2022-12-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108073

Bug ID: 108073
   Summary: [rs6000] sub-optimal float member accessing on struct
parameter
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

For the below code:

typedef struct DF {double a[4]; long l; } DF;
double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];}


At -O2, with gcc trunk(13.0), we get below sequence:

std 6,-24(1)
ori 2,2,0
lfd 1,-24(1)
blr

Actually, just one "mtvsrd 1, 6" is enough.

In this case, the argument is passed through integer registers. 

For below code, it is similar:

typedef struct SF {float a[4];short l; } SF;
float foo (SF arg){return arg.a[3];}  

std 4,-24(1)
ori 2,2,0
lfs 1,-20(1)
The sequence below seems faster:
rldicr 4,4,0,31
mtvsrd 1,4
xscvspdpn 1,1

[Bug target/106708] [rs6000] 64bit constant generation with oris xoris

2022-12-06 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106708

--- Comment #1 from Jiu Fu Guo  ---
PR93178 also mentioned the "li,oris" case.

[Bug target/107692] [13 regression] r13-3950-g071e428c24ee8c breaks many test cases

2022-11-18 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107692

--- Comment #7 from Jiu Fu Guo  ---
(In reply to Hongyu Wang from comment #6)
> (In reply to Jiu Fu Guo from comment #4)
> cut...
> 
> Yes, I've already posted the patch at
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606478.html

One minor finding:
As with -munroll-only-small-loops on other targets (e.g. rs6000), r13-3950 intends
to enable unrolling and uses this option to control the unroll factor according
to the loop size.  Compared with the previous logic (e.g. for rs6000), the new
logic has this effect:
-fno-unroll-loops may be unable to prevent rtl_unroll_loops from running, but
loop_unroll_adjust will return 1 to prevent the loop from being unrolled.
So, there may be side effects: slightly more compile time, and a dump file
may be generated with the message that unrolling failed.

[Bug target/107692] [13 regression] r13-3950-g071e428c24ee8c breaks many test cases

2022-11-17 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107692

--- Comment #5 from Jiu Fu Guo  ---
> -munroll-only-small-loops does not turn on or off -funroll-loops, and it
> should not, so that it does what it says, if nothing else.

Yes, and -funroll-loops would win over -munroll-only-small-loops

[Bug target/107692] [13 regression] r13-3950-g071e428c24ee8c breaks many test cases

2022-11-17 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107692

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #4 from Jiu Fu Guo  ---
(In reply to Hongyu Wang from comment #2)
> Created attachment 53897 [details]
> A patch
> 
> Sorry for introducing these fails. Here is the patch.
> 
> I've tested the patch with cross-compler and all the fails disappeared, but
> I don't have a powerpc to do full bootstrap & regtest (I'm still applying
> for gcc farm account).
> 
> I'll send out the patch after I can access gcc farm for a power machine, or
> hopefully someone can help testing the patch.
> 
> I suppose s390 has similar issue and I will update that accordingly.
Hi,

One small comment: for the code "if (!(flag_unroll_loops ||
flag_unroll_all_loops))"
we may need to add one more condition, "|| loop->unroll", as r13-3950 does
for i386.cc.  Otherwise, the unroll pragma may be affected.

[Bug libstdc++/107037] New: 541.leela_r compiling fail since r13-2779

2022-09-26 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107037

Bug ID: 107037
   Summary: 541.leela_r compiling fail since r13-2779
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

When building 541.leela_r from spec2017, I encounter one error:

include/c++/13.0.0/bitset: In instantiation of 'void
std::_Base_bitset<_Nw>::_M_do_reset() [with long unsigned int _Nw = 7]':
include/c++/13.0.0/bitset:1145:19:   required from 'std::bitset<_Nb>&
std::bitset<_Nb>::reset() [with long unsigned int _Nb = 441]'
Playout.cpp:20:18:   required from here
include/c++/13.0.0/bitset:187:13: error: forming reference to reference type
'long unsigned int (&)[7]'
  187 | for (_WordT& __w : _M_w)
  | ^~~


The command is 
g++ -std=c++03 -m64 -c -o Playout.o -DSPEC -DNDEBUG -I.
-DSPEC_AUTO_SUPPRESS_OPENMP  -Ofast  -DSPEC_LP64  Playout.cpp

This issue is also reproducible on -O3 or -O2.

It seems to compile fine before commit
r13-2779-9194c13909b72d23e58fee72864a2663b12f6b19.

[Bug target/106550] [rs6000] sub-optimal 64bit constant generation for P10

2022-09-15 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106550

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Jiu Fu Guo  ---
Committed

[Bug middle-end/106928] 500.perlbench_r fail(VE) since r13-1933

2022-09-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106928

--- Comment #3 from Jiu Fu Guo  ---
(In reply to Martin Liška from comment #2)
> I think you missed -fno-finite-math-only option as documented here:
> https://www.spec.org/cpu2017/Docs/benchmarks/500.perlbench_r.html#portability

Thanks! It passes with -fno-finite-math-only.

[Bug middle-end/106928] 500.perlbench_r fail(VE) since r13-1933

2022-09-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106928

--- Comment #1 from Jiu Fu Guo  ---
The out.mis file:
3258:  # of abstol errors: 0
   Minimum abstol: nan
   ^
3259:  Maximum reltol: 0.0e+00
   # of abstol errors: 0
   ^
3260:  # of reltol errors: 0
   Maximum reltol: 0.0e+00
   ^
3261:  # of obiwan errors: 0
   Minimum reltol: nan
   ^
3262:  # of skiptol errors: 0
   # of reltol errors: 0
^
3263:  specdiff run completed
   # of obiwan errors: 0
   ^
3264:  (0, 1): spec_diff(--lines, 10, --quiet, --calctol, --histogram, -m,
--cw, one002, two003)
   # of skiptol errors: 0
   ^
3265:  Absolute differences:
   specdiff run completed
   ^
3266:*
   (0, 1): spec_diff(--lines, 10, --quiet, --calctol, --histogram, -m,
--cw, one002, two003)
   ^
3267:*
   Absolute differences:
   ^

[Bug middle-end/106928] New: 500.perlbench_r fail(VE) since r13-1933

2022-09-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106928

Bug ID: 106928
   Summary: 500.perlbench_r fail(VE) since r13-1933
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

With the latest trunk, 500.perlbench_r (-Ofast) from spec2017 encounters a VE
on power9 and power10, with or without
-fno-unsafe-math-optimizations.

With bisect, the commit may be r13-1933 (Implement basic range operators to
enable floating point VRP).


One compiling command:
gcc -std=c99   -m64 -c -o reentr.o -DSPEC -DNDEBUG -DPERL_CORE -I.
-Idist/IO -Icpan/Time-HiRes -Icpan/HTML-Parser -Iext/re -Ispecrand
-DDOUBLE_SLASHES_SPECIAL=0 -DSPEC_AUTO_SUPPRESS_OPENMP -D_LARGE_FILES
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64  -Ofast 
-fno-unsafe-math-optimizations -DSPEC_LINUX_PPC_LE
-fno-strict-aliasing -fgnu89-inline   -DSPEC_LP64  reentr.c



In configure, I use:

default=base: # flags for all base  
   OPTIMIZE= -Ofast  -fno-unsafe-math-optimizations 
   FOPTIMIZE = -std=legacy
intrate,intspeed=base: # flags for integer base 
   EXTRA_COPTIMIZE   = -fno-strict-aliasing -fgnu89-inline


Log message:
*** Miscompare of diffmail.4.800.10.17.19.300.out; for details see

benchspec/CPU/500.perlbench_r/run/run_base_refrate_base_64./diffmail.4.800.10.17.19.300.out.mis
Error: 1x500.perlbench_r

[Bug target/106708] New: [rs6000] 64bit constant generation with oris xoris

2022-08-22 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106708

Bug ID: 106708
   Summary: [rs6000] 64bit constant generation with oris xoris
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

For code: t.c
void foo (long *arg)
{
  *arg++ = 0x98765432ULL;
  *arg++ = 0x7cdeab55ULL;
  *arg++ = 0x6543ULL;
}

gcc -O2 -S t.c
lis 10,0x
lis 8,0x9876
ori 10,10,0x7cde
lis 9,0x
ori 8,8,0x5432
sldi 10,10,16
ori 9,9,0x6543
rldicl 8,8,0,32
ori 10,10,0xab55
sldi 9,9,16

Below sequences would be better:
li 8,21554
li 10,-21675
lis 9,0xe543
oris 8,8,0x9876
xoris 10,10,0x8321
xoris 9,9,0x8000

[Bug target/106550] [rs6000] sub-optimal 64bit constant generation for P10

2022-08-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106550

--- Comment #2 from Jiu Fu Guo  ---
(In reply to Kewen Lin from comment #1)
> Confirmed.
> 
> Clang supports it as:
> 
> https://godbolt.org/z/Kxj584sfd

Thanks Kewen!

Or another example code could be:

pli 9,101736451 (0x6106003)
sldi 9,9,32
paddi 9,9, 213 (0x0208050)

[Bug target/106550] New: [rs6000] sub-optimal constant generation

2022-08-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106550

Bug ID: 106550
   Summary: [rs6000] sub-optimal constant generation
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

There is 'pli' which supports a 34bits immediate, so to generate a 64bits
constant we just need 3 instructions at most.

void
foo (unsigned long long *a)
{
  *a = 0x020805006106003;
}

On the trunk, below asm is generated:

.file   "test.c"
.machine power10
.abiversion 2
.section".text"
.align 2
.p2align 4,,15
.globl foo
.type   foo, @function
foo:
.LFB0:
.cfi_startproc
.localentry foo,1
lis 9,0x20
ori 9,9,0x8050
sldi 9,9,32
oris 9,9,0x610
ori 9,9,0x6003
std 9,0(3)
blr
.long 0
.byte 0,0,0,0,0,0,0,0
.cfi_endproc
.LFE0:
.size   foo,.-foo
.ident  "GCC: (GNU) 13.0.0 20220729 (experimental)"
.section.note.GNU-stack,"",@progbits


The compiling command: gcc -O2 -std=c99 test.c -S -mcpu=power10
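
A small C model shows that a three-instruction "pli; sldi; paddi" shape can build the same value as the five-instruction sequence above. This is a sketch: the helper names are made up, and it relies on pli/paddi taking a 34-bit signed immediate, so any unsigned 32-bit half fits as a positive value.

```c
#include <assert.h>
#include <stdint.h>

/* Five-instruction sequence as in the trunk output:
   lis; ori; sldi 32; oris; ori.  */
static uint64_t build_with_lis_ori (uint16_t hi_hi, uint16_t hi_lo,
                                    uint16_t lo_hi, uint16_t lo_lo)
{
  assert ((hi_hi & 0x8000) == 0);        /* sketch skips lis sign extension */
  uint64_t r = (uint64_t) hi_hi << 16;   /* lis  9,hi_hi     */
  r |= hi_lo;                            /* ori  9,9,hi_lo   */
  r <<= 32;                              /* sldi 9,9,32      */
  r |= (uint64_t) lo_hi << 16;           /* oris 9,9,lo_hi   */
  r |= lo_lo;                            /* ori  9,9,lo_lo   */
  return r;
}

/* Three-instruction idea: pli (high half); sldi 32; paddi (low half).  */
static uint64_t build_with_pli (uint32_t hi, uint32_t lo)
{
  uint64_t r = hi;   /* pli: a 32-bit unsigned value fits in 34 signed bits */
  r <<= 32;          /* sldi 9,9,32 */
  r += lo;           /* paddi: the low half also fits in 34 signed bits */
  return r;
}
```

For the constant in this report, build_with_lis_ori (0x20, 0x8050, 0x610, 0x6003) and build_with_pli (0x208050, 0x6106003) both produce 0x020805006106003.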

[Bug target/106460] internal compiler error: output_operand: invalid expression as operand on -O1

2022-07-27 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106460

--- Comment #1 from Jiu Fu Guo  ---
The ICE occurs when outputting the rtx "(high:DI (symbol_ref:DI ("var_48")..)))" to the
constant pool.
This rtx is generated in the function "recog_for_combine" (in combine.cc) after
invoking "force_const_mem".

This kind of rtx represents the high part of a symbol_ref/address when passed
as an argument to "cannot_force_const_mem".  Actually, this kind of rtx cannot
be put into a constant pool.
So "cannot_force_const_mem" should return 'true' for it.

[Bug rtl-optimization/106460] New: internal compiler error: output_operand: invalid expression as operand on -O1

2022-07-27 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106460

Bug ID: 106460
   Summary: internal compiler error: output_operand: invalid
expression as operand on -O1
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

For code:

extern short var_48;
void
foo (double *r)
{
  if (var_48)
    *r = 1234.5678;
}

On gcc trunk, using the below command, an ICE is raised:
> gcc -mcpu=power10 -O1 t.c

t.c:22:1: internal compiler error: output_operand: invalid expression as
operand
   22 | }
  | ^
0x10bdb24b output_operand_lossage(char const*, ...)
/home/guojiufu/gcc/gcc-mainline-base/gcc/final.cc:3234
0x10bdd2b7 output_addr_const(_IO_FILE*, rtx_def*)
/home/guojiufu/gcc/gcc-mainline-base/gcc/final.cc:3831
0x11a64e57 assemble_integer_with_op(char const*, rtx_def*)
/home/guojiufu/gcc/gcc-mainline-base/gcc/varasm.cc:2866
0x11a64f67 default_assemble_integer(rtx_def*, unsigned int, int)
/home/guojiufu/gcc/gcc-mainline-base/gcc/varasm.cc:2882
0x11b0343f rs6000_assemble_integer
/home/guojiufu/gcc/gcc-mainline-base/gcc/config/rs6000/rs6000.cc:14420
0x11a65063 assemble_integer(rtx_def*, unsigned int, unsigned int, int)
/home/guojiufu/gcc/gcc-mainline-base/gcc/varasm.cc:2898
0x11a6b727 output_constant_pool_2
/home/guojiufu/gcc/gcc-mainline-base/gcc/varasm.cc:4074
0x11a6c20b output_constant_pool_1
/home/guojiufu/gcc/gcc-mainline-base/gcc/varasm.cc:4191
0x11a6cd87 output_constant_pool_contents
/home/guojiufu/gcc/gcc-mainline-base/gcc/varasm.cc:4348
0x11a6d95b output_shared_constant_pool()
/home/guojiufu/gcc/gcc-mainline-base/gcc/varasm.cc:4546
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug target/103743] PPC: Inefficient equality compare for large 64-bit constants having only 16-bit relevant bits in high part

2022-05-15 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103743

--- Comment #6 from Jiu Fu Guo  ---
Drafted a patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594702.html

[Bug preprocessor/101168] gnu++14 complains about altivec types defined with using keyword in the same file with preprocessor macros

2022-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101168

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Jiu Fu Guo  ---
Fixed in trunk.

[Bug c++/105418] debug_tree does not support well for std::construct_at

2022-04-28 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105418

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|WAITING |RESOLVED

[Bug c++/105418] debug_tree does not support well for std::construct_at

2022-04-28 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105418

--- Comment #6 from Jiu Fu Guo  ---
(In reply to Jiu Fu Guo from comment #5)
...
> This issue happens when calling debug_tree/decl_as_string manually inside
> FE.  At where overloaded functions (::new) are not resolved yet, and then
> cause 'tsubst' to be called. 
> I see, it is not a good place to use debug_tree.
Or we could add something in the FE to handle it (not a must :)

[Bug c++/105418] debug_tree does not support well for std::construct_at

2022-04-28 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105418

--- Comment #5 from Jiu Fu Guo  ---
0x1089f887 dump_substitution
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/error.cc:1654
0x108a1c2f dump_function_decl
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/error.cc:1817
0x1089e187 dump_decl
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/error.cc:1385
0x108aa8df decl_as_string(tree_node*, int)
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/error.cc:3146
0x1094d6ef trees_out::insert(tree_node*, walk_kind)
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/module.cc:4801
0x1096300f trees_out::decl_node(tree_node*, walk_kind)
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/module.cc:8582
0x10965da3 trees_out::tree_node(tree_node*)
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/module.cc:9104
0x109542c7 trees_out::core_vals(tree_node*)
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/module.cc:5924
0x10959d4f trees_out::tree_node_vals(tree_node*)
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/module.cc:7074
0x10964dab trees_out::tree_value(tree_node*)
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/module.cc:8911
0x10965ddf trees_out::tree_node(tree_node*)
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/module.cc:9109
0x109542c7 trees_out::core_vals(tree_node*)
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/module.cc:5924
0x10959d4f trees_out::tree_node_vals(tree_node*)
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/module.cc:7074
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


Hi Andrew and Richard,

Thanks a lot! 
This issue happens when debug_tree/decl_as_string is called manually inside
the FE, at a point where overloaded functions (::new) are not resolved yet,
which then causes 'tsubst' to be called.
I see, it is not a good place to use debug_tree.

[Bug c++/105418] debug_tree does not support well for std::construct_at

2022-04-28 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105418

Jiu Fu Guo  changed:

   What|Removed |Added

   Priority|P3  |P5

--- Comment #1 from Jiu Fu Guo  ---
Since this only occurs in a corner case while debugging GCC, I gave it a low
priority.

[Bug c++/105418] New: debug_tree does not support well for std::construct_at

2022-04-28 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105418

Bug ID: 105418
   Summary: debug_tree does not support well for std::construct_at
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

While debugging GCC, I called "debug_tree" inside gdb and found that it fails
on "std::construct_at". The failure comes from "decl_as_string".

To reproduce this issue more quickly, the patch below can be used.
---
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index cebf9c35c1d..ead4112dd37 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -4793,10 +4793,12 @@ trees_out::get_tag (tree t)
 }

 /* Insert T into the map, return its tag number.*/
-
+const char* decl_as_string(tree, int);//void debug_tree(tree);
 int
 trees_out::insert (tree t, walk_kind walk)
 {
+  if (TREE_CODE (t) == FUNCTION_DECL)
+decl_as_string (t,TFF_DECL_SPECIFIERS);//debug_tree (t);
   gcc_checking_assert (walk != WK_normal || !TREE_VISITED (t));
   int tag = --ref_num;
   bool existed;
--
Configure GCC with the command below:
$ configure --enable-languages=c,c++ --with-cpu=native --enable-checking \
    --with-long-double-128 \
    --prefix=/home/guojiufu/gcc/install/gcc-mainline-base-debug \
    --disable-bootstrap

Using test case: t.cc
---
#include <memory>

struct S {
    int x;
    float y;
    double z;

    S(int x, float y, double z) : x{x}, y{y}, z{z} { }

};

void foo()
{
    alignas(S) unsigned char storage[sizeof(S)];

    S* ptr = std::construct_at(reinterpret_cast<S*>(storage), 42, 2.71828f,
                               3.1415);

    std::destroy_at(ptr);
}


$ g++ -fmodule-header t.cc -std=gnu++20
In file included from
/home/guojiufu/gcc/build/gcc-mainline-base-debug/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/bits/stl_iterator.h:85,
 from
/home/guojiufu/gcc/build/gcc-mainline-base-debug/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:67,
 from
/home/guojiufu/gcc/build/gcc-mainline-base-debug/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/memory:63,
 from t.cc:1:
/home/guojiufu/gcc/build/gcc-mainline-base-debug/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/bits/stl_construct.h:96:17:
internal compiler error: in perform_overload_resolution, at cp/call.cc:4710
   96 | -> decltype(::new((void*)0) _Tp(std::declval<_Args>()...))
  | ^
0x106b32cb perform_overload_resolution
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/call.cc:4710
0x106b3d9f build_operator_new_call(tree_node*, vec**, tree_node**, tree_node**, tree_node*, tree_node*, tree_node**, int)
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/call.cc:4929
0x108d0743 build_new_1
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/init.cc:3412
0x108d300f build_new(unsigned int, vec**, tree_node*, tree_node*, vec**, int, int)
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/init.cc:4014
0x10b210ef tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/pt.cc:20557
0x10b01ddf tsubst(tree_node*, tree_node*, int, tree_node*)
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/pt.cc:16306
0x10896f6b dump_template_bindings
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/error.cc:486
0x1089f887 dump_substitution
/home/guojiufu/gcc/gcc-mainline-base/gcc/cp/error.cc:1654

[Bug c++/105322] [modules] ICE with constexpr object of local class type from another function

2022-04-27 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105322

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #3 from Jiu Fu Guo  ---
One finding:
trees_out::decl_node contains the following code:

    case FIELD_DECL:
      {
        if (streaming_p ())
          i (tt_data_member);

        tree ctx = DECL_CONTEXT (decl);
        /* The context is the struct "S" inside foo.  "S" is handled in
           'tree_node', and "insert" is called for member "d" there.  */
        tree_node (ctx);

        tree name = NULL_TREE;

        if (TREE_CODE (decl) == USING_DECL)
          ;
        else
          {
            name = DECL_NAME (decl);
            if (name && IDENTIFIER_ANON_P (name))
              name = NULL_TREE;
          }

        tree_node (name);
        if (!name && streaming_p ())
          {
            unsigned ix = get_field_ident (ctx, decl);
            u (ix);
          }

        /* HERE: "insert" is called again for member "d".  */
        int tag = insert (decl);

The direct cause may be that "insert" is called twice for "d", which then
leads to the crash.

However, I do not know this code well and have not found a fix yet.
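
To illustrate the suspected failure mode, here is a minimal standalone sketch of a tag map that refuses a second insertion of the same node. This is not GCC code: `tag_map` and its `insert` are hypothetical stand-ins, and the assumption that the real `trees_out::insert` asserts that the node was not already present is only my reading of the crash.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for the node-to-tag map; not the GCC data structure.
struct tag_map
{
  std::unordered_map<const void *, int> tags;
  int ref_num = 0;

  // Give NODE a fresh negative tag if it is new and return true.
  // If NODE already has a tag, leave the map unchanged and return false,
  // which corresponds to the suspected double-insert case.
  // In both cases *TAG_OUT receives the tag now associated with NODE.
  bool insert (const void *node, int *tag_out)
  {
    auto res = tags.emplace (node, ref_num - 1);
    if (res.second)
      --ref_num;
    *tag_out = res.first->second;
    return res.second;
  }
};
```

With such a map, inserting "d" once while walking its context and once more at the end of the FIELD_DECL case would make the second call report that the node already exists. A checking assertion at that point would abort.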

[Bug c++/105297] [12 Regression] new modules 'xtreme' test cases FAILs

2022-04-21 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105297

--- Comment #15 from Jiu Fu Guo  ---
(In reply to Patrick Palka from comment #13)
> (In reply to Jiu Fu Guo from comment #11)
> > (In reply to Patrick Palka from comment #10)
> > > 
> > > Interestingly that doesn't seem to make a difference.  What seems to
> > > matter is whether the constexpr function modifies the CONSTRUCTOR that
> > > it returns:
> > > 
> > > constexpr auto foo() {
> > >   struct S { int d; } t = {};
> > >   t.d = 0; // doesn't ICE if this line is commented out
> > >   return t;
> > > }
> > > 
> > > template
> > > int bar() {
> > >   constexpr auto t = foo();
> > >   return 0;
> > > }
> > 
> > Right, it is weird. Some PRs on Xtreme-* failure (including ICE) were also
> > reported before. e.g. PR100052, PR101853, PR99910.  As commented in those
> > PRs, these may be random failures, and changes in headers that could expose
> > the ICE.
> > I'm also wondering if this may be an issue hidden inside somewhere (GC?).
> 
> In this case I suspect it's just a bug in the modules code, I opened
> PR105322 to track it.

Oh, thanks!  This failure seems to concern only how the modules code handles
a member of a struct that is used across functions.
