date:20231007

[Bug go/46986] Go is not supported on Darwin

2023-10-07 Thread vital.had at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46986

--- Comment #49 from Sergey Fedorov  ---
If someone happens to have some WIP on this, more recent than 2012, please
share, if possible.

[Bug middle-end/111621] [RISC-V] Bad register allocation in vadd.vi may cause operational error

2023-10-07 Thread mumuxi_ll at outlook dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111621

--- Comment #2 from liu xu  ---
I'm sorry about that; I will keep it in mind next time.

The toolchain I used was built from the gcc master branch. One more point to
add: only the masked form of the vadd.vi instruction triggers the problem
above; the unmasked form does not.

Looking forward to your reply!

[Bug tree-optimization/111718] Missed optimization of '(a+a)/a'

2023-10-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111718

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Keywords||missed-optimization
   Last reconfirmed||2023-10-08

--- Comment #4 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #3)
> For comment #2 from EVRP:
> Folding statement: _3 = _2 / a_5(D);
> Applying pattern match.pd:934, gimple-match-4.cc:2021
> gimple_simplified to _3 = 2;
> 
> Which corresponds to the match pattern:
> /* Simplify (t * 2) / 2) -> t.  */
> (for div (trunc_div ceil_div floor_div round_div exact_div)
>  (simplify
>   (div (mult:c @0 @1) @1)
>   (if (ANY_INTEGRAL_TYPE_P (type))
>(if (TYPE_OVERFLOW_UNDEFINED (type))
> @0
> #if GIMPLE
> (with {value_range vr0, vr1;}
>  (if (INTEGRAL_TYPE_P (type)
> && get_range_query (cfun)->range_of_expr (vr0, @0)
> && get_range_query (cfun)->range_of_expr (vr1, @1)
> && range_op_handler (MULT_EXPR).overflow_free_p (vr0, vr1))
>   @0))
> #endif
>

Which was improved on the trunk by r14-4082-g55b22a6f630e (and then by
r14-4191-gd946fc1c71bd). I don't know why the original testcase is not causing
the above pattern to match though, maybe because a*2 is used twice ...
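
To illustrate why the pattern needs either undefined overflow or a range
proof, here is a minimal C sketch. The helper name `twice_div` is made up for
illustration and is not part of the testcase; the sketch only assumes that
unsigned arithmetic wraps, as the C standard specifies.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: the unfolded expression from the bug title. */
static unsigned twice_div(unsigned a)
{
    return (a + a) / a;
}

/* Returns 1 when every value in the guarded range 11..19 yields 2. */
static int guarded_range_is_two(void)
{
    for (unsigned a = 11; a < 20; ++a)
        if (twice_div(a) != 2)
            return 0;
    return 1;
}

/* With the top bit set, a + a wraps modulo 2^N, so the quotient is not 2. */
static unsigned wrapped_quotient(void)
{
    unsigned big = UINT_MAX / 2 + 2;   /* 0x80000001 for 32-bit unsigned */
    return twice_div(big);
}
```

Inside the range 11..19 the sum cannot wrap, which is presumably what the range
query has to establish before the division can be folded to 2. Outside such a
range the fold would be wrong for unsigned types.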

[Bug tree-optimization/111718] Missed optimization of '(a+a)/a'

2023-10-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111718

--- Comment #3 from Andrew Pinski  ---
For comment #2 from EVRP:
Folding statement: _3 = _2 / a_5(D);
Applying pattern match.pd:934, gimple-match-4.cc:2021
gimple_simplified to _3 = 2;

Which corresponds to the match pattern:
/* Simplify (t * 2) / 2) -> t.  */
(for div (trunc_div ceil_div floor_div round_div exact_div)
 (simplify
  (div (mult:c @0 @1) @1)
  (if (ANY_INTEGRAL_TYPE_P (type))
   (if (TYPE_OVERFLOW_UNDEFINED (type))
@0
#if GIMPLE
(with {value_range vr0, vr1;}
 (if (INTEGRAL_TYPE_P (type)
  && get_range_query (cfun)->range_of_expr (vr0, @0)
  && get_range_query (cfun)->range_of_expr (vr1, @1)
  && range_op_handler (MULT_EXPR).overflow_free_p (vr0, vr1))
  @0))
#endif

[Bug target/94395] Powerpc suboptimal 64-bit constant generation near large values with few bits set

2023-10-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94395

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED
 CC||guojiufu at gcc dot gnu.org

--- Comment #3 from Jiu Fu Guo  ---
After r14-4470, trunk generates better code for this case.

[Bug target/94393] Powerpc suboptimal 64-bit constant comparison

2023-10-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94393

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 CC||guojiufu at gcc dot gnu.org
 Status|NEW |RESOLVED

--- Comment #9 from Jiu Fu Guo  ---
After r14-4470, trunk generates better code for this case.

[Bug tree-optimization/111718] Missed optimization of '(a+a)/a'

2023-10-07 Thread 652023330028 at smail dot nju.edu.cn via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111718

--- Comment #2 from Yi <652023330028 at smail dot nju.edu.cn> ---
We noticed one change between gcc-13.2 and the current gcc-trunk:

https://godbolt.org/z/j5Mnvno9n

In the following code, gcc-13.2 does not yet have the ability to optimize as
expected, but on gcc-trunk, it does.

unsigned n1,n2;
void func1(unsigned a){
    if(a<=10 || a>=20)
        return;
    n2=(a+a)/a;
}


Maybe this change will help solve this issue?

[Bug target/93176] PPC: inefficient 64-bit constant consecutive ones

2023-10-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93176

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #13 from Jiu Fu Guo  ---
Patches are committed for using "li/lis;rldicl/rldicr/rldic" to construct
constants.

[Bug target/106708] [rs6000] 64bit constant generation with oris xoris

2023-10-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106708

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from Jiu Fu Guo  ---
The patch is on trunk.

[Bug c++/111723] #pragma GCC system_header suppresses errors from narrowing conversions

2023-10-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111723

--- Comment #1 from Andrew Pinski  ---
I think this is correct behavior really.
Note even clang with libc++ has the same behavior ...

[Bug c++/111723] New: #pragma GCC system_header suppresses errors from narrowing conversions

2023-10-07 Thread de34 at live dot cn via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111723

Bug ID: 111723
   Summary: #pragma GCC system_header suppresses errors from
narrowing conversions
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Keywords: accepts-invalid
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: de34 at live dot cn
  Target Milestone: ---

In the following program, the conversions are narrowing, but only the one for
nonstd::in_fun_result is rejected.

When -Wsystem-headers is used, the narrowing conversion for
std::ranges::in_fun_result is correctly diagnosed. But if -pedantic-errors and
-Wsystem-headers are used together, some standard headers are rejected.

Godbolt link: https://godbolt.org/z/fT7b16eoe


```
#include 
#include 
#include 

namespace nonstd {
template
struct in_fun_result {
[[no_unique_address]] I in;
[[no_unique_address]] F fun;

template
requires std::convertible_to &&
std::convertible_to
constexpr operator in_fun_result() const&
{
return {in, fun};
}

template
requires std::convertible_to && std::convertible_to
constexpr operator in_fun_result() &&
{
return {std::move(in), std::move(fun)};
}
};
}

int main()
{
std::ranges::in_fun_result r1{};
std::ranges::in_fun_result r2 = r1; // should be error, but not diagnosed by default

nonstd::in_fun_result r3{};
nonstd::in_fun_result r4 = r3; // error, rejected with -pedantic-errors
}
```

It seems to me that #pragma GCC system_header shouldn't suppress errors from
narrowing conversions, because the diagnostics are required by the standard.

[Bug target/111722] manually defined memcpy() and memmove() incorrectly handle overlap with -O2 -m32 -march=bdver2

2023-10-07 Thread zfigura at codeweavers dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111722

--- Comment #5 from Zeb Figura  ---
(In reply to Andrew Pinski from comment #4)
> There is no bug here.
> ICF finds that your definition of memcpy is the same as memmove and merges
> the 2 and then calls memcpy from your memmove and then inlines the normal
> memcpy because well it says it is the same.

I suppose I understand this explanation, but it does not feel like a very
intuitive behaviour.

The ICF part makes sense. The choice to optimize a builtin memcpy/memmove call
into a different instruction sequence (which doesn't match the original) also
makes sense. I would not really expect these two to be combined in this manner,
though. memmove() is not calling builtin memcpy(), it is calling our
implementation of memcpy(), which doesn't have the same semantics as builtin
memcpy().

[It also seems odd to me that func2() would be replaced with a builtin memcpy()
rather than a builtin memmove()?]

> You can just use -fno-builtin to fix the issue by saying memcpy and memmove
> are not builtins and treat them like normal functions.
> 
> That fixes the issue by not inlining the target defined memcpy.

Fair enough, I guess. I suppose that's the right thing to do anyway...

[Bug target/111722] manually defined memcpy() and memmove() incorrectly handle overlap with -O2 -m32 -march=bdver2

2023-10-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111722

Andrew Pinski  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #4 from Andrew Pinski  ---
There is no bug here.
ICF finds that your definition of memcpy is the same as memmove and merges the
2 and then calls memcpy from your memmove and then inlines the normal memcpy
because well it says it is the same.

You can just use -fno-builtin to fix the issue by saying memcpy and memmove are
not builtins and treat them like normal functions.

That fixes the issue by not inlining the target defined memcpy.

[Bug c++/94039] conditional operator fails to use proper overload

2023-10-07 Thread arthur.j.odwyer at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94039

Arthur O'Dwyer  changed:

   What|Removed |Added

 CC||arthur.j.odwyer at gmail dot 
com

--- Comment #3 from Arthur O'Dwyer  ---
You can also hit this with a lambda, which of course is isomorphic to Andre's
test case:

void (*a)() = true ? []{} : nullptr;

Bug #88458 ("GCC rejects (true ? 0 : nullptr)") might be tangentially related.

[Bug target/111722] manually defined memcpy() and memmove() incorrectly handle overlap with -O2 -m32 -march=bdver2

2023-10-07 Thread zfigura at codeweavers dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111722

--- Comment #3 from Zeb Figura  ---
Created attachment 56072
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56072&action=edit
testcase

Attaching a reduced-ish testcase, that contains the unmodified code of memcpy()
and memmove(), plus two callers. The callers seem to be necessary to trigger
the incorrect optimization.

Compile with '-c -O2 -march=bdver2 -m32'.

[Bug target/111722] manually defined memcpy() and memmove() incorrectly handle overlap with -O2 -m32 -march=bdver2

2023-10-07 Thread zfigura at codeweavers dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111722

Zeb Figura  changed:

   What|Removed |Added

Version|unknown |13.2.0
   Keywords||wrong-code
 Target||i686-linux-gnu
  Component|c   |target
Summary|gcc generates wrong code|manually defined memcpy()
   |with|and memmove() incorrectly
   ||handle overlap with -O2
   ||-m32 -march=bdver2

--- Comment #2 from Zeb Figura  ---
Really sorry about that, I managed to accidentally hit the Enter key halfway
through writing the title. Here is the actual bug description:

--

Wine provides freestanding libraries, including manual definitions of memcpy()
and memmove() [1].

Those are defined in C, and while our definitions are *technically*
non-compliant C (violating the requirement that the pointers must point to the
same object), they should be fine for our targets, and anyway, the case I'm
running into is failure to handle overlap where the pointers *do* in fact point
into the same object. I can't find fault with the definitions themselves,
although I may be missing something.

We also, contrary to standards, give memcpy() the semantics of memmove(),
because some Windows programs are buggy and make that assumption. We do this by
copy-pasting the definition (I'm not sure why we do this rather than just
calling one function from the other, but it is what it is).

I recently started compiling with -march=native, and found that gcc was failing
to correctly handle overlap in memmove. Further investigation revealed that,
somehow, memmove() was being incorrectly optimized to *not* check for overlap,
while memcpy() remained in its unoptimized form.

I ran into this originally with the i686-w64-mingw32 target, but I've adjusted
the target to i686-linux-gnu since it happens there too. It does *not* happen
on x86_64.

[1] https://source.winehq.org/git/wine.git/blob/HEAD:/dlls/ntdll/string.c#l98
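
For readers unfamiliar with the overlap handling discussed in this report, the
following is a minimal sketch of an overlap-aware copy. It is not Wine's
implementation; the name `my_memmove` is hypothetical, and the pointer
comparison goes through `uintptr_t`, which assumes a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hand-written memmove; not Wine's code. */
static void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Unsigned distance is < n only when dst lies inside [src, src + n). */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        while (n--)
            *d++ = *s++;        /* forward copy cannot clobber unread bytes */
    } else {
        d += n;
        s += n;
        while (n--)
            *--d = *--s;        /* dst overlaps the tail of src: go backwards */
    }
    return dst;
}
```

A plain forward loop without the distance check would produce "abababgh"
instead of "ababcdgh" when copying 4 bytes from the start of "abcdefgh" to
offset 2, which is the kind of failure the report describes.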

[Bug c/111722] gcc generates wrong code with

2023-10-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111722

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-10-08
 Status|UNCONFIRMED |WAITING

--- Comment #1 from Andrew Pinski  ---
There is nothing in this bug except saying there is wrong code happening (not
even with what options or with anything else).

[Bug c/111722] New: gcc generates wrong code with

2023-10-07 Thread zfigura at codeweavers dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111722

Bug ID: 111722
   Summary: gcc generates wrong code with
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zfigura at codeweavers dot com
  Target Milestone: ---

[Bug c/111721] New: RISC-V: Failed to SLP for gather_load in RVV

2023-10-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111721

Bug ID: 111721
   Summary: RISC-V: Failed to SLP for gather_load in RVV
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

https://godbolt.org/z/d5TPa5e5s

void __attribute__((noipa))
f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
{
  y[i * 2] = x[indices[i * 2]] + 1;
  y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
}
}

RVV ASM:

f:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e32,m1,ta,ma
vlseg2e32.v v2,(a2) > VEC_LOAD_LANES
vsetivli zero,4,e32,m1,ta,ma
vsll.vi v4,v2,2
vsll.vi v1,v3,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei32.v  v4,(a1),v4
vluxei32.v  v1,(a1),v1
vsetivli zero,4,e32,m1,ta,ma
slli a4,a5,3
vadd.vi v2,v4,1
vadd.vi v3,v1,2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma
vsseg2e32.v v2,(a0)  > VEC_STORE_LANES
add a2,a2,a4
add a0,a0,a4
bne a3,zero,.L3
.L5:
ret

Compared to aarch64, which can SLP this loop, RVV generates expensive
load_lanes/store_lanes.

This is because RVV uses MASK_LEN_GATHER_LOAD, for which we currently don't
support SLP.

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-10-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #12 from JuzheZhong  ---
Hi, Andrew.

I made another attempt:

https://godbolt.org/z/heKxcMWsY

Change the load into a normal load of arr:
vuint8m1_t varr = *(vuint8m1_t*)arr;

As you said, the issue is gone (the code is as good as LLVM's):
fn:
lui a5,%hi(.LANCHOR0)
addi a5,a5,%lo(.LANCHOR0)
li  a4,32
vl1re8.v v1,0(a5)
vsetvli zero,a4,e8,m1,ta,ma
vand.vi v1,v1,1
vs1r.v  v1,0(a0)
ret

It seems that GCC can only optimize the normal load?

Is there a chance to optimize this case (for an unknown load)?

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-10-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #11 from JuzheZhong  ---
(In reply to Andrew Pinski from comment #10)
> The issues is GCC does prop the load/store for arr into __riscv_vle8_v_u8m1
> really.

Ok. Do you know why GCC prop load/store for arr into __riscv_vle8_v_u8m1?

Is it just because the __riscv_vle8_v_u8m1 pattern is complex?

I don't think we can simplify the __riscv_vle8_v_u8m1 pattern, since we tried
to fuse all features into a single pattern (a pattern that includes multiple
features becomes complex) to reduce the build time of insn-emit.cc and
insn-opinit.cc.

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-10-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #10 from Andrew Pinski  ---
The issues is GCC does prop the load/store for arr into __riscv_vle8_v_u8m1
really.

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-10-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #9 from JuzheZhong  ---
(In reply to Andrew Pinski from comment #7)
> .

Besides, if we remove the data initialization:


https://godbolt.org/z/qcjcP7s1c

#include
vuint8m1_t fn() {

uint8_t arr[32];
uint8_t m = 1;

vuint8m1_t varr = __riscv_vle8_v_u8m1(arr, 32);
vuint8m1_t vand_m = __riscv_vand_vx_u8m1(varr, m, 32);
//vbool8_t vmask = __riscv_vreinterpret_v_u8m1_b8(vand_m);

return vand_m;
}

The issue is gone:

fn:
addi sp,sp,-32
li  a5,32
vsetvli zero,a5,e8,m1,ta,ma
vle8.v  v24,0(sp)
vand.vi v24,v24,1
vs1r.v  v24,0(a0)
addi sp,sp,32
jr  ra

The codegen is as good as LLVM's.

I still think it is something like a constant memory pool issue.

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-10-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #8 from JuzheZhong  ---
(In reply to Andrew Pinski from comment #6)
> I suspect if __riscv_vle8_v_u8m1 gets lowered into a load on the gimple
> level, it might just work ...
> 
> But it gets expanded as:
> (insn 14 13 0 (set (reg/v:RVVM1QI 134 [ varrD.56526 ])
> (if_then_else:RVVM1QI (unspec:RVVMF8BI [
> (const_vector:RVVMF8BI repeat [
> (const_int 1 [0x1])
> ])
> (reg:DI 145)
> (const_int 2 [0x2]) repeated x2
> (const_int 0 [0])
> (reg:SI 66 vl)
> (reg:SI 67 vtype)
> ] UNSPEC_VPREDICATE)
> (mem:RVVM1QI (reg:DI 144) [0  S[16, 16] A8])
> (unspec:RVVM1QI [
> (reg:SI 0 zero)
> ] UNSPEC_VUNDEF))) "/app/example.c":7:23 -1
>  (nil))
> 
> That seems complex.

Do you mean the normal load MEM_REF in GCC?

I don't think we can do that, since this intrinsic is defined with a mask, a
length, an else value, etc.

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-10-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-10-07
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #7 from Andrew Pinski  ---
.

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-10-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #6 from Andrew Pinski  ---
I suspect if __riscv_vle8_v_u8m1 gets lowered into a load on the gimple level,
it might just work ...

But it gets expanded as:
(insn 14 13 0 (set (reg/v:RVVM1QI 134 [ varrD.56526 ])
(if_then_else:RVVM1QI (unspec:RVVMF8BI [
(const_vector:RVVMF8BI repeat [
(const_int 1 [0x1])
])
(reg:DI 145)
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(mem:RVVM1QI (reg:DI 144) [0  S[16, 16] A8])
(unspec:RVVM1QI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))) "/app/example.c":7:23 -1
 (nil))

That seems complex.

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-10-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #5 from JuzheZhong  ---
Similar issue in GCC 13.2:

https://godbolt.org/z/axKc4qj47

fn:
lui a5,%hi(.LANCHOR0)
addi a5,a5,%lo(.LANCHOR0)
ld  a1,0(a5)
ld  a2,8(a5)
ld  a3,16(a5)
ld  a4,24(a5)
addi sp,sp,-32
sd  a1,0(sp)
sd  a2,8(sp)
sd  a3,16(sp)
sd  a4,24(sp)
li  a5,32
vsetvli zero,a5,e8,m1,ta,ma
vle8.v  v24,0(sp)
vand.vi v24,v24,1
vs1r.v  v24,0(a0)
addi sp,sp,32
jr  ra


Multiple ld/sd. It seems that we didn't allow natural constant mem pool

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-10-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #4 from JuzheZhong  ---
I found that this is not caused by VLS modes.

With --param=riscv-autovec-preference=fixed-vlmax, which disables VLS modes,
we also see the unnecessary load/store:

fn:
lui a5,%hi(.LANCHOR0)
addi sp,sp,-32
addi a5,a5,%lo(.LANCHOR0)
vl2re64.v   v8,0(a5)   - ??? unnecessary
li  a4,32
vs2r.v  v8,0(sp)       - ??? unnecessary
vsetvli zero,a4,e8,m1,ta,ma
vle8.v  v0,0(sp)
vand.vi v0,v0,1
addi sp,sp,32
jr  ra

The optimized tree is reasonable, but after the "expand" stage, the redundant
load and store are produced.

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-10-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #3 from JuzheZhong  ---
(In reply to Andrew Pinski from comment #2)
> I noticed there is an ABI difference here.
> 
> GCC is returning via a store to a0:
> vsm.v   v1,0(a0)
> 
> While LLVM is returning via v0 .
> 
> Which one is correct?

Both are correct. We have an experimental ABI doc.

GCC also supports the same ABI, but it needs --param=riscv-vector-abi

Then GCC ASM:

fn:
lui a5,%hi(.LANCHOR0)
addi sp,sp,-32
addi a5,a5,%lo(.LANCHOR0)
vsetivli zero,4,e64,m2,ta,ma
li  a4,32
vle64.v v8,0(a5)
vse64.v v8,0(sp)
vsetvli zero,a4,e8,m1,ta,ma
vle8.v  v0,0(sp)
vand.vi v0,v0,1
addi sp,sp,32
jr  ra

GCC also returns via v0 when the ABI is enabled.


The root cause is unnecessary load/store:

vle64.v v8,0(a5)
vse64.v v8,0(sp)

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-10-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #2 from Andrew Pinski  ---
I noticed there is an ABI difference here.

GCC is returning via a store to a0:
vsm.v   v1,0(a0)

While LLVM is returning via v0 .

Which one is correct?

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-10-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #1 from JuzheZhong  ---
The root cause is unnecessary VLS-mode data movement:

(insn 10 9 11 2 (set (reg:V4DI 143)
(mem/u/c:V4DI (reg:DI 142) [0  S32 A128])) "/app/example.c":4:13 1119
{*movv4di}
 (nil))
(insn 11 10 12 2 (set (mem/c:V4DI (reg:DI 141) [0  S32 A128])
(reg:V4DI 143)) "/app/example.c":4:13 1119 {*movv4di}
 (nil))

[Bug regression/111709] [13 Regression] Miscompilation of sysdeps/ieee754/dbl-64/s_fma.c

2023-10-07 Thread dave.anglin at bell dot net via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111709

--- Comment #10 from dave.anglin at bell dot net ---
On 2023-10-06 3:50 a.m., rguenth at gcc dot gnu.org wrote:
> Does it work on trunk? 
No.  Test results with gcc trunk are identical to those with Debian gcc-13.

Tried just rebuilding s_fma.c, and a full build and check.

[Bug c/111720] New: RISC-V: Ugly codegen in RVV

2023-10-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

Bug ID: 111720
   Summary: RISC-V: Ugly codegen in RVV
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Reference: https://godbolt.org/z/YqW7Y5Yve

#include
vbool8_t fn() {

uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7,
1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};
uint8_t m = 1;

vuint8m1_t varr = __riscv_vle8_v_u8m1(arr, 32);
vuint8m1_t vand_m = __riscv_vand_vx_u8m1(varr, m, 32);
vbool8_t vmask = __riscv_vreinterpret_v_u8m1_b8(vand_m);

return vmask;
}

GCC asm:

fn:
lui a5,%hi(.LANCHOR0)
addi sp,sp,-32
vsetivli zero,4,e64,m2,ta,ma
addi a5,a5,%lo(.LANCHOR0)
li  a4,32
vle64.v v2,0(a5)
vse64.v v2,0(sp)
vsetvli zero,a4,e8,m1,ta,ma
vle8.v  v1,0(sp)
vand.vi v1,v1,1
vsetvli a5,zero,e8,m1,ta,ma
vsm.v   v1,0(a0)
addi sp,sp,32
jr  ra

LLVM ASM:

fn: # @fn
.Lpcrel_hi0:
auipc   a0, %pcrel_hi(.L__const.fn.arr)
addi a0, a0, %pcrel_lo(.Lpcrel_hi0)
li  a1, 32
vsetvli zero, a1, e8, m1, ta, ma
vle8.v  v8, (a0)
vand.vi v0, v8, 1
ret
.L__const.fn.arr:
.ascii "\001\002\007\001\003\004\005\003\001\000\001\002\004\004\t\t\001\002\007\001\003\004\005\003\001\000\001\002\004\004\t\t"

[Bug middle-end/111699] [11/12/13 Regression] ICE: SIGSEGV: infinite recursion in fold_build3_loc/fold_ternary_loc/generic_simplify_VEC_COND_EXPR

2023-10-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111699

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to work||12.3.1, 13.2.1
 Resolution|--- |FIXED
   Target Milestone|13.3|11.5

--- Comment #11 from Andrew Pinski  ---
Fixed everywhere.

[Bug middle-end/111699] [11/12/13 Regression] ICE: SIGSEGV: infinite recursion in fold_build3_loc/fold_ternary_loc/generic_simplify_VEC_COND_EXPR

2023-10-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111699

--- Comment #10 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Andrew Pinski
:

https://gcc.gnu.org/g:9d4caf90e7bf1824ebabf0bc0541bfea511ef03b

commit r11-11054-g9d4caf90e7bf1824ebabf0bc0541bfea511ef03b
Author: Andrew Pinski 
Date:   Thu Oct 5 12:21:19 2023 -0700

MATCH: Fix infinite loop between `vec_cond(vec_cond(a,b,0), c, d)` and `a &
b`

Match has a pattern which converts `vec_cond(vec_cond(a,b,0), c, d)`
into `vec_cond(a & b, c, d)` but since in this case a is a comparison
fold will change `a & b` back into `vec_cond(a,b,0)` which causes an
infinite loop.
The best way to fix this is to enable the patterns for
vec_cond(*,vec_cond,*)
only for GIMPLE so we don't get an infinite loop for fold any more.

Note this is a latent bug since these patterns were added in
r11-2577-g229752afe3156a
and was exposed by r14-3350-g47b833a9abe1 where now able to remove a
VIEW_CONVERT_EXPR.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR middle-end/111699

gcc/ChangeLog:

* match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
(v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Enable only for GIMPLE.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr111699-1.c: New test.

(cherry picked from commit e77428a9a336f57e3efe3eff95f2b491d7e9be14)
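
The two forms named in the commit message compute the same value, which is why
rewriting in both directions never settles. A minimal per-lane model in C may
help; it assumes each mask lane is either 0 or all-ones, and the names
`lane_cond`, `nested_form`, `and_form` and `forms_agree` are hypothetical, not
GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of a vector condition: a mask is either 0 or all-ones. */
typedef int32_t lane_mask;

/* Scalar model of vec_cond(m, x, y) for a single lane. */
static int32_t lane_cond(lane_mask m, int32_t x, int32_t y)
{
    return m ? x : y;
}

/* Form 1: vec_cond(vec_cond(a, b, 0), c, d). */
static int32_t nested_form(lane_mask a, lane_mask b, int32_t c, int32_t d)
{
    return lane_cond(lane_cond(a, b, 0), c, d);
}

/* Form 2: vec_cond(a & b, c, d). */
static int32_t and_form(lane_mask a, lane_mask b, int32_t c, int32_t d)
{
    return lane_cond(a & b, c, d);
}

/* Returns 1 when both forms agree for every mask combination. */
static int forms_agree(void)
{
    const lane_mask masks[2] = {0, -1};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            if (nested_form(masks[i], masks[j], 7, 9)
                != and_form(masks[i], masks[j], 7, 9))
                return 0;
    return 1;
}
```

Since neither form is wrong, one simplifier preferring form 2 while another
prefers form 1 leads to the ping-pong described above; restricting the pattern
to GIMPLE removes one side of it.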

[Bug middle-end/111699] [11/12/13 Regression] ICE: SIGSEGV: infinite recursion in fold_build3_loc/fold_ternary_loc/generic_simplify_VEC_COND_EXPR

2023-10-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111699

--- Comment #9 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Andrew Pinski
:

https://gcc.gnu.org/g:a63238cd52d974d364677def97d4ed70d26a7410

commit r12-9915-ga63238cd52d974d364677def97d4ed70d26a7410
Author: Andrew Pinski 
Date:   Thu Oct 5 12:21:19 2023 -0700

MATCH: Fix infinite loop between `vec_cond(vec_cond(a,b,0), c, d)` and `a &
b`

Match has a pattern which converts `vec_cond(vec_cond(a,b,0), c, d)`
into `vec_cond(a & b, c, d)` but since in this case a is a comparison
fold will change `a & b` back into `vec_cond(a,b,0)` which causes an
infinite loop.
The best way to fix this is to enable the patterns for
vec_cond(*,vec_cond,*)
only for GIMPLE so we don't get an infinite loop for fold any more.

Note this is a latent bug since these patterns were added in
r11-2577-g229752afe3156a
and was exposed by r14-3350-g47b833a9abe1 where now able to remove a
VIEW_CONVERT_EXPR.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR middle-end/111699

gcc/ChangeLog:

* match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
(v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Enable only for GIMPLE.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr111699-1.c: New test.

(cherry picked from commit e77428a9a336f57e3efe3eff95f2b491d7e9be14)

[Bug bootstrap/111664] [14 regression] Fails to build with mawk (error in gcc/opt-read.awk) after r14-4354-ge4a4b8e983bac8

2023-10-07 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111664

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 CC||law at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #6 from Jeffrey A. Law  ---
Fixed on the trunk.

[Bug middle-end/111699] [11/12/13 Regression] ICE: SIGSEGV: infinite recursion in fold_build3_loc/fold_ternary_loc/generic_simplify_VEC_COND_EXPR

2023-10-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111699

--- Comment #8 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Andrew Pinski
:

https://gcc.gnu.org/g:add2afa9e25f1776fdfbeb1b99fd1efcf850f91f

commit r13-7938-gadd2afa9e25f1776fdfbeb1b99fd1efcf850f91f
Author: Andrew Pinski 
Date:   Thu Oct 5 12:21:19 2023 -0700

MATCH: Fix infinite loop between `vec_cond(vec_cond(a,b,0), c, d)` and `a &
b`

Match has a pattern which converts `vec_cond(vec_cond(a,b,0), c, d)`
into `vec_cond(a & b, c, d)` but since in this case a is a comparison
fold will change `a & b` back into `vec_cond(a,b,0)` which causes an
infinite loop.
The best way to fix this is to enable the patterns for
vec_cond(*,vec_cond,*)
only for GIMPLE so we don't get an infinite loop for fold any more.

Note this is a latent bug since these patterns were added in
r11-2577-g229752afe3156a
and was exposed by r14-3350-g47b833a9abe1 where now able to remove a
VIEW_CONVERT_EXPR.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR middle-end/111699

gcc/ChangeLog:

* match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
(v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Enable only for GIMPLE.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr111699-1.c: New test.

(cherry picked from commit e77428a9a336f57e3efe3eff95f2b491d7e9be14)

[Bug rtl-optimization/111384] missed optimization: GCC adds extra any extend when storing subreg#0 multiple times

2023-10-07 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111384

Jeffrey A. Law  changed:

   What|Removed |Added

   Last reconfirmed||2023-10-07
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #4 from Jeffrey A. Law  ---
So this is something we've been pondering over in rv64 land.  Joern has an
extension to DCE which tracks subobjects in an attempt to determine if bits set
by sign/zero extensions are never read.  If they aren't read, then the
extension can be eliminated.
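
A small C model of the idea, under the assumption that the only consumer of
the extended value is a narrowing store; the function names are invented for
this sketch and do not come from the bug's testcase.

```c
#include <assert.h>
#include <stdint.h>

/* Stores the low 16 bits after an explicit sign extension to 64 bits. */
static void store_with_extend(uint16_t *p, int32_t v)
{
    int64_t wide = (int64_t)v;     /* sign extension: sets bits 32..63 */
    *p = (uint16_t)wide;           /* only bits 0..15 are read */
}

/* Same store without the extension. */
static void store_without_extend(uint16_t *p, int32_t v)
{
    *p = (uint16_t)v;
}

/* Returns 1 when both variants store the same 16-bit value. */
static int same_result(int32_t v)
{
    uint16_t a, b;
    store_with_extend(&a, v);
    store_without_extend(&b, v);
    return a == b;
}
```

Because the store never reads the bits produced by the extension, a pass that
tracks which sub-parts of a register are live could, in principle, delete the
extension without changing behaviour.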

[Bug target/109414] RISC-V: unnecessary sext.w in rv64

2023-10-07 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109414

Jeffrey A. Law  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
 CC||law at gcc dot gnu.org

--- Comment #5 from Jeffrey A. Law  ---
These code generation inefficiencies have been fixed.  I didn't bisect, but I
would hazard a guess it was Jivan's work on exposing the widening nature of the
32 bit operations and extracting the result via a promoted subreg.

ie, for the first example we now generate this during expand:

(insn 2 5 3 2 (set (reg/v:DI 136 [ x ])
(reg:DI 10 a0 [ x ])) "j.c":1:26 -1
 (nil))
(insn 3 2 4 2 (set (reg/v:DI 137 [ n ])
(reg:DI 11 a1 [ n ])) "j.c":1:26 -1
 (nil))
(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
(insn 7 4 8 2 (set (reg:DI 140)
(sign_extend:DI (plus:SI (subreg/s/u:SI (reg/v:DI 136 [ x ]) 0)
(const_int 1 [0x1])))) "j.c":2:12 -1
 (nil))
(insn 8 7 9 2 (set (reg:SI 139)
(subreg/s/u:SI (reg:DI 140) 0)) "j.c":2:12 -1
 (expr_list:REG_EQUAL (plus:SI (subreg/s/u:SI (reg/v:DI 136 [ x ]) 0)
(const_int 1 [0x1]))
(nil)))
(insn 9 8 10 2 (set (reg:DI 141)
(xor:DI (reg/v:DI 137 [ n ])
(subreg:DI (reg:SI 139) 0))) "j.c":2:17 -1
 (nil))
(insn 10 9 11 2 (set (reg:DI 142)
(sign_extend:DI (subreg:SI (reg:DI 141) 0))) "j.c":2:17 discrim 1 -1
 (nil))
(insn 11 10 15 2 (set (reg:DI 135 [ <retval> ])
(reg:DI 142)) "j.c":2:17 discrim 1 -1
 (nil))
(insn 15 11 16 2 (set (reg/i:DI 10 a0)
(reg:DI 135 [ <retval> ])) "j.c":3:1 -1
 (nil))
(insn 16 15 0 2 (use (reg/i:DI 10 a0)) "j.c":3:1 -1
 (nil))


Which is much easier for combine to analyze and prove the trailing sign
extension is unnecessary.
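
A small C model of why that trailing extension is a no-op, assuming x and n are both 32-bit ints that arrive sign-extended in 64-bit registers (as the RISC-V calling convention requires). The helper names are made up for illustration and the mapping to insn numbers is my reading of the dump:

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of v to 64 bits, like sext.w on rv64. */
int64_t sext32(int64_t v)
{
    int64_t low = (int64_t)((uint64_t)v & 0xffffffffu);
    return (low & 0x80000000LL) ? low - 0x100000000LL : low;
}

/* Returns 1 when the trailing sign extension leaves the value unchanged. */
int trailing_sext_is_redundant(int32_t x, int32_t n)
{
    /* insn 7: sign_extend:DI of the 32-bit (wrapping) sum x + 1. */
    int64_t sum = sext32((int64_t)x + 1);

    /* insn 9: 64-bit xor with n, which is already sign-extended. */
    int64_t xored = sum ^ (int64_t)n;

    /* insn 10: the sign_extend:DI of the low 32 bits of the xor. */
    return sext32(xored) == xored;
}
```

Both xor operands have bits 63..31 all equal to their own bit 31, so the result has the same property and re-extending it changes nothing.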

[Bug target/106271] Bootstrap on RISC-V on Ubuntu 22.04 LTS: bits/libc-header-start.h: No such file or directory

2023-10-07 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106271

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Jeffrey A. Law  ---
I wasn't aware of this BZ when I made the commit referenced in c#6.  But yes,
the whole point of that commit was to fix this problem.

[Bug target/64215] -Os misses an opportunity to merge two ret instructions

2023-10-07 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64215

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #5 from Jeffrey A. Law  ---
Andrew, the reason the patch you referenced doesn't help this case is because
we don't have an unconditional jump to a return only block.

To optimize this case we'd have to detect that we have a return only block that
is immediately preceded by another return block after bbro.

ie:

(note 48 23 59 6 [bb 6] NOTE_INSN_BASIC_BLOCK)
(insn 59 48 49 6 (use (reg/i:SI 10 a0)) -1
 (nil))
(jump_insn 49 59 37 6 (simple_return) 346 {simple_return}
 (nil)
 -> simple_return)
;; lr  out   1 [ra] 2 [sp] 10 [a0]
;; live  out 1 [ra] 2 [sp] 10 [a0]

;;  succ:   EXIT [always]  count:52738306 (estimated locally, freq 0.4591)

;; basic block 7, loop depth 0, count 6317494 (estimated locally, freq 0.0550),
maybe hot
;;  prev block 6, next block 1, flags: (REACHABLE, RTL)
;;  pred:   2 [5.5% (guessed)]  count:6317494 (estimated locally, freq
0.0550) (CAN_FALLTHRU)
;; bb 7 artificial_defs: { }
;; bb 7 artificial_uses: { u-1(2){ }}
;; lr  in   1 [ra] 2 [sp] 10 [a0]
;; lr  use   2 [sp] 10 [a0]
;; lr  def
;; live  in  1 [ra] 2 [sp] 10 [a0]
;; live  gen
;; live  kill

(code_label 37 49 36 7 4 (nil) [1 uses])
(note 36 37 60 7 [bb 7] NOTE_INSN_BASIC_BLOCK)
(insn 60 36 51 7 (use (reg/i:SI 10 a0)) -1
 (nil))
(jump_insn 51 60 41 7 (simple_return) 346 {simple_return}
 (nil)
 -> simple_return)

[Bug middle-end/110859] New FAIL: 23_containers/vector/bool/110807.cc

2023-10-07 Thread danglin at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110859

--- Comment #3 from John David Anglin  ---
FAIL: 23_containers/vector/bool/110807.cc  -std=gnu++17 (test for excess
errors)
Excess errors:
/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:440:
warning: 'void* __builtin_memmove(void*, const void*, unsigned int)' writing
between 5 and 268435455 bytes into a region of size 4 overflows the destination
[-Wstringop-overflow=]

[Bug tree-optimization/111718] Missed optimization of '(a+a)/a'

2023-10-07 Thread vanyacpp at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111718

Ivan Sorokin  changed:

   What|Removed |Added

 CC||vanyacpp at gmail dot com

--- Comment #1 from Ivan Sorokin  ---
GCC does the optimization if the return from the function is replaced with
__builtin_unreachable:

unsigned n1, n2;

void func1(unsigned a)
{
if (a <= 10 || a >= 20)
__builtin_unreachable();

n1 = a + a;
n2 = (a + a)/a;
}

func1(unsigned int):
mov DWORD PTR n2[rip], 2
add edi, edi
mov DWORD PTR n1[rip], edi
ret

https://godbolt.org/z/Tjsz6neTs

Perhaps this issue has the same underlying cause as PR80015.

[Bug fortran/111719] New: Omitting data-sharing attribute for function return value in OpenMP does not raise an error.

2023-10-07 Thread pmblakely at googlemail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111719

Bug ID: 111719
   Summary: Omitting data-sharing attribute for function return
value in OpenMP does not raise an error.
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pmblakely at googlemail dot com
  Target Milestone: ---

In the following Fortran90 example:

program prog

contains
  function b(arr)
implicit none
integer :: i, xCells
real(kind = 8) :: dx, a, dc, b, dt
real(kind = 8), allocatable, dimension(:), intent(in) :: arr

b = 1d300
dc = 1d300
!$OMP parallel do default(none) reduction(min:dt) firstprivate(xCells, dx, a) shared(arr)
do i = 0, xCells 
   a = arr(i)
   b = min(b, 1d0 / a)
end do  
  end function b
end program prog

the return value for function 'b' is the intended reduction value in the
do-loop, but is not mentioned in the OpenMP reduction clause (dt is incorrectly
mentioned instead). Due to the default(none) clause, this should be a
compile-time error.
However:
gfortran test.f90 -o test -fopenmp
compiles this without warnings or errors (versions 13.1.0, 8.4.0, 9.4.0, 11.2.0
and 12.1.0 all tested).

If "b = min(b, 1d0 / a)" is replaced by "dc = min(dc, 1d0/a)"
then gfortran gives: "Error: 'dc' not specified in enclosing ‘parallel’"

I would expect this error to be generated in the original case as well.

Note that the OpenMP standard at
https://www.openmp.org/spec-html/5.2/openmpsu33.html does not give an
implicitly determined data-sharing attribute for the function return value.
Also the Intel Fortran ifort (2021.8.0) does raise the expected error on the
above test-code.

[Bug libstdc++/92798] -fshort-enums can break iterators of std::map

2023-10-07 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92798

--- Comment #6 from Jonathan Wakely  ---
We could add an enumerator that forces sizeof(_Rb_tree_color) == sizeof(int),
which would be valid for C++98.
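
A rough C analogue of that idea (the real type is the C++ enum _Rb_tree_color in libstdc++; the names below are invented for illustration). An extra enumerator equal to INT_MAX means the enum can no longer be shrunk below int, even when the compiler is allowed to pick the smallest fitting type:

```c
#include <limits.h>

/* Hypothetical stand-in for _Rb_tree_color with a size-forcing enumerator. */
enum rb_color_padded {
    RB_RED = 0,
    RB_BLACK = 1,
    RB_COLOR_FORCE_INT = INT_MAX  /* never used as a color; only pins the size */
};

/* Returns 1 when the padded enum has the same size as int. */
int padded_enum_matches_int(void)
{
    return sizeof(enum rb_color_padded) == sizeof(int);
}
```

With GCC I would expect this to hold both with and without -fshort-enums, but I have only reasoned about it, not checked every target.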

[Bug libstdc++/111713] libstdc++ accepts invalid regular expression

2023-10-07 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111713

Jonathan Wakely  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Jonathan Wakely  ---
Yes, it's a dup

*** This bug has been marked as a duplicate of bug 111129 ***

[Bug libstdc++/111129] std::regex incorrectly matches quantifiers with plus appended

2023-10-07 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111129

Jonathan Wakely  changed:

   What|Removed |Added

 CC||hewillk at gmail dot com

--- Comment #4 from Jonathan Wakely  ---
*** Bug 111713 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/111718] New: Missed optimization of '(a+a)/a'

2023-10-07 Thread 652023330028 at smail dot nju.edu.cn via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111718

Bug ID: 111718
   Summary: Missed optimization of '(a+a)/a'
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: 652023330028 at smail dot nju.edu.cn
  Target Milestone: ---

Hello, we found some arithmetic optimizations that GCC may have missed. We
would greatly appreciate it if you could take a look and let us know what you
think.

Given the following code: 
https://godbolt.org/z/5de17zvz9

unsigned n1,n2;
void func1(unsigned a){
if(a>10&&a<20){
n1=a+a;
n2=(a+a)/a;
}
}

We note that `(a+a)/a` should be optimized to `2`, but gcc-trunk -O3 does not:
func1(unsigned int):
lea eax, [rdi-11]
cmp eax, 8
ja  .L1
lea eax, [rdi+rdi]
xor edx, edx
mov DWORD PTR n1[rip], eax
div edi
mov DWORD PTR n2[rip], eax
.L1:
ret


Thank you very much for your time and effort! We look forward to hearing from
you.
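
For reference, a self-contained C sketch of why the range guard matters here. func1 mirrors the reported function; the two helper functions are additions for illustration only. Inside the guard a + a cannot wrap, so the quotient is always 2; without it, unsigned wraparound can change the result:

```c
#include <stdint.h>

unsigned n1, n2;

/* Mirrors the reported function: a is known to be in 11..19. */
void func1(unsigned a)
{
    if (a > 10 && a < 20) {
        n1 = a + a;
        n2 = (a + a) / a;
    }
}

/* Returns 1 if (a + a) / a == 2 for every a the guard admits. */
int quotient_is_always_two(void)
{
    for (unsigned a = 11; a < 20; a++) {
        func1(a);
        if (n1 != 2 * a || n2 != 2)
            return 0;
    }
    return 1;
}

/* Without range information the fold is invalid: a + a may wrap. */
uint32_t wrapped_quotient(uint32_t a)
{
    return (uint32_t)(a + a) / a;
}
```

For example, wrapped_quotient(0x80000000u) yields 0, because the 32-bit sum wraps to 0 before the division.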

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2023-10-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395

JuzheZhong  changed:

   What|Removed |Added

 CC||juzhe.zhong at rivai dot ai

--- Comment #6 from JuzheZhong  ---
Hi, Richi.

I have recently been evaluating the TSVC performance of GCC.

I found that GCC can SLP vectorize this loop for both RISC-V and aarch64:

https://godbolt.org/z/ssvTxxjeT

Both GCC-13 and trunk GCC can SLP vectorize it as LLVM does (GCC-12 fails), but
only with -fno-vect-cost-model.

I suspect we should adjust the vectorizer cost model (I don't think we should
adjust the cost model in the target backends, since LLVM vectorizes such cases
by default).

[Bug target/111634] RISC-V vector: ICE RTL check: expected code 'reg', have 'lo_sum' in rhs_regno, at rtl.h:1934

2023-10-07 Thread patrick at rivosinc dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111634

Patrick O'Neill  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Patrick O'Neill  ---
Confirmed to be fixed on r14-4443-ga809a556dc0.

Built and ran testsuite on rv32/64gcv glibc/newlib with --enable-checking=rtl.
They all built successfully and no tests fail due to rtl checking!

[Bug target/108338] use mtvsrws for lowpart DI->SF conversion on P9

2023-10-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108338

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Jiu Fu Guo  ---
Fixed.

[Bug target/108338] use mtvsrws for lowpart DI->SF conversion on P9

2023-10-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108338

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Jiu Fu Guo :

https://gcc.gnu.org/g:537d7a445ca0ed677751afd3cdcf8465ccd5fb7e

commit r14-4445-g537d7a445ca0ed677751afd3cdcf8465ccd5fb7e
Author: Jiufu Guo 
Date:   Thu Sep 28 17:34:45 2023 +0800

rs6000: use mtvsrws to move sf from si p9

As mentioned in PR108338, on p9, we could use mtvsrws to implement
the bitcast from SI to SF (or lowpart DI to SF).

For example:
  *(long long*)buff = di;
  float f = *(float*)(buff);

"sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" is generated.
A better one would be "mtvsrws 1,3 ; xscvspdpn 1,1".

PR target/108338

gcc/ChangeLog:

* config/rs6000/rs6000.md (movsf_from_si): Update to generate
mtvsrws for P9.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108338.c: Updated to check mtvsrws for p9.
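
As a side note, the lowpart DI to SF bitcast described in this commit can be sketched in C in an endianness-independent way as below. The function name is made up, and the sketch assumes float is a 32-bit IEEE-754 single; it is not the code from pr108338.c:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the low 32 bits of di as a single-precision float. */
float sf_from_lowpart_di(uint64_t di)
{
    uint32_t low = (uint32_t)di;   /* lowpart SI of the DI value */
    float f;

    /* memcpy performs the SI -> SF bitcast without violating aliasing rules. */
    memcpy(&f, &low, sizeof f);
    return f;
}
```

Here the high 32 bits of di are simply ignored, which is the case the mtvsrws sequence targets.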

[Bug target/108338] use mtvsrws for lowpart DI->SF conversion on P9

2023-10-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108338

--- Comment #1 from CVS Commits  ---
The master branch has been updated by Jiu Fu Guo :

https://gcc.gnu.org/g:5f56b76ff1c15118200204569389f85cca4e32d3

commit r14--g5f56b76ff1c15118200204569389f85cca4e32d3
Author: Jiufu Guo 
Date:   Thu Sep 28 17:00:04 2023 +0800

rs6000: optimize moving to sf from highpart di

Currently, we have the pattern "movsf_from_si2" which was trying
to support moving high part DI to SF.

But current pattern only accepts "ashiftrt":
XX:SF=bitcast:SF(subreg(YY:DI>>32),0), but actually "lshiftrt" should
also be ok.
And current pattern only supports BE.

Here, updating the pattern to support BE and "lshiftrt".

PR target/108338

gcc/ChangeLog:

* config/rs6000/predicates.md (lowpart_subreg_operator): New
define_predicate.
* config/rs6000/rs6000.md (any_rshift): New code_iterator.
(movsf_from_si2): Rename to ...
(movsf_from_si2_): ... this.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108338.c: New test.
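
A short C sketch of why "lshiftrt" is as good as "ashiftrt" for this pattern: only the low 32 bits of the shifted value feed the bitcast, and those bits are the same for both shifts. The function names are hypothetical, float is assumed to be a 32-bit IEEE-754 single, and right-shifting a negative signed value is implementation-defined in C (GCC implements it as an arithmetic shift):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the high 32 bits of di as a single-precision float. */
float sf_from_highpart_di(uint64_t di)
{
    uint32_t high = (uint32_t)(di >> 32);   /* logical shift (lshiftrt) */
    float f;

    memcpy(&f, &high, sizeof f);
    return f;
}

/* Returns 1 when both shift kinds give the same low 32 bits. */
int shifts_agree_in_low_word(uint64_t di)
{
    uint32_t logical = (uint32_t)(di >> 32);
    /* Arithmetic shift on GCC; only bits 63..32 of the result differ. */
    uint32_t arithmetic = (uint32_t)((int64_t)di >> 32);

    return logical == arithmetic;
}
```

The two shifts differ only in what they place in the upper half of the result, which the subreg then discards.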

[Bug c/110368] Incorrect "is used uninitialized" warning message

2023-10-07 Thread xry111 at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110368

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #8 from Xi Ruoyao  ---
(In reply to Sam James from comment #7)
> That said, I suppose we should do better here with -Wstrict-aliasing. No
> level detects it.

I think it's very difficult to make -Wstrict-aliasing really useful.  A
sanitizer at runtime would be much more useful, but the development of such a
sanitizer seems stalled
(https://discourse.llvm.org/t/reviving-typesanitizer-a-sanitizer-to-catch-type-based-aliasing-violations/).

For now we can only compare the output with and without -fno-strict-aliasing.
And we are already saying "try -fno-strict-aliasing" in the bug report
guidance.
