[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #31 from Richard Biener  ---
(In reply to Patrick J. LoPresti from comment #29)
> (In reply to Jakub Jelinek from comment #27)
> > 
> > No, that is not a reasonable fix, because it severely pessimizes common code
> > for a theoretical only problem.
> 
> The very existence of (and interest in) this bug report means it is
> obviously not "a theoretical only problem".
> 
> And of course Rich Felker is correct that the cost of the obvious fix is
> trivial and not remotely "severe".

But I didn't see a patch proposed to address this issue, which means it
doesn't seem to be trivial.

> But the bottom line is that GCC is emitting library calls that invoke
> undefined behavior. At a minimum, GCC should document this non-standard
> requirement on its runtime environment. Has anyone bothered to do that? Why
> not?

I think it's written down somewhere but I can't quickly find it (I also
wonder where exactly the best place to document would be - it's related
to porting GCC to a new target architecture I guess, not so much user-facing).

OTOH I see

@cindex @code{cpymem@var{m}} instruction pattern 
@item @samp{cpymem@var{m}}
...
The @code{cpymem@var{m}} patterns need not give special consideration
to the possibility that the source and destination strings might
overlap. These patterns are used to do inline expansion of
@code{__builtin_memcpy}.

which is possibly the closest piece we have and which fails to mention
exact overlap.  I'll propose an adjustment to this.
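
For readers new to the thread, the case under discussion can be sketched in C
roughly as follows. This is only an illustration: struct big, assign and
assign_guarded are hypothetical names, and whether a given GCC version really
lowers the assignment to a memcpy call depends on target, object size and
options. The guarded form is one reading of the extra branch debated above.

```c
#include <string.h>

/* Hypothetical aggregate, large enough that a compiler might prefer a
   library call over inline loads and stores. */
struct big { char bytes[256]; };

/* Plain aggregate assignment.  C permits dst == src here (exact overlap),
   but a compiler may expand the copy as a call to memcpy, whose
   behaviour is undefined for overlapping objects. */
void assign(struct big *dst, const struct big *src)
{
    *dst = *src;
}

/* Guarded form (assumption: this mirrors the branch discussed in the
   thread): skip the copy when the pointers are equal, so memcpy never
   sees overlapping arguments. */
void assign_guarded(struct big *dst, const struct big *src)
{
    if (dst != src)
        memcpy(dst, src, sizeof *dst);
}
```

Calling assign_guarded (&x, &x) leaves x untouched without ever handing
memcpy two identical pointers.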

[Bug c/112676] New: [14 regression] ICE in extract_insn, at recog.cc:2804

2023-11-22 Thread manuel.lauss at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112676

Bug ID: 112676
   Summary: [14 regression] ICE in extract_insn, at recog.cc:2804
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: manuel.lauss at googlemail dot com
  Target Milestone: ---

Created attachment 56669
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56669&action=edit
compressed unreduced testcase

gcc version 14.0.0 20231123 (experimental)
9d912820d02c7396676e04c4c05f6a0fdd92ed85

This is very recent, on linux g9b6de136:

$ gcc -mno-avx -march=znver4 -O2 -c dcn32_fpu.i
/usr/src/linux.git/drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:
In function 'dcn32_internal_validate_bw':
/usr/src/linux.git/drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2223:1:
error: unrecognizable insn:
 2223 | }
  | ^
(insn 1628 1627 1629 277 (set (reg:V16QI 1102)
(xor:V16QI (reg:V16QI 1101)
(mem:V16QI (reg:DI 1100) [0 MEM  [(void *)stream_817 +
608B]+0 S16 A8])))
"/usr/src/linux.git/drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c":1350:7
-1
 (nil))
during RTL pass: vregs
/usr/src/linux.git/drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2223:1:
internal compiler error: in extract_insn, at recog.cc:2804


Omitting either "-march=znver4" or "-mno-avx" gets rid of it.

Thanks!
 Manuel

[Bug target/112675] New: [14 Regression] r14-5385-g0a140730c97087 caused regression on testcases

2023-11-22 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112675

Bug ID: 112675
   Summary: [14 Regression] r14-5385-g0a140730c97087 caused
regression on testcases
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: haochen.jiang at intel dot com
  Target Milestone: ---

As shown in gcc-regression:

https://gcc.gnu.org/pipermail/gcc-regression/2023-November/078504.html

The guilty commit for some regressions is r14-5385-g0a140730c97087.

An easy reproducer would be:

make check RUNTESTFLAGS="dg-torture.exp=gcc.dg/torture/fp-int-convert-timode.c
--target_board='unix{-m64\ -march=cascadelake,-m32\
-march=cascadelake,-m32,-m64}'"

[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b

2023-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661

--- Comment #12 from Richard Biener  ---
Created attachment 56668
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56668&action=edit
patch  (not working)

So this patch tries that, moving the duplicate-and-interleave check and
changing code generation.  It seems, though, that
gimple_build_vector_from_val only uses VEC_DUPLICATE_EXPR for non-constants,
while tree-vector-builder doesn't like to build the uniform constant, and we
ICE:

internal compiler error: in finalize, at vector-builder.h:513
0x1e36958 vector_builder::finalize()
  /space/rguenther/src/gcc/gcc/vector-builder.h:513
0x1e36598 tree_vector_builder::build()
  /space/rguenther/src/gcc/gcc/tree-vector-builder.cc:42
0x15dc80a gimple_build_vector(gimple_stmt_iterator*, bool, gsi_iterator_update,
unsigned int, tree_vector_builder*)
  /space/rguenther/src/gcc/gcc/gimple-fold.cc:9256
0x1ddb2e7 gimple_build_vector(gimple**, tree_vector_builder*)
  /space/rguenther/src/gcc/gcc/gimple-fold.h:241
0x1e0d6f5 vect_create_constant_vectors
  /space/rguenther/src/gcc/gcc/tree-vect-slp.cc:8261

that's the assert

508  void
509  vector_builder::finalize ()
510  {
511    /* The encoding requires the same number of elements to come from each
512       pattern.  */
513    gcc_assert (multiple_p (m_full_nelts, m_npatterns));

I can of course try to manually build a VEC_DUPLICATE here but I wonder
if we're on the right track here.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #30 from post+gcc at ralfj dot de ---
There have been several assertions above that a certain way to solve this
either has no performance cost at all or severe performance cost. That sounds
like we are missing data -- ideally, someone would benchmark the actual cost of
emitting that branch. It seems kind of pointless to just make assertions about
the impact of this change without real data.

> On the other hand, expecting the libc memcpy to make this check greatly 
> pessimizes every reasonable small use of memcpy with a gratuitous branch for 
> what is undefined behavior and should never appear in any valid program.

I don't think this is true. As far as I can see, the performance impact of
having memcpy support the src==dest case is zero -- the assembly generated by
the current implementations already supports that case. (At least I have not
seen any evidence to the contrary.) No new check in memcpy is required.

[Bug target/112643] [14 regression] including x86intrin.h is broken for -march=native (which adds -mno-avx10.1-256 )

2023-11-22 Thread urs at akk dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112643

--- Comment #25 from urs at akk dot org ---
(In reply to Haochen Jiang from comment #24)
> Patch aims to fix that:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637865.html

Yes, that solved the issue for me. Thanks.

[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b

2023-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #11 from Richard Biener  ---
OK, I'll give that a try then.

[Bug target/112643] [14 regression] including x86intrin.h is broken for -march=native (which adds -mno-avx10.1-256 )

2023-11-22 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112643

--- Comment #24 from Haochen Jiang  ---
Patch aims to fix that:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637865.html

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-22 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #8 from Li Pan  ---
For gcc.dg/torture/pr58955-2.c, the failure can be reproduced with the
following options:

Pass when: -O3
Pass when: -O3 -ftracer -fno-schedule-insns -fno-schedule-insns2
Fail when: -O3 -ftracer -fno-schedule-insns2

   10154:   4409   li   s0,2
   10156:   9c1d   subw s0,s0,a5
   10158:   1402   sll  s0,s0,0x20
   1015a:   9001   srl  s0,s0,0x20
   1015c:   97ca   add  a5,a5,s2
   1015e:   078a   sll  a5,a5,0x2
   10160:   7b018493   add  s1,gp,1968 # 13400 
   10164:   97a6   add  a5,a5,s1
   10166:   00241613   sll  a2,s0,0x2
   1016a:   853e   mv   a0,a5
   1016c:   4581   li   a1,0
   1016e:   158000ef   jal  102c6 
   10172:   ffc50793   add  a5,a0,-4
   10176:   4689   li   a3,2
   10178:   0d047057   vsetvli  zero,s0,e32,m1,ta,ma
   1017c:   40d8   lw   a4,4(s1)  <== Load
   1017e:   5e00b0d7   vmv.v.i  v1,1
   10182:   74d1a423   sw   a3,1864(gp) # 13398 
   10186:   0207e0a7   vse32.v  v1,(a5) <== Store
   1018a:   03271163   bne  a4,s2,101ac 

It looks like the combination of the tracer pass and the first scheduling
pass (sched1) causes the failure; AFAIK it is a typical load-before-store
issue. Semantically the lw load must come after the vse32 store, but sched1
moves it before the store, so a4 holds an unexpected value here.
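
The ordering constraint can be pictured with a small C analogue. This is a
hypothetical sketch, not the actual pr58955-2.c testcase; buf and
store_then_load are made-up names. The loop stands in for the vector store
and the return statement for the scalar load that must not be hoisted above
it.

```c
/* Hypothetical analogue of the store/load pair in the listing above. */
int buf[8];

int store_then_load(unsigned n)
{
    /* A vectorizer may turn this loop into a single vector store
       (compare vse32.v in the listing). */
    for (unsigned i = 0; i < n && i < 8; ++i)
        buf[i] = 1;

    /* Scalar load from memory the loop may have written (compare lw).
       It must observe the stored value, so it cannot be scheduled
       ahead of the store. */
    return buf[1];
}
```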

[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre

2023-11-22 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922

--- Comment #5 from Andrew Macleod  ---
(In reply to Jakub Jelinek from comment #4)

> 
> I think
>   Value_Range vr (operand_type);
>   if (TREE_CODE_CLASS (operation) == tcc_unary)
> ipa_vr_operation_and_type_effects (vr,
>src_lats->m_value_range.m_vr,
>operation, param_type,
>operand_type);
> should be avoided if param_type is not a compatible type to operand_type,
> unless operation is some cast operation (NOP_EXPR, CONVERT_EXPR, dunno if
> the float to integral or vice versa ops as well but vrp probably doesn't
> handle that yet).
> In the above case, param_type is struct A *, i.e. pointer, while
> operand_type is int.

The root of the issue is that the precisions are different, and we're
invoking an operation which expects the precisions to be the same (minus, in
this case). We can't deal with this in dispatch because some operations allow
the LH and RH to have different precisions or even types.

It also seems like overkill to have every operation check the incoming
precision, but perhaps not... we could limit it to the wi_fold() subsets. Let
me have a look. If we get incompatible types, perhaps returning VARYING should
be OK?
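
As a rough illustration of the "return VARYING on mismatch" idea, here is a
toy model in C. It is not GCC's range machinery: toy_range, toy_varying and
toy_fold_minus are hypothetical names, and overflow handling is deliberately
left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a value range; not GCC's Value_Range. */
struct toy_range {
    unsigned precision;   /* bit width of the underlying type */
    bool varying;         /* true: nothing is known about the value */
    long lo, hi;          /* inclusive bounds, meaningful when !varying */
};

static struct toy_range toy_varying(unsigned precision)
{
    struct toy_range r = { precision, true, 0, 0 };
    return r;
}

/* Fold "a - b".  On a precision mismatch, give up and return VARYING
   rather than computing with incompatible operands.  Overflow of the
   bounds is ignored in this sketch. */
struct toy_range toy_fold_minus(struct toy_range a, struct toy_range b)
{
    if (a.varying || b.varying || a.precision != b.precision)
        return toy_varying(a.precision);

    struct toy_range r = { a.precision, false, a.lo - b.hi, a.hi - b.lo };
    return r;
}
```

VARYING is always a safe answer because it claims nothing about the value;
the cost is only a missed optimization for code that mixes types this way.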

[Bug c++/112642] ranges::fold_left tries to access inactive union member of string in constant expression

2023-11-22 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112642

--- Comment #10 from Jonathan Wakely  ---
(In reply to Miro Palmu from comment #9)
> Mine is 13.2.1 20230801 so way before Oct 21. (I did not know there were
> different snapshots of the releases, I'm just a user trying to help :) )

13.2.1 (and any x.y.1 version) is not a release, it's a snapshot made from a
branch between releases. See https://gcc.gnu.org/develop.html#num_scheme for
more details.

Releases end with a .0 number.

> > Anyway, the original GCC error is the same as PR 112642
> 
> You probably mean PR 110158

Oops! I meant PR 111258

[Bug c++/110734] Attributes cannot be applied to asm statements

2023-11-22 Thread tanksherman27 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110734

--- Comment #5 from Julian Waters  ---
Note: Trying this with a top level asm gives me:

$ g++ -O3 -flto=auto -std=c++14 -pedantic -Wpedantic -fno-omit-frame-pointer
exceptions.cpp
exceptions.cpp:8:1: error: expected unqualified-id before 'asm'
8 | asm ("nop");
  | ^~~

So while the errors look different, it is fundamentally the same issue.

[Bug target/112672] [14 Regression] wrong code with __builtin_parityl() at -O and above on x86_64

2023-11-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112672

--- Comment #3 from Andrew Pinski  ---
parityhi2 should have:
rtx extra = gen_reg_rtx (HImode);
emit_move_insn (extra, operands[1]);
emit_insn (gen_parityhi2_cmp (extra));

Or something similar because parityqi2_cmp clobbers its argument.

[Bug tree-optimization/112464] [14 Regression] ICE avx512 with -ftrapv since r14-5076

2023-11-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112464

Robin Dapp  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #8 from Robin Dapp  ---
Fixed.

[Bug target/112672] [14 Regression] wrong code with __builtin_parityl() at -O and above on x86_64

2023-11-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112672

Andrew Pinski  changed:

   What|Removed |Added

 CC||uros at gcc dot gnu.org

--- Comment #2 from Andrew Pinski  ---
Actually it has been wrong since r11-1027-gf08995eefbf579 . Just exposed by
Jakub's parity improvement: r14-5557-g6dd4c703be17fa .

[Bug middle-end/112336] fsanitize=address vs _BitInt with a non-mode size (smaller than max mode size)

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112336

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #4 from Jakub Jelinek  ---
Created attachment 56667
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56667&action=edit
gcc14-pr112336.patch

Untested fix.

[Bug target/112445] [14 Regression] ICE: in lra_split_hard_reg_for, at lra-assigns.cc:1861 unable to find a register to spill: {*umulditi3_1} with -O -march=cascadelake -fwrapv since r14-4968-g89e5d90

2023-11-22 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112445

--- Comment #6 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #4)
> I think this goes wrong during combine.
Combine does not (and should not) combine moves from hard registers,
precisely because doing so extends the register's live range. It looks like
this should also include zero-extracts and other "pseudo-move" instructions.

The relevant patch and discussion is at [1].

[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2018-10/msg01356.html

[Bug target/112672] New: [14 Regression] wrong code with __builtin_parityl() at -O and above on x86_64

2023-11-22 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112672

Bug ID: 112672
   Summary: [14 Regression] wrong code with __builtin_parityl() at
-O and above on x86_64
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu

Created attachment 5
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=5=edit
reduced testcase

Output:
$ x86_64-pc-linux-gnu-gcc -O testcase.c
$ ./a.out 
Aborted

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r14-5761-20231122145100-ge9b39df9333-checking-yes-rtl-df-extra-nobootstrap-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--disable-bootstrap --with-cloog --with-ppl --with-isl
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld
--with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r14-5761-20231122145100-ge9b39df9333-checking-yes-rtl-df-extra-nobootstrap-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.0 20231122 (experimental) (GCC) 

In the asm output, the problem is obvious:

main:
# testcase.c:8:   u *= g;
        movzx   eax, WORD PTR g[rip]    # tmp110, g
        sal     eax, 2                  # u,
# testcase.c:9:   return u + __builtin_parityl (u);
        xor     al, ah                  # u <== THIS OVERWRITES "u" in eax
        setnp   dl                      #, tmp105
        movzx   edx, dl                 # tmp105, tmp105
# testcase.c:9:   return u + __builtin_parityl (u);
        add     eax, edx                # tmp107, tmp105 <== THIS READS "u", but it has been lost
# testcase.c:16:   if (x != 4 * 254 + 1)
        cmp     ax, 1017                # tmp107,
        jne     .L6                     #,
# testcase.c:19: }
        mov     eax, 0                  #,
        ret
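
In C terms, the intended semantics can be sketched as below. This is a
portable illustration, not the attached testcase: parity16 and add_parity are
hypothetical names, and the value 1016 is inferred from the "4 * 254 + 1"
check visible in the listing. The point is that the parity folds must work on
a copy, so the original value is still available for the addition.

```c
/* Parity (1 if an odd number of bits is set) of a 16-bit value. */
static unsigned parity16(unsigned short u)
{
    unsigned tmp = u;      /* copy: the folds below destroy tmp, not u */
    tmp ^= tmp >> 8;       /* fold high byte into low byte (cf. xor al, ah) */
    tmp ^= tmp >> 4;
    tmp ^= tmp >> 2;
    tmp ^= tmp >> 1;
    return tmp & 1;
}

/* u + parity(u), computed with u intact. */
unsigned short add_parity(unsigned short u)
{
    return (unsigned short)(u + parity16(u));
}
```

With u = 4 * 254 = 1016 (seven bits set), the parity is 1 and the sum is
1017, which is the value the comparison in the listing expects.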

[Bug target/112672] [14 Regression] wrong code with __builtin_parityl() at -O and above on x86_64

2023-11-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112672

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-11-23
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
Obviously this is wrong:
;; _5 = .PARITY (u_4);

(insn 7 6 8 (parallel [
(set (reg:CC 17 flags)
(unspec:CC [
(reg/v:HI 99 [ uD.2808 ])
] UNSPEC_PARITY))
(clobber (reg/v:HI 99 [ uD.2808 ]))
]) "/app/example.cpp":9:32 -1
 (nil))
...
;; if (_7 != 1017)

(insn 11 10 12 (parallel [
(set (reg:HI 107)
(plus:HI (reg/v:HI 99 [ uD.2808 ])
(subreg:HI (reg:SI 100 [ _5 ]) 0)))
(clobber (reg:CC 17 flags))
]) "/app/example.cpp":9:34 discrim 1 -1
 (nil))

[Bug target/112672] [14 Regression] wrong code with __builtin_parityl() at -O and above on x86_64

2023-11-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112672

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
 CC||pinskia at gcc dot gnu.org

[Bug target/112592] FAIL: c-c++-common/pr111309-1.c -std=gnu++14 (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:216)

2023-11-22 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112592

John David Anglin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from John David Anglin  ---
Fixed.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread lopresti at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #29 from Patrick J. LoPresti  ---
(In reply to Jakub Jelinek from comment #27)
> 
> No, that is not a reasonable fix, because it severely pessimizes common code
> for a theoretical only problem.

The very existence of (and interest in) this bug report means it is obviously
not "a theoretical only problem".

And of course Rich Felker is correct that the cost of the obvious fix is
trivial and not remotely "severe".

But the bottom line is that GCC is emitting library calls that invoke undefined
behavior. At a minimum, GCC should document this non-standard requirement on
its runtime environment. Has anyone bothered to do that? Why not?

[Bug debug/112674] New: [14 Regression] Compare-debug failure after recent change on c6x

2023-11-22 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112674

Bug ID: 112674
   Summary: [14 Regression] Compare-debug failure after recent
change on c6x
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

This patch:

commit 6bf66276e3e41d5d92f7b7260e98b6a111653805
Author: Richard Biener 
Date:   Wed Nov 22 11:10:41 2023 +0100

tree-optimization/112344 - wrong final value replacement

When performing final value replacement chrec_apply that's used to
compute the overall effect of niters to a CHREC doesn't consider that
the overall increment of { -2147483648, +, 2 } doesn't fit in
a signed integer when the loop iterates until the value of the IV
of 20.  The following fixes this mistake, carrying out the multiply
and add in an unsigned type instead, avoiding undefined overflow
and thus later miscompilation by path range analysis.

PR tree-optimization/112344
* tree-chrec.cc (chrec_apply): Perform the overall increment
calculation and increment in an unsigned type.

* gcc.dg/torture/pr112344.c: New testcase.

Is causing a compare-debug failure on the c6x port:

c6x-sim: gcc.dg/pr65779.c (test for excess errors)

I haven't dug into this any deeper.  It could well be a c6x bug in the end.
While it may sound similar to pr109777, pr109777 has been debugged far enough
to lay the blame on the bfin backend.
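
For context, the technique named in the quoted commit message (doing the
multiply and add in an unsigned type) can be sketched in C as follows. This is
an illustration only, not the tree-chrec.cc change: final_value is a
hypothetical helper, and the iteration count used in the note below is derived
from the { -2147483648, +, 2 } example with a final IV value of 20.

```c
#include <stdint.h>

/* Value of an induction variable { base, +, step } after niters
   iterations.  The arithmetic is done in uint32_t, where wraparound is
   well defined, instead of int32_t, where overflow is undefined. */
int32_t final_value(int32_t base, int32_t step, uint32_t niters)
{
    uint32_t result = (uint32_t)base + (uint32_t)step * niters;

    /* Converting back is implementation-defined when the value does not
       fit; GCC wraps modulo 2^32, which is what this sketch assumes. */
    return (int32_t)result;
}
```

With base = INT32_MIN, step = 2 and niters = 1073741834, the overall
increment is 2147483668, which does not fit in a signed 32-bit integer, yet
the unsigned computation still yields the expected final value of 20.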

[Bug target/112445] [14 Regression] ICE: in lra_split_hard_reg_for, at lra-assigns.cc:1861 unable to find a register to spill: {*umulditi3_1} with -O -march=cascadelake -fwrapv since r14-4968-g89e5d90

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112445

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||segher at gcc dot gnu.org,
   ||uros at gcc dot gnu.org,
   ||vmakarov at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
I think this goes wrong during combine.
Before combine, we have:
(insn 8 7 10 2 (set (subreg:HI (reg:QI 152) 0)
(zero_extract:HI (reg:HI 1 dx [ cu8_0 ])
(const_int 8 [0x8])
(const_int 8 [0x8]))) "pr112445.c":11:1 114 {*extzvhi}
 (expr_list:REG_DEAD (reg:HI 1 dx [ cu8_0 ])
(nil)))
...
tons of insns including
(insn 36 34 37 2 (parallel [
(set (reg:TI 142 [ _66 ])
(mult:TI (zero_extend:TI (reg:DI 171 [ cu8_0 ]))
(zero_extend:TI (subreg:DI (reg:TI 104 [ _10 ]) 0
(clobber (reg:CC 17 flags))
]) "pr112445.c":12:9 522 {*umulditi3_1}
 (expr_list:REG_DEAD (reg:DI 171 [ cu8_0 ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil
...
(insn 41 38 44 2 (set (reg:DI 177 [ cu8_0+1 ])
(zero_extend:DI (reg:QI 152))) "pr112445.c":12:9 170 {zero_extendqidi2}
 (expr_list:REG_DEAD (reg:QI 152)
(nil)))
and combine merges insn 41 with insn 8 across 20 other insns:
Trying 8 -> 41:
8: r152:QI#0=zero_extract(dx:HI,0x8,0x8)
  REG_DEAD dx:HI
   41: r177:DI=zero_extend(r152:QI)
  REG_DEAD r152:QI
Successfully matched this instruction:
(set (reg:DI 177 [ cu8_0+1 ])
(zero_extract:DI (reg:DI 1 dx [ cu8_0 ])
(const_int 8 [0x8])
(const_int 8 [0x8])))
into:
(insn 41 38 44 2 (set (reg:DI 177 [ cu8_0+1 ])
(zero_extract:DI (reg:DI 1 dx [ cu8_0 ])
(const_int 8 [0x8])
(const_int 8 [0x8]))) "pr112445.c":12:9 116 {*extzvdi}
 (expr_list:REG_DEAD (reg:HI 1 dx [ cu8_0 ])
(nil)))
and by that it significantly extends the live range of rdx register, which is a
single class register.  Now insn 36 has constraints =r,A on output and %d,a on
first input and rm,rm on second input, meaning that it either has %rdx:%rax
destination (second alternative), or %rdx as one of the inputs, so when %rdx is
live across it, it can't be reloaded.
On that insn, the commit changed
-   (match_operand:DWIH 1 "nonimmediate_operand" "%d,0"))
+   (match_operand:DWIH 1 "register_operand" "%d,a"))
on the constraints, is that something that LRA used to handle fine (how?)?
Actually, in the r14-4967 reload dump I see:
(insn 223 193 202 2 (set (mem/c:DI (plus:DI (reg/f:DI 7 sp)
(const_int 40 [0x28])) [3 %sfp+-40 S8 A64])
(reg:DI 1 dx)) "pr112445.c":12:9 90 {*movdi_internal}
 (nil))
(insn 202 223 36 2 (set (reg:DI 0 ax [orig:142 _66 ] [142])
(mem/c:DI (reg/f:DI 7 sp) [3 %sfp+-80 S8 A128])) "pr112445.c":12:9 90
{*movdi_internal}
 (nil))
(insn 36 202 203 2 (parallel [
(set (reg:TI 0 ax [orig:142 _66 ] [142])
(mult:TI (zero_extend:TI (reg:DI 0 ax [orig:142 _66 ] [142]))
(zero_extend:TI (reg:DI 37 r9 [orig:104 _10 ] [104]
(clobber (reg:CC 17 flags))
]) "pr112445.c":12:9 522 {*umulditi3_1}
 (nil))
(insn 203 36 224 2 (set (mem/c:TI (reg/f:DI 7 sp) [3 %sfp+-80 S16 A128])
(reg:TI 0 ax [orig:142 _66 ] [142])) "pr112445.c":12:9 89
{*movti_internal}
 (nil))
(insn 224 203 158 2 (set (reg:DI 1 dx)
(mem/c:DI (plus:DI (reg/f:DI 7 sp)
(const_int 40 [0x28])) [3 %sfp+-40 S8 A64])) "pr112445.c":12:9
90 {*movdi_internal}
 (nil))
so presumably LRA managed in that case to save and restore %rdx around it.
Is the problem the 0->a change when operand 0 is A?
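
For orientation, the operation that the {*umulditi3_1} pattern in the dumps
describes (a 64x64->128 bit unsigned widening multiply, per the mult:TI of two
zero_extend:TI operands) looks like this at the source level. This is a
sketch, not pr112445.c: widening_mul and mul_high are hypothetical names, and
unsigned __int128 is a GCC/Clang extension available on 64-bit targets.

```c
#include <stdint.h>

typedef unsigned __int128 u128;   /* compiler extension, not ISO C */

/* Full 128-bit product of two 64-bit values.  On x86_64 this typically
   maps to a single mul, which writes its result to rdx:rax; that is why
   the register constraints discussed above matter. */
u128 widening_mul(uint64_t a, uint64_t b)
{
    return (u128)a * b;
}

/* Upper 64 bits of the product (the half that lands in rdx). */
uint64_t mul_high(uint64_t a, uint64_t b)
{
    return (uint64_t)(widening_mul(a, b) >> 64);
}
```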

[Bug tree-optimization/112464] [14 Regression] ICE avx512 with -ftrapv since r14-5076

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112464

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
So, can this be closed as fixed?

[Bug c++/112633] [13/14 Regression] ICE with type aliases and depedent value

2023-11-22 Thread hanicka at hanicka dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112633

--- Comment #5 from Hana Dusíková  ---
Thanks for the really quick fix! You are awesome!

[Bug target/112670] RISC-V: Run fail on pr65518.c with -flto

2023-11-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112670

--- Comment #1 from Robin Dapp  ---
The problem is exposed with the ipa copy propagation pass.  I haven't narrowed
it down yet but will continue tomorrow.

[Bug driver/108865] gcc on Windows fails with Unicode path to source file

2023-11-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #46 from CVS Commits  ---
The master branch has been updated by Jonathan Yong :

https://gcc.gnu.org/g:4f1ebd54380e16927cd0085be939165870354eac

commit r14-5768-g4f1ebd54380e16927cd0085be939165870354eac
Author: Costas Argyris 
Date:   Mon Nov 20 17:58:16 2023 +

mingw: Exclude utf8 manifest [PR70, PR108865]

Make the utf8 manifest optional (on by default and
explicitly off with --disable-win32-utf8-manifest)
in the mingw hosts.

Also eliminate duplication between the 32-bit and
64-bit mingw hosts by putting them both in the
same branch and special-case only the 64-bit long
long setting.

PR mingw/70
PR mingw/108865

Signed-off-by: Costas Argyris 
Signed-off-by: Jonathan Yong <10wa...@gmail.com>

gcc/Changelog:

* configure.ac: Handle new --enable-win32-utf8-manifest
option.
* config.host: allow win32 utf8 manifest to be disabled
by user.
* configure: Regenerate.

[Bug modula2/112506] gm2 test failures on x86_64-apple-darwin21

2023-11-22 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112506

Gaius Mulley  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #4 from Gaius Mulley  ---
Thanks for the report - I suspect it is a duplicate of PR 111627.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #28 from Rich Felker  ---
> No, that is not a reasonable fix, because it severely pessimizes common code 
> for a theoretical only problem.

Far less than a call to memmove (which necessarily has something comparable to
that and other unnecessary branches) pessimizes it.

I also disagree that it's severe. On basically any machine with branch
prediction, the branch will be predicted correctly all the time and has
basically zero cost. On the other hand, the branches in memmove could go
different ways depending on the caller, so it's much more
machine-capability-dependent whether they can be predicted.

In some sense the optimal thing to do is "nothing", just assuming it would be
hard to write a memcpy that fails on src==dest. However, at the very least this
precludes hardened memcpy trapping on src==dest, which might be a useful
hardening feature (or rather on a range test for overlapping, which would
happen to also catch exact overlap). So it would be nice if it were fixed.

FWIW, I don't think single branches are relevant to overall performance in
cases where the compiler is doing something reasonable by emitting a call to
memcpy to implement assignment. If the object is small enough that the branch
is relevant, the call overhead is even more of a big deal, and it should be
inlining loads/stores to perform the assignment.
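
The range test mentioned above could look roughly like this. It is a sketch
under stated assumptions, not any libc's implementation: ranges_overlap and
checked_memcpy are hypothetical names, the comparison through uintptr_t
assumes a flat address space, and a real hardened memcpy would trap rather
than return NULL (NULL is used here only to keep the sketch testable).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if [a, a+n) and [b, b+n) share at least one byte.
   Exact overlap (a == b, n > 0) is caught as a special case of this. */
static int ranges_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t x = (uintptr_t)a, y = (uintptr_t)b;

    if (n == 0)
        return 0;
    return x < y ? (y - x < n) : (x - y < n);
}

/* memcpy that refuses overlapping arguments. */
void *checked_memcpy(void *dst, const void *src, size_t n)
{
    if (ranges_overlap(dst, src, n))
        return NULL;          /* a hardened libc would trap here */
    return memcpy(dst, src, n);
}
```

A check like this would reject the src == dest calls that the compiler can
currently emit for aggregate assignment, which is the incompatibility the
comment above points out.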

[Bug target/111170] [13/14 regression] Malformed manifest does not allow to run gcc on Windows XP (Accessing a corrupted shared library) since r13-6552-gd11e088210a551

2023-11-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Jonathan Yong :

https://gcc.gnu.org/g:4f1ebd54380e16927cd0085be939165870354eac

commit r14-5768-g4f1ebd54380e16927cd0085be939165870354eac
Author: Costas Argyris 
Date:   Mon Nov 20 17:58:16 2023 +

mingw: Exclude utf8 manifest [PR70, PR108865]

Make the utf8 manifest optional (on by default and
explicitly off with --disable-win32-utf8-manifest)
in the mingw hosts.

Also eliminate duplication between the 32-bit and
64-bit mingw hosts by putting them both in the
same branch and special-case only the 64-bit long
long setting.

PR mingw/70
PR mingw/108865

Signed-off-by: Costas Argyris 
Signed-off-by: Jonathan Yong <10wa...@gmail.com>

gcc/Changelog:

* configure.ac: Handle new --enable-win32-utf8-manifest
option.
* config.host: allow win32 utf8 manifest to be disabled
by user.
* configure: Regenerate.

[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre

2023-11-22 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922

--- Comment #9 from Andrew Macleod  ---
(In reply to Jakub Jelinek from comment #8)

> Well, in this case the user explicitly told compiler not to do that by not
> using a prototype and syntax which doesn't provide one from the definition.
> It is like using
> int f1 (struct C *x, struct A *y)
> {
>   ...
> }
> definition in one TU, and
> int f1 (int, int);
> prototype and
> f1 (0, ~x)
> call in another one + using LTO.  What I meant is how to do decide if the
> param_type vs. operand_type mismatch is ok or not.

I vote we do nothing extra for those clowns! Just return VARYING for a range
:-)

It seems like the safest thing to do?

[Bug target/112669] GCN: wrong 'LIBRARY_PATH' in presence of several different '-march=[...]' flags

2023-11-22 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112669

Thomas Schwinge  changed:

   What|Removed |Added

   Last reconfirmed||2023-11-22
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |tschwinge at gcc dot gnu.org

[Bug target/112617] [14 regression] ICE when building systemd on HPPA (internal compiler error: in find_reloads, at reload.cc:3839)

2023-11-22 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112617

John David Anglin  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #5 from John David Anglin  ---
Should be fixed now.

[Bug target/112445] [14 Regression] ICE: in lra_split_hard_reg_for, at lra-assigns.cc:1861 unable to find a register to spill: {*umulditi3_1} with -O -march=cascadelake -fwrapv since r14-4968-g89e5d90

2023-11-22 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112445

--- Comment #7 from Vladimir Makarov  ---
(In reply to Jakub Jelinek from comment #5)
> Just changing
> --- i386.md.xx2023-11-22 09:47:22.746637132 +0100
> +++ i386.md   2023-11-22 20:38:07.216218697 +0100
> @@ -9984,7 +9984,7 @@
>[(set (match_operand: 0 "register_operand" "=r,A")
>   (mult:
> (zero_extend:
> - (match_operand:DWIH 1 "register_operand" "%d,a"))
> + (match_operand:DWIH 1 "register_operand" "%d,0"))
> (zero_extend:
>   (match_operand:DWIH 2 "nonimmediate_operand" "rm,rm"
> (clobber (reg:CC FLAGS_REG))]
> makes the testcase pass.  A question is how RA treats 0 constraint when the
> two operands have different modes, if it is basically the same as a in that

LRA treats it the same way as the reload pass.  It is the same hard reg for an LE
target.  For BE they are different if they require a different number of hard regs.


> case, meaning that the first input operand will never be in %rdx even when
> the A constraint contains %rax and %rdx registers (but the double-word mode
> implies it must be low part in %rax high part in %rdx).

I looked at the testcase.  It seems it can be fixed by different placement of
splitting insns.  So I believe the bug will stay and can be latent if we fix
the PR by some other way.

I'll start to work on this bug on Monday as I will be absent the next two days.

[Bug middle-end/112510] [11/12/13/14 Regression]: ASAN code injection breaks alignment of stack variables

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112510

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #16 from Jakub Jelinek  ---
Can't reproduce, neither with GCC 12 nor current trunk.
In the ASAN_OPTIONS=detect_stack_use_after_return=1 case, the stack frames are
allocated by __asan_stack_malloc_4, but that seems to return enough aligned
frames for me (even though the routine doesn't have an argument to request a
particular alignment).
Even tried
struct __attribute__((aligned (64))) S { char buf[64]; };

__attribute__((noinline, noclone, noipa)) void
bar (struct S *p, char *a)
{
  if ((__UINTPTR_TYPE__)p % 64)
    __builtin_abort ();
}

__attribute__((noinline, noclone, noipa)) void
foo (void)
{
  struct S s;
  char a;
  bar (&s, &a);
}

int
main ()
{
  for (int i = 0; i < 32; ++i)
    foo ();
}
and the frames were sufficiently aligned in all 32 cases.

[Bug debug/112674] [14 Regression] Compare-debug failure after recent change on c6x

2023-11-22 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112674

--- Comment #1 from Jeffrey A. Law  ---
And possibly more interesting than the compare-debug failure is this patch
seems to be causing Wstringop-overflow-17 to fail on multiple targets,
including c6x.

[Bug c++/112633] [13/14 Regression] ICE with type aliases and depedent value

2023-11-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112633

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Patrick Palka :

https://gcc.gnu.org/g:3f266c84a15d63e42bfad46397fea9aff92b0720

commit r14-5763-g3f266c84a15d63e42bfad46397fea9aff92b0720
Author: Patrick Palka 
Date:   Wed Nov 22 13:54:29 2023 -0500

c++: alias template of non-template class [PR112633]

The entering_scope adjustment in tsubst_aggr_type assumes if an alias is
dependent, then so is the aliased type (and therefore it has template info)
but that's not true for the dependent alias template specialization ty1
below which aliases the non-template class A.  In this case no adjustment
is needed anyway, so we can just punt.

PR c++/112633

gcc/cp/ChangeLog:

* pt.cc (tsubst_aggr_type): Handle empty TYPE_TEMPLATE_INFO
in the entering_scope adjustment.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alias-decl-75.C: New test.

[Bug c++/112633] [13/14 Regression] ICE with type aliases and depedent value

2023-11-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112633

--- Comment #6 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Patrick Palka:

https://gcc.gnu.org/g:63c65224e778124eee52acc7b9fcb32cd8ad61e8

commit r13-8090-g63c65224e778124eee52acc7b9fcb32cd8ad61e8
Author: Patrick Palka 
Date:   Wed Nov 22 19:07:19 2023 -0500

c++: alias template of non-template class [PR112633]

The entering_scope adjustment in tsubst_aggr_type assumes if an alias is
dependent, then so is the aliased type (and therefore it has template info)
but that's not true for the dependent alias template specialization ty1
below which aliases the non-template class A.  In this case no adjustment
is needed anyway, so we can just punt.

PR c++/112633

gcc/cp/ChangeLog:

* pt.cc (tsubst_aggr_type): Handle empty TYPE_TEMPLATE_INFO
in the entering_scope adjustment.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alias-decl-75.C: New test.

(cherry picked from commit 3f266c84a15d63e42bfad46397fea9aff92b0720)

[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922

--- Comment #6 from Jakub Jelinek  ---
I don't know the IPA code enough to know whether different operand_type vs.
param_type (in the !types_compatible_p sense) means just user bug (in that case
returning VARYING is perfectly fine), or if it can happen also on valid code,
where say caller has one type of argument and callee a different and there is
an implicit (or explicit) cast in between the two.  The latter case would be
nice to get handled without giving up.
I mean something like
void
foo (int x)
{
  asm volatile ("" : "+r" (x));
}

void
bar (long x)
{
  foo (x);
}

void
baz (long x)
{
  if (x < -42 || x >= 185)
    return;
  bar (x);
}
kind of thing (but making sure we don't inline and IPA-VRP tries to propagate
something etc.).

[Bug sanitizer/112336] fsanitize=address vs _BitInt with a non-mode size (smaller than max mode size)

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112336

--- Comment #3 from Jakub Jelinek  ---
Seems one doesn't need the sanitizer for that,
unsigned _BitInt(1) v1;
unsigned _BitInt(1) *p1 = &v1;
ICEs as well.

[Bug target/112592] FAIL: c-c++-common/pr111309-1.c -std=gnu++14 (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:216)

2023-11-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112592

--- Comment #2 from CVS Commits  ---
The master branch has been updated by John David Anglin :

https://gcc.gnu.org/g:6f59f959e751d73b371d52f9c657f78d7a77983c

commit r14-5765-g6f59f959e751d73b371d52f9c657f78d7a77983c
Author: John David Anglin 
Date:   Wed Nov 22 20:06:22 2023 +

hppa: Define MAX_FIXED_MODE_SIZE

Replace default define.  We support TImode when TARGET_64BIT is true.

2023-11-22  John David Anglin  

gcc/ChangeLog:

PR target/112592
* config/pa/pa.h (MAX_FIXED_MODE_SIZE): Define.

[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922

--- Comment #8 from Jakub Jelinek  ---
(In reply to Andrew Macleod from comment #7)
> Alternatively, if IPA could figure out when things need promoting..  GCC
> must already do it, although I suppose thats in the front ends :-P

Well, in this case the user explicitly told compiler not to do that by not
using a prototype and syntax which doesn't provide one from the definition.
It is like using
int f1 (struct C *x, struct A *y)
{
  ...
}
definition in one TU, and
int f1 (int, int);
prototype and
f1 (0, ~x)
call in another one + using LTO.  What I meant is how to decide if the
param_type vs. operand_type mismatch is ok or not.

[Bug tree-optimization/112673] New: [14 Regression] ICE verify_gimple failed since r14-5557-g6dd4c703be17fa

2023-11-22 Thread mjires at suse dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112673

Bug ID: 112673
   Summary: [14 Regression] ICE verify_gimple failed since
r14-5557-g6dd4c703be17fa
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mjires at suse dot cz
CC: jakub at redhat dot com
  Target Milestone: ---

Compiling pr112566-2.c from testsuite.
Bisection points to r14-5557-g6dd4c703be17fa, which also introduced this test.

$ gcc pr112566-2.c -Ofast -mf16c
pr112566-2.c: In function ‘corge’:
pr112566-2.c:10:5: error: ‘bit_field_ref’ of non-mode-precision operand
   10 | int corge (_BitInt(256) x) { return __builtin_ctzg ((unsigned _BitInt(512)) x); }
  | ^
_18 = BIT_FIELD_REF ;
pr112566-2.c:10:5: error: ‘bit_field_ref’ of non-mode-precision operand
_1 = BIT_FIELD_REF ;
pr112566-2.c:10:5: error: ‘bit_field_ref’ of non-mode-precision operand
_38 = BIT_FIELD_REF ;
during GIMPLE pass: forwprop
pr112566-2.c:10:5: internal compiler error: verify_gimple failed
0x105854d verify_gimple_in_cfg(function*, bool, bool)
/home/mjires/git/GCC/master/gcc/tree-cfg.cc:5662
0xee86b4 execute_function_todo
/home/mjires/git/GCC/master/gcc/passes.cc:2088
0xee8c0e execute_todo
/home/mjires/git/GCC/master/gcc/passes.cc:2142
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

$ gcc -v
Using built-in specs.
COLLECT_GCC=/home/mjires/built/master/bin/gcc
COLLECT_LTO_WRAPPER=/home/mjires/built/master/libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /home/mjires/git/GCC/master/configure
--prefix=/home/mjires/built/master --disable-bootstrap --enable-checking
--enable-languages=c,c++,fortran,lto
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.0 20231122 (experimental) (GCC)

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-11-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #30 from CVS Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:990769a343f090088f5025ad233f88824b2c6263

commit r14-5769-g990769a343f090088f5025ad233f88824b2c6263
Author: Pan Li 
Date:   Mon Nov 13 11:22:37 2023 +0800

DSE: Allow vector type for get_stored_val when read < store

Update in v4:
* Merge upstream and removed some independent changes.

Update in v3:
* Take known_le instead of known_lt for vector size.
* Return NULL_RTX when gap is not equal 0 and not constant.

Update in v2:
* Move vector type support to get_stored_val.

Original log:

This patch would like to allow the vector mode in the
get_stored_val in the DSE. It is valid for the read
rtx if and only if the read bitsize is less than the
stored bitsize.

Given below example code with
--param=riscv-autovec-preference=fixed-vlmax.

vuint8m1_t test () {
  uint8_t arr[32] = {
1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
  };

  return __riscv_vle8_v_u8m1(arr, 32);
}

Before this patch:
test:
  lui     a5,%hi(.LANCHOR0)
  addi    sp,sp,-32
  addi    a5,a5,%lo(.LANCHOR0)
  li      a3,32
  vl2re64.v   v2,0(a5)
  vsetvli zero,a3,e8,m1,ta,ma
  vs2r.v  v2,0(sp) <== Unnecessary store to stack
  vle8.v  v1,0(sp) <== Ditto
  vs1r.v  v1,0(a0)
  addi    sp,sp,32
  jr      ra

After this patch:
test:
  lui     a5,%hi(.LANCHOR0)
  addi    a5,a5,%lo(.LANCHOR0)
  li      a4,32
  addi    sp,sp,-32
  vsetvli zero,a4,e8,m1,ta,ma
  vle8.v  v1,0(a5)
  vs1r.v  v1,0(a0)
  addi    sp,sp,32
  jr      ra

Below tests are passed within this patch:
* The risc-v regression test.
* The x86 bootstrap and regression test.
* The aarch64 regression test.

PR target/111720

gcc/ChangeLog:

* dse.cc (get_stored_val): Allow vector mode if read size is
less than or equal to stored size.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr111720-0.c: New test.
* gcc.target/riscv/rvv/base/pr111720-1.c: New test.
* gcc.target/riscv/rvv/base/pr111720-10.c: New test.
* gcc.target/riscv/rvv/base/pr111720-2.c: New test.
* gcc.target/riscv/rvv/base/pr111720-3.c: New test.
* gcc.target/riscv/rvv/base/pr111720-4.c: New test.
* gcc.target/riscv/rvv/base/pr111720-5.c: New test.
* gcc.target/riscv/rvv/base/pr111720-6.c: New test.
* gcc.target/riscv/rvv/base/pr111720-7.c: New test.
* gcc.target/riscv/rvv/base/pr111720-8.c: New test.
* gcc.target/riscv/rvv/base/pr111720-9.c: New test.

Signed-off-by: Pan Li 

[Bug testsuite/106120] [13 regression] g++.dg/warn/Wstringop-overflow-4.C fails since r13-1268-g8c99e307b20c50

2023-11-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106120

--- Comment #13 from CVS Commits  ---
The master branch has been updated by Hans-Peter Nilsson :

https://gcc.gnu.org/g:e935151bad1c2a02dc6a31fce3cc21b17d616243

commit r14-5767-ge935151bad1c2a02dc6a31fce3cc21b17d616243
Author: Hans-Peter Nilsson 
Date:   Wed Nov 22 02:54:29 2023 +0100

testsuite: Tweak xfail bogus g++.dg/warn/Wstringop-overflow-4.C:144,
PR106120

The conditions under which this bogus warning is
emitted have changed; it no longer happens for 32-bit
targets.  Adjust accordingly.

PR testsuite/106120
* g++.dg/warn/Wstringop-overflow-4.C:144 XFAIL bogus warning for
lp64 targets with c++98.

[Bug target/112617] [14 regression] ICE when building systemd on HPPA (internal compiler error: in find_reloads, at reload.cc:3839)

2023-11-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112617

--- Comment #4 from CVS Commits  ---
The master branch has been updated by John David Anglin :

https://gcc.gnu.org/g:a89224f819381b77657145fdd8b1d997b989fdc0

commit r14-5764-ga89224f819381b77657145fdd8b1d997b989fdc0
Author: John David Anglin 
Date:   Wed Nov 22 19:47:34 2023 +

hppa: Fix integer REG+D address reloads

I made a mistake in the previous change to integer_store_memory_operand.
There is no support in the pa_emit_move sequence to handle secondary reloads of
integer REG+D instructions.  Further, the Q constraint is used for some
non-simple instructions (movb and addib).  Thus, we need to return true
when reload is in progress.

2023-11-22  John David Anglin  

gcc/ChangeLog:

PR target/112617
* config/pa/predicates.md (integer_store_memory_operand): Return
true for REG+D addresses when reload_in_progress is true.

[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre

2023-11-22 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922

--- Comment #7 from Andrew Macleod  ---
Explicit casts would be no problem as they go through the proper machinery. The
IL for that case has an explicit cast in it.

  _1 = (int) x_2(D);
  foo (_1);

It's when that cast is not present, and we try to, say, subtract two values, that
we have a problem.  We expect the compiler to promote things to be compatible
when they are supposed to be.  This would apply to dual-operand arithmetic like
+, -, /, *, bitwise ops, etc.

The testcase in particular is a bitwise not... but it has a return type that is
64 bits and an operand type that is 32.  It was expected that the compiler would
promote the operand to 64 bits if it expects a 64-bit result.  At least for
those tree codes which expect compatible types.

I don't think we want to get into overruling decisions at the range-ops level.
So we decide whether to trap (which would be the same result as we see now
:-P), or handle it some other way.  Returning VARYING was my thought, because
it means something is amiss, so we say we don't know anything.

Alternatively, if IPA could figure out when things need promoting..  GCC must
already do it, although I suppose thats in the front ends :-P
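
The "return VARYING on a precision mismatch" idea can be illustrated with a
small standalone model.  This is only a sketch, not GCC code: struct toy_range
and toy_bit_not are hypothetical names invented for the illustration, and the
real range-ops machinery looks nothing like this.

```c
#include <stdbool.h>

/* Hypothetical toy model of a signed value range; not a GCC type.  */
struct toy_range
{
  long lo, hi;    /* Inclusive bounds; meaningless when VARYING is set.  */
  int precision;  /* Width in bits of the type the range describes.  */
  bool varying;   /* True when nothing is known about the value.  */
};

/* Model of a bitwise-not range operation.  If the operand precision does
   not match the precision expected for the result, claim nothing instead
   of computing a range from incompatible types.  */
struct toy_range
toy_bit_not (struct toy_range op, int result_precision)
{
  struct toy_range r = { 0, 0, result_precision, true };
  if (op.varying || op.precision != result_precision)
    return r;

  /* ~x == -x - 1 is monotonically decreasing, so the bounds swap.  */
  r.lo = ~op.hi;
  r.hi = ~op.lo;
  r.varying = false;
  return r;
}
```

With a 32-bit operand range [0, 10], asking for a 64-bit result gives a
varying range, while asking for a 32-bit result gives [-11, -1].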

[Bug middle-end/112510] [11/12/13/14 Regression]: ASAN code injection breaks alignment of stack variables

2023-11-22 Thread sadko4u at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112510

--- Comment #17 from Vladimir Sadovnikov  ---
Reproducible with 11.4.0

~$ export ASAN_OPTIONS=detect_stack_use_after_return=1
~$ g++ -fsanitize=address -Og test-case.cpp
~$ ./a.out 
Aborted (core dumped)
~$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Not reproducible with 7.5.0:

sadko@tuf-gaming:~/tmp> export ASAN_OPTIONS=detect_stack_use_after_return=1
sadko@tuf-gaming:~/tmp> g++ -fsanitize=address -Og test-case.cpp
sadko@tuf-gaming:~/tmp> ./a.out 
sadko@tuf-gaming:~/tmp> gcc --version
gcc (SUSE Linux) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Generated code for 11.4.0:

11e9 <_Z3barP1SPc>:
11e9:   f3 0f 1e fa           endbr64
11ed:   40 f6 c7 3f           test   $0x3f,%dil
11f1:   75 01                 jne    11f4 <_Z3barP1SPc+0xb>
11f3:   c3                    ret
11f4:   48 83 ec 08           sub    $0x8,%rsp
11f8:   e8 c3 fe ff ff        call   10c0 <__asan_handle_no_return@plt>
11fd:   e8 9e fe ff ff        call   10a0

1202 <_Z3foov>:
1202:   f3 0f 1e fa           endbr64
1206:   55                    push   %rbp
1207:   48 89 e5              mov    %rsp,%rbp
120a:   41 55                 push   %r13
120c:   41 54                 push   %r12
120e:   53                    push   %rbx
120f:   48 83 e4 c0           and    $0xffffffffffffffc0,%rsp
1213:   48 81 ec 00 01 00 00  sub    $0x100,%rsp
121a:   48 8d 5c 24 20        lea    0x20(%rsp),%rbx
121f:   49 89 dd              mov    %rbx,%r13
1222:   83 3d e7 2d 00 00 00  cmpl   $0x0,0x2de7(%rip)    # 4010 <__asan_option_detect_stack_use_after_return@@Base>
1229:   0f 85 bb 00 00 00     jne    12ea <_Z3foov+0xe8>
122f:   48 c7 03 b3 8a b5 41  movq   $0x41b58ab3,(%rbx)
1236:   48 8d 05 c7 0d 00 00  lea    0xdc7(%rip),%rax    # 2004 <_IO_stdin_used+0x4>
123d:   48 89 43 08           mov    %rax,0x8(%rbx)
1241:   48 8d 05 ba ff ff ff  lea    -0x46(%rip),%rax    # 1202 <_Z3foov>
1248:   48 89 43 10           mov    %rax,0x10(%rbx)
124c:   49 89 dc              mov    %rbx,%r12
124f:   49 c1 ec 03           shr    $0x3,%r12
1253:   41 c7 84 24 00 80 ff  movl   $0xf1f1f1f1,0x7fff8000(%r12)
125a:   7f f1 f1 f1 f1
125f:   41 c7 84 24 04 80 ff  movl   $0xf1f1f1f1,0x7fff8004(%r12)
1266:   7f f1 f1 f1 f1
126b:   41 c7 84 24 08 80 ff  movl   $0xf201f1f1,0x7fff8008(%r12)
1272:   7f f1 f1 01 f2
1277:   41 c7 84 24 14 80 ff  movl   $0xf3f3f3f3,0x7fff8014(%r12)
127e:   7f f3 f3 f3 f3
1283:   64 48 8b 04 25 28 00  mov    %fs:0x28,%rax
128a:   00 00
128c:   48 89 84 24 f8 00 00  mov    %rax,0xf8(%rsp)
1293:   00
1294:   31 c0                 xor    %eax,%eax
1296:   48 8d 73 50           lea    0x50(%rbx),%rsi
129a:   48 8d 7b 60           lea    0x60(%rbx),%rdi
129e:   e8 46 ff ff ff        call   11e9 <_Z3barP1SPc>
12a3:   49 39 dd              cmp    %rbx,%r13
12a6:   75 5d                 jne    1305 <_Z3foov+0x103>
12a8:   49 c7 84 24 00 80 ff  movq   $0x0,0x7fff8000(%r12)
12af:   7f 00 00 00 00
12b4:   41 c7 84 24 08 80 ff  movl   $0x0,0x7fff8008(%r12)
12bb:   7f 00 00 00 00
12c0:   41 c7 84 24 14 80 ff  movl   $0x0,0x7fff8014(%r12)
12c7:   7f 00 00 00 00
12cc:   48 8b 84 24 f8 00 00  mov    0xf8(%rsp),%rax
12d3:   00
12d4:   64 48 2b 04 25 28 00  sub    %fs:0x28,%rax
12db:   00 00
12dd:   75 65                 jne    1344 <_Z3foov+0x142>
12df:   48 8d 65 e8           lea    -0x18(%rbp),%rsp
12e3:   5b                    pop    %rbx
12e4:   41 5c                 pop    %r12
12e6:   41 5d                 pop    %r13
12e8:   5d                    pop    %rbp
12e9:   c3                    ret
12ea:   bf c0 00 00 00        mov    $0xc0,%edi
12ef:   e8 ec fd ff ff        call   10e0 <__asan_stack_malloc_2@plt>
12f4:   48 85 c0              test   %rax,%rax
12f7:   0f 84 32 ff ff ff     je     122f <_Z3foov+0x2d>
12fd:   48 89 c3              mov    %rax,%rbx
1300:   e9 2a ff ff ff        jmp    122f <_Z3foov+0x2d>
1305:   48 c7 03 0e 36 e0

[Bug target/112445] [14 Regression] ICE: in lra_split_hard_reg_for, at lra-assigns.cc:1861 unable to find a register to spill: {*umulditi3_1} with -O -march=cascadelake -fwrapv since r14-4968-g89e5d90

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112445

--- Comment #5 from Jakub Jelinek  ---
Just changing
--- i386.md.xx  2023-11-22 09:47:22.746637132 +0100
+++ i386.md 2023-11-22 20:38:07.216218697 +0100
@@ -9984,7 +9984,7 @@
   [(set (match_operand: 0 "register_operand" "=r,A")
(mult:
  (zero_extend:
-   (match_operand:DWIH 1 "register_operand" "%d,a"))
+   (match_operand:DWIH 1 "register_operand" "%d,0"))
  (zero_extend:
(match_operand:DWIH 2 "nonimmediate_operand" "rm,rm"
(clobber (reg:CC FLAGS_REG))]
makes the testcase pass.  A question is how RA treats 0 constraint when the two
operands have different modes, if it is basically the same as a in that case,
meaning that the first input operand will never be in %rdx even when the A
constraint contains %rax and %rdx registers (but the double-word mode implies
it must be low part in %rax high part in %rdx).

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-11-22 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #31 from Li Pan  ---
We still have some unnecessary code here, which is stack-related, will take
care of it in another PATCH.

After this patch:
test:
  lui     a5,%hi(.LANCHOR0)
  addi    a5,a5,%lo(.LANCHOR0)
  li      a4,32
  addi    sp,sp,-32   <== unnecessary insn
  vsetvli zero,a4,e8,m1,ta,ma
  vle8.v  v1,0(a5)
  vs1r.v  v1,0(a0)
  addi    sp,sp,32    <== unnecessary insn
  jr      ra

[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #3 from Jakub Jelinek  ---
(In reply to r...@cebitec.uni-bielefeld.de from comment #2)
> > --- Comment #1 from Jakub Jelinek  ---
> > Strange.  On cfarm211 which is
> > SunOS gcc-solaris11 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
> > the test passes.
> 
> Can you check which libiconv got picked up there?  The non-standard
> OpenCSW packages on that system may include GNU libiconv and install
> into default system directories, so they are picked up by default.

/opt/csw/lib/libiconv.so.2
> 
> > You get no diagnostics for those lines at all?  Buggy libiconv?
> 
> No.  There's no separate libiconv on Solaris; the iconv* functions are
> included in libc.

On Linux I get:
echo á | iconv -f UTF-8 -t ASCII -; echo  | iconv -f UTF-8 -t ISO-8859-1 -
iconv: illegal input sequence at position 0
iconv: illegal input sequence at position 0
while on Solaris
echo á | iconv -f UTF-8 -t ASCII -; echo  | iconv -f UTF-8 -t ISO-8859-1 -
?
?
If it maps all characters which do not have representation in the destination
character set into ?, then it is useless for the test in question.
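
The same difference can be probed from C rather than from the iconv command
line.  The following is a rough sketch only: iconv_rejects is a hypothetical
helper name, the charset names are assumed to be accepted by the local
iconv_open, and the input is assumed to be short enough for the fixed buffer.

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: returns 1 if converting TEXT from FROM to TO fails
   with EILSEQ (what the Linux output above shows for unrepresentable
   characters), 0 if the conversion completes (possibly after substituting
   a replacement such as '?'), and -1 on any other error.  */
int
iconv_rejects (const char *from, const char *to, const char *text)
{
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return -1;

  char outbuf[64];                 /* Assumes a short input string.  */
  char *in = (char *) text;        /* iconv takes a non-const pointer.  */
  char *out = outbuf;
  size_t inleft = strlen (text);
  size_t outleft = sizeof outbuf;

  errno = 0;
  size_t rc = iconv (cd, &in, &inleft, &out, &outleft);
  int saved_errno = errno;
  iconv_close (cd);

  if (rc != (size_t) -1)
    return 0;
  return saved_errno == EILSEQ ? 1 : -1;
}
```

A call such as iconv_rejects ("UTF-8", "ASCII", "\xc3\xa1") would be expected
to report rejection on an implementation that behaves like the Linux run
above, and no rejection on one that maps the character to '?'.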

[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2023-11-22 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #2 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #1 from Jakub Jelinek  ---
> Strange.  On cfarm211 which is
> SunOS gcc-solaris11 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
> the test passes.

Can you check which libiconv got picked up there?  The non-standard
OpenCSW packages on that system may include GNU libiconv and install
into default system directories, so they are picked up by default.

> You get no diagnostics for those lines at all?  Buggy libiconv?

No.  There's no separate libiconv on Solaris; the iconv* functions are
included in libc.

> I mean the emojis certainly aren't in ISO-8859-1...

Probably not ;-)

FWIW, I've just built trunk with GNU libiconv 1.17 on
i386-pc-solaris2.11.  The test PASSes now with both LANG=C and
LANG=en_US.UTF-8.

I'll dig further into Solaris iconv functions here...

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #27 from Jakub Jelinek  ---
(In reply to Rich Felker from comment #26)
> > The only reasonable fix on the compiler side is to never emit memcpy but 
> > always use memmove.
> 
> No, it can literally just emit (equivalent at whatever intermediate form of):
> 
> cmp src,dst
> je 1f
> call memcpy
> 1:
> 
> in place of memcpy.

No, that is not a reasonable fix, because it severely pessimizes common code
for a theoretical only problem.

[Bug other/112671] libiconv support lacks separate --with-libiconv-{include,lib}

2023-11-22 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112671

--- Comment #4 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #3 from Arsen Arsenović  ---
> hm, actually, I think I confused reports - sorry.
>
> do you know if this worked a short while ago?  and if so, how did such a
> configuration look?

I have no idea: at least AFAICS back to the gcc-11 branch (didn't look
further) there was only --with-libiconv-prefix.

Still it's inconsistent with how many (all?) other support libs are
handled.

[Bug other/112671] libiconv support lacks separate --with-libiconv-{include,lib}

2023-11-22 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112671

--- Comment #3 from Arsen Arsenović  ---
hm, actually, I think I confused reports - sorry.

do you know if this worked a short while ago?  and if so, how did such a
configuration look?

[Bug other/112671] libiconv support lacks separate --with-libiconv-{include,lib}

2023-11-22 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112671

--- Comment #2 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #1 from Arsen Arsenović  ---
[...]
> I will restore the modifications in the shared tree with the few other patches
> I mentioned on the GCC ML recently soon (I've run a little low on testing
> bandwidth this week..)
>
> apologies for the inconvenience

No worries, this is the first time ever I tried this on Solaris and can
easily live with 32-bit-only testing for now.

Thanks for taking care of this.

[Bug other/112671] libiconv support lacks separate --with-libiconv-{include,lib}

2023-11-22 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112671

--- Comment #1 from Arsen Arsenović  ---
yes, this also came up from the binutils side.  see
https://inbox.sourceware.org/binutils/874jhg2x6p@adacore.com/

I will restore the modifications in the shared tree with the few other patches
I mentioned on the GCC ML recently soon (I've run a little low on testing
bandwidth this week..)

apologies for the inconvenience

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #26 from Rich Felker  ---
> The only reasonable fix on the compiler side is to never emit memcpy but 
> always use memmove.

No, it can literally just emit (equivalent at whatever intermediate form of):

cmp src,dst
je 1f
call memcpy
1:

in place of memcpy.

It can even optimize out that in the case where it's provable that they're not
equal, e.g. presence of restrict or one of the two objects not having had its
address taken/leaked.
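
At the source level, the compare-and-skip sequence above corresponds to
something like the following.  This is just a sketch of the idea, not a
proposed GCC change; copy_allow_exact_overlap is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy N bytes from SRC to DST, skipping the library
   call when both pointers are equal (exact overlap), so memcpy is never
   invoked with overlapping objects in that case.  Partial overlap is still
   not allowed, exactly as with plain memcpy.  */
static inline void *
copy_allow_exact_overlap (void *dst, const void *src, size_t n)
{
  if (dst != src)
    memcpy (dst, src, n);
  return dst;
}
```

When the compiler can prove dst != src, the comparison folds away and only
the memcpy call remains.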

[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b

2023-11-22 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661

--- Comment #10 from Richard Sandiford  ---
(In reply to Richard Biener from comment #9)
> So do we expect - independent of whether a constant/external is used as mask
> - that uniform constants/externals are generatable and thus we can elide the
> check for those?  Possibly also go a different path during code-generation
> then?  (because that will otherwise assert)
Yeah, I think so.  At the time, I don't think there were any cases where
treating uniform values differently would have helped, and it wasn't a
trivial thing to test on the fly.  But now we have a reason to try :)
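
For illustration, the "all elements are equal" test from comment #8 amounts
to something like this on a plain array.  It is a sketch only:
all_elements_equal is a hypothetical name, and the vectorizer would of course
operate on SLP nodes and trees rather than on an int array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if every element of ELTS[0..N-1] has the same
   value.  Empty and single-element arrays count as uniform.  */
bool
all_elements_equal (const int *elts, size_t n)
{
  for (size_t i = 1; i < n; ++i)
    if (elts[i] != elts[0])
      return false;
  return true;
}
```

The mask { 1, 1, 1, 0, 1 } from comment #8 fails this test, while an all-ones
dummy mask passes it.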

[Bug middle-end/112344] [14 Regression] Wrong code at -O2 on x86_64-pc-linux-gnu

2023-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112344

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #11 from Richard Biener  ---
Fixed.

[Bug middle-end/112344] [14 Regression] Wrong code at -O2 on x86_64-pc-linux-gnu

2023-11-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112344

--- Comment #10 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:6bf66276e3e41d5d92f7b7260e98b6a111653805

commit r14-5759-g6bf66276e3e41d5d92f7b7260e98b6a111653805
Author: Richard Biener 
Date:   Wed Nov 22 11:10:41 2023 +0100

tree-optimization/112344 - wrong final value replacement

When performing final value replacement chrec_apply that's used to
compute the overall effect of niters to a CHREC doesn't consider that
the overall increment of { -2147483648, +, 2 } doesn't fit in
a signed integer when the loop iterates until the value of the IV
of 20.  The following fixes this mistake, carrying out the multiply
and add in an unsigned type instead, avoiding undefined overflow
and thus later miscompilation by path range analysis.

PR tree-optimization/112344
* tree-chrec.cc (chrec_apply): Perform the overall increment
calculation and increment in an unsigned type.

* gcc.dg/torture/pr112344.c: New testcase.
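
The arithmetic behind the fix can be reproduced in plain C.  This is a sketch
of the idea only, not the chrec_apply code; final_value is a hypothetical
name, and the iteration count below is derived from the numbers quoted in the
commit message.

```c
#include <limits.h>

/* Hypothetical helper: value of the induction variable { BASE, +, STEP }
   after NITERS iterations.  The multiply and add are carried out in
   unsigned int, where wraparound is well defined, instead of in int,
   where the intermediate product could overflow.  */
int
final_value (int base, int step, unsigned int niters)
{
  unsigned int r = (unsigned int) base + (unsigned int) step * niters;
  /* Converting back to int is implementation-defined when R > INT_MAX;
     GCC documents it as reduction modulo 2^N.  */
  return (int) r;
}
```

For { -2147483648, +, 2 } reaching 20, the loop runs 1073741834 times, and
2 * 1073741834 = 2147483668 does not fit in a signed 32-bit int.  Calling
final_value (INT_MIN, 2, 1073741834u) yields 20 on targets with a 32-bit int.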

[Bug c/111911] [11/12/13/14 Regression] ICE with integer overflow converting to _Bool

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111911

Jakub Jelinek  changed:

   What|Removed |Added

   Keywords|needs-bisection |
 CC||jakub at gcc dot gnu.org,
   ||jsm28 at gcc dot gnu.org
   Priority|P3  |P2

--- Comment #5 from Jakub Jelinek  ---
Started with r10-5922-g3d77686d2eddf76d3498169d0ca5653db45a8662

[Bug middle-end/112653] We should optimize memmove to memcpy using alias oracle

2023-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653

--- Comment #6 from Richard Biener  ---
(In reply to Jan Hubicka from comment #5)
> > but the issue is that test2 escapes which makes this conflict:
> 
> It is passed to memmove which is noescape and returned.  Why local PTA
> considers returned values to escape?

The pointed to memory escapes which means that stores to it are not dead.
Mind we do not have a separate points-to set for escaped via return
(some functions can also "return" like via EH or longjmp, and we can't
really know the latter w/o IPA analysis).  Pointers can also escape to
global memory.

Special-casing the regular return path is something that's possible (also IPA
points-to doesn't compute a "local" escaped at all but preserves the
non-IPA solution for that), but in the end it didn't seem important
enough for me to try doing that ...

We have the function entry state which is NONLOCAL, ESCAPED is what
determines "global memory" for all sorts of optimizations.  If we
split out RETURN_ESCAPED then that would be ESCAPED | RETURN_ESCAPED
and alias disambiguation could avoid RETURN_ESCAPED.

But ESCAPED handling is complicated already ...

[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922

Jakub Jelinek  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com,
   ||jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Slightly cleaned up:
void f2 (void);
void f4 (int, int, int);
struct A { int a; };
struct B { struct A *b; int c; } v;

static int
f1 (x, y)
  struct C *x;
  struct A *y;
{
  (v.c = v.b->a) || (v.c = v.b->a);
  f2 ();
}

static void
f3 (int x, int y)
{
  int b = f1 (0, ~x);
  f4 (0, 0, v.c);
}

void
f5 (void)
{
  f3 (0, 0);
}

The problem is in the f1 call, given it uses the K&R definition style and the
caller invokes UB by using incompatible types (int vs. pointers), I think
IPA-VRP should punt somewhere on the type mismatch.

I think
  Value_Range vr (operand_type);
  if (TREE_CODE_CLASS (operation) == tcc_unary)
    ipa_vr_operation_and_type_effects (vr,
                                       src_lats->m_value_range.m_vr,
                                       operation, param_type,
                                       operand_type);
should be avoided if param_type is not a compatible type to operand_type,
unless operation is some cast operation (NOP_EXPR, CONVERT_EXPR, dunno if the
float to integral or vice versa ops as well but vrp probably doesn't handle
that yet).
In the above case, param_type is struct A *, i.e. pointer, while operand_type
is int.

[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b

2023-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661

--- Comment #9 from Richard Biener  ---
(In reply to Richard Sandiford from comment #8)
> I think we're going down the wrong path here.  If I've understood
> the original change correctly, dummy masks aren't special because
> they're masks.  They're special because all elements are equal to
> the same value.  A mask such as:
> 
>   { 1, 1, 1, 0, 1 }
> 
> would not be OK, just like an integer vector with those values would
> not be OK.
> 
> So IMO we should check whether all elements are equal, rather than
> whether the type is one thing or another.

So do we expect - independent of whether a constant/external is used as mask -
that uniform constants/externals are generatable and thus we can elide the
check for those?  Possibly also go a different path during code-generation
then?  (because that will otherwise assert)

[Bug rtl-optimization/112610] [12/13/14 Regression] ICE: SIGSEGV with -flive-range-shrinkage -fdump-rtl-all-all -fira-verbose=9

2023-11-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112610

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Vladimir Makarov :

https://gcc.gnu.org/g:95f61de95bbcc2e4fb7020e27698140abea23788

commit r14-5757-g95f61de95bbcc2e4fb7020e27698140abea23788
Author: Vladimir N. Makarov 
Date:   Wed Nov 22 09:01:02 2023 -0500

[IRA]: Fix using undefined dump file in IRA code during insn scheduling

Part of IRA code is used for register pressure sensitive insn
scheduling and live range shrinkage.  Numerous changes of IRA resulted
in that this IRA code uses dump file passed by the scheduler and
internal ira dump file (in called functions) which can be undefined or
freed by the scheduler during compiling previous functions.  The patch
fixes this problem.  To reproduce the error valgrind should be used
and GCC should be compiled with valgrind annotations.  Therefor the
patch does not contain the test case.

gcc/ChangeLog:

PR rtl-optimization/112610
* ira-costs.cc: (find_costs_and_classes): Remove arg.
Use ira_dump_file for printing.
(print_allocno_costs, print_pseudo_costs): Ditto.
(ira_costs): Adjust call of find_costs_and_classes.
(ira_set_pseudo_classes): Set up and restore ira_dump_file.

[Bug middle-end/112668] ICE in bitintlower0 while compiling bitint-42.c

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112668

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-11-22
           Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1

--- Comment #3 from Jakub Jelinek  ---
Created attachment 56665
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56665&action=edit
gcc14-pr112668.patch

Untested fix.

[Bug other/112671] New: libiconv support lacks separate --with-libiconv-{include,lib}

2023-11-22 Thread ro at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112671

Bug ID: 112671
   Summary: libiconv support lacks separate
--with-libiconv-{include,lib}
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ro at gcc dot gnu.org
  Target Milestone: ---

When trying to build trunk on Solaris with GNU libiconv 1.17, I noticed that
the libiconv configure support is limited compared to e.g. gmp etc.  On
multilibbed targets like Solaris, you usually install both 32 and 64-bit
versions of a library into /include (common between 32 and
64-bit) and /lib (32-bit lib) resp.
/lib/{amd64,sparcv9}.

The current (simple-minded) support via --with-libiconv-prefix cannot handle
this, requiring two different installations into different prefixes.  It's
also inconsistent with the rest of gcc, which does support configurations like
this.

[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b

2023-11-22 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661

--- Comment #8 from Richard Sandiford  ---
I think we're going down the wrong path here.  If I've understood
the original change correctly, dummy masks aren't special because
they're masks.  They're special because all elements are equal to
the same value.  A mask such as:

  { 1, 1, 1, 0, 1 }

would not be OK, just like an integer vector with those values would
not be OK.

So IMO we should check whether all elements are equal, rather than
whether the type is one thing or another.
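
As an illustration only (this is not vectorizer code; the function name and
the use of std::vector are assumptions made for the example), the uniformity
test described above amounts to something like:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: true when every element equals the first one
// (an empty vector is treated as uniform).
bool all_elements_equal(const std::vector<int> &elts)
{
  for (std::size_t i = 1; i < elts.size(); ++i)
    if (elts[i] != elts[0])
      return false;
  return true;
}
```

With this, { 1, 1, 1, 1, 1 } is accepted and { 1, 1, 1, 0, 1 } is rejected,
whatever the elements stand for (mask bits or plain integers).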

[Bug sanitizer/112563] [14 regression] libsanitizer doesn't assemble with Solaris/sparc as

2023-11-22 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112563

--- Comment #11 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #10 from Jakub Jelinek  ---
> (In reply to r...@cebitec.uni-bielefeld.de from comment #9)
[...]
>> I've now come up with an alternative.  It's a bit ugly, but it gets the
>> work done:
>> 
>> diff --git a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
>> b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
>> --- a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
>> +++ b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
>> @@ -17,6 +17,17 @@
>>  // The asm hack only works with GCC and Clang.
>>  #if !defined(_WIN32)
>>  
>> +// FIXME: Explain.
>> +#if defined(__sparc__)
>> +#define ASM_MEM_DEF(FUNC) \
>> +__asm__(".global " #FUNC "\n" \
>> +".type " #FUNC ",function\n" \
>
> Not @function ?

No, this should be #function: that's the only variant sparc as
understands, and gas accepts it for compatibility.

>> +".weak " #FUNC "\n" \
>> +#FUNC ":\n");
>> +ASM_MEM_DEF(__sanitizer_internal_memcpy)
>> +ASM_MEM_DEF(__sanitizer_internal_memmove)
>> +ASM_MEM_DEF(__sanitizer_internal_memset)
>> +#endif
>>  asm("memcpy = __sanitizer_internal_memcpy");
>>  asm("memmove = __sanitizer_internal_memmove");
>>  asm("memset = __sanitizer_internal_memset");
>> 
>> I've run libsanitizer builds on sparc without this patch (gas only since
>> as fails) and with it (as and gas).  It fixes the as build failure and
>> leaves the same number of calls to mem* functions in libasan.so as an
>> unpatched tree with gas.
>
> If it works, nice.  Can you file it on github.com/llvm/llvm-project as an
> issue and see if upstream is willing to accept it?  I think they'll want some

Can do, either as an issue or directly as a pull request.  I'll run it
through a full llvm build, too, first.

> indentation changes (if defined(__sparc__) is below the _WIN32 #if, so they
> probably want it
> indented more and the define even more.  And dunno if defined(__sparc__) or
> SANITIZER_SPARC should be used.

I know: LLVM has clang/tools/clang-format/clang-format-diff.py to handle
this.  I usually run my patches through that first, unless it messes up
the existing formatting as was the case for pull request #72973.

The patch also needs an explanatory comment; this was just a proof of
concept.  It might be even better to restrict the hack to __sparc__ &&
__sun__ && __svr4__ to avoid interfering with Linux/sparc64.

[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns

2023-11-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #8 from Jan Hubicka  ---
The negative return value branch predictor is set to have a 98% hit rate
(measured on SPEC2k17 some time ago).  There is --param
predictable-branch-outcome, which is also set to 2%, so indeed we consider the
branch well predictable by this heuristic.

Reducing the --param should make the cmov happen.
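
A rough sketch of the check being described, assuming a branch counts as
predictable when either outcome has a probability at or below the parameter
(the function and parameter names are made up for illustration, and
probabilities are integer percentages to keep the arithmetic exact):

```cpp
// Hypothetical model of a "predictable branch" test.
// taken_percent: estimated probability (0..100) that the branch is taken.
// outcome_param: threshold in percent, playing the role of
//                --param predictable-branch-outcome.
bool branch_looks_predictable(int taken_percent, int outcome_param)
{
  return taken_percent <= outcome_param
         || 100 - taken_percent <= outcome_param;
}
```

Under this model a 98% prediction with the parameter at 2 is treated as
predictable, while lowering the parameter to 1 makes the same branch count as
unpredictable, which is presumably what would let if-conversion consider cmov.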

With the profile_probability data type we could try something smarter for
guessing whether a given branch is predictable (such as ignoring guessed values
and letting the predictor optionally mark branches as (un)predictable).  But it
is not quite clear to me what the desired behavior would be...

Guessing the predictability of data branches is generally quite a hard problem.
Predictability of loop branches is easier, but we hardly apply BRANCH_COST to
the branch closing a loop since those are not if-conversion candidates.

[Bug sanitizer/112563] [14 regression] libsanitizer doesn't assemble with Solaris/sparc as

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112563

--- Comment #10 from Jakub Jelinek  ---
(In reply to r...@cebitec.uni-bielefeld.de from comment #9)
> > --- Comment #8 from Jakub Jelinek  ---
> > So, shall we go with
> > --- libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h.jj 
> > 2023-11-15 12:45:17.359586776 +0100
> > +++ libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h 2023-11-21
> > 18:29:52.401817763 +0100
> > @@ -15,7 +15,8 @@
> >  #define SANITIZER_REDEFINE_BUILTINS_H
> >
> >  // The asm hack only works with GCC and Clang.
> > -#if !defined(_WIN32)
> > +// It doesn't work when using Solaris as either.
> > +#if !defined(_WIN32) && !SANITIZER_SOLARIS
> >
> >  asm("memcpy = __sanitizer_internal_memcpy");
> >  asm("memmove = __sanitizer_internal_memmove");
> > @@ -50,7 +51,7 @@ using vector = Define_SANITIZER_COMMON_N
> >  }  // namespace std
> >
> >  #  endif  // __cpluplus
> > -#endif// !_WIN32
> > +#endif// !_WIN32 && !SANITIZER_SOLARIS
> >
> >  #  endif  // SANITIZER_REDEFINE_BUILTINS_H
> >  #endif// SANITIZER_COMMON_NO_REDEFINE_BUILTINS
> >
> > then (either as local patch or try to push it upstream)?
> 
> That's way too heavy IMO: it punishes the Solaris/x86 as which isn't
> affected and also Solaris/SPARC with gas.
> 
> I've now come up with an alternative.  It's a bit ugly, but it gets the
> work done:
> 
> diff --git a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
> b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
> --- a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
> +++ b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
> @@ -17,6 +17,17 @@
>  // The asm hack only works with GCC and Clang.
>  #if !defined(_WIN32)
>  
> +// FIXME: Explain.
> +#if defined(__sparc__)
> +#define ASM_MEM_DEF(FUNC) \
> +__asm__(".global " #FUNC "\n" \
> +".type " #FUNC ",function\n" \

Not @function ?

> +".weak " #FUNC "\n" \
> +#FUNC ":\n");
> +ASM_MEM_DEF(__sanitizer_internal_memcpy)
> +ASM_MEM_DEF(__sanitizer_internal_memmove)
> +ASM_MEM_DEF(__sanitizer_internal_memset)
> +#endif
>  asm("memcpy = __sanitizer_internal_memcpy");
>  asm("memmove = __sanitizer_internal_memmove");
>  asm("memset = __sanitizer_internal_memset");
> 
> I've run libsanitizer builds on sparc without this patch (gas only since
> as fails) and with it (as and gas).  It fixes the as build failure and
> leaves the same number of calls to mem* functions in libasan.so as an
> unpatched tree with gas.

If it works, nice.  Can you file it on github.com/llvm/llvm-project as an issue
and see if upstream is willing to accept it?  I think they'll want some
indentation changes (if defined(__sparc__) is below the _WIN32 #if, so they
probably want it
indented more and the define even more.  And dunno if defined(__sparc__) or
SANITIZER_SPARC should be used.

[Bug middle-end/112653] We should optimize memmove to memcpy using alias oracle

2023-11-22 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653

--- Comment #5 from Jan Hubicka  ---
> but the issue is that test2 escapes which makes this conflict:

It is passed to memmove which is noescape and returned.  Why does local PTA
consider returned values to escape?

[Bug ipa/98925] Extend ipa-prop to handle return functions for slot optimization

2023-11-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98925

--- Comment #3 from Jan Hubicka  ---
Return value range propagation was added in
r:53ba8d669550d3a1f809048428b97ca607f95cf5

however it works on scalar return values only for now. Extending it to
aggregates is a logical next step and should not be terribly hard.

The code also misses logic for IPA streaming so it works only in early and late
opts.

[Bug sanitizer/112563] [14 regression] libsanitizer doesn't assemble with Solaris/sparc as

2023-11-22 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112563

--- Comment #9 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #8 from Jakub Jelinek  ---
> So, shall we go with
> --- libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h.jj 
> 2023-11-15 12:45:17.359586776 +0100
> +++ libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h 2023-11-21
> 18:29:52.401817763 +0100
> @@ -15,7 +15,8 @@
>  #define SANITIZER_REDEFINE_BUILTINS_H
>
>  // The asm hack only works with GCC and Clang.
> -#if !defined(_WIN32)
> +// It doesn't work when using Solaris as either.
> +#if !defined(_WIN32) && !SANITIZER_SOLARIS
>
>  asm("memcpy = __sanitizer_internal_memcpy");
>  asm("memmove = __sanitizer_internal_memmove");
> @@ -50,7 +51,7 @@ using vector = Define_SANITIZER_COMMON_N
>  }  // namespace std
>
>  #  endif  // __cpluplus
> -#endif// !_WIN32
> +#endif// !_WIN32 && !SANITIZER_SOLARIS
>
>  #  endif  // SANITIZER_REDEFINE_BUILTINS_H
>  #endif// SANITIZER_COMMON_NO_REDEFINE_BUILTINS
>
> then (either as local patch or try to push it upstream)?

That's way too heavy IMO: it punishes the Solaris/x86 as which isn't
affected and also Solaris/SPARC with gas.

I've now come up with an alternative.  It's a bit ugly, but it gets the
work done:

diff --git a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
--- a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
+++ b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
@@ -17,6 +17,17 @@
 // The asm hack only works with GCC and Clang.
 #if !defined(_WIN32)

+// FIXME: Explain.
+#if defined(__sparc__)
+#define ASM_MEM_DEF(FUNC) \
+__asm__(".global " #FUNC "\n" \
+".type " #FUNC ",function\n" \
+".weak " #FUNC "\n" \
+#FUNC ":\n");
+ASM_MEM_DEF(__sanitizer_internal_memcpy)
+ASM_MEM_DEF(__sanitizer_internal_memmove)
+ASM_MEM_DEF(__sanitizer_internal_memset)
+#endif
 asm("memcpy = __sanitizer_internal_memcpy");
 asm("memmove = __sanitizer_internal_memmove");
 asm("memset = __sanitizer_internal_memset");

I've run libsanitizer builds on sparc without this patch (gas only since
as fails) and with it (as and gas).  It fixes the as build failure and
leaves the same number of calls to mem* functions in libasan.so as an
unpatched tree with gas.

[Bug middle-end/88345] -Os overrides -falign-functions=N on the command line

2023-11-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #17 from Jan Hubicka  ---
-falign-functions/-falign-jumps/-falign-labels/-falign-loops were originally
intended for performance tuning.  Starting a function entry close to the end of
a page of code cache may lead to wasted code cache space as well as higher
overhead when calling the function, because the CPU fetches a page which
contains just a little useful information.

As such I would like to keep them affecting only hot code (we should update
the documentation for that).  Internally we have FUNCTION_BOUNDARY, which
specifies the minimal alignment needed by the ABI and is set to 8 bits for i386.
My understanding is that -fpatchable-function-entry requires the alignment to be
64 bits in order to make it possible to atomically change the instruction.

So perhaps we want to make FUNCTION_BOUNDARY 64 for functions where we
output the patchable entry?
I am also OK with extending the flag syntax or adding -fmin-function-alignment
to specify an optional user-defined minimum (increasing FUNCTION_BOUNDARY) if
that seems useful, but I think the first one is the most consistent way to go
with live patching?
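
To make the atomicity argument concrete, here is a small sketch (an
illustration under stated assumptions, not GCC code; the helper name and the
byte counts are made up): an instruction can be rewritten with a single aligned
8-byte store only if all of its bytes lie within one 8-byte-aligned word, which
64-bit alignment of the entry guarantees for instructions of up to 8 bytes.

```cpp
#include <cstdint>

// Hypothetical helper: do the bytes [addr, addr + len) fall inside a single
// word of word_bytes bytes?  Assumes len >= 1 and word_bytes > 0.
bool fits_in_one_aligned_word(std::uint64_t addr, std::uint64_t len,
                              std::uint64_t word_bytes)
{
  return addr / word_bytes == (addr + len - 1) / word_bytes;
}
```

For example, a 5-byte instruction at 0x1000 stays within one 8-byte word,
while the same instruction at 0x1006 straddles two.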

[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b

2023-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661

--- Comment #7 from Richard Biener  ---
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4a09b3c2aca..d0967240ae3 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -766,7 +766,10 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned
char swap,
  if ((dt == vect_constant_def
   || dt == vect_external_def)
  && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
- && TREE_CODE (type) != BOOLEAN_TYPE
+ && (!is_gimple_call (stmt_info->stmt)
+ || !gimple_call_internal_p (stmt_info->stmt)
+ || internal_fn_mask_index
+  (gimple_call_internal_fn (stmt_info->stmt)) != opno)
  && !can_duplicate_and_interleave_p (vinfo, stmts.length (),
type))
{
  if (dump_enabled_p ())

fixes the testcase, not sure if it still resolves the issue fixed with the
original change.

[Bug c++/112642] ranges::fold_left tries to access inactive union member of string in constant expression

2023-11-22 Thread miro.palmu at helsinki dot fi via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112642

--- Comment #9 from Miro Palmu  ---
(In reply to Jonathan Wakely from comment #8)
> > Also tried it locally with clang 16.0.6 with
> > gcc-13.2.1 libstdc++
> 
> Which gcc-13.2.1 though? That's a snapshot that could date from any time in
> the past four months. If I use gcc version 13.2.1 20231025 then clang
> compiles it.

Mine is 13.2.1 20230801 so way before Oct 21. (I did not know there were
different snapshots of the releases, I'm just a user trying to help :) )

> Anyway, the original GCC error is the same as PR 112642

You probably mean PR 110158

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:de6f3e12bd188fee30bc79a5e323e16e0dbbe8ca

commit r14-5755-gde6f3e12bd188fee30bc79a5e323e16e0dbbe8ca
Author: Juzhe-Zhong 
Date:   Wed Nov 22 18:53:22 2023 +0800

RISC-V: Fix incorrect use of vcompress in permutation auto-vectorization

This patch fixes following FAILs on zvl512b of RV32 system:

FAIL: gcc.target/riscv/rvv/autovec/struct/struct_vect_run-12.c execution
test
FAIL: gcc.target/riscv/rvv/autovec/struct/struct_vect_run-9.c execution
test

The root cause is that for permutation indice = {0,3,7,0} use vcompress
optimization
which is incorrect. Fix vcompress optimization bug.

PR target/112598

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_compress_patterns): Fix
vcompress bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112598-3.c: New test.

[Bug middle-end/112668] ICE in bitintlower0 while compiling bitint-42.c

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112668

--- Comment #2 from Jakub Jelinek  ---
No loop is needed:
/* PR middle-end/112668 */
/* { dg-do compile { target bitint } } */
/* { dg-options "-std=c23 -fnon-call-exceptions" } */

#if __BITINT_MAXWIDTH__ >= 495
struct T495 { _BitInt(495) a : 2; unsigned _BitInt(495) b : 471; _BitInt(495) c : 2; };
extern void foo (struct T495 *r495);

unsigned _BitInt(495)
bar (int i)
{
  struct T495 r495[12];
  foo (r495);
  return r495[i].b;
}
#endif

[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b

2023-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661

--- Comment #6 from Richard Biener  ---
As suggested in the review at the time, the change would ideally be restricted
to actual mask operands, not random BOOLEAN_TYPE ones.

[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b

2023-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[14] RISC-V ICE: in         |[14] RISC-V ICE: in
                   |duplicate_and_interleave,   |duplicate_and_interleave,
                   |at tree-vect-slp.cc:8025    |at tree-vect-slp.cc:8025
                   |with maxval_char_3.f90      |with maxval_char_3.f90
                   |vlen256b                    |vlen256b since
                   |                            |r14-5101-g60034ecf25597b

--- Comment #5 from Richard Biener  ---
Btw, a fallout of r14-5101-g60034ecf25597b.

[Bug c++/112642] ranges::fold_left tries to access inactive union member of string in constant expression

2023-11-22 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112642

--- Comment #8 from Jonathan Wakely  ---
(In reply to Miro Palmu from comment #7)
> (In reply to Jonathan Wakely from comment #6)
> > The examples in comment 4 do compile using libstdc++ on clang, if you use
> > libstdc++ headers from after sept 29 (for trunk) or oct 21 (for gcc-13).
> 
> I was testing this on compiler explorer on clang 17.0.1 and it used
> gcc-13.2.0 libstdc++.

Which is expected to fail, because 13.2.0 was released before Oct 21.

> Also tried it locally with clang 16.0.6 with
> gcc-13.2.1 libstdc++

Which gcc-13.2.1 though? That's a snapshot that could date from any time in the
past four months. If I use gcc version 13.2.1 20231025 then clang compiles it.

Anyway, the original GCC error is the same as PR 112642 which was apparently
reduced to PR 111284, which does seem relevant.

[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b

2023-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
                 CC|                            |rsandifo at gcc dot gnu.org
   Last reconfirmed|                            |2023-11-22
             Status|UNCONFIRMED                 |NEW

--- Comment #4 from Richard Biener  ---
We are code-generating

t.f90:1:12: note: node (constant) 0x53bc430 (max_nunits=1, refcnt=1)
vector([8,8]) unsigned int
t.f90:1:12: note:  { 1, 1, 1, 1, 1 }

During SLP node analysis we assume we can generate constants/externals,
as only consumers will determine the vector type.  vectorizable_store
doesn't verify it can generate the constant though.  Instead we are checking
this at SLP build time.

We're using E_RVVM1SImode as base_vector_mode and count is 5.  There's
obviously no integer mode for size '5'.  But it is a constant size vector
so I wonder why we ask for can_duplicate_and_interleave_p at all, that is,
how we arrive at vector([8,8]) for a constant size vinfo->vector_mode.

At analysis time we do

  if ((dt == vect_constant_def
       || dt == vect_external_def)
      && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
      && TREE_CODE (type) != BOOLEAN_TYPE
      && !can_duplicate_and_interleave_p (vinfo, stmts.length (),
                                          type))
    {

see how we look at vinfo->vector_mode here.

[Bug c++/110158] Cannot use union with std::string inside in constant expression

2023-11-22 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110158

--- Comment #9 from Jonathan Wakely  ---
Odd, I thought I'd checked it when testing r14-4334-g28adad7a32ed92.

Seems like the same issue as PR 112642 though (which has a minimized version
without std::string).

[Bug target/111677] [12/13/14 Regression] darktable build on aarch64 fails with unrecognizable insn due to -fstack-protector changes

2023-11-22 Thread costamagnagianfranco at yahoo dot it via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111677

--- Comment #14 from Gianfranco  ---
Hello, any news for this issue?

[Bug c++/112666] Missed optimization: Value initialization zero-initializes members with user-defined constructor

2023-11-22 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112666

--- Comment #1 from Jonathan Wakely  ---
(In reply to Francisco Paisana from comment #0)
> The struct "C" which is just "B" and an int is much slower at being
> initialized than B when value initialization (via {}) is used. However, my
> understanding of the C++ standard is that members with a user-defined
> default constructor do not need to be zero-initialized in this situation.

I think that's not quite right. Types with a user-provided default constructor
will not be zero-initialized when value-init is used. B does have a
user-provided default constructor, so value-init for an object of type B does
not perform zero-init first.

But that applies when constructing a complete B object, not when constructing a
member subobject.

C does not have a user-provided default constructor, so value-initialization
means:

"- the object is zero-initialized and the semantic constraints for
default-initialization are checked, and if T has a non-trivial default
constructor, the object is default-initialized;"

So first it's zero-initialized, which means:

"- if T is a (possibly cv-qualified) non-union class type, its padding bits
(6.8.1) are initialized to zero bits and each non-static data member, each
non-virtual base class subobject, and, if the object is not a base class
subobject, each virtual base class subobject is zero-initialized;"

This specifically says that "each non-static data member ... is
zero-initialized."  So the B subobject must be zero-initialized.  That's not the
same as when you value-init a B object.

> Looking at the godbolt assembly output, I see that both `A a{}` and `C c{}`
> generate a memset instruction, while `B b{}` doesn't. Clang, on the other
> hand, seems to initialize C almost as fast as B.

I don't know whether Clang considers the zero-init to be dead stores that are
clobbered by B() and so can be eliminated, or something else. But my
understanding of the standard is that requiring zero-init of B's members is
very intentional here.

> This potentially missed optimization in gcc is particularly nasty for
> structs with large embedded storage (e.g. structs that contain C-arrays,
> std::arrays, or static_vectors).

Arguably, the problem here is that B has a default ctor that intentionally
leaves members uninitialized. If you want to preserve that behaviour in types
that contain a B subobject, then you also need to give those types (e.g. C in
your example) a user-provided default ctor.
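
A minimal sketch of the distinction (the types below are stand-ins, not the
reporter's testcase, and the function name is made up; T() is used so that
value-initialization is selected regardless of whether the type is an
aggregate).  It only observes C::i, which is enough to show that the
zero-initialization step happens for C even though B's constructor leaves its
own members alone:

```cpp
#include <cstring>
#include <new>

struct B {
  B() {}        // user-provided: value-init of a complete B skips zero-init
  int x[4];     // intentionally left uninitialized by B()
};

struct C {      // no user-provided default constructor
  B b;
  int i;
};

// Value-initializes a C on top of storage pre-filled with 0xFF bytes and
// returns C::i.  Zero-initialization of C must clear i before the implicit
// default constructor runs, so the 0xFF pattern must not survive in i.
int value_init_c_i()
{
  alignas(C) unsigned char buf[sizeof(C)];
  std::memset(buf, 0xFF, sizeof buf);
  C *c = new (buf) C();   // "()" requests value-initialization
  int i = c->i;
  c->~C();
  return i;
}
```

Giving C a user-provided default constructor, as suggested above, removes the
zero-initialization step; C() would then leave i uninitialized unless that
constructor sets it.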

[Bug middle-end/112668] ICE in bitintlower0 while compiling bitint-42.c

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112668

--- Comment #1 from Jakub Jelinek  ---
Reduced:
/* PR middle-end/112668 */
/* { dg-do compile { target bitint } } */
/* { dg-options "-std=c23 -fnon-call-exceptions" } */

#if __BITINT_MAXWIDTH__ >= 495
struct T495 { _BitInt(495) a : 2; unsigned _BitInt(495) b : 471; _BitInt(495) c : 2; };
extern void foo (struct T495 *r495);

int
bar (void)
{
  struct T495 r495[12];
  foo (r495);
  for (int i = 0; i < 12; ++i)
    if (r495[i].b != 0uwb)
      return 1;
  return 0;
}
#endif

[Bug target/112669] GCN: wrong 'LIBRARY_PATH' in presence of several different '-march=[...]' flags

2023-11-22 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112669

--- Comment #1 from Thomas Schwinge  ---
Tracing through 'gcc/gcc.cc': 'build_search_list' -> 'for_each_path', I find:

For '-march=gfx908', we have:

(gdb) print multilib_dir
$3 = 0x82e6c0 "gfx908"
(gdb) print multilib_os_dir
$4 = 0x82e6c0 "gfx908"

For '-march=gfx906 -march=gfx908', we have:

(gdb) print multilib_dir
$3 = 0x0
(gdb) print multilib_os_dir
$4 = 0x0

These are:

/* Subdirectory to use for locating libraries.  Set by
   set_multilib_dir based on the compilation options.  */

static const char *multilib_dir;

/* Subdirectory to use for locating libraries in OS conventions.  Set by
   set_multilib_dir based on the compilation options.  */

static const char *multilib_os_dir;

Indeed, simpler:

$ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-multi-directory -march=gfx908
gfx908
$ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-multi-directory -march=gfx906 -march=gfx908
.

Instead of '.' (default), the latter should also print 'gfx908'.

I'll look into 'set_multilib_dir' etc.
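
For illustration, the expected "last -march wins" behaviour could be modelled
like this (a sketch only, not how gcc.cc selects the multilib; the function
name is made up, and it assumes the multilib directory is simply named after
the selected architecture, as in the gfx908 output above):

```cpp
#include <string>
#include <vector>

// Hypothetical model: return the value of the last -march= option,
// or "." (the default multilib directory) when none is given.
std::string expected_multi_directory(const std::vector<std::string> &args)
{
  const std::string prefix = "-march=";
  std::string dir = ".";
  for (const std::string &arg : args)
    if (arg.compare(0, prefix.size(), prefix) == 0)
      dir = arg.substr(prefix.size());
  return dir;
}
```

Called with { "-march=gfx906", "-march=gfx908" } this yields "gfx908", which is
what the second command above should print.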

[Bug libstdc++/110879] [14 Regression] Unnecessary reread from memory in a loop with std::vector

2023-11-22 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110879

--- Comment #4 from Jonathan Wakely  ---
Ah I think that's probably expected. In _M_realloc_insert (and now
_M_realloc_append) we have:

#if __cplusplus >= 201103L
        if _GLIBCXX17_CONSTEXPR (_S_use_relocate())
          {
            // Relocation cannot throw.
            __new_finish = _S_relocate(__old_start, __position.base(),
                                       __new_start, _M_get_Tp_allocator());
            ++__new_finish;
            __new_finish = _S_relocate(__position.base(), __old_finish,
                                       __new_finish, _M_get_Tp_allocator());
          }
        else
#endif

and then an alternative path used for non-trivial types and for C++98. That
alternative path does more work and probably can't be optimized as well, so the
reads from _M_end_of_storage aren't optimized out.

I think we can just use { target c++11 } for the test.

[Bug c++/112642] ranges::fold_left tries to access inactive union member of string in constant expression

2023-11-22 Thread miro.palmu at helsinki dot fi via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112642

--- Comment #7 from Miro Palmu  ---
(In reply to Jonathan Wakely from comment #6)
> The examples in comment 4 do compile using libstdc++ on clang, if you use
> libstdc++ headers from after sept 29 (for trunk) or oct 21 (for gcc-13).

I was testing this on compiler explorer on clang 17.0.1 and it used gcc-13.2.0
libstdc++. Also tried it locally with clang 16.0.6 with gcc-13.2.1 libstdc++

Output:

$ cat prog.cpp 

#include 
#include 
int main() {
[](std::string s = {}) consteval {
std::string ss{ std::move(s) };
}();
}

$ clang prog.cpp -std=c++2b -stdlib=libstdc++

prog.cpp:4:5: error: call to consteval function 'main()::(anonymous
class)::operator()' is not a constant expression
[](std::string s = {}) consteval {
^
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/stl_construct.h:97:14:
note: construction of subobject of member '_M_local_buf' of union with no
active member is not allowed in a constant expression
{ return ::new((void*)__location) _Tp(std::forward<_Args>(__args)...); }
 ^
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/char_traits.h:272:6:
note: in call to 'construct_at(_M_local_buf[0], s.._M_local_buf[0])'
std::construct_at(__s1 + __i, __s2[__i]);
^
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/char_traits.h:443:11:
note: in call to 'copy(_M_local_buf[0], _M_local_buf[0], 1)'
  return __gnu_cxx::char_traits::copy(__s1, __s2, __n);
 ^
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.h:672:6:
note: in call to 'copy(_M_local_buf[0], _M_local_buf[0], 1)'
traits_type::copy(_M_local_buf, __str._M_local_buf,
^
prog.cpp:5:21: note: in call to 'basic_string(s)'
std::string ss{ std::move(s) };
^
prog.cpp:4:5: note: in call to '&[](std::string s) {
std::string ss{std::move(s)};
}->operator()(}}, _M_local_buf[0]}, 0, {._M_local_buf = {0, 0, 0, 0, 0,
0, 0, 0, 0, 0, ...}}})'
[](std::string s = {}) consteval {
^
1 error generated.

[Bug sanitizer/112644] [14 Regression] Some of the hwasan testcase fail after the recent merge

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112644

--- Comment #5 from Jakub Jelinek  ---
Thanks.

[Bug sanitizer/112644] [14 Regression] Some of the hwasan testcase fail after the recent merge

2023-11-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112644

--- Comment #4 from Tamar Christina  ---
I've asked Matthew to take a look since he wrote the initial support.

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #6 from JuzheZhong  ---
Hi, the following run FAILs remain on RV32/RV64 C/C++ after this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637753.html


FAIL: gcc.dg/vect/pr65518.c -flto -ffat-lto-objects execution test
I don't have a quick solution for this case, so I filed a PR here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112670

This FAIL may need Robin's help.

Another FAIL is 
FAIL: gcc.dg/torture/pr58955-2.c   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  execution test

Li Pan from Intel will handle this FAIL.

So I am going to move on to zvl1024b.

By the way, could you run zvl2048b and zvl4096b (for now VLEN can be at most
4096 bits)? I didn't see a PR for these two.

Thanks.

[Bug c/112670] New: RISC-V: Run fail on pr65518.c with -flto

2023-11-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112670

Bug ID: 112670
   Summary: RISC-V: Run fail on pr65518.c with -flto
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Case:

#include <assert.h>

#if VECTOR_BITS > 256
#define NINTS (VECTOR_BITS / 32)
#else
#define NINTS 8
#endif

#define N (NINTS * 2)
#define RESULT (NINTS * (NINTS - 1) / 2 * N + NINTS)

typedef struct giga
{
  unsigned int g[N];
} giga;

unsigned long __attribute__((noinline,noclone))
addfst(giga const *gptr, int num)
{
  unsigned int retval = 0;
  int i;
  for (i = 0; i < num; i++)
    retval += gptr[i].g[0];
  return retval;
}

int main ()
{
  struct giga g[NINTS];
  unsigned int n = 1;
  int i, j;
  for (i = 0; i < NINTS; ++i)
    for (j = 0; j < N; ++j)
      {
        g[i].g[j] = n++;
        __asm__ volatile ("");
      }
  assert (addfst (g, NINTS) == RESULT);
  return 0;
}

With -march=rv64gcv_zvfh_zfh_zvl512b -mabi=lp64d -O3 -fno-vect-cost-model
the run passes.

However, with -march=rv64gcv_zvfh_zfh_zvl512b -mabi=lp64d -O3
-fno-vect-cost-model -flto, execution fails:
bbl loader
assertion "addfst (g, NINTS) == RESULT" failed: file "bug.c", line 38,
function: main

I compared the codegen for both cases and it is exactly the same, so I can't
figure out what the problem is.
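
For reference, the expected value can be checked by hand. main() fills the
array with consecutive integers starting at 1, so g[i].g[0] == i * N + 1, and
summing that over i = 0 .. NINTS-1 gives N * NINTS * (NINTS - 1) / 2 + NINTS,
which is what RESULT encodes. A minimal scalar sketch that recomputes this
without any vectorization follows; the helper names expected_sum,
result_macro and check_result are hypothetical and not part of the test case.

```c
#include <assert.h>

/* Hypothetical helper, not part of pr65518.c: number the cells of an
   nints x (2 * nints) table 1, 2, 3, ... in row-major order, as main()
   does, and sum the first cell of each row, as addfst() does.  */
static unsigned int
expected_sum (unsigned int nints)
{
  unsigned int n = 2 * nints;   /* N in the test case */
  unsigned int next = 1;
  unsigned int sum = 0;
  for (unsigned int i = 0; i < nints; ++i)
    for (unsigned int j = 0; j < n; ++j)
      {
        if (j == 0)
          sum += next;          /* g[i].g[0] == i * n + 1 */
        next++;
      }
  return sum;
}

/* Closed form used by the RESULT macro.  */
static unsigned int
result_macro (unsigned int nints)
{
  unsigned int n = 2 * nints;
  return nints * (nints - 1) / 2 * n + nints;
}

/* Hypothetical check that the brute-force sum and the closed form agree.  */
static void
check_result (unsigned int nints)
{
  assert (expected_sum (nints) == result_macro (nints));
}
```

With NINTS == 8 (the value used when VECTOR_BITS is not above 256) both
functions give 456, so that is the value addfst (g, NINTS) has to return for
the assertion to hold.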

[Bug target/112669] New: GCN: wrong 'LIBRARY_PATH' in presence of several different '-march=[...]' flags

2023-11-22 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112669

Bug ID: 112669
   Summary: GCN: wrong 'LIBRARY_PATH' in presence of several
different '-march=[...]' flags
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tschwinge at gcc dot gnu.org
CC: ams at gcc dot gnu.org, jules at gcc dot gnu.org
  Target Milestone: ---
Target: GCN

I've run into a weird issue when several different '-march=[...]' flags appear.
 This causes linking to fail: the linker tries to link in the wrong multilib's
libraries.  This happens, for example, if the user provides '-march=[...]' for
libgomp offloading testing, but a test case also specifies a specific
'-march=[...]'.

The problem might perhaps be in GCN multilib setup, however it doesn't seem
related to the recent changes ("amdgcn: deprecate Fiji device and multilib"),
as I'm also reproducing the issue with previous GCC release branches.

The issue -- I suppose -- boils down to:

No '-march=[...]' flag appears, default paths:

$ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-search-dirs | sed -n -e "/^libraries: =/{s%[^=]\+=%%;s%$PWD%[...]%g;s%:%\n%g;p}" > default
$ cat < default 
   
[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/
[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/
   
[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/../../../../../../amdgcn-amdhsa/lib/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/
   
[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/../../../../../../amdgcn-amdhsa/lib/
   
[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../amdgcn-amdhsa/lib/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/
[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../amdgcn-amdhsa/lib/

If one '-march=[...]' flag appears, we get those multilib paths prepended:

$ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-search-dirs -march=gfx906 | sed -n -e "/^libraries: =/{s%[^=]\+=%%;s%$PWD%[...]%g;s%:%\n%g;p}" > gfx906
$ diff -U1 default gfx906
--- default 2023-11-22 11:47:14.021018613 +0100
+++ gfx906  2023-11-22 11:47:21.856931965 +0100
@@ -1 +1,7 @@
   
+[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/gfx906/
+[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/gfx906/
   
+[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/../../../../../../amdgcn-amdhsa/lib/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/gfx906/
   
+[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/../../../../../../amdgcn-amdhsa/lib/gfx906/
   
+[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../amdgcn-amdhsa/lib/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/gfx906/
+[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../amdgcn-amdhsa/lib/gfx906/

[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/

Similarly, if the same '-march=[...]' flag appears twice, we get those multilib
paths prepended:

$ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-search-dirs -march=gfx908 -march=gfx908 | sed -n -e "/^libraries: =/{s%[^=]\+=%%;s%$PWD%[...]%g;s%:%\n%g;p}" > gfx908
$ diff -U1 default gfx908
--- default 2023-11-22 11:47:14.021018613 +0100
+++ gfx908  2023-11-22 11:47:34.760789347 +0100
@@ -1 +1,7 @@
   
+[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/gfx908/
+[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/gfx908/
   
+[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/../../../../../../amdgcn-amdhsa/lib/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/gfx908/
   
+[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/../../../../../../amdgcn-amdhsa/lib/gfx908/
   
+[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../amdgcn-amdhsa/lib/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/gfx908/
+[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../amdgcn-amdhsa/lib/gfx908/

[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/

However, if several different '-march=[...]' flags appear, we're back to the
default:

$ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-search-dirs -march=gfx906 -march=gfx908 | sed -n -e "/^libraries: =/{s%[^=]\+=%%;s%$PWD%[...]%g;s%:%\n%g;p}" > gfx906,gfx908
$ cmp default gfx906,gfx908 && echo 'no difference'
no difference
$ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-search-dirs -march=gfx908
-march=gfx906 | sed -n -e 

[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns

2023-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

Richard Biener  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #7 from Richard Biener  ---
I think a return of a negative value is predicted to be cold (aka "error"):

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  if (c == 14)
goto ; [INV]
  else
goto ; [INV]
;;succ:   3
;;4

;;   basic block 3, loop depth 0
;;pred:   2
  D.2771 = -9;
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

[Bug middle-end/112653] We should optimize memmove to memcpy using alias oracle

2023-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653

--- Comment #4 from Richard Biener  ---
We do use the alias oracle in folding memmove:

  /* If the destination and source do not alias optimize into
     memcpy as well.  */
  if ((is_gimple_min_invariant (dest)
       || TREE_CODE (dest) == SSA_NAME)
      && (is_gimple_min_invariant (src)
          || TREE_CODE (src) == SSA_NAME))
    {
      ao_ref destr, srcr;
      ao_ref_init_from_ptr_and_size (&destr, dest, len);
      ao_ref_init_from_ptr_and_size (&srcr, src, len);
      if (!refs_may_alias_p_1 (&destr, &srcr, false))
        {
          tree fn;
          fn = builtin_decl_implicit (BUILT_IN_MEMCPY);
          if (!fn)
            return false;

but the issue is that test2 escapes which makes this conflict:

  # PT = null { D.2775 } (escaped, escaped heap)
  # ALIGN = 8, MISALIGN = 0
  # USE = nonlocal escaped
  # CLB = nonlocal escaped
  test2_4 = __builtin_malloc (1000);
  # PT = nonlocal escaped null
  test.0_1 = test;
  __builtin_memmove (test2_4, test.0_1, 1000);

it works for

char *test, *test3;
void
copy_test ()
{
  char *test2 = __builtin_malloc (1000);
  __builtin_memmove (test2, test, 1000);
  __builtin_memmove (test3, test2, 1000);
  __builtin_free (test2);
}

where both memmove calls become memcpy.  So this isn't asking for better
folding but for better pointer analysis I guess.
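
The property the fold depends on can be illustrated at run time. memcpy is
only a valid replacement for memmove when the two len-byte ranges cannot
overlap; the alias oracle has to prove that statically, whereas the sketch
below simply compares addresses. The helper names ranges_overlap and
copy_bytes are hypothetical, and this is not how GCC implements the fold,
only a model of the condition it needs to establish.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if [a, a+len) and [b, b+len) share a byte.
   Addresses are compared as integers to avoid relational comparison of
   pointers into different objects.  */
static int
ranges_overlap (const void *a, const void *b, size_t len)
{
  uintptr_t ua = (uintptr_t) a;
  uintptr_t ub = (uintptr_t) b;
  if (len == 0)
    return 0;
  return ua < ub ? ub - ua < len : ua - ub < len;
}

/* Hypothetical wrapper: use memcpy only when the ranges are known to be
   disjoint, otherwise fall back to memmove.  */
static void *
copy_bytes (void *dest, const void *src, size_t len)
{
  assert (dest != NULL && src != NULL);
  if (!ranges_overlap (dest, src, len))
    return memcpy (dest, src, len);
  return memmove (dest, src, len);
}
```

In the example above the compiler can make the equivalent decision at compile
time only when points-to analysis shows that the malloc'ed block and the
other pointer cannot refer to the same memory.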

[Bug target/112611] LoongArch: Test cases lsx-vshuf.c and lasx-xvshuf_b.c fails on LA664

2023-11-22 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112611

--- Comment #4 from Xi Ruoyao  ---
(In reply to Jiahao Xu from comment #3)

> We now consider it as undefined behavior rather than a bug for [x]vshuf
> instructions. In vec_perm pattern, we use vector logical AND instructions to
> perform modulo operations in order to correctly use the [x]vshuf
> instructions. Therefore, we have decided to rewrite the two tests and ensure
> that the index values in the selector do not exceed 64.

I guess it would be better to also document this issue somewhere (extend.texi?)
and recommend just using __builtin_shuffle instead of the intrinsic (unless the
programmer knows the AND operation is not needed but the compiler does not).
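
To illustrate the masking discussed above: for a two-operand permute of
N-element vectors a selector index is only meaningful modulo 2 * N, and when
2 * N is a power of two that reduction is a single AND. The scalar sketch
below models this for 16 byte elements. The function name perm_bytes, the
element count, and the choice that low indices pick from the first operand
(the __builtin_shuffle convention) are assumptions made for illustration; it
is neither the [x]vshuf instruction semantics nor the actual vec_perm
expansion.

```c
#include <assert.h>
#include <stddef.h>

#define NELTS 16  /* assumed element count, for illustration only */

/* Hypothetical scalar model of a two-operand byte permute.  Each selector
   value is reduced modulo 2 * NELTS with an AND (valid because 2 * NELTS is
   a power of two); indices 0..NELTS-1 pick from a, the rest pick from b.
   The output must not alias either input.  */
static void
perm_bytes (unsigned char *out, const unsigned char *a,
            const unsigned char *b, const unsigned char *sel)
{
  assert (out != a && out != b);
  for (size_t i = 0; i < NELTS; ++i)
    {
      unsigned int idx = sel[i] & (2 * NELTS - 1);
      out[i] = idx < NELTS ? a[idx] : b[idx - NELTS];
    }
}
```

With this model an out-of-range selector value such as 32 wraps around to
element 0 of the first operand instead of producing an unspecified result,
which is the effect the AND has in the vec_perm pattern.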
