[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #31 from Richard Biener --- (In reply to Patrick J. LoPresti from comment #29) > (In reply to Jakub Jelinek from comment #27) > > > > No, that is not a reasonable fix, because it severely pessimizes common code > > for a theoretical only problem. > > The very existence of (and interest in) this bug report means it is > obviously not "a theoretical only problem". > > And of course Rich Felker is correct that the cost of the obvious fix is > trivial and not remotely "severe". But I didn't see a patch proposed to address this issue, which means it doesn't seem to be trivial. > But the bottom line is that GCC is emitting library calls that invoke > undefined behavior. At a minimum, GCC should document this non-standard > requirement on its runtime environment. Has anyone bothered to do that? Why > not? I think it's written down somewhere but I can't quickly find it (I also wonder where exactly the best place to document would be - it's related to porting GCC to a new target architecture I guess, not so much user-facing). OTOH I see @cindex @code{cpymem@var{m}} instruction pattern @item @samp{cpymem@var{m}} ... The @code{cpymem@var{m}} patterns need not give special consideration to the possibility that the source and destination strings might overlap. These patterns are used to do inline expansion of @code{__builtin_memcpy}. which is possibly the closest piece we have and which fails to mention exact overlap. I'll propose an adjustment to this.
[Bug c/112676] New: [14 regression] ICE in extract_insn, at recog.cc:2804
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112676 Bug ID: 112676 Summary: [14 regression] ICE in extract_insn, at recog.cc:2804 Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: manuel.lauss at googlemail dot com Target Milestone: --- Created attachment 56669 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56669=edit compressed unreduced testcase gcc version 14.0.0 20231123 (experimental) 9d912820d02c7396676e04c4c05f6a0fdd92ed85 This is very recent, on linux g9b6de136: $ gcc -mno-avx -march=znver4 -O2 -c dcn32_fpu.i /usr/src/linux.git/drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c: In function 'dcn32_internal_validate_bw': /usr/src/linux.git/drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2223:1: error: unrecognizable insn: 2223 | } | ^ (insn 1628 1627 1629 277 (set (reg:V16QI 1102) (xor:V16QI (reg:V16QI 1101) (mem:V16QI (reg:DI 1100) [0 MEM [(void *)stream_817 + 608B]+0 S16 A8]))) "/usr/src/linux.git/drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c":1350:7 -1 (nil)) during RTL pass: vregs /usr/src/linux.git/drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2223:1: internal compiler error: in extract_insn, at recog.cc:2804 Omitting either "-march=znver4" or "-mno-avx" gets rid of it. Thanks! Manuel
[Bug target/112675] New: [14 Regression] r14-5385-g0a140730c97087 caused regression on testcases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112675 Bug ID: 112675 Summary: [14 Regression] r14-5385-g0a140730c97087 caused regression on testcases Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: haochen.jiang at intel dot com Target Milestone: --- As shown in gcc-regression: https://gcc.gnu.org/pipermail/gcc-regression/2023-November/078504.html The guilty commit for some regressions is r14-5385-g0a140730c97087. An easy reproducer would be: make check RUNTESTFLAGS="dg-torture.exp=gcc.dg/torture/fp-int-convert-timode.c --target_board='unix{-m64\ -march=cascadelake,-m32\ -march=cascadelake,-m32,-m64}'"
[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661 --- Comment #12 from Richard Biener --- Created attachment 56668 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56668=edit patch (not working) So this tries this, moving the duplicate-and-interleave check and changing code generation. It seems though that gimple_build_vector_from_val only uses VEC_DUPLICATE_EXPR for non-constants but tree-vector-builder doesn't like to build the uniform constant and we ICE:
internal compiler error: in finalize, at vector-builder.h:513
0x1e36958 vector_builder::finalize()
        /space/rguenther/src/gcc/gcc/vector-builder.h:513
0x1e36598 tree_vector_builder::build()
        /space/rguenther/src/gcc/gcc/tree-vector-builder.cc:42
0x15dc80a gimple_build_vector(gimple_stmt_iterator*, bool, gsi_iterator_update, unsigned int, tree_vector_builder*)
        /space/rguenther/src/gcc/gcc/gimple-fold.cc:9256
0x1ddb2e7 gimple_build_vector(gimple**, tree_vector_builder*)
        /space/rguenther/src/gcc/gcc/gimple-fold.h:241
0x1e0d6f5 vect_create_constant_vectors
        /space/rguenther/src/gcc/gcc/tree-vect-slp.cc:8261
that's the assert
508 void
509 vector_builder::finalize ()
510 {
511   /* The encoding requires the same number of elements to come from each
512      pattern. */
513   gcc_assert (multiple_p (m_full_nelts, m_npatterns));
I can of course try to manually build a VEC_DUPLICATE here but I wonder if we're on the right track here.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #30 from post+gcc at ralfj dot de --- There have been several assertions above that a certain way to solve this either has no performance cost at all or severe performance cost. That sounds like we are missing data -- ideally, someone would benchmark the actual cost of emitting that branch. It seems kind of pointless to just make assertions about the impact of this change without real data. > On the other hand, expecting the libc memcpy to make this check greatly > pessimizes every reasonable small use of memcpy with a gratuitous branch for > what is undefined behavior and should never appear in any valid program. I don't think this is true. As far as I can see, the performance impact of having memcpy support the src==dest case is zero -- the assembly generated by the current implementations already supports that case. (At least I have not seen any evidence to the contrary.) No new check in memcpy is required.
[Bug target/112643] [14 regression] including x86intrin.h is broken for -march=native (which adds -mno-avx10.1-256 )
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112643 --- Comment #25 from urs at akk dot org --- (In reply to Haochen Jiang from comment #24) > Patch aims to fix that: > > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637865.html Yes, that solved the issue for me. Thanks.
[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #11 from Richard Biener --- OK, I'll give that a try then.
[Bug target/112643] [14 regression] including x86intrin.h is broken for -march=native (which adds -mno-avx10.1-256 )
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112643 --- Comment #24 from Haochen Jiang --- Patch aims to fix that: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637865.html
[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598 --- Comment #8 from Li Pan --- For gcc.dg/torture/pr58955-2.c, we can simply reproduce it with the following options:
Pass when: -O3
Pass when: -O3 -ftracer -fno-schedule-insns -fno-schedule-insns2
Fail when: -O3 -ftracer -fno-schedule-insns2
   10154: 4409      li s0,2
   10156: 9c1d      subw s0,s0,a5
   10158: 1402      sll s0,s0,0x20
   1015a: 9001      srl s0,s0,0x20
   1015c: 97ca      add a5,a5,s2
   1015e: 078a      sll a5,a5,0x2
   10160: 7b018493  add s1,gp,1968 # 13400
   10164: 97a6      add a5,a5,s1
   10166: 00241613  sll a2,s0,0x2
   1016a: 853e      mv a0,a5
   1016c: 4581      li a1,0
   1016e: 158000ef  jal 102c6
   10172: ffc50793  add a5,a0,-4
   10176: 4689      li a3,2
   10178: 0d047057  vsetvli zero,s0,e32,m1,ta,ma
   1017c: 40d8      lw a4,4(s1)      <== Load
   1017e: 5e00b0d7  vmv.v.i v1,1
   10182: 74d1a423  sw a3,1864(gp) # 13398
   10186: 0207e0a7  vse32.v v1,(a5)  <== Store
   1018a: 03271163  bne a4,s2,101ac
It looks like the tracer and sch1 together resulted in the failure; it is a typical load-before-store issue AFAIK. Semantically, the lw load should come after the vse32 store, but sch1 moves it before the store, so the value of a4 is unexpected here.
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 --- Comment #5 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #4) > > I think > Value_Range vr (operand_type); > if (TREE_CODE_CLASS (operation) == tcc_unary) > ipa_vr_operation_and_type_effects (vr, >src_lats->m_value_range.m_vr, >operation, param_type, >operand_type); > should be avoided if param_type is not a compatible type to operand_type, > unless operation is some cast operation (NOP_EXPR, CONVERT_EXPR, dunno if > the float to integral or vice versa ops as well but vrp probably doesn't > handle that yet). > In the above case, param_type is struct A *, i.e. pointer, while > operand_type is int. the root of the issue is that the precisions are different, and we're invoking an operation which expects the precisions to be the same (minus in this case). we can't deal with this in dispatch because some operations allow the LH and RH to be different precisions or even types. It also seems like overkill to have every operation check the incoming precision, but perhaps not... we could limit it to the wi_fold() subsets.. let me have a look. if we get incompatible types, perhaps returning VARYING should be OK?
[Bug c++/112642] ranges::fold_left tries to access inactive union member of string in constant expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112642 --- Comment #10 from Jonathan Wakely --- (In reply to Miro Palmu from comment #9) > Mine is 13.2.1 20230801 so way before Oct 21. (I did not know there were > different snapshots of the releases, I'm just a user trying to help :) ) 13.2.1 (and any x.y.1 version) is not a release, it's a snapshot made from a branch between releases. See https://gcc.gnu.org/develop.html#num_scheme for more details. Releases end with a .0 number. > > Anyway, the original GCC error is the same as PR 112642 > > You probably mean PR 110158 Oops! I meant PR 111258
[Bug c++/110734] Attributes cannot be applied to asm statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110734 --- Comment #5 from Julian Waters --- Note: Trying this with a top level asm gives me:
$ g++ -O3 -flto=auto -std=c++14 -pedantic -Wpedantic -fno-omit-frame-pointer exceptions.cpp
exceptions.cpp:8:1: error: expected unqualified-id before 'asm'
    8 | asm ("nop");
      | ^~~
So while it seems the errors are different, it fundamentally is the same issue.
[Bug target/112672] [14 Regression] wrong code with __builtin_parityl() at -O and above on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112672 --- Comment #3 from Andrew Pinski --- parityhi2 should have: rtx extra = gen_reg_rtx (HImode); emit_move_insn (extra, operands[1]); emit_insn (gen_parityhi2_cmp (extra)); Or something similar because parityqi2_cmp clobbers its argument.
[Bug tree-optimization/112464] [14 Regression] ICE avx512 with -ftrapv since r14-5076
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112464 Robin Dapp changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #8 from Robin Dapp --- Fixed.
[Bug target/112672] [14 Regression] wrong code with __builtin_parityl() at -O and above on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112672 Andrew Pinski changed: What|Removed |Added CC||uros at gcc dot gnu.org --- Comment #2 from Andrew Pinski --- Actually it has been wrong since r11-1027-gf08995eefbf579 . Just exposed by Jakub's parity improvement: r14-5557-g6dd4c703be17fa .
[Bug middle-end/112336] fsanitize=address vs _BitInt with a non-mode size (smaller than max mode size)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112336 Jakub Jelinek changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #4 from Jakub Jelinek --- Created attachment 56667 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56667=edit gcc14-pr112336.patch Untested fix.
[Bug target/112445] [14 Regression] ICE: in lra_split_hard_reg_for, at lra-assigns.cc:1861 unable to find a register to spill: {*umulditi3_1} with -O -march=cascadelake -fwrapv since r14-4968-g89e5d90
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112445 --- Comment #6 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #4) > I think this goes wrong during combine. Combine does not / should not combine moves from hard registers just because of extending register live range. It looks that this should also include zero-extracts and other "pseudo-move" instructions. The relevant patch and discussion is at [1]. [1] https://gcc.gnu.org/legacy-ml/gcc-patches/2018-10/msg01356.html
[Bug target/112672] New: [14 Regression] wrong code with __builtin_parityl() at -O and above on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112672 Bug ID: 112672 Summary: [14 Regression] wrong code with __builtin_parityl() at -O and above on x86_64 Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: wrong-code Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: zsojka at seznam dot cz Target Milestone: --- Host: x86_64-pc-linux-gnu Target: x86_64-pc-linux-gnu Created attachment 5 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=5=edit reduced testcase
Output:
$ x86_64-pc-linux-gnu-gcc -O testcase.c
$ ./a.out
Aborted
$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r14-5761-20231122145100-ge9b39df9333-checking-yes-rtl-df-extra-nobootstrap-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --disable-bootstrap --with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld --with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-r14-5761-20231122145100-ge9b39df9333-checking-yes-rtl-df-extra-nobootstrap-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.0 20231122 (experimental) (GCC)
At the asm output, the problem is obvious:
main:
# testcase.c:8: u *= g;
        movzx eax, WORD PTR g[rip] # tmp110, g
        sal eax, 2 # u,
# testcase.c:9: return u + __builtin_parityl (u);
        xor al, ah # u <== THIS OVERWRITES "u" in eax
        setnp dl #, tmp105
        movzx edx, dl # tmp105, tmp105
# testcase.c:9: return u + __builtin_parityl (u);
        add eax, edx # tmp107, tmp105 <== THIS READ "u", but it has been lost
# testcase.c:16: if (x != 4 * 254 + 1)
        cmp ax, 1017 # tmp107,
        jne .L6 #,
# testcase.c:19: }
        mov eax, 0 #,
        ret
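For reference, the value the testcase expects can be reproduced with a portable parity routine. The following is only a minimal sketch, not the attached testcase: the names parity16_ref and add_parity are hypothetical, and the input 4 * 254 is taken from the "x != 4 * 254 + 1" comment in the asm above. It keeps the original value of u alive while the parity is computed, which is exactly what the miscompiled sequence fails to do.

```c
#include <assert.h>

/* Hypothetical reference helper: parity of the low 16 bits of x,
   i.e. 1 when an odd number of those bits are set.  */
int parity16_ref(unsigned x)
{
    x &= 0xffffu;
    x ^= x >> 8;   /* fold the high byte into the low byte (cf. "xor al, ah") */
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (int)(x & 1u);
}

/* Hypothetical stand-in for "return u + __builtin_parityl (u);".
   The folding above works on a copy, so u is still intact for the add.  */
unsigned add_parity(unsigned u)
{
    int p = parity16_ref(u);
    assert(p == 0 || p == 1);
    return (u + (unsigned)p) & 0xffffu;
}
```

With u = 4 * 254 = 1016 (binary 1111111000, seven bits set) the parity is 1, so the sum is 1017, matching the constant in the cmp instruction.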
[Bug target/112672] [14 Regression] wrong code with __builtin_parityl() at -O and above on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112672 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2023-11-23 Ever confirmed|0 |1 --- Comment #1 from Andrew Pinski --- Obviously this is wrong:
;; _5 = .PARITY (u_4);
(insn 7 6 8 (parallel [ (set (reg:CC 17 flags) (unspec:CC [ (reg/v:HI 99 [ uD.2808 ]) ] UNSPEC_PARITY)) (clobber (reg/v:HI 99 [ uD.2808 ])) ]) "/app/example.cpp":9:32 -1 (nil))
...
;; if (_7 != 1017)
(insn 11 10 12 (parallel [ (set (reg:HI 107) (plus:HI (reg/v:HI 99 [ uD.2808 ]) (subreg:HI (reg:SI 100 [ _5 ]) 0))) (clobber (reg:CC 17 flags)) ]) "/app/example.cpp":9:34 discrim 1 -1 (nil))
[Bug target/112672] [14 Regression] wrong code with __builtin_parityl() at -O and above on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112672 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |14.0 CC||pinskia at gcc dot gnu.org
[Bug target/112592] FAIL: c-c++-common/pr111309-1.c -std=gnu++14 (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:216)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112592 John David Anglin changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #3 from John David Anglin --- Fixed.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #29 from Patrick J. LoPresti --- (In reply to Jakub Jelinek from comment #27) > > No, that is not a reasonable fix, because it severely pessimizes common code > for a theoretical only problem. The very existence of (and interest in) this bug report means it is obviously not "a theoretical only problem". And of course Rich Felker is correct that the cost of the obvious fix is trivial and not remotely "severe". But the bottom line is that GCC is emitting library calls that invoke undefined behavior. At a minimum, GCC should document this non-standard requirement on its runtime environment. Has anyone bothered to do that? Why not?
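To make the situation under discussion concrete, here is a minimal sketch of a block copy with exact overlap. The type and function names (struct big, assign_big) are hypothetical and not taken from the bug report. The C rules for simple assignment permit the source and destination to be the same object, so the call shown in the usage note is valid C, yet per the bug title the assignment may be expanded as a memcpy call whose arguments overlap exactly.

```c
/* Hypothetical aggregate, large enough that a compiler is likely to
   emit a library call rather than inline loads and stores.  */
struct big { char bytes[256]; };

/* Plain struct assignment.  Nothing here rules out dst == src.  */
void assign_big(struct big *dst, const struct big *src)
{
    *dst = *src;
}
```

Calling assign_big(&b, &b) is the exact-overlap case: the source program is well defined, and any undefined behavior would only come from the memcpy call the compiler chooses to emit.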
[Bug debug/112674] New: [14 Regression] Compare-debug failure after recent change on c6x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112674 Bug ID: 112674 Summary: [14 Regression] Compare-debug failure after recent change on c6x Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: law at gcc dot gnu.org Target Milestone: --- This patch: commit 6bf66276e3e41d5d92f7b7260e98b6a111653805 Author: Richard Biener Date: Wed Nov 22 11:10:41 2023 +0100 tree-optimization/112344 - wrong final value replacement When performing final value replacement chrec_apply that's used to compute the overall effect of niters to a CHREC doesn't consider that the overall increment of { -2147483648, +, 2 } doesn't fit in a signed integer when the loop iterates until the value of the IV of 20. The following fixes this mistake, carrying out the multiply and add in an unsigned type instead, avoiding undefined overflow and thus later miscompilation by path range analysis. PR tree-optimization/112344 * tree-chrec.cc (chrec_apply): Perform the overall increment calculation and increment in an unsigned type. * gcc.dg/torture/pr112344.c: New testcase. Is causing a compare-debug failure on the c6x port: c6x-sim: gcc.dg/pr65779.c (test for excess errors) I haven't dug into this any deeper. It could well be a c6x bug in the end. While it may sound similar to pr109777, pr109777 has been debugged far enough to lay the blame on the bfin backend.
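The arithmetic described in the quoted commit message can be illustrated outside the compiler. This is only a sketch of the idea (do the multiply and add in an unsigned type), not the chrec_apply code; the function name chrec_final_value is hypothetical, a 32-bit int is assumed, and the iteration count 1073741834 is derived here from the { -2147483648, +, 2 } IV reaching 20.

```c
#include <limits.h>

/* Hypothetical helper: final value of an induction variable
   { base, +, step } after niters increments.  The overall increment
   step * niters may not fit in int, so it is computed in unsigned
   arithmetic, where wraparound is well defined.  */
int chrec_final_value(int base, int step, unsigned niters)
{
    unsigned total = (unsigned)step * niters;   /* wraps modulo 2^N */
    unsigned result = (unsigned)base + total;   /* wraps modulo 2^N */
    /* Converting back is fine when the true final value fits in int;
       out-of-range results would be implementation-defined.  */
    return (int)result;
}
```

For base INT_MIN and step 2, reaching 20 takes (20 + 2147483648) / 2 = 1073741834 increments. The overall increment 2147483668 exceeds INT_MAX, so a signed multiply would overflow, while the unsigned computation wraps around to 20.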
[Bug target/112445] [14 Regression] ICE: in lra_split_hard_reg_for, at lra-assigns.cc:1861 unable to find a register to spill: {*umulditi3_1} with -O -march=cascadelake -fwrapv since r14-4968-g89e5d90
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112445 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org, ||segher at gcc dot gnu.org, ||uros at gcc dot gnu.org, ||vmakarov at gcc dot gnu.org --- Comment #4 from Jakub Jelinek --- I think this goes wrong during combine. Before combine, we have: (insn 8 7 10 2 (set (subreg:HI (reg:QI 152) 0) (zero_extract:HI (reg:HI 1 dx [ cu8_0 ]) (const_int 8 [0x8]) (const_int 8 [0x8]))) "pr112445.c":11:1 114 {*extzvhi} (expr_list:REG_DEAD (reg:HI 1 dx [ cu8_0 ]) (nil))) ... tons of insns including (insn 36 34 37 2 (parallel [ (set (reg:TI 142 [ _66 ]) (mult:TI (zero_extend:TI (reg:DI 171 [ cu8_0 ])) (zero_extend:TI (subreg:DI (reg:TI 104 [ _10 ]) 0 (clobber (reg:CC 17 flags)) ]) "pr112445.c":12:9 522 {*umulditi3_1} (expr_list:REG_DEAD (reg:DI 171 [ cu8_0 ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil ... (insn 41 38 44 2 (set (reg:DI 177 [ cu8_0+1 ]) (zero_extend:DI (reg:QI 152))) "pr112445.c":12:9 170 {zero_extendqidi2} (expr_list:REG_DEAD (reg:QI 152) (nil))) and combine merges insn 41 with insn 8 across 20 other insns: Trying 8 -> 41: 8: r152:QI#0=zero_extract(dx:HI,0x8,0x8) REG_DEAD dx:HI 41: r177:DI=zero_extend(r152:QI) REG_DEAD r152:QI Successfully matched this instruction: (set (reg:DI 177 [ cu8_0+1 ]) (zero_extract:DI (reg:DI 1 dx [ cu8_0 ]) (const_int 8 [0x8]) (const_int 8 [0x8]))) into: (insn 41 38 44 2 (set (reg:DI 177 [ cu8_0+1 ]) (zero_extract:DI (reg:DI 1 dx [ cu8_0 ]) (const_int 8 [0x8]) (const_int 8 [0x8]))) "pr112445.c":12:9 116 {*extzvdi} (expr_list:REG_DEAD (reg:HI 1 dx [ cu8_0 ]) (nil))) and by that it significantly extends the live range of rdx register, which is a single class register. Now insn 36 has constraints =r,A on output and %d,a on first input and rm,rm on second input, meaning that it either has %rdx:%rax destination (second alternative), or %rdx as one of the inputs, so when %rdx is live across it, it can't be reloaded. 
On that insn, the commit changed - (match_operand:DWIH 1 "nonimmediate_operand" "%d,0")) + (match_operand:DWIH 1 "register_operand" "%d,a")) on the constraints, is that something that LRA used to handle fine (how?)? Actually, in the r14-4967 reload dump I see: (insn 223 193 202 2 (set (mem/c:DI (plus:DI (reg/f:DI 7 sp) (const_int 40 [0x28])) [3 %sfp+-40 S8 A64]) (reg:DI 1 dx)) "pr112445.c":12:9 90 {*movdi_internal} (nil)) (insn 202 223 36 2 (set (reg:DI 0 ax [orig:142 _66 ] [142]) (mem/c:DI (reg/f:DI 7 sp) [3 %sfp+-80 S8 A128])) "pr112445.c":12:9 90 {*movdi_internal} (nil)) (insn 36 202 203 2 (parallel [ (set (reg:TI 0 ax [orig:142 _66 ] [142]) (mult:TI (zero_extend:TI (reg:DI 0 ax [orig:142 _66 ] [142])) (zero_extend:TI (reg:DI 37 r9 [orig:104 _10 ] [104] (clobber (reg:CC 17 flags)) ]) "pr112445.c":12:9 522 {*umulditi3_1} (nil)) (insn 203 36 224 2 (set (mem/c:TI (reg/f:DI 7 sp) [3 %sfp+-80 S16 A128]) (reg:TI 0 ax [orig:142 _66 ] [142])) "pr112445.c":12:9 89 {*movti_internal} (nil)) (insn 224 203 158 2 (set (reg:DI 1 dx) (mem/c:DI (plus:DI (reg/f:DI 7 sp) (const_int 40 [0x28])) [3 %sfp+-40 S8 A64])) "pr112445.c":12:9 90 {*movdi_internal} (nil)) so presumably LRA managed in that case to save and restore %rdx around it. Is the problem the 0->a change when operand 0 is A?
[Bug tree-optimization/112464] [14 Regression] ICE avx512 with -ftrapv since r14-5076
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112464 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #7 from Jakub Jelinek --- So, can this be closed as fixed?
[Bug c++/112633] [13/14 Regression] ICE with type aliases and depedent value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112633 --- Comment #5 from Hana Dusíková --- Thanks for really quick fix! You are awesome!
[Bug target/112670] RISC-V: Run fail on pr65518.c with -flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112670 --- Comment #1 from Robin Dapp --- The problem is exposed with the ipa copy propagation pass. I haven't narrowed it down yet but will continue tomorrow.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865 --- Comment #46 from CVS Commits --- The master branch has been updated by Jonathan Yong : https://gcc.gnu.org/g:4f1ebd54380e16927cd0085be939165870354eac commit r14-5768-g4f1ebd54380e16927cd0085be939165870354eac Author: Costas Argyris Date: Mon Nov 20 17:58:16 2023 + mingw: Exclude utf8 manifest [PR111170, PR108865] Make the utf8 manifest optional (on by default and explicitly off with --disable-win32-utf8-manifest) in the mingw hosts. Also eliminate duplication between the 32-bit and 64-bit mingw hosts by putting them both in the same branch and special-case only the 64-bit long long setting. PR mingw/111170 PR mingw/108865 Signed-off-by: Costas Argyris Signed-off-by: Jonathan Yong <10wa...@gmail.com> gcc/Changelog: * configure.ac: Handle new --enable-win32-utf8-manifest option. * config.host: allow win32 utf8 manifest to be disabled by user. * configure: Regenerate.
[Bug modula2/112506] gm2 test failures on x86_64-apple-darwin21
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112506 Gaius Mulley changed: What|Removed |Added Status|NEW |ASSIGNED --- Comment #4 from Gaius Mulley --- Thanks for the report - I suspect it is a duplicate of PR 111627.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #28 from Rich Felker --- > No, that is not a reasonable fix, because it severely pessimizes common code > for a theoretical only problem. Far less than a call to memmove (which necessarily has something comparable to that and other unnecessary branches) pessimizes it. I also disagree that it's severe. On basically any machine with branch prediction, the branch will be predicted correctly all the time and has basically zero cost. On the other hand, the branches in memmove could go different ways depending on the caller, so it's much more machine-capability-dependent whether they can be predicted. In some sense the optimal thing to do is "nothing", just assuming it would be hard to write a memcpy that fails on src==dest. However, at the very least this precludes hardened memcpy trapping on src==dest, which might be a useful hardening feature (or rather on a range test for overlapping, which would happen to also catch exact overlap). So it would be nice if it were fixed. FWIW, I don't think single branches are relevant to overall performance in cases where the compiler is doing something reasonable by emitting a call to memcpy to implement assignment. If the object is small enough that the branch is relevant, the call overhead is even more of a big deal, and it should be inlining loads/stores to perform the assignment.
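The branch being debated can be sketched at the source level as follows. This is an illustration of the idea only, not a proposed GCC patch: the wrapper name copy_allow_exact_overlap is hypothetical, and in the real discussion the compiler would emit the comparison itself before its memcpy call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: skip the library call when source and
   destination coincide, so memcpy never sees overlapping arguments
   in the exact-overlap case.  Partial overlap remains the caller's
   problem, exactly as with plain memcpy.  */
void *copy_allow_exact_overlap(void *dst, const void *src, size_t n)
{
    if (dst != src)
        memcpy(dst, src, n);
    return dst;
}
```

The comparison is a single, highly predictable branch in front of a call, which is the cost being weighed against calling memmove or leaving the requirement on the C library undocumented.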
[Bug target/111170] [13/14 regression] Malformed manifest does not allow to run gcc on Windows XP (Accessing a corrupted shared library) since r13-6552-gd11e088210a551
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111170 --- Comment #11 from CVS Commits --- The master branch has been updated by Jonathan Yong : https://gcc.gnu.org/g:4f1ebd54380e16927cd0085be939165870354eac commit r14-5768-g4f1ebd54380e16927cd0085be939165870354eac Author: Costas Argyris Date: Mon Nov 20 17:58:16 2023 + mingw: Exclude utf8 manifest [PR111170, PR108865] Make the utf8 manifest optional (on by default and explicitly off with --disable-win32-utf8-manifest) in the mingw hosts. Also eliminate duplication between the 32-bit and 64-bit mingw hosts by putting them both in the same branch and special-case only the 64-bit long long setting. PR mingw/111170 PR mingw/108865 Signed-off-by: Costas Argyris Signed-off-by: Jonathan Yong <10wa...@gmail.com> gcc/Changelog: * configure.ac: Handle new --enable-win32-utf8-manifest option. * config.host: allow win32 utf8 manifest to be disabled by user. * configure: Regenerate.
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 --- Comment #9 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #8) > Well, in this case the user explicitly told compiler not to do that by not > using a prototype and syntax which doesn't provide one from the definition. > It is like using > int f1 (struct C *x, struct A *y) > { > ... > } > definition in one TU, and > int f1 (int, int); > prototype and > f1 (0, ~x) > call in another one + using LTO. What I meant is how to decide if the > param_type vs. operand_type mismatch is ok or not. I vote we do nothing extra for those clowns! Just return VARYING for a range :-) It seems like the safest thing to do?
[Bug target/112669] GCN: wrong 'LIBRARY_PATH' in presence of several different '-march=[...]' flags
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112669 Thomas Schwinge changed: What|Removed |Added Last reconfirmed||2023-11-22 Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Assignee|unassigned at gcc dot gnu.org |tschwinge at gcc dot gnu.org
[Bug target/112617] [14 regression] ICE when building systemd on HPPA (internal compiler error: in find_reloads, at reload.cc:3839)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112617 John David Anglin changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #5 from John David Anglin --- Should be fixed now.
[Bug target/112445] [14 Regression] ICE: in lra_split_hard_reg_for, at lra-assigns.cc:1861 unable to find a register to spill: {*umulditi3_1} with -O -march=cascadelake -fwrapv since r14-4968-g89e5d90
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112445 --- Comment #7 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #5)
> Just changing
> --- i386.md.xx 2023-11-22 09:47:22.746637132 +0100
> +++ i386.md 2023-11-22 20:38:07.216218697 +0100
> @@ -9984,7 +9984,7 @@
> [(set (match_operand: 0 "register_operand" "=r,A")
> (mult:
> (zero_extend:
> - (match_operand:DWIH 1 "register_operand" "%d,a"))
> + (match_operand:DWIH 1 "register_operand" "%d,0"))
> (zero_extend:
> (match_operand:DWIH 2 "nonimmediate_operand" "rm,rm"
> (clobber (reg:CC FLAGS_REG))]
> makes the testcase pass. A question is how RA treats 0 constraint when the
> two operands have different modes, if it is basically the same as a in that
LRA treats it the same way as the reload pass. It is the same hard reg for an LE target. For BE they are different if they require a different number of hard regs.
> case, meaning that the first input operand will never be in %rdx even when
> the A constraint contains %rax and %rdx registers (but the double-word mode
> implies it must be low part in %rax high part in %rdx).
I looked at the testcase. It seems it can be fixed by a different placement of the splitting insns. So I believe the bug will remain, and can become latent, if we fix the PR some other way. I'll start to work on this bug on Monday as I will be absent the next two days.
[Bug middle-end/112510] [11/12/13/14 Regression]: ASAN code injection breaks alignment of stack variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112510 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #16 from Jakub Jelinek --- Can't reproduce, neither with GCC 12 nor current trunk. In the ASAN_OPTIONS=detect_stack_use_after_return=1 case, the stack frames are allocated by __asan_stack_malloc_4, but that seems to return sufficiently aligned frames for me (even though the routine doesn't have an argument to request a particular alignment). Even tried
struct __attribute__((aligned (64))) S { char buf[64]; };
__attribute__((noinline, noclone, noipa)) void bar (struct S *p, char *a) { if ((__UINTPTR_TYPE__)p % 64) __builtin_abort (); }
__attribute__((noinline, noclone, noipa)) void foo (void) { struct S s; char a; bar (&s, &a); }
int main () { for (int i = 0; i < 32; ++i) foo (); }
and the frames were sufficiently aligned in all 32 cases.
[Bug debug/112674] [14 Regression] Compare-debug failure after recent change on c6x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112674 --- Comment #1 from Jeffrey A. Law --- And possibly more interesting than the compare-debug failure is this patch seems to be causing Wstringop-overflow-17 to fail on multiple targets, including c6x.
[Bug c++/112633] [13/14 Regression] ICE with type aliases and depedent value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112633 --- Comment #4 from CVS Commits --- The master branch has been updated by Patrick Palka : https://gcc.gnu.org/g:3f266c84a15d63e42bfad46397fea9aff92b0720 commit r14-5763-g3f266c84a15d63e42bfad46397fea9aff92b0720 Author: Patrick Palka Date: Wed Nov 22 13:54:29 2023 -0500 c++: alias template of non-template class [PR112633] The entering_scope adjustment in tsubst_aggr_type assumes if an alias is dependent, then so is the aliased type (and therefore it has template info) but that's not true for the dependent alias template specialization ty1 below which aliases the non-template class A. In this case no adjustment is needed anyway, so we can just punt. PR c++/112633 gcc/cp/ChangeLog: * pt.cc (tsubst_aggr_type): Handle empty TYPE_TEMPLATE_INFO in the entering_scope adjustment. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/alias-decl-75.C: New test.
[Bug c++/112633] [13/14 Regression] ICE with type aliases and depedent value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112633 --- Comment #6 from CVS Commits --- The releases/gcc-13 branch has been updated by Patrick Palka : https://gcc.gnu.org/g:63c65224e778124eee52acc7b9fcb32cd8ad61e8 commit r13-8090-g63c65224e778124eee52acc7b9fcb32cd8ad61e8 Author: Patrick Palka Date: Wed Nov 22 19:07:19 2023 -0500 c++: alias template of non-template class [PR112633] The entering_scope adjustment in tsubst_aggr_type assumes if an alias is dependent, then so is the aliased type (and therefore it has template info) but that's not true for the dependent alias template specialization ty1 below which aliases the non-template class A. In this case no adjustment is needed anyway, so we can just punt. PR c++/112633 gcc/cp/ChangeLog: * pt.cc (tsubst_aggr_type): Handle empty TYPE_TEMPLATE_INFO in the entering_scope adjustment. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/alias-decl-75.C: New test. (cherry picked from commit 3f266c84a15d63e42bfad46397fea9aff92b0720)
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 --- Comment #6 from Jakub Jelinek --- I don't know the IPA code enough to know whether different operand_type vs. param_type (in the !types_compatible_p sense) means just user bug (in that case returning VARYING is perfectly fine), or if it can happen also on valid code, where say caller has one type of argument and callee a different and there is an implicit (or explicit) cast in between the two. The latter case would be nice to get handled without giving up. I mean something like void foo (int x) { asm volatile ("" : "+r" (x)); } void bar (long x) { foo (x); } void baz (long x) { if (x < -42 || x >= 185) return; bar (x); } kind of thing (but making sure we don't inline and IPA-VRP tries to propagate something etc.).
[Bug sanitizer/112336] fsanitize=address vs _BitInt with a non-mode size (smaller than max mode size)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112336 --- Comment #3 from Jakub Jelinek --- Seems one doesn't need the sanitizer for that, unsigned _BitInt(1) v1; unsigned _BitInt(1) *p1 = &v1; ICEs as well.
[Bug target/112592] FAIL: c-c++-common/pr111309-1.c -std=gnu++14 (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:216)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112592 --- Comment #2 from CVS Commits --- The master branch has been updated by John David Anglin : https://gcc.gnu.org/g:6f59f959e751d73b371d52f9c657f78d7a77983c commit r14-5765-g6f59f959e751d73b371d52f9c657f78d7a77983c Author: John David Anglin Date: Wed Nov 22 20:06:22 2023 + hppa: Define MAX_FIXED_MODE_SIZE Replace default define. We support TImode when TARGET_64BIT is true. 2023-11-22 John David Anglin gcc/ChangeLog: PR target/112592 * config/pa/pa.h (MAX_FIXED_MODE_SIZE): Define.
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 --- Comment #8 from Jakub Jelinek --- (In reply to Andrew Macleod from comment #7) > Alternatively, if IPA could figure out when things need promoting.. GCC > must already do it, although I suppose that's in the front ends :-P Well, in this case the user explicitly told the compiler not to do that by not using a prototype and syntax which doesn't provide one from the definition. It is like using int f1 (struct C *x, struct A *y) { ... } definition in one TU, and int f1 (int, int); prototype and f1 (0, ~x) call in another one + using LTO. What I meant is how to decide if the param_type vs. operand_type mismatch is ok or not.
[Bug tree-optimization/112673] New: [14 Regression] ICE verify_gimple failed since r14-5557-g6dd4c703be17fa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112673 Bug ID: 112673 Summary: [14 Regression] ICE verify_gimple failed since r14-5557-g6dd4c703be17fa Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: mjires at suse dot cz CC: jakub at redhat dot com Target Milestone: --- Compiling pr112566-2.c from testsuite. Bisection points to r14-5557-g6dd4c703be17fa, which also introduced this test. $ gcc pr112566-2.c -Ofast -mf16c pr112566-2.c: In function ‘corge’: pr112566-2.c:10:5: error: ‘bit_field_ref’ of non-mode-precision operand 10 | int corge (_BitInt(256) x) { return __builtin_ctzg ((unsigned _BitInt(512)) x); } | ^ _18 = BIT_FIELD_REF ; pr112566-2.c:10:5: error: ‘bit_field_ref’ of non-mode-precision operand _1 = BIT_FIELD_REF ; pr112566-2.c:10:5: error: ‘bit_field_ref’ of non-mode-precision operand _38 = BIT_FIELD_REF ; during GIMPLE pass: forwprop pr112566-2.c:10:5: internal compiler error: verify_gimple failed 0x105854d verify_gimple_in_cfg(function*, bool, bool) /home/mjires/git/GCC/master/gcc/tree-cfg.cc:5662 0xee86b4 execute_function_todo /home/mjires/git/GCC/master/gcc/passes.cc:2088 0xee8c0e execute_todo /home/mjires/git/GCC/master/gcc/passes.cc:2142 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. $ gcc -v Using built-in specs. COLLECT_GCC=/home/mjires/built/master/bin/gcc COLLECT_LTO_WRAPPER=/home/mjires/built/master/libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /home/mjires/git/GCC/master/configure --prefix=/home/mjires/built/master --disable-bootstrap --enable-checking --enable-languages=c,c++,fortran,lto Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 14.0.0 20231122 (experimental) (GCC)
[Bug target/111720] RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 --- Comment #30 from CVS Commits --- The master branch has been updated by Pan Li : https://gcc.gnu.org/g:990769a343f090088f5025ad233f88824b2c6263 commit r14-5769-g990769a343f090088f5025ad233f88824b2c6263 Author: Pan Li Date: Mon Nov 13 11:22:37 2023 +0800 DSE: Allow vector type for get_stored_val when read < store Update in v4: * Merge upstream and removed some independent changes. Update in v3: * Take known_le instead of known_lt for vector size. * Return NULL_RTX when gap is not equal 0 and not constant. Update in v2: * Move vector type support to get_stored_val. Original log: This patch would like to allow the vector mode in the get_stored_val in the DSE. It is valid for the read rtx if and only if the read bitsize is less than the stored bitsize. Given below example code with --param=riscv-autovec-preference=fixed-vlmax. vuint8m1_t test () { uint8_t arr[32] = { 1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, }; return __riscv_vle8_v_u8m1(arr, 32); } Before this patch: test: lui a5,%hi(.LANCHOR0) addi sp,sp,-32 addi a5,a5,%lo(.LANCHOR0) li a3,32 vl2re64.v v2,0(a5) vsetvli zero,a3,e8,m1,ta,ma vs2r.v v2,0(sp) <== Unnecessary store to stack vle8.v v1,0(sp) <== Ditto vs1r.v v1,0(a0) addi sp,sp,32 jr ra After this patch: test: lui a5,%hi(.LANCHOR0) addi a5,a5,%lo(.LANCHOR0) li a4,32 addi sp,sp,-32 vsetvli zero,a4,e8,m1,ta,ma vle8.v v1,0(a5) vs1r.v v1,0(a0) addi sp,sp,32 jr ra Below tests are passed within this patch: * The risc-v regression test. * The x86 bootstrap and regression test. * The aarch64 regression test. PR target/111720 gcc/ChangeLog: * dse.cc (get_stored_val): Allow vector mode if read size is less than or equal to stored size. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr111720-0.c: New test. * gcc.target/riscv/rvv/base/pr111720-1.c: New test. * gcc.target/riscv/rvv/base/pr111720-10.c: New test. * gcc.target/riscv/rvv/base/pr111720-2.c: New test. 
* gcc.target/riscv/rvv/base/pr111720-3.c: New test. * gcc.target/riscv/rvv/base/pr111720-4.c: New test. * gcc.target/riscv/rvv/base/pr111720-5.c: New test. * gcc.target/riscv/rvv/base/pr111720-6.c: New test. * gcc.target/riscv/rvv/base/pr111720-7.c: New test. * gcc.target/riscv/rvv/base/pr111720-8.c: New test. * gcc.target/riscv/rvv/base/pr111720-9.c: New test. Signed-off-by: Pan Li
[Bug testsuite/106120] [13 regression] g++.dg/warn/Wstringop-overflow-4.C fails since r13-1268-g8c99e307b20c50
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106120 --- Comment #13 from CVS Commits --- The master branch has been updated by Hans-Peter Nilsson : https://gcc.gnu.org/g:e935151bad1c2a02dc6a31fce3cc21b17d616243 commit r14-5767-ge935151bad1c2a02dc6a31fce3cc21b17d616243 Author: Hans-Peter Nilsson Date: Wed Nov 22 02:54:29 2023 +0100 testsuite: Tweak xfail bogus g++.dg/warn/Wstringop-overflow-4.C:144, PR106120 The conditions under which this bogus warning is emitted have changed to not happen for 32-bit targets anymore. Adjust accordingly. PR testsuite/106120 * g++.dg/warn/Wstringop-overflow-4.C:144 XFAIL bogus warning for lp64 targets with c++98.
[Bug target/112617] [14 regression] ICE when building systemd on HPPA (internal compiler error: in find_reloads, at reload.cc:3839)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112617 --- Comment #4 from CVS Commits --- The master branch has been updated by John David Anglin : https://gcc.gnu.org/g:a89224f819381b77657145fdd8b1d997b989fdc0 commit r14-5764-ga89224f819381b77657145fdd8b1d997b989fdc0 Author: John David Anglin Date: Wed Nov 22 19:47:34 2023 + hppa: Fix integer REG+D address reloads I made a mistake in the previous change to integer_store_memory_operand. There is no support pa_emit_move sequence to handle secondary reloads of integer REG+D instructions. Further, the Q constraint is used for some non-simple instructions (movb and addib). Thus, we need to return true when reload is in progress. 2023-11-22 John David Anglin gcc/ChangeLog: PR target/112617 * config/pa/predicates.md (integer_store_memory_operand): Return true for REG+D addresses when reload_in_progress is true.
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 --- Comment #7 from Andrew Macleod --- Explicit casts would be no problem as they go through the proper machinery. The IL for that case has an explicit cast in it. _1 = (int) x_2(D); foo (_1); It's when that cast is not present, and we try to, say, subtract two values, that we have a problem. We expect the compiler to promote things to be compatible when they are supposed to be. This would apply to dual operand arithmetic like +, -, /, *, bitwise ops, etc. The testcase in particular is a bitwise not... but it has a return type that is 64 bits and an operand type that is 32. It was expected that the compiler would promote the operand to 64 bits if it expects a 64 bit result. At least for those tree codes which expect compatible types.. I don't think we want to get into overruling decisions at the range-ops level.. So we decide whether to trap (which would be the same result as we see now :-P), or handle it some other way. Returning VARYING was my thought.. because it means something is amok, so say we don't know anything. Alternatively, if IPA could figure out when things need promoting.. GCC must already do it, although I suppose that's in the front ends :-P
[Bug middle-end/112510] [11/12/13/14 Regression]: ASAN code injection breaks alignment of stack variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112510 --- Comment #17 from Vladimir Sadovnikov --- Reproducible with 11.4.0 ~$ export ASAN_OPTIONS=detect_stack_use_after_return=1 ~$ g++ -fsanitize=address -Og test-case.cpp ~$ ./a.out Aborted (core dumped) ~$ gcc --version gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Not reproducible with 7.5.0: sadko@tuf-gaming:~/tmp> export ASAN_OPTIONS=detect_stack_use_after_return=1 sadko@tuf-gaming:~/tmp> g++ -fsanitize=address -Og test-case.cpp sadko@tuf-gaming:~/tmp> ./a.out sadko@tuf-gaming:~/tmp> gcc --version gcc (SUSE Linux) 7.5.0 Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Generated code for 11.4.0: 11e9 <_Z3barP1SPc>: 11e9: f3 0f 1e fa endbr64 11ed: 40 f6 c7 3f test $0x3f,%dil 11f1: 75 01 jne11f4 <_Z3barP1SPc+0xb> 11f3: c3 ret 11f4: 48 83 ec 08 sub$0x8,%rsp 11f8: e8 c3 fe ff ff call 10c0 <__asan_handle_no_return@plt> 11fd: e8 9e fe ff ff call 10a0 1202 <_Z3foov>: 1202: f3 0f 1e fa endbr64 1206: 55 push %rbp 1207: 48 89 e5mov%rsp,%rbp 120a: 41 55 push %r13 120c: 41 54 push %r12 120e: 53 push %rbx 120f: 48 83 e4 c0 and$0xffc0,%rsp 1213: 48 81 ec 00 01 00 00sub$0x100,%rsp 121a: 48 8d 5c 24 20 lea0x20(%rsp),%rbx 121f: 49 89 ddmov%rbx,%r13 1222: 83 3d e7 2d 00 00 00cmpl $0x0,0x2de7(%rip)# 4010 <__asan_option_detect_stack_use_after_return@@Base> 1229: 0f 85 bb 00 00 00 jne12ea <_Z3foov+0xe8> 122f: 48 c7 03 b3 8a b5 41movq $0x41b58ab3,(%rbx) 1236: 48 8d 05 c7 0d 00 00lea0xdc7(%rip),%rax# 2004 <_IO_stdin_used+0x4> 123d: 48 89 43 08 mov%rax,0x8(%rbx) 1241: 48 8d 05 ba ff ff fflea-0x46(%rip),%rax# 1202 <_Z3foov> 1248: 48 89 43 10 mov%rax,0x10(%rbx) 124c: 49 89 dcmov%rbx,%r12 124f: 49 
c1 ec 03 shr$0x3,%r12 1253: 41 c7 84 24 00 80 ffmovl $0xf1f1f1f1,0x7fff8000(%r12) 125a: 7f f1 f1 f1 f1 125f: 41 c7 84 24 04 80 ffmovl $0xf1f1f1f1,0x7fff8004(%r12) 1266: 7f f1 f1 f1 f1 126b: 41 c7 84 24 08 80 ffmovl $0xf201f1f1,0x7fff8008(%r12) 1272: 7f f1 f1 01 f2 1277: 41 c7 84 24 14 80 ffmovl $0xf3f3f3f3,0x7fff8014(%r12) 127e: 7f f3 f3 f3 f3 1283: 64 48 8b 04 25 28 00mov%fs:0x28,%rax 128a: 00 00 128c: 48 89 84 24 f8 00 00mov%rax,0xf8(%rsp) 1293: 00 1294: 31 c0 xor%eax,%eax 1296: 48 8d 73 50 lea0x50(%rbx),%rsi 129a: 48 8d 7b 60 lea0x60(%rbx),%rdi 129e: e8 46 ff ff ff call 11e9 <_Z3barP1SPc> 12a3: 49 39 ddcmp%rbx,%r13 12a6: 75 5d jne1305 <_Z3foov+0x103> 12a8: 49 c7 84 24 00 80 ffmovq $0x0,0x7fff8000(%r12) 12af: 7f 00 00 00 00 12b4: 41 c7 84 24 08 80 ffmovl $0x0,0x7fff8008(%r12) 12bb: 7f 00 00 00 00 12c0: 41 c7 84 24 14 80 ffmovl $0x0,0x7fff8014(%r12) 12c7: 7f 00 00 00 00 12cc: 48 8b 84 24 f8 00 00mov0xf8(%rsp),%rax 12d3: 00 12d4: 64 48 2b 04 25 28 00sub%fs:0x28,%rax 12db: 00 00 12dd: 75 65 jne1344 <_Z3foov+0x142> 12df: 48 8d 65 e8 lea-0x18(%rbp),%rsp 12e3: 5b pop%rbx 12e4: 41 5c pop%r12 12e6: 41 5d pop%r13 12e8: 5d pop%rbp 12e9: c3 ret 12ea: bf c0 00 00 00 mov$0xc0,%edi 12ef: e8 ec fd ff ff call 10e0 <__asan_stack_malloc_2@plt> 12f4: 48 85 c0test %rax,%rax 12f7: 0f 84 32 ff ff ff je 122f <_Z3foov+0x2d> 12fd: 48 89 c3mov%rax,%rbx 1300: e9 2a ff ff ff jmp122f <_Z3foov+0x2d> 1305: 48 c7 03 0e 36 e0
[Bug target/112445] [14 Regression] ICE: in lra_split_hard_reg_for, at lra-assigns.cc:1861 unable to find a register to spill: {*umulditi3_1} with -O -march=cascadelake -fwrapv since r14-4968-g89e5d90
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112445 --- Comment #5 from Jakub Jelinek --- Just changing --- i386.md.xx 2023-11-22 09:47:22.746637132 +0100 +++ i386.md 2023-11-22 20:38:07.216218697 +0100 @@ -9984,7 +9984,7 @@ [(set (match_operand: 0 "register_operand" "=r,A") (mult: (zero_extend: - (match_operand:DWIH 1 "register_operand" "%d,a")) + (match_operand:DWIH 1 "register_operand" "%d,0")) (zero_extend: (match_operand:DWIH 2 "nonimmediate_operand" "rm,rm" (clobber (reg:CC FLAGS_REG))] makes the testcase pass. A question is how RA treats the 0 constraint when the two operands have different modes: whether it is basically the same as a in that case, meaning that the first input operand will never be in %rdx even when the A constraint contains the %rax and %rdx registers (but the double-word mode implies it must be low part in %rax, high part in %rdx).
[Bug target/111720] RISC-V: Ugly codegen in RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720 --- Comment #31 from Li Pan --- We still have some unnecessary code here, which is stack-related, will take care of it in another PATCH. After this patch: test: lui a5,%hi(.LANCHOR0) addi a5,a5,%lo(.LANCHOR0) li a4,32 addi sp,sp,-32 <== unnecessary insn vsetvli zero,a4,e8,m1,ta,ma vle8.v v1,0(a5) vs1r.v v1,0(a0) addi sp,sp,32 <== unnecessary insn jr ra
[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652 --- Comment #3 from Jakub Jelinek --- (In reply to r...@cebitec.uni-bielefeld.de from comment #2) > > --- Comment #1 from Jakub Jelinek --- > > Strange. On cfarm211 which is > > SunOS gcc-solaris11 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise > > the test passes. > > Can you check which libiconv got picked up there? The non-standard > OpenCSW packages on that system may include GNU libiconv and install > into default system directories, so they are picked up by default. /opt/csw/lib/libiconv.so.2 > > > You get no diagnostics for those lines at all? Buggy libconv? > > No. There's no separate libiconv on Solaris; the iconv* functions are > included in libc. On Linux I get: echo á | iconv -f UTF-8 -t ASCII -; echo | iconv -f UTF-8 -t ISO-8859-1 - iconv: illegal input sequence at position 0 iconv: illegal input sequence at position 0 while on Solaris echo á | iconv -f UTF-8 -t ASCII -; echo | iconv -f UTF-8 -t ISO-8859-1 - ? ? If it maps all characters which do not have representation in the destination character set into ?, then it is useless for the test in question.
[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652 --- Comment #2 from ro at CeBiTec dot Uni-Bielefeld.DE --- > --- Comment #1 from Jakub Jelinek --- > Strange. On cfarm211 which is > SunOS gcc-solaris11 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise > the test passes. Can you check which libiconv got picked up there? The non-standard OpenCSW packages on that system may include GNU libiconv and install into default system directories, so they are picked up by default. > You get no diagnostics for those lines at all? Buggy libconv? No. There's no separate libiconv on Solaris; the iconv* functions are included in libc. > I mean the emojis certainly aren't in ISO-8859-1... Probably not ;-) FWIW, I've just built trunk with GNU libiconv 1.17 on i386-pc-solaris2.11. The test PASSes now with both LANG=C and LANG=en_US.UTF-8. I'll dig further into Solaris iconv functions here...
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #27 from Jakub Jelinek --- (In reply to Rich Felker from comment #26) > > The only reasonable fix on the compiler side is to never emit memcpy but > > always use memmove. > > No, it can literally just emit (equivalent at whatever intermediate form of): > > cmp src,dst > je 1f > call memcpy > 1: > > in place of memcpy. No, that is not a reasonable fix, because it severely pessimizes common code for a theoretical only problem.
[Bug other/112671] libiconv support lacks separate --with-libiconv-{include,lib}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112671 --- Comment #4 from ro at CeBiTec dot Uni-Bielefeld.DE --- > --- Comment #3 from Arsen Arsenović --- > hm, actually, I think I confused reports - sorry. > > do you know if this worked a short while ago? and if so, how did such a > configuration look? I have no idea: at least AFAICS back to the gcc-11 branch (didn't look further) there was only --with-libiconv-prefix. Still it's inconsistent with how many (all?) other support libs are handled.
[Bug other/112671] libiconv support lacks separate --with-libiconv-{include,lib}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112671 --- Comment #3 from Arsen Arsenović --- hm, actually, I think I confused reports - sorry. do you know if this worked a short while ago? and if so, how did such a configuration look?
[Bug other/112671] libiconv support lacks separate --with-libiconv-{include,lib}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112671 --- Comment #2 from ro at CeBiTec dot Uni-Bielefeld.DE --- > --- Comment #1 from Arsen Arsenović --- [...] > I will restore the modifications in the shared tree with the few other patches > I mentioned on the GCC ML recently soon (I've run a little low on testing > bandwidth this week..) > > apologies for the inconvenience No worries, this is the first time ever I tried this on Solaris and can easily live with 32-bit-only testing for now. Thanks for taking care of this.
[Bug other/112671] libiconv support lacks separate --with-libiconv-{include,lib}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112671 --- Comment #1 from Arsen Arsenović --- yes, this also came up from the binutils side. see https://inbox.sourceware.org/binutils/874jhg2x6p@adacore.com/ I will restore the modifications in the shared tree with the few other patches I mentioned on the GCC ML recently soon (I've run a little low on testing bandwidth this week..) apologies for the inconvenience
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #26 from Rich Felker --- > The only reasonable fix on the compiler side is to never emit memcpy but > always use memmove. No, it can literally just emit (equivalent at whatever intermediate form of): cmp src,dst je 1f call memcpy 1: in place of memcpy. It can even optimize out that in the case where it's provable that they're not equal, e.g. presence of restrict or one of the two objects not having had its address taken/leaked.
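For readers following the thread, the guard proposed in comment #26 can be sketched at the C level roughly as below. This is only an illustration and not GCC code: the helper name copy_allow_exact_overlap is made up here, and the sketch only covers the exact-overlap case discussed in this PR (partially overlapping regions would still need memmove).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: copy N bytes from SRC to DST,
   skipping the library call when both pointers are the same address.
   This mirrors the "cmp src,dst; je 1f; call memcpy; 1:" sequence
   suggested in the comment.  Partial overlap is NOT handled.  */
static void *
copy_allow_exact_overlap (void *dst, const void *src, size_t n)
{
  if (dst != src)          /* exact overlap: nothing to copy */
    memcpy (dst, src, n);  /* only reached for distinct addresses */
  return dst;
}
```

Whether the extra compare is cheap enough to emit unconditionally is exactly what comments #26 and #27 disagree about; the sketch takes no position on that.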
[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661 --- Comment #10 from Richard Sandiford --- (In reply to Richard Biener from comment #9) > So do we expect - independent of whether a constant/external is used as mask > - that uniform constants/externals are generatable and thus we can elide the > check for those? Possibly also go a different path during code-generation > then? (because that will otherwise assert) Yeah, I think so. At the time, I don't think there were any cases where treating uniform values differently would have helped, and it wasn't a trivial thing to test on the fly. But now we have a reason to try :)
[Bug middle-end/112344] [14 Regression] Wrong code at -O2 on x86_64-pc-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112344 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #11 from Richard Biener --- Fixed.
[Bug middle-end/112344] [14 Regression] Wrong code at -O2 on x86_64-pc-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112344 --- Comment #10 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:6bf66276e3e41d5d92f7b7260e98b6a111653805 commit r14-5759-g6bf66276e3e41d5d92f7b7260e98b6a111653805 Author: Richard Biener Date: Wed Nov 22 11:10:41 2023 +0100 tree-optimization/112344 - wrong final value replacement When performing final value replacement chrec_apply that's used to compute the overall effect of niters to a CHREC doesn't consider that the overall increment of { -2147483648, +, 2 } doesn't fit in a signed integer when the loop iterates until the value of the IV of 20. The following fixes this mistake, carrying out the multiply and add in an unsigned type instead, avoiding undefined overflow and thus later miscompilation by path range analysis. PR tree-optimization/112344 * tree-chrec.cc (chrec_apply): Perform the overall increment calculation and increment in an unsigned type. * gcc.dg/torture/pr112344.c: New testcase.
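The arithmetic behind this fix can be illustrated with a small stand-alone sketch. It is not the chrec_apply code; final_value is a hypothetical helper that assumes 32-bit values and simply evaluates { init, +, step } after niters iterations using unsigned (wrapping) arithmetic, as the commit message describes.

```c
#include <stdint.h>

/* Hypothetical helper, not GCC code: value of the recurrence
   { init, +, step } after NITERS iterations.  The multiply and add are
   done in an unsigned type, so an intermediate that does not fit in a
   signed 32-bit integer wraps instead of invoking undefined behavior.  */
static int32_t
final_value (int32_t init, int32_t step, uint32_t niters)
{
  uint32_t inc = niters * (uint32_t) step;  /* overall increment, modulo 2^32 */
  uint32_t res = (uint32_t) init + inc;     /* final value, modulo 2^32 */
  /* Converting back is value-preserving whenever the true final value
     fits in int32_t, which is the case of interest here.  */
  return (int32_t) res;
}
```

For { -2147483648, +, 2 } iterating until the IV reaches 20, that is 1073741834 steps. The overall increment 2147483668 exceeds INT32_MAX, yet final_value (INT32_MIN, 2, 1073741834u) still yields 20 because only the intermediate wraps.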
[Bug c/111911] [11/12/13/14 Regression] ICE with integer overflow converting to _Bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111911 Jakub Jelinek changed: What|Removed |Added Keywords|needs-bisection | CC||jakub at gcc dot gnu.org, ||jsm28 at gcc dot gnu.org Priority|P3 |P2 --- Comment #5 from Jakub Jelinek --- Started with r10-5922-g3d77686d2eddf76d3498169d0ca5653db45a8662
[Bug middle-end/112653] We should optimize memmove to memcpy using alias oracle
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653 --- Comment #6 from Richard Biener --- (In reply to Jan Hubicka from comment #5) > > but the issue is that test2 escapes which makes this conflict: > > It is passed to memmove which is noescape and returned. Why local PTA > considers returned values to escape? The pointed to memory escapes which means that stores to it are not dead. Mind we do not have a separate points-to set for escaped via return (some functions can also "return" like via EH or longjmp, and we can't really know the latter w/o IPA analysis). Pointers can also escape to global memory. Special-casing the regular return path is something that's possible (also IPA points-to doesn't compute a "local" escaped at all but preserves the non-IPA solution for that), but in the end it didn't seem important enough for me to try doing that ... We have the function entry state which is NONLOCAL, ESCAPED is what determines "global memory" for all sorts of optimizations. If we split out RETURN_ESCAPED then that would be ESCAPED | RETURN_ESCAPED and alias disambiguation could avoid RETURN_ESCAPED. But ESCAPED handling is complicated already ...
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 Jakub Jelinek changed: What|Removed |Added CC||amacleod at redhat dot com, ||jakub at gcc dot gnu.org --- Comment #4 from Jakub Jelinek --- Slightly cleaned up: void f2 (void); void f4 (int, int, int); struct A { int a; }; struct B { struct A *b; int c; } v; static int f1 (x, y) struct C *x; struct A *y; { (v.c = v.b->a) || (v.c = v.b->a); f2 (); } static void f3 (int x, int y) { int b = f1 (0, ~x); f4 (0, 0, v.c); } void f5 (void) { f3 (0, 0); } The problem is in the f1 call: given it uses the K&R definition style and the caller invokes UB by using incompatible types (int vs. pointers), I think IPA-VRP should punt somewhere on the type mismatch. I think Value_Range vr (operand_type); if (TREE_CODE_CLASS (operation) == tcc_unary) ipa_vr_operation_and_type_effects (vr, src_lats->m_value_range.m_vr, operation, param_type, operand_type); should be avoided if param_type is not a compatible type to operand_type, unless operation is some cast operation (NOP_EXPR, CONVERT_EXPR, dunno if the float to integral or vice versa ops as well but vrp probably doesn't handle that yet). In the above case, param_type is struct A *, i.e. pointer, while operand_type is int.
[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661 --- Comment #9 from Richard Biener --- (In reply to Richard Sandiford from comment #8) > I think we're going down the wrong path here. If I've understood > the original change correctly, dummy masks aren't special because > they're masks. They're special because all elements are equal to > the same value. A mask such as: > > { 1, 1, 1, 0, 1 } > > would not be OK, just like an integer vector with those values would > not be OK. > > So IMO we should check whether all elements are equal, rather than > whether the type is one thing or another. So do we expect - independent of whether a constant/external is used as mask - that uniform constants/externals are generatable and thus we can elide the check for those? Possibly also go a different path during code-generation then? (because that will otherwise assert)
[Bug rtl-optimization/112610] [12/13/14 Regression] ICE: SIGSEGV with -flive-range-shrinkage -fdump-rtl-all-all -fira-verbose=9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112610 --- Comment #2 from CVS Commits --- The master branch has been updated by Vladimir Makarov : https://gcc.gnu.org/g:95f61de95bbcc2e4fb7020e27698140abea23788 commit r14-5757-g95f61de95bbcc2e4fb7020e27698140abea23788 Author: Vladimir N. Makarov Date: Wed Nov 22 09:01:02 2023 -0500 [IRA]: Fix using undefined dump file in IRA code during insn scheduling Part of IRA code is used for register pressure sensitive insn scheduling and live range shrinkage. Numerous changes to IRA resulted in this IRA code using the dump file passed by the scheduler and the internal IRA dump file (in called functions), which can be undefined or freed by the scheduler while compiling previous functions. The patch fixes this problem. To reproduce the error, valgrind should be used and GCC should be compiled with valgrind annotations. Therefore the patch does not contain a test case. gcc/ChangeLog: PR rtl-optimization/112610 * ira-costs.cc: (find_costs_and_classes): Remove arg. Use ira_dump_file for printing. (print_allocno_costs, print_pseudo_costs): Ditto. (ira_costs): Adjust call of find_costs_and_classes. (ira_set_pseudo_classes): Set up and restore ira_dump_file.
[Bug middle-end/112668] ICE in bitintlower0 while compiling bitint-42.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112668 Jakub Jelinek changed: What|Removed |Added Last reconfirmed||2023-11-22 Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 --- Comment #3 from Jakub Jelinek --- Created attachment 56665 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56665=edit gcc14-pr112668.patch Untested fix.
[Bug other/112671] New: libiconv support lacks separate --with-libiconv-{include,lib}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112671 Bug ID: 112671 Summary: libiconv support lacks separate --with-libiconv-{include,lib} Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: ro at gcc dot gnu.org Target Milestone: --- When trying to build trunk on Solaris with GNU libiconv 1.17, I noticed that the libiconv configure support is limited compared to e.g. gmp etc. On multilibbed targets like Solaris, you usually install both 32 and 64-bit versions of a library into /include (common between 32 and 64-bit) and /lib (32-bit lib) resp. /lib/{amd64,sparcv9}. The current (simple-minded) support via --with-libiconv-prefix cannot handle this, requiring to use two different installations into different prefixes. It's also inconsistent with the rest of gcc which does support configurations like this.
[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661 --- Comment #8 from Richard Sandiford --- I think we're going down the wrong path here. If I've understood the original change correctly, dummy masks aren't special because they're masks. They're special because all elements are equal to the same value. A mask such as: { 1, 1, 1, 0, 1 } would not be OK, just like an integer vector with those values would not be OK. So IMO we should check whether all elements are equal, rather than whether the type is one thing or another.
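The check suggested in comment #8 (test whether all elements are equal, instead of testing the vector's type) can be pictured with a plain C sketch. This is an assumption-laden illustration only: all_elements_equal is a made-up name operating on an int array, whereas the vectorizer works on SLP operand vectors.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not vectorizer code: return true if every
   element of ELTS[0..N-1] has the same value.  An empty or
   single-element array counts as uniform.  */
static bool
all_elements_equal (const int *elts, size_t n)
{
  for (size_t i = 1; i < n; ++i)
    if (elts[i] != elts[0])
      return false;
  return true;
}
```

With this predicate the mask { 1, 1, 1, 0, 1 } from the comment is rejected, while an all-ones (or all-zeros) mask is accepted regardless of whether it is a mask or an ordinary integer vector.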
[Bug sanitizer/112563] [14 regression] libsanitizer doesn't assemble with Solaris/sparc as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112563 --- Comment #11 from ro at CeBiTec dot Uni-Bielefeld.DE --- > --- Comment #10 from Jakub Jelinek --- > (In reply to r...@cebitec.uni-bielefeld.de from comment #9) [...] >> I've now come up with an alternative. It's a bit ugly, but it gets the >> work done: >> >> diff --git a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h >> b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h >> --- a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h >> +++ b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h >> @@ -17,6 +17,17 @@ >> // The asm hack only works with GCC and Clang. >> #if !defined(_WIN32) >> >> +// FIXME: Explain. >> +#if defined(__sparc__) >> +#define ASM_MEM_DEF(FUNC) \ >> +__asm__(".global " #FUNC "\n" \ >> +".type " #FUNC ",function\n" \ > > Not @function ? No, this should be #function: that's the only variant sparc as understands, and gas does for compatibility. >> +".weak " #FUNC "\n" \ >> +#FUNC ":\n"); >> +ASM_MEM_DEF(__sanitizer_internal_memcpy) >> +ASM_MEM_DEF(__sanitizer_internal_memmove) >> +ASM_MEM_DEF(__sanitizer_internal_memset) >> +#endif >> asm("memcpy = __sanitizer_internal_memcpy"); >> asm("memmove = __sanitizer_internal_memmove"); >> asm("memset = __sanitizer_internal_memset"); >> >> I've run libsanitizer builds on sparc without this patch (gas only since >> as fails) and with it (as and gas). It fixes the as build failure and >> leaves the same number of calls to mem* functions in libasan.so as an >> unpatched tree with gas. > > If it works, nice. Can you file it on github.com/llvm/llvm-project as an > issue > and see if upstream is willing to accept it? I think they'll want some Can do, either as an issue or directly as a pull request. I'll run it through a full llvm build, too, first. > indentation changes (if defined(__sparc__) is below the _WIN32 #if, so they > probably want it > indented more and the define even more. 
And dunno if defined(__sparc__) or > SANITIZER_SPARC should be used. I know: LLVM has clang/tools/clang-format/clang-format-diff.py to handle this. I usually run my patches through that first, unless it messes up the existing formatting, as was the case for pull request #72973. The patch also needs an explanatory comment; this was just a proof of concept. It might be even better to restrict the hack to __sparc__ && __sun__ && __svr4__ to avoid interfering with Linux/sparc64.
[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657 --- Comment #8 from Jan Hubicka --- The negative return value branch predictor is set to have a 98% hit rate (measured on SPEC2k17 some time ago). There is --param predictable-branch-outcome, which is also set to 2%, so indeed we consider the branch well predictable by this heuristic. Reducing the --param should make the cmov happen. With the profile_probability data type we could try something smarter for guessing whether a given branch is predictable (such as ignoring guessed values and letting the predictor optionally mark branches as (un)predictable). But it is not quite clear to me what the desired behavior would be... Guessing predictability of data branches is generally quite a hard problem. Predictability of loop branches is easier, but we hardly apply BRANCH_COST on the branch closing a loop, since those are not if-conversion candidates.
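The threshold arithmetic in this comment can be sketched as a small model. This is a simplified illustration, not GCC's code: the function name `predictable_edge` and the whole-percent representation are assumptions, and GCC's real check operates on its internal probability type.

```cpp
// Simplified model (an assumption, not GCC's implementation): an edge counts
// as "well predictable" when its probability, in percent, lies within
// threshold_pct of either 0% or 100%.
bool predictable_edge(int probability_pct, int threshold_pct) {
    return probability_pct <= threshold_pct
        || 100 - probability_pct <= threshold_pct;
}
```

In this model a predictor with a 98% hit rate leaves the negative-return edge at 2%, so `predictable_edge(2, 2)` holds and the branch is kept. Lowering the threshold to 1 makes `predictable_edge(2, 1)` false, which is the situation in which if-conversion to a conditional move would be considered.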
[Bug sanitizer/112563] [14 regression] libsanitizer doesn't assemble with Solaris/sparc as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112563 --- Comment #10 from Jakub Jelinek --- (In reply to r...@cebitec.uni-bielefeld.de from comment #9) > > --- Comment #8 from Jakub Jelinek --- > > So, shall we go with > > --- libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h.jj > > 2023-11-15 12:45:17.359586776 +0100 > > +++ libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h 2023-11-21 > > 18:29:52.401817763 +0100 > > @@ -15,7 +15,8 @@ > > #define SANITIZER_REDEFINE_BUILTINS_H > > > > // The asm hack only works with GCC and Clang. > > -#if !defined(_WIN32) > > +// It doesn't work when using Solaris as either. > > +#if !defined(_WIN32) && !SANITIZER_SOLARIS > > > > asm("memcpy = __sanitizer_internal_memcpy"); > > asm("memmove = __sanitizer_internal_memmove"); > > @@ -50,7 +51,7 @@ using vector = Define_SANITIZER_COMMON_N > > } // namespace std > > > > # endif // __cpluplus > > -#endif// !_WIN32 > > +#endif// !_WIN32 && !SANITIZER_SOLARIS > > > > # endif // SANITIZER_REDEFINE_BUILTINS_H > > #endif// SANITIZER_COMMON_NO_REDEFINE_BUILTINS > > > > then (either as local patch or try to push it upstream)? > > That's way to heavy IMO: it punishes the Solaris/x86 as which isn't > affected and also Solaris/SPARC with gas. > > I've now come up with an alternative. It's a bit ugly, but it gets the > work done: > > diff --git a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h > b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h > --- a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h > +++ b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h > @@ -17,6 +17,17 @@ > // The asm hack only works with GCC and Clang. > #if !defined(_WIN32) > > +// FIXME: Explain. > +#if defined(__sparc__) > +#define ASM_MEM_DEF(FUNC) \ > +__asm__(".global " #FUNC "\n" \ > +".type " #FUNC ",function\n" \ Not @function ? 
> +".weak " #FUNC "\n" \ > +#FUNC ":\n"); > +ASM_MEM_DEF(__sanitizer_internal_memcpy) > +ASM_MEM_DEF(__sanitizer_internal_memmove) > +ASM_MEM_DEF(__sanitizer_internal_memset) > +#endif > asm("memcpy = __sanitizer_internal_memcpy"); > asm("memmove = __sanitizer_internal_memmove"); > asm("memset = __sanitizer_internal_memset"); > > I've run libsanitizer builds on sparc without this patch (gas only since > as fails) and with it (as and gas). It fixes the as build failure and > leaves the same number of calls to mem* functions in libasan.so as an > unpatched tree with gas. If it works, nice. Can you file it on github.com/llvm/llvm-project as an issue and see if upstream is willing to accept it? I think they'll want some indentation changes (if defined(__sparc__) is below the _WIN32 #if, so they probably want it indented more and the define even more. And dunno if defined(__sparc__) or SANITIZER_SPARC should be used.
[Bug middle-end/112653] We should optimize memmove to memcpy using alias oracle
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653 --- Comment #5 from Jan Hubicka --- > but the issue is that test2 escapes which makes this conflict: It is passed to memmove, which is noescape, and returned. Why does local PTA consider returned values to escape?
[Bug ipa/98925] Extend ipa-prop to handle return functions for slot optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98925 --- Comment #3 from Jan Hubicka --- Return value range propagation was added in r:53ba8d669550d3a1f809048428b97ca607f95cf5; however, it works on scalar return values only for now. Extending it to aggregates is a logical next step and should not be terribly hard. The code also lacks logic for IPA streaming, so it works only in early and late opts.
[Bug sanitizer/112563] [14 regression] libsanitizer doesn't assemble with Solaris/sparc as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112563

--- Comment #9 from ro at CeBiTec dot Uni-Bielefeld.DE ---
> --- Comment #8 from Jakub Jelinek ---
> So, shall we go with
> --- libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h.jj
> 2023-11-15 12:45:17.359586776 +0100
> +++ libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h 2023-11-21
> 18:29:52.401817763 +0100
> @@ -15,7 +15,8 @@
>  #define SANITIZER_REDEFINE_BUILTINS_H
>
>  // The asm hack only works with GCC and Clang.
> -#if !defined(_WIN32)
> +// It doesn't work when using Solaris as either.
> +#if !defined(_WIN32) && !SANITIZER_SOLARIS
>
>  asm("memcpy = __sanitizer_internal_memcpy");
>  asm("memmove = __sanitizer_internal_memmove");
> @@ -50,7 +51,7 @@ using vector = Define_SANITIZER_COMMON_N
>  } // namespace std
>
>  # endif // __cpluplus
> -#endif// !_WIN32
> +#endif// !_WIN32 && !SANITIZER_SOLARIS
>
>  # endif // SANITIZER_REDEFINE_BUILTINS_H
>  #endif// SANITIZER_COMMON_NO_REDEFINE_BUILTINS
>
> then (either as local patch or try to push it upstream)?

That's way too heavy IMO: it punishes the Solaris/x86 as which isn't
affected and also Solaris/SPARC with gas.

I've now come up with an alternative. It's a bit ugly, but it gets the
work done:

diff --git a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
--- a/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
+++ b/libsanitizer/sanitizer_common/sanitizer_redefine_builtins.h
@@ -17,6 +17,17 @@
 // The asm hack only works with GCC and Clang.
 #if !defined(_WIN32)
 
+// FIXME: Explain.
+#if defined(__sparc__)
+#define ASM_MEM_DEF(FUNC) \
+__asm__(".global " #FUNC "\n" \
+".type " #FUNC ",function\n" \
+".weak " #FUNC "\n" \
+#FUNC ":\n");
+ASM_MEM_DEF(__sanitizer_internal_memcpy)
+ASM_MEM_DEF(__sanitizer_internal_memmove)
+ASM_MEM_DEF(__sanitizer_internal_memset)
+#endif
 asm("memcpy = __sanitizer_internal_memcpy");
 asm("memmove = __sanitizer_internal_memmove");
 asm("memset = __sanitizer_internal_memset");

I've run libsanitizer builds on sparc without this patch (gas only since
as fails) and with it (as and gas). It fixes the as build failure and
leaves the same number of calls to mem* functions in libasan.so as an
unpatched tree with gas.
[Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #17 from Jan Hubicka --- -falign-functions/-falign-jumps/-falign-labels/-falign-loops were originally intended for performance tuning. Starting a function entry close to the end of a page of code cache may lead to wasted code cache space as well as higher overhead calling the function, when the CPU fetches a page which contains just a little useful information. As such I would like to keep them affecting only hot code (we should update the documentation for that). Internally we have FUNCTION_BOUNDARY, which specifies the minimal alignment needed by the ABI and is set to 8 bits for i386. My understanding is that -fpatchable-function-entry requires the alignment to be 64 bits in order to make it possible to atomically change the instruction. So perhaps we want to make FUNCTION_BOUNDARY 64 for functions where we output the patchable entry? I am also OK with extending the flag syntax or adding -fmin-function-alignment to specify an optional user-defined minimum (increasing FUNCTION_BOUNDARY) if that seems useful, but I think the first one is the most consistent way to go with live patching?
[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661

--- Comment #7 from Richard Biener ---
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4a09b3c2aca..d0967240ae3 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -766,7 +766,10 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
       if ((dt == vect_constant_def
            || dt == vect_external_def)
           && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
-          && TREE_CODE (type) != BOOLEAN_TYPE
+          && (!is_gimple_call (stmt_info->stmt)
+              || !gimple_call_internal_p (stmt_info->stmt)
+              || internal_fn_mask_index
+                   (gimple_call_internal_fn (stmt_info->stmt)) != opno)
           && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
         {
           if (dump_enabled_p ())

fixes the testcase, not sure if it still resolves the issue fixed with the
original change.
[Bug c++/112642] ranges::fold_left tries to access inactive union member of string in constant expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112642 --- Comment #9 from Miro Palmu --- (In reply to Jonathan Wakely from comment #8) > > Also tried it locally with clang 16.0.6 with > > gcc-13.2.1 libstdc++ > > Which gcc-13.2.1 though? That's a snapshot that could date from any time in > the past four months. If I use gcc version 13.2.1 20231025 then clang > compiles it. Mine is 13.2.1 20230801 so way before Oct 21. (I did not know there were different snapshots of the releases, I'm just a user trying to help :) ) > Anyway, the original GCC error is the same as PR 112642 You probably mean PR 110158
[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598 --- Comment #7 from CVS Commits --- The master branch has been updated by Pan Li : https://gcc.gnu.org/g:de6f3e12bd188fee30bc79a5e323e16e0dbbe8ca commit r14-5755-gde6f3e12bd188fee30bc79a5e323e16e0dbbe8ca Author: Juzhe-Zhong Date: Wed Nov 22 18:53:22 2023 +0800 RISC-V: Fix incorrect use of vcompress in permutation auto-vectorization This patch fixes following FAILs on zvl512b of RV32 system: FAIL: gcc.target/riscv/rvv/autovec/struct/struct_vect_run-12.c execution test FAIL: gcc.target/riscv/rvv/autovec/struct/struct_vect_run-9.c execution test The root cause is that for permutation indice = {0,3,7,0} use vcompress optimization which is incorrect. Fix vcompress optimization bug. PR target/112598 gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_compress_patterns): Fix vcompress bug. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr112598-3.c: New test.
[Bug middle-end/112668] ICE in bitintlower0 while compiling bitint-42.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112668

--- Comment #2 from Jakub Jelinek ---
No loop is needed:

/* PR middle-end/112668 */
/* { dg-do compile { target bitint } } */
/* { dg-options "-std=c23 -fnon-call-exceptions" } */

#if __BITINT_MAXWIDTH__ >= 495
struct T495 { _BitInt(495) a : 2; unsigned _BitInt(495) b : 471; _BitInt(495) c : 2; };
extern void foo (struct T495 *r495);

unsigned _BitInt(495)
bar (int i)
{
  struct T495 r495[12];
  foo (r495);
  return r495[i].b;
}
#endif
[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661 --- Comment #6 from Richard Biener --- As suggested in the review at the time, the change would ideally be restricted to actual mask operands, not random BOOLEAN_TYPE ones.
[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b since r14-5101-g60034ecf25597b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661 Richard Biener changed: What|Removed |Added Summary|[14] RISC-V ICE: in |[14] RISC-V ICE: in |duplicate_and_interleave, |duplicate_and_interleave, |at tree-vect-slp.cc:8025|at tree-vect-slp.cc:8025 |with maxval_char_3.f90 |with maxval_char_3.f90 |vlen256b|vlen256b since ||r14-5101-g60034ecf25597b --- Comment #5 from Richard Biener --- Btw, a fallout of r14-5101-g60034ecf25597b.
[Bug c++/112642] ranges::fold_left tries to access inactive union member of string in constant expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112642 --- Comment #8 from Jonathan Wakely --- (In reply to Miro Palmu from comment #7) > (In reply to Jonathan Wakely from comment #6) > > The examples in comment 4 do compile using libstdc++ on clang, if you use > > libstdc++ headers from after sept 29 (for trunk) or oct 21 (for gcc-13). > > I was testing this on compiler explorer on clang 17.0.1 and it used > gcc-13.2.0 libstdc++. Which is expected to fail, because 13.2.0 was released before Oct 21. > Also tried it locally with clang 16.0.6 with > gcc-13.2.1 libstdc++ Which gcc-13.2.1 though? That's a snapshot that could date from any time in the past four months. If I use gcc version 13.2.1 20231025 then clang compiles it. Anyway, the original GCC error is the same as PR 112642 which was apparently reduced to PR 111284, which does seem relevant.
[Bug tree-optimization/112661] [14] RISC-V ICE: in duplicate_and_interleave, at tree-vect-slp.cc:8025 with maxval_char_3.f90 vlen256b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
                 CC|                            |rsandifo at gcc dot gnu.org
   Last reconfirmed|                            |2023-11-22
             Status|UNCONFIRMED                 |NEW

--- Comment #4 from Richard Biener ---
We are code-generating

t.f90:1:12: note: node (constant) 0x53bc430 (max_nunits=1, refcnt=1) vector([8,8]) unsigned int
t.f90:1:12: note: { 1, 1, 1, 1, 1 }

During SLP node analysis we assume we can generate constants/externals, as
only consumers will determine the vector type. vectorizable_store doesn't
verify it can generate the constant though. Instead we are checking this at
SLP build time. We're using E_RVVM1SImode as base_vector_mode and count is 5.
There's obviously no integer mode for size '5'. But it is a constant size
vector so I wonder why we ask for can_duplicate_and_interleave_p at all, that
is, how we arrive at vector([8,8]) for a constant size vinfo->vector_mode.

At analysis time we do

      if ((dt == vect_constant_def
           || dt == vect_external_def)
          && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
          && TREE_CODE (type) != BOOLEAN_TYPE
          && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
        {

see how we look at vinfo->vector_mode here.
[Bug c++/110158] Cannot use union with std::string inside in constant expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110158 --- Comment #9 from Jonathan Wakely --- Odd, I thought I'd checked it when testing r14-4334-g28adad7a32ed92. Seems like the same issue as PR 112642 though (which has a minimized version without std::string).
[Bug target/111677] [12/13/14 Regression] darktable build on aarch64 fails with unrecognizable insn due to -fstack-protector changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111677 --- Comment #14 from Gianfranco --- Hello, any news for this issue?
[Bug c++/112666] Missed optimization: Value initialization zero-initializes members with user-defined constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112666

--- Comment #1 from Jonathan Wakely ---
(In reply to Francisco Paisana from comment #0)
> The struct "C" which is just "B" and an int is much slower at being
> initialized than B when value initialization (via {}) is used. However, my
> understanding of the C++ standard is that members with a user-defined
> default constructor do not need to be zero-initialized in this situation.

I think that's not quite right. Types with a user-provided default constructor
will not be zero-initialized when value-init is used. B does have a
user-provided default constructor, so value-init for an object of type B does
not perform zero-init first. But that applies when constructing a complete B
object, not when constructing a member subobject.

C does not have a user-provided default constructor, so value-initialization
means:

"- the object is zero-initialized and the semantic constraints for
default-initialization are checked, and if T has a non-trivial default
constructor, the object is default-initialized;"

So first it's zero-initialized, which means:

"- if T is a (possibly cv-qualified) non-union class type, its padding bits
(6.8.1) are initialized to zero bits and each non-static data member, each
non-virtual base class subobject, and, if the object is not a base class
subobject, each virtual base class subobject is zero-initialized;"

This specifically says that "each non-static data member ... is
zero-initialized." So the B subobject must be zero-initialized. That's not the
same as when you value-init a B object.

> Looking at the godbolt assembly output, I see that both `A a{}` and `C c{}`
> generate a memset instruction, while `B b{}` doesn't. Clang, on the other
> hand, seems to initialize C almost as fast as B.

I don't know whether Clang considers the zero-init to be dead stores that are
clobbered by B() and so can be eliminated, or something else. But my
understanding of the standard is that requiring zero-init of B's members is
very intentional here.

> This potentially missed optimization in gcc is particularly nasty for
> structs with large embedded storage (e.g. structs that contain C-arrays,
> std::arrays, or static_vectors).

Arguably, the problem here is that B has a default ctor that intentionally
leaves members uninitialized. If you want to preserve that behaviour in types
that contain a B subobject, then you also need to give those types (e.g. C in
your example) a user-provided default ctor.
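The distinction drawn in this comment can be sketched as follows. The type names B and C follow the report, but the member names, the type C2 and the two helper functions are hypothetical, and `C()` is used as the value-initialization syntax for illustration.

```cpp
struct B {
    B() {}                    // user-provided: deliberately leaves buf uninitialized
    unsigned char buf[256];
};

// No user-provided default constructor: value-initialization first
// zero-initializes the whole object, including the B subobject, and only
// then runs the implicit constructor (which calls B()).
struct C {
    B b;
    int x;
};

// Hypothetical variant with a user-provided default constructor:
// value-initialization just calls it, so b is not zero-filled first and
// x has to be initialized explicitly.
struct C2 {
    C2() : x(0) {}
    B b;
    int x;
};

int value_init_C_x()  { return C().x; }   // zero from zero-initialization
int value_init_C2_x() { return C2().x; }  // zero from the constructor
```

Both helpers return 0, but for different reasons: in C the zero comes from the mandatory zero-initialization of the whole object, while in C2 only the members the constructor names are written, which is the behaviour the last paragraph of the comment suggests opting into.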
[Bug middle-end/112668] ICE in bitintlower0 while compiling bitint-42.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112668

--- Comment #1 from Jakub Jelinek ---
Reduced:

/* PR middle-end/112668 */
/* { dg-do compile { target bitint } } */
/* { dg-options "-std=c23 -fnon-call-exceptions" } */

#if __BITINT_MAXWIDTH__ >= 495
struct T495 { _BitInt(495) a : 2; unsigned _BitInt(495) b : 471; _BitInt(495) c : 2; };
extern void foo (struct T495 *r495);

int
bar (void)
{
  struct T495 r495[12];
  foo (r495);
  for (int i = 0; i < 12; ++i)
    if (r495[i].b != 0uwb)
      return 1;
  return 0;
}
#endif
[Bug target/112669] GCN: wrong 'LIBRARY_PATH' in presence of several different '-march=[...]' flags
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112669 --- Comment #1 from Thomas Schwinge --- Tracing through 'gcc/gcc.cc': 'build_search_list' -> 'for_each_path', I find: For '-march=gfx908', we have: (gdb) print multilib_dir $3 = 0x82e6c0 "gfx908" (gdb) print multilib_os_dir $4 = 0x82e6c0 "gfx908" For '-march=gfx906 -march=gfx908', we have: (gdb) print multilib_dir $3 = 0x0 (gdb) print multilib_os_dir $4 = 0x0 These are: /* Subdirectory to use for locating libraries. Set by set_multilib_dir based on the compilation options. */ static const char *multilib_dir; /* Subdirectory to use for locating libraries in OS conventions. Set by set_multilib_dir based on the compilation options. */ static const char *multilib_os_dir; Indeed, simpler: $ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-multi-directory -march=gfx908 gfx908 $ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-multi-directory -march=gfx906 -march=gfx908 . Instead of '.' (default), the latter should also print 'gfx908'. I'll look into 'set_multilib_dir' etc.
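The behaviour this comment expects (the last '-march=[...]' flag decides the multilib directory) can be sketched as below. This is a hypothetical helper, not GCC's 'set_multilib_dir', and it assumes that the multilib directory name equals the '-march=' value, as the 'gfx908' output above suggests.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not GCC code): returns the value of the last
// "-march=" option, or "." when there is none, mirroring what
// -print-multi-directory prints for the default multilib.
std::string expected_multi_directory(const std::vector<std::string>& args) {
    const std::string prefix = "-march=";
    std::string dir = ".";
    for (const std::string& arg : args) {
        // A later -march= overrides any earlier one.
        if (arg.compare(0, prefix.size(), prefix) == 0)
            dir = arg.substr(prefix.size());
    }
    return dir;
}
```

Under that assumption, `expected_multi_directory({"-march=gfx906", "-march=gfx908"})` yields "gfx908", which is what the report says '-print-multi-directory' should print instead of '.'.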
[Bug libstdc++/110879] [14 Regression] Unnecessary reread from memory in a loop with std::vector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110879 --- Comment #4 from Jonathan Wakely --- Ah I think that's probably expected. In _M_realloc_insert (and now _M_realloc_append) we have: #if __cplusplus >= 201103L if _GLIBCXX17_CONSTEXPR (_S_use_relocate()) { // Relocation cannot throw. __new_finish = _S_relocate(__old_start, __position.base(), __new_start, _M_get_Tp_allocator()); ++__new_finish; __new_finish = _S_relocate(__position.base(), __old_finish, __new_finish, _M_get_Tp_allocator()); } else #endif and then an alternative path used for non-trivial types and for C++98. That alternative path does more work and probably can't be optimized as well, so the reads from _M_end_of_storage aren't optimized out. I think we can just use { target c++11 } for the test.
[Bug c++/112642] ranges::fold_left tries to access inactive union member of string in constant expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112642 --- Comment #7 from Miro Palmu --- (In reply to Jonathan Wakely from comment #6) > The examples in comment 4 do compile using libstdc++ on clang, if you use > libstdc++ headers from after sept 29 (for trunk) or oct 21 (for gcc-13). I was testing this on compiler explorer on clang 17.0.1 and it used gcc-13.2.0 libstdc++. Also tried it locally with clang 16.0.6 with gcc-13.2.1 libstdc++ Output: $ cat prog.cpp #include #include int main() { [](std::string s = {}) consteval { std::string ss{ std::move(s) }; }(); } $ clang prog.cpp -std=c++2b -stdlib=libstdc++ prog.cpp:4:5: error: call to consteval function 'main()::(anonymous class)::operator()' is not a constant expression [](std::string s = {}) consteval { ^ /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/stl_construct.h:97:14: note: construction of subobject of member '_M_local_buf' of union with no active member is not allowed in a constant expression { return ::new((void*)__location) _Tp(std::forward<_Args>(__args)...); } ^ /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/char_traits.h:272:6: note: in call to 'construct_at(_M_local_buf[0], s.._M_local_buf[0])' std::construct_at(__s1 + __i, __s2[__i]); ^ /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/char_traits.h:443:11: note: in call to 'copy(_M_local_buf[0], _M_local_buf[0], 1)' return __gnu_cxx::char_traits::copy(__s1, __s2, __n); ^ /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.h:672:6: note: in call to 'copy(_M_local_buf[0], _M_local_buf[0], 1)' traits_type::copy(_M_local_buf, __str._M_local_buf, ^ prog.cpp:5:21: note: in call to 'basic_string(s)' std::string ss{ std::move(s) }; ^ prog.cpp:4:5: note: in call to '&[](std::string s) { std::string ss{std::move(s)}; }->operator()(}}, _M_local_buf[0]}, 0, {._M_local_buf = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
...}}})' [](std::string s = {}) consteval { ^ 1 error generated.
[Bug sanitizer/112644] [14 Regression] Some of the hwasan testcase fail after the recent merge
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112644 --- Comment #5 from Jakub Jelinek --- Thanks.
[Bug sanitizer/112644] [14 Regression] Some of the hwasan testcase fail after the recent merge
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112644 --- Comment #4 from Tamar Christina --- I've asked Matthew to take a look since he wrote the initial support.
[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #6 from JuzheZhong ---
Hi, the following run FAILs are left on RV32/RV64 C/C++ after this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637753.html

FAIL: gcc.dg/vect/pr65518.c -flto -ffat-lto-objects execution test

I don't have a quick solution for this case, so I filed a PR here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112670

This FAIL may need Robin's help.

Another FAIL is

FAIL: gcc.dg/torture/pr58955-2.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test

Li Pan from Intel will handle this FAIL.

So I am going to move on to zvl1024b.

Btw, could you run zvl2048b and zvl4096b (we can only allow a VLEN of at most
4096 bits for now)? I didn't see a PR for these two.

Thanks.
[Bug c/112670] New: RISC-V: Run fail on pr65518.c with -flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112670

            Bug ID: 112670
           Summary: RISC-V: Run fail on pr65518.c with -flto
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Case:

#include

#if VECTOR_BITS > 256
#define NINTS (VECTOR_BITS / 32)
#else
#define NINTS 8
#endif

#define N (NINTS * 2)
#define RESULT (NINTS * (NINTS - 1) / 2 * N + NINTS)

typedef struct giga
{
  unsigned int g[N];
} giga;

unsigned long __attribute__((noinline,noclone))
addfst(giga const *gptr, int num)
{
  unsigned int retval = 0;
  int i;
  for (i = 0; i < num; i++)
    retval += gptr[i].g[0];
  return retval;
}

int main ()
{
  struct giga g[NINTS];
  unsigned int n = 1;
  int i, j;
  for (i = 0; i < NINTS; ++i)
    for (j = 0; j < N; ++j)
      {
        g[i].g[j] = n++;
        __asm__ volatile ("");
      }

  assert (addfst (g, NINTS) == RESULT);
  return 0;
}

With -march=rv64gcv_zvfh_zfh_zvl512b -mabi=lp64d -O3 -fno-vect-cost-model, the
run passes.

However, with -march=rv64gcv_zvfh_zfh_zvl512b -mabi=lp64d -O3
-fno-vect-cost-model -flto, execution fails:

bbl loader
assertion "addfst (g, NINTS) == RESULT" failed: file "bug.c", line 38, function: main

I compared the codegen and it is exactly the same. I can't figure out what the
problem is.
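For readers checking the assertion by hand, the RESULT formula can be recomputed as in the sketch below. It assumes the testcase takes its fallback branch (NINTS is 8), which may not match the failing configuration, and the helper `expected_sum` is hypothetical. The constexpr loop needs C++14 or later.

```cpp
// Assumption: VECTOR_BITS is undefined or <= 256, so the testcase uses NINTS 8.
constexpr unsigned NINTS = 8;
constexpr unsigned N = NINTS * 2;
constexpr unsigned RESULT = NINTS * (NINTS - 1) / 2 * N + NINTS;

// main() stores 1, 2, 3, ... in row-major order, so g[i].g[0] == 1 + i * N.
// addfst (g, NINTS) sums exactly those first elements.
constexpr unsigned expected_sum() {
    unsigned sum = 0;
    for (unsigned i = 0; i < NINTS; ++i)
        sum += 1 + i * N;
    return sum;
}

static_assert(expected_sum() == RESULT, "closed form matches the element sum");
```

With these values the sum is 8 + 16 * (0 + 1 + ... + 7) = 456, so under this assumption a correct build returns 456 from addfst.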
[Bug target/112669] New: GCN: wrong 'LIBRARY_PATH' in presence of several different '-march=[...]' flags
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112669 Bug ID: 112669 Summary: GCN: wrong 'LIBRARY_PATH' in presence of several different '-march=[...]' flags Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tschwinge at gcc dot gnu.org CC: ams at gcc dot gnu.org, jules at gcc dot gnu.org Target Milestone: --- Target: GCN I've run into a weird issue when several different '-march=[...]' flags appear. This causes linking to fail: the linker tries to link in the wrong multilib's libraries. This happens, for example, if the user provides '-march=[...]' for libgomp offloading testing, but a test cases also specifies a specific '-march=[...]'. The problem might perhaps be in GCN multilib setup, however it doesn't seem related to the recent changes ("amdgcn: deprecate Fiji device and multilib"), as I'm also reproducing the issue with previous GCC release branches. The issue -- I suppose -- boils down to: No '-march=[...]' flag appears, default paths: $ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-search-dirs | sed -n -e "/^libraries: =/{s%[^=]\+=%%;s%$PWD%[...]%g;s%:%\n%g;p}" > default $ cat < default [...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/ [...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/ [...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/../../../../../../amdgcn-amdhsa/lib/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/ [...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/../../../../../../amdgcn-amdhsa/lib/ [...]/build-gcc-offload-amdgcn-amdhsa/gcc/../amdgcn-amdhsa/lib/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/ [...]/build-gcc-offload-amdgcn-amdhsa/gcc/../amdgcn-amdhsa/lib/ If one '-march=[...]' flag appears, we get those multilib paths prepended: $ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-search-dirs 
-march=gfx906 | sed -n -e "/^libraries: =/{s%[^=]\+=%%;s%$PWD%[...]%g;s%:%\n%g;p}" > gfx906 $ diff -U1 default gfx906 --- default 2023-11-22 11:47:14.021018613 +0100 +++ gfx906 2023-11-22 11:47:21.856931965 +0100 @@ -1 +1,7 @@ +[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/gfx906/ +[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/gfx906/ +[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/../../../../../../amdgcn-amdhsa/lib/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/gfx906/ +[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/../../../../../../amdgcn-amdhsa/lib/gfx906/ +[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../amdgcn-amdhsa/lib/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/gfx906/ +[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../amdgcn-amdhsa/lib/gfx906/ [...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/ Similarly, if the same '-march=[...]' flag appears twice, we get those multilib paths prepended: $ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-search-dirs -march=gfx908 -march=gfx908 | sed -n -e "/^libraries: =/{s%[^=]\+=%%;s%$PWD%[...]%g;s%:%\n%g;p}" > gfx908 $ diff -U1 default gfx908 --- default 2023-11-22 11:47:14.021018613 +0100 +++ gfx908 2023-11-22 11:47:34.760789347 +0100 @@ -1 +1,7 @@ +[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/gfx908/ +[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/gfx908/ +[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/../../../../../../amdgcn-amdhsa/lib/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/gfx908/ +[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/../../../../../../amdgcn-amdhsa/lib/gfx908/ 
+[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../amdgcn-amdhsa/lib/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/gfx908/ +[...]/build-gcc-offload-amdgcn-amdhsa/gcc/../amdgcn-amdhsa/lib/gfx908/ [...]/build-gcc-offload-amdgcn-amdhsa/gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/accel/amdgcn-amdhsa/ However, if several different '-march=[...]' flags appear, we're back to the default: $ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-search-dirs -march=gfx906 -march=gfx908 | sed -n -e "/^libraries: =/{s%[^=]\+=%%;s%$PWD%[...]%g;s%:%\n%g;p}" > gfx906,gfx908 $ cmp default gfx906,gfx908 && echo 'no difference' no difference $ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-search-dirs -march=gfx908 -march=gfx906 | sed -n -e
[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657 Richard Biener changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #7 from Richard Biener --- I think a return of a negative value is predicted to be cold (aka "error"): ;; basic block 2, loop depth 0 ;;pred: ENTRY if (c == 14) goto ; [INV] else goto ; [INV] ;;succ: 3 ;;4 ;; basic block 3, loop depth 0 ;;pred: 2 D.2771 = -9; // predicted unlikely by early return (on trees) predictor. goto ; [INV]
[Bug middle-end/112653] We should optimize memmove to memcpy using alias oracle
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653

--- Comment #4 from Richard Biener ---
We do use the alias oracle in folding memmove:

          /* If the destination and source do not alias optimize into
             memcpy as well.  */
          if ((is_gimple_min_invariant (dest)
               || TREE_CODE (dest) == SSA_NAME)
              && (is_gimple_min_invariant (src)
                  || TREE_CODE (src) == SSA_NAME))
            {
              ao_ref destr, srcr;
              ao_ref_init_from_ptr_and_size (&destr, dest, len);
              ao_ref_init_from_ptr_and_size (&srcr, src, len);
              if (!refs_may_alias_p_1 (&destr, &srcr, false))
                {
                  tree fn;
                  fn = builtin_decl_implicit (BUILT_IN_MEMCPY);
                  if (!fn)
                    return false;

but the issue is that test2 escapes which makes this conflict:

  # PT = null { D.2775 } (escaped, escaped heap)
  # ALIGN = 8, MISALIGN = 0
  # USE = nonlocal escaped
  # CLB = nonlocal escaped
  test2_4 = __builtin_malloc (1000);
  # PT = nonlocal escaped null
  test.0_1 = test;
  __builtin_memmove (test2_4, test.0_1, 1000);

it works for

char *test, *test3;
void copy_test ()
{
  char *test2 = __builtin_malloc (1000);
  __builtin_memmove (test2, test, 1000);
  __builtin_memmove (test3, test2, 1000);
  __builtin_free (test2);
}

where both memmove calls become memcpy. So this isn't asking for better
folding but for better pointer analysis I guess.
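As a user-level analogue of the folding described in this comment, the sketch below makes the same decision at run time instead of via the alias oracle. The helpers `ranges_overlap` and `copy_bytes` are hypothetical names, and the address comparison assumes both ranges are valid and do not wrap around the address space.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// True when [dest, dest + len) and [src, src + len) share at least one byte.
// Exact overlap (dest == src, len > 0) counts as overlapping.
bool ranges_overlap(const void* dest, const void* src, std::size_t len) {
    const auto d = reinterpret_cast<std::uintptr_t>(dest);
    const auto s = reinterpret_cast<std::uintptr_t>(src);
    return len != 0 && d < s + len && s < d + len;
}

// memcpy is only valid for non-overlapping ranges; fall back to memmove
// whenever the ranges may share bytes.
void* copy_bytes(void* dest, const void* src, std::size_t len) {
    return ranges_overlap(dest, src, len) ? std::memmove(dest, src, len)
                                          : std::memcpy(dest, src, len);
}
```

The compiler can only drop the check when it proves at compile time that the answer is always "no overlap", which is what the points-to information for test2 fails to establish in the reported case.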
[Bug target/112611] LoongArch: Test cases lsx-vshuf.c and lasx-xvshuf_b.c fails on LA664
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112611 --- Comment #4 from Xi Ruoyao --- (In reply to Jiahao Xu from comment #3) > We now consider it as undefined behavior rather than a bug for [x]vshuf > instructions. In vec_perm pattern, we use vector logical AND instructions to > perform modulo operations in order to correctly use the [x]vshuf > instructions. Therefore, we have decided to rewrite the two tests and ensure > that the index values in the selector do not exceed 64. I guess it would be better to also document this issue somewhere (extend.texi ?) and recommend just using __builtin_shuffle instead of the intrinsic (unless the programmer knows the AND operation is not needed but the compiler does not).
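The AND-as-modulo step mentioned in the quoted comment can be illustrated with a scalar model of a two-input shuffle whose selector is reduced modulo twice the lane count, which is how GCC documents __builtin_shuffle. The lane count, the type alias and the function name here are assumptions for illustration. This does not model what the [x]vshuf instructions do with out-of-range indices.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 16;  // illustrative lane count
using Vec = std::array<std::uint8_t, kLanes>;

// Scalar model: element i of the result is picked from the concatenation of
// a (indices 0..kLanes-1) and b (indices kLanes..2*kLanes-1). Because
// 2 * kLanes is a power of two, the modulo reduces to a bitwise AND.
Vec shuffle_masked(const Vec& a, const Vec& b, const Vec& sel) {
    Vec out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        const std::size_t idx = sel[i] & (2 * kLanes - 1);
        out[i] = idx < kLanes ? a[idx] : b[idx - kLanes];
    }
    return out;
}
```

With the mask in place a selector value of 33 behaves like 1. Without it the result would depend on how the hardware treats the extra bits, which is the undefined behaviour the rewritten tests avoid.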