[Bug libstdc++/88545] std::find compile to memchr in trivial random access cases (patch)

2024-05-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545

--- Comment #6 from AK  ---
> We can use memchr to find a char in a range of signed char, or even to find 
> an int in a range of signed char, as long as we're careful about values.

+1, this approach should fix the bug I reported:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115040
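
As a rough illustration of the quoted idea (a minimal sketch, not the libstdc++ patch): memchr compares bytes as unsigned char, so searching a range of signed char for an int value is only safe after checking that the value is representable as signed char. The helper name find_in_schar is hypothetical.

```cpp
#include <climits>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of libstdc++.
// Returns a pointer to the first element equal to value, or last if none.
const signed char* find_in_schar(const signed char* first,
                                 const signed char* last, int value)
{
    // A value outside [SCHAR_MIN, SCHAR_MAX] can never compare equal to an
    // element, but memchr would truncate it to a byte and might "find" it.
    if (value < SCHAR_MIN || value > SCHAR_MAX)
        return last;
    // Avoid calling memchr on an empty (possibly null) range.
    if (first == last)
        return last;
    // memchr compares bytes as unsigned char, so convert the value the same way.
    const void* hit = std::memchr(first, static_cast<unsigned char>(value),
                                  static_cast<std::size_t>(last - first));
    return hit ? static_cast<const signed char*>(hit) : last;
}
```

Without the range check, a search for 298 (256 + 42) would wrongly match an element holding 42.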

[Bug libstdc++/88545] std::find compile to memchr in trivial random access cases (patch)

2024-05-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545

AK  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hiraditya at msn dot com

--- Comment #5 from AK  ---
> I think we're going to remove the manual loop unrolling in __find_if for GCC
> 15, which should allow the compiler to optimize it better, potentially
> auto-vectorizing. That might make memchr less advantageous, but I think it's
> worth doing anyway.

And even with code-size flags (-Os), memchr still gives the best of both
worlds, since auto-vectorizing increases code size.

[Bug tree-optimization/115041] New: Missed optimization opportunity in std::find of std::vector elements

2024-05-10 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115041

Bug ID: 115041
   Summary: Missed optimization opportunity in std::find of
std::vector elements
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

https://gcc.godbolt.org/z/s3hv15935


```
#include <algorithm>
#include <cstdint>
#include <vector>

bool find_epi8(const std::vector<std::int8_t>& v) {
    return std::find(v.begin(), v.end(), 42) != v.end();
}

bool find_epi32(const std::vector<std::int32_t>& v) {
    return std::find(v.begin(), v.end(), 42) != v.end();
}
```

$ gcc -O3 -ftree-vectorize -march=pantherlake

```
find_epi8(std::vector<signed char, std::allocator<signed char> > const&):
mov rcx, QWORD PTR [rdi+8]
mov rdx, QWORD PTR [rdi]
mov rsi, rcx
sub rsi, rdx
mov rax, rsi
sar rax, 2
test rax, rax
jle .L2
lea rax, [rdx+rax*4]
jmp .L8
.L3:
cmp BYTE PTR [rdx+1], 42
je  .L23
cmp BYTE PTR [rdx+2], 42
je  .L24
cmp BYTE PTR [rdx+3], 42
je  .L25
add rdx, 4
cmp rdx, rax
je  .L26
.L8:
cmp BYTE PTR [rdx], 42
jne .L3
.L21:
cmp rcx, rdx
setne   al
ret
.L26:
mov rsi, rcx
sub rsi, rdx
.L2:
cmp rsi, 2
je  .L9
cmp rsi, 3
je  .L10
cmp rsi, 1
je  .L11
xor eax, eax
ret
.L10:
cmp BYTE PTR [rdx], 42
je  .L21
add rdx, 1
.L9:
cmp BYTE PTR [rdx], 42
je  .L21
add rdx, 1
.L11:
cmp BYTE PTR [rdx], 42
sete al
cmp rcx, rdx
setne   dl
and eax, edx
ret
.L23:
add rdx, 1
cmp rcx, rdx
setne   al
ret
.L24:
add rdx, 2
cmp rcx, rdx
setne   al
ret
.L25:
add rdx, 3
cmp rcx, rdx
setne   al
ret
find_epi32(std::vector<int, std::allocator<int> > const&):
mov rcx, QWORD PTR [rdi+8]
mov rdx, QWORD PTR [rdi]
mov rax, rcx
sub rax, rdx
mov rsi, rax
sar rax, 4
sar rsi, 2
test rax, rax
jle .L28
sal rax, 4
add rax, rdx
jmp .L34
.L29:
cmp DWORD PTR [rdx+4], 42
je  .L48
cmp DWORD PTR [rdx+8], 42
je  .L49
cmp DWORD PTR [rdx+12], 42
je  .L50
add rdx, 16
cmp rdx, rax
je  .L51
.L34:
cmp DWORD PTR [rdx], 42
jne .L29
.L47:
cmp rcx, rdx
setne   al
ret
.L51:
mov rsi, rcx
sub rsi, rdx
sar rsi, 2
.L28:
cmp rsi, 2
je  .L35
cmp rsi, 3
je  .L36
cmp rsi, 1
je  .L37
xor eax, eax
ret
.L36:
cmp DWORD PTR [rdx], 42
je  .L47
add rdx, 4
.L35:
cmp DWORD PTR [rdx], 42
je  .L47
add rdx, 4
.L37:
cmp DWORD PTR [rdx], 42
sete al
cmp rcx, rdx
setne   dl
and eax, edx
ret
.L48:
add rdx, 4
cmp rcx, rdx
setne   al
ret
.L49:
add rdx, 8
cmp rcx, rdx
setne   al
ret
.L50:
add rdx, 12
cmp rcx, rdx
setne   al
ret
```


Clang lowers both calls to (w)memchr.

[Bug tree-optimization/107263] Memcpy not elided when initializing struct

2024-03-19 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107263

AK  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hiraditya at msn dot com

--- Comment #3 from AK  ---
This seems like a duplicate of bug 59863?

[Bug middle-end/59863] const array in function is placed on stack

2024-03-19 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59863

AK  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hiraditya at msn dot com

--- Comment #9 from AK  ---
*** Bug 114342 has been marked as a duplicate of this bug. ***

[Bug middle-end/114342] suboptimal codegen of vector::vector(range)

2024-03-19 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114342

AK  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
            Version|unknown                     |14.0
             Status|NEW                         |RESOLVED

--- Comment #3 from AK  ---
I see. Marking as a duplicate. Thanks for clarifying!

*** This bug has been marked as a duplicate of bug 59863 ***

[Bug c++/114342] New: suboptimal codegen of vector::vector(range)

2024-03-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114342

Bug ID: 114342
   Summary: suboptimal codegen of vector::vector(range)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <ranges>
#include <vector>

std::vector<int> td() {
  int arr[]{-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,
15, -5, 10, 15,-5, 10, 15 -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5,
10, 15, -5, 10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5,
10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5,
10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,-5, 10,
15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,};
  auto b = std::ranges::begin(arr);
  auto e = std::ranges::end(arr);
  std::vector dd(b, e);
  return dd;
}

What is the reason for calling `rep movsq` twice?

$ gcc -O3 -std=c++23
```
td():
push rbp
mov esi, OFFSET FLAT:.LC0
mov ecx, 55
pxor xmm0, xmm0
push rbx
mov rbx, rdi
sub rsp, 456
mov QWORD PTR [rbx+16], 0
mov rbp, rsp
movups  XMMWORD PTR [rbx], xmm0
mov rdi, rbp
rep movsq
mov eax, DWORD PTR [rsi]
mov DWORD PTR [rdi], eax
mov edi, 444
call operator new(unsigned long)
lea rdx, [rax+444]
mov QWORD PTR [rbx], rax
lea rdi, [rax+8]
mov rsi, rbp
mov QWORD PTR [rbx+16], rdx
mov rcx, QWORD PTR [rsp]
and rdi, -8
mov QWORD PTR [rax], rcx
mov rcx, QWORD PTR [rsp+436]
mov QWORD PTR [rax+436], rcx
sub rax, rdi
sub rsi, rax
add eax, 444
shr eax, 3
mov ecx, eax
mov rax, rbx
rep movsq
mov QWORD PTR [rbx+8], rdx
add rsp, 456
pop rbx
pop rbp
ret
mov rbp, rax
jmp .L2
td() [clone .cold]:
.L2:
mov rdi, QWORD PTR [rbx]
mov rsi, QWORD PTR [rbx+16]
sub rsi, rdi
test rdi, rdi
je  .L3
call operator delete(void*, unsigned long)
.L3:
mov rdi, rbp
call _Unwind_Resume
```

https://godbolt.org/z/5333db8Px
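
A possible workaround, sketched under the assumption that the first `rep movsq` materializes `arr` on the stack before the vector constructor copies it to the heap: giving the array static storage duration keeps the initializer in read-only data, so the constructor can copy from it directly. The name td_static and the shortened initializer list are hypothetical, and the resulting assembly has not been checked against any particular GCC version.

```cpp
#include <iterator>
#include <vector>

// Hypothetical variant of td() with a shortened initializer list.
std::vector<int> td_static() {
    // Static storage: the values live in read-only data, so there is no
    // local array object that has to be filled at run time.
    static constexpr int arr[]{-5, 10, 15, -5, 10, 15};
    return std::vector<int>(std::begin(arr), std::end(arr));
}
```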

[Bug c++/111806] g++ generates better code for variant at -Os compared to -O3

2023-10-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111806

--- Comment #1 from AK  ---
It seems like we could 'sink' the 4 common instructions (of .L2) at -O3

L2:
add rsp, 48
xor eax, eax
pop rbx
ret


Or is it due to some kind of tail duplication?

[Bug c++/111806] New: g++ generates better code for variant at -Os compared to -O3

2023-10-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111806

Bug ID: 111806
   Summary: g++ generates better code for variant at
-Os compared to -O3
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <iostream>
#include <string>
#include <variant>

int foo() {
std::variant v {"abc"};
std::cout << std::get<0>(v);
return 0;
}

g++ -O3 -std=c++20 -g0  -fno-exceptions

foo():
.LFB2484:
push rbx
mov eax, 25185
mov edx, 3
mov edi, OFFSET FLAT:_ZSt4cout
sub rsp, 48
lea rbx, [rsp+16]
mov WORD PTR [rsp+16], ax
mov rsi, rbx
mov QWORD PTR [rsp], rbx
mov BYTE PTR [rsp+18], 99
mov QWORD PTR [rsp+8], 3
mov BYTE PTR [rsp+19], 0
mov BYTE PTR [rsp+32], 0
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
cmp BYTE PTR [rsp+32], 0
je  .L5
.L2:
add rsp, 48
xor eax, eax
pop rbx
ret
.L5:
mov rdi, QWORD PTR [rsp]
cmp rdi, rbx
je  .L2
mov rax, QWORD PTR [rsp+16]
lea rsi, [rax+1]
call operator delete(void*, unsigned long)
add rsp, 48
xor eax, eax
pop rbx
ret
.LFE2484:


g++ -Os -std=c++20 -g0  -fno-exceptions


foo():
.LFB2463:
push rbx
mov edx, 3
mov edi, OFFSET FLAT:_ZSt4cout
sub rsp, 48
lea rbx, [rsp+24]
mov WORD PTR [rsp+24], 25185
mov rsi, rbx
mov QWORD PTR [rsp+8], rbx
mov BYTE PTR [rsp+26], 99
mov QWORD PTR [rsp+16], 3
mov BYTE PTR [rsp+27], 0
mov BYTE PTR [rsp+40], 0
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
cmp BYTE PTR [rsp+40], 0
jne .L2
mov rdi, QWORD PTR [rsp+8]
cmp rdi, rbx
je  .L2
mov rax, QWORD PTR [rsp+24]
lea rsi, [rax+1]
call operator delete(void*, unsigned long)
.L2:
add rsp, 48
xor eax, eax
pop rbx
ret
.LFE2463:


https://godbolt.org/z/3xKh35Mrv

[Bug c++/111805] New: suboptimal codegen of variant

2023-10-13 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111805

Bug ID: 111805
   Summary: suboptimal codegen of variant
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <string>
#include <variant>

std::string foo() {
std::variant v {"abc"};
return std::get<0>(v);
}

g++-13.2 -O2 -std=c++20


foo[abi:cxx11]():
lea rdx, [rdi+16]
mov BYTE PTR [rdi+18], 99
mov rax, rdi
mov QWORD PTR [rdi], rdx
mov edx, 25185
mov WORD PTR [rdi+16], dx
mov QWORD PTR [rdi+8], 3
mov BYTE PTR [rdi+19], 0
ret


clang++ -O2 -std=c++20

foo():# @foo()
mov rax, rdi
mov byte ptr [rdi], 6
mov dword ptr [rdi + 1], 6513249
ret


https://godbolt.org/z/nTv5rYanM

[Bug target/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'

2023-09-15 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420

--- Comment #6 from AK  ---
To confirm what Andrew mentioned, the release build (-O3) built successfully.

[Bug target/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'

2023-09-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420

AK  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |MOVED

--- Comment #5 from AK  ---
Created: https://sourceware.org/bugzilla/show_bug.cgi?id=30855

[Bug target/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'

2023-09-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420

--- Comment #4 from AK  ---
Good catch. I built at -O0 by mistake; I meant to build at -O3.

[Bug c/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'

2023-09-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420

--- Comment #1 from AK  ---
I got this error while building clang (ninja clang) on a riscv machine.

root@lpi4a:~# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/riscv64-linux-gnu/13/lto-wrapper
Target: riscv64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 13.1.0-6'
--with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --prefix=/usr
--with-gcc-major-version-only --program-suffix=-13
--program-prefix=riscv64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-libitm --disable-libquadmath
--disable-libquadmath-support --enable-plugin --enable-default-pie
--with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --disable-multilib --with-arch=rv64gc --with-abi=lp64d
--enable-checking=release --build=riscv64-linux-gnu --host=riscv64-linux-gnu
--target=riscv64-linux-gnu --with-build-config=bootstrap-lto-lean
--enable-link-serialization=32
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.1.0 (Debian 13.1.0-6) 

--

root@lpi4a:~# uname -a
Linux lpi4a 5.10.113-g7b352f5ac2ba #1 SMP PREEMPT Wed Apr 12 12:06:11 UTC 2023
riscv64 GNU/Linux

[Bug c/111420] New: relocation truncated to fit: R_RISCV_JAL against `.L12287'

2023-09-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420

Bug ID: 111420
   Summary: relocation truncated to fit: R_RISCV_JAL against
`.L12287'
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

CGBuiltin.cpp:(.text._ZN5clang7CodeGen15CodeGenFunction20EmitRISCVBuiltinExprEjPKNS_8CallExprENS0_15ReturnValueSlotE+0x10d0):
relocation truncated to fit: R_RISCV_JAL against `.L12287'


command:


: && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition
-fvisibility-inlines-hidden -Werror=date-time -fno-lifetime-dse -Wall -Wextra
-Wno-unused-parameter -Wwrite-strings -Wcast-qual
-Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough
-Wno-maybe-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move
-Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor
-Wsuggest-override -Wno-comment -Wno-misleading-indentation
-Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections
-fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing 
-Wl,-z,defs -Wl,-z,nodelete  
-Wl,-rpath-link,/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/./lib
 -Wl,--gc-sections -shared -Wl,-soname,libclangCodeGen.so.18git -o
lib/libclangCodeGen.so.18git
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/ABIInfo.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/ABIInfoImpl.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/BackendUtil.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGAtomic.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGBlocks.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGBuiltin.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCUDANV.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCUDARuntime.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCXX.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCXXABI.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCall.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGClass.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCleanup.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCoroutine.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGDebugInfo.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGDecl.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGDeclCXX.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGException.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExpr.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprAgg.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprCXX.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprComplex.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprConstant.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprScalar.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGGPUBuiltin.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGHLSLRuntime.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGLoopInfo.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGNonTrivialStruct.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjC.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjCGNU.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjCMac.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjCRuntime.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGOpenCLRuntime.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGOpenMPRuntime.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGOpenMPRuntimeGPU.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGRecordLayoutBuilder.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGStmt.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGStmtOpenMP.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGVTT.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGVTables.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenABITypes.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenAction.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenFunction.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenModule.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenPGO.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenTBAA.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenTypes.cpp.o

[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314

2023-09-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393

--- Comment #9 from AK  ---
I think it is okay to close this bug, as it doesn't seem to be related to GCC.

[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314

2023-09-13 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393

--- Comment #8 from AK  ---

> this does seem like a HW issue. Are you sure you have a decent RISCV machine 
> without any memory issues?
> I suspect ninja is building with all of the cores which pushes the memory 
> usage high.

Possible. I have a LicheePi 4A board (https://sipeed.com/licheepi4a).


> Maybe lower the clock speed of the CPU you are using.

Will do, thanks.

[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314

2023-09-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393

--- Comment #5 from AK  ---
Created attachment 55890
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55890&action=edit
GlobalModuleIndex.cpp preprocessed files

The crash is in a different file every time, so it could just be caused by
memory issues.

[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314

2023-09-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393

--- Comment #3 from AK  ---
gcc -v

COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/riscv64-linux-gnu/13/lto-wrapper
Target: riscv64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 13.1.0-6'
--with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --prefix=/usr
--with-gcc-major-version-only --program-suffix=-13
--program-prefix=riscv64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-libitm --disable-libquadmath
--disable-libquadmath-support --enable-plugin --enable-default-pie
--with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --disable-multilib --with-arch=rv64gc --with-abi=lp64d
--enable-checking=release --build=riscv64-linux-gnu --host=riscv64-linux-gnu
--target=riscv64-linux-gnu --with-build-config=bootstrap-lto-lean
--enable-link-serialization=32
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.1.0 (Debian 13.1.0-6) 
root@lpi4a:/media/root/d2fc9f48-c166-4a9

[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314

2023-09-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393

--- Comment #1 from AK  ---
oot/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build# ninja clang
check-clang
[100/845] Building CXX object
tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o
FAILED:
tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o
 
/usr/bin/c++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE
-D_LIBCPP_ENABLE_HARDENED_MODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS
-D__STDC_LIMIT_MACROS
-I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/tools/clang/lib/Serialization
-I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/lib/Serialization
-I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/include
-I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/tools/clang/include
-I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/include
-I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/llvm/include
-fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time
-fno-lifetime-dse -Wall -Wextra -Wno-unused-parameter -Wwrite-strings
-Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long
-Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-nonnull
-Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move
-Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment
-Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color
-ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual
-fno-strict-aliasing  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG
-std=c++17 -MD -MT
tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o
-MF
tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o.d
-o
tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o
-c
/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/lib/Serialization/GlobalModuleIndex.cpp
In file included from
/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/llvm/include/llvm/ADT/DenseMapInfo.h:20,
 from
/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/llvm/include/llvm/ADT/DenseMap.h:17,
 from
/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/include/clang/Serialization/GlobalModuleIndex.h:18,
 from
/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/lib/Serialization/GlobalModuleIndex.cpp:13:
/usr/include/c++/13/tuple: In instantiation of ‘struct std::_Tuple_impl<0,
clang::ModuleFileExtensionReader*,
std::default_delete >’:
/usr/include/c++/13/tuple:1232:11:   required from ‘class
std::tuple >’
/usr/include/c++/13/bits/unique_ptr.h:232:27:   required from ‘class
std::__uniq_ptr_impl >’
/usr/include/c++/13/bits/unique_ptr.h:239:12:   required from ‘struct
std::__uniq_ptr_data, true, true>’
/usr/include/c++/13/bits/unique_ptr.h:283:33:   required from ‘class
std::unique_ptr’
/usr/include/c++/13/bits/stl_vector.h:367:35:   required from
‘std::_Vector_base<_Tp, _Alloc>::~_Vector_base() [with _Tp =
std::unique_ptr; _Alloc =
std::allocator >]’
/usr/include/c++/13/bits/stl_vector.h:528:7:   required from here
/usr/include/c++/13/tuple:269:7: internal compiler error: Segmentation fault
  269 |   _M_head(_Tuple_impl& __t) noexcept { return _Base::_M_head(__t);
}
  |   ^~~
0x85d7c5 crash_signal
../../src/gcc/toplev.cc:314
0xa0d5e0 profile_count::operator==(profile_count const&) const
../../src/gcc/profile-count.h:865
0xa0d5e0 profile_count::apply_probability(profile_probability) const
../../src/gcc/profile-count.h:1104
0xa0d5e0 edge_def::count() const
../../src/gcc/basic-block.h:639
0xa0d5e0 eliminate_tail_call
../../src/gcc/tree-tailcall.cc:982
0xa0d5e0 optimize_tail_call
../../src/gcc/tree-tailcall.cc:1053
0xa0d5e0 tree_optimize_tail_calls_1
../../src/gcc/tree-tailcall.cc:1193
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <file:///usr/share/doc/gcc-13/README.Bugs> for instructions.

[Bug tree-optimization/111393] New: ICE: Segmentation fault src/gcc/toplev.cc:314

2023-09-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393

Bug ID: 111393
   Summary: ICE: Segmentation fault src/gcc/toplev.cc:314
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

GCC (Debian 13.1) targeting riscv64-linux-gnu crashed with an ICE while
building llvm-project (GlobalModuleIndex.cpp).

src/gcc/toplev.cc:314
profile_count::operator==(profile_count const&) const
 ../../src/gcc/profile-count.h:865
profile_count::apply_probability(profile_probability) const
 ../../src/gcc/profile-count.h:1104

[Bug c++/110909] New: Suboptimal codegen in vector copy assignment

2023-08-04 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110909

Bug ID: 110909
   Summary: Suboptimal codegen in vector copy assignment
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <vector>

using Container = std::vector<int>;
int copy_assignment(const Container &v1, Container &v2) {
  v2 = v1;
  return 0;
}


I'd expect this to generate only a memcpy. I'm not sure why calls to memmove
are generated.

$ gcc -std=c++2a -O3  -fno-exceptions

copy_assignment(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> >&):
cmp rsi, rdi
je  .L21
push r13
push r12
push rbp
mov rbp, rdi
push rbx
mov rbx, rsi
sub rsp, 8
mov rax, QWORD PTR [rdi+8]
mov r13, QWORD PTR [rdi]
mov rdx, QWORD PTR [rsi+16]
mov rdi, QWORD PTR [rsi]
mov r12, rax
sub r12, r13
sub rdx, rdi
cmp rdx, r12
jb  .L25
mov rcx, QWORD PTR [rsi+8]
mov rdx, rcx
sub rdx, rdi
cmp rdx, r12
jnb .L26
cmp rdx, 4
jle .L12
mov rsi, r13
call memmove
mov rcx, QWORD PTR [rbx+8]
mov rdi, QWORD PTR [rbx]
mov rax, QWORD PTR [rbp+8]
mov r13, QWORD PTR [rbp+0]
mov rdx, rcx
sub rdx, rdi
.L13:
lea rsi, [r13+0+rdx]
sub rax, rsi
mov rdx, rax
cmp rax, 4
jle .L14
mov rdi, rcx
call memmove
mov rax, QWORD PTR [rbx]
add rax, r12
.L8:
mov QWORD PTR [rbx+8], rax
add rsp, 8
xor eax, eax
pop rbx
pop rbp
pop r12
pop r13
ret
.L21:
xor eax, eax
ret
.L25:
movabs  rax, 9223372036854775804
cmp rax, r12
jb  .L27
mov rdi, r12
call operator new(unsigned long)
mov rbp, rax
cmp r12, 4
jle .L5
mov rdx, r12
mov rsi, r13
mov rdi, rax
call memcpy
.L6:
mov rdi, QWORD PTR [rbx]
test rdi, rdi
je  .L7
mov rsi, QWORD PTR [rbx+16]
sub rsi, rdi
call operator delete(void*, unsigned long)
.L7:
lea rax, [rbp+0+r12]
mov QWORD PTR [rbx], rbp
mov QWORD PTR [rbx+16], rax
jmp .L8
.L26:
cmp r12, 4
jle .L10
mov rdx, r12
mov rsi, r13
call memmove
mov rax, QWORD PTR [rbx]
add rax, r12
jmp .L8
.L14:
lea rax, [rdi+r12]
jne .L8
mov edx, DWORD PTR [rsi]
mov DWORD PTR [rcx], edx
jmp .L8
.L12:
jne .L13
mov esi, DWORD PTR [r13+0]
mov DWORD PTR [rdi], esi
jmp .L13
.L10:
lea rax, [rdi+r12]
jne .L8
mov edx, DWORD PTR [r13+0]
mov DWORD PTR [rdi], edx
jmp .L8
.L5:
mov eax, DWORD PTR [r13+0]
mov DWORD PTR [rbp+0], eax
jmp .L6
.L27:
call std::__throw_bad_array_new_length()



Ideally, the above C++ code should translate to an equivalent of the following
C++ code:

using Container = std::vector<int>;
int copy_assignment(const Container &v1, Container &v2) {
  v2.reserve(v1.size());
  std::memcpy(&v2[0], &v1[0], v1.size()*sizeof(int));
  // change the size: v2.size() = v1.size()
  return 0;
}
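
A runnable approximation of the pseudo-code above, as a sketch only: std::vector offers no way to set its size without initializing elements, so resize() stands in for "reserve and change the size". memcpy requires non-overlapping buffers, so the sketch guards self-assignment explicitly. The name copy_assignment_sketch is hypothetical.

```cpp
#include <cstring>
#include <vector>

using Container = std::vector<int>;

// Hypothetical approximation of the "ideal" copy assignment.
int copy_assignment_sketch(const Container& v1, Container& v2) {
    if (&v1 == &v2)        // memcpy must not be used on overlapping buffers
        return 0;
    v2.resize(v1.size());  // stands in for "reserve + set size"
    if (!v1.empty())       // data() may be null for an empty vector
        std::memcpy(v2.data(), v1.data(), v1.size() * sizeof(int));
    return 0;
}
```

Note that resize() value-initializes any newly added elements before memcpy overwrites them, so this is not quite the single memcpy asked for.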

[Bug c++/110137] implement clang -fassume-sane-operator-new

2023-08-01 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110137

--- Comment #3 from AK  ---
1. Clang also has noalias on the nothrow versions of operator new. Will
`-fassume-sane-operator-new` enable that as well?

2. As per http://eel.is/c++draft/basic.stc.dynamic#allocation-2:

"""If the request succeeds, the value returned by a replaceable allocation
function is a non-null pointer value ([basic.compound]) p0 different from any
previously returned value p1, unless that value p1 was subsequently passed to a
replaceable deallocation function."""

Does this mean that every successful allocation can be assumed to be noalias
as long as the pointer has not been passed to a deallocation function? If so,
can the compiler, where possible, infer from a bottom-up analysis that an
allocation is noalias?
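
To make the question concrete, here is a small hypothetical example (not taken from the bug) of what the noalias assumption would allow. If the pointer returned by new is known to differ from every pointer that existed before the call, the store through p cannot change *q.

```cpp
// Hypothetical example: p is some pointer that existed before the call.
int store_then_load(int* p) {
    int* q = new int(1);  // fresh allocation, distinct from p by the quoted wording
    *p = 2;               // so this store cannot modify *q
    int result = *q;      // with noalias on new, this load may be folded to 1
    delete q;
    return result;
}
```

Whether a given compiler actually folds the load depends on how it models operator new. The sketch only shows the opportunity.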

[Bug tree-optimization/110819] Missed optimization: when vector's size is 0 but vector::reserve has been called.

2023-07-28 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110819

--- Comment #2 from AK  ---
> When compiled with clang, libstdc++'s std::vector uses __builtin_operator_new 
> which always has the -fassume-sane-operator-new semantics, and so can be 
> optimized.

Yes, clang optimizes this with libstdc++ as well. What can be done in GCC so
that it detects that the new+delete pair can be optimized away?

[Bug c++/110819] New: Missed optimization: when vector size is 0 but vector::reserve has been called.

2023-07-26 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110819

Bug ID: 110819
   Summary: Missed optimization: when vector size is 0 but
vector::reserve has been called.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <vector>

void f(int);

void use_idx_const_size_reserve() {
    std::vector<int> v;
    v.reserve(10);
    auto s = v.size();
    for (std::vector<int>::size_type i = 0; i < s; i++)
        f(v[i]);
}

$ g++ -O3

use_idx_const_size_reserve():
sub rsp, 8
mov edi, 40
call operator new(unsigned long)
mov esi, 40
add rsp, 8
mov rdi, rax
jmp operator delete(void*, unsigned long)


$ clang++ -O3 -stdlib=libc++

use_idx_const_size_reserve():# @use_idx_const_size_reserve()
ret
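
As a rough reduction of the pattern (a sketch with hypothetical names, not code from the report): the vector stays empty, so the loop body never runs and the allocation is never observed. What remains in the GCC output above is essentially an allocation immediately followed by a deallocation.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical reduction: reserve() allocates, but the size stays 0.
std::size_t reserve_then_size() {
    std::vector<int> v;
    v.reserve(10);    // requests room for 10 ints (40 bytes with 4-byte int)
    return v.size();  // always 0, so a loop bounded by it never executes
}

// Roughly the allocation/deallocation pair left over after inlining.
void raw_pair() {
    void* p = ::operator new(10 * sizeof(int));
    ::operator delete(p);
}
```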

[Bug libstdc++/109442] Dead local copy of std::vector not removed from function

2023-06-15 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109442

--- Comment #17 from AK  ---
With recent changes in libc++ (https://reviews.llvm.org/D147741) clang
optimizes away the new-delete pair. https://godbolt.org/z/a6PG54Pvb

$ clang++ -O3 -stdlib=libc++ -fno-exceptions

vat1(std::__1::vector<int, std::__1::allocator<int> >): # @vat1(std::__1::vector<int, std::__1::allocator<int> >)
  sub rsp, 24
  xorps xmm0, xmm0
  movaps xmmword ptr [rsp], xmm0
  mov qword ptr [rsp + 16], 0
  mov rax, qword ptr [rdi + 8]
  sub rax, qword ptr [rdi]
  je .LBB0_2
  js .LBB0_3
.LBB0_2:
  mov eax, 10
  add rsp, 24
  ret
.LBB0_3:
  mov rdi, rsp
  call std::__1::vector<int, std::__1::allocator<int> >::__throw_length_error[abi:v17]() const
.L.str:
  .asciz "vector"

.L.str.1:
  .asciz "length_error was thrown in -fno-exceptions mode with message \"%s\""


Previously, clang couldn't even convert the copy to a memmove and would
generate a raw loop, e.g. https://godbolt.org/z/G8ax1o5bc

.LBB0_6: # =>This Inner Loop Header: Depth=1
  movups xmm0, xmmword ptr [r15 + 4*rdi]
  movups xmm1, xmmword ptr [r15 + 4*rdi + 16]
  movups xmmword ptr [rax + 4*rdi], xmm0
  movups xmmword ptr [rax + 4*rdi + 16], xmm1
  add rdi, 8
  cmp rsi, rdi
  jne .LBB0_6
  cmp rbx, rsi
  jne .LBB0_8
  jmp .LBB0_9
.LBB0_3:

[Bug c++/109443] missed optimization of std::vector access (Related to issue 35269)

2023-06-15 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109443

--- Comment #17 from AK  ---
Even after vector::size() is hoisted, the codegen is sub-optimal compared to the
iterator version.

```
void use_idx_const_size(std::vector<int> v) {
    auto s = v.size();
    for (std::vector<int>::size_type i = 0; i < s; i++)
        f(v[i]);
}
```

$ g++ -O3

use_idx_const_size(std::vector<int, std::allocator<int> >):
push r12
push rbp
push rbx
mov rdx, QWORD PTR [rdi+8]
mov rax, QWORD PTR [rdi]
mov r12, rdx
sub r12, rax
sar r12, 2
cmp rax, rdx
je  .L1
mov rbp, rdi
xor ebx, ebx
jmp .L3
.L6:
mov rax, QWORD PTR [rbp+0]
.L3:
mov edi, DWORD PTR [rax+rbx*4]
add rbx, 1
call f(int)
cmp rbx, r12
jb  .L6
.L1:
pop rbx
pop rbp
pop r12
ret

It seems the compiler is assuming that the vector `v` is not loop-invariant?
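
For comparison, here is a hand-hoisted variant as a minimal sketch. The name `use_idx_hoisted` and the recording definition of `f` are assumptions made for illustration and are not part of the report. Copying the data pointer and the size into locals removes any need to re-read the vector object inside the loop; whether GCC then matches the iterator version's codegen would still have to be checked on Compiler Explorer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the opaque f(int) of the report (an assumption for this
// sketch): it records every value so the loop has an observable effect.
static std::vector<int> seen;
void f(int x) { seen.push_back(x); }

// Hand-hoisted variant: the element pointer and the size are both copied
// into locals before the loop, so the loop body never touches `v` itself.
void use_idx_hoisted(std::vector<int> v) {
    const int* data = v.data();
    const std::size_t n = v.size();
    for (std::size_t i = 0; i < n; ++i)
        f(data[i]);
}
```

This is only valid as long as nothing called from the loop can resize or reallocate `v`, which is the property the compiler apparently cannot prove on its own.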

[Bug target/100811] Consider not omitting frame pointers by default on targets with many registers

2023-05-25 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100811

--- Comment #8 from AK  ---
Should we enable frame-pointers by default for RISCV64 as well?

[Bug target/100811] Consider not omitting frame pointers by default on targets with many registers

2023-05-25 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100811

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #4 from AK  ---
On AArch64 (typically mobile platforms), app developers usually enable
frame pointers by default because it helps with crash reporting.

[Bug c++/87628] Redundant check of pointer when operator delete is called

2023-05-17 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87628

--- Comment #6 from AK  ---
Opened a bug for clang as well:
https://github.com/llvm/llvm-project/issues/62783

[Bug c++/87628] Redundant check of pointer when operator delete is called

2023-05-17 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87628

--- Comment #5 from AK  ---
As per: https://en.cppreference.com/w/cpp/memory/new/operator_delete
"""
In all cases, if ptr is a null pointer, the standard library deallocation
functions do nothing. If the pointer passed to the standard library
deallocation function was not obtained from the corresponding standard library
allocation function, the behavior is undefined.
"""

So it should be fine to remove the check `if(p)`
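
A minimal sketch of the point above. The helper names `guarded_delete` and `plain_delete` are hypothetical; the only claim is the one quoted from cppreference, namely that deleting a null pointer does nothing.

```cpp
#include <cassert>

// Hypothetical helper with the explicit null check that this bug calls
// redundant.
void guarded_delete(int*& p) {
    if (p)
        delete p;
    p = nullptr;
}

// Hypothetical helper without the check. Deleting a null pointer is
// well-defined and has no effect, so both helpers behave the same.
void plain_delete(int*& p) {
    delete p;
    p = nullptr;
}
```

Both helpers can be called with a null pointer or with a pointer obtained from `new int`; the observable behavior is the same in either case.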

[Bug tree-optimization/109441] missed optimization when all elements of vector are known

2023-05-17 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109441

--- Comment #3 from AK  ---
> But IMHO it's academic, right?

Yes, I was just messing with vector codegen. But in case all the elements of a
vector/array are the same, maybe the loop can be replaced with an equivalent
computation?
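
To make "equivalent computation" concrete, here is a small sketch. The function names are hypothetical, and the closed form assumes the product does not overflow `int`. It says nothing about how GCC would or should implement such a transformation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Straightforward reduction over the elements, as in the testcase.
int sum_loop(const std::vector<int>& v) {
    int s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];
    return s;
}

// Closed form for a vector known to hold `n` copies of `value`.
// Assumes n * value fits in an int.
int sum_closed_form(std::size_t n, int value) {
    return static_cast<int>(n) * value;
}
```

For `std::vector<int> v(1000, 0)` both functions give 0, which is the result the original report expects the compiler to fold to.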

[Bug tree-optimization/35269] missed optimization of std::vector access.

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35269

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #2 from AK  ---
I posted a revised version of this bug here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109443

[Bug tree-optimization/109443] missed optimization of std::vector access (Related to issue 35269)

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109443

--- Comment #1 from AK  ---
Link to issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35269 where I
derived the testcase from.

[Bug tree-optimization/109443] New: missed optimization of std::vector access (Related to issue 35269)

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109443

Bug ID: 109443
   Summary: missed optimization of std::vector access (Related to
issue 35269)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

Here is a slightly modified code example from issue #35269. Both accesses are
similar, but different code is generated. The function `h` has better codegen
than `g` for some reason.


$ g++ -O3 -std=c++20 -fno-exceptions

#include <vector>

void f(int);

void g(std::vector<int> v)
{
    for (std::vector<int>::size_type i = 0; i < v.size(); i++)
        f( v[ i ] );
}

void h(std::vector<int> v)
{
    for (std::vector<int>::const_iterator i = v.begin(); i != v.end(); ++i)
        f( *i );
}


g(std::vector<int, std::allocator<int> >):
mov rdx, QWORD PTR [rdi]
cmp QWORD PTR [rdi+8], rdx
je  .L6
push rbp
mov rbp, rdi
push rbx
xor ebx, ebx
sub rsp, 8
.L3:
mov edi, DWORD PTR [rdx+rbx*4]
add rbx, 1
call f(int)
mov rdx, QWORD PTR [rbp+0]
mov rax, QWORD PTR [rbp+8]
sub rax, rdx
sar rax, 2
cmp rbx, rax
jb  .L3
add rsp, 8
pop rbx
pop rbp
ret
.L6:
ret



h(std::vector<int, std::allocator<int> >):
push rbp
push rbx
sub rsp, 8
mov rbx, QWORD PTR [rdi]
cmp rbx, QWORD PTR [rdi+8]
je  .L10
mov rbp, rdi
.L12:
mov edi, DWORD PTR [rbx]
add rbx, 4
call f(int)
cmp QWORD PTR [rbp+8], rbx
jne .L12
.L10:
add rsp, 8
pop rbx
pop rbp
ret

[Bug tree-optimization/109442] New: Dead local copy of std::vector not removed from function

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109442

Bug ID: 109442
   Summary: Dead local copy of std::vector not removed from
function
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <vector>

using T = int;

T vat1(std::vector<T> v1) {
    auto v = v1;
    return 10;
}

g++ -O3 -std=c++20 -fno-exceptions

vat1(std::vector<int, std::allocator<int> >):
mov rax, QWORD PTR [rdi+8]
sub rax, QWORD PTR [rdi]
je  .L11
push rbp
mov rbp, rax
movabs  rax, 9223372036854775804
push rbx
sub rsp, 8
cmp rax, rbp
jb  .L15
mov rbx, rdi
mov rdi, rbp
call operator new(unsigned long)
mov rsi, QWORD PTR [rbx]
mov rdx, QWORD PTR [rbx+8]
mov rdi, rax
sub rdx, rsi
cmp rdx, 4
jle .L16
call memmove
mov rdi, rax
.L6:
mov rsi, rbp
call operator delete(void*, unsigned long)
add rsp, 8
mov eax, 10
pop rbx
pop rbp
ret
.L11:
mov eax, 10
ret
.L15:
call std::__throw_bad_array_new_length()
.L16:
jne .L6
mov eax, DWORD PTR [rsi]
mov DWORD PTR [rdi], eax
jmp .L6

[Bug tree-optimization/109441] missed optimization when all elements of vector are known

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109441

--- Comment #1 from AK  ---
I guess a better test case is this:

#include <algorithm>
#include <vector>
using namespace std;

using T = int;


T v(std::vector<T> v) {
    T s;
    std::fill(v.begin(), v.end(), T());
    for (auto i = 0; i < v.size(); ++i) {
        s += v[i];
    }

    return s;
}

which has similar effect.

$ g++ -O3 -std=c++17

v(std::vector<int, std::allocator<int> >):
push rbp
push rbx
sub rsp, 8
mov rbp, QWORD PTR [rdi+8]
mov rcx, QWORD PTR [rdi]
cmp rcx, rbp
je  .L7
sub rbp, rcx
mov rdi, rcx
xor esi, esi
mov rbx, rcx
mov rdx, rbp
call memset
mov rdi, rbp
mov edx, 1
mov rcx, rbx
sar rdi, 2
test rbp, rbp
cmovne  rdx, rdi
cmp rbp, 12
jbe .L8
mov rax, rdx
pxor xmm0, xmm0
shr rax, 2
sal rax, 4
add rax, rbx
.L4:
movdqu  xmm2, XMMWORD PTR [rbx]
add rbx, 16
paddd   xmm0, xmm2
cmp rbx, rax
jne .L4
movdqa  xmm1, xmm0
psrldq  xmm1, 8
paddd   xmm0, xmm1
movdqa  xmm1, xmm0
psrldq  xmm1, 4
paddd   xmm0, xmm1
movd eax, xmm0
test dl, 3
je  .L1
and rdx, -4
mov esi, edx
.L3:
add eax, DWORD PTR [rcx+rdx*4]
lea edx, [rsi+1]
movsx   rdx, edx
cmp rdx, rdi
jnb .L1
add esi, 2
lea r8, [0+rdx*4]
add eax, DWORD PTR [rcx+rdx*4]
movsx   rsi, esi
cmp rsi, rdi
jnb .L1
add eax, DWORD PTR [rcx+4+r8]
.L1:
add rsp, 8
pop rbx
pop rbp
ret
.L7:
add rsp, 8
xor eax, eax
pop rbx
pop rbp
ret
.L8:
xor eax, eax
xor esi, esi
xor edx, edx
jmp .L3

[Bug tree-optimization/109441] New: missed optimization when all elements of vector are known

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109441

Bug ID: 109441
   Summary: missed optimization when all elements of vector are
known
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

Reference: https://godbolt.org/z/af4x6zhz9

When all elements of vector are 0, then the compiler should be able to remove
the loop and just return 0.

Testcase:

#include <vector>
using namespace std;

using T = int;


T v() {
    T s;
    std::vector<T> v;
    v.resize(1000, 0);
    for (auto i = 0; i < v.size(); ++i) {
        s += v[i];
    }

    return s;
}



$ g++ -O3 -std=c++17

.LC0:
  .string "vector::_M_fill_insert"
v():
  push rbx
  pxor xmm0, xmm0
  mov edx, 1000
  xor esi, esi
  sub rsp, 48
  lea rcx, [rsp+12]
  lea rdi, [rsp+16]
  mov QWORD PTR [rsp+32], 0
  mov DWORD PTR [rsp+12], 0
  movaps XMMWORD PTR [rsp+16], xmm0
  call std::vector<int, std::allocator<int> >::_M_fill_insert(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, unsigned long, int const&)
  mov rdx, QWORD PTR [rsp+24]
  mov rdi, QWORD PTR [rsp+16]
  mov rax, rdx
  sub rax, rdi
  mov rsi, rax
  sar rsi, 2
  cmp rdx, rdi
  je .L99
  test rax, rax
  mov ecx, 1
  cmovne rcx, rsi
  cmp rax, 12
  jbe .L107
  mov rdx, rcx
  pxor xmm0, xmm0
  mov rax, rdi
  shr rdx, 2
  sal rdx, 4
  add rdx, rdi
.L101:
  movdqu xmm2, XMMWORD PTR [rax]
  add rax, 16
  paddd xmm0, xmm2
  cmp rdx, rax
  jne .L101
  movdqa xmm1, xmm0
  psrldq xmm1, 8
  paddd xmm0, xmm1
  movdqa xmm1, xmm0
  psrldq xmm1, 4
  paddd xmm0, xmm1
  movd ebx, xmm0
  test cl, 3
  je .L99
  and rcx, -4
  mov eax, ecx
.L100:
  lea edx, [rax+1]
  add ebx, DWORD PTR [rdi+rcx*4]
  movsx rdx, edx
  cmp rdx, rsi
  jnb .L99
  add eax, 2
  lea rcx, [0+rdx*4]
  add ebx, DWORD PTR [rdi+rdx*4]
  cdqe
  cmp rax, rsi
  jnb .L99
  add ebx, DWORD PTR [rdi+4+rcx]
.L99:
  test rdi, rdi
  je .L98
  mov rsi, QWORD PTR [rsp+32]
  sub rsi, rdi
  call operator delete(void*, unsigned long)
.L98:
  add rsp, 48
  mov eax, ebx
  pop rbx
  ret
.L107:
  xor eax, eax
  xor ecx, ecx
  jmp .L100
  mov rbx, rax
  jmp .L105
v() [clone .cold]:

[Bug tree-optimization/109440] New: Missed optimization of vector::at when a function is called inside the loop

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109440

Bug ID: 109440
   Summary: Missed optimization of vector::at when a function is
called inside the loop
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <vector>
using namespace std;

bool bar();

using T = int;

T vat(std::vector<T> v) {
    T s;
    for (auto i = 0; i < v.size(); ++i) {
        if (bar())
            s += v.at(i);
    }

    return s;
}


$ gcc -O2 -fexceptions -fno-unroll-loops


.LC0:
.string "vector::_M_range_check: __n (which is %zu) >= this->size() (which is %zu)"
vat(std::vector<int, std::allocator<int> >):
mov rax, QWORD PTR [rdi]
cmp QWORD PTR [rdi+8], rax
je  .L9
push r12
push rbp
mov rbp, rdi
push rbx
xor ebx, ebx
jmp .L6
.L14:
mov rax, QWORD PTR [rbp+8]
sub rax, QWORD PTR [rbp+0]
add rbx, 1
sar rax, 2
cmp rbx, rax
jnb .L13
.L6:
call bar()
test al, al
je  .L14
mov rcx, QWORD PTR [rbp+0]
mov rdx, QWORD PTR [rbp+8]
sub rdx, rcx
sar rdx, 2
mov rax, rdx
cmp rbx, rdx
jnb .L15
add r12d, DWORD PTR [rcx+rbx*4]
add rbx, 1
cmp rbx, rax
jb  .L6
.L13:
mov eax, r12d
pop rbx
pop rbp
pop r12
ret
.L9:
mov eax, r12d
ret
.L15:
mov rsi, rbx
mov edi, OFFSET FLAT:.LC0
xor eax, eax
call std::__throw_out_of_range_fmt(char const*, ...)
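
The redundancy in question can be seen in a small sketch. The names `sum_checked` and `sum_unchecked` and the predicate parameter (standing in for `bar()`) are assumptions for illustration. Since the loop condition already establishes `i < v.size()`, the range check inside `at(i)` can never fail as long as nothing in the loop changes the vector's size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Checked access: at(i) compares i against size() on every call.
template <class Pred>
int sum_checked(const std::vector<int>& v, Pred pred) {
    int s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (pred(i))
            s += v.at(i);
    return s;
}

// Unchecked access: safe here only because the loop condition already
// guarantees i < v.size() and nothing in the loop resizes v.
template <class Pred>
int sum_unchecked(const std::vector<int>& v, Pred pred) {
    int s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (pred(i))
            s += v[i];
    return s;
}
```

In the testcase above the compiler presumably cannot rule out that the opaque call to `bar()` modifies the vector, which would explain why both the size reload and the range check stay in the loop.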

[Bug tree-optimization/108915] invalid pointer access preserved in optimized code

2023-03-23 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108915

--- Comment #6 from AK  ---
For reference, I had opened a related bug in clang:
https://github.com/llvm/llvm-project/issues/60967

[Bug c++/109017] ICE on unexpanded pack from C++20 explicit-template-parameter lambda syntax

2023-03-04 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109017

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #1 from AK  ---
Example from twitter: https://twitter.com/seanbax/status/1631689332007337985
which had discussion on similar bug.

```
template struct outer1_t {
  void g() {
// Compiles for mysterious reasons.
int array[] { [](){
  int i = Is2;
  return i;
}.template operator()() ... };
  }
};

int main() {
  // Compiles OKAY when this is commented out.
  // ICEs when it's compiled.
  outer1_t<1, 5, 10>().g();
}
```

clang issues a compiler error: https://godbolt.org/z/7f6E55svM

```
<source>:6:15: error: initializer contains unexpanded parameter pack 'Is2'
  int i = Is2;
```

[Bug tree-optimization/108915] invalid pointer access preserved in optimized code

2023-02-23 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108915

AK  changed:

   What|Removed |Added

 Resolution|INVALID |FIXED

--- Comment #4 from AK  ---
Adding `__attribute__((used))` also fixed it. Does it reflect the same behavior
as using `asm` as you suggested?

[Bug c/108915] New: invalid pointer access preserved in optimized code

2023-02-23 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108915

Bug ID: 108915
   Summary: invalid pointer access preserved in optimized code
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

Testcase has been reduced from u-boot's linker-list macro:
https://github.com/u-boot/u-boot/blob/master/include/linker_lists.h#L127


#include <stdio.h>

char* bar() {
static char start_bar[0] __attribute__((aligned(16)))
   __attribute__((unused))
   __attribute__((section("__u_boot_list_2_1")));
char *p = (char *)start_bar;
for (int i = p[0]; i < p[9]; i++)
printf("asdfasd");
return 0;
}



$ gcc -O3 -fno-unroll-loops -S -o -

.LC0:
.string "asdfasd"
bar:
push rbx
movsx   eax, BYTE PTR start_bar.1[rip+9]
movsx   ebx, BYTE PTR start_bar.1[rip]
cmp ebx, eax
jge .L2
.L3:
mov edi, OFFSET FLAT:.LC0
xor eax, eax
add ebx, 1
call printf
movsx   eax, BYTE PTR start_bar.1[rip+9]
cmp eax, ebx
jg  .L3
.L2:
xor eax, eax
pop rbx
ret

-
$ clang -O3 -fno-unroll-loops -S -o -

bar:# @bar
xor eax, eax
ret

[Bug libstdc++/107335] call to throw_bad_cast even with -fno-exceptions

2022-10-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335

AK  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |INVALID

--- Comment #7 from AK  ---
not a bug

[Bug libstdc++/107335] call to throw_bad_cast even with -fno-exceptions

2022-10-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335

--- Comment #5 from AK  ---
Is this the definition of throw_bad_cast?

https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/cp/rtti.c#L221

[Bug libstdc++/107335] call to throw_bad_cast even with -fno-exceptions

2022-10-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335

--- Comment #4 from AK  ---
I wasn't sure if this is expected. Thanks for clarifying.

[Bug c++/107335] New: call to throw_bad_cast even with -fno-exceptions

2022-10-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335

Bug ID: 107335
   Summary: call to throw_bad_cast even with -fno-exceptions
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

Testcase:
#include <iostream>

void foo() {
std::cout << std::endl;
}

$ g++ -std=c++17 -O3 -fno-exceptions

```asm
foo():
mov rax, QWORD PTR std::cout[rip]
push rbx
mov rax, QWORD PTR [rax-24]
mov rbx, QWORD PTR std::cout[rax+240]
test rbx, rbx
je  .L10
cmp BYTE PTR [rbx+56], 0
je  .L5
movsx   esi, BYTE PTR [rbx+67]
.L6:
mov edi, OFFSET FLAT:std::cout
call std::basic_ostream<char, std::char_traits<char> >::put(char)
pop rbx
mov rdi, rax
jmp std::basic_ostream<char, std::char_traits<char> >::flush()
.L5:
mov rdi, rbx
call std::ctype<char>::_M_widen_init() const
mov rax, QWORD PTR [rbx]
mov esi, 10
mov rax, QWORD PTR [rax+48]
cmp rax, OFFSET FLAT:_ZNKSt5ctypeIcE8do_widenEc
je  .L6
mov rdi, rbx
call rax
movsx   esi, al
jmp .L6
.L10: 
call std::__throw_bad_cast() <--- call to __throw_bad_cast
_GLOBAL__sub_I_foo():
sub rsp, 8
mov edi, OFFSET FLAT:_ZStL8__ioinit
call std::ios_base::Init::Init() [complete object constructor]
mov edx, OFFSET FLAT:__dso_handle
mov esi, OFFSET FLAT:_ZStL8__ioinit
mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
add rsp, 8
jmp __cxa_atexit
```

[Bug tree-optimization/85611] Suboptimal code generation for (potentially) redundant atomic loads

2022-09-29 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85611

AK  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #2 from AK  ---
Don't remember what I was expecting.

[Bug rtl-optimization/107063] New: [X86_64 codegen] Using inc eax instead of inc dword ptr

2022-09-27 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107063

Bug ID: 107063
   Summary: [X86_64 codegen] Using inc eax instead of inc dword
ptr
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

int volatile gv = 0;

void foo() {
++gv;
}


$ gcc -Os

foo():
mov eax, DWORD PTR gv[rip]
inc eax
mov DWORD PTR gv[rip], eax
ret
gv:
.zero   4


$ clang -Os

foo():# @foo()
inc dword ptr [rip + gv]
ret
gv:
.long   0   


https://godbolt.org/z/vzq4jr5vj

[Bug tree-optimization/107011] instruction with undefined behavior not optimized away

2022-09-22 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107011

--- Comment #2 from AK  ---
ah ok. sorry for the noise.

[Bug tree-optimization/107011] New: instruction with undefined behavior not optimized away

2022-09-22 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107011

Bug ID: 107011
   Summary: instruction with undefined behavior not optimized away
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <limits.h>

int main() {
return INT_MIN / -1;
}


$ gcc -O3

main:
mov eax, -2147483648
ret


$ clang -O3

main:   # @main
ret


https://godbolt.org/z/393EMqs1E

PS: I reported this bug yesterday as well but for some reason it does not
appear in bugzilla so I'm creating another one.
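
For code that must not rely on this undefined behavior at all, the two undefined cases of `int` division can be guarded explicitly. The helper below is a minimal sketch; the name `checked_div` and its out-parameter interface are assumptions for illustration, not an existing library function.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: performs a / b only when the result is defined for int.
// Returns false (and leaves `out` untouched) for division by zero and for
// INT_MIN / -1, whose mathematical result does not fit in an int.
bool checked_div(int a, int b, int& out) {
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    out = a / b;
    return true;
}
```

With such a guard in place the expression from the testcase never reaches the division, so neither compiler's choice of result matters.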

[Bug tree-optimization/95565] [Feature request] add a flag to only instrument function entry.

2022-09-21 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95565

--- Comment #2 from AK  ---
clang has `-finstrument-function-entry-bare` to this effect: 
https://reviews.llvm.org/D40276

[Bug tree-optimization/107005] New: gcc not exploiting undefined behavior to optimize away the result of division

2022-09-21 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107005

Bug ID: 107005
   Summary: gcc not exploiting undefined behavior to optimize away
the result of division
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <limits.h>
int main() { return INT_MIN / -1; }

gcc -O2

main:
mov eax, -2147483648
ret


clang -O2

main:   # @main
ret



https://godbolt.org/z/Tjxx3KGdK

[Bug ipa/106991] new+delete pair not optimized by g++ at -O3 but optimized at -Os

2022-09-21 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106991

--- Comment #3 from AK  ---
Thanks for identifying the underlying issue, @Jan.
After modifying the definition of operator delete, gcc does optimize it at -O3
as well.

https://godbolt.org/z/1WPqaWrEr

// source code
#include <cassert>
#include <cstdlib>

int volatile gv = 0;

void* operator new(long unsigned sz ) {
++gv;
return malloc( sz );
}

void operator delete(void *p, unsigned long) noexcept {
--gv;
free(p);
}

class c {
int l;
public:
c() : l(0) {}
int get(){ return l; }
};

int caller( void ){
c *f = new c();
assert( f->get() == 0 );
delete f;
return gv;
}

$ g++ -std=c++20 -O3

```
operator new(unsigned long):
mov eax, DWORD PTR gv[rip]
add eax, 1
mov DWORD PTR gv[rip], eax
jmp malloc
operator delete(void*, unsigned long):
mov eax, DWORD PTR gv[rip]
sub eax, 1
mov DWORD PTR gv[rip], eax
jmp free
caller():
mov eax, DWORD PTR gv[rip]
add eax, 1
mov DWORD PTR gv[rip], eax
mov eax, DWORD PTR gv[rip]
sub eax, 1
mov DWORD PTR gv[rip], eax
mov eax, DWORD PTR gv[rip]
ret
gv:
.zero   4
```

[Bug c++/87628] Redundant check of pointer when operator delete is called

2022-09-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87628

--- Comment #4 from AK  ---
Seems like clang now added the check:

$ clang++ -Oz -fno-exceptions

if_delete(char*): # @if_delete(char*)
test rdi, rdi
jne operator delete(void*)@PLT  # TAILCALL
ret

[Bug c++/106991] New: new+delete pair not optimized by g++ at -O3 but optimized at -Os

2022-09-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106991

Bug ID: 106991
   Summary: new+delete pair not optimized by g++ at -O3 but
optimized at -Os
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

https://godbolt.org/z/PeYcoqTKn

---
#include <cassert>
#include <cstdlib>

int volatile gv = 0;

void* operator new(long unsigned sz ) {
++gv;
return malloc( sz );
}

void operator delete(void *p) noexcept {
--gv;
free(p);
}

class c {
int l;
public:
c() : l(0) {}
int get(){ return l; }
};

int caller( void ){
c *f = new c();
assert( f->get() == 0 );
delete f;
return gv;
}
---

$ g++ -std=c++20 -O3

operator new(unsigned long):
mov eax, DWORD PTR gv[rip]
add eax, 1
mov DWORD PTR gv[rip], eax
jmp malloc
operator delete(void*):
mov eax, DWORD PTR gv[rip]
sub eax, 1
mov DWORD PTR gv[rip], eax
jmp free
caller():
sub rsp, 8
mov eax, DWORD PTR gv[rip]
mov edi, 4
add eax, 1
mov DWORD PTR gv[rip], eax
call malloc
mov esi, 4
mov rdi, rax
call operator delete(void*, unsigned long)
mov eax, DWORD PTR gv[rip]
add rsp, 8
ret
gv:
.zero   4
---
$ g++ -std=c++20 -Os

operator new(unsigned long):
mov eax, DWORD PTR gv[rip]
inc eax
mov DWORD PTR gv[rip], eax
jmp malloc
operator delete(void*):
mov eax, DWORD PTR gv[rip]
dec eax
mov DWORD PTR gv[rip], eax
jmp free
caller():
mov eax, DWORD PTR gv[rip]
ret
gv:
.zero   4

[Bug c++/87628] Redundant check of pointer when operator delete is called

2022-09-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87628

--- Comment #3 from AK  ---
Still happening with gcc trunk.

https://godbolt.org/z/5K94665GK

[Bug rtl-optimization/82889] Unnecessary sign extension of int32 to int64

2022-08-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82889

--- Comment #5 from AK  ---
Link to compiler explorer: https://godbolt.org/z/dGYG4dG15

[Bug rtl-optimization/82889] Unnecessary sign extension of int32 to int64

2022-08-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82889

--- Comment #4 from AK  ---
Seems like clang doesn't sign extend.

$ clang -O3 -std=c++14 -g0

```
.text
.intel_syntax noprefix
.file   "example.cpp"
.globl  lol(int*, int*, unsigned int, unsigned int)
# -- Begin function lol(int*, int*, unsigned int, unsigned int)
.p2align 4, 0x90
.type   lol(int*, int*, unsigned int, unsigned int),@function
lol(int*, int*, unsigned int, unsigned int):   # @lol(int*, int*, unsigned int, unsigned int)
.cfi_startproc
# %bb.0:
# kill: def $edx killed $edx def $rdx
and edx, ecx
mov r8d, ecx
mov ecx, 1
jmp .LBB0_1
.p2align 4, 0x90
.LBB0_4:#   in Loop: Header=BB0_1 Depth=1
test al, 1
jne .LBB0_5
.LBB0_7:#   in Loop: Header=BB0_1 Depth=1
add edx, ecx
and edx, r8d
inc rcx
.LBB0_1:# =>This Inner Loop Header: Depth=1
mov eax, dword ptr [rsi + 4*rdx]
test eax, eax
js  .LBB0_4
# %bb.2:#   in Loop: Header=BB0_1 Depth=1
cmp dword ptr [rdi + 4*rax], 42
jne .LBB0_7
# %bb.3:
mov eax, 1
ret
.LBB0_5:
xor eax, eax
ret
.Lfunc_end0:
.size   lol(int*, int*, unsigned int, unsigned int), .Lfunc_end0-lol(int*, int*, unsigned int, unsigned int)
.cfi_endproc
# -- End function
.ident  "clang version 16.0.0 (https://github.com/llvm/llvm-project.git 5e22ef3198d1686f7978dd150a3eefad4f737bfc)"
.section ".note.GNU-stack","",@progbits
.addrsig
```


$ gcc -O3 -std=c++14 -g0

```
lol(int*, int*, unsigned int, unsigned int):
and edx, ecx
mov r8d, 1
mov ecx, ecx
jmp .L5
.L10:
cmp DWORD PTR [rdi+rax*4], 42
je  .L9
.L4:
add rdx, r8
add r8, 1
and rdx, rcx
.L5:
movsx   rax, DWORD PTR [rsi+rdx*4] <--- sign extend
test eax, eax
jns .L10
test al, 1
je  .L4
xor eax, eax
ret
.L9:
mov eax, 1
ret
```

[Bug libstdc++/78717] no definition of string::find when lowered to gimple

2022-08-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78717

--- Comment #3 from AK  ---
Even with a high inline limit, string::find didn't inline.

g++-11.0.2 -O3  -finline-limit=10  -S -o a.s s.cpp

cat a.s

```
_Z3fooRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES6_i:
.LFB1240:
.cfi_startproc
endbr64
pushq   %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq 8(%rsi), %rcx
movslq  %edx, %rbx
xorl %edx, %edx
movq (%rsi), %rsi
call _ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE4findEPKcmm@PLT
cmpq %rax, %rbx
popq %rbx
.cfi_def_cfa_offset 8
sete %al
movzbl  %al, %eax
ret

```

[Bug other/92396] -ftime-trace support

2022-08-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92396

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #12 from AK  ---
I was building a giant file that takes around 100 minutes to compile. The
-ftime-report output gave nothing useful for finding hotspots. It is also not
clear what is being reported, as there is no documentation for it in man gcc.
The percentages don't add up to 100, which makes it confusing.

I'm wondering whether making this task a GSoC project would get it more attention?

[Bug libstdc++/80331] unused const std::string not optimized away

2022-06-04 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331

--- Comment #9 from AK  ---
Can't repro this with gcc 12.1. Seems like this is fixed?

https://godbolt.org/z/e6n94zK4E

[Bug tree-optimization/105830] call to memcpy when -nostdlib -nodefaultlibs flags provided

2022-06-04 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105830

--- Comment #3 from AK  ---
With -ffreestanding, the calls to memcpy did disappear. Thanks.

[Bug tree-optimization/105830] New: call to memcpy when -nostdlib -nodefaultlibs flags provided

2022-06-03 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105830

Bug ID: 105830
   Summary: call to memcpy when -nostdlib -nodefaultlibs flags
provided
   Product: gcc
   Version: 12.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

https://godbolt.org/z/jTEa6ajn3


```
// test.c

// Type your code here, or load an example.
/* Nonzero if either X or Y is not aligned on a "long" boundary.  */
#define UNALIGNED(X, Y) \
  (((unsigned long)X & (sizeof (unsigned long) - 1)) | \
   ((unsigned long)Y & (sizeof (unsigned long) - 1)))

#define UNALIGNED1(a) \
  ((unsigned long)(a) & (sizeof(unsigned long)-1))

/* How many bytes are copied each iteration of the 4X unrolled loop.  */
#define BIGBLOCKSIZE (sizeof (unsigned long) * 4)

/* How many bytes are copied each iteration of the word copy loop.  */
#define LITTLEBLOCKSIZE (sizeof (unsigned long))

/* Threshold for punting to the byte copier.  */
#define TOO_SMALL(LEN)  ((LEN) < BIGBLOCKSIZE)

void * memcpy (void *__restrict dst0,
const void *__restrict src0,
unsigned long len0)
{
  unsigned char *dst = dst0;
  const unsigned char *src = src0;


  /* If the size is small, or either SRC or DST is unaligned,
 then punt into the byte copy loop.  This should be rare.  */
  if (len0 >= LITTLEBLOCKSIZE && !UNALIGNED (src, dst))
{
unsigned long *aligned_dst;
const unsigned long *aligned_src;
  aligned_dst = (unsigned long*)dst;
  aligned_src = (const unsigned long*)src;

  /* Copy one long word at a time if possible.  */
  do
{
  *aligned_dst++ = *aligned_src++;
  len0 -= LITTLEBLOCKSIZE;
} while (len0 >= LITTLEBLOCKSIZE);

   /* Pick up any residual with a byte copier.  */
  dst = (unsigned char*)aligned_dst;
  src = (const unsigned char*)aligned_src;
}

  for (; len0; len0--)
*dst++ = *src++;

  return dst0;
}

// ARM gcc trunk
gcc -O3 -nostdlib -nodefaultlibs -S -o -

memcpy:
push {r3, r4, r5, r6, r7, lr}
cmp r2, #3
mov r4, r2
mov r5, r0
mov r6, r1
bls .L5
orr r3, r0, r1
lsls r3, r3, #30
beq .L9
.L3:
mov r2, r4
mov r1, r6
bl  memcpy ; <- call to memcpy
mov r0, r5
pop {r3, r4, r5, r6, r7, pc}
.L9:
subs r7, r2, #4
and r4, r2, #3
bic r7, r7, #3
adds r7, r7, #4
mov r2, r7
add r6, r6, r7
bl  memcpy ; <- call to memcpy
adds r0, r5, r7
.L5:
cmp r4, #0
bne .L3
mov r0, r5
pop {r3, r4, r5, r6, r7, pc}

[Bug c++/105796] New: error: no matching function for call with template function

2022-05-31 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105796

Bug ID: 105796
   Summary: error: no matching function for call with template
function
   Product: gcc
   Version: 12.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

test.cpp
```
int func(int, char);

template <typename... TArgs>
int testFunc(int (*)(TArgs..., char));

int x = testFunc<int>(func);
```

With gcc trunk:
g++ -std=c++20 test.cpp -c


<source>:6:22: error: no matching function for call to 'testFunc<int>(int (&)(int, char))'
6 | int x = testFunc<int>(func);
  | ~^~
<source>:4:5: note: candidate: 'template<class ... TArgs> int testFunc(int (*)(TArgs ..., char))'
4 | int testFunc(int (*)(TArgs..., char));
  | ^~~~
<source>:4:5: note:   template argument deduction/substitution failed:
<source>:6:22: note:   mismatched types 'char' and 'int'
6 | int x = testFunc<int>(func);
  | ~^~
Compiler returned: 1

[Bug c++/101138] New: Ambiguous code (with operator==) compiled without error

2021-06-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101138

Bug ID: 101138
   Summary: Ambiguous code (with operator==) compiled without
error
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

$ cat test.cpp

#include <iostream>
using namespace std;

template
struct D {
template bool operator==(Y a) const { 
cout << "f" < 
bool operator==(T a, D b) { 
cout << "fD" < a, b;
if (a == b)
return 0;
return 1;
}

gcc compiles this code fine, but clang errors out.

https://godbolt.org/z/c13EExxeY

[Bug tree-optimization/101116] New: missed peephole optimization not of bitwise and

2021-06-17 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101116

Bug ID: 101116
   Summary: missed peephole optimization not of bitwise and
   Product: gcc
   Version: 11.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

$ cat test.c

bool foo(unsigned i) {
return !(i & 1);
}

gcc -O2 test.c -S -o-
foo(unsigned int):
mov eax, edi
not eax
and eax, 1
ret

clang -O2 test.c -S -o-

foo(unsigned int): # @foo(unsigned int)
  testb $1, %dil
  sete %al
  retq


Ref: https://godbolt.org/z/Tndb1dM8Y
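
Both instruction sequences compute the same predicate: `!(i & 1)` is true exactly when bit 0 is clear, which is also what `(~i) & 1` yields. A small sketch that checks this identity at compile time. The function names are made up for illustration.

```cpp
// Source form from the report.
constexpr bool via_logical_not(unsigned i) { return !(i & 1u); }

// Form matching the GCC output above: not, then and with 1.
constexpr bool via_bitwise_not(unsigned i) { return ((~i) & 1u) != 0; }

static_assert(via_logical_not(0u) && via_bitwise_not(0u), "even: bit 0 clear");
static_assert(!via_logical_not(1u) && !via_bitwise_not(1u), "odd: bit 0 set");
static_assert(via_logical_not(0xFFFFFFFEu) == via_bitwise_not(0xFFFFFFFEu),
              "both forms agree on a large even value");
```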

[Bug tree-optimization/100004] Dead write not removed when indirection is introduced.

2021-04-09 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100004

--- Comment #1 from AK  ---
godbolt link: https://gcc.godbolt.org/z/f7Y6G1svf

[Bug tree-optimization/100004] New: Dead write not removed when indirection is introduced.

2021-04-09 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100004

Bug ID: 100004
   Summary: Dead write not removed when indirection is introduced.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

struct Foo {
int x;
};

struct Bar {
int x;
};

void alias(Foo* foo, Bar* bar) {
foo->x = 5;
foo->x = bar->x;
}

struct Wrap1 {
Foo foo;
};

struct Wrap2 {
Foo foo;
};

void assign_direct(Wrap1* w1, Wrap2* w2)
{
w1->foo.x = 5;
w1->foo.x = w2->foo.x;
}

void assign_via_pointer(Wrap1* w1, Wrap2* w2)
{
Foo* f1 = &w1->foo;
Foo* f2 = &w2->foo;
f1->x = 5;
f1->x = f2->x;
}


$ gcc-arm64 -O2 -std=c++17 -fstrict-aliasing -S -o -

alias(Foo*, Bar*):
ldr w1, [x1]
str w1, [x0]
ret
assign_direct(Wrap1*, Wrap2*):
ldr w1, [x1]
str w1, [x0]
ret
assign_via_pointer(Wrap1*, Wrap2*):
mov w2, 5
str w2, [x0]
ldr w1, [x1]
str w1, [x0]
ret
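
One possible explanation, offered as a guess rather than a diagnosis of the optimizer: once both accesses go through plain `Foo*` pointers, the wrapper types are no longer visible in the access, and for two arbitrary `Foo*` the first store is not dead when they point at the same object. A minimal sketch of that aliased case. `assign` and the two helper functions are hypothetical names, not taken from the report.

```cpp
struct Foo {
    int x;
};

// Same shape as the body of assign_via_pointer, but with arbitrary Foo*.
constexpr void assign(Foo* f1, Foo* f2) {
    f1->x = 5;      // looks dead ...
    f1->x = f2->x;  // ... but this load reads the 5 when f1 == f2
}

// Both pointers name the same object: the first store is observable.
constexpr int final_value_when_aliased() {
    Foo f{1};
    assign(&f, &f);
    return f.x;
}

// Distinct objects: the first store is overwritten and never read.
constexpr int final_value_when_distinct() {
    Foo a{1};
    Foo b{2};
    assign(&a, &b);
    return a.x;
}

static_assert(final_value_when_aliased() == 5, "store of 5 is read back");
static_assert(final_value_when_distinct() == 2, "store of 5 is dead");
```

Whether GCC could still prove the two pointers distinct in `assign_via_pointer`, given where they come from, is the question the report raises. The sketch only shows why the store cannot be dropped for `Foo*` in general.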

[Bug libstdc++/59048] operator== between std::string and const char* slower than strcmp

2021-02-11 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59048

AK  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hiraditya at msn dot com

--- Comment #17 from AK  ---
Now that we have string_view, will it be possible to avoid creating a copy?
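
To make the question concrete, a minimal sketch of a comparison that goes through `std::string_view` on both sides, assuming C++17. `equals_no_copy` and `matches` are hypothetical names, and this is not a claim about what libstdc++ does internally.

```cpp
#include <string>
#include <string_view>

// Both operands are viewed in place, so no std::string is constructed.
// Building a string_view from a const char* still walks the C string once
// to find its length.
constexpr bool equals_no_copy(std::string_view lhs, std::string_view rhs) {
    return lhs == rhs;
}

// std::string and const char* both convert implicitly to std::string_view.
inline bool matches(const std::string& s, const char* cstr) {
    return equals_no_copy(s, cstr);
}

static_assert(equals_no_copy("gcc", "gcc"), "equal contents compare equal");
static_assert(!equals_no_copy("gcc", "gcd"), "different contents differ");
```

A call such as `matches(name, "main")` then compares lengths first and the bytes only when the lengths agree.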

[Bug tree-optimization/98497] New: [Potential Perf regression] jne to hot branch instead je to cold

2021-01-01 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98497

Bug ID: 98497
   Summary: [Potential Perf regression] jne to hot branch instead
je to cold
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

In the following code generated by gcc 10.2
```
.L2:
movups  xmm3, XMMWORD PTR [rax]
add rax, 16
addps   xmm0, xmm3
cmp rax, rdx
je  .L6
jmp .L2

matrix_sum_column_major.cold:
.L6:
movaps  xmm2, xmm0
# .

```

I think `jne .L2; jmp .L6` should be more efficient, as it avoids one instruction
in the hot path.

C code:
```
float matrix_sum_column_major(float* x, int n) {
n = 32767;
float sum = 0;
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
sum += x[j * n + i];
return sum;
}
```

gcc -Ofast -floop-nest-optimize -o -
```
matrix_sum_column_major:
mov eax, 4294836212
lea rdx, [rdi+131056]
pxor    xmm1, xmm1
lea rcx, [rdi+rax]
.L3:
mov rax, rdi
pxor    xmm0, xmm0
.L2:
movups  xmm3, XMMWORD PTR [rax]
add rax, 16
addps   xmm0, xmm3
cmp rax, rdx
je  .L6
jmp .L2
matrix_sum_column_major.cold:
.L6:
movaps  xmm2, xmm0
addss   xmm1, DWORD PTR [rax+8]
lea rdx, [rax+131068]
add rdi, 131068
movhlps xmm2, xmm0
addps   xmm2, xmm0
movaps  xmm0, xmm2
shufps  xmm0, xmm2, 85
addps   xmm0, xmm2
movss   xmm2, DWORD PTR [rax+4]
addss   xmm2, DWORD PTR [rax]
addss   xmm1, xmm2
addss   xmm1, xmm0
cmp rdx, rcx
jne .L3
movaps  xmm0, xmm1
ret
```


Link to godbolt: https://gcc.godbolt.org/z/ac7YY1