[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #12 from Pontakorn Prasertsuk ---
I notice that GCC also does not optimize this case:
https://godbolt.org/z/7oGqjqqz4
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #11 from Pontakorn Prasertsuk ---
(In reply to rguent...@suse.de from comment #10)
> On Mon, 5 Jun 2023, ptk.prasertsuk at gmail dot com wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035
> >
> > --- Comment #9 from Pontakorn Prasertsuk ---
> > (In reply to Richard Biener from comment #8)
> > > (In reply to Pontakorn Prasertsuk from comment #7)
> > > > For the LLVM IR code of the snippet I provided, Clang's alias analysis
> > > > can prove that `new` call has no side effect to other memory location.
> > > > This is indicated by `noalias` keyword at the return value of the `new`
> > > > call (_Znwm).
> > > >
> > > > According to Clang's Language Reference:
> > > > "On function return values, the noalias attribute indicates that the
> > > > function acts like a system memory allocation function, returning a
> > > > pointer to allocated storage disjoint from the storage for any other
> > > > object accessible to the caller."
> > > >
> > > > Is this possible for GCC alias analysis pass?
> > >
> > > > MyClass c = a;
> > > > MyClass *b = new MyClass;
> > > > *b = c;
> > >
> > > the point is that 'new' can alter the value of 'a', GCC already knows that
> > > 'b' is distinct from c and a but that's not the relevant thing. It looks
> > > like LLVM creates wrong-code here.
> >
> > In what case can 'new' alter 'a'? I thought memory allocation functions
> > such as 'malloc, 'calloc' and 'new' cannot alias other memory locations
> > than its return value.
>
> 'new' can be overridden by the user, you can declare your own
> implementation that does fancy stuff behind the scenes, including
> in the above case altering 'a'. Welcome to C++ ...
I assume you are referring to this case: https://godbolt.org/z/z4Y7YdxWE

Clang indeed assumes that the memory returned by 'new' does not alias any
other memory, and this assumption can be turned off with
-fno-assume-sane-operator-new.

However, can we safely assume the same non-aliasing property for 'malloc'
and 'calloc' as well?
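To make the point from comment #10 concrete, here is a minimal sketch of a user-provided replacement for the global `operator new` that modifies an object the caller can also see. The names `g_a` and `copy_through_new` are hypothetical and are not taken from the godbolt link; the struct is shortened for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

struct MyClass {
    unsigned long long arr[4];  // shortened; size is arbitrary for this sketch
};

// Hypothetical global object that the replacement allocator touches.
MyClass g_a{};

// User-provided replacement for the global allocation function.
// Replacing it is permitted by the language, so the compiler cannot
// see from the call site alone that the call leaves g_a untouched.
void* operator new(std::size_t size) {
    g_a.arr[0] += 1;  // side effect on memory visible to the caller
    if (void* p = std::malloc(size ? size : 1)) {
        return p;
    }
    throw std::bad_alloc();
}

// Matching deallocation functions for the replacement above.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Same shape as the three-statement snippet quoted in this thread.
MyClass* copy_through_new(MyClass& a) {
    MyClass c = a;             // copy is taken before the allocation
    MyClass* b = new MyClass;  // runs the replacement, which changes g_a
    *b = c;
    return b;
}
```

Calling `copy_through_new(g_a)` changes `g_a` during the allocation, so a compiler that moves the load of `a` below the `new` call would copy the modified value instead of the one read by `MyClass c = a;`.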
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #9 from Pontakorn Prasertsuk ---
(In reply to Richard Biener from comment #8)
> (In reply to Pontakorn Prasertsuk from comment #7)
> > For the LLVM IR code of the snippet I provided, Clang's alias analysis can
> > prove that `new` call has no side effect to other memory location. This is
> > indicated by `noalias` keyword at the return value of the `new` call
> > (_Znwm).
> >
> > According to Clang's Language Reference:
> > "On function return values, the noalias attribute indicates that the
> > function acts like a system memory allocation function, returning a pointer
> > to allocated storage disjoint from the storage for any other object
> > accessible to the caller."
> >
> > Is this possible for GCC alias analysis pass?
>
> > MyClass c = a;
> > MyClass *b = new MyClass;
> > *b = c;
>
> the point is that 'new' can alter the value of 'a', GCC already knows that
> 'b' is distinct from c and a but that's not the relevant thing. It looks
> like LLVM creates wrong-code here.

In what case can 'new' alter 'a'? I thought memory allocation functions such
as 'malloc', 'calloc' and 'new' could not touch any memory other than the
block they return.
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #7 from Pontakorn Prasertsuk ---
For the LLVM IR of the snippet I provided, Clang's alias analysis can prove
that the `new` call has no side effects on other memory locations. This is
indicated by the `noalias` keyword on the return value of the `new` call
(_Znwm).

According to the LLVM Language Reference:
"On function return values, the noalias attribute indicates that the
function acts like a system memory allocation function, returning a pointer
to allocated storage disjoint from the storage for any other object
accessible to the caller."

Is this possible for GCC's alias analysis pass?
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #5 from Pontakorn Prasertsuk ---
(In reply to Andrew Pinski from comment #3)
> We don't even optimize:
> ```
> struct MyClass
> {
>     unsigned long long arr[128];
> };
>
> [[gnu::noipa]]
> void sink(void *m);
> void gg(MyClass &a, MyClass *b)
> {
>   MyClass c = a;
>   *b = c;
>   sink(b);
> }
> ```
>
> As I mentioned there are dups of the above testcase.

Would you mind pointing me to the original issue?
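For readers following along, the testcase quoted above copies through a temporary, and the optimization being asked for amounts to copying directly. A minimal sketch of the two forms, assuming `a` and `*b` are distinct objects; the function names `copy_via_temporary`, `copy_directly` and `same_bytes` are hypothetical, and the `sink` call is left out to keep the example self-contained:

```cpp
#include <cstdlib>
#include <cstring>

struct MyClass {
    unsigned long long arr[128];
};

// Form used in the quoted testcase: the value goes through a local copy.
void copy_via_temporary(MyClass& a, MyClass* b) {
    MyClass c = a;  // first copy: a -> c
    *b = c;         // second copy: c -> *b
}

// Form the optimizer would ideally reduce the above to: one copy.
void copy_directly(MyClass& a, MyClass* b) {
    *b = a;
}

// Helper for comparing the results byte by byte.
bool same_bytes(const MyClass& x, const MyClass& y) {
    return std::memcmp(&x, &y, sizeof(MyClass)) == 0;
}
```

Both functions leave the same contents in `*b`; the difference is only in how many bytes the generated code moves and whether stack space is needed for `c`.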
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #4 from Pontakorn Prasertsuk ---
(In reply to Richard Biener from comment #1)
> Ick - convoluted C++.  We end up with
>
> void ff (struct MyClass & obj)
> {
>   vector(2) long unsigned int vect_SR.16;
>   vector(2) long unsigned int vect_SR.15;
>   vector(2) long unsigned int vect_SR.14;
>   void * _6;
>
>   <bb 2> [local count: 1073741824]:
>   vect_SR.14_5 = MEM <vector(2) long unsigned int> [(struct MyClass &)obj_2(D)];
>   vect_SR.15_28 = MEM <vector(2) long unsigned int> [(struct MyClass &)obj_2(D) + 16];
>   vect_SR.16_30 = MEM <vector(2) long unsigned int> [(struct MyClass &)obj_2(D) + 32];
>   _6 = operator new (48);
>   MEM <vector(2) long unsigned int> [(struct MyClass2 *)_6] = vect_SR.14_5;
>   MEM <vector(2) long unsigned int> [(struct MyClass2 *)_6 + 16B] = vect_SR.15_28;
>   MEM <vector(2) long unsigned int> [(struct MyClass2 *)_6 + 32B] = vect_SR.16_30;
>   HandleMyClass2 (_6); [tail call]
>
> and the issue is that 'operator new (48)' can alter what 'obj' points to,
> so we cannot move the loads across the call and we get spilling.
>
> There is no inter-procedural analysis in GCC that would tell us that
> 'obj_2(D)' (the MyClass & obj argument of ff) does not point to an
> object that did not escape.  In fact 'ff' has global visibility
> and it might have other callers.
>
> If you add -fwhole-program then you get the function inlined to main and
>
> main:
> .LFB652:
>         .cfi_startproc
>         subq    $8, %rsp
>         .cfi_def_cfa_offset 16
>         movl    $48, %edi
>         call    _Znwm
>         movq    $0, (%rax)
>         movq    %rax, %rdi
>         movq    $0, 8(%rax)
>         movq    $0, 16(%rax)
>         movq    $0, 24(%rax)
>         movq    $0, 32(%rax)
>         movq    $0, 40(%rax)
>         call    _Z14HandleMyClass2Pv
>         xorl    %eax, %eax
>         addq    $8, %rsp
>         .cfi_def_cfa_offset 8
>         ret
>
> (not using vectors because 'main' is considered cold).  Do you cite an
> inline copy of ff() for clang?

Hi Richard,

The Clang snippet I provided is the standalone 'ff' function; it is not an
inlined copy inside 'main'.
[Bug tree-optimization/110035] New: Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

            Bug ID: 110035
           Summary: Missed optimization for dependent assignment statements
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ptk.prasertsuk at gmail dot com
  Target Milestone: ---

Created attachment 55212
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55212&action=edit
Test case, compiled with -std=c++20 -O2

The test case, when compiled, produces additional move instructions that
spill the loaded values to the stack around the call:

        movdqu  (%rdi), %xmm2
        movdqu  16(%rdi), %xmm1
        movdqu  32(%rdi), %xmm0
        movl    $48, %edi
        movaps  %xmm2, 32(%rsp)
        movaps  %xmm1, 16(%rsp)
        movaps  %xmm0, (%rsp)
        call    _Znwm@PLT
        movdqa  32(%rsp), %xmm2
        movdqa  16(%rsp), %xmm1
        movdqa  (%rsp), %xmm0
        movq    %rax, %rdi
        movups  %xmm2, (%rax)
        movups  %xmm1, 16(%rax)
        movups  %xmm0, 32(%rax)

compared to the more optimized result from clang++ 14.0.0 with the same flags:

        callq   _Znwm@PLT
        movups  (%rbx), %xmm0
        movups  16(%rbx), %xmm1
        movups  32(%rbx), %xmm2
        movups  %xmm0, (%rax)
        movups  %xmm1, 16(%rax)
        movups  %xmm2, 32(%rax)
        movq    %rax, %rdi

Clang has MemCpyOptPass, which detects and removes the memory dependency of
the second set of move instructions. This in turn allows the Dead Store
Elimination pass to remove the first set of move instructions.

g++-12 -v
Using built-in specs.
COLLECT_GCC=g++-12
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 12.1.0-2ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-sZcx2y/gcc-12-12.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-sZcx2y/gcc-12-12.1.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.1.0 (Ubuntu 12.1.0-2ubuntu1~22.04)