[Bug tree-optimization/110035] Missed optimization for dependent assignment statements

2023-06-05 Thread ptk.prasertsuk at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #12 from Pontakorn Prasertsuk  ---
I notice that GCC also does not optimize this case:
https://godbolt.org/z/7oGqjqqz4

[Bug tree-optimization/110035] Missed optimization for dependent assignment statements

2023-06-05 Thread ptk.prasertsuk at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #11 from Pontakorn Prasertsuk  ---
(In reply to rguent...@suse.de from comment #10)
> On Mon, 5 Jun 2023, ptk.prasertsuk at gmail dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035
> > 
> > --- Comment #9 from Pontakorn Prasertsuk ---
> > (In reply to Richard Biener from comment #8)
> > > (In reply to Pontakorn Prasertsuk from comment #7)
> > > > For the LLVM IR code of the snippet I provided, Clang's alias analysis can
> > > > prove that `new` call has no side effect to other memory location. This is
> > > > indicated by `noalias` keyword at the return value of the `new` call
> > > > (_Znwm).
> > > > 
> > > > According to the LLVM Language Reference:
> > > > "On function return values, the noalias attribute indicates that the
> > > > function acts like a system memory allocation function, returning a pointer
> > > > to allocated storage disjoint from the storage for any other object
> > > > accessible to the caller."
> > > > 
> > > > Is this possible for GCC alias analysis pass?
> > > 
> > > >   MyClass c = a;
> > > >   MyClass *b = new MyClass;
> > > >   *b = c;
> > > 
> > > the point is that 'new' can alter the value of 'a', GCC already knows that
> > > 'b' is distinct from c and a but that's not the relevant thing.  It looks
> > > like LLVM creates wrong-code here.
> > 
> > In what case can 'new' alter 'a'? I thought memory allocation functions such as
> > 'malloc', 'calloc' and 'new' cannot alias any memory location other than their
> > return value.
> 
> 'new' can be overridden by the user, you can declare your own 
> implementation that does fancy stuff behind the scenes, including
> in the above case altering 'a'.  Welcome to C++ ...

I assume you are referring to this case: https://godbolt.org/z/z4Y7YdxWE

Clang indeed assumes that the pointer returned by 'new' does not alias any
other object; this assumption can be turned off with
-fno-assume-sane-operator-new.

However, can we safely assume that 'malloc' and 'calloc' are non-aliasing as
well?
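
To make the objection concrete, here is a minimal sketch of a user-provided replacement for the global 'operator new' that modifies 'a' during the allocation. It is not the code behind the godbolt link; the 48-byte layout of 'MyClass', the global 'a', and the helper 'copy_through_new' are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

struct MyClass {
    unsigned long long arr[6];  // assumed layout, 48 bytes on x86-64
};

MyClass a{};  // global object that the replacement allocator can reach

// User-provided replacement for the global allocation function.
// The write to 'a' is purely illustrative.
void* operator new(std::size_t size) {
    a.arr[0] += 1;
    if (void* p = std::malloc(size != 0 ? size : 1)) {
        return p;
    }
    throw std::bad_alloc{};
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Same statement order as the snippet under discussion (hypothetical helper).
MyClass* copy_through_new() {
    a.arr[0] = 0;
    MyClass c = a;             // snapshot taken before the allocation
    MyClass* b = new MyClass;  // runs the replacement above, which changes 'a'
    *b = c;                    // must store the snapshot, not the new 'a'
    return b;
}
```

If the copy were rewritten as '*b = a' after the call, the stored value would come from the modified 'a', which is why the loads cannot simply be moved across the allocation unless the compiler assumes a well-behaved 'operator new'.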

[Bug tree-optimization/110035] Missed optimization for dependent assignment statements

2023-06-05 Thread ptk.prasertsuk at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #9 from Pontakorn Prasertsuk  ---
(In reply to Richard Biener from comment #8)
> (In reply to Pontakorn Prasertsuk from comment #7)
> > For the LLVM IR code of the snippet I provided, Clang's alias analysis can
> > prove that `new` call has no side effect to other memory location. This is
> > indicated by `noalias` keyword at the return value of the `new` call 
> > (_Znwm).
> > 
> > According to the LLVM Language Reference:
> > "On function return values, the noalias attribute indicates that the
> > function acts like a system memory allocation function, returning a pointer
> > to allocated storage disjoint from the storage for any other object
> > accessible to the caller."
> > 
> > Is this possible for GCC alias analysis pass?
> 
> >   MyClass c = a;
> >   MyClass *b = new MyClass;
> >   *b = c;
> 
> the point is that 'new' can alter the value of 'a', GCC already knows that
> 'b' is distinct from c and a but that's not the relevant thing.  It looks
> like LLVM creates wrong-code here.

In what case can 'new' alter 'a'? I thought memory allocation functions such as
'malloc', 'calloc' and 'new' cannot alias any memory location other than their
return value.

[Bug tree-optimization/110035] Missed optimization for dependent assignment statements

2023-06-02 Thread ptk.prasertsuk at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #7 from Pontakorn Prasertsuk  ---
For the LLVM IR code of the snippet I provided, Clang's alias analysis can
prove that `new` call has no side effect to other memory location. This is
indicated by `noalias` keyword at the return value of the `new` call (_Znwm).

According to the LLVM Language Reference:
"On function return values, the noalias attribute indicates that the function
acts like a system memory allocation function, returning a pointer to allocated
storage disjoint from the storage for any other object accessible to the
caller."

Is this possible for GCC alias analysis pass?
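
For comparison, GCC offers a similar guarantee for return values through the 'malloc' function attribute. The following is a minimal sketch, assuming a hypothetical wrapper 'tracked_alloc' and counter 'bytes_handed_out' that do not appear in this bug. Note that the attribute describes only the returned pointer; it makes no promise about other memory the function may write.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical bookkeeping counter, not part of the test case in this bug.
static std::size_t bytes_handed_out = 0;

// The GNU 'malloc' attribute states that the returned pointer does not alias
// any other pointer that is valid when the function returns.
__attribute__((malloc))
void* tracked_alloc(std::size_t n) {
    bytes_handed_out += n;  // a write to memory other than the returned block
    return std::malloc(n);
}
```

So a non-aliasing return value and the absence of side effects on other objects are two separate properties.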

[Bug tree-optimization/110035] Missed optimization for dependent assignment statements

2023-05-30 Thread ptk.prasertsuk at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #5 from Pontakorn Prasertsuk  ---
(In reply to Andrew Pinski from comment #3)
> We don't even optimize:
> ```
> struct MyClass
> {
> unsigned long long arr[128];
> };
> 
> [[gnu::noipa]]
> void sink(void *m);
> void gg(MyClass &a, MyClass *b)
> {
>   MyClass c = a;
>   *b = c;
>   sink(b);
> }
> ```
> 
> As I mentioned there are dups of the above testcase.

Would you mind pointing me to the original issue?

[Bug tree-optimization/110035] Missed optimization for dependent assignment statements

2023-05-30 Thread ptk.prasertsuk at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #4 from Pontakorn Prasertsuk  ---
(In reply to Richard Biener from comment #1)
> Ick - convoluted C++.  We end up with
> 
> void ff (struct MyClass & obj)
> {
>   vector(2) long unsigned int vect_SR.16;
>   vector(2) long unsigned int vect_SR.15;
>   vector(2) long unsigned int vect_SR.14;
>   void * _6;
> 
>   <bb 2> [local count: 1073741824]:
>   vect_SR.14_5 = MEM <vector(2) long unsigned int> [(struct MyClass &)obj_2(D)];
>   vect_SR.15_28 = MEM <vector(2) long unsigned int> [(struct MyClass &)obj_2(D) + 16];
>   vect_SR.16_30 = MEM <vector(2) long unsigned int> [(struct MyClass &)obj_2(D) + 32];
>   _6 = operator new (48);
>   MEM <vector(2) long unsigned int> [(struct MyClass2 *)_6] = vect_SR.14_5;
>   MEM <vector(2) long unsigned int> [(struct MyClass2 *)_6 + 16B] = vect_SR.15_28;
>   MEM <vector(2) long unsigned int> [(struct MyClass2 *)_6 + 32B] = vect_SR.16_30;
>   HandleMyClass2 (_6); [tail call]
> 
> and the issue is that 'operator new (48)' can alter what 'obj' points to,
> so we cannot move the loads across the call and we get spilling.
> 
> There is no inter-procedural analysis in GCC that would tell us that
> 'obj_2(D)' (the MyClass & obj argument of ff) does not point to an
> object that did not escape.  In fact 'ff' has global visibility
> and it might have other callers.
> 
> If you add -fwhole-program then you get the function inlined to main and
> 
> main:
> .LFB652:
>         .cfi_startproc
>         subq    $8, %rsp
>         .cfi_def_cfa_offset 16
>         movl    $48, %edi
>         call    _Znwm
>         movq    $0, (%rax)
>         movq    %rax, %rdi
>         movq    $0, 8(%rax)
>         movq    $0, 16(%rax)
>         movq    $0, 24(%rax)
>         movq    $0, 32(%rax)
>         movq    $0, 40(%rax)
>         call    _Z14HandleMyClass2Pv
>         xorl    %eax, %eax
>         addq    $8, %rsp
>         .cfi_def_cfa_offset 8
>         ret
> 
> (not using vectors because 'main' is considered cold).  Do you cite an
> inline copy of ff() for clang?

Hi Richard,

The clang snippet I provided is not inlined into the 'main' function.

[Bug tree-optimization/110035] New: Missed optimization for dependent assignment statements

2023-05-30 Thread ptk.prasertsuk at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

            Bug ID: 110035
           Summary: Missed optimization for dependent assignment statements
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ptk.prasertsuk at gmail dot com
  Target Milestone: ---

Created attachment 55212
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55212&action=edit
Test case, compiled with -std=c++20 -O2

The test case, when compiled, produces additional move instructions:

movdqu  (%rdi), %xmm2
movdqu  16(%rdi), %xmm1
movdqu  32(%rdi), %xmm0
movl    $48, %edi
movaps  %xmm2, 32(%rsp)
movaps  %xmm1, 16(%rsp)
movaps  %xmm0, (%rsp)
call    _Znwm@PLT
movdqa  32(%rsp), %xmm2
movdqa  16(%rsp), %xmm1
movdqa  (%rsp), %xmm0
movq    %rax, %rdi
movups  %xmm2, (%rax)
movups  %xmm1, 16(%rax)
movups  %xmm0, 32(%rax)

compared to the more optimized result from clang++ 14.0.0 with the same flags:

callq   _Znwm@PLT
movups  (%rbx), %xmm0
movups  16(%rbx), %xmm1
movups  32(%rbx), %xmm2
movups  %xmm0, (%rax)
movups  %xmm1, 16(%rax)
movups  %xmm2, 32(%rax)
movq    %rax, %rdi

Clang has MemCpyOptPass, which detects and removes the memory dependency of the
second set of move instructions; this in turn allows the Dead Store Elimination
pass to remove the first set of move instructions.
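
The attachment is not reproduced in this message. As a rough illustration only, the following sketch shows a function of the same shape, inferred from the names visible in this thread ('ff', 'MyClass', 'MyClass2', 'HandleMyClass2') and the 48-byte allocation. The member layout, the constructor, the 'last_seen' variable, and the body of 'HandleMyClass2' are assumptions, not the reporter's code.

```cpp
struct MyClass {
    unsigned long long v[6];  // assumed layout: 6 * 8 = 48 bytes on x86-64
};

struct MyClass2 {
    MyClass inner;
    explicit MyClass2(const MyClass& c) : inner(c) {}
};

// Hypothetical: lets a caller inspect what the consumer received.
static MyClass last_seen{};

// Stand-in consumer; in the real test case it is opaque to the optimizer.
void HandleMyClass2(void* p) {
    MyClass2* m = static_cast<MyClass2*>(p);
    last_seen = m->inner;
    delete m;
}

void ff(MyClass& obj) {
    MyClass c = obj;                // first copy: obj -> c, before the call
    MyClass2* p = new MyClass2(c);  // allocation, then second copy: c -> *p
    HandleMyClass2(p);
}
```

In this shape the allocation call sits between the two dependent copies, so the values loaded from 'obj' have to be kept alive across the call unless the compiler can show that the call does not modify 'obj'.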

g++-12 -v
Using built-in specs.
COLLECT_GCC=g++-12
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
12.1.0-2ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-12-sZcx2y/gcc-12-12.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-sZcx2y/gcc-12-12.1.0/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.1.0 (Ubuntu 12.1.0-2ubuntu1~22.04)