[Bug tree-optimization/86318] const local aggregates can be assumed not to be modified even when escaped

2022-12-27 Thread jhaberman at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86318

--- Comment #4 from Josh Haberman  ---
Is there any plan or timeline for fixing this bug?

[Bug tree-optimization/108226] New: __restrict on inlined function parameters does not function as expected

2022-12-25 Thread jhaberman at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108226

Bug ID: 108226
   Summary: __restrict on inlined function parameters does not
function as expected
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jhaberman at gmail dot com
  Target Milestone: ---

In bugs https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58526 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60712#c3 it is said that
restrict/__restrict on inlined function parameters was fixed in GCC 5.  But I
ran into a case where __restrict does not work as expected:

// Godbolt link for this example: https://godbolt.org/z/e5j93Ex3v

long g;

static void Func1(void* p1, int* p2) {
  switch (*p2) {
    case 2:
      __builtin_memcpy(p1, &g, 1);
      return;
    case 1:
      __builtin_memcpy(p1, &g, 8);
      return;
    case 0: {
      __builtin_memcpy(p1, &g, 16);
      return;
    }
  }
}

static void Func2(char* __restrict p1, int* __restrict p2) {
  *p2 = 1;
  *p1 = 123;
  Func1(p1, p2);
}

void Func3(char* p1, int* p2) {
  *p2 = 1;
  Func2(p1, p2);
}

The __restrict qualifiers on Func2() should allow the switch() to be
optimized away.  Clang optimizes it; GCC does not.
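Concretely, the __restrict contract in Func2() means the byte store through p1 cannot modify *p2, so Func1() must read *p2 == 1 and only the 8-byte copy is reachable. A minimal sketch of the code the report expects Func3() to reduce to, written as plain C; `Func3Expected` is a hypothetical name, and like the report it assumes an 8-byte long.

```c
#include <string.h>

long g;  /* the report's global; assumes an 8-byte long, as the report does */

/* Hypothetical hand-optimized equivalent of the report's Func3().
   Because p1 and p2 are __restrict in Func2(), the byte store through p1
   cannot modify *p2, so Func1() must see *p2 == 1 and take case 1.
   The store *p1 = 123 is dead because the 8-byte copy overwrites it. */
void Func3Expected(char* p1, int* p2) {
  *p2 = 1;            /* the two identical stores to *p2 merge into one */
  memcpy(p1, &g, 8);  /* case 1 of Func1(): the only reachable path */
}
```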

It appears that __restrict on function parameters can even make the code worse.
Consider a slight variation on this example:

// Godbolt link for this example: https://godbolt.org/z/Y61qajETd

long g;

static void Func1(void* p1, int* p2) {
  switch (*p2) {
    case 2:
      __builtin_memcpy(p1, &g, 1);
      return;
    case 1:
      __builtin_memcpy(p1, &g, 8);
      return;
    case 0: {
      __builtin_memcpy(p1, &g, 16);
      return;
    }
  }
}

// If we remove __restrict here, GCC succeeds in optimizing away the switch().
static void Func2(char* __restrict p1, int* __restrict p2) {
  *p1 = 123;
  *p2 = 1;
  Func1(p1, p2);
}

void Func3(char* p1, int* p2) {
  *p2 = 1;
  Func2(p1, p2);
}

In this case, it should be straightforward to optimize away the switch(), even
without __restrict.  But GCC only performs this optimization if we *remove*
__restrict.
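In the second variant no alias information is needed at all: `*p2 = 1` is the last store before Func1() reads `*p2`, so that load can be forwarded from the store even if p1 and p2 point into the same object. The sketch below makes the taken branch observable by returning a case number instead of copying; `Func1Case` and `Func2NoRestrict` are hypothetical names.

```c
/* Hypothetical instrumented copy of Func1(): returns which case ran
   instead of copying, so the chosen branch can be checked. */
static int Func1Case(void* p1, int* p2) {
  (void)p1;
  switch (*p2) {
    case 2: return 2;
    case 1: return 1;
    case 0: return 0;
  }
  return -1;  /* no case matched */
}

/* Second variant of Func2() with __restrict removed. The last store
   before the read is *p2 = 1, so *p2 is 1 whether or not p1 aliases p2. */
int Func2NoRestrict(char* p1, int* p2) {
  *p1 = 123;
  *p2 = 1;
  return Func1Case(p1, p2);
}
```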

[Bug tree-optimization/56456] [meta-bug] bogus/missing -Warray-bounds

2022-12-24 Thread jhaberman at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56456
Bug 56456 depends on bug 108217, which changed state.

Bug 108217 Summary: bogus -Warray-bounds with pointer to constant local
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108217

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|DUPLICATE   |---

[Bug middle-end/108217] bogus -Warray-bounds with pointer to constant local

2022-12-24 Thread jhaberman at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108217

Josh Haberman  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|DUPLICATE   |---

--- Comment #3 from Josh Haberman  ---
> That being said there is a missed optimization but that is the same as PR 
> 23384 .
> The const part is a misleading you really.

I think there are two issues here.

1. Escape analysis is not flow sensitive.  I agree that aspect of my bug report
is a dup of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23384, and closing as
a dup is appropriate there.

2. Escape analysis does not take 'const-ness' of the underlying object into
account.

Let me illustrate (2) with an example that isolates that issue (Godbolt:
https://godbolt.org/z/16cv87s9d):

void ExternFunc(const int*);

int Bad() {
  const int i = 0;
  const int* pi = &i;
  ExternFunc(pi);
  return *pi;
}

int Good() {
  const int i = 0;
  ExternFunc(&i);
  return i;
}

These two functions are effectively the same, but in Bad() GCC does not perform
constant propagation across the external function call.  While it's true that
the pointer escapes, the underlying object is const and cannot change, so
constant propagation should work here, as it does in Good().

Currently GCC re-loads `i` from the stack in Bad(), even though we know
statically that the value must be zero.
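Spelled out at the source level, the requested optimization is the following rewrite of Bad(): since `i` is defined const, a write to it through the escaped pointer would be undefined behavior, so the reload of `*pi` after the call can be folded to 0, as GCC already does for Good(). A minimal sketch; `BadFolded` is a hypothetical name, and the `ExternFunc` stub exists only so the example links.

```c
/* Stub for the report's external function. It may cast away const and
   read through the pointer, but writing to a const-defined object would
   be undefined behavior, so it cannot legally change *p. */
static int observed = -1;
void ExternFunc(const int* p) { observed = *p; }

/* Original shape from the report: reloads *pi after the call. */
int Bad(void) {
  const int i = 0;
  const int* pi = &i;
  ExternFunc(pi);
  return *pi;
}

/* Hypothetical result of constant propagation across the call:
   the reload of *pi is replaced by the known initializer. */
int BadFolded(void) {
  const int i = 0;
  ExternFunc(&i);
  return 0;
}
```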

The same missed optimization is present in Clang:
https://github.com/llvm/llvm-project/issues/59694

[Bug middle-end/108217] New: bogus -Warray-bounds with pointer to constant local

2022-12-24 Thread jhaberman at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108217

Bug ID: 108217
   Summary: bogus -Warray-bounds with pointer to constant local
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jhaberman at gmail dot com
  Target Milestone: ---

Repro:

void ExternFunc1();
void ExternFunc2(const int*);

char mem[32];

static void StaticFunc(const int* i) {
  void* ptr = (void*)0;
  switch (*i) {
    case 0:
      ExternFunc2(i);
      return;
    case 1:
      __builtin_memcpy(mem, &ptr, sizeof(ptr));
      return;
    case 2: {
      __builtin_memcpy(mem, &ptr, 32);
      return;
    }
  }
}

void Bad() {
  const int i = 1;
  ExternFunc1();
  StaticFunc(&i);
}

This reproduces on trunk according to Godbolt: https://godbolt.org/z/vYGo1z6bG

Godbolt also indicates a missed optimization, which is probably related to the
bogus warning.  Clang correctly performs constant propagation of the local `i`,
whereas GCC seems to think that all cases of the switch() are reachable.

It is true that &i escapes, but mutating `i` is UB because it is const, so it
should be legal to perform constant propagation here.

Additionally, even if ExternFunc2() mutated `i`, it would be too late to change
its value in time to affect the switch().
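With `i` known to be 1, StaticFunc(&i) can only take case 1, and `*i` is read before ExternFunc2() could ever see the pointer. A minimal sketch of what Bad() would reduce to under that constant propagation: the 32-byte copy out of `ptr` in case 2, presumably the access the bogus warning points at, simply disappears. `BadPropagated` and the call-counting stubs are hypothetical, and the check assumes a null pointer is all-zero bits, as on mainstream targets.

```c
#include <string.h>

char mem[32];

/* Hypothetical stubs that count calls so the folded control flow can be
   checked; the report only declares these functions. */
static int extern1_calls, extern2_calls;
void ExternFunc1(void) { extern1_calls++; }
void ExternFunc2(const int* i) { (void)i; extern2_calls++; }

/* Hypothetical result of propagating i == 1 into StaticFunc() and
   inlining it into Bad(): case 0 (the only place &i escapes) and case 2
   (the 32-byte copy out of the sizeof(void*)-byte ptr) are unreachable,
   so only the sizeof(ptr)-byte copy from case 1 remains. */
void BadPropagated(void) {
  ExternFunc1();
  void* ptr = (void*)0;
  /* On mainstream targets null is all-zero bits, so this zeroes
     mem[0..sizeof(ptr)). */
  memcpy(mem, &ptr, sizeof(ptr));
}
```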

[Bug rtl-optimization/70782] zero-initialized long returned by value generates useless stores/loads to the stack

2016-04-24 Thread jhaberman at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70782

Josh Haberman  changed:

   What|Removed |Added

Summary|zero-initialized union  |zero-initialized long
   |returned by value generates |returned by value generates
   |useless stores/loads to the |useless stores/loads to the
   |stack   |stack

--- Comment #1 from Josh Haberman  ---
I just realized that the union has nothing to do with it.  I get exactly the
same results if the function returns a long:

--

#include <string.h>

long f(const void *p, int type) {
  long v;
  memset(&v, 0, 8);
  if (type == 1) {
    memcpy(&v, p, 1);
  } else if (type <= 5) {
    memcpy(&v, p, 4);
  } else if (type <= 8) {
    memcpy(&v, p, 8);
  }
  return v;
}

--

I've retitled the bug accordingly.

[Bug rtl-optimization/70782] New: zero-initialized union returned by value generates useless stores/loads to the stack

2016-04-24 Thread jhaberman at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70782

Bug ID: 70782
   Summary: zero-initialized union returned by value generates
useless stores/loads to the stack
   Product: gcc
   Version: 5.2.1
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jhaberman at gmail dot com
  Target Milestone: ---

Test case:

--
#include <string.h>

typedef union {
  char ch;
  float fl;
  double dbl;
} u;

u f(const void *p, int type) {
  u v;
  memset(&v, 0, 8);
  if (type == 1) {
    memcpy(&v, p, 1);
  } else if (type <= 5) {
    memcpy(&v, p, 4);
  } else if (type <= 8) {
    memcpy(&v, p, 8);
  }
  return v;
}

--

With gcc 5.2.1 on Ubuntu, compiled with -O3 -fno-stack-protector, I get:

--

<f>:
   0:   83 fe 01              cmp    esi,0x1
   3:   48 c7 44 24 e8 00 00  mov    QWORD PTR [rsp-0x18],0x0
   a:   00 00
   c:   74 32                 je     40 <f+0x40>
   e:   83 fe 05              cmp    esi,0x5
  11:   7e 1d                 jle    30 <f+0x30>
  13:   83 fe 08              cmp    esi,0x8
  16:   7f 08                 jg     20 <f+0x20>
  18:   48 8b 07              mov    rax,QWORD PTR [rdi]
  1b:   48 89 44 24 e8        mov    QWORD PTR [rsp-0x18],rax
  20:   48 8b 44 24 e8        mov    rax,QWORD PTR [rsp-0x18]
  25:   c3                    ret
  26:   66 2e 0f 1f 84 00 00  nop    WORD PTR cs:[rax+rax*1+0x0]
  2d:   00 00 00
  30:   8b 07                 mov    eax,DWORD PTR [rdi]
  32:   89 44 24 e8           mov    DWORD PTR [rsp-0x18],eax
  36:   48 8b 44 24 e8        mov    rax,QWORD PTR [rsp-0x18]
  3b:   c3                    ret
  3c:   0f 1f 40 00           nop    DWORD PTR [rax+0x0]
  40:   0f b6 07              movzx  eax,BYTE PTR [rdi]
  43:   88 44 24 e8           mov    BYTE PTR [rsp-0x18],al
  47:   48 8b 44 24 e8        mov    rax,QWORD PTR [rsp-0x18]
  4c:   c3                    ret

--

In every code path it stores the loaded value to the stack, only to read it
back.  None of these stack operations is necessary, since the narrower loads
(mov eax and movzx eax) already clear the upper bits of rax.  This function
shouldn't need to use any stack space at all.
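One way to see that no stack slot is needed: each branch is equivalent to a single load of the right width, zero-extended into the 64-bit result. The sketch below rewrites the `long` variant of f() from comment #1 that way and keeps the original for comparison; `f_regs` is a hypothetical name, and the equivalence assumes a little-endian target with 64-bit long (such as x86-64), where the first bytes of `v` are its low-order bytes.

```c
#include <stdint.h>
#include <string.h>

/* The long variant of f() from comment #1, unchanged. */
long f(const void *p, int type) {
  long v;
  memset(&v, 0, 8);
  if (type == 1) {
    memcpy(&v, p, 1);
  } else if (type <= 5) {
    memcpy(&v, p, 4);
  } else if (type <= 8) {
    memcpy(&v, p, 8);
  }
  return v;
}

/* Hypothetical register-friendly rewrite: load each width into a
   correctly sized temporary and zero-extend it, so each path can be a
   single (zero-extending) load with no partially written stack object.
   Matches f() only on little-endian targets with a 64-bit long. */
long f_regs(const void *p, int type) {
  if (type == 1) {
    uint8_t b;
    memcpy(&b, p, 1);
    return (long)b;
  } else if (type <= 5) {
    uint32_t w;
    memcpy(&w, p, 4);
    return (long)w;
  } else if (type <= 8) {
    long q;
    memcpy(&q, p, 8);
    return q;
  }
  return 0;
}
```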

[Bug inline-asm/52813] %rsp in clobber list is silently ignored

2012-04-01 Thread jhaberman at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52813

--- Comment #4 from Josh Haberman  2012-04-01 
19:23:14 UTC ---
I understand that GCC may not be able to save/restore %rsp like it does other
registers.  But if that's the case, GCC should throw an error if the user puts
%rsp in the clobber list, instead of silently ignoring it.  Otherwise how is
the user supposed to know that %rsp will not be saved except through trial and
error?


[Bug inline-asm/52813] %rsp in clobber list is silently ignored

2012-04-01 Thread jhaberman at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52813

Josh Haberman  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |

--- Comment #2 from Josh Haberman  2012-04-01 
15:54:27 UTC ---
I don't expect the compiler to analyze the asm string.  I expect the compiler
to respect my clobber list.

I told GCC that I would clobber %rsp.  Any other register that I put in the
clobber list causes GCC to save that register to the stack or to another
register before the asm and restore it from the stack/register after the asm. 
For example:

--

#include <stdlib.h>
int main() {
  int x = rand();
  asm volatile ("movq $0, %%rax" : : : "%rax");
  return x;
}

$ gcc -Wall -O3 -fomit-frame-pointer -c -o test.o test.c
$ objdump -d -r -M intel test.o
test.o: file format elf64-x86-64


Disassembly of section .text.startup:

<main>:
   0:   48 83 ec 08           sub    rsp,0x8
   4:   e8 00 00 00 00        call   9 <main+0x9>
        5: R_X86_64_PC32  rand-0x4
   9:   89 c2                 mov    edx,eax
   b:   48 c7 c0 00 00 00 00  mov    rax,0x0
  12:   89 d0                 mov    eax,edx
  14:   48 83 c4 08           add    rsp,0x8
  18:   c3                    ret

--

Notice that it saved eax to edx before my asm and restored it afterwards.  This
works for every register except %rsp, which is silently ignored if you try to
list it in the clobber list.  This is a bug.


[Bug inline-asm/52813] New: %rsp in clobber list is silently ignored

2012-03-31 Thread jhaberman at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52813

 Bug #: 52813
   Summary: %rsp in clobber list is silently ignored
Classification: Unclassified
   Product: gcc
   Version: 4.6.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: inline-asm
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: jhaber...@gmail.com


The following test program crashes even though I correctly listed %rsp as
clobbered:

--

int main() {
  asm volatile ("movq $0, %%rsp" : : : "%rsp");
  return 0;
}

--

I would prefer gcc to error out in this case instead of silently ignoring my
instruction.


[Bug target/52055] load of 64-bit pointer reads 64 bits even when only 32 are used

2012-01-31 Thread jhaberman at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52055

--- Comment #2 from Josh Haberman  2012-01-31 
17:23:51 UTC ---
Is there any requirement that you trap if the 64-bit read would have trapped?  
Aren't unaligned reads undefined behavior that only happen to work on x86-64?


[Bug target/52055] New: load of 64-bit pointer reads 64 bits even when only 32 are used

2012-01-30 Thread jhaberman at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52055

 Bug #: 52055
   Summary: load of 64-bit pointer reads 64 bits even when only 32
are used
Classification: Unclassified
   Product: gcc
   Version: 4.6.1
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: jhaber...@gmail.com


The following test program:

#include <stdint.h>
uint32_t rd32(uint64_t *i) { return *i; }

Compiles to this (-O3 -fomit-frame-pointer):

<rd32>:
   0:   48 8b 07              mov    rax,QWORD PTR [rdi]
   3:   c3                    ret

But Clang compiles to this, which seems correct, is one byte shorter, and
touches less memory:

<rd32>:
   0:   8b 07                 mov    eax,DWORD PTR [rdi]
   2:   c3                    ret
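The conversion to uint32_t keeps only the low 32 bits of `*i`, and on a little-endian target such as x86-64 those are the first four bytes of the object, which is why a 4-byte load suffices. A minimal sketch that spells the narrower load out at the source level; `rd32_narrow` is a hypothetical name, and the equivalence holds only on little-endian targets.

```c
#include <stdint.h>
#include <string.h>

/* The report's function: implicit truncation of a 64-bit load. */
uint32_t rd32(uint64_t *i) { return *i; }

/* Hypothetical equivalent that reads only the low four bytes.
   Little-endian only: on big-endian targets the low half of *i
   is stored in the last four bytes instead. */
uint32_t rd32_narrow(const uint64_t *i) {
  uint32_t lo;
  memcpy(&lo, i, sizeof lo);
  return lo;
}
```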


[Bug rtl-optimization/44194] struct returned by value generates useless stores

2011-02-23 Thread jhaberman at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194

--- Comment #8 from Josh Haberman  2011-02-24 
03:27:04 UTC ---
I found another test case for this.  I thought I'd post it since it's very
different from the original one.

--

class Foo {
 public:
  virtual ~Foo() {}
  virtual void DoSomething() = 0;
};

void foo(Foo *f, void (Foo::*member)()) {
  (f->*member)();
}

--

$ g++ -c -O3 -fomit-frame-pointer test.cc
$ objdump -M intel -d test.o

test.o: file format elf64-x86-64


Disassembly of section .text:

<_Z3fooP3FooMS_FvvE>:
   0:   40 f6 c6 01           test   sil,0x1
   4:   48 89 74 24 e8        mov    QWORD PTR [rsp-0x18],rsi
   9:   48 89 54 24 f0        mov    QWORD PTR [rsp-0x10],rdx
   e:   74 10                 je     20 <_Z3fooP3FooMS_FvvE+0x20>
  10:   48 01 d7              add    rdi,rdx
  13:   48 8b 07              mov    rax,QWORD PTR [rdi]
  16:   48 8b 74 30 ff        mov    rsi,QWORD PTR [rax+rsi*1-0x1]
  1b:   ff e6                 jmp    rsi
  1d:   0f 1f 00              nop    DWORD PTR [rax]
  20:   48 01 d7              add    rdi,rdx
  23:   ff e6                 jmp    rsi

--

We spilled rsi and rdx to the stack (in the red zone, it appears) for no reason
(AFAICS).
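The listing follows the Itanium C++ ABI representation of a pointer to member function, a pair {ptr, adj} passed here in rsi and rdx: adj is added to `this`, and if the low bit of ptr is set, ptr - 1 is a byte offset into the vtable; otherwise ptr is the function address. Both halves are consumed straight from registers, which is why the two stores look dead. A C sketch of that dispatch, assuming x86-64 (where the virtual flag lives in ptr); `MemFnPtr`, `Obj`, `CallMember`, and the fake vtable are hypothetical. DoSomething sits at byte offset 16 because a virtual destructor occupies two vtable slots in this ABI.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*Fn)(void *self);

/* Hypothetical C model of an Itanium-ABI pointer to member function. */
typedef struct {
  uintptr_t ptr;  /* function address, or 1 + vtable byte offset if virtual */
  ptrdiff_t adj;  /* adjustment added to the object pointer */
} MemFnPtr;

/* Hypothetical object layout: the vtable pointer comes first. */
typedef struct {
  const Fn *vptr;
  int calls;
} Obj;

/* Mirrors the listing: adjust `this`, then either load the target from
   the vtable (low bit set) or use ptr directly. Both fields are only
   read, never stored back to memory. */
static void CallMember(void *obj, MemFnPtr m) {
  char *self = (char *)obj + m.adj;                   /* add rdi,rdx */
  Fn target;
  if (m.ptr & 1) {                                    /* test sil,0x1 */
    const char *vtable =
        (const char *)*(const Fn *const *)self;       /* mov rax,[rdi] */
    target = *(const Fn *)(vtable + (m.ptr - 1));     /* [rax+rsi-1] */
  } else {
    target = (Fn)m.ptr;
  }
  target(self);                                       /* jmp rsi */
}

/* Example target and a fake vtable: two destructor slots, then
   DoSomething at byte offset 2 * sizeof(Fn). */
static void DoSomething(void *self) { ((Obj *)self)->calls++; }
static const Fn kFooVtable[3] = { NULL, NULL, DoSomething };
```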


[Bug rtl-optimization/44194] struct returned by value generates useless stores

2010-07-09 Thread jhaberman at gmail dot com


--- Comment #7 from jhaberman at gmail dot com  2010-07-10 01:48 ---
I must have been on crack when I wrote that last comment.  Sorry for the noise.

Though I do wonder how difficult the original bug is to fix.  This seems to
make it more expensive to return structures by value.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194



[Bug rtl-optimization/44194] struct returned by value generates useless stores

2010-07-09 Thread jhaberman at gmail dot com


--- Comment #4 from jhaberman at gmail dot com  2010-07-10 01:38 ---
This seems to happen even with POD return types:

int foo();
void bar(int a);

void func() {
  bar(foo());
}

In 32-bit mode it spills the return value to the stack for no reason.  It also
seems to overallocate the stack (28 bytes allocated, only 4 used):

<func>:
   0:   83 ec 1c              sub    esp,0x1c
   3:   e8 fc ff ff ff        call   4 <func+0x4>
        4: R_386_PC32  foo
   8:   89 04 24              mov    DWORD PTR [esp],eax
   b:   e8 fc ff ff ff        call   c <func+0xc>
        c: R_386_PC32  bar
  10:   83 c4 1c              add    esp,0x1c
  13:   c3                    ret

In 64-bit mode there is no store, but it *does* allocate 8 bytes of stack that
it never uses:

<func>:
   0:   48 83 ec 08           sub    rsp,0x8
   4:   31 c0                 xor    eax,eax
   6:   e8 00 00 00 00        call   b <func+0xb>
        7: R_X86_64_PC32  foo+0xfffc
   b:   48 83 c4 08           add    rsp,0x8
   f:   89 c7                 mov    edi,eax
  11:   e9 00 00 00 00        jmp    16 <func+0x16>
        12: R_X86_64_PC32  bar+0xfffc

Any idea how hard this bug is to fix?



[Bug rtl-optimization/44194] New: struct returned by value generates useless stores

2010-05-18 Thread jhaberman at gmail dot com
Test case:

--

#include <stdint.h>

struct twoints { uint64_t a, b; } foo();
void bar(uint64_t a, uint64_t b);

void func() {
  struct twoints s = foo();
  bar(s.a, s.b);
}

--

$ gcc -save-temps -Wall -c -o testbad.o -msse2 -O3 -fomit-frame-pointer testbad.c
$ objdump -d -r -M intel testbad.o

testbad.o: file format elf64-x86-64


Disassembly of section .text:

<func>:
   0:   48 83 ec 28           sub    rsp,0x28
   4:   31 c0                 xor    eax,eax
   6:   e8 00 00 00 00        call   b <func+0xb>
        7: R_X86_64_PC32  foo-0x4
   b:   48 89 04 24           mov    QWORD PTR [rsp],rax
   f:   48 89 54 24 08        mov    QWORD PTR [rsp+0x8],rdx
  14:   48 89 d6              mov    rsi,rdx
  17:   48 89 44 24 10        mov    QWORD PTR [rsp+0x10],rax
  1c:   48 89 54 24 18        mov    QWORD PTR [rsp+0x18],rdx
  21:   48 89 c7              mov    rdi,rax
  24:   48 83 c4 28           add    rsp,0x28
  28:   e9 00 00 00 00        jmp    2d <func+0x2d>
        29: R_X86_64_PC32  bar-0x4

--

As you can see above, rax and rdx are each stored to the stack twice (at
[rsp]/[rsp+0x8] and again at [rsp+0x10]/[rsp+0x18]), but none of these stores
is necessary.

$ gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.4.3-4ubuntu5'
--with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared
--enable-multiarch --enable-linker-build-id --with-system-zlib
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls
--enable-clocale=gnu --enable-libstdcxx-debug --enable-plugin --enable-objc-gc
--disable-werror --with-arch-32=i486 --with-tune=generic
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)


-- 
   Summary: struct returned by value generates useless stores
   Product: gcc
   Version: 4.4.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jhaberman at gmail dot com
 GCC build triplet: x86_64-linux-gnu
  GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194