[Bug tree-optimization/93265] memcmp comparisons of structs wrapping a primitive type not as compact/efficient as direct comparisons of the underlying primitive type under -Os

2023-05-01 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93265

--- Comment #3 from Petr Skocik ---
Here's another example, which may summarize the issue more nicely:

struct a{ char _[4]; };
#include <string.h>
int cmp(struct a A, struct a B){ return !!memcmp(&A,&B,4); }

Expected x86-64 codegen (✓ for gcc -O2/-O3 and for clang -Os/-O2/-O3):
xor     eax, eax
cmp     edi, esi
setne   al
ret

gcc -Os codegen:
subq    $24, %rsp
movl    $4, %edx
movl    %edi, 12(%rsp)
leaq    12(%rsp), %rdi
movl    %esi, 8(%rsp)
leaq    8(%rsp), %rsi
call    memcmp
testl   %eax, %eax
setne   %al
addq$24, %rsp
movzbl  %al, %eax
ret

https://godbolt.org/z/G5eE5GYv4
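
Until the -Os path folds this, one possible source-level workaround is sketched below. The names cmp_memcmp, cmp_direct and check_cmp are hypothetical and not part of the report. Whether a given GCC release emits the compact sequence for cmp_direct at -Os is an assumption that should be checked in the compiler explorer, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct a { char _[4]; };

/* The sketch assumes the wrapper is exactly as wide as the integer. */
_Static_assert(sizeof(struct a) == sizeof(uint32_t), "struct a must be 4 bytes");

/* Form from the report: nonzero iff the two structs differ. */
int cmp_memcmp(struct a A, struct a B)
{
    return !!memcmp(&A, &B, 4);
}

/* Hypothetical workaround: copy the 4 bytes into integers and compare those.
   memcpy is used instead of a pointer cast to stay clear of aliasing rules. */
int cmp_direct(struct a A, struct a B)
{
    uint32_t x, y;
    memcpy(&x, &A, sizeof x);
    memcpy(&y, &B, sizeof y);
    return x != y;
}

/* Both forms must agree for equal and for differing inputs. */
void check_cmp(void)
{
    struct a p = {{1, 2, 3, 4}}, q = {{1, 2, 3, 4}}, r = {{1, 2, 3, 5}};
    assert(cmp_memcmp(p, q) == cmp_direct(p, q));
    assert(cmp_memcmp(p, r) == cmp_direct(p, r));
}
```

Both functions return 0 for equal structs and 1 otherwise, so the workaround only changes how the comparison is spelled, not what it computes.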


2020-01-15 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93265

Martin Sebor changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |msebor at gcc dot gnu.org
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85330
             Blocks|                            |83819

--- Comment #2 from Martin Sebor ---
The strlen pass does have this optimization, but the pass only performs it at
-O2 and higher, not with -Os.  I don't know why it isn't run at every
optimization level.  Although some of its transformations might increase code
size, on balance I'd expect it to both reduce code size and emit faster code.
For this example:

$ cat z.c && gcc -Os -S -Wall -fdump-tree-optimized=/dev/stdout -o/dev/stdout z.c
int f (int x)
{
  return __builtin_memcmp (&x, (int[]){ 42 }, sizeof x) == 0;
}

.file   "z.c"
.text

;; Function f (f, funcdef_no=0, decl_uid=1930, cgraph_uid=1, symbol_order=0)

f (int x)
{
  int D.1932[1];
  int _1;
  _Bool _2;
  int _5;

  <bb 2> [local count: 1073741824]:
  D.1932[0] = 42;
  _1 = __builtin_memcmp (&x, &D.1932, 4);
  _2 = _1 == 0;
  _5 = (int) _2;
  D.1932 ={v} {CLOBBER};
  return _5;

}


.globl  f
.type   f, @function
f:
.LFB0:
.cfi_startproc
subq    $40, %rsp
.cfi_def_cfa_offset 48
movl    $4, %edx
movl    %edi, 12(%rsp)
leaq    28(%rsp), %rsi
leaq    12(%rsp), %rdi
movl    $42, 28(%rsp)
call    memcmp
testl   %eax, %eax
sete    %al
addq    $40, %rsp
.cfi_def_cfa_offset 8
movzbl  %al, %eax
ret
.cfi_endproc
.LFE0:
.size   f, .-f
.ident  "GCC: (GNU) 10.0.0 20200115 (experimental)"
.section        .note.GNU-stack,"",@progbits

Whereas at -O2 the object code is much smaller:

.p2align 4
.globl  f
.type   f, @function
f:
.LFB0:
.cfi_startproc
xorl    %eax, %eax
cmpl    $42, %edi
sete    %al
ret
.cfi_endproc
.LFE0:
.size   f, .-f
.ident  "GCC: (GNU) 10.0.0 20200115 (experimental)"
.section        .note.GNU-stack,"",@progbits

See also PR 85330.  I can look into enabling it at all optimization levels for
GCC 11.
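
For reference, the -O2 output above corresponds to the source-level form sketched below. The names f_folded and check_f are hypothetical and used only for illustration; the fold itself is done by the compiler, not written by hand. As an untested assumption, adding -foptimize-strlen to the -Os command line may be worth trying, since the GCC manual documents that option as the strlen-pass switch enabled at -O2 and -O3.

```c
#include <assert.h>
#include <limits.h>

/* The test case from this comment. */
int f(int x)
{
    return __builtin_memcmp(&x, (int[]){ 42 }, sizeof x) == 0;
}

/* Hypothetical hand-written equivalent of what the -O2 code computes. */
int f_folded(int x)
{
    return x == 42;
}

/* The two forms must agree on every input; a few samples are checked here. */
void check_f(void)
{
    static const int samples[] = { INT_MIN, -1, 0, 41, 42, 43, INT_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(f(samples[i]) == f_folded(samples[i]));
}
```

The equivalence relies on int having no padding bits, which holds on x86-64.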


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83819
[Bug 83819] [meta-bug] missing strlen optimizations


2020-01-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93265

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-01-14
          Component|c                           |tree-optimization
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener ---
So I see it is not already resolved to a direct compare during GENERIC folding,
which is probably confused by the COMPOUND_LITERAL_EXPR:

  return memcmp ((const void *) , (const void *) &<<< Unknown tree:
compound_literal_expr
const struct a_tp D.2112 = {.x=42}; >>>, 4) == 0;

I think tree-ssa-strlen.c has code to change memcmp to memcmp_eq which can
be more optimally expanded.  It also has code to directly emit more optimal
GIMPLE.

Not sure why that doesn't fire here.  Confirmed.
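
The reason an equality-only memcmp can be expanded more cheaply than a general one can be sketched as follows: only the equal/not-equal answer is independent of byte order, while the sign of the result is not. The helper names eq_bytes, eq_word, order_bytes and check_eq are hypothetical and are not GCC internals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Equality-only use of memcmp: the result is only tested against zero. */
int eq_bytes(uint32_t x, uint32_t y)
{
    return memcmp(&x, &y, sizeof x) == 0;
}

/* What an equality-only 4-byte memcmp can be reduced to. */
int eq_word(uint32_t x, uint32_t y)
{
    return x == y;
}

/* Ordering use: the sign depends on how the bytes are laid out in memory,
   so it cannot be replaced by a plain integer comparison on every target.
   On a little-endian target such as x86-64, order_bytes(256, 1) is negative
   even though 256 > 1. */
int order_bytes(uint32_t x, uint32_t y)
{
    return memcmp(&x, &y, sizeof x);
}

/* The equality forms agree regardless of byte order. */
void check_eq(void)
{
    assert(eq_bytes(42, 42) == eq_word(42, 42));
    assert(eq_bytes(256, 1) == eq_word(256, 1));
    assert(order_bytes(7, 7) == 0);
}
```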