[Bug c/108593] New: No inlining after function cloning

2023-01-29 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108593

Bug ID: 108593
   Summary: No inlining after function cloning
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

int __attribute__ ((noinline))
foo (int arg)
{
  return 2 * arg;
}

int
bar (int arg)
{
  return foo (5);
}


results in:
foo.constprop.0:
mov eax, 10
ret
foo:
lea eax, [rdi+rdi]
ret
bar:
jmp foo.constprop.0


But why is foo.constprop.0 not fully inlined into bar? Maybe
foo.constprop.0 inherits the noinline attribute from foo?

If so, GCC should drop attributes from cloned functions.

[Bug rtl-optimization/7061] Access of bytes in struct parameters

2022-06-11 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=7061

Dávid Bolvanský  changed:

   What|Removed |Added

 CC||david.bolvansky at gmail dot com

--- Comment #10 from Dávid Bolvanský  ---
llvm emits just:
im: # @im
shufps  xmm0, xmm0, 85  # xmm0 = xmm0[1,1,1,1]
ret

[Bug ipa/104187] Call site specific attribute to control inliner

2022-03-03 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104187

--- Comment #8 from Dávid Bolvanský  ---
So this works in Clang now

int foo(int x, int y) { // any compiler will happily inline this function
return x / y;
}

int test(int x, int y) {
int r = 0;
[[clang::noinline]] r += foo(x, y); // for some reason we don't want any inlining here
return r;
}


foo(int, int): # @foo(int, int)
  mov eax, edi
  cdq
  idiv esi
  ret
test(int, int): # @test(int, int)
  jmp foo(int, int) # TAILCALL

[Bug ipa/104187] Call site specific attribute to control inliner

2022-01-25 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104187

--- Comment #5 from Dávid Bolvanský  ---
So you prefer eg.

g = a[i] - [[gnu::always_inline]] foo(x, y) + 2 * bar();

over

g = a[i] - __builtin_always_inline(foo(x, y)) + 2 * bar();

?

What is your proposed syntax?

[Bug c/104187] New: Call site specific attribute to control inliner

2022-01-22 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104187

Bug ID: 104187
   Summary: Call site specific attribute to control inliner
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

It could be useful to have more control over inlining. Use cases:



int foo();

void bar();

int g; 

void test()
{
 g = __builtin_always_inline(foo()); // force inlining of foo() here
 __builtin_noinline(bar()); // never inline bar to this function
}
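
A minimal sketch of how a similar per-call-site effect can be approximated today with the existing function-level attributes, assuming GCC or Clang. The names foo_body, foo_outlined, test_inlined and test_outlined are hypothetical and only illustrate the idea; this is a workaround, not the proposed builtin:

```c
/* The real body is always inlined into whoever calls it directly. */
__attribute__((always_inline)) static inline int foo_body(int x, int y)
{
    return x / y;
}

/* A noinline wrapper: call sites that must not inline foo call this. */
__attribute__((noinline)) static int foo_outlined(int x, int y)
{
    return foo_body(x, y);
}

/* Call site that forces inlining of foo. */
int test_inlined(int x, int y)
{
    return foo_body(x, y);
}

/* Call site that keeps an out-of-line call to foo. */
int test_outlined(int x, int y)
{
    int r = 0;
    r += foo_outlined(x, y);
    return r;
}
```

The cost of this workaround is one extra wrapper per function, which is what a call-site attribute or builtin would avoid.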

[Bug tree-optimization/93150] (A) == CST1 &( ((A)==CST2) | ((A)==CST3) ) is not simplified

2021-11-11 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93150

Dávid Bolvanský  changed:

   What|Removed |Added

 CC||david.bolvansky at gmail dot com

--- Comment #2 from Dávid Bolvanský  ---
Bin ops with constants are simplified by compiler itself..

[Bug tree-optimization/103002] New: Missed loop unrolling opportunity

2021-10-29 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103002

Bug ID: 103002
   Summary: Missed loop unrolling opportunity
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

#define C 3


struct node {
struct node *next;
int payload;
};

static int count_nodes(const node* p) {
int size = 0;
while (p) {
p = p->next;
size++;
}
return size;
}

bool has_one_node(const node* p) {
return count_nodes(p) == 1;
}


bool has_C_nodes(const node* p) {
return count_nodes(p) == C;
}

has_one_node(node const*): # @has_one_node(node const*)
test rdi, rdi
je .LBB0_1
mov eax, 1
.LBB0_3: # =>This Inner Loop Header: Depth=1
mov rdi, qword ptr [rdi]
add eax, -1
test rdi, rdi
jne .LBB0_3
test eax, eax
sete al
ret
.LBB0_1:
xor eax, eax
ret
has_C_nodes(node const*): # @has_C_nodes(node const*)
test rdi, rdi
je .LBB1_1
mov eax, 3
.LBB1_3: # =>This Inner Loop Header: Depth=1
mov rdi, qword ptr [rdi]
add eax, -1
test rdi, rdi
jne .LBB1_3
test eax, eax
sete al
ret
.LBB1_1:
xor eax, eax
ret


has_C_nodes is simple with some kind of loop deletion pass, but generally,
these loops can be unrolled for some reasonable C values.


https://godbolt.org/z/do656c17b
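
For illustration, a hand-written sketch of the bounded form such an optimization could produce. The helper has_exactly is hypothetical and is not part of the report; for a small constant c the loop has a known trip count, so a compiler can fully unroll it. Unlike count_nodes, it stops after at most c steps instead of walking the whole list:

```c
#include <stdbool.h>
#include <stddef.h>

struct node {
    struct node *next;
    int payload;
};

/* True if the list starting at p has exactly c nodes.
   Walks at most c links, so it exits early on long lists. */
bool has_exactly(const struct node *p, int c)
{
    for (int i = 0; i < c; i++) {
        if (!p)
            return false;   /* list is shorter than c */
        p = p->next;
    }
    return p == NULL;       /* list must end right after c nodes */
}
```

With this helper, has_one_node(p) corresponds to has_exactly(p, 1) and has_C_nodes(p) to has_exactly(p, 3).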

[Bug tree-optimization/102564] New: Missed loop vectorization with reduction and ptr load/store inside loop

2021-10-02 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102564

Bug ID: 102564
   Summary: Missed loop vectorization with reduction and ptr
load/store inside loop
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

void test1(int *p, int *t, int N) {
for (int i = 0; i != N; i++) *t += p[i];
}

void test2(int *p, int *t, int N) {
if (N > 1024) // hint, N is not small
for (int i = 0; i != N; i++) *t += p[i];
}

void test3(int *p, int *t, int N) {
if (N > 1024) { // hint, N is not small
int s = 0;
for (int i = 0; i != N; i++) s += p[i];
*t += s;
}
}

test3 is successfully vectorized with LLVM, GCC, ICC. Sadly, only ICC can catch
test1 and test2.

https://godbolt.org/z/PzoYd4eEK
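
A sketch of why test1 and test2 are presumably harder than test3: *t may alias an element of p, so the reduction can only be kept in a register after a runtime overlap check. The function name test1_checked is hypothetical, and the unsigned accumulator is an assumption made here so that reassociating the sum cannot introduce signed overflow:

```c
#include <stdint.h>

void test1_checked(int *p, int *t, int n)
{
    if (n > 0) {
        uintptr_t lo = (uintptr_t)p;
        uintptr_t hi = (uintptr_t)(p + n);
        uintptr_t ta = (uintptr_t)t;

        /* *t lies entirely outside p[0..n): safe to reduce locally. */
        if (ta + sizeof *t <= lo || ta >= hi) {
            unsigned s = 0;
            for (int i = 0; i != n; i++)
                s += (unsigned)p[i];     /* vectorizable reduction */
            *t = (int)((unsigned)*t + s);
            return;
        }
    }

    /* Fallback: the original loop, correct even when *t aliases p[i]. */
    for (int i = 0; i != n; i++)
        *t += p[i];
}
```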

[Bug tree-optimization/102483] New: Regression in codegen of reduction of 4 chars

2021-09-25 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483

Bug ID: 102483
   Summary: Regression in codegen of reduction of 4 chars
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

char foo (char* p)
{
  char sum = 0;
  for (int i = 0; i != 4; i++)
    sum += p[i];
  return sum;
}

-O3 -march=x86-64


GCC trunk:

foo:
mov edx, DWORD PTR [rdi]
movzx   eax, dh
mov ecx, edx
add eax, edx
shr ecx, 16
add eax, ecx
shr edx, 24
add eax, edx
ret


GCC 11 (much better):
foo:
movzx   eax, BYTE PTR [rdi+1]
add al, BYTE PTR [rdi]
add al, BYTE PTR [rdi+2]
add al, BYTE PTR [rdi+3]
ret


Best? llvm-mca says so..

foo: # @foo
movd xmm0, dword ptr [rdi] # xmm0 = mem[0],zero,zero,zero
pxor xmm1, xmm1
psadbw xmm1, xmm0
movd eax, xmm1
ret


https://godbolt.org/z/sT9svvj7W

[Bug c/100260] New: DSE: join stores

2021-04-25 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100260

Bug ID: 100260
   Summary: DSE: join stores
   Product: gcc
   Version: tree-ssa
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

#include <string.h>

struct pam {
  void *p1;
  void *p2;
  #ifdef LONG
  unsigned long size;
  #else
  unsigned int pad;
  unsigned int size;
  #endif
};

extern int use(struct pam *param);

unsigned int foo(void) {
  struct pam s_pam;
  memset(&s_pam, 0, sizeof(struct pam));
  s_pam.size = 1;
  return use(&s_pam);
}

INT

foo():
  sub rsp, 40
  pxor xmm0, xmm0
  mov rdi, rsp
  mov DWORD PTR [rsp+16], 0
  mov DWORD PTR [rsp+20], 1
  movaps XMMWORD PTR [rsp], xmm0
  call use(pam*)
  add rsp, 40
  ret

LONG

foo():
  sub rsp, 40
  pxor xmm0, xmm0
  mov rdi, rsp
  movaps XMMWORD PTR [rsp], xmm0
  mov QWORD PTR [rsp+16], 1
  call use(pam*)
  add rsp, 40
  ret

The stores
  mov DWORD PTR [rsp+16], 0
  mov DWORD PTR [rsp+20], 1
can be replaced with a single mov QWORD.
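
A small sketch of the equivalence the merge relies on, written in portable C. The struct tail and the two helper functions are hypothetical and model only the pad/size pair from the INT variant; the joined version builds the 64-bit constant from the same two fields, so it does not depend on byte order:

```c
#include <stdint.h>
#include <string.h>

/* Models the last two fields of struct pam in the INT variant. */
struct tail {
    uint32_t pad;
    uint32_t size;
};

/* Two separate 32-bit stores. */
void store_split(struct tail *t)
{
    t->pad = 0;
    t->size = 1;
}

/* One 64-bit store of the combined constant. */
void store_joined(struct tail *t)
{
    const struct tail init = { 0, 1 };
    uint64_t q;

    memcpy(&q, &init, sizeof q);  /* combined constant */
    memcpy(t, &q, sizeof q);      /* single 8-byte store */
}
```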

[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once

2021-04-15 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971

Dávid Bolvanský  changed:

   What|Removed |Added

 CC||david.bolvansky at gmail dot com

--- Comment #7 from Dávid Bolvanský  ---
Still bad for -O3 -march=skylake-avx512

https://godbolt.org/z/azb8aTG43

[Bug target/98348] [10 Regression] GCC 10.2 AVX512 Mask regression from GCC 9

2021-04-11 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98348

Dávid Bolvanský  changed:

   What|Removed |Added

 CC||david.bolvansky at gmail dot com

--- Comment #20 from Dávid Bolvanský  ---
Some small regression (missed opportunity to use vptestnmd):

Current trunk

compare(unsigned int __vector(16)):
  vpxor xmm1, xmm1, xmm1
  vpcmpd k0, zmm0, zmm1, 0
  vpmovm2d zmm0, k0
  ret

GCC 9.2

compare(unsigned int __vector(16)):
  vptestnmd k0, zmm0, zmm0
  vpmovm2d zmm0, k0
  ret


https://gcc.godbolt.org/z/5vK68jM3r

[Bug middle-end/98713] Failure to generate branch version of abs if user requested it

2021-01-18 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

--- Comment #5 from Dávid Bolvanský  ---
The user knows the data better, so they may prefer abs with a branch.

PGO may also show, based on profile data, that a branch is better for abs.

[Bug c/98713] New: Failure to generate branch version of abs if user requested it

2021-01-17 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

Bug ID: 98713
   Summary: Failure to generate branch version of abs if user
requested it
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

int branch_abs(int v) {
return __builtin_expect(v > 0, 1) ? v : -v;
}

GCC -O2 now:

branch_abs:
  mov eax, edi
  neg eax
  cmovs eax, edi
  ret


Expected:
branch_abs:
  mov eax, edi
  test edi, edi
  js .LBB0_1
  ret
.LBB0_1:
  neg eax
  ret


Same for min/max.
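
For reference, a sketch of what the analogous min/max sources might look like, assuming the same __builtin_expect hint as in branch_abs. The function names are hypothetical, and whether a given compiler emits a branch or a conditional move for them is not guaranteed:

```c
/* Hint: a >= b is the likely case, so returning a should be the hot path. */
int branch_max(int a, int b)
{
    return __builtin_expect(a >= b, 1) ? a : b;
}

/* Hint: a <= b is the likely case. */
int branch_min(int a, int b)
{
    return __builtin_expect(a <= b, 1) ? a : b;
}
```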

[Bug other/98663] gcc generates endless loop at -O2 or greater depending on order of testExpression

2021-01-13 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98663

Dávid Bolvanský  changed:

   What|Removed |Added

 CC||david.bolvansky at gmail dot com

--- Comment #1 from Dávid Bolvanský  ---
The compiler can do anything if there is UB in the code.

[Bug c/98658] Loop idiom recognization for memcpy/memmove

2021-01-13 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98658

--- Comment #3 from Dávid Bolvanský  ---
Yes, runtime check.

[Bug c/98658] Loop idiom recognization for memcpy/memmove

2021-01-13 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98658

--- Comment #1 from Dávid Bolvanský  ---
ICC produces memcpy:
https://godbolt.org/z/oKxxTM

[Bug c/98658] New: Loop idiom recognization for memcpy/memmove

2021-01-13 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98658

Bug ID: 98658
   Summary: Loop idiom recognization for memcpy/memmove
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

void copy(int *__restrict__ d, int * s, __SIZE_TYPE__ sz) {
__SIZE_TYPE__ i;
for (i = 0; i < sz; i++) {
*d++ = *s++;
}
}

GCC emits a call to memcpy.



void copy(int * d, int * s, __SIZE_TYPE__ sz) {
__SIZE_TYPE__ i;
for (i = 0; i < sz; i++) {
*d++ = *s++;
}
}


gcc could emit memmove, but currently does not:
https://godbolt.org/z/5n1rnh
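
A sketch of the runtime check mentioned in the follow-up comment, written by hand. The function name copy_checked is hypothetical. The forward loop matches memmove only when d does not start inside the source range after s; otherwise the loop re-reads elements it has already overwritten, so it has to be kept as a loop:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

void copy_checked(int *d, int *s, size_t sz)
{
    uintptr_t da = (uintptr_t)d;
    uintptr_t sa = (uintptr_t)s;

    /* d at or before s, or d past the end of the source range:
       the forward loop and memmove give the same result. */
    if (da <= sa || da - sa >= sz * sizeof *s) {
        memmove(d, s, sz * sizeof *s);
    } else {
        /* d overlaps the tail of s: keep the original semantics. */
        for (size_t i = 0; i < sz; i++)
            d[i] = s[i];
    }
}
```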