[Bug tree-optimization/71414] 2x slower than clang summing small float array, GCC should consider larger vectorization factor for "unrolling" reductions

2023-06-06 Thread drraph at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71414 Raphael C changed: What|Removed |Added CC||drraph at gmail dot com --- Comment #12

[Bug tree-optimization/79201] missed optimization: sinking doesn't handle calls, swap PRE and sinking

2021-06-11 Thread drraph at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79201 --- Comment #5 from Raphael C --- I can confirm you now get f: mov eax, 1 ret with gcc 8 onwards.

[Bug target/79185] [8 Regression] register allocation in the addition of two 128/9 bit ints

2021-06-11 Thread drraph at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79185 --- Comment #17 from Raphael C --- Tested in gcc 11.1 with -O2 ai(__int128, __int128): mov r9, rdi mov rax, rdx mov r8, rsi mov rdx, rcx add rax, r9 adc rdx, r8 ret

[Bug c/81127] New: Complex division misses vectorisation opportunity

2017-06-19 Thread drraph at gmail dot com
: c Assignee: unassigned at gcc dot gnu.org Reporter: drraph at gmail dot com Target Milestone: --- This report has two parts. The first is about complex float division and the second about complex double division. --- Part 1 --- Consider: #include <complex.h> complex float f(complex

[Bug tree-optimization/46186] Clang creates code running 1600 times faster than gcc's

2017-05-24 Thread drraph at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186 Raphael C changed: What|Removed |Added CC||drraph at gmail dot com --- Comment #26

[Bug c/80852] Optimisation fails to recognise sum computed by loop

2017-05-22 Thread drraph at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80852 --- Comment #2 from Raphael C --- You are quite right. This code shows the same issue: int foo(int num) { int a = 0; for (int x = 0; x < num; x+=2) { a += x; } return a; }
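
For reference, the sum this loop accumulates has a simple closed form, which is the kind of simplification the report title refers to. The sketch below is only an illustration: foo_closed and check_closed_form are hypothetical names that do not appear in the bug report, and the sketch assumes num is small enough that none of the arithmetic overflows an int.

```c
#include <assert.h>

/* The loop from the comment above, with unchanged behaviour. */
int foo(int num) {
    int a = 0;
    for (int x = 0; x < num; x += 2) {
        a += x;
    }
    return a;
}

/* Hypothetical closed form (not from the bug report).
   The loop visits the even values 0, 2, ..., 2(k-1), where
   k = ceil(num / 2) is the number of even x with 0 <= x < num.
   Their sum is 2 * (0 + 1 + ... + (k-1)) = k * (k - 1).
   Assumes num + 1 and k * (k - 1) do not overflow int. */
int foo_closed(int num) {
    if (num <= 0) {
        return 0;               /* the loop body never runs */
    }
    int k = (num + 1) / 2;
    return k * (k - 1);
}

/* Asserts that loop and closed form agree for every num in
   [-limit, limit]; returns 1 if no assertion fired. */
int check_closed_form(int limit) {
    for (int num = -limit; num <= limit; num++) {
        assert(foo(num) == foo_closed(num));
    }
    return 1;
}
```

Calling check_closed_form with a modest limit exercises both the empty-loop case (num <= 0) and the odd and even upper bounds.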

[Bug c/80852] New: Optimisation missed for loop with condition that is always true

2017-05-22 Thread drraph at gmail dot com
Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: drraph at gmail dot com Target Milestone: --- Consider this (slightly odd) code: int foo(int num) { int a = 0; for (int x = 0; x < num; x+=2) { if (!(x % 2)) { a +

[Bug c/79726] New: Type conversion not vectorised

2017-02-27 Thread drraph at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: drraph at gmail dot com Target Milestone: --- Consider: double f(double x[]) { float p = 1.0; for (int i = 0; i < 16; i++) p += x[i]; return p; } gcc with -O3 -march=core-avx2 -ffast-math gives: f: vmovsd xmm0, QWORD

[Bug c/79725] New: Sinking opportunity missed if complex type is changed

2017-02-27 Thread drraph at gmail dot com
Component: c Assignee: unassigned at gcc dot gnu.org Reporter: drraph at gmail dot com Target Milestone: --- Consider: #include <complex.h> complex double f(complex double x[]) { complex float p = 1.0; for (int i = 0; i < 100; i++) p = x[i]; return p; } This compiles using

[Bug c/79491] New: Possibly inefficient code for the inner product of two vectors

2017-02-13 Thread drraph at gmail dot com
Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: drraph at gmail dot com Target Milestone: --- Consider: float f(float x[], float y[]) { float p = 0; for (int i = 0; i <64; i++) p += x[i] * y[i]; return p; } Using gcc 7 (snaps

[Bug tree-optimization/79460] gcc fails to optimise out a trivial additive loop for seemingly arbitrary numbers of iterations

2017-02-12 Thread drraph at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79460 --- Comment #1 from Raphael C --- After some experimentation (also carried out by Hagen von Eitzen), it seems that any limit of at least 72 which is also a multiple of 4 causes the same optimisation problem. That is the loop is *not* optimised

[Bug c/79460] New: gcc fails to optimise out a simple additive loop for seemingly arbitrary numbers of iterations

2017-02-10 Thread drraph at gmail dot com
Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: drraph at gmail dot com Target Milestone: --- Consider: float f(float x[]) { float p = 1.0; for (int i = 0; i < 202; i++) p += 1; retur

[Bug middle-end/79359] Squaring a complex float gives inefficient code with or without -ffast-math

2017-02-06 Thread drraph at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79359 --- Comment #2 from Raphael C --- As an additional data point in relation to Part 2 (that is without -ffast-math). In gcc 7 -O3 -ffinite-math-only gives f: movq QWORD PTR [rsp-16], xmm0 movss xmm3, DWORD PTR [rsp-12]

[Bug c/79359] Squaring a complex float gives inefficient code with or without -ffast-math

2017-02-05 Thread drraph at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79359 --- Comment #1 from Raphael C --- In case it's of any help, here is an explanation of the assembly that ICC gives with -fp-model strict. R = real and C = complex. Here "x" just means don't know or unused. We start with xmm0 = {x, x, C, R}.

[Bug c/79357] Doubling a single complex float gives inefficient code

2017-02-03 Thread drraph at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79357 --- Comment #1 from Raphael C --- I omitted .L_2il0floatpacket.1: .long 0x4000,0x4000,0x,0x .long 0x4000,0x from the assembly. In other words it just multiplies by 2.0f.

[Bug c/79359] New: Squaring a complex float gives inefficient code with or without -ffast-math

2017-02-03 Thread drraph at gmail dot com
: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: drraph at gmail dot com Target Milestone: --- Consider: #include <complex.h> complex float f(complex float x) { return x*x; } This PR has two parts. Part 1. In gcc 7 with -Ofast

[Bug c/79357] New: Doubling a single complex float is not vectorised

2017-02-03 Thread drraph at gmail dot com
: c Assignee: unassigned at gcc dot gnu.org Reporter: drraph at gmail dot com Target Milestone: --- Consider the following code: #include <complex.h> complex float f(complex float x) { return 2*x; } gcc with -O3 -march=core-avx2 gives: f: vmovq QWORD PTR [rsp-8], xmm0

[Bug c/79336] New: Poor vectorisation of additive reduction of complex array

2017-02-02 Thread drraph at gmail dot com
Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: drraph at gmail dot com Target Milestone: --- Consider this code: #include <complex.h> complex float f(complex float x[]) { complex float p = 1.0; for (int i = 0; i < 32; i++) p += x[i]; retur

[Bug c/79201] New: missed optimization: gcc fails to cut out unnecessary loop.

2017-01-23 Thread drraph at gmail dot com
Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: drraph at gmail dot com Target Milestone: --- Consider this code: int f(int n) { int i,j=0; for (i = 0; i < 32; i++) { j = __builtin_ffs(i); } return j; } With gcc -O3 you get

[Bug c/79185] New: Possible regression in gcc 4.9 and later with the addition of two 128 bit ints

2017-01-22 Thread drraph at gmail dot com
: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: drraph at gmail dot com Target Milestone: --- Consider this code: __int128_t ai (__int128_t x, __int128_t y) { return x + y; } In gcc 4.8.5, clang and icc using -O2

[Bug c/79102] New: gcc fails to auto-vectorise the product of an array of complex floats

2017-01-16 Thread drraph at gmail dot com
Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: drraph at gmail dot com Target Milestone: --- Consider this simple piece of code. #include <complex.h> complex float f(complex float x[]) { complex float p = 1.0; for (int i = 0; i < 128