[Bug target/114943] New: X86 AVX2: inefficient code generated to convert SIMD Vectors

2024-05-04 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- in the example below (see https://godbolt.org/z/qnfT4fE5G ) convert and covert3 produce code that looks to me inefficient w/r/t

[Bug target/114484] #include changes ::abs in std::abs

2024-03-26 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114484 --- Comment #9 from vincenzo Innocente --- We observe that including xmmintrin.h the behaviour of some code, notably abs(x), when x is float or double changes. And this depends on the platform as xmmintrin.h is x86_64 specific. Yes, is 20

[Bug target/114484] #include changes ::abs in std::abs

2024-03-26 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114484 --- Comment #4 from vincenzo Innocente --- in C++ one is supposed to #include not I do not think that there is an explicit version of C++ headers for the intrinsics that avoids the conflicts between C and C++.

[Bug c++/114484] #include changes ::abs in std::abs

2024-03-26 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114484 --- Comment #2 from vincenzo Innocente --- *** Bug 114483 has been marked as a duplicate of this bug. ***

[Bug c++/114483] #include changes ::abs in std::abs

2024-03-26 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114483 vincenzo Innocente changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug c++/114484] #include changes ::abs in std::abs

2024-03-26 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114484 --- Comment #1 from vincenzo Innocente --- xmmintrin.h includes mm_malloc.h which #include which using std::abs; (among others) see https://godbolt.org/z/cxo65rnr9 or this excerpt from c++ -E dump ``` # 32

[Bug c++/114484] New: #include changes ::abs in std::abs

2024-03-26 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: ---

[Bug c++/114483] New: #include changes ::abs in std::abs

2024-03-26 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: ---

[Bug tree-optimization/114363] inconsistent optimization of pow(x,2)+pow(y,2)

2024-03-16 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114363 --- Comment #4 from vincenzo Innocente --- Thanks Harald, I missed the point that float z = pow(double(x),2) and float z = x*x would indeed produce exactly the same result, while in all other cases of course not.
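
The point made in the comment above can be checked with a small sketch. It assumes IEEE-754 float/double, default (non fast-math) flags, and a pow implementation that returns exactly representable results exactly; the function names are illustrative and are not taken from the bug report. The product of two 24-bit float significands fits in a double without rounding, so rounding that double back to float should match the single-precision multiply.

```cpp
#include <cassert>
#include <cmath>

// Square a float by promoting to double, calling pow, and rounding back.
inline float square_via_pow(float x) {
    return static_cast<float>(std::pow(static_cast<double>(x), 2));
}

// Square a float directly in single precision.
inline float square_direct(float x) {
    return x * x;
}

// True when both routes round to the same float.
inline bool squares_agree(float x) {
    return square_via_pow(x) == square_direct(x);
}

// Illustrative self-check over a few arbitrary sample values.
inline void check_squares_agree() {
    const float samples[] = {0.1f, 1.5f, 3.1415927f, -123456.79f};
    for (float x : samples) {
        assert(squares_agree(x));
    }
}
```

The same argument does not carry over to pow(x,2)+pow(y,2), because the sum of the two double products is generally not exact before the final rounding to float.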

[Bug tree-optimization/114363] New: inconsistent optimization of pow(x,2)+pow(y,2)

2024-03-16 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- while pow(x,2) is optimized in x*x (float x) in pow(x,2)+pow(y,2) x and y are first promoted to double which I find inconsistent see https

[Bug libstdc++/112649] New: [c++23] in presence of inline functions and debug-info stacktrace reports the deepest callee

2023-11-21 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- Created attachment 56657 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56657=e

[Bug libstdc++/112348] [C++23] defect in struct hash>

2023-11-09 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112348 --- Comment #1 from vincenzo Innocente --- This patch works for me diff --git a/libstdc++-v3/include/std/stacktrace b/libstdc++-v3/include/std/stacktrace index da0e48d3532..9a0d0b16068 100644 --- a/libstdc++-v3/include/std/stacktrace +++

[Bug libbacktrace/112263] [C++23] std::stacktrace does not identify symbols in shared library

2023-11-05 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112263 --- Comment #12 from vincenzo Innocente --- confirm that the patch solves the issue c++ -std=c++23 testStacktrace.cpp -lstdc++exp -g -DINLIB -fpic -shared -o liba.so -ldl;c++ -std=c++23 testStacktrace.cpp -lstdc++exp -g -DINMAIN -L. -la

[Bug libbacktrace/112263] [C++23] std::stacktrace does not identify symbols in shared library

2023-11-03 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112263 --- Comment #8 from vincenzo Innocente --- Thanks Ian for the patch. For testing I will need the full git diff (including the makefile itself as my autoconf is not compatible with gcc14). Backports down to gcc12 will be appreciated. Could you

[Bug libstdc++/112348] New: [C++23] defect in struct hash>

2023-11-02 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- gcc version 14.0.0 20231028 (experimental) [master r14-4988-g5d2a360f0a5] (GCC) auto k = std::hash()(std::stacktrace::current()); does not compile to

[Bug libbacktrace/112263] [C++23] std::stacktrace does not identify symbols in shared library

2023-11-01 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112263 --- Comment #6 from vincenzo Innocente --- Sorry, made the (almost) full exercise: read the doc in https://en.cppreference.com/w/cpp/utility/stacktrace_entry and the code in stacktrace header file and in libstdc++-v3/src/c++23/stacktrace.cc

[Bug libbacktrace/112263] [C++23] std::stacktrace does not identify symbols in shared library

2023-11-01 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112263 --- Comment #5 from vincenzo Innocente --- so if I add to std::cout << std::stacktrace::current() << '\n'; I get what needed Dl_info dlinfo; for (auto & entry : std::stacktrace::current() ) { dladdr((const

[Bug libbacktrace/112263] [C++23] std::stacktrace does not identify symbols in shared library

2023-10-31 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112263 --- Comment #4 from vincenzo Innocente --- intel x86_64 uname -a Linux patatrack01 4.18.0-477.13.1.el8_8.x86_64 #1 SMP Thu May 18 10:27:05 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux boost::backtrace works can provide example

[Bug libbacktrace/112263] [C++23] std::stacktrace does not identify symbols in shared library

2023-10-30 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112263 vincenzo Innocente changed: What|Removed |Added CC||ian at gcc dot gnu.org

[Bug libstdc++/112263] New: [C++23] std::stacktrace does not identify symbols in shared library

2023-10-28 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- using gcc version 14.0.0 20231028 (experimental) [master r14-4988-g5d2a360f0a5] (GCC) that contains the fix for #111936

[Bug libstdc++/111936] std::stacktrace cannot be used in a shared library

2023-10-24 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111936 --- Comment #9 from vincenzo Innocente --- Thanks for the second patch. I was indeed struggling with autoconf versions (1.15 vs 1.16). Any chance to backport to gcc12 (our current production version)?

[Bug libstdc++/111936] std::stacktrace cannot be used in a shared library

2023-10-24 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111936 --- Comment #7 from vincenzo Innocente --- not explicitly in the src tree. only run configure in the build directory. what I need to run in the src tree?

[Bug libstdc++/111936] std::stacktrace cannot be used in a shared library

2023-10-24 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111936 --- Comment #5 from vincenzo Innocente --- My bad, long time I'm not using archive libraries and forgot about the order rule. The issue is indeed missing -fPIC. Thanks for the fast action. I applied the patch but it seems not sufficient. If

[Bug c++/111934] ICE internal compiler error: in discriminator_for_local_entity, at cp/mangle.cc:2065

2023-10-24 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111934 --- Comment #3 from vincenzo Innocente --- with gcc version 14.0.0 20231024 (experimental) [master r14-4877-g724badcadf8] (GCC) I get the same ICE. Please note that one needs to include "iostream" (in my test compile with "-DICE") to trigger

[Bug libstdc++/111936] std::stacktrace cannot be used in a shared library

2023-10-23 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111936 --- Comment #1 from vincenzo Innocente --- here is a minimal malloc hook that I would like to use [innocent@patatrack01 ctest]$ cat getStacktrace.cc #include std::string get_stacktrace() { std::string trace; for (auto & entry :

[Bug libstdc++/111936] New: std::stacktrace cannot be used in a shared library

2023-10-23 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- I would like to use std::stacktrace in a shared library to be preloaded... when I try to build the library even for this minimal example cat

[Bug c++/111934] ICE internal compiler error: in discriminator_for_local_entity, at cp/mangle.cc:2065

2023-10-23 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111934 --- Comment #1 from vincenzo Innocente --- sorry missed the version gcc version 14.0.0 20231021 (experimental) [master r14-4817-g405a4140fc3] (GCC)

[Bug c++/111934] New: ICE internal compiler error: in discriminator_for_local_entity, at cp/mangle.cc:2065

2023-10-23 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- #ifdef ICE #include #endif struct Me { static Me & me() { thread_local auto me =

[Bug tree-optimization/109885] New: gcc does not generate movmskps and testps instructions (clang does)

2023-05-17 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- in this simple code (on avx2) int sum(float const * x) { int ret = 0; for (int i=0; i<8; ++i) ret +

[Bug c++/109281] New: use std::optional results in suboptimal code

2023-03-25 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
: c++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- In the following (almost real) code gcc emits suboptimal code if std::optional is used w/r/t home made one and clang see https://godbolt.org/z/Pba51Ye7Y - code

[Bug tree-optimization/109011] New: missed optimization in presence of __builtin_ctz

2023-03-03 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- in the following code foo does not vectorize, bar does. clang vectorize foo using a pattern that invokes vplzcntd (code made a bit complex to make

[Bug tree-optimization/108804] New: missed vectorization in presence of conversion from uint64_t to float

2023-02-15 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- in the following code [1] foo does not vectorize, bar does compiled with -march=haswell -Ofast --no-math

[Bug tree-optimization/108677] wrong vectorization (when copy constructor is present?)

2023-02-06 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108677 --- Comment #3 from vincenzo Innocente --- sorry. the original internal bug report was for gcc 7.5 https://godbolt.org/z/9crafbqen where I think the generated code is indeed wrong (and does not depend on the presence of the constructor!) SO,

[Bug tree-optimization/108677] New: wrong vectorization (when copy constructor is present?)

2023-02-05 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- in this real life code #include struct trig_pair { double CosPhi; double SinPhi; trig_pair() : CosPhi(1

[Bug target/106012] rsqrtps and rcpps instructions generated even if -fno-reciprocal-math specified

2022-12-20 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106012 --- Comment #6 from vincenzo Innocente --- just to confirm that -Ofast -fno-reciprocal-math -mno-recip seems to inhibit all reciprocals... https://godbolt.org/z/f4bccb9GP

[Bug c++/107933] New: std::sqrt complies in intrinsics for float even if --no-builtin is provided

2022-11-30 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- on x86_64 float f(float x) { return std::sqrt(x);} compiles in sqrtss xmm0, xmm0 even if --no-builtin is provided

[Bug tree-optimization/106012] rsqrtps and rcpps instructions generated even if -fno-reciprocal-math specified

2022-06-19 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106012 vincenzo Innocente changed: What|Removed |Added Summary|rsqrtss instruction |rsqrtps and rcpps

[Bug target/106012] New: rsqrtss instruction generated even if -mno-recip specified

2022-06-17 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- with option -Ofast -mno-recip rsqrtss instruction is still generated. https://godbolt.org/z/hGxrG7xPh inhibiting rsqrtss and rcpss

[Bug tree-optimization/104950] New: GCC does not emit branchless code

2022-03-16 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- In this example GCC fails to emit branchless code while CLANG does. In the actual application, measurements show a slowdown of up to a factor of 2. I managed to force

[Bug tree-optimization/97707] avx512 math function invoked even if -mprefer-vector-width=256 specified

2020-11-04 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97707 --- Comment #3 from vincenzo Innocente --- the main point in using -mprefer-vector-width=256 is to avoid clock throttling in "mixed" workloads. In small benchmarks like this one avx512 is faster (even on an old Silver) even if trigger a slower

[Bug tree-optimization/97707] New: avx512 math function invoked even if -mprefer-vector-width=256 specified

2020-11-03 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- this code will invoke _ZGVeN8v_sin instead of _ZGVdN4v_sin making use of zmm registers #include int main

[Bug tree-optimization/92335] missed transformation to branchless

2019-11-07 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92335 --- Comment #3 from vincenzo Innocente --- Understood for float it seems to me that the transformation does not occur for integer neither (signed or unsigned) as in using T= unsigned int; T bar(T const * __restrict__ x, T const * __restrict__

[Bug tree-optimization/92335] New: missed transformation to branchless

2019-11-03 Thread vincenzo.innocente at cern dot ch
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- in the following code (compiled with -O2 or -O3 and even with -march=haswell) gcc will use a branchless construct in foo but not in bar (changing from float to int

[Bug tree-optimization/88598] simplification of multiplication by 1 or 0 fails

2018-12-27 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88598 --- Comment #3 from vincenzo Innocente --- what I am interested in is NOT a constant array, more a small-size "sparse"-matrix that I can build explicitly at run time from other sources. I have examples using Eigen if of any interest (

[Bug tree-optimization/88598] New: simplification of multiplication by 1 or 0 fails

2018-12-26 Thread vincenzo.innocente at cern dot ch
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- g++ fails to optimize the code below even with -Ofast https://godbolt.org/z/mYRgVX independently of vectorization options https://godbolt.org/z/XMnCNz

[Bug tree-optimization/86855] REGRESSION: [8.0] -Ofast optimize away mm_set_ps(0.0f,0.0f,-0.0f,0.0f);

2018-08-04 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86855 --- Comment #5 from vincenzo Innocente --- I have indeed worked-around with const __m128i neg = _mm_set_epi32(0,0,0x8000,0); __m128i ret = __m128i(_mm_sub_ps(v5, v3)); return __m128(_mm_xor_si128(ret,neg)); const __m256i neg =

[Bug tree-optimization/86855] REGRESSION: [8.0] -Ofast optimize away mm_set_ps(0.0f,0.0f,-0.0f,0.0f);

2018-08-04 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86855 --- Comment #3 from vincenzo Innocente --- looks more undefined behavior as const __m128 neg = _mm_set_ps(0.0f,0.0f,-0.0f,-0.0f); return _mm_xor_ps(_mm_sub_ps(v5, v3), neg); with -O3 compiles in xorps .LC0(%rip), %xmm0 ret .LC0: .long
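
The XOR with a -0.0f mask used in the snippets above is the usual way to flip a sign bit. A scalar sketch of the same idea follows; it assumes 32-bit IEEE-754 floats and default floating-point flags (options such as -Ofast relax the handling of signed zeros, which is the subject of this report). The function names are illustrative and do not come from the bug report.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Flip the sign of a float by XOR-ing its bit pattern with the pattern
// of -0.0f, which has only the sign bit set. This is the scalar analogue
// of applying _mm_xor_ps with a lane holding -0.0f.
inline float flip_sign(float v) {
    const float neg_zero = -0.0f;
    std::uint32_t bits = 0;
    std::uint32_t mask = 0;
    std::memcpy(&bits, &v, sizeof bits);
    std::memcpy(&mask, &neg_zero, sizeof mask);
    bits ^= mask;
    float out = 0.0f;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

// Illustrative self-check: flipping twice restores the input.
inline void check_flip_sign() {
    assert(flip_sign(1.5f) == -1.5f);
    assert(flip_sign(flip_sign(-2.25f)) == -2.25f);
}
```

A lane whose mask value is +0.0f is left unchanged by the XOR, so a mask mixing 0.0f and -0.0f negates only selected lanes.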

[Bug tree-optimization/86855] New: REGRESSION: [8.0] -Ofast optimize away mm_set_ps(0.0f,0.0f,-0.0f,0.0f);

2018-08-04 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- this function _m128 _mm_cross_ps(__m128 v1, __m128 v2) { // same order is _MM_SHUFFLE(3,2,1,0

[Bug tree-optimization/83857] [8 Regression] internal compiler error: in exact_div, at poly-int.h:2139

2018-01-15 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83857 --- Comment #2 from vincenzo Innocente --- (In reply to Richard Biener from comment #1) > I've seen a similar bug so maybe fixed already. if the similar bug is #83753 it looks "fixed" in the version I tested (at least

[Bug tree-optimization/83857] New: [ICE] internal compiler error: in exact_div, at poly-int.h:2139

2018-01-15 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- Created attachment 43133 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43133&action=edit directory with all fi

[Bug target/80566] New: no use of avx vmovups on ymm registry in set and copy

2017-04-29 Thread vincenzo.innocente at cern dot ch
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- in this example #include int * foo() { int * p = new int[16]; memset(p,0,16*sizeof(int)); return p; } int * foo(int * q) { int * p = new int[16

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-07 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390 --- Comment #19 from vincenzo Innocente --- Could you please have a look also to c++ and lto: this is what I get on my skylake: for c++ or lto -fno-split-paths pessimizes [innocent@vinavx3 scimark2TMP]$ gcc -march=native -Wall -Ofast *.c -lm ;

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-07 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390 --- Comment #17 from vincenzo Innocente --- [innocent@vinavx3 innocent]$ mkdir scimark2TMP [innocent@vinavx3 innocent]$ cd scimark2TMP [innocent@vinavx3 scimark2TMP]$ wget http://math.nist.gov/scimark2/scimark2_1c.zip . . gcc version 7.0.1

[Bug target/80313] New: -march=znver1 produce worse code than -march=haswell

2017-04-04 Thread vincenzo.innocente at cern dot ch
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- Created attachment 41125 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41125&action=edit self contained scimark2 MC benchmark just got hold of a AMD Ryze

[Bug tree-optimization/80248] sparse access to Array of structures does not vectorize using gathers

2017-03-31 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80248 --- Comment #2 from vincenzo Innocente --- side note: the difference in timing between "aos2" and "soa" seems to be fully accounted for by the integer multiplication "3*k[i]".
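
To make the "3*k[i]" remark concrete, here is a minimal sketch of indexed access in the two layouts. The code from the report is not shown in this listing, so everything below (function names, the x,y,z interleaving, the data) is a hypothetical illustration: in an array-of-structures layout with three floats per element, the x member of element k[i] sits at offset 3*k[i], whereas a structure-of-arrays layout indexes the x array with k[i] directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical AoS layout: x,y,z interleaved, so x of element j is at 3*j.
inline float sum_x_aos(const std::vector<float>& xyz, const std::vector<int>& k) {
    float s = 0.0f;
    for (std::size_t i = 0; i < k.size(); ++i) {
        s += xyz[3 * static_cast<std::size_t>(k[i])];  // extra multiply per access
    }
    return s;
}

// Hypothetical SoA layout: all x values contiguous, indexed by k[i] directly.
inline float sum_x_soa(const std::vector<float>& x, const std::vector<int>& k) {
    float s = 0.0f;
    for (std::size_t i = 0; i < k.size(); ++i) {
        s += x[static_cast<std::size_t>(k[i])];
    }
    return s;
}

// Illustrative self-check: both layouts hold the same x values.
inline void check_layouts_agree() {
    const std::vector<float> xyz = {1, 10, 100, 2, 20, 200, 3, 30, 300, 4, 40, 400};
    const std::vector<float> x = {1, 2, 3, 4};
    const std::vector<int> k = {3, 0, 2};
    assert(sum_x_aos(xyz, k) == sum_x_soa(x, k));
}
```

Both loops perform the same indexed loads; the AoS version differs only in the index scaling.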

[Bug target/80232] Ofast pessimizes Sparse matmult in scimark2 benchmark on avx platforms

2017-03-30 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80232 --- Comment #5 from vincenzo Innocente --- I confirm that gather is almost twice as fast on Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz w/r/t Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz (used a benchmark version of PR80248 example) so on skylake,

[Bug tree-optimization/80248] New: sparse access to Array of structures does not vectorize

2017-03-29 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- in the following example "aos" does not vectorize while the equivalent aos2 does vectorize using vgatherdps i

[Bug target/57796] AVX2 gather vectorization: code bloat and reduction of performance

2017-03-28 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796 --- Comment #10 from vincenzo Innocente --- added a self contained "benchmark" on my machine [innocent@vinavx3 ctest]$ c++ -Ofast -Wall SparseOnly.c -march=native ; time ./a.out 0.496u 0.000s 0:00.49 100.0% 0+0k 0+0io 0pf+0w

[Bug target/57796] AVX2 gather vectorization: code bloat and reduction of performance

2017-03-28 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796 --- Comment #9 from vincenzo Innocente --- Created attachment 41070 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41070&action=edit self contained benchmark of scimark2 SparseMat must content is not randomized param must be modified by hand in

[Bug target/57796] AVX2 gather vectorization: code bloat and reduction of performance

2017-03-28 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796 --- Comment #8 from vincenzo Innocente --- My understanding of the gather latency is that it essentially corresponds to a load per cacheline: fast if all items are closeby, slower than scalar loads if items are all in different cachelines. Not

[Bug tree-optimization/80232] New: Ofast pessimizes Sparse matmult in scimark2 benchmark on avx platforms

2017-03-28 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- on my machine after the usual mkdir scimark2TMP cd scimark2TMP wget http://math.nist.gov/scimark2

[Bug rtl-optimization/80197] New: pgo dramatically pessimizes scimark2 MonteCarlo benchmark

2017-03-26 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- Created attachment 41053 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41053&action=edit self contained benchm

[Bug tree-optimization/79594] New: -Waggressive-loop-optimizations incomplete and/or inconsistent

2017-02-19 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- given cat aggressiveLoop.cc #include #include float x[1024]; float y[1024]; float w[512]; float z[128]; float c,q

[Bug tree-optimization/77859] Ofast needed to vectorize loop in presence of conditional code

2016-10-05 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77859 --- Comment #2 from vincenzo Innocente --- Thanks for the fast response I think I can "survive" with -O3 -fno-trapping-math in principle it should not change the binary compatibility of the output w/r/t -O2 and at best of my understanding it

[Bug tree-optimization/77859] New: Ofast needed in presence of conditional code

2016-10-05 Thread vincenzo.innocente at cern dot ch
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- It looks to me that to vectorize this code "relaxed floating point math" is not a requirement currently gcc version 7.0.0 20161004 (experiment

[Bug middle-end/71666] profile-generate not documented

2016-06-26 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71666 --- Comment #2 from vincenzo Innocente --- ok so it is just the sentence "See Optimize Options" which needs to be changed...

[Bug web/71666] New: profile-generate not documented

2016-06-26 Thread vincenzo.innocente at cern dot ch
Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- as of today -fprofile-generate does not seem to be documented in https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html it is quoted 4 times including a self-referencing "

[Bug gcov-profile/70993] New: ICE with gcov and lto

2016-05-07 Thread vincenzo.innocente at cern dot ch
Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- with gcc version 7.0.0 20160506 (experimental) [trunk revision 235977] (GCC) cat main.cpp int main() { return 0;} c++ -O2 main.cpp perf record -e cpu/event=0xc4,umask=0x20,name

[Bug c++/69564] [5/6 Regression] lto and/or C++ make scimark2 LU slower

2016-03-25 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564 --- Comment #19 from vincenzo Innocente --- patch applied to gcc version 6.0.0 20160324 (experimental) [trunk revision 234461] (GCC) I confirm the improvement in timing for c++ and lto timing difference between gcc and c++ seems to be inside

[Bug c++/69564] lto and/or C++ make scimark2 LU slower

2016-02-01 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564 --- Comment #5 from vincenzo Innocente --- it is a regression gcc version 4.9.3 (GCC) c++ -Ofast *.c; ./a.out ** ** ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **

[Bug c++/69564] lto and/or C++ make scimark2 LU slower

2016-02-01 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564 --- Comment #3 from vincenzo Innocente --- > Any reason you are using the c++ driver here? Because I am interested in C++ performance never imagined that the c++ front-end could make a difference on such a code... From my point of view it is

[Bug lto/69564] New: lto makes scimark2 LU slower

2016-01-30 Thread vincenzo.innocente at cern dot ch
Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- mkdir scimark2; cd scimark2 wget http://math.nist.gov/scimark2/scimark2_1c.zip unzip scimark2_1c.zip c++ -Ofast *.c; ./a.out c++ -Ofast *.c -flto; ./a.out with gcc 4.9.3 gcc version

[Bug c++/68180] New: [ICE] at cp/constexpr.c:2768 in initializing __vector in a loop

2015-11-02 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- typedef float __attribute__( ( vector_size( 16 ) ) ) float32x4_t; constexpr float32x4_t fill(float x) { float32x4_t v{0

[Bug c++/68125] New: std::sqrt prevent use of associative math

2015-10-28 Thread vincenzo.innocente at cern dot ch
++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- with -Ofast the code generated differs float rsqrt1(float a, float x, float y) { return a/std::sqrt(x)/std::sqrt(y); } float rsqrt2(float a, float x, float y) { return

[Bug c++/68125] std::sqrt prevent use of associative math

2015-10-28 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68125 --- Comment #2 from vincenzo Innocente --- Thanks Marc for the fast check I am still with gcc version 6.0.0 20150801 (experimental) [trunk revision 226463] (GCC) will update and verify

[Bug c++/68125] std::sqrt prevent use of associative math

2015-10-28 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68125 vincenzo Innocente changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug libgomp/67406] OMP SIMD cloning does not generate fma instruction for AVX2 target

2015-09-06 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67406 --- Comment #5 from vincenzo Innocente --- does not work... pragma omp declare simd notinbranch float __attribute__ ((__target__ ("default"))) fma(float x,float y, float z); #pragma omp declare simd notinbranch float __attribute__ ((__target__

[Bug libgomp/67406] OMP SIMD cloning does not generate fma instruction for AVX2 target

2015-09-06 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67406 --- Comment #4 from vincenzo Innocente --- #pragma omp declare simd notinbranch float __attribute__ ((__target__ ("default"))) fma(float x,float y, float z) { return x+y*z; } #pragma omp declare simd notinbranch float __attribute__

[Bug libgomp/67406] New: OMP SIMD cloning does not generate fma instruction for AVX2 target

2015-08-31 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch CC: jakub at gcc dot gnu.org Target Milestone: --- given at simdCloning.cc #pragma omp declare simd notinbranch float fma(float x

[Bug libgomp/67406] OMP SIMD cloning does not generate fma instruction for AVX2 target

2015-08-31 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67406 --- Comment #2 from vincenzo Innocente --- is there any mechanism to tell gcc to generate the AVX2 clone using fma? I understand it reduces portability still at the moment I have to support mostly Intel platforms. for AMD, gcc suggests to use

[Bug c++/67335] New: [ICE] in compiling omp simd function with unused argument

2015-08-24 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- cat ompsimd_t.cc #pragma omp declare simd notinbranch uniform(q) float bar(float x, float * q, int){ return q[0]+q[1]*x; } c++ -fopenmp

[Bug tree-optimization/67326] New: [5.2/6.0 regression] -ftree-loop-if-convert-stores does not vectorize conditional assignment (anymore)

2015-08-23 Thread vincenzo.innocente at cern dot ch
: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch Target Milestone: --- in 5.1 looks ok (according to http://gcc.godbolt.org) cat condBug.cc float v0

[Bug tree-optimization/63644] New: Kahan Summation with fast-math, pattern not always recognized

2014-10-25 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch in the following example (compiled with -Ofast -std=c++11) the kahan summation pattern is recognized in sum, not in counter see http://goo.gl

[Bug tree-optimization/63599] New: wrong branch optimization with Ofast in a loop

2014-10-20 Thread vincenzo.innocente at cern dot ch
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch given this code #include <x86intrin.h> typedef float __attribute__( ( vector_size( 16 ) ) ) float32x4_t; inline float32x4_t atan(float32x4_t t) { constexpr float PIO4F

[Bug target/63599] wrong branch optimization with Ofast in a loop

2014-10-20 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63599 --- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch --- I agree that the code produces correct results. It looks to me sub-optimal. I understand that with Ofast the sequence below will be always executed andps %xmm5

[Bug tree-optimization/56829] Feature request: generic builtin to support control flow in vectorized code (movemask, vec_any/all_*)

2014-10-05 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56829 --- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch --- just to add the OpenCL syntax and doc https://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/any.html

[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2014-08-23 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 vincenzo Innocente vincenzo.innocente at cern dot ch changed: What|Removed |Added Known to fail

[Bug web/61744] New: misleading documentation about cast of extended vectors

2014-07-08 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: web Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch At the very bottom of https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html one reads It is possible to cast from one vector type to another, provided

[Bug tree-optimization/61747] New: min,max pattern not always properly optimized (for sse4 targets)

2014-07-08 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch I was expecting gcc to substitute min/max instruction for (a</>b) ? a : b; even for O2. This is not always the case, only Ofast provides

[Bug tree-optimization/61747] min,max pattern not always properly optimized (for sse4 targets)

2014-07-08 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61747 --- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch --- I think you need -fno-signed-zeros for the transformation to be valid. possible. but then is the O2 code that is wrong? in any case adding -fno-signed-zeros makes

[Bug tree-optimization/61747] min,max pattern not always properly optimized (for sse4 targets)

2014-07-08 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61747 --- Comment #4 from vincenzo Innocente vincenzo.innocente at cern dot ch --- confirm that -ffinite-math-only -fno-signed-zeros is equivalent to Ofast in this case so we conclude that the code generated at O2 is wrong and -ffinite-math-only -fno

[Bug target/61731] New: Feature request: generic builtin for conversion operator among vectors

2014-07-07 Thread vincenzo.innocente at cern dot ch
: enhancement Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch gcc is lacking a mechanism to convert (C-style cast) efficiently extended-vectors among different types. clang has recently introduce

[Bug tree-optimization/56829] Feature request: generic builtin to support control flow in vectorized code (movemask, vec_any/all_*)

2014-07-07 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56829 vincenzo Innocente vincenzo.innocente at cern dot ch changed: What|Removed |Added Summary|Feature request: generic

[Bug target/57796] AVX2 gather vectorization: code bloat and reduction of performance

2014-06-19 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796 --- Comment #5 from vincenzo Innocente vincenzo.innocente at cern dot ch --- so with latest 4.9 gcc version 4.10.0 20140611 (experimental) [trunk revision 211467] (GCC) situation has not changed much (the scalar version is now faster!): I think

[Bug c++/61381] constexpr non captured by template lambda

2014-06-03 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61381 --- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch --- I am still at trunk revision 210507 will update and test again

[Bug c++/61381] constexpr non captured by template lambda

2014-06-03 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61381 vincenzo Innocente vincenzo.innocente at cern dot ch changed: What|Removed |Added Status|UNCONFIRMED

[Bug c++/61381] New: constexpr non captured by template lambda

2014-06-01 Thread vincenzo.innocente at cern dot ch
++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch cat ceLambda.cc struct Bar { constexpr Bar(float i):f(i){}; float f;}; float foo1(float x) { constexpr Bar z{0}; auto f = [=](auto a, auto b) -> Bar { return z;}; return f(x,x).f

[Bug tree-optimization/61338] New: too many permutation in a vectorized reverse loop

2014-05-28 Thread vincenzo.innocente at cern dot ch
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch in this example gcc generates 4 permutations for foo (while none is required) On the positive side the code for bar (which is a more realistic use case) seems optimal. float

[Bug tree-optimization/61338] too many permutation in a vectorized reverse loop

2014-05-28 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61338 --- Comment #1 from vincenzo Innocente vincenzo.innocente at cern dot ch --- if I write it reverse void foo2() { for (int i=511; i>=0; --i) x[1023-i] += y[1023-i]*z[512-i]; } it's ok __Z4foo2v: LFB1: leaq 2048+_x(%rip), %rdx xorl

[Bug middle-end/49363] [feature request] multiple target attribute (and runtime dispatching based on cpuid)

2014-05-26 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363 --- Comment #23 from vincenzo Innocente vincenzo.innocente at cern dot ch --- Which Syntax? I want to reuse the same code for the various architecture and let gcc deal with vectorization details. The best I manage to do to share code is something
