[Bug c++/61292] auto keyword to vector reference generates wrong alignment move (causing runtime segfault)

2014-05-25 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61292 vincenzo Innocente vincenzo.innocente at cern dot ch changed: What|Removed |Added Summary|auto keyword to vector

[Bug middle-end/49363] [feature request] multiple target attribute (and runtime dispatching based on cpuid)

2014-05-25 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363 vincenzo Innocente vincenzo.innocente at cern dot ch changed: What|Removed |Added Version|4.7.0
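The request is for compiling one function for several targets and picking a variant at run time based on cpuid. Below is a minimal hand-written sketch of that dispatch. It uses two facilities GCC provides, the `target` function attribute and `__builtin_cpu_supports` (x86 only). The function names are hypothetical. Newer GCC releases also offer a `target_clones` attribute that automates this pattern.

```cpp
#include <cstddef>

// Generic fallback: plain scalar loop, compiled for the default target.
static float sum_generic(const float* x, std::size_t n) {
  float s = 0.f;
  for (std::size_t i = 0; i != n; ++i) s += x[i];
  return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// Same source, compiled for AVX2 through the target attribute.
__attribute__((target("avx2")))
static float sum_avx2(const float* x, std::size_t n) {
  float s = 0.f;
  for (std::size_t i = 0; i != n; ++i) s += x[i];
  return s;
}
#endif

// Hypothetical dispatcher: choose a variant at run time from cpuid.
float sum_dispatch(const float* x, std::size_t n) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  if (__builtin_cpu_supports("avx2")) return sum_avx2(x, n);
#endif
  return sum_generic(x, n);
}
```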

[Bug tree-optimization/61301] New: missed optimization of move if vector passed by reference

2014-05-24 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch in the following test shuffle2 generates not optimized moves. the other two are ok. the problem occurs in real life when the vector is a data

[Bug tree-optimization/61301] missed optimization of move if vector passed by reference

2014-05-24 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61301 --- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch --- At least when shuffle2 is inlined it is likely to become like shuffle1... not sure for the case of a struct such as foo (unless the instance of foo itself

[Bug c++/61292] New: auto keyword to vector reference generates wrong alignment move

2014-05-23 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch given typedef float __attribute__( ( vector_size( 16 ) ) ) float32x4_t; typedef float __attribute__( ( vector_size( 16 ) , aligned(4) ) ) float32x4a4_t
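In the report, binding a `float32x4a4_t` reference with `auto` led GCC to emit an aligned move on misaligned data. The sketch below is defensive and is not the report's code. It shows two alternatives: spell out the under-aligned type explicitly, or copy the four floats with `std::memcpy` into a naturally aligned vector. `load4_a4` and `load4_memcpy` are hypothetical helpers.

```cpp
#include <cstring>

typedef float __attribute__((vector_size(16))) float32x4_t;
typedef float __attribute__((vector_size(16), aligned(4))) float32x4a4_t;

// Naming float32x4a4_t explicitly (instead of relying on auto) keeps the
// 4-byte alignment visible to the compiler at the point of the load.
inline float32x4_t load4_a4(const float* p) {
  const float32x4a4_t& v = *reinterpret_cast<const float32x4a4_t*>(p);
  return v;
}

// Alignment-agnostic alternative: byte copy into a 16-byte aligned vector.
inline float32x4_t load4_memcpy(const float* p) {
  float32x4_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}
```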

[Bug tree-optimization/61245] New: ICE at in expand_ANNOTATE, at internal-fn.c:127 called from cfgexpand.c

2014-05-20 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch apologize for not reducing (trivial reduction (bar below) works) given cat NaiveDod.cc #include <array> #include <vector> #include <utility>

[Bug tree-optimization/61247] New: vectorization fails if conversion from unsigned int to signed int is involved

2014-05-20 Thread vincenzo.innocente at cern dot ch
: minor Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch in the following example cat uintLoop.cc unsigned int N; float * a, *b, *c; using Ind = /*unsigned*/ int; inline float val

[Bug tree-optimization/60823] [4.9/4.10 Regression] ICE in gimple_expand_cfg, at cfgexpand.c:5644

2014-05-19 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60823 vincenzo Innocente vincenzo.innocente at cern dot ch changed: What|Removed |Added CC

[Bug tree-optimization/61194] [4.9/4.10 Regression] vectorization failed with bit-precision arithmetic not supported even if conversion to int is requested

2014-05-16 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61194 --- Comment #7 from vincenzo Innocente vincenzo.innocente at cern dot ch --- great! the original version (that vectorized in 4.8.1) void barX() { for (int i=0; i<1024; ++i) { k[i] = (x[i]0) (w[i]y[i]); z[i] = (k[i]) ? z[i] : y[i

[Bug tree-optimization/61194] [4.9/4.10 Regression] vectorization failed with bit-precision arithmetic not supported even if conversion to int is requested

2014-05-16 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61194 --- Comment #13 from vincenzo Innocente vincenzo.innocente at cern dot ch --- I confirm that with last patch the regression is gone also in a more complex actual application I had. The regression concerns only comment 2 and 3. all the other

[Bug tree-optimization/61194] [4.9/4.10 Regression] vectorization failed with bit-precision arithmetic not supported even if conversion to int is requested

2014-05-16 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61194 --- Comment #14 from vincenzo Innocente vincenzo.innocente at cern dot ch --- provided that future patches will make the code in comment 1 and 2 (and bar) go vectorize is fine with me. if it ends up to vectorize also with bool instead of int

[Bug tree-optimization/61175] failing vectorization in case of complex access pattern

2014-05-15 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61175 --- Comment #1 from vincenzo Innocente vincenzo.innocente at cern dot ch --- adding #pragma GCC ivdep before the loop makes no difference

[Bug tree-optimization/61194] New: vectorization failed with bit-precision arithmetic not supported even if conversion to int is requested

2014-05-15 Thread vincenzo.innocente at cern dot ch
: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch z[i] = ( (x[i]0) (w[i]0)) ? z[i] : y[i]; produces bit-precision arithmetic not supported. note

[Bug tree-optimization/61194] vectorization failed with bit-precision arithmetic not supported even if conversion to int is requested

2014-05-15 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61194 --- Comment #1 from vincenzo Innocente vincenzo.innocente at cern dot ch --- what I find quite absurd is that void barX() { for (int i=0; i<1024; ++i) { k[i] = x[i]0; k[i] = w[i]y[i]; //z[i] = (k[i]) ? z[i] : y[i]; } } vectorize

[Bug tree-optimization/61194] vectorization failed with bit-precision arithmetic not supported even if conversion to int is requested

2014-05-15 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61194 --- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch --- new test code cat cond0.cc float x[1024]; float y[1024]; float z[1024]; float w[1024]; int k[1024]; void barX() { for (int i=0; i<1024; ++i) { k[i] = (x[i]0) (w
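The comments describe a two-step form: the combined condition is first stored in the `int` array `k`, and the select then reads `k`. Below is a self-contained sketch of that form. The comparison operators (`>` and `<`) and the `&` between them are assumptions, because those characters were lost from the archived snippets.

```cpp
constexpr int N = 1024;  // matches the array sizes in cond0.cc
float x[N], y[N], z[N], w[N];
int k[N];

// Step 1 stores the condition as an int mask (0 or 1);
// step 2 keeps z[i] where the mask is set and takes y[i] otherwise.
void barX() {
  for (int i = 0; i < N; ++i) {
    k[i] = (x[i] > 0) & (w[i] < y[i]);  // operators assumed
    z[i] = k[i] ? z[i] : y[i];
  }
}
```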

[Bug tree-optimization/61194] [4.9/4.10 Regression] vectorization failed with bit-precision arithmetic not supported even if conversion to int is requested

2014-05-15 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61194 --- Comment #5 from vincenzo Innocente vincenzo.innocente at cern dot ch --- of course if you can make z[i] = ( (x[i]0) (w[i]0)) ? z[i] : y[i]; to vectorize would be even better!

[Bug tree-optimization/61171] New: vectorization fails for a reduction in presence of subtraction

2014-05-13 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch give this code cat bug.cc float px[1024]; float xx, vv; unsigned int N=1024; void ok() { for (auto j=0U; j<N; ++j) { auto ax = px[j]-xx

[Bug tree-optimization/61175] New: failing vectorization

2014-05-13 Thread vincenzo.innocente at cern dot ch
Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch of these three function only oneOk vectorize. float px[1024]; float vx[1024]; unsigned int N=1024; void one(unsigned int i) { for (auto j=i+1; j<N; ++j) { auto ax = px[j]-px[i]; vx[i

[Bug tree-optimization/59262] New: __attribute__ ((optimize())) broken (and corrupts optimization of the whole compilation unit)

2013-11-23 Thread vincenzo.innocente at cern dot ch
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch in latest 4.9. seen in 4.8.1 too take cat attribute.cc inline float sum(float x, float y) { return x+y

[Bug libstdc++/58982] New: [4.9 Regression] std::vector<std::atomic<int>> vai(10); does not compile anymore

2013-11-03 Thread vincenzo.innocente at cern dot ch
: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch gcc version 4.9.0 20131011 (experimental) [trunk revision 203426] (GCC) ok gcc version 4.9.0 20131102 (experimental) [trunk revision
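For reference, the construct in the summary only requires `std::atomic<int>` to be default-insertable, because the size constructor value-initializes every element in place. Here is a small sketch that wraps it in a hypothetical helper:

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Builds n value-initialized atomic counters; each one starts at 0.
std::vector<std::atomic<int>> make_counters(std::size_t n) {
  return std::vector<std::atomic<int>>(n);
}
```

Growing such a vector is a different matter. `std::atomic` is neither copyable nor movable, so `push_back` and `resize` do not compile for it.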

[Bug tree-optimization/58902] New: small matrix multiplication non vectorized

2013-10-28 Thread vincenzo.innocente at cern dot ch
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch in the following example matmul and matmul2 do not vectorize the manual unroll does c++ -std=c++11 -Ofast -S m3x10.cc -march=corei7-avx -fopt-info-vec-all gcc version 4.9.0 20131011

[Bug tree-optimization/58821] New: conditional reduction does not vectorize

2013-10-21 Thread vincenzo.innocente at cern dot ch
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch in the following foo vectorize bar does not (bar does not vectorize even for if (x[i]0) s+=x[i]; ) compiled as c++ -Ofast -fopt-info-loop -S condRed.cc -fopenmp -ftree-loop
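A conditional sum like the one described for `bar` can also be written as an unconditional add of a selected value, which maps directly onto if-conversion. The sketch below shows both forms and assumes the lost comparison is `> 0`. Whether a given GCC version vectorizes either form should be checked with `-fopt-info-vec`.

```cpp
#include <cstddef>

// Branchy form, as in the report's bar().
float sum_positive_branch(const float* x, std::size_t n) {
  float s = 0.f;
  for (std::size_t i = 0; i < n; ++i)
    if (x[i] > 0.f) s += x[i];
  return s;
}

// Same reduction as an unconditional add of either x[i] or 0.
float sum_positive_select(const float* x, std::size_t n) {
  float s = 0.f;
  for (std::size_t i = 0; i < n; ++i)
    s += (x[i] > 0.f) ? x[i] : 0.f;
  return s;
}
```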

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-08 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #28 from vincenzo Innocente vincenzo.innocente at cern dot ch --- updated to the new revision gcc version 4.9.0 20131007 (experimental) [gomp-4_0-branch revision 203250] (GCC) [innocent@olsnba04 parallel]$ setenv OMP_PROC_BIND

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-08 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #30 from vincenzo Innocente vincenzo.innocente at cern dot ch --- better: as usual nastier bugs are in the tests! [innocent@olsnba04 parallel]$ strace ./affinity-1.exe | grep affin execve("./affinity-1.exe", ["./affinity-1.exe"], [/* 61

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-07 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #7 from vincenzo Innocente vincenzo.innocente at cern dot ch --- getconf -a | grep _NPROCESSORS _NPROCESSORS_CONF 32 _NPROCESSORS_ONLN 32 ls -l /sys/devices/system/cpu/ total 0 drwxr-xr-x 8 root root

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-07 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #8 from vincenzo Innocente vincenzo.innocente at cern dot ch --- do you have access to a 32 cpu machine? btw on XEON-PHI one can have 200 cpus

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-07 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #10 from vincenzo Innocente vincenzo.innocente at cern dot ch --- seems working [innocent@olsnba04 parallel]$ c++ -std=c++11 -Ofast -fopenmp simpleOMP.cpp [innocent@olsnba04 parallel]$ ./a.out max thread 32 [innocent@olsnba04

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-07 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #13 from vincenzo Innocente vincenzo.innocente at cern dot ch --- [innocent@olsnba04 parallel]$ setenv OMP_PROC_BIND true; setenv OMP_PLACES 'threads' [innocent@olsnba04 parallel]$ gcc -fopenmp trivialOMP.cpp [innocent@olsnba04

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-07 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #14 from vincenzo Innocente vincenzo.innocente at cern dot ch --- On 7 Oct, 2013, at 10:06 AM, jakub at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #12 from Jakub

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-07 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #16 from vincenzo Innocente vincenzo.innocente at cern dot ch --- ./affinity-1.exe Initial thread #1 thread 1 #1 thread 0 #1 thread 3 #1 thread 2 #1,#1 thread 3,1 #1,#1 thread 3,0 #1,#1 thread 3,2 #1,#2 thread 3,4 #1,#2 thread 3,0 #1

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-07 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #18 from vincenzo Innocente vincenzo.innocente at cern dot ch --- On 7 Oct, 2013, at 12:27 PM, jakub at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #17 from Jakub

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-07 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #19 from vincenzo Innocente vincenzo.innocente at cern dot ch --- On 7 Oct, 2013, at 12:27 PM, jakub at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org wrote: or config.h doesn't defined HAVE_PTHREAD_AFFINITY_NP, then that's expected

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-07 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #22 from vincenzo Innocente vincenzo.innocente at cern dot ch --- on the XEON setenv OMP_PROC_BIND false Breakpoint 1, main () at /home/data/newsoft/gcc-gomp4/libgomp/testsuite/libgomp.c/affinity-1.c:181 181/home/data/newsoft/gcc

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-07 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #24 from vincenzo Innocente vincenzo.innocente at cern dot ch --- ok, modified to = taskset -c 0-31 gdb ./affinity-1.exe GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1) (gdb) b /home/data/newsoft/gcc-gomp4/libgomp/testsuite

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-07 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #26 from vincenzo Innocente vincenzo.innocente at cern dot ch --- On 7 Oct, 2013, at 3:02 PM, jakub at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #25 from Jakub

[Bug libgomp/58642] New: gomp regression: not honoring anymore task set and numactl

2013-10-06 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch CC: jakub at gcc dot gnu.org till [gomp-4_0-branch revision 202766] int main() { std::cout << "max thread " << omp_get_max_threads() << std::endl

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-06 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #3 from vincenzo Innocente vincenzo.innocente at cern dot ch --- strange indeed rhel6: so is 2.12 or my own version GNU C Library stable release version 2.13, I build gcc by myself c++ -v Using built-in specs. COLLECT_GCC=c

[Bug libgomp/58642] gomp regression: not honoring anymore task set and numactl

2013-10-06 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642 --- Comment #4 from vincenzo Innocente vincenzo.innocente at cern dot ch --- 24 thread machine ok innocent@vocms19 parallel]$ c++ -Ofast -std=c++11 -fopenmp simpleOMP.cpp [innocent@vocms19 parallel]$ ./a.out max thread 24 [innocent@vocms19
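This thread concerns libgomp ignoring the CPU mask set by `taskset`/`numactl`. The following Linux/glibc-specific sketch shows how a program can check which mask it actually received. The result can be compared with the `_NPROCESSORS_ONLN` value that `getconf` reports. The helper names are hypothetical.

```cpp
#include <sched.h>
#include <unistd.h>

// CPUs in the calling thread's affinity mask, i.e. what taskset/numactl
// restrict. Returns -1 if the mask cannot be read.
int allowed_cpus() {
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof set, &set) != 0) return -1;
  return CPU_COUNT(&set);
}

// CPUs currently online, independent of the affinity mask
// (the value getconf reports as _NPROCESSORS_ONLN).
long online_cpus() { return sysconf(_SC_NPROCESSORS_ONLN); }
```

Under `taskset -c 0-3 ./a.out`, `allowed_cpus()` should report 4 when those CPUs are online, while `online_cpus()` still counts every online CPU.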

[Bug libgomp/58482] gomp4: user defined reduction produce wrong result

2013-09-21 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58482 --- Comment #4 from vincenzo Innocente vincenzo.innocente at cern dot ch --- I see. I have several use cases in which the reduction requires the access to two variables (minloc for instance: the minimum and its location) btw tried omp parallel
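A minloc reduction (the minimum and its location) can be expressed with `#pragma omp declare reduction`. In this sketch the combiner breaks ties in favor of the smaller index. The result therefore does not depend on the order in which partial results are merged, in line with the point in Comment #2 of this PR that the operator must be commutative. `MinLoc`, `min_loc_combine`, `min_loc_identity` and `find_min` are hypothetical names. Without `-fopenmp` the pragmas are ignored and the loop runs serially.

```cpp
#include <cstddef>
#include <limits>

struct MinLoc {
  float value;
  std::size_t index;
};

// Identity element: larger than any real candidate in both fields.
inline MinLoc min_loc_identity() {
  return MinLoc{std::numeric_limits<float>::max(),
                std::numeric_limits<std::size_t>::max()};
}

// Keep the smaller value; on ties keep the smaller index (commutative).
inline MinLoc min_loc_combine(MinLoc a, MinLoc b) {
  if (b.value < a.value || (b.value == a.value && b.index < a.index)) return b;
  return a;
}

#pragma omp declare reduction(minloc : MinLoc : omp_out = min_loc_combine(omp_out, omp_in)) \
    initializer(omp_priv = min_loc_identity())

MinLoc find_min(const float* x, std::size_t n) {
  MinLoc r = min_loc_identity();
#pragma omp parallel for reduction(minloc : r)
  for (std::size_t i = 0; i < n; ++i)
    r = min_loc_combine(r, MinLoc{x[i], i});
  return r;
}
```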

[Bug libgomp/58482] New: gomp4: user defined reduction produce wrong result

2013-09-20 Thread vincenzo.innocente at cern dot ch
: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch CC: jakub at gcc dot gnu.org I acknowledge that my understanding of omp declare is still limited. Still the example below produces different result with and w/o

[Bug libgomp/58482] gomp4: user defined reduction produce wrong result

2013-09-20 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58482 --- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch --- Thanks Jakub for the clear answer. The reduction operator should be strictly commutative! and I now understand the meaning of omp declare reduction (I hope) so I

[Bug tree-optimization/58472] New: gomp4: ICE in in vectorizable_store, at tree-vect-stmts.c:4192

2013-09-19 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch #include <cmath> float a[1024]; float b[1024]; float sumO1() { auto s = 0.f; #pragma omp simd reduction(+:s) for (auto i=0U;i<1024;++i

[Bug tree-optimization/58472] gomp4: ICE in in vectorizable_store, at tree-vect-stmts.c:4192

2013-09-19 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58472 --- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch --- yes cat omp4red.cc float a[1024]; float b[1024]; float sumO1() { float s = 0.f; #pragma omp simd reduction(+:s) for (int i=0;i<1024;++i) { s += a[i]*b[i
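Below is a self-contained version of the `omp4red.cc` dot product. The truncated tail of the loop is filled in the obvious way, which is an assumption. The pragma takes effect with `-fopenmp` or `-fopenmp-simd`. Without either flag it is ignored and the loop is ordinary C++.

```cpp
float a[1024];
float b[1024];

// Dot product with an OpenMP 4.0 simd reduction on s.
float sumO1() {
  float s = 0.f;
#pragma omp simd reduction(+:s)
  for (int i = 0; i < 1024; ++i) {
    s += a[i] * b[i];
  }
  return s;
}
```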

[Bug tree-optimization/58472] gomp4: ICE in in vectorizable_store, at tree-vect-stmts.c:4192

2013-09-19 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58472 --- Comment #3 from vincenzo Innocente vincenzo.innocente at cern dot ch --- on linux c++ -O2 -ftree-vectorizer-verbose=1 -S omp4red.cc -fopenmp omp4red.cc:8:13: note: loop vectorized omp4red.cc: In function 'float sumO1()': omp4red.cc:4:7

[Bug tree-optimization/58472] gomp4: ICE in in vectorizable_store, at tree-vect-stmts.c:4192

2013-09-19 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58472 --- Comment #4 from vincenzo Innocente vincenzo.innocente at cern dot ch --- gcc -O2 libgomp/testsuite/libgomp.c/simd-3.c -fopenmp libgomp/testsuite/libgomp.c/simd-3.c: In function ‘foo’: libgomp/testsuite/libgomp.c/simd-3.c:14:1: internal

[Bug tree-optimization/58472] gomp4: ICE in in vectorizable_store, at tree-vect-stmts.c:4192

2013-09-19 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58472 --- Comment #6 from vincenzo Innocente vincenzo.innocente at cern dot ch --- seems so gcc -O2 libgomp/testsuite/libgomp.c/simd-4.c -fopenmp c++ -O2 -S omp4red.cc -fopenmp| cat omp4red.s .text .align 4,0x90 .globl __Z5sumO1v

[Bug tree-optimization/58472] gomp4: ICE in in vectorizable_store, at tree-vect-stmts.c:4192

2013-09-19 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58472 --- Comment #8 from vincenzo Innocente vincenzo.innocente at cern dot ch --- Yes I compile gcc with -O2 -ftree-vectorize on linux I also do bootstrap-lto strange that the compiler does not warn about this uninitialized variable: it does

[Bug tree-optimization/58472] gomp4: ICE in in vectorizable_store, at tree-vect-stmts.c:4192

2013-09-19 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58472 --- Comment #9 from vincenzo Innocente vincenzo.innocente at cern dot ch --- w/o opening another bug report c++ -O2 -S omp4red.cc -fopenmp -Wall omp4red.cc: In function ‘float sumO1()’: omp4red.cc:6:9: warning: ‘simduid.0’ is used uninitialized

[Bug libgomp/58462] New: gomp4: invalid controlling predicate for != (< is ok)

2013-09-18 Thread vincenzo.innocente at cern dot ch
Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch CC: jakub at gcc dot gnu.org took me years to learn and teach to use != instead of …. float a[1024]; float b[1024]; void err() { #pragma omp simd for (int i=0;i
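OpenMP 4.0 accepts only `<`, `<=`, `>` or `>=` in the test expression of a canonical `simd`/`for` loop, which is why the `!=` form is rejected. OpenMP 5.0 later added `!=`. The sketch below shows the accepted form. The loop body is a placeholder, because the report's body is truncated, and `ok_less_than` is a hypothetical name.

```cpp
float a[1024];
float b[1024];

// Canonical loop form accepted by OpenMP 4.0: the test uses <.
void ok_less_than() {
#pragma omp simd
  for (int i = 0; i < 1024; ++i)
    a[i] += b[i];  // placeholder body
}
```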

[Bug libgomp/58462] gomp4: invalid controlling predicate for != (< is ok)

2013-09-18 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58462 --- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch --- Thanks Jakub. Downloaded the standard. waiting for more examples of usage It is a pity that it does not support c++ range loop Let me highjack this bug to congratulate

[Bug ipa/58291] New: ICE with ipa-pta

2013-09-01 Thread vincenzo.innocente at cern dot ch
: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch this is a regression w.r.t. gcc version 4.9.0 20130820 (experimental) [trunk revision 201887] (GCC) c++ -g -O2 -c -std=gnu++11 -fipa-pta ipa_err.i RooMinimizer.cc: In destructor 'RooMinimizer::~RooMinimizer

[Bug ipa/58291] ICE with ipa-pta

2013-09-01 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58291 --- Comment #1 from vincenzo Innocente vincenzo.innocente at cern dot ch --- Created attachment 30738 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30738&action=edit real-code file. just preprocessed no reduction attempted

[Bug target/58268] New: umm registers not used for -march=bdver1

2013-08-29 Thread vincenzo.innocente at cern dot ch
Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch in this trival example avx is used for corei7-avx and core-avx2 not for bdver1 float a[1024]; float x[1024]; float bar(float b) { float r=0.; for (int i=0; i!=1024; ++i) r += a[i]+b*x[i

[Bug target/57954] AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code

2013-07-29 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954 --- Comment #8 from vincenzo Innocente vincenzo.innocente at cern dot ch --- thanks for getting in the trunk. will be possible to back port to at least 4.8? (this issue is there till 4.4!)

[Bug target/57954] AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code

2013-07-27 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954 --- Comment #5 from vincenzo Innocente vincenzo.innocente at cern dot ch --- confirmed that the patch fixes the issue c++ -O2 -march=corei7-avx polyAVX.cpp time ./a.out 10358474048 2.965u 0.001s 0:02.97 99.6%0+0k 0+0io 146pf+0w

[Bug target/57952] AVX/AVX2 no ymm registers used in a trivial reduction

2013-07-23 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57952 --- Comment #1 from vincenzo Innocente vincenzo.innocente at cern dot ch --- I modified a bit the benchmark adding timing and the new version now vectorize YMM with avx2, still not with old avx if I remove the call to rdtsc(); it does not use YMM

[Bug target/57952] New: AVX/AVX2 no ymm registries used in a trivial reduction

2013-07-22 Thread vincenzo.innocente at cern dot ch
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch in this quite trivial benchmark gcc does not generate avx/avx2 instruction using ymm registries c++ -Ofast -S polyAVX.cpp -march=core-avx2 ; grep -c ymm polyAVX.s 0 clang++ -Ofast -S

[Bug target/57954] New: AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code

2013-07-22 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch in the following benchmark performances w/o vectorization are poor wrt to expectations I find out this is due to non zeroing a register before

[Bug target/57927] New: -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)

2013-07-18 Thread vincenzo.innocente at cern dot ch
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch for instance mkdir scimark2TMP cd scimark2TMP wget http://math.nist.gov/scimark2/scimark2_1c.zip . unzip scimark2_1c.zip c++ -S LU.c -O3

[Bug target/57927] -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)

2013-07-18 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57927 --- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch --- COLLECT_GCC_OPTIONS='-S' '-O3' '-march=native' '-o' 'LU.native' '-v' '-shared-libgcc' /afs/cern.ch/user/i/innocent/w2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/cc1plus

[Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt

2013-07-10 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858 --- Comment #5 from vincenzo Innocente vincenzo.innocente at cern dot ch --- I remember something similar in the past --param max-completely-peel-times=1 sort of fix it… (why pre does not recognize that 1/(1+0) == 1 btw?? of course it is just

[Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt

2013-07-09 Thread vincenzo.innocente at cern dot ch
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch in the following example div uses ymm registries while sqr only xmm ones gcc version 4.9.0 20130630 (experimental) [trunk revision 200570] (GCC) cat avx2sqrt.cc #includemath.h double div

[Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt

2013-07-09 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858 --- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch --- actually the code for div and sqr is different already for standard SSE c++ -std=c++11 -Ofast -S avx2sqrt.cc -ftree-vectorizer-verbose=1 -Wall ; cat avx2sqrt.s .L2

[Bug tree-optimization/57823] New: restrict qualifier non effective with pointer returned by new

2013-07-04 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch I am sure this has been already discussed, not found a specific report though. below the code emitted for add is what expected, for bar gcc

[Bug tree-optimization/57823] restrict qualifier non effective with pointer returned by new

2013-07-04 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57823 --- Comment #3 from vincenzo Innocente vincenzo.innocente at cern dot ch --- indeed float * bar3() { const float * a = (float*) malloc(4*128); const float * b = (float*) malloc(4*128); float * c = (float*) malloc(4*128); a = (const
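Until the compiler can infer non-aliasing from `new` or `malloc`, a common workaround is to pass the fresh buffers to a function whose parameters carry the GCC/Clang `__restrict__` qualifier. Here is a sketch of that approach. `add_restrict` is a hypothetical function and is not the `add` from the report.

```cpp
#include <cstddef>

// __restrict__ promises the three arrays do not overlap, so the compiler
// may vectorize the loop without emitting runtime alias checks.
void add_restrict(float* __restrict__ c, const float* __restrict__ a,
                  const float* __restrict__ b, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
}
```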

[Bug tree-optimization/57796] New: AVX2 gather vectorization: code bloat and reduction of performance

2013-07-03 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch At least in scimark2 sparse matrix multiplication the use of gather instructions ends in code bloat and a substantial reduction
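The kernel in question is a compressed sparse row (CSR) matrix-vector product. Its indirect load `x[col[i]]` is the access pattern that AVX2 gather instructions target. The sketch below is written in the style of scimark2's sparse kernel, not copied from it. The function name and the choice of containers are assumptions.

```cpp
#include <cstddef>
#include <vector>

// y = A*x with A in CSR form: row r spans val/col[row_ptr[r] .. row_ptr[r+1]).
void csr_matvec(const std::vector<double>& val, const std::vector<int>& col,
                const std::vector<int>& row_ptr, const std::vector<double>& x,
                std::vector<double>& y) {
  const std::size_t rows = row_ptr.size() - 1;
  for (std::size_t r = 0; r < rows; ++r) {
    double sum = 0.0;
    for (int i = row_ptr[r]; i < row_ptr[r + 1]; ++i)
      sum += x[col[i]] * val[i];  // indirect (gather) load of x
    y[r] = sum;
  }
}
```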

[Bug tree-optimization/50789] Gather vectorization

2013-07-03 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789 --- Comment #13 from vincenzo Innocente vincenzo.innocente at cern dot ch --- I just submitted a specific bug-report as PR57796

[Bug tree-optimization/57634] New: Missed vectorization for a fixed point multiplication reduction

2013-06-17 Thread vincenzo.innocente at cern dot ch
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincenzo.innocente at cern dot ch I the following code the loop in red does not vectorize becauseof note: reduction: not commutative/associative: s_12 = (unsigned int) _11

[Bug tree-optimization/57169] New: fully unrolled matrix multiplication not vectorized

2013-05-04 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57169 Bug #: 57169 Summary: fully unrolled matrix multiplication not vectorized Classification: Unclassified Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity:

[Bug tree-optimization/57162] New: Ofast does not make use of avx while O3 does

2013-05-03 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57162 Bug #: 57162 Summary: Ofast does not make use of avx while O3 does Classification: Unclassified Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal

[Bug c++/57132] New: spurious warning: division by zero [-Wdiv-by-zero] in if (m) res %=m;

2013-05-01 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57132 Bug #: 57132 Summary: spurious warning: division by zero [-Wdiv-by-zero] in if (m) res %=m; Classification: Unclassified Product: gcc Version: 4.9.0 Status:
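The archived entry shows only the summary. Suppose, as an assumption, that `m` is a compile-time constant such as a template parameter. Then the `M == 0` instantiation of `if (M) res %= M;` still contains `res %= 0`, which would explain the warning. With C++17 `if constexpr`, that statement is discarded instead. Here is a sketch with a hypothetical `wrap` helper:

```cpp
// M is a compile-time modulus; M == 0 means "do not wrap".
template <unsigned M>
unsigned wrap(unsigned res) {
  if constexpr (M != 0) res %= M;  // discarded entirely when M == 0
  return res;
}
```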

[Bug libstdc++/57110] New: is the use of uint_fast32_t in random intentional?

2013-04-29 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57110 Bug #: 57110 Summary: is the use of uint_fast32_t in random intentional? Classification: Unclassified Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity:

[Bug libstdc++/57110] is the use of uint_fast32_t in random intentional?

2013-04-29 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57110 --- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch 2013-04-29 11:47:54 UTC --- Understood. The question should than be escalated to the c++ standard committee In my opinion the use of a 32-bit unsigned int
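The standard defines `std::mt19937` with `result_type` equal to `uint_fast32_t`, which glibc makes 64 bits wide on x86-64. The generated values nevertheless always fit in 32 bits, because the engine's word size is 32. The sketch below checks both properties at compile time. It also reproduces the value the standard requires for the 10000th output of a default-constructed engine (4123659995).

```cpp
#include <cstdint>
#include <random>
#include <type_traits>

static_assert(std::is_same<std::mt19937::result_type, std::uint_fast32_t>::value,
              "mt19937 result_type is uint_fast32_t");
static_assert(std::mt19937::max() == 0xFFFFFFFFu,
              "outputs fit in 32 bits (word size w = 32)");

// 10000th output of a default-constructed engine.
std::uint_fast32_t mt19937_10000th() {
  std::mt19937 gen;
  gen.discard(9999);
  return gen();
}
```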

[Bug tree-optimization/56829] New: Feature request: generic builtin for movemask

2013-04-03 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56829 Bug #: 56829 Summary: Feature request: generic builtin for movemask Classification: Unclassified Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity:
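On x86 this operation exists as `_mm_movemask_ps` and related intrinsics, and the request is for a target-independent equivalent. Below is a portable fallback over GCC vector extensions for a 4 x `int` comparison mask. `v4si` and `movemask4` are hypothetical names.

```cpp
#include <cstdint>

typedef int v4si __attribute__((vector_size(16)));

// Bit i of the result is the sign bit of lane i; lanes produced by a
// vector comparison are either 0 or -1.
inline unsigned movemask4(v4si m) {
  unsigned bits = 0;
  for (int i = 0; i < 4; ++i)
    bits |= (static_cast<std::uint32_t>(m[i]) >> 31) << i;
  return bits;
}
```

Example: for `a = {1, 5, -3, 7}` and `b = {2, 2, 2, 2}`, `movemask4(a < b)` sets bits 0 and 2, giving 5.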

[Bug tree-optimization/50789] Gather vectorization

2013-04-02 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789 vincenzo Innocente vincenzo.innocente at cern dot ch changed: What|Removed |Added CC

[Bug tree-optimization/56541] New: vectorizaton fails in conditional assignment of a constant

2013-03-05 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56541 Bug #: 56541 Summary: vectorizaton fails in conditional assignment of a constant Classification: Unclassified Product: gcc Version: 4.8.0 Status:

[Bug middle-end/55266] vector expansion: 24 movs for 4 adds

2013-03-03 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55266 --- Comment #4 from vincenzo Innocente vincenzo.innocente at cern dot ch 2013-03-03 11:58:24 UTC --- I see still problems when calling inline functions. It seems that the code to satisfy the calling ABI is generated anyhow. take

[Bug rtl-optimization/50728] Inefficient vector loads from aggregates passed by value

2013-03-03 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50728 --- Comment #5 from vincenzo Innocente vincenzo.innocente at cern dot ch 2013-03-03 12:01:23 UTC --- crosspost with PR55266. feel free to consolidate in a single PR I see still problems when calling inline functions. It seems

[Bug c++/56381] New: ICE: cc1plus: internal compiler error: in gimplify_expr, at gimplify.c:7842

2013-02-18 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56381 Bug #: 56381 Summary: ICE: cc1plus: internal compiler error: in gimplify_expr, at gimplify.c:7842 Classification: Unclassified Product: gcc Version: 4.8.0 Status:

[Bug c++/56381] ICE: cc1plus: internal compiler error: in gimplify_expr, at gimplify.c:7842

2013-02-18 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56381 --- Comment #1 from vincenzo Innocente vincenzo.innocente at cern dot ch 2013-02-18 17:10:03 UTC --- Created attachment 29484 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29484 preprocessed file of user code (sorry for not reducing)

[Bug tree-optimization/56273] [4.8 regression] Bogus -Warray-bounds warning

2013-02-12 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56273 --- Comment #9 from vincenzo Innocente vincenzo.innocente at cern dot ch 2013-02-12 16:24:11 UTC --- I am just rebuilding (Updated to revision 195983.) and noticed /home/data/newsoft/gcc-build/./gcc/xgcc -B/home/data/newsoft/gcc-build/./gcc

[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2013-01-08 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760 --- Comment #5 from vincenzo Innocente vincenzo.innocente at cern dot ch 2013-01-08 15:29:18 UTC --- we just got hit by this great type of code (copysign is unknown to scientists) most probably gcc could optimize it for -Ofast to return

[Bug tree-optimization/55912] New: missing optimization of x/x and x/std::abs(x)

2013-01-08 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55912 Bug #: 55912 Summary: missing optimization of x/x and x/std::abs(x) Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal
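The report body is truncated here, so the exact test case is not shown. The identities behind the title can be checked directly. Under IEEE-754, for finite nonzero `x`, `x/x` is exactly 1 and `x/std::abs(x)` equals `std::copysign(1.0, x)`. For zero, infinite or NaN `x`, both expressions produce NaN. This is presumably why such folding would only be allowed under relaxed flags such as `-Ofast`, though that gating is an assumption here. The `*_fast` functions below are hypothetical names for the candidate rewrites.

```cpp
#include <cassert>
#include <cmath>

// Literal forms from the report title.
double ratio_self(double x) { return x / x; }
double ratio_abs(double x) { return x / std::abs(x); }

// Candidate rewrites: equal to the originals only when x is finite and nonzero.
// For x == 0, +-inf or NaN the originals return NaN instead.
double ratio_self_fast(double) { return 1.0; }
double ratio_abs_fast(double x) { return std::copysign(1.0, x); }
```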

[Bug tree-optimization/55723] loop vectorization inefficient in presence of multiple identical conditions

2012-12-20 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55723 vincenzo Innocente vincenzo.innocente at cern dot ch changed: What|Removed |Added Summary|SLP vectorization vs

[Bug tree-optimization/55760] New: scalar code non using rsqrtss and rcpss

2012-12-20 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760 Bug #: 55760 Summary: scalar code non using rsqrtss and rcpss Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal

[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2012-12-20 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760 --- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch 2012-12-20 15:55:03 UTC --- Thanks. not safe meaning producing incorrect results? Is it documented?

[Bug c++/55726] assignment of a scalar to a vector

2012-12-19 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55726 --- Comment #4 from vincenzo Innocente vincenzo.innocente at cern dot ch 2012-12-19 13:25:16 UTC --- I understand your concern, Marc. I think that the compiler shall either prefer double or produce error: call of overloaded 'f(float

[Bug c++/55726] assignment of a scalar to a vector

2012-12-18 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55726 --- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch 2012-12-18 11:39:22 UTC --- no gcc -Ofast -march=corei7 assign.c -std=c99 assign.c: In function ‘main’: assign.c:9:21: error: incompatible types when initializing type

[Bug tree-optimization/55723] New: SLP vectorization vs loop: SLP more efficient!

2012-12-17 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55723 Bug #: 55723 Summary: SLP vectorization vs loop: SLP more efficient! Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal

[Bug tree-optimization/55723] SLP vectorization vs loop: SLP more efficient: loop vectorization inefficient in presence of multiple blends

2012-12-17 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55723 vincenzo Innocente vincenzo.innocente at cern dot ch changed: What|Removed |Added Summary|SLP vectorization vs

[Bug c++/55726] New: assignment of a scalar to a vector

2012-12-17 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55726 Bug #: 55726 Summary: assignment of a scalar to a vector Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3

[Bug tree-optimization/55662] New: SLP vectorization of sqrt fails if in a loop

2012-12-12 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55662 Bug #: 55662 Summary: SLP vectorization of sqrt fails if in a loop Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal

[Bug tree-optimization/55645] skipping unlike branch in vectorized loops using movmsk or equivalent

2012-12-11 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55645 --- Comment #4 from vincenzo Innocente vincenzo.innocente at cern dot ch 2012-12-11 10:43:21 UTC --- Sure, use c[i]=std::sqrt(a[i]/b[i]); recent literature is full of examples, mostly related to GPU code; see for instance Random

[Bug tree-optimization/55645] skipping unlike branch in vectorized loops using movmsk or equivalent

2012-12-11 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55645 --- Comment #5 from vincenzo Innocente vincenzo.innocente at cern dot ch 2012-12-11 14:34:45 UTC --- In principle, one could add the movmsk unconditionally (well, if advantageous) after the compare and evaluate only one of the legs
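The idea in these comments is to compute a lane mask after the compare and skip the expensive leg when no lane needs it. Below is a minimal scalar emulation of that control flow. The kernel `c[i] = a[i] > 0 ? std::sqrt(a[i]/b[i]) : 0` is a hypothetical example built around the expression quoted in comment #4. The 4-bit `mask` stands in for the `movmskps` result, and the function name is illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical kernel: c[i] = a[i] > 0 ? sqrt(a[i] / b[i]) : 0.
// Works in blocks of W "lanes"; mask mimics a movmsk of the comparison.
void kernel_skip_unlikely(const float* a, const float* b, float* c, std::size_t n) {
  constexpr std::size_t W = 4;
  std::size_t i = 0;
  for (; i + W <= n; i += W) {
    unsigned mask = 0;
    for (std::size_t l = 0; l < W; ++l)
      mask |= unsigned(a[i + l] > 0.f) << l;
    if (mask == 0) {
      // No lane takes the expensive leg: evaluate only the cheap one.
      for (std::size_t l = 0; l < W; ++l) c[i + l] = 0.f;
      continue;
    }
    // Otherwise evaluate the expensive leg on every lane, then blend.
    for (std::size_t l = 0; l < W; ++l) {
      float expensive = std::sqrt(a[i + l] / b[i + l]);  // NaN lanes are discarded
      c[i + l] = ((mask >> l) & 1u) ? expensive : 0.f;
    }
  }
  for (; i < n; ++i)  // scalar remainder
    c[i] = a[i] > 0.f ? std::sqrt(a[i] / b[i]) : 0.f;
}
```

The gain depends on how often whole blocks fall on the cheap side. When the branch is truly unlikely, most blocks skip the division and square root entirely.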

[Bug tree-optimization/55645] New: skipping unlike branch in vectorized loops using movmsk or equivalent

2012-12-10 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55645 Bug #: 55645 Summary: skipping unlike branch in vectorized loops using movmsk or equivalent Classification: Unclassified Product: gcc Version: 4.8.0

[Bug c++/55573] New: [4.8 ICE] internal compiler error: in adjust_temp_type, at cp/semantics.c:6454

2012-12-03 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55573 Bug #: 55573 Summary: [4.8 ICE] internal compiler error: in adjust_temp_type, at cp/semantics.c:6454 Classification: Unclassified Product: gcc Version: 4.8.0

[Bug c++/55573] [4.8 Regression] ICE in adjust_temp_type, at cp/semantics.c:6454

2012-12-03 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55573 --- Comment #7 from vincenzo Innocente vincenzo.innocente at cern dot ch 2012-12-03 13:54:14 UTC --- Thanks for the quick fix could you please verify why this other constructor constexpr Rot3( T xx, T xy, T xz, T yx, T yy, T yz, T zx, T zy

[Bug c++/53094] constexpr vector subscripting

2012-12-03 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53094 --- Comment #7 from vincenzo Innocente vincenzo.innocente at cern dot ch 2012-12-03 14:29:54 UTC --- a bit of cross posting with PR55573] sorry this typedef float __attribute__( ( vector_size( 4*sizeof(float) ) ) ) V4; constexpr V4 build(float

[Bug c++/55573] [4.8 Regression] ICE in adjust_temp_type, at cp/semantics.c:6454

2012-12-03 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55573 --- Comment #8 from vincenzo Innocente vincenzo.innocente at cern dot ch 2012-12-03 14:32:06 UTC --- comment 7 reduced to typedef float __attribute__( ( vector_size( 4*sizeof(float) ) ) ) V4; constexpr V4 build(float x,float y, float z

[Bug c++/53094] constexpr vector subscripting

2012-12-03 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53094 --- Comment #9 from vincenzo Innocente vincenzo.innocente at cern dot ch 2012-12-03 19:15:09 UTC --- adding it helps t = build_constructor (TREE_TYPE (t), n); + if (TREE_CODE (TREE_TYPE (t)) == VECTOR_TYPE) +t = fold (t
