https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61292
vincenzo Innocente vincenzo.innocente at cern dot ch changed:
What|Removed |Added
Summary|auto keyword to vector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363
vincenzo Innocente vincenzo.innocente at cern dot ch changed:
What|Removed |Added
Version|4.7.0
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
In the following test, shuffle2 generates unoptimized moves;
the other two are OK.
The problem occurs in real life when the vector is a data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61301
--- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
At least when shuffle2 is inlined, it is likely to become like shuffle1...
Not sure for the case of a struct such as foo (unless the instance of foo itself
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
given
typedef float __attribute__( ( vector_size( 16 ) ) ) float32x4_t;
typedef float __attribute__( ( vector_size( 16 ) , aligned(4) ) ) float32x4a4_t;
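A minimal sketch of what the `aligned(4)` variant is for, assuming GCC-style vector extensions (Clang accepts the same syntax): both typedefs describe four floats, but the second one only requires float alignment, so it can stand for data at any element offset in a `float` array. The helper `dot4` is hypothetical and not part of the report.

```cpp
#include <cstring>

// GCC vector-extension types, as declared in the report.
typedef float __attribute__((vector_size(16))) float32x4_t;
// Same vector, alignment lowered to that of a float (allowed on typedefs).
typedef float __attribute__((vector_size(16), aligned(4))) float32x4a4_t;

static_assert(sizeof(float32x4a4_t) == sizeof(float32x4_t), "same size");
static_assert(alignof(float32x4a4_t) == 4, "alignment lowered to 4");

// Hypothetical helper: dot product of the four floats starting at a and b,
// which need not be 16-byte aligned.
inline float dot4(const float* a, const float* b) {
  float32x4a4_t va, vb;
  std::memcpy(&va, a, sizeof va);    // alignment-agnostic loads
  std::memcpy(&vb, b, sizeof vb);
  float32x4_t p = va * vb;           // element-wise multiply
  return p[0] + p[1] + p[2] + p[3];  // horizontal sum via lane subscripts
}
```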
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
Apologies for not reducing (the trivial reduction, bar below, works).
given
cat NaiveDod.cc
#include <array>
#include <vector>
#include <utility>
Severity: minor
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
in the following example
cat uintLoop.cc
unsigned int N;
float * a, *b, *c;
using Ind = /*unsigned*/ int;
inline
float val
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60823
vincenzo Innocente vincenzo.innocente at cern dot ch changed:
What|Removed |Added
CC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61194
--- Comment #7 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
great!
the original version (that vectorized in 4.8.1)
void barX() {
for (int i=0; i<1024; ++i) {
k[i] = (x[i]0) (w[i]y[i]);
z[i] = (k[i]) ? z[i] : y[i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61194
--- Comment #13 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
I confirm that with the last patch the regression is gone, also in a more complex
real application I had.
The regression concerns only comments 2 and 3.
All the other
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61194
--- Comment #14 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
Provided that future patches make the code in comments 1 and 2 (and bar)
vectorize, this is fine with me.
If it ends up vectorizing also with bool instead of int
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61175
--- Comment #1 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
adding
#pragma GCC ivdep
before the loop makes no difference
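For reference, `#pragma GCC ivdep` (available since GCC 4.9) is written immediately before a loop to assert that there are no loop-carried dependencies that would prevent vectorization. A minimal sketch of its placement, using a hypothetical function `scale`; as noted above, the pragma did not change the outcome for the loop in this report.

```cpp
// Hypothetical example: a, b and c point to distinct arrays, but the
// compiler cannot prove that from the pointer types alone.
void scale(float* a, const float* b, const float* c, unsigned n) {
  // Programmer's promise: iterations are independent (no aliasing).
#pragma GCC ivdep
  for (unsigned i = 0; i < n; ++i)
    a[i] = b[i] * c[i];
}
```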
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
z[i] = ( (x[i]0) (w[i]0)) ? z[i] : y[i];
produces
bit-precision arithmetic not supported.
note
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61194
--- Comment #1 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
what I find quite absurd is that
void barX() {
for (int i=0; i<1024; ++i) {
k[i] = x[i]0;
k[i] = w[i]y[i];
//z[i] = (k[i]) ? z[i] : y[i];
}
}
vectorize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61194
--- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
new test code
cat cond0.cc
float x[1024];
float y[1024];
float z[1024];
float w[1024];
int k[1024];
void barX() {
for (int i=0; i<1024; ++i) {
k[i] = (x[i]0) (w
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61194
--- Comment #5 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
Of course, if you can make
z[i] = ( (x[i]0) (w[i]0)) ? z[i] : y[i];
vectorize too, it would be even better!
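The loops in this report share one pattern: two float comparisons are combined into an `int` mask, which then selects between two values. A minimal scalar sketch of that pattern follows. The comparison operators were lost from the archived snippets, so the `>` and `<` below are assumptions, as are the array size `kN` and the function name `select_loop`; whether a given GCC release vectorizes this form is exactly what the report is about.

```cpp
// Hypothetical reconstruction: comparison operators and sizes are assumed.
constexpr int kN = 8;
float x[kN], y[kN], z[kN], w[kN];
int k[kN];

void select_loop() {
  for (int i = 0; i < kN; ++i) {
    // Bitwise & of two 0/1 comparison results, stored as int.
    k[i] = (x[i] > 0) & (w[i] < y[i]);
    // Select: keep z[i] where the mask is set, otherwise take y[i].
    z[i] = k[i] ? z[i] : y[i];
  }
}
```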
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
given this code
cat bug.cc
float px[1024];
float xx, vv;
unsigned int N=1024;
void ok() {
for (auto j=0U; j<N; ++j) {
auto ax = px[j]-xx
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
Of these three functions, only oneOk vectorizes.
float px[1024];
float vx[1024];
unsigned int N=1024;
void one(unsigned int i) {
for (auto j=i+1; j<N; ++j) {
auto ax = px[j]-px[i];
vx[i
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
In the latest 4.9; seen in 4.8.1 too.
take
cat attribute.cc
inline float sum(float x, float y) { return x+y
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
gcc version 4.9.0 20131011 (experimental) [trunk revision 203426] (GCC)
ok
gcc version 4.9.0 20131102 (experimental) [trunk revision
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
In the following example,
matmul and matmul2 do not vectorize;
the manual unroll does.
c++ -std=c++11 -Ofast -S m3x10.cc -march=corei7-avx -fopt-info-vec-all
gcc version 4.9.0 20131011
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
In the following, foo vectorizes but bar does not
(bar does not vectorize even for
if (x[i]0) s+=x[i];
)
compiled as
c++ -Ofast -fopt-info-loop -S condRed.cc -fopenmp -ftree-loop
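For comparison, a conditional sum like the one in bar can also be written without a branch, as a select feeding an unconditional reduction, which is the shape if-conversion aims for. A minimal sketch, assuming the stripped comparison was `x[i] > 0`; the function names are hypothetical, and no claim is made here about which form a particular GCC version vectorizes.

```cpp
// Branchy form, as in the report (comparison operator assumed to be >).
float sum_positive_branch(const float* x, int n) {
  float s = 0.f;
  for (int i = 0; i < n; ++i)
    if (x[i] > 0) s += x[i];
  return s;
}

// Branch-free form: every iteration adds either x[i] or 0.
float sum_positive_select(const float* x, int n) {
  float s = 0.f;
  for (int i = 0; i < n; ++i)
    s += (x[i] > 0) ? x[i] : 0.f;
  return s;
}
```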
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #28 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
updated to the new revision
gcc version 4.9.0 20131007 (experimental) [gomp-4_0-branch revision 203250]
(GCC)
[innocent@olsnba04 parallel]$ setenv OMP_PROC_BIND
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #30 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
better: as usual nastier bugs are in the tests!
[innocent@olsnba04 parallel]$ strace ./affinity-1.exe | grep affin
execve(./affinity-1.exe, [./affinity-1.exe], [/* 61
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #7 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
getconf -a | grep _NPROCESSORS
_NPROCESSORS_CONF 32
_NPROCESSORS_ONLN 32
ls -l /sys/devices/system/cpu/
total 0
drwxr-xr-x 8 root root
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #8 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
Do you have access to a 32-CPU machine?
BTW, on Xeon Phi one can have 200 CPUs.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #10 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
seems working
[innocent@olsnba04 parallel]$ c++ -std=c++11 -Ofast -fopenmp simpleOMP.cpp
[innocent@olsnba04 parallel]$ ./a.out
max thread 32
[innocent@olsnba04
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #13 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
[innocent@olsnba04 parallel]$ setenv OMP_PROC_BIND true; setenv OMP_PLACES 'threads'
[innocent@olsnba04 parallel]$ gcc -fopenmp trivialOMP.cpp
[innocent@olsnba04
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #14 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
On 7 Oct, 2013, at 10:06 AM, jakub at gcc dot gnu.org
gcc-bugzi...@gcc.gnu.org wrote:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #12 from Jakub
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #16 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
./affinity-1.exe
Initial thread
#1 thread 1
#1 thread 0
#1 thread 3
#1 thread 2
#1,#1 thread 3,1
#1,#1 thread 3,0
#1,#1 thread 3,2
#1,#2 thread 3,4
#1,#2 thread 3,0
#1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #18 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
On 7 Oct, 2013, at 12:27 PM, jakub at gcc dot gnu.org
gcc-bugzi...@gcc.gnu.org wrote:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #17 from Jakub
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #19 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
On 7 Oct, 2013, at 12:27 PM, jakub at gcc dot gnu.org
gcc-bugzi...@gcc.gnu.org wrote:
or config.h doesn't define HAVE_PTHREAD_AFFINITY_NP, then that's expected
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #22 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
on the XEON
setenv OMP_PROC_BIND false
Breakpoint 1, main () at
/home/data/newsoft/gcc-gomp4/libgomp/testsuite/libgomp.c/affinity-1.c:181
181/home/data/newsoft/gcc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #24 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
ok, modified to =
taskset -c 0-31 gdb ./affinity-1.exe
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
(gdb) b
/home/data/newsoft/gcc-gomp4/libgomp/testsuite
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #26 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
On 7 Oct, 2013, at 3:02 PM, jakub at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org
wrote:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #25 from Jakub
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
CC: jakub at gcc dot gnu.org
till
[gomp-4_0-branch revision 202766]
int main() {
std::cout << "max thread " << omp_get_max_threads() << std::endl;
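A self-contained variant of the snippet above, guarded so that it also builds without `-fopenmp` (then `_OPENMP` is undefined and one thread is reported). The helper names `max_threads` and `print_max_threads` are hypothetical.

```cpp
#include <iostream>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: threads a parallel region would use by default,
// or 1 when compiled without OpenMP support.
int max_threads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}

// Same output format as the snippet above, e.g. "max thread 32".
void print_max_threads() {
  std::cout << "max thread " << max_threads() << std::endl;
}
```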
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #3 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
strange indeed
rhel6: so is 2.12
or my own version
GNU C Library stable release version 2.13,
I build gcc myself
c++ -v
Using built-in specs.
COLLECT_GCC=c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58642
--- Comment #4 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
24 thread machine ok
[innocent@vocms19 parallel]$ c++ -Ofast -std=c++11 -fopenmp simpleOMP.cpp
[innocent@vocms19 parallel]$ ./a.out
max thread 24
[innocent@vocms19
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58482
--- Comment #4 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
I see.
I have several use cases in which the reduction requires access to two
variables (minloc, for instance: the minimum and its location).
BTW, I tried
omp parallel
: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
CC: jakub at gcc dot gnu.org
I acknowledge that my understanding of omp declare is still limited.
Still, the example below produces different results with and w/o
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58482
--- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
Thanks, Jakub, for the clear answer.
The reduction operator should be strictly commutative!
And I now understand the meaning of
omp declare reduction (I hope),
so I
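To illustrate the minloc case mentioned above, here is a minimal sketch of a user-defined OpenMP 4.0 reduction over a hypothetical value/index pair. Breaking ties on the smaller index orders the pairs lexicographically, which keeps the combiner commutative and associative, so the result does not depend on how the runtime groups partial results. All names (`MinLoc`, `min_loc`, `min_loc_identity`, `find_min`) are hypothetical; without `-fopenmp` the pragmas are ignored and the loop simply runs serially.

```cpp
#include <limits>

// Hypothetical value/location pair for a minloc reduction.
struct MinLoc {
  float value;
  int index;
};

// Identity element: loses to every finite value and to any real index.
inline MinLoc min_loc_identity() {
  return MinLoc{std::numeric_limits<float>::infinity(),
                std::numeric_limits<int>::max()};
}

// Combiner: smaller value wins, ties go to the smaller index.
// (NaN inputs are not handled.)
inline MinLoc min_loc(MinLoc a, MinLoc b) {
  if (b.value < a.value || (b.value == a.value && b.index < a.index)) return b;
  return a;
}

#pragma omp declare reduction(minloc : MinLoc : omp_out = min_loc(omp_out, omp_in)) \
    initializer(omp_priv = min_loc_identity())

// Hypothetical driver: value and position of the minimum of x[0..n).
MinLoc find_min(const float* x, int n) {
  MinLoc r = min_loc_identity();
#pragma omp parallel for reduction(minloc : r)
  for (int i = 0; i < n; ++i)
    r = min_loc(r, MinLoc{x[i], i});
  return r;
}
```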
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
#include <cmath>
float a[1024];
float b[1024];
float sumO1() {
auto s = 0.f;
#pragma omp simd reduction(+:s)
for (auto i=0U;i<1024;++i
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58472
--- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
yes
cat omp4red.cc
float a[1024];
float b[1024];
float sumO1() {
float s = 0.f;
#pragma omp simd reduction(+:s)
for (int i=0;i<1024;++i) {
s += a[i]*b[i
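A complete loop of the same kind, assuming the truncated body of omp4red.cc is a dot product of `a` and `b` (as the visible `s += a[i]*b[i` suggests). `#pragma omp simd reduction(+:s)` asks for a vectorized loop in which each SIMD lane keeps a private partial sum; it takes effect with `-fopenmp` (or `-fopenmp-simd` from GCC 4.9 on) and is ignored otherwise. The function name `dot` and its parameters are hypothetical.

```cpp
// Hypothetical completed variant of sumO1(): dot product of two arrays.
float dot(const float* a, const float* b, int n) {
  float s = 0.f;
  // Each lane accumulates a private partial sum; the partial sums are
  // combined with + after the loop, so the additions may be reordered.
#pragma omp simd reduction(+:s)
  for (int i = 0; i < n; ++i)
    s += a[i] * b[i];
  return s;
}
```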
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58472
--- Comment #3 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
on linux
c++ -O2 -ftree-vectorizer-verbose=1 -S omp4red.cc -fopenmp
omp4red.cc:8:13: note: loop vectorized
omp4red.cc: In function 'float sumO1()':
omp4red.cc:4:7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58472
--- Comment #4 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
gcc -O2 libgomp/testsuite/libgomp.c/simd-3.c -fopenmp
libgomp/testsuite/libgomp.c/simd-3.c: In function ‘foo’:
libgomp/testsuite/libgomp.c/simd-3.c:14:1: internal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58472
--- Comment #6 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
seems so
gcc -O2 libgomp/testsuite/libgomp.c/simd-4.c -fopenmp
c++ -O2 -S omp4red.cc -fopenmp| cat omp4red.s
.text
.align 4,0x90
.globl __Z5sumO1v
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58472
--- Comment #8 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
Yes, I compile gcc with -O2 -ftree-vectorize;
on Linux I also do bootstrap-lto.
strange that the compiler does not warn about this uninitialized variable:
it does
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58472
--- Comment #9 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
w/o opening another bug report
c++ -O2 -S omp4red.cc -fopenmp -Wall
omp4red.cc: In function ‘float sumO1()’:
omp4red.cc:6:9: warning: ‘simduid.0’ is used uninitialized
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
CC: jakub at gcc dot gnu.org
It took me years to learn, and to teach, to use != instead of ….
float a[1024];
float b[1024];
void err() {
#pragma omp simd
for (int i=0;i
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58462
--- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
Thanks, Jakub.
I downloaded the standard;
waiting for more examples of usage.
It is a pity that it does not support C++ range-based for loops.
Let me hijack this bug to congratulate
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
this is a regression w.r.t.
gcc version 4.9.0 20130820 (experimental) [trunk revision 201887] (GCC)
c++ -g -O2 -c -std=gnu++11 -fipa-pta ipa_err.i
RooMinimizer.cc: In destructor 'RooMinimizer::~RooMinimizer
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58291
--- Comment #1 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
Created attachment 30738
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30738action=edit
real-code file. just preprocessed no reduction attempted
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
In this trivial example, AVX is used for corei7-avx and core-avx2
but not for bdver1.
float a[1024];
float x[1024];
float bar(float b) {
float r=0.;
for (int i=0; i!=1024; ++i)
r += a[i]+b*x[i
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954
--- Comment #8 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
Thanks for getting it into trunk.
Will it be possible to backport it to at least 4.8?
(This issue has been there since 4.4!)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954
--- Comment #5 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
confirmed that the patch fixes the issue
c++ -O2 -march=corei7-avx polyAVX.cpp
time ./a.out
10358474048
2.965u 0.001s 0:02.97 99.6% 0+0k 0+0io 146pf+0w
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57952
--- Comment #1 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
I modified the benchmark a bit, adding timing,
and the new version now vectorizes with YMM under avx2, though still not with the older avx.
If I remove the call to rdtsc(), it does not use YMM.
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
In this quite trivial benchmark,
gcc does not generate avx/avx2 instructions using ymm registers:
c++ -Ofast -S polyAVX.cpp -march=core-avx2 ; grep -c ymm polyAVX.s
0
clang++ -Ofast -S
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
In the following benchmark, performance w/o vectorization is poor w.r.t.
expectations.
I found out this is due to not zeroing a register before
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
for instance
mkdir scimark2TMP
cd scimark2TMP
wget http://math.nist.gov/scimark2/scimark2_1c.zip .
unzip scimark2_1c.zip
c++ -S LU.c -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57927
--- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
COLLECT_GCC_OPTIONS='-S' '-O3' '-march=native' '-o' 'LU.native' '-v'
'-shared-libgcc'
/afs/cern.ch/user/i/innocent/w2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/cc1plus
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858
--- Comment #5 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
I remember something similar in the past:
--param max-completely-peel-times=1
sort of fixes it… (why does PRE not recognize that 1/(1+0) == 1, btw??
of course it is just
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
In the following example, div uses ymm registers while sqr uses only xmm ones.
gcc version 4.9.0 20130630 (experimental) [trunk revision 200570] (GCC)
cat avx2sqrt.cc
#include <math.h>
double div
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858
--- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
actually the code for div and sqr is different already for standard SSE
c++ -std=c++11 -Ofast -S avx2sqrt.cc -ftree-vectorizer-verbose=1 -Wall ; cat avx2sqrt.s
.L2
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
I am sure this has already been discussed, though I have not found a specific report.
Below, the code emitted for add is what is expected; for bar, gcc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57823
--- Comment #3 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
indeed
float * bar3() {
const float * a = (float*) malloc(4*128);
const float * b = (float*) malloc(4*128);
float * c = (float*) malloc(4*128);
a = (const
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
At least in scimark2 sparse matrix multiplication, the use of gather
instructions results in code bloat and a substantial reduction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789
--- Comment #13 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
I just submitted a specific bug-report as PR57796
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
In the following code, the loop in red does not vectorize because of
note: reduction: not commutative/associative: s_12 = (unsigned int) _11
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57169
Bug #: 57169
Summary: fully unrolled matrix multiplication not vectorized
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57162
Bug #: 57162
Summary: Ofast does not make use of avx while O3 does
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57132
Bug #: 57132
Summary: spurious warning: division by zero [-Wdiv-by-zero] in
if (m) res %=m;
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57110
Bug #: 57110
Summary: is the use of uint_fast32_t in random intentional?
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57110
--- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch
2013-04-29 11:47:54 UTC ---
Understood.
The question should then be escalated to the C++ standard committee.
In my opinion, the use of a 32-bit unsigned int
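For context, the C++11 standard itself ([rand.predef]) declares `std::minstd_rand` and `std::mt19937` with `uint_fast32_t` as their integer type, so their `result_type` is as wide as the platform makes that type (64 bits on x86-64 glibc), even though the generated values never exceed 32 bits. A small check, with a hypothetical helper `draw32` showing that narrowing each value to `uint32_t` is lossless:

```cpp
#include <cstdint>
#include <random>
#include <type_traits>

// [rand.predef]: both engines are declared with uint_fast32_t.
static_assert(std::is_same<std::minstd_rand::result_type, std::uint_fast32_t>::value,
              "minstd_rand result_type is uint_fast32_t");
static_assert(std::is_same<std::mt19937::result_type, std::uint_fast32_t>::value,
              "mt19937 result_type is uint_fast32_t");

// The value range is 32-bit regardless of sizeof(uint_fast32_t).
static_assert(std::mt19937::max() == 0xFFFFFFFFu, "32-bit output range");

// Hypothetical helper: store draws compactly; the cast cannot lose bits.
inline std::uint32_t draw32(std::mt19937& g) {
  return static_cast<std::uint32_t>(g());
}
```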
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56829
Bug #: 56829
Summary: Feature request: generic builtin for movemask
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789
vincenzo Innocente vincenzo.innocente at cern dot ch changed:
What|Removed |Added
CC
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56541
Bug #: 56541
Summary: vectorizaton fails in conditional assignment of a
constant
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55266
--- Comment #4 from vincenzo Innocente vincenzo.innocente at cern dot ch
2013-03-03 11:58:24 UTC ---
I still see problems when calling inline functions.
It seems that the code to satisfy the calling ABI is generated anyhow.
Take
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50728
--- Comment #5 from vincenzo Innocente vincenzo.innocente at cern dot ch
2013-03-03 12:01:23 UTC ---
Cross-post with PR55266;
feel free to consolidate into a single PR.
I still see problems when calling inline functions.
It seems
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56381
Bug #: 56381
Summary: ICE: cc1plus: internal compiler error: in
gimplify_expr, at gimplify.c:7842
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56381
--- Comment #1 from vincenzo Innocente vincenzo.innocente at cern dot ch
2013-02-18 17:10:03 UTC ---
Created attachment 29484
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29484
preprocessed file of user code (sorry for not reducing)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56273
--- Comment #9 from vincenzo Innocente vincenzo.innocente at cern dot ch
2013-02-12 16:24:11 UTC ---
I am just rebuilding (Updated to revision 195983.) and noticed
/home/data/newsoft/gcc-build/./gcc/xgcc -B/home/data/newsoft/gcc-build/./gcc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760
--- Comment #5 from vincenzo Innocente vincenzo.innocente at cern dot ch
2013-01-08 15:29:18 UTC ---
We just got hit by this great type of code (copysign is unknown to
scientists);
most probably gcc could optimize it under -Ofast to return
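A minimal sketch of the kind of code meant here and its `std::copysign` equivalent; the function names are hypothetical. For any finite, nonzero `x`, `x/std::abs(x)` is exactly +1 or -1 with the sign of `x`, which `std::copysign(1.f, x)` produces without a division. The two differ for zero (0/0 is NaN), infinities and NaN, which is why such a rewrite is only valid under assumptions like `-ffinite-math-only`, which `-Ofast` implies.

```cpp
#include <cmath>

// "Scientist" version: the sign of x as +1 or -1, via a division.
float sign_div(float x) { return x / std::abs(x); }

// Same result for finite nonzero x, with no division.
float sign_copysign(float x) { return std::copysign(1.f, x); }
```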
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55912
Bug #: 55912
Summary: missing optimization of x/x and x/std::abs(x)
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55723
vincenzo Innocente vincenzo.innocente at cern dot ch changed:
What|Removed |Added
Summary|SLP vectorization vs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760
Bug #: 55760
Summary: scalar code non using rsqrtss and rcpss
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760
--- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch
2012-12-20 15:55:03 UTC ---
Thanks.
"Not safe" meaning producing incorrect results?
Is it documented?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55726
--- Comment #4 from vincenzo Innocente vincenzo.innocente at cern dot ch
2012-12-19 13:25:16 UTC ---
I understand your concern, Marc.
I think that the compiler shall either prefer double or produce
error: call of overloaded 'f(float
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55726
--- Comment #2 from vincenzo Innocente vincenzo.innocente at cern dot ch
2012-12-18 11:39:22 UTC ---
no
gcc -Ofast -march=corei7 assign.c -std=c99
assign.c: In function ‘main’:
assign.c:9:21: error: incompatible types when initializing type
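The report concerns assigning a scalar to a GCC vector, which C rejects at initialization as shown above. A minimal portable way to broadcast a scalar, assuming a four-float vector type like the `V4` used in other reports here, is to spell out every lane in a brace initializer; `splat` and `scale` are hypothetical helpers. (GCC's documentation also allows one operand of a binary vector operation to be a scalar, which is then broadcast.)

```cpp
typedef float __attribute__((vector_size(16))) V4;

// Hypothetical helper: broadcast one scalar into all four lanes.
inline V4 splat(float x) {
  V4 v = {x, x, x, x};
  return v;
}

// Hypothetical use: multiply every lane of v by s.
inline V4 scale(V4 v, float s) { return v * splat(s); }
```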
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55723
Bug #: 55723
Summary: SLP vectorization vs loop: SLP more efficient!
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55723
vincenzo Innocente vincenzo.innocente at cern dot ch changed:
What|Removed |Added
Summary|SLP vectorization vs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55726
Bug #: 55726
Summary: assignment of a scalar to a vector
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55662
Bug #: 55662
Summary: SLP vectorization of sqrt fails if in a loop
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55645
--- Comment #4 from vincenzo Innocente vincenzo.innocente at cern dot ch
2012-12-11 10:43:21 UTC ---
Sure, use c[i]=std::sqrt(a[i]/b[i]);
Recent literature is full of examples, mostly related to GPU code;
see for instance
Random
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55645
--- Comment #5 from vincenzo Innocente vincenzo.innocente at cern dot ch
2012-12-11 14:34:45 UTC ---
In principle, one could add the movmsk unconditionally (well, if advantageous)
after the compare and evaluate only one of the legs
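The idea in this comment can be illustrated in plain scalar C++: evaluate the cheap comparison for a whole block of lanes, pack the per-lane results into a bitmask (the role movmsk plays on x86), and run the expensive leg only when some lane needs it. This is a hypothetical sketch of the control flow, not of what GCC generates; the block width of 4 and the names `expensive` and `process` are assumptions.

```cpp
#include <cmath>

// Hypothetical costly leg, only needed for negative inputs.
inline float expensive(float v) { return std::exp(v) - 1.f; }

// out[i] = a[i] >= 0 ? a[i] : expensive(a[i]), processed in blocks of 4.
void process(const float* a, float* out, int n) {
  int i = 0;
  for (; i + 4 <= n; i += 4) {
    // Per-lane compare packed into a bitmask, as movmsk would produce.
    unsigned mask = 0;
    for (int l = 0; l < 4; ++l)
      mask |= static_cast<unsigned>(a[i + l] < 0.f) << l;
    for (int l = 0; l < 4; ++l) out[i + l] = a[i + l];  // cheap leg
    if (mask != 0) {  // unlikely case: fix up only the lanes that need it
      for (int l = 0; l < 4; ++l)
        if (mask & (1u << l)) out[i + l] = expensive(a[i + l]);
    }
  }
  for (; i < n; ++i)  // scalar remainder
    out[i] = a[i] >= 0.f ? a[i] : expensive(a[i]);
}
```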
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55645
Bug #: 55645
Summary: skipping unlike branch in vectorized loops using
movmsk or equivalent
Classification: Unclassified
Product: gcc
Version: 4.8.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55573
Bug #: 55573
Summary: [4.8 ICE] internal compiler error: in
adjust_temp_type, at cp/semantics.c:6454
Classification: Unclassified
Product: gcc
Version: 4.8.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55573
--- Comment #7 from vincenzo Innocente vincenzo.innocente at cern dot ch
2012-12-03 13:54:14 UTC ---
Thanks for the quick fix.
Could you please verify why this other constructor
constexpr Rot3( T xx, T xy, T xz, T yx, T yy, T yz, T zx, T zy
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53094
--- Comment #7 from vincenzo Innocente vincenzo.innocente at cern dot ch
2012-12-03 14:29:54 UTC ---
A bit of cross-posting with PR55573, sorry.
this
typedef float __attribute__( ( vector_size( 4*sizeof(float) ) ) ) V4;
constexpr V4 build(float
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55573
--- Comment #8 from vincenzo Innocente vincenzo.innocente at cern dot ch
2012-12-03 14:32:06 UTC ---
comment 7 reduced to
typedef float __attribute__( ( vector_size( 4*sizeof(float) ) ) ) V4;
constexpr V4 build(float x,float y, float z
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53094
--- Comment #9 from vincenzo Innocente vincenzo.innocente at cern dot ch
2012-12-03 19:15:09 UTC ---
adding it helps
t = build_constructor (TREE_TYPE (t), n);
+ if (TREE_CODE (TREE_TYPE (t)) == VECTOR_TYPE)
+t = fold (t