http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49483
           Summary: unable to vectorize code equivalent to "scalbnf"
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: vincenzo.innoce...@cern.ch

I'm trying to write simplified versions of trigonometric and transcendental
functions that gcc can auto-vectorize.
At the moment I'm blocked by the vectorization of "scalbnf".

I'm using code equivalent to the one in glibc
(sysdeps/ieee754/flt-32/s_scalbnf.c and math/math_private.h),
which in my C++ version reads:

cat vldexpf.cc
inline float i2f(int x) {
  union { float f; int i; } tmp;
  tmp.i=x;
  return tmp.f;
}

inline float vect_ldexpf(float x, int n) {
  n = (n+0x7f)<<23;
  return x * i2f(n);
}

float __attribute__ ((aligned(16))) a[1024];
float __attribute__ ((aligned(16))) b[1024];
float __attribute__ ((aligned(16))) c[1024];
void tV() {
  for (int i=0; i!=1024; ++i) {
    float z = a[i];
    int n = b[i];
    c[i] = vect_ldexpf(z,n);
  }
}

Compiling it produces:

c++ -Ofast -c vldexpf.cc -msse4.2 -ftree-vectorizer-verbose=7

vldexpf.cc:16: note: vect_model_load_cost: aligned.
vldexpf.cc:16: note: vect_get_data_access_cost: inside_cost = 1, outside_cost = 0.
vldexpf.cc:16: note: vect_model_load_cost: aligned.
vldexpf.cc:16: note: vect_get_data_access_cost: inside_cost = 2, outside_cost = 0.
vldexpf.cc:16: note: vect_model_store_cost: aligned.
vldexpf.cc:16: note: vect_get_data_access_cost: inside_cost = 3, outside_cost = 0.
vldexpf.cc:16: note: vect_model_load_cost: aligned.
vldexpf.cc:16: note: vect_model_load_cost: inside_cost = 1, outside_cost = 0 .
vldexpf.cc:16: note: vect_model_load_cost: aligned.
vldexpf.cc:16: note: vect_model_load_cost: inside_cost = 1, outside_cost = 0 .
vldexpf.cc:16: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 1 .
vldexpf.cc:16: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 1 .
vldexpf.cc:16: note: not vectorized: relevant stmt not supported: D.2243_14 = VIEW_CONVERT_EXPR<float>(n_13);
vldexpf.cc:15: note: vectorized 0 loops in function.

I'm using:

c++ -v
Using built-in specs.
COLLECT_GCC=c++
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-apple-darwin10.7.0/4.7.0/lto-wrapper
Target: x86_64-apple-darwin10.7.0
Configured with: ./configure --enable-languages=c,c++,fortran --enable-lto --with-build-config=bootstrap-lto CFLAGS='-O2 -ftree-vectorize -fPIC' CXXFLAGS='-O2 -fPIC -ftree-vectorize -fvisibility-inlines-hidden'
Thread model: posix
gcc version 4.7.0 20110528 (experimental) (GCC)
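
For readers unfamiliar with the trick in vect_ldexpf, here is a minimal scalar sketch of what the expression (n+0x7f)<<23 computes. It assumes IEEE 754 binary32 floats and a 32-bit int. The names bits_to_float, scale_by_pow2 and matches_ldexp are hypothetical and not part of the test case, and std::memcpy replaces the union from the report purely for illustration. Whether this variant changes the vectorizer's behaviour was not checked.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a 32-bit integer as a float.
// Assumption: std::memcpy stands in for the union used in the report.
inline float bits_to_float(std::int32_t x) {
    static_assert(sizeof(float) == sizeof(std::int32_t), "binary32 expected");
    float f;
    std::memcpy(&f, &x, sizeof f);
    return f;
}

// Build 2^n by writing the biased exponent (n + 127) into bits 23..30,
// leaving the sign and mantissa bits zero, then scale x by it.
// Only meaningful while 2^n is a normal float, i.e. -126 <= n <= 127.
inline float scale_by_pow2(float x, int n) {
    assert(n >= -126 && n <= 127);
    std::int32_t bits = static_cast<std::int32_t>(n + 0x7f) << 23;
    return x * bits_to_float(bits);
}

// Compare the bit trick against std::ldexp over the whole valid range of n.
bool matches_ldexp(float x) {
    for (int n = -126; n <= 127; ++n) {
        if (scale_by_pow2(x, n) != std::ldexp(x, n)) return false;
    }
    return true;
}
```

The reinterpretation step is the same operation the vectorizer reports as VIEW_CONVERT_EXPR<float> in the dump above. Outside the stated range of n the shortcut no longer yields a power of two, which is one way it differs from the full glibc scalbnf.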