[Bug tree-optimization/50374] Support vectorization of min/max location pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch changed: What|Removed |Added CC||Joost.VandeVondele at mat dot ethz ||.ch --- Comment #28 from Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch --- actually, minloc/maxloc are also Fortran intrinsics, and while expanded inline, don't vectorize either. The following testcase runs 5x faster with ifort than with gfortran: INTEGER FUNCTION ana(a) REAL :: a(1024) ana=maxloc(a,1) END FUNCTION INTEGER :: NRUN=100 REAL :: a(1024) INTEGER :: ana,I a=(/(I,I=1,1024)/) DO I=1,NRUN j=j+ana(a) ENDDO WRITE(6,*) NRUN,j,ana(a) END ifort -fast -no-ipo -fno-inline-functions t.f90 ; time ./a.out 100 1024001024 real0m0.233s user0m0.228s sys0m0.004s gfortran -Ofast -fno-inline-functions t.f90 ; time ./a.out 100 1024001024 real0m1.193s user0m1.188s sys0m0.000s
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 Steven Bosscher steven at gcc dot gnu.org changed: What|Removed |Added Keywords||missed-optimization Status|UNCONFIRMED |NEW Last reconfirmed||2014-08-24 Ever confirmed|0 |1
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 vincenzo Innocente vincenzo.innocente at cern dot ch changed: What|Removed |Added Known to fail||4.9.1 --- Comment #27 from vincenzo Innocente vincenzo.innocente at cern dot ch --- coming back to this old issue. Any chance to see it implemented in the auto-vectorizer soon? using extended vectors I manage to vectorize min_element as below. In principle the auto-vectorizer should be able to do the same starting from the loop in comment 3 typedef float __attribute__( ( vector_size( 16 ) ) ) float32x4_t; typedef float __attribute__( ( vector_size( 16 ) , aligned(4) ) ) float32x4a4_t; typedef int __attribute__( ( vector_size( 16 ) ) ) int32x4_t; inline float32x4_t load(float const * x) { return *(float32x4a4_t const *)(x); } int minloc(float const * x, int N) { float32x4_t v0; int32x4_t index; auto M = 4*(N/4); for (int i=M; iN; ++i) { v0[i-M] = x[i]; index[i]=i; } for (int i=N; iM+4;++i) { v0[i-M] = x[0]; index[i]=0; } int32x4_t j = {0,1,2,3}; for (int i=0; iM; i+=4) { decltype(auto) v = load(x+i); index = (vv0) ? j : index; v0 = (vv0) ? v : v0; j+=4; } auto k = 0; for (int i=1;i4; ++i) if (v0[i]v0[k]) k=i; return index[k]; } #includeiostream #includealgorithm #include x86intrin.h unsigned int taux=0; inline unsigned long long rdtscp() { return __rdtscp(taux); } int main() { float x[1024]; for (int i=0; i1024; ++i) x[i]= i%2 ? i : -i; for (int i = 0; i10; ++i) { std::random_shuffle(x,x+1024); long long ts = -rdtscp(); int l1 = std::min_element(x+i,x+1024) - (x+i); ts +=rdtscp(); long long tv = -rdtscp(); int l2 = minloc(x+i,1024-i); tv +=rdtscp(); std::cout min is at l1 ' ' ts std::endl; std::cout minloc l2 ' ' tv std::endl; } return 0; } which result in a pretty good speed up c++ -std=c++1y -Ofast minloc.cc -march=nehalem ./a.out ./a.out min is at 959 13780 minloc 959 2380 min is at 536 13680 minloc 536 4972 min is at 513 13648 minloc 513 1848 min is at 825 13640 minloc 825 1924 min is at 885 13628 minloc 885 1644 min is at 636 11252 minloc 636 1536 min is at 982 11240 minloc 982 1416 min is at 382 11228 minloc 382 1392 min is at 271 11216 minloc 271 1340 min is at 50 11204 minloc 50 1384
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #26 from vincenzo Innocente vincenzo.innocente at cern dot ch 2011-11-30 09:13:08 UTC --- using a minimal configuration like ./configure --prefix=/something --enable-languages=c,c++ --disable-multilib --disable-bootstrap --enable-lto -enable-gold=yes -disable-libitm I manage to build with the patch int lmin(float const * __restrict__ c, int N) { int k=0; for (int i=1; i!=N; ++i) k = (c[k] c[i]) ? i : k; return k; } indeed vectorizes (sse, not avx). The amount of code emitted is quite large though, it will most probably worth only in a long loop with more instructions to vectorize: the effective speedup need to be properly evaluated case by case
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #21 from vincenzo Innocente vincenzo.innocente at cern dot ch 2011-11-29 09:54:49 UTC --- My understanding is that we cannot count to see a min/max location pattern vectorization in 4.7, isn't it?
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #22 from Jakub Jelinek jakub at gcc dot gnu.org 2011-11-29 09:58:40 UTC --- I gave up on it, because it still needed lots of changes in the generic vectorizer. Guess I can update the patch so that it at least applies, but there is more work on it.
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #23 from vincenzo Innocente vincenzo.innocente at cern dot ch 2011-11-29 10:10:06 UTC --- a working patch could be useful to check performance and quality of the generated ASM w.r.t. manually crafted soutions
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #24 from Jakub Jelinek jakub at gcc dot gnu.org 2011-11-29 14:46:59 UTC --- Created attachment 25945 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25945 gcc47-pr50374.patch This updated patch applies (had to remove various parts from the older patch that have been upstreamed independently and adjust e.g. to the vectorizable_condition SLP changes), and passes vect.exp=\*minmax\* testing, haven't tested anything beyond that. The above written issues still apply of course.
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #25 from vincenzo Innocente vincenzo.innocente at cern dot ch 2011-11-29 15:24:02 UTC --- Thanks Jakub for the new patch. Using it I get this: any Idea how to fix/avoid? home/data/newsoft/gcc-trunk/host-x86_64-unknown-linux-gnu/gcc/xgcc -B/home/data/newsoft/gcc-trunk/host-x86_64-unknown-linux-gnu/gcc/ -B/afs/cern.ch/user/i/innocent/w2/x86_64-unknown-linux-gnu/bin/ -B/afs/cern.ch/user/i/innocent/w2/x86_64-unknown-linux-gnu/lib/ -isystem /afs/cern.ch/user/i/innocent/w2/x86_64-unknown-linux-gnu/include -isystem /afs/cern.ch/user/i/innocent/w2/x86_64-unknown-linux-gnu/sys-include-g -O2 -ftree-vectorize -fPIC -O2 -g -O2 -ftree-vectorize -fPIC -DIN_GCC -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fPIC -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -fPIC -I. -I. -I../../host-x86_64-unknown-linux-gnu/gcc -I../.././libgcc -I../.././libgcc/. -I../.././libgcc/../gcc -I../.././libgcc/../include -I../.././libgcc/config/libbid -DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS -DUSE_TLS -o bid_binarydecimal.o -MT bid_binarydecimal.o -MD -MP -MF bid_binarydecimal.dep -c ../.././libgcc/config/libbid/bid_binarydecimal.c ../.././libgcc/config/libbid/bid_binarydecimal.c: In function '__bid32_to_binary128': ../.././libgcc/config/libbid/bid_binarydecimal.c:145115:1: internal compiler error: vector VEC(vec_void_p,base) index domain error, in vinfo_for_stmt at tree-vectorizer.h:636
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #18 from Ira Rosen irar at il dot ibm.com 2011-09-22 07:51:35 UTC --- (In reply to comment #16) and -3.c fails with an ICE in the vectorizer, Ira, could you look at that? --- tree-vect-stmts.c 2011-09-22 09:48:34.0 +0200 +++ tree-vect-stmts.c.new 2011-09-22 09:48:14.0 +0200 @@ -4806,7 +4806,8 @@ vectorizable_condition (gimple stmt, gim return false; gcc_assert (ncopies = 1); - if (reduc_index ncopies 1) + if ((reduc_index || STMT_VINFO_COMPOUND_PATTERN (stmt_info)) + (ncopies 1)) return false; /* FORNOW */ if (!STMT_VINFO_RELEVANT_P (stmt_info)
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Attachment #25333|0 |1 is obsolete|| --- Comment #19 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-22 15:10:34 UTC --- Created attachment 25341 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25341 gcc47-pr50374.patch Thanks. Here is an updated patch, with hopefully fixed backend part, which passes the whole newly added testsuite. Unfortunately, even with -fno-tree-pre -fno-vect-cost-model, on the *-12.c testcase it vectorizes just 12 loops (f_*_[fiu]_u), not even with -mavx2 where e.g. I'd expect f_*_{d,ll,ull}_ull to be vectorized too, or e.g. the [fiu]_i etc. Seems the pattern recognizer is just too restrictive in finding the IV. On the other side as I wrote earlier, the check whether the index is strigly increasing through the whole loop is missing (if the loop bounds are known, I guess we could check its POLYNOMIAL_CHREC whether it has the expected form and whether it won't wrap/overflow, and if the loop bound is unknown, if there is addition done in a signed type, assume it won't wrap, and for unsigned if the increment is 1 and it has the size bigger or equal to pointer size and init 0/1, then it won't wrap either. BTW, I think on i?86/x86_64 we could in theory support even mixed size reductions, e.g. when the index is long long (64-bit) and comparison int or float, then I think we could use {,v}pmovsxdq instruction to extend the mask where extremes are present from vector of 4 ints or floats to a vector of 4 long longs.
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #20 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-22 15:52:11 UTC --- The reason why many of the loops in *-12.c aren't vectorized is that for idxtype other than unsigned int which is the type of i there is a cast from i to the type of pos. OT, one big disadvantage of the latest patch is that the code to compute the extreme is different between normal smin/smax/umin/umax reduction and between the first part of the min/max_loc reduction - the former is faster and only computes the extreme in the lowest element of the vector, while the latter computes it duplicated in all vector elements. Which means if a loop wants to compute both the extreme and position of the extreme, it now results in unnecessary extra instructions. We don't support currently two results operations in GIMPLE, so I wonder how to express that. Perhaps use different min/max reduction expression in that case? Say REDUC_MIN_LOC_VALUE_EXPR etc., which would use reduc_min_loc_* optab to expand as well? That optab would have two targets, and if we wanted to expand REDUC_MIN_LOC_EXPR, we'd use one of the targets and pass a gen_reg_rtx () to the other one, for REDUC_MIN_LOC_VALUE_EXPR the other way around. Thoughts?
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Attachment #25320|0 |1 is obsolete|| Attachment #25324|0 |1 is obsolete|| Attachment #25325|0 |1 is obsolete|| --- Comment #14 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-21 12:10:28 UTC --- Created attachment 25331 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25331 gcc47-pr50374.patch Updated patch, not very heavily tested yet. The optabs stuff needs to change, because currently i?86 has 480 gen_reduc*loc* expanders, that's way too much. Thus I think it will be better to only iterate on the two modes, and pass the comparison operator as another argument like we pass to vcond{,u}, from which the expansion can determine what kind of comparison is supposed to happen (signed vs. unsigned, min vs. max and first vs. last). I'll change it now. Another thing is that this really ought to work even with -ftree-pre, having a vectorization that requires users to disable PRE would be weird. I believe there is one extra assignment, will look at that. Then there is the question if ifcvt should, as richi asked, fold the COND_EXPR into MIN/MAX and vectorizer be adjusted for that. Or, if we just should it vectorize it using two VECT_COND_EXPRs instead (with the same condition). Then, can also vectorize e.g. double comparisons, but only int indexes?
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #15 from Ira Rosen irar at il dot ibm.com 2011-09-21 12:20:47 UTC --- (In reply to comment #14) Another thing is that this really ought to work even with -ftree-pre, having a vectorization that requires users to disable PRE would be weird. I believe there is one extra assignment, will look at that. It was discussed in pr 44711 and http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02982.html.
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Attachment #25331|0 |1 is obsolete|| --- Comment #16 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-21 16:08:29 UTC --- Created attachment 25333 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25333 gcc47-pr50374.patch Updated patch which changes the optab as I mentioned, and adds the testcases (both original Ira's and two new testcases, one which tries various type combinations). Currently on both x86_64 and i686 the big new testcase (-12.c) fails, apparently some issue with float comparison and unsigned int index, likely a backend issue, debugging, and -3.c fails with an ICE in the vectorizer, Ira, could you look at that? Let's leave the PRE issue for PRE for now, I'll try tomorrow to adjust ifcvt for the min/max folding.
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #17 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-21 16:57:08 UTC --- Seems my reduction patterns just give some extreme's index, but not necessarily the first or last extreme's index. It was just narrowing down the index vector together with the comparison vector the same way, picking the values the comparison picked. Seems Ira's patch does the right thing, in particular, finding the minimum resp. maximum (then at least on i?86 I'll need to broadcast that value to all vector members as the shifts have zero filled instead of kept the upper bits as is), then compare that to the original comparison vector (i.e. get mask where the minimum values were), mask the index vector with that mask, find the minimum (for *FIRST_LOC*) resp. maximum (for *LAST_LOC*) in the index vector (for FIRST_LOC ~mask actually needs to be ored in). I'll fix that up tomorrow. Now, does the generic vectorizer code ensure that the biv has the right properties (doesn't wrap in the loop and is strictly increasing)? And, if it is signed that it is never negative?
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #6 from Ira Rosen irar at il dot ibm.com 2011-09-20 08:27:08 UTC --- Thanks for working on this! It looks like the problem is with the way the stmts are marked. We don't insert pattern stmts now, so the things are more tricky. I'll try to fix this.
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #7 from Ira Rosen irar at il dot ibm.com 2011-09-20 11:45:53 UTC --- Created attachment 25322 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25322 fix Here is the fix (it's a diff relative to your patch).
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #8 from Ira Rosen irar at il dot ibm.com 2011-09-20 11:47:00 UTC --- Created attachment 25323 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25323 complete patch including my fix
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #9 from vincenzo Innocente vincenzo.innocente at cern dot ch 2011-09-20 12:05:01 UTC --- does not compile to me ../.././gcc/tree-vect-loop.c: In function 'vect_is_simple_reduction_1': ../.././gcc/tree-vect-loop.c:2237:35: warning: suggest parentheses around '' within '||' [-Wparentheses] ../.././gcc/tree-vect-loop.c:2238:5: error: expected ')' before '{' token ../.././gcc/tree-vect-loop.c:2402:1: error: expected expression before '}' token ../.././gcc/tree-vect-loop.c:2021:33: warning: unused variable 'def2' [-Wunused-variable] ../.././gcc/tree-vect-loop.c: At top level: ../.././gcc/tree-vect-loop.c:1798:1: warning: 'vect_is_slp_reduction' defined but not used [-Wunused-function] ../.././gcc/tree-vect-loop.c: In function 'vect_is_simple_reduction_1': ../.././gcc/tree-vect-loop.c:2402:1: warning: control reaches end of non-void function [-Wreturn-type] make[3]: *** [tree-vect-loop.o] Error 1 looks like some parentheses is missing in the if statement
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 Ira Rosen irar at il dot ibm.com changed: What|Removed |Added Attachment #25322|0 |1 is obsolete|| --- Comment #10 from Ira Rosen irar at il dot ibm.com 2011-09-20 12:17:46 UTC --- Created attachment 25324 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25324 sorry, fixed it now
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 Ira Rosen irar at il dot ibm.com changed: What|Removed |Added Attachment #25323|0 |1 is obsolete|| --- Comment #11 from Ira Rosen irar at il dot ibm.com 2011-09-20 12:18:50 UTC --- Created attachment 25325 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25325 complete patch
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #12 from vincenzo Innocente vincenzo.innocente at cern dot ch 2011-09-20 13:46:16 UTC --- I'm getting these errors ../.././gcc/optabs.c: In function 'optab_d* optab_for_tree_code(tree_code, const_tree, optab_subtype)': ../.././gcc/optabs.c:470:9: error: cannot convert 'convert_optab_d*' to 'optab' in return ../.././gcc/optabs.c:474:9: error: cannot convert 'convert_optab_d*' to 'optab' in return ../.././gcc/optabs.c:478:9: error: cannot convert 'convert_optab_d*' to 'optab' in return ../.././gcc/optabs.c:482:9: error: cannot convert 'convert_optab_d*' to 'optab' in return any idea what could be wrong?
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #13 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-20 14:01:55 UTC --- Ira, thanks for your help, I'll continue working on this now. To vincenzo, this patch isn't finished, so it is premature to test it.
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #4 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-19 12:51:59 UTC --- I've looked briefly at the http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00631.html patch, it seems the UNSPEC_REDUC* is unnecessary there (only used in the expanders ending with DONE, in that case just a list of the operands is enough), and I don't see any reason for the new RTL codes either (why not just use (gt:V4SI (reg:V4SF r1) (reg:V4SF r2)) etc. in the patterns instead?). Furthermore, the floating vs. integral operands in vcond* should now be handled on the trunk (just supported on i386/x86_64, but there is no reason why it can't be added for rs6000 too). The patch unfortunately doesn't apply cleanly, am trying to resolve the rejects now.
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-19 15:59:06 UTC --- Created attachment 25320 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25320 gcc47-pr50374-wip.patch Here is the ported patch with rejects hopefully resolved and a few bugfixes for the 4.6 - 4.7 changes. I've omitted the vcondc/vcondcu stuff because the trunk handles mixed float/integral vcond/u already, the rs6000 stuff (because IMHO the first step for rs6000 should be to add support for the mixed float/integral vcond/u for rs6000 and I'm not familiar with rs6000 enough), GTF/EQF etc. dropped (I think that it is not needed) and testcases separated out of it for now. I had to make changes beyond rejects as COND_EXPR assignments are now represented differently and the pattern stmts are no longer emitted into the basic blocks. I see no reason why this should be limited to floating point comparisons and integral indexes, IMHO e.g. finding location of smallest integer is common too. That said, I'm now stuck with it, on the fast-math-no-pre-minmax-loc-1.c (on x86_64 -O2 -ftree-vectorize -fno-tree-pre) testcase the compound pattern recognition seems to work, but still the vectorizer gives up: 17: Unsupported pattern. 17: not vectorized: unsupported use in stmt. 17: unexpected pattern. Ira, do you think you could have a quick look at this?
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 Ira Rosen irar at il dot ibm.com changed: What|Removed |Added CC||irar at il dot ibm.com --- Comment #1 from Ira Rosen irar at il dot ibm.com 2011-09-13 07:55:20 UTC --- I am not planning to work on this. Ira
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-13 08:29:24 UTC --- With the recent vcond* changes doing float comparison and having integral vectors after ? and : is fine.
[Bug tree-optimization/50374] Support vectorization of min/max location pattern
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374 --- Comment #3 from vincenzo Innocente vincenzo.innocente at cern dot ch 2011-09-13 08:45:40 UTC --- with gcc version 4.7.0 20110910 (experimental) (GCC) int lmin(float const * __restrict__ c, int N) { int k=0; for (int i=1; i!=N; ++i) k = (c[k] c[i]) ? i : k; return k; } does not vectorize. any suggestion how to modify it?