[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2014-08-24 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch changed:

   What|Removed |Added

 CC||Joost.VandeVondele at mat dot 
ethz
   ||.ch

--- Comment #28 from Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch 
---
actually, minloc/maxloc are also Fortran intrinsics, and while expanded inline,
don't vectorize either. The following testcase runs 5x faster with ifort than
with gfortran:

INTEGER FUNCTION ana(a)
 REAL :: a(1024)
 ana=maxloc(a,1)
END FUNCTION


INTEGER :: NRUN=100
REAL :: a(1024)
INTEGER :: ana,I
a=(/(I,I=1,1024)/)
DO I=1,NRUN
   j=j+ana(a)
ENDDO
WRITE(6,*) NRUN,j,ana(a)
END

 ifort -fast -no-ipo -fno-inline-functions t.f90 ; time ./a.out
 100  1024001024

real0m0.233s
user0m0.228s
sys0m0.004s

 gfortran -Ofast -fno-inline-functions t.f90 ; time ./a.out
 100  1024001024

real0m1.193s
user0m1.188s
sys0m0.000s


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2014-08-24 Thread steven at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

Steven Bosscher steven at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2014-08-24
 Ever confirmed|0   |1


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2014-08-23 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

vincenzo Innocente vincenzo.innocente at cern dot ch changed:

   What|Removed |Added

  Known to fail||4.9.1

--- Comment #27 from vincenzo Innocente vincenzo.innocente at cern dot ch ---
coming back to this old issue.
Any chance to see it implemented in the auto-vectorizer soon?

using extended vectors I manage to vectorize min_element as below.
In principle the auto-vectorizer should be able to do the same starting from
the loop in comment 3


typedef float __attribute__( ( vector_size( 16 ) ) ) float32x4_t;
typedef float __attribute__( ( vector_size( 16 ) , aligned(4) ) )
float32x4a4_t;
typedef int __attribute__( ( vector_size( 16 ) ) ) int32x4_t;


inline
float32x4_t load(float const * x) {
   return *(float32x4a4_t const *)(x);
}


int minloc(float const * x, int N) {
  float32x4_t v0;
  int32x4_t index;

  auto M = 4*(N/4);
  for (int i=M; iN; ++i) {
v0[i-M] = x[i];
index[i]=i;
  }
  for (int i=N; iM+4;++i) {
v0[i-M] = x[0];
index[i]=0;
  }
  int32x4_t j = {0,1,2,3};
  for (int i=0; iM; i+=4) {
decltype(auto) v = load(x+i);
index =  (vv0) ? j : index;
v0 = (vv0) ? v : v0;
j+=4;
  }
  auto k = 0;
  for (int i=1;i4; ++i) if (v0[i]v0[k]) k=i;
  return index[k];
}


#includeiostream
#includealgorithm
#include x86intrin.h
unsigned int taux=0;
inline unsigned long long rdtscp() {
 return __rdtscp(taux);
}

int main() {

  float x[1024];
  for (int i=0; i1024; ++i) x[i]= i%2 ? i : -i;
  for (int i = 0; i10; ++i) {
   std::random_shuffle(x,x+1024);
   long long ts = -rdtscp();
   int l1 = std::min_element(x+i,x+1024) - (x+i);
   ts +=rdtscp();
   long long tv = -rdtscp();
   int l2 = minloc(x+i,1024-i);
   tv +=rdtscp();

std::cout  min is at   l1  ' '  ts  std::endl;
std::cout  minloc   l2  ' '  tv  std::endl;
  }
  return 0;

}


which result in a pretty good speed up
c++ -std=c++1y -Ofast minloc.cc -march=nehalem
./a.out
./a.out 
min is at 959 13780
minloc 959 2380
min is at 536 13680
minloc 536 4972
min is at 513 13648
minloc 513 1848
min is at 825 13640
minloc 825 1924
min is at 885 13628
minloc 885 1644
min is at 636 11252
minloc 636 1536
min is at 982 11240
minloc 982 1416
min is at 382 11228
minloc 382 1392
min is at 271 11216
minloc 271 1340
min is at 50 11204
minloc 50 1384


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-11-30 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #26 from vincenzo Innocente vincenzo.innocente at cern dot ch 
2011-11-30 09:13:08 UTC ---
using a minimal configuration like
./configure --prefix=/something --enable-languages=c,c++ --disable-multilib
--disable-bootstrap --enable-lto -enable-gold=yes -disable-libitm
I manage to build with the patch

int lmin(float const * __restrict__ c, int N) {
  int k=0;
  for (int i=1; i!=N; ++i)
 k =  (c[k]  c[i]) ? i : k;
  return k;
}
indeed vectorizes (sse, not avx).
The amount of code emitted is quite large though, it will most probably worth
only in a long loop
with more instructions to vectorize: the effective speedup need to be properly
evaluated case by case


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-11-29 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #21 from vincenzo Innocente vincenzo.innocente at cern dot ch 
2011-11-29 09:54:49 UTC ---
My understanding is that we cannot count to see a min/max location pattern
vectorization in 4.7, isn't it?


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-11-29 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #22 from Jakub Jelinek jakub at gcc dot gnu.org 2011-11-29 
09:58:40 UTC ---
I gave up on it, because it still needed lots of changes in the generic
vectorizer.  Guess I can update the patch so that it at least applies, but
there is more work on it.


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-11-29 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #23 from vincenzo Innocente vincenzo.innocente at cern dot ch 
2011-11-29 10:10:06 UTC ---
a working patch could be useful to check performance and quality of the
generated ASM w.r.t. manually crafted soutions


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-11-29 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #24 from Jakub Jelinek jakub at gcc dot gnu.org 2011-11-29 
14:46:59 UTC ---
Created attachment 25945
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25945
gcc47-pr50374.patch

This updated patch applies (had to remove various parts from the older patch
that have been upstreamed independently and adjust e.g. to the
vectorizable_condition SLP changes), and passes vect.exp=\*minmax\* testing,
haven't tested anything beyond that.  The above written issues still apply of
course.


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-11-29 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #25 from vincenzo Innocente vincenzo.innocente at cern dot ch 
2011-11-29 15:24:02 UTC ---
Thanks Jakub for the new patch.
Using it I get this: any Idea how to fix/avoid?

home/data/newsoft/gcc-trunk/host-x86_64-unknown-linux-gnu/gcc/xgcc
-B/home/data/newsoft/gcc-trunk/host-x86_64-unknown-linux-gnu/gcc/
-B/afs/cern.ch/user/i/innocent/w2/x86_64-unknown-linux-gnu/bin/
-B/afs/cern.ch/user/i/innocent/w2/x86_64-unknown-linux-gnu/lib/ -isystem
/afs/cern.ch/user/i/innocent/w2/x86_64-unknown-linux-gnu/include -isystem
/afs/cern.ch/user/i/innocent/w2/x86_64-unknown-linux-gnu/sys-include-g -O2
-ftree-vectorize -fPIC -O2  -g -O2 -ftree-vectorize -fPIC -DIN_GCC   -W -Wall
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes
-Wmissing-prototypes -Wold-style-definition  -isystem ./include   -fPIC -g
-DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector   -fPIC -I. -I.
-I../../host-x86_64-unknown-linux-gnu/gcc -I../.././libgcc -I../.././libgcc/.
-I../.././libgcc/../gcc -I../.././libgcc/../include
-I../.././libgcc/config/libbid -DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS 
-DUSE_TLS -o bid_binarydecimal.o -MT bid_binarydecimal.o -MD -MP -MF
bid_binarydecimal.dep -c ../.././libgcc/config/libbid/bid_binarydecimal.c
../.././libgcc/config/libbid/bid_binarydecimal.c: In function
'__bid32_to_binary128':
../.././libgcc/config/libbid/bid_binarydecimal.c:145115:1: internal compiler
error: vector VEC(vec_void_p,base) index domain error, in vinfo_for_stmt at
tree-vectorizer.h:636


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-22 Thread irar at il dot ibm.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #18 from Ira Rosen irar at il dot ibm.com 2011-09-22 07:51:35 UTC 
---
(In reply to comment #16)
 and -3.c fails with an ICE in the vectorizer, Ira,
 could you look at that?  

--- tree-vect-stmts.c   2011-09-22 09:48:34.0 +0200
+++ tree-vect-stmts.c.new   2011-09-22 09:48:14.0 +0200
@@ -4806,7 +4806,8 @@ vectorizable_condition (gimple stmt, gim
 return false;

   gcc_assert (ncopies = 1);
-  if (reduc_index  ncopies  1)
+  if ((reduc_index || STMT_VINFO_COMPOUND_PATTERN (stmt_info))
+   (ncopies  1))
 return false; /* FORNOW */

   if (!STMT_VINFO_RELEVANT_P (stmt_info)


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-22 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

  Attachment #25333|0   |1
is obsolete||

--- Comment #19 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-22 
15:10:34 UTC ---
Created attachment 25341
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25341
gcc47-pr50374.patch

Thanks.  Here is an updated patch, with hopefully fixed backend part, which
passes the whole newly added testsuite.
Unfortunately, even with -fno-tree-pre -fno-vect-cost-model, on the *-12.c
testcase it vectorizes just 12 loops (f_*_[fiu]_u), not even with -mavx2 where
e.g. I'd expect f_*_{d,ll,ull}_ull to be vectorized too, or e.g. the [fiu]_i
etc.  Seems the pattern recognizer is just too restrictive in finding the IV.
On the other side as I wrote earlier, the check whether the index is strigly
increasing through the whole loop is missing (if the loop bounds are known, I
guess we could check its POLYNOMIAL_CHREC whether it has the expected form and
whether it won't wrap/overflow, and if the loop bound is unknown, if there is
addition done in a signed type, assume it won't wrap, and for unsigned if the
increment is 1 and it has the size bigger or equal to pointer size and init
0/1, then it won't wrap either.

BTW, I think on i?86/x86_64 we could in theory support even mixed size
reductions, e.g. when the index is long long (64-bit) and comparison int or
float, then I think we could use {,v}pmovsxdq instruction to extend the mask
where extremes are present from vector of 4 ints or floats to a vector of 4
long longs.


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-22 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #20 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-22 
15:52:11 UTC ---
The reason why many of the loops in *-12.c aren't vectorized is that for
idxtype other than unsigned int which is the type of i there is a cast from i
to the type of pos.

OT, one big disadvantage of the latest patch is that the code to compute
the extreme is different between normal smin/smax/umin/umax reduction and
between the first part of the min/max_loc reduction - the former is faster and
only computes the extreme in the lowest element of the vector, while the latter
computes it duplicated in all vector elements.  Which means if a loop wants to
compute both the extreme and position of the extreme, it now results in
unnecessary extra instructions.  We don't support currently two results
operations in GIMPLE, so I wonder how to express that.  Perhaps use different
min/max reduction expression in that case?  Say REDUC_MIN_LOC_VALUE_EXPR etc.,
which would use reduc_min_loc_* optab to expand as well?  That optab would have
two targets, and if we wanted to expand REDUC_MIN_LOC_EXPR, we'd use one of the
targets and pass a gen_reg_rtx () to the other one, for
REDUC_MIN_LOC_VALUE_EXPR the other way around.  Thoughts?


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-21 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

  Attachment #25320|0   |1
is obsolete||
  Attachment #25324|0   |1
is obsolete||
  Attachment #25325|0   |1
is obsolete||

--- Comment #14 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-21 
12:10:28 UTC ---
Created attachment 25331
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25331
gcc47-pr50374.patch

Updated patch, not very heavily tested yet.
The optabs stuff needs to change, because currently i?86 has 480 gen_reduc*loc*
expanders, that's way too much.

Thus I think it will be better to only iterate on the two modes, and pass
the comparison operator as another argument like we pass to vcond{,u}, from
which the expansion can determine what kind of comparison is supposed to happen
(signed vs. unsigned, min vs. max and first vs. last).  I'll change it now.

Another thing is that this really ought to work even with -ftree-pre, having a
vectorization that requires users to disable PRE would be weird.  I believe
there is one extra assignment, will look at that.

Then there is the question if ifcvt should, as richi asked, fold the COND_EXPR
into MIN/MAX and vectorizer be adjusted for that.  Or, if we just should it
vectorize it using two VECT_COND_EXPRs instead (with the same condition).

Then, can also vectorize e.g. double comparisons, but only int indexes?


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-21 Thread irar at il dot ibm.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #15 from Ira Rosen irar at il dot ibm.com 2011-09-21 12:20:47 UTC 
---
(In reply to comment #14)

 Another thing is that this really ought to work even with -ftree-pre, having a
 vectorization that requires users to disable PRE would be weird.  I believe
 there is one extra assignment, will look at that.

It was discussed in pr 44711 and
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02982.html.


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-21 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

  Attachment #25331|0   |1
is obsolete||

--- Comment #16 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-21 
16:08:29 UTC ---
Created attachment 25333
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25333
gcc47-pr50374.patch

Updated patch which changes the optab as I mentioned, and adds the testcases
(both original Ira's and two new testcases, one which tries various type
combinations).
Currently on both x86_64 and i686 the big new testcase (-12.c) fails,
apparently some issue with float comparison and unsigned int index, likely a
backend issue, debugging, and -3.c fails with an ICE in the vectorizer, Ira,
could you look at that?  Let's leave the PRE issue for PRE for now, I'll try
tomorrow to adjust ifcvt for the min/max folding.


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-21 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #17 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-21 
16:57:08 UTC ---
Seems my reduction patterns just give some extreme's index, but not necessarily
the first or last extreme's index.  It was just narrowing down the index vector
together with the comparison vector the same way, picking the values the
comparison picked.  Seems Ira's patch does the right thing, in particular,
finding the minimum resp. maximum (then at least on i?86 I'll need to broadcast
that value to all vector members as the shifts have zero filled instead of kept
the upper bits as is), then compare that to the original comparison vector
(i.e. get mask where the minimum values were), mask the index vector with that
mask, find the minimum (for *FIRST_LOC*) resp. maximum (for *LAST_LOC*) in the
index vector (for FIRST_LOC ~mask actually needs to be ored in).
I'll fix that up tomorrow.

Now, does the generic vectorizer code ensure that the biv has the right
properties (doesn't wrap in the loop and is strictly increasing)?  And, if it
is signed that it is never negative?


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-20 Thread irar at il dot ibm.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #6 from Ira Rosen irar at il dot ibm.com 2011-09-20 08:27:08 UTC 
---
Thanks for working on this!
It looks like the problem is with the way the stmts are marked. We don't insert
pattern stmts now, so the things are more tricky. I'll try to fix this.


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-20 Thread irar at il dot ibm.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #7 from Ira Rosen irar at il dot ibm.com 2011-09-20 11:45:53 UTC 
---
Created attachment 25322
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25322
fix

Here is the fix (it's a diff relative to your patch).


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-20 Thread irar at il dot ibm.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #8 from Ira Rosen irar at il dot ibm.com 2011-09-20 11:47:00 UTC 
---
Created attachment 25323
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25323
complete patch including my fix


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-20 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #9 from vincenzo Innocente vincenzo.innocente at cern dot ch 
2011-09-20 12:05:01 UTC ---
does not compile to me

../.././gcc/tree-vect-loop.c: In function 'vect_is_simple_reduction_1':
../.././gcc/tree-vect-loop.c:2237:35: warning: suggest parentheses around ''
within '||' [-Wparentheses]
../.././gcc/tree-vect-loop.c:2238:5: error: expected ')' before '{' token
../.././gcc/tree-vect-loop.c:2402:1: error: expected expression before '}'
token
../.././gcc/tree-vect-loop.c:2021:33: warning: unused variable 'def2'
[-Wunused-variable]
../.././gcc/tree-vect-loop.c: At top level:
../.././gcc/tree-vect-loop.c:1798:1: warning: 'vect_is_slp_reduction' defined
but not used [-Wunused-function]
../.././gcc/tree-vect-loop.c: In function 'vect_is_simple_reduction_1':
../.././gcc/tree-vect-loop.c:2402:1: warning: control reaches end of non-void
function [-Wreturn-type]
make[3]: *** [tree-vect-loop.o] Error 1

looks like some parentheses is missing in the if statement


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-20 Thread irar at il dot ibm.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

Ira Rosen irar at il dot ibm.com changed:

   What|Removed |Added

  Attachment #25322|0   |1
is obsolete||

--- Comment #10 from Ira Rosen irar at il dot ibm.com 2011-09-20 12:17:46 UTC 
---
Created attachment 25324
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25324
sorry, fixed it now


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-20 Thread irar at il dot ibm.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

Ira Rosen irar at il dot ibm.com changed:

   What|Removed |Added

  Attachment #25323|0   |1
is obsolete||

--- Comment #11 from Ira Rosen irar at il dot ibm.com 2011-09-20 12:18:50 UTC 
---
Created attachment 25325
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25325
complete patch


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-20 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #12 from vincenzo Innocente vincenzo.innocente at cern dot ch 
2011-09-20 13:46:16 UTC ---
I'm getting these errors

../.././gcc/optabs.c: In function 'optab_d* optab_for_tree_code(tree_code,
const_tree, optab_subtype)':
../.././gcc/optabs.c:470:9: error: cannot convert 'convert_optab_d*' to 'optab'
in return
../.././gcc/optabs.c:474:9: error: cannot convert 'convert_optab_d*' to 'optab'
in return
../.././gcc/optabs.c:478:9: error: cannot convert 'convert_optab_d*' to 'optab'
in return
../.././gcc/optabs.c:482:9: error: cannot convert 'convert_optab_d*' to 'optab'
in return

any idea what could be wrong?


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-20 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #13 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-20 
14:01:55 UTC ---
Ira, thanks for your help, I'll continue working on this now.
To vincenzo, this patch isn't finished, so it is premature to test it.


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #4 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-19 
12:51:59 UTC ---
I've looked briefly at the
http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00631.html
patch, it seems the UNSPEC_REDUC* is unnecessary there (only used in the
expanders ending with DONE, in that case just a list of the operands is
enough),
and I don't see any reason for the new RTL codes either (why not just use
(gt:V4SI (reg:V4SF r1) (reg:V4SF r2)) etc. in the patterns instead?).
Furthermore, the floating vs. integral operands in vcond* should now be handled
on the trunk (just supported on i386/x86_64, but there is no reason why it
can't be added for rs6000 too).
The patch unfortunately doesn't apply cleanly, am trying to resolve the rejects
now.


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-19 
15:59:06 UTC ---
Created attachment 25320
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25320
gcc47-pr50374-wip.patch

Here is the ported patch with rejects hopefully resolved and a few bugfixes for
the 4.6 - 4.7 changes.
I've omitted the vcondc/vcondcu stuff because the trunk handles mixed
float/integral vcond/u already, the rs6000 stuff (because IMHO the first step
for rs6000 should be to add support for the mixed float/integral vcond/u for
rs6000 and I'm not familiar with rs6000 enough), GTF/EQF etc. dropped (I think
that it is not needed) and testcases separated out of it for now.
I had to make changes beyond rejects as COND_EXPR assignments are now
represented differently and the pattern stmts are no longer emitted into the
basic blocks.  I see no reason why this should be limited to floating point
comparisons and integral indexes, IMHO e.g. finding location of smallest
integer is common too.

That said, I'm now stuck with it, on the fast-math-no-pre-minmax-loc-1.c (on
x86_64 -O2 -ftree-vectorize -fno-tree-pre) testcase the compound pattern
recognition seems to work, but still the vectorizer gives up:
17: Unsupported pattern.
17: not vectorized: unsupported use in stmt.
17: unexpected pattern.

Ira, do you think you could have a quick look at this?


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-13 Thread irar at il dot ibm.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

Ira Rosen irar at il dot ibm.com changed:

   What|Removed |Added

 CC||irar at il dot ibm.com

--- Comment #1 from Ira Rosen irar at il dot ibm.com 2011-09-13 07:55:20 UTC 
---
I am not planning to work on this.

Ira


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-13 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org 2011-09-13 
08:29:24 UTC ---
With the recent vcond* changes doing float comparison and having integral
vectors after ? and : is fine.


[Bug tree-optimization/50374] Support vectorization of min/max location pattern

2011-09-13 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50374

--- Comment #3 from vincenzo Innocente vincenzo.innocente at cern dot ch 
2011-09-13 08:45:40 UTC ---
with
gcc version 4.7.0 20110910 (experimental) (GCC) 


int lmin(float const * __restrict__ c, int N) {
  int k=0;
  for (int i=1; i!=N; ++i)
 k =  (c[k]  c[i]) ? i : k;
  return k;
}

does not vectorize.
any suggestion how to modify it?