Re: New GCC options for loop vectorization

2013-09-24 Thread Richard Biener
On Mon, Sep 23, 2013 at 10:37 PM, Xinliang David Li davi...@google.com wrote:
 Thanks. I modified the patch so that the max allowed peel iterations
 can be specified via a parameter. Testing on going. Ok for trunk ?

+DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT,
+ vect-max-peeling-for-alignment,
+ Max number of loop peels to enhancement alignment of data
references in a loop,
+ -1, 0, 0)

I believe this doesn't allow --param vect-max-peeling-for-alignment=-1 as you
specify 0 as the minimum.

So, ok with changing minimum and maxmum to -1. (double check that this works)

Thanks,
Richard.



 David

 On Mon, Sep 23, 2013 at 4:31 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Wed, Sep 18, 2013 at 10:21 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li 
 davi...@google.com wrote:
 Currently -ftree-vectorize turns on both loop and slp vectorizations,
 but there is no simple way to turn on loop vectorization alone. The
 logic for default O3 setting is also complicated.

 In this patch, two new options are introduced:

 1) -ftree-loop-vectorize

 This option is used to turn on loop vectorization only. option
 -ftree-slp-vectorize also becomes a first class citizen, and no funny
 business of Init(2) is needed.  With this change, -ftree-vectorize
 becomes a simple alias to -ftree-loop-vectorize +
 -ftree-slp-vectorize.

 For instance, to turn on only slp vectorize at O3, the old way is:

  -O3 -fno-tree-vectorize -ftree-slp-vectorize

 With the new change it becomes:

 -O3 -fno-loop-vectorize


 To turn on only loop vectorize at O2, the old way is

 -O2 -ftree-vectorize -fno-slp-vectorize

 The new way is

 -O2 -ftree-loop-vectorize



 2) -ftree-vect-loop-peeling

 This option is used to turn on/off loop peeling for alignment.  In the
 long run, this should be folded into the cheap cost model proposed by
 Richard.  This option is also useful in scenarios where peeling can
 introduce runtime problems:
 http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
 common in practice.



 Patch attached. Compiler boostrapped. Ok after testing?

 I'd like you to split 1) and 2), mainly because I agree on 1) but not 
 on 2).

 Ok. Can you also comment on 2) ?

 I think we want to decide how granular we want to control the vectorizer
 and using which mechanism.  My cost-model re-org makes
 ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
 a step backwards in this context.

 Using cost model to do a coarse grain control/configuration is
 certainly something we want, but having a fine grain control is still
 useful.


 So, can you summarize what pieces (including versioning) of the 
 vectorizer
 you'd want to be able to disable separately?

 Loop peeling seems to be the main one. There is also a correctness
 issue related. For instance, the following code is common in practice,
 but loop peeling wrongly assumes initial base-alignment and generates
 aligned mov instruction after peeling, leading to SEGV.  Peeling is
 not something we can blindly turned on -- even when it is on, there
 should be a way to turn it off explicitly:

 char a[1];

 void foo(int n)
 {
   int* b = (int*)(a+n);
   int i = 0;
   for (; i  1000; ++i)
 b[i] = 1;
 }

 int main(int argn, char** argv)
 {
   foo(argn);
 }

 But that's just a bug that should be fixed (looking into it).

  Just disabling peeling for
 alignment may get you into the versioning for alignment path (and thus
 an unvectorized loop at runtime).

 This is not true for target supporting mis-aligned access. I have not
 seen a case where alignment driver loop version happens on x86.

Also it's know that the alignment peeling
 code needs some serious TLC (it's outcome depends on the order of DRs,
 the cost model it uses leaves to be desired as we cannot distinguish
 between unaligned load and store costs).

 Yet another reason to turn it off as it is not effective anyways?

 As said I'll disable all remains of -ftree-vect-loop-version with the cost 
 model
 patch because it wasn't guarding versioning for aliasing but only 
 versioning
 for alignment.

 We have to be consistent here - if we add a way to disable peeling for
 alignment then we certainly don't want to remove the ability to disable
 versioning for alignment, no?

 We already have the ability to turn off versioning -- via --param. It
 is a more natural way to fine tune a pass instead of introducing a -f
 option. For this reason, your planned deprecation of the option is a
 good 

Re: New GCC options for loop vectorization

2013-09-23 Thread Richard Biener
On Wed, Sep 18, 2013 at 10:21 PM, Xinliang David Li davi...@google.com wrote:
 On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li davi...@google.com 
 wrote:
 Currently -ftree-vectorize turns on both loop and slp vectorizations,
 but there is no simple way to turn on loop vectorization alone. The
 logic for default O3 setting is also complicated.

 In this patch, two new options are introduced:

 1) -ftree-loop-vectorize

 This option is used to turn on loop vectorization only. option
 -ftree-slp-vectorize also becomes a first class citizen, and no funny
 business of Init(2) is needed.  With this change, -ftree-vectorize
 becomes a simple alias to -ftree-loop-vectorize +
 -ftree-slp-vectorize.

 For instance, to turn on only slp vectorize at O3, the old way is:

  -O3 -fno-tree-vectorize -ftree-slp-vectorize

 With the new change it becomes:

 -O3 -fno-loop-vectorize


 To turn on only loop vectorize at O2, the old way is

 -O2 -ftree-vectorize -fno-slp-vectorize

 The new way is

 -O2 -ftree-loop-vectorize



 2) -ftree-vect-loop-peeling

 This option is used to turn on/off loop peeling for alignment.  In the
 long run, this should be folded into the cheap cost model proposed by
 Richard.  This option is also useful in scenarios where peeling can
 introduce runtime problems:
 http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
 common in practice.



 Patch attached. Compiler boostrapped. Ok after testing?

 I'd like you to split 1) and 2), mainly because I agree on 1) but not on 
 2).

 Ok. Can you also comment on 2) ?

 I think we want to decide how granular we want to control the vectorizer
 and using which mechanism.  My cost-model re-org makes
 ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
 a step backwards in this context.

 Using cost model to do a coarse grain control/configuration is
 certainly something we want, but having a fine grain control is still
 useful.


 So, can you summarize what pieces (including versioning) of the vectorizer
 you'd want to be able to disable separately?

 Loop peeling seems to be the main one. There is also a correctness
 issue related. For instance, the following code is common in practice,
 but loop peeling wrongly assumes initial base-alignment and generates
 aligned mov instruction after peeling, leading to SEGV.  Peeling is
 not something we can blindly turned on -- even when it is on, there
 should be a way to turn it off explicitly:

 char a[1];

 void foo(int n)
 {
   int* b = (int*)(a+n);
   int i = 0;
   for (; i  1000; ++i)
 b[i] = 1;
 }

 int main(int argn, char** argv)
 {
   foo(argn);
 }

 But that's just a bug that should be fixed (looking into it).

  Just disabling peeling for
 alignment may get you into the versioning for alignment path (and thus
 an unvectorized loop at runtime).

 This is not true for target supporting mis-aligned access. I have not
 seen a case where alignment driver loop version happens on x86.

Also it's know that the alignment peeling
 code needs some serious TLC (it's outcome depends on the order of DRs,
 the cost model it uses leaves to be desired as we cannot distinguish
 between unaligned load and store costs).

 Yet another reason to turn it off as it is not effective anyways?

 As said I'll disable all remains of -ftree-vect-loop-version with the cost 
 model
 patch because it wasn't guarding versioning for aliasing but only versioning
 for alignment.

 We have to be consistent here - if we add a way to disable peeling for
 alignment then we certainly don't want to remove the ability to disable
 versioning for alignment, no?

 We already have the ability to turn off versioning -- via --param. It
 is a more natural way to fine tune a pass instead of introducing a -f
 option. For this reason, your planned deprecation of the option is a
 good thing to do.

 For consistency, I think we should introduce a new parameter to turn
 on/off peeling. This can also be tied closely with the cost model
 change.

 The proposed patch attached. Does this one look ok?

I'd principally support adding a param but rather would for example make
it limit the number of peeled iterations (zero turning off the feature).  As
you implemented it it will give false messages for

  supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
  do_peeling = vector_alignment_reachable_p (dr);
  if (do_peeling)
{
...
}
  else
{
  if (!aligned_access_p (dr))
{
  if (dump_enabled_p ())

Re: New GCC options for loop vectorization

2013-09-23 Thread Xinliang David Li
Thanks. I modified the patch so that the max allowed peel iterations
can be specified via a parameter. Testing on going. Ok for trunk ?

David

On Mon, Sep 23, 2013 at 4:31 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Wed, Sep 18, 2013 at 10:21 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li 
 davi...@google.com wrote:
 Currently -ftree-vectorize turns on both loop and slp vectorizations,
 but there is no simple way to turn on loop vectorization alone. The
 logic for default O3 setting is also complicated.

 In this patch, two new options are introduced:

 1) -ftree-loop-vectorize

 This option is used to turn on loop vectorization only. option
 -ftree-slp-vectorize also becomes a first class citizen, and no funny
 business of Init(2) is needed.  With this change, -ftree-vectorize
 becomes a simple alias to -ftree-loop-vectorize +
 -ftree-slp-vectorize.

 For instance, to turn on only slp vectorize at O3, the old way is:

  -O3 -fno-tree-vectorize -ftree-slp-vectorize

 With the new change it becomes:

 -O3 -fno-loop-vectorize


 To turn on only loop vectorize at O2, the old way is

 -O2 -ftree-vectorize -fno-slp-vectorize

 The new way is

 -O2 -ftree-loop-vectorize



 2) -ftree-vect-loop-peeling

 This option is used to turn on/off loop peeling for alignment.  In the
 long run, this should be folded into the cheap cost model proposed by
 Richard.  This option is also useful in scenarios where peeling can
 introduce runtime problems:
 http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
 common in practice.



 Patch attached. Compiler boostrapped. Ok after testing?

 I'd like you to split 1) and 2), mainly because I agree on 1) but not 
 on 2).

 Ok. Can you also comment on 2) ?

 I think we want to decide how granular we want to control the vectorizer
 and using which mechanism.  My cost-model re-org makes
 ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
 a step backwards in this context.

 Using cost model to do a coarse grain control/configuration is
 certainly something we want, but having a fine grain control is still
 useful.


 So, can you summarize what pieces (including versioning) of the vectorizer
 you'd want to be able to disable separately?

 Loop peeling seems to be the main one. There is also a correctness
 issue related. For instance, the following code is common in practice,
 but loop peeling wrongly assumes initial base-alignment and generates
 aligned mov instruction after peeling, leading to SEGV.  Peeling is
 not something we can blindly turned on -- even when it is on, there
 should be a way to turn it off explicitly:

 char a[1];

 void foo(int n)
 {
   int* b = (int*)(a+n);
   int i = 0;
   for (; i  1000; ++i)
 b[i] = 1;
 }

 int main(int argn, char** argv)
 {
   foo(argn);
 }

 But that's just a bug that should be fixed (looking into it).

  Just disabling peeling for
 alignment may get you into the versioning for alignment path (and thus
 an unvectorized loop at runtime).

 This is not true for target supporting mis-aligned access. I have not
 seen a case where alignment driver loop version happens on x86.

Also it's know that the alignment peeling
 code needs some serious TLC (it's outcome depends on the order of DRs,
 the cost model it uses leaves to be desired as we cannot distinguish
 between unaligned load and store costs).

 Yet another reason to turn it off as it is not effective anyways?

 As said I'll disable all remains of -ftree-vect-loop-version with the cost 
 model
 patch because it wasn't guarding versioning for aliasing but only versioning
 for alignment.

 We have to be consistent here - if we add a way to disable peeling for
 alignment then we certainly don't want to remove the ability to disable
 versioning for alignment, no?

 We already have the ability to turn off versioning -- via --param. It
 is a more natural way to fine tune a pass instead of introducing a -f
 option. For this reason, your planned deprecation of the option is a
 good thing to do.

 For consistency, I think we should introduce a new parameter to turn
 on/off peeling. This can also be tied closely with the cost model
 change.

 The proposed patch attached. Does this one look ok?

 I'd principally support adding a param but rather would for example make
 it limit the number of peeled iterations (zero turning off the feature).  As
 you implemented it it will give false messages for

   supportable_dr_alignment = vect_supportable_dr_alignment (dr, 

Re: New GCC options for loop vectorization

2013-09-18 Thread Xinliang David Li
On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li davi...@google.com 
 wrote:
 Currently -ftree-vectorize turns on both loop and slp vectorizations,
 but there is no simple way to turn on loop vectorization alone. The
 logic for default O3 setting is also complicated.

 In this patch, two new options are introduced:

 1) -ftree-loop-vectorize

 This option is used to turn on loop vectorization only. option
 -ftree-slp-vectorize also becomes a first class citizen, and no funny
 business of Init(2) is needed.  With this change, -ftree-vectorize
 becomes a simple alias to -ftree-loop-vectorize +
 -ftree-slp-vectorize.

 For instance, to turn on only slp vectorize at O3, the old way is:

  -O3 -fno-tree-vectorize -ftree-slp-vectorize

 With the new change it becomes:

 -O3 -fno-loop-vectorize


 To turn on only loop vectorize at O2, the old way is

 -O2 -ftree-vectorize -fno-slp-vectorize

 The new way is

 -O2 -ftree-loop-vectorize



 2) -ftree-vect-loop-peeling

 This option is used to turn on/off loop peeling for alignment.  In the
 long run, this should be folded into the cheap cost model proposed by
 Richard.  This option is also useful in scenarios where peeling can
 introduce runtime problems:
 http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
 common in practice.



 Patch attached. Compiler boostrapped. Ok after testing?

 I'd like you to split 1) and 2), mainly because I agree on 1) but not on 
 2).

 Ok. Can you also comment on 2) ?

 I think we want to decide how granular we want to control the vectorizer
 and using which mechanism.  My cost-model re-org makes
 ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
 a step backwards in this context.

 Using cost model to do a coarse grain control/configuration is
 certainly something we want, but having a fine grain control is still
 useful.


 So, can you summarize what pieces (including versioning) of the vectorizer
 you'd want to be able to disable separately?

 Loop peeling seems to be the main one. There is also a correctness
 issue related. For instance, the following code is common in practice,
 but loop peeling wrongly assumes initial base-alignment and generates
 aligned mov instruction after peeling, leading to SEGV.  Peeling is
 not something we can blindly turned on -- even when it is on, there
 should be a way to turn it off explicitly:

 char a[1];

 void foo(int n)
 {
   int* b = (int*)(a+n);
   int i = 0;
   for (; i  1000; ++i)
 b[i] = 1;
 }

 int main(int argn, char** argv)
 {
   foo(argn);
 }

 But that's just a bug that should be fixed (looking into it).

  Just disabling peeling for
 alignment may get you into the versioning for alignment path (and thus
 an unvectorized loop at runtime).

 This is not true for target supporting mis-aligned access. I have not
 seen a case where alignment driver loop version happens on x86.

Also it's know that the alignment peeling
 code needs some serious TLC (it's outcome depends on the order of DRs,
 the cost model it uses leaves to be desired as we cannot distinguish
 between unaligned load and store costs).

 Yet another reason to turn it off as it is not effective anyways?

 As said I'll disable all remains of -ftree-vect-loop-version with the cost 
 model
 patch because it wasn't guarding versioning for aliasing but only versioning
 for alignment.

 We have to be consistent here - if we add a way to disable peeling for
 alignment then we certainly don't want to remove the ability to disable
 versioning for alignment, no?

We already have the ability to turn off versioning -- via --param. It
is a more natural way to fine tune a pass instead of introducing a -f
option. For this reason, your planned deprecation of the option is a
good thing to do.

For consistency, I think we should introduce a new parameter to turn
on/off peeling. This can also be tied closely with the cost model
change.

The proposed patch attached. Does this one look ok?

thanks,

David




 Richard.


 thanks,

 David


 Richard.


 I've stopped a quick try doing 1) myself because

 @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
  opts-x_flag_ipa_reference = false;
break;

 +case OPT_ftree_vectorize:
 +  if (!opts_set-x_flag_tree_loop_vectorize)
 + opts-x_flag_tree_loop_vectorize = value;
 +  if (!opts_set-x_flag_tree_slp_vectorize)
 + opts-x_flag_tree_slp_vectorize = value;
 +  break;

 doesn't look obviously correct.  Does that handle

   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

 or

 

Re: New GCC options for loop vectorization

2013-09-17 Thread Richard Biener
On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li davi...@google.com wrote:
 On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li davi...@google.com 
 wrote:
 Currently -ftree-vectorize turns on both loop and slp vectorizations,
 but there is no simple way to turn on loop vectorization alone. The
 logic for default O3 setting is also complicated.

 In this patch, two new options are introduced:

 1) -ftree-loop-vectorize

 This option is used to turn on loop vectorization only. option
 -ftree-slp-vectorize also becomes a first class citizen, and no funny
 business of Init(2) is needed.  With this change, -ftree-vectorize
 becomes a simple alias to -ftree-loop-vectorize +
 -ftree-slp-vectorize.

 For instance, to turn on only slp vectorize at O3, the old way is:

  -O3 -fno-tree-vectorize -ftree-slp-vectorize

 With the new change it becomes:

 -O3 -fno-loop-vectorize


 To turn on only loop vectorize at O2, the old way is

 -O2 -ftree-vectorize -fno-slp-vectorize

 The new way is

 -O2 -ftree-loop-vectorize



 2) -ftree-vect-loop-peeling

 This option is used to turn on/off loop peeling for alignment.  In the
 long run, this should be folded into the cheap cost model proposed by
 Richard.  This option is also useful in scenarios where peeling can
 introduce runtime problems:
 http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
 common in practice.



 Patch attached. Compiler boostrapped. Ok after testing?

 I'd like you to split 1) and 2), mainly because I agree on 1) but not on 
 2).

 Ok. Can you also comment on 2) ?

 I think we want to decide how granular we want to control the vectorizer
 and using which mechanism.  My cost-model re-org makes
 ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
 a step backwards in this context.

 Using cost model to do a coarse grain control/configuration is
 certainly something we want, but having a fine grain control is still
 useful.


 So, can you summarize what pieces (including versioning) of the vectorizer
 you'd want to be able to disable separately?

 Loop peeling seems to be the main one. There is also a correctness
 issue related. For instance, the following code is common in practice,
 but loop peeling wrongly assumes initial base-alignment and generates
 aligned mov instruction after peeling, leading to SEGV.  Peeling is
 not something we can blindly turned on -- even when it is on, there
 should be a way to turn it off explicitly:

 char a[1];

 void foo(int n)
 {
   int* b = (int*)(a+n);
   int i = 0;
   for (; i  1000; ++i)
 b[i] = 1;
 }

 int main(int argn, char** argv)
 {
   foo(argn);
 }

But that's just a bug that should be fixed (looking into it).

  Just disabling peeling for
 alignment may get you into the versioning for alignment path (and thus
 an unvectorized loop at runtime).

 This is not true for target supporting mis-aligned access. I have not
 seen a case where alignment driver loop version happens on x86.

Also it's know that the alignment peeling
 code needs some serious TLC (it's outcome depends on the order of DRs,
 the cost model it uses leaves to be desired as we cannot distinguish
 between unaligned load and store costs).

 Yet another reason to turn it off as it is not effective anyways?

As said I'll disable all remains of -ftree-vect-loop-version with the cost model
patch because it wasn't guarding versioning for aliasing but only versioning
for alignment.

We have to be consistent here - if we add a way to disable peeling for
alignment then we certainly don't want to remove the ability to disable
versioning for alignment, no?

Richard.


 thanks,

 David


 Richard.


 I've stopped a quick try doing 1) myself because

 @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
  opts-x_flag_ipa_reference = false;
break;

 +case OPT_ftree_vectorize:
 +  if (!opts_set-x_flag_tree_loop_vectorize)
 + opts-x_flag_tree_loop_vectorize = value;
 +  if (!opts_set-x_flag_tree_slp_vectorize)
 + opts-x_flag_tree_slp_vectorize = value;
 +  break;

 doesn't look obviously correct.  Does that handle

   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

 or

   -ftree-loop-vectorize -fno-tree-vectorize

 properly?  Currently at least

   -ftree-slp-vectorize -fno-tree-vectorize

 doesn't work.


 Right -- same is true for -fprofile-use option. FDO enables some
 passes, but can not re-enable them if they are flipped off before.


 That said, the option machinery doesn't handle an option being an alias
 for two other options, so it's mechanism to contract positives/negatives
 doesn't work here and the override hooks do not work reliably for
 repeated options.

 Or am I wrong here?  Should 

Re: New GCC options for loop vectorization

2013-09-17 Thread Richard Biener
On Tue, Sep 17, 2013 at 10:20 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li davi...@google.com 
 wrote:
 Currently -ftree-vectorize turns on both loop and slp vectorizations,
 but there is no simple way to turn on loop vectorization alone. The
 logic for default O3 setting is also complicated.

 In this patch, two new options are introduced:

 1) -ftree-loop-vectorize

 This option is used to turn on loop vectorization only. option
 -ftree-slp-vectorize also becomes a first class citizen, and no funny
 business of Init(2) is needed.  With this change, -ftree-vectorize
 becomes a simple alias to -ftree-loop-vectorize +
 -ftree-slp-vectorize.

 For instance, to turn on only slp vectorize at O3, the old way is:

  -O3 -fno-tree-vectorize -ftree-slp-vectorize

 With the new change it becomes:

 -O3 -fno-loop-vectorize


 To turn on only loop vectorize at O2, the old way is

 -O2 -ftree-vectorize -fno-slp-vectorize

 The new way is

 -O2 -ftree-loop-vectorize



 2) -ftree-vect-loop-peeling

 This option is used to turn on/off loop peeling for alignment.  In the
 long run, this should be folded into the cheap cost model proposed by
 Richard.  This option is also useful in scenarios where peeling can
 introduce runtime problems:
 http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
 common in practice.



 Patch attached. Compiler boostrapped. Ok after testing?

 I'd like you to split 1) and 2), mainly because I agree on 1) but not on 
 2).

 Ok. Can you also comment on 2) ?

 I think we want to decide how granular we want to control the vectorizer
 and using which mechanism.  My cost-model re-org makes
 ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
 a step backwards in this context.

 Using cost model to do a coarse grain control/configuration is
 certainly something we want, but having a fine grain control is still
 useful.


 So, can you summarize what pieces (including versioning) of the vectorizer
 you'd want to be able to disable separately?

 Loop peeling seems to be the main one. There is also a correctness
 issue related. For instance, the following code is common in practice,
 but loop peeling wrongly assumes initial base-alignment and generates
 aligned mov instruction after peeling, leading to SEGV.  Peeling is
 not something we can blindly turned on -- even when it is on, there
 should be a way to turn it off explicitly:

 char a[1];

 void foo(int n)
 {
   int* b = (int*)(a+n);
   int i = 0;
   for (; i  1000; ++i)
 b[i] = 1;
 }

 int main(int argn, char** argv)
 {
   foo(argn);
 }

 But that's just a bug that should be fixed (looking into it).

Bug in the testcase.  b[i] asserts that b is aligned to 'int', so this invokes
undefined behavior if peeling cannot reach an alignment of 16.

Richard.


Re: New GCC options for loop vectorization

2013-09-17 Thread Xinliang David Li
On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li davi...@google.com 
 wrote:
 Currently -ftree-vectorize turns on both loop and slp vectorizations,
 but there is no simple way to turn on loop vectorization alone. The
 logic for default O3 setting is also complicated.

 In this patch, two new options are introduced:

 1) -ftree-loop-vectorize

 This option is used to turn on loop vectorization only. option
 -ftree-slp-vectorize also becomes a first class citizen, and no funny
 business of Init(2) is needed.  With this change, -ftree-vectorize
 becomes a simple alias to -ftree-loop-vectorize +
 -ftree-slp-vectorize.

 For instance, to turn on only slp vectorize at O3, the old way is:

  -O3 -fno-tree-vectorize -ftree-slp-vectorize

 With the new change it becomes:

 -O3 -fno-loop-vectorize


 To turn on only loop vectorize at O2, the old way is

 -O2 -ftree-vectorize -fno-slp-vectorize

 The new way is

 -O2 -ftree-loop-vectorize



 2) -ftree-vect-loop-peeling

 This option is used to turn on/off loop peeling for alignment.  In the
 long run, this should be folded into the cheap cost model proposed by
 Richard.  This option is also useful in scenarios where peeling can
 introduce runtime problems:
 http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
 common in practice.



 Patch attached. Compiler boostrapped. Ok after testing?

 I'd like you to split 1) and 2), mainly because I agree on 1) but not on 
 2).

 Ok. Can you also comment on 2) ?

 I think we want to decide how granular we want to control the vectorizer
 and using which mechanism.  My cost-model re-org makes
 ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
 a step backwards in this context.

 Using cost model to do a coarse grain control/configuration is
 certainly something we want, but having a fine grain control is still
 useful.


 So, can you summarize what pieces (including versioning) of the vectorizer
 you'd want to be able to disable separately?

 Loop peeling seems to be the main one. There is also a correctness
 issue related. For instance, the following code is common in practice,
 but loop peeling wrongly assumes initial base-alignment and generates
 aligned mov instruction after peeling, leading to SEGV.  Peeling is
 not something we can blindly turned on -- even when it is on, there
 should be a way to turn it off explicitly:

 char a[1];

 void foo(int n)
 {
   int* b = (int*)(a+n);
   int i = 0;
   for (; i  1000; ++i)
 b[i] = 1;
 }

 int main(int argn, char** argv)
 {
   foo(argn);
 }

 But that's just a bug that should be fixed (looking into it).

This kind of code is not uncommon for certain applications (e.g, group
varint decoding).  Besides, the code like this may be built with
-fno-strict-aliasing.



  Just disabling peeling for
 alignment may get you into the versioning for alignment path (and thus
 an unvectorized loop at runtime).

 This is not true for target supporting mis-aligned access. I have not
 seen a case where alignment driver loop version happens on x86.

Also it's know that the alignment peeling
 code needs some serious TLC (it's outcome depends on the order of DRs,
 the cost model it uses leaves to be desired as we cannot distinguish
 between unaligned load and store costs).

 Yet another reason to turn it off as it is not effective anyways?

 As said I'll disable all remains of -ftree-vect-loop-version with the cost 
 model
 patch because it wasn't guarding versioning for aliasing but only versioning
 for alignment.

 We have to be consistent here - if we add a way to disable peeling for
 alignment then we certainly don't want to remove the ability to disable
 versioning for alignment, no?

yes, for consistency, the version control flag may also be useful to be kept.

David


 Richard.


 thanks,

 David


 Richard.


 I've stopped a quick try doing 1) myself because

 @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
  opts-x_flag_ipa_reference = false;
break;

 +case OPT_ftree_vectorize:
 +  if (!opts_set-x_flag_tree_loop_vectorize)
 + opts-x_flag_tree_loop_vectorize = value;
 +  if (!opts_set-x_flag_tree_slp_vectorize)
 + opts-x_flag_tree_slp_vectorize = value;
 +  break;

 doesn't look obviously correct.  Does that handle

   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

 or

   -ftree-loop-vectorize -fno-tree-vectorize

 properly?  Currently at least

   -ftree-slp-vectorize -fno-tree-vectorize

 doesn't work.


 Right -- same is true for -fprofile-use option. FDO 

Re: New GCC options for loop vectorization

2013-09-17 Thread Xinliang David Li
On Tue, Sep 17, 2013 at 8:45 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Sep 17, 2013 at 08:37:57AM -0700, Xinliang David Li wrote:
  char a[1];
 
  void foo(int n)
  {
int* b = (int*)(a+n);
int i = 0;
for (; i  1000; ++i)
  b[i] = 1;
  }
 
  int main(int argn, char** argv)
  {
foo(argn);
  }
 
  But that's just a bug that should be fixed (looking into it).

 This kind of code is not uncommon for certain applications (e.g, group
 varint decoding).  Besides, the code like this may be built with

 That is irrelevant to the fact that it is invalid.

 -fno-strict-aliasing.

 It isn't invalid because of aliasing violations, but because of unaligned
 access without saying that it is unaligned (say accessing through
 aligned(1) type, or packed struct or similar, or doing memcpy).
 On various architectures unaligned accesses don't cause faults, so it
 may appear to work, and even on i?86/x86_64 often appears to work, as
 long as you aren't trying to vectorize code (which doesn't change anything
 on the fact that it is undefined behavior).

ok, undefined behavior it is.  By the way, ICC does loop versioning on
the case and therefore has no problem. Clang/LLVM vectorizes it with
neither peeling nor versioning, and it works fine to. For legacy code
like this, GCC is less tolerant.

thanks,

David


 Jakub


Re: New GCC options for loop vectorization

2013-09-17 Thread Jakub Jelinek
On Tue, Sep 17, 2013 at 08:37:57AM -0700, Xinliang David Li wrote:
  char a[1];
 
  void foo(int n)
  {
int* b = (int*)(a+n);
int i = 0;
for (; i  1000; ++i)
  b[i] = 1;
  }
 
  int main(int argn, char** argv)
  {
foo(argn);
  }
 
  But that's just a bug that should be fixed (looking into it).
 
 This kind of code is not uncommon for certain applications (e.g, group
 varint decoding).  Besides, the code like this may be built with

That is irrelevant to the fact that it is invalid.

 -fno-strict-aliasing.

It isn't invalid because of aliasing violations, but because of unaligned
access without saying that it is unaligned (say accessing through
aligned(1) type, or packed struct or similar, or doing memcpy).
On various architectures unaligned accesses don't cause faults, so it
may appear to work, and even on i?86/x86_64 often appears to work, as
long as you aren't trying to vectorize code (which doesn't change anything
on the fact that it is undefined behavior).

Jakub


Re: New GCC options for loop vectorization

2013-09-16 Thread Richard Biener
On Fri, Sep 13, 2013 at 6:45 PM, Joseph S. Myers
jos...@codesourcery.com wrote:
 On Fri, 13 Sep 2013, Richard Biener wrote:

 @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
  opts-x_flag_ipa_reference = false;
break;

 +case OPT_ftree_vectorize:
 +  if (!opts_set-x_flag_tree_loop_vectorize)
 + opts-x_flag_tree_loop_vectorize = value;
 +  if (!opts_set-x_flag_tree_slp_vectorize)
 + opts-x_flag_tree_slp_vectorize = value;
 +  break;

 doesn't look obviously correct.  Does that handle

 It looks right to me.  The general principle is that the more specific
 option takes precedence over the less specific one, whatever the order on
 the command line.

   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

 Should mean -ftree-slp-vectorize.

   -ftree-loop-vectorize -fno-tree-vectorize

 Should mean -ftree-loop-vectorize.

   -ftree-slp-vectorize -fno-tree-vectorize

 Should mean -ftree-slp-vectorize.

Thanks for clarifying.

Richard.

 --
 Joseph S. Myers
 jos...@codesourcery.com


Re: New GCC options for loop vectorization

2013-09-16 Thread Richard Biener
On Fri, Sep 13, 2013 at 6:56 PM, Xinliang David Li davi...@google.com wrote:
 Updated patch implementing the logic that more specific option wins.

 Ok for trunk?

@@ -2305,8 +2305,8 @@ omp_max_vf (void)
 {
   if (!optimize
   || optimize_debug
-  || (!flag_tree_vectorize
-   global_options_set.x_flag_tree_vectorize))
+  || (!flag_tree_loop_vectorize
+   global_options_set.x_flag_tree_loop_vectorize))
 return 1;

Not sure what is the intent here, but it looks like
-fno-tree-vectorize will no longer disable this.  So it would
need to check (global_options_set.x_flag_tree_vectorize ||
global_options_set.x_flag_tree_loop_vectorize)?  Jakub?

   int vs = targetm.vectorize.autovectorize_vector_sizes ();
@@ -5684,10 +5684,10 @@ expand_omp_simd (struct omp_region *regi
   loop-simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
   cfun-has_simduid_loops = true;
  }
-  /* If not -fno-tree-vectorize, hint that we want to vectorize
+  /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
  the loop.  */
-  if ((flag_tree_vectorize
-   || !global_options_set.x_flag_tree_vectorize)
+  if ((flag_tree_loop_vectorize
+   || !global_options_set.x_flag_tree_loop_vectorize)
loop-safelen  1)
  {
   loop-force_vect = true;

similar.

-  if (!opts_set-x_flag_tree_vectorize)
- opts-x_flag_tree_vectorize = value;
+  if (!opts_set-x_flag_tree_loop_vectorize)
+ opts-x_flag_tree_loop_vectorize = value;
+  if (!opts_set-x_flag_tree_slp_vectorize)
+ opts-x_flag_tree_slp_vectorize = value;

similar - if I use -fprofile-use -fno-tree-vecotorize you override this choice.
This case should be wrapped in if (!opts_set-x_flag_tree_vectorize)

 @item -ftree-vectorize
 @opindex ftree-vectorize
+Perform vectorization on trees. This flag enables
@option{-ftree-loop-vectorize}
+and @option{-ftree-slp-vectorize} if neither option is explicitly specified.

if neither option is explicitely specified doesn't correctly document
-ftree-loop-vectorize -ftree-vectorize behavior, no? (-ftree-slp-vectorize
is still enabled here)

I'm not a native speaker so I cannot suggest a clearer wording here
but maybe just say if not explicitely specified.

Ok with the -fprofile-use change I suggested and whatever resolution Jakub
suggests and the doc adjustment.

Thanks,
Richard.

 thanks,

 David

 On Fri, Sep 13, 2013 at 9:48 AM, Xinliang David Li davi...@google.com wrote:
 Ok -- then my updated patch is wrong then. The implementation in the
 first version matches the requirement.

 thanks,

 David


 On Fri, Sep 13, 2013 at 9:45 AM, Joseph S. Myers
 jos...@codesourcery.com wrote:
 On Fri, 13 Sep 2013, Richard Biener wrote:

 @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
  opts-x_flag_ipa_reference = false;
break;

 +case OPT_ftree_vectorize:
 +  if (!opts_set-x_flag_tree_loop_vectorize)
 + opts-x_flag_tree_loop_vectorize = value;
 +  if (!opts_set-x_flag_tree_slp_vectorize)
 + opts-x_flag_tree_slp_vectorize = value;
 +  break;

 doesn't look obviously correct.  Does that handle

 It looks right to me.  The general principle is that the more specific
 option takes precedence over the less specific one, whatever the order on
 the command line.

   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

 Should mean -ftree-slp-vectorize.

   -ftree-loop-vectorize -fno-tree-vectorize

 Should mean -ftree-loop-vectorize.

   -ftree-slp-vectorize -fno-tree-vectorize

 Should mean -ftree-slp-vectorize.

 --
 Joseph S. Myers
 jos...@codesourcery.com


Re: New GCC options for loop vectorization

2013-09-16 Thread Richard Biener
On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li davi...@google.com wrote:
 On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li davi...@google.com 
 wrote:
 Currently -ftree-vectorize turns on both loop and slp vectorizations,
 but there is no simple way to turn on loop vectorization alone. The
 logic for default O3 setting is also complicated.

 In this patch, two new options are introduced:

 1) -ftree-loop-vectorize

 This option is used to turn on loop vectorization only. option
 -ftree-slp-vectorize also becomes a first class citizen, and no funny
 business of Init(2) is needed.  With this change, -ftree-vectorize
 becomes a simple alias to -ftree-loop-vectorize +
 -ftree-slp-vectorize.

 For instance, to turn on only slp vectorize at O3, the old way is:

  -O3 -fno-tree-vectorize -ftree-slp-vectorize

 With the new change it becomes:

 -O3 -fno-loop-vectorize


 To turn on only loop vectorize at O2, the old way is

 -O2 -ftree-vectorize -fno-slp-vectorize

 The new way is

 -O2 -ftree-loop-vectorize



 2) -ftree-vect-loop-peeling

 This option is used to turn on/off loop peeling for alignment.  In the
 long run, this should be folded into the cheap cost model proposed by
 Richard.  This option is also useful in scenarios where peeling can
 introduce runtime problems:
 http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
 common in practice.



 Patch attached. Compiler boostrapped. Ok after testing?

 I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2).

 Ok. Can you also comment on 2) ?

I think we want to decide how granular we want to control the vectorizer
and using which mechanism.  My cost-model re-org makes
ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
a step backwards in this context.

So, can you summarize what pieces (including versioning) of the vectorizer
you'd want to be able to disable separately?  Just disabling peeling for
alignment may get you into the versioning for alignment path (and thus
an unvectorized loop at runtime).  Also it's know that the alignment peeling
code needs some serious TLC (it's outcome depends on the order of DRs,
the cost model it uses leaves to be desired as we cannot distinguish
between unaligned load and store costs).

Richard.


 I've stopped a quick try doing 1) myself because

 @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
  opts-x_flag_ipa_reference = false;
break;

 +case OPT_ftree_vectorize:
 +  if (!opts_set-x_flag_tree_loop_vectorize)
 + opts-x_flag_tree_loop_vectorize = value;
 +  if (!opts_set-x_flag_tree_slp_vectorize)
 + opts-x_flag_tree_slp_vectorize = value;
 +  break;

 doesn't look obviously correct.  Does that handle

   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

 or

   -ftree-loop-vectorize -fno-tree-vectorize

 properly?  Currently at least

   -ftree-slp-vectorize -fno-tree-vectorize

 doesn't work.


 Right -- same is true for -fprofile-use option. FDO enables some
 passes, but can not re-enable them if they are flipped off before.


 That said, the option machinery doesn't handle an option being an alias
 for two other options, so it's mechanism to contract positives/negatives
 doesn't work here and the override hooks do not work reliably for
 repeated options.

 Or am I wrong here?  Should we care at all?  Joseph?

 We should probably just document the behavior. Even better, we should
 deprecate the old option.

 thanks,

 David


 Thanks,
 Richard.


 thanks,

 David


Re: New GCC options for loop vectorization

2013-09-16 Thread Jakub Jelinek
On Mon, Sep 16, 2013 at 12:07:37PM +0200, Richard Biener wrote:
 On Fri, Sep 13, 2013 at 6:56 PM, Xinliang David Li davi...@google.com wrote:
  Updated patch implementing the logic that more specific option wins.
 
  Ok for trunk?
 
 @@ -2305,8 +2305,8 @@ omp_max_vf (void)
  {
if (!optimize
|| optimize_debug
 -  || (!flag_tree_vectorize
 -   global_options_set.x_flag_tree_vectorize))
 +  || (!flag_tree_loop_vectorize
 +   global_options_set.x_flag_tree_loop_vectorize))
  return 1;
 
 Not sure what is the intent here, but it looks like
 -fno-tree-vectorize will no longer disable this.  So it would
 need to check (global_options_set.x_flag_tree_vectorize ||
 global_options_set.x_flag_tree_loop_vectorize)?  Jakub?

The point of omp_max_vf is to allow vectorization of simd routines
in the source even without -O3/-Ofast/-ftree-vectorize, as long
as user hasn't requested no vectorization (-fno-tree-vectorize, or
-O0, -Og).  So yes, you'd probably need to change this spot
to check for either global_options_set.x_flag_tree_vectorize
|| global_options_set.x_flag_tree_loop_vectorize, because either
of them meant explicit request to either vectorize or not vectorize loops.
If user has requested vectorization or non-vectorization of loops, then
simd loops just should follow those requests, it is only the default
if there was nothing explicit, which will for now be different between
normal and simd loops, normal loops won't be vectorized, simd loops will be,
because they were specially marked in the source and so it is probably
worthwhile to vectorize them.

Jakub


Re: New GCC options for loop vectorization

2013-09-16 Thread Xinliang David Li
I incorporated all the comments and committed the change (also fixed a
test failure with --help=optimizers).

thanks,

David

On Mon, Sep 16, 2013 at 3:07 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Fri, Sep 13, 2013 at 6:56 PM, Xinliang David Li davi...@google.com wrote:
 Updated patch implementing the logic that more specific option wins.

 Ok for trunk?

 @@ -2305,8 +2305,8 @@ omp_max_vf (void)
  {
if (!optimize
|| optimize_debug
 -  || (!flag_tree_vectorize
 -   global_options_set.x_flag_tree_vectorize))
 +  || (!flag_tree_loop_vectorize
 +   global_options_set.x_flag_tree_loop_vectorize))
  return 1;

 Not sure what is the intent here, but it looks like
 -fno-tree-vectorize will no longer disable this.  So it would
 need to check (global_options_set.x_flag_tree_vectorize ||
 global_options_set.x_flag_tree_loop_vectorize)?  Jakub?

int vs = targetm.vectorize.autovectorize_vector_sizes ();
 @@ -5684,10 +5684,10 @@ expand_omp_simd (struct omp_region *regi
loop-simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
cfun-has_simduid_loops = true;
   }
 -  /* If not -fno-tree-vectorize, hint that we want to vectorize
 +  /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
   the loop.  */
 -  if ((flag_tree_vectorize
 -   || !global_options_set.x_flag_tree_vectorize)
 +  if ((flag_tree_loop_vectorize
 +   || !global_options_set.x_flag_tree_loop_vectorize)
 loop-safelen  1)
   {
loop-force_vect = true;

 similar.

 -  if (!opts_set-x_flag_tree_vectorize)
 - opts-x_flag_tree_vectorize = value;
 +  if (!opts_set-x_flag_tree_loop_vectorize)
 + opts-x_flag_tree_loop_vectorize = value;
 +  if (!opts_set-x_flag_tree_slp_vectorize)
 + opts-x_flag_tree_slp_vectorize = value;

 similar - if I use -fprofile-use -fno-tree-vecotorize you override this 
 choice.
 This case should be wrapped in if (!opts_set-x_flag_tree_vectorize)

  @item -ftree-vectorize
  @opindex ftree-vectorize
 +Perform vectorization on trees. This flag enables
 @option{-ftree-loop-vectorize}
 +and @option{-ftree-slp-vectorize} if neither option is explicitly specified.

 if neither option is explicitely specified doesn't correctly document
 -ftree-loop-vectorize -ftree-vectorize behavior, no? (-ftree-slp-vectorize
 is still enabled here)

 I'm not a native speaker so I cannot suggest a clearer wording here
 but maybe just say if not explicitely specified.

 Ok with the -fprofile-use change I suggested and whatever resolution Jakub
 suggests and the doc adjustment.

 Thanks,
 Richard.

 thanks,

 David

 On Fri, Sep 13, 2013 at 9:48 AM, Xinliang David Li davi...@google.com 
 wrote:
 Ok -- then my updated patch is wrong then. The implementation in the
 first version matches the requirement.

 thanks,

 David


 On Fri, Sep 13, 2013 at 9:45 AM, Joseph S. Myers
 jos...@codesourcery.com wrote:
 On Fri, 13 Sep 2013, Richard Biener wrote:

 @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
  opts-x_flag_ipa_reference = false;
break;

 +case OPT_ftree_vectorize:
 +  if (!opts_set-x_flag_tree_loop_vectorize)
 + opts-x_flag_tree_loop_vectorize = value;
 +  if (!opts_set-x_flag_tree_slp_vectorize)
 + opts-x_flag_tree_slp_vectorize = value;
 +  break;

 doesn't look obviously correct.  Does that handle

 It looks right to me.  The general principle is that the more specific
 option takes precedence over the less specific one, whatever the order on
 the command line.

   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

 Should mean -ftree-slp-vectorize.

   -ftree-loop-vectorize -fno-tree-vectorize

 Should mean -ftree-loop-vectorize.

   -ftree-slp-vectorize -fno-tree-vectorize

 Should mean -ftree-slp-vectorize.

 --
 Joseph S. Myers
 jos...@codesourcery.com


Re: New GCC options for loop vectorization

2013-09-16 Thread Xinliang David Li
On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li davi...@google.com wrote:
 On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li davi...@google.com 
 wrote:
 Currently -ftree-vectorize turns on both loop and slp vectorizations,
 but there is no simple way to turn on loop vectorization alone. The
 logic for default O3 setting is also complicated.

 In this patch, two new options are introduced:

 1) -ftree-loop-vectorize

 This option is used to turn on loop vectorization only. option
 -ftree-slp-vectorize also becomes a first class citizen, and no funny
 business of Init(2) is needed.  With this change, -ftree-vectorize
 becomes a simple alias to -ftree-loop-vectorize +
 -ftree-slp-vectorize.

 For instance, to turn on only slp vectorize at O3, the old way is:

  -O3 -fno-tree-vectorize -ftree-slp-vectorize

 With the new change it becomes:

 -O3 -fno-loop-vectorize


 To turn on only loop vectorize at O2, the old way is

 -O2 -ftree-vectorize -fno-slp-vectorize

 The new way is

 -O2 -ftree-loop-vectorize



 2) -ftree-vect-loop-peeling

 This option is used to turn on/off loop peeling for alignment.  In the
 long run, this should be folded into the cheap cost model proposed by
 Richard.  This option is also useful in scenarios where peeling can
 introduce runtime problems:
 http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
 common in practice.



 Patch attached. Compiler boostrapped. Ok after testing?

 I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2).

 Ok. Can you also comment on 2) ?

 I think we want to decide how granular we want to control the vectorizer
 and using which mechanism.  My cost-model re-org makes
 ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
 a step backwards in this context.

Using cost model to do a coarse grain control/configuration is
certainly something we want, but having a fine grain control is still
useful.


 So, can you summarize what pieces (including versioning) of the vectorizer
 you'd want to be able to disable separately?

Loop peeling seems to be the main one. There is also a correctness
issue related. For instance, the following code is common in practice,
but loop peeling wrongly assumes initial base-alignment and generates
aligned mov instruction after peeling, leading to SEGV.  Peeling is
not something we can blindly turned on -- even when it is on, there
should be a way to turn it off explicitly:

char a[1];

void foo(int n)
{
  int* b = (int*)(a+n);
  int i = 0;
  for (; i  1000; ++i)
b[i] = 1;
}

int main(int argn, char** argv)
{
  foo(argn);
}



  Just disabling peeling for
 alignment may get you into the versioning for alignment path (and thus
 an unvectorized loop at runtime).

This is not true for target supporting mis-aligned access. I have not
seen a case where alignment driver loop version happens on x86.

Also it's know that the alignment peeling
 code needs some serious TLC (it's outcome depends on the order of DRs,
 the cost model it uses leaves to be desired as we cannot distinguish
 between unaligned load and store costs).

Yet another reason to turn it off as it is not effective anyways?


thanks,

David


 Richard.


 I've stopped a quick try doing 1) myself because

 @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
  opts-x_flag_ipa_reference = false;
break;

 +case OPT_ftree_vectorize:
 +  if (!opts_set-x_flag_tree_loop_vectorize)
 + opts-x_flag_tree_loop_vectorize = value;
 +  if (!opts_set-x_flag_tree_slp_vectorize)
 + opts-x_flag_tree_slp_vectorize = value;
 +  break;

 doesn't look obviously correct.  Does that handle

   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

 or

   -ftree-loop-vectorize -fno-tree-vectorize

 properly?  Currently at least

   -ftree-slp-vectorize -fno-tree-vectorize

 doesn't work.


 Right -- same is true for -fprofile-use option. FDO enables some
 passes, but can not re-enable them if they are flipped off before.


 That said, the option machinery doesn't handle an option being an alias
 for two other options, so it's mechanism to contract positives/negatives
 doesn't work here and the override hooks do not work reliably for
 repeated options.

 Or am I wrong here?  Should we care at all?  Joseph?

 We should probably just document the behavior. Even better, we should
 deprecate the old option.

 thanks,

 David


 Thanks,
 Richard.


 thanks,

 David


Re: New GCC options for loop vectorization

2013-09-13 Thread Xinliang David Li
On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li davi...@google.com 
 wrote:
 Currently -ftree-vectorize turns on both loop and slp vectorizations,
 but there is no simple way to turn on loop vectorization alone. The
 logic for default O3 setting is also complicated.

 In this patch, two new options are introduced:

 1) -ftree-loop-vectorize

 This option is used to turn on loop vectorization only. option
 -ftree-slp-vectorize also becomes a first class citizen, and no funny
 business of Init(2) is needed.  With this change, -ftree-vectorize
 becomes a simple alias to -ftree-loop-vectorize +
 -ftree-slp-vectorize.

 For instance, to turn on only slp vectorize at O3, the old way is:

  -O3 -fno-tree-vectorize -ftree-slp-vectorize

 With the new change it becomes:

 -O3 -fno-loop-vectorize


 To turn on only loop vectorize at O2, the old way is

 -O2 -ftree-vectorize -fno-slp-vectorize

 The new way is

 -O2 -ftree-loop-vectorize



 2) -ftree-vect-loop-peeling

 This option is used to turn on/off loop peeling for alignment.  In the
 long run, this should be folded into the cheap cost model proposed by
 Richard.  This option is also useful in scenarios where peeling can
 introduce runtime problems:
 http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
 common in practice.



 Patch attached. Compiler boostrapped. Ok after testing?

 I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2).

Ok. Can you also comment on 2) ?


 I've stopped a quick try doing 1) myself because

 @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
  opts-x_flag_ipa_reference = false;
break;

 +case OPT_ftree_vectorize:
 +  if (!opts_set-x_flag_tree_loop_vectorize)
 + opts-x_flag_tree_loop_vectorize = value;
 +  if (!opts_set-x_flag_tree_slp_vectorize)
 + opts-x_flag_tree_slp_vectorize = value;
 +  break;

 doesn't look obviously correct.  Does that handle

   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

 or

   -ftree-loop-vectorize -fno-tree-vectorize

 properly?  Currently at least

   -ftree-slp-vectorize -fno-tree-vectorize

 doesn't work.


Right -- same is true for -fprofile-use option. FDO enables some
passes, but can not re-enable them if they are flipped off before.


 That said, the option machinery doesn't handle an option being an alias
 for two other options, so it's mechanism to contract positives/negatives
 doesn't work here and the override hooks do not work reliably for
 repeated options.

 Or am I wrong here?  Should we care at all?  Joseph?

We should probably just document the behavior. Even better, we should
deprecate the old option.

thanks,

David


 Thanks,
 Richard.


 thanks,

 David


Re: New GCC options for loop vectorization

2013-09-13 Thread Xinliang David Li
New patch attached.

1) the peeling part is removed
2) the new patch implements the last-one-wins logic. -ftree-vectorize
behaves like a true alias. -fno-tree-vectorize can override previous
-ftree-xxx-vectorize.

Ok for trunk after testing?

thanks,

David

On Fri, Sep 13, 2013 at 8:16 AM, Xinliang David Li davi...@google.com wrote:
 On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li davi...@google.com 
 wrote:
 Currently -ftree-vectorize turns on both loop and slp vectorizations,
 but there is no simple way to turn on loop vectorization alone. The
 logic for default O3 setting is also complicated.

 In this patch, two new options are introduced:

 1) -ftree-loop-vectorize

 This option is used to turn on loop vectorization only. option
 -ftree-slp-vectorize also becomes a first class citizen, and no funny
 business of Init(2) is needed.  With this change, -ftree-vectorize
 becomes a simple alias to -ftree-loop-vectorize +
 -ftree-slp-vectorize.

 For instance, to turn on only slp vectorize at O3, the old way is:

  -O3 -fno-tree-vectorize -ftree-slp-vectorize

 With the new change it becomes:

 -O3 -fno-loop-vectorize


 To turn on only loop vectorize at O2, the old way is

 -O2 -ftree-vectorize -fno-slp-vectorize

 The new way is

 -O2 -ftree-loop-vectorize



 2) -ftree-vect-loop-peeling

 This option is used to turn on/off loop peeling for alignment.  In the
 long run, this should be folded into the cheap cost model proposed by
 Richard.  This option is also useful in scenarios where peeling can
 introduce runtime problems:
 http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
 common in practice.



 Patch attached. Compiler boostrapped. Ok after testing?

 I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2).

 Ok. Can you also comment on 2) ?


 I've stopped a quick try doing 1) myself because

 @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
  opts-x_flag_ipa_reference = false;
break;

 +case OPT_ftree_vectorize:
 +  if (!opts_set-x_flag_tree_loop_vectorize)
 + opts-x_flag_tree_loop_vectorize = value;
 +  if (!opts_set-x_flag_tree_slp_vectorize)
 + opts-x_flag_tree_slp_vectorize = value;
 +  break;

 doesn't look obviously correct.  Does that handle

   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

 or

   -ftree-loop-vectorize -fno-tree-vectorize

 properly?  Currently at least

   -ftree-slp-vectorize -fno-tree-vectorize

 doesn't work.


 Right -- same is true for -fprofile-use option. FDO enables some
 passes, but can not re-enable them if they are flipped off before.


 That said, the option machinery doesn't handle an option being an alias
 for two other options, so it's mechanism to contract positives/negatives
 doesn't work here and the override hooks do not work reliably for
 repeated options.

 Or am I wrong here?  Should we care at all?  Joseph?

 We should probably just document the behavior. Even better, we should
 deprecate the old option.

 thanks,

 David


 Thanks,
 Richard.


 thanks,

 David
Index: ChangeLog
===
--- ChangeLog   (revision 202540)
+++ ChangeLog   (working copy)
@@ -1,3 +1,22 @@
+2013-09-12  Xinliang David Li  davi...@google.com
+
+   * tree-if-conv.c (main_tree_if_conversion): Check new flag.
+   * omp-low.c (omp_max_vf): Ditto.
+   (expand_omp_simd): Ditto.
+   * tree-vectorizer.c (vectorize_loops): Ditto.
+   (gate_vect_slp): Ditto.
+   (gate_increase_alignment): Ditto.
+   * tree-ssa-pre.c (inhibit_phi_insertion): Ditto.
+   * tree-ssa-loop.c (gate_tree_vectorize): Ditto.
+   (gate_tree_vectorize): Name change.
+   (tree_vectorize): Ditto.
+   (pass_vectorize::gate): Call new function.
+   (pass_vectorize::execute): Ditto.
+   opts.c: O3 default setting change.
+   (finish_options): Check new flag.
+   * doc/invoke.texi: Document new flags.
+   * common.opt: New flags.
+
 2013-09-12  Vladimir Makarov  vmaka...@redhat.com
 
PR middle-end/58335
Index: cp/lambda.c
===
--- cp/lambda.c (revision 202540)
+++ cp/lambda.c (working copy)
@@ -792,7 +792,7 @@ maybe_add_lambda_conv_op (tree type)
  particular, parameter pack expansions are marked PACK_EXPANSION_LOCAL_P in
  the body CALL, but not in DECLTYPE_CALL.  */
 
-  vectree, va_gc *direct_argvec;
+  vectree, va_gc *direct_argvec = NULL;
   tree decltype_call = 0, call;
   tree fn_result = TREE_TYPE (TREE_TYPE (callop));
 
Index: tree-ssa-loop.c
===
--- tree-ssa-loop.c (revision 202540)
+++ tree-ssa-loop.c (working copy)
@@ -303,7 +303,7 @@ make_pass_predcom (gcc::context *ctxt)
 /* Loop autovectorization.  */
 
 static unsigned int

Re: New GCC options for loop vectorization

2013-09-13 Thread Joseph S. Myers
On Fri, 13 Sep 2013, Richard Biener wrote:

 @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
  opts-x_flag_ipa_reference = false;
break;
 
 +case OPT_ftree_vectorize:
 +  if (!opts_set-x_flag_tree_loop_vectorize)
 + opts-x_flag_tree_loop_vectorize = value;
 +  if (!opts_set-x_flag_tree_slp_vectorize)
 + opts-x_flag_tree_slp_vectorize = value;
 +  break;
 
 doesn't look obviously correct.  Does that handle

It looks right to me.  The general principle is that the more specific 
option takes precedence over the less specific one, whatever the order on 
the command line.

   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

Should mean -ftree-slp-vectorize.

   -ftree-loop-vectorize -fno-tree-vectorize

Should mean -ftree-loop-vectorize.

   -ftree-slp-vectorize -fno-tree-vectorize

Should mean -ftree-slp-vectorize.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: New GCC options for loop vectorization

2013-09-13 Thread Xinliang David Li
Ok -- then my updated patch is wrong then. The implementation in the
first version matches the requirement.

thanks,

David


On Fri, Sep 13, 2013 at 9:45 AM, Joseph S. Myers
jos...@codesourcery.com wrote:
 On Fri, 13 Sep 2013, Richard Biener wrote:

 @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
  opts-x_flag_ipa_reference = false;
break;

 +case OPT_ftree_vectorize:
 +  if (!opts_set-x_flag_tree_loop_vectorize)
 + opts-x_flag_tree_loop_vectorize = value;
 +  if (!opts_set-x_flag_tree_slp_vectorize)
 + opts-x_flag_tree_slp_vectorize = value;
 +  break;

 doesn't look obviously correct.  Does that handle

 It looks right to me.  The general principle is that the more specific
 option takes precedence over the less specific one, whatever the order on
 the command line.

   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

 Should mean -ftree-slp-vectorize.

   -ftree-loop-vectorize -fno-tree-vectorize

 Should mean -ftree-loop-vectorize.

   -ftree-slp-vectorize -fno-tree-vectorize

 Should mean -ftree-slp-vectorize.

 --
 Joseph S. Myers
 jos...@codesourcery.com


Re: New GCC options for loop vectorization

2013-09-13 Thread Xinliang David Li
Updated patch implementing the logic that more specific option wins.

Ok for trunk?

thanks,

David

On Fri, Sep 13, 2013 at 9:48 AM, Xinliang David Li davi...@google.com wrote:
 Ok -- then my updated patch is wrong then. The implementation in the
 first version matches the requirement.

 thanks,

 David


 On Fri, Sep 13, 2013 at 9:45 AM, Joseph S. Myers
 jos...@codesourcery.com wrote:
 On Fri, 13 Sep 2013, Richard Biener wrote:

 @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
  opts-x_flag_ipa_reference = false;
break;

 +case OPT_ftree_vectorize:
 +  if (!opts_set-x_flag_tree_loop_vectorize)
 + opts-x_flag_tree_loop_vectorize = value;
 +  if (!opts_set-x_flag_tree_slp_vectorize)
 + opts-x_flag_tree_slp_vectorize = value;
 +  break;

 doesn't look obviously correct.  Does that handle

 It looks right to me.  The general principle is that the more specific
 option takes precedence over the less specific one, whatever the order on
 the command line.

   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

 Should mean -ftree-slp-vectorize.

   -ftree-loop-vectorize -fno-tree-vectorize

 Should mean -ftree-loop-vectorize.

   -ftree-slp-vectorize -fno-tree-vectorize

 Should mean -ftree-slp-vectorize.

 --
 Joseph S. Myers
 jos...@codesourcery.com
Index: omp-low.c
===
--- omp-low.c   (revision 202540)
+++ omp-low.c   (working copy)
@@ -2305,8 +2305,8 @@ omp_max_vf (void)
 {
   if (!optimize
   || optimize_debug
-  || (!flag_tree_vectorize
-  global_options_set.x_flag_tree_vectorize))
+  || (!flag_tree_loop_vectorize
+  global_options_set.x_flag_tree_loop_vectorize))
 return 1;
 
   int vs = targetm.vectorize.autovectorize_vector_sizes ();
@@ -5684,10 +5684,10 @@ expand_omp_simd (struct omp_region *regi
  loop-simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
  cfun-has_simduid_loops = true;
}
-  /* If not -fno-tree-vectorize, hint that we want to vectorize
+  /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
 the loop.  */
-  if ((flag_tree_vectorize
-  || !global_options_set.x_flag_tree_vectorize)
+  if ((flag_tree_loop_vectorize
+  || !global_options_set.x_flag_tree_loop_vectorize)
   loop-safelen  1)
{
  loop-force_vect = true;
Index: ChangeLog
===
--- ChangeLog   (revision 202540)
+++ ChangeLog   (working copy)
@@ -1,3 +1,22 @@
+2013-09-12  Xinliang David Li  davi...@google.com
+
+   * tree-if-conv.c (main_tree_if_conversion): Check new flag.
+   * omp-low.c (omp_max_vf): Ditto.
+   (expand_omp_simd): Ditto.
+   * tree-vectorizer.c (vectorize_loops): Ditto.
+   (gate_vect_slp): Ditto.
+   (gate_increase_alignment): Ditto.
+   * tree-ssa-pre.c (inhibit_phi_insertion): Ditto.
+   * tree-ssa-loop.c (gate_tree_vectorize): Ditto.
+   (gate_tree_vectorize): Name change.
+   (tree_vectorize): Ditto.
+   (pass_vectorize::gate): Call new function.
+   (pass_vectorize::execute): Ditto.
+   opts.c: O3 default setting change.
+   (finish_options): Check new flag.
+   * doc/invoke.texi: Document new flags.
+   * common.opt: New flags.
+
 2013-09-12  Vladimir Makarov  vmaka...@redhat.com
 
PR middle-end/58335
Index: opts.c
===
--- opts.c  (revision 202540)
+++ opts.c  (working copy)
@@ -498,7 +498,8 @@ static const struct default_options defa
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 
},
 { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
-{ OPT_LEVELS_3_PLUS, OPT_ftree_vectorize, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
@@ -826,7 +827,8 @@ finish_options (struct gcc_options *opts
 
   /* Set PARAM_MAX_STORES_TO_SINK to 0 if either vectorization or if-conversion
  is disabled.  */
-  if (!opts-x_flag_tree_vectorize || !opts-x_flag_tree_loop_if_convert)
+  if ((!opts-x_flag_tree_loop_vectorize  !opts-x_flag_tree_slp_vectorize)
+   || !opts-x_flag_tree_loop_if_convert)
 maybe_set_param_value (PARAM_MAX_STORES_TO_SINK, 0,
opts-x_param_values, opts_set-x_param_values);
 
@@ -1660,8 +1662,10 @@ common_handle_option (struct gcc_options
opts-x_flag_unswitch_loops = value;
   if (!opts_set-x_flag_gcse_after_reload)
opts-x_flag_gcse_after_reload = value;
-  if (!opts_set-x_flag_tree_vectorize)
-   opts-x_flag_tree_vectorize = value;
+ 

New GCC options for loop vectorization

2013-09-12 Thread Xinliang David Li
Currently -ftree-vectorize turns on both loop and slp vectorizations,
but there is no simple way to turn on loop vectorization alone. The
logic for default O3 setting is also complicated.

In this patch, two new options are introduced:

1) -ftree-loop-vectorize

This option is used to turn on loop vectorization only. option
-ftree-slp-vectorize also becomes a first class citizen, and no funny
business of Init(2) is needed.  With this change, -ftree-vectorize
becomes a simple alias to -ftree-loop-vectorize +
-ftree-slp-vectorize.

For instance, to turn on only slp vectorize at O3, the old way is:

 -O3 -fno-tree-vectorize -ftree-slp-vectorize

With the new change it becomes:

-O3 -fno-loop-vectorize


To turn on only loop vectorize at O2, the old way is

-O2 -ftree-vectorize -fno-slp-vectorize

The new way is

-O2 -ftree-loop-vectorize



2) -ftree-vect-loop-peeling

This option is used to turn on/off loop peeling for alignment.  In the
long run, this should be folded into the cheap cost model proposed by
Richard.  This option is also useful in scenarios where peeling can
introduce runtime problems:
http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
common in practice.



Patch attached. Compiler boostrapped. Ok after testing?


thanks,

David
Index: omp-low.c
===
--- omp-low.c   (revision 202481)
+++ omp-low.c   (working copy)
@@ -2305,8 +2305,8 @@ omp_max_vf (void)
 {
   if (!optimize
   || optimize_debug
-  || (!flag_tree_vectorize
-  global_options_set.x_flag_tree_vectorize))
+  || (!flag_tree_loop_vectorize
+  global_options_set.x_flag_tree_loop_vectorize))
 return 1;
 
   int vs = targetm.vectorize.autovectorize_vector_sizes ();
@@ -5684,10 +5684,10 @@ expand_omp_simd (struct omp_region *regi
  loop-simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
  cfun-has_simduid_loops = true;
}
-  /* If not -fno-tree-vectorize, hint that we want to vectorize
+  /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
 the loop.  */
-  if ((flag_tree_vectorize
-  || !global_options_set.x_flag_tree_vectorize)
+  if ((flag_tree_loop_vectorize
+  || !global_options_set.x_flag_tree_loop_vectorize)
   loop-safelen  1)
{
  loop-force_vect = true;
Index: ChangeLog
===
--- ChangeLog   (revision 202481)
+++ ChangeLog   (working copy)
@@ -1,3 +1,24 @@
+2013-09-12  Xinliang David Li  davi...@google.com
+
+   * tree-if-conv.c (main_tree_if_conversion): Check new flag.
+   * omp-low.c (omp_max_vf): Ditto.
+   (expand_omp_simd): Ditto.
+   * tree-vectorizer.c (vectorize_loops): Ditto.
+   (gate_vect_slp): Ditto.
+   (gate_increase_alignment): Ditto.
+   * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Ditto.
+   * tree-ssa-pre.c (inhibit_phi_insertion): Ditto.
+   * tree-ssa-loop.c (gate_tree_vectorize): Ditto.
+   (gate_tree_vectorize): Name change.
+   (tree_vectorize): Ditto.
+   (pass_vectorize::gate): Call new function.
+   (pass_vectorize::execute): Ditto.
+   opts.c: O3 default setting change.
+   (finish_options): Check new flag.
+   * doc/invoke.texi: Document new flags.
+   * common.opt: New flags.
+
+
 2013-09-10  Richard Earnshaw  rearn...@arm.com
 
PR target/58361
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 202481)
+++ doc/invoke.texi (working copy)
@@ -419,10 +419,12 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-loop-if-convert-stores -ftree-loop-im @gol
 -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
+-ftree-loop-vectorize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
 -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
--ftree-switch-conversion -ftree-tail-merge @gol
--ftree-ter -ftree-vect-loop-version -ftree-vectorize -ftree-vrp @gol
+-ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
+-ftree-vect-loop-version -ftree-vect-loop-peeling -ftree-vectorize @gol
+-ftree-vrp @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
 -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
 -fvariable-expansion-in-unroller -fvect-cost-model -fvpt -fweb @gol
@@ -6748,8 +6750,8 @@ invoking @option{-O2} on programs that u
 Optimize yet more.  @option{-O3} turns on all optimizations specified
 by @option{-O2} and also turns on the @option{-finline-functions},
 @option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload}, @option{-ftree-vectorize},
-@option{-fvect-cost-model},
+@option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},