Re: [PATCH, vec-tails 03/10] Support epilogues vectorization with no masking

2016-06-15 Thread Jeff Law

On 06/15/2016 05:03 AM, Richard Biener wrote:

On Thu, May 19, 2016 at 9:39 PM, Ilya Enkovich
 wrote:

Hi,

This patch introduces changes required to run vectorizer on loop
epilogue. This also enables epilogue vectorization using a vector
of smaller size.


While the idea of epilogue vectorization sounds straight-forward the
implementation is somewhat icky with all the ->aux stuff, "redundant"
if-conversion and loop iteration stuff.

So I was thinking of when epilogue vectorization is beneficial which
is obviously when the overall loop trip count is low.  We are not
good in optimizing for that case generally (too much peeling for
alignment, using expensive avx256 vectorization, etc.), so I wonder
if versioning for that case would be a better idea
(performance-wise).

Thus - what cases were you looking at when deciding that vectorizing
the epilogue (with a smaller vector size) is profitable?  Do other
compilers generally do this?
I would think it's better stated that the relative benefits of 
vectorizing the epilogue are greater the shorter the loop, but that's 
nit-picking the discussion.


I do think you've got a legitimate question though.   Ilya, can you give 
any insights here based on your KNL and Haswell testing or data/insights 
from the LLVM and/or ICC teams?


Jeff


Re: [PATCH, vec-tails 02/10] Extend _loop_vec_info structure with epilogue related fields

2016-06-15 Thread Jeff Law

On 05/19/2016 01:38 PM, Ilya Enkovich wrote:

Hi,

This patch adds new fields to _loop_vec_info structure to support loop
epilogue vectorization.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* tree-vectorizer.h (struct _loop_vec_info): Add new fields
can_be_masked, required_masks, mask_epilogue, combine_epilogue,
need_masking, orig_loop_info.
(LOOP_VINFO_CAN_BE_MASKED): New.
(LOOP_VINFO_REQUIRED_MASKS): New.
(LOOP_VINFO_COMBINE_EPILOGUE): New.
(LOOP_VINFO_MASK_EPILOGUE): New.
(LOOP_VINFO_NEED_MASKING): New.
(LOOP_VINFO_ORIG_LOOP_INFO): New.
(LOOP_VINFO_EPILOGUE_P): New.
(LOOP_VINFO_ORIG_MASK_EPILOGUE): New.
(LOOP_VINFO_ORIG_VECT_FACTOR): New.
* tree-vect-loop.c (new_loop_vec_info): Initialize new
_loop_vec_info fields.
I don't see anything that is inherently wrong/bad here; I think 
this would be fine once the whole set is approved.  I also think that if you 
find you need additional similar kinds of fields, that would be OK 
as well.


The one question I would ask -- do we ever need to copy VINFO data from 
one loop to a duplicate, and if so, ISTM that the code to copy that data 
would be a part of this patch.


jeff



Re: [PATCH, vec-tails 01/10] New compiler options

2016-06-15 Thread Jeff Law

On 05/20/2016 05:40 AM, Ilya Enkovich wrote:

2016-05-20 14:17 GMT+03:00 Richard Biener :

On Fri, May 20, 2016 at 11:50 AM, Ilya Enkovich  wrote:

2016-05-20 12:26 GMT+03:00 Richard Biener :

On Thu, May 19, 2016 at 9:36 PM, Ilya Enkovich  wrote:

Hi,

This patch introduces new options used for loop epilogues vectorization.


Why's that?  This is a bit too much for the casual user and if it is
really necessary
to control this via options then it is not fine-grained enough.

Why doesn't the vectorizer/backend have enough info to decide this itself?


I don't expect a casual user to decide which modes to choose.  These controls are
added for debugging and performance measurement purposes.  I see now that I'm missing
-ftree-vectorize-epilogues aliased to -ftree-vectorize-epilogues=all.  I certainly
expect epilogue and short-loop vectorization to be enabled by default at -O3
or by -ftree-vectorize-loops.


Can you make all these --params then?  I think to be useful to users we'd want
them to be loop pragmas rather than options.


OK, I'll change it to params.  I didn't think about control via
pragmas but will do now.

So the questions I'd like to see answered:

1. You've got 3 modes for epilogue vectorization.  Is this an artifact 
of not really having good heuristics yet for which mode to apply to a 
particular loop at this time?


2. Similarly for cost models.


In the cover message you indicated you were getting the expected gains on 
KNL, but not on Haswell.  Do you have any sense yet why you're not 
getting good results on Haswell?  For KNL are you getting those 
speedups with a generic set of options or are those with a custom set of 
options to set the mode & cost models?


jeff


Re: [RFC][PATCH, vec-tails 00/10] Support vectorization of loop epilogues

2016-06-15 Thread Jeff Law

On 06/15/2016 06:05 AM, Richard Biener wrote:

So I've gone over the patches and gave mostly high-level comments.
The vectorizer is already in somewhat messy (aka not easy to follow)
state, this series doesn't improve the situation (heh).  Esp. the
high-level structure for code generation and its documentation needs
work (where we do versioning / peeling and how we use the copies in
which condition and where, etc).
Expecting major improvements here may not be realistic.  I think the 
question we need to answer is whether or not the improvements from this 
work justify the added complexity.


In an ideal world, I think we'd probably start over on the vectorizer, 
but the infrastructure we have is what it is -- a tangled mess that is 
difficult to understand without working in it regularly.


I'm still hoping to give this stuff a high level looksie before going on 
PTO later this month.


Jeff



Re: [PATCH] Fix up CSE handling of const/pure calls (PR rtl-optimization/71532)

2016-06-15 Thread Jeff Law

On 06/15/2016 07:14 PM, Alan Modra wrote:

On Wed, Jun 15, 2016 at 04:03:04PM -0600, Jeff Law wrote:

FWIW I don't think ownership of the argument slots has ever been
definitively addressed by any ABI and it's been an open question in my mind
for 20+ years -- though I've largely leaned towards callee ownership on my
own thinking.  In an ideal world we'd push to get this clarified at the ABI
level.


The PowerPC64 ABI specifies that the stack parameter save area is not
preserved over calls.  There's a good reason for this:  An ABI that
specifies stack argument slots as preserved over calls cannot allow
sibling calls.
I didn't know it was specified there.  Excellent.  Good to see someone 
doing the right thing with getting that nailed down.


I did read your message about sibling call support implying callee 
ownership of the slots -- and I agree with that conclusion.
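
A toy illustration (made-up functions f and g, illustrative only) of why the
two properties can't coexist:

/* If f's call to g is compiled as a sibling (tail) call, g's outgoing
   stack arguments are written into the very slots that held f's
   incoming arguments, so an ABI that promised argument slots are
   preserved across calls could not allow the transformation.  Assumes
   a typical ABI where, with this many integer arguments, some of them
   are passed on the stack.  */
extern int g (int, int, int, int, int, int, int, int, int, int);

int
f (int a, int b, int c, int d, int e, int v, int w, int x, int y, int z)
{
  return g (b, a, d, c, v, e, x, w, z, y);
}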


jeff


[PATCH,openacc] check for compatible loop parallelism with acc routine calls

2016-06-15 Thread Cesar Philippidis
This patch addresses the following problems with acc routines:

 * incorrectly permitting 'acc seq' loops to call gang, worker and
   vector routines

 * lto-wrapper errors when a function or subroutine isn't marked as
   'acc routine'

The solution to the first problem is straightforward. It only required a
small change to oacc_loop_fixed_partitions. The solution to the second
problem is more involved, since it required changes to the fortran FE,
gimplifier, the behavior of flag_generate_offload, and libgomp.
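
For illustration (my reading of the above; this is not a test from the patch),
the first fix is meant to diagnose something along these lines:

/* A routine requiring gang parallelism called from a loop explicitly
   marked 'seq'; per the description above this should now be rejected
   instead of silently accepted.  Names are made up.  */
#pragma acc routine gang
extern void gang_fn (int *a);

void
foo (int *a)
{
#pragma acc parallel loop seq
  for (int i = 0; i < 100; i++)
    gang_fn (a);
}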

Starting with the fortran changes, this patch updates the way that
the fortran FE handles the 'acc routine' attribute in modules. Before,
it only recorded that a function was marked as an acc routine. With this
patch, it now records the level of parallelism the routine has. This is
necessary for the middle end to validate compatible parallelism between
the loop calling the routine and the routine itself.

The second set of changes involves teaching the gimplifier to error when
it detects a function call to a non-acc routine inside an OpenACC
offloaded region. Actually, I relaxed that check by excluding
calls to builtin functions, including those prefixed with _gfortran_.
Nvptx does have a newlib C library, and it also has a subset of
libgfortran. Still, this solution is probably not optimal.

Next, I had to modify the openacc header files in libgomp to mark
acc_on_device as an acc routine. Unfortunately, this meant that I had to
build the openacc.mod module for gfortran with -fopenacc. But doing
that caused gcc to stream offloaded code to the openacc.o object
file. So, I've updated the behavior of flag_generate_offload such that
minus one indicates that the user specified -foffload=disable, and that
will prevent gcc from streaming offloaded LTO code. The alternative was
to hack libtool to build libgomp with -foffload=disable.

Is this patch OK for trunk?

There are still a couple of other quirks with routines we'll need to
address with a follow-up patch. Namely, passing scalar dummy arguments
to subroutines trips up the nvptx worker and vector state
propagator if the actual argument is a local variable. That's because
the nvptx state propagator only forwards the pointer to the worker and
vector threads, and not the actual variable itself. Consequently, those
pointers dereference garbage. This is a problem with pass-by-reference
in general.

Cesar

2016-06-15  Cesar Philippidis  

	gcc/
	* cgraphunit.c (ipa_passes): Only stream offloaded code when
	flag_generate_offload is positive.
	(symbol_table::compile): Likewise.
	* common.opt (flag_generate_offload): Update comment on its usage.
	* gimplify.c (gimplify_call_expr): Verify that function calls inside
	OpenACC offloaded regions are 'acc routines'.
	* ipa-inline-analysis.c (inline_generate_summary): Update the usage of
	flag_generate_offload.
	* lto-streamer.c (gate_lto_out): Likewise.
	* omp-low.c (oacc_loop_fixed_partitions): Consider SEQ loop when
	validating loop parallelism restrictions.
	* opts.c (common_handle_option): Set x_flag_generate_offload to minus
	one with -foffload=disable.
	* passes.c (ipa_write_summaries): Update usage of flag_generate_offload.
	* toplev.c (compile_file): Likewise.
	* tree.c (free_lang_data):  Likewise.

	gcc/fortran/
	* gfortran.h (enum oacc_function): New enum.
	* module.c (oacc_function): New DECIO_MIO_NAME.
	(mio_symbol_attribute): Handle oacc_function attributes.
	* openmp.c (gfc_oacc_routine_dims): Use enum oacc_function to capture
	acc routine geometry.
	(gfc_match_oacc_routine): Update call to gfc_oacc_routine_dims.
	* symbol.c (oacc_function_types): New const mstring.
	* trans-decl.c (add_attributes_to_decl): Update handling of
	oacc_function.

	gcc/testsuite/
	* c-c++-common/goacc/routine-3.c: Add test coverage for seq loops.
	* c-c++-common/goacc/routine-6.c: New test.
	* gfortran.dg/goacc/routine-7.f90: New test.
	* gfortran.dg/goacc/routine-8.f90: New test.

	libgomp/
	* Makefile.am (openacc.lo): New target.
	(openacc.mod): Build with -fopenacc -foffload=disable.
	* Makefile.in: Regenerate.
	* openacc.f90 (function_on_device_h): Make 'acc routine seq'.
	* openacc.h (acc_on_device): Likewise.
	* openacc_lib.h (acc_on_device): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-4.c: Filter out warning.
	* testsuite/libgomp.oacc-fortran/routine-7.f90: Update test case to
	properly utilize acc parallelism.

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 4bfcad7..5dd211c 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2292,12 +2292,12 @@ ipa_passes (void)
 }
 
   /* Some targets need to handle LTO assembler output specially.  */
-  if (flag_generate_lto || flag_generate_offload)
+  if (flag_generate_lto || flag_generate_offload > 0)
 targetm.asm_out.lto_start ();
 
   if (!in_lto_p)
 {
-  if (g->have_offload)
+  if (g->have_offload && flag_generate_offload > 0)
 	{
 	  section_name_prefix = 

Re: [C++ PATCH] Don't promote bitfields in last arg of __builtin_*_overflow_p

2016-06-15 Thread Martin Sebor

On 06/15/2016 01:51 PM, Jakub Jelinek wrote:

On Wed, Jun 15, 2016 at 08:08:22AM -0600, Martin Sebor wrote:

I like the idea of being able to use the built-ins for this, but
I think it would be confusing for them to follow subtly different
rules for C than for C++.  Since the value of the last argument


Here is incremental patch to the patch I've posted earlier today,
which doesn't promote the last argument of __builtin_*_overflow_p
and thus for bitfields it behaves pretty much like the C FE.


Looks fine to me.  The bit-field handling should be explained
in the manual.  Though useful, it's unusual enough that I don't
think people will expect it (there have been bug reports or
questions in the past about the C handling of bit-fields from
users familiar with the C++ semantics).
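
As a concrete illustration of the semantics in question (my own example,
assuming the behaviour described above):

struct B { int j : 3; };	/* representable values: -4 .. 3 */

int
fits_check (int x, int y)
{
  struct B b = { 0 };
  /* With the patch, C++ behaves like C here: the check is done in the
     3-bit precision of b.j rather than in promoted int, so this
     returns nonzero iff x * y does not fit in -4 .. 3.  */
  return __builtin_mul_overflow_p (x, y, b.j);
}

/* fits_check (2, 2) == 1, since 4 fits in int but not in the
   bit-field; fits_check (1, 3) == 0.  */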

Martin



Re: [PATCH, ping] zero-length arrays in OpenACC

2016-06-15 Thread Cesar Philippidis
Ping.

Cesar

On 06/01/2016 02:35 PM, Cesar Philippidis wrote:
> This patch teaches c and c++ front ends and omp-low how to deal with
> subarray involving GOMP_MAP_FORCE_{PRESENT,TO,FROM,TOFROM} data
> mappings. As the libgomp test case shows, it might be possible for a
> subarray to have zero length. This patch fixes that.
> 
> I also updated *parser_oacc_declare not to handle GOMP_MAP_POINTER,
> because subarray pointers are now firstprivate.
> 
> Is this OK for trunk?
> 
> Cesar
> 



Re: [PATCH] Fix up CSE handling of const/pure calls (PR rtl-optimization/71532)

2016-06-15 Thread Alan Modra
On Wed, Jun 15, 2016 at 04:03:04PM -0600, Jeff Law wrote:
> FWIW I don't think ownership of the argument slots has ever been
> definitively addressed by any ABI and it's been an open question in my mind
> for 20+ years -- though I've largely leaned towards callee ownership on my
> own thinking.  In an ideal world we'd push to get this clarified at the ABI
> level.

The PowerPC64 ABI specifies that the stack parameter save area is not
preserved over calls.  There's a good reason for this:  An ABI that
specifies stack argument slots as preserved over calls cannot allow
sibling calls.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Backport PowerPC complex __float128 compiler support to GCC 6.x

2016-06-15 Thread Michael Meissner
On Wed, Jun 15, 2016 at 03:12:55PM +0200, Richard Biener wrote:
> On Wed, 15 Jun 2016, Michael Meissner wrote:
> > Eventually, I decided to punt having to have explicit paths for widening.  I
> > used fractional modes for IFmode (ibm long double format) and KFmode (IEEE
> > 128-bit format).  IFmode widens to KFmode which widens to TFmode.  A backend
> > hook is used to not allow IBM long double to widen to IEEE long double and 
> > vice
> > versa.  At the moment, since there is no wider type than 128-bits, it isn't 
> > an
> > issue.
> 
> Ok, using fractional float modes is a lie though as both IFmode and KFmode
> have no insignificant bits.  I checked that if you have two same-size
> FLOAT_MODEs they still get iterated over with GET_MODE_WIDER_MODE,
> in order of appearance.  So your reason to use fractional modes was
> to make the order explicit and to avoid bogus handling in for example
> the constant pool handling which would mistake say IFmode as being
> smaller than KFmode just because one is GET_MODE_WIDER_MODE of the other
> (and both have the same precision)?  Interestingly for 
> GET_MODE_2XWIDER_MODE it "skips" the duplicate-sized and chooses
> the "first" one.
> 
> I guess having to have both float formats usable at the same time
> is not really supported by our mode machinery.
> 
> > Note, I do feel the front ends should be modified to allow __complex 
> > __float128 directly rather than having to use an attribute to force the 
> > complex type (and to use mode(TF) on x86 or mode(KF) on PowerPC).  It 
> > would clean up both x86 and PowerPC.  However, those patches aren't 
> > written yet.
> 
> Sure, but that's independent of the modes issue.
> 
> Anyway, I'm fine with the backport if you sort out the issue with
> the assert in layout_type.

I did change the code as you suggested, and it did bootstrap and had no
regressions.  When I get back on Monday, I will look at implementing the fix in
trunk, and then putting the revised patch into GCC 6.2.

Thanks.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] Add port for Phoenix-RTOS on ARM platform.

2016-06-15 Thread Jeff Law

On 06/15/2016 08:21 AM, Jakub Sejdak wrote:

Hello,


First of all, do you or your employer have a copyright assignment
to the FSF? The above link contains instructions on how to do that.
It is a necessary prerequisite to accepting any non-small change.


Sorry for a late response, but it took me some time to fulfill
requirements mentioned above.
We (Phoenix Systems) now have a copyright assignment to the FSF.

Which I can confirm was recently recorded by the FSF.

Jeff


Re: [PATCH] Fix up CSE handling of const/pure calls (PR rtl-optimization/71532)

2016-06-15 Thread Jeff Law

On 06/15/2016 01:46 PM, Jakub Jelinek wrote:

Hi!

As the following testcase shows, CSE mishandles const/pure calls, it assumes
that const/pure calls can't clobber even their argument slots.  But, the
argument slots are owned by the callee, so need to be volatile across the
calls.  On the testcase the second round of argument stores is eliminated,
because CSE thinks those memory slots already have the right values.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2016-06-15  Jakub Jelinek  

PR rtl-optimization/71532
* cse.c (cse_insn): For const/pure calls, invalidate argument passing
memory slots.

* gcc.dg/torture/pr71532.c: New test.
FWIW I don't think ownership of the argument slots has ever been 
definitively addressed by any ABI and it's been an open question in my 
mind for 20+ years -- though I've largely leaned towards callee 
ownership on my own thinking.  In an ideal world we'd push to get this 
clarified at the ABI level.


Certainly this is the safe thing to do.

OK for the trunk.


Jeff



Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-06-15 Thread Marc Glisse

On Wed, 15 Jun 2016, Kyrill Tkachov wrote:

This is a respin of https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html 
following feedback.
I've changed the code to cast the operand to an unsigned type before applying 
the multiplication algorithm

and cast it back to the signed type at the end.
Whether to perform the cast is now determined by the function 
cast_mult_synth_to_unsigned in which I've implemented

the cases that Marc mentioned in [1]. Please do let me know
if there are any other cases that need to be handled.


Ah, I never meant those cases as an exhaustive list, I was just looking 
for examples showing that the transformation was unsafe, and those 2 came 
to mind:


- x*15 -> x*16-x the second one can overflow even when the first one 
doesn't.


- x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe that's 
redundant with the negate_variant check?)


On the other hand, as long as we remain in the 'positive' operations, 
turning x*3 to x<<1+x seems perfectly safe. And even x*30 to (x*16-x)*2 
cannot cause spurious overflows. But I didn't look at the algorithm 
closely enough to characterize the safe cases. Now if you have done it, 
that's good :-) Otherwise, we might want to err on the side of caution.
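
To make the first hazard, and the cast-to-unsigned approach the patch takes,
concrete, here is a small self-contained example (illustrative only, values
made up):

#include <limits.h>
#include <stdio.h>

int
main (void)
{
  int x = INT_MAX / 15;			/* x * 15 still fits in int ...  */
  unsigned int ux = (unsigned int) x;
  /* ... but the synthesized x * 16 - x would overflow in signed
     arithmetic, so the shift-and-subtract is done on the unsigned copy
     and the result converted back at the end.  */
  int r = (int) (ux * 16u - ux);
  printf ("%d %d\n", r, x * 15);	/* prints the same value twice  */
  return 0;
}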


--
Marc Glisse


Re: [PATCH], PowerPC: Allow DImode in Altivec registers

2016-06-15 Thread Michael Meissner
On Wed, Jun 15, 2016 at 02:51:20PM -0500, Segher Boessenkool wrote:
> On Wed, Jun 15, 2016 at 02:24:41PM -0400, Michael Meissner wrote:
> > > >  ; Some DImode loads are best done as a load of -1 followed by a mask
> > > >  ; instruction.
> > > >  (define_split
> > > > -  [(set (match_operand:DI 0 "gpc_reg_operand")
> > > > +  [(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
> > > 
> > > Not sure what this is for...  If you want to say this split is only to
> > > be done after RA, just say that explicitly in the split condition (i.e.
> > > "reload_completed").  Or does this mean something else?
> > 
> > This is so that constants being loaded into the vector registers aren't 
> > split
> > (they are handled via different define_splits).  Previously, the only 
> > constant
> > that was loaded in vector registers was 0.
> > 
> > The int_reg_operand_not_pseudo allows the split to take place if it has 
> > already
> > gotten hard registers before register allocation.
> 
> When does that happen?

Using arguments, function returns, and of course explicit registers, but I
agree it is fairly low.

> > It could have been the
> > normal int_reg_operand and then use a reload_completed check.
> 
> That is preferred if it makes no difference (otherwise, before you know
> it we'll have twice as many predicates).

We already had the predicate for another use, so I wasn't adding a new one.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] Fix builtin-arith-overflow-p-1[23].c on i686

2016-06-15 Thread Uros Bizjak
On Wed, Jun 15, 2016 at 9:57 PM, Jakub Jelinek  wrote:
> Hi!
>
> On the builtin-arith-overflow-p-1{2,3}.c testcases (posted earlier today)
> i?86 miscompiles e.g. t111_4mul function.  Before peephole2 we have:
> (insn 9 6 50 2 (parallel [
> (set (reg:CCO 17 flags)
> (eq:CCO (mult:DI (sign_extend:DI (reg/v:SI 0 ax [orig:90 x ] 
> [90]))
> (const_int -1073741824 [0xc000]))
> (sign_extend:DI (mult:SI (reg/v:SI 0 ax [orig:90 x ] [90])
> (const_int -1073741824 [0xc000])
> (set (reg:SI 1 dx [91])
> (mult:SI (reg/v:SI 0 ax [orig:90 x ] [90])
> (const_int -1073741824 [0xc000])))
> ]) builtin-arith-overflow-p-12.i:35 326 {*mulvsi4_1}
>  (expr_list:REG_UNUSED (reg:SI 1 dx [91])
> (nil)))
>
> (insn 50 9 51 2 (set (reg:QI 1 dx [orig:89 _5+4 ] [89])
> (eq:QI (reg:CCO 17 flags)
> (const_int 0 [0]))) builtin-arith-overflow-p-12.i:35 631 
> {*setcc_qi}
>  (expr_list:REG_DEAD (reg:CCO 17 flags)
> (nil)))
>
> (insn 51 50 16 2 (set (reg:SI 1 dx [orig:89 _5+4 ] [89])
> (zero_extend:SI (reg:QI 1 dx [orig:89 _5+4 ] [89]))) 
> builtin-arith-overflow-p-12.i:35 130 {*zero_extendqisi2}
>  (nil))
>
> and the setcc+movzbl peephole ignores the dx = ax * -1073741824
> SET in the PARALLEL, because it isn't a CLOBBER and thus initializes
> %edx before insn 9, then insn 9 overwrites it and later on we store
> the QImode part, assuming the rest is zero.
> Fixed by using reg_set_p, to actually test if the operands[3] REG
> is SET or CLOBBERed.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Please also change similar peephole2 pattern (that does a zext with an
and insn) a couple of patterns below the one you are changing.

The patch is OK for mainline and also release branches.

Thanks,
Uros.


Re: [C++ Patch] Avoid a few more '+' in warnings

2016-06-15 Thread Jason Merrill

Yep, that looks obvious.

Jason


[C++ Patch] Avoid a few more '+' in warnings

2016-06-15 Thread Paolo Carlini

Hi,

looks like last year I forgot to grep for %q+F and %q+#F. Tested 
x86_64-linux. Should be obvious...


Thanks,
Paolo.

//
2016-06-15  Paolo Carlini  

* decl.c (wrapup_globals_for_namespace): Use DECL_SOURCE_LOCATION and
"%qF" in warning_at instead of "%q+F" in warning.
(check_redeclaration_exception_specification): Likewise in pedwarn
(and error, inform, for consistency).
* call.c (joust): Likewise.
Index: call.c
===
--- call.c  (revision 237318)
+++ call.c  (working copy)
@@ -9422,10 +9424,10 @@ joust (struct z_candidate *cand1, struct z_candida
 "default argument mismatch in "
 "overload resolution"))
{
- inform (input_location,
- " candidate 1: %q+#F", cand1->fn);
- inform (input_location,
- " candidate 2: %q+#F", cand2->fn);
+ inform (DECL_SOURCE_LOCATION (cand1->fn),
+ " candidate 1: %q#F", cand1->fn);
+ inform (DECL_SOURCE_LOCATION (cand2->fn),
+ " candidate 2: %q#F", cand2->fn);
}
}
  else
Index: decl.c
===
--- decl.c  (revision 237318)
+++ decl.c  (working copy)
@@ -914,8 +914,9 @@ wrapup_globals_for_namespace (tree name_space, voi
&& !DECL_ARTIFICIAL (decl)
&& !TREE_NO_WARNING (decl))
  {
-   warning (OPT_Wunused_function,
-"%q+F declared % but never defined", decl);
+   warning_at (DECL_SOURCE_LOCATION (decl),
+   OPT_Wunused_function,
+   "%qF declared % but never defined", decl);
TREE_NO_WARNING (decl) = 1;
  }
 }
@@ -1233,18 +1234,20 @@ check_redeclaration_exception_specification (tree
   && !comp_except_specs (new_exceptions, old_exceptions, ce_normal))
 {
   const char *msg
-   = "declaration of %q+F has a different exception specifier";
+   = "declaration of %qF has a different exception specifier";
   bool complained = true;
+  location_t new_loc = DECL_SOURCE_LOCATION (new_decl);
   if (DECL_IN_SYSTEM_HEADER (old_decl))
-   complained = pedwarn (0, OPT_Wsystem_headers, msg, new_decl);
+   complained = pedwarn (new_loc, OPT_Wsystem_headers, msg, new_decl);
   else if (!flag_exceptions)
/* We used to silently permit mismatched eh specs with
   -fno-exceptions, so make them a pedwarn now.  */
-   complained = pedwarn (0, OPT_Wpedantic, msg, new_decl);
+   complained = pedwarn (new_loc, OPT_Wpedantic, msg, new_decl);
   else
-   error (msg, new_decl);
+   error_at (new_loc, msg, new_decl);
   if (complained)
-   inform (0, "from previous declaration %q+F", old_decl);
+   inform (DECL_SOURCE_LOCATION (old_decl),
+   "from previous declaration %qF", old_decl);
 }
 }
 


[PATCH] Fix builtin-arith-overflow-p-1[23].c on i686

2016-06-15 Thread Jakub Jelinek
Hi!

On the builtin-arith-overflow-p-1{2,3}.c testcases (posted earlier today)
i?86 miscompiles e.g. t111_4mul function.  Before peephole2 we have:
(insn 9 6 50 2 (parallel [
(set (reg:CCO 17 flags)
(eq:CCO (mult:DI (sign_extend:DI (reg/v:SI 0 ax [orig:90 x ] 
[90]))
(const_int -1073741824 [0xc000]))
(sign_extend:DI (mult:SI (reg/v:SI 0 ax [orig:90 x ] [90])
(const_int -1073741824 [0xc000])
(set (reg:SI 1 dx [91])
(mult:SI (reg/v:SI 0 ax [orig:90 x ] [90])
(const_int -1073741824 [0xc000])))
]) builtin-arith-overflow-p-12.i:35 326 {*mulvsi4_1}
 (expr_list:REG_UNUSED (reg:SI 1 dx [91])
(nil)))

(insn 50 9 51 2 (set (reg:QI 1 dx [orig:89 _5+4 ] [89])
(eq:QI (reg:CCO 17 flags)
(const_int 0 [0]))) builtin-arith-overflow-p-12.i:35 631 {*setcc_qi}
 (expr_list:REG_DEAD (reg:CCO 17 flags)
(nil)))

(insn 51 50 16 2 (set (reg:SI 1 dx [orig:89 _5+4 ] [89])
(zero_extend:SI (reg:QI 1 dx [orig:89 _5+4 ] [89]))) 
builtin-arith-overflow-p-12.i:35 130 {*zero_extendqisi2}
 (nil))

and the setcc+movzbl peephole ignores the dx = ax * -1073741824
SET in the PARALLEL, because it isn't a CLOBBER and thus initializes
%edx before insn 9, then insn 9 overwrites it and later on we store
the QImode part, assuming the rest is zero.
Fixed by using reg_set_p, to actually test if the operands[3] REG
is SET or CLOBBERed.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-06-15  Jakub Jelinek  

* config/i386/i386.md (setcc + movzbl peephole2): Use reg_set_p.

--- gcc/config/i386/i386.md.jj  2016-06-14 21:38:40.0 +0200
+++ gcc/config/i386/i386.md 2016-06-15 18:56:41.405559224 +0200
@@ -11849,8 +11849,7 @@ (define_peephole2
   "(peep2_reg_dead_p (3, operands[1])
 || operands_match_p (operands[1], operands[3]))
&& ! reg_overlap_mentioned_p (operands[3], operands[0])
-   && ! (GET_CODE (operands[4]) == CLOBBER
-&& reg_mentioned_p (operands[3], operands[4]))"
+   && ! reg_set_p (operands[3], operands[4])"
   [(parallel [(set (match_dup 5) (match_dup 0))
  (match_dup 4)])
(set (strict_low_part (match_dup 6))

Jakub


[C++ PATCH] Don't promote bitfields in last arg of __builtin_*_overflow_p

2016-06-15 Thread Jakub Jelinek
On Wed, Jun 15, 2016 at 08:08:22AM -0600, Martin Sebor wrote:
> I like the idea of being able to use the built-ins for this, but
> I think it would be confusing for them to follow subtly different
> rules for C than for C++.  Since the value of the last argument

Here is incremental patch to the patch I've posted earlier today,
which doesn't promote the last argument of __builtin_*_overflow_p
and thus for bitfields it behaves pretty much like the C FE.

Bootstrapped/regtested (together with the other patch) on x86_64-linux and
i686-linux, ok for trunk?

2016-06-15  Jakub Jelinek  

* call.c (magic_varargs_p): Return 3 for __builtin_*_overflow_p.
(build_over_call): For magic == 3, do no conversion only on 3rd
argument.

* c-c++-common/torture/builtin-arith-overflow-p-19.c: Run for C++ too.
* g++.dg/ext/builtin-arith-overflow-2.C: New test.

--- gcc/cp/call.c.jj2016-06-15 09:17:22.0 +0200
+++ gcc/cp/call.c   2016-06-15 17:06:57.097793446 +0200
@@ -7132,7 +7132,8 @@ convert_for_arg_passing (tree type, tree
which just decay_conversion or no conversions at all should be done.
This is true for some builtins which don't act like normal functions.
Return 2 if no conversions at all should be done, 1 if just
-   decay_conversion.  */
+   decay_conversion.  Return 3 for special treatment of the 3rd argument
+   for __builtin_*_overflow_p.  */
 
 int
 magic_varargs_p (tree fn)
@@ -7149,6 +7150,11 @@ magic_varargs_p (tree fn)
   case BUILT_IN_VA_START:
return 1;
 
+  case BUILT_IN_ADD_OVERFLOW_P:
+  case BUILT_IN_SUB_OVERFLOW_P:
+  case BUILT_IN_MUL_OVERFLOW_P:
+   return 3;
+
   default:;
return lookup_attribute ("type generic",
 TYPE_ATTRIBUTES (TREE_TYPE (fn))) != 0;
@@ -7606,14 +7612,14 @@ build_over_call (struct z_candidate *can
   for (; arg_index < vec_safe_length (args); ++arg_index)
 {
   tree a = (*args)[arg_index];
-  if (magic == 2)
+  if ((magic == 3 && arg_index == 2) || magic == 2)
{
  /* Do no conversions for certain magic varargs.  */
  a = mark_type_use (a);
  if (TREE_CODE (a) == FUNCTION_DECL && reject_gcc_builtin (a))
return error_mark_node;
}
-  else if (magic == 1)
+  else if (magic != 0)
/* For other magic varargs only do decay_conversion.  */
a = decay_conversion (a, complain);
   else if (DECL_CONSTRUCTOR_P (fn)
--- gcc/testsuite/c-c++-common/torture/builtin-arith-overflow-p-19.c.jj 
2016-06-15 17:13:29.839717235 +0200
+++ gcc/testsuite/c-c++-common/torture/builtin-arith-overflow-p-19.c
2016-06-15 13:05:21.0 +0200
@@ -1,5 +1,5 @@
 /* Test __builtin_{add,sub,mul}_overflow_p.  */
-/* { dg-do run { target c } } */
+/* { dg-do run } */
 /* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
 
 #include "builtin-arith-overflow.h"
--- gcc/testsuite/g++.dg/ext/builtin-arith-overflow-2.C.jj  2016-06-15 
17:34:37.617298196 +0200
+++ gcc/testsuite/g++.dg/ext/builtin-arith-overflow-2.C 2016-06-15 
17:34:11.0 +0200
@@ -0,0 +1,53 @@
+// { dg-do run }
+// { dg-options "-O2" }
+
+struct A { int i : 1; };
+struct B { int j : 3; };
+
+template 
+int
+foo (int x, int y)
+{
+  A a = {};
+  S s = {};
+  return __builtin_add_overflow_p (x, y, a.i) + 2 * __builtin_mul_overflow_p 
(x, y, s.j);
+}
+
+__attribute__((noinline, noclone)) int
+bar (int x, int y)
+{
+  return foo  (x, y);
+}
+
+#if __cplusplus >= 201402L
+template 
+constexpr int
+baz (int x, int y)
+{
+  A a = {};
+  S s = {};
+  return __builtin_add_overflow_p (x, y, a.i) + 2 * __builtin_mul_overflow_p 
(x, y, s.j);
+}
+
+constexpr int t1 = baz  (0, 0);
+constexpr int t2 = baz  (1, -1);
+constexpr int t3 = baz  (1, -4);
+constexpr int t4 = baz  (-4, 4);
+constexpr int t5 = baz  (4, 2);
+static_assert (t1 == 0, "");
+static_assert (t2 == 0, "");
+static_assert (t3 == 1, "");
+static_assert (t4 == 2, "");
+static_assert (t5 == 3, "");
+#endif
+
+int
+main ()
+{
+  if (bar (0, 0) != 0
+  || bar (-1, 1) != 0
+  || bar (-4, 1) != 1
+  || bar (4, -4) != 2
+  || bar (2, 4) != 3)
+__builtin_abort ();
+}


Jakub


Re: [PATCH], PowerPC: Allow DImode in Altivec registers

2016-06-15 Thread Segher Boessenkool
On Wed, Jun 15, 2016 at 02:24:41PM -0400, Michael Meissner wrote:
> > >  ; Some DImode loads are best done as a load of -1 followed by a mask
> > >  ; instruction.
> > >  (define_split
> > > -  [(set (match_operand:DI 0 "gpc_reg_operand")
> > > +  [(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
> > 
> > Not sure what this is for...  If you want to say this split is only to
> > be done after RA, just say that explicitly in the split condition (i.e.
> > "reload_completed").  Or does this mean something else?
> 
> This is so that constants being loaded into the vector registers aren't split
> (they are handled via different define_splits).  Previously, the only constant
> that was loaded in vector registers was 0.
> 
> The int_reg_operand_not_pseudo allows the split to take place if it has 
> already
> gotten hard registers before register allocation.

When does that happen?

> It could have been the
> normal int_reg_operand and then use a reload_completed check.

That is preferred if it makes no difference (otherwise, before you know
it we'll have twice as many predicates).


Segher


[C++ PATCH] Fix some DECL_BUILT_IN uses in C++ FE

2016-06-15 Thread Jakub Jelinek
Hi!

I've noticed 3 spots in the C++ FE test just DECL_BUILT_IN and then
immediately compare DECL_FUNCTION_CODE against BUILT_IN_* constants.
That is only meaningful for BUILT_IN_NORMAL, while DECL_BUILT_IN
macro is DECL_BUILT_IN_CLASS != NOT_BUILT_IN, so it also e.g. includes
BUILT_IN_MD.  If people are unlucky enough, their BUILT_IN_MD builtins
might have the same DECL_FUNCTION_CODE and would be mishandled.
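
In other words, the safe idiom, which the patch below switches to, is to check
the class before interpreting the code; a minimal sketch (illustrative only):

static bool
decl_is_builtin_file_p (tree callee)
{
  /* Check the class first; only then is DECL_FUNCTION_CODE known to be
     a BUILT_IN_NORMAL code rather than, say, a target-specific
     BUILT_IN_MD code that happens to have the same numeric value.  */
  return (callee != NULL_TREE
	  && DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL
	  && DECL_FUNCTION_CODE (callee) == BUILT_IN_FILE);
}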

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2016-06-15  Jakub Jelinek  

* tree.c (builtin_valid_in_constant_expr_p): Test for
DECL_BUILT_IN_CLASS equal to BUILT_IN_NORMAL instead of just
DECL_BUILT_IN.
(bot_manip): Likewise.
* call.c (magic_varargs_p): Likewise.

--- gcc/cp/tree.c.jj2016-06-15 09:17:22.0 +0200
+++ gcc/cp/tree.c   2016-06-15 17:02:50.794984463 +0200
@@ -341,7 +341,8 @@ cp_stabilize_reference (tree ref)
 bool
 builtin_valid_in_constant_expr_p (const_tree decl)
 {
-  if (!(TREE_CODE (decl) == FUNCTION_DECL && DECL_BUILT_IN (decl)))
+  if (!(TREE_CODE (decl) == FUNCTION_DECL
+   && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL))
 /* Not a built-in.  */
 return false;
   switch (DECL_FUNCTION_CODE (decl))
@@ -2536,7 +2537,7 @@ bot_manip (tree* tp, int* walk_subtrees,
   /* builtin_LINE and builtin_FILE get the location where the default
 argument is expanded, not where the call was written.  */
   tree callee = get_callee_fndecl (*tp);
-  if (callee && DECL_BUILT_IN (callee))
+  if (callee && DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL)
switch (DECL_FUNCTION_CODE (callee))
  {
  case BUILT_IN_FILE:
--- gcc/cp/call.c.jj2016-06-15 09:17:22.0 +0200
+++ gcc/cp/call.c   2016-06-15 17:06:57.097793446 +0200
@@ -7140,7 +7141,7 @@ magic_varargs_p (tree fn)
   if (flag_cilkplus && is_cilkplus_reduce_builtin (fn) != BUILT_IN_NONE)
 return 2;
 
-  if (DECL_BUILT_IN (fn))
+  if (DECL_BUILT_IN_CLASS (fn) == BUILT_IN_NORMAL)
 switch (DECL_FUNCTION_CODE (fn))
   {
   case BUILT_IN_CLASSIFY_TYPE:

Jakub


[PATCH] Fix up CSE handling of const/pure calls (PR rtl-optimization/71532)

2016-06-15 Thread Jakub Jelinek
Hi!

As the following testcase shows, CSE mishandles const/pure calls, it assumes
that const/pure calls can't clobber even their argument slots.  But, the
argument slots are owned by the callee, so need to be volatile across the
calls.  On the testcase the second round of argument stores is eliminated,
because CSE thinks those memory slots already have the right values.
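
A minimal illustration of the situation (a stripped-down, illustrative variant
of the new testcase below):

/* foo and bar are pure and take enough arguments that some are passed
   on the stack.  The stores setting up bar's stack arguments must not
   be removed just because the same values were stored there before the
   call to foo: foo, as the callee, may have clobbered those slots.  */
extern int foo (int, int, int, int, int, int, int, int)
  __attribute__ ((pure));
extern int bar (int, int, int, int, int, int, int, int)
  __attribute__ ((pure));

int
test (void)
{
  return foo (1, 2, 3, 4, 5, 6, 7, 8) + bar (1, 2, 3, 4, 5, 6, 7, 8);
}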

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2016-06-15  Jakub Jelinek  

PR rtl-optimization/71532
* cse.c (cse_insn): For const/pure calls, invalidate argument passing
memory slots.

* gcc.dg/torture/pr71532.c: New test.

--- gcc/cse.c.jj2016-05-20 09:05:08.0 +0200
+++ gcc/cse.c   2016-06-15 16:01:21.568523552 +0200
@@ -5751,6 +5751,13 @@ cse_insn (rtx_insn *insn)
 {
   if (!(RTL_CONST_OR_PURE_CALL_P (insn)))
invalidate_memory ();
+  else
+   /* For const/pure calls, invalidate any argument slots, because
+  those are owned by the callee.  */
+   for (tem = CALL_INSN_FUNCTION_USAGE (insn); tem; tem = XEXP (tem, 1))
+ if (GET_CODE (XEXP (tem, 0)) == USE
+ && MEM_P (XEXP (XEXP (tem, 0), 0)))
+   invalidate (XEXP (XEXP (tem, 0), 0), VOIDmode);
   invalidate_for_call ();
 }
 
--- gcc/testsuite/gcc.dg/torture/pr71532.c.jj   2016-06-15 16:37:55.340368699 
+0200
+++ gcc/testsuite/gcc.dg/torture/pr71532.c  2016-06-15 16:39:45.841956435 
+0200
@@ -0,0 +1,39 @@
+/* PR rtl-optimization/71532 */
+/* { dg-do run } */
+/* { dg-additional-options "-mtune=slm" { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((noinline, noclone, pure)) int
+foo (int a, int b, int c, int d, int e, int f, int g, int h, int i, int j, int 
k, int l)
+{
+  a++; b++; c++; d++; e++; f++; g++; h++; i++; j++; k++; l++;
+  asm volatile ("" : : "g" (&a), "g" (&b), "g" (&c), "g" (&d) : "memory");
+  asm volatile ("" : : "g" (&e), "g" (&f), "g" (&g), "g" (&h) : "memory");
+  asm volatile ("" : : "g" (&i), "g" (&j), "g" (&k), "g" (&l) : "memory");
+  return a + b + c + d + e + f + g + h + i + j + k + l;
+}
+
+__attribute__((noinline, noclone, pure)) int
+bar (int a, int b, int c, int d, int e, int f, int g, int h, int i, int j, int 
k, int l)
+{
+  a++; b++; c++; d++; e++; f++; g++; h++; i++; j++; k++; l++;
+  asm volatile ("" : : "g" (&a), "g" (&b), "g" (&c), "g" (&d) : "memory");
+  asm volatile ("" : : "g" (&e), "g" (&f), "g" (&g), "g" (&h) : "memory");
+  asm volatile ("" : : "g" (&i), "g" (&j), "g" (&k), "g" (&l) : "memory");
+  return 2 * a + b + c + d + e + f + g + h + i + j + k + l;
+}
+
+__attribute__((noinline, noclone)) int
+test ()
+{
+  int a = foo (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12);
+  a += bar (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12);
+  return a;
+}
+
+int
+main ()
+{
+  if (test () != 182)
+__builtin_abort ();
+  return 0;
+}

Jakub


RE: [PATCH][AArch64] Enable -frename-registers at -O2 and higher

2016-06-15 Thread Evandro Menezes
> On Fri, May 27, 2016 at 02:50:15PM +0100, Kyrill Tkachov wrote:
> >
> > As mentioned in
> > https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00297.html,
> > -frename-registers can be beneficial for aarch64 and the
> > patch at https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01618.html
> > resolves the AESE/AESMC fusion issue that it exposed in the aarch64
> > backend. So this patch enables the pass for aarch64 at -O2 and above.
> >
> > Ok for trunk?
> 
> As you're proposing to have this on by default, I'd like to give a chance
to
> hear whether there is consensus as to this being the right choice for the
> thunderx, xgene1, exynos-m1 and qdf24xx subtargets.

Though there's a slight (<1%) overall improvement on Exynos M1, there were
just too many significant (<-3%) regressions against only a few significant
improvements for me to be comfortable with -frename-registers being a
generic default for AArch64.

I'll run some larger benchmarks tonight, but I'm leaning towards having it
as a target specific extra tuning option.

Thank you,

-- 
Evandro Menezes  Austin, TX

PS: I'm fine with refactoring aarch_option_optimization_table to
aarch64_option_optimization_table.



Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961)

2016-06-15 Thread Richard Sandiford
Richard Biener  writes:
> With the proposed cost change for vector construction we will end up
> vectorizing the testcase in PR68961 again (on x86_64 and likely
> on ppc64le as well after that target gets adjustments).  Currently
> we can't optimize that away again noticing the direct overlap of
> argument and return registers.  The obstackle is
>
> (insn 7 4 8 2 (set (reg:V2DF 93)
> (vec_concat:V2DF (reg/v:DF 91 [ a ])
> (reg/v:DF 92 [ aa ]))) 
> ...
> (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ])
> (subreg:DI (reg:TI 88 [ D.1756 ]) 0))
> (insn 24 21 11 2 (set (reg:DI 100 [+8 ])
> (subreg:DI (reg:TI 88 [ D.1756 ]) 8))
>
> which we eventually optimize to DFmode subregs of (reg:V2DF 93).
>
> First of all simplify_subreg doesn't handle the subregs of a vec_concat
> (easy fix below).
>
> Then combine doesn't like to simplify the multi-use (it tries some
> parallel it seems).  So I went to forwprop which eventually manages
> to do this but throws away the result (reg:DF 91) or (reg:DF 92)
> because it is not a constant.  Thus I allow arbitrary simplification
> results for SUBREGs of [VEC_]CONCAT operations.  There doesn't seem
> to be a magic flag to tell it to restrict to the case where all
> uses can be simplified or so, nor to restrict simplifications to a REG.
> But I don't see any undesirable simplifications of (subreg 
> ([vec_]concat)).

Adding that as a special case to propagate_rtx feels like a hack though :-)
I think:

> Index: gcc/fwprop.c
> ===
> *** gcc/fwprop.c  (revision 237286)
> --- gcc/fwprop.c  (working copy)
> *** propagate_rtx (rtx x, machine_mode mode,
> *** 664,670 
> || (GET_CODE (new_rtx) == SUBREG
> && REG_P (SUBREG_REG (new_rtx))
> && (GET_MODE_SIZE (mode)
> !   <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx))
>   flags |= PR_CAN_APPEAR;
> if (!varying_mem_p (new_rtx))
>   flags |= PR_HANDLE_MEM;
> --- 664,673 
> || (GET_CODE (new_rtx) == SUBREG
> && REG_P (SUBREG_REG (new_rtx))
> && (GET_MODE_SIZE (mode)
> !   <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx)
> !   || ((GET_CODE (new_rtx) == VEC_CONCAT
> !|| GET_CODE (new_rtx) == CONCAT)
> !   && GET_CODE (x) == SUBREG))
>   flags |= PR_CAN_APPEAR;
> if (!varying_mem_p (new_rtx))
>   flags |= PR_HANDLE_MEM;

...this if statement should fundamentally only test new_rtx.
E.g. we'd want the same thing for any SUBREG inside X.

How about changing:

  /* The replacement we made so far is valid, if all of the recursive
 replacements were valid, or we could simplify everything to
 a constant.  */
  return valid_ops || can_appear || CONSTANT_P (tem);

so that (REG_P (tem) && !HARD_REGISTER_P (tem)) is also valid?
I suppose that's likely to increase register pressure though,
if only some uses of new_rtx simplify.  (There again, requiring all
uses to be replaceable could make hot code the hostage of cold code.)
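
Concretely, that would amount to something like this untested sketch of the
return statement:

  /* Untested sketch: additionally accept a simplification down to a
     pseudo register, not just to a constant.  */
  return (valid_ops
	  || can_appear
	  || CONSTANT_P (tem)
	  || (REG_P (tem) && !HARD_REGISTER_P (tem)));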

Thanks,
Richard


Re: [PATCH], PowerPC: Allow DImode in Altivec registers

2016-06-15 Thread Michael Meissner
On Tue, Jun 14, 2016 at 05:53:46PM -0500, Segher Boessenkool wrote:
> On Mon, Jun 13, 2016 at 02:29:41PM -0400, Michael Meissner wrote:
> > It would help if I included the patch.
> 
> :-)
> 
> > > Are these changes ok to install in the trunk?  Assuming they go in the 
> > > trunk,
> > > can I install them in the 6.2 branch if they cause no regression?
> 
> Okay for trunk.  Okay for 6 after a week.
> 
> > > Note, I will be away from the office, starting Thursday afternoon (June 
> > > 16th,
> > > 2016) and I will return on Monday (June 20th, 2016).  I will not have easy
> > > access to email during this time.
> 
> If big problems show up, we can always revert the patch ;-)
> 
> A few things...
> 
> > * config/rs6000/rs6000.md (lfiwax): Update clobbers that don't use
> > direct move to use wi and now wj.
> 
> s/now/not/

Ok.

> > +;; wB needs ISA 2.07 VUPKHSW
> > +(define_constraint "wB"
> > +  "Signed 5-bit constant integer that can be loaded into an altivec 
> > register."
> > +  (and (match_code "const_int")
> > +   (and (match_test "TARGET_P8_VECTOR")
> > +   (match_operand 0 "s5bit_cint_operand"
> 
> "and" takes as many operands as you want, i.e.

Ok, useful to know for the future.

> +  (and (match_code "const_int")
> +   (match_test "TARGET_P8_VECTOR")
> +   (match_operand 0 "s5bit_cint_operand")))
> 
> >  (define_insn "*movdi_internal32"
> > -  [(set (match_operand:DI 0 "rs6000_nonimmediate_operand" 
> > "=Y,r,r,?m,?*d,?*d,r")
> > -   (match_operand:DI 1 "input_operand" "r,Y,r,d,m,d,IJKnGHF"))]
> > +  [(set (match_operand:DI 0 "rs6000_nonimmediate_operand"
> > + "=Y,r, r, ?m,?*d,?*d,
> > +  r, ?Y,?Z,?*wb,  ?*wv,   ?wi,
> > +  ?wo,   ?wo,   ?wv,   ?wi,   ?wi,?wv,
> > +  ?wv")
> > +
> > +   (match_operand:DI 1 "input_operand"
> > +  "r,Y, r, d, m,  d,
> > +   IJKnGHF,  wb,wv,Y, Z,  wi,
> 
> "n" includes "IJK" already?

In this case, I merely copied the existing code before formatting it.

> >  ; Some DImode loads are best done as a load of -1 followed by a mask
> >  ; instruction.
> >  (define_split
> > -  [(set (match_operand:DI 0 "gpc_reg_operand")
> > +  [(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
> 
> Not sure what this is for...  If you want to say this split is only to
> be done after RA, just say that explicitly in the split condition (i.e.
> "reload_completed").  Or does this mean something else?

This is so that constants being loaded into the vector registers aren't split
(they are handled via different define_splits).  Previously, the only constant
that was loaded in vector registers was 0.

The int_reg_operand_not_pseudo allows the split to take place if it has already
gotten hard registers before register allocation.  It could have been the
normal int_reg_operand and then use a reload_completed check.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH 2/2] gcc/genrecog: Don't warn for missing mode on special predicates

2016-06-15 Thread Richard Sandiford
Andrew Burgess  writes:
> In md.texi it says:
>
>   Predicates written with @code{define_special_predicate} do not get any
>   automatic mode checks, and are treated as having special mode handling
>   by @command{genrecog}.
>
> However, in genrecog, when validating a SET pattern, if either the
> source or destination is missing a mode then a warning is given, even if
> there's a predicate defined with define_special_predicate.
>
> This commit silences the warning for special predicates.
>
> gcc/ChangeLog:
>
>   * genrecog.c (validate_pattern): Don't warn about missing mode for
>   define_special_predicate predicates.
> Acked-by: Andrew Burgess 
> ---
>  gcc/ChangeLog  |  5 +
>  gcc/genrecog.c | 22 +++---
>  2 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/genrecog.c b/gcc/genrecog.c
> index a9f5a4a..7596552 100644
> --- a/gcc/genrecog.c
> +++ b/gcc/genrecog.c
> @@ -674,9 +674,25 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx 
> set, int set_code)
>&& !CONST_WIDE_INT_P (src)
>&& GET_CODE (src) != CALL)
> {
> - const char *which;
> - which = (dmode == VOIDmode ? "destination" : "source");
> - message_at (info->loc, "warning: %s missing a mode?", which);
> + const char *which_msg;
> + rtx which;
> + const char *pred_name;
> + const struct pred_data *pred;
> +
> + which_msg = (dmode == VOIDmode ? "destination" : "source");
> + which = (dmode == VOIDmode ? dest : src);
> + pred_name = XSTR (which, 1);
> + if (pred_name[0] != 0)
> +   {
> + pred = lookup_predicate (pred_name);
> + if (!pred)
> +   error_at (info->loc, "unknown predicate '%s'", pred_name);
> +   }
> + else
> +   pred = 0;
> + if (!pred || !pred->special)
> +   message_at (info->loc, "warning: %s missing a mode?",
> +   which_msg);

There's no guarantee at this point that "which" is a match_operand.
Also, I think the earlier:

/* The operands of a SET must have the same mode unless one
   is VOIDmode.  */
else if (dmode != VOIDmode && smode != VOIDmode && dmode != smode)
  error_at (info->loc, "mode mismatch in set: %smode vs %smode",
GET_MODE_NAME (dmode), GET_MODE_NAME (smode));

should be skipped for special predicates too.

How about generalising:

/* The mode of an ADDRESS_OPERAND is the mode of the memory
   reference, not the mode of the address.  */
if (GET_CODE (src) == MATCH_OPERAND
&& ! strcmp (XSTR (src, 1), "address_operand"))
  ;

to:

if (special_predicate_operand_p (src)
|| special_predicate_operand_p (dest))
  ;

with a new special_predicate_operand_p helper?  I don't think we should
duplicate the "unknown predicate" error here; the helper can just return
false for unknown predicates.
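
For concreteness, an untested sketch of what such a helper could look like
(names and details are only illustrative):

static bool
special_predicate_operand_p (rtx x)
{
  /* Only a match_operand carries a predicate name.  */
  if (GET_CODE (x) == MATCH_OPERAND)
    {
      const char *pred_name = XSTR (x, 1);
      if (pred_name[0] != 0)
	{
	  const struct pred_data *pred = lookup_predicate (pred_name);
	  if (pred)
	    return pred->special;
	}
    }
  /* Unknown or missing predicates simply aren't special here; the
     existing diagnostics still report unknown predicates.  */
  return false;
}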

Thanks,
Richard


Re: [Patch] Implement is_[nothrow_]swappable (p0185r1) - 2nd try

2016-06-15 Thread Daniel Krügler
2016-06-14 23:22 GMT+02:00 Daniel Krügler :
> This is an implementation of the Standard is_swappable traits according to
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0185r1.html
>
> During that work it has been found that std::array's member swap's exception
> specification for zero-size arrays was incorrectly depending on the value_type
> and that was fixed as well.
>
> This patch is *untested*, because I cannot make the tests run on my
> Windows system.
>
> Upon the suggestion of Mike Stump I'm proposing this patch
> nonetheless, asking for sending
> me as specific feedback as possible for any failing tests so that I
> can try to make further
> adjustments if needed.

And now also with the promised patch files.

> Thanks for your patience,
>
> - Daniel


changelog.patch
Description: Binary data


is_swappable_2.patch
Description: Binary data


Re: Cilk Plus testsuite needs massive cleanup (PR testsuite/70595)

2016-06-15 Thread Mike Stump
On Jun 14, 2016, at 11:09 AM, Ilya Verbin  wrote:
> 
> On Fri, Apr 29, 2016 at 11:19:47 -0700, Mike Stump wrote:
>> On Apr 29, 2016, at 5:41 AM, Rainer Orth  
>> wrote:
>>> diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
>>> --- a/gcc/config/darwin.h
>>> +++ b/gcc/config/darwin.h
>>> @@ -179,6 +179,7 @@ extern GTY(()) int darwin_ms_struct;
>>>   %{L*} %(link_libgcc) %o 
>>> %{fprofile-arcs|fprofile-generate*|coverage:-lgcov} \
>>>   %{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*:%*} 1): \
>>> %{static|static-libgcc|static-libstdc++|static-libgfortran: 
>>> libgomp.a%s; : -lgomp } } \
>>> +%{fcilkplus:%:include(libcilkrts.spec)%(link_cilkrts)}\
>>>   %{fgnu-tm: \
>>> %{static|static-libgcc|static-libstdc++|static-libgfortran: libitm.a%s; 
>>> : -litm } } \
>>>   %{!nostdlib:%{!nodefaultlibs:\
>> 
>> Ok.
> 
> Is it OK to backport this patch to gcc-6-branch?

Ok.

Re: [Patch, cfg] Improve jump to return optimization for complex return

2016-06-15 Thread Jiong Wang

Segher Boessenkool writes:

> On Tue, Jun 14, 2016 at 03:53:59PM +0100, Jiong Wang wrote:
>> "bl to pop" into "pop" which is "jump to return" into "return",  so a better
>> place to fix this issue is at try_optimize_cfg where we are doing these
>> jump/return optimization already:
>> 
>>   /* Try to change a branch to a return to just that return.  */
>>   rtx_insn *ret, *use;
>>   ...
>> 
>> Currently we are using ANY_RETURN_P to check whether an rtx is return
>> while ANY_RETURN_P only covered RETURN, SIMPLE_RETURN without side-effect:
>> 
>> #define ANY_RETURN_P(X) \
>>   (GET_CODE (X) == RETURN || GET_CODE (X) == SIMPLE_RETURN)
>
> On PowerPC we have a similar problem for jumps to trivial epilogues,
> which are a parallel of a return with a (use reg:LR_REGNO).  But that
> one shows up for conditional jumps.

On ARM, in my micro gcc bootstrap benchmark, enabling
-fdump-rtl-pro_and_epilogue during the bootstrap shows about 8.5x more
"Changed jump.*to return" optimizations happening:

 grep "Changed jump.*to return" BUILD/gcc/*.pro_and_epilogue | wc -l

The number is boosted from about a thousand to about ten thousand.

And although this patch itself adds more code, the size of .text section
is slightly smaller after this patch.

It would be great if you can test this patch and see how the codegen is
affected on PowerPC.

>
>> This patch:
>>   * rename existed ANY_RETURN_P into ANY_PURE_RETURN_P.
>>   * Re-implement ANY_RETURN_P to consider complex JUMP_INSN with
>> PARALLEL in the pattern.
>
> ANY_RETURN_P is used in many other places, have you checked them all?

I had done a quick check on gcc/*.[ch] and think those places are safe; I
missed gcc/config/*.

I will double check all of them.

>
>>   * Removed the side_effect_p check on the last insn in both bb1
>> and bb2.  This is because suppose we have bb1 and bb2 contains
>> the following single jump_insn and both fall through to EXIT_BB:
>
> 
>
>> cross jump optimization will try to change the jump_insn in bb1 into
>> a unconditional jump to bb2, then we will enter the next iteration
>> of try_optimize_cfg, and it will be a new "jump to return", then
>> bb1 will be optimized back into above patterns, and thus another round
>> of cross jump optimization, we will end up with infinite loop here.
>
> Why is it save to remove the side_effects_p check?  Why was it there
> in the first place?

git blame shows the side_effects_p check has been there since initial cfg
support was added (r45504, back in 2001).

My understanding is that it's there because the code wants to use onlyjump_p and
returnjump_p to cover simple jumps, but the author may have found that
onlyjump_p checks for side effects while returnjump_p doesn't, so the
side_effects_p check was added for the latter to match the logic of onlyjump_p.

I found onlyjump_p was introduced by r28584 back in 1999, where RTH used
it to do something like

  "/* If the jump insn has side effects, we can't kill the edge.  */"

That makes sense to me, as it reads like killing an execution path: if
there is a side effect, it's not safe to do that.

But here for cross jump, my understanding is we are not killing
anything; instead, we are redirecting one path to the other if both
share the same instruction sequence.

So given the following instruction sequences, both ending with a jump to
dest where the jump has a side effect, then even if you redirect a jump to
insn A into a jump to insn A1, the side effect will still be executed.  I am
assuming the "outgoing_edges_match/old_insns_match_p" check which is
done before "flow_find_cross_jump" will make sure the side effects in
both sequences are identical.

 insn Ainsn A1
 insn Binsn A2
 insn Cinsn A3
 jump to dest  jump to dest
  \   /
   \ /
 dest

So I think removing them is safe, and if we don't remove them, then
we will trigger an endless optimization cycle after this patch, at least on
ARM.

Because a complex return pattern likely has a side effect, it fails
these initial checks in flow_find_cross_jump; both jump_insns in
bb1 and bb2 then fall through to the full comparison path, where they are
judged identical, counted into ninsns, and might trigger
cross jump optimization.


bb 1bb 2

(jump_insn   (jump_insn
   (parallel [  (parallel [
(return) (return)
(set/f   (set/f
  (reg:SI 15 pc)   (reg:SI 15 pc)
  (mem:SI  (mem:SI
(post_inc:SI (post_inc:SI
(reg/f:SI 13 sp))(reg/f:SI 13 sp))
])   ])
 -> return)-> return)

                       \                        /
                        \                      /
                         (both fall through to EXIT)
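
To make the intended ANY_RETURN_P change concrete, here is a rough sketch of
my reading of it (illustrative only, not the patch itself): ANY_PURE_RETURN_P
keeps the old test, while the new check also looks through a PARALLEL whose
first element is a plain or simple return, as in the ARM pattern above.

#define ANY_PURE_RETURN_P(X) \
  (GET_CODE (X) == RETURN || GET_CODE (X) == SIMPLE_RETURN)

/* Sketch: also recognize a complex return such as
   (parallel [(return) (set (reg pc) (mem (post_inc sp)))]).  */
static inline bool
any_return_pattern_p (const_rtx x)
{
  if (ANY_PURE_RETURN_P (x))
    return true;
  return (GET_CODE (x) == PARALLEL
	  && ANY_PURE_RETURN_P (XVECEXP (x, 0, 0)));
}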

Re: [C++ PATCH] Add testcase for 4.8 bug

2016-06-15 Thread Jason Merrill
OK.

Jason


Re: [patch, avr] Fix PR67353

2016-06-15 Thread Denis Chertykov
2016-06-15 13:19 GMT+03:00 Pitchumani Sivanupandi
:
> On Mon, 2016-06-13 at 17:48 +0200, Georg-Johann Lay wrote:
>> Pitchumani Sivanupandi schrieb:
>> >
>> > $ avr-gcc test.c -Wno-misspelled-isr
>> > $
>> What about -Werror=misspelled-isr?
>
> Updated patch.
>
>> >
>> > diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
>> > index ba5cd91..587bdbc 100644
>> > --- a/gcc/config/avr/avr.c
>> > +++ b/gcc/config/avr/avr.c
>> > @@ -753,7 +753,7 @@ avr_set_current_function (tree decl)
>> >   that the name of the function is "__vector_NN" so as to
>> > catch
>> >   when the user misspells the vector name.  */
>> >
>> > -  if (!STR_PREFIX_P (name, "__vector"))
>> > +  if ((!STR_PREFIX_P (name, "__vector")) &&
>> > (avr_warn_misspelled_isr))
>> >  warning_at (loc, 0, "%qs appears to be a misspelled %s
>> > handler",
>> If, instead of the "0" the respective OPT_... enum is used in the
>> call
>> to warning_at, the -Werror= should work as expected (and explicit
>> "&&
>> avr_warn_misspelled_isr" no more needed).
>
> Ok. Updated patch as per the comments.
>
> If OK, could someone commit please?
>
> Regards,
> Pitchumani
>
> gcc/ChangeLog
>
> 2016-06-15  Pitchumani Sivanupandi  
>
> PR target/67353
> * config/avr/avr.c (avr_set_current_function): Warn misspelled
> interrupt/ signal handler if -Wmisspelled-isr flag is enabled.
> * config/avr/avr.opt (Wmisspelled-isr): New warning flag. Enabled
> by default to warn misspelled interrupt/ signal handler.
> * doc/invoke.texi (AVR Options): Document it. Update description
> for -nodevicelib option.

Committed.


Re: [patch,avr] ad PR71103: also handle QImode SUBREGs of CONST

2016-06-15 Thread Denis Chertykov
2016-06-15 12:11 GMT+03:00 Georg-Johann Lay :
> This patch handles the cases when subreg:QI of a CONST or LABEL_REF is to be
> moved to a QImode register.  The original patch only handled SYMBOL_REFs.
>
> OK for trunk and backport?
>
>
> Johann
>
> --
>
>  gcc/
> PR target/71103
> * config/avr/avr.md (movqi): Handle loading subreg:qi (const).
>
> gcc/testsuite/
> PR target/71103
> * gcc.target/avr/torture/pr71103-2.c: New test.
>

Approved.
Please apply.


Re: [PATCH][AArch64][obvious] Clean up parentheses and use GET_MODE_UNIT_BITSIZE in a couple of patterns

2016-06-15 Thread Kyrill Tkachov


On 15/06/16 17:12, Andreas Schwab wrote:

Kyrill Tkachov  writes:


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
6effd7d42d18c9b526aaaec93a44e8801908e164..a19d1711b5bcb516e4aca6a22d1b79df4f32923f
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3993,15 +3993,12 @@ (define_insn "aarch64_shll_n"
   "aarch64_simd_shift_imm_bitsize_" "i")]
   VSHLL))]
"TARGET_SIMD"
-  "*
-  int bit_width = GET_MODE_UNIT_SIZE (mode) * BITS_PER_UNIT;
-  if (INTVAL (operands[2]) == bit_width)
{
-return \"shll\\t%0., %1., %2\";
+if (INTVAL (operands[2]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll\\t%0., %1., %2";
+else
+  return "shll\\t%0., %1., %2";

You need to unquote the backslashes, too.


You mean not escape the '\t'?
The port uses \\t for tabs, even in {...} templates, though not consistently.
That could be cleaned up separately.

Kyrill


Andreas.





Re: [PATCH][AArch64][obvious] Clean up parentheses and use GET_MODE_UNIT_BITSIZE in a couple of patterns

2016-06-15 Thread Andreas Schwab
Kyrill Tkachov  writes:

> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 6effd7d42d18c9b526aaaec93a44e8801908e164..a19d1711b5bcb516e4aca6a22d1b79df4f32923f
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3993,15 +3993,12 @@ (define_insn "aarch64_shll_n"
>  "aarch64_simd_shift_imm_bitsize_" "i")]
>   VSHLL))]
>"TARGET_SIMD"
> -  "*
> -  int bit_width = GET_MODE_UNIT_SIZE (mode) * BITS_PER_UNIT;
> -  if (INTVAL (operands[2]) == bit_width)
>{
> -return \"shll\\t%0., %1., %2\";
> +if (INTVAL (operands[2]) == GET_MODE_UNIT_BITSIZE (mode))
> +  return "shll\\t%0., %1., %2";
> +else
> +  return "shll\\t%0., %1., %2";

You need to unquote the backslashes, too.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH][AArch64][obvious] Clean up parentheses and use GET_MODE_UNIT_BITSIZE in a couple of patterns

2016-06-15 Thread Kyrill Tkachov

Hi all,

The parentheses in these two patterns are a bit of a mess and we can remove
them.  Do that.  Also, use '{' and '}' for the C code so that we can avoid
escaping the strings in the block, and use GET_MODE_UNIT_BITSIZE directly
instead of taking GET_MODE_UNIT_SIZE and multiplying by BITS_PER_UNIT, which
is equivalent.

Bootstrapped and tested on aarch64.
Committing as obvious.

Thanks,
Kyrill

2016-06-15  Kyrylo Tkachov  

* config/aarch64/aarch64-simd.md (aarch64_shll_n): Clean
up parentheses.  Use GET_MODE_UNIT_BITSIZE.
(aarch64_shll2_n): Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6effd7d42d18c9b526aaaec93a44e8801908e164..a19d1711b5bcb516e4aca6a22d1b79df4f32923f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3993,15 +3993,12 @@ (define_insn "aarch64_shll_n"
 			   "aarch64_simd_shift_imm_bitsize_" "i")]
  VSHLL))]
   "TARGET_SIMD"
-  "*
-  int bit_width = GET_MODE_UNIT_SIZE (mode) * BITS_PER_UNIT;
-  if (INTVAL (operands[2]) == bit_width)
   {
-return \"shll\\t%0., %1., %2\";
+if (INTVAL (operands[2]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll\\t%0., %1., %2";
+else
+  return "shll\\t%0., %1., %2";
   }
-  else {
-return \"shll\\t%0., %1., %2\";
-  }"
   [(set_attr "type" "neon_shift_imm_long")]
 )
 
@@ -4013,15 +4010,12 @@ (define_insn "aarch64_shll2_n"
 			 (match_operand:SI 2 "immediate_operand" "i")]
  VSHLL))]
   "TARGET_SIMD"
-  "*
-  int bit_width = GET_MODE_UNIT_SIZE (mode) * BITS_PER_UNIT;
-  if (INTVAL (operands[2]) == bit_width)
   {
-return \"shll2\\t%0., %1., %2\";
+if (INTVAL (operands[2]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll2\\t%0., %1., %2";
+else
+  return "shll2\\t%0., %1., %2";
   }
-  else {
-return \"shll2\\t%0., %1., %2\";
-  }"
   [(set_attr "type" "neon_shift_imm_long")]
 )
 


[PATCH, CHKP, PR middle-end/71529] Fix DECL_CONTEXT for args of instrumentation clones with no body

2016-06-15 Thread Ilya Enkovich
Hi,

Currently chkp_build_instrumented_fndecl copies the argument list
when a function has no body.  The copied arguments have an incorrect
DECL_CONTEXT, and this patch fixes that.

Bootstrapped and regtested for x86_64-unknown-linux-gnu.  I'm
going to commit it to trunk and later port to gcc-6-branch.

Thanks,
Ilya
--
gcc/

2016-06-15  Ilya Enkovich  

PR middle-end/71529
* ipa-chkp.c (chkp_build_instrumented_fndecl): Fix
DECL_CONTEXT for copied arguments.

gcc/testsuite/

2016-06-15  Ilya Enkovich  

PR middle-end/71529
* gcc.target/i386/pr71529.C: New test.


diff --git a/gcc/ipa-chkp.c b/gcc/ipa-chkp.c
index 5f5df64..86c48f1 100644
--- a/gcc/ipa-chkp.c
+++ b/gcc/ipa-chkp.c
@@ -207,7 +207,13 @@ chkp_build_instrumented_fndecl (tree fndecl)
   /* For functions with body versioning will make a copy of arguments.
  For functions with no body we need to do it here.  */
   if (!gimple_has_body_p (fndecl))
-DECL_ARGUMENTS (new_decl) = copy_list (DECL_ARGUMENTS (fndecl));
+{
+  tree arg;
+
+  DECL_ARGUMENTS (new_decl) = copy_list (DECL_ARGUMENTS (fndecl));
+  for (arg = DECL_ARGUMENTS (new_decl); arg; arg = DECL_CHAIN (arg))
+   DECL_CONTEXT (arg) = new_decl;
+}
 
   /* We are going to modify attributes list and therefore should
  make own copy.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr71529.C 
b/gcc/testsuite/gcc.target/i386/pr71529.C
new file mode 100644
index 000..3169101
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr71529.C
@@ -0,0 +1,22 @@
+/* PR71529 */
+/* { dg-do compile { target { ! x32 } } } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx -O2" } */
+
+class c1
+{
+ public:
+  virtual ~c1 ();
+};
+
+class c2
+{
+ public:
+  virtual ~c2 ();
+};
+
+class c3 : c1, c2 { };
+
+int main (int, char **)
+{
+  c3 obj;
+}


Re: [PATCH] PR 71483 - Fix live SLP operations

2016-06-15 Thread Richard Biener
On June 14, 2016 4:14:20 PM GMT+02:00, Alan Hayward  
wrote:
>In the given testcase, g++ splits a live operation into two scalar
>statements
>and four vector statements.
>
>_5 = _4 >> 2;
>  _7 = (short int) _5;
>
>Is turned into:
>
>vect__5.32_80 = vect__4.31_76 >> 2;
>  vect__5.32_81 = vect__4.31_77 >> 2;
>  vect__5.32_82 = vect__4.31_78 >> 2;
>  vect__5.32_83 = vect__4.31_79 >> 2;
>  vect__7.33_86 = VEC_PACK_TRUNC_EXPR ;
>  vect__7.33_87 = VEC_PACK_TRUNC_EXPR ;
>
>_5 is then accessed outside the loop.
>
>This patch ensures that vectorizable_live_operation picks the correct
>scalar
>statement.
>I removed the "three possibilites" comment because it was no longer
>accurate
>(it's also possible to have more vector statements than scalar
>statements)
>and
>the calculation is now much simpler.
>
>Tested on x86 and aarch64.
>Ok to commit?

OK.

Thanks,
Richard.

>gcc/
>PR tree-optimization/71483
>   * tree-vect-loop.c (vectorizable_live_operation): Pick correct index
>for slp
>
>testsuite/g++.dg/vect
>PR tree-optimization/71483
>* pr71483.c: New
>
>
>Alan.
>
>
>diff --git a/gcc/testsuite/g++.dg/vect/pr71483.c
>b/gcc/testsuite/g++.dg/vect/pr71483.c
>new file mode 100644
>index 
>..77f879c9a89b8b41ef9dde3c343591857
>2dc8d01
>--- /dev/null
>+++ b/gcc/testsuite/g++.dg/vect/pr71483.c
>@@ -0,0 +1,11 @@
>+/* { dg-do compile } */
>+int b, c, d;
>+short *e;
>+void fn1() {
>+  for (; b; b--) {
>+d = *e >> 2;
>+*e++ = d;
>+c = *e;
>+*e++ = d;
>+  }
>+}
>diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>index 
>4c8678505df6ec572b69fd7d12ac55cf4619ece6..a2413bf9c678d11cc2ffd22bc7d984e91
>1831804 100644
>--- a/gcc/tree-vect-loop.c
>+++ b/gcc/tree-vect-loop.c
>@@ -6368,24 +6368,20 @@ vectorizable_live_operation (gimple *stmt,
>
>   int num_scalar = SLP_TREE_SCALAR_STMTS (slp_node).length ();
>   int num_vec = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
>-  int scalar_per_vec = num_scalar / num_vec;
>
>-  /* There are three possibilites here:
>-   1: All scalar stmts fit in a single vector.
>-   2: All scalar stmts fit multiple times into a single vector.
>-  We must choose the last occurence of stmt in the vector.
>-   3: Scalar stmts are split across multiple vectors.
>-  We must choose the correct vector and mod the lane accordingly. 
>*/
>+  /* Get the last occurrence of the scalar index from the
>concatenation of
>+   all the slp vectors. Calculate which slp vector it is and the index
>+   within.  */
>+  int pos = (num_vec * nunits) - num_scalar + slp_index;
>+  int vec_entry = pos / nunits;
>+  int vec_index = pos % nunits;
>
>   /* Get the correct slp vectorized stmt.  */
>-  int vec_entry = slp_index / scalar_per_vec;
>   vec_lhs = gimple_get_lhs (SLP_TREE_VEC_STMTS (slp_node)[vec_entry]);
>
>   /* Get entry to use.  */
>-  bitstart = build_int_cst (unsigned_type_node,
>-  scalar_per_vec - (slp_index % scalar_per_vec));
>+  bitstart = build_int_cst (unsigned_type_node, vec_index);
>   bitstart = int_const_binop (MULT_EXPR, bitsize, bitstart);
>-  bitstart = int_const_binop (MINUS_EXPR, vec_bitsize, bitstart);
> }
>   else
> {




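To make the new index calculation in the patch above concrete, a worked
example with illustrative numbers (not taken from the mail): assume 128-bit
vectors, so nunits == 4 for the int vectors, with num_scalar == 2 and
num_vec == 4 as in the shift statements shown earlier.

  /* For slp_index == 0:
       pos       = (num_vec * nunits) - num_scalar + slp_index
                 = (4 * 4) - 2 + 0 = 14
       vec_entry = 14 / 4 = 3    the last vector statement
       vec_index = 14 % 4 = 2    the last lane holding a copy of the scalar
     i.e. the live value is taken from lane 2 of the fourth vector shift.  */
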
Re: [PATCH] Optimize inserting value_type into std::vector

2016-06-15 Thread Jonathan Wakely

On 15/06/16 11:34 +0100, Jonathan Wakely wrote:

On 15/06/16 11:15 +0100, Jonathan Wakely wrote:

* include/bits/stl_vector.h (vector::_S_insert_aux_assign): Define
new overloaded functions.
* include/bits/vector.tcc (vector::_M_insert_aux): Use new functions
to avoid creating a redundant temporary.
* testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc: New
test.

Tested x86_64-linux.

This improves our performance on Howard Hinnant's "insert vs emplace"
experiment at
http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/insert_vs_emplace.html

With this small change there is no difference between emplacing or
using the relevant insert / push_back function. That also means we
beat libc++ in some cases, making us the bestest, whoo!


We still lose to libc++ in one case. For the "lvalue no reallocation"
test our insert and emplace are equal to libc++'s emplace, but
libc++'s insert is even better.


Libc++'s insert(const_iterator, const value_type&) is *really* clever.
It notices when the object to insert is an element of the vector, and
then works out where its updated position would be after shifting the
existing elements along, and then copies from the new position. That
avoids the copy we make upfront to handle that case safely. Our
approach is inefficient for the common case where the object *isn't*
in the vector.

I guess it's something like:

 void
 insert(const_iterator pos, const value_type& x)
 {
    if (capacity() == size())
 {
   // reallocate and insert ...
   return;
 }
   auto it = end() - 1;
   *end() = std::move(*it);
   ++this->_M_impl._M_finish;
   const value_type* from = std::addressof(x);
   for (; it != pos; --it)
   {
 auto prev = it - 1;
 *it = std::move(*prev);
 if (std::addressof(*prev) == from)
   ++from;
   }
   *pos = *from;
 }

Clever.




Re: [PATCH] PR71275 ira.c bb_loop_depth

2016-06-15 Thread H.J. Lu
On Wed, Jun 15, 2016 at 7:06 AM, Bernd Schmidt  wrote:
>
>
> On 06/15/2016 04:03 PM, Alan Modra wrote:
>>
>> On Wed, Jun 15, 2016 at 11:49:50AM +0200, Bernd Schmidt wrote:
>>>
>>> On 06/15/2016 03:30 AM, Alan Modra wrote:

 Between these two calls to _gfortran_string_verify,
  if (verify(c4, "A", back = .true.) .ne. 3) call abort
  if (verify(c4, "AB") .ne. 0) call abort
 it seems that gfortran is assuming that parameters passed on the stack
 are unchanged.
>>>
>>>
>>> How? Is this something in the Fortran frontend, or is there CSE going on
>>> for
>>> stores to the argument area?
>>
>>
>> It's not a fortran problem, and I answered my own question about the
>> ABI by the testcase in
>> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01098.html.  If tail
>> calls are allowed, then the stack parameter area must be overwritten.
>> Thus an ABI can't allow tail calls if it specifies the stack parameter
>> area is preserved.  x86 allows tail calls.
>
>
> Well yes. The problem isn't that the stack area is overwritten, the problem
> is that something expects that it isn't, and it's not clear to me yet where
> that problem occurs.

Will this patch

https://gcc.gnu.org/bugzilla/attachment.cgi?id=38705

also fix this?

-- 
H.J.


[PATCH] [OBVIOUS] Fix obvious typo in predict.c

2016-06-15 Thread Martin Liška
Hello.

This corrects a typo in predict.c, which is pre-approved by Honza.
Survives regression tests & bootstraps on x86_64-linux-gnu.

Installed as r237481.

Martin
>From 5d755d1a83094b24319d4a31d1c951c8aa622a87 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 15 Jun 2016 14:37:29 +0200
Subject: [PATCH 1/2] Fix obvious typo in predict.c

gcc/ChangeLog:

2016-06-15  Martin Liska  

	* predict.c (tree_predict_by_opcode): Call predict_edge_def
	instead of predict_edge w/o a probability.
---
 gcc/predict.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/predict.c b/gcc/predict.c
index 7d55ff7..bafcc96 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -2192,8 +2192,8 @@ tree_predict_by_opcode (basic_block bb)
 	  predict_edge (then_edge, PRED_BUILTIN_EXPECT, HITRATE (percent));
 	}
   else
-	predict_edge (then_edge, predictor,
-		  integer_zerop (val) ? NOT_TAKEN : TAKEN);
+	predict_edge_def (then_edge, predictor,
+			  integer_zerop (val) ? NOT_TAKEN : TAKEN);
 }
   /* Try "pointer heuristic."
  A comparison ptr == 0 is predicted as false.
-- 
2.8.3



Re: [PATCH, i386]: Implement PR 71246, Missing built-in functions for float128 NaNs

2016-06-15 Thread Uros Bizjak
On Tue, Jun 14, 2016 at 11:45 PM, Rainer Orth
 wrote:
> Uros Bizjak  writes:
>
>> testsuite/ChangeLog:
>>
>> 2016-06-12  Uros Bizjak  
>>
>> PR target/71241
>> * testsuite/gcc.dg/torture/float128-nan.c: New test.
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> The test FAILs on 64-bit Solaris/x86:

Thanks for the suggestion, fixed by the attached patch.

2016-06-15  Uros Bizjak  

* gcc.dg/torture/float128-nan.c: Include stdint.h to define uint64_t.

Tested on x86_64-linux-gnu {,-m32} and committed.

Uros.
diff --git a/gcc/testsuite/gcc.dg/torture/float128-nan.c 
b/gcc/testsuite/gcc.dg/torture/float128-nan.c
index b570623..f9aa457 100644
--- a/gcc/testsuite/gcc.dg/torture/float128-nan.c
+++ b/gcc/testsuite/gcc.dg/torture/float128-nan.c
@@ -5,8 +5,7 @@
 
 #include 
 #include 
-
-typedef unsigned long long int uint64_t;
+#include <stdint.h>
 
 typedef union
 {


Re: [PATCH] Reject boolean/enum types in last arg of __builtin_*_overflow_p (take 2)

2016-06-15 Thread Jakub Jelinek
On Wed, Jun 15, 2016 at 08:08:22AM -0600, Martin Sebor wrote:
> I like the idea of being able to use the built-ins for this, but
> I think it would be confusing for them to follow subtly different
> rules for C than for C++.  Since the value of the last argument

I think it isn't that hard to tweak the C++ FE to handle this, I'd just need
to make sure it also handles it right during template instantiation.
I'm afraid specifying a precision there would open a can of worms: what to
do if the argument is not a compile-time constant, what to do if somebody
specifies 0, or too large a value, ...

Jakub


[PATCH 2/3] Add support for arm*-*-phoenix* targets.

2016-06-15 Thread Kuba Sejdak
Is it ok for trunk? If possible, please merge it also to the GCC-6 and
GCC-5 branches.

2016-06-15  Jakub Sejdak  

   * config.gcc: Add support for arm*-*-phoenix* targets.
   * config/arm/t-phoenix: New.
   * config/phoenix.h: New.

---
 gcc/ChangeLog|  6 ++
 gcc/config.gcc   | 11 +++
 gcc/config/arm/t-phoenix | 29 +
 gcc/config/phoenix.h | 33 +
 4 files changed, 79 insertions(+)
 create mode 100644 gcc/config/arm/t-phoenix
 create mode 100644 gcc/config/phoenix.h

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 907bb06..26807d2 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2016-06-15  Jakub Sejdak  
+
+   * config.gcc: Add support for arm*-*-phoenix* targets.
+   * config/arm/t-phoenix: New.
+   * config/phoenix.h: New.
+
 2016-06-14  David Malcolm  
 
* spellcheck-tree.c: Include spellcheck-tree.h rather than
diff --git a/gcc/config.gcc b/gcc/config.gcc
index e47535b..8c46798 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -815,6 +815,11 @@ case ${target} in
   ;;
   esac
   ;;
+*-*-phoenix*)
+  gas=yes
+  gnu_ld=yes
+  default_use_cxa_atexit=yes
+  ;;
 *-*-rtems*)
   case ${enable_threads} in
 "" | yes | rtems) thread_file='rtems' ;;
@@ -1097,6 +1102,12 @@ arm*-*-uclinux*eabi*)# ARM ucLinux
# The EABI requires the use of __cxa_atexit.
default_use_cxa_atexit=yes
;;
+arm*-*-phoenix*)
+   tm_file="dbxelf.h elfos.h arm/unknown-elf.h arm/elf.h arm/bpabi.h"
+   tm_file="${tm_file} newlib-stdint.h phoenix.h"
+   tm_file="${tm_file} arm/aout.h arm/arm.h"
+   tmake_file="${tmake_file} arm/t-arm arm/t-bpabi arm/t-phoenix"
+   ;;
 arm*-*-eabi* | arm*-*-symbianelf* | arm*-*-rtems*)
case ${target} in
arm*eb-*-eabi*)
diff --git a/gcc/config/arm/t-phoenix b/gcc/config/arm/t-phoenix
new file mode 100644
index 000..d881884
--- /dev/null
+++ b/gcc/config/arm/t-phoenix
@@ -0,0 +1,29 @@
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+MULTILIB_OPTIONS = marm/mthumb
+MULTILIB_DIRNAMES= arm thumb
+MULTILIB_EXCEPTIONS  =
+MULTILIB_MATCHES =
+
+MULTILIB_OPTIONS += mfloat-abi=hard
+MULTILIB_DIRNAMES+= fpu
+MULTILIB_MATCHES += mfloat-abi?hard=mhard-float
+
+MULTILIB_OPTIONS += mno-thumb-interwork/mthumb-interwork
+MULTILIB_DIRNAMES+= normal interwork
diff --git a/gcc/config/phoenix.h b/gcc/config/phoenix.h
new file mode 100644
index 000..9ffb958
--- /dev/null
+++ b/gcc/config/phoenix.h
@@ -0,0 +1,33 @@
+/* Base configuration file for all Phoenix-RTOS targets.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#undef TARGET_OS_CPP_BUILTINS
+#define TARGET_OS_CPP_BUILTINS()   \
+do {   \
+  builtin_define_std ("phoenix");  \
+  builtin_define_std ("unix"); \
+  builtin_assert ("system=phoenix");   \
+  builtin_assert ("system=unix");  \
+} while(0);
+
+#define STD_LIB_SPEC "%{!shared:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
+
+/* This will prevent selecting 'unsigned long int' instead of 'unsigned int' as 'uint32_t' in stdint-newlib.h. */
+#undef STDINT_LONG32
+#define STDINT_LONG32  0
-- 
2.7.4



[PATCH 1/3] Disable libgcj and libgloss for Phoenix-RTOS targets.

2016-06-15 Thread Kuba Sejdak
This patch disables libgcj and libgloss in the top-level configure.ac for the
new OS port, Phoenix-RTOS.  Those libraries are not needed to build GCC or
newlib for arm-phoenix.

Is it ok for trunk? If possible, please merge it also to the GCC-6 and
GCC-5 branches.

2016-06-15  Jakub Sejdak  

* configure.ac: Disable libgcj and libgloss for Phoenix-RTOS targets.
* configure: Regenerated.

---
 ChangeLog| 5 +
 configure| 6 ++
 configure.ac | 6 ++
 3 files changed, 17 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index cee8206..ec5fa6e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2016-06-15  Jakub Sejdak  
+
+* configure.ac: Disable libgcj and libgloss for Phoenix-RTOS targets.
+* configure: Regenerated.
+
 2016-06-14  Mikael Morin  
 
* MAINTAINERS (Write After Approval): Update e-mail address.
diff --git a/configure b/configure
index ea63784..19451d2 100755
--- a/configure
+++ b/configure
@@ -3469,6 +3469,9 @@ case "${target}" in
   *-*-netware*)
 noconfigdirs="$noconfigdirs ${libgcj}"
 ;;
+  *-*-phoenix*)
+noconfigdirs="$noconfigdirs ${libgcj}"
+;;
   *-*-rtems*)
 noconfigdirs="$noconfigdirs ${libgcj}"
 ;;
@@ -3725,6 +3728,9 @@ case "${target}" in
 ;;
   *-*-netware*)
 ;;
+  *-*-phoenix*)
+noconfigdirs="$noconfigdirs target-libgloss"
+;;
   *-*-rtems*)
 noconfigdirs="$noconfigdirs target-libgloss"
 # this is not caught below because this stanza matches earlier
diff --git a/configure.ac b/configure.ac
index 54558df..d965059 100644
--- a/configure.ac
+++ b/configure.ac
@@ -805,6 +805,9 @@ case "${target}" in
   *-*-netware*)
 noconfigdirs="$noconfigdirs ${libgcj}"
 ;;
+  *-*-phoenix*)
+noconfigdirs="$noconfigdirs ${libgcj}"
+;;
   *-*-rtems*)
 noconfigdirs="$noconfigdirs ${libgcj}"
 ;;
@@ -1061,6 +1064,9 @@ case "${target}" in
 ;;
   *-*-netware*)
 ;;
+  *-*-phoenix*)
+noconfigdirs="$noconfigdirs target-libgloss"
+;;
   *-*-rtems*)
 noconfigdirs="$noconfigdirs target-libgloss"
 # this is not caught below because this stanza matches earlier
-- 
2.7.4



[PATCH 3/3] Add support for arm*-*-phoenix* targets in libgcc.

2016-06-15 Thread Kuba Sejdak
Is it ok for trunk? If possible, please merge it also to the GCC-6 and
GCC-5 branches.

2016-06-15  Jakub Sejdak  

   * config.host: Add support for arm*-*-phoenix* targets.

---
 libgcc/ChangeLog   | 4 
 libgcc/config.host | 7 +++
 2 files changed, 11 insertions(+)

diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index 19d6011..73288cc 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,7 @@
+2016-06-15  Jakub Sejdak  
+
+   * config.host: Add support for arm*-*-phoenix* targets.
+
 2016-06-05  Aaron Conole  
Nathan Sidwell  
 
diff --git a/libgcc/config.host b/libgcc/config.host
index 7899216..196abc9 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -404,6 +404,13 @@ arm*-*-uclinux*)   # ARM ucLinux
unwind_header=config/arm/unwind-arm.h
extra_parts="$extra_parts crti.o crtn.o"
;;
+arm*-*-phoenix*)
+   tmake_file="t-hardfp t-softfp arm/t-arm arm/t-elf arm/t-softfp arm/t-phoenix"
+   tmake_file="${tmake_file} arm/t-bpabi"
+   tm_file="$tm_file arm/bpabi-lib.h"
+   extra_parts="crtbegin.o crtend.o crti.o crtn.o"
+   unwind_header=config/arm/unwind-arm.h
+   ;;
 arm*-*-eabi* | arm*-*-symbianelf* | arm*-*-rtems*)
tmake_file="${tmake_file} arm/t-arm arm/t-elf t-fixedpoint-gnu-prefix"
tm_file="$tm_file arm/bpabi-lib.h"
-- 
2.7.4



Re: [PATCH] Add port for Phoenix-RTOS on ARM platform.

2016-06-15 Thread Jakub Sejdak
Hello,

> First of all, do you or your employer have a copyright assignment
> to the FSF? The above link contains instructions on how to do that.
> It is a necessary prerequisite to accepting any non-small change.

Sorry for the late response, but it took me some time to fulfill the
requirements mentioned above.
We (Phoenix Systems) now have a copyright assignment to the FSF.

> As described in https://gcc.gnu.org/svnwrite.html the contents of this file
> list the people with write access permissions as well as the maintainers of
> each
> gcc component. Maintainers are appointed by the GCC steering committee
> (CC'ed one of them who is also an arm port maintainer).
> So you should remove this hunk and apply it separately
> if the steering committee takes that decision.

It was not my intention to get write access, just to let others know
whom they should contact in case of trouble.
What should I do, then, to be appointed by the GCC steering committee as
maintainer of our OS port?

> The rest of the patch looks sane to me but you'd need to sort out the above
> before this can progress.

Since part of this patch should be removed, I will split it and send it
again in a separate email.

2016-05-19 10:59 GMT+02:00 Kyrill Tkachov :
> Hi Jakub,
>
> For future reference, as per https://gcc.gnu.org/contribute.html the usual
> practice
> is to wait for a week or two before pinging a patch...
>
>
>
> On 17/05/16 09:42, Kuba Sejdak wrote:
>>
>> ---
>>   ChangeLog|  6 ++
>>   MAINTAINERS  |  1 +
>>   configure|  6 ++
>>   configure.ac |  6 ++
>>   gcc/ChangeLog|  6 ++
>>   gcc/config.gcc   | 11 +++
>>   gcc/config/arm/t-phoenix | 29 +
>>   gcc/config/phoenix.h | 33 +
>>   libgcc/ChangeLog |  4 
>>   libgcc/config.host   |  7 +++
>>   10 files changed, 109 insertions(+)
>>   create mode 100644 gcc/config/arm/t-phoenix
>>   create mode 100644 gcc/config/phoenix.h
>>
>> diff --git a/ChangeLog b/ChangeLog
>> index 8698133..2d25a91 100644
>> --- a/ChangeLog
>> +++ b/ChangeLog
>> @@ -1,3 +1,9 @@
>> +2016-05-17  Jakub Sejdak  
>> +
>> +* configure.ac: Disable libgcj and libgloss for Phoenix-RTOS targets.
>> +* configure: Regenerated.
>> +   * MAINTAINERS (OS maintainers): Add myself.
>> +
>
>
> First of all, do you or your employer have a copyright assignment
> to the FSF? The above link contains instructions on how to do that.
> It is a necessary prerequisite to accepting any non-small change.
>
>>   2016-05-16  Jakub Sejdak  
>> * config.guess: Import version 2016-04-02 (newest).
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index c615168..1d22df6 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -134,6 +134,7 @@ hpuxSteve Ellcey
>> 
>>   solaris   Rainer Orth
>> 
>>   netbsdJason Thorpe
>> 
>>   netbsdKrister Walfridsson
>> 
>> +Phoenix-RTOS   Jakub Sejdak
>>   sh-linux-gnu  Kaz Kojima  
>>   RTEMS Ports   Joel Sherrill   
>>   RTEMS Ports   Ralf Corsepius  
>
>
> As described in https://gcc.gnu.org/svnwrite.html the contents of this file
> list the people with write access permissions as well as the maintainers of
> each
> gcc component. Maintainers are appointed by the GCC steering committee
> (CC'ed one of them who is also an arm port maintainer).
> So you should remove this hunk and apply it separately
> if the steering committee takes that decision.
>
> The rest of the patch looks sane to me but you'd need to sort out the above
> before this can progress.
>
> Kyrill
>
>
>> diff --git a/configure b/configure
>> index ea63784..19451d2 100755
>> --- a/configure
>> +++ b/configure
>> @@ -3469,6 +3469,9 @@ case "${target}" in
>> *-*-netware*)
>>   noconfigdirs="$noconfigdirs ${libgcj}"
>>   ;;
>> +  *-*-phoenix*)
>> +noconfigdirs="$noconfigdirs ${libgcj}"
>> +;;
>> *-*-rtems*)
>>   noconfigdirs="$noconfigdirs ${libgcj}"
>>   ;;
>> @@ -3725,6 +3728,9 @@ case "${target}" in
>>   ;;
>> *-*-netware*)
>>   ;;
>> +  *-*-phoenix*)
>> +noconfigdirs="$noconfigdirs target-libgloss"
>> +;;
>> *-*-rtems*)
>>   noconfigdirs="$noconfigdirs target-libgloss"
>>   # this is not caught below because this stanza matches earlier
>> diff --git a/configure.ac b/configure.ac
>> index 54558df..d965059 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -805,6 +805,9 @@ case "${target}" in
>> *-*-netware*)
>>   noconfigdirs="$noconfigdirs 

Re: [PATCH] Reject boolean/enum types in last arg of __builtin_*_overflow_p (take 2)

2016-06-15 Thread Martin Sebor

On 06/15/2016 06:16 AM, Joseph Myers wrote:

On Wed, 15 Jun 2016, Jakub Jelinek wrote:


The only thing I'm unsure about is what to do with bitfield types.
For __builtin_{add,sub,mul}_overflow it is not an issue, as one can't take
address of a bitfield.  For __builtin_{add,sub,mul}_overflow_p right now,
the C FE doesn't promote the last argument in any way, therefore for C
the builtin-arith-overflow-p-19.c testcase tests the behavior of bitfield
overflows.  The C++ FE even for type-generic builtins promotes the argument
to the underlying type (as part of decay_conversion), therefore for C++
overflow to bit-fields doesn't work.  Is that acceptable that because the
bitfields in the two languages behave generally slightly differently it is
ok that it differs even here, or should the C FE promote bitfields to the
underlying type for the last argument of __builtin_{add,sub,mul}_overflow_p,
or should the C++ FE special case __builtin_{add,sub,mul}_overflow_p and
not decay_conversion on the last argument to these, something else?


If the idea of the built-ins it to tell whether the result of arithmetic
could be stored in a particular object, then it would seem natural not to
promote bit-fields in either language, and so enable telling whether the
result of arithmetic could be stored in a bit-field.  Except that of
course the last argument is an rvalue not an lvalue, and when it's a
question of whether the result could be stored in a particular type rather
than a particular object, it's less clear that C++ shouldn't promote.


I like the idea of being able to use the built-ins for this, but
I think it would be confusing for them to follow subtly different
rules for C than for C++.  Since the value of the last argument
is ignored, how about using it to specify the precision of the
destination argument?  (With zero meaning the type's precision,
and perhaps (Type)-1 meaning the default for each language.
That way the built-ins could be used both portably and also
to figure out if the result would overflow a bit-field.)
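
Here is a sketch (mine, not from the testsuite) of the C behaviour under
discussion; the struct and function names are made up:

  /* In C the last argument is not promoted, so the 4-bit type of the
     bit-field determines the range checked for overflow.  In C++ the
     bit-field currently decays to int first, so the check is against int.  */
  struct S { int bf : 4; } s;

  int
  sum_overflows_bitfield (int a, int b)
  {
    return __builtin_add_overflow_p (a, b, s.bf);
  }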

Martin


Re: [PATCH] PR71275 ira.c bb_loop_depth

2016-06-15 Thread Bernd Schmidt



On 06/15/2016 04:03 PM, Alan Modra wrote:

On Wed, Jun 15, 2016 at 11:49:50AM +0200, Bernd Schmidt wrote:

On 06/15/2016 03:30 AM, Alan Modra wrote:

Between these two calls to _gfortran_string_verify,
 if (verify(c4, "A", back = .true.) .ne. 3) call abort
 if (verify(c4, "AB") .ne. 0) call abort
it seems that gfortran is assuming that parameters passed on the stack
are unchanged.


How? Is this something in the Fortran frontend, or is there CSE going on for
stores to the argument area?


It's not a fortran problem, and I answered my own question about the
ABI by the testcase in
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01098.html.  If tail
calls are allowed, then the stack parameter area must be overwritten.
Thus an ABI can't allow tail calls if it specifies the stack parameter
area is preserved.  x86 allows tail calls.


Well yes. The problem isn't that the stack area is overwritten, the 
problem is that something expects that it isn't, and it's not clear to 
me yet where that problem occurs.



Bernd



Re: [PATCH] PR71275 ira.c bb_loop_depth

2016-06-15 Thread Alan Modra
On Wed, Jun 15, 2016 at 11:49:50AM +0200, Bernd Schmidt wrote:
> On 06/15/2016 03:30 AM, Alan Modra wrote:
> >Between these two calls to _gfortran_string_verify,
> >  if (verify(c4, "A", back = .true.) .ne. 3) call abort
> >  if (verify(c4, "AB") .ne. 0) call abort
> >it seems that gfortran is assuming that parameters passed on the stack
> >are unchanged.
> 
> How? Is this something in the Fortran frontend, or is there CSE going on for
> stores to the argument area?

It's not a fortran problem, and I answered my own question about the
ABI by the testcase in
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01098.html.  If tail
calls are allowed, then the stack parameter area must be overwritten.
Thus an ABI can't allow tail calls if it specifies the stack parameter
area is preserved.  x86 allows tail calls.  I didn't look further into
where the problem in arg setup occurs, sorry, except to note that
gcc-5 does not have the problem.
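
To illustrate, a minimal sketch of mine (not the testcase from the link
above): with sibling-call optimization a tail call reuses the caller's
incoming stack argument area for its own outgoing arguments, so stack-passed
parameters cannot be assumed to survive a call.

  extern int bar (int, int, int, int, int, int, int, int);

  int
  foo (int a, int b, int c, int d, int e, int f, int g, int h)
  {
    /* On x86_64, g and h arrive on the stack; when this call is turned
       into a sibcall, g + 1 and h + 1 are written into those same slots.  */
    return bar (a, b, c, d, e, f, g + 1, h + 1);
  }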

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH][1/2] Move choose_mult_variant declaration and dependent declarations to expmed.h

2016-06-15 Thread Richard Biener
On Wed, Jun 15, 2016 at 3:24 PM, Kyrill Tkachov
 wrote:
> Hi all,
>
> This is a respin of
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00951.html.
> This just moves the necessary declarations to expmed.h so that a file that
> includes
> expmed.h can access the mult synthesis algorithms.
>
> Bootstrapped and tested on x86_64, aarch64, arm.
>
> Ok for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Kyrill
>
> 2016-06-15  Kyrylo Tkachov  
>
> * expmed.c (mult_variant, choose_mult_variant): Move declaration to...
> * expmed.h: ... Here.


Re: [5/7] Move the fix for PR65518

2016-06-15 Thread Richard Biener
On Wed, Jun 15, 2016 at 10:52 AM, Richard Sandiford
 wrote:
> This patch moves the fix for PR65518 to the code that checks whether
> load-and-permute operations are supported.   If the group size is
> greater than the vectorisation factor, it would still be possible
> to fall back to elementwise loads (as for strided groups) rather
> than fail vectorisation entirely.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * tree-vectorizer.h (vect_grouped_load_supported): Add a
> single_element_p parameter.
> * tree-vect-data-refs.c (vect_grouped_load_supported): Likewise.
> Check the PR65518 case here rather than in vectorizable_load.
> * tree-vect-loop.c (vect_analyze_loop_2): Update call accordingly.
> * tree-vect-stmts.c (vectorizable_load): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h
> +++ gcc/tree-vectorizer.h
> @@ -1069,7 +1069,7 @@ extern tree bump_vector_ptr (tree, gimple *, 
> gimple_stmt_iterator *, gimple *,
>  extern tree vect_create_destination_var (tree, tree);
>  extern bool vect_grouped_store_supported (tree, unsigned HOST_WIDE_INT);
>  extern bool vect_store_lanes_supported (tree, unsigned HOST_WIDE_INT);
> -extern bool vect_grouped_load_supported (tree, unsigned HOST_WIDE_INT);
> +extern bool vect_grouped_load_supported (tree, bool, unsigned HOST_WIDE_INT);
>  extern bool vect_load_lanes_supported (tree, unsigned HOST_WIDE_INT);
>  extern void vect_permute_store_chain (vec ,unsigned int, gimple *,
>  gimple_stmt_iterator *, vec *);
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c
> +++ gcc/tree-vect-data-refs.c
> @@ -5131,14 +5131,31 @@ vect_setup_realignment (gimple *stmt, 
> gimple_stmt_iterator *gsi,
>
>  /* Function vect_grouped_load_supported.
>
> -   Returns TRUE if even and odd permutations are supported,
> -   and FALSE otherwise.  */
> +   COUNT is the size of the load group (the number of statements plus the
> +   number of gaps).  SINGLE_ELEMENT_P is true if there is actually
> +   only one statement, with a gap of COUNT - 1.
> +
> +   Returns true if a suitable permute exists.  */
>
>  bool
> -vect_grouped_load_supported (tree vectype, unsigned HOST_WIDE_INT count)
> +vect_grouped_load_supported (tree vectype, bool single_element_p,
> +unsigned HOST_WIDE_INT count)
>  {
>machine_mode mode = TYPE_MODE (vectype);
>
> +  /* If this is single-element interleaving with an element distance
> + that leaves unused vector loads around punt - we at least create
> + very sub-optimal code in that case (and blow up memory,
> + see PR65518).  */
> +  if (single_element_p && count > TYPE_VECTOR_SUBPARTS (vectype))
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"single-element interleaving not supported "
> +"for not adjacent vector loads\n");
> +  return false;
> +}
> +
>/* vect_permute_load_chain requires the group size to be equal to 3 or
>   be a power of two.  */
>if (count != 3 && exact_log2 (count) == -1)
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c
> +++ gcc/tree-vect-loop.c
> @@ -2148,10 +2148,12 @@ again:
> {
>   vinfo = vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (node)[0]);
>   vinfo = vinfo_for_stmt (STMT_VINFO_GROUP_FIRST_ELEMENT (vinfo));
> + bool single_element_p = !STMT_VINFO_GROUP_NEXT_ELEMENT (vinfo);
>   size = STMT_VINFO_GROUP_SIZE (vinfo);
>   vectype = STMT_VINFO_VECTYPE (vinfo);
>   if (! vect_load_lanes_supported (vectype, size)
> - && ! vect_grouped_load_supported (vectype, size))
> + && ! vect_grouped_load_supported (vectype, single_element_p,
> +   size))
> return false;
> }
>  }
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c
> +++ gcc/tree-vect-stmts.c
> @@ -6298,31 +6298,20 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator 
> *gsi, gimple **vec_stmt,
>
>first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
>group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
> +  bool single_element_p = (first_stmt == stmt
> +  && !GROUP_NEXT_ELEMENT (stmt_info));
>
>if (!slp && !STMT_VINFO_STRIDED_P (stmt_info))
> {
>   if (vect_load_lanes_supported (vectype, group_size))
> load_lanes_p = true;
> - else if (!vect_grouped_load_supported (vectype, 
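
For reference, a sketch of mine (not part of the patch) of roughly the kind
of access pattern the moved PR65518 check is about: a single-element group
whose element distance leaves most vector lanes unused.

  /* Only b[8 * i] is used out of each group of eight ints; with 4-element
     vectors the group size exceeds TYPE_VECTOR_SUBPARTS, so
     vect_grouped_load_supported now rejects load-and-permute for it.  */
  void
  f (int *restrict a, int *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] = b[8 * i];
  }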

Re: [4/7] Add a gather_scatter_info structure

2016-06-15 Thread Richard Biener
On Wed, Jun 15, 2016 at 10:51 AM, Richard Sandiford
 wrote:
> This patch just refactors the gather/scatter support so that all
> information is in a single structure, rather than separate variables.
> This reduces the number of arguments to a function added in patch 6.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Please pack it by moving offset_dt after offset_vectype.

Ok with that change.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * tree-vectorizer.h (gather_scatter_info): New structure.
> (vect_check_gather_scatter): Return a bool rather than a decl.
> Replace return-by-pointer arguments with a single
> gather_scatter_info *.
> * tree-vect-data-refs.c (vect_check_gather_scatter): Likewise.
> (vect_analyze_data_refs): Update call accordingly.
> * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Likewise.
> (vectorizable_mask_load_store): Likewise.  Also record the
> offset dt and vectype in the gather_scatter_info.
> (vectorizable_store): Likewise.
> (vectorizable_load): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h
> +++ gcc/tree-vectorizer.h
> @@ -612,6 +612,28 @@ typedef struct _stmt_vec_info {
>unsigned int num_slp_uses;
>  } *stmt_vec_info;
>
> +/* Information about a gather/scatter call.  */
> +struct gather_scatter_info {
> +  /* The FUNCTION_DECL for the built-in gather/scatter function.  */
> +  tree decl;
> +
> +  /* The loop-invariant base value.  */
> +  tree base;
> +
> +  /* The original scalar offset, which is a non-loop-invariant SSA_NAME.  */
> +  tree offset;
> +
> +  /* The definition type for the vectorized offset.  */
> +  enum vect_def_type offset_dt;
> +
> +  /* The type of the vectorized offset.  */
> +  tree offset_vectype;
> +
> +  /* Each offset element should be multiplied by this amount before
> + being added to the base.  */
> +  int scale;
> +};
> +
>  /* Access Functions.  */
>  #define STMT_VINFO_TYPE(S) (S)->type
>  #define STMT_VINFO_STMT(S) (S)->stmt
> @@ -1035,8 +1057,8 @@ extern bool vect_verify_datarefs_alignment 
> (loop_vec_info);
>  extern bool vect_slp_analyze_and_verify_instance_alignment (slp_instance);
>  extern bool vect_analyze_data_ref_accesses (vec_info *);
>  extern bool vect_prune_runtime_alias_test_list (loop_vec_info);
> -extern tree vect_check_gather_scatter (gimple *, loop_vec_info, tree *, tree 
> *,
> -  int *);
> +extern bool vect_check_gather_scatter (gimple *, loop_vec_info,
> +  gather_scatter_info *);
>  extern bool vect_analyze_data_refs (vec_info *, int *);
>  extern tree vect_create_data_ref_ptr (gimple *, tree, struct loop *, tree,
>   tree *, gimple_stmt_iterator *,
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c
> +++ gcc/tree-vect-data-refs.c
> @@ -3174,12 +3174,12 @@ vect_prune_runtime_alias_test_list (loop_vec_info 
> loop_vinfo)
>return true;
>  }
>
> -/* Check whether a non-affine read or write in stmt is suitable for gather 
> load
> -   or scatter store and if so, return a builtin decl for that operation.  */
> +/* Return true if a non-affine read or write in STMT is suitable for a
> +   gather load or scatter store.  Describe the operation in *INFO if so.  */
>
> -tree
> -vect_check_gather_scatter (gimple *stmt, loop_vec_info loop_vinfo, tree 
> *basep,
> -  tree *offp, int *scalep)
> +bool
> +vect_check_gather_scatter (gimple *stmt, loop_vec_info loop_vinfo,
> +  gather_scatter_info *info)
>  {
>HOST_WIDE_INT scale = 1, pbitpos, pbitsize;
>struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> @@ -3253,7 +3253,7 @@ vect_check_gather_scatter (gimple *stmt, loop_vec_info 
> loop_vinfo, tree *basep,
>if (!expr_invariant_in_loop_p (loop, base))
>  {
>if (!integer_zerop (off))
> -   return NULL_TREE;
> +   return false;
>off = base;
>base = size_int (pbitpos / BITS_PER_UNIT);
>  }
> @@ -3279,7 +3279,7 @@ vect_check_gather_scatter (gimple *stmt, loop_vec_info 
> loop_vinfo, tree *basep,
>   gimple *def_stmt = SSA_NAME_DEF_STMT (off);
>
>   if (expr_invariant_in_loop_p (loop, off))
> -   return NULL_TREE;
> +   return false;
>
>   if (gimple_code (def_stmt) != GIMPLE_ASSIGN)
> break;
> @@ -3291,7 +3291,7 @@ vect_check_gather_scatter (gimple *stmt, loop_vec_info 
> loop_vinfo, tree *basep,
>else
> {
>   if (get_gimple_rhs_class (TREE_CODE (off)) == GIMPLE_TERNARY_RHS)
> -   return NULL_TREE;
> +   return false;
>   code = TREE_CODE (off);
>   

Re: [3/7] Fix load/store costs for strided groups

2016-06-15 Thread Richard Biener
On Wed, Jun 15, 2016 at 10:50 AM, Richard Sandiford
 wrote:
> vect_model_store_cost had:
>
>   /* Costs of the stores.  */
>   if (STMT_VINFO_STRIDED_P (stmt_info)
>   && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
> {
>   /* N scalar stores plus extracting the elements.  */
>   inside_cost += record_stmt_cost (body_cost_vec,
>ncopies * TYPE_VECTOR_SUBPARTS 
> (vectype),
>scalar_store, stmt_info, 0, vect_body);
>
> But non-SLP strided groups also use individual scalar stores rather than
> vector stores, so I think we should skip this only for SLP groups.
>
> The same applies to vect_model_load_cost.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

The && grouped_access is redundant, for slp_nodes that's always true.

I don't think we have something like strided groups in the interleaving case,
so the proposed fix (minus the && grouped_access_p) looks like a (good) no-op.

Ok with the suggested change,

> +  if (STMT_VINFO_STRIDED_P (stmt_info) && !slp_node)

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * tree-vect-stmts.c (vect_model_store_cost): For non-SLP
> strided groups, use the cost of N scalar accesses instead
> of ncopies vector accesses.
> (vect_model_load_cost): Likewise.
>
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index e90eeda..f883580 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -926,8 +926,7 @@ vect_model_store_cost (stmt_vec_info stmt_info, int 
> ncopies,
>
>tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>/* Costs of the stores.  */
> -  if (STMT_VINFO_STRIDED_P (stmt_info)
> -  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
> +  if (STMT_VINFO_STRIDED_P (stmt_info) && !(slp_node && grouped_access_p))
>  {
>/* N scalar stores plus extracting the elements.  */
>inside_cost += record_stmt_cost (body_cost_vec,
> @@ -1059,8 +1058,7 @@ vect_model_load_cost (stmt_vec_info stmt_info, int 
> ncopies,
>  }
>
>/* The loads themselves.  */
> -  if (STMT_VINFO_STRIDED_P (stmt_info)
> -  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
> +  if (STMT_VINFO_STRIDED_P (stmt_info) && !(slp_node && grouped_access_p))
>  {
>/* N scalar loads plus gathering them into a vector.  */
>tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>


[PATCH][1/2] Move choose_mult_variant declaration and dependent declarations to expmed.h

2016-06-15 Thread Kyrill Tkachov

Hi all,

This is a respin of https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00951.html.
This just moves the necessary declarations to expmed.h so that a file that 
includes
expmed.h can access the mult synthesis algorithms.

Bootstrapped and tested on x86_64, aarch64, arm.

Ok for trunk?

Thanks,
Kyrill

2016-06-15  Kyrylo Tkachov  

* expmed.c (mult_variant, choose_mult_variant): Move declaration to...
* expmed.h: ... Here.
diff --git a/gcc/expmed.h b/gcc/expmed.h
index 1a32e9f1b664f250c5092022eb965237ed0342fc..4c2d94bf73114c5cf5014820a84b318ccee336e9 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -35,6 +35,15 @@ enum alg_code {
   alg_impossible
 };
 
+/* Indicates the type of fixup needed after a constant multiplication.
+   BASIC_VARIANT means no fixup is needed, NEGATE_VARIANT means that
+   the result should be negated, and ADD_VARIANT means that the
+   multiplicand should be added to the result.  */
+enum mult_variant {basic_variant, negate_variant, add_variant};
+
+bool choose_mult_variant (machine_mode, HOST_WIDE_INT,
+			  struct algorithm *, enum mult_variant *, int);
+
 /* This structure holds the "cost" of a multiply sequence.  The
"cost" field holds the total rtx_cost of every operator in the
synthetic multiplication sequence, hence cost(a op b) is defined
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 6645a535b3eef9624e6f3ce61d2fcf864d1cf574..bd29e42aae03742a856d0a4f1232a47ac254f8d6 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -2482,16 +2482,8 @@ expand_variable_shift (enum tree_code code, machine_mode mode, rtx shifted,
 }
 
 
-/* Indicates the type of fixup needed after a constant multiplication.
-   BASIC_VARIANT means no fixup is needed, NEGATE_VARIANT means that
-   the result should be negated, and ADD_VARIANT means that the
-   multiplicand should be added to the result.  */
-enum mult_variant {basic_variant, negate_variant, add_variant};
-
 static void synth_mult (struct algorithm *, unsigned HOST_WIDE_INT,
 			const struct mult_cost *, machine_mode mode);
-static bool choose_mult_variant (machine_mode, HOST_WIDE_INT,
- struct algorithm *, enum mult_variant *, int);
 static rtx expand_mult_const (machine_mode, rtx, HOST_WIDE_INT, rtx,
 			  const struct algorithm *, enum mult_variant);
 static unsigned HOST_WIDE_INT invert_mod2n (unsigned HOST_WIDE_INT, int);
@@ -2981,7 +2973,7 @@ synth_mult (struct algorithm *alg_out, unsigned HOST_WIDE_INT t,
Return true if the cheapest of these cost less than MULT_COST,
describing the algorithm in *ALG and final fixup in *VARIANT.  */
 
-static bool
+bool
 choose_mult_variant (machine_mode mode, HOST_WIDE_INT val,
 		 struct algorithm *alg, enum mult_variant *variant,
 		 int mult_cost)


[PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-06-15 Thread Kyrill Tkachov

Hi all,

This is a respin of https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html 
following feedback.
I've changed the code to cast the operand to an unsigned type before applying 
the multiplication algorithm
and cast it back to the signed type at the end.
Whether to perform the cast is now determined by the function 
cast_mult_synth_to_unsigned in which I've implemented
the cases that Marc mentioned in [1]. Please do let me know
if there are any other cases that need to be handled.
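
To make the transformation concrete, here is a hand-written sketch (not the
vectorizer's actual output) of one possible shift-and-add synthesis for the
constant 123 used in the first new test, with the unsigned cast described
above; the exact sequence chosen depends on choose_mult_variant and the
target's rtx costs.

  long long
  mult_by_123 (long long x)
  {
    unsigned long long ux = (unsigned long long) x;    /* avoid signed overflow */
    unsigned long long t = (ux << 7) - (ux << 2) - ux; /* 128x - 4x - x == 123x */
    return (long long) t;
  }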

I've added a couple of TODO notes in the places that need to be extended to 
handle shifts as a series of additions
for targets that support vector addition but not vector shifts [2].

tree-vect-patterns.c already includes expmed.h (must have been added in the 
time since I first wrote the patch
last November...) so it picks up the definition of choose_mult_variant moved 
there in patch 1/2.

Bootstrapped and tested on aarch64, arm, x86_64.
Does this look better?

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01023.html
[2] https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00967.html

2016-06-15  Kyrylo Tkachov  

PR target/65951
* tree-vect-patterns.c: Include mult-synthesis.h
(target_supports_mult_synth_alg): New function.
(apply_binop_and_append_stmt): Likewise.
(vect_synth_mult_by_constant): Likewise.
(target_has_vecop_for_code): Likewise.
(cast_mult_synth_to_unsigned): Likewise.
(vect_recog_mult_pattern): Use above functions to synthesize vector
multiplication by integer constants.

2016-06-15  Kyrylo Tkachov  

* gcc.dg/vect/vect-mult-const-pattern-1.c: New test.
* gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
new file mode 100644
index ..e5dba82d7fa955a6a37a0eabf980127e464ac77b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include 
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+arr[i] *= 123;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+{
+  data[i] = i;
+  __asm__ volatile ("");
+}
+
+  foo (data);
+  for (i = 0; i < N; i++)
+{
+  if (data[i] / 123 != i)
+  __builtin_abort ();
+  __asm__ volatile ("");
+}
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect"  { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
new file mode 100644
index ..83019c96910b866e364a7c2e00261a1ded13cb53
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include 
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+arr[i] *= -19594LL;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+{
+  data[i] = i;
+  __asm__ volatile ("");
+}
+
+  foo (data);
+  for (i = 0; i < N; i++)
+{
+  if (data[i] / -19594LL != i)
+  __builtin_abort ();
+  __asm__ volatile ("");
+}
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect"  { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 8a2221f935063002ecd02d2b20af5cb4bd7d9fee..698a022d4b80795a0b5a22a9c433fe482fa1bdaa 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -2131,32 +2131,281 @@ vect_recog_vector_vector_shift_pattern (vec *stmts,
   return pattern_stmt;
 }
 
-/* Detect multiplication by constant which are postive or negatives of power 2,
-   and convert them to shift patterns.
+/* Return true iff the target has a vector optab implementing the operation
+   CODE on type VECTYPE.  */
 
-   Mult with constants that are postive power of two.
-   type a_t;
-   type b_t
-   S1: b_t = a_t * n
+static bool
+target_has_vecop_for_code (tree_code code, tree vectype)
+{
+  optab voptab = optab_for_tree_code (code, vectype, optab_vector);
+  return voptab
+	 && optab_handler (voptab, TYPE_MODE (vectype)) != CODE_FOR_nothing;
+}
 
-   or
+/* Return true iff we need to cast the operand of the
+   

Re: [2/7] Clean up vectorizer load/store costs

2016-06-15 Thread Richard Biener
On Wed, Jun 15, 2016 at 10:49 AM, Richard Sandiford
 wrote:
> Add a bit more commentary and try to make the structure more obvious.
> The horrendous:
>
>   if (grouped_access_p
>   && represents_group_p
>   && !store_lanes_p
>   && !STMT_VINFO_STRIDED_P (stmt_info)
>   && !slp_node)
>
> checks go away in patch 6.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * tree-vect-stmts.c (vect_cost_group_size): Delete.
> (vect_model_store_cost): Avoid calling it.  Use first_stmt_p
> variable to indicate when once-per-group costs are being used.
> (vect_model_load_cost): Likewise.  Fix comment and misindented code.
>
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c
> +++ gcc/tree-vect-stmts.c
> @@ -865,24 +865,6 @@ vect_model_promotion_demotion_cost (stmt_vec_info 
> stmt_info,
>   "prologue_cost = %d .\n", inside_cost, prologue_cost);
>  }
>
> -/* Function vect_cost_group_size
> -
> -   For grouped load or store, return the group_size only if it is the first
> -   load or store of a group, else return 1.  This ensures that group size is
> -   only returned once per group.  */
> -
> -static int
> -vect_cost_group_size (stmt_vec_info stmt_info)
> -{
> -  gimple *first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
> -
> -  if (first_stmt == STMT_VINFO_STMT (stmt_info))
> -return GROUP_SIZE (stmt_info);
> -
> -  return 1;
> -}
> -
> -
>  /* Function vect_model_store_cost
>
> Models cost for stores.  In the case of grouped accesses, one access
> @@ -895,47 +877,43 @@ vect_model_store_cost (stmt_vec_info stmt_info, int 
> ncopies,
>stmt_vector_for_cost *prologue_cost_vec,
>stmt_vector_for_cost *body_cost_vec)
>  {
> -  int group_size;
>unsigned int inside_cost = 0, prologue_cost = 0;
> -  struct data_reference *first_dr;
> -  gimple *first_stmt;
> +  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
> +  gimple *first_stmt = STMT_VINFO_STMT (stmt_info);
> +  bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
>
>if (dt == vect_constant_def || dt == vect_external_def)
>  prologue_cost += record_stmt_cost (prologue_cost_vec, 1, scalar_to_vec,
>stmt_info, 0, vect_prologue);
>
> -  /* Grouped access?  */
> -  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> +  /* Grouped stores update all elements in the group at once,
> + so we want the DR for the first statement.  */
> +  if (!slp_node && grouped_access_p)
>  {
> -  if (slp_node)
> -{
> -  first_stmt = SLP_TREE_SCALAR_STMTS (slp_node)[0];
> -  group_size = 1;
> -}
> -  else
> -{
> -  first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
> -  group_size = vect_cost_group_size (stmt_info);
> -}
> -
> -  first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
> -}
> -  /* Not a grouped access.  */
> -  else
> -{
> -  group_size = 1;
> -  first_dr = STMT_VINFO_DATA_REF (stmt_info);
> +  first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
> +  dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
>  }
>
> +  /* True if we should include any once-per-group costs as well as
> + the cost of the statement itself.  For SLP we only get called
> + once per group anyhow.  */
> +  bool first_stmt_p = (first_stmt == STMT_VINFO_STMT (stmt_info));
> +
>/* We assume that the cost of a single store-lanes instruction is
>   equivalent to the cost of GROUP_SIZE separate stores.  If a grouped
>   access is instead being provided by a permute-and-store operation,
> - include the cost of the permutes.  */
> -  if (!store_lanes_p && group_size > 1
> -  && !STMT_VINFO_STRIDED_P (stmt_info))
> + include the cost of the permutes.
> +
> + For SLP, the caller has already counted the permutation, if any.  */
> +  if (grouped_access_p
> +  && first_stmt_p
> +  && !store_lanes_p
> +  && !STMT_VINFO_STRIDED_P (stmt_info)
> +  && !slp_node)
>  {
>/* Uses a high and low interleave or shuffle operations for each
>  needed permute.  */
> +  int group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
>int nstmts = ncopies * ceil_log2 (group_size) * group_size;
>inside_cost = record_stmt_cost (body_cost_vec, nstmts, vec_perm,
>   stmt_info, 0, vect_body);
> @@ -957,7 +935,7 @@ vect_model_store_cost (stmt_vec_info stmt_info, int 
> ncopies,
>scalar_store, stmt_info, 0, vect_body);
>  }
>else
> -vect_get_store_cost (first_dr, ncopies, _cost, body_cost_vec);
> +vect_get_store_cost (dr, ncopies, _cost, body_cost_vec);
>
>if (STMT_VINFO_STRIDED_P (stmt_info))
> 

Re: [1/7] Remove unnecessary peeling for gaps check

2016-06-15 Thread Richard Biener
On Wed, Jun 15, 2016 at 10:48 AM, Richard Sandiford
 wrote:
> I recently relaxed the peeling-for-gaps conditions for LD3 but
> kept them as-is for load-and-permute.  I don't think the conditions
> are needed for load-and-permute either though.  No current load-and-
> permute should load outside the group, so if there is no gap at the end,
> the final vector element loaded will correspond to an element loaded
> by the original scalar loop.
>
> The patch for PR68559 (a missed optimisation PR) increased the peeled
> cases from "exact_log2 (groupsize) == -1" to "vf % group_size == 0", so
> before that fix, we didn't peel for gaps if there was no gap at the end
> of the group and if the group size was a power of 2.
>
> The only current non-power-of-2 load-and-permute size is 3, which
> doesn't require loading more than 3 vectors.
>
> The testcase is based on gcc.dg/vect/pr49038.c.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * tree-vect-stmts.c (vectorizable_load): Remove unnecessary
> peeling-for-gaps condition.
>
> gcc/testsuite/
> * gcc.dg/vect/group-no-gaps-1.c: New test.
>
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c
> +++ gcc/tree-vect-stmts.c
> @@ -6356,13 +6356,11 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator 
> *gsi, gimple **vec_stmt,
>   gcc_assert (GROUP_GAP (stmt_info));
> }
>
> -  /* If there is a gap in the end of the group or the group size cannot
> - be made a multiple of the vector element count then we access excess
> +  /* If there is a gap in the end of the group then we access excess
>  elements in the last iteration and thus need to peel that off.  */
>if (loop_vinfo
>   && ! STMT_VINFO_STRIDED_P (stmt_info)
> - && (GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0
> - || (!slp && !load_lanes_p && vf % group_size != 0)))
> + && GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> Index: gcc/testsuite/gcc.dg/vect/group-no-gaps-1.c
> ===
> --- /dev/null
> +++ gcc/testsuite/gcc.dg/vect/group-no-gaps-1.c
> @@ -0,0 +1,108 @@
> +/* { dg-require-effective-target mmap } */
> +
> +#include 
> +#include 
> +
> +#define COUNT 320
> +#define MMAP_SIZE 0x2
> +#define ADDRESS1 0x112200
> +#define ADDRESS2 (ADDRESS1 + MMAP_SIZE * 16)
> +#define TYPE unsigned int
> +
> +#ifndef MAP_ANONYMOUS
> +#define MAP_ANONYMOUS MAP_ANON
> +#endif
> +
> +#define RHS0(B) b[B]
> +#define RHS1(B) RHS0(B) + b[(B) + 1]
> +#define RHS2(B) RHS1(B) + b[(B) + 2]
> +#define RHS3(B) RHS2(B) + b[(B) + 3]
> +#define RHS4(B) RHS3(B) + b[(B) + 4]
> +#define RHS5(B) RHS4(B) + b[(B) + 5]
> +#define RHS6(B) RHS5(B) + b[(B) + 6]
> +#define RHS7(B) RHS6(B) + b[(B) + 7]
> +
> +#define LHS0(B) a[B]
> +#define LHS1(B) LHS0(B) = a[(B) + 1]
> +#define LHS2(B) LHS1(B) = a[(B) + 2]
> +#define LHS3(B) LHS2(B) = a[(B) + 3]
> +#define LHS4(B) LHS3(B) = a[(B) + 4]
> +#define LHS5(B) LHS4(B) = a[(B) + 5]
> +#define LHS6(B) LHS5(B) = a[(B) + 6]
> +#define LHS7(B) LHS6(B) = a[(B) + 7]
> +
> +#define DEF_GROUP_SIZE(MULT, GAP, NO_GAP)  \
> +  void __attribute__((noinline, noclone))  \
> +  gap_load_##MULT (TYPE *__restrict a, TYPE *__restrict b) \
> +  {\
> +for (int i = 0; i < COUNT; i++)\
> +  a[i] = RHS##GAP (i * MULT);  \
> +  }\
> +  void __attribute__((noinline, noclone))  \
> +  no_gap_load_##MULT (TYPE *__restrict a, TYPE *__restrict b)  \
> +  {\
> +for (int i = 0; i < COUNT; i++)\
> +  a[i] = RHS##NO_GAP (i * MULT);   \
> +  }\
> +  void __attribute__((noinline, noclone))  \
> +  gap_store_##MULT (TYPE *__restrict a, TYPE *__restrict b)\
> +  {\
> +for (int i = 0; i < COUNT; i++)\
> +  LHS##GAP (i * MULT) = b[i];  \
> +  }\
> +  void __attribute__((noinline, noclone))  \
> +  no_gap_store_##MULT (TYPE *__restrict a, TYPE *__restrict b) \
> +  {\
> +for (int i = 0; i < COUNT; i++)\
> +  LHS##NO_GAP (i * MULT) = b[i];

Re: [PATCH] Add testcase for 4.8 aarch64 ICE

2016-06-15 Thread Richard Biener
On Wed, Jun 15, 2016 at 1:51 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> Hi!
>
> This testcase ICEs on aarch64 at -O2 on the 4.8 branch, got fixed with
> PR52714 fix (r208204).
>
> Is the testcase ok for trunk?  Tested on x86_64-linux and i686-linux.

Ok.

RIchard.

> 2016-06-15  Jakub Jelinek  <ja...@redhat.com>
>
>     * gcc.c-torture/compile/20160615-1.c: New test.
>
> --- gcc/testsuite/gcc.c-torture/compile/20160615-1.c.jj 2016-06-15 
> 11:17:54.690689056 +0200
> +++ gcc/testsuite/gcc.c-torture/compile/20160615-1.c2016-06-15 
> 11:17:48.811765657 +0200
> @@ -0,0 +1,10 @@
> +int a;
> +void bar (int, unsigned, unsigned);
> +
> +void
> +foo (unsigned x)
> +{
> +  unsigned b = a ? x : 0;
> +  if (x || b)
> +bar (0, x, b);
> +}
>
> Jakub


Re: [PATCH] Backport PowerPC complex __float128 compiler support to GCC 6.x

2016-06-15 Thread Richard Biener
On Wed, 15 Jun 2016, Michael Meissner wrote:

> On Wed, Jun 15, 2016 at 11:01:05AM +0200, Richard Biener wrote:
> > On Tue, 14 Jun 2016, Bill Schmidt wrote:
> > 
> > > Hi Richard,
> > > 
> > > As nobody else has replied, let me take a stab at this one.
> > > 
> > > > On Jun 10, 2016, at 2:06 AM, Richard Biener  wrote:
> > > > 
> > > > On Thu, 9 Jun 2016, Michael Meissner wrote:
> > > > 
> > > >> I'm including the global reviewers on the list.  I just want to be 
> > > >> sure that
> > > >> there is no problem installing these patches on the GCC 6.2 branch.  
> > > >> While it
> > > >> is technically an enhancement, it is needed to be able to install the 
> > > >> glibc
> > > >> support that is needed to complete the work to add IEEE 128-bit 
> > > >> floating point.
> > > >> 
> > > >> The issue being fixed is that when we are creating the complex type, 
> > > >> we used to
> > > >> do a lookup for the size, and that fails on the PowerPC which has 2 
> > > >> 128-bit
> > > >> floating point types (__ibm128 and __float128, with long double 
> > > >> currently
> > > >> defaulting to __ibm128).
> > > > 
> > > > As this enhancement includes middle-end changes I am hesitant to approve
> > > > it for the branch.  Why is it desirable to backport this change?
> > > 
> > > It comes down to community requirements and schedules.  We are in the 
> > > process of
> > > replacing the incompatible IBM long double type with true IEEE-754 
> > > 128-bit floating
> > > point (__float128).  This is a complex multi-stage process where we will 
> > > have to
> > > maintain the functionality of the existing IBM long double for backward 
> > > compatibility
> > > while the new type is implemented.  This impacts multiple packages, 
> > > starting with
> > > gcc and glibc.
> > > 
> > > The glibc maintainers have indicated that work there depends on a certain 
> > > level of
> > > functionality within GCC.  Specifically, both the old and new types must 
> > > be supported,
> > > including corresponding complex types.  Unfortunately the realization 
> > > that the complex
> > > types had to be supported came late, and this work didn't fully make it 
> > > into GCC 6.1.
> > > 
> > > (Part of the problem that has made this whole effort difficult is that it 
> > > is complicated to
> > > maintain two floating-point types of the exact same size.)
> > > 
> > > In any case, the glibc maintainers require this work in GCC 6 so that 
> > > work can begin
> > > in glibc 2.24, with completion scheduled in glibc 2.25.  We are asking 
> > > for an exception 
> > > for this patch in order to allow those schedules to move forward.
> > > 
> > > So that's the history as I understand it... Perhaps others can jump in if 
> > > I've munged
> > > or omitted anything important.
> > 
> > Ok, I see the reason for the backport then.  Looking at the patch
> > the only fragile thing is the layout_type change given it adds an assert
> > and you did need to change frontends (but not all of them?).  I'm not
> > sure about testsuite coverage for complex type for, say, Go or Ada
> > or Java.
> 
> I added the assert after the review from Berndt.  It was to make sure there
> were no other callers to layout_type to create complex nodes.
> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00077.html
> 
> > And I don't understand the layout_type change either - it looks to me
> > it could just have used
> > 
> >   SET_TYPE_MODE (type, GET_MODE_COMPLEX_MODE (TYPE_MODE 
> > (TREE_TYPE (type;
> > 
> > and be done with it.  To me that looks a lot safer.
> 
> It has been some time since I looked at the code, I will have to investigate
> it further.
> 
> Note, I will be offline for the next 4 days.
> 
> > With now having two complex FP modes with the same size how does
> > iteration over MODE_COMPLEX_FLOAT work with GET_MODE_WIDER_MODE?
> > Is the outcome random?  Or do we visit both modes?  That is, could
> > GET_MODE_COMPLEX_MODE be implemented with iterating over complex modes
> > and in addition to size also match the component mode instead?
> 
> I struggled quite a bit with GET_WIDER_MODE.  There are three distinct usage
> cases in the compiler.  One case uses GET_WIDER_MODE to initialize all of the
> types.  Here you want to visit all of the types (though we could change the
> code, since genmode does sort the types so that all of the types for a given
> class are together).
> 
> Another case is the normal case, where given a type, go up to a wider type 
> that
> might implement the code you are looking for.
> 
> A third case is when generating floating point constants in the constant pool,
> see if there is a smaller type that maintains the precision.
> 
> Eventually, I decided to punt having to have explicit paths for widening.  I
> used fractional modes for IFmode (ibm long double format) and KFmode (IEEE
> 128-bit format).  IFmode widens to KFmode which widens to TFmode.  A backend
> hook is used to not allow IBM long double to widen to IEEE long 

Re: [PATCH] Backport PowerPC complex __float128 compiler support to GCC 6.x

2016-06-15 Thread Michael Meissner
On Wed, Jun 15, 2016 at 11:01:05AM +0200, Richard Biener wrote:
> On Tue, 14 Jun 2016, Bill Schmidt wrote:
> 
> > Hi Richard,
> > 
> > As nobody else has replied, let me take a stab at this one.
> > 
> > > On Jun 10, 2016, at 2:06 AM, Richard Biener  wrote:
> > > 
> > > On Thu, 9 Jun 2016, Michael Meissner wrote:
> > > 
> > >> I'm including the global reviewers on the list.  I just want to be sure 
> > >> that
> > >> there is no problem installing these patches on the GCC 6.2 branch.  
> > >> While it
> > >> is technically an enhancement, it is needed to be able to install the 
> > >> glibc
> > >> support that is needed to complete the work to add IEEE 128-bit floating 
> > >> point.
> > >> 
> > >> The issue being fixed is that when we are creating the complex type, we 
> > >> used to
> > >> do a lookup for the size, and that fails on the PowerPC which has 2 
> > >> 128-bit
> > >> floating point types (__ibm128 and __float128, with long double currently
> > >> defaulting to __ibm128).
> > > 
> > > As this enhancement includes middle-end changes I am hesitant to approve
> > > it for the branch.  Why is it desirable to backport this change?
> > 
> > It comes down to community requirements and schedules.  We are in the 
> > process of
> > replacing the incompatible IBM long double type with true IEEE-754 128-bit 
> > floating
> > point (__float128).  This is a complex multi-stage process where we will 
> > have to
> > maintain the functionality of the existing IBM long double for backward 
> > compatibility
> > while the new type is implemented.  This impacts multiple packages, 
> > starting with
> > gcc and glibc.
> > 
> > The glibc maintainers have indicated that work there depends on a certain 
> > level of
> > functionality within GCC.  Specifically, both the old and new types must be 
> > supported,
> > including corresponding complex types.  Unfortunately the realization that 
> > the complex
> > types had to be supported came late, and this work didn't fully make it 
> > into GCC 6.1.
> > 
> > (Part of the problem that has made this whole effort difficult is that it 
> > is complicated to
> > maintain two floating-point types of the exact same size.)
> > 
> > In any case, the glibc maintainers require this work in GCC 6 so that work 
> > can begin
> > in glibc 2.24, with completion scheduled in glibc 2.25.  We are asking for 
> > an exception 
> > for this patch in order to allow those schedules to move forward.
> > 
> > So that's the history as I understand it... Perhaps others can jump in if 
> > I've munged
> > or omitted anything important.
> 
> Ok, I see the reason for the backport then.  Looking at the patch
> the only fragile thing is the layout_type change given it adds an assert
> and you did need to change frontends (but not all of them?).  I'm not
> sure about testsuite coverage for complex type for, say, Go or Ada
> or Java.

I added the assert after the review from Berndt.  It was to make sure there
were no other callers to layout_type to create complex nodes.
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00077.html

> And I don't understand the layout_type change either - it looks to me
> it could just have used
> 
>   SET_TYPE_MODE (type, GET_MODE_COMPLEX_MODE (TYPE_MODE 
> (TREE_TYPE (type;
> 
> and be done with it.  To me that looks a lot safer.

It has been some time since I looked at the code, I will have to investigate
it further.

Note, I will be offline for the next 4 days.

> With now having two complex FP modes with the same size how does
> iteration over MODE_COMPLEX_FLOAT work with GET_MODE_WIDER_MODE?
> Is the outcome random?  Or do we visit both modes?  That is, could
> GET_MODE_COMPLEX_MODE be implemented with iterating over complex modes
> and in addition to size also match the component mode instead?

I struggled quite a bit with GET_WIDER_MODE.  There are three distinct usage
cases in the compiler.  One case uses GET_WIDER_MODE to initialize all of the
types.  Here you want to visit all of the types (though we could change the
code, since genmode does sort the types so that all of the types for a given
class are together).

Another case is the normal case, where given a type, go up to a wider type that
might implement the code you are looking for.

A third case is when generating floating point constants in the constant pool,
see if there is a smaller type that maintains the precision.

Eventually, I decided to punt having to have explicit paths for widening.  I
used fractional modes for IFmode (ibm long double format) and KFmode (IEEE
128-bit format).  IFmode widens to KFmode which widens to TFmode.  A backend
hook is used to not allow IBM long double to widen to IEEE long double and vice
versa.  At the moment, since there is no wider type than 128-bits, it isn't an
issue.

Note, I do feel the front ends should be modified to allow __complex __float128
directly rather than having to use an attribute to force the complex type (and
to use mode(TF) on 

Re: [PATCH] PR71275 ira.c bb_loop_depth

2016-06-15 Thread H.J. Lu
On Wed, Jun 15, 2016 at 2:49 AM, Bernd Schmidt  wrote:
> On 06/15/2016 03:30 AM, Alan Modra wrote:
>>
>> Between these two calls to _gfortran_string_verify,
>>   if (verify(c4, "A", back = .true.) .ne. 3) call abort
>>   if (verify(c4, "AB") .ne. 0) call abort
>> it seems that gfortran is assuming that parameters passed on the stack
>> are unchanged.
>
>
> How? Is this something in the Fortran frontend, or is there CSE going on for
> stores to the argument area?

Is this related to

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71532

-- 
H.J.


Re: [PATCH] Reject boolean/enum types in last arg of __builtin_*_overflow_p (take 2)

2016-06-15 Thread Joseph Myers
On Wed, 15 Jun 2016, Jakub Jelinek wrote:

> The only thing I'm unsure about is what to do with bitfield types.
> For __builtin_{add,sub,mul}_overflow it is not an issue, as one can't take
> address of a bitfield.  For __builtin_{add,sub,mul}_overflow_p right now,
> the C FE doesn't promote the last argument in any way, therefore for C
> the builtin-arith-overflow-p-19.c testcase tests the behavior of bitfield
> overflows.  The C++ FE even for type-generic builtins promotes the argument
> to the underlying type (as part of decay_conversion), therefore for C++
> overflow to bit-fields doesn't work.  Is that acceptable that because the
> bitfields in the two languages behave generally slightly differently it is
> ok that it differs even here, or should the C FE promote bitfields to the
> underlying type for the last argument of __builtin_{add,sub,mul}_overflow_p,
> or should the C++ FE special case __builtin_{add,sub,mul}_overflow_p and
> not decay_conversion on the last argument to these, something else?

If the idea of the built-ins is to tell whether the result of arithmetic 
could be stored in a particular object, then it would seem natural not to 
promote bit-fields in either language, and so enable telling whether the 
result of arithmetic could be stored in a bit-field.  Except that of 
course the last argument is an rvalue not an lvalue, and when it's a 
question of whether the result could be stored in a particular type rather 
than a particular object, it's less clear that C++ shouldn't promote.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][AArch64] Enable -frename-registers at -O2 and higher

2016-06-15 Thread Dr. Philipp Tomsich

> On 10 Jun 2016, at 01:28, Jim Wilson  wrote:
> 
> On Tue, May 31, 2016 at 2:56 AM, James Greenhalgh
>  wrote:
>> As you're proposing to have this on by default, I'd like to give a chance
>> to hear whether there is consensus as to this being the right choice for
>> the thunderx, xgene1, exynos-m1 and qdf24xx subtargets.

Testing on XGene-1 and XGene-2 shows a small improvement (0.8% overall)
on SPEC2006 and a negligible improvement for CoreMark, just in line with 
what we’d expect. 

So from our end, it’s a vote for “on by default”.

Regards,
Phil.

Re: [RFC][PATCH, vec-tails 00/10] Support vectorization of loop epilogues

2016-06-15 Thread Richard Biener
On Thu, May 19, 2016 at 9:35 PM, Ilya Enkovich  wrote:
> Hi,
>
> This series is an extension of previous work on loop epilogue combining [1].
>
> It introduces three ways to handle vectorized loop epilogues: combine it with
> vectorized loop, vectorize it with masks, vectorize it using a smaller vector
> size.
>
> Also it supports vectorization of loops with low trip count.
>
> Epilogue combining is used as a basic masking transformation.  Epilogue
> masking and low trip count loop vectorization is considered as epilogue
> combining with a zero trip count vector loop.
>
> Epilogues vectorization is controlled via new option 
> -ftree-vectorize-epilogues=
> which gets a comma separated list of enabled modes which include combine, 
> mask,
> nomask.  There is a separate option -ftree-vectorize-short-loops for low trip
> count loops.
>
> To support epilogues vectorization I use a queue of loops to be vectorized in
> vectorize_loops and change vect_transform_loop to return generated epilogue
> (in case we want to try vectorize it).  If epilogue is returned then it is
> queued for processing.  This variant of epilogues processing was chosen 
> because
> it is simple and works for all epilogue processing options.
>
> There are currently some limitations implied by this scheme:
>  - Copied loop misses some required optimization info (e.g. scev info)
> which may result in an epilogue which cannot be vectorized
>  - Loop epilogue may require if-conversion
>  - Alias/alignment checks are not inherited and therefore will be performed
> one more time for epilogue.  For now epilogue vectorization is just disabled
> in case alias versioning is required and alignment enhancement is
> disabled for epilogues.
>
> There is a set of new fields added to _loop_vec_info to support epilogues
> vectorization.
>
> LOOP_VINFO_CAN_BE_MASKED - true if vectorized loop can be masked.  It is
> computed during vectorization analysis (in various vectorizable_* functions).
>
> LOOP_VINFO_REQUIRED_MASKS - for loop which can be masked it holds all masks
> required to mask the loop.
>
> LOOP_VINFO_COMBINE_EPILOGUE - true if we decided vectorized loop should be
> masked.
>
> LOOP_VINFO_MASK_EPILOGUE - true if we decided an epilogue of this loop
> should be vectorized and masked
>
> LOOP_VINFO_NEED_MASKING - true if vectorized loop has to be masked (set for
> epilogues we want to mask and low trip count loops).
>
> LOOP_VINFO_ORIG_LOOP_INFO - for epilogues this holds loop_vec_info of the
> original vectorized loop.
>
> To make a decision whether we want to mask or combine a loop epilogue, the
> cost model is extended with masking costs.  This includes 
> vect_masking_prologue
> and vect_masking_body elements added to vect_cost_model_location enum and
> finish_cost extended with two additional returned values correspondingly.  
> Also
> in addition to add_stmt_cost I also add add_stmt_masking_cost to compute
> a cost for masking a statement.
>
> vect_estimate_min_profitable_iters checks if epilogue masking is profitable
> and also computes a number of iterations required to have profitable
> epilogue combining (this number may be used as a threshold in vectorized
> loop guard).
>
> These patches do not enable any of new features by default for all 
> optimization
> levels.  Masking features are expected to be mostly used for AVX-512 targets
> and lack of hardware suitable for wide performance testing is the reason cost
> model is not tuned and optimizations are not enabled by default.  With small
> tests using a small number of loop iterations and 'heavy' epilogues (e.g.
> number of iterations is VF*2-1) I see expected ~2x gain on existing KNL 
> hardware.
> Later this year we expect to get an access to KNL machines and have an
> opportunity to tune masking cost model.
>
> On Haswell hardware I don't see performance gains on similar loops which means
> masked code is not better than scalar code when there is heavy mask usage.
> This still might be useful in case the number of statements requiring masking is
> relatively small (I used the test a[i] += b[i] which needs masking for 3 out of 4
> vector statements).  We will continue search for cases where masking is
> profitable for Haswell to tune masking costs appropriately.

So I've gone over the patches and gave mostly high-level comments.
The vectorizer is already in a somewhat messy (aka not easy to follow)
state, and this series doesn't improve the situation (heh).  Esp. the
high-level structure for code generation and its documentation needs
work (where we do versioning / peeling and how we use the copies in
which condition and where, etc).

Now - given my question on the profitability code for vectorized body
masking, I wonder if vectorized body masking shouldn't be better done via
adding another version for low-tripcount loops (not < vf but say < vf * N
with N determined by a cost model).  Otherwise I can't see how we'd ever
mask the vectorized body for loops with a parametric number 

Re: [PATCH, vec-tails 08/10] Support loop epilogue masking and low trip count loop vectorization

2016-06-15 Thread Richard Biener
On Thu, May 19, 2016 at 9:46 PM, Ilya Enkovich  wrote:
> Hi,
>
> This patch enables vectorization of loop epilogues and low trip count
> loops using masking.

I wonder why we have the epilogue masking restriction with respect to
the original vectorization factor - shouldn't this simply be handled by
vectorizing the epilogue?  First trying the original VF (requires masking
and is equivalent to low-tripcount loop vectorization), then if that is not
profitable iterate to smaller VFs?   [yes, ideally we'd be able to compare
cost for vectorization with different VFs and choose the best VF]

Thanks,
Richard.
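
A rough sketch of that search (the names and the profitability hook are made
up for illustration, not vectorizer interfaces):

/* Try masking the epilogue at the original VF first; if that is not
   profitable, fall back to successively smaller vector sizes for an
   unmasked epilogue.  profitable_p is an illustrative cost hook.  */
static int
pick_epilogue_vf (int orig_vf, int (*profitable_p) (int vf, int masked))
{
  if (profitable_p (orig_vf, /*masked=*/1))
    return orig_vf;
  for (int vf = orig_vf / 2; vf >= 2; vf /= 2)
    if (profitable_p (vf, /*masked=*/0))
      return vf;
  return 0;  /* keep the scalar epilogue */
}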

> Thanks,
> Ilya
> --
> gcc/
>
> 2016-05-19  Ilya Enkovich  
>
> * dbgcnt.def (vect_tail_mask): New.
> * tree-vect-loop.c (vect_analyze_loop_2): Support masked loop
> epilogues and low trip count loops.
> (vect_get_known_peeling_cost): Ignore scalar epilogue cost for
> loops we are going to mask.
> (vect_estimate_min_profitable_iters): Support masked loop
> epilogues and low trip count loops.
> * tree-vectorizer.c (vectorize_loops): Add a message for a case
> when loop epilogue can't be vectorized.
>
>
> diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
> index 73c2966..5aad1d7 100644
> --- a/gcc/dbgcnt.def
> +++ b/gcc/dbgcnt.def
> @@ -193,4 +193,5 @@ DEBUG_COUNTER (tree_sra)
>  DEBUG_COUNTER (vect_loop)
>  DEBUG_COUNTER (vect_slp)
>  DEBUG_COUNTER (vect_tail_combine)
> +DEBUG_COUNTER (vect_tail_mask)
>  DEBUG_COUNTER (dom_unreachable_edges)
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 1a80c42..7075f29 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2199,7 +2199,7 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool 
> )
>int saved_vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>HOST_WIDE_INT estimated_niter;
>unsigned th;
> -  int min_scalar_loop_bound;
> +  int min_scalar_loop_bound = 0;
>
>/* Check the SLP opportunities in the loop, analyze and build SLP trees.  
> */
>ok = vect_analyze_slp (loop_vinfo, n_stmts);
> @@ -2224,6 +2224,30 @@ start_over:
>unsigned vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>gcc_assert (vectorization_factor != 0);
>
> +  /* For now we mask loop epilogue using the same VF since it was used
> + for cost estimations and it should be easier for reduction
> + optimization.  */
> +  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +  && LOOP_VINFO_ORIG_MASK_EPILOGUE (loop_vinfo)
> +  && LOOP_VINFO_ORIG_VECT_FACTOR (loop_vinfo) != 
> (int)vectorization_factor)
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"not vectorized: VF for loop epilogue doesn't "
> +"match original loop VF.\n");
> +  return false;
> +}
> +
> +  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +  && !LOOP_VINFO_ORIG_MASK_EPILOGUE (loop_vinfo)
> +  && LOOP_VINFO_ORIG_VECT_FACTOR (loop_vinfo) <= 
> (int)vectorization_factor)
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"not vectorized: VF for loop epilogue is too 
> small\n");
> +  return false;
> +}
> +
>if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && dump_enabled_p ())
>  dump_printf_loc (MSG_NOTE, vect_location,
>  "vectorization_factor = %d, niters = "
> @@ -2237,11 +2261,29 @@ start_over:
>|| (max_niter != -1
>   && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
>  {
> -  if (dump_enabled_p ())
> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"not vectorized: iteration count smaller than "
> -"vectorization factor.\n");
> -  return false;
> +  /* Allow low trip count for loop epilogue we want to mask.  */
> +  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> + && LOOP_VINFO_ORIG_MASK_EPILOGUE (loop_vinfo))
> +   ;
> +  /* Allow low trip count for non-epilogue loops if flag is enabled.  */
> +  else if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +  && flag_tree_vectorize_short_loops)
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"iteration count is small, masking is "
> +"required for chosen vectorization factor.\n");
> +
> + LOOP_VINFO_NEED_MASKING (loop_vinfo) = true;
> +   }
> +  else
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"not vectorized: iteration count smaller than "
> +"vectorization factor.\n");
> + return false;
> +   }
>  }
>
>/* Analyze the alignment 

[PATCH] Add testcase for 4.8 aarch64 ICE

2016-06-15 Thread Jakub Jelinek
Hi!

This testcase ICEs on aarch64 at -O2 on the 4.8 branch, got fixed with
PR52714 fix (r208204).

Is the testcase ok for trunk?  Tested on x86_64-linux and i686-linux.

2016-06-15  Jakub Jelinek  <ja...@redhat.com>

* gcc.c-torture/compile/20160615-1.c: New test.

--- gcc/testsuite/gcc.c-torture/compile/20160615-1.c.jj 2016-06-15 
11:17:54.690689056 +0200
+++ gcc/testsuite/gcc.c-torture/compile/20160615-1.c2016-06-15 
11:17:48.811765657 +0200
@@ -0,0 +1,10 @@
+int a;
+void bar (int, unsigned, unsigned);
+
+void
+foo (unsigned x)
+{
+  unsigned b = a ? x : 0;
+  if (x || b)
+bar (0, x, b);
+}

Jakub


Re: [PATCH] Fix code emission for FAIL_ALLOC predictor

2016-06-15 Thread Jan Hubicka
> Adding missing patch.
> 
> Martin

> >From 35ba97e0139d955c04e67ca157f8899bbb468bf1 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Thu, 9 Jun 2016 17:51:38 +0200
> Subject: [PATCH] Fix code emission for FAIL_ALLOC predictor
> 
> gcc/ChangeLog:
> 
> 2016-06-13  Martin Liska  
> 
>   * predict.def: Define a new predictor.
> 
> gcc/fortran/ChangeLog:
> 
> 2016-06-13  Martin Liska  
> 
>   * trans-array.c (gfc_array_allocate): Do not generate expect
>   stmt.
>   * trans.c (gfc_allocate_using_malloc): Properly set FAIL_ALLOC
>   predictor for malloc return value.
>   (gfc_allocate_allocatable): Use REALLOC predictor instead of
>   FAIL_ALLOC.
>   (gfc_deallocate_with_status): Likewise.

OK, thanks!

Honza


[C++ PATCH] Add testcase for 4.8 bug

2016-06-15 Thread Jakub Jelinek
Hi!

The following testcase ICEs on the 4.8 branch, starting with r198314,
but works in 4.9+.

Is the testcase ok for trunk?  Tested on x86_64-linux and i686-linux.

2016-06-15  Jakub Jelinek  

* g++.dg/cpp0x/ref-qual17.C: New test.

--- gcc/testsuite/g++.dg/cpp0x/ref-qual17.C.jj  2016-06-15 11:07:11.454070330 
+0200
+++ gcc/testsuite/g++.dg/cpp0x/ref-qual17.C 2016-06-15 11:07:02.0 
+0200
@@ -0,0 +1,12 @@
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  void foo () &;
+};
+
+void
+bar (__UINTPTR_TYPE__ a)
+{
+  reinterpret_cast(a)->foo ();
+}

Jakub


[PATCH] Reject boolean/enum types in last arg of __builtin_*_overflow_p (take 2)

2016-06-15 Thread Jakub Jelinek
On Tue, Jun 14, 2016 at 11:13:28AM -0600, Martin Sebor wrote:
> >Here is an untested patch for that.  Except that the middle-end considers
> >conversions between BOOLEAN_TYPE and single bit unsigned type as useless,
> >so in theory this can't work well, and in practice only if we are lucky
> >enough (plus it generates terrible code right now), so we'd probably need
> >to come up with a different way of expressing whether the internal fn
> >should have a bool/_Bool-ish behavior or not (optional 3rd argument or
> >something ugly like that).  Plus add lots of testcases to cover the weirdo
> >cases.  Is it really worth it, even when we don't want to support overflow
> >into enumeration type and thus will not cover all integral types anyway?
> 
> If it's cumbersome to get to work I agree that it's not worth
> the effort.  Thanks for taking the time to prototype it.

Ok, so here is an updated patch.  In addition to the diagnostic wording
changes, this (like the earlier posted patch) fixes the handling of sub-mode
precision and adds hopefully sufficient testsuite coverage for
__builtin_{add,sub,mul}_overflow_p.

The only thing I'm unsure about is what to do with bitfield types.
For __builtin_{add,sub,mul}_overflow it is not an issue, as one can't take
address of a bitfield.  For __builtin_{add,sub,mul}_overflow_p right now,
the C FE doesn't promote the last argument in any way, therefore for C
the builtin-arith-overflow-p-19.c testcase tests the behavior of bitfield
overflows.  The C++ FE even for type-generic builtins promotes the argument
to the underlying type (as part of decay_conversion), therefore for C++
overflow to bit-fields doesn't work.  Is that acceptable that because the
bitfields in the two languages behave generally slightly differently it is
ok that it differs even here, or should the C FE promote bitfields to the
underlying type for the last argument of __builtin_{add,sub,mul}_overflow_p,
or should the C++ FE special case __builtin_{add,sub,mul}_overflow_p and
not decay_conversion on the last argument to these, something else?
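
For reference, a minimal sketch of the case in question when compiled as C,
where the bit-field type of the last argument is kept (the field width and
values here are made up):

struct S { int bf : 3; } s;

int
bf_overflow_p (void)
{
  /* Asks whether 3 + 3 fits in the 3-bit signed bit-field type of s.bf;
     with the C behaviour described above this returns 1, while C++
     promotes the argument and would return 0.  */
  return __builtin_add_overflow_p (3, 3, s.bf);
}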

2016-06-15  Jakub Jelinek  

* internal-fn.c (expand_arith_set_overflow): New function.
(expand_addsub_overflow, expand_neg_overflow, expand_mul_overflow):
Use it.
(expand_arith_overflow_result_store): Likewise.  Handle precision
smaller than mode precision.
* tree-vrp.c (extract_range_basic): For imag part, handle
properly signed 1-bit precision result.
* doc/extend.texi (__builtin_add_overflow): Document that last
argument can't be pointer to enumerated or boolean type.
(__builtin_add_overflow_p): Document that last argument can't
have enumerated or boolean type.

* c-common.c (check_builtin_function_arguments): Require last
argument of BUILT_IN_*_OVERFLOW_P to have INTEGER_TYPE type.
Adjust wording of diagnostics for BUILT_IN_*_OVERFLOW
if the last argument is pointer to enumerated or boolean type.

* c-c++-common/builtin-arith-overflow-1.c (generic_wrong_type, f3,
f4): Adjust expected diagnostics.
* c-c++-common/torture/builtin-arith-overflow.h (TP): New macro.
(T): If OVFP is defined, redefine to TP.
* c-c++-common/torture/builtin-arith-overflow-12.c: Adjust comment.
* c-c++-common/torture/builtin-arith-overflow-p-1.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-2.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-3.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-4.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-5.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-6.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-7.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-8.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-9.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-10.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-11.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-12.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-13.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-14.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-15.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-16.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-17.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-18.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-19.c: New test.
* g++.dg/ext/builtin-arith-overflow-1.C: Pass 0 instead of C
as last argument to __builtin_add_overflow_p.

--- gcc/internal-fn.c.jj2016-06-14 21:38:37.759308842 +0200
+++ gcc/internal-fn.c   2016-06-15 12:38:52.708677650 +0200
@@ -405,9 +405,23 @@ get_min_precision (tree 

Re: [PATCH, vec-tails 07/10] Support loop epilogue combining

2016-06-15 Thread Richard Biener
On Thu, May 19, 2016 at 9:44 PM, Ilya Enkovich  wrote:
> Hi,
>
> This patch introduces support for loop epilogue combining.  This includes
> support in cost estimation and all required changes required to mask
> vectorized loop.

I wonder why you compute a minimum number of iterations to make masking
of the vectorized body profitable rather than a maximum number of iterations.

I'd say masking the vectorized loop is profitable if niter/vf *
masking-overhead < epilogue-cost.
Masking the epilogue is profitable if vectorizing the epilogue with
masking is profitable.

Am I missing something?

Thanks,
Richard.
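
Spelled out, the comparison suggested above would be something like the
following sketch (abstract cost units, illustrative names only):

/* Masking the vectorized body pays off when the per-iteration masking
   overhead summed over niter/vf vector iterations stays below the cost
   of keeping a separate epilogue.  */
static bool
combine_tail_profitable_p (double niter, double vf,
                           double masking_overhead, double epilogue_cost)
{
  return (niter / vf) * masking_overhead < epilogue_cost;
}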

> Thanks,
> Ilya
> --
> gcc/
>
> 2016-05-19  Ilya Enkovich  
>
> * dbgcnt.def (vect_tail_combine): New.
> * params.def (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD): New.
> * tree-vect-data-refs.c (vect_get_new_ssa_name): Support 
> vect_mask_var.
> * tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Support
> epilogue combined with loop body.
> (vect_do_peeling_for_loop_bound): Likewise.
> * tree-vect-loop.c Include alias.h and dbgcnt.h.
> (vect_estimate_min_profitable_iters): Add 
> ret_min_profitable_combine_niters
> arg, compute number of iterations for which loop epilogue combining is
> profitable.
> (vect_generate_tmps_on_preheader): Support combined epilogue.
> (vect_gen_ivs_for_masking): New.
> (vect_get_mask_index_for_elems): New.
> (vect_get_mask_index_for_type): New.
> (vect_gen_loop_masks): New.
> (vect_mask_reduction_stmt): New.
> (vect_mask_mask_load_store_stmt): New.
> (vect_mask_load_store_stmt): New.
> (vect_combine_loop_epilogue): New.
> (vect_transform_loop): Support combined epilogue.
>
>
> diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
> index 78ddcc2..73c2966 100644
> --- a/gcc/dbgcnt.def
> +++ b/gcc/dbgcnt.def
> @@ -192,4 +192,5 @@ DEBUG_COUNTER (treepre_insert)
>  DEBUG_COUNTER (tree_sra)
>  DEBUG_COUNTER (vect_loop)
>  DEBUG_COUNTER (vect_slp)
> +DEBUG_COUNTER (vect_tail_combine)
>  DEBUG_COUNTER (dom_unreachable_edges)
> diff --git a/gcc/params.def b/gcc/params.def
> index 62a1e40..98d6c5a 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -1220,6 +1220,11 @@ DEFPARAM (PARAM_MAX_SPECULATIVE_DEVIRT_MAYDEFS,
>   "Maximum number of may-defs visited when devirtualizing "
>   "speculatively", 50, 0, 0)
>
> +DEFPARAM (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD,
> + "vect-cost-increase-combine-threshold",
> + "Cost increase threshold to mask main loop for epilogue.",
> + 10, 0, 300)
> +
>  /*
>
>  Local variables:
> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index f275933..c5bdeb9 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -4000,6 +4000,9 @@ vect_get_new_ssa_name (tree type, enum vect_var_kind 
> var_kind, const char *name)
>case vect_scalar_var:
>  prefix = "stmp";
>  break;
> +  case vect_mask_var:
> +prefix = "mask";
> +break;
>case vect_pointer_var:
>  prefix = "vectp";
>  break;
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index fab5879..b3c0668 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -1195,6 +1195,7 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, 
> struct loop *scalar_loop,
>int first_guard_probability = 2 * REG_BR_PROB_BASE / 3;
>int second_guard_probability = 2 * REG_BR_PROB_BASE / 3;
>int probability_of_second_loop;
> +  bool skip_second_after_first = false;
>
>if (!slpeel_can_duplicate_loop_p (loop, e))
>  return NULL;
> @@ -1393,7 +1394,11 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, 
> struct loop *scalar_loop,
>  {
>loop_vec_info loop_vinfo = loop_vec_info_for_loop (loop);
>tree scalar_loop_iters = LOOP_VINFO_NITERSM1 (loop_vinfo);
> -  unsigned limit = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1;
> +  unsigned limit = 0;
> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
> +   skip_second_after_first = true;
> +  else
> +   limit = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1;
>if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> limit = limit + 1;
>if (check_profitability
> @@ -1464,11 +1469,20 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, 
> struct loop *scalar_loop,
>bb_between_loops = new_exit_bb;
>bb_after_second_loop = split_edge (single_exit (second_loop));
>
> -  pre_condition =
> -   fold_build2 (EQ_EXPR, boolean_type_node, *first_niters, niters);
> -  skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition, NULL,
> -  bb_after_second_loop, bb_before_first_loop,
> - inverse_probability 
> (second_guard_probability));
> +  if (skip_second_after_first)
> +/* We can 

Re: [PATCH] Fix code emission for FAIL_ALLOC predictor

2016-06-15 Thread Martin Liška
Adding missing patch.

Martin
>From 35ba97e0139d955c04e67ca157f8899bbb468bf1 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 9 Jun 2016 17:51:38 +0200
Subject: [PATCH] Fix code emission for FAIL_ALLOC predictor

gcc/ChangeLog:

2016-06-13  Martin Liska  

	* predict.def: Define a new predictor.

gcc/fortran/ChangeLog:

2016-06-13  Martin Liska  

	* trans-array.c (gfc_array_allocate): Do not generate expect
	stmt.
	* trans.c (gfc_allocate_using_malloc): Properly set FAIL_ALLOC
	predictor for malloc return value.
	(gfc_allocate_allocatable): Use REALLOC predictor instead of
	FAIL_ALLOC.
	(gfc_deallocate_with_status): Likewise.
---
 gcc/fortran/trans-array.c |  2 +-
 gcc/fortran/trans.c   | 10 --
 gcc/predict.def   | 15 +--
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 403ce3a..e95c8dd 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -5553,7 +5553,7 @@ gfc_array_allocate (gfc_se * se, gfc_expr * expr, tree status, tree errmsg,
 			  build_int_cst (TREE_TYPE (status), 0));
   gfc_add_expr_to_block (>pre,
 		 fold_build3_loc (input_location, COND_EXPR, void_type_node,
-  gfc_likely (cond, PRED_FORTRAN_FAIL_ALLOC),
+  cond,
   set_descriptor,
   build_empty_stmt (input_location)));
 }
diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index c6688d3..d6b4a56 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -672,9 +672,6 @@ gfc_allocate_using_malloc (stmtblock_t * block, tree pointer,
   gfc_start_block (_error);
   if (status != NULL_TREE)
 {
-  gfc_add_expr_to_block (_error,
-			 build_predict_expr (PRED_FORTRAN_FAIL_ALLOC,
-		 NOT_TAKEN));
   tmp = fold_build2_loc (input_location, MODIFY_EXPR, status_type, status,
 			 build_int_cst (status_type, LIBERROR_ALLOCATION));
   gfc_add_expr_to_block (_error, tmp);
@@ -693,7 +690,8 @@ gfc_allocate_using_malloc (stmtblock_t * block, tree pointer,
 boolean_type_node, pointer,
 build_int_cst (prvoid_type_node, 0));
   tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node,
-			 error_cond, gfc_finish_block (_error),
+			 gfc_unlikely (error_cond, PRED_FORTRAN_FAIL_ALLOC),
+			 gfc_finish_block (_error),
 			 build_empty_stmt (input_location));
 
   gfc_add_expr_to_block (block, tmp);
@@ -796,7 +794,7 @@ gfc_allocate_allocatable (stmtblock_t * block, tree mem, tree size, tree token,
   null_mem = gfc_unlikely (fold_build2_loc (input_location, NE_EXPR,
 	boolean_type_node, mem,
 	build_int_cst (type, 0)),
-			   PRED_FORTRAN_FAIL_ALLOC);
+			   PRED_FORTRAN_REALLOC);
 
   /* If mem is NULL, we call gfc_allocate_using_malloc or
  gfc_allocate_using_lib.  */
@@ -1385,7 +1383,7 @@ gfc_deallocate_with_status (tree pointer, tree status, tree errmsg,
 	  cond2 = fold_build2_loc (input_location, NE_EXPR, boolean_type_node,
    stat, build_zero_cst (TREE_TYPE (stat)));
 	  tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node,
- gfc_unlikely (cond2, PRED_FORTRAN_FAIL_ALLOC),
+ gfc_unlikely (cond2, PRED_FORTRAN_REALLOC),
  tmp, build_empty_stmt (input_location));
 	  gfc_add_expr_to_block (_null, tmp);
 	}
diff --git a/gcc/predict.def b/gcc/predict.def
index c0a3f36..da4f9ab 100644
--- a/gcc/predict.def
+++ b/gcc/predict.def
@@ -163,12 +163,15 @@ DEF_PREDICTOR (PRED_FORTRAN_OVERFLOW, "overflow", PROB_ALWAYS,
 	   PRED_FLAG_FIRST_MATCH)
 
 /* Branch leading to a failure status are unlikely.  This can occur for out
-   of memory or when trying to allocate an already allocated allocated or
-   deallocating an already deallocated allocatable.  This predictor only
-   occurs when the user explicitly asked for a return status.  By default,
-   the code aborts, which is handled via PRED_NORETURN.
-   FIXME: the hitrate really ought to be close to 100%.  */
-DEF_PREDICTOR (PRED_FORTRAN_FAIL_ALLOC, "fail alloc", HITRATE (62), 0)
+   of memory.  This predictor only occurs when the user explicitly asked
+   for a return status.  By default, the code aborts,
+   which is handled via PRED_NORETURN.  */
+DEF_PREDICTOR (PRED_FORTRAN_FAIL_ALLOC, "fail alloc", PROB_VERY_LIKELY, 0)
+
+/* Predictor is used for an allocation of an already allocated memory or
+   deallocating an already deallocated allocatable.  */
+DEF_PREDICTOR (PRED_FORTRAN_REALLOC, "repeated allocation/deallocation", \
+	   PROB_LIKELY, 0)
 
 /* Branch leading to an I/O failure status are unlikely.  This predictor is
used for I/O failures such as for invalid unit numbers.  This predictor
-- 
2.8.3



Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-15 Thread Richard Biener
On Thu, May 19, 2016 at 9:42 PM, Ilya Enkovich  wrote:
> Hi,
>
> This patch introduces analysis to determine if loop can be masked
> (compute LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_REQUIRED_MASKS)
> and compute how much masking costs.

Maybe in a different patch, but it looks like you assume that, say, a
division does not need masking.

Code-generation-wise we'd add a new iv starting with

 iv = { 0, 1, 2, 3 };

and the mask is computed by comparing that against {niter, niter, niter, niter}?

So if we need masks for different vector element counts we could also add
additional IVs rather than "widening"/"shortening" the comparison result.
cond-expr reduction does this kind of IV as well which is a chance to share
some code (eventually).
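
In scalar form, the scheme described above (for VF = 4) amounts to something
like the following sketch, which is illustrative only and not generated code:

/* Each vector iteration carries an IV vector {i, i+1, i+2, i+3}; the
   mask is the lane-wise comparison of that IV against niter, and only
   the active lanes do their load/op/store.  */
void
masked_body_sketch (int *a, int *b, int niter)
{
  for (int i = 0; i < niter; i += 4)
    for (int lane = 0; lane < 4; lane++)
      {
        int active = (i + lane) < niter;   /* iv lane compared to niter */
        if (active)
          a[i + lane] += b[i + lane];
      }
}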

You look at TREE_TYPE of LOOP_VINFO_NITERS (loop_vinfo) - I don't think
this is meaningful (if then only by accident).  I think you should look at the
control IV itself, possibly it's value-range, to determine the smallest possible
type to use.

Finally we have a related missed optimization opportunity, namely avoiding
peeling for gaps if we mask the last load of the group (profitability depends
on the overhead of such masking of course as it would be done in the main
vectorized loop).

Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2016-05-19  Ilya Enkovich  
>
> * tree-vect-loop.c: Include insn-config.h and recog.h.
> (vect_check_required_masks_widening): New.
> (vect_check_required_masks_narrowing): New.
> (vect_get_masking_iv_elems): New.
> (vect_get_masking_iv_type): New.
> (vect_get_extreme_masks): New.
> (vect_check_required_masks): New.
> (vect_analyze_loop_operations): Add vect_check_required_masks
> call to compute LOOP_VINFO_CAN_BE_MASKED.
> (vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
> LOOP_VINFO_NEED_MASKING before starting over.
> (vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
> masking cost.
> * tree-vect-stmts.c (can_mask_load_store): New.
> (vect_model_load_masking_cost): New.
> (vect_model_store_masking_cost): New.
> (vect_model_simple_masking_cost): New.
> (vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
> and masking cost.
> (vectorizable_simd_clone_call): Likewise.
> (vectorizable_store): Likewise.
> (vectorizable_load): Likewise.
> (vect_stmt_should_be_masked_for_epilogue): New.
> (vect_add_required_mask_for_stmt): New.
> (vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
> * tree-vectorizer.h (vect_model_load_masking_cost): New.
> (vect_model_store_masking_cost): New.
> (vect_model_simple_masking_cost): New.
>
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index e25a0ce..31360d3 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-pass.h"
>  #include "ssa.h"
>  #include "optabs-tree.h"
> +#include "insn-config.h"
> +#include "recog.h" /* FIXME: for insn_data */
>  #include "diagnostic-core.h"
>  #include "fold-const.h"
>  #include "stor-layout.h"
> @@ -1601,6 +1603,266 @@ vect_update_vf_for_slp (loop_vec_info loop_vinfo)
>  vectorization_factor);
>  }
>
> +/* Function vect_check_required_masks_widening.
> +
> +   Return 1 if vector mask of type MASK_TYPE can be widened
> +   to a type having REQ_ELEMS elements in a single vector.  */
> +
> +static bool
> +vect_check_required_masks_widening (loop_vec_info loop_vinfo,
> +   tree mask_type, unsigned req_elems)
> +{
> +  unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type);
> +
> +  gcc_assert (mask_elems > req_elems);
> +
> +  /* Don't convert if it requires too many intermediate steps.  */
> +  int steps = exact_log2 (mask_elems / req_elems);
> +  if (steps > MAX_INTERM_CVT_STEPS + 1)
> +return false;
> +
> +  /* Check we have conversion support for given mask mode.  */
> +  machine_mode mode = TYPE_MODE (mask_type);
> +  insn_code icode = optab_handler (vec_unpacks_lo_optab, mode);
> +  if (icode == CODE_FOR_nothing
> +  || optab_handler (vec_unpacks_hi_optab, mode) == CODE_FOR_nothing)
> +return false;
> +
> +  /* Make recursive call for multi-step conversion.  */
> +  if (steps > 1)
> +{
> +  mask_elems = mask_elems >> 1;
> +  mask_type = build_truth_vector_type (mask_elems, current_vector_size);
> +  if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
> +   return false;
> +
> +  if (!vect_check_required_masks_widening (loop_vinfo, mask_type,
> +  req_elems))
> +   return false;
> +}
> +  else
> +{
> +  mask_type = build_truth_vector_type (req_elems, current_vector_size);
> +  if (TYPE_MODE 

Re: [PATCH, vec-tails 03/10] Support epilogues vectorization with no masking

2016-06-15 Thread Richard Biener
On Thu, May 19, 2016 at 9:39 PM, Ilya Enkovich  wrote:
> Hi,
>
> This patch introduces changes required to run vectorizer on loop epilogue.
> This also enables epilogue vectorization using a vector of smaller size.

While the idea of epilogue vectorization sounds straight-forward the
implementation
is somewhat icky with all the ->aux stuff, "redundant" if-conversion
and loop iteration stuff.

So I was thinking of when epilogue vectorization is beneficial which
is obviously when
the overall loop trip count is low.  We are not good in optimizing for
that case generally
(too much peeling for alignment, using expensive avx256 vectorization,
etc.), so I wonder
if versioning for that case would be a better idea (performance-wise).

Thus - what cases were you looking at when deciding that vectorizing
the epilogue
(with a smaller vector size) is profitable?  Do other compilers
generally do this?

Thanks,
Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2016-05-19  Ilya Enkovich  
>
> * tree-if-conv.c (tree_if_conversion): Make public.
> * tree-if-conv.h: New file.
> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Don't
> try to enhance alignment for epilogues.
> * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return
> created loop.
> * tree-vect-loop.c: include tree-if-conv.h.
> (destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in
> loop->aux.
> (vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset
> loop->aux.
> (vect_analyze_loop): Reset loop->aux.
> (vect_transform_loop): Check if created epilogue should be returned
> for further vectorization.  If-convert epilogue if required.
> * tree-vectorizer.c (vectorize_loops): Add a queue of loops to
> process and insert vectorized loop epilogues into this queue.
> * tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return created
> loop.
> (vect_transform_loop): Return created loop.
>
>
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index c38e21b..41b6c99 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -2801,7 +2801,7 @@ ifcvt_local_dce (basic_block bb)
> profitability analysis.  Returns non-zero todo flags when something
> changed.  */
>
> -static unsigned int
> +unsigned int
>  tree_if_conversion (struct loop *loop)
>  {
>unsigned int todo = 0;
> diff --git a/gcc/tree-if-conv.h b/gcc/tree-if-conv.h
> new file mode 100644
> index 000..3a732c2
> --- /dev/null
> +++ b/gcc/tree-if-conv.h
> @@ -0,0 +1,24 @@
> +/* Copyright (C) 2016 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +#ifndef GCC_TREE_IF_CONV_H
> +#define GCC_TREE_IF_CONV_H
> +
> +unsigned int tree_if_conversion (struct loop *);
> +
> +#endif  /* GCC_TREE_IF_CONV_H  */
> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index 7652e21..f275933 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -1595,7 +1595,10 @@ vect_enhance_data_refs_alignment (loop_vec_info 
> loop_vinfo)
>/* Check if we can possibly peel the loop.  */
>if (!vect_can_advance_ivs_p (loop_vinfo)
>|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> -  || loop->inner)
> +  || loop->inner
> +  /* Required peeling was performed in prologue and
> +is not required for epilogue.  */
> +  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>  do_peeling = false;
>
>if (do_peeling
> @@ -1875,7 +1878,10 @@ vect_enhance_data_refs_alignment (loop_vec_info 
> loop_vinfo)
>
>do_versioning =
> optimize_loop_nest_for_speed_p (loop)
> -   && (!loop->inner); /* FORNOW */
> +   && (!loop->inner) /* FORNOW */
> +/* Required versioning was performed for the
> +  original loop and is not required for epilogue.  */
> +   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>
>if (do_versioning)
>  {
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 7ec6dae..fab5879 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -1742,9 +1742,11 @@ vect_update_ivs_after_vectorizer (loop_vec_info 
> loop_vinfo, tree niters,
> NITERS / VECTORIZATION_FACTOR times 

Re: [PATCH] Optimize inserting value_type into std::vector

2016-06-15 Thread Jonathan Wakely

On 15/06/16 11:15 +0100, Jonathan Wakely wrote:

* include/bits/stl_vector.h (vector::_S_insert_aux_assign): Define
new overloaded functions.
* include/bits/vector.tcc (vector::_M_insert_aux): Use new functions
to avoid creating a redundant temporary.
* testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc: New
test.

Tested x86_64-linux.

This improves our performance on Howard Hinnant's "insert vs emplace"
experiment at
http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/insert_vs_emplace.html

With this small change there is no difference between emplacing or
using the relevant insert / push_back function. That also means we
beat libc++ in some cases, making us the bestest, whoo!


We still lose to libc++ in one case. For the "lvalue no reallocation"
test our insert and emplace are equal to libc++'s emplace, but
libc++'s insert is even better.

For "xvalue no reallocation" and "rvalue no reallocation" our insert
and emplace are equal to libc++'s insert, but libc++'s emplace is
worse. So either my patch is wrong or we've got a new minimum for the
emplace operations.

In all other cases our results are unchanged by this patch.

So if this patch goes in we fix the case that made it difficult for
Howard to give consistent advice, where emplace was better than insert
for libstdc++ but not the other implementations. With this patch
insert is always at least as good as emplace, for all implementations.
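
For reference, the kinds of call being compared are simply these (a sketch,
not taken from the testsuite or the experiment):

#include <string>
#include <vector>

int main()
{
  std::vector<std::string> v;
  std::string s("x");
  v.push_back(s);                          // "insert" flavour, lvalue
  v.emplace_back(s);                       // "emplace" flavour, lvalue
  v.insert(v.begin(), std::string("y"));   // rvalue insert
  v.emplace(v.begin(), std::string("y"));  // rvalue emplace
}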




Re: RFA (gimplify): PATCH to implement C++ order of evaluation paper

2016-06-15 Thread Richard Biener
On Tue, Jun 14, 2016 at 10:15 PM, Jason Merrill  wrote:
> As discussed in bug 71104, the C++ P0145 proposal specifies the evaluation
> order of certain operations:
>
> 1. a.b
> 2. a->b
> 3. a->*b
> 4. a(b1, b2, b3)
> 5. b @= a
> 6. a[b]
> 7. a << b
> 8. a >> b
>
> The second patch introduces a flag -fargs-in-order to control whether these
> orders are enforced on calls.  -fargs-in-order=1 enforces all but the
> ordering between function arguments in #4.
>
> The first patch implements #5 for the built-in assignment operator by
> changing the order of gimplification of MODIFY_EXPR in the back end, as
> richi was also thinking about doing to fix 71104.  This runs into problems
> with DECL_VALUE_EXPR variables, where is_gimple_reg can be true before
> gimplification and false afterward, so he checks for this situation in
> rhs_predicate_for.  richi, you said you were still working on 71104; is this
> patch OK to put in for now, or should I wait for something better?

I wasn't too happy about the rhs_predicate_for change and I was also worried
about generating a lot less optimal GIMPLE due to evaluating the predicate
on un-gimplified *to_p.  I wondered if we should simply gimplify *from_p
with is_gimple_mem_rhs_or_call unconditionally, then gimplify *to_p
and after that if (unmodified) rhs_predicate_for (*to_p) is !=
is_gimple_mem_rhs_or_call
re-gimplify *from_p to avoid this.  That should also avoid changing
rhs_predicate_for.

Not sure if that solves whatever you were running into with OpenMP.

I simply didn't have found the time to experiment with the above or even
validate my fear by say comparing .gimple dumps of cc1 files with/without
the gimplification order change.

Richard.

> For the moment the patch turns on full ordering by default so we can see
> what effect it has in routine benchmarking.  This will probably back off by
> the time of the GCC 7 release.
>
> In my own SPEC runs with an earlier version of the patch, most of the C++
> tests did not change significantly, but xalancbmk slowed down about 1%.  I'm
> running it again now with the current patch.
>
> Tested x86_64-pc-linux-gnu, applying second patch to trunk, is the first
> patch OK as well?
>
> Jason
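
To make rules 5/6 from the quoted list concrete, a small illustrative
program (not part of the patch):

#include <iostream>

int a[2];
int idx () { std::cout << "idx "; return 0; }
int val () { std::cout << "val "; return 1; }

int main ()
{
  // Under the proposed ordering the right-hand side is evaluated before
  // the left-hand side, so this should print "val idx".
  a[idx ()] = val ();
  std::cout << '\n';
}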


Re: [patch, avr] Fix PR67353

2016-06-15 Thread Pitchumani Sivanupandi
On Mon, 2016-06-13 at 17:48 +0200, Georg-Johann Lay wrote:
> Pitchumani Sivanupandi schrieb:
> > 
> > $ avr-gcc test.c -Wno-misspelled-isr
> > $
> What about -Werror=misspelled-isr?

Updated patch.

> > 
> > diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
> > index ba5cd91..587bdbc 100644
> > --- a/gcc/config/avr/avr.c
> > +++ b/gcc/config/avr/avr.c
> > @@ -753,7 +753,7 @@ avr_set_current_function (tree decl)
> >   that the name of the function is "__vector_NN" so as to
> > catch
> >   when the user misspells the vector name.  */
> >  
> > -  if (!STR_PREFIX_P (name, "__vector"))
> > +  if ((!STR_PREFIX_P (name, "__vector")) &&
> > (avr_warn_misspelled_isr))
> >  warning_at (loc, 0, "%qs appears to be a misspelled %s
> > handler",
> If, instead of the "0" the respective OPT_... enum is used in the
> call 
> to warning_at, the -Werror= should work as expected (and explicit
> "&& 
> avr_warn_misspelled_isr" no more needed).

Ok. Updated patch as per the comments.

If OK, could someone commit please?

Regards,
Pitchumani

gcc/ChangeLog

2016-06-15  Pitchumani Sivanupandi  

PR target/67353
* config/avr/avr.c (avr_set_current_function): Warn misspelled
interrupt/ signal handler if -Wmisspelled-isr flag is enabled.
* config/avr/avr.opt (Wmisspelled-isr): New warning flag. Enabled
by default to warn misspelled interrupt/ signal handler.
* doc/invoke.texi (AVR Options): Document it. Update description
for -nodevicelib option.
diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
index ba5cd91..b327624 100644
--- a/gcc/config/avr/avr.c
+++ b/gcc/config/avr/avr.c
@@ -754,8 +754,8 @@ avr_set_current_function (tree decl)
  when the user misspells the vector name.  */
 
   if (!STR_PREFIX_P (name, "__vector"))
-warning_at (loc, 0, "%qs appears to be a misspelled %s handler",
-name, isr);
+warning_at (loc, OPT_Wmisspelled_isr, "%qs appears to be a misspelled "
+   "%s handler, missing __vector prefix", name, isr);
 }
 
   /* Don't print the above diagnostics more than once.  */
diff --git a/gcc/config/avr/avr.opt b/gcc/config/avr/avr.opt
index 8809b9b..05aa4b6 100644
--- a/gcc/config/avr/avr.opt
+++ b/gcc/config/avr/avr.opt
@@ -91,6 +91,10 @@ Waddr-space-convert
 Warning C Report Var(avr_warn_addr_space_convert) Init(0)
 Warn if the address space of an address is changed.
 
+Wmisspelled-isr
+Warning C C++ Report Var(avr_warn_misspelled_isr) Init(1)
+Warn if the ISR is misspelled, i.e. without __vector prefix. Enabled by default.
+
 mfract-convert-truncate
 Target Report Mask(FRACT_CONV_TRUNC)
 Allow to use truncation instead of rounding towards 0 for fractional int types.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index aa11209..0bf39c5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -640,7 +640,8 @@ Objective-C and Objective-C++ Dialects}.
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu} -maccumulate-args -mbranch-cost=@var{cost} @gol
 -mcall-prologues -mint8 -mn_flash=@var{size} -mno-interrupts @gol
--mrelax -mrmw -mstrict-X -mtiny-stack -nodevicelib -Waddr-space-convert}
+-mrelax -mrmw -mstrict-X -mtiny-stack -nodevicelib -Waddr-space-convert @gol
+-Wmisspelled-isr}
 
 @emph{Blackfin Options}
 @gccoptlist{-mcpu=@var{cpu}@r{[}-@var{sirevision}@r{]} @gol
@@ -14554,12 +14555,17 @@ Only change the lower 8@tie{}bits of the stack pointer.
 
 @item -nodevicelib
 @opindex nodevicelib
-Don't link against AVR-LibC's device specific library @code{libdev.a}.
+Don't link against AVR-LibC's device specific library @code{lib.a}.
 
 @item -Waddr-space-convert
 @opindex Waddr-space-convert
 Warn about conversions between address spaces in the case where the
 resulting address space is not contained in the incoming address space.
+
+@item -Wmisspelled-isr
+@opindex Wmisspelled-isr
+Warn if the ISR is misspelled, i.e. without __vector prefix.
+Enabled by default.
 @end table
 
 @subsubsection @code{EIND} and Devices with More Than 128 Ki Bytes of Flash


[PATCH] Optimize inserting value_type into std::vector

2016-06-15 Thread Jonathan Wakely

* include/bits/stl_vector.h (vector::_S_insert_aux_assign): Define
new overloaded functions.
* include/bits/vector.tcc (vector::_M_insert_aux): Use new functions
to avoid creating a redundant temporary.
* testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc: New
test.

Tested x86_64-linux.

This improves our performance on Howard Hinnant's "insert vs emplace"
experiment at
http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/insert_vs_emplace.html

With this small change there is no difference between emplacing and
using the relevant insert / push_back function. That also means we
beat libc++ in some cases, making us the bestest, whoo!

I originally wrote _S_insert_aux_arg functions which returned their
argument (either _Tp or _Tp&& as appropriate), relying on RVO to elide
the extra constructions, but that caused 23_containers/vector/40192.cc
to FAIL, so this patch passes the iterator into the new functions and
the assignment is done there.

Does anyone see any problem with this optimisation? I'm pretty sure
there are no cases where we actually need to create a temporary from
an expression that is already an rvalue of the correct type.

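For illustration (not part of the patch; the helper functions below are made
up), the effect of the new overload can be seen with a small instrumented
type: when the argument is already an rvalue of the element type, assigning
it directly skips the temporary that
*__pos = _Tp(std::forward<_Args>(__args)...) would otherwise create:

  // Minimal sketch of the idea behind _S_insert_aux_assign (hypothetical
  // free functions, not the libstdc++ internals themselves).
  #include <iostream>
  #include <utility>

  struct Counted {
    static int constructions, move_assignments;
    Counted() { ++constructions; }
    Counted(Counted&&) noexcept { ++constructions; }
    Counted& operator=(Counted&&) noexcept { ++move_assignments; return *this; }
  };
  int Counted::constructions = 0;
  int Counted::move_assignments = 0;

  // What the old code effectively did for every argument pack:
  template <typename... Args>
  void assign_via_temporary(Counted* pos, Args&&... args)
  { *pos = Counted(std::forward<Args>(args)...); }  // builds a temporary first

  // What the new _Tp&& overload does instead:
  void assign_direct(Counted* pos, Counted&& arg)
  { *pos = std::move(arg); }                         // plain move assignment

  int main()
  {
    Counted slot, value;                            // 2 constructions
    assign_via_temporary(&slot, std::move(value));  // +1 construction, +1 assignment
    assign_direct(&slot, std::move(value));         // +0 constructions, +1 assignment
    std::cout << Counted::constructions << ' '
              << Counted::move_assignments << '\n'; // prints "3 2"
  }

The saving is exactly one extra move (or copy) construction per insertion
whose argument already has the value type.
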
Howard's test code is CC BY 4.0, so I didn't add our usual GPL header
to the test file. I think the comments I added to the file meet the
requirements for attribution and indicating changes.

commit 22b2f2460e5508f2c993b41e31add9b10fb2794c
Author: Jonathan Wakely 
Date:   Tue Jul 29 10:10:14 2014 +0100

Optimize inserting value_type into std::vector

* include/bits/stl_vector.h (vector::_S_insert_aux_assign): Define
new overloaded functions.
* include/bits/vector.tcc (vector::_M_insert_aux): Use new functions
to avoid creating a redundant temporary.
* testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc: New
test.

diff --git a/libstdc++-v3/include/bits/stl_vector.h 
b/libstdc++-v3/include/bits/stl_vector.h
index 9b6d258..ec1e884 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1407,6 +1407,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   _M_insert_aux(iterator __position, const value_type& __x);
 #else
   template
+   static void
+   _S_insert_aux_assign(iterator __pos, _Args&&... __args)
+   { *__pos =  _Tp(std::forward<_Args>(__args)...); }
+
+  static void
+  _S_insert_aux_assign(iterator __pos, _Tp&& __arg)
+  { *__pos = std::move(__arg); }
+
+  template
 void
 _M_insert_aux(iterator __position, _Args&&... __args);
 
diff --git a/libstdc++-v3/include/bits/vector.tcc 
b/libstdc++-v3/include/bits/vector.tcc
index 715b83e..d621804 100644
--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -342,7 +342,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 #if __cplusplus < 201103L
  *__position = __x_copy;
 #else
- *__position = _Tp(std::forward<_Args>(__args)...);
+ _S_insert_aux_assign(__position, std::forward<_Args>(__args)...);
 #endif
}
   else
diff --git 
a/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc 
b/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc
new file mode 100644
index 000..39a3f03
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc
@@ -0,0 +1,573 @@
+// { dg-options "-std=gnu++11" }
+
+// The class X and test code is by Howard Hinnant and used under a
+// Creative Commons Attribution 4.0 International License.
+// http://creativecommons.org/licenses/by/4.0/
+// https://github.com/HowardHinnant/papers/blob/master/insert_vs_emplace.html
+//
+// The original code was reformatted and modified to use the VERIFY macro
+// instead of writing to standard output.
+
+#include 
+#include 
+#include 
+
+class X
+{
+  int i_;
+  int* p_;
+
+public:
+  struct special
+  {
+unsigned c;
+unsigned dt;
+unsigned cc;
+unsigned ca;
+unsigned mc;
+unsigned ma;
+  };
+  static special sp;
+
+  X(int i, int* p)
+: i_(i)
+  , p_(p)
+  {
+// std::cout << "X(int i, int* p)\n";
+sp.c++;
+  }
+
+  ~X()
+  {
+// std::cout << "~X()\n";
+sp.dt++;
+  }
+
+  X(const X& x)
+: i_(x.i_)
+  , p_(x.p_)
+  {
+// std::cout << "X(const X& x)\n";
+sp.cc++;
+  }
+
+  X& operator=(const X& x)
+  {
+
+i_ = x.i_;
+p_ = x.p_;
+// std::cout << "X& operator=(const X& x)\n";
+sp.ca++;
+return *this;
+  }
+
+  X(X&& x) noexcept
+: i_(x.i_)
+, p_(x.p_)
+{
+  // std::cout << "X(X&& x)\n";
+  sp.mc++;
+}
+
+  X& operator=(X&& x) noexcept
+  {
+
+i_ = x.i_;
+p_ = x.p_;
+// std::cout << "X& operator=(X&& x)\n";
+sp.ma++;
+return *this;
+  }
+
+};
+
+std::ostream&
+operator<<(std::ostream& os, X::special const& 

Re: [PATCH] PR71275 ira.c bb_loop_depth

2016-06-15 Thread Bernd Schmidt

On 06/15/2016 03:30 AM, Alan Modra wrote:

Between these two calls to _gfortran_string_verify,
  if (verify(c4, "A", back = .true.) .ne. 3) call abort
  if (verify(c4, "AB") .ne. 0) call abort
it seems that gfortran is assuming that parameters passed on the stack
are unchanged.


How? Is this something in the Fortran frontend, or is there CSE going on 
for stores to the argument area?



Bernd


Re: [PATCH] PR 71439 - Only vectorize live PHIs that are inductions

2016-06-15 Thread Richard Biener
On Wed, Jun 15, 2016 at 10:49 AM, Alan Hayward  wrote:
> For a PHI to be used outside the loop it needs to be vectorized. However
> the
> vectorizer currently will only vectorize PHIs that are an induction.
>
> This patch fixes PR 71439 by only allowing a live PHI to be vectorized if
> it
> is an induction. In addition, live PHIs need to pass a
> vectorizable_live_operation check.
>
> Tested on x86 and aarch64.
> Ok to commit?

Ok.

Thanks,
Richard.

> Alan.
>
> gcc/
> PR tree-optimization/71439
> * tree-vect-loop.c (vect_analyze_loop_operations): Additional check 
> for
> live PHIs.
>
> testsuite/
> PR tree-optimization/71439
> * gcc.dg/vect/pr71439.c: New
>
>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr71439.c
> b/gcc/testsuite/gcc.dg/vect/pr71439.c
> new file mode 100644
> index
> ..95e4763bad6e9f301d53c20ffa160b96b
> dad9a53
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr71439.c
> @@ -0,0 +1,17 @@
> +#include "tree-vect.h"
> +
> +int a, b, c;
> +short fn1(int p1, int p2) { return p1 + p2; }
> +
> +int main() {
> +  a = 0;
> +  for (; a < 30; a = fn1(a, 4)) {
> +c = b;
> +b = 6;
> +  }
> +
> +  if (c != 6)
> +abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index
> 1231b95f6a71337833e8c4b24884da9f96a7b5bf..90ade75bcd212b542ad680877a79df717
> 751ff4b 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -1669,7 +1669,8 @@ vect_analyze_loop_operations (loop_vec_info
> loop_vinfo)
>
>gcc_assert (stmt_info);
>
> -  if (STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_scope
> +  if ((STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_scope
> +   || STMT_VINFO_LIVE_P (stmt_info))
>&& STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def)
>  {
>/* A scalar-dependence cycle that we don't support.  */
> @@ -1686,6 +1687,9 @@ vect_analyze_loop_operations (loop_vec_info
> loop_vinfo)
>  ok = vectorizable_induction (phi, NULL, NULL);
>  }
>
> + if (ok && STMT_VINFO_LIVE_P (stmt_info))
> +   ok = vectorizable_live_operation (phi, NULL, NULL, -1, NULL);
> +
>if (!ok)
>  {
>if (dump_enabled_p ())
>
>
>
>


[Patch AArch64 2/2]Add missing vcond by rewriting it with vcond_mask/vec_cmp patterns.

2016-06-15 Thread Bin Cheng
Hi,
This is the second patch.  It rewrites vcond patterns using vcond_mask/vec_cmp 
patterns introduced in the first one.  It also implements vcond patterns which 
were missing in the current AArch64 backend.  After this patch, I have a simple 
follow up change enabling testing requirement "vect_cond_mixed" on AArch64, 
which will enable various tests.

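For context, here is an illustration of the kind of source loops these
patterns ultimately serve (my example, not part of the patch): the
comparison becomes a vec_cmp producing a mask and the selection becomes a
vcond_mask, with the "mixed" forms covering a comparison type that differs
from the data type:

  // Plain conditional select: mask = (a > b); r = mask ? c : d.
  void cond_select (int *__restrict r, const int *__restrict a,
                    const int *__restrict b, const int *__restrict c,
                    const int *__restrict d, int n)
  {
    for (int i = 0; i < n; ++i)
      r[i] = a[i] > b[i] ? c[i] : d[i];
  }

  // A vect_cond_mixed-style case: float comparison selecting between ints.
  void cond_select_mixed (int *__restrict r, const float *__restrict a,
                          const float *__restrict b, const int *__restrict c,
                          const int *__restrict d, int n)
  {
    for (int i = 0; i < n; ++i)
      r[i] = a[i] < b[i] ? c[i] : d[i];
  }
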
Bootstrap & test along with the first patch on AArch64, is it OK?

Thanks,
bin

2016-06-07  Alan Lawrence  
Renlin Li  
Bin Cheng  

* config/aarch64/iterators.md (V_cmp_mixed, v_cmp_mixed): New.
* config/aarch64/aarch64-simd.md (v2di3): Call
gen_vcondv2div2di instead of gen_aarch64_vcond_internalv2div2di.
(aarch64_vcond_internal): Delete pattern.
(aarch64_vcond_internal): Ditto.
(vcond): Re-implement using vec_cmp and vcond_mask.
(vcondu): Ditto.
(vcond): Delete.
(vcond): New pattern.
(vcondu): New pattern.
(aarch64_cmtst): Revise comment using aarch64_vcond instead
of aarch64_vcond_internal.diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 6ea35bf..e080b71 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1053,7 +1053,7 @@
 }
 
   cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1], operands[2]);
-  emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1],
+  emit_insn (gen_vcondv2div2di (operands[0], operands[1],
   operands[2], cmp_fmt, operands[1], operands[2]));
   DONE;
 })
@@ -2202,314 +2202,6 @@
   DONE;
 })
 
-(define_expand "aarch64_vcond_internal"
-  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
-   (if_then_else:VSDQ_I_DI
- (match_operator 3 "comparison_operator"
-   [(match_operand:VSDQ_I_DI 4 "register_operand")
-(match_operand:VSDQ_I_DI 5 "nonmemory_operand")])
- (match_operand:VSDQ_I_DI 1 "nonmemory_operand")
- (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
-  "TARGET_SIMD"
-{
-  rtx op1 = operands[1];
-  rtx op2 = operands[2];
-  rtx mask = gen_reg_rtx (mode);
-  enum rtx_code code = GET_CODE (operands[3]);
-
-  /* Switching OP1 and OP2 is necessary for NE (to output a cmeq insn),
- and desirable for other comparisons if it results in FOO ? -1 : 0
- (this allows direct use of the comparison result without a bsl).  */
-  if (code == NE
-  || (code != EQ
- && op1 == CONST0_RTX (mode)
- && op2 == CONSTM1_RTX (mode)))
-{
-  op1 = operands[2];
-  op2 = operands[1];
-  switch (code)
-{
-case LE: code = GT; break;
-case LT: code = GE; break;
-case GE: code = LT; break;
-case GT: code = LE; break;
-/* No case EQ.  */
-case NE: code = EQ; break;
-case LTU: code = GEU; break;
-case LEU: code = GTU; break;
-case GTU: code = LEU; break;
-case GEU: code = LTU; break;
-default: gcc_unreachable ();
-}
-}
-
-  /* Make sure we can handle the last operand.  */
-  switch (code)
-{
-case NE:
-  /* Normalized to EQ above.  */
-  gcc_unreachable ();
-
-case LE:
-case LT:
-case GE:
-case GT:
-case EQ:
-  /* These instructions have a form taking an immediate zero.  */
-  if (operands[5] == CONST0_RTX (mode))
-break;
-  /* Fall through, as may need to load into register.  */
-default:
-  if (!REG_P (operands[5]))
-operands[5] = force_reg (mode, operands[5]);
-  break;
-}
-
-  switch (code)
-{
-case LT:
-  emit_insn (gen_aarch64_cmlt (mask, operands[4], operands[5]));
-  break;
-
-case GE:
-  emit_insn (gen_aarch64_cmge (mask, operands[4], operands[5]));
-  break;
-
-case LE:
-  emit_insn (gen_aarch64_cmle (mask, operands[4], operands[5]));
-  break;
-
-case GT:
-  emit_insn (gen_aarch64_cmgt (mask, operands[4], operands[5]));
-  break;
-
-case LTU:
-  emit_insn (gen_aarch64_cmgtu (mask, operands[5], operands[4]));
-  break;
-
-case GEU:
-  emit_insn (gen_aarch64_cmgeu (mask, operands[4], operands[5]));
-  break;
-
-case LEU:
-  emit_insn (gen_aarch64_cmgeu (mask, operands[5], operands[4]));
-  break;
-
-case GTU:
-  emit_insn (gen_aarch64_cmgtu (mask, operands[4], operands[5]));
-  break;
-
-/* NE has been normalized to EQ above.  */
-case EQ:
-  emit_insn (gen_aarch64_cmeq (mask, operands[4], operands[5]));
-  break;
-
-default:
-  gcc_unreachable ();
-}
-
-/* If we have (a = (b CMP c) ? -1 : 0);
-   Then we can simply move the generated mask.  */
-
-if (op1 == CONSTM1_RTX (mode)
-   && op2 == CONST0_RTX (mode))
-  emit_move_insn (operands[0], mask);
-else
-  {
-   if (!REG_P (op1))
- op1 = force_reg (mode, 

[Patch AArch64 1/2]Implement vcond_mask/vec_cmp patterns.

2016-06-15 Thread Bin Cheng
Hi,
According to review comments, I split the original patch @ 
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01182.html into two, as well as 
refined the comments.  Here is the first one implementing vcond_mask/vec_cmp 
patterns on AArch64.  These new patterns will be used in the second patch for 
vcond.

Bootstrap & test on AArch64.  Is it OK?

2016-06-07  Alan Lawrence  
Renlin Li  
Bin Cheng  

* config/aarch64/aarch64-simd.md (vec_cmp): New pattern.
(vec_cmp_internal): New pattern.
(vec_cmp): New pattern.
(vec_cmp_internal): New pattern.
(vec_cmpu): New pattern.
(vcond_mask_): New pattern.
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 6ea35bf..9437d02 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2202,6 +2202,325 @@
   DONE;
 })
 
+(define_expand "vcond_mask_"
+  [(match_operand:VALLDI 0 "register_operand")
+   (match_operand:VALLDI 1 "nonmemory_operand")
+   (match_operand:VALLDI 2 "nonmemory_operand")
+   (match_operand: 3 "register_operand")]
+  "TARGET_SIMD"
+{
+  /* If we have (a = (P) ? -1 : 0);
+ Then we can simply move the generated mask (result must be int).  */
+  if (operands[1] == CONSTM1_RTX (mode)
+  && operands[2] == CONST0_RTX (mode))
+emit_move_insn (operands[0], operands[3]);
+  /* Similarly, (a = (P) ? 0 : -1) is just inverting the generated mask.  */
+  else if (operands[1] == CONST0_RTX (mode)
+  && operands[2] == CONSTM1_RTX (mode))
+emit_insn (gen_one_cmpl2 (operands[0], operands[3]));
+  else
+{
+  if (!REG_P (operands[1]))
+   operands[1] = force_reg (mode, operands[1]);
+  if (!REG_P (operands[2]))
+   operands[2] = force_reg (mode, operands[2]);
+  emit_insn (gen_aarch64_simd_bsl (operands[0], operands[3],
+operands[1], operands[2]));
+}
+
+  DONE;
+})
+
+;; Patterns comparing two vectors to produce a mask.
+
+;; Internal pattern for vec_cmp.  It returns expected result mask for
+;; comparison operators other than NE.  For NE operator, it returns
+;; the opposite result mask.  This is intended behavior so that we
+;; can save one mask inverting instruction when using this pattern in
+;; vcond patterns.  In this case, it is the caller's responsibility
+;; to interpret and use the result mask correctly.
+(define_expand "vec_cmp_internal"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
+ (match_operator 1 "comparison_operator"
+   [(match_operand:VSDQ_I_DI 2 "register_operand")
+(match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  rtx mask = operands[0];
+  enum rtx_code code = GET_CODE (operands[1]);
+
+  if (operands[3] == CONST0_RTX (mode)
+  && (code == NE || code == LE || code == LT
+ || code == GE || code == GT || code == EQ))
+{
+  /* Some instructions have a form taking an immediate zero.  */
+  ;
+}
+  else if (!REG_P (operands[3]))
+{
+  /* Make sure we can handle the last operand.  */
+  operands[3] = force_reg (mode, operands[3]);
+}
+
+  switch (code)
+{
+case LT:
+  emit_insn (gen_aarch64_cmlt (mask, operands[2], operands[3]));
+  break;
+
+case GE:
+  emit_insn (gen_aarch64_cmge (mask, operands[2], operands[3]));
+  break;
+
+case LE:
+  emit_insn (gen_aarch64_cmle (mask, operands[2], operands[3]));
+  break;
+
+case GT:
+  emit_insn (gen_aarch64_cmgt (mask, operands[2], operands[3]));
+  break;
+
+case LTU:
+  emit_insn (gen_aarch64_cmgtu (mask, operands[3], operands[2]));
+  break;
+
+case GEU:
+  emit_insn (gen_aarch64_cmgeu (mask, operands[2], operands[3]));
+  break;
+
+case LEU:
+  emit_insn (gen_aarch64_cmgeu (mask, operands[3], operands[2]));
+  break;
+
+case GTU:
+  emit_insn (gen_aarch64_cmgtu (mask, operands[2], operands[3]));
+  break;
+
+case NE:
+  /* See comment at the beginning of this pattern, we return the
+opposite of result mask for this operator, and it's the caller's
+responsibility to invert the mask.
+
+Fall through.  */
+
+case EQ:
+  emit_insn (gen_aarch64_cmeq (mask, operands[2], operands[3]));
+  break;
+
+default:
+  gcc_unreachable ();
+}
+
+  /* Don't invert result mask for NE, which should be done in caller.  */
+
+  DONE;
+})
+
+(define_expand "vec_cmp"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
+ (match_operator 1 "comparison_operator"
+   [(match_operand:VSDQ_I_DI 2 "register_operand")
+(match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  enum rtx_code code = GET_CODE (operands[1]);
+
+  emit_insn (gen_vec_cmp_internal (operands[0],
+operands[1], operands[2],
+  

Re: [C++ Patch] One more error + error to error + inform and a subtler issue

2016-06-15 Thread Paolo Carlini

Hi,

On 15/06/2016 03:30, Jason Merrill wrote:

On Tue, Jun 14, 2016 at 6:12 PM, Paolo Carlini  wrote:

constexpr-specialization.C:7:26: error: redeclaration ‘constexpr int foo(T)
[with T = int]’ differs in ‘constexpr’

constexpr-specialization.C:6:16: error: from previous declaration ‘constexpr
int foo(T) [with T = int]’

see? The pretty printing of the previous declaration is very misleading
because it has constexpr in it!

I'm guessing this happens because we're printing the type of the
template plus template arguments, and the template is declared
constexpr.  That should be easy enough to fix in dump_function_decl by
looking at the instantiation when deciding whether to print
"constexpr".
Indeed, it seems very easy to do and dump_function_decl should be fixed 
anyway, outside this specific error message. Also, we already had a 
similar issue for the exceptions. Thus the below, tested x86_64-linux.


Thanks,
Paolo.

//
/cp
2016-06-15  Paolo Carlini  

* decl.c (validate_constexpr_redeclaration): Change pair of errors
to error + inform.
* error.c (dump_function_decl): Save the constexpr specifier too.

/testsuite
2016-06-15  Paolo Carlini  

* g++.dg/cpp0x/constexpr-specialization.C: Adjust for dg-message
vs dg-error; test constexpr specifier too.
Index: cp/decl.c
===
--- cp/decl.c   (revision 237472)
+++ cp/decl.c   (working copy)
@@ -1278,8 +1278,11 @@ validate_constexpr_redeclaration (tree old_decl, t
  && DECL_TEMPLATE_SPECIALIZATION (new_decl))
return true;
 
-  error ("redeclaration %q+D differs in %", new_decl);
-  error ("from previous declaration %q+D", old_decl);
+  error_at (DECL_SOURCE_LOCATION (new_decl),
+   "redeclaration %qD differs in % "
+   "from previous declaration", new_decl);
+  inform (DECL_SOURCE_LOCATION (old_decl),
+ "previous declaration %qD", old_decl);
   return false;
 }
   return true;
Index: cp/error.c
===
--- cp/error.c  (revision 237472)
+++ cp/error.c  (working copy)
@@ -1486,6 +1486,7 @@ dump_function_decl (cxx_pretty_printer *pp, tree t
   int show_return = flags & TFF_RETURN_TYPE || flags & TFF_DECL_SPECIFIERS;
   int do_outer_scope = ! (flags & TFF_UNQUALIFIED_NAME);
   tree exceptions;
+  bool constexpr_p;
 
   flags &= ~(TFF_UNQUALIFIED_NAME | TFF_TEMPLATE_NAME);
   if (TREE_CODE (t) == TEMPLATE_DECL)
@@ -1495,6 +1496,10 @@ dump_function_decl (cxx_pretty_printer *pp, tree t
  emitting an error about incompatible specifications.  */
   exceptions = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (t));
 
+  /* Likewise for the constexpr specifier, in case t is a specialization
+ and we are emitting an error about an incompatible redeclaration.  */
+  constexpr_p = DECL_DECLARED_CONSTEXPR_P (t);
+
   /* Pretty print template instantiations only.  */
   if (DECL_USE_TEMPLATE (t) && DECL_TEMPLATE_INFO (t)
   && flag_pretty_templates)
@@ -1529,7 +1534,7 @@ dump_function_decl (cxx_pretty_printer *pp, tree t
   else if (DECL_VIRTUAL_P (t))
pp_cxx_ws_string (pp, "virtual");
 
-  if (DECL_DECLARED_CONSTEXPR_P (t))
+  if (constexpr_p)
pp_cxx_ws_string (pp, "constexpr");
 }
 
Index: testsuite/g++.dg/cpp0x/constexpr-specialization.C
===
--- testsuite/g++.dg/cpp0x/constexpr-specialization.C   (revision 237472)
+++ testsuite/g++.dg/cpp0x/constexpr-specialization.C   (working copy)
@@ -3,10 +3,10 @@
 
 template constexpr int foo(T);
 template<> int foo(int);
-template<> int foo(int);// { dg-error "previous" }
-template<> constexpr int foo(int);  // { dg-error "redeclaration" }
+template<> int foo(int);// { dg-message "previous declaration 'int 
foo" }
+template<> constexpr int foo(int);  // { dg-error "redeclaration 'constexpr 
int foo" }
 
 template int bar(T);
 template<> constexpr int bar(int);
-template<> constexpr int bar(int);  // { dg-error "previous" }
-template<> int bar(int);// { dg-error "redeclaration" }
+template<> constexpr int bar(int);  // { dg-message "previous declaration 
'constexpr int bar" }
+template<> int bar(int);// { dg-error "redeclaration 'int bar" }


[patch,avr] ad PR71103: also handle QImode SUBREGs of CONST

2016-06-15 Thread Georg-Johann Lay
This patch handles the cases where a subreg:QI of a CONST or LABEL_REF is to be 
moved to a QImode register.  The original patch only handled SYMBOL_REFs.


OK for trunk and backport?


Johann

--

 gcc/
PR target/71103
* config/avr/avr.md (movqi): Handle loading subreg:qi (const).

gcc/testsuite/
PR target/71103
* gcc.target/avr/torture/pr71103-2.c: New test.

Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 237472)
+++ config/avr/avr.md	(working copy)
@@ -638,16 +638,24 @@
 rtx dest = operands[0];
 rtx src  = avr_eval_addr_attrib (operands[1]);
 
-if (SUBREG_P(src) && (GET_CODE(XEXP(src,0)) == SYMBOL_REF) &&
-can_create_pseudo_p())
-  {
-rtx symbol_ref = XEXP(src, 0);
-XEXP (src, 0) = copy_to_mode_reg (GET_MODE(symbol_ref), symbol_ref);
-  }
-
 if (avr_mem_flash_p (dest))
   DONE;
 
+if (QImode == mode
+&& SUBREG_P (src)
+&& CONSTANT_ADDRESS_P (SUBREG_REG (src)))
+{
+// store_bitfield may want to store a SYMBOL_REF or CONST in a
+// structure that's represented as PSImode.  As the upper 16 bits
+// of PSImode cannot be expressed as an HImode subreg, the rhs is
+// decomposed into QImode (word_mode) subregs of SYMBOL_REF,
+// CONST or LABEL_REF; cf. PR71103.
+
+rtx const_addr = SUBREG_REG (src);
+operands[1] = src = copy_rtx (src);
+SUBREG_REG (src) = copy_to_mode_reg (GET_MODE (const_addr), const_addr);
+  }
+
 /* One of the operands has to be in a register.  */
 if (!register_operand (dest, mode)
 && !reg_or_0_operand (src, mode))
Index: testsuite/gcc.target/avr/torture/pr71103-2.c
===
--- testsuite/gcc.target/avr/torture/pr71103-2.c	(nonexistent)
+++ testsuite/gcc.target/avr/torture/pr71103-2.c	(working copy)
@@ -0,0 +1,118 @@
+/* Use -g0 so that this test case doesn't just fail because
+   of PR52472.  */
+
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99 -g0" } */
+
+struct S12
+{
+  char c;
+  const char *p;
+};
+
+struct S12f
+{
+  char c;
+  struct S12f (*f)(void);
+};
+
+struct S12labl
+{
+  char c;
+  void **labl;
+};
+
+struct S121
+{
+  char c;
+  const char *p;
+  char d;
+};
+
+const char str[5] = "abcd";
+
+struct S12 test_S12_0 (void)
+{
+  struct S12 s;
+  s.c = 'A';
+  s.p = str;
+  return s;
+}
+
+struct S12 test_S12_4 (void)
+{
+  struct S12 s;
+  s.c = 'A';
+  s.p = str + 4;
+  return s;
+}
+
+struct S12f test_S12f (void)
+{
+  struct S12f s;
+  s.c = 'A';
+  s.f = test_S12f;
+  return s;
+}
+
+struct S121 test_S121 (void)
+{
+  struct S121 s;
+  s.c = 'c';
+  s.p = str + 4;
+  s.d = 'd';
+  return s;
+}
+
+extern void use_S12lab (struct S12labl*);
+
+struct S12labl test_S12lab (void)
+{
+  struct S12labl s;
+labl:;
+  s.c = 'A';
+  s.labl = &
+  return s;
+}
+
+#ifdef __MEMX
+
+struct S13
+{
+  char c;
+  const __memx char *p;
+};
+
+const __memx char str_x[] = "abcd";
+
+struct S13 test_S13_0 (void)
+{
+  struct S13 s;
+  s.c = 'A';
+  s.p = str_x;
+  return s;
+}
+
+struct S13 test_S13_4a (void)
+{
+  struct S13 s;
+  s.c = 'A';
+  s.p = str_x + 4;
+  return s;
+}
+
+#ifdef __FLASH1
+
+const __flash1 char str_1[] = "abcd";
+
+struct S13 test_13_4b (void)
+{
+  struct S13 s;
+  s.c = 'A';
+  s.p = str_1 + 4;
+  return s;
+}
+
+#endif /* have __flash1 */
+#endif /* have __memx */
+


Re: [PATCH] Backport PowerPC complex __float128 compiler support to GCC 6.x

2016-06-15 Thread Richard Biener
On Tue, 14 Jun 2016, Bill Schmidt wrote:

> Hi Richard,
> 
> As nobody else has replied, let me take a stab at this one.
> 
> > On Jun 10, 2016, at 2:06 AM, Richard Biener  wrote:
> > 
> > On Thu, 9 Jun 2016, Michael Meissner wrote:
> > 
> >> I'm including the global reviewers on the list.  I just want to be sure 
> >> that
> >> there is no problem installing these patches on the GCC 6.2 branch.  While 
> >> it
> >> is technically an enchancement, it is needed to be able to install the 
> >> glibc
> >> support that is needed to complete the work to add IEEE 128-bit floating 
> >> point.
> >> 
> >> The issue being fixed is that when we are creating the complex type, we 
> >> used to
> >> do a lookup for the size, and that fails on the PowerPC which has 2 128-bit
> >> floating point types (__ibm128 and __float128, with long double currently
> >> defaulting to __ibm128).
> > 
> > As this enhancement includes middle-end changes I am hesitant to approve
> > it for the branch.  Why is it desirable to backport this change?
> 
> It comes down to community requirements and schedules.  We are in the process 
> of
> replacing the incompatible IBM long double type with true IEEE-754 128-bit 
> floating
> point (__float128).  This is a complex multi-stage process where we will have 
> to
> maintain the functionality of the existing IBM long double for backward 
> compatibility
> while the new type is implemented.  This impacts multiple packages, starting 
> with
> gcc and glibc.
> 
> The glibc maintainers have indicated that work there depends on a certain 
> level of
> functionality within GCC.  Specifically, both the old and new types must be 
> supported,
> including corresponding complex types.  Unfortunately the realization that 
> the complex
> types had to be supported came late, and this work didn't fully make it into 
> GCC 6.1.
> 
> (Part of the problem that has made this whole effort difficult is that it is 
> complicated to
> maintain two floating-point types of the exact same size.)
> 
> In any case, the glibc maintainers require this work in GCC 6 so that work 
> can begin
> in glibc 2.24, with completion scheduled in glibc 2.25.  We are asking for an 
> exception 
> for this patch in order to allow those schedules to move forward.
> 
> So that's the history as I understand it... Perhaps others can jump in if 
> I've munged
> or omitted anything important.

Ok, I see the reason for the backport then.  Looking at the patch
the only fragile thing is the layout_type change, given that it adds an assert
and you did need to change frontends (but not all of them?).  I'm not
sure about testsuite coverage for complex types in, say, Go or Ada
or Java.

And I don't understand the layout_type change either - it looks to me
it could just have used

  SET_TYPE_MODE (type, GET_MODE_COMPLEX_MODE (TYPE_MODE (TREE_TYPE (type))));

and be done with it.  To me that looks a lot safer.

Now that we have two complex FP modes with the same size, how does
iteration over MODE_COMPLEX_FLOAT work with GET_MODE_WIDER_MODE?
Is the outcome random?  Or do we visit both modes?  That is, could
GET_MODE_COMPLEX_MODE be implemented by iterating over complex modes
and, in addition to the size, also matching the component mode?

[I realize this is just backports but at least the layout_type change
looks dangerous to me]

Thanks,
Richard.

> Thanks!
> Bill
> 
> > 
> > Thanks,
> > Richard.
> > 
> >> On Fri, Jun 03, 2016 at 09:33:35AM -0400, Michael Meissner wrote:
> >>> These patches were installed on the trunk on May 2nd, with a fix from Alan
> >>> Modra on May 11th.  Unless I here objections in the next few days, I will
> >>> commit these changes to the GCC 6.x branch.  These changes will allow 
> >>> people to
> >>> use complex __float128 types (via an attribute) on the PowerPC.
> >>> 
> >>> Note, we will need patches to libgcc to fully enable complex __float128 
> >>> support
> >>> on the PowerPC.  These patches enable the compiler support, so that the 
> >>> libgcc
> >>> changes can be coded.
> >>> 
> >>> In addition to bootstrapping and regtesting on the PowerPC (little endian
> >>> power8), I also bootstrapped and regested the changes on x86_64 running 
> >>> RHEL
> >>> 6.2.  There were no regressions in either case.
> >>> 
> >>> [gcc]
> >>> 2016-06-02  Michael Meissner  
> >>> 
> >>>   Back port from trunk
> >>>   2016-05-11  Alan Modra  
> >>> 
> >>>   * config/rs6000/rs6000.c (is_complex_IBM_long_double,
> >>>   abi_v4_pass_in_fpr): New functions.
> >>>   (rs6000_function_arg_boundary): Exclude complex IBM long double
> >>>   from 64-bit alignment when ABI_V4.
> >>>   (rs6000_function_arg, rs6000_function_arg_advance_1,
> >>>   rs6000_gimplify_va_arg): Use abi_v4_pass_in_fpr.
> >>> 
> >>>   Back port from trunk
> >>>   2016-05-02  Michael Meissner  
> >>> 
> >>>   * machmode.h (mode_complex): Add support to give the complex mode
> >>>   for a 

[7/7] Add negative and zero strides to vect_memory_access_type

2016-06-15 Thread Richard Sandiford
This patch uses the vect_memory_access_type from patch 6 to represent
the effect of a negative contiguous stride or a zero stride.  The latter
is valid only for loads.

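By way of illustration (my examples, not from the patch), the two new cases
correspond roughly to loops like these:

  // Negative contiguous stride: adjacent elements visited from high
  // addresses to low ones; a candidate for a load with a reversing
  // permute (VMAT_CONTIGUOUS_REVERSE).
  void reverse_copy (int *__restrict dst, const int *__restrict src, int n)
  {
    for (int i = 0; i < n; ++i)
      dst[i] = src[n - 1 - i];
  }

  // Zero stride: the same location is read on every iteration
  // (VMAT_INVARIANT), which is why this form only makes sense for loads.
  void splat_add (int *__restrict dst, const int *__restrict src, int n)
  {
    for (int i = 0; i < n; ++i)
      dst[i] += *src;
  }
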
Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree-vectorizer.h (vect_memory_access_type): Add
VMAT_INVARIANT, VMAT_CONTIGUOUS_DOWN and VMAT_CONTIGUOUS_REVERSED.
* tree-vect-stmts.c (compare_step_with_zero): New function.
(perm_mask_for_reverse): Move further up file.
(get_group_load_store_type): Stick to VMAT_ELEMENTWISE if the
step is negative.
(get_negative_load_store_type): New function.
(get_load_store_type): Call it.  Add an ncopies argument.
(vectorizable_mask_load_store): Update call accordingly and
remove tests for negative steps.
(vectorizable_store, vectorizable_load): Likewise.  Handle new
memory_access_types.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h
+++ gcc/tree-vectorizer.h
@@ -488,14 +488,26 @@ enum slp_vect_type {
 /* Describes how we're going to vectorize an individual load or store,
or a group of loads or stores.  */
 enum vect_memory_access_type {
+  /* An access to an invariant address.  This is used only for loads.  */
+  VMAT_INVARIANT,
+
   /* A simple contiguous access.  */
   VMAT_CONTIGUOUS,
 
+  /* A contiguous access that goes down in memory rather than up,
+ with no additional permutation.  This is used only for stores
+ of invariants.  */
+  VMAT_CONTIGUOUS_DOWN,
+
   /* A simple contiguous access in which the elements need to be permuted
  after loading or before storing.  Only used for loop vectorization;
  SLP uses separate permutes.  */
   VMAT_CONTIGUOUS_PERMUTE,
 
+  /* A simple contiguous access in which the elements need to be reversed
+ after loading or before storing.  */
+  VMAT_CONTIGUOUS_REVERSE,
+
   /* An access that uses IFN_LOAD_LANES or IFN_STORE_LANES.  */
   VMAT_LOAD_STORE_LANES,
 
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c
+++ gcc/tree-vect-stmts.c
@@ -1672,6 +1672,42 @@ vectorizable_internal_function (combined_fn cfn, tree 
fndecl,
 static tree permute_vec_elements (tree, tree, tree, gimple *,
  gimple_stmt_iterator *);
 
+/* STMT is a non-strided load or store, meaning that it accesses
+   elements with a known constant step.  Return -1 if that step
+   is negative, 0 if it is zero, and 1 if it is greater than zero.  */
+
+static int
+compare_step_with_zero (gimple *stmt)
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  tree step;
+  if (loop_vinfo && nested_in_vect_loop_p (LOOP_VINFO_LOOP (loop_vinfo), stmt))
+step = STMT_VINFO_DR_STEP (stmt_info);
+  else
+step = DR_STEP (STMT_VINFO_DATA_REF (stmt_info));
+  return tree_int_cst_compare (step, size_zero_node);
+}
+
+/* If the target supports a permute mask that reverses the elements in
+   a vector of type VECTYPE, return that mask, otherwise return null.  */
+
+static tree
+perm_mask_for_reverse (tree vectype)
+{
+  int i, nunits;
+  unsigned char *sel;
+
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  sel = XALLOCAVEC (unsigned char, nunits);
+
+  for (i = 0; i < nunits; ++i)
+sel[i] = nunits - 1 - i;
+
+  if (!can_vec_perm_p (TYPE_MODE (vectype), false, sel))
+return NULL_TREE;
+  return vect_gen_perm_mask_checked (vectype, sel);
+}
 
 /* A subroutine of get_load_store_type, with a subset of the same
arguments.  Handle the case where STMT is part of a grouped load
@@ -1755,7 +1791,8 @@ get_group_load_store_type (gimple *stmt, tree vectype, 
bool slp,
 would access excess elements in the last iteration.  */
   bool would_overrun_p = (gap != 0);
   if (!STMT_VINFO_STRIDED_P (stmt_info)
- && (can_overrun_p || !would_overrun_p))
+ && (can_overrun_p || !would_overrun_p)
+ && compare_step_with_zero (stmt) > 0)
{
  /* First try using LOAD/STORE_LANES.  */
  if (vls_type == VLS_LOAD
@@ -1814,17 +1851,69 @@ get_group_load_store_type (gimple *stmt, tree vectype, 
bool slp,
   return true;
 }
 
+/* A subroutine of get_load_store_type, with a subset of the same
+   arguments.  Handle the case where STMT is a load or store that
+   accesses consecutive elements with a negative step.  */
+
+static vect_memory_access_type
+get_negative_load_store_type (gimple *stmt, tree vectype,
+ vec_load_store_type vls_type,
+ unsigned int ncopies)
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
+  dr_alignment_support alignment_support_scheme;
+
+  if (ncopies > 1)
+{
+  if (dump_enabled_p ())
+   dump_printf_loc 

[6/7] Explicitly classify vector loads and stores

2016-06-15 Thread Richard Sandiford
This is the main patch in the series.  It adds a new enum and routines
for classifying a vector load or store implementation.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
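To make the classification a little more concrete (my sketch, not part of
the patch), loops along these lines map onto some of the new buckets,
subject to target support:

  // Simple contiguous access: VMAT_CONTIGUOUS.
  void contiguous (int *__restrict a, const int *__restrict b, int n)
  {
    for (int i = 0; i < n; ++i)
      a[i] = b[i];
  }

  // Loop-invariant but runtime stride: each element loaded individually
  // and gathered into a vector, i.e. VMAT_ELEMENTWISE.
  void strided (int *__restrict a, const int *__restrict b, int n, int s)
  {
    for (int i = 0; i < n; ++i)
      a[i] = b[i * s];
  }

  // Indexed access: VMAT_GATHER_SCATTER on targets with gather support.
  void indexed (int *__restrict a, const int *__restrict b,
                const int *__restrict idx, int n)
  {
    for (int i = 0; i < n; ++i)
      a[i] = b[idx[i]];
  }
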

Thanks,
Richard


gcc/
* tree-vectorizer.h (vect_memory_access_type): New enum.
(_stmt_vec_info): Add a memory_access_type field.
(STMT_VINFO_MEMORY_ACCESS_TYPE): New macro.
(vect_model_store_cost): Take an access type instead of a boolean.
(vect_model_load_cost): Likewise.
* tree-vect-slp.c (vect_analyze_slp_cost_1): Update calls to
vect_model_store_cost and vect_model_load_cost.
* tree-vect-stmts.c (vec_load_store_type): New enum.
(vect_model_store_cost): Take an access type instead of a
store_lanes_p boolean.  Simplify tests.
(vect_model_load_cost): Likewise, but for load_lanes_p.
(get_group_load_store_type, get_load_store_type): New functions.
(vectorizable_store): Use get_load_store_type.  Record the access
type in STMT_VINFO_MEMORY_ACCESS_TYPE.
(vectorizable_load): Likewise.
(vectorizable_mask_load_store): Likewise.  Replace is_store
variable with vls_type.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h
+++ gcc/tree-vectorizer.h
@@ -485,6 +485,33 @@ enum slp_vect_type {
   hybrid
 };
 
+/* Describes how we're going to vectorize an individual load or store,
+   or a group of loads or stores.  */
+enum vect_memory_access_type {
+  /* A simple contiguous access.  */
+  VMAT_CONTIGUOUS,
+
+  /* A simple contiguous access in which the elements need to be permuted
+ after loading or before storing.  Only used for loop vectorization;
+ SLP uses separate permutes.  */
+  VMAT_CONTIGUOUS_PERMUTE,
+
+  /* An access that uses IFN_LOAD_LANES or IFN_STORE_LANES.  */
+  VMAT_LOAD_STORE_LANES,
+
+  /* An access in which each scalar element is loaded or stored
+ individually.  */
+  VMAT_ELEMENTWISE,
+
+  /* A hybrid of VMAT_CONTIGUOUS and VMAT_ELEMENTWISE, used for grouped
+ SLP accesses.  Each unrolled iteration uses a contiguous load
+ or store for the whole group, but the groups from separate iterations
+ are combined in the same way as for VMAT_ELEMENTWISE.  */
+  VMAT_STRIDED_SLP,
+
+  /* The access uses gather loads or scatter stores.  */
+  VMAT_GATHER_SCATTER
+};
 
 typedef struct data_reference *dr_p;
 
@@ -602,6 +629,10 @@ typedef struct _stmt_vec_info {
   /* True if this is an access with loop-invariant stride.  */
   bool strided_p;
 
+  /* Classifies how the load or store is going to be implemented
+ for loop vectorization.  */
+  vect_memory_access_type memory_access_type;
+
   /* For both loads and stores.  */
   bool simd_lane_access_p;
 
@@ -659,6 +690,7 @@ STMT_VINFO_BB_VINFO (stmt_vec_info stmt_vinfo)
 #define STMT_VINFO_DATA_REF(S) (S)->data_ref_info
 #define STMT_VINFO_GATHER_SCATTER_P(S)(S)->gather_scatter_p
 #define STMT_VINFO_STRIDED_P(S)   (S)->strided_p
+#define STMT_VINFO_MEMORY_ACCESS_TYPE(S)   (S)->memory_access_type
 #define STMT_VINFO_SIMD_LANE_ACCESS_P(S)   (S)->simd_lane_access_p
 #define STMT_VINFO_VEC_REDUCTION_TYPE(S)   (S)->v_reduc_type
 
@@ -1006,12 +1038,12 @@ extern void free_stmt_vec_info (gimple *stmt);
 extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *,
 stmt_vector_for_cost *,
stmt_vector_for_cost *);
-extern void vect_model_store_cost (stmt_vec_info, int, bool,
+extern void vect_model_store_cost (stmt_vec_info, int, vect_memory_access_type,
   enum vect_def_type, slp_tree,
   stmt_vector_for_cost *,
   stmt_vector_for_cost *);
-extern void vect_model_load_cost (stmt_vec_info, int, bool, slp_tree,
- stmt_vector_for_cost *,
+extern void vect_model_load_cost (stmt_vec_info, int, vect_memory_access_type,
+ slp_tree, stmt_vector_for_cost *,
  stmt_vector_for_cost *);
 extern unsigned record_stmt_cost (stmt_vector_for_cost *, int,
  enum vect_cost_for_stmt, stmt_vec_info,
Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c
+++ gcc/tree-vect-slp.c
@@ -1490,9 +1490,13 @@ vect_analyze_slp_cost_1 (slp_instance instance, slp_tree 
node,
   stmt_info = vinfo_for_stmt (stmt);
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
 {
+  vect_memory_access_type memory_access_type
+   = (STMT_VINFO_STRIDED_P (stmt_info)
+  ? VMAT_STRIDED_SLP
+  : VMAT_CONTIGUOUS);
   if (DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)))
-   vect_model_store_cost (stmt_info, ncopies_for_cost, false,
-  

[5/7] Move the fix for PR65518

2016-06-15 Thread Richard Sandiford
This patch moves the fix for PR65518 to the code that checks whether
load-and-permute operations are supported.   If the group size is
greater than the vectorisation factor, it would still be possible
to fall back to elementwise loads (as for strided groups) rather
than fail vectorisation entirely.

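For reference, a made-up example of the PR65518 shape: a single-element
"group" whose gap is larger than the number of elements in a vector, so a
permute-based grouped load would pull in mostly unused data:

  // Only a[8 * i] is used, so the access is treated as a group of size 8
  // with one real element and a gap of 7.  With, say, 4-element vectors
  // the group size exceeds TYPE_VECTOR_SUBPARTS and the check now punts,
  // leaving the elementwise fallback as a possible alternative.
  int sum_every_eighth (const int *a, int n)
  {
    int sum = 0;
    for (int i = 0; i < n; ++i)
      sum += a[8 * i];
    return sum;
  }
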
Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree-vectorizer.h (vect_grouped_load_supported): Add a
single_element_p parameter.
* tree-vect-data-refs.c (vect_grouped_load_supported): Likewise.
Check the PR65518 case here rather than in vectorizable_load.
* tree-vect-loop.c (vect_analyze_loop_2): Update call accordignly.
* tree-vect-stmts.c (vectorizable_load): Likewise.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h
+++ gcc/tree-vectorizer.h
@@ -1069,7 +1069,7 @@ extern tree bump_vector_ptr (tree, gimple *, 
gimple_stmt_iterator *, gimple *,
 extern tree vect_create_destination_var (tree, tree);
 extern bool vect_grouped_store_supported (tree, unsigned HOST_WIDE_INT);
 extern bool vect_store_lanes_supported (tree, unsigned HOST_WIDE_INT);
-extern bool vect_grouped_load_supported (tree, unsigned HOST_WIDE_INT);
+extern bool vect_grouped_load_supported (tree, bool, unsigned HOST_WIDE_INT);
 extern bool vect_load_lanes_supported (tree, unsigned HOST_WIDE_INT);
 extern void vect_permute_store_chain (vec ,unsigned int, gimple *,
 gimple_stmt_iterator *, vec *);
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c
+++ gcc/tree-vect-data-refs.c
@@ -5131,14 +5131,31 @@ vect_setup_realignment (gimple *stmt, 
gimple_stmt_iterator *gsi,
 
 /* Function vect_grouped_load_supported.
 
-   Returns TRUE if even and odd permutations are supported,
-   and FALSE otherwise.  */
+   COUNT is the size of the load group (the number of statements plus the
+   number of gaps).  SINGLE_ELEMENT_P is true if there is actually
+   only one statement, with a gap of COUNT - 1.
+
+   Returns true if a suitable permute exists.  */
 
 bool
-vect_grouped_load_supported (tree vectype, unsigned HOST_WIDE_INT count)
+vect_grouped_load_supported (tree vectype, bool single_element_p,
+unsigned HOST_WIDE_INT count)
 {
   machine_mode mode = TYPE_MODE (vectype);
 
+  /* If this is single-element interleaving with an element distance
+ that leaves unused vector loads around punt - we at least create
+ very sub-optimal code in that case (and blow up memory,
+ see PR65518).  */
+  if (single_element_p && count > TYPE_VECTOR_SUBPARTS (vectype))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"single-element interleaving not supported "
+"for not adjacent vector loads\n");
+  return false;
+}
+
   /* vect_permute_load_chain requires the group size to be equal to 3 or
  be a power of two.  */
   if (count != 3 && exact_log2 (count) == -1)
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c
+++ gcc/tree-vect-loop.c
@@ -2148,10 +2148,12 @@ again:
{
  vinfo = vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (node)[0]);
  vinfo = vinfo_for_stmt (STMT_VINFO_GROUP_FIRST_ELEMENT (vinfo));
+ bool single_element_p = !STMT_VINFO_GROUP_NEXT_ELEMENT (vinfo);
  size = STMT_VINFO_GROUP_SIZE (vinfo);
  vectype = STMT_VINFO_VECTYPE (vinfo);
  if (! vect_load_lanes_supported (vectype, size)
- && ! vect_grouped_load_supported (vectype, size))
+ && ! vect_grouped_load_supported (vectype, single_element_p,
+   size))
return false;
}
 }
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c
+++ gcc/tree-vect-stmts.c
@@ -6298,31 +6298,20 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator 
*gsi, gimple **vec_stmt,
 
   first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
   group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
+  bool single_element_p = (first_stmt == stmt
+  && !GROUP_NEXT_ELEMENT (stmt_info));
 
   if (!slp && !STMT_VINFO_STRIDED_P (stmt_info))
{
  if (vect_load_lanes_supported (vectype, group_size))
load_lanes_p = true;
- else if (!vect_grouped_load_supported (vectype, group_size))
+ else if (!vect_grouped_load_supported (vectype, single_element_p,
+group_size))
return false;
}
 
-  /* If this is single-element interleaving with an element distance
- that leaves unused vector loads around punt 

[4/7] Add a gather_scatter_info structure

2016-06-15 Thread Richard Sandiford
This patch just refactors the gather/scatter support so that all
information is in a single structure, rather than separate variables.
This reduces the number of arguments to a function added in patch 6.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree-vectorizer.h (gather_scatter_info): New structure.
(vect_check_gather_scatter): Return a bool rather than a decl.
Replace return-by-pointer arguments with a single
gather_scatter_info *.
* tree-vect-data-refs.c (vect_check_gather_scatter): Likewise.
(vect_analyze_data_refs): Update call accordingly.
* tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Likewise.
(vectorizable_mask_load_store): Likewise.  Also record the
offset dt and vectype in the gather_scatter_info.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h
+++ gcc/tree-vectorizer.h
@@ -612,6 +612,28 @@ typedef struct _stmt_vec_info {
   unsigned int num_slp_uses;
 } *stmt_vec_info;
 
+/* Information about a gather/scatter call.  */
+struct gather_scatter_info {
+  /* The FUNCTION_DECL for the built-in gather/scatter function.  */
+  tree decl;
+
+  /* The loop-invariant base value.  */
+  tree base;
+
+  /* The original scalar offset, which is a non-loop-invariant SSA_NAME.  */
+  tree offset;
+
+  /* The definition type for the vectorized offset.  */
+  enum vect_def_type offset_dt;
+
+  /* The type of the vectorized offset.  */
+  tree offset_vectype;
+
+  /* Each offset element should be multiplied by this amount before
+ being added to the base.  */
+  int scale;
+};
+
 /* Access Functions.  */
 #define STMT_VINFO_TYPE(S) (S)->type
 #define STMT_VINFO_STMT(S) (S)->stmt
@@ -1035,8 +1057,8 @@ extern bool vect_verify_datarefs_alignment 
(loop_vec_info);
 extern bool vect_slp_analyze_and_verify_instance_alignment (slp_instance);
 extern bool vect_analyze_data_ref_accesses (vec_info *);
 extern bool vect_prune_runtime_alias_test_list (loop_vec_info);
-extern tree vect_check_gather_scatter (gimple *, loop_vec_info, tree *, tree *,
-  int *);
+extern bool vect_check_gather_scatter (gimple *, loop_vec_info,
+  gather_scatter_info *);
 extern bool vect_analyze_data_refs (vec_info *, int *);
 extern tree vect_create_data_ref_ptr (gimple *, tree, struct loop *, tree,
  tree *, gimple_stmt_iterator *,
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c
+++ gcc/tree-vect-data-refs.c
@@ -3174,12 +3174,12 @@ vect_prune_runtime_alias_test_list (loop_vec_info 
loop_vinfo)
   return true;
 }
 
-/* Check whether a non-affine read or write in stmt is suitable for gather load
-   or scatter store and if so, return a builtin decl for that operation.  */
+/* Return true if a non-affine read or write in STMT is suitable for a
+   gather load or scatter store.  Describe the operation in *INFO if so.  */
 
-tree
-vect_check_gather_scatter (gimple *stmt, loop_vec_info loop_vinfo, tree *basep,
-  tree *offp, int *scalep)
+bool
+vect_check_gather_scatter (gimple *stmt, loop_vec_info loop_vinfo,
+  gather_scatter_info *info)
 {
   HOST_WIDE_INT scale = 1, pbitpos, pbitsize;
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
@@ -3253,7 +3253,7 @@ vect_check_gather_scatter (gimple *stmt, loop_vec_info 
loop_vinfo, tree *basep,
   if (!expr_invariant_in_loop_p (loop, base))
 {
   if (!integer_zerop (off))
-   return NULL_TREE;
+   return false;
   off = base;
   base = size_int (pbitpos / BITS_PER_UNIT);
 }
@@ -3279,7 +3279,7 @@ vect_check_gather_scatter (gimple *stmt, loop_vec_info 
loop_vinfo, tree *basep,
  gimple *def_stmt = SSA_NAME_DEF_STMT (off);
 
  if (expr_invariant_in_loop_p (loop, off))
-   return NULL_TREE;
+   return false;
 
  if (gimple_code (def_stmt) != GIMPLE_ASSIGN)
break;
@@ -3291,7 +3291,7 @@ vect_check_gather_scatter (gimple *stmt, loop_vec_info 
loop_vinfo, tree *basep,
   else
{
  if (get_gimple_rhs_class (TREE_CODE (off)) == GIMPLE_TERNARY_RHS)
-   return NULL_TREE;
+   return false;
  code = TREE_CODE (off);
  extract_ops_from_tree (off, , , );
}
@@ -3366,7 +3366,7 @@ vect_check_gather_scatter (gimple *stmt, loop_vec_info 
loop_vinfo, tree *basep,
  defined in the loop, punt.  */
   if (TREE_CODE (off) != SSA_NAME
   || expr_invariant_in_loop_p (loop, off))
-return NULL_TREE;
+return false;
 
   if (offtype == NULL_TREE)
 offtype = TREE_TYPE (off);
@@ -3379,15 +3379,15 @@ vect_check_gather_scatter 

[3/7] Fix load/store costs for strided groups

2016-06-15 Thread Richard Sandiford
vect_model_store_cost had:

  /* Costs of the stores.  */
  if (STMT_VINFO_STRIDED_P (stmt_info)
  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
{
  /* N scalar stores plus extracting the elements.  */
  inside_cost += record_stmt_cost (body_cost_vec,
   ncopies * TYPE_VECTOR_SUBPARTS (vectype),
   scalar_store, stmt_info, 0, vect_body);

But non-SLP strided groups also use individual scalar stores rather than
vector stores, so I think we should skip this only for SLP groups.

The same applies to vect_model_load_cost.

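For instance (my example, not from the patch), in a grouped store with a
runtime stride the vectorized loop still issues one scalar store per group
element, so costing it as ncopies vector stores would undercount the work:

  // A grouped store {p[i*s], p[i*s + 1]} with a loop-invariant but unknown
  // stride.  When this is vectorized as a (non-SLP) strided group, the
  // generated code stores the elements one scalar at a time, which is what
  // the adjusted cost model now accounts for.
  void strided_pair_store (int *__restrict p, const int *__restrict a,
                           const int *__restrict b, int n, int s)
  {
    for (int i = 0; i < n; ++i)
      {
        p[i * s]     = a[i];
        p[i * s + 1] = b[i];
      }
  }
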
Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree-vect-stmts.c (vect_model_store_cost): For non-SLP
strided groups, use the cost of N scalar accesses instead
of ncopies vector accesses.
(vect_model_load_cost): Likewise.

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index e90eeda..f883580 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -926,8 +926,7 @@ vect_model_store_cost (stmt_vec_info stmt_info, int ncopies,
 
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   /* Costs of the stores.  */
-  if (STMT_VINFO_STRIDED_P (stmt_info)
-  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
+  if (STMT_VINFO_STRIDED_P (stmt_info) && !(slp_node && grouped_access_p))
 {
   /* N scalar stores plus extracting the elements.  */
   inside_cost += record_stmt_cost (body_cost_vec,
@@ -1059,8 +1058,7 @@ vect_model_load_cost (stmt_vec_info stmt_info, int 
ncopies,
 }
 
   /* The loads themselves.  */
-  if (STMT_VINFO_STRIDED_P (stmt_info)
-  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
+  if (STMT_VINFO_STRIDED_P (stmt_info) && !(slp_node && grouped_access_p))
 {
   /* N scalar loads plus gathering them into a vector.  */
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);



[2/7] Clean up vectorizer load/store costs

2016-06-15 Thread Richard Sandiford
Add a bit more commentary and try to make the structure more obvious.
The horrendous:

  if (grouped_access_p
  && represents_group_p
  && !store_lanes_p
  && !STMT_VINFO_STRIDED_P (stmt_info)
  && !slp_node)

checks go away in patch 6.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree-vect-stmts.c (vect_cost_group_size): Delete.
(vect_model_store_cost): Avoid calling it.  Use first_stmt_p
variable to indicate when once-per-group costs are being used.
(vect_model_load_cost): Likewise.  Fix comment and misindented code.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c
+++ gcc/tree-vect-stmts.c
@@ -865,24 +865,6 @@ vect_model_promotion_demotion_cost (stmt_vec_info 
stmt_info,
  "prologue_cost = %d .\n", inside_cost, prologue_cost);
 }
 
-/* Function vect_cost_group_size
-
-   For grouped load or store, return the group_size only if it is the first
-   load or store of a group, else return 1.  This ensures that group size is
-   only returned once per group.  */
-
-static int
-vect_cost_group_size (stmt_vec_info stmt_info)
-{
-  gimple *first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
-
-  if (first_stmt == STMT_VINFO_STMT (stmt_info))
-return GROUP_SIZE (stmt_info);
-
-  return 1;
-}
-
-
 /* Function vect_model_store_cost
 
Models cost for stores.  In the case of grouped accesses, one access
@@ -895,47 +877,43 @@ vect_model_store_cost (stmt_vec_info stmt_info, int 
ncopies,
   stmt_vector_for_cost *prologue_cost_vec,
   stmt_vector_for_cost *body_cost_vec)
 {
-  int group_size;
   unsigned int inside_cost = 0, prologue_cost = 0;
-  struct data_reference *first_dr;
-  gimple *first_stmt;
+  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
+  gimple *first_stmt = STMT_VINFO_STMT (stmt_info);
+  bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
 
   if (dt == vect_constant_def || dt == vect_external_def)
 prologue_cost += record_stmt_cost (prologue_cost_vec, 1, scalar_to_vec,
   stmt_info, 0, vect_prologue);
 
-  /* Grouped access?  */
-  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+  /* Grouped stores update all elements in the group at once,
+ so we want the DR for the first statement.  */
+  if (!slp_node && grouped_access_p)
 {
-  if (slp_node)
-{
-  first_stmt = SLP_TREE_SCALAR_STMTS (slp_node)[0];
-  group_size = 1;
-}
-  else
-{
-  first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
-  group_size = vect_cost_group_size (stmt_info);
-}
-
-  first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
-}
-  /* Not a grouped access.  */
-  else
-{
-  group_size = 1;
-  first_dr = STMT_VINFO_DATA_REF (stmt_info);
+  first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
+  dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
 }
 
+  /* True if we should include any once-per-group costs as well as
+ the cost of the statement itself.  For SLP we only get called
+ once per group anyhow.  */
+  bool first_stmt_p = (first_stmt == STMT_VINFO_STMT (stmt_info));
+
   /* We assume that the cost of a single store-lanes instruction is
  equivalent to the cost of GROUP_SIZE separate stores.  If a grouped
  access is instead being provided by a permute-and-store operation,
- include the cost of the permutes.  */
-  if (!store_lanes_p && group_size > 1
-  && !STMT_VINFO_STRIDED_P (stmt_info))
+ include the cost of the permutes.
+
+ For SLP, the caller has already counted the permutation, if any.  */
+  if (grouped_access_p
+  && first_stmt_p
+  && !store_lanes_p
+  && !STMT_VINFO_STRIDED_P (stmt_info)
+  && !slp_node)
 {
   /* Uses a high and low interleave or shuffle operations for each
 needed permute.  */
+  int group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
   int nstmts = ncopies * ceil_log2 (group_size) * group_size;
   inside_cost = record_stmt_cost (body_cost_vec, nstmts, vec_perm,
  stmt_info, 0, vect_body);
@@ -957,7 +935,7 @@ vect_model_store_cost (stmt_vec_info stmt_info, int ncopies,
   scalar_store, stmt_info, 0, vect_body);
 }
   else
-vect_get_store_cost (first_dr, ncopies, &inside_cost, body_cost_vec);
+vect_get_store_cost (dr, ncopies, &inside_cost, body_cost_vec);
 
   if (STMT_VINFO_STRIDED_P (stmt_info))
 inside_cost += record_stmt_cost (body_cost_vec,
@@ -1026,8 +1004,8 @@ vect_get_store_cost (struct data_reference *dr, int 
ncopies,
 
 /* Function vect_model_load_cost
 
-   Models cost for loads.  In the case of grouped accesses, the last access
-   has the overhead of the grouped access attributed to it.  Since unaligned
+   Models cost for loads.  

[PATCH] PR 71439 - Only vectorize live PHIs that are inductions

2016-06-15 Thread Alan Hayward
For a PHI to be used outside the loop it needs to be vectorized. However,
the vectorizer currently will only vectorize PHIs that are inductions.

This patch fixes PR 71439 by only allowing a live PHI to be vectorized if
it is an induction. In addition, live PHIs need to pass a
vectorizable_live_operation check.

Tested on x86 and aarch64.
Ok to commit?

Alan.

gcc/
PR tree-optimization/71439
* tree-vect-loop.c (vect_analyze_loop_operations): Additional check for
live PHIs.

testsuite/
PR tree-optimization/71439
* gcc.dg/vect/pr71439.c: New



diff --git a/gcc/testsuite/gcc.dg/vect/pr71439.c
b/gcc/testsuite/gcc.dg/vect/pr71439.c
new file mode 100644
index 
..95e4763bad6e9f301d53c20ffa160b96b
dad9a53
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr71439.c
@@ -0,0 +1,17 @@
+#include "tree-vect.h"
+
+int a, b, c;
+short fn1(int p1, int p2) { return p1 + p2; }
+
+int main() {
+  a = 0;
+  for (; a < 30; a = fn1(a, 4)) {
+c = b;
+b = 6;
+  }
+
+  if (c != 6)
+abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 
1231b95f6a71337833e8c4b24884da9f96a7b5bf..90ade75bcd212b542ad680877a79df717
751ff4b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1669,7 +1669,8 @@ vect_analyze_loop_operations (loop_vec_info
loop_vinfo)

   gcc_assert (stmt_info);

-  if (STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_scope
+  if ((STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_scope
+   || STMT_VINFO_LIVE_P (stmt_info))
   && STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def)
 {
   /* A scalar-dependence cycle that we don't support.  */
@@ -1686,6 +1687,9 @@ vect_analyze_loop_operations (loop_vec_info
loop_vinfo)
 ok = vectorizable_induction (phi, NULL, NULL);
 }

+ if (ok && STMT_VINFO_LIVE_P (stmt_info))
+   ok = vectorizable_live_operation (phi, NULL, NULL, -1, NULL);
+
   if (!ok)
 {
   if (dump_enabled_p ())






[1/7] Remove unnecessary peeling for gaps check

2016-06-15 Thread Richard Sandiford
I recently relaxed the peeling-for-gaps conditions for LD3 but
kept them as-is for load-and-permute.  I don't think the conditions
are needed for load-and-permute either though.  No current load-and-
permute should load outside the group, so if there is no gap at the end,
the final vector element loaded will correspond to an element loaded
by the original scalar loop.

The patch for PR68559 (a missed optimisation PR) increased the peeled
cases from "exact_log2 (groupsize) == -1" to "vf % group_size == 0", so
before that fix, we didn't peel for gaps if there was no gap at the end
of the group and if the group size was a power of 2.

The only current non-power-of-2 load-and-permute size is 3, which
doesn't require loading more than 3 vectors.
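
As a concrete sketch of the kind of access this covers (hypothetical code,
not the new testcase; the function and parameter names are made up), a
grouped load of size 3 with no gap reads every element of each group:

// Each iteration reads b[3*i], b[3*i+1] and b[3*i+2], so there is no
// gap at the end of the group and the last vector element loaded
// corresponds to an element the original scalar loop would also read.
void
f (int *__restrict a, const int *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = b[i * 3] + b[i * 3 + 1] + b[i * 3 + 2];
}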

The testcase is based on gcc.dg/vect/pr49038.c.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree-vect-stmts.c (vectorizable_load): Remove unnecessary
peeling-for-gaps condition.

gcc/testsuite/
* gcc.dg/vect/group-no-gaps-1.c: New test.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c
+++ gcc/tree-vect-stmts.c
@@ -6356,13 +6356,11 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator 
*gsi, gimple **vec_stmt,
  gcc_assert (GROUP_GAP (stmt_info));
}
 
-  /* If there is a gap in the end of the group or the group size cannot
- be made a multiple of the vector element count then we access excess
+  /* If there is a gap in the end of the group then we access excess
 elements in the last iteration and thus need to peel that off.  */
   if (loop_vinfo
  && ! STMT_VINFO_STRIDED_P (stmt_info)
- && (GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0
- || (!slp && !load_lanes_p && vf % group_size != 0)))
+ && GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
Index: gcc/testsuite/gcc.dg/vect/group-no-gaps-1.c
===
--- /dev/null
+++ gcc/testsuite/gcc.dg/vect/group-no-gaps-1.c
@@ -0,0 +1,108 @@
+/* { dg-require-effective-target mmap } */
+
+#include 
+#include 
+
+#define COUNT 320
+#define MMAP_SIZE 0x2
+#define ADDRESS1 0x112200
+#define ADDRESS2 (ADDRESS1 + MMAP_SIZE * 16)
+#define TYPE unsigned int
+
+#ifndef MAP_ANONYMOUS
+#define MAP_ANONYMOUS MAP_ANON
+#endif
+
+#define RHS0(B) b[B]
+#define RHS1(B) RHS0(B) + b[(B) + 1]
+#define RHS2(B) RHS1(B) + b[(B) + 2]
+#define RHS3(B) RHS2(B) + b[(B) + 3]
+#define RHS4(B) RHS3(B) + b[(B) + 4]
+#define RHS5(B) RHS4(B) + b[(B) + 5]
+#define RHS6(B) RHS5(B) + b[(B) + 6]
+#define RHS7(B) RHS6(B) + b[(B) + 7]
+
+#define LHS0(B) a[B]
+#define LHS1(B) LHS0(B) = a[(B) + 1]
+#define LHS2(B) LHS1(B) = a[(B) + 2]
+#define LHS3(B) LHS2(B) = a[(B) + 3]
+#define LHS4(B) LHS3(B) = a[(B) + 4]
+#define LHS5(B) LHS4(B) = a[(B) + 5]
+#define LHS6(B) LHS5(B) = a[(B) + 6]
+#define LHS7(B) LHS6(B) = a[(B) + 7]
+
+#define DEF_GROUP_SIZE(MULT, GAP, NO_GAP)  \
+  void __attribute__((noinline, noclone))  \
+  gap_load_##MULT (TYPE *__restrict a, TYPE *__restrict b) \
+  {\
+for (int i = 0; i < COUNT; i++)\
+  a[i] = RHS##GAP (i * MULT);  \
+  }\
+  void __attribute__((noinline, noclone))  \
+  no_gap_load_##MULT (TYPE *__restrict a, TYPE *__restrict b)  \
+  {\
+for (int i = 0; i < COUNT; i++)\
+  a[i] = RHS##NO_GAP (i * MULT);   \
+  }\
+  void __attribute__((noinline, noclone))  \
+  gap_store_##MULT (TYPE *__restrict a, TYPE *__restrict b)\
+  {\
+for (int i = 0; i < COUNT; i++)\
+  LHS##GAP (i * MULT) = b[i];  \
+  }\
+  void __attribute__((noinline, noclone))  \
+  no_gap_store_##MULT (TYPE *__restrict a, TYPE *__restrict b) \
+  {\
+for (int i = 0; i < COUNT; i++)\
+  LHS##NO_GAP (i * MULT) = b[i];   \
+  }
+
+#define USE_GROUP_SIZE(MULT)   \
+  gap_load_##MULT (end_x - COUNT, end_y - COUNT * MULT + 1);   \
+  no_gap_load_##MULT (end_x - COUNT, end_y - COUNT * MULT);\
+  gap_store_##MULT (end_x - COUNT * MULT + 1, end_y - COUNT);  \
+  no_gap_store_##MULT (end_x - 

[0/7] Tweak vector load/store code

2016-06-15 Thread Richard Sandiford
This patch series adds a new enum and routines for classifying a vector
load or store implementation.  Originally there were three motivations:

  (1) Reduce cut-&-paste

  (2) Make the chosen vectorisation strategy more obvious.  At the
  moment this is derived implicitly from various other bits of
  state (GROUPED, STRIDED, SLP, etc.)

  (3) Decouple the vectorisation strategy from those other bits of state,
  so that there can be a choice of implementation for a given scalar
  statement.  The specific problem here is that we class:

  for (...)
{
  ... = a[i * x];
  ... = a[i * x + 1];
}

  as "strided and grouped" but:

  for (...)
{
  ... = a[i * 7];
  ... = a[i * 7 + 1];
}

  as "non-strided and grouped".  Before the patches, "strided and
  grouped" loads would always try to use separate scalar loads
  while "non-strided and grouped" loads would always try to use
  load-and-permute.  But load-and-permute is never supported for
  a group size of 7, so the effect was that the first loop was
  vectorisable and the second wasn't.  It seemed odd that not
  knowing x (but accepting it could be 7) would allow more
  optimisation opportunities than knowing x is 7.

Unfortunately, it looks like we underestimate the cost of separate
scalar accesses on at least aarch64, so I've disabled (3) for now;
see the "if" statement at the end of get_load_store_type in patch 6.
I think the series still does (1) and (2) though, so that's the
justification for it in its current form.  It also means that (3)
is now simply a case of removing the FIXME code, once the cost model
problems have been sorted out.  (I did wonder about adding a --param,
but that seems overkill.  I hope to get back to this during GCC 7 stage 1.)

Thanks,
Richard


Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-15 Thread Christophe Lyon
On 9 June 2016 at 14:46, Jakub Jelinek  wrote:
> On Thu, Jun 09, 2016 at 02:40:43PM +0200, Christophe Lyon wrote:
>> > Bet it depends if this happens before the signal(SIGILL, sig_ill_handler);
>> > call or after it.  If before, then I guess you'd better rewrite the
>> > long long a = 0, b = 1;
>> > asm ("vorr %P0, %P1, %P2"
>> >  : "=w" (a)
>> >  : "0" (a), "w" (b));
>> > if (a != 1)
>>
>> Of course you are right: it happens just before the call to signal,
>> to build the sig_ill_handler address in r1.
>>
>> So it's not even a problem with rewriting the asm.
>
> Ugh, so the added options don't affect just vectorized code, but normal
> integer only code?
> check_vect is fragile, there is always a risk that some instruction is
> scheduled before the call.

Yes, here it's an instruction used to build a parameter of the call.

> If you have working target attribute support, I think you should compile
> check_vect with attribute set to some lowest common denominator that every
> ARM CPU supports (if there is any, that is).  Though most likely you'll need
> to tweak the inline asm, because maybe "w" constraint won't be available
> then.

ARM does not support attribute/pragma cpu :(

Here is a new patch version, which removes the hardcoded dg-do run directives,
so that tests use compile or run depending on the result of
check_vect_support_and_set_flags.

On ARM, this first uses arm_neon_ok to detect the required flags, then,
depending on arm_neon_hw, it decides whether to dg-do run or compile.

OK?

Christophe

> Jakub
gcc/testsuite/ChangeLog:

2016-06-15  Christophe Lyon  

* g++.dg/vect/pr33834_2.cc: Use dg-additional options instead of
dg-options.
* g++.dg/vect/pr33860a.cc: Likewise.
* g++.dg/vect/pr45470-a.cc: Likewise.
* g++.dg/vect/pr45470-b.cc: Likewise.
* g++.dg/vect/pr60896.cc: Likewise.
* gcc.dg/vect/no-tree-pre-pr45241.c: Likewise.
* gcc.dg/vect/pr18308.c: Likewise.
* gcc.dg/vect/pr24049.c: Likewise.
* gcc.dg/vect/pr33373.c: Likewise.
* gcc.dg/vect/pr36228.c: Likewise.
* gcc.dg/vect/pr42395.c: Likewise.
* gcc.dg/vect/pr42604.c: Likewise.
* gcc.dg/vect/pr46663.c: Likewise.
* gcc.dg/vect/pr48765.c: Likewise.
* gcc.dg/vect/pr49093.c: Likewise.
* gcc.dg/vect/pr49352.c: Likewise.
* gcc.dg/vect/pr52298.c: Likewise.
* gcc.dg/vect/pr52870.c: Likewise.
* gcc.dg/vect/pr53185.c: Likewise.
* gcc.dg/vect/pr53773.c: Likewise.
* gcc.dg/vect/pr56695.c: Likewise.
* gcc.dg/vect/pr62171.c: Likewise.
* gcc.dg/vect/pr63530.c: Likewise.
* gcc.dg/vect/pr68339.c: Likewise.
* gcc.dg/vect/vect-82_64.c: Likewise.
* gcc.dg/vect/vect-83_64.c: Likewise.
* gcc.dg/vect/vect-debug-pr41926.c: Likewise.
* gcc.dg/vect/vect-fold-1.c: Likewise.
* gcc.dg/vect/vect-shift-2-big-array.c: Likewise.
* gcc.dg/vect/vect-shift-2.c: Likewise.
* gcc.dg/vect/vect-singleton_1.c: Likewise.
* gcc.dg/vect/O3-pr70130.c: Remove dg-do run.
* gcc.dg/vect/pr70021.c: Likewise.
* gcc.dg/vect/pr70138-1.c: Likewise.
* gcc.dg/vect/pr70138-2.c: Likewise.
* gcc.dg/vect/pr70354-1.c: Likewise.
* gcc.dg/vect/pr70354-2.c: Likewise.
* gcc.dg/vect/vect-nb-iter-ub-1.c: Likewise.
* gcc.dg/vect/vect-nb-iter-ub-2.c: Likewise.
* gcc.dg/vect/vect-nb-iter-ub-3.c: Likewise.
* gcc.dg/vect/pr71259.c: Use dg-additional options instead of
dg-options. Remove dg-do run.
diff --git a/gcc/testsuite/g++.dg/vect/pr33834_2.cc 
b/gcc/testsuite/g++.dg/vect/pr33834_2.cc
index ecaf588..49e72d2 100644
--- a/gcc/testsuite/g++.dg/vect/pr33834_2.cc
+++ b/gcc/testsuite/g++.dg/vect/pr33834_2.cc
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -ftree-vectorize" } */
+/* { dg-additional-options "-O3 -ftree-vectorize" } */
 
 /* Testcase by Martin Michlmayr  */
 
diff --git a/gcc/testsuite/g++.dg/vect/pr33860a.cc 
b/gcc/testsuite/g++.dg/vect/pr33860a.cc
index 0e5164f..bbfdeef 100644
--- a/gcc/testsuite/g++.dg/vect/pr33860a.cc
+++ b/gcc/testsuite/g++.dg/vect/pr33860a.cc
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Wno-psabi" { target { { i?86-*-* x86_64-*-* } && ilp32 } } } 
*/
+/* { dg-additional-options "-Wno-psabi" { target { { i?86-*-* x86_64-*-* } && 
ilp32 } } } */
 
 /* Testcase by Martin Michlmayr  */
 
diff --git a/gcc/testsuite/g++.dg/vect/pr45470-a.cc 
b/gcc/testsuite/g++.dg/vect/pr45470-a.cc
index 98ce4ca..ba5873c 100644
--- a/gcc/testsuite/g++.dg/vect/pr45470-a.cc
+++ b/gcc/testsuite/g++.dg/vect/pr45470-a.cc
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -ftree-vectorize -fnon-call-exceptions" } */
+/* { dg-additional-options "-O1 -ftree-vectorize -fnon-call-exceptions" } */
 
 struct A
 {
diff 

Re: container method call shortcuts

2016-06-15 Thread Jonathan Wakely

On 14/06/16 22:04 +0200, François Dumont wrote:

Hi

   Here is the patch to limit the burden on the compiler in working out
which method will eventually be called when we already know it.


Very nice, OK for trunk, thanks.


Re: PR 71181 Avoid rehash after reserve

2016-06-15 Thread Jonathan Wakely

On 14/06/16 22:34 +0200, François Dumont wrote:

On 14/06/2016 13:22, Jonathan Wakely wrote:

On 13/06/16 21:49 +0200, François Dumont wrote:

Hi

  I eventually would like to propose the attached patch.

  In tr1 I made sure we use a special past-the-end iterator that 
makes usage of lower_bound result without check safe.


I'm confused ... isn't that already done?


Indeed, but my intention was to make the sentinel values useless so that
we can remove them one day.


I don't like the current code because when you just look at the
lower_bound call you may wonder why the returned value is not tested.
You need to consider how __prime_list has been defined. With the '- 1'
in the call to lower_bound you don't need to look too far to understand it.




_S_n_primes is defined as:

  enum { _S_n_primes = sizeof(unsigned long) != 8 ? 256 : 256 + 48 };

The table of primes is:

extern const unsigned long __prime_list[] = // 256 + 1 or 256 + 48 + 1

Which means that _S_n_primes is already one less, so that the "end"
returned by lower_bound is already dereferenceable. That's what the
comment in the table suggests too:

  // Sentinel, so we don't have to test the result of lower_bound,
  // or, on 64-bit machines, rest of the table.
#if __SIZEOF_LONG__ != 8
  4294967291ul

So ...

diff --git a/libstdc++-v3/include/tr1/hashtable_policy.h 
b/libstdc++-v3/include/tr1/hashtable_policy.h

index 4ee6d45..24d1a59 100644
--- a/libstdc++-v3/include/tr1/hashtable_policy.h
+++ b/libstdc++-v3/include/tr1/hashtable_policy.h
@@ -420,8 +420,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _Prime_rehash_policy::
 _M_next_bkt(std::size_t __n) const
 {
-const unsigned long* __p = std::lower_bound(__prime_list, __prime_list
-+ _S_n_primes, __n);
+// Past-the-end iterator is made dereferenceable to avoid check on
+// lower_bound result.
+const unsigned long* __p
+  = std::lower_bound(__prime_list, __prime_list + _S_n_primes - 1, __n);


Is this redundant? Unless I'm misunderstanding something, _S_n_primes
already handles this.


Yes, it does for now, but not if __prime_list is the pure list of
prime numbers.


OK. And as I said below, lower_bound(primes, primes + nprimes - 1, n)
still works because anything greater than the second-to-last prime
should be treated as the last one anyway.

Would this comment make it clearer?

 // Don't include the last prime in the search, so that anything
 // higher than the second-to-last prime returns a past-the-end
 // iterator that can be dereferenced to get the last prime.
 const unsigned long* __p
   = std::lower_bound(__prime_list, __prime_list + _S_n_primes - 1, __n)
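
For reference, here is a minimal standalone sketch of that idea (the
prime values below are made up, not the real __prime_list): searching
only the first N - 1 entries always yields a dereferenceable iterator,
and anything above the second-to-last entry maps to the last one.

#include <algorithm>
#include <cassert>

int main()
{
  // The last entry plays the role of the sentinel.
  static const unsigned long primes[] = { 2, 5, 11, 23, 4294967291ul };
  const unsigned n = sizeof (primes) / sizeof (primes[0]);

  // The search excludes the last entry, so the "past-the-end" result
  // is still dereferenceable and gives the largest value.
  const unsigned long* p = std::lower_bound (primes, primes + n - 1, 1000ul);
  assert (*p == 4294967291ul);

  p = std::lower_bound (primes, primes + n - 1, 7ul);
  assert (*p == 11);
  return 0;
}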




The other changes in tr1/hashtable_policy.h are nice simplifications.

diff --git a/libstdc++-v3/src/c++11/hashtable_c++0x.cc 
b/libstdc++-v3/src/c++11/hashtable_c++0x.cc

index a5e6520..7cbd364 100644
--- a/libstdc++-v3/src/c++11/hashtable_c++0x.cc
+++ b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
@@ -46,22 +46,36 @@ namespace __detail
 {
   // Optimize lookups involving the first elements of __prime_list.
   // (useful to speed-up, eg, constructors)
-static const unsigned char __fast_bkt[12]
-  = { 2, 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11 };
+static const unsigned char __fast_bkt[13]
+  = { 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11, 13, 13 };

-if (__n <= 11)
+if (__n <= 12)
 {
   _M_next_resize =
 __builtin_ceil(__fast_bkt[__n] * (long double)_M_max_load_factor);
   return __fast_bkt[__n];
 }

+// Number of primes without sentinel.
   constexpr auto __n_primes
 = sizeof(__prime_list) / sizeof(unsigned long) - 1;
+// past-the-end iterator is made dereferenceable.
+constexpr auto __prime_list_end = __prime_list + __n_primes - 1;


I don't think this comment clarifies things very well.

Because of the sentinel and because __n_primes doesn't include the
sentinel, (__prime_list + __n_primes) is already dereferenceable
anyway, so the comment doesn't explain why there's *another* -1 here.


The comment is written as if there were no sentinel.


OK. I think a similar comment as suggested above could help, by being
more verbose about what's happening.

 // Don't include the last prime in the search, so that anything
 // higher than the second-to-last prime returns a past-the-end
 // iterator that can be dereferenced to get the last prime.





   const unsigned long* __next_bkt =
-  std::lower_bound(__prime_list + 5, __prime_list + __n_primes, __n);
+  std::lower_bound(__prime_list + 6, __prime_list_end, __n);
+
+if (*__next_bkt == __n && __next_bkt != __prime_list_end)
+  ++__next_bkt;


Can we avoid this check by searching for __n + 1 instead of __n with
the lower_bound call?


Yes, that's another option, I will give it a try.


I did some comparisons and this version seems to execute fewer
instructions in some simple tests, according to cachegrind.
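
For what it's worth, a tiny self-contained sketch (made-up table, and
ignoring the end-of-table guard) of why searching for __n + 1 gives the
same result as the explicit "== __n" bump:

#include <algorithm>
#include <cassert>

int main()
{
  static const unsigned long primes[] = { 2, 5, 11, 23, 47 };
  const unsigned long n = 11;

  // Version with the explicit check: take the first element >= n,
  // then skip it if it is an exact match.
  const unsigned long* with_check = std::lower_bound (primes, primes + 5, n);
  if (*with_check == n)
    ++with_check;

  // Version searching for n + 1: the first element >= n + 1 is the
  // first element strictly greater than n.
  const unsigned long* with_plus_one
    = std::lower_bound (primes, primes + 5, n + 1);

  assert (with_check == with_plus_one && *with_check == 23);
  return 0;
}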



RE: [PATCH] [ARC] New CPU C-define handler.

2016-06-15 Thread Claudiu Zissulescu
PING

> -Original Message-
> From: Claudiu Zissulescu
> Sent: Thursday, May 19, 2016 1:58 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Claudiu Zissulescu ; g...@amylaar.uk;
> francois.bed...@synopsys.com
> Subject: [PATCH] [ARC] New CPU C-define handler.
> 
> This patch refactors how we handle the built-in preprocessor macros and
> assertions for ARC.
> 
> OK to apply?
> Claudiu
> 
> gcc/
> 2016-05-02  Claudiu Zissulescu  
> 
>   * config/arc/arc-c.c: New file.
>   * config/arc/arc-c.def: Likewise.
>   * config/arc/t-arc: Likewise.
>   * config.gcc: Include arc-c.o as c and cpp object.
>   * config/arc/arc-protos.h (arc_cpu_cpp_builtins): Add prototype.
>   * config/arc/arc.h (TARGET_CPU_CPP_BUILTINS): Use
>   arc_cpu_cpp_builtins.
> ---
>  gcc/config.gcc  |  2 ++
>  gcc/config/arc/arc-c.c  | 69
> +
>  gcc/config/arc/arc-c.def| 68
> 
>  gcc/config/arc/arc-protos.h |  1 +
>  gcc/config/arc/arc.h| 56 +---
>  gcc/config/arc/t-arc| 29 +++
>  6 files changed, 170 insertions(+), 55 deletions(-)
>  create mode 100644 gcc/config/arc/arc-c.c
>  create mode 100644 gcc/config/arc/arc-c.def
>  create mode 100644 gcc/config/arc/t-arc
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 4e98df7..148e020 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -323,6 +323,8 @@ am33_2.0-*-linux*)
>   ;;
>  arc*-*-*)
>   cpu_type=arc
> + c_target_objs="arc-c.o"
> + cxx_target_objs="arc-c.o"
>   ;;
>  arm*-*-*)
>   cpu_type=arm
> diff --git a/gcc/config/arc/arc-c.c b/gcc/config/arc/arc-c.c
> new file mode 100644
> index 000..3bf3fd2
> --- /dev/null
> +++ b/gcc/config/arc/arc-c.c
> @@ -0,0 +1,69 @@
> +/* Copyright (C) 2016 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.
> +*/
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "tree.h"
> +#include "tm_p.h"
> +#include "cpplib.h"
> +#include "c-family/c-common.h"
> +#include "target.h"
> +
> +#define builtin_define(TXT) cpp_define (pfile, TXT)
> +#define builtin_assert(TXT) cpp_assert (pfile, TXT)
> +
> +/* Define or undefine macros based on the current target.  */
> +
> +static void
> +def_or_undef_macro (cpp_reader* pfile, const char *name, bool def_p)
> +{
> +  if (def_p)
> +cpp_define (pfile, name);
> +  else
> +cpp_undef (pfile, name);
> +}
> +
> +/* Helper for TARGET_CPU_CPP_BUILTINS hook.  */
> +
> +void
> +arc_cpu_cpp_builtins (cpp_reader * pfile)
> +{
> +  builtin_assert ("cpu=arc");
> +  builtin_assert ("machine=arc");
> +
> +  builtin_define ("__arc__");
> +
> +#undef ARC_C_DEF
> +#define ARC_C_DEF(NAME, CONDITION)   \
> +  def_or_undef_macro (pfile, NAME, CONDITION);
> +
> +#include "arc-c.def"
> +#undef ARC_C_DEF
> +
> +  builtin_define_with_int_value ("__ARC_TLS_REGNO__",
> +  arc_tp_regno);
> +
> +  builtin_define (TARGET_BIG_ENDIAN
> +   ? "__BIG_ENDIAN__" : "__LITTLE_ENDIAN__");
> +  if (TARGET_BIG_ENDIAN)
> +builtin_define ("__big_endian__");
> +
> +}
> diff --git a/gcc/config/arc/arc-c.def b/gcc/config/arc/arc-c.def
> new file mode 100644
> index 000..88c65ac
> --- /dev/null
> +++ b/gcc/config/arc/arc-c.def
> @@ -0,0 +1,68 @@
> +/* Copyright (C) 2016 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.
> +*/
> +
> +ARC_C_DEF ("__ARC600__", TARGET_ARC600)
> +ARC_C_DEF ("__ARC601__", TARGET_ARC601)
> +ARC_C_DEF 

RE: [PATCH] [ARC] Fix emitting jump tables for ARCv2

2016-06-15 Thread Claudiu Zissulescu
PING

> -Original Message-
> From: Claudiu Zissulescu
> Sent: Tuesday, April 26, 2016 1:29 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Claudiu Zissulescu ; g...@amylaar.uk;
> francois.bed...@synopsys.com; jeremy.benn...@embecosm.com
> Subject: [PATCH] [ARC] Fix emitting jump tables for ARCv2
> 
> The compact casesi option only makes sense for ARCv1 cores. For ARCv2
> cores we use the regular expansion.
> 
> OK to apply?
> Claudiu
> 
> gcc/
> 2016-04-26  Claudiu Zissulescu  
> 
>   * common/config/arc/arc-common.c
> (arc_option_optimization_table):
>   Disable compact casesi as default option.
>   * config/arc/arc.c (arc_override_options): Enable compact casesi
>   option for non-ARCv2 cores.
>   * config/arc/arc.md (movsi_insn): Use @pcl relocation.
>   (movsi_ne): Update assembly printing pattern.
>   (casesi_load): Use short ld instruction.
> ---
>  gcc/common/config/arc/arc-common.c |  1 -
>  gcc/config/arc/arc.c   |  7 +++
>  gcc/config/arc/arc.md  | 11 +++
>  3 files changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/common/config/arc/arc-common.c
> b/gcc/common/config/arc/arc-common.c
> index 64fb053..17cc1bd 100644
> --- a/gcc/common/config/arc/arc-common.c
> +++ b/gcc/common/config/arc/arc-common.c
> @@ -56,7 +56,6 @@ static const struct default_options
> arc_option_optimization_table[] =
>  { OPT_LEVELS_ALL, OPT_mbbit_peephole, NULL, 1 },
>  { OPT_LEVELS_SIZE, OPT_mq_class, NULL, 1 },
>  { OPT_LEVELS_SIZE, OPT_mcase_vector_pcrel, NULL, 1 },
> -{ OPT_LEVELS_SIZE, OPT_mcompact_casesi, NULL, 1 },
>  { OPT_LEVELS_NONE, 0, NULL, 0 }
>};
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 6f2136e..be55c99 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -812,6 +812,13 @@ arc_override_options (void)
>if (arc_size_opt_level == 3)
>  optimize_size = 1;
> 
> +  /* Compact casesi is not a valid option for ARCv2 family, disable
> + it.  */
> +  if (TARGET_V2)
> +TARGET_COMPACT_CASESI = 0;
> +  else if (optimize_size == 1)
> +TARGET_COMPACT_CASESI = 1;
> +
>if (flag_pic)
>  target_flags |= MASK_NO_SDATA_SET;
> 
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 718443b..aec4b37 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -713,7 +713,7 @@
> ror %0,((%1*2+1) & 0x3f) ;6
> mov%? %0,%1   ;7
> add %0,%S1;8
> -   * return arc_get_unalign () ? \"add %0,pcl,%1-.+2\" : \"add %0,pcl,%1-.\";
> +   add %0,pcl,%1@pcl
> mov%? %0,%S1%&;10
> mov%? %0,%S1  ;11
> ld%?%U1 %0,%1%&   ;12
> @@ -3467,8 +3467,8 @@
>""
>"@
>   * current_insn_predicate = 0; return \"sub%?.ne %0,%0,%0%&\";
> -mov_s.ne %0,%1
> -mov_s.ne %0,%1
> +* current_insn_predicate = 0; return \"mov%?.ne %0,%1\";
> +* current_insn_predicate = 0; return \"mov%?.ne %0,%1\";
>   mov.ne %0,%1
>   mov.ne %0,%S1"
>[(set_attr "type" "cmove")
> @@ -3777,7 +3777,10 @@
>switch (GET_MODE (diff_vec))
>  {
>  case SImode:
> -  return \"ld.as %0,[%1,%2]%&\";
> +  if ((which_alternative == 0) && TARGET_CODE_DENSITY)
> + return \"ld_s.as %0,[%1,%2]%&\";
> +  else
> + return \"ld.as %0,[%1,%2]%&\";
>  case HImode:
>if (ADDR_DIFF_VEC_FLAGS (diff_vec).offset_unsigned)
>   return \"ld%_.as %0,[%1,%2]\";
> --
> 1.9.1



RE: [PATCH 0/2] [ARC] Refurbish backend options

2016-06-15 Thread Claudiu Zissulescu
PING

> -Original Message-
> From: Claudiu Zissulescu
> Sent: Monday, May 30, 2016 2:33 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Claudiu Zissulescu ; g...@amylaar.uk;
> francois.bed...@synopsys.com
> Subject: [PATCH 0/2] [ARC] Refurbish backend options
> 
> This series of patches redefines how we handle the options within the
> ARC backend. Firstly, a number of options were deprecated because they
> were meant to control the assembler's behavior, and the assembler has
> been overhauled and now ignores the options in question. Secondly, we
> remove the capitalized cpu names accepted by the mcpu option. Finally,
> we introduce new cpu options, based on the Synopsys predefined CPU
> templates, to make it easier to translate a hardware CPU template into
> gcc options. These new cpu options are handled by a number of scripts
> which seamlessly generate gcc options and multilib variants, and check
> the allowed hardware extensions against gcc command-line options.
> 
> Please find two patches, one which is refurbishing the ARC options,
> and the second one which updates the ARC specific tests.
> 
> Claudiu Zissulescu (2):
>   New option handling, refurbish multilib support.
>   Update target specific tests.
> 
>  gcc/common/config/arc/arc-common.c   | 162 --
>  gcc/config.gcc   |  47 +++---
>  gcc/config/arc/arc-arch.h| 120 +
>  gcc/config/arc/arc-arches.def|  35 
>  gcc/config/arc/arc-c.def |   4 +
>  gcc/config/arc/arc-cpus.def  |  47 ++
>  gcc/config/arc/arc-options.def   |  69 
>  gcc/config/arc/arc-opts.h|  47 +-
>  gcc/config/arc/arc-protos.h  |   1 -
>  gcc/config/arc/arc-tables.opt|  90 ++
>  gcc/config/arc/arc.c | 186 +++--
>  gcc/config/arc/arc.h |  91 +-
>  gcc/config/arc/arc.md|   5 -
>  gcc/config/arc/arc.opt   | 109 
>  gcc/config/arc/driver-arc.c  |  80 +
>  gcc/config/arc/genmultilib.awk   | 204 
> +++
>  gcc/config/arc/genoptions.awk|  86 ++
>  gcc/config/arc/t-arc |  19 +++
>  gcc/config/arc/t-arc-newlib  |  46 -
>  gcc/config/arc/t-arc-uClibc  |  20 ---
>  gcc/config/arc/t-multilib|  51 ++
>  gcc/config/arc/t-uClibc  |  20 +++
>  gcc/doc/invoke.texi  |  86 --
>  gcc/testsuite/gcc.target/arc/abitest.S   |  31 
>  gcc/testsuite/gcc.target/arc/arc.exp |  66 +++-
>  gcc/testsuite/gcc.target/arc/barrel-shifter-1.c  |   2 +-
>  gcc/testsuite/gcc.target/arc/builtin_simd.c  |   1 +
>  gcc/testsuite/gcc.target/arc/builtin_simdarc.c   |   1 +
>  gcc/testsuite/gcc.target/arc/cmem-1.c|   1 +
>  gcc/testsuite/gcc.target/arc/cmem-2.c|   1 +
>  gcc/testsuite/gcc.target/arc/cmem-3.c|   1 +
>  gcc/testsuite/gcc.target/arc/cmem-4.c|   1 +
>  gcc/testsuite/gcc.target/arc/cmem-5.c|   1 +
>  gcc/testsuite/gcc.target/arc/cmem-6.c|   1 +
>  gcc/testsuite/gcc.target/arc/cmem-7.c|   1 +
>  gcc/testsuite/gcc.target/arc/extzv-1.c   |   1 +
>  gcc/testsuite/gcc.target/arc/insv-1.c|   1 +
>  gcc/testsuite/gcc.target/arc/insv-2.c|   1 +
>  gcc/testsuite/gcc.target/arc/interrupt-1.c   |   7 +-
>  gcc/testsuite/gcc.target/arc/interrupt-2.c   |   1 +
>  gcc/testsuite/gcc.target/arc/interrupt-3.c   |   2 +-
>  gcc/testsuite/gcc.target/arc/jump-around-jump.c  |   2 +-
>  gcc/testsuite/gcc.target/arc/mA6.c   |   1 +
>  gcc/testsuite/gcc.target/arc/mA7.c   |   1 +
>  gcc/testsuite/gcc.target/arc/mARC600.c   |   1 +
>  gcc/testsuite/gcc.target/arc/mARC601.c   |   3 +-
>  gcc/testsuite/gcc.target/arc/mARC700.c   |   1 +
>  gcc/testsuite/gcc.target/arc/mcpu-arc600.c   |   3 +-
>  gcc/testsuite/gcc.target/arc/mcpu-arc601.c   |   5 +-
>  gcc/testsuite/gcc.target/arc/mcpu-arc700.c   |   3 +-
>  gcc/testsuite/gcc.target/arc/mcrc.c  |   8 -
>  gcc/testsuite/gcc.target/arc/mdpfp.c |   1 +
>  gcc/testsuite/gcc.target/arc/mdsp-packa.c|   9 -
>  gcc/testsuite/gcc.target/arc/mdvbf.c |   9 -
>  gcc/testsuite/gcc.target/arc/mmac-24.c   |   8 -
>  gcc/testsuite/gcc.target/arc/mmac-d16.c  |   9 -
>  gcc/testsuite/gcc.target/arc/mno-crc.c   |  11 --
>  gcc/testsuite/gcc.target/arc/mno-dsp-packa.c |  11 --
>  gcc/testsuite/gcc.target/arc/mno-dvbf.c  |  11 --
>  
