Re: [PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-23 Thread Jeff Law
On 10/16/2017 04:56 AM, Wilco Dijkstra wrote:
> This patch cleans up autopref scheduling.
> 
> The code is greatly simplified.  Sort accesses on the offset first, and
> only if the offsets are the same fall back to other comparisons in
> rank_for_schedule.  This doesn't restore the original behaviour at all,
> since we no longer compare the base address, but it now defines a total
> sorting order.  More work will be required to improve the sorting so
> that only loads/stores with the same base are affected.
> 
> AArch64 bootstrap completes.
> 
> OK for commit?
> 
> ChangeLog:
> 2017-10-03  Wilco Dijkstra  
> 
>     PR rtl-optimization/82396
>     * gcc/haifa-sched.c (autopref_multipass_init): Simplify
>     initialization.
>     (autopref_rank_data): Simplify sort order.
>     * gcc/sched-int.h (autopref_multipass_data_): Remove
>     multi_mem_insn_p, min_offset and max_offset.
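
For illustration, a minimal sketch of the total order being described:
compare offsets first, and fall back to a secondary criterion only when
the offsets are equal.  The structure and field names here are
hypothetical, not the actual haifa-sched.c code.

  struct mem_access
  {
    int offset;   /* hypothetical offset part of the address */
    int prio;     /* hypothetical rank_for_schedule tie-breaker */
  };

  static int
  access_cmp (const void *pa, const void *pb)
  {
    const struct mem_access *a = (const struct mem_access *) pa;
    const struct mem_access *b = (const struct mem_access *) pb;
    if (a->offset != b->offset)
      return a->offset < b->offset ? -1 : 1;
    /* Equal offsets: the fallback must itself be a total order so the
       whole comparator stays antisymmetric and transitive, which is
       what qsort requires and what PR82396 diagnosed.  */
    return (a->prio > b->prio) - (a->prio < b->prio);
  }
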
OK. Sorry for the delay.

jeff


Re: Make more use of df_read_modify_subreg_p

2017-10-23 Thread Jeff Law
On 10/13/2017 10:08 AM, Richard Sandiford wrote:
> Jeff Law  writes:
>> On 08/24/2017 12:25 PM, Richard Sandiford wrote:
>>> Segher Boessenkool  writes:
 On Wed, Aug 23, 2017 at 11:49:03AM +0100, Richard Sandiford wrote:
> This patch uses df_read_modify_subreg_p to check whether writing
> to a subreg would preserve some of the existing contents.

 combine does not keep the DF info up-to-date -- but that is no
 problem here, df_read_modify_subreg_p uses no DF info at all.  Maybe
 it should not have "df_" in the name?
>>>
>>> Yeah, I guess that's a bit confusing.  I've just posted a patch
>>> to rename it.
>>>
>>> Here's a version of the patch that applies on top of that one.
>>> Tested as before.  OK to install?
>>>
>>> Thanks,
>>> Richard
>>>
>>>
>>> 2017-08-24  Richard Sandiford  
>>> Alan Hayward  
>>> David Sherwood  
>>>
>>> gcc/
>>> * caller-save.c (mark_referenced_regs):  Use read_modify_subreg_p.
>>> * combine.c (find_single_use_1): Likewise.
>>> (expand_field_assignment): Likewise.
>>> (move_deaths): Likewise.
>>> * lra-constraints.c (simplify_operand_subreg): Likewise.
>>> (curr_insn_transform): Likewise.
>>> * lra.c (collect_non_operand_hard_regs): Likewise.
>>> (add_regs_to_insn_regno_info): Likewise.
>>> * rtlanal.c (reg_referenced_p): Likewise.
>>> (covers_regno_no_parallel_p): Likewise.
>>>
>>
>>
>>> Index: gcc/combine.c
>>> ===
>>> --- gcc/combine.c   2017-08-24 19:22:26.163269637 +0100
>>> +++ gcc/combine.c   2017-08-24 19:22:45.218100970 +0100
>>> @@ -579,10 +579,7 @@ find_single_use_1 (rtx dest, rtx *loc)
>>>   && !REG_P (SET_DEST (x))
>>>   && ! (GET_CODE (SET_DEST (x)) == SUBREG
>>> && REG_P (SUBREG_REG (SET_DEST (x)))
>>> -   && (((GET_MODE_SIZE (GET_MODE (SUBREG_REG (SET_DEST (x
>>> - + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
>>> -   == ((GET_MODE_SIZE (GET_MODE (SET_DEST (x)))
>>> -+ (UNITS_PER_WORD - 1)) / UNITS_PER_WORD
>>> +   && !read_modify_subreg_p (SET_DEST (x
>>> break;
>> Is this correct for a paradoxical subreg?  ISTM the original code was
>> checking for a subreg that just changes the mode, but not the size
>> (subreg:SI (reg:SF)) or (subreg:DF (reg:DI)) kinds of things.  It would
>> reject a paradoxical AFAICT.
>>
>> As written now I think the condition would be true for a paradoxical.
>>
>> Similarly for the other two instances in combine.c and the changes in
>> rtlanal.c.
>>
>> In some of those cases you might be able to argue that it's the right
>> way to handle a paradoxical.  I haven't thought a whole lot about that
>> angle, but mention it as a possible way your change might still be correct.
> 
> Yeah, I agree this'll change the handling of paradoxical subregs that
> occupy more words than the SUBREG_REG, but I think the new version is
> correct.  The comment says:
> 
>   /* If the destination is anything other than CC0, PC, a REG or a SUBREG
>of a REG that occupies all of the REG, the insn uses DEST if
>it is mentioned in the destination or the source.  Otherwise, we
>need just check the source.  */
> 
> and a paradoxical subreg does occupy all of the SUBREG_REG.
> 
> The code is trying to work out whether the instruction "reads" the
> destination if you view partial stores as a read of the old value
> followed by a write of a partially-updated value, whereas writing to a
> paradoxical subreg preserves none of the original value.  And that's
> also the semantics that the current code uses for "normal" word-sized
> paradoxical subregs.
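
To illustrate the distinction with hypothetical modes (32-bit words):

  (set (subreg:HI (reg:SI r) 0) ...)  ;; partial store: the untouched
                                      ;; bits of r survive, so the insn
                                      ;; effectively reads the old value
  (set (subreg:DI (reg:SI r) 0) ...)  ;; paradoxical: r is written in
                                      ;; full, nothing survives

so treating a paradoxical subreg as occupying all of the SUBREG_REG is
consistent with the partial-store-as-read-modify-write model above.
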
OK.  Thanks for clarifying.

Jeff


[RFA][PATCH] Convert sprintf warning code to use a dominator walk

2017-10-23 Thread Jeff Law

Martin,

I'd like your thoughts on this patch.

One of the things I'm working on is changes that would allow passes that
use dominator walks to trivially perform context sensitive range
analysis as a part of their dominator walk.

As I outlined earlier this would allow us to easily fix the false
positive sprintf warning reported a week or two ago.

This patch converts the sprintf warning code to perform a dominator walk
rather than just walking the blocks in whatever order they appear in the
basic block array.

From an implementation standpoint we derive a new class sprintf_dom_walker
from the dom_walker class.  Like other dom walkers we walk statements
from within the before_dom_children member function.  Very standard stuff.

I moved handle_gimple_call and various dependencies into the
sprintf_dom_walker class to facilitate calling handle_gimple_call from
within the before_dom_children member function.  There's light fallout
in various places where the call_info structure was explicitly expected
to be found in the pass_sprintf_length class, but is now found in the
sprintf_dom_walker class.
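
The diff below is truncated before the execute hunk, but the conversion
presumably boils down to something like this (a sketch, not necessarily
the exact committed code):

  unsigned int
  pass_sprintf_length::execute (function *fun)
  {
    calculate_dominance_info (CDI_DOMINATORS);
    sprintf_dom_walker walker;
    walker.walk (ENTRY_BLOCK_PTR_FOR_FN (fun));
    /* ... fold return values, clean up ...  */
    return 0;
  }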

This has been bootstrapped and regression tested on x86_64-linux-gnu.
I've also layered my embedded VRP analysis on top of this work and
verified that it does indeed fix the reported false positive.

Thoughts?

Jeff






* gimple-ssa-sprintf.c: Include domwalk.h.
(class sprintf_dom_walker): New class, derived from dom_walker.
(sprintf_dom_walker::before_dom_children): New function.
(struct call_info): Moved into sprintf_dom_walker class.
(compute_format_length, handle_gimple_call): Likewise.
(pass_sprintf_length::execute): Call the dominator walker rather
than walking the statements.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 9770df7..2223f24 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -79,6 +79,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "toplev.h"
 #include "substring-locations.h"
 #include "diagnostic.h"
+#include "domwalk.h"
 
 /* The likely worst case value of MB_LEN_MAX for the target, large enough
for UTF-8.  Ideally, this would be obtained by a target hook if it were
@@ -113,6 +114,19 @@ static int warn_level;
 
 struct format_result;
 
+class sprintf_dom_walker : public dom_walker
+{
+ public:
+  sprintf_dom_walker () : dom_walker (CDI_DOMINATORS) {}
+  ~sprintf_dom_walker () {}
+
+  virtual edge before_dom_children (basic_block);
+  bool handle_gimple_call (gimple_stmt_iterator *);
+
+  struct call_info;
+  bool compute_format_length (call_info &, format_result *);
+};
+
 class pass_sprintf_length : public gimple_opt_pass
 {
   bool fold_return_value;
@@ -135,10 +149,6 @@ public:
   fold_return_value = param;
 }
 
-  bool handle_gimple_call (gimple_stmt_iterator *);
-
-  struct call_info;
-  bool compute_format_length (call_info &, format_result *);
 };
 
 bool
@@ -976,7 +986,7 @@ bytes_remaining (unsigned HOST_WIDE_INT navail, const 
format_result )
 
 /* Description of a call to a formatted function.  */
 
-struct pass_sprintf_length::call_info
+struct sprintf_dom_walker::call_info
 {
   /* Function call statement.  */
   gimple *callstmt;
@@ -2348,7 +2358,7 @@ format_plain (const directive , tree)
should be diagnosed given the AVAILable space in the destination.  */
 
 static bool
-should_warn_p (const pass_sprintf_length::call_info ,
+should_warn_p (const sprintf_dom_walker::call_info ,
   const result_range , const result_range )
 {
   if (result.max <= avail.min)
@@ -2419,7 +2429,7 @@ should_warn_p (const pass_sprintf_length::call_info ,
 
 static bool
 maybe_warn (substring_loc , location_t argloc,
-   const pass_sprintf_length::call_info ,
+   const sprintf_dom_walker::call_info ,
const result_range _range, const result_range ,
const directive )
 {
@@ -2716,7 +2726,7 @@ maybe_warn (substring_loc , location_t argloc,
in *RES.  Return true if the directive has been handled.  */
 
 static bool
-format_directive (const pass_sprintf_length::call_info ,
+format_directive (const sprintf_dom_walker::call_info ,
  format_result *res, const directive )
 {
   /* Offset of the beginning of the directive from the beginning
@@ -3004,7 +3014,7 @@ format_directive (const pass_sprintf_length::call_info 
,
the directive.  */
 
 static size_t
-parse_directive (pass_sprintf_length::call_info ,
+parse_directive (sprintf_dom_walker::call_info ,
 directive , format_result *res,
 const char *str, unsigned *argno)
 {
@@ -3431,7 +3441,7 @@ parse_directive (pass_sprintf_length::call_info ,
that caused the processing to be terminated early).  */
 
 bool
-pass_sprintf_length::compute_format_length (call_info ,
+sprintf_dom_walker::compute_format_length (call_info ,
format_result *res)
 {
   if (dump_file)
@@ -3514,7 +3524,7 @@ 

Re: [PATCH, rs6000] Add Power 9 support for vec_first builtins

2017-10-23 Thread Segher Boessenkool
Hi Carl,

On Thu, Oct 19, 2017 at 04:31:13PM -0700, Carl Love wrote:
>   * config/rs6000/rs6000-builtin.def (VFIRSTMATCHINDEX,
>   VFIRSTMATCHOREOSINDEX, VFIRSTMISMATCHINDEX, VFIRSTMISMATCHOREOSINDEX):
>   Add BU_P9V_AV_2 expansions for the builtins.

Those names are a bit unwieldy ;-)

> +BU_P9V_OVERLOAD_2 (VFIRSTMISMATCHOREOSINDEX, 'first_mismatch_or_eos_index")

How did this compile?


Segher


Re: [Patch] Edit contrib/ files to download gfortran prerequisites

2017-10-23 Thread Damian Rouson
 


On October 21, 2017 at 11:17:49 AM, Bernhard Reutner-Fischer
(rep.dot@gmail.com) wrote:

>  
> + die "Invoking wget and curl in the 'download_prerequisites' script failed."
>  
> I suggest "Neither wget nor curl found, cannot download tarballs."

Thanks for the suggestion.  I have now replaced the previous message with your 
suggested one.

>  
> As an open-mpi user I question why this hardcodes MPICH, fwiw.

The primary reason is because Fortran 2015 has features that support the 
writing 
of fault-tolerant parallel algorithms. GCC 7 supports these features using 
OpenCoarrays,
which in turn uses features that are in MPICH 3.2.  I don’t think there is 
anything
comparable in an OpenMPI release yet, although I’m aware that OpenMPI 
developers are
working on it.  

Nonetheless, I would be glad to add a “--with-mpi” flag that would enable users
to specify
the MPI of their choice. We could also allow for non-MPI options.  For example, 
OpenSHMEM
has been tested as an alternative recently.  Either way, I’m thinking I’ll hold 
off on 
investing any more time in this patch for now.  It seems unlikely to be 
approved for the
trunk until other work completes and it’s uncertain when the other work will 
complete.

Damian


Re: [PATCH] i386: Don't generate ENDBR if function is only called directly

2017-10-23 Thread H.J. Lu
On Mon, Oct 23, 2017 at 3:19 PM, Tsimbalist, Igor V
 wrote:
> You are right. The functions in the tests should be changed to static scope 
> to trigger the check in the patch. After that I expect there should be no 
> endbr generated at all for the static functions and that's wrong.
>

Here is the updated patch with new testcases.  OK for trunk if there are
no regressions?

Thanks.

H.J.
>
>> -Original Message-
>> From: H.J. Lu [mailto:hjl.to...@gmail.com]
>> Sent: Tuesday, October 24, 2017 12:06 AM
>> To: Tsimbalist, Igor V 
>> Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] i386: Don't generate ENDBR if function is only called
>> directly
>>
>> On Mon, Oct 23, 2017 at 3:01 PM, Tsimbalist, Igor V
>>  wrote:
>> > Existing tests cet-label.c cet-switch-2.c cet-sjlj-1.c cet-sjlj-3.c should 
>> > catch
>> this.
>>
>> There are no regressions with my patch.  Did I miss something?
>>
>> > Igor
>> >
>> >
>> >> -Original Message-
>> >> From: H.J. Lu [mailto:hjl.to...@gmail.com]
>> >> Sent: Monday, October 23, 2017 11:50 PM
>> >> To: Tsimbalist, Igor V 
>> >> Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org
>> >> Subject: Re: [PATCH] i386: Don't generate ENDBR if function is only called
>> >> directly
>> >>
>> >> On Mon, Oct 23, 2017 at 2:44 PM, Tsimbalist, Igor V
>> >>  wrote:
>> >> > The change will skip a whole function from endbr processing by
>> >> rest_of_insert_endbranch,
>> >> > which inserts endbr not only at the beginning of the function but inside
>> the
>> >> function's
>> >> > body also. For example, tests with setjmp should fail.
>> >> >
>> >> > I would suggest to insert the check in rest_of_insert_endbranch
>> function,
>> >> something like this
>> >> >
>> >> >   if (!(lookup_attribute ("nocf_check",
>> >> >   TYPE_ATTRIBUTES (TREE_TYPE (cfun->decl)))
>> >> > || cgraph_node::get (fun->decl)->only_called_directly_p ())
>> >> >
>> >> > Igor
>> >>
>> >> Can you provide one test for each case to cover all of them?
>> >>
>> >>
>> >> >
>> >> >> -Original Message-
>> >> >> From: Uros Bizjak [mailto:ubiz...@gmail.com]
>> >> >> Sent: Monday, October 23, 2017 9:26 PM
>> >> >> To: H.J. Lu 
>> >> >> Cc: gcc-patches@gcc.gnu.org; Tsimbalist, Igor V
>> >> >> 
>> >> >> Subject: Re: [PATCH] i386: Don't generate ENDBR if function is only
>> called
>> >> >> directly
>> >> >>
>> >> >> On Sun, Oct 22, 2017 at 4:13 PM, H.J. Lu  wrote:
>> >> >> > There is no need to insert ENDBR instruction if function is only 
>> >> >> > called
>> >> >> > directly.
>> >> >> >
>> >> >> > OK for trunk if there are no regressions?
>> >> >>
>> >> >> Patch needs to be OK'd by Igor first.
>> >> >>
>> >> >> Uros.
>> >> >>
>> >> >> > H.J.
>> >> >> > 
>> >> >> > gcc/
>> >> >> >
>> >> >> > PR target/82659
>> >> >> > * config/i386/i386.c (pass_insert_endbranch::gate): Return
>> >> >> > false if function is only called directly.
>> >> >> >
>> >> >> > gcc/testsuite/
>> >> >> >
>> >> >> > PR target/82659
>> >> >> > * gcc.target/i386/pr82659-1.c: New test.
>> >> >> > * gcc.target/i386/pr82659-2.c: Likewise.
>> >> >> > * gcc.target/i386/pr82659-3.c: Likewise.
>> >> >> > * gcc.target/i386/pr82659-4.c: Likewise.
>> >> >> > * gcc.target/i386/pr82659-5.c: Likewise.
>> >> >> > * gcc.target/i386/pr82659-6.c: Likewise.
>> >> >> > ---
>> >> >> >  gcc/config/i386/i386.c|  6 --
>> >> >> >  gcc/testsuite/gcc.target/i386/pr82659-1.c | 19 +++
>> >> >> >  gcc/testsuite/gcc.target/i386/pr82659-2.c | 18 ++
>> >> >> >  gcc/testsuite/gcc.target/i386/pr82659-3.c | 21 +
>> >> >> >  gcc/testsuite/gcc.target/i386/pr82659-4.c | 15 +++
>> >> >> >  gcc/testsuite/gcc.target/i386/pr82659-5.c | 10 ++
>> >> >> >  gcc/testsuite/gcc.target/i386/pr82659-6.c | 19 +++
>> >> >> >  7 files changed, 106 insertions(+), 2 deletions(-)
>> >> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-1.c
>> >> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-2.c
>> >> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-3.c
>> >> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-4.c
>> >> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-5.c
>> >> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-6.c
>> >> >> >
>> >> >> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> >> >> > index fb0b7e71469..b86504378ae 100644
>> >> >> > --- a/gcc/config/i386/i386.c
>> >> >> > +++ b/gcc/config/i386/i386.c
>> >> >> > @@ -2693,9 +2693,11 @@ public:
>> >> >> >{}
>> >> >> >
>> >> >> >/* opt_pass 

RE: [PATCH] i386: Don't generate ENDBR if function is only called directly

2017-10-23 Thread Tsimbalist, Igor V
You are right. The functions in the tests should be changed to static scope to 
trigger the check in the patch. After that I expect there should be no endbr 
generated at all for the static functions and that's wrong.

Igor


> -Original Message-
> From: H.J. Lu [mailto:hjl.to...@gmail.com]
> Sent: Tuesday, October 24, 2017 12:06 AM
> To: Tsimbalist, Igor V 
> Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] i386: Don't generate ENDBR if function is only called
> directly
> 
> On Mon, Oct 23, 2017 at 3:01 PM, Tsimbalist, Igor V
>  wrote:
> > Existing tests cet-label.c cet-switch-2.c cet-sjlj-1.c cet-sjlj-3.c should 
> > catch
> this.
> 
> There are no regressions with my patch.  Did I miss something?
> 
> > Igor
> >
> >
> >> -Original Message-
> >> From: H.J. Lu [mailto:hjl.to...@gmail.com]
> >> Sent: Monday, October 23, 2017 11:50 PM
> >> To: Tsimbalist, Igor V 
> >> Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] i386: Don't generate ENDBR if function is only called
> >> directly
> >>
> >> On Mon, Oct 23, 2017 at 2:44 PM, Tsimbalist, Igor V
> >>  wrote:
> >> > The change will skip a whole function from endbr processing by
> >> rest_of_insert_endbranch,
> >> > which inserts endbr not only at the beginning of the function but inside
> the
> >> function's
> >> > body also. For example, tests with setjmp should fail.
> >> >
> >> > I would suggest to insert the check in rest_of_insert_endbranch
> function,
> >> something like this
> >> >
> >> >   if (!(lookup_attribute ("nocf_check",
> >> >   TYPE_ATTRIBUTES (TREE_TYPE (cfun->decl)))
> >> > || cgraph_node::get (fun->decl)->only_called_directly_p ())
> >> >
> >> > Igor
> >>
> >> Can you provide one test for each case to cover all of them?
> >>
> >>
> >> >
> >> >> -Original Message-
> >> >> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> >> >> Sent: Monday, October 23, 2017 9:26 PM
> >> >> To: H.J. Lu 
> >> >> Cc: gcc-patches@gcc.gnu.org; Tsimbalist, Igor V
> >> >> 
> >> >> Subject: Re: [PATCH] i386: Don't generate ENDBR if function is only
> called
> >> >> directly
> >> >>
> >> >> On Sun, Oct 22, 2017 at 4:13 PM, H.J. Lu  wrote:
> >> >> > There is no need to insert ENDBR instruction if function is only 
> >> >> > called
> >> >> > directly.
> >> >> >
> >> >> > OK for trunk if there are no regressions?
> >> >>
> >> >> Patch needs to be OK'd by Igor first.
> >> >>
> >> >> Uros.
> >> >>
> >> >> > H.J.
> >> >> > 
> >> >> > gcc/
> >> >> >
> >> >> > PR target/82659
> >> >> > * config/i386/i386.c (pass_insert_endbranch::gate): Return
> >> >> > false if function is only called directly.
> >> >> >
> >> >> > gcc/testsuite/
> >> >> >
> >> >> > PR target/82659
> >> >> > * gcc.target/i386/pr82659-1.c: New test.
> >> >> > * gcc.target/i386/pr82659-2.c: Likewise.
> >> >> > * gcc.target/i386/pr82659-3.c: Likewise.
> >> >> > * gcc.target/i386/pr82659-4.c: Likewise.
> >> >> > * gcc.target/i386/pr82659-5.c: Likewise.
> >> >> > * gcc.target/i386/pr82659-6.c: Likewise.
> >> >> > ---
> >> >> >  gcc/config/i386/i386.c|  6 --
> >> >> >  gcc/testsuite/gcc.target/i386/pr82659-1.c | 19 +++
> >> >> >  gcc/testsuite/gcc.target/i386/pr82659-2.c | 18 ++
> >> >> >  gcc/testsuite/gcc.target/i386/pr82659-3.c | 21 +
> >> >> >  gcc/testsuite/gcc.target/i386/pr82659-4.c | 15 +++
> >> >> >  gcc/testsuite/gcc.target/i386/pr82659-5.c | 10 ++
> >> >> >  gcc/testsuite/gcc.target/i386/pr82659-6.c | 19 +++
> >> >> >  7 files changed, 106 insertions(+), 2 deletions(-)
> >> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-1.c
> >> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-2.c
> >> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-3.c
> >> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-4.c
> >> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-5.c
> >> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-6.c
> >> >> >
> >> >> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> >> >> > index fb0b7e71469..b86504378ae 100644
> >> >> > --- a/gcc/config/i386/i386.c
> >> >> > +++ b/gcc/config/i386/i386.c
> >> >> > @@ -2693,9 +2693,11 @@ public:
> >> >> >{}
> >> >> >
> >> >> >/* opt_pass methods: */
> >> >> > -  virtual bool gate (function *)
> >> >> > +  virtual bool gate (function *fun)
> >> >> >  {
> >> >> > -  return ((flag_cf_protection & CF_BRANCH) && TARGET_IBT);
> >> >> > +  return ((flag_cf_protection & CF_BRANCH)
> >> >> > + && TARGET_IBT
> >> >> > +  

Re: Make tests failing with version namespace UNSUPPORTED

2017-10-23 Thread Jonathan Wakely

On 23/10/17 22:07 +0200, François Dumont wrote:

Hi

    I completed execution of all tests and added the 
dg-require-normal-namespace to a few more files.


    I also eventually prefer to keep dg-require-normal-mode and 
dg-require-normal-namespace separated, the first for alternative
modes, the latter for versioned namespace.


That seems better, thanks.

    With this patch there is no more conformance tests failing with 
versioned namespace.


Excellent!


Ok to commit ?


Yes, thanks.



Re: [PATCH] i386: Don't generate ENDBR if function is only called directly

2017-10-23 Thread H.J. Lu
On Mon, Oct 23, 2017 at 3:01 PM, Tsimbalist, Igor V
 wrote:
> Existing tests cet-label.c cet-switch-2.c cet-sjlj-1.c cet-sjlj-3.c should 
> catch this.

There are no regressions with my patch.  Did I miss something?

> Igor
>
>
>> -Original Message-
>> From: H.J. Lu [mailto:hjl.to...@gmail.com]
>> Sent: Monday, October 23, 2017 11:50 PM
>> To: Tsimbalist, Igor V 
>> Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] i386: Don't generate ENDBR if function is only called
>> directly
>>
>> On Mon, Oct 23, 2017 at 2:44 PM, Tsimbalist, Igor V
>>  wrote:
>> > The change will skip a whole function from endbr processing by
>> rest_of_insert_endbranch,
>> > which inserts endbr not only at the beginning of the function but inside 
>> > the
>> function's
>> > body also. For example, tests with setjmp should fail.
>> >
>> > I would suggest to insert the check in rest_of_insert_endbranch function,
>> something like this
>> >
>> >   if (!(lookup_attribute ("nocf_check",
>> >   TYPE_ATTRIBUTES (TREE_TYPE (cfun->decl)))
>> > || cgraph_node::get (fun->decl)->only_called_directly_p ())
>> >
>> > Igor
>>
>> Can you provide one test for each case to cover all of them?
>>
>>
>> >
>> >> -Original Message-
>> >> From: Uros Bizjak [mailto:ubiz...@gmail.com]
>> >> Sent: Monday, October 23, 2017 9:26 PM
>> >> To: H.J. Lu 
>> >> Cc: gcc-patches@gcc.gnu.org; Tsimbalist, Igor V
>> >> 
>> >> Subject: Re: [PATCH] i386: Don't generate ENDBR if function is only called
>> >> directly
>> >>
>> >> On Sun, Oct 22, 2017 at 4:13 PM, H.J. Lu  wrote:
>> >> > There is no need to insert ENDBR instruction if function is only called
>> >> > directly.
>> >> >
>> >> > OK for trunk if there are no regressions?
>> >>
>> >> Patch needs to be OK'd by Igor first.
>> >>
>> >> Uros.
>> >>
>> >> > H.J.
>> >> > 
>> >> > gcc/
>> >> >
>> >> > PR target/82659
>> >> > * config/i386/i386.c (pass_insert_endbranch::gate): Return
>> >> > false if function is only called directly.
>> >> >
>> >> > gcc/testsuite/
>> >> >
>> >> > PR target/82659
>> >> > * gcc.target/i386/pr82659-1.c: New test.
>> >> > * gcc.target/i386/pr82659-2.c: Likewise.
>> >> > * gcc.target/i386/pr82659-3.c: Likewise.
>> >> > * gcc.target/i386/pr82659-4.c: Likewise.
>> >> > * gcc.target/i386/pr82659-5.c: Likewise.
>> >> > * gcc.target/i386/pr82659-6.c: Likewise.
>> >> > ---
>> >> >  gcc/config/i386/i386.c|  6 --
>> >> >  gcc/testsuite/gcc.target/i386/pr82659-1.c | 19 +++
>> >> >  gcc/testsuite/gcc.target/i386/pr82659-2.c | 18 ++
>> >> >  gcc/testsuite/gcc.target/i386/pr82659-3.c | 21 +
>> >> >  gcc/testsuite/gcc.target/i386/pr82659-4.c | 15 +++
>> >> >  gcc/testsuite/gcc.target/i386/pr82659-5.c | 10 ++
>> >> >  gcc/testsuite/gcc.target/i386/pr82659-6.c | 19 +++
>> >> >  7 files changed, 106 insertions(+), 2 deletions(-)
>> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-1.c
>> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-2.c
>> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-3.c
>> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-4.c
>> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-5.c
>> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-6.c
>> >> >
>> >> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> >> > index fb0b7e71469..b86504378ae 100644
>> >> > --- a/gcc/config/i386/i386.c
>> >> > +++ b/gcc/config/i386/i386.c
>> >> > @@ -2693,9 +2693,11 @@ public:
>> >> >{}
>> >> >
>> >> >/* opt_pass methods: */
>> >> > -  virtual bool gate (function *)
>> >> > +  virtual bool gate (function *fun)
>> >> >  {
>> >> > -  return ((flag_cf_protection & CF_BRANCH) && TARGET_IBT);
>> >> > +  return ((flag_cf_protection & CF_BRANCH)
>> >> > + && TARGET_IBT
>> >> > + && !cgraph_node::get (fun->decl)->only_called_directly_p 
>> >> > ());
>> >> >  }
>> >> >
>> >> >virtual unsigned int execute (function *)
>> >> > diff --git a/gcc/testsuite/gcc.target/i386/pr82659-1.c
>> >> b/gcc/testsuite/gcc.target/i386/pr82659-1.c
>> >> > new file mode 100644
>> >> > index 000..8f0a6906815
>> >> > --- /dev/null
>> >> > +++ b/gcc/testsuite/gcc.target/i386/pr82659-1.c
>> >> > @@ -0,0 +1,19 @@
>> >> > +/* { dg-do compile } */
>> >> > +/* { dg-options "-O2 -fcf-protection -mcet" } */
>> >> > +/* { dg-final { scan-assembler-times "endbr32" 1 { target ia32 } } } */
>> >> > +/* { dg-final { scan-assembler-times "endbr64" 1 { target { ! ia32 } } 
>> >> > } } */
>> >> > +
>> >> > +extern int x;
>> >> > +
>> >> > +static void
>> >> > 

Re: [Patch] Edit contrib/ files to download gfortran prerequisites

2017-10-23 Thread Damian Rouson


On October 23, 2017 at 3:54:22 AM, Richard Biener (richard.guent...@gmail.com) 
wrote:

On Sat, Oct 21, 2017 at 2:26 AM, Damian Rouson 
 wrote: 
> 
> Hi Richard, 
> 
> Attached is a revised patch that makes the downloading of Fortran 
> prerequisites optional via a new --no-fortran flag that can be passed to 
> contrib/download_prerequisites as requested in your reply below. 
> 
> As Jerry mentioned in his response, he has been working on edits to the 
> top-level build machinery, but we need additional guidance to complete his 
> work. Given that there were no responses to his request for guidance and it’s 
> not clear when that work will complete, I’m hoping this minor change can be 
> approved independently so that this patch doesn’t suffer bit rot in the 
> interim. 

But the change doesn't make sense without the build actually picking up things.

Each prerequisite tar ball contains build scripts so the change is useful even 
without the ultimate integration into the GCC build system.  Our first step was 
to get the tar balls onto the GCC ftp server.  Our next step was to set up for 
the files to be downloaded automatically with this patch.  For now, users can 
use the build scripts in the prerequisite tar balls.  Our final step will be to 
finish the integration into the GCC build system.   Jerry has requested 
guidance on this mailing list, but I don’t think there have been any replies to 
his request.  In the interim, I’m stuck updating this patch indefinitely.  I’ve 
already updated it several times because the files it patches are changing.  

Most importantly, the features these prerequisites enable are part of the 
Fortran standard so they are necessary for full access to the language — namely 
access to any Fortran 2008 or Fortran 2015 parallel features.  I hope I’ve made 
a case for the benefit of the patch. Is there a way in which the patch is 
harmful?

Damian

RE: [PATCH] i386: Don't generate ENDBR if function is only called directly

2017-10-23 Thread Tsimbalist, Igor V
Existing tests cet-label.c cet-switch-2.c cet-sjlj-1.c cet-sjlj-3.c should 
catch this.

Igor


> -Original Message-
> From: H.J. Lu [mailto:hjl.to...@gmail.com]
> Sent: Monday, October 23, 2017 11:50 PM
> To: Tsimbalist, Igor V 
> Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] i386: Don't generate ENDBR if function is only called
> directly
> 
> On Mon, Oct 23, 2017 at 2:44 PM, Tsimbalist, Igor V
>  wrote:
> > The change will skip a whole function from endbr processing by
> rest_of_insert_endbranch,
> > which inserts endbr not only at the beginning of the function but inside the
> function's
> > body also. For example, tests with setjmp should fail.
> >
> > I would suggest to insert the check in rest_of_insert_endbranch function,
> something like this
> >
> >   if (!(lookup_attribute ("nocf_check",
> >   TYPE_ATTRIBUTES (TREE_TYPE (cfun->decl)))
> > || cgraph_node::get (fun->decl)->only_called_directly_p ())
> >
> > Igor
> 
> Can you provide one test for each case to cover all of them?
> 
> 
> >
> >> -Original Message-
> >> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> >> Sent: Monday, October 23, 2017 9:26 PM
> >> To: H.J. Lu 
> >> Cc: gcc-patches@gcc.gnu.org; Tsimbalist, Igor V
> >> 
> >> Subject: Re: [PATCH] i386: Don't generate ENDBR if function is only called
> >> directly
> >>
> >> On Sun, Oct 22, 2017 at 4:13 PM, H.J. Lu  wrote:
> >> > There is no need to insert ENDBR instruction if function is only called
> >> > directly.
> >> >
> >> > OK for trunk if there are no regressions?
> >>
> >> Patch needs to be OK'd by Igor first.
> >>
> >> Uros.
> >>
> >> > H.J.
> >> > 
> >> > gcc/
> >> >
> >> > PR target/82659
> >> > * config/i386/i386.c (pass_insert_endbranch::gate): Return
> >> > false if function is only called directly.
> >> >
> >> > gcc/testsuite/
> >> >
> >> > PR target/82659
> >> > * gcc.target/i386/pr82659-1.c: New test.
> >> > * gcc.target/i386/pr82659-2.c: Likewise.
> >> > * gcc.target/i386/pr82659-3.c: Likewise.
> >> > * gcc.target/i386/pr82659-4.c: Likewise.
> >> > * gcc.target/i386/pr82659-5.c: Likewise.
> >> > * gcc.target/i386/pr82659-6.c: Likewise.
> >> > ---
> >> >  gcc/config/i386/i386.c|  6 --
> >> >  gcc/testsuite/gcc.target/i386/pr82659-1.c | 19 +++
> >> >  gcc/testsuite/gcc.target/i386/pr82659-2.c | 18 ++
> >> >  gcc/testsuite/gcc.target/i386/pr82659-3.c | 21 +
> >> >  gcc/testsuite/gcc.target/i386/pr82659-4.c | 15 +++
> >> >  gcc/testsuite/gcc.target/i386/pr82659-5.c | 10 ++
> >> >  gcc/testsuite/gcc.target/i386/pr82659-6.c | 19 +++
> >> >  7 files changed, 106 insertions(+), 2 deletions(-)
> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-1.c
> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-2.c
> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-3.c
> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-4.c
> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-5.c
> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-6.c
> >> >
> >> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> >> > index fb0b7e71469..b86504378ae 100644
> >> > --- a/gcc/config/i386/i386.c
> >> > +++ b/gcc/config/i386/i386.c
> >> > @@ -2693,9 +2693,11 @@ public:
> >> >{}
> >> >
> >> >/* opt_pass methods: */
> >> > -  virtual bool gate (function *)
> >> > +  virtual bool gate (function *fun)
> >> >  {
> >> > -  return ((flag_cf_protection & CF_BRANCH) && TARGET_IBT);
> >> > +  return ((flag_cf_protection & CF_BRANCH)
> >> > + && TARGET_IBT
> >> > + && !cgraph_node::get (fun->decl)->only_called_directly_p 
> >> > ());
> >> >  }
> >> >
> >> >virtual unsigned int execute (function *)
> >> > diff --git a/gcc/testsuite/gcc.target/i386/pr82659-1.c
> >> b/gcc/testsuite/gcc.target/i386/pr82659-1.c
> >> > new file mode 100644
> >> > index 000..8f0a6906815
> >> > --- /dev/null
> >> > +++ b/gcc/testsuite/gcc.target/i386/pr82659-1.c
> >> > @@ -0,0 +1,19 @@
> >> > +/* { dg-do compile } */
> >> > +/* { dg-options "-O2 -fcf-protection -mcet" } */
> >> > +/* { dg-final { scan-assembler-times "endbr32" 1 { target ia32 } } } */
> >> > +/* { dg-final { scan-assembler-times "endbr64" 1 { target { ! ia32 } } 
> >> > } } */
> >> > +
> >> > +extern int x;
> >> > +
> >> > +static void
> >> > +__attribute__ ((noinline, noclone))
> >> > +test (int i)
> >> > +{
> >> > +  x = i;
> >> > +}
> >> > +
> >> > +void
> >> > +bar (int i)
> >> > +{
> >> > +  test (i);
> >> > +}
> >> > diff --git a/gcc/testsuite/gcc.target/i386/pr82659-2.c
> >> b/gcc/testsuite/gcc.target/i386/pr82659-2.c
> >> > 

[PATCH] Fix #line __LINE__ handling and added conformance tests.

2017-10-23 Thread Max Woodbury
From 62ab7123e73563b43f1833842a419aa66eca7ce2 Mon Sep 17 00:00:00 2001
From: Max T Woodbury 
Date: Mon, 23 Oct 2017 16:58:49 -0400

Copyright 2017 Max TenEyck Woodbury, Durham North Carolina
all rights assigned to the Free Software Foundation, Inc., 23 Oct 2017

The Problem:

There is a problem associated with writing portable code; it is sometimes
desirable to change the string returned by the __FILE__ and related macros or
diagnostic messages.  The #line directive lets you do this.  The problem is that
the #line directive also changes the sequence numbers of subsequent lines and
that may be an unwanted side effect.  The obvious solution is to put a number
into the new line number slot of the directive that will leave the line number
sequencing unchanged.  This creates a maintenance issue: any changes to the
module that changes the number of lines before the #line directive ends will
require changing the line number value in the directive.
You would think that this issue could be avoided by using the __LINE__ macro
to insert the proper number.  There are examples in the programming literature
that suggest exactly that.  Unfortunately an ambiguity in interpreting the
standard makes the exact value generated by __LINE__ in the #line directive
implementation dependent.  One fairly obvious implementation has the pre-
processor collect the entire text of any directive together, including the '\n'
that ends the directive, before processing a directive.  Since some directives
suppress macro expansion, the expansion of the __LINE__ macro in a #line dir-
ective is postponed until the #line directive has been identified.  In that case
the __LINE__ macro expands to the number of the next input line, which is
exactly the value wanted.
Unfortunately there is another way to proceed; in this alternative, the type
of directive is identified "on the fly" and directives that allow macro
expansion also do the macro expansions "on-the-fly".  This results in __LINE__
values for the #line directive itself instead of the number for the next line,
which changes the subsequent line sequence.
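
A concrete illustration of the two readings (line numbers hypothetical):

  /* Suppose the directive below sits on line 10 of the file.  */
  #line __LINE__

Under the first reading, __LINE__ expands only after the whole directive,
including its newline, has been consumed, so it yields 11, the next line
is numbered 11, and the sequence is unchanged.  Under the second reading,
__LINE__ expands on the fly while the directive is read, so it yields 10,
the next line is numbered 10, and every subsequent line number is off by
one.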

The solution:

In the interest of portability, a special check has been added to the #line
directive implementation to see if the __LINE__ macro is being used to specify
the new line number.  If it is, a flag is set that suppresses the line number
change.
Note that this solution is not robust; hiding the __LINE__ reference inside
another macro will bypass this special case test.
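
For example (hypothetical macro name):

  #define MY_LINE __LINE__
  #line MY_LINE  /* the peek sees the name MY_LINE rather than the
                    __LINE__ built-in itself, so the check is bypassed */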

Test cases have been added to the test suite to check this behavior.
---
 gcc/testsuite/ChangeLog  |  4 
 gcc/testsuite/gcc.dg/cpp/line4.c |  7 +++
 libcpp/ChangeLog |  4 
 libcpp/directives.c  | 12 
 4 files changed, 27 insertions(+)

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index f14930b..a65328f 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2017-10-22  Max T. Woodbury 
+
+* gcc.dg/cpp/line4.c: New #line __LINE__ tests.
+
 2017-10-22  Uros Bizjak  

PR target/52451
diff --git a/gcc/testsuite/gcc.dg/cpp/line4.c b/gcc/testsuite/gcc.dg/cpp/line4.c
index 84dbf96..5902046 100644
--- a/gcc/testsuite/gcc.dg/cpp/line4.c
+++ b/gcc/testsuite/gcc.dg/cpp/line4.c
@@ -17,3 +17,10 @@ enum { j = __LINE__ };
 char array1[i== 44 ? 1 : -1];
 char array2[j== 90 ? 1 : -1];
 char array3[__LINE__ == 19 ? 1 : -1];
+
+#line __LINE__ /* N.B. this should not change the line sequence numbering.  */
+char array4[__LINE__ == 22 ? 1 : -1];
+#line __LINE__ /* N.B. extra lines in block comment should
+  not change the line sequence numbering.  */
+char array5[__LINE__ == 25 ? 1 : -1];
+
diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index f2c0d4d..31e1991 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,7 @@
+2017-10-23  Max T. Woodbury 
+
+* directives.c: Added #line __LINE__ special case handling.
+
 2017-10-10  Nathan Sidwell  

PR preprocessor/82506
diff --git a/libcpp/directives.c b/libcpp/directives.c
index 7cac653..a6243ac 100644
--- a/libcpp/directives.c
+++ b/libcpp/directives.c
@@ -964,6 +964,13 @@ do_line (cpp_reader *pfile)
   linenum_type cap = CPP_OPTION (pfile, c99) ? 2147483647 : 32767;
   bool wrapped;

+  /* #line __LINE__ should not change the line numbers. */
+  token = cpp_peek_token (pfile, 0);
+  bool no_change = (token->type == CPP_NAME &&
+token->val.node.node->type == NT_MACRO &&
+token->val.node.node->flags & NODE_BUILTIN &&

Re: [PATCH] i386: Don't generate ENDBR if function is only called directly

2017-10-23 Thread H.J. Lu
On Mon, Oct 23, 2017 at 2:44 PM, Tsimbalist, Igor V
 wrote:
> The change will skip a whole function from endbr processing by 
> rest_of_insert_endbranch,
> which inserts endbr not only at the beginning of the function but inside the 
> function's
> body also. For example, tests with setjmp should fail.
>
> I would suggest to insert the check in rest_of_insert_endbranch function, 
> something like this
>
>   if (!(lookup_attribute ("nocf_check",
>   TYPE_ATTRIBUTES (TREE_TYPE (cfun->decl)))
> || cgraph_node::get (fun->decl)->only_called_directly_p ())
>
> Igor

Can you provide one test for each case to cover all of them?


>
>> -Original Message-
>> From: Uros Bizjak [mailto:ubiz...@gmail.com]
>> Sent: Monday, October 23, 2017 9:26 PM
>> To: H.J. Lu 
>> Cc: gcc-patches@gcc.gnu.org; Tsimbalist, Igor V
>> 
>> Subject: Re: [PATCH] i386: Don't generate ENDBR if function is only called
>> directly
>>
>> On Sun, Oct 22, 2017 at 4:13 PM, H.J. Lu  wrote:
>> > There is no need to insert ENDBR instruction if function is only called
>> > directly.
>> >
>> > OK for trunk if there are no regressions?
>>
>> Patch needs to be OK'd by Igor first.
>>
>> Uros.
>>
>> > H.J.
>> > 
>> > gcc/
>> >
>> > PR target/82659
>> > * config/i386/i386.c (pass_insert_endbranch::gate): Return
>> > false if function is only called directly.
>> >
>> > gcc/testsuite/
>> >
>> > PR target/82659
>> > * gcc.target/i386/pr82659-1.c: New test.
>> > * gcc.target/i386/pr82659-2.c: Likewise.
>> > * gcc.target/i386/pr82659-3.c: Likewise.
>> > * gcc.target/i386/pr82659-4.c: Likewise.
>> > * gcc.target/i386/pr82659-5.c: Likewise.
>> > * gcc.target/i386/pr82659-6.c: Likewise.
>> > ---
>> >  gcc/config/i386/i386.c|  6 --
>> >  gcc/testsuite/gcc.target/i386/pr82659-1.c | 19 +++
>> >  gcc/testsuite/gcc.target/i386/pr82659-2.c | 18 ++
>> >  gcc/testsuite/gcc.target/i386/pr82659-3.c | 21 +
>> >  gcc/testsuite/gcc.target/i386/pr82659-4.c | 15 +++
>> >  gcc/testsuite/gcc.target/i386/pr82659-5.c | 10 ++
>> >  gcc/testsuite/gcc.target/i386/pr82659-6.c | 19 +++
>> >  7 files changed, 106 insertions(+), 2 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-1.c
>> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-2.c
>> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-3.c
>> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-4.c
>> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-5.c
>> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-6.c
>> >
>> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> > index fb0b7e71469..b86504378ae 100644
>> > --- a/gcc/config/i386/i386.c
>> > +++ b/gcc/config/i386/i386.c
>> > @@ -2693,9 +2693,11 @@ public:
>> >{}
>> >
>> >/* opt_pass methods: */
>> > -  virtual bool gate (function *)
>> > +  virtual bool gate (function *fun)
>> >  {
>> > -  return ((flag_cf_protection & CF_BRANCH) && TARGET_IBT);
>> > +  return ((flag_cf_protection & CF_BRANCH)
>> > + && TARGET_IBT
>> > + && !cgraph_node::get (fun->decl)->only_called_directly_p ());
>> >  }
>> >
>> >virtual unsigned int execute (function *)
>> > diff --git a/gcc/testsuite/gcc.target/i386/pr82659-1.c
>> b/gcc/testsuite/gcc.target/i386/pr82659-1.c
>> > new file mode 100644
>> > index 000..8f0a6906815
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/i386/pr82659-1.c
>> > @@ -0,0 +1,19 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2 -fcf-protection -mcet" } */
>> > +/* { dg-final { scan-assembler-times "endbr32" 1 { target ia32 } } } */
>> > +/* { dg-final { scan-assembler-times "endbr64" 1 { target { ! ia32 } } } 
>> > } */
>> > +
>> > +extern int x;
>> > +
>> > +static void
>> > +__attribute__ ((noinline, noclone))
>> > +test (int i)
>> > +{
>> > +  x = i;
>> > +}
>> > +
>> > +void
>> > +bar (int i)
>> > +{
>> > +  test (i);
>> > +}
>> > diff --git a/gcc/testsuite/gcc.target/i386/pr82659-2.c
>> b/gcc/testsuite/gcc.target/i386/pr82659-2.c
>> > new file mode 100644
>> > index 000..228a20006b6
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/i386/pr82659-2.c
>> > @@ -0,0 +1,18 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2 -fcf-protection -mcet" } */
>> > +/* { dg-final { scan-assembler-times "endbr32" 2 { target ia32 } } } */
>> > +/* { dg-final { scan-assembler-times "endbr64" 2 { target { ! ia32 } } } 
>> > } */
>> > +
>> > +extern int x;
>> > +
>> > +void
>> > +test (int i)
>> > +{
>> > +  x = i;
>> > +}
>> > +
>> > +void
>> > +bar (int i)
>> > +{
>> > +  test (i);
>> > +}
>> > diff --git a/gcc/testsuite/gcc.target/i386/pr82659-3.c
>> 

RE: [PATCH] i386: Don't generate ENDBR if function is only called directly

2017-10-23 Thread Tsimbalist, Igor V
The change will skip a whole function from endbr processing by 
rest_of_insert_endbranch,
which inserts endbr not only at the beginning of the function but inside the 
function's
body also. For example, tests with setjmp should fail.

I would suggest to insert the check in rest_of_insert_endbranch function, 
something like this

  if (!(lookup_attribute ("nocf_check",
  TYPE_ATTRIBUTES (TREE_TYPE (cfun->decl)))
|| cgraph_node::get (fun->decl)->only_called_directly_p ())

Igor
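
With the parentheses balanced, the suggested condition presumably guards
just the entry ENDBR, leaving the body scan for setjmp and labels in
place:

  if (!(lookup_attribute ("nocf_check",
                          TYPE_ATTRIBUTES (TREE_TYPE (cfun->decl)))
        || cgraph_node::get (fun->decl)->only_called_directly_p ()))
    {
      /* ... insert ENDBR at the function entry ...  */
    }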


> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Monday, October 23, 2017 9:26 PM
> To: H.J. Lu 
> Cc: gcc-patches@gcc.gnu.org; Tsimbalist, Igor V
> 
> Subject: Re: [PATCH] i386: Don't generate ENDBR if function is only called
> directly
> 
> On Sun, Oct 22, 2017 at 4:13 PM, H.J. Lu  wrote:
> > There is no need to insert ENDBR instruction if function is only called
> > directly.
> >
> > OK for trunk if there are no regressions?
> 
> Patch needs to be OK'd by Igor first.
> 
> Uros.
> 
> > H.J.
> > 
> > gcc/
> >
> > PR target/82659
> > * config/i386/i386.c (pass_insert_endbranch::gate): Return
> > false if function is only called directly.
> >
> > gcc/testsuite/
> >
> > PR target/82659
> > * gcc.target/i386/pr82659-1.c: New test.
> > * gcc.target/i386/pr82659-2.c: Likewise.
> > * gcc.target/i386/pr82659-3.c: Likewise.
> > * gcc.target/i386/pr82659-4.c: Likewise.
> > * gcc.target/i386/pr82659-5.c: Likewise.
> > * gcc.target/i386/pr82659-6.c: Likewise.
> > ---
> >  gcc/config/i386/i386.c|  6 --
> >  gcc/testsuite/gcc.target/i386/pr82659-1.c | 19 +++
> >  gcc/testsuite/gcc.target/i386/pr82659-2.c | 18 ++
> >  gcc/testsuite/gcc.target/i386/pr82659-3.c | 21 +
> >  gcc/testsuite/gcc.target/i386/pr82659-4.c | 15 +++
> >  gcc/testsuite/gcc.target/i386/pr82659-5.c | 10 ++
> >  gcc/testsuite/gcc.target/i386/pr82659-6.c | 19 +++
> >  7 files changed, 106 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-5.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-6.c
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index fb0b7e71469..b86504378ae 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -2693,9 +2693,11 @@ public:
> >{}
> >
> >/* opt_pass methods: */
> > -  virtual bool gate (function *)
> > +  virtual bool gate (function *fun)
> >  {
> > -  return ((flag_cf_protection & CF_BRANCH) && TARGET_IBT);
> > +  return ((flag_cf_protection & CF_BRANCH)
> > + && TARGET_IBT
> > + && !cgraph_node::get (fun->decl)->only_called_directly_p ());
> >  }
> >
> >virtual unsigned int execute (function *)
> > diff --git a/gcc/testsuite/gcc.target/i386/pr82659-1.c
> b/gcc/testsuite/gcc.target/i386/pr82659-1.c
> > new file mode 100644
> > index 000..8f0a6906815
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr82659-1.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fcf-protection -mcet" } */
> > +/* { dg-final { scan-assembler-times "endbr32" 1 { target ia32 } } } */
> > +/* { dg-final { scan-assembler-times "endbr64" 1 { target { ! ia32 } } } } 
> > */
> > +
> > +extern int x;
> > +
> > +static void
> > +__attribute__ ((noinline, noclone))
> > +test (int i)
> > +{
> > +  x = i;
> > +}
> > +
> > +void
> > +bar (int i)
> > +{
> > +  test (i);
> > +}
> > diff --git a/gcc/testsuite/gcc.target/i386/pr82659-2.c
> b/gcc/testsuite/gcc.target/i386/pr82659-2.c
> > new file mode 100644
> > index 000..228a20006b6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr82659-2.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fcf-protection -mcet" } */
> > +/* { dg-final { scan-assembler-times "endbr32" 2 { target ia32 } } } */
> > +/* { dg-final { scan-assembler-times "endbr64" 2 { target { ! ia32 } } } } 
> > */
> > +
> > +extern int x;
> > +
> > +void
> > +test (int i)
> > +{
> > +  x = i;
> > +}
> > +
> > +void
> > +bar (int i)
> > +{
> > +  test (i);
> > +}
> > diff --git a/gcc/testsuite/gcc.target/i386/pr82659-3.c
> b/gcc/testsuite/gcc.target/i386/pr82659-3.c
> > new file mode 100644
> > index 000..6ae23e40abc
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr82659-3.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fcf-protection -mcet" } */
> > +/* { dg-final { scan-assembler-times "endbr32" 

Re: [C++ Patch] PR 80449 ("[7/8 Regression] ICE reporting failed partial class template specialization class template argument deduction")

2017-10-23 Thread Jason Merrill
OK.

On Mon, Oct 23, 2017 at 4:36 PM, Paolo Carlini  wrote:
> Hi,
>
> this issue is by and large a duplicate of C++/79790, which I already fixed.
> There is a minor remaining nit: for the testcase, after the correct:
>
> error: cannot deduce template arguments of ‘C’, as it has no viable
> deduction guides
>
> we also emit the meaningless:
>
> error: too many initializers for ‘’
>
> only because in finish_compound_literal we don't check - as we do in most
> other places - the return value of do_auto_deduction for error_mark_node and
> it filters through until reshape_init. Tested x86_64-linux.
>
> Thanks, Paolo.
>
> 
>


Re: [PATCH, rs6000] 1/2 Add x86 SSE2 <emmintrin.h> intrinsics to GCC PPC64LE target

2017-10-23 Thread Segher Boessenkool
Hi!

On Tue, Oct 17, 2017 at 01:24:45PM -0500, Steven Munroe wrote:
> Some inline assembler is required. There are several cases where we need
> to generate Data Cache Block instructions. There are no existing builtins
> for flush and touch for store transient.

Would builtins for those help?  Would anything else want to use such
builtins, I mean?
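
For reference, the kind of inline assembly in question is along these
lines (the wrapper name is made up; dcbf is Data Cache Block Flush and
dcbtst is the touch-for-store form):

  static inline void
  flush_cache_block (const void *p)
  {
    __asm__ volatile ("dcbf 0,%0" : : "r" (p) : "memory");
  }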

> +   For PowerISA Scalar double in FPRs (left most 64-bits of the
> +   low 32 VSRs), while X86_64 SSE2 uses the right most 64-bits of
> +   the XMM. These differences require extra steps on POWER to match
> +   the SSE2 scalar double semantics.

Maybe say "is in FPRs"?  (And two space after a full stop, here and
elsewhere).

> +/* We need definitions from the SSE header files*/

Dot space space.

> +/* Sets the low DPFP value of A from the low value of B.  */
> +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_move_sd (__m128d __A, __m128d __B)
> +{
> +#if 1
> +  __v2df result = (__v2df) __A;
> +  result [0] = ((__v2df) __B)[0];
> +  return (__m128d) result;
> +#else
> +  return (vec_xxpermdi(__A, __B, 1));
> +#endif
> +}

You probably forgot to finish this?  Or, what are the two versions,
and why are they both here?  Same question later a few times.

> +/* Add the lower double-precision (64-bit) floating-point element in
> + * a and b, store the result in the lower element of dst, and copy
> + * the upper element from a to the upper element of dst. */

No leading stars on block comments please.

> +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_cmpnge_pd (__m128d __A, __m128d __B)
> +{
> +  return ((__m128d)vec_cmplt ((__v2df ) __A, (__v2df ) __B));
> +}

You have some spaces before closing parentheses here (and elsewhere --
please check).

> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_cvtpd_epi32 (__m128d __A)
> +{
> +  __v2df rounded = vec_rint (__A);
> +  __v4si result, temp;
> +  const __v4si vzero =
> +{ 0, 0, 0, 0 };
> +
> +  /* VSX Vector truncate Double-Precision to integer and Convert to
> +   Signed Integer Word format with Saturate.  */
> +  __asm__(
> +  "xvcvdpsxws %x0,%x1;\n"
> +  : "=wa" (temp)
> +  : "wa" (rounded)
> +  : );

Why the ";\n"?  And no empty clobber list please.
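
Presumably the cleaned-up form would be:

  __asm__ ("xvcvdpsxws %x0,%x1"
           : "=wa" (temp)
           : "wa" (rounded));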

> +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_cvtps_pd (__m128 __A)
> +{
> +  /* Check if vec_doubleh is defined by <altivec.h>. If so use that. */
> +#ifdef vec_doubleh
> +  return (__m128d) vec_doubleh ((__v4sf)__A);
> +#else
> +  /* Otherwise the compiler is not current and so need to generate the
> + equivalent code.  */

Do we need this?  The compiler will always be current.

> +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_loadl_pd (__m128d __A, double const *__B)
> +{
> +  __v2df result = (__v2df)__A;
> +  result [0] = *__B;
> +  return (__m128d)result;
> +}
> +#ifdef _ARCH_PWR8
> +/* Intrinsic functions that require PowerISA 2.07 minimum.  */

You want an empty line before that #ifdef.

> +extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_movemask_pd (__m128d  __A)
> +{
> +  __vector __m64 result;
> +  static const __vector unsigned int perm_mask =
> +{
> +#ifdef __LITTLE_ENDIAN__
> + 0x80800040, 0x80808080, 0x80808080, 0x80808080
> +#elif __BIG_ENDIAN__
> +  0x80808080, 0x80808080, 0x80808080, 0x80800040

Wrong indent in the LE case?

> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_slli_epi16 (__m128i __A, int __B)
> +{
> +  __v8hu lshift;
> +  __v8hi result =
> +{ 0, 0, 0, 0, 0, 0, 0, 0 };

Could as well fit that on the same line.

> +  if (__B < 16)
> +{
> +  if (__builtin_constant_p(__B))
> + {
> +   lshift = (__v8hu) vec_splat_s16(__B);
> + }
> +  else
> + {
> +   lshift = vec_splats ((unsigned short) __B);
> + }

No blocks please, for single-line cases.
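
I.e. presumably:

  if (__builtin_constant_p (__B))
    lshift = (__v8hu) vec_splat_s16 (__B);
  else
    lshift = vec_splats ((unsigned short) __B);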

> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_cmpeq_epi8 (__m128i __A, __m128i __B)
> +{
> +#if 1
> +  return (__m128i ) vec_cmpeq ((__v16qi) __A, (__v16qi)__B);
> +#else
> +  return (__m128i) ((__v16qi)__A == (__v16qi)__B);
> +#endif
> +}

Here's another #if 1.

> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_sad_epu8 (__m128i __A, __m128i __B)
> +{
> +  __v16qu a, b;
> +  __v16qu vmin, vmax, vabsdiff;
> +  __v4si vsum;
> +  const __v4su zero = { 0, 0, 0, 0 };
> +  __v4si result;
> +
> +  a = (__v16qu) __A;
> +  b = (__v16qu) __B;
> +  vmin = vec_min (a, b);
> +  vmax = vec_max (a, b);
> +  vabsdiff = vec_sub (vmax, vmin);
> +  /* Sum four groups of bytes into integers.  */
> +  vsum = (__vector signed int) vec_sum4s (vabsdiff, zero);
> +  /* Sum across four 

[C++ Patch] PR 80449 ("[7/8 Regression] ICE reporting failed partial class template specialization class template argument deduction")

2017-10-23 Thread Paolo Carlini

Hi,

this issue is by and large a duplicate of C++/79790, which I already 
fixed. There is a minor remaining nit: for the testcase, after the correct:


    error: cannot deduce template arguments of ‘C’, as it has no 
viable deduction guides


we also emit the meaningless:

    error: too many initializers for ‘’

only because in finish_compound_literal we don't check - as we do in 
most other places - the return value of do_auto_deduction for 
error_mark_node and it filters through until reshape_init. Tested 
x86_64-linux.


Thanks, Paolo.



/cp
2017-10-23  Paolo Carlini  

PR c++/80449
* semantics.c (finish_compound_literal): Check do_auto_deduction
return value for error_mark_node.

/testsuite
2017-10-23  Paolo Carlini  

PR c++/80449
* g++.dg/cpp1z/class-deduction46.C: New.
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 254005)
+++ cp/semantics.c  (working copy)
@@ -2711,8 +2711,12 @@ finish_compound_literal (tree type, tree compound_
 
   if (tree anode = type_uses_auto (type))
 if (CLASS_PLACEHOLDER_TEMPLATE (anode))
-  type = do_auto_deduction (type, compound_literal, anode, complain,
-   adc_variable_type);
+  {
+   type = do_auto_deduction (type, compound_literal, anode, complain,
+ adc_variable_type);
+   if (type == error_mark_node)
+ return error_mark_node;
+  }
 
   if (processing_template_decl)
 {
Index: testsuite/g++.dg/cpp1z/class-deduction46.C
===
--- testsuite/g++.dg/cpp1z/class-deduction46.C  (nonexistent)
+++ testsuite/g++.dg/cpp1z/class-deduction46.C  (working copy)
@@ -0,0 +1,6 @@
+// PR c++/80449
+// { dg-options -std=c++17 }
+
+template struct C;
+template<> struct C { C(int, int) {} };
+auto k = C{0, 0};  // { dg-error "cannot deduce" }


Re: [PATCH] Add INCLUDE_UNIQUE_PTR and use it (PR bootstrap/82610)

2017-10-23 Thread Richard Biener
On October 23, 2017 8:15:20 PM GMT+02:00, David Malcolm  
wrote:
>On Mon, 2017-10-23 at 16:40 +0100, Pedro Alves wrote:
>> On 10/23/2017 04:17 PM, Jonathan Wakely wrote:
>> > On 23/10/17 17:07 +0200, Michael Matz wrote:
>> > > Hi,
>> > > 
>> > > On Mon, 23 Oct 2017, Richard Biener wrote:
>> > > 
>> > > > I guess so. But we have to make gdb happy as well. It really
>> > > > depends how
>> > > > much each TU grows with the extra (unneeded) include grows in
>> > > > C++11 and
>> > > > C++04 mode.
>> > > 
>> > > The c++ headers unconditionally included from system.h, with:
>> > > 
>> > > % echo '#include <$name>' | g++-7 -E -x c++ - | wc -l
>> > > new:  3564
>> > > cstring:   533
>> > > utility:  3623
>> > > memory:  28066
>> > 
> That's using the -std=gnu++14 default for g++-7, and for that mode
>> > the header *is* needed, to get the definition of std::unique_ptr.
>> > 
>> > For C++98 (when it isn't needed) that header is much smaller:
>> > 
>> > tmp$ echo '#include <memory>' | g++ -E -x c++ - | wc -l
>> > 28101
>> > tmp$ echo '#include <memory>' | g++ -E -x c++ - -std=gnu++98 | wc -l
>> > 4267
>> > 
>> > (Because it doesn't contain std::unique_ptr and std::shared_ptr
>> > before
>> > C++11).
>> > 
>> > > compile time:
>> > > % echo -e '#include <$name>\nint i;' | time g++-7 -c -x c++ -
>> > > new: 0:00.06elapsed, 17060maxresident, 0major+3709minor
>> > > cstring: 0:00.03elapsed, 13524maxresident, 0major+3075minor
>> > > utility: 0:00.05elapsed, 16952maxresident, 0major+3776minor
>> > > memory:  0:00.25elapsed, 40356maxresident, 0major+9764minor
>> > > 
>> > > Hence, <memory> is not cheap at all, including it unconditionally
>> > > from
>> > > system.h when it isn't actually used by many things doesn't seem
>> > > a good
>> > > idea.
>> > > 
>> 
>> I think the real question is whether it makes a difference in
>> a full build.  There won't be many translation units that
>> don't include some other headers.  (though of course I won't
>> be surprised if it does make a difference.)
>> 
>> If it's a real issue, you could fix this like how the
>> other similar cases were handled by system.h, by adding this
>> in system.h:
>> 
>>  #ifdef __cplusplus
>>  #ifdef INCLUDE_UNIQUE_PTR
>>  # include "unique-ptr.h"
>>  #endif
>>  #endif
>> 
>> instead of unconditionally including <memory> there,
>> and then translation units that want unique-ptr.h would
>> do "#define INCLUDE_UNIQUE_PTR" instead of #include "unique-ptr.h",
>> like done for a few other C++ headers.
>> 
>> (I maintain that IMO this is kind of self-inflicted GCC pain due
>> to the fact that "#pragma poison" poisons too much.  If #pragma
>> poison's behavior were adjusted (or a new variant/mode created) to
>> ignore references to the poisoned symbol names in system headers (or
>> something like that), then you wouldn't need this manual management
>> of header dependencies in gcc/system.h and the corresponding 
>> '#define INCLUDE_FOO' contortions.  There's nothing that you can
>> reasonably
>> do with a reference to a poisoned symbol in a system header, other
>> than
>> avoid having the system header have the '#pragma poison' in effect
>> when
>> its included, which leads to contortions like system.h's.  Note that
>> the poisoned names are _still used anyway_.  So can we come up with
>> a GCC change that would avoid having to worry about manually doing
>> this?  It'd likely help other projects too.)
>> 
>> Thanks,
>> Pedro Alves
>
>Here's a different patch, which instead moves the include of our
>"unique-ptr.h" to system.h (conditionalized on INCLUDE_UNIQUE_PTR),
>after the decl of "free" and before the redefinition of "abort".
>
>It also makes the include of <memory> in unique-ptr.h be conditional
>on C++11 or later.
>
>Hence it makes the new stuff only be included for the places where
>we're actually using unique_ptr.
>
>Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu,
>using gcc 4.8 for the initial bootstrap (hence testing both gnu+03
>and then gnu++14 in the selftests, for stage 1 and stages 2 and 3
>respectively).
>
>I don't know if it actually fixes the bootstrap issue seen with
>clang on Darwin and OpenBSD, though - but I expect it to.
>
>OK for trunk?

OK. 

Richard. 

>gcc/ChangeLog:
>   PR bootstrap/82610
>   * system.h: Conditionally include "unique-ptr.h" if
>   INCLUDE_UNIQUE_PTR is defined.
>   * unique-ptr-tests.cc: Remove include of "unique-ptr.h" in favor
>   of defining INCLUDE_UNIQUE_PTR before including "system.h".
>
>include/ChangeLog:
>   * unique-ptr.h: Make include of <memory> conditional on C++11 or
>   later.
>---
> gcc/system.h| 10 ++
> gcc/unique-ptr-tests.cc |  2 +-
> include/unique-ptr.h|  4 +++-
> 3 files changed, 14 insertions(+), 2 deletions(-)
>
>diff --git a/gcc/system.h b/gcc/system.h
>index f0664e9..1714af4 100644
>--- a/gcc/system.h
>+++ b/gcc/system.h
>@@ -720,6 +720,16 @@ extern int vsnprintf (char *, size_t, const char
>*, va_list);
> #define __builtin_expect(a, b) (a)

Re: Make tests failing with version namespace UNSUPPORTED

2017-10-23 Thread François Dumont

Hi

    I completed execution of all tests and added the 
dg-require-normal-namespace to a few more files.


    I also eventually prefer to keep dg-require-normal-mode and 
dg-require-normal-namespace separate, the former for alternative modes, 
the latter for the versioned namespace.


    With this patch there are no more conformance tests failing with 
the versioned namespace.


Ok to commit ?

François


On 27/09/2017 22:40, François Dumont wrote:

Hi

    I would like to propose adding a new dg-require-normal-namespace 
attribute to make several tests that fail when the versioned namespace 
is active UNSUPPORTED. It is like dg-require-normal-mode but also 
covers the case where the versioned namespace is being used.


    I still need to complete execution of all tests with the versioned 
namespace but I think that all tests will be ok then.


    I have also updated the code used for dg-require-normal-mode to 
include c++config.h in case users are changing this file to activate 
any mode.


    * testsuite/lib/libstdc++.exp 
([check_v3_target_normal_namespace]): New.

    * testsuite/lib/dg-options.exp ([dg-require-normal-namespace]): New,
    use the latter.
    * testsuite/23_containers/headers/bitset/synopsis.cc: Replace
    dg-require-normal-mode with the latter.
    * testsuite/23_containers/headers/deque/synopsis.cc: Likewise.
    * testsuite/23_containers/headers/forward_list/synopsis.cc: Likewise.
    * testsuite/23_containers/headers/list/synopsis.cc: Likewise.
    * testsuite/23_containers/headers/map/synopsis.cc: Likewise.
    * testsuite/23_containers/headers/set/synopsis.cc: Likewise.
    * testsuite/23_containers/headers/vector/synopsis.cc: Likewise.
    * testsuite/23_containers/map/modifiers/erase/abi_tag.cc: Likewise.
    * testsuite/23_containers/multimap/modifiers/erase/abi_tag.cc: 
Likewise.
    * testsuite/23_containers/multiset/modifiers/erase/abi_tag.cc: 
Likewise.

    * testsuite/23_containers/set/modifiers/erase/abi_tag.cc: Likewise.

    Ok to commit ?

François




diff --git a/libstdc++-v3/testsuite/18_support/headers/limits/synopsis.cc b/libstdc++-v3/testsuite/18_support/headers/limits/synopsis.cc
index e298374..91fdf37 100644
--- a/libstdc++-v3/testsuite/18_support/headers/limits/synopsis.cc
+++ b/libstdc++-v3/testsuite/18_support/headers/limits/synopsis.cc
@@ -1,4 +1,5 @@
 // { dg-do compile }
+// { dg-require-normal-namespace "" }
 
 // Copyright (C) 2007-2017 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/20_util/from_chars/requirements.cc b/libstdc++-v3/testsuite/20_util/from_chars/requirements.cc
index 00b7d87..6afc918 100644
--- a/libstdc++-v3/testsuite/20_util/from_chars/requirements.cc
+++ b/libstdc++-v3/testsuite/20_util/from_chars/requirements.cc
@@ -17,6 +17,7 @@
 
 // { dg-options "-std=gnu++17" }
 // { dg-do compile { target c++17 } }
+// { dg-require-normal-namespace "" }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/20_util/headers/functional/synopsis.cc b/libstdc++-v3/testsuite/20_util/headers/functional/synopsis.cc
index 466d3d4..c001daa 100644
--- a/libstdc++-v3/testsuite/20_util/headers/functional/synopsis.cc
+++ b/libstdc++-v3/testsuite/20_util/headers/functional/synopsis.cc
@@ -1,4 +1,5 @@
 // { dg-do compile }
+// { dg-require-normal-namespace "" }
 
 // Copyright (C) 2007-2017 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc b/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc
index adf5f48..95f42ac 100644
--- a/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc
+++ b/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc
@@ -1,4 +1,5 @@
 // { dg-do compile }
+// { dg-require-normal-namespace "" }
 
 // Copyright (C) 2007-2017 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/20_util/headers/utility/synopsis.cc b/libstdc++-v3/testsuite/20_util/headers/utility/synopsis.cc
index 71f1903..95308139 100644
--- a/libstdc++-v3/testsuite/20_util/headers/utility/synopsis.cc
+++ b/libstdc++-v3/testsuite/20_util/headers/utility/synopsis.cc
@@ -1,4 +1,5 @@
 // { dg-do compile }
+// { dg-require-normal-namespace "" }
 
 // Copyright (C) 2007-2017 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/20_util/to_chars/requirements.cc b/libstdc++-v3/testsuite/20_util/to_chars/requirements.cc
index d50588b..4c13d8a 100644
--- a/libstdc++-v3/testsuite/20_util/to_chars/requirements.cc
+++ b/libstdc++-v3/testsuite/20_util/to_chars/requirements.cc
@@ -17,6 +17,7 @@
 
 // { dg-options "-std=gnu++17" }
 // { dg-do compile { target c++17 } }
+// { dg-require-normal-namespace "" }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/21_strings/headers/string/synopsis.cc b/libstdc++-v3/testsuite/21_strings/headers/string/synopsis.cc
index d27d220..568d846 100644
--- a/libstdc++-v3/testsuite/21_strings/headers/string/synopsis.cc
+++ b/libstdc++-v3/testsuite/21_strings/headers/string/synopsis.cc
@@ -1,4 +1,5 @@
 // { dg-do compile }
+// { dg-require-normal-namespace "" }
 
 // 

Re: [PATCH] Add INCLUDE_UNIQUE_PTR and use it (PR bootstrap/82610)

2017-10-23 Thread David Malcolm
On Mon, 2017-10-23 at 21:43 +0200, Gerald Pfeifer wrote:
> On Mon, 23 Oct 2017, David Malcolm wrote:
> > Here's a different patch, which instead moves the include of our
> > "unique-ptr.h" to system.h (conditionalized on INCLUDE_UNIQUE_PTR),
> > after the decl of "free" and before the redefinition of "abort".
> 
> Thanks for your persistence in tackling this, David!
> 
> > I don't know if it actually fixes the bootstrap issue seen with
> > clang on Darwin and OpenBSD, though - but I expect it to.
> 
> I didn't know OpenBSD was broken too. ;-)  I just applied your patch
> on the FreeBSD tester that exhibited the issue for me, and that is
> now
> happily into stage 2, whereas previously it would fail during stage 1.

Ooops, yes, I meant to say FreeBSD above.  Thanks for verifying the
fix.

Dave


Re: [PATCH] Add INCLUDE_UNIQUE_PTR and use it (PR bootstrap/82610)

2017-10-23 Thread Gerald Pfeifer
On Mon, 23 Oct 2017, David Malcolm wrote:
> Here's a different patch, which instead moves the include of our
> "unique-ptr.h" to system.h (conditionalized on INCLUDE_UNIQUE_PTR),
> after the decl of "free" and before the redefinition of "abort".

Thanks for your persistence in tackling this, David!

> I don't know if it actually fixes the bootstrap issue seen with
> clang on Darwin and OpenBSD, though - but I expect it to.

I didn't know OpenBSD was broken too. ;-)  I just applied your patch
on the FreeBSD tester that exhibited the issue for me, and that is now
happily into stage 2, whereas previously it would fail during stage 1.

Good night,
Gerald


Re: [PATCH] i386: Don't generate ENDBR if function is only called directly

2017-10-23 Thread Uros Bizjak
On Sun, Oct 22, 2017 at 4:13 PM, H.J. Lu  wrote:
> There is no need to insert ENDBR instruction if function is only called
> directly.
>
> OK for trunk if there is no regressions?

Patch needs to be OK'd by Igor first.

Uros.

> H.J.
> 
> gcc/
>
> PR target/82659
> * config/i386/i386.c (pass_insert_endbranch::gate): Return
> false if function is only called directly.
>
> gcc/testsuite/
>
> PR target/82659
> * gcc.target/i386/pr82659-1.c: New test.
> * gcc.target/i386/pr82659-2.c: Likewise.
> * gcc.target/i386/pr82659-3.c: Likewise.
> * gcc.target/i386/pr82659-4.c: Likewise.
> * gcc.target/i386/pr82659-5.c: Likewise.
> * gcc.target/i386/pr82659-6.c: Likewise.
> ---
>  gcc/config/i386/i386.c|  6 --
>  gcc/testsuite/gcc.target/i386/pr82659-1.c | 19 +++
>  gcc/testsuite/gcc.target/i386/pr82659-2.c | 18 ++
>  gcc/testsuite/gcc.target/i386/pr82659-3.c | 21 +
>  gcc/testsuite/gcc.target/i386/pr82659-4.c | 15 +++
>  gcc/testsuite/gcc.target/i386/pr82659-5.c | 10 ++
>  gcc/testsuite/gcc.target/i386/pr82659-6.c | 19 +++
>  7 files changed, 106 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82659-6.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index fb0b7e71469..b86504378ae 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2693,9 +2693,11 @@ public:
>{}
>
>/* opt_pass methods: */
> -  virtual bool gate (function *)
> +  virtual bool gate (function *fun)
>  {
> -  return ((flag_cf_protection & CF_BRANCH) && TARGET_IBT);
> +  return ((flag_cf_protection & CF_BRANCH)
> + && TARGET_IBT
> + && !cgraph_node::get (fun->decl)->only_called_directly_p ());
>  }
>
>virtual unsigned int execute (function *)
> diff --git a/gcc/testsuite/gcc.target/i386/pr82659-1.c 
> b/gcc/testsuite/gcc.target/i386/pr82659-1.c
> new file mode 100644
> index 000..8f0a6906815
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr82659-1.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fcf-protection -mcet" } */
> +/* { dg-final { scan-assembler-times "endbr32" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "endbr64" 1 { target { ! ia32 } } } } */
> +
> +extern int x;
> +
> +static void
> +__attribute__ ((noinline, noclone))
> +test (int i)
> +{
> +  x = i;
> +}
> +
> +void
> +bar (int i)
> +{
> +  test (i);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr82659-2.c 
> b/gcc/testsuite/gcc.target/i386/pr82659-2.c
> new file mode 100644
> index 000..228a20006b6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr82659-2.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fcf-protection -mcet" } */
> +/* { dg-final { scan-assembler-times "endbr32" 2 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "endbr64" 2 { target { ! ia32 } } } } */
> +
> +extern int x;
> +
> +void
> +test (int i)
> +{
> +  x = i;
> +}
> +
> +void
> +bar (int i)
> +{
> +  test (i);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr82659-3.c 
> b/gcc/testsuite/gcc.target/i386/pr82659-3.c
> new file mode 100644
> index 000..6ae23e40abc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr82659-3.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fcf-protection -mcet" } */
> +/* { dg-final { scan-assembler-times "endbr32" 2 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "endbr64" 2 { target { ! ia32 } } } } */
> +
> +extern int x;
> +
> +static void
> +__attribute__ ((noinline, noclone))
> +test (int i)
> +{
> +  x = i;
> +}
> +
> +extern __typeof (test) foo __attribute__ ((alias ("test")));
> +
> +void
> +bar (int i)
> +{
> +  test (i);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr82659-4.c 
> b/gcc/testsuite/gcc.target/i386/pr82659-4.c
> new file mode 100644
> index 000..ca87264e98b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr82659-4.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fcf-protection -mcet" } */
> +/* { dg-final { scan-assembler-times "endbr32" 2 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "endbr64" 2 { target { ! ia32 } } } } */
> +
> +static void
> +test (void)
> +{
> +}
> +
> +void *
> +bar (void)
> +{
> +  return test;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr82659-5.c 
> b/gcc/testsuite/gcc.target/i386/pr82659-5.c
> new file mode 100644
> index 000..c34eade0f90
> 

Re: [PATCH] Fix wrong-debug with i?86/x86_64 _GLOBAL_OFFSET_TABLE_ (PR debug/82630)

2017-10-23 Thread Uros Bizjak
On Mon, Oct 23, 2017 at 10:04 AM, Richard Biener  wrote:
> On Mon, 23 Oct 2017, Jakub Jelinek wrote:
>
>> On Mon, Oct 23, 2017 at 09:48:50AM +0200, Richard Biener wrote:
>> > > --- gcc/targhooks.c.jj2017-10-13 19:02:08.0 +0200
>> > > +++ gcc/targhooks.c   2017-10-20 14:26:07.945464025 +0200
>> > > @@ -177,6 +177,14 @@ default_legitimize_address_displacement
>> > >return false;
>> > >  }
>> > >
>> > > +bool
>> > > +default_const_not_ok_for_debug_p (rtx x)
>> > > +{
>> > > +  if (GET_CODE (x) == UNSPEC)
>> >
>> > What about UNSPEC_VOLATILE?
>>
>> This hook is called on the argument of CONST or SYMBOL_REF.
>> UNSPEC_VOLATILE can't appear inside of CONST, it wouldn't be CONST then.
>>
>> UNSPEC appearing outside of CONST is rejected unconditionally in
>> mem_loc_descriptor:
>> ...
>> case UNSPEC:
>> ...
>>   /* If delegitimize_address couldn't do anything with the UNSPEC, we
>>  can't express it in the debug info.  This can happen e.g. with some
>>  TLS UNSPECs.  */
>>   break;
>> and for UNSPEC_VOLATILE we just ICE, because var-tracking shouldn't let
>> those through:
>> default:
>>   if (flag_checking)
>> {
>>   print_rtl (stderr, rtl);
>>   gcc_unreachable ();
>> }
>>   break;
>
> Ok.  The patch looks fine from a middle-end point of view.

LGTM for the x86 part.

Thanks,
Uros.


Re: Make istreambuf_iterator::_M_sbuf immutable and add debug checks

2017-10-23 Thread François Dumont

Hi

     I completed execution of all tests and found one test impacted by 
this patch.


     It is a good example of the impact of the patch. Users won't be 
able to build an istreambuf_iterator at a point where the underlying 
streambuf is at end-of-stream, then put some data in the streambuf, 
and then use the iterator. This is similar to what Petr was proposing: 
an eof iterator becoming valid again through an operation on the 
streambuf. I would prefer we either forbid it completely or accept it 
completely; the current middle-way situation is strange.


     The fix is easy: let the compiler build the streambuf_iterator 
when needed. Even if the patch is not accepted I think we should keep 
the change to the test, which is fragile; an example of the rejected 
pattern follows below.
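
For illustration, a minimal sketch of that pattern (hypothetical code,
not the actual testcase):

#include <sstream>
#include <iterator>

int main()
{
  std::istringstream in;                  // empty, so already at end-of-stream
  std::istreambuf_iterator<char> it(in);  // with this patch _M_sbuf is nulled here
  in.str("data");                         // refill the streambuf afterwards
  // *it;                                 // 'it' stays an end-of-stream iterator
}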


François


On 13/10/2017 19:14, François Dumont wrote:

Hi

 Here is the last patch I will propose for istreambuf_iterator. 
This is mostly to remove the mutable keyword on _M_sbuf.


 To do so I had to reset _M_sbuf in valid places, that is to say 
constructors and increment operators. Despite that we might still have 
eof iterators with _M_sbuf not null when you have, for instance, several 
iterator instances but only increment one. It seems fine to me because 
even in this case the iterator will still be considered as eof, and using 
several istreambuf_iterators to go through a given streambuf is not usual.


 As _M_sbuf is immutable I have been able to restore the simple 
call to _M_at_eof() in the increment operators debug check.


Ok to commit after successful tests ?

François






diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h b/libstdc++-v3/include/bits/streambuf_iterator.h
index 081afe5..0a6c7f9 100644
--- a/libstdc++-v3/include/bits/streambuf_iterator.h
+++ b/libstdc++-v3/include/bits/streambuf_iterator.h
@@ -94,7 +94,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // the "end of stream" iterator value.
   // NB: This implementation assumes the "end of stream" value
   // is EOF, or -1.
-  mutable streambuf_type*	_M_sbuf;
+  streambuf_type*	_M_sbuf;
   int_type		_M_c;
 
 public:
@@ -110,11 +110,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   ///  Construct start of input stream iterator.
   istreambuf_iterator(istream_type& __s) _GLIBCXX_USE_NOEXCEPT
-  : _M_sbuf(__s.rdbuf()), _M_c(traits_type::eof()) { }
+  : _M_sbuf(__s.rdbuf()), _M_c(traits_type::eof())
+  { _M_init(); }
 
   ///  Construct start of streambuf iterator.
   istreambuf_iterator(streambuf_type* __s) _GLIBCXX_USE_NOEXCEPT
-  : _M_sbuf(__s), _M_c(traits_type::eof()) { }
+  : _M_sbuf(__s), _M_c(traits_type::eof())
+  { _M_init(); }
 
   ///  Return the current character pointed to by iterator.  This returns
   ///  streambuf.sgetc().  It cannot be assigned.  NB: The result of
@@ -138,13 +140,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   istreambuf_iterator&
   operator++()
   {
-	__glibcxx_requires_cond(_M_sbuf &&
-(!_S_is_eof(_M_c) || !_S_is_eof(_M_sbuf->sgetc())),
+	__glibcxx_requires_cond(!_M_at_eof(),
 _M_message(__gnu_debug::__msg_inc_istreambuf)
 ._M_iterator(*this));
 
 	_M_sbuf->sbumpc();
 	_M_c = traits_type::eof();
+
+	if (_S_is_eof(_M_sbuf->sgetc()))
+	  _M_sbuf = 0;
+
 	return *this;
   }
 
@@ -152,14 +157,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   istreambuf_iterator
   operator++(int)
   {
-	__glibcxx_requires_cond(_M_sbuf &&
-(!_S_is_eof(_M_c) || !_S_is_eof(_M_sbuf->sgetc())),
+	__glibcxx_requires_cond(!_M_at_eof(),
 _M_message(__gnu_debug::__msg_inc_istreambuf)
 ._M_iterator(*this));
 
 	istreambuf_iterator __old = *this;
 	__old._M_c = _M_sbuf->sbumpc();
 	_M_c = traits_type::eof();
+
+	if (_S_is_eof(_M_sbuf->sgetc()))
+	  _M_sbuf = __old._M_sbuf = 0;
+
 	return __old;
   }
 
@@ -172,12 +180,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return _M_at_eof() == __b._M_at_eof(); }
 
 private:
+  void
+  _M_init()
+  {
+	if (_M_sbuf && _S_is_eof(_M_sbuf->sgetc()))
+	  _M_sbuf = 0;
+  }
+
   int_type
   _M_get() const
   {
 	int_type __ret = _M_c;
-	if (_M_sbuf && _S_is_eof(__ret) && _S_is_eof(__ret = _M_sbuf->sgetc()))
-	  _M_sbuf = 0;
+	if (_M_sbuf && __builtin_expect(_S_is_eof(__ret), true))
+	  __ret = _M_sbuf->sgetc();
+
 	return __ret;
   }
 
@@ -391,10 +407,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		__c = __sb->snextc();
 	}
 
+	  if (!traits_type::eq_int_type(__c, __eof))
+	{
 	  __first._M_c = __eof;
+	  return __first;
+	}
 	}
 
-  return __first;
+  return __last;
 }
 
 // @} group iterators
diff --git a/libstdc++-v3/testsuite/22_locale/money_get/get/char/9.cc b/libstdc++-v3/testsuite/22_locale/money_get/get/char/9.cc
index 9b69956..476e38f 100644
--- a/libstdc++-v3/testsuite/22_locale/money_get/get/char/9.cc
+++ b/libstdc++-v3/testsuite/22_locale/money_get/get/char/9.cc
@@ -41,7 +41,6 @@ int main()
 = std::use_facet(liffey.getloc());
 
   typedef 

C++ PATCH for c++/77369, wrong noexcept handling with template type arguments

2017-10-23 Thread Jason Merrill
In C++14 and below, the exception-specification is not part of a
function type, so we need to drop it from template arguments;
otherwise, all uses of a particular template instantiation get the
exception specification that the first use happened to have.
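
As a hedged illustration of the symptom (not the committed testcase):

void f() noexcept;
void g();
template <class T> struct S { };

// In C++14, noexcept is not part of the function type, so
// S<decltype(&f)> and S<decltype(&g)> must be the same instantiation,
// S<void (*)()>.  If f's noexcept leaked into the template argument of
// the first use, the shared instantiation would wrongly carry it for
// later uses reached via &g as well.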

This patch regresses the -Wnoexcept-type warning, as in this case we
discard the exception-specification long before we get to mangling.  I
explored fixing this for a while, but it seems to require a whole new
mechanism for propagating warnings through overload resolution.  And
this warning doesn't seem to be as important as I initially thought.

Tested x86_64-pc-linux-gnu, applying to trunk.

Jason
commit 30c1bda3bd426be189e7b61844d9801605a04d49
Author: Jason Merrill 
Date:   Tue Oct 17 10:38:03 2017 -0400

PR c++/77369 - wrong noexcept handling in C++14 and below

* tree.c (strip_typedefs): Canonicalize TYPE_RAISES_EXCEPTIONS.

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 366f46f1506..48d40945af3 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -1439,7 +1439,11 @@ strip_typedefs (tree t, bool *remove_attributes)
  is_variant = true;
 
type = strip_typedefs (TREE_TYPE (t), remove_attributes);
-   changed = type != TREE_TYPE (t) || is_variant;
+   tree canon_spec = (flag_noexcept_type
+  ? canonical_eh_spec (TYPE_RAISES_EXCEPTIONS (t))
+  : NULL_TREE);
+   changed = (type != TREE_TYPE (t) || is_variant
+  || TYPE_RAISES_EXCEPTIONS (t) != canon_spec);
 
for (arg_node = TYPE_ARG_TYPES (t);
 arg_node;
@@ -1498,9 +1502,8 @@ strip_typedefs (tree t, bool *remove_attributes)
type_memfn_rqual (t));
  }
 
-   if (TYPE_RAISES_EXCEPTIONS (t))
- result = build_exception_variant (result,
-   TYPE_RAISES_EXCEPTIONS (t));
+   if (canon_spec)
+ result = build_exception_variant (result, canon_spec);
if (TYPE_HAS_LATE_RETURN_TYPE (t))
  TYPE_HAS_LATE_RETURN_TYPE (result) = 1;
   }
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept31.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept31.C
new file mode 100644
index 000..c4c0e7dd466
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept31.C
@@ -0,0 +1,12 @@
+// PR c++/77369
+// { dg-do compile { target c++11 } }
+
+template <class F> int caller(F f) noexcept(noexcept(f())) { f(); return 0; }
+
+void func1() noexcept { }
+
+void func2() { throw 1; }
+
+int instantiate_caller_with_func1 = caller(func1);
+
+static_assert( !noexcept(caller(func2)), "" );
diff --git a/gcc/testsuite/g++.dg/cpp1z/noexcept-type13.C 
b/gcc/testsuite/g++.dg/cpp1z/noexcept-type13.C
index 8eb3be0bd61..b51d7af2b11 100644
--- a/gcc/testsuite/g++.dg/cpp1z/noexcept-type13.C
+++ b/gcc/testsuite/g++.dg/cpp1z/noexcept-type13.C
@@ -5,7 +5,7 @@
 void foo () throw () {}// { dg-bogus "mangled name" }
 
 template <class T>
-T bar (T x) { return x; }  // { dg-warning "mangled name" "" { target c++14_down } }
+T bar (T x) { return x; }
 
 void baz () {  // { dg-bogus "mangled name" }
   return (bar (foo)) ();


Re: [PATCH, i386]: Fix PR 82628, wrong code at -Os on x86_64-linux-gnu in the 32-bit mode

2017-10-23 Thread Uros Bizjak
On Mon, Oct 23, 2017 at 1:27 PM, Uros Bizjak  wrote:
> On Mon, Oct 23, 2017 at 1:07 PM, Jakub Jelinek  wrote:
>> On Mon, Oct 23, 2017 at 12:27:15PM +0200, Uros Bizjak wrote:
>>> On Mon, Oct 23, 2017 at 12:09 PM, Jakub Jelinek  wrote:
>>> > On Sun, Oct 22, 2017 at 08:04:28PM +0200, Uros Bizjak wrote:
>>> >> Hello!
>>> >>
>>> >> In PR 82628 Jakub figured out that insn patterns that consume carry
>>> >> flag were not 100% correct. Due to this issue, combine is able to
>>> >> simplify various CC_REG propagations that result in invalid code.
>>> >>
>>> >> Attached patch fixes (well, mitigates) the above problem by splitting
>>> >> the double-mode compare after the reload, in the same way other
>>> >> *_doubleword patterns are handled from "the beginning of the time".
>>> >
>>> > I'm afraid this is going to haunt us sooner or later, combine isn't the
>>> > only pass that uses simplify-rtx.c infrastructure heavily and when we lie
>>> > in the RTL pattern, eventually something will be simplified wrongly.
>>> >
>>> > So, at least we'd need to use UNSPEC for the pattern, like (only lightly
>>> > tested so far) below.
>>>
>>> I agree with the above. Patterns that consume Carry flag are now
>>> marked with (plus (ltu (...)), but effectively, they behave like
>>> unspecs. So, I see no problem to change all SBB and ADC to unspec at
>>> once, similar to the change you proposed in the patch.
>>
>> So like this (addcarry/subborrow deferred to a separate patch)?
>> Or do you want to use UNSPEC even for the unsigned comparison case,
>> i.e. from the patch remove the predicates.md/constraints.md part,
>> sub<mode>3_carry_ccc{,_1} and anything related to that?
>
> Looking at the attached patch, I think, this won't be necessary
> anymore. The pattern is quite important for 32bit targets, so this
> fact warrants a couple of complicated patterns.
>
>> As for addcarry/subborrow, the problem is that we expect in the pr67317*
>> tests that combine is able to notice that the CF setter sets CF to
>> unconditional 0 and matches the pattern.  With the patch I wrote
>> we end up with the combiner trying to match an insn where the CCC
>> is set from a TImode comparison:
>> (parallel [
>> (set (reg:CC 17 flags)
>> (compare:CC (zero_extend:TI (plus:DI (reg/v:DI 92 [ a ])
>> (reg/v:DI 94 [ c ])))
>> (zero_extend:TI (reg/v:DI 94 [ c ]))))
>> (set (reg:DI 98)
>> (plus:DI (reg/v:DI 92 [ a ])
>> (reg/v:DI 94 [ c ])))
>> ])
>> So, either we need a define_insn_and_split pattern that would deal with
>> that (for UNSPEC it would be the same thing, have a define_insn_and_split
>> that would replace the (ltu...) with (const_int 0)), or perhaps be smarter
>> during expansion, if we see the first argument is constant 0, expand it
>> like a normal add instruction with CC setter.
>>
>> 2017-10-23  Jakub Jelinek  
>>
>> PR target/82628
>> * config/i386/predicates.md (x86_64_dwzext_immediate_operand): New.
>> * config/i386/constraints.md (Wf): New constraint.
>> * config/i386/i386.md (UNSPEC_SBB): New unspec.
>> (cmp<dwi>_doubleword): Removed.
>> (sub<mode>3_carry_ccc, *sub<mode>3_carry_ccc_1): New patterns.
>> (sub<mode>3_carry_ccgz): Use unspec instead of compare.
>> * config/i386/i386.c (ix86_expand_branch) <case E_TImode>: Don't
>> expand with cmp<dwi>_doubleword.  For LTU and GEU use
>> sub<mode>3_carry_ccc instead of sub<mode>3_carry_ccgz and use
>> CCCmode.
>
> OK.

The patch also fixes PR 82662. I have added the following testcase and
closed the PR.

2017-10-23  Uros Bizjak  

PR target/82662
* gcc.target/i386/pr82662.c: New test.

Tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN.

Uros.
Index: gcc.target/i386/pr82662.c
===
--- gcc.target/i386/pr82662.c   (nonexistent)
+++ gcc.target/i386/pr82662.c   (working copy)
@@ -0,0 +1,26 @@
+/* PR target/82580 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#ifdef __SIZEOF_INT128__
+typedef unsigned __int128 U;
+typedef signed __int128 S;
+#else
+typedef unsigned long long U;
+typedef signed long long S;
+#endif
+void bar (void);
+int f0 (U x, U y) { return x == y; }
+int f1 (U x, U y) { return x != y; }
+int f2 (U x, U y) { return x > y; }
+int f3 (U x, U y) { return x >= y; }
+int f4 (U x, U y) { return x < y; }
+int f5 (U x, U y) { return x <= y; }
+int f6 (S x, S y) { return x == y; }
+int f7 (S x, S y) { return x != y; }
+int f8 (S x, S y) { return x > y; }
+int f9 (S x, S y) { return x >= y; }
+int f10 (S x, S y) { return x < y; }
+int f11 (S x, S y) { return x <= y; }
+
+/* { dg-final { scan-assembler-times {\mset} 12 } } */


Re: [PATCH] Add INCLUDE_UNIQUE_PTR and use it (PR bootstrap/82610)

2017-10-23 Thread Pedro Alves
On 10/23/2017 07:15 PM, David Malcolm wrote:

> OK for trunk?

FAOD, FWIW, LGTM.

Thanks,
Pedro Alves



[PATCH] Add INCLUDE_UNIQUE_PTR and use it (PR bootstrap/82610)

2017-10-23 Thread David Malcolm
On Mon, 2017-10-23 at 16:40 +0100, Pedro Alves wrote:
> On 10/23/2017 04:17 PM, Jonathan Wakely wrote:
> > On 23/10/17 17:07 +0200, Michael Matz wrote:
> > > Hi,
> > > 
> > > On Mon, 23 Oct 2017, Richard Biener wrote:
> > > 
> > > > I guess so. But we have to make gdb happy as well. It really
> > > > depends how
> > > > much each TU grows with the extra (unneeded) include in
> > > > C++11 and
> > > > C++04 mode.
> > > 
> > > The c++ headers unconditionally included from system.h, with:
> > > 
> > > % echo '#include <$name>' | g++-7 -E -x c++ - | wc -l
> > > new:  3564
> > > cstring:   533
> > > utility:  3623
> > > memory:  28066
> > 
> > That's using the -std=gnu++14 default for g++-7, and for that mode
> > the header *is* needed, to get the definition of std::unique_ptr.
> > 
> > For C++98 (when it isn't needed) that header is much smaller:
> > 
> > tmp$ echo '#include <memory>' | g++ -E -x c++ - | wc -l
> > 28101
> > tmp$ echo '#include <memory>' | g++ -E -x c++ - -std=gnu++98 | wc -l
> > 4267
> > 
> > (Because it doesn't contain std::unique_ptr and std::shared_ptr
> > before
> > C++11).
> > 
> > > compile time:
> > > % echo -e '#include <$name>\nint i;' | time g++-7 -c -x c++ -
> > > new: 0:00.06elapsed, 17060maxresident, 0major+3709minor
> > > cstring: 0:00.03elapsed, 13524maxresident, 0major+3075minor
> > > utility: 0:00.05elapsed, 16952maxresident, 0major+3776minor
> > > memory:  0:00.25elapsed, 40356maxresident, 0major+9764minor
> > > 
> > > Hence, <memory> is not cheap at all, including it unconditionally
> > > from
> > > system.h when it isn't actually used by many things doesn't seem
> > > a good
> > > idea.
> > > 
> 
> I think the real question is whether it makes a difference in
> a full build.  There won't be many translation units that
> don't include some other headers.  (though of course I won't
> be surprised if it does make a difference.)
> 
> If it's a real issue, you could fix this like how the
> other similar cases were handled by system.h, by adding this
> in system.h:
> 
>  #ifdef __cplusplus
>  #ifdef INCLUDE_UNIQUE_PTR
>  # include "unique-ptr.h"
>  #endif
>  #endif
> 
> instead of unconditionally including <memory> there,
> and then translation units that want unique-ptr.h would
> do "#define INCLUDE_UNIQUE_PTR" instead of #include "unique-ptr.h",
> like done for a few other C++ headers.
> 
> (I maintain that IMO this is kind of self-inflicted GCC pain due
> to the fact that "#pragma poison" poisons too much.  If #pragma
> poison's behavior were adjusted (or a new variant/mode created) to
> ignore references to the poisoned symbol names in system headers (or
> something like that), then you wouldn't need this manual management
> of header dependencies in gcc/system.h and the corresponding 
> '#define INCLUDE_FOO' contortions.  There's nothing that you can
> reasonably
> do with a reference to a poisoned symbol in a system header, other
> than
> avoid having the system header have the '#pragma poison' in effect
> when
> its included, which leads to contortions like system.h's.  Note that
> the poisoned names are _still used anyway_.  So can we come up with
> a GCC change that would avoid having to worry about manually doing
> this?  It'd likely help other projects too.)
> 
> Thanks,
> Pedro Alves

Here's a different patch, which instead moves the include of our
"unique-ptr.h" to system.h (conditionalized on INCLUDE_UNIQUE_PTR),
after the decl of "free" and before the redefinition of "abort".

It also makes the include of <memory> in unique-ptr.h be conditional
on C++11 or later.

Hence it makes the new stuff only be included for the places where
we're actually using unique_ptr.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu,
using gcc 4.8 for the initial bootstrap (hence testing both gnu+03
and then gnu++14 in the selftests, for stage 1 and stages 2 and 3
respectively).

I don't know if it actually fixes the bootstrap issue seen with
clang on Darwin and OpenBSD, though - but I expect it to.

OK for trunk?

gcc/ChangeLog:
PR bootstrap/82610
* system.h: Conditionally include "unique-ptr.h" if
INCLUDE_UNIQUE_PTR is defined.
* unique-ptr-tests.cc: Remove include of "unique-ptr.h" in favor
of defining INCLUDE_UNIQUE_PTR before including "system.h".

include/ChangeLog:
* unique-ptr.h: Make include of <memory> conditional on C++11 or
later.
---
 gcc/system.h| 10 ++
 gcc/unique-ptr-tests.cc |  2 +-
 include/unique-ptr.h|  4 +++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/gcc/system.h b/gcc/system.h
index f0664e9..1714af4 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -720,6 +720,16 @@ extern int vsnprintf (char *, size_t, const char *, 
va_list);
 #define __builtin_expect(a, b) (a)
 #endif
 
+/* Some of the headers included by <memory> can use "abort" within a
+   namespace, e.g. "_VSTD::abort();", which fails after we use the
+   preprocessor to redefine "abort" as "fancy_abort" below.
+   Given that 

Re: Don't query the frontend for unsupported types

2017-10-23 Thread Richard Sandiford
Ping.

Richard Sandiford  writes:
> Richard Biener  writes:
>> On Fri, Sep 22, 2017 at 6:42 PM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
 On Thu, Sep 21, 2017 at 2:56 PM, Richard Sandiford
  wrote:
> Richard Biener  writes:
>> On September 20, 2017 2:36:03 PM GMT+02:00, Richard Sandiford
>>  wrote:
>>>When forcing a constant of mode MODE into memory, force_const_mem
>>>asks the frontend to provide the type associated with that mode.
>>>In principle type_for_mode is allowed to return null, and although
>>>one use site correctly handled that, the other didn't.
>>>
>>>I think there's agreement that it's bogus to use type_for_mode for
>>>this kind of thing, since it forces frontends to handle types that
>>>don't exist in that language.  See e.g. http://gcc.gnu.org/PR46805
>>>where the Go frontend was forced to handle vector types even though
>>>Go doesn't have vector types.
>>>
>>>Also, the frontends use code like:
>>>
>>>  else if (VECTOR_MODE_P (mode))
>>>{
>>>  machine_mode inner_mode = GET_MODE_INNER (mode);
>>>  tree inner_type = c_common_type_for_mode (inner_mode, unsignedp);
>>>  if (inner_type != NULL_TREE)
>>>return build_vector_type_for_mode (inner_type, mode);
>>>}
>>>
>>>and there's no guarantee that every vector mode M used by backend
>>>rtl has an associated vector type whose TYPE_MODE is M.  I think
>>>really the type_for_mode hook should only return trees that _do_ have
>>>the requested TYPE_MODE, but PR46805 linked above shows that this is
>>>likely to have too many knock-on consequences.  It doesn't make sense
>>>for force_const_mem to ask about vector modes that aren't valid for
>>>vector types, so this patch handles the condition there instead.
>>>
>>>This is needed for SVE multi-register modes, which are modelled as
>>>vector modes but are not usable as vector types.
>>>
>>>Tested on aarch64-linux-gnu, x86_64-linux-gnu and
>>>powerpc64le-linux-gnu.
>>>OK to install?
>>
>> I think we should get rid of the use entirely.
>
> I first read this as not using type_for_mode at all in force_const_mem,
> which sounded like a good thing :-)

 That's what I meant ;)  A mode doesn't really have a type...

   I tried it overnight on the usual
> at-least-one-target-per-CPU set and diffing the before and after
> assembly for the testsuite.  And it looks like i686 relies on this
> to get an alignment of 16 rather than 4 for XFmode constants:
> GET_MODE_ALIGNMENT (XFmode) == 32 (as requested by i386-modes.def),
> but i386's CONSTANT_ALIGNMENT increases it to 128 for static constants.

 Then the issue is that CONSTANT_ALIGNMENT takes a tree and not a mode...
 even worse than type_for_mode is a use of make_tree!  Incidentally
 ix86_constant_alignment _does_ look at the mode in the end...
>>>
>>> OK, I guess this means another target hook conversion.  The patch
>>> below converts CONSTANT_ALIGNMENT with its current interface.
>>> The definition:
>>>
>>>   #define CONSTANT_ALIGNMENT(EXP, ALIGN) \
>>> (TREE_CODE (EXP) == STRING_CST \
>>>  && (ALIGN) < BITS_PER_WORD ? BITS_PER_WORD : (ALIGN))
>>>
>>> was very common, so the patch adds a canned definition for that,
>>> called constant_alignment_word_strings.  Some ports had a variation
>>> that used a port-local FASTEST_ALIGNMENT instead of BITS_PER_WORD;
>>> the patch uses constant_alignment_word_strings if FASTEST_ALIGNMENT
>>> was always BITS_PER_WORD and a port-local hook function otherwise.
>>>
>>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
>>> Also tested by comparing the testsuite assembly output on at least one
>>> target per CPU directory.  I don't think this comes under Jeff's
>>> preapproval due to the constant_alignment_word_strings thing, so:
>>> OK to install?
>>
>> Ok.
>
> Thanks.  A bit later than intended, but here's the follow-on to add
> the new rtx hook.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Also tested by comparing the testsuite assembly output on at least one
> target per CPU directory.  OK to install?
>
> Richard
>
>
> 2017-10-01  Richard Sandiford  
>
> gcc/
>   * target.def (static_rtx_alignment): New hook.
>   * targhooks.h (default_static_rtx_alignment): Declare.
>   * targhooks.c (default_static_rtx_alignment): New function.
>   * doc/tm.texi.in (TARGET_STATIC_RTX_ALIGNMENT): New hook.
>   * doc/tm.texi: Regenerate.
>   * varasm.c (force_const_mem): Use targetm.static_rtx_alignment
>   instead of targetm.constant_alignment.  
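
For reference, the canned definition mentioned above amounts to
something like this (a hedged sketch mirroring the old macro; the
exact signature in the tree may differ):

HOST_WIDE_INT
constant_alignment_word_strings (const_tree exp, HOST_WIDE_INT align)
{
  /* Word-align string constants; leave other alignments alone.  */
  if (TREE_CODE (exp) == STRING_CST && align < BITS_PER_WORD)
    return BITS_PER_WORD;
  return align;
}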

Re: [libstdc++, patch] Fix build on APFS file system

2017-10-23 Thread Jonathan Wakely

On 23/10/17 19:48 +0200, FX wrote:

The patch seems like a rough bandaid to hide the real bug.  Better to identify 
the real bug.  If there is a missing dependency, then I'd like to think that 
adding the right dependency should resolve the issue.


So far, apart from a suggestion from Marc, I haven’t received any help or 
advice in identifying or debugging the issue.

FX


You could try this. The '|' below makes ${allstamped} an order-only
prerequisite of the PCH outputs: the headers must be generated first,
but the PCHs aren't rebuilt merely because a stamp is newer.


diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 2c4d193d0a4..39083cc4ebc 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -1016,6 +1016,8 @@ allcreated = \
 # Here are the rules for building the headers
 all-local: ${allstamped} ${allcreated}
 
+${pch_output} : | ${allstamped}
+
 # Ignore errors from 'mkdir -p' to avoid parallel make failure on
 # systems with broken mkdir.  Call mkdir unconditionally because
 # it is just as cheap to avoid going through the shell.
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index bc8556c68d2..e1a852e2906 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -1465,6 +1465,8 @@ uninstall-am:
 # Here are the rules for building the headers
 all-local: ${allstamped} ${allcreated}
 
+${pch_output} : | ${allstamped}
+
 # Ignore errors from 'mkdir -p' to avoid parallel make failure on
 # systems with broken mkdir.  Call mkdir unconditionally because
 # it is just as cheap to avoid going through the shell.


Re: [AArch64] Tweak aarch64_classify_address interface

2017-10-23 Thread Richard Sandiford
Ping.

Richard Sandiford  writes:
> Richard Sandiford  writes:
>> James Greenhalgh  writes:
>>> On Tue, Aug 22, 2017 at 10:23:47AM +0100, Richard Sandiford wrote:
 Previously aarch64_classify_address used an rtx code to distinguish
 LDP/STP addresses from normal addresses; the code was PARALLEL
 to select LDP/STP and anything else to select normal addresses.
 This patch replaces that parameter with a dedicated enum.
 
 The SVE port will add another enum value that didn't map naturally
 to an rtx code.
 
 Tested on aarch64-linux-gnu.  OK to install?
>>>
>>> I can't say I really like this new interface, I'd prefer two wrappers
>>> aarch64_legitimate_address_p, aarch64_legitimate_ldp_address_p (or similar)
>>> around your new interface, and for most code to simply call the wrapper.
>>> Or an overloaded call that filled in ADDR_QUERY_M automatically, to save
>>> that spreading through the backend.
>>
>> OK, I went for the second, putting the query type last and making it
>> an optional argument.
>
> By way of a ping, here's the patch updated to current trunk.
>
> Tested on aarch64-linux-gnu.  OK to install?
>
> Thanks,
> Richard
>
> 2017-09-18  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
>
> gcc/
>   * config/aarch64/aarch64-protos.h (aarch64_addr_query_type): New enum.
>   (aarch64_legitimate_address_p): Use it instead of an rtx code,
>   as an optional final parameter.
>   * config/aarch64/aarch64.c (aarch64_classify_address): Likewise.
>   (aarch64_legitimate_address_p): Likewise.
>   (aarch64_address_valid_for_prefetch_p): Update calls accordingly.
>   (aarch64_legitimate_address_hook_p): Likewise.
>   (aarch64_print_operand_address): Likewise.
>   (aarch64_address_cost): Likewise.
>   * config/aarch64/constraints.md (Umq, Ump): Likewise.
>   * config/aarch64/predicates.md (aarch64_mem_pair_operand): Likewise.
>
> Index: gcc/config/aarch64/aarch64-protos.h
> ===
> --- gcc/config/aarch64/aarch64-protos.h   2017-09-18 14:41:37.369070450 
> +0100
> +++ gcc/config/aarch64/aarch64-protos.h   2017-09-18 14:42:29.656488378 
> +0100
> @@ -111,6 +111,19 @@ enum aarch64_symbol_type
>SYMBOL_FORCE_TO_MEM
>  };
>  
> +/* Classifies the type of an address query.
> +
> +   ADDR_QUERY_M
> +  Query what is valid for an "m" constraint and a memory_operand
> +  (the rules are the same for both).
> +
> +   ADDR_QUERY_LDP_STP
> +  Query what is valid for a load/store pair.  */
> +enum aarch64_addr_query_type {
> +  ADDR_QUERY_M,
> +  ADDR_QUERY_LDP_STP
> +};
> +
>  /* A set of tuning parameters contains references to size and time
> cost models and vectors for address cost calculations, register
> move costs and memory move costs.  */
> @@ -427,7 +440,8 @@ bool aarch64_float_const_representable_p
>  
>  #if defined (RTX_CODE)
>  
> -bool aarch64_legitimate_address_p (machine_mode, rtx, RTX_CODE, bool);
> +bool aarch64_legitimate_address_p (machine_mode, rtx, bool,
> +aarch64_addr_query_type = ADDR_QUERY_M);
>  machine_mode aarch64_select_cc_mode (RTX_CODE, rtx, rtx);
>  rtx aarch64_gen_compare_reg (RTX_CODE, rtx, rtx);
>  rtx aarch64_load_tp (rtx);
> Index: gcc/config/aarch64/aarch64.c
> ===
> --- gcc/config/aarch64/aarch64.c  2017-09-18 14:41:37.373588926 +0100
> +++ gcc/config/aarch64/aarch64.c  2017-09-18 14:42:29.657389742 +0100
> @@ -4409,21 +4409,21 @@ virt_or_elim_regno_p (unsigned regno)
> || regno == ARG_POINTER_REGNUM);
>  }
>  
> -/* Return true if X is a valid address for machine mode MODE.  If it is,
> -   fill in INFO appropriately.  STRICT_P is true if REG_OK_STRICT is in
> -   effect.  OUTER_CODE is PARALLEL for a load/store pair.  */
> +/* Return true if X is a valid address of type TYPE for machine mode MODE.
> +   If it is, fill in INFO appropriately.  STRICT_P is true if
> +   REG_OK_STRICT is in effect.  */
>  
>  static bool
>  aarch64_classify_address (struct aarch64_address_info *info,
> -   rtx x, machine_mode mode,
> -   RTX_CODE outer_code, bool strict_p)
> +   rtx x, machine_mode mode, bool strict_p,
> +   aarch64_addr_query_type type = ADDR_QUERY_M)
>  {
>enum rtx_code code = GET_CODE (x);
>rtx op0, op1;
>  
>/* On BE, we use load/store pair for all large int mode load/stores.
>   TI/TFmode may also use a load/store pair.  */
> -  bool load_store_pair_p = (outer_code == PARALLEL
> +  bool load_store_pair_p = (type == ADDR_QUERY_LDP_STP
>   || mode == TImode
>   || mode == TFmode
>   
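
To make the new interface concrete, a hedged sketch of how call sites
change (based on the prototype above):

/* Ordinary memory_operand / "m"-constraint query: the query type
   defaults to ADDR_QUERY_M, so the old RTX_CODE argument is simply
   dropped...  */
if (aarch64_legitimate_address_p (mode, x, strict_p))
  ...

/* ...while load/store-pair checks, formerly signalled by passing
   PARALLEL, now say what they mean:  */
if (aarch64_legitimate_address_p (mode, x, strict_p, ADDR_QUERY_LDP_STP))
  ...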

Re: [PATCH] rl78 subdi3 improvement

2017-10-23 Thread DJ Delorie

Committed.  Thanks!

Note: your diff program isn't producing valid diffs...

* it's dropping leading tabs
* it's not putting a space after file names in the headers

I have to manually fix these to apply the patch; if you could fix it on
your end that would be appreciated :-)


Re: [libstdc++, patch] Fix build on APFS file system

2017-10-23 Thread FX
> The patch seems like a rough bandaid to hide the real bug.  Better to 
> identify the real bug.  If there is a missing dependency, then I'd like to 
> think that adding the right dependency should resolve the issue.

So far, apart from a suggestion from Marc, I haven’t received any help or 
advice in identifying or debugging the issue.

FX

[107/nnn] poly_int: GET_MODE_SIZE

2017-10-23 Thread Richard Sandiford
This patch changes GET_MODE_SIZE from unsigned short to poly_uint16.
The non-mechanical parts were handled by previous patches.
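
For readers following the series, the mechanical pattern at converted
call sites looks like this (a hedged sketch using the poly-int helpers
introduced earlier in the series):

/* Size comparisons go through the poly-int predicates...  */
if (must_eq (GET_MODE_SIZE (m1), GET_MODE_SIZE (m2)))
  ...

/* ...and code that really needs a compile-time constant asks for one
   explicitly:  */
unsigned int size;
if (GET_MODE_SIZE (m).is_constant (&size))
  ...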


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* machmode.h (mode_size): Change from unsigned short to
poly_uint16_pod.
(mode_to_bytes): Return a poly_uint16 rather than an unsigned short.
(GET_MODE_SIZE): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is not polynomial.
(fixed_size_mode::includes_p): Check for constant-sized modes.
* genmodes.c (emit_mode_size_inline): Make mode_size_inline
return a poly_uint16 rather than an unsigned short.
(emit_mode_size): Change the type of mode_size from unsigned short
to poly_uint16_pod.  Use ZERO_COEFFS for the initializer.
(emit_mode_adjustments): Cope with polynomial vector sizes.
* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
for GET_MODE_SIZE.
* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
for GET_MODE_SIZE.
* auto-inc-dec.c (try_merge): Treat GET_MODE_SIZE as polynomial.
* builtins.c (expand_ifn_atomic_compare_exchange_into_call): Likewise.
* caller-save.c (setup_save_areas): Likewise.
(replace_reg_with_saved_mem): Likewise.
* calls.c (emit_library_call_value_1): Likewise.
* combine-stack-adj.c (combine_stack_adjustments_for_block): Likewise.
* combine.c (simplify_set, make_extraction, simplify_shift_const_1)
(gen_lowpart_for_combine): Likewise.
* convert.c (convert_to_integer_1): Likewise.
* cse.c (equiv_constant, cse_insn): Likewise.
* cselib.c (autoinc_split, cselib_hash_rtx): Likewise.
(cselib_subst_to_values): Likewise.
* dce.c (word_dce_process_block): Likewise.
* df-problems.c (df_word_lr_mark_ref): Likewise.
* dwarf2cfi.c (init_one_dwarf_reg_size): Likewise.
* dwarf2out.c (multiple_reg_loc_descriptor, mem_loc_descriptor)
(concat_loc_descriptor, concatn_loc_descriptor, loc_descriptor)
(rtl_for_decl_location): Likewise.
* emit-rtl.c (gen_highpart, widen_memory_access): Likewise.
* expmed.c (extract_bit_field_1, extract_integral_bit_field): Likewise.
* expr.c (emit_group_load_1, clear_storage_hints): Likewise.
(emit_move_complex, emit_move_multi_word, emit_push_insn): Likewise.
(expand_expr_real_1): Likewise.
* function.c (assign_parm_setup_block_p, assign_parm_setup_block)
(pad_below): Likewise.
* gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise.
* gimple-ssa-store-merging.c (rhs_valid_for_store_merging_p): Likewise.
* ira.c (get_subreg_tracking_sizes): Likewise.
* ira-build.c (ira_create_allocno_objects): Likewise.
* ira-color.c (coalesced_pseudo_reg_slot_compare): Likewise.
(ira_sort_regnos_for_alter_reg): Likewise.
* ira-costs.c (record_operand_costs): Likewise.
* lower-subreg.c (interesting_mode_p, simplify_gen_subreg_concatn)
(resolve_simple_move): Likewise.
* lra-constraints.c (get_reload_reg, operands_match_p): Likewise.
(process_addr_reg, simplify_operand_subreg, lra_constraints): Likewise.
(CONST_POOL_OK_P): Reject variable-sized modes.
* lra-spills.c (slot, assign_mem_slot, pseudo_reg_slot_compare)
(add_pseudo_to_slot, lra_spill): Likewise.
* omp-low.c (omp_clause_aligned_alignment): Likewise.
* optabs-query.c (get_best_extraction_insn): Likewise.
* optabs-tree.c (expand_vec_cond_expr_p): Likewise.
* optabs.c (expand_vec_perm, expand_vec_cond_expr): Likewise.
(expand_mult_highpart, valid_multiword_target_p): Likewise.
* recog.c (offsettable_address_addr_space_p): Likewise.
* regcprop.c (maybe_mode_change): Likewise.
* reginfo.c (choose_hard_reg_mode, record_subregs_of_mode): Likewise.
* regrename.c (build_def_use): Likewise.
* regstat.c (dump_reg_info): Likewise.
* reload.c (complex_word_subreg_p, push_reload, find_dummy_reload)
(find_reloads, find_reloads_subreg_address): Likewise.
* reload1.c (eliminate_regs_1): Likewise.
* rtlanal.c (for_each_inc_dec_find_inc_dec, rtx_cost): Likewise.
* simplify-rtx.c (avoid_constant_pool_reference): Likewise.
(simplify_binary_operation_1, simplify_subreg): Likewise.
* targhooks.c (default_function_arg_padding): Likewise.
(default_hard_regno_nregs, default_class_max_nregs): Likewise.
* tree-cfg.c (verify_gimple_assign_binary): Likewise.
(verify_gimple_assign_ternary): Likewise.
* tree-inline.c (estimate_move_cost): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
   

Re: [libstdc++, patch] Fix build on APFS file system

2017-10-23 Thread Mike Stump
On Oct 18, 2017, at 7:51 AM, FX  wrote:
> 
> Parallel builds of libstdc++ on APFS filesystem (with 1 ns granularity) on 
> macOS 10.13 often fail (failure rate for “make -j2” to “make -j8” is about 
> 60% from my own builds and results reported by others): 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81797
> This is reproducible with several versions of GNU make.
> 
> Changing libstdc++’s makefile to mark install-headers with .NOTPARALLEL fixes 
> the issue. We've carried that patch in Homebrew (https://brew.sh) for a few 
> months now, and have had no report of build issues since then.
> 
> Bootstrapped and regtested on x86_64-apple-darwin17 (as well as other 
> platforms). OK to commit?

The patch seems like a rough bandaid to hide the real bug.  Better to identify 
the real bug.  If there is a missing dependency, then I'd like to think that 
adding the right dependency should resolve the issue.



[106/nnn] poly_int: GET_MODE_BITSIZE

2017-10-23 Thread Richard Sandiford
This patch changes GET_MODE_BITSIZE from an unsigned short
to a poly_uint16.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* machmode.h (mode_to_bits): Return a poly_uint16 rather than an
unsigned short.
(GET_MODE_BITSIZE): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is polynomial.
* calls.c (shift_return_value): Treat GET_MODE_BITSIZE as polynomial.
* combine.c (make_extraction): Likewise.
* dse.c (find_shift_sequence): Likewise.
* dwarf2out.c (mem_loc_descriptor): Likewise.
* expmed.c (store_integral_bit_field, extract_bit_field_1): Likewise.
(extract_bit_field, extract_low_bits): Likewise.
* expr.c (convert_move, convert_modes, emit_move_insn_1): Likewise.
(optimize_bitfield_assignment_op, expand_assignment): Likewise.
(store_field, expand_expr_real_1): Likewise.
* fold-const.c (optimize_bit_field_compare, merge_ranges): Likewise.
* gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise.
* reload.c (find_reloads): Likewise.
* reload1.c (alter_reg): Likewise.
* stor-layout.c (bitwise_mode_for_mode, compute_record_mode): Likewise.
* targhooks.c (default_secondary_memory_needed_mode): Likewise.
* tree-if-conv.c (predicate_mem_writes): Likewise.
* tree-ssa-strlen.c (handle_builtin_memcmp): Likewise.
* tree-vect-patterns.c (adjust_bool_pattern): Likewise.
* tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.
* valtrack.c (dead_debug_insert_temp): Likewise.
* varasm.c (mergeable_constant_section): Likewise.
* config/sh/sh.h (LOCAL_ALIGNMENT): Use as_a <fixed_size_mode>.

gcc/ada/
* gcc-interface/misc.c (enumerate_modes): Treat GET_MODE_BITSIZE
as polynomial.

gcc/c-family/
* c-ubsan.c (ubsan_instrument_shift): Treat GET_MODE_BITSIZE
as polynomial.

Index: gcc/machmode.h
===
--- gcc/machmode.h  2017-10-23 17:25:54.180292158 +0100
+++ gcc/machmode.h  2017-10-23 17:25:57.265181271 +0100
@@ -527,7 +527,7 @@ mode_to_bytes (machine_mode mode)
 
 /* Return the base GET_MODE_BITSIZE value for MODE.  */
 
-ALWAYS_INLINE unsigned short
+ALWAYS_INLINE poly_uint16
 mode_to_bits (machine_mode mode)
 {
   return mode_to_bytes (mode) * BITS_PER_UNIT;
@@ -600,7 +600,29 @@ #define GET_MODE_SIZE(MODE) (mode_to_byt
 
 /* Get the size in bits of an object of mode MODE.  */
 
-#define GET_MODE_BITSIZE(MODE) (mode_to_bits (MODE))
+#if ONLY_FIXED_SIZE_MODES
+#define GET_MODE_BITSIZE(MODE) ((unsigned short) mode_to_bits (MODE).coeffs[0])
+#else
+ALWAYS_INLINE poly_uint16
+GET_MODE_BITSIZE (machine_mode mode)
+{
+  return mode_to_bits (mode);
+}
+
+template<typename T>
+ALWAYS_INLINE typename if_poly<typename T::measurement_type>::t
+GET_MODE_BITSIZE (const T &mode)
+{
+  return mode_to_bits (mode);
+}
+
+template<typename T>
+ALWAYS_INLINE typename if_nonpoly<typename T::measurement_type>::t
+GET_MODE_BITSIZE (const T &mode)
+{
+  return mode_to_bits (mode).coeffs[0];
+}
+#endif
 
 /* Get the number of value bits of an object of mode MODE.  */
 
Index: gcc/calls.c
===
--- gcc/calls.c 2017-10-23 17:25:46.488568637 +0100
+++ gcc/calls.c 2017-10-23 17:25:57.257181559 +0100
@@ -2835,12 +2835,11 @@ check_sibcall_argument_overlap (rtx_insn
 bool
 shift_return_value (machine_mode mode, bool left_p, rtx value)
 {
-  HOST_WIDE_INT shift;
-
   gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
   machine_mode value_mode = GET_MODE (value);
-  shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
-  if (shift == 0)
+  poly_int64 shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
+
+  if (known_zero (shift))
 return false;
 
   /* Use ashr rather than lshr for right shifts.  This is for the benefit
Index: gcc/combine.c
===
--- gcc/combine.c   2017-10-23 17:25:54.176292301 +0100
+++ gcc/combine.c   2017-10-23 17:25:57.258181523 +0100
@@ -7675,8 +7675,9 @@ make_extraction (machine_mode mode, rtx
  are the same as for a register operation, since at present we don't
  have named patterns for aligned memory structures.  */
   struct extraction_insn insn;
-  if (get_best_reg_extraction_insn (, pattern,
-   GET_MODE_BITSIZE (inner_mode), mode))
+  unsigned int inner_size;
+  if (GET_MODE_BITSIZE (inner_mode).is_constant (&inner_size)
+      && get_best_reg_extraction_insn (&insn, pattern, inner_size, mode))
 {
   wanted_inner_reg_mode = insn.struct_mode.require ();
   pos_mode = insn.pos_mode;
@@ -7712,9 +7713,11 @@ make_extraction (machine_mode mode, rtx
 If it's a MEM we need to recompute POS relative to that.
 However, if we're extracting from (or inserting into) a 

[104/nnn] poly_int: GET_MODE_PRECISION

2017-10-23 Thread Richard Sandiford
This patch changes GET_MODE_PRECISION from an unsigned short
to a poly_uint16.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* machmode.h (mode_precision): Change from unsigned short to
poly_uint16_pod.
(mode_to_precision): Return a poly_uint16 rather than an unsigned
short.
(GET_MODE_PRECISION): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is not polynomial.
(HWI_COMPUTABLE_MODE_P): Turn into a function.  Optimize the case
in which the mode is already known to be a scalar_int_mode.
* genmodes.c (emit_mode_precision): Change the type of mode_precision
from unsigned short to poly_uint16_pod.  Use ZERO_COEFFS for the
initializer.
* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
for GET_MODE_PRECISION.
* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
for GET_MODE_PRECISION.
* combine.c (update_rsp_from_reg_equal): Treat GET_MODE_PRECISION
as polynomial.
(try_combine, find_split_point, combine_simplify_rtx): Likewise.
(expand_field_assignment, make_extraction): Likewise.
(make_compound_operation_int, record_dead_and_set_regs_1): Likewise.
(get_last_value): Likewise.
* convert.c (convert_to_integer_1): Likewise.
* cse.c (cse_insn): Likewise.
* expr.c (expand_expr_real_1): Likewise.
* lra-constraints.c (simplify_operand_subreg): Likewise.
* optabs-query.c (can_atomic_load_p): Likewise.
* optabs.c (expand_atomic_load): Likewise.
(expand_atomic_store): Likewise.
* ree.c (combine_reaching_defs): Likewise.
* rtl.h (partial_subreg_p, paradoxical_subreg_p): Likewise.
* rtlanal.c (nonzero_bits1, lsb_bitfield_op_p): Likewise.
* tree.h (type_has_mode_precision_p): Likewise.
* ubsan.c (instrument_si_overflow): Likewise.

gcc/ada/
* gcc-interface/misc.c (enumerate_modes): Treat GET_MODE_PRECISION
as polynomial.

Index: gcc/machmode.h
===
--- gcc/machmode.h  2017-10-23 17:25:48.620492005 +0100
+++ gcc/machmode.h  2017-10-23 17:25:54.180292158 +0100
@@ -23,7 +23,7 @@ #define HAVE_MACHINE_MODES
typedef opt_mode<machine_mode> opt_machine_mode;
 
 extern CONST_MODE_SIZE unsigned short mode_size[NUM_MACHINE_MODES];
-extern const unsigned short mode_precision[NUM_MACHINE_MODES];
+extern const poly_uint16_pod mode_precision[NUM_MACHINE_MODES];
 extern const unsigned char mode_inner[NUM_MACHINE_MODES];
 extern const poly_uint16_pod mode_nunits[NUM_MACHINE_MODES];
 extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];
@@ -535,7 +535,7 @@ mode_to_bits (machine_mode mode)
 
 /* Return the base GET_MODE_PRECISION value for MODE.  */
 
-ALWAYS_INLINE unsigned short
+ALWAYS_INLINE poly_uint16
 mode_to_precision (machine_mode mode)
 {
   return mode_precision[mode];
@@ -604,7 +604,30 @@ #define GET_MODE_BITSIZE(MODE) (mode_to_
 
 /* Get the number of value bits of an object of mode MODE.  */
 
-#define GET_MODE_PRECISION(MODE) (mode_to_precision (MODE))
+#if ONLY_FIXED_SIZE_MODES
+#define GET_MODE_PRECISION(MODE) \
+  ((unsigned short) mode_to_precision (MODE).coeffs[0])
+#else
+ALWAYS_INLINE poly_uint16
+GET_MODE_PRECISION (machine_mode mode)
+{
+  return mode_to_precision (mode);
+}
+
+template<typename T>
+ALWAYS_INLINE typename if_poly<typename T::measurement_type>::type
+GET_MODE_PRECISION (const T &mode)
+{
+  return mode_to_precision (mode);
+}
+
+template<typename T>
+ALWAYS_INLINE typename if_nonpoly<typename T::measurement_type>::type
+GET_MODE_PRECISION (const T &mode)
+{
+  return mode_to_precision (mode).coeffs[0];
+}
+#endif
 
 /* Get the number of integral bits of an object of mode MODE.  */
 extern CONST_MODE_IBIT unsigned char mode_ibit[NUM_MACHINE_MODES];
@@ -863,9 +886,22 @@ #define TRULY_NOOP_TRUNCATION_MODES_P(MO
   (targetm.truly_noop_truncation (GET_MODE_PRECISION (MODE1), \
  GET_MODE_PRECISION (MODE2)))
 
-#define HWI_COMPUTABLE_MODE_P(MODE) \
-  (SCALAR_INT_MODE_P (MODE) \
-   && GET_MODE_PRECISION (MODE) <= HOST_BITS_PER_WIDE_INT)
+/* Return true if MODE is a scalar integer mode that fits in a
+   HOST_WIDE_INT.  */
+
+inline bool
+HWI_COMPUTABLE_MODE_P (machine_mode mode)
+{
+  machine_mode mme = mode;
+  return (SCALAR_INT_MODE_P (mme)
+ && mode_to_precision (mme).coeffs[0] <= HOST_BITS_PER_WIDE_INT);
+}
+
+inline bool
+HWI_COMPUTABLE_MODE_P (scalar_int_mode mode)
+{
+  return GET_MODE_PRECISION (mode) <= HOST_BITS_PER_WIDE_INT;
+}
 
 struct int_n_data_t {
   /* These parts are initailized by genmodes output */
Index: gcc/genmodes.c
===
--- gcc/genmodes.c  2017-10-23 17:25:48.618492077 +0100
+++ gcc/genmodes.c  2017-10-23 17:25:54.178292230 +0100
@@ -1358,13 

[105/nnn] poly_int: expand_assignment

2017-10-23 Thread Richard Sandiford
This patch makes the CONCAT handling in expand_assignment cope with
polynomial mode sizes.  The mode of the CONCAT must be complex,
so we can base the tests on the sizes of the real and imaginary
components.
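
For instance, for a CONCAT of mode DCmode (two DFmode halves),
GET_MODE_BITSIZE (to_mode) is 128 while GET_MODE_UNIT_BITSIZE (to_mode)
is 64, so the code can test against the always-constant unit size
instead of dividing the (now possibly polynomial) mode size by 2.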


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* expr.c (expand_assignment): Cope with polynomial mode sizes
when assigning to a CONCAT.

Index: gcc/expr.c
===
--- gcc/expr.c  2017-10-23 17:25:54.178292230 +0100
+++ gcc/expr.c  2017-10-23 17:25:56.086223649 +0100
@@ -5109,32 +5109,36 @@ expand_assignment (tree to, tree from, b
   /* Handle expand_expr of a complex value returning a CONCAT.  */
   else if (GET_CODE (to_rtx) == CONCAT)
{
- unsigned short mode_bitsize = GET_MODE_BITSIZE (GET_MODE (to_rtx));
+ machine_mode to_mode = GET_MODE (to_rtx);
+ gcc_checking_assert (COMPLEX_MODE_P (to_mode));
+ poly_int64 mode_bitsize = GET_MODE_BITSIZE (to_mode);
+ unsigned short inner_bitsize = GET_MODE_UNIT_BITSIZE (to_mode);
  if (COMPLEX_MODE_P (TYPE_MODE (TREE_TYPE (from)))
  && known_zero (bitpos)
  && must_eq (bitsize, mode_bitsize))
result = store_expr (from, to_rtx, false, nontemporal, reversep);
- else if (must_eq (bitsize, mode_bitsize / 2)
+ else if (must_eq (bitsize, inner_bitsize)
   && (known_zero (bitpos)
-  || must_eq (bitpos, mode_bitsize / 2)))
+  || must_eq (bitpos, inner_bitsize)))
result = store_expr (from, XEXP (to_rtx, maybe_nonzero (bitpos)),
 false, nontemporal, reversep);
- else if (must_le (bitpos + bitsize, mode_bitsize / 2))
+ else if (must_le (bitpos + bitsize, inner_bitsize))
result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
  bitregion_start, bitregion_end,
  mode1, from, get_alias_set (to),
  nontemporal, reversep);
- else if (must_ge (bitpos, mode_bitsize / 2))
+ else if (must_ge (bitpos, inner_bitsize))
result = store_field (XEXP (to_rtx, 1), bitsize,
- bitpos - mode_bitsize / 2,
+ bitpos - inner_bitsize,
  bitregion_start, bitregion_end,
  mode1, from, get_alias_set (to),
  nontemporal, reversep);
- else if (known_zero (bitpos) && must_eq (bitsize, mode_bitsize))
+ else if (known_zero (bitpos)
+  && must_eq (bitsize, mode_bitsize))
{
  rtx from_rtx;
  result = expand_normal (from);
- from_rtx = simplify_gen_subreg (GET_MODE (to_rtx), result,
+ from_rtx = simplify_gen_subreg (to_mode, result,
  TYPE_MODE (TREE_TYPE (from)), 0);
  emit_move_insn (XEXP (to_rtx, 0),
  read_complex_part (from_rtx, false));


[103/nnn] poly_int: TYPE_VECTOR_SUBPARTS

2017-10-23 Thread Richard Sandiford
This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64.  The value is
encoded in the 10-bit precision field and was previously always stored
as a simple log2 value.  The challenge was to use this 10 bits to
encode the number of elements in variable-length vectors, so that
we didn't need to increase the size of the tree.

In practice the number of vector elements should always have the form
N + N * X (where X is the runtime value), and as for constant-length
vectors, N must be a power of 2 (even though X itself might not be).
The patch therefore uses the low bit to select between constant-length
and variable-length and uses the upper 9 bits to encode log2(N).
Targets without variable-length vectors continue to use the old scheme.
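
As a sketch of the encoding (the helper below is illustrative only
and assumes NUM_POLY_INT_COEFFS == 2; the real accessors are the
tree.h macros changed by this patch):

  /* Pack a valid subpart count N + N * X (N a power of 2) into the
     10-bit precision field: log2 (N) goes in the upper 9 bits and
     the low bit says whether the X term is present.  */
  static unsigned int
  encode_subparts (poly_uint64 nunits)
  {
    int index = exact_log2 (nunits.coeffs[0]);
    gcc_assert (index >= 0);
    /* N + N * X requires the two coefficients to be equal.  */
    gcc_assert (nunits.coeffs[1] == 0
                || nunits.coeffs[1] == nunits.coeffs[0]);
    return (index << 1) | (nunits.coeffs[1] != 0 ? 1 : 0);
  }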

A new valid_vector_subparts_p function tests whether a given number
of elements can be encoded.  This is false for the vector modes that
represent an LD3 or ST3 vector triple (which we want to treat as arrays
of vectors rather than single vectors).

Most of the patch is mechanical; previous patches handled the changes
that weren't entirely straightforward.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree.h (TYPE_VECTOR_SUBPARTS): Turn into a function and handle
polynomial numbers of units.
(SET_TYPE_VECTOR_SUBPARTS): Likewise.
(valid_vector_subparts_p): New function.
(build_vector_type): Remove temporary shim and take the number
of units as a poly_uint64 rather than an int.
(build_opaque_vector_type): Take the number of units as a
poly_uint64 rather than an int.
* tree.c (build_vector): Handle polynomial TYPE_VECTOR_SUBPARTS.
(build_vector_from_ctor, type_hash_canon_hash): Likewise.
(type_cache_hasher::equal, uniform_vector_p): Likewise.
(vector_type_mode): Likewise.
(build_vector_from_val): If the number of units isn't constant,
use build_vec_duplicate_cst for constant operands and
VEC_DUPLICATE_EXPR otherwise.
(make_vector_type): Remove temporary is_constant ().
(build_vector_type, build_opaque_vector_type): Take the number of
units as a poly_uint64 rather than an int.
* cfgexpand.c (expand_debug_expr): Handle polynomial
TYPE_VECTOR_SUBPARTS.
* expr.c (count_type_elements, store_constructor): Likewise.
* fold-const.c (const_binop, const_unop, fold_convert_const)
(operand_equal_p, fold_view_convert_expr, fold_vec_perm)
(fold_ternary_loc, fold_relational_const): Likewise.
(native_interpret_vector): Likewise.  Change the size from an
int to an unsigned int.
* gimple-fold.c (gimple_fold_stmt_to_constant_1): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(gimple_fold_indirect_ref, gimple_build_vector): Likewise.
(gimple_build_vector_from_val): Use VEC_DUPLICATE_EXPR when
duplicating a non-constant operand into a variable-length vector.
* match.pd: Handle polynomial TYPE_VECTOR_SUBPARTS.
* omp-simd-clone.c (simd_clone_subparts): Likewise.
* print-tree.c (print_node): Likewise.
* stor-layout.c (layout_type): Likewise.
* targhooks.c (default_builtin_vectorization_cost): Likewise.
* tree-cfg.c (verify_gimple_comparison): Likewise.
(verify_gimple_assign_binary): Likewise.
(verify_gimple_assign_ternary): Likewise.
(verify_gimple_assign_single): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-vect-data-refs.c (vect_permute_store_chain): Likewise.
(vect_grouped_load_supported, vect_permute_load_chain): Likewise.
(vect_shift_permute_load_chain): Likewise.
* tree-vect-generic.c (nunits_for_known_piecewise_op): Likewise.
(expand_vector_condition, optimize_vector_constructor): Likewise.
(lower_vec_perm, get_compute_type): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
(get_initial_defs_for_reduction, vect_transform_loop): Likewise.
* tree-vect-patterns.c (vect_recog_bool_pattern): Likewise.
(vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-slp.c (vect_supported_load_permutation_p): Likewise.
(vect_get_constant_vectors, vect_transform_slp_perm_load): Likewise.
* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
(get_group_load_store_type, vectorizable_mask_load_store): Likewise.
(vectorizable_bswap, simd_clone_subparts, vectorizable_assignment)
(vectorizable_shift, vectorizable_operation, vectorizable_store)
(vect_gen_perm_mask_any, vectorizable_load, vect_is_simple_cond)
(vectorizable_comparison, supportable_widening_operation): Likewise.
(supportable_narrowing_operation): Likewise.

gcc/ada/
* gcc-interface/utils.c 

[102/nnn] poly_int: vect_permute_load/store_chain

2017-10-23 Thread Richard Sandiford
The GET_MODE_NUNITS patch made vect_grouped_store_supported and
vect_grouped_load_supported check for a constant number of elements,
so vect_permute_store_chain and vect_permute_load_chain can assert
for that.  This patch adds commentary to that effect; the actual
asserts will be added by a later, more mechanical, patch.

The patch also reorganises the function so that the asserts
are linked specifically to code that builds permute vectors
element-by-element.  This allows a later patch to add support
for some variable-length permutes.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-data-refs.c (vect_permute_store_chain): Reorganize
so that both the length == 3 and length != 3 cases set up their
own permute vectors.  Add comments explaining why we know the
number of elements is constant.
(vect_permute_load_chain): Likewise.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2017-10-23 17:25:48.623491897 +0100
+++ gcc/tree-vect-data-refs.c   2017-10-23 17:25:50.361429427 +0100
@@ -4734,11 +4734,7 @@ vect_permute_store_chain (vec dr_c
   tree perm_mask_low, perm_mask_high;
   tree data_ref;
   tree perm3_mask_low, perm3_mask_high;
-  unsigned int i, n, log_length = exact_log2 (length);
-  unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype);
-
-  auto_vec_perm_indices sel (nelt);
-  sel.quick_grow (nelt);
+  unsigned int i, j, n, log_length = exact_log2 (length);
 
   result_chain->quick_grow (length);
   memcpy (result_chain->address (), dr_chain.address (),
@@ -4746,8 +4742,12 @@ vect_permute_store_chain (vec dr_c
 
   if (length == 3)
 {
+  /* vect_grouped_store_supported ensures that this is constant.  */
+  unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype);
   unsigned int j0 = 0, j1 = 0, j2 = 0;
 
+  auto_vec_perm_indices sel (nelt);
+  sel.quick_grow (nelt);
   for (j = 0; j < 3; j++)
 {
  int nelt0 = ((3 - j) * nelt) % 3;
@@ -4806,6 +4806,10 @@ vect_permute_store_chain (vec dr_c
   /* If length is not equal to 3 then only power of 2 is supported.  */
   gcc_assert (pow2p_hwi (length));
 
+  /* vect_grouped_store_supported ensures that this is constant.  */
+  unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype);
+  auto_vec_perm_indices sel (nelt);
+  sel.quick_grow (nelt);
   for (i = 0, n = nelt / 2; i < n; i++)
{
  sel[i * 2] = i;
@@ -5321,10 +5325,6 @@ vect_permute_load_chain (vec dr_ch
   gimple *perm_stmt;
   tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt));
   unsigned int i, j, log_length = exact_log2 (length);
-  unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
-
-  auto_vec_perm_indices sel (nelt);
-  sel.quick_grow (nelt);
 
   result_chain->quick_grow (length);
   memcpy (result_chain->address (), dr_chain.address (),
@@ -5332,8 +5332,12 @@ vect_permute_load_chain (vec dr_ch
 
   if (length == 3)
 {
+  /* vect_grouped_load_supported ensures that this is constant.  */
+  unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
   unsigned int k;
 
+  auto_vec_perm_indices sel (nelt);
+  sel.quick_grow (nelt);
   for (k = 0; k < 3; k++)
{
  for (i = 0; i < nelt; i++)
@@ -5379,6 +5383,10 @@ vect_permute_load_chain (vec dr_ch
   /* If length is not equal to 3 then only power of 2 is supported.  */
   gcc_assert (pow2p_hwi (length));
 
+  /* vect_grouped_load_supported ensures that this is constant.  */
+  unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
+  auto_vec_perm_indices sel (nelt);
+  sel.quick_grow (nelt);
   for (i = 0; i < nelt; ++i)
sel[i] = i * 2;
   perm_mask_even = vect_gen_perm_mask_checked (vectype, sel);


[101/nnn] poly_int: GET_MODE_NUNITS

2017-10-23 Thread Richard Sandiford
This patch changes GET_MODE_NUNITS from unsigned char
to poly_uint16, although it remains a macro when compiling
target code with NUM_POLY_INT_COEFFS == 1.

If the number of units isn't known at compile time, we use:

  (const:M (vec_duplicate:M X))

to represent a vector in which every element is equal to X.  The code
ensures that there is only a single instance of each constant, so that
pointer equality is enough.  (This is a requirement for the constants
that go in const_tiny_rtx, but we might as well do it for all constants.)

Similarly we use:

  (const:M (vec_series:M A B))

for a linear series starting at A and having step B.
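
Taking AArch64's VNx4SI as an example mode name, the vector
{ 0, 1, 2, ... } with a runtime number of elements is then:

  (const:VNx4SI (vec_series:VNx4SI (const_int 0) (const_int 1)))

and a vector with every element equal to 7 is:

  (const:VNx4SI (vec_duplicate:VNx4SI (const_int 7)))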

The to_constant call in make_vector_type goes away in a later patch.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* machmode.h (mode_nunits): Change from unsigned char to
poly_uint16_pod.
(ONLY_FIXED_SIZE_MODES): New macro.
(pod_mode::measurement_type, scalar_int_mode::measurement_type)
(scalar_float_mode::measurement_type, scalar_mode::measurement_type)
(complex_mode::measurement_type, fixed_size_mode::measurement_type):
New typedefs.
(mode_to_nunits): Return a poly_uint16 rather than an unsigned short.
(GET_MODE_NUNITS): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is not polynomial.
* genmodes.c (ZERO_COEFFS): New macro.
(emit_mode_nunits_inline): Make mode_nunits_inline return a
poly_uint16.
(emit_mode_nunits): Change the type of mode_nunits to poly_uint16_pod.
Use ZERO_COEFFS when emitting initializers.
* data-streamer.h (bp_pack_poly_value): New function.
(bp_unpack_poly_value): Likewise.
* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
for GET_MODE_NUNITS.
* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
for GET_MODE_NUNITS.
* tree.c (make_vector_type): Remove temporary shim and make
the real function take the number of units as a poly_uint64
rather than an int.
(build_vector_type_for_mode): Handle polynomial nunits.
* emit-rtl.c (gen_const_vec_duplicate_1): Likewise.
(gen_const_vec_series, gen_rtx_CONST_VECTOR): Likewise.
* genrecog.c (validate_pattern): Likewise.
* optabs-query.c (can_mult_highpart_p): Likewise.
* optabs-tree.c (expand_vec_cond_expr_p): Likewise.
* optabs.c (expand_vector_broadcast, expand_binop_directly)
(shift_amt_for_vec_perm_mask, expand_vec_perm, expand_vec_cond_expr)
(expand_mult_highpart): Likewise.
* rtlanal.c (subreg_get_info): Likewise.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.
(simplify_const_unary_operation, simplify_binary_operation_1)
(simplify_const_binary_operation, simplify_ternary_operation)
(test_vector_ops_duplicate, test_vector_ops): Likewise.
* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
(vect_grouped_load_supported): Likewise.
* tree-vect-generic.c (type_for_widest_vector_mode): Likewise.
* tree-vect-loop.c (have_whole_vector_shift): Likewise.

gcc/ada/
* gcc-interface/misc.c (enumerate_modes): Handle polynomial
GET_MODE_NUNITS.

Index: gcc/machmode.h
===
--- gcc/machmode.h  2017-10-23 17:11:54.535862371 +0100
+++ gcc/machmode.h  2017-10-23 17:25:48.620492005 +0100
@@ -25,7 +25,7 @@ typedef opt_mode opt_machi
 extern CONST_MODE_SIZE unsigned short mode_size[NUM_MACHINE_MODES];
 extern const unsigned short mode_precision[NUM_MACHINE_MODES];
 extern const unsigned char mode_inner[NUM_MACHINE_MODES];
-extern const unsigned char mode_nunits[NUM_MACHINE_MODES];
+extern const poly_uint16_pod mode_nunits[NUM_MACHINE_MODES];
 extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];
 extern const unsigned short mode_unit_precision[NUM_MACHINE_MODES];
 extern const unsigned char mode_wider[NUM_MACHINE_MODES];
@@ -76,6 +76,14 @@ struct mode_traits
   typedef machine_mode from_int;
 };
 
+/* Always treat machine modes as fixed-size while compiling code specific
+   to targets that have no variable-size modes.  */
+#if defined (IN_TARGET_CODE) && NUM_POLY_INT_COEFFS == 1
+#define ONLY_FIXED_SIZE_MODES 1
+#else
+#define ONLY_FIXED_SIZE_MODES 0
+#endif
+
 /* Get the name of mode MODE as a string.  */
 
 extern const char * const mode_name[NUM_MACHINE_MODES];
@@ -313,6 +321,7 @@ opt_mode::exists (U *mode) const
 struct pod_mode
 {
   typedef typename mode_traits<T>::from_int from_int;
+  typedef typename T::measurement_type measurement_type;
 
   machine_mode m_mode;
   ALWAYS_INLINE operator machine_mode () const { return m_mode; }
@@ -391,6 +400,7 @@ is_a (machine_mode m, U *result)
 {
 

[100/nnn] poly_int: memrefs_conflict_p

2017-10-23 Thread Richard Sandiford
The xsize and ysize arguments to memrefs_conflict_p are encoded such
that:

- 0 means the size is unknown
- >0 means the size is known
- <0 means that the negative of the size is a worst-case size after
  alignment

In other words, the sign effectively encodes a boolean; it isn't
meant to be taken literally.  With poly_ints these correspond to:

- known_zero (...)
- may_gt (..., 0)
- may_lt (..., 0)

respectively.
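
For example, a worst-case size of 4 + 4x bytes after alignment is
passed in as -(4 + 4x); may_lt (-(4 + 4x), 0) holds for all x >= 0,
so the polynomial form behaves like the old negative-size convention,
while may_gt (4 + 4x, 0) identifies a known size just as xsize > 0 did.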


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* alias.c (addr_side_effect_eval): Take the size as a poly_int64
rather than an int.  Use plus_constant.
(memrefs_conflict_p): Take the sizes as poly_int64s rather than ints.
Take the offset "c" as a poly_int64 rather than a HOST_WIDE_INT.

Index: gcc/alias.c
===
--- gcc/alias.c 2017-10-23 17:16:50.356530167 +0100
+++ gcc/alias.c 2017-10-23 17:25:47.476533124 +0100
@@ -148,7 +148,6 @@ struct GTY(()) alias_set_entry {
 };
 
 static int rtx_equal_for_memref_p (const_rtx, const_rtx);
-static int memrefs_conflict_p (int, rtx, int, rtx, HOST_WIDE_INT);
 static void record_set (rtx, const_rtx, void *);
 static int base_alias_check (rtx, rtx, rtx, rtx, machine_mode,
 machine_mode);
@@ -2295,9 +2294,9 @@ get_addr (rtx x)
 is not modified by the memory reference then ADDR is returned.  */
 
 static rtx
-addr_side_effect_eval (rtx addr, int size, int n_refs)
+addr_side_effect_eval (rtx addr, poly_int64 size, int n_refs)
 {
-  int offset = 0;
+  poly_int64 offset = 0;
 
   switch (GET_CODE (addr))
 {
@@ -2318,11 +2317,7 @@ addr_side_effect_eval (rtx addr, int siz
   return addr;
 }
 
-  if (offset)
-addr = gen_rtx_PLUS (GET_MODE (addr), XEXP (addr, 0),
-gen_int_mode (offset, GET_MODE (addr)));
-  else
-addr = XEXP (addr, 0);
+  addr = plus_constant (GET_MODE (addr), XEXP (addr, 0), offset);
   addr = canon_rtx (addr);
 
   return addr;
@@ -2372,7 +2367,8 @@ offset_overlap_p (poly_int64 c, poly_int
If that is fixed the TBAA hack for union type-punning can be removed.  */
 
 static int
-memrefs_conflict_p (int xsize, rtx x, int ysize, rtx y, HOST_WIDE_INT c)
+memrefs_conflict_p (poly_int64 xsize, rtx x, poly_int64 ysize, rtx y,
+   poly_int64 c)
 {
   if (GET_CODE (x) == VALUE)
 {
@@ -2417,13 +2413,13 @@ memrefs_conflict_p (int xsize, rtx x, in
   else if (GET_CODE (x) == LO_SUM)
 x = XEXP (x, 1);
   else
-x = addr_side_effect_eval (x, abs (xsize), 0);
+x = addr_side_effect_eval (x, may_lt (xsize, 0) ? -xsize : xsize, 0);
   if (GET_CODE (y) == HIGH)
 y = XEXP (y, 0);
   else if (GET_CODE (y) == LO_SUM)
 y = XEXP (y, 1);
   else
-y = addr_side_effect_eval (y, abs (ysize), 0);
+y = addr_side_effect_eval (y, may_lt (ysize, 0) ? -ysize : ysize, 0);
 
   if (GET_CODE (x) == SYMBOL_REF && GET_CODE (y) == SYMBOL_REF)
 {
@@ -2436,7 +2432,7 @@ memrefs_conflict_p (int xsize, rtx x, in
 through alignment adjustments (i.e., that have negative
 sizes), because we can't know how far they are from each
 other.  */
-  if (xsize < 0 || ysize < 0)
+  if (may_lt (xsize, 0) || may_lt (ysize, 0))
return -1;
   /* If decls are different or we know by offsets that there is no overlap,
 we win.  */
@@ -2467,6 +2463,7 @@ memrefs_conflict_p (int xsize, rtx x, in
   else if (x1 == y)
return memrefs_conflict_p (xsize, x0, ysize, const0_rtx, c);
 
+  poly_int64 cx1, cy1;
   if (GET_CODE (y) == PLUS)
{
  /* The fact that Y is canonicalized means that this
@@ -2483,22 +2480,21 @@ memrefs_conflict_p (int xsize, rtx x, in
return memrefs_conflict_p (xsize, x0, ysize, y0, c);
  if (rtx_equal_for_memref_p (x0, y0))
return memrefs_conflict_p (xsize, x1, ysize, y1, c);
- if (CONST_INT_P (x1))
+ if (poly_int_rtx_p (x1, &cx1))
{
- if (CONST_INT_P (y1))
+ if (poly_int_rtx_p (y1, &cy1))
return memrefs_conflict_p (xsize, x0, ysize, y0,
-  c - INTVAL (x1) + INTVAL (y1));
+  c - cx1 + cy1);
  else
-   return memrefs_conflict_p (xsize, x0, ysize, y,
-  c - INTVAL (x1));
+   return memrefs_conflict_p (xsize, x0, ysize, y, c - cx1);
}
- else if (CONST_INT_P (y1))
-   return memrefs_conflict_p (xsize, x, ysize, y0, c + INTVAL (y1));
+ else if (poly_int_rtx_p (y1, &cy1))
+   return memrefs_conflict_p (xsize, x, ysize, y0, c + cy1);
 
  return -1;
}
-  else if (CONST_INT_P (x1))
-   return memrefs_conflict_p (xsize, x0, ysize, y, c - INTVAL (x1));
+  else if 

[099/nnn] poly_int: struct_value_size

2017-10-23 Thread Richard Sandiford
This patch makes calls.c treat struct_value_size (one of the
operands to a call pattern) as polynomial.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* calls.c (emit_call_1, expand_call): Change struct_value_size from
a HOST_WIDE_INT to a poly_int64.

Index: gcc/calls.c
===
--- gcc/calls.c 2017-10-23 17:25:45.501604113 +0100
+++ gcc/calls.c 2017-10-23 17:25:46.488568637 +0100
@@ -377,7 +377,7 @@ emit_call_1 (rtx funexp, tree fntree ATT
 tree funtype ATTRIBUTE_UNUSED,
 poly_int64 stack_size ATTRIBUTE_UNUSED,
 poly_int64 rounded_stack_size,
-HOST_WIDE_INT struct_value_size ATTRIBUTE_UNUSED,
+poly_int64 struct_value_size ATTRIBUTE_UNUSED,
 rtx next_arg_reg ATTRIBUTE_UNUSED, rtx valreg,
 int old_inhibit_defer_pop, rtx call_fusage, int ecf_flags,
 cumulative_args_t args_so_far ATTRIBUTE_UNUSED)
@@ -437,7 +437,8 @@ emit_call_1 (rtx funexp, tree fntree ATT
 next_arg_reg, NULL_RTX);
   else
pat = targetm.gen_sibcall (funmem, rounded_stack_size_rtx,
-  next_arg_reg, GEN_INT (struct_value_size));
+  next_arg_reg,
+  gen_int_mode (struct_value_size, Pmode));
 }
   /* If the target has "call" or "call_value" insns, then prefer them
  if no arguments are actually popped.  If the target does not have
@@ -470,7 +471,7 @@ emit_call_1 (rtx funexp, tree fntree ATT
  next_arg_reg, NULL_RTX);
   else
pat = targetm.gen_call (funmem, rounded_stack_size_rtx, next_arg_reg,
-   GEN_INT (struct_value_size));
+   gen_int_mode (struct_value_size, Pmode));
 }
   emit_insn (pat);
 
@@ -3048,7 +3049,7 @@ expand_call (tree exp, rtx target, int i
   /* Size of aggregate value wanted, or zero if none wanted
  or if we are using the non-reentrant PCC calling convention
  or expecting the value in registers.  */
-  HOST_WIDE_INT struct_value_size = 0;
+  poly_int64 struct_value_size = 0;
   /* Nonzero if called function returns an aggregate in memory PCC style,
  by returning the address of where to find it.  */
   int pcc_struct_value = 0;
@@ -3210,7 +3211,8 @@ expand_call (tree exp, rtx target, int i
   }
 #else /* not PCC_STATIC_STRUCT_RETURN */
   {
-   struct_value_size = int_size_in_bytes (rettype);
+   if (!poly_int_tree_p (TYPE_SIZE_UNIT (rettype), &struct_value_size))
+ struct_value_size = -1;
 
/* Even if it is semantically safe to use the target as the return
   slot, it may be not sufficiently aligned for the return type.  */


[098/nnn] poly_int: load_register_parameters

2017-10-23 Thread Richard Sandiford
This patch makes load_register_parameters cope with polynomial sizes.
The requirement here is that any register parameters with non-constant
sizes must either have a specific mode (e.g. a variable-length vector
mode) or must be represented with a PARALLEL.  This is in practice
already a requirement for parameters passed in vector registers,
since the default behaviour of splitting parameters into words doesn't
make sense for them.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* calls.c (load_register_parameters): Cope with polynomial
mode sizes.  Require a constant size for BLKmode parameters
that aren't described by a PARALLEL.  If BLOCK_REG_PADDING
forces a parameter to be padded at the lsb end in order to
fill a complete number of words, require the parameter size
to be ordered wrt UNITS_PER_WORD.

Index: gcc/calls.c
===
--- gcc/calls.c 2017-10-23 17:25:38.230865460 +0100
+++ gcc/calls.c 2017-10-23 17:25:45.501604113 +0100
@@ -2520,7 +2520,8 @@ load_register_parameters (struct arg_dat
{
  int partial = args[i].partial;
  int nregs;
- int size = 0;
+ poly_int64 size = 0;
+ HOST_WIDE_INT const_size = 0;
  rtx_insn *before_arg = get_last_insn ();
  /* Set non-negative if we must move a word at a time, even if
 just one word (e.g, partial == 4 && mode == DFmode).  Set
@@ -2536,8 +2537,12 @@ load_register_parameters (struct arg_dat
}
  else if (TYPE_MODE (TREE_TYPE (args[i].tree_value)) == BLKmode)
{
- size = int_size_in_bytes (TREE_TYPE (args[i].tree_value));
- nregs = (size + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD;
+ /* Variable-sized parameters should be described by a
+PARALLEL instead.  */
+ const_size = int_size_in_bytes (TREE_TYPE (args[i].tree_value));
+ gcc_assert (const_size >= 0);
+ nregs = (const_size + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD;
+ size = const_size;
}
  else
size = GET_MODE_SIZE (args[i].mode);
@@ -2559,21 +2564,27 @@ load_register_parameters (struct arg_dat
  /* Handle case where we have a value that needs shifting
 up to the msb.  eg. a QImode value and we're padding
 upward on a BYTES_BIG_ENDIAN machine.  */
- if (size < UNITS_PER_WORD
- && (args[i].locate.where_pad
- == (BYTES_BIG_ENDIAN ? PAD_UPWARD : PAD_DOWNWARD)))
+ if (args[i].locate.where_pad
+ == (BYTES_BIG_ENDIAN ? PAD_UPWARD : PAD_DOWNWARD))
{
- rtx x;
- int shift = (UNITS_PER_WORD - size) * BITS_PER_UNIT;
-
- /* Assigning REG here rather than a temp makes CALL_FUSAGE
-report the whole reg as used.  Strictly speaking, the
-call only uses SIZE bytes at the msb end, but it doesn't
-seem worth generating rtl to say that.  */
- reg = gen_rtx_REG (word_mode, REGNO (reg));
- x = expand_shift (LSHIFT_EXPR, word_mode, reg, shift, reg, 1);
- if (x != reg)
-   emit_move_insn (reg, x);
+ gcc_checking_assert (ordered_p (size, UNITS_PER_WORD));
+ if (may_lt (size, UNITS_PER_WORD))
+   {
+ rtx x;
+ poly_int64 shift
+   = (UNITS_PER_WORD - size) * BITS_PER_UNIT;
+
+ /* Assigning REG here rather than a temp makes
+CALL_FUSAGE report the whole reg as used.
+Strictly speaking, the call only uses SIZE
+bytes at the msb end, but it doesn't seem worth
+generating rtl to say that.  */
+ reg = gen_rtx_REG (word_mode, REGNO (reg));
+ x = expand_shift (LSHIFT_EXPR, word_mode,
+   reg, shift, reg, 1);
+ if (x != reg)
+   emit_move_insn (reg, x);
+   }
}
 #endif
}
@@ -2588,17 +2599,20 @@ load_register_parameters (struct arg_dat
 
  else if (partial == 0 || args[i].pass_on_stack)
{
+ /* SIZE and CONST_SIZE are 0 for partial arguments and
+the size of a BLKmode type otherwise.  */
+ gcc_checking_assert (must_eq (size, const_size));
  rtx mem = validize_mem (copy_rtx (args[i].value));
 
  /* Check for overlap with already clobbered argument area,
 providing that this has non-zero size.  */
  

[097/nnn] poly_int: alter_reg

2017-10-23 Thread Richard Sandiford
This patch makes alter_reg cope with polynomial mode sizes.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* reload1.c (spill_stack_slot_width): Change element type
from unsigned int to poly_uint64_pod.
(alter_reg): Treat mode sizes as polynomial.

Index: gcc/reload1.c
===
--- gcc/reload1.c   2017-10-23 17:25:38.242865029 +0100
+++ gcc/reload1.c   2017-10-23 17:25:44.492640380 +0100
@@ -200,7 +200,7 @@ #define spill_indirect_levels   \
 static rtx spill_stack_slot[FIRST_PSEUDO_REGISTER];
 
 /* Width allocated so far for that stack slot.  */
-static unsigned int spill_stack_slot_width[FIRST_PSEUDO_REGISTER];
+static poly_uint64_pod spill_stack_slot_width[FIRST_PSEUDO_REGISTER];
 
 /* Record which pseudos needed to be spilled.  */
 static regset_head spilled_pseudos;
@@ -2142,10 +2142,10 @@ alter_reg (int i, int from_reg, bool don
 {
   rtx x = NULL_RTX;
   machine_mode mode = GET_MODE (regno_reg_rtx[i]);
-  unsigned int inherent_size = PSEUDO_REGNO_BYTES (i);
+  poly_uint64 inherent_size = GET_MODE_SIZE (mode);
   unsigned int inherent_align = GET_MODE_ALIGNMENT (mode);
   machine_mode wider_mode = wider_subreg_mode (mode, reg_max_ref_mode[i]);
-  unsigned int total_size = GET_MODE_SIZE (wider_mode);
+  poly_uint64 total_size = GET_MODE_SIZE (wider_mode);
   unsigned int min_align = GET_MODE_BITSIZE (reg_max_ref_mode[i]);
   poly_int64 adjust = 0;
 
@@ -2174,10 +2174,15 @@ alter_reg (int i, int from_reg, bool don
{
  rtx stack_slot;
 
+ /* The sizes are taken from a subreg operation, which guarantees
+that they're ordered.  */
+ gcc_checking_assert (ordered_p (total_size, inherent_size));
+
  /* No known place to spill from => no slot to reuse.  */
  x = assign_stack_local (mode, total_size,
  min_align > inherent_align
- || total_size > inherent_size ? -1 : 0);
+ || may_gt (total_size, inherent_size)
+ ? -1 : 0);
 
  stack_slot = x;
 
@@ -2189,7 +2194,7 @@ alter_reg (int i, int from_reg, bool don
  adjust = inherent_size - total_size;
  if (maybe_nonzero (adjust))
{
- unsigned int total_bits = total_size * BITS_PER_UNIT;
+ poly_uint64 total_bits = total_size * BITS_PER_UNIT;
  machine_mode mem_mode
= int_mode_for_size (total_bits, 1).else_blk ();
  stack_slot = adjust_address_nv (x, mem_mode, adjust);
@@ -2203,9 +2208,10 @@ alter_reg (int i, int from_reg, bool don
 
   /* Reuse a stack slot if possible.  */
   else if (spill_stack_slot[from_reg] != 0
-  && spill_stack_slot_width[from_reg] >= total_size
-  && (GET_MODE_SIZE (GET_MODE (spill_stack_slot[from_reg]))
-  >= inherent_size)
+  && must_ge (spill_stack_slot_width[from_reg], total_size)
+  && must_ge (GET_MODE_SIZE
+  (GET_MODE (spill_stack_slot[from_reg])),
+  inherent_size)
   && MEM_ALIGN (spill_stack_slot[from_reg]) >= min_align)
x = spill_stack_slot[from_reg];
 
@@ -2221,16 +2227,21 @@ alter_reg (int i, int from_reg, bool don
  if (partial_subreg_p (mode,
GET_MODE (spill_stack_slot[from_reg])))
mode = GET_MODE (spill_stack_slot[from_reg]);
- if (spill_stack_slot_width[from_reg] > total_size)
-   total_size = spill_stack_slot_width[from_reg];
+ total_size = ordered_max (total_size,
+   spill_stack_slot_width[from_reg]);
  if (MEM_ALIGN (spill_stack_slot[from_reg]) > min_align)
min_align = MEM_ALIGN (spill_stack_slot[from_reg]);
}
 
+ /* The sizes are taken from a subreg operation, which guarantees
+that they're ordered.  */
+ gcc_checking_assert (ordered_p (total_size, inherent_size));
+
  /* Make a slot with that size.  */
  x = assign_stack_local (mode, total_size,
  min_align > inherent_align
- || total_size > inherent_size ? -1 : 0);
+ || may_gt (total_size, inherent_size)
+ ? -1 : 0);
  stack_slot = x;
 
  /* Cancel the  big-endian correction done in assign_stack_local.
@@ -2241,7 +2252,7 @@ alter_reg (int i, int from_reg, bool don
  adjust = GET_MODE_SIZE (mode) - total_size;
  if (maybe_nonzero (adjust))
{
- 

[096/nnn] poly_int: reloading complex subregs

2017-10-23 Thread Richard Sandiford
This patch splits out a condition that is common to both push_reload
and reload_inner_reg_of_subreg.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* reload.c (complex_word_subreg_p): New function.
(reload_inner_reg_of_subreg, push_reload): Use it.

Index: gcc/reload.c
===
--- gcc/reload.c2017-10-23 17:18:51.485721234 +0100
+++ gcc/reload.c2017-10-23 17:25:43.543674491 +0100
@@ -811,6 +811,23 @@ find_reusable_reload (rtx *p_in, rtx out
   return n_reloads;
 }
 
+/* Return true if:
+
+   (a) (subreg:OUTER_MODE REG ...) represents a word or subword subreg
+   of a multiword value; and
+
+   (b) the number of *words* in REG does not match the number of *registers*
+   in REG.  */
+
+static bool
+complex_word_subreg_p (machine_mode outer_mode, rtx reg)
+{
+  machine_mode inner_mode = GET_MODE (reg);
+  return (GET_MODE_SIZE (outer_mode) <= UNITS_PER_WORD
+ && GET_MODE_SIZE (inner_mode) > UNITS_PER_WORD
+ && GET_MODE_SIZE (inner_mode) / UNITS_PER_WORD != REG_NREGS (reg));
+}
+
 /* Return true if X is a SUBREG that will need reloading of its SUBREG_REG
expression.  MODE is the mode that X will be used in.  OUTPUT is true if
the function is invoked for the output part of an enclosing reload.  */
@@ -842,11 +859,7 @@ reload_inner_reg_of_subreg (rtx x, machi
  INNER is larger than a word and the number of registers in INNER is
  not the same as the number of words in INNER, then INNER will need
  reloading (with an in-out reload).  */
-  return (output
- && GET_MODE_SIZE (mode) <= UNITS_PER_WORD
- && GET_MODE_SIZE (GET_MODE (inner)) > UNITS_PER_WORD
- && ((GET_MODE_SIZE (GET_MODE (inner)) / UNITS_PER_WORD)
- != REG_NREGS (inner)));
+  return output && complex_word_subreg_p (mode, inner);
 }
 
 /* Return nonzero if IN can be reloaded into REGNO with mode MODE without
@@ -1064,12 +1077,7 @@ push_reload (rtx in, rtx out, rtx *inloc
  /* The case where out is nonzero
 is handled differently in the following statement.  */
  && (out == 0 || subreg_lowpart_p (in))
- && ((GET_MODE_SIZE (inmode) <= UNITS_PER_WORD
-  && (GET_MODE_SIZE (GET_MODE (SUBREG_REG (in)))
-  > UNITS_PER_WORD)
-  && ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (in)))
-   / UNITS_PER_WORD)
-  != REG_NREGS (SUBREG_REG (in))))
+ && (complex_word_subreg_p (inmode, SUBREG_REG (in))
  || !targetm.hard_regno_mode_ok (subreg_regno (in), inmode)))
  || (secondary_reload_class (1, rclass, inmode, in) != NO_REGS
  && (secondary_reload_class (1, rclass, GET_MODE (SUBREG_REG (in)),


[095/nnn] poly_int: process_alt_operands

2017-10-23 Thread Richard Sandiford
This patch makes process_alt_operands check that the mode sizes
are ordered, so that match_reload can validly treat them as subregs
of one another.
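
The sizes can be unordered when a fixed-length mode is matched
against a variable-length one: 16 bytes vs. 8 + 8x bytes, say, is
larger at x == 0 but smaller at x == 2, so neither operand can
safely be treated as a subreg of the other.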


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* lra-constraints.c (process_alt_operands): Reject matched
operands whose sizes aren't ordered.
(match_reload): Refer to this check here.

Index: gcc/lra-constraints.c
===
--- gcc/lra-constraints.c   2017-10-23 17:20:47.003797985 +0100
+++ gcc/lra-constraints.c   2017-10-23 17:25:42.597708494 +0100
@@ -933,6 +933,8 @@ match_reload (signed char out, signed ch
   push_to_sequence (*before);
   if (inmode != outmode)
 {
+  /* process_alt_operands has already checked that the mode sizes
+are ordered.  */
   if (partial_subreg_p (outmode, inmode))
{
  reg = new_in_reg
@@ -2112,6 +2114,13 @@ process_alt_operands (int only_alternati
len = 0;
lra_assert (nop > m);
 
+   /* Reject matches if we don't know which operand is
+  bigger.  This situation would arguably be a bug in
+  an .md pattern, but could also occur in a user asm.  */
+   if (!ordered_p (GET_MODE_SIZE (biggest_mode[m]),
+   GET_MODE_SIZE (biggest_mode[nop])))
+ break;
+
this_alternative_matches = m;
m_hregno = get_hard_regno (*curr_id->operand_loc[m], false);
/* We are supposed to match a previous operand.


[094/nnn] poly_int: expand_ifn_atomic_compare_exchange_into_call

2017-10-23 Thread Richard Sandiford
This patch makes the mode size assumptions in
expand_ifn_atomic_compare_exchange_into_call a bit more
explicit, so that a later patch can add a to_constant () call.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* builtins.c (expand_ifn_atomic_compare_exchange_into_call): Assert
that the mode size is in the set {1, 2, 4, 8, 16}.

Index: gcc/builtins.c
===
--- gcc/builtins.c  2017-10-23 17:22:18.226824652 +0100
+++ gcc/builtins.c  2017-10-23 17:25:41.647742640 +0100
@@ -5838,9 +5838,12 @@ expand_ifn_atomic_compare_exchange_into_
   /* Skip the boolean weak parameter.  */
   for (z = 4; z < 6; z++)
 vec->quick_push (gimple_call_arg (call, z));
+  /* At present we only have BUILT_IN_ATOMIC_COMPARE_EXCHANGE_{1,2,4,8,16}.  */
+  unsigned int bytes_log2 = exact_log2 (GET_MODE_SIZE (mode));
+  gcc_assert (bytes_log2 < 5);
   built_in_function fncode
 = (built_in_function) ((int) BUILT_IN_ATOMIC_COMPARE_EXCHANGE_1
-  + exact_log2 (GET_MODE_SIZE (mode)));
+  + bytes_log2);
   tree fndecl = builtin_decl_explicit (fncode);
   tree fn = build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fndecl)),
fndecl);


[093/nnn] poly_int: adjust_mems

2017-10-23 Thread Richard Sandiford
This patch makes the var-tracking.c handling of autoinc addresses
cope with polynomial mode sizes.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* var-tracking.c (adjust_mems): Treat mode sizes as polynomial.
Use plus_constant instead of gen_rtx_PLUS.

Index: gcc/var-tracking.c
===
--- gcc/var-tracking.c  2017-10-23 17:16:59.708267276 +0100
+++ gcc/var-tracking.c  2017-10-23 17:25:40.610779914 +0100
@@ -1016,6 +1016,7 @@ adjust_mems (rtx loc, const_rtx old_rtx,
   machine_mode mem_mode_save;
   bool store_save;
   scalar_int_mode tem_mode, tem_subreg_mode;
+  poly_int64 size;
   switch (GET_CODE (loc))
 {
 case REG:
@@ -1060,11 +1061,9 @@ adjust_mems (rtx loc, const_rtx old_rtx,
   return mem;
 case PRE_INC:
 case PRE_DEC:
-  addr = gen_rtx_PLUS (GET_MODE (loc), XEXP (loc, 0),
-  gen_int_mode (GET_CODE (loc) == PRE_INC
-? GET_MODE_SIZE (amd->mem_mode)
-: -GET_MODE_SIZE (amd->mem_mode),
-GET_MODE (loc)));
+  size = GET_MODE_SIZE (amd->mem_mode);
+  addr = plus_constant (GET_MODE (loc), XEXP (loc, 0),
+   GET_CODE (loc) == PRE_INC ? size : -size);
   /* FALLTHRU */
 case POST_INC:
 case POST_DEC:
@@ -1072,12 +1071,10 @@ adjust_mems (rtx loc, const_rtx old_rtx,
addr = XEXP (loc, 0);
   gcc_assert (amd->mem_mode != VOIDmode && amd->mem_mode != BLKmode);
   addr = simplify_replace_fn_rtx (addr, old_rtx, adjust_mems, data);
-  tem = gen_rtx_PLUS (GET_MODE (loc), XEXP (loc, 0),
- gen_int_mode ((GET_CODE (loc) == PRE_INC
-|| GET_CODE (loc) == POST_INC)
-   ? GET_MODE_SIZE (amd->mem_mode)
-   : -GET_MODE_SIZE (amd->mem_mode),
-   GET_MODE (loc)));
+  size = GET_MODE_SIZE (amd->mem_mode);
+  tem = plus_constant (GET_MODE (loc), XEXP (loc, 0),
+  (GET_CODE (loc) == PRE_INC
+   || GET_CODE (loc) == POST_INC) ? size : -size);
   store_save = amd->store;
   amd->store = false;
   tem = simplify_replace_fn_rtx (tem, old_rtx, adjust_mems, data);


[092/nnn] poly_int: PUSH_ROUNDING

2017-10-23 Thread Richard Sandiford
PUSH_ROUNDING is difficult to convert to a hook since there is still
a lot of conditional code based on it.  It isn't clear that a direct
conversion with checks for null hooks is the right thing to do.

Rather than untangle that, this patch converts all implementations
that do something to out-of-line functions that have the same
interface as a hook would have.  This should at least help towards
any future hook conversion.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/cr16/cr16-protos.h (cr16_push_rounding): Declare.
* config/cr16/cr16.h (PUSH_ROUNDING): Move implementation to...
* config/cr16/cr16.c (cr16_push_rounding): ...this new function.
* config/h8300/h8300-protos.h (h8300_push_rounding): Declare.
* config/h8300/h8300.h (PUSH_ROUNDING): Move implementation to...
* config/h8300/h8300.c (h8300_push_rounding): ...this new function.
* config/i386/i386-protos.h (ix86_push_rounding): Declare.
* config/i386/i386.h (PUSH_ROUNDING): Move implementation to...
* config/i386/i386.c (ix86_push_rounding): ...this new function.
* config/m32c/m32c-protos.h (m32c_push_rounding): Take and return
a poly_int64.
* config/m32c/m32c.c (m32c_push_rounding): Likewise.
* config/m68k/m68k-protos.h (m68k_push_rounding): Declare.
* config/m68k/m68k.h (PUSH_ROUNDING): Move implementation to...
* config/m68k/m68k.c (m68k_push_rounding): ...this new function.
* config/pdp11/pdp11-protos.h (pdp11_push_rounding): Declare.
* config/pdp11/pdp11.h (PUSH_ROUNDING): Move implementation to...
* config/pdp11/pdp11.c (pdp11_push_rounding): ...this new function.
* config/stormy16/stormy16-protos.h (xstormy16_push_rounding): Declare.
* config/stormy16/stormy16.h (PUSH_ROUNDING): Move implementation to...
* config/stormy16/stormy16.c (xstormy16_push_rounding): ...this new
function.
* expr.c (emit_move_resolve_push): Treat the input and result
of PUSH_ROUNDING as a poly_int64.
(emit_move_complex_push, emit_single_push_insn_1): Likewise.
(emit_push_insn): Likewise.
* lra-eliminations.c (mark_not_eliminable): Likewise.
* recog.c (push_operand): Likewise.
* reload1.c (elimination_effects): Likewise.
* rtlanal.c (nonzero_bits1): Likewise.
* calls.c (store_one_arg): Likewise.  Require the padding to be
known at compile time.

Index: gcc/config/cr16/cr16-protos.h
===
--- gcc/config/cr16/cr16-protos.h   2017-09-04 11:49:42.896500726 +0100
+++ gcc/config/cr16/cr16-protos.h   2017-10-23 17:25:38.230865460 +0100
@@ -94,5 +94,6 @@ extern const char *cr16_emit_logical_di
 /* Handling the "interrupt" attribute.  */
 extern int cr16_interrupt_function_p (void);
 extern bool cr16_is_data_model (enum data_model_type);
+extern poly_int64 cr16_push_rounding (poly_int64);
 
 #endif /* Not GCC_CR16_PROTOS_H.  */ 
Index: gcc/config/cr16/cr16.h
===
--- gcc/config/cr16/cr16.h  2017-10-23 11:41:22.824941066 +0100
+++ gcc/config/cr16/cr16.h  2017-10-23 17:25:38.231865424 +0100
@@ -383,7 +383,7 @@ #define ACCUMULATE_OUTGOING_ARGS 0
 
 #define PUSH_ARGS 1
 
-#define PUSH_ROUNDING(BYTES) (((BYTES) + 1) & ~1)
+#define PUSH_ROUNDING(BYTES) cr16_push_rounding (BYTES)
 
 #ifndef CUMULATIVE_ARGS
 struct cumulative_args
Index: gcc/config/cr16/cr16.c
===
--- gcc/config/cr16/cr16.c  2017-10-23 17:19:01.400170158 +0100
+++ gcc/config/cr16/cr16.c  2017-10-23 17:25:38.231865424 +0100
@@ -2215,6 +2215,14 @@ cr16_emit_logical_di (rtx *operands, enu
   return "";
 }
 
+/* Implement PUSH_ROUNDING.  */
+
+poly_int64
+cr16_push_rounding (poly_int64 bytes)
+{
+  return (bytes + 1) & ~1;
+}
+
 /* Initialize 'targetm' variable which contains pointers to functions 
and data relating to the target machine.  */
 
Index: gcc/config/h8300/h8300-protos.h
===
--- gcc/config/h8300/h8300-protos.h 2017-09-12 14:29:25.231530806 +0100
+++ gcc/config/h8300/h8300-protos.h 2017-10-23 17:25:38.231865424 +0100
 extern bool	h8sx_mergeable_me
 extern bool	h8sx_emit_movmd (rtx, rtx, rtx, HOST_WIDE_INT);
 extern void	h8300_swap_into_er6 (rtx);
 extern void	h8300_swap_out_of_er6 (rtx);
+extern poly_int64  h8300_push_rounding (poly_int64);
 
 #endif /* ! GCC_H8300_PROTOS_H */
Index: gcc/config/h8300/h8300.h
===
--- gcc/config/h8300/h8300.h2017-10-23 11:41:22.920697531 +0100
+++ gcc/config/h8300/h8300.h

[091/nnn] poly_int: emit_single_push_insn_1

2017-10-23 Thread Richard Sandiford
This patch makes emit_single_push_insn_1 cope with polynomial mode sizes.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* expr.c (emit_single_push_insn_1): Treat mode sizes as polynomial.
Use plus_constant instead of gen_rtx_PLUS.

Index: gcc/expr.c
===
--- gcc/expr.c  2017-10-23 17:25:35.142976454 +0100
+++ gcc/expr.c  2017-10-23 17:25:37.064907370 +0100
@@ -4141,9 +4141,6 @@ emit_single_push_insn_1 (machine_mode mo
  access to type.  */
   else if (targetm.calls.function_arg_padding (mode, type) == PAD_DOWNWARD)
 {
-  unsigned padding_size = rounded_size - GET_MODE_SIZE (mode);
-  HOST_WIDE_INT offset;
-
   emit_move_insn (stack_pointer_rtx,
  expand_binop (Pmode,
STACK_GROWS_DOWNWARD ? sub_optab
@@ -4152,31 +4149,27 @@ emit_single_push_insn_1 (machine_mode mo
gen_int_mode (rounded_size, Pmode),
NULL_RTX, 0, OPTAB_LIB_WIDEN));
 
-  offset = (HOST_WIDE_INT) padding_size;
+  poly_int64 offset = rounded_size - GET_MODE_SIZE (mode);
   if (STACK_GROWS_DOWNWARD && STACK_PUSH_CODE == POST_DEC)
/* We have already decremented the stack pointer, so get the
   previous value.  */
-   offset += (HOST_WIDE_INT) rounded_size;
+   offset += rounded_size;
 
   if (!STACK_GROWS_DOWNWARD && STACK_PUSH_CODE == POST_INC)
/* We have already incremented the stack pointer, so get the
   previous value.  */
-   offset -= (HOST_WIDE_INT) rounded_size;
+   offset -= rounded_size;
 
-  dest_addr = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
-   gen_int_mode (offset, Pmode));
+  dest_addr = plus_constant (Pmode, stack_pointer_rtx, offset);
 }
   else
 {
   if (STACK_GROWS_DOWNWARD)
/* ??? This seems wrong if STACK_PUSH_CODE == POST_DEC.  */
-   dest_addr = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
- gen_int_mode (-(HOST_WIDE_INT) rounded_size,
-   Pmode));
+   dest_addr = plus_constant (Pmode, stack_pointer_rtx, -rounded_size);
   else
/* ??? This seems wrong if STACK_PUSH_CODE == POST_INC.  */
-   dest_addr = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
- gen_int_mode (rounded_size, Pmode));
+   dest_addr = plus_constant (Pmode, stack_pointer_rtx, rounded_size);
 
   dest_addr = gen_rtx_PRE_MODIFY (Pmode, stack_pointer_rtx, dest_addr);
 }


[090/nnn] poly_int: set_inc_state

2017-10-23 Thread Richard Sandiford
This trivial patch makes auto-inc-dec.c:set_inc_state take a poly_int64.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* auto-inc-dec.c (set_inc_state): Take the mode size as a poly_int64
rather than an int.

Index: gcc/auto-inc-dec.c
===
--- gcc/auto-inc-dec.c  2017-07-27 10:37:54.907033464 +0100
+++ gcc/auto-inc-dec.c  2017-10-23 17:25:36.142940510 +0100
@@ -152,14 +152,14 @@ enum gen_form
 static rtx mem_tmp;
 
 static enum inc_state
-set_inc_state (HOST_WIDE_INT val, int size)
+set_inc_state (HOST_WIDE_INT val, poly_int64 size)
 {
   if (val == 0)
 return INC_ZERO;
   if (val < 0)
-return (val == -size) ? INC_NEG_SIZE : INC_NEG_ANY;
+return must_eq (val, -size) ? INC_NEG_SIZE : INC_NEG_ANY;
   else
-return (val == size) ? INC_POS_SIZE : INC_POS_ANY;
+return must_eq (val, size) ? INC_POS_SIZE : INC_POS_ANY;
 }
 
 /* The DECISION_TABLE that describes what form, if any, the increment


[089/nnn] poly_int: expand_expr_real_1

2017-10-23 Thread Richard Sandiford
This patch makes the VIEW_CONVERT_EXPR handling in expand_expr_real_1
cope with polynomial type and mode sizes.
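
The previous MAX of the two sizes becomes upper_bound, which takes
the coefficient-wise maximum and so yields a size that is sufficient
for all runtime values; e.g. upper_bound (16, 12 + 4x) is 16 + 4x.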


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* expr.c (expand_expr_real_1): Use tree_to_poly_uint64
instead of int_size_in_bytes when handling VIEW_CONVERT_EXPRs
via stack temporaries.  Treat the mode size as polynomial too.

Index: gcc/expr.c
===
--- gcc/expr.c  2017-10-23 17:25:34.105013764 +0100
+++ gcc/expr.c  2017-10-23 17:25:35.142976454 +0100
@@ -6,9 +6,10 @@ expand_expr_real_1 (tree exp, rtx target
  else if (STRICT_ALIGNMENT)
{
  tree inner_type = TREE_TYPE (treeop0);
- HOST_WIDE_INT temp_size
-   = MAX (int_size_in_bytes (inner_type),
-  (HOST_WIDE_INT) GET_MODE_SIZE (mode));
+ poly_uint64 mode_size = GET_MODE_SIZE (mode);
+ poly_uint64 op0_size
+   = tree_to_poly_uint64 (TYPE_SIZE_UNIT (inner_type));
+ poly_int64 temp_size = upper_bound (op0_size, mode_size);
  rtx new_rtx
= assign_stack_temp_for_type (mode, temp_size, type);
  rtx new_with_op0_mode


[088/nnn] poly_int: expand_expr_real_2

2017-10-23 Thread Richard Sandiford
This patch makes expand_expr_real_2 cope with polynomial mode sizes
when handling conversions involving a union type.
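
The previous MIN becomes ordered_min, which asserts that one size is
known to be no greater than the other for all runtime values and
returns that size; e.g. ordered_min (4 + 4x, 8 + 8x) is 4 + 4x,
whereas MIN (8, 4 + 4x) would have no well-defined polynomial result
(8 is the larger value at x == 0 but the smaller one at x == 2).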


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* expr.c (expand_expr_real_2): When handling conversions involving
unions, apply tree_to_poly_uint64 to the TYPE_SIZE rather than
multiplying int_size_in_bytes by BITS_PER_UNIT.  Treat GET_MODE_BISIZE
as a poly_uint64 too.

Index: gcc/expr.c
===
--- gcc/expr.c  2017-10-23 17:25:30.704136008 +0100
+++ gcc/expr.c  2017-10-23 17:25:34.105013764 +0100
@@ -8354,11 +8354,14 @@ #define REDUCE_BIT_FIELD(expr)  (reduce_b
  && !TYPE_REVERSE_STORAGE_ORDER (type));
 
  /* Store this field into a union of the proper type.  */
+ poly_uint64 op0_size
+   = tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (treeop0)));
+ poly_uint64 union_size = GET_MODE_BITSIZE (mode);
  store_field (target,
-  MIN ((int_size_in_bytes (TREE_TYPE
-   (treeop0))
-* BITS_PER_UNIT),
-   (HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
+  /* The conversion must be constructed so that
+ we know at compile time how many bits
+ to preserve.  */
+  ordered_min (op0_size, union_size),
   0, 0, 0, TYPE_MODE (valtype), treeop0, 0,
   false, false);
}


[087/nnn] poly_int: subreg_get_info

2017-10-23 Thread Richard Sandiford
This patch makes subreg_get_info handle polynomial sizes.
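
Besides the may_/must_ comparisons, the conversion relies on the
poly_int division helpers: exact_div (a, b) asserts that b divides a
for all runtime values, multiple_p (a, b, &q) tests divisibility and
provides the quotient, and can_div_trunc_p/can_div_away_from_zero_p
do the same for truncating and rounding division.  For example,
multiple_p (8 + 8x, 2, &q) returns true and sets q to 4 + 4x.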


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* rtlanal.c (subreg_get_info): Handle polynomial mode sizes.

Index: gcc/rtlanal.c
===
--- gcc/rtlanal.c   2017-10-23 17:25:30.705135972 +0100
+++ gcc/rtlanal.c   2017-10-23 17:25:32.610067499 +0100
@@ -3694,8 +3694,9 @@ subreg_get_info (unsigned int xregno, ma
 
   gcc_assert (xregno < FIRST_PSEUDO_REGISTER);
 
-  unsigned int xsize = GET_MODE_SIZE (xmode);
-  unsigned int ysize = GET_MODE_SIZE (ymode);
+  poly_uint64 xsize = GET_MODE_SIZE (xmode);
+  poly_uint64 ysize = GET_MODE_SIZE (ymode);
+
   bool rknown = false;
 
   /* If the register representation of a non-scalar mode has holes in it,
@@ -3707,6 +3708,7 @@ subreg_get_info (unsigned int xregno, ma
   /* As a consequence, we must be dealing with a constant number of
 scalars, and thus a constant offset.  */
   HOST_WIDE_INT coffset = offset.to_constant ();
+  HOST_WIDE_INT cysize = ysize.to_constant ();
   nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
   unsigned int nunits = GET_MODE_NUNITS (xmode);
   scalar_mode xmode_unit = GET_MODE_INNER (xmode);
@@ -3727,7 +3729,7 @@ subreg_get_info (unsigned int xregno, ma
 of each unit.  */
   if ((coffset / GET_MODE_SIZE (xmode_unit) + 1 < nunits)
  && (coffset / GET_MODE_SIZE (xmode_unit)
- != ((coffset + ysize - 1) / GET_MODE_SIZE (xmode_unit))))
+ != ((coffset + cysize - 1) / GET_MODE_SIZE (xmode_unit))))
{
  info->representable_p = false;
  rknown = true;
@@ -3738,8 +3740,12 @@ subreg_get_info (unsigned int xregno, ma
 
   nregs_ymode = hard_regno_nregs (xregno, ymode);
 
+  /* Subreg sizes must be ordered, so that we can tell whether they are
+ partial, paradoxical or complete.  */
+  gcc_checking_assert (ordered_p (xsize, ysize));
+
   /* Paradoxical subregs are otherwise valid.  */
-  if (!rknown && known_zero (offset) && ysize > xsize)
+  if (!rknown && known_zero (offset) && may_gt (ysize, xsize))
 {
   info->representable_p = true;
   /* If this is a big endian paradoxical subreg, which uses more
@@ -3761,20 +3767,19 @@ subreg_get_info (unsigned int xregno, ma
 
   /* If registers store different numbers of bits in the different
  modes, we cannot generally form this subreg.  */
+  poly_uint64 regsize_xmode, regsize_ymode;
   if (!HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode)
   && !HARD_REGNO_NREGS_HAS_PADDING (xregno, ymode)
-  && (xsize % nregs_xmode) == 0
-  && (ysize % nregs_ymode) == 0)
+  && multiple_p (xsize, nregs_xmode, &regsize_xmode)
+  && multiple_p (ysize, nregs_ymode, &regsize_ymode))
 {
-  int regsize_xmode = xsize / nregs_xmode;
-  int regsize_ymode = ysize / nregs_ymode;
   if (!rknown
- && ((nregs_ymode > 1 && regsize_xmode > regsize_ymode)
- || (nregs_xmode > 1 && regsize_ymode > regsize_xmode)))
+ && ((nregs_ymode > 1 && may_gt (regsize_xmode, regsize_ymode))
+ || (nregs_xmode > 1 && may_gt (regsize_ymode, regsize_xmode))))
{
  info->representable_p = false;
- info->nregs = CEIL (ysize, regsize_xmode);
- if (!can_div_trunc_p (offset, regsize_xmode, &info->offset))
+ if (!can_div_away_from_zero_p (ysize, regsize_xmode, &info->nregs)
+ || !can_div_trunc_p (offset, regsize_xmode, &info->offset))
/* Checked by validate_subreg.  We must know at compile time
   which inner registers are being accessed.  */
gcc_unreachable ();
@@ -3800,7 +3805,7 @@ subreg_get_info (unsigned int xregno, ma
   HOST_WIDE_INT count;
   if (!rknown
  && WORDS_BIG_ENDIAN == REG_WORDS_BIG_ENDIAN
- && regsize_xmode == regsize_ymode
+ && must_eq (regsize_xmode, regsize_ymode)
  && constant_multiple_p (offset, regsize_ymode, &count))
{
  info->representable_p = true;
@@ -3837,8 +3842,7 @@ subreg_get_info (unsigned int xregno, ma
  be exact, otherwise we don't know how to verify the constraint.
  These conditions may be relaxed but subreg_regno_offset would
  need to be redesigned.  */
-  gcc_assert ((xsize % num_blocks) == 0);
-  poly_uint64 bytes_per_block = xsize / num_blocks;
+  poly_uint64 bytes_per_block = exact_div (xsize, num_blocks);
 
   /* Get the number of the first block that contains the subreg and the byte
  offset of the subreg from the start of that block.  */


[086/nnn] poly_int: REGMODE_NATURAL_SIZE

2017-10-23 Thread Richard Sandiford
This patch makes target-independent code that uses REGMODE_NATURAL_SIZE
treat it as a poly_int rather than a constant.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* combine.c (can_change_dest_mode): Handle polynomial
REGMODE_NATURAL_SIZE.
* expmed.c (store_bit_field_1): Likewise.
* expr.c (store_constructor): Likewise.
* emit-rtl.c (validate_subreg): Operate on polynomial mode sizes
and polynomial REGMODE_NATURAL_SIZE.
(gen_lowpart_common): Likewise.
* reginfo.c (record_subregs_of_mode): Likewise.
* rtlanal.c (read_modify_subreg_p): Likewise.

Index: gcc/combine.c
===
--- gcc/combine.c   2017-10-23 17:25:26.554256722 +0100
+++ gcc/combine.c   2017-10-23 17:25:30.702136080 +0100
@@ -2474,8 +2474,8 @@ can_change_dest_mode (rtx x, int added_s
 
   /* Don't change between modes with different underlying register sizes,
  since this could lead to invalid subregs.  */
-  if (REGMODE_NATURAL_SIZE (mode)
-  != REGMODE_NATURAL_SIZE (GET_MODE (x)))
+  if (may_ne (REGMODE_NATURAL_SIZE (mode),
+ REGMODE_NATURAL_SIZE (GET_MODE (x))))
 return false;
 
   regno = REGNO (x);
Index: gcc/expmed.c
===
--- gcc/expmed.c2017-10-23 17:23:00.293367701 +0100
+++ gcc/expmed.c2017-10-23 17:25:30.703136044 +0100
@@ -778,7 +778,7 @@ store_bit_field_1 (rtx str_rtx, poly_uin
 In the latter case, use subreg on the rhs side, not lhs.  */
   rtx sub;
   HOST_WIDE_INT regnum;
-  HOST_WIDE_INT regsize = REGMODE_NATURAL_SIZE (GET_MODE (op0));
+  poly_uint64 regsize = REGMODE_NATURAL_SIZE (GET_MODE (op0));
   if (known_zero (bitnum)
  && must_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (op0))))
{
Index: gcc/expr.c
===
--- gcc/expr.c  2017-10-23 17:23:00.293367701 +0100
+++ gcc/expr.c  2017-10-23 17:25:30.704136008 +0100
@@ -6204,8 +6204,8 @@ store_constructor (tree exp, rtx target,
   a constant.  But if more than one register is involved,
   this probably loses.  */
else if (REG_P (target) && TREE_STATIC (exp)
-&& (GET_MODE_SIZE (GET_MODE (target))
-<= REGMODE_NATURAL_SIZE (GET_MODE (target))))
+&& must_le (GET_MODE_SIZE (GET_MODE (target)),
+REGMODE_NATURAL_SIZE (GET_MODE (target))))
  {
emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
cleared = 1;
Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  2017-10-23 17:23:00.293367701 +0100
+++ gcc/emit-rtl.c  2017-10-23 17:25:30.703136044 +0100
@@ -924,8 +924,13 @@ gen_tmp_stack_mem (machine_mode mode, rt
 validate_subreg (machine_mode omode, machine_mode imode,
 const_rtx reg, poly_uint64 offset)
 {
-  unsigned int isize = GET_MODE_SIZE (imode);
-  unsigned int osize = GET_MODE_SIZE (omode);
+  poly_uint64 isize = GET_MODE_SIZE (imode);
+  poly_uint64 osize = GET_MODE_SIZE (omode);
+
+  /* The sizes must be ordered, so that we know whether the subreg
+ is partial, paradoxical or complete.  */
+  if (!ordered_p (isize, osize))
+return false;
 
   /* All subregs must be aligned.  */
   if (!multiple_p (offset, osize))
@@ -935,7 +940,7 @@ validate_subreg (machine_mode omode, mac
   if (may_ge (offset, isize))
 return false;
 
-  unsigned int regsize = REGMODE_NATURAL_SIZE (imode);
+  poly_uint64 regsize = REGMODE_NATURAL_SIZE (imode);
 
   /* ??? This should not be here.  Temporarily continue to allow word_mode
  subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
@@ -945,7 +950,7 @@ validate_subreg (machine_mode omode, mac
 ;
   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
  is the culprit here, and not the backends.  */
-  else if (osize >= regsize && isize >= osize)
+  else if (must_ge (osize, regsize) && must_ge (isize, osize))
 ;
   /* Allow component subregs of complex and vector.  Though given the below
  extraction rules, it's not always clear what that means.  */
@@ -964,7 +969,7 @@ validate_subreg (machine_mode omode, mac
  (subreg:SI (reg:DF) 0) isn't.  */
   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
 {
-  if (! (isize == osize
+  if (! (must_eq (isize, osize)
 /* LRA can use subreg to store a floating point value in
an integer mode.  Although the floating point and the
integer modes need the same number of hard registers,
@@ -976,7 +981,7 @@ validate_subreg (machine_mode omode, mac
 }
 
   /* Paradoxical subregs must have offset zero.  */

[085/nnn] poly_int: expand_vector_ubsan_overflow

2017-10-23 Thread Richard Sandiford
This patch makes expand_vector_ubsan_overflow cope with a polynomial
number of elements.
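
The guiding idiom, used throughout the patch, is to ask the poly_uint64
for a compile-time value and only unroll when one exists and is small.
A minimal sketch of the pattern (the two helpers are hypothetical):

  poly_uint64 cnt = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
  unsigned HOST_WIDE_INT const_cnt = 0;
  /* Loop at runtime if the count is variable, or constant but large.  */
  bool use_loop_p = (!cnt.is_constant (&const_cnt) || const_cnt > 4);
  if (!use_loop_p)
    for (unsigned HOST_WIDE_INT i = 0; i < const_cnt; ++i)
      expand_one_element (i);		/* hypothetical helper */
  else
    expand_with_runtime_loop ();	/* hypothetical helper */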


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* internal-fn.c (expand_vector_ubsan_overflow): Handle polynomial
numbers of elements.

Index: gcc/internal-fn.c
===
--- gcc/internal-fn.c   2017-10-23 17:11:39.913311438 +0100
+++ gcc/internal-fn.c   2017-10-23 17:22:51.056325855 +0100
@@ -1872,7 +1872,7 @@ expand_mul_overflow (location_t loc, tre
 expand_vector_ubsan_overflow (location_t loc, enum tree_code code, tree lhs,
  tree arg0, tree arg1)
 {
-  int cnt = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+  poly_uint64 cnt = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
   rtx_code_label *loop_lab = NULL;
   rtx cntvar = NULL_RTX;
   tree cntv = NULL_TREE;
@@ -1882,6 +1882,8 @@ expand_vector_ubsan_overflow (location_t
   tree resv = NULL_TREE;
   rtx lhsr = NULL_RTX;
   rtx resvr = NULL_RTX;
+  unsigned HOST_WIDE_INT const_cnt = 0;
+  bool use_loop_p = (!cnt.is_constant (&const_cnt) || const_cnt > 4);
 
   if (lhs)
 {
@@ -1902,7 +1904,7 @@ expand_vector_ubsan_overflow (location_t
}
}
 }
-  if (cnt > 4)
+  if (use_loop_p)
 {
   do_pending_stack_adjust ();
   loop_lab = gen_label_rtx ();
@@ -1921,10 +1923,10 @@ expand_vector_ubsan_overflow (location_t
   rtx arg1r = expand_normal (arg1);
   arg1 = make_tree (TREE_TYPE (arg1), arg1r);
 }
-  for (int i = 0; i < (cnt > 4 ? 1 : cnt); i++)
+  for (unsigned int i = 0; i < (use_loop_p ? 1 : const_cnt); i++)
 {
   tree op0, op1, res = NULL_TREE;
-  if (cnt > 4)
+  if (use_loop_p)
{
  tree atype = build_array_type_nelts (eltype, cnt);
  op0 = uniform_vector_p (arg0);
@@ -1964,7 +1966,7 @@ expand_vector_ubsan_overflow (location_t
  false, false, false, true, &data);
  break;
case MINUS_EXPR:
- if (cnt > 4 ? integer_zerop (arg0) : integer_zerop (op0))
+ if (use_loop_p ? integer_zerop (arg0) : integer_zerop (op0))
expand_neg_overflow (loc, res, op1, true, &data);
  else
expand_addsub_overflow (loc, MINUS_EXPR, res, op0, op1,
@@ -1978,7 +1980,7 @@ expand_vector_ubsan_overflow (location_t
  gcc_unreachable ();
}
 }
-  if (cnt > 4)
+  if (use_loop_p)
 {
   struct separate_ops ops;
   ops.code = PLUS_EXPR;
@@ -1991,7 +1993,8 @@ expand_vector_ubsan_overflow (location_t
EXPAND_NORMAL);
   if (ret != cntvar)
emit_move_insn (cntvar, ret);
-  do_compare_rtx_and_jump (cntvar, GEN_INT (cnt), NE, false,
+  rtx cntrtx = gen_int_mode (cnt, TYPE_MODE (sizetype));
+  do_compare_rtx_and_jump (cntvar, cntrtx, NE, false,
   TYPE_MODE (sizetype), NULL_RTX, NULL, loop_lab,
   profile_probability::very_likely ());
 }


[084/nnn] poly_int: folding BIT_FIELD_REFs on vectors

2017-10-23 Thread Richard Sandiford
This patch makes the:

  (BIT_FIELD_REF CONSTRUCTOR@0 @1 @2)

folder cope with polynomial numbers of elements.
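
The rewrite relies on the out-parameter form of multiple_p, which tests
exact divisibility and recovers the constant quotient in one go.  A
worked sketch, assuming a poly_int build with two coefficients:

  poly_uint64 idx (8, 8);	/* 8 + 8x for runtime parameter x */
  poly_uint64 k (2, 2);		/* 2 + 2x */
  unsigned HOST_WIDE_INT elt;
  if (multiple_p (idx, k, &elt))
    /* Succeeds: 8 + 8x == 4 * (2 + 2x) for all x, so elt == 4 and the
       constructor can be indexed with a plain constant.  */
    gcc_assert (elt == 4);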


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* match.pd: Cope with polynomial numbers of vector elements.

Index: gcc/match.pd
===
--- gcc/match.pd2017-10-23 17:22:18.230825454 +0100
+++ gcc/match.pd2017-10-23 17:22:50.031432167 +0100
@@ -4307,46 +4307,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
idx = idx / width;
n = n / width;
/* Constructor elements can be subvectors.  */
-   unsigned HOST_WIDE_INT k = 1;
+   poly_uint64 k = 1;
if (CONSTRUCTOR_NELTS (ctor) != 0)
  {
tree cons_elem = TREE_TYPE (CONSTRUCTOR_ELT (ctor, 0)->value);
   if (TREE_CODE (cons_elem) == VECTOR_TYPE)
 k = TYPE_VECTOR_SUBPARTS (cons_elem);
 }
+   unsigned HOST_WIDE_INT elt, count, const_k;
  }
  (switch
   /* We keep an exact subset of the constructor elements.  */
-  (if ((idx % k) == 0 && (n % k) == 0)
+  (if (multiple_p (idx, k, &elt) && multiple_p (n, k, &count))
(if (CONSTRUCTOR_NELTS (ctor) == 0)
 { build_constructor (type, NULL); }
-   (with
+   (if (count == 1)
+(if (elt < CONSTRUCTOR_NELTS (ctor))
+ { CONSTRUCTOR_ELT (ctor, elt)->value; }
+ { build_zero_cst (type); })
 {
-  idx /= k;
-  n /= k;
-}
-(if (n == 1)
- (if (idx < CONSTRUCTOR_NELTS (ctor))
-  { CONSTRUCTOR_ELT (ctor, idx)->value; }
-  { build_zero_cst (type); })
- {
-   vec<constructor_elt, va_gc> *vals;
-   vec_alloc (vals, n);
-   for (unsigned i = 0;
-i < n && idx + i < CONSTRUCTOR_NELTS (ctor); ++i)
- CONSTRUCTOR_APPEND_ELT (vals, NULL_TREE,
- CONSTRUCTOR_ELT (ctor, idx + i)->value);
-   build_constructor (type, vals);
- }
+  vec<constructor_elt, va_gc> *vals;
+  vec_alloc (vals, count);
+  for (unsigned i = 0;
+   i < count && elt + i < CONSTRUCTOR_NELTS (ctor); ++i)
+CONSTRUCTOR_APPEND_ELT (vals, NULL_TREE,
+CONSTRUCTOR_ELT (ctor, elt + i)->value);
+  build_constructor (type, vals);
+})))
   /* The bitfield references a single constructor element.  */
-  (if (idx + n <= (idx / k + 1) * k)
+   (if (k.is_constant (&const_k)
+  && idx + n <= (idx / const_k + 1) * const_k)
(switch
-(if (CONSTRUCTOR_NELTS (ctor) <= idx / k)
+   (if (CONSTRUCTOR_NELTS (ctor) <= idx / const_k)
 { build_zero_cst (type); })
-   (if (n == k)
-{ CONSTRUCTOR_ELT (ctor, idx / k)->value; })
-   (BIT_FIELD_REF { CONSTRUCTOR_ELT (ctor, idx / k)->value; }
-  @1 { bitsize_int ((idx % k) * width); })
+   (if (n == const_k)
+{ CONSTRUCTOR_ELT (ctor, idx / const_k)->value; })
+   (BIT_FIELD_REF { CONSTRUCTOR_ELT (ctor, idx / const_k)->value; }
+  @1 { bitsize_int ((idx % const_k) * width); })
 
 /* Simplify a bit extraction from a bit insertion for the cases with
the inserted element fully covering the extraction or the insertion


[083/nnn] poly_int: fold_indirect_ref_1

2017-10-23 Thread Richard Sandiford
This patch makes fold_indirect_ref_1 handle polynomial offsets in
a POINTER_PLUS_EXPR.  The specific reason for doing this now is
to handle:

  (tree_to_uhwi (part_width) / BITS_PER_UNIT
   * TYPE_VECTOR_SUBPARTS (op00type));

when TYPE_VECTOR_SUBPARTS becomes a poly_int.
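
The in-range test becomes must_lt, which holds only if the offset is in
range for every runtime vector length.  A sketch of the distinction,
with hypothetical values and a two-coefficient poly_int build:

  /* 16-byte elements, (2 + 2x) elements per vector.  */
  poly_uint64 max_offset (32, 32);	/* 32 + 32x bytes */
  gcc_assert (must_lt (poly_uint64 (16, 0), max_offset));
  /* An offset of 48 is in range only for x >= 1, so must_lt is false
     and the fold is refused rather than risked.  */
  gcc_assert (!must_lt (poly_uint64 (48, 0), max_offset));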


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* fold-const.c (fold_indirect_ref_1): Handle polynomial offsets
in a POINTER_PLUS_EXPR.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c2017-10-23 17:20:50.881679906 +0100
+++ gcc/fold-const.c2017-10-23 17:22:48.984540760 +0100
@@ -14137,6 +14137,7 @@ fold_indirect_ref_1 (location_t loc, tre
 {
   tree sub = op0;
   tree subtype;
+  poly_uint64 const_op01;
 
   STRIP_NOPS (sub);
   subtype = TREE_TYPE (sub);
@@ -14191,7 +14192,7 @@ fold_indirect_ref_1 (location_t loc, tre
 }
 
   if (TREE_CODE (sub) == POINTER_PLUS_EXPR
-  && TREE_CODE (TREE_OPERAND (sub, 1)) == INTEGER_CST)
+  && poly_int_tree_p (TREE_OPERAND (sub, 1), &const_op01))
 {
   tree op00 = TREE_OPERAND (sub, 0);
   tree op01 = TREE_OPERAND (sub, 1);
@@ -14208,15 +14209,12 @@ fold_indirect_ref_1 (location_t loc, tre
  && type == TREE_TYPE (op00type))
{
  tree part_width = TYPE_SIZE (type);
- unsigned HOST_WIDE_INT max_offset
+ poly_uint64 max_offset
= (tree_to_uhwi (part_width) / BITS_PER_UNIT
   * TYPE_VECTOR_SUBPARTS (op00type));
- if (tree_int_cst_sign_bit (op01) == 0
- && compare_tree_int (op01, max_offset) == -1)
+ if (must_lt (const_op01, max_offset))
{
- unsigned HOST_WIDE_INT offset = tree_to_uhwi (op01);
- unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
- tree index = bitsize_int (indexi);
+ tree index = bitsize_int (const_op01 * BITS_PER_UNIT);
  return fold_build3_loc (loc,
  BIT_FIELD_REF, type, op00,
  part_width, index);
@@ -14226,8 +14224,8 @@ fold_indirect_ref_1 (location_t loc, tre
  else if (TREE_CODE (op00type) == COMPLEX_TYPE
   && type == TREE_TYPE (op00type))
{
- tree size = TYPE_SIZE_UNIT (type);
- if (tree_int_cst_equal (size, op01))
+ if (must_eq (wi::to_poly_offset (TYPE_SIZE_UNIT (type)),
+  const_op01))
return fold_build1_loc (loc, IMAGPART_EXPR, type, op00);
}
  /* ((foo *))[1] => fooarray[1] */


[082/nnn] poly_int: omp-simd-clone.c

2017-10-23 Thread Richard Sandiford
This patch adds a wrapper around TYPE_VECTOR_SUBPARTS for omp-simd-clone.c.
Supporting SIMD clones for variable-length vectors is post GCC8 work.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* omp-simd-clone.c (simd_clone_subparts): New function.
(simd_clone_init_simd_arrays): Use it instead of TYPE_VECTOR_SUBPARTS.
(ipa_simd_modify_function_body): Likewise.

Index: gcc/omp-simd-clone.c
===
--- gcc/omp-simd-clone.c2017-08-30 12:19:19.716220030 +0100
+++ gcc/omp-simd-clone.c2017-10-23 17:22:47.947648317 +0100
@@ -51,6 +51,15 @@ Software Foundation; either version 3, o
 #include "stringpool.h"
 #include "attribs.h"
 
+/* Return the number of elements in vector type VECTYPE, which is associated
+   with a SIMD clone.  At present these always have a constant length.  */
+
+static unsigned HOST_WIDE_INT
+simd_clone_subparts (tree vectype)
+{
+  return TYPE_VECTOR_SUBPARTS (vectype);
+}
+
 /* Allocate a fresh `simd_clone' and return it.  NARGS is the number
of arguments to reserve space for.  */
 
@@ -770,7 +779,7 @@ simd_clone_init_simd_arrays (struct cgra
}
  continue;
}
-  if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg)) == node->simdclone->simdlen)
+  if (simd_clone_subparts (TREE_TYPE (arg)) == node->simdclone->simdlen)
{
  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
  tree ptr = build_fold_addr_expr (array);
@@ -781,7 +790,7 @@ simd_clone_init_simd_arrays (struct cgra
}
   else
{
- unsigned int simdlen = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg));
+ unsigned int simdlen = simd_clone_subparts (TREE_TYPE (arg));
  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
  for (k = 0; k < node->simdclone->simdlen; k += simdlen)
{
@@ -927,8 +936,8 @@ ipa_simd_modify_function_body (struct cg
  iter,
  NULL_TREE, NULL_TREE);
   if (adjustments[j].op == IPA_PARM_OP_NONE
- && TYPE_VECTOR_SUBPARTS (vectype) < node->simdclone->simdlen)
-   j += node->simdclone->simdlen / TYPE_VECTOR_SUBPARTS (vectype) - 1;
+ && simd_clone_subparts (vectype) < node->simdclone->simdlen)
+   j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
 }
 
   l = adjustments.length ();


[081/nnn] poly_int: brig vector elements

2017-10-23 Thread Richard Sandiford
This patch adds a brig-specific wrapper around TYPE_VECTOR_SUBPARTS,
since presumably it will never need to support variable vector lengths.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/brig/
* brigfrontend/brig-util.h (gccbrig_type_vector_subparts): New
function.
* brigfrontend/brig-basic-inst-handler.cc
(brig_basic_inst_handler::build_shuffle): Use it instead of
TYPE_VECTOR_SUBPARTS.
(brig_basic_inst_handler::build_unpack): Likewise.
(brig_basic_inst_handler::build_pack): Likewise.
(brig_basic_inst_handler::build_unpack_lo_or_hi): Likewise.
(brig_basic_inst_handler::operator ()): Likewise.
(brig_basic_inst_handler::build_lower_element_broadcast): Likewise.
* brigfrontend/brig-code-entry-handler.cc
(brig_code_entry_handler::get_tree_cst_for_hsa_operand): Likewise.
(brig_code_entry_handler::get_comparison_result_type): Likewise.
(brig_code_entry_handler::expand_or_call_builtin): Likewise.

Index: gcc/brig/brigfrontend/brig-util.h
===
--- gcc/brig/brigfrontend/brig-util.h   2017-10-02 09:10:56.960755788 +0100
+++ gcc/brig/brigfrontend/brig-util.h   2017-10-23 17:22:46.882758777 +0100
@@ -76,4 +76,12 @@ bool gccbrig_might_be_host_defined_var_p
 /* From hsa.h.  */
 bool hsa_type_packed_p (BrigType16_t type);
 
+/* Return the number of elements in a VECTOR_TYPE.  BRIG does not support
+   variable-length vectors.  */
+inline unsigned HOST_WIDE_INT
+gccbrig_type_vector_subparts (const_tree type)
+{
+  return TYPE_VECTOR_SUBPARTS (type);
+}
+
 #endif
Index: gcc/brig/brigfrontend/brig-basic-inst-handler.cc
===
--- gcc/brig/brigfrontend/brig-basic-inst-handler.cc2017-08-10 
14:36:07.092506123 +0100
+++ gcc/brig/brigfrontend/brig-basic-inst-handler.cc2017-10-23 
17:22:46.882758777 +0100
@@ -97,9 +97,10 @@ brig_basic_inst_handler::build_shuffle (
  output elements can originate from any input element.  */
   vec<constructor_elt, va_gc> *mask_offset_vals = NULL;
 
+  unsigned int element_count = gccbrig_type_vector_subparts (arith_type);
+
   vec<constructor_elt, va_gc> *input_mask_vals = NULL;
-  size_t input_mask_element_size
-= exact_log2 (TYPE_VECTOR_SUBPARTS (arith_type));
+  size_t input_mask_element_size = exact_log2 (element_count);
 
   /* Unpack the tightly packed mask elements to BIT_FIELD_REFs
  from which to construct the mask vector as understood by
@@ -109,7 +110,7 @@ brig_basic_inst_handler::build_shuffle (
   tree mask_element_type
 = build_nonstandard_integer_type (input_mask_element_size, true);
 
-  for (size_t i = 0; i < TYPE_VECTOR_SUBPARTS (arith_type); ++i)
+  for (size_t i = 0; i < element_count; ++i)
 {
   tree mask_element
= build3 (BIT_FIELD_REF, mask_element_type, mask_operand,
@@ -119,17 +120,15 @@ brig_basic_inst_handler::build_shuffle (
   mask_element = convert (element_type, mask_element);
 
   tree offset;
-  if (i < TYPE_VECTOR_SUBPARTS (arith_type) / 2)
+  if (i < element_count / 2)
offset = build_int_cst (element_type, 0);
   else
-   offset
- = build_int_cst (element_type, TYPE_VECTOR_SUBPARTS (arith_type));
+   offset = build_int_cst (element_type, element_count);
 
   CONSTRUCTOR_APPEND_ELT (mask_offset_vals, NULL_TREE, offset);
   CONSTRUCTOR_APPEND_ELT (input_mask_vals, NULL_TREE, mask_element);
 }
-  tree mask_vec_type
-= build_vector_type (element_type, TYPE_VECTOR_SUBPARTS (arith_type));
+  tree mask_vec_type = build_vector_type (element_type, element_count);
 
   tree mask_vec = build_constructor (mask_vec_type, input_mask_vals);
   tree offset_vec = build_constructor (mask_vec_type, mask_offset_vals);
@@ -158,7 +157,8 @@ brig_basic_inst_handler::build_unpack (t
   vec<constructor_elt, va_gc> *input_mask_vals = NULL;
   vec<constructor_elt, va_gc> *and_mask_vals = NULL;
 
-  size_t element_count = TYPE_VECTOR_SUBPARTS (TREE_TYPE (operands[0]));
+  size_t element_count
+= gccbrig_type_vector_subparts (TREE_TYPE (operands[0]));
   tree vec_type = build_vector_type (element_type, element_count);
 
   for (size_t i = 0; i < element_count; ++i)
@@ -213,7 +213,7 @@ brig_basic_inst_handler::build_pack (tre
  TODO: Reuse this for implementing 'bitinsert'
  without a builtin call.  */
 
-  size_t ecount = TYPE_VECTOR_SUBPARTS (TREE_TYPE (operands[0]));
+  size_t ecount = gccbrig_type_vector_subparts (TREE_TYPE (operands[0]));
   size_t vecsize = int_size_in_bytes (TREE_TYPE (operands[0])) * BITS_PER_UNIT;
   tree wide_type = build_nonstandard_integer_type (vecsize, 1);
 
@@ -275,9 +275,10 @@ brig_basic_inst_handler::build_unpack_lo
 {
   tree element_type = get_unsigned_int_type (TREE_TYPE (arith_type));
   

[080/nnn] poly_int: tree-vect-generic.c

2017-10-23 Thread Richard Sandiford
This patch makes tree-vect-generic.c cope with variable-length vectors.
Decomposition is only supported for constant-length vectors, since we
should never generate unsupported variable-length operations.
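
The new subparts_gt helper is phrased with must_gt, so it only succeeds
when the inequality holds for all runtime lengths.  A sketch of the
semantics, assuming a two-coefficient poly_int build:

  gcc_assert (must_gt (poly_uint64 (8, 8), poly_uint64 (4, 4)));
  /* 8 + 8x > 4 + 4x for every x >= 0.  */
  gcc_assert (!must_gt (poly_uint64 (4, 4), poly_uint64 (6, 0)));
  /* 4 + 4x exceeds 6 only for x >= 1, so the test conservatively
     fails.  */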


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-generic.c (nunits_for_known_piecewise_op): New function.
(expand_vector_piecewise): Use it instead of TYPE_VECTOR_SUBPARTS.
(expand_vector_addition, add_rshift, expand_vector_divmod): Likewise.
(expand_vector_condition, vector_element): Likewise.
(subparts_gt): New function.
(get_compute_type): Use subparts_gt.
(count_type_subparts): Delete.
(expand_vector_operations_1): Use subparts_gt instead of
count_type_subparts.

Index: gcc/tree-vect-generic.c
===
--- gcc/tree-vect-generic.c 2017-10-23 17:11:39.944370794 +0100
+++ gcc/tree-vect-generic.c 2017-10-23 17:22:45.856865193 +0100
@@ -41,6 +41,26 @@ Free Software Foundation; either version
 
 static void expand_vector_operations_1 (gimple_stmt_iterator *);
 
+/* Return the number of elements in a vector type TYPE that we have
+   already decided needs to be expanded piecewise.  We don't support
+   this kind of expansion for variable-length vectors, since we should
+   always check for target support before introducing uses of those.  */
+static unsigned int
+nunits_for_known_piecewise_op (const_tree type)
+{
+  return TYPE_VECTOR_SUBPARTS (type);
+}
+
+/* Return true if TYPE1 has more elements than TYPE2, where either
+   type may be a vector or a scalar.  */
+
+static inline bool
+subparts_gt (tree type1, tree type2)
+{
+  poly_uint64 n1 = VECTOR_TYPE_P (type1) ? TYPE_VECTOR_SUBPARTS (type1) : 1;
+  poly_uint64 n2 = VECTOR_TYPE_P (type2) ? TYPE_VECTOR_SUBPARTS (type2) : 1;
+  return must_gt (n1, n2);
+}
 
 /* Build a constant of type TYPE, made of VALUE's bits replicated
every TYPE_SIZE (INNER_TYPE) bits to fit TYPE's precision.  */
@@ -254,7 +274,7 @@ expand_vector_piecewise (gimple_stmt_ite
   vec<constructor_elt, va_gc> *v;
   tree part_width = TYPE_SIZE (inner_type);
   tree index = bitsize_int (0);
-  int nunits = TYPE_VECTOR_SUBPARTS (type);
+  int nunits = nunits_for_known_piecewise_op (type);
   int delta = tree_to_uhwi (part_width)
  / tree_to_uhwi (TYPE_SIZE (TREE_TYPE (type)));
   int i;
@@ -338,7 +358,7 @@ expand_vector_addition (gimple_stmt_iter
 
   if (INTEGRAL_TYPE_P (TREE_TYPE (type))
   && parts_per_word >= 4
-  && TYPE_VECTOR_SUBPARTS (type) >= 4)
+  && nunits_for_known_piecewise_op (type) >= 4)
 return expand_vector_parallel (gsi, f_parallel,
   type, a, b, code);
   else
@@ -373,7 +393,7 @@ expand_vector_comparison (gimple_stmt_it
 add_rshift (gimple_stmt_iterator *gsi, tree type, tree op0, int *shiftcnts)
 {
   optab op;
-  unsigned int i, nunits = TYPE_VECTOR_SUBPARTS (type);
+  unsigned int i, nunits = nunits_for_known_piecewise_op (type);
   bool scalar_shift = true;
 
   for (i = 1; i < nunits; i++)
@@ -418,7 +438,7 @@ expand_vector_divmod (gimple_stmt_iterat
   bool has_vector_shift = true;
   int mode = -1, this_mode;
   int pre_shift = -1, post_shift;
-  unsigned int nunits = TYPE_VECTOR_SUBPARTS (type);
+  unsigned int nunits = nunits_for_known_piecewise_op (type);
   int *shifts = XALLOCAVEC (int, nunits * 4);
   int *pre_shifts = shifts + nunits;
   int *post_shifts = pre_shifts + nunits;
@@ -867,7 +887,6 @@ expand_vector_condition (gimple_stmt_ite
   tree index = bitsize_int (0);
   tree comp_width = width;
   tree comp_index = index;
-  int nunits = TYPE_VECTOR_SUBPARTS (type);
   int i;
   location_t loc = gimple_location (gsi_stmt (*gsi));
 
@@ -920,6 +939,7 @@ expand_vector_condition (gimple_stmt_ite
   warning_at (loc, OPT_Wvector_operation_performance,
  "vector condition will be expanded piecewise");
 
+  int nunits = nunits_for_known_piecewise_op (type);
   vec_alloc (v, nunits);
   for (i = 0; i < nunits; i++)
 {
@@ -1189,7 +1209,7 @@ vector_element (gimple_stmt_iterator *gs
 
   vect_type = TREE_TYPE (vect);
   vect_elt_type = TREE_TYPE (vect_type);
-  elements = TYPE_VECTOR_SUBPARTS (vect_type);
+  elements = nunits_for_known_piecewise_op (vect_type);
 
   if (TREE_CODE (idx) == INTEGER_CST)
 {
@@ -1446,8 +1466,7 @@ get_compute_type (enum tree_code code, o
   tree vector_compute_type
= type_for_widest_vector_mode (TREE_TYPE (type), op);
   if (vector_compute_type != NULL_TREE
- && (TYPE_VECTOR_SUBPARTS (vector_compute_type)
- < TYPE_VECTOR_SUBPARTS (compute_type))
+ && subparts_gt (compute_type, vector_compute_type)
  && TYPE_VECTOR_SUBPARTS (vector_compute_type) > 1
  && (optab_handler (op, TYPE_MODE (vector_compute_type))
  != 

[079/nnn] poly_int: vect_no_alias_p

2017-10-23 Thread Richard Sandiford
This patch replaces the two-state vect_no_alias_p with a three-state
vect_compile_time_alias that handles polynomial segment lengths.
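
The three states map directly onto the poly_int range helpers; the
heart of the new function is (condensed sketch, operands as in the
patch below):

  if (ranges_must_overlap_p (offset_a, const_length_a,
			     offset_b, const_length_b))
    return 1;	/* the accesses alias for every vector length */
  if (!ranges_may_overlap_p (offset_a, const_length_a,
			     offset_b, const_length_b))
    return 0;	/* they alias for no vector length */
  return -1;	/* undecidable: fall back to a runtime alias check */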


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-data-refs.c (vect_no_alias_p): Replace with...
(vect_compile_time_alias): ...this new function.  Do the calculation
on poly_ints rather than trees.
(vect_prune_runtime_alias_test_list): Update call accordingly.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2017-10-23 17:22:34.681024458 +0100
+++ gcc/tree-vect-data-refs.c   2017-10-23 17:22:44.864968082 +0100
@@ -2989,52 +2989,49 @@ vect_vfa_segment_size (struct data_refer
 
 /* Function vect_no_alias_p.
 
-   Given data references A and B with equal base and offset, the alias
-   relation can be decided at compilation time, return TRUE if they do
-   not alias to each other; return FALSE otherwise.  SEGMENT_LENGTH_A
+   Given data references A and B with equal base and offset, see whether
+   the alias relation can be decided at compilation time.  Return 1 if
+   it can and the references alias, 0 if it can and the references do
+   not alias, and -1 if we cannot decide at compile time.  SEGMENT_LENGTH_A
and SEGMENT_LENGTH_B are the memory lengths accessed by A and B
respectively.  */
 
-static bool
-vect_no_alias_p (struct data_reference *a, struct data_reference *b,
- tree segment_length_a, tree segment_length_b)
+static int
+vect_compile_time_alias (struct data_reference *a, struct data_reference *b,
+tree segment_length_a, tree segment_length_b)
 {
-  gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST
- && TREE_CODE (DR_INIT (b)) == INTEGER_CST);
-  if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b)))
-return false;
+  poly_offset_int offset_a = wi::to_poly_offset (DR_INIT (a));
+  poly_offset_int offset_b = wi::to_poly_offset (DR_INIT (b));
+  poly_uint64 const_length_a;
+  poly_uint64 const_length_b;
 
-  tree seg_a_min = DR_INIT (a);
-  tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min),
-   seg_a_min, segment_length_a);
   /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
  bytes, e.g., int a[3] -> a[1] range is [a+4, a+16) instead of
  [a, a+12) */
   if (tree_int_cst_compare (DR_STEP (a), size_zero_node) < 0)
 {
-  tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a)));
-  seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max),
-  seg_a_max, unit_size);
-  seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)),
-  DR_INIT (a), unit_size);
+  const_length_a = (-wi::to_poly_wide (segment_length_a)).force_uhwi ();
+  offset_a = (offset_a + vect_get_scalar_dr_size (a)) - const_length_a;
 }
-  tree seg_b_min = DR_INIT (b);
-  tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min),
-   seg_b_min, segment_length_b);
+  else
+const_length_a = tree_to_poly_uint64 (segment_length_a);
   if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)
 {
-  tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (b)));
-  seg_b_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_max),
-  seg_b_max, unit_size);
-  seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (b)),
-  DR_INIT (b), unit_size);
+  const_length_b = (-wi::to_poly_wide (segment_length_b)).force_uhwi ();
+  offset_b = (offset_b + vect_get_scalar_dr_size (b)) - const_length_b;
 }
+  else
+const_length_b = tree_to_poly_uint64 (segment_length_b);
 
-  if (tree_int_cst_le (seg_a_max, seg_b_min)
-  || tree_int_cst_le (seg_b_max, seg_a_min))
-return true;
+  if (ranges_must_overlap_p (offset_a, const_length_a,
+offset_b, const_length_b))
+return 1;
+
+  if (!ranges_may_overlap_p (offset_a, const_length_a,
+offset_b, const_length_b))
+return 0;
 
-  return false;
+  return -1;
 }
 
 /* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH
@@ -3176,21 +3173,26 @@ vect_prune_runtime_alias_test_list (loop
comp_res = data_ref_compare_tree (DR_OFFSET (dr_a),
  DR_OFFSET (dr_b));
 
-  /* Alias is known at compilation time.  */
+  /* See whether the alias is known at compilation time.  */
   if (comp_res == 0
  && TREE_CODE (DR_STEP (dr_a)) == INTEGER_CST
  && TREE_CODE (DR_STEP (dr_b)) == INTEGER_CST
- && TREE_CODE (segment_length_a) == INTEGER_CST
- && TREE_CODE (segment_length_b) == INTEGER_CST)
+ && poly_int_tree_p (segment_length_a)
+ && 

[078/nnn] poly_int: two-operation SLP

2017-10-23 Thread Richard Sandiford
This patch makes two-operation SLP recognize and explicitly reject
variable-length vectors.  Adding support for this is a post-GCC8 thing.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-slp.c (vect_build_slp_tree_1): Handle polynomial
numbers of units.
(vect_schedule_slp_instance): Likewise.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c 2017-10-23 17:22:42.827179461 +0100
+++ gcc/tree-vect-slp.c 2017-10-23 17:22:43.865071801 +0100
@@ -903,10 +903,19 @@ vect_build_slp_tree_1 (vec_info *vinfo,
 
   /* If we allowed a two-operation SLP node verify the target can cope
  with the permute we are going to use.  */
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   if (alt_stmt_code != ERROR_MARK
   && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)
 {
-  unsigned int count = TYPE_VECTOR_SUBPARTS (vectype);
+  unsigned HOST_WIDE_INT count;
+  if (!nunits.is_constant (&count))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"Build SLP failed: different operations "
+"not allowed with variable-length SLP.\n");
+ return false;
+   }
   auto_vec_perm_indices sel (count);
   for (i = 0; i < count; ++i)
{
@@ -3796,6 +3805,7 @@ vect_schedule_slp_instance (slp_tree nod
 
   /* VECTYPE is the type of the destination.  */
   vectype = STMT_VINFO_VECTYPE (stmt_info);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   group_size = SLP_INSTANCE_GROUP_SIZE (instance);
 
   if (!SLP_TREE_VEC_STMTS (node).exists ())
@@ -3858,13 +3868,16 @@ vect_schedule_slp_instance (slp_tree nod
  unsigned k = 0, l;
  for (j = 0; j < v0.length (); ++j)
{
- unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
- auto_vec<tree, 32> melts (nunits);
- for (l = 0; l < nunits; ++l)
+ /* Enforced by vect_build_slp_tree, which rejects variable-length
+vectors for SLP_TREE_TWO_OPERATORS.  */
+ unsigned int const_nunits = nunits.to_constant ();
+ auto_vec<tree, 32> melts (const_nunits);
+ for (l = 0; l < const_nunits; ++l)
{
  if (k >= group_size)
k = 0;
- tree t = build_int_cst (meltype, mask[k++] * nunits + l);
+ tree t = build_int_cst (meltype,
+ mask[k++] * const_nunits + l);
  melts.quick_push (t);
}
  tmask = build_vector (mvectype, melts);


[077/nnn] poly_int: vect_get_constant_vectors

2017-10-23 Thread Richard Sandiford
For now, vect_get_constant_vectors can only cope with constant-length
vectors, although a patch after the main SVE submission relaxes this.
This patch adds an appropriate guard for variable-length vectors.
The TYPE_VECTOR_SUBPARTS use in vect_get_constant_vectors will then
have a to_constant call when TYPE_VECTOR_SUBPARTS becomes a poly_int.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-slp.c (vect_get_and_check_slp_defs): Reject
constant and extern definitions for variable-length vectors.
(vect_get_constant_vectors): Note that the number of units
is known to be constant.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c 2017-10-23 17:22:32.728227020 +0100
+++ gcc/tree-vect-slp.c 2017-10-23 17:22:42.827179461 +0100
@@ -403,6 +403,20 @@ vect_get_and_check_slp_defs (vec_info *v
{
case vect_constant_def:
case vect_external_def:
+ /* We must already have set a vector size by now.  */
+ gcc_checking_assert (maybe_nonzero (current_vector_size));
+ if (!current_vector_size.is_constant ())
+   {
+ if (dump_enabled_p ())
+   {
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "Build SLP failed: invalid type of def "
+  "for variable-length SLP ");
+ dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, oprnd);
+ dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+   }
+ return -1;
+   }
  break;
 
case vect_reduction_def:
@@ -3219,6 +3233,7 @@ vect_get_constant_vectors (tree op, slp_
   = build_same_sized_truth_vector_type (STMT_VINFO_VECTYPE (stmt_vinfo));
   else
 vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+  /* Enforced by vect_get_and_check_slp_defs.  */
   nunits = TYPE_VECTOR_SUBPARTS (vector_type);
 
   if (STMT_VINFO_DATA_REF (stmt_vinfo))


[076/nnn] poly_int: vectorizable_conversion

2017-10-23 Thread Richard Sandiford
This patch makes vectorizable_conversion cope with variable-length
vectors.  We already require the number of elements in one vector
to be a multiple of the number of elements in the other vector,
so the patch uses that to choose between widening and narrowing.
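
Because one count is always a multiple of the other, a multiple_p test
is enough to classify the conversion.  A worked example with SVE-style
counts, assuming a two-coefficient poly_int build:

  poly_uint64 nunits_in (4, 4);		/* e.g. four 32-bit lanes: 4 + 4x */
  poly_uint64 nunits_out (8, 8);	/* e.g. eight 16-bit lanes: 8 + 8x */
  /* multiple_p (8 + 8x, 4 + 4x) holds with quotient 2 for all x, so
     this direction is NARROW; the reverse direction is WIDEN.  */
  gcc_assert (multiple_p (nunits_out, nunits_in));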


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-stmts.c (vectorizable_conversion): Treat the number
of units as polynomial.  Choose between WIDE and NARROW based
on multiple_p.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2017-10-23 17:22:40.906378704 +0100
+++ gcc/tree-vect-stmts.c   2017-10-23 17:22:41.879277786 +0100
@@ -4102,8 +4102,8 @@ vectorizable_conversion (gimple *stmt, g
   int ndts = 2;
   gimple *new_stmt = NULL;
   stmt_vec_info prev_stmt_info;
-  int nunits_in;
-  int nunits_out;
+  poly_uint64 nunits_in;
+  poly_uint64 nunits_out;
   tree vectype_out, vectype_in;
   int ncopies, i, j;
   tree lhs_type, rhs_type;
@@ -4238,12 +4238,15 @@ vectorizable_conversion (gimple *stmt, g
 
   nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
   nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
-  if (nunits_in < nunits_out)
-modifier = NARROW;
-  else if (nunits_out == nunits_in)
+  if (must_eq (nunits_out, nunits_in))
 modifier = NONE;
+  else if (multiple_p (nunits_out, nunits_in))
+modifier = NARROW;
   else
-modifier = WIDEN;
+{
+  gcc_checking_assert (multiple_p (nunits_in, nunits_out));
+  modifier = WIDEN;
+}
 
   /* Multiple types in SLP are handled by creating the appropriate number of
  vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in


[073/nnn] poly_int: vectorizable_load/store

2017-10-23 Thread Richard Sandiford
This patch makes vectorizable_load and vectorizable_store cope with
variable-length vectors.  The reverse and permute cases will be
excluded by the code that checks the permutation mask (although a
patch after the main SVE submission adds support for the reversed
case).  Here we also need to exclude VMAT_ELEMENTWISE and
VMAT_STRIDED_SLP, which split the operation up into a constant
number of constant-sized operations.  We also don't try to extend
the current widening gather/scatter support to variable-length
vectors, since SVE uses a different approach.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-stmts.c (get_load_store_type): Treat the number of
units as polynomial.  Reject VMAT_ELEMENTWISE and VMAT_STRIDED_SLP
for variable-length vectors.
(vectorizable_mask_load_store): Treat the number of units as
polynomial, asserting that it is constant if the condition has
already been enforced.
(vectorizable_store, vectorizable_load): Likewise.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2017-10-23 17:22:32.730226813 +0100
+++ gcc/tree-vect-stmts.c   2017-10-23 17:22:38.938582823 +0100
@@ -1955,6 +1955,7 @@ get_load_store_type (gimple *stmt, tree
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   vec_info *vinfo = stmt_info->vinfo;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
 {
   *memory_access_type = VMAT_GATHER_SCATTER;
@@ -1998,6 +1999,17 @@ get_load_store_type (gimple *stmt, tree
*memory_access_type = VMAT_CONTIGUOUS;
 }
 
+  if ((*memory_access_type == VMAT_ELEMENTWISE
+   || *memory_access_type == VMAT_STRIDED_SLP)
+  && !nunits.is_constant ())
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"Not using elementwise accesses due to variable "
+"vectorization factor.\n");
+  return false;
+}
+
   /* FIXME: At the moment the cost model seems to underestimate the
  cost of using elementwise accesses.  This check preserves the
  traditional behavior until that can be fixed.  */
@@ -2038,7 +2050,7 @@ vectorizable_mask_load_store (gimple *st
   tree dummy;
   tree dataref_ptr = NULL_TREE;
   gimple *ptr_incr;
-  int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
   int i, j;
   bool inv_p;
@@ -2168,7 +2180,8 @@ vectorizable_mask_load_store (gimple *st
   gimple_seq seq;
   basic_block new_bb;
   enum { NARROW, NONE, WIDEN } modifier;
-  int gather_off_nunits = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype);
+  poly_uint64 gather_off_nunits
+   = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype);
 
   rettype = TREE_TYPE (TREE_TYPE (gs_info.decl));
   srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
@@ -2179,32 +2192,37 @@ vectorizable_mask_load_store (gimple *st
   gcc_checking_assert (types_compatible_p (srctype, rettype)
   && types_compatible_p (srctype, masktype));
 
-  if (nunits == gather_off_nunits)
+  if (must_eq (nunits, gather_off_nunits))
modifier = NONE;
-  else if (nunits == gather_off_nunits / 2)
+  else if (must_eq (nunits * 2, gather_off_nunits))
{
  modifier = WIDEN;
 
- auto_vec_perm_indices sel (gather_off_nunits);
- for (i = 0; i < gather_off_nunits; ++i)
-   sel.quick_push (i | nunits);
+ /* Currently widening gathers and scatters are only supported for
+fixed-length vectors.  */
+ int count = gather_off_nunits.to_constant ();
+ auto_vec_perm_indices sel (count);
+ for (i = 0; i < count; ++i)
+   sel.quick_push (i | (count / 2));
 
  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
}
-  else if (nunits == gather_off_nunits * 2)
+  else if (must_eq (nunits, gather_off_nunits * 2))
{
  modifier = NARROW;
 
- auto_vec_perm_indices sel (nunits);
- sel.quick_grow (nunits);
- for (i = 0; i < nunits; ++i)
-   sel[i] = i < gather_off_nunits
-? i : i + nunits - gather_off_nunits;
+ /* Currently narrowing gathers and scatters are only supported for
+fixed-length vectors.  */
+ int count = nunits.to_constant ();
+ auto_vec_perm_indices sel (count);
+ sel.quick_grow (count);
+ for (i = 0; i < count; ++i)
+   sel[i] = i < count / 2 ? i : i + count / 2;
 
  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
  ncopies *= 

[075/nnn] poly_int: vectorizable_simd_clone_call

2017-10-23 Thread Richard Sandiford
This patch makes vectorizable_simd_clone_call cope with variable-length
vectors.  For now we don't support SIMD clones for variable-length
vectors; this will be post GCC 8 material.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-stmts.c (simd_clone_subparts): New function.
(vectorizable_simd_clone_call): Use it instead of TYPE_VECTOR_SUBPARTS.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2017-10-23 17:22:39.943478586 +0100
+++ gcc/tree-vect-stmts.c   2017-10-23 17:22:40.906378704 +0100
@@ -3206,6 +3206,16 @@ vect_simd_lane_linear (tree op, struct l
 }
 }
 
+/* Return the number of elements in vector type VECTYPE, which is associated
+   with a SIMD clone.  At present these vectors always have a constant
+   length.  */
+
+static unsigned HOST_WIDE_INT
+simd_clone_subparts (tree vectype)
+{
+  return TYPE_VECTOR_SUBPARTS (vectype);
+}
+
 /* Function vectorizable_simd_clone_call.
 
Check if STMT performs a function call that can be vectorized
@@ -3474,7 +3484,7 @@ vectorizable_simd_clone_call (gimple *st
  = get_vectype_for_scalar_type (TREE_TYPE (gimple_call_arg (stmt,
 i)));
if (arginfo[i].vectype == NULL
-   || (TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)
+   || (simd_clone_subparts (arginfo[i].vectype)
> bestn->simdclone->simdlen))
  return false;
   }
@@ -3561,15 +3571,15 @@ vectorizable_simd_clone_call (gimple *st
{
case SIMD_CLONE_ARG_TYPE_VECTOR:
  atype = bestn->simdclone->args[i].vector_type;
- o = nunits / TYPE_VECTOR_SUBPARTS (atype);
+ o = nunits / simd_clone_subparts (atype);
  for (m = j * o; m < (j + 1) * o; m++)
{
- if (TYPE_VECTOR_SUBPARTS (atype)
- < TYPE_VECTOR_SUBPARTS (arginfo[i].vectype))
+ if (simd_clone_subparts (atype)
+ < simd_clone_subparts (arginfo[i].vectype))
{
  unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
- k = (TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)
-  / TYPE_VECTOR_SUBPARTS (atype));
+ k = (simd_clone_subparts (arginfo[i].vectype)
+  / simd_clone_subparts (atype));
  gcc_assert ((k & (k - 1)) == 0);
  if (m == 0)
vec_oprnd0
@@ -3595,8 +3605,8 @@ vectorizable_simd_clone_call (gimple *st
}
  else
{
- k = (TYPE_VECTOR_SUBPARTS (atype)
-  / TYPE_VECTOR_SUBPARTS (arginfo[i].vectype));
+ k = (simd_clone_subparts (atype)
+  / simd_clone_subparts (arginfo[i].vectype));
  gcc_assert ((k & (k - 1)) == 0);
  vec<constructor_elt, va_gc> *ctor_elts;
  if (k != 1)
@@ -3714,11 +3724,11 @@ vectorizable_simd_clone_call (gimple *st
   new_stmt = gimple_build_call_vec (fndecl, vargs);
   if (vec_dest)
{
- gcc_assert (ratype || TYPE_VECTOR_SUBPARTS (rtype) == nunits);
+ gcc_assert (ratype || simd_clone_subparts (rtype) == nunits);
  if (ratype)
new_temp = create_tmp_var (ratype);
- else if (TYPE_VECTOR_SUBPARTS (vectype)
-  == TYPE_VECTOR_SUBPARTS (rtype))
+ else if (simd_clone_subparts (vectype)
+  == simd_clone_subparts (rtype))
new_temp = make_ssa_name (vec_dest, new_stmt);
  else
new_temp = make_ssa_name (rtype, new_stmt);
@@ -3728,11 +3738,11 @@ vectorizable_simd_clone_call (gimple *st
 
   if (vec_dest)
{
- if (TYPE_VECTOR_SUBPARTS (vectype) < nunits)
+ if (simd_clone_subparts (vectype) < nunits)
{
  unsigned int k, l;
  unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
- k = nunits / TYPE_VECTOR_SUBPARTS (vectype);
+ k = nunits / simd_clone_subparts (vectype);
  gcc_assert ((k & (k - 1)) == 0);
  for (l = 0; l < k; l++)
{
@@ -3767,16 +3777,16 @@ vectorizable_simd_clone_call (gimple *st
}
  continue;
}
- else if (TYPE_VECTOR_SUBPARTS (vectype) > nunits)
+ else if (simd_clone_subparts (vectype) > nunits)
{
- unsigned int k = (TYPE_VECTOR_SUBPARTS (vectype)
-   / TYPE_VECTOR_SUBPARTS (rtype));
+ unsigned int k = (simd_clone_subparts (vectype)
+   / 

[074/nnn] poly_int: vectorizable_call

2017-10-23 Thread Richard Sandiford
This patch makes vectorizable_call handle variable-length vectors.
The only substantial change is to use build_index_vector for
IFN_GOMP_SIMD_LANE; this makes no functional difference for
fixed-length vectors.
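
build_index_vector (added in patch 070 of this series) produces
{ base, base + step, ... } as a VECTOR_CST when the length is constant
and as a VEC_SERIES_EXPR otherwise, so the lane constant for copy j
reduces to a single call:

  /* Lanes for copy j: { j*n, j*n + 1, ..., j*n + n - 1 }.  */
  tree cst = build_index_vector (vectype_out, j * nunits_out, 1);

For fixed-length vectors this is exactly the vector that the old
element-by-element loop constructed.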


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-stmts.c (vectorizable_call): Treat the number of
vectors as polynomial.  Use build_index_vector for
IFN_GOMP_SIMD_LANE.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2017-10-23 17:22:38.938582823 +0100
+++ gcc/tree-vect-stmts.c   2017-10-23 17:22:39.943478586 +0100
@@ -2637,8 +2637,8 @@ vectorizable_call (gimple *gs, gimple_st
   tree vec_oprnd0 = NULL_TREE, vec_oprnd1 = NULL_TREE;
   stmt_vec_info stmt_info = vinfo_for_stmt (gs), prev_stmt_info;
   tree vectype_out, vectype_in;
-  int nunits_in;
-  int nunits_out;
+  poly_uint64 nunits_in;
+  poly_uint64 nunits_out;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   vec_info *vinfo = stmt_info->vinfo;
@@ -2758,11 +2758,11 @@ vectorizable_call (gimple *gs, gimple_st
   /* FORNOW */
   nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
   nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
-  if (nunits_in == nunits_out / 2)
+  if (must_eq (nunits_in * 2, nunits_out))
 modifier = NARROW;
-  else if (nunits_out == nunits_in)
+  else if (must_eq (nunits_out, nunits_in))
 modifier = NONE;
-  else if (nunits_out == nunits_in / 2)
+  else if (must_eq (nunits_out * 2, nunits_in))
 modifier = WIDEN;
   else
 return false;
@@ -2961,11 +2961,7 @@ vectorizable_call (gimple *gs, gimple_st
  if (gimple_call_internal_p (stmt)
  && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE)
{
- auto_vec<tree, 32> v (nunits_out);
- for (int k = 0; k < nunits_out; ++k)
-   v.quick_push (build_int_cst (unsigned_type_node,
-j * nunits_out + k));
- tree cst = build_vector (vectype_out, v);
+ tree cst = build_index_vector (vectype_out, j * nunits_out, 1);
  tree new_var
= vect_get_new_ssa_name (vectype_out, vect_simple_var, "cst_");
  gimple *init_stmt = gimple_build_assign (new_var, cst);


[072/nnn] poly_int: vectorizable_live_operation

2017-10-23 Thread Richard Sandiford
This patch makes vectorizable_live_operation cope with variable-length
vectors.  For now we just handle cases in which we can tell at compile
time which vector contains the final result.
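
The compile-time test is can_div_trunc_p, which succeeds when the
quotient of the flattened position by the element count is invariant.
A sketch (extract_lane is a hypothetical helper):

  int vec_entry = 0;
  poly_uint64 vec_index = 0;
  /* pos == vec_entry * nunits + vec_index with vec_entry constant.  */
  if (can_div_trunc_p (pos, nunits, &vec_entry, &vec_index))
    extract_lane (vec_entry, vec_index);
  else
    /* The containing vector varies with the runtime length; punt.  */
    return false;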


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-loop.c (vectorizable_live_operation): Treat the number
of units as polynomial.  Punt if we can't tell at compile time
which vector contains the final result.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2017-10-23 17:22:36.904793787 +0100
+++ gcc/tree-vect-loop.c2017-10-23 17:22:37.879692661 +0100
@@ -7132,10 +7132,12 @@ vectorizable_live_operation (gimple *stm
   imm_use_iterator imm_iter;
   tree lhs, lhs_type, bitsize, vec_bitsize;
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-  int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
   gimple *use_stmt;
   auto_vec<tree> vec_oprnds;
+  int vec_entry = 0;
+  poly_uint64 vec_index = 0;
 
   gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
 
@@ -7164,6 +7166,30 @@ vectorizable_live_operation (gimple *stm
   else
 ncopies = vect_get_num_copies (loop_vinfo, vectype);
 
+  if (slp_node)
+{
+  gcc_assert (slp_index >= 0);
+
+  int num_scalar = SLP_TREE_SCALAR_STMTS (slp_node).length ();
+  int num_vec = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
+
+  /* Get the last occurrence of the scalar index from the concatenation of
+all the slp vectors. Calculate which slp vector it is and the index
+within.  */
+  poly_uint64 pos = (num_vec * nunits) - num_scalar + slp_index;
+
+  /* Calculate which vector contains the result, and which lane of
+that vector we need.  */
+  if (!can_div_trunc_p (pos, nunits, &vec_entry, &vec_index))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"Cannot determine which vector holds the"
+" final result.\n");
+ return false;
+   }
+}
+
   if (!vec_stmt)
 /* No transformation required.  */
 return true;
@@ -7185,18 +7211,6 @@ vectorizable_live_operation (gimple *stm
   tree vec_lhs, bitstart;
   if (slp_node)
 {
-  gcc_assert (slp_index >= 0);
-
-  int num_scalar = SLP_TREE_SCALAR_STMTS (slp_node).length ();
-  int num_vec = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-
-  /* Get the last occurrence of the scalar index from the concatenation of
-all the slp vectors. Calculate which slp vector it is and the index
-within.  */
-  int pos = (num_vec * nunits) - num_scalar + slp_index;
-  int vec_entry = pos / nunits;
-  int vec_index = pos % nunits;
-
   /* Get the correct slp vectorized stmt.  */
   vec_lhs = gimple_get_lhs (SLP_TREE_VEC_STMTS (slp_node)[vec_entry]);
 


[071/nnn] poly_int: vectorizable_induction

2017-10-23 Thread Richard Sandiford
This patch makes vectorizable_induction cope with variable-length
vectors.  For now we punt on SLP inductions, but patches after
the main SVE submission add support for those too.
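
For the non-SLP path, the variable-length start vector
{ init, init + step, init + 2*step, ... } can be expressed directly,
roughly as follows (floating-point inductions build the series in an
integer type and cast the result):

  tree series = build_vec_series (vectype, init_expr, step_expr);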


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-loop.c (vectorizable_induction): Treat the number
of units as polynomial.  Punt on SLP inductions.  Use an integer
VEC_SERIES_EXPR for variable-length integer inductions.  Use a
cast of such a series for variable-length floating-point
inductions.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2017-10-23 17:22:35.829905285 +0100
+++ gcc/tree-vect-loop.c2017-10-23 17:22:36.904793787 +0100
@@ -6624,7 +6624,7 @@ vectorizable_induction (gimple *phi,
 return false;
 
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-  unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
   if (slp_node)
 ncopies = 1;
@@ -6689,6 +6689,16 @@ vectorizable_induction (gimple *phi,
 iv_loop = loop;
   gcc_assert (iv_loop == (gimple_bb (phi))->loop_father);
 
+  if (slp_node && !nunits.is_constant ())
+{
+  /* The current SLP code creates the initial value element-by-element.  */
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"SLP induction not supported for variable-length"
+" vectors.\n");
+  return false;
+}
+
   if (!vec_stmt) /* transformation not required.  */
 {
   STMT_VINFO_TYPE (stmt_info) = induc_vec_info_type;
@@ -6737,6 +6747,9 @@ vectorizable_induction (gimple *phi,
  [VF*S, VF*S, VF*S, VF*S] for all.  */
   if (slp_node)
 {
+  /* Enforced above.  */
+  unsigned int const_nunits = nunits.to_constant ();
+
   /* Convert the init to the desired type.  */
   stmts = NULL;
   init_expr = gimple_convert (&stmts, TREE_TYPE (vectype), init_expr);
@@ -6765,19 +6778,20 @@ vectorizable_induction (gimple *phi,
   /* Now generate the IVs.  */
   unsigned group_size = SLP_TREE_SCALAR_STMTS (slp_node).length ();
   unsigned nvects = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-  unsigned elts = nunits * nvects;
-  unsigned nivs = least_common_multiple (group_size, nunits) / nunits;
+  unsigned elts = const_nunits * nvects;
+  unsigned nivs = least_common_multiple (group_size,
+const_nunits) / const_nunits;
   gcc_assert (elts % group_size == 0);
   tree elt = init_expr;
   unsigned ivn;
   for (ivn = 0; ivn < nivs; ++ivn)
{
- auto_vec<tree, 32> elts (nunits);
+ auto_vec elts (const_nunits);
  stmts = NULL;
- for (unsigned eltn = 0; eltn < nunits; ++eltn)
+ for (unsigned eltn = 0; eltn < const_nunits; ++eltn)
{
- if (ivn*nunits + eltn >= group_size
- && (ivn*nunits + eltn) % group_size == 0)
+ if (ivn*const_nunits + eltn >= group_size
+ && (ivn * const_nunits + eltn) % group_size == 0)
elt = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (elt),
elt, step_expr);
  elts.quick_push (elt);
@@ -6814,7 +6828,7 @@ vectorizable_induction (gimple *phi,
   if (ivn < nvects)
{
  unsigned vfp
-   = least_common_multiple (group_size, nunits) / group_size;
+   = least_common_multiple (group_size, const_nunits) / group_size;
  /* Generate [VF'*S, VF'*S, ... ].  */
  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
{
@@ -6889,18 +6903,45 @@ vectorizable_induction (gimple *phi,
   stmts = NULL;
  new_name = gimple_convert (&stmts, TREE_TYPE (vectype), init_expr);
 
-  auto_vec<tree, 32> elts (nunits);
-  elts.quick_push (new_name);
-  for (i = 1; i < nunits; i++)
-   {
- /* Create: new_name_i = new_name + step_expr  */
- new_name = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (new_name),
-  new_name, step_expr);
+  unsigned HOST_WIDE_INT const_nunits;
+  if (nunits.is_constant (&const_nunits))
+   {
+ auto_vec<tree, 32> elts (const_nunits);
  elts.quick_push (new_name);
+ for (i = 1; i < const_nunits; i++)
+   {
+ /* Create: new_name_i = new_name + step_expr  */
+ new_name = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (new_name),
+  new_name, step_expr);
+ elts.quick_push (new_name);
+   }
+ /* Create a vector from [new_name_0, new_name_1, ...,
+new_name_nunits-1]  */
+ vec_init = gimple_build_vector (&stmts, vectype, elts);
+   }
+  else if 

[070/nnn] poly_int: vectorizable_reduction

2017-10-23 Thread Richard Sandiford
This patch makes vectorizable_reduction cope with variable-length vectors.
We can handle the simple case of an inner loop reduction for which
the target has native support for the epilogue operation.  For now we
punt on other cases, but patches after the main SVE submission allow
SLP and double reductions too.
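
One concrete piece: the epilogue's { 1, 2, 3, ... } selector, used to
find the lane that produced the final value, is now created with
build_index_vector, which degrades gracefully to a VEC_SERIES_EXPR for
variable-length vectors (type name as in vect_create_epilog_for_reduction):

  tree series = build_index_vector (cr_index_vector_type, 1, 1);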


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree.h (build_index_vector): Declare.
* tree.c (build_index_vector): New function.
* tree-vect-loop.c (get_initial_def_for_reduction): Treat the number
of units as polynomial, forcibly converting it to a constant if
vectorizable_reduction has already enforced the condition.
(get_initial_defs_for_reduction): Likewise.
(vect_create_epilog_for_reduction): Likewise.  Use build_index_vector
to create a {1,2,3,...} vector.
(vectorizable_reduction): Treat the number of units as polynomial.
Choose vectype_in based on the largest scalar element size rather
than the smallest number of units.  Enforce the restrictions
relied on above.

Index: gcc/tree.h
===
--- gcc/tree.h  2017-10-23 17:22:32.736226191 +0100
+++ gcc/tree.h  2017-10-23 17:22:35.831905077 +0100
@@ -4050,6 +4050,7 @@ extern tree build_vector (tree, vec *);
 extern tree build_vector_from_val (tree, tree);
 extern tree build_vec_series (tree, tree, tree);
+extern tree build_index_vector (tree, poly_uint64, poly_uint64);
 extern void recompute_constructor_flags (tree);
 extern void verify_constructor_flags (tree);
 extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
Index: gcc/tree.c
===
--- gcc/tree.c  2017-10-23 17:22:32.734226398 +0100
+++ gcc/tree.c  2017-10-23 17:22:35.830905181 +0100
@@ -1974,6 +1974,37 @@ build_vec_series (tree type, tree base,
   return build2 (VEC_SERIES_EXPR, type, base, step);
 }
 
+/* Return a vector with the same number of units and number of bits
+   as VEC_TYPE, but in which the elements are a linear series of unsigned
+   integers { BASE, BASE + STEP, BASE + STEP * 2, ... }.  */
+
+tree
+build_index_vector (tree vec_type, poly_uint64 base, poly_uint64 step)
+{
+  tree index_vec_type = vec_type;
+  tree index_elt_type = TREE_TYPE (vec_type);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vec_type);
+  if (!INTEGRAL_TYPE_P (index_elt_type) || !TYPE_UNSIGNED (index_elt_type))
+{
+  index_elt_type = build_nonstandard_integer_type
+   (GET_MODE_BITSIZE (SCALAR_TYPE_MODE (index_elt_type)), true);
+  index_vec_type = build_vector_type (index_elt_type, nunits);
+}
+
+  unsigned HOST_WIDE_INT count;
+  if (nunits.is_constant (&count))
+{
+  auto_vec<tree, 32> v (count);
+  for (unsigned int i = 0; i < count; ++i)
+   v.quick_push (build_int_cstu (index_elt_type, base + i * step));
+  return build_vector (index_vec_type, v);
+}
+
+  return build_vec_series (index_vec_type,
+  build_int_cstu (index_elt_type, base),
+  build_int_cstu (index_elt_type, step));
+}
+
 /* Something has messed with the elements of CONSTRUCTOR C after it was built;
calculate TREE_CONSTANT and TREE_SIDE_EFFECTS.  */
 
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2017-10-23 17:22:32.727227124 +0100
+++ gcc/tree-vect-loop.c2017-10-23 17:22:35.829905285 +0100
@@ -3997,11 +3997,10 @@ get_initial_def_for_reduction (gimple *s
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree scalar_type = TREE_TYPE (init_val);
   tree vectype = get_vectype_for_scalar_type (scalar_type);
-  int nunits;
+  poly_uint64 nunits;
   enum tree_code code = gimple_assign_rhs_code (stmt);
   tree def_for_init;
   tree init_def;
-  int i;
   bool nested_in_vect_loop = false;
   REAL_VALUE_TYPE real_init_val = dconst0;
   int int_init_val = 0;
@@ -4082,9 +4081,13 @@ get_initial_def_for_reduction (gimple *s
else
  {
/* Option2: the first element is INIT_VAL.  */
-   auto_vec<tree, 32> elts (nunits);
+
+   /* Enforced by vectorizable_reduction (which disallows double
+  reductions with variable-length vectors).  */
+   unsigned int count = nunits.to_constant ();
+   auto_vec<tree, 32> elts (count);
elts.quick_push (init_val);
-   for (i = 1; i < nunits; ++i)
+   for (unsigned int i = 1; i < count; ++i)
  elts.quick_push (def_for_init);
init_def = gimple_build_vector (&stmts, vectype, elts);
  }
@@ -4144,6 +4147,8 @@ get_initial_defs_for_reduction (slp_tree
 
   vector_type = STMT_VINFO_VECTYPE (stmt_vinfo);
   scalar_type = TREE_TYPE (vector_type);
+  /* 

[069/nnn] poly_int: vector_alignment_reachable_p

2017-10-23 Thread Richard Sandiford
This patch makes vector_alignment_reachable_p cope with variable-length
vectors.
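
The modulo test turns into a multiple_p query, which for polynomial
counts must hold for every runtime length.  A worked sketch, assuming
a two-coefficient poly_int build:

  poly_uint64 nelements (4, 4);	/* 4 + 4x elements */
  unsigned int mis_in_elements = 2;
  /* 2 + 4x is divisible by 2 for all x, so peeling can reach mutual
     alignment for a group size of 2, but not provably for 3.  */
  gcc_assert (multiple_p (nelements - mis_in_elements, 2));
  gcc_assert (!multiple_p (nelements - mis_in_elements, 3));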


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-data-refs.c (vector_alignment_reachable_p): Treat the
number of units as polynomial.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2017-10-23 17:22:26.571498977 +0100
+++ gcc/tree-vect-data-refs.c   2017-10-23 17:22:34.681024458 +0100
@@ -1153,16 +1153,17 @@ vector_alignment_reachable_p (struct dat
 the prolog loop ({VF - misalignment}), is a multiple of the
 number of the interleaved accesses.  */
   int elem_size, mis_in_elements;
-  int nelements = TYPE_VECTOR_SUBPARTS (vectype);
 
   /* FORNOW: handle only known alignment.  */
   if (!known_alignment_for_access_p (dr))
return false;
 
-  elem_size = GET_MODE_SIZE (TYPE_MODE (vectype)) / nelements;
+  poly_uint64 nelements = TYPE_VECTOR_SUBPARTS (vectype);
+  poly_uint64 vector_size = GET_MODE_SIZE (TYPE_MODE (vectype));
+  elem_size = vector_element_size (vector_size, nelements);
   mis_in_elements = DR_MISALIGNMENT (dr) / elem_size;
 
-  if ((nelements - mis_in_elements) % GROUP_SIZE (stmt_info))
+  if (!multiple_p (nelements - mis_in_elements, GROUP_SIZE (stmt_info)))
return false;
 }
 


[068/nnn] poly_int: current_vector_size and TARGET_AUTOVECTORIZE_VECTOR_SIZES

2017-10-23 Thread Richard Sandiford
This patch changes the type of current_vector_size to poly_uint64.
It also changes TARGET_AUTOVECTORIZE_VECTOR_SIZES so that it fills
in a vector of possible sizes (as poly_uint64s) instead of returning
a bitmask.  The documentation claimed that the hook didn't need to
include the default vector size (returned by preferred_simd_mode),
but that wasn't consistent with the omp-low.c usage.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* target.h (vector_sizes, auto_vector_sizes): New typedefs.
* target.def (autovectorize_vector_sizes): Return the vector sizes
by pointer, using vector_sizes rather than a bitmask.
* targhooks.h (default_autovectorize_vector_sizes): Update accordingly.
* targhooks.c (default_autovectorize_vector_sizes): Likewise.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes):
Likewise.
* config/arc/arc.c (arc_autovectorize_vector_sizes): Likewise.
* config/arm/arm.c (arm_autovectorize_vector_sizes): Likewise.
* config/i386/i386.c (ix86_autovectorize_vector_sizes): Likewise.
* config/mips/mips.c (mips_autovectorize_vector_sizes): Likewise.
* omp-general.c (omp_max_vf): Likewise.
* omp-low.c (omp_clause_aligned_alignment): Likewise.
* optabs-query.c (can_vec_mask_load_store_p): Likewise.
* tree-vect-loop.c (vect_analyze_loop): Likewise.
* tree-vect-slp.c (vect_slp_bb): Likewise.
* doc/tm.texi: Regenerate.
* tree-vectorizer.h (current_vector_size): Change from an unsigned int
to a poly_uint64.
* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Take
the vector size as a poly_uint64 rather than an unsigned int.
(current_vector_size): Change from an unsigned int to a poly_uint64.
(get_vectype_for_scalar_type): Update accordingly.
* tree.h (build_truth_vector_type): Take the size and number of
units as a poly_uint64 rather than an unsigned int.
(build_vector_type): Add a temporary overload that takes
the number of units as a poly_uint64 rather than an unsigned int.
* tree.c (make_vector_type): Likewise.
(build_truth_vector_type): Take the number of units as a poly_uint64
rather than an unsigned int.

Index: gcc/target.h
===
--- gcc/target.h2017-10-23 17:11:40.126719272 +0100
+++ gcc/target.h2017-10-23 17:22:32.724227435 +0100
@@ -199,6 +199,13 @@ typedef vec vec_perm_ind
automatically freed.  */
 typedef auto_vec<unsigned short, 32> auto_vec_perm_indices;
 
+/* The type to use for lists of vector sizes.  */
+typedef vec<poly_uint64> vector_sizes;
+
+/* Same, but can be used to construct local lists that are
+   automatically freed.  */
+typedef auto_vec<poly_uint64, 8> auto_vector_sizes;
+
 /* The target structure.  This holds all the backend hooks.  */
 #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
 #define DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT) TYPE (* NAME) PARAMS;
Index: gcc/target.def
===
--- gcc/target.def  2017-10-23 17:22:30.980383601 +0100
+++ gcc/target.def  2017-10-23 17:22:32.724227435 +0100
@@ -1880,12 +1880,16 @@ transformations even in absence of speci
after processing the preferred one derived from preferred_simd_mode.  */
 DEFHOOK
 (autovectorize_vector_sizes,
- "This hook should return a mask of sizes that should be iterated over\n\
-after trying to autovectorize using the vector size derived from the\n\
-mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.\n\
-The default is zero which means to not iterate over other vector sizes.",
- unsigned int,
- (void),
+ "If the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is not\n\
+the only one that is worth considering, this hook should add all suitable\n\
+vector sizes to @var{sizes}, in order of decreasing preference.  The first\n\
+one should be the size of @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.\n\
+\n\
+The hook does not need to do anything if the vector returned by\n\
+@code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is the only one relevant\n\
+for autovectorization.  The default implementation does nothing.",
+ void,
+ (vector_sizes *sizes),
  default_autovectorize_vector_sizes)
 
 /* Function to get a target mode for a vector mask.  */
Index: gcc/targhooks.h
===
--- gcc/targhooks.h 2017-10-23 17:22:30.980383601 +0100
+++ gcc/targhooks.h 2017-10-23 17:22:32.725227332 +0100
@@ -106,7 +106,7 @@ default_builtin_support_vector_misalignm
 const_tree,
 int, bool);
 extern machine_mode default_preferred_simd_mode (scalar_mode mode);
-extern 

[067/nnn] poly_int: get_mask_mode

2017-10-23 Thread Richard Sandiford
This patch makes TARGET_GET_MASK_MODE take polynomial nunits and
vector_size arguments.  The gcc_assert in default_get_mask_mode
is now handled by the exact_div call in vector_element_size.
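
To see why the exact_div call subsumes the assertion, here is a toy
two-coefficient stand-in for poly_uint64 (illustrative only; the struct,
names and constant-divisor signature are assumptions, not GCC's
implementation):

#include <cassert>
#include <cstdio>

/* Toy degree-1 poly: value = a + b*x for a runtime parameter x >= 0.  */
struct poly { unsigned a, b; };

/* Division by a constant is exact only if it divides both coefficients;
   the assertion plays the role of the old gcc_assert.  */
static poly exact_div (poly p, unsigned d)
{
  assert (p.a % d == 0 && p.b % d == 0);
  return { p.a / d, p.b / d };
}

int main ()
{
  poly vector_size = { 16, 16 };          /* bytes: 16 + 16x, e.g. SVE */
  poly nunits = exact_div (vector_size, 4);  /* 4-byte elements */
  printf ("%u + %ux\n", nunits.a, nunits.b); /* 4 + 4x */
}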


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* target.def (get_mask_mode): Take the number of units and length
as poly_uint64s rather than unsigned ints.
* targhooks.h (default_get_mask_mode): Update accordingly.
* targhooks.c (default_get_mask_mode): Likewise.
* config/i386/i386.c (ix86_get_mask_mode): Likewise.
* doc/tm.texi: Regenerate.

Index: gcc/target.def
===
--- gcc/target.def  2017-10-23 17:19:01.411170305 +0100
+++ gcc/target.def  2017-10-23 17:22:30.980383601 +0100
@@ -1901,7 +1901,7 @@ The default implementation returns the m
 is @var{length} bytes long and that contains @var{nunits} elements,\n\
 if such a mode exists.",
  opt_machine_mode,
- (unsigned nunits, unsigned length),
+ (poly_uint64 nunits, poly_uint64 length),
  default_get_mask_mode)
 
 /* Target builtin that implements vector gather operation.  */
Index: gcc/targhooks.h
===
--- gcc/targhooks.h 2017-10-23 17:19:01.411170305 +0100
+++ gcc/targhooks.h 2017-10-23 17:22:30.980383601 +0100
@@ -107,7 +107,7 @@ default_builtin_support_vector_misalignm
 int, bool);
 extern machine_mode default_preferred_simd_mode (scalar_mode mode);
 extern unsigned int default_autovectorize_vector_sizes (void);
-extern opt_machine_mode default_get_mask_mode (unsigned, unsigned);
+extern opt_machine_mode default_get_mask_mode (poly_uint64, poly_uint64);
 extern void *default_init_cost (struct loop *);
 extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
   struct _stmt_vec_info *, int,
Index: gcc/targhooks.c
===
--- gcc/targhooks.c 2017-10-23 17:19:01.411170305 +0100
+++ gcc/targhooks.c 2017-10-23 17:22:30.980383601 +0100
@@ -1254,17 +1254,17 @@ default_autovectorize_vector_sizes (void
   return 0;
 }
 
-/* By defaults a vector of integers is used as a mask.  */
+/* By default a vector of integers is used as a mask.  */
 
 opt_machine_mode
-default_get_mask_mode (unsigned nunits, unsigned vector_size)
+default_get_mask_mode (poly_uint64 nunits, poly_uint64 vector_size)
 {
-  unsigned elem_size = vector_size / nunits;
+  unsigned int elem_size = vector_element_size (vector_size, nunits);
   scalar_int_mode elem_mode
 = smallest_int_mode_for_size (elem_size * BITS_PER_UNIT);
   machine_mode vector_mode;
 
-  gcc_assert (elem_size * nunits == vector_size);
+  gcc_assert (must_eq (elem_size * nunits, vector_size));
 
   if (mode_for_vector (elem_mode, nunits).exists (&vector_mode)
   && VECTOR_MODE_P (vector_mode)
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  2017-10-23 17:19:01.404170211 +0100
+++ gcc/config/i386/i386.c  2017-10-23 17:22:30.978383200 +0100
@@ -48121,7 +48121,7 @@ ix86_autovectorize_vector_sizes (void)
 /* Implemenation of targetm.vectorize.get_mask_mode.  */
 
 static opt_machine_mode
-ix86_get_mask_mode (unsigned nunits, unsigned vector_size)
+ix86_get_mask_mode (poly_uint64 nunits, poly_uint64 vector_size)
 {
   unsigned elem_size = vector_size / nunits;
 
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi 2017-10-23 17:19:01.408170265 +0100
+++ gcc/doc/tm.texi 2017-10-23 17:22:30.979383401 +0100
@@ -5846,7 +5846,7 @@ mode returned by @code{TARGET_VECTORIZE_
 The default is zero which means to not iterate over other vector sizes.
 @end deftypefn
 
-@deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_GET_MASK_MODE (unsigned @var{nunits}, unsigned @var{length})
+@deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_GET_MASK_MODE (poly_uint64 @var{nunits}, poly_uint64 @var{length})
 A vector mask is a value that holds one boolean result for every element
 in a vector.  This hook returns the machine mode that should be used to
 represent such a mask when the vector in question is @var{length} bytes


[066/nnn] poly_int: omp_max_vf

2017-10-23 Thread Richard Sandiford
This patch makes omp_max_vf return a polynomial vectorization factor.
We then need to be able to stash a polynomial value in
OMP_CLAUSE_SAFELEN_EXPR too:

   /* If max_vf is non-zero, then we can use only a vectorization factor
  up to the max_vf we chose.  So stick it into the safelen clause.  */

For now the cfgloop safelen is still constant though.
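
A sketch of how a polynomial VF collapses to the integer safelen used
here, mirroring the MIN (constant_lower_bound (val), INT_MAX) in the
diff below (toy model, not GCC code):

#include <climits>
#include <cstdio>

int main ()
{
  /* Toy VF = 4 + 4x for x >= 0, e.g. a variable-length vector target.
     constant_lower_bound is the value at x = 0.  */
  unsigned lower = 4;
  int safelen_int = lower < INT_MAX ? (int) lower : INT_MAX;
  printf ("safelen = %d\n", safelen_int);   /* 4 */
}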


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* omp-general.h (omp_max_vf): Return a poly_uint64 instead of an int.
* omp-general.c (omp_max_vf): Likewise.
* omp-expand.c (omp_adjust_chunk_size): Update call to omp_max_vf.
(expand_omp_simd): Handle polynomial safelen.
* omp-low.c (omplow_simd_context): Add a default constructor.
(omplow_simd_context::max_vf): Change from int to poly_uint64.
(lower_rec_simd_input_clauses): Update accordingly.
(lower_rec_input_clauses): Likewise.

Index: gcc/omp-general.h
===
--- gcc/omp-general.h   2017-05-18 07:51:12.357753671 +0100
+++ gcc/omp-general.h   2017-10-23 17:22:29.881163047 +0100
@@ -78,7 +78,7 @@ extern tree omp_get_for_step_from_incr (
 extern void omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd,
  struct omp_for_data_loop *loops);
 extern gimple *omp_build_barrier (tree lhs);
-extern int omp_max_vf (void);
+extern poly_uint64 omp_max_vf (void);
 extern int omp_max_simt_vf (void);
 extern tree oacc_launch_pack (unsigned code, tree device, unsigned op);
 extern void oacc_replace_fn_attrib (tree fn, tree dims);
Index: gcc/omp-general.c
===
--- gcc/omp-general.c   2017-08-10 14:36:08.449457108 +0100
+++ gcc/omp-general.c   2017-10-23 17:22:29.881163047 +0100
@@ -423,7 +423,7 @@ omp_build_barrier (tree lhs)
 
 /* Return maximum possible vectorization factor for the target.  */
 
-int
+poly_uint64
 omp_max_vf (void)
 {
   if (!optimize
Index: gcc/omp-expand.c
===
--- gcc/omp-expand.c2017-10-02 09:10:57.525659817 +0100
+++ gcc/omp-expand.c2017-10-23 17:22:29.881163047 +0100
@@ -206,8 +206,8 @@ omp_adjust_chunk_size (tree chunk_size,
   if (!simd_schedule)
 return chunk_size;
 
-  int vf = omp_max_vf ();
-  if (vf == 1)
+  poly_uint64 vf = omp_max_vf ();
+  if (must_eq (vf, 1U))
 return chunk_size;
 
   tree type = TREE_TYPE (chunk_size);
@@ -4609,11 +4609,12 @@ expand_omp_simd (struct omp_region *regi
 
   if (safelen)
 {
+  poly_uint64 val;
   safelen = OMP_CLAUSE_SAFELEN_EXPR (safelen);
-  if (TREE_CODE (safelen) != INTEGER_CST)
+  if (!poly_int_tree_p (safelen, &val))
safelen_int = 0;
-  else if (tree_fits_uhwi_p (safelen) && tree_to_uhwi (safelen) < INT_MAX)
-   safelen_int = tree_to_uhwi (safelen);
+  else
+   safelen_int = MIN (constant_lower_bound (val), INT_MAX);
   if (safelen_int == 1)
safelen_int = 0;
 }
Index: gcc/omp-low.c
===
--- gcc/omp-low.c   2017-10-23 17:17:01.432034493 +0100
+++ gcc/omp-low.c   2017-10-23 17:22:29.882163248 +0100
@@ -3487,11 +3487,12 @@ omp_clause_aligned_alignment (tree claus
and lower_rec_input_clauses.  */
 
 struct omplow_simd_context {
+  omplow_simd_context () { memset (this, 0, sizeof (*this)); }
   tree idx;
   tree lane;
   vec simt_eargs;
   gimple_seq simt_dlist;
-  int max_vf;
+  poly_uint64_pod max_vf;
   bool is_simt;
 };
 
@@ -3502,28 +3503,30 @@ struct omplow_simd_context {
 lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
  omplow_simd_context *sctx, tree &ivar, tree &lvar)
 {
-  if (sctx->max_vf == 0)
+  if (known_zero (sctx->max_vf))
 {
   sctx->max_vf = sctx->is_simt ? omp_max_simt_vf () : omp_max_vf ();
-  if (sctx->max_vf > 1)
+  if (may_gt (sctx->max_vf, 1U))
{
  tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),
OMP_CLAUSE_SAFELEN);
- if (c
- && (TREE_CODE (OMP_CLAUSE_SAFELEN_EXPR (c)) != INTEGER_CST
- || tree_int_cst_sgn (OMP_CLAUSE_SAFELEN_EXPR (c)) != 1))
-   sctx->max_vf = 1;
- else if (c && compare_tree_int (OMP_CLAUSE_SAFELEN_EXPR (c),
- sctx->max_vf) == -1)
-   sctx->max_vf = tree_to_shwi (OMP_CLAUSE_SAFELEN_EXPR (c));
+ if (c)
+   {
+ poly_uint64 safe_len;
+ if (!poly_int_tree_p (OMP_CLAUSE_SAFELEN_EXPR (c), &safe_len)
+ || may_lt (safe_len, 1U))
+   sctx->max_vf = 1;
+ else
+   sctx->max_vf = lower_bound (sctx->max_vf, safe_len);
+   }
}
-  if 

[065/nnn] poly_int: vect_nunits_for_cost

2017-10-23 Thread Richard Sandiford
This patch adds a function for getting the number of elements in
a vector for cost purposes, which is always constant.  It makes
it possible for a later patch to change GET_MODE_NUNITS and
TYPE_VECTOR_SUBPARTS to a poly_int.
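
Roughly, estimated_poly_value substitutes a target-tuned guess for the
runtime parameter, so costing always sees a plain integer.  A toy model
(the names and the x_estimate parameter are assumptions for
illustration, not GCC's implementation):

#include <cstdio>

/* Toy: estimate a + b*x by substituting a guess for x.  */
static unsigned estimated_poly_value (unsigned a, unsigned b,
                                      unsigned x_estimate)
{
  return a + b * x_estimate;
}

int main ()
{
  /* TYPE_VECTOR_SUBPARTS = 2 + 2x; suppose the target expects x == 1
     (e.g. 256-bit vectors on a scalable ISA).  */
  printf ("nunits for cost = %u\n", estimated_poly_value (2, 2, 1)); /* 4 */
}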


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vectorizer.h (vect_nunits_for_cost): New function.
* tree-vect-loop.c (vect_model_reduction_cost): Use it.
* tree-vect-slp.c (vect_analyze_slp_cost_1): Likewise.
(vect_analyze_slp_cost): Likewise.
* tree-vect-stmts.c (vect_model_store_cost): Likewise.
(vect_model_load_cost): Likewise.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2017-10-23 17:22:26.575499779 +0100
+++ gcc/tree-vectorizer.h   2017-10-23 17:22:28.837953732 +0100
@@ -1154,6 +1154,16 @@ vect_vf_for_cost (loop_vec_info loop_vin
   return estimated_poly_value (LOOP_VINFO_VECT_FACTOR (loop_vinfo));
 }
 
+/* Estimate the number of elements in VEC_TYPE for costing purposes.
+   Pick a reasonable estimate if the exact number isn't known at
+   compile time.  */
+
+static inline unsigned int
+vect_nunits_for_cost (tree vec_type)
+{
+  return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vec_type));
+}
+
 /* Return the size of the value accessed by unvectorized data reference DR.
This is only valid once STMT_VINFO_VECTYPE has been calculated for the
associated gimple statement, since that guarantees that DR accesses
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2017-10-23 17:22:26.573499378 +0100
+++ gcc/tree-vect-loop.c2017-10-23 17:22:28.835953330 +0100
@@ -3844,13 +3844,15 @@ vect_model_reduction_cost (stmt_vec_info
}
   else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
{
- unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ unsigned estimated_nunits = vect_nunits_for_cost (vectype);
  /* Extraction of scalar elements.  */
- epilogue_cost += add_stmt_cost (target_cost_data, 2 * nunits,
+ epilogue_cost += add_stmt_cost (target_cost_data,
+ 2 * estimated_nunits,
  vec_to_scalar, stmt_info, 0,
  vect_epilogue);
  /* Scalar max reductions via COND_EXPR / MAX_EXPR.  */
- epilogue_cost += add_stmt_cost (target_cost_data, 2 * nunits - 3,
+ epilogue_cost += add_stmt_cost (target_cost_data,
+ 2 * estimated_nunits - 3,
  scalar_stmt, stmt_info, 0,
  vect_epilogue);
}
Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c 2017-10-23 17:22:27.793744215 +0100
+++ gcc/tree-vect-slp.c 2017-10-23 17:22:28.836953531 +0100
@@ -1718,8 +1718,8 @@ vect_analyze_slp_cost_1 (slp_instance in
 &n_perms);
  record_stmt_cost (body_cost_vec, n_perms, vec_perm,
stmt_info, 0, vect_body);
- unsigned nunits
-   = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
+ unsigned assumed_nunits
+   = vect_nunits_for_cost (STMT_VINFO_VECTYPE (stmt_info));
  /* And adjust the number of loads performed.  This handles
 redundancies as well as loads that are later dead.  */
  auto_sbitmap perm (GROUP_SIZE (stmt_info));
@@ -1730,7 +1730,7 @@ vect_analyze_slp_cost_1 (slp_instance in
  bool load_seen = false;
  for (i = 0; i < GROUP_SIZE (stmt_info); ++i)
{
- if (i % nunits == 0)
+ if (i % assumed_nunits == 0)
{
  if (load_seen)
ncopies_for_cost++;
@@ -1743,7 +1743,7 @@ vect_analyze_slp_cost_1 (slp_instance in
ncopies_for_cost++;
  gcc_assert (ncopies_for_cost
  <= (GROUP_SIZE (stmt_info) - GROUP_GAP (stmt_info)
- + nunits - 1) / nunits);
+ + assumed_nunits - 1) / assumed_nunits);
  poly_uint64 uf = SLP_INSTANCE_UNROLLING_FACTOR (instance);
  ncopies_for_cost *= estimated_poly_value (uf);
}
@@ -1856,9 +1856,9 @@ vect_analyze_slp_cost (slp_instance inst
 assumed_vf = vect_vf_for_cost (STMT_VINFO_LOOP_VINFO (stmt_info));
   else
 assumed_vf = 1;
-  unsigned nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
   /* For reductions look at a reduction operand in case the reduction
  operation 

[064/nnn] poly_int: SLP max_units

2017-10-23 Thread Richard Sandiford
This patch makes tree-vect-slp.c track the maximum number of vector
units as a poly_uint64 rather than an unsigned int.
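
The new calculate_unrolling_factor below is
exact_div (common_multiple (nunits, group_size), group_size); for
constant sizes that is just lcm / group_size.  A quick worked check
(not GCC code):

#include <cstdio>
#include <numeric>   /* std::lcm, C++17 */

int main ()
{
  unsigned nunits = 4, group_size = 6;
  /* lcm (4, 6) = 12, so two copies of the 6-statement group are needed
     to fill a whole number (three) of 4-element vectors.  */
  unsigned uf = std::lcm (nunits, group_size) / group_size;
  printf ("unrolling factor = %u\n", uf);   /* 2 */
}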


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-slp.c (vect_record_max_nunits, vect_build_slp_tree_1)
(vect_build_slp_tree_2, vect_build_slp_tree): Change max_nunits
from an unsigned int * to a poly_uint64_pod *.
(calculate_unrolling_factor): New function.
(vect_analyze_slp_instance): Use it.  Track polynomial max_nunits.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c 2017-10-23 17:22:26.573499378 +0100
+++ gcc/tree-vect-slp.c 2017-10-23 17:22:27.793744215 +0100
@@ -489,7 +489,7 @@ vect_get_and_check_slp_defs (vec_info *v
 
 static bool
 vect_record_max_nunits (vec_info *vinfo, gimple *stmt, unsigned int group_size,
-   tree vectype, unsigned int *max_nunits)
+   tree vectype, poly_uint64 *max_nunits)
 {
   if (!vectype)
 {
@@ -506,8 +506,11 @@ vect_record_max_nunits (vec_info *vinfo,
 
   /* If populating the vector type requires unrolling then fail
  before adjusting *max_nunits for basic-block vectorization.  */
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  unsigned HOST_WIDE_INT const_nunits;
   if (is_a  (vinfo)
-  && TYPE_VECTOR_SUBPARTS (vectype) > group_size)
+  && (!nunits.is_constant (_nunits)
+ || const_nunits > group_size))
 {
   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
   "Build SLP failed: unrolling required "
@@ -517,9 +520,7 @@ vect_record_max_nunits (vec_info *vinfo,
 }
 
   /* In case of multiple types we need to detect the smallest type.  */
-  if (*max_nunits < TYPE_VECTOR_SUBPARTS (vectype))
-*max_nunits = TYPE_VECTOR_SUBPARTS (vectype);
-
+  vect_update_max_nunits (max_nunits, vectype);
   return true;
 }
 
@@ -540,7 +541,7 @@ vect_record_max_nunits (vec_info *vinfo,
 static bool
 vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
   vec stmts, unsigned int group_size,
-  unsigned nops, unsigned int *max_nunits,
+  unsigned nops, poly_uint64 *max_nunits,
   bool *matches, bool *two_operators)
 {
   unsigned int i;
@@ -966,16 +967,15 @@ bst_traits::equal (value_type existing,
 static slp_tree
 vect_build_slp_tree_2 (vec_info *vinfo,
   vec stmts, unsigned int group_size,
-  unsigned int *max_nunits,
+  poly_uint64 *max_nunits,
   vec *loads,
   bool *matches, unsigned *npermutes, unsigned *tree_size,
   unsigned max_tree_size);
 
 static slp_tree
 vect_build_slp_tree (vec_info *vinfo,
- vec stmts, unsigned int group_size,
- unsigned int *max_nunits,
- vec *loads,
+vec stmts, unsigned int group_size,
+poly_uint64 *max_nunits, vec *loads,
 bool *matches, unsigned *npermutes, unsigned *tree_size,
 unsigned max_tree_size)
 {
@@ -1007,12 +1007,13 @@ vect_build_slp_tree (vec_info *vinfo,
 static slp_tree
 vect_build_slp_tree_2 (vec_info *vinfo,
   vec stmts, unsigned int group_size,
-  unsigned int *max_nunits,
+  poly_uint64 *max_nunits,
   vec *loads,
   bool *matches, unsigned *npermutes, unsigned *tree_size,
   unsigned max_tree_size)
 {
-  unsigned nops, i, this_tree_size = 0, this_max_nunits = *max_nunits;
+  unsigned nops, i, this_tree_size = 0;
+  poly_uint64 this_max_nunits = *max_nunits;
   gimple *stmt;
   slp_tree node;
 
@@ -1951,6 +1952,15 @@ vect_split_slp_store_group (gimple *firs
   return group2;
 }
 
+/* Calculate the unrolling factor for an SLP instance with GROUP_SIZE
+   statements and a vector of NUNITS elements.  */
+
+static poly_uint64
+calculate_unrolling_factor (poly_uint64 nunits, unsigned int group_size)
+{
+  return exact_div (common_multiple (nunits, group_size), group_size);
+}
+
 /* Analyze an SLP instance starting from a group of grouped stores.  Call
vect_build_slp_tree to build a tree of packed stmts if possible.
Return FALSE if it's impossible to SLP any stmt in the loop.  */
@@ -1962,11 +1972,9 @@ vect_analyze_slp_instance (vec_info *vin
   slp_instance new_instance;
   slp_tree node;
   unsigned int group_size = GROUP_SIZE (vinfo_for_stmt (stmt));
-  unsigned int nunits;
   tree vectype, scalar_type = NULL_TREE;
   gimple *next;
   unsigned int i;
-  unsigned int max_nunits = 0;
   vec loads;
   struct data_reference *dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt));
   vec scalar_stmts;
@@ 

[062/nnn] poly_int: prune_runtime_alias_test_list

2017-10-23 Thread Richard Sandiford
This patch makes prune_runtime_alias_test_list take the iteration
factor as a poly_int and tracks polynomial offsets internally
as well.
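
The key new case is the ordered_p test in the diff: two polynomial
offsets need not have a fixed order for every runtime value, and an
unordered pair can no longer be merged.  A toy version of the
predicates, assuming the usual degree-1 model with x >= 0
(illustrative, not GCC's poly_int):

#include <cstdio>

struct poly { long a, b; };   /* value = a + b*x, x >= 0 */

/* p <= q for every x iff both coefficients say so.  */
static bool must_le (poly p, poly q) { return p.a <= q.a && p.b <= q.b; }

/* The comparison is "ordered" when one value dominates the other.  */
static bool ordered_p (poly p, poly q)
{
  return must_le (p, q) || must_le (q, p);
}

int main ()
{
  poly off1 = { 0, 4 };   /* 4x */
  poly off2 = { 8, 0 };   /* 8  */
  /* 4x < 8 when x <= 1 but 4x > 8 when x >= 3: no fixed order, so the
     aliasing pair is skipped rather than merged.  */
  printf ("ordered: %d\n", ordered_p (off1, off2));   /* 0 */
}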


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-data-ref.h (prune_runtime_alias_test_list): Take the
factor as a poly_uint64 rather than an unsigned HOST_WIDE_INT.
* tree-data-ref.c (prune_runtime_alias_test_list): Likewise.
Track polynomial offsets.

Index: gcc/tree-data-ref.h
===
--- gcc/tree-data-ref.h 2017-10-13 10:23:39.775145588 +0100
+++ gcc/tree-data-ref.h 2017-10-23 17:22:25.492282436 +0100
@@ -472,7 +472,7 @@ extern bool dr_equal_offsets_p (struct d
 extern bool runtime_alias_check_p (ddr_p, struct loop *, bool);
 extern int data_ref_compare_tree (tree, tree);
 extern void prune_runtime_alias_test_list (vec *,
-  unsigned HOST_WIDE_INT);
+  poly_uint64);
 extern void create_runtime_alias_checks (struct loop *,
 vec *, tree*);
 /* Return true when the base objects of data references A and B are
Index: gcc/tree-data-ref.c
===
--- gcc/tree-data-ref.c 2017-10-23 17:22:18.231825655 +0100
+++ gcc/tree-data-ref.c 2017-10-23 17:22:25.492282436 +0100
@@ -1417,7 +1417,7 @@ comp_dr_with_seg_len_pair (const void *p
 
 void
 prune_runtime_alias_test_list (vec *alias_pairs,
-  unsigned HOST_WIDE_INT factor)
+  poly_uint64 factor)
 {
   /* Sort the collected data ref pairs so that we can scan them once to
  combine all possible aliasing checks.  */
@@ -1462,51 +1462,63 @@ prune_runtime_alias_test_list (vec
 if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr),
DR_BASE_ADDRESS (dr_a2->dr), 0)
  || !operand_equal_p (DR_OFFSET (dr_a1->dr),
   DR_OFFSET (dr_a2->dr), 0)
- || !tree_fits_shwi_p (DR_INIT (dr_a1->dr))
- || !tree_fits_shwi_p (DR_INIT (dr_a2->dr)))
+ || !poly_int_tree_p (DR_INIT (dr_a1->dr), &init_a1)
+ || !poly_int_tree_p (DR_INIT (dr_a2->dr), &init_a2))
continue;
 
+ /* Don't combine if we can't tell which one comes first.  */
+ if (!ordered_p (init_a1, init_a2))
+   continue;
+
+ /* Make sure dr_a1 starts left of dr_a2.  */
+ if (may_gt (init_a1, init_a2))
+   {
+ std::swap (*dr_a1, *dr_a2);
+ std::swap (init_a1, init_a2);
+   }
+
  /* Only merge const step data references.  */
- if (TREE_CODE (DR_STEP (dr_a1->dr)) != INTEGER_CST
- || TREE_CODE (DR_STEP (dr_a2->dr)) != INTEGER_CST)
+ poly_int64 step_a1, step_a2;
+ if (!poly_int_tree_p (DR_STEP (dr_a1->dr), &step_a1)
+ || !poly_int_tree_p (DR_STEP (dr_a2->dr), &step_a2))
continue;
 
- /* DR_A1 and DR_A2 must goes in the same direction.  */
- if (tree_int_cst_compare (DR_STEP (dr_a1->dr), size_zero_node)
- != tree_int_cst_compare (DR_STEP (dr_a2->dr), size_zero_node))
+ bool neg_step = may_lt (step_a1, 0) || may_lt (step_a2, 0);
+
+ /* DR_A1 and DR_A2 must go in the same direction.  */
+ if (neg_step && (may_gt (step_a1, 0) || may_gt (step_a2, 0)))
continue;
 
- bool neg_step
-   = (tree_int_cst_compare (DR_STEP (dr_a1->dr), size_zero_node) < 0);
+ poly_uint64 seg_len_a1 = 0, seg_len_a2 = 0;
+ bool const_seg_len_a1 = poly_int_tree_p (dr_a1->seg_len,
+  &seg_len_a1);
+ bool const_seg_len_a2 = poly_int_tree_p (dr_a2->seg_len,
+  &seg_len_a2);
 
  /* We need to compute merged segment length at compilation time for
 dr_a1 and dr_a2, which is impossible if either one has non-const
 segment length.  */
- if ((!tree_fits_uhwi_p (dr_a1->seg_len)
-  || !tree_fits_uhwi_p (dr_a2->seg_len))
- && tree_int_cst_compare (DR_STEP (dr_a1->dr),
-  DR_STEP (dr_a2->dr)) != 0)
+ if ((!const_seg_len_a1 || !const_seg_len_a2)
+ && may_ne (step_a1, step_a2))
continue;
 
- /* Make sure dr_a1 starts left of dr_a2.  */
- if (tree_int_cst_lt (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr)))
-   std::swap (*dr_a1, *dr_a2);
-
  bool do_remove = false;
- wide_int diff = (wi::to_wide (DR_INIT (dr_a2->dr))
-  - wi::to_wide (DR_INIT (dr_a1->dr)));
- wide_int min_seg_len_b;
+ poly_uint64 diff = init_a2 - init_a1;
+ poly_uint64 min_seg_len_b;
  tree new_seg_len;
 
- if 

[063/nnn] poly_int: vectoriser vf and uf

2017-10-23 Thread Richard Sandiford
This patch changes the type of the vectorisation factor and SLP
unrolling factor to poly_uint64.  This in turn required some knock-on
changes in signedness elsewhere.

Cost decisions are generally based on estimated_poly_value,
which for VF is wrapped up as vect_vf_for_cost.

The patch doesn't on its own enable variable-length vectorisation.
It just makes the minimum changes necessary for the code to build
with the new VF and UF types.  Later patches also make the
vectoriser cope with variable TYPE_VECTOR_SUBPARTS and variable
GET_MODE_NUNITS, at which point the code really does handle
variable-length vectors.

The patch also changes MAX_VECTORIZATION_FACTOR to INT_MAX,
to avoid hard-coding a particular architectural limit.

The patch includes a new test because a development version of the patch
accidentally used file print routines instead of dump_*, which would
fail with -fopt-info.
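
The vect_get_num_vectors helper mentioned in the ChangeLog is
essentially an exact poly-by-poly division whose result must be a
compile-time constant.  A toy sketch under the degree-1 model (assumed
layout, not GCC code):

#include <cassert>
#include <cstdio>

struct poly { unsigned a, b; };   /* value = a + b*x, x >= 0 */

/* Toy poly-by-poly exact division: valid only when the ratio is the
   same constant for both coefficients.  */
static unsigned exact_div (poly p, poly q)
{
  assert (q.a != 0 && p.a % q.a == 0);
  unsigned ratio = p.a / q.a;
  assert (p.b == ratio * q.b);   /* both coefficients must agree */
  return ratio;
}

int main ()
{
  poly vf = { 8, 8 };       /* vectorization factor */
  poly nunits = { 4, 4 };   /* elements per vector */
  /* Each scalar statement needs a constant number of vector copies.  */
  printf ("ncopies = %u\n", exact_div (vf, nunits));   /* 2 */
}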


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vectorizer.h (_slp_instance::unrolling_factor): Change
from an unsigned int to a poly_uint64.
(_loop_vec_info::slp_unrolling_factor): Likewise.
(_loop_vec_info::vectorization_factor): Change from an int
to a poly_uint64.
(MAX_VECTORIZATION_FACTOR): Bump from 64 to INT_MAX.
(vect_get_num_vectors): New function.
(vect_update_max_nunits, vect_vf_for_cost): Likewise.
(vect_get_num_copies): Use vect_get_num_vectors.
(vect_analyze_data_ref_dependences): Change max_vf from an int *
to an unsigned int *.
(vect_analyze_data_refs): Change min_vf from an int * to a
poly_uint64 *.
(vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
than an unsigned HOST_WIDE_INT.
* tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr)
(vect_analyze_data_ref_dependence): Change max_vf from an int *
to an unsigned int *.
(vect_analyze_data_ref_dependences): Likewise.
(vect_compute_data_ref_alignment): Handle polynomial vf.
(vect_enhance_data_refs_alignment): Likewise.
(vect_prune_runtime_alias_test_list): Likewise.
(vect_shift_permute_load_chain): Likewise.
(vect_supportable_dr_alignment): Likewise.
(dependence_distance_ge_vf): Take the vectorization factor as a
poly_uint64 rather than an unsigned HOST_WIDE_INT.
(vect_analyze_data_refs): Change min_vf from an int * to a
poly_uint64 *.
* tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Take
vfm1 as a poly_uint64 rather than an int.  Make the same change
for the returned bound_scalar.
(vect_gen_vector_loop_niters): Handle polynomial vf.
(vect_do_peeling): Likewise.  Update call to
vect_gen_scalar_loop_niters and handle polynomial bound_scalars.
(vect_gen_vector_loop_niters_mult_vf): Assert that the vf must
be constant.
* tree-vect-loop.c (vect_determine_vectorization_factor)
(vect_update_vf_for_slp, vect_analyze_loop_2): Handle polynomial vf.
(vect_get_known_peeling_cost): Likewise.
(vect_estimate_min_profitable_iters, vectorizable_reduction): Likewise.
(vect_worthwhile_without_simd_p, vectorizable_induction): Likewise.
(vect_transform_loop): Likewise.  Use the lowest possible VF when
updating the upper bounds of the loop.
(vect_min_worthwhile_factor): Make static.  Return an unsigned int
rather than an int.
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Cope with
polynomial unroll factors.
(vect_analyze_slp_cost_1, vect_analyze_slp_instance): Likewise.
(vect_make_slp_decision): Likewise.
(vect_supported_load_permutation_p): Likewise, and polynomial
vf too.
(vect_analyze_slp_cost): Handle polynomial vf.
(vect_slp_analyze_node_operations): Likewise.
(vect_slp_analyze_bb_1): Likewise.
(vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
than an unsigned HOST_WIDE_INT.
* tree-vect-stmts.c (vectorizable_simd_clone_call, vectorizable_store)
(vectorizable_load): Handle polynomial vf.
* tree-vectorizer.c (simduid_to_vf::vf): Change from an int to
a poly_uint64.
(adjust_simduid_builtins, shrink_simd_arrays): Update accordingly.

gcc/testsuite/
* gcc.dg/vect-opt-info-1.c: New test.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2017-10-23 17:22:23.377858186 +0100
+++ gcc/tree-vectorizer.h   2017-10-23 17:22:26.575499779 +0100
@@ -129,7 +129,7 @@ typedef struct _slp_instance {
   unsigned int group_size;
 
   /* The unrolling factor required to vectorized this SLP instance.  */
-  unsigned int unrolling_factor;
+  poly_uint64 unrolling_factor;

[061/nnn] poly_int: compute_data_ref_alignment

2017-10-23 Thread Richard Sandiford
This patch makes vect_compute_data_ref_alignment treat DR_INIT as a
poly_int and handles cases in which the calculated misalignment might
not be constant.
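
A sketch of the known_misalignment idea used in the new code: the
misalignment of a + b*x with respect to an alignment A is the same for
every x only if A divides b (toy model, not the GCC implementation):

#include <cstdio>

/* Toy: return true and set *out if (a + b*x) % align is constant.  */
static bool known_misalignment (long a, long b, unsigned align,
                                unsigned *out)
{
  if (b % (long) align != 0)
    return false;               /* misalignment varies with x */
  *out = (unsigned) (((a % (long) align) + (long) align) % (long) align);
  return true;
}

int main ()
{
  unsigned m;
  if (known_misalignment (8, 16, 16, &m))
    printf ("misalign = %u\n", m);             /* 8 */
  if (!known_misalignment (8, 4, 16, &m))
    printf ("unknown: give up on this DR\n");  /* the new failure path */
}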


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-data-refs.c (vect_compute_data_ref_alignment):
Treat drb->init as a poly_int.  Fail if its misalignment wrt
vector_alignment isn't known.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2017-10-23 17:22:18.234826257 +0100
+++ gcc/tree-vect-data-refs.c   2017-10-23 17:22:24.456074525 +0100
@@ -944,8 +944,8 @@ vect_compute_data_ref_alignment (struct
   DR_VECT_AUX (dr)->base_misaligned = true;
   base_misalignment = 0;
 }
-  unsigned int misalignment = (base_misalignment
-  + TREE_INT_CST_LOW (drb->init));
+  poly_int64 misalignment
+= base_misalignment + wi::to_poly_offset (drb->init).force_shwi ();
 
   /* If this is a backward running DR then first access in the larger
  vectype actually is N-1 elements before the address in the DR.
@@ -955,7 +955,21 @@ vect_compute_data_ref_alignment (struct
 misalignment += ((TYPE_VECTOR_SUBPARTS (vectype) - 1)
 * TREE_INT_CST_LOW (drb->step));
 
-  SET_DR_MISALIGNMENT (dr, misalignment & (vector_alignment - 1));
+  unsigned int const_misalignment;
+  if (!known_misalignment (misalignment, vector_alignment,
+  &const_misalignment))
+{
+  if (dump_enabled_p ())
+   {
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "Non-constant misalignment for access: ");
+ dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, ref);
+ dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+   }
+  return true;
+}
+
+  SET_DR_MISALIGNMENT (dr, const_misalignment);
 
   if (dump_enabled_p ())
 {


[059/nnn] poly_int: tree-ssa-loop-ivopts.c:iv_use

2017-10-23 Thread Richard Sandiford
This patch makes ivopts handle polynomial address offsets
when recording potential IV uses.
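
Sorting the uses requires a total order even where may_lt/may_gt would
give only a partial one; compare_sizes_for_sort provides a
deterministic tie-break by comparing coefficients.  A toy equivalent
(illustrative only, not GCC's implementation):

#include <cstdio>

struct poly { long a, b; };   /* value = a + b*x, x >= 0 */

/* Total order for sorting only: highest-degree coefficient first.
   Deterministic even for offsets that have no fixed runtime order.  */
static int compare_sizes_for_sort (poly p, poly q)
{
  if (p.b != q.b)
    return p.b < q.b ? -1 : 1;
  if (p.a != q.a)
    return p.a < q.a ? -1 : 1;
  return 0;
}

int main ()
{
  poly x = { 8, 0 }, y = { 0, 4 };
  printf ("%d\n", compare_sizes_for_sort (x, y));  /* -1: 8 sorts before 4x */
}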


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-ssa-loop-ivopts.c (iv_use::addr_offset): Change from
an unsigned HOST_WIDE_INT to a poly_uint64_pod.
(group_compare_offset): Update accordingly.
(split_small_address_groups_p): Likewise.
(record_use): Take addr_offset as a poly_uint64 rather than
an unsigned HOST_WIDE_INT.
(strip_offset): Return the offset as a poly_uint64 rather than
an unsigned HOST_WIDE_INT.
(record_group_use, split_address_groups): Track polynomial offsets.
(add_iv_candidate_for_use): Likewise.
(addr_offset_valid_p): Take the offset as a poly_int64 rather
than a HOST_WIDE_INT.
(strip_offset_1): Return the offset as a poly_int64 rather than
a HOST_WIDE_INT.

Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c  2017-10-23 17:17:03.208794553 +0100
+++ gcc/tree-ssa-loop-ivopts.c  2017-10-23 17:22:22.298641645 +0100
@@ -367,7 +367,7 @@ struct iv_use
   tree *op_p;  /* The place where it occurs.  */
 
   tree addr_base;  /* Base address with const offset stripped.  */
-  unsigned HOST_WIDE_INT addr_offset;
+  poly_uint64_pod addr_offset;
/* Const offset stripped from base address.  */
 };
 
@@ -1508,7 +1508,7 @@ find_induction_variables (struct ivopts_
 static struct iv_use *
 record_use (struct iv_group *group, tree *use_p, struct iv *iv,
gimple *stmt, enum use_type type, tree addr_base,
-   unsigned HOST_WIDE_INT addr_offset)
+   poly_uint64 addr_offset)
 {
   struct iv_use *use = XCNEW (struct iv_use);
 
@@ -1553,7 +1553,7 @@ record_invariant (struct ivopts_data *da
 }
 
 static tree
-strip_offset (tree expr, unsigned HOST_WIDE_INT *offset);
+strip_offset (tree expr, poly_uint64 *offset);
 
 /* Record a group of TYPE.  */
 
@@ -1580,7 +1580,7 @@ record_group_use (struct ivopts_data *da
 {
   tree addr_base = NULL;
   struct iv_group *group = NULL;
-  unsigned HOST_WIDE_INT addr_offset = 0;
+  poly_uint64 addr_offset = 0;
 
   /* Record non address type use in a new group.  */
   if (type == USE_ADDRESS && iv->base_object)
@@ -2514,7 +2514,7 @@ find_interesting_uses_outside (struct iv
 static GTY (()) vec *addr_list;
 
 static bool
-addr_offset_valid_p (struct iv_use *use, HOST_WIDE_INT offset)
+addr_offset_valid_p (struct iv_use *use, poly_int64 offset)
 {
   rtx reg, addr;
   unsigned list_index;
@@ -2548,10 +2548,7 @@ group_compare_offset (const void *a, con
   const struct iv_use *const *u1 = (const struct iv_use *const *) a;
   const struct iv_use *const *u2 = (const struct iv_use *const *) b;
 
-  if ((*u1)->addr_offset != (*u2)->addr_offset)
-return (*u1)->addr_offset < (*u2)->addr_offset ? -1 : 1;
-  else
-return 0;
+  return compare_sizes_for_sort ((*u1)->addr_offset, (*u2)->addr_offset);
 }
 
 /* Check if small groups should be split.  Return true if no group
@@ -2582,7 +2579,8 @@ split_small_address_groups_p (struct ivo
   gcc_assert (group->type == USE_ADDRESS);
   if (group->vuses.length () == 2)
{
- if (group->vuses[0]->addr_offset > group->vuses[1]->addr_offset)
+ if (compare_sizes_for_sort (group->vuses[0]->addr_offset,
+ group->vuses[1]->addr_offset) > 0)
std::swap (group->vuses[0], group->vuses[1]);
}
   else
@@ -2594,7 +2592,7 @@ split_small_address_groups_p (struct ivo
   distinct = 1;
   for (pre = group->vuses[0], j = 1; j < group->vuses.length (); j++)
{
- if (group->vuses[j]->addr_offset != pre->addr_offset)
+ if (may_ne (group->vuses[j]->addr_offset, pre->addr_offset))
{
  pre = group->vuses[j];
  distinct++;
@@ -2635,13 +2633,13 @@ split_address_groups (struct ivopts_data
   for (j = 1; j < group->vuses.length ();)
{
  struct iv_use *next = group->vuses[j];
- HOST_WIDE_INT offset = next->addr_offset - use->addr_offset;
+ poly_int64 offset = next->addr_offset - use->addr_offset;
 
  /* Split group if aksed to, or the offset against the first
 use can't fit in offset part of addressing mode.  IV uses
 having the same offset are still kept in one group.  */
- if (offset != 0 &&
- (split_p || !addr_offset_valid_p (use, offset)))
+ if (maybe_nonzero (offset)
+ && (split_p || !addr_offset_valid_p (use, offset)))
{
  if (!new_group)
new_group = record_group (data, group->type);
@@ -2702,12 +2700,13 @@ find_interesting_uses (struct ivopts_dat
 
 static tree
 strip_offset_1 (tree 

[060/nnn] poly_int: loop versioning threshold

2017-10-23 Thread Richard Sandiford
This patch splits the loop versioning threshold out from the
cost model threshold so that the former can become a poly_uint64.
We still use a single test to enforce both limits where possible.
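
Conceptually the runtime guard becomes
niters >= max (cost threshold, versioning threshold), and the two
comparisons only fold into one when the versioning threshold is a
compile-time constant, which is what vect_transform_loop attempts.
A sketch (not the GCC code):

#include <algorithm>
#include <cstdio>

/* Toy guard: take the vector loop only if both limits are met.  With a
   constant version_th the max folds away and one comparison remains.  */
static bool use_vector_loop (long niters, long cost_th, long version_th)
{
  return niters >= std::max (cost_th, version_th);
}

int main ()
{
  printf ("%d\n", use_vector_loop (100, 8, 16));   /* 1 */
  printf ("%d\n", use_vector_loop (12, 8, 16));    /* 0: scalar fallback */
}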


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vectorizer.h (_loop_vec_info): Add a versioning_threshold
field.
(LOOP_VINFO_VERSIONING_THRESHOLD): New macro
(vect_loop_versioning): Take the loop versioning threshold as a
separate parameter.
* tree-vect-loop-manip.c (vect_loop_versioning): Likewise.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
versioning_threshold.
(vect_analyze_loop_2): Compute the loop versioning threshold
whenever loop versioning is needed, and store it in the new
field rather than combining it with the cost model threshold.
(vect_transform_loop): Update call to vect_loop_versioning.
Try to combine the loop versioning and cost thresholds here.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2017-10-23 17:11:39.817127625 +0100
+++ gcc/tree-vectorizer.h   2017-10-23 17:22:23.377858186 +0100
@@ -238,6 +238,12 @@ typedef struct _loop_vec_info : public v
  PARAM_MIN_VECT_LOOP_BOUND.  */
   unsigned int th;
 
+  /* When applying loop versioning, the vector form should only be used
+ if the number of scalar iterations is >= this value, on top of all
+ the other requirements.  Ignored when loop versioning is not being
+ used.  */
+  poly_uint64 versioning_threshold;
+
   /* Unrolling factor  */
   int vectorization_factor;
 
@@ -357,6 +363,7 @@ #define LOOP_VINFO_NITERS(L)
 #define LOOP_VINFO_NITERS_UNCHANGED(L) (L)->num_iters_unchanged
 #define LOOP_VINFO_NITERS_ASSUMPTIONS(L)   (L)->num_iters_assumptions
 #define LOOP_VINFO_COST_MODEL_THRESHOLD(L) (L)->th
+#define LOOP_VINFO_VERSIONING_THRESHOLD(L) (L)->versioning_threshold
 #define LOOP_VINFO_VECTORIZABLE_P(L)   (L)->vectorizable
 #define LOOP_VINFO_VECT_FACTOR(L)  (L)->vectorization_factor
 #define LOOP_VINFO_MAX_VECT_FACTOR(L)  (L)->max_vectorization_factor
@@ -1143,7 +1150,8 @@ extern void slpeel_make_loop_iterate_nti
 extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge);
 struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *,
 struct loop *, edge);
-extern void vect_loop_versioning (loop_vec_info, unsigned int, bool);
+extern void vect_loop_versioning (loop_vec_info, unsigned int, bool,
+ poly_uint64);
 extern struct loop *vect_do_peeling (loop_vec_info, tree, tree,
 tree *, tree *, tree *, int, bool, bool);
 extern source_location find_loop_location (struct loop *);
Index: gcc/tree-vect-loop-manip.c
===
--- gcc/tree-vect-loop-manip.c  2017-10-23 17:11:39.816125711 +0100
+++ gcc/tree-vect-loop-manip.c  2017-10-23 17:22:23.376857985 +0100
@@ -2295,7 +2295,8 @@ vect_create_cond_for_alias_checks (loop_
 
 void
 vect_loop_versioning (loop_vec_info loop_vinfo,
- unsigned int th, bool check_profitability)
+ unsigned int th, bool check_profitability,
+ poly_uint64 versioning_threshold)
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo), *nloop;
   struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
@@ -2320,6 +2321,17 @@ vect_loop_versioning (loop_vec_info loop
 cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
 build_int_cst (TREE_TYPE (scalar_loop_iters),
th - 1));
+  if (maybe_nonzero (versioning_threshold))
+{
+  tree expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
+  build_int_cst (TREE_TYPE (scalar_loop_iters),
+ versioning_threshold - 1));
+  if (cond_expr)
+   cond_expr = fold_build2 (BIT_AND_EXPR, boolean_type_node,
+expr, cond_expr);
+  else
+   cond_expr = expr;
+}
 
   if (version_niter)
 vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr);
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2017-10-23 17:11:39.816125711 +0100
+++ gcc/tree-vect-loop.c2017-10-23 17:22:23.377858186 +0100
@@ -1110,6 +1110,7 @@ _loop_vec_info::_loop_vec_info (struct l
 num_iters_unchanged (NULL_TREE),
 num_iters_assumptions (NULL_TREE),
 th (0),
+versioning_threshold (0),
 vectorization_factor (0),
 max_vectorization_factor (0),
 unaligned_dr 

[058/nnn] poly_int: get_binfo_at_offset

2017-10-23 Thread Richard Sandiford
This patch changes the offset parameter to get_binfo_at_offset
from HOST_WIDE_INT to poly_int64.  This function probably doesn't
need to handle polynomial offsets in practice, but it's easy
to do and avoids forcing the caller to check first.
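
The body below replaces the open-coded
pos <= offset && pos + size > offset with known_in_range_p.  Under the
toy degree-1 model with x >= 0, the predicate looks like this
(illustrative, not GCC's implementation):

#include <cstdio>

struct poly { long a, b; };   /* value = a + b*x, x >= 0 */

static bool must_le (poly p, poly q) { return p.a <= q.a && p.b <= q.b; }
static bool must_lt (poly p, poly q) { return p.a < q.a && p.b <= q.b; }

/* pos <= off < pos + size must hold for every x.  */
static bool known_in_range_p (poly off, poly pos, poly size)
{
  poly end = { pos.a + size.a, pos.b + size.b };
  return must_le (pos, off) && must_lt (off, end);
}

int main ()
{
  poly off = { 8, 0 }, pos = { 0, 0 }, size = { 16, 0 };
  printf ("%d\n", known_in_range_p (off, pos, size));   /* 1 */
}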


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree.h (get_binfo_at_offset): Take the offset as a poly_int64
rather than a HOST_WIDE_INT.
* tree.c (get_binfo_at_offset): Likewise.

Index: gcc/tree.h
===
--- gcc/tree.h  2017-10-23 17:20:50.884679814 +0100
+++ gcc/tree.h  2017-10-23 17:22:21.308442966 +0100
@@ -4836,7 +4836,7 @@ extern void tree_set_block (tree, tree);
 extern location_t *block_nonartificial_location (tree);
 extern location_t tree_nonartificial_location (tree);
 extern tree block_ultimate_origin (const_tree);
-extern tree get_binfo_at_offset (tree, HOST_WIDE_INT, tree);
+extern tree get_binfo_at_offset (tree, poly_int64, tree);
 extern bool virtual_method_call_p (const_tree);
 extern tree obj_type_ref_class (const_tree ref);
 extern bool types_same_for_odr (const_tree type1, const_tree type2,
Index: gcc/tree.c
===
--- gcc/tree.c  2017-10-23 17:22:18.236826658 +0100
+++ gcc/tree.c  2017-10-23 17:22:21.307442765 +0100
@@ -12328,7 +12328,7 @@ lookup_binfo_at_offset (tree binfo, tree
found, return, otherwise return NULL_TREE.  */
 
 tree
-get_binfo_at_offset (tree binfo, HOST_WIDE_INT offset, tree expected_type)
+get_binfo_at_offset (tree binfo, poly_int64 offset, tree expected_type)
 {
   tree type = BINFO_TYPE (binfo);
 
@@ -12340,7 +12340,7 @@ get_binfo_at_offset (tree binfo, HOST_WI
 
   if (types_same_for_odr (type, expected_type))
  return binfo;
-  if (offset < 0)
+  if (may_lt (offset, 0))
return NULL_TREE;
 
   for (fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
@@ -12350,7 +12350,7 @@ get_binfo_at_offset (tree binfo, HOST_WI
 
  pos = int_bit_position (fld);
  size = tree_to_uhwi (DECL_SIZE (fld));
- if (pos <= offset && (pos + size) > offset)
+ if (known_in_range_p (offset, pos, size))
break;
}
   if (!fld || TREE_CODE (TREE_TYPE (fld)) != RECORD_TYPE)
@@ -12358,7 +12358,7 @@ get_binfo_at_offset (tree binfo, HOST_WI
 
   /* Offset 0 indicates the primary base, whose vtable contents are
 represented in the binfo for the derived class.  */
-  else if (offset != 0)
+  else if (maybe_nonzero (offset))
{
  tree found_binfo = NULL, base_binfo;
  /* Offsets in BINFO are in bytes relative to the whole structure


[057/nnn] poly_int: build_ref_for_offset

2017-10-23 Thread Richard Sandiford
This patch changes the offset parameter to build_ref_for_offset
from HOST_WIDE_INT to poly_int64.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* ipa-prop.h (build_ref_for_offset): Take the offset as a poly_int64
rather than a HOST_WIDE_INT.
* tree-sra.c (build_ref_for_offset): Likewise.

Index: gcc/ipa-prop.h
===
--- gcc/ipa-prop.h  2017-10-23 17:16:58.508429306 +0100
+++ gcc/ipa-prop.h  2017-10-23 17:22:20.152210973 +0100
@@ -878,7 +878,7 @@ void ipa_release_body_info (struct ipa_f
 tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
 
 /* From tree-sra.c:  */
-tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
+tree build_ref_for_offset (location_t, tree, poly_int64, bool, tree,
   gimple_stmt_iterator *, bool);
 
 /* In ipa-cp.c  */
Index: gcc/tree-sra.c
===
--- gcc/tree-sra.c  2017-10-23 17:18:47.667056920 +0100
+++ gcc/tree-sra.c  2017-10-23 17:22:20.153211173 +0100
@@ -1671,7 +1671,7 @@ make_fancy_name (tree expr)
of handling bitfields.  */
 
 tree
-build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
+build_ref_for_offset (location_t loc, tree base, poly_int64 offset,
  bool reverse, tree exp_type, gimple_stmt_iterator *gsi,
  bool insert_after)
 {
@@ -1689,7 +1689,7 @@ build_ref_for_offset (location_t loc, tr
 TYPE_QUALS (exp_type)
 | ENCODE_QUAL_ADDR_SPACE (as));
 
-  gcc_checking_assert (offset % BITS_PER_UNIT == 0);
+  poly_int64 byte_offset = exact_div (offset, BITS_PER_UNIT);
   get_object_alignment_1 (base, &align, &misalign);
   base = get_addr_base_and_unit_offset (base, &base_offset);
 
@@ -1711,27 +1711,26 @@ build_ref_for_offset (location_t loc, tr
   else
gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
 
-  off = build_int_cst (reference_alias_ptr_type (prev_base),
-  offset / BITS_PER_UNIT);
+  off = build_int_cst (reference_alias_ptr_type (prev_base), byte_offset);
   base = tmp;
 }
   else if (TREE_CODE (base) == MEM_REF)
 {
   off = build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)),
-  base_offset + offset / BITS_PER_UNIT);
+  base_offset + byte_offset);
   off = int_const_binop (PLUS_EXPR, TREE_OPERAND (base, 1), off);
   base = unshare_expr (TREE_OPERAND (base, 0));
 }
   else
 {
   off = build_int_cst (reference_alias_ptr_type (prev_base),
-  base_offset + offset / BITS_PER_UNIT);
+  base_offset + byte_offset);
   base = build_fold_addr_expr (unshare_expr (base));
 }
 
-  misalign = (misalign + offset) & (align - 1);
-  if (misalign != 0)
-align = least_bit_hwi (misalign);
+  unsigned int align_bound = known_alignment (misalign + offset);
+  if (align_bound != 0)
+align = MIN (align, align_bound);
   if (align != TYPE_ALIGN (exp_type))
 exp_type = build_aligned_type (exp_type, align);
 


[056/nnn] poly_int: MEM_REF offsets

2017-10-23 Thread Richard Sandiford
This patch allows MEM_REF offsets to be polynomial, with mem_ref_offset
now returning a poly_offset_int instead of an offset_int.  The
non-mechanical changes to callers of mem_ref_offset were handled by
previous patches.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* fold-const.h (mem_ref_offset): Return a poly_offset_int rather
than an offset_int.
* tree.c (mem_ref_offset): Likewise.
* builtins.c (get_object_alignment_2): Treat MEM_REF offsets as
poly_ints.
* expr.c (get_inner_reference, expand_expr_real_1): Likewise.
* gimple-fold.c (get_base_constructor): Likewise.
* gimple-ssa-strength-reduction.c (restructure_reference): Likewise.
* ipa-polymorphic-call.c
(ipa_polymorphic_call_context::ipa_polymorphic_call_context): Likewise.
* ipa-prop.c (compute_complex_assign_jump_func, get_ancestor_addr_info)
(ipa_get_adjustment_candidate): Likewise.
* match.pd: Likewise.
* tree-data-ref.c (dr_analyze_innermost): Likewise.
* tree-dfa.c (get_addr_base_and_unit_offset_1): Likewise.
* tree-eh.c (tree_could_trap_p): Likewise.
* tree-object-size.c (addr_object_size): Likewise.
* tree-ssa-address.c (copy_ref_info): Likewise.
* tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Likewise.
(indirect_refs_may_alias_p): Likewise.
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
* tree-ssa.c (maybe_rewrite_mem_ref_base): Likewise.
(non_rewritable_mem_ref_base): Likewise.
* tree-vect-data-refs.c (vect_check_gather_scatter): Likewise.
* tree-vrp.c (search_for_addr_array): Likewise.
* varasm.c (decode_addr_const): Likewise.

Index: gcc/fold-const.h
===
--- gcc/fold-const.h2017-10-23 17:18:47.662057360 +0100
+++ gcc/fold-const.h2017-10-23 17:22:18.228825053 +0100
@@ -114,7 +114,7 @@ extern tree fold_indirect_ref_loc (locat
 extern tree build_simple_mem_ref_loc (location_t, tree);
 #define build_simple_mem_ref(T)\
build_simple_mem_ref_loc (UNKNOWN_LOCATION, T)
-extern offset_int mem_ref_offset (const_tree);
+extern poly_offset_int mem_ref_offset (const_tree);
 extern tree build_invariant_address (tree, tree, poly_int64);
 extern tree constant_boolean_node (bool, tree);
 extern tree div_if_zero_remainder (const_tree, const_tree);
Index: gcc/tree.c
===
--- gcc/tree.c  2017-10-23 17:17:01.436033953 +0100
+++ gcc/tree.c  2017-10-23 17:22:18.236826658 +0100
@@ -4925,10 +4925,11 @@ build_simple_mem_ref_loc (location_t loc
 
 /* Return the constant offset of a MEM_REF or TARGET_MEM_REF tree T.  */
 
-offset_int
+poly_offset_int
 mem_ref_offset (const_tree t)
 {
-  return offset_int::from (wi::to_wide (TREE_OPERAND (t, 1)), SIGNED);
+  return poly_offset_int::from (wi::to_poly_wide (TREE_OPERAND (t, 1)),
+   SIGNED);
 }
 
 /* Return an invariant ADDR_EXPR of type TYPE taking the address of BASE
Index: gcc/builtins.c
===
--- gcc/builtins.c  2017-10-23 17:18:57.855161317 +0100
+++ gcc/builtins.c  2017-10-23 17:22:18.226824652 +0100
@@ -350,7 +350,7 @@ get_object_alignment_2 (tree exp, unsign
  bitpos += ptr_bitpos;
  if (TREE_CODE (exp) == MEM_REF
  || TREE_CODE (exp) == TARGET_MEM_REF)
-   bitpos += mem_ref_offset (exp).to_short_addr () * BITS_PER_UNIT;
+   bitpos += mem_ref_offset (exp).force_shwi () * BITS_PER_UNIT;
}
 }
   else if (TREE_CODE (exp) == STRING_CST)
Index: gcc/expr.c
===
--- gcc/expr.c  2017-10-23 17:20:49.571719793 +0100
+++ gcc/expr.c  2017-10-23 17:22:18.228825053 +0100
@@ -7165,8 +7165,8 @@ get_inner_reference (tree exp, poly_int6
  tree off = TREE_OPERAND (exp, 1);
  if (!integer_zerop (off))
{
- offset_int boff, coff = mem_ref_offset (exp);
- boff = coff << LOG2_BITS_PER_UNIT;
+ poly_offset_int boff = mem_ref_offset (exp);
+ boff <<= LOG2_BITS_PER_UNIT;
  bit_offset += boff;
}
  exp = TREE_OPERAND (TREE_OPERAND (exp, 0), 0);
@@ -10255,9 +10255,9 @@ expand_expr_real_1 (tree exp, rtx target
   might end up in a register.  */
if (mem_ref_refers_to_non_mem_p (exp))
  {
-   HOST_WIDE_INT offset = mem_ref_offset (exp).to_short_addr ();
+   poly_int64 offset = mem_ref_offset (exp).force_shwi ();
base = TREE_OPERAND (base, 0);
-   if (offset == 0
+   if (known_zero (offset)
&& !reverse

[055/nnn] poly_int: find_bswap_or_nop_load

2017-10-23 Thread Richard Sandiford
This patch handles polynomial offsets in find_bswap_or_nop_load,
which could be useful for constant-sized data at a variable offset.
It is needed for a later patch to compile.
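
The new code splits a possibly negative polynomial bit offset into a
byte offset rounded towards -Inf plus a non-negative trailing bit count
(bits_to_bytes_round_down and num_trailing_bits).  For a plain integer
the arithmetic is as follows (illustration only, assuming arithmetic
right shift, which GCC's own sources can rely on):

#include <cstdio>

int main ()
{
  long bit_offset = -13;
  /* bits_to_bytes_round_down: round towards -Inf.  */
  long byte_offset = bit_offset >> 3;   /* floor (-13 / 8) = -2 */
  /* num_trailing_bits: the now non-negative remainder.  */
  long trailing = bit_offset & 7;       /* -13 - (-2 * 8) = 3 */
  printf ("%ld bytes + %ld bits\n", byte_offset, trailing);
  /* Check: -2 * 8 + 3 == -13.  */
}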


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-ssa-math-opts.c (find_bswap_or_nop_load): Track polynomial
offsets for MEM_REFs.

Index: gcc/tree-ssa-math-opts.c
===
--- gcc/tree-ssa-math-opts.c2017-10-23 17:18:47.667056920 +0100
+++ gcc/tree-ssa-math-opts.c2017-10-23 17:22:16.929564362 +0100
@@ -2122,35 +2122,31 @@ find_bswap_or_nop_load (gimple *stmt, tr
 
   if (TREE_CODE (base_addr) == MEM_REF)
 {
-  offset_int bit_offset = 0;
+  poly_offset_int bit_offset = 0;
   tree off = TREE_OPERAND (base_addr, 1);
 
   if (!integer_zerop (off))
{
- offset_int boff, coff = mem_ref_offset (base_addr);
- boff = coff << LOG2_BITS_PER_UNIT;
+ poly_offset_int boff = mem_ref_offset (base_addr);
+ boff <<= LOG2_BITS_PER_UNIT;
  bit_offset += boff;
}
 
   base_addr = TREE_OPERAND (base_addr, 0);
 
   /* Avoid returning a negative bitpos as this may wreak havoc later.  */
-  if (wi::neg_p (bit_offset))
+  if (may_lt (bit_offset, 0))
{
- offset_int mask = wi::mask <offset_int> (LOG2_BITS_PER_UNIT, false);
- offset_int tem = wi::bit_and_not (bit_offset, mask);
- /* TEM is the bitpos rounded to BITS_PER_UNIT towards -Inf.
-Subtract it to BIT_OFFSET and add it (scaled) to OFFSET.  */
- bit_offset -= tem;
- tem >>= LOG2_BITS_PER_UNIT;
+ tree byte_offset = wide_int_to_tree
+   (sizetype, bits_to_bytes_round_down (bit_offset));
+ bit_offset = num_trailing_bits (bit_offset);
  if (offset)
-   offset = size_binop (PLUS_EXPR, offset,
-   wide_int_to_tree (sizetype, tem));
+   offset = size_binop (PLUS_EXPR, offset, byte_offset);
  else
-   offset = wide_int_to_tree (sizetype, tem);
+   offset = byte_offset;
}
 
-  bitpos += bit_offset.to_shwi ();
+  bitpos += bit_offset.force_shwi ();
 }
 
   if (!multiple_p (bitpos, BITS_PER_UNIT, &bytepos))


[054/nnn] poly_int: adjust_ptr_info_misalignment

2017-10-23 Thread Richard Sandiford
This patch makes adjust_ptr_info_misalignment take the adjustment
as a poly_uint64 rather than an unsigned int.
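
When the polynomial increment has no single misalignment with respect
to pi->align, the new code falls back to known_alignment: the largest
power of two dividing every value of the poly.  Toy model (illustrative,
not GCC code):

#include <cstdio>

/* Toy: the largest power of 2 dividing a + b*x for every x >= 0 is the
   lowest set bit of (a | b); 0 means no alignment information.  */
static unsigned long known_alignment (unsigned long a, unsigned long b)
{
  unsigned long v = a | b;
  return v ? (v & -v) : 0;
}

int main ()
{
  /* increment = 8 + 16x: the misalignment wrt 16 varies, but every
     value is 8-byte aligned, so pi->align drops to 8, pi->misalign 0. */
  printf ("align = %lu\n", known_alignment (8, 16));   /* 8 */
}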


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-ssanames.h (adjust_ptr_info_misalignment): Take the increment
as a poly_uint64 rather than an unsigned int.
* tree-ssanames.c (adjust_ptr_info_misalignment): Likewise.

Index: gcc/tree-ssanames.h
===
--- gcc/tree-ssanames.h 2017-10-23 17:22:13.147805567 +0100
+++ gcc/tree-ssanames.h 2017-10-23 17:22:15.674312500 +0100
@@ -89,8 +89,7 @@ extern bool get_ptr_info_alignment (stru
 extern void mark_ptr_info_alignment_unknown (struct ptr_info_def *);
 extern void set_ptr_info_alignment (struct ptr_info_def *, unsigned int,
unsigned int);
-extern void adjust_ptr_info_misalignment (struct ptr_info_def *,
- unsigned int);
+extern void adjust_ptr_info_misalignment (struct ptr_info_def *, poly_uint64);
 extern struct ptr_info_def *get_ptr_info (tree);
 extern void set_ptr_nonnull (tree);
 extern bool get_ptr_nonnull (const_tree);
Index: gcc/tree-ssanames.c
===
--- gcc/tree-ssanames.c 2017-10-23 17:22:13.147805567 +0100
+++ gcc/tree-ssanames.c 2017-10-23 17:22:15.674312500 +0100
@@ -643,13 +643,16 @@ set_ptr_info_alignment (struct ptr_info_
misalignment by INCREMENT modulo its current alignment.  */
 
 void
-adjust_ptr_info_misalignment (struct ptr_info_def *pi,
- unsigned int increment)
+adjust_ptr_info_misalignment (struct ptr_info_def *pi, poly_uint64 increment)
 {
   if (pi->align != 0)
 {
-  pi->misalign += increment;
-  pi->misalign &= (pi->align - 1);
+  increment += pi->misalign;
+  if (!known_misalignment (increment, pi->align, &pi->misalign))
+   {
+ pi->align = known_alignment (increment);
+ pi->misalign = 0;
+   }
 }
 }
 


[053/nnn] poly_int: decode_addr_const

2017-10-23 Thread Richard Sandiford
This patch makes the varasm-local addr_const track polynomial offsets.
I'm not sure how useful this is, but it was easier to convert than not.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* varasm.c (addr_const::offset): Change from HOST_WIDE_INT
to poly_int64.
(decode_addr_const): Update accordingly.

Index: gcc/varasm.c
===
--- gcc/varasm.c2017-10-23 17:11:39.974428235 +0100
+++ gcc/varasm.c2017-10-23 17:20:52.530629696 +0100
@@ -2873,29 +2873,31 @@ assemble_real (REAL_VALUE_TYPE d, scalar
 
 struct addr_const {
   rtx base;
-  HOST_WIDE_INT offset;
+  poly_int64 offset;
 };
 
 static void
 decode_addr_const (tree exp, struct addr_const *value)
 {
   tree target = TREE_OPERAND (exp, 0);
-  int offset = 0;
+  poly_int64 offset = 0;
   rtx x;
 
   while (1)
 {
+  poly_int64 bytepos;
   if (TREE_CODE (target) == COMPONENT_REF
- && tree_fits_shwi_p (byte_position (TREE_OPERAND (target, 1
+ && poly_int_tree_p (byte_position (TREE_OPERAND (target, 1)),
+ &bytepos))
{
- offset += int_byte_position (TREE_OPERAND (target, 1));
+ offset += bytepos;
  target = TREE_OPERAND (target, 0);
}
   else if (TREE_CODE (target) == ARRAY_REF
   || TREE_CODE (target) == ARRAY_RANGE_REF)
{
  offset += (tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (target)))
-* tree_to_shwi (TREE_OPERAND (target, 1)));
+* tree_to_poly_int64 (TREE_OPERAND (target, 1)));
  target = TREE_OPERAND (target, 0);
}
   else if (TREE_CODE (target) == MEM_REF
@@ -3042,14 +3044,14 @@ const_hash_1 (const tree exp)
  case SYMBOL_REF:
/* Don't hash the address of the SYMBOL_REF;
   only use the offset and the symbol name.  */
-   hi = value.offset;
+   hi = value.offset.coeffs[0];
p = XSTR (value.base, 0);
for (i = 0; p[i] != 0; i++)
  hi = ((hi * 613) + (unsigned) (p[i]));
break;
 
  case LABEL_REF:
-   hi = (value.offset
+   hi = (value.offset.coeffs[0]
  + CODE_LABEL_NUMBER (label_ref_label (value.base)) * 13);
break;
 
@@ -3242,7 +3244,7 @@ compare_constant (const tree t1, const t
decode_addr_const (t1, );
decode_addr_const (t2, );
 
-   if (value1.offset != value2.offset)
+   if (may_ne (value1.offset, value2.offset))
  return 0;
 
code = GET_CODE (value1.base);


[052/nnn] poly_int: bit_field_size/offset

2017-10-23 Thread Richard Sandiford
verify_expr ensured that the size and offset in gimple BIT_FIELD_REFs
satisfied tree_fits_uhwi_p.  This patch extends that so that they can
be poly_uint64s, and adds helper routines for accessing them when the
verify_expr requirements apply.
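
With the new helpers, byte-alignment checks such as the hsa-gen.c hunk
become multiple_p (bit_field_offset (ref), BITS_PER_UNIT).  Under the
toy degree-1 model, multiple_p against a constant must hold
coefficient-wise (a sketch, not GCC's implementation):

#include <cstdio>

struct poly { unsigned long a, b; };   /* value = a + b*x, x >= 0 */

/* (a + b*x) is a multiple of M for every x iff M divides both coeffs.  */
static bool multiple_p (poly p, unsigned long m)
{
  return p.a % m == 0 && p.b % m == 0;
}

int main ()
{
  poly bit_off = { 24, 0 };   /* byte 3 for any x */
  poly odd_off = { 4, 8 };    /* sub-byte at x = 0: rejected */
  printf ("%d %d\n", multiple_p (bit_off, 8), multiple_p (odd_off, 8));
}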


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree.h (bit_field_size, bit_field_offset): New functions.
* hsa-gen.c (gen_hsa_addr): Use them.
* tree-ssa-forwprop.c (simplify_bitfield_ref): Likewise.
(simplify_vector_constructor): Likewise.
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
* tree-cfg.c (verify_expr): Require the sizes and offsets of a
BIT_FIELD_REF to be poly_uint64s rather than uhwis.
* fold-const.c (fold_ternary_loc): Protect tree_to_uhwi with
tree_fits_uhwi_p.

Index: gcc/tree.h
===
--- gcc/tree.h  2017-10-23 17:18:47.668056833 +0100
+++ gcc/tree.h  2017-10-23 17:20:50.884679814 +0100
@@ -4764,6 +4764,24 @@ poly_int_tree_p (const_tree t)
   return (TREE_CODE (t) == INTEGER_CST || POLY_INT_CST_P (t));
 }
 
+/* Return the bit size of BIT_FIELD_REF T, in cases where it is known
+   to be a poly_uint64.  (This is always true at the gimple level.)  */
+
+inline poly_uint64
+bit_field_size (const_tree t)
+{
+  return tree_to_poly_uint64 (TREE_OPERAND (t, 1));
+}
+
+/* Return the starting bit offset of BIT_FIELD_REF T, in cases where it is
+   known to be a poly_uint64.  (This is always true at the gimple level.)  */
+
+inline poly_uint64
+bit_field_offset (const_tree t)
+{
+  return tree_to_poly_uint64 (TREE_OPERAND (t, 2));
+}
+
 extern tree strip_float_extensions (tree);
 extern int really_constant_p (const_tree);
 extern bool ptrdiff_tree_p (const_tree, poly_int64_pod *);
Index: gcc/hsa-gen.c
===
--- gcc/hsa-gen.c   2017-10-23 17:18:47.664057184 +0100
+++ gcc/hsa-gen.c   2017-10-23 17:20:50.882679875 +0100
@@ -1959,8 +1959,8 @@ gen_hsa_addr (tree ref, hsa_bb *hbb, HOS
   goto out;
 }
   else if (TREE_CODE (ref) == BIT_FIELD_REF
-  && ((tree_to_uhwi (TREE_OPERAND (ref, 1)) % BITS_PER_UNIT) != 0
-  || (tree_to_uhwi (TREE_OPERAND (ref, 2)) % BITS_PER_UNIT) != 0))
+  && (!multiple_p (bit_field_size (ref), BITS_PER_UNIT)
+  || !multiple_p (bit_field_offset (ref), BITS_PER_UNIT)))
 {
   HSA_SORRY_ATV (EXPR_LOCATION (origref),
 "support for HSA does not implement "
Index: gcc/tree-ssa-forwprop.c
===
--- gcc/tree-ssa-forwprop.c 2017-10-23 17:17:01.434034223 +0100
+++ gcc/tree-ssa-forwprop.c 2017-10-23 17:20:50.883679845 +0100
@@ -1727,7 +1727,7 @@ simplify_bitfield_ref (gimple_stmt_itera
   gimple *def_stmt;
   tree op, op0, op1, op2;
   tree elem_type;
-  unsigned idx, n, size;
+  unsigned idx, size;
   enum tree_code code;
 
   op = gimple_assign_rhs1 (stmt);
@@ -1762,12 +1762,11 @@ simplify_bitfield_ref (gimple_stmt_itera
 return false;
 
   size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
-  n = TREE_INT_CST_LOW (op1) / size;
-  if (n != 1)
+  if (may_ne (bit_field_size (op), size))
 return false;
-  idx = TREE_INT_CST_LOW (op2) / size;
 
-  if (code == VEC_PERM_EXPR)
+  if (code == VEC_PERM_EXPR
+  && constant_multiple_p (bit_field_offset (op), size, &idx))
 {
   tree p, m, tem;
   unsigned nelts;
@@ -2020,9 +2019,10 @@ simplify_vector_constructor (gimple_stmt
return false;
  orig = ref;
}
-  if (TREE_INT_CST_LOW (TREE_OPERAND (op1, 1)) != elem_size)
+  unsigned int elt;
+  if (may_ne (bit_field_size (op1), elem_size)
+ || !constant_multiple_p (bit_field_offset (op1), elem_size, &elt))
return false;
-  unsigned int elt = TREE_INT_CST_LOW (TREE_OPERAND (op1, 2)) / elem_size;
   if (elt != i)
maybe_ident = false;
   sel.quick_push (elt);
Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c2017-10-23 17:17:01.435034088 +0100
+++ gcc/tree-ssa-sccvn.c2017-10-23 17:20:50.884679814 +0100
@@ -766,12 +766,8 @@ copy_reference_ops_from_ref (tree ref, v
  /* Record bits, position and storage order.  */
  temp.op0 = TREE_OPERAND (ref, 1);
  temp.op1 = TREE_OPERAND (ref, 2);
- if (tree_fits_shwi_p (TREE_OPERAND (ref, 2)))
-   {
- HOST_WIDE_INT off = tree_to_shwi (TREE_OPERAND (ref, 2));
- if (off % BITS_PER_UNIT == 0)
-   temp.off = off / BITS_PER_UNIT;
-   }
+ if (!multiple_p (bit_field_offset (ref), BITS_PER_UNIT, &temp.off))
+   temp.off = -1;
  temp.reverse = REF_REVERSE_STORAGE_ORDER (ref);
  break;
   

[051/nnn] poly_int: emit_group_load/store

2017-10-23 Thread Richard Sandiford
This patch changes the sizes passed to emit_group_load and
emit_group_store from int to poly_int64.


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* expr.h (emit_group_load, emit_group_load_into_temps)
(emit_group_store): Take the size as a poly_int64 rather than an int.
* expr.c (emit_group_load_1, emit_group_load): Likewise.
(emit_group_load_into_temp, emit_group_store): Likewise.

Index: gcc/expr.h
===
--- gcc/expr.h  2017-10-23 17:18:56.434286222 +0100
+++ gcc/expr.h  2017-10-23 17:20:49.571719793 +0100
@@ -128,10 +128,10 @@ extern rtx gen_group_rtx (rtx);
 
 /* Load a BLKmode value into non-consecutive registers represented by a
PARALLEL.  */
-extern void emit_group_load (rtx, rtx, tree, int);
+extern void emit_group_load (rtx, rtx, tree, poly_int64);
 
 /* Similarly, but load into new temporaries.  */
-extern rtx emit_group_load_into_temps (rtx, rtx, tree, int);
+extern rtx emit_group_load_into_temps (rtx, rtx, tree, poly_int64);
 
 /* Move a non-consecutive group of registers represented by a PARALLEL into
a non-consecutive group of registers represented by a PARALLEL.  */
@@ -142,7 +142,7 @@ extern rtx emit_group_move_into_temps (r
 
 /* Store a BLKmode value from non-consecutive registers represented by a
PARALLEL.  */
-extern void emit_group_store (rtx, rtx, tree, int);
+extern void emit_group_store (rtx, rtx, tree, poly_int64);
 
 extern rtx maybe_emit_group_store (rtx, tree);
 
Index: gcc/expr.c
===
--- gcc/expr.c  2017-10-23 17:18:57.860160878 +0100
+++ gcc/expr.c  2017-10-23 17:20:49.571719793 +0100
@@ -2095,7 +2095,8 @@ gen_group_rtx (rtx orig)
into corresponding XEXP (XVECEXP (DST, 0, i), 0) element.  */
 
 static void
-emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type, int ssize)
+emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type,
+  poly_int64 ssize)
 {
   rtx src;
   int start, i;
@@ -2134,12 +2135,16 @@ emit_group_load_1 (rtx *tmps, rtx dst, r
   for (i = start; i < XVECLEN (dst, 0); i++)
 {
   machine_mode mode = GET_MODE (XEXP (XVECEXP (dst, 0, i), 0));
-  HOST_WIDE_INT bytepos = INTVAL (XEXP (XVECEXP (dst, 0, i), 1));
-  unsigned int bytelen = GET_MODE_SIZE (mode);
-  int shift = 0;
-
-  /* Handle trailing fragments that run over the size of the struct.  */
-  if (ssize >= 0 && bytepos + (HOST_WIDE_INT) bytelen > ssize)
+  poly_int64 bytepos = INTVAL (XEXP (XVECEXP (dst, 0, i), 1));
+  poly_int64 bytelen = GET_MODE_SIZE (mode);
+  poly_int64 shift = 0;
+
+  /* Handle trailing fragments that run over the size of the struct.
+It's the target's responsibility to make sure that the fragment
+cannot be strictly smaller in some cases and strictly larger
+in others.  */
+  gcc_checking_assert (ordered_p (bytepos + bytelen, ssize));
+  if (known_size_p (ssize) && may_gt (bytepos + bytelen, ssize))
{
  /* Arrange to shift the fragment to where it belongs.
 extract_bit_field loads to the lsb of the reg.  */
@@ -2153,7 +2158,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, r
  )
shift = (bytelen - (ssize - bytepos)) * BITS_PER_UNIT;
  bytelen = ssize - bytepos;
- gcc_assert (bytelen > 0);
+ gcc_assert (may_gt (bytelen, 0));
}
 
   /* If we won't be loading directly from memory, protect the real source
@@ -2177,33 +2182,34 @@ emit_group_load_1 (rtx *tmps, rtx dst, r
   if (MEM_P (src)
  && (! targetm.slow_unaligned_access (mode, MEM_ALIGN (src))
  || MEM_ALIGN (src) >= GET_MODE_ALIGNMENT (mode))
- && bytepos * BITS_PER_UNIT % GET_MODE_ALIGNMENT (mode) == 0
- && bytelen == GET_MODE_SIZE (mode))
+ && multiple_p (bytepos * BITS_PER_UNIT, GET_MODE_ALIGNMENT (mode))
+ && must_eq (bytelen, GET_MODE_SIZE (mode)))
{
  tmps[i] = gen_reg_rtx (mode);
  emit_move_insn (tmps[i], adjust_address (src, mode, bytepos));
}
   else if (COMPLEX_MODE_P (mode)
   && GET_MODE (src) == mode
-  && bytelen == GET_MODE_SIZE (mode))
+  && must_eq (bytelen, GET_MODE_SIZE (mode)))
/* Let emit_move_complex do the bulk of the work.  */
tmps[i] = src;
   else if (GET_CODE (src) == CONCAT)
{
- unsigned int slen = GET_MODE_SIZE (GET_MODE (src));
- unsigned int slen0 = GET_MODE_SIZE (GET_MODE (XEXP (src, 0)));
- unsigned int elt = bytepos / slen0;
- unsigned int subpos = bytepos % slen0;
+ poly_int64 slen = GET_MODE_SIZE (GET_MODE (src));
+ poly_int64 slen0 = GET_MODE_SIZE (GET_MODE (XEXP (src, 0)));
+ unsigned int elt;
+ 

[050/nnn] poly_int: reload<->ira interface

2017-10-23 Thread Richard Sandiford
This patch uses poly_int64 for:

- ira_reuse_stack_slot
- ira_mark_new_stack_slot
- ira_spilled_reg_stack_slot::width

all of which are part of the IRA/reload interface.
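
A note on the comparison directions: the conversion has to stay
conservative in opposite senses depending on whether it is rejecting a
slot or asserting success.  In terms of the predicates sketched in the
051 note (both lines are from the ira_reuse_stack_slot hunk below):

  /* Reject a slot that might be too small for some runtime x...  */
  if (may_lt (slot->width, total_size))
    continue;

  /* ...but on success assert only what is provably true.  */
  ira_assert (must_ge (slot->width, total_size));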


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* ira-int.h (ira_spilled_reg_stack_slot::width): Change from
an unsigned int to a poly_uint64.
* ira.h (ira_reuse_stack_slot, ira_mark_new_stack_slot): Take the
sizes as poly_uint64s rather than unsigned ints.
* ira-color.c (ira_reuse_stack_slot, ira_mark_new_stack_slot):
Likewise.

Index: gcc/ira-int.h
===
--- gcc/ira-int.h   2017-10-23 16:52:18.222670182 +0100
+++ gcc/ira-int.h   2017-10-23 17:20:48.204761416 +0100
@@ -604,7 +604,7 @@ struct ira_spilled_reg_stack_slot
   /* RTL representation of the stack slot.  */
   rtx mem;
   /* Size of the stack slot.  */
-  unsigned int width;
+  poly_uint64_pod width;
 };
 
 /* The number of elements in the following array.  */
Index: gcc/ira.h
===
--- gcc/ira.h   2017-10-23 17:10:45.257213436 +0100
+++ gcc/ira.h   2017-10-23 17:20:48.204761416 +0100
@@ -200,8 +200,8 @@ extern void ira_mark_allocation_change (
 extern void ira_mark_memory_move_deletion (int, int);
 extern bool ira_reassign_pseudos (int *, int, HARD_REG_SET, HARD_REG_SET *,
  HARD_REG_SET *, bitmap);
-extern rtx ira_reuse_stack_slot (int, unsigned int, unsigned int);
-extern void ira_mark_new_stack_slot (rtx, int, unsigned int);
+extern rtx ira_reuse_stack_slot (int, poly_uint64, poly_uint64);
+extern void ira_mark_new_stack_slot (rtx, int, poly_uint64);
extern bool ira_better_spill_reload_regno_p (int *, int *, rtx, rtx, rtx_insn *);
 extern bool ira_bad_reload_regno (int, rtx, rtx);
 
Index: gcc/ira-color.c
===
--- gcc/ira-color.c 2017-10-23 17:11:40.005487591 +0100
+++ gcc/ira-color.c 2017-10-23 17:20:48.204761416 +0100
@@ -4495,8 +4495,8 @@ ira_reassign_pseudos (int *spilled_pseud
TOTAL_SIZE.  In the case of failure to find a slot which can be
used for REGNO, the function returns NULL.  */
 rtx
-ira_reuse_stack_slot (int regno, unsigned int inherent_size,
- unsigned int total_size)
+ira_reuse_stack_slot (int regno, poly_uint64 inherent_size,
+ poly_uint64 total_size)
 {
   unsigned int i;
   int slot_num, best_slot_num;
@@ -4509,8 +4509,8 @@ ira_reuse_stack_slot (int regno, unsigne
 
   ira_assert (! ira_use_lra_p);
 
-  ira_assert (inherent_size == PSEUDO_REGNO_BYTES (regno)
- && inherent_size <= total_size
+  ira_assert (must_eq (inherent_size, PSEUDO_REGNO_BYTES (regno))
+ && must_le (inherent_size, total_size)
  && ALLOCNO_HARD_REGNO (allocno) < 0);
   if (! flag_ira_share_spill_slots)
 return NULL_RTX;
@@ -4533,8 +4533,8 @@ ira_reuse_stack_slot (int regno, unsigne
  slot = &ira_spilled_reg_stack_slots[slot_num];
  if (slot->mem == NULL_RTX)
continue;
- if (slot->width < total_size
- || GET_MODE_SIZE (GET_MODE (slot->mem)) < inherent_size)
+ if (may_lt (slot->width, total_size)
+ || may_lt (GET_MODE_SIZE (GET_MODE (slot->mem)), inherent_size))
continue;
 
  EXECUTE_IF_SET_IN_BITMAP (&slot->spilled_regs,
@@ -4586,7 +4586,7 @@ ira_reuse_stack_slot (int regno, unsigne
 }
   if (x != NULL_RTX)
 {
-  ira_assert (slot->width >= total_size);
+  ira_assert (must_ge (slot->width, total_size));
 #ifdef ENABLE_IRA_CHECKING
   EXECUTE_IF_SET_IN_BITMAP (&slot->spilled_regs,
FIRST_PSEUDO_REGISTER, i, bi)
@@ -4615,7 +4615,7 @@ ira_reuse_stack_slot (int regno, unsigne
TOTAL_SIZE was allocated for REGNO.  We store this info for
subsequent ira_reuse_stack_slot calls.  */
 void
-ira_mark_new_stack_slot (rtx x, int regno, unsigned int total_size)
+ira_mark_new_stack_slot (rtx x, int regno, poly_uint64 total_size)
 {
   struct ira_spilled_reg_stack_slot *slot;
   int slot_num;
@@ -4623,7 +4623,7 @@ ira_mark_new_stack_slot (rtx x, int regn
 
   ira_assert (! ira_use_lra_p);
 
-  ira_assert (PSEUDO_REGNO_BYTES (regno) <= total_size);
+  ira_assert (must_le (PSEUDO_REGNO_BYTES (regno), total_size));
   allocno = ira_regno_allocno_map[regno];
   slot_num = -ALLOCNO_HARD_REGNO (allocno) - 2;
   if (slot_num == -1)


[049/nnn] poly_int: emit_inc

2017-10-23 Thread Richard Sandiford
This patch changes the LRA emit_inc routine so that it takes
a poly_int64 rather than an int.
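
The one subtlety is the GEN_INT -> gen_int_mode change: a polynomial
increment may need to become a mode-dependent CONST_POLY_INT rather
than a plain CONST_INT, and gen_int_mode produces whichever is needed.
A hedged sketch of a caller (the mode variable here is illustrative;
a typical increment would be the size of the accessed mode):

  poly_int64 inc_amount = GET_MODE_SIZE (mode);  /* may be non-constant */
  rtx inc = gen_int_mode (inc_amount, Pmode);    /* CONST_INT or CONST_POLY_INT */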


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* lra-constraints.c (emit_inc): Change inc_amount from an int
to a poly_int64.

Index: gcc/lra-constraints.c
===
--- gcc/lra-constraints.c   2017-10-23 17:19:21.001863152 +0100
+++ gcc/lra-constraints.c   2017-10-23 17:20:47.003797985 +0100
@@ -3533,7 +3533,7 @@ process_address (int nop, bool check_onl
 
Return pseudo containing the result. */
 static rtx
-emit_inc (enum reg_class new_rclass, rtx in, rtx value, int inc_amount)
+emit_inc (enum reg_class new_rclass, rtx in, rtx value, poly_int64 inc_amount)
 {
   /* REG or MEM to be copied and incremented.  */
   rtx incloc = XEXP (value, 0);
@@ -3561,7 +3561,7 @@ emit_inc (enum reg_class new_rclass, rtx
   if (GET_CODE (value) == PRE_DEC || GET_CODE (value) == POST_DEC)
inc_amount = -inc_amount;
 
-  inc = GEN_INT (inc_amount);
+  inc = gen_int_mode (inc_amount, GET_MODE (value));
 }
 
   if (! post && REG_P (incloc))


[048/nnn] poly_int: cfgexpand stack variables

2017-10-23 Thread Richard Sandiford
This patch changes the type of stack_var::size from HOST_WIDE_INT
to poly_uint64.  The difference in signedness is because the
field was set by:

  v->size = tree_to_uhwi (size);
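
One wrinkle: qsort needs a total order, but two polynomial sizes (say
32 and 16 + 16x) can be unordered at compile time, so stack_var_cmp
switches from </> to compare_sizes_for_sort, which imposes a repeatable
(if not mathematically meaningful) order on such pairs.  As it appears
in the hunk below:

  /* Secondary sort key: decreasing size; swapping the arguments
     gives the decreasing direction.  */
  int diff = compare_sizes_for_sort (sizeb, sizea);
  if (diff != 0)
    return diff;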


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* cfgexpand.c (stack_var::size): Change from a HOST_WIDE_INT
to a poly_uint64.
(add_stack_var, stack_var_cmp, partition_stack_vars)
(dump_stack_var_partition): Update accordingly.
(alloc_stack_frame_space): Take the size as a poly_int64 rather
than a HOST_WIDE_INT.
(expand_stack_vars, expand_one_stack_var_1): Handle polynomial sizes.
(defer_stack_allocation, estimated_stack_frame_size): Likewise.
(account_stack_vars, expand_one_var): Likewise.  Return a poly_uint64
rather than a HOST_WIDE_INT.

Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c 2017-10-23 17:18:53.827515374 +0100
+++ gcc/cfgexpand.c 2017-10-23 17:19:04.559212322 +0100
@@ -314,7 +314,7 @@ struct stack_var
 
   /* Initially, the size of the variable.  Later, the size of the partition,
  if this variable becomes it's partition's representative.  */
-  HOST_WIDE_INT size;
+  poly_uint64 size;
 
   /* The *byte* alignment required for this variable.  Or as, with the
  size, the alignment for this partition.  */
@@ -390,7 +390,7 @@ align_base (HOST_WIDE_INT base, unsigned
Return the frame offset.  */
 
 static poly_int64
-alloc_stack_frame_space (HOST_WIDE_INT size, unsigned HOST_WIDE_INT align)
+alloc_stack_frame_space (poly_int64 size, unsigned HOST_WIDE_INT align)
 {
   poly_int64 offset, new_frame_offset;
 
@@ -443,10 +443,10 @@ add_stack_var (tree decl)
   tree size = TREE_CODE (decl) == SSA_NAME
 ? TYPE_SIZE_UNIT (TREE_TYPE (decl))
 : DECL_SIZE_UNIT (decl);
-  v->size = tree_to_uhwi (size);
+  v->size = tree_to_poly_uint64 (size);
   /* Ensure that all variables have size, so that &a != &b for any two
  variables that are simultaneously live.  */
-  if (v->size == 0)
+  if (known_zero (v->size))
 v->size = 1;
   v->alignb = align_local_variable (decl);
   /* An alignment of zero can mightily confuse us later.  */
@@ -676,8 +676,8 @@ stack_var_cmp (const void *a, const void
   size_t ib = *(const size_t *)b;
   unsigned int aligna = stack_vars[ia].alignb;
   unsigned int alignb = stack_vars[ib].alignb;
-  HOST_WIDE_INT sizea = stack_vars[ia].size;
-  HOST_WIDE_INT sizeb = stack_vars[ib].size;
+  poly_int64 sizea = stack_vars[ia].size;
+  poly_int64 sizeb = stack_vars[ib].size;
   tree decla = stack_vars[ia].decl;
   tree declb = stack_vars[ib].decl;
   bool largea, largeb;
@@ -690,10 +690,9 @@ stack_var_cmp (const void *a, const void
 return (int)largeb - (int)largea;
 
   /* Secondary compare on size, decreasing  */
-  if (sizea > sizeb)
-return -1;
-  if (sizea < sizeb)
-return 1;
+  int diff = compare_sizes_for_sort (sizeb, sizea);
+  if (diff != 0)
+return diff;
 
   /* Tertiary compare on true alignment, decreasing.  */
   if (aligna < alignb)
@@ -904,7 +903,7 @@ partition_stack_vars (void)
 {
   size_t i = stack_vars_sorted[si];
   unsigned int ialign = stack_vars[i].alignb;
-  HOST_WIDE_INT isize = stack_vars[i].size;
+  poly_int64 isize = stack_vars[i].size;
 
   /* Ignore objects that aren't partition representatives. If we
  see a var that is not a partition representative, it must
@@ -916,7 +915,7 @@ partition_stack_vars (void)
{
  size_t j = stack_vars_sorted[sj];
  unsigned int jalign = stack_vars[j].alignb;
- HOST_WIDE_INT jsize = stack_vars[j].size;
+ poly_int64 jsize = stack_vars[j].size;
 
  /* Ignore objects that aren't partition representatives.  */
  if (stack_vars[j].representative != j)
@@ -932,8 +931,8 @@ partition_stack_vars (void)
 sizes, as the shorter vars wouldn't be adequately protected.
 Don't do that for "large" (unsupported) alignment objects,
 those aren't protected anyway.  */
- if ((asan_sanitize_stack_p ())
- && isize != jsize
+ if (asan_sanitize_stack_p ()
+ && may_ne (isize, jsize)
  && ialign * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)
break;
 
@@ -964,9 +963,9 @@ dump_stack_var_partition (void)
   if (stack_vars[i].representative != i)
continue;
 
-  fprintf (dump_file, "Partition %lu: size " HOST_WIDE_INT_PRINT_DEC
-  " align %u\n", (unsigned long) i, stack_vars[i].size,
-  stack_vars[i].alignb);
+  fprintf (dump_file, "Partition %lu: size ", (unsigned long) i);
+  print_dec (stack_vars[i].size, dump_file);
+  fprintf (dump_file, " align %u\n", stack_vars[i].alignb);
 
   for (j = i; j != EOC; j = stack_vars[j].next)

[047/nnn] poly_int: argument sizes

2017-10-23 Thread Richard Sandiford
This patch changes various bits of state related to argument sizes so
that they have type poly_int64 rather than HOST_WIDE_INT.  This includes:

- incoming_args::pops_args and incoming_args::size
- rtl_data::outgoing_args_size
- pending_stack_adjust
- stack_pointer_delta
- stack_usage::pushed_stack_size
- args_size::constant

It also changes TARGET_RETURN_POPS_ARGS so that the size of the
arguments passed in and the size returned by the hook are both
poly_int64s.
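
The new stack_usage_watermark and stored_args_watermark variables stand
in for byte-precise occupancy tracking where offsets are polynomial.
A hedged sketch of the idea only (the real helpers are
stack_region_maybe_used_p and mark_stack_region_used; this body is
illustrative, not the patch's exact logic):

  /* Treat WATERMARK as the lowest offset known to be untouched: a
     region might already be in use unless it starts at or above the
     watermark for every runtime x.  */
  static bool
  region_maybe_used_p (poly_uint64 lower, poly_uint64 watermark)
  {
    return may_lt (lower, watermark);
  }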


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* target.def (return_pops_args): Treat both the input and output
sizes as poly_int64s rather than HOST_WIDE_INTS.
* targhooks.h (default_return_pops_args): Update accordingly.
* targhooks.c (default_return_pops_args): Likewise.
* doc/tm.texi: Regenerate.
* emit-rtl.h (incoming_args): Change pops_args, size and
outgoing_args_size from int to poly_int64_pod.
* function.h (expr_status): Change x_pending_stack_adjust and
x_stack_pointer_delta from int to poly_int64.
(args_size::constant): Change from HOST_WIDE_INT to poly_int64.
(ARGS_SIZE_RTX): Update accordingly.
* calls.c (highest_outgoing_arg_in_use): Change from int to
unsigned int.
(stack_usage_watermark, stored_args_watermark): New variables.
(stack_region_maybe_used_p, mark_stack_region_used): New functions.
(emit_call_1): Change the stack_size and rounded_stack_size
parameters from HOST_WIDE_INT to poly_int64.  Track n_popped
as a poly_int64.
(save_fixed_argument_area): Check stack_usage_watermark.
(initialize_argument_information): Change old_pending_adj from
a HOST_WIDE_INT * to a poly_int64_pod *.
(compute_argument_block_size): Return the size as a poly_int64
rather than an int.
(finalize_must_preallocate): Track polynomial argument sizes.
(compute_argument_addresses): Likewise.
(internal_arg_pointer_based_exp): Track polynomial offsets.
(mem_overlaps_already_clobbered_arg_p): Rename to...
(mem_might_overlap_already_clobbered_arg_p): ...this and take the
size as a poly_uint64 rather than an unsigned HOST_WIDE_INT.
Check stored_args_watermark.
(load_register_parameters): Update accordingly.
(check_sibcall_argument_overlap_1): Likewise.
(combine_pending_stack_adjustment_and_call): Take the unadjusted
args size as a poly_int64 rather than an int.  Return a bool
indicating whether the optimization was possible and return
the new adjustment by reference.
(check_sibcall_argument_overlap): Track polynomial argument sizes.
Update stored_args_watermark.
(can_implement_as_sibling_call_p): Handle polynomial argument sizes.
(expand_call): Likewise.  Maintain stack_usage_watermark and
stored_args_watermark.  Update calls to
combine_pending_stack_adjustment_and_call.
(emit_library_call_value_1): Handle polynomial argument sizes.
Call stack_region_maybe_used_p and mark_stack_region_used.
Maintain stack_usage_watermark.
(store_one_arg): Likewise.  Update call to
mem_overlaps_already_clobbered_arg_p.
* config/arm/arm.c (arm_output_function_prologue): Add a cast to
HOST_WIDE_INT.
* config/avr/avr.c (avr_outgoing_args_size): Likewise.
* config/microblaze/microblaze.c (microblaze_function_prologue):
Likewise.
* config/cr16/cr16.c (cr16_return_pops_args): Update for new
TARGET_RETURN_POPS_ARGS interface.
(cr16_compute_frame, cr16_initial_elimination_offset): Add casts
to HOST_WIDE_INT.
* config/ft32/ft32.c (ft32_compute_frame): Likewise.
* config/i386/i386.c (ix86_return_pops_args): Update for new
TARGET_RETURN_POPS_ARGS interface.
(ix86_expand_split_stack_prologue): Add a cast to HOST_WIDE_INT.
* config/moxie/moxie.c (moxie_compute_frame): Likewise.
* config/m68k/m68k.c (m68k_return_pops_args): Update for new
TARGET_RETURN_POPS_ARGS interface.
* config/vax/vax.c (vax_return_pops_args): Likewise.
* config/pa/pa.h (STACK_POINTER_OFFSET): Add a cast to poly_int64.
(EXIT_IGNORE_STACK): Update reference to crtl->outgoing_args_size.
* config/arm/arm.h (CALLER_INTERWORKING_SLOT_SIZE): Likewise.
* config/powerpcspe/aix.h (STACK_DYNAMIC_OFFSET): Likewise.
* config/powerpcspe/darwin.h (STACK_DYNAMIC_OFFSET): Likewise.
* config/powerpcspe/powerpcspe.h (STACK_DYNAMIC_OFFSET): Likewise.
* config/rs6000/aix.h (STACK_DYNAMIC_OFFSET): Likewise.
* config/rs6000/darwin.h (STACK_DYNAMIC_OFFSET): Likewise.
* config/rs6000/rs6000.h (STACK_DYNAMIC_OFFSET): Likewise.
* 

[046/nnn] poly_int: instantiate_virtual_regs

2017-10-23 Thread Richard Sandiford
This patch makes the instantiate virtual regs pass track offsets
as poly_ints.
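
Most of the change is mechanical; the new ingredient is poly_int_rtx_p,
which generalizes CONST_INT_P by also accepting a CONST_POLY_INT and
returning the value through its second argument.  The pattern from
instantiate_virtual_regs_in_insn below:

  poly_int64 delta;
  if (poly_int_rtx_p (recog_data.operand[2], &delta))
    offset += delta;    /* was: offset += INTVAL (recog_data.operand[2]); */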


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* function.c (in_arg_offset, var_offset, dynamic_offset)
(out_arg_offset, cfa_offset): Change from int to poly_int64.
(instantiate_new_reg): Return the new offset as a poly_int64_pod
rather than a HOST_WIDE_INT.
(instantiate_virtual_regs_in_rtx): Track polynomial offsets.
(instantiate_virtual_regs_in_insn): Likewise.

Index: gcc/function.c
===
--- gcc/function.c  2017-10-23 17:18:53.834514759 +0100
+++ gcc/function.c  2017-10-23 17:18:59.743148042 +0100
@@ -1367,11 +1367,11 @@ initial_value_entry (int i, rtx *hreg, r
routines.  They contain the offsets of the virtual registers from their
respective hard registers.  */
 
-static int in_arg_offset;
-static int var_offset;
-static int dynamic_offset;
-static int out_arg_offset;
-static int cfa_offset;
+static poly_int64 in_arg_offset;
+static poly_int64 var_offset;
+static poly_int64 dynamic_offset;
+static poly_int64 out_arg_offset;
+static poly_int64 cfa_offset;
 
 /* In most machines, the stack pointer register is equivalent to the bottom
of the stack.  */
@@ -1418,10 +1418,10 @@ #define STACK_DYNAMIC_OFFSET(FNDECL)\
offset indirectly through the pointer.  Otherwise, return 0.  */
 
 static rtx
-instantiate_new_reg (rtx x, HOST_WIDE_INT *poffset)
+instantiate_new_reg (rtx x, poly_int64_pod *poffset)
 {
   rtx new_rtx;
-  HOST_WIDE_INT offset;
+  poly_int64 offset;
 
   if (x == virtual_incoming_args_rtx)
 {
@@ -1480,7 +1480,7 @@ instantiate_virtual_regs_in_rtx (rtx *lo
   if (rtx x = *loc)
{
  rtx new_rtx;
- HOST_WIDE_INT offset;
+ poly_int64 offset;
  switch (GET_CODE (x))
{
case REG:
@@ -1533,7 +1533,7 @@ safe_insn_predicate (int code, int opera
 static void
 instantiate_virtual_regs_in_insn (rtx_insn *insn)
 {
-  HOST_WIDE_INT offset;
+  poly_int64 offset;
   int insn_code, i;
   bool any_change = false;
   rtx set, new_rtx, x;
@@ -1572,7 +1572,8 @@ instantiate_virtual_regs_in_insn (rtx_in
 to the generic case is avoiding a new pseudo and eliminating a
 move insn in the initial rtl stream.  */
   new_rtx = instantiate_new_reg (SET_SRC (set), &offset);
-  if (new_rtx && offset != 0
+  if (new_rtx
+ && maybe_nonzero (offset)
  && REG_P (SET_DEST (set))
  && REGNO (SET_DEST (set)) > LAST_VIRTUAL_REGISTER)
{
@@ -1598,17 +1599,18 @@ instantiate_virtual_regs_in_insn (rtx_in
 
   /* Handle a plus involving a virtual register by determining if the
 operands remain valid if they're modified in place.  */
+  poly_int64 delta;
   if (GET_CODE (SET_SRC (set)) == PLUS
  && recog_data.n_operands >= 3
  && recog_data.operand_loc[1] == &XEXP (SET_SRC (set), 0)
  && recog_data.operand_loc[2] == &XEXP (SET_SRC (set), 1)
- && CONST_INT_P (recog_data.operand[2])
+ && poly_int_rtx_p (recog_data.operand[2], &delta)
  && (new_rtx = instantiate_new_reg (recog_data.operand[1], &offset)))
{
- offset += INTVAL (recog_data.operand[2]);
+ offset += delta;
 
  /* If the sum is zero, then replace with a plain move.  */
- if (offset == 0
+ if (known_zero (offset)
  && REG_P (SET_DEST (set))
  && REGNO (SET_DEST (set)) > LAST_VIRTUAL_REGISTER)
{
@@ -1686,7 +1688,7 @@ instantiate_virtual_regs_in_insn (rtx_in
 new_rtx = instantiate_new_reg (x, &offset);
  if (new_rtx == NULL)
continue;
- if (offset == 0)
+ if (known_zero (offset))
x = new_rtx;
  else
{
@@ -1711,7 +1713,7 @@ instantiate_virtual_regs_in_insn (rtx_in
 new_rtx = instantiate_new_reg (SUBREG_REG (x), &offset);
  if (new_rtx == NULL)
continue;
- if (offset != 0)
+ if (maybe_nonzero (offset))
{
  start_sequence ();
  new_rtx = expand_simple_binop


[045/nnn] poly_int: REG_ARGS_SIZE

2017-10-23 Thread Richard Sandiford
This patch adds new utility functions for manipulating REG_ARGS_SIZE
notes and allows the notes to carry polynomial as well as constant sizes.

The code was inconsistent about whether INT_MIN or HOST_WIDE_INT_MIN
should be used to represent an unknown size.  The patch uses
HOST_WIDE_INT_MIN throughout.
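
The new helpers make the common note manipulations one-liners; their
definitions are in the rtlanal.c hunk below.  Typical before/after at
a use site (sketch):

  /* Reading a note's value:  */
  poly_int64 args_size = get_args_size (note);   /* was: INTVAL (XEXP (note, 0)) */

  /* Attaching a note, asserting there isn't one already:  */
  add_args_size_note (insn, args_size);          /* was: add_reg_note + GEN_INT */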


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* rtl.h (get_args_size, add_args_size_note): New functions.
(find_args_size_adjust): Return a poly_int64 rather than a
HOST_WIDE_INT.
(fixup_args_size_notes): Likewise.  Make the same change to the
end_args_size parameter.
* rtlanal.c (get_args_size, add_args_size_note): New functions.
* builtins.c (expand_builtin_trap): Use add_args_size_note.
* calls.c (emit_call_1): Likewise.
* explow.c (adjust_stack_1): Likewise.
* cfgcleanup.c (old_insns_match_p): Update use of
find_args_size_adjust.
* combine.c (distribute_notes): Track polynomial arg sizes.
* dwarf2cfi.c (dw_trace_info): Change beg_true_args_size,
end_true_args_size, beg_delay_args_size and end_delay_args_size
from HOST_WIDE_INT to poly_int64.
(add_cfi_args_size): Take the args_size as a poly_int64 rather
than a HOST_WIDE_INT.
(notice_args_size, notice_eh_throw, maybe_record_trace_start)
(maybe_record_trace_start_abnormal, scan_trace, connect_traces): Track
polynomial arg sizes.
* emit-rtl.c (try_split): Use get_args_size.
* recog.c (peep2_attempt): Likewise.
* reload1.c (reload_as_needed): Likewise.
* expr.c (find_args_size_adjust): Return the adjustment as a
poly_int64 rather than a HOST_WIDE_INT.
(fixup_args_size_notes): Change end_args_size from a HOST_WIDE_INT
to a poly_int64 and change the return type in the same way.
(emit_single_push_insn): Track polynomial arg sizes.

Index: gcc/rtl.h
===
--- gcc/rtl.h   2017-10-23 17:16:55.754801166 +0100
+++ gcc/rtl.h   2017-10-23 17:18:57.862160702 +0100
@@ -3329,6 +3329,7 @@ extern rtx get_related_value (const_rtx)
 extern bool offset_within_block_p (const_rtx, HOST_WIDE_INT);
 extern void split_const (rtx, rtx *, rtx *);
 extern rtx strip_offset (rtx, poly_int64_pod *);
+extern poly_int64 get_args_size (const_rtx);
 extern bool unsigned_reg_p (rtx);
 extern int reg_mentioned_p (const_rtx, const_rtx);
 extern int count_occurrences (const_rtx, const_rtx, int);
@@ -3364,6 +3365,7 @@ extern int find_regno_fusage (const_rtx,
 extern rtx alloc_reg_note (enum reg_note, rtx, rtx);
 extern void add_reg_note (rtx, enum reg_note, rtx);
 extern void add_int_reg_note (rtx_insn *, enum reg_note, int);
+extern void add_args_size_note (rtx_insn *, poly_int64);
 extern void add_shallow_copy_of_reg_note (rtx_insn *, rtx);
 extern rtx duplicate_reg_note (rtx);
 extern void remove_note (rtx_insn *, const_rtx);
@@ -3954,8 +3956,8 @@ extern void emit_jump (rtx);
 /* In expr.c */
 extern rtx move_by_pieces (rtx, rtx, unsigned HOST_WIDE_INT,
   unsigned int, int);
-extern HOST_WIDE_INT find_args_size_adjust (rtx_insn *);
-extern int fixup_args_size_notes (rtx_insn *, rtx_insn *, int);
+extern poly_int64 find_args_size_adjust (rtx_insn *);
+extern poly_int64 fixup_args_size_notes (rtx_insn *, rtx_insn *, poly_int64);
 
 /* In expmed.c */
 extern void init_expmed (void);
Index: gcc/rtlanal.c
===
--- gcc/rtlanal.c   2017-10-23 17:18:53.836514583 +0100
+++ gcc/rtlanal.c   2017-10-23 17:18:57.862160702 +0100
@@ -937,6 +937,15 @@ strip_offset (rtx x, poly_int64_pod *off
   *offset_out = 0;
   return x;
 }
+
+/* Return the argument size in REG_ARGS_SIZE note X.  */
+
+poly_int64
+get_args_size (const_rtx x)
+{
+  gcc_checking_assert (REG_NOTE_KIND (x) == REG_ARGS_SIZE);
+  return rtx_to_poly_int64 (XEXP (x, 0));
+}
 
 /* Return the number of places FIND appears within X.  If COUNT_DEST is
zero, we do not count occurrences inside the destination of a SET.  */
@@ -2362,6 +2371,15 @@ add_int_reg_note (rtx_insn *insn, enum r
   datum, REG_NOTES (insn));
 }
 
+/* Add a REG_ARGS_SIZE note to INSN with value VALUE.  */
+
+void
+add_args_size_note (rtx_insn *insn, poly_int64 value)
+{
+  gcc_checking_assert (!find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX));
+  add_reg_note (insn, REG_ARGS_SIZE, gen_int_mode (value, Pmode));
+}
+
 /* Add a register note like NOTE to INSN.  */
 
 void
Index: gcc/builtins.c
===
--- gcc/builtins.c  2017-10-23 17:18:42.394520412 +0100
+++ gcc/builtins.c  2017-10-23 17:18:57.855161317 +0100
@@ -5027,7 +5027,7 @@ expand_builtin_trap (void)
 

[043/nnn] poly_int: frame allocations

2017-10-23 Thread Richard Sandiford
This patch converts the frame allocation code (mostly in function.c)
to use poly_int64 rather than HOST_WIDE_INT for frame offsets and
sizes.
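
One user-visible consequence, noted in the final.c entry below: with a
polynomial frame, -Wframe-larger-than can only check the part of the
size that is known at compile time.  A hedged sketch (the option
variables are named illustratively):

  poly_int64 size = get_frame_size ();
  if (warn_frame_larger_than
      && constant_lower_bound (size) > frame_larger_than_size)
    /* warn: even the minimum frame size exceeds the limit */;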


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* function.h (frame_space): Change start and length from HOST_WIDE_INT
to poly_int64.
(get_frame_size): Return the size as a poly_int64 rather than a
HOST_WIDE_INT.
(frame_offset_overflow): Take the offset as a poly_int64 rather
than a HOST_WIDE_INT.
(assign_stack_local_1, assign_stack_local, assign_stack_temp_for_type)
(assign_stack_temp): Likewise for the size.
* function.c (get_frame_size): Return a poly_int64 rather than
a HOST_WIDE_INT.
(frame_offset_overflow): Take the offset as a poly_int64 rather
than a HOST_WIDE_INT.
(try_fit_stack_local): Take the start, length and size as poly_int64s
rather than HOST_WIDE_INTs.  Return the offset as a poly_int64_pod
rather than a HOST_WIDE_INT.
(add_frame_space): Take the start and end as poly_int64s rather than
HOST_WIDE_INTs.
(assign_stack_local_1, assign_stack_local, assign_stack_temp_for_type)
(assign_stack_temp): Likewise for the size.
(temp_slot): Change size, base_offset and full_size from HOST_WIDE_INT
to poly_int64.
(find_temp_slot_from_address): Handle polynomial offsets.
(combine_temp_slots): Likewise.
* emit-rtl.h (rtl_data::x_frame_offset): Change from HOST_WIDE_INT
to poly_int64.
* cfgexpand.c (alloc_stack_frame_space): Return the offset as a
poly_int64 rather than a HOST_WIDE_INT.
(expand_one_stack_var_at): Take the offset as a poly_int64 rather
than a HOST_WIDE_INT.
(expand_stack_vars, expand_one_stack_var_1, expand_used_vars): Handle
polynomial frame offsets.
* config/m32r/m32r-protos.h (m32r_compute_frame_size): Take the size
as a poly_int64 rather than an int.
* config/m32r/m32r.c (m32r_compute_frame_size): Likewise.
* config/v850/v850-protos.h (compute_frame_size): Likewise.
* config/v850/v850.c (compute_frame_size): Likewise.
* config/xtensa/xtensa-protos.h (compute_frame_size): Likewise.
* config/xtensa/xtensa.c (compute_frame_size): Likewise.
* config/pa/pa-protos.h (pa_compute_frame_size): Likewise.
* config/pa/pa.c (pa_compute_frame_size): Likewise.
* explow.h (get_dynamic_stack_base): Take the offset as a poly_int64
rather than a HOST_WIDE_INT.
* explow.c (get_dynamic_stack_base): Likewise.
* final.c (final_start_function): Use the constant lower bound
of the frame size for -Wframe-larger-than.
* ira.c (do_reload): Adjust for new get_frame_size return type.
* lra.c (lra): Likewise.
* reload1.c (reload): Likewise.
* config/avr/avr.c (avr_asm_function_end_prologue): Likewise.
* config/pa/pa.h (EXIT_IGNORE_STACK): Likewise.
* rtlanal.c (get_initial_register_offset): Return the offset as
a poly_int64 rather than a HOST_WIDE_INT.

Index: gcc/function.h
===
--- gcc/function.h  2017-10-23 17:07:40.163546918 +0100
+++ gcc/function.h  2017-10-23 17:18:53.834514759 +0100
@@ -187,8 +187,8 @@ struct GTY(()) frame_space
 {
   struct frame_space *next;
 
-  HOST_WIDE_INT start;
-  HOST_WIDE_INT length;
+  poly_int64 start;
+  poly_int64 length;
 };
 
 struct GTY(()) stack_usage
@@ -571,19 +571,19 @@ extern void free_after_compilation (stru
 /* Return size needed for stack frame based on slots so far allocated.
This size counts from zero.  It is not rounded to STACK_BOUNDARY;
the caller may have to do that.  */
-extern HOST_WIDE_INT get_frame_size (void);
+extern poly_int64 get_frame_size (void);
 
 /* Issue an error message and return TRUE if frame OFFSET overflows in
the signed target pointer arithmetics for function FUNC.  Otherwise
return FALSE.  */
-extern bool frame_offset_overflow (HOST_WIDE_INT, tree);
+extern bool frame_offset_overflow (poly_int64, tree);
 
 extern unsigned int spill_slot_alignment (machine_mode);
 
-extern rtx assign_stack_local_1 (machine_mode, HOST_WIDE_INT, int, int);
-extern rtx assign_stack_local (machine_mode, HOST_WIDE_INT, int);
-extern rtx assign_stack_temp_for_type (machine_mode, HOST_WIDE_INT, tree);
-extern rtx assign_stack_temp (machine_mode, HOST_WIDE_INT);
+extern rtx assign_stack_local_1 (machine_mode, poly_int64, int, int);
+extern rtx assign_stack_local (machine_mode, poly_int64, int);
+extern rtx assign_stack_temp_for_type (machine_mode, poly_int64, tree);
+extern rtx assign_stack_temp (machine_mode, poly_int64);
 extern rtx assign_temp (tree, int, int);
 extern void update_temp_slot_address (rtx, rtx);

[044/nnn] poly_int: push_block/emit_push_insn

2017-10-23 Thread Richard Sandiford
This patch changes the "extra" parameters to push_block and
emit_push_insn from int to poly_int64.
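
The two zero tests in the diff read naturally in the poly2 model from
the 051 note:

  /* extra == 0 for every runtime x.  */
  static bool known_zero (poly2 v)    { return v.a == 0 && v.b == 0; }

  /* extra != 0 for some runtime x, i.e. the negation of the above.  */
  static bool maybe_nonzero (poly2 v) { return v.a != 0 || v.b != 0; }

so known_zero guards the fast path that adjusts by SIZE alone, and
maybe_nonzero guards the padding adjustments.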


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* expr.h (push_block, emit_push_insn): Change the "extra" parameter
from HOST_WIDE_INT to poly_int64.
* expr.c (push_block, emit_push_insn): Likewise.

Index: gcc/expr.h
===
--- gcc/expr.h  2017-10-23 17:18:43.842393134 +0100
+++ gcc/expr.h  2017-10-23 17:18:56.434286222 +0100
@@ -233,11 +233,11 @@ extern rtx emit_move_resolve_push (machi
 
 /* Push a block of length SIZE (perhaps variable)
and return an rtx to address the beginning of the block.  */
-extern rtx push_block (rtx, int, int);
+extern rtx push_block (rtx, poly_int64, int);
 
 /* Generate code to push something onto the stack, given its mode and type.  */
 extern bool emit_push_insn (rtx, machine_mode, tree, rtx, unsigned int,
-   int, rtx, int, rtx, rtx, int, rtx, bool);
+   int, rtx, poly_int64, rtx, rtx, int, rtx, bool);
 
 /* Extract the accessible bit-range from a COMPONENT_REF.  */
 extern void get_bit_range (poly_uint64_pod *, poly_uint64_pod *, tree,
Index: gcc/expr.c
===
--- gcc/expr.c  2017-10-23 17:18:47.661057448 +0100
+++ gcc/expr.c  2017-10-23 17:18:56.434286222 +0100
@@ -3865,19 +3865,19 @@ compress_float_constant (rtx x, rtx y)
otherwise, the padding comes at high addresses.  */
 
 rtx
-push_block (rtx size, int extra, int below)
+push_block (rtx size, poly_int64 extra, int below)
 {
   rtx temp;
 
   size = convert_modes (Pmode, ptr_mode, size, 1);
   if (CONSTANT_P (size))
 anti_adjust_stack (plus_constant (Pmode, size, extra));
-  else if (REG_P (size) && extra == 0)
+  else if (REG_P (size) && known_zero (extra))
 anti_adjust_stack (size);
   else
 {
   temp = copy_to_mode_reg (Pmode, size);
-  if (extra != 0)
+  if (maybe_nonzero (extra))
temp = expand_binop (Pmode, add_optab, temp,
 gen_int_mode (extra, Pmode),
 temp, 0, OPTAB_LIB_WIDEN);
@@ -3887,7 +3887,7 @@ push_block (rtx size, int extra, int bel
   if (STACK_GROWS_DOWNWARD)
 {
   temp = virtual_outgoing_args_rtx;
-  if (extra != 0 && below)
+  if (maybe_nonzero (extra) && below)
temp = plus_constant (Pmode, temp, extra);
 }
   else
@@ -3895,7 +3895,7 @@ push_block (rtx size, int extra, int bel
   if (CONST_INT_P (size))
temp = plus_constant (Pmode, virtual_outgoing_args_rtx,
  -INTVAL (size) - (below ? 0 : extra));
-  else if (extra != 0 && !below)
+  else if (maybe_nonzero (extra) && !below)
temp = gen_rtx_PLUS (Pmode, virtual_outgoing_args_rtx,
 negate_rtx (Pmode, plus_constant (Pmode, size,
   extra)));
@@ -4269,7 +4269,7 @@ memory_load_overlap (rtx x, rtx y, HOST_
 
 bool
 emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
-   unsigned int align, int partial, rtx reg, int extra,
+   unsigned int align, int partial, rtx reg, poly_int64 extra,
rtx args_addr, rtx args_so_far, int reg_parm_stack_space,
rtx alignment_pad, bool sibcall_p)
 {
@@ -4357,9 +4357,11 @@ emit_push_insn (rtx x, machine_mode mode
  /* Push padding now if padding above and stack grows down,
 or if padding below and stack grows up.
 But if space already allocated, this has already been done.  */
- if (extra && args_addr == 0
- && where_pad != PAD_NONE && where_pad != stack_direction)
-   anti_adjust_stack (GEN_INT (extra));
+ if (maybe_nonzero (extra)
+ && args_addr == 0
+ && where_pad != PAD_NONE
+ && where_pad != stack_direction)
+   anti_adjust_stack (gen_int_mode (extra, Pmode));
 
  move_by_pieces (NULL, xinner, INTVAL (size) - used, align, 0);
}
@@ -4480,9 +4482,11 @@ emit_push_insn (rtx x, machine_mode mode
   /* Push padding now if padding above and stack grows down,
 or if padding below and stack grows up.
 But if space already allocated, this has already been done.  */
-  if (extra && args_addr == 0
- && where_pad != PAD_NONE && where_pad != stack_direction)
-   anti_adjust_stack (GEN_INT (extra));
+  if (maybe_nonzero (extra)
+ && args_addr == 0
+ && where_pad != PAD_NONE
+ && where_pad != stack_direction)
+   anti_adjust_stack (gen_int_mode (extra, Pmode));
 
   /* If we make space by pushing it, we might as well push
 the real data.  Otherwise, we can leave OFFSET nonzero
@@ -4531,9 

[042/nnn] poly_int: reload1.c

2017-10-23 Thread Richard Sandiford
This patch makes a few small poly_int64 changes to reload1.c,
mostly related to eliminations.  Again, there's no real expectation
that reload will be used for targets that have polynomial-sized modes,
but it seemed easier to convert it anyway.
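
The elimination code only ever needs to ask "could these two offsets
differ?", which is may_ne; in the poly2 model from the 051 note:

  /* True if l != r for some runtime x, i.e. !must_eq (l, r).  */
  static bool may_ne (poly2 l, poly2 r) { return l.a != r.a || l.b != r.b; }

Using may_ne to disable an elimination keeps the conversion
conservative: an elimination survives only when its offsets are
provably equal.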


2017-10-23  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* reload1.c (elim_table): Change initial_offset, offset and
previous_offset from HOST_WIDE_INT to poly_int64_pod.
(offsets_at): Change the target array's element type from
HOST_WIDE_INT to poly_int64_pod.
(set_label_offsets, eliminate_regs_1, eliminate_regs_in_insn)
(elimination_costs_in_insn, update_eliminable_offsets)
(verify_initial_elim_offsets, set_offsets_for_label)
(init_eliminable_invariants): Update after above changes.

Index: gcc/reload1.c
===
--- gcc/reload1.c   2017-10-23 17:18:51.486721146 +0100
+++ gcc/reload1.c   2017-10-23 17:18:52.641619623 +0100
@@ -261,13 +261,13 @@ struct elim_table
 {
   int from;/* Register number to be eliminated.  */
   int to;  /* Register number used as replacement.  */
-  HOST_WIDE_INT initial_offset;/* Initial difference between values.  */
+  poly_int64_pod initial_offset; /* Initial difference between values.  */
   int can_eliminate;   /* Nonzero if this elimination can be done.  */
   int can_eliminate_previous;  /* Value returned by TARGET_CAN_ELIMINATE
   target hook in previous scan over insns
   made by reload.  */
-  HOST_WIDE_INT offset;/* Current offset between the two regs.  */
-  HOST_WIDE_INT previous_offset;/* Offset at end of previous insn.  */
+  poly_int64_pod offset;   /* Current offset between the two regs.  */
+  poly_int64_pod previous_offset; /* Offset at end of previous insn.  */
   int ref_outside_mem; /* "to" has been referenced outside a MEM.  */
   rtx from_rtx;/* REG rtx for the register to be eliminated.
   We cannot simply compare the number since
@@ -313,7 +313,7 @@ #define NUM_ELIMINABLE_REGS ARRAY_SIZE (
 
 static int first_label_num;
 static char *offsets_known_at;
-static HOST_WIDE_INT (*offsets_at)[NUM_ELIMINABLE_REGS];
+static poly_int64_pod (*offsets_at)[NUM_ELIMINABLE_REGS];
 
 vec<reg_equivs_t, va_gc> *reg_equivs;
 
@@ -2351,9 +2351,9 @@ set_label_offsets (rtx x, rtx_insn *insn
   where the offsets disagree.  */
 
for (i = 0; i < NUM_ELIMINABLE_REGS; i++)
- if (offsets_at[CODE_LABEL_NUMBER (x) - first_label_num][i]
- != (initial_p ? reg_eliminate[i].initial_offset
- : reg_eliminate[i].offset))
+ if (may_ne (offsets_at[CODE_LABEL_NUMBER (x) - first_label_num][i],
+ (initial_p ? reg_eliminate[i].initial_offset
+  : reg_eliminate[i].offset)))
reg_eliminate[i].can_eliminate = 0;
 
   return;
@@ -2436,7 +2436,7 @@ set_label_offsets (rtx x, rtx_insn *insn
   /* If we reach here, all eliminations must be at their initial
 offset because we are doing a jump to a variable address.  */
       for (p = reg_eliminate; p < &reg_eliminate[NUM_ELIMINABLE_REGS]; p++)
-   if (p->offset != p->initial_offset)
+   if (may_ne (p->offset, p->initial_offset))
  p->can_eliminate = 0;
   break;
 
@@ -2593,8 +2593,9 @@ eliminate_regs_1 (rtx x, machine_mode me
   We special-case the commonest situation in
   eliminate_regs_in_insn, so just replace a PLUS with a
   PLUS here, unless inside a MEM.  */
-   if (mem_mode != 0 && CONST_INT_P (XEXP (x, 1))
-   && INTVAL (XEXP (x, 1)) == - ep->previous_offset)
+   if (mem_mode != 0
+   && CONST_INT_P (XEXP (x, 1))
+   && must_eq (INTVAL (XEXP (x, 1)), -ep->previous_offset))
  return ep->to_rtx;
else
  return gen_rtx_PLUS (Pmode, ep->to_rtx,
@@ -3344,7 +3345,7 @@ eliminate_regs_in_insn (rtx_insn *insn,
   if (plus_cst_src)
 {
   rtx reg = XEXP (plus_cst_src, 0);
-  HOST_WIDE_INT offset = INTVAL (XEXP (plus_cst_src, 1));
+  poly_int64 offset = INTVAL (XEXP (plus_cst_src, 1));
 
   if (GET_CODE (reg) == SUBREG)
reg = SUBREG_REG (reg);
@@ -3364,7 +3365,7 @@ eliminate_regs_in_insn (rtx_insn *insn,
   increase the cost of the insn by replacing a simple REG
   with (plus (reg sp) CST).  So try only when we already
   had a PLUS before.  */
-   if (offset == 0 || plus_src)
+   if (known_zero (offset) || plus_src)
  {
rtx new_src = plus_constant (GET_MODE 
