Re: [PATCH] PR fortran/91785 -- Set locus for inquiry parameter

2019-10-01 Thread Jerry DeLisle

On 10/1/19 5:06 PM, Steve Kargl wrote:

> The attached patch has been tested on x86_64-*-freebsd.
> OK to commit?



Ok Steve,

Thanks,

Jerry

> The patch prevents an ICE by setting the locus of an
> inquiry parameter.
>
> 2019-10-01  Steven G. Kargl



Re: [PATCH] PR fortran/91784 -- Simplify EXPR_OP in conversion of constants

2019-10-01 Thread Jerry DeLisle

On 10/1/19 4:46 PM, Steve Kargl wrote:

> The attached patch has been tested on x86_64-*-freebsd.
> OK to commit?


OK, thanks Steve.



> In a previous patch, I specialized the simplification of
> an EXPR_OP to the case of an inserted parenthesis.  This
> was too restrictive as evidenced by the new test case.
> The patch simply does a simplification of an expression.
>
> 2019-10-01  Steven G. Kargl
>
> PR fortran/91784
> * simplify.c (gfc_convert_constant): Simplify expression if the
> expression type is EXPR_OP.
>
> 2019-10-01  Steven G. Kargl
>
> PR fortran/91784
> * gfortran.dg/pr91784.f90: New test.





Re: [PATCH] PR fortran/91942 -- Inquiry parameter cannot be IO tag

2019-10-01 Thread Jerry DeLisle

On 10/1/19 4:10 PM, Steve Kargl wrote:

> The attached patch has been tested on x86_64-*-freebsd.
> OK to commit?



OK Steve, thanks,

Jerry


Handle :: tokens in C for C2x

2019-10-01 Thread Joseph Myers
As part of adding [[]]-style attributes, C2x adds the token :: for use
in scoped attribute names.

This patch adds corresponding support for that token in C to GCC.  The
token is supported both for C2x and for older gnu* standards (on the
basis that extensions are normally supported in older gnu* versions;
people will expect to be able to use [[]] attributes, before C2x is
the default, without needing to use -std=gnu2x).

There are no cases in older C standards where the token : can be
followed by a token starting with : in syntactically valid sources;
the only cases the :: token could break in older standard C thus are
ones involving concatenation of pp-tokens where the result does not
end up as tokens (e.g., gets stringized).  In GNU C extensions, the
main case where :: might appear in existing sources is in asm
statements, and the C parser is thus made to handle it like two
consecutive : tokens, which the C++ parser already does.  A limited
test of various positionings of :: in asm statements is added to the
testsuite (in particular, to cover the syntax error when :: means too
many colons but a single : would be OK), but existing tests cover a
variety of styles there anyway.
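
The asm rule described above can be illustrated with a small stand-alone sketch. This is not the c-parser.c code: the token enum and `count_asm_sections` are hypothetical names, and the operands between colons are ignored. It only shows the idea that a `::` token advances the section count by two, and that one colon too many is reported as an error.

```c
#include <stddef.h>

/* Hypothetical token kinds, loosely modelled on CPP_COLON / CPP_SCOPE.  */
enum tok { TOK_COLON, TOK_SCOPE, TOK_OTHER };

/* Count the colon-separated sections opened by TOKS, treating TOK_SCOPE
   as two consecutive colons.  Return -1 as soon as more than
   MAX_SECTIONS colons have been seen.  */
int
count_asm_sections (const enum tok *toks, size_t n, int max_sections)
{
  int sections = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (toks[i] == TOK_COLON)
        sections += 1;
      else if (toks[i] == TOK_SCOPE)
        sections += 2;
      if (sections > max_sections)
        return -1;
    }
  return sections;
}
```

With a limit of three sections, the sequence `::` followed by `:` is accepted, while `:` `::` `:` is rejected because it amounts to four colons.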

Technically there are cases in Objective-C and OpenMP for which this
also changes how previously valid code is lexed: the objc-selector-arg
syntax allows multiple consecutive : tokens (although I don't think
they are particularly useful there), while OpenMP syntax includes
array section syntax such as [:] which, before :: was a token, could
also be written as [::> (there might be other OpenMP cases potentially
affected, I didn't check all the OpenMP syntax in detail).  I don't
think either of those cases affects the basis for supporting the ::
token in all -std=gnu* modes, or that there is any obvious need to
special-case handling of CPP_SCOPE tokens for those constructs the way
there is for asm statements.

cpp_avoid_paste, which determines when spaces need adding between
tokens in preprocessed output where there wouldn't otherwise be
whitespace between them (e.g. if stringized), already inserts space
between : and : unconditionally, rather than only for C++, so no
change is needed there (but a C2x test is added that such space is
indeed inserted).

Bootstrapped with no regressions on x86-64-pc-linux-gnu.  Applied to
mainline.

gcc/c:
2019-10-02  Joseph Myers  

* c-parser.c (c_parser_asm_statement): Handle CPP_SCOPE like two
CPP_COLON tokens.

gcc/testsuite:
2019-10-02  Joseph Myers  

* gcc.dg/asm-scope-1.c, gcc.dg/cpp/c11-scope-1.c,
gcc.dg/cpp/c17-scope-1.c, gcc.dg/cpp/c2x-scope-1.c,
gcc.dg/cpp/c2x-scope-2.c, gcc.dg/cpp/c90-scope-1.c,
gcc.dg/cpp/c94-scope-1.c, gcc.dg/cpp/c99-scope-1.c,
gcc.dg/cpp/gnu11-scope-1.c, gcc.dg/cpp/gnu17-scope-1.c,
gcc.dg/cpp/gnu89-scope-1.c, gcc.dg/cpp/gnu99-scope-1.c: New tests.

libcpp:
2019-10-02  Joseph Myers  

* include/cpplib.h (struct cpp_options): Add member scope.
* init.c (struct lang_flags, lang_defaults): Likewise.
(cpp_set_lang): Set scope member of pfile.
* lex.c (_cpp_lex_direct): Test CPP_OPTION (pfile, scope) not
CPP_OPTION (pfile, cplusplus) for creating CPP_SCOPE tokens.

Index: gcc/c/c-parser.c
===
--- gcc/c/c-parser.c	(revision 276414)
+++ gcc/c/c-parser.c	(working copy)
@@ -6411,8 +6411,10 @@
 
The form with asm-goto-operands is valid if and only if the
asm-qualifier-list contains goto, and is the only allowed form in that case.
-   Duplicate asm-qualifiers are not allowed.  */
+   Duplicate asm-qualifiers are not allowed.
 
+   The :: token is considered equivalent to two consecutive : tokens.  */
+
 static tree
 c_parser_asm_statement (c_parser *parser)
 {
@@ -6509,11 +6511,21 @@
   nsections = 3 + is_goto;
   for (section = 0; section < nsections; ++section)
 {
-  if (!c_parser_require (parser, CPP_COLON,
-is_goto
-? G_("expected %<:%>")
-: G_("expected %<:%> or %<)%>"),
-UNKNOWN_LOCATION, is_goto))
+  if (c_parser_next_token_is (parser, CPP_SCOPE))
+   {
+ ++section;
+ if (section == nsections)
+   {
+ c_parser_error (parser, "expected %<)%>");
+ goto error_close_paren;
+   }
+ c_parser_consume_token (parser);
+   }
+  else if (!c_parser_require (parser, CPP_COLON,
+ is_goto
+ ? G_("expected %<:%>")
+ : G_("expected %<:%> or %<)%>"),
+ UNKNOWN_LOCATION, is_goto))
goto error_close_paren;
 
   /* Once past any colon, we're no longer a simple asm.  */
@@ -6520,6 +6532,7 @@
   simple = false;
 
   if ((!c_parser_next_token_is (parser, CPP_COLON)
+  

[PATCH] PR fortran/91785 -- Set locus for inquiry parameter

2019-10-01 Thread Steve Kargl
The attached patch has been tested on x86_64-*-freebsd.
OK to commit?

The patch prevents an ICE by setting the locus of an
inquiry parameter.

2019-10-01  Steven G. Kargl

Index: gcc/fortran/primary.c
===
--- gcc/fortran/primary.c	(revision 276426)
+++ gcc/fortran/primary.c	(working copy)
@@ -2331,6 +2331,8 @@ gfc_match_varspec (gfc_expr *primary, int equiv_flag, 
 
   if (tmp && tmp->type == REF_INQUIRY)
 	{
+	  if (!primary->where.lb || !primary->where.nextc)
+	primary->where = gfc_current_locus;
 	  gfc_simplify_expr (primary, 0);
 
 	  if (primary->expr_type == EXPR_CONSTANT)
Index: gcc/testsuite/gfortran.dg/pr91785.f90
===
--- gcc/testsuite/gfortran.dg/pr91785.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr91785.f90	(working copy)
@@ -0,0 +1,8 @@
+! { dg-do compile }
+! PR fortran/91785
+! Code contributed by Gerhard Steinmetz
+program p
+   complex :: a(*)   ! { dg-error "Assumed size array at" }
+   real :: b(2)
+   b = a%im  ! { dg-error "upper bound in the last dimension" }
+end
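
The guard added in the primary.c hunk follows a common "fill in a location only if none is set" pattern. A minimal sketch, assuming much-reduced stand-ins for gfortran's types: `locus_sketch`, `expr_sketch` and `ensure_locus` are hypothetical names, not gfortran API.

```c
#include <stddef.h>

/* Hypothetical, much-reduced stand-ins for gfortran's locus and gfc_expr.  */
struct locus_sketch { const char *nextc; const void *lb; };
struct expr_sketch { struct locus_sketch where; };

/* Give E a usable location if it does not have one yet; an existing
   location is left untouched.  */
void
ensure_locus (struct expr_sketch *e, const struct locus_sketch *current)
{
  if (!e->where.lb || !e->where.nextc)
    e->where = *current;
}
```

Any later diagnostic that reads the expression's location then has a valid line buffer and column to point at.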


Re: [PATCH], V4, patch #4.1: Enable prefixed/pc-rel addressing (revised)

2019-10-01 Thread Segher Boessenkool
Hi Mike,

On Mon, Sep 30, 2019 at 10:12:54AM -0400, Michael Meissner wrote:
> I needed
> to add a second memory constraint ('eM') that prevents using a prefixed
> address.

Do we need both em and eM?  Do we ever want to allow prefixed insns but not
pcrel?  Or, alternatively, this only uses eM for some insns where the "p"
prefix trick won't work (because there are multiple insns in the template);
we could solve that some other way (by inserting the p's manually for
example).

But what should inline asm users do wrt prefixed/pcrel memory?  Should
they not use prefixed memory at all?  That means for asm we should always
disallow prefixed for "m".

Having both em and eM names is a bit confusing (which is what?)  The eM
one should not be documented in the user manual, probably.

Maybe just using wY here will work just as well?  That is also not ideal
of course, but we already have that one anyway.

> 4) In the previous patch, I missed setting the prefixed size and non prefixed
> size for mov_ppc64 in the insn.  This pattern is used for moving PTImode
> in GPR registers (on non-VSX systems, it would move TImode also).  By the time
> it gets to final, it will have been split, but it is still useful to get the
> sizes correct before the mode is split.

So that is a separate patch?  Please send it as one, then?

> +  /* The LWA instruction uses the DS-form format where the bottom two bits of
> + the offset must be 0.  The prefixed PLWA does not have this
> + restriction.  */
> +  if (TARGET_PREFIXED_ADDR
> +  && address_is_prefixed (addr, DImode, NON_PREFIXED_DS))
> +return true;

Should TARGET_PREFIXED_ADDR be part of address_is_prefixed, instead of
part of all its callers?
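
One way to picture the suggestion is a helper that performs the target check itself, so that no caller can forget it. This is a sketch under stated assumptions only: the structs and both functions are hypothetical, and the "low two bits" test merely stands in for whatever address_is_prefixed really checks.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the target flag and an address.  */
struct target { bool prefixed_addr; };
struct address { long offset; };

/* Raw test: does the offset break the DS-form rule that the bottom two
   bits must be zero?  */
static bool
offset_needs_prefix (const struct address *addr)
{
  return (addr->offset & 3) != 0;
}

/* Variant with the target check folded in, so callers need not repeat it.  */
bool
address_is_prefixed_sketch (const struct target *t, const struct address *addr)
{
  if (!t->prefixed_addr)
    return false;
  return offset_needs_prefix (addr);
}
```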

> +;; Return 1 if op is a memory operand that is not prefixed.
> +(define_predicate "non_prefixed_memory"
> +  (match_code "mem")
> +{
> +  if (!memory_operand (op, mode))
> +return false;
> +
> +  enum insn_form iform
> += address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> +
> +  return (iform != INSN_FORM_PREFIXED_NUMERIC
> +  && iform != INSN_FORM_PCREL_LOCAL
> +  && iform != INSN_FORM_BAD);
> +})
> +
> +(define_predicate "non_pcrel_memory"
> +  (match_code "mem")
> +{
> +  if (!memory_operand (op, mode))
> +return false;
> +
> +  enum insn_form iform
> += address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> +
> +  return (iform != INSN_FORM_PCREL_EXTERNAL
> +  && iform != INSN_FORM_PCREL_LOCAL
> +  && iform != INSN_FORM_BAD);
> +})

Why does non_prefixed_memory not check INSN_FORM_PCREL_EXTERNAL?  Why does
non_prefixed_memory not use non_pcrel_memory, instead of open-coding it?

What is INSN_FORM_BAD about, in both functions?

> +;; Return 1 if op is either a register operand or a memory operand that does
> +;; not use a PC-relative address.
> +(define_predicate "reg_or_non_pcrel_memory"
> +  (match_code "reg,subreg,mem")
> +{
> +  if (REG_P (op) || SUBREG_P (op))
> +return register_operand (op, mode);
> +
> +  return non_pcrel_memory (op, mode);
> +})

Why do we need this predicate?  Should it use register_operand like this,
or should it use gpc_reg_operand?

> +  bool pcrel_p = TARGET_PCREL && pcrel_local_address (addr, Pmode);

Similar as above: should TARGET_PCREL be part of pcrel_local_address?

> @@ -6860,6 +6904,12 @@ rs6000_split_vec_extract_var (rtx dest,
>   systems.  */
>if (MEM_P (src))
>  {
> +  /* If this is a PC-relative address, we would need another register to
> +  hold the address of the vector along with the variable offset.  The
> +  callers should use reg_or_non_pcrel_memory to make sure we don't
> +  get a PC-relative address here.  */

I don't understand this comment, nor the problem.  Please expand?

> +  /* Allow prefixed instructions if supported.  If the bottom two bits of the
> + offset are non-zero, we could use a prefixed instruction (which does not
> + have the DS-form constraint that the traditional instruction had) instead
> + of forcing the unaligned offset to a GPR.  */
> +  if (address_is_prefixed (addr, mode, NON_PREFIXED_DS))
> +return true;

Here (and for DQ) you aren't testing TARGET_PREFIXED?

> @@ -7371,7 +7435,7 @@ mem_operand_gpr (rtx op, machine_mode mo
> causes a wrap, so test only the low 16 bits.  */
>  offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
>  
> -  return offset + 0x8000 < 0x10000u - extra;
> +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);

This is a separate patch (and pre-approved).

> @@ -7404,7 +7475,7 @@ mem_operand_ds_form (rtx op, machine_mod
> causes a wrap, so test only the low 16 bits.  */
>  offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
>  
> -  return offset + 0x8000 < 0x10000u - extra;
> +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);

Together with this.
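
For readers following the arithmetic in the two hunks above, here is a minimal sketch of the sign-extension trick and the range test. The function names are hypothetical, 64-bit offsets are assumed, and this is not the definition of SIGNED_16BIT_OFFSET_EXTRA_P from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of OFFSET.  */
int64_t
sext16 (int64_t offset)
{
  return ((offset & 0xffff) ^ 0x8000) - 0x8000;
}

/* True if -0x8000 <= OFFSET and OFFSET + EXTRA <= 0x7fff, i.e. both the
   first and the last byte accessed are reachable with a signed 16-bit
   displacement.  Biasing by 0x8000 and comparing as unsigned folds the
   two bounds into one comparison.  */
bool
fits_signed16_extra (int64_t offset, unsigned int extra)
{
  return (uint64_t) offset + 0x8000 < 0x10000u - extra;
}
```

For example, an offset of 0x7ffb with 4 extra bytes still fits, while 0x7fff with 4 extra bytes does not.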

> -  offset += 0x8000;
> -  return offset < 0x10000 - extra;
> +  if (TARGET_PREFIXED_ADDR)
> +return 

[PATCH] PR fortran/91784 -- Simplify EXPR_OP in conversion of constants

2019-10-01 Thread Steve Kargl
The attached patch has been tested on x86_64-*-freebsd.
OK to commit?

In a previous patch, I specialized the simplification of
an EXPR_OP to the case of an inserted parenthesis.  This
was too restrictive as evidenced by the new test case.
The patch simply does a simplification of an expression.
 
2019-10-01  Steven G. Kargl  

PR fortran/91784
* simplify.c (gfc_convert_constant): Simplify expression if the
expression type is EXPR_OP.

2019-10-01  Steven G. Kargl  

PR fortran/91784
* gfortran.dg/pr91784.f90: New test.

-- 
Steve
Index: gcc/fortran/simplify.c
===
--- gcc/fortran/simplify.c	(revision 276426)
+++ gcc/fortran/simplify.c	(working copy)
@@ -8508,10 +8508,10 @@ gfc_convert_constant (gfc_expr *e, bt type, int kind)
 	{
 	  if (c->expr->expr_type == EXPR_ARRAY)
 		tmp = gfc_convert_constant (c->expr, type, kind);
-	  else if (c->expr->expr_type == EXPR_OP
-		   && c->expr->value.op.op == INTRINSIC_PARENTHESES)
+	  else if (c->expr->expr_type == EXPR_OP)
 		{
-		  gfc_simplify_expr (c->expr, 1);
+		  if (!gfc_simplify_expr (c->expr, 1))
+		return &gfc_bad_expr;
 		  tmp = f (c->expr, kind);
 		}
 	  else
Index: gcc/testsuite/gfortran.dg/pr91784.f90
===
--- gcc/testsuite/gfortran.dg/pr91784.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr91784.f90	(working copy)
@@ -0,0 +1,9 @@
+! { dg-do run }
+! PR fortran/91784
+! Code originally contributed by Gerhard Steinmetz
+program p
+   complex :: x(1)
+   x = (1.0, 2.0) * [real :: -(3.0 + 4.0)]
+   if (int(real(x(1))) /= -7) stop 1
+   if (int(aimag(x(1))) /= -14) stop 2
+end
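
The idea behind the simplify.c change, simplifying any operator node before converting it and bailing out if that fails, can be sketched on a toy expression tree. Everything here is hypothetical (the `expr` struct, `simplify`, `convert_element`); it mirrors the `-(3.0 + 4.0)` element from the test case but is not gfortran code.

```c
#include <stdbool.h>
#include <stddef.h>

enum kind { K_CONST, K_NEG, K_ADD };

struct expr
{
  enum kind kind;
  double value;              /* Valid when kind == K_CONST.  */
  struct expr *lhs, *rhs;    /* Operands; rhs is unused for K_NEG.  */
};

/* Fold E to a constant in place.  Return false if an operand is missing.  */
bool
simplify (struct expr *e)
{
  if (e->kind == K_CONST)
    return true;
  if (!e->lhs || !simplify (e->lhs))
    return false;
  if (e->kind == K_NEG)
    e->value = -e->lhs->value;
  else
    {
      if (!e->rhs || !simplify (e->rhs))
        return false;
      e->value = e->lhs->value + e->rhs->value;
    }
  e->kind = K_CONST;
  return true;
}

/* Convert element E to an int, simplifying any operator node first.
   Return false (the analogue of a "bad expression") on failure.  */
bool
convert_element (struct expr *e, int *out)
{
  if (e->kind != K_CONST && !simplify (e))
    return false;
  *out = (int) e->value;
  return true;
}
```

Restricting the first branch to a single operator kind, as the earlier version of the patch did for parentheses, would leave nodes such as the negation above unconverted.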


[PATCH] PR fortran/91942 -- Inquiry parameter cannot be IO tag

2019-10-01 Thread Steve Kargl
The attached patch has been tested on x86_64-*-freebsd.
OK to commit?

An inquiry parameter cannot appear in a file I/O statement
as a tag (see new testcase).  The patch re-arranges the code
to prevent an ICE due to 'result->symtree = NULL;', checks
for a constant in a variable definition context, and takes
a different exit route to get better error messages.

2019-10-01  Steven G. Kargl  

PR fortran/91942
* io.c (match_vtag): Check for non-NULL result->symtree.
(match_out_tag): Check for invalid constant due to inquiry parameter.
(match_filepos): Instead of a syntax error, go to cleanup to get better
error messages.


2019-10-01  Steven G. Kargl  

PR fortran/91942
* gfortran.dg/pr91587.f90: Update dg-error regex.
* gfortran.dg/pr91942.f90: New test.
-- 
Steve
Index: gcc/fortran/io.c
===
--- gcc/fortran/io.c	(revision 276426)
+++ gcc/fortran/io.c	(working copy)
@@ -1482,25 +1482,30 @@ match_vtag (const io_tag *tag, gfc_expr **v)
   return MATCH_ERROR;
 }
 
-  if (result->symtree->n.sym->attr.intent == INTENT_IN)
+  if (result->symtree)
 {
-  gfc_error ("Variable %s cannot be INTENT(IN) at %C", tag->name);
-  gfc_free_expr (result);
-  return MATCH_ERROR;
-}
+  bool impure;
 
-  bool impure = gfc_impure_variable (result->symtree->n.sym);
-  if (impure && gfc_pure (NULL))
-{
-  gfc_error ("Variable %s cannot be assigned in PURE procedure at %C",
-		 tag->name);
-  gfc_free_expr (result);
-  return MATCH_ERROR;
-}
+  if (result->symtree->n.sym->attr.intent == INTENT_IN)
+	{
+	  gfc_error ("Variable %s cannot be INTENT(IN) at %C", tag->name);
+	  gfc_free_expr (result);
+	  return MATCH_ERROR;
+	}
 
-  if (impure)
-gfc_unset_implicit_pure (NULL);
+  impure = gfc_impure_variable (result->symtree->n.sym);
+  if (impure && gfc_pure (NULL))
+	{
+	  gfc_error ("Variable %s cannot be assigned in PURE procedure at %C",
+		 tag->name);
+	  gfc_free_expr (result);
+	  return MATCH_ERROR;
+	}
 
+  if (impure)
+	gfc_unset_implicit_pure (NULL);
+}
+
   *v = result;
   return MATCH_YES;
 }
@@ -1515,8 +1520,17 @@ match_out_tag (const io_tag *tag, gfc_expr **result)
 
   m = match_vtag (tag, result);
   if (m == MATCH_YES)
-gfc_check_do_variable ((*result)->symtree);
+{
+  if ((*result)->symtree)
+	gfc_check_do_variable ((*result)->symtree);
 
+  if ((*result)->expr_type == EXPR_CONSTANT)
+	{
+	  gfc_error ("Expecting a variable at %L", &(*result)->where);
+	  return MATCH_ERROR;
+	}
+}
+
   return m;
 }
 
@@ -2845,7 +2859,7 @@ match_filepos (gfc_statement st, gfc_exec_op op)
 
   m = match_file_element (fp);
   if (m == MATCH_ERROR)
-goto syntax;
+goto cleanup;
   if (m == MATCH_NO)
 {
       m = gfc_match_expr (&fp->unit);
Index: gcc/testsuite/gfortran.dg/pr91587.f90
===
--- gcc/testsuite/gfortran.dg/pr91587.f90	(revision 276426)
+++ gcc/testsuite/gfortran.dg/pr91587.f90	(working copy)
@@ -2,9 +2,9 @@
 ! PR fortran/91587
 ! Code contributed by Gerhard Steinmetz
 program p
-   backspace(err=!)  ! { dg-error "Syntax error in" }
-   flush(err=!)  ! { dg-error "Syntax error in" }
-   rewind(err=!) ! { dg-error "Syntax error in" }
+   backspace(err=!)  ! { dg-error "Invalid value for" }
+   flush(err=!)  ! { dg-error "Invalid value for" }
+   rewind(err=!) ! { dg-error "Invalid value for" }
 end
 
 subroutine bar   ! An other matcher runs, and gives a different error.
Index: gcc/testsuite/gfortran.dg/pr91942.f90
===
--- gcc/testsuite/gfortran.dg/pr91942.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr91942.f90	(working copy)
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! PR fortran/91942
+! Code contributed by Gerhard Steinmetz
+program p
+   integer :: i
+   backspace (iostat=i%kind) ! { dg-error "Expecting a variable at" }
+   endfile (iostat=i%kind) ! { dg-error "Expecting END PROGRAM" }
+   flush (iostat=i%kind) ! { dg-error "Expecting a variable at" }
+   rewind (iostat=i%kind) ! { dg-error "Expecting a variable at" }
+end


Re: [LRA] Don't make eliminable registers live (PR91957)

2019-10-01 Thread Jeff Law
On 10/1/19 4:39 PM, Richard Sandiford wrote:
> One effect of https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00802.html
> was to strengthen the sanity check in lra_assigns so that it checks
> whether reg_renumber is consistent with the whole conflict set.
> This duly tripped on csky for a pseudo that had been allocated
> to the eliminated frame pointer.  (csky doesn't have a separate
> hard frame pointer.)
> 
> lra-lives uses:
> 
> /* Set of hard regs (except eliminable ones) currently live.  */
> static HARD_REG_SET hard_regs_live;
> 
> to track the set of live directly-referenced hard registers, and it
> correctly implements the exclusion when setting up the initial set:
> 
>   hard_regs_live &= ~eliminable_regset;
> 
> But later calls to make_hard_regno_live and make_hard_regno_dead
> would process eliminable registers like other registers, recording
> conflicts for them and potentially making them live.  (Note that
> after r266086, make_hard_regno_dead adds conflicts for registers
> that are already marked dead.)  I think this would have had the
> effect of pessimising targets without a separate hard frame pointer.
> 
> Tested on aarch64-linux-gnu (where it probably doesn't do much)
> and against the testcase on csky-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2019-10-01  Richard Sandiford  
> 
> gcc/
>   PR middle-end/91957
>   * lra-lives.c (make_hard_regno_dead): Don't record conflicts for
>   eliminable registers.
>   (make_hard_regno_live): Likewise, and don't make them live.
OK
jeff


[LRA] Don't make eliminable registers live (PR91957)

2019-10-01 Thread Richard Sandiford
One effect of https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00802.html
was to strengthen the sanity check in lra_assigns so that it checks
whether reg_renumber is consistent with the whole conflict set.
This duly tripped on csky for a pseudo that had been allocated
to the eliminated frame pointer.  (csky doesn't have a separate
hard frame pointer.)

lra-lives uses:

/* Set of hard regs (except eliminable ones) currently live.  */
static HARD_REG_SET hard_regs_live;

to track the set of live directly-referenced hard registers, and it
correctly implements the exclusion when setting up the initial set:

  hard_regs_live &= ~eliminable_regset;

But later calls to make_hard_regno_live and make_hard_regno_dead
would process eliminable registers like other registers, recording
conflicts for them and potentially making them live.  (Note that
after r266086, make_hard_regno_dead adds conflicts for registers
that are already marked dead.)  I think this would have had the
effect of pessimising targets without a separate hard frame pointer.

Tested on aarch64-linux-gnu (where it probably doesn't do much)
and against the testcase on csky-linux-gnu.  OK to install?

Richard


2019-10-01  Richard Sandiford  

gcc/
PR middle-end/91957
* lra-lives.c (make_hard_regno_dead): Don't record conflicts for
eliminable registers.
(make_hard_regno_live): Likewise, and don't make them live.

Index: gcc/lra-lives.c
===
--- gcc/lra-lives.c 2019-10-01 09:55:35.146088630 +0100
+++ gcc/lra-lives.c 2019-10-01 23:36:02.545946185 +0100
@@ -281,7 +281,8 @@ update_pseudo_point (int regno, int poin
 make_hard_regno_live (int regno)
 {
   lra_assert (HARD_REGISTER_NUM_P (regno));
-  if (TEST_HARD_REG_BIT (hard_regs_live, regno))
+  if (TEST_HARD_REG_BIT (hard_regs_live, regno)
+  || TEST_HARD_REG_BIT (eliminable_regset, regno))
 return;
   SET_HARD_REG_BIT (hard_regs_live, regno);
   sparseset_set_bit (start_living, regno);
@@ -295,6 +296,9 @@ make_hard_regno_live (int regno)
 static void
 make_hard_regno_dead (int regno)
 {
+  if (TEST_HARD_REG_BIT (eliminable_regset, regno))
+return;
+
   lra_assert (HARD_REGISTER_NUM_P (regno));
   unsigned int i;
   EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
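
The effect of the make_hard_regno_live change can be shown with plain bitmasks. This is a minimal sketch, assuming at most 64 hard registers; `live_state` and `make_live` are hypothetical names rather than LRA's HARD_REG_SET interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register sets as 64-bit masks, one bit per hard register.  */
struct live_state
{
  uint64_t hard_regs_live;
  uint64_t eliminable;
};

/* Mark REGNO (assumed < 64) live unless it is already live or is
   eliminable.  Return true if the live set changed.  */
bool
make_live (struct live_state *s, unsigned regno)
{
  uint64_t bit = UINT64_C (1) << regno;
  if ((s->hard_regs_live | s->eliminable) & bit)
    return false;
  s->hard_regs_live |= bit;
  return true;
}
```

Because eliminable registers never enter the live set, nothing downstream records conflicts for them.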


Re: [PATCH] Fix PR fortran/91716

2019-10-01 Thread Steve Kargl
On Tue, Oct 01, 2019 at 07:58:37PM +0000, Bernd Edlinger wrote:
> On 9/13/19 12:16 PM, Janne Blomqvist wrote:
> > On Fri, Sep 13, 2019 at 1:07 PM Bernd Edlinger
> >  wrote:
> >>
> >> Hi,
> >>
> >> this fixes a test case where a short string constant is put in a larger 
> >> memory object.
> >>
> >> The consistency check in varasm.c is failed because both types should 
> >> agree.
> >>
> >> Since the failed assertion is just a gcc_checking_assert I think a 
> >> back-port of this fix
> >> to the gcc-9 branch will not be necessary.
> >>
> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> >> Is it OK for trunk?
> >>
> >> Thanks
> >> Bernd.
> > 
> > Ok.
> > 
> 
> Well, I have mistakenly assumed that this triggers only a "checking" assert,
> but it turned out that is not the case, as written in last comment in the BZ,
> immediately after that gcc_checking_assert, there is a gcc_assert, and also
> an ICE in the gcc-9 branch.  The same patch fixes also the second problem,
> and survives reg-bootstrap and testing on x86_64-pc-linux-gnu as expected.
> 
> 
> So I would like to ask at this time, if it is also OK for gcc-9 ?
> 
> 

If you're backporting a patch from trunk to 9-branch, then
I think it is ok to commit.

-- 
Steve


[RFC][Fortran,patch] %C error diagnostic location

2019-10-01 Thread Tobias Burnus

Hi all,

my feeling is that %C locations are always off by one, e.g., showing the 
(1) under the last white-space character before the place where the 
error occurred – the match starts at the character after the 
gfc_current_locus.


That bothered me for a while – but today, I was wondering whether one 
shouldn't simply bump the %C location by one – such that it shows at the 
first wrong character and not at the last okay character.


What do you think?


Another observation (unfixed): If gfortran buffers the error, the %C 
does not seem to get resolved at gfc_{error,warning} time but at the 
time when the buffer is flushed – which will have a reset error location.


Cheers,

Tobias


	* error.c (error_print, gfc_format_decoder): Fix off-by-one issue with %C.

diff --git a/gcc/fortran/error.c b/gcc/fortran/error.c
index a0ce7a6b190..815cae9d7e7 100644
--- a/gcc/fortran/error.c
+++ b/gcc/fortran/error.c
@@ -618,12 +618,18 @@ error_print (const char *type, const char *format0, va_list argp)
 	  {
 		l2 = loc;
 		arg[pos].u.stringval = "(2)";
+		/* Point %C first offending character not the last good one. */
+		if (arg[pos].type == TYPE_CURRENTLOC)
+		  l2->nextc++;
 	  }
 	else
 	  {
 		l1 = loc;
 		have_l1 = 1;
 		arg[pos].u.stringval = "(1)";
+		/* Point %C first offending character not the last good one. */
+		if (arg[pos].type == TYPE_CURRENTLOC)
+		  l1->nextc++;
 	  }
 	break;
 
@@ -963,6 +969,9 @@ gfc_format_decoder (pretty_printer *pp, text_info *text, const char *spec,
 	  loc = va_arg (*text->args_ptr, locus *);
 	gcc_assert (loc->nextc - loc->lb->line >= 0);
 	unsigned int offset = loc->nextc - loc->lb->line;
+	if (*spec == 'C')
+	  /* Point %C first offending character not the last good one. */
+	  offset++;
 	/* If location[0] != UNKNOWN_LOCATION means that we already
 	   processed one of %C/%L.  */
 	int loc_num = text->get_location (0) == UNKNOWN_LOCATION ? 0 : 1;
@@ -1400,7 +1409,7 @@ gfc_internal_error (const char *gmsgid, ...)
 void
 gfc_clear_error (void)
 {
-  error_buffer.flag = 0;
+  error_buffer.flag = false;
   warnings_not_errors = false;
   gfc_clear_pp_buffer (pp_error_buffer);
 }
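
To make the proposed adjustment concrete, here is a small sketch of the column computation. `marker_column` is a hypothetical helper, not part of error.c, and clamping to the end of the line is an extra assumption of this sketch that the patch itself does not make.

```c
#include <stddef.h>
#include <string.h>

/* Return the 0-based column at which to print the "(1)" marker.  NEXTC is
   the offset of the last character already matched; when BUMP is nonzero
   the marker moves one column right, to the first unmatched character,
   but never past the end of LINE.  */
size_t
marker_column (const char *line, size_t nextc, int bump)
{
  size_t len = strlen (line);
  size_t col = nextc + (bump ? 1 : 0);
  return col > len ? len : col;
}
```

For `backspace(err=!)` with the matcher stopped at the `=` (offset 13), the unadjusted marker sits under `=`, while the bumped marker sits under the offending `!`.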


[committed] Support prefixes in diagnostic_show_locus

2019-10-01 Thread David Malcolm
Previously, diagnostic_show_locus saved and restored the pretty_printer's
prefix, clearing it for the duration of the call.

I have a patch kit in development that can benefit from applying a prefix
to the output of d_s_l, so this patch adds support to d_s_l for printing
such prefixes.

It moves the save and restore of the pp's prefix from d_s_l to all of its
callers, and updates diagnostic-show-locus.c to properly handle prefixes.

Successfully bootstrapped on x86_64-pc-linux-gnu.

Committed to trunk as r276433.
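
The caller-side pattern that results from moving the save and restore out of d_s_l looks roughly like the sketch below. The `printer` struct and all three functions are hypothetical stand-ins; the real code uses pp_take_prefix and pp_set_prefix on a pretty_printer.

```c
#include <stddef.h>

/* Hypothetical, minimal printer holding a borrowed prefix string.  */
struct printer { const char *prefix; };

/* Detach and return the current prefix, leaving the printer without one.  */
const char *
take_prefix (struct printer *pp)
{
  const char *saved = pp->prefix;
  pp->prefix = NULL;
  return saved;
}

/* Stand-in for diagnostic_show_locus: reports whether a prefix was active
   while it ran.  */
int
show_locus_sees_prefix (const struct printer *pp)
{
  return pp->prefix != NULL;
}

/* Caller-side pattern: clear the prefix around the call, then restore it.  */
int
finalize_without_prefix (struct printer *pp)
{
  const char *saved = take_prefix (pp);
  int saw = show_locus_sees_prefix (pp);
  pp->prefix = saved;
  return saw;
}
```

A caller that wants the prefix to appear on every source line simply skips the take and restore steps and calls the locus printer directly.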

gcc/c-family/ChangeLog:
* c-opts.c (c_diagnostic_finalizer): Temporarily clear prefix when
calling diagnostic_show_locus, rather than destroying it afterwards.

gcc/ChangeLog:
* diagnostic-show-locus.c (layout::print_gap_in_line_numbering):
Call pp_emit_prefix.
(layout::print_source_line): Likewise.
(layout::start_annotation_line): Likewise.
(diagnostic_show_locus): Remove call to temporarily clear the
prefix.
(selftest::test_one_liner_fixit_remove): Add test coverage for the
interaction of pp_set_prefix with rulers and fix-it hints.
* diagnostic.c (default_diagnostic_finalizer): Temporarily clear
prefix when calling diagnostic_show_locus, rather than destroying
it afterwards.
(print_parseable_fixits): Temporarily clear prefix.
* pretty-print.c (pp_format): Save and restore line_length, rather
than assuming it is zero.
(pp_output_formatted_text): Remove assertion that line_length is
zero.

gcc/fortran/ChangeLog:
* error.c (gfc_diagnostic_starter): Clear the prefix before
calling diagnostic_show_locus.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic_group_plugin.c (test_begin_group_cb):
Clear the prefix before emitting the "END GROUP" line.
* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
(custom_diagnostic_finalizer): Temporarily clear prefix when
calling diagnostic_show_locus, rather than destroying it
afterwards.
---
 gcc/c-family/c-opts.c  |  4 +-
 gcc/diagnostic-show-locus.c| 96 +++---
 gcc/diagnostic.c   |  9 +-
 gcc/fortran/error.c|  1 +
 gcc/pretty-print.c |  4 +-
 .../gcc.dg/plugin/diagnostic_group_plugin.c|  1 +
 .../plugin/diagnostic_plugin_test_show_locus.c |  5 +-
 7 files changed, 101 insertions(+), 19 deletions(-)

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 23ab4cf..949d96a 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -168,11 +168,13 @@ c_diagnostic_finalizer (diagnostic_context *context,
diagnostic_info *diagnostic,
diagnostic_t)
 {
+  char *saved_prefix = pp_take_prefix (context->printer);
+  pp_set_prefix (context->printer, NULL);
   diagnostic_show_locus (context, diagnostic->richloc, diagnostic->kind);
   /* By default print macro expansion contexts in the diagnostic
  finalizer -- for tokens resulting from macro expansion.  */
   virt_loc_aware_diagnostic_finalizer (context, diagnostic);
-  pp_destroy_prefix (context->printer);
+  pp_set_prefix (context->printer, saved_prefix);
   pp_flush (context->printer);
 }
 
diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
index 42146c5..0cce82f 100644
--- a/gcc/diagnostic-show-locus.c
+++ b/gcc/diagnostic-show-locus.c
@@ -1039,6 +1039,8 @@ layout::print_gap_in_line_numbering ()
 {
   gcc_assert (m_show_line_numbers_p);
 
+  pp_emit_prefix (m_pp);
+
   for (int i = 0; i < m_linenum_width + 1; i++)
 pp_character (m_pp, '.');
 
@@ -1266,6 +1268,8 @@ layout::print_source_line (linenum_type row, const char *line, int line_width,
   line_width);
   line += m_x_offset;
 
+  pp_emit_prefix (m_pp);
+
   if (m_show_line_numbers_p)
 {
   int width = num_digits (row);
@@ -1346,6 +1350,7 @@ layout::should_print_annotation_line_p (linenum_type row) const
 void
 layout::start_annotation_line (char margin_char) const
 {
+  pp_emit_prefix (m_pp);
   if (m_show_line_numbers_p)
 {
   /* Print the margin.  If MARGIN_CHAR != ' ', then print up to 3
@@ -2297,9 +2302,6 @@ diagnostic_show_locus (diagnostic_context * context,
 
   context->last_location = loc;
 
-  char *saved_prefix = pp_take_prefix (context->printer);
-  pp_set_prefix (context->printer, NULL);
-
   layout layout (context, richloc, diagnostic_kind);
   for (int line_span_idx = 0; line_span_idx < layout.get_num_line_spans ();
line_span_idx++)
@@ -2329,8 +2331,6 @@ diagnostic_show_locus (diagnostic_context * context,
   row <= last_line; row++)
layout.print_line (row);
 }
-
-  pp_set_prefix (context->printer, saved_prefix);
 }
 
 #if CHECKING_P
@@ -2463,23 +2463,93 @@ 

Re: [C++ PATCH] PR c++/91369 - Implement P0784R7: constexpr new

2019-10-01 Thread Jason Merrill

On 9/27/19 4:31 PM, Jakub Jelinek wrote:

Hi!

The following patch attempts to implement P0784R7, which includes constexpr
destructors and constexpr new/delete/new[]/delete[].

::operator new is allowed during constexpr evaluation and returns address of
an artificial VAR_DECL with special name.  At this point we don't really
know the type of the heap storage, just size.  Later on when we encounter
cast to the corresponding pointer type, we change the name of the var and
type to match the type from the new expression (for new[] we need to do
further stuff as at the point where build_new_1 is called, we might not know
the exact array size, but we shall know that during the constexpr
evaluation, and cookie handling also complicates it a little bit).
When we first store into such heap objects, a ctor is created for them on
the fly.  Finally, ::operator delete marks those heap VAR_DECLs as deleted
and cxx_eval_outermost_constant_expr checks if everything that has been
allocated has been also deallocated and verifies addresses of those heap
vars aren't leaking into the return value.
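
The lifecycle described above (allocate untyped, retype on cast, mark deleted, check for leaks at the end) can be modelled with a small state table. This is only an illustration in C: `eval_ctx` and the `ctx_*` functions are hypothetical and do not correspond to the constexpr.c data structures, which track VAR_DECLs rather than array slots.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lifecycle of one heap variable during evaluation.  */
enum heap_state { HEAP_UNINIT, HEAP_TYPED, HEAP_DELETED };

#define MAX_HEAP_VARS 16

struct eval_ctx
{
  enum heap_state vars[MAX_HEAP_VARS];
  size_t count;
};

/* Model ::operator new: record a new heap variable whose type is not yet
   known.  Return its index, or -1 if the table is full.  */
int
ctx_new (struct eval_ctx *ctx)
{
  if (ctx->count == MAX_HEAP_VARS)
    return -1;
  ctx->vars[ctx->count] = HEAP_UNINIT;
  return (int) ctx->count++;
}

/* Model the cast to the real pointer type: only an untyped variable can
   be given a type.  */
bool
ctx_cast (struct eval_ctx *ctx, int id)
{
  if (id < 0 || (size_t) id >= ctx->count || ctx->vars[id] != HEAP_UNINIT)
    return false;
  ctx->vars[id] = HEAP_TYPED;
  return true;
}

/* Model ::operator delete: fail on an unknown id or a double delete.  */
bool
ctx_delete (struct eval_ctx *ctx, int id)
{
  if (id < 0 || (size_t) id >= ctx->count || ctx->vars[id] == HEAP_DELETED)
    return false;
  ctx->vars[id] = HEAP_DELETED;
  return true;
}

/* End of evaluation: true only if every allocation was also deleted.  */
bool
ctx_all_freed (const struct eval_ctx *ctx)
{
  for (size_t i = 0; i < ctx->count; i++)
    if (ctx->vars[i] != HEAP_DELETED)
      return false;
  return true;
}
```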

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-09-27  Jakub Jelinek  

PR c++/91369 - Implement P0784R7: constexpr new
c-family/
* c-cppbuiltin.c (c_cpp_builtins): Predefine
__cpp_constexpr_dynamic_alloc=201907 for -std=c++2a.
cp/
* cp-tree.h (enum cp_tree_index): Add CPTI_HEAP_UNINIT_IDENTIFIER,
CPTI_HEAP_IDENTIFIER and CPTI_HEAP_DELETED_IDENTIFIER.
(heap_uninit_identifier, heap_identifier, heap_deleted_identifier):
Define.
(type_has_constexpr_destructor, cxx_constant_dtor): Declare.
* class.c (type_maybe_constexpr_default_constructor): Make static.
(type_maybe_constexpr_destructor, type_has_constexpr_destructor): New
functions.
(finalize_literal_type_property): For c++2a, don't clear
CLASSTYPE_LITERAL_P for types without trivial destructors unless they
have non-constexpr destructors.
(explain_non_literal_class): For c++2a, complain about non-constexpr
destructors rather than about non-trivial destructors.
* constexpr.c: Include stor-layout.h.
(struct constexpr_ctx): Add heap_vars field.
(cxx_eval_call_expression): For c++2a allow calls to replaceable
global allocation functions, for new return address of a heap uninit
var, for delete record its deletion.
(initialized_type): Handle destructors for c++2a.
(cxx_fold_indirect_ref): Also handle array fields in structures.
(non_const_var_error): Add auto_diagnostic_group sentinel.  Emit
special diagnostics for heap variables.
(cxx_eval_store_expression): Create ctor for heap variables on the
first write.  Formatting fix.  Handle const_object_being_modified
with array type.
(cxx_eval_loop_expr): Initialize jump_target if NULL.
(cxx_eval_constant_expression) : If not skipping
upon entry to body, run cleanup with the same *jump_target as it
started to run the cleanup even if the body returns, breaks or
continues.
: Formatting fix.  On cast of replaceable global
allocation function to some pointer type, adjust the type of
the heap variable and change name from heap_uninit_identifier
to heap_identifier.
(find_heap_var_refs): New function.
(cxx_eval_outermost_constant_expr): Add constexpr_dtor argument,
handle evaluation of constexpr dtors and add tracking of heap
variables.  Use tf_no_cleanup for get_target_expr_with_sfinae.
(cxx_constant_value): Adjust cxx_eval_outermost_constant_expr caller.
(cxx_constant_dtor): New function.
(maybe_constant_value, fold_non_dependent_expr_template,
maybe_constant_init_1): Adjust cxx_eval_outermost_constant_expr
callers.
(potential_constant_expression_1): Ignore clobbers.  Allow
COND_EXPR_IS_VEC_DELETE for c++2a.  Allow CLEANUP_STMT.
* decl.c (initialize_predefined_identifiers): Add heap identifiers.
(cp_finish_decl): Don't clear TREE_READONLY for constexpr variables
with non-trivial, but constexpr destructors.
(register_dtor_fn): For constexpr variables with constexpr non-trivial
destructors call cxx_constant_dtor instead of adding destructor calls
at runtime.
(expand_static_init): For constexpr variables with constexpr
non-trivial destructors call cxx_maybe_build_cleanup.
(grokdeclarator): Allow constexpr destructors for c++2a.  Formatting
fix.
(cxx_maybe_build_cleanup): For constexpr variables with constexpr
non-trivial destructors call cxx_constant_dtor instead of adding
destructor calls at runtime.
* init.c: Include stor-layout.h.
(build_new_1): For c++2a and new[], add cast around the alloc call
to help constexpr 

Re: [SVE] PR91532

2019-10-01 Thread Prathamesh Kulkarni
On Wed, 2 Oct 2019 at 01:08, Jeff Law  wrote:
>
> On 10/1/19 12:40 AM, Richard Biener wrote:
> > On Mon, 30 Sep 2019, Prathamesh Kulkarni wrote:
> >
> >> On Wed, 25 Sep 2019 at 23:44, Richard Biener  wrote:
> >>>
> >>> On Wed, 25 Sep 2019, Prathamesh Kulkarni wrote:
> >>>
> >>>> On Fri, 20 Sep 2019 at 15:20, Jeff Law  wrote:
> >>>>>
> >>>>> On 9/19/19 10:19 AM, Prathamesh Kulkarni wrote:
> >>>>>> Hi,
> >>>>>> For PR91532, the dead store is trivially deleted if we place dse pass
> >>>>>> between ifcvt and vect. Would it be OK to add another instance of dse
> >>>>>> there ?
> >>>>>> Or should we add an ad-hoc "basic-block dse" sub-pass to ifcvt that
> >>>>>> will clean up the dead store ?
> >>>>> I'd hesitate to add another DSE pass.  If there's one nearby could we
> >>>>> move the existing pass?
> >>>> Well I think the nearest one is just after pass_warn_restrict. Not
> >>>> sure if it's a good idea to move it up from there ?
> >>>
> >>> You'll need it inbetween ifcvt and vect so it would be disabled
> >>> w/o vectorization, so no, that doesn't work.
> >>>
> >>> ifcvt already invokes SEME region value-numbering so if we had
> >>> MESE region DSE it could use that.  Not sure if you feel like
> >>> refactoring DSE to work on regions - it currently uses a DOM
> >>> walk which isn't suited for that.
> >>>
> >>> if-conversion has a little "local" dead predicate compute removal
> >>> thingy (not that I like that), eventually it can be enhanced to
> >>> do the DSE you want?  Eventually it should be moved after the local
> >>> CSE invocation though.
> >> Hi,
> >> Thanks for the suggestions.
> >> For now, would it be OK to do "dse" on loop header in
> >> tree_if_conversion, as in the attached patch ?
> >> The patch does local dse in a new function ifcvt_local_dse instead of
> >> ifcvt_local_dce, because it needed to be done after RPO VN which
> >> eliminates:
> >> Removing dead stmt _ifc__62 = *_55;
> >> and makes the following store dead:
> >> *_55 = _ifc__61;
> >
> > I suggested trying to move ifcvt_local_dce after RPO VN, you could
> > try that as independent patch (pre-approved).
> >
> > I don't mind the extra walk though.
> >
> > What I see as possible issue is that dse_classify_store walks virtual
> > uses and I'm not sure if the loop exit is a natural boundary for
> > such walk (eventually the loop header virtual PHI is reached but
> > there may also be a loop-closed PHI for the virtual operand,
> > but not necessarily).  So the question is whether to add a
> > "stop at" argument to dse_classify_store specifying the virtual
> > use the walk should stop at?
> I think we want to stop at the block boundary -- aren't the cases we
> care about here local to a block?
This version restricts the walk in dse_classify_store to the current
basic block when bb_only is true, and removes dead stores in
ifcvt_local_dce instead of doing a separate walk.
Does it look OK ?
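
To make the block-local idea concrete, here is a toy model of it. This is only a sketch under simplifying assumptions: statements are either a load or a store to a named location, there is no aliasing, and any store that is still pending at the end of the block is conservatively kept. The types and the function `block_local_dead_stores` are invented for illustration and are not GCC's gimple or DSE interfaces.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

namespace sketch {

enum class Kind { Store, Load };

// One statement of a toy basic block: a load from or a store to ADDR.
struct Stmt {
  Kind kind;
  std::string addr;
};

// Returns the indices of stores that are overwritten later in the same
// block with no intervening load of the same location.  The walk never
// leaves the block, so a store that reaches the block end is kept.
inline std::vector<std::size_t>
block_local_dead_stores(const std::vector<Stmt> &bb)
{
  std::vector<std::size_t> dead;
  // Locations that are stored to later in the block before being read.
  std::unordered_set<std::string> overwritten_later;

  for (std::size_t i = bb.size(); i-- > 0; )
    {
      const Stmt &s = bb[i];
      if (s.kind == Kind::Load)
        overwritten_later.erase(s.addr);   // the earlier value is needed
      else if (overwritten_later.count(s.addr))
        dead.push_back(i);                 // killed by a later store
      else
        overwritten_later.insert(s.addr);
    }

  std::reverse(dead.begin(), dead.end());
  return dead;
}

} // namespace sketch
```

In the PR's situation, once value numbering has removed the load that sat between two stores to the same location, the earlier store is the one this kind of walk would report.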

Thanks,
Prathamesh
>
> jeff
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 822aae5b83f..11df055ccf4 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -120,6 +120,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "fold-const.h"
 #include "tree-ssa-sccvn.h"
 #include "tree-cfgcleanup.h"
+#include "tree-ssa-dse.h"
 
 /* Only handle PHIs with no more arguments unless we are asked to by
simd pragma.  */
@@ -2960,6 +2961,18 @@ ifcvt_local_dce (basic_block bb)
   while (!gsi_end_p (gsi))
 {
   stmt = gsi_stmt (gsi);
+  if (gimple_store_p (stmt))
+{
+  tree lhs = gimple_get_lhs (stmt);
+  ao_ref write;
+  ao_ref_init (&write, lhs);
+  if (dse_classify_store (&write, stmt, false, NULL, NULL, true)
+  == DSE_STORE_DEAD)
+delete_dead_or_redundant_assignment (&gsi, "dead");
+	  gsi_next (&gsi);
+	  continue;
+	}
+
   if (gimple_plf (stmt, GF_PLF_2))
 	{
 	  gsi_next (&gsi);
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index ba67884a825..7148654b7d9 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "alias.h"
 #include "tree-ssa-loop.h"
+#include "tree-ssa-dse.h"
 
 /* This file implements dead store elimination.
 
@@ -76,21 +77,13 @@ along with GCC; see the file COPYING3.  If not see
fact, they are the same transformation applied to different views of
the CFG.  */
 
-static void delete_dead_or_redundant_assignment (gimple_stmt_iterator *, const char *);
+void delete_dead_or_redundant_assignment (gimple_stmt_iterator *, const char *);
 static void delete_dead_or_redundant_call (gimple_stmt_iterator *, const char *);
 
 /* Bitmap of blocks that have had EH statements cleaned.  We should
remove their dead edges eventually.  */
 static bitmap need_eh_cleanup;
 
-/* Return value from dse_classify_store */
-enum dse_store_status
-{
-  DSE_STORE_LIVE,
-  DSE_STORE_MAYBE_PARTIAL_DEAD,
-  DSE_STORE_DEAD
-};
-
 /* STMT is a statement that may write into memory.  Analyze it 

Re: [PATCH] Use movmem optab to attempt inline expansion of __builtin_memmove()

2019-10-01 Thread Jeff Law
On 9/27/19 12:23 PM, Aaron Sawdey wrote:
> This is the third piece of my effort to improve inline expansion of memmove. 
> The
> first two parts I posted back in June fixed the names of the optab entries
> involved so that optab cpymem is used for memcpy() and optab movmem is used 
> for
> memmove(). This piece adds support for actually attempting to invoke the 
> movmem
> optab to do inline expansion of __builtin_memmove().
> 
> Because what needs to be done for memmove() is very similar to memcpy(), I 
> have
> just added a bool parm "might_overlap" to several of the functions involved so
> the same functions can handle both. The name might_overlap comes from the fact
> that if we still have a memmove() call at expand, this means
> gimple_fold_builtin_memory_op() was not able to prove that the source and
> destination do not overlap.
> 
> There are a few places where might_overlap gets used to keep us from trying to
> use the by-pieces infrastructure or generate a copy loop, as neither of those
> things will work correctly if source and destination overlap.
> 
> I've restructured things slightly in emit_block_move_hints() so that we can
> try the pattern first if we already know that by-pieces won't work. This way
> we can bail out immediately in the might_overlap case.
> 
> Bootstrap/regtest passed on ppc64le, in progress on x86_64. If everything 
> passes,
> is this ok for trunk?
> 
> 
> 2019-09-27  Aaron Sawdey 
> 
>   * builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm.
>   (expand_builtin_memcpy): Use might_overlap parm.
>   (expand_builtin_mempcpy_args): Use might_overlap parm.
>   (expand_builtin_memmove): Call expand_builtin_memory_copy_args.
>   (expand_builtin_memory_copy_args): Add might_overlap parm.
>   * expr.c (emit_block_move_via_cpymem): Rename to
>   emit_block_move_via_pattern, add might_overlap parm, use cpymem
>   or movmem optab as appropriate.
>   (emit_block_move_hints): Add might_overlap parm, do the right
>   thing for might_overlap==true.
>   * expr.h (emit_block_move_hints): Update prototype.
> 
> 
> 
> 
> Index: gcc/builtins.c
> ===
> --- gcc/builtins.c(revision 276131)
> +++ gcc/builtins.c(working copy)
> @@ -3894,10 +3897,11 @@
>   _max_size);
>src_str = c_getstr (src);
> 
> -  /* If SRC is a string constant and block move would be done
> - by pieces, we can avoid loading the string from memory
> - and only stored the computed constants.  */
> -  if (src_str
> +  /* If SRC is a string constant and block move would be done by
> + pieces, we can avoid loading the string from memory and only
> + stored the computed constants.  I'm not sure if the by pieces
> + method works if src/dest are overlapping, so avoid that case.  */
> +  if (src_str && !might_overlap
I don't think you need the check here.  When c_getstr returns something
useful, it is going to be returning a string constant.  Think read-only
literals here.  I'm pretty sure overlap isn't going to be possible.

>&& CONST_INT_P (len_rtx)
>&& (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
>&& can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
> @@ -3922,7 +3926,7 @@
>&& (retmode == RETURN_BEGIN || target == const0_rtx))
>  method = BLOCK_OP_TAILCALL;
>bool use_mempcpy_call = (targetm.libc_has_fast_function (BUILT_IN_MEMPCPY)
> -&& retmode == RETURN_END
> +&& retmode == RETURN_END && !might_overlap
Put the && !might_overlap on its own line for readability.


> Index: gcc/expr.c
> ===
> --- gcc/expr.c(revision 276131)
> +++ gcc/expr.c(working copy)
> @@ -1622,13 +1624,29 @@
>set_mem_size (y, const_size);
>  }
> 
> -  if (CONST_INT_P (size) && can_move_by_pieces (INTVAL (size), align))
> +  bool pieces_ok = can_move_by_pieces (INTVAL (size), align);
> +  bool pattern_ok = false;
> +
> +  if (!CONST_INT_P (size) || !pieces_ok || might_overlap)
> +{
> +  pattern_ok =
> + emit_block_move_via_pattern (x, y, size, align,
> +  expected_align, expected_size,
> +  min_size, max_size, probable_max_size,
> +  might_overlap);
> +  if (!pattern_ok && might_overlap)
> + {
> +   /* Do not try any of the other methods below as they are not safe
> +  for overlapping moves.  */
> +   *is_move_done = false;
> +   return retval;
> + }
> +}
> +
> +  if (pattern_ok) ;
Drop the semi-colon down to its own line like

if (whatever)
  ;
else if (...)
  something
else if (...)
  something else

With the changes noted above, this is fine for the trunk.

jeff
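
For readers wondering why the by-pieces and copy-loop paths are avoided when source and destination might overlap, the following sketch shows the failure mode with a plain forward byte loop. It is only an illustration: `forward_copy` and the two `shift_right_*` helpers are invented names, and the loop stands in for any simple front-to-back copy rather than for GCC's actual expansion code.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Naive front-to-back byte copy.  Correct for disjoint buffers, but it
// re-reads bytes it has already overwritten when DST lies inside the
// source range.
inline void forward_copy(char *dst, const char *src, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i];
}

// Shift the contents of S right by one position using the naive loop.
inline std::string shift_right_forward(std::string s)
{
  if (s.size() > 1)
    forward_copy(&s[1], &s[0], s.size() - 1);
  return s;
}

// The same shift using memmove, which is specified to handle overlap.
inline std::string shift_right_memmove(std::string s)
{
  if (s.size() > 1)
    std::memmove(&s[1], &s[0], s.size() - 1);
  return s;
}
```

With the input "abcdef" the forward loop smears the first byte across the whole buffer, while the memmove version yields "aabcde".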


Re: [PATCH] Do not check call type compatibility when cloning cgraph edges

2019-10-01 Thread Jeff Law
On 9/30/19 3:30 AM, Martin Jambor wrote:
> Hi,
> 
> when looking into PR 70929 and thus looking at what
> gimple_check_call_matching_types and gimple_check_call_args (both in
> cgraph.c) do, I noticed that they get invoked a lot when cloning cgraph
> edges when code is being copied as part of inlining transformation, only
> to have their result overwritten by the value from the original edge.
> (They also fail a lot during that time because since call redirection has
> not taken place yet, the arguments are actually expected not to match.)
> 
> The following patch avoids that by adding a simple parameter to various
> create_edge methods which indicate that we are cloning and the call
> statement should therefore not be examined.  For consistency reasons I
> unfortunately also had to change the meaning of the last parameter of
> create_indirect_edge but there is only one place where it is called with
> value of the parameter specified and its intent has always been to
> behave differently when cloning.
> 
> Bootstrapped and tested on x86_64-linux.  OK for trunk?
> 
> Thanks,
> 
> Martin
> 
> 
> 
> 2019-09-27  Martin Jambor  
> 
>   * cgraph.c (symbol_table::create_edge): New parameter cloning_p,
>   do not compute some stuff when set.
>   (cgraph_node::create_edge): Likewise.
>   (cgraph_node::create_indirect_edge): Renamed last parameter to
>   cloning_p and flipped its meaning, don't even calculate
>   inline_failed when set.
>   * cgraph.h (cgraph_node::create_edge): Add new parameter.
>   (symbol_table::create_edge): Likewise.
>   (cgraph_node::create_indirect_edge): Rename last parameter, flip
>   the default value.
>   * cgraphclones.c (cgraph_edge::clone): Pass true cloning_p to all
>   call graph edge creating functions.
OK
jeff
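
The shape of the change can be sketched with a toy model. Everything here is hypothetical (the `Edge` struct, `types_match`, the call counter); it is not the cgraph API, it only shows how a `cloning_p` flag lets the clone path skip a check whose result would be overwritten anyway.

```cpp
namespace sketch {

// Toy stand-in for a call graph edge.
struct Edge {
  int caller_args;
  int callee_params;
  bool cannot_inline;
};

// Counts how often the (pretend) expensive compatibility check runs.
inline int &checks_performed()
{
  static int n = 0;
  return n;
}

inline bool types_match(int args, int params)
{
  ++checks_performed();
  return args == params;
}

// When CLONING_P is set, the call statement is not examined at all;
// the caller is expected to copy the result from the original edge.
inline Edge create_edge(int caller_args, int callee_params,
                        bool cloning_p = false)
{
  Edge e;
  e.caller_args = caller_args;
  e.callee_params = callee_params;
  e.cannot_inline = false;
  if (!cloning_p)
    e.cannot_inline = !types_match(caller_args, callee_params);
  return e;
}

inline Edge clone_edge(const Edge &orig)
{
  Edge e = create_edge(orig.caller_args, orig.callee_params,
                       /*cloning_p=*/true);
  e.cannot_inline = orig.cannot_inline;   // value from the original edge
  return e;
}

} // namespace sketch
```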
> ---


[PATCH] Make some new algorithms work in parallel mode

2019-10-01 Thread Jonathan Wakely

Tested x86_64-linux (normal and parallel modes), committed to trunk.

commit 92d5df45f3e62d31bebcdea9c1568b0877d0fe0e
Author: Jonathan Wakely 
Date:   Tue Oct 1 20:34:22 2019 +0100

Make some new algorithms work in parallel mode

* include/experimental/algorithm (experimental::sample): Qualify call
to __sample correctly.
* include/parallel/algo.h (sample, for_each_n): Add using-declarations
for algorithms that don't have parallel implementations.

diff --git a/libstdc++-v3/include/experimental/algorithm b/libstdc++-v3/include/experimental/algorithm
index 8ba212c5132..f036a713ef3 100644
--- a/libstdc++-v3/include/experimental/algorithm
+++ b/libstdc++-v3/include/experimental/algorithm
@@ -77,9 +77,9 @@ inline namespace fundamentals_v2
 		"sample size must be an integer type");
 
   typename iterator_traits<_PopulationIterator>::difference_type __d = __n;
-  return std::__sample(__first, __last, __pop_cat{}, __out, __samp_cat{},
-			   __d,
-			   std::forward<_UniformRandomNumberGenerator>(__g));
+  return _GLIBCXX_STD_A::
+	__sample(__first, __last, __pop_cat{}, __out, __samp_cat{}, __d,
+		 std::forward<_UniformRandomNumberGenerator>(__g));
 }
 
   template= 201703L
+  using _GLIBCXX_STD_A::for_each_n;
+  using _GLIBCXX_STD_A::sample;
+#endif
 } // end namespace
 } // end namespace
 


[PATCH] Make some parallel mode algorithms usable in constexpr contexts

2019-10-01 Thread Jonathan Wakely

Tested x86_64-linux (normal and parallel modes), committed to trunk.

Most algos still aren't constexpr in parallel mode, this just fixes
the ones needed by std::array.


commit 9b07c6cfd553566a3ebdbab72b12e007d4912bf1
Author: Jonathan Wakely 
Date:   Tue Oct 1 20:33:08 2019 +0100

Make some parallel mode algorithms usable in constexpr contexts

This makes the __parallel::equal and __parallel::lexicographical_compare
algorithms usable in constant expressions, by dispatching to the
sequential algorithm when called during constant evaluation.

* include/parallel/algobase.h (equal, lexicographical_compare): Add
_GLIBCXX20_CONSTEXPR and dispatch to sequential algorithm when being
constant evaluated.
* include/parallel/algorithmfwd.h (equal, lexicographical_compare):
Add _GLIBCXX20_CONSTEXPR.
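
The dispatch pattern described above can be sketched outside the library as follows. This is a simplified stand-in, not the libstdc++ code: the `sketch` namespace, `sequential_equal`, `parallel_equal`, the call counter and the `SKETCH20_CONSTEXPR` macro are all invented, and the "parallel" function here just counts calls and forwards to the sequential one.

```cpp
#include <type_traits>

// Invented helper macro: constexpr only when the library provides
// std::is_constant_evaluated.
#if defined(__cpp_lib_is_constant_evaluated)
# define SKETCH20_CONSTEXPR constexpr
#else
# define SKETCH20_CONSTEXPR inline
#endif

namespace sketch {

// Counts calls that took the runtime-only path.
inline int &runtime_path_calls()
{
  static int n = 0;
  return n;
}

template<typename It1, typename It2>
  SKETCH20_CONSTEXPR bool
  sequential_equal(It1 first1, It1 last1, It2 first2)
  {
    for (; first1 != last1; ++first1, ++first2)
      if (!(*first1 == *first2))
        return false;
    return true;
  }

// Stand-in for an implementation that cannot be constant evaluated.
template<typename It1, typename It2>
  bool
  parallel_equal(It1 first1, It1 last1, It2 first2)
  {
    ++runtime_path_calls();
    return sequential_equal(first1, last1, first2);
  }

template<typename It1, typename It2>
  SKETCH20_CONSTEXPR bool
  equal(It1 first1, It1 last1, It2 first2)
  {
#if defined(__cpp_lib_is_constant_evaluated)
    // During constant evaluation only the sequential code is usable.
    if (std::is_constant_evaluated())
      return sequential_equal(first1, last1, first2);
#endif
    return parallel_equal(first1, last1, first2);
  }

} // namespace sketch
```

An ordinary runtime call goes through `parallel_equal`, while a use inside a `static_assert` in C++20 mode would take the sequential branch.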

diff --git a/libstdc++-v3/include/parallel/algobase.h b/libstdc++-v3/include/parallel/algobase.h
index 829eb11306b..d78bdc961a1 100644
--- a/libstdc++-v3/include/parallel/algobase.h
+++ b/libstdc++-v3/include/parallel/algobase.h
@@ -214,19 +214,31 @@ namespace __parallel
 
   // Public interface
   template
+_GLIBCXX20_CONSTEXPR
 inline bool
 equal(_IIter1 __begin1, _IIter1 __end1, _IIter2 __begin2)
 {
+#if __cplusplus > 201703L
+  if (std::is_constant_evaluated())
+	return _GLIBCXX_STD_A::equal(__begin1, __end1, __begin2);
+#endif
+
   return __gnu_parallel::mismatch(__begin1, __end1, __begin2).first
   == __end1;
 }
 
   // Public interface
   template
+_GLIBCXX20_CONSTEXPR
 inline bool
 equal(_IIter1 __begin1, _IIter1 __end1, _IIter2 __begin2, 
   _Predicate __pred)
 {
+#if __cplusplus > 201703L
+  if (std::is_constant_evaluated())
+	return _GLIBCXX_STD_A::equal(__begin1, __end1, __begin2, __pred);
+#endif
+
   return __gnu_parallel::mismatch(__begin1, __end1, __begin2, __pred).first
   == __end1;
 }
@@ -286,9 +298,15 @@ namespace __parallel
 }
 
   template
+_GLIBCXX20_CONSTEXPR
 inline bool
 equal(_IIter1 __begin1, _IIter1 __end1, _IIter2 __begin2, _IIter2 __end2)
 {
+#if __cplusplus > 201703L
+  if (std::is_constant_evaluated())
+	return _GLIBCXX_STD_A::equal(__begin1, __end1, __begin2, __end2);
+#endif
+
   typedef __gnu_parallel::_EqualTo<
 	typename std::iterator_traits<_IIter1>::value_type,
 	typename std::iterator_traits<_IIter2>::value_type> _EqualTo;
@@ -299,15 +317,22 @@ namespace __parallel
 }
 
   template
+_GLIBCXX20_CONSTEXPR
 inline bool
 equal(_IIter1 __begin1, _IIter1 __end1,
 	  _IIter2 __begin2, _IIter2 __end2, _BinaryPredicate __binary_pred)
 {
+#if __cplusplus > 201703L
+  if (std::is_constant_evaluated())
+	return _GLIBCXX_STD_A::equal(__begin1, __end1, __begin2, __end2,
+ __binary_pred);
+#endif
+
   return __equal_switch(__begin1, __end1, __begin2, __end2, __binary_pred,
 			std::__iterator_category(__begin1),
 			std::__iterator_category(__begin2));
 }
-#endif
+#endif // C++14
 
   // Sequential fallback
   template
@@ -391,10 +416,17 @@ namespace __parallel
 
   // Public interface
   template
+_GLIBCXX20_CONSTEXPR
 inline bool
 lexicographical_compare(_IIter1 __begin1, _IIter1 __end1,
 _IIter2 __begin2, _IIter2 __end2)
 {
+#if __cplusplus > 201703L
+  if (std::is_constant_evaluated())
+	return _GLIBCXX_STD_A::lexicographical_compare(__begin1, __end1,
+		   __begin2, __end2);
+#endif
+
   typedef iterator_traits<_IIter1> _TraitsType1;
   typedef typename _TraitsType1::value_type _ValueType1;
   typedef typename _TraitsType1::iterator_category _IteratorCategory1;
@@ -411,11 +443,19 @@ namespace __parallel
 
   // Public interface
   template
+_GLIBCXX20_CONSTEXPR
 inline bool
 lexicographical_compare(_IIter1 __begin1, _IIter1 __end1,
 _IIter2 __begin2, _IIter2 __end2,
 _Predicate __pred)
 {
+#if __cplusplus > 201703L
+  if (std::is_constant_evaluated())
+	return _GLIBCXX_STD_A::lexicographical_compare(__begin1, __end1,
+		   __begin2, __end2,
+		   __pred);
+#endif
+
   typedef iterator_traits<_IIter1> _TraitsType1;
   typedef typename _TraitsType1::iterator_category _IteratorCategory1;
 
diff --git a/libstdc++-v3/include/parallel/algorithmfwd.h b/libstdc++-v3/include/parallel/algorithmfwd.h
index a6d03a50cfc..a227ebac2a3 100644
--- a/libstdc++-v3/include/parallel/algorithmfwd.h
+++ b/libstdc++-v3/include/parallel/algorithmfwd.h
@@ -130,10 +130,12 @@ namespace __parallel
   __gnu_parallel::sequential_tag);
 
   template
+_GLIBCXX20_CONSTEXPR
 bool
 equal(_IIter1, _IIter1, _IIter2);
 
   template
+_GLIBCXX20_CONSTEXPR
 bool
 equal(_IIter1, _IIter1, _IIter2, _Predicate);
 
@@ -285,10 +287,12 @@ namespace __parallel
 

[PATCH] Disable tests that aren't valid in parallel mode

2019-10-01 Thread Jonathan Wakely

Tested x86_64-linux (normal and parallel modes), committed to trunk.
commit b11c8f480fe1cd5696ec1a8f0db481c5f45429b8
Author: Jonathan Wakely 
Date:   Tue Oct 1 20:31:51 2019 +0100

Disable tests that aren't valid in parallel mode

Tests that depend on debug mode can't be tested in parallel mode.

* testsuite/17_intro/using_namespace_std_tr1_neg.cc: Skip test for
parallel mode.
* testsuite/20_util/hash/84998.cc: Likewise.
* testsuite/23_containers/deque/types/pmr_typedefs_debug.cc: Likewise.
* testsuite/23_containers/forward_list/pmr_typedefs_debug.cc: Likewise.
* testsuite/23_containers/list/pmr_typedefs_debug.cc: Likewise.
* testsuite/23_containers/map/pmr_typedefs_debug.cc: Likewise.
* testsuite/23_containers/multimap/pmr_typedefs_debug.cc: Likewise.
* testsuite/23_containers/multiset/pmr_typedefs_debug.cc: Likewise.
* testsuite/23_containers/set/pmr_typedefs_debug.cc: Likewise.
* testsuite/23_containers/unordered_map/pmr_typedefs_debug.cc:
Likewise.
* testsuite/23_containers/unordered_multimap/pmr_typedefs_debug.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/pmr_typedefs_debug.cc:
Likewise.
* testsuite/23_containers/unordered_set/pmr_typedefs_debug.cc:
Likewise.
* testsuite/23_containers/vector/cons/destructible_debug_neg.cc:
Likewise.
* testsuite/23_containers/vector/types/pmr_typedefs_debug.cc: Likewise.
* testsuite/25_algorithms/binary_search/partitioned.cc: Likewise.
* testsuite/25_algorithms/copy/86658.cc: Likewise.
* testsuite/25_algorithms/equal_range/partitioned.cc: Likewise.
* testsuite/25_algorithms/lexicographical_compare/71545.cc: Likewise.
* testsuite/25_algorithms/lower_bound/partitioned.cc: Likewise.
* testsuite/25_algorithms/upper_bound/partitioned.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/17_intro/using_namespace_std_tr1_neg.cc b/libstdc++-v3/testsuite/17_intro/using_namespace_std_tr1_neg.cc
index 31242760179..bdc41507424 100644
--- a/libstdc++-v3/testsuite/17_intro/using_namespace_std_tr1_neg.cc
+++ b/libstdc++-v3/testsuite/17_intro/using_namespace_std_tr1_neg.cc
@@ -18,7 +18,7 @@
 // .
 
 // NB: parallel-mode uses TR1 bits...
-#undef _GLIBCXX_PARALLEL
+// { dg-skip-if "" { *-*-* } { "-D_GLIBCXX_PARALLEL" } }
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/20_util/hash/84998.cc b/libstdc++-v3/testsuite/20_util/hash/84998.cc
index 1cf57e9073c..b00df223415 100644
--- a/libstdc++-v3/testsuite/20_util/hash/84998.cc
+++ b/libstdc++-v3/testsuite/20_util/hash/84998.cc
@@ -17,6 +17,7 @@
 
 // { dg-options "-D_GLIBCXX_DEBUG" }
 // { dg-do compile { target c++11 } }
+// { dg-skip-if "" { *-*-* } { "-D_GLIBCXX_PARALLEL" } }
 
 // PR libstdc++/84998
 
diff --git a/libstdc++-v3/testsuite/23_containers/deque/types/pmr_typedefs_debug.cc b/libstdc++-v3/testsuite/23_containers/deque/types/pmr_typedefs_debug.cc
index ac96584a6e0..9bee219b58f 100644
--- a/libstdc++-v3/testsuite/23_containers/deque/types/pmr_typedefs_debug.cc
+++ b/libstdc++-v3/testsuite/23_containers/deque/types/pmr_typedefs_debug.cc
@@ -17,6 +17,7 @@
 
 // { dg-options "-std=gnu++17 -D_GLIBCXX_DEBUG" }
 // { dg-do compile { target c++17 } }
+// { dg-skip-if "" { *-*-* } { "-D_GLIBCXX_PARALLEL" } }
 
 #include 
 static_assert(std::is_same_v<
diff --git a/libstdc++-v3/testsuite/23_containers/forward_list/pmr_typedefs_debug.cc b/libstdc++-v3/testsuite/23_containers/forward_list/pmr_typedefs_debug.cc
index 7df01d530f8..66138ba830c 100644
--- a/libstdc++-v3/testsuite/23_containers/forward_list/pmr_typedefs_debug.cc
+++ b/libstdc++-v3/testsuite/23_containers/forward_list/pmr_typedefs_debug.cc
@@ -17,6 +17,7 @@
 
 // { dg-options "-std=gnu++17 -D_GLIBCXX_DEBUG" }
 // { dg-do compile { target c++17 } }
+// { dg-skip-if "" { *-*-* } { "-D_GLIBCXX_PARALLEL" } }
 
 #include 
 static_assert(std::is_same_v<
diff --git a/libstdc++-v3/testsuite/23_containers/list/pmr_typedefs_debug.cc b/libstdc++-v3/testsuite/23_containers/list/pmr_typedefs_debug.cc
index d59f8c41d1d..b1bb271aa1c 100644
--- a/libstdc++-v3/testsuite/23_containers/list/pmr_typedefs_debug.cc
+++ b/libstdc++-v3/testsuite/23_containers/list/pmr_typedefs_debug.cc
@@ -17,6 +17,7 @@
 
 // { dg-options "-std=gnu++17 -D_GLIBCXX_DEBUG" }
 // { dg-do compile { target c++17 } }
+// { dg-skip-if "" { *-*-* } { "-D_GLIBCXX_PARALLEL" } }
 
 #include 
 static_assert(std::is_same_v<
diff --git a/libstdc++-v3/testsuite/23_containers/map/pmr_typedefs_debug.cc b/libstdc++-v3/testsuite/23_containers/map/pmr_typedefs_debug.cc
index d06673027e5..c959b6957ed 100644
--- a/libstdc++-v3/testsuite/23_containers/map/pmr_typedefs_debug.cc
+++ b/libstdc++-v3/testsuite/23_containers/map/pmr_typedefs_debug.cc
@@ -17,6 

[PATCH] Fix non-reserved names in Parallel Mode headers

2019-10-01 Thread Jonathan Wakely

* include/parallel/algo.h: Replace non-reserved names.
* include/parallel/multiway_merge.h: Likewise.
* include/parallel/multiway_mergesort.h: Likewise.
* include/parallel/numericfwd.h: Likewise.
* testsuite/17_intro/names.cc: Add RAI to test macros.

Tested x86_64-linux (normal and parallel modes), committed to trunk.


commit 37e8a3375e8238a98d2a3624f701f40ea6a8fed3
Author: Jonathan Wakely 
Date:   Tue Oct 1 16:25:07 2019 +0100

Fix non-reserved names in Parallel Mode headers

* include/parallel/algo.h: Replace non-reserved names.
* include/parallel/multiway_merge.h: Likewise.
* include/parallel/multiway_mergesort.h: Likewise.
* include/parallel/numericfwd.h: Likewise.
* testsuite/17_intro/names.cc: Add RAI to test macros.

diff --git a/libstdc++-v3/include/parallel/algo.h b/libstdc++-v3/include/parallel/algo.h
index dc6971f5dd4..afa325bb4af 100644
--- a/libstdc++-v3/include/parallel/algo.h
+++ b/libstdc++-v3/include/parallel/algo.h
@@ -311,11 +311,11 @@ namespace __parallel
 { return _GLIBCXX_STD_A::unique_copy(__begin, __last, __out, __pred); }
 
   // Parallel unique_copy for random access iterators
-  template
-RandomAccessOutputIterator
+_RandomAccessOutputIterator
 __unique_copy_switch(_RAIter __begin, _RAIter __last,
-RandomAccessOutputIterator __out, _Predicate __pred,
+_RandomAccessOutputIterator __out, _Predicate __pred,
 random_access_iterator_tag, random_access_iterator_tag)
 {
   if (_GLIBCXX_PARALLEL_CONDITION(
diff --git a/libstdc++-v3/include/parallel/multiway_merge.h b/libstdc++-v3/include/parallel/multiway_merge.h
index 6bdf08b4704..c5f85881ace 100644
--- a/libstdc++-v3/include/parallel/multiway_merge.h
+++ b/libstdc++-v3/include/parallel/multiway_merge.h
@@ -232,7 +232,7 @@ namespace __gnu_parallel
*
* @return End iterator of output sequence.
*/
-  template class iterator,
+  template class iterator,
typename _RAIterIterator,
typename _RAIter3,
typename _DifferenceTp,
@@ -351,7 +351,7 @@ namespace __gnu_parallel
*
* @return End iterator of output sequence.
*/
-  template class iterator,
+  template class iterator,
typename _RAIterIterator,
typename _RAIter3,
typename _DifferenceTp,
@@ -641,8 +641,8 @@ namespace __gnu_parallel
   /** @brief Multi-way merging procedure for a high branching factor,
* requiring sentinels to exist.
*
-   * @tparam UnguardedLoserTree _Loser Tree variant to use for the unguarded
-   *   merging.
+   * @tparam _UnguardedLoserTree Loser Tree variant to use for the unguarded
+   *merging.
*
* @param __seqs_begin Begin iterator of iterator pair input sequence.
* @param __seqs_end End iterator of iterator pair input sequence.
@@ -653,7 +653,7 @@ namespace __gnu_parallel
*
* @return End iterator of output sequence.
*/
-  template
+  __target_end = multiway_merge_loser_tree_unguarded<_UnguardedLoserTree>
(__seqs_begin, __seqs_end, __target, __sentinel, __length, __comp);
 
 #if _GLIBCXX_PARALLEL_ASSERTIONS
diff --git a/libstdc++-v3/include/parallel/multiway_mergesort.h b/libstdc++-v3/include/parallel/multiway_mergesort.h
index d39fc9abc19..d382a2c92f6 100644
--- a/libstdc++-v3/include/parallel/multiway_mergesort.h
+++ b/libstdc++-v3/include/parallel/multiway_mergesort.h
@@ -264,19 +264,19 @@ namespace __gnu_parallel
   { __gnu_sequential::sort(__begin, __end, __comp); }
 };
 
-  template
+  typename _DiffType>
 struct __possibly_stable_multiway_merge
 { };
 
-  template
-struct __possibly_stable_multiway_merge
 {
-  void operator()(const Seq_RAIter& __seqs_begin,
- const Seq_RAIter& __seqs_end,
+  void operator()(const _Seq_RAIter& __seqs_begin,
+ const _Seq_RAIter& __seqs_end,
  const _RAIter& __target,
  _Compare& __comp,
  _DiffType __length_am) const
@@ -284,13 +284,13 @@ namespace __gnu_parallel
  __length_am, __comp, sequential_tag()); }
 };
 
-  template
-struct __possibly_stable_multiway_merge
 {
-  void operator()(const Seq_RAIter& __seqs_begin,
-  const Seq_RAIter& __seqs_end,
+  void operator()(const _Seq_RAIter& __seqs_begin,
+  const _Seq_RAIter& __seqs_end,
   const _RAIter& __target,
   _Compare& __comp,
   _DiffType __length_am) const
diff --git a/libstdc++-v3/include/parallel/numericfwd.h b/libstdc++-v3/include/parallel/numericfwd.h
index a9b8a2b1ea6..a1b1dc7571b 100644
--- a/libstdc++-v3/include/parallel/numericfwd.h
+++ b/libstdc++-v3/include/parallel/numericfwd.h
@@ -149,17 +149,17 @@ 

Re: [PATCH] IPA-CP release transformation summary (PR jit/91928)

2019-10-01 Thread Jeff Law
On 10/1/19 4:11 AM, Andrea Corallo wrote:
> Martin Jambor writes:
> 
>> Hi,
>>
>> On Mon, Sep 30 2019, Andrea Corallo wrote:
>>> Hi all,
>>> I'd like to submit this patch.
>>> It releases the IPA-CP transformation summary after functions have been expanded.
>>> This is to fix the compiler when used with libgccjit on subsequent
>>> compilations (every new compilation should have a clean transformation
>>> summary).
>> if this is a general problem then I think we should instead add another
>> hook to class ipa_opt_pass_d to free transformation summary, call it for
>> all IPA passes at the appropriate time and implement it for IPA-CP. That
>> way it will work for all IPA passes which might have a transformation
>> summary.
>>
>> Martin
>>
>>
>>> Bootstrap on arm64 and X86-64.
>>>
>>> Bests
>>>   Andrea
>>>
>>> gcc/ChangeLog
>>> 2019-??-??  Andrea Corallo  
>>>
>>> * cgraphunit.c (expand_all_functions): Release ipcp_transformation_sum
>>> when finished.
>>> * ipa-prop.c (ipcp_free_transformation_sum): New function.
>>> * ipa-prop.h (ipcp_free_transformation_sum): Add declaration.
> Hi,
> while looking around in order to implement the suggestion, I realized
> that toplev::finalize already calls ipa_cp_c_finalize for exactly this
> purpose.
> 
> I've updated the patch accordingly.
> 
> Bootstrapped on aarch64.
> 
> Is it okay for trunk?
> 
> Bests
>   Andrea
> 
> gcc/ChangeLog
> 2019-??-??  Andrea Corallo  
> 
>   * ipa-cp.c (ipa_cp_c_finalize): Release ipcp_transformation_sum.
>   * ipa-prop.c (ipcp_free_transformation_sum): New function.
>   * ipa-prop.h (ipcp_free_transformation_sum): Add declaration.
OK for the trunk.

jeff
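
The underlying problem, state that survives from one in-process compilation to the next, can be illustrated with a toy model. The names below (`TransformationSummary`, `summary_slot`, `finalize`, `compile_once`) are hypothetical and only mimic the idea of a global summary that a finalize hook must release; they are not the GCC functions named in the ChangeLog.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

namespace sketch {

// Toy per-compilation summary keyed by function name.
struct TransformationSummary {
  std::map<std::string, int> per_function;
};

// Global slot that outlives a single compilation unless released.
inline std::unique_ptr<TransformationSummary> &summary_slot()
{
  static std::unique_ptr<TransformationSummary> slot;
  return slot;
}

// Finalize hook: drop the summary so the next compilation starts clean.
inline void finalize()
{
  summary_slot().reset();
}

// Pretend to compile one function and report how many entries the
// summary held at the end of that compilation.
inline std::size_t compile_once(const std::string &fn, bool release_at_end)
{
  std::unique_ptr<TransformationSummary> &slot = summary_slot();
  if (!slot)
    slot.reset(new TransformationSummary);
  slot->per_function[fn] = 1;
  std::size_t seen = slot->per_function.size();
  if (release_at_end)
    finalize();
  return seen;
}

} // namespace sketch
```

Without the release, a second compilation in the same process sees the first one's entry; with it, each compilation sees only its own.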


Re: [RFH][libgcc] fp-bit bit ordering (PR 78804)

2019-10-01 Thread Jeff Law
On 9/28/19 8:14 PM, Oleg Endo wrote:
> Hi,
> 
> I've been dragging this patch along with me for a while.
> At the moment, I don't have the resources to fully test it as requested
> by Ian in the PR discussion.
> 
> So I would like to ask for general comments on this one and hope that
> folks with bigger automated test setups can run the patch through their
> machinery for little endian targets.
> 
> 
> Summary of the story:
> 
> I've noticed this issue on the RX on GCC 6, but it seems it's been
> there forever.
> 
> On RX, fp-bit is used for software floating point emulation.  The RX
> target also uses "MS bit-field" layout by default.  This means that
> code like
> 
> struct
> {
>   fractype fraction:FRACBITS __attribute__ ((packed));
>   unsigned int exp:EXPBITS __attribute__ ((packed));
>   unsigned int sign:1 __attribute__ ((packed));
> } bits;
> 
> will result in sizeof (bits) != 8
> 
> For some reason, this bit-field style declaration is used only for
> FLOAT_BIT_ORDER_MISMATCH, which generally seems to be set for little
> endian targets.  In other cases (i.e. big endian) open coded bit field
> extraction and packing is used on the base integer type, like
> 
>  fraction = src->value_raw & ((((fractype)1) << FRACBITS) - 1);
>  exp = ((int)(src->value_raw >> FRACBITS)) & ((1 << EXPBITS) - 1);
>  sign = ((int)(src->value_raw >> (FRACBITS + EXPBITS))) & 1;
> 
> This works of course regardless of the bit-field packing layout of the
> target.
> 
> Joseph suggested to pack the struct bit, which would fix the issue.  
> https://gcc.gnu.org/ml/gcc-bugs/2017-08/msg01651.html
> 
> However, I would like to propose to remove the special case of
> FLOAT_BIT_ORDER_MISMATCH altogether as in the attached patch.
> 
> Any comments?
> 
> Cheers,
> Oleg
> 
> 
> 
> libgcc/ChangeLog
> 
>   PR libgcc/77804
>   * fp-bit.h: Remove FLOAT_BIT_ORDER_MISMATCH.
>   * fp-bit.c (pack_d, unpack_d): Remove special cases for 
>   FLOAT_BIT_ORDER_MISMATCH.
>   * config/arc/t-arc: Remove FLOAT_BIT_ORDER_MISMATCH.
> 
So the ask is to just test this on some LE targets?  I can do that :-)

I'll throw it in.  Analysis will be slightly more difficult than usual
as we've got some fallout from Richard S's work, but it's certainly do-able.

Jeff

ps.  And yes, I've got a request to the build farm folks to get a
jenkins instance on the build farm.  Once that's in place I can have my
tester start publishing results that everyone can see.
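
For readers following the fp-bit discussion above, here is a minimal sketch of the open-coded extraction Oleg describes, assuming an IEEE-754 binary32 float. The names kFracBits, kExpBits, FloatFields and unpack_fields are illustrative only and do not appear in fp-bit.c, which derives FRACBITS and EXPBITS from the type being emulated.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative constants for IEEE-754 binary32 (assumed, not taken from
// fp-bit.h).
constexpr int kFracBits = 23;
constexpr int kExpBits = 8;

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

struct FloatFields {
  std::uint32_t fraction;
  int exp;
  int sign;
};

// Extract the fields with shifts and masks on the raw integer value.
// No bit-field is declared, so the target's bit-field layout rules
// cannot influence the result.
inline FloatFields unpack_fields(float value) {
  std::uint32_t raw;
  std::memcpy(&raw, &value, sizeof raw);  // copy the object representation

  FloatFields f;
  f.fraction = raw & ((std::uint32_t{1} << kFracBits) - 1);
  f.exp = static_cast<int>(raw >> kFracBits) & ((1 << kExpBits) - 1);
  f.sign = static_cast<int>(raw >> (kFracBits + kExpBits)) & 1;
  return f;
}
```

Because only integer shifts and masks are involved, the same code should behave identically whether the target uses the GCC or the MS bit-field layout, which is the property the proposed patch relies on.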


Re: [PATCH] Help compiler detect invalid code

2019-10-01 Thread François Dumont

On 9/27/19 1:24 PM, Jonathan Wakely wrote:

On 20/09/19 07:08 +0200, François Dumont wrote:
I already realized that previous patch will be too controversial to 
be accepted.


In this new version I just implement a real memmove in __memmove so 


A real memmove doesn't just work backwards, it needs to detect any
overlaps and work forwards *or* backwards, as needed.
OK, good to know. I now understand why using __builtin_memcpy didn't 
show any performance improvement when I tested it!


I think your change will break this case:

#include <algorithm>

constexpr int f(int i, int j, int k)
{
 int arr[5] = { 0, 0, i, j, k };
 std::move(arr+2, arr+5, arr);
 return arr[0] + arr[1] + arr[2];
}

static_assert( f(1, 2, 3) == 6 );

This is valid because std::move only requires that the result iterator
is not in the input range, but it's OK for the two ranges to overlap.

I haven't tested it, but I think with your change the array will end
up containing {3, 2, 3, 2, 3} instead of {1, 2, 3, 2, 3}.
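
The effect of copy direction on overlapping ranges can be reproduced outside of libstdc++. The following is only a sketch under stated assumptions: copy_ascending and copy_descending are hypothetical helpers, not the __memmove or __memcopy from the patch, and they operate on a fixed five-element array purely to mirror the example above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Arr = std::array<int, 5>;

// Copy n elements from index `from` to index `to`, lowest index first.
inline Arr copy_ascending(Arr a, std::size_t from, std::size_t to,
                          std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    a[to + i] = a[from + i];
  return a;
}

// Same copy, but highest index first.
inline Arr copy_descending(Arr a, std::size_t from, std::size_t to,
                           std::size_t n) {
  for (std::size_t i = n; i != 0; --i)
    a[to + i - 1] = a[from + i - 1];
  return a;
}
```

Starting from {0, 0, 1, 2, 3} and moving the last three elements to the front, the ascending loop yields {1, 2, 3, 2, 3}, while the descending loop overwrites element 2 before reading it and yields {3, 2, 3, 2, 3}.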

Indeed, I've added a std::move constexpr test in this new proposal which 
demonstrates that.


The C++ Standard clearly states that [copy|move]_backward works backward. 
So in this new proposal I add a __memcopy, used by copy/move, 
and keep __memmove for the *_backward algorithms. Both still use 
__builtin_memmove as before.



    * include/bits/stl_algobase.h (__memmove): Return void, loop as long as
    __n != 0.
    (__memcopy): New.
    (__copy_move<_IsMove, true, std::random_access_iterator_tag>::__copy_m):
    Adapt to use latter.
    (__copy_move_backward_a): Remove std::is_constant_evaluated block.
    * testsuite/25_algorithms/copy/constexpr.cc (test): Add check on copied
    values.
    * testsuite/25_algorithms/copy_backward/constexpr.cc (test): Likewise
    and rename in test1.
    (test2): New.
    * testsuite/25_algorithms/copy/constexpr_neg.cc: New.
    * testsuite/25_algorithms/copy_backward/constexpr.cc: New.
    * testsuite/25_algorithms/equal/constexpr_neg.cc: New.
    * testsuite/25_algorithms/move/constexpr.cc: New.
    * testsuite/25_algorithms/move/constexpr_neg.cc: New.

Tested under Linux x86_64.

Ok to commit ?

François

Index: include/bits/stl_algobase.h
===
--- include/bits/stl_algobase.h	(révision 276259)
+++ include/bits/stl_algobase.h	(copie de travail)
@@ -78,18 +78,18 @@
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /*
-   * A constexpr wrapper for __builtin_memmove.
+   * A constexpr wrapper for memcopy.
* @param __num The number of elements of type _Tp (not bytes).
*/
   template
 _GLIBCXX14_CONSTEXPR
-inline void*
-__memmove(_Tp* __dst, const _Tp* __src, size_t __num)
+inline void
+__memcopy(_Tp* __dst, const _Tp* __src, ptrdiff_t __num)
 {
 #ifdef __cpp_lib_is_constant_evaluated
   if (std::is_constant_evaluated())
 	{
-	  for(; __num > 0; --__num)
+	  for (; __num != 0; --__num)
 	{
 	  if constexpr (_IsMove)
 		*__dst = std::move(*__src);
@@ -98,15 +98,40 @@
 	  ++__src;
 	  ++__dst;
 	}
-	  return __dst;
 	}
   else
 #endif
-	return __builtin_memmove(__dst, __src, sizeof(_Tp) * __num);
-  return __dst;
+	__builtin_memmove(__dst, __src, sizeof(_Tp) * __num);
 }
 
   /*
+   * A constexpr wrapper for memmove.
+   * @param __num The number of elements of type _Tp (not bytes).
+   */
+  template
+_GLIBCXX14_CONSTEXPR
+inline void
+__memmove(_Tp* __dst, const _Tp* __src, ptrdiff_t __num)
+{
+#ifdef __cpp_lib_is_constant_evaluated
+  if (std::is_constant_evaluated())
+	{
+	  __dst += __num;
+	  __src += __num;
+	  for (; __num != 0; --__num)
+	{
+	  if constexpr (_IsMove)
+		*--__dst = std::move(*--__src);
+	  else
+		*--__dst = *--__src;
+	}
+	}
+  else
+#endif
+	__builtin_memmove(__dst, __src, sizeof(_Tp) * __num);
+}
+
+  /*
* A constexpr wrapper for __builtin_memcmp.
* @param __num The number of elements of type _Tp (not bytes).
*/
@@ -446,7 +471,7 @@
 #endif
 	  const ptrdiff_t _Num = __last - __first;
 	  if (_Num)
-	std::__memmove<_IsMove>(__result, __first, _Num);
+	std::__memcopy<_IsMove>(__result, __first, _Num);
 	  return __result + _Num;
 	}
 };
@@ -676,12 +701,6 @@
 			 && __is_pointer<_BI2>::__value
 			 && __are_same<_ValueType1, _ValueType2>::__value);
 
-#ifdef __cpp_lib_is_constant_evaluated
-  if (std::is_constant_evaluated())
-	return std::__copy_move_backward::__copy_move_b(__first, __last,
-			__result);
-#endif
   return std::__copy_move_backward<_IsMove, __simple,
    _Category>::__copy_move_b(__first,
  __last,
Index: testsuite/25_algorithms/copy/constexpr.cc
===
--- testsuite/25_algorithms/copy/constexpr.cc	(révision 276259)
+++ testsuite/25_algorithms/copy/constexpr.cc	(copie de travail)
@@ -24,12 +24,12 @@
 constexpr bool
 test()
 {
-  constexpr std::array ca0{{0, 1, 2, 3, 4, 5, 

[committed, obvious] Regenerate `liboffloadmic/plugin/configure' for `uclinuxfdpiceabi'

2019-10-01 Thread Maciej W. Rozycki
Regenerate `liboffloadmic/plugin/configure' for r275564 ("[ARM/FDPIC v6 
02/24] [ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure 
scripts") too.

liboffloadmic/
* plugin/configure: Regenerate.
---
On Sun, 29 Sep 2019, Christophe Lyon wrote:

> > A change made with r275564 ("[ARM/FDPIC v6 02/24] [ARM] FDPIC: Handle
> > arm*-*-uclinuxfdpiceabi in configure scripts") to libtool.m4 has not
> > regenerated all the `configure' scripts affected.  Fix it.
> >
> 
> Oops sorry!
> I knew I forgot something

 No worries.  Here's another one, which I missed as I only checked top 
level subdirectories.

  Maciej
---
 liboffloadmic/plugin/configure |   22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

gcc-configure-uclinux-regen-more.diff
Index: gcc/liboffloadmic/plugin/configure
===
--- gcc.orig/liboffloadmic/plugin/configure
+++ gcc/liboffloadmic/plugin/configure
@@ -5746,7 +5746,7 @@ irix5* | irix6* | nonstopux*)
   ;;
 
 # This must be Linux ELF.
-linux* | k*bsd*-gnu | kopensolaris*-gnu)
+linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinuxfdpiceabi)
   lt_cv_deplibs_check_method=pass_all
   ;;
 
@@ -8825,7 +8825,7 @@ _LT_EOF
   archive_expsym_cmds='sed "s,^,_," $export_symbols 
>$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs 
$compiler_flags ${wl}-h,$soname 
${wl}--retain-symbols-file,$output_objdir/$soname.expsym 
${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
   ;;
 
-gnu* | linux* | tpf* | k*bsd*-gnu | kopensolaris*-gnu)
+gnu* | linux* | tpf* | k*bsd*-gnu | kopensolaris*-gnu | uclinuxfdpiceabi)
   tmp_diet=no
   if test "$host_os" = linux-dietlibc; then
case $cc_basename in
@@ -10356,7 +10356,12 @@ linux*oldld* | linux*aout* | linux*coff*
   ;;
 
 # This must be Linux ELF.
-linux* | k*bsd*-gnu | kopensolaris*-gnu)
+
+# uclinux* changes (here and below) have been submitted to the libtool
+# project, but have not yet been accepted: they are GCC-local changes
+# for the time being.  (See
+# https://lists.gnu.org/archive/html/libtool-patches/2018-05/msg0.html)
+linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinuxfdpiceabi)
   version_type=linux
   need_lib_prefix=no
   need_version=no
@@ -11045,7 +11050,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11048 "configure"
+#line 11053 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11151,7 +11156,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11154 "configure"
+#line 11159 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -14016,7 +14021,12 @@ linux*oldld* | linux*aout* | linux*coff*
   ;;
 
 # This must be Linux ELF.
-linux* | k*bsd*-gnu | kopensolaris*-gnu)
+
+# uclinux* changes (here and below) have been submitted to the libtool
+# project, but have not yet been accepted: they are GCC-local changes
+# for the time being.  (See
+# https://lists.gnu.org/archive/html/libtool-patches/2018-05/msg0.html)
+linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinuxfdpiceabi)
   version_type=linux
   need_lib_prefix=no
   need_version=no


Re: [PATCH] Fix PR fortran/91716

2019-10-01 Thread Bernd Edlinger
On 9/13/19 12:16 PM, Janne Blomqvist wrote:
> On Fri, Sep 13, 2019 at 1:07 PM Bernd Edlinger
>  wrote:
>>
>> Hi,
>>
>> this fixes a test case where a short string constant is put in a larger 
>> memory object.
>>
>> The consistency check in varasm.c fails because both types should agree.
>>
>> Since the failed assertion is just a gcc_checking_assert I think a back-port 
>> of this fix
>> to the gcc-9 branch will not be necessary.
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
> 
> Ok.
> 

Well, I have mistakenly assumed that this triggers only a "checking" assert,
but it turned out that is not the case, as written in last comment in the BZ,
immediately after that gcc_checking_assert, there is a gcc_assert, and also
an ICE in the gcc-9 branch.  The same patch fixes also the second problem,
and survives reg-bootstrap and testing on x86_64-pc-linux-gnu as expected.


So I would like to ask at this time, if it is also OK for gcc-9 ?


Thanks
Bernd.


Re: [PATCH] Come up with json::integer_number and use it in GCOV.

2019-10-01 Thread Jeff Law
On 9/18/19 2:04 AM, Martin Liška wrote:
> PING^4
> 
> Just note that the author of the JSON implementation
> in GCC is fine with the patch ;)
OK if this is still pending :-)
jeff


Re: [SVE] PR91532

2019-10-01 Thread Jeff Law
On 10/1/19 12:40 AM, Richard Biener wrote:
> On Mon, 30 Sep 2019, Prathamesh Kulkarni wrote:
> 
>> On Wed, 25 Sep 2019 at 23:44, Richard Biener  wrote:
>>>
>>> On Wed, 25 Sep 2019, Prathamesh Kulkarni wrote:
>>>
>>>> On Fri, 20 Sep 2019 at 15:20, Jeff Law  wrote:
>>>>>
>>>>> On 9/19/19 10:19 AM, Prathamesh Kulkarni wrote:
>>>>>> Hi,
>>>>>> For PR91532, the dead store is trivially deleted if we place dse pass
>>>>>> between ifcvt and vect. Would it be OK to add another instance of dse 
>>>>>> there ?
>>>>>> Or should we add an ad-hoc "basic-block dse" sub-pass to ifcvt that
>>>>>> will clean up the dead store ?
>>>>> I'd hesitate to add another DSE pass.  If there's one nearby could we
>>>>> move the existing pass?
>>>> Well I think the nearest one is just after pass_warn_restrict. Not
>>>> sure if it's a good
>>>> idea to move it up from there ?
>>>
>>> You'll need it inbetween ifcvt and vect so it would be disabled
>>> w/o vectorization, so no, that doesn't work.
>>>
>>> ifcvt already invokes SEME region value-numbering so if we had
>>> MESE region DSE it could use that.  Not sure if you feel like
>>> refactoring DSE to work on regions - it currently uses a DOM
>>> walk which isn't suited for that.
>>>
>>> if-conversion has a little "local" dead predicate compute removal
>>> thingy (not that I like that), eventually it can be enhanced to
>>> do the DSE you want?  Eventually it should be moved after the local
>>> CSE invocation though.
>> Hi,
>> Thanks for the suggestions.
>> For now, would it be OK to do "dse" on loop header in
>> tree_if_conversion, as in the attached patch ?
>> The patch does local dse in a new function ifcvt_local_dse instead of
>> ifcvt_local_dce, because it needed to be done after RPO VN which
>> eliminates:
>> Removing dead stmt _ifc__62 = *_55;
>> and makes the following store dead:
>> *_55 = _ifc__61;
> 
> I suggested trying to move ifcvt_local_dce after RPO VN, you could
> try that as independent patch (pre-approved).
> 
> I don't mind the extra walk though.
> 
> What I see as possible issue is that dse_classify_store walks virtual
> uses and I'm not sure if the loop exit is a natural boundary for
> such walk (eventually the loop header virtual PHI is reached but
> there may also be a loop-closed PHI for the virtual operand,
> but not necessarily).  So the question is whether to add a
> "stop at" argument to dse_classify_store specifying the virtual
> use the walk should stop at?
I think we want to stop at the block boundary -- aren't the cases we
care about here local to a block?

jeff


Re: [PATCH] [MIPS] Fix PR target/91769

2019-10-01 Thread Jeff Law
On 9/25/19 1:16 AM, Dragan Mladjenovic wrote:
> From: "Dragan Mladjenovic" 
> 
> This fixes the issue by checking that addr's base reg is not part of dest
> multiword reg instead of just checking the first reg of dest.
> 
> gcc/ChangeLog:
> 
> 2019-09-25  Dragan Mladjenovic  
> 
>   PR target/91769
>   * config/mips/mips.c (mips_split_move): Use reg_overlap_mentioned_p
>   instead of REGNO equality check on addr.reg.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-09-25  Dragan Mladjenovic  
> 
>   PR target/91769
>   * gcc.target/mips/pr91769.c: New test.
OK.  This would seem fine to backport to gcc-9 as well.  I don't think
gcc-8 had this code.

Sorry for introducing the problem (this looks like mine :(

jeff
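
To illustrate why the first-register comparison was not enough, here is a toy model of the check, written as a sketch only. RegSpan, base_is_first_reg and base_overlaps are hypothetical names; this is not GCC's RTL representation, nor the implementation of reg_overlap_mentioned_p.

```cpp
#include <cassert>

// Toy model: a multiword value occupies `count` consecutive hard registers
// starting at `first`.
struct RegSpan {
  unsigned first;
  unsigned count;
};

// The old-style test: only notices a clash with the first register.
inline bool base_is_first_reg(RegSpan dest, unsigned base) {
  return base == dest.first;
}

// The overlap test: notices a clash with any register in the span.
inline bool base_overlaps(RegSpan dest, unsigned base) {
  return base >= dest.first && base - dest.first < dest.count;
}
```

With a destination occupying registers 4 and 5 and an address based on register 5, the first test reports no conflict while the second one does, which is the situation the patch description refers to.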


Re: [EXTERNAL]Re: [RFC/PATCH v2][PR89245] Check REG_CALL_DECL note during the tail-merging

2019-10-01 Thread Jeff Law
On 9/6/19 4:23 AM, Dragan Mladjenovic wrote:
> On 24.07.2019. 20:57, Jeff Law wrote:
>> On 7/17/19 2:29 AM, Dragan Mladjenovic wrote:
>>>
>>>
>>> On 09.07.2019. 23:21, Jeff Law wrote:
>>>> On 7/9/19 2:06 PM, Dragan Mladjenovic wrote:
>>>>> This patch prevents merging of CALL instructions that have different
>>>>> REG_CALL_DECL notes attached to them.
>>>>>
>>>>> On most architectures this is not an important distinction. Usually 
>>>>> instruction patterns
>>>>> for calls to different functions reference different SYMBOL_REF-s, so 
>>>>> they won't match.
>>>>> On MIPS PIC calls get split into an got_load/*call_internal pair where 
>>>>> the latter represents
>>>>> indirect register call w/o SYMBOL_REF attached (until machine_reorg 
>>>>> pass). The bugzilla issue
>>>>> had such two internal_call-s merged despite the fact that they had 
>>>>> different register usage
>>>>> information assigned by ipa-ra.
>>>>>
>>>>> As per comment from Richard Sandiford, this version compares reg usage 
>>>>> for both call
>>>>> instructions instead of shallow comparing the notes. Tests updated 
>>>>> accordingly.
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> 2019-07-09  Dragan Mladjenovic  
>>>>>
>>>>>   * cfgcleanup.c (old_insns_match_p): Check if used hard regs set is equal
>>>>>   for both call instructions.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> 2019-07-09  Dragan Mladjenovic  
>>>>>
>>>>>   * gcc.target/mips/cfgcleanup-jalr1.c: New test.
>>>>>   * gcc.target/mips/cfgcleanup-jalr2.c: New test.
>>>>>   * gcc.target/mips/cfgcleanup-jalr3.c: New test.
>>>> Thanks.  I've installed this on the trunk.
>>>>
>>>> jeff
>>> Thanks. Can this be back-ported to active branches also? This issue
>>> seems to be there since gcc6 if not gcc5.
>> I've asked Matthew to handle the backport.  I'm going to be on PTO the
>> next couple weeks.
>>
>> jeff
>>
> 
> Hi,
> 
> Sorry, I forgot to check up on this patch. Is it still ok for me to try 
> to backport it to gcc 9 and gcc 8 branches?
Yes, this would be fine to backport to gcc-8 and gcc-9 branches.  I'd
expect it to be pretty easy as I don't think old_insns_match_p has
changed much.

Jeff


Re: [ping][PR target/85401] initialize the move cost table before using it

2019-10-01 Thread Jeff Law
On 9/30/19 2:45 PM, co...@sdf.org wrote:
> On Mon, Sep 30, 2019 at 11:46:24AM -0400, Vladimir Makarov wrote:
>> Yes, the patch is mostly ok.  You can commit it into the trunk after
>> applying changes mentioned below. If you do not have a write access, let me
>> know I'll commit the patch by myself.
> 
> I don't have commit access. It would be nice if you committed it :)
I took care of the nits and committed the patch.


> 
>> It would be nice to add a small test too.  But it is not obligatory for this
>> case as the patch is obvious and it might be hard to create a small test to
>> reproduce the bug.
> 
> I have the C code that produces this failure. I can creduce it, but I'm
> not sure there's a relationship between it and the bug.
> Doing unrelated changes (adding instruction scheduling) to vax also hid it.
> 
> Is this kind of test still valuable?
Often they are.

jeff


Re: [PATCH 2/2] libada: Respect `--enable-version-specific-runtime-libs'

2019-10-01 Thread Maciej W. Rozycki
On Fri, 27 Sep 2019, Arnaud Charlet wrote:

> >  Shall I amend the change description anyhow then?  I know it has not (as 
> > yet, as discussed at the GNU Tools Cauldron recently) been enforced for 
> > the GCC project (unlike with e.g. glibc), however I mean to use it whole 
> > as the commit message, which is what I have been doing for quite a while 
> > now, because I recognise the value of change descriptions for future 
> > repository examination.
> 
> Sure, add any needed clarification.

 I have added this:

This lets one have `libgnarl-10.so' and `libgnat-10.so' installed in say
/usr/lib and /usr/$(target_alias)/lib for a native and a cross-build
respectively, rather than in /usr/lib/gcc/$(target_alias)/10.0.0/adalib.

and committed the change now.

  Maciej


Re: [01/32] Add function_abi.{h,cc}

2019-10-01 Thread Bernd Edlinger
Hi,

I am currently trying to implement -Wshadow=local, and
this patch triggers a build failure with -Wshadow=local
since i is a parameter that holds the regno.
But it is also used as a loop variable,
so I think this probably introduces a bug:

> @@ -728,7 +731,11 @@ globalize_reg (tree decl, int i)
>   appropriate regs_invalidated_by_call bit, even if it's already
>   set in fixed_regs.  */
>if (i != STACK_POINTER_REGNUM)
> -SET_HARD_REG_BIT (regs_invalidated_by_call, i);
> +{
> +  SET_HARD_REG_BIT (regs_invalidated_by_call, i);
> +  for (unsigned int i = 0; i < NUM_ABI_IDS; ++i)
> + function_abis[i].add_full_reg_clobber (i);
> +}


I would think you meant:

for (unsigned int j = 0; j < NUM_ABI_IDS; ++j)
  function_abis[j].add_full_reg_clobber (i);

Right?

Thanks
Bernd.
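
The shadowing problem can be shown in isolation. This is a sketch under assumptions: the constants and the AbiClobbers type are invented for illustration and only model the shape of the code in the quoted hunk, they are not the function_abi interface.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical sizes and register number, chosen only for the example.
constexpr std::size_t kNumAbiIds = 3;
constexpr std::size_t kNumRegs = 8;
constexpr std::size_t kStackPointerRegnum = 7;

using AbiClobbers = std::array<std::bitset<kNumRegs>, kNumAbiIds>;

// Shadowed version: the loop variable hides the parameter, so ABI number i
// gets register i marked instead of the requested register.
inline AbiClobbers clobber_shadowed(std::size_t i) {
  AbiClobbers abis{};
  if (i != kStackPointerRegnum) {
    for (std::size_t i = 0; i < kNumAbiIds; ++i)  // hides the parameter
      abis[i].set(i);
  }
  return abis;
}

// Fixed version: a distinct loop variable keeps the parameter visible.
inline AbiClobbers clobber_fixed(std::size_t i) {
  AbiClobbers abis{};
  if (i != kStackPointerRegnum) {
    for (std::size_t j = 0; j < kNumAbiIds; ++j)
      abis[j].set(i);
  }
  return abis;
}
```

Calling clobber_fixed (5) marks register 5 in every ABI, whereas clobber_shadowed (5) marks registers 0, 1 and 2 in ABIs 0, 1 and 2 and never touches register 5.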


Re: [PATCH] Add some hash_map_safe_* functions like vec_safe_*.

2019-10-01 Thread Jason Merrill

On 10/1/19 3:34 AM, Richard Biener wrote:

On Mon, Sep 30, 2019 at 8:33 PM Jason Merrill  wrote:


My comments accidentally got lost.

Several places in the front-end (and elsewhere) use the same lazy
allocation pattern for hash_maps, and this patch replaces them with
hash_map_safe_* functions like vec_safe_*.  They don't provide a way
to specify an initial size, but I don't think that's a significant
issue.
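
As an illustration of the lazy allocation pattern, here is a sketch that uses std::unordered_map as a stand-in for GCC's hash_map. The names map_maybe_create, map_safe_get and map_safe_put are hypothetical, and the sketch assumes that put reports whether the key was already present.

```cpp
#include <cassert>
#include <unordered_map>

template <typename K, typename V>
using Map = std::unordered_map<K, V>;

// Allocate the map on first use; later calls return the existing one.
template <typename K, typename V>
Map<K, V>* map_maybe_create(Map<K, V>*& h) {
  if (!h)
    h = new Map<K, V>();
  return h;
}

// Lookup that tolerates a map that was never allocated.
template <typename K, typename V>
V* map_safe_get(Map<K, V>* h, const K& k) {
  if (!h)
    return nullptr;
  auto it = h->find(k);
  return it == h->end() ? nullptr : &it->second;
}

// Insert or overwrite, allocating the map if needed.
// Returns true if the key was already present (an assumption of this sketch).
template <typename K, typename V>
bool map_safe_put(Map<K, V>*& h, const K& k, const V& v) {
  auto res = map_maybe_create(h)->emplace(k, v);
  if (!res.second)
    res.first->second = v;
  return !res.second;
}
```

The point of the wrappers is that callers no longer need a separate "maybe initialize" helper per table: a lookup on a null pointer simply misses, and the first insertion performs the allocation.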

Tested x86_64-pc-linux-gnu.  OK for trunk?


You are using create_ggc but the new functions do not indicate that ggc
allocation is done.
It's then also incomplete with not having a non-ggc variant
of them?  Maybe I'm missing something.


Ah, I had been thinking that this lazy pattern would only be used with 
ggc, but I see that I was wrong.  How's this?


Incidentally, now I see another C++11 feature I'd like to be able to 
use: default template arguments for function templates.


commit c091a74dc7b1550d7dd633dbde7ed7931cd4b025
Author: Jason Merrill 
Date:   Mon Sep 30 13:17:27 2019 -0400

Add some hash_map_safe_* functions like vec_safe_*.

gcc/
* hash-map.h (default_hash_map_size): New variable.
(create_ggc): Use it as default argument.
(hash_map_maybe_create, hash_map_safe_get)
(hash_map_safe_get_or_insert, hash_map_safe_put): New fns.
gcc/cp/
* constexpr.c (maybe_initialize_fundef_copies_table): Remove.
(get_fundef_copy): Use hash_map_safe_get_or_insert.
* cp-objcp-common.c (cp_get_debug_type): Use hash_map_safe_*.
* decl.c (store_decomp_type): Remove.
(cp_finish_decomp): Use hash_map_safe_put.
* init.c (get_nsdmi): Use hash_map_safe_*.
* pt.c (store_defaulted_ttp, lookup_defaulted_ttp): Remove.
(add_defaults_to_ttp): Use hash_map_safe_*.

diff --git a/gcc/hash-map.h b/gcc/hash-map.h
index ba20fe79f23..ce98d3431c3 100644
--- a/gcc/hash-map.h
+++ b/gcc/hash-map.h
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
removed.  Objects of hash_map type are copy-constructible but not
assignable.  */
 
+const size_t default_hash_map_size = 13;
 template,
 			Value> */>
@@ -129,7 +130,7 @@ class GTY((user)) hash_map
   };
 
 public:
-  explicit hash_map (size_t n = 13, bool ggc = false,
+  explicit hash_map (size_t n = default_hash_map_size, bool ggc = false,
 		 bool sanitize_eq_and_hash = true,
 		 bool gather_mem_stats = GATHER_STATISTICS
 		 CXX_MEM_STAT_INFO)
@@ -146,7 +147,7 @@ public:
 	   HASH_MAP_ORIGIN PASS_MEM_STAT) {}
 
   /* Create a hash_map in ggc memory.  */
-  static hash_map *create_ggc (size_t size,
+  static hash_map *create_ggc (size_t size = default_hash_map_size,
 			   bool gather_mem_stats = GATHER_STATISTICS
 			   CXX_MEM_STAT_INFO)
 {
@@ -326,4 +327,43 @@ gt_pch_nx (hash_map *h, gt_pointer_operator op, void *cookie)
   op (>m_table.m_entries, cookie);
 }
 
+enum hm_alloc { hm_heap = false, hm_ggc = true };
+template
+inline hash_map *
+hash_map_maybe_create (hash_map *)
+{
+  if (!h)
+{
+  if (ggc)
+	h = hash_map::create_ggc (size);
+  else
+	h = new hash_map (size);
+}
+  return h;
+}
+
+/* Like h->get, but handles null h.  */
+template
+inline V*
+hash_map_safe_get (hash_map *h, const K& k)
+{
+  return h ? h->get (k) : NULL;
+}
+
+/* Like h->get_or_insert, but handles null h.  */
+template
+inline V&
+hash_map_safe_get_or_insert (hash_map *, const K& k, bool *e = NULL)
+{
+  return hash_map_maybe_create (h)->get_or_insert (k, e);
+}
+
+/* Like h->put, but handles null h.  */
+template
+inline bool
+hash_map_safe_put (hash_map *, const K& k, const V& v)
+{
+  return hash_map_maybe_create (h)->put (k, v);
+}
+
 #endif
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index cb5484f4b72..6366872695a 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1098,15 +1098,6 @@ maybe_initialize_constexpr_call_table (void)
 
 static GTY(()) hash_map *fundef_copies_table;
 
-/* Initialize FUNDEF_COPIES_TABLE if it's not initialized.  */
-
-static void
-maybe_initialize_fundef_copies_table ()
-{
-  if (fundef_copies_table == NULL)
-fundef_copies_table = hash_map::create_ggc (101);
-}
-
 /* Reuse a copy or create a new unshared copy of the function FUN.
Return this copy.  We use a TREE_LIST whose PURPOSE is body, VALUE
is parms, TYPE is result.  */
@@ -1114,11 +1105,10 @@ maybe_initialize_fundef_copies_table ()
 static tree
 get_fundef_copy (constexpr_fundef *fundef)
 {
-  maybe_initialize_fundef_copies_table ();
-
   tree copy;
   bool existed;
-  tree *slot = _copies_table->get_or_insert (fundef->decl, );
+  tree *slot = &(hash_map_safe_get_or_insert
+		 (fundef_copies_table, fundef->decl, ));
 
   if (!existed)
 {
diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
index 4369a5b5570..990e3f04db9 100644
--- a/gcc/cp/cp-objcp-common.c
+++ b/gcc/cp/cp-objcp-common.c
@@ -145,11 +145,9 @@ 

[committed v3 1/2] libada: Remove racy duplicate gnatlib installation

2019-10-01 Thread Maciej W. Rozycki
For some reason, presumably historical, the `install-gnatlib' target for 
the default multilib is invoked twice, once via the `ada.install-common' 
target in `gcc/ada/gcc-interface/Make-lang.in' invoked from gcc/ and 
again via the `install-libada' target in libada/.

Apart from doing the same twice this is actually harmful in sufficiently 
parallelized `make' invocation, as the removal of old files performed 
within the `install-gnatlib' recipe in the former case actually races 
with the installation of new files done in the latter case, causing the 
recipe to fail and abort, however non-fatally, having not completed the 
installation of all the built files needed for the newly-built compiler 
to work correctly.

This can be observed with a native `x86_64-linux-gnu' bootstrap:

make[4]: Entering directory '.../gcc/ada'
rm -rf .../lib/gcc/x86_64-linux-gnu/10.0.0/adalib
rm: cannot remove '.../lib/gcc/x86_64-linux-gnu/10.0.0/adalib': Directory not empty
make[4]: *** [gcc-interface/Makefile:512: install-gnatlib] Error 1
make[4]: Leaving directory '.../gcc/ada'
make[3]: *** [.../gcc/ada/gcc-interface/Make-lang.in:853: install-gnatlib] Error 2
make[2]: [.../gcc/ada/gcc-interface/Make-lang.in:829: ada.install-common] Error 2 (ignored)

which then causes missing files to be reported when an attempt is made 
to use the newly-installed non-functional compiler to build a 
`riscv-linux-gnu' cross-compiler:

(cd ada/bldtools/sinfo; gnatmake -q xsinfo ; ./xsinfo sinfo.h )
error: "ada.ali" not found, "ada.ads" must be compiled
error: "s-memory.ali" not found, "s-memory.adb" must be compiled
gnatmake: *** bind failed.
/bin/sh: ./xsinfo: No such file or directory
make[2]: *** [.../gcc/ada/Make-generated.in:45: ada/sinfo.h] Error 127
make[2]: Leaving directory '.../gcc'
make[1]: *** [Makefile:4369: all-gcc] Error 2
make[1]: Leaving directory '...'
make: *** [Makefile:965: all] Error 2

Depending on timing `.../lib/gcc/x86_64-linux-gnu/10.0.0/adainclude' may
cause an installation failure instead and the resulting compiler may be 
non-functional in a different way.

Only invoke `install-gnatlib' from within gcc/ then if a legacy build 
process is being used with libada disabled and gnatlib built manually 
with `make -C gcc gnatlib'.

gcc/
* Makefile.in (gnat_install_lib): New variable.
* configure.ac: Substitute it.
* configure: Regenerate.

gcc/ada/
* gcc-interface/Make-lang.in (ada.install-common): Split into...
(gnat-install-tools, gnat-install-lib): ... these.
---
On Tue, 1 Oct 2019, Arnaud Charlet wrote:

> >  I have verified this change by running my combined build process where a 
> > native `x86_64-linux-gnu' configuration is built first and then used to 
> > build a `riscv64-linux-gnu' cross-compiler, both with `--disable-libada' 
> > specified, without and with this patch applied.  I have added `make -C gcc 
> > gnatlib && make -C gcc gnattools' as an extra build step before `make 
> > install'.
> > 
> >  This has run up to failing to find `riscv64-linux-gnu' system headers in 
> > `make -C gcc gnatlib' as noted above, at which point the installation 
> > trees had both the same contents, including `x86_64-linux-gnu' gnatlib 
> > development files and static libraries as well as gnattools in particular.
> 
> Can you also please do a native build with --disable-libada and
> make -C gcc gnatlib && make -C gcc gnattools && make install
> ?

 I had actually done that already, as described in the first paragraph 
quoted above.

> Once successful, the change is OK, thanks for the extra work.

 Here's the final version I have committed then, with the small adjustment 
mentioned earlier on and having brought the formatting of the commit 
description broken in v2 back to order.  Thank you for your review.

  Maciej

Changes from v1:

- gnatlib installation now retained in gcc/ada/gcc-interface/Make-lang.in 
  and instead invoked iff `--disable-libada' has been requested at the top 
  level.

Changes from v2:

- use an ordering dependency only between `gnat-install-lib' and
  `gnat-install-tools'.
---
 gcc/Makefile.in|4 
 gcc/ada/gcc-interface/Make-lang.in |5 -
 gcc/configure  |   15 +--
 gcc/configure.ac   |   10 ++
 4 files changed, 31 insertions(+), 3 deletions(-)

gcc-lang-no-install-gnatlib.diff
Index: gcc/gcc/Makefile.in
===
--- gcc.orig/gcc/Makefile.in
+++ gcc/gcc/Makefile.in
@@ -1706,6 +1706,10 @@ $(FULL_DRIVER_NAME): ./xgcc
 # language hooks, generated by configure
 @language_hooks@
 
+# Wire in install-gnatlib invocation with `make install' for a configuration
+# with top-level libada disabled.
+gnat_install_lib = @gnat_install_lib@
+
 # per-language makefile fragments
 ifneq ($(LANG_MAKEFRAGS),)
 include $(LANG_MAKEFRAGS)
Index: gcc/gcc/ada/gcc-interface/Make-lang.in

Re: [PATCH 2/2][GCC][RFC][middle-end]: Add complex number arithmetic patterns to SLP pattern matcher.

2019-10-01 Thread Toon Moene

On 10/1/19 1:39 PM, Tamar Christina wrote:


The patterns work by looking at the sequence produced after GCC lowers complex
numbers.  As such they would match any normal operation that does the same
computations.


Thanks - I didn't understand Ramana's comments during the GNU Tools 
Cauldron about this feature, but now I do.


Can't wait to put my (upcoming) ThunderX hardware to work on this (plus 
that I have to teach *a lot* of 30-year+ Fortran programmers that you do 
not have to lower COMPLEX arithmetic yourself, because the compiler will 
do this optimally for you ...).


Kind regards,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


FIX ICE building cactusBSSN

2019-10-01 Thread Jan Hubicka
Hi,
sorry for taking so long to get to this.  This patch fixes the ICE which
happens when we try to output warning about anonymous type.
As Jason explains in the PR log the warning is correct and I think we
should warn at compile time when parsing 
$ cat 2.ii
extern "C" {
struct {
} admbaserest_;
}
as there seems to be no way to use admbaserest_ from other translation
unit in standard conforming way?

Honza

PR lto/91222
* ipa-devirt.c (warn_types_mismatch): Do not ICE when anonymous type
is matched with non-C++ type
* g++.dg/lto/odr-6_0.C: New testcase.
* g++.dg/lto/odr-6_1.c: New testcase.
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 276272)
+++ ipa-devirt.c(working copy)
@@ -992,14 +992,14 @@ warn_types_mismatch (tree t1, tree t2, l
  std::swap (t1, t2);
  std::swap (loc_t1, loc_t2);
}
-  gcc_assert (TYPE_NAME (t1) && TYPE_NAME (t2)
- && TREE_CODE (TYPE_NAME (t1)) == TYPE_DECL
- && TREE_CODE (TYPE_NAME (t2)) == TYPE_DECL);
+  gcc_assert (TYPE_NAME (t1)
+ && TREE_CODE (TYPE_NAME (t1)) == TYPE_DECL);
   tree n1 = TYPE_NAME (t1);
-  tree n2 = TYPE_NAME (t2);
+  tree n2 = TYPE_NAME (t2) ? TYPE_NAME (t2) : NULL;
+
   if (TREE_CODE (n1) == TYPE_DECL)
n1 = DECL_NAME (n1);
-  if (TREE_CODE (n2) == TYPE_DECL)
+  if (n2 && TREE_CODE (n2) == TYPE_DECL)
n2 = DECL_NAME (n2);
   /* Most of the time, the type names will match, do not be unnecesarily
  verbose.  */
Index: testsuite/g++.dg/lto/odr-6_0.C
===
--- testsuite/g++.dg/lto/odr-6_0.C  (nonexistent)
+++ testsuite/g++.dg/lto/odr-6_0.C  (working copy)
@@ -0,0 +1,8 @@
+// { dg-lto-do link }
+extern "C" {
+struct {  // { dg-lto-message "" 2 }
+} admbaserest_;
+}
+int main()
+{
+}
Index: testsuite/g++.dg/lto/odr-6_1.c
===
--- testsuite/g++.dg/lto/odr-6_1.c  (nonexistent)
+++ testsuite/g++.dg/lto/odr-6_1.c  (working copy)
@@ -0,0 +1,4 @@
+struct {} admbaserest_; // { dg-lto-message "type of " 2 }
+
+
+


Re: [PATCH] add __has_builtin (PR 66970)

2019-10-01 Thread Martin Sebor

On 10/1/19 11:38 AM, Jakub Jelinek wrote:

On Tue, Oct 01, 2019 at 11:16:10AM -0600, Martin Sebor wrote:

Attached is an implementation of the __has_builtin special
preprocessor operator/macro analogous to __has_attribute and
(hopefully) compatible with the synonymous Clang feature (I
couldn't actually find tests for it in the Clang test suite
but if someone points me at them I'll verify it).


For the __builtin_*/__sync_*/__atomic_* etc. builtins whether something
is a builtin or not is quite clear, basically whether if using the right
operands for it will compile and do something.
For the library functions, what does it mean that something is a builtin
though?  In some cases we have them in the builtin tables only
to have some predefined attributes for them, in other cases because the
compiler is aware of some of their special properties, in other cases that
the compiler will sometimes optimize them into something else or inline them
and at other times keep them as library calls, in some cases only for the
compiler to be able to emit calls to those routines in generated code,
in other cases to always optimize them/inline them and never emit the
library function (which perhaps doesn't exist).
I believe that for some builtins like stpcpy we treat them as builtins only
if we see them prototyped first.
And then for C++ with namespaces and symbol lookup rules, whether something
is or isn't a builtin depends not just on the identifier, but namespace too
and maybe argument types too.

So, what do we want __has_builtin to mean for those?

Say, do we want __has_builtin (_exit) just because we have it in the tables
to 1) add noreturn/nothrow/leaf attributes to it 2) consider it in branch
prediction heuristics, but otherwise don't do anything special with it?


As I understand it from the Clang manual and my limited testing,
__has_builtin is supposed to evaluate to true if the argument is
recognized as the name of a built-in function or function-like
operator.

The number of arguments to the built-in or their types are not
considered in this context (or even available).  Neither are
other declarations of the same symbol.

The intent is simply to determine whether the name can be used
but not how or to what effect.

This is similar to other such queries like __has_attribute which
also doesn't consider the number of attribute operands, or what
the attribute can apply to, or what its effect might be.  Likewise
for __has_include which evaluates to true when the referenced
header can be found by an #include directive without implying
that #including it will not trigger errors for what's in
the header.

Martin

PS As an example, this complete test case passes with both
compilers unless abs is disabled with -fno-builtin.

  #if !__has_builtin (abs)
  # error "__has_builtin (abs) --> false"
  #endif

Ditto if abs is declared as a different symbol first, effectively
preventing the built-in abs from being used.
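
As a rough illustration of the intended use (a sketch only, not part of
the patch): portable code would typically supply a fallback definition
for compilers that lack the operator and then branch on the query.  The
wrapper name checked_abs is hypothetical.

```c
/* Compilers that predate __has_builtin do not define it at all; make
   every query evaluate to false there so the #if below stays valid.  */
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

/* Hypothetical wrapper: use the built-in when its name is recognized,
   otherwise fall back to plain C.  Both paths return the same value.  */
static int
checked_abs (int x)
{
#if __has_builtin (__builtin_abs)
  return __builtin_abs (x);
#else
  return x < 0 ? -x : x;
#endif
}
```

The query only says that the name can be used; it says nothing about
how the compiler will expand the call.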


Re: [patch] range-ops contribution

2019-10-01 Thread Jeff Law
On 10/1/19 11:11 AM, Aldy Hernandez wrote:
> Hi folks.
> 
> Here is my official submission of the range-ops part of the ranger to
> mainline.
> 
> I realize that I could have split this patch up into 2-3 separate ones,
> but I don't want to run into the chicken-and-egg scenario of last time,
> where I had 4 inter-connected patches that were hard to review
> independently.
It might have helped a bit, but it was pretty easy to find the mapping
from bits in wide-int-range.cc into range-op.cc -- the comments were
copied :-)

> 
> A few notes that may help in reviewing.
> 
> The range-ops proper is in range-op.*.
> 
> The range.* files are separate files containing some simple auxiliary
> functions that will have irange and value_range_base counterparts.  Our
> development branch will have #define value_range_base irange, and some
> auxiliary glue, but none of that will be in trunk.  As promised, trunk
> is all value_range_base.
> 
> * The changes to tree-vrp.* are:
> 
> 1. New constructors to align the irange and value_range_base APIs.  We
> discussed this a few months ago, and these were the agreed upon changes
> to the API.
Right.

> 
> 2. Extracting the symbolic handling of PLUS/MINUS and POINTER_PLUS_EXPR
> into separate functions (extract_range_from_plus_minus_expr and
> extract_range_from_pointer_plus_expr).
In retrospect we should have broken down that function in the old vrp
code.  I suspect that function started out relatively small and just
kept expanding over time into the horrid mess that it became.

There were a number of places where you ended up pulling code from two
existing locations into a single point in range-ops.  But again, it was
just a matter of finding the multiple original source points and mapping
them into their new location in range-ops.cc, using the copied comments
as a guide.

> 
> 3. New range_fold_unary_expr and range_fold_binary_expr functions. These
> are the VRP entry point into range-ops.  They normalize symbolics and do
> some minor pre-shuffling before calling range-ops to do the actual range
> operation.
Right.  I see these as primarily an adapter between existing code and
the new range ops.

> 
> (I have some minor shuffling of these two functions that I'd like to
> post as separate clean-ups, but I didn't want to pollute this patchset
> with them: Fedora taking forever to test and all.)
Works for me.


> 5. Removing the wide-int-range.* files.  Most of the code is now
> in-lined into range-op.cc with the exception of
> wide_int_range_set_zero_nonzero_bits which has been moved into tree-vrp.c.
Right.  Largely follows from #2 above.

> 
> I think that's all!
> 
> For testing this patchset, I left the old extract_*ary_expr_code in, and
> added comparison code that trapped if there were any differences between
> what VRP was getting and what range-ops calculated.  I found no
> regressions in either a full bootstrap/tests (all languages), or with a
> full Fedora build.  As a bonus, we found quite a few cases where
> range-ops was getting better results.
So to provide a bit more info here.  We ran tests back in the spring
which resulted in various bugfixes/improvements.  Aldy asked me to
re-run with their more recent branch.  That run exposed one very clear
ranger bug which Aldy fixed prior to submitting this patch as well as
several cases where the results differed.  We verified each and every
one of them was a case where Ranger was getting better results.
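
The comparison scheme described above (keep the old code, run the new
code next to it, trap on any difference) can be pictured with a toy
model.  This is only a sketch under simplifying assumptions: toy_range,
old_add, new_add and checked_add are hypothetical names, and a real
value range is much richer than a pair of long bounds.

```c
#include <stdlib.h>

/* Toy closed interval [lo, hi]; a hypothetical stand-in for a range.  */
struct toy_range { long lo, hi; };

/* "Old" implementation: add the bounds directly.  */
static struct toy_range
old_add (struct toy_range a, struct toy_range b)
{
  struct toy_range r = { a.lo + b.lo, a.hi + b.hi };
  return r;
}

/* "New" implementation, written independently of the old one.  */
static struct toy_range
new_add (struct toy_range a, struct toy_range b)
{
  struct toy_range r;
  r.lo = a.lo + b.lo;
  r.hi = a.hi + b.hi;
  return r;
}

/* Run both and trap on any discrepancy, so that every build using
   this entry point doubles as a differential test.  */
static struct toy_range
checked_add (struct toy_range a, struct toy_range b)
{
  struct toy_range o = old_add (a, b);
  struct toy_range n = new_add (a, b);
  if (o.lo != n.lo || o.hi != n.hi)
    abort ();
  return n;
}
```

A trap then points at an input where the two implementations disagree,
which still has to be inspected by hand to decide which side is right.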

> (Note: At the last minute, Jeff found one regression in the multi-day
> Fedora build.  I will fix this as a follow-up.  BTW, it does not cause
> any regressions in a bootstrap or GCC test-run, just a discrepancy on
> one specific corner case between VRP and range-ops.)
Right.  What happened was there was a package that failed to build due
to the Fortran front-end getting tighter in its handling of argument
checking.  Once that (and various other issues related to using a gcc-10
snapshot) was worked around I rebuilt the failing packages.  That in
turn exposed another case where ranger and vrp differed in their results
(it's a MULT+overflow case IIRC).  Anyway, I'm leaving it to you to
analyze :-)


[ ... ]

> 
> The attached patch is based off of trunk from a few weeks ago.  If
> approved, I will merge and re-test again with latest trunk.  I won't
> however, test all of Fedora :-P.
Agreed, I don't think that's necessary.  FWIW, using a month-old branch
for testing was amazingly helpful in other respects.  We found ~100
packages that need updating for gcc-10 as well as a few bugs unrelated
to Ranger.  I've actually got Sunday's snapshot spinning now and fully
expect to be spinning Fedora builds with snapshots for the next several
months.  So I don't expect a Fedora build just to test after ranger
integration, but instead that it'll "just happen" on a subsequent snapshot.

> 
> May I be so bold as to suggest that if there are minor suggestions that
> arise from this review, that they be done as follow-ups?  I'd like to
> get 

Re: [PATCH] add __has_builtin (PR 66970)

2019-10-01 Thread Jakub Jelinek
On Tue, Oct 01, 2019 at 11:16:10AM -0600, Martin Sebor wrote:
> Attached is an implementation of the __has_builtin special
> preprocessor operator/macro analogous to __has_attribute and
> (hopefully) compatible with the synonymous Clang feature (I
> couldn't actually find tests for it in the Clang test suite
> but if someone points me at them I'll verify it).

For the __builtin_*/__sync_*/__atomic_* etc. builtins whether something
is a builtin or not is quite clear: basically, whether using it with the right
operands will compile and do something.
For the library functions, what does it mean that something is a builtin
though?  In some cases we have them in the builtin tables only
to have some predefined attributes for them, in other cases because the
compiler is aware of some of their special properties, in other cases that
the compiler will sometimes optimize them into something else or inline them
and at other times keep them as library calls, in some cases only for the
compiler to be able to emit calls to those routines in generated code,
in other cases to always optimize them/inline them and never emit the
library function (which perhaps doesn't exist).
I believe that for some builtins like stpcpy we treat them as builtins only
if we see them prototyped first.
And then for C++ with namespaces and symbol lookup rules, whether something
is or isn't a builtin depends not just on the identifier, but namespace too
and maybe argument types too.

So, what do we want __has_builtin to mean for those?

Say, do we want __has_builtin (_exit) just because we have it in the tables
to 1) add noreturn/nothrow/leaf attributes to it 2) consider it in branch
prediction heuristics, but otherwise don't do anything special with it?

Jakub


[PATCH] add __has_builtin (PR 66970)

2019-10-01 Thread Martin Sebor

Attached is an implementation of the __has_builtin special
preprocessor operator/macro analogous to __has_attribute and
(hopefully) compatible with the synonymous Clang feature (I
couldn't actually find tests for it in the Clang test suite
but if someone points me at them I'll verify it).

Tested on x86_64-linux.

Martin

PS I couldn't find an existing API to test whether a reserved
symbol like __builtin_offsetof is a function-like built-in so
I hardwired the tests for C and C++ into the new names_builtin_p
functions.  I don't like this very much because the next time
such an operator is added there is nothing to remind us to update
the functions.  Adding a flag to the c_common_reswords array would
solve the problem but at the expense of a linear search through
it.  Does anyone have a suggestion for how to do this better?
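
For discussion, the flag-based alternative mentioned above might look
roughly like this.  It is a sketch under stated assumptions: the table,
TOY_FLAG_FN_BUILTIN and toy_names_builtin_p are hypothetical and do not
reflect the real layout of c_common_reswords.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag marking a reserved word as a function-like built-in.  */
#define TOY_FLAG_FN_BUILTIN 1

struct toy_resword { const char *word; unsigned flags; };

/* Toy table; adding a new operator here is the only place that needs
   updating, since the flag travels with the entry.  */
static const struct toy_resword toy_reswords[] = {
  { "__builtin_offsetof", TOY_FLAG_FN_BUILTIN },
  { "__builtin_va_arg",   TOY_FLAG_FN_BUILTIN },
  { "int",                0 },
  { "return",             0 },
};

/* Linear search: return 1 if NAME is flagged as a function-like
   built-in, 0 if it is some other reserved word or not in the table.  */
static int
toy_names_builtin_p (const char *name)
{
  size_t n = sizeof toy_reswords / sizeof toy_reswords[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (toy_reswords[i].word, name) == 0)
      return (toy_reswords[i].flags & TOY_FLAG_FN_BUILTIN) != 0;
  return 0;
}
```

The cost is the linear scan on every query, which is the trade-off
raised in the paragraph above.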
PR c/66970 - Add __has_builtin() macro

gcc/ChangeLog:
	PR c/66970
	* doc/cpp.texi (__has_builtin): Document.
	* doc/extend.texi (__builtin_frob_return_addr): Correct spelling.

gcc/c/ChangeLog:

	PR c/66970
	* c-decl.c (names_builtin_p): Define a new function.

gcc/c-family/ChangeLog:

	PR c/66970
	* c-common.c (c_common_nodes_and_builtins): Call c_define_builtins
	even when only preprocessing.
	* c-common.h (names_builtin_p): Declare new function.
	* c-lex.c (init_c_lex): Set has_builtin.
	(c_common_has_builtin): Define a new function.
	* c-ppoutput.c (init_pp_output): Set has_builtin.

gcc/cp/ChangeLog:

	PR c/66970
	* cp-objcp-common.c (names_builtin_p): Define new function.

gcc/testsuite/ChangeLog:

	PR c/66970
	* c-c++-common/cpp/has-builtin-2.c: New test.
	* c-c++-common/cpp/has-builtin-3.c: New test.
	* c-c++-common/cpp/has-builtin.c: New test.

libcpp/ChangeLog:

	PR c/66970
	* include/cpplib.h (cpp_builtin_type): Add BT_HAS_BUILTIN.
	(cpp_callbacks::has_builtin): Declare new member.
	* init.c (builtin_array): Add an element for BT_HAS_BUILTIN.
	(cpp_init_special_builtins): Handle BT_HAS_BUILTIN.
	* macro.c (_cpp_builtin_macro_text): Same.
	* traditional.c: Same.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 7169813d0f2..cd664566249 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -4467,8 +4467,7 @@ c_common_nodes_and_builtins (void)
   va_list_ref_type_node = build_reference_type (va_list_type_node);
 }
 
-  if (!flag_preprocess_only)
-c_define_builtins (va_list_ref_type_node, va_list_arg_type_node);
+  c_define_builtins (va_list_ref_type_node, va_list_arg_type_node);
 
   main_identifier_node = get_identifier ("main");
 
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 1e13aaa16fc..ad1bc6c3628 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -801,6 +801,7 @@ extern void c_register_addr_space (const char *str, addr_space_t as);
 extern bool in_late_binary_op;
 extern const char *c_addr_space_name (addr_space_t as);
 extern tree identifier_global_value (tree);
+extern bool names_builtin_p (const char *);
 extern tree c_linkage_bindings (tree);
 extern void record_builtin_type (enum rid, const char *, tree);
 extern tree build_void_list_node (void);
@@ -1022,6 +1023,7 @@ extern bool c_cpp_diagnostic (cpp_reader *, enum cpp_diagnostic_level,
 			  const char *, va_list *)
  ATTRIBUTE_GCC_DIAG(5,0);
 extern int c_common_has_attribute (cpp_reader *);
+extern int c_common_has_builtin (cpp_reader *);
 
 extern bool parse_optimize_options (tree, bool);
 
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index e3c602fbb8d..61ecbb23569 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -81,6 +81,7 @@ init_c_lex (void)
   cb->valid_pch = c_common_valid_pch;
   cb->read_pch = c_common_read_pch;
   cb->has_attribute = c_common_has_attribute;
+  cb->has_builtin = c_common_has_builtin;
   cb->get_source_date_epoch = cb_get_source_date_epoch;
   cb->get_suggestion = cb_get_suggestion;
   cb->remap_filename = remap_macro_filename;
@@ -385,6 +386,58 @@ c_common_has_attribute (cpp_reader *pfile)
 
   return result;
 }
+
+/* Callback for has_builtin.  */
+
+int
+c_common_has_builtin (cpp_reader *pfile)
+{
+  const cpp_token *token = get_token_no_padding (pfile);
+  if (token->type != CPP_OPEN_PAREN)
+{
+  cpp_error (pfile, CPP_DL_ERROR,
+		 "missing '(' after \"__has_builtin\"");
+  return 0;
+}
+
+  const char *name = "";
+  token = get_token_no_padding (pfile);
+  if (token->type == CPP_NAME)
+{
+  name = (const char *) cpp_token_as_text (pfile, token);
+  token = get_token_no_padding (pfile);
+  if (token->type != CPP_CLOSE_PAREN)
+	{
+	  cpp_error (pfile, CPP_DL_ERROR,
+		 "expected ')' after \"%s\"", name);
+	  name = "";
+	}
+}
+  else
+{
+  cpp_error (pfile, CPP_DL_ERROR,
+		 "macro \"__has_builtin\" requires an identifier");
+  if (token->type == CPP_CLOSE_PAREN)
+	return 0;
+}
+
+  /* Consume tokens up to the closing parenthesis, including any nested
+ pairs of parentheses, to avoid confusing redundant 

[tree-if-conv.c] Move call to ifcvt_local_dce after rpo vn

2019-10-01 Thread Prathamesh Kulkarni
Hi,
The attached patch is committed to trunk after bootstrap+test on
x86_64-unknown-linux-gnu.
Pre-approved by Richard.

Thanks,
Prathamesh
Index: ChangeLog
===
--- ChangeLog	(revision 276416)
+++ ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2019-10-01  Prathamesh Kulkarni  
+
+	* tree-if-conv.c (tree_if_conversion): Move call to ifcvt_local_dce
+	after local CSE.
+
 2019-10-01  Jan Hubicka  
 
 	* doc/invoke.texi (early-inlining-insns-O2): Document.
Index: tree-if-conv.c
===
--- tree-if-conv.c	(revision 276416)
+++ tree-if-conv.c	(working copy)
@@ -3060,9 +3060,6 @@
  on-the-fly.  */
   combine_blocks (loop);
 
-  /* Delete dead predicate computations.  */
-  ifcvt_local_dce (loop->header);
-
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
  and stores are involved.  CSE only the loop body, not the entry
  PHIs, those are to be kept in sync with the non-if-converted copy.
@@ -3071,6 +3068,9 @@
   bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
   bitmap_set_bit (exit_bbs, loop->latch->index);
   todo |= do_rpo_vn (cfun, loop_preheader_edge (loop), exit_bbs);
+
+  /* Delete dead predicate computations.  */
+  ifcvt_local_dce (loop->header);
   BITMAP_FREE (exit_bbs);
 
   todo |= TODO_cleanup_cfg;


Re: -O2 inliner returning 1/n: reduce EARLY_INLINING_INSNS for O1 and O2

2019-10-01 Thread Jan Hubicka
>   * ipa-inline.c (want_early_inline_function_p): Use
>   PARAM_EARLY_INLINING_INSNS_O2.
>   * params.def (PARAM_EARLY_INLINING_INSNS_O2): New.
>   (PARAM_EARLY_INLINING_INSNS): Update documentation.
>   * invoke.texi (early-inlining-insns-O2): New.
>   (early-inlining-insns): Update documentation.
Hi,
this is the variant of the patch, with testsuite compensation, that I
committed today.  There are some cases where we need early inlining to
happen in order to get the result we look for.

In most cases this will go back once late inlining autoinlines,
but I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91954
for a missed vectorization issue and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91955
for Wstringop.

Honza

* doc/invoke.texi (early-inlining-insns-O2): Document.
(early-inlining-insns): Update.
* params.def (early-inlining-insns-O2): New bound.
(early-inlining-insns): Update docs.
* ipa-inline.c (want_early_inline_function_p): Use new bound.

* g++.dg/tree-ssa/pr61034.C: Set early-inlining-insns-O2=14.
* g++.dg/tree-ssa/pr8781.C: Likewise.
* g++.dg/warn/Wstringop-truncation-1.C: Likewise.
* gcc.dg/ipa/pr63416.c: Likewise.
* gcc.dg/vect/pr66142.c: Likewise.
* gcc.dg/tree-ssa/ssa-thread-12.c: Mark compute_idf inline.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 276272)
+++ doc/invoke.texi (working copy)
@@ -11291,9 +11291,17 @@ recursion depth can be guessed from the
 via a given call expression.  This parameter limits inlining only to call
 expressions whose probability exceeds the given threshold (in percents).
 
+@item early-inlining-insns-O2
+Specify growth that the early inliner can make.  In effect it increases
+the amount of inlining for code having a large abstraction penalty.
+This is applied to functions compiled with @option{-O1} or @option{-O2}
+optimization levels.
+
 @item early-inlining-insns
 Specify growth that the early inliner can make.  In effect it increases
 the amount of inlining for code having a large abstraction penalty.
+This is applied to functions compiled with @option{-O3} or @option{-Ofast}
+optimization levels.
 
 @item max-early-inliner-iterations
 Limit of iterations of the early inliner.  This basically bounds
Index: ipa-inline.c
===
--- ipa-inline.c(revision 276272)
+++ ipa-inline.c(working copy)
@@ -641,6 +641,10 @@ want_early_inline_function_p (struct cgr
 {
   int growth = estimate_edge_growth (e);
   int n;
+  int early_inlining_insns = opt_for_fn (e->caller->decl, optimize) >= 3
+? PARAM_VALUE (PARAM_EARLY_INLINING_INSNS)
+: PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_O2);
+
 
   if (growth <= PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SIZE))
;
@@ -654,26 +658,28 @@ want_early_inline_function_p (struct cgr
 growth);
  want_inline = false;
}
-  else if (growth > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+  else if (growth > early_inlining_insns)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, e->call_stmt,
 "  will not early inline: %C->%C, "
-"growth %i exceeds --param early-inlining-insns\n",
-e->caller, callee,
-growth);
+"growth %i exceeds --param early-inlining-insns%s\n",
+e->caller, callee, growth,
+opt_for_fn (e->caller->decl, optimize) >= 3
+? "" : "-O2");
  want_inline = false;
}
   else if ((n = num_calls (callee)) != 0
-  && growth * (n + 1) > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+  && growth * (n + 1) > early_inlining_insns)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, e->call_stmt,
 "  will not early inline: %C->%C, "
-"growth %i exceeds --param early-inlining-insns "
+"growth %i exceeds --param early-inlining-insns%s "
 "divided by number of calls\n",
-e->caller, callee,
-growth);
+e->caller, callee, growth,
+opt_for_fn (e->caller->decl, optimize) >= 3
+? "" : "-O2");
  want_inline = false;
}
 }
Index: params.def
===
--- params.def  (revision 276272)
+++ params.def  (working copy)
@@ -233,8 +233,12 @@ DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
 10, 0, 0)
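
The bound selection in the ipa-inline.c hunk above can be summarized
with a simplified model.  This is a sketch, not the GCC code: the toy_*
names are hypothetical, the two constants are illustrative values only,
and the other checks in want_early_inline_function_p are left out.

```c
/* Hypothetical stand-ins for the two --param values; the numbers are
   illustrative only.  */
enum { TOY_EARLY_INLINING_INSNS = 14, TOY_EARLY_INLINING_INSNS_O2 = 6 };

/* Pick the bound by optimization level: level 3 and above use the
   original parameter, lower levels use the new -O2 variant.  */
static int
toy_early_inlining_limit (int optimize)
{
  return optimize >= 3 ? TOY_EARLY_INLINING_INSNS
		       : TOY_EARLY_INLINING_INSNS_O2;
}

/* Return 1 if the estimated growth fits the bound, both on its own and
   when scaled by the number of calls in the callee.  */
static int
toy_want_early_inline (int optimize, int growth, int num_calls)
{
  int limit = toy_early_inlining_limit (optimize);
  if (growth > limit)
    return 0;
  if (num_calls != 0 && growth * (num_calls + 1) > limit)
    return 0;
  return 1;
}
```

With these illustrative numbers a growth of 10 is accepted at level 3
but rejected at level 2, which is the kind of difference the testsuite
adjustments compensate for.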
 

Re: [PATCH] Fix -Waddress-of-packed-member ICE in unevaluated contexts (PR c++/91925)

2019-10-01 Thread Jason Merrill

On 9/29/19 6:11 AM, Jakub Jelinek wrote:

Hi!

On the following testcase we ICE, because check_alignment_of_packed_member
is called on the decltype expressions and the aggregate has not been laid
out.

The following patch fixes it by not emitting warnings on fields that weren't
laid out yet.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2019-09-28  Jakub Jelinek  

PR c++/91925
* c-warn.c (check_alignment_of_packed_member): Ignore FIELD_DECLs
with NULL DECL_FIELD_OFFSET.

* g++.dg/conversion/packed2.C: New test.

--- gcc/c-family/c-warn.c.jj    2019-09-20 12:25:06.393034759 +0200
+++ gcc/c-family/c-warn.c   2019-09-28 13:40:12.010732474 +0200
@@ -2798,6 +2798,8 @@ check_alignment_of_packed_member (tree t
/* Check alignment of the data member.  */
if (TREE_CODE (field) == FIELD_DECL
&& (DECL_PACKED (field) || TYPE_PACKED (TREE_TYPE (field)))
+  /* Ignore FIELDs not laid out yet.  */
+  && DECL_FIELD_OFFSET (field)
&& (!rvalue || TREE_CODE (TREE_TYPE (field)) == ARRAY_TYPE))
  {
/* Check the expected alignment against the field alignment.  */
--- gcc/testsuite/g++.dg/conversion/packed2.C.jj    2019-09-28 13:46:30.650025052 +0200
+++ gcc/testsuite/g++.dg/conversion/packed2.C   2019-09-28 13:41:48.513277844 +0200
@@ -0,0 +1,15 @@
+// PR c++/91925
+// { dg-do compile { target c++11 } }
+// { dg-options "-fpack-struct" }
+
+struct A {};
+int foo (A);
+struct B {
+  A a;
+  decltype (foo (a)) p;
+};
+template <typename T> T bar (T);
+class C {
+  A a;
+  decltype (bar (a)) p;
+};

Jakub





[SH][committed] Fix PR 88562

2019-10-01 Thread Oleg Endo
Hi,

The attached patch fixes PR 88562.

Tested on trunk with
   make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb}"

Committed to trunk, GCC 9, GCC 8, GCC 7 as r276411, r276412, r276413, r276414.

Cheers,
Oleg


gcc/ChangeLog:
PR target/88562
* config/sh/sh.c (sh_extending_set_of_reg::use_as_extended_reg): Use
sh_check_add_incdec_notes to preserve REG_INC notes when replacing
a memory access insn.
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 276264)
+++ gcc/config/sh/sh.c	(working copy)
@@ -12068,9 +12068,11 @@
 	rtx r = gen_reg_rtx (SImode);
 	rtx_insn* i0;
 	if (from_mode == QImode)
-	  i0 = emit_insn_after (gen_extendqisi2 (r, set_src), insn);
+	  i0 = sh_check_add_incdec_notes (
+			emit_insn_after (gen_extendqisi2 (r, set_src), insn));
 	else if (from_mode == HImode)
-	  i0 = emit_insn_after (gen_extendhisi2 (r, set_src), insn);
+	  i0 = sh_check_add_incdec_notes (
+			emit_insn_after (gen_extendhisi2 (r, set_src), insn));
 	else
 	  gcc_unreachable ();
 


[Patch][omp-low.c,fortran] Simple fix for optional argument handling with OpenMP's use_device_ptr

2019-10-01 Thread Tobias Burnus

Hi all,
[For those who got it twice, I actually forgot to include the mailing 
lists in the first round. Oops.]


this patch fixes the bug that with "optional" the wrong pointer is used 
with "use_device_ptr"; the bug is already observable without doing 
actual offloading.


Namely, "present(ptr)" checks whether the passed argument is != NULL, 
while using "ptr" – e.g. as "associated(ptr)" – works on the (once) 
dereferenced dummy argument, which matches the actual argument.
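
The distinction can be modelled in C.  This is a sketch that assumes the
usual convention of passing an OPTIONAL, POINTER dummy as the address of
the caller's pointer, with NULL meaning "absent"; toy_present and
toy_associated are hypothetical names, not compiler functions.

```c
#include <stddef.h>

/* Rough equivalent of PRESENT(ptr): test the passed address itself.  */
static int
toy_present (int **arg)
{
  return arg != NULL;
}

/* Rough equivalent of ASSOCIATED(ptr): dereference once, then test the
   caller's pointer.  Only meaningful when the argument is present.  */
static int
toy_associated (int **arg)
{
  return *arg != NULL;
}
```

Code that wants the pointer seen by the caller therefore has to work
with *arg, not with arg, and that is the level of indirection at issue
for use_device_ptr.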



The test case is written such that the pointer passed to 
"use_device_ptr" is present.*


Built and regtested on x86_64-gnu-linux without device; I am currently 
doing a full bootstrap + regtesting and want to test it also with nvptx 
offloading. Assuming no issue pops up:


OK for the trunk?


Regarding the patches:

* The first tiny patch is mine

* The second patch which added the lang_hook omp_is_optional_argument is 
the one posted at https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01743.html

This one was approved by Jakub and I only did two things:
(a) re-diff-ed it (trivial as fuzzy worked)
(b) I followed both suggestions of Jakub (PARM_DECL + adding "( )")


[Motivation of this patch is – besides fixing an issue – to get the 
second patch in, which makes it easier to consolidate some other bits 
and pieces.]



Thanks,

Tobias

* OpenACC (Sec. 2.17) demands that a variable 'arg' in "clauses has no 
effect at runtime if PRESENT(arg) is .false." – Hence, one needs to go 
beyond this patch. That's done in the patch series at 
https://gcc.gnu.org/ml/gcc-patches/2019-07/threads.html#00960 – the 
lang_hook patch of this email is 2/5 of that series.




	gcc/
	* omp-low.c (lower_omp_target): Dereference optional argument
	to work with the right pointer.

	gcc/testsuite/
	* libgomp/testsuite/libgomp.fortran/use_device_ptr-optional-1.f90: New.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index a0e5041d3f2..ca7dfdb83a1 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11870,7 +11870,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  var = build_fold_addr_expr (var);
 	else
 	  {
-		if (omp_is_reference (ovar))
+		if (omp_is_reference (ovar) || omp_is_optional_argument (ovar))
 		  {
 		type = TREE_TYPE (type);
 		if (TREE_CODE (type) != ARRAY_TYPE
diff --git a/libgomp/testsuite/libgomp.fortran/use_device_ptr-optional-1.f90 b/libgomp/testsuite/libgomp.fortran/use_device_ptr-optional-1.f90
new file mode 100644
index 000..93c61216034
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/use_device_ptr-optional-1.f90
@@ -0,0 +1,36 @@
+! Test whether use_device_ptr properly handles OPTIONAL arguments
+! (Only case of present arguments is tested)
+program test_it
+  implicit none
+  integer, target :: ixx
+  integer, pointer :: ptr_i, ptr_null
+
+  ptr_i => ixx
+  call foo(ptr_i)
+
+  ptr_null => null()
+  call bar(ptr_null)
+contains
+  subroutine foo(ii)
+integer, pointer, optional :: ii
+
+if (.not.present(ii)) call abort()
+if (.not.associated(ii, ixx)) call abort()
+!$omp target data map(to:ixx) use_device_ptr(ii)
+if (.not.present(ii)) call abort()
+if (.not.associated(ii)) call abort()
+!$omp end target data
+  end subroutine foo
+
+  ! For bar, it is assumed that a NULL ptr on the host maps to NULL on the device
+  subroutine bar(jj)
+integer, pointer, optional :: jj
+
+if (.not.present(jj)) call abort()
+if (associated(jj)) call abort()
+!$omp target data map(to:ixx) use_device_ptr(jj)
+if (.not.present(jj)) call abort()
+   if (associated(jj)) call abort()
+!$omp end target data
+  end subroutine bar
+end program test_it

	Patch from:
	https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01743.html
	Re: [PATCH 2/5, OpenACC] Support Fortran optional arguments in the firstprivate clause

	Changes: Rediffed, review changes added (added parentheses, PARM_DECL check)

	gcc/fortran/
	* f95-lang.c (LANG_HOOKS_OMP_IS_OPTIONAL_ARGUMENT): Define to
	gfc_omp_is_optional_argument.
	* trans-decl.c (create_function_arglist): Set
	GFC_DECL_OPTIONAL_ARGUMENT in the generated decl if the parameter is
	optional.
	* trans-openmp.c (gfc_omp_is_optional_argument): New.
	(gfc_omp_privatize_by_reference): Return true if the decl is an
	optional pass-by-reference argument.
	* trans.h (gfc_omp_is_optional_argument): New declaration.
	(lang_decl): Add new optional_arg field.
	(GFC_DECL_OPTIONAL_ARGUMENT): New macro.

	gcc/
	* langhooks-def.h (LANG_HOOKS_OMP_IS_OPTIONAL_ARGUMENT): Default to
	false.
	(LANG_HOOKS_DECLS): Add LANG_HOOKS_OMP_IS_OPTIONAL_ARGUMENT.
	* langhooks.h (omp_is_optional_argument): New hook.
	* omp-general.c (omp_is_optional_argument): New.
	* omp-general.h (omp_is_optional_argument): New declaration.
	* omp-low.c (lower_omp_target): Create temporary for received value
	and take the address for new_var if the original variable was a
	DECL_BY_REFERENCE.  Use size of referenced object when a
	pass-by-reference optional argument used as 

PING^3 [PATCH v2] S/390: Improve storing asan frame_pc

2019-10-01 Thread Ilya Leoshkevich
> On 02.07.2019 at 17:34, Ilya Leoshkevich wrote:
> 
> Bootstrap and regtest running on x86_64-redhat-linux, s390x-redhat-linux
> and ppc64le-redhat-linux.
> 
> Currently s390 emits the following sequence to store a frame_pc:
> 
>   a:
>   .LASANPC0:
> 
>   lg  %r1,.L5-.L4(%r13)
>   la  %r1,0(%r1,%r12)
>   stg %r1,176(%r11)
> 
>   .L5:
>   .quad   .LASANPC0@GOTOFF
> 
> The reason GOT indirection is used instead of larl is that gcc does not
> know that .LASANPC0, being a code label, is aligned on a 2-byte
> boundary, and larl can load only even addresses.
> 
> This patch provides such an alignment hint.  Since targets don't provide
> their instruction alignments yet, the new macro is introduced for that
> purpose.  It returns 1-byte alignment by default, so this change is a
> no-op for targets other than s390.
> 
> As a result, we get the desired:
> 
>   larl    %r1,.LASANPC0
>   stg %r1,176(%r11)
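
The alignment reasoning above boils down to a simple condition.  A
minimal sketch, assuming only what the description says (the load can
encode even addresses only, and alignments are expressed in bits); the
helper name is hypothetical and is not part of the patch.

```c
/* Hypothetical helper: a PC-relative load that can encode only even
   addresses is usable when the label is known to be aligned to at
   least 2 bytes, i.e. 16 bits.  */
static int
toy_can_use_even_address_load (unsigned align_bits)
{
  return align_bits >= 16;
}
```

With a 1-byte (8-bit) default alignment the condition fails and the GOT
sequence is needed; with the 16-bit value proposed for s390 it holds.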
> 
> gcc/ChangeLog:
> 
> 2019-06-28  Ilya Leoshkevich  
> 
>   * asan.c (asan_emit_stack_protection): Provide an alignment
>   hint.
>   * config/s390/s390.h (CODE_LABEL_BOUNDARY): Specify that s390
>   requires code labels to be aligned on a 2-byte boundary.
>   * defaults.h (CODE_LABEL_BOUNDARY): New macro.
>   * doc/tm.texi: Document CODE_LABEL_BOUNDARY.
>   * doc/tm.texi.in: Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-06-28  Ilya Leoshkevich  
> 
>   * gcc.target/s390/asan-no-gotoff.c: New test.
> ---
> gcc/asan.c                                     |  1 +
> gcc/config/s390/s390.h                         |  3 +++
> gcc/defaults.h                                 |  5 +++++
> gcc/doc/tm.texi                                |  4 ++++
> gcc/doc/tm.texi.in                             |  4 ++++
> gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++++++++++++++
> 6 files changed, 32 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
> 
> diff --git a/gcc/asan.c b/gcc/asan.c
> index 605d04f87f7..2db69f476bc 100644
> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -1523,6 +1523,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
>   DECL_INITIAL (decl) = decl;
>   TREE_ASM_WRITTEN (decl) = 1;
>   TREE_ASM_WRITTEN (id) = 1;
> +  SET_DECL_ALIGN (decl, CODE_LABEL_BOUNDARY);
>   emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
>   shadow_base = expand_binop (Pmode, lshr_optab, base,
> gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
> diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
> index 969f58a2ba0..3d0266c9dff 100644
> --- a/gcc/config/s390/s390.h
> +++ b/gcc/config/s390/s390.h
> @@ -334,6 +334,9 @@ extern const char *s390_host_detect_local_cpu (int argc, const char **argv);
> /* Allocation boundary (in *bits*) for the code of a function.  */
> #define FUNCTION_BOUNDARY 64
> 
> +/* Alignment required for a code label, in bits.  */
> +#define CODE_LABEL_BOUNDARY 16
> +
> /* There is no point aligning anything to a rounder boundary than this.  */
> #define BIGGEST_ALIGNMENT 64
> 
> diff --git a/gcc/defaults.h b/gcc/defaults.h
> index af7ea185f1e..97c4c17537d 100644
> --- a/gcc/defaults.h
> +++ b/gcc/defaults.h
> @@ -1459,4 +1459,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> #define DWARF_GNAT_ENCODINGS_DEFAULT DWARF_GNAT_ENCODINGS_GDB
> #endif
> 
> +/* Alignment required for a code label, in bits.  */
> +#ifndef CODE_LABEL_BOUNDARY
> +#define CODE_LABEL_BOUNDARY BITS_PER_UNIT
> +#endif
> +
> #endif  /* ! GCC_DEFAULTS_H */
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 14c1ea6a323..3b50fc0c0a7 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -1019,6 +1019,10 @@ to a value equal to or larger than @code{STACK_BOUNDARY}.
> Alignment required for a function entry point, in bits.
> @end defmac
> 
> +@defmac CODE_LABEL_BOUNDARY
> +Alignment required for a code label, in bits.
> +@end defmac
> +
> @defmac BIGGEST_ALIGNMENT
> Biggest alignment that any data type can require on this machine, in
> bits.  Note that this is not the biggest alignment that is supported,
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index b4d57b86e2f..ab038b7462c 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -969,6 +969,10 @@ to a value equal to or larger than @code{STACK_BOUNDARY}.
> Alignment required for a function entry point, in bits.
> @end defmac
> 
> +@defmac CODE_LABEL_BOUNDARY
> +Alignment required for a code label, in bits.
> +@end defmac
> +
> @defmac BIGGEST_ALIGNMENT
> Biggest alignment that any data type can require on this machine, in
> bits.  Note that this is not the biggest alignment that is supported,
> diff --git a/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c b/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
> new file mode 100644
> index 000..f555e4e96f8
> --- /dev/null
> +++ 

Re: [PATCH] regrename: Use PC instead of CC0 to hide operands

2019-10-01 Thread Richard Sandiford
Paul Koning  writes:
> On Oct 1, 2019, at 5:14 AM, Segher Boessenkool wrote:
>> 
>> The regrename pass temporarily changes some operand RTL to CC0 so that
>> note_stores and scan_rtx don't see those operands.  CC0 is deprecated
>> and we want to remove it, so we need to use something else here.
>> PC fits the bill fine.
>
> CC0 is, presumably, not part of GENERAL_REGS, but PC is, in some ports.  Does 
> that cause a problem here? 

"PC" here is the special rtx pc_rtx, which represents (pc) rather than
a (reg ...).  It's distinct from real PC registers like r15 on AArch32.
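
To make the distinction concrete, here is a minimal C model of the hide-and-restore idea. It is only a sketch, not GCC code: toy_rtx, TOY_REG, TOY_PC and all function names are made up for illustration. A scanner that only looks at (reg ...) nodes never sees an operand that has been temporarily replaced by the shared (pc) placeholder, even when the hidden operand is a register that a port uses as its program counter.

```c
#include <stddef.h>

/* Toy stand-ins for rtx codes; not the real GCC enum.  */
enum toy_code { TOY_REG, TOY_PC };

struct toy_rtx
{
  enum toy_code code;
  int regno;   /* Only meaningful for TOY_REG.  */
};

/* Shared placeholder, playing the role that pc_rtx plays in GCC.  */
static struct toy_rtx toy_pc = { TOY_PC, -1 };

/* Count the operands that a register scanner would look at.  */
static int
toy_count_regs (struct toy_rtx **ops, size_t n)
{
  int count = 0;
  for (size_t i = 0; i < n; i++)
    if (ops[i]->code == TOY_REG)
      count++;
  return count;
}

/* Hide operand IDX, run the scan, then restore the operand.  */
static int
toy_scan_with_hidden (struct toy_rtx **ops, size_t n, size_t idx)
{
  struct toy_rtx *saved = ops[idx];
  ops[idx] = &toy_pc;
  int count = toy_count_regs (ops, n);
  ops[idx] = saved;
  return count;
}

/* Demo: two register operands; the first one models a "real" PC
   register such as r15.  It is still a TOY_REG, so it is visible to
   the scan unless it has been hidden behind the placeholder.  */
int
toy_demo_visible_regs (int hide_first)
{
  struct toy_rtx r15 = { TOY_REG, 15 };
  struct toy_rtx r1 = { TOY_REG, 1 };
  struct toy_rtx *ops[2] = { &r15, &r1 };
  if (hide_first)
    return toy_scan_with_hidden (ops, 2, 0);
  return toy_count_regs (ops, 2);
}
```

In this model the placeholder is recognised by its code alone, so it cannot be confused with any register number.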

Richard


Re: [PATCH v4 5/7] S/390: Remove code duplication in vec_* comparison expanders

2019-10-01 Thread Andreas Krebbel
On 01.10.19 15:27, Ilya Leoshkevich wrote:
> s390.md uses a lot of near-identical expanders that perform dispatching
> to other expanders based on operand types. Since the following patch
> would require even more of these, avoid copy-pasting the code by
> generating these expanders using an iterator.
> 
> gcc/ChangeLog:
> 
> 2019-08-09  Ilya Leoshkevich  
> 
>   PR target/77918
>   * config/s390/s390.c (s390_expand_vec_compare): Use
>   gen_vec_cmpordered and gen_vec_cmpunordered.
>   * config/s390/vector.md (vec_cmpuneq, vec_cmpltgt, vec_ordered,
>   vec_unordered): Delete.
>   (vec_ordered): Rename to vec_cmpordered.
>   (vec_unordered): Rename to vec_cmpunordered.
>   (VEC_CMP_EXPAND): New iterator for the generic dispatcher.
>   (vec_cmp): Generic dispatcher.

Ok. Thanks!

Andreas



Re: [PATCH v4 4/7] S/390: Implement vcond expander for V1TI,V1TF

2019-10-01 Thread Andreas Krebbel
On 01.10.19 15:27, Ilya Leoshkevich wrote:
> Currently gcc does not emit wf{c,k}* instructions when comparing long
> double values.  Middle-end actually adds them in the first place, but
> then veclower pass replaces them with floating point register pair
> operations, because the corresponding expander is missing.
> 
> gcc/ChangeLog:
> 
> 2019-08-09  Ilya Leoshkevich  
> 
>   PR target/77918
>   * config/s390/vector.md (V_HW): Add V1TI in order to make
>   vcond$a$b generate vcondv1tiv1tf.

Ok. Thanks!

Andreas



[PATCH v4 7/7] S/390: Test signaling FP comparison instructions

2019-10-01 Thread Ilya Leoshkevich
gcc/testsuite/ChangeLog:

2019-08-09  Ilya Leoshkevich  

PR target/77918
* gcc.target/s390/s390.exp: Enable Fortran tests.
* gcc.target/s390/zvector/autovec-double-quiet-eq.c: New test.
* gcc.target/s390/zvector/autovec-double-quiet-ge.c: New test.
* gcc.target/s390/zvector/autovec-double-quiet-gt.c: New test.
* gcc.target/s390/zvector/autovec-double-quiet-le.c: New test.
* gcc.target/s390/zvector/autovec-double-quiet-lt.c: New test.
* gcc.target/s390/zvector/autovec-double-quiet-ordered.c: New test.
* gcc.target/s390/zvector/autovec-double-quiet-uneq.c: New test.
* gcc.target/s390/zvector/autovec-double-quiet-unordered.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-eq-z13-finite.c: New 
test.
* gcc.target/s390/zvector/autovec-double-signaling-eq-z13.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-eq.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-ge-z13-finite.c: New 
test.
* gcc.target/s390/zvector/autovec-double-signaling-ge-z13.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-ge.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-gt-z13-finite.c: New 
test.
* gcc.target/s390/zvector/autovec-double-signaling-gt-z13.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-gt.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-le-z13-finite.c: New 
test.
* gcc.target/s390/zvector/autovec-double-signaling-le-z13.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-le.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-lt-z13-finite.c: New 
test.
* gcc.target/s390/zvector/autovec-double-signaling-lt-z13.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-lt.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13-finite.c: 
New test.
* gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-ltgt.c: New test.
* gcc.target/s390/zvector/autovec-double-smax-z13.F90: New test.
* gcc.target/s390/zvector/autovec-double-smax.F90: New test.
* gcc.target/s390/zvector/autovec-double-smin-z13.F90: New test.
* gcc.target/s390/zvector/autovec-double-smin.F90: New test.
* gcc.target/s390/zvector/autovec-float-quiet-eq.c: New test.
* gcc.target/s390/zvector/autovec-float-quiet-ge.c: New test.
* gcc.target/s390/zvector/autovec-float-quiet-gt.c: New test.
* gcc.target/s390/zvector/autovec-float-quiet-le.c: New test.
* gcc.target/s390/zvector/autovec-float-quiet-lt.c: New test.
* gcc.target/s390/zvector/autovec-float-quiet-ordered.c: New test.
* gcc.target/s390/zvector/autovec-float-quiet-uneq.c: New test.
* gcc.target/s390/zvector/autovec-float-quiet-unordered.c: New test.
* gcc.target/s390/zvector/autovec-float-signaling-eq.c: New test.
* gcc.target/s390/zvector/autovec-float-signaling-ge.c: New test.
* gcc.target/s390/zvector/autovec-float-signaling-gt.c: New test.
* gcc.target/s390/zvector/autovec-float-signaling-le.c: New test.
* gcc.target/s390/zvector/autovec-float-signaling-lt.c: New test.
* gcc.target/s390/zvector/autovec-float-signaling-ltgt.c: New test.
* gcc.target/s390/zvector/autovec-fortran.h: New test.
* gcc.target/s390/zvector/autovec-long-double-signaling-ge.c: New test.
* gcc.target/s390/zvector/autovec-long-double-signaling-gt.c: New test.
* gcc.target/s390/zvector/autovec-long-double-signaling-le.c: New test.
* gcc.target/s390/zvector/autovec-long-double-signaling-lt.c: New test.
* gcc.target/s390/zvector/autovec.h: New test.
---
 gcc/testsuite/gcc.target/s390/s390.exp|  8 
 .../s390/zvector/autovec-double-quiet-eq.c|  8 
 .../s390/zvector/autovec-double-quiet-ge.c|  8 
 .../s390/zvector/autovec-double-quiet-gt.c|  8 
 .../s390/zvector/autovec-double-quiet-le.c|  8 
 .../s390/zvector/autovec-double-quiet-lt.c|  8 
 .../zvector/autovec-double-quiet-ordered.c| 10 +
 .../s390/zvector/autovec-double-quiet-uneq.c  | 10 +
 .../zvector/autovec-double-quiet-unordered.c  | 11 +
 .../autovec-double-signaling-eq-z13-finite.c  | 10 +
 .../zvector/autovec-double-signaling-eq-z13.c |  9 
 .../zvector/autovec-double-signaling-eq.c | 11 +
 .../autovec-double-signaling-ge-z13-finite.c  | 10 +
 .../zvector/autovec-double-signaling-ge-z13.c |  9 
 .../zvector/autovec-double-signaling-ge.c |  8 
 .../autovec-double-signaling-gt-z13-finite.c  | 10 +
 .../zvector/autovec-double-signaling-gt-z13.c |  9 
 .../zvector/autovec-double-signaling-gt.c |  8 
 .../autovec-double-signaling-le-z13-finite.c 

[PATCH v4 4/7] S/390: Implement vcond expander for V1TI,V1TF

2019-10-01 Thread Ilya Leoshkevich
Currently gcc does not emit wf{c,k}* instructions when comparing long
double values.  Middle-end actually adds them in the first place, but
then veclower pass replaces them with floating point register pair
operations, because the corresponding expander is missing.

gcc/ChangeLog:

2019-08-09  Ilya Leoshkevich  

PR target/77918
* config/s390/vector.md (V_HW): Add V1TI in order to make
vcond$a$b generate vcondv1tiv1tf.
---
 gcc/config/s390/vector.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 8a0b01f562b..2c2c56f7835 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -29,7 +29,7 @@
 ; All modes directly supported by the hardware having full vector reg size
 ; V_HW2 is duplicate of V_HW for having two iterators expanding
 ; independently e.g. vcond
-(define_mode_iterator V_HW  [V16QI V8HI V4SI V2DI V2DF (V4SF "TARGET_VXE") 
(V1TF "TARGET_VXE")])
+(define_mode_iterator V_HW  [V16QI V8HI V4SI V2DI (V1TI "TARGET_VXE") V2DF 
(V4SF "TARGET_VXE") (V1TF "TARGET_VXE")])
 (define_mode_iterator V_HW2 [V16QI V8HI V4SI V2DI V2DF (V4SF "TARGET_VXE") 
(V1TF "TARGET_VXE")])
 
 (define_mode_iterator V_HW_64 [V2DI V2DF])
-- 
2.23.0



[PATCH v4 6/7] S/390: Use signaling FP comparison instructions

2019-10-01 Thread Ilya Leoshkevich
dg-torture.exp=inf-compare-1.c is failing, because the (qNaN > +Inf)
comparison is compiled to a CDB instruction, which does not signal an
invalid operation exception. KDB should have been used instead.

This patch introduces a new CCmode and a new pattern in order to
generate signaling instructions in this and similar cases.
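
The difference between the two kinds of comparison can be modelled in plain C. This is a sketch of the IEEE 754 semantics only, assuming every NaN is quiet; it does not execute CDB or KDB and does not touch the floating-point environment. The names cmp_result, model_greater and the demo helper are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* Result of a modelled comparison: the boolean outcome plus whether an
   invalid-operation exception would be raised.  */
struct cmp_result
{
  bool value;
  bool invalid;
};

/* x != x holds only for NaN.  */
static bool
model_is_nan (double x)
{
  return x != x;
}

/* Model of a > b.  SIGNALING selects the KDB-like behaviour, otherwise
   the CDB-like one.  Assumption: all NaNs are quiet, so a quiet
   comparison never raises invalid in this model.  */
struct cmp_result
model_greater (double a, double b, bool signaling)
{
  struct cmp_result r;
  bool unordered = model_is_nan (a) || model_is_nan (b);
  r.value = !unordered && a > b;
  r.invalid = unordered && signaling;
  return r;
}

/* The case from the failing test: qNaN > +Inf.  */
bool
model_qnan_gt_inf_raises_invalid (bool signaling)
{
  return model_greater (NAN, INFINITY, signaling).invalid;
}
```

Both variants give the same boolean result for qNaN > +Inf (false); only the signaling one reports the invalid operation, which is what the C relational operator is expected to do.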

gcc/ChangeLog:

2019-08-09  Ilya Leoshkevich  

PR target/77918
* config/s390/2827.md: Add new opcodes.
* config/s390/2964.md: Likewise.
* config/s390/3906.md: Likewise.
* config/s390/8561.md: Likewise.
* config/s390/s390-builtins.def (s390_vfchesb): Use
the new vec_cmpgev4sf_quiet_nocc.
(s390_vfchedb): Use the new vec_cmpgev2df_quiet_nocc.
(s390_vfchsb): Use the new vec_cmpgtv4sf_quiet_nocc.
(s390_vfchdb): Use the new vec_cmpgtv2df_quiet_nocc.
(vec_cmplev4sf): Use the new vec_cmplev4sf_quiet_nocc.
(vec_cmplev2df): Use the new vec_cmplev2df_quiet_nocc.
(vec_cmpltv4sf): Use the new vec_cmpltv4sf_quiet_nocc.
(vec_cmpltv2df): Use the new vec_cmpltv2df_quiet_nocc.
* config/s390/s390-modes.def (CCSFPS): New mode.
* config/s390/s390.c (s390_match_ccmode_set): Support CCSFPS.
(s390_select_ccmode): Return CCSFPS for LT, LE, GT, GE and LTGT.
(s390_branch_condition_mask): Reuse CCS for CCSFPS.
(s390_expand_vec_compare): Use non-signaling patterns where
necessary.
(s390_reverse_condition): Support CCSFPS.
* config/s390/s390.md (*cmp_ccsfps): New pattern.
* config/s390/vector.md (VFCMP_HW_OP): Remove.
(asm_fcmp_op): Likewise.
(*smaxv2df3_vx): Use pattern for quiet comparison.
(*sminv2df3_vx): Likewise.
(*vec_cmp_nocc): Remove.
(*vec_cmpeq_quiet_nocc): New pattern.
(vec_cmpgt_quiet_nocc): Likewise.
(vec_cmplt_quiet_nocc): New expander.
(vec_cmpge_quiet_nocc): New pattern.
(vec_cmple_quiet_nocc): New expander.
(*vec_cmpeq_signaling_nocc): New pattern.
(*vec_cmpgt_signaling_nocc): Likewise.
(*vec_cmpgt_signaling_finite_nocc): Likewise.
(*vec_cmpge_signaling_nocc): Likewise.
(*vec_cmpge_signaling_finite_nocc): Likewise.
(vec_cmpungt): New expander.
(vec_cmpunge): Likewise.
(vec_cmpuneq): Use quiet patterns.
(vec_cmpltgt): Allow only on z14+.
(vec_cmpordered): Use quiet patterns.
(vec_cmpunordered): Likewise.
(VEC_CMP_EXPAND): Add ungt and unge.

gcc/testsuite/ChangeLog:

2019-08-09  Ilya Leoshkevich  

* gcc.target/s390/vector/vec-scalar-cmp-1.c: Adjust
expectations.
---
 gcc/config/s390/2827.md   |  14 +-
 gcc/config/s390/2964.md   |  13 +-
 gcc/config/s390/3906.md   |  17 +-
 gcc/config/s390/8561.md   |  19 +-
 gcc/config/s390/s390-builtins.def |  16 +-
 gcc/config/s390/s390-modes.def|   8 +
 gcc/config/s390/s390.c|  34 ++--
 gcc/config/s390/s390.md   |  14 ++
 gcc/config/s390/vector.md | 171 +++---
 .../gcc.target/s390/vector/vec-scalar-cmp-1.c |   8 +-
 10 files changed, 240 insertions(+), 74 deletions(-)

diff --git a/gcc/config/s390/2827.md b/gcc/config/s390/2827.md
index 3f63f82284d..aafe8e27339 100644
--- a/gcc/config/s390/2827.md
+++ b/gcc/config/s390/2827.md
@@ -44,7 +44,7 @@
 
 (define_insn_reservation "zEC12_normal_fp" 8
   (and (eq_attr "cpu" "zEC12")
-   (eq_attr "mnemonic" 
"lnebr,sdbr,sebr,clfxtr,adbr,aebr,celfbr,clfebr,lpebr,msebr,lndbr,clfdbr,cebr,maebr,ltebr,clfdtr,cdlgbr,cxlftr,lpdbr,cdfbr,lcebr,clfxbr,msdbr,cdbr,madbr,meebr,clgxbr,clgdtr,ledbr,cegbr,cdlftr,cdlgtr,mdbr,clgebr,ltdbr,cdlfbr,cdgbr,clgxtr,lcdbr,celgbr,clgdbr,ldebr,cefbr,fidtr,fixtr,madb,msdb,mseb,fiebra,fidbra,aeb,mdb,seb,cdb,tcdb,sdb,adb,tceb,maeb,ceb,meeb,ldeb"))
 "nothing")
+   (eq_attr "mnemonic" 
"lnebr,sdbr,sebr,clfxtr,adbr,aebr,celfbr,clfebr,lpebr,msebr,lndbr,clfdbr,cebr,maebr,ltebr,clfdtr,cdlgbr,cxlftr,lpdbr,cdfbr,lcebr,clfxbr,msdbr,cdbr,madbr,meebr,clgxbr,clgdtr,ledbr,cegbr,cdlftr,cdlgtr,mdbr,clgebr,ltdbr,cdlfbr,cdgbr,clgxtr,lcdbr,celgbr,clgdbr,ldebr,cefbr,fidtr,fixtr,madb,msdb,mseb,fiebra,fidbra,aeb,mdb,seb,cdb,tcdb,sdb,adb,tceb,maeb,ceb,meeb,ldeb,keb,kebr,kdb,kdbr"))
 "nothing")
 
 (define_insn_reservation "zEC12_cgdbr" 2
   (and (eq_attr "cpu" "zEC12")
@@ -426,6 +426,10 @@
   (and (eq_attr "cpu" "zEC12")
(eq_attr "mnemonic" "cxbr")) "nothing")
 
+(define_insn_reservation "zEC12_kxbr" 18
+  (and (eq_attr "cpu" "zEC12")
+   (eq_attr "mnemonic" "kxbr")) "nothing")
+
 (define_insn_reservation "zEC12_ddbr" 36
   (and (eq_attr "cpu" "zEC12")
(eq_attr "mnemonic" "ddbr")) "nothing")
@@ -578,10 +582,18 @@
   (and (eq_attr "cpu" "zEC12")
(eq_attr "mnemonic" "cdtr")) "nothing")
 
+(define_insn_reservation "zEC12_kdtr" 11
+  (and (eq_attr "cpu" 

[PATCH v4 2/7] Introduce can_vcond_compare_p function

2019-10-01 Thread Ilya Leoshkevich
z13 supports only non-signaling vector comparisons.  This means we
cannot vectorize LT, LE, GT, GE and LTGT when compiling for z13.
However, we cannot express this restriction today: the code only checks
whether vcond$a$b optab exists, which does not contain information about
the operation.

Introduce a function that checks whether the back-end supports vector
comparisons with individual rtx codes by matching the vcond expander's
third argument with a fake comparison with the corresponding rtx code.
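
The difference between the two kinds of query can be sketched with a toy model in C. This is not the GCC implementation: the toy_* names are invented, an "expander" is reduced to its operand predicate, and a null pointer stands in for CODE_FOR_nothing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy comparison codes; the real ones are GCC's rtx codes.  */
enum toy_cmp
{
  TOY_EQ, TOY_NE, TOY_LT, TOY_LE, TOY_GT, TOY_GE, TOY_LTGT, TOY_UNORDERED
};

/* A toy expander is just the predicate on its comparison operand.  */
typedef bool (*toy_predicate) (enum toy_cmp code);

/* Accepts every comparison, like a plain comparison_operator.  */
bool
toy_any_comparison (enum toy_cmp code)
{
  (void) code;
  return true;
}

/* Rejects the comparisons that must signal on NaN, roughly what a
   z13-like target would want.  */
bool
toy_quiet_only (enum toy_cmp code)
{
  switch (code)
    {
    case TOY_LT:
    case TOY_LE:
    case TOY_GT:
    case TOY_GE:
    case TOY_LTGT:
      return false;
    default:
      return true;
    }
}

/* Old-style query: does an expander exist at all?  */
bool
toy_has_vcond (toy_predicate expander)
{
  return expander != NULL;
}

/* New-style query: does it exist and accept this particular code?  */
bool
toy_can_vcond_compare (toy_predicate expander, enum toy_cmp code)
{
  return expander != NULL && expander (code);
}
```

With toy_quiet_only, the old-style query answers "yes" for every code, while the new-style query can say "no" for TOY_LT and still "yes" for TOY_EQ.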

gcc/ChangeLog:

2019-08-27  Ilya Leoshkevich  

PR target/77918
* optabs-tree.c (vcond_icode_p): New function.
(vcond_eq_icode_p): New function.
(expand_vec_cond_expr_p): Use vcond_icode_p and
vcond_eq_icode_p.
* optabs.c (can_vcond_compare_p): New function.
(get_rtx_code): Use get_rtx_code_safe.
(get_rtx_code_safe): New function.
* optabs.h (can_vcond_compare_p): New function.
(get_rtx_code_safe): Likewise.
---
 gcc/optabs-tree.c | 37 +++--
 gcc/optabs.c  | 38 ++
 gcc/optabs.h  |  7 +++
 3 files changed, 72 insertions(+), 10 deletions(-)

diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 8157798cc71..7f505c9cdee 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -23,7 +23,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "target.h"
 #include "insn-codes.h"
+#include "rtl.h"
 #include "tree.h"
+#include "memmodel.h"
+#include "optabs.h"
 #include "optabs-tree.h"
 #include "stor-layout.h"
 
@@ -329,6 +332,28 @@ expand_vec_cmp_expr_p (tree value_type, tree mask_type, 
enum tree_code code)
   return false;
 }
 
+/* Return true iff vcond_optab/vcondu_optab support the given tree
+   comparison.  */
+
+static bool
+vcond_icode_p (tree value_type, tree cmp_op_type, enum tree_code code)
+{
+  return can_vcond_compare_p (get_rtx_code (code, TYPE_UNSIGNED (cmp_op_type)),
+ TYPE_MODE (value_type), TYPE_MODE (cmp_op_type));
+}
+
+/* Return true iff vcondeq_optab supports the given tree comparison.  */
+
+static bool
+vcond_eq_icode_p (tree value_type, tree cmp_op_type, enum tree_code code)
+{
+  if (code != EQ_EXPR && code != NE_EXPR)
+return false;
+
+  return get_vcond_eq_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type))
+!= CODE_FOR_nothing;
+}
+
 /* Return TRUE iff, appropriate vector insns are available
for vector cond expr with vector type VALUE_TYPE and a comparison
with operand vector types in CMP_OP_TYPE.  */
@@ -347,14 +372,14 @@ expand_vec_cond_expr_p (tree value_type, tree 
cmp_op_type, enum tree_code code)
   || maybe_ne (GET_MODE_NUNITS (value_mode), GET_MODE_NUNITS 
(cmp_op_mode)))
 return false;
 
-  if (get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
-  TYPE_UNSIGNED (cmp_op_type)) == CODE_FOR_nothing
-  && ((code != EQ_EXPR && code != NE_EXPR)
- || get_vcond_eq_icode (TYPE_MODE (value_type),
-TYPE_MODE (cmp_op_type)) == CODE_FOR_nothing))
+  if (get_rtx_code_safe (code, TYPE_UNSIGNED (cmp_op_type))
+  == LAST_AND_UNUSED_RTX_CODE)
+/* This may happen, for example, if code == SSA_NAME, in which case we
+   cannot be certain whether a vector insn is available.  */
 return false;
 
-  return true;
+  return vcond_icode_p (value_type, cmp_op_type, code)
+|| vcond_eq_icode_p (value_type, cmp_op_type, code);
 }
 
 /* Use the current target and options to initialize
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 35921e691f9..d759449846e 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -3819,6 +3819,25 @@ can_compare_p (enum rtx_code code, machine_mode mode,
   return 0;
 }
 
+/* Return whether the backend can emit a vector comparison for code CODE,
+   comparing operands of mode CMP_OP_MODE and producing a result with
+   VALUE_MODE.  */
+
+bool
+can_vcond_compare_p (enum rtx_code code, machine_mode value_mode,
+machine_mode cmp_op_mode)
+{
+  enum insn_code icode;
+  bool unsigned_p = (code == LTU || code == LEU || code == GTU || code == GEU);
+  rtx reg1 = alloca_raw_REG (cmp_op_mode, LAST_VIRTUAL_REGISTER + 1);
+  rtx reg2 = alloca_raw_REG (cmp_op_mode, LAST_VIRTUAL_REGISTER + 2);
+  rtx test = alloca_rtx_fmt_ee (code, value_mode, reg1, reg2);
+
+  return (icode = get_vcond_icode (value_mode, cmp_op_mode, unsigned_p))
+!= CODE_FOR_nothing
+&& insn_operand_matches (icode, 3, test);
+}
+
 /* This function is called when we are going to emit a compare instruction that
compares the values found in X and Y, using the rtl operator COMPARISON.
 
@@ -5348,11 +5367,11 @@ gen_cond_trap (enum rtx_code code, rtx op1, rtx op2, 
rtx tcode)
   return insn;
 }
 
-/* Return rtx code for TCODE. Use UNSIGNEDP to select signed
-   or unsigned operation code.  */
+/* Return rtx code for TCODE or LAST_AND_UNUSED_RTX_CODE 

[PATCH v4 5/7] S/390: Remove code duplication in vec_* comparison expanders

2019-10-01 Thread Ilya Leoshkevich
s390.md uses a lot of near-identical expanders that perform dispatching
to other expanders based on operand types. Since the following patch
would require even more of these, avoid copy-pasting the code by
generating these expanders using an iterator.
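
As a loose analogy only (machine description iterators are not C macros), the same de-duplication idea can be sketched in C with an X-macro: one list of comparison names generates all the near-identical dispatchers. All toy_* and TOY_* names are hypothetical, and the returned strings merely stand in for the mode-specific routines a dispatcher would call.

```c
#include <string.h>

/* One entry per comparison; adding a new one only needs a new line.  */
#define TOY_VEC_CMP_LIST(X) \
  X (uneq)                  \
  X (ltgt)                  \
  X (ordered)               \
  X (unordered)

enum toy_mode { TOY_V4SF, TOY_V2DF };

/* Expands to one dispatcher per list entry instead of four
   hand-written copies.  Each returns the name of the mode-specific
   routine it would dispatch to.  */
#define TOY_DEFINE_DISPATCHER(NAME)                    \
  const char *toy_vec_cmp##NAME (enum toy_mode mode)   \
  {                                                    \
    return mode == TOY_V4SF ? "vec_cmp" #NAME "v4sf"   \
                            : "vec_cmp" #NAME "v2df";  \
  }

TOY_VEC_CMP_LIST (TOY_DEFINE_DISPATCHER)

/* Returns nonzero if the generated uneq dispatcher picks the V2DF
   routine for a V2DF operand.  */
int
toy_demo_uneq_v2df (void)
{
  return strcmp (toy_vec_cmpuneq (TOY_V2DF), "vec_cmpuneqv2df") == 0;
}
```

Extending the list (for example with further comparison codes) then costs one line rather than another copy of the dispatcher body.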

gcc/ChangeLog:

2019-08-09  Ilya Leoshkevich  

PR target/77918
* config/s390/s390.c (s390_expand_vec_compare): Use
gen_vec_cmpordered and gen_vec_cmpunordered.
* config/s390/vector.md (vec_cmpuneq, vec_cmpltgt, vec_ordered,
vec_unordered): Delete.
(vec_ordered): Rename to vec_cmpordered.
(vec_unordered): Rename to vec_cmpunordered.
(VEC_CMP_EXPAND): New iterator for the generic dispatcher.
(vec_cmp): Generic dispatcher.
---
 gcc/config/s390/s390.c|  4 +--
 gcc/config/s390/vector.md | 67 +++
 2 files changed, 13 insertions(+), 58 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 1764c3450e6..062cbd8099d 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -6523,10 +6523,10 @@ s390_expand_vec_compare (rtx target, enum rtx_code cond,
  emit_insn (gen_vec_cmpltgt (target, cmp_op1, cmp_op2));
  return;
case ORDERED:
- emit_insn (gen_vec_ordered (target, cmp_op1, cmp_op2));
+ emit_insn (gen_vec_cmpordered (target, cmp_op1, cmp_op2));
  return;
case UNORDERED:
- emit_insn (gen_vec_unordered (target, cmp_op1, cmp_op2));
+ emit_insn (gen_vec_cmpunordered (target, cmp_op1, cmp_op2));
  return;
default: break;
}
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 2c2c56f7835..15b0e7f1802 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1492,22 +1492,6 @@
   operands[3] = gen_reg_rtx (mode);
 })
 
-(define_expand "vec_cmpuneq"
-  [(match_operand 0 "register_operand" "")
-   (match_operand 1 "register_operand" "")
-   (match_operand 2 "register_operand" "")]
-  "TARGET_VX"
-{
-  if (GET_MODE (operands[1]) == V4SFmode)
-emit_insn (gen_vec_cmpuneqv4sf (operands[0], operands[1], operands[2]));
-  else if (GET_MODE (operands[1]) == V2DFmode)
-emit_insn (gen_vec_cmpuneqv2df (operands[0], operands[1], operands[2]));
-  else
-gcc_unreachable ();
-
-  DONE;
-})
-
 ; LTGT a <> b -> a > b | b > a
 (define_expand "vec_cmpltgt"
   [(set (match_operand: 0 "register_operand" "=v")
@@ -1520,24 +1504,8 @@
   operands[3] = gen_reg_rtx (mode);
 })
 
-(define_expand "vec_cmpltgt"
-  [(match_operand 0 "register_operand" "")
-   (match_operand 1 "register_operand" "")
-   (match_operand 2 "register_operand" "")]
-  "TARGET_VX"
-{
-  if (GET_MODE (operands[1]) == V4SFmode)
-emit_insn (gen_vec_cmpltgtv4sf (operands[0], operands[1], operands[2]));
-  else if (GET_MODE (operands[1]) == V2DFmode)
-emit_insn (gen_vec_cmpltgtv2df (operands[0], operands[1], operands[2]));
-  else
-gcc_unreachable ();
-
-  DONE;
-})
-
 ; ORDERED (a, b): a >= b | b > a
-(define_expand "vec_ordered"
+(define_expand "vec_cmpordered"
   [(set (match_operand:  0 "register_operand" "=v")
(ge: (match_operand:VFT 1 "register_operand"  "v")
 (match_operand:VFT 2 "register_operand"  "v")))
@@ -1548,45 +1516,32 @@
   operands[3] = gen_reg_rtx (mode);
 })
 
-(define_expand "vec_ordered"
-  [(match_operand 0 "register_operand" "")
-   (match_operand 1 "register_operand" "")
-   (match_operand 2 "register_operand" "")]
-  "TARGET_VX"
-{
-  if (GET_MODE (operands[1]) == V4SFmode)
-emit_insn (gen_vec_orderedv4sf (operands[0], operands[1], operands[2]));
-  else if (GET_MODE (operands[1]) == V2DFmode)
-emit_insn (gen_vec_orderedv2df (operands[0], operands[1], operands[2]));
-  else
-gcc_unreachable ();
-
-  DONE;
-})
-
 ; UNORDERED (a, b): !ORDERED (a, b)
-(define_expand "vec_unordered"
+(define_expand "vec_cmpunordered"
   [(match_operand: 0 "register_operand" "=v")
(match_operand:VFT1 "register_operand" "v")
(match_operand:VFT2 "register_operand" "v")]
   "TARGET_VX"
 {
-  emit_insn (gen_vec_ordered (operands[0], operands[1], operands[2]));
+  emit_insn (gen_vec_cmpordered (operands[0], operands[1], operands[2]));
   emit_insn (gen_rtx_SET (operands[0],
 gen_rtx_NOT (mode, operands[0])));
   DONE;
 })
 
-(define_expand "vec_unordered"
+(define_code_iterator VEC_CMP_EXPAND
+  [uneq ltgt ordered unordered])
+
+(define_expand "vec_cmp"
   [(match_operand 0 "register_operand" "")
-   (match_operand 1 "register_operand" "")
-   (match_operand 2 "register_operand" "")]
+   (VEC_CMP_EXPAND (match_operand 1 "register_operand" "")
+   (match_operand 2 "register_operand" ""))]
   "TARGET_VX"
 {
   if (GET_MODE (operands[1]) == V4SFmode)
-emit_insn (gen_vec_unorderedv4sf (operands[0], operands[1], operands[2]));
+emit_insn (gen_vec_cmpv4sf (operands[0], operands[1], operands[2]));
   else if (GET_MODE 

[PATCH v4 3/7] S/390: Do not use signaling vector comparisons on z13

2019-10-01 Thread Ilya Leoshkevich
z13 supports only non-signaling vector comparisons.  This means we
cannot vectorize LT, LE, GT, GE and LTGT when compiling for z13.  Notify
the middle-end about this by using a more restrictive operator predicate
in vcond.

gcc/ChangeLog:

2019-08-21  Ilya Leoshkevich  

PR target/77918
* config/s390/vector.md (vcond_comparison_operator): New
predicate.
(vcond): Use vcond_comparison_operator.
---
 gcc/config/s390/vector.md | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 961d2c655e4..8a0b01f562b 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -614,10 +614,30 @@
   operands[2] = GEN_INT (GET_MODE_NUNITS (mode) - 1);
 })
 
+(define_predicate "vcond_comparison_operator"
+  (match_operand 0 "comparison_operator")
+{
+  if (!HONOR_NANS (GET_MODE (XEXP (op, 0)))
+  && !HONOR_NANS (GET_MODE (XEXP (op, 1))))
+return true;
+  switch (GET_CODE (op))
+{
+case LE:
+case LT:
+case GE:
+case GT:
+case LTGT:
+  /* Signaling vector comparisons are supported only on z14+.  */
+  return TARGET_Z14;
+default:
+  return true;
+}
+})
+
 (define_expand "vcond"
   [(set (match_operand:V_HW 0 "register_operand" "")
(if_then_else:V_HW
-(match_operator 3 "comparison_operator"
+(match_operator 3 "vcond_comparison_operator"
 [(match_operand:V_HW2 4 "register_operand" "")
  (match_operand:V_HW2 5 "nonmemory_operand" "")])
 (match_operand:V_HW 1 "nonmemory_operand" "")
-- 
2.23.0



[PATCH v4 1/7] Allow COND_EXPR and VEC_COND_EXPR conditions to trap

2019-10-01 Thread Ilya Leoshkevich
Right now the gimplifier does not allow VEC_COND_EXPR's condition to trap
and introduces a temporary if this could happen, for example, generating

  _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
  _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;

from GENERIC

  VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
  { -1, -1, -1, -1 } ,
  { 0, 0, 0, 0 } >

This is not necessary and makes the resulting GIMPLE harder to analyze.
Change the gimplifier so as to allow COND_EXPR and VEC_COND_EXPR
conditions to trap.

This patch takes special care to avoid introducing trapping comparisons
in GIMPLE_COND.  They are not allowed, because they would require 3
outgoing edges (then, else and EH), which is awkward to say the least.
Therefore, computations of such conditions should live in their own basic
blocks.
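
The resulting split between the two predicates can be illustrated with a small stand-alone model. This is a sketch under simplifying assumptions, not GCC code: toy_cond reduces a condition to three flags, and the toy_* function names are invented.

```c
#include <stdbool.h>

/* A condition reduced to the properties the predicates care about.  */
struct toy_cond
{
  bool is_val;         /* Already a simple value, e.g. an SSA name.  */
  bool is_comparison;  /* A comparison of two simple values.  */
  bool could_throw;    /* The comparison may trap or throw.  */
};

/* Shared helper: ALLOW_TRAPS decides whether a trapping comparison is
   still acceptable.  */
static bool
toy_condexpr_1 (struct toy_cond t, bool allow_traps)
{
  return t.is_val
	 || (t.is_comparison && (allow_traps || !t.could_throw));
}

/* Acceptable as the condition of COND_EXPR or VEC_COND_EXPR.  */
bool
toy_ok_in_cond_expr (struct toy_cond t)
{
  return toy_condexpr_1 (t, true);
}

/* Acceptable as the condition of a GIMPLE_COND, which has only a
   "then" and an "else" edge and therefore no room for an EH edge.  */
bool
toy_ok_in_gimple_cond (struct toy_cond t)
{
  return toy_condexpr_1 (t, false);
}

/* Demo: a comparison that may trap.  */
bool
toy_demo_trapping_cmp (bool for_gimple_cond)
{
  struct toy_cond trapping_cmp = { false, true, true };
  return for_gimple_cond ? toy_ok_in_gimple_cond (trapping_cmp)
			 : toy_ok_in_cond_expr (trapping_cmp);
}
```

In this model a trapping comparison is accepted inside a COND_EXPR or VEC_COND_EXPR but rejected as a GIMPLE_COND condition, so it would have to be computed in a separate statement first.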

gcc/ChangeLog:

2019-09-03  Ilya Leoshkevich  

PR target/77918
* gimple-expr.c (gimple_cond_get_ops_from_tree): Assert that the
caller passes a non-trapping condition.
(is_gimple_condexpr): Allow trapping conditions.
(is_gimple_condexpr_1): New helper function.
(is_gimple_condexpr_for_cond): New function, acts like old
is_gimple_condexpr.
* gimple-expr.h (is_gimple_condexpr_for_cond): New function.
* gimple.c (gimple_could_trap_p_1): Handle COND_EXPR and
VEC_COND_EXPR. Fix an issue with statements like i = (fp < 1.).
* gimplify.c (gimplify_cond_expr): Use
is_gimple_condexpr_for_cond.
(gimplify_expr): Allow is_gimple_condexpr_for_cond.
* tree-eh.c (operation_could_trap_p): Assert on COND_EXPR and
VEC_COND_EXPR.
(tree_could_trap_p): Handle COND_EXPR and VEC_COND_EXPR.
* tree-ssa-forwprop.c (forward_propagate_into_gimple_cond): Use
is_gimple_condexpr_for_cond, remove pointless tmp check.
(forward_propagate_into_cond): Remove pointless tmp check.
---
 gcc/gimple-expr.c   | 25 +
 gcc/gimple-expr.h   |  1 +
 gcc/gimple.c| 14 +-
 gcc/gimplify.c  |  5 +++--
 gcc/tree-eh.c   |  8 
 gcc/tree-ssa-forwprop.c |  7 ---
 6 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
index 4082828e198..1738af186d7 100644
--- a/gcc/gimple-expr.c
+++ b/gcc/gimple-expr.c
@@ -574,6 +574,7 @@ gimple_cond_get_ops_from_tree (tree cond, enum tree_code 
*code_p,
  || TREE_CODE (cond) == TRUTH_NOT_EXPR
  || is_gimple_min_invariant (cond)
  || SSA_VAR_P (cond));
+  gcc_checking_assert (!tree_could_throw_p (cond));
 
   extract_ops_from_tree (cond, code_p, lhs_p, rhs_p);
 
@@ -605,17 +606,33 @@ is_gimple_lvalue (tree t)
  || TREE_CODE (t) == BIT_FIELD_REF);
 }
 
-/*  Return true if T is a GIMPLE condition.  */
+/* Helper for is_gimple_condexpr and is_gimple_condexpr_for_cond.  */
 
-bool
-is_gimple_condexpr (tree t)
+static bool
+is_gimple_condexpr_1 (tree t, bool allow_traps)
 {
   return (is_gimple_val (t) || (COMPARISON_CLASS_P (t)
-   && !tree_could_throw_p (t)
+   && (allow_traps || !tree_could_throw_p (t))
&& is_gimple_val (TREE_OPERAND (t, 0))
&& is_gimple_val (TREE_OPERAND (t, 1))));
 }
 
+/* Return true if T is a GIMPLE condition.  */
+
+bool
+is_gimple_condexpr (tree t)
+{
+  return is_gimple_condexpr_1 (t, true);
+}
+
+/* Like is_gimple_condexpr, but does not allow T to trap.  */
+
+bool
+is_gimple_condexpr_for_cond (tree t)
+{
+  return is_gimple_condexpr_1 (t, false);
+}
+
 /* Return true if T is a gimple address.  */
 
 bool
diff --git a/gcc/gimple-expr.h b/gcc/gimple-expr.h
index 1ad1432bd17..0925aeb0f57 100644
--- a/gcc/gimple-expr.h
+++ b/gcc/gimple-expr.h
@@ -41,6 +41,7 @@ extern void gimple_cond_get_ops_from_tree (tree, enum 
tree_code *, tree *,
   tree *);
 extern bool is_gimple_lvalue (tree);
 extern bool is_gimple_condexpr (tree);
+extern bool is_gimple_condexpr_for_cond (tree);
 extern bool is_gimple_address (const_tree);
 extern bool is_gimple_invariant_address (const_tree);
 extern bool is_gimple_ip_invariant_address (const_tree);
diff --git a/gcc/gimple.c b/gcc/gimple.c
index 8e828a5f169..a874c29454c 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2149,10 +2149,22 @@ gimple_could_trap_p_1 (gimple *s, bool include_mem, 
bool include_stores)
   return false;
 
 case GIMPLE_ASSIGN:
-  t = gimple_expr_type (s);
   op = gimple_assign_rhs_code (s);
+
+  /* For COND_EXPR and VEC_COND_EXPR only the condition may trap.  */
+  if (op == COND_EXPR || op == VEC_COND_EXPR)
+   return tree_could_trap_p (gimple_assign_rhs1 (s));
+
+  /* For comparisons we need to check rhs operand types instead of rhs type
+ (which is BOOLEAN_TYPE).  */
+  if (TREE_CODE_CLASS (op) 

[PATCH v4 0/7] S/390: Use signaling FP comparison instructions

2019-10-01 Thread Ilya Leoshkevich
Bootstrapped and regtested on s390x-redhat-linux, x86_64-redhat-linux,
ppc64le-redhat-linux.

This patch series adds signaling FP comparison support (both scalar and
vector) to the s390 backend.

Patches 1-3 make it possible to query supported vcond rtxes and make
use of that for z13.

Patches 4-5 are preparation cleanups.

Patch 6 is the actual implementation.

Patch 7 contains new tests that make sure autovectorized comparisons use
the proper instructions.

Ilya Leoshkevich (7):
  Allow COND_EXPR and VEC_COND_EXPR conditions to trap
  Introduce can_vcond_compare_p function
  S/390: Do not use signaling vector comparisons on z13
  S/390: Implement vcond expander for V1TI,V1TF
  S/390: Remove code duplication in vec_* comparison expanders
  S/390: Use signaling FP comparison instructions
  S/390: Test signaling FP comparison instructions

v1->v2:
Improve wording in documentation commit message.
Replace hook with optabs query.
Add signaling eq test.

v2->v3:
Allow COND_EXPR and VEC_COND_EXPR conditions to throw, while making
sure that GIMPLE_COND's condition still cannot throw.
Remove documentation patch (superseded by
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=275303).
Add PR target/77918 reference.

v3->v4:
Committed alloca and vec_unordered bits separately.
Fixed issues in gimple_could_trap_p_1.
Dropped unnecessary gimple_ternary_operands_ok_p.
Simplified vcondv1tiv1tf implementation.
Improved VEC_CMP_EXPAND naming.
Improved can_vcond_compare_p naming and streamlined implementation.

 gcc/config/s390/2827.md   |  14 +-
 gcc/config/s390/2964.md   |  13 +-
 gcc/config/s390/3906.md   |  17 +-
 gcc/config/s390/8561.md   |  19 +-
 gcc/config/s390/s390-builtins.def |  16 +-
 gcc/config/s390/s390-modes.def|   8 +
 gcc/config/s390/s390.c|  38 ++-
 gcc/config/s390/s390.md   |  14 +
 gcc/config/s390/vector.md | 260 --
 gcc/gimple-expr.c |  25 +-
 gcc/gimple-expr.h |   1 +
 gcc/gimple.c  |  14 +-
 gcc/gimplify.c|   5 +-
 gcc/optabs-tree.c |  37 ++-
 gcc/optabs.c  |  38 ++-
 gcc/optabs.h  |   7 +
 gcc/testsuite/gcc.target/s390/s390.exp|   8 +
 .../gcc.target/s390/vector/vec-scalar-cmp-1.c |   8 +-
 .../s390/zvector/autovec-double-quiet-eq.c|   8 +
 .../s390/zvector/autovec-double-quiet-ge.c|   8 +
 .../s390/zvector/autovec-double-quiet-gt.c|   8 +
 .../s390/zvector/autovec-double-quiet-le.c|   8 +
 .../s390/zvector/autovec-double-quiet-lt.c|   8 +
 .../zvector/autovec-double-quiet-ordered.c|  10 +
 .../s390/zvector/autovec-double-quiet-uneq.c  |  10 +
 .../zvector/autovec-double-quiet-unordered.c  |  11 +
 .../autovec-double-signaling-eq-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-eq-z13.c |   9 +
 .../zvector/autovec-double-signaling-eq.c |  11 +
 .../autovec-double-signaling-ge-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-ge-z13.c |   9 +
 .../zvector/autovec-double-signaling-ge.c |   8 +
 .../autovec-double-signaling-gt-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-gt-z13.c |   9 +
 .../zvector/autovec-double-signaling-gt.c |   8 +
 .../autovec-double-signaling-le-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-le-z13.c |   9 +
 .../zvector/autovec-double-signaling-le.c |   8 +
 .../autovec-double-signaling-lt-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-lt-z13.c |   9 +
 .../zvector/autovec-double-signaling-lt.c |   8 +
 ...autovec-double-signaling-ltgt-z13-finite.c |   9 +
 .../autovec-double-signaling-ltgt-z13.c   |   9 +
 .../zvector/autovec-double-signaling-ltgt.c   |   9 +
 .../s390/zvector/autovec-double-smax-z13.F90  |  11 +
 .../s390/zvector/autovec-double-smax.F90  |   8 +
 .../s390/zvector/autovec-double-smin-z13.F90  |  11 +
 .../s390/zvector/autovec-double-smin.F90  |   8 +
 .../s390/zvector/autovec-float-quiet-eq.c |   8 +
 .../s390/zvector/autovec-float-quiet-ge.c |   8 +
 .../s390/zvector/autovec-float-quiet-gt.c |   8 +
 .../s390/zvector/autovec-float-quiet-le.c |   8 +
 .../s390/zvector/autovec-float-quiet-lt.c |   8 +
 .../zvector/autovec-float-quiet-ordered.c |  10 +
 .../s390/zvector/autovec-float-quiet-uneq.c   |  10 +
 .../zvector/autovec-float-quiet-unordered.c   |  11 +
 .../s390/zvector/autovec-float-signaling-eq.c |  11 +
 .../s390/zvector/autovec-float-signaling-ge.c |   8 +
 .../s390/zvector/autovec-float-signaling-gt.c |   8 +
 .../s390/zvector/autovec-float-signaling-le.c |   8 +
 .../s390/zvector/autovec-float-signaling-lt.c |   8 +
 .../zvector/autovec-float-signaling-ltgt.c|   9 +
 .../gcc.target/s390/zvector/autovec-fortran.h |  

Re: [PATCH] regrename: Use PC instead of CC0 to hide operands

2019-10-01 Thread Paul Koning
On Oct 1, 2019, at 5:14 AM, Segher Boessenkool wrote:
> 
> The regrename pass temporarily changes some operand RTL to CC0 so that
> note_stores and scan_rtx don't see those operands.  CC0 is deprecated
> and we want to remove it, so we need to use something else here.
> PC fits the bill fine.

CC0 is, presumably, not part of GENERAL_REGS, but PC is, in some ports.  Does 
that cause a problem here? 

paul



Fix reload after function-abi patches (PR91948)

2019-10-01 Thread Richard Sandiford
The code was passing a pseudo rather than its allocated hard reg
to ira_need_caller_save_p.  Running under valgrind to reproduce
the failure also showed that ALLOCNO_CROSSED_CALLS_ABIS wasn't
being explicitly initialised.

Tested on aarch64-linux-gnu and cross-tested against the testcase
on cris-elf.  Applied as obvious.

Richard


2019-10-01  Richard Sandiford  

gcc/
PR rtl-optimization/91948
* ira-build.c (ira_create_allocno): Initialize
ALLOCNO_CROSSED_CALLS_ABIS.
* ira-color.c (allocno_reload_assign): Pass hard_regno rather
than regno to ira_need_caller_save_p.

Index: gcc/ira-build.c
===
--- gcc/ira-build.c 2019-10-01 09:55:35.134088716 +0100
+++ gcc/ira-build.c 2019-10-01 13:55:02.074324628 +0100
@@ -504,6 +504,7 @@ ira_create_allocno (int regno, bool cap_
   ALLOCNO_CALL_FREQ (a) = 0;
   ALLOCNO_CALLS_CROSSED_NUM (a) = 0;
   ALLOCNO_CHEAP_CALLS_CROSSED_NUM (a) = 0;
+  ALLOCNO_CROSSED_CALLS_ABIS (a) = 0;
   CLEAR_HARD_REG_SET (ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a));
 #ifdef STACK_REGS
   ALLOCNO_NO_STACK_REG_P (a) = false;
Index: gcc/ira-color.c
===
--- gcc/ira-color.c 2019-09-30 17:20:49.594663967 +0100
+++ gcc/ira-color.c 2019-10-01 13:55:02.074324628 +0100
@@ -4398,7 +4398,7 @@ allocno_reload_assign (ira_allocno_t a,
   ? ALLOCNO_CLASS_COST (a)
   : ALLOCNO_HARD_REG_COSTS (a)[ira_class_hard_reg_index
[aclass][hard_regno]]));
-  if (ira_need_caller_save_p (a, regno))
+  if (ira_need_caller_save_p (a, hard_regno))
{
  ira_assert (flag_caller_saves);
  caller_save_needed = 1;


Re: [PATCH, nvptx] Expand OpenACC child function arguments to use CUDA params space

2019-10-01 Thread Chung-Lin Tang

On 2019/9/24 6:43 PM, Chung-Lin Tang wrote:



--- gcc/config/nvptx/nvptx.c(revision 275493)

+++ gcc/config/nvptx/nvptx.c(working copy)
+static void
+nvptx_expand_to_rtl_hook (void)
+{
+  /* For utilizing CUDA .param kernel arguments, we detect and modify
+ the gimple of offloaded child functions, here before RTL expansion,
+ starting with standard OMP form:
+  foo._omp_fn.0 (const struct .omp_data_t.8 & restrict .omp_data_i) { ... }
+
+ and transform it into a style where the OMP data record fields are
+ "exploded" into individual scalar arguments:
+  foo._omp_fn.0 (int * a, int * b, int * c) { ... }
+
+ Note that there are implicit assumptions of how OMP lowering (and/or other
+ intervening passes) behaves contained in this transformation code;
+ if those passes change in their output, this code may possibly need
+ updating.  */
+
+  if (lookup_attribute ("omp target entrypoint",
+DECL_ATTRIBUTES (current_function_decl))
+  /* The rather indirect manner in which OpenMP target functions are
+ launched makes this transformation only valid for OpenACC currently.
+ TODO: e.g. write_omp_entry(), nvptx_declare_function_name(), etc.
+ needs changes for this to work with OpenMP.  */
+  && lookup_attribute ("oacc function",
+   DECL_ATTRIBUTES (current_function_decl))
+  && VOID_TYPE_P (TREE_TYPE (DECL_RESULT (current_function_decl)))

Why the 'void' return conditional?  (Or, should that rather be an
'gcc_checking_assert' at the top of the following block?)


That's the shape of child functions that omp-low generates. Maybe that should be an
assertion, though here I'm just doing sanity checking and ignoring otherwise.

Come to think of it, maybe I should try using the assertion to check if
I'm unintentionally ignoring transforming some cases...


I've updated the patch to use an assertion for those convention checks. I think
it's better to leave a level of checking in place, so gcc_assert() instead of
gcc_checking_assert(). Also tested, with no regressions.

Thanks,
Chung-Lin
Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c(revision 276406)
+++ gcc/config/nvptx/nvptx.c(working copy)
@@ -68,6 +68,10 @@
 #include "attribs.h"
 #include "tree-vrp.h"
 #include "tree-ssa-operands.h"
+#include "tree-pretty-print.h"
+#include "gimple-pretty-print.h"
+#include "tree-cfg.h"
+#include "gimple-ssa.h"
 #include "tree-ssanames.h"
 #include "gimplify.h"
 #include "tree-phinodes.h"
@@ -6437,6 +6441,226 @@ nvptx_set_current_function (tree fndecl)
   oacc_bcast_partition = 0;
 }
 
+static void
+nvptx_expand_to_rtl_hook (void)
+{
+  /* For utilizing CUDA .param kernel arguments, we detect and modify
+ the gimple of offloaded child functions, here before RTL expansion,
+ starting with standard OMP form:
+  foo._omp_fn.0 (const struct .omp_data_t.8 & restrict .omp_data_i) { ... }
+   
+ and transform it into a style where the OMP data record fields are
+ "exploded" into individual scalar arguments:
+  foo._omp_fn.0 (int * a, int * b, int * c) { ... }
+
+ Note that there are implicit assumptions of how OMP lowering (and/or other
+ intervening passes) behaves contained in this transformation code;
+ if those passes change in their output, this code may possibly need
+ updating.  */
+
+  if (lookup_attribute ("omp target entrypoint",
+   DECL_ATTRIBUTES (current_function_decl))
+  /* The rather indirect manner in which OpenMP target functions are
+launched makes this transformation only valid for OpenACC currently.
+TODO: e.g. write_omp_entry(), nvptx_declare_function_name(), etc.
+needs changes for this to work with OpenMP.  */
+  && lookup_attribute ("oacc function",
+  DECL_ATTRIBUTES (current_function_decl)))
+{
+  tree omp_data_arg = DECL_ARGUMENTS (current_function_decl);
+  tree argtype = TREE_TYPE (omp_data_arg);
+
+  /* Ensure this function is of the form of a single reference argument
+to the OMP data record, or a single void* argument (when no values
+passed)  */
+  gcc_assert (VOID_TYPE_P (TREE_TYPE (DECL_RESULT (current_function_decl)))
+ && (DECL_CHAIN (omp_data_arg) == NULL_TREE
+ && ((TREE_CODE (argtype) == REFERENCE_TYPE
+  && TREE_CODE (TREE_TYPE (argtype)) == RECORD_TYPE)
+ || (TREE_CODE (argtype) == POINTER_TYPE
+ && TREE_TYPE (argtype) == void_type_node))));
+  if (dump_file)
+   {
+ fprintf (dump_file, "Detected offloaded child function %s, "
+  "starting parameter conversion\n",
+  print_generic_expr_to_str (current_function_decl));
+ fprintf (dump_file, "OMP data record argument: %s (tree type: %s)\n",
+  

Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching

2019-10-01 Thread Dmitrij Pochepko
Hi Richard,

I updated patch according to all your comments.
Also bootstrapped and tested again on x86_64-pc-linux-gnu and 
aarch64-linux-gnu, which took some time.

attached v3.

Thanks,
Dmitrij

On Thu, Sep 26, 2019 at 09:47:04AM +0200, Richard Biener wrote:
> On Tue, Sep 24, 2019 at 5:29 PM Dmitrij Pochepko
>  wrote:
> >
> > Hi,
> >
> > can anybody take a look at v2?
> 
> +(if (tree_to_uhwi (@4) == 1
> + && tree_to_uhwi (@10) == 2 && tree_to_uhwi (@5) == 4
> 
> those will still ICE for large __int128_t constants.  Since you do not match
> any conversions you should probably restrict the precision of 'type' like
> with
>(if (TYPE_PRECISION (type) <= 64
> && tree_to_uhwi (@4) ...
> 
> likewise tree_to_uhwi will fail for negative constants thus if the
> pattern assumes
> unsigned you should verify that as well with && TYPE_UNSIGNED  (type).
> 
> Your 'argtype' is simply 'type' so you can elide it.
> 
> +   (switch
> +   (if (types_match (argtype, long_long_unsigned_type_node))
> + (convert (BUILT_IN_POPCOUNTLL:integer_type_node @0)))
> +   (if (types_match (argtype, long_unsigned_type_node))
> + (convert (BUILT_IN_POPCOUNTL:integer_type_node @0)))
> +   (if (types_match (argtype, unsigned_type_node))
> + (convert (BUILT_IN_POPCOUNT:integer_type_node @0)))
> 
> Please test small types first so we can avoid popcountll when long == long 
> long
> or long == int.  I also wonder if we really want to use the builtins and
> check optab availability or if we nowadays should use
> direct_internal_fn_supported_p (IFN_POPCOUNT, integer_type_node, type,
> OPTIMIZE_FOR_BOTH) and
> 
> (convert (IFN_POPCOUNT:type @0))
> 
> without the switch?
> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Dmitrij
> >
> > On Mon, Sep 09, 2019 at 10:03:40PM +0300, Dmitrij Pochepko wrote:
> > > Hi all.
> > >
> > > Please take a look at v2 (attached).
> > > I changed patch according to review comments. The same testing was 
> > > performed again.
> > >
> > > Thanks,
> > > Dmitrij
> > >
> > > On Thu, Sep 05, 2019 at 06:34:49PM +0300, Dmitrij Pochepko wrote:
> > > > This patch adds matching for Hamming weight (popcount) implementation. 
> > > > The following sources:
> > > >
> > > > int
> > > > foo64 (unsigned long long a)
> > > > {
> > > > unsigned long long b = a;
> > > > b -= ((b>>1) & 0x5555555555555555ULL);
> > > > b = ((b>>2) & 0x3333333333333333ULL) + (b & 0x3333333333333333ULL);
> > > > b = ((b>>4) + b) & 0x0F0F0F0F0F0F0F0FULL;
> > > > b *= 0x0101010101010101ULL;
> > > > return (int)(b >> 56);
> > > > }
> > > >
> > > > and
> > > >
> > > > int
> > > > foo32 (unsigned int a)
> > > > {
> > > > unsigned long b = a;
> > > > b -= ((b>>1) & 0x55555555UL);
> > > > b = ((b>>2) & 0x33333333UL) + (b & 0x33333333UL);
> > > > b = ((b>>4) + b) & 0x0F0F0F0FUL;
> > > > b *= 0x01010101UL;
> > > > return (int)(b >> 24);
> > > > }
> > > >
> > > > and equivalents are now recognized as popcount for platforms with hw 
> > > > popcount support. Bootstrapped and tested on x86_64-pc-linux-gnu and 
> > > > aarch64-linux-gnu systems with no regressions.
> > > >
> > > > (I have no write access to repo)
> > > >
> > > > Thanks,
> > > > Dmitrij
> > > >
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR tree-optimization/90836
> > > >
> > > > * gcc/match.pd (popcount): New pattern.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR tree-optimization/90836
> > > >
> > > > * lib/target-supports.exp (check_effective_target_popcount)
> > > > (check_effective_target_popcountll): New effective targets.
> > > > * gcc.dg/tree-ssa/popcount4.c: New test.
> > > > * gcc.dg/tree-ssa/popcount4l.c: New test.
> > > > * gcc.dg/tree-ssa/popcount4ll.c: New test.
> > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index 0317bc7..b1867bf 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -5358,6 +5358,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >(cmp (popcount @0) integer_zerop)
> > > >(rep @0 { build_zero_cst (TREE_TYPE (@0)); }
> > > >
> > > > +/* 64- and 32-bits branchless implementations of popcount are detected:
> > > > +
> > > > +   int popcount64c (uint64_t x)
> > > > +   {
> > > > + x -= (x >> 1) & 0x5555555555555555ULL;
> > > > + x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
> > > > + x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
> > > > + return (x * 0x0101010101010101ULL) >> 56;
> > > > +   }
> > > > +
> > > > +   int popcount32c (uint32_t x)
> > > > +   {
> > > > + x -= (x >> 1) & 0x55555555;
> > > > + x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
> > > > + x = (x + (x >> 4)) & 0x0f0f0f0f;
> > > > + return (x * 0x01010101) >> 24;
> > > > +   }  */
> > > > +(simplify
> > > > +  (convert
> > > > +(rshift
> > > > +  (mult
> > > > +   (bit_and:c
> > > > + (plus:c
> > > > 
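
For readers following the thread, here is a minimal C sketch of the 64-bit idiom that the proposed match.pd pattern is meant to recognize, together with a helper that compares it against the builtin. The function names are illustrative only and are not part of the patch; the constants are the standard ones for this Hamming-weight algorithm.

```c
#include <stdint.h>

/* Branchless 64-bit population count (illustrative name, not from the patch). */
static int popcount64_branchless(uint64_t x)
{
    /* Each 2-bit field now holds the number of set bits it contained.  */
    x -= (x >> 1) & 0x5555555555555555ULL;
    /* Sum adjacent 2-bit fields into 4-bit fields.  */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Sum adjacent 4-bit fields into bytes.  */
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    /* The multiply accumulates all byte sums into the top byte.  */
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

/* Returns 1 when the branchless form agrees with the compiler builtin.  */
static int popcount64_agrees(uint64_t v)
{
    return popcount64_branchless(v) == __builtin_popcountll(v);
}
```

Both functions should return the same count for any input; the point of the patch under review is that the compiler could then replace the first form with a single popcount operation where the target supports one.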

Re: [RFC][SLP] SLP vectorization: vectorize vector constructors

2019-10-01 Thread Richard Biener
On Tue, 1 Oct 2019, Joel Hutton wrote:

> On 01/10/2019 12:07, Joel wrote:
> >
> > SLP vectorization: vectorize vector constructors
> >
> >
> > Currently SLP vectorization can build SLP trees starting from 
> > reductions or from group stores. This patch adds a third starting 
> > point: vector constructors.
> >
> >
> > For the following test case (compiled with -O3 -fno-vect-cost-model):
> >
> >
> > char g_d[1024], g_s1[1024], g_s2[1024];
> > void test_loop(void)
> > {
> >   char *d = g_d, *s1 = g_s1, *s2 = g_s2;
> >
> >
> > for ( int y = 0; y < 128; y++ )
> > {
> >for ( int x = 0; x < 16; x++ )
> >  d[x] = s1[x] + s2[x];
> >d += 16;
> > }
> >
> > }
> >
> >
> > before patch:
> >
> > test_loop:
> > .LFB0:
> >         .cfi_startproc
> >         adrp    x0, g_s1
> >         adrp    x2, g_s2
> >         add     x3, x0, :lo12:g_s1
> >         add     x4, x2, :lo12:g_s2
> >         ldrb    w7, [x2, #:lo12:g_s2]
> >         ldrb    w1, [x0, #:lo12:g_s1]
> >         adrp    x0, g_d
> >         ldrb    w6, [x4, 1]
> >         add     x0, x0, :lo12:g_d
> >         ldrb    w5, [x3, 1]
> >         add     w1, w1, w7
> >         fmov    s0, w1
> >         ldrb    w7, [x4, 2]
> >         add     w5, w5, w6
> >         ldrb    w1, [x3, 2]
> >         ldrb    w6, [x4, 3]
> >         add     x2, x0, 2048
> >         ins     v0.b[1], w5
> >         add     w1, w1, w7
> >         ldrb    w7, [x3, 3]
> >         ldrb    w5, [x4, 4]
> >         add     w7, w7, w6
> >         ldrb    w6, [x3, 4]
> >         ins     v0.b[2], w1
> >         ldrb    w8, [x4, 5]
> >         add     w6, w6, w5
> >         ldrb    w5, [x3, 5]
> >         ldrb    w9, [x4, 6]
> >         add     w5, w5, w8
> >         ldrb    w1, [x3, 6]
> >         ins     v0.b[3], w7
> >         ldrb    w8, [x4, 7]
> >         add     w1, w1, w9
> >         ldrb    w11, [x3, 7]
> >         ldrb    w7, [x4, 8]
> >         add     w11, w11, w8
> >         ldrb    w10, [x3, 8]
> >         ins     v0.b[4], w6
> >         ldrb    w8, [x4, 9]
> >         add     w10, w10, w7
> >         ldrb    w9, [x3, 9]
> >         ldrb    w7, [x4, 10]
> >         add     w9, w9, w8
> >         ldrb    w8, [x3, 10]
> >         ins     v0.b[5], w5
> >         ldrb    w6, [x4, 11]
> >         add     w8, w8, w7
> >         ldrb    w7, [x3, 11]
> >         ldrb    w5, [x4, 12]
> >         add     w7, w7, w6
> >         ldrb    w6, [x3, 12]
> >         ins     v0.b[6], w1
> >         ldrb    w12, [x4, 13]
> >         add     w6, w6, w5
> >         ldrb    w5, [x3, 13]
> >         ldrb    w1, [x3, 14]
> >         add     w5, w5, w12
> >         ldrb    w13, [x4, 14]
> >         ins     v0.b[7], w11
> >         ldrb    w12, [x4, 15]
> >         add     w4, w1, w13
> >         ldrb    w1, [x3, 15]
> >         add     w1, w1, w12
> >         ins     v0.b[8], w10
> >         ins     v0.b[9], w9
> >         ins     v0.b[10], w8
> >         ins     v0.b[11], w7
> >         ins     v0.b[12], w6
> >         ins     v0.b[13], w5
> >         ins     v0.b[14], w4
> >         ins     v0.b[15], w1
> >         .p2align 3,,7
> > .L2:
> >         str     q0, [x0], 16
> >         cmp     x2, x0
> >         bne     .L2
> >         ret
> >         .cfi_endproc
> > .LFE0:
> >
> >
> > After patch:
> >
> >
> > test_loop:
> > .LFB0:
> >         .cfi_startproc
> >         adrp    x3, g_s1
> >         adrp    x2, g_s2
> >         add     x3, x3, :lo12:g_s1
> >         add     x2, x2, :lo12:g_s2
> >         adrp    x0, g_d
> >         add     x0, x0, :lo12:g_d
> >         add     x1, x0, 2048
> >         ldr     q1, [x2]
> >         ldr     q0, [x3]
> >         add     v0.16b, v0.16b, v1.16b
> >         .p2align 3,,7
> > .L2:
> >         str     q0, [x0], 16
> >         cmp     x0, x1
> >         bne     .L2
> >         ret
> >         .cfi_endproc
> > .LFE0:
> >
> >
> >
> >
> > bootstrapped and tested on aarch64-none-linux-gnu
> >
> Patch attached:

I think a better place for the loop searching for CONSTRUCTORs is
vect_slp_analyze_bb_1 where I'd put it before the check you remove,
and I'd simply append found CONSTRUCTORs to the grouped_stores
array.  The fixup you do in vectorizable_operation doesn't
belong there either, I'd add a new field to the SLP instance
structure referring to the CONSTRUCTOR stmt and do the fixup
in vect_schedule_slp_instance instead where you can simply
replace the CONSTRUCTOR with the vectorized SSA name then.

+   /* Check that the constructor elements are unique.  */
+   FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
+ {
+   tree prev_val;
+   int j;
+   FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), j, 
prev_val)
+   {
+ if (val == prev_val && i!=j)

why's that necessary? (it looks incomplete, also doesn't catch
[duplicate] constants)

You also fail to check that CONSTRUCTOR_NELTS == TYPE_VECTOR_SUBPARTS
(we can have omitted trailing zeros).
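
As a source-level illustration of the omitted-trailing-zeros point, here is a small sketch that assumes the GNU C vector extension; the type and function names are made up for the example. Whether this exact source reaches the vectorizer as a CONSTRUCTOR with fewer elements than lanes is not verified here.

```c
/* A 4-lane integer vector type via the GNU C vector extension.  */
typedef int v4si __attribute__((vector_size(16)));

/* Build a 4-lane vector from only two explicit elements.
   The remaining lanes are implicitly zero-initialized.  */
static v4si make_partial(int a, int b)
{
    v4si v = { a, b };
    return v;
}

/* Read one lane; vector subscripting is part of the same extension.  */
static int lane(v4si v, int i)
{
    return v[i];
}
```

Calling make_partial (7, 9) gives lanes 7, 9, 0, 0, so an initializer can legitimately list fewer values than the vector has subparts.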

What happens if you have a vector constructor that is twice
as large as the machine supports?  The vectorizer will happily
produce a two vector SSA name vectorized result but your
CONSTRUCTOR replacement doesn't work here.  I think this should
be made work correctly (not give up on that case).

Thanks,
Richard.


Re: [PATCH] regrename: Use PC instead of CC0 to hide operands

2019-10-01 Thread Segher Boessenkool
On Tue, Oct 01, 2019 at 10:18:37AM +0100, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> It's mentioned by name later too:
> 
> /* Step 2: Mark chains for which we have reads outside operands
>as unrenamable.
>We do this by munging all operands into CC0, and closing
>everything remaining.  */

Ah yes, I had that in a later patch.  I have patches that get rid of CC0
everywhere, except in the targets that actually use CC0 still, and there
are some examples in the manual that need to be rewritten.  If anyone
wants these patches, just ask please.

> OK with the same change there.

Done, committed.  Thanks!


Segher


[PATCH 2/2][GCC][RFC][middle-end]: Add complex number arithmetic patterns to SLP pattern matcher.

2019-10-01 Thread Tamar Christina
Hi All,

This patch adds pattern detections for the following operations:

 1) Addition with rotation of the second argument around the Argand plane.
Supported rotations are 90 and 180.

c = a + (b * I) and c = a + (b * I * I)

  2) Complex multiplication and Conjugate Complex multiplication of the second
     parameter.

c = a * b and c = a * conj (b)

  3) Complex FMLA, Conjugate FMLA of the second parameter and FMLS.

c += a * b, c += a * conj (b) and c -= a * b

  For the conjugate cases, under fast-math the conjugated operand may also be
  the first one; this is handled by flipping the arguments to the optab.  This
  allows it to support c = conj (a) * b and c += conj (a) * b.

  where a, b and c are complex numbers.
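
To make the rotated additions in item 1 concrete, here is a minimal C sketch of the lowered real/imaginary arithmetic they correspond to. The function names are illustrative and do not come from the patch; the sketch only shows the arithmetic identity, not what the matcher does internally.

```c
#include <complex.h>

/* c = a + (b * I): b rotated by 90 degrees, written on real/imaginary parts.  */
static double complex add_rot90_lowered(double complex a, double complex b)
{
    double re = creal(a) - cimag(b);   /* real part subtracts imag (b) */
    double im = cimag(a) + creal(b);   /* imag part adds real (b) */
    return re + im * I;
}

/* c = a + (b * I * I): b rotated by 180 degrees, i.e. a - b.  */
static double complex add_rot180_lowered(double complex a, double complex b)
{
    double re = creal(a) - creal(b);
    double im = cimag(a) - cimag(b);
    return re + im * I;
}
```

The mixed subtract/add across the two parts is why two scalar statements have to be matched in parallel: neither half alone identifies the rotation.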

The patterns work by looking at the sequence produced after GCC lowers complex
numbers.  As such they would match any normal operation that does the same
computations.

For this to work it has to recognize two statements in parallel as it needs to
match operations on both the real and imaginary parts.  The
instructions generated by this change are also only available in their vector
forms.

The instructions also require the loads to be contiguous, so when a match is
made and the code decides it is able to do the replacement, it cancels out all
permutes.

The nodes of the SLP tree are still re-organized such that subsequent matches
match against the correct operation.  For instance matching the following SLP
tree:

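     +-------------------------------------+
     | stmt 0 REALPART_EXPR <*_3> = _4;    |
     | stmt 1 IMAGPART_EXPR <*_3> = _36;   |
     +-------------------------------------+
                        |
                        |
                        v
     +-------------------------------------+
     | stmt 0 _4 = _16 - _34;              |
 <-- | stmt 1 _36 = _15 + _33;             |
     +-------------------------------------+
                        |
                        |
                        v
     +-------------------------------------+
     | stmt 0 _34 = _31 + _32;             |
 <-- | stmt 1 _33 = _29 - _30;             |
     +-------------------------------------+
                        |
                        |
                        v
     +-------------------------------------+
     | stmt 0 _31 = _24 * _26;             |
     | stmt 1 _29 = _23 * _26;             | -->
     +-------------------------------------+
                        |
                        |
                        v
     +-------------------------------------+
     | stmt 0 _24 = IMAGPART_EXPR <*_7>;   |
     | stmt 1 _23 = REALPART_EXPR <*_7>;   |
     +-------------------------------------+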

Which corresponds to the following C code

__attribute__((noinline,noipa))
void fma90_new (double _Complex a[restrict 32], double _Complex b[restrict 32],
double _Complex c[restrict 32])
{
  for (int i=0; i < 32; i++)
  c[i] += a[i] * b[i] * 1.0fi;
}

shows that the patterns in node (_4, _36) and (_34, _33) are the same, e.g. the
generated sequence is

mov x3, 0
.p2align 3,,7
.L139:
ldr q0, [x0, x3]
ldr q3, [x1, x3]
ldr q1, [x2, x3]
fmul    v2.2d, v3.2d, v0.d[0]
fmul    v0.2d, v3.2d, v0.d[1]
fcadd   v0.2d, v2.2d, v0.2d, #90
fcadd   v0.2d, v1.2d, v0.2d, #90
str q0, [x2, x3]
add x3, x3, 16
cmp x3, 512
bne .L139
ret

Which you can only get by re-ordering the operations after matching the first
FCADD #90 and noticing that after that the next instruction is also an FCADD 90.

When a replacement is done a new internal function is generated which the
back-end has to expand to the proper instruction sequences.

Concretely, this generates

ldr q1, [x1, x3]
ldr q2, [x0, x3]
ldr q0, [x2, x3]
fcmla   v0.2d, v1.2d, v2.2d, #180
fcmla   v0.2d, v1.2d, v2.2d, #90
str q0, [x2, x3]
add x3, x3, 16
cmp x3, 3200
bne .L2
ret

now instead of

add x3, x2, 31
sub x4, x3, x1
sub x3, x3, x0
cmp x4, 62
mov x4, 62
ccmp    x3, x4, 0, hi
bls .L5
mov x3, x0
mov x0, x1
add x1, x2, 3200
.p2align 3,,7
.L3:
ld2 {v16.2d - v17.2d}, [x2]
ld2 {v2.2d - v3.2d}, [x3], 32
ld2 {v0.2d - v1.2d}, [x0], 32
mov v7.16b, v17.16b
fmul    v6.2d, v0.2d, v3.2d
fmla    v7.2d, v1.2d, v3.2d
fmla    v6.2d, v1.2d, v2.2d
fmls    v7.2d, v2.2d, v0.2d
fadd    v4.2d, v6.2d, v16.2d
mov v5.16b, v7.16b
st2 {v4.2d - v5.2d}, [x2], 32
cmp x2, x1
bne .L3
ret
.L5:
add x4, x2, 8
add x6, x0, 8
add x5, x1, 8
mov x3, 0
.p2align 3,,7
.L2:
ldr d1, [x6, x3]
ldr d4, [x1, x3]
ldr d5, [x5, x3]
ldr d3, [x0, x3]
fmul    d2, d4, d1
ldr d0, [x4, x3]
fmadd   d0, d5, d1, d0
ldr d1, [x2, x3]
fmadd   d2, 

[PATCH 1/2][GCC][RFC][middle-end]: Add SLP pattern matcher.

2019-10-01 Thread Tamar Christina
Hi All,

This adds a framework to allow pattern matchers to be written based on the
SLP tree.  The difference between this one and the one in tree-vect-patterns is
that this matcher allows the matching of an arbitrary number of parallel
statements and replacing of an arbitrary number of children or statements.

Any relationship created by the SLP pattern matcher will be undone if SLP fails.

The pattern matcher can also cancel all permutes depending on what the pattern
requested it to do.  As soon as one pattern requests the permutes to be
cancelled all permutes are cancelled.

Compared to the previous pattern matcher this one will work for an arbitrary
group size and will match at any arbitrary node in the SLP tree.  The only
requirement is that the entire node is matched or rejected.

vect_build_slp_tree_1 is a bit more lenient in what it accepts as "compatible
operations", but the matcher cannot be, because when a pattern matches the order
of the operands may be changed.  So all operands must be changed or none.

Furthermore the matcher relies on canonicalization of the operations inside the
SLP tree and on the fact that floating-point math operations are not commutative.
This means that matching a pattern does not need to try all alternatives or
combinations and that arguments will always be in the same order if it's the
same operation.

The pattern matcher also ignores uninteresting nodes such as type casts, loads
and stores.  Doing so is essential to keep the runtime down.

Each matcher is allowed a post condition that can be run to perform any changes
to the SLP tree as needed before the patterns are created and may also abort
the creation of the patterns.

When a pattern is matched it is not immediately created but instead it is
deferred until all statements in the node have been analyzed.  Only if all post
conditions are true, and all statements will be replaced will the patterns be
created in batch.  This allows us to not have to undo any work if the pattern
fails but also makes it so we traverse the tree only once.
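
The deferred, all-or-nothing creation described above can be sketched in plain C as a two-phase loop. Everything in this sketch (the struct, the opcode values, the function names, the fixed limit of 16 statements) is hypothetical and chosen only for illustration; it is not the data structure or API used by the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one scalar statement in an SLP node.  */
struct toy_stmt {
    int opcode;      /* original operation */
    int replacement; /* pattern operation once committed, 0 if none */
};

/* Hypothetical matcher: only opcode 1 has a pattern form, encoded as 100.  */
static bool toy_match(const struct toy_stmt *s, int *pattern_out)
{
    if (s->opcode != 1)
        return false;
    *pattern_out = 100;
    return true;
}

/* Phase 1 records candidate replacements without touching the node.
   Phase 2 commits them only if every statement matched, so a failed
   match leaves nothing to undo.  Handles at most 16 statements.  */
static bool toy_match_node(struct toy_stmt *stmts, size_t n)
{
    int pending[16];
    if (n > 16)
        return false;
    for (size_t i = 0; i < n; i++)
        if (!toy_match(&stmts[i], &pending[i]))
            return false;               /* reject the whole node */
    for (size_t i = 0; i < n; i++)
        stmts[i].replacement = pending[i];
    return true;
}
```

Keeping the candidates in a side buffer until the whole node is accepted is what allows the node to be either fully matched or left untouched.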

When a new pattern is created it is marked as a pattern of the statement it is
replacing and is marked as used in the current SLP scope.  If SLP fails then
the relationship is undone and the relevancy restored.

Each pattern matcher can detect any number of patterns it wants.  The only
constraint is that the optabs they produce must all have the same arity.

The pattern matcher supports instructions that have no scalar form as they
are added as pattern statements to the stmt.  The BB is left untouched and
so the scalar loop is untouched.

Bootstrapped on aarch64-none-linux-gnu and no issues.
No regression testing done yet.

Thanks,
Tamar

gcc/ChangeLog:

2019-10-01  Tamar Christina  

* tree-vect-loop.c (vect_dissolve_slp_only_patterns): New.
(vect_dissolve_slp_only_groups): Use macro.
* tree-vect-patterns.c (vect_mark_pattern_stmts): Expose symbol.
* tree-vect-slp.c (vect_free_slp_tree): Add control of recursion and how
to free.
(ssa_name_def_to_slp_tree_map_t): New.
(vect_create_new_slp_node, vect_build_slp_tree): Use macro.
(vect_create_slp_patt_stmt): New.
(vect_match_slp_patterns_2): New.
(vect_match_slp_patterns): New.
(vect_analyze_slp_instance): Call vect_match_slp_patterns and undo
permutes.
(vect_detect_hybrid_slp_stmts): Dissolve relationships created for SLP.
* tree-vectorizer.h (SLP_TREE_REF_COUNT): New.
(vect_mark_pattern_stmts): New.

-- 
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index b0cbbac0cb5ba1ffce706715d3dbb9139063803d..1535b8b74e7d79818d8cece847dc31c7596451d5 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1808,6 +1808,59 @@ vect_get_datarefs_in_loop (loop_p loop, basic_block *bbs,
   return opt_result::success ();
 }
 
+/* For every SLP only pattern created by the pattern matched rooted in ROOT
+   restore the relevancy of the original statements over those of the pattern
+   and destroy the pattern relationship.  This restores the SLP tree to a state
+   where it can be used when SLP build is cancelled or re-tried.  */
+
+static void
+vect_dissolve_slp_only_patterns (slp_tree root)
+{
+  if (!root)
+return;
+
+  unsigned int i;
+  slp_tree node;
+  stmt_vec_info stmt_info;
+  stmt_vec_info related_stmt_info;
+
+  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (root), i, stmt_info)
+if (STMT_VINFO_SLP_VECT_ONLY (stmt_info)
+&& (related_stmt_info = STMT_VINFO_RELATED_STMT (stmt_info)) != NULL)
+  {
+	if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "dissolving relevancy of %G over %G",
+			   STMT_VINFO_STMT (stmt_info),
+			   STMT_VINFO_STMT (related_stmt_info));
+	STMT_VINFO_RELEVANT (stmt_info) = vect_unused_in_scope;
+	STMT_VINFO_RELEVANT (related_stmt_info) = vect_used_in_scope;
+	STMT_VINFO_IN_PATTERN_P (related_stmt_info) = false;
+  }
+
+  FOR_EACH_VEC_ELT 

Re: [PATCH] Fix algo constexpr tests in Debug mode

2019-10-01 Thread Jonathan Wakely

On 30/09/19 22:21 +0200, François Dumont wrote:

On 9/30/19 11:03 AM, Jonathan Wakely wrote:

These changes are fine but should have been a separate, unrelated
commit.


Ok, sorry, I considered that just a comment change was fine.


It's fine, but it is unrelated so should be a separate commit. That
makes it easier to backport the documentation fix independently of the
rest of the patch. Or if the patch had to be reverted for some reason,
we wouldn't also revert the doc fix if it was in a separate commit.

Unrelated changes should usually be separate commits.



@@ -157,10 +192,16 @@ namespace __gnu_debug
   *  otherwise.
  */
   template<typename _InputIterator>
+    _GLIBCXX20_CONSTEXPR
    inline bool
    __valid_range(_InputIterator __first, _InputIterator __last,
  typename _Distance_traits<_InputIterator>::__type& __dist)
    {
+#ifdef __cpp_lib_is_constant_evaluated
+  if (std::is_constant_evaluated())
+    // Detected by the compiler directly.
+    return true;
+#endif


Should this be using the built-in, not the C++20 function?


In practice it's probably equivalent, because the function is only
going to be constant-evaluated when called from C++20 code, and in
that case the std::is_constant_evaluated() function will be available.



Yes, this is why I did it this way. And moreover it is using std::pair 
move assignment operator which is also C++20 constexpr.




It just seems inconsistent to use the built-in in one place and not in
the other.


It is also the reason why the simple __valid_range is not using 
the other anymore.


Maybe once I have checked all the algo calls I'll find out that this 
one needs _GLIBCXX_CONSTEXPR.


I got the impression that the library is being decorated with 'constexpr' only 
when needed and when properly standardised, and so are the Debug helpers.


The standard says when something should be a constexpr function, we
don't get to decide. So if a function is not constexpr in C++17 and is
constexpr in C++20, we have to conform to that. For functions that are
not part of the standard and are our own implementation details, we
can choose when to make them constexpr. We could make all the debug
helpers use _GLIBCXX14_CONSTEXPR (or even _GLIBCXX_CONSTEXPR if they
meet the very restrictive C++11 constexpr requirements) but since they
are never going to be constant evaluated except in C++20, there
doesn't seem to be much point to doing that.



[PATCH] vectorizable_reduction TLC

2019-10-01 Thread Richard Biener


Happened to test this separately.

Committed.

Richard.

2019-10-01  Richard Biener  

* tree-vect-loop.c (vectorizable_reduction): Move variables
to where they are used.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 276401)
+++ gcc/tree-vect-loop.c(working copy)
@@ -5767,7 +5767,6 @@ vectorizable_reduction (stmt_vec_info st
slp_instance slp_node_instance,
stmt_vector_for_cost *cost_vec)
 {
-  tree vec_dest;
   tree scalar_dest;
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
   tree vectype_in = NULL_TREE;
@@ -5778,29 +5777,21 @@ vectorizable_reduction (stmt_vec_info st
   machine_mode vec_mode;
   int op_type;
   optab optab;
-  tree new_temp = NULL_TREE;
   enum vect_def_type dt, cond_reduc_dt = vect_unknown_def_type;
   stmt_vec_info cond_stmt_vinfo = NULL;
   tree scalar_type;
   bool is_simple_use;
   int i;
   int ncopies;
-  stmt_vec_info prev_stmt_info, prev_phi_info;
+  stmt_vec_info prev_phi_info;
   bool single_defuse_cycle = false;
-  stmt_vec_info new_stmt_info = NULL;
   int j;
   tree ops[3];
   enum vect_def_type dts[3];
   bool nested_cycle = false, found_nested_cycle_def = false;
   bool double_reduc = false;
-  basic_block def_bb;
-  class loop * def_stmt_loop;
-  tree def_arg;
-  auto_vec<tree> vec_oprnds0;
-  auto_vec<tree> vec_oprnds1;
-  auto_vec<tree> vec_oprnds2;
   int vec_num;
-  tree def0, tem;
+  tree tem;
   tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
   tree cond_reduc_val = NULL_TREE;
 
@@ -5900,7 +5891,7 @@ vectorizable_reduction (stmt_vec_info st
}
 
   /* Create the destination vector  */
-  vec_dest = vect_create_destination_var (phi_result, vectype_out);
+  tree vec_dest = vect_create_destination_var (phi_result, vectype_out);
 
   /* Get the loop-entry arguments.  */
   tree vec_initial_def;
@@ -6348,15 +6339,16 @@ vectorizable_reduction (stmt_vec_info st
 
   if (nested_cycle)
 {
-  def_bb = gimple_bb (reduc_def_phi);
-  def_stmt_loop = def_bb->loop_father;
-  def_arg = PHI_ARG_DEF_FROM_EDGE (reduc_def_phi,
-   loop_preheader_edge (def_stmt_loop));
+  basic_block def_bb = gimple_bb (reduc_def_phi);
+  class loop *def_stmt_loop = def_bb->loop_father;
+  tree def_arg = PHI_ARG_DEF_FROM_EDGE (reduc_def_phi,
+   loop_preheader_edge 
(def_stmt_loop));
   stmt_vec_info def_arg_stmt_info = loop_vinfo->lookup_def (def_arg);
   if (def_arg_stmt_info
  && (STMT_VINFO_DEF_TYPE (def_arg_stmt_info)
  == vect_double_reduction_def))
 double_reduc = true;
+  gcc_assert (!double_reduc || STMT_VINFO_RELEVANT (stmt_info) == 
vect_used_in_outer_by_reduction);
 }
 
   vect_reduction_type reduction_type
@@ -6670,6 +6662,8 @@ vectorizable_reduction (stmt_vec_info st
   if (code == DOT_PROD_EXPR
   && !types_compatible_p (TREE_TYPE (ops[0]), TREE_TYPE (ops[1])))
 {
+  gcc_unreachable ();
+  /* No testcase for this.  PR49478.  */
   if (TREE_CODE (ops[0]) == INTEGER_CST)
 ops[0] = fold_convert (TREE_TYPE (ops[1]), ops[0]);
   else if (TREE_CODE (ops[1]) == INTEGER_CST)
@@ -6812,7 +6806,15 @@ vectorizable_reduction (stmt_vec_info st
   return true;
 }
 
+
   /* Transform.  */
+  stmt_vec_info new_stmt_info = NULL;
+  stmt_vec_info prev_stmt_info;
+  tree new_temp = NULL_TREE;
+  auto_vec<tree> vec_oprnds0;
+  auto_vec<tree> vec_oprnds1;
+  auto_vec<tree> vec_oprnds2;
+  tree def0;
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
@@ -6836,7 +6838,7 @@ vectorizable_reduction (stmt_vec_info st
 }
 
   /* Create the destination vector  */
-  vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
+  tree vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
 
   prev_stmt_info = NULL;
   prev_phi_info = NULL;


[RFC][SLP] SLP vectorization: vectorize vector constructors

2019-10-01 Thread Joel Hutton
On 01/10/2019 12:07, Joel wrote:
>
> SLP vectorization: vectorize vector constructors
>
>
> Currently SLP vectorization can build SLP trees starting from 
> reductions or from group stores. This patch adds a third starting 
> point: vector constructors.
>
>
> For the following test case (compiled with -O3 -fno-vect-cost-model):
>
>
> char g_d[1024], g_s1[1024], g_s2[1024];
> void test_loop(void)
> {
> char *d = g_d, *s1 = g_s1, *s2 = g_s2;
>
>
> for ( int y = 0; y < 128; y++ )
> {
>for ( int x = 0; x < 16; x++ )
>  d[x] = s1[x] + s2[x];
>d += 16;
> }
>
> }
>
>
> before patch:
>
> test_loop:
> .LFB0:
>         .cfi_startproc
>         adrp x0, g_s1
>         adrp x2, g_s2
>         add x3, x0, :lo12:g_s1
>         add x4, x2, :lo12:g_s2
>         ldrb w7, [x2, #:lo12:g_s2]
>         ldrb w1, [x0, #:lo12:g_s1]
>         adrp x0, g_d
>         ldrb w6, [x4, 1]
>         add x0, x0, :lo12:g_d
>         ldrb w5, [x3, 1]
>         add w1, w1, w7
>         fmov s0, w1
>         ldrb w7, [x4, 2]
>         add w5, w5, w6
>         ldrb w1, [x3, 2]
>         ldrb w6, [x4, 3]
>         add x2, x0, 2048
>         ins v0.b[1], w5
>         add w1, w1, w7
>         ldrb w7, [x3, 3]
>         ldrb w5, [x4, 4]
>         add w7, w7, w6
>         ldrb w6, [x3, 4]
>         ins v0.b[2], w1
>         ldrb w8, [x4, 5]
>         add w6, w6, w5
>         ldrb w5, [x3, 5]
>         ldrb w9, [x4, 6]
>         add w5, w5, w8
>         ldrb w1, [x3, 6]
>         ins v0.b[3], w7
>         ldrb w8, [x4, 7]
>         add w1, w1, w9
>         ldrb w11, [x3, 7]
>         ldrb w7, [x4, 8]
>         add w11, w11, w8
>         ldrb w10, [x3, 8]
>         ins v0.b[4], w6
>         ldrb w8, [x4, 9]
>         add w10, w10, w7
>         ldrb w9, [x3, 9]
>         ldrb w7, [x4, 10]
>         add w9, w9, w8
>         ldrb w8, [x3, 10]
>         ins v0.b[5], w5
>         ldrb w6, [x4, 11]
>         add w8, w8, w7
>         ldrb w7, [x3, 11]
>         ldrb w5, [x4, 12]
>         add w7, w7, w6
>         ldrb w6, [x3, 12]
>         ins v0.b[6], w1
>         ldrb w12, [x4, 13]
>         add w6, w6, w5
>         ldrb w5, [x3, 13]
>         ldrb w1, [x3, 14]
>         add w5, w5, w12
>         ldrb w13, [x4, 14]
>         ins v0.b[7], w11
>         ldrb w12, [x4, 15]
>         add w4, w1, w13
>         ldrb w1, [x3, 15]
>         add w1, w1, w12
>         ins v0.b[8], w10
>         ins v0.b[9], w9
>         ins v0.b[10], w8
>         ins v0.b[11], w7
>         ins v0.b[12], w6
>         ins v0.b[13], w5
>         ins v0.b[14], w4
>         ins v0.b[15], w1
>         .p2align 3,,7
> .L2:
>         str q0, [x0], 16
>         cmp x2, x0
>         bne .L2
>         ret
>         .cfi_endproc
> .LFE0:
>
>
> After patch:
>
>
> test_loop:
> .LFB0:
>         .cfi_startproc
>         adrp x3, g_s1
>         adrp x2, g_s2
>         add x3, x3, :lo12:g_s1
>         add x2, x2, :lo12:g_s2
>         adrp x0, g_d
>         add x0, x0, :lo12:g_d
>         add x1, x0, 2048
>         ldr q1, [x2]
>         ldr q0, [x3]
>         add v0.16b, v0.16b, v1.16b
>         .p2align 3,,7
> .L2:
>         str q0, [x0], 16
>         cmp x0, x1
>         bne .L2
>         ret
>         .cfi_endproc
> .LFE0:
>
>
>
>
> bootstrapped and tested on aarch64-none-linux-gnu
>
Patch attached:
From 7b9e6d02017ffe6f7ab17cbdd48da41ccc5f6db0 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Fri, 27 Sep 2019 10:26:00 +0100
Subject: [PATCH] SLP vectorization: vectorize vector constructors

---
 gcc/tree-vect-slp.c   | 98 ---
 gcc/tree-vect-stmts.c | 20 +
 2 files changed, 103 insertions(+), 15 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 9b86b67734ad3e3506e9cee6a532b68decf24ae6..4c715ebe34dbdb8072e15dc9053f53a1949a070d 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1923,6 +1923,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
   vec<stmt_vec_info> scalar_stmts;
 
+  bool constructor = false;
+
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
 {
   scalar_type = TREE_TYPE (DR_REF (dr));
@@ -1935,6 +1937,17 @@ vect_analyze_slp_instance (vec_info *vinfo,
   vectype = STMT_VINFO_VECTYPE (stmt_info);
   group_size = REDUC_GROUP_SIZE (stmt_info);
 }
+  else if (is_gimple_assign (stmt_info->stmt)
+  && TREE_CODE (TREE_TYPE (gimple_assign_lhs (stmt_info->stmt)))
+	== VECTOR_TYPE
+  && gimple_assign_rhs_code (stmt_info->stmt) == CONSTRUCTOR)
+{
+  vectype = TREE_TYPE (gimple_assign_rhs1 (stmt_info->stmt));
+  group_size = CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt_info->stmt));
+  constructor = true;
+  if (TREE_CODE (vectype) != VECTOR_TYPE)
+	vectype = NULL;
+}
   else
 {
   gcc_assert (is_a <loop_vec_info> (vinfo));
@@ -1981,6 +1994,32 @@ vect_analyze_slp_instance (vec_info *vinfo,
   STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))
 	= STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ()));
 }
+  else if (constructor)
+{
+  tree rhs = gimple_assign_rhs1 (stmt_info->stmt);
+  tree val;
+  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
+	{
+	  tree prev_val;
+	  int j;
+	  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), j, prev_val)
+	{
+	  if (val == prev_val && i!=j)
+		return false;
+	}
+	  if (TREE_CODE (val) == SSA_NAME)
+	{
+	  gimple* def = SSA_NAME_DEF_STMT (val);
+	  stmt_vec_info def_info = vinfo->lookup_stmt (def);
+	  /* Value is defined in another basic block.  */
+	  if (!def_info)
+		return false;
+	  scalar_stmts.safe_push (def_info);
+	}
+	else
+	  return false;
+	}
+}
   else
 {
   /* Collect reduction statements.  */
@@ -2227,6 +2266,50 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
    max_tree_size);
 }
 
+  /* Find SLP sequences starting 

Re: [patch] Extend GIMPLE store merging to throwing stores

2019-10-01 Thread Richard Biener
On Tue, Oct 1, 2019 at 1:05 PM Eric Botcazou  wrote:
>
> [Thanks for the quick review and sorry for the longish delay]
>
> > +/* Return the index number of the landing pad for STMT, if any.  */
> > +
> > +static int
> > +lp_nr_for_store (gimple *stmt)
> > +{
> > +  if (!cfun->can_throw_non_call_exceptions || !cfun->eh)
> > +return 0;
> > +
> > +  if (!stmt_could_throw_p (cfun, stmt))
> > +return 0;
> > +
> > +  return lookup_stmt_eh_lp (stmt);
> > +}
> >
> > Did you add the wrapper as compile-time optimization?  That is,
> > I don't see why simply calling lookup_stmt_eh_lp wouldn't work?
>
> Yes, I added it for C & C++, which both trivially fail the first test.  More
> generally, every additional processing is (directly or indirectly) guarded by
> the conjunction cfun->can_throw_non_call_exceptions && cfun->eh throughout.
>
> > +  /* If the function can throw and catch non-call exceptions, we'll be
> > +     trying to merge stores across different basic blocks so we need to
> > +     first unsplit the EH edges in order to streamline the CFG of the
> > +     function.  */
> > +  if (cfun->can_throw_non_call_exceptions && cfun->eh)
> > +{
> > +  free_dominance_info (CDI_DOMINATORS);
> > +  maybe_remove_unreachable_handlers ();
> > +  changed = unsplit_all_eh ();
> > +  if (changed)
> > +   delete_unreachable_blocks ();
> > +}
> >
> > uh, can unsplitting really result in unreachable blocks or does it
> > merely forget to delete forwarders it made unreachable?
>
> The latter.
>
> > Removing unreachable handlers is also to make things match better?
>
> Nope, only because calculate_dominance_info aborts otherwise below.
>
> > Just wondering how much of this work we could delay to the first
> > store-merging opportunity with EH we find (but I don't care too much
> > about -fnon-call-exceptions).
>
> This block of code is a manual, stripped down ehcleanup pass.
>
> > To isolate the details above maybe move this piece into a helper
> > in tree-eh.c so you also can avoid exporting unsplit_all_eh?
>
> The main point is the unsplitting though so this would trade an explicit call
> for a less implicit one.  But what I could do is to rename unsplit_all_eh into
> unsplit_all_eh_1 and hide the technicalities in a new unsplit_all_eh.

that works for me - the patch is OK with that change.

Thanks,
Richard.

> --
> Eric Botcazou


Re: Store float for pow result test

2019-10-01 Thread Richard Biener
On Tue, Oct 1, 2019 at 10:56 AM Alexandre Oliva  wrote:
>
> Optimizing gcc.dg/torture/pr41094.c, the compiler computes the
> constant value and short-circuits the whole thing.  At -O0, however,
> on 32-bit x86, the call to pow() remains, and the program compares the
> returned value in a stack register, with excess precision, with the
> exact return value expected from pow().  If libm's pow() returns a
> slightly off result, the compare fails.  If the value in the register
> is stored in a separate variable, so it gets rounded to double
> precision, and then compared, the compare passes.
>
> It's not clear that the test was meant to detect libm's reliance on
> rounding off the excess precision, but I guess it wasn't, so I propose
> this slight change that enables it to pass regardless of the slight
> inaccuracy of the C library in use.
>
> Regstrapped on x86_64-linux-gnu, and tested on the affected target.
> Ok to install?

OK.

Richard.

>
> for  gcc/testsuite/ChangeLog
>
> * gcc.dg/torture/pr41094.c: Introduce intermediate variable.
> ---
>  gcc/testsuite/gcc.dg/torture/pr41094.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr41094.c 
> b/gcc/testsuite/gcc.dg/torture/pr41094.c
> index 2a4e9616cbfad..9219a1741a37f 100644
> --- a/gcc/testsuite/gcc.dg/torture/pr41094.c
> +++ b/gcc/testsuite/gcc.dg/torture/pr41094.c
> @@ -13,7 +13,8 @@ double foo(void)
>
>  int main()
>  {
> -  if (foo() != 2.0)
> +  double r = foo ();
> +  if (r != 2.0)
>  abort ();
>return 0;
>  }
>
> --
> Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
> Be the change, be Free!FSF VP & FSF Latin America board member
> GNU Toolchain EngineerFree Software Evangelist
> Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara
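
A minimal sketch of the pattern the patch applies, for readers who want to try it outside the testsuite. The function names below are made up for this note and do not come from pr41094.c. Assigning the result to a double before the compare is what discards any excess precision held in an x87 stack register at -O0. Note that with literal arguments, as here, the compiler may fold the pow() call away entirely, so this shows the shape of the fix rather than reproducing the failure.

```c
#include <math.h>

/* Hypothetical stand-in for the test's foo(): a pow() call whose exact
   mathematical result (2.0) is representable as a double.  */
double pow_result (void)
{
  return pow (4.0, 0.5);
}

/* Store the result in a double first, then compare.  Returns 1 when
   the stored value equals the expected 2.0.  */
int pow_matches_expected (void)
{
  double r = pow_result ();
  return r == 2.0;
}
```

Comparing pow_result () directly against 2.0 is the form the original test used; the intermediate variable is the only change.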


[RFC][SLP] SLP vectorization: vectorize vector constructors

2019-10-01 Thread Joel Hutton
Hi all,


Currently SLP vectorization can build SLP trees starting from reductions or 
from group stores. This patch adds a third starting point: vector constructors.


For the following aarch64 test case (compiled with -O3 -fno-vect-cost-model):


char g_d[1024], g_s1[1024], g_s2[1024];
void test_loop(void)
{
char *d = g_d, *s1 = g_s1, *s2 = g_s2;




for ( int y = 0; y < 128; y++ )
{
  for ( int x = 0; x < 16; x++ )
d[x] = s1[x] + s2[x];
  d += 16;
}


}

before patch:

test_loop:
.LFB0:
        .cfi_startproc
        adrp x0, g_s1
        adrp x2, g_s2
        add x3, x0, :lo12:g_s1
        add x4, x2, :lo12:g_s2
        ldrb w7, [x2, #:lo12:g_s2]
        ldrb w1, [x0, #:lo12:g_s1]
        adrp x0, g_d
        ldrb w6, [x4, 1]
        add x0, x0, :lo12:g_d
        ldrb w5, [x3, 1]
        add w1, w1, w7
        fmov s0, w1
        ldrb w7, [x4, 2]
        add w5, w5, w6
        ldrb w1, [x3, 2]
        ldrb w6, [x4, 3]
        add x2, x0, 2048
        ins v0.b[1], w5
        add w1, w1, w7
        ldrb w7, [x3, 3]
        ldrb w5, [x4, 4]
        add w7, w7, w6
        ldrb w6, [x3, 4]
        ins v0.b[2], w1
        ldrb w8, [x4, 5]
        add w6, w6, w5
        ldrb w5, [x3, 5]
        ldrb w9, [x4, 6]
        add w5, w5, w8
        ldrb w1, [x3, 6]
        ins v0.b[3], w7
        ldrb w8, [x4, 7]
        add w1, w1, w9
        ldrb w11, [x3, 7]
        ldrb w7, [x4, 8]
        add w11, w11, w8
        ldrb w10, [x3, 8]
        ins v0.b[4], w6
        ldrb w8, [x4, 9]
        add w10, w10, w7
        ldrb w9, [x3, 9]
        ldrb w7, [x4, 10]
        add w9, w9, w8
        ldrb w8, [x3, 10]
        ins v0.b[5], w5
        ldrb w6, [x4, 11]
        add w8, w8, w7
        ldrb w7, [x3, 11]
        ldrb w5, [x4, 12]
        add w7, w7, w6
        ldrb w6, [x3, 12]
        ins v0.b[6], w1
        ldrb w12, [x4, 13]
        add w6, w6, w5
        ldrb w5, [x3, 13]
        ldrb w1, [x3, 14]
        add w5, w5, w12
        ldrb w13, [x4, 14]
        ins v0.b[7], w11
        ldrb w12, [x4, 15]
        add w4, w1, w13
        ldrb w1, [x3, 15]
        add w1, w1, w12
        ins v0.b[8], w10
        ins v0.b[9], w9
        ins v0.b[10], w8
        ins v0.b[11], w7
        ins v0.b[12], w6
        ins v0.b[13], w5
        ins v0.b[14], w4
        ins v0.b[15], w1
        .p2align 3,,7
.L2:
        str q0, [x0], 16
        cmp x2, x0
        bne .L2
        ret
        .cfi_endproc
.LFE0:

After patch:

test_loop:
.LFB0:
        .cfi_startproc
        adrp x3, g_s1
        adrp x2, g_s2
        add x3, x3, :lo12:g_s1
        add x2, x2, :lo12:g_s2
        adrp x0, g_d
        add x0, x0, :lo12:g_d
        add x1, x0, 2048
        ldr q1, [x2]
        ldr q0, [x3]
        add v0.16b, v0.16b, v1.16b
        .p2align 3,,7
.L2:
        str q0, [x0], 16
        cmp x0, x1
        bne .L2
        ret
        .cfi_endproc
.LFE0:

bootstrapped and tested on aarch64-none-linux-gnu
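
For readers unfamiliar with the term, here is a source-level sketch of what a "vector constructor" looks like, written with the GNU C vector extension. It is only an illustration: the type and function names are invented for this note, and it is an assumption that GCC represents this initializer the same way as the constructor the vectorizer sees in the test case above after unrolling. Each lane is built from a separate scalar sum, which is the shape the patch uses as an SLP starting point.

```c
#include <string.h>

/* GNU C vector extension: 16 lanes of unsigned char (hypothetical name).  */
typedef unsigned char v16u8 __attribute__ ((vector_size (16)));

#define LANE(i) (unsigned char) (a[i] + b[i])

/* Build one vector from 16 independent scalar additions.  */
v16u8 add_lanes (const unsigned char *a, const unsigned char *b)
{
  v16u8 v = { LANE (0),  LANE (1),  LANE (2),  LANE (3),
              LANE (4),  LANE (5),  LANE (6),  LANE (7),
              LANE (8),  LANE (9),  LANE (10), LANE (11),
              LANE (12), LANE (13), LANE (14), LANE (15) };
  return v;
}

/* Check helper: returns 1 if add_lanes agrees with a plain scalar loop.  */
int add_lanes_matches_scalar (void)
{
  unsigned char a[16], b[16], out[16];
  for (int i = 0; i < 16; i++)
    {
      a[i] = (unsigned char) i;
      b[i] = (unsigned char) (3 * i);
    }
  v16u8 v = add_lanes (a, b);
  memcpy (out, &v, sizeof out);
  for (int i = 0; i < 16; i++)
    if (out[i] != (unsigned char) (a[i] + b[i]))
      return 0;
  return 1;
}
```

Ideally the 16 scalar adds collapse into one vector add, as in the "After patch" assembly; whether that happens for this particular function depends on the compiler version and options.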


Re: [PATCH] DWARF array bounds missing from C++ array definitions

2019-10-01 Thread Richard Biener
On Tue, Oct 1, 2019 at 10:51 AM Alexandre Oliva  wrote:
>
> On Sep 26, 2019, Richard Biener  wrote:
>
> > On Thu, Sep 26, 2019 at 4:05 AM Alexandre Oliva  wrote:
>
> > Heh, I don't have one - which usually makes me simply inline the
> > beast into the single caller :P
>
> > Maybe simply have_new_type_for_decl_with_old_die_p?
> > Or new_type_for_die_p?
>
> How about override_type_for_decl_p?

Also good.

OK for trunk (& branches I guess, it's a regression).

Thanks,
Richard.

>
> for  gcc/ChangeLog
>
> * dwarf2out.c (override_type_for_decl_p): New.
> (gen_variable_die): Use it.
>
> for  gcc/testsuite/ChangeLog
>
> * gcc.dg/debug/dwarf2/array-0.c: New.
> * gcc.dg/debug/dwarf2/array-1.c: New.
> * gcc.dg/debug/dwarf2/array-2.c: New.
> * gcc.dg/debug/dwarf2/array-3.c: New.
> * g++.dg/debug/dwarf2/array-0.C: New.
> * g++.dg/debug/dwarf2/array-1.C: New.
> * g++.dg/debug/dwarf2/array-2.C: New.  Based on libstdc++-v3's
> src/c++98/pool_allocator.cc:__pool_alloc_base::_S_heap_size.
> * g++.dg/debug/dwarf2/array-3.C: New.  Based on
> gcc's config/i386/i386-features.c:xlogue_layout::s_instances.
> * g++.dg/debug/dwarf2/array-4.C: New.
> ---
>  gcc/dwarf2out.c |   32 
> ++-
>  gcc/testsuite/g++.dg/debug/dwarf2/array-0.C |   13 +++
>  gcc/testsuite/g++.dg/debug/dwarf2/array-1.C |   13 +++
>  gcc/testsuite/g++.dg/debug/dwarf2/array-2.C |   15 +
>  gcc/testsuite/g++.dg/debug/dwarf2/array-3.C |   20 +
>  gcc/testsuite/g++.dg/debug/dwarf2/array-4.C |   16 ++
>  gcc/testsuite/gcc.dg/debug/dwarf2/array-0.c |   10 
>  gcc/testsuite/gcc.dg/debug/dwarf2/array-1.c |   10 
>  gcc/testsuite/gcc.dg/debug/dwarf2/array-2.c |8 +++
>  gcc/testsuite/gcc.dg/debug/dwarf2/array-3.c |8 +++
>  10 files changed, 144 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-0.C
>  create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-1.C
>  create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-2.C
>  create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-3.C
>  create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-4.C
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-0.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-3.c
>
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index cec25fa5fa2b8..a29a200f19814 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -23706,6 +23706,34 @@ local_function_static (tree decl)
>  && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL;
>  }
>
> +/* Return true iff DECL overrides (presumably completes) the type of
> +   OLD_DIE within CONTEXT_DIE.  */
> +
> +static bool
> +override_type_for_decl_p (tree decl, dw_die_ref old_die,
> + dw_die_ref context_die)
> +{
> +  tree type = TREE_TYPE (decl);
> +  int cv_quals;
> +
> +  if (decl_by_reference_p (decl))
> +{
> +  type = TREE_TYPE (type);
> +  cv_quals = TYPE_UNQUALIFIED;
> +}
> +  else
> +cv_quals = decl_quals (decl);
> +
> +  dw_die_ref type_die = modified_type_die (type,
> +  cv_quals | TYPE_QUALS (type),
> +  false,
> +  context_die);
> +
> +  dw_die_ref old_type_die = get_AT_ref (old_die, DW_AT_type);
> +
> +  return type_die != old_type_die;
> +}
> +
>  /* Generate a DIE to represent a declared data object.
> Either DECL or ORIGIN must be non-null.  */
>
> @@ -23958,7 +23986,9 @@ gen_variable_die (tree decl, tree origin, dw_die_ref 
> context_die)
>   && !DECL_ABSTRACT_P (decl_or_origin)
>   && variably_modified_type_p (TREE_TYPE (decl_or_origin),
>decl_function_context
> -   (decl_or_origin
> +  (decl_or_origin)))
> +  || (old_die && specialization_p
> + && override_type_for_decl_p (decl_or_origin, old_die, context_die)))
>  {
>tree type = TREE_TYPE (decl_or_origin);
>
> diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/array-0.C 
> b/gcc/testsuite/g++.dg/debug/dwarf2/array-0.C
> new file mode 100644
> index 0..a3458bd0d32a4
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/debug/dwarf2/array-0.C
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-gdwarf-2 -dA" } */
> +struct S
> +{
> +  static int array[42];
> +};
> +
> +int S::array[42];
> +
> +/* Verify that we get only one DW_TAG_subrange_type with a
> +   DW_AT_upper_bound.  */
> +/* { dg-final { scan-assembler-times " DW_TAG_subrange_type" 2 } } */
> +/* { dg-final { scan-assembler-times " 
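
The contents of the C tests are not shown in this excerpt, so the following is only a guess at the kind of declaration pair involved: a variable first declared with an incomplete array type and later defined with a bound, so that the definition "overrides (presumably completes)" the earlier type. The names are invented for this note.

```c
#include <stddef.h>

/* First declaration: incomplete array type, no bound known yet.  */
extern int array[];

/* Definition: completes the type with a bound of 42.  */
int array[42];

/* The element count is only computable from the completed type.  */
size_t array_length (void)
{
  return sizeof array / sizeof array[0];
}
```

In debug-info terms, the DIE for the definition would need its own array type carrying the upper bound, which is the situation override_type_for_decl_p is described as detecting.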

Re: [patch] Extend GIMPLE store merging to throwing stores

2019-10-01 Thread Eric Botcazou
[Thanks for the quick review and sorry for the longish delay]

> +/* Return the index number of the landing pad for STMT, if any.  */
> +
> +static int
> +lp_nr_for_store (gimple *stmt)
> +{
> +  if (!cfun->can_throw_non_call_exceptions || !cfun->eh)
> +return 0;
> +
> +  if (!stmt_could_throw_p (cfun, stmt))
> +return 0;
> +
> +  return lookup_stmt_eh_lp (stmt);
> +}
> 
> Did you add the wrapper as compile-time optimization?  That is,
> I don't see why simply calling lookup_stmt_eh_lp wouldn't work?

Yes, I added it for C & C++, which both trivially fail the first test.  More 
generally, every additional processing is (directly or indirectly) guarded by 
the conjunction cfun->can_throw_non_call_exceptions && cfun->eh throughout.

> +  /* If the function can throw and catch non-call exceptions, we'll be
> +     trying to merge stores across different basic blocks so we need to
> +     first unsplit the EH edges in order to streamline the CFG of the
> +     function.  */
> +  if (cfun->can_throw_non_call_exceptions && cfun->eh)
> +{
> +  free_dominance_info (CDI_DOMINATORS);
> +  maybe_remove_unreachable_handlers ();
> +  changed = unsplit_all_eh ();
> +  if (changed)
> +   delete_unreachable_blocks ();
> +}
> 
> uh, can unsplitting really result in unreachable blocks or does it
> merely forget to delete forwarders it made unreachable?

The latter.

> Removing unreachable handlers is also to make things match better?

Nope, only because calculate_dominance_info aborts otherwise below.

> Just wondering how much of this work we could delay to the first
> store-merging opportunity with EH we find (but I don't care too much
> about -fnon-call-exceptions).

This block of code is a manual, stripped down ehcleanup pass.

> To isolate the details above maybe move this piece into a helper
> in tree-eh.c so you also can avoid exporting unsplit_all_eh?

The main point is the unsplitting though so this would trade an explicit call 
for a less implicit one.  But what I could do is to rename unsplit_all_eh into 
unsplit_all_eh_1 and hide the technicalities in a new unsplit_all_eh.

-- 
Eric Botcazou
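
As background for the discussion, a small sketch of the kind of code store merging targets: several adjacent narrow stores that can be combined into one wider store. The struct and function names are invented for this note. With -fnon-call-exceptions each store through the pointer is potentially throwing, which is the case the patch extends the pass to handle; whether this exact function gets merged depends on target and options.

```c
#include <stdint.h>

/* Four adjacent one-byte fields (hypothetical layout).  */
struct header
{
  uint8_t a, b, c, d;
};

/* Four byte-sized stores to consecutive addresses; a store-merging
   pass may combine them into a single 32-bit store.  */
void init_header (struct header *h)
{
  h->a = 1;
  h->b = 2;
  h->c = 3;
  h->d = 4;
}

/* Check helper: returns 1 if all four fields hold the expected values.  */
int header_ok (void)
{
  struct header h = { 0, 0, 0, 0 };
  init_header (&h);
  return h.a == 1 && h.b == 2 && h.c == 3 && h.d == 4;
}
```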


Re: build-failure for cris-elf with "[00/32] Support multiple ABIs in the same translation unit"

2019-10-01 Thread Hans-Peter Nilsson
> From: Richard Sandiford 
> Date: Tue, 1 Oct 2019 09:51:51 +0200

> Hans-Peter Nilsson  writes:
> > My autotester for cris-elf complains about a build-breaking
> > commit in the revision range (working:breaking) 276299:276359

> Fixed as below.  I also belatedly see a new definition of
> HARD_REGNO_CALLER_SAVE_MODE was added since I posted the patches,
> so the patch fixes that too.  Tested by cross-building cris-elf
> and sparc-linux-gnu, applied as obvious.

Thanks, but...  The build still fails, now when building
gfortran.  I've entered PR91948 (JFTR; CC:ed).

brgds, H-P


Re: [04/32] [x86] Robustify vzeroupper handling across calls

2019-10-01 Thread Uros Bizjak
On Wed, Sep 25, 2019 at 5:48 PM Richard Sandiford
 wrote:

> > The comment suggests that this code is only needed for Win64 and that
> > not testing for Win64 is just a simplification.  But in practice it was
> > needed for correctness on GNU/Linux and other targets too, since without
> > it the RA would be able to keep 256-bit and 512-bit values in SSE
> > registers across calls that are known not to clobber them.
> >
> > This patch conservatively treats calls as AVX_U128_ANY if the RA can see
> > that some SSE registers are not touched by a call.  There are then no
> > regressions if the ix86_hard_regno_call_part_clobbered check is disabled
> > for GNU/Linux (not something we should do, was just for testing).

If RA can see that some SSE regs are not touched by the call, then we
are sure that the called function is part of the current TU. In this
case, the called function will be compiled using VEX instructions,
where there is no AVX-SSE transition penalty. So, skipping VZEROUPPER
is beneficial here.

Uros.

> > If in fact we want -fipa-ra to pretend that all functions clobber
> > SSE registers above 128 bits, it'd certainly be possible to arrange
> > that.  But IMO that would be an optimisation decision, whereas what
> > the patch is fixing is a correctness decision.  So I think we should
> > have this check even so.
>
> 2019-09-25  Richard Sandiford  
>
> gcc/
> * config/i386/i386.c: Include function-abi.h.
> (ix86_avx_u128_mode_needed): Treat function calls as AVX_U128_ANY
> if they preserve some 256-bit or 512-bit SSE registers.
>
> Index: gcc/config/i386/i386.c
> ===
> --- gcc/config/i386/i386.c  2019-09-25 16:47:48.0 +0100
> +++ gcc/config/i386/i386.c  2019-09-25 16:47:49.089962608 +0100
> @@ -95,6 +95,7 @@ #define IN_TARGET_CODE 1
>  #include "i386-builtins.h"
>  #include "i386-expand.h"
>  #include "i386-features.h"
> +#include "function-abi.h"
>
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -13511,6 +13512,15 @@ ix86_avx_u128_mode_needed (rtx_insn *ins
> }
> }
>
> +  /* If the function is known to preserve some SSE registers,
> +RA and previous passes can legitimately rely on that for
> +modes wider than 256 bits.  It's only safe to issue a
> +vzeroupper if all SSE registers are clobbered.  */
> +  const function_abi &abi = insn_callee_abi (insn);
> +  if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS],
> + abi.mode_clobbers (V4DImode)))
> +   return AVX_U128_ANY;
> +
>return AVX_U128_CLEAN;
>  }
>
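
The check added by the patch boils down to a subset test on register sets. Below is a toy model of that logic using a plain bit mask; it does not use GCC's HARD_REG_SET or function_abi types, and every name in it is made up for illustration.

```c
#include <stdint.h>

/* Toy counterpart of the AVX_U128_* states discussed above.  */
enum u128_mode { U128_CLEAN, U128_ANY };

/* Toy register set: bit N stands for SSE register N.  */
typedef uint32_t regset;

/* Returns 1 if every register in A is also in B.  */
int regset_subset_p (regset a, regset b)
{
  return (a & ~b) == 0;
}

/* A call leaves the upper halves "clean" only if the callee clobbers
   all SSE registers; if it preserves any, be conservative.  */
enum u128_mode mode_after_call (regset all_sse, regset clobbered_by_callee)
{
  if (!regset_subset_p (all_sse, clobbered_by_callee))
    return U128_ANY;
  return U128_CLEAN;
}
```

For example, with 16 SSE registers (mask 0xffff), a callee that clobbers only the low eight (0x00ff) yields U128_ANY, so no vzeroupper would be assumed safe after it.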


Re: [PATCH] IPA-CP release transformation summary (PR jit/91928)

2019-10-01 Thread Andrea Corallo

Martin Jambor writes:

> Hi,
>
> On Mon, Sep 30 2019, Andrea Corallo wrote:
>> Hi all,
>> I'd like to submit this patch.
>> It releases the IPA-CP transformation summary after functions have been expanded.
>> This is to fix the compiler when used with libgccjit on subsequent
>> compilations (every new compilation should have a clean transformation
>> summary).
>
> if this is a general problem then I think we should instead add another
> hook to class ipa_opt_pass_d to free transformation summary, call it for
> all IPA passes at the appropriate time and implement it for IPA-CP. That
> way it will work for all IPA passes which might have a transformation
> summary.
>
> Martin
>
>
>>
>> Bootstrap on arm64 and X86-64.
>>
>> Bests
>>   Andrea
>>
>> gcc/ChangeLog
>> 2019-??-??  Andrea Corallo  
>>
>>  * cgraphunit.c (expand_all_functions): Release ipcp_transformation_sum
>>  when finished.
>>  * ipa-prop.c (ipcp_free_transformation_sum): New function.
>>  * ipa-prop.h (ipcp_free_transformation_sum): Add declaration.

Hi,
while looking around to implement the suggestion, I realized that code
for exactly this purpose is already in place: toplev::finalize calls
ipa_cp_c_finalize.

I've updated the patch accordingly.

Bootstrapped on aarch64.

Is it okay for trunk?

Bests
  Andrea

gcc/ChangeLog
2019-??-??  Andrea Corallo  

* ipa-cp.c (ipa_cp_c_finalize): Release ipcp_transformation_sum.
* ipa-prop.c (ipcp_free_transformation_sum): New function.
* ipa-prop.h (ipcp_free_transformation_sum): Add declaration.
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index b4fb74e..2b40220 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -5305,4 +5305,5 @@ ipa_cp_c_finalize (void)
   max_count = profile_count::uninitialized ();
   overall_size = 0;
   max_new_size = 0;
+  ipcp_free_transformation_sum ();
 }
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 30948fb..0ff8085 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -561,6 +561,7 @@ struct GTY(()) ipcp_transformation
 void ipa_set_node_agg_value_chain (struct cgraph_node *node,
    struct ipa_agg_replacement_value *aggvals);
 void ipcp_transformation_initialize (void);
+void ipcp_free_transformation_sum (void);
 
 /* ipa_edge_args stores information related to a callsite and particularly its
arguments.  It can be accessed by the IPA_EDGE_REF macro.  */
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 2f2b070..158dbe7 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -3758,6 +3758,18 @@ ipcp_transformation_initialize (void)
 ipcp_transformation_sum = ipcp_transformation_t::create_ggc (symtab);
 }
 
+/* Release the IPA CP transformation summary.  */
+
+void
+ipcp_free_transformation_sum (void)
+{
+  if (!ipcp_transformation_sum)
+return;
+
+  ipcp_transformation_sum->release ();
+  ipcp_transformation_sum = NULL;
+}
+
 /* Set the aggregate replacements of NODE to be AGGVALS.  */
 
 void
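
The release function follows a common pattern for in-process recompilation (as with libgccjit): free the global state and reset the pointer so the next run starts clean, and make the call harmless if nothing was allocated. A stand-alone sketch of that pattern, using malloc-style allocation and invented names rather than GCC's summary classes:

```c
#include <stdlib.h>

/* Hypothetical per-compilation state.  */
struct summary
{
  int n_entries;
};

static struct summary *transformation_sum;

/* Allocate the summary on first use.  */
void summary_initialize (void)
{
  if (!transformation_sum)
    transformation_sum = calloc (1, sizeof *transformation_sum);
}

/* Release the summary; safe to call when none exists.  */
void summary_free (void)
{
  if (!transformation_sum)
    return;

  free (transformation_sum);
  transformation_sum = NULL;
}

/* Returns 1 while a summary is allocated.  */
int summary_live_p (void)
{
  return transformation_sum != NULL;
}
```

Calling summary_free () twice in a row is a no-op the second time, which is what makes it suitable for a finalize hook that may run regardless of whether the pass did any work.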


Add myself to MAINTAINERS files

2019-10-01 Thread Harwath, Frederik
2019-10-01  Frederik Harwath 

* MAINTAINERS: Add myself to Write After Approval

Index: ChangeLog
===
--- ChangeLog   (revision 276390)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2019-10-01  Frederik Harwath 
+
+   * MAINTAINERS: Add myself to Write After Approval
+
 2019-09-26  Richard Sandiford  

* MAINTAINERS: Add myself as an aarch64 maintainer.
Index: MAINTAINERS
===
--- MAINTAINERS (revision 276390)
+++ MAINTAINERS (working copy)
@@ -409,6 +409,7 @@
 Wei Guozhi 
 Mostafa Hagog  
 Andrew Haley   
+Frederik Harwath   
 Stuart Hastings
 Michael Haubenwallner  

 Pat Haugen 






Re: [x86] recompute opt flags after opt level change

2019-10-01 Thread Uros Bizjak
On Tue, Oct 1, 2019 at 10:59 AM Alexandre Oliva  wrote:
>
> flag_omit_frame_pointer is set in machine-independent code depending
> on the optimization level.  It is then overridden in x86
> target-specific code depending on a macro defined by
> --enable-frame-pointer.
>
> Uses of attribute optimize go through machine-independent overriding
> of flag_omit_frame_pointer, but the x86-specific overriding code did
> NOT cover this flag, so, even if the attribute does not change the
> optimization level, flag_omit_frame_pointer may end up with a
> different value, and prevent inlining because of incompatible flags,
> as detected by the gcc.dg/ipa/iinline-attr.c test on an
> --enable-frame-pointer x86 toolchain.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?
>
>
> for  gcc/ChangeLog
>
> * config/i386/i386-options.c
> (ix86_recompute_optlev_based_flags): New, moved out of...
> (ix86_option_override_internal): ... this.  Call it.
> (ix86_override_options_after_change): Call it here too.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-options.c |   89 
> ++--
>  1 file changed, 50 insertions(+), 39 deletions(-)
>
> diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> index c148aa20511db..ed286bffaaa3b 100644
> --- a/gcc/config/i386/i386-options.c
> +++ b/gcc/config/i386/i386-options.c
> @@ -1527,12 +1527,61 @@ ix86_default_align (struct gcc_options *opts)
>  opts->x_str_align_functions = 
> processor_cost_table[ix86_tune]->align_func;
>  }
>
> +#ifndef USE_IX86_FRAME_POINTER
> +#define USE_IX86_FRAME_POINTER 0
> +#endif
> +
> +/* (Re)compute option overrides affected by optimization levels in
> +   target-specific ways.  */
> +
> +static void
> +ix86_recompute_optlev_based_flags (struct gcc_options *opts,
> +  struct gcc_options *opts_set)
> +{
> +  /* Set the default values for switches whose default depends on 
> TARGET_64BIT
> + in case they weren't overwritten by command line options.  */
> +  if (TARGET_64BIT_P (opts->x_ix86_isa_flags))
> +{
> +  if (opts->x_optimize >= 1 && !opts_set->x_flag_omit_frame_pointer)
> +   opts->x_flag_omit_frame_pointer = !USE_IX86_FRAME_POINTER;
> +  if (opts->x_flag_asynchronous_unwind_tables
> + && !opts_set->x_flag_unwind_tables
> + && TARGET_64BIT_MS_ABI)
> +   opts->x_flag_unwind_tables = 1;
> +  if (opts->x_flag_asynchronous_unwind_tables == 2)
> +   opts->x_flag_unwind_tables
> + = opts->x_flag_asynchronous_unwind_tables = 1;
> +  if (opts->x_flag_pcc_struct_return == 2)
> +   opts->x_flag_pcc_struct_return = 0;
> +}
> +  else
> +{
> +  if (opts->x_optimize >= 1 && !opts_set->x_flag_omit_frame_pointer)
> +   opts->x_flag_omit_frame_pointer
> + = !(USE_IX86_FRAME_POINTER || opts->x_optimize_size);
> +  if (opts->x_flag_asynchronous_unwind_tables == 2)
> +   opts->x_flag_asynchronous_unwind_tables = !USE_IX86_FRAME_POINTER;
> +  if (opts->x_flag_pcc_struct_return == 2)
> +   {
> + /* Intel MCU psABI specifies that -freg-struct-return should
> +be on.  Instead of setting DEFAULT_PCC_STRUCT_RETURN to 1,
> +we check -miamcu so that -freg-struct-return is always
> +turned on if -miamcu is used.  */
> + if (TARGET_IAMCU_P (opts->x_target_flags))
> +   opts->x_flag_pcc_struct_return = 0;
> + else
> +   opts->x_flag_pcc_struct_return = DEFAULT_PCC_STRUCT_RETURN;
> +   }
> +}
> +}
> +
>  /* Implement TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook.  */
>
>  void
>  ix86_override_options_after_change (void)
>  {
> >ix86_default_align (&global_options);
> > +  ix86_recompute_optlev_based_flags (&global_options, &global_options_set);
>  }
>
>  /* Clear stack slot assignments remembered from previous functions.
> @@ -2220,45 +2269,7 @@ ix86_option_override_internal (bool main_args_p,
>
>set_ix86_tune_features (ix86_tune, opts->x_ix86_dump_tunes);
>
> -#ifndef USE_IX86_FRAME_POINTER
> -#define USE_IX86_FRAME_POINTER 0
> -#endif
> -
> -  /* Set the default values for switches whose default depends on 
> TARGET_64BIT
> - in case they weren't overwritten by command line options.  */
> -  if (TARGET_64BIT_P (opts->x_ix86_isa_flags))
> -{
> -  if (opts->x_optimize >= 1 && !opts_set->x_flag_omit_frame_pointer)
> -   opts->x_flag_omit_frame_pointer = !USE_IX86_FRAME_POINTER;
> -  if (opts->x_flag_asynchronous_unwind_tables
> - && !opts_set->x_flag_unwind_tables
> - && TARGET_64BIT_MS_ABI)
> -   opts->x_flag_unwind_tables = 1;
> -  if (opts->x_flag_asynchronous_unwind_tables == 2)
> -   opts->x_flag_unwind_tables
> - = opts->x_flag_asynchronous_unwind_tables = 1;
> -  if (opts->x_flag_pcc_struct_return == 2)
> -   opts->x_flag_pcc_struct_return = 0;
> -}
> -  else
> -{
> -  if (opts->x_optimize >= 1 && 
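
The refactoring in this patch can be modelled in a few lines: the target-specific recomputation is factored into one function and called both at initial option processing and after a later option change. The toy below uses invented names and a single flag; it assumes a toolchain configured like --enable-frame-pointer, and it is not a copy of the i386 code.

```c
/* Toy option state; all names are hypothetical, not GCC's.  */
struct toy_opts
{
  int optimize;             /* optimization level */
  int omit_frame_pointer;   /* effective flag value */
  int omit_fp_set_by_user;  /* nonzero if given explicitly */
};

/* Models a toolchain built with --enable-frame-pointer.  */
#define TOY_USE_FRAME_POINTER 1

/* Machine-independent default: omit the frame pointer when optimizing.  */
void toy_generic_defaults (struct toy_opts *o)
{
  if (!o->omit_fp_set_by_user)
    o->omit_frame_pointer = o->optimize >= 1;
}

/* Target-specific recomputation shared by both entry points.  */
static void toy_recompute_optlev_based_flags (struct toy_opts *o)
{
  if (o->optimize >= 1 && !o->omit_fp_set_by_user)
    o->omit_frame_pointer = !TOY_USE_FRAME_POINTER;
}

void toy_option_override (struct toy_opts *o)
{
  toy_recompute_optlev_based_flags (o);
}

void toy_override_after_change (struct toy_opts *o)
{
  toy_recompute_optlev_based_flags (o);
}

/* Returns 1 if a function whose options were reprocessed (as with
   attribute optimize) ends up with the same flag as its caller.  */
int toy_flags_agree_after_change (void)
{
  struct toy_opts caller = { 2, 0, 0 };
  toy_generic_defaults (&caller);
  toy_option_override (&caller);

  struct toy_opts callee = caller;
  toy_generic_defaults (&callee);
  toy_override_after_change (&callee);

  return caller.omit_frame_pointer == callee.omit_frame_pointer;
}
```

If toy_override_after_change skipped the recomputation, the callee would keep the generic default and the two flags would differ, which is the kind of mismatch that blocks inlining in the scenario described above.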

Re: [PATCH] regrename: Use PC instead of CC0 to hide operands

2019-10-01 Thread Richard Sandiford
Segher Boessenkool  writes:
> The regrename pass temporarily changes some operand RTL to CC0 so that
> note_stores and scan_rtx don't see those operands.  CC0 is deprecated
> and we want to remove it, so we need to use something else here.
> PC fits the bill fine.
>
> Tested on powerpc64-linux {-m32,-m64}.  Is this okay for trunk?
>
>
> Segher
>
>
> 2019-10-01  Segher Boessenkool  
>
>   * regrename.c (hide_operands): Use pc_rtx instead of cc0_rtx.
>
> ---
>  gcc/regrename.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/regrename.c b/gcc/regrename.c
> index 8c3bae8..ed1dcde 100644
> --- a/gcc/regrename.c
> +++ b/gcc/regrename.c
> @@ -1517,7 +1517,7 @@ scan_rtx (rtx_insn *insn, rtx *loc, enum reg_class cl, 
> enum scan_actions action,
>  }
>  
>  /* Hide operands of the current insn (of which there are N_OPS) by
> -   substituting cc0 for them.
> +   substituting pc for them.
> Previous values are stored in the OLD_OPERANDS and OLD_DUPS.
> For every bit set in DO_NOT_HIDE, we leave the operand alone.
> If INOUT_AND_EC_ONLY is set, we only do this for OP_INOUT type operands
> @@ -1541,7 +1541,7 @@ hide_operands (int n_ops, rtx *old_operands, rtx 
> *old_dups,
>   continue;
>if (!inout_and_ec_only || recog_data.operand_type[i] == OP_INOUT
> || op_alt[i].earlyclobber)
> - *recog_data.operand_loc[i] = cc0_rtx;
> + *recog_data.operand_loc[i] = pc_rtx;
>  }
>for (i = 0; i < recog_data.n_dups; i++)
>  {
> @@ -1551,7 +1551,7 @@ hide_operands (int n_ops, rtx *old_operands, rtx 
> *old_dups,
>   continue;
>if (!inout_and_ec_only || recog_data.operand_type[opn] == OP_INOUT
> || op_alt[opn].earlyclobber)
> - *recog_data.dup_loc[i] = cc0_rtx;
> + *recog_data.dup_loc[i] = pc_rtx;
>  }
>  }

It's mentioned by name later too:

  /* Step 2: Mark chains for which we have reads outside operands
 as unrenamable.
 We do this by munging all operands into CC0, and closing
 everything remaining.  */

OK with the same change there.

Thanks,
Richard


[PATCH] regrename: Use PC instead of CC0 to hide operands

2019-10-01 Thread Segher Boessenkool
The regrename pass temporarily changes some operand RTL to CC0 so that
note_stores and scan_rtx don't see those operands.  CC0 is deprecated
and we want to remove it, so we need to use something else here.
PC fits the bill fine.

Tested on powerpc64-linux {-m32,-m64}.  Is this okay for trunk?


Segher


2019-10-01  Segher Boessenkool  

* regrename.c (hide_operands): Use pc_rtx instead of cc0_rtx.

---
 gcc/regrename.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/regrename.c b/gcc/regrename.c
index 8c3bae8..ed1dcde 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -1517,7 +1517,7 @@ scan_rtx (rtx_insn *insn, rtx *loc, enum reg_class cl, 
enum scan_actions action,
 }
 
 /* Hide operands of the current insn (of which there are N_OPS) by
-   substituting cc0 for them.
+   substituting pc for them.
Previous values are stored in the OLD_OPERANDS and OLD_DUPS.
For every bit set in DO_NOT_HIDE, we leave the operand alone.
If INOUT_AND_EC_ONLY is set, we only do this for OP_INOUT type operands
@@ -1541,7 +1541,7 @@ hide_operands (int n_ops, rtx *old_operands, rtx 
*old_dups,
continue;
   if (!inout_and_ec_only || recog_data.operand_type[i] == OP_INOUT
  || op_alt[i].earlyclobber)
-   *recog_data.operand_loc[i] = cc0_rtx;
+   *recog_data.operand_loc[i] = pc_rtx;
 }
   for (i = 0; i < recog_data.n_dups; i++)
 {
@@ -1551,7 +1551,7 @@ hide_operands (int n_ops, rtx *old_operands, rtx 
*old_dups,
continue;
   if (!inout_and_ec_only || recog_data.operand_type[opn] == OP_INOUT
  || op_alt[opn].earlyclobber)
-   *recog_data.dup_loc[i] = cc0_rtx;
+   *recog_data.dup_loc[i] = pc_rtx;
 }
 }
 
-- 
1.8.3.1



Re: [PATCH] Implement C++20 P1023 for __debug::array

2019-10-01 Thread Jonathan Wakely

On 30/09/19 22:31 +0200, François Dumont wrote:

This is a missing part of C++20 P1023 for __debug::array.

    Implement C++20 p1023 - constexpr comparison operators for 
__debug::array.

    * include/debug/array: Add C++20 constexpr to comparison operators.
    * testsuite/23_containers/array/tuple_interface/get_debug_neg.cc: Adapt
    dg-error line numbers.
    * testsuite/23_containers/array/tuple_interface/
    tuple_element_debug_neg.cc: Likewise.


Tested under Linux x86_64 normal and debug modes.


Thanks.



[x86] recompute opt flags after opt level change

2019-10-01 Thread Alexandre Oliva
flag_omit_frame_pointer is set in machine-independent code depending
on the optimization level.  It is then overridden in x86
target-specific code depending on a macro defined by
--enable-frame-pointer.

Uses of attribute optimize go through machine-independent overriding
of flag_omit_frame_pointer, but the x86-specific overriding code did
NOT cover this flag, so, even if the attribute does not change the
optimization level, flag_omit_frame_pointer may end up with a
different value, and prevent inlining because of incompatible flags,
as detected by the gcc.dg/ipa/iinline-attr.c test on an
--enable-frame-pointer x86 toolchain.

Regstrapped on x86_64-linux-gnu.  Ok to install?
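
The ordering problem described above can be sketched in a few lines of
C++.  This is a simplified model under stated assumptions, not the GCC
code: Opts, generic_defaults, target_recompute, initial_options,
after_change_buggy and after_change_fixed are all hypothetical names,
and the use_frame_pointer parameter stands in for the macro defined by
--enable-frame-pointer.

```cpp
#include <cassert>

// Simplified, hypothetical model of the option state; not GCC's gcc_options.
struct Opts {
  int optimize = 0;                 // optimization level
  bool omit_frame_pointer = false;  // models flag_omit_frame_pointer
  bool omit_fp_set = false;         // true if the user set the flag explicitly
};

// Machine-independent default: derive the flag from the optimization level.
inline void generic_defaults(Opts &o) {
  if (!o.omit_fp_set)
    o.omit_frame_pointer = o.optimize >= 1;
}

// Target-specific override, factored out so it can be called from more
// than one place.
inline void target_recompute(Opts &o, bool use_frame_pointer) {
  if (o.optimize >= 1 && !o.omit_fp_set)
    o.omit_frame_pointer = !use_frame_pointer;
}

// Initial option processing: generic defaults, then the target override.
inline Opts initial_options(int level, bool use_frame_pointer) {
  Opts o;
  o.optimize = level;
  generic_defaults(o);
  target_recompute(o, use_frame_pointer);
  return o;
}

// Re-processing (as for attribute optimize) without the target override:
// the generic default silently wins.
inline Opts after_change_buggy(Opts o) {
  generic_defaults(o);
  return o;
}

// Re-processing with the target override applied again.
inline Opts after_change_fixed(Opts o, bool use_frame_pointer) {
  generic_defaults(o);
  target_recompute(o, use_frame_pointer);
  return o;
}
```

In this model, a configuration with use_frame_pointer set ends up with a
different omit_frame_pointer value after after_change_buggy than after
initial_options, even though the optimization level did not change;
after_change_fixed keeps the two in agreement.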


for  gcc/ChangeLog

* config/i386/i386-options.c
(ix86_recompute_optlev_based_flags): New, moved out of...
(ix86_option_override_internal): ... this.  Call it.
(ix86_override_options_after_change): Call it here too.
---
 gcc/config/i386/i386-options.c |   89 ++--
 1 file changed, 50 insertions(+), 39 deletions(-)

diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index c148aa20511db..ed286bffaaa3b 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -1527,12 +1527,61 @@ ix86_default_align (struct gcc_options *opts)
 opts->x_str_align_functions = processor_cost_table[ix86_tune]->align_func;
 }
 
+#ifndef USE_IX86_FRAME_POINTER
+#define USE_IX86_FRAME_POINTER 0
+#endif
+
+/* (Re)compute option overrides affected by optimization levels in
+   target-specific ways.  */
+
+static void
+ix86_recompute_optlev_based_flags (struct gcc_options *opts,
+  struct gcc_options *opts_set)
+{
+  /* Set the default values for switches whose default depends on TARGET_64BIT
+ in case they weren't overwritten by command line options.  */
+  if (TARGET_64BIT_P (opts->x_ix86_isa_flags))
+{
+  if (opts->x_optimize >= 1 && !opts_set->x_flag_omit_frame_pointer)
+   opts->x_flag_omit_frame_pointer = !USE_IX86_FRAME_POINTER;
+  if (opts->x_flag_asynchronous_unwind_tables
+ && !opts_set->x_flag_unwind_tables
+ && TARGET_64BIT_MS_ABI)
+   opts->x_flag_unwind_tables = 1;
+  if (opts->x_flag_asynchronous_unwind_tables == 2)
+   opts->x_flag_unwind_tables
+ = opts->x_flag_asynchronous_unwind_tables = 1;
+  if (opts->x_flag_pcc_struct_return == 2)
+   opts->x_flag_pcc_struct_return = 0;
+}
+  else
+{
+  if (opts->x_optimize >= 1 && !opts_set->x_flag_omit_frame_pointer)
+   opts->x_flag_omit_frame_pointer
+ = !(USE_IX86_FRAME_POINTER || opts->x_optimize_size);
+  if (opts->x_flag_asynchronous_unwind_tables == 2)
+   opts->x_flag_asynchronous_unwind_tables = !USE_IX86_FRAME_POINTER;
+  if (opts->x_flag_pcc_struct_return == 2)
+   {
+ /* Intel MCU psABI specifies that -freg-struct-return should
+be on.  Instead of setting DEFAULT_PCC_STRUCT_RETURN to 1,
+we check -miamcu so that -freg-struct-return is always
+turned on if -miamcu is used.  */
+ if (TARGET_IAMCU_P (opts->x_target_flags))
+   opts->x_flag_pcc_struct_return = 0;
+ else
+   opts->x_flag_pcc_struct_return = DEFAULT_PCC_STRUCT_RETURN;
+   }
+}
+}
+
 /* Implement TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook.  */
 
 void
 ix86_override_options_after_change (void)
 {
   ix86_default_align (&global_options);
+  ix86_recompute_optlev_based_flags (&global_options, &global_options_set);
 }
 
 /* Clear stack slot assignments remembered from previous functions.
@@ -2220,45 +2269,7 @@ ix86_option_override_internal (bool main_args_p,
 
   set_ix86_tune_features (ix86_tune, opts->x_ix86_dump_tunes);
 
-#ifndef USE_IX86_FRAME_POINTER
-#define USE_IX86_FRAME_POINTER 0
-#endif
-
-  /* Set the default values for switches whose default depends on TARGET_64BIT
- in case they weren't overwritten by command line options.  */
-  if (TARGET_64BIT_P (opts->x_ix86_isa_flags))
-{
-  if (opts->x_optimize >= 1 && !opts_set->x_flag_omit_frame_pointer)
-   opts->x_flag_omit_frame_pointer = !USE_IX86_FRAME_POINTER;
-  if (opts->x_flag_asynchronous_unwind_tables
- && !opts_set->x_flag_unwind_tables
- && TARGET_64BIT_MS_ABI)
-   opts->x_flag_unwind_tables = 1;
-  if (opts->x_flag_asynchronous_unwind_tables == 2)
-   opts->x_flag_unwind_tables
- = opts->x_flag_asynchronous_unwind_tables = 1;
-  if (opts->x_flag_pcc_struct_return == 2)
-   opts->x_flag_pcc_struct_return = 0;
-}
-  else
-{
-  if (opts->x_optimize >= 1 && !opts_set->x_flag_omit_frame_pointer)
-   opts->x_flag_omit_frame_pointer
- = !(USE_IX86_FRAME_POINTER || opts->x_optimize_size);
-  if (opts->x_flag_asynchronous_unwind_tables == 2)
-   opts->x_flag_asynchronous_unwind_tables = !USE_IX86_FRAME_POINTER;
-  if (opts->x_flag_pcc_struct_return == 2)
-   {
- 

Store float for pow result test

2019-10-01 Thread Alexandre Oliva
Optimizing gcc.dg/torture/pr41094.c, the compiler computes the
constant value and short-circuits the whole thing.  At -O0, however,
on 32-bit x86, the call to pow() remains, and the program compares the
returned value in a stack register, with excess precision, with the
exact return value expected from pow().  If libm's pow() returns a
slightly off result, the compare fails.  If the value in the register
is stored in a separate variable, so it gets rounded to double
precision, and then compared, the compare passes.

It's not clear that the test was meant to detect libm's reliance on
rounding off the excess precision, but I guess it wasn't, so I propose
this slight change that enables it to pass regardless of the slight
inaccuracy of the C library in use.

Regstrapped on x86_64-linux-gnu, and tested on the affected target.
Ok to install?
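
For readers unfamiliar with the excess-precision issue, here is a
minimal sketch of the "store, then compare" idea in C++.  The helper
name equals_after_store is hypothetical, and the volatile qualifier is
an assumption of this sketch (it forces the store even when
optimizing); the patch itself uses a plain double variable and is aimed
at -O0.

```cpp
#include <cassert>

// Compare the result of f() with an expected value only after it has been
// stored in a double object.  Assumption: volatile is this sketch's
// addition to force the store; the patch uses a plain "double r".
inline bool equals_after_store(double (*f)(), double expected)
{
  volatile double r = f();  // the store rounds the value to double
  return r == expected;
}
```

Comparing the returned value directly, as the old test did, may instead
compare a value that still carries extra bits from a wider register.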


for  gcc/testsuite/ChangeLog

* gcc.dg/torture/pr41094.c: Introduce intermediate variable.
---
 gcc/testsuite/gcc.dg/torture/pr41094.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr41094.c 
b/gcc/testsuite/gcc.dg/torture/pr41094.c
index 2a4e9616cbfad..9219a1741a37f 100644
--- a/gcc/testsuite/gcc.dg/torture/pr41094.c
+++ b/gcc/testsuite/gcc.dg/torture/pr41094.c
@@ -13,7 +13,8 @@ double foo(void)
 
 int main()
 {
-  if (foo() != 2.0)
+  double r = foo ();
+  if (r != 2.0)
 abort ();
   return 0;
 }

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!FSF VP & FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


Re: [AArch64] Make call insns record the callee's arm_pcs

2019-10-01 Thread Richard Sandiford
Richard Sandiford  writes:
> [This follows on from:
>  https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00778.html
>  https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01456.html]
>
> At the moment we rely on SYMBOL_REF_DECL to get the ABI of the callee
> of a call insn, falling back to the default ABI if the decl isn't
> available.  I think it'd be cleaner to attach the ABI directly to the
> call instruction instead, which would also have the very minor benefit
> of handling indirect calls more efficiently.
>
> Tested on aarch64-linux-gnu.  OK to install?

Now applied as r276391 after getting a second opinion from Kyrill.

Richard


Re: [PATCH] DWARF array bounds missing from C++ array definitions

2019-10-01 Thread Alexandre Oliva
On Sep 26, 2019, Richard Biener  wrote:

> On Thu, Sep 26, 2019 at 4:05 AM Alexandre Oliva  wrote:

> Heh, I don't have one - which usually makes me simply inline the
> beast into the single caller :P

> Maybe simply have_new_type_for_decl_with_old_die_p?
> Or new_type_for_die_p?

How about override_type_for_decl_p?


for  gcc/ChangeLog

* dwarf2out.c (override_type_for_decl_p): New.
(gen_variable_die): Use it.

for  gcc/testsuite/ChangeLog

* gcc.dg/debug/dwarf2/array-0.c: New.
* gcc.dg/debug/dwarf2/array-1.c: New.
* gcc.dg/debug/dwarf2/array-2.c: New.
* gcc.dg/debug/dwarf2/array-3.c: New.
* g++.dg/debug/dwarf2/array-0.C: New.
* g++.dg/debug/dwarf2/array-1.C: New.
* g++.dg/debug/dwarf2/array-2.C: New.  Based on libstdc++-v3's
src/c++98/pool_allocator.cc:__pool_alloc_base::_S_heap_size.
* g++.dg/debug/dwarf2/array-3.C: New.  Based on
gcc's config/i386/i386-features.c:xlogue_layout::s_instances.
* g++.dg/debug/dwarf2/array-4.C: New.
---
 gcc/dwarf2out.c |   32 ++-
 gcc/testsuite/g++.dg/debug/dwarf2/array-0.C |   13 +++
 gcc/testsuite/g++.dg/debug/dwarf2/array-1.C |   13 +++
 gcc/testsuite/g++.dg/debug/dwarf2/array-2.C |   15 +
 gcc/testsuite/g++.dg/debug/dwarf2/array-3.C |   20 +
 gcc/testsuite/g++.dg/debug/dwarf2/array-4.C |   16 ++
 gcc/testsuite/gcc.dg/debug/dwarf2/array-0.c |   10 
 gcc/testsuite/gcc.dg/debug/dwarf2/array-1.c |   10 
 gcc/testsuite/gcc.dg/debug/dwarf2/array-2.c |8 +++
 gcc/testsuite/gcc.dg/debug/dwarf2/array-3.c |8 +++
 10 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-0.C
 create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-1.C
 create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-2.C
 create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-3.C
 create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-4.C
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-0.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-3.c

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index cec25fa5fa2b8..a29a200f19814 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -23706,6 +23706,34 @@ local_function_static (tree decl)
 && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL;
 }
 
+/* Return true iff DECL overrides (presumably completes) the type of
+   OLD_DIE within CONTEXT_DIE.  */
+
+static bool
+override_type_for_decl_p (tree decl, dw_die_ref old_die,
+ dw_die_ref context_die)
+{
+  tree type = TREE_TYPE (decl);
+  int cv_quals;
+
+  if (decl_by_reference_p (decl))
+{
+  type = TREE_TYPE (type);
+  cv_quals = TYPE_UNQUALIFIED;
+}
+  else
+cv_quals = decl_quals (decl);
+
+  dw_die_ref type_die = modified_type_die (type,
+  cv_quals | TYPE_QUALS (type),
+  false,
+  context_die);
+
+  dw_die_ref old_type_die = get_AT_ref (old_die, DW_AT_type);
+
+  return type_die != old_type_die;
+}
+
 /* Generate a DIE to represent a declared data object.
Either DECL or ORIGIN must be non-null.  */
 
@@ -23958,7 +23986,9 @@ gen_variable_die (tree decl, tree origin, dw_die_ref 
context_die)
  && !DECL_ABSTRACT_P (decl_or_origin)
  && variably_modified_type_p (TREE_TYPE (decl_or_origin),
   decl_function_context
-   (decl_or_origin))))
+  (decl_or_origin)))
+  || (old_die && specialization_p
+ && override_type_for_decl_p (decl_or_origin, old_die, context_die)))
 {
   tree type = TREE_TYPE (decl_or_origin);
 
diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/array-0.C 
b/gcc/testsuite/g++.dg/debug/dwarf2/array-0.C
new file mode 100644
index 0..a3458bd0d32a4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/array-0.C
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-gdwarf-2 -dA" } */
+struct S
+{
+  static int array[42];
+};
+
+int S::array[42];
+
+/* Verify that we get only one DW_TAG_subrange_type with a
+   DW_AT_upper_bound.  */
+/* { dg-final { scan-assembler-times " DW_TAG_subrange_type" 2 } } */
+/* { dg-final { scan-assembler-times " DW_AT_upper_bound" 1 } } */
diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/array-1.C 
b/gcc/testsuite/g++.dg/debug/dwarf2/array-1.C
new file mode 100644
index 0..e8fd6f8ffea56
--- /dev/null
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/array-1.C
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-gdwarf-2 -dA" } */
+struct S
+{
+  static int array[];
+};
+

Re: [PATCH v2 1/2] libada: Remove racy duplicate gnatlib installation

2019-10-01 Thread Arnaud Charlet
>  I have verified this change by running my combined build process where a 
> native `x86_64-linux-gnu' configuration is built first and then used to 
> build a `riscv64-linux-gnu' cross-compiler, both with `--disable-libada' 
> specified, without and with this patch applied.  I have added `make -C gcc 
> gnatlib && make -C gcc gnattools' as an extra build step before `make 
> install'.
> 
>  This has run up to failing to find `riscv64-linux-gnu' system headers in 
> `make -C gcc gnatlib' as noted above, at which point the installation 
> trees had both the same contents, including `x86_64-linux-gnu' gnatlib 
> development files and static libraries as well as gnattools in particular.

Can you also please do a native build with --disable-libada and
make -C gcc gnatlib && make -C gcc gnattools && make install
?

Once successful, the change is OK, thanks for the extra work.

Arno


Re: build-failure for cris-elf with "[00/32] Support multiple ABIs in the same translation unit"

2019-10-01 Thread Richard Sandiford
Hans-Peter Nilsson  writes:
>> From: Richard Sandiford 
>> Date: Wed, 11 Sep 2019 21:02:26 +0200
>
>> This series of patches introduces some classes and helpers for handling
>> multiple ABIs in the same translation unit.  At the moment "ABI" means
>> specifically the choice of call-clobbered registers
> [...]
>
>> The series also makes -fipa-ra work for partially-clobbered registers too.
> [...]
>
> My autotester for cris-elf complains about a build-breaking
> commit in the revision range (working:breaking) 276299:276359
> and a glance at those commits and the error message says the
> cause is likely one of your commits.  Relevant part of
> build-log, hopefully sufficient:
>
> ---
> g++ -fno-PIE -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   
> -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall 
> -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute 
> -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
> -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. 
> -I/x/hpautotest-gcc1/gcc/gcc -I/x/hpautotest-gcc1/gcc/gcc/. 
> -I/x/hpautotest-gcc1/gcc/gcc/../include 
> -I/x/hpautotest-gcc1/gcc/gcc/../libcpp/include 
> -I/x/hpautotest-gcc1/cris-elf/gccobj/./gmp -I/x/hpautotest-gcc1/gcc/gmp 
> -I/x/hpautotest-gcc1/cris-elf/gccobj/./mpfr/src 
> -I/x/hpautotest-gcc1/gcc/mpfr/src -I/x/hpautotest-gcc1/gcc/mpc/src  
> -I/x/hpautotest-gcc1/gcc/gcc/../libdecnumber 
> -I/x/hpautotest-gcc1/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
> -I/x/hpautotest-gcc1/gcc/gcc/../libbacktrace   -o caller-save.o -MT 
> caller-save.o -MMD -MP -MF ./.deps/caller-save.TPo 
> /x/hpautotest-gcc1/gcc/gcc/caller-save.c
> In file included from /x/hpautotest-gcc1/gcc/gcc/caller-save.c:31:0:
> /x/hpautotest-gcc1/gcc/gcc/caller-save.c: In function 'void 
> init_caller_save()':
> /x/hpautotest-gcc1/gcc/gcc/regs.h:195:44: error: cannot convert 'bool' to 
> 'const predefined_function_abi*' for argument '3' to 'machine_mode 
> choose_hard_reg_mode(unsigned int, unsigned int, const 
> predefined_function_abi*)'
>choose_hard_reg_mode (REGNO, NREGS, false)
> ^
> /x/hpautotest-gcc1/gcc/gcc/caller-save.c:203:26: note: in expansion of macro 
> 'HARD_REGNO_CALLER_SAVE_MODE'
>   regno_save_mode[i][j] = HARD_REGNO_CALLER_SAVE_MODE (i, j, VOIDmode);
>   ^~~
> /x/hpautotest-gcc1/gcc/gcc/caller-save.c: In function 'void 
> save_call_clobbered_regs()':
> /x/hpautotest-gcc1/gcc/gcc/regs.h:195:44: error: cannot convert 'bool' to 
> 'const predefined_function_abi*' for argument '3' to 'machine_mode 
> choose_hard_reg_mode(unsigned int, unsigned int, const 
> predefined_function_abi*)'
>choose_hard_reg_mode (REGNO, NREGS, false)
> ^
> /x/hpautotest-gcc1/gcc/gcc/caller-save.c:821:12: note: in expansion of macro 
> 'HARD_REGNO_CALLER_SAVE_MODE'
>  mode = HARD_REGNO_CALLER_SAVE_MODE
> ^~~
> Makefile:1117: recipe for target 'caller-save.o' failed
> ---
>
>> Also tested by compiling at least one target per CPU directory and
>> checking for no new warnings.
>
> (Hmm...  So maybe a host gcc issue?)

Bah, yeah, seems so.

> My host is x86-64 Debian 9, i.e. gcc-6.3.0.

Mine was 5.4.0, which only treats this as a warning.  Normally I check
for extra warnings too, but mustn't have done this time, sorry.

Fixed as below.  I also belatedly see a new definition of
HARD_REGNO_CALLER_SAVE_MODE was added since I posted the patches,
so the patch fixes that too.  Tested by cross-building cris-elf
and sparc-linux-gnu, applied as obvious.

Richard
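
The failure in the log above comes down to passing "false" where a
pointer is expected: the 5.4.0 host compiler only warned, while 6.3.0
rejects it.  A minimal sketch of the shape of the fix, with hypothetical
stand-ins (abi_sketch, choose_mode_sketch and CALLER_SAVE_MODE_SKETCH
are not GCC names):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for predefined_function_abi.
struct abi_sketch { int id; };

// Hypothetical stand-in for choose_hard_reg_mode: the third parameter is a
// pointer, and a null pointer means "no particular ABI".
inline int choose_mode_sketch(unsigned regno, unsigned nregs,
                              const abi_sketch *abi)
{
  (void) regno;
  (void) nregs;
  return abi ? abi->id : -1;
}

// Writing "false" as the last argument relies on a bool-to-pointer
// conversion that the newer host compiler refuses; NULL states the
// intent directly.
#define CALLER_SAVE_MODE_SKETCH(REGNO, NREGS) \
  choose_mode_sketch (REGNO, NREGS, NULL)
```

The macro form mirrors the original because the error only appears at
the point of expansion, which is why it was reported against
caller-save.c rather than regs.h.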


2019-10-01  Richard Sandiford  

gcc/
* regs.h (HARD_REGNO_CALLER_SAVE_MODE): Update call to
choose_hard_reg_mode.
* config/sparc/sparc.h (HARD_REGNO_CALLER_SAVE_MODE): Likewise.

Index: gcc/regs.h
===
--- gcc/regs.h  2019-09-30 17:19:45.047128655 +0100
+++ gcc/regs.h  2019-10-01 08:46:22.368168133 +0100
@@ -192,7 +192,7 @@ #define REG_BASIC_BLOCK(N) (reg_info_p[N
 /* Select a register mode required for caller save of hard regno REGNO.  */
 #ifndef HARD_REGNO_CALLER_SAVE_MODE
 #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \
-  choose_hard_reg_mode (REGNO, NREGS, false)
+  choose_hard_reg_mode (REGNO, NREGS, NULL)
 #endif
 
 /* Target-dependent globals.  */
Index: gcc/config/sparc/sparc.h
===
--- gcc/config/sparc/sparc.h2019-09-21 13:56:08.855935013 +0100
+++ gcc/config/sparc/sparc.h2019-10-01 08:46:22.368168133 +0100
@@ -716,7 +716,7 @@ #define HARD_REGNO_RENAME_OK(FROM, TO) (
mode but the largest suitable mode for the given (REGNO, NREGS) pair and
it quickly creates paradoxical subregs that can be problematic.  */
 #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \
-  ((MODE) == 

Re: [C][C++] Avoid exposing internal details in aka types

2019-10-01 Thread Richard Biener
On Mon, Sep 30, 2019 at 3:21 PM Richard Sandiford
 wrote:
>
> The current aka diagnostics can sometimes leak internal details that
> seem more likely to be distracting than useful.  E.g. on aarch64:
>
>   void f (va_list *va) { *va = 1; }
>
> gives:
>
>   incompatible types when assigning to type ‘va_list’ {aka ‘__va_list’} from 
> type ‘int’
>
> where __va_list isn't something the user is expected to know about.
> A similar thing happens for C++ on the arm_neon.h-based:
>
>   float x;
>   int8x8_t y = x;
>
> which gives:
>
>   cannot convert ‘float’ to ‘int8x8_t’ {aka ‘__Int8x8_t’} in initialization
>
> This is accurate -- and __Int8x8_t is defined by the AArch64 PCS --
> but it's not going to be meaningful to most users.
>
> This patch stops the aka code looking through typedefs if all of
> the following are true:
>
> (1) the typedef is built into the compiler or comes from a system header
>
> (2) the target of the typedef is anonymous or has a name in the
> implementation namespace
>
> (3) the target type is a tag type or vector type, which have in common that:
> (a) we print their type names if they have one
> (b) what we print for anonymous types isn't all that useful
> ("struct <anonymous>" etc. for tag types, pseudo-C "__vector(N) T"
> for vector types)
>
> The C side does this by recursively looking for the aka type, like the
> C++ side already does.  This in turn makes "aka" work for distinct type
> copies like __Int8x8_t on aarch64, fixing the ??? in aarch64/diag_aka_1.c.
>
> On the C++ side, strip_typedefs had:
>
>   /* Explicitly get the underlying type, as TYPE_MAIN_VARIANT doesn't
>  strip typedefs with attributes.  */
>   result = TYPE_MAIN_VARIANT (DECL_ORIGINAL_TYPE (TYPE_NAME (t)));
>   result = strip_typedefs (result);
>
> Applying TYPE_MAIN_VARIANT predates the strip_typedefs call, with the
> comment originally contrasting with plain:
>
>   result = TYPE_MAIN_VARIANT (t);
>
> But the recursive call to strip_typedefs will apply TYPE_MAIN_VARIANT,
> so it doesn't seem necessary to do it here too.  I think there was also
> a missing "remove_attributes" argument, since wrapping something in a
> typedef shouldn't change which attributes get removed.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

In other context (debug-info) we also look at DECL_ARTIFICIAL,
not sure if that is set on compiler-generated TYPE_DECLs.

> Richard
>
>
> 2019-09-30  Richard Sandiford  
>
> gcc/c-family/
> * c-common.h (user_facing_original_type_p): Declare.
> * c-common.c (user_facing_original_type_p): New function.
>
> gcc/c/
> * c-objc-common.c (useful_aka_type_p): Replace with...
> (get_aka_type): ...this new function.  Given the original type,
> decide which aka type to print (if any).  Only look through typedefs
> if user_facing_original_type_p.
> (print_type): Update accordingly.
>
> gcc/cp/
> * cp-tree.h (STF_USER_VISIBLE): New constant.
> (strip_typedefs, strip_typedefs_expr): Take a flags argument.
> * tree.c (strip_typedefs, strip_typedefs_expr): Likewise,
> updating mutual calls accordingly.  When STF_USER_VISIBLE is true,
> only look through typedefs if user_facing_original_type_p.
> * error.c (dump_template_bindings, type_to_string): Pass
> STF_USER_VISIBLE to strip_typedefs.
> (dump_type): Likewise, unless pp_c_flag_gnu_v3 is set.
>
> gcc/testsuite/
> * g++.dg/diagnostic/aka5.h: New test.
> * g++.dg/diagnostic/aka5a.C: Likewise.
> * g++.dg/diagnostic/aka5b.C: Likewise.
> * g++.target/aarch64/diag_aka_1.C: Likewise.
> * gcc.dg/diag-aka-5.h: Likewise.
> * gcc.dg/diag-aka-5a.c: Likewise.
> * gcc.dg/diag-aka-5b.c: Likewise.
> * gcc.target/aarch64/diag_aka_1.c (f): Expect an aka to be printed
> for myvec.
>
> Index: gcc/c-family/c-common.h
> ===
> --- gcc/c-family/c-common.h 2019-09-30 13:54:16.0 +0100
> +++ gcc/c-family/c-common.h 2019-09-30 14:16:45.002103890 +0100
> @@ -1063,6 +1063,7 @@ extern tree builtin_type_for_size (int,
>  extern void c_common_mark_addressable_vec (tree);
>
>  extern void set_underlying_type (tree);
> +extern bool user_facing_original_type_p (const_tree);
>  extern void record_types_used_by_current_var_decl (tree);
>  extern vec<tree, va_gc> *make_tree_vector (void);
>  extern void release_tree_vector (vec<tree, va_gc> *);
> Index: gcc/c-family/c-common.c
> ===
> --- gcc/c-family/c-common.c 2019-09-30 13:54:16.0 +0100
> +++ gcc/c-family/c-common.c 2019-09-30 14:16:45.002103890 +0100
> @@ -7713,6 +7713,55 @@ set_underlying_type (tree x)
>  }
>  }
>
> +/* Return true if it is worth exposing the DECL_ORIGINAL_TYPE of TYPE to
> +   the user in diagnostics, false if it would 

Re: [PATCH] Add some hash_map_safe_* functions like vec_safe_*.

2019-10-01 Thread Richard Biener
On Mon, Sep 30, 2019 at 8:33 PM Jason Merrill  wrote:
>
> My comments accidentally got lost.
>
> Several places in the front-end (and elsewhere) use the same lazy
> allocation pattern for hash_maps, and this patch replaces them with
> hash_map_safe_* functions like vec_safe_*.  They don't provide a way
> to specify an initial size, but I don't think that's a significant
> issue.
>
> Tested x86_64-pc-linux-gnu.  OK for trunk?

You are using create_ggc, but the names of the new functions do not
indicate that ggc allocation is done.  Isn't the set then also incomplete
without a non-ggc variant?  Maybe I'm missing something.

Thanks,
Richard.
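
For reference, the lazy-allocation pattern under discussion can be
sketched as follows.  This is an illustration only: it uses
std::unordered_map and plain new instead of GCC's hash_map and ggc
allocation, and the names map_maybe_create, map_safe_get and
map_safe_put are hypothetical.

```cpp
#include <cassert>
#include <unordered_map>

// Allocate the map on first use.  The pointer is taken by reference so
// that the caller's variable is updated.
template<typename K, typename V>
inline std::unordered_map<K, V> *
map_maybe_create (std::unordered_map<K, V> *&h)
{
  if (!h)
    h = new std::unordered_map<K, V> ();
  return h;
}

// Lookup that tolerates a null map; returns null if the key is absent.
template<typename K, typename V>
inline V *
map_safe_get (std::unordered_map<K, V> *h, const K &k)
{
  if (!h)
    return nullptr;
  auto it = h->find (k);
  return it == h->end () ? nullptr : &it->second;
}

// Insert or overwrite, creating the map if needed; returns true if the
// key was already present.
template<typename K, typename V>
inline bool
map_safe_put (std::unordered_map<K, V> *&h, const K &k, const V &v)
{
  std::unordered_map<K, V> *m = map_maybe_create (h);
  bool existed = m->find (k) != m->end ();
  (*m)[k] = v;
  return existed;
}
```

Note that the read-only helper takes the pointer by value, while the
helpers that may allocate take it by reference; the allocator choice is
hidden inside map_maybe_create, which is the point of the question
above.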

> On Mon, Sep 30, 2019 at 2:30 PM Jason Merrill  wrote:
> >
> > gcc/
> > * hash-map.h (default_size): Put in member variable.
> > (create_ggc): Use it as default argument.
> > (hash_map_maybe_create, hash_map_safe_get)
> > (hash_map_safe_get_or_insert, hash_map_safe_put): New fns.
> > gcc/cp/
> > * constexpr.c (maybe_initialize_fundef_copies_table): Remove.
> > (get_fundef_copy): Use hash_map_safe_get_or_insert.
> > * cp-objcp-common.c (cp_get_debug_type): Use hash_map_safe_*.
> > * decl.c (store_decomp_type): Remove.
> > (cp_finish_decomp): Use hash_map_safe_put.
> > * init.c (get_nsdmi): Use hash_map_safe_*.
> > * pt.c (store_defaulted_ttp, lookup_defaulted_ttp): Remove.
> > (add_defaults_to_ttp): Use hash_map_safe_*.
> > ---
> >  gcc/hash-map.h   | 38 --
> >  gcc/cp/constexpr.c   | 14 ++
> >  gcc/cp/cp-objcp-common.c |  6 ++
> >  gcc/cp/decl.c|  9 +
> >  gcc/cp/init.c|  9 ++---
> >  gcc/cp/pt.c  | 21 +++--
> >  gcc/hash-table.c |  2 +-
> >  7 files changed, 47 insertions(+), 52 deletions(-)
> >
> > diff --git a/gcc/hash-map.h b/gcc/hash-map.h
> > index ba20fe79f23..e638f761465 100644
> > --- a/gcc/hash-map.h
> > +++ b/gcc/hash-map.h
> > @@ -128,8 +128,9 @@ class GTY((user)) hash_map
> > }
> >};
> >
> > +  static const size_t default_size = 13;
> >  public:
> > -  explicit hash_map (size_t n = 13, bool ggc = false,
> > +  explicit hash_map (size_t n = default_size, bool ggc = false,
> >  bool sanitize_eq_and_hash = true,
> >  bool gather_mem_stats = GATHER_STATISTICS
> >  CXX_MEM_STAT_INFO)
> > @@ -146,7 +147,7 @@ public:
> >HASH_MAP_ORIGIN PASS_MEM_STAT) {}
> >
> >/* Create a hash_map in ggc memory.  */
> > -  static hash_map *create_ggc (size_t size,
> > +  static hash_map *create_ggc (size_t size = default_size,
> >bool gather_mem_stats = GATHER_STATISTICS
> >CXX_MEM_STAT_INFO)
> >  {
> > @@ -326,4 +327,37 @@ gt_pch_nx (hash_map *h, gt_pointer_operator 
> > op, void *cookie)
> > >op (&h->m_table.m_entries, cookie);
> >  }
> >
> > > +template<typename K, typename V, typename H>
> > > +inline hash_map<K,V,H> *
> > > +hash_map_maybe_create (hash_map<K,V,H> *&h)
> > > +{
> > > +  if (!h)
> > > +    h = h->create_ggc ();
> > > +  return h;
> > > +}
> > > +
> > > +/* Like h->get, but handles null h.  */
> > > +template<typename K, typename V, typename H>
> > > +inline V*
> > > +hash_map_safe_get (hash_map<K,V,H> *h, const K& k)
> > > +{
> > > +  return h ? h->get (k) : NULL;
> > > +}
> > > +
> > > +/* Like h->get, but handles null h.  */
> > > +template<typename K, typename V, typename H>
> > > +inline V&
> > > +hash_map_safe_get_or_insert (hash_map<K,V,H> *&h, const K& k, bool *e = 
> > > NULL)
> > > +{
> > > +  return hash_map_maybe_create (h)->get_or_insert (k, e);
> > > +}
> > > +
> > > +/* Like h->put, but handles null h.  */
> > > +template<typename K, typename V, typename H>
> > > +inline bool
> > > +hash_map_safe_put (hash_map<K,V,H> *&h, const K& k, const V& v)
> > > +{
> > > +  return hash_map_maybe_create (h)->put (k, v);
> > > +}
> > > +
> > +
> >  #endif
> > diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
> > index cb5484f4b72..904b70a9c99 100644
> > --- a/gcc/cp/constexpr.c
> > +++ b/gcc/cp/constexpr.c
> > @@ -1098,15 +1098,6 @@ maybe_initialize_constexpr_call_table (void)
> >
> >  static GTY(()) hash_map *fundef_copies_table;
> >
> > -/* Initialize FUNDEF_COPIES_TABLE if it's not initialized.  */
> > -
> > -static void
> > -maybe_initialize_fundef_copies_table ()
> > -{
> > -  if (fundef_copies_table == NULL)
> > > -    fundef_copies_table = hash_map<tree,tree>::create_ggc (101);
> > -}
> > -
> >  /* Reuse a copy or create a new unshared copy of the function FUN.
> > Return this copy.  We use a TREE_LIST whose PURPOSE is body, VALUE
> > is parms, TYPE is result.  */
> > @@ -1114,11 +1105,10 @@ maybe_initialize_fundef_copies_table ()
> >  static tree
> >  get_fundef_copy (constexpr_fundef *fundef)
> >  {
> > -  maybe_initialize_fundef_copies_table ();
> > -
> >tree copy;
> >bool existed;
> > > -  tree *slot = &fundef_copies_table->get_or_insert (fundef->decl, 
> > > &existed);
> > > +  tree *slot = &hash_map_safe_get_or_insert (fundef_copies_table,
> > > +fundef->decl, &existed);
> >
> >if (!existed)
> >  {
> > 

Re: [PATCH] ifcvt: improve cost estimation (PR 87047)

2019-10-01 Thread Richard Biener
On Mon, Sep 30, 2019 at 7:51 PM Alexander Monakov  wrote:
>
> On Mon, 30 Sep 2019, Alexander Monakov wrote:
>
> > +static unsigned
> > +average_cost (unsigned then_cost, unsigned else_cost, edge e)
> > +{
> > +  return else_cost + e->probability.apply ((int) then_cost - else_cost);
>
> Ugh, I made a wrong last-minute edit here.  We want a signed cost
> difference, so the argument to probability.apply should be
>
>   (int) (then_cost - else_cost)
>
> or
>
>   (int) then_cost - (int) else_cost.
>
> The patch I bootstrapped and passed to Martin for testing correctly had
>
>   (gcov_type) then_cost - else_cost
>
> (gcov_type is int64).

OK for trunk with that fixed.  Not OK for backports.

Thanks,
Richard.

> Alexander
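
The cast placement matters because of the usual arithmetic conversions:
in "(int) then_cost - else_cost" the int operand is converted back to
unsigned, so the difference wraps when then_cost is the smaller one.  A
minimal sketch, assuming a 32-bit unsigned int; apply_probability is a
hypothetical stand-in for profile_probability::apply that takes a plain
double, and buggy_diff and fixed_diff are names made up for this
illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for profile_probability::apply: scale a signed
// 64-bit value by a probability in [0, 1].
inline std::int64_t apply_probability(double p, std::int64_t value)
{
  return std::llround(p * static_cast<double>(value));
}

// The mistaken form: int minus unsigned is computed as unsigned, so a
// negative difference wraps before it is widened to 64 bits.
inline std::int64_t buggy_diff(unsigned then_cost, unsigned else_cost)
{
  return (int) then_cost - else_cost;
}

// The intended form: widen first, then subtract in signed arithmetic.
inline std::int64_t fixed_diff(unsigned then_cost, unsigned else_cost)
{
  return (std::int64_t) then_cost - else_cost;
}

// Probability-weighted average of the two costs, using the signed
// difference.
inline unsigned average_cost(unsigned then_cost, unsigned else_cost, double p)
{
  std::int64_t delta = apply_probability(p, fixed_diff(then_cost, else_cost));
  return static_cast<unsigned>(static_cast<std::int64_t>(else_cost) + delta);
}
```

With then_cost = 4 and else_cost = 12, fixed_diff gives -8 while
buggy_diff gives 4294967288, which is why the averaged cost would come
out wildly wrong whenever the "then" arm is the cheaper one.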


[PATCH] doc/md.texi: Fix some typos

2019-10-01 Thread Segher Boessenkool
It says " size N/2" in a few places where "size S/2" is meant.

Committing as trivial and obvious.


Segher
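
As a quick sanity check of the corrected wording (2*N elements of size
S/2), here is a scalar model of a truncating pack in C++.  It is a
sketch only: pack_trunc_sketch is a hypothetical name, S is fixed at 32
bits, and the element order in the result is this sketch's choice, not
a statement about any particular target.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Two input vectors of N elements of size S (here 32 bits) become one
// vector of 2*N elements of size S/2 (here 16 bits), narrowed by
// truncation.
template<std::size_t N>
inline std::array<std::uint16_t, 2 * N>
pack_trunc_sketch (const std::array<std::uint32_t, N> &a,
                   const std::array<std::uint32_t, N> &b)
{
  std::array<std::uint16_t, 2 * N> out{};
  for (std::size_t i = 0; i < N; ++i)
    {
      out[i] = static_cast<std::uint16_t> (a[i]);      // keep the low 16 bits
      out[N + i] = static_cast<std::uint16_t> (b[i]);  // likewise for operand 2
    }
  return out;
}
```

The element size halves while the element count doubles, so the result
occupies the same number of bits as one input vector; a size of "N/2"
would have tied the element size to the element count, which is not
what the patterns do.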


2019-10-01  Segher Boessenkool  

* doc/md.texi (vec_pack_trunc_@var{m}): Fix typo.
(vec_pack_sfix_trunc_@var{m}, vec_pack_ufix_trunc_@var{m}): Ditto.
(vec_packs_float_@var{m}, vec_packu_float_@var{m}): Ditto.

---
 gcc/doc/md.texi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 868016a..859ebed 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5454,7 +5454,7 @@ The output and input vectors should have the same modes.
 Narrow (demote) and merge the elements of two vectors. Operands 1 and 2
 are vectors of the same mode having N integral or floating point elements
 of size S@.  Operand 0 is the resulting vector in which 2*N elements of
-size N/2 are concatenated after narrowing them down using truncation.
+size S/2 are concatenated after narrowing them down using truncation.
 
 @cindex @code{vec_pack_sbool_trunc_@var{m}} instruction pattern
 @item @samp{vec_pack_sbool_trunc_@var{m}}
@@ -5481,7 +5481,7 @@ saturating arithmetic.
 Narrow, convert to signed/unsigned integral type and merge the elements
 of two vectors.  Operands 1 and 2 are vectors of the same mode having N
 floating point elements of size S@.  Operand 0 is the resulting vector
-in which 2*N elements of size N/2 are concatenated.
+in which 2*N elements of size S/2 are concatenated.
 
 @cindex @code{vec_packs_float_@var{m}} instruction pattern
 @cindex @code{vec_packu_float_@var{m}} instruction pattern
@@ -5489,7 +5489,7 @@ in which 2*N elements of size N/2 are concatenated.
 Narrow, convert to floating point type and merge the elements
 of two vectors.  Operands 1 and 2 are vectors of the same mode having N
 signed/unsigned integral elements of size S@.  Operand 0 is the resulting 
vector
-in which 2*N elements of size N/2 are concatenated.
+in which 2*N elements of size S/2 are concatenated.
 
 @cindex @code{vec_unpacks_hi_@var{m}} instruction pattern
 @cindex @code{vec_unpacks_lo_@var{m}} instruction pattern
-- 
1.8.3.1



Re: [SVE] PR91532

2019-10-01 Thread Richard Biener
On Mon, 30 Sep 2019, Prathamesh Kulkarni wrote:

> On Wed, 25 Sep 2019 at 23:44, Richard Biener  wrote:
> >
> > On Wed, 25 Sep 2019, Prathamesh Kulkarni wrote:
> >
> > > On Fri, 20 Sep 2019 at 15:20, Jeff Law  wrote:
> > > >
> > > > On 9/19/19 10:19 AM, Prathamesh Kulkarni wrote:
> > > > > Hi,
> > > > > For PR91532, the dead store is trivially deleted if we place dse pass
> > > > > between ifcvt and vect. Would it be OK to add another instance of dse 
> > > > > there ?
> > > > > Or should we add an ad-hoc "basic-block dse" sub-pass to ifcvt that
> > > > > will clean up the dead store ?
> > > > I'd hesitate to add another DSE pass.  If there's one nearby could we
> > > > move the existing pass?
> > > Well I think the nearest one is just after pass_warn_restrict. Not
> > > sure if it's a good
> > > idea to move it up from there ?
> >
> > You'll need it in between ifcvt and vect so it would be disabled
> > w/o vectorization, so no, that doesn't work.
> >
> > ifcvt already invokes SEME region value-numbering so if we had
> > MESE region DSE it could use that.  Not sure if you feel like
> > refactoring DSE to work on regions - it currently uses a DOM
> > walk which isn't suited for that.
> >
> > if-conversion has a little "local" dead predicate compute removal
> > thingy (not that I like that), eventually it can be enhanced to
> > do the DSE you want?  Eventually it should be moved after the local
> > CSE invocation though.
> Hi,
> Thanks for the suggestions.
> For now, would it be OK to do "dse" on loop header in
> tree_if_conversion, as in the attached patch ?
> The patch does local dse in a new function ifcvt_local_dse instead of
> ifcvt_local_dce, because it needed to be done after RPO VN which
> eliminates:
> Removing dead stmt _ifc__62 = *_55;
> and makes the following store dead:
> *_55 = _ifc__61;

I suggested trying to move ifcvt_local_dce after RPO VN; you could
try that as an independent patch (pre-approved).

I don't mind the extra walk though.

What I see as a possible issue is that dse_classify_store walks virtual
uses, and I'm not sure the loop exit is a natural boundary for
such a walk (eventually the loop header virtual PHI is reached, but
there may also be a loop-closed PHI for the virtual operand,
though not necessarily).  So the question is whether to add a
"stop at" argument to dse_classify_store specifying the virtual
use at which the walk should stop?

Thanks,
Richard.
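
The "stop at" idea above can be pictured with a toy model. This is a sketch
under stated assumptions, not GCC's dse_classify_store: the types vuse_node
and store_status, the function classify_store_toy, and the simplification of
the virtual use-def graph to a linear chain are all hypothetical. Reaching the
boundary is treated conservatively as "live", because nothing beyond it has
been examined.

```c
#include <stddef.h>

/* Hypothetical classification of a statement reached by the walk.  */
enum vuse_kind
{
  VUSE_UNRELATED,            /* Does not touch the stored location.  */
  VUSE_READS_LOCATION,       /* Reads the stored location.  */
  VUSE_OVERWRITES_LOCATION   /* Fully overwrites the stored location.  */
};

/* Hypothetical linear chain standing in for the virtual uses.  */
struct vuse_node
{
  enum vuse_kind kind;
  const struct vuse_node *next;
};

enum store_status { STORE_LIVE, STORE_DEAD };

/* Walk forward from FIRST_USE.  The store is dead only if an overwrite
   is found before any read and before the STOP_AT boundary.  */
static enum store_status
classify_store_toy (const struct vuse_node *first_use,
                    const struct vuse_node *stop_at)
{
  for (const struct vuse_node *n = first_use;
       n != NULL && n != stop_at; n = n->next)
    {
      if (n->kind == VUSE_READS_LOCATION)
        return STORE_LIVE;
      if (n->kind == VUSE_OVERWRITES_LOCATION)
        return STORE_DEAD;
    }
  /* Boundary or end of chain reached: assume the store may be used.  */
  return STORE_LIVE;
}
```

In this model, passing the node that represents the loop exit as stop_at keeps
the walk inside the loop body; an overwrite that lies beyond that node no
longer makes the store dead.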