Re: [PATCH 1/3] Power10: Add PCREL_OPT load support

2020-08-20 Thread Segher Boessenkool
Hi!

On Tue, Aug 18, 2020 at 02:34:01AM -0400, Michael Meissner wrote:
> +// Maximum number of insns to scan between the load address and the load that

Please don't mix /* and // comments.  Just stick to /* comments, like
all the rest of our backend?

> +const int MAX_PCREL_OPT_INSNS= 10;

"static const", and no tab please (just a space).

> +  // LWA is a DS format instruction, but LWZ is a D format instruction.  We use
> +  // DImode for the mode to force checking whether the bottom 2 bits are 0.

How so?  Can't you just check if the resulting insn is accepted? That
would be so much more robust (and can have a correct description more
easily, too!)

> +  // However FPR and vector registers uses the LFIWAX instruction which is
> +  // indexed only.

(vectors use lxsiwax)

> +  if (GET_CODE (mem) == SIGN_EXTEND && GET_MODE (XEXP (mem, 0)) == SImode)

You're checking here whether the address of the MEM is SImode.

> +{
> +  if (!INT_REGNO_P (reg_regno))

That should use base_reg_operand instead?

> +  // The optimization will only work on non-prefixed offsettable loads.
> +  rtx addr = XEXP (mem_inner, 0);
> +  enum insn_form iform = address_to_insn_form (addr, mem_mode, non_prefixed);
> +  if (iform != INSN_FORM_BASE_REG
> +      && iform != INSN_FORM_D
> +      && iform != INSN_FORM_DS
> +      && iform != INSN_FORM_DQ)
> +    return false;

Sounds like something for a utility function?  Do we use this elsewhere?

> +  ++pcrel_opt_next_num;

Normal style is postfix increment.  Empty lines around this are nice, it
indicates this is the point where we decided to do the thing.  Or add a
comment even, maybe?

> +  unsigned int addr_regno = reg_or_subregno (addr_reg);
> +  rtx label_num = GEN_INT (pcrel_opt_next_num);
> +  rtx reg_di = gen_rtx_REG (DImode, reg_regno);
> +
> +  PATTERN (addr_insn)
> +    = ((addr_regno != reg_regno)
> +       ? gen_pcrel_opt_ld_addr (addr_reg, addr_symbol, label_num, reg_di)
> +       : gen_pcrel_opt_ld_addr_same_reg (addr_reg, addr_symbol, label_num));

Please use if() instead of multi-line expressions.  The outer parens are
unnecessary, too.

> +  // Revalidate the insn, backing out of the optimization if the insn is not
> +  // supported.

So the count will be incorrect then?

> +  INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
> +  if (INSN_CODE (addr_insn) < 0)
> +    {
> +      PATTERN (addr_insn) = addr_set;
> +      INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
> +      return false;
> +    }

Can you use the normal undo mechanisms, instead?  apply_change_group
etc.

> +static rtx_insn *
> +next_active_insn_in_basic_block (rtx_insn *insn)
> +{
> +  insn = NEXT_INSN (insn);
> +
> +  while (insn != NULL_RTX)
> +    {
> +      /* If the basic block ends or there is a jump of some kind, exit the
> +         loop.  */
> +      if (CALL_P (insn)
> +          || JUMP_P (insn)
> +          || JUMP_TABLE_DATA_P (insn)
> +          || LABEL_P (insn)
> +          || BARRIER_P (insn))
> +        return NULL;
> +
> +      /* If this is a real insn, return it.  */
> +      if (!insn->deleted ()
> +          && NONJUMP_INSN_P (insn)
> +          && GET_CODE (PATTERN (insn)) != USE
> +          && GET_CODE (PATTERN (insn)) != CLOBBER)
> +        return insn;
> +
> +      /* Loop for USE, CLOBBER, DEBUG_INSN, NOTEs.  */
> +      insn = NEXT_INSN (insn);
> +    }
> +
> +  return NULL;
> +}

There are things to do this.  Please don't write code manually parsing
RTL unless you have to.

> +// Validate that a load is actually a single instruction that can be optimized
> +// with the PCREL_OPT optimization.
> +
> +static bool
> +is_single_instruction (rtx_insn *insn, rtx reg)

Of course it is a single instruction!  A single RTL instruction...
Clarify as "single machine instruction"?

> +{
> +  if (!REG_P (reg) && !SUBREG_P (reg))
> +    return false;

You need to check if it is a subreg of a reg, then.  There are subregs of
other things.  Not that you should see those here, but still.

> +  // _Decimal128 and IBM extended double are always multiple instructions.
> +  machine_mode mode = GET_MODE (reg);
> +  if (mode == TFmode && !TARGET_IEEEQUAD)
> +    return false;
> +
> +  if (mode == TDmode || mode == IFmode)
> +    return false;

Don't we have a helper for this?  The ibm128 part at least.

Maybe we should just have an attribute on the insns that *can* get this
optimisation?

> +  return (base_reg_operand (XEXP (addr, 0), Pmode)
> +          && satisfies_constraint_I (XEXP (addr, 1)));

short_cint_operand.  But maybe just rs6000_legitimate_offset_address_p?

>  /* Flags that need to be turned off if -mno-power10.  */
>  #define OTHER_POWER10_MASKS  (OPTION_MASK_MMA\
>| OPTION_MASK_PCREL\
> +  | OPTION_MASK_PCREL_OPT\
>| OPTION_MASK_PREFIXED)

This isn't a CPU flag.  Instead, it is just an optimisation option.


How to decide if a target supports relocations or not?

2020-08-20 Thread HAO CHEN GUI via Gcc-patches

Hi,

Some questions about relocations. Could anyone kindly help with them?
Thanks a lot.


1 If targetm.asm_out.reloc_rw_mask () returns 0, does that mean the target
doesn't support relocations in read-only sections, so it should be put in
a read-write section?


2 Here, does "read-only section" refer to the .rodata section or to the
.data.rel.ro section?


3 What's the relationship between the PIC flag and relocations?

Thanks again.



[committed] analyzer: add regression tests [PR95152]

2020-08-20 Thread David Malcolm via Gcc-patches
PR analyzer/95152 reports various ICEs in
region_model::get_or_create_mem_ref.

I removed this function as part of the state rewrite in
r11-2694-g808f4dfeb3a95f50f15e71148e5c1067f90a126d.
I've verified that these two test cases reproduce the issue with 10.2
and don't ICE with trunk; adding them as regression tests.

Successfully tested on x86_64-pc-linux-gnu.
Pushed to master as r11-2792-g6b31b6b52612a6d4a7a84e71f6331464d68400d4.

gcc/testsuite/ChangeLog:
PR analyzer/95152
* gcc.dg/analyzer/pr95152-4.c: New test.
* gcc.dg/analyzer/pr95152-5.c: New test.
---
 gcc/testsuite/gcc.dg/analyzer/pr95152-4.c | 11 +++
 gcc/testsuite/gcc.dg/analyzer/pr95152-5.c |  6 ++
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr95152-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr95152-5.c

diff --git a/gcc/testsuite/gcc.dg/analyzer/pr95152-4.c b/gcc/testsuite/gcc.dg/analyzer/pr95152-4.c
new file mode 100644
index 000..f2a72cad01c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr95152-4.c
@@ -0,0 +1,11 @@
+/* { dg-additional-options "-Wno-pointer-to-int-cast" } */
+extern void my_func (int);
+typedef struct {
+  int var;
+} info_t;
+extern void *_data_offs;
+void test()
+{
+  info_t *info = ((void *)((void *)1) + ((unsigned int)&_data_offs));
+  my_func(info->var == 0);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr95152-5.c b/gcc/testsuite/gcc.dg/analyzer/pr95152-5.c
new file mode 100644
index 000..604b78458c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr95152-5.c
@@ -0,0 +1,6 @@
+/* { dg-additional-options "-Wno-incompatible-pointer-types" } */
+void foo(void)
+{
+  void (*a[1]) ();
+  void (*p) () = a + 1;
+}
-- 
2.26.2



Re: [PATCH] improve validation of attribute arguments (PR c/78666)

2020-08-20 Thread Martin Sebor via Gcc-patches

On 8/20/20 3:00 PM, Aldy Hernandez wrote:
> First, didn't Marek say in the PR that the diagnostic code should go in
> diagnose_mismatched_attributes?


My understanding of the structure of the attribute handling code
is that, with just a few exceptions, for C and C++ it's pretty much
all in c-attribs.c.  That makes sense to me and I would prefer
to avoid spreading attribute-specific logic across other
files.  diagnose_mismatched_attributes predates the ability to
access the prior declaration of a function (via node[1] in
attribute handlers).  It's shrunk considerably since Marek added
it years ago as a result of the attribute exclusion framework.
If it's possible (I haven't checked) it might make sense to move
the attribute optimize validation logic (the only attribute it
still handles) from it to handle_optimize_attribute, which now
has access to the previous declaration via node[1].



> An overall comment-- could we write a generic validator rather than
> having to special case validation on a case by case basis?
>
> Is there a way of marking attributes as immutable if specified on the same
> decl?  For example, marking that alloc_size, nonnull, sentinel, etc.
> should never have differing versions of the same attribute for the same
> decl name?  And then you could go and validate each of the attributes
> so marked automatically.
>
> Or is this too much work?  Marek, as the C maintainer what are your
> thoughts? ;-)


I like the idea of coming up with general properties for attributes
and validating based on those rather than case by case.  An example
along these lines is the attribute exclusion framework I put in
place to reject mutually exclusive attributes (e.g., aligned and
packed).  Detection of conflicts between attribute arguments is
a natural extension of the same idea.

But to answer your question: extending the attribute handling
infrastructure does tend to be a lot of work because every change
to struct attribute_spec means updating all back ends (each member
of the struct has to be explicitly initialized in each back end's
attribute_table; that's silly because most of the members take on
default values but that's the way things are set up for now).  I'd
like to take some time to think about how to generalize the overall
design to make future enhancements like the one you suggest less
intrusive.  Until then, I think this is a worthwhile improvement
to make on its own.



> Regardless, here are some random comments.


Thanks for the careful review!


>> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
>> index 37214831538..bc4f409e346 100644
>> --- a/gcc/c-family/c-attribs.c
>> +++ b/gcc/c-family/c-attribs.c
>> @@ -720,6 +725,124 @@ positional_argument (const_tree fntype, const_tree atname, tree pos,
>>
>>    return pos;
>>  }
>>
>> +/* Given a pair of NODEs for arbitrary DECLs or TYPEs, validate one or
>> +   two integral or string attribute arguments NEWARGS to be applied to
>> +   NODE[0] for the absence of conflicts with the same attribute arguments
>> +   already applied to NODE[1]. Issue a warning for conflicts and return
>> +   false.  Otherwise, when no conflicts are found, return true.  */
>> +
>> +static bool
>> +validate_attr_args (tree node[2], tree name, tree newargs[2])


> I think you're doing too much work in one function.  Also, I *really*
> dislike sending pairs of objects in arrays, especially when they're
> called something so abstract as "node" and "newargs".
>
> Would it be possible to make this function only validate one single
> argument and call it twice?  Or do we gain something by having it do two
> things at once?


I agree about the name "node."  The argument comes from the attribute
handlers: they all take something called a node as their first argument.
It's an array of three elements:
  [0] the current declaration or type
  [1] the previous declaration or type or null
  [2] the current declaration if [0] is a type
validate_attr_args() is called with the same node as the handlers
and uses both node[0] and node[1] to recursively validate the first
one against itself and then against the second.  It could be changed
to take two arguments instead of an array (the new "node" and
the original "node," perhaps even under some better name).  That
would make it different from every handler but maybe that wouldn't
be a problem.

The newargs argument is also an array, with the second element
being optional.  Both elements are used and validated against
the attribute arguments on the same declaration first and again
on the previous one.  The array could be split up into two
distinct arguments, say newarg1 and newarg2, or passed in as
std::pair.  I'm not sure I see much of a difference
between the approaches.

I can't think of a good way to call the function twice, once for
each attribute argument.  It's part of the validation to make sure
that the second (optional) argument is provided either in both
instances of the same attribute and with the same value, or in
neither.  In the case of a conflict, the function also 

Re: [PATCH 0/3] Power10 PCREL_OPT support

2020-08-20 Thread Segher Boessenkool
Hi!

On Tue, Aug 18, 2020 at 02:31:41AM -0400, Michael Meissner wrote:
> Currently on power10, the compiler compiles this as:
> 
>   ret_var:
>   pld 9,ext_variable@got@pcrel
>   lwa 3,0(9)
>   blr
> 
>   store_var:
>   pld 9,ext_variable@got@pcrel
>   stw 3,0(9)
>   blr
> 
> That is, it loads up the address of 'ext_variable' from the GOT table into
> register r9, and then uses r9 as a base register to reference the actual
> variable.
> 
> The linker does optimize the case where you are compiling the main program, and
> the variable is also defined in the main program to be:
> 
>   ret_var:
>   pla 9,ext_variable,1
>   lwa 3,0(9)
>   blr
> 
>   store_var:
>   pla 9,ext_variable,1
>   stw 3,0(9)
>   blr

Those "pla" insns are invalid; please correct them?  (You mixed "pla"
and "paddi" syntax I think.)

> These patches generate:
> 
>   ret_var:
>   pld 9,ext_variable@got@pcrel
>   .Lpcrel1:
>   .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
>   lwa 3,0(9)
>   blr
> 
>   store_var:
>   pld 9,ext_variable@got@pcrel
>   .Lpcrel2:
>   .reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8)
>   stw 3,0(9)
>   blr
> 
> Note, the label for locating the PLD occurs after the PLD and not before it.
> This is so that if the assembler adds a NOP in front of the PLD to align it,
> the relocations will still work.
> 
> If the linker can, it will convert the code into:
> 
>   ret_var:
>   plwa 3,ext_variable,1
>   nop
>   blr
> 
>   store_var:
>   pstw 3,ext_variable,1
>   nop
>   blr

Those "plwa" and "pstw" are invalid syntax as well (should have "(0)"
after the symbol name).

> These patches allow the load of the address to not be physically adjacent to
> the actual load or store, which should allow for better code.

Why is that?  That is not what it does anyway?  /confused

> In order to do this, the pass that converts the load address and load/store
> must occur late in the compilation cycle.

That does not follow afaics.

> In particular, the second scheduler
> pass will duplicate and optimize some of the references and it will produce an
> invalid program.  In the past, Segher has said that we should be able to move
> it earlier.

I said that you shouldn't require this to be the very last pass.  There
is no reason for that, and that will not scale (what if a second pass
shows up that also requires this!)

It also makes it impossible to do normal late optimisations on code
produced here (optimisations like peephole, cprop_hardreg, dce).

I also said that you should use the DF framework, not parse all RTL by
hand and get it all wrong, as *everyone* does: this stuff is hard.


Segher


[committed] d: Merge upstream dmd 1b5a53d01.

2020-08-20 Thread Iain Buclaw via Gcc-patches
Hi,

This patch fixes an ICE in setValue at dmd/dinterpret.c:7046

This was originally seen when running the testsuite for a 16-bit target;
however, it could be reproduced on 32-bit targets using long[] as well.

Regstrapped on x86_64-linux-gnu/-m32/-mx32, committed to mainline and
backported to gcc-10 branch.

Regards
Iain.

---
gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 1b5a53d01.
---
 gcc/d/dmd/MERGE   |  2 +-
 gcc/d/dmd/ctfeexpr.c  |  2 +-
 gcc/d/dmd/dinterpret.c|  9 -
 .../gdc.test/compilable/interpret3.d  | 38 +++
 .../gdc.test/fail_compilation/reg6769.d   | 29 ++
 5 files changed, 69 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gdc.test/fail_compilation/reg6769.d

diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index daa3e565ff7..d0e5f442247 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-c2274e56a3220ea636c6199fd06cd54fcdf6bad9
+1b5a53d01c465109ce47edf49ace6143b69b118b
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/ctfeexpr.c b/gcc/d/dmd/ctfeexpr.c
index 5230647e626..ee38033ac82 100644
--- a/gcc/d/dmd/ctfeexpr.c
+++ b/gcc/d/dmd/ctfeexpr.c
@@ -1913,7 +1913,7 @@ bool isCtfeValueValid(Expression *newval)
 // e1 should be a CTFE reference
 Expression *e1 = ((AddrExp *)newval)->e1;
 return tb->ty == Tpointer &&
-   ((e1->op == TOKstructliteral && isCtfeValueValid(e1)) ||
+   (((e1->op == TOKstructliteral || e1->op == TOKarrayliteral) && isCtfeValueValid(e1)) ||
 (e1->op == TOKvar) ||
 (e1->op == TOKdotvar && isCtfeReferenceValid(e1)) ||
 (e1->op == TOKindex && isCtfeReferenceValid(e1)) ||
diff --git a/gcc/d/dmd/dinterpret.c b/gcc/d/dmd/dinterpret.c
index dd1105c03bd..74c5b40741f 100644
--- a/gcc/d/dmd/dinterpret.c
+++ b/gcc/d/dmd/dinterpret.c
@@ -1947,15 +1947,6 @@ public:
 Type *elemtype = ((TypeArray *)(val->type))->next;
 d_uns64 elemsize = elemtype->size();
 
-// It's OK to cast from fixed length to dynamic array, eg [3] to int[]*
-if (val->type->ty == Tsarray && pointee->ty == Tarray &&
-elemsize == pointee->nextOf()->size())
-{
-new(pue) AddrExp(e->loc, val, e->type);
-result = pue->exp();
-return;
-}
-
 // It's OK to cast from fixed length to fixed length array, eg [n] to int[d]*.
 if (val->type->ty == Tsarray && pointee->ty == Tsarray &&
 elemsize == pointee->nextOf()->size())
diff --git a/gcc/testsuite/gdc.test/compilable/interpret3.d b/gcc/testsuite/gdc.test/compilable/interpret3.d
index 14d1a12c240..6e7304d742e 100644
--- a/gcc/testsuite/gdc.test/compilable/interpret3.d
+++ b/gcc/testsuite/gdc.test/compilable/interpret3.d
@@ -3235,6 +3235,44 @@ int ctfeSort6250()
 
 static assert(ctfeSort6250() == 57);
 
+/**/
+
+long[]* simple6250b(long[]* x) { return x; }
+
+void swap6250b(long[]* lhs, long[]* rhs)
+{
+long[] kk = *lhs;
+assert(simple6250b(lhs) == lhs);
+lhs = simple6250b(lhs);
+assert(kk[0] == 18);
+assert((*lhs)[0] == 18);
+assert((*rhs)[0] == 19);
+*lhs = *rhs;
+assert((*lhs)[0] == 19);
+*rhs = kk;
+assert(*rhs == kk);
+assert(kk[0] == 18);
+assert((*rhs)[0] == 18);
+}
+
+long ctfeSort6250b()
+{
+ long[][2] x;
+ long[3] a = [17, 18, 19];
+ x[0] = a[1 .. 2];
+ x[1] = a[2 .. $];
+ assert(x[0][0] == 18);
+ assert(x[0][1] == 19);
+ swap6250b([0], [1]);
+ assert(x[0][0] == 19);
+ assert(x[1][0] == 18);
+ a[1] = 57;
+ assert(x[0][0] == 19);
+ return x[1][0];
+}
+
+static assert(ctfeSort6250b() == 57);
+
 /**
 6672 circular references in array
 **/
diff --git a/gcc/testsuite/gdc.test/fail_compilation/reg6769.d b/gcc/testsuite/gdc.test/fail_compilation/reg6769.d
new file mode 100644
index 000..b11fac925a0
--- /dev/null
+++ b/gcc/testsuite/gdc.test/fail_compilation/reg6769.d
@@ -0,0 +1,29 @@
+/*
+TEST_OUTPUT
+---
+fail_compilation/reg6769.d(14): Error: reinterpreting cast from `int[]` to `int[7]*` is not supported in CTFE
+fail_compilation/reg6769.d(27):called from here: `reg6769a([0, 1, 2, 3, 4, 5, 6])`
+fail_compilation/reg6769.d(27):while evaluating: `static assert(reg6769a([0, 1, 2, 3, 4, 5, 6]) == 1)`
+fail_compilation/reg6769.d(20): Error: reinterpreting cast from `int[7]` to `int[]*` is not supported in CTFE
+fail_compilation/reg6769.d(28):called from here: `reg6769b([0, 1, 2, 3, 4, 5, 6])`
+fail_compilation/reg6769.d(28):while evaluating: `static assert(reg6769b([0, 1, 2, 3, 4, 

Re: [Patch, fortran, v2] PR fortran/96728 - Fatal Error: Reading module inquiry functions on assumed-rank

2020-08-20 Thread José Rui Faustino de Sousa via Gcc-patches

Hi all!

Exactly the same thing, only actually including the patch this time.

Sorry for the mishap.

Thank you very much.

Best regards,
José Rui


On 20/08/20 19:33, José Rui Faustino de Sousa wrote:

Hi all!

Proposed patch to PR96728 - Fatal Error: Reading module inquiry 
functions on assumed-rank.


Patch tested only on x86_64-pc-linux-gnu.

The rank of the argument to specification functions is written to the
module file, but since the value is negative for assumed-rank arrays,
reading the module back fails.


So the patch adds code to handle signed integers.

Thank you very much.

Best regards,
José Rui


2020-8-20  José Rui Faustino de Sousa  

  PR fortran/96728
  * module.c (module_peek_char): Peek ahead function.
  (parse_integer): Add code for parsing signed integers.
  (parse_atom): Add code to handle signed integers.
  (peek_atom): Add code to handle signed integers.

2020-8-20  José Rui Faustino de Sousa  

  PR fortran/96728
  * PR96728.f90: New test.


diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 5114d55..b06cebb 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -1234,6 +1234,13 @@ get_module_locus (module_locus *m)
   m->pos = module_pos;
 }
 
+/* Peek at the next character in the module.  */
+
+static int
+module_peek_char (void)
+{
+  return module_content[module_pos];
+}
 
 /* Get the next character in the module, updating our reckoning of
where we are.  */
@@ -1314,7 +1321,19 @@ parse_string (void)
 static void
 parse_integer (int c)
 {
-  atom_int = c - '0';
+  int sign = 1;
+
+  atom_int = 0;
+  switch (c)
+{
+case ('-'):
+  sign = -1;
+case ('+'):
+  break;
+default:
+  atom_int = c - '0';
+  break;
+}
 
   for (;;)
 {
@@ -1328,6 +1347,7 @@ parse_integer (int c)
   atom_int = 10 * atom_int + c - '0';
 }
 
+  atom_int *= sign; 
 }
 
 
@@ -1401,6 +1421,16 @@ parse_atom (void)
   parse_integer (c);
   return ATOM_INTEGER;
 
+case '+':
+case '-':
+  if (ISDIGIT (module_peek_char ()))
+	{
+	  parse_integer (c);
+	  return ATOM_INTEGER;
+	}
+  else
+	bad_module ("Bad name");
+
 case 'a':
 case 'b':
 case 'c':
@@ -1504,6 +1534,16 @@ peek_atom (void)
   module_unget_char ();
   return ATOM_INTEGER;
 
+case '+':
+case '-':
+  if (ISDIGIT (module_peek_char ()))
+	{
+	  module_unget_char ();
+	  return ATOM_INTEGER;
+	}
+  else
+	bad_module ("Bad name");
+
 case 'a':
 case 'b':
 case 'c':
diff --git a/gcc/testsuite/gfortran.dg/PR96728.f90 b/gcc/testsuite/gfortran.dg/PR96728.f90
new file mode 100644
index 000..4caa3a5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR96728.f90
@@ -0,0 +1,49 @@
+! { dg-do run }
+!
+! Test the fix for PR96728
+!
+
+module cref_m
+
+  implicit none
+
+  private
+
+  public ::   &
+isub_a_m
+  
+contains
+
+  subroutine isub_a_m(a, b)
+integer, intent(in)  :: a(..)
+integer, intent(out) :: b(size(a))
+
+integer :: i
+
+b = [(i, i=1,size(b))]
+return
+  end subroutine isub_a_m
+  
+end module cref_m
+
+program cref_p
+
+  use cref_m, only: &
+isub_a_m
+
+  implicit none
+  
+  integer:: i
+
+  integer, parameter :: n = 3
+  integer, parameter :: p(*) = [(i, i=1,n*n)]
+  
+  integer :: a(n,n)
+  integer :: b(n*n)
+
+  a = reshape(p, shape=[n,n])
+  call isub_a_m(a, b)
+  if (any(b/=p)) stop 1
+  stop
+
+end program cref_p


[Patch, fortran] PR fortran/94110 - Passing an assumed-size to an assumed-shape argument should be rejected

2020-08-20 Thread José Rui Faustino de Sousa via Gcc-patches

Hi all!

Proposed patch to PR94110 - Passing an assumed-size to an assumed-shape 
argument should be rejected.


Patch tested only on x86_64-pc-linux-gnu.

Add code to also raise an error when a deferred-shape dummy argument, or
an assumed-rank pointer dummy argument, is passed an assumed-size array
as its actual argument (allocatable dummy arguments are checked elsewhere).


Thank you very much.

Best regards,
José Rui


2020-8-20  José Rui Faustino de Sousa  

 PR fortran/94110
 * interface.c (gfc_compare_actual_formal): Also raise the "actual
 argument cannot be an assumed-size array" error when the dummy
 argument is deferred-shape or an assumed-rank pointer.

2020-8-20  José Rui Faustino de Sousa  

 PR fortran/94110
 * PR94110.f90: New test.
diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index 7985fc7..020cdd7 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -3303,7 +3303,10 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
 	  return false;
 	}
 
-  if (f->sym->as && f->sym->as->type == AS_ASSUMED_SHAPE
+  if (f->sym->as
+	  && (f->sym->as->type == AS_ASSUMED_SHAPE
+	  || f->sym->as->type == AS_DEFERRED
+	  || (f->sym->as->type == AS_ASSUMED_RANK && f->sym->attr.pointer))
 	  && a->expr->expr_type == EXPR_VARIABLE
 	  && a->expr->symtree->n.sym->as
 	  && a->expr->symtree->n.sym->as->type == AS_ASSUMED_SIZE
diff --git a/gcc/testsuite/gfortran.dg/PR94110.f90 b/gcc/testsuite/gfortran.dg/PR94110.f90
new file mode 100644
index 000..9ec70ec
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR94110.f90
@@ -0,0 +1,88 @@
+! { dg-do compile }
+!
+! Test the fix for PR94110
+! 
+  
+program asa_p
+
+  implicit none
+
+  integer, parameter :: n = 7
+
+  integer :: p(n)
+  integer :: s
+
+  p = 1
+  s = sumf_as(p)
+  if (s/=n) stop 1
+  s = sumf_ar(p)
+  if (s/=n) stop 2
+  stop
+
+contains
+
+  function sumf_as(a) result(s)
+integer, target, intent(in) :: a(*)
+
+integer :: s
+
+s = sum_as(a)   ! { dg-error "Actual argument for .a. cannot be an assumed-size array" } 
+s = sum_p_ds(a) ! { dg-error "Actual argument for .a. cannot be an assumed-size array" } 
+s = sum_p_ar(a) ! { dg-error "Actual argument for .a. cannot be an assumed-size array" } 
+return
+  end function sumf_as
+
+  function sumf_ar(a) result(s)
+integer, target, intent(in) :: a(..)
+
+integer :: s
+
+select rank(a)
+rank(*)
+  s = sum_as(a)   ! { dg-error "Actual argument for .a. cannot be an assumed-size array" } 
+  s = sum_p_ds(a) ! { dg-error "Actual argument for .a. cannot be an assumed-size array" } 
+  s = sum_p_ar(a) ! { dg-error "Actual argument for .a. cannot be an assumed-size array" } 
+rank default
+  stop 3
+end select
+return
+  end function sumf_ar
+
+  function sum_as(a) result(s)
+integer, intent(in) :: a(:)
+  
+integer :: s
+
+s = sum(a)
+return
+  end function sum_as
+
+  function sum_p_ds(a) result(s)
+integer, pointer, intent(in) :: a(:)
+  
+integer :: s
+
+s = -1
+if(associated(a))&
+  s = sum(a)
+return
+  end function sum_p_ds
+
+  function sum_p_ar(a) result(s)
+integer, pointer, intent(in) :: a(..)
+  
+integer :: s
+
+s = -1
+select rank(a)
+rank(1)
+  if(associated(a))&
+s = sum(a)
+rank default
+  stop 4
+end select
+return
+  end function sum_p_ar
+
+end program asa_p
+


Re: [Patch 5/5] rs6000, Conversions between 128-bit integer and floating point values.

2020-08-20 Thread Segher Boessenkool
Hi!

On Tue, Aug 11, 2020 at 12:23:13PM -0700, Carl Love wrote:
[ Perfect stuff, or I don't see anything anyway! ]

Okay for trunk.  Thank you!


Segher


Re: [PATCH 4/6] Add `+' for Jobserver Integration

2020-08-20 Thread Joseph Myers
On Thu, 20 Aug 2020, Giuliano Belinassi via Gcc-patches wrote:

>  libbacktrace/Makefile.in |   2 +-
>  zlib/Makefile.in |  64 ++--

These directories use makefiles generated by automake.  Rather than 
modifying the generated files, you need to modify the sources (whether 
that's Makefile.am, or code in automake itself - if in automake itself, we 
should wait for an actual new automake release before updating the version 
used in GCC).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] libgccjit: Fix several memory leaks in the driver

2020-08-20 Thread Joseph Myers
On Thu, 9 Jul 2020, Alex Coplan wrote:

> 2020-07-09  Alex Coplan  
> 
>   * gcc.c (set_static_spec): New.
>   (set_static_spec_owned): New.
>   (set_static_spec_shared): New.
>   (driver::maybe_putenv_COLLECT_LTO_WRAPPER): Use
>   set_static_spec_owned() to take ownership of lto_wrapper_file
>   such that it gets freed in driver::finalize.
>   (driver::maybe_run_linker): Use set_static_spec_shared() to
>   ensure that we don't try and free() the static string "ld",
>   also ensuring that any previously-allocated string in
>   linker_name_spec is freed. Likewise with argv0.
>   (driver::finalize): Use set_static_spec_shared() when resetting
>   specs that previously had allocated strings; remove if(0)
>   around call to free().

OK with a comment added to set_static_spec, documenting the semantics of 
the function and its arguments.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 4/6] Add `+' for Jobserver Integration

2020-08-20 Thread Giuliano Belinassi via Gcc-patches
GNU Make expects a `+' token at the beginning of a rule's command
if that command is to interact with the Jobserver [1]. This commit
adds such a token to the Makefiles in GCC.

[1] https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html#POSIX-Jobserver

gcc/ChangeLog:
intl/ChangeLog:
libbacktrace/ChangeLog:
libcpp/ChangeLog:
libdecnumber/ChangeLog:
libiberty/ChangeLog:
zlib/ChangeLog:

2020-08-20  Giuliano Belinassi  

* Makefile.in: Use `+' on rule calling GCC.
---
 gcc/Makefile.in  |   4 +-
 intl/Makefile.in |   2 +-
 libbacktrace/Makefile.in |   2 +-
 libcpp/Makefile.in   |   2 +-
 libdecnumber/Makefile.in |   2 +-
 libiberty/Makefile.in| 212 +++
 zlib/Makefile.in |  64 ++--
 7 files changed, 144 insertions(+), 144 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index c00617cfc1a..2e7aa4b6d30 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2703,14 +2703,14 @@ generated_files = config.h tm.h $(TM_P_H) $(TM_D_H) $(TM_H) multilib.h \
 # How to compile object files to run on the build machine.
 
 build/%.o :  # dependencies provided by explicit rule later
-   $(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \
+   +$(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \
-o $@ $<
 
 ## build/version.o is compiled by the $(COMPILER_FOR_BUILD) but needs
 ## several C macro definitions, just like version.o
 build/version.o:  version.c version.h \
   $(REVISION) $(DATESTAMP) $(BASEVER) $(DEVPHASE)
-   $(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \
+   +$(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \
-DBASEVER=$(BASEVER_s) -DDATESTAMP=$(DATESTAMP_s) \
-DREVISION=$(REVISION_s) \
-DDEVPHASE=$(DEVPHASE_s) -DPKGVERSION=$(PKGVERSION_s) \
diff --git a/intl/Makefile.in b/intl/Makefile.in
index 356c8ab9b65..de95846cc1d 100644
--- a/intl/Makefile.in
+++ b/intl/Makefile.in
@@ -131,7 +131,7 @@ libintl.h: $(srcdir)/libgnuintl.h
 .SUFFIXES: .c .y .o
 
 .c.o:
-   $(COMPILE) $<
+   +$(COMPILE) $<
 
 .y.c:
 @BISON3_YES@   echo '#define USE_BISON3' > $(patsubst %.c,%-config.h,$@)
diff --git a/libbacktrace/Makefile.in b/libbacktrace/Makefile.in
index b244ca10a4a..08212bb8ac0 100644
--- a/libbacktrace/Makefile.in
+++ b/libbacktrace/Makefile.in
@@ -1326,7 +1326,7 @@ distclean-compile:
-rm -f *.tab.c
 
 .c.o:
-   $(AM_V_CC)$(COMPILE) -c -o $@ $<
+   +$(AM_V_CC)$(COMPILE) -c -o $@ $<
 
 .c.obj:
$(AM_V_CC)$(COMPILE) -c -o $@ `$(CYGPATH_W) '$<'`
diff --git a/libcpp/Makefile.in b/libcpp/Makefile.in
index 5fbba9b9c76..eaffeedf31a 100644
--- a/libcpp/Makefile.in
+++ b/libcpp/Makefile.in
@@ -223,7 +223,7 @@ endif
 # Implicit rules and I18N
 
 .c.o:
-   $(COMPILE) $<
+   +$(COMPILE) $<
$(POSTCOMPILE)
 
 # N.B. We do not attempt to copy these into $(srcdir).
diff --git a/libdecnumber/Makefile.in b/libdecnumber/Makefile.in
index 9da028d7f2f..2192e7434ad 100644
--- a/libdecnumber/Makefile.in
+++ b/libdecnumber/Makefile.in
@@ -191,7 +191,7 @@ COMPILE = source='$<' object='$@' libtool=no $(CC) $(DEFS) $(INCLUDES) $(CPPFLAG
 # Implicit rules
 
 .c.$(objext):
-   $(COMPILE) $<
+   +$(COMPILE) $<
 
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
diff --git a/libiberty/Makefile.in b/libiberty/Makefile.in
index 895f701bcd0..fe85363ba2a 100644
--- a/libiberty/Makefile.in
+++ b/libiberty/Makefile.in
@@ -419,7 +419,7 @@ etags tags TAGS: etags-subdir
 demangle: $(ALL) $(srcdir)/cp-demangle.c
@echo "The standalone demangler, now named c++filt, is now"
@echo "a part of binutils."
-   $(CC) @DEFS@ $(CFLAGS) $(CPPFLAGS) -I. -I$(INCDIR) $(HDEFINES) \
+   +$(CC) @DEFS@ $(CFLAGS) $(CPPFLAGS) -I. -I$(INCDIR) $(HDEFINES) \
  $(srcdir)/cp-demangle.c -DSTANDALONE_DEMANGLER $(TARGETLIB) -o $@
 
 ls:
@@ -739,7 +739,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
if [ x"$(NOASANFLAG)" != x ]; then \
  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/dyn-string.c -o noasan/$@; \
else true; fi
-   $(COMPILE.c) $(srcdir)/dyn-string.c $(OUTPUT_OPTION)
+   +$(COMPILE.c) $(srcdir)/dyn-string.c $(OUTPUT_OPTION)
 
 ./fdmatch.$(objext): $(srcdir)/fdmatch.c config.h $(INCDIR)/ansidecl.h \
$(INCDIR)/libiberty.h
@@ -749,7 +749,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
if [ x"$(NOASANFLAG)" != x ]; then \
  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/fdmatch.c -o noasan/$@; \
else true; fi
-   $(COMPILE.c) $(srcdir)/fdmatch.c $(OUTPUT_OPTION)
+   +$(COMPILE.c) $(srcdir)/fdmatch.c $(OUTPUT_OPTION)
 
 ./ffs.$(objext): $(srcdir)/ffs.c
if [ x"$(PICFLAG)" != x ]; then \
@@ -758,7 +758,7 @@ $(CONFIGURED_OFILES): stamp-picdir 

[PATCH 2/6] Implement a new partitioner for parallel compilation

2020-08-20 Thread Giuliano Belinassi via Gcc-patches
When using the LTO infrastructure to compile files in parallel, we
can't simply use any of the existing LTO partitioners, since extra
dependency analysis is required to ensure that some nodes are
partitioned together correctly.

Therefore, here we implement a new partitioner called
"lto_merge_comdat_map" that performs all of the required analysis.
The partitioner works as follows:

1. We create a number of disjoint sets and insert each node into a
   separate set; these sets may be merged together later.

2. Find COMDAT groups, and mark them to be partitioned together.

3. Check all nodes that would require any COMDAT group to be
   copied to its partition (which we name "COMDAT frontier"),
   and mark them to be partitioned together.
   This avoids duplication of COMDAT groups and crashes in the LTO
   partitioning infrastructure.

4. Check if the user allows the partitioner to promote non-public
   functions or variables to global to improve parallelization
   opportunities at the cost of modifying the output code layout.

5. Balance the generated partitions for performance unless told not to.

Step 1 was chosen by design, so that we could use a union-find
data structure, which is known for being very fast at set union
operations.
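For readers unfamiliar with the structure, here is a minimal union-find
sketch in C (path compression plus union by rank).  It is not the
patch's "class union_find"; the names uf_init, uf_find, uf_union and
uf_count_sets are hypothetical, and symbol nodes are assumed to be
identified by indices 0..n-1.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical disjoint-set forest over symbol indices 0..n-1.  */
struct uf
{
  int *parent;
  int *rank;
  int n;
};

/* Step 1: every node starts in its own singleton set.  */
static int
uf_init (struct uf *u, int n)
{
  assert (n > 0);
  u->parent = malloc (n * sizeof (int));
  u->rank = calloc (n, sizeof (int));
  if (!u->parent || !u->rank)
    {
      free (u->parent);
      free (u->rank);
      return -1;
    }
  for (int i = 0; i < n; i++)
    u->parent[i] = i;
  u->n = n;
  return 0;
}

/* Return the representative of X's set, compressing the path.  */
static int
uf_find (struct uf *u, int x)
{
  int root = x;
  while (u->parent[root] != root)
    root = u->parent[root];
  while (u->parent[x] != root)
    {
      int next = u->parent[x];
      u->parent[x] = root;
      x = next;
    }
  return root;
}

/* Merge the sets of A and B, e.g. two members of one COMDAT group.  */
static void
uf_union (struct uf *u, int a, int b)
{
  int ra = uf_find (u, a), rb = uf_find (u, b);
  if (ra == rb)
    return;
  if (u->rank[ra] < u->rank[rb])
    {
      int t = ra;
      ra = rb;
      rb = t;
    }
  u->parent[rb] = ra;
  if (u->rank[ra] == u->rank[rb])
    u->rank[ra]++;
}

/* Number of disjoint sets left, i.e. candidate partition units.  */
static int
uf_count_sets (struct uf *u)
{
  int count = 0;
  for (int i = 0; i < u->n; i++)
    if (uf_find (u, i) == i)
      count++;
  return count;
}

static void
uf_free (struct uf *u)
{
  free (u->parent);
  free (u->rank);
}
```

Merging nodes {0,1} and {3,4} out of five leaves three sets, so at most
three partitions could be formed from them.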

For 3. to work properly, we also had to modify
lto_promote_cross_file_statics to handle this case.

The parameters --param=promote-statics and --param=balance-partitions
control 4. and 5., respectively.
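The message does not say which balancing heuristic step 5 uses.  As an
assumption only, the sketch below shows one common choice, the greedy
"largest set first, into the least-loaded partition" rule, applied to
merged sets with estimated insn sizes.  The function greedy_balance is
hypothetical and is not the patch's balance_partitions.

```c
#include <assert.h>
#include <stdlib.h>

struct item
{
  long size;
  int idx;
};

/* Sort by size, largest first.  */
static int
cmp_desc (const void *a, const void *b)
{
  const struct item *x = a, *y = b;
  return (x->size < y->size) - (x->size > y->size);
}

/* SIZES[i] is the estimated size of merged set I.  On return PART[i] is
   the partition (0..K-1) set I was assigned to and LOAD[p] is the total
   size placed in partition P.  Returns 0, or -1 on allocation failure.  */
static int
greedy_balance (const long *sizes, int n, int k, int *part, long *load)
{
  assert (n >= 0 && k > 0);
  struct item *items = malloc ((n ? n : 1) * sizeof *items);
  if (!items)
    return -1;
  for (int i = 0; i < n; i++)
    {
      items[i].size = sizes[i];
      items[i].idx = i;
    }
  qsort (items, n, sizeof *items, cmp_desc);

  for (int p = 0; p < k; p++)
    load[p] = 0;
  for (int i = 0; i < n; i++)
    {
      /* Pick the currently lightest partition.  */
      int best = 0;
      for (int p = 1; p < k; p++)
        if (load[p] < load[best])
          best = p;
      part[items[i].idx] = best;
      load[best] += items[i].size;
    }
  free (items);
  return 0;
}
```

With sizes {50, 30, 20, 20, 10} and two partitions this yields loads of
70 and 60.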

gcc/ChangeLog:
2020-08-20  Giuliano Belinassi  

* Makefile.in: Add lto-partition.o
* cgraph.h (struct symtab_node::aux2): New variable.
* lto-partition.c: Move from gcc/lto/lto-partition.c
(add_symbol_to_partition_1): Only compute insn size
if information is available.
(node_cmp): Same as above.
(class union_find): New.
(ds_print_roots): New function.
(balance_partitions): New function.
(build_ltrans_partitions): New function.
(merge_comdat_nodes): New function.
(merge_static_calls): New function.
(merge_contained_symbols): New function.
(lto_merge_comdat_map): New function.
(privatize_symbol_name_1): Handle when WPA is not enabled.
(privatize_symbol_name): Same as above.
(lto_promote_cross_file_statics): New parameter to select when
to promote to global.
(lto_check_usage_from_other_partitions): New function.
* lto-partition.h: Move from gcc/lto/lto-partition.h
(lto_promote_cross_file_statics): Update prototype.
(lto_check_usage_from_other_partitions): Declare.
(lto_merge_comdat_map): Declare.

gcc/lto/ChangeLog:
2020-08-20  Giuliano Belinassi  

* lto-partition.c: Move to gcc/lto-partition.c.
* lto-partition.h: Move to gcc/lto-partition.h.
* lto.c: Update call to lto_promote_cross_file_statics.
* Makefile.in: Remove lto-partition.o.
---
 gcc/Makefile.in   |   1 +
 gcc/cgraph.h  |   1 +
 gcc/{lto => }/lto-partition.c | 463 +-
 gcc/{lto => }/lto-partition.h |   4 +-
 gcc/lto/Make-lang.in  |   4 +-
 gcc/lto/lto.c |   2 +-
 gcc/params.opt|   8 +
 gcc/tree.c|  23 +-
 8 files changed, 489 insertions(+), 17 deletions(-)
 rename gcc/{lto => }/lto-partition.c (78%)
 rename gcc/{lto => }/lto-partition.h (89%)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 79e854aa938..be42b15f4ff 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1457,6 +1457,7 @@ OBJS = \
lra-spills.o \
lto-cgraph.o \
lto-streamer.o \
+   lto-partition.o \
lto-streamer-in.o \
lto-streamer-out.o \
lto-section-in.o \
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0211f08964f..b4a7871bd3d 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -615,6 +615,7 @@ public:
   struct lto_file_decl_data * lto_file_data;
 
   PTR GTY ((skip)) aux;
+  int aux2;
 
   /* Comdat group the symbol is in.  Can be private if GGC allowed that.  */
   tree x_comdat_group;
diff --git a/gcc/lto/lto-partition.c b/gcc/lto-partition.c
similarity index 78%
rename from gcc/lto/lto-partition.c
rename to gcc/lto-partition.c
index 8e0488ab13e..ca962e69b5d 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto-partition.c
@@ -170,7 +170,11 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
 {
   struct cgraph_edge *e;
   if (!node->alias && c == SYMBOL_PARTITION)
-   part->insns += ipa_size_summaries->get (cnode)->size;
+   {
+ /* FIXME: Find out why this is being returned NULL in some cases.  */
+ if (ipa_size_summaries->get (cnode))
+   part->insns += ipa_size_summaries->get (cnode)->size;
+   }
 
   /* Add all inline clones and callees that are duplicated.  */
   for (e = cnode->callees; e; e = e->next_callee)
@@ -372,6 +376,402 @@ lto_max_map (void)
 new_partition ("empty");
 }
 
+/* Class implementing a 

[PATCH 3/6] Implement fork-based parallelism engine

2020-08-20 Thread Giuliano Belinassi via Gcc-patches
This patch belongs to the "Parallelize GCC with Processes" series.

Here, we implement the parallelism by forking the compiler into
multiple processes after what would be the LTO LTRANS stage,
partitioning the callgraph into several partitions, as implemented in
"maybe_compile_in_parallel". From a high level, what happens is:

1. If the partitioner manages to generate multiple partitions, the
compiler will then call lto_promote_cross_file_statics to compute
the partition boundary, and symbols are promoted to global only if
promote_statics is set to true. This option is controlled by the user
through --param=promote-statics, which is disabled by default.

2. The compiler will initialize the file passed by the driver through
the hidden "-fsplit-outputs=" option, creating that file.

3. The compiler will fork into multiple processes and apply the
allocated partition to the symbol table, removing every node which
is unnecessary for the partition.

4. The parent process waits for all child processes to finish, and then
calls exit (0).
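Steps 3 and 4 boil down to the classic fork/wait pattern.  A minimal
POSIX sketch, assuming a per-partition callback; the names
compile_partitions_in_parallel and WORK are hypothetical and stand in
for "apply the partition mask and compile".  Unlike the patch, which
calls exit (0) in the parent, this sketch returns the number of failed
children so the caller can decide what to do.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork one child per partition; child I runs WORK (I) and exits with
   EXIT_SUCCESS if it returned 0.  The parent then reaps all children and
   returns how many failed, counting a child killed by a signal (e.g. a
   crashing cc1) as a failure.  wait () reaps any child, so this assumes
   the process has no other children.  */
static int
compile_partitions_in_parallel (int nparts, int (*work) (int))
{
  assert (nparts > 0);
  int failures = 0;
  int launched = 0;

  for (int i = 0; i < nparts; i++)
    {
      pid_t pid = fork ();
      if (pid < 0)
        {
          failures++;   /* Could not fork: count the partition as failed.  */
          continue;
        }
      if (pid == 0)
        /* Child: _exit avoids flushing stdio buffers inherited from the
           parent a second time.  */
        _exit (work (i) == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
      launched++;
    }

  for (int i = 0; i < launched; i++)
    {
      int status;
      if (wait (&status) < 0)
        {
          failures++;
          continue;
        }
      if (WIFSIGNALED (status)
          || (WIFEXITED (status) && WEXITSTATUS (status) != 0))
        failures++;
    }
  return failures;
}
```

For a quick experiment, any function of type int (int) can stand in for
the per-partition work; abs, for example, "fails" for every partition
number except 0.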

For implementing 3., however, we had to do some more detailed analysis
and figure out a way to correctly remove reachable nodes from the callgraph
without corrupting any other node. LTO does this by simply throwing
everything into files and reloading it, but we had to avoid this
because that would result in a huge overhead. We implemented this in
"lto_apply_partition_mask" by classifying each node according to
a dependency analysis:

* Start by trusting what lto_promote_cross_file_statics
gave to us.

* Look for nodes which may need additional nodes to be
carried with them. For example, inline clones require that their bodies
be kept present, so we have to expand the boundary a little by adding
all nodes that they call.

* If the node is in the boundary, we release all unnecessary
information about it.  For varpool nodes, we have to declare them
external, otherwise we end up with multiple instances of the same
global variable in the program, which results in incorrect linking.

* Avoid duplicated release of function summaries (ipa-fnsummary).

* Finally, we had to delay the assembler file initialization,
delay any early assembler output to file, and remove any initialized
RTL code if a certain variable needs to be renamed.

We also implemented GNU Make jobserver integration for this mechanism,
in jobserver.cc. This works as follows:

* If -fparallel-jobs=jobserver, then we will query the existence of a
jobserver by calling jobserver_initialize. This method checks whether
the file descriptors provided by make are valid, and whether the read
file descriptor has the O_NONBLOCK flag set.

* Then, the parent process will return the token which Make
originally gave to it, since the child is blocked waiting for a
new token. To correctly block the child, there are two cases: (1)
when select is available on the host, and (2) when it is not. In
(1), we have to use it, since the read fd will have O_NONBLOCK. In
(2), we can simply read the fd, as the read is set to blocking mode.

* Once the child reads a token, it will then compile its part, and return
the token before finalizing. If the compilation crashes, however, the
parent process will correctly detect that a signal was sent to it, so
there is no need for any fancy crash control in the jobserver engine.
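GNU make's jobserver hands out tokens as single bytes over a pipe.  The
sketch below shows case (1), a select-based wait on a non-blocking read
end, plus returning a token.  The helper names set_nonblocking,
jobserver_acquire and jobserver_release are hypothetical and are not
the API of the patch's jobserver.cc.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

/* Put FD into non-blocking mode, as the jobserver read end may be.  */
static int
set_nonblocking (int fd)
{
  int flags = fcntl (fd, F_GETFL);
  if (flags < 0)
    return -1;
  return fcntl (fd, F_SETFL, flags | O_NONBLOCK);
}

/* Block in select () until RFD is readable, then try to take one token.
   Another process may win the race for the byte, in which case read ()
   fails with EAGAIN and we wait again.  Returns 0 and stores the token
   in *TOKEN, or -1 on error or when the write end has been closed.  */
static int
jobserver_acquire (int rfd, char *token)
{
  assert (token != NULL);
  for (;;)
    {
      fd_set rset;
      FD_ZERO (&rset);
      FD_SET (rfd, &rset);
      if (select (rfd + 1, &rset, NULL, NULL, NULL) < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }
      ssize_t n = read (rfd, token, 1);
      if (n == 1)
        return 0;
      if (n == 0)
        return -1;   /* EOF: the jobserver went away.  */
      if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
        return -1;
    }
}

/* Give TOKEN back so that another job may start.  */
static int
jobserver_release (int wfd, char token)
{
  ssize_t n;
  do
    n = write (wfd, &token, 1);
  while (n < 0 && errno == EINTR);
  return n == 1 ? 0 : -1;
}
```

A plain pipe is enough to exercise this: write one token into it,
acquire it back, then close the write end and observe that acquiring
fails instead of blocking forever.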

gcc/ChangeLog:
2020-08-20  Giuliano Belinassi  

* jobserver.cc: New file.
* jobserver.h: New file.
* cgraph.c (cgraph_node::maybe_release_dominators): New function.
* cgraph.h (symtab_node::find_by_order): Declare.
(symtab_node::find_by_name): Declare.
(symtab_node::find_by_asm_name): Declare.
(maybe_release_dominators): Declare.
* cgraphunit.c (cgraph_node::expand): Quickly return if body removed.
(ipa_passes): Run all_regular_ipa_passes if split_outputs.
(is_number): New function.
(childno): New variable.
(maybe_compile_in_parallel): New function.
* ipa-fnsummary.c (pass_ipa_free_fn_summary::gate): Avoid running twice
when compiling in parallel.
* ipa-icf.c (sem_item_optimizer::filter_removed_items): Behaviour when
compiling in parallel should be the same as if in LTO.
* ipa-visibility.c (localize_node): Same as above.
* lto-cgraph.c (handle_node_in_boundary): New function.
(compute_boundary): New function.
(lto_apply_partition_mask): New function.
* symtab.c (symbol_table::change_decl_assembler_name): Discard RTL decl
if name changed.
(symtab_node::dump_base): Dump aux2.
(symtab_node::find_by_order): New function.
(symtab_node::find_by_name): New function.
(symtab_node::find_by_asm_name): New function.
  

[PATCH 5/6] Add invoke documentation

2020-08-20 Thread Giuliano Belinassi via Gcc-patches
Add documentation about how to invoke GCC in order to use parallel
compilation.

gcc/ChangeLog:
2020-08-20  Giuliano Belinassi

* doc/invoke.texi: Document -fparallel-jobs=.
---
 gcc/doc/invoke.texi | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 70dc1ab73a1..18cebf99dfd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -504,7 +504,8 @@ Objective-C and Objective-C++ Dialects}.
 -fno-sched-spec  -fno-signed-zeros @gol
 -fno-toplevel-reorder  -fno-trapping-math  -fno-zero-initialized-in-bss @gol
 -fomit-frame-pointer  -foptimize-sibling-calls @gol
--fpartial-inlining  -fpeel-loops  -fpredictive-commoning @gol
+-fpartial-inlining  -fparallel-jobs=@var{alg} @gol
+-fpeel-loops  -fpredictive-commoning @gol
 -fprefetch-loop-arrays @gol
 -fprofile-correction @gol
 -fprofile-use  -fprofile-use=@var{path} -fprofile-partial-training @gol
@@ -14511,6 +14512,35 @@ of the function name, it is considered to be a match.  For C99 and C++
 extended identifiers, the function name must be given in UTF-8, not
 using universal character names.
 
+@item -fparallel-jobs=@var{n}
+@opindex fparallel-jobs
+This option is experimental.
+
+This option enables parallel compilation of files using a maximum of
+@var{n} parallel jobs.  When invoked, it tries to distribute the symbols
+within the file into multiple partitions and compile them in parallel.
+
+For now, private symbols are partitioned together with the public symbols
+that reference them, to avoid code layout modifications
+when compiling.  This means that compiling a file
+with very few public symbols will not provide noticeable improvements
+in compilation time.  However, you can use
+@option{--param=promote-statics=1} to allow GCC to automatically
+promote a symbol to be globally available, improving compilation
+performance in exchange for changes in code layout.
+
+You can also specify @option{-fparallel-jobs=jobserver} to use GNU make's
+job server mode to determine the number of parallel jobs.  This
+is useful when the Makefile calling GCC is already executing in parallel.
+You must prepend a @samp{+} to the command recipe in the parent Makefile
+for this to work.  This option likely only works if @env{MAKE} is
+GNU make.  If you specify @option{-fparallel-jobs=auto}, GCC will try to
+automatically detect a running GNU make's job server.
+
+An extra parameter, @option{--param=balance-partitions=0}, can be used to
+avoid balancing created partitions.  This should only be used to debug
+the compiler.
+
 @item -fpatchable-function-entry=@var{N}[,@var{M}]
 @opindex fpatchable-function-entry
 Generate @var{N} NOPs right at the beginning
-- 
2.28.0



[PATCH 6/6] New tests for parallel compilation feature

2020-08-20 Thread Giuliano Belinassi via Gcc-patches
Adds new tests for the parallel compilation engine.
They mainly test issues related to symbol promotion clashes and
incorrect early assembler output.

2020-08-20  Giuliano Belinassi  

* gcc.dg/parallel-early-constant.c: New test.
* gcc.dg/parallel-static-1.c: New test.
* gcc.dg/parallel-static-2.c: New test.
* gcc.dg/parallel-static-clash-1.c: New test.
* gcc.dg/parallel-static-clash-aux.c: New test.
---
 .../gcc.dg/parallel-early-constant.c  | 22 ++
 gcc/testsuite/gcc.dg/parallel-static-1.c  | 21 +
 gcc/testsuite/gcc.dg/parallel-static-2.c  | 21 +
 .../gcc.dg/parallel-static-clash-1.c  | 23 +++
 .../gcc.dg/parallel-static-clash-aux.c| 14 +++
 5 files changed, 101 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/parallel-early-constant.c
 create mode 100644 gcc/testsuite/gcc.dg/parallel-static-1.c
 create mode 100644 gcc/testsuite/gcc.dg/parallel-static-2.c
 create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-1.c
 create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-aux.c

diff --git a/gcc/testsuite/gcc.dg/parallel-early-constant.c b/gcc/testsuite/gcc.dg/parallel-early-constant.c
new file mode 100644
index 000..fc8c5a986ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/parallel-early-constant.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-fparallel-jobs=2 --param=balance-partitions=0" } */
+
+#define A "This is a long test that tests the structure initialization"
+#define B A,A
+#define C B,B,B,B
+#define D C,C,C,C
+
+const char *foo1 ()
+{
+  return A;
+}
+
+int foo2 ()
+{
+  return 42;
+}
+
+int main()
+{
+  char *subs[]={ D, D, D, D, D, D, D, D, D, D, D, D, D, D, D};
+}
diff --git a/gcc/testsuite/gcc.dg/parallel-static-1.c b/gcc/testsuite/gcc.dg/parallel-static-1.c
new file mode 100644
index 000..cf1cc7df93d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/parallel-static-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-fparallel-jobs=2 --param=balance-partitions=0" } */
+
+static int global_var;
+
+int foo1(void)
+{
+  global_var = 1;
+}
+
+int foo2(void)
+{
+  global_var = 2;
+}
+
+int main ()
+{
+  foo1 ();
+  foo2 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/parallel-static-2.c b/gcc/testsuite/gcc.dg/parallel-static-2.c
new file mode 100644
index 000..44f5b0d5a02
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/parallel-static-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-fparallel-jobs=2 --param=balance-partitions=0" } */
+
+int foo1(void)
+{
+  static int var;
+  var = 1;
+}
+
+int foo2(void)
+{
+  static int var;
+  var = 2;
+}
+
+int main ()
+{
+  foo1 ();
+  foo2 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/parallel-static-clash-1.c b/gcc/testsuite/gcc.dg/parallel-static-clash-1.c
new file mode 100644
index 000..37a01e28b1b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/parallel-static-clash-1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-options "-fparallel-jobs=2 --param=balance-partitions=0 --param=promote-statics=1" } */
+/* { dg-additional-sources "parallel-static-clash-aux.c" } */
+
+int file2_c ();
+
+static int __attribute__ ((noinline))
+private ()
+{
+  return 42;
+}
+
+int
+file1_c ()
+{
+  return private ();
+}
+
+int
+main ()
+{
+  return file1_c () + file2_c ();
+}
diff --git a/gcc/testsuite/gcc.dg/parallel-static-clash-aux.c b/gcc/testsuite/gcc.dg/parallel-static-clash-aux.c
new file mode 100644
index 000..aac473933a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/parallel-static-clash-aux.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-fparallel-jobs=2 --param=balance-partitions=0" } */
+
+static int __attribute__ ((noinline))
+private ()
+{
+  return -42;
+}
+
+int
+file2_c ()
+{
+  return private ();
+}
-- 
2.28.0



[PATCH 1/6] Modify gcc driver for parallel compilation

2020-08-20 Thread Giuliano Belinassi via Gcc-patches
Update the driver for parallel compilation. The process works as
follows:

When calling gcc, the driver will check if the flag
"-fparallel-jobs" was provided by the user. If so, it will then
check what the desired output is, and whether it can be parallelized.
The following cases are handled:

1. -S or -E was provided: We can't run in parallel, as the output
   cannot be easily merged together into one file.

2. -c was provided: When cc1* forks into multiple processes, it
   must tell the driver where it stored its generated assembler files.
   Therefore we pass a hidden "-fsplit-outputs=filename" to the compiler,
   and we check if "filename" was created by it. If so, we open it,
   call the assembler for each generated asm file
   (each such file must not be empty), and partially link the
   resulting objects into a single .o file. This process is done for each
   object file in the argument list.

3. -c was not provided, and the final product will be a binary: Here
   we proceed exactly as in 2., but we skip the partial
   linking, feeding the generated object files directly into the final link.
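The format of the -fsplit-outputs= file is not spelled out here.  As an
assumption (the ChangeLog's get_file_by_lines suggests a line-oriented
file), the sketch below treats it as one assembler file name per line
and hands each non-empty line to a callback, e.g. to queue an assembler
run.  for_each_split_output is a hypothetical name, not the driver's
code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read LIST_FILE, assumed to hold one asm file name per line, and call
   HANDLE (if non-NULL) on each non-empty name.  Returns the number of
   names found, or -1 if the file does not exist, which would mean cc1
   chose not to split and the normal serial path applies.  Lines longer
   than the buffer are split; a real implementation would not do that.  */
static int
for_each_split_output (const char *list_file,
                       void (*handle) (const char *asm_file, void *data),
                       void *data)
{
  assert (list_file != NULL);
  FILE *f = fopen (list_file, "r");
  if (!f)
    return -1;

  char line[4096];
  int count = 0;
  while (fgets (line, sizeof line, f))
    {
      line[strcspn (line, "\r\n")] = '\0';   /* Strip the newline.  */
      if (line[0] == '\0')
        continue;                            /* Skip blank lines.  */
      if (handle)
        handle (line, data);
      count++;
    }
  fclose (f);
  return count;
}
```

In case 2 each name would then be assembled and the objects partially
linked; in case 3 the objects would go straight to the final link.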

For that to work, we had to heavily modify how the "execute" function
works, extracting common code which is used multiple times, and
also detecting when the command is a call to a compiler or an
assembler, as can be seen in append_split_outputs.

Finally, we added some tests which reflect all cases found when
bootstrapping the compiler, so that further development of the
driver gets faster from now on.

gcc/ChangeLog
2020-08-20  Giuliano Belinassi  

* common.opt (fsplit-outputs): New flag.
(fparallel-jobs): New flag.
* gcc.c (extra_arg_storer): New class.
(have_S): New variable.
(struct command): Move from execute.
(is_compiler): New function.
(is_assembler): New function.
(get_number_of_args): New function.
(get_file_by_lines): New function.
(identify_asm_file): New function.
(struct infile): New attribute temp_additional_asm.
(current_infile): New variable.
(get_path_to_ld): New function.
(has_hidden_E): New function.
(sort_asm_files): New function.
(append_split_outputs): New function.
(print_command): New function.
(print_commands): New function.
(print_argbuf): New function.
(handle_verbose): Extracted from execute.
(append_valgrind): Same as above.
(async_launch_commands): Same as above.
(await_commands_to_finish): Same as above.
(split_commands): Same as above.
(parse_argbuf): Same as above.
(execute): Refactor.
(fsplit_arg): New function.
(alloc_infile): Initialize infiles with 0.
(process_command): Remember when -S was passed.
(do_spec_on_infiles): Remember current infile being processed.
(maybe_run_linker): Replace object files when -o is an executable.
(finalize): Deinitialize temp_object_files.

gcc/testsuite/ChangeLog:
2020-08-20  Giuliano Belinassi

* driver/driver.exp: New test.
* driver/a.c: New file.
* driver/b.c: New file.
* driver/empty.c: New file.
* driver/foo.c: New file.
---
 gcc/common.opt  |4 +
 gcc/gcc.c   | 1219 ---
 gcc/testsuite/driver/a.c|6 +
 gcc/testsuite/driver/b.c|6 +
 gcc/testsuite/driver/driver.exp |   80 ++
 gcc/testsuite/driver/empty.c|0
 gcc/testsuite/driver/foo.c  |7 +
 7 files changed, 1049 insertions(+), 273 deletions(-)
 create mode 100644 gcc/testsuite/driver/a.c
 create mode 100644 gcc/testsuite/driver/b.c
 create mode 100644 gcc/testsuite/driver/driver.exp
 create mode 100644 gcc/testsuite/driver/empty.c
 create mode 100644 gcc/testsuite/driver/foo.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 4b08e91859f..4aa3ad8c95b 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3465,4 +3465,8 @@ fipa-ra
 Common Report Var(flag_ipa_ra) Optimization
 Use caller save register across calls if possible.
 
+fsplit-outputs=
+Common Joined Var(split_outputs)
+-fsplit-outputs=  Filename in which current Compilation Unit will be split to.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 10bc9881aed..c276a11ca7a 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -343,6 +343,74 @@ static struct obstack obstack;
 
 static struct obstack collect_obstack;
 
+/* This is used to store new argv arrays created dynamically to avoid memory
+   leaks.  */
+
+class extra_arg_storer
+{
+  public:
+
+/* Initialize the vec with a default size.  */
+
+extra_arg_storer ()
+  {
+   string_vec.create (8);
+   extra_args.create (64);
+  }
+
+/* Create new array of strings of size N.  */
+const char **create_new (size_t n)
+  {
+   const char **ret = XNEWVEC (const char *, n);
+   

[PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.

2020-08-20 Thread Giuliano Belinassi via Gcc-patches
This patch series adds a new flag "-fparallel-jobs=" to control whether the
compiler should try to compile the current file in parallel.

Three modes are currently supported:

1. -fparallel-jobs=: Try to compile the file using a maximum of N
jobs.

2. -fparallel-jobs=jobserver: Check if there is a running GNU Make
Jobserver. If so, communicate with it in order to launch jobs;
otherwise alert the user that the jobserver was not found, since using
it requires modifications to the project Makefile.

3. -fparallel-jobs=auto: Same as 2., but quietly fall back to a maximum
of 2 jobs if the jobserver was not found.
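To make the three modes concrete, here is a hypothetical sketch of
decoding the option argument.  The enum and parse_parallel_jobs are
illustrative names, not the option handling in the patch; the fallback
of 2 jobs for "auto" is taken from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum pjobs_mode { PJOBS_INVALID, PJOBS_FIXED, PJOBS_JOBSERVER, PJOBS_AUTO };

/* Decode the argument of -fparallel-jobs=.  For a number N, *MAX_JOBS is
   set to N.  For "auto", *MAX_JOBS is set to the fallback used when no
   jobserver is found.  For "jobserver", *MAX_JOBS is left untouched,
   since the job count comes from GNU make's tokens.  */
static enum pjobs_mode
parse_parallel_jobs (const char *arg, long *max_jobs)
{
  assert (arg != NULL && max_jobs != NULL);
  if (strcmp (arg, "jobserver") == 0)
    return PJOBS_JOBSERVER;
  if (strcmp (arg, "auto") == 0)
    {
      *max_jobs = 2;
      return PJOBS_AUTO;
    }

  char *end;
  errno = 0;
  long n = strtol (arg, &end, 10);
  if (errno != 0 || end == arg || *end != '\0' || n < 1)
    return PJOBS_INVALID;
  *max_jobs = n;
  return PJOBS_FIXED;
}
```

Rejecting trailing garbage ("4x") and non-positive counts keeps a typo
from silently turning parallel compilation off.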

The parallelization works by using a modified LTO engine in which no IR
is dumped to disk, and a new partitioner is employed to find
symbols which must be partitioned together.

The parallelism feature is implemented as follows:

1. The driver will pass a hidden -fsplit-outputs= to cc1*.

2. After IPA, cc1* will search for symbols which must be partitioned
together.  If the user allows GCC to automatically promote symbols to
globals through "--param=promote-statics=1" for better parallel
compilation performance, it will also do so.  However, if it decides
that partitioning is a bad idea, it will continue with a default serial
compilation, and the additional  will not be created.  It will
avoid compiling in parallel if and only if:

  * File size does not exceed the minimum partition size specified by
  the LTO default --param=lto-min-partition.

  * The partitioner is unable to find any point of partitioning in the
  file.

3. cc1* will fork itself; one fork for each partition. Each child
process will apply its partition mask generated by the partitioner
and write a new assembler name file to  pointed by the driver.

4. The driver will open each file and partially link them together into
a single .o file if -c was requested, or else into a binary.  -S and -E
are unsupported for now and will probably remain so.


Speedups ranged from 0.95x to 1.9x on a quad-core Intel Core i7-8565U
when testing with two files from GCC, as shown in the following table.
Each result comes from a single run preceded by a warm-up run. The
compiled GCC had checking enabled, so a release build might have better
timings in both the sequential and parallel cases, but the speedup may
remain the same.

|                |            | Without Static | With Static |   Max   |
| File           | Sequential |   Promotion    |  Promotion  | Speedup |
|----------------|------------|----------------|-------------|---------|
| gimple-match.c |    60s     |      63s       |     34s     |  1.7x   |
| insn-emit.c    |    37s     |      19s       |     20s     |  1.9x   |

Note that enabling the feature causes a slowdown in some cases, which
is why it is only enabled by a flag for now.

Bootstrapped and Regtested on Linux x86_64.

Giuliano Belinassi (6):
  Modify gcc driver for parallel compilation
  Implement a new partitioner for parallel compilation
  Implement fork-based parallelism engine
  Add `+' for Jobserver Integration
  Add invoke documentation
  New tests for parallel compilation feature

 gcc/Makefile.in   |6 +-
 gcc/cgraph.c  |   16 +
 gcc/cgraph.h  |   13 +
 gcc/cgraphunit.c  |  198 ++-
 gcc/common.opt|4 +
 gcc/doc/invoke.texi   |   32 +-
 gcc/gcc.c | 1219 +
 gcc/ipa-fnsummary.c   |2 +-
 gcc/ipa-icf.c |3 +-
 gcc/ipa-visibility.c  |3 +-
 gcc/ipa.c |4 +-
 gcc/jobserver.cc  |  168 +++
 gcc/jobserver.h   |   33 +
 gcc/lto-cgraph.c  |  172 +++
 gcc/{lto => }/lto-partition.c |  463 ++-
 gcc/{lto => }/lto-partition.h |4 +-
 gcc/lto-streamer.h|4 +
 gcc/lto/Make-lang.in  |4 +-
 gcc/lto/lto.c |2 +-
 gcc/params.opt|8 +
 gcc/symtab.c  |   46 +-
 gcc/testsuite/driver/a.c  |6 +
 gcc/testsuite/driver/b.c  |6 +
 gcc/testsuite/driver/driver.exp   |   80 ++
 gcc/testsuite/driver/empty.c  |0
 gcc/testsuite/driver/foo.c|7 +
 .../gcc.dg/parallel-early-constant.c  |   22 +
 gcc/testsuite/gcc.dg/parallel-static-1.c  |   21 +
 gcc/testsuite/gcc.dg/parallel-static-2.c  |   21 +
 .../gcc.dg/parallel-static-clash-1.c  |   23 +
 .../gcc.dg/parallel-static-clash-aux.c|   14 +
 gcc/toplev.c  |   58 +-
 gcc/toplev.h

Re: [Patch 4/5] rs6000, Test 128-bit shifts for just the int128 type.

2020-08-20 Thread Segher Boessenkool
Hi!

On Tue, Aug 11, 2020 at 12:23:05PM -0700, Carl Love wrote:
> +;; 128-bit int modes
> +(define_mode_iterator VEC_I128 [V1TI TI])

We already have VSX_TI for this (in vsx.md).  Rename that to something
without VSX, and move it to vector.md or such?  Maybe name it VEC_TI
or anyTI.

Do that renaming as a separate patch before this one?  It is logically
separate, and it is boring stuff, so putting it in a separate patch
makes the non-boring stuff stand out more.

(It would be better if we could just get rid of V1TI, but that isn't
going to happen soon).

> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -367,7 +367,7 @@
> UNSPEC_INSERTR
> UNSPEC_REPLACE_ELT
> UNSPEC_REPLACE_UN
> - UNSPEC_XXSWAPD_V1TI
> + UNSPEC_XXSWAPD_VEC_I128

Why not just UNSPEC_XXSWAPD?  And, why an unspec at all?


Segher


[committed 2/3] [og10] Annotate inner loops in "acc kernels loop" directives (C/C++).

2020-08-20 Thread Sandra Loosemore
Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code.  However, there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops.  This patch implements this behavior for C and C++.

2020-08-19  Sandra Loosemore  

gcc/c-family/
* c-omp.c (annotate_loops_in_kernels_regions): Process inner
loops in combined "acc kernels loop" directives.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-18.c: New.
* c-c++-common/goacc/kernels-loop-annotation-19.c: New.
* c-c++-common/goacc/combined-directives.c: Adjust expected
patterns.
---
 gcc/c-family/ChangeLog.omp |  7 +
 gcc/c-family/c-omp.c   | 36 ++
 gcc/testsuite/ChangeLog.omp|  9 ++
 .../c-c++-common/goacc/combined-directives.c   |  2 +-
 .../goacc/kernels-loop-annotation-18.c | 18 +++
 .../goacc/kernels-loop-annotation-19.c | 19 
 6 files changed, 78 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-19.c

diff --git a/gcc/c-family/ChangeLog.omp b/gcc/c-family/ChangeLog.omp
index 92ffa85..51a3b9e 100644
--- a/gcc/c-family/ChangeLog.omp
+++ b/gcc/c-family/ChangeLog.omp
@@ -1,3 +1,10 @@
+2020-08-19  Sandra Loosemore  
+
+   Annotate inner loops in "acc kernels loop" directives (C/C++).
+
+   * c-omp.c (annotate_loops_in_kernels_regions): Process inner
+   loops in combined "acc kernels loop" directives.
+
 2020-08-18  Kwok Cheung Yeung  
 
Backport from mainline
diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index c1d6afa..24f2448 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -2782,18 +2782,30 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees,
   /* Do not try to add automatic OpenACC annotations inside manually
 annotated loops.  Presumably, the user avoided doing it on
 purpose; for example, all available levels of parallelism may
-have been used up.  */
-  {
-   struct annotation_info nested_info
- = { NULL_TREE, NULL_TREE, false, as_explicit_annotation,
- node, info };
-   if (info->state >= as_in_kernels_region)
- do_not_annotate_loop_nest (info, as_explicit_annotation,
-node);
-   walk_tree (_BODY (node), annotate_loops_in_kernels_regions,
-  (void *) &nested_info, NULL);
-   *walk_subtrees = 0;
-  }
+have been used up.  However, assume that the combined construct
+"#pragma acc kernels loop" means to try to process the whole
+loop nest.
+Note that a single OACC_LOOP construct represents an entire set
+of collapsed loops so we do not have to deal explicitly with the
+collapse clause here, as the Fortran front end does.  */
+  if (info->state == as_in_kernels_region && OACC_LOOP_COMBINED (node))
+   {
+ walk_tree (_BODY (node), annotate_loops_in_kernels_regions,
+(void *) info, NULL);
+ *walk_subtrees = 0;
+   }
+  else
+   {
+ struct annotation_info nested_info
+   = { NULL_TREE, NULL_TREE, false, as_explicit_annotation,
+   node, info };
+ if (info->state >= as_in_kernels_region)
+   do_not_annotate_loop_nest (info, as_explicit_annotation,
+  node);
+ walk_tree (_BODY (node), annotate_loops_in_kernels_regions,
+(void *) &nested_info, NULL);
+ *walk_subtrees = 0;
+   }
   break;
 
 case FOR_STMT:
diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index ffc8f63..970345b 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,12 @@
+2020-08-19   Sandra Loosemore  
+
+   Annotate inner loops in "acc kernels loop" directives (C/C++).
+
+   * c-c++-common/goacc/kernels-loop-annotation-18.c: New.
+   * c-c++-common/goacc/kernels-loop-annotation-19.c: New.
+   * c-c++-common/goacc/combined-directives.c: Adjust expected
+   patterns.
+
 2020-08-19  Kwok Cheung Yeung  
 
* gfortran.dg/goacc/pr70828.f90: Update expected output in Gimple
diff --git a/gcc/testsuite/c-c++-common/goacc/combined-directives.c b/gcc/testsuite/c-c++-common/goacc/combined-directives.c
index c2a3c57..2519f23 100644
--- a/gcc/testsuite/c-c++-common/goacc/combined-directives.c
+++ b/gcc/testsuite/c-c++-common/goacc/combined-directives.c
@@ -110,7 +110,7 @@ test ()
 // { dg-final { scan-tree-dump-times "acc loop worker" 2 

[committed 3/3] [OG10] Annotate inner loops in "acc kernels loop" directives (Fortran).

2020-08-20 Thread Sandra Loosemore
Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code.  However, there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops.  This patch implements this behavior in Fortran.

2020-08-19  Sandra Loosemore  

gcc/fortran/
* openmp.c (annotate_do_loops_in_kernels): Handle
EXEC_OACC_KERNELS_LOOP separately to permit annotation of inner
loops in a combined "acc kernels loop" directive.

gcc/testsuite/
* gfortran.dg/goacc/kernels-loop-annotation-18.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-19.f95: New.
* gfortran.dg/goacc/combined-directives.f90: Adjust expected
patterns.
* gfortran.dg/goacc/private-explicit-kernels-1.f95: Likewise.
* gfortran.dg/goacc/private-predetermined-kernels-1.f95:
Likewise.
---
 gcc/fortran/ChangeLog.omp  |  8 
 gcc/fortran/openmp.c   | 50 +-
 gcc/testsuite/ChangeLog.omp| 12 ++
 .../gfortran.dg/goacc/combined-directives.f90  | 19 ++--
 .../goacc/kernels-loop-annotation-18.f95   | 28 
 .../goacc/kernels-loop-annotation-19.f95   | 29 +
 .../goacc/private-explicit-kernels-1.f95   |  7 ++-
 .../goacc/private-predetermined-kernels-1.f95  |  7 ++-
 8 files changed, 151 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95

diff --git a/gcc/fortran/ChangeLog.omp b/gcc/fortran/ChangeLog.omp
index 1d1ee9e..88b2729 100644
--- a/gcc/fortran/ChangeLog.omp
+++ b/gcc/fortran/ChangeLog.omp
@@ -1,5 +1,13 @@
 2020-08-19  Sandra Loosemore  
 
+   Annotate inner loops in "acc kernels loop" directives (Fortran).
+
+   * openmp.c (annotate_do_loops_in_kernels): Handle
+   EXEC_OACC_KERNELS_LOOP separately to permit annotation of inner
+   loops in a combined "acc kernels loop" directive.
+
+2020-08-19  Sandra Loosemore  
+
Add a "combined" flag for "acc kernels loop" etc directives.
 
* trans-openmp.c (gfc_trans_omp_do): Add combined parameter,
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 9d13863..b9e4bda 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -6910,7 +6910,6 @@ annotate_do_loops_in_kernels (gfc_code *code, gfc_code *parent,
 
case EXEC_OACC_PARALLEL_LOOP:
case EXEC_OACC_PARALLEL:
-   case EXEC_OACC_KERNELS_LOOP:
case EXEC_OACC_LOOP:
  /* Do not try to add automatic OpenACC annotations inside manually
 annotated loops.  Presumably, the user avoided doing it on
@@ -6955,6 +6954,55 @@ annotate_do_loops_in_kernels (gfc_code *code, gfc_code *parent,
}
  break;
 
+   case EXEC_OACC_KERNELS_LOOP:
+ /* This is a combined "acc kernels loop" directive.  We want to
+leave the outer loop alone but try to annotate any nested
+loops in the body.  The expected structure nesting here is
+  EXEC_OACC_KERNELS_LOOP
+EXEC_OACC_KERNELS_LOOP
+  EXEC_DO
+EXEC_DO
+  ...body...  */
+ if (code->block)
+   /* Might be empty?  */
+   {
+ gcc_assert (code->block->op == EXEC_OACC_KERNELS_LOOP);
+ gfc_omp_clauses *clauses = code->ext.omp_clauses;
+ int collapse = clauses->collapse;
+ gfc_expr_list *tile = clauses->tile_list;
+ gfc_code *inner = code->block->next;
+
+ gcc_assert (inner->op == EXEC_DO);
+ gcc_assert (inner->block->op == EXEC_DO);
+
+ /* We need to skip over nested loops covered by "collapse" or
+"tile" clauses.  "Tile" takes precedence
+(see gfc_trans_omp_do).  */
+ if (tile)
+   {
+ collapse = 0;
+ for (gfc_expr_list *el = tile; el; el = el->next)
+   collapse++;
+   }
+ if (clauses->orderedc)
+   collapse = clauses->orderedc;
+ if (collapse <= 0)
+   collapse = 1;
+ for (int i = 1; i < collapse; i++)
+   {
+ gcc_assert (inner->op == EXEC_DO);
+ gcc_assert (inner->block->op == EXEC_DO);
+ inner = inner->block->next;
+   }
+ if (inner)
+   /* Loop might have empty body?  */
+   annotate_do_loops_in_kernels (inner->block->next,
+ inner, goto_targets,
+   

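The core of the new EXEC_OACC_KERNELS_LOOP case is deciding how many loops of the nest the combined directive already covers before recursing into the body. Below is a minimal sketch of just that count, assuming hypothetical simplified structs in place of gfc_omp_clauses and gfc_expr_list; it mirrors the order of the checks in the hunk but is not the committed code.

```cpp
// Hypothetical, simplified stand-ins for gfc_expr_list and gfc_omp_clauses;
// only the members this calculation reads are modelled.
struct expr_list
{
  expr_list *next;
};

struct loop_clauses
{
  int collapse;          // value of collapse(n), 0 if absent
  int orderedc;          // value of ordered(n), 0 if absent
  expr_list *tile_list;  // one entry per tile size, nullptr if absent
};

// How many loops of the nest the combined directive itself covers.
// Same order as the patch: a tile list replaces the collapse count,
// a nonzero ordered(n) replaces both, and a result <= 0 means one loop.
inline int covered_loop_count (const loop_clauses &c)
{
  int collapse = c.collapse;
  if (c.tile_list)
    {
      collapse = 0;
      for (const expr_list *el = c.tile_list; el; el = el->next)
        collapse++;
    }
  if (c.orderedc)
    collapse = c.orderedc;
  if (collapse <= 0)
    collapse = 1;
  return collapse;
}
```

In the patch, this count drives a walk of collapse - 1 levels down the EXEC_DO chain, after which annotate_do_loops_in_kernels is called on the body of the last covered loop.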
[committed 1/3] [OG10] Add a "combined" flag for "acc kernels loop" etc directives.

2020-08-20 Thread Sandra Loosemore
2020-08-19  Sandra Loosemore  

gcc/
* tree.h (OACC_LOOP_COMBINED): New.

gcc/c/
* c-parser.c (c_parser_oacc_loop): Set OACC_LOOP_COMBINED.

gcc/cp/
* parser.c (cp_parser_oacc_loop): Set OACC_LOOP_COMBINED.

gcc/fortran/
* trans-openmp.c (gfc_trans_omp_do): Add combined parameter,
use it to set OACC_LOOP_COMBINED.  Update all call sites.
---
 gcc/ChangeLog.omp  |  6 ++
 gcc/c/ChangeLog.omp|  6 ++
 gcc/c/c-parser.c   |  3 +++
 gcc/cp/ChangeLog.omp   |  6 ++
 gcc/cp/parser.c|  3 +++
 gcc/fortran/ChangeLog.omp  |  7 +++
 gcc/fortran/trans-openmp.c | 30 +++---
 gcc/tree.h |  5 +
 8 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 063eda3..6bed0b9 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,9 @@
+2020-08-19  Sandra Loosemore  
+
+   Add a "combined" flag for "acc kernels loop" etc directives.
+
+   * tree.h (OACC_LOOP_COMBINED): New.
+
 2020-08-18  Kwok Cheung Yeung  
 
Backport from mainline
diff --git a/gcc/c/ChangeLog.omp b/gcc/c/ChangeLog.omp
index 7eff6ab..2803482 100644
--- a/gcc/c/ChangeLog.omp
+++ b/gcc/c/ChangeLog.omp
@@ -1,3 +1,9 @@
+2020-08-19  Sandra Loosemore  
+
+   Add a "combined" flag for "acc kernels loop" etc directives.
+
+   * c-parser.c (c_parser_oacc_loop): Set OACC_LOOP_COMBINED.
+
 2020-08-18  Kwok Cheung Yeung  
 
Backport from mainline
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index b7ed742..a895bdb 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -16798,6 +16798,7 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
omp_clause_mask mask, tree *cclauses, bool *if_p)
 {
   bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1;
+  bool is_combined = (cclauses != NULL);
 
   strcat (p_name, " loop");
   mask |= OACC_LOOP_CLAUSE_MASK;
@@ -16816,6 +16817,8 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
   tree block = c_begin_compound_stmt (true);
   tree stmt = c_parser_omp_for_loop (loc, parser, OACC_LOOP, clauses, NULL,
 if_p);
+  if (stmt && stmt != error_mark_node)
+OACC_LOOP_COMBINED (stmt) = is_combined;
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
diff --git a/gcc/cp/ChangeLog.omp b/gcc/cp/ChangeLog.omp
index 023321a..3c97668 100644
--- a/gcc/cp/ChangeLog.omp
+++ b/gcc/cp/ChangeLog.omp
@@ -1,3 +1,9 @@
+2020-08-19  Sandra Loosemore  
+
+   Add a "combined" flag for "acc kernels loop" etc directives.
+
+   * parser.c (cp_parser_oacc_loop): Set OACC_LOOP_COMBINED.
+
 2020-08-18  Kwok Cheung Yeung  
 
Backport from mainline
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 437253e..b657e4f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -41185,6 +41185,7 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name,
 omp_clause_mask mask, tree *cclauses, bool *if_p)
 {
   bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1;
+  bool is_combined = (cclauses != NULL);
 
   strcat (p_name, " loop");
   mask |= OACC_LOOP_CLAUSE_MASK;
@@ -41203,6 +41204,8 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name,
   tree block = begin_omp_structured_block ();
   int save = cp_parser_begin_omp_structured_block (parser);
   tree stmt = cp_parser_omp_for_loop (parser, OACC_LOOP, clauses, NULL, if_p);
+  if (stmt && stmt != error_mark_node)
+OACC_LOOP_COMBINED (stmt) = is_combined;
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (block));
 
diff --git a/gcc/fortran/ChangeLog.omp b/gcc/fortran/ChangeLog.omp
index e64bf82..1d1ee9e 100644
--- a/gcc/fortran/ChangeLog.omp
+++ b/gcc/fortran/ChangeLog.omp
@@ -1,3 +1,10 @@
+2020-08-19  Sandra Loosemore  
+
+   Add a "combined" flag for "acc kernels loop" etc directives.
+
+   * trans-openmp.c (gfc_trans_omp_do): Add combined parameter,
+   use it to set OACC_LOOP_COMBINED.  Update all call sites.
+
 2020-08-18  Kwok Cheung Yeung  
 
Backport from mainline
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index f2e6868..1c8ca81 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -4273,7 +4273,8 @@ typedef struct dovar_init_d {
 
 static tree
 gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
- gfc_omp_clauses *do_clauses, tree par_clauses)
+ gfc_omp_clauses *do_clauses, tree par_clauses,
+ bool combined)
 {
   gfc_se se;
   tree dovar, stmt, from, to, step, type, init, cond, incr, orig_decls;
@@ -4601,7 +4602,10 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
 case EXEC_OMP_DO: stmt = make_node (OMP_FOR); 

[committed 0/3] [OG10] openacc: Fix annotation of inner loops in combined "acc kernels loop" directives

2020-08-20 Thread Sandra Loosemore
The annotator that detects loops in kernels regions and adds "auto"
attributes to them presently ignores loops nested in an
explicitly-annotated loop, on the theory that the user likely marked
up only some of the loops in the nest as a means of deliberately
controlling the parallelism.

Inspection of actual user code, though, indicates that this isn't
really the correct expected behavior for combined "acc kernels loop"
directives.  Here the expectation seems to be that the entire loop
nest should be considered for annotation.  This series of patches
implements this tweak to the annotation logic in the C, C++, and
Fortran front ends.
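A rough sketch of the policy change described above, using a hypothetical enum in place of GCC's statement codes (named after the EXEC_OACC_* cases in the Fortran patch). It captures only the yes/no decision, not how the front ends walk the loop nest.

```cpp
// Hypothetical directive kinds, loosely following the EXEC_OACC_* cases
// that the Fortran patch distinguishes.
enum class oacc_directive { kernels_loop, parallel_loop, parallel, loop };

// Are loops nested below an explicitly annotated loop still candidates
// for automatic "auto" annotation?
inline bool annotate_inner_loops (oacc_directive d)
{
  switch (d)
    {
    case oacc_directive::kernels_loop:
      // Combined "acc kernels loop": the whole nest is considered; only
      // the loops the directive itself covers are left alone.
      return true;
    case oacc_directive::parallel_loop:
    case oacc_directive::parallel:
    case oacc_directive::loop:
      // Explicit markup: presumably deliberate manual control.
      return false;
    }
  return false;
}
```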

We've had some internal discussion to confirm that this is a
reasonable change in behavior, but nobody is available to review these
patches in a timely manner, so I have gone ahead and committed them to
the OG10 branch.  In due course they will be combined with the
original annotation patches (which haven't been reviewed properly yet
either) and any other bug fixes or tweaks to the behavior, and
resubmitted for mainline.

-Sandra

Sandra Loosemore (3):
  Add a "combined" flag for "acc kernels loop" etc directives.
  Annotate inner loops in "acc kernels loop" directives (C/C++).
  Annotate inner loops in "acc kernels loop" directives (Fortran).

 gcc/ChangeLog.omp  |  6 +++
 gcc/c-family/ChangeLog.omp |  7 +++
 gcc/c-family/c-omp.c   | 36 ++--
 gcc/c/ChangeLog.omp|  6 +++
 gcc/c/c-parser.c   |  3 ++
 gcc/cp/ChangeLog.omp   |  6 +++
 gcc/cp/parser.c|  3 ++
 gcc/fortran/ChangeLog.omp  | 15 +++
 gcc/fortran/openmp.c   | 50 +-
 gcc/fortran/trans-openmp.c | 30 -
 gcc/testsuite/ChangeLog.omp| 21 +
 .../c-c++-common/goacc/combined-directives.c   |  2 +-
 .../goacc/kernels-loop-annotation-18.c | 18 
 .../goacc/kernels-loop-annotation-19.c | 19 
 .../gfortran.dg/goacc/combined-directives.f90  | 19 ++--
 .../goacc/kernels-loop-annotation-18.f95   | 28 
 .../goacc/kernels-loop-annotation-19.f95   | 29 +
 .../goacc/private-explicit-kernels-1.f95   |  7 ++-
 .../goacc/private-predetermined-kernels-1.f95  |  7 ++-
 gcc/tree.h |  5 +++
 20 files changed, 284 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-19.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95

-- 
2.8.1



[committed] analyzer: fix infinite recursion ICE on unions [PR96723]

2020-08-20 Thread David Malcolm via Gcc-patches
Attempts to store sm-state into a union in C++ triggered an infinite
recursion when trying to generate a representative tree, due to
erroneously trying to use the dtor of the union as a field.

Fix it by filtering out non-FIELD_DECLs when walking TYPE_FIELDs
in region::get_subregions_for_binding.
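To make the fix concrete, here is a self-contained sketch of the filtering pattern over a hypothetical, much simplified declaration chain. The real code walks TYPE_FIELDS with DECL_CHAIN and skips entries whose TREE_CODE is not FIELD_DECL; the types and names below are illustrative only.

```cpp
// Hypothetical stand-ins for GCC tree nodes: only the node kind and the
// DECL_CHAIN-style link matter for this illustration.
enum decl_code { FIELD_DECL_CODE, FUNCTION_DECL_CODE, TYPE_DECL_CODE };

struct decl_node
{
  decl_code code;
  const char *name;
  const decl_node *chain;  // next entry in the type's member list
};

// Count the data members of a union.  In C++ the member list can also hold
// non-field entries (PR96723 came from treating the union's destructor as
// a field), so anything that is not a field is skipped.
inline int count_data_members (const decl_node *fields)
{
  int n = 0;
  for (const decl_node *f = fields; f != nullptr; f = f->chain)
    {
      if (f->code != FIELD_DECL_CODE)
        continue;
      n++;
    }
  return n;
}
```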

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r11-2789-g00cb0f5840795698557731c6e549a5ce99573223.

gcc/analyzer/ChangeLog:
PR analyzer/96723
* region-model-manager.cc
(region_model_manager::get_field_region): Assert that field is a
FIELD_DECL.
* region.cc (region::get_subregions_for_binding): In
union-handling, filter the TYPE_FIELDS traversal to just FIELD_DECLs.

gcc/testsuite/ChangeLog:
PR analyzer/96723
* g++.dg/analyzer/pr96723.C: New test.
---
 gcc/analyzer/region-model-manager.cc|  2 ++
 gcc/analyzer/region.cc  |  2 ++
 gcc/testsuite/g++.dg/analyzer/pr96723.C | 10 ++
 3 files changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/analyzer/pr96723.C

diff --git a/gcc/analyzer/region-model-manager.cc b/gcc/analyzer/region-model-manager.cc
index 422c4a95e7b..75402649a91 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -781,6 +781,8 @@ region_model_manager::get_region_for_global (tree expr)
 const region *
 region_model_manager::get_field_region (const region *parent, tree field)
 {
+  gcc_assert (TREE_CODE (field) == FIELD_DECL);
+
   field_region::key_t key (parent, field);
   if (field_region *reg = m_field_regions.get (key))
 return reg;
diff --git a/gcc/analyzer/region.cc b/gcc/analyzer/region.cc
index c3dc8cdfa84..1823901a3ee 100644
--- a/gcc/analyzer/region.cc
+++ b/gcc/analyzer/region.cc
@@ -311,6 +311,8 @@ region::get_subregions_for_binding (region_model_manager *mgr,
for (tree field = TYPE_FIELDS (get_type ()); field != NULL_TREE;
 field = DECL_CHAIN (field))
  {
+   if (TREE_CODE (field) != FIELD_DECL)
+ continue;
const region *subregion = mgr->get_field_region (this, field);
subregion->get_subregions_for_binding (mgr,
   relative_bit_offset,
diff --git a/gcc/testsuite/g++.dg/analyzer/pr96723.C b/gcc/testsuite/g++.dg/analyzer/pr96723.C
new file mode 100644
index 000..5d9980c9d2d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/analyzer/pr96723.C
@@ -0,0 +1,10 @@
+void
+foo ()
+{
+  union
+  {
+int *p;
+  } u;
+  u.p = new int;
+  delete u.p;
+}
-- 
2.26.2



Re: [PATCH] improve validation of attribute arguments (PR c/78666)

2020-08-20 Thread Aldy Hernandez via Gcc-patches
First, didn't Marek say in the PR that the diagnostic code should go in 
diagnose_mismatched_attributes?


An overall comment-- could we write a generic validator rather than 
having to special case validation on a case by case basis?


Is there a way of marking attributes as immutable if specified on the same 
decl?  For example, marking that alloc_size, nonnull, sentinel, etc. 
should never have differing versions of the same attribute for the same 
decl name?  And then you could automatically validate each attribute 
so marked.


Or is this too much work?  Marek, as the C maintainer, what are your 
thoughts? ;-)


Regardless, here are some random comments.

On 7/9/20 2:01 AM, Martin Sebor via Gcc-patches wrote:

GCC has gotten better at detecting conflicts between various
attributes but it still doesn't do a perfect job of detecting
similar problems due to mismatches between contradictory
arguments to the same attribute.  For example,

   __attribute ((alloc_size (1))) void* allocate (size_t, size_t);

followed by

   __attribute ((alloc_size (2))) void* allocate (size_t, size_t);

is accepted with the former overriding the latter in calls to
the function.  A similar problem exists with a few other attributes
that take arguments.

The attached change adds a new utility function that checks for
such mismatches and issues warnings.  It also adds calls to it
to detect the problem in attributes alloc_align, alloc_size, and
section.  This isn't meant to be a comprehensive fix but rather
a starting point for one.

Tested on x86_64-linux.

Martin

PS I ran into this again while debugging some unrelated changes
and wondering about the behavior in similar situations to mine.
Since the behavior seemed clearly suboptimal I figured I might
as well fix it.

PPS The improved checking triggers warnings in a few calls to
__builtin_has_attribute due to apparent conflicts.  I've xfailed
those in the test since it's a known issue with some existing
attributes that should be fixed at some point.  Valid uses of
the built-in shouldn't trigger diagnostics except for completely
nonsensical arguments.  Unfortunately, the line between valid
and completely nonsensical is a blurry one (GCC either issues
errors, or -Wattributes, or silently ignores some cases
altogether, such as those that are the subject of this patch)
and there is no internal mechanism to control the response.




diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 37214831538..bc4f409e346 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -720,6 +725,124 @@ positional_argument (const_tree fntype, const_tree atname, tree pos,
   return pos;
 }
 
+/* Given a pair of NODEs for arbitrary DECLs or TYPEs, validate one or
+   two integral or string attribute arguments NEWARGS to be applied to
+   NODE[0] for the absence of conflicts with the same attribute arguments
+   already applied to NODE[1]. Issue a warning for conflicts and return
+   false.  Otherwise, when no conflicts are found, return true.  */
+
+static bool
+validate_attr_args (tree node[2], tree name, tree newargs[2])


I think you're doing too much work in one function.  Also, I *really* 
dislike sending pairs of objects in arrays, especially when they're 
called something so abstract as "node" and "newargs".


Would it be possible to make this function only validate one single 
argument and call it twice?  Or do we gain something by having it do two 
things at once?



+{
+  /* First validate the arguments against those already applied to
+ the same declaration (or type).  */
+  tree self[2] = { node[0], node[0] };
+  if (node[0] != node[1] && !validate_attr_args (self, name, newargs))
+return false;
+
+  if (!node[1])
+return true;
+
+  /* Extract the same attribute from the previous declaration or type.  */
+  tree prevattr = NULL_TREE;
+  if (DECL_P (node[1]))
+{
+  prevattr = DECL_ATTRIBUTES (node[1]);
+  if (!prevattr)
+   {
+ tree type = TREE_TYPE (node[1]);
+ prevattr = TYPE_ATTRIBUTES (type);
+   }
+}
+  else if (TYPE_P (node[1]))
+prevattr = TYPE_ATTRIBUTES (node[1]);
+
+  const char* const namestr = IDENTIFIER_POINTER (name);
+  prevattr = lookup_attribute (namestr, prevattr);
+  if (!prevattr)
+return true;
+
+  /* Extract one or both attribute arguments.  */
+  tree prevargs[2];
+  prevargs[0] = TREE_VALUE (TREE_VALUE (prevattr));
+  prevargs[1] = TREE_CHAIN (TREE_VALUE (prevattr));
+  if (prevargs[1])
+prevargs[1] = TREE_VALUE (prevargs[1]);
+
+  /* Both arguments must be equal or, for the second pair, neither must
+ be provided to succeed.  */
+  bool arg1eq, arg2eq;
+  if (TREE_CODE (newargs[0]) == INTEGER_CST)
+{
+  arg1eq = tree_int_cst_equal (newargs[0], prevargs[0]);
+  if (newargs[1] && prevargs[1])
+   arg2eq = tree_int_cst_equal (newargs[1], prevargs[1]);
+  else
+   arg2eq = newargs[1] == prevargs[1];
+}
+  else if (TREE_CODE 

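The comparison rule in validate_attr_args can be stated independently of GCC trees. The sketch below assumes integer arguments only (the patch also handles strings) and models them with std::optional; the function names are hypothetical. Checking one argument per call also shows roughly what the single-argument structure suggested in the review above might look like.

```cpp
#include <optional>

// Hypothetical simplification: attribute arguments as optional integers
// rather than trees.  Two arguments agree when both are absent or both are
// present with the same value, which is exactly std::optional's operator==.
inline bool attr_arg_agrees (std::optional<long> a, std::optional<long> b)
{
  return a == b;
}

// The rule from the patch's comment: the first arguments must be equal and
// the second ones must be equal or both be omitted.
inline bool attr_args_agree (long new_first, std::optional<long> new_second,
                             long prev_first, std::optional<long> prev_second)
{
  return attr_arg_agrees (new_first, prev_first)
         && attr_arg_agrees (new_second, prev_second);
}
```

For example, alloc_size (1) followed by alloc_size (2) on the same function fails the first comparison and would be diagnosed.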
Re: [PATCH] configure: Require C++11 for building code generation tools

2020-08-20 Thread Vaseeharan Vinayagamoorthy
Hi Tobias,

This patch fixes the issue that I was seeing, thanks.
I will also now try your updated patch from 
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552330.html

Kind Regards
Vasee

On 20/08/2020, 17:29, "Tobias Burnus"  wrote:

Hi,

how about my (unreviewed) patch for PR 96612 at
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551900.html ?

That one also tries to solve the CXX_FOR_BUILD issue with an older GCC.
Would that solve your issue as well?

Tobias

On 8/20/20 4:19 PM, Ilya Leoshkevich via Gcc-patches wrote:

> On Thu, 2020-08-20 at 13:59 +0100, Vasee Vinayagamoorthy wrote:
>> Hello,
>>
>> After commit [1] ("Redefine NULL to nullptr"), building gcc
>> fails when $CXX_FOR_BUILD is not using C++11 mode by default.
>> This happens with gcc-4.8 which is still supported.
>>
>> This patch fixes this by adding -std=c++11 or its equivalent
>> to $CXX_FOR_BUILD using AX_CXX_COMPILE_STDCXX(11).
>>
>> Tested by successful cross native build for aarch64-none-linux-gnu
>> target.
>>
>> OK for trunk?
>>
>>
>> ChangeLog:
>>
>> 2020-08-20  Ilya Leoshkevich  
>>  Vasee Vinayagamoorthy  <vaseeharan.vinayagamoor...@arm.com>
>>
>>  PR target/95700
>>  * configure: Regenerate.
>>  * configure.ac: Require C++11 for building code generation
>> tools.
>>
>> Regards
>> Vasee Vinayagamoorthy
>>
>>
>> PS: I do not have commit rights, therefore could I request
>> someone to commit it on my behalf if this patch is approved.
>> Thanks in advance.
>>
>> [1] https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=d59a576b8b5e12c3a56f0262912090e2921f5daa
> Hi!
>
> First of all, sorry about losing track of this issue.
>
> Regarding your addition to the patch, I fear that
>
> CXX="$CXX -std=c++11"
>
> might break some rarely used compilers. But doing this portably is
> exactly what AX_CXX_COMPILE_STDCXX(11) is for. Do you know why it isn't
> working for you?
>
> Best regards,
> Ilya
>
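For context on why the build compiler's language mode matters: once GCC's sources redefine NULL as nullptr, every generator source needs C++11, and gcc-4.8 does not default to it. A hypothetical guard like the following (not part of the patch, which instead adds the flag through AX_CXX_COMPILE_STDCXX(11)) would turn the failure into an explicit message.

```cpp
// Hypothetical guard: fail early, with a readable message, when the build
// compiler is not in C++11 (or later) mode, instead of failing wherever
// NULL expands to nullptr.
#if __cplusplus < 201103L
#error "C++11 or later is required (NULL is defined as nullptr)"
#endif

// The nullptr keyword (and std::nullptr_t) only exist from C++11 onwards.
inline bool is_null_pointer (const void *p)
{
  return p == nullptr;
}
```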



Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-08-20 Thread Segher Boessenkool
On Thu, Aug 20, 2020 at 07:31:50PM +, GT wrote:
> I'm still trying to understand why we need attribute((target("vsx"))).

You need Power8, even!  "vsx" alone is not enough (that only guarantees
Power7).  Your minimum version ("b") requires Power8.


Segher


[PATCH] c++: Implement P1009: Array size deduction in new-expressions.

2020-08-20 Thread Marek Polacek via Gcc-patches
This patch implements C++20 P1009, allowing code like

  new double[]{1,2,3}; // array bound will be deduced

Since this proposal makes the initialization rules more consistent, it is
applied to all previous versions of C++ (thus, effectively, all the way back
to C++11).

My patch is based on Jason's patch that handled the basic case.  I've
extended it to work with ()-init and also the string literal case.
Further testing revealed that to handle stuff like

  new int[]{t...};

in a template, we have to consider such a NEW_EXPR type-dependent.
Obviously, we first have to expand the pack to be able to deduce the
number of elements in the array.

Curiously, while implementing this proposal, I noticed that we fail
to accept

  new char[4]{"abc"};

so I've assigned PR 77841 to myself.  I think the fix will depend on the
build_new_1 hunk in this patch.
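A short illustration of the forms discussed above, assuming a compiler that implements P1009 (for example, a GCC release that includes this change). The function names are hypothetical; the bounds in the comments are what the deduction rules give, and the program itself cannot observe them directly.

```cpp
// Bound deduced from the braced list: equivalent to new double[3]{1, 2, 3}.
inline double *three_doubles ()
{
  return new double[]{1, 2, 3};
}

// Bound deduced from the string literal, including its terminating NUL:
// equivalent to new char[4]{"abc"}.
inline char *abc_chars ()
{
  return new char[]{"abc"};
}

// In a template the bound depends on the size of the pack, which is why
// such a new-expression must be treated as type-dependent until the pack
// has been expanded.
template <typename... T>
int *ints_from (T... t)
{
  return new int[]{t...};
}
```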

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/93529
* init.c (build_new_1): Handle new char[]{"foo"}.
(build_new): Deduce the array size in new-expression if not
present.  Handle ()-init.  Handle initializing an array from
a string literal.
* parser.c (cp_parser_new_type_id): Leave [] alone.
(cp_parser_direct_new_declarator): Allow [].
* pt.c (type_dependent_expression_p): In a NEW_EXPR, consider
array types whose dimension has to be deduced type-dependent.

gcc/testsuite/ChangeLog:

PR c++/93529
* g++.dg/cpp0x/sfinae4.C: Adjust expected result after P1009.
* g++.dg/cpp2a/new-array1.C: New test.
* g++.dg/cpp2a/new-array2.C: New test.
* g++.dg/cpp2a/new-array3.C: New test.

Co-authored-by: Jason Merrill 
---
 gcc/cp/init.c   | 54 ++-
 gcc/cp/parser.c | 11 ++--
 gcc/cp/pt.c |  4 ++
 gcc/testsuite/g++.dg/cpp0x/sfinae4.C|  8 ++-
 gcc/testsuite/g++.dg/cpp2a/new-array1.C | 70 +
 gcc/testsuite/g++.dg/cpp2a/new-array2.C | 22 
 gcc/testsuite/g++.dg/cpp2a/new-array3.C | 17 ++
 7 files changed, 180 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/new-array1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/new-array2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/new-array3.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 872c23453fd..ae1177079e4 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3559,8 +3559,8 @@ build_new_1 (vec **placement, tree type, tree nelts,
   else if (array_p)
{
  tree vecinit = NULL_TREE;
- if (vec_safe_length (*init) == 1
- && DIRECT_LIST_INIT_P ((**init)[0]))
+ const size_t len = vec_safe_length (*init);
+ if (len == 1 && DIRECT_LIST_INIT_P ((**init)[0]))
{
  vecinit = (**init)[0];
  if (CONSTRUCTOR_NELTS (vecinit) == 0)
@@ -3578,6 +3578,15 @@ build_new_1 (vec **placement, tree type, tree nelts,
  vecinit = digest_init (arraytype, vecinit, complain);
}
}
+ /* This handles code like new char[]{"foo"}.  */
+ else if (len == 1
+  && char_type_p (TYPE_MAIN_VARIANT (type))
+  && TREE_CODE (tree_strip_any_location_wrapper ((**init)[0]))
+ == STRING_CST)
+   {
+ vecinit = (**init)[0];
+ STRIP_ANY_LOCATION_WRAPPER (vecinit);
+   }
  else if (*init)
 {
   if (complain & tf_error)
@@ -3917,6 +3926,47 @@ build_new (location_t loc, vec **placement, tree type,
   return error_mark_node;
 }
 
+  /* P1009: Array size deduction in new-expressions.  */
+  if (TREE_CODE (type) == ARRAY_TYPE
+  && !TYPE_DOMAIN (type)
+  && *init)
+{
+  /* This means we have 'new T[]()'.  */
+  if ((*init)->is_empty ())
+   {
+ tree ctor = build_constructor (init_list_type_node, NULL);
+ CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
+ vec_safe_push (*init, ctor);
+   }
+  tree  = (**init)[0];
+  /* The C++20 'new T[](e_0, ..., e_k)' case allowed by P0960.  */
+  if (!DIRECT_LIST_INIT_P (elt) && cxx_dialect >= cxx20)
+   {
+ /* Handle new char[]("foo").  */
+ if (vec_safe_length (*init) == 1
+ && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type)))
+ && TREE_CODE (tree_strip_any_location_wrapper (elt))
+== STRING_CST)
+   /* Leave it alone: the string should not be wrapped in {}.  */;
+ else
+   {
+ /* Create a CONSTRUCTOR from the vector INIT.  */
+ tree list = build_tree_list_vec (*init);
+ tree ctor = build_constructor_from_list (init_list_type_node, list);
+ CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
+ CONSTRUCTOR_IS_PAREN_INIT (ctor) = true;
+ elt = ctor;
+ /* We've 

Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-08-20 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 20, 2020 at 07:31:50PM +, GT wrote:
> I'm still trying to understand why we need attribute((target("vsx"))).
> 
> https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
> 
> The documentation for target(string) states that the purpose is to allow a
> function to be compiled with different target options than were specified
> on the command line. Why would we want the vector version of say sinf to be
> compiled for a different target than the scalar sinf?

The scalar sinf (well, better let's talk about some arbitrary user function,
sinf or at least the vector versions thereof will be often implemented in
assembly) can be compiled with whatever ISA options the user chooses.
But, the SIMD ABI says that the 'b' char variants are VSX only, need to pass
arguments in certain registers that might not be even available without that
ISA, similarly for returns, and per the ABI the callers will ensure such
functions are only used from code which assumes that ISA.

Have a look at the x86_64 case, we have several different variants there,
e.g. SSE2 that passes in registers that are only available in SSE1 and
later, then AVX variants which use AVX registers, AVX2 variants which use
the same registers but differ in the way integral vectors are passed,
and finally AVX512F variants that use AVX512F registers.
So, on the caller side among the other desirability checks (if the function
is declare simd at all, what vectorization factor is available, whether it
is inbranch or notinbranch or both, whether arguments are vectors or uniform
or linear etc.), the compiler also checks the ISA.  And in loops that aren't
even SSE2 (-mno-sse, -msse1) just won't use any of the simd variants and
will use always scalar version, in code that isn't -mavx or later will only
use SSE2 (or scalar), in code that isn't -mavx2 will only use AVX or SSE2 or
scalar, etc.  Either one wouldn't even be able to pass the arguments or read
the return values without those, or if it would, the ABI still says one can
assume such ISA level.
This is all implemented by:
1) having the target hook used during the vectorization decision
(simd_clone_usable) decide based on the current ISA level
2) on the side of generation of the functions, it checks if the ISA is
already available (e.g. from command line), if it is, nothing needs to be
done, otherwise target attribute is added.

Jakub
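Jakub's two-step description can be summarized in a small model. This is a sketch under stated assumptions: the ISA levels form a simple total order (real ISA sets need not), the enum and function names are hypothetical, and GCC's actual hook and target-attribute handling are considerably more involved.

```cpp
#include <vector>

// Hypothetical ISA levels, ordered so that each level implies the ones
// before it.  Named after the x86_64 variants described above; a Power
// port would have its own list.
enum isa_level
{
  ISA_NONE = 0,  // no usable vector ISA: only the scalar function
  ISA_SSE2,
  ISA_AVX,
  ISA_AVX2,
  ISA_AVX512F
};

// 1) Caller side: a clone built for CLONE may only be called from code that
//    already assumes at least that ISA level (the role of simd_clone_usable,
//    reduced here to a yes/no test).
inline bool clone_usable (isa_level clone, isa_level current)
{
  return clone <= current;
}

// Pick the most capable usable clone; ISA_NONE means "call the scalar version".
inline isa_level pick_clone (const std::vector<isa_level> &clones,
                             isa_level current)
{
  isa_level best = ISA_NONE;
  for (isa_level c : clones)
    if (clone_usable (c, current) && c > best)
      best = c;
  return best;
}

// 2) Definition side: a target attribute is only needed when the clone's
//    ISA is not already enabled, e.g. by the command-line options.
inline bool needs_target_attribute (isa_level clone, isa_level enabled)
{
  return clone > enabled;
}
```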



[Patch, fortran] PR fortran/96728 - Fatal Error: Reading module inquiry functions on assumed-rank

2020-08-20 Thread José Rui Faustino de Sousa via Gcc-patches

Hi all!

Proposed patch to PR96728 - Fatal Error: Reading module inquiry 
functions on assumed-rank.


Patch tested only on x86_64-pc-linux-gnu.

The rank of the argument to specification functions gets written to the
module file, but since the value is negative for assumed-rank arrays,
reading the module back fails.


So the patch adds code to handle signed integers.
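The gist of the fix is that an integer atom may now start with a minus sign. A minimal sketch of such a parser with one character of lookahead follows; the class and method names are hypothetical and do not mirror module.c's actual helpers, which read from a file rather than a string.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical reader over an in-memory module string.
class atom_reader
{
public:
  explicit atom_reader (std::string text) : text_ (std::move (text)), pos_ (0) {}

  // Look at the next character without consuming it; -1 at end of input.
  int peek () const
  {
    return pos_ < text_.size () ? static_cast<unsigned char> (text_[pos_]) : -1;
  }

  // Parse an optionally negative decimal integer such as "3" or "-1".
  // On failure nothing is consumed and false is returned.
  bool parse_integer (long &value)
  {
    const std::size_t start = pos_;
    bool negative = false;
    if (peek () == '-')
      {
        negative = true;
        pos_++;
      }
    if (peek () == -1 || !std::isdigit (peek ()))
      {
        pos_ = start;  // a lone '-' is not an integer
        return false;
      }
    long v = 0;
    while (peek () != -1 && std::isdigit (peek ()))
      v = v * 10 + (text_[pos_++] - '0');
    value = negative ? -v : v;
    return true;
  }

private:
  std::string text_;
  std::size_t pos_;
};
```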

Thank you very much.

Best regards,
José Rui


2020-8-20  José Rui Faustino de Sousa  

 PR fortran/96728
 * module.c (module_peek_char): Peek ahead function.
 (parse_integer): Add code for parsing signed integers.
 (parse_atom): Add code to handle signed integers.
 (peek_atom): Add code to handle signed integers.

2020-8-20  José Rui Faustino de Sousa  

 PR fortran/96728
 * PR96728.f90: New test.


Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-08-20 Thread GT via Gcc-patches
‐‐‐ Original Message ‐‐‐
On Thursday, August 13, 2020 5:00 PM, Jakub Jelinek  wrote:

> On Thu, Aug 13, 2020 at 08:40:22PM +, GT wrote:
>
> > I'm looking at ix86_simd_clone_adjust and also aarch64_simd_clone_adjust.
> > The latter is much simpler and I see how I would add PPC attribute "vsx"
> > similarly. If I was to follow the ix86_simd_clone_adjust organization,
> > then ix86_valid_target_attribute_p called near the end of the function
> > is a problem. Because it in turn calls ix86_valid_target_attribute_tree
> > and this last function doesn't have a similarly named function in PPC
> > code.
> > Also, once the attribute "vsx" is added, where is it used? I mean that
> > in the sense of where is execution conditioned on the definition of say,
> > the "sse2" string in x86_64?
>
> You need to trigger what will the middle-end and backend do if you use
> explicit attribute((target ("vsx"))) on the function, so in the end it
> needs to do some parsing, create a TARGET_OPTION_NODE with the right option
> changes and put it to the function.
>
> Jakub

I'm still trying to understand why we need attribute((target("vsx"))).

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes

The documentation for target(string) states that the purpose is to allow a
function to be compiled with different target options than were specified
on the command line. Why would we want the vector version of say sinf to be
compiled for a different target than the scalar sinf?

Bert.


Re: [Patch] configure: Also check C++11 (flags) for ${build} compiler not only for ${host}

2020-08-20 Thread Joseph Myers
On Thu, 20 Aug 2020, Tobias Burnus wrote:

> Thanks for the first review; new version attached.

Thanks, this version is OK for GCC (but the GCC version will need updating 
if autoconf-archive ends up with a different version of these changes).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

2020-08-20 Thread Richard Sandiford
Hans-Peter Nilsson via Gcc-patches  writes:
>> > @@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>> >CLEAR_RESOURCE ();
>> >CLEAR_RESOURCE ();
>> >  
>> > +  /* Handle the flags register specially, to be able to accept a
>> > + candidate that clobbers it.  See also fill_simple_delay_slots.  */
>> > +  bool filter_flags
>> > += (slots_to_fill == 1
>> > +   && targetm.flags_regnum != INVALID_REGNUM
>> > +   && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
>> > +  struct resources fset;
>> > +  struct resources flags_res;
>> > +  if (filter_flags)
>> > +{
>> > +  CLEAR_RESOURCE ();
>> > +  CLEAR_RESOURCE (_res);
>> > +  SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
>> > +}
>> > +
>> >/* If we do not own this thread, we must stop as soon as we find
>> >   something that we can't put in a delay slot, since all we can do
>> >   is branch into THREAD at a later point.  Therefore, labels stop
>> > @@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>> >/* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
>> > don't separate or copy insns that set and use CC0.  */
>> >if (! insn_references_resource_p (trial, , true)
>> > -&& ! insn_sets_resource_p (trial, , true)
>> > +&& ! insn_sets_resource_p (trial, filter_flags ?  : , true)
>> >  && ! insn_sets_resource_p (trial, , true)
>> > +/* If we're handling sets to the flags register specially, we
>> > +   only allow an insn into a delay-slot, if it either:
>> > +   - doesn't set the flags register,
>> > +   - the "set" of the flags register isn't used (clobbered),
>> > +   - insns between the delay-slot insn and the trial-insn
>> > +   as accounted in "set", have not affected the flags register.  */
>> > +&& (! filter_flags
>> > +|| ! insn_sets_resource_p (trial, &flags_res, true)
>> > +|| find_regno_note (trial, REG_UNUSED, targetm.flags_regnum)
>> > +|| ! TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum))
>> >  && (!HAVE_cc0 || (! (reg_mentioned_p (cc0_rtx, pat)
>> >  && (! own_thread || ! sets_cc0_p (pat)
>> >  && ! can_throw_internal (trial))
>> > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>> >lose = 1;
>> >mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
>> >mark_referenced_resources (trial, &needed, true);
>> > +  if (filter_flags)
>> > +  {
>> > +mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
>> > +
>> > +/* Groups of flags-register setters with users should not
>> > +   affect opportunities to move flags-register-setting insns
>> > +   (clobbers) into the delay-slot.  */
>> > +CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
>> 
>> If we do this, what stops us from trying to move a flags-register user
>> ahead of the associated setter, when the user doesn't itself set the
>> flags register?
>
> First, all sets are still in set (set.regs) including any that
> set the flags register between the insn with the delay-slot and
> the trial.
>
> (That's why it's separate from fset (fset.regs).  I used the
> same naming scheme as in the named commit, but perhaps a better
> name would be set_with_flags_filtered or something.)
>
>>  Feels like there should be some test involving
>> insn_references_resource_p (trial, &flags_res, true) in the
>> (! filter_flags || ...) condition above.
>
> There is: the "! insn_references_resource_p (trial, &set, true)"
> (pre-existing in the context above the patch), so we don't move
> anything that references anything set (only additional flags
> register setters where the result is unused).

Ah, right, I think I was getting them the wrong way around.

Looks OK to me, but give Eric 24 hrs to object/comment.
Like you say, he's better qualified to review this, I was just
stepping in because it looked like the patch had fallen through
the cracks.

Thanks,
Richard
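The acceptance rule discussed in this exchange can be sketched as a small, self-contained C++ model. This is a simplified illustration under stated assumptions, not reorg.c code: registers are bits in a hypothetical `RegSet` mask, `kFlags` stands in for `targetm.flags_regnum`, `flags_unused` models a REG_UNUSED note on the trial, and the `needed` resource is ignored. Following the description of `fset` as "set with the flags filtered", the sketch assumes `fset` is `set` minus the flags register.

```cpp
#include <cstdint>

// Hypothetical model: one bit per hard register.
using RegSet = std::uint32_t;
constexpr RegSet kFlags = RegSet{1} << 0;  // assumed flags-register bit

struct Trial
{
  RegSet sets;        // registers the candidate insn writes
  RegSet uses;        // registers the candidate insn reads
  bool flags_unused;  // models REG_UNUSED for the flags register
};

// 'set' accumulates registers written by insns skipped between the branch
// and the trial.  Returns true if the trial may go into the delay slot.
constexpr bool
may_fill_slot(Trial t, RegSet set, bool filter_flags)
{
  // Pre-existing check: never move an insn that reads something set by a
  // skipped insn, so a flags user cannot be hoisted above its setter.
  if (t.uses & set)
    return false;

  // With filtering, a write to the flags register is not by itself a
  // conflict (this plays the role of fset); the clause below decides.
  RegSet conflict = filter_flags ? (set & ~kFlags) : set;
  if (t.sets & conflict)
    return false;

  // Accept if: not filtering, or the trial doesn't set the flags, or its
  // flags result is dead, or no skipped insn touched the flags.
  return !filter_flags
         || !(t.sets & kFlags)
         || t.flags_unused
         || !(set & kFlags);
}
```

Under these assumptions, a trial whose flags result is dead is accepted even after an intermediate flags setter, while a flags user is still rejected by the pre-existing references check, which is the point made in the reply above.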


Re: [Patch] configure: Also check C++11 (flags) for ${build} compiler not only for ${host}

2020-08-20 Thread Tobias Burnus

On 8/20/20 7:12 PM, Joseph Myers wrote:


It appears you're requiring _FOR_BUILD here and considering other suffixes
invalid, which would prevent any other use, e.g. _FOR_TARGET.


Actually, the main reason I required _FOR_BUILD
was that I couldn't find m4_ifnblank and then gave up ...
Now having finally found it, I removed the $4 arg check and
the later _FOR_BUILD use.


m4_if([$2], [], [dnl
  AC_CACHE_CHECK(whether $CXX supports C++$1 features by default,
-   ax_cv_cxx_compile_cxx$1,
+   ax_cv_cxx_compile_cxx$1$4,

[AC_COMPILE_IFELSE([AC_LANG_SOURCE([_AX_CXX_COMPILE_STDCXX_testbody_$1])],
  [ax_cv_cxx_compile_cxx$1=yes],
  [ax_cv_cxx_compile_cxx$1=no])])

I think this needs to update the variable name in the assignments of the
result of the check, and then in the subsequent check for whether to set
ac_success=yes, not just in the second argument to AC_CACHE_CHECK.


I know I would miss one of those ...

Thanks for the first review; new version attached.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
configure: Also check C++11 (flags) for ${build} compiler not only for ${host}

config/ChangeLog:

	PR bootstrap/96612
	* ax_cxx_compile_stdcxx.m4: Add fourth argument to check also
	the CXX_FOR_BUILD compiler.

ChangeLog:

	PR bootstrap/96612
	* configure.ac: Run AX_CXX_COMPILE_STDCXX also for ${build} compiler,
	if not the same as ${host}.
	* configure: Regenerate.

 config/ax_cxx_compile_stdcxx.m4 |   39 +-
 configure   | 1007 +++
 configure.ac|4 +
 3 files changed, 1039 insertions(+), 11 deletions(-)
diff --git a/configure.ac b/configure.ac
index 1a53ed418e4..392389fb2fb 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1470,6 +1470,10 @@ if test "$enable_bootstrap:$GXX" = "yes:yes"; then
   CXX="$CXX -std=c++11"
 elif test "$have_compiler" = yes; then
   AX_CXX_COMPILE_STDCXX(11)
+
+  if test "${build}" != "${host}"; then
+AX_CXX_COMPILE_STDCXX(11, [], [], [_FOR_BUILD])
+  fi
 fi
 
 # Used for setting $lt_cv_objdir
diff --git a/config/ax_cxx_compile_stdcxx.m4 b/config/ax_cxx_compile_stdcxx.m4
index 9413da624d2..8c55ebd2044 100644
--- a/config/ax_cxx_compile_stdcxx.m4
+++ b/config/ax_cxx_compile_stdcxx.m4
@@ -25,6 +25,10 @@
 #   regardless, after defining HAVE_CXX${VERSION} if and only if a
 #   supporting mode is found.
 #
+#   The optional fourth argument is a CXX/CXXFLAGS/CPPFLAGS suffix, e.g.
+#   "_FOR_BUILD" or "_FOR_TARGET".
+#
+#
 # LICENSE
 #
 #   Copyright (c) 2008 Benjamin Kosnik 
@@ -62,14 +66,20 @@ AC_DEFUN([AX_CXX_COMPILE_STDCXX], [dnl
 [m4_fatal([invalid third argument `$3' to AX_CXX_COMPILE_STDCXX])])
   AC_LANG_PUSH([C++])dnl
   ac_success=no
-
+  m4_ifnblank([$4], [dnl
+ax_cv_cxx_compile_cxx$1_orig_cxx="$CXX"
+ax_cv_cxx_compile_cxx$1_orig_cxxflags="$CXXFLAGS"
+ax_cv_cxx_compile_cxx$1_orig_cppflags="$CPPFLAGS"
+CXX="$CXX$4"
+CXXFLAGS="$CXXFLAGS$4"
+CPPFLAGS="$CPPFLAGS$4"])
   m4_if([$2], [], [dnl
 AC_CACHE_CHECK(whether $CXX supports C++$1 features by default,
-		   ax_cv_cxx_compile_cxx$1,
+		   ax_cv_cxx_compile_cxx$1$4,
   [AC_COMPILE_IFELSE([AC_LANG_SOURCE([_AX_CXX_COMPILE_STDCXX_testbody_$1])],
-[ax_cv_cxx_compile_cxx$1=yes],
-[ax_cv_cxx_compile_cxx$1=no])])
-if test x$ax_cv_cxx_compile_cxx$1 = xyes; then
+[ax_cv_cxx_compile_cxx$1$4=yes],
+[ax_cv_cxx_compile_cxx$1$4=no])])
+if test x$ax_cv_cxx_compile_cxx$1$4 = xyes; then
   ac_success=yes
 fi])
 
@@ -77,7 +87,7 @@ AC_DEFUN([AX_CXX_COMPILE_STDCXX], [dnl
   if test x$ac_success = xno; then
 for alternative in ${ax_cxx_compile_alternatives}; do
   switch="-std=gnu++${alternative}"
-  cachevar=AS_TR_SH([ax_cv_cxx_compile_cxx$1_$switch])
+  cachevar=AS_TR_SH([ax_cv_cxx_compile_cxx$1$4_$switch])
   AC_CACHE_CHECK(whether $CXX supports C++$1 features with $switch,
  $cachevar,
 [ac_save_CXX="$CXX"
@@ -104,7 +114,7 @@ AC_DEFUN([AX_CXX_COMPILE_STDCXX], [dnl
 dnl Cray's crayCC needs "-h std=c++11"
 for alternative in ${ax_cxx_compile_alternatives}; do
   for switch in -std=c++${alternative} +std=c++${alternative} "-h std=c++${alternative}"; do
-cachevar=AS_TR_SH([ax_cv_cxx_compile_cxx$1_$switch])
+cachevar=AS_TR_SH([ax_cv_cxx_compile_cxx$1$4_$switch])
 AC_CACHE_CHECK(whether $CXX supports C++$1 features with $switch,
$cachevar,
   [ac_save_CXX="$CXX"
@@ -127,6 +137,13 @@ AC_DEFUN([AX_CXX_COMPILE_STDCXX], [dnl
   fi
 done
   fi])
+  m4_ifnblank([$4], [dnl
+CXX$4="$CXX"
+CXXFLAGS$4="$CXXFLAGS"
+CPPFLAGS$4="$CPPFLAGS"
+CXX="$ax_cv_cxx_compile_cxx$1_orig_cxx"
+CXXFLAGS="$ax_cv_cxx_compile_cxx$1_orig_cxxflags"
+

[PATCH] libstdc++: Fix iota_view::size() to avoid overflow

2020-08-20 Thread Jonathan Wakely via Gcc-patches
This avoids overflow that occurs when negating the most negative value of
an integral type.

Also prevent returning signed int when the values have lower rank and
promote to int.

libstdc++-v3/ChangeLog:

* include/std/ranges (ranges::iota_view::size()): Perform all
calculations in the right unsigned types.
* testsuite/std/ranges/iota/size.cc: New test.

Tested powerpc64-linux. Not yet pushed.

Does anybody see any reason to stick with exactly what C++20 requires,
despite its bugs?
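To illustrate the two problems described above, here is a minimal C++ sketch of computing the size entirely in the unsigned type. `iota_size` is a hypothetical free function for illustration, not the libstdc++ member, and it assumes both endpoints share a standard integral type `W` with `value <= bound`.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical helper, not std::ranges::iota_view::size(): the number of
// elements in [value, bound) for a standard integral type W.
template<typename W>
constexpr std::make_unsigned_t<W>
iota_size(W value, W bound)
{
  using U = std::make_unsigned_t<W>;
  // Negating value overflows when value == numeric_limits<W>::min().
  // Converting to U is always well-defined (modulo 2^N) and unsigned
  // subtraction wraps, so the difference is exact for value <= bound.
  // For W narrower than int, U(bound) - U(value) is computed in int after
  // integral promotion; the outer U(...) converts it back, so the result
  // is never a plain (signed) int.
  return U(U(bound) - U(value));
}
```

For example, `iota_size<int>(std::numeric_limits<int>::min(), std::numeric_limits<int>::max())` yields `std::numeric_limits<unsigned>::max()`, and `iota_size<short>(0, 1)` has type `unsigned short` rather than `int`.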


commit 6e7e3f742bf22139716712908efa83c74c734ed2
Author: Jonathan Wakely 
Date:   Thu Aug 20 19:44:43 2020

libstdc++: Fix iota_view::size() to avoid overflow

This avoids overflow that occurs when negating the most negative value of
an integral type.

Also prevent returning signed int when the values have lower rank and
promote to int.

libstdc++-v3/ChangeLog:

* include/std/ranges (ranges::iota_view::size()): Perform all
calculations in the right unsigned types.
* testsuite/std/ranges/iota/size.cc: New test.

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index b8023e67c9f..22184006c08 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -889,12 +889,13 @@ namespace ranges
   {
using __detail::__is_integer_like;
using __detail::__to_unsigned_like;
-   if constexpr (__is_integer_like<_Winc> && __is_integer_like<_Bound>)
- return (_M_value < 0)
-   ? ((_M_bound < 0)
-   ? __to_unsigned_like(-_M_value) - __to_unsigned_like(-_M_bound)
-   : __to_unsigned_like(_M_bound) + __to_unsigned_like(-_M_value))
-   : __to_unsigned_like(_M_bound) - __to_unsigned_like(_M_value);
+   if constexpr (integral<_Winc> && integral<_Bound>)
+ {
+   using _Up = make_unsigned_t;
+   return _Up(_M_bound) - _Up(_M_value);
+ }
+   else if constexpr (__is_integer_like<_Winc>)
+ return __to_unsigned_like(_M_bound) - __to_unsigned_like(_M_value);
else
  return __to_unsigned_like(_M_bound - _M_value);
   }
diff --git a/libstdc++-v3/testsuite/std/ranges/iota/size.cc b/libstdc++-v3/testsuite/std/ranges/iota/size.cc
new file mode 100644
index 000..2a9d3870c5d
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/ranges/iota/size.cc
@@ -0,0 +1,110 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=c++2a" }
+// { dg-do compile { target c++2a } }
+
+#include 
+#include 
+
+template
+constexpr bool
+equal(T t, U u) requires std::same_as
+{
+  return t == u;
+}
+
+template>
+void
+test_integer_iota()
+{
+  using std::numeric_limits;
+
+  using V = std::ranges::iota_view;
+  static_assert( std::ranges::sized_range );
+
+  constexpr V zero(0, 0);
+  static_assert( equal(zero.size(), (S)0) );
+
+  constexpr V min(numeric_limits::min(),
+ numeric_limits::min());
+  static_assert( equal(min.size(), (S)0) );
+
+  constexpr V max(numeric_limits::max(),
+ numeric_limits::max());
+  static_assert( equal(max.size(), (S)0) );
+
+  constexpr V minmax(numeric_limits::min(),
+numeric_limits::max());
+  if constexpr (sizeof(W) < sizeof(S))
+  {
+using S2 = std::make_unsigned_t;
+static_assert( equal(minmax.size(), (S)numeric_limits::max()) );
+  }
+  else
+static_assert( equal(minmax.size(), numeric_limits::max()) );
+
+  constexpr V pospos(20, 22);
+  static_assert( equal(pospos.size(), (S)2) );
+
+  if constexpr (std::numeric_limits::is_signed)
+  {
+constexpr V negneg(-20, -2);
+static_assert( equal(negneg.size(), (S)18) );
+
+constexpr V negpos(-20, 22);
+static_assert( equal(negpos.size(), (S)42) );
+  }
+}
+
+void
+test01()
+{
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+
+#ifdef __SIZEOF_INT128__
+  // When the target supports __int128 it can be used in iota_view
+  // even in strict mode where !integral<__int128>.
+  // Specify the size type explicitly, because 

Re: [PATCH v2] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-20 Thread Maciej W. Rozycki via Gcc-patches
On Wed, 19 Aug 2020, Richard Earnshaw wrote:

> >  That said I'm of course happy to keep the ARM overrides if you consider 
> > them still necessary in the context of the generic change made.  Let me 
> > know what you prefer, and if required, I will submit v3 with the ARM 
> > pieces removed.
[...]
> So you've made a change to the Arm target, but not tested it.  And
> what's more didn't even bother to mention that fact.

 Well, I explicitly named the targets that have been tested, and it was 
clear ARM wasn't among them.

 I admit I forgot to cc ARM maintainers with v1, which I apologise for, 
and which mistake Richard B. has kindly corrected for me.  Nobody's 
perfect.

> If you make changes, you need to test them, particularly when there are
> likely to be target-specific implications.  If you can't test yourself
> then you need to make that very clear in your submission.
> 
> There are Arm targets in the testfarm, so it's not really an excuse for
> not doing testing.

 I think it's the port maintainer's role to verify their pet target;
that's what I did on the binutils/GDB side when I was an
active port maintainer.  I did not require people to bend over backwards
and appreciated their effort to make the toolchain better.

 It takes a maintainer maybe a couple of seconds to pull a change and push
it through the automated verification system they surely have readily
available, while it may be a day's effort for someone who has to figure
out all the details, choose all the configuration options required, avoid
pitfalls, keep rebuilding until all is sound, etc.  And then repeat that
for every new target possibly affected.

 As the change was intended to address an issue observed with RISC-V 
targets the ARM pieces are not needed.  I've sent v3 now, which keeps 
ARM-specific parts intact so that you won't have to be involved or 
otherwise spend your time on it.  You're free to pick the parts removed of 
course and do whatever you want with them according to the GNU GPL and 
keeping in mind my copyright assignment with FSF.

 NB when the original ARM fix/workaround that introduced
LIB2_DIVMOD_EXCEPTION_FLAGS was submitted, the failure, clearly not
ARM-specific, should have been properly analysed and a general solution
like mine proposed, so as to fix all targets that use these libcalls,
rather than taking care of your own business only, making a local fix
for ARM and letting other target developers rediscover the same issue.

 I regret now that I bothered touching the ARM part; I'll follow the
example from the paragraph above and in the future I will only take care
of my business and not go the extra mile where it could only cause me
trouble and bring no benefit.

 Thank you for your review anyway, it has taught me something.

  Maciej


[PATCH v3] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-20 Thread Maciej W. Rozycki via Gcc-patches
Complement commit b932f770f70d ("x86_64 frame unwind info"), SVN r46374,
and replace
`-fexceptions -fnon-call-exceptions' with `-fasynchronous-unwind-tables' 
in LIB2_DIVMOD_FUNCS compilation flags so as to provide unwind tables 
for the affected functions while not pulling the unwinder proper, which 
is not required here.

Beyond saving program space it fixes a RISC-V glibc build error due to 
unsatisfied `malloc' and `free' references from the unwinder causing 
link errors with `ld.so' where libgcc has been built at -O0.

gcc/
* testsuite/gcc.target/arm/div64-unwinding.c: Rename to...
* testsuite/gcc.dg/div64-unwinding.c: ... this.

libgcc/
* Makefile.in [!LIB2_DIVMOD_EXCEPTION_FLAGS]
(LIB2_DIVMOD_EXCEPTION_FLAGS): Replace `-fexceptions
-fnon-call-exceptions' with `-fasynchronous-unwind-tables'.
---
Hi,

 No change from v2 except for the removal of the ARM parts; hence no need 
to retest.  OK to apply?

  Maciej

Changes from v2:

- The removal of the ARM overrides has been dropped; the ARM parts are left intact.

Changes from v1:

- ChangeLog entries added.
---
 gcc/testsuite/gcc.dg/div64-unwinding.c |   25 +
 gcc/testsuite/gcc.target/arm/div64-unwinding.c |   25 -
 libgcc/Makefile.in |2 +-
 3 files changed, 26 insertions(+), 26 deletions(-)

gcc-libgcc-divmod-asynchronous-unwind-tables.diff
Index: gcc/gcc/testsuite/gcc.dg/div64-unwinding.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.dg/div64-unwinding.c
@@ -0,0 +1,25 @@
+/* Performing a 64-bit division should not pull in the unwinder.  */
+
+/* { dg-do run { target { { ! *-*-linux* } && { ! *-*-uclinux* } } } } */
+/* { dg-skip-if "load causes weak symbol resolution" { vxworks_kernel } } */
+/* { dg-options "-O0" } */
+
+#include 
+
+long long
+foo (long long c, long long d)
+{
+  return c/d;
+}
+
+long long x = 0;
+long long y = 1;
+
+extern int (*_Unwind_RaiseException) (void *) __attribute__((weak));
+
+int main(void)
+{
+  if (&_Unwind_RaiseException != NULL)
+abort ();;
+  return foo (x, y);
+}
Index: gcc/gcc/testsuite/gcc.target/arm/div64-unwinding.c
===
--- gcc.orig/gcc/testsuite/gcc.target/arm/div64-unwinding.c
+++ /dev/null
@@ -1,25 +0,0 @@
-/* Performing a 64-bit division should not pull in the unwinder.  */
-
-/* { dg-do run { target { { ! *-*-linux* } && { ! *-*-uclinux* } } } } */
-/* { dg-skip-if "load causes weak symbol resolution" { vxworks_kernel } } */
-/* { dg-options "-O0" } */
-
-#include 
-
-long long
-foo (long long c, long long d)
-{
-  return c/d;
-}
-
-long long x = 0;
-long long y = 1;
-
-extern int (*_Unwind_RaiseException) (void *) __attribute__((weak));
-
-int main(void)
-{
-  if (&_Unwind_RaiseException != NULL)
-abort ();;
-  return foo (x, y);
-}
Index: gcc/libgcc/Makefile.in
===
--- gcc.orig/libgcc/Makefile.in
+++ gcc/libgcc/Makefile.in
@@ -533,7 +533,7 @@ endif
 ifeq ($(LIB2_DIVMOD_EXCEPTION_FLAGS),)
 # Provide default flags for compiling divmod functions, if they haven't been
 # set already by a target-specific Makefile fragment.
-LIB2_DIVMOD_EXCEPTION_FLAGS := -fexceptions -fnon-call-exceptions
+LIB2_DIVMOD_EXCEPTION_FLAGS := -fasynchronous-unwind-tables
 endif
 
 # Build LIB2_DIVMOD_FUNCS.


Re: [committed] libstdc++: Make __int128 meet integer-class requirements [PR 96042]

2020-08-20 Thread Jonathan Wakely via Gcc-patches

On 19/08/20 20:36 +0100, Jonathan Wakely wrote:

On 19/08/20 17:00 +0100, Jonathan Wakely wrote:

Because __int128 can be used as the difference type for iota_view, we
need to ensure that it meets the requirements of an integer-class type.
The requirements in [iterator.concept.winc] p10 include numeric_limits
being specialized and giving meaningful answers. Currently we only
specialize numeric_limits for non-standard integer types in non-strict
modes.  However, nothing prevents us from defining an explicit
specialization for any implementation-defined type, so it doesn't matter
whether std::is_integral<__int128> is true or not.

This patch ensures that the numeric_limits specializations for signed
and unsigned __int128 are defined whenever __int128 is available. It
also makes the __numeric_traits and __int_limits helpers work for
__int128, via a new __gnu_cxx::__is_integer_nonstrict trait.

libstdc++-v3/ChangeLog:

PR libstdc++/96042
* include/ext/numeric_traits.h (__is_integer_nonstrict): New
trait which is true for 128-bit integers even in strict modes.
(__numeric_traits_integer, __numeric_traits): Use
__is_integer_nonstrict instead of __is_integer.
* include/std/limits [__STRICT_ANSI__ && __SIZEOF_INT128__]
(numeric_limits<__int128>, (numeric_limits):
Define.
* testsuite/std/ranges/iota/96042.cc: New test.


The attached patch is another change needed to support __int128 as an
integer-like type in strict mode.


And one more piece of __int128 support.

Tested x86_64-linux, -m32 and -m64. Committed to trunk.

I'll backport this to gcc-10 too.
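The idea behind the `__is_integer_nonstrict` trait mentioned in the quoted description can be sketched as follows. This is a hypothetical stand-in (`is_integer_nonstrict` in the global namespace) written for illustration; it is not the libstdc++ definition.

```cpp
#include <type_traits>

// Hypothetical sketch of a "non-strict" integer trait: true for every
// standard integral type, and also for the 128-bit integer types even in
// strict modes such as -std=c++20, where std::is_integral<__int128> is false.
template<typename T>
struct is_integer_nonstrict : std::is_integral<T> { };

#ifdef __SIZEOF_INT128__
// GCC predefines __SIZEOF_INT128__ when the target supports __int128.
template<>
struct is_integer_nonstrict<__int128> : std::true_type { };

template<>
struct is_integer_nonstrict<unsigned __int128> : std::true_type { };
#endif
```

Because the primary template defers to `std::is_integral`, the explicit specializations only change the answer in strict modes; in GNU modes `std::is_integral<__int128>` is already true with GCC.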


commit 5e9ad288eb6fb366142b166e7985d16727b398e1
Author: Jonathan Wakely 
Date:   Thu Aug 20 19:41:15 2020

libstdc++: Make incrementable<__int128> satisfied in strict mode

This adds specializations of std::incrementable_traits so that 128-bit
integers are always considered incrementable (and therefore usable with
std::ranges::iota_view) even when they don't satisfy std::integral.

libstdc++-v3/ChangeLog:

* include/bits/iterator_concepts.h [__STRICT_ANSI__]
(incrementable_traits<__int128>): Define specialization.
(incrementable_traits): Likewise.
* testsuite/std/ranges/iota/96042.cc: Test iota_view with
__int128.

diff --git a/libstdc++-v3/include/bits/iterator_concepts.h b/libstdc++-v3/include/bits/iterator_concepts.h
index 5033f2bddc3..bd6660c5f22 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -173,6 +173,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	= make_signed_t() - std::declval<_Tp>())>;
 };
 
+#if defined __STRICT_ANSI__ && defined __SIZEOF_INT128__
+  // __int128 is incrementable even if !integral<__int128>
+  template<>
+struct incrementable_traits<__int128>
+{ using difference_type = __int128; };
+
+  template<>
+struct incrementable_traits
+{ using difference_type = __int128; };
+#endif
+
   namespace __detail
   {
 // An iterator such that iterator_traits<_Iter> names a specialization
diff --git a/libstdc++-v3/testsuite/std/ranges/iota/96042.cc b/libstdc++-v3/testsuite/std/ranges/iota/96042.cc
index 6f5c8f61fd2..911663bc413 100644
--- a/libstdc++-v3/testsuite/std/ranges/iota/96042.cc
+++ b/libstdc++-v3/testsuite/std/ranges/iota/96042.cc
@@ -24,8 +24,13 @@ void
 test01()
 {
   // PR libstdc++/96042
-  using V = std::ranges::iota_view;
+  using V = std::ranges::iota_view;
+
+  // In strict -std=c++20 mode there is no integer wider than long long,
+  // so V's difference type is an integer-class type, [iterator.concept.winc].
+  // In practice this is either __int128 or __detail::__max_diff_type.
   using D = std::ranges::range_difference_t;
+  // Ensure that numeric_limits is correctly specialized for the type.
   using L = std::numeric_limits;
   static_assert( L::is_specialized );
   static_assert( L::is_signed );
@@ -37,3 +42,24 @@ test01()
   static_assert( L::max() == ~L::min() );
   static_assert( L::lowest() == L::min() );
 }
+
+#ifdef __SIZEOF_INT128__
+void
+test02()
+{
+  // When the target supports __int128 it can be used in iota_view
+  // even in strict mode where !integral<__int128>.
+  using V = std::ranges::iota_view<__int128, __int128>;
+  using D = std::ranges::range_difference_t; // __detail::__max_diff_type
+  using L = std::numeric_limits;
+  static_assert( L::is_specialized );
+  static_assert( L::is_signed );
+  static_assert( L::is_integer );
+  static_assert( L::is_exact );
+  static_assert( L::digits > std::numeric_limits::digits );
+  static_assert( L::digits10 == static_cast(L::digits * 0.30103) );
+  static_assert( L::min() == (D(1) << L::digits) );
+  static_assert( L::max() == ~L::min() );
+  static_assert( L::lowest() == L::min() );
+}
+#endif
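The `digits10` assertion in test02 encodes the rule that, for an integer type with `digits` value bits, `digits10` is floor(digits × log10 2), using 0.30103 as an approximation of log10 2 (about 0.30102999566). As a quick sanity check, the sketch below applies the same formula to the standard integer types; `digits10_matches` is a hypothetical helper, not part of the test.

```cpp
#include <limits>

// For an integer type T, digits10 should equal floor(digits * log10(2)).
// 0.30103 is the same decimal approximation of log10(2) used in test02;
// it is accurate enough for the bit widths checked here.
template<typename T>
constexpr bool digits10_matches()
{
  using L = std::numeric_limits<T>;
  return L::digits10 == static_cast<int>(L::digits * 0.30103);
}
```

For instance, `long long` has 63 value bits, and 63 × 0.30103 ≈ 18.96, so `digits10` is 18.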


[Patch, fortran] PR fortran/96727 - ICE with character length specified using specification function on assumed-rank array

2020-08-20 Thread José Rui Faustino de Sousa via Gcc-patches

Hi all!

Proposed patch to PR96727 - ICE with character length specified using 
specification function on assumed-rank array.


Patch tested only on x86_64-pc-linux-gnu.

Add missing default error message for the assumed-rank array case.

Thank you very much.

Best regards,
José Rui


2020-08-20  José Rui Faustino de Sousa

 PR fortran/96727
 * expr.c (gfc_check_init_expr): Add default error message for the
 AS_ASSUMED_RANK case.

2020-08-20  José Rui Faustino de Sousa

 PR fortran/96727
 * PR96727.f90: New test.
diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index 6707ca5..aecbe46 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -3007,6 +3007,12 @@ gfc_check_init_expr (gfc_expr *e)
			   e->symtree->n.sym->name, &e->where);
 		break;
 
+	  case AS_ASSUMED_RANK:
+		gfc_error ("Assumed-rank array %qs at %L is not permitted "
+			   "in an initialization expression",
			   e->symtree->n.sym->name, &e->where);
+		break;
+
 	  default:
 		gcc_unreachable();
 	  }
diff --git a/gcc/testsuite/gfortran.dg/PR96727.f90 b/gcc/testsuite/gfortran.dg/PR96727.f90
new file mode 100644
index 000..d45dbb7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR96727.f90
@@ -0,0 +1,34 @@
+! { dg-do run }
+!
+! Test the fix for PR96727
+!
+
+program cref_p
+
+  implicit none
+  
+  integer :: i
+
+  integer,  parameter :: n = 3
+  integer,  parameter :: p(*) = [(i, i=1,n*n)]
+  character(len=*), parameter :: q = repeat('a', n*n)
+  
+  integer:: a(n,n)
+  character(len=n*n) :: c
+
+  a = reshape(p, shape=[n,n])
+  call csub(a, c)
+  if (c/=q) stop 1
+  stop
+
+contains
+
+  subroutine csub(a, b)
+integer,intent(in)  :: a(..)
+character(len=size(a)), intent(out) :: b
+
+b = repeat('a', len(b))
+return
+  end subroutine csub
+  
+end program cref_p


[Patch, fortran] PR fortran/96726 - ICE with user defined specification function on assumed-rank array

2020-08-20 Thread José Rui Faustino de Sousa via Gcc-patches

Hi all!

Proposed patch to PR96726 - ICE with user defined specification function 
on assumed-rank array.


Patch tested only on x86_64-pc-linux-gnu.

Obvious fix: replace the not-equal (!=) operator with less-than (<) to
avoid an infinite loop.
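To show why the relational operator matters, here is a small C++ model of the two loop guards. It assumes, for illustration only, that an assumed-rank reference can reach check_references with a negative `dimen` (the sketch uses -1); `count_lt`, `ne_runs_away` and the `cap` parameter are hypothetical and exist only so the `!=` variant terminates in the example.

```cpp
// Iterations of: for (dim = 0; dim < dimen; ++dim)
constexpr int count_lt(int dimen)
{
  int n = 0;
  for (int dim = 0; dim < dimen; ++dim)
    ++n;
  return n;
}

// Same loop with the original '!=' guard, stopped after 'cap' iterations.
// Returns true if the cap was hit, i.e. the uncapped loop would keep going
// (until signed overflow, which is undefined behaviour) for this dimen.
constexpr bool ne_runs_away(int dimen, int cap = 1000)
{
  int n = 0;
  for (int dim = 0; dim != dimen; ++dim)
    if (++n == cap)
      return true;
  return false;
}
```

With `<`, a non-positive dimension count simply skips the loop body, while with `!=` the counter starts at 0 and can never reach -1 by incrementing.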


Thank you very much.

Best regards,
José Rui


2020-08-20  José Rui Faustino de Sousa

 PR fortran/96726
 * expr.c (check_references): Change the not-equal relational operator
 to the less-than operator to avoid an infinite loop.

2020-08-20  José Rui Faustino de Sousa

 PR fortran/96726
 * PR96726.f90: New test.
diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index 6707ca5..2ef01f0 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -3273,7 +3273,7 @@ check_references (gfc_ref* ref, bool (*checker) (gfc_expr*))
   switch (ref->type)
 {
 case REF_ARRAY:
-  for (dim = 0; dim != ref->u.ar.dimen; ++dim)
+  for (dim = 0; dim < ref->u.ar.dimen; ++dim)
 	{
 	  if (!checker (ref->u.ar.start[dim]))
 	return false;
diff --git a/gcc/testsuite/gfortran.dg/PR96726.f90 b/gcc/testsuite/gfortran.dg/PR96726.f90
new file mode 100644
index 000..b0b26b9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR96726.f90
@@ -0,0 +1,72 @@
+! { dg-do run }
+!
+! Test the fix for PR96726
+!
+
+module cref_m
+
+  implicit none
+
+  private
+
+  public ::   &
+sizeish
+  
+contains
+
+  pure function sizeish(a) result(s)
+integer, intent(in) :: a(..)
+
+integer :: s
+
+s = size(a)
+return
+  end function sizeish
+  
+end module cref_m
+
+program cref_p
+
+  use cref_m, only: &
+sizeish
+
+  implicit none
+  
+  integer:: i
+
+  integer, parameter :: n = 3
+  integer, parameter :: p(*) = [(i, i=1,n*n)]
+  
+  integer :: a(n,n)
+  integer :: b(n*n)
+
+  a = reshape(p, shape=[n,n])
+  call isub_a(a, b)
+  if (any(b/=p)) stop 1
+  call isub_b(a, b)
+  if (any(b/=p)) stop 2
+  stop
+
+contains
+
+  subroutine isub_a(a, b)
+integer, intent(in)  :: a(..)
+integer, intent(out) :: b(size(a))
+
+integer :: i
+
+b = [(i, i=1,size(b))]
+return
+  end subroutine isub_a
+  
+  subroutine isub_b(a, b)
+integer, intent(in)  :: a(..)
+integer, intent(out) :: b(sizeish(a))
+
+integer :: i
+
+b = [(i, i=1,sizeish(b))]
+return
+  end subroutine isub_b
+  
+end program cref_p


Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-08-20 Thread Segher Boessenkool
On Thu, Aug 20, 2020 at 04:19:36PM +, GT wrote:
> > Great! Please repost with what I already pointed out fixed, that
> > explanation added, and working links to the documentation?
> 
> Are you ok with the titles of the patch and this document?
> 
> https://sourceware.org/glibc/wiki/HomePage?action=AttachFile=view=powerarchvectfuncabi.html

It is very misleading.  You can undo some of the damage in the first
lines of the commit message, but you can also just fix the title itself,
so that anyone can see what this is about even before reading the
message (which is what a mail subject is *for*!)


Segher


Re: [PATCH] configure: Require C++11 for building code generation tools

2020-08-20 Thread Richard Earnshaw (lists)
On 20/08/2020 18:07, Vaseeharan Vinayagamoorthy wrote:
> Hi Szabolcs,
> 
> In the top level gcc config.log, I see:
> 
> configure:5541: checking whether aarch64-none-linux-gnu-g++ supports C++11 features by default
> configure:5837: aarch64-none-linux-gnu-g++ -c -g -O2  conftest.cpp >&5
> configure:5837: $? = 0
> configure:5844: result: yes
> configure:6542: checking whether g++ supports C++11 features by default
> configure:6845: result: yes
> 
> Not sure whether that helps?
> 

I suspect this is a host/build confusion issue.  Are you sure that's the
compiler used for BUILD?

R.

> Regards
> Vasee
> 
> 
> On 20/08/2020, 16:26, "Szabolcs Nagy"  wrote:
> 
> The 08/20/2020 13:59, Vasee Vinayagamoorthy wrote:
> > +# Also require C++11 for building code generation tools.
> > +# Do nothing if "${build}" = "${host}", because in this case
> > +# CXX_FOR_BUILD="\$(CXX)", and $CXX is already set to the correct value above.
> > +if test "${build}" != "${host}"; then
> > +  saved_CXX=$CXX
> > +  saved_CXXCPP=$CXXCPP
> > +  CXX=$CXX_FOR_BUILD
> > +  CXXCPP=
> > +  AX_CXX_COMPILE_STDCXX(11)
> > +  CXX="$CXX -std=c++11"
> > +  CXX_FOR_BUILD=$CXX
> > +  CXX=$saved_CXX
> > +  CXXCPP=$saved_CXXCPP
> > +fi
> 
> i think AX_CXX_COMPILE_STDCXX(11) should
> set CXX correctly (it seems it would set
> it to "g++ -std=gnu++11" instead of
> "g++ -std=c++11" but either should work)
> 
> please look at the top level config.log
> i think you should look for
> 
> "checking whether g++ supports C++11 features with -std=gnu++11"
> 
> and that check should be successful.
> 



Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

2020-08-20 Thread Hans-Peter Nilsson via Gcc-patches
> From: Richard Sandiford 
> Date: Thu, 20 Aug 2020 10:30:56 +0200

> Anything I once knew about reorg.c has long since faded away, but since
> no one else has reviewed it...

Thanks.  I forgot to add PATCH and/or RFA: in the subject and
forgot to CC Eric, assuming he's interested (I did CC him as a
heads-up that this was coming, that's why I added you now, Eric).

> Do you know what guarantees that REG_DEAD and REG_UNUSED notes are
> reliable during reorg.c?  It was written at a time when passes were
> expected to keep the notes up-to-date, but that's not true these days.
> My worry is that they might be mostly right, but just stale enough
> to be harmful in corner cases.

They are depended upon in reorg.c as absolutely accurate (and
kept updated), and there's IMO enough empirical evidence that
they still are.  In this particular patch, I'm using it in the
same way Eric used it in 33c2207d3fda for
fill_simple_delay_slots (i.e. it would be similarly flawed), but
it's also used all over for the old-style dataflow analysis done
here.  We'd be seeing failures in those "corner cases" all
around delayed-branch targets if that didn't work.  I should be
able to use the existing machinery in this patch.

BTW, I happened to notice that bugs here are also somewhat more
visible than your ordinary wrong-result bug. :)

> Hans-Peter Nilsson via Gcc-patches  writes:
> > Originally I thought to bootstrap this patch on MIPS and SPARC
> > since they're both delayed-branch-slot targets but I
> > reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
> > seems only visium and CRIS have this feature set, and I see no
> > trace of visium in either newlib or the simulator next to
> > glibc.  So, I just tested cris-elf.
> >
> > This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
> > fillers using a method similar to that in commit 33c2207d3fda,
> > where care was taken for fill_simple_delay_slots to allow such
> > insns when scanning for delay-slot fillers *backwards* (before
> > the insn).
> >
> > A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
> > For cc0 targets, insns don't mention clobbering cc0, so the
> > clobbers are mentioned in the "resources" only as a special
> > entity and only for compare-insns and branches, where the cc0
> > value matters.
> >
> > In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
> > the register liveness detection in reorg.c / resource.c treats
> > that as a blocker (for other insns mentioning it, i.e. most)
> > when looking for delay-slot-filling candidates.  This means that
> > when comparing code and performance for a delay-slot cc0 target
> > before and after the de-cc0 conversion, the inability to fill a
> > delay slot after conversion manifests as a regression.  This was
> > one such case, for CRIS, with random_bitstring in
> > gcc.c-torture/execute/arith-rand-ll.c as well as the target
> > libgcc division function.
> >
> > After this, all known performance regressions compared to cc0
> > are fixed.
> >
> > Ok to commit?
> >
> > gcc:
> > PR target/93372
> > * reorg.c (fill_slots_from_thread): Allow trial insns that clobber
> > TARGET_FLAGS_REGNUM as delay-slot fillers.
> >
> > gcc/testsuite:
> > PR target/93372
> > * gcc.target/cris/pr93372-47.c: New test.
> > ---
> >  gcc/reorg.c                                | 37 +-
> >  gcc/testsuite/gcc.target/cris/pr93372-47.c | 49 ++
> >  2 files changed, 85 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-47.c
> >
> > diff --git a/gcc/reorg.c b/gcc/reorg.c
> > index dfd7494bf..83161caa0 100644
> > --- a/gcc/reorg.c
> > +++ b/gcc/reorg.c
> > @@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
> >CLEAR_RESOURCE (&needed);
> >CLEAR_RESOURCE (&set);
> >  
> > +  /* Handle the flags register specially, to be able to accept a
> > + candidate that clobbers it.  See also fill_simple_delay_slots.  */
> > +  bool filter_flags
> > += (slots_to_fill == 1
> > +   && targetm.flags_regnum != INVALID_REGNUM
> > +   && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
> > +  struct resources fset;
> > +  struct resources flags_res;
> > +  if (filter_flags)
> > +{
> > +  CLEAR_RESOURCE (&fset);
> > +  CLEAR_RESOURCE (&flags_res);
> > +  SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
> > +}
> > +
> >/* If we do not own this thread, we must stop as soon as we find
> >   something that we can't put in a delay slot, since all we can do
> >   is branch into THREAD at a later point.  Therefore, labels stop
> > @@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
> >/* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
> >  don't separate or copy insns that set and use CC0.  */
> >if (! insn_references_resource_p (trial, &set, true)
> > - && ! 
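The effect of the filter_flags logic in the fill_slots_from_thread hunk above can be pictured with a toy model. This is a sketch under stated assumptions, not reorg.c's data structures. Registers are bits in a 32-bit mask. `needed` and `set` stand for the registers read and written by insns already scanned in the thread. `flags_dead` stands for the REG_DEAD note for the flags register on the branch, with a single slot to fill. The names `RegMask`, `Trial`, `kFlagsRegno` and `can_fill_slot` are hypothetical, and register 31 is an arbitrary stand-in for `targetm.flags_regnum`.

```cpp
#include <cstdint>

// Toy model (hypothetical): each bit of a RegMask stands for one hard register.
using RegMask = std::uint32_t;

// Arbitrary stand-in for targetm.flags_regnum; not a real target's numbering.
constexpr unsigned kFlagsRegno = 31;
constexpr RegMask kFlagsBit = RegMask{1} << kFlagsRegno;

// Summary of a candidate delay-slot insn ("trial").
struct Trial {
  RegMask uses;  // registers the trial reads
  RegMask sets;  // registers the trial writes or clobbers
};

// NEEDED/SET: registers read/written by the insns already scanned in the
// thread.  FLAGS_DEAD models a REG_DEAD note for the flags register on the
// branch, which is when a flags clobber in the slot is harmless.
constexpr bool can_fill_slot(Trial trial, RegMask needed, RegMask set,
                             bool flags_dead) {
  // Reading a register that an earlier insn in the thread writes always loses.
  if ((trial.uses & set) != 0)
    return false;
  // With the flags register dead after the branch, drop it from the trial's
  // writes before checking for write conflicts.
  const RegMask writes = flags_dead ? (trial.sets & ~kFlagsBit) : trial.sets;
  return (writes & (needed | set)) == 0;
}
```

With these definitions, a candidate that clobbers the flags register is rejected when an earlier insn in the thread uses or sets flags, but accepted once `flags_dead` is true. A candidate that reads a register set earlier in the thread is rejected in either case.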

Re: [Patch] configure: Also check C++11 (flags) for ${build} compiler not only for ${host}

2020-08-20 Thread Joseph Myers
On Thu, 13 Aug 2020, Tobias Burnus wrote:

> diff --git a/config/ax_cxx_compile_stdcxx.m4 b/config/ax_cxx_compile_stdcxx.m4
> index 9413da624d2..0cd515fc65b 100644
> --- a/config/ax_cxx_compile_stdcxx.m4
> +++ b/config/ax_cxx_compile_stdcxx.m4
> @@ -25,6 +25,10 @@
>  #   regardless, after defining HAVE_CXX${VERSION} if and only if a
>  #   supporting mode is found.
>  #
> +#   If the fourth argument is the CXX/CXXFLAG/CPPFLAG suffix, e.g.
> +#   "_FOR_BUILD".

It appears you're requiring _FOR_BUILD here and considering other suffixes 
invalid, which would prevent any other use, e.g. _FOR_TARGET.

When building GCC, _FOR_TARGET is of course irrelevant because the 
top-level build support in the source tree is only intended to work with 
the version of GCC in that source tree so can assume what language 
features it supports.  It's less clear that no other suffix will ever be 
relevant elsewhere, given that this is autoconf-archive code rather than 
just used by GCC.

> +  m4_if([$4], [], [],
> +[$4], [_FOR_BUILD], [],
> +[m4_fatal([invalid fourth argument `$4' to AX_CXX_COMPILE_STDCXX])])dnl

So I'm not convinced this check that the suffix should be empty or 
_FOR_BUILD is a good idea.

> +  m4_if([$4], [_FOR_BUILD],
> +[ax_cv_cxx_compile_cxx$1_orig_cxx="$CXX"
> + ax_cv_cxx_compile_cxx$1_orig_cxxflags="$CXXFLAGS"
> + ax_cv_cxx_compile_cxx$1_orig_cppflags="$CPPFLAGS"
> + CXX="$CXX$4"
> + CXXFLAGS="$CXXFLAGS$4"
> + CPPFLAGS="$CPPFLAGS$4"])

And then it might be better for this to be a check for the suffix not 
being empty, rather than it being exactly _FOR_BUILD (even if you keep the 
check that other suffixes are invalid, cutting down the number of places 
hardcoding _FOR_BUILD seems a good idea).

>m4_if([$2], [], [dnl
>  AC_CACHE_CHECK(whether $CXX supports C++$1 features by default,
> -ax_cv_cxx_compile_cxx$1,
> +ax_cv_cxx_compile_cxx$1$4,
>   [AC_COMPILE_IFELSE([AC_LANG_SOURCE([_AX_CXX_COMPILE_STDCXX_testbody_$1])],
>  [ax_cv_cxx_compile_cxx$1=yes],
>  [ax_cv_cxx_compile_cxx$1=no])])

I think this needs to update the variable name in the assignments of the 
result of the check, and then in the subsequent check for whether to set 
ac_success=yes, not just in the second argument to AC_CACHE_CHECK.

> +  m4_if([$4], [_FOR_BUILD],
> +[CXX$4="$CXX"
> + CXXFLAGS$4="$CXXFLAGS"
> + CPPFLAGS$4="$CPPFLAGS"
> + CXX="$ax_cv_cxx_compile_cxx$1_orig_cxx"
> + CXXFLAGS="$ax_cv_cxx_compile_cxx$1_orig_cxxflags"
> + CPPFLAGS="$ax_cv_cxx_compile_cxx$1_orig_cppflags"])

I think this also would be better checking for $4 not being empty rather 
than for it being exactly _FOR_BUILD.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] configure: Require C++11 for building code generation tools

2020-08-20 Thread Vaseeharan Vinayagamoorthy
Hi Szabolcs,

In the top level gcc config.log, I see:

configure:5541: checking whether aarch64-none-linux-gnu-g++ supports C++11 features by default
configure:5837: aarch64-none-linux-gnu-g++ -c -g -O2  conftest.cpp >&5
configure:5837: $? = 0
configure:5844: result: yes
configure:6542: checking whether g++ supports C++11 features by default
configure:6845: result: yes

Not sure whether that helps?

Regards
Vasee


On 20/08/2020, 16:26, "Szabolcs Nagy"  wrote:

The 08/20/2020 13:59, Vasee Vinayagamoorthy wrote:
> +# Also require C++11 for building code generation tools.
> +# Do nothing if "${build}" = "${host}", because in this case
> +# CXX_FOR_BUILD="\$(CXX)", and $CXX is already set to the correct value above.
> +if test "${build}" != "${host}"; then
> +  saved_CXX=$CXX
> +  saved_CXXCPP=$CXXCPP
> +  CXX=$CXX_FOR_BUILD
> +  CXXCPP=
> +  AX_CXX_COMPILE_STDCXX(11)
> +  CXX="$CXX -std=c++11"
> +  CXX_FOR_BUILD=$CXX
> +  CXX=$saved_CXX
> +  CXXCPP=$saved_CXXCPP
> +fi

i think AX_CXX_COMPILE_STDCXX(11) should
set CXX correctly (it seems it would set
it to "g++ -std=gnu++11" instead of
"g++ -std=c++11" but either should work)

please look at the top level config.log
i think you should look for

"checking whether g++ supports C++11 features with -std=gnu++11"

and that check should be successful.



[Patch, fortran] PR fortran/96724 - Bogus warnings with the repeat intrinsic and the flag -Wconversion-extra

2020-08-20 Thread José Rui Faustino de Sousa via Gcc-patches

Hi all!

Proposed patch to PR96724 - Bogus warnings with the repeat intrinsic and 
the flag -Wconversion-extra.


Patch tested only on x86_64-pc-linux-gnu.

Add code to force conversion to the default wider integer type before 
multiplication.


Thank you very much.

Best regards,
José Rui


2020-08-20  José Rui Faustino de Sousa

 PR fortran/96724
 * iresolve.c (gfc_resolve_repeat): Force conversion to
 gfc_index_integer_kind before the call to gfc_multiply.

2020-08-20  José Rui Faustino de Sousa

 PR fortran/96724
 * gfortran.dg/repeat_8.f90: New test.
diff --git a/gcc/fortran/iresolve.c b/gcc/fortran/iresolve.c
index 7376961..74075a7 100644
--- a/gcc/fortran/iresolve.c
+++ b/gcc/fortran/iresolve.c
@@ -2332,7 +2332,22 @@ gfc_resolve_repeat (gfc_expr *f, gfc_expr *string,
 }
 
   if (tmp)
-f->ts.u.cl->length = gfc_multiply (tmp, gfc_copy_expr (ncopies));
+{
+  gfc_expr *e = gfc_copy_expr (ncopies);
+
+  /* Force-convert to index_kind so that we don't need
+	 so many runtime variations.  */
+  if (e->ts.kind != gfc_index_integer_kind)
+	{
+	  gfc_typespec ts = e->ts;
+
+	  ts.kind = gfc_index_integer_kind;
+	  gfc_convert_type_warn (e, &ts, 2, 0);
+	}
+  if (tmp->ts.kind != gfc_index_integer_kind)
+	gfc_convert_type_warn (tmp, >ts, 2, 0);
+  f->ts.u.cl->length = gfc_multiply (tmp, e);
+}
 }
 
 
diff --git a/gcc/testsuite/gfortran.dg/repeat_8.f90 b/gcc/testsuite/gfortran.dg/repeat_8.f90
new file mode 100644
index 000..6876af9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/repeat_8.f90
@@ -0,0 +1,88 @@
+! { dg-do compile }
+! { dg-additional-options "-Wconversion-extra" }
+!
+! Test fix for PR96724
+!
+
+program repeat_p
+
+  use, intrinsic :: iso_fortran_env, only: &
+int8, int16, int32, int64
+  
+  implicit none
+
+  integer, parameter :: n = 20
+
+  integer(kind=int8),  parameter :: p08 = int(n, kind=int8)
+  integer(kind=int16), parameter :: p16 = int(n, kind=int16)
+  integer(kind=int32), parameter :: p32 = int(n, kind=int32)
+  integer(kind=int64), parameter :: p64 = int(n, kind=int64)
+  
+  integer(kind=int8)  :: i08
+  integer(kind=int16) :: i16
+  integer(kind=int32) :: i32
+  integer(kind=int64) :: i64
+  
+  character(len=n) :: c
+
+  i08 = p08
+  c = repeat('X', 20_int8)
+  c = repeat('X', i08)
+  c = repeat('X', p08)
+  c = repeat('X', len08(c))
+  i16 = p16
+  c = repeat('X', 20_int16)
+  c = repeat('X', i16)
+  c = repeat('X', p16)
+  c = repeat('X', len16(c))
+  i32 = p32
+  c = repeat('X', 20_int32)
+  c = repeat('X', i32)
+  c = repeat('X', p32)
+  c = repeat('X', len32(c))
+  i64 = p64
+  c = repeat('X', 20_int64)
+  c = repeat('X', i64)
+  c = repeat('X', p64)
+  c = repeat('X', len64(c))
+  stop
+
+contains
+
+  function len08(x) result(l)
+character(len=*), intent(in) :: x
+
+integer(kind=int8) :: l
+
+l = int(len(x), kind=int8)
+return
+  end function len08
+  
+  function len16(x) result(l)
+character(len=*), intent(in) :: x
+
+integer(kind=int16) :: l
+
+l = int(len(x), kind=int16)
+return
+  end function len16
+  
+  function len32(x) result(l)
+character(len=*), intent(in) :: x
+
+integer(kind=int32) :: l
+
+l = int(len(x), kind=int32)
+return
+  end function len32
+  
+  function len64(x) result(l)
+character(len=*), intent(in) :: x
+
+integer(kind=int64) :: l
+
+l = int(len(x), kind=int64)
+return
+  end function len64
+  
+end program repeat_p


Re: [PATCH] configure: Require C++11 for building code generation tools

2020-08-20 Thread Tobias Burnus

Hi,

how about my (unreviewed) patch for PR 96612 at
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551900.html ?

That one also tries to solve the CXX_FOR_BUILD issue with an older GCC.
Would that solve your issue as well?

Tobias

On 8/20/20 4:19 PM, Ilya Leoshkevich via Gcc-patches wrote:


On Thu, 2020-08-20 at 13:59 +0100, Vasee Vinayagamoorthy wrote:

Hello,

After commit [1] ("Redefine NULL to nullptr"), building gcc
fails when $CXX_FOR_BUILD is not using C++11 mode by default.
This happens with gcc-4.8 which is still supported.

This patch fixes this by adding -std=c++11 or its equivalent
to $CXX_FOR_BUILD using AX_CXX_COMPILE_STDCXX(11).

Tested by successful cross native build for aarch64-none-linux-gnu
target.

OK for trunk?


ChangeLog:

2020-08-20  Ilya Leoshkevich  
 Vasee Vinayagamoorthy  <vaseeharan.vinayagamoor...@arm.com>

 PR target/95700
 * configure: Regenerate.
 * configure.ac: Require C++11 for building code generation
tools.

Regards
Vasee Vinayagamoorthy


PS: I do not have commit rights, therefore could I request
someone to commit it on my behalf if this patch is approved.
Thanks in advance.

[1]
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=d59a576b8b5e12c3a56f0262912090e2921f5daa

Hi!

First of all, sorry about losing track of this issue.

Regarding your addition to the patch, I fear that

CXX="$CXX -std=c++11"

might break some rarely used compilers. But doing this portably is
exactly what AX_CXX_COMPILE_STDCXX(11) is for. Do you know why it isn't
working for you?

Best regards,
Ilya


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter


Re: [Patch, committed] Fortran: Fix OpenMP's 'if(simd:' etc. conditions

2020-08-20 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 20, 2020 at 01:36:01PM +0200, Tobias Burnus wrote:
> gcc/fortran/ChangeLog:
> 
> * openmp.c (gfc_match_omp_clauses): Re-order 'if' clause parsing
> to avoid creating spurious symbols.
> 
> libgomp/ChangeLog:
> 
> * testsuite/libgomp.fortran/lastprivate-conditional-10.f90: New 
> test.

LGTM, thanks.

Jakub



Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-08-20 Thread GT via Gcc-patches
‐‐‐ Original Message ‐‐‐
On Tuesday, August 18, 2020 5:32 PM, Segher Boessenkool 
 wrote:

> On Tue, Aug 18, 2020 at 07:14:19PM +, GT wrote:
>
> > > That sounds like libmvec?
> > > I still don't know what this is.
> >
> > Yes, it is libmvec.
> > Now look at what GCC does to the code in Examples 1 and 2 at this link:
> > https://sourceware.org/glibc/wiki/libmvec
> > x86_64 added functionality to GCC so such code uses the new functions 
> > without the user
> > having to re-write the loops and explicitly call the new functions.
> > We are aiming to provide that same capability for PPC64 in GCC.
>
> Great! Please repost with what I already pointed out fixed, that
> explanation added, and working links to the documentation?
>

Are you ok with the titles of the patch and this document?

https://sourceware.org/glibc/wiki/HomePage?action=AttachFile=view=powerarchvectfuncabi.html

Bert.


Re: [PATCH] Fortran : get_environment_variable runtime error PR96486

2020-08-20 Thread Thomas Koenig via Gcc-patches

Hi Mark,

Please find attached a fix for PR96486.

OK to commit?


OK. Thanks for the patch!

Best regards

Thomas



Re: [Patch] Fortran: Add 'device_type' clause to OpenMP's declare target

2020-08-20 Thread Andre Vehreschild
Hi Tobias,

to me this looks OK now.

Regards,
Andre

On Thu, 20 Aug 2020 11:51:50 +0200
Tobias Burnus  wrote:

> Updated patch – taking Andre's suggestions into account +
> extending the testcase, which now catches the previous (NO)HOST
> module issue.
> 
> OK?
> 
> Tobias
> 
> On 8/19/20 2:51 PM, Tobias Burnus wrote:
> > On 18.08.20 at 19:33, Andre Vehreschild wrote:  
> >> +case OMP_DEVICE_TYPE_HOST:
> >> +  MIO_NAME (ab_attribute) (AB_OMP_DEVICE_TYPE_NOHOST, attr_bits);
> >> Why also NOHOST here?  
> > Copy and paste error.  
> ...
> >> +  tree clauses = NULL_TREE;
> >> Would you mind using "omp_clauses" or the like here?  
> Done now.


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [RISC-V] Add support for AddressSanitizer on RISC-V GCC

2020-08-20 Thread Palmer Dabbelt

On Wed, 19 Aug 2020 02:25:37 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

Hi Andrew:

I am not sure why some targets pick different numbers.
It seems it's not only target dependent but also OS dependent[1].

For RV32, I think using 1<<29 like other 32 bit targets is fine.

[1] 
https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/asan/asan_mapping.h#L159

Hi Joshua:

Could you update that for RV32, and this patch will be pending until
LLVM accepts the libsanitizer part.


This is ABI, and Linux only supports kasan on rv64 right now so it's
technically undefined.  It's probably best to avoid picking an arbitrary number
for rv32, as we still have some open questions WRT the kernel memory map over
there.  I doubt that will get sorted out for a while, as rv32 doesn't get a
lot of attention (though hopefully the glibc stuff will help out).


On Wed, Aug 19, 2020 at 4:48 PM Andrew Waterman  wrote:


I'm having trouble understanding why different ports chose their
various constants--e.g., SPARC uses 1<<29 for 32-bit and 1<<43 for
64-bit, whereas x86 uses 1<<29 and 0x7fff8000, respectively.  So I
can't comment on the choice of the constant 1<<36 for RISC-V.  But
isn't it a problem that 1<<36 is not a valid Pmode value for ILP32?


This is for kasan (not regular asan), which requires some coordination between
the kernel's memory map and the compiler's inline address sanitizer (as you
can't just pick your own memory map).  Essentially what's going on is that
there's an array of valid tags associated with each address, which is checked
in-line by the compiler for performance reasons (IIRC it used to be library
routines).  The compiler needs to know how to map between addresses and tags,
which depends on the kernel's memory map -- essentially baking the kernel's
memory map into the compiler.  That's why the constants seem somewhat
arbitrary.

In order to save memory there's some lossiness in the address->tag mapping.
Most 32-bit ports pick a tag array that's 1/8th of the memory size, which is
where the 29 comes from.  I don't see any reason why that wouldn't be workable
on rv32, but it seems better to make sure that's the case rather than just
making up an ABI :)
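The arithmetic in this exchange can be made concrete with a sketch. It assumes the generic ASan mapping, in which one shadow byte describes 8 bytes of memory, so shadow = (addr >> 3) + offset. Under that mapping, covering a 32-bit address space takes 2^32 / 8 = 2^29 bytes of shadow, which matches the 1<<29 that several 32-bit ports use. The sketch also checks Andrew's point that 1<<36 cannot be represented in a 32-bit pointer. The helper names are hypothetical; this is not GCC's asan.c code, and the scale of 3 is the generic ASan default rather than something the RISC-V patch states.

```cpp
#include <cstdint>

// Generic ASan mapping (assumption): one shadow byte per 8 bytes of memory.
constexpr unsigned kShadowScale = 3;

// Shadow byte address for ADDR, given a target's shadow OFFSET.
constexpr std::uint64_t shadow_addr(std::uint64_t addr, std::uint64_t offset) {
  return (addr >> kShadowScale) + offset;
}

// Bytes of shadow needed to cover 2^ADDRESS_BITS bytes (ADDRESS_BITS < 64).
constexpr std::uint64_t shadow_size(unsigned address_bits) {
  return (std::uint64_t{1} << address_bits) >> kShadowScale;
}

// True if VALUE is representable in a POINTER_BITS-wide pointer (Pmode).
constexpr bool fits_in_pointer(std::uint64_t value, unsigned pointer_bits) {
  return pointer_bits >= 64 || value < (std::uint64_t{1} << pointer_bits);
}

// Offsets mentioned in the thread.
constexpr std::uint64_t kOffset32 = std::uint64_t{1} << 29;     // common 32-bit choice
constexpr std::uint64_t kOffsetRiscv = std::uint64_t{1} << 36;  // value in the RISC-V patch
```

Here `shadow_size(32)` equals `kOffset32`, and `fits_in_pointer(kOffsetRiscv, 32)` is false. That is why an RV32 value would have to be chosen separately rather than reusing the RV64 constant.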


On Wed, Aug 19, 2020 at 1:02 AM Joshua via Gcc-patches
 wrote:
>
> From: cooper.joshua 
>
> gcc/
>
> * config/riscv/riscv.c (asan_shadow_offset): Implement the offset of asan shadow memory for risc-v.
> (asan_shadow_offset): new macro definition.
> ---
>
>  gcc/config/riscv/riscv.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index 63b0c38..b85b459 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -5292,6 +5292,14 @@ riscv_gpr_save_operation_p (rtx op)
>return true;
>  }
>
> +/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
> +
> +static unsigned HOST_WIDE_INT
> +riscv_asan_shadow_offset (void)
> +{
> +  return HOST_WIDE_INT_1U << 36;
> +}
> +
>  /* Initialize the GCC target structure.  */
>  #undef TARGET_ASM_ALIGNED_HI_OP
>  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> @@ -5475,6 +5483,9 @@ riscv_gpr_save_operation_p (rtx op)
>  #undef TARGET_NEW_ADDRESS_PROFITABLE_P
>  #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p
>
> +#undef TARGET_ASAN_SHADOW_OFFSET
> +#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>
>  #include "gt-riscv.h"
> --
> 2.7.4
>


Re: [Patch, fortran, coarray] Fix obvious typo in co_broadcast's argument assembly

2020-08-20 Thread Andre Vehreschild
Hi,

committed with Tobias's input applied. The full commit message now is:

Fix obvious typo where errmsg_len was assigned to errmsg.

gcc/fortran/ChangeLog:

2020-08-20  Andre Vehreschild  

PR fortran/94958
* trans-array.c (gfc_bcast_alloc_comp): Use the correct variable.

Regards,
Andre

On Tue, 18 Aug 2020 19:27:50 +0200
Andre Vehreschild  wrote:

> Hi Tobias,
> 
> On Tue, 18 Aug 2020 19:14:30 +0200
> Tobias Burnus  wrote:
> 
> > On 8/18/20 7:04 PM, Andre Vehreschild wrote:
> >   
> > > attached patch fixes an obvious typo in the routine gathering arguments
> > > for co_broadcast().  See pr94958 for a detailed analysis, please.
> > 
> > LGTM – except that I do not like the ChangeLog entry.
> > 
> > It sounds like a misspelling in terms of a comment or
> > error message. How about "Using the correct variable."
> > – or something like that?  
> 
> That's a good idea. Will use that.
> 
> > You could also consider adding a libcaf_single test case,
> > given that you wrote one (see PR)...  
> 
> Well, the test case in the PR does not test the issue, only with additional
> modifications of trans-array one may see an impact in the pseudo code.
> Alternatively one has to do a lot more of code generation aggregating the
> results of the broadcasts of the different components.  Given this is not
> defined in the standard, I am not sure what to do here. And therefore just
> wanted to correct the "miss-assignment" allowing future correct code
> generation.
> 
> Regards,
>   Andre
> > 
> > Thanks for the patch!
> > 
> > Tobias
> >   
> > > gcc/fortran/ChangeLog:
> > >
> > > 2020-08-18  Andre Vehreschild
> > >
> > >   PR fortran/94958
> > >   * trans-array.c (gfc_bcast_alloc_comp): Fix typo.
> > >
> > >
> > > pr94958.patch
> > >
> > > diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
> > > index 7a1b2fc74c9..73a45cd2dcf 100644
> > > --- a/gcc/fortran/trans-array.c
> > > +++ b/gcc/fortran/trans-array.c
> > > @@ -9732,7 +9732,7 @@ gfc_bcast_alloc_comp (gfc_symbol *derived, gfc_expr *expr, int rank,
> > >    args.image_index = image_index;
> > > args.stat = stat;
> > > args.errmsg = errmsg;
> > > -  args.errmsg = errmsg_len;
> > > +  args.errmsg_len = errmsg_len;
> > >
> > > if (rank == 0)
> > >   {
> 
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH] PR libstdc++/71579 assert that type traits are not misused with an incomplete type

2020-08-20 Thread Antony Polukhin via Gcc-patches
On Wed, 19 Aug 2020 at 14:29, Jonathan Wakely wrote:
<...>
> Do we also want to check
> (std::__is_complete_or_unbounded(__type_identity<_ArgTypes>{}) && ...)
> for invoke_result and the is_invocable traits?

Done.
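The pattern requested here is a C++17 fold expression over `&&`. The sketch below uses a simplified, hypothetical `is_complete` trait based on `sizeof` in a SFINAE context. It is not libstdc++'s `std::__is_complete_or_unbounded`, which also accepts references, function types, `void` and arrays of unknown bound. Like any such trait, its answer for a given type is fixed at the first instantiation in a translation unit, so it should not be queried before the type is completed.

```cpp
#include <type_traits>

// Simplified completeness check (hypothetical): sizeof(T) is only
// well-formed for complete object types, so incomplete types fall back
// to the primary template.
template <typename T, typename = void>
struct is_complete : std::false_type {};

template <typename T>
struct is_complete<T, std::void_t<decltype(sizeof(T))>> : std::true_type {};

// C++17 fold expression: true only if every type in the pack is complete.
// An empty pack folds to true, as with the _ArgTypes static_asserts.
template <typename... Args>
constexpr bool all_complete = (is_complete<Args>::value && ...);
```

Here `all_complete<int, Complete>` holds and `all_complete<int, Incomplete>` does not. An empty pack yields true, so the added static_asserts do not fire for traits queried with no argument types.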

Changelog:

2020-08-20  Antony Polukhin  

PR libstdc++/71579
* include/std/type_traits (invoke_result, is_invocable, is_invocable_r)
(is_nothrow_invocable, is_nothrow_invocable_r): Add static_asserts
to make sure that the arguments of the type traits are not misused
with incomplete types.
* testsuite/20_util/invoke_result/incomplete_args_neg.cc: New test.
* testsuite/20_util/is_invocable/incomplete_args_neg.cc: New test.
* testsuite/20_util/is_invocable/incomplete_neg.cc: New test.
* testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc: New test.
* testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc: Check for
error on incomplete return type usage in trait.


-- 
Best regards,
Antony Polukhin
diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 6ced781..bf2260a 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2965,6 +2965,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   
static_assert(std::__is_complete_or_unbounded(__type_identity<_Functor>{}),
"_Functor must be a complete class or an unbounded array");
+  static_assert((std::__is_complete_or_unbounded(
+   __type_identity<_ArgTypes>{}) && ...),
+   "Argument types must be a complete class or an unbounded array");
 };
 
   /// std::invoke_result_t
@@ -2978,6 +2981,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   static_assert(std::__is_complete_or_unbounded(__type_identity<_Fn>{}),
"_Fn must be a complete class or an unbounded array");
+  static_assert((std::__is_complete_or_unbounded(
+   __type_identity<_ArgTypes>{}) && ...),
+   "Argument types must be a complete class or an unbounded array");
 };
 
   /// std::is_invocable_r
@@ -2987,6 +2993,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   static_assert(std::__is_complete_or_unbounded(__type_identity<_Fn>{}),
"_Fn must be a complete class or an unbounded array");
+  static_assert((std::__is_complete_or_unbounded(
+   __type_identity<_ArgTypes>{}) && ...),
+   "Argument types must be a complete class or an unbounded array");
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_Ret>{}),
+   "_Ret must be a complete class or an unbounded array");
 };
 
   /// std::is_nothrow_invocable
@@ -2997,6 +3008,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   static_assert(std::__is_complete_or_unbounded(__type_identity<_Fn>{}),
"_Fn must be a complete class or an unbounded array");
+  static_assert((std::__is_complete_or_unbounded(
+   __type_identity<_ArgTypes>{}) && ...),
+   "Argument types must be a complete class or an unbounded array");
 };
 
   template
@@ -3017,6 +3031,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   static_assert(std::__is_complete_or_unbounded(__type_identity<_Fn>{}),
"_Fn must be a complete class or an unbounded array");
+  static_assert((std::__is_complete_or_unbounded(
+   __type_identity<_ArgTypes>{}) && ...),
+   "Argument types must be a complete class or an unbounded array");
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_Ret>{}),
+   "_Ret must be a complete class or an unbounded array");
 };
 
   /// std::is_invocable_v
diff --git a/libstdc++-v3/testsuite/20_util/invoke_result/incomplete_args_neg.cc b/libstdc++-v3/testsuite/20_util/invoke_result/incomplete_args_neg.cc
new file mode 100644
index 000..a35ff4c
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/invoke_result/incomplete_args_neg.cc
@@ -0,0 +1,47 @@
+// { dg-do compile { target c++17 } }
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-error "must be a complete class" "" { target *-*-* } 0 }
+
+#include 
+
+class X;
+
+void test01()
+{
+  std::invoke_result(); // { dg-error "required from here" }
+  std::invoke_result();   // { dg-error "required from here" }
+  std::invoke_result();   // { dg-error "required from here" 

Re: [PATCH] configure: Require C++11 for building code generation tools

2020-08-20 Thread Szabolcs Nagy
The 08/20/2020 13:59, Vasee Vinayagamoorthy wrote:
> +# Also require C++11 for building code generation tools.
> +# Do nothing if "${build}" = "${host}", because in this case
> +# CXX_FOR_BUILD="\$(CXX)", and $CXX is already set to the correct value above.
> +if test "${build}" != "${host}"; then
> +  saved_CXX=$CXX
> +  saved_CXXCPP=$CXXCPP
> +  CXX=$CXX_FOR_BUILD
> +  CXXCPP=
> +  AX_CXX_COMPILE_STDCXX(11)
> +  CXX="$CXX -std=c++11"
> +  CXX_FOR_BUILD=$CXX
> +  CXX=$saved_CXX
> +  CXXCPP=$saved_CXXCPP
> +fi

i think AX_CXX_COMPILE_STDCXX(11) should
set CXX correctly (it seems it would set
it to "g++ -std=gnu++11" instead of
"g++ -std=c++11" but either should work)

please look at the top level config.log
i think you should look for

"checking whether g++ supports C++11 features with -std=gnu++11"

and that check should be successful.


Re: [PATCH] arm: Require MVE memory operand for destination of vst1q intrinsic

2020-08-20 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Aug 20, 2020 at 3:31 PM Joe Ramsay  wrote:
>
> Hi Ramana,
>
> Thanks for the review.
>
> On 18/08/2020, 18:37, "Ramana Radhakrishnan"  
> wrote:
>
> On Thu, Aug 13, 2020 at 2:18 PM Joe Ramsay  wrote:
> >
> > From: Joe Ramsay 
> >
> > Hi,
> >
> > Previously, the machine description patterns for vst1q accepted a generic memory
> > operand for the destination, which could lead to an unrecognised builtin when
> > expanding vst1q* intrinsics. This change fixes the patterns to only accept MVE
> > memory operands.
>
> This is OK though I suspect this needs a PR and a backport request for 
> GCC 10.
>
> There's now a PR for this, 96683. I've attached an updated patch file; the only
> change is that I've included the PR number in the changelog. Please let me know
> if this is OK for trunk.

Yep absolutely fine - FTR such fixes to changelogs with PR numbers and
administrativia count as obvious and can just be applied.

Ramana

>
> Thanks,
> Joe
>
> regards
> Ramana
>
> >
> > Thanks,
> > Joe
> >
> > gcc/ChangeLog:
> >
> > 2020-08-13  Joe Ramsay 
> >
> > * config/arm/mve.md (mve_vst1q_f): Require MVE memory operand for
> > destination.
> > (mve_vst1q_): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-08-13  Joe Ramsay 
> >
> > * gcc.target/arm/mve/intrinsics/vst1q_f16.c: Add test that only MVE
> > memory operand is accepted.
> > * gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
> > ---
> >  gcc/config/arm/mve.md   |  4 ++--
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 10 +++---
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 10 +++---
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 10 +++---
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 10 +++---
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 10 +++---
> >  6 files changed, 37 insertions(+), 17 deletions(-)
> >
> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> > index 9758862..465b39a 100644
> > --- a/gcc/config/arm/mve.md
> > +++ b/gcc/config/arm/mve.md
> > @@ -9330,7 +9330,7 @@
> >[(set_attr "length" "4")])
> >
> >  (define_expand "mve_vst1q_f"
> > -  [(match_operand: 0 "memory_operand")
> > +  [(match_operand: 0 "mve_memory_operand")
> > (unspec: [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
> >]
> >"TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
> > @@ -9340,7 +9340,7 @@
> >  })
> >
> >  (define_expand "mve_vst1q_"
> > -  [(match_operand:MVE_2 0 "memory_operand")
> > +  [(match_operand:MVE_2 0 "mve_memory_operand")
> > (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
> >]
> >"TARGET_HAVE_MVE"
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > index 363b4ca..312b746 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > @@ -10,12 +10,16 @@ foo (float16_t * addr, float16x8_t value)
> >vst1q_f16 (addr, value);
> >  }
> >
> > -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> > -
> >  void
> >  foo1 (float16_t * addr, float16x8_t value)
> >  {
> >vst1q (addr, value);
> >  }
> >
> > -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> > +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> > +
> > +void
> > +foo2 (float16_t a, float16x8_t x)
> > +{
> > +  vst1q (&a, x);
> > +}
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> > index 37c4713..cd14e2c 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> > @@ -10,12 +10,16 @@ foo (int16_t * addr, int16x8_t value)
> >vst1q_s16 (addr, value);
> >  }
> >
> > -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> > -
> >  void
> >  foo1 (int16_t * addr, int16x8_t value)
> >  {
> >vst1q (addr, value);
> >  }
> >
> > -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> > +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> > +
> > +void
> > +foo2 (int16_t a, int16x8_t x)
> > +{
> > +  vst1q (&a, x);
> > +}
> > diff 

Re: [PATCH] arm: Require MVE memory operand for destination of vst1q intrinsic

2020-08-20 Thread Joe Ramsay
Hi Ramana,

Thanks for the review.

On 18/08/2020, 18:37, "Ramana Radhakrishnan"  wrote:

On Thu, Aug 13, 2020 at 2:18 PM Joe Ramsay  wrote:
>
> From: Joe Ramsay 
>
> Hi,
>
> Previously, the machine description patterns for vst1q accepted a generic memory
> operand for the destination, which could lead to an unrecognised builtin when
> expanding vst1q* intrinsics. This change fixes the patterns to only accept MVE
> memory operands.

This is OK though I suspect this needs a PR and a backport request for GCC 10.

There's now a PR for this, 96683. I've attached an updated patch file; the only
change is that I've included the PR number in the changelog. Please let me know
if this is OK for trunk.

Thanks,
Joe

regards
Ramana

>
> Thanks,
> Joe
>
> gcc/ChangeLog:
>
> 2020-08-13  Joe Ramsay 
>
> * config/arm/mve.md (mve_vst1q_f): Require MVE memory operand for
> destination.
> (mve_vst1q_): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2020-08-13  Joe Ramsay 
>
> * gcc.target/arm/mve/intrinsics/vst1q_f16.c: Add test that only MVE
> memory operand is accepted.
> * gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
> ---
>  gcc/config/arm/mve.md   |  4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 10 +++---
>  6 files changed, 37 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 9758862..465b39a 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -9330,7 +9330,7 @@
>[(set_attr "length" "4")])
>
>  (define_expand "mve_vst1q_f"
> -  [(match_operand: 0 "memory_operand")
> +  [(match_operand: 0 "mve_memory_operand")
> (unspec: [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
>]
>"TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
> @@ -9340,7 +9340,7 @@
>  })
>
>  (define_expand "mve_vst1q_"
> -  [(match_operand:MVE_2 0 "memory_operand")
> +  [(match_operand:MVE_2 0 "mve_memory_operand")
> (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
>]
>"TARGET_HAVE_MVE"
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> index 363b4ca..312b746 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> @@ -10,12 +10,16 @@ foo (float16_t * addr, float16x8_t value)
>vst1q_f16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (float16_t * addr, float16x8_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> +
> +void
> +foo2 (float16_t a, float16x8_t x)
> +{
> +  vst1q (&a, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> index 37c4713..cd14e2c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> @@ -10,12 +10,16 @@ foo (int16_t * addr, int16x8_t value)
>vst1q_s16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (int16_t * addr, int16x8_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> +
> +void
> +foo2 (int16_t a, int16x8_t x)
> +{
> +  vst1q (&a, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> index fe5edea..0004c80 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> @@ -10,12 +10,16 @@ foo (int8_t * addr, int8x16_t value)
>vst1q_s8 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrb.8"  }  } */
> -
>  

Re: [PATCH, nvptx, libgomp] Avoid use of GOMP_PLUGIN_acc_thread() in nvptx_free()

2020-08-20 Thread Chung-Lin Tang

Hi Tom,
thanks for the extremely quick review :)

On 2020/8/20 9:33 PM, Tom de Vries wrote:

I reviewed the CUDA API docs and see that CUDA_ERROR_NOT_PERMITTED is returned
for such CUDA calls inside callback context,

Right, that's "Callbacks must not make any CUDA API calls. Attempting to
use a CUDA API will result in CUDA_ERROR_NOT_PERMITTED" at
cuStreamAddCallback.  Perhaps mention more precisely where you found this.


I've added a little bit more in the comments in the final patch pushed.


and it appears to be enough to replace
the current check, so the new code sees if this error is returned on the first
cuMemGetAddressRange call to determine callback context. This should now work
for both OpenACC/OpenMP.

(Tobias, Catherine, the earlier internal patch to re-organize this callback context
checking did not work in general, since OpenACC also uses the .queue_callback
functionality to free the struct target_mem_desc asynchronously, so in general we
have to ensure nvptx_free() could be used under both normal/callback context)

This patch has been libgomp tested for x86_64-linux with nvptx offloading without
regressions, and should be applied for mainline and GCC10. Is this okay?


Ok, thanks for fixing this.


Just pushed to master, releases/gcc-10, and devel/omp/gcc-10.

Chung-Lin


Re: [PATCH] configure: Require C++11 for building code generation tools

2020-08-20 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2020-08-20 at 13:59 +0100, Vasee Vinayagamoorthy wrote:
> Hello,
> 
> After commit [1] ("Redefine NULL to nullptr"), building gcc
> fails when $CXX_FOR_BUILD is not using C++11 mode by default.
> This happens with gcc-4.8 which is still supported.
> 
> This patch fixes this by adding -std=c++11 or its equivalent
> to $CXX_FOR_BUILD using AX_CXX_COMPILE_STDCXX(11).
> 
> Tested by successful cross native build for aarch64-none-linux-gnu
> target.
> 
> OK for trunk?
> 
> 
> ChangeLog:
> 
> 2020-08-20  Ilya Leoshkevich  
> Vasee Vinayagamoorthy  <
> vaseeharan.vinayagamoor...@arm.com>
> 
>   PR target/95700
>   * configure: Regenerate.
>   * configure.ac: Require C++11 for building code generation
> tools.
> 
> Regards
> Vasee Vinayagamoorthy
> 
> 
> PS: I do not have commit rights, therefore could I request
> someone to commit it on my behalf if this patch is approved.
> Thanks in advance.
> 
> [1] 
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=d59a576b8b5e12c3a56f0262912090e2921f5daa

Hi!

First of all, sorry about losing track of this issue.

Regarding your addition to the patch, I fear that

CXX="$CXX -std=c++11"

might break some rarely used compilers. But doing this portably is
exactly what AX_CXX_COMPILE_STDCXX(11) is for. Do you know why it isn't
working for you?

Best regards,
Ilya



Re: [PATCH, nvptx, libgomp] Avoid use of GOMP_PLUGIN_acc_thread() in nvptx_free()

2020-08-20 Thread Tom de Vries
On 8/20/20 3:03 PM, Chung-Lin Tang wrote:
> Hi Tom,
> this patch adjusts nvptx_free() in libgomp/plugin/plugin-nvptx.c to avoid a
> "GOMP_PLUGIN_acc_thread() == NULL" check that was causing problems under
> OpenMP offloading.
> 
> This check was originally used to determine if nvptx_free() was running under
> CUDA callback context, when freeing resources from an OpenACC asynchronous compute
> region. Since CUDA API calls are not allowed inside callback context, we have
> to save the freed block to ptx_dev->free_blocks, and cuMemFree it later.
> 
> The check to see if GOMP_PLUGIN_acc_thread() exists to determine normal host thread
> vs. callback thread worked under -fopenacc, but since the OpenACC per-thread data
> is not created under -fopenmp, and always returned NULL, we have a leak situation
> where OpenMP offloading kept accumulating freed device memory blocks until failing;
> nvptx_free() never reaches the part where cuMemFree() is actually called.
> 
> I reviewed the CUDA API docs and see that CUDA_ERROR_NOT_PERMITTED is returned
> for such CUDA calls inside callback context,

Right, that's "Callbacks must not make any CUDA API calls. Attempting to
use a CUDA API will result in CUDA_ERROR_NOT_PERMITTED" at
cuStreamAddCallback.  Perhaps mention more precisely where you found this.

> and it appears to be enough to replace
> the current check, so the new code sees if this error is returned on the first
> cuMemGetAddressRange call to determine callback context. This should now work
> for both OpenACC/OpenMP.
> 
> (Tobias, Catherine, the earlier internal patch to re-organize this callback context
> checking did not work in general, since OpenACC also uses the .queue_callback
> functionality to free the struct target_mem_desc asynchronously, so in general we
> have to ensure nvptx_free() could be used under both normal/callback context)
> 
> This patch has been libgomp tested for x86_64-linux with nvptx offloading without
> regressions, and should be applied for mainline and GCC10. Is this okay?
> 

Ok, thanks for fixing this.

- Tom

> Thanks,
> Chung-Lin
> 
> 2020-08-20  Chung-Lin Tang  
> 
> libgomp/
> * plugin/plugin-nvptx.c (nvptx_free): Change "GOMP_PLUGIN_acc_thread () == NULL"
> test into check of CUDA_ERROR_NOT_PERMITTED status for cuMemGetAddressRange.
> Adjust comments.


> diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
> index ec103a2f40b..188a34f1d04 100644
> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c
> @@ -1038,27 +1038,34 @@ goacc_profiling_acc_ev_free (struct goacc_thread *thr, void *p)
>  }
>  
>  static bool
>  nvptx_free (void *p, struct ptx_device *ptx_dev)
>  {
> -  /* Assume callback context if this is null.  */
> -  if (GOMP_PLUGIN_acc_thread () == NULL)
> +  CUdeviceptr pb;
> +  size_t ps;
> +
> +  CUresult r = CUDA_CALL_NOCHECK (cuMemGetAddressRange, &pb, &ps,
> +   (CUdeviceptr) p);
> +  if (r == CUDA_ERROR_NOT_PERMITTED)
>  {
> +  /* We assume that this error indicates we are in a CUDA callback context,
> +  where all CUDA calls are not allowed. Arrange to free this piece of
> +  device memory later.  */
>struct ptx_free_block *n
>   = GOMP_PLUGIN_malloc (sizeof (struct ptx_free_block));
>n->ptr = p;
>pthread_mutex_lock (&ptx_dev->free_blocks_lock);
>n->next = ptx_dev->free_blocks;
>ptx_dev->free_blocks = n;
>pthread_mutex_unlock (&ptx_dev->free_blocks_lock);
>return true;
>  }
> -
> -  CUdeviceptr pb;
> -  size_t ps;
> -
> -  CUDA_CALL (cuMemGetAddressRange, &pb, &ps, (CUdeviceptr) p);
> +  else if (r != CUDA_SUCCESS)
> +{
> +  GOMP_PLUGIN_error ("cuMemGetAddressRange error: %s", cuda_error (r));
> +  return false;
> +}
>if ((CUdeviceptr) p != pb)
>  {
>GOMP_PLUGIN_error ("invalid device address");
>return false;
>  }


[PATCH, nvptx, libgomp] Avoid use of GOMP_PLUGIN_acc_thread() in nvptx_free()

2020-08-20 Thread Chung-Lin Tang

Hi Tom,
this patch adjusts nvptx_free() in libgomp/plugin/plugin-nvptx.c to avoid a
"GOMP_PLUGIN_acc_thread() == NULL" check that was causing problems under
OpenMP offloading.

This check was originally used to determine if nvptx_free() was running under
CUDA callback context, when freeing resources from an OpenACC asynchronous compute
region. Since CUDA API calls are not allowed inside callback context, we have
to save the freed block to ptx_dev->free_blocks, and cuMemFree it later.

The check to see if GOMP_PLUGIN_acc_thread() exists to determine normal host thread
vs. callback thread worked under -fopenacc, but since the OpenACC per-thread data
is not created under -fopenmp, and always returned NULL, we have a leak situation
where OpenMP offloading kept accumulating freed device memory blocks until failing;
nvptx_free() never reaches the part where cuMemFree() is actually called.

I reviewed the CUDA API docs and see that CUDA_ERROR_NOT_PERMITTED is returned
for such CUDA calls inside callback context, and it appears to be enough to replace
the current check, so the new code sees if this error is returned on the first
cuMemGetAddressRange call to determine callback context. This should now work
for both OpenACC/OpenMP.
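The detect-and-defer scheme described above can be sketched in portable C. This is a minimal simulation under stated assumptions, not libgomp code: every sk_* type and function is hypothetical, sk_query_range merely stands in for cuMemGetAddressRange (it "fails" with a not-permitted status whenever the simulated in_callback flag is set), and sk_drain is an illustrative later cleanup step rather than anything shown in this patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical status codes standing in for CUresult values
   (CUDA_SUCCESS, CUDA_ERROR_NOT_PERMITTED, anything else).  */
typedef enum { SK_SUCCESS, SK_NOT_PERMITTED, SK_OTHER_ERROR } sk_status;

/* Deferred-free list node, analogous to struct ptx_free_block.  */
struct sk_free_block
{
  void *ptr;
  struct sk_free_block *next;
};

/* Hypothetical device state, loosely modelled on struct ptx_device.  */
struct sk_device
{
  pthread_mutex_t free_blocks_lock;
  struct sk_free_block *free_blocks;
  bool in_callback;   /* Simulation knob only; real code cannot ask this.  */
  int real_frees;     /* Counts frees that would have reached cuMemFree.  */
};

/* Stand-in for cuMemGetAddressRange: refuses to run in callback context.  */
static sk_status
sk_query_range (struct sk_device *dev, void *p)
{
  (void) p;
  return dev->in_callback ? SK_NOT_PERMITTED : SK_SUCCESS;
}

/* Free P now if the API is usable, otherwise queue it for later.  */
static bool
sk_free (struct sk_device *dev, void *p)
{
  assert (dev != NULL);
  sk_status r = sk_query_range (dev, p);
  if (r == SK_NOT_PERMITTED)
    {
      /* Assume callback context: push P onto the locked deferred list.  */
      struct sk_free_block *n = malloc (sizeof *n);
      if (n == NULL)
        return false;
      n->ptr = p;
      pthread_mutex_lock (&dev->free_blocks_lock);
      n->next = dev->free_blocks;
      dev->free_blocks = n;
      pthread_mutex_unlock (&dev->free_blocks_lock);
      return true;
    }
  else if (r != SK_SUCCESS)
    return false;       /* A genuine error is reported as failure.  */

  dev->real_frees++;    /* Where the real code would call cuMemFree.  */
  return true;
}

/* Hypothetical drain step, run later from a normal host thread.  */
static void
sk_drain (struct sk_device *dev)
{
  /* Detach the whole list under the lock, then free outside it.  */
  pthread_mutex_lock (&dev->free_blocks_lock);
  struct sk_free_block *b = dev->free_blocks;
  dev->free_blocks = NULL;
  pthread_mutex_unlock (&dev->free_blocks_lock);

  while (b != NULL)
    {
      struct sk_free_block *next = b->next;
      sk_free (dev, b->ptr);
      free (b);
      b = next;
    }
}
```

The key design point mirrored from the patch is that the error status itself is used as the context test, so the same function works whether or not any OpenACC per-thread data exists.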

(Tobias, Catherine, the earlier internal patch to re-organize this callback context
checking did not work in general, since OpenACC also uses the .queue_callback
functionality to free the struct target_mem_desc asynchronously, so in general we
have to ensure nvptx_free() could be used under both normal/callback context)

This patch has been libgomp tested for x86_64-linux with nvptx offloading without
regressions, and should be applied for mainline and GCC10. Is this okay?

Thanks,
Chung-Lin

2020-08-20  Chung-Lin Tang  

libgomp/
* plugin/plugin-nvptx.c (nvptx_free): Change "GOMP_PLUGIN_acc_thread () == NULL"
test into check of CUDA_ERROR_NOT_PERMITTED status for cuMemGetAddressRange.
Adjust comments.
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index ec103a2f40b..188a34f1d04 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1038,27 +1038,34 @@ goacc_profiling_acc_ev_free (struct goacc_thread *thr, void *p)
 }
 
 static bool
 nvptx_free (void *p, struct ptx_device *ptx_dev)
 {
-  /* Assume callback context if this is null.  */
-  if (GOMP_PLUGIN_acc_thread () == NULL)
+  CUdeviceptr pb;
+  size_t ps;
+
+  CUresult r = CUDA_CALL_NOCHECK (cuMemGetAddressRange, &pb, &ps,
+ (CUdeviceptr) p);
+  if (r == CUDA_ERROR_NOT_PERMITTED)
 {
+  /* We assume that this error indicates we are in a CUDA callback context,
+where all CUDA calls are not allowed. Arrange to free this piece of
+device memory later.  */
   struct ptx_free_block *n
= GOMP_PLUGIN_malloc (sizeof (struct ptx_free_block));
   n->ptr = p;
   pthread_mutex_lock (&ptx_dev->free_blocks_lock);
   n->next = ptx_dev->free_blocks;
   ptx_dev->free_blocks = n;
   pthread_mutex_unlock (&ptx_dev->free_blocks_lock);
   return true;
 }
-
-  CUdeviceptr pb;
-  size_t ps;
-
-  CUDA_CALL (cuMemGetAddressRange, &pb, &ps, (CUdeviceptr) p);
+  else if (r != CUDA_SUCCESS)
+{
+  GOMP_PLUGIN_error ("cuMemGetAddressRange error: %s", cuda_error (r));
+  return false;
+}
   if ((CUdeviceptr) p != pb)
 {
   GOMP_PLUGIN_error ("invalid device address");
   return false;
 }


[PATCH] configure: Require C++11 for building code generation tools

2020-08-20 Thread Vasee Vinayagamoorthy
Hello,

After commit [1] ("Redefine NULL to nullptr"), building gcc
fails when $CXX_FOR_BUILD is not using C++11 mode by default.
This happens with gcc-4.8 which is still supported.

This patch fixes this by adding -std=c++11 or its equivalent
to $CXX_FOR_BUILD using AX_CXX_COMPILE_STDCXX(11).

Tested by successful cross native build for aarch64-none-linux-gnu target.

OK for trunk?


ChangeLog:

2020-08-20  Ilya Leoshkevich  
Vasee Vinayagamoorthy  

PR target/95700
* configure: Regenerate.
* configure.ac: Require C++11 for building code generation tools.

Regards
Vasee Vinayagamoorthy


PS: I do not have commit rights, therefore could I request
someone to commit it on my behalf if this patch is approved.
Thanks in advance.

[1] 
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=d59a576b8b5e12c3a56f0262912090e2921f5daa
diff --git a/configure b/configure
index a0c5aca9e8d..1023dc72fd4 100755
--- a/configure
+++ b/configure
@@ -6523,6 +6523,1011 @@ $as_echo "#define HAVE_CXX11 1" >>confdefs.h
 
 fi
 
+# Also require C++11 for building code generation tools.
+# Do nothing if "${build}" = "${host}", because in this case
+# CXX_FOR_BUILD="\$(CXX)", and $CXX is already set to the correct value above.
+if test "${build}" != "${host}"; then
+  saved_CXX=$CXX
+  saved_CXXCPP=$CXXCPP
+  CXX=$CXX_FOR_BUILD
+  CXXCPP=
+ax_cxx_compile_alternatives="11 0x"    ax_cxx_compile_cxx11_required=true
+  ac_ext=cpp
+ac_cpp='$CXXCPP $CPPFLAGS'
+ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_cxx_compiler_gnu
+  ac_success=no
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CXX supports C++11 features by default" >&5
+$as_echo_n "checking whether $CXX supports C++11 features by default... " >&6; }
+if ${ax_cv_cxx_compile_cxx11+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+
+// If the compiler admits that it is not ready for C++11, why torture it?
+// Hopefully, this will speed up the test.
+
+#ifndef __cplusplus
+
+#error "This is not a C++ compiler"
+
+#elif __cplusplus < 201103L
+
+#error "This is not a C++11 compiler"
+
+#else
+
+namespace cxx11
+{
+
+  namespace test_static_assert
+  {
+
+template 
+struct check
+{
+  static_assert(sizeof(int) <= sizeof(T), "not big enough");
+};
+
+  }
+
+  namespace test_final_override
+  {
+
+struct Base
+{
+  virtual ~Base() {}
+  virtual void f() {}
+};
+
+struct Derived : public Base
+{
+  virtual ~Derived() override {}
+  virtual void f() override {}
+};
+
+  }
+
+  namespace test_double_right_angle_brackets
+  {
+
+template < typename T >
+struct check {};
+
+typedef check single_type;
+typedef check> double_type;
+typedef check>> triple_type;
+typedef check>>> quadruple_type;
+
+  }
+
+  namespace test_decltype
+  {
+
+int
+f()
+{
+  int a = 1;
+  decltype(a) b = 2;
+  return a + b;
+}
+
+  }
+
+  namespace test_type_deduction
+  {
+
+template < typename T1, typename T2 >
+struct is_same
+{
+  static const bool value = false;
+};
+
+template < typename T >
+struct is_same
+{
+  static const bool value = true;
+};
+
+template < typename T1, typename T2 >
+auto
+add(T1 a1, T2 a2) -> decltype(a1 + a2)
+{
+  return a1 + a2;
+}
+
+int
+test(const int c, volatile int v)
+{
+  static_assert(is_same::value == true, "");
+  static_assert(is_same::value == false, "");
+  static_assert(is_same::value == false, "");
+  auto ac = c;
+  auto av = v;
+  auto sumi = ac + av + 'x';
+  auto sumf = ac + av + 1.0;
+  static_assert(is_same::value == true, "");
+  static_assert(is_same::value == true, "");
+  static_assert(is_same::value == true, "");
+  static_assert(is_same::value == false, "");
+  static_assert(is_same::value == true, "");
+  return (sumf > 0.0) ? sumi : add(c, v);
+}
+
+  }
+
+  namespace test_noexcept
+  {
+
+int f() { return 0; }
+int g() noexcept { return 0; }
+
+static_assert(noexcept(f()) == false, "");
+static_assert(noexcept(g()) == true, "");
+
+  }
+
+  namespace test_constexpr
+  {
+
+template < typename CharT >
+unsigned long constexpr
+strlen_c_r(const CharT *const s, const unsigned long acc) noexcept
+{
+  return *s ? strlen_c_r(s + 1, acc + 1) : acc;
+}
+
+template < typename CharT >
+unsigned long constexpr
+strlen_c(const CharT *const s) noexcept
+{
+  return strlen_c_r(s, 0UL);
+}
+
+static_assert(strlen_c("") == 0UL, "");
+static_assert(strlen_c("1") == 1UL, "");
+static_assert(strlen_c("example") == 7UL, "");
+static_assert(strlen_c("another\0example") == 7UL, "");
+
+  }
+
+  

Re: [PATCH] vxworks: Fix GCC selftests for *-wrs-vxworks7-* targets

2020-08-20 Thread Iain Buclaw via Gcc-patches
Excerpts from Olivier Hainque's message of August 20, 2020 11:01 am:
> Hello Iain,
> 
>> On 19 Aug 2020, at 14:17, Iain Buclaw  wrote:
> 
>> Ah, no worries, attached updated patch.
> 
>> As we have discussed this off the lists though, we agreed to compromise
>> and leave -nostdinc as it is in SELFTEST_FLAGS.
> 
>> Iain.
>> 
>> ---
>> gcc/ChangeLog:
>> 
>>  * config/vxworks.h (VXWORKS_ADDITIONAL_CPP_SPEC): Don't include
>>  VxWorks header files if -fself-tests is used.
>>  (STARTFILE_PREFIX_SPEC): Avoid using VSB_DIR if -fself-tests is used.
> 
> OK, nit on the ChangeLog: this is -fself-test (no trailing s).
> 
> We have a batch of vxworks changes queued that we will be submitting soon,
> and we might get to rationalize this with other places along the way.
> 

Running the build through one more time, I've noticed that the make
recipe macro_list also triggers a "VSB_DIR not defined" error.

However, unlike the selftests, it does not result in a failed build, so
it is probably only a minor concern.

Iain.


Re: [PATCH] PR target/96347: non-delegitimized UNSPEC UNSPEC_TP (19) found in variable location

2020-08-20 Thread Iain Buclaw via Gcc-patches
Excerpts from Richard Sandiford's message of August 19, 2020 1:22 pm:
> Iain Buclaw via Gcc-patches  writes:
>> Hi,
>>
>> On x86, a memory reference to a TLS address can be broken out
>> and stored in a register, for instance:
>>
>> movq %fs:8+testYearsBC@tpoff, %rdx
>>
>> Subsequently becomes:
>>
>> pushq %rbp
>> leaq 8+testYearsBC@tpoff, %rbp
>> // later...
>> movq %fs:0(%rbp), %rdx
>>
>> When it comes to representing this in Dwarf, mem_loc_descriptor is left
>> with the following RTL, which ix86_delegitimize_tls_address is unable to
>> handle as the symbol_ref has been swapped out with the temporary.
>>
>> (plus:DI (unspec:DI [
>>  (const_int 0 [0])
>>  ] UNSPEC_TP)
>>  (reg/f:DI 6 bp [90]))
>>
>> The UNSPEC_TP expression is checked, ix86_const_not_ok_for_debug_p
>> returns false as it only lets through UNSPEC_GOTOFF, and finally
>> const_ok_for_output is either not smart enough or does not have the
>> information needed to recognize that UNSPEC_TP is a TLS UNSPEC that
>> should be ignored.  This results in informational warnings being fired
>> under -fchecking.
>>
>> The entrypoint that triggers this warning to occur is that MEM codes are
>> first lowered with mem_loc_descriptor, with tls_mem_loc_descriptor only
>> being used if that failed.  This patch switches that around so that TLS
>> memory expressions first call tls_mem_loc_descriptor, and only use
>> mem_loc_descriptor if that fails (which may result in UNSPEC warnings).
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu, I do also have a
>> sparc64-linux-gnu build at hand, but have not yet got the results to
>> check for a before/after comparison.
>>
>> OK for mainline?
>>
>> Regards
>> Iain
>>
>> ---
>> gcc/ChangeLog:
>>
>>  PR target/96347
>>  * dwarf2out.c (is_tls_mem_loc): New.
>>  (mem_loc_descriptor): Call tls_mem_loc_descriptor on TLS memory
>>  expressions, fallback to mem_loc_descriptor if unsuccessful.
>>  (loc_descriptor): Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  PR target/96347
>>  * g++.dg/other/pr96347.C: New test.
>> ---
>>  gcc/dwarf2out.c  | 30 ++
>>  gcc/testsuite/g++.dg/other/pr96347.C | 61 
>>  2 files changed, 84 insertions(+), 7 deletions(-)
>>  create mode 100644 gcc/testsuite/g++.dg/other/pr96347.C
>>
>> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
>> index 9deca031fc2..093ceb23c7a 100644
>> --- a/gcc/dwarf2out.c
>> +++ b/gcc/dwarf2out.c
>> @@ -14435,6 +14435,20 @@ is_based_loc (const_rtx rtl)
>> && CONST_INT_P (XEXP (rtl, 1);
>>  }
>>  
>> +/* Return true if this MEM expression is for a TLS variable.  */
>> +
>> +static bool
>> +is_tls_mem_loc (rtx mem)
>> +{
>> +  tree base;
>> +
>> +  if (MEM_EXPR (mem) == NULL_TREE || !MEM_OFFSET_KNOWN_P (mem))
>> +return false;
>> +
>> +  base = get_base_address (MEM_EXPR (mem));
>> +  return (base && VAR_P (base) && DECL_THREAD_LOCAL_P (base));
>> +}
>> +
>>  /* Try to handle TLS MEMs, for which mem_loc_descriptor on XEXP (mem, 0)
>> failed.  */
>>  
>> @@ -15671,11 +15685,12 @@ mem_loc_descriptor (rtx rtl, machine_mode mode,
>>return mem_loc_result;
>>}
>>}
>> -  mem_loc_result = mem_loc_descriptor (XEXP (rtl, 0),
>> -   get_address_mode (rtl), mode,
>> -   VAR_INIT_STATUS_INITIALIZED);
>> -  if (mem_loc_result == NULL)
>> +  if (is_tls_mem_loc (rtl))
>>  mem_loc_result = tls_mem_loc_descriptor (rtl);
>> +  if (mem_loc_result == NULL)
> 
> Is the is_tls_mem_loc part necessary?  It looks like it's just
> repeating the first part of tls_mem_loc_descriptor, and so we
> could call that unconditionally instead.
> 

Not necessary, other than that it makes it clear at the call site that rtl
is a TLS symbol reference.  The comments for both tls_mem_loc_descriptor
and mem_loc_descriptor would need to be updated to reflect that the
latter is called only if the tls_mem function fails.

> I guess this raises the question: if we're swapping the order for
> TLS so that MEM_EXPR trumps the MEM address, why shouldn't we take
> that approach more generally?  I.e. is there any reason to look at
> the MEM_EXPR only when DECL_THREAD_LOCAL_P is true for the base address,
> rather than for all symbolic base addresses?
> 

Someone else may know better.  But as I understand it, TLS addresses
must always be looked up through the thread pointer, even if the
address is cached in a temporary.

For the given test, the generated DWARF is the same whether it is
inferred from the MEM_EXPR or from the MEM address, provided the test is
adjusted so that it does not trigger the UNSPEC warning (swap the length
and ptr fields):

(0x77615870) DW_OP_const8u address, 0
(0x776158c0) DW_OP_GNU_push_tls_address 0, 0
(0x77615960) DW_OP_deref 0, 0

Whereas globals can be dereferenced from anywhere that can hold an
address 

RE: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-08-20 Thread xiezhiheng
> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Thursday, August 20, 2020 4:55 PM
> To: xiezhiheng 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> emitted at -O3
> 
> xiezhiheng  writes:
> >> -Original Message-
> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> >> Sent: Wednesday, August 19, 2020 6:06 PM
> >> To: xiezhiheng 
> >> Cc: Richard Biener ;
> gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> >> emitted at -O3
> >>
> >> xiezhiheng  writes:
> >> > I add FLAGS for part of intrinsics in aarch64-simd-builtins.def first for a try,
> >> > including all the add/sub arithmetic intrinsics.
> >> >
> >> > Something like faddp intrinsic which only handles floating-point operations,
> >> > both FP and NONE flags are suitable for it because FLAG_FP will be added
> >> > later if the intrinsic handles floating-point operations.  And I prefer FP
> >> > since it would be more clear.
> >>
> >> Sounds good to me.
> >>
> >> > But for qadd intrinsics, they would modify FPSR register which is a scenario
> >> > I missed before.  And I consider to add an additional flag FLAG_WRITE_FPSR
> >> > to represent it.
> >>
> >> I don't think we make any attempt to guarantee that the Q flag is
> >> meaningful after saturating intrinsics.  To do that, we'd need to model
> >> the modification of the flag in the .md patterns too.
> >>
> >> So my preference would be to leave this out and just use NONE for the
> >> saturating forms too.
> >
> > The problem is that the test case in the attachment has different results
> > under -O0 and -O2.
> 
> Right.  But my point was that I don't think that use case is supported.
> If you want to use saturating instructions and read the Q flag afterwards,
> the saturating instructions need to be inline asm too.
> 
> > In gimple phase statement:
> >   _9 = __builtin_aarch64_uqaddv2si_uuu (op0_4, op1_6);
> > would be treated as dead code if we set NONE flag for saturating intrinsics.
> > Adding FLAG_WRITE_FPSR would help fix this problem.
> >
> > Even when we set FLAG_WRITE_FPSR, the uqadd insn:
> >   (insn 11 10 12 2 (set (reg:V2SI 97)
> > (us_plus:V2SI (reg:V2SI 98)
> > (reg:V2SI 99))) {aarch64_uqaddv2si}
> >  (nil))
> > could also be eliminated in RTL phase because this insn will be treated as
> > dead insn.
> > So I think we might also need to modify saturating instruction patterns
> > adding the side effect of set the FPSR register.
> 
> The problem is that FPSR is global state and we don't in general
> know who might read it.  So if we modelled the modification of the FPSR,
> we'd never be able to fold away saturating arithmetic that does actually
> saturate at compile time, because we'd never know whether the program
> wanted the effect on the Q flag result to be visible (perhaps to another
> function that the compiler can't see).  We'd also be unable to remove
> results that really are dead.
> 
> So I think this is one of those situations in which we can't keep all
> constituents happy.  Catering for people who want to read the Q flag
> would make things worse for those who want saturating arithmetic to be
> optimised as aggressively as possible.  And the same holds in reverse.

I agree.  The test case is extracted from
gcc.target/aarch64/advsimd-intrinsics/vqadd.c.  If we set the NONE flag for
the saturating intrinsics, that test fails in regression testing because some
qadd intrinsics are treated as dead code and eliminated:
  Running target unix
  Running ./gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp ...
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O0  (test for excess errors)
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O0  execution test
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O1  (test for excess errors)
  FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O1  execution test
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2  (test for excess errors)
  FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2  execution test
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O3 -g  (test for excess errors)
  FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O3 -g  execution test
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Os  (test for excess errors)
  FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Os  execution test
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Og -g  (test for excess errors)
  FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Og -g  execution test
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
  FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
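The trade-off discussed in this thread can be illustrated with a small portable model. This is only a sketch: model_qc, model_uqadd_u32 and model_uqadd_v2u32 are hypothetical names, not GCC or ACLE APIs, and the code models the saturate-and-set-sticky-flag behaviour rather than reproducing the hardware instruction. The returned value depends only on the inputs, but the flag is a side effect on global state, so deleting a call whose result is unused silently leaves the flag clear, while preserving the flag write would stop the compiler from folding or deleting such calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sticky FPSR.QC bit; not a real API.  */
static bool model_qc;

/* Model of one unsigned 32-bit lane of a saturating add:
   clamp to UINT32_MAX on overflow and set the sticky flag.  */
static uint32_t
model_uqadd_u32 (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;        /* Unsigned addition wraps modulo 2^32.  */
  if (sum < a)                 /* Wrapped, so the true sum exceeds UINT32_MAX.  */
    {
      model_qc = true;         /* Sticky: set here, never cleared here.  */
      return UINT32_MAX;
    }
  return sum;
}

/* Model of a two-lane vector form, in the spirit of uqaddv2si.  */
static void
model_uqadd_v2u32 (const uint32_t a[2], const uint32_t b[2], uint32_t out[2])
{
  assert (a != NULL && b != NULL && out != NULL);
  for (int i = 0; i < 2; i++)
    out[i] = model_uqadd_u32 (a[i], b[i]);
}
```

If the only observer of model_uqadd_v2u32 is a later read of model_qc, the call is "dead" as far as its return values go, which mirrors why vqadd.c starts failing at -O1 and above once the builtins are marked NONE.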
  

Re: [PATCH] emit-rtl.c: Allow splitting of RTX_FRAME_RELATED_P insns?

2020-08-20 Thread Richard Sandiford
Sorry for the slow reply.

Senthil Kumar Selvaraj  writes:
> 2020-08-13  Senthil Kumar Selvaraj  
>
> gcc/ChangeLog:
> 
>   * emit-rtl.c (try_split): Call copy_frame_info_to_split_insn
>   to split certain RTX_FRAME_RELATED_P insns.
>   * recog.c (copy_frame_info_to_split_insn): New function.
>   (peep2_attempt): Split copying of frame related info of
>   RTX_FRAME_RELATED_P insns into above function and call it.
>   * recog.h (copy_frame_info_to_split_insn): Declare it.
>
> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> index f9b0e9714d9..3706f0a03fd 100644
> --- a/gcc/emit-rtl.c
> +++ b/gcc/emit-rtl.c
> @@ -3822,10 +3822,6 @@ try_split (rtx pat, rtx_insn *trial, int last)
>int njumps = 0;
>rtx_insn *call_insn = NULL;
>  
> -  /* We're not good at redistributing frame information.  */
> -  if (RTX_FRAME_RELATED_P (trial))
> -return trial;
> -
>if (any_condjump_p (trial)
>&& (note = find_reg_note (trial, REG_BR_PROB, 0)))
>  split_branch_probability
> @@ -3842,6 +3838,7 @@ try_split (rtx pat, rtx_insn *trial, int last)
>if (!seq)
>  return trial;
>  
> +  int split_insn_count = 0;
>/* Avoid infinite loop if any insn of the result matches
>   the original pattern.  */
>insn_last = seq;
> @@ -3850,11 +3847,25 @@ try_split (rtx pat, rtx_insn *trial, int last)
>if (INSN_P (insn_last)
> && rtx_equal_p (PATTERN (insn_last), pat))
>   return trial;
> +  split_insn_count++;
>if (!NEXT_INSN (insn_last))
>   break;
>insn_last = NEXT_INSN (insn_last);
>  }
>  
> +  /* We're not good at redistributing frame information if
> + the split occurs before reload or if it results in more
> + than one insn.  */
> +  if (RTX_FRAME_RELATED_P (trial))
> +{
> +  if (!reload_completed || split_insn_count != 1)
> +return trial;
> +
> +  rtx_insn *new_insn = seq;
> +  rtx_insn *old_insn = trial;
> +  copy_frame_info_to_split_insn (old_insn, new_insn);
> +}
> +
>/* We will be adding the new sequence to the function.  The splitters
>   may have introduced invalid RTL sharing, so unshare the sequence now.  */
>unshare_all_rtl_in_chain (seq);
> diff --git a/gcc/recog.c b/gcc/recog.c
> index 25f19b1b1cf..e024597f9d7 100644
> --- a/gcc/recog.c
> +++ b/gcc/recog.c
> @@ -3277,6 +3277,78 @@ peep2_reinit_state (regset live)
>COPY_REG_SET (peep2_insn_data[MAX_INSNS_PER_PEEP2].live_before, live);
>  }
>  
> +/* Copies frame related info of an insn (old_insn) to the single
> +   insn (new_insn) that was obtained by splitting old_insn.  */

By convention, old_insn and new_insn should be in caps, since they
refer to parameter names.

OK otherwise, thanks.
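For reviewers, here is a toy model of the guard this patch adds to try_split. A frame-related insn is split only after reload, and only when the split produces exactly one insn. The frame information is then copied to that insn, which is the job of copy_frame_info_to_split_insn. The struct and functions are hypothetical stand-ins, not GCC's rtx API. Copying the frame info is reduced to two fields.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for rtx_insn with only the fields used here.  */
struct toy_insn
{
  struct toy_insn *next;
  bool frame_related;   /* Models RTX_FRAME_RELATED_P.  */
  int cfa_note;         /* Models a REG_CFA_* note; 0 means none.  */
};

/* Count the insns in a split sequence, like split_insn_count.  */
static int
count_split_insns (const struct toy_insn *seq)
{
  int n = 0;
  for (const struct toy_insn *i = seq; i; i = i->next)
    n++;
  return n;
}

/* Return true if TRIAL may be replaced by SEQ.  A frame-related TRIAL is
   only split after reload and only into a single insn, which then
   inherits TRIAL's frame info.  */
static bool
accept_split (const struct toy_insn *trial, struct toy_insn *seq,
              bool reload_completed)
{
  if (!trial->frame_related)
    return true;
  if (!reload_completed || count_split_insns (seq) != 1)
    return false;
  seq->frame_related = true;
  seq->cfa_note = trial->cfa_note;
  return true;
}
```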

Richard


Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-08-20 Thread Martin Jambor
Hi,

On Thu, Aug 20 2020, Richard Sandiford wrote:
>>
>>
>> Really appreciate for your detailed explanation.  BTW, My previous patch
>> for PGO build on exchange2 takes this similar method by setting each cloned
>> node to 1/10th of the frequency several month agao :)
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/546926.html
>
> Does it seem likely that we'll reach a resolution on this soon?
> I take the point that the patch that introduced the exchange regression
> [https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551757.html]
> was just uncovering a latent issue, but still, this is a large regression
> in an important benchmark to be carrying around.  For those of us doing
> regular benchmark CI, the longer the performance trough gets, the harder
> it is to spot other unrelated regressions in the “properly optimised” code.
>
> So if we don't have a specific plan for fixing the regression soon,
> I think we should consider reverting the patch until we have something
> that avoids the exchange regression, even though the patch wasn't
> technically wrong.

Honza's changes were motivated to a large extent as an enabler for
IPA-CP heuristics changes that actually speed up 548.exchange2_r.

On my AMD Zen2 machine, the run-time of exchange2 was 358 seconds two
weeks ago and 403 seconds this week, but with my WIP (and so far untested)
patch below it is just 276 seconds, faster than a build with GCC 8,
which needs 283 seconds.
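To put these timings in relative terms, here is a trivial helper. It only does arithmetic on the numbers reported in this message.

```c
#include <assert.h>

/* Relative change in percent from OLD_SECS to NEW_SECS; a negative
   value means the new build is faster.  */
static double
percent_change (double old_secs, double new_secs)
{
  assert (old_secs > 0.0);
  return (new_secs - old_secs) / old_secs * 100.0;
}
```

With these numbers, 358 s to 403 s is a slowdown of roughly 12.6%. 403 s to 276 s is roughly 31.5% faster, and 276 s is roughly 2.5% faster than the GCC 8 build at 283 s.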

I'll be interested in knowing if it also works this well on other
architectures.

The patch still needs a bit of a cleanup.  The change of the default
value of ipa-cp-unit-growth needs to be done only for small compilation
units (like inlining does).  I should experiment to see whether the value of
param_ipa_cp_loop_hint_bonus should be changed.  And last but not
least, I also want to clean up the interfaces between ipa-fnsummary.c
and ipa-cp.c a bit.  I am working on all of this and hope to finish the
patch set in a few (working) days.

The bottom line is that there is a plan to address this regression.
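The core of the hint_time_bonus change in the patch below can be shown in plain C. Previously, a single flat param_ipa_cp_loop_hint_bonus was added whenever either loop hint was present. Now each hint's bonus is scaled by the frequency of the loops it makes known. This sketch assumes double in place of GCC's sreal and uses made-up flag constants. The bonus value 64 in the checks is arbitrary.

```c
/* Hypothetical flags standing in for INLINE_HINT_loop_iterations and
   INLINE_HINT_loop_stride.  */
enum
{
  HINT_LOOP_ITERATIONS = 1 << 0,
  HINT_LOOP_STRIDE = 1 << 1
};

/* Old behaviour: one flat bonus if either loop hint is present.  */
static int
flat_hint_bonus (unsigned hints, int bonus_for_one)
{
  if (hints & (HINT_LOOP_ITERATIONS | HINT_LOOP_STRIDE))
    return bonus_for_one;
  return 0;
}

/* WIP behaviour: each hint adds the bonus scaled by the frequency of the
   loops whose iteration count or stride becomes known.  The int cast
   truncates toward zero.  */
static int
scaled_hint_bonus (unsigned hints, int bonus_for_one,
                   double known_iter_freq, double known_strides_freq)
{
  int result = 0;
  if (hints & HINT_LOOP_ITERATIONS)
    result += (int) (known_iter_freq * bonus_for_one);
  if (hints & HINT_LOOP_STRIDE)
    result += (int) (known_strides_freq * bonus_for_one);
  return result;
}
```

As the FIXMEs in the patch say, a plain multiplication like this is probably too coarse.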

Martin



diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index e4910a04ffa..0d44310503a 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3190,11 +3190,23 @@ devirtualization_time_bonus (struct cgraph_node *node,
 /* Return time bonus incurred because of HINTS.  */
 
 static int
-hint_time_bonus (cgraph_node *node, ipa_hints hints)
+hint_time_bonus (cgraph_node *node, ipa_hints hints, sreal known_iter_freq,
+sreal known_strides_freq)
 {
   int result = 0;
-  if (hints & (INLINE_HINT_loop_iterations | INLINE_HINT_loop_stride))
-result += opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
+  sreal bonus_for_one = opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
+
+  if (hints & INLINE_HINT_loop_iterations)
+{
+  /* FIXME: This should probably be way more nuanced.  */
+  result += (known_iter_freq * bonus_for_one).to_int ();
+}
+  if (hints & INLINE_HINT_loop_stride)
+{
+  /* FIXME: And this as well.  */
+  result += (known_strides_freq * bonus_for_one).to_int ();
+}
+
   return result;
 }
 
@@ -3395,12 +3407,13 @@ perform_estimation_of_a_value (cgraph_node *node, vec known_csts,
   int est_move_cost, ipcp_value_base *val)
 {
   int size, time_benefit;
-  sreal time, base_time;
+  sreal time, base_time, known_iter_freq, known_strides_freq;
   ipa_hints hints;
 
   estimate_ipcp_clone_size_and_time (node, known_csts, known_contexts,
 known_aggs, &size, &time,
-&base_time, &hints);
+&base_time, &hints, &known_iter_freq,
+&known_strides_freq);
   base_time -= time;
   if (base_time > 65535)
 base_time = 65535;
@@ -3414,7 +3427,7 @@ perform_estimation_of_a_value (cgraph_node *node, vec known_csts,
 time_benefit = base_time.to_int ()
   + devirtualization_time_bonus (node, known_csts, known_contexts,
 known_aggs)
-  + hint_time_bonus (node, hints)
+  + hint_time_bonus (node, hints, known_iter_freq, known_strides_freq)
   + removable_params_cost + est_move_cost;
 
   gcc_checking_assert (size >=0);
@@ -3476,7 +3489,7 @@ estimate_local_effects (struct cgraph_node *node)
 {
   struct caller_statistics stats;
   ipa_hints hints;
-  sreal time, base_time;
+  sreal time, base_time, known_iter_freq, known_strides_freq;
   int size;
 
   init_caller_stats ();
@@ -3484,9 +3497,11 @@ estimate_local_effects (struct cgraph_node *node)
  false);
   estimate_ipcp_clone_size_and_time (node, known_csts, known_contexts,
 known_aggs, &size, &time,
-&base_time, &hints);
+&base_time, &hints, &known_iter_freq,
+&known_strides_freq);
   time -= devirt_bonus;
-  time -= 

Re: [PATCH] cmpelim: recognize extra clobbers in insns

2020-08-20 Thread Pip Cet via Gcc-patches
On Wed, Aug 19, 2020 at 11:05 AM Richard Sandiford
 wrote:
> Sorry for the slow reply.

Not a problem at all. Thank you for responding!

> Pip Cet via Gcc-patches  writes:
> > I'm working on the AVR cc0 -> CCmode conversion (bug#92729). One
> > problem is that the cmpelim pass is currently very strict in requiring
> > insns of the form
> >
> > (parallel [(set (reg:SI) (op:SI ... ...))
> >(clobber (reg:CC REG_CC))])
> >
> > when in fact AVR's insns often have the form
> >
> > (parallel [(set (reg:SI) (op:SI ... ...))
> >(clobber (scratch:QI))
> >(clobber (reg:CC REG_CC))])
> >
> > The attached patch relaxes checks in the cmpelim code to recognize
> > such insns, and makes it attempt to recognize
> >
> > (parallel [(set (reg:CC REG_CC) (compare:CC ... ...))
> >(set (reg:SI (op:SI ... ...)))
> >(clobber (scratch:QI))])
> >
> > as a new insn for that example. This appears to work.
>
> The idea looks good.  However, I think it'd be better (or at least
> more usual) for the define_insns to list the clobbers the other
> way around:
>
> (parallel [(set (reg:SI) (op:SI ... ...))
>(clobber (reg:CC REG_CC))
>(clobber (scratch:QI))])

That makes sense, thanks for the suggestion. I realized quite quickly
that I would have to reproduce the patterns precisely, including order
in a parallel, and decided to go with the wrong consistent option by
always placing the CC clobber last.

My question is whether it makes more sense to recognize either form
(i.e. a clobber of targetm.flags_regnum at any position in a parallel)
or whether it's okay to assume that the clobber is always the second
element in the parallel. I'm leaning towards the latter version.

> (clobber (scratch…))s generally come last because any rtl optimisation
> pass that uses recog can automatically add any necessary
> (clobber (scratch…))s.  In contrast, very few passes (probably just
> combine) know how to add (clobber (reg:CC REG_CC)) to a pattern that
> didn't already have it.  This is because adding a REG_CC clobber
> requires the pass to “prove” that REG_CC is dead at that point,
> whereas there are no restrictions on adding a scratch clobber.
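Pip's question is really a choice between two checks on the elements of the PARALLEL. Below is a minimal sketch of both. It uses a made-up element representation, not GCC's rtx API, and an arbitrary flags register number in the checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Made-up, much simplified view of the elements of a PARALLEL.  */
enum elt_kind { ELT_SET, ELT_CLOBBER_REG, ELT_CLOBBER_SCRATCH };

struct elt
{
  enum elt_kind kind;
  int regno;  /* Only meaningful for ELT_CLOBBER_REG.  */
};

/* Strict: the flags clobber must be the second element.  */
static bool
flags_clobber_second_p (const struct elt *v, size_t n, int flags_regno)
{
  return n >= 2 && v[1].kind == ELT_CLOBBER_REG && v[1].regno == flags_regno;
}

/* Permissive: accept the flags clobber anywhere after the SET;
   return its index, or -1 if there is none.  */
static int
find_flags_clobber (const struct elt *v, size_t n, int flags_regno)
{
  for (size_t i = 1; i < n; i++)
    if (v[i].kind == ELT_CLOBBER_REG && v[i].regno == flags_regno)
      return (int) i;
  return -1;
}
```

Richard's suggested ordering puts the SET first, then the flags clobber, then any scratch clobbers. With that ordering the strict check is enough. The permissive check also accepts the ordering used in the original patch.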

Thanks again!

Pip


[Patch, committed] Fortran: Fix OpenMP's 'if(simd:' etc. conditions

2020-08-20 Thread Tobias Burnus

This patch adds two 'simdlen' testcases by converting
the respective C testcases. When doing so, it turned out that
  'if (simd: x)'
generated the symbol 'simd' due to:
   gfc_match ("if ( ") == MATCH_YES
   ...
   gfc_match ("%e )", &c->if_expr)
The latter matches "%e", which creates the symbol and
then fails at ")".  It frees the gfc_expr but does not
undo the symbol.

(The normal parser has logic to undo this, namely
undo_new_statement, but it is not used for "subparsing".)

The next match (" simd : %e )") then works and parsing finishes
successfully.  But at the resolution stage, 'implicit none'
picks up the spurious 'simd' symbol and gives an error.

Solution: Simply try the ' : %e' forms first and only
then try to match '%e )'.
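This toy parser illustrates why the order matters. As in the report, matching an identifier as an expression ("%e") creates a symbol, and nothing undoes that when the rest of the match fails. Everything here is hypothetical and much simplified. The real gfc_match_omp_clauses tries a whole list of directive-name modifiers, not only "simd".

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SYMS 16
#define MAX_NAME 32

/* Toy symbol table; entries are never removed on a failed match.  */
static char symtab[MAX_SYMS][MAX_NAME];
static int nsyms;

static void reset_symtab (void) { nsyms = 0; }

static bool
have_symbol (const char *name)
{
  for (int i = 0; i < nsyms; i++)
    if (strcmp (symtab[i], name) == 0)
      return true;
  return false;
}

static void
skip_spaces (const char **p)
{
  while (isspace ((unsigned char) **p))
    (*p)++;
}

static bool
match_char (const char **p, char c)
{
  skip_spaces (p);
  if (**p != c)
    return false;
  (*p)++;
  return true;
}

/* Match keyword KW as a whole word; creates no symbol.  */
static bool
match_keyword (const char **p, const char *kw)
{
  size_t len = strlen (kw);
  skip_spaces (p);
  if (strncmp (*p, kw, len) != 0
      || isalnum ((unsigned char) (*p)[len]) || (*p)[len] == '_')
    return false;
  *p += len;
  return true;
}

/* "%e": match an identifier and create its symbol as a side effect.  */
static bool
match_expr (const char **p)
{
  char name[MAX_NAME];
  size_t len = 0;
  skip_spaces (p);
  while ((isalnum ((unsigned char) **p) || **p == '_') && len + 1 < MAX_NAME)
    name[len++] = *(*p)++;
  if (len == 0)
    return false;
  name[len] = '\0';
  if (!have_symbol (name) && nsyms < MAX_SYMS)
    strcpy (symtab[nsyms++], name);
  return true;
}

/* Old order: "%e )" first.  On "simd: x )" this leaves 'simd' behind.  */
static bool
parse_if_old (const char *text)
{
  const char *p = text;
  if (match_expr (&p) && match_char (&p, ')'))
    return true;
  p = text;  /* The locus is reset, but the symbol is not removed.  */
  return match_keyword (&p, "simd") && match_char (&p, ':')
         && match_expr (&p) && match_char (&p, ')');
}

/* New order: the "simd :" modifier form first, then "%e )".  */
static bool
parse_if_new (const char *text)
{
  const char *p = text;
  if (match_keyword (&p, "simd") && match_char (&p, ':')
      && match_expr (&p) && match_char (&p, ')'))
    return true;
  p = text;
  return match_expr (&p) && match_char (&p, ')');
}
```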

Committed as obvious to mainline: 
r11-2780-g656218ab982cc22b826227045826c92743143af1

Tobias

PS: I note that C/C++ print an error if safelen is not >0
while Fortran only prints a warning.

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
commit 656218ab982cc22b826227045826c92743143af1
Author: Tobias Burnus 
Date:   Thu Aug 20 13:33:21 2020 +0200

Fortran: Fix OpenMP's 'if(simd:' etc. conditions

gcc/fortran/ChangeLog:

* openmp.c (gfc_match_omp_clauses): Re-order 'if' clause pasing
to avoid creating spurious symbols.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/lastprivate-conditional-10.f90: New test.
---
 gcc/fortran/openmp.c   |  4 +-
 gcc/testsuite/gfortran.dg/gomp/pr67500.f90 | 57 
 .../libgomp.fortran/lastprivate-conditional-10.f90 | 63 ++
 3 files changed, 122 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 4d33a450a33..50983737af4 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1299,8 +1299,6 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 	  && c->if_expr == NULL
 	  && gfc_match ("if ( ") == MATCH_YES)
 	{
-	  if (gfc_match ("%e )", &c->if_expr) == MATCH_YES)
-		continue;
 	  if (!openacc)
 		{
 		  /* This should match the enum gfc_omp_if_kind order.  */
@@ -1323,6 +1321,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 		  if (i < OMP_IF_LAST)
 		continue;
 		}
+	  if (gfc_match ("%e )", &c->if_expr) == MATCH_YES)
+		continue;
 	  gfc_current_locus = old_loc;
 	}
 	  if ((mask & OMP_CLAUSE_IF_PRESENT)
diff --git a/gcc/testsuite/gfortran.dg/gomp/pr67500.f90 b/gcc/testsuite/gfortran.dg/gomp/pr67500.f90
new file mode 100644
index 000..1cecdc48578
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/pr67500.f90
@@ -0,0 +1,57 @@
+! Fortran version of PR c/67500
+! { dg-do compile }
+
+subroutine f1
+  !$omp declare simd simdlen(d)   ! { dg-error "requires a scalar INTEGER expression" }
+end subroutine
+
+subroutine f2
+  !$omp declare simd simdlen(0.5)  ! { dg-error "requires a scalar INTEGER expression" }
+end
+
+subroutine f3 (i)
+  !$omp declare simd simdlen(-2)   ! { dg-warning "INTEGER expression of SIMDLEN clause at .1. must be positive" }
+end subroutine
+
+subroutine f4
+  !$omp declare simd simdlen(0)	   ! { dg-warning "INTEGER expression of SIMDLEN clause at .1. must be positive" }
+end
+
+subroutine foo(p, d, n)
+  integer, allocatable :: p(:)
+  real, value :: d
+  integer, value :: n
+  integer :: i
+
+  !$omp simd safelen(d) ! { dg-error "requires a scalar INTEGER expression" }
+  do i = 1, 16
+  end do
+
+  !$omp simd safelen(0.5)   ! { dg-error "requires a scalar INTEGER expression" }
+  do i = 1, 16
+  end do
+
+  !$omp simd safelen(-2)! { dg-warning "INTEGER expression of SAFELEN clause at .1. must be positive" }
+  do i = 1, 16
+  end do
+
+  !$omp simd safelen(0) ! { dg-warning "INTEGER expression of SAFELEN clause at .1. must be positive" }
+  do i = 1, 16
+  end do
+
+  !$omp simd aligned(p:n)   ! { dg-error "requires a scalar positive constant integer alignment expression" }
+  do i = 1, 16
+  end do
+
+  !$omp simd aligned(p:0.5)  ! { dg-error "requires a scalar positive constant integer alignment expression" }
+  do i = 1, 16
+  end do
+
+  !$omp simd aligned(p:-2)  ! { dg-error "requires a scalar positive constant integer alignment expression" }
+  do i = 1, 16
+  end do
+
+  !$omp simd aligned(p:0)! { dg-error "requires a scalar positive constant integer alignment expression" }
+  do i = 1, 16
+  end do
+end
diff --git a/libgomp/testsuite/libgomp.fortran/lastprivate-conditional-10.f90 b/libgomp/testsuite/libgomp.fortran/lastprivate-conditional-10.f90
new file mode 100644
index 000..116166c48ee
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/lastprivate-conditional-10.f90
@@ -0,0 +1,63 @@
+! { dg-do run }
+! Fortran version of libgomp.c-c++-common/lastprivate-conditional-10.c
+
+module m
+  implicit none
+ 

Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-08-20 Thread Jan Hubicka
> >
> > Really appreciate for your detailed explanation.  BTW, My previous patch
> > for PGO build on exchange2 takes this similar method by setting each cloned
> > node to 1/10th of the frequency several month agao :)
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/546926.html

I was looking again into the patch and I think we could adjust it for
mainline.  I wanted to discuss this with Martin Jambor (added to CC)
after he returns from vacation.

I think the idea of scaling all copies by the number of copies is a
reasonable thing to do (as discussed in an earlier mail), since generally
we do not have information about where the hot recursion levels lie.
By the master theorem it is easy to see that this is actually a very
subtle problem and our profile info is not precise enough to be useful
here.  Making all copies even is thus a good conservative solution.  I did
not quite understand this point earlier, so the patch did not seem to make
much sense to me and I saw it as an odd exchange-only hack.  I apologize
for that.

What I think is wrong about the patch is that the scale is not set
according to the number of copies actually produced, but according to the
--param controlling the maximal recursion depth (since with exchange we
manage to meet it).  We do not need to cap the recursion depth as low as
the patch does (as shown by the factorial example, which seems easy enough
that we should eventually handle it by default).

I think we need to work out how to scale by the actual number of clones.
The problem is that we clone one by one and update the profile each
time, so at that point we do not really know how many times we will clone.
I was thinking of simply keeping the chain of recursively cloned nodes and
rescaling the profile of all of them each time, so that the total stays
the same.  It would cause some roundoff issues and would also affect the
ipa-cp decisions themselves, since they would use the intermediate values.
This is probably not too bad, but perhaps Martin has a better idea.
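The rescaling idea can be sketched in a few lines of C. Keep the chain of clones, and after each new clone give every member an even share of the original count. This is a hypothetical model that uses double counts. GCC uses profile_count, and that is where the roundoff issues mentioned above would appear.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLONES 32

/* Hypothetical record of one chain of recursive ipa-cp clones.  */
struct clone_chain
{
  double total_count;        /* Count of the node before cloning.  */
  size_t n;                  /* Members so far, including the original.  */
  double count[MAX_CLONES];  /* Current count of each member.  */
};

static void
chain_init (struct clone_chain *c, double total_count)
{
  c->total_count = total_count;
  c->n = 1;
  c->count[0] = total_count;
}

/* Record one more clone, then rescale every member so that each carries
   an even share and the chain total stays the same.  */
static void
chain_add_clone (struct clone_chain *c)
{
  assert (c->n < MAX_CLONES);
  c->n++;
  for (size_t i = 0; i < c->n; i++)
    c->count[i] = c->total_count / (double) c->n;
}
```

Every rescale changes the counts of clones created earlier. Starting from 1000, each member holds 500 after the first clone and 100 once there are ten. These intermediate values are what ipa-cp would see if it makes decisions between clones.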

Honza
> 
> Does it seem likely that we'll reach a resolution on this soon?
> I take the point that the patch that introduced the exchange regression
> [https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551757.html]
> was just uncovering a latent issue, but still, this is a large regression
> in an important benchmark to be carrying around.  For those of us doing
> regular benchmark CI, the longer the performance trough gets, the harder
> it is to spot other unrelated regressions in the “properly optimised” code.
> 
> So if we don't have a specific plan for fixing the regression soon,
> I think we should consider reverting the patch until we have something
> that avoids the exchange regression, even though the patch wasn't
> technically wrong.

The problem here is that IRA needs to be lied to in order to get good
results.  One direction of attack is to ignore this problem, get IPA-CP to
work by default and convince the inliner not to inline at large loop
depths.  This seems to get good performance on x86-64; I did not test ARM.

We could fix the IRA issue, but according to the PR log, Vladimir does not
seem to know what to do.

We could lie to the compiler again and produce a wrong profile.

diff --git a/gcc/predict.def b/gcc/predict.def
index 2d8d4958b6d..d8e502152b8 100644
--- a/gcc/predict.def
+++ b/gcc/predict.def
@@ -177,7 +177,7 @@ DEF_PREDICTOR (PRED_LOOP_GUARD, "loop guard", HITRATE (73), 0)
 
 /* Same but for loops containing recursion.  */
 DEF_PREDICTOR (PRED_LOOP_GUARD_WITH_RECURSION, "loop guard with recursion",
-  HITRATE (85), 0)
+  HITRATE (95), 0)
 
 /* The following predictors are used in Fortran. */
 
This gets performance back for me.
I do not really like this alternative.  One option I would still like to
look into is that IRA seems to work better with profile feedback than
with the currently guessed profile.  It works even better with the wrongly
guessed profile from the patch above, but perhaps we could fix the profile
estimate somewhere else to make the profile more realistic and IRA happier.

Honza
> 
> Thanks,
> Richard


Re: [Patch] Fortran: Add 'device_type' clause to OpenMP's declare target

2020-08-20 Thread Tobias Burnus

Updated patch – taking Andre's suggestions into account +
extending the testcase, which now catches the previous (NO)HOST
module issue.

OK?

Tobias

On 8/19/20 2:51 PM, Tobias Burnus wrote:

On 18.08.20 at 19:33, Andre Vehreschild wrote:

+case OMP_DEVICE_TYPE_HOST:
+  MIO_NAME (ab_attribute) (AB_OMP_DEVICE_TYPE_NOHOST, attr_bits);
Why also NOHOST here?

Copy and paste error.

...

+  tree clauses = NULL_TREE;
Would you mind using "omp_clauses" or the like here?

Done now.
Fortran: Add 'device_type' clause to OpenMP's declare target

gcc/fortran/ChangeLog:

	* gfortran.h (enum gfc_omp_device_type): New.
	(symbol_attribute, gfc_omp_clauses, gfc_common_head): Use it.
	* module.c (enum ab_attribute): Add AB_OMP_DEVICE_TYPE_HOST,
	AB_OMP_DEVICE_TYPE_NOHOST and AB_OMP_DEVICE_TYPE_ANY.
	(attr_bits, mio_symbol_attribute): Handle it.
	(load_commons, write_common_0): Handle omp_device_type flag.
	* openmp.c (enum omp_mask1): Add OMP_CLAUSE_DEVICE_TYPE
	(OMP_DECLARE_TARGET_CLAUSES): Likewise.
	(gfc_match_omp_clauses): Match 'device_type'.
	(gfc_match_omp_declare_target): Handle it.
	* trans-common.c (build_common_decl): Write device-type clause.
	* trans-decl.c (add_attributes_to_decl): Likewise.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/declare-target-4.f90: New test.
	* gfortran.dg/gomp/declare-target-5.f90: New test.

 gcc/fortran/gfortran.h | 10 +++
 gcc/fortran/module.c   | 33 -
 gcc/fortran/openmp.c   | 50 -
 gcc/fortran/trans-common.c | 25 ++-
 gcc/fortran/trans-decl.c   | 22 +-
 .../gfortran.dg/gomp/declare-target-4.f90  | 81 ++
 .../gfortran.dg/gomp/declare-target-5.f90  | 63 +
 7 files changed, 277 insertions(+), 7 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 559d3c6b8b8..d0cea838444 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -753,6 +753,13 @@ CInteropKind_t;
that the list is initialized.  */
 extern CInteropKind_t c_interop_kinds_table[];
 
+enum gfc_omp_device_type
+{
+  OMP_DEVICE_TYPE_UNSET,
+  OMP_DEVICE_TYPE_HOST,
+  OMP_DEVICE_TYPE_NOHOST,
+  OMP_DEVICE_TYPE_ANY
+};
 
 /* Structure and list of supported extension attributes.  */
 typedef enum
@@ -919,6 +926,7 @@ typedef struct
   /* Mentioned in OMP DECLARE TARGET.  */
   unsigned omp_declare_target:1;
   unsigned omp_declare_target_link:1;
+  ENUM_BITFIELD (gfc_omp_device_type) omp_device_type:2;
 
   /* Mentioned in OACC DECLARE.  */
   unsigned oacc_declare_create:1;
@@ -1360,6 +1368,7 @@ typedef struct gfc_omp_clauses
   struct gfc_expr *num_threads;
   gfc_omp_namelist *lists[OMP_LIST_NUM];
   enum gfc_omp_sched_kind sched_kind;
+  enum gfc_omp_device_type device_type;
   struct gfc_expr *chunk_size;
   enum gfc_omp_default_sharing default_sharing;
   int collapse, orderedc;
@@ -1699,6 +1708,7 @@ typedef struct gfc_common_head
   char use_assoc, saved, threadprivate;
   unsigned char omp_declare_target : 1;
   unsigned char omp_declare_target_link : 1;
+  ENUM_BITFIELD (gfc_omp_device_type) omp_device_type:2;
   /* Provide sufficient space to hold "symbol.symbol.eq.1234567890".  */
   char name[2*GFC_MAX_SYMBOL_LEN + 1 + 14 + 1];
   struct gfc_symbol *head;
diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 5114d5534b8..714fbd9c299 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -2051,7 +2051,8 @@ enum ab_attribute
   AB_OMP_REQ_REVERSE_OFFLOAD, AB_OMP_REQ_UNIFIED_ADDRESS,
   AB_OMP_REQ_UNIFIED_SHARED_MEMORY, AB_OMP_REQ_DYNAMIC_ALLOCATORS,
   AB_OMP_REQ_MEM_ORDER_SEQ_CST, AB_OMP_REQ_MEM_ORDER_ACQ_REL,
-  AB_OMP_REQ_MEM_ORDER_RELAXED
+  AB_OMP_REQ_MEM_ORDER_RELAXED, AB_OMP_DEVICE_TYPE_NOHOST,
+  AB_OMP_DEVICE_TYPE_HOST, AB_OMP_DEVICE_TYPE_ANY
 };
 
 static const mstring attr_bits[] =
@@ -2132,6 +2133,9 @@ static const mstring attr_bits[] =
 minit ("OMP_REQ_MEM_ORDER_SEQ_CST", AB_OMP_REQ_MEM_ORDER_SEQ_CST),
 minit ("OMP_REQ_MEM_ORDER_ACQ_REL", AB_OMP_REQ_MEM_ORDER_ACQ_REL),
 minit ("OMP_REQ_MEM_ORDER_RELAXED", AB_OMP_REQ_MEM_ORDER_RELAXED),
+minit ("OMP_DEVICE_TYPE_HOST", AB_OMP_DEVICE_TYPE_HOST),
+minit ("OMP_DEVICE_TYPE_NOHOST", AB_OMP_DEVICE_TYPE_NOHOST),
+minit ("OMP_DEVICE_TYPE_ANYHOST", AB_OMP_DEVICE_TYPE_ANY),
 minit (NULL, -1)
 };
 
@@ -2397,6 +2401,22 @@ mio_symbol_attribute (symbol_attribute *attr)
 	  == OMP_REQ_ATOMIC_MEM_ORDER_RELAXED)
 	MIO_NAME (ab_attribute) (AB_OMP_REQ_MEM_ORDER_RELAXED, attr_bits);
 	}
+  switch (attr->omp_device_type)
+	{
+	case OMP_DEVICE_TYPE_UNSET:
+	  break;
+	case OMP_DEVICE_TYPE_HOST:
+	  MIO_NAME (ab_attribute) (AB_OMP_DEVICE_TYPE_HOST, attr_bits);
+	  break;
+	case 

Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-08-20 Thread Richard Sandiford
Xionghu, thanks for working on fixing the exchange regression.

luoxhu via Gcc-patches  writes:
> On 2020/8/13 20:52, Jan Hubicka wrote:
>>> Since there are no other callers outside of these specialized nodes, the
>>> guessed profile count should be same equal?  Perf tool shows that even
>>> each specialized node is called only once, none of them take same time for
>>> each call:
>>>
>>>40.65%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.4
>>>16.31%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.3
>>>10.91%  exchange2_gcc.o  libgfortran.so.5.0.0 [.] _gfortran_mminloc0_4_i4
>>> 5.41%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.6
>>> 4.68%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __logic_MOD_new_solver
>>> 3.76%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.5
>>> 1.07%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.7
>>> 0.84%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_brute.constprop.0
>>> 0.47%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.2
>>> 0.24%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.1
>>> 0.24%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_covered.constprop.0
>>> 0.11%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_reflected.constprop.0
>>> 0.00%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_brute.constprop.1
>>>
>>>
>>> digits_2.constprop.4 & digits_2.constprop.3 takes most of the execution time,
>>> So profile count and frequency seem not very helpful for this case?
>> Yep, you can not really determine the time spent on each of recursion
>> levels from the recursive edge probability since you can not assume it
>> to be the same on each level of recursion (here we know it is 0 at level
>> 10 and it is slowly dropping as the recursion increases because there
>> are fewer posiblities to fill up the partly filled sudoku:).
>> Especially you can not assume it after the ipa-cp did the work to figure
>> out that there is recursion depth counter that affect the function.
>> 
>> Thinking of the ipa-cp profile updating changes, I did not consider safe
>> the approach of adding extra weight to the recursive call. The problem
>> is that the recursive call frequency estimate, when comming from static
>> profile stimation, is likely completely off, just like static profile
>> estimator can not be used to predict realistically the number of
>> iterations of loops.  However even with profile feedback this is hard to
>> interpret.
>> 
>> I was wondering if we can not simply detect this scenario and avoid
>> using the recursive edge probability in the profile updating.
>> We could simply scale by the number of copies.
>> This means that if we produce 10 clones, we could just set each clone to
>> 1/10th of the frequency.  This implies that the hottest spot will not be
>> underestimated expontentially much as can happen now, but just 10 times
>> at worst, wich is probably still OK. We play similar games in loop
>> optimizers: static profile estimators expect every loop to trip around 3
>> times so unroller/vectorizer/etc would make no sense on them.
>> 
>> By scaling down according to number of copies we will keep the
>> frequencies of calls to function called from clones "sane".  We will
>> still run into problems when we inline one clone to another (sine its
>> proflie will get scaled by the recursive edge frequency), but perhaps
>> that can be special cases in profiler or make ipa-cp to adjust the
>> recursive edges to get frequency close to 1 as a hack.
>> 
>> This is not pretty, but at least it looks safer to me than the original
>> profile updating patch adding extra weight to the frequency.
>
>
> Really appreciate for your detailed explanation.  BTW, My previous patch
> for PGO build on exchange2 takes this similar method by setting each cloned
> node to 1/10th of the frequency several month agao :)
>
> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/546926.html

Does it seem likely that we'll reach a resolution on this soon?
I take the point that the patch that introduced the exchange regression
[https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551757.html]
was just uncovering a latent issue, but still, this is a large regression
in an important benchmark to be carrying around.  For those of us doing
regular benchmark CI, the longer the performance trough gets, the harder
it is to spot other unrelated regressions in the “properly optimised” code.

So if we don't have a specific plan for fixing the regression soon,
I think we should consider reverting the patch until we have something
that avoids the exchange regression, even 

[PATCH] [obvious] testsuite: Remove test for arm32 in arm_soft_ok

2020-08-20 Thread Christophe Lyon via Gcc-patches
There is no reason to check for arm32 when checking for
-mfloat-abi=soft support.  Instead, the check causes some tests to be
skipped when targeting a Thumb-1 CPU, even though they pass.

This patch removes the arm32 check, and uses the same skeleton as
arm_softfp_ok and arm_hard_ok.

Committed as obvious.

2020-08-20  Christophe Lyon  

gcc/testsuite/
* lib/target-supports.exp (arm_soft_ok): Remove arm32 check.
---
 gcc/testsuite/lib/target-supports.exp | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index f223fc6..c24330a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3639,13 +3639,11 @@ proc check_effective_target_arm_vect_no_misalign { } {
 # multilibs may be incompatible with this option.
 
 proc check_effective_target_arm_soft_ok { } {
-if { [check_effective_target_arm32] } {
-   return [check_no_compiler_messages arm_soft_ok executable {
-   int main() { return 0;}
+return [check_no_compiler_messages arm_soft_ok object {
+   #include 
+   int dummy;
+   int main (void) { return 0; }
} "-mfloat-abi=soft"]
-} else {
-   return 0
-}
 }
 
 # Return 1 if this is an ARM target supporting -mfpu=vfp with an
-- 
2.7.4



Re: [PATCH] [FIX] Remove object adjustment to preserve object attributes

2020-08-20 Thread Petro Karashchenko via Gcc-patches
Hello Richard!

Thank you very much for your reply.
The case is that the "uncached" attribute is currently used by the ARC
backend to generate special "cache bypass" instructions instead of regular
ones. That decision is made based on the result of a "lookup_attribute()"
call for a given MEM_REF. For example:
  if (TREE_CODE (addr) == MEM_REF)
    {
      attrs = TYPE_ATTRIBUTES (TREE_TYPE (TREE_OPERAND (addr, 0)));
      if (lookup_attribute ("uncached", attrs))
        return true;
      attrs = TYPE_ATTRIBUTES (TREE_TYPE (TREE_OPERAND (addr, 1)));
      if (lookup_attribute ("uncached", attrs))
        return true;
    }

That works well for aligned types, but for unaligned/packed types
(and also, for example, for array -X access) the expression is dropped. In
my test example with the current code, "a1" still inherits the "uncached"
attribute because "adjust_object" is not done, but "a2", "a3" and "a4"
drop the expression, so the "uncached" attribute can't be accessed.

Does that make sense?

Maybe we could clone/copy the expression instead of dropping it, to
preserve the "uncached" attribute?
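Here is a toy model of the quoted check, to show why dropping the MEM_EXPR loses the attribute. The backend can only find "uncached" while it still has an expression whose operand types it can inspect. The structures and toy_lookup_attribute are hypothetical simplifications. GCC's lookup_attribute returns the attribute tree, not a bool. A dropped expression is modelled as a null pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a TYPE_ATTRIBUTES chain and the types of
   a MEM_REF's two operands.  */
struct toy_attr { const char *name; const struct toy_attr *next; };
struct toy_type { const struct toy_attr *attrs; };
struct toy_mem_ref
{
  const struct toy_type *op0_type;
  const struct toy_type *op1_type;
};

static bool
toy_lookup_attribute (const char *name, const struct toy_attr *list)
{
  for (; list; list = list->next)
    if (strcmp (list->name, name) == 0)
      return true;
  return false;
}

/* Mirrors the quoted check: uncached if either operand's type carries
   the attribute.  With no expression left there is nothing to inspect.  */
static bool
toy_uncached_mem_p (const struct toy_mem_ref *ref)
{
  if (ref == NULL)
    return false;
  return toy_lookup_attribute ("uncached", ref->op0_type->attrs)
         || toy_lookup_attribute ("uncached", ref->op1_type->attrs);
}
```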

Best Regards,
Petro Karashchenko

On Thu, 20 Aug 2020 at 11:01, Richard Sandiford wrote:

> Petro Karashchenko via Gcc-patches  writes:
> > for bitfield MEMREFs
> >
> > [FIX] Propagate uncached type attributes to unaligned/packed types
>
> This doesn't look safe in general.  The current code was added
> to avoid wrong-code problems for accesses that step outside the
> bounds of the original MEM_EXPR.
>
> Could you explain in more detail what's going wrong in the testcase?
> It wasn't obvious why the code you're removing would trigger for that test.
>
> Thanks,
> Richard
>
> >
> > [ARC] Update tests
> >
> > gcc/
> > -xx-xx  Petro Karashchenko  
> >
> > * emit-rtl.c (adjust_address_1): Do not drop the object
> > if the new memory reference is outside the underlying
> > object to preserve object attributes that are needed
> > by some backend implementations. Remove adjust_object
> > parameter as it is not used anymore.
> > (adjust_automodify_address_1): Adjust according to
> > new adjust_address_1 prototype.
> > (replace_equiv_address_nv): Likewise.
> > * gcc/emit-rtl.h (adjust_address): Adjust according to
> > new adjust_address_1 prototype.
> > (adjust_address_nv): Likewise.
> > (adjust_bitfield_address): Likewise.
> > (adjust_bitfield_address_size): Likewise.
> > (adjust_bitfield_address_nv): Likewise.
> >
> > testsuite/
> > -xx-xx  Petro Karashchenko  
> >
> > * gcc.target/arc/uncached-9.c: New file.
> >
> > Problem description:
> > __attribute__((uncached)) is dropped for other than the first member of
> > "packed" or "unaligned" types.
> >
> > Tests:
> > Tested with ARC600 bases ASIC. The correct code is generated.
> >
> > From 1f8a824f8ed8c1452f2bdfc513716c63d5d12545 Mon Sep 17 00:00:00 2001
> > From: Petro Karashchenko 
> > Date: Sun, 16 Aug 2020 01:10:11 +0300
> > Subject: [PATCH] [FIX] Remove object adjustment to preserve object attributes
> >  for bitfield MEMREFs
> >
> > [FIX] Propagate uncached type attributes to unaligned/packed types
> >
> > [ARC] Update tests
> >
> > gcc/
> > -xx-xx  Petro Karashchenko  
> >
> > * emit-rtl.c (adjust_address_1): Do not drop the object
> > if the new memory reference is outside the underlying
> > object to preserve object attributes that are needed
> > by some backend implementations. Remove adjust_object
> > parameter as it is not used anymore.
> > (adjust_automodify_address_1): Adjust according to
> > new adjust_address_1 prototype.
> > (replace_equiv_address_nv): Likewise.
> > * gcc/emit-rtl.h (adjust_address): Adjust according to
> > new adjust_address_1 prototype.
> > (adjust_address_nv): Likewise.
> > (adjust_bitfield_address): Likewise.
> > (adjust_bitfield_address_size): Likewise.
> > (adjust_bitfield_address_nv): Likewise.
> >
> > testsuite/
> > -xx-xx  Petro Karashchenko  
> >
> > * gcc.target/arc/uncached-9.c: New file.
> >
> > Signed-off-by: Petro Karashchenko 
> > ---
> >  gcc/emit-rtl.c| 28 +++
> >  gcc/emit-rtl.h| 12 +-
> >  gcc/testsuite/gcc.target/arc/uncached-9.c | 19 +++
> >  3 files changed, 28 insertions(+), 31 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/arc/uncached-9.c
> >
> > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > index f9b0e9714d9..c0e2ead907b 100644
> > --- a/gcc/emit-rtl.c
> > +++ b/gcc/emit-rtl.c
> > @@ -2337,8 +2337,7 @@ change_address (rtx memref, machine_mode mode, rtx addr)
> >
> >  rtx
> >  adjust_address_1 (rtx memref, machine_mode mode, poly_int64 offset,
> > -   int validate, int adjust_address, int adjust_object,
> > -   poly_int64 size)
> > +   

Re: [PATCH] vxworks: Fix GCC selftests for *-wrs-vxworks7-* targets

2020-08-20 Thread Olivier Hainque
Hello Iain,

> On 19 Aug 2020, at 14:17, Iain Buclaw  wrote:

> Ah, no worries, attached updated patch.

> As we have discussed this off the lists though, we agreed to compromise
> and leave -nostdinc as it is in SELFTEST_FLAGS.

> Iain.
> 
> ---
> gcc/ChangeLog:
> 
>   * config/vxworks.h (VXWORKS_ADDITIONAL_CPP_SPEC): Don't include
>   VxWorks header files if -fself-tests is used.
>   (STARTFILE_PREFIX_SPEC): Avoid using VSB_DIR if -fself-tests is used.

OK, nit on the ChangeLog: this is -fself-test (no trailing s).

We have a batch of vxworks changes queued that we will be submitting soon,
and we might get to rationalize this with other places along the way.

Thanks!

Olivier






[PATCH] [obvious] testsuite: Skip arm/pure-code tests for arm*-*-uclinuxfdpiceabi

2020-08-20 Thread Christophe Lyon via Gcc-patches
FDPIC uses PIC code, which is incompatible with -mpure-code, so we
want to skip these tests for arm*-*-uclinuxfdpiceabi.

This patch also fixes a typo where the final closing bracket was
commented out.

Committed as obvious.

2020-08-20  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/pure-code/pure-code.exp: Skip for
arm*-*-uclinuxfdpiceabi. Fix missing closing bracket.
---
 gcc/testsuite/gcc.target/arm/pure-code/pure-code.exp | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pure-code.exp b/gcc/testsuite/gcc.target/arm/pure-code/pure-code.exp
index fabebe1a..cf3664a 100644
--- a/gcc/testsuite/gcc.target/arm/pure-code/pure-code.exp
+++ b/gcc/testsuite/gcc.target/arm/pure-code/pure-code.exp
@@ -16,6 +16,12 @@
 
 # GCC testsuite for ARM's -mpure-code option, using the `dg.exp' driver.
 
+# Exit immediately if this is an ARM FDPIC target (it uses PIC code,
+# which is incompatible with -mpure-code).
+if [istarget arm*-*-uclinuxfdpiceabi] then {
+  return
+}
+
 # Load support procs.
 load_lib gcc-dg.exp
 
@@ -53,4 +59,4 @@ set LTO_TORTURE_OPTIONS ${saved-lto_torture_options}
 
 # All done.
 dg-finish
-#}
+}
-- 
2.7.4



Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-08-20 Thread Richard Sandiford
xiezhiheng  writes:
>> -Original Message-
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: Wednesday, August 19, 2020 6:06 PM
>> To: xiezhiheng 
>> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> emitted at -O3
>> 
>> xiezhiheng  writes:
>> > I added FLAGS for some of the intrinsics in aarch64-simd-builtins.def
>> > as a first try, including all the add/sub arithmetic intrinsics.
>> >
>> > For an intrinsic like faddp, which only handles floating-point
>> > operations, both the FP and NONE flags are suitable, because FLAG_FP
>> > will be added later if the intrinsic handles floating-point operations.
>> > I prefer FP since it is clearer.
>> 
>> Sounds good to me.
>> 
>> > But the qadd intrinsics modify the FPSR register, which is a scenario
>> > I missed before.  I am considering adding an additional flag,
>> > FLAG_WRITE_FPSR, to represent it.
>> 
>> I don't think we make any attempt to guarantee that the Q flag is
>> meaningful after saturating intrinsics.  To do that, we'd need to model
>> the modification of the flag in the .md patterns too.
>> 
>> So my preference would be to leave this out and just use NONE for the
>> saturating forms too.
>
> The problem is that the test case in the attachment has different results 
> under -O0 and -O2.

Right.  But my point was that I don't think that use case is supported.
If you want to use saturating instructions and read the Q flag afterwards,
the saturating instructions need to be inline asm too.

> In the GIMPLE phase, the statement:
>   _9 = __builtin_aarch64_uqaddv2si_uuu (op0_4, op1_6);
> would be treated as dead code if we set the NONE flag for saturating intrinsics.
> Adding FLAG_WRITE_FPSR would help fix this problem.
>
> Even when we set FLAG_WRITE_FPSR, the uqadd insn: 
>   (insn 11 10 12 2 (set (reg:V2SI 97)
> (us_plus:V2SI (reg:V2SI 98)
> (reg:V2SI 99))) {aarch64_uqaddv2si}
>  (nil))
> could still be eliminated in the RTL phase, because it would be treated as
> a dead insn.
> So I think we might also need to modify the saturating instruction patterns
> to add the side effect of setting the FPSR register.

The problem is that FPSR is global state and we don't in general
know who might read it.  So if we modelled the modification of the FPSR,
we'd never be able to fold away saturating arithmetic that does actually
saturate at compile time, because we'd never know whether the program
wanted the effect on the Q flag result to be visible (perhaps to another
function that the compiler can't see).  We'd also be unable to remove
results that really are dead.

So I think this is one of those situations in which we can't keep all
constituents happy.  Catering for people who want to read the Q flag
would make things worse for those who want saturating arithmetic to be
optimised as aggressively as possible.  And the same holds in reverse.
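To make that trade-off concrete, here is a portable C model of one lane of an unsigned saturating add with a sticky QC bit. It is a sketch, not GCC code and not the ACLE intrinsic; `qc_flag`, `uqadd32_model` and `demo_dead_result` are hypothetical names. Once the flag write counts as a side effect, a call whose result is dead can no longer be deleted, and a call with constant operands can no longer be folded down to just its value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FPSR.QC: a sticky bit that saturating
   operations set and that nothing in this model clears.  */
static unsigned qc_flag;

/* Scalar model of one lane of an unsigned saturating add (what UQADD /
   vqadd_u32 does per element): clamp to UINT32_MAX on overflow and
   record that saturation happened.  */
static uint32_t
uqadd32_model (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;   /* unsigned arithmetic wraps modulo 2^32 */
  if (sum < a)            /* wrap-around: the exact sum exceeded UINT32_MAX */
    {
      qc_flag = 1;
      return UINT32_MAX;
    }
  return sum;
}

/* The result is dead, yet the call cannot be removed (or folded to a
   constant) without losing the observable write to qc_flag.  */
static void
demo_dead_result (void)
{
  qc_flag = 0;
  (void) uqadd32_model (0xfffffff0u, 0x20u);
  assert (qc_flag == 1);
}
```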

Thanks,
Richard

>
> So if we use the NONE flag for saturating intrinsics, both the description
> of the function attributes and the patterns are incorrect.
> If you agree, I can propose another patch to fix the patterns.
>
> Thanks,
> Xie Zhiheng
>
> #include <arm_neon.h>
> #include <stdlib.h>
>
> typedef union {
>   struct {
>     int _xxx:24;
>     unsigned int FZ:1;
>     unsigned int DN:1;
>     unsigned int AHP:1;
>     unsigned int QC:1;
>     int V:1;
>     int C:1;
>     int Z:1;
>     int N:1;
>   } b;
>   unsigned int word;
> } _ARM_FPSCR;
>
> static volatile int __read_neon_cumulative_sat (void) {
>   _ARM_FPSCR _afpscr_for_qc;
>   asm volatile ("mrs %0,fpsr" : "=r" (_afpscr_for_qc));
>   return _afpscr_for_qc.b.QC;
> }
>
> int main()
> {
>   uint32x2_t op0, op1, res;
>
>   op0 = vdup_n_u32 ((uint32_t)0xfff0);
>   op1 = vdup_n_u32 ((uint32_t)0x20);
>
>   _ARM_FPSCR _afpscr_for_qc;
>   asm volatile ("mrs %0,fpsr" : "=r" (_afpscr_for_qc));
>   _afpscr_for_qc.b.QC = (0);
>   asm volatile ("msr fpsr,%0" :  : "r" (_afpscr_for_qc));
>
>   res = vqadd_u32 (op0, op1);
>   if (__read_neon_cumulative_sat () != 1)
>     abort ();
>
>   return 0;
> }


Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

2020-08-20 Thread Richard Sandiford
Anything I once knew about reorg.c has long since faded away, but since
no one else has reviewed it…

Do you know what guarantees that REG_DEAD and REG_UNUSED notes are
reliable during reorg.c?  It was written at a time when passes were
expected to keep the notes up-to-date, but that's not true these days.
My worry is that they might be mostly right, but just stale enough
to be harmful in corner cases.

Hans-Peter Nilsson via Gcc-patches  writes:
> Originally I thought to bootstrap this patch on MIPS and SPARC
> since they're both delayed-branch-slot targets but I
> reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
> seems only visium and CRIS have this feature set, and I see no
> trace of visium in either newlib or the simulator next to
> glibc.  So, I just tested cris-elf.
>
> This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
> fillers using a method similar to that in commit 33c2207d3fda,
> where care was taken for fill_simple_delay_slots to allow such
> insns when scanning for delay-slot fillers *backwards* (before
> the insn).
>
> A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
> For cc0 targets, insns don't mention clobbering cc0, so the
> clobbers are mentioned in the "resources" only as a special
> entity and only for compare-insns and branches, where the cc0
> value matters.
>
> In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
> the register liveness detection in reorg.c / resource.c treats
> that as a blocker (for other insns mentioning it, i.e. most)
> when looking for delay-slot-filling candidates.  This means that
> when comparing code and performance for a delay-slot cc0 target
> before and after the de-cc0 conversion, the inability to fill a
> delay slot after conversion manifests as a regression.  This was
> one such case, for CRIS, with random_bitstring in
> gcc.c-torture/execute/arith-rand-ll.c as well as the target
> libgcc division function.
>
> After this, all known performance regressions compared to cc0
> are fixed.
>
> Ok to commit?
>
> gcc:
>   PR target/93372
>   * reorg.c (fill_slots_from_thread): Allow trial insns that clobber
>   TARGET_FLAGS_REGNUM as delay-slot fillers.
>
> gcc/testsuite:
>   PR target/93372
>   * gcc.target/cris/pr93372-47.c: New test.
> ---
>  gcc/reorg.c| 37 +-
>  gcc/testsuite/gcc.target/cris/pr93372-47.c | 49 ++
>  2 files changed, 85 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-47.c
>
> diff --git a/gcc/reorg.c b/gcc/reorg.c
> index dfd7494bf..83161caa0 100644
> --- a/gcc/reorg.c
> +++ b/gcc/reorg.c
> @@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>    CLEAR_RESOURCE (&needed);
>    CLEAR_RESOURCE (&set);
>  
> +  /* Handle the flags register specially, to be able to accept a
> + candidate that clobbers it.  See also fill_simple_delay_slots.  */
> +  bool filter_flags
> += (slots_to_fill == 1
> +   && targetm.flags_regnum != INVALID_REGNUM
> +   && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
> +  struct resources fset;
> +  struct resources flags_res;
> +  if (filter_flags)
> +{
> +  CLEAR_RESOURCE (&fset);
> +  CLEAR_RESOURCE (&flags_res);
> +  SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
> +}
> +
>/* If we do not own this thread, we must stop as soon as we find
>   something that we can't put in a delay slot, since all we can do
>   is branch into THREAD at a later point.  Therefore, labels stop
> @@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>/* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
>don't separate or copy insns that set and use CC0.  */
>if (! insn_references_resource_p (trial, &set, true)
> -   && ! insn_sets_resource_p (trial, &set, true)
> +   && ! insn_sets_resource_p (trial, filter_flags ? &fset : &set, true)
> && ! insn_sets_resource_p (trial, &needed, true)
> +   /* If we're handling sets to the flags register specially, we
> +  only allow an insn into a delay-slot, if it either:
> +  - doesn't set the flags register,
> +  - the "set" of the flags register isn't used (clobbered),
> +  - insns between the delay-slot insn and the trial-insn
> +  as accounted in "set", have not affected the flags register.  */
> +   && (! filter_flags
> +   || ! insn_sets_resource_p (trial, &flags_res, true)
> +   || find_regno_note (trial, REG_UNUSED, targetm.flags_regnum)
> +   || ! TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum))
> && (!HAVE_cc0 || (! (reg_mentioned_p (cc0_rtx, pat)
> && (! own_thread || ! sets_cc0_p (pat)
> && ! can_throw_internal (trial))
> @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
>lose = 1;
>
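As a reading aid, the new acceptance test can be restated as a plain C predicate. This is a simplified model, not reorg.c code: `struct flags_trial`, `want_filter_flags`, `flags_allow_trial` and `demo_flags` are hypothetical names, and the booleans stand for the results of `find_regno_note`, `insn_sets_resource_p (trial, &flags_res, true)` and `TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum)`. Because two of its inputs come from REG_DEAD/REG_UNUSED notes, it is only as trustworthy as those notes, which is exactly the concern raised above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the patch asks about one trial insn.  */
struct flags_trial
{
  bool sets_flags;    /* trial sets the flags register (the flags_res check) */
  bool flags_unused;  /* trial has a REG_UNUSED note for the flags register  */
};

/* filter_flags: exactly one slot to fill, the target has a flags register,
   and the branch carries a REG_DEAD note for it.  */
static bool
want_filter_flags (int slots_to_fill, bool have_flags_regnum,
                   bool flags_dead_at_branch)
{
  return slots_to_fill == 1 && have_flags_regnum && flags_dead_at_branch;
}

/* The clause added to the big condition: when filtering, accept the trial
   if it does not set the flags, if its flags result is unused, or if no
   insn scanned so far (accumulated in "set") has set the flags.  */
static bool
flags_allow_trial (bool filter_flags, struct flags_trial t, bool flags_in_set)
{
  return (!filter_flags
          || !t.sets_flags
          || t.flags_unused
          || !flags_in_set);
}

static void
demo_flags (void)
{
  struct flags_trial clobbers = { true, true };  /* sets flags, REG_UNUSED */
  struct flags_trial live = { true, false };     /* sets flags, value used */
  bool filter = want_filter_flags (1, true, true);
  assert (flags_allow_trial (filter, clobbers, true));
  assert (!flags_allow_trial (filter, live, true));
}
```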

RE: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-08-20 Thread xiezhiheng
> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Wednesday, August 19, 2020 6:06 PM
> To: xiezhiheng 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> emitted at -O3
> 
> xiezhiheng  writes:
> > I added FLAGS for some of the intrinsics in aarch64-simd-builtins.def
> > as a first try, including all the add/sub arithmetic intrinsics.
> >
> > For an intrinsic like faddp, which only handles floating-point
> > operations, both the FP and NONE flags are suitable, because FLAG_FP
> > will be added later if the intrinsic handles floating-point operations.
> > I prefer FP since it is clearer.
> 
> Sounds good to me.
> 
> > But the qadd intrinsics modify the FPSR register, which is a scenario
> > I missed before.  I am considering adding an additional flag,
> > FLAG_WRITE_FPSR, to represent it.
> 
> I don't think we make any attempt to guarantee that the Q flag is
> meaningful after saturating intrinsics.  To do that, we'd need to model
> the modification of the flag in the .md patterns too.
> 
> So my preference would be to leave this out and just use NONE for the
> saturating forms too.

The problem is that the test case in the attachment has different results under
-O0 and -O2.

In the GIMPLE phase, the statement:
  _9 = __builtin_aarch64_uqaddv2si_uuu (op0_4, op1_6);
would be treated as dead code if we set the NONE flag for saturating intrinsics.
Adding FLAG_WRITE_FPSR would help fix this problem.

Even when we set FLAG_WRITE_FPSR, the uqadd insn: 
  (insn 11 10 12 2 (set (reg:V2SI 97)
(us_plus:V2SI (reg:V2SI 98)
(reg:V2SI 99))) {aarch64_uqaddv2si}
 (nil))
could still be eliminated in the RTL phase, because it would be treated as a
dead insn.
So I think we might also need to modify the saturating instruction patterns to
add the side effect of setting the FPSR register.

So if we use the NONE flag for saturating intrinsics, both the description of
the function attributes and the patterns are incorrect.
If you agree, I can propose another patch to fix the patterns.

Thanks,
Xie Zhiheng
#include <arm_neon.h>
#include <stdlib.h>

typedef union {
  struct {
    int _xxx:24;
    unsigned int FZ:1;
    unsigned int DN:1;
    unsigned int AHP:1;
    unsigned int QC:1;
    int V:1;
    int C:1;
    int Z:1;
    int N:1;
  } b;
  unsigned int word;
} _ARM_FPSCR;

static volatile int __read_neon_cumulative_sat (void) {
  _ARM_FPSCR _afpscr_for_qc;
  asm volatile ("mrs %0,fpsr" : "=r" (_afpscr_for_qc));
  return _afpscr_for_qc.b.QC;
}

int main()
{
  uint32x2_t op0, op1, res;

  op0 = vdup_n_u32 ((uint32_t)0xfff0);
  op1 = vdup_n_u32 ((uint32_t)0x20);

  _ARM_FPSCR _afpscr_for_qc;
  asm volatile ("mrs %0,fpsr" : "=r" (_afpscr_for_qc));
  _afpscr_for_qc.b.QC = (0);
  asm volatile ("msr fpsr,%0" :  : "r" (_afpscr_for_qc));

  res = vqadd_u32 (op0, op1);
  if (__read_neon_cumulative_sat () != 1)
    abort ();

  return 0;
}


Re: [PATCH] [FIX] Remove object adjustment to preserve object attributes

2020-08-20 Thread Richard Sandiford
Petro Karashchenko via Gcc-patches  writes:
> for bitfield MEMREFs
>
> [FIX] Propagate uncached type attributes to unaligned/packed types

This doesn't look safe in general.  The current code was added
to avoid wrong-code problems for accesses that step outside the
bounds of the original MEM_EXPR.
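For reference, the bounds checks that the patch deletes can be modelled in a few lines of C. This is a simplified sketch, not GCC code: `struct mem_attrs_model`, `adjust_attrs_model` and `demo_attrs` are hypothetical stand-ins for `mem_attrs` and `adjust_address_1`, `has_expr` stands for `attrs.expr != NULL_TREE`, and `int64_t` replaces `poly_int64`. It shows that an access which starts before, or ends past, the original reference loses its MEM_EXPR, and with it any attribute a backend derives from that expression (the patch cites `uncached`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mem_attrs_model
{
  bool has_expr;        /* stands in for attrs.expr != NULL_TREE */
  bool offset_known_p;
  int64_t offset;       /* start of the access within the MEM_EXPR object */
  bool size_known_p;
  int64_t size;         /* size of the access */
};

/* OFFSET is relative to the old access; SIZE is the new access size
   (0 means "unchanged").  */
static struct mem_attrs_model
adjust_attrs_model (struct mem_attrs_model attrs, int64_t offset, int64_t size)
{
  /* Conservatively drop the object if we don't know where we start from.  */
  if (!attrs.offset_known_p || !attrs.size_known_p)
    attrs.has_expr = false;

  if (attrs.offset_known_p)
    {
      attrs.offset += offset;
      /* Drop the object if the new left end is not within its bounds.  */
      if (attrs.offset < 0)
        attrs.has_expr = false;
    }

  if (size != 0)
    {
      /* Drop the object if the new right end lies past the old access.  */
      if (attrs.size_known_p && offset + size > attrs.size)
        attrs.has_expr = false;
      attrs.size_known_p = true;
      attrs.size = size;
    }
  return attrs;
}

static void
demo_attrs (void)
{
  /* A 4-byte access at known offset 0 within its object.  */
  struct mem_attrs_model m = { true, true, 0, true, 4 };
  assert (adjust_attrs_model (m, 2, 2).has_expr);   /* [2,4) stays inside */
  assert (!adjust_attrs_model (m, 2, 4).has_expr);  /* [2,6) runs past the end */
}
```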

Could you explain in more detail what's going wrong in the testcase?
It wasn't obvious why the code you're removing would trigger for that test.

Thanks,
Richard

>
> [ARC] Update tests
>
> gcc/
> -xx-xx  Petro Karashchenko  
>
> * emit-rtl.c (adjust_address_1): Do not drop the object
> if the new memory reference is outside the underlying
> object to preserve object attributes that are needed
> by some backend implementations. Remove adjust_object
> parameter as it is not used anymore.
> (adjust_automodify_address_1): Adjust according to
> new adjust_address_1 prototype.
> (replace_equiv_address_nv): Likewise.
> * gcc/emit-rtl.h (adjust_address): Adjust according to
> new adjust_address_1 prototype.
> (adjust_address_nv): Likewise.
> (adjust_bitfield_address): Likewise.
> (adjust_bitfield_address_size): Likewise.
> (adjust_bitfield_address_nv): Likewise.
>
> testsuite/
> -xx-xx  Petro Karashchenko  
>
> * gcc.target/arc/uncached-9.c: New file.
>
> Problem description:
> __attribute__((uncached)) is dropped for other than the first member of
> "packed" or "unaligned" types.
>
> Tests:
> Tested with ARC600-based ASIC. The correct code is generated.
>
> From 1f8a824f8ed8c1452f2bdfc513716c63d5d12545 Mon Sep 17 00:00:00 2001
> From: Petro Karashchenko 
> Date: Sun, 16 Aug 2020 01:10:11 +0300
> Subject: [PATCH] [FIX] Remove object adjustment to preserve object attributes
>  for bitfield MEMREFs
>
> [FIX] Propagate uncached type attributes to unaligned/packed types
>
> [ARC] Update tests
>
> gcc/
> -xx-xx  Petro Karashchenko  
>
> * emit-rtl.c (adjust_address_1): Do not drop the object
> if the new memory reference is outside the underlying
> object to preserve object attributes that are needed
> by some backend implementations. Remove adjust_object
> parameter as it is not used anymore.
> (adjust_automodify_address_1): Adjust according to
> new adjust_address_1 prototype.
> (replace_equiv_address_nv): Likewise.
> * gcc/emit-rtl.h (adjust_address): Adjust according to
> new adjust_address_1 prototype.
> (adjust_address_nv): Likewise.
> (adjust_bitfield_address): Likewise.
> (adjust_bitfield_address_size): Likewise.
> (adjust_bitfield_address_nv): Likewise.
>
> testsuite/
> -xx-xx  Petro Karashchenko  
>
> * gcc.target/arc/uncached-9.c: New file.
>
> Signed-off-by: Petro Karashchenko 
> ---
>  gcc/emit-rtl.c| 28 +++
>  gcc/emit-rtl.h| 12 +-
>  gcc/testsuite/gcc.target/arc/uncached-9.c | 19 +++
>  3 files changed, 28 insertions(+), 31 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/uncached-9.c
>
> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> index f9b0e9714d9..c0e2ead907b 100644
> --- a/gcc/emit-rtl.c
> +++ b/gcc/emit-rtl.c
> @@ -2337,8 +2337,7 @@ change_address (rtx memref, machine_mode mode, rtx addr)
>  
>  rtx
>  adjust_address_1 (rtx memref, machine_mode mode, poly_int64 offset,
> -   int validate, int adjust_address, int adjust_object,
> -   poly_int64 size)
> +   int validate, int adjust_address, poly_int64 size)
>  {
>rtx addr = XEXP (memref, 0);
>rtx new_rtx;
> @@ -2413,25 +2412,11 @@ adjust_address_1 (rtx memref, machine_mode mode, poly_int64 offset,
>if (new_rtx == memref && maybe_ne (offset, 0))
>  new_rtx = copy_rtx (new_rtx);
>  
> -  /* Conservatively drop the object if we don't know where we start from.  */
> -  if (adjust_object && (!attrs.offset_known_p || !attrs.size_known_p))
> -{
> -  attrs.expr = NULL_TREE;
> -  attrs.alias = 0;
> -}
> -
>/* Compute the new values of the memory attributes due to this adjustment.
>   We add the offsets and update the alignment.  */
>if (attrs.offset_known_p)
>  {
>attrs.offset += offset;
> -
> -  /* Drop the object if the new left end is not within its bounds.  */
> -  if (adjust_object && maybe_lt (attrs.offset, 0))
> - {
> -   attrs.expr = NULL_TREE;
> -   attrs.alias = 0;
> - }
>  }
>  
>/* Compute the new alignment by taking the MIN of the alignment and the
> @@ -2445,18 +2430,11 @@ adjust_address_1 (rtx memref, machine_mode mode, poly_int64 offset,
>  
>if (maybe_ne (size, 0))
>  {
> -  /* Drop the object if the new right end is not within its bounds.  */
> -  if (adjust_object && maybe_gt (offset + size, attrs.size))
> - {
> -   attrs.expr = 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-20 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 20, 2020 at 3:40 PM Uros Bizjak  wrote:
>
> On Thu, Aug 20, 2020 at 9:31 AM Hongtao Liu  wrote:
> >
> > On Thu, Aug 20, 2020 at 3:24 PM Hongtao Liu  wrote:
> > >
> > > On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak  wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu  wrote:
> > > > >
> > > > > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak  wrote:
> > > > > >
> > > > > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu  wrote:
> > > > > > >
> > > > > > > Enable operator or/xor/and/andn/not for mask register, kxnor is not
> > > > > > > enabled since there's no corresponding instruction for general
> > > > > > > registers.
> > > > > > >
> > > > > > > gcc/
> > > > > > > PR target/88808
> > > > > > > * config/i386/i386.md: (*movsi_internal): Adjust constraints
> > > > > > > for mask registers.
> > > > > > > (*movhi_internal): Ditto.
> > > > > > > (*movqi_internal): Ditto.
> > > > > > > (*anddi_1): Support mask register operations
> > > > > > > (*and_1): Ditto.
> > > > > > > (*andqi_1): Ditto.
> > > > > > > (*andn_1): Ditto.
> > > > > > > (*_1): Ditto.
> > > > > > > (*qi_1): Ditto.
> > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > >
> > > > > > > gcc/testsuite/
> > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > >
> > > > > > index 74d207c3711..e8ad79d1b0a 100644
> > > > > > --- a/gcc/config/i386/i386.md
> > > > > > +++ b/gcc/config/i386/i386.md
> > > > > > @@ -2294,7 +2294,7 @@
> > > > > >
> > > > > >  (define_insn "*movsi_internal"
> > > > > >[(set (match_operand:SI 0 "nonimmediate_operand"
> > > > > > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> > > > > > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
> > > > > >  (match_operand:SI 1 "general_operand"
> > > > > >  "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
> > > > > >"!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > > > > >
> > > > > > I'd rather see *k everywhere, also with *movqi_internal and
> > > > > > *movhi_internal patterns. The "*" means that the allocator won't
> > > > > > allocate a mask register by default, but it will be used to optimize
> > > > > > moves. With the above change, you are risking that during integer
> > > > > > register pressure, the register allocator will allocate zero to a
> > > > > > mask register, and later "optimize" the move with a direct
> > > > > > maskreg-intreg move.
> > > > > >
> > > > > > The current strategy is that only general registers get allocated
> > > > > > for integer modes. Let's keep it this way for now.
> > > > > >
> > > > >
> > > > > Yes, though it would fail gcc.target/i386/avx512dq-pr88465.c and
> > > > > gcc.target/i386/avx512f-pr88465.c, I think it's more reasonable not to
> > > > > move zero into a mask register directly.
> > > >
> > > > Although it would be nice if the register allocator was smart enough,
> > > > the current strategy is to introduce peephole2 patterns to fix these
> > > > problems, similar to [1]. These peepholes can be introduced in a
> > > > follow-up patch.
> > > >
> > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html
> > > >
> > >
> > > peephole2 added.
> > >
> > > > > > Otherwise, the patchset LGTM, but please test the suggested changes
> > > > > > and repost.
> > > > > >
> > > > > > BTW: Do you plan to remove mask operations from sse.md? ATM, they
> > > > > > are used to distinguish mask operations, generated from builtins from
> > > > > > generic operations, so I'd like to keep them for a while. The
> > > > > > drawback is, that they are not combined with other operations, but at
> > > > > > the end of the day, this is what the programmer asked for by using
> > > > > > builtins.
> > > > >
> > > > > Agree, I prefer to keep them.
> > > >
> > > > Thinking some more about the approach, it looks to me that the optimal
> > > > solution is a post-reload splitter that would convert "generic"
> > > > patterns to mask operations from sse.md. The mask operations don't set
> > > > flags, so we can substantially improve post reload scheduling of these
> > > > instructions by removing flags clobber.
> > > >
> > > > So, simply add "#" to relevant alternatives of logic patterns and add
> > > > something like:
> > > >
> > > > --cut here--
> > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-20 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 20, 2020 at 9:31 AM Hongtao Liu  wrote:
>
> On Thu, Aug 20, 2020 at 3:24 PM Hongtao Liu  wrote:
> >
> > On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak  wrote:
> > >
> > > On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu  wrote:
> > > >
> > > > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak  wrote:
> > > > >
> > > > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu  wrote:
> > > > > >
> > > > > > Enable operator or/xor/and/andn/not for mask register, kxnor is not
> > > > > > enabled since there's no corresponding instruction for general
> > > > > > registers.
> > > > > >
> > > > > > gcc/
> > > > > > PR target/88808
> > > > > > * config/i386/i386.md: (*movsi_internal): Adjust constraints
> > > > > > for mask registers.
> > > > > > (*movhi_internal): Ditto.
> > > > > > (*movqi_internal): Ditto.
> > > > > > (*anddi_1): Support mask register operations
> > > > > > (*and_1): Ditto.
> > > > > > (*andqi_1): Ditto.
> > > > > > (*andn_1): Ditto.
> > > > > > (*_1): Ditto.
> > > > > > (*qi_1): Ditto.
> > > > > > (*one_cmpl2_1): Ditto.
> > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > (*one_cmplqi2_1): Ditto.
> > > > > >
> > > > > > gcc/testsuite/
> > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > >
> > > > > index 74d207c3711..e8ad79d1b0a 100644
> > > > > --- a/gcc/config/i386/i386.md
> > > > > +++ b/gcc/config/i386/i386.md
> > > > > @@ -2294,7 +2294,7 @@
> > > > >
> > > > >  (define_insn "*movsi_internal"
> > > > >[(set (match_operand:SI 0 "nonimmediate_operand"
> > > > > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> > > > > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
> > > > >  (match_operand:SI 1 "general_operand"
> > > > >  "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
> > > > >"!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > > > >
> > > > > I'd rather see *k everywhere, also with *movqi_internal and
> > > > > *movhi_internal patterns. The "*" means that the allocator won't
> > > > > allocate a mask register by default, but it will be used to optimize
> > > > > moves. With the above change, you are risking that during integer
> > > > > register pressure, the register allocator will allocate zero to a mask
> > > > > register, and later "optimize" the move with a direct maskreg-intreg
> > > > > move.
> > > > >
> > > > > The current strategy is that only general registers get allocated for
> > > > > integer modes. Let's keep it this way for now.
> > > > >
> > > >
> > > > Yes, though it would fail gcc.target/i386/avx512dq-pr88465.c and
> > > > gcc.target/i386/avx512f-pr88465.c, I think it's more reasonable not to
> > > > move zero into a mask register directly.
> > >
> > > Although it would be nice if the register allocator was smart enough,
> > > the current strategy is to introduce peephole2 patterns to fix these
> > > problems, similar to [1]. These peepholes can be introduced in a
> > > follow-up patch.
> > >
> > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html
> > >
> >
> > peephole2 added.
> >
> > > > > Otherwise, the patchset LGTM, but please test the suggested changes 
> > > > > and repost.
> > > > >
> > > > > BTW: Do you plan to remove mask operations from sse.md? ATM, they are
> > > > > used to distinguish mask operations, generated from builtins from
> > > > > generic operations, so I'd like to keep them for a while. The drawback
> > > > > is, that they are not combined with other operations, but at the end
> > > > > of the day, this is what the programmer asked for by using builtins.
> > > >
> > > > Agree, I prefer to keep them.
> > >
> > > Thinking some more about the approach, it looks to me that the optimal
> > > solution is a post-reload splitter that would convert "generic"
> > > patterns to mask operations from sse.md. The mask operations don't set
> > > flags, so we can substantially improve post reload scheduling of these
> > > instructions by removing flags clobber.
> > >
> > > So, simply add "#" to relevant alternatives of logic patterns and add
> > > something like:
> > >
> > > --cut here--
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index 41c6dbfa668..ad49bdc7583 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -1470,6 +1470,18 @@
> > >]
> > >(const_string "")))])
> > >
> > > +(define_split
> > > +  [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
> > > +   (any_logic:SWI1248_AVX512BW
> > > + 

Re: [PATCH 2/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-20 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 19, 2020 at 2:31 PM Uros Bizjak  wrote:
>
> On Wed, Aug 19, 2020 at 4:17 AM Hongtao Liu  wrote:
>
> OK, modulo:
>
> +/* { dg-final { scan-assembler-not "%xmm" } } */
>
> It is not clear to me what the testcase is testing here. The scan
> string is probably too broad and can generate false positives.
>
> Is it possible to refine the scans?
>

Refine the tests to scan for kmov (there should be many spills to mask registers).

> Uros.

and this one.

gcc/
* config/i386/i386.c (inline_secondary_memory_needed):
No memory is needed between mask regs and gpr.
(ix86_hard_regno_mode_ok): Add condition TARGET_AVX512F for
mask regno.
* config/i386/i386.h (enum reg_class): Add INT_MASK_REGS.
(REG_CLASS_NAMES): Ditto.
(REG_CLASS_CONTENTS): Ditto.
* config/i386/i386.md: Exclude mask register in
define_peephole2 which is available only for gpr.

gcc/testsuite/
* gcc.target/i386/spill_to_mask-1.c: New test.
* gcc.target/i386/spill_to_mask-2.c: New test.
* gcc.target/i386/spill_to_mask-3.c: New test.
* gcc.target/i386/spill_to_mask-4.c: New test.
---
 gcc/config/i386/i386.c|  2 +-
 gcc/config/i386/i386.h|  3 +
 gcc/config/i386/i386.md   |  4 +-
 .../gcc.target/i386/spill_to_mask-1.c | 92 +++
 .../gcc.target/i386/spill_to_mask-2.c | 10 ++
 .../gcc.target/i386/spill_to_mask-3.c | 10 ++
 .../gcc.target/i386/spill_to_mask-4.c | 10 ++
 7 files changed, 128 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/spill_to_mask-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/spill_to_mask-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/spill_to_mask-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/spill_to_mask-4.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f5e824a16ad..d71d6d55be6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19000,7 +19000,7 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
   if ((mode == P2QImode || mode == P2HImode))
 return MASK_PAIR_REGNO_P(regno);

-  return (VALID_MASK_REG_MODE (mode)
+  return ((TARGET_AVX512F && VALID_MASK_REG_MODE (mode))
   || (TARGET_AVX512BW
   && VALID_MASK_AVX512BW_MODE (mode)));
 }
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e0af87450b8..852dd017aa4 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1418,6 +1418,7 @@ enum reg_class
   FLOAT_INT_SSE_REGS,
   MASK_REGS,
   ALL_MASK_REGS,
+  INT_MASK_REGS,
   ALL_REGS,
   LIM_REG_CLASSES
 };
@@ -1477,6 +1478,7 @@ enum reg_class
"FLOAT_INT_SSE_REGS",\
"MASK_REGS",\
"ALL_MASK_REGS",\
+   "INT_MASK_REGS",\
"ALL_REGS" }

 /* Define which registers fit in which classes.  This is an initializer
@@ -1515,6 +1517,7 @@ enum reg_class
  { 0xff9, 0xfff0,   0xf },/* FLOAT_INT_SSE_REGS */\
{ 0x0,0x0, 0xfe0 },/* MASK_REGS */
   \
{ 0x0,0x0, 0xff0 },/* ALL_MASK_REGS */\
+   { 0x900ff,  0xff0, 0xff0 },/* INT_MASK_REGS */\
 { 0x, 0x, 0xfff }/* ALL_REGS  */
  \
 }

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b24a4557871..3a15941c3e8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15047,7 +15047,7 @@
 ;; Replace zero_extend:HI followed by parityhi2_cmp with parityqi2_cmp
 (define_peephole2
   [(set (match_operand:HI 0 "register_operand")
-(zero_extend:HI (match_operand:QI 1 "register_operand")))
+(zero_extend:HI (match_operand:QI 1 "general_reg_operand")))
(parallel [(set (reg:CC FLAGS_REG)
(unspec:CC [(match_dup 0)] UNSPEC_PARITY))
   (clobber (match_dup 0))])]
@@ -15058,7 +15058,7 @@
 ;; Eliminate QImode popcount&1 using parity flag
 (define_peephole2
   [(set (match_operand:SI 0 "register_operand")
-(zero_extend:SI (match_operand:QI 1 "register_operand")))
+(zero_extend:SI (match_operand:QI 1 "general_reg_operand")))
(parallel [(set (match_operand:SI 2 "register_operand")
(popcount:SI (match_dup 0)))
   (clobber (reg:CC FLAGS_REG))])
diff --git a/gcc/testsuite/gcc.target/i386/spill_to_mask-1.c b/gcc/testsuite/gcc.target/i386/spill_to_mask-1.c
new file mode 100644
index 000..c5043e224ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/spill_to_mask-1.c
@@ -0,0 +1,92 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+#ifndef DTYPE
+#define DTYPE u32
+#endif
+
+typedef unsigned long long u64;
+typedef unsigned int u32;
+typedef unsigned short u16;
+typedef unsigned char u8;
+
+#define R(x,n) ( (x 

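Each row of the REG_CLASS_CONTENTS initializer in the i386.h hunk above is a bitmap over hard register numbers, split into 32-bit words. GCC's internals manual asks that when one class is contained in another, the contained class get the lower number in enum reg_class, so INT_MASK_REGS, placed after ALL_MASK_REGS, should contain the mask classes. A minimal C sketch of that check follows, using only the three rows visible in the hunk (the ALL_REGS row is elided in this archive); class_subset_p is a hypothetical helper, not GCC's own HARD_REG_SET API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hard-register bitmaps copied from the REG_CLASS_CONTENTS hunk
   (three 32-bit words per class).  */
static const unsigned int mask_regs[3]     = { 0x0,     0x0,   0xfe0 };
static const unsigned int all_mask_regs[3] = { 0x0,     0x0,   0xff0 };
static const unsigned int int_mask_regs[3] = { 0x900ff, 0xff0, 0xff0 };

/* Hypothetical helper: class A is a subset of class B iff no bit
   set in A lies outside B, checked word by word.  */
static bool
class_subset_p (const unsigned int *a, const unsigned int *b)
{
  for (size_t w = 0; w < 3; w++)
    if (a[w] & ~b[w])
      return false;
  return true;
}
```

MASK_REGS (0xfe0) and ALL_MASK_REGS (0xff0) differ only in bit 4 of the third word, and both fall inside INT_MASK_REGS; the reverse containment fails because of the general-register bits in the first two words.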
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-20 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 20, 2020 at 3:24 PM Hongtao Liu  wrote:
>
> On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak  wrote:
> >
> > On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu  wrote:
> > >
> > > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak  wrote:
> > > >
> > > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu  wrote:
> > > > >
> > > > > Enable the or/xor/and/andn/not operators for mask registers; kxnor is
> > > > > not enabled since there's no corresponding instruction for general
> > > > > registers.
> > > > >
> > > > > gcc/
> > > > > PR target/88808
> > > > > * config/i386/i386.md: (*movsi_internal): Adjust constraints
> > > > > for mask registers.
> > > > > (*movhi_internal): Ditto.
> > > > > (*movqi_internal): Ditto.
> > > > > (*anddi_1): Support mask register operations.
> > > > > (*and_1): Ditto.
> > > > > (*andqi_1): Ditto.
> > > > > (*andn_1): Ditto.
> > > > > (*_1): Ditto.
> > > > > (*qi_1): Ditto.
> > > > > (*one_cmpl2_1): Ditto.
> > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > (*one_cmplqi2_1): Ditto.
> > > > >
> > > > > gcc/testsuite/
> > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > >
> > > > index 74d207c3711..e8ad79d1b0a 100644
> > > > --- a/gcc/config/i386/i386.md
> > > > +++ b/gcc/config/i386/i386.md
> > > > @@ -2294,7 +2294,7 @@
> > > >
> > > >  (define_insn "*movsi_internal"
> > > >[(set (match_operand:SI 0 "nonimmediate_operand"
> > > > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> > > > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
> > > >  (match_operand:SI 1 "general_operand"
> > > >  "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
> > > >"!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > > >
> > > > I'd rather see *k everywhere, also with *movqi_internal and
> > > > *movhi_internal patterns. The "*" means that the allocator won't
> > > > allocate a mask register by default, but it will be used to optimize
> > > > moves. With the above change, you are risking that during integer
> > > > register pressure, the register allocator will allocate zero to a mask
> > > > register, and later "optimize" the move with a direct maskreg-intreg
> > > > move.
> > > >
> > > > The current strategy is that only general registers get allocated for
> > > > integer modes. Let's keep it this way for now.
> > > >
> > >
> > > Yes, though it would fail gcc.target/i386/avx512dq-pr88465.c and
> > > gcc.target/i386/avx512f-pr88465.c, I think it's more reasonable not to
> > > move zero into a mask register directly.
> >
> > Although it would be nice if the register allocator was smart enough,
> > the current strategy is to introduce peephole2 patterns to fix these
> > problems, similar to [1]. These peepholes can be introduced in a
> > follow-up patch.
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html
> >
>
> peephole2 added.
>
> > > > Otherwise, the patchset LGTM, but please test the suggested changes and repost.
> > > >
> > > > BTW: Do you plan to remove mask operations from sse.md? ATM, they are
> > > > used to distinguish mask operations generated from builtins from
> > > > generic operations, so I'd like to keep them for a while. The drawback
> > > > is that they are not combined with other operations, but at the end
> > > > of the day, this is what the programmer asked for by using builtins.
> > >
> > > Agree, I prefer to keep them.
> >
> > Thinking some more about the approach, it looks to me that the optimal
> > solution is a post-reload splitter that would convert "generic"
> > patterns to mask operations from sse.md. The mask operations don't set
> > flags, so we can substantially improve post-reload scheduling of these
> > instructions by removing the flags clobber.
> >
> > So, simply add "#" to relevant alternatives of logic patterns and add
> > something like:
> >
> > --cut here--
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 41c6dbfa668..ad49bdc7583 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -1470,6 +1470,18 @@
> >]
> >(const_string "")))])
> >
> > +(define_split
> > +  [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
> > +   (any_logic:SWI1248_AVX512BW
> > + (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
> > + (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand")))
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "TARGET_AVX512F && reload_completed"
> > +  [(parallel
> > + [(set (match_dup 0)
> > +  

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-20 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak  wrote:
>
> On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu  wrote:
> >
> > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak  wrote:
> > >
> > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu  wrote:
> > > >
> > > > Enable the or/xor/and/andn/not operators for mask registers; kxnor is
> > > > not enabled since there's no corresponding instruction for general
> > > > registers.
> > > >
> > > > gcc/
> > > > PR target/88808
> > > > * config/i386/i386.md: (*movsi_internal): Adjust constraints
> > > > for mask registers.
> > > > (*movhi_internal): Ditto.
> > > > (*movqi_internal): Ditto.
> > > > (*anddi_1): Support mask register operations.
> > > > (*and_1): Ditto.
> > > > (*andqi_1): Ditto.
> > > > (*andn_1): Ditto.
> > > > (*_1): Ditto.
> > > > (*qi_1): Ditto.
> > > > (*one_cmpl2_1): Ditto.
> > > > (*one_cmplsi2_1_zext): Ditto.
> > > > (*one_cmplqi2_1): Ditto.
> > > >
> > > > gcc/testsuite/
> > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > >
> > > index 74d207c3711..e8ad79d1b0a 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -2294,7 +2294,7 @@
> > >
> > >  (define_insn "*movsi_internal"
> > >[(set (match_operand:SI 0 "nonimmediate_operand"
> > > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> > > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
> > >  (match_operand:SI 1 "general_operand"
> > >  "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
> > >"!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > >
> > > I'd rather see *k everywhere, also with *movqi_internal and
> > > *movhi_internal patterns. The "*" means that the allocator won't
> > > allocate a mask register by default, but it will be used to optimize
> > > moves. With the above change, you are risking that during integer
> > > register pressure, the register allocator will allocate zero to a mask
> > > register, and later "optimize" the move with a direct maskreg-intreg
> > > move.
> > >
> > > The current strategy is that only general registers get allocated for
> > > integer modes. Let's keep it this way for now.
> > >
> >
> > Yes, though it would fail gcc.target/i386/avx512dq-pr88465.c and
> > gcc.target/i386/avx512f-pr88465.c, I think it's more reasonable not to
> > move zero into a mask register directly.
>
> Although it would be nice if the register allocator was smart enough,
> the current strategy is to introduce peephole2 patterns to fix these
> problems, similar to [1]. These peepholes can be introduced in a
> follow-up patch.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html
>

peephole2 added.

> > > Otherwise, the patchset LGTM, but please test the suggested changes and repost.
> > >
> > > BTW: Do you plan to remove mask operations from sse.md? ATM, they are
> > > used to distinguish mask operations generated from builtins from
> > > generic operations, so I'd like to keep them for a while. The drawback
> > > is that they are not combined with other operations, but at the end
> > > of the day, this is what the programmer asked for by using builtins.
> >
> > Agree, I prefer to keep them.
>
> Thinking some more about the approach, it looks to me that the optimal
> solution is a post-reload splitter that would convert "generic"
> patterns to mask operations from sse.md. The mask operations don't set
> flags, so we can substantially improve post-reload scheduling of these
> instructions by removing the flags clobber.
>
> So, simply add "#" to relevant alternatives of logic patterns and add
> something like:
>
> --cut here--
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 41c6dbfa668..ad49bdc7583 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -1470,6 +1470,18 @@
>]
>(const_string "")))])
>
> +(define_split
> +  [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
> +   (any_logic:SWI1248_AVX512BW
> + (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
> + (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_AVX512F && reload_completed"
> +  [(parallel
> + [(set (match_dup 0)
> +  (any_logic:SWI1248_AVX512BW (match_dup 1) (match_dup 2)))
> +  (unspec [(const_int 0)] UNSPEC_MASKOP)])])
> +
>  (define_insn "kandn"
>[(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
> (and:SWI1248_AVX512BW
> --cut here--
>
> and similar for kandn and knot in 

[PATCH] Fortran : Runtime error, reshape constant array assignment, PR96624

2020-08-20 Thread Mark Eggleston
Please find attached a fix for PR96624.  The original patch was by Steve Kargl.


The bug also occurs on the releases/gcc-10, releases/gcc-9 and releases/gcc-8 branches.

OK to commit to master and backport?

[PATCH] Fortran  : Runtime error, reshape constant array assignment  PR96624

Assigning a reshaped constant array of shape [2,0] to a variable
fails with an invalid memory access.  If a variable with the
parameter attribute is initialised with the same reshape, there is
no runtime error.
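The failure can be modelled in a few lines of C. This sketch assumes the structure of the element loop in gfc_simplify_reshape (one constructor element appended per iteration, with the index odometer advanced at the inc: label only afterwards), so the first elements are emitted before any extent is checked and a [2,0] shape still produces entries. count_emitted is a hypothetical standalone model, not a GCC function: it counts emissions instead of building gfc_expr constructors and omits the SOURCE/PAD lookup.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the element loop in gfc_simplify_reshape.
   Returns how many elements would be appended for a result of the
   given SHAPE.  Assumes 0 <= RANK <= 15 (the Fortran 2008 limit).  */
static size_t
count_emitted (const size_t *shape, int rank, bool check_zerosize)
{
  size_t x[15] = { 0 };   /* odometer over the result indices */
  size_t emitted = 0;
  int i;

  /* The fix: a zero extent in any dimension means an empty result,
     so skip the loop entirely (the patch's "goto sizezero").  */
  if (check_zerosize)
    for (i = 0; i < rank; i++)
      if (shape[i] == 0)
        return 0;

  for (;;)
    {
      emitted++;          /* stands in for gfc_constructor_append_expr */

      /* Advance the odometer, carrying into higher dimensions.  */
      i = 0;
      while (i < rank && ++x[i] >= shape[i])
        x[i++] = 0;
      if (i == rank)
        break;            /* every dimension wrapped: done */
    }
  return emitted;
}
```

For a shape of [2,0], count_emitted returns 2 without the check, so the result would carry two elements although its size is zero; with the early exit it returns 0. A non-degenerate shape such as [2,3] yields 6 either way.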

2020-08-20  Steven G. Kargl  

gcc/fortran/

    PR fortran/96624
    * simplify.c (gfc_simplify_reshape): Add new variable "zerosize".
    Set zerosize if any of the result shape extents are zero.  After
    setting the result shapes, if zerosize is set, jump to new label
    "sizezero".  Add label "sizezero" just before clearing index and
    returning result.

2020-08-20  Mark Eggleston 

gcc/testsuite/

    PR fortran/96624
    * gfortran.dg/pr96624.f90: New test.

--
https://www.codethink.co.uk/privacy.html

From ab94bb744a7d64751f6b93cc56ad3ed5fe5cfc81 Mon Sep 17 00:00:00 2001
From: Mark Eggleston 
Date: Mon, 17 Aug 2020 13:50:28 +0100
Subject: [PATCH] Fortran  : Runtime error, reshape constant array assignment
 PR96624

Assigning a reshaped constant array of shape [2,0] to a variable
fails with an invalid memory access.  If a variable with the
parameter attribute is initialised with the same reshape, there is
no runtime error.

2020-08-20  Steven G. Kargl  

gcc/fortran/

	PR fortran/96624
	* simplify.c (gfc_simplify_reshape): Add new variable "zerosize".
	Set zerosize if any of the result shape extents are zero.  After
	setting the result shapes, if zerosize is set, jump to new label
	"sizezero".  Add label "sizezero" just before clearing index and
	returning result.

2020-08-20  Mark Eggleston  

gcc/testsuite/

	PR fortran/96624
	* gfortran.dg/pr96624.f90: New test.
---
 gcc/fortran/simplify.c| 11 ++-
 gcc/testsuite/gfortran.dg/pr96624.f90 | 10 ++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr96624.f90

diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index eb8b2afeb29..0d77d289651 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -6721,6 +6721,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp,
   unsigned long j;
   size_t nsource;
   gfc_expr *e, *result;
+  bool zerosize = false;
 
   /* Check that argument expression types are OK.  */
   if (!is_constant_array_expr (source)
@@ -6843,7 +6844,14 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp,
   result->rank = rank;
   result->shape = gfc_get_shape (rank);
   for (i = 0; i < rank; i++)
-    mpz_init_set_ui (result->shape[i], shape[i]);
+    {
+      mpz_init_set_ui (result->shape[i], shape[i]);
+      if (shape[i] == 0)
+	zerosize = true;
+    }
+
+  if (zerosize)
+    goto sizezero;
 
   while (nsource > 0 || npad > 0)
 {
@@ -6893,6 +6901,7 @@ inc:
   break;
 }
 
+sizezero:
   mpz_clear (index);
 
   return result;
diff --git a/gcc/testsuite/gfortran.dg/pr96624.f90 b/gcc/testsuite/gfortran.dg/pr96624.f90
new file mode 100644
index 000..a4cfe5c3279
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr96624.f90
@@ -0,0 +1,10 @@
+! { dg-do run }
+
+program test
+  integer :: a(2,0)
+  character(4) :: buffer
+  a = reshape([1,2,3,4], [2,0])
+  write(buffer,"(2a1)") ">", "<"
+  if (trim(buffer).ne."><") stop 1
+end
+
-- 
2.11.0