Re: [RS6000] Use gen_hard_reg_clobber in rs6000.c

2018-12-14 Thread Alan Modra
On Fri, Dec 14, 2018 at 07:57:08PM -0600, Segher Boessenkool wrote:
> On Sat, Dec 15, 2018 at 11:48:07AM +1030, Alan Modra wrote:
> > I noticed when looking at PR88311 that rs6000_call_sysv should be
> > using gen_hard_reg_clobber (as the sysv call insns did prior to me
> > introducing rs6000_call_sysv).  This patch fixes that minor
> > regression, and other like places in rs6000.c.  Bootstrapped and
> > regression tested powerpc64le-linux.  Powerpc64-linux biarch bootstrap
> > still in progress.  OK mainline?
> 
> Sure, okay for trunk.  Does this actually change behaviour anywhere, or
> is it just a performance / memory use optimisation?

No behaviour changes as far as I know.  It just saves creating
unnecessary duplicated RTL.
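
For readers following along, here is a minimal sketch of the saving, assuming gen_hard_reg_clobber hands back one shared CLOBBER per (mode, register) pair instead of building a new one on every call. Every toy_ name below is hypothetical and only models the idea; none of it is GCC code.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for GCC's rtx machinery; all names are hypothetical.  */
enum toy_mode { TOY_SI, TOY_DI, TOY_V4SI, TOY_NUM_MODES };
#define TOY_NUM_HARD_REGS 128

struct toy_rtx
{
  enum toy_mode mode;
  unsigned int regno;
};

/* Counts how many clobber nodes have been allocated so far.  */
unsigned int toy_allocations;

/* Always allocates a fresh node, like building the CLOBBER by hand.  */
struct toy_rtx *
toy_gen_clobber_uncached (enum toy_mode mode, unsigned int regno)
{
  struct toy_rtx *x = malloc (sizeof *x);
  assert (x != NULL);
  x->mode = mode;
  x->regno = regno;
  toy_allocations++;
  return x;
}

/* One shared node per (mode, regno) pair, created on first use.  */
struct toy_rtx *toy_clobber_cache[TOY_NUM_MODES][TOY_NUM_HARD_REGS];

struct toy_rtx *
toy_gen_hard_reg_clobber (enum toy_mode mode, unsigned int regno)
{
  assert ((unsigned int) mode < TOY_NUM_MODES && regno < TOY_NUM_HARD_REGS);
  if (toy_clobber_cache[mode][regno] == NULL)
    toy_clobber_cache[mode][regno] = toy_gen_clobber_uncached (mode, regno);
  return toy_clobber_cache[mode][regno];
}
```

Asking twice for the same (mode, regno) pair returns the same pointer and allocates only once, which matches the "no behaviour change, less duplicated RTL" description.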

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH][RS6000] Fix PR87870: ppc64 generates poor code when loading constants into TImode vars

2018-12-14 Thread Segher Boessenkool
On Thu, Dec 13, 2018 at 10:59:36AM -0600, Peter Bergner wrote:
> On 11/16/18 5:29 PM, Segher Boessenkool wrote:
> > On Fri, Nov 16, 2018 at 04:26:18PM -0600, Peter Bergner wrote:
> >> However, when I made the change below, the length attribute seems a
> >> little off.  For *_64bit, we have a length of 4, but for *_32bit, we
> >> have a length of 32.  The "4" looks correct for both *_64bit and *_32bit
> >> if we're loading an easy_vector_constant into one of the vector regs.
> >> For loading a TImode constant into a GPR, then it could be anything from
> >> 8 bytes to 40 bytes (loading 0xdeadbeefcafebabefacefeedbaad) for
> >> -m64.  Since TImode isn't supported in -m32 (yet?), who knows, probably
> >> it would be 16 bytes to ??? bytes.
> >>
> >> Should those sizes be updated too?  If so, what should they be?
> >> The smallest, average or worst case lengths?  I assume we could use
> >> another iterator to separate the vector lengths from the gpr lengths
> >> if we need to.
> > 
> > Worst case.  This is required for correctness.
> 
> Ok, I looked into this and the point where we must have correct length info
> is in final assembly generation, so very very late.

I'm not convinced this is true.  And problems with this only show up
with unusual code, so typically after releases :-(

Maybe we could make a mode where conditional jumps can jump only 128
bytes or similar, that would make testing much easier (problems will
show up much more often than with the 32kB max distance we have).
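
To illustrate why the worst-case length matters and why a small branch range would flush out problems sooner, here is a toy model. It assumes each insn carries an assumed length (its length attribute) and an actual emitted size; all toy_ names are hypothetical and this is not how GCC's branch shortening is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: ASSUMED_LEN is what the length attribute
   claims, ACTUAL_LEN is the number of bytes really emitted.  */
struct toy_insn
{
  int assumed_len;
  int actual_len;
};

/* Sum the assumed or the actual lengths over insns[from, to).  */
int
toy_span (const struct toy_insn *insns, size_t from, size_t to, bool actual)
{
  int bytes = 0;
  for (size_t i = from; i < to; i++)
    bytes += actual ? insns[i].actual_len : insns[i].assumed_len;
  return bytes;
}

/* True if a forward branch over insns[from, to) looks like it fits in
   MAX_DISTANCE bytes when judged by the assumed lengths.  */
bool
toy_short_branch_chosen (const struct toy_insn *insns, size_t from,
                         size_t to, int max_distance)
{
  return toy_span (insns, from, to, false) <= max_distance;
}

/* True if the short branch was chosen but the real distance does not
   fit, i.e. the emitted code would be wrong.  */
bool
toy_branch_miscompiled (const struct toy_insn *insns, size_t from,
                        size_t to, int max_distance)
{
  return toy_short_branch_chosen (insns, from, to, max_distance)
         && toy_span (insns, from, to, true) > max_distance;
}
```

With twenty insns assumed to be 4 bytes but really 8 bytes each, a 128-byte limit exposes the underestimate immediately, while a 32 kB limit hides it until the code between branch and target grows much larger.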

> For the alternatives
> I'm changing, we're loading into GPR regs and these alternatives are always
> split (split2), so these length values are never used/seen at final assembly
> time.

But some move instructions are created *after* split2.

> Given the above, I'm guessing we should probably go with the most common
> length value (ie, 8 for 64-bit and 16 for 32-bit)?  The following patch
> implements that.  Does this seem reasonable to you?

I do like the patch.  Let me sleep on it.


Segher


Re: [RS6000] Use gen_hard_reg_clobber in rs6000.c

2018-12-14 Thread Segher Boessenkool
On Sat, Dec 15, 2018 at 11:48:07AM +1030, Alan Modra wrote:
> I noticed when looking at PR88311 that rs6000_call_sysv should be
> using gen_hard_reg_clobber (as the sysv call insns did prior to me
> introducing rs6000_call_sysv).  This patch fixes that minor
> regression, and other like places in rs6000.c.  Bootstrapped and
> regression tested powerpc64le-linux.  Powerpc64-linux biarch bootstrap
> still in progress.  OK mainline?

Sure, okay for trunk.  Does this actually change behaviour anywhere, or
is it just a performance / memory use optimisation?


Segher


>   * config/rs6000/rs6000.c (generate_set_vrsave, rs6000_emit_savres_rtx),
>   (rs6000_emit_prologue, rs6000_call_aix, rs6000_call_sysv),
>   (rs6000_call_darwin_1): Use gen_hard_reg_clobber.


Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-14 Thread Martin Sebor

On 12/14/18 4:36 PM, Jeff Law wrote:

On 12/14/18 3:05 AM, Uecker, Martin wrote:


Am Donnerstag, den 13.12.2018, 16:35 -0700 schrieb Jeff Law:

On 12/12/18 11:12 AM, Uecker, Martin wrote:

...

diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
index 78e768c2366..ef039560eb9 100644
--- a/gcc/c/c-objc-common.h
+++ b/gcc/c/c-objc-common.h
@@ -110,4 +110,7 @@ along with GCC; see the file COPYING3.  If not see
  
  #undef LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P

  #define LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P c_vla_unspec_p
+
+#undef LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS
+#define LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS true
  #endif /* GCC_C_OBJC_COMMON */

I wonder if we even need the lang hook anymore.  ISTM that a
front-end
that wants to use the function descriptors can just set
FUNC_ADDR_BY_DESCRIPTOR and we'd use the function descriptor,
else we'll
use the trampoline.  Thoughts?

The lang hook also affects the minimum alignment for function
pointers via the FUNCTION_ALIGNMENT macro (gcc/default.h). This
does
not appear to change the default alignment on any architecture, but
it causes a failure in i386/gcc.target/i386/attr-aligned.c when
requesting a smaller alignment which is then silently ignored.

Ugh.  I didn't see that.

The test is new (2018-11-29 Martin Sebor), but one could
argue that we could simply remove this specific test as 'aligned'
is only required to increase alignment. Martin?

The test is meant to test that we do the right thing consistently.  If
we're failing with your patch, then that needs to be addressed.


I haven't been paying attention here and so I don't know how the test
fails after the change.  It's meant to verify that attribute aligned
successfully reduces the alignment of functions that have not been
previously declared with one all the way down to the supported minimum
(which is 1 on i386).  I agree with Jeff that removing the test would
not be right unless the failure is due to some bad assumptions on my
part.  If it's the built-in that fails that could be due to a bug in
it (it's very new).

Martin



I read your note as the test would fail if you dropped the
CUSTOM_FUNCTION_DESCRIPTORS macro, not that it was failing with your
patch as-is.






I am not sure what the best approach is, but my preference
would be to remove the lang hook and the FUNCTION_ALIGNMENT
logic which will also fix the test case (the requested
alignment will be applied).

I would then instead add a warning (or error?) which triggers
only with -fno-trampolines if the user requests an alignment
which is too small for this mechanism to work.
Does this sound reasonable?

So I'm thinking we should wrap the existing patch as-is for the trunk
(we're well into stage3 after all).  So leave the hook as-is for gcc-9.

We can then tackle removal of the hook, including twiddling
FUNCTION_ALIGNMENT for gcc-10.

Does that sound reasonable to you?

This is fine with me. So just confirm: I should install the
patch despite the regression?

We need to address the regression.  Simply removing the test is probably
not the way to go.

jeff





[RS6000] Use gen_hard_reg_clobber in rs6000.c

2018-12-14 Thread Alan Modra
I noticed when looking at PR88311 that rs6000_call_sysv should be
using gen_hard_reg_clobber (as the sysv call insns did prior to me
introducing rs6000_call_sysv).  This patch fixes that minor
regression, and other like places in rs6000.c.  Bootstrapped and
regression tested powerpc64le-linux.  Powerpc64-linux biarch bootstrap
still in progress.  OK mainline?

* config/rs6000/rs6000.c (generate_set_vrsave, rs6000_emit_savres_rtx),
(rs6000_emit_prologue, rs6000_call_aix, rs6000_call_sysv),
(rs6000_call_darwin_1): Use gen_hard_reg_clobber.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 74175d2dada..34d6b37e411 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -25928,8 +25928,7 @@ generate_set_vrsave (rtx reg, rs6000_stack_t *info, int epiloguep)
 if (info->vrsave_mask & ALTIVEC_REG_BIT (i))
   {
if (!epiloguep || call_used_regs [i])
- clobs[nclobs++] = gen_rtx_CLOBBER (VOIDmode,
-gen_rtx_REG (V4SImode, i));
+ clobs[nclobs++] = gen_hard_reg_clobber (V4SImode, i);
else
  {
rtx reg = gen_rtx_REG (V4SImode, i);
@@ -26253,8 +26252,7 @@ rs6000_emit_savres_rtx (rs6000_stack_t *info,
   if (!(sel & SAVRES_SAVE) && (sel & SAVRES_LR))
 RTVEC_ELT (p, offset++) = ret_rtx;
 
-  RTVEC_ELT (p, offset++)
-= gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNO));
+  RTVEC_ELT (p, offset++) = gen_hard_reg_clobber (Pmode, LR_REGNO);
 
   sym = rs6000_savres_routine_sym (info, sel);
   RTVEC_ELT (p, offset++) = gen_rtx_USE (VOIDmode, sym);
@@ -26263,8 +26261,7 @@ rs6000_emit_savres_rtx (rs6000_stack_t *info,
   if ((sel & SAVRES_REG) == SAVRES_VR)
 {
   /* Vector regs are saved/restored using [reg+reg] addressing.  */
-  RTVEC_ELT (p, offset++)
-   = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, use_reg));
+  RTVEC_ELT (p, offset++) = gen_hard_reg_clobber (Pmode, use_reg);
   RTVEC_ELT (p, offset++)
= gen_rtx_USE (VOIDmode, gen_rtx_REG (Pmode, 0));
 }
@@ -26942,9 +26939,7 @@ rs6000_emit_prologue (void)
   sz += LAST_ALTIVEC_REGNO - info->first_altivec_reg_save + 1;
   p = rtvec_alloc (sz);
   j = 0;
-  RTVEC_ELT (p, j++) = gen_rtx_CLOBBER (VOIDmode,
-   gen_rtx_REG (SImode,
-LR_REGNO));
+  RTVEC_ELT (p, j++) = gen_hard_reg_clobber (SImode, LR_REGNO);
   RTVEC_ELT (p, j++) = gen_rtx_USE (VOIDmode,
gen_rtx_SYMBOL_REF (Pmode,
"*save_world"));
@@ -28117,8 +28112,7 @@ rs6000_emit_epilogue (int sibcall)
= gen_rtx_USE (VOIDmode, gen_rtx_SYMBOL_REF (Pmode, alloc_rname));
   /* The instruction pattern requires a clobber here;
 it is shared with the restVEC helper. */
-  RTVEC_ELT (p, j++)
-   = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 11));
+  RTVEC_ELT (p, j++) = gen_hard_reg_clobber (Pmode, 11);
 
   {
/* CR register traditionally saved as CR2.  */
@@ -28164,14 +28158,10 @@ rs6000_emit_epilogue (int sibcall)
  && save_reg_p (info->first_fp_reg_save + i))
cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores);
}
-  RTVEC_ELT (p, j++)
-   = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 0));
-  RTVEC_ELT (p, j++)
-   = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (SImode, 12));
-  RTVEC_ELT (p, j++)
-   = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (SImode, 7));
-  RTVEC_ELT (p, j++)
-   = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (SImode, 8));
+  RTVEC_ELT (p, j++) = gen_hard_reg_clobber (Pmode, 0);
+  RTVEC_ELT (p, j++) = gen_hard_reg_clobber (SImode, 12);
+  RTVEC_ELT (p, j++) = gen_hard_reg_clobber (SImode, 7);
+  RTVEC_ELT (p, j++) = gen_hard_reg_clobber (SImode, 8);
   RTVEC_ELT (p, j++)
= gen_rtx_USE (VOIDmode, gen_rtx_REG (SImode, 10));
   insn = emit_jump_insn (gen_rtx_PARALLEL (VOIDmode, p));
@@ -28819,8 +28809,7 @@ rs6000_emit_epilogue (int sibcall)
   int elt = 0;
   RTVEC_ELT (p, elt++) = ret_rtx;
   if (lr)
-   RTVEC_ELT (p, elt++)
- = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNO));
+   RTVEC_ELT (p, elt++) = gen_hard_reg_clobber (Pmode, LR_REGNO);
 
   /* We have to restore more than two FP registers, so branch to the
 restore function.  It will return to our caller.  */
@@ -37925,7 +37914,7 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
   if (toc_restore)
 call[n_call++] = toc_restore;
 
-  call[n_call++] = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNO));
+  call[n_call++] = gen_hard_reg_clobber (Pmode, LR_REGNO);
 
   insn = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (n_call, call));
   insn = emit_call_insn (insn);
@@ -38020,7 +38009,7 @@ 

Re: [PATCH] handle expressions in __builtin_has_attribute (PR 88383)

2018-12-14 Thread Martin Sebor

the manual a function declaration like

   __attribute__ ((alloc_size (1), malloc))
   void* allocate (unsigned);

should have those two attributes applied to it.  Yet, alloc_size
is actually applied to its type (but not to its decl) while malloc
to the function's decl but not its type (bug 88397).

I'm pretty sure most users still expect the following to pass:

   _Static_assert (__builtin_has_attribute (allocate, alloc_size));
   _Static_assert (__builtin_has_attribute (allocate, malloc));


Users shouldn't expect this.  If anything, we should document what
attributes are type attributes and which attributes are declaration
attributes.


I designed the built-in based on what I expect.

The programs that care about whether a function is declared,
say with attribute alloc_align or malloc, do not and should
not have to worry about whether the attribute is on the decl
or on its type -- in the expected use cases it makes no
difference.  Those that might care whether an attribute is
on the type can easily check the type:

  __builtin_has_attribute (__typeof__ (allocate), alloc_size)

(I would expect GCC to apply an attribute either to a decl or
to a type but not to both.)

I do agree that whether an attribute applies to a function or
its type should be reviewed and where it makes sense documented.
More than that, some attributes that currently only apply to
function decls should be changed to apply to (function) types
instead so that calls via pointers to such functions can get
the same benefits as calls to the functions themselves.  Malloc
is an example (bug 88397).


With the way you're proposing, users could check the type attributes
simply through __typeof/decltype etc., but couldn't test solely the
declaration attributes.


I'm not proposing anything -- what I described is the design.
I don't know of a use case for testing the decl alone for
attributes.  In all those I can think of, what matters is
the union of attributes between the decl and its type.

But I'm not opposed to enhancing the function if an important
use case does turn up that's not supported.  For what you are
asking for this should already do it so I don't see a need to
change anything:

  #define decl_has_attribute(d, ...)  \
 (__builtin_has_attribute (d, __VA_ARGS__)  \
  && !__builtin_has_attribute (__typeof__ (d), __VA_ARGS__))

Martin


Re: [PATCH] x86: Don't use get_frame_size to finalize stack frame

2018-12-14 Thread H.J. Lu
On Fri, Dec 14, 2018 at 3:24 PM Jeff Law wrote:
>
> On 12/14/18 4:01 PM, H.J. Lu wrote:
> > On Thu, Dec 13, 2018 at 11:11 PM Uros Bizjak wrote:
> >> On Thu, Dec 13, 2018 at 6:36 PM H.J. Lu wrote:
> >>> get_frame_size () returns used stack slots during compilation, which
> >>> may be optimized out later.  Since ix86_find_max_used_stack_alignment
> >>> is called by ix86_finalize_stack_frame_flags to check if stack frame
> >>> is required, there is no need to call get_frame_size () which may give
> >>> inaccurate final stack frame size.
> >>>
> >>> Tested on AVX512 machine configured with
> >>>
> >>> --with-arch=native --with-cpu=native
> >>>
> >>> OK for trunk?
> >>>
> >>>
> >>> H.J.
> >>> ---
> >>> gcc/
> >>>
> >>> PR target/88483
> >>> * config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
> >>> use get_frame_size ().
> >>>
> >>> gcc/testsuite/
> >>>
> >>> PR target/88483
> >>> * gcc.target/i386/stackalign/pr88483.c: New test.
> >> LGTM, but you know this part of the compiler better than I.
> >>
> >> Thanks,
> >> Uros.
> >>
> >>> ---
> >>>  gcc/config/i386/i386.c  |  1 -
> >>>  .../gcc.target/i386/stackalign/pr88483.c| 17 +
> >>>  2 files changed, 17 insertions(+), 1 deletion(-)
> >>>  create mode 100644 gcc/testsuite/gcc.target/i386/stackalign/pr88483.c
> >>>
> >>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> >>> index caa701fe242..edc8f4f092e 100644
> >>> --- a/gcc/config/i386/i386.c
> >>> +++ b/gcc/config/i386/i386.c
> >>> @@ -12876,7 +12876,6 @@ ix86_finalize_stack_frame_flags (void)
> >>>&& flag_exceptions
> >>>&& cfun->can_throw_non_call_exceptions)
> >>>&& !ix86_frame_pointer_required ()
> >>> -  && get_frame_size () == 0
> >>>&& ix86_nsaved_sseregs () == 0
> >>>&& ix86_varargs_gpr_size + ix86_varargs_fpr_size == 0)
> >>>  {
> >>> diff --git a/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c b/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c
> >>> new file mode 100644
> >>> index 000..5aec8fd4cf6
> >>> --- /dev/null
> >>> +++ b/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c
> >>> @@ -0,0 +1,17 @@
> >>> +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> >>> +/* { dg-options "-O2 -mavx2" } */
> >>> +
> >>> +struct B
> >>> +{
> >>> +  char a[12];
> >>> +  int b;
> >>> +};
> >>> +
> >>> +struct B
> >>> +f2 (void)
> >>> +{
> >>> +  struct B x = {};
> >>> +  return x;
> >>> +}
> >>> +
> >>> +/* { dg-final { scan-assembler-not "and\[lq\]?\[^\\n\]*-\[0-9\]+,\[^\\n\]*sp" } } */
> >>> --
> >>> 2.19.2
> >>>
> > My fix triggered a latent bug in ix86_find_max_used_stack_alignment.
> > Here is the fix.  OK for trunk?
> >
> > Thanks.
> >
> > -- H.J.
> >
> >
> > 0001-x86-Properly-check-stack-reference.patch
> >
> > From 83f0b37f287ed198a3b50e2be6b0f7f5c154020e Mon Sep 17 00:00:00 2001
> > From: "H.J. Lu" 
> > Date: Fri, 14 Dec 2018 12:21:02 -0800
> > Subject: [PATCH] x86: Properly check stack reference
> >
> > A latent bug in ix86_find_max_used_stack_alignment was uncovered by the
> > fix for PR target/88483, which caused:
> >
> > FAIL: gcc.target/i386/incoming-8.c scan-assembler andl[\\t ]*\\$-16,[\\t ]*%esp
> >
> > on i386.  ix86_find_max_used_stack_alignment failed to notice stack
> > reference via non-stack/frame registers and missed stack alignment
> > requirement.  We should track all registers which may reference stack
> > by checking registers set from stack referencing registers.
> >
> > Tested on i686 and x86-64 with
> >
> > --with-arch=native --with-cpu=native
> >
> > on AVX512 machine.  Tested on i686 and x86-64 without
> >
> > --with-arch=native --with-cpu=native
> >
> > on x86-64 machine.
> >
> >   PR target/88483
> >   * config/i386/i386.c (ix86_stack_referenced_p): New function.
> >   (ix86_find_max_used_stack_alignment): Call ix86_stack_referenced_p
> >   to check if stack is referenced.
> > ---
> >  gcc/config/i386/i386.c | 43 ++
> >  1 file changed, 39 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 4599ca2a7d5..bf93ec3722f 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -12777,6 +12777,18 @@ output_probe_stack_range (rtx reg, rtx end)
> >return "";
> >  }
> >
> > +/* Return true if OP references stack frame though one of registers
> > +   in STACK_REF_REGS.  */
> > +
> > +static bool
> > +ix86_stack_referenced_p (const_rtx op, rtx *stack_ref_regs)
> > +{
> > +  for (int i = 0; i < LAST_REX_INT_REG; i++)
> > +if (stack_ref_regs[i] && reg_mentioned_p (stack_ref_regs[i], op))
> > +  return true;
> > +  return false;
> > +}
> > +
> >  /* Return true if stack frame is required.  Update STACK_ALIGNMENT
> > to the largest alignment, in bits, of stack slot used if stack
> > frame is required and CHECK_STACK_SLOT is true.  */
> 
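
The tracking described in the quoted commit message ("registers set from stack referencing registers") can be pictured with a small fixed-point loop. This is only a sketch under stated assumptions: registers are bits in a 32-bit mask, every instruction is a plain "dst = src" copy, and nothing ever kills a tracked register. The toy_ names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction: DST is set from SRC (plus any offset).  */
struct toy_set
{
  unsigned int dst;
  unsigned int src;
};

/* Starting from SEED (a mask with the stack and frame pointer bits),
   add every register that is set from a register already in the mask,
   and repeat until nothing changes.  */
uint32_t
toy_stack_ref_regs (const struct toy_set *sets, size_t n, uint32_t seed)
{
  uint32_t regs = seed;
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (size_t i = 0; i < n; i++)
        {
          assert (sets[i].dst < 32 && sets[i].src < 32);
          uint32_t src_bit = UINT32_C (1) << sets[i].src;
          uint32_t dst_bit = UINT32_C (1) << sets[i].dst;
          if ((regs & src_bit) != 0 && (regs & dst_bit) == 0)
            {
              regs |= dst_bit;
              changed = true;
            }
        }
    }
  return regs;
}

/* True if BASE_REG may hold an address derived from the stack.  */
bool
toy_stack_referenced_p (uint32_t regs, unsigned int base_reg)
{
  assert (base_reg < 32);
  return ((regs >> base_reg) & 1) != 0;
}
```

The loop is deliberately conservative: a memory access through any register in the resulting mask is treated as a possible stack reference, so its alignment requirement would not be missed.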

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-14 Thread Jeff Law
On 12/14/18 3:05 AM, Uecker, Martin wrote:
> 
> Am Donnerstag, den 13.12.2018, 16:35 -0700 schrieb Jeff Law:
>> On 12/12/18 11:12 AM, Uecker, Martin wrote:
> ...
> diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
> index 78e768c2366..ef039560eb9 100644
> --- a/gcc/c/c-objc-common.h
> +++ b/gcc/c/c-objc-common.h
> @@ -110,4 +110,7 @@ along with GCC; see the file COPYING3.  If not see
>  
>  #undef LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P
>  #define LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P c_vla_unspec_p
> +
> +#undef LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS
> +#define LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS true
>  #endif /* GCC_C_OBJC_COMMON */
>>>> I wonder if we even need the lang hook anymore.  ISTM that a front-end
>>>> that wants to use the function descriptors can just set
>>>> FUNC_ADDR_BY_DESCRIPTOR and we'd use the function descriptor, else we'll
>>>> use the trampoline.  Thoughts?
>>> The lang hook also affects the minimum alignment for function
>>> pointers via the FUNCTION_ALIGNMENT macro (gcc/default.h). This does
>>> not appear to change the default alignment on any architecture, but
>>> it causes a failure in i386/gcc.target/i386/attr-aligned.c when
>>> requesting a smaller alignment which is then silently ignored.
>> Ugh.  I didn't see that.
> The test is new (2018-11-29 Martin Sebor), but one could
> argue that we could simply remove this specific test as 'aligned'
> is only required to increase alignment. Martin?
The test is meant to test that we do the right thing consistently.  If
we're failing with your patch, then that needs to be addressed.

I read your note as the test would fail if you dropped the
CUSTOM_FUNCTION_DESCRIPTORS macro, not that it was failing with your
patch as-is.



> 
>>> I am not sure what the best approach is, but my preference
>>> would be to remove the lang hook and the FUNCTION_ALIGNMENT
>>> logic which will also fix the test case (the requested
>>> alignment will be applied).
>>>
>>> I would then instead add a warning (or error?) which triggers
>>> only with -fno-trampolines if the user requests an alignment
>>> which is too small for this mechanism to work.
>>> Does this sound reasonable?
>> So I'm thinking we should wrap the existing patch as-is for the trunk
>> (we're well into stage3 after all).  So leave the hook as-is for gcc-9.
>>
>> We can then tackle removal of the hook, including twiddling
>> FUNCTION_ALIGNMENT for gcc-10.
>>
>> Does that sound reasonable to you?
> This is fine with me. So just confirm: I should install the 
> patch despite the regression?
We need to address the regression.  Simply removing the test is probably
not the way to go.

jeff


[PATCH] v5: C++: more location wrapper nodes (PR c++/43064, PR c++/43486)

2018-12-14 Thread David Malcolm
On Thu, 2018-12-13 at 15:37 -0500, Jason Merrill wrote:
> On 12/13/18 3:12 PM, David Malcolm wrote:
> > On Wed, 2018-12-12 at 15:37 -0500, Jason Merrill wrote:
> > > On 12/7/18 3:13 PM, David Malcolm wrote:
> > > > On Tue, 2018-12-04 at 18:31 -0500, Jason Merrill wrote:
> > > > > On 12/3/18 5:10 PM, Jeff Law wrote:
> > > > > > On 11/19/18 9:51 AM, David Malcolm wrote:
> > > > 
> > > > [...]
> > > > > > @@ -1058,6 +1058,9 @@ grokbitfield (const cp_declarator
> > > > > > *declarator,
> > > > > >  return NULL_TREE;
> > > > > >}
> > > > > >
> > > > > > +  if (width)
> > > > > > +STRIP_ANY_LOCATION_WRAPPER (width);
> > > > > 
> > > > > Why is this needed?  We should already be reducing width to
> > > > > an
> > > > > unwrapped
> > > > > constant value somewhere along the line.
> > > > 
> > > > "width" is coming from cp_parser_member_declaration, from:
> > > > 
> > > >   /* Get the width of the bitfield.  */
> > > >   width = cp_parser_constant_expression (parser, false, NULL,
> > > >                                          cxx_dialect >= cxx11);
> > > > 
> > > > and currently nothing is unwrapping the value.  We presumably
> > > > need
> > > > to
> > > > unwrap (or fold?) it before it is stashed in
> > > > DECL_BIT_FIELD_REPRESENTATIVE (value).
> > > > 
> > > > Without stripping (or folding) here, we e.g. lose a warning and
> > > > get
> > > > this:
> > > > FAIL: g++.dg/abi/empty22.C  -std=gnu++98  (test for
> > > > warnings,
> > > > line 15)
> > > 
> > > Why does that happen?  check_bitfield_decl ought to handle the
> > > location
> > > wrapper fine.  That's where it gets folded.
> > 
> > The unstripped location wrapper defeats this check for zero in
> > check_field_decls within cp/class.c:
> > 
> > 3555  if (DECL_C_BIT_FIELD (x)
> > 3556  && integer_zerop
> > (DECL_BIT_FIELD_REPRESENTATIVE (x)))
> > 3556/* We don't treat zero-width bitfields as
> > making a class
> > 3557   non-empty.  */
> 
> Aha.  I wonder if integer_zerop should look through location
> wrappers? 
> Or alternately, abort if it gets one?
> 
> On a tangent, perhaps we also want a macro like TREE_CODE that looks 
> through wrappers.  TREE_CODE_WRAPPED?  _NO_WRAP?  Other name ideas?

Here's a v5 of the patch which makes integer_zerop (and various other
predicates) look through location wrappers.  I added a selftest for them
to tree.c (selftest::test_predicates).

There are a few places which do things like:

  if (integer_zerop (expr) && !TREE_OVERFLOW (expr))

so I converted these to:

  if (integer_zerop (expr)
  && !TREE_OVERFLOW (tree_strip_any_location_wrapper (expr)))

i.e. to use TREE_OVERFLOW on the constant, rather than on the wrapper.

[I first attempted the "abort if it gets one" approach, via:
  gcc_assert (!location_wrapper_p (expr))
but it led to numerous bootstrap and test failures, and lots of
stripping, so doing it within the predicates seemed cleaner]
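
As a rough picture of what "look through location wrappers" means for a predicate, here is a self-contained toy. It assumes a wrapper is simply a node pointing at the wrapped constant; the toy_ types and functions are hypothetical and are not GCC's tree API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-node toy tree: a constant, or a wrapper that only
   adds a location around another node.  */
enum toy_code { TOY_INTEGER_CST, TOY_LOCATION_WRAPPER };

struct toy_tree
{
  enum toy_code code;
  long value;                     /* Used by TOY_INTEGER_CST.  */
  bool overflow;                  /* Used by TOY_INTEGER_CST.  */
  const struct toy_tree *operand; /* Used by TOY_LOCATION_WRAPPER.  */
};

/* Return the wrapped node if T is a wrapper, else T itself.  */
const struct toy_tree *
toy_strip_any_location_wrapper (const struct toy_tree *t)
{
  assert (t != NULL);
  if (t->code == TOY_LOCATION_WRAPPER)
    return t->operand;
  return t;
}

/* The predicate strips the wrapper itself, so callers need not.  */
bool
toy_integer_zerop (const struct toy_tree *t)
{
  t = toy_strip_any_location_wrapper (t);
  return t->code == TOY_INTEGER_CST && t->value == 0;
}

/* Flags live on the constant, so they must be read after stripping.  */
bool
toy_zero_without_overflow_p (const struct toy_tree *t)
{
  return toy_integer_zerop (t)
         && !toy_strip_any_location_wrapper (t)->overflow;
}
```

The last function mirrors the conversion shown above: the predicate accepts a wrapped node, but the overflow flag is read from the constant underneath rather than from the wrapper.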

> > 3558;
> > 3559  else
> > 
> > leading it to erroneously use the "else" clause, which thus
> > erroneously
> > clears CLASSTYPE_EMPTY_P, leading to the loss of:
> > 
> > g++.dg/abi/empty22.C:15:6: warning: empty class 'dummy' parameter
> > passing
> >ABI changes in -fabi-version=12 (GCC 8) [-Wabi]
> > 15 |   fun(d, f); // { dg-warning "empty" }
> >|   ~~~^~
> > 
> > check_bitfield_decl is called *after* that check, so the folding
> > there
> > doesn't help.

With the conversion of integer_zerop to look through location wrappers,
I removed the stripping from grokbitfield.  This turned up one other
site that uses DECL_BIT_FIELD_REPRESENTATIVE, in objc's
gen_declaration (for obj-c++), so the v5 patch looks through wrappers
there.

> > > > > > @@ -656,6 +656,9 @@ add_capture (tree lambda, tree id, tree
> > > > > > orig_init, bool by_reference_p,
> > > > > >  listmem = make_pack_expansion (member);
> > > > > >  initializer = orig_init;
> > > > > >}
> > > > > > +
> > > > > > +  STRIP_ANY_LOCATION_WRAPPER (initializer);
> > > > > 
> > > > > Why is this needed?  What cares about the tree codes of the
> > > > > capture
> > > > > initializer?
> > > > 
> > > > This is used to populate LAMBDA_EXPR_CAPTURE_LIST.  Without
> > > > stripping,
> > > > we end up with wrapped VAR_DECLs, rather than the VAR_DECLs
> > > > themselves,
> > > 
> > > Sure, that sounds fine.
> > > 
> > > > and this confuses things later on, for example leading to:
> > > > 
> > > >PASS -> FAIL : g++.dg/cpp0x/lambda/lambda-type.C  -std=c++14
> > > > (test for excess errors)
> > > >PASS -> FAIL : g++.dg/cpp0x/lambda/lambda-type.C  -std=c++17
> > > > (test for excess errors)
> > > 
> > > Confuses how?
> > 
> > Without stripping, we get extra errors for decltype() on
> > identifiers in
> > the capture-list, e.g.:
> > 
> >[i] {
> >  same_type();
> >  same_type();
> >

[PATCH, ARM] Fix PR77904 testcase failure

2018-12-14 Thread Thomas Preudhomme
Hi,

Commit r242693 forced fp to be saved/restored when needed due to an
instance of GCC using fp as a scratch register to save sp while it's
being clobbered by an inline asm. The normal path in
thumb1_compute_save_reg_mask saving callee-saved registers which are
live in the function does not work in that case because fp is chosen to
hold sp after that function is called.

Since clobbering sp is now errored out by the compiler and this was the
only case reported where fp was live but not marked as such when
thumb1_compute_save_reg_mask is called, I believe the whole commit
r242693 should be reverted.
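
As a rough illustration of the two paths involved, the sketch below models the save mask as "live and callee-saved" registers, plus the optional forced frame pointer bit that r242693 added and this patch removes. The register numbering (r7 as the Thumb frame pointer, r4-r11 callee-saved) and all toy_ names are assumptions made for the example, not code from arm.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_CORE_REGS 16
/* Assumption for this sketch: r7 serves as the Thumb frame pointer.  */
#define TOY_FRAME_POINTER_REGNUM 7

/* Normal path: save every register that is both live in the function
   and callee-saved.  FORCE_FP models the special case being reverted,
   which set the frame pointer bit regardless of liveness.  */
uint32_t
toy_compute_save_reg_mask (uint32_t ever_live, uint32_t callee_saved,
                           bool force_fp)
{
  uint32_t mask = 0;
  for (unsigned int reg = 0; reg < TOY_NUM_CORE_REGS; reg++)
    if (((ever_live >> reg) & 1) != 0 && ((callee_saved >> reg) & 1) != 0)
      mask |= UINT32_C (1) << reg;

  if (force_fp)
    mask |= UINT32_C (1) << TOY_FRAME_POINTER_REGNUM;

  return mask;
}
```

When the frame pointer is already recorded as live, both paths give the same mask, which is why the special case only mattered for the sp-clobbering asm that is now rejected.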

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-12-14  Thomas Preud'homme  

Revert:
2016-11-22  Thomas Preud'homme  

PR target/77904
* config/arm/arm.c (thumb1_compute_save_reg_mask): Mark frame pointer
in save register mask if it is needed.

*** gcc/testsuite/ChangeLog ***

2018-12-14  Thomas Preud'homme  

Revert:
2016-11-22  Thomas Preud'homme  

PR target/77904
* gcc.target/arm/pr77904.c: New test.

Testing: Built an arm-none-eabi GCC cross-compiler targeting Armv6S-M
and regression testsuite does not show any regression.

Ok for stage3?

Best regards,

Thomas
From 63c52e7bf932947be7122cdc63f6cdc913479259 Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme 
Date: Fri, 14 Dec 2018 16:02:59 +
Subject: [PATCH] [PATCH, ARM] Fix PR77904 testcase failure

---
 gcc/ChangeLog  |  9 ++
 gcc/config/arm/arm.c   |  4 ---
 gcc/testsuite/ChangeLog|  8 +
 gcc/testsuite/gcc.target/arm/pr77904.c | 45 --
 4 files changed, 17 insertions(+), 49 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/arm/pr77904.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d8e374fb15f..9caeb1d5e18 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2018-12-14  Thomas Preud'homme  
+
+	Revert:
+	2016-11-22  Thomas Preud'homme  
+
+	PR target/77904
+	* config/arm/arm.c (thumb1_compute_save_reg_mask): Mark frame pointer
+	in save register mask if it is needed.
+
 2018-11-27  Alan Modra  
 
 	* config/rs6000/aix71.h (ASM_SPEC): Don't select default -maix64
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 40f0574e32e..2ab5d8abc33 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19553,10 +19553,6 @@ thumb1_compute_save_core_reg_mask (void)
 if (df_regs_ever_live_p (reg) && callee_saved_reg_p (reg))
   mask |= 1 << reg;
 
-  /* Handle the frame pointer as a special case.  */
-  if (frame_pointer_needed)
-mask |= 1 << HARD_FRAME_POINTER_REGNUM;
-
   if (flag_pic
   && !TARGET_SINGLE_PIC_BASE
   && arm_pic_register != INVALID_REGNUM
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 9e1f6d05a45..4e58c8940da 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,11 @@
+2018-12-14  Thomas Preud'homme  
+
+	Revert:
+	2016-11-22  Thomas Preud'homme  
+
+	PR target/77904
+	* gcc.target/arm/pr77904.c: New test.
+
 2018-11-27  Jozef Lawrynowicz  
 
 	* lib/target-supports.exp
diff --git a/gcc/testsuite/gcc.target/arm/pr77904.c b/gcc/testsuite/gcc.target/arm/pr77904.c
deleted file mode 100644
index 76728c07e73..000
--- a/gcc/testsuite/gcc.target/arm/pr77904.c
+++ /dev/null
@@ -1,45 +0,0 @@
-/* { dg-do run } */
-/* { dg-options "-O2" } */
-
-__attribute__ ((noinline, noclone)) void
-clobber_sp (void)
-{
-  __asm volatile ("" : : : "sp");
-}
-
-int
-main (void)
-{
-  int ret;
-
-  __asm volatile ("mov\tr4, #0xf4\n\t"
-		  "mov\tr5, #0xf5\n\t"
-		  "mov\tr6, #0xf6\n\t"
-		  "mov\tr7, #0xf7\n\t"
-		  "mov\tr0, #0xf8\n\t"
-		  "mov\tr8, r0\n\t"
-		  "mov\tr0, #0xfa\n\t"
-		  "mov\tr10, r0"
-		  

Re: [PATCH] x86: Don't use get_frame_size to finalize stack frame

2018-12-14 Thread Jeff Law
On 12/14/18 4:01 PM, H.J. Lu wrote:
> On Thu, Dec 13, 2018 at 11:11 PM Uros Bizjak wrote:
>> On Thu, Dec 13, 2018 at 6:36 PM H.J. Lu wrote:
>>> get_frame_size () returns used stack slots during compilation, which
>>> may be optimized out later.  Since ix86_find_max_used_stack_alignment
>>> is called by ix86_finalize_stack_frame_flags to check if stack frame
>>> is required, there is no need to call get_frame_size () which may give
>>> inaccurate final stack frame size.
>>>
>>> Tested on AVX512 machine configured with
>>>
>>> --with-arch=native --with-cpu=native
>>>
>>> OK for trunk?
>>>
>>>
>>> H.J.
>>> ---
>>> gcc/
>>>
>>> PR target/88483
>>> * config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
>>> use get_frame_size ().
>>>
>>> gcc/testsuite/
>>>
>>> PR target/88483
>>> * gcc.target/i386/stackalign/pr88483.c: New test.
>> LGTM, but you know this part of the compiler better than I.
>>
>> Thanks,
>> Uros.
>>
>>> ---
>>>  gcc/config/i386/i386.c  |  1 -
>>>  .../gcc.target/i386/stackalign/pr88483.c| 17 +
>>>  2 files changed, 17 insertions(+), 1 deletion(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/i386/stackalign/pr88483.c
>>>
>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> index caa701fe242..edc8f4f092e 100644
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -12876,7 +12876,6 @@ ix86_finalize_stack_frame_flags (void)
>>>&& flag_exceptions
>>>&& cfun->can_throw_non_call_exceptions)
>>>&& !ix86_frame_pointer_required ()
>>> -  && get_frame_size () == 0
>>>&& ix86_nsaved_sseregs () == 0
>>>&& ix86_varargs_gpr_size + ix86_varargs_fpr_size == 0)
>>>  {
>>> diff --git a/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c 
>>> b/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c
>>> new file mode 100644
>>> index 000..5aec8fd4cf6
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c
>>> @@ -0,0 +1,17 @@
>>> +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
>>> +/* { dg-options "-O2 -mavx2" } */
>>> +
>>> +struct B
>>> +{
>>> +  char a[12];
>>> +  int b;
>>> +};
>>> +
>>> +struct B
>>> +f2 (void)
>>> +{
>>> +  struct B x = {};
>>> +  return x;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-not "and\[lq\]?\[^\\n\]*-\[0-9\]+,\[^\\n\]*sp" } } */
>>> --
>>> 2.19.2
>>>
> My fix triggered a latent bug in ix86_find_max_used_stack_alignment.
> Here is the fix.  OK for trunk?
> 
> Thanks.
> 
> -- H.J.
> 
> 
> 0001-x86-Properly-check-stack-reference.patch
> 
> From 83f0b37f287ed198a3b50e2be6b0f7f5c154020e Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" 
> Date: Fri, 14 Dec 2018 12:21:02 -0800
> Subject: [PATCH] x86: Properly check stack reference
> 
> A latent bug in ix86_find_max_used_stack_alignment was uncovered by the
> fix for PR target/88483, which caused:
> 
> FAIL: gcc.target/i386/incoming-8.c scan-assembler andl[\\t ]*\\$-16,[\\t ]*%esp
> 
> on i386.  ix86_find_max_used_stack_alignment failed to notice stack
> reference via non-stack/frame registers and missed stack alignment
> requirement.  We should track all registers which may reference stack
> by checking registers set from stack referencing registers.
> 
> Tested on i686 and x86-64 with
> 
> --with-arch=native --with-cpu=native
> 
> on AVX512 machine.  Tested on i686 and x86-64 without
> 
> --with-arch=native --with-cpu=native
> 
> on x86-64 machine.
> 
>   PR target/88483
>   * config/i386/i386.c (ix86_stack_referenced_p): New function.
>   (ix86_find_max_used_stack_alignment): Call ix86_stack_referenced_p
>   to check if stack is referenced.
> ---
>  gcc/config/i386/i386.c | 43 ++
>  1 file changed, 39 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 4599ca2a7d5..bf93ec3722f 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -12777,6 +12777,18 @@ output_probe_stack_range (rtx reg, rtx end)
>return "";
>  }
>  
> +/* Return true if OP references stack frame through one of registers
> +   in STACK_REF_REGS.  */
> +
> +static bool
> +ix86_stack_referenced_p (const_rtx op, rtx *stack_ref_regs)
> +{
> +  for (int i = 0; i < LAST_REX_INT_REG; i++)
> +if (stack_ref_regs[i] && reg_mentioned_p (stack_ref_regs[i], op))
> +  return true;
> +  return false;
> +}
> +
>  /* Return true if stack frame is required.  Update STACK_ALIGNMENT
> to the largest alignment, in bits, of stack slot used if stack
> frame is required and CHECK_STACK_SLOT is true.  */
> @@ -12801,6 +12813,12 @@ ix86_find_max_used_stack_alignment (unsigned int &stack_alignment,
>  
>bool require_stack_frame = false;
>  
> +  /* Array of hard registers which reference stack frame.  */
> +  rtx stack_ref_regs[LAST_REX_INT_REG];
> +  memset (stack_ref_regs, 0, sizeof (stack_ref_regs));
> +  

[committed] Fix valgrind error in cselib_record_sets (PR rtl-optimization/88478)

2018-12-14 Thread Jakub Jelinek
Hi!

On Wed, Dec 05, 2018 at 04:50:19AM -0200, Alexandre Oliva wrote:
>   * cselib.c (cselib_record_sets): Skip strict low part sets
>   with NULL src_elt.
> ---
>  gcc/cselib.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/cselib.c b/gcc/cselib.c
> index 6d3a4078c689..4a68439455fd 100644
> --- a/gcc/cselib.c
> +++ b/gcc/cselib.c
> @@ -2616,6 +2616,7 @@ cselib_record_sets (rtx_insn *insn)
>preserves the upper bits that di:SI=zero_extend(flags:CCNO<=0).  */
>scalar_int_mode mode;
>if (dest != orig
> +   && sets[i].src_elt
> && cselib_record_sets_hook
> && REG_P (dest)
> && HARD_REGISTER_P (dest)

This regresses the following testcase under valgrind on x86_64-linux.

The problem is that sets[i].src_elt is only conditionally initialized before
this:

  /* We don't know how to record anything but REG or MEM.  */
  if (REG_P (dest)
      || (MEM_P (dest) && cselib_record_memory))
    {
      rtx src = sets[i].src;
      if (cond)
        src = gen_rtx_IF_THEN_ELSE (GET_MODE (dest), cond, src, dest);
      sets[i].src_elt = cselib_lookup (src, GET_MODE (dest), 1, VOIDmode);
      ...
    }

otherwise it is uninitialized.  So the sets[i].src_elt test has to move
after the REG_P (dest) test, which sits two lines below it.

Tested on x86_64-linux, committed to trunk as obvious.
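
The ordering issue can be illustrated outside of GCC.  The following is
only a minimal sketch, not the cselib code: set_entry, record and
strict_low_part_candidate are hypothetical names invented for the example.
It shows that a conditionally written field is safe to read when the test
that guarantees it was written comes first in the && chain.

```cpp
#include <cassert>

// Hypothetical stand-in for one entry of the sets[] array.
struct set_entry
{
  bool dest_is_reg;     // plays the role of REG_P (dest)
  const void *src_elt;  // written only when dest_is_reg is true
};

// Fill an entry the way the excerpt above does: src_elt is assigned only
// on the REG path and is left unwritten otherwise.
inline void record (set_entry &e, bool is_reg, const void *elt)
{
  e.dest_is_reg = is_reg;
  if (is_reg)
    e.src_elt = elt;
}

// Safe ordering: && evaluates left to right and stops at the first false
// operand, so src_elt is read only for entries where it was written.
inline bool strict_low_part_candidate (const set_entry &e)
{
  return e.dest_is_reg && e.src_elt != nullptr;
}

// Small self-check covering the three interesting cases.
inline void example_usage ()
{
  int value = 0;
  set_entry reg_set, mem_set, null_set;
  record (reg_set, true, &value);
  record (mem_set, false, &value);
  record (null_set, true, nullptr);
  assert (strict_low_part_candidate (reg_set));
  assert (!strict_low_part_candidate (mem_set));   // src_elt is never read
  assert (!strict_low_part_candidate (null_set));
}
```

Swapping the two operands in strict_low_part_candidate would read src_elt
for mem_set before knowing that it had been written, which is the kind of
read valgrind reports.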

2018-12-15  Jakub Jelinek  

PR rtl-optimization/88478
* cselib.c (cselib_record_sets): Move sets[i].src_elt tests
after REG_P (dest) test.

* g++.dg/opt/pr88478.C: New test.

--- gcc/cselib.c.jj 2018-12-07 00:23:15.722987285 +0100
+++ gcc/cselib.c 2018-12-15 00:10:16.77976 +0100
@@ -2616,10 +2616,10 @@ cselib_record_sets (rtx_insn *insn)
 preserves the upper bits that di:SI=zero_extend(flags:CCNO<=0).  */
   scalar_int_mode mode;
   if (dest != orig
- && sets[i].src_elt
  && cselib_record_sets_hook
  && REG_P (dest)
  && HARD_REGISTER_P (dest)
+ && sets[i].src_elt
  && is_a <scalar_int_mode> (GET_MODE (dest), &mode)
  && n_sets + n_strict_low_parts < MAX_SETS)
{
--- gcc/testsuite/g++.dg/opt/pr88478.C.jj   2018-12-15 00:14:14.427927166 +0100
+++ gcc/testsuite/g++.dg/opt/pr88478.C  2018-12-15 00:12:20.762761443 +0100
@@ -0,0 +1,17 @@
+// PR rtl-optimization/88478
+// { dg-do compile }
+// { dg-options "-O2" }
+
+struct A {
+  bool b;
+  int s;
+  template <typename T, typename U>
+  A (T, U) {}
+};
+enum F {} f;
+
+A
+foo ()
+{
+  return A (false, f);
+}


Jakub


Re: [PATCH] Fix avx512f_sfixupimm* (PR target/88489)

2018-12-14 Thread Jeff Law
On 12/14/18 12:50 PM, Jakub Jelinek wrote:
> Hi!
> 
> The avx512f-vfixupimms{s,d}-2.c testcases were miscompiled with -mavx512vl.
> The problem is that there are separate instructions (e.g. vfixupimmsd vs.
> vfixupimmpd), each of which has different behavior: the first one is
> TARGET_AVX512F, the latter TARGET_AVX512VL, but the 128-bit version of the
> latter used identical RTL pattern to the vfixupimmsd instruction and was
> defined earlier.
> 
> Fixed by using different UNSPEC number for the scalar vs. vector ones.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2018-12-14  Jakub Jelinek  
> 
>   PR target/88489
>   * config/i386/sse.md (UNSPEC_SFIXUPIMM): New unspec enumerator.
>   (avx512f_sfixupimm): Use it
>   instead of UNSPEC_FIXUPIMM.
> 
>   * gcc.target/i386/avx512vl-vfixupimmsd-2.c: New test.
>   * gcc.target/i386/avx512vl-vfixupimmss-2.c: New test.
OK
jeff


Re: [C++ Patch] [PR c++/88146] do not crash synthesizing inherited ctor(...)

2018-12-14 Thread Alexandre Oliva
On Dec 14, 2018, Jason Merrill  wrote:

> Let's move the initialization of "fields" inside the 'then' block here
> with the initialization of "cvquals", rather than clear it in the
> 'else'.

We'd still have to NULL-initialize it somewhere, so I'd rather just move
the entire loop into the conditional, and narrow the scope of variables
only used within the loop, like this.  The full patch below is very hard
to read because of the reindentation, so here's a diff -b.

diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index fd023e200538..17404a65b0fd 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -675,12 +675,9 @@ do_build_copy_constructor (tree fndecl)
 }
   else
 {
-  tree fields = TYPE_FIELDS (current_class_type);
   tree member_init_list = NULL_TREE;
-  int cvquals = cp_type_quals (TREE_TYPE (parm));
   int i;
   tree binfo, base_binfo;
-  tree init;
   vec<tree, va_gc> *vbases;
 
   /* Initialize all the base-classes with the parameter converted
@@ -704,15 +701,18 @@ do_build_copy_constructor (tree fndecl)
inh, member_init_list);
}
 
-  for (; fields; fields = DECL_CHAIN (fields))
+  if (!inh)
+   {
+ int cvquals = cp_type_quals (TREE_TYPE (parm));
+
+ for (tree fields = TYPE_FIELDS (current_class_type);
+  fields; fields = DECL_CHAIN (fields))
{
  tree field = fields;
  tree expr_type;
 
  if (TREE_CODE (field) != FIELD_DECL)
continue;
- if (inh)
-   continue;
 
  expr_type = TREE_TYPE (field);
  if (DECL_NAME (field))
@@ -742,7 +742,7 @@ do_build_copy_constructor (tree fndecl)
  expr_type = cp_build_qualified_type (expr_type, quals);
}
 
- init = build3 (COMPONENT_REF, expr_type, parm, field, NULL_TREE);
+ tree init = build3 (COMPONENT_REF, expr_type, parm, field, NULL_TREE);
  if (move_p && !TYPE_REF_P (expr_type)
  /* 'move' breaks bit-fields, and has no effect for scalars.  */
  && !scalarish_type_p (expr_type))
@@ -751,6 +751,8 @@ do_build_copy_constructor (tree fndecl)
 
  member_init_list = tree_cons (field, init, member_init_list);
}
+   }
+
   finish_mem_initializers (member_init_list);
 }
 }
@@ -891,6 +893,7 @@ synthesize_method (tree fndecl)
 
   /* Reset the source location, we might have been previously
  deferred, and thus have saved where we were first needed.  */
+  if (!DECL_INHERITED_CTOR (fndecl))
 DECL_SOURCE_LOCATION (fndecl)
   = DECL_SOURCE_LOCATION (TYPE_NAME (DECL_CONTEXT (fndecl)));
 

Is this OK too?  (pending regstrapping)




[PR c++/88146] do not crash synthesizing inherited ctor(...)

This patch started out from the testcase in PR88146, that attempted to
synthesize an inherited ctor without any args before a varargs
ellipsis and crashed while at that, because of the unguarded
dereferencing of the parm type list, that usually contains a
terminator.  The terminator is not there for varargs functions,
however, and without any other args, we ended up dereferencing a NULL
pointer.  Oops.

Guarding accesses to parm would be easy, but not necessary.  In
do_build_copy_constructor, non-inherited ctors are copy-ctors, which
always have at least one parm, so parm need not be guarded when we
know the access will only take place when we're not dealing with an
inherited ctor.  The only other problematic use was in the cvquals
initializer, a variable only used in a loop over fields, that we
skipped individually in inherited ctors.  I've guarded the cvquals
initialization and the entire loop over fields so they only run for
copy-ctors.

Avoiding the crash from unguarded accesses was easy, but I thought we
should still produce the sorry message we got in other testcases that
passed arguments through the ellipsis in inherited ctors.  I put a
check in, and noticed the inherited ctors were synthesized with the
location assigned to the class name, although they were initially
assigned the location of the using declaration.  I decided the latter
was better, and arranged for the better location to be retained.

Further investigation revealed the lack of a sorry message had to do
with the call being in a non-evaluated context, in this case, a
noexcept expression.  The sorry would be correctly reported in other
contexts, so I rolled back the check I'd added, but retained the
source location improvement.

I was still concerned about issuing sorry messages while instantiating
template ctors even in non-evaluated contexts, e.g., if a template
ctor had a base initializer that used an inherited ctor with enough
arguments that they'd go through an ellipsis.  I wanted to defer the
instantiation of such template ctors, but that would have been wrong
for constexpr template ctors, and already done for non-constexpr ones.
So, I just 

Re: [PATCH] x86: Don't use get_frame_size to finalize stack frame

2018-12-14 Thread H.J. Lu
On Thu, Dec 13, 2018 at 11:11 PM Uros Bizjak  wrote:
>
> On Thu, Dec 13, 2018 at 6:36 PM H.J. Lu  wrote:
> >
> > get_frame_size () returns used stack slots during compilation, which
> > may be optimized out later.  Since ix86_find_max_used_stack_alignment
> > is called by ix86_finalize_stack_frame_flags to check if stack frame
> > is required, there is no need to call get_frame_size () which may give
> > inaccurate final stack frame size.
> >
> > Tested on AVX512 machine configured with
> >
> > --with-arch=native --with-cpu=native
> >
> > OK for trunk?
> >
> >
> > H.J.
> > ---
> > gcc/
> >
> > PR target/88483
> > * config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
> > use get_frame_size ().
> >
> > gcc/testsuite/
> >
> > PR target/88483
> > * gcc.target/i386/stackalign/pr88483.c: New test.
>
> LGTM, but you know this part of the compiler better than I.
>
> Thanks,
> Uros.
>
> > ---
> >  gcc/config/i386/i386.c  |  1 -
> >  .../gcc.target/i386/stackalign/pr88483.c| 17 +
> >  2 files changed, 17 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/stackalign/pr88483.c
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index caa701fe242..edc8f4f092e 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -12876,7 +12876,6 @@ ix86_finalize_stack_frame_flags (void)
> >&& flag_exceptions
> >&& cfun->can_throw_non_call_exceptions)
> >&& !ix86_frame_pointer_required ()
> > -  && get_frame_size () == 0
> >&& ix86_nsaved_sseregs () == 0
> >&& ix86_varargs_gpr_size + ix86_varargs_fpr_size == 0)
> >  {
> > diff --git a/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c 
> > b/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c
> > new file mode 100644
> > index 000..5aec8fd4cf6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> > +/* { dg-options "-O2 -mavx2" } */
> > +
> > +struct B
> > +{
> > +  char a[12];
> > +  int b;
> > +};
> > +
> > +struct B
> > +f2 (void)
> > +{
> > +  struct B x = {};
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "and\[lq\]?\[^\\n\]*-\[0-9\]+,\[^\\n\]*sp" } } */
> > --
> > 2.19.2
> >

My fix triggered a latent bug in ix86_find_max_used_stack_alignment.
Here is the fix.  OK for trunk?

Thanks.

-- 
H.J.
From 83f0b37f287ed198a3b50e2be6b0f7f5c154020e Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 14 Dec 2018 12:21:02 -0800
Subject: [PATCH] x86: Properly check stack reference

A latent bug in ix86_find_max_used_stack_alignment was uncovered by the
fix for PR target/88483, which caused:

FAIL: gcc.target/i386/incoming-8.c scan-assembler andl[\\t ]*\\$-16,[\\t ]*%esp

on i386.  ix86_find_max_used_stack_alignment failed to notice stack
reference via non-stack/frame registers and missed stack alignment
requirement.  We should track all registers which may reference stack
by checking registers set from stack referencing registers.
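
As a rough illustration of the tracking idea only (this is not the code in
the patch), the sketch below marks every register that is set from a
register already known to hold a stack address.  The register numbers, the
insn struct and stack_ref_regs are all made up for the example, and the
model is deliberately conservative: a mark is never cleared.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int kNumRegs = 8;       // hypothetical register file size
constexpr int kStackPointer = 7;  // hypothetical stack pointer number
constexpr int kFramePointer = 6;  // hypothetical frame pointer number

// Models an insn that sets register DST from an expression mentioning SRC.
struct insn
{
  int dst;
  int src;
};

// Walk INSNS in order and return, per register, whether it may hold an
// address derived from the stack or frame pointer.
inline std::array<bool, kNumRegs>
stack_ref_regs (const std::vector<insn> &insns)
{
  std::array<bool, kNumRegs> refs{};
  refs[kStackPointer] = true;
  refs[kFramePointer] = true;
  for (const insn &i : insns)
    {
      assert (i.dst >= 0 && i.dst < kNumRegs);
      assert (i.src >= 0 && i.src < kNumRegs);
      // Conservative: once marked, a register stays marked.
      if (refs[i.src])
        refs[i.dst] = true;
    }
  return refs;
}
```

With insns {1 <- 7}, {2 <- 1}, {3 <- 4}, registers 1 and 2 end up marked
because they are derived from the stack pointer, while register 3 does not.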

Tested on i686 and x86-64 with

--with-arch=native --with-cpu=native

on AVX512 machine.  Tested on i686 and x86-64 without

--with-arch=native --with-cpu=native

on x86-64 machine.

	PR target/88483
	* config/i386/i386.c (ix86_stack_referenced_p): New function.
	(ix86_find_max_used_stack_alignment): Call ix86_stack_referenced_p
	to check if stack is referenced.
---
 gcc/config/i386/i386.c | 43 ++
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4599ca2a7d5..bf93ec3722f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12777,6 +12777,18 @@ output_probe_stack_range (rtx reg, rtx end)
   return "";
 }
 
+/* Return true if OP references stack frame through one of registers
+   in STACK_REF_REGS.  */
+
+static bool
+ix86_stack_referenced_p (const_rtx op, rtx *stack_ref_regs)
+{
+  for (int i = 0; i < LAST_REX_INT_REG; i++)
+if (stack_ref_regs[i] && reg_mentioned_p (stack_ref_regs[i], op))
+  return true;
+  return false;
+}
+
 /* Return true if stack frame is required.  Update STACK_ALIGNMENT
to the largest alignment, in bits, of stack slot used if stack
frame is required and CHECK_STACK_SLOT is true.  */
@@ -12801,6 +12813,12 @@ ix86_find_max_used_stack_alignment (unsigned int &stack_alignment,
 
   bool require_stack_frame = false;
 
+  /* Array of hard registers which reference stack frame.  */
+  rtx stack_ref_regs[LAST_REX_INT_REG];
+  memset (stack_ref_regs, 0, sizeof (stack_ref_regs));
+  stack_ref_regs[STACK_POINTER_REGNUM] = stack_pointer_rtx;
+  stack_ref_regs[FRAME_POINTER_REGNUM] = frame_pointer_rtx;
+
   FOR_EACH_BB_FN (bb, cfun)
 {
   rtx_insn *insn;
@@ -12811,16 +12829,33 @@ ix86_find_max_used_stack_alignment (unsigned int &stack_alignment,
 	  {
 	 

V6 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-14 Thread H.J. Lu
On Fri, Dec 14, 2018 at 2:10 PM Jason Merrill  wrote:
>
> On 12/13/18 6:56 PM, H.J. Lu wrote:
> > On Thu, Dec 13, 2018 at 12:50 PM Jason Merrill  wrote:
> >>
> >> On 9/25/18 11:46 AM, H.J. Lu wrote:
> >>> On Fri, Aug 31, 2018 at 2:04 PM, Jason Merrill  wrote:
>  On 07/23/2018 05:24 PM, H.J. Lu wrote:
> >
> > On Mon, Jun 18, 2018 at 12:26 PM, Joseph Myers 
> > wrote:
> >>
> >> On Mon, 18 Jun 2018, Jason Merrill wrote:
> >>
> >>> On Mon, Jun 18, 2018 at 11:59 AM, Joseph Myers 
> >>> 
> >>> wrote:
> 
>  On Mon, 18 Jun 2018, Jason Merrill wrote:
> 
> >> +  if (TREE_CODE (rhs) == COND_EXPR)
> >> +{
> >> +  /* Check the THEN path first.  */
> >> +  tree op1 = TREE_OPERAND (rhs, 1);
> >> +  context = check_address_of_packed_member (type, op1);
> >
> >
> > This should handle the GNU extension of re-using operand 0 if 
> > operand
> > 1 is omitted.
> 
> 
>  Doesn't that just use a SAVE_EXPR?
> >>>
> >>>
> >>> Hmm, I suppose it does, but many places in the compiler seem to expect
> >>> that it produces a COND_EXPR with TREE_OPERAND 1 as NULL_TREE.
> >>
> >>
> >> Maybe that's used somewhere inside the C++ front end.  For C a 
> >> SAVE_EXPR
> >> is produced directly.
> >
> >
> > Here is the updated patch.  Changes from the last one:
> >
> > 1. Handle COMPOUND_EXPR.
> > 2. Fixed typos in comments.
> > 3. Combined warn_for_pointer_of_packed_member and
> > warn_for_address_of_packed_member into
> > warn_for_address_or_pointer_of_packed_member.
> 
> 
> > c.i:4:33: warning: converting a packed ‘struct C *’ pointer increases 
> > the
> > alignment of ‘long int *’ pointer from 1 to 8 
> > [-Waddress-of-packed-member]
> 
> 
>  I think this would read better as
> 
>  c.i:4:33: warning: converting a packed ‘struct C *’ pointer (alignment 
>  1) to
>  ‘long int *’ (alignment 8) may result in an unaligned pointer value
>  [-Waddress-of-packed-member]
> >>>
> >>> Fixed.
> >>>
> > +  while (TREE_CODE (base) == ARRAY_REF)
> > +   base = TREE_OPERAND (base, 0);
> > +  if (TREE_CODE (base) != COMPONENT_REF)
> > +   return NULL_TREE;
> 
> 
>  Are you deliberately not handling the other handled_component_p cases? If
>  so, there should be a comment.
> >>>
> >>> I changed it to
> >>>
> >>>while (handled_component_p (base))
> >>>   {
> >>> enum tree_code code = TREE_CODE (base);
> >>> if (code == COMPONENT_REF)
> >>>   break;
> >>> switch (code)
> >>>   {
> >>>   case ARRAY_REF:
> >>> base = TREE_OPERAND (base, 0);
> >>> break;
> >>>   default:
> >>> /* FIXME: Can it ever happen?  */
> >>> gcc_unreachable ();
> >>> break;
> >>>   }
> >>>   }
> >>>
> >>> Is there a testcase to trigger this ICE? I couldn't find one.
> >>
> >> You can take the address of an element of complex:
> >>
> >> __complex int i;
> >> int *p = &__real(i);
> >>
> >> You may get VIEW_CONVERT_EXPR with location wrappers.
> >
> > Fixed.  I replaced gcc_unreachable with return NULL_TREE;
>
> Then we're back to my earlier question: are you deliberately not
> handling the other cases?  Why not look through them as well?  What if
> e.g. the operand of __real is a packed field?
>

Here is the updated patch with

diff --git a/gcc/c-family/c-warn.c b/gcc/c-family/c-warn.c
index 615134cfdac..f105742598e 100644
--- a/gcc/c-family/c-warn.c
+++ b/gcc/c-family/c-warn.c
@@ -2669,6 +2669,9 @@ check_address_of_packed_member (tree type, tree rhs)
switch (code)
  {
  case ARRAY_REF:
+ case REALPART_EXPR:
+ case IMAGPART_EXPR:
+ case VIEW_CONVERT_EXPR:
base = TREE_OPERAND (base, 0);
break;
  default:
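
For readers who do not have c-warn.c at hand, the shape of the loop that
the hunk above extends can be modelled roughly as follows.  This is a
sketch under assumptions, not the GCC implementation: node, code and
find_component_ref are hypothetical stand-ins for tree, tree_code and the
walk inside check_address_of_packed_member.

```cpp
#include <cassert>

// Hypothetical miniature of a tree node; only what the loop needs.
enum class code { array_ref, realpart, imagpart, view_convert,
                  component_ref, other };

struct node
{
  code kind;
  const node *operand0;  // what TREE_OPERAND (base, 0) would return
  bool packed;           // meaningful for component_ref only
};

// Peel wrappers off BASE until a component_ref is reached.  Returns the
// component_ref, or nullptr when some other kind of node is found first.
inline const node *
find_component_ref (const node *base)
{
  while (base)
    switch (base->kind)
      {
      case code::component_ref:
        return base;
      case code::array_ref:
      case code::realpart:
      case code::imagpart:
      case code::view_convert:
        base = base->operand0;
        break;
      default:
        return nullptr;
      }
  return nullptr;
}

// Models &__real (s.field[0]) where field is a packed member.
inline void example_usage ()
{
  node var = { code::other, nullptr, false };
  node field = { code::component_ref, &var, true };
  node elem = { code::array_ref, &field, false };
  node real = { code::realpart, &elem, false };
  const node *found = find_component_ref (&real);
  assert (found == &field && found->packed);
  assert (find_component_ref (&var) == nullptr);
}
```

Without the realpart, imagpart and view_convert cases the walk would stop
at the outermost wrapper and never reach the packed field underneath it.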

Now I got

[hjl@gnu-cfl-1 pr51628-6]$ cat foo.i
struct A { __complex int i; };
struct B { struct A a; };
struct C { struct B b __attribute__ ((packed)); };

extern struct C *p;

int*
foo1 (void)
{
  return &__real(p->b.a.i);
}
int*
foo2 (void)
{
  return &__imag(p->b.a.i);
}
[hjl@gnu-cfl-1 pr51628-6]$ make foo.s
/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/ -O2
-S foo.i
foo.i: In function ‘foo1’:
foo.i:10:10: warning: taking address of packed member of ‘struct C’
may result in an unaligned pointer value [-Waddress-of-packed-member]
   10 |   return &__real(p->b.a.i);
  |  ^
foo.i: In function ‘foo2’:
foo.i:15:10: warning: taking address of packed member of ‘struct C’
may result in an unaligned pointer value [-Waddress-of-packed-member]
   15 |   return &__imag(p->b.a.i);
  

Re: [C++ Patch] [PR c++/88146] do not crash synthesizing inherited ctor(...)

2018-12-14 Thread Jason Merrill

On 12/14/18 5:33 PM, Alexandre Oliva wrote:

On Dec 14, 2018, Jason Merrill  wrote:


If inh is false, we're a copy constructor, which always has a parm,
so this hunk seems unnecessary.


ack


-  int cvquals = cp_type_quals (TREE_TYPE (parm));
+  int cvquals = parm ? cp_type_quals (TREE_TYPE (parm)) : 0;


This could also check !inh.


*nod*


And in the existing code, while I'm looking at it:



The "if (inh) continue" is odd, there's no reason to iterate through
the fields ignoring all of them when we could skip the loop entirely.


Heh, funny, an earlier version of the patch that added an if (inh) to
print an error on zero-args had an 'else fields = NULL;'.  That
improvement went away along with my course change.  But look!, it's back
in the version below ;-)

Testing...  Ok to install if it passes?


[PR c++/88146] do not crash synthesizing inherited ctor(...)

This patch started out from the testcase in PR88146, that attempted to
synthesize an inherited ctor without any args before a varargs
ellipsis and crashed while at that, because of the unguarded
dereferencing of the parm type list, that usually contains a
terminator.  The terminator is not there for varargs functions,
however, and without any other args, we ended up dereferencing a NULL
pointer.  Oops.

Guarding accesses to parm would be easy, but not necessary.  In
do_build_copy_constructor, non-inherited ctors are copy-ctors, which
always have at least one parm, so parm need not be guarded when we
know the access will only take place when we're not dealing with an
inherited ctor.  The only other problematic use was in the cvquals
initializer, a variable only used in a loop over fields, that we
skipped individually in inherited ctors.  I've arranged to skip the
entire loop over fields for inherited ctors, and to only initialize
cvquals otherwise.

Avoiding the crash from unguarded accesses was easy, but I thought we
should still produce the sorry message we got in other testcases that
passed arguments through the ellipsis in inherited ctors.  I put a
check in, and noticed the inherited ctors were synthesized with the
location assigned to the class name, although they were initially
assigned the location of the using declaration.  I decided the latter
was better, and arranged for the better location to be retained.

Further investigation revealed the lack of a sorry message had to do
with the call being in a non-evaluated context, in this case, a
noexcept expression.  The sorry would be correctly reported in other
contexts, so I rolled back the check I'd added, but retained the
source location improvement.

I was still concerned about issuing sorry messages while instantiating
template ctors even in non-evaluated contexts, e.g., if a template
ctor had a base initializer that used an inherited ctor with enough
arguments that they'd go through an ellipsis.  I wanted to defer the
instantiation of such template ctors, but that would have been wrong
for constexpr template ctors, and already done for non-constexpr ones.
So, I just consolidated multiple test variants into a single testcase
that explores and explains various of the possibilities I thought of.


for  gcc/cp/ChangeLog

PR c++/88146
* method.c (do_build_copy_constructor): Skip iteration over
fields for inherited ctors, and initialize cvquals otherwise.
(synthesize_method): Retain location of inherited ctor.

for  gcc/testsuite/ChangeLog

PR c++/88146
* g++.dg/cpp0x/inh-ctor32.C: New.
---
  gcc/cp/method.c |   14 +-
  gcc/testsuite/g++.dg/cpp0x/inh-ctor32.C |  229 +++
  2 files changed, 238 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/inh-ctor32.C

diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index fd023e200538..4cbdadbe3d26 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -677,7 +677,7 @@ do_build_copy_constructor (tree fndecl)
  {
tree fields = TYPE_FIELDS (current_class_type);
tree member_init_list = NULL_TREE;
-  int cvquals = cp_type_quals (TREE_TYPE (parm));
+  int cvquals;
int i;
tree binfo, base_binfo;
tree init;
@@ -704,6 +704,11 @@ do_build_copy_constructor (tree fndecl)
inh, member_init_list);
}
  
+  if (!inh)

+   cvquals = cp_type_quals (TREE_TYPE (parm));
+  else
+   fields = NULL;


Let's move the initialization of "fields" inside the 'then' block here 
with the initialization of "cvquals", rather than clear it in the 
'else'.  OK with that change.


Jason


Re: [PATCH] Improve gimplification of constructors with RANGE_EXPRs (PR c++/82294, PR c++/87436)

2018-12-14 Thread Jakub Jelinek
On Fri, Dec 14, 2018 at 12:22:28PM +0100, Richard Biener wrote:
> > Still need to wait for the FE patch if I want to commit the testcases, those
> > depend on both patches.
> > I've added size32plus effective target to the larger test, as 384MB is too
> > much for 16 or 20 bit address targets.
> > And, I'll gather statistics on how often this makes a difference during
> > gimplification during my next bootstraps/regtests.

Besides those 2 new testcases it only made a difference on:
g++.dg/torture/pr60746.C
which has
{[0 ... 4]={[0 ... 1]={._vptr.One=&_ZTV3One + 16}}}
ctor (80 bytes), but only at -O0, so comparing the assembly in that case
is kind of pointless.  If I modify the testcase to pass the address of
list_arry to an external function and compile with -O2, the resulting assembly is
identical before/after this commit, as we inline the copy from the constant
ctor and unroll the loops.
If I modify it to:
class One
{
public:
  virtual unsigned long getSize () const;
};

class Two
{
  virtual int run ();
};

void bar (One (*)[2]);

int
Two::run ()
{
  One list_arry[1000][2];
  bar (list_arry);
  return 0;
}
the difference is similar to the other testcases: either a 16000 byte long
.rodata initializer with a memcpy from it, or a small loop.
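
To make the two code shapes concrete, here is a hand-written sketch of
both strategies.  It is not compiler output, and every name in it (elt,
table, vtable_stub, init_by_loop, init_by_copy) is invented for the
example.  The image is built at run time here, whereas the compiler would
emit it as constant data.

```cpp
#include <cassert>
#include <cstring>

constexpr int kRows = 1000;
constexpr int kCols = 2;

struct elt { const void *vptr; };        // hypothetical one-pointer object
struct table { elt e[kRows][kCols]; };

static const int vtable_stub = 0;        // stands in for the vtable address

// Strategy 1: a small loop storing the same value into every element.
inline void init_by_loop (table &t)
{
  for (int i = 0; i < kRows; ++i)
    for (int j = 0; j < kCols; ++j)
      t.e[i][j].vptr = &vtable_stub;
}

// Prebuilt image of a fully initialized table, standing in for the
// large constant initializer.
inline const table &image ()
{
  static const table img = [] { table t; init_by_loop (t); return t; } ();
  return img;
}

// Strategy 2: one block copy from the prebuilt image.
inline void init_by_copy (table &t)
{
  std::memcpy (&t, &image (), sizeof (table));
}

// Both strategies must leave the table in the same state.
inline void check_equivalence ()
{
  table a, b;
  init_by_loop (a);
  init_by_copy (b);
  assert (std::memcmp (&a, &b, sizeof (table)) == 0);
}
```

The trade-off is size against simplicity: the copy needs the whole image
stored somewhere, the loop needs only the one repeated value.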

Jakub


Re: RFC: libiberty PATCH to disable demangling of ancient mangling schemes

2018-12-14 Thread Jason Merrill
On Fri, Dec 7, 2018 at 4:00 PM Jason Merrill  wrote:
> On 12/7/18 12:48 PM, Tom Tromey wrote:
> >> "Pedro" == Pedro Alves  writes:
> >
> > Pedro> I would say that it's very, very unlikely, and not worth the
> > Pedro> maintenance burden.
> >
> > Agreed, and especially true for the more unusual demanglings like Lucid
> > or EDG.
> >
> > On the gdb side perhaps we can get rid of "demangle-style" now.  It
> > probably hasn't worked properly in years, and after this it would be
> > guaranteed not to.
>
> So, here's the patch to tear out the old code, which passes the GCC
> regression testsuite.  I also tried building binutils/gdb with it, and
> both will need to remove code that calls cplus_mangle_opname for dealing
> with the old mangling scheme.

GDB/binutils folks, how do you want to handle this? Shall I go ahead
with this patch, with the understanding that there will be associated
changes necessary when merging it into the binutils-gdb repository, or
go with the small disabling patch to start with?

Jason


Re: [C++ Patch] [PR c++/88146] do not crash synthesizing inherited ctor(...)

2018-12-14 Thread Alexandre Oliva
On Dec 14, 2018, Jason Merrill  wrote:

>> If inh is false, we're a copy constructor, which always has a parm,
>> so this hunk seems unnecessary.

ack

>>> -  int cvquals = cp_type_quals (TREE_TYPE (parm));
>>> +  int cvquals = parm ? cp_type_quals (TREE_TYPE (parm)) : 0;
>> 
>> This could also check !inh.

*nod*

> And in the existing code, while I'm looking at it:

> The "if (inh) continue" is odd, there's no reason to iterate through
> the fields ignoring all of them when we could skip the loop entirely.

Heh, funny, an earlier version of the patch that added an if (inh) to
print an error on zero-args had an 'else fields = NULL;'.  That
improvement went away along with my course change.  But look!, it's back
in the version below ;-)

Testing...  Ok to install if it passes?


[PR c++/88146] do not crash synthesizing inherited ctor(...)

This patch started out from the testcase in PR88146, that attempted to
synthesize an inherited ctor without any args before a varargs
ellipsis and crashed while at that, because of the unguarded
dereferencing of the parm type list, that usually contains a
terminator.  The terminator is not there for varargs functions,
however, and without any other args, we ended up dereferencing a NULL
pointer.  Oops.

Guarding accesses to parm would be easy, but not necessary.  In
do_build_copy_constructor, non-inherited ctors are copy-ctors, which
always have at least one parm, so parm need not be guarded when we
know the access will only take place when we're not dealing with an
inherited ctor.  The only other problematic use was in the cvquals
initializer, a variable only used in a loop over fields, that we
skipped individually in inherited ctors.  I've arranged to skip the
entire loop over fields for inherited ctors, and to only initialize
cvquals otherwise.
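
The restructuring described in this paragraph can be sketched outside of
the front end like this.  It is an illustration under assumptions, not
method.c: field, parm and member_init_quals are hypothetical stand-ins.
The point is that the parameter is only read on the path where it is
known to exist, and the field loop is skipped as a whole for inherited
ctors.

```cpp
#include <cassert>
#include <vector>

struct field { bool is_data_member; };  // hypothetical FIELD_DECL stand-in
struct parm { int quals; };             // hypothetical parameter stand-in

// Returns the qualifiers used for each member initializer that would be
// built.  P may be null only when INHERITED is true.
inline std::vector<int>
member_init_quals (const std::vector<field> &fields, const parm *p,
                   bool inherited)
{
  std::vector<int> inits;
  if (!inherited)
    {
      // Copy ctors always have a parameter, so this read needs no
      // null check of its own.
      assert (p != nullptr);
      int cvquals = p->quals;
      for (const field &f : fields)
        {
          if (!f.is_data_member)
            continue;
          inits.push_back (cvquals);
        }
    }
  return inits;
}
```

Calling it with inherited set to true and a null parameter returns an
empty list without touching the parameter at all.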

Avoiding the crash from unguarded accesses was easy, but I thought we
should still produce the sorry message we got in other testcases that
passed arguments through the ellipsis in inherited ctors.  I put a
check in, and noticed the inherited ctors were synthesized with the
location assigned to the class name, although they were initially
assigned the location of the using declaration.  I decided the latter
was better, and arranged for the better location to be retained.

Further investigation revealed the lack of a sorry message had to do
with the call being in a non-evaluated context, in this case, a
noexcept expression.  The sorry would be correctly reported in other
contexts, so I rolled back the check I'd added, but retained the
source location improvement.

I was still concerned about issuing sorry messages while instantiating
template ctors even in non-evaluated contexts, e.g., if a template
ctor had a base initializer that used an inherited ctor with enough
arguments that they'd go through an ellipsis.  I wanted to defer the
instantiation of such template ctors, but that would have been wrong
for constexpr template ctors, and already done for non-constexpr ones.
So, I just consolidated multiple test variants into a single testcase
that explores and explains various of the possibilities I thought of.


for  gcc/cp/ChangeLog

PR c++/88146
* method.c (do_build_copy_constructor): Skip iteration over
fields for inherited ctors, and initialize cvquals otherwise.
(synthesize_method): Retain location of inherited ctor.

for  gcc/testsuite/ChangeLog

PR c++/88146
* g++.dg/cpp0x/inh-ctor32.C: New.
---
 gcc/cp/method.c |   14 +-
 gcc/testsuite/g++.dg/cpp0x/inh-ctor32.C |  229 +++
 2 files changed, 238 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/inh-ctor32.C

diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index fd023e200538..4cbdadbe3d26 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -677,7 +677,7 @@ do_build_copy_constructor (tree fndecl)
 {
   tree fields = TYPE_FIELDS (current_class_type);
   tree member_init_list = NULL_TREE;
-  int cvquals = cp_type_quals (TREE_TYPE (parm));
+  int cvquals;
   int i;
   tree binfo, base_binfo;
   tree init;
@@ -704,6 +704,11 @@ do_build_copy_constructor (tree fndecl)
inh, member_init_list);
}
 
+  if (!inh)
+   cvquals = cp_type_quals (TREE_TYPE (parm));
+  else
+   fields = NULL;
+
   for (; fields; fields = DECL_CHAIN (fields))
{
  tree field = fields;
@@ -711,8 +716,6 @@ do_build_copy_constructor (tree fndecl)
 
  if (TREE_CODE (field) != FIELD_DECL)
continue;
- if (inh)
-   continue;
 
  expr_type = TREE_TYPE (field);
  if (DECL_NAME (field))
@@ -891,8 +894,9 @@ synthesize_method (tree fndecl)
 
   /* Reset the source location, we might have been previously
  deferred, and thus have saved where we were first needed.  */
-  DECL_SOURCE_LOCATION (fndecl)
-= 

Re: [doc,committed] clarify docs for function attribute "const"

2018-12-14 Thread Martin Sebor

On 12/13/18 7:07 PM, Sandra Loosemore wrote:

On 12/13/18 3:52 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00426.html

(I have now committed the @code{const} cleanup mentioned below.)


This patch is OK except for one nit.


 @cindex pointer arguments
 Note that a function that has pointer arguments and examines the data
-pointed to must @emph{not} be declared @code{const}.  Likewise, a
-function that calls a non-@code{const} function usually must not be
-@code{const}.  Because a @code{const} function cannot have any side
-effects it does not make sense for such a function to return @code{void}.

-Declaring such a function is diagnosed.
+pointed to must @emph{not} be declared @code{const} if the pointed-to
+data might change between successive invocations of the function.  In
+general, since a function cannot distinguish data that might change
+from data that cannot, const functions should never be
+pass-by-reference (take pointer or, in C++, reference arguments).


I'd really rather avoid "pass-by-reference" since it adds nothing with 
the clarification about what it means there, too.  How about just:


...const functions should never take pointer or, in C++, reference 
arguments.


OK to commit with that fixed.


Committed in r267156.

Martin
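
For illustration only (this is not part of the committed doc patch), here is
a minimal sketch of the distinction the revised text draws.  The function
names are made up for the example.  A function whose result depends only on
its argument values is a candidate for const; one that reads through a
pointer is at most pure, because the pointed-to data may change between
successive calls.

```cpp
#include <cassert>

// Candidate for `const`: the result depends only on the argument value.
__attribute__((const)) int square(int x)
{
  return x * x;
}

// Reads memory through a pointer, so it must not be declared `const`.
// `pure` still allows the compiler to merge calls when memory is unchanged.
__attribute__((pure)) int first_elem(const int *p)
{
  return p[0];
}

// Hypothetical driver: the pointed-to data changes between the two calls,
// so the second call has to observe the new value.
int demo()
{
  int buf[1] = { 1 };
  int a = first_elem(buf);
  buf[0] = 2;
  int b = first_elem(buf);
  return a * 10 + b;
}
```

Had first_elem been declared const, the compiler would be entitled to reuse
the first result for the second call.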


Re: V5 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-14 Thread Jason Merrill

On 12/13/18 6:56 PM, H.J. Lu wrote:

On Thu, Dec 13, 2018 at 12:50 PM Jason Merrill  wrote:


On 9/25/18 11:46 AM, H.J. Lu wrote:

On Fri, Aug 31, 2018 at 2:04 PM, Jason Merrill  wrote:

On 07/23/2018 05:24 PM, H.J. Lu wrote:


On Mon, Jun 18, 2018 at 12:26 PM, Joseph Myers wrote:


On Mon, 18 Jun 2018, Jason Merrill wrote:


On Mon, Jun 18, 2018 at 11:59 AM, Joseph Myers wrote:


On Mon, 18 Jun 2018, Jason Merrill wrote:


+  if (TREE_CODE (rhs) == COND_EXPR)
+{
+  /* Check the THEN path first.  */
+  tree op1 = TREE_OPERAND (rhs, 1);
+  context = check_address_of_packed_member (type, op1);



This should handle the GNU extension of re-using operand 0 if operand
1 is omitted.



Doesn't that just use a SAVE_EXPR?



Hmm, I suppose it does, but many places in the compiler seem to expect
that it produces a COND_EXPR with TREE_OPERAND 1 as NULL_TREE.



Maybe that's used somewhere inside the C++ front end.  For C a SAVE_EXPR
is produced directly.
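
For readers unfamiliar with the extension under discussion, a small sketch of
a conditional with the middle operand omitted.  The helper names are
hypothetical.  The point is only that operand 0 is evaluated once and reused
as the result when it is nonzero; how each front end represents that
internally is what the exchange above is about.

```cpp
#include <cassert>

static int calls = 0;

// Hypothetical helper with a visible side effect, to count evaluations.
static int next_value()
{
  ++calls;
  return 42;
}

// GNU extension: `a ?: b` yields a if a is nonzero, else b,
// and evaluates a only once.
int pick()
{
  return next_value() ?: -1;
}
```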



Here is the updated patch.  Changes from the last one:

1. Handle COMPOUND_EXPR.
2. Fixed typos in comments.
3. Combined warn_for_pointer_of_packed_member and
warn_for_address_of_packed_member into
warn_for_address_or_pointer_of_packed_member.




c.i:4:33: warning: converting a packed ‘struct C *’ pointer increases the
alignment of ‘long int *’ pointer from 1 to 8 [-Waddress-of-packed-member]



I think this would read better as

c.i:4:33: warning: converting a packed ‘struct C *’ pointer (alignment 1) to
‘long int *’ (alignment 8) may result in an unaligned pointer value
[-Waddress-of-packed-member]


Fixed.


+  while (TREE_CODE (base) == ARRAY_REF)
+   base = TREE_OPERAND (base, 0);
+  if (TREE_CODE (base) != COMPONENT_REF)
+   return NULL_TREE;



Are you deliberately not handling the other handled_component_p cases? If
so, there should be a comment.


I changed it to

   while (handled_component_p (base))
  {
enum tree_code code = TREE_CODE (base);
if (code == COMPONENT_REF)
  break;
switch (code)
  {
  case ARRAY_REF:
base = TREE_OPERAND (base, 0);
break;
  default:
/* FIXME: Can it ever happen?  */
gcc_unreachable ();
break;
  }
  }

Is there a testcase to trigger this ICE? I couldn't find one.


You can take the address of an element of complex:

__complex int i;
int *p = &__real(i);

You may get VIEW_CONVERT_EXPR with location wrappers.


Fixed.  I replaced gcc_unreachable with return NULL_TREE;


Then we're back to my earlier question: are you deliberately not 
handling the other cases?  Why not look through them as well?  What if 
e.g. the operand of __real is a packed field?


Jason
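
To make the last question concrete, a minimal sketch (struct and function
names are made up, and it relies on the GNU complex-integer extension) of
taking the address of the real part of a packed field.  It only computes the
offset and never dereferences the pointer.  Whether the proposed warning
fires here is exactly the open question above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical packed struct: z starts at offset 1, so it is misaligned
// for int on targets where int requires more than byte alignment.
struct __attribute__((packed)) S
{
  char tag;
  __complex__ int z;
};

// Returns the byte offset of the real part of S::z within S.
std::size_t real_part_offset()
{
  S s = { };
  int *p = &__real__ s.z;   // address of a component of a packed member
  return static_cast<std::size_t>(reinterpret_cast<char *>(p)
                                  - reinterpret_cast<char *>(&s));
}
```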


Re: [C++ PATCH] Sanity check __cxa_* user declarations (PR c++/88482)

2018-12-14 Thread Jason Merrill

On 12/13/18 6:29 PM, Jakub Jelinek wrote:

Hi!

If the user provides his own __cxa_* prototypes and does so incorrectly
(or even worse declares them as variables etc.), we can get various ICEs.

The following patch adds some sanity checking, mainly that they are actually
functions and with a compatible return type and argument type(s).
For __cxa_throw it gives some freedom for the second and third arguments,
but apparently not enough for libitm where it causes
+FAIL: libitm.c++/throwdown.C (test for excess errors)

This is because cxxabi.h has:
   void
   __cxa_throw(void*, std::type_info*, void (_GLIBCXX_CDTOR_CALLABI *) (void *))
   __attribute__((__noreturn__));
and the patch checks that the second argument is a pointer (any kind) and
third parameter is a function pointer, but libitm.h has:
extern void _ITM_cxa_throw (void *obj, void *tinfo, void *dest);
and eh_cpp.cc has:
extern void __cxa_throw (void *, void *, void *) WEAK;
void
_ITM_cxa_throw (void *obj, void *tinfo, void *dest)
{
   // This used to be instrumented, but does not need to be anymore.
   __cxa_throw (obj, tinfo, dest);
}
Shall we fix libitm to use void (*dest) (void *) instead of void *dest,
or shall I make the verify_library_fn handle both i == 1 and i == 2
the same?


Fix libitm.  Do function pointers and object pointers have the same 
representation on all the targets we support?



Bootstrapped/regtested on x86_64-linux and i686-linux (with the above
mentioned FAIL), ok for trunk (and with what solution for libitm)?

2018-12-13  Jakub Jelinek  

PR c++/88482
* except.c (verify_library_fn): New function.
(declare_library_fn): Use it.
(do_end_catch): Don't set TREE_NOTHROW on error_mark_node.
(expand_start_catch_block): Don't call initialize_handler_parm
for error_mark_node.
(build_throw): Use verify_library_fn.  Don't crash if
any library fn is error_mark_node.

* g++.dg/eh/builtin5.C: New test.
* g++.dg/eh/builtin6.C: New test.
* g++.dg/eh/builtin7.C: New test.
* g++.dg/eh/builtin8.C: New test.
* g++.dg/eh/builtin9.C: New test.
* g++.dg/eh/builtin10.C: New test.
* g++.dg/eh/builtin11.C: New test.
* g++.dg/parse/crash55.C: Adjust expected diagnostics.

--- gcc/cp/except.c.jj  2018-12-07 00:23:15.008998854 +0100
+++ gcc/cp/except.c 2018-12-13 20:05:23.053122023 +0100
@@ -132,6 +132,49 @@ build_exc_ptr (void)
   1, integer_zero_node);
  }
  
+/* Check that user declared function FN is a function and has return
+   type RTYPE and argument types ARG{1,2,3}TYPE.  */
+
+static bool
+verify_library_fn (tree fn, const char *name, tree rtype,
+  tree arg1type, tree arg2type, tree arg3type)
+{


Do we want to skip all of this if DECL_ARTIFICIAL?


+  if (TREE_CODE (fn) != FUNCTION_DECL
+  || TREE_CODE (TREE_TYPE (fn)) != FUNCTION_TYPE)
+{
+  bad:
+  error_at (DECL_SOURCE_LOCATION (fn), "%qs declared incorrectly", name);
+  return false;
+}
+  tree fntype = TREE_TYPE (fn);
+  if (!same_type_p (TREE_TYPE (fntype), rtype))
+goto bad;
+  tree targs = TYPE_ARG_TYPES (fntype);
+  tree args[3] = { arg1type, arg2type, arg3type };
+  for (int i = 0; i < 3 && args[i]; i++)
+{
+  if (targs == NULL_TREE)
+   goto bad;
+  if (!same_type_p (TREE_VALUE (targs), args[i]))
+   {
+ if (i == 0)
+   goto bad;
+ /* Be less strict for __cxa_throw last 2 arguments.  */
+ if (i == 1 && TREE_CODE (TREE_VALUE (targs)) != POINTER_TYPE)
+   goto bad;
+ if (i == 2
+ && (TREE_CODE (TREE_VALUE (targs)) != POINTER_TYPE
+ || (TREE_CODE (TREE_TYPE (TREE_VALUE (targs)))
+ != FUNCTION_TYPE)))


These seem to assume that any library function with more than one 
parameter will have pointers for the second and third parameters.


TYPE_PTROBV_P might be useful.


+   goto bad;
+   }
+  targs = TREE_CHAIN (targs);
+}
+  if (targs != void_list_node)
+goto bad;
+  return true;
+}
+
  /* Find or declare a function NAME, returning RTYPE, taking a single
 parameter PTYPE, with an empty exception specification. ECF are the
 library fn flags.  If TM_ECF is non-zero, also find or create a
@@ -161,9 +204,16 @@ declare_library_fn (const char *name, tr
  tree tm_fn = get_global_binding (tm_ident);
  if (!tm_fn)
tm_fn = push_library_fn (tm_ident, type, except, ecf | tm_ecf);
- record_tm_replacement (res, tm_fn);
+ else if (!verify_library_fn (tm_fn, tm_name, rtype, ptype,
+  NULL_TREE, NULL_TREE))
+   tm_fn = error_mark_node;
+ if (tm_fn != error_mark_node)
+   record_tm_replacement (res, tm_fn);
}
  }
+  else if (!verify_library_fn (res, name, rtype, ptype, NULL_TREE, NULL_TREE))
+return error_mark_node;
+
return res;
  }
  
@@ -236,7 

Re: PING^2: [PATCH] i386; Add -mmanual-endbr and cf_check function attribute

2018-12-14 Thread H.J. Lu
On Fri, Dec 14, 2018 at 1:28 PM Jeff Law  wrote:
>
> On 12/11/18 9:03 AM, H.J. Lu wrote:
> > On Mon, Dec 3, 2018 at 5:45 AM H.J. Lu  wrote:
> >> On Mon, Jun 18, 2018 at 2:20 AM Richard Biener
> >>  wrote:
> >>> On Fri, Jun 15, 2018 at 2:59 PM H.J. Lu  wrote:
>  Currently GCC inserts ENDBR instruction at entries of all non-static
>  functions, unless LTO compilation is used.  Marking all functions,
>  which are not called indirectly with nocf_check attribute, is not
>  ideal since 99% of functions in a program may be of this kind.
> 
>  This patch adds -mmanual-endbr and cf_check function attribute.  They
>  can be used together with -fcf-protection such that ENDBR instruction
>  is inserted only at entries of functions with cf_check attribute.  It
>  can limit number of ENDBR instructions to reduce program size.
> 
>  OK for trunk?
> >>> I wonder if the linker could assist with ENDBR creation by
> >>> redirecting all non-direct call relocs to a linker-generated
> >>> stub with ENDBR and a direct branch?
> >>>
> >> The goal of this patch is to add as few ENDBR as possible
> >> to reduce program size as much as possible.   Also there is no
> >> relocation for indirect branch via register.
> >>
> > Hi Honza, Jakub, Jeff, Richard,
> >
> > Here is the rebased patch.  Can you guys take a look?
> >
> > Thanks.
> >
> >
> > -- H.J.
> >
> >
> > 0001-i386-Add-mmanual-endbr-and-cf_check-function-attribu.patch
> >
> > From 5934c6be6495b2d6f278646e25f9e684f6610e2b Mon Sep 17 00:00:00 2001
> > From: "H.J. Lu" 
> > Date: Thu, 14 Jun 2018 09:19:27 -0700
> > Subject: [PATCH] i386; Add -mmanual-endbr and cf_check function attribute
> >
> > Currently GCC inserts ENDBR instruction at entries of all non-static
> > functions, unless LTO compilation is used.  Marking all functions,
> > which are not called indirectly with nocf_check attribute, is not
> > ideal since 99% of functions in a program may be of this kind.
> >
> > This patch adds -mmanual-endbr and cf_check function attribute.  They
> > can be used together with -fcf-protection such that ENDBR instruction
> > is inserted only at entries of functions with cf_check attribute.  It
> > can limit number of ENDBR instructions to reduce program size.
> >
> > gcc/
> >
> >   * config/i386/i386.c (rest_of_insert_endbranch): Insert ENDBR
> >   at the function entry only when -mmanual-endbr isn't used or
> >   there is cf_check function attribute.
> >   (ix86_attribute_table): Add cf_check.
> >   * config/i386/i386.opt: Add -mmanual-endbr.
> >   * doc/extend.texi: Document cf_check attribute.
> >   * doc/invoke.texi: Document -mmanual-endbr.
> >
> > gcc/testsuite/
> >
> >   * gcc.target/i386/cf_check-1.c: New test.
> >   * gcc.target/i386/cf_check-2.c: Likewise.
> >   * gcc.target/i386/cf_check-3.c: Likewise.
> >   * gcc.target/i386/cf_check-4.c: Likewise.
> >   * gcc.target/i386/cf_check-5.c: Likewise.
> OK.
>
> Though I'm not sure how valuable this is in practice.  Yea, it saves
> some space at the start of functions, but I find myself wondering more
> and more if we should be pushing folks towards LTO for a variety of reasons.
>

LTO can solve many issues in addition to unnecessary ENDBR.
I have been fixing LTO bugs in bfd linker uncovered by people
who are using LTO :-).

-- 
H.J.
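
A usage sketch, assuming the semantics described in the patch summary above:
when built with -fcf-protection and -mmanual-endbr, only functions carrying
the cf_check attribute get ENDBR at entry.  The function names are
hypothetical, and the code behaves the same with or without those options.

```cpp
#include <cassert>

// Only ever called directly, so it needs no ENDBR under -mmanual-endbr.
int add_one(int x)
{
  return x + 1;
}

// May be reached through a function pointer, so it is marked cf_check
// to request ENDBR at its entry.
__attribute__((cf_check)) int twice(int x)
{
  return 2 * x;
}

// Indirect call site: the target must be a valid indirect-branch target.
int call_indirect(int (*fn)(int), int x)
{
  return fn(x);
}
```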


Re: [C++ Patch] PR 84644 ("internal compiler error: in warn_misplaced_attr_for_class_type, at cp/decl.c:4718")

2018-12-14 Thread Jason Merrill

On 12/14/18 4:33 PM, Paolo Carlini wrote:

Hi,

On 14/12/18 21:19, Jason Merrill wrote:

On 12/14/18 1:44 PM, Paolo Carlini wrote:

Hi,

On 13/12/18 22:03, Jason Merrill wrote:

On 10/30/18 9:22 PM, Paolo Carlini wrote:

Hi,

On 30/10/18 21:37, Jason Merrill wrote:

On 10/26/18 2:02 PM, Paolo Carlini wrote:

On 26/10/18 17:18, Jason Merrill wrote:
On Fri, Oct 26, 2018 at 4:52 AM Paolo Carlini wrote:

On 24/10/18 22:41, Jason Merrill wrote:

On 10/15/18 12:45 PM, Paolo Carlini wrote:

 && ((TREE_CODE (declspecs->type) != TYPENAME_TYPE
+   && TREE_CODE (declspecs->type) != DECLTYPE_TYPE
  && MAYBE_CLASS_TYPE_P (declspecs->type))

I would think that the MAYBE_CLASS_TYPE_P here should be CLASS_TYPE_P,
and then we can remove the TYPENAME_TYPE check. Or do we want to
allow template type parameters for some reason?

Indeed, it would be nice to just use OVERLOAD_TYPE_P. However it seems
we at least want to let through TEMPLATE_TYPE_PARMs representing 'auto'
- otherwise Dodji's check a few lines below which fixed c++/51473
doesn't work anymore - and also BOUND_TEMPLATE_TEMPLATE_PARM, otherwise
we regress on template/spec32.C and template/ttp22.C because we don't
diagnose the shadowing anymore. Thus, I would say either we keep on
using MAYBE_CLASS_TYPE_P or we pick what we need, possibly we add a
comment?

Aha.  I guess the answer is not to restrict that test any more, but
instead to fix the code further down so it gives a proper diagnostic
rather than call warn_misplaced_attr_for_class_type.


I see. Thus something like the below? It passes testing on 
x86_64-linux.


+  if ((!declared_type || TREE_CODE (declared_type) == DECLTYPE_TYPE)
+  && ! saw_friend && !error_p)
 permerror (input_location, "declaration does not declare anything");


I see no reason to make this specific to decltype.  Maybe move 
this diagnostic into the final 'else' block with the other 
declspec diagnostics and not look at declared_type at all?


I'm not sure I fully understand: if we do that we still want to at 
least minimally check that declared_type is null, like we already 
do, and then we simply accept the new testcase. Is that Ok? 
Because, as I probably mentioned at some point, all the other 
compilers I have at hand issue a "does not declare anything" 
diagnostic, and we likewise do that for the legacy __typeof. Not 
looking into declared_type *at all* doesn't work with plain class 
types and enums, of course. Or you meant something entirely 
different??



+  if (declspecs->attributes && warn_attributes && declared_type
+  && TREE_CODE (declared_type) != DECLTYPE_TYPE)


I think we do want to give a diagnostic about useless attributes, 
not skip it.


Agreed. FWIW the attached tests fine.


The problem here is that the code toward the bottom expects 
"declared_type" to be the tagged type declared by a declaration with 
no declarator, and in this testcase it's ending up as a DECLTYPE_TYPE.


I think once we've checked for 'auto' we don't want declared_type to 
be anything that isn't OVERLOAD_TYPE_P.  We can arrange that either 
by checking for 'auto' first and then changing the code that sets 
declared_type to use OVERLOAD_TYPE_P, or by clearing declared_type 
after checking for 'auto' if it isn't OVERLOAD_TYPE_P.


Thanks. I'm slowly catching up on this issue... Any suggestion about 
BOUND_TEMPLATE_TEMPLATE_PARM? If we don't let through such tree nodes 
- which are MAYBE_CLASS_TYPE_P and aren't OVERLOAD_TYPE_P - we 
regress on template/spec32.C, we don't reject it anymore.

If we clear declared_type for a BOUND_TEMPLATE_TEMPLATE_PARM, we 
should get the "does not declare anything" error.


Ah, now I see, I didn't realize that we would also change the errors we 
issue for those testcases. Thus the below is finishing testing, appears 
to work fine.


OK.

Jason



Re: [C++ Patch] [PR c++/88146] do not crash synthesizing inherited ctor(...)

2018-12-14 Thread Jason Merrill

On 12/14/18 3:42 PM, Jason Merrill wrote:

On 12/6/18 7:23 PM, Alexandre Oliva wrote:

This patch started out from the testcase in PR88146, that attempted to
synthesize an inherited ctor without any args before a varargs
ellipsis and crashed while at that, because of the unguarded
dereferencing of the parm type list, that usually contains a
terminator.  The terminator is not there for varargs functions,
however, and without any other args, we ended up dereferencing a NULL
pointer.  Oops.

Guarding the accesses there was easy, but I missed the sorry message
we got in other testcases that passed arguments through the ellipsis
in inherited ctors.  I put a check in, and noticed the inherited ctors
were synthesized with the location assigned to the class name,
although they were initially assigned the location of the using
declaration.  I decided the latter was better, and arranged for the
better location to be retained.

Further investigation revealed the lack of a sorry message had to do
with the call being in a non-evaluated context, in this case, a
noexcept expression.  The sorry would be correctly reported in other
contexts, so I rolled back the check I'd added, but retained the
source location improvement.

I was still concerned about issuing sorry messages while instantiating
template ctors even in non-evaluated contexts, e.g., if a template
ctor had a base initializer that used an inherited ctor with enough
arguments that they'd go through an ellipsis.  I wanted to defer the
instantiation of such template ctors, but that would have been wrong
for constexpr template ctors, and already done for non-constexpr ones.
So, I just consolidated multiple test variants into a single testcase
that explores and explains various of the possibilities I thought of.

Regstrapped on x86_64- and i686-linux-gnu, mistakenly along with a patch
with a known regression, and got only that known regression.  Retesting
without it.  Ok to install?


for  gcc/cp/ChangeLog

PR c++/88146
* method.c (do_build_copy_constructor): Do not crash with
ellipsis-only parm list.
(synthesize_method): Retain location of inherited ctor.

for  gcc/testsuite/ChangeLog

PR c++/88146
* g++.dg/cpp0x/inh-ctor32.C: New.
---
  gcc/cp/method.c |    9 +
  gcc/testsuite/g++.dg/cpp0x/inh-ctor32.C |  229 +++

  2 files changed, 234 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/inh-ctor32.C

diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index fd023e200538..41d609fb1de6 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -643,7 +643,7 @@ do_build_copy_constructor (tree fndecl)
    bool trivial = trivial_fn_p (fndecl);
    tree inh = DECL_INHERITED_CTOR (fndecl);
-  if (!inh)
+  if (parm && !inh)
  parm = convert_from_reference (parm);


If inh is false, we're a copy constructor, which always has a parm, so 
this hunk seems unnecessary.



    if (trivial)
@@ -677,7 +677,7 @@ do_build_copy_constructor (tree fndecl)
  {
    tree fields = TYPE_FIELDS (current_class_type);
    tree member_init_list = NULL_TREE;
-  int cvquals = cp_type_quals (TREE_TYPE (parm));
+  int cvquals = parm ? cp_type_quals (TREE_TYPE (parm)) : 0;


This could also check !inh.


And in the existing code, while I'm looking at it:


  for (; fields; fields = DECL_CHAIN (fields))
{
  tree field = fields;
  tree expr_type;

  if (TREE_CODE (field) != FIELD_DECL)
continue;
  if (inh)
continue;


The "if (inh) continue" is odd, there's no reason to iterate through the 
fields ignoring all of them when we could skip the loop entirely.


Jason
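
The last point can be sketched outside the compiler.  This is a hypothetical
stand-in, not method.c: a vector of flags plays the role of the field chain,
with true marking a FIELD_DECL-like entry.  Both shapes produce the same
result; the second simply tests the loop-invariant condition once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the TYPE_FIELDS chain.
typedef std::vector<bool> field_list;

// Shape of the existing loop: visit every field, then skip it when inh.
int count_initialized_old(const field_list &fields, bool inh)
{
  int n = 0;
  for (std::size_t i = 0; i < fields.size(); ++i)
    {
      if (!fields[i])
        continue;
      if (inh)
        continue;
      ++n;
    }
  return n;
}

// Suggested shape: decide once, before the loop, and skip it entirely.
int count_initialized_new(const field_list &fields, bool inh)
{
  if (inh)
    return 0;
  int n = 0;
  for (std::size_t i = 0; i < fields.size(); ++i)
    if (fields[i])
      ++n;
  return n;
}
```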


Re: [PATCH 1/4] introduce struct strlen_data_t into gimple-fold

2018-12-14 Thread Martin Sebor

On 12/14/18 2:23 PM, Jeff Law wrote:

On 12/10/18 2:00 PM, Martin Sebor wrote:

Jeff, is there something you are expecting me to change in
response to this or have you just not gotten around to reviewing
the rest?

Your last comment led me to believe you had another iteration on the
first patch to do before we moved forward.


I said that since the majority of the changes in the patch
are just minor adjustments and the meat of the change is in
gimple-fold.c I was hoping having a single patch won't be
a problem.  I was not planning on breaking things up due
to the conflicts with trunk.

Martin


Martin

On 11/28/18 9:26 PM, Jeff Law wrote:

On 11/25/18 5:05 PM, Martin Sebor wrote:



If so, then I think we need
to look for a better name than MAXSIZE and MAXLEN.


I find these names quite fitting and I'm not sure what might work
better.  I renamed MAXSIZE to MAXBOUND but nothing comes to mind
as a replacement for MAXLEN.  Please suggest something you think
is better.




+  /* When non-null, NONSTR refers to the declaration known to store
+ an unterminated constant character array, as in:
+ const char s[] = { 'a', 'b', 'c' };
+ It is used to diagnose uses of such arrays in functions such as
+ strlen() that expect a nul-terminated string as an argument.  */
+  tree nonstr;

So rather than NONSTR, DECL may make more sense -- if for no other
reason than you don't have to think in terms of "not a string".


Done, but I think DECL is a poor choice for the reasons below.

The field is only set when the thing the object refers to is
a character array that is not a string.  It identifies the first
array the expression refers to that's not a terminated string
(there could be multiple).  I can't think of anything else one
might want to think of it as than "a declaration of an array
that is not a string."

As a name, DECL is generic and used all over the place for any
sort of a declaration so it's not a good choice also for that
reason.  It's only marginally more descriptive than the pervasive
NODE or T, but just as useless to grep for (which I have been
relying on when working with this patch).

I have been using the name NONSTR in all contexts where
I introduced the unterminated array handling, so renaming
the member to DECL makes this scheme inconsistent.

NONSTR requires you to think in the negative and it sounds more like a
boolean property to me, but it's actually carrying more information than
just a boolean.

I'm certainly not wed to DECL.  If you've got a better name, please
suggest one.





+  /* ELTSIZE is set to 1 for single byte character strings, and 2 or 4
+ for wide character strings.  */
+  unsigned eltsize;

Bernd's suggestion that we separate the input vs output parameters may be
a reasonable one -- I think this is the only in-parameter you're passing
with the structure, right?  And everything else is a pure output?  If so
we may be better off continuing to pass the element size as a separate
parameter.


I changed it in the updated patch.  I had chosen to make it
a member to reduce the number of arguments to these functions and
in anticipation of having them update it before returning if they
discover that the actual element size doesn't match the expected
size, as in:

    printf ("%ls", "narrow string");

Similarly to what I proposed here:
    https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01321.html

I don't see what has been gained by making it an argument again.

It's the separation of inputs from outputs he was trying to achieve that
I said *may* be a reasonable one.

It's not always a good thing, nor is it always a bad thing.  If
subsequent patches are going to have callees changing it, then it
absolutely makes sense to include it.  But adding it to the structure
for the mere sake of reducing arguments may not.

And just to be clear, I'm not inherently against pulling stuff into
structures and classes.   Quite the opposite.




+  /* FLEXARRAY is true if the range of the string lengths has been
+ obtained from the upper bound of an array at the end of a struct.
+ Such an array may hold a string that's longer than its upper bound
+ due to it being used as a poor-man's flexible array member.  */
+  bool flexarray;
+
+  /* Set ELTSIZE and value-initialize all other members.  */
+  strlen_data_t (unsigned eltbytes)
+    : minlen (), maxlen (), maxsize (), nonstr (), eltsize (eltbytes),
+  flexarray () { }

I think if you pull ELTSIZE out and pass it as a distinct parameter,
then you don't need a ctor and you can have a POD.  You can then
initialize with memset rather than having to individually initialize
each field -- meaning it's easier to add more output fields in the
future.


Without ELTSIZE neither a ctor nor memset is necessary for
initialization.  This works too and is the preferred style
in C++ 98:

    c_strlen_data data = { };

I thought that didn't work somewhere.  I certainly would have preferred
that over memset when I was twiddling yours 
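
As a sketch of the two initialization styles being compared, assuming a
constructor-free output structure.  The type below is a hypothetical
stand-in, with plain pointers in place of tree members, not the struct from
the patch.  The memset variant additionally assumes that all-bits-zero is a
null pointer, which holds on the usual hosts.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the output-only structure; no ctor, so it
// remains an aggregate.
struct strlen_out
{
  const void *minlen;
  const void *maxlen;
  const void *maxbound;
  const void *decl;
  bool flexarray;
};

// True if every member is null/false.
bool all_clear(const strlen_out &d)
{
  return !d.minlen && !d.maxlen && !d.maxbound && !d.decl && !d.flexarray;
}

// Empty-brace initialization clears every member, including ones
// added later.
strlen_out init_with_braces()
{
  strlen_out d = { };
  return d;
}

// Equivalent effect with memset on the whole object.
strlen_out init_with_memset()
{
  strlen_out d;
  std::memset(&d, 0, sizeof d);
  return d;
}
```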

Re: [C++ Patch] PR 84644 ("internal compiler error: in warn_misplaced_attr_for_class_type, at cp/decl.c:4718")

2018-12-14 Thread Paolo Carlini

Hi,

On 14/12/18 21:19, Jason Merrill wrote:

On 12/14/18 1:44 PM, Paolo Carlini wrote:

Hi,

On 13/12/18 22:03, Jason Merrill wrote:

On 10/30/18 9:22 PM, Paolo Carlini wrote:

Hi,

On 30/10/18 21:37, Jason Merrill wrote:

On 10/26/18 2:02 PM, Paolo Carlini wrote:

On 26/10/18 17:18, Jason Merrill wrote:
On Fri, Oct 26, 2018 at 4:52 AM Paolo Carlini wrote:

On 24/10/18 22:41, Jason Merrill wrote:

On 10/15/18 12:45 PM, Paolo Carlini wrote:

 && ((TREE_CODE (declspecs->type) != TYPENAME_TYPE
+   && TREE_CODE (declspecs->type) != DECLTYPE_TYPE
  && MAYBE_CLASS_TYPE_P (declspecs->type))

I would think that the MAYBE_CLASS_TYPE_P here should be CLASS_TYPE_P,
and then we can remove the TYPENAME_TYPE check. Or do we want to
allow template type parameters for some reason?

Indeed, it would be nice to just use OVERLOAD_TYPE_P. However it seems
we at least want to let through TEMPLATE_TYPE_PARMs representing 'auto'
- otherwise Dodji's check a few lines below which fixed c++/51473
doesn't work anymore - and also BOUND_TEMPLATE_TEMPLATE_PARM, otherwise
we regress on template/spec32.C and template/ttp22.C because we don't
diagnose the shadowing anymore. Thus, I would say either we keep on
using MAYBE_CLASS_TYPE_P or we pick what we need, possibly we add a
comment?

Aha.  I guess the answer is not to restrict that test any more, but
instead to fix the code further down so it gives a proper diagnostic
rather than call warn_misplaced_attr_for_class_type.


I see. Thus something like the below? It passes testing on 
x86_64-linux.


+  if ((!declared_type || TREE_CODE (declared_type) == DECLTYPE_TYPE)
+  && ! saw_friend && !error_p)
 permerror (input_location, "declaration does not declare anything");


I see no reason to make this specific to decltype.  Maybe move 
this diagnostic into the final 'else' block with the other 
declspec diagnostics and not look at declared_type at all?


I'm not sure I fully understand: if we do that we still want to at 
least minimally check that declared_type is null, like we already 
do, and then we simply accept the new testcase. Is that Ok? 
Because, as I probably mentioned at some point, all the other 
compilers I have at hand issue a "does not declare anything" 
diagnostic, and we likewise do that for the legacy __typeof. Not 
looking into declared_type *at all* doesn't work with plain class 
types and enums, of course. Or you meant something entirely 
different??



+  if (declspecs->attributes && warn_attributes && declared_type
+  && TREE_CODE (declared_type) != DECLTYPE_TYPE)


I think we do want to give a diagnostic about useless attributes, 
not skip it.


Agreed. FWIW the attached tests fine.


The problem here is that the code toward the bottom expects 
"declared_type" to be the tagged type declared by a declaration with 
no declarator, and in this testcase it's ending up as a DECLTYPE_TYPE.


I think once we've checked for 'auto' we don't want declared_type to 
be anything that isn't OVERLOAD_TYPE_P.  We can arrange that either 
by checking for 'auto' first and then changing the code that sets 
declared_type to use OVERLOAD_TYPE_P, or by clearing declared_type 
after checking for 'auto' if it isn't OVERLOAD_TYPE_P.


Thanks. I'm slowly catching up on this issue... Any suggestion about 
BOUND_TEMPLATE_TEMPLATE_PARM? If we don't let through such tree nodes 
- which are MAYBE_CLASS_TYPE_P and aren't OVERLOAD_TYPE_P - we 
regress on template/spec32.C, we don't reject it anymore.

If we clear declared_type for a BOUND_TEMPLATE_TEMPLATE_PARM, we 
should get the "does not declare anything" error.


Ah, now I see, I didn't realize that we would also change the errors we 
issue for those testcases. Thus the below is finishing testing, appears 
to work fine.


Thanks, Paolo.

///

Index: cp/decl.c
===
--- cp/decl.c   (revision 267131)
+++ cp/decl.c   (working copy)
@@ -4803,9 +4803,8 @@ check_tag_decl (cp_decl_specifier_seq *declspecs,
 declared_type = declspecs->type;
   else if (declspecs->type == error_mark_node)
 error_p = true;
-  if (declared_type == NULL_TREE && ! saw_friend && !error_p)
-permerror (input_location, "declaration does not declare anything");
-  else if (declared_type != NULL_TREE && type_uses_auto (declared_type))
+
+  if (type_uses_auto (declared_type))
 {
   error_at (declspecs->locations[ds_type_spec],
"%<auto%> can only be specified for variables "
@@ -4812,6 +4811,12 @@ check_tag_decl (cp_decl_specifier_seq *declspecs,
"or function declarations");
   return error_mark_node;
 }
+
+  if (declared_type && !OVERLOAD_TYPE_P (declared_type))
+declared_type = NULL_TREE;
+
+  if (!declared_type && !saw_friend && !error_p)
+permerror (input_location, "declaration does not declare anything");
   /* Check for an anonymous union.  */
   else if (declared_type && RECORD_OR_UNION_CODE_P 

Re: PING^2: [PATCH] i386; Add -mmanual-endbr and cf_check function attribute

2018-12-14 Thread Jeff Law
On 12/11/18 9:03 AM, H.J. Lu wrote:
> On Mon, Dec 3, 2018 at 5:45 AM H.J. Lu  wrote:
>> On Mon, Jun 18, 2018 at 2:20 AM Richard Biener
>>  wrote:
>>> On Fri, Jun 15, 2018 at 2:59 PM H.J. Lu  wrote:
 Currently GCC inserts ENDBR instruction at entries of all non-static
 functions, unless LTO compilation is used.  Marking all functions,
 which are not called indirectly with nocf_check attribute, is not
 ideal since 99% of functions in a program may be of this kind.

 This patch adds -mmanual-endbr and cf_check function attribute.  They
 can be used together with -fcf-protection such that ENDBR instruction
 is inserted only at entries of functions with cf_check attribute.  It
 can limit number of ENDBR instructions to reduce program size.

 OK for trubk?
>>> I wonder if the linker could assist with ENDBR creation by
>>> redirecting all non-direct call relocs to a linker-generated
>>> stub with ENDBR and a direct branch?
>>>
>> The goal of this patch is to add as few ENDBR instructions as possible
>> to reduce program size as much as possible.   Also there is no
>> relocation for indirect branch via register.
>>
> Hi Honza, Jakub, Jeff, Richard,
> 
> Here is the rebased patch.  Can you guys take a look?
> 
> Thanks.
> 
> 
> -- H.J.
> 
> 
> 0001-i386-Add-mmanual-endbr-and-cf_check-function-attribu.patch
> 
> From 5934c6be6495b2d6f278646e25f9e684f6610e2b Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" 
> Date: Thu, 14 Jun 2018 09:19:27 -0700
> Subject: [PATCH] i386; Add -mmanual-endbr and cf_check function attribute
> 
> Currently GCC inserts ENDBR instruction at entries of all non-static
> functions, unless LTO compilation is used.  Marking all functions,
> which are not called indirectly with nocf_check attribute, is not
> ideal since 99% of functions in a program may be of this kind.
> 
> This patch adds -mmanual-endbr and cf_check function attribute.  They
> can be used together with -fcf-protection such that ENDBR instruction
> is inserted only at entries of functions with cf_check attribute.  It
> can limit number of ENDBR instructions to reduce program size.
> 
> gcc/
> 
>   * config/i386/i386.c (rest_of_insert_endbranch): Insert ENDBR
>   at the function entry only when -mmanual-endbr isn't used or
>   there is cf_check function attribute.
>   (ix86_attribute_table): Add cf_check.
>   * config/i386/i386.opt: Add -mmanual-endbr.
>   * doc/extend.texi: Document cf_check attribute.
>   * doc/invoke.texi: Document -mmanual-endbr.
> 
> gcc/testsuite/
> 
>   * gcc.target/i386/cf_check-1.c: New test.
>   * gcc.target/i386/cf_check-2.c: Likewise.
>   * gcc.target/i386/cf_check-3.c: Likewise.
>   * gcc.target/i386/cf_check-4.c: Likewise.
>   * gcc.target/i386/cf_check-5.c: Likewise.
OK.

Though I'm not sure how valuable this is in practice.  Yea, it saves
some space at the start of functions, but I find myself wondering more
and more if we should be pushing folks towards LTO for a variety of reasons.

jeff
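
For readers who have not used the attribute, here is a minimal sketch of
how the pieces described in the patch might fit together.  The macro
CF_CHECK and the functions add_one, twice, apply and demo are made up for
illustration; only the cf_check attribute name and the -fcf-protection and
-mmanual-endbr options come from the patch.  The sketch assumes a compiler
that may or may not know the attribute, so it falls back to a no-op.

```c
#undef NDEBUG
#include <assert.h>

/* Illustration only: CF_CHECK is a hypothetical helper macro.  It expands
   to the cf_check attribute where the compiler reports support for it and
   to nothing elsewhere, so the file builds on any target.  */
#if defined(__has_attribute)
# if __has_attribute(cf_check)
#  define CF_CHECK __attribute__ ((cf_check))
# endif
#endif
#ifndef CF_CHECK
# define CF_CHECK
#endif

/* Reached through a function pointer, so under -mmanual-endbr this is the
   kind of function that would be marked to get an ENDBR at its entry.  */
CF_CHECK static int
add_one (int x)
{
  return x + 1;
}

/* Only ever called directly; it is left unmarked.  */
static int
twice (int x)
{
  return 2 * x;
}

typedef int (*unary_fn) (int);

/* Indirect call site.  */
int
apply (unary_fn fn, int x)
{
  return fn (x);
}

int
demo (void)
{
  int result = apply (add_one, twice (20));
  assert (result == 41);
  return result;
}
```

Built with something like "gcc -fcf-protection -mmanual-endbr", the intent
per the patch description is that only add_one gets an ENDBR; without
those options the code behaves identically, just with the default ENDBR
placement.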



Re: [OpenACC] Update OpenACC data clause semantics to the 2.5 behavior

2018-12-14 Thread Thomas Schwinge
Hi!

On Fri, 25 May 2018 13:01:58 -0700, Cesar Philippidis  
wrote:
> This patch updates GCC to support OpenACC 2.5's data clause semantics. 
> In OpenACC 2.5, copy, copyin and copyout all behave like their 
> present_or_* counterparts in OpenACC 2.0. The patch also adds support 
> for the new finalize and if_present data clauses introduced in OpenACC 
> 2.5. The finalize clause introduced some new reference counting behavior 
> in the runtime; whereas 'acc exit data copyout' decrements the reference 
> count of a variable, 'acc exit data finalize' actually removes it from 
> the accelerator regardless of whether there are any pending references to it.
> 
> Due to the size of this patch, I had to compress it.

Well, or the pieces that already were nicely separated could have just been
handled separately, instead of combining them into one big patch?

> However, despite 
> the size of the patch, which is mainly due to the augmented test cases, 
> it is fairly noninvasive. I was originally going to include support for 
> declare allocate and deallocate, but those require more extensive 
> modifications to the fortran FE.

(Why the idea of including even more independent changes?)


Anyway.  Committed to trunk in r267153 the following to add in the
missing changes from Chung-Lin's "Adjust copy/copyin/copyout/create for
OpenACC 2.5" patch:

commit 8ccac5746ba73bde3a1db490359bbe2d5f62d4a0
Author: tschwinge 
Date:   Fri Dec 14 20:43:12 2018 +

Missing changes from "Adjust copy/copyin/copyout/create for OpenACC 2.5"

Most of that patch's changes were already committed as part of r261813
"Update OpenACC data clause semantics to the 2.5 behavior", but not all of them.

libgomp/
* oacc-mem.c (acc_present_or_create): Remove definition and change
to alias of acc_create.
(acc_present_or_copyin): Remove definition and change to alias of
acc_copyin.
* oacc-parallel.c (GOACC_enter_exit_data): Call acc_create instead
of acc_present_or_create.
* testsuite/libgomp.oacc-c-c++-common/data-already-1.c: Remove.
* testsuite/libgomp.oacc-c-c++-common/data-already-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-6.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-7.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-8.c: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-1.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-3.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-4.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-5.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-6.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-7.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-8.f: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267153 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog  | 26 ++
 libgomp/oacc-mem.c | 55 --
 libgomp/oacc-parallel.c|  4 --
 .../libgomp.oacc-c-c++-common/data-already-1.c | 20 
 .../libgomp.oacc-c-c++-common/data-already-2.c | 20 
 .../libgomp.oacc-c-c++-common/data-already-3.c | 20 
 .../libgomp.oacc-c-c++-common/data-already-4.c | 18 ---
 .../libgomp.oacc-c-c++-common/data-already-5.c | 18 ---
 .../libgomp.oacc-c-c++-common/data-already-6.c | 18 ---
 .../libgomp.oacc-c-c++-common/data-already-7.c | 18 ---
 .../libgomp.oacc-c-c++-common/data-already-8.c | 20 
 .../libgomp.oacc-fortran/data-already-1.f  | 16 ---
 .../libgomp.oacc-fortran/data-already-2.f  | 16 ---
 .../libgomp.oacc-fortran/data-already-3.f  | 15 --
 .../libgomp.oacc-fortran/data-already-4.f  | 14 --
 .../libgomp.oacc-fortran/data-already-5.f  | 14 --
 .../libgomp.oacc-fortran/data-already-6.f  | 14 --
 .../libgomp.oacc-fortran/data-already-7.f  | 14 --
 .../libgomp.oacc-fortran/data-already-8.f  | 16 ---
 19 files changed, 55 insertions(+), 301 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 349497d58ee6..084f174513b0 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,29 @@
+2018-12-14  Thomas Schwinge  
+   Chung-Lin Tang  
+
+   * oacc-mem.c (acc_present_or_create): Remove definition and change
+   to alias of acc_create.
+   

Re: [PATCH 1/4] introduce struct strlen_data_t into gimple-fold

2018-12-14 Thread Jeff Law
On 12/10/18 2:00 PM, Martin Sebor wrote:
> Jeff, is there something you are expecting me to change in
> response to this or have you just not gotten around to reviewing
> the rest?
Your last comment led me to believe you had another iteration on the
first patch to do before we moved forward.


> 
> Martin
> 
> On 11/28/18 9:26 PM, Jeff Law wrote:
>> On 11/25/18 5:05 PM, Martin Sebor wrote:
>>>
>>>> If so, then I think we need
>>>> to look for a better name than MAXSIZE and MAXLEN.
>>>
>>> I find these names quite fitting and I'm not sure what might work
>>> better.  I renamed MAXSIZE to MAXBOUND but nothing comes to mind
>>> as a replacement for MAXLEN.  Please suggest something you think
>>> is better.
>>>

>>>>> +  /* When non-null, NONSTR refers to the declaration known to store
>>>>> + an unterminated constant character array, as in:
>>>>> + const char s[] = { 'a', 'b', 'c' };
>>>>> + It is used to diagnose uses of such arrays in functions such as
>>>>> + strlen() that expect a nul-terminated string as an argument.  */
>>>>> +  tree nonstr;
>>>> So rather than NONSTR, DECL may make more sense -- if for no other
>>>> reason than you don't have to think in terms of "not a string".
>>>
>>> Done, but I think DECL is a poor choice for the reasons below.
>>>
>>> The field is only set when the thing the object refers to is
>>> a character array that is not a string.  It identifies the first
>>> array the expression refers to that's not a terminated string
>>> (there could be multiple).  I can't think of anything else one
>>> might want to think of it as than "a declaration of an array
>>> that is not a string."
>>>
>>> As a name, DECL is generic and used all over the place for any
>>> sort of a declaration so it's not a good choice also for that
>>> reason.  It's only marginally more descriptive than the pervasive
>>> NODE or T, but just as useless to grep for (which I have been
>>> relying on when working with this patch).
>>>
>>> I have been using the name NONSTR in all contexts where
>>> I introduced the unterminated array handling, so renaming
>>> the member to DECL makes this scheme inconsistent
>> NONSTR requires you to think in the negative and it sounds more like a
>> boolean property to me, but it's actually carrying more information than
>> just a boolean.
>>
>> I'm certainly not wed to DECL.  If you've got a better name, please
>> suggest one.
>>
>>
>>>
>>>>> +  /* ELTSIZE is set to 1 for single byte character strings, and 2 or 4
>>>>> + for wide character strings.  */
>>>>> +  unsigned eltsize;
>>>> Bernd's suggestion that we separate the input vs output parameters may be
>>>> a reasonable one -- I think this is the only in-parameter you're passing
>>>> with the structure, right?  And everything else is a pure output?  If so
>>>> we may be better off continuing to pass the element size as a separate
>>>> parameter.
>>>
>>> I changed it in the updated patch.  I had chosen to make it
>>> a member to reduce the number of arguments to these functions and
>>> in anticipation of having them update it before returning if they
>>> discover that the actual element size doesn't match the expected
>>> size, as in:
>>>
>>>    printf ("%ls", "narrow string");
>>>
>>> Similarly to what I proposed here:
>>>    https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01321.html
>>>
>>> I don't see what has been gained by making it an argument again.
>> It's the separation of inputs from outputs he was trying to achieve that
>> I said *may* be a reasonable one.
>>
>> It's not always a good thing, nor is it always a bad thing.  If
>> subsequent patches are going to have callees changing it, then it
>> absolutely makes sense to include it.  But adding it to the structure
>> for the mere sake of reducing arguments may not.
>>
>> And just to be clear, I'm not inherently against pulling stuff into
>> structures and classes.   Quite the opposite.
>>
>>>
>>>>> +  /* FLEXARRAY is true if the range of the string lengths has been
>>>>> + obtained from the upper bound of an array at the end of a struct.
>>>>> + Such an array may hold a string that's longer than its upper bound
>>>>> + due to it being used as a poor-man's flexible array member.  */
>>>>> +  bool flexarray;
>>>>> +
>>>>> +  /* Set ELTSIZE and value-initialize all other members.  */
>>>>> +  strlen_data_t (unsigned eltbytes)
>>>>> +    : minlen (), maxlen (), maxsize (), nonstr (), eltsize (eltbytes),
>>>>> +  flexarray () { }
>>>> I think if you pull ELTSIZE out and pass it as a distinct parameter,
>>>> then you don't need a ctor and you can have a POD.  You can then
>>>> initialize with memset rather than having to individually initialize
>>>> each field -- meaning it's easier to add more output fields in the
>>>> future.
>>>
>>> Without ELTSIZE neither a ctor nor memset is necessary for
>>> initialization.  This works too and is the preferred style
>>> in C++ 98:
>>>
>>>    

Re: [PATCH] Remove duplicated code block in gimple-ssa-split-paths.c

2018-12-14 Thread Jeff Law
On 12/14/18 2:25 AM, Richard Biener wrote:
> 
> Jeff's last commit added the fix twice.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> 
> Richard.
> 
> 2018-12-14  Richard Biener  
> 
>   * gimple-ssa-split-paths.c (is_feasible_trace): Remove
>   duplicated code block.
I've made that same mistake twice in the last week.  I'm in the middle
of changing some of my workflows and clearly I'm mucking it up.

jeff


[PR88495] An OpenACC async queue is always synchronized with itself

2018-12-14 Thread Thomas Schwinge
Hi!

Committed to trunk in r267152:

commit 963e7a8d58a248f8093947e9a5ba56306d36a8e2
Author: tschwinge 
Date:   Fri Dec 14 20:43:02 2018 +

[PR88495] An OpenACC async queue is always synchronized with itself

An OpenACC async queue is always synchronized with itself, so invocations like
"#pragma acc wait(0) async(0)", or "acc_wait_async (0, 0)" don't make a lot of
sense, but are still valid.

libgomp/
PR libgomp/88495
* plugin/plugin-nvptx.c (nvptx_wait_async): Don't refuse
"identical parameters".
* testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c: Update.
* testsuite/libgomp.oacc-c-c++-common/lib-80.c: Remove.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267152 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog  |   6 +
 libgomp/plugin/plugin-nvptx.c  |   3 +-
 .../libgomp.oacc-c-c++-common/asyncwait-nop-1.c|   3 -
 .../testsuite/libgomp.oacc-c-c++-common/lib-80.c   | 135 -
 4 files changed, 8 insertions(+), 139 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 2914066f7532..349497d58ee6 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,5 +1,11 @@
 2018-12-14  Thomas Schwinge  
 
+   PR libgomp/88495
+   * plugin/plugin-nvptx.c (nvptx_wait_async): Don't refuse
+   "identical parameters".
+   * testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c: Update.
+   * testsuite/libgomp.oacc-c-c++-common/lib-80.c: Remove.
+
PR libgomp/88484
* oacc-parallel.c (GOACC_wait): Correct handling for "async >= 0".
* testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c: New file.
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 6f9b16634b10..fb686de73f25 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -1617,8 +1617,9 @@ nvptx_wait_async (int async1, int async2)
  necessarily have to exist already.  */
   s2 = select_stream_for_async (async2, self, true, NULL);
 
+  /* A stream is always synchronized with itself.  */
   if (s1 == s2)
-GOMP_PLUGIN_fatal ("identical parameters");
+return;
 
   e = (CUevent *) GOMP_PLUGIN_malloc (sizeof (CUevent));
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c
index e4f627d38bc2..4ab67363ba67 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c
@@ -51,9 +51,6 @@ main ()
 {
   for (size_t j = 0; j < values_n; ++j)
{
- if (values[i] == values[j])
-   continue;
-
 #pragma acc parallel wait (values[i]) async (values[j])
  ;
 #pragma acc wait (values[i]) async (values[j])
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-80.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/lib-80.c
deleted file mode 100644
index 9a9a837fa4f2..
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-80.c
+++ /dev/null
@@ -1,135 +0,0 @@
-/* { dg-do run { target openacc_nvidia_accel_selected } } */
-/* { dg-additional-options "-lcuda" } */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include "timer.h"
-
-int
-main (int argc, char **argv)
-{
-  CUdevice dev;
-  CUfunction delay;
-  CUmodule module;
-  CUresult r;
-  CUstream stream;
-  int N;
-  int i;
-  unsigned long *a, *d_a, dticks;
-  int nbytes;
-  float atime, dtime;
-  void *kargs[2];
-  int clkrate;
-  int devnum, nprocs;
-
-  acc_init (acc_device_nvidia);
-
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (, devnum);
-  if (r != CUDA_SUCCESS)
-{
-  fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-  abort ();
-}
-
-  r =
-cuDeviceGetAttribute (, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
- dev);
-  if (r != CUDA_SUCCESS)
-{
-  fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-  abort ();
-}
-
-  r = cuDeviceGetAttribute (, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-{
-  fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-  abort ();
-}
-
-  r = cuModuleLoad (, "subr.ptx");
-  if (r != CUDA_SUCCESS)
-{
-  fprintf (stderr, "cuModuleLoad failed: %d\n", r);
-  abort ();
-}
-
-  r = cuModuleGetFunction (, module, "delay");
-  if (r != CUDA_SUCCESS)
-{
-  fprintf (stderr, "cuModuleGetFunction failed: %d\n", r);
-  abort ();
-}
-
-  nbytes = nprocs * sizeof (unsigned long);
-
-  dtime = 200.0;
-
-  dticks = (unsigned long) (dtime * clkrate);
-
-  N = nprocs;
-
-  a = (unsigned long *) malloc (nbytes);
-  d_a = (unsigned long *) acc_malloc (nbytes);
-
-  acc_map_data (a, d_a, nbytes);
-
-  r = cuStreamCreate (, CU_STREAM_DEFAULT);
-  if (r != CUDA_SUCCESS)
-   {
- fprintf (stderr, "cuStreamCreate failed: 

[PR88484] OpenACC wait directive without wait argument but with async clause

2018-12-14 Thread Thomas Schwinge
Hi!

Committed to trunk in r267151:

commit 44b7d2b9c1b1a535212b8312c6dc76dd1570db45
Author: tschwinge 
Date:   Fri Dec 14 20:42:50 2018 +

[PR88484] OpenACC wait directive without wait argument but with async clause

We don't correctly handle "#pragma acc wait async (a)" for "a >= 0", handling
it as a no-op whereas it should enqueue the appropriate wait operations on
"async (a)".

libgomp/
PR libgomp/88484
* oacc-parallel.c (GOACC_wait): Correct handling for "async >= 0".
* testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c: New file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267151 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog  |  4 ++
 libgomp/oacc-parallel.c|  4 +-
 .../libgomp.oacc-c-c++-common/asyncwait-nop-1.c| 78 ++
 3 files changed, 84 insertions(+), 2 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index c1f98d76e013..2914066f7532 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,5 +1,9 @@
 2018-12-14  Thomas Schwinge  
 
+   PR libgomp/88484
+   * oacc-parallel.c (GOACC_wait): Correct handling for "async >= 0".
+   * testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c: New file.
+
PR libgomp/88407
* plugin/plugin-nvptx.c (nvptx_async_test, nvptx_wait)
(nvptx_wait_async): Unseen async-argument is a no-op.
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 1e08af70b4da..89b6b6f6fc2b 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -630,8 +630,8 @@ GOACC_wait (int async, int num_waits, ...)
 }
   else if (async == acc_async_sync)
 acc_wait_all ();
-  else if (async == acc_async_noval)
-goacc_thread ()->dev->openacc.async_wait_all_async_func (acc_async_noval);
+  else
+acc_wait_all_async (async);
 }
 
 int
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c
new file mode 100644
index ..e4f627d38bc2
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c
@@ -0,0 +1,78 @@
+/* Several of the async/wait combinations invoked here are no-ops -- they don't
+   effect anything, but are still valid.
+
+   This doesn't verify that the asynchronous operations synchronize correctly,
+   but just verifies that we don't refuse any variants.  */
+
+#undef NDEBUG
+#include 
+#include 
+
+int values[] = { acc_async_sync,
+acc_async_noval,
+0,
+1,
+2,
+36,
+1982, };
+const size_t values_n = sizeof values / sizeof values[0];
+
+int
+main ()
+{
+  /* Explicitly initialize: it's not clear whether the following OpenACC
+ runtime library calls implicitly initialize;
+ .  */
+  acc_device_t d;
+#if defined ACC_DEVICE_TYPE_nvidia
+  d = acc_device_nvidia;
+#elif defined ACC_DEVICE_TYPE_host
+  d = acc_device_host;
+#else
+# error Not ported to this ACC_DEVICE_TYPE
+#endif
+  acc_init (d);
+
+
+  for (size_t i = 0; i < values_n; ++i)
+assert (acc_async_test (values[i]) == 1);
+
+
+  for (size_t i = 0; i < values_n; ++i)
+{
+#pragma acc parallel wait (values[i])
+  ;
+#pragma acc wait (values[i])
+  acc_wait (values[i]);
+}
+
+
+  for (size_t i = 0; i < values_n; ++i)
+{
+  for (size_t j = 0; j < values_n; ++j)
+   {
+ if (values[i] == values[j])
+   continue;
+
+#pragma acc parallel wait (values[i]) async (values[j])
+ ;
+#pragma acc wait (values[i]) async (values[j])
+ acc_wait_async (values[i], values[j]);
+   }
+}
+
+
+  for (size_t i = 0; i < values_n; ++i)
+{
+#pragma acc parallel wait async (values[i])
+  ;
+#pragma acc wait async (values[i])
+  acc_wait_all_async (values[i]);
+}
+
+
+  /* Clean up.  */
+  acc_wait_all ();
+
+  return 0;
+}


Grüße
 Thomas


Re: [PATCH, ARM] Do softfloat when -mfpu set, -mfloat-abi=softfp and targeting Thumb-1

2018-12-14 Thread Thomas Preudhomme
Hi Richard,

Thanks for catching the problem with this approach. Hopefully this
version should solve the real problem:


FP instructions are only enabled for TARGET_32BIT and TARGET_HARD_FLOAT
but GCC only gives an error when TARGET_HARD_FLOAT is true and -mfpu is
not set. Among other things, it makes some of the cmse tests (eg.
gcc.target/arm/cmse/baseline/softfp.c) fail when targeting
-march=armv8-m.base -mcmse -mfpu= -mfloat-abi=softfp. This
patch adds an extra check for TARGET_32BIT to TARGET_HARD_FLOAT such
that it is false on TARGET_THUMB1 targets even when a FPU is specified.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-12-14  Thomas Preud'homme  

* config/arm/arm.h (TARGET_HARD_FLOAT): Restrict to TARGET_32BIT
targets.

*** gcc/testsuite/ChangeLog ***

2018-12-14  Thomas Preud'homme  

* gcc.target/arm/cmse/baseline/softfp.c: Force an FPU.

Testing: No testsuite regression when targeting arm-none-eabi Armv6S-M
with -mfloat-abi=softfp

Is this ok for stage3?

Best regards,

Thomas

On Thu, 29 Nov 2018 at 14:52, Richard Earnshaw (lists)
 wrote:
>
> On 29/11/2018 10:51, Thomas Preudhomme wrote:
> > Hi,
> >
> > FP instructions are only enabled for TARGET_32BIT and TARGET_HARD_FLOAT
> > but GCC only gives an error when TARGET_HARD_FLOAT is true and -mfpu is
> > not set. Among other things, it makes some of the cmse tests (eg.
> > gcc.target/arm/cmse/baseline/softfp.c) fail when targeting
> > -march=armv8-m.base -mfpu= -mfloat-abi=softfp. This patch
> > errors out when a Thumb-1 -like target is selected and a FPU is
> > specified, thus making such tests being skipped.
> >
> > ChangeLog entries are as follows:
> >
> > *** gcc/ChangeLog ***
> >
> > 2018-11-28  thomas Preud'homme  
> >
> > * config/arm/arm.c (arm_options_perform_arch_sanity_checks): Error out
> > if targeting Thumb-1 with an FPU specified.
> >
> > *** gcc/testsuite/ChangeLog ***
> >
> > 2018-11-28  thomas Preud'homme  
> >
> > * gcc.target/arm/thumb1_mfpu-1.c: New testcase.
> > * gcc.target/arm/thumb1_mfpu-2.c: Likewise.
> >
> > Testing: No testsuite regression when targeting arm-none-eabi Armv6S-M.
> > Fails as expected when targeting Armv6-M with an -mfpu or a default FPU.
> > Succeeds without.
> >
> > Is this ok for stage3?
> >
>
> This doesn't sound right.  Specifically this bit...
>
> +  else if (TARGET_THUMB1
> +  && bitmap_bit_p (arm_active_target.isa, isa_bit_vfpv2))
> +   error ("Thumb-1 does not allow FP instructions");
>
> If I use
>
> -mcpu=arm1176jzf-s -mfpu=auto -mfloat-abi=softfp -mthumb
>
> then that shouldn't error, since softfp and thumb is, in reality, just
> float-abi=soft (as there are no fp instructions in thumb).  We also want
> it to work this way so that I can add the thumb/arm attribute to
> specific functions and have the compiler use HW float instructions when
> they are suitable.
>
>
> R.
>
> > Best regards,
> >
> > Thomas
> >
> >
> > thumb1_mfpu_error.patch
> >
> > From 051e38552d7c596873e0303f6ec4272b26d50900 Mon Sep 17 00:00:00 2001
> > From: Thomas Preud'homme 
> > Date: Tue, 27 Nov 2018 15:52:38 +
> > Subject: [PATCH] [PATCH, ARM] Error out when -mfpu set and targeting Thumb-1
> >
> > Hi,
> >
> > FP instructions are only enabled for TARGET_32BIT and TARGET_HARD_FLOAT
> > but GCC only gives an error when TARGET_HARD_FLOAT is true and -mfpu is
> > not set. Among other things, it makes some of the cmse tests (eg.
> > gcc.target/arm/cmse/baseline/softfp.c) fail when targeting
> > -march=armv8-m.base -mfpu= -mfloat-abi=softfp. This patch
> > errors out when a Thumb-1 -like target is selected and a FPU is
> > specified, thus making such tests being skipped.
> >
> > ChangeLog entries are as follows:
> >
> > *** gcc/ChangeLog ***
> >
> > 2018-11-28  thomas Preud'homme  
> >
> >   * config/arm/arm.c (arm_options_perform_arch_sanity_checks): Error out
> >   if targeting Thumb-1 with an FPU specified.
> >
> > *** gcc/testsuite/ChangeLog ***
> >
> > 2018-11-28  thomas Preud'homme  
> >
> >   * gcc.target/arm/thumb1_mfpu-1.c: New testcase.
> >   * gcc.target/arm/thumb1_mfpu-2.c: Likewise.
> >
> > Testing: No testsuite regression when targeting arm-none-eabi Armv6S-M.
> > Fails as expected when targeting Armv6-M with an -mfpu or a default FPU.
> > Succeeds without.
> >
> > Is this ok for stage3?
> >
> > Best regards,
> >
> > Thomas
> > ---
> >  gcc/config/arm/arm.c | 3 +++
> >  gcc/testsuite/gcc.target/arm/thumb1_mfpu-1.c | 7 +++
> >  gcc/testsuite/gcc.target/arm/thumb1_mfpu-2.c | 8 
> >  3 files changed, 18 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/arm/thumb1_mfpu-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/arm/thumb1_mfpu-2.c
> >
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index 40f0574e32e..1a205123cf5 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -3747,6 +3747,9 @@ 

Re: [PATCH] Delete powerpcspe

2018-12-14 Thread Jeff Law
On 12/14/18 11:41 AM, Joseph Myers wrote:
> On Fri, 14 Dec 2018, Jeff Law wrote:
> 
>>> I wonder if we could set up auto-(simulator)-testing for all supported
>>> archs (and build testing for all supported configs) on the CF
>>> (with the required scripting in contrib/ so it's easy to replicate).  I'd
>>> simply test only released snapshots to keep the load reasonable
>>> and besides posting to gcc-testresults also post testresults
>>> differences to gcc-regression?
>> It's certainly possible.  Though I've found that managing this kind of
>> thing with Jenkins is far easier than rolling our own.  I'd be happy to
>> move an instance out into the CF.
> 
> On the other hand, in glibc having a single script build-many-glibcs.py 
> that does everything (rather than some external piece of software to 
> orchestrate builds) serves to make it easy for people making changes with 
> cross-architecture risks to run the compilation tests for all 
> architectures - although doing so is slow unless you have lots of CPU 
> cores.
Yup.

What I've been playing with to deal with that is the ability for the
tester to monitor a per-user branch in a special repository.

When a change on that branch is detected, it fires up builds with and
without the patches on that branch and compares the results.

Right now it's just native x86, but the idea is to include a control
file on the branch that would allow it to cycle through specific targets
or groups of targets and any special flags (like additional C++ language
variants to test).

It's a fairly simple extension since I already had the ability to drop
patches into a special github repo and the tester would automatically
use them.

jeff


Re: [PR88407] [OpenACC] Correctly handle unseen async-arguments

2018-12-14 Thread Thomas Schwinge
Hi!

On Fri, 7 Dec 2018 16:38:58 +0100, I wrote:
> So, confused about the intended behavior, I've asked the OpenACC
> committee to clarify, and filed  "[OpenACC]
> Correctly handle unseen async-arguments".
> 
> Assuming this gets clarified in the way I think it should, I suggest the
> following.  Any comments?

Have not yet heard back, but given that the PGI compiler also seems to
handle it this way, I committed the following to trunk in r267150:

commit e7acb9ffce94d592054ecba2eb1970eaf5cbc313
Author: tschwinge 
Date:   Fri Dec 14 20:42:40 2018 +

[PR88407] [OpenACC] Correctly handle unseen async-arguments

... which turn the operation into a no-op.

libgomp/
PR libgomp/88407
* plugin/plugin-nvptx.c (nvptx_async_test, nvptx_wait)
(nvptx_wait_async): Unseen async-argument is a no-op.
* testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Update.
* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-79.c: Likewise.
* testsuite/libgomp.oacc-fortran/lib-12.f90: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-71.c: Merge into...
* testsuite/libgomp.oacc-c-c++-common/lib-69.c: ... this.  Update.
* testsuite/libgomp.oacc-c-c++-common/lib-77.c: Merge into...
* testsuite/libgomp.oacc-c-c++-common/lib-74.c: ... this.  Update

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267150 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog  |  13 ++
 libgomp/plugin/plugin-nvptx.c  |  13 +-
 .../libgomp.oacc-c-c++-common/async_queue-1.c  |  30 +
 .../libgomp.oacc-c-c++-common/data-2-lib.c |   2 +
 .../testsuite/libgomp.oacc-c-c++-common/data-2.c   |   2 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-69.c   |   7 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-71.c   | 122 --
 .../testsuite/libgomp.oacc-c-c++-common/lib-74.c   |   4 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-77.c   | 138 -
 .../testsuite/libgomp.oacc-c-c++-common/lib-79.c   |  24 
 libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90  |   5 +
 11 files changed, 93 insertions(+), 267 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index d84c3f4bfe2e..c1f98d76e013 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,5 +1,18 @@
 2018-12-14  Thomas Schwinge  
 
+   PR libgomp/88407
+   * plugin/plugin-nvptx.c (nvptx_async_test, nvptx_wait)
+   (nvptx_wait_async): Unseen async-argument is a no-op.
+   * testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Update.
+   * testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/lib-79.c: Likewise.
+   * testsuite/libgomp.oacc-fortran/lib-12.f90: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/lib-71.c: Merge into...
+   * testsuite/libgomp.oacc-c-c++-common/lib-69.c: ... this.  Update.
+   * testsuite/libgomp.oacc-c-c++-common/lib-77.c: Merge into...
+   * testsuite/libgomp.oacc-c-c++-common/lib-74.c: ... this.  Update
+
* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Revise.
* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
 
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 7d0d38e0c2e1..6f9b16634b10 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -1539,9 +1539,8 @@ nvptx_async_test (int async)
   struct ptx_stream *s;
 
   s = select_stream_for_async (async, pthread_self (), false, NULL);
-
   if (!s)
-GOMP_PLUGIN_fatal ("unknown async %d", async);
+return 1;
 
   r = CUDA_CALL_NOCHECK (cuStreamQuery, s->stream);
   if (r == CUDA_SUCCESS)
@@ -1596,7 +1595,7 @@ nvptx_wait (int async)
 
   s = select_stream_for_async (async, pthread_self (), false, NULL);
   if (!s)
-GOMP_PLUGIN_fatal ("unknown async %d", async);
+return;
 
   CUDA_CALL_ASSERT (cuStreamSynchronize, s->stream);
 
@@ -1610,14 +1609,14 @@ nvptx_wait_async (int async1, int async2)
   struct ptx_stream *s1, *s2;
   pthread_t self = pthread_self ();
 
+  s1 = select_stream_for_async (async1, self, false, NULL);
+  if (!s1)
+return;
+
   /* The stream that is waiting (rather than being waited for) doesn't
  necessarily have to exist already.  */
   s2 = select_stream_for_async (async2, self, true, NULL);
 
-  s1 = select_stream_for_async (async1, self, false, NULL);
-  if (!s1)
-GOMP_PLUGIN_fatal ("invalid async 1\n");
-
   if (s1 == s2)
 GOMP_PLUGIN_fatal ("identical parameters");
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/async_queue-1.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/async_queue-1.c
index 

Revise libgomp.oacc-c-c++-common/data-2-lib.c, libgomp.oacc-c-c++-common/data-2.c

2018-12-14 Thread Thomas Schwinge
Hi!

Committed to trunk in r267149:

commit 1d61d32a5dda2b567f2253284ce3ecf40c253fab
Author: tschwinge 
Date:   Fri Dec 14 20:42:29 2018 +

Revise libgomp.oacc-c-c++-common/data-2-lib.c, 
libgomp.oacc-c-c++-common/data-2.c

These are meant to be functionally equivalent (but no longer are), just 
using
different means.  Also, use the OpenACC "*_async" functions recently added.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Revise.
* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267149 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog  |   5 +
 .../libgomp.oacc-c-c++-common/data-2-lib.c | 129 --
 .../testsuite/libgomp.oacc-c-c++-common/data-2.c   | 148 +
 3 files changed, 125 insertions(+), 157 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index b6cbb34908a2..d84c3f4bfe2e 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,8 @@
+2018-12-14  Thomas Schwinge  
+
+   * testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Revise.
+   * testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
+
 2018-12-14  Chung-Lin Tang  
 
* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Adjust.
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
index f553d3d839c5..e432f8d9c796 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
@@ -1,16 +1,15 @@
-/* This test is similar to data-2.c, but it uses acc_* library functions
-   to move data.  */
-
-/* { dg-do run } */
+/* Test asynchronous, unstructed data regions, runtime library variant.  */
+/* See also data-2.c.  */
 
 #include 
+#undef NDEBUG
 #include 
 #include 
 
 int
 main (int argc, char **argv)
 {
-  int N = 128; //1024 * 1024;
+  int N = 12345;
   float *a, *b, *c, *d, *e;
   void *d_a, *d_b, *d_c, *d_d;
   int i;
@@ -30,19 +29,21 @@ main (int argc, char **argv)
   b[i] = 0.0;
 }
 
-  d_a = acc_copyin (a, nbytes);
-  d_b = acc_copyin (b, nbytes);
-  acc_copyin (&N, sizeof (int));
+  acc_copyin_async (a, nbytes, acc_async_noval);
+  acc_copyin_async (b, nbytes, acc_async_noval);
+  acc_copyin_async (&N, sizeof (int), acc_async_noval);
   
-#pragma acc parallel present (a[0:N], b[0:N], N) async wait
+#pragma acc parallel present (a[0:N], b[0:N], N) async
 #pragma acc loop
   for (i = 0; i < N; i++)
 b[i] = a[i];
 
-  acc_wait_all ();
+  d_a = acc_deviceptr (a);
+  acc_memcpy_from_device_async (a, d_a, nbytes, acc_async_noval);
+  d_b = acc_deviceptr (b);
+  acc_memcpy_from_device_async (b, d_b, nbytes, acc_async_noval);
 
-  acc_memcpy_from_device (a, d_a, nbytes);
-  acc_memcpy_from_device (b, d_b, nbytes);
+  acc_wait (acc_async_noval);
 
   for (i = 0; i < N; i++)
 {
@@ -56,19 +57,19 @@ main (int argc, char **argv)
   b[i] = 0.0;
 }
 
-  acc_update_device (a, nbytes);
-  acc_update_device (b, nbytes);
+  acc_update_device_async (a, nbytes, 1);
+  acc_update_device_async (b, nbytes, 1);
   
-#pragma acc parallel present (a[0:N], b[0:N], N)  async (1)
+#pragma acc parallel present (a[0:N], b[0:N], N) async (1)
 #pragma acc loop
   for (i = 0; i < N; i++)
 b[i] = a[i];
 
+  acc_memcpy_from_device_async (a, d_a, nbytes, 1);
+  acc_memcpy_from_device_async (b, d_b, nbytes, 1);
+
   acc_wait (1);
 
-  acc_memcpy_from_device (a, d_a, nbytes);
-  acc_memcpy_from_device (b, d_b, nbytes);
-  
   for (i = 0; i < N; i++)
 {
   assert (a[i] == 2.0);
@@ -83,46 +84,42 @@ main (int argc, char **argv)
   d[i] = 0.0;
 }
 
-  acc_update_device (a, nbytes);
-  acc_update_device (b, nbytes);
-  d_c = acc_copyin (c, nbytes);
-  d_d = acc_copyin (d, nbytes);
+  acc_update_device_async (a, nbytes, 0);
+  acc_update_device_async (b, nbytes, 1);
+  acc_copyin_async (c, nbytes, 2);
+  acc_copyin_async (d, nbytes, 3);
 
-#pragma acc parallel present (a[0:N], b[0:N], N) async (1)
+#pragma acc parallel present (a[0:N], b[0:N], N) wait (0) async (1)
 #pragma acc loop
   for (i = 0; i < N; i++)
 b[i] = (a[i] * a[i] * a[i]) / a[i];
 
-#pragma acc parallel present (a[0:N], c[0:N], N) async (2)
+#pragma acc parallel present (a[0:N], c[0:N], N) wait (0) async (2)
 #pragma acc loop
   for (i = 0; i < N; i++)
 c[i] = (a[i] + a[i] + a[i] + a[i]) / a[i];
 
-#pragma acc parallel present (a[0:N], d[0:N], N) async (3)
+#pragma acc parallel present (a[0:N], d[0:N], N) wait (0) async (3)
 #pragma acc loop
   for (i = 0; i < N; i++)
 d[i] = ((a[i] * a[i] + a[i]) / a[i]) - a[i];
 
-  acc_wait_all ();
+  acc_memcpy_from_device_async (a, d_a, nbytes, 0);
+  acc_memcpy_from_device_async (b, d_b, nbytes, 1);
+  d_c = acc_deviceptr (c);
+  acc_memcpy_from_device_async (c, d_c, nbytes, 2);
+  d_d = acc_deviceptr (d);
+  acc_memcpy_from_device_async (d, 

Re: [PATCH, libgcc/ARM & testsuite] Optimize executable size when using softfloat fmul/dmul

2018-12-14 Thread Thomas Preudhomme
Hi Richard,

None; are there any? All the ones I could find in the big switch
selecting tm_files and tmake_files in gcc/config.gcc include
arm/elf.h. I tried to build for arm-wince-pe but got: "Configuration
arm-wince-pe not supported". However, note that to guarantee correct
results the only requirement is that a global symbol correctly overrides
a weak symbol, and I see .weak usage in many other libgcc
backends (e.g. i386). The "take the first definition resolving an
undefined reference and ignore the one in a following object of a static
library" behaviour is only needed to benefit from the size optimization.
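
To illustrate the weak/global override this relies on, here is a minimal
sketch. It assumes a GCC-compatible compiler on an ELF target; demo_mul
and demo_square are hypothetical names and are not the libgcc routines.

```c
/* Hypothetical stand-in for the multiplication-only object: a weak
   definition that the linker may replace with a global (strong) one.  */
__attribute__ ((weak)) int
demo_mul (int a, int b)
{
  return a * b;
}

/* Caller that only needs multiplication.  If no strong definition of
   demo_mul is present in the link, the weak one above is used.  */
int
demo_square (int x)
{
  return demo_mul (x, x);
}
```

If another object in the link provides a non-weak demo_mul, the linker
resolves calls to that definition instead, with no change to the caller.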

Best regards,

Thomas
On Fri, 7 Dec 2018 at 14:14, Richard Earnshaw (lists)
 wrote:
>
> On 19/11/2018 09:57, Thomas Preudhomme wrote:
> > Softfloat single precision and double precision floating-point
> > multiplication routines in libgcc share some code with the
> > floating-point division of their corresponding precision. As the code
> > is structured now, this leads to *all* division code being pulled in an
> > executable in softfloat mode even if only multiplication is
> > performed.
> >
> > This patch creates some new LIB1ASMFUNCS macros to also build files with
> > just the multiplication and shared code as weak symbols. By putting
> > these earlier in the static library, they can then be picked up when
> > only multiplication is used and they are overridden by the global
> > definition in the existing file containing both multiplication and
> > division code when division is needed.
> >
> > The patch also removes changes made to the FUNC_START and ARM_FUNC_START
> > macros in r218124 since the intent was to put multiplication and
> > division code into their own section in a later patch to achieve the
> > same size optimization. That approach relied on specific section layout
> > to ensure multiplication and division were not too far from the shared
> > bit of code in order for the branches to be within range. Due to lack of
> > guarantee regarding section layout, in particular with all the
> > possibility of linker scripts, this approach was chosen instead. This
> > patch keeps the two testcases that were posted by Tony Wang (an Arm
> > employee at the time) on the mailing list to implement this approach
> > and adds a new one, hence the attribution.
> >
> > ChangeLog entries are as follows:
> >
> > *** gcc/ChangeLog ***
> >
> > 2018-11-14  Thomas Preud'homme  
> >
> > * config/arm/elf.h: Update comment about condition that need to
> > match with libgcc/config/arm/lib1funcs.S to also include
> > libgcc/config/arm/t-arm.
> > * doc/sourcebuild.texi (output-exists, output-exists-not): Rename
> > subsubsection these directives are in to "Check for output files".
> > Move scan-symbol to that section and add to it new scan-symbol-not
> > directive.
> >
> > *** gcc/testsuite/ChangeLog ***
> >
> > 2018-11-16  Tony Wang  
> > Thomas Preud'homme  
> >
> > * lib/lto.exp (lto-execute): Define output_file and testname_with_flags
> > to same value as execname.
> > (scan-symbol): Move and rename to ...
> > * lib/gcc-dg.exp (scan-symbol-common): This.  Adapt into a
> > helper function returning true or false if a symbol is present.
> > (scan-symbol): New procedure.
> > (scan-symbol-not): Likewise.
> > * gcc.target/arm/size-optimization-ieee-1.c: New testcase.
> > * gcc.target/arm/size-optimization-ieee-2.c: Likewise.
> > * gcc.target/arm/size-optimization-ieee-3.c: Likewise.
> >
> > *** libgcc/ChangeLog ***
> >
> > 2018-11-16  Thomas Preud'homme  
> >
> > * /config/arm/lib1funcs.S (FUNC_START): Remove unused sp_section
> > parameter and corresponding code.
> > (ARM_FUNC_START): Likewise in both definitions.
> > Also update footer comment about condition that need to match with
> > gcc/config/arm/elf.h to also include libgcc/config/arm/t-arm.
> > * config/arm/ieee754-df.S (muldf3): Also build it if L_arm_muldf3 is
> > defined.  Weakly define it in this case.
> > * config/arm/ieee754-sf.S (mulsf3): Likewise with L_arm_mulsf3.
> > * config/arm/t-elf (LIB1ASMFUNCS): Build _arm_muldf3.o and
> > _arm_mulsf3.o before muldiv versions if targeting Thumb-1 only. Add
> > comment to keep condition in sync with the one in
> > libgcc/config/arm/lib1funcs.S and gcc/config/arm/elf.h.
> >
> > Testing: Bootstrapped on arm-linux-gnueabihf (Arm & Thumb-2) and
> > testsuite shows no
> > regression. Also built an arm-none-eabi cross compiler targeting
> > soft-float which also shows no regression. In particular newly added
> > tests and gcc.dg/lto/20081212-1 test pass.
>
> Which non-elf targets have you tested?
>
> R.
>
> >
> > Is this ok for stage3?
> >
> > Best regards,
> >
> > Thomas
> >
> >
> > Optimize-size-fpmul_without_div.patch
> >
> > From 8740697791f99b7175e188f049663883c39e51b0 Mon Sep 17 00:00:00 2001
> > From: Thomas Preud'homme 
> > Date: Fri, 26 Oct 2018 16:21:09 +0100
> > Subject: [PATCH] [PATCH, 

Re: [PATCH 5/6, OpenACC, libgomp] Async re-work, C/C++ testsuite changes

2018-12-14 Thread Thomas Schwinge
Hi!

On Fri, 7 Dec 2018 16:30:53 +0100, I wrote:
> On Tue, 25 Sep 2018 21:11:42 +0800, Chung-Lin Tang  
> wrote:
> > These are the testsuite/libgomp.oacc-c-c++-common/* changes.
> 
> Please commit the following three hunks to trunk: the code as present
> doesn't declare its async/wait dependencies correctly.

As I had this queued as a prerequisite for other changes, in r267148 I
have now committed the following to trunk:

commit fef25f06de8e800d2a6ac04b12b6399923d414a9
Author: tschwinge 
Date:   Fri Dec 14 20:42:18 2018 +

Correctly describe OpenACC async/wait dependencies

libgomp/
* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.

Reviewed-by: Thomas Schwinge 

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267148 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog| 6 ++
 libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c | 2 +-
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index b4ab6b690553..b6cbb34908a2 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,9 @@
+2018-12-14  Chung-Lin Tang  
+
+   * testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Adjust.
+   * testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.
+
 2018-12-14  Thomas Schwinge  
 
PR libgomp/88370
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
index 2ddfa7d4a01b..f553d3d839c5 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
@@ -153,7 +153,7 @@ main (int argc, char **argv)
 d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
 
 #pragma acc parallel present (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N], N) \
-  async (4)
+  wait (1, 2, 3) async (4)
   for (int ii = 0; ii < N; ii++)
 e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c
index 0c6abe69dc17..81d623afa0ea 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c
@@ -162,7 +162,7 @@ main (int argc, char **argv)
 d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
 
 #pragma acc parallel present (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N]) \
-  wait (1) async (4)
+  wait (1, 2, 3) async (4)
   for (int ii = 0; ii < N; ii++)
 e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
index 0bf706a1b5d4..5ec50b808a73 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
@@ -138,7 +138,7 @@ main (int argc, char **argv)
 d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
 
 #pragma acc parallel present (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N]) \
-  wait (1,5) async (4)
+  wait (1, 2, 3, 5) async (4)
   for (int ii = 0; ii < N; ii++)
 e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
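
The dependency pattern these hunks correct can be sketched on its own as
follows. This is an illustration rather than code from the testsuite:
demo_async_sum and the scaling factors are made up, and without -fopenacc
the pragmas are ignored and the loops simply run in order on the host.

```c
/* Compute e[i] = b[i] + c[i] + d[i], where b, c and d are each produced
   on their own async queue.  Hypothetical helper for illustration.  */
void
demo_async_sum (const float *a, float *b, float *c, float *d, float *e, int n)
{
#pragma acc data copyin (a[0:n]) create (b[0:n], c[0:n], d[0:n]) copyout (e[0:n])
  {
#pragma acc parallel loop present (a[0:n], b[0:n]) async (1)
    for (int i = 0; i < n; i++)
      b[i] = 2.0f * a[i];

#pragma acc parallel loop present (a[0:n], c[0:n]) async (2)
    for (int i = 0; i < n; i++)
      c[i] = 3.0f * a[i];

#pragma acc parallel loop present (a[0:n], d[0:n]) async (3)
    for (int i = 0; i < n; i++)
      d[i] = 4.0f * a[i];

    /* The consumer reads b, c and d, so it has to wait on all three
       producer queues, not just one of them.  */
#pragma acc parallel loop present (b[0:n], c[0:n], d[0:n], e[0:n]) \
  wait (1, 2, 3) async (4)
    for (int i = 0; i < n; i++)
      e[i] = b[i] + c[i] + d[i];

    /* Drain queue 4 before the data region copies e back.  */
#pragma acc wait (4)
  }
}
```

With only "async (4)" or "wait (1)" on the last construct, nothing orders
it after queues 2 and 3, so it may read c and d before they are written.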
 


Regards
 Thomas


Re: [PR88370] acc_get_cuda_stream/acc_set_cuda_stream: acc_async_sync, acc_async_noval

2018-12-14 Thread Thomas Schwinge
Hi!

On Wed, 5 Dec 2018 15:14:16 +0100, I wrote:
> On Mon, 19 Nov 2018 16:33:30 +0900, Chung-Lin Tang  
> wrote:
> > On 2018/11/18 10:36 AM, Thomas Schwinge wrote:
> > > Generally, I envision test cases running a few "acc_get_cuda_stream"
> > > calls with relevant argument values, to see whether the expected
> > > queues/streams are being used.  (Similar for other offload targets.)
> > > 
> > > But I suppose we might again need to get clarified whether
> > > "acc_get_cuda_stream(acc_async_sync)",
> > > "acc_get_cuda_stream(acc_async_noval)", or
> > > "acc_get_cuda_stream(acc_async_default)" are actually valid calls (given
> > > that these argument values are not valid "async value"s), and these would
> > > then return the respective CUDA stream handles, different from the one
> > > returned for "acc_get_cuda_stream(0)" etc.
> > > 
> > > That said, we can certainly implement it that way, because that's not
> > > against the specification.
> > 
> > I think the likely clarification we'll ever get on this is that it's
> > implementation defined :P
> 
> Well, actually, I've been able to convince myself ;-) to a reading of the
> specification so that this is supported, and filed
> .
> 
> Does the following look alright to you?
> 
> Do you agree that 'Refusing request to set CUDA stream associated with
> "acc_async_sync"' should just be an informational debug message, instead
> of a hard error?  (This restriction might disappear in the future.)  (Oh,
> and other negative values will still be diagnosed as errors by
> "select_stream_for_async".)

Not having heard anything against this, and as a prerequisite for other
changes, I have now committed the following in r267147:

commit 815940afeefeeafa49ad3a5d81ef2d273ddeb3d7
Author: tschwinge 
Date:   Fri Dec 14 20:42:08 2018 +

[PR88370] acc_get_cuda_stream/acc_set_cuda_stream: acc_async_sync, 
acc_async_noval

Per my reading of the OpenACC specification (and as supported by secondary
documentation, such as code examples, or presentations), it's valid to call
"acc_get_cuda_stream"/"acc_set_cuda_stream" also with "acc_async_sync",
"acc_async_noval" arguments, not just with the nonnegative values as 
currently
implemented.
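
As a rough sketch of the kind of check this implies (not the libgomp
source): the DEMO_* constants below are assumptions standing in for
acc_async_noval and acc_async_sync, whose real values come from
openacc.h, and both function names are hypothetical.

```c
#include <stdbool.h>

/* Assumed special values; check openacc.h for the real definitions.  */
enum demo_async
{
  DEMO_ASYNC_NOVAL = -1,
  DEMO_ASYNC_SYNC = -2
};

/* An async argument is accepted if it is one of the two special values
   or a nonnegative queue number.  */
bool
demo_async_valid_p (int async)
{
  return async == DEMO_ASYNC_NOVAL || async == DEMO_ASYNC_SYNC || async >= 0;
}

/* Per the ChangeLog, setting a stream for the "sync" value is refused,
   while the other valid values are allowed.  */
bool
demo_may_set_stream (int async)
{
  return demo_async_valid_p (async) && async != DEMO_ASYNC_SYNC;
}
```

Other negative values fail the validity check and would still be
diagnosed as errors.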

libgomp/
PR libgomp/88370
* libgomp.texi (acc_get_current_cuda_context, acc_get_cuda_stream)
(acc_set_cuda_stream): Clarify.
* oacc-cuda.c (acc_get_cuda_stream, acc_set_cuda_stream): Use
"async_valid_p".
* plugin/plugin-nvptx.c (nvptx_set_cuda_stream): Refuse "async ==
acc_async_sync".
* testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c: New 
file.
* testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-84.c: Update.
* testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267147 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog  | 14 
 libgomp/libgomp.texi   | 17 ++--
 libgomp/oacc-cuda.c|  4 +-
 libgomp/plugin/plugin-nvptx.c  | 10 ++-
 .../acc_set_cuda_stream-1.c| 42 ++
 .../libgomp.oacc-c-c++-common/async_queue-1.c  | 97 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-84.c   | 31 +--
 .../testsuite/libgomp.oacc-c-c++-common/lib-85.c   | 27 +-
 8 files changed, 222 insertions(+), 20 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 4c66021c367d..b4ab6b690553 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,17 @@
+2018-12-14  Thomas Schwinge  
+
+   PR libgomp/88370
+   * libgomp.texi (acc_get_current_cuda_context, acc_get_cuda_stream)
+   (acc_set_cuda_stream): Clarify.
+   * oacc-cuda.c (acc_get_cuda_stream, acc_set_cuda_stream): Use
+   "async_valid_p".
+   * plugin/plugin-nvptx.c (nvptx_set_cuda_stream): Refuse "async ==
+   acc_async_sync".
+   * testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c: New file.
+   * testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/lib-84.c: Update.
+   * testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise.
+
 2018-12-14  Tom de Vries  
 
* testsuite/libgomp.c-c++-common/function-not-offloaded-aux.c: New test.
diff --git libgomp/libgomp.texi libgomp/libgomp.texi
index 3fa8eb8165e5..e6c20525bc0c 100644
--- libgomp/libgomp.texi
+++ libgomp/libgomp.texi
@@ -2768,7 +2768,7 @@ as used by the CUDA Runtime or Driver API's.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_get_current_cuda_context(void);}
+@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
 @end multitable
 
 @item 

Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9

2018-12-14 Thread Jakub Jelinek
On Fri, Dec 14, 2018 at 09:55:51PM +0100, Thomas Schwinge wrote:
> Anyway, that was easy enough to fix; in r267145 committed to trunk:

> liboffloadmic/
> * runtime/offload.h (omp_target_is_present, omp_target_memcpy)
> (omp_target_memcpy_rect, omp_target_associate_ptr)
> (omp_target_disassociate_ptr): Adjust to libgomp changes.

Thanks; indeed, I wasn't testing the OpenMP 5.0 patchset with offloading
mainly because there weren't too many offloading related changes so far
(most of them waiting for GCC 10).
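
For reference, the build errors reported for this fix show declarations
that differ only in const qualification of pointer parameters, e.g.
omp_target_is_present(void*, int) against (const void*, int). A minimal
C sketch of the rule, using a hypothetical demo_is_present in place of
the real entry point:

```c
#include <stddef.h>

/* First declaration, as a header might provide it.  */
int demo_is_present (const void *ptr, int device_num);

/* A second declaration must repeat the same parameter types.  Writing
   "void *ptr" here instead would be rejected as a conflicting
   declaration.  */
int demo_is_present (const void *ptr, int device_num);

int
demo_is_present (const void *ptr, int device_num)
{
  /* Toy rule for the example: a non-null pointer on device 0 counts as
     present.  */
  return ptr != NULL && device_num == 0;
}
```

Callers may pass either const or non-const pointers; only the two
declarations have to agree with each other.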

Jakub


Add user-friendly diagnostics for OpenACC loop parallelism assigned (was: [PATCH 3/3] Add user-friendly OpenACC diagnostics regarding detected parallelism)

2018-12-14 Thread Thomas Schwinge
Hi!

On Thu, 26 Jul 2018 07:14:21 -0700, Cesar Philippidis  
wrote:
> On 07/26/2018 01:33 AM, Richard Biener wrote:
> > On Wed, Jul 25, 2018 at 5:30 PM Cesar Philippidis
> >  wrote:
> >>
> >> This patch

Thanks!

> >> teaches GCC to inform the user how it assigned parallelism
> >> to each OpenACC loop at compile time

Hence, I changed that diagnostic to "assigned OpenACC [...] loop
parallelism" instead of "Detected parallelism ".

> >> using the -fopt-info-note-omp
> >> flag. For instance, given the acc parallel loop nest:
> >>
> >>   #pragma acc parallel loop
> >>   for (...)
> >> #pragma acc loop vector
> >> for (...)
> >>
> >> GCC will report something like
> >>
> >>   foo.c:4:0: note: Detected parallelism 
> >>   foo.c:6:0: note: Detected parallelism 
> >>
> >> Note how only the inner loop specifies vector parallelism. In this
> >> example, GCC automatically assigned gang and worker parallelism to the
> >> outermost loop. Perhaps, going forward, it would be useful to
> >> distinguish which parallelism was specified by the user and which was
> >> assigned by the compiler. But that can be added in a follow up patch.

ACK.
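
A complete loop nest of the shape described above might look like the
following sketch. demo_row_sums is a hypothetical function, not part of
the patch; the idea is to compile it with -fopenacc plus the opt-info
option named in the quoted message (whose spelling may differ in the
committed version) and look at the notes emitted for each loop.

```c
/* Sum each row of a rows x cols matrix stored in row-major order.
   Hypothetical example; names are not taken from the patch.  */
void
demo_row_sums (const float *m, float *sums, int rows, int cols)
{
  /* No parallelism level is given here, so the compiler chooses.  */
#pragma acc parallel loop copyin (m[0:rows * cols]) copyout (sums[0:rows])
  for (int r = 0; r < rows; r++)
    {
      float acc = 0.0f;
      /* Only the inner loop requests vector parallelism explicitly.  */
#pragma acc loop vector reduction (+:acc)
      for (int c = 0; c < cols; c++)
        acc += m[r * cols + c];
      sums[r] = acc;
    }
}
```

Without -fopenacc the pragmas are ignored and the function runs
sequentially with the same result.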

> The attached revised patch now uses MSG_OPTIMIZED_LOCATIONS for the
> diagnostics.

> Is this OK for trunk?

I further changed that to make it build ;-) at all, and also emit
diagnostics for OpenACC kernels constructs, and also added considerably
more testsuite coverage.

Committed to trunk in r267146:

commit 75180da2a558d3106e26173326933f65b417182c
Author: tschwinge 
Date:   Fri Dec 14 20:41:58 2018 +

Add user-friendly diagnostics for OpenACC loop parallelism assigned

gcc/
* omp-offload.c (inform_oacc_loop): New function.
(execute_oacc_device_lower): Use it to display loop parallelism.
gcc/testsuite/
* c-c++-common/goacc/note-parallelism.c: New test.
* gfortran.dg/goacc/note-parallelism.f90: New test.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Update.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* c-c++-common/goacc/kernels-1.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/classify-parallel.f95: Likewise.
* gfortran.dg/goacc/classify-routine.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267146 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  |   6 +
 gcc/omp-offload.c  |  49 +++-
 gcc/testsuite/ChangeLog|  18 +++
 .../goacc/classify-kernels-unparallelized.c|   3 +-
 .../c-c++-common/goacc/classify-kernels.c  |   3 +-
 .../c-c++-common/goacc/classify-parallel.c |   3 +-
 .../c-c++-common/goacc/classify-routine.c  |   3 +-
 gcc/testsuite/c-c++-common/goacc/kernels-1.c   |  10 +-
 .../goacc/kernels-double-reduction-n.c |   3 +-
 .../c-c++-common/goacc/kernels-double-reduction.c  |   3 +-
 .../c-c++-common/goacc/note-parallelism.c  | 115 ++
 .../goacc/classify-kernels-unparallelized.f95  |   3 +-
 .../gfortran.dg/goacc/classify-kernels.f95 |   3 +-
 .../gfortran.dg/goacc/classify-parallel.f95|   3 +-
 .../gfortran.dg/goacc/classify-routine.f95 |   3 +-
 .../gfortran.dg/goacc/kernels-loop-inner.f95   |   3 +-
 .../gfortran.dg/goacc/note-parallelism.f90 | 131 +
 17 files changed, 346 insertions(+), 16 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 527164c4f9ec..7fb4958da485 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,9 @@
+2018-12-14  Thomas Schwinge  
+   Cesar Philippidis  
+
+   * omp-offload.c (inform_oacc_loop): New function.
+   (execute_oacc_device_lower): Use it to display loop parallelism.
+
 2018-12-14  Jakub Jelinek  
 
PR c++/82294
diff --git gcc/omp-offload.c gcc/omp-offload.c
index 0abf0283c9e2..4457e1a3079b 100644
--- gcc/omp-offload.c
+++ gcc/omp-offload.c
@@ -823,7 +823,7 @@ dump_oacc_loop_part (FILE *file, gcall *from, int depth,
 }
 }
 
-/* Dump OpenACC loops LOOP, its siblings and its children.  */
+/* Dump OpenACC loop LOOP, its children, and its siblings.  */
 
 static void
 dump_oacc_loop (FILE *file, oacc_loop *loop, int depth)
@@ -866,6 +866,31 @@ debug_oacc_loop (oacc_loop *loop)
   dump_oacc_loop (stderr, loop, 0);
 }
 
+/* Provide diagnostics on OpenACC loop LOOP, its children, and its
+   siblings.  */
+
+static void

Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9

2018-12-14 Thread Thomas Schwinge
Hi Jakub!

On Thu, 8 Nov 2018 18:16:11 +0100, Jakub Jelinek  wrote:
> The OpenMP 5.0 specification, https://www.openmp.org/specifications/ ,
> has been just released a few minutes ago and to celebrate that, I've merged
> gomp-5_0-branch into trunk after bootstrapping/regtesting it on x86_64-linux 
> and
> i686-linux.

In addition to not having tested this with nvptx offloading (where Tom
and you now restored the regressed test cases, thanks!), I can tell that
you also didn't test this with Intel MIC (emulated) offloading.  ;-)

> Because the amount of changes in OpenMP 5.0 is much bigger than in any of the 
> earlier
> releases of the standard, [...]

Oh yes, that's massive!  I immediately thought "poor Jakub" ;-) when I
read a summary of all the new stuff in there.


After  'Intel MIC (emulated) offloading:
"relocation [...] can not be used when making a shared object; recompile
with -fPIC"', yours is now another commit that further broke Intel MIC
(emulated) offloading, but in the past month apparently nobody but me has
run into this (or didn't bother to report it), and I thus again wonder
whether anyone but me is still testing Intel MIC (emulated) offloading?

Anyway, that was easy enough to fix; in r267145 committed to trunk:

commit fbd4f724c13b078755a96a257eabc18ddb83a9cd
Author: tschwinge 
Date:   Fri Dec 14 20:41:46 2018 +

Repair liboffloadmic after "(Partial) OpenMP 5.0 support for GCC 9"

..., which now failed to build, as follows:

In file included from 
[...]/source-gcc/liboffloadmic/runtime/offload_common.h:43,
 from 
[...]/source-gcc/liboffloadmic/runtime/dv_util.cpp:31:
[...]/source-gcc/liboffloadmic/runtime/offload.h:220:12: error: 
conflicting declaration of C function 'int omp_target_is_present(void*, int)'
  220 | extern int omp_target_is_present(
  |^
In file included from 
[...]/source-gcc/liboffloadmic/runtime/offload.h:45,
 from 
[...]/source-gcc/liboffloadmic/runtime/offload_common.h:43,
 from 
[...]/source-gcc/liboffloadmic/runtime/dv_util.cpp:31:
./../libgomp/omp.h:166:12: note: previous declaration 'int 
omp_target_is_present(const void*, int)'
  166 | extern int omp_target_is_present (const void *, int) 
__GOMP_NOTHROW;
  |^
In file included from 
[...]/source-gcc/liboffloadmic/runtime/offload_common.h:43,
 from 
[...]/source-gcc/liboffloadmic/runtime/dv_util.cpp:31:
[...]/source-gcc/liboffloadmic/runtime/offload.h:236:12: error: 
conflicting declaration of C function 'int omp_target_memcpy(void*, void*, 
size_t, size_t, size_t, int, int)'
  236 | extern int omp_target_memcpy(
  |^
In file included from 
[...]/source-gcc/liboffloadmic/runtime/offload.h:45,
 from 
[...]/source-gcc/liboffloadmic/runtime/offload_common.h:43,
 from 
[...]/source-gcc/liboffloadmic/runtime/dv_util.cpp:31:
./../libgomp/omp.h:167:12: note: previous declaration 'int 
omp_target_memcpy(void*, const void*, long unsigned int, long unsigned int, 
long unsigned int, int, int)'
  167 | extern int omp_target_memcpy (void *, const void *, 
__SIZE_TYPE__,
  |^
In file included from 
[...]/source-gcc/liboffloadmic/runtime/offload_common.h:43,
 from 
[...]/source-gcc/liboffloadmic/runtime/dv_util.cpp:31:
[...]/source-gcc/liboffloadmic/runtime/offload.h:262:12: error: 
conflicting declaration of C function 'int omp_target_memcpy_rect(void*, void*, 
size_t, int, const size_t*, const size_t*, const size_t*, const size_t*, const 
size_t*, int, int)'
  262 | extern int omp_target_memcpy_rect(
  |^~
In file included from 
[...]/source-gcc/liboffloadmic/runtime/offload.h:45,
 from 
[...]/source-gcc/liboffloadmic/runtime/offload_common.h:43,
 from 
[...]/source-gcc/liboffloadmic/runtime/dv_util.cpp:31:
./../libgomp/omp.h:170:12: note: previous declaration 'int 
omp_target_memcpy_rect(void*, const void*, long unsigned int, int, const long 
unsigned int*, const long unsigned int*, const long unsigned int*, const long 
unsigned int*, const long unsigned int*, int, int)'
  170 | extern int omp_target_memcpy_rect (void *, const void *, 
__SIZE_TYPE__, int,
  |^~
In file included from 
[...]/source-gcc/liboffloadmic/runtime/offload_common.h:43,
 from 
[...]/source-gcc/liboffloadmic/runtime/dv_util.cpp:31:
[...]/source-gcc/liboffloadmic/runtime/offload.h:285:12: error: 
conflicting declaration of C function 'int 

Re: [C++ Patch] [PR c++/88146] do not crash synthesizing inherited ctor(...)

2018-12-14 Thread Jason Merrill

On 12/6/18 7:23 PM, Alexandre Oliva wrote:

This patch started out from the testcase in PR88146, that attempted to
synthesize an inherited ctor without any args before a varargs
ellipsis and crashed while at that, because of the unguarded
dereferencing of the parm type list, that usually contains a
terminator.  The terminator is not there for varargs functions,
however, and without any other args, we ended up dereferencing a NULL
pointer.  Oops.

Guarding the accesses there was easy, but I missed the sorry message
we got in other testcases that passed arguments through the ellipsis
in inherited ctors.  I put a check in, and noticed the inherited ctors
were synthesized with the location assigned to the class name,
although they were initially assigned the location of the using
declaration.  I decided the latter was better, and arranged for the
better location to be retained.

Further investigation revealed the lack of a sorry message had to do
with the call being in a non-evaluated context, in this case, a
noexcept expression.  The sorry would be correctly reported in other
contexts, so I rolled back the check I'd added, but retained the
source location improvement.

I was still concerned about issuing sorry messages while instantiating
template ctors even in non-evaluated contexts, e.g., if a template
ctor had a base initializer that used an inherited ctor with enough
arguments that they'd go through an ellipsis.  I wanted to defer the
instantiation of such template ctors, but that would have been wrong
for constexpr template ctors, and already done for non-constexpr ones.
So, I just consolidated multiple test variants into a single testcase
that explores and explains various of the possibilities I thought of.

Regstrapped on x86_64- and i686-linux-gnu, mistakenly along with a patch
with a known regression, and got only that known regression.  Retesting
without it.  Ok to install?


for  gcc/cp/ChangeLog

PR c++/88146
* method.c (do_build_copy_constructor): Do not crash with
ellipsis-only parm list.
(synthesize_method): Retain location of inherited ctor.

for  gcc/testsuite/ChangeLog

PR c++/88146
* g++.dg/cpp0x/inh-ctor32.C: New.
---
  gcc/cp/method.c |9 +
  gcc/testsuite/g++.dg/cpp0x/inh-ctor32.C |  229 +++
  2 files changed, 234 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/inh-ctor32.C

diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index fd023e200538..41d609fb1de6 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -643,7 +643,7 @@ do_build_copy_constructor (tree fndecl)
bool trivial = trivial_fn_p (fndecl);
tree inh = DECL_INHERITED_CTOR (fndecl);
  
-  if (!inh)

+  if (parm && !inh)
  parm = convert_from_reference (parm);


If inh is false, we're a copy constructor, which always has a parm, so 
this hunk seems unnecessary.


  
if (trivial)

@@ -677,7 +677,7 @@ do_build_copy_constructor (tree fndecl)
  {
tree fields = TYPE_FIELDS (current_class_type);
tree member_init_list = NULL_TREE;
-  int cvquals = cp_type_quals (TREE_TYPE (parm));
+  int cvquals = parm ? cp_type_quals (TREE_TYPE (parm)) : 0;


This could also check !inh.

Jason


Re: [C++ PATCH] [PR c++/87814] undefer deferred noexcept on tsubst if request

2018-12-14 Thread Jason Merrill

On 12/6/18 7:19 PM, Alexandre Oliva wrote:

tsubst_expr and tsubst_copy_and_build are not expected to handle
DEFERRED_NOEXCEPT exprs, but if tsubst_exception_specification takes a
DEFERRED_NOEXCEPT expr with !defer_ok, it just passes the expr on for
tsubst_copy_and_build to barf.

This patch arranges for tsubst_exception_specification to combine the
incoming args with those already stored in a DEFERRED_NOEXCEPT, and
then substitute them into the pattern, when retaining a deferred
noexcept is unacceptable.

Regstrapped on x86_64- and i686-linux-gnu, mistakenly along with a patch
with a known regression, and got only that known regression.  Retesting
without it.  Ok to install?


OK.

Jason



Re: [C++ Patch] PR 84644 ("internal compiler error: in warn_misplaced_attr_for_class_type, at cp/decl.c:4718")

2018-12-14 Thread Jason Merrill

On 12/14/18 1:44 PM, Paolo Carlini wrote:

Hi,

On 13/12/18 22:03, Jason Merrill wrote:

On 10/30/18 9:22 PM, Paolo Carlini wrote:

Hi,

On 30/10/18 21:37, Jason Merrill wrote:

On 10/26/18 2:02 PM, Paolo Carlini wrote:

On 26/10/18 17:18, Jason Merrill wrote:
On Fri, Oct 26, 2018 at 4:52 AM Paolo Carlini 
 wrote:

On 24/10/18 22:41, Jason Merrill wrote:

On 10/15/18 12:45 PM, Paolo Carlini wrote:

 && ((TREE_CODE (declspecs->type) != TYPENAME_TYPE
+   && TREE_CODE (declspecs->type) != DECLTYPE_TYPE
  && MAYBE_CLASS_TYPE_P (declspecs->type))
I would think that the MAYBE_CLASS_TYPE_P here should be 
CLASS_TYPE_P,

and then we can remove the TYPENAME_TYPE check.  Or do we want to
allow template type parameters for some reason?
Indeed, it would be nice to just use OVERLOAD_TYPE_P. However it 
seems
we at least want to let through TEMPLATE_TYPE_PARMs representing 
'auto'

- otherwise Dodji's check a few lines below which fixed c++/51473
doesn't work anymore - and also BOUND_TEMPLATE_TEMPLATE_PARM, 
otherwise
we regress on template/spec32.C and template/ttp22.C because we 
don't

diagnose the shadowing anymore. Thus, I would say either we keep on
using MAYBE_CLASS_TYPE_P or we pick what we need, possibly we add 
a comment?

Aha.  I guess the answer is not to restrict that test any more, but
instead to fix the code further down so it gives a proper diagnostic
rather than call warn_misplaced_attr_for_class_type.


I see. Thus something like the below? It passes testing on 
x86_64-linux.



+  if ((!declared_type || TREE_CODE (declared_type) == DECLTYPE_TYPE)
+  && ! saw_friend && !error_p)
 permerror (input_location, "declaration does not declare 
anything");


I see no reason to make this specific to decltype.  Maybe move this 
diagnostic into the final 'else' block with the other declspec 
diagnostics and not look at declared_type at all?


I'm not sure to fully understand: if we do that we still want to at 
least minimally check that declared_type is null, like we already do, 
and then we simply accept the new testcase. Is that Ok? Because, as I 
probably mentioned at some point, all the other compilers I have at 
hand issue a "does not declare anything" diagnostic, and we likewise 
do that for the legacy __typeof. Not looking into declared_type *at 
all* doesn't work with plain class types and enums, of course. Or you 
meant something entirely different??



+  if (declspecs->attributes && warn_attributes && declared_type
+  && TREE_CODE (declared_type) != DECLTYPE_TYPE)


I think we do want to give a diagnostic about useless attributes, 
not skip it.


Agreed. FWIW the attached tests fine.


The problem here is that the code toward the bottom expects 
"declared_type" to be the tagged type declared by a declaration with 
no declarator, and in this testcase it's ending up as a DECLTYPE_TYPE.


I think once we've checked for 'auto' we don't want declared_type to 
be anything that isn't OVERLOAD_TYPE_P.  We can arrange that either by 
checking for 'auto' first and then changing the code that sets 
declared_type to use OVERLOAD_TYPE_P, or by clearing declared_type 
after checking for 'auto' if it isn't OVERLOAD_TYPE_P.


Thanks. I'm slowly catching up on this issue... Any suggestion about 
BOUND_TEMPLATE_TEMPLATE_PARM? If we don't let through such tree nodes - 
which are MAYBE_CLASS_TYPE_P and aren't OVERLOAD_TYPE_P - we regress on 
template/spec32.C, we don't reject it anymore.


If we clear declared_type for a BOUND_TEMPLATE_TEMPLATE_PARM, we should 
get the "does not declare anything" error.


Jason




Re: [C++ PATCH] [PR c++/87814] undefer deferred noexcept on tsubst if request

2018-12-14 Thread Alexandre Oliva
On Dec  6, 2018, Alexandre Oliva  wrote:

> Regstrapped on x86_64- and i686-linux-gnu, mistakenly along with a patch
> with a known regression, and got only that known regression.  Retesting
> without it.  Ok to install?

Ping?  That retesting confirmed no regressions.
https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00423.html


> for  gcc/cp/ChangeLog

>   PR c++/87814
>   * pt.c (tsubst_exception_specification): Handle
>   DEFERRED_NOEXCEPT with !defer_ok.

> for  gcc/testsuite/ChangeLog

>   PR c++/87814
>   * g++.dg/cpp1z/pr87814.C: New.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


Re: [C++ Patch] [PR c++/88146] do not crash synthesizing inherited ctor(...)

2018-12-14 Thread Alexandre Oliva
On Dec  6, 2018, Alexandre Oliva  wrote:

> Regstrapped on x86_64- and i686-linux-gnu, mistakenly along with a patch
> with a known regression, and got only that known regression.  Retesting
> without it.  Ok to install?

Ping?  That round of retesting confirmed no regressions.
https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00424.html


> for  gcc/cp/ChangeLog

>   PR c++/88146
>   * method.c (do_build_copy_constructor): Do not crash with
>   ellipsis-only parm list.
>   (synthesize_method): Retain location of inherited ctor.

> for  gcc/testsuite/ChangeLog

>   PR c++/88146
>   * g++.dg/cpp0x/inh-ctor32.C: New.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


Re: [nvptx] vector length patch series

2018-12-14 Thread Tom de Vries
On 29-10-18 20:28, Cesar Philippidis wrote:
> On 10/5/18 23:22, Tom de Vries wrote:
>> On 9/18/18 10:04 PM, Cesar Philippidis wrote:
>>> 591973d3c3a [nvptx] use user-defined vectors when possible
>>
>> If I drop this patch, I get the same test results. Can you find a
>> testcase for which this patch has an effect?
> 
> I just revisited the vector length patch series, and that patch in
> specific is bogus and can be safely dropped.
> 

Hi Thomas,

The new vector length patch series contains these patches:
...
0001-libgomp-OpenACC-Adjust-offsets-for-present-data-clau.patch
0002-nvptx-Update-insufficient-launch-message-to-accommod.patch
0003-openacc-Add-target-hook-TARGET_GOACC_ADJUST_PARALLEL.patch
0004-openacc-Make-GFC-default-to-1-for-OpenACC-routine-di.patch
0005-nvptx-update-openacc-dim-macros.patch
0006-nvptx-Rename-worker_bcast-variables-oacc_bcast.patch
0007-nvptx-consolidate-offloaded-function-attributes-into.patch
0008-nvptx-make-nvptx-state-propagation-function-names-mo.patch
0009-nvptx-Fix-whitespace-in-nvptx_single-and-nvptx_neute.patch
0010-nvptx-only-use-one-bar.sync-barriers-in-OpenACC-offl.patch
0011-nvptx-Add-thread-count-parm-to-bar.sync.patch
0012-nvptx-Add-axis_dim.patch
0013-nvptx-Use-TARGET_SET_CURRENT_FUNCTION.patch
0014-nvptx-Use-MAX-MIN-ROUND_UP-macros.patch
0015-nvptx-Generalize-state-propagation-and-synchronizati.patch
0016-nvptx-Add-vector_length-128-testcases.patch
0017-nvptx-Enable-large-vectors.patch
0018-nvptx-Handle-large-vectors-in-libgomp.patch
0019-nvptx-Enable-worker-partitioning-with-warp-sized-vec.patch
0020-nvptx-Simplifly-logic-in-nvptx_single.patch
0021-PR85246-nvptx-Fix-propagation-of-branch-cond-in-vw-n.patch
0022-nvptx-openacc-Don-t-emit-barriers-for-empty-loops.patch
0023-nvptx-Force-vl32-if-calling-vector-partitionable-rou.patch
0024-nvptx-Handle-large-vector-reductions.patch
0025-OpenACC-Enable-firstprivate-OpenACC-reductions.patch
...

> 0001-libgomp-OpenACC-Adjust-offsets-for-present-data-clau.patch

This patch (well, a variant of it) has been committed (although it's not
clear to me why this was included in this patch series).

> 0025-OpenACC-Enable-firstprivate-OpenACC-reductions.patch

This patch is not required for this patch series. If you remove it,
vred2d-128.c and gemm.f90 start to fail, which is trivially fixable by
adding firstprivate clauses according to the test-cases.

> 0004-openacc-Make-GFC-default-to-1-for-OpenACC-routine-di.patch

If I remove this, I run into ICEs in the compiler, but I think that
means we need to understand and fix that ICE, instead of concluding that
we need this patch. It looks completely unrelated.

Thanks,
- Tom


[PATCH] Fix avx512f_sfixupimm* (PR target/88489)

2018-12-14 Thread Jakub Jelinek
Hi!

The avx512f-vfixupimms{s,d}-2.c testcases were miscompiled with -mavx512vl.
The problem is that there are separate instructions (e.g. vfixupimmsd vs.
vfixupimmpd) with different behavior; the first one is
TARGET_AVX512F, the latter TARGET_AVX512VL, but the 128-bit version of the
latter used an identical RTL pattern to the vfixupimmsd instruction and was
defined earlier.

Fixed by using different UNSPEC number for the scalar vs. vector ones.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-12-14  Jakub Jelinek  

PR target/88489
* config/i386/sse.md (UNSPEC_SFIXUPIMM): New unspec enumerator.
(avx512f_sfixupimm): Use it
instead of UNSPEC_FIXUPIMM.

* gcc.target/i386/avx512vl-vfixupimmsd-2.c: New test.
* gcc.target/i386/avx512vl-vfixupimmss-2.c: New test.

--- gcc/config/i386/sse.md.jj   2018-12-13 08:59:03.0 +0100
+++ gcc/config/i386/sse.md  2018-12-14 18:43:04.924881740 +0100
@@ -95,6 +95,7 @@ (define_c_enum "unspec" [
   UNSPEC_RCP14
   UNSPEC_RSQRT14
   UNSPEC_FIXUPIMM
+  UNSPEC_SFIXUPIMM
   UNSPEC_SCALEF
   UNSPEC_VTERNLOG
   UNSPEC_GETEXP
@@ -8872,7 +8873,7 @@ (define_insn "avx512f_sfixupimm 2 "" 
"")
 (match_operand:SI 3 "const_0_to_255_operand")]
-   UNSPEC_FIXUPIMM))]
+   UNSPEC_SFIXUPIMM))]
"TARGET_AVX512F"
   "vfixupimm\t{%3, %2, %1, 
%0|%0, %1, %2, %3}";
   [(set_attr "prefix" "evex")
--- gcc/testsuite/gcc.target/i386/avx512vl-vfixupimmsd-2.c.jj   2018-12-14 
18:46:05.334946460 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-vfixupimmsd-2.c  2018-12-14 
18:50:11.986933429 +0100
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512vl -O2 -std=gnu99" } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-require-effective-target c99_runtime } */
+
+#define AVX512VL
+#define AVX512F_LEN 512
+#define AVX512F_LEN_HALF 256
+#include "avx512f-vfixupimmsd-2.c"
+
+static void
+test_256 (void)
+{
+  test_512 ();
+}
+
+static void
+test_128 (void)
+{
+}
--- gcc/testsuite/gcc.target/i386/avx512vl-vfixupimmss-2.c.jj   2018-12-14 
18:50:52.808269261 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-vfixupimmss-2.c  2018-12-14 
18:51:06.084053265 +0100
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512vl -O2 -std=gnu99" } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-require-effective-target c99_runtime } */
+
+#define AVX512VL
+#define AVX512F_LEN 512
+#define AVX512F_LEN_HALF 256
+#include "avx512f-vfixupimmss-2.c"
+
+static void
+test_256 (void)
+{
+  test_512 ();
+}
+
+static void
+test_128 (void)
+{
+}

Jakub


[PATCH] Fix up AVX512F masked gather vectorization, add support for AVX512F 512-bit masked scatter vectorization (PR tree-optimization/88464)

2018-12-14 Thread Jakub Jelinek
Hi!

In the previous patch I've unfortunately left out one important case from the
testcase and apparently it wasn't covered by anything else in the testsuite.
The 3 functions covered float and double gathers with indexes with the same
bitsize and WIDENING gather (double gather with int index), but didn't cover
NARROWING case (float gather with long index with -m64).  That was the only
case that tried to permute the mask, unfortunately that isn't really
supported and ICEs.  What works is VEC_UNPACK_{LO,HI}_EXPR on the
VECTOR_BOOLEAN_TYPE_P, that is what other spots in the vectorizer emit for
those.
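
As a rough illustration of the NARROWING case (a scalar model only, not the
vectorizer code): with an integral mask, 16 float lanes are gathered in two
groups of 8 because the 64-bit indexes only fill half a vector at a time, so
each group needs its own half of the mask.  The function names below are
hypothetical, and the lane numbering assumes a little-endian target where the
"lo" half holds lanes 0..7.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of VEC_UNPACK_LO_EXPR on a 16-lane boolean mask
   held in an integer: keep the mask bits for lanes 0..7.  */
static uint8_t
mask_unpack_lo (uint16_t mask)
{
  return (uint8_t) (mask & 0xff);
}

/* Hypothetical model of VEC_UNPACK_HI_EXPR: mask bits for lanes 8..15.  */
static uint8_t
mask_unpack_hi (uint16_t mask)
{
  return (uint8_t) (mask >> 8);
}

/* Scalar reference for a masked narrowing gather: 16 float results,
   processed as two groups of 8, each group using its own half mask.
   Lanes whose mask bit is clear keep their previous value in DST.  */
static void
masked_gather_model (float *dst, const float *base, const long *idx,
		     uint16_t mask)
{
  assert (dst && base && idx);
  uint8_t half[2] = { mask_unpack_lo (mask), mask_unpack_hi (mask) };
  for (int j = 0; j < 2; j++)
    for (int lane = 0; lane < 8; lane++)
      if (half[j] & (1u << lane))
	dst[j * 8 + lane] = base[idx[j * 8 + lane]];
}
```

The (j & 1) selection between the HI and LO unpack in the patch corresponds to
the index j into half[] here.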

I had to also fix up the x86 backend, which had in expansion of these
NARROWING gather builtins code cut from the 256-bit builtin,
unfortunately it wasn't adjusted for the fact that the 512-bit builtin uses
integral mask argument while the 256-bit one doesn't.  And even in the
256-bit one there was a bug, it relied on the mask and src arguments to be
always in the same register (which is actually what the vectorizer generates
for those right now, but it could do something else).

This patch fixes that and enables also masked x86 AVX512F 512-bit
scatter support.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

What is still unhandled (doesn't vectorize) is 128-bit or 256-bit scatters,
I bet the mask operand is vectorized using normal non-bool vectors, but the
instructions with AVX512VL actually need a mask register.  There are
instructions that can handle that, but let's defer that for later.

2018-12-14  Jakub Jelinek  

PR tree-optimization/88464
* tree-vect-stmts.c (vect_build_gather_load_calls): For NARROWING
and mask with integral masktype, don't try to permute mask vectors,
instead emit VEC_UNPACK_{LO,HI}_EXPR.  Fix up NOP_EXPR operand.
(vectorizable_store): Handle masked scatters with decl and integral
mask type.
(permute_vec_elements): Allow scalar_dest to be NULL.
* config/i386/i386.c (ix86_get_builtin)
: Use lowpart_subreg for masks.
: Don't assume mask and src have
to be the same.

* gcc.target/i386/avx512f-pr88462-1.c: Rename to ...
* gcc.target/i386/avx512f-pr88464-1.c: ... this.  Fix up PR number.
Expect 4 vectorized loops instead of 3.
(f4): New function.
* gcc.target/i386/avx512f-pr88462-2.c: Rename to ...
* gcc.target/i386/avx512f-pr88464-2.c: ... this.  Fix up PR number
and #include.
(avx512f_test): Prepare arguments for f4 and check the results.
* gcc.target/i386/avx512f-pr88464-3.c: New test.
* gcc.target/i386/avx512f-pr88464-4.c: New test.

--- gcc/tree-vect-stmts.c.jj2018-12-13 18:01:13.0 +0100
+++ gcc/tree-vect-stmts.c   2018-12-14 17:10:42.079054458 +0100
@@ -2655,6 +2655,7 @@ vect_build_gather_load_calls (stmt_vec_i
   if (mask && TREE_CODE (masktype) == INTEGER_TYPE)
 masktype = build_same_sized_truth_vector_type (srctype);
 
+  tree mask_halftype = masktype;
   tree perm_mask = NULL_TREE;
   tree mask_perm_mask = NULL_TREE;
   if (known_eq (nunits, gather_off_nunits))
@@ -2690,13 +2691,16 @@ vect_build_gather_load_calls (stmt_vec_i
 
   ncopies *= 2;
 
-  if (mask)
+  if (mask && masktype == real_masktype)
{
  for (int i = 0; i < count; ++i)
sel[i] = i | (count / 2);
  indices.new_vector (sel, 2, count);
  mask_perm_mask = vect_gen_perm_mask_checked (masktype, indices);
}
+  else if (mask)
+   mask_halftype
+ = build_same_sized_truth_vector_type (gs_info->offset_vectype);
 }
   else
 gcc_unreachable ();
@@ -2761,7 +2765,7 @@ vect_build_gather_load_calls (stmt_vec_i
{
  if (j == 0)
vec_mask = vect_get_vec_def_for_operand (mask, stmt_info);
- else
+ else if (modifier != NARROW || (j & 1) == 0)
vec_mask = vect_get_vec_def_for_stmt_copy (loop_vinfo,
   vec_mask);
 
@@ -2779,17 +2783,27 @@ vect_build_gather_load_calls (stmt_vec_i
  mask_op = var;
}
}
+ if (modifier == NARROW && masktype != real_masktype)
+   {
+ var = vect_get_new_ssa_name (mask_halftype, vect_simple_var);
+ gassign *new_stmt
+   = gimple_build_assign (var, (j & 1) ? VEC_UNPACK_HI_EXPR
+   : VEC_UNPACK_LO_EXPR,
+  mask_op);
+ vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
+ mask_op = var;
+   }
  src_op = mask_op;
}
 
   tree mask_arg = mask_op;
   if (masktype != real_masktype)
{
- tree utype;
- if (TYPE_MODE (real_masktype) == TYPE_MODE (masktype))
+ tree utype, optype = TREE_TYPE (mask_op);
+ if (TYPE_MODE 

Re: [C++ Patch] PR 84644 ("internal compiler error: in warn_misplaced_attr_for_class_type, at cp/decl.c:4718")

2018-12-14 Thread Paolo Carlini

Hi,

On 13/12/18 22:03, Jason Merrill wrote:

On 10/30/18 9:22 PM, Paolo Carlini wrote:

Hi,

On 30/10/18 21:37, Jason Merrill wrote:

On 10/26/18 2:02 PM, Paolo Carlini wrote:

On 26/10/18 17:18, Jason Merrill wrote:
On Fri, Oct 26, 2018 at 4:52 AM Paolo Carlini 
 wrote:

On 24/10/18 22:41, Jason Merrill wrote:

On 10/15/18 12:45 PM, Paolo Carlini wrote:

 && ((TREE_CODE (declspecs->type) != TYPENAME_TYPE
+   && TREE_CODE (declspecs->type) != DECLTYPE_TYPE
  && MAYBE_CLASS_TYPE_P (declspecs->type))
I would think that the MAYBE_CLASS_TYPE_P here should be 
CLASS_TYPE_P,

and then we can remove the TYPENAME_TYPE check.  Or do we want to
allow template type parameters for some reason?
Indeed, it would be nice to just use OVERLOAD_TYPE_P. However it 
seems
we at least want to let through TEMPLATE_TYPE_PARMs representing 
'auto'

- otherwise Dodji's check a few lines below which fixed c++/51473
doesn't work anymore - and also BOUND_TEMPLATE_TEMPLATE_PARM, 
otherwise
we regress on template/spec32.C and template/ttp22.C because we 
don't

diagnose the shadowing anymore. Thus, I would say either we keep on
using MAYBE_CLASS_TYPE_P or we pick what we need, possibly we add 
a comment?

Aha.  I guess the answer is not to restrict that test any more, but
instead to fix the code further down so it gives a proper diagnostic
rather than call warn_misplaced_attr_for_class_type.


I see. Thus something like the below? It passes testing on 
x86_64-linux.



+  if ((!declared_type || TREE_CODE (declared_type) == DECLTYPE_TYPE)
+  && ! saw_friend && !error_p)
 permerror (input_location, "declaration does not declare 
anything");


I see no reason to make this specific to decltype.  Maybe move this 
diagnostic into the final 'else' block with the other declspec 
diagnostics and not look at declared_type at all?


I'm not sure to fully understand: if we do that we still want to at 
least minimally check that declared_type is null, like we already do, 
and then we simply accept the new testcase. Is that Ok? Because, as I 
probably mentioned at some point, all the other compilers I have at 
hand issue a "does not declare anything" diagnostic, and we likewise 
do that for the legacy __typeof. Not looking into declared_type *at 
all* doesn't work with plain class types and enums, of course. Or you 
meant something entirely different??



+  if (declspecs->attributes && warn_attributes && declared_type
+  && TREE_CODE (declared_type) != DECLTYPE_TYPE)


I think we do want to give a diagnostic about useless attributes, 
not skip it.


Agreed. FWIW the attached tests fine.


The problem here is that the code toward the bottom expects 
"declared_type" to be the tagged type declared by a declaration with 
no declarator, and in this testcase it's ending up as a DECLTYPE_TYPE.


I think once we've checked for 'auto' we don't want declared_type to 
be anything that isn't OVERLOAD_TYPE_P.  We can arrange that either by 
checking for 'auto' first and then changing the code that sets 
declared_type to use OVERLOAD_TYPE_P, or by clearing declared_type 
after checking for 'auto' if it isn't OVERLOAD_TYPE_P.


Thanks. I'm slowly catching up on this issue... Any suggestion about 
BOUND_TEMPLATE_TEMPLATE_PARM? If we don't let through such tree nodes - 
which are MAYBE_CLASS_TYPE_P and aren't OVERLOAD_TYPE_P - we regress on 
template/spec32.C, we don't reject it anymore.


Paolo.



Re: [PATCH] Delete powerpcspe

2018-12-14 Thread Joseph Myers
On Fri, 14 Dec 2018, Jeff Law wrote:

> > I wonder if we could set up auto-(simulator)-testing for all supported
> > archs (and build testing for all supported configs) on the CF
> > (with the required scripting in contrib/ so it's easy to replicate).  I'd
> > simply test only released snapshots to keep the load reasonable
> > and besides posting to gcc-testresults also post testresults
> > differences to gcc-regression?
> It's certainly possible.  Though I've found that managing this kind of
> thing with Jenkins is far easier than rolling our own.  I'd be happy to
> move an instance out into the CF.

On the other hand, in glibc having a single script build-many-glibcs.py 
that does everything (rather than some external piece of software to 
orchestrate builds) serves to make it easy for people making changes with 
cross-architecture risks to run the compilation tests for all 
architectures - although doing so is slow unless you have lots of CPU 
cores.

-- 
Joseph S. Myers
jos...@codesourcery.com


Linux x86 unwinder: Handle __NR_sigreturn for __kernel_sigreturn support

2018-12-14 Thread Florian Weimer
I believe this may address recent unwinder failures in Fedora if the
vDSO unwinder does not contain unwinding data:

  

The question is: Do we want to move in that direction?  Or should we
make clear that the userspace ABI *requires* unwinding information?

Thanks,
Florian

2018-12-14  Florian Weimer  

* config/i386/linux-unwind.h (x86_frob_update_context): Also check
for __NR_sigreturn.

diff --git a/libgcc/config/i386/linux-unwind.h 
b/libgcc/config/i386/linux-unwind.h
index ea838e4e47b..502a87a2cb0 100644
--- a/libgcc/config/i386/linux-unwind.h
+++ b/libgcc/config/i386/linux-unwind.h
@@ -190,9 +190,10 @@ x86_frob_update_context (struct _Unwind_Context *context,
 {
   unsigned char *pc = context->ra;
 
-  /* movl $__NR_rt_sigreturn,%eax ; {int $0x80 | syscall}  */
+  /* movl $__NR_{rt_|}sigreturn,%eax ; {int $0x80 | syscall}  */
   if (*(unsigned char *)(pc+0) == 0xb8
-  && *(unsigned int *)(pc+1) == 173
+  && (*(unsigned int *)(pc+1) == 119
+ || *(unsigned int *)(pc+1) == 173)
   && (*(unsigned short *)(pc+5) == 0x80cd
  || *(unsigned short *)(pc+5) == 0x050f))
 _Unwind_SetSignalFrame (context, 1);
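
For reference, the byte pattern tested by the patched condition can be written
as a small standalone check.  This is only a sketch: the function name is
hypothetical, the bytes are compared one at a time instead of through the
unaligned casts used in linux-unwind.h, and 119 and 173 are the syscall
numbers taken from the patch above.

```c
#include <assert.h>
#include <stddef.h>

/* Syscall numbers as used in the patch above.  */
enum { NR_SIGRETURN = 119, NR_RT_SIGRETURN = 173 };

/* Return 1 if the 7 bytes at PC encode
     movl $NR,%eax ; int $0x80     or
     movl $NR,%eax ; syscall
   where NR is sigreturn or rt_sigreturn; return 0 otherwise.  */
static int
looks_like_sigreturn_stub (const unsigned char *pc)
{
  assert (pc != NULL);

  /* 0xb8 is the opcode for movl $imm32,%eax.  */
  if (pc[0] != 0xb8)
    return 0;

  /* The immediate is stored little-endian in the next four bytes.  */
  unsigned int nr = (unsigned int) pc[1]
		    | (unsigned int) pc[2] << 8
		    | (unsigned int) pc[3] << 16
		    | (unsigned int) pc[4] << 24;
  if (nr != NR_SIGRETURN && nr != NR_RT_SIGRETURN)
    return 0;

  /* cd 80 is int $0x80; 0f 05 is syscall.  */
  int is_int80 = pc[5] == 0xcd && pc[6] == 0x80;
  int is_syscall = pc[5] == 0x0f && pc[6] == 0x05;
  return is_int80 || is_syscall;
}
```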


Re: [PATCH, libphobos] Committed remove unused internal modules

2018-12-14 Thread Iain Buclaw
On Mon, 19 Nov 2018 at 00:40, Iain Buclaw  wrote:
>
> Hi,
>
> This patch removes two x86-centric modules that are ignored as gdc
> doesn't implement the D_InlineAsm version condition.  Bootstrapped and
> testsuite ran on x86_64-linux-gnu.
>
> Committed to trunk as r266256
>
> --
> Iain
> ---
> libphobos/ChangeLog:
>
> 2018-11-19  Iain Buclaw  
>
> * src/Makefile.am: Remove std.internal.digest.sha_SSSE3 and
> std.internal.math.biguintx86 modules.
> * src/Makefile.in: Rebuild.
> * src/std/internal/digest/sha_SSSE3.d: Remove.
> * src/std/internal/math/biguintx86.d: Remove.
>
> ---

The digest directory was made empty by this, but wasn't removed.  I've
committed its removal as obvious.

-- 
Iain


Re: [PATCH 1/6, OpenACC, libgomp] Async re-work, interfaces

2018-12-14 Thread Thomas Schwinge
Hi!

A few more -- final? ;-) -- comments:

On Tue, 25 Sep 2018 21:10:21 +0800, Chung-Lin Tang  
wrote:
> This patch separates out the header interface changes. GOMP_VERSION has been 
> bumped,
> and various changes to the plugin interface, and a few libgomp internal 
> functions
> declared. The libgomp linkmap updated as well.

> --- a/include/gomp-constants.h
> +++ b/include/gomp-constants.h

> @@ -199,7 +200,7 @@ enum gomp_map_kind
>  /* Versions of libgomp and device-specific plugins.  GOMP_VERSION
> should be incremented whenever an ABI-incompatible change is introduced
> to the plugin interface defined in libgomp/libgomp.h.  */
> -#define GOMP_VERSION 1
> +#define GOMP_VERSION 2
>  #define GOMP_VERSION_NVIDIA_PTX 1
>  #define GOMP_VERSION_INTEL_MIC 0
>  #define GOMP_VERSION_HSA 0

OK, I think -- but I'm never quite sure whether we do need to increment
"GOMP_VERSION" when only doing libgomp-internal libgomp-plugin changes,
which don't affect the user/GCC side?

GCC encodes "GOMP_VERSION" in "GOMP_offload_register_ver" calls
synthesized by "mkoffload": "GOMP_VERSION_PACK (/* LIB */ GOMP_VERSION,
/* DEV */ GOMP_VERSION_NVIDIA_PTX)", and then at run time libgomp checks
in "GOMP_offload_register_ver", so that we don't try to load offloading
code with an _old_ libgomp that has been compiled with/for the _new_
version.  (Right?)

void
GOMP_offload_register_ver (unsigned version, const void *host_table,
   int target_type, const void *target_data)
{ [...]
  if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
gomp_fatal ("Library too old for offload (version %u < %u)",
GOMP_VERSION, GOMP_VERSION_LIB (version));

I don't have a problem with your change per se, but wouldn't we still be
able to load such code, given that we only changed the libgomp-internal
libgomp-plugin interface?

Am I confused?

Or is the above just an (unavoidable?) side effect, because we do need to
increment "GOMP_VERSION" for this check here:

  if (device->version_func () != GOMP_VERSION)
{
  err = "plugin version mismatch";
  goto fail;
}

..., which is making sure that the libgomp proper vs. libgomp-plugin
versions match.
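
To make the two checks concrete, here is a minimal sketch of how they interact
when only the library version is bumped.  The SKETCH_* macros and both
function names are hypothetical, and the 16/16-bit split between library and
device version is an assumption made for illustration, not a quote of
gomp-constants.h.

```c
#include <assert.h>

/* Assumed layout: library version in the upper 16 bits, device version
   in the lower 16 bits.  */
#define SKETCH_VERSION_PACK(LIB, DEV) \
  (((unsigned) (LIB) << 16) | (unsigned) (DEV))
#define SKETCH_VERSION_LIB(PACK) (((PACK) >> 16) & 0xffffu)
#define SKETCH_VERSION_DEV(PACK) ((PACK) & 0xffffu)

/* Check 1 (offload image vs. libgomp): an image is rejected only if it
   was built for a newer library version than the one running.  */
static int
image_loadable (unsigned image_version, unsigned lib_version)
{
  assert (lib_version <= 0xffffu);
  return SKETCH_VERSION_LIB (image_version) <= lib_version;
}

/* Check 2 (plugin vs. libgomp): the versions have to match exactly.  */
static int
plugin_loadable (unsigned plugin_version, unsigned lib_version)
{
  return plugin_version == lib_version;
}
```

Under these assumptions an image built against version 1 still loads into a
version 2 library, while a version 1 plugin does not, which is the asymmetry
the questions above are about.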


> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map
> @@ -458,7 +462,6 @@ GOMP_PLUGIN_1.0 {
>   GOMP_PLUGIN_debug;
>   GOMP_PLUGIN_error;
>   GOMP_PLUGIN_fatal;
> - GOMP_PLUGIN_async_unmap_vars;
>   GOMP_PLUGIN_acc_thread;
>  };

I think that's fine, but highlighting this again for Jakub, in case
there's an issue with removing a symbol from the libgomp-plugin
interface.


> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h

> +/* Opaque type to represent plugin-dependent implementation of an
> +   OpenACC asynchronous queue.  */
> +struct goacc_asyncqueue;
> +
> +/* Used to keep a list of active asynchronous queues.  */
> +struct goacc_asyncqueue_list
> +{
> +  struct goacc_asyncqueue *aq;
> +  struct goacc_asyncqueue_list *next;
> +};
> +
> +typedef struct goacc_asyncqueue *goacc_aq;
> +typedef struct goacc_asyncqueue_list *goacc_aq_list;

I'm not too fond of such "syntactic sugar" typedefs, but if that's fine
for Jakub to have in libgomp, then I won't object.

I'd be in favor then of "typedef struct N *N" or "typedef struct N *N_t"
variants however, instead of introducing yet another "goacc_aq" acronym
next to "goacc_asyncqueue", and "async queue" or "asynchronous queue" as
used in the descriptive texts (comments, etc.).  Maybe standardize all
these to "asyncqueue", also in the descriptive texts?

OpenACC, by the way, uses the term "device activity queue" (in most?
places...) to describe the underlying mechanism used to implement the
OpenACC "async" clause etc.

Should "struct goacc_asyncqueue_list" and its typedef still be defined
here in "libgomp/libgomp-plugin.h" (for proximity to the other stuff),
even though it's not actually used in the libgomp-plugin interface?

> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -888,19 +888,23 @@ typedef struct acc_dispatch_t
[...]
> +  struct {
> +gomp_mutex_t lock;
> +int nasyncqueue;
> +struct goacc_asyncqueue **asyncqueue;
> +struct goacc_asyncqueue_list *active;
[...]
> +  } async;

For "lock" see my comments elsewhere.

That data structure itself should be fine, no need for something more
complex, given that users typically only use a handful of such queues,
with low integer ID async-arguments.
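
A minimal sketch of why a plain array is enough here, assuming queues are
looked up by a small non-negative async ID and created on first use.  All
sketch_* names are hypothetical, the queue type is a stand-in for the opaque
plugin type, and the locking that the real structure carries is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the opaque, plugin-defined queue type.  */
struct sketch_queue
{
  int id;
};

/* Mirrors the shape of the quoted struct: a count plus an array of
   queue pointers indexed by async ID.  */
struct sketch_async
{
  int nasyncqueue;
  struct sketch_queue **asyncqueue;
};

/* Return the queue for ID, growing the array and creating the queue on
   first use.  Returns NULL if memory allocation fails.  */
static struct sketch_queue *
sketch_get_queue (struct sketch_async *a, int id)
{
  assert (a != NULL && id >= 0);

  if (id >= a->nasyncqueue)
    {
      int n = id + 1;
      struct sketch_queue **grown
	= realloc (a->asyncqueue, (size_t) n * sizeof *grown);
      if (grown == NULL)
	return NULL;
      /* New slots start out empty.  */
      for (int i = a->nasyncqueue; i < n; i++)
	grown[i] = NULL;
      a->asyncqueue = grown;
      a->nasyncqueue = n;
    }

  if (a->asyncqueue[id] == NULL)
    {
      struct sketch_queue *q = malloc (sizeof *q);
      if (q == NULL)
	return NULL;
      q->id = id;
      a->asyncqueue[id] = q;
    }

  return a->asyncqueue[id];
}
```

With a handful of low IDs the array stays tiny and lookup is a single index
operation.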

I'd maybe name these members "queues_n", "queues", "queues_active".


As for the following changes, will you please make sure that there is one
common order for these, used in "libgomp/libgomp-plugin.h" function
prototypes, "libgomp/libgomp.h:acc_dispatch_t",
"libgomp/target.c:gomp_load_plugin_for_device", "libgomp/oacc-host.c"
function definitions as well as in "host_dispatch", and the
libgomp-plugin(s) themselves (that's all, I think?).

> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h
> 

[committed] Fix minor goof in last change for TARGET_ASM_POST_CFI_STARTPROC

2018-12-14 Thread Jeff Law

It looks like Jason asked Sam to make a last minute doc change.  Sam
made that change in the tm.texi file, but target.def has the old text.

This causes a build failure.  I'm guessing the wrong target.def was
committed.  Regardless the fix is trivial.

Jeff
diff --git a/gcc/target.def b/gcc/target.def
index c425341ac3..698c3aa796 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -98,7 +98,8 @@ DEFHOOK
 (post_cfi_startproc,
   "This target hook is used to emit assembly strings required by the target\n\
 after the .cfi_startproc directive.  The first argument is the file stream 
to\n\
-write the strings to and the second argument is the function\'s declaration.\n\
+write the strings to and the second argument is the function\'s declaration.  
The\n\
+expected use is to add more .cfi_* directives.\n\
 \n\
 The default is to not output any assembly strings.",
   void, (FILE *, tree),


Re: Add a loop versioning pass

2018-12-14 Thread Richard Sandiford
Richard Biener  writes:
> On December 12, 2018 7:43:10 PM GMT+01:00, Richard Sandiford 
>  wrote:
>>Richard Biener  writes:
>>> On Thu, Dec 6, 2018 at 2:19 PM Richard Sandiford
>>>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and aarch64_be-elf.
>>>> Also repeated the performance testing (but haven't yet tried an
>>>> LTO variant; will do that over the weekend).
>>>
>>> Any results?
>>
>>Sorry, I should've remembered that finding time to run tests is easy,
>>finding time to analyse them is hard.
>>
>>Speed-wise, the impact of the patch for LTO is similar to without,
>>with 554.roms_r being the main beneficiary for both AArch64 and x86_64.
>>I get a 6.8% improvement on Cortex-A57 with -Ofast -mcpu=native
>>-flto=jobserver.
>>
>>Size-wise, there are three tests that grow by >=2% on x86_64:
>>
>>549.fotonik3d_r: 5.5%
>>548.exchange2_r: 29.5%
>>554.roms_r: 39.6%
>
> Uh. With LTO we might have a reasonable guessed profile and you do have a 
> optimize_loop_nest_for_speed guard on the transform? 

Guard now added :-)  But unfortunately it doesn't make any significant
difference.  548.exchange2_r goes from 29.5% to 27.7%, but the other
two are the same as before.

> How does compile time fare with the above benchmarks?

For 554.roms_r it's +80%(!) with -flto=1, but I think that's par for
the course given the increase in function sizes.

For 549.fotonik3d_r it's +5% with -flto=1.

For 503.bwaves_r (as an example of a benchmark whose size doesn't change),
the difference is in the noise.

[...]

>>You mean something like:
>>
>>  real :: foo(:,:), bar(:)
>>
>>  do i=1,n
>>do j=1,n
>>  foo(i,j) = ...
>>end do
>>bar(i) = ..
>>  end do
>>
>>?  I can add a test if so.
>
> Please. 

OK, I've added them to loop_versioning_4.f90.
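
For readers less familiar with the transform under test, the following is a
hand-written C sketch of what versioning a loop for a stride of one amounts
to.  It is not output of the pass; the function name and the choice of a
runtime "stride == 1" test are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Double N elements of X that are STRIDE elements apart.  The loop is
   written twice: once specialised for a stride of one, where accesses
   are contiguous, and once in the general form.  */
static void
scale_by_two (double *x, ptrdiff_t stride, ptrdiff_t n)
{
  assert (x != NULL || n == 0);

  if (stride == 1)
    {
      /* Versioned copy: the stride is a known constant here.  */
      for (ptrdiff_t i = 0; i < n; i++)
	x[i] *= 2.0;
    }
  else
    {
      /* Fallback copy: original loop with the variable stride.  */
      for (ptrdiff_t i = 0; i < n; i++)
	x[i * stride] *= 2.0;
    }
}
```

The size growth reported above is the cost of carrying both copies of each
versioned loop.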

>>> There may also be some subtle issues with substitute_and_fold being
>>> applied to non-up-to-date SSA form given it folds stmts looking at
>>> (single-use!) SSA edges.  The single-use guard might be what saves
>>you
>>> here (SSA uses in the copies are not yet updated to point to the
>>> copied DEFs).
>>
>>OK.  I was hoping that because we only apply substitute_and_fold
>>to new code, there would be no problem with uses elsewhere.
>
> Might be, yes.
>
>>Would it be safer to:
>>
>>  - version all loops we want to version
>>  - update SSA explicitly
>>  - apply substitute and fold to all "new" loops
>
> That would be definitely less fishy. But you need to get at the actual
> 'new' SSA names for the replacements as I guess they'll be rewritten?
> Or maybe those are not.
>
>>?  Could we then get away with returning a 0 TODO at the end?
>
> Yes. 

OK, the updated patch does it this way.

Tested as before.

Thanks,
Richard


2018-12-14  Richard Sandiford  
Ramana Radhakrishnan  
Kyrylo Tkachov  

gcc/
* doc/invoke.texi (-fversion-loops-for-strides): Document
(loop-versioning-group-size, loop-versioning-max-inner-insns)
(loop-versioning-max-outer-insns): Document new --params.
* Makefile.in (OBJS): Add gimple-loop-versioning.o.
* common.opt (fversion-loops-for-strides): New option.
* opts.c (default_options_table): Enable fversion-loops-for-strides
at -O3.
* params.def (PARAM_LOOP_VERSIONING_GROUP_SIZE)
(PARAM_LOOP_VERSIONING_MAX_INNER_INSNS)
(PARAM_LOOP_VERSIONING_MAX_OUTER_INSNS): New parameters.
* passes.def: Add pass_loop_versioning.
* timevar.def (TV_LOOP_VERSIONING): New time variable.
* tree-ssa-propagate.h
(substitute_and_fold_engine::substitute_and_fold): Add an optional
block parameter.
* tree-ssa-propagate.c
(substitute_and_fold_engine::substitute_and_fold): Likewise.
When passed, only walk blocks dominated by that block.
* tree-vrp.h (range_includes_p): Declare.
(range_includes_zero_p): Turn into an inline wrapper around
range_includes_p.
* tree-vrp.c (range_includes_p): New function, generalizing...
(range_includes_zero_p): ...this.
* tree-pass.h (make_pass_loop_versioning): Declare.
* gimple-loop-versioning.cc: New file.

gcc/testsuite/
* gcc.dg/loop-versioning-1.c: New test.
* gcc.dg/loop-versioning-10.c: Likewise.
* gcc.dg/loop-versioning-11.c: Likewise.
* gcc.dg/loop-versioning-2.c: Likewise.
* gcc.dg/loop-versioning-3.c: Likewise.
* gcc.dg/loop-versioning-4.c: Likewise.
* gcc.dg/loop-versioning-5.c: Likewise.
* gcc.dg/loop-versioning-6.c: Likewise.
* gcc.dg/loop-versioning-7.c: Likewise.
* gcc.dg/loop-versioning-8.c: Likewise.
* gcc.dg/loop-versioning-9.c: Likewise.
* gfortran.dg/loop_versioning_1.f90: Likewise.
* gfortran.dg/loop_versioning_2.f90: Likewise.
* gfortran.dg/loop_versioning_3.f90: Likewise.
* gfortran.dg/loop_versioning_4.f90: Likewise.
* gfortran.dg/loop_versioning_5.f90: Likewise.
* 

Re: [PATCH, ARM] Improve robustness of -mslow-flash-data

2018-12-14 Thread Kyrill Tkachov

Hi Thomas,

On 11/12/18 16:09, Thomas Preudhomme wrote:

Hi Kyrill,

I've tested on armeb-none-eabi with -mslow-flash-data for both
-mfloat-abi=hard and -mfloat-abi=soft. Both show no regression and the
former shows some new PASS.

Regarding the part you are hesitant about, the code was taken from
aarch64_reinterpret_float_as_int in config/aarch64/aarch64.c. I'm not
too keen on splitting the patch unless it's just for review (i.e. still
committed as one) since the changes really go together. The tighter
predicate and constraint prevent the normal patterns from matching when
-mslow-flash-data is in effect, while the new splitter and expander
deal with loads under those circumstances.

Best regards,
Thomas


Ok. Thanks for the explanation.
Kyrill


On Fri, 30 Nov 2018 at 14:11, Kyrill Tkachov
 wrote:

Hi Thomas,

On 19/11/18 17:56, Thomas Preudhomme wrote:

Hi,

Current code to handle -mslow-flash-data in machine description files
suffers from a number of issues which this patch fixes:

1) The insn_and_split patterns in vfp.md that load a generic floating-point
constant via a GPR first and then move it to a VFP register are guarded by
!reload_completed, which is explicitly forbidden by the GCC internals
documentation, section 17.2 point 3;

2) A number of testcases in the testsuite ICE under -mslow-flash-data
when targeting the hardfloat ABI [1];

3) Instructions performing load from literal pool are not disabled.

These problems are addressed by 2 separate actions:

1) Making the splitters take a clobber and changing the expanders
accordingly to generate a mov with clobber in cases where a literal
pool would be used. The splitter can thus be enabled after reload since
it does not call gen_reg_rtx anymore;

2) Adding new predicates and constraints to disable literal pool loads
in existing instructions when -mslow-flash-data is in effect.


Please split these into two separate patches so we can more clearly see which 
changes address which problem


The patch also reworks the splitter for DFmode slightly to generate an
intermediate DI load instead of 2 intermediate SI loads, thus relying on
the existing DI splitters instead of redoing their job. Lastly, the
patch adds the missing arm_fp_ok effective target to some of the
slow-flash-data testcases.

[1]
c-c++-common/Wunused-var-3.c
gcc.c-torture/compile/pr72771.c
gcc.c-torture/compile/vector-5.c
gcc.c-torture/compile/vector-6.c
gcc.c-torture/execute/20030914-1.c
gcc.c-torture/execute/20050316-1.c
gcc.c-torture/execute/pr59643.c
gcc.dg/builtin-tgmath-1.c
gcc.dg/debug/pr55730.c
gcc.dg/graphite/interchange-7.c
gcc.dg/pr56890-2.c
gcc.dg/pr68474.c
gcc.dg/pr80286.c
gcc.dg/torture/pr35227.c
gcc.dg/torture/pr65077.c
gcc.dg/torture/pr86363.c
g++.dg/torture/pr81112.C
g++.dg/torture/pr82985.C
g++.dg/warn/Wunused-var-7.C
and a lot more in libstdc++ in special_functions/*_comp_ellint_* and
special_functions/*_ellint_* directories.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-11-14  Thomas Preud'homme 

 * config/arm/arm.md (arm_movdi): Split if -mslow-flash-data and
 source is a constant that would be loaded by literal pool.
 (movsf expander): Generate a no_literal_pool_sf_immediate insn if
 -mslow-flash-data is present, targeting hardfloat ABI and source is a
 float constant that cannot be loaded via vmov.
 (movdf expander): Likewise but generate a no_literal_pool_df_immediate
 insn.
 (arm_movsf_soft_insn): Split if -mslow-flash-data and source is a
 float constant that would be loaded by literal pool.
 (softfloat constant movsf splitter): Splitter for the above case.
 (movdf_soft_insn): Split if -mslow-flash-data and source is a float
 constant that would be loaded by literal pool.
 (softfloat constant movdf splitter): Splitter for the above case.
 * config/arm/constraints.md (Pz): Document existing constraint.
 (Ha): Define constraint.
 (Tu): Likewise.
 * config/arm/predicates.md (hard_sf_operand): New predicate.
 (hard_df_operand): Likewise.
 * config/arm/thumb2.md (thumb2_movsi_insn): Split if
 -mslow-flash-data and constant would be loaded by literal pool.
 * config/arm/vfp.md (thumb2_movsi_vfp): Likewise and disable constant
 load in VFP register.
 (movdi_vfp): Likewise.
 (thumb2_movsf_vfp): Use hard_sf_operand as predicate for source to
 prevent match for a constant load if -mslow-flash-data and constant
 cannot be loaded via vmov.  Adapt constraint accordingly by
 using Ha instead of E for generic floating-point constant load.
 (thumb2_movdf_vfp): Likewise using hard_df_operand predicate instead.
 (no_literal_pool_df_immediate): Add a clobber to use as the
 intermediate general purpose register and also enable it after reload
 but disable it if constant is a valid FP constant.  Add constraints and
 generate a DI 

Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.

2018-12-14 Thread Sudakshina Das
Hi James

On 29/11/18 16:47, Sudakshina Das wrote:
> Hi
> 
> On 13/11/18 14:47, Sudakshina Das wrote:
>> Hi
>>
>> On 02/11/18 18:38, Sudakshina Das wrote:
>>> Hi
>>>
>>> This patch is part of a series that enables ARMv8.5-A in GCC and
>>> adds Branch Target Identification Mechanism.
>>> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>>>
>>> This patch adds a new pass called "bti" which is triggered by the
>>> command line argument -mbranch-protection whenever "bti" is turned on.
>>>
>>> The pass iterates through the instructions and adds appropriate BTI
>>> instructions based on the following:
>>>* Add a new "BTI C" at the beginning of a function, unless it's already
>>>  protected by a "PACIASP/PACIBSP". We exempt the functions that are
>>>  only called directly.
>>>* Add a new "BTI J" for every target of an indirect jump, jump table
>>>  targets, non-local goto targets or labels that might be referenced
>>>  by variables, constant pools, etc (NOTE_INSN_DELETED_LABEL)
>>>
>>> Since we have already changed the use of indirect tail calls to only x16
>>> and x17, we do not have to use "BTI JC".
>>> (check patch 3/6).
>>>
>>
>> I missed out on the explanation for the changes to the trampoline code.
>> The patch also updates the trampoline code in case BTI is enabled. Since
>> the trampoline code is a target of an indirect branch, we need to add an
>> appropriate BTI instruction at the beginning of it to avoid a branch
>> target exception.
>>
>>> Bootstrapped and regression tested with aarch64-none-linux-gnu. Added
>>> new tests.
>>> Is this ok for trunk?
>>>
>>> Thanks
>>> Sudi
>>>
>>> *** gcc/ChangeLog ***
>>>
>>> 2018-xx-xx  Sudakshina Das  
>>> Ramana Radhakrishnan  
>>>
>>> * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
>>> * gcc/config/aarch64/aarch64.h: Update comment for
>>> TRAMPOLINE_SIZE.
>>> * config/aarch64/aarch64.c (aarch64_asm_trampoline_template):
>>> Update if bti is enabled.
>>> * config/aarch64/aarch64-bti-insert.c: New file.
>>> * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert
>>> bti pass.
>>> * config/aarch64/aarch64-protos.h (make_pass_insert_bti):
>>> Declare the new bti pass.
>>> * config/aarch64/aarch64.md (bti_nop): Define.
>>> * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>>
>>> 2018-xx-xx  Sudakshina Das  
>>>
>>> * gcc.target/aarch64/bti-1.c: New test.
>>> * gcc.target/aarch64/bti-2.c: New test.
>>> * lib/target-supports.exp
>>> (check_effective_target_aarch64_bti_hw): Add new check for
>>> BTI hw.
>>>
>>
>> Updated patch attached with more comments and a bit of simplification
>> in aarch64-bti-insert.c. ChangeLog still applies.
>>
>> Thanks
>> Sudi
>>
> 
> I found a missed case in the bti pass and edited the patch to include
> it. This made me realize that the only 2 regressions I saw with the
> BTI enabled model can now be avoided. (as quoted below from my 6/6
> patch)
> "Bootstrapped and regression tested with aarch64-none-linux-gnu with
> and without the configure option turned on.
> Also tested on aarch64-none-elf with and without configure option with a
> BTI enabled aem. Only 2 regressions and these were because newlib
> requires patches to protect hand coded libraries with BTI."
> 
> The ChangeLog still applies.
> 
> Sudi
> 
I have updated the patch according to our discussions offline.
The md pattern is now split into 4 patterns and I have added a new
test for the setjmp case along with some comments where missing.
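
For readers unfamiliar with why setjmp needs separate handling, here is a
minimal sketch in plain C. It is not the test added by the patch; the names
run_with_recovery, bail_out and last_code are made up for illustration. The
assumption is that longjmp may return to the setjmp call site through an
indirect branch, so with BTI enabled via -mbranch-protection that return
point would need a landing pad ("BTI J") just like any other indirect jump
target.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;
static volatile int last_code;

/* Jump back to the setjmp call site in run_with_recovery.  The code is
   stashed in a variable so that setjmp is only used as a controlling
   expression, which is one of the contexts ISO C allows.  */
static void
bail_out (int code)
{
  assert (code != 0);	/* 0 is reserved for the direct path.  */
  last_code = code;
  longjmp (env, 1);
}

/* Return 0 on the direct path, or the code handed to bail_out.  */
int
run_with_recovery (int fail)
{
  /* The point after this call is reached twice: once by falling through,
     and once from longjmp.  The second arrival is the one that would need
     a branch target landing pad (hypothetical placement, see above).  */
  if (setjmp (env) != 0)
    return last_code;

  if (fail)
    bail_out (42);
  return 0;
}
```

Calling run_with_recovery (0) takes the direct path; run_with_recovery (1)
goes through longjmp and comes back at the setjmp call site.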

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das  
Ramana Radhakrishnan  

* config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
* gcc/config/aarch64/aarch64.h: Update comment for
TRAMPOLINE_SIZE.
* config/aarch64/aarch64.c (aarch64_asm_trampoline_template):
Update if bti is enabled.
* config/aarch64/aarch64-bti-insert.c: New file.
* config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert
bti pass.
* config/aarch64/aarch64-protos.h (make_pass_insert_bti):
Declare the new bti pass.
* config/aarch64/aarch64.md (unspecv): Add UNSPECV_BTI_NOARG,
UNSPECV_BTI_C, UNSPECV_BTI_J and UNSPECV_BTI_JC.
(bti_noarg, bti_j, bti_c, bti_jc): New define_insns.
* config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* gcc.target/aarch64/bti-1.c: New test.
* gcc.target/aarch64/bti-2.c: New test.
* gcc.target/aarch64/bti-3.c: New test.
* lib/target-supports.exp
(check_effective_target_aarch64_bti_hw): Add new check for
BTI hw.

Thanks
Sudi
diff --git a/gcc/config.gcc b/gcc/config.gcc
index cbabd21b33723a65790e2eafe8aa4979051cae48..f3dd3feceb3375374a29ca58e4ad87f949cea44d 100644

Re: [PATCH] Delete powerpcspe

2018-12-14 Thread Jeff Law
On 12/14/18 1:20 AM, Segher Boessenkool wrote:
> On Thu, Dec 13, 2018 at 09:49:51AM -0700, Jeff Law wrote:
>> On 12/12/18 10:33 AM, Segher Boessenkool wrote:
>>> On Wed, Dec 12, 2018 at 11:36:29AM +0100, Richard Biener wrote:
>>>> On Tue, Dec 11, 2018 at 2:37 PM Jeff Law  wrote:
>>>>> One way to deal with these problems is to create a fake simulator that
>>>>> always returns success.  That's what my tester does for the embedded
>>>>> targets.  That allows us to do reliable compile-time tests as well as
>>>>> the various scan-whatever tests.
>>>>>
>>>>> It would be trivial to start sending those results to gcc-testresults.
>>>>
>>>> I think it would be more useful if the execute testing would be
>>>> reported as UNSUPPORTED rather than simply PASS w/o being
>>>> sure it does.
>>>
>>> Yes.
>> Yes, but I don't think we've got a reasonable way to do that in the
>> existing dejagnu framework.
> 
> I think you can have your board's ${board}_load just do
>   return [list "unresolved" ""]
> or something like that.
That's easy 'nuff to try.  My dejagnu days are so far back in the past
any tricks have been dropped into the bit bucket of my mind.

> 
> 
>>> If results are posted to gcc-testresults then other people can get a
>>> feel whether the port is deteriorating, and at what rate.  If no results
>>> are posted we just have to assume the worst.  Most people do not have
>>> the time (or setup) to test it for themselves.
>> Yup.  I wish I had the time to extract more of the data the tester is
>> gathering and produce this kind of info.
>>
>> I have not made it a priority to try and address all the issues I've
>> seen in the tester.  We have some ports that are incredibly flaky
>> (epiphany for example), and many that have a lot of failures, but are
>> stable in their set of failures.
>>
>> My goal to date has mostly been to identify regressions.  I'm not even
>> able to keep up with that.  For example s390/s390x have been failing for
>> about a week with their kernel builds.  sparc, i686, aarch64 are
>> consistently tripping over regressions.  ia64 hasn't worked since we put
>> in qsort consistency checking, etc etc.
> 
> About a third of kernel builds have failed (for my configs) this whole
> stage 1 and stage 3...  Hopefully it will be better in stage 4.
I'm not seeing as many, though I do have some patches to avoid known
issues that are on the kernel side.  Broken s390 asms, missing includes
in the mellanox drivers, whatever.

It's also the case that there are targets where I could cover a kernel
build, but don't at the moment.  A great example would be the H8, but
there are others.  The common thread is if there isn't a glibc port, then
I'm not covering the kernel.



Jeff


Re: [PATCH] Delete powerpcspe

2018-12-14 Thread Jeff Law
On 12/14/18 2:52 AM, Richard Biener wrote:
> On Thu, Dec 13, 2018 at 5:49 PM Jeff Law  wrote:
>>
>> On 12/12/18 10:33 AM, Segher Boessenkool wrote:
>>> On Wed, Dec 12, 2018 at 11:36:29AM +0100, Richard Biener wrote:
>>>> On Tue, Dec 11, 2018 at 2:37 PM Jeff Law  wrote:
>>>>> One way to deal with these problems is to create a fake simulator that
>>>>> always returns success.  That's what my tester does for the embedded
>>>>> targets.  That allows us to do reliable compile-time tests as well as
>>>>> the various scan-whatever tests.
>>>>>
>>>>> It would be trivial to start sending those results to gcc-testresults.
>>>>
>>>> I think it would be more useful if the execute testing would be
>>>> reported as UNSUPPORTED rather than simply PASS w/o being
>>>> sure it does.
>>>
>>> Yes.
>> Yes, but I don't think we've got a reasonable way to do that in the
>> existing dejagnu framework.
>>
>>
>>>
>>>> But while posting to gcc-testresults is a sign of testing, tracking
>>>> regressions (and progressions!) in bugzilla and caring for those
>>>> bugs is far more important...
>>>
>>> If results are posted to gcc-testresults then other people can get a
>>> feel whether the port is deteriorating, and at what rate.  If no results
>>> are posted we just have to assume the worst.  Most people do not have
>>> the time (or setup) to test it for themselves.
>> Yup.  I wish I had the time to extract more of the data the tester is
>> gathering and produce this kind of info.
>>
>> I have not made it a priority to try and address all the issues I've
>> seen in the tester.  We have some ports that are incredibly flaky
>> (epiphany for example), and many that have a lot of failures, but are
>> stable in their set of failures.
>>
>> My goal to date has mostly been to identify regressions.  I'm not even
>> able to keep up with that.  For example s390/s390x have been failing for
>> about a week with their kernel builds.  sparc, i686, aarch64 are
>> consistently tripping over regressions.  ia64 hasn't worked since we put
>> in qsort consistency checking, etc etc.
> 
> Yeah :/
> 
> I wonder if we could set up auto-(simulator)-testing for all supported
> archs (and build testing for all supported configs) on the CF
> (with the required scripting in contrib/ so it's easy to replicate).  I'd
> simply test only released snapshots to keep the load reasonable
> and besides posting to gcc-testresults also post testresults
> differences to gcc-regression?
It's certainly possible.  Though I've found that managing this kind of
thing with Jenkins is far easier than rolling our own.  I'd be happy to
move an instance out into the CF.

> 
> That said, can we document how to simulator-test $target in
> a structural way somewhere?  Either by means of (a) script(s)
> in contrib/ or by simple documentation in a new gcc/testing.texi
> or on the wiki?
It should be possible. Sometimes it's just using the right
--target_board.  Other times there isn't one so you write your own
glue code :(  That glue code is part of dejagnu.



> 
> You at least seem to have some sort of scripting for some targets?
> Esp. having target boards and simulator configs would be nice
> (and pointers where to look for simulators).
Well, since I'm using a fake simulator no mapping is needed.  Though
I've got plumbing in to use the simulator from gdb in place.  The plan
was to turn that on once things using the fake simulator were stable.

Jeff



Re: [PATCH 1/3][GCC] Add new target hook asm_post_cfi_startproc

2018-12-14 Thread Sam Tebbs

On 12/13/18 7:03 PM, Jason Merrill wrote:
> And this seems consistent with the other stuff in 
> dwarf2out_do_cfi_startproc.  You might amend the documentation to 
> mention that the expected use is to add more .cfi_* directives. OK 
> with that change.
>
> Jason

Hi Jason,

Thanks for the approval. Committed with your proposal as r267135.



Re: [PATCH] [MSP430] Fix gcc.dg/pr85180.c and gcc.dg/pr87985.c timeouts for msp430-elf -mlarge

2018-12-14 Thread Jozef Lawrynowicz
Hi Segher,

Thanks for the review.

On Wed, 12 Dec 2018 19:47:53 -0600
Segher Boessenkool  wrote:

> The unused bits in a MODE_PARTIAL_INT value are undefined, so nonzero_bits
> isn't valid for conversion in either direction.
>
> And *which* bits are undefined isn't defined anywhere either, so we cannot
> convert to/from smaller MODE_INT modes, either.

Can't we use the last_set_nonzero_bits if last_set_mode was MODE_INT and the
current mode is MODE_PARTIAL_INT? As long as the current mode bitsize is less
than the bitsize of nonzero_bits_mode, which it will be if we've gotten to this
point.

If the current mode is MODE_PARTIAL_INT, then on entry to
reg_nonzero_bits_for_combine, the current nonzero_bits has already been
initialized to GET_MODE_MASK(mode), so we are not at risk of disturbing the
undefined bits, as we are only ever doing &= with GET_MODE_MASK(mode).
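
The masking argument above can be sketched in plain C. This is not code
from combine.c: mode_mask is only a stand-in for GET_MODE_MASK,
refine_nonzero_bits is a made-up name, and the 20-bit width in the usage
note is an assumed width for PSImode on msp430.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low BITS bits set; a stand-in for GET_MODE_MASK.  */
uint64_t
mode_mask (unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  return bits == 64 ? UINT64_MAX : (UINT64_C (1) << bits) - 1;
}

/* Start from the full mask of the current mode and only ever narrow it
   with &=, so bits outside the mode can never become set.  */
uint64_t
refine_nonzero_bits (unsigned bits, uint64_t last_set_nonzero_bits)
{
  uint64_t nonzero = mode_mask (bits);
  nonzero &= last_set_nonzero_bits;
  return nonzero;
}
```

For example, refine_nonzero_bits (20, x) never has a bit set above bit 19,
whatever x was recorded for the wider mode.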

However, the above suggestion doesn't actually provide any visible benefit to
testresults, so it doesn't need to be included.

> I don't see how that follows; not all bits in MODE_PARTIAL_INT modes
> are necessarily valid.

Yes, this was an oversight on my part. 
nonzero_bits_mode is only used to calculate last_set_nonzero_bits if
last_set_mode is in the MODE_INT class.
If last_set_mode was MODE_PARTIAL_INT class, then last_set_mode was just that
partial int mode; it wasn't calculated in the wider nonzero_bits_mode.

After some further investigation, it seems we can attribute the recursion to
an inconsistency with how nonzero_bits() is invoked.
The mode passed to nonzero_bits(rtx, mode) is normally the mode of rtx
itself. But there are two cases where nonzero_bits_mode is used instead, even
if that is wider than the mode of the rtx.

In record_value_for_reg:
>   rsp->last_set_mode = mode;
>   if (GET_MODE_CLASS (mode) == MODE_INT
>   && HWI_COMPUTABLE_MODE_P (mode))
> mode = nonzero_bits_mode;
>   rsp->last_set_nonzero_bits = nonzero_bits (value, mode);

In update_rsp_from_reg_equal: 
>  if (rsp->nonzero_bits != HOST_WIDE_INT_M1U)
>{
>  bits = nonzero_bits (src, nonzero_bits_mode);

Note that the mode of src in update_rsp_from_reg_equal is not
checked to see if it is a MODE_INT class and HWI_COMPUTABLE;
nonzero_bits_mode is always used.

This mode passed to nonzero_bits() eventually makes its way to
reg_nonzero_bits_for_combine. rsp->last_set_mode is always the true mode of the
reg (i.e. not nonzero_bits_mode) from when it is set in record_value_for_reg.
So the recursion happens because update_rsp_from_reg_equal has asked for the
nonzero_bits in nonzero_bits_mode, but the last_set_mode was PSImode.
nonzero_bits_mode is not equal to PSImode, nor is it in the same class, so the
nonzero bits will never be reused.

So my revised patch (attached) instead modifies update_rsp_from_reg_equal to
only request nonzero_bits in nonzero_bits_mode if the mode class is MODE_INT
and HWI_COMPUTABLE. This gives it parity with how last_set_nonzero_bits are set
in record_value_for_reg.
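
A toy model of the mismatch described above, in plain C rather than GCC
internals. All toy_* names are made up, the bit widths are assumptions
(20 bits for PSImode, 64 bits for nonzero_bits_mode), and the reuse rule
only encodes what this message states: the recorded bits are reused when
the queried mode equals last_set_mode or both modes are in the MODE_INT
class.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_class { TOY_MODE_INT, TOY_MODE_PARTIAL_INT };

struct toy_mode
{
  const char *name;
  enum toy_class cls;
  unsigned bits;
};

static const struct toy_mode toy_SImode = { "SI", TOY_MODE_INT, 32 };
static const struct toy_mode toy_PSImode = { "PSI", TOY_MODE_PARTIAL_INT, 20 };
static const struct toy_mode toy_nonzero_bits_mode = { "DI", TOY_MODE_INT, 64 };

/* Stand-in for HWI_COMPUTABLE_MODE_P with a 64-bit HOST_WIDE_INT.  */
static bool
toy_hwi_computable_p (const struct toy_mode *m)
{
  return m->bits <= 64;
}

/* Mode in which to ask for nonzero bits: widen to nonzero_bits_mode only
   for computable MODE_INT modes, otherwise keep the mode itself.  */
const struct toy_mode *
toy_query_mode (const struct toy_mode *m)
{
  assert (m->bits > 0);
  if (m->cls == TOY_MODE_INT && toy_hwi_computable_p (m))
    return &toy_nonzero_bits_mode;
  return m;
}

/* Can bits recorded for LAST_SET_MODE answer a query in QUERY?  */
bool
toy_can_reuse (const struct toy_mode *last_set_mode,
	       const struct toy_mode *query)
{
  return (last_set_mode == query
	  || (last_set_mode->cls == TOY_MODE_INT
	      && query->cls == TOY_MODE_INT));
}
```

With the old behaviour the query for a PSImode register is always made in
toy_nonzero_bits_mode, so toy_can_reuse fails; with the revised selection
the query stays in PSImode and the recorded bits can be reused.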

I've regtested the attached patch for msp430-elf, currently bootstrapping and
testing on x86_64-pc-linux-gnu.
Is this ok for trunk if bootstrap and regtest for x86_64-pc-linux-gnu is
successful?

Jozef
2018-12-14  Jozef Lawrynowicz  

	gcc/ChangeLog:
	* combine.c (update_rsp_from_reg_equal): Only look for the nonzero bits
	of src in nonzero_bits_mode if the mode of src is MODE_INT and
	HWI_COMPUTABLE.
	(reg_nonzero_bits_for_combine): Add clarification to comment.

diff --git a/gcc/combine.c b/gcc/combine.c
index 7e61139..c93aaed 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -1698,9 +1698,13 @@ update_rsp_from_reg_equal (reg_stat_type *rsp, rtx_insn *insn, const_rtx set,
   /* Don't call nonzero_bits if it cannot change anything.  */
   if (rsp->nonzero_bits != HOST_WIDE_INT_M1U)
 {
-  bits = nonzero_bits (src, nonzero_bits_mode);
+  machine_mode mode = GET_MODE (x);
+  if (GET_MODE_CLASS (mode) == MODE_INT
+	  && HWI_COMPUTABLE_MODE_P (mode))
+	mode = nonzero_bits_mode;
+  bits = nonzero_bits (src, mode);
   if (reg_equal && bits)
-	bits &= nonzero_bits (reg_equal, nonzero_bits_mode);
+	bits &= nonzero_bits (reg_equal, mode);
   rsp->nonzero_bits |= bits;
 }
 
@@ -10224,6 +10228,7 @@ simplify_and_const_int (rtx x, scalar_int_mode mode, rtx varop,
 
 /* Given a REG X of mode XMODE, compute which bits in X can be nonzero.
We don't care about bits outside of those defined in MODE.
+   We DO care about all the bits in MODE, even if XMODE is smaller than MODE.
 
For most X this is simply GET_MODE_MASK (GET_MODE (MODE)), but if X is
a shift, AND, or zero_extract, we can do better.  */


Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-14 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:09:49 +0800, Chung-Lin Tang  
wrote:
> This patch is a re-organization of OpenACC asynchronous queues.

Again, many thanks for that!

In addition to the review emails I just posted, I've also put all that
stuff into a GitHub branch:
.

This also includes some more "into async re-work: replicate [...]"
commits to adjust your work for preparational things that I plan to
commit before.  I split these out intentionally, so that you can easily
see/review these changes.


Grüße
 Thomas


Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-14 Thread Chung-Lin Tang

On 2018/12/14 10:53 PM, Thomas Schwinge wrote:

Additionally the following, or why not?  Please comment on the one TODO
which before your async re-work also was -- incorrectly? -- run
asynchronously?




diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 5a441c9efe38..91875c57fc97 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -413,11 +413,11 @@ GOACC_enter_exit_data (int device, size_t mapnum,
{
case GOMP_MAP_ALLOC:
case GOMP_MAP_FORCE_ALLOC:
- acc_create (hostaddrs[i], sizes[i]);
+ acc_create_async (hostaddrs[i], sizes[i], async);
  break;
case GOMP_MAP_TO:
case GOMP_MAP_FORCE_TO:
- acc_copyin (hostaddrs[i], sizes[i]);
+ acc_copyin_async (hostaddrs[i], sizes[i], async);
  break;
default:


Yes! I think these were somehow missed by mistake. Thanks for catching!


  gomp_fatal (" GOACC_enter_exit_data UNHANDLED kind 
0x%.2x",
@@ -563,6 +563,8 @@ GOACC_update (int device, size_t mapnum,
 the value of the allocated device memory in the
 previous pointer.  */
  *(uintptr_t *) hostaddrs[i] = (uintptr_t)dptr;
+ /* This is intentionally not calling acc_update_device_async,
+because TODO.  */
  acc_update_device (hostaddrs[i], sizeof (uintptr_t));
  
  	  /* Restore the host pointer.  */


I don't remember adding this piece of comment, it might have been Cesar I guess?
I'm not sure if there's any real reason not to use acc_update_device_async 
here...
Change and test to see?

Thanks,
Chung-Lin


Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-14 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang  
wrote:
> --- a/libgomp/oacc-async.c
> +++ b/libgomp/oacc-async.c

> +attribute_hidden struct goacc_asyncqueue *
> +lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
> +{
> +  /* The special value acc_async_noval (-1) maps to the thread-specific
> + default async stream.  */
> +  if (async == acc_async_noval)
> +async = thr->default_async;
> +
> +  if (async == acc_async_sync)
> +return NULL;
> +
> +  if (async < 0)
> +gomp_fatal ("bad async %d", async);
> +
> +  struct gomp_device_descr *dev = thr->dev;
> +
> +  if (!create
> +  && (async >= dev->openacc.async.nasyncqueue
> +   || !dev->openacc.async.asyncqueue[async]))
> +return NULL;
> +

Doesn't this last block also have to be included in the lock you're
taking below?

> +  gomp_mutex_lock (&dev->openacc.async.lock);
> +  if (async >= dev->openacc.async.nasyncqueue)
> +{
> +  int diff = async + 1 - dev->openacc.async.nasyncqueue;
> +  dev->openacc.async.asyncqueue
> + = gomp_realloc (dev->openacc.async.asyncqueue,
> + sizeof (goacc_aq) * (async + 1));
> +  memset (dev->openacc.async.asyncqueue + dev->openacc.async.nasyncqueue,
> +   0, sizeof (goacc_aq) * diff);
> +  dev->openacc.async.nasyncqueue = async + 1;
> +}
> +
> +  if (!dev->openacc.async.asyncqueue[async])
> +{
> +  dev->openacc.async.asyncqueue[async] = 
> dev->openacc.async.construct_func ();
> +
> +  /* Link new async queue into active list.  */
> +  goacc_aq_list n = gomp_malloc (sizeof (struct goacc_asyncqueue_list));
> +  n->aq = dev->openacc.async.asyncqueue[async];
> +  n->next = dev->openacc.async.active;
> +  dev->openacc.async.active = n;
> +}
> +  gomp_mutex_unlock (&dev->openacc.async.lock);
> +  return dev->openacc.async.asyncqueue[async];
> +}
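
The locking question raised above can be illustrated with a small
stand-alone sketch. It uses POSIX mutexes and made-up names (struct queue,
struct queue_table, lookup_queue) instead of the libgomp types, and it
assumes that allocation failure is reported by returning NULL rather than
by a fatal error. The point is only that the "does it exist?" check reads
the same fields the growing path writes, so it sits inside the locked
region and every path leaves through one unlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct queue
{
  int id;
};

struct queue_table
{
  pthread_mutex_t lock;
  struct queue **slots;
  int nslots;
};

/* Look up queue ID in T, creating it if CREATE is nonzero.  Return NULL
   if it does not exist and CREATE is zero, or if allocation fails.  */
struct queue *
lookup_queue (struct queue_table *t, int id, int create)
{
  struct queue *ret = NULL;

  assert (id >= 0);
  pthread_mutex_lock (&t->lock);

  /* nslots and slots may be changed by another thread that is growing
     the table, so they are only read while the lock is held.  */
  if (!create && (id >= t->nslots || !t->slots[id]))
    goto out;

  if (id >= t->nslots)
    {
      struct queue **grown = realloc (t->slots, sizeof *grown * (id + 1));
      if (!grown)
	goto out;
      memset (grown + t->nslots, 0, sizeof *grown * (id + 1 - t->nslots));
      t->slots = grown;
      t->nslots = id + 1;
    }

  if (!t->slots[id])
    {
      struct queue *q = malloc (sizeof *q);
      if (!q)
	goto out;
      q->id = id;
      t->slots[id] = q;
    }
  ret = t->slots[id];

 out:
  pthread_mutex_unlock (&t->lock);
  return ret;
}
```

A table initialised with PTHREAD_MUTEX_INITIALIZER, NULL and 0 starts
empty; a non-creating lookup returns NULL until some caller has created
the queue.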

And then, some more concerns, as encoded in the following patch (but
please also continue reading below):

commit d2d6aaeca840debbec14e421be705ef56d444ac7
Author: Thomas Schwinge 
Date:   Wed Dec 12 15:57:30 2018 +0100

into async re-work: locking concerns
---
 libgomp/oacc-async.c  | 18 +++---
 libgomp/plugin/plugin-nvptx.c |  6 ++
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index 89a405ebcdb1..68e4e65e8182 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -84,17 +84,21 @@ lookup_goacc_asyncqueue (struct goacc_thread *thr, bool 
create, int async)
   if (id < 0)
 return NULL;
 
+  struct goacc_asyncqueue *ret = NULL;
+
   struct gomp_device_descr *dev = thr->dev;
 
+  gomp_mutex_lock (&dev->openacc.async.lock);
+
   if (!create
   && (id >= dev->openacc.async.nasyncqueue
  || !dev->openacc.async.asyncqueue[id]))
-return NULL;
+goto out;
 
-  gomp_mutex_lock (&dev->openacc.async.lock);
   if (id >= dev->openacc.async.nasyncqueue)
 {
   int diff = id + 1 - dev->openacc.async.nasyncqueue;
+  // TODO gomp_realloc might call "gomp_fatal" with "&dev->openacc.async.lock" locked.  Might cause deadlock?
   dev->openacc.async.asyncqueue
= gomp_realloc (dev->openacc.async.asyncqueue,
sizeof (goacc_aq) * (id + 1));
@@ -105,16 +109,23 @@ lookup_goacc_asyncqueue (struct goacc_thread *thr, bool 
create, int async)
 
   if (!dev->openacc.async.asyncqueue[id])
 {
+  //TODO We have "&dev->openacc.async.lock" locked here, and if "openacc.async.construct_func" calls "GOMP_PLUGIN_fatal" (via "CUDA_CALL_ASSERT", for example), that might cause deadlock?
+  //TODO Change the interface to emit an error in the plugin, but then "return NULL", and we catch that here, unlock, and bail out?
   dev->openacc.async.asyncqueue[id] = dev->openacc.async.construct_func ();
 
   /* Link new async queue into active list.  */
+  // TODO gomp_malloc might call "gomp_fatal" with "&dev->openacc.async.lock" locked.  Might cause deadlock?
   goacc_aq_list n = gomp_malloc (sizeof (struct goacc_asyncqueue_list));
   n->aq = dev->openacc.async.asyncqueue[id];
   n->next = dev->openacc.async.active;
   dev->openacc.async.active = n;
 }
+  ret = dev->openacc.async.asyncqueue[id];
+
+ out:
   gomp_mutex_unlock (&dev->openacc.async.lock);
-  return dev->openacc.async.asyncqueue[id];
+
+  return ret;
 }
 
 /* Return the asyncqueue to be used for OpenACC async-argument ASYNC.  This
@@ -305,6 +316,7 @@ goacc_fini_asyncqueues (struct gomp_device_descr *devicep)
   goacc_aq_list next;
   for (goacc_aq_list l = devicep->openacc.async.active; l; l = next)
{
+ //TODO Can/should/must we "synchronize" here (how?), so as to make sure that no other operation on this asyncqueue is going on while/after we've destructed it here?
  ret &= devicep->openacc.async.destruct_func (l->aq);
  next = l->next;
  free (l);
diff --git libgomp/plugin/plugin-nvptx.c 

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-14 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang  
wrote:
> --- a/libgomp/oacc-parallel.c
> +++ b/libgomp/oacc-parallel.c
> @@ -377,8 +360,6 @@ GOACC_enter_exit_data (int device, size_t mapnum,
>   finalize = true;
>  }
>  
> -  acc_dev->openacc.async_set_async_func (async);
> -
>/* Determine if this is an "acc enter data".  */
>for (i = 0; i < mapnum; ++i)
>  {
> @@ -450,7 +431,7 @@ GOACC_enter_exit_data (int device, size_t mapnum,
> else
>   {
> gomp_acc_insert_pointer (pointer, &hostaddrs[i],
> -&sizes[i], &kinds[i]);
> +&sizes[i], &kinds[i], async);
> /* Increment 'i' by two because OpenACC requires fortran
>arrays to be contiguous, so each PSET is associated with
>one of MAP_FORCE_ALLOC/MAP_FORCE_PRESET/MAP_FORCE_TO, and
> @@ -475,17 +456,17 @@ GOACC_enter_exit_data (int device, size_t mapnum,
>   if (acc_is_present (hostaddrs[i], sizes[i]))
> {
>   if (finalize)
> -   acc_delete_finalize (hostaddrs[i], sizes[i]);
> +   acc_delete_finalize_async (hostaddrs[i], sizes[i], async);
>   else
> -   acc_delete (hostaddrs[i], sizes[i]);
> +   acc_delete_async (hostaddrs[i], sizes[i], async);
> }
>   break;
> case GOMP_MAP_FROM:
> case GOMP_MAP_FORCE_FROM:
>   if (finalize)
> -   acc_copyout_finalize (hostaddrs[i], sizes[i]);
> +   acc_copyout_finalize_async (hostaddrs[i], sizes[i], async);
>   else
> -   acc_copyout (hostaddrs[i], sizes[i]);
> +   acc_copyout_async (hostaddrs[i], sizes[i], async);
>   break;
> default:
>   gomp_fatal (" GOACC_enter_exit_data UNHANDLED kind 0x%.2x",
> @@ -503,8 +484,6 @@ GOACC_enter_exit_data (int device, size_t mapnum,
>   i += pointer - 1;
> }
>}
> -
> -  acc_dev->openacc.async_set_async_func (acc_async_sync);
>  }

Additionally the following, or why not?  Please comment on the one TODO
which before your async re-work also was -- incorrectly? -- run
asynchronously?

commit 34c9ce65ad1f9865d0716d18c364d8c6928e694c
Author: Thomas Schwinge 
Date:   Fri Dec 14 14:34:17 2018 +0100

into async re-work: more async function usage
---
 libgomp/oacc-parallel.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 5a441c9efe38..91875c57fc97 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -413,11 +413,11 @@ GOACC_enter_exit_data (int device, size_t mapnum,
{
case GOMP_MAP_ALLOC:
case GOMP_MAP_FORCE_ALLOC:
- acc_create (hostaddrs[i], sizes[i]);
+ acc_create_async (hostaddrs[i], sizes[i], async);
  break;
case GOMP_MAP_TO:
case GOMP_MAP_FORCE_TO:
- acc_copyin (hostaddrs[i], sizes[i]);
+ acc_copyin_async (hostaddrs[i], sizes[i], async);
  break;
default:
  gomp_fatal (" GOACC_enter_exit_data UNHANDLED kind 
0x%.2x",
@@ -563,6 +563,8 @@ GOACC_update (int device, size_t mapnum,
 the value of the allocated device memory in the
 previous pointer.  */
  *(uintptr_t *) hostaddrs[i] = (uintptr_t)dptr;
+ /* This is intentionally not calling acc_update_device_async,
+because TODO.  */
  acc_update_device (hostaddrs[i], sizeof (uintptr_t));
 
  /* Restore the host pointer.  */


Grüße
 Thomas


Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-14 Thread Chung-Lin Tang

On 2018/12/14 10:17 PM, Thomas Schwinge wrote:
> Hi Chung-Lin!
>
> On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang wrote:
> > --- a/libgomp/oacc-async.c
> > +++ b/libgomp/oacc-async.c
> >
> > +attribute_hidden struct goacc_asyncqueue *
> > +lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
> > +{
> > +  /* The special value acc_async_noval (-1) maps to the thread-specific
> > + default async stream.  */
> > +  if (async == acc_async_noval)
> > +async = thr->default_async;
> > +
> > +  if (async == acc_async_sync)
> > +return NULL;
> > +
> > +  if (async < 0)
> > +gomp_fatal ("bad async %d", async);
>
> To make this "resolve" part more obvious, that is, the translation from
> the "async" argument to an "asyncqueue" array index:
>
> > +  if (!create
> > +  && (async >= dev->openacc.async.nasyncqueue
> > + || !dev->openacc.async.asyncqueue[async]))
> > +return NULL;
> > +[...]
>
> ..., I propose adding a "async2id" function for that, and then rename all
> "asyncqueue[async]" to "asyncqueue[id]".

I don't think this is needed. This is the only place in the entire runtime that
does asyncqueue indexing, adding more conceptual layers of re-directed indexing
seems unneeded.

I do think the more descriptive comments are nice though.

> And, this also restores the current trunk behavior, so that
> "acc_async_noval" gets its own, separate "asyncqueue".

Is there a reason we need to restore that behavior right now?

Thanks,
Chung-Lin



Re: Too strict synchronization with the local (host) thread?

2018-12-14 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 11 Dec 2018 21:30:31 +0800, Chung-Lin Tang wrote:
> On 2018/12/7 11:56 PM, Thomas Schwinge wrote:
> >> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c
> >> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c
> >> @@ -114,6 +114,7 @@ main (int argc, char **argv)
> >>   
> >> for (i = 0; i < N; i++)
> >>   {
> >> +  stream = (CUstream) acc_get_cuda_stream (i & 1);
> >> r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
> > What's the motivation for this change?
> 
> To place work on both streams 0 and 1.

That's describing what it does, not the motivation behind it.  ;-)


> > ..., and this change are needed because we're now more strictly
> > synchronizing with the local (host) thread.
> > 
> > Regarding the case of "libgomp.oacc-c-c++-common/lib-81.c", as currently
> > present:
> > 
> >  [...]
> >for (i = 0; i < N; i++)
> >  {
> >r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs, 0);
> >if (r != CUDA_SUCCESS)
> >  {
> >fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
> >abort ();
> >  }
> >  }
> > 
> > This launches N kernels on N separate async queues/CUDA streams, [0..N).
> > 
> >acc_wait_all_async (N);
> > 
> > Then, the "acc_wait_all_async (N)" -- in my understanding! -- should
> > *not*  synchronize with the local (host) thread, but instead just set up
> > the additional async queue/CUDA stream N to "depend" on [0..N).
> > 
> >for (i = 0; i <= N; i++)
> >  {
> >if (acc_async_test (i) != 0)
> >  abort ();
> >  }
> > 
> > Thus, all [0..N) should then still be "acc_async_test (i) != 0" (still
> > running).
> > 
> >acc_wait (N);
> > 
> > Here, the "acc_wait (N)" would synchronize the local (host) thread with
> > async queue/CUDA stream N and thus recursively with [0..N).
> > 
> >for (i = 0; i <= N; i++)
> >  {
> >if (acc_async_test (i) != 1)
> >  abort ();
> >  }
> >  [...]
> > 
> > So, then all these async queues/CUDA streams here indeed are
> > "acc_async_test (i) != 1", that is, idle.
> > 
> > 
> > Now, the more strict synchronization with the local (host) thread is not
> > wrong in terms of correctness, but I suppose it will impact performance of
> > otherwise asynchronous operations, which now get synchronized too much?
> > 
> > Or, of course, I'm misunderstanding something...
> 
> IIRC, we encountered many issues where people misunderstood the meaning of
> "wait+async", using it as if the local host sync happened, where in our
> original implementation it does not.

..., and that's the right thing, in my opinion.  (Do you disagree?)

> Also some areas of the OpenACC spec were vague on whether the local host
> synchronization should or should not happen; basically, the wording treated
> it as if it were only an implementation detail that didn't matter, and
> didn't acknowledge that this would be something visible to the user.

I suppose in correct code that correctly uses a different mechanism for
inter-thread synchronization, it shouldn't be visible?  (Well, with the
additional synchronization, it would be visible in terms of performance
degradation.)

For example, OpenACC 2.6, 3.2.11. "acc_wait" explicitly states that "If
two or more threads share the same accelerator, the 'acc_wait' routine
will return only if all matching asynchronous operations initiated by
this thread have completed; there is no guarantee that all matching
asynchronous operations initiated by other threads have completed".

I agree that this could be made more explicit throughout the specification,
and also the reading of OpenACC 2.6, 2.16.1. "async clause" is a bit
confusing regarding multiple host threads, but as I understand, the idea
still is that such wait operations do not synchronize at the host thread
level.  (Let's please assume that, and then work with the OpenACC
technical committee to get that clarified in the documentation.)
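
For readers following the lib-81.c discussion, the non-blocking reading of
"wait+async" can be modelled in a few lines of plain C.  This is only a toy
model under stated assumptions: none of the model_* names exist in libgomp,
queued work never finishes on its own (it completes only when the host waits
on it), and the last queue plays the role of async queue N.

```c
#include <assert.h>
#include <stdbool.h>

#define NQUEUES 5   /* queues 0..3 carry kernels, queue 4 plays the role of N */

/* Toy model only: none of these names exist in libgomp.  */
struct model_queue
{
  bool pending;          /* work enqueued that has not completed yet */
  unsigned depends_on;   /* bitmask of queues this queue must wait for */
};

static struct model_queue queues[NQUEUES];

/* Enqueue a kernel on queue Q; the host does not wait for it.  */
void
model_launch (int q)
{
  assert (q >= 0 && q < NQUEUES);
  queues[q].pending = true;
}

/* Like acc_async_test: 1 if queue Q is idle, 0 if work is outstanding.  */
int
model_async_test (int q)
{
  assert (q >= 0 && q < NQUEUES);
  return queues[q].pending ? 0 : 1;
}

/* Like acc_wait_all_async (N) under the non-blocking reading: queue N picks
   up a dependency on every other queue, and the host returns immediately.  */
void
model_wait_all_async (int n)
{
  assert (n >= 0 && n < NQUEUES);
  for (int q = 0; q < NQUEUES; q++)
    if (q != n)
      queues[n].depends_on |= 1u << q;
  queues[n].pending = true;   /* the wait operation itself is queued work */
}

/* Like acc_wait (Q): the host blocks until Q and, recursively, everything Q
   depends on has completed.  */
void
model_wait (int q)
{
  assert (q >= 0 && q < NQUEUES);
  unsigned deps = queues[q].depends_on;
  queues[q].depends_on = 0;
  for (int d = 0; d < NQUEUES; d++)
    if (deps & (1u << d))
      model_wait (d);
  queues[q].pending = false;
}
```

In this model, launching on queues 0..3 and then calling
model_wait_all_async (4) leaves every queue busy from the host's point of
view; only model_wait (4) drains them all, which is the sequence the test
case expects.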

> At the end, IIRC, I decided that adding a local host synchronization is
> easier for all of us,

Well...

> and took the opportunity of the re-org to make this change.

Well...  Again, a re-org/re-work should not make such functional
changes...

> That said, I didn't notice those tests you listed above were meant to test
> such delicate behavior.
> 
> > (For avoidance of doubt, I would accept the "async re-work" as is, but we
> > should eventually clarify this, and restore the behavior we -- apparently
> > -- had before, where we didn't synchronize so much?  (So, technically,
> > the "async re-work" would constitute a regression for this kind of
> > usage?)
> 
> It's not hard to restore the old behavior, just a few lines to delete.
> Although as described above, this change was deliberate.
> 
> This might be another issue to raise with the committee. I think I 

[PATCH v2] [MIPS] GCC: Fix Loongson3 LLSC Errata

2018-12-14 Thread YunQiang Su
From: Paul Hua 

In some older Loongson3 processors there is an LL/SC erratum that can
cause the CPU to deadlock occasionally.  The details are very
complicated.  We found a way to work around this erratum by
 a) adding a sync before each ll/lld instruction,
 b) adding a sync before each branch target that lies between ll and sc.
The assembler does job 'a'; gcc does job 'b'.

This patch adds an option, '-mfix-loongson3-llsc', and gcc will also
call 'as' with the same option.  So if 'as' doesn't support this option,
an error will occur.

The macro '__mips_fix_loongson3_llsc' is predefined when this option is
enabled, as some programs, like the Linux kernel, need to be aware that
this option is in use.
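
As a rough illustration of how a program might consume that macro, here is a
minimal sketch.  The macro name is the one given above; it is only predefined
by a compiler carrying this patch, and only when -mfix-loongson3-llsc is in
effect.  The helper name llsc_workaround_enabled is hypothetical.

```c
#include <assert.h>

/* Report at run time whether this translation unit was compiled with the
   Loongson3 LL/SC workaround enabled.  On any compiler that does not
   predefine __mips_fix_loongson3_llsc this simply returns 0.  */
int
llsc_workaround_enabled (void)
{
#ifdef __mips_fix_loongson3_llsc
  return 1;
#else
  return 0;
#endif
}
```

Hand-written ll/sc sequences in inline assembly are not touched by the
compiler-side part of the workaround, so code containing them is the kind of
user that would want such a check.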

This patch also adds a configure option,
--with-fix-loongson3-llsc=[yes|no], to set the default for
fix-loongson3-llsc at configure time.

v1 -> v2:
  * Predefine __mips_fix_loongson3_llsc.
  * Add to ASM_SPECS.

gcc/
* config.gcc (supported_defaults): Add fix-loongson3-llsc
(with_fix_loongson3_llsc): Add validation.
(all_defaults): Add fix-loongson3-llsc.
* config/mips/mips.c (mips_process_sync_loop): Add sync before
branch targets that lie between ll and sc.
* config/mips/mips.h
(OPTION_DEFAULT_SPECS): Add a default for fix-loongson3-llsc.
(ASM_SPECS): Add a default for fix-loongson3-llsc.
(TARGET_CPU_CPP_BUILTINS): Predefine __mips_fix_loongson3_llsc.
* config/mips/mips.opt: New option.
* doc/install.texi (--with-fix-loongson3-llsc): Document the new
option.
* doc/invoke.texi (-mfix-loongson3-llsc): Document the new option.

gcc/testsuite/
* gcc.target/mips/fix-loongson3-llsc.c: New test.
* gcc.target/mips/mips.exp (option): Add fix-loongson3-llsc.
---
 gcc/config.gcc| 19 +--
 gcc/config/mips/mips.c| 13 +++--
 gcc/config/mips/mips.h|  7 ++-
 gcc/config/mips/mips.opt  |  4 
 .../gcc.target/mips/fix-loongson3-llsc.c  | 10 ++
 gcc/testsuite/gcc.target/mips/mips.exp|  1 +
 6 files changed, 49 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/fix-loongson3-llsc.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 3122a0ce2..72b94b1be 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4265,7 +4265,7 @@ case "${target}" in
;;
 
mips*-*-*)
-   supported_defaults="abi arch arch_32 arch_64 float fpu nan 
fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1 
madd4"
+   supported_defaults="abi arch arch_32 arch_64 float fpu nan 
fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1 
madd4 fix-loongson3-llsc"
 
case ${with_float} in
"" | soft | hard)
@@ -4418,6 +4418,21 @@ case "${target}" in
exit 1
;;
esac
+
+   case ${with_fix_loongson3_llsc} in
+   yes)
+   with_fix_loongson3_llsc=fix-loongson3-llsc
+   ;;
+   no)
+   with_fix_loongson3_llsc=no-fix-loongson3-llsc
+   ;;
+   "")
+   ;;
+   *)
+   echo "Unknown fix-loongson3-llsc type used in 
--with-fix-loongson3-llsc" 1>&2
+   exit 1
+   ;;
+   esac
;;
 
nds32*-*-*)
@@ -4937,7 +4952,7 @@ case ${target} in
 esac
 
 t=
-all_defaults="abi cpu cpu_32 cpu_64 arch arch_32 arch_64 tune tune_32 tune_64 
schedule float mode fpu nan fp_32 odd_spreg_32 divide llsc mips-plt synci tls 
lxc1-sxc1 madd4"
+all_defaults="abi cpu cpu_32 cpu_64 arch arch_32 arch_64 tune tune_32 tune_64 
schedule float mode fpu nan fp_32 odd_spreg_32 divide llsc mips-plt synci tls 
lxc1-sxc1 madd4 fix-loongson3-llsc"
 for option in $all_defaults
 do
eval "val=\$with_"`echo $option | sed s/-/_/g`
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 55b440785..717f3d032 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -14127,7 +14127,7 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands)
   mips_multi_start ();
 
   /* Output the release side of the memory barrier.  */
-  if (need_atomic_barrier_p (model, true))
+  if (need_atomic_barrier_p (model, true) && !FIX_LOONGSON3_LLSC)
 {
   if (required_oldval == 0 && TARGET_OCTEON)
{
@@ -14148,6 +14148,10 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands)
   /* Output the branch-back label.  */
   mips_multi_add_label ("1:");
 
+  /* Loongson3 targets need a sync before ll/lld.  */
+  if (need_atomic_barrier_p (model,  true) && FIX_LOONGSON3_LLSC)
+mips_multi_add_insn ("sync", NULL);
+
   /* OLDVAL = *MEM.  */
   mips_multi_add_insn (is_64bit_p ? "lld\t%0,%1" : "ll\t%0,%1",
  

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-14 Thread Chung-Lin Tang

On 2018/12/14 10:32 PM, Thomas Schwinge wrote:
> Invoked as "acc_wait_async ([...], acc_async_sync)" (as used in a test
> case that I'll soon submit/commit), we'll end up with "aq2 == NULL", and
> will segfault in the nvptx "openacc.async.serialize_func".

What is "wait async(acc_async_sync)" supposed to mean? Instead of fixing
it here, would it make more sense to have the serialize_func hook accommodate
the NULL asyncqueue?
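
The two options being weighed here (guard in the caller, or make the hook
tolerate a NULL second queue) can be sketched side by side.  This is a toy
under stated assumptions: struct toy_queue and all toy_* functions are
hypothetical stand-ins, not the real goacc_asyncqueue or plugin hooks, and
the "strict" hook merely imitates one that dereferences its second argument
unconditionally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an async queue.  */
struct toy_queue
{
  int serialized_after;   /* times another queue was ordered before this one */
};

int toy_hook_calls;

/* Stand-in for a serialize hook that requires both queues to exist.  */
void
toy_serialize_strict (struct toy_queue *aq1, struct toy_queue *aq2)
{
  assert (aq1 != NULL && aq2 != NULL);
  toy_hook_calls++;
  aq2->serialized_after++;
}

/* Option 1: the caller skips the hook when there is no second queue.  AQ1 is
   assumed to have been synchronized with the host already, so with no AQ2
   there is nothing left to order.  */
void
toy_wait_async_guarded (struct toy_queue *aq1, struct toy_queue *aq2)
{
  if (aq2)
    toy_serialize_strict (aq1, aq2);
}

/* Option 2: the hook itself treats a NULL second queue as a no-op.  */
void
toy_serialize_tolerant (struct toy_queue *aq1, struct toy_queue *aq2)
{
  if (!aq2)
    return;
  toy_serialize_strict (aq1, aq2);
}
```

Both variants behave the same for the NULL case; the difference is only where
the check lives, that is, once in common code or once per plugin.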

Chung-Lin



Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-14 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang wrote:
>  void
>  acc_wait_async (int async1, int async2)
>  {
> +  struct goacc_thread *thr = get_goacc_thread ();
>  
> +  goacc_aq aq2 = lookup_goacc_asyncqueue (thr, true, async2);
> +  goacc_aq aq1 = lookup_goacc_asyncqueue (thr, false, async1);
> +  if (!aq1)
> +gomp_fatal ("invalid async 1");
> +  if (aq1 == aq2)
> +gomp_fatal ("identical parameters");
>  
> +  thr->dev->openacc.async.synchronize_func (aq1);
> +  thr->dev->openacc.async.serialize_func (aq1, aq2);
>  }

Invoked as "acc_wait_async ([...], acc_async_sync)" (as used in a test
case that I'll soon submit/commit), we'll end up with "aq2 == NULL", and
will segfault in the nvptx "openacc.async.serialize_func".

Good to fix as follows?

commit 448ff855bd954a72b5edb19fc1f3d481833fcb59
Author: Thomas Schwinge 
Date:   Thu Dec 13 17:43:42 2018 +0100

into async re-work: adjust for test case added in "[PR88484] OpenACC wait directive without wait argument but with async clause"
---
 libgomp/oacc-async.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index 7e61b5dc0a05..a38e42781aa0 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -196,7 +196,8 @@ acc_wait_async (int async1, int async2)
 gomp_fatal ("identical parameters");
 
   thr->dev->openacc.async.synchronize_func (aq1);
-  thr->dev->openacc.async.serialize_func (aq1, aq2);
+  if (aq2)
+thr->dev->openacc.async.serialize_func (aq1, aq2);
 }
 
 void


Grüße
 Thomas


Re: [PATCH] [PR86823] retain deferred access checks from outside firewall

2018-12-14 Thread Jason Merrill
OK.
On Thu, Dec 13, 2018 at 8:35 PM Alexandre Oliva  wrote:
>
> On Dec  6, 2018, Alexandre Oliva  wrote:
>
> > I'm giving your proposed patch a full round of testing along with other
> > patches.
>
> [PR86823] retain deferred access checks from outside firewall
>
> We used to preserve deferred access checks along with resolved template
> ids, but a tentative parsing firewall introduced additional layers of
> deferred access checks, so that we don't preserve the checks we
> want to any more.
>
> This patch moves the deferred access checks from outside the firewall
> into it.
>
> Regstrapped on x86_64- and i686-linux-gnu.  Ok to install?
>
>
> From: Jason Merrill 
> for  gcc/cp/ChangeLog
>
> PR c++/86823
> * parser.c (cp_parser_template_id): Rearrange deferred access
> checks into the firewall.
>
> From: Alexandre Oliva 
> for  gcc/testsuite/ChangeLog
>
> PR c++/86823
> * g++.dg/pr86823.C: New.
> ---
>  gcc/cp/parser.c|   10 ++
>  gcc/testsuite/g++.dg/pr86823.C |   15 +++
>  2 files changed, 21 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/pr86823.C
>
> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index adfe09e494dc..0bf0e309a588 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -16182,16 +16182,18 @@ cp_parser_template_id (cp_parser *parser,
>is_declaration,
>tag_type,
>&is_identifier);
> +
> +  /* Push any access checks inside the firewall we're about to create.  */
> +  vec<deferred_access_check, va_gc> *checks = get_deferred_access_checks ();
> +  pop_deferring_access_checks ();
>if (templ == error_mark_node || is_identifier)
> -{
> -  pop_deferring_access_checks ();
> -  return templ;
> -}
> +return templ;
>
>/* Since we're going to preserve any side-effects from this parse, set up a
>   firewall to protect our callers from cp_parser_commit_to_tentative_parse
>   in the template arguments.  */
>tentative_firewall firewall (parser);
> +  reopen_deferring_access_checks (checks);
>
>/* If we find the sequence `[:' after a template-name, it's probably
>   a digraph-typo for `< ::'. Substitute the tokens and check if we can
> diff --git a/gcc/testsuite/g++.dg/pr86823.C b/gcc/testsuite/g++.dg/pr86823.C
> new file mode 100644
> index ..18914b00aa8d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr86823.C
> @@ -0,0 +1,15 @@
> +// { dg-do compile }
> +
> +struct X {
> +private:
> +  template
> +  struct Y {
> +int data;
> +  };
> +public:
> +  int value;
> +};
> +
> +int main() {
> +  typename X::Y a; // { dg-error "private" }
> +}
>
>
> --
> Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
> Be the change, be Free! FSF Latin America board member
> GNU Toolchain EngineerFree Software Evangelist
> Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


Re: [PATCH] [PR87012] canonicalize ref type for tmpl arg

2018-12-14 Thread Jason Merrill
On Thu, Dec 13, 2018 at 8:37 PM Alexandre Oliva  wrote:
> On Dec  5, 2018, Jason Merrill  wrote:
>
> > I would expect that this same issue would come up with other types; I
> > think we want to fix this sooner, when we are figuring out what type
> > we want to convert the argument to.
>
> You mean like this?
>
> [PR87012] canonicalize ref type for tmpl arg
>
> When binding an object to a template parameter of reference type, we
> take the address of the object and dereference that address.  The type
> of the address may still carry (template) typedefs, but
> verify_unstripped_args_1 rejects such typedefs other than in the top
> level of template arguments.
>
> Canonicalizing the type we want to convert to right after any
> substitutions or deductions avoids that issue.
>
> Regstrapped on x86_64- and i686-linux-gnu.  Ok to install?
>
>
> for  gcc/cp/ChangeLog
>
> PR c++/87012
> * pt.c (convert_template_argument): Canonicalize type after
> tsubst/deduce.
>
> for  gcc/testsuite/ChangeLog
>
> PR c++/87012
> * g++.dg/cpp0x/pr87012.C: New.
> ---
>  gcc/cp/pt.c  |2 ++
>  gcc/testsuite/g++.dg/cpp0x/pr87012.C |   11 +++
>  2 files changed, 13 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr87012.C
>
> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> index 72ae7173d92c..0d388c67459a 100644
> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -8018,6 +8018,8 @@ convert_template_argument (tree parm,
>if (invalid_nontype_parm_type_p (t, complain))
> return error_mark_node;
>
> +  t = canonicalize_type_argument (t, complain);

Yes, like that, thanks.  It might be a bit of an optimization to skip
this when t == TREE_TYPE (parm).  OK either way.

Jason


Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-14 Thread Chung-Lin Tang

On 2018/12/13 11:51 PM, Thomas Schwinge wrote:

On Thu, 13 Dec 2018 23:28:49 +0800, Chung-Lin Tang wrote:

On 2018/12/7 6:26 AM, Julian Brown wrote:

On Thu, 6 Dec 2018 22:22:46 +
Julian Brown  wrote:


On Thu, 6 Dec 2018 21:42:14 +0100
Thomas Schwinge  wrote:


[...]
..., where the "Invalid read of size 8" happens, and which
eventually would try to "free (tgt)" again, via
libgomp/target.c:gomp_unmap_tgt:

  attribute_hidden void
  gomp_unmap_tgt (struct target_mem_desc *tgt)
  {
    /* Deallocate on target the tgt->tgt_start .. tgt->tgt_end region.  */
    if (tgt->tgt_end)
      gomp_free_device_memory (tgt->device_descr, tgt->to_free);

    free (tgt->array);
    free (tgt);
  }

Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong,
or something else?


I think I understand the problem now. In gomp_unmap_vars_async(), in the case
of tgt->list_count == 0 (i.e. no map arguments at all), the code should simply
free the tgt and return, while the code in goacc_async_copyout_unmap_vars()
didn't handle this case and always scheduled an asynchronous free of the tgt
later, causing the valgrind error you see.

I am still testing the attached patch, but I think it is the right fix: I
reviewed what I wrote, and it seems the way I organized things into a
goacc_async_copyout_unmap_vars() routine, including the hackish refcount++,
etc., is simply unneeded. I have deleted that stuff and consolidated things
back into gomp_unmap_vars_async().
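
The failure mode described above comes down to two code paths both believing
they own the final release.  A minimal sketch of the "exactly one path
releases" rule follows.  It is not the attached patch: struct toy_tgt and the
toy_* functions are hypothetical, and the asynchronous queue is reduced to a
boolean return value that tells the caller whether a deferred release is
still owed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy descriptor; the real struct target_mem_desc has many more fields.  */
struct toy_tgt
{
  int list_count;   /* number of mapped entries */
  int *released;    /* counter bumped each time a descriptor is released */
};

struct toy_tgt *
toy_tgt_new (int list_count, int *released)
{
  struct toy_tgt *tgt = malloc (sizeof *tgt);
  if (!tgt)
    abort ();
  tgt->list_count = list_count;
  tgt->released = released;
  return tgt;
}

/* Release the descriptor.  Takes void * so that it has the shape of a
   callback that could be queued for later execution.  */
void
toy_release (void *ptr)
{
  struct toy_tgt *tgt = ptr;
  ++*tgt->released;
  free (tgt);
}

/* Unmap TGT.  With nothing mapped, release it right away and report that no
   deferred release is pending.  Otherwise leave the release to the deferred
   callback and report that one is owed.  Never do both.  */
bool
toy_unmap_async (struct toy_tgt *tgt)
{
  if (tgt->list_count == 0)
    {
      toy_release (tgt);
      return false;
    }
  return true;
}
```

A caller that queued toy_release unconditionally after toy_unmap_async would
reproduce the double free for the empty case, which is the pattern valgrind
flagged.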

I'll update the whole patch set later after complete testing; the attached
patch applies atop the prior async patches. (The small program you gave above
does pass valgrind now.)

Julian, I didn't try the OG8 refcount changes; it's just too large a set of
changes to reason about in so short a time. Maybe later, when we are prepared
to fix things completely, as you noted those patches were capable of.

Chung-Lin






diff -ru trunk-orig/libgomp/oacc-async.c trunk-work/libgomp/oacc-async.c
--- trunk-orig/libgomp/oacc-async.c 2018-12-14 21:06:06.649794724 +0800
+++ trunk-work/libgomp/oacc-async.c 2018-12-14 22:11:29.252251925 +0800
@@ -238,31 +238,6 @@
   thr->default_async = async;
 }
 
-static void
-goacc_async_unmap_tgt (void *ptr)
-{
-  struct target_mem_desc *tgt = (struct target_mem_desc *) ptr;
-
-  if (tgt->refcount > 1)
-tgt->refcount--;
-  else
-gomp_unmap_tgt (tgt);
-}
-
-attribute_hidden void
-goacc_async_copyout_unmap_vars (struct target_mem_desc *tgt,
-   struct goacc_asyncqueue *aq)
-{
-  struct gomp_device_descr *devicep = tgt->device_descr;
-
-  /* Increment reference to delay freeing of device memory until callback
- has triggered.  */
-  tgt->refcount++;
-  gomp_unmap_vars_async (tgt, true, aq);
-  devicep->openacc.async.queue_callback_func (aq, goacc_async_unmap_tgt,
- (void *) tgt);
-}
-
 attribute_hidden void
 goacc_async_free (struct gomp_device_descr *devicep,
  struct goacc_asyncqueue *aq, void *ptr)
diff -ru trunk-orig/libgomp/oacc-int.h trunk-work/libgomp/oacc-int.h
--- trunk-orig/libgomp/oacc-int.h   2018-12-14 21:06:06.649794724 +0800
+++ trunk-work/libgomp/oacc-int.h   2018-12-14 22:11:43.379947915 +0800
@@ -104,8 +104,6 @@
 
 void goacc_init_asyncqueues (struct gomp_device_descr *);
 bool goacc_fini_asyncqueues (struct gomp_device_descr *);
-void goacc_async_copyout_unmap_vars (struct target_mem_desc *,
-struct goacc_asyncqueue *);
 void goacc_async_free (struct gomp_device_descr *, struct goacc_asyncqueue *,
   void *);
 struct goacc_asyncqueue *get_goacc_asyncqueue (int);
diff -ru trunk-orig/libgomp/oacc-mem.c trunk-work/libgomp/oacc-mem.c
--- trunk-orig/libgomp/oacc-mem.c   2018-12-14 21:06:06.649794724 +0800
+++ trunk-work/libgomp/oacc-mem.c   2018-12-14 22:10:08.325998369 +0800
@@ -911,7 +911,7 @@
   else
{
  goacc_aq aq = get_goacc_asyncqueue (async);
- goacc_async_copyout_unmap_vars (t, aq);
+ gomp_unmap_vars_async (t, true, aq);
}
 }
 
diff -ru trunk-orig/libgomp/oacc-parallel.c trunk-work/libgomp/oacc-parallel.c
--- trunk-orig/libgomp/oacc-parallel.c  2018-12-14 21:06:06.649794724 +0800
+++ trunk-work/libgomp/oacc-parallel.c  2018-12-14 22:09:51.918353575 +0800
@@ -245,7 +245,7 @@
 {
   acc_dev->openacc.async.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs,
dims, tgt, aq);
-  goacc_async_copyout_unmap_vars (tgt, aq);
+  gomp_unmap_vars_async (tgt, true, aq);
 }
 }
 
diff -ru trunk-orig/libgomp/target.c trunk-work/libgomp/target.c
--- trunk-orig/libgomp/target.c 2018-12-14 21:06:06.653794622 +0800
+++ trunk-work/libgomp/target.c 2018-12-14 20:42:03.629154346 +0800
@@ -1072,6 +1072,17 @@
   return is_tgt_unmapped;
 }
 
+static void
+gomp_unref_tgt (void *ptr)
+{
+  struct target_mem_desc *tgt = 

Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-14 Thread Thomas Schwinge
Hi Chung-Lin!

A small starter update to the documentation for you to include.  Please
make sure that all relevant functions have such comments added.
commit 7e0896281d155e1544751f43c1eaace8e005e019
Author: Thomas Schwinge 
Date:   Thu Dec 13 17:59:46 2018 +0100

[WIP] into async re-work: documentation
---
 libgomp/libgomp.h | 3 +++
 libgomp/oacc-async.c  | 7 +++
 libgomp/plugin/plugin-nvptx.c | 4 ++--
 libgomp/target.c  | 3 +++
 4 files changed, 15 insertions(+), 2 deletions(-)

diff --git libgomp/libgomp.h libgomp/libgomp.h
index 8b74d6368389..574fcd1ee4ad 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -949,6 +949,9 @@ typedef struct acc_dispatch_t
   __typeof (GOMP_OFFLOAD_openacc_exec) *exec_func;
 
   struct {
+/* Once created and put into the "active" list, asyncqueues are then never
+   destructed and removed from the "active" list, other than if the TODO
+   device is shut down.  */
 gomp_mutex_t lock;
 int nasyncqueue;
 struct goacc_asyncqueue **asyncqueue;
diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index b091ba2460ac..0f5f74bdf836 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -280,6 +280,10 @@ goacc_async_free (struct gomp_device_descr *devicep,
 devicep->openacc.async.queue_callback_func (aq, free, ptr);
 }
 
+/* This function initializes the asyncqueues for the device specified by
+   DEVICEP.  TODO DEVICEP must be locked on entry, and remains locked on
+   return.  */
+
 attribute_hidden void
 goacc_init_asyncqueues (struct gomp_device_descr *devicep)
 {
@@ -289,6 +293,9 @@ goacc_init_asyncqueues (struct gomp_device_descr *devicep)
   devicep->openacc.async.active = NULL;
 }
 
+/* This function finalizes the asyncqueues for the device specified by DEVICEP.
+   TODO DEVICEP must be locked on entry, and remains locked on return.  */
+
 attribute_hidden bool
 goacc_fini_asyncqueues (struct gomp_device_descr *devicep)
 {
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 7b658264b8e7..577ed39ef3f6 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -1340,14 +1340,14 @@ GOMP_OFFLOAD_openacc_cuda_get_current_context (void)
   return nvptx_get_current_cuda_context ();
 }
 
-/* NOTE: This returns a CUstream, not a ptx_stream pointer.  */
+/* This returns a CUstream.  */
 void *
 GOMP_OFFLOAD_openacc_cuda_get_stream (struct goacc_asyncqueue *aq)
 {
   return (void *) aq->cuda_stream;
 }
 
-/* NOTE: This takes a CUstream, not a ptx_stream pointer.  */
+/* This takes a CUstream.  */
 int
 GOMP_OFFLOAD_openacc_cuda_set_stream (struct goacc_asyncqueue *aq, void *stream)
 {
diff --git libgomp/target.c libgomp/target.c
index e67d9248ae0b..96df1890a729 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -1506,6 +1506,9 @@ gomp_init_device (struct gomp_device_descr *devicep)
   devicep->state = GOMP_DEVICE_INITIALIZED;
 }
 
+/* This function finalizes the target device, specified by DEVICEP.  DEVICEP
+   must be locked on entry, and remains locked on return.  */
+
 attribute_hidden bool
 gomp_fini_device (struct gomp_device_descr *devicep)
 {


Grüße
 Thomas


Re: [C++ PATCH] Use RANGE_EXPRs in build_vec_init (PR c++/82294, PR c++/87436)

2018-12-14 Thread Jason Merrill
OK.
On Thu, Dec 13, 2018 at 6:03 PM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch makes use of RANGE_EXPRs in build_vec_init, instead of
> appending many ctor elements.
>
> E.g. on the PR87436 testcase the memory usage goes down from more than 5GB
> to a few megabytes and compile time decreases significantly too.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Testcases not included, see follow-up patch.
>
> 2018-12-13  Jakub Jelinek  
>
> PR c++/82294
> PR c++/87436
> * init.c (build_vec_init): Change num_initialized_elts type from int
> to HOST_WIDE_INT.  Build a RANGE_EXPR if e needs to be repeated more
> than once.
>
> --- gcc/cp/init.c.jj2018-11-13 09:49:33.150035688 +0100
> +++ gcc/cp/init.c   2018-12-13 15:08:08.446783069 +0100
> @@ -4104,7 +4104,7 @@ build_vec_init (tree base, tree maxindex
>tree compound_stmt;
>int destroy_temps;
>tree try_block = NULL_TREE;
> -  int num_initialized_elts = 0;
> +  HOST_WIDE_INT num_initialized_elts = 0;
>bool is_global;
>tree obase = base;
>bool xvalue = false;
> @@ -4539,10 +4539,13 @@ build_vec_init (tree base, tree maxindex
>
>   if (e)
> {
> - int max = tree_to_shwi (maxindex)+1;
> - for (; num_initialized_elts < max; ++num_initialized_elts)
> + HOST_WIDE_INT last = tree_to_shwi (maxindex);
> + if (num_initialized_elts <= last)
> {
>   tree field = size_int (num_initialized_elts);
> + if (num_initialized_elts != last)
> +   field = build2 (RANGE_EXPR, sizetype, field,
> +   size_int (last));
>   CONSTRUCTOR_APPEND_ELT (const_vec, field, e);
> }
> }
>
> Jakub
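
The saving described in the patch comes from replacing one constructor entry
per index with a single entry that covers a range of indices.  The idea can
be shown without GCC's tree API at all.  In this sketch struct toy_elt and
both toy_fill_* functions are hypothetical; they only mimic an index range
and are not how build_vec_init is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Toy initializer entry covering indices LO..HI inclusive with one value.  */
struct toy_elt
{
  long lo;
  long hi;
  int value;
};

/* Emit one entry per index in FIRST..LAST.  Returns the number of entries
   written, which grows linearly with the size of the range.  */
size_t
toy_fill_each (struct toy_elt *out, size_t cap, long first, long last,
               int value)
{
  size_t n = 0;
  for (long i = first; i <= last && n < cap; i++)
    out[n++] = (struct toy_elt) { i, i, value };
  return n;
}

/* Emit a single entry covering FIRST..LAST, or nothing if the range is
   empty.  The cost no longer depends on the size of the range.  */
size_t
toy_fill_range (struct toy_elt *out, size_t cap, long first, long last,
                int value)
{
  if (first > last || cap == 0)
    return 0;
  out[0] = (struct toy_elt) { first, last, value };
  return 1;
}
```

For a range of eight indices the first function writes eight entries and the
second writes one; scale the range up to the array sizes in the PR and the
memory difference follows.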


Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-14 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang wrote:
> --- a/libgomp/oacc-async.c
> +++ b/libgomp/oacc-async.c

> +attribute_hidden struct goacc_asyncqueue *
> +lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
> +{
> +  /* The special value acc_async_noval (-1) maps to the thread-specific
> + default async stream.  */
> +  if (async == acc_async_noval)
> +async = thr->default_async;
> +
> +  if (async == acc_async_sync)
> +return NULL;
> +
> +  if (async < 0)
> +gomp_fatal ("bad async %d", async);

To make this "resolve" part more obvious, that is, the translation from
the "async" argument to an "asyncqueue" array index:

> +  if (!create
> +  && (async >= dev->openacc.async.nasyncqueue
> +   || !dev->openacc.async.asyncqueue[async]))
> +return NULL;
> +[...]

..., I propose adding a "async2id" function for that, and then rename all
"asyncqueue[async]" to "asyncqueue[id]".

And, this also restores the current trunk behavior, so that
"acc_async_noval" gets its own, separate "asyncqueue".

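The proposed mapping is small enough to try in isolation.  The sketch below
is a standalone approximation, not the patch itself: the toy_* names are
hypothetical, the value -1 for the "noval" constant is taken from the comment
quoted above, the value -2 for the "sync" constant is an assumption about
openacc.h, and invalid arguments return -2 here where the real code would
report a fatal error.

```c
#include <assert.h>

/* Assumed values; see the note above.  */
enum
{
  toy_async_noval = -1,
  toy_async_sync = -2
};

/* Translate an async-argument into a queue index:
     sync        -> -1  (no queue is used)
     noval       ->  0  (its own, separate queue)
     async >= 0  ->  1 + async
   Any other value is invalid and yields -2 in this toy.  Overflow for
   async == INT_MAX is deliberately ignored.  */
int
toy_async2id (int async)
{
  if (async == toy_async_sync)
    return -1;
  if (async == toy_async_noval)
    return 0;
  if (async >= 0)
    return 1 + async;
  return -2;
}
```

Shifting the non-negative values up by one is what frees index 0 for the
"noval" case, so it no longer shares a queue with async-argument 0.
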
commit e0d10cd744906c031af536bbf523ed6607370bf7
Author: Thomas Schwinge 
Date:   Wed Dec 12 15:22:29 2018 +0100

into async re-work: libgomp/oacc-async.c:async2id
---
 libgomp/oacc-async.c | 58 +++-
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index c9b134ac3380..b091ba2460ac 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -54,53 +54,73 @@ get_goacc_thread_device (void)
   return thr->dev;
 }
 
-attribute_hidden struct goacc_asyncqueue *
-lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
+/* Translate from an OpenACC async-argument to an internal asyncqueue ID, or -1
+   if no asyncqueue is to be used.  */
+
+static int
+async2id (int async)
 {
-  /* The special value acc_async_noval (-1) maps to the thread-specific
- default async stream.  */
-  if (async == acc_async_noval)
-async = 0; //TODO thr->default_async;
+  if (!async_valid_p (async))
+gomp_fatal ("invalid async-argument: %d", async);
 
   if (async == acc_async_sync)
+return -1;
+  else if (async == acc_async_noval)
+return 0;
+  else if (async >= 0)
+return 1 + async;
+  else
+__builtin_unreachable ();
+}
+
+/* Return the asyncqueue to be used for OpenACC async-argument ASYNC.  This
+   might return NULL if no asyncqueue is to be used.  Otherwise, if CREATE,
+   create the asyncqueue if it doesn't exist yet.  */
+
+attribute_hidden struct goacc_asyncqueue *
+lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
+{
+  int id = async2id (async);
+  if (id < 0)
 return NULL;
 
-  if (async < 0)
-gomp_fatal ("bad async %d", async);
-
   struct gomp_device_descr *dev = thr->dev;
 
   if (!create
-  && (async >= dev->openacc.async.nasyncqueue
- || !dev->openacc.async.asyncqueue[async]))
+  && (id >= dev->openacc.async.nasyncqueue
+ || !dev->openacc.async.asyncqueue[id]))
 return NULL;
 
   gomp_mutex_lock (&dev->openacc.async.lock);
-  if (async >= dev->openacc.async.nasyncqueue)
+  if (id >= dev->openacc.async.nasyncqueue)
 {
-  int diff = async + 1 - dev->openacc.async.nasyncqueue;
+  int diff = id + 1 - dev->openacc.async.nasyncqueue;
   dev->openacc.async.asyncqueue
= gomp_realloc (dev->openacc.async.asyncqueue,
-   sizeof (goacc_aq) * (async + 1));
+   sizeof (goacc_aq) * (id + 1));
   memset (dev->openacc.async.asyncqueue + dev->openacc.async.nasyncqueue,
  0, sizeof (goacc_aq) * diff);
-  dev->openacc.async.nasyncqueue = async + 1;
+  dev->openacc.async.nasyncqueue = id + 1;
 }
 
-  if (!dev->openacc.async.asyncqueue[async])
+  if (!dev->openacc.async.asyncqueue[id])
 {
-  dev->openacc.async.asyncqueue[async] = dev->openacc.async.construct_func ();
+  dev->openacc.async.asyncqueue[id] = dev->openacc.async.construct_func ();
 
   /* Link new async queue into active list.  */
   goacc_aq_list n = gomp_malloc (sizeof (struct goacc_asyncqueue_list));
-  n->aq = dev->openacc.async.asyncqueue[async];
+  n->aq = dev->openacc.async.asyncqueue[id];
   n->next = dev->openacc.async.active;
   dev->openacc.async.active = n;
 }
   gomp_mutex_unlock (&dev->openacc.async.lock);
-  return dev->openacc.async.asyncqueue[async];
+  return dev->openacc.async.asyncqueue[id];
 }
 
+/* Return the asyncqueue to be used for OpenACC async-argument ASYNC.  This
+   might return NULL if no asyncqueue is to be used.  Otherwise, create the
+   asyncqueue if it doesn't exist yet.  */
+
 attribute_hidden struct goacc_asyncqueue *
 get_goacc_asyncqueue (int async)
 {


Grüße
 Thomas


Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-14 Thread Thomas Schwinge
Hi Chung-Lin!

On Fri, 7 Dec 2018 22:19:14 +0800, Chung-Lin Tang wrote:
> On 2018/12/7 07:32 PM, Thomas Schwinge wrote:
> > Does the following make sense?
> 
> I don't quite remember why I simply ensured asyncqueue creation here at the
> time, maybe simply because it allowed simpler code at this level.

Well, I think it's just overhead we can avoid.  ;-)
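
To make the avoided overhead concrete, here is a toy lookup table with a
"create" flag, loosely shaped like the lookup discussed in this thread.  It
is a sketch under stated assumptions: struct toy_aq, struct toy_table and the
toy_* functions are hypothetical, there is no locking, and a queue that was
never created is simply reported as idle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct toy_aq
{
  int pending;   /* nonzero while work is outstanding */
};

struct toy_table
{
  int n;                  /* number of slots allocated */
  struct toy_aq **slots;  /* slots[id] is NULL until the queue is created */
};

/* Return the queue for ID.  Without CREATE, an unseen ID yields NULL and the
   table is left untouched.  With CREATE, the table grows as needed (new
   slots zeroed) and the queue is allocated on first use.  */
struct toy_aq *
toy_lookup (struct toy_table *t, bool create, int id)
{
  assert (id >= 0);
  if (!create && (id >= t->n || !t->slots[id]))
    return NULL;

  if (id >= t->n)
    {
      struct toy_aq **grown = realloc (t->slots, sizeof *grown * (id + 1));
      if (!grown)
        abort ();
      memset (grown + t->n, 0, sizeof *grown * (id + 1 - t->n));
      t->slots = grown;
      t->n = id + 1;
    }

  if (!t->slots[id])
    {
      t->slots[id] = calloc (1, sizeof (struct toy_aq));
      if (!t->slots[id])
        abort ();
    }
  return t->slots[id];
}

/* Test whether queue ID is idle, without creating it as a side effect.  */
int
toy_async_test (struct toy_table *t, int id)
{
  struct toy_aq *aq = toy_lookup (t, false, id);
  if (!aq)
    return 1;
  return aq->pending == 0;
}
```

Testing an unseen ID allocates nothing in this model, whereas a lookup with
the create flag would have grown the table and constructed a queue only to
find it idle.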

> OTOH, the old logic is to GOMP_fatal upon such an unknown queue case, so
> maybe that's the right thing to do (inside lookup_goacc_asyncqueue()),
> instead of silently allowing it to pass.
> 
> WDYT?

I argued and posted patches (or will post if not yet done) to make this
defined, valid behavior; see "[OpenACC] Correctly handle unseen
async-arguments".  Please speak up soon if you disagree.

Thus, I still propose that you include the following.

Please especially review the "libgomp/oacc-parallel.c:goacc_wait" change,
and confirm no corresponding "libgomp/oacc-parallel.c:GOACC_wait" change
to be done, because that code is structured differently.

commit c96c6607b77bdbf562f35209718d8b8c5705c603
Author: Thomas Schwinge 
Date:   Fri Dec 7 12:19:56 2018 +0100

into async re-work: don't create an asyncqueue just to then test/synchronize with it
---
 libgomp/oacc-async.c| 12 
 libgomp/oacc-parallel.c |  4 +++-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index 553082fe3d4a..c9b134ac3380 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -119,8 +119,11 @@ acc_async_test (int async)
   if (!thr || !thr->dev)
 gomp_fatal ("no device active");
 
-  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
-  return thr->dev->openacc.async.test_func (aq);
+  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, async);
+  if (!aq)
+return 1;
+  else
+return thr->dev->openacc.async.test_func (aq);
 }
 
 int
@@ -148,8 +151,9 @@ acc_wait (int async)
 
   struct goacc_thread *thr = get_goacc_thread ();
 
-  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
-  thr->dev->openacc.async.synchronize_func (aq);
+  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, async);
+  if (aq)
+thr->dev->openacc.async.synchronize_func (aq);
 }
 
 /* acc_async_wait is an OpenACC 1.0 compatibility name for acc_wait.  */
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 2815a10f0386..9519abeccc2c 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -493,7 +493,9 @@ goacc_wait (int async, int num_waits, va_list *ap)
 {
   int qid = va_arg (*ap, int);
   
-  goacc_aq aq = get_goacc_asyncqueue (qid);
+  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, qid);
+  if (!aq)
+   continue;
   if (acc_dev->openacc.async.test_func (aq))
continue;
   if (async == acc_async_sync)


Regards
 Thomas
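
For readers skimming the thread, the intent of the change above can be
sketched in isolation.  This is only an illustration under stated
assumptions: the names lookup_queue, queue_test, queue_wait and
queue_launch are hypothetical stand-ins, not the libgomp functions, and
the fixed-size table replaces the real per-device data structures.  It
shows a lookup with a "create" flag, where testing or waiting on a queue
that was never used reports "complete" without creating anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUES 8

/* Hypothetical queue: only tracks how many operations are outstanding.  */
struct queue
{
  int pending;
};

static struct queue storage[MAX_QUEUES];
static struct queue *table[MAX_QUEUES];

/* Return the queue for ID.  Create it only if CREATE is true; otherwise
   return NULL when the queue has never been used.  */
struct queue *
lookup_queue (int id, bool create)
{
  assert (id >= 0 && id < MAX_QUEUES);
  if (!table[id] && create)
    {
      storage[id].pending = 0;
      table[id] = &storage[id];
    }
  return table[id];
}

/* Enqueue one operation; this is the only place that creates a queue.  */
void
queue_launch (int id)
{
  lookup_queue (id, true)->pending++;
}

/* Return 1 if everything on queue ID has completed.  A queue that was
   never created has nothing outstanding, so it counts as complete.  */
int
queue_test (int id)
{
  struct queue *q = lookup_queue (id, false);
  if (!q)
    return 1;
  return q->pending == 0;
}

/* Wait for queue ID; a no-op for a queue that was never created.  */
void
queue_wait (int id)
{
  struct queue *q = lookup_queue (id, false);
  if (q)
    q->pending = 0;
}
```

Calling queue_test (3) before any queue_launch (3) returns 1 and leaves
the table entry NULL, which is the overhead the patch avoids.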


Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-14 Thread Thomas Schwinge
Hi Chung-Lin!

On Thu, 06 Dec 2018 21:42:14 +0100, I wrote:
> On Tue, 25 Sep 2018 21:09:49 +0800, Chung-Lin Tang wrote:
> > Also included in this patch is the code for the acc_get/set_default_async
> > API functions in OpenACC 2.5.
> > It's a minor part of this patch, but since some code was merged together,
> > I'm submitting it together here.
> 
> As I requested, I'm reviewing those changes separately, and have backed
> out those changes in my working copy.

... as follows:

commit 79b89a5214dc2624a52f0593bbfad5cefed0c025
Author: Thomas Schwinge 
Date:   Thu Dec 6 15:57:46 2018 +0100

into async re-work: revert default_async changes
---
 include/gomp-constants.h   |   1 -
 libgomp/libgomp.map|   4 -
 libgomp/oacc-async.c   |  19 +-
 libgomp/oacc-init.c|   2 -
 libgomp/oacc-int.h |   3 -
 libgomp/openacc.f90|  22 +-
 libgomp/openacc.h  |   3 -
 libgomp/openacc_lib.h  |  13 -
 .../libgomp.oacc-c-c++-common/asyncwait-2.c| 904 -
 9 files changed, 2 insertions(+), 969 deletions(-)

diff --git include/gomp-constants.h include/gomp-constants.h
index acd25851bcc7..1021306ed661 100644
--- include/gomp-constants.h
+++ include/gomp-constants.h
@@ -160,7 +160,6 @@ enum gomp_map_kind
 /* Asynchronous behavior.  Keep in sync with
libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_async_t.  */
 
-#define GOMP_ASYNC_DEFAULT 0
 #define GOMP_ASYNC_NOVAL   -1
 #define GOMP_ASYNC_SYNC    -2
 
diff --git libgomp/libgomp.map libgomp/libgomp.map
index c5e1b876fccd..d2381da3bf07 100644
--- libgomp/libgomp.map
+++ libgomp/libgomp.map
@@ -464,12 +464,8 @@ OACC_2.5 {
acc_delete_finalize_async_32_h_;
acc_delete_finalize_async_64_h_;
acc_delete_finalize_async_array_h_;
-   acc_get_default_async;
-   acc_get_default_async_h_;
acc_memcpy_from_device_async;
acc_memcpy_to_device_async;
-   acc_set_default_async;
-   acc_set_default_async_h_;
acc_update_device_async;
acc_update_device_async_32_h_;
acc_update_device_async_64_h_;
diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index 68aaf199a27e..553082fe3d4a 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -60,7 +60,7 @@ lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
   /* The special value acc_async_noval (-1) maps to the thread-specific
  default async stream.  */
   if (async == acc_async_noval)
-async = thr->default_async;
+async = 0; //TODO thr->default_async;
 
   if (async == acc_async_sync)
 return NULL;
@@ -221,23 +221,6 @@ acc_wait_all_async (int async)
   gomp_mutex_unlock (&thr->dev->openacc.async.lock);
 }
 
-int
-acc_get_default_async (void)
-{
-  struct goacc_thread *thr = get_goacc_thread ();
-  return thr->default_async;
-}
-
-void
-acc_set_default_async (int async)
-{
-  if (async < acc_async_sync)
-gomp_fatal ("invalid async argument: %d", async);
-
-  struct goacc_thread *thr = get_goacc_thread ();
-  thr->default_async = async;
-}
-
 static void
 goacc_async_unmap_tgt (void *ptr)
 {
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index 2c2f91ce3c2c..c40f48829078 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -426,8 +426,6 @@ goacc_attach_host_thread_to_device (int ord)
   
   thr->target_tls
 = acc_dev->openacc.create_thread_data_func (ord);
-
-  thr->default_async = acc_async_default;
 }
 
 /* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
diff --git libgomp/oacc-int.h libgomp/oacc-int.h
index 3354eb654ce9..97f3fc8a61ed 100644
--- libgomp/oacc-int.h
+++ libgomp/oacc-int.h
@@ -73,9 +73,6 @@ struct goacc_thread
 
   /* Target-specific data (used by plugin).  */
   void *target_tls;
-
-  /* Default OpenACC async queue for current thread, exported to plugin.  */
-  int default_async;
 };
 
 #if defined HAVE_TLS || defined USE_EMUTLS
diff --git libgomp/openacc.f90 libgomp/openacc.f90
index 7d31ee689479..7c809fe00738 100644
--- libgomp/openacc.f90
+++ libgomp/openacc.f90
@@ -51,10 +51,9 @@ module openacc_kinds
 
   integer, parameter :: acc_handle_kind = int32
 
-  public :: acc_async_default, acc_async_noval, acc_async_sync
+  public :: acc_async_noval, acc_async_sync
 
   ! Keep in sync with include/gomp-constants.h.
-  integer (acc_handle_kind), parameter :: acc_async_default = 0
   integer (acc_handle_kind), parameter :: acc_async_noval = -1
   integer (acc_handle_kind), parameter :: acc_async_sync = -2
 
@@ -93,16 +92,6 @@ module openacc_internal
   integer (acc_device_kind) d
 end function
 
-subroutine acc_set_default_async_h (a)
-  import
-  integer a
-end subroutine
-
-function acc_get_default_async_h ()
-  import
-  integer 

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-14 Thread Richard Sandiford
(Maybe the discussion has moved on from this already -- sorry if so)

Christophe Lyon  writes:
> On Wed, 12 Dec 2018 at 12:21, Thomas Preudhomme
>  wrote:
>>
>> So my understanding is that the original code (CMSIS library) used to
>> clobber sp because the asm statement was actually changing the sp.
>> That in turn led GCC to try to save and restore sp which is not what
>> CMSIS was expecting to happen. Changing sp without clobber as done now
>> is probably the right solution and r242693 can be reverted. That will
>> remove the failing test.
>>
>
> OK, I read PR52813 too, but I'm not sure I fully understand the new status.
> My understanding is that since this patch was committed, if an asm statement
> clobbers sp, it is no longer allowed to actually declare it as a clobber (this
> patch generates an error in such a case).
> So the user is now expected to lie to the compiler when writing to
> this kind of register (sp, pic register), by not declaring it as "clobber"?

The user isn't expected to lie.  The point is that GCC simply doesn't
support asms that leave the stack pointer with a different value from
before, and IMO never has.  If that happened to work in some cases then
it was purely an accident.

The PRs also show disagreement about what GCC should do for an asm like
that.  The asm in PR52813 temporarily changed the stack pointer and the
bug was that GCC didn't restore the original value afterwards.  The asm
in PR77904 was trying to set the stack pointer to an entirely new value
and the bug was that GCC did restore the original value afterwards,
defeating the point.

This wouldn't be the first time that there's disagreement about what
the behaviour should be.  But IMO we can't support either reliably.
Spilling sp is dangerous in general because we might need the stack
for the reload, or we might accidentally try to reload something else
before restoring the stack pointer.  And continuing with a new sp
provided by the asm could lead to all sorts of problems.  (AIUI, the
point of PR77904 was that it would also be wrong for GCC to set up a
frame pointer and restore the sp from that frame pointer on function
exit.  The new sp value was supposed to survive.  So the answer isn't
simply "use a frame pointer".)

Thanks,
Richard


Re: [PATCH] error on missing LTO symbols

2018-12-14 Thread Jakub Jelinek
On Fri, Dec 14, 2018 at 02:17:49PM +0100, Tom de Vries wrote:
> Build x86_64 and reg-tested libgomp.
> 
> 2018-12-13  Tom de Vries  
> 
>   * lto-cgraph.c (verify_node_partition): New function.
>   (input_overwrite_node, input_varpool_node): Use verify_node_partition.
> 
>   * testsuite/libgomp.c-c++-common/function-not-offloaded-aux.c: New test.
>   * testsuite/libgomp.c-c++-common/function-not-offloaded.c: New test.
>   * testsuite/libgomp.c-c++-common/variable-not-offloaded.c: New test.
>   * testsuite/libgomp.oacc-c-c++-common/function-not-offloaded.c: New test.
>   * testsuite/libgomp.oacc-c-c++-common/variable-not-offloaded.c: New test.

> +/* Verify the partitioning of a varpool_node or cgraph_node with DECL and NAME,
> +   as specified by IN_OTHER_PARTITION and USED_FROM_OTHER_PARTITION.  */
> +
> +static inline void
> +verify_node_partition (symtab_node *node)

Please update the function comment.

Ok with that change.

Jakub


[PATCH] Fix PR84521

2018-12-14 Thread Wilco Dijkstra
This fixes and simplifies the setjmp and non-local goto implementation.
Currently the virtual frame pointer is saved when using __builtin_setjmp or
a non-local goto.  Depending on whether a frame pointer is used, this may
either save SP or FP with an immediate offset.  However the goto or longjmp
always updates the hard frame pointer.

A receiver veneer in the original function then assigns the hard frame pointer
to the virtual frame pointer, which should, if it works correctly, again assign
SP or FP.  However the special elimination code in eliminate_regs_in_insn
doesn't do this correctly unless the frame pointer is used, and even if it
worked by writing SP, the frame pointer would still be corrupted.

A much simpler implementation is to always save and restore the hard frame
pointer.  This avoids 2 redundant instructions which add/subtract the virtual
frame offset.  A large amount of code can be removed as a result, including all
implementations of TARGET_BUILTIN_SETJMP_FRAME_VALUE (all of which already use
the hard frame pointer).  The expansion of nonlocal_goto on PA can be simplified
to just restore the hard frame pointer.

This fixes the most obvious issues, however there are still issues on targets
which define HARD_FRAME_POINTER_IS_FRAME_POINTER (arm, mips, xtensa).
Each function could have a different hard frame pointer, so a non-local goto
may restore the wrong frame pointer (TARGET_BUILTIN_SETJMP_FRAME_VALUE could
be useful for this).

The i386 TARGET_BUILTIN_SETJMP_FRAME_VALUE was incorrect: if stack_realign_fp
is true, it would save the hard frame pointer value but restore the virtual
frame pointer which according to ix86_initial_elimination_offset can have a
non-zero offset from the hard frame pointer.

The ia64 implementation of nonlocal_goto seems incorrect since the helper
function moves the frame pointer value into the static chain register
(so this patch does nothing to make it better or worse).

AArch64 bootstrap OK, new test passes on AArch64, x86-64 and Arm.
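
As background for the description above, here is a minimal sketch of the
kind of code that exercises this path.  It is not the pr84521.c test from
the patch; the names jump_buffer, landings, jump_back and run_once are
made up for illustration.  It relies only on the documented GCC builtins:
__builtin_setjmp takes a five-word buffer, and __builtin_longjmp must be
called from a different function with the constant 1 as second argument.

```c
#include <assert.h>

/* __builtin_setjmp needs a buffer of five words.  */
static void *jump_buffer[5];

/* Counts how often the landing path ran; static so that its value is
   well defined after the jump.  */
static int landings;

/* The jump has to come from a function other than the one that called
   __builtin_setjmp.  */
static void __attribute__ ((noinline))
jump_back (void)
{
  /* The second argument of __builtin_longjmp must be the constant 1.  */
  __builtin_longjmp (jump_buffer, 1);
}

int __attribute__ ((noinline))
run_once (void)
{
  if (__builtin_setjmp (jump_buffer) == 0)
    {
      jump_back ();
      return -1;		/* Not reached.  */
    }

  /* Reached through the receiver after the jump; the frame pointer has
     to be valid again here for local accesses to work.  */
  landings++;
  return 42;
}
```

A caller would expect run_once () to return 42 each time and landings to
increase by one per call.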

ChangeLog:
2018-12-13  Wilco Dijkstra  

gcc/
PR middle-end/84521
* builtins.c (expand_builtin_setjmp_setup): Save hard_frame_pointer_rtx.
(expand_builtin_setjmp_receiver): Do not emit sfp = fp move since we restore fp.
* function.c (expand_function_start): Save hard_frame_pointer_rtx for non-local goto.
* lra-eliminations.c (eliminate_regs_in_insn): Remove sfp = fp elimination code.
(remove_reg_equal_offset_note): Remove unused function.
* reload1.c (eliminate_regs_in_insn): Remove sfp = fp elimination code.
* config/arc/arc.c (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Remove.
(arc_builtin_setjmp_frame_value): Remove function.
* config/avr/avr.c  (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Remove.
(avr_builtin_setjmp_frame_value): Remove function.
* config/i386/i386.c (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Remove.
(ix86_builtin_setjmp_frame_value): Remove function.
* config/pa/pa.md (nonlocal_goto): Remove FP adjustment.
* config/sparc/sparc.c (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Remove.
(sparc_builtin_setjmp_frame_value): Remove function.
* config/vax/vax.c (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Remove.
(vax_builtin_setjmp_frame_value): Remove function.

testsuite/
PR middle-end/84521
* gcc.c-torture/execute/pr84521.c: New test.

---
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 2ef9c9afcc69fcb775dc6a6fff550025bdc76337..55b78adbc3df8c970083e6d9b548a8ca7dc52600 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -982,7 +982,7 @@ expand_builtin_setjmp_setup (rtx buf_addr, rtx receiver_label)
 
   mem = gen_rtx_MEM (Pmode, buf_addr);
   set_mem_alias_set (mem, setjmp_alias_set);
-  emit_move_insn (mem, targetm.builtin_setjmp_frame_value ());
+  emit_move_insn (mem, hard_frame_pointer_rtx);
 
   mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, buf_addr,
   GET_MODE_SIZE (Pmode))),
@@ -1024,31 +1024,6 @@ expand_builtin_setjmp_receiver (rtx receiver_label)
   if (chain && REG_P (chain))
 emit_clobber (chain);
 
-  /* Now put in the code to restore the frame pointer, and argument
- pointer, if needed.  */
-  if (! targetm.have_nonlocal_goto ())
-{
-  /* First adjust our frame pointer to its actual value.  It was
-previously set to the start of the virtual area corresponding to
-the stacked variables when we branched here and now needs to be
-adjusted to the actual hardware fp value.
-
-Assignments to virtual registers are converted by
-instantiate_virtual_regs into the corresponding assignment
-to the underlying register (fp in this case) that makes
-the original assignment true.
-So the following insn will actually be decrementing fp by
-TARGET_STARTING_FRAME_OFFSET.  */
-  emit_move_insn (virtual_stack_vars_rtx, hard_frame_pointer_rtx);
-
-  /* 

Re: [PATCH] error on missing LTO symbols

2018-12-14 Thread Tom de Vries
On 14-12-18 14:08, Jakub Jelinek wrote:
> On Fri, Dec 14, 2018 at 02:07:18PM +0100, Tom de Vries wrote:
>> Done, using offload_device_nonshared_as for
>> libgomp.c-c++-common/variable-not-offloaded.c and
>> openacc_nvidia_accel_configured for
>> libgomp.oacc-c-c++-common/function-not-offloaded.c.
>>
>>> Otherwise LGTM.
>>
>> Updated patch OK?
> 
> ENOPATCH

Sorry, here it is.

Thanks,
- Tom
[offloading] Error on missing symbols

When compiling an OpenMP or OpenACC program containing a reference in the
offloaded code to a symbol that has not been included in the offloaded code,
the offloading compiler may ICE in lto1.

Fix this by erroring out instead, mentioning the problematic symbol:
...
error: variable 'var' has been referenced in offloaded code but hasn't
  been marked to be included in the offloaded code
lto1: fatal error: errors during merging of translation units
compilation terminated.
...

Build x86_64 with nvptx accelerator and reg-tested libgomp.

Build x86_64 and reg-tested libgomp.

2018-12-13  Tom de Vries  

	* lto-cgraph.c (verify_node_partition): New function.
	(input_overwrite_node, input_varpool_node): Use verify_node_partition.

	* testsuite/libgomp.c-c++-common/function-not-offloaded-aux.c: New test.
	* testsuite/libgomp.c-c++-common/function-not-offloaded.c: New test.
	* testsuite/libgomp.c-c++-common/variable-not-offloaded.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/function-not-offloaded.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/variable-not-offloaded.c: New test.

---
 gcc/lto-cgraph.c   | 40 ++
 .../function-not-offloaded-aux.c   | 12 +++
 .../libgomp.c-c++-common/function-not-offloaded.c  | 16 +
 .../libgomp.c-c++-common/variable-not-offloaded.c  | 19 ++
 .../function-not-offloaded.c   | 18 ++
 .../variable-not-offloaded.c   | 17 +
 6 files changed, 115 insertions(+), 7 deletions(-)

diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 8cc3c75..546abaeff48 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1091,6 +1091,37 @@ output_offload_tables (void)
 }
 }
 
+/* Verify the partitioning of a varpool_node or cgraph_node with DECL and NAME,
+   as specified by IN_OTHER_PARTITION and USED_FROM_OTHER_PARTITION.  */
+
+static inline void
+verify_node_partition (symtab_node *node)
+{
+  if (flag_ltrans)
+return;
+
+#ifdef ACCEL_COMPILER
+  if (node->in_other_partition)
+{
+  if (TREE_CODE (node->decl) == FUNCTION_DECL)
+	error_at (DECL_SOURCE_LOCATION (node->decl),
+		  "function %qs has been referenced in offloaded code but"
+		  " hasn%'t been marked to be included in the offloaded code",
+		  node->name ());
+  else if (VAR_P (node->decl))
+	error_at (DECL_SOURCE_LOCATION (node->decl),
+		  "variable %qs has been referenced in offloaded code but"
+		  " hasn%'t been marked to be included in the offloaded code",
+		  node->name ());
+  else
+	gcc_unreachable ();
+}
+#else
+  gcc_assert (!node->in_other_partition
+	  && !node->used_from_other_partition);
+#endif
+}
+
 /* Overwrite the information in NODE based on FILE_DATA, TAG, FLAGS,
STACK_SIZE, SELF_TIME and SELF_SIZE.  This is called either to initialize
NODE or to replace the values in it, for instance because the first
@@ -1153,9 +1184,7 @@ input_overwrite_node (struct lto_file_decl_data *file_data,
   node->resolution = bp_unpack_enum (bp, ld_plugin_symbol_resolution,
  LDPR_NUM_KNOWN);
   node->split_part = bp_unpack_value (bp, 1);
-  gcc_assert (flag_ltrans
-	  || (!node->in_other_partition
-		  && !node->used_from_other_partition));
+  verify_node_partition (node);
 }
 
 /* Return string alias is alias of.  */
@@ -1366,10 +1395,7 @@ input_varpool_node (struct lto_file_decl_data *file_data,
 node->set_section_for_node (section);
   node->resolution = streamer_read_enum (ib, ld_plugin_symbol_resolution,
 	LDPR_NUM_KNOWN);
-  gcc_assert (flag_ltrans
-	  || (!node->in_other_partition
-		  && !node->used_from_other_partition));
-
+  verify_node_partition (node);
   return node;
 }
 
diff --git a/libgomp/testsuite/libgomp.c-c++-common/function-not-offloaded-aux.c b/libgomp/testsuite/libgomp.c-c++-common/function-not-offloaded-aux.c
new file mode 100644
index 000..b8aa3da48a1
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/function-not-offloaded-aux.c
@@ -0,0 +1,12 @@
+/* { dg-skip-if "" { *-*-* } } */
+
+#pragma omp declare target
+extern int var;
+#pragma omp end declare target
+
+void __attribute__((noinline, noclone))
+foo (void)
+{
+  var++;
+}
+
diff --git a/libgomp/testsuite/libgomp.c-c++-common/function-not-offloaded.c b/libgomp/testsuite/libgomp.c-c++-common/function-not-offloaded.c
new file mode 100644
index 000..9e59ef8864e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/function-not-offloaded.c
@@ -0,0 +1,16 @@
+/* { dg-do link } */
+/* { 

Re: [PATCH] error on missing LTO symbols

2018-12-14 Thread Jakub Jelinek
On Fri, Dec 14, 2018 at 02:07:18PM +0100, Tom de Vries wrote:
> Done, using offload_device_nonshared_as for
> libgomp.c-c++-common/variable-not-offloaded.c and
> openacc_nvidia_accel_configured for
> libgomp.oacc-c-c++-common/function-not-offloaded.c.
> 
> > Otherwise LGTM.
> 
> Updated patch OK?

ENOPATCH

Jakub


Re: [PATCH] error on missing LTO symbols

2018-12-14 Thread Tom de Vries
[ cc-ing HSAIL maintainer ]

On 14-12-18 10:54, Jakub Jelinek wrote:
> On Fri, Dec 14, 2018 at 10:21:35AM +0100, Tom de Vries wrote:
>> Build and reg-tested on x86_64 with nvptx accelerator.
>>
>> 2018-12-13  Tom de Vries  
>>
>>  * lto-cgraph.c (verify_node_partition): New function.
>>  (input_overwrite_node, input_varpool_node): Use verify_node_partition.
>>
>>  * testsuite/libgomp.c-c++-common/function-not-offloaded.c: New test.
>>  * testsuite/libgomp.c-c++-common/variable-not-offloaded.c: New test.
> 
>> +  if (TREE_CODE (decl) == FUNCTION_DECL
>> +  || TREE_CODE (decl) == VAR_DECL)
>> +error_at (DECL_SOURCE_LOCATION (decl),
>> +  "%s %qs has been referenced in offloaded code but"
>> +  " hasn't been marked to be included in the offloaded code",
>> +  TREE_CODE (decl) == FUNCTION_DECL ? "function" : "variable",
>> +  name);
> 
> This is translation unfriendly.  Please just do:
>   if (TREE_CODE (decl) == FUNCTION_DECL)
>   error_at (...);
>   else if (VAR_P (decl))
>   error_at (...);
>   else
>   gcc_unreachable ();
> (also note VAR_P).  And, please use hasn%'t instead of hasn't.
> 

Done.
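
To illustrate the translation point outside of GCC's diagnostic
machinery, here is a small sketch using plain snprintf.  The enum and
the function format_not_offloaded are hypothetical; the real code uses
error_at with %qs and %'.  The idea shown is that each message is one
complete string, so a translator never has to deal with a sentence
assembled from a "%s" fragment such as "function" or "variable".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for distinguishing FUNCTION_DECL from VAR_DECL.  */
enum decl_kind
{
  DECL_KIND_FUNCTION,
  DECL_KIND_VARIABLE
};

/* Write the diagnostic for NAME into BUF (of size LEN) and return the
   snprintf result, or -1 for an unexpected kind.  Each branch carries a
   whole sentence rather than splicing a word into a shared template.  */
int
format_not_offloaded (char *buf, size_t len, enum decl_kind kind,
		      const char *name)
{
  switch (kind)
    {
    case DECL_KIND_FUNCTION:
      return snprintf (buf, len,
		       "function '%s' has been referenced in offloaded code"
		       " but hasn't been marked to be included in the"
		       " offloaded code", name);
    case DECL_KIND_VARIABLE:
      return snprintf (buf, len,
		       "variable '%s' has been referenced in offloaded code"
		       " but hasn't been marked to be included in the"
		       " offloaded code", name);
    }
  return -1;
}
```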

>> @@ -1153,9 +1184,8 @@ input_overwrite_node (struct lto_file_decl_data *file_data,
>>node->resolution = bp_unpack_enum (bp, ld_plugin_symbol_resolution,
>>   LDPR_NUM_KNOWN);
>>node->split_part = bp_unpack_value (bp, 1);
>> -  gcc_assert (flag_ltrans
>> -  || (!node->in_other_partition
>> -  && !node->used_from_other_partition));
>> +  verify_node_partition (node->decl, node->name (), node->in_other_partition,
>> + node->used_from_other_partition);
>>  }
> 
> Why are you passing all these arguments to that function (especially calling
> node->name () even when you don't know if it will be needed or not)?
> Doesn't both cgraph_node and varpool_node inherit from symtab_node, which
> has all these 3 fields as well as name () method?
> So, I think if verify_node_partition takes a symtab_node *node argument
> and the callers just both do
>   verify_node_partition (node);
> it should work fine.  I'd make it inline too.
> 

Done.

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/function-not-offloaded.c
>> @@ -0,0 +1,21 @@
>> +/* { dg-do link } */
>> +/* { dg-excess-errors "lto1, mkoffload and lto-wrapper fatal errors" } */
>> +
>> +#pragma omp declare target
>> +int results[2000];
>> +#pragma omp end declare target
>> +
>> +void __attribute__((noinline, noclone))
>> +baz (int i) /* { dg-error "function 'baz' has been referenced in offloaded code but hasn't been marked to be included in the offloaded code" } */
>> +{
>> +  results[i]++;
>> +}
>> +
>> +int
>> +main ()
>> +{
>> +#pragma omp target
>> +#pragma omp for
>> +  for (int i = 0; i < 2000; i++)
>> +baz (i);
>> +}
> 
> Note, this will be well defined in OpenMP 5.0, just the support isn't there.
> The spec says:
> "If a function (C, C++, Fortran) or subroutine (Fortran) is referenced in a
> target construct then that function or subroutine is treated as if its name
> had appeared in a to clause on a declare target directive."
> and
> "If a function is referenced in a function that appears as a list item in a
> to clause on a declare target directive then the name of the referenced
> function is treated as if it had appeared in a to clause on a declare target
> directive.
> 
> If a variable with static storage duration or a function (except lambda for
> C++) is referenced in the initializer expression list of a variable with
> static storage duration that appears as a list item in a to clause on a
> declare target directive then the name of the referenced variable or function
> is treated as if it had appeared in a to clause on a declare target directive."
> 
> so for functions it should work through implicit propagation of the
> "omp declare target" attribute.
> 
> Can't find a restriction I'd expect to see that if a declaration of a
> function or variable is marked declare target then the definition has to be
> as well, will talk to omp-lang.
> 
> In any case, for the above testcase it might be better to split it into two
> sources with dg-auxiliary-source, declare baz in the source with main and
> define in another file (where declare instead of define the variable).
>

Done, but that ends up not triggering the introduced error condition
anymore. Instead we generate an "unresolved symbol foo".

So I've added the corresponding OpenACC test-cases as well (and the
introduced error triggers only for OpenACC functions and OpenMP variables).
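
For contrast with the failing test-cases, a sketch of code where both
the function and the variable are explicitly marked for offloading, so
the new error would not be expected to trigger.  The names N, results,
bump and sum_after_bumps are illustrative only.  Without -fopenmp the
pragmas are ignored and everything runs on the host; with an offload
device, the "target update" directives copy the array to and from the
device.

```c
#include <assert.h>

#define N 8

/* Everything between these two directives is also compiled for the
   offload target.  */
#pragma omp declare target
int results[N];

void
bump (int i)
{
  results[i]++;
}
#pragma omp end declare target

/* Reset the array, increment every element in a target region, and
   return the sum of the elements afterwards.  */
int
sum_after_bumps (void)
{
  for (int i = 0; i < N; i++)
    results[i] = 0;

  /* Make the device copy match the freshly reset host copy.  */
#pragma omp target update to(results)

#pragma omp target
  for (int i = 0; i < N; i++)
    bump (i);

  /* Fetch the device copy back before reading it on the host.  */
#pragma omp target update from(results)

  int sum = 0;
  for (int i = 0; i < N; i++)
    sum += results[i];
  return sum;
}
```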

>> diff --git a/libgomp/testsuite/libgomp.c-c++-common/variable-not-offloaded.c b/libgomp/testsuite/libgomp.c-c++-common/variable-not-offloaded.c
>> new file mode 100644
>> index 000..c2e1d57adea
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/variable-not-offloaded.c
>> @@ 

Re: [PATCH 01/10] Fix LRA bug

2018-12-14 Thread Andrew Stubbs

On 14/12/2018 10:04, Andrew Stubbs wrote:
> Anyway, this patch should not affect any use case that did not already
> have UB, so I'll get it committed shortly.


Now done. Thanks for the review.

Andrew


Re: [PATCH] Improve gimplification of constructors with RANGE_EXPRs (PR c++/82294, PR c++/87436)

2018-12-14 Thread Richard Biener
On Fri, 14 Dec 2018, Jakub Jelinek wrote:

> On Fri, Dec 14, 2018 at 10:40:19AM +0100, Richard Biener wrote:
> > This looks OK to me - the only comment I have is on the two magic
> > constants (64 and 8) which are used twice in the patch.  Can you
> > either see to hoist the common condition into sth like
> > 
> >  bool prefer_loop_initializer_p = ...
> > 
> > or add #defines for those constants?  I suppose the hoisting
> > might be tricky as int_size_in_bytes can return -1 and the
> > workarounds are different in both places right now (maybe that's
> > a bug as well...).  OTOH using (unsigned)int_size_in_bytes
> > looks reasonable in general.
> 
> So like this?

Yes.

Thanks,
Richard.
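
The hoisted condition discussed above could look roughly like the
following sketch.  This is not the code from the patch: the function
name follows the prefer_loop_initializer_p suggestion, the two macro
names are invented here, and the parameters stand in for the values the
gimplifier computes.  It only encodes the rule stated in the ChangeLog
(unique nonzero elements at most 1/8 of nonzero elements, size at least
64 bytes).  Casting the size to unsigned, as suggested, makes an
unknown size of -1 compare as very large; whether that is the wanted
treatment is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* The two magic constants from the discussion, given names here.  */
#define MIN_UNIQUE_SIZE 64
#define UNIQUE_NONZERO_RATIO 8

/* Return true if a constructor of SIZE_IN_BYTES bytes with NONZERO_ELTS
   nonzero elements, of which UNIQUE_NONZERO_ELTS are distinct
   initializers (RANGE_EXPRs counted once), looks better initialized by
   a loop than copied from a read-only data image.  */
bool
prefer_loop_initializer_p (long long size_in_bytes, long long nonzero_elts,
			   long long unique_nonzero_elts)
{
  return ((unsigned long long) size_in_bytes >= MIN_UNIQUE_SIZE
	  && unique_nonzero_elts <= nonzero_elts / UNIQUE_NONZERO_RATIO);
}
```

For example, a 384-byte array with 96 nonzero elements coming from a
single range initializer satisfies the predicate, while a 32-byte array
does not, whatever its element counts.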

> Still need to wait for the FE patch if I want to commit the testcases, those
> depend on both patches.
> I've added size32plus effective target to the larger test, as 384MB is too
> much for 16 or 20 bit address targets.
> And, I'll gather statistics on how often this makes a difference during
> gimplification during my next bootstraps/regtests.
> 
> 2018-12-14  Jakub Jelinek  
> 
>   PR c++/82294
>   PR c++/87436
>   * expr.h (categorize_ctor_elements): Add p_unique_nz_elts argument.
>   * expr.c (categorize_ctor_elements_1): Likewise.  Compute it like
>   p_nz_elts, except don't multiply it by mult.  Adjust recursive call.
>   Fix up COMPLEX_CST handling.
>   (categorize_ctor_elements): Add p_unique_nz_elts argument, initialize
>   it and pass it through to categorize_ctor_elements_1.
>   (mostly_zeros_p, all_zeros_p): Adjust categorize_ctor_elements callers.
>   * gimplify.c (gimplify_init_constructor): Likewise.  Don't force
>   ctor into readonly data section if num_unique_nonzero_elements is
>   smaller or equal to 1/8 of num_nonzero_elements and size is >= 64
>   bytes.
> 
>   * g++.dg/tree-ssa/pr82294.C: New test.
>   * g++.dg/tree-ssa/pr87436.C: New test.
> 
> --- gcc/expr.h.jj 2018-12-13 18:00:10.527301479 +0100
> +++ gcc/expr.h2018-12-14 11:52:31.941071185 +0100
> @@ -309,7 +309,8 @@ extern bool can_move_by_pieces (unsigned
>  extern unsigned HOST_WIDE_INT highest_pow2_factor (const_tree);
>  
>  extern bool categorize_ctor_elements (const_tree, HOST_WIDE_INT *,
> -   HOST_WIDE_INT *, bool *);
> +   HOST_WIDE_INT *, HOST_WIDE_INT *,
> +   bool *);
>  
>  extern void expand_operands (tree, tree, rtx, rtx*, rtx*,
>enum expand_modifier);
> --- gcc/expr.c.jj 2018-12-13 18:00:10.426303121 +0100
> +++ gcc/expr.c	2018-12-14 11:52:31.945071118 +0100
> @@ -5945,10 +5945,11 @@ count_type_elements (const_tree type, bo
>  
>  static bool
>  categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
> + HOST_WIDE_INT *p_unique_nz_elts,
>   HOST_WIDE_INT *p_init_elts, bool *p_complete)
>  {
>unsigned HOST_WIDE_INT idx;
> -  HOST_WIDE_INT nz_elts, init_elts, num_fields;
> +  HOST_WIDE_INT nz_elts, unique_nz_elts, init_elts, num_fields;
>tree value, purpose, elt_type;
>  
>/* Whether CTOR is a valid constant initializer, in accordance with what
> @@ -5958,6 +5959,7 @@ categorize_ctor_elements_1 (const_tree c
>bool const_p = const_from_elts_p ? true : TREE_STATIC (ctor);
>  
>nz_elts = 0;
> +  unique_nz_elts = 0;
>init_elts = 0;
>num_fields = 0;
>elt_type = NULL_TREE;
> @@ -5982,12 +5984,13 @@ categorize_ctor_elements_1 (const_tree c
>   {
>   case CONSTRUCTOR:
> {
> - HOST_WIDE_INT nz = 0, ic = 0;
> + HOST_WIDE_INT nz = 0, unz = 0, ic = 0;
>  
> - bool const_elt_p = categorize_ctor_elements_1 (value, &nz, &ic,
> -p_complete);
> + bool const_elt_p = categorize_ctor_elements_1 (value, &nz, &unz,
> +&ic, p_complete);
>  
>   nz_elts += mult * nz;
> + unique_nz_elts += unz;
>   init_elts += mult * ic;
>  
>   if (const_from_elts_p && const_p)
> @@ -5999,21 +6002,31 @@ categorize_ctor_elements_1 (const_tree c
>   case REAL_CST:
>   case FIXED_CST:
> if (!initializer_zerop (value))
> - nz_elts += mult;
> + {
> +   nz_elts += mult;
> +   unique_nz_elts++;
> + }
> init_elts += mult;
> break;
>  
>   case STRING_CST:
> nz_elts += mult * TREE_STRING_LENGTH (value);
> +   unique_nz_elts += TREE_STRING_LENGTH (value);
> init_elts += mult * TREE_STRING_LENGTH (value);
> break;
>  
>   case COMPLEX_CST:
> if (!initializer_zerop (TREE_REALPART (value)))
> - nz_elts += mult;
> + {
> +   nz_elts += mult;
> +   unique_nz_elts++;
> + }
> if (!initializer_zerop (TREE_IMAGPART (value)))
> - nz_elts += mult;
> -  

Re: [committed][testsuite] Remove bashism from libbacktrace/allocfail.sh

2018-12-14 Thread Jakub Jelinek
On Fri, Dec 14, 2018 at 10:48:26AM +0100, Tom de Vries wrote:
> Test-case libbacktrace/allocfail.sh contains bashism "set -o pipefail", which
> makes the script fail on ubuntu 18.04, which links /bin/sh to /bin/dash.
> 
> Fix this by removing the "set -o pipefail".
> 
> Tested by running the test-case with dash on x86_64-linux.

Yeah, the script doesn't contain any |s, so it shouldn't make any difference.

> 2018-12-14  Tom de Vries  
> 
>   PR testsuite/88491
>   * allocfail.sh: Remove "set -o pipefail".
> 
> ---
>  libbacktrace/allocfail.sh | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/libbacktrace/allocfail.sh b/libbacktrace/allocfail.sh
> index 91bc7a3e73d..6914de173a3 100755
> --- a/libbacktrace/allocfail.sh
> +++ b/libbacktrace/allocfail.sh
> @@ -32,7 +32,6 @@
>  # POSSIBILITY OF SUCH DAMAGE.
>  
>  set -e
> -set -o pipefail
>  
>  if [ ! -f ./allocfail ]; then
>  # Hard failure.

Jakub


[Ada] Fix Max_Size_In_Storage_Elements for unconstrained array types

2018-12-14 Thread Eric Botcazou
It appears that GNAT was not fully compliant with the intent of the RM here 
because it wouldn't include the size of the bounds added in front of the data 
in an allocation in the value of Max_Size_In_Storage_Elements.
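
A rough C analogy may help to see the difference between the size of
the template (the bounds) and its padded size.  This is an illustrative
sketch only: the struct names, the 16-byte alignment and the 32-byte
data size are assumptions chosen for the example, not what GNAT
generates.  The offset of the data field plays the role of
bit_position (DECL_CHAIN (TYPE_FIELDS (...))) in the patch, whereas
sizeof (struct bounds) corresponds to DECL_SIZE (TYPE_FIELDS (...)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DATA_BYTES 32

/* The "template": the bounds stored in front of the array data.  */
struct bounds
{
  int32_t first;
  int32_t last;
};

/* The object record actually allocated: template, padding, then data.
   The data is given a stricter alignment than the bounds, so padding
   appears between the two fields.  */
struct object_record
{
  struct bounds template_part;
  _Alignas (16) unsigned char data[DATA_BYTES];
};

/* Size of the bounds alone, without padding.  */
size_t
raw_template_size (void)
{
  return sizeof (struct bounds);
}

/* Size of the bounds plus the padding up to the data, i.e. the offset
   of the data field.  */
size_t
padded_template_size (void)
{
  return offsetof (struct object_record, data);
}

/* What an allocator has to reserve: padded template plus data.  */
size_t
allocated_size (void)
{
  return padded_template_size () + DATA_BYTES;
}
```

With these assumptions the raw template is 8 bytes, the padded template
is 16 bytes, and the allocation needs 48 bytes, which is the kind of
value Max_Size_In_Storage_Elements is now meant to report.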

Tested on x86_64-suse-linux, applied on the mainline.


2018-12-14  Eric Botcazou  

* gcc-interface/decl.c (rm_size): Take into account the padding in
the case of a record type containing a template.
* gcc-interface/trans.c (Attribute_to_gnu) : Likewise.
Do not subtract the padded size for Max_Size_In_Storage_Elements.
: Tweak comment.


2018-12-14  Eric Botcazou  

* gnat.dg/max_size.adb: New test.
* gnat.dg/max_size_pkg.ads: Likewise.

-- 
Eric Botcazou
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 267130)
+++ gcc-interface/decl.c	(working copy)
@@ -10049,13 +10049,14 @@ rm_size (tree gnu_type)
   if (INTEGRAL_TYPE_P (gnu_type) && TYPE_RM_SIZE (gnu_type))
 return TYPE_RM_SIZE (gnu_type);
 
-  /* Return the RM size of the actual data plus the size of the template.  */
+  /* If the type contains a template, return the padded size of the template
+ plus the RM size of the actual data.  */
   if (TREE_CODE (gnu_type) == RECORD_TYPE
   && TYPE_CONTAINS_TEMPLATE_P (gnu_type))
 return
   size_binop (PLUS_EXPR,
-		  rm_size (TREE_TYPE (DECL_CHAIN (TYPE_FIELDS (gnu_type)))),
-		  DECL_SIZE (TYPE_FIELDS (gnu_type)));
+		  bit_position (DECL_CHAIN (TYPE_FIELDS (gnu_type))),
+		  rm_size (TREE_TYPE (DECL_CHAIN (TYPE_FIELDS (gnu_type)))));
 
   /* For record or union types, we store the size explicitly.  */
   if (RECORD_OR_UNION_TYPE_P (gnu_type)
Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 267062)
+++ gcc-interface/trans.c	(working copy)
@@ -2308,10 +2308,8 @@ Attribute_to_gnu (Node_Id gnat_node, tre
   gnu_type = TREE_TYPE (gnu_prefix);
 
   /* Replace an unconstrained array type with the type of the underlying
-	 array.  We can't do this with a call to maybe_unconstrained_array
-	 since we may have a TYPE_DECL.  For 'Max_Size_In_Storage_Elements,
-	 use the record type that will be used to allocate the object and its
-	 template.  */
+	 array, except for 'Max_Size_In_Storage_Elements because we need to
+	 return the (maximum) size requested for an allocator.  */
   if (TREE_CODE (gnu_type) == UNCONSTRAINED_ARRAY_TYPE)
 	{
 	  gnu_type = TYPE_OBJECT_RECORD_TYPE (gnu_type);
@@ -2375,11 +2373,15 @@ Attribute_to_gnu (Node_Id gnat_node, tre
 	gnu_result = substitute_placeholder_in_expr (gnu_result, gnu_expr);
 	}
 
-  /* If the type contains a template, subtract its size.  */
+  /* If the type contains a template, subtract the padded size of the
+	 template, except for 'Max_Size_In_Storage_Elements because we need
+	 to return the (maximum) size requested for an allocator.  */
   if (TREE_CODE (gnu_type) == RECORD_TYPE
-	  && TYPE_CONTAINS_TEMPLATE_P (gnu_type))
-	gnu_result = size_binop (MINUS_EXPR, gnu_result,
- DECL_SIZE (TYPE_FIELDS (gnu_type)));
+	  && TYPE_CONTAINS_TEMPLATE_P (gnu_type)
+	  && attribute != Attr_Max_Size_In_Storage_Elements)
+	gnu_result
+	  = size_binop (MINUS_EXPR, gnu_result,
+			bit_position (DECL_CHAIN (TYPE_FIELDS (gnu_type))));
 
   /* For 'Max_Size_In_Storage_Elements, adjust the unit.  */
   if (attribute == Attr_Max_Size_In_Storage_Elements)
@@ -2856,8 +2858,7 @@ Attribute_to_gnu (Node_Id gnat_node, tre
   gnu_type = TREE_TYPE (gnu_prefix);
   gcc_assert (TREE_CODE (gnu_type) == UNCONSTRAINED_ARRAY_TYPE);
 
-  /* What we want is the offset of the ARRAY field in the record
-	 that the thin pointer designates.  */
+  /* Return the padded size of the template in the object record type.  */
   gnu_type = TYPE_OBJECT_RECORD_TYPE (gnu_type);
   gnu_result = bit_position (DECL_CHAIN (TYPE_FIELDS (gnu_type)));
   gnu_result_type = get_unpadded_type (Etype (gnat_node));
-- { dg-do run }

with Max_Size_Pkg; use Max_Size_Pkg;

procedure Max_Size is
begin
  if Arr1'Max_Size_In_Storage_Elements /= 7 then
    raise Program_Error;
  end if;
  if Arr2'Max_Size_In_Storage_Elements /= 24 then
    raise Program_Error;
  end if;
end;
package Max_Size_Pkg is

  type Index is range 1 .. 5;

  type Arr1 is array (Index range <>) of Short_Short_Integer;

  type Arr2 is array (Index range <>) of Integer;

end Max_Size_Pkg;
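
As a rough illustration of the values 7 and 24 checked in the test above, here is a minimal C sketch of the arithmetic. It assumes that each bound of type Index is stored in one byte and that the template (the two bounds) is padded up to the alignment of the array component. The function names are hypothetical and are not part of GNAT.

```c
#include <assert.h>
#include <stddef.h>

/* Round SIZE up to a multiple of ALIGN; ALIGN must be a power of two.  */
static size_t
round_up (size_t size, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (size + align - 1) & ~(align - 1);
}

/* Hypothetical model of Max_Size_In_Storage_Elements for an
   unconstrained array type: the padded size of the bounds template
   plus the size of the largest possible data part.  */
static size_t
max_size_in_storage_elements (size_t bound_size, size_t elt_size,
			      size_t elt_align, size_t max_length)
{
  /* The template holds the low and the high bound.  */
  size_t template_size = 2 * bound_size;
  /* Assumption: the data starts at the next ELT_ALIGN boundary.  */
  size_t padded_template = round_up (template_size, elt_align);
  return padded_template + max_length * elt_size;
}
```

Under these assumptions, max_size_in_storage_elements (1, 1, 1, 5) gives 2 + 5 = 7 for Arr1, and max_size_in_storage_elements (1, 4, 4, 5) gives 4 + 20 = 24 for Arr2, where the 2-byte template is padded to 4 bytes.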


Re: [PATCH] Improve gimplification of constructors with RANGE_EXPRs (PR c++/82294, PR c++/87436)

2018-12-14 Thread Jakub Jelinek
On Fri, Dec 14, 2018 at 10:40:19AM +0100, Richard Biener wrote:
> This looks OK to me - the only comment I have is on the two magic
> constants (64 and 8) which are used twice in the patch.  Can you
> either see to hoist the common condition into sth like
> 
>  bool prefer_loop_initializer_p = ...
> 
> or add #defines for those constants?  I suppose the hoisting
> might be tricky as int_size_in_bytes can return -1 and the
> workarounds are different in both places right now (maybe that's
> a bug as well...).  OTOH using (unsigned)int_size_in_bytes
> looks reasonable in general.

So like this?
Still need to wait for the FE patch if I want to commit the testcases, those
depend on both patches.
I've added size32plus effective target to the larger test, as 384MB is too
much for 16 or 20 bit address targets.
And, I'll gather statistics on how often this makes a difference during
gimplification during my next bootstraps/regtests.

2018-12-14  Jakub Jelinek  

PR c++/82294
PR c++/87436
* expr.h (categorize_ctor_elements): Add p_unique_nz_elts argument.
* expr.c (categorize_ctor_elements_1): Likewise.  Compute it like
p_nz_elts, except don't multiply it by mult.  Adjust recursive call.
Fix up COMPLEX_CST handling.
(categorize_ctor_elements): Add p_unique_nz_elts argument, initialize
it and pass it through to categorize_ctor_elements_1.
(mostly_zeros_p, all_zeros_p): Adjust categorize_ctor_elements callers.
* gimplify.c (gimplify_init_constructor): Likewise.  Don't force
ctor into readonly data section if num_unique_nonzero_elements is
smaller or equal to 1/8 of num_nonzero_elements and size is >= 64
bytes.

* g++.dg/tree-ssa/pr82294.C: New test.
* g++.dg/tree-ssa/pr87436.C: New test.
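
For readers skimming the ChangeLog, the gimplify.c condition described above can be sketched as a standalone C predicate. This is only an illustration: the function name follows Richard's suggestion, the macro names are made up here, and the use of integer division for "1/8" is an assumption rather than a quote from the patch. The unsigned cast mirrors the (unsigned)int_size_in_bytes idea from the review, so that an unknown size of -1 compares as very large.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two magic constants in the review.  */
#define MIN_UNIQUE_SIZE 64
#define UNIQUE_NONZERO_RATIO 8

/* Return true if a constructor should be initialized by code (for
   example a loop) instead of being forced into the readonly data
   section.  SIZE_IN_BYTES may be -1 when the size is unknown.  */
static bool
prefer_loop_initializer_p (int64_t num_unique_nonzero,
			   int64_t num_nonzero, int64_t size_in_bytes)
{
  /* Elements covered by a RANGE_EXPR count once in the unique total
     but MULT times in the plain total.  */
  assert (num_unique_nonzero >= 0 && num_unique_nonzero <= num_nonzero);

  return ((uint64_t) size_in_bytes >= MIN_UNIQUE_SIZE
	  && num_unique_nonzero <= num_nonzero / UNIQUE_NONZERO_RATIO);
}
```

For instance, an array of 1000 ints initialized through a single range to the same nonzero value has 1000 nonzero elements but only 1 unique one, so the predicate holds; a 32-byte object never qualifies.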

--- gcc/expr.h.jj   2018-12-13 18:00:10.527301479 +0100
+++ gcc/expr.h  2018-12-14 11:52:31.941071185 +0100
@@ -309,7 +309,8 @@ extern bool can_move_by_pieces (unsigned
 extern unsigned HOST_WIDE_INT highest_pow2_factor (const_tree);
 
 extern bool categorize_ctor_elements (const_tree, HOST_WIDE_INT *,
- HOST_WIDE_INT *, bool *);
+ HOST_WIDE_INT *, HOST_WIDE_INT *,
+ bool *);
 
 extern void expand_operands (tree, tree, rtx, rtx*, rtx*,
 enum expand_modifier);
--- gcc/expr.c.jj   2018-12-13 18:00:10.426303121 +0100
+++ gcc/expr.c  2018-12-14 11:52:31.945071118 +0100
@@ -5945,10 +5945,11 @@ count_type_elements (const_tree type, bo
 
 static bool
 categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
+   HOST_WIDE_INT *p_unique_nz_elts,
HOST_WIDE_INT *p_init_elts, bool *p_complete)
 {
   unsigned HOST_WIDE_INT idx;
-  HOST_WIDE_INT nz_elts, init_elts, num_fields;
+  HOST_WIDE_INT nz_elts, unique_nz_elts, init_elts, num_fields;
   tree value, purpose, elt_type;
 
   /* Whether CTOR is a valid constant initializer, in accordance with what
@@ -5958,6 +5959,7 @@ categorize_ctor_elements_1 (const_tree c
   bool const_p = const_from_elts_p ? true : TREE_STATIC (ctor);
 
   nz_elts = 0;
+  unique_nz_elts = 0;
   init_elts = 0;
   num_fields = 0;
   elt_type = NULL_TREE;
@@ -5982,12 +5984,13 @@ categorize_ctor_elements_1 (const_tree c
{
case CONSTRUCTOR:
  {
-   HOST_WIDE_INT nz = 0, ic = 0;
+   HOST_WIDE_INT nz = 0, unz = 0, ic = 0;
 
-   bool const_elt_p = categorize_ctor_elements_1 (value, &nz, &ic,
-  p_complete);
+   bool const_elt_p = categorize_ctor_elements_1 (value, &nz, &unz,
+  &ic, p_complete);
 
nz_elts += mult * nz;
+   unique_nz_elts += unz;
init_elts += mult * ic;
 
if (const_from_elts_p && const_p)
@@ -5999,21 +6002,31 @@ categorize_ctor_elements_1 (const_tree c
case REAL_CST:
case FIXED_CST:
  if (!initializer_zerop (value))
-   nz_elts += mult;
+   {
+ nz_elts += mult;
+ unique_nz_elts++;
+   }
  init_elts += mult;
  break;
 
case STRING_CST:
  nz_elts += mult * TREE_STRING_LENGTH (value);
+ unique_nz_elts += TREE_STRING_LENGTH (value);
  init_elts += mult * TREE_STRING_LENGTH (value);
  break;
 
case COMPLEX_CST:
  if (!initializer_zerop (TREE_REALPART (value)))
-   nz_elts += mult;
+   {
+ nz_elts += mult;
+ unique_nz_elts++;
+   }
  if (!initializer_zerop (TREE_IMAGPART (value)))
-   nz_elts += mult;
- init_elts += mult;
+   {
+ nz_elts += mult;
+ unique_nz_elts++;
+   }
+ init_elts += 2 * mult;
  break;
 
case VECTOR_CST:
@@ 

[Ada] Small fix for records with boolean discriminants

2018-12-14 Thread Eric Botcazou
The point is to avoid building comparisons of the form "D == true" in this 
case for the qualifiers of the union describing the variant part because 1) 
it's useless and 2) it's output as "D = 1" by -gnatR, which is not valid Ada.

The dwarf2out.c change only affects the Ada compiler.

Tested on x86_64-suse-linux, applied on the mainline as obvious.


2018-12-14  Eric Botcazou  

* dwarf2out.c (analyze_discr_in_predicate): Simplify.
(analyze_variants_discr): Deal with naked boolean discriminants.
ada/
* gcc-interface/decl.c (choices_to_gnu): Directly use a naked boolean
discriminant if the value is the boolean true.

-- 
Eric Botcazou
Index: dwarf2out.c
===
--- dwarf2out.c	(revision 267062)
+++ dwarf2out.c	(working copy)
@@ -24537,6 +24537,7 @@ gen_inheritance_die (tree binfo, tree ac
 
 /* Return whether DECL is a FIELD_DECL that represents the variant part of a
structure.  */
+
 static bool
 is_variant_part (tree decl)
 {
@@ -24550,17 +24551,8 @@ is_variant_part (tree decl)
 static tree
 analyze_discr_in_predicate (tree operand, tree struct_type)
 {
-  bool continue_stripping = true;
-  while (continue_stripping)
-switch (TREE_CODE (operand))
-  {
-  CASE_CONVERT:
-	operand = TREE_OPERAND (operand, 0);
-	break;
-  default:
-	continue_stripping = false;
-	break;
-  }
+  while (CONVERT_EXPR_P (operand))
+operand = TREE_OPERAND (operand, 0);
 
   /* Match field access to members of struct_type only.  */
   if (TREE_CODE (operand) == COMPONENT_REF
@@ -24780,6 +24772,19 @@ analyze_variants_discr (tree variant_par
 	  new_node->dw_discr_range = true;
 	}
 
+	  else if ((candidate_discr
+		  = analyze_discr_in_predicate (match_expr, struct_type))
+		   && TREE_TYPE (candidate_discr) == boolean_type_node)
+	{
+	  /* We are matching:   for a boolean discriminant.
+		 This sub-expression matches boolean_true_node.  */
+	  new_node = ggc_cleared_alloc ();
+	  if (!get_discr_value (boolean_true_node,
+				&new_node->dw_discr_lower_bound))
+		goto abort;
+	  new_node->dw_discr_range = false;
+	}
+
 	  else
 	/* Unsupported sub-expression: we cannot determine the set of
 	   matching discriminant values.  Abort everything.  */
Index: ada/gcc-interface/decl.c
===
--- ada/gcc-interface/decl.c	(revision 267062)
+++ ada/gcc-interface/decl.c	(working copy)
@@ -6848,6 +6848,9 @@ choices_to_gnu (tree gnu_operand, Node_I
 			 build_binary_op (LE_EXPR, boolean_type_node,
 	  gnu_operand, gnu_high, true),
 			 true);
+  else if (gnu_low == boolean_true_node
+	   && TREE_TYPE (gnu_operand) == boolean_type_node)
+	gnu_test = gnu_operand;
   else if (gnu_low)
 	gnu_test
 	  = build_binary_op (EQ_EXPR, boolean_type_node, gnu_operand, gnu_low,


Re: [PATCH] x86: Add -march=cascadelake

2018-12-14 Thread Jakub Jelinek
On Fri, Dec 14, 2018 at 06:33:37PM +0800, Wei Xiao wrote:
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -832,8 +832,16 @@ const char *host_detect_local_cpu (int argc, const char **argv)
  cpu = "skylake";
  break;
case 0x55:
- /* Skylake with AVX-512.  */
- cpu = "skylake-avx512";
+ if (has_avx512vnni)
+ {
+   /* Cascade Lake.  */
+   cpu = "cascadelake";
+ }
+ else
+ {
+   /* Skylake with AVX-512.  */
+   cpu = "skylake-avx512";
+ }
  break;

Just a formatting nit here, if {}s are used, they should be indented
2 columns to the right from the if or else and the body of {} should
be indented by two further columns over {.
But, in this case, there is another rule, that if the body has a single
statement, then there shouldn't be {}s around it.  Thus just:
  if (has_avx512vnni)
    /* Cascade Lake.  */
    cpu = "cascadelake";
  else
    /* Skylake with AVX-512.  */
    cpu = "skylake-avx512";

Jakub


RE: [PATCH 0/2][ARC] Fixes needed for the upcomming release.

2018-12-14 Thread Claudiu Zissulescu
Thank you Andrew for your quick review. Both patches are committed.

Claudiu

From: Claudiu Zissulescu [claz...@gmail.com]
Sent: Tuesday, December 11, 2018 11:23 AM
To: gcc-patches@gcc.gnu.org
Cc: francois.bed...@synopsys.com; claudiu.zissule...@synopsys.com; 
andrew.burg...@embecosm.com
Subject: [PATCH 0/2][ARC] Fixes needed for the upcomming release.

Hi Andrew,

Please find two small patches that fix a number of issues found in the 
upcoming release.

Ok to apply?
Claudiu

Claudiu Zissulescu (2):
  [ARC] Fix REG_CLASS_NAMES
  [ARC] Fix millicode wrong blink restore.

 gcc/config/arc/arc.c   |  4 +---
 gcc/config/arc/arc.h   |  1 +
 gcc/testsuite/gcc.target/arc/milli-1.c | 23 +++
 3 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/milli-1.c

--
2.19.1



Re: [PATCH] x86: Add -march=cascadelake

2018-12-14 Thread Wei Xiao
Part 2 is implemented by the attached patch.
OK for trunk?

Wei

gcc/
   * config/i386/driver-i386.c (host_detect_local_cpu): Detect cascadelake.
   * config/i386/i386.c (fold_builtin_cpu): Handle cascadelake.
   * doc/extend.texi: Add cascadelake.

gcc/testsuite/
   * g++.target/i386/mv16.C: Handle new march.
   * gcc.target/i386/builtin_target.c: Ditto.

libgcc/
   * config/i386/cpuinfo.c (get_intel_cpu): Handle cascadelake.
   * config/i386/cpuinfo.h: Add INTEL_COREI7_CASCADELAKE.
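
To make the feature-bit approach concrete, here is a small C sketch of the decision for model 0x55. It is not the code from the attached patch: the enum, macro and function names are invented for illustration. The one hard fact it relies on is that AVX512_VNNI is reported in bit 11 of ECX for CPUID leaf 7, subleaf 0; the caller is assumed to have read that register already.

```c
/* Model number shared by Skylake Server and Cascade Lake.  */
#define MODEL_SKYLAKE_SERVER 0x55

/* CPUID.(EAX=7,ECX=0):ECX bit 11 is AVX512_VNNI.  */
#define LEAF7_ECX_AVX512VNNI (1u << 11)

/* Hypothetical result type for this sketch.  */
enum cpu_kind
{
  CPU_OTHER,
  CPU_SKYLAKE_AVX512,
  CPU_CASCADELAKE
};

/* Classify an Intel family 6 CPU with the given MODEL, using the
   feature bits in LEAF7_ECX instead of the stepping number.  */
static enum cpu_kind
classify_intel_model (unsigned int model, unsigned int leaf7_ecx)
{
  if (model != MODEL_SKYLAKE_SERVER)
    return CPU_OTHER;

  if (leaf7_ecx & LEAF7_ECX_AVX512VNNI)
    return CPU_CASCADELAKE;

  return CPU_SKYLAKE_AVX512;
}
```

The point of the design is that no stepping value appears anywhere, so a future stepping of either part does not change the result.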
Uros Bizjak wrote on Thu, Dec 13, 2018 at 12:46 AM:
>
> On Wed, Dec 12, 2018 at 10:48 AM Wei Xiao  wrote:
> >
> > Hi Uros and other reviewers,
> >
> > I'd like to split the work into 2 parts:
> > 1) Basic processor enabling.
> > 2) Processor type dynamic check.
> >
> > Let's use a separate patch to implement part 2.
> > Part 1 is implemented by the attached patch.
> > Is it OK for trunk?
> >
> > Wei
> >
> > gcc/
> >   * common/config/i386/i386-common.c (processor_names): Add cascadelake.
> >   (processor_alias_table): Add cascadelake.
> >   * config.gcc: Add -march=cascadelake.
> >   * config/i386/i386-c.c (ix86_target_macros_internal): Handle 
> > cascadelake.
> >   * config/i386/i386.c (Add m_CASCADELAKE): New.
> >   (processor_cost_table): Add cascadelake.
> >   (get_builtin_code_for_version): Handle cascadelake.
> >   * config/i386/i386.h (TARGET_CASCADELAKE, PROCESSOR_CASCADELAKE): New.
> >   (PTA_CASCADELAKE): Ditto.
> >   * doc/invoke.texi: Add -march=cascadelake.
> >
> > gcc/testsuite/
> >   * gcc.target/i386/funcspec-56.inc: Handle new march.
>
> OK for mainline.
>
> Thanks,
> Uros.
>
> > Wei Xiao wrote on Thu, Nov 29, 2018 at 4:32 PM:
> > >
> > > Hi
> > >
> > > Distinguishing based on the stepping number is not recommended for two reasons:
> > > 1) Intel doesn't officially disclose stepping information in the SDM.
> > > 2) The stepping can change in the future.
> > >
> > > We still prefer the conventional approach of distinguishing based on feature 
> > > bits.
> > > I have refined the patch as attached according to all your suggestions.
> > >
> > > Wei
> > >
> > > gcc/
> > > * common/config/i386/i386-common.c (processor_names): Add 
> > > cascadelake.
> > > (processor_alias_table): Add cascadelake.
> > > * config.gcc: Add -march=cascadelake.
> > > * config/i386/driver-i386.c
> > > (host_detect_local_cpu): Detect cascadelake.
> > > * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> > > cascadelake.
> > > * config/i386/i386.c (ix86_cost): Add m_CASCADELAKE.
> > > (processor_cost_table): Add cascadelake.
> > > (get_builtin_code_for_version): Handle cascadelake.
> > > (fold_builtin_cpu): Ditto.
> > > * config/i386/i386.h (TARGET_CASCADELAKE, PROCESSOR_CASCADELAKE): 
> > > New.
> > > (PTA_CASCADELAKE): Ditto.
> > > * doc/extend.texi: Add cascadelake.
> > > * doc/invoke.texi: Add -march=cascadelake.
> > > gcc/testsuite/
> > > * g++.target/i386/mv16.C: Handle new march.
> > > * gcc.target/i386/builtin_target.c: Ditto.
> > > * gcc.target/i386/funcspec-56.inc: Ditto.
> > > libgcc/
> > > * config/i386/cpuinfo.c (get_intel_cpu): Handle cascadelake.
> > > * config/i386/cpuinfo.h: Add INTEL_COREI7_CASCADELAKE.
> > > Wei Xiao wrote on Tue, Nov 27, 2018 at 6:40 PM:
> > > >
> > > > Thanks for the helpful information!
> > > > But I'm still checking with hardware team about the
> > > > family/model/stepping numbers for Cascadelake which are not officially
> > > > disclosed by Intel (to the best of my knowledge).
> > > >
> > > > Wei
> > > > Martin Liška wrote on Mon, Nov 26, 2018 at 10:00 PM:
> > > > >
> > > > > On 11/26/18 12:18 PM, Jakub Jelinek wrote:
> > > > > > On Mon, Nov 26, 2018 at 12:03:53PM +0100, Martin Liška wrote:
> > > > > >>> For Cascade Lake the model number is the same as Skylake Server,
> > > > > >>> it can only be distinguished based on the stepping (5 vs 4)
> > > > > >>
> > > > > >> Very interesting, probably the first time a distinction is based 
> > > > > >> on the stepping number?
> > > > > >
> > > > > > Wouldn't it be better to distinguish it based on availability of 
> > > > > > VNNI, like
> > > > > > we do for unknown family/model?
> > > > > >
> > > > > >>> Like gcc -mcpu=native needs to learn about this.
> > > > > >>
> > > > > >> I'm attaching patch that does that. Note that it's completely 
> > > > > >> untested as I don't have
> > > > > >> access to any of the new machines (Skylake server).
> > > > >
> > > > > Would be possible, the only ugly place would be in 
> > > > > libgcc/config/i386/cpuinfo.c where we
> > > > > call:
> > > > >
> > > > >   get_intel_cpu (family, model, stepping, brand_id);
> > > > >   /* Find available features. */
> > > > >   get_available_features (ecx, edx, max_level, _vnni);
> > > > >
> > > > > one would need a feature to distinguish CPU model. Do we really want 
> > > > > that?
> > > > >
> > > > 

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-14 Thread Uecker, Martin


Am Donnerstag, den 13.12.2018, 16:35 -0700 schrieb Jeff Law:
> On 12/12/18 11:12 AM, Uecker, Martin wrote:

...
> > > > diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
> > > > index 78e768c2366..ef039560eb9 100644
> > > > --- a/gcc/c/c-objc-common.h
> > > > +++ b/gcc/c/c-objc-common.h
> > > > @@ -110,4 +110,7 @@ along with GCC; see the file COPYING3.  If
> > > > not see
> > > >  
> > > >  #undef LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P
> > > >  #define LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P c_vla_unspec_p
> > > > +
> > > > +#undef LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS
> > > > +#define LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS true
> > > >  #endif /* GCC_C_OBJC_COMMON */
> > > 
> > > I wonder if we even need the lang hook anymore.  ISTM that a
> > > front-end
> > > that wants to use the function descriptors can just set
> > > FUNC_ADDR_BY_DESCRIPTOR and we'd use the function descriptor,
> > > else we'll
> > > use the trampoline.  Thoughts?
> > 
> > The lang hook also affects the minimum alignment for function
> > pointers via the FUNCTION_ALIGNMENT macro (gcc/default.h). This
> > does
> > not appear to change the default alignment on any architecture, but
> > it causes a failure in i386/gcc.target/i386/attr-aligned.c when
> > requesting a smaller alignment which is then silently ignored.
> 
> Ugh.  I didn't see that.

The test is new (2018-11-29 Martin Sebor), but one could
argue that we could simply remove this specific test as 'aligned'
is only required to increase alignment. Martin?
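
For context on why alignment matters here: the generic descriptor mechanism reserves one low bit in function pointers to tell a descriptor from a plain code address, so functions must be aligned enough to keep that bit clear. The C sketch below models this with plain integers. The tag value of 1 and all names are assumptions made for illustration; the real bit is chosen per target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tag bit; the actual value is target specific.  */
#define DESCRIPTOR_TAG ((uintptr_t) 1)

/* A function alignment (in bytes, a power of two) keeps the tag bit
   clear in every code address only if it is at least twice the tag.  */
static bool
alignment_allows_descriptors (uintptr_t function_alignment)
{
  return function_alignment >= 2 * DESCRIPTOR_TAG;
}

/* Mark DESCRIPTOR_ADDR as pointing to a descriptor.  */
static uintptr_t
tag_descriptor (uintptr_t descriptor_addr)
{
  return descriptor_addr | DESCRIPTOR_TAG;
}

/* Return true if FNPTR designates a descriptor rather than code.  */
static bool
is_descriptor (uintptr_t fnptr)
{
  return (fnptr & DESCRIPTOR_TAG) != 0;
}

/* Strip the tag to recover the descriptor address.  */
static uintptr_t
untag_descriptor (uintptr_t fnptr)
{
  return fnptr & ~DESCRIPTOR_TAG;
}
```

With a tag of 1, a function placed at an odd address (alignment 1) would be mistaken for a descriptor, which is why a request for such a small alignment cannot simply be honoured when descriptors are in use.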

> > I am not sure what the best approach is, but my preference
> > would be to remove the lang hook and the FUNCTION_ALIGNMENT
> > logic which will also fix the test case (the requested
> > alignment will be applied).
> > 
> > I would then instead add a warning (or error?) which triggers
> > only with -fno-trampolines if the user requests an alignment
> > which is too small for this mechanism to work.
>
> > Does this sound reasonable?
> 
> So I'm thinking we should wrap the existing patch as-is for the trunk
> (we're well into stage3 after all).  So leave the hook as-is for gcc-
> 9.
> 
> We can then tackle removal of the hook, including twiddling
> FUNCTION_ALIGNMENT for gcc-10.
> 
> Does that sound reasonable to you?

This is fine with me. So just to confirm: I should install the 
patch despite the regression?

Best,
Martin




Re: [PATCH 01/10] Fix LRA bug

2018-12-14 Thread Andrew Stubbs

On 13/12/2018 23:49, Jeff Law wrote:

OK.  But be aware we may have to revisit and look more closely at what
you're doing in your port if we stumble over more problems with reload
changing the structure of your insns and causing problems in the process.


Thanks.

What's novel about this, I think, is that we have two candidates for an 
add instruction, and each clobbers a different condition register. I had 
previously handled this by clobbering both, but that was unsatisfactory, 
so I've changed it to use the match_scratch trick I found used in other 
ports.


Using a match_scratch is clearly common practice, but there does appear 
to have been an assumption in LRA that this wouldn't occur within 
patterns that can be transformed by register elimination. Presumably 
those other ports using match_scratch are not using it for some key 
patterns, or do not use register elimination.


Anyway, this patch should not affect any use case that did not already 
have UB, so I'll get it committed shortly.


Andrew


Re: [PATCH] error on missing LTO symbols

2018-12-14 Thread Thomas Schwinge
Hi Tom!

Thanks for looking into this one!

Just one quick comment:

On Fri, 14 Dec 2018 10:21:35 +0100, Tom de Vries  wrote:
> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c

> +#ifdef ACCEL_COMPILER
> +  if (in_other_partition)
> +{
> +  if (TREE_CODE (decl) == FUNCTION_DECL
> +   || TREE_CODE (decl) == VAR_DECL)
> + error_at (DECL_SOURCE_LOCATION (decl),
> +   "%s %qs has been referenced in offloaded code but"
> +   " hasn't been marked to be included in the offloaded code",
> +   TREE_CODE (decl) == FUNCTION_DECL ? "function" : "variable",
> +   name);
> +  else
> + gcc_unreachable ();
> +}
> +#else
> +  gcc_assert (!in_other_partition
> +   && !used_from_other_partition);
> +#endif

Given the above...

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/function-not-offloaded.c
> @@ -0,0 +1,21 @@
> +/* { dg-do link } */
> +/* { dg-excess-errors "lto1, mkoffload and lto-wrapper fatal errors" } */
> +
> +#pragma omp declare target
> +int results[2000];
> +#pragma omp end declare target
> +
> +void __attribute__((noinline, noclone))
> +baz (int i) /* { dg-error "function 'baz' has been referenced in offloaded code but hasn't been marked to be included in the offloaded code" } */

I think this error will trigger only if offloading compilation is
enabled, so this error or the whole test case needs to be conditionalized
on that?

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/variable-not-offloaded.c
> @@ -0,0 +1,21 @@
> +/* { dg-do link } */
> +/* { dg-excess-errors "lto1, mkoffload and lto-wrapper fatal errors" } */
> +
> +int results[2000]; /* { dg-error "variable 'results' has been referenced in offloaded code but hasn't been marked to be included in the offloaded code" } */

Likewise.


Grüße
 Thomas

