Re: types for VR_VARYING

2019-08-15 Thread Aldy Hernandez

On 8/15/19 12:06 PM, Aldy Hernandez wrote:



On 8/15/19 7:23 AM, Richard Biener wrote:

On Thu, Aug 15, 2019 at 12:40 PM Aldy Hernandez  wrote:


On 8/14/19 1:37 PM, Jeff Law wrote:

On 8/13/19 6:39 PM, Aldy Hernandez wrote:



On 8/12/19 7:46 PM, Jeff Law wrote:

On 8/12/19 12:43 PM, Aldy Hernandez wrote:

This is a fresh re-post of:

https://gcc.gnu.org/ml/gcc-patches/2019-07/msg6.html

Andrew gave me some feedback a week ago, and I obviously don't remember
what it was because I was about to leave on PTO.  However, I do remember
I addressed his concerns before getting drunk on rum in tropical islands.


FWIW found a great coffee infused rum while in Kauai last week.  I'm not
a coffee fan, but it was wonderful.  The one bottle we brought back
isn't going to last until Cauldron and I don't think I can get a special
order filled before I leave :(


You must bring some to Cauldron before we believe you. :)

That's the problem.  The nearest place I can get it is in Vegas and
there's no distributor in Montreal.   I can special order it in our
state run stores, but it won't be here in time.

Of course, I don't mind if you don't believe me.  More for me in that
case...


Is the supports_type_p stuff there to placate the calls from ipa-cp?  I
can live with it in the short term, but it really feels like there
should be something in the ipa-cp client that avoids this silliness.

I am not happy with this either, but there are various places where
statements that are !stmt_interesting_for_vrp() are still setting a
range of VARYING, which is then being ignored at a later time.

For example, vrp_initialize:

    if (!stmt_interesting_for_vrp (phi))
  {
    tree lhs = PHI_RESULT (phi);
    set_def_to_varying (lhs);
    prop_set_simulate_again (phi, false);
  }

Also in evrp_range_analyzer::record_ranges_from_stmt(), where, if the
statement is interesting for VRP but extract_range_from_stmt() does not
produce a useful range, we also set a varying for a range we will never
use.  Similarly for a statement that is not interesting in this hunk.

Ugh.  One could perhaps argue that setting any kind of range in these
circumstances is silly.   But I suspect it's necessary due to the
optimistic handling of VR_UNDEFINED in value_range_base::union_helper.
It's all coming back to me now...




Then there is vrp_prop::visit_stmt() where we also set VARYING for types
that VRP will never handle:

    case IFN_ADD_OVERFLOW:
    case IFN_SUB_OVERFLOW:
    case IFN_MUL_OVERFLOW:
    case IFN_ATOMIC_COMPARE_EXCHANGE:
  /* These internal calls return _Complex integer type,
 which VRP does not track, but the immediate uses
 thereof might be interesting.  */
  if (lhs && TREE_CODE (lhs) == SSA_NAME)
    {
  imm_use_iterator iter;
  use_operand_p use_p;
  enum ssa_prop_result res = SSA_PROP_VARYING;

  set_def_to_varying (lhs);

I've adjusted the patch so that set_def_to_varying will set the range to
VR_UNDEFINED if !supports_type_p.  This is a fail-safe, as we can't
really do anything with a nonsensical range.  I just don't want to leave
the range in an indeterminate state.


I think VR_UNDEFINED is unsafe due to value_range_base::union_helper.
And that's a more general issue than this patch.  VR_UNDEFINED is _not_ a safe
range to set something to if we can't handle it.  We have to use VR_VARYING.


Why?  See the beginning of value_range_base::union_helper:

  /* VR0 has the resulting range if VR1 is undefined or VR0 is varying.  */
  if (vr1->undefined_p ()
      || vr0->varying_p ())
    return *vr0;

  /* VR1 has the resulting range if VR0 is undefined or VR1 is varying.  */
  if (vr0->undefined_p ()
      || vr1->varying_p ())
    return *vr1;

This can get called for something like

    a = <cond> ? name1 : name2;

If name1 was set to VR_UNDEFINED thinking that VR_UNDEFINED was a safe
value for something we can't handle, then we'll incorrectly return the
range for name2.
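
To make that concrete, here's a rough sketch of the failure mode (hand
written for illustration, assuming the current value_range_base
constructor and the union_ () entry point):

  value_range_base vr_name1;   /* Default constructed: VR_UNDEFINED, the
                                  wrong fallback for "can't handle".  */
  value_range_base vr_name2 (VR_RANGE,
                             build_int_cst (integer_type_node, 1),
                             build_int_cst (integer_type_node, 10));

  /* Folding the PHI unions the two ranges.  Because vr_name1 is
     undefined, union_helper's early return hands back vr_name2, so "a"
     is narrowed to [1, 10] even though name1 is completely unknown.  */
  vr_name1.union_ (&vr_name2);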


I think that if name1 was !supports_type_p, we will have never called
union/intersect.  We will have bailed at some point earlier.  However, I
do see your point about being consistent.



VR_UNDEFINED can only be used for the ranges of objects we haven't
processed.  If we can't produce a range for an object because the
statement is something we don't handle or just doesn't produce anything
useful, then the right result is VR_VARYING.

This may be worth commenting at the definition site for VR_*.
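
Something along these lines at the enum, maybe (wording only a
suggestion, assuming the enum is still value_range_kind):

/* Lattice values for a range.

   VR_UNDEFINED is the optimistic "not processed yet" value and must
   never be used as a fallback for statements we cannot handle.
   VR_VARYING is the conservative "we know nothing" value and is the
   correct result when no useful range can be produced.  */
enum value_range_kind
{
  VR_UNDEFINED,
  VR_RANGE,
  VR_ANTI_RANGE,
  VR_VARYING,
  VR_LAST
};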



I also noticed that Andrew's patch was setting num_vr_values to
num_ssa_names + num_ssa_names / 10.  I think he meant num_vr_values +
num_vr_values / 10.  Please verify the current incantation makes sense.
Going to assume this will be adjusted per the other messages in this thread.


Done.





diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 39ea22f0554..663dd6e2398 100644
--- a/gcc/tre

RE: Add TIGERLAKE and COOPERLAKE to GCC

2019-08-15 Thread Cui, Lili


> -Original Message-
> From: H.J. Lu [mailto:hjl.to...@gmail.com]
> Sent: Friday, August 16, 2019 6:02 AM
> To: Jeff Law 
> Cc: Cui, Lili ; Uros Bizjak ; GCC
> Patches ; Zhang, Annita
> ; Xiao, Wei3 ; Liu, Hongtao
> ; Wang, Hongyu ;
> Castillo, Jason M 
> Subject: Re: Add TIGERLAKE and COOPERLAKE to GCC
> 
> On Wed, Aug 14, 2019 at 11:04 AM Jeff Law  wrote:
> >
> > On 8/14/19 1:38 AM, Cui, Lili wrote:
> > > Resend this mail for GCC Patches rejected my message, thanks.
> > >
> > > -Original Message-
> > >
> > > Hi Uros and all:
> > >
> > > This patch is about to add TIGERLAKE and COOPERLAKE to GCC.
> > > TIGERLAKE is based on ICELAKE_CLIENT plus the new ISAs
> > > MOVDIRI/MOVDIR64B/AVX512VP2INTERSECT.
> > > COOPERLAKE is based on CASCADELAKE plus the new ISA AVX512BF16.
> > >
> > > Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
> > >
> > > Changelog:
> > > gcc/
> > >   * common/config/i386/i386-common.c
> > >   (processor_names): Add tigerlake and cooperlake.
> > >   (processor_alias_table): Add tigerlake and cooperlake.
> > >   * config.gcc: Add -march=tigerlake and cooperlake.
> > >   * config/i386/driver-i386.c
> > >(host_detect_local_cpu): Detect tigerlake and cooperlake.
> > >   * config/i386/i386-builtins.c
> > >   (processor_model) : Add M_INTEL_COREI7_TIGERLAKE and
> M_INTEL_COREI7_COOPERLAKE.
> > >   (arch_names_table): Add tigerlake and cooperlake.
> > >   (get_builtin_code_for_version) : Handle PROCESSOR_TIGERLAKE and
> PROCESSOR_COOPERLAKE.
> > >   * config/i386/i386-c.c
> > >   (ix86_target_macros_internal): Handle tigerlake and cooperlake.
> > >   (ix86_target_macros_internal): Handle
> OPTION_MASK_ISA_AVX512VP2INTERSECT.
> > >   * config/i386/i386-options.c
> > >   (m_TIGERLAKE)  : Define.
> > >   (m_COOPERLAKE) : Ditto.
> > >   (m_CORE_AVX512): Ditto.
> > >   (processor_cost_table): Add cascadelake.
> > >   (ix86_target_string)  : Handle -mavx512vp2intersect.
> > >   (ix86_valid_target_attribute_inner_p) : Handle avx512vp2intersect.
> > >   (ix86_option_override_internal): Handle PTA_SHSTK, PTA_MOVDIRI,
> > >PTA_MOVDIR64B, PTA_AVX512VP2INTERSECT.
> > >   * config/i386/i386.h
> > >   (ix86_size_cost) : Define TARGET_TIGERLAKE and
> TARGET_COOPERLAKE.
> > >   (processor_type) : Add PROCESSOR_TIGERLAKE and
> PROCESSOR_COOPERLAKE.
> > >   (PTA_SHSTK) : Define.
> > >   (PTA_MOVDIRI): Ditto.
> > >   (PTA_MOVDIR64B): Ditto.
> > >   (PTA_COOPERLAKE) : Ditto.
> > >   (PTA_TIGERLAKE)  : Ditto.
> > >   (TARGET_AVX512VP2INTERSECT) : Ditto.
> > >   (TARGET_AVX512VP2INTERSECT_P(x)) : Ditto.
> > >   (processor_type) : Add PROCESSOR_TIGERLAKE and
> PROCESSOR_COOPERLAKE.
> > >   * doc/extend.texi: Add tigerlake and cooperlake.
> > >
> > > gcc/testsuite/
> > >   * gcc.target/i386/funcspec-56.inc: Handle new march.
> > >   * g++.target/i386/mv16.C: Handle new march
> > >
> > > libgcc/
> > >   * config/i386/cpuinfo.h: Add INTEL_COREI7_TIGERLAKE and
> INTEL_COREI7_COOPERLAKE.
> > >
> > ENOPATCH
> >
> > Note that HJ's reworking of the cost tables may require this patch to
> > change for the trunk.
> >
> 
> Yes, I have checked in my patch.  Please rebase.

Done, there is no conflict, thanks.


Lili.


0001-add-tigerlake-and-cooperlake-to-gcc.patch
Description: 0001-add-tigerlake-and-cooperlake-to-gcc.patch


[PATCH] PR target/91441 - Turn off -fsanitize=kernel-address if TARGET_ASAN_SHADOW_OFFSET is not implemented.

2019-08-15 Thread Kito Cheng
 - -fsanitize=kernel-address will call targetm.asan_shadow_offset ()
   at asan_shadow_offset, so it will crash if TARGET_ASAN_SHADOW_OFFSET
   is not implemented; that means -fsanitize=kernel-address is not
   supported for targets without a TARGET_ASAN_SHADOW_OFFSET implementation.

gcc/ChangeLog:

PR target/91441
* toplev.c (process_options): Check TARGET_ASAN_SHADOW_OFFSET is
implemented for -fsanitize=kernel-address, and merge check logic
with -fsanitize=address.

testsuite/ChangeLog:

PR target/91441
* gcc.target/riscv/pr91441.c: New.
---
 gcc/testsuite/gcc.target/riscv/pr91441.c | 10 ++
 gcc/toplev.c | 10 +-
 2 files changed, 11 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr91441.c

diff --git a/gcc/testsuite/gcc.target/riscv/pr91441.c 
b/gcc/testsuite/gcc.target/riscv/pr91441.c
new file mode 100644
index 000..593a2972a0f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr91441.c
@@ -0,0 +1,10 @@
+/* PR target/91441 */
+/* { dg-do compile  } */
+/* { dg-options "--param asan-stack=1 -fsanitize=kernel-address" } */
+
+int *bar(int *);
+int *f( int a)
+{
+  return bar(&a);
+}
+/* { dg-warning ".'-fsanitize=address' and '-fsanitize=kernel-address' are not 
supported for this target" "" { target *-*-* } 0 } */
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 7e0b9216dea..ddbb8b49436 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1744,7 +1744,7 @@ process_options (void)
   /* Address Sanitizer needs porting to each target architecture.  */
 
   if ((flag_sanitize & SANITIZE_ADDRESS)
-  && !FRAME_GROWS_DOWNWARD)
+  && (!FRAME_GROWS_DOWNWARD || targetm.asan_shadow_offset == NULL))
 {
   warning_at (UNKNOWN_LOCATION, 0,
  "%<-fsanitize=address%> and %<-fsanitize=kernel-address%> "
@@ -1752,14 +1752,6 @@ process_options (void)
   flag_sanitize &= ~SANITIZE_ADDRESS;
 }
 
-  if ((flag_sanitize & SANITIZE_USER_ADDRESS)
-  && targetm.asan_shadow_offset == NULL)
-{
-  warning_at (UNKNOWN_LOCATION, 0,
- "%<-fsanitize=address%> not supported for this target");
-  flag_sanitize &= ~SANITIZE_ADDRESS;
-}
-
  /* Do not use IPA optimizations for register allocation if profiler is active
 or patchable function entries are inserted for run-time instrumentation
 or port does not emit prologue and epilogue as RTL.  */
-- 
2.17.1



Re: PC-relative TLS support

2019-08-15 Thread Alan Modra
On Thu, Aug 15, 2019 at 01:24:07PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Aug 15, 2019 at 01:35:10PM +0930, Alan Modra wrote:
> > Supporting TLS for -mpcrel turns out to be relatively simple, in part
> > due to deciding that !TARGET_TLS_MARKERS with -mpcrel is silly.  No
> > assembler that I know of supporting prefix insns lacks TLS marker
> > support.
> 
> Will this stay that way?  (Or do we not care, not now anyway?)

I'd say we leave that problem to someone who wants pcrel without tls
markers.  It's not hard to do, just extend rs6000_output_tlsargs and
adjust IS_NOMARK_TLSGETADDR length attribute expressions.

> > Also, at some point powerpc gcc ought to remove
> > !TARGET_TLS_MARKERS generally and simplify all the occurrences of
> > IS_NOMARK_TLSGETADDR in rs6000.md rather than complicating them.
> 
> The last time this came up (a year ago) the conclusion was that we first
> would have to remove AIX support.

Hmm, I wonder has that changed?  A quick look at the source says the
AIX TLS support uses completely different patterns and shouldn't care.

> (Changelog has whitespace damage, I guess that is just from how you
> mailed this?  Please fix when applying it).

Fixed.  (It wasn't the mailer..)

-- 
Alan Modra
Australia Development Lab, IBM


Proposal to patch libiberty.a for controlling whether pathnames on Windows are converted to lower case

2019-08-15 Thread Carroll, Paul

This is a proposed patch to libiberty, with an accompanying patch to GCC.
The purpose of this patch is to make it possible for Windows-hosted 
toolchains to have the ability to control whether Canonicalized 
filenames are converted to all lower-case.

Most Windows users are not affected by this behavior.
However, we have at least one customer who does use Windows systems 
where their hard drives are case-sensitive.  Thus, they desire this ability.


The first implementation of Windows support in libiberty/lrealpath.c was 
added back in 2004.  The new code included a call to GetFullPathName(), 
to Canonicalize the filename.  Next,  if there was sufficient room in 
the buffer, the following code was run:


    /* The file system is case-preserving but case-insensitive,
   Canonicalize to lowercase, using the codepage associated
   with the process locale.  */
    CharLowerBuff (buf, len);
    return strdup (buf);

In effect, the assumption of the code is that all Windows file systems 
will be case-preserving but case-insensitive, so converting a filename 
to lowercase is not a problem.  And tools would always find the 
resulting file on the file system.  That turns out not to be true, but 
lrealpath() was mostly used just for system header files, so no one noticed.


However, in April 2014, libcpp was patched to cause even non-system
headers on Windows systems to be Canonicalized.  This patch has caused
problems for users that have case-sensitive file systems on their
Windows systems.  A patch to libcpp was proposed to additionally
Canonicalize non-system headers on Windows systems.
The discussion on the patch starts at 
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg9.html

As is noted in the comments:
  For DOS based file system, we always try to shorten non-system 
headers, as DOS has a tighter constraint on max path length.
The okay to add the patch was given May 9, 2014 at 
https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00557.html , with the note:
  I've spoke with Kai a bit about this and he thinks it's 
appropriate and desirable to shorten paths on these kinds of filesystems.


The libcpp change meant that lrealpath() began to get called for 
non-system header files.  There is other code in both bfd and libiberty 
that can reach the lrealpath() function, but apparently those functions 
are not as much of a problem, as in not really being used in most tools 
(obviously since our customer had no complaints before 2014).


What I am proposing is to modify lrealpath() to only call CharLowerBuff 
when the user has a filesystem where changing the case of the filename 
is not an issue.

That is actually most users, so the default should be to call CharLowerBuff.
But I want to give the user the ability to control that behavior.

My proposal is to change that section of code in libiberty/lrealpath.c 
as follows:


 else
   {
-   /* The file system is case-preserving but case-insensitive,
-  Canonicalize to lowercase, using the codepage associated
+   /* The file system is normally case-preserving but case-insensitive,
+  Canonicalize to lowercase if desired, using the codepage associated
   with the process locale.  */
-    CharLowerBuff (buf, len);
+   if (libiberty_lowerpath)
+  CharLowerBuff (buf, len);
 return strdup (buf);
   }

In effect, this just adds a control to let the user decide whether or 
not a pathname is converted to lowercase on Windows systems.


I also added a global definition for that control at the start of the 
libiberty/lrealpath.c source, setting it so the default behavior on 
Windows is to convert pathnames to lowercase:


+/* External option to control whether a pathname is converted
+   to all lower-case on Windows systems.  The default behavior
+   is to convert.
+*/
+#if defined (_WIN32)
+#ifdef __cplusplus
+extern "C" {
+#endif
+unsigned char libiberty_lowerpath = 1;
+#ifdef __cplusplus
+}
+#endif
+#endif

And, for use by tools that link to libiberty.a, I added an external 
reference to that control in include/libiberty.h:


+#if defined (_WIN32)
+/* Determine if filenames should be converted to lower case */
+extern unsigned char libiberty_lowerpath;
+#endif


Adding the above code to include/libiberty.h and libiberty/lrealpath.c
results in a libiberty.a that behaves exactly the same as it does today
for most users.
It also provides a way for a given tool to control whether Canonicalized
pathnames on Windows are also converted to lowercase.
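
For example, a tool linking against libiberty.a could opt out like this
(hypothetical usage, not part of the patch itself):

#include "libiberty.h"

int
main (int argc, char **argv)
{
#if defined (_WIN32)
  /* This tool targets a case-sensitive Windows file system, so keep the
     original case when lrealpath () canonicalizes filenames.  */
  libiberty_lowerpath = 0;
#endif
  /* ... the rest of the tool ... */
  return 0;
}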


The questions to discuss:
1.    Is this a reasonable solution to this problem, by adding such a 
controlling symbol?
2.    Is it reasonable to use ‘libiberty_lowerpath’ as the symbol’s 
name?  There are other global symbols defined in libiberty that use a 
similar name, so this seems reasonable.
3.    Should the symbol be called ‘libiberty_lowerpath’ or something 
else, such as ‘libib

Re: [PATCH], Patch #3 of 10, Add prefixed addressing support

2019-08-15 Thread Bill Schmidt
On 8/14/19 5:06 PM, Michael Meissner wrote:
> This patch adds prefixed memory support to all offsettable instructions.
>
> Unlike previous versions of the patch, this patch combines all of the
> modifications for addressing to one patch.  Previously, I had 3 separate
> patches (one for PADDI, one for scalar types, and one for vector types).
>
> 2019-08-14   Michael Meissner  
>
>   * config/rs6000/predicates.md (add_operand): Add support for the
>   PADDI instruction.
>   (non_add_cint_operand): Add support for the PADDI instruction.
>   (lwa_operand): Add support for the prefixed PLWA instruction.
>   * config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok_uncached):
>   Only treat modes < 16 bytes as scalars.
>   (rs6000_debug_print_mode): Print whether the mode supports
>   prefixed addressing.
>   (setup_insn_form): Enable prefixed addressing for all modes whose
>   default instruction form includes offset addressing.
>   (num_insns_constant_gpr): Add support for the PADDI instruction.
>   (quad_address_p): Add support for prefixed addressing.
>   (mem_operand_gpr): Add support for prefixed addressing.
>   (mem_operand_ds_form): Add support for prefixed addressing.
>   (rs6000_legitimate_offset_address_p): Add support for prefixed
>   addressing.
>   (rs6000_legitimate_address_p): Add support for prefixed
>   addressing.
>   (rs6000_mode_dependent_address): Add support for prefixed
>   addressing.
>   (rs6000_rtx_costs): Make PADDI cost the same as ADDI or ADDIS.
>   * config/rs6000/rs6000.md (add3): Add support for PADDI.
>   (movsi_internal1): Add support for prefixed addressing, and using
>   PADDI to load up large integers.
>   (movsi splitter): Do not split up a PADDI instruction.
>   (mov_64bit_dm): Add support for prefixed addressing.
>   (movtd_64bit_nodm): Add support for prefixed addressing.
>   (movdi_internal64): Add support for prefixed addressing, and using
>   PADDI to load up large integers.
>   (movdi splitter): Update comment about PADDI.
>   (stack_protect_setdi): Add support for prefixed addressing.
>   (stack_protect_testdi): Add support for prefixed addressing.
>   * config/rs6000/vsx.md (vsx_mov_64bit): Add support for
>   prefixed addressing.
>   (vsx_extract___load): Add support for prefixed
>   addressing.
>   (vsx_extract___load): Add support for prefixed
>   addressing.
>
> Index: gcc/config/rs6000/predicates.md
> ===
> --- gcc/config/rs6000/predicates.md   (revision 274174)
> +++ gcc/config/rs6000/predicates.md   (working copy)
> @@ -839,7 +839,8 @@
>  (define_predicate "add_operand"
>(if_then_else (match_code "const_int")
>  (match_test "satisfies_constraint_I (op)
> -  || satisfies_constraint_L (op)")
> +  || satisfies_constraint_L (op)
> +  || satisfies_constraint_eI (op)")
>  (match_operand 0 "gpc_reg_operand")))
>
>  ;; Return 1 if the operand is either a non-special register, or 0, or -1.
> @@ -852,7 +853,8 @@
>  (define_predicate "non_add_cint_operand"
>(and (match_code "const_int")
> (match_test "!satisfies_constraint_I (op)
> - && !satisfies_constraint_L (op)")))
> + && !satisfies_constraint_L (op)
> + && !satisfies_constraint_eI (op)")))
>
>  ;; Return 1 if the operand is a constant that can be used as the operand
>  ;; of an AND, OR or XOR.
> @@ -933,6 +935,13 @@
>  return false;
>
>addr = XEXP (inner, 0);
> +
> +  /* The LWA instruction uses the DS-form format where the bottom two bits of
> + the offset must be 0.  The prefixed PLWA does not have this
> + restriction.  */
> +  if (prefixed_local_addr_p (addr, mode, INSN_FORM_DS))
> +return true;
> +
>if (GET_CODE (addr) == PRE_INC
>|| GET_CODE (addr) == PRE_DEC
>|| (GET_CODE (addr) == PRE_MODIFY
> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c(revision 274175)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -1828,7 +1828,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, mac
>
>if (ALTIVEC_REGNO_P (regno))
>   {
> -   if (GET_MODE_SIZE (mode) != 16 && !reg_addr[mode].scalar_in_vmx_p)
> +   if (GET_MODE_SIZE (mode) < 16 && !reg_addr[mode].scalar_in_vmx_p)
>   return 0;

Unrelated change?  I don't quite understand why it was changed, either. 
Is this to do with vector_pair support?  If so, maybe it belongs with a
different patch?
>
> return ALTIVEC_REGNO_P (last_regno);
> @@ -2146,6 +2146,11 @@ rs6000_debug_print_mode (ssize_t m)
>rs6000_debug_insn_form (reg_addr[m].insn_form[RELOAD_REG_FPR]),
>rs6000_debug_insn_form (reg_addr[m].insn_form[RELOAD_REG_VMX]));
>
> +  if (reg_addr[m].prefixed_mem

Re: [PATCH], Patch #1 replacement (fix issues with future TLS patches)

2019-08-15 Thread Bill Schmidt
Hi Mike, just a couple points from me...

On 8/15/19 4:19 PM, Michael Meissner wrote:


> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c(revision 274172)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -369,8 +369,11 @@ struct rs6000_reg_addr {
>enum insn_code reload_fpr_gpr; /* INSN to move from FPR to GPR.  */
>enum insn_code reload_gpr_vsx; /* INSN to move from GPR to VSX.  */
>enum insn_code reload_vsx_gpr; /* INSN to move from VSX to GPR.  */
> +  enum insn_form default_insn_form;  /* Default format for offsets.  */
> +  enum insn_form insn_form[(int)N_RELOAD_REG]; /* Register insn format.  */
>addr_mask_type addr_mask[(int)N_RELOAD_REG]; /* Valid address masks.  */
>bool scalar_in_vmx_p;  /* Scalar value can go in VMX.  
> */
> +  bool prefixed_memory_p;/* We can use prefixed memory.  */
>  };
>
>  static struct rs6000_reg_addr reg_addr[NUM_MACHINE_MODES];
> @@ -2053,6 +2056,28 @@ rs6000_debug_vector_unit (enum rs6000_ve
>return ret;
>  }
>
> +/* Return a character that can be printed out to describe an instruction
> +   format.  */
> +
> +DEBUG_FUNCTION char
> +rs6000_debug_insn_form (enum insn_form iform)
> +{
> +  char ret;
> +
> +  switch (iform)
> +{
> +case INSN_FORM_UNKNOWN:  ret = '-'; break;
> +case INSN_FORM_D:ret = 'd'; break;
> +case INSN_FORM_DS:   ret = 's'; break;
> +case INSN_FORM_DQ:   ret = 'q'; break;
> +case INSN_FORM_X:ret = 'x'; break;
> +case INSN_FORM_PREFIXED: ret = 'p'; break;
> +default: ret = '?'; break;
> +}
> +
> +  return ret;
> +}
> +
>  /* Inner function printing just the address mask for a particular reload
> register class.  */
>  DEBUG_FUNCTION char *
> @@ -2115,6 +2140,12 @@ rs6000_debug_print_mode (ssize_t m)
>  fprintf (stderr, " %s: %s", reload_reg_map[rc].name,
>rs6000_debug_addr_mask (reg_addr[m].addr_mask[rc], true));
>
> +  fprintf (stderr, "  Format: %c:%c%c%c",
> +  rs6000_debug_insn_form (reg_addr[m].default_insn_form),
> +  rs6000_debug_insn_form (reg_addr[m].insn_form[RELOAD_REG_GPR]),
> +  rs6000_debug_insn_form (reg_addr[m].insn_form[RELOAD_REG_FPR]),
> +  rs6000_debug_insn_form (reg_addr[m].insn_form[RELOAD_REG_VMX]));
> +
>if ((reg_addr[m].reload_store != CODE_FOR_nothing)
>|| (reg_addr[m].reload_load != CODE_FOR_nothing))
>  {
> @@ -2668,6 +2699,153 @@ rs6000_setup_reg_addr_masks (void)
>  }
>  }
>
> +/* Set up the instruction format for each mode and register type from the
> +   addr_mask.  */
> +
> +static void
> +setup_insn_form (void)
> +{
> +  for (ssize_t m = 0; m < NUM_MACHINE_MODES; ++m)
> +{
> +  machine_mode scalar_mode = (machine_mode) m;
> +
> +  /* Convert complex and IBM double double/_Decimal128 into their scalar
> +  parts that the registers will be split into for doing load or
> +  store.  */
> +  if (COMPLEX_MODE_P (scalar_mode))
> + scalar_mode = GET_MODE_INNER (scalar_mode);
> +
> +  if (FLOAT128_2REG_P (scalar_mode))
> + scalar_mode = DFmode;
> +
> +  for (ssize_t rc = FIRST_RELOAD_REG_CLASS; rc <= LAST_RELOAD_REG_CLASS; 
> rc++)
> + {
> +   machine_mode single_reg_mode = scalar_mode;
> +   size_t msize = GET_MODE_SIZE (scalar_mode);
> +   addr_mask_type addr_mask = reg_addr[scalar_mode].addr_mask[rc];
> +   enum insn_form iform = INSN_FORM_UNKNOWN;
> +
> +   /* Is the mode permitted in the GPR/FPR/Altivec registers?  */
> +   if ((addr_mask & RELOAD_REG_VALID) != 0)

To help with readability and maintainability, may I suggest factoring
the following into a separate function...
> + {
> +   /* The addr_mask does not have the offsettable or indexed bits
> +  set for modes that are split into multiple registers (like
> +  IFmode).  It doesn't need this set, since typically by time it
> +  is used in secondary reload, the modes are split into
> +  component parts.
> +
> +  The instruction format however can be used earlier in the
> +  compilation, so we need to setup what kind of instruction can
> +  be generated for the modes that are split.  */
> +   if ((addr_mask & (RELOAD_REG_MULTIPLE
> + | RELOAD_REG_OFFSET
> + | RELOAD_REG_INDEXED)) == RELOAD_REG_MULTIPLE)
> + {
> +   /* Multiple register types in GPRs depend on whether we can
> +  use DImode in a single register or SImode.  */
> +   if (rc == RELOAD_REG_GPR)
> + {
> +   if (TARGET_POWERPC64)
> + {
> +   gcc_assert ((msize % 8) == 0);
> +   single_reg_mode = DImode;
> +

Re: [patch][aarch64]: add intrinsics for vld1(q)_x4 and vst1(q)_x4

2019-08-15 Thread Jason Merrill

On 8/6/19 5:51 AM, Richard Earnshaw (lists) wrote:

On 18/07/2019 18:18, James Greenhalgh wrote:

On Mon, Jun 10, 2019 at 06:21:05PM +0100, Sylvia Taylor wrote:

Greetings,

This patch adds the intrinsic functions for:
- vld1__x4
- vst1__x4
- vld1q__x4
- vst1q__x4

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk? If yes, I don't have any commit rights, so can someone
please commit it on my behalf.


Hi,

I'm concerned by this strategy for implementing the arm_neon.h builtins:


+__extension__ extern __inline int8x8x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s8_x4 (const int8_t *__a)
+{
+  union { int8x8x4_t __i; __builtin_aarch64_simd_xi __o; } __au;
+  __au.__o
+    = __builtin_aarch64_ld1x4v8qi ((const __builtin_aarch64_simd_qi *) __a);
+  return __au.__i;
+}


As far as I know this is undefined behaviour in C++11. This was the best
resource I could find pointing to the relevant standards paragraphs.

https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior

That said, GCC explicitly allows it, so maybe this is fine?

https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Optimize-Options.html#Type-punning

Can anyone from the languages side chime in on whether we're exposing
undefined behaviour (in either C or C++) here?


Yes, this is a GNU extension.  My only question is whether or not this 
can be disabled within GCC if you're trying to check for strict 
standards conformance of your code?


It's undefined behavior: doing something reasonable is a conformant 
interpretation of undefined behavior.


I don't imagine that ubsan checks for this case, but it's possible.


And if so, is there a way of making sure that this header still works in that 
case?


The well-defined solution is memcpy.  Or, in C++20, bit_cast (not 
implemented yet).
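
For illustration, a memcpy version of one of the new intrinsics could
look roughly like this (a sketch only, reusing the builtins from the
patch; not a reviewed replacement):

__extension__ extern __inline int8x8x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vld1_s8_x4 (const int8_t *__a)
{
  __builtin_aarch64_simd_xi __o
    = __builtin_aarch64_ld1x4v8qi ((const __builtin_aarch64_simd_qi *) __a);
  int8x8x4_t __i;
  /* Copy the bytes the union read would have accessed; copying the
     object representation with memcpy is well-defined in C and C++.  */
  __builtin_memcpy (&__i, &__o, sizeof (__i));
  return __i;
}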


Jason


Re: C++ PATCH for c++/91264 - detect modifying const objects in constexpr

2019-08-15 Thread Jason Merrill

On 8/15/19 5:34 PM, Marek Polacek wrote:

On Wed, Aug 14, 2019 at 02:50:13PM -0400, Jason Merrill wrote:

On Thu, Aug 8, 2019 at 3:25 PM Marek Polacek  wrote:


On Thu, Aug 08, 2019 at 11:06:17AM -0400, Jason Merrill wrote:

On 8/6/19 3:20 PM, Marek Polacek wrote:

On Mon, Aug 05, 2019 at 03:54:19PM -0400, Jason Merrill wrote:

On 7/31/19 3:26 PM, Marek Polacek wrote:

One of the features of constexpr is that it doesn't allow UB; and such UB must
be detected at compile-time.  So running your code in a context that requires
a constant expression should ensure that the code in question is free of UB.
In effect, constexpr can serve as a sanitizer.  E.g. this article describes it
in more detail:


[dcl.type.cv]p4 says "Any attempt to modify a const object during its lifetime
results in undefined behavior." However, as the article above points out, we
aren't detecting that case in constexpr evaluation.

This patch fixes that.  It's not that easy, though, because we have to keep in
mind [class.ctor]p5:
"A constructor can be invoked for a const, volatile or const volatile object.
const and volatile semantics are not applied on an object under construction.
They come into effect when the constructor for the most derived object ends."

I handled this by keeping a hash set which tracks objects under construction.
I considered other options, such as going up call_stack, but that wouldn't
work with trivial constructor/op=.  It was also interesting to find out that
the definition of TREE_HAS_CONSTRUCTOR says "When appearing in a FIELD_DECL,
it means that this field has been duly initialized in its constructor" though
nowhere in the codebase do we set TREE_HAS_CONSTRUCTOR on a FIELD_DECL as far
as I can see.  Unfortunately, using this bit proved useless for my needs here.



Also, be mindful of mutable subobjects.

Does this approach look like an appropriate strategy for tracking objects'
construction?


For scalar objects, we should be able to rely on INIT_EXPR vs. MODIFY_EXPR
to distinguish between initialization and modification; for class objects, I


This is already true: only class object go into the hash set.


wonder about setting a flag on the CONSTRUCTOR after initialization is
complete to indicate that the value is now constant.


But here we're not dealing with CONSTRUCTORs in the gcc sense (i.e. exprs with
TREE_CODE == CONSTRUCTOR).  We have a CALL_EXPR like Y::Y ((struct Y *) &y),
which initializes the object "y".  Setting a flag on the CALL_EXPR or its
underlying function decl wouldn't help.

Am I missing something?


I was thinking that where in your current patch you call
remove_object_under_construction, we could instead mark the object's value
CONSTRUCTOR as immutable.


Ah, what you meant was to look at DECL_INITIAL of the object we're
constructing, which could be a CONSTRUCTOR.  Unfortunately, this
DECL_INITIAL is null (in all the new tests when doing
remove_object_under_construction), so there's nothing to mark as TREE_READONLY :/.


There's a value in ctx->values, isn't there?


Doesn't seem to be the case for e.g.

struct A {
   int n;
   constexpr A() : n(1) { n = 2; }
};

struct B {
   const A a;
   constexpr B(bool b) {
 if (b)
   const_cast<A&>(a).n = 3; // { dg-error "modifying a const object" }
 }
};

constexpr B b(false);
static_assert(b.a.n == 2, "");

Here we're constructing "b", its ctx->values->get(new_obj) is initially
"{}".  In the middle of constructing "b", we construct "b.a", but that
has nothing in ctx->values.


Right, subobjects aren't in ctx->values.  In cxx_eval_call_expression we 
have


  if (DECL_CONSTRUCTOR_P (fun))
    /* This can be null for a subobject constructor call, in
       which case what we care about is the initialization
       side-effects rather than the value.  We could get at the
       value by evaluating *this, but we don't bother; there's
       no need to put such a call in the hash table.  */
    result = lval ? ctx->object : ctx->ctor;

Your patch already finds *this (b.a) and puts it in new_obj; if it's 
const we can evaluate it to get the CONSTRUCTOR to set TREE_READONLY on.
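
Something like this in cxx_eval_call_expression, perhaps (a rough,
untested sketch against the names in your patch):

  if (DECL_CONSTRUCTOR_P (fun)
      && !*non_constant_p
      && new_obj
      && CP_TYPE_CONST_P (TREE_TYPE (new_obj)))
    {
      /* Construction of the const object is complete; freeze its value
         CONSTRUCTOR so later stores through a const path are caught.  */
      tree v = cxx_eval_constant_expression (ctx, new_obj, /*lval*/false,
                                             non_constant_p, overflow_p);
      if (v && TREE_CODE (v) == CONSTRUCTOR)
        TREE_READONLY (v) = true;
    }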


Jason


Re: [PATCH], Patch #1 replacement (fix issues with future TLS patches)

2019-08-15 Thread Segher Boessenkool
Hi Mike,

On Thu, Aug 15, 2019 at 05:19:16PM -0400, Michael Meissner wrote:
> -;; Return true if the operand is a pc-relative address.
> +;; Return true if the operand is a pc-relative address to a local symbol.

The pcrel_addr_p comment says it is *not* just for local symbols.  So
which is it?

Having both something called "pcrel_address" and something called
"pcrel_addr_p" is confusing.  And "pcrel_addr_p" isn't a predicate at
all, so it should not be called that.  Or you can remove that "info"
argument, which is probably a good idea.

>  (define_predicate "pcrel_address"
>(match_code "label_ref,symbol_ref,const")
>  {
> +  return pcrel_addr_p (op, true, false, PCREL_NULL);
>  })

Please avoid boolean arguments altogether; it isn't clear at all what
they mean here.

Ah, they say only locals are allowed here.  So this RTL predicate
shouldn't be called "pcrel_address"; it should have "local" in the name
somewhere.

>  ;; Return 1 if op is a prefixed memory operand.
>  (define_predicate "prefixed_mem_operand"
>(match_code "mem")
>  {
> -  return rs6000_prefixed_address_mode_p (XEXP (op, 0), GET_MODE (op));
> +  return prefixed_local_addr_p (XEXP (op, 0), mode, INSN_FORM_UNKNOWN);
>  })

Similar issues with "local" here.

>  (define_predicate "pcrel_external_mem_operand"
>(match_code "mem")
>  {
> -  return pcrel_external_address (XEXP (op, 0), Pmode);
> +  return pcrel_addr_p (XEXP (op, 0), false, true, PCREL_NULL);
>  })

Why this change?

> +/* Pc-relative address broken into component parts by pcrel_addr_p.  */
> +typedef struct {
> +  rtx base_addr; /* SYMBOL_REF or LABEL_REF.  */
> +  HOST_WIDE_INT offset;  /* Offset from the base address.  */
> +  bool external_p;   /* Is the symbol external?  */
> +} pcrel_info_type;

Don't use typedefs please.

Don't call booleans xxx_p; xxx_p is a name used for a predicate, that
is, a pure (or "const" in GCC terms) function returning a boolean.

Don't name types "*_type".

> +#define PCREL_NULL ((pcrel_info_type *)0)

Please just use NULL where you use this.  (Or 0 as far as I care, but
that's not the GCC coding style :-) ).

> --- gcc/config/rs6000/rs6000.c(revision 274172)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -369,8 +369,11 @@ struct rs6000_reg_addr {
>enum insn_code reload_fpr_gpr; /* INSN to move from FPR to GPR.  */
>enum insn_code reload_gpr_vsx; /* INSN to move from GPR to VSX.  */
>enum insn_code reload_vsx_gpr; /* INSN to move from VSX to GPR.  */
> +  enum insn_form default_insn_form;  /* Default format for offsets.  */
> +  enum insn_form insn_form[(int)N_RELOAD_REG]; /* Register insn format.  */
>addr_mask_type addr_mask[(int)N_RELOAD_REG]; /* Valid address masks.  */

Why the casts here?  Not all places use this cast, so why is it needed
here and not in all cases?

> +/* Return a character that can be printed out to describe an instruction
> +   format.  */
> +
> +DEBUG_FUNCTION char
> +rs6000_debug_insn_form (enum insn_form iform)
> +{
> +  char ret;
> +
> +  switch (iform)
> +{
> +case INSN_FORM_UNKNOWN:  ret = '-'; break;
> +case INSN_FORM_D:ret = 'd'; break;
> +case INSN_FORM_DS:   ret = 's'; break;
> +case INSN_FORM_DQ:   ret = 'q'; break;
> +case INSN_FORM_X:ret = 'x'; break;
> +case INSN_FORM_PREFIXED: ret = 'p'; break;
> +default: ret = '?'; break;
> +}
> +
> +  return ret;
> +}

This doesn't follow the coding style.

> +  fprintf (stderr, "  Format: %c:%c%c%c",
> +  rs6000_debug_insn_form (reg_addr[m].default_insn_form),
> +  rs6000_debug_insn_form (reg_addr[m].insn_form[RELOAD_REG_GPR]),
> +  rs6000_debug_insn_form (reg_addr[m].insn_form[RELOAD_REG_FPR]),
> +  rs6000_debug_insn_form (reg_addr[m].insn_form[RELOAD_REG_VMX]));

Is this useful?  For others I mean, not just for you.

> +/* Set up the instruction format for each mode and register type from the
> +   addr_mask.  */
> +
> +static void
> +setup_insn_form (void)
> +{
> +  for (ssize_t m = 0; m < NUM_MACHINE_MODES; ++m)

Why ssize_t?  Most places just use int.

> +{
> +  machine_mode scalar_mode = (machine_mode) m;
> +
> +  /* Convert complex and IBM double double/_Decimal128 into their scalar
> +  parts that the registers will be split into for doing load or
> +  store.  */
> +  if (COMPLEX_MODE_P (scalar_mode))
> + scalar_mode = GET_MODE_INNER (scalar_mode);

Do you also need to handle some vector modes here?

> +  if (FLOAT128_2REG_P (scalar_mode))
> + scalar_mode = DFmode;
> +
> +  for (ssize_t rc = FIRST_RELOAD_REG_CLASS; rc <= LAST_RELOAD_REG_CLASS; 
> rc++)

(overwide line)

> + {
> +   machine_mode single_reg_mode = scalar_mode;
> +   size_t msize = GET_MODE_SIZE (scalar_mode);
> +   addr_mask_type addr_mask = reg_addr[scalar_mode].addr_mask[rc];
> +   enum insn_form iform = INSN_FORM_UNKNOWN;
> +
> +   /* Is

Re: [patch, fortran] Fix PR 91443

2019-08-15 Thread Thomas Koenig

Hi Janne,


The patch itself looks Ok. One worry, are you introducing an
O(N**2)(?) algorithm (looping over all symbols for every symbol?), and
does this cause performance issues when compiling some gigantic F77
project?


This is a single pass over the code, so O(N) for the code size.
The lookup is O(log M), where M is the number of global symbols
in the file, so altogether, we're at O(N*log M), which should be OK.

Thanks for the review.

Committed as r274551.

I will add the documentation about the change in behavior later.

Regards

Thomas


Re: [PATCH 0/8] eBPF support for GCC

2019-08-15 Thread Jose E. Marchesi


Hi Richard.

> . Dynamic stack allocation (alloca and VLAs) is achieved by using what
>   otherwise would be a perfectly normal general register, %r9, as a
>   pseudo stack pointer.  This has the disadvantage of making the
>   register "fixed" and therefore not available for general register
>   allocation.  Hopefully there is a way to conditionalize this, since
>   both alloca and VLAs are relatively uncommon; I haven't found it
>   yet.

In principle it's possible to define register eliminations for
target-specific registers as well as the usual
FRAME/ARG_POINTER_REGNUM crowd.

Yeah, before I started using %r9 as a stack pointer, I was indeed
"eliminating" a fake stack pointer hard register to the frame register,
i.e. the opposite of what is usually done.

That seemed to work well, but as soon as __builtin_alloca and/or VLAs
were used, lra-eliminations would enter into an infinite loop: it didn't
like the stack pointer being eliminated.

So you could have a fake fixed register to represent the pseudo
stack pointer, then allow that to be "eliminated" to %r9 in
functions that need it.  Functions that don't need it can continue
(not) using the fake register and leave %r9 free for general use.

Interesting idea...  but wouldn't that require to have %r9 declared as a
fixed register, in functions that cfun->calls_alloca?

After reading your reply I investigated a bit, and found out that
CONDITIONAL_REGISTER_USAGE can indeed be called at pleasure, via
reinit_regs(). The i386 port calls reinit_regs in set_current_function,
for example.

So it should be possible to declare %r9 as fixed or non-fixed, in
bpf_set_current_function, depending on the value of
cfun->calls_alloca...
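
Something along these lines, perhaps (a hypothetical sketch; the BPF_R9
regno macro and the hook body are assumptions, not actual bpf.c code):

#define BPF_R9 9   /* assumed hard regno for %r9 */

static void
bpf_set_current_function (tree decl)
{
  if (decl == NULL_TREE || cfun == NULL)
    return;

  /* Only reserve %r9 as the pseudo stack pointer in functions that do
     dynamic stack allocation.  */
  bool need_r9 = cfun->calls_alloca != 0;
  if (fixed_regs[BPF_R9] != need_r9)
    {
      fixed_regs[BPF_R9] = call_used_regs[BPF_R9] = need_r9;
      reinit_regs ();
    }
}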


Re: Add TIGERLAKE and COOPERLAKE to GCC

2019-08-15 Thread H.J. Lu
On Wed, Aug 14, 2019 at 11:04 AM Jeff Law  wrote:
>
> On 8/14/19 1:38 AM, Cui, Lili wrote:
> > Resend this mail for GCC Patches rejected my message, thanks.
> >
> > -Original Message-
> >
> > Hi Uros and all:
> >
> > This patch is about to add TIGERLAKE and COOPERLAKE to GCC.
> > > TIGERLAKE is based on ICELAKE_CLIENT plus the new ISAs
> > > MOVDIRI/MOVDIR64B/AVX512VP2INTERSECT.
> > > COOPERLAKE is based on CASCADELAKE plus the new ISA AVX512BF16.
> >
> > Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
> >
> > Changelog:
> > gcc/
> >   * common/config/i386/i386-common.c
> >   (processor_names): Add tigerlake and cooperlake.
> >   (processor_alias_table): Add tigerlake and cooperlake.
> >   * config.gcc: Add -march=tigerlake and cooperlake.
> >   * config/i386/driver-i386.c
> >(host_detect_local_cpu): Detect tigerlake and cooperlake.
> >   * config/i386/i386-builtins.c
> >   (processor_model) : Add M_INTEL_COREI7_TIGERLAKE and 
> > M_INTEL_COREI7_COOPERLAKE.
> >   (arch_names_table): Add tigerlake and cooperlake.
> >   (get_builtin_code_for_version) : Handle PROCESSOR_TIGERLAKE and 
> > PROCESSOR_COOPERLAKE.
> >   * config/i386/i386-c.c
> >   (ix86_target_macros_internal): Handle tigerlake and cooperlake.
> >   (ix86_target_macros_internal): Handle 
> > OPTION_MASK_ISA_AVX512VP2INTERSECT.
> >   * config/i386/i386-options.c
> >   (m_TIGERLAKE)  : Define.
> >   (m_COOPERLAKE) : Ditto.
> >   (m_CORE_AVX512): Ditto.
> >   (processor_cost_table): Add cascadelake.
> >   (ix86_target_string)  : Handle -mavx512vp2intersect.
> >   (ix86_valid_target_attribute_inner_p) : Handle avx512vp2intersect.
> >   (ix86_option_override_internal): Handle PTA_SHSTK, PTA_MOVDIRI,
> >PTA_MOVDIR64B, PTA_AVX512VP2INTERSECT.
> >   * config/i386/i386.h
> >   (ix86_size_cost) : Define TARGET_TIGERLAKE and TARGET_COOPERLAKE.
> >   (processor_type) : Add PROCESSOR_TIGERLAKE and PROCESSOR_COOPERLAKE.
> >   (PTA_SHSTK) : Define.
> >   (PTA_MOVDIRI): Ditto.
> >   (PTA_MOVDIR64B): Ditto.
> >   (PTA_COOPERLAKE) : Ditto.
> >   (PTA_TIGERLAKE)  : Ditto.
> >   (TARGET_AVX512VP2INTERSECT) : Ditto.
> >   (TARGET_AVX512VP2INTERSECT_P(x)) : Ditto.
> >   (processor_type) : Add PROCESSOR_TIGERLAKE and PROCESSOR_COOPERLAKE.
> >   * doc/extend.texi: Add tigerlake and cooperlake.
> >
> > gcc/testsuite/
> >   * gcc.target/i386/funcspec-56.inc: Handle new march.
> >   * g++.target/i386/mv16.C: Handle new march
> >
> > libgcc/
> >   * config/i386/cpuinfo.h: Add INTEL_COREI7_TIGERLAKE and 
> > INTEL_COREI7_COOPERLAKE.
> >
> ENOPATCH
>
> Note that HJ's reworking of the cost tables may require this patch to
> change for the trunk.
>

Yes, I have checked in my patch.  Please rebase.

Thanks.

-- 
H.J.


[C++ PATCH] PR c++/90393 - ICE with throw in ?:

2019-08-15 Thread Jason Merrill
My previous patch for 64372 was incomplete: it only stopped making the
non-throw argument into an rvalue; lvalue_kind still considered the ?:
expression to be an rvalue, leaving us worse off than before.
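
For reference, the kind of code affected looks like this (an
illustrative example; the real testcases are cond15.C and cond16.C):

int g;

void
f (bool b)
{
  /* DR 1560: with one arm a throw-expression, the ?: keeps the type and
     value category of the other arm, so this is an assignable lvalue.  */
  (b ? throw 0 : g) = 42;
}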

For GCC 9 I lean toward reverting the earlier patch rather than applying this
one and thus changing the meaning of well-formed code in the middle of a
release series.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/64372, DR 1560 - Gratuitous lvalue-to-rvalue conversion in ?:
* tree.c (lvalue_kind): Handle throw in one arm.
* typeck.c (rationalize_conditional_expr): Likewise.
(cp_build_modify_expr): Likewise.
---
 gcc/cp/tree.c| 21 
 gcc/cp/typeck.c  | 24 +++
 gcc/testsuite/g++.dg/abi/mangle53.C  |  5 ++--
 gcc/testsuite/g++.dg/expr/cond15.C   | 13 ++
 gcc/testsuite/g++.dg/expr/cond16.C   | 25 
 gcc/testsuite/g++.old-deja/g++.eh/cond1.C|  4 ++--
 gcc/testsuite/g++.old-deja/g++.other/cond5.C |  4 ++--
 gcc/cp/ChangeLog |  9 +++
 8 files changed, 85 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/expr/cond15.C
 create mode 100644 gcc/testsuite/g++.dg/expr/cond16.C

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index bca92100621..17a4df380c1 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -236,10 +236,23 @@ lvalue_kind (const_tree ref)
  gcc_assert (!type_dependent_expression_p (CONST_CAST_TREE (ref)));
  goto default_;
}
-  op1_lvalue_kind = lvalue_kind (TREE_OPERAND (ref, 1)
-   ? TREE_OPERAND (ref, 1)
-   : TREE_OPERAND (ref, 0));
-  op2_lvalue_kind = lvalue_kind (TREE_OPERAND (ref, 2));
+  {
+   tree op1 = TREE_OPERAND (ref, 1);
+   if (!op1) op1 = TREE_OPERAND (ref, 0);
+   tree op2 = TREE_OPERAND (ref, 2);
+   op1_lvalue_kind = lvalue_kind (op1);
+   op2_lvalue_kind = lvalue_kind (op2);
+   if (!op1_lvalue_kind != !op2_lvalue_kind)
+ {
+   /* The second or the third operand (but not both) is a
+  throw-expression; the result is of the type
+  and value category of the other.  */
+   if (op1_lvalue_kind && TREE_CODE (op2) == THROW_EXPR)
+ op2_lvalue_kind = op1_lvalue_kind;
+   else if (op2_lvalue_kind && TREE_CODE (op1) == THROW_EXPR)
+ op1_lvalue_kind = op2_lvalue_kind;
+ }
+  }
   break;
 
 case MODOP_EXPR:
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 4cc0ee0128d..e2a4f285a72 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -2308,13 +2308,15 @@ rationalize_conditional_expr (enum tree_code code, tree t,
 complain);
 }
 
+  tree op1 = TREE_OPERAND (t, 1);
+  if (TREE_CODE (op1) != THROW_EXPR)
+op1 = cp_build_unary_op (code, op1, false, complain);
+  tree op2 = TREE_OPERAND (t, 2);
+  if (TREE_CODE (op2) != THROW_EXPR)
+op2 = cp_build_unary_op (code, op2, false, complain);
+
   return
-build_conditional_expr (loc, TREE_OPERAND (t, 0),
-   cp_build_unary_op (code, TREE_OPERAND (t, 1), false,
-   complain),
-   cp_build_unary_op (code, TREE_OPERAND (t, 2), false,
-   complain),
-complain);
+build_conditional_expr (loc, TREE_OPERAND (t, 0), op1, op2, complain);
 }
 
 /* Given the TYPE of an anonymous union field inside T, return the
@@ -8160,8 +8162,9 @@ cp_build_modify_expr (location_t loc, tree lhs, enum tree_code modifycode,
if (!lvalue_or_else (lhs, lv_assign, complain))
  return error_mark_node;
 
-   tree op1 = cp_build_modify_expr (loc, TREE_OPERAND (lhs, 1),
-modifycode, rhs, complain);
+   tree op1 = TREE_OPERAND (lhs, 1);
+   if (TREE_CODE (op1) != THROW_EXPR)
+ op1 = cp_build_modify_expr (loc, op1, modifycode, rhs, complain);
/* When sanitizing undefined behavior, even when rhs doesn't need
   stabilization at this point, the sanitization might add extra
   SAVE_EXPRs in there and so make sure there is no tree sharing
@@ -8170,8 +8173,9 @@ cp_build_modify_expr (location_t loc, tree lhs, enum tree_code modifycode,
if (sanitize_flags_p (SANITIZE_UNDEFINED
  | SANITIZE_UNDEFINED_NONDEFAULT))
  rhs = unshare_expr (rhs);
-   tree op2 = cp_build_modify_expr (loc, TREE_OPERAND (lhs, 2),
-modifycode, rhs, complain);
+   tree op2 = TREE_OPERAND (lhs, 2);
+   if (TREE_CODE (op2) != THROW_EXPR)
+ op2 = cp_build_modify_expr (loc, op2, modifycode, rhs, complain);
tree cond = build_conditional_expr (input_location,
 

Re: C++ PATCH for c++/91264 - detect modifying const objects in constexpr

2019-08-15 Thread Marek Polacek
On Wed, Aug 14, 2019 at 02:50:13PM -0400, Jason Merrill wrote:
> On Thu, Aug 8, 2019 at 3:25 PM Marek Polacek  wrote:
> >
> > On Thu, Aug 08, 2019 at 11:06:17AM -0400, Jason Merrill wrote:
> > > On 8/6/19 3:20 PM, Marek Polacek wrote:
> > > > On Mon, Aug 05, 2019 at 03:54:19PM -0400, Jason Merrill wrote:
> > > > > On 7/31/19 3:26 PM, Marek Polacek wrote:
> > > > > > One of the features of constexpr is that it doesn't allow UB; and 
> > > > > > such UB must
> > > > > > be detected at compile-time.  So running your code in a context 
> > > > > > that requires
> > > > > > a constant expression should ensure that the code in question is 
> > > > > > free of UB.
> > > > > > In effect, constexpr can serve as a sanitizer.  E.g. this article 
> > > > > > describes in
> > > > > > in more detail:
> > > > > > 
> > > > > >
> > > > > > [dcl.type.cv]p4 says "Any attempt to modify a const object during 
> > > > > > its lifetime
> > > > > > results in undefined behavior." However, as the article above 
> > > > > > points out, we
> > > > > > aren't detecting that case in constexpr evaluation.
> > > > > >
> > > > > > This patch fixes that.  It's not that easy, though, because we have 
> > > > > > to keep in
> > > > > > mind [class.ctor]p5:
> > > > > > "A constructor can be invoked for a const, volatile or const 
> > > > > > volatile object.
> > > > > > const and volatile semantics are not applied on an object under 
> > > > > > construction.
> > > > > > They come into effect when the constructor for the most derived 
> > > > > > object ends."
> > > > > >
> > > > > > I handled this by keeping a hash set which tracks objects under 
> > > > > > construction.
> > > > > > I considered other options, such as going up call_stack, but that 
> > > > > > wouldn't
> > > > > > work with trivial constructor/op=.  It was also interesting to find 
> > > > > > out that
> > > > > > the definition of TREE_HAS_CONSTRUCTOR says "When appearing in a 
> > > > > > FIELD_DECL,
> > > > > > it means that this field has been duly initialized in its 
> > > > > > constructor" though
> > > > > > nowhere in the codebase do we set TREE_HAS_CONSTRUCTOR on a 
> > > > > > FIELD_DECL as far
> > > > > > as I can see.  Unfortunately, using this bit proved useless for my 
> > > > > > needs here.
> > > > >
> > > > > > Also, be mindful of mutable subobjects.
> > > > > >
> > > > > > Does this approach look like an appropriate strategy for tracking 
> > > > > > objects'
> > > > > > construction?
> > > > >
> > > > > For scalar objects, we should be able to rely on INIT_EXPR vs. 
> > > > > MODIFY_EXPR
> > > > > to distinguish between initialization and modification; for class 
> > > > > objects, I
> > > >
> > > > This is already true: only class object go into the hash set.
> > > >
> > > > > wonder about setting a flag on the CONSTRUCTOR after initialization is
> > > > > complete to indicate that the value is now constant.
> > > >
> > > > But here we're not dealing with CONSTRUCTORs in the gcc sense (i.e. 
> > > > exprs with
> > > > TREE_CODE == CONSTRUCTOR).  We have a CALL_EXPR like Y::Y ((struct Y *) 
> > > > &y),
> > > > which initializes the object "y".  Setting a flag on the CALL_EXPR or 
> > > > its underlying
> > > > function decl wouldn't help.
> > > >
> > > > Am I missing something?
> > >
> > > I was thinking that where in your current patch you call
> > > remove_object_under_construction, we could instead mark the object's value
> > > CONSTRUCTOR as immutable.
> >
> > Ah, what you meant was to look at DECL_INITIAL of the object we're
> > constructing, which could be a CONSTRUCTOR.  Unfortunately, this
> > DECL_INITIAL is null (in all the new tests when doing
> > remove_object_under_construction), so there's nothing to mark as 
> > TREE_READONLY :/.
> 
> There's a value in ctx->values, isn't there?

Doesn't seem to be the case for e.g.

struct A {
  int n;
  constexpr A() : n(1) { n = 2; }
};

struct B {
  const A a;
  constexpr B(bool b) {
if (b)
  const_cast<A&>(a).n = 3; // { dg-error "modifying a const object" }
}
};

constexpr B b(false);
static_assert(b.a.n == 2, "");

Here we're constructing "b", its ctx->values->get(new_obj) is initially
"{}".  In the middle of constructing "b", we construct "b.a", but that
has nothing in ctx->values.

Marek


[PATCH] PR fortran/82992 -- Check for conflicting symbols

2019-08-15 Thread Steve Kargl
The attached patch has be regression tested on x86_64-*-freebsd.

The testcase in the PR explains what the patch does.

% cat z1.f90
subroutine sub (x)
   use iso_fortran_env, only: x => character_kinds
end
%  gfcx -c a.f90
a.f90:1:17:

1 | subroutine sub (x)
  | 1
2 |use iso_fortran_env, only: x => character_kinds
  |   2
Error: Symbol 'x' at (1) conflicts with the rename symbol at (2)

OK to commit?

2019-08-15  Steven G. Kargl  

 PR fortran/82992
 * module.c (gfc_match_use):  When renaming a module entity, search
 current namespace for conflicting symbol.

2019-08-15  Steven G. Kargl  

 PR fortran/82992
 * gfortran.dg/pr71649.f90: Adjust error messages.
 * gfortran.dg/use_15.f90: Ditto.
 * gfortran.dg/use_rename_8.f90: Ditto.

-- 
Steve
Index: gcc/fortran/module.c
===
--- gcc/fortran/module.c	(revision 274495)
+++ gcc/fortran/module.c	(working copy)
@@ -525,6 +525,8 @@ gfc_match_use (void)
   gfc_intrinsic_op op;
   match m;
   gfc_use_list *use_list;
+  gfc_symtree *st;
+  locus loc;
 
   use_list = gfc_get_use_list ();
 
@@ -632,6 +634,8 @@ gfc_match_use (void)
 	case INTERFACE_USER_OP:
 	case INTERFACE_GENERIC:
 	case INTERFACE_DTIO:
+	  loc = gfc_current_locus;
+
 	  m = gfc_match (" =>");
 
 	  if (type == INTERFACE_USER_OP && m == MATCH_YES
@@ -641,6 +645,14 @@ gfc_match_use (void)
 
 	  if (type == INTERFACE_USER_OP)
 	new_use->op = INTRINSIC_USER;
+
+	  st = gfc_find_symtree (gfc_current_ns->sym_root, name);
+	  if (st)
+	{
+	  gfc_error ("Symbol %qs at %L conflicts with the rename symbol "
+			 "at %L", name, &st->n.sym->declared_at, &loc);
+	  goto cleanup;
+	}
 
 	  if (use_list->only_flag)
 	{
Index: gcc/testsuite/gfortran.dg/pr71649.f90
===
--- gcc/testsuite/gfortran.dg/pr71649.f90	(revision 274495)
+++ gcc/testsuite/gfortran.dg/pr71649.f90	(working copy)
@@ -1,13 +1,13 @@
 ! { dg-do compile }
 ! PR71649 Internal Compiler Error
-SUBROUTINE Compiler_Options ( Options, Version, WriteOpt )
-   USE ISO_FORTRAN_ENV, ONLY : Compiler_Version, Compiler_Options ! { dg-error "already declared" }
+SUBROUTINE Compiler_Options ( Options, Version, WriteOpt )! { dg-error "\(1\)" }
+   USE ISO_FORTRAN_ENV, ONLY : Compiler_Version, Compiler_Options ! { dg-error "conflicts with the rename" }
IMPLICIT NONE
CHARACTER (LEN=*), INTENT(OUT) :: Options
CHARACTER (LEN=*), INTENT(OUT) :: Version
LOGICAL, INTENT(IN), OPTIONAL  :: WriteOpt
-   Version = Compiler_Version()
-   Options = Compiler_Options() ! { dg-error "Unexpected use of subroutine name" }
+   Version = Compiler_Version()  ! { dg-error "has no IMPLICIT type" }
+   Options = Compiler_Options()  ! { dg-error "Unexpected use of subroutine name" }
RETURN
 END SUBROUTINE Compiler_Options
 
Index: gcc/testsuite/gfortran.dg/use_15.f90
===
--- gcc/testsuite/gfortran.dg/use_15.f90	(revision 274495)
+++ gcc/testsuite/gfortran.dg/use_15.f90	(working copy)
@@ -28,8 +28,8 @@ subroutine my_sub2 (a)
 end subroutine
 
 
-subroutine my_sub3 (a)
-  use test_mod2, my_sub3 => my_sub2  ! { dg-error "is also the name of the current program unit" }
+subroutine my_sub3 (a)  ! { dg-error "\(1\)" }
+  use test_mod2, my_sub3 => my_sub2 ! { dg-error "conflicts with the rename" }
   real a
   print *, a
 end subroutine
Index: gcc/testsuite/gfortran.dg/use_rename_8.f90
===
--- gcc/testsuite/gfortran.dg/use_rename_8.f90	(revision 274495)
+++ gcc/testsuite/gfortran.dg/use_rename_8.f90	(working copy)
@@ -19,8 +19,8 @@ SUBROUTINE T
 USE MOO, ONLY: X => B
 END SUBROUTINE T
 
-SUBROUTINE C
-USE MOO, ONLY: C  ! { dg-error "is also the name of the current program unit" }
+SUBROUTINE C  ! { dg-error "\(1\)" }
+USE MOO, ONLY: C  ! { dg-error "conflicts with the rename" }
 END SUBROUTINE C
 
 SUBROUTINE D
@@ -36,15 +36,15 @@ SUBROUTINE F
 USE MOO, ONLY: X => F
 END SUBROUTINE F
 
-SUBROUTINE X
-USE MOO, ONLY: X => G ! { dg-error "is also the name of the current program unit" }
+SUBROUTINE X  ! { dg-error "\(1\)" }
+USE MOO, ONLY: X => G ! { dg-error "conflicts with the rename" }
 END SUBROUTINE X
 
-SUBROUTINE Y
-USE MOO, ONLY: Y => H ! { dg-error "is also the name of the current program unit" }
+SUBROUTINE Y  ! { dg-error "\(1\)" }
+USE MOO, ONLY: Y => H ! { dg-error "conflicts with the rename" }
 END SUBROUTINE Y
 
-SUBROUTINE Z
-USE MOO, ONLY: Z => I, Z => I ! { dg-error "is also the name of the current program unit" }
+SUBROUTINE Z! { dg-error "\(1\)" }
+USE MOO, ONLY: Z => I, Z => I   ! { dg-error "conflicts with the rename" }
 END SUBROUTINE Z
 


[PATCH], Patch #1 replacement (fix issues with future TLS patches)

2019-08-15 Thread Michael Meissner
After I submitted the patches, Aaron Sawdey tested the branch that has
the patches on it, along with Alan's TLS patches.  Alan's patch causes
the functions that determine if the insn is prefixed or not to be run
earlier than before.  The compiler was dying because the virtual arg
pointer and frame pointer registers weren't eliminated at that point,
and because I was only checking if the regno was between 0 and 31 for
GPRs.

I rewrote the test in reg_to_insn_form to use INT_REGNO_P macro (which
includes tests for arg pointer and frame pointer virtual registers).  I
used the other two macros (FP_REGNO_P and ALTIVEC_REGNO_P) for
consistency.

In addition, I removed the gcc_unreachable call if the register class
is not a GPR, FPR, or VMX register, and used the GPR defaults.  This is
in case the function gets called in the middle of reload where the
final moves are not done.

This patch replaces patch #1.  I have bootstrapped the compiler with
these changes and verified it fixed the problem Aaron was seeing.  Can
I check this into the FSF trunk?

2019-08-15   Michael Meissner  

* config/rs6000/predicates.md (pcrel_address): Rewrite to use
pcrel_addr_p.
(pcrel_external_address): Rewrite to use pcrel_addr_p.
(prefixed_mem_operand): Rewrite to use prefixed_local_addr_p.
(pcrel_external_mem_operand): Rewrite to use pcrel_addr_p.
* config/rs6000/rs6000-protos.h (reg_to_insn_form): New
declaration.
(pcrel_info_type): New declaration.
(PCREL_NULL): New macro.
(pcrel_addr_p): New declaration.
(rs6000_prefixed_address_mode_p): Delete.
* config/rs6000/rs6000.c (struct rs6000_reg_addr): Add fields for
instruction format and prefixed memory support.
(rs6000_debug_insn_form): New debug function.
(rs6000_debug_print_mode): Print instruction formats.
(setup_insn_form): New function.
(rs6000_init_hard_regno_mode_ok): Call setup_insn_form.
(print_operand_address): Call pcrel_addr_p instead of
pcrel_address.  Add support for external pc-relative labels.
(mode_supports_prefixed_address_p): Delete.
(rs6000_prefixed_address_mode_p): Delete, replace with
prefixed_local_addr_p.
(prefixed_local_addr_p): Replace rs6000_prefixed_address_mode_p.
Add argument to specify the instruction format.
(pcrel_addr_p): New function.
(reg_to_insn_form): New function.
* config/rs6000/rs6000.md (enum insn_form): New enumeration.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 274172)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1626,32 +1626,11 @@ (define_predicate "small_toc_ref"
   return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL;
 })
 
-;; Return true if the operand is a pc-relative address.
+;; Return true if the operand is a pc-relative address to a local symbol.
 (define_predicate "pcrel_address"
   (match_code "label_ref,symbol_ref,const")
 {
-  if (!rs6000_pcrel_p (cfun))
-return false;
-
-  if (GET_CODE (op) == CONST)
-op = XEXP (op, 0);
-
-  /* Validate offset.  */
-  if (GET_CODE (op) == PLUS)
-{
-  rtx op0 = XEXP (op, 0);
-  rtx op1 = XEXP (op, 1);
-
-  if (!CONST_INT_P (op1) || !SIGNED_34BIT_OFFSET_P (INTVAL (op1)))
-   return false;
-
-  op = op0;
-}
-
-  if (LABEL_REF_P (op))
-return true;
-
-  return (SYMBOL_REF_P (op) && SYMBOL_REF_LOCAL_P (op));
+  return pcrel_addr_p (op, true, false, PCREL_NULL);
 })
 
 ;; Return true if the operand is an external symbol whose address can be loaded
@@ -1665,32 +1644,14 @@ (define_predicate "pcrel_address"
 (define_predicate "pcrel_external_address"
   (match_code "symbol_ref,const")
 {
-  if (!rs6000_pcrel_p (cfun))
-return false;
-
-  if (GET_CODE (op) == CONST)
-op = XEXP (op, 0);
-
-  /* Validate offset.  */
-  if (GET_CODE (op) == PLUS)
-{
-  rtx op0 = XEXP (op, 0);
-  rtx op1 = XEXP (op, 1);
-
-  if (!CONST_INT_P (op1) || !SIGNED_34BIT_OFFSET_P (INTVAL (op1)))
-   return false;
-
-  op = op0;
-}
-
-  return (SYMBOL_REF_P (op) && !SYMBOL_REF_LOCAL_P (op));
+  return pcrel_addr_p (op, false, true, PCREL_NULL);
 })
 
 ;; Return 1 if op is a prefixed memory operand.
 (define_predicate "prefixed_mem_operand"
   (match_code "mem")
 {
-  return rs6000_prefixed_address_mode_p (XEXP (op, 0), GET_MODE (op));
+  return prefixed_local_addr_p (XEXP (op, 0), mode, INSN_FORM_UNKNOWN);
 })
 
 ;; Return 1 if op is a memory operand to an external variable when we
@@ -1699,7 +1660,7 @@ (define_predicate "prefixed_mem_operand"
 (define_predicate "pcrel_external_mem_operand"
   (match_code "mem")
 {
-  return pcrel_external_address (XEXP (op, 0), Pmode);
+  return pcrel_addr_p (XEXP (op, 0), false, true, PCREL_NULL);
 })
 
 ;; Match the first insn (addis) in fusing the combination of addis 

[PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-15 Thread Bernd Edlinger
On 8/15/19 6:29 PM, Richard Biener wrote:
>>>
>>> Please split it into the parts for the PR and parts making the
>>> asserts not trigger.
>>>
>>
>> Yes, will do.
>>

Okay, here is the rest of the PR 89544 fix (actually just an optimization
that makes the larger stack alignment known to the middle-end), plus the
test cases.


Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.
2019-08-15  Bernd Edlinger  

	PR middle-end/89544
	* function.c (assign_parm_find_stack_rtl): Use larger alignment
	when possible.

testsuite:
2019-08-15  Bernd Edlinger  

	PR middle-end/89544
	* gcc.target/arm/unaligned-argument-1.c: New test.
	* gcc.target/arm/unaligned-argument-2.c: New test.

Index: gcc/function.c
===
--- gcc/function.c	(Revision 274531)
+++ gcc/function.c	(Arbeitskopie)
@@ -2697,8 +2697,23 @@ assign_parm_find_stack_rtl (tree parm, struct assi
  intentionally forcing upward padding.  Otherwise we have to come
  up with a guess at the alignment based on OFFSET_RTX.  */
   poly_int64 offset;
-  if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm)
+  if (data->locate.where_pad == PAD_NONE || data->entry_parm)
 align = boundary;
+  else if (data->locate.where_pad == PAD_UPWARD)
+{
+  align = boundary;
+  /* If the argument offset is actually more aligned than the nominal
+	 stack slot boundary, take advantage of that excess alignment.
+	 Don't make any assumptions if STACK_POINTER_OFFSET is in use.  */
+  if (poly_int_rtx_p (offset_rtx, &offset)
+	  && STACK_POINTER_OFFSET == 0)
+	{
+	  unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT;
+	  if (offset_align == 0 || offset_align > STACK_BOUNDARY)
+	offset_align = STACK_BOUNDARY;
+	  align = MAX (align, offset_align);
+	}
+}
   else if (poly_int_rtx_p (offset_rtx, &offset))
 {
   align = least_bit_hwi (boundary);
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-1.c
===
--- gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(Revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(Arbeitskopie)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 1 } } */
+/* { dg-final { scan-assembler-times "strd" 1 } } */
+/* { dg-final { scan-assembler-times "stm" 0 } } */
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(Revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(Arbeitskopie)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 0 } } */
+/* { dg-final { scan-assembler-times "strd" 0 } } */
+/* { dg-final { scan-assembler-times "stm" 1 } } */


[PATCH] Sanitizing the middle-end interface to the back-end for strict alignment

2019-08-15 Thread Bernd Edlinger
Hi,

this is the split-out part from the "Fix not 8-byte aligned ldrd/strd on
ARMv5 (PR 89544)" patch which sanitizes the middle-end interface to the
back-end for strict alignment, plus a couple of bug-fixes that are
necessary to survive boot-strap.
It is intended to be applied after the PR 89544 fix.

I think it would be possible to change the default implementation of
STACK_SLOT_ALIGNMENT to make all stack variables always naturally aligned,
instead of doing that only in assign_parm_setup_stack.  But I would still
like to avoid changing things that do not seem to have a problem, since
that would affect many targets and more kinds of variables that probably
do not have a strict-alignment issue.  I am ready to take your advice on
this, though.
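
For reference, a rough sketch of the kind of default I have in mind
(untested, and assuming the usual STACK_SLOT_ALIGNMENT / LOCAL_ALIGNMENT /
GET_MODE_ALIGNMENT interfaces):

  /* Hypothetical target-independent default: never hand out a stack
     slot that is less aligned than its mode requires.  */
  #define STACK_SLOT_ALIGNMENT(TYPE, MODE, ALIGN) \
    MAX ((TYPE) ? LOCAL_ALIGNMENT ((TYPE), (ALIGN)) : (ALIGN), \
         GET_MODE_ALIGNMENT (MODE))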


Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf
Is it OK for trunk?


Thanks
Bernd.

2019-08-15  Bernd Edlinger  
	Richard Biener  

	* expr.c (expand_assignment): Handle misaligned DECLs.
	(expand_expr_real_1): Handle FUNCTION_DECL as unaligned.
	* function.c (assign_parm_adjust_stack_rtl): Check movmisalign optab
	too.
	(assign_parm_setup_stack): Allocate properly aligned stack slots.
	* varasm.c (build_constant_desc): Align constants of misaligned types.
	* config/arm/arm.md (movdi, movsi, movhi, movhf, movsf, movdf): Check
	strict alignment restrictions on memory addresses.
	* config/arm/neon.md (movti, mov, mov): Likewise.
	* config/arm/vec-common.md (mov): Likewise.

Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md	(Revision 274531)
+++ gcc/config/arm/arm.md	(Arbeitskopie)
@@ -5838,6 +5838,12 @@
 	(match_operand:DI 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		   || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (DImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		   || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (DImode));
   if (can_create_pseudo_p ())
 {
   if (!REG_P (operands[0]))
@@ -6014,6 +6020,12 @@
   {
   rtx base, offset, tmp;
 
+  gcc_checking_assert (!MEM_P (operands[0])
+		   || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (SImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		   || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (SImode));
   if (TARGET_32BIT || TARGET_HAVE_MOVT)
 {
   /* Everything except mem = const or mem = mem can be done easily.  */
@@ -6503,6 +6515,12 @@
 	(match_operand:HI 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		   || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (HImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		   || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (HImode));
   if (TARGET_ARM)
 {
   if (can_create_pseudo_p ())
@@ -6912,6 +6930,12 @@
 	(match_operand:HF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		   || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (HFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		   || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (HFmode));
   if (TARGET_32BIT)
 {
   if (MEM_P (operands[0]))
@@ -6976,6 +7000,12 @@
 	(match_operand:SF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		   || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (SFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		   || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (SFmode));
   if (TARGET_32BIT)
 {
   if (MEM_P (operands[0]))
@@ -7071,6 +7101,12 @@
 	(match_operand:DF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		   || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (DFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		   || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (DFmode));
   if (TARGET_32BIT)
 {
   if (MEM_P (operands[0]))
Index: gcc/config/arm/neon.md
===
--- gcc/config/arm/neon.md	(Revision 274531)
+++ gcc/config/arm/neon.md	(Arbeitskopie)
@@ -127,6 +127,12 @@
 	(match_operand:TI 1 "general_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+		   || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (TImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		   || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (TImode));
   if (can_create_pseudo_p ())
 {
   if (!REG_P (operands[0]))
@@ -139,6 +145,12 @@
 	(match_operand:VSTRUCT 1 "general_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+		   || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (mode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		   || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (mode));
   if (can_create_pseudo_p ())
 {

[PATCH] [LRA] Fix wrong-code PR 91109 take 2

2019-08-15 Thread Bernd Edlinger
Hi,

as discussed in the PR 91109 audit trail,
my previous patch missed a case where no spilling is necessary,
but the re-materialized instruction now has scratch regs without
a hard register assignment, and thus the LRA pass falls out of
the loop prematurely.

Fixed by checking for scratch regs with no assignment
and continuing the loop in that case.


Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.
2019-08-12  Bernd Edlinger  

	PR tree-optimization/91109
	* lra-int.h (lra_need_for_scratch_reg_p): Declare.
	* lra.c (lra): Use lra_need_for_scratch_reg_p.
	* lra-spills.c (lra_need_for_scratch_reg_p): New function.

Index: gcc/lra-int.h
===
--- gcc/lra-int.h	(revision 274168)
+++ gcc/lra-int.h	(working copy)
@@ -396,6 +396,7 @@ extern bool lra_coalesce (void);
 
 /* lra-spills.c:  */
 
+extern bool lra_need_for_scratch_reg_p (void);
 extern bool lra_need_for_spills_p (void);
 extern void lra_spill (void);
 extern void lra_final_code_change (void);
Index: gcc/lra-spills.c
===
--- gcc/lra-spills.c	(revision 274168)
+++ gcc/lra-spills.c	(working copy)
@@ -549,6 +549,19 @@ spill_pseudos (void)
 }
 }
 
+/* Return true if we need scratch reg assignments.  */
+bool
+lra_need_for_scratch_reg_p (void)
+{
+  int i; max_regno = max_reg_num ();
+
+  for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
+if (lra_reg_info[i].nrefs != 0 && lra_get_regno_hard_regno (i) < 0
+	&& lra_former_scratch_p (i))
+  return true;
+  return false;
+}
+
 /* Return true if we need to change some pseudos into memory.  */
 bool
 lra_need_for_spills_p (void)
Index: gcc/lra.c
===
--- gcc/lra.c	(revision 274168)
+++ gcc/lra.c	(working copy)
@@ -2567,7 +2567,11 @@ lra (FILE *f)
 	  lra_create_live_ranges (lra_reg_spill_p, true);
 	  live_p = true;
 	  if (! lra_need_for_spills_p ())
-	break;
+	{
+	  if (lra_need_for_scratch_reg_p ())
+		continue;
+	  break;
+	}
 	}
   lra_spill ();
   /* Assignment of stack slots changes elimination offsets for


Re: [patch, fortran] Fix PR 91443

2019-08-15 Thread Janne Blomqvist
On Thu, Aug 15, 2019 at 2:35 PM Thomas Koenig  wrote:
>
> Hello world,
>
> this patch fixes PR 91443, in which we did not warn about a mismatched
> external procedure. The problem was that the module this was called in
> was resolved before parsing of the procedure ever started.
>
> The approach taken here is to move the checking of external procedures
> to a stage after normal resolution.  And, of course, fix the resulting
> fallout from regression-testing :-)
> There is also one policy change in the patch. Previously, we only warned
> about mismatched declarations.  Now, this is a hard error unless the
> user specifies -std=legacy.  The reason is that we have not yet solved
> our single declaration problem, but it cannot be solved unless all
> of a procedure's callers match.  People who have such broken code
> should at least be made aware that they have a problem. However, I would
> like to have some sort of agreement on this point before the patch
> is committed.  This can also be changed (see the code at the bottom
> of frontend-passes.c).

Personally, I'm fine with making this a hard error. As we've recently
seen with the varargs & missing-charlengths saga in LAPACK, mismatches
can cause extremely subtle issues that lead to silent corruption and
take an expert to figure out.

> Once this is in, the next step is to issue errors for mismatching
> calls where the callee is not in the same file.  This can be done
> with the infrastructure of this patch.
>
> So, OK for trunk?

The patch itself looks OK. One worry: are you introducing an
O(N**2)(?) algorithm (looping over all symbols for every symbol?), and
does this cause performance issues when compiling some gigantic F77
project?

If this worry is unfounded, Ok for trunk.

-- 
Janne Blomqvist


Re: PC-relative TLS support

2019-08-15 Thread Segher Boessenkool
Hi!

On Thu, Aug 15, 2019 at 01:35:10PM +0930, Alan Modra wrote:
> Supporting TLS for -mpcrel turns out to be relatively simple, in part
> due to deciding that !TARGET_TLS_MARKERS with -mpcrel is silly.  No
> assembler that I know of supporting prefix insns lacks TLS marker
> support.

Will this stay that way?  (Or do we not care, not now anyway?)

> Also, at some point powerpc gcc ought to remove
> !TARGET_TLS_MARKERS generally and simplify all the occurrences of
> IS_NOMARK_TLSGETADDR in rs6000.md rather than complicating them.

The last time this came up (a year ago) the conclusion was that we first
would have to remove AIX support.

> * config/rs6000/predicates.md (unspec_tls): Allow const0_rtx for got
>   element of unspec vec.
> * config/rs6000/rs6000.c (rs6000_option_override_internal): Disable
>   -mpcrel if -mno-tls-markers.
>   (rs6000_legitimize_tls_address): Support PC-relative TLS.
> * config/rs6000/rs6000.md (UNSPEC_TLSTLS_PCREL): New unspec.
>   (tls_gd_pcrel, tls_ld_pcrel): New insns.
> (tls_dtprel, tls_tprel): Set attr prefixed when tls_size is not 16.
> (tls_got_tprel_pcrel, tls_tls_pcrel): New insns.

(Changelog has whitespace damage, I guess that is just from how you
mailed this?  Please fix when applying it).

The patch is fine when its prerequisites are in.  Thanks,


Segher


Re: [PATCH] make clear TYPE_SIZE may be non-constant or null

2019-08-15 Thread Jeff Law
On 8/15/19 10:56 AM, Martin Sebor wrote:
> The comment for DECL_SIZE makes it clear it may be non-constant
> but not that it may be null.  The comment for TYPE_SIZE mentions
> neither.
> 
> The attached update adds a few sentences to make these caveats
> clear.  If no one has any suggestions I'll commit it as obvious
> today or tomorrow.
OK
jeff



Re: match ld besides collect2 in gcov test

2019-08-15 Thread Jeff Law
On 8/15/19 2:13 AM, Alexandre Oliva wrote:
> The regexp that checks that -lgcov is linked in when --coverage is
> passed to the compiler driver requires the command line to match
> '/collect2'.  Some of our targets don't match that, but they match /ld
> or ${target_alias}-ld depending on the testing scenario, so I'd like
> to tweak the test to match those as well.
> 
> Tested on x86_64-linux-gnu, and on the affected test scenarios.
> Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.misc-tests/options.exp: Match /ld and -ld besides
>   /collect2.
OK
jeff


Re: use __builtin_alloca, drop non-standard alloca.h

2019-08-15 Thread Jeff Law
On 8/15/19 2:17 AM, Alexandre Oliva wrote:
> Since alloca.h is not ISO C, most of our alloca-using tests seem to
> rely on __builtin_alloca instead of including the header and calling
> alloca.  This patch extends this practice to some of the exceptions I
> found in gcc.target, marking them as requiring a functional alloca
> while at that.
> 
> Tested on x86_64-linux-gnu, and manually compile-tested the non-x86
> tests.  Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.target/arc/interrupt-6.c: Use __builtin_alloca, require
>   effective target support for alloca, drop include of alloca.h.
>   * gcc.target/i386/pr80969-3.c: Likewise.
>   * gcc.target/sparc/setjmp-1.c: Likewise.
>   * gcc.target/x86_64/abi/ms-sysv/gen.cc: Likewise.
>   * gcc.target/x86_64/abi/ms-sysv/ms-sysv.c: Likewise.
OK
jeff


Re: require trampolines for pr85044

2019-08-15 Thread Jeff Law
On 8/15/19 9:46 AM, Alexandre Oliva wrote:
> Testcases that require support for trampolines should be marked as
> such; gcc.target/i386/pr85044.c was missing it.  Fixed.
> 
> Tested on x86_64-linux-gnu.  Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.target/i386/pr85044.c: Require support for trampolines.
OK
jeff


[PATCH 2/8] bpf: new GCC port

2019-08-15 Thread Jose E. Marchesi
This patch adds a port for the Linux kernel eBPF architecture to GCC.

ChangeLog:

  * configure.ac: Support for bpf-*-* targets.
  * configure: Regenerate.

contrib/ChangeLog:

  * config-list.mk (LIST): Disable go in bpf-*-* targets.

gcc/ChangeLog:

  * config.gcc: Support for bpf-*-* targets.
  * common/config/bpf/bpf-common.c: New file.
  * config/bpf/t-bpf: Likewise.
  * config/bpf/predicates.md: Likewise.
  * config/bpf/constraints.md: Likewise.
  * config/bpf/bpf.opt: Likewise.
  * config/bpf/bpf.md: Likewise.
  * config/bpf/bpf.h: Likewise.
  * config/bpf/bpf.c: Likewise.
  * config/bpf/bpf-protos.h: Likewise.
  * config/bpf/bpf-opts.h: Likewise.
  * config/bpf/bpf-helpers.h: Likewise.
  * config/bpf/bpf-helpers.def: Likewise.
---
 ChangeLog  |5 +
 configure  |   68 ++-
 configure.ac   |   54 +-
 contrib/ChangeLog  |4 +
 contrib/config-list.mk |2 +-
 gcc/ChangeLog  |   16 +
 gcc/common/config/bpf/bpf-common.c |   57 ++
 gcc/config.gcc |9 +
 gcc/config/bpf/bpf-helpers.def |  194 ++
 gcc/config/bpf/bpf-helpers.h   |  324 ++
 gcc/config/bpf/bpf-opts.h  |   56 ++
 gcc/config/bpf/bpf-protos.h|   33 ++
 gcc/config/bpf/bpf.c   | 1136 
 gcc/config/bpf/bpf.h   |  565 ++
 gcc/config/bpf/bpf.md  |  528 +
 gcc/config/bpf/bpf.opt |  119 
 gcc/config/bpf/constraints.md  |   29 +
 gcc/config/bpf/predicates.md   |  105 
 gcc/config/bpf/t-bpf   |0
 19 files changed, 3300 insertions(+), 4 deletions(-)
 create mode 100644 gcc/common/config/bpf/bpf-common.c
 create mode 100644 gcc/config/bpf/bpf-helpers.def
 create mode 100644 gcc/config/bpf/bpf-helpers.h
 create mode 100644 gcc/config/bpf/bpf-opts.h
 create mode 100644 gcc/config/bpf/bpf-protos.h
 create mode 100644 gcc/config/bpf/bpf.c
 create mode 100644 gcc/config/bpf/bpf.h
 create mode 100644 gcc/config/bpf/bpf.md
 create mode 100644 gcc/config/bpf/bpf.opt
 create mode 100644 gcc/config/bpf/constraints.md
 create mode 100644 gcc/config/bpf/predicates.md
 create mode 100644 gcc/config/bpf/t-bpf

diff --git a/configure b/configure
index 63b1e33f41c..4f8e68a4085 100755
--- a/configure
+++ b/configure
@@ -754,6 +754,7 @@ infodir
 docdir
 oldincludedir
 includedir
+runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -919,6 +920,7 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
 includedir='${prefix}/include'
 oldincludedir='/usr/include'
 docdir='${datarootdir}/doc/${PACKAGE}'
@@ -1171,6 +1173,15 @@ do
   | -silent | --silent | --silen | --sile | --sil)
 silent=yes ;;
 
+  -runstatedir | --runstatedir | --runstatedi | --runstated \
+  | --runstate | --runstat | --runsta | --runst | --runs \
+  | --run | --ru | --r)
+ac_prev=runstatedir ;;
+  -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
+  | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
+  | --run=* | --ru=* | --r=*)
+runstatedir=$ac_optarg ;;
+
   -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
 ac_prev=sbindir ;;
   -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \
@@ -1308,7 +1319,7 @@ fi
 for ac_var in  exec_prefix prefix bindir sbindir libexecdir datarootdir \
datadir sysconfdir sharedstatedir localstatedir includedir \
oldincludedir docdir infodir htmldir dvidir pdfdir psdir \
-   libdir localedir mandir
+   libdir localedir mandir runstatedir
 do
   eval ac_val=\$$ac_var
   # Remove trailing slashes.
@@ -1468,6 +1479,7 @@ Fine tuning of the installation directories:
   --sysconfdir=DIRread-only single-machine data [PREFIX/etc]
   --sharedstatedir=DIRmodifiable architecture-independent data [PREFIX/com]
   --localstatedir=DIR modifiable single-machine data [PREFIX/var]
+  --runstatedir=DIR   modifiable per-process data [LOCALSTATEDIR/run]
   --libdir=DIRobject code libraries [EPREFIX/lib]
   --includedir=DIRC header files [PREFIX/include]
   --oldincludedir=DIR C header files for non-gcc [/usr/include]
@@ -3353,6 +3365,9 @@ case "${target}" in
 # No hosted I/O support.
 noconfigdirs="$noconfigdirs target-libssp"
 ;;
+  bpf-*-*)
+noconfigdirs="$noconfigdirs target-libssp"
+;;
   powerpc-*-aix* | rs6000-*-aix*)
 noconfigdirs="$noconfigdirs target-libssp"
 ;;
@@ -3387,12 +3402,43 @@ if test "${ENABLE_LIBSTDCXX}" = "default" ; then
 avr-*-*)
   noconfigdirs="$noconfigdirs target-libstdc++-v3"
   ;;
+bpf-*-*)
+  noconfigdirs="$noconfigdirs target-libstdc++-v3"
+  ;;
 ft32-*-*)
   noconfigdirs="$noconfigdirs target-libstdc++-v3"
   

[PATCH] make clear TYPE_SIZE may be non-constant or null

2019-08-15 Thread Martin Sebor

The comment for DECL_SIZE makes it clear it may be non-constant
but not that it may be null.  The comment for TYPE_SIZE mentions
neither.

The attached update adds a few sentences to make these caveats
clear.  If no one has any suggestions I'll commit it as obvious
today or tomorrow.
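
For example, code looking at a type's size generally needs to guard for
both caveats; a hypothetical fragment, just to illustrate the usual idiom:

  /* TYPE_SIZE may be null (incomplete type, e.g. an array of
     unspecified bound) and, when present, need not be constant
     (variable-length types), so check both before using it.  */
  tree size = TYPE_SIZE (type);
  if (size && tree_fits_uhwi_p (size))
    {
      unsigned HOST_WIDE_INT bits = tree_to_uhwi (size);
      /* ... use bits ... */
    }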

Thanks
Martin
gcc/ChangeLog:

	* tree.def (TYPE_SIZE): Clarify.
	* tree.h (TYPE_SIZE, TYPE_SIZE_UNIT, DECL_SIZE): Add comments.

Index: gcc/tree.def
===
--- gcc/tree.def	(revision 274541)
+++ gcc/tree.def	(working copy)
@@ -77,7 +77,10 @@ DEFTREECODE (BLOCK, "block", tcc_exceptional, 0)
 /* Each data type is represented by a tree node whose code is one of
the following:  */
 /* Each node that represents a data type has a component TYPE_SIZE
-   containing a tree that is an expression for the size in bits.
+   that evaluates either to a tree that is a (potentially non-constant)
+   expression representing the type size in bits, or to a null pointer
+   when the size of the type is unknown (for example, for incomplete
+   types such as arrays of unspecified bound).
The TYPE_MODE contains the machine mode for values of this type.
The TYPE_POINTER_TO field contains a type for a pointer to this type,
  or zero if no such has been created yet.
Index: gcc/tree.h
===
--- gcc/tree.h	(revision 274541)
+++ gcc/tree.h	(working copy)
@@ -1952,7 +1952,10 @@ class auto_suppress_location_wrappers
so they must be checked as well.  */
 
 #define TYPE_UID(NODE) (TYPE_CHECK (NODE)->type_common.uid)
+/* Type size in bits as a tree expression.  Need not be constant
+   and may be null.  */
 #define TYPE_SIZE(NODE) (TYPE_CHECK (NODE)->type_common.size)
+/* Likewise, type size in bytes.  */
 #define TYPE_SIZE_UNIT(NODE) (TYPE_CHECK (NODE)->type_common.size_unit)
 #define TYPE_POINTER_TO(NODE) (TYPE_CHECK (NODE)->type_common.pointer_to)
 #define TYPE_REFERENCE_TO(NODE) (TYPE_CHECK (NODE)->type_common.reference_to)
@@ -2480,7 +2483,7 @@ extern machine_mode vector_type_mode (const_tree);
 #define DECL_INITIAL(NODE) (DECL_COMMON_CHECK (NODE)->decl_common.initial)
 
 /* Holds the size of the datum, in bits, as a tree expression.
-   Need not be constant.  */
+   Need not be constant and may be null.  */
 #define DECL_SIZE(NODE) (DECL_COMMON_CHECK (NODE)->decl_common.size)
 /* Likewise for the size in bytes.  */
 #define DECL_SIZE_UNIT(NODE) (DECL_COMMON_CHECK (NODE)->decl_common.size_unit)


Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-15 Thread Richard Biener
On August 15, 2019 4:52:24 PM GMT+02:00, Bernd Edlinger 
 wrote:
>On 8/15/19 2:54 PM, Richard Biener wrote:
>> On Thu, 15 Aug 2019, Bernd Edlinger wrote:
>> 
>>
>> Hmm.  So your patch overrides user-alignment here.  Woudln't it
>> be better to do that more conciously by
>>
>>   if (! DECL_USER_ALIGN (decl)
>>   || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>>   && targetm.slow_unaligned_access (DECL_MODE (decl),
>align)))
>>
>>>
>>> ? I don't know why that would be better?
>>> If the value is underaligned no matter why, pretend it was declared
>as
>>> naturally aligned if that causes wrong code otherwise.
>>> That was the idea here.
>> 
>> It would be better because then we ignore it and use what we'd use
>> by default rather than inventing sth new.  And your patch suggests
>> it might be needed to up align even w/o DECL_USER_ALIGN.
>> 
>
>Hmmm, you mean the constant 1.0i should not have DECL_USER_ALIGN set?
>But it inherits the alignment from the destination variable,
>apparently. 

Yes. I think it shouldn't inherit the alignment unless we are assembling a 
static initializer. 

>
>did you mean
>if (! DECL_USER_ALIGN (decl)
>&& align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>&& ...
>?
>
>I can give it a try.

No, I meant || thus ignore DECL_USER_ALIGN if it is sth we have to satisfy with 
unaligned loads. 
>
>> IMHO whatever code later fails to properly use unaligned loads
>> should be fixed instead rather than ignoring user requested
>alignment.
>>
>> Can you quote a short testcase that explains what exactly goes
>wrong?
>> The struct-layout ones are awkward to look at...
>>
>
> Sure,
>
> $ cat test.c
> _Complex float __attribute__((aligned(1))) cf;
>
> void foo (void)
> {
>   cf = 1.0i;
> }
>
> $ arm-linux-gnueabihf-gcc -S test.c 
> during RTL pass: expand
> test.c: In function 'foo':
> test.c:5:6: internal compiler error: in gen_movsf, at
>config/arm/arm.md:7003
> 5 |   cf = 1.0i;
>   |   ~~~^~
> 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
>   ../../gcc-trunk/gcc/config/arm/arm.md:7003
> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>   ../../gcc-trunk/gcc/recog.h:318
> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
>   ../../gcc-trunk/gcc/expr.c:3695
> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>   ../../gcc-trunk/gcc/expr.c:3791
> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
>   ../../gcc-trunk/gcc/expr.c:3490
> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>   ../../gcc-trunk/gcc/expr.c:3791
> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
>   ../../gcc-trunk/gcc/expr.c:5855
> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>   ../../gcc-trunk/gcc/expr.c:5441

 Huh, so why didn't it trigger

   /* Handle misaligned stores.  */
   mode = TYPE_MODE (TREE_TYPE (to));
   if ((TREE_CODE (to) == MEM_REF
|| TREE_CODE (to) == TARGET_MEM_REF)
   && mode != BLKmode
   && !mem_ref_refers_to_non_mem_p (to)
   && ((align = get_object_alignment (to))
   < GET_MODE_ALIGNMENT (mode))
   && (((icode = optab_handler (movmisalign_optab, mode))
!= CODE_FOR_nothing)
   || targetm.slow_unaligned_access (mode, align)))
 {

 ?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
 var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl 0x2aad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.

 Ah, 'to' is a plain DECL here so the above handling is incomplete.
 IIRC component refs like __real cf = 0.f should be handled fine
 again(?).  So, does adding || DECL_P (to) fix the case as well?

>>>
>>> So I tried this instead of the varasm.c change:
>>>
>>> Index: expr.c
>>> ===
>>> --- expr.c  (revision 274487)
>>> +++ expr.c  (working copy)
>>> @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool
>nontem
>>>/* Handle misaligned stores.  */
>>>mode = TYPE_MODE (TREE_TYPE (to));
>>>if ((TREE_CODE (to) == MEM_REF
>>> -   || TREE_CODE (to) == TARGET_MEM_REF)
>>> +   || TREE_CODE (to) == TARGET_MEM_REF
>>> +   || DECL_P (to))
>>>&& mode != BLKmode
>>> -  && !mem_ref_refers_to_non_mem_p (to)
>>> +  && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
>>>&& ((align = get_object_alignment (to))
>>>   < GET_MODE_ALIGNMENT (mode))
>>>&& (((icode = optab_handler (movmisalign_optab, mode))
>>>
>>> Result, yes, it fixes this test case
>>> but then I run all struct-layout-1.exp there are sill cases. where
>we have problems:
>>>
>>> In file included from
>/home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:^M
>>>
>/home/ed/gnu/gcc-build-arm-l

Re: i386/asm-4 test: use amd64's natural addressing mode on all OSs

2019-08-15 Thread Uros Bizjak
On Thu, Aug 15, 2019 at 3:47 PM Alexandre Oliva  wrote:
>
> On Aug 15, 2019, Uros Bizjak  wrote:
>
> > The immediate of lea is limited to +-2GB
>
> ... and we're talking about a code offset within a tiny translation
> unit, with both reference and referenced address within the same
> section.  It would be very surprising if the offset got up to 2KB, let
> alone 2GB ;-)
>
> The reason the testcase even mentions absolute addresses AFAICT is not
> that it wishes to use them, it's that there's no portable way to use
> PC-relative addresses on IA-32 (*), so the test gives up on platforms
> that mandate PIC and goes with absolute addressing there.  I'm pretty
> sure if it had a choice it would happily use PC-relative addressing,
> even with a reasonably short range, if that was readily available...
>
> (*) Something like 'call 1f; 1: popl %0; leal $foo-1b(%0), %0' is the
> closest to widely available I'm aware of, but the '1f/1b' notation is
> only available with GNU as AFAIK.

Well, OK ... let's go ahead with your patch then.

Thanks,
Uros.


Re: [PATCH] PR libstdc++/91456 make INVOKE work with uncopyable prvalues

2019-08-15 Thread Jonathan Wakely

On 15/08/19 17:04 +0100, Jonathan Wakely wrote:

In C++17 a function can return a prvalue of a type that cannot be moved
or copied. The current implementation of std::is_invocable_r uses
std::is_convertible to test the conversion to R required by INVOKE.
That fails for non-copyable prvalues, because std::is_convertible is
defined in terms of std::declval which uses std::add_rvalue_reference.
In C++17 conversion from R to R involves no copies and so is not the
same as conversion from R&& to R.

This commit changes std::is_invocable_r to check the conversion without
using std::is_convertible.

std::function also contains a similar check using std::is_convertible,
which can be fixed by simply reusing std::is_invocable_r (but because
std::is_invocable_r is not defined for C++11 it uses the underlying
std::__is_invocable_impl trait directly).

PR libstdc++/91456
* include/bits/std_function.h (__check_func_return_type): Remove.
(function::_Callable): Use std::__is_invocable_impl instead of
__check_func_return_type.
* include/std/type_traits (__is_invocable_impl): Add another defaulted
template parameter. Define a separate partial specialization for
INVOKE and INVOKE<R>. For INVOKE<R> replace is_convertible check
with a check that models delayed temporary materialization.
* testsuite/20_util/function/91456.cc: New test.
* testsuite/20_util/is_invocable/91456.cc: New test.


With some minor changes to __is_convertible_helper we could make that
usable by both std::is_convertible and __is_invocable_impl.

I don't plan to commit this now but might do at a later date.

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 44db2cade5d..4df3fee4c77 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1491,20 +1491,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
bool = __or_, is_function<_To>,
 is_array<_To>>::value>
 struct __is_convertible_helper
-{
-  typedef typename is_void<_To>::type type;
-};
+: public is_void<_To>::type
+{ };
 
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wctor-dtor-privacy"
   template
 class __is_convertible_helper<_From, _To, false>
 {
+  // Unlike declval, this doesn't add_rvalue_reference.
+  template
+	static _From1 __declval();
+
   template
 	static void __test_aux(_To1) noexcept;
 
   template(std::declval<_From1>()))>
+	   typename = decltype(__test_aux<_To1>(__declval<_From1>()))>
 	static true_type
 	__test(int);
 
@@ -1513,14 +1516,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__test(...);
 
 public:
-  typedef decltype(__test<_From, _To>(0)) type;
+  using type = decltype(__test<_From, _To>(0));
 };
 #pragma GCC diagnostic pop
 
+  template struct add_rvalue_reference;
+
   /// is_convertible
   template
 struct is_convertible
-: public __is_convertible_helper<_From, _To>::type
+: public __is_convertible_helper::type,
+ _To>::type
 { };
 
   template
 class __is_nt_convertible_helper<_From, _To, false>
 {
+  // Unlike declval, this doesn't add_rvalue_reference.
+  template
+	static _From1 __declval();
+
   template
 	static void __test_aux(_To1) noexcept;
 
   template
 	static
-	__bool_constant(std::declval<_From1>()))>
+	__bool_constant(__declval<_From1>()))>
 	__test(int);
 
   template
@@ -1555,14 +1565,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // is_nothrow_convertible for C++11
   template
 struct __is_nothrow_convertible
-: public __is_nt_convertible_helper<_From, _To>::type
+: __is_nt_convertible_helper::type,
+ _To>::type
 { };
 
 #if __cplusplus > 201703L
   /// is_nothrow_convertible
   template
 struct is_nothrow_convertible
-: public __is_nt_convertible_helper<_From, _To>::type
+: __is_nt_convertible_helper::type,
+ _To>::type
 { };
 
   /// is_nothrow_convertible_v
@@ -2896,35 +2908,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : true_type
 { };
 
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wctor-dtor-privacy"
  // Used for INVOKE<R> expressions to check the implicit conversion to R.
   template
 struct __is_invocable_impl<_Result, _Ret,
 			   /* is_void<_Ret> = */ false,
 			   __void_t>
-{
-private:
-  // The type of the INVOKE expression.
-  // Unlike declval, this doesn't add_rvalue_reference.
-  static typename _Result::type _S_get();
-
-  template
-	static void _S_conv(_Tp);
-
-  // This overload is viable if INVOKE(f, args...) can convert to _Tp.
-  template(_S_get()))>
-	static true_type
-	_S_test(int);
-
-  template
-	static false_type
-	_S_test(...);
-
-public:
-  using type = decltype(_S_test<_Ret>(1));
-};
-#pragma GCC diagnostic pop
+: __is_convertible_helper
+{ };
 
   template
 struct __is_invocable


Re: types for VR_VARYING

2019-08-15 Thread Aldy Hernandez



On 8/15/19 7:23 AM, Richard Biener wrote:

On Thu, Aug 15, 2019 at 12:40 PM Aldy Hernandez  wrote:


On 8/14/19 1:37 PM, Jeff Law wrote:

On 8/13/19 6:39 PM, Aldy Hernandez wrote:



On 8/12/19 7:46 PM, Jeff Law wrote:

On 8/12/19 12:43 PM, Aldy Hernandez wrote:

This is a fresh re-post of:

https://gcc.gnu.org/ml/gcc-patches/2019-07/msg6.html

Andrew gave me some feedback a week ago, and I obviously don't remember
what it was because I was about to leave on PTO.  However, I do remember
I addressed his concerns before getting drunk on rum in tropical islands.


FWIW found a great coffee infused rum while in Kauai last week.  I'm not
a coffee fan, but it was wonderful.  The one bottle we brought back
isn't going to last until Cauldron and I don't think I can get a special
order filled before I leave :(


You must bring some to Cauldron before we believe you. :)

That's the problem.  The nearest place I can get it is in Vegas and
there's no distributor in Montreal.   I can special order it in our
state run stores, but it won't be here in time.

Of course, I don't mind if you don't believe me.  More for me in that
case...



Is the supports_type_p stuff there to placate the calls from ipa-cp?  I
can live with it in the short term, but it really feels like there
should be something in the ipa-cp client that avoids this silliness.


I am not happy with this either, but there are various places where
statements that are !stmt_interesting_for_vrp() are still setting a
range of VARYING, which is then being ignored at a later time.

For example, vrp_initialize:

if (!stmt_interesting_for_vrp (phi))
  {
tree lhs = PHI_RESULT (phi);
set_def_to_varying (lhs);
prop_set_simulate_again (phi, false);
  }

Also in evrp_range_analyzer::record_ranges_from_stmt(), where we if the
statement is interesting for VRP but extract_range_from_stmt() does not
produce a useful range, we also set a varying for a range we will never
use.  Similarly for a statement that is not interesting in this hunk.

Ugh.  One could perhaps argue that setting any kind of range in these
circumstances is silly.   But I suspect it's necessary due to the
optimistic handling of VR_UNDEFINED in value_range_base::union_helper.
It's all coming back to me now...




Then there is vrp_prop::visit_stmt() where we also set VARYING for types
that VRP will never handle:

case IFN_ADD_OVERFLOW:
case IFN_SUB_OVERFLOW:
case IFN_MUL_OVERFLOW:
case IFN_ATOMIC_COMPARE_EXCHANGE:
  /* These internal calls return _Complex integer type,
 which VRP does not track, but the immediate uses
 thereof might be interesting.  */
  if (lhs && TREE_CODE (lhs) == SSA_NAME)
{
  imm_use_iterator iter;
  use_operand_p use_p;
  enum ssa_prop_result res = SSA_PROP_VARYING;

  set_def_to_varying (lhs);

I've adjusted the patch so that set_def_to_varying will set the range to
VR_UNDEFINED if !supports_type_p.  This is a fail safe, as we can't
really do anything with a nonsensical range.  I just don't want to leave
the range in an indeterminate state.


I think VR_UNDEFINED is unsafe due to value_range_base::union_helper.
And that's a more general than this patch.  VR_UNDEFINED is _not_ a safe
range to set something to if we can't handle it.  We have to use VR_VARYING.

Why?  See the beginning of value_range_base::union_helper:

 /* VR0 has the resulting range if VR1 is undefined or VR0 is varying.  */
 if (vr1->undefined_p ()
 || vr0->varying_p ())
   return *vr0;

 /* VR1 has the resulting range if VR0 is undefined or VR1 is varying.  */
 if (vr0->undefined_p ()
 || vr1->varying_p ())
   return *vr1;
This can get called for something like

a = <cond> ? name1 : name2;

If name1 was set to VR_UNDEFINED thinking that VR_UNDEFINED was a safe
value for something we can't handle, then we'll incorrectly return the
range for name2.


I think that if name1 was !supports_type_p, we will have never called
union/intersect.  We will have bailed at some point earlier.  However, I
do see your point about being consistent.



VR_UNDEFINED can only be used for the ranges of objects we haven't
processed.  If we can't produce a range for an object because the
statement is something we don't handle or just doesn't produce anythign
useful, then the right result is VR_VARYING.

This may be worth commenting at the definition site for VR_*.
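
Something like this at the enum, perhaps (wording sketch only; member
order is illustrative and the remaining members stay as they are):

  enum value_range_kind
  {
    /* Nothing is known yet about the SSA name.  Only valid for names the
       propagator has not visited; never use it as a fallback for
       statements we cannot handle, because union_helper treats
       VR_UNDEFINED optimistically.  */
    VR_UNDEFINED,
    /* Range spans the entire domain.  This is the safe fallback when a
       statement is unhandled or produces nothing useful.  */
    VR_VARYING,
    /* ... remaining members unchanged ... */
  };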



I also noticed that Andrew's patch was setting num_vr_values to
num_ssa_names + num_ssa_names / 10.  I think he meant num_vr_values +
num_vr_values / 10.  Please verify the current incantation makes sense.

Going to assume this will be adjusted per the other messages in this thread.


Done.





diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 39ea22f0554..663dd6e2398 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -182,8 +182,10 @

[PATCH] PR libstdc++/91456 make INVOKE work with uncopyable prvalues

2019-08-15 Thread Jonathan Wakely

In C++17 a function can return a prvalue of a type that cannot be moved
or copied. The current implementation of std::is_invocable_r uses
std::is_convertible to test the conversion to R required by INVOKE.
That fails for non-copyable prvalues, because std::is_convertible is
defined in terms of std::declval which uses std::add_rvalue_reference.
In C++17 conversion from R to R involves no copies and so is not the
same as conversion from R&& to R.

This commit changes std::is_invocable_r to check the conversion without
using std::is_convertible.

std::function also contains a similar check using std::is_convertible,
which can be fixed by simply reusing std::is_invocable_r (but because
std::is_invocable_r is not defined for C++11 it uses the underlying
std::__is_invocable_impl trait directly).
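
A minimal example of the difference, with a made-up type name (the first
assertion holds in any mode, the second holds in C++17 with this fix):

  #include <type_traits>

  struct Immovable
  {
    Immovable() = default;
    Immovable(Immovable&&) = delete;   // not movable, so not copyable either
  };

  Immovable make();   // fine in C++17: the result is never moved or copied

  // is_convertible asks about declval<Immovable>(), i.e. Immovable&& -> Immovable:
  static_assert(!std::is_convertible<Immovable, Immovable>::value, "");

  // INVOKE<Immovable> only needs the prvalue to initialize the result:
  static_assert(std::is_invocable_r<Immovable, Immovable(*)()>::value, "");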

PR libstdc++/91456
* include/bits/std_function.h (__check_func_return_type): Remove.
(function::_Callable): Use std::__is_invocable_impl instead of
__check_func_return_type.
* include/std/type_traits (__is_invocable_impl): Add another defaulted
template parameter. Define a separate partial specialization for
INVOKE and INVOKE<R>. For INVOKE<R> replace is_convertible check
with a check that models delayed temporary materialization.
* testsuite/20_util/function/91456.cc: New test.
* testsuite/20_util/is_invocable/91456.cc: New test.

Tested x86_64-linux, committed to trunk.

I might backport this to gcc-9-branch too, after some time on trunk.

commit a9ae95a61b2d8b5ccbbaff1f9bd0b3ed70c600ed
Author: Jonathan Wakely 
Date:   Thu Aug 15 15:46:39 2019 +0100

PR libstdc++/91456 make INVOKE work with uncopyable prvalues

In C++17 a function can return a prvalue of a type that cannot be moved
or copied. The current implementation of std::is_invocable_r uses
std::is_convertible to test the conversion to R required by INVOKE.
That fails for non-copyable prvalues, because std::is_convertible is
defined in terms of std::declval which uses std::add_rvalue_reference.
In C++17 conversion from R to R involves no copies and so is not the
same as conversion from R&& to R.

This commit changes std::is_invocable_r to check the conversion without
using std::is_convertible.

std::function also contains a similar check using std::is_convertible,
which can be fixed by simply reusing std::is_invocable_r (but because
std::is_invocable_r is not defined for C++11 it uses the underlying
std::__is_invocable_impl trait directly).

PR libstdc++/91456
* include/bits/std_function.h (__check_func_return_type): Remove.
(function::_Callable): Use std::__is_invocable_impl instead of
__check_func_return_type.
* include/std/type_traits (__is_invocable_impl): Add another 
defaulted
template parameter. Define a separate partial specialization for
INVOKE and INVOKE<R>. For INVOKE<R> replace is_convertible check
with a check that models delayed temporary materialization.
* testsuite/20_util/function/91456.cc: New test.
* testsuite/20_util/is_invocable/91456.cc: New test.

diff --git a/libstdc++-v3/include/bits/std_function.h 
b/libstdc++-v3/include/bits/std_function.h
index 5733bf5f3f9..42f87873d55 100644
--- a/libstdc++-v3/include/bits/std_function.h
+++ b/libstdc++-v3/include/bits/std_function.h
@@ -293,10 +293,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 };
 
-  template
-using __check_func_return_type
-  = __or_, is_same<_From, _To>, is_convertible<_From, _To>>;
-
   /**
*  @brief Primary class template for std::function.
*  @ingroup functors
@@ -309,8 +305,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   private _Function_base
 {
   template::type>
-   struct _Callable : __check_func_return_type<_Res2, _Res> { };
+  typename _Res2 = __invoke_result<_Func&, _ArgTypes...>>
+   struct _Callable
+   : __is_invocable_impl<_Res2, _Res>::type
+   { };
 
   // Used so the return type convertibility checks aren't done when
   // performing overload resolution for copy construction/assignment.
diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index d3f853d4ce2..44db2cade5d 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2883,14 +2883,49 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // __is_invocable (std::is_invocable for C++11)
 
-  template
+  // The primary template is used for invalid INVOKE<R> expressions.
+  template::value, typename = void>
 struct __is_invocable_impl : false_type { };
 
+  // Used for valid INVOKE and INVOKE<void> expressions.
   template
-struct __is_invocable_impl<_Result, _Ret, __void_t>
-: __or_, is_convertible>::type
+struct __is_invocable_impl<_Result, _Ret,
+  /* is_void<_Ret> = */ true,
+ 

require trampolines for pr85044

2019-08-15 Thread Alexandre Oliva
Testcases that require support for trampolines should be marked as
such; gcc.target/i386/pr85044.c was missing it.  Fixed.

Tested on x86_64-linux-gnu.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.target/i386/pr85044.c: Require support for trampolines.
---
 gcc/testsuite/gcc.target/i386/pr85044.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/i386/pr85044.c 
b/gcc/testsuite/gcc.target/i386/pr85044.c
index a25cc7fe3252..02ef91d3dbbb 100644
--- a/gcc/testsuite/gcc.target/i386/pr85044.c
+++ b/gcc/testsuite/gcc.target/i386/pr85044.c
@@ -1,4 +1,5 @@
 /* { dg-do run { target cet } } */
+/* { dg-require-effective-target trampolines } */
 /* { dg-options "-O2 -fcf-protection=branch" } */
 
 void callme (void (*callback) (void));

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


[PATCH][gensupport] PR 91255: Do not error out immediately on set_attr_alternative with define_subst

2019-08-15 Thread Kyrill Tkachov

Hi all,

I'm trying to add a define_subst use in the arm backend but am getting 
many build errors complaining about:

`set_attr_alternative' is unsupported by `define_subst'

Looking at the gensupport.c code, it iterates over all define_insns and 
errors out if any of them has set_attr_alternative.


The use case I'm targeting doesn't involve patterns with 
set_attr_alternative, so I would like to make the define_subst handling
more robust and only error out if the define_subst is actually attempted 
on a set_attr_alternative.


This patch produces the error only if the attribute named in the 
set_attr_alternative matches the subst name.

This allows a build of the arm backend with a define_subst usage to succeed.

Bootstrapped and tested on arm-none-linux-gnueabihf and x86_64-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2019-08-15  Kyrylo Tkachov  

    PR other/91255
    * gensupport.c (has_subst_attribute): Error out on set_attr_alternative
    only if subst_name matches curr_attr string.

diff --git a/gcc/gensupport.c b/gcc/gensupport.c
index 1aab7119901..c64f683bc5c 100644
--- a/gcc/gensupport.c
+++ b/gcc/gensupport.c
@@ -788,9 +788,10 @@ has_subst_attribute (class queue_elem *elem, class queue_elem *subst_elem)
 	  return false;
 
 	case SET_ATTR_ALTERNATIVE:
-	  error_at (elem->loc,
-		"%s: `set_attr_alternative' is unsupported by "
-		"`define_subst'", XSTR (elem->data, 0));
+	  if (strcmp (XSTR (cur_attr, 0), subst_name) == 0)
+	error_at (elem->loc,
+		  "%s: `set_attr_alternative' is unsupported by "
+		  "`define_subst'", XSTR (elem->data, 0));
 	  return false;
 
 


[PATCH] Reapply missing patch for libsanitizer.

2019-08-15 Thread Martin Liška
Hi.

There's a forgotten patch for libsanitizer that was not listed
in LOCAL_PATCHES. I've just tested the patch on ppc64
(the gcc110 compile farm machine) and I'm going to install it.

Martin
>From 82662f97b6bacf21eee1185bc116aa22c0c89b33 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 15 Aug 2019 17:23:14 +0200
Subject: [PATCH] Reapply missing patch for libsanitizer.

libsanitizer/ChangeLog:

2019-08-15  Martin Liska  

	* tsan/tsan_rtl_ppc64.S: Reapply.
---
 libsanitizer/tsan/tsan_rtl_ppc64.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libsanitizer/tsan/tsan_rtl_ppc64.S b/libsanitizer/tsan/tsan_rtl_ppc64.S
index 8285e21aa1e..9e533a71a9c 100644
--- a/libsanitizer/tsan/tsan_rtl_ppc64.S
+++ b/libsanitizer/tsan/tsan_rtl_ppc64.S
@@ -1,5 +1,6 @@
 #include "tsan_ppc_regs.h"
 
+.machine altivec
 .section .text
 .hidden __tsan_setjmp
 .globl _setjmp
-- 
2.22.0



Re: [PATCH 3/9] operand_equal_p: add support for OBJ_TYPE_REF.

2019-08-15 Thread Jan Hubicka
> On Tue, Aug 6, 2019 at 5:43 PM Martin Liska  wrote:
> >
> >
> > gcc/ChangeLog:
> 
> +   /* Virtual table call.  */
> +   case OBJ_TYPE_REF:
> + {
> +   if (!operand_equal_p (OBJ_TYPE_REF_EXPR (arg0),
> + OBJ_TYPE_REF_EXPR (arg1), flags))
> + return false;
> +   if (virtual_method_call_p (arg0))
> + {
> +   if (tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg0))
> +   != tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg1)))
> + return false;
> +   if (!types_same_for_odr (obj_type_ref_class (arg0),
> +obj_type_ref_class (arg1)))
> + return false;
> +   if (!operand_equal_p (OBJ_TYPE_REF_OBJECT (arg0),
> + OBJ_TYPE_REF_OBJECT (arg1), flags))
> + return false;
> 
> this all gets deep into the devirt machinery, including looking at
> ODR type hashes.  So I'm not sure if we really want to handle
> it this "optimistic" in operand_equal_p and completely ignore
> other operands when !virtual_method_call_p?  That is, why
> not compare OBJ_TYPE_REF_TOKEN/OBJECT always at least?

For !virtual_method_call_p we do not use OBJ_TYPE_REF at all, yet the
Objective-C front end produces it.  I think we should remove them somewhere
during gimplification.  We can definitely turn "optimistic" into
"pessimistic" and return false here.

Otherwise the checks make sense to me - if the tests above pass, the devirt
machinery ought to give the same results.
> 
> Do we then have cases where the OBJ_TYPE_REF is actually
> distinct according to the remaining check?

I am not sure what you mean here?

Honza
> 
> + }
> 
> 
> > 2019-07-24  Martin Liska  
> >
> > * fold-const.c (operand_equal_p): Support OBJ_TYPE_REF.
> > * tree.c (add_expr): Hash parts of OBJ_TYPE_REF.
> > ---
> >  gcc/fold-const.c | 21 +
> >  gcc/tree.c   |  9 +
> >  2 files changed, 30 insertions(+)
> >


Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-15 Thread Bernd Edlinger
On 8/15/19 2:54 PM, Richard Biener wrote:
> On Thu, 15 Aug 2019, Bernd Edlinger wrote:
> 
>
> Hmm.  So your patch overrides user-alignment here.  Woudln't it
> be better to do that more conciously by
>
>   if (! DECL_USER_ALIGN (decl)
>   || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>   && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
>
>>
>> ? I don't know why that would be better?
>> If the value is underaligned no matter why, pretend it was declared as
>> naturally aligned if that causes wrong code otherwise.
>> That was the idea here.
> 
> It would be better because then we ignore it and use what we'd use
> by default rather than inventing sth new.  And your patch suggests
> it might be needed to up align even w/o DECL_USER_ALIGN.
> 

Hmmm, you mean the constant 1.0i should not have DECL_USER_ALIGN set?
But it inherits the alignment from the destination variable, apparently.

did you mean
if (! DECL_USER_ALIGN (decl)
&& align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
&& ...
?

I can give it a try.

> IMHO whatever code later fails to properly use unaligned loads
> should be fixed instead rather than ignoring user requested alignment.
>
> Can you quote a short testcase that explains what exactly goes wrong?
> The struct-layout ones are awkward to look at...
>

 Sure,

 $ cat test.c
 _Complex float __attribute__((aligned(1))) cf;

 void foo (void)
 {
   cf = 1.0i;
 }

 $ arm-linux-gnueabihf-gcc -S test.c 
 during RTL pass: expand
 test.c: In function 'foo':
 test.c:5:6: internal compiler error: in gen_movsf, at 
 config/arm/arm.md:7003
 5 |   cf = 1.0i;
   |   ~~~^~
 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
../../gcc-trunk/gcc/config/arm/arm.md:7003
 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
../../gcc-trunk/gcc/recog.h:318
 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
../../gcc-trunk/gcc/expr.c:3695
 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
../../gcc-trunk/gcc/expr.c:3791
 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
../../gcc-trunk/gcc/expr.c:3490
 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
../../gcc-trunk/gcc/expr.c:3791
 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
../../gcc-trunk/gcc/expr.c:5855
 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
../../gcc-trunk/gcc/expr.c:5441
>>>
>>> Huh, so why didn't it trigger
>>>
>>>   /* Handle misaligned stores.  */
>>>   mode = TYPE_MODE (TREE_TYPE (to));
>>>   if ((TREE_CODE (to) == MEM_REF
>>>|| TREE_CODE (to) == TARGET_MEM_REF)
>>>   && mode != BLKmode
>>>   && !mem_ref_refers_to_non_mem_p (to)
>>>   && ((align = get_object_alignment (to))
>>>   < GET_MODE_ALIGNMENT (mode))
>>>   && (((icode = optab_handler (movmisalign_optab, mode))
>>>!= CODE_FOR_nothing)
>>>   || targetm.slow_unaligned_access (mode, align)))
>>> {
>>>
>>> ?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
>>> var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl 0x2aad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.
>>>
>>> Ah, 'to' is a plain DECL here so the above handling is incomplete.
>>> IIRC component refs like __real cf = 0.f should be handled fine
>>> again(?).  So, does adding || DECL_P (to) fix the case as well?
>>>
>>
>> So I tried this instead of the varasm.c change:
>>
>> Index: expr.c
>> ===
>> --- expr.c   (revision 274487)
>> +++ expr.c   (working copy)
>> @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem
>>/* Handle misaligned stores.  */
>>mode = TYPE_MODE (TREE_TYPE (to));
>>if ((TREE_CODE (to) == MEM_REF
>> -   || TREE_CODE (to) == TARGET_MEM_REF)
>> +   || TREE_CODE (to) == TARGET_MEM_REF
>> +   || DECL_P (to))
>>&& mode != BLKmode
>> -  && !mem_ref_refers_to_non_mem_p (to)
>> +  && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
>>&& ((align = get_object_alignment (to))
>>< GET_MODE_ALIGNMENT (mode))
>>&& (((icode = optab_handler (movmisalign_optab, mode))
>>
>> Result, yes, it fixes this test case
>> but then I run all struct-layout-1.exp there are sill cases. where we have 
>> problems:
>>
>> In file included from 
>> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:^M
>> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:
>>  In function 'test2112':^M
>> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:23:10:
>>  internal compiler error: in gen_movdf, at config/arm/arm.md:7107^M
>> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:62:3:
>>  note: in definition of mac

Re: [PATCH] Prevent LTO section collision for a symbol name starting with '*'.

2019-08-15 Thread Jan Hubicka
> On Fri, Aug 9, 2019 at 3:57 PM Martin Liška  wrote:
> >
> > Hi.
> >
> > The patch is about prevention of LTO section name clashing.
> > Now we have a situation where body of 2 functions is streamed
> > into the same ELF section. Then we'll end up with smashed data.
> >
> > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >
> > Ready to be installed?
> 
> I think the comment should mention why we skip a leading '*'
> at all.  IIRC this is some target mangling applied to DECL_ASSEMBLER_NAME?
DECL_ASSEMBLER_NAME works in a way that if it starts with "*"
it is copied verbatim to the linker output.  If it is without "*",
then user_label_prefix is applied first; see
symbol_table::assembler_names_equal_p.

So if we skip "*", one can definitely construct testcases where different
function names end up in the same section, especially when
user_label_prefix is non-empty (on Windows I think it is "_"); see the
sketch below.
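
For illustration only (my sketch, not from the thread): a leading "*" comes
from an explicit assembler name, and with a non-empty user_label_prefix two
distinct output symbols can map to the same prefix-stripped name:

  /* DECL_ASSEMBLER_NAME is "foo"; with user_label_prefix "_" the symbol
     is emitted as "_foo".  */
  int foo (void) { return 1; }

  /* DECL_ASSEMBLER_NAME is "*foo"; emitted verbatim as "foo".  */
  int bar (void) __asm__ ("foo");
  int bar (void) { return 2; }

Skipping the leading "*" when forming the LTO section name would then give
both function bodies a section named after "foo", even though the emitted
symbols "_foo" and "foo" do not clash.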

> And section names cannot contain '*'?  Or do we need to actually
> identify '*fn' and 'fn' as the same?  For the testcase what is the clashing
> symbol?  Can't we have many so that using "0" always is broken as well?

We may have duplicate symbols during the compile-time->WPA streaming,
since we do not do lto-symtab at compile time and the user can use asm
names that way.  This is not limited to extern inlines, so it would be
nice to make this work reliably.  I also plan support for keeping
multiple function bodies defined for one symbol in cases where it is
necessary (glibc checking, mismatched optimization flags) for WPA->ltrans
streaming.

I was always considering the option of simply using node->order ids to
name the streamed sections.  Because of the way node->order is merged, we
are always able to recover the ID.

It is however kind of nice to see actual names in the objdump
of the LTO .o files.  I would not mind that much seeing this go, and it
would also save a bit of space since symbol names can be long.

Honza
> 
> Richard.
> 
> > Thanks,
> > Martin
> >
> > gcc/ChangeLog:
> >
> > 2019-08-09  Martin Liska  
> >
> > PR lto/91393
> > PR lto/88220
> > * lto-streamer.c (lto_get_section_name): Replace '*' leading
> > character with '0'.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2019-08-09  Martin Liska  
> >
> > PR lto/91393
> > PR lto/88220
> > * gcc.dg/lto/pr91393_0.c: New test.
> > ---
> >  gcc/lto-streamer.c   | 15 ---
> >  gcc/testsuite/gcc.dg/lto/pr91393_0.c | 11 +++
> >  2 files changed, 23 insertions(+), 3 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/lto/pr91393_0.c
> >
> >


Re: [PATCH] Make cdtor names stable for LTO (PR lto/91307).

2019-08-15 Thread Jan Hubicka
> On Thu, Aug 1, 2019 at 3:10 PM Martin Liška  wrote:
> >
> > Hi.
> >
> > In LTO WPA mode we don't have to append temp file name
> > to the global cdtor function names.
> 
> Is that true?  You can link with -r -flinker-output=rel and use
> multiple WPA phases whose results you then finally link.
> 
> So I don't think it's that easy.  You might be able to look at
> all_translation_units, pick the one with a sensible name
> (hmm, not sure if we actually have a name there) and the lowest
> UID and use that?  Thus, make the set of involved source files
> known and pick one.  Ah,
> 
> struct GTY(()) tree_translation_unit_decl {
>   struct tree_decl_common common;
>   /* Source language of this translation unit.  Used for DWARF output.  */
>   const char * GTY((skip(""))) language;
>   /* TODO: Non-optimization used to build this translation unit.  */
>   /* TODO: Root of a partial DWARF tree for global types and decls.  */
> };
> 
> so you might be able to get at a filename via the decls location,
> I'm not sure.  Do we have any LTO records per "input file" where we
> can stream main_input_filename to?

This is all a bit sloppy.  If you incrementally link into a .o file and
then use LTO again to add more code, you will very likely run into
conflicts with the .lto_priv clones as well, especially now that we have
made them more stable.

I wondered if the linker should not also provide us with a list of the
symbols that are used in the unit, so we can safely produce more local ones?

Honza
> 
> > It helps to have reproducible
> > builds with LTO mode.
> >
> > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >
> > Ready to be installed?
> > Thanks,
> > Martin
> >
> > gcc/ChangeLog:
> >
> > 2019-08-01  Martin Liska  
> >
> > PR lto/91307
> > * tree.c (get_file_function_name): Use "wpa" when
> > we are in WPA LTO mode.
> > ---
> >  gcc/tree.c | 17 +++--
> >  1 file changed, 11 insertions(+), 6 deletions(-)
> >
> >


Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-15 Thread Richard Biener
On Thu, 15 Aug 2019, Richard Biener wrote:

> On Thu, 15 Aug 2019, Bernd Edlinger wrote:
> > > 
> > > We can't subset an SSA_NAME.  I have really no idea what this intended
> > > to do...
> > > 
> > 
> > Nice, so would you do a patch to change that to a
> > gcc_checking_assert (TREE_CODE (tem) != SSA_NAME) ?
> > maybe with a small explanation?
> 
> I'll try.

So actually we can, via BIT_FIELD_REF <_1, ...>, and that _1 can end
up being expanded in memory.  See r233656, which brought this in.
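
A minimal sketch of such a case (my example, not taken from r233656): with
GCC's generic vector extension a BIT_FIELD_REF is applied directly to an
SSA name, and if that name is spilled, expansion sees a memory
subreference:

  typedef int v4si __attribute__ ((vector_size (16)));

  int
  f (v4si a, v4si b)
  {
    v4si c = a + b;   /* c_1 is an SSA_NAME.  */
    return c[1];      /* gimplified to BIT_FIELD_REF <c_1, 32, 32>.  */
  }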

Richard.


[libsanitizer, comitted] Fix PR bootstrap/91455

2019-08-15 Thread Iain Sandoe
If a target does not support libbacktrace, it might still need the include
path for $(top_srcdir).

Regenerate the built files using automake-1.15.1

bootstrapped on x86_64-darwin16, x86_64-linux-gnu and
powerpc64-linux-gnu (with a fix for pr90639 applied for this).
Iain

libsanitizer/

2019-08-15  Iain Sandoe  

PR bootstrap/91455
* Makefile.in: Regenerated.
* aclocal.m4: Likewise.
* asan/Makefile.in: Likewise.
* configure: Likewise.
* interception/Makefile.in: Likewise.
* libbacktrace/Makefile.in: Likewise.
* lsan/Makefile.in: Likewise.
* sanitizer_common/Makefile.am: Include top_srcdir unconditionally.
* sanitizer_common/Makefile.in: Regenerated.
* tsan/Makefile.in: Likewise.
* ubsan/Makefile.in: Likewise.

diff --git a/libsanitizer/sanitizer_common/Makefile.am 
b/libsanitizer/sanitizer_common/Makefile.am
index 7e8ce9476e..df9c294151 100644
--- a/libsanitizer/sanitizer_common/Makefile.am
+++ b/libsanitizer/sanitizer_common/Makefile.am
@@ -1,4 +1,4 @@
-AM_CPPFLAGS = -I $(top_srcdir)/include -isystem $(top_srcdir)/include/system
+AM_CPPFLAGS = -I $(top_srcdir)/include -I $(top_srcdir) -isystem 
$(top_srcdir)/include/system
 
 # May be used by toolexeclibdir.
 gcc_version := $(shell @get_gcc_base_ver@ $(top_srcdir)/../gcc/BASE-VER)
@@ -10,7 +10,6 @@ AM_CXXFLAGS += -std=gnu++11
 AM_CXXFLAGS += $(EXTRA_CXXFLAGS)
 if LIBBACKTRACE_SUPPORTED
 AM_CXXFLAGS += -DSANITIZER_LIBBACKTRACE -DSANITIZER_CP_DEMANGLE \
-  -I $(top_srcdir)/ \
   -I $(top_srcdir)/../libbacktrace \
   -I $(top_builddir)/libbacktrace \
   -I $(top_srcdir)/../include \



Re: [PATCH 2/9] operand_equal_p: add support for FIELD_DECL

2019-08-15 Thread Jan Hubicka
> On Tue, Aug 6, 2019 at 5:44 PM Martin Liska  wrote:
> >
> >
> > gcc/ChangeLog:
> 
> So I suppose this isn't to call operand_equal_p on two FIELD_DECLs
> but to make two COMPONENT_REFs "more equal"?  If so I then

Yes.  The patch originates from my original patchset, I believe, and it is
what ICF does.
> I suggest to make this change "local" to the COMPONENT_REF handling.
> This also interacts with path-based disambiguation so you want to make
> sure to only make things equal here iff it wouldn't change the outcome
> of path-based analysis.  Honza?

Indeed this can be handled as part of the COMPONENT_REF match.
The access path oracle here basically checks:
 1) that the MEM_REF type matches (we want a predicate for this)
 2) whether it finds a type match via same_type_for_tbaa, and then it
applies the assumption about disjointness or overlap

So I guess ideally we should

 1) do the matching part of COMPONENT_REF
 2) compare OFFSET, BIT_OFFSET
This establishes that the accesses have the same semantics.
 3) for -fno-strict-aliasing be happy
 4) for -fstrict-aliasing check whether the access path applies (we should
export the predicate from tree-ssa-alias as discussed earlier)
 5) compare types by same_type_for_tbaa_p (see the sketch below)
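
A hypothetical example of the situation (my own sketch, not from the patch):

  struct A { int count; int pad; };
  struct B { int count; int spare; };

  int f (struct A *a) { return a->count; }
  int g (struct B *b) { return b->count; }

The two COMPONENT_REFs use different FIELD_DECLs but have the same offset,
size and type, so ICF would like operand_equal_p to treat the bodies of f
and g as equal; under -fstrict-aliasing that is only safe if the access
path based disambiguation gives the same answer for both.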

Honza


Re: Remove TARGET_SETUP_INCOMING_VARARG_BOUNDS

2019-08-15 Thread Richard Biener
On Thu, Aug 15, 2019 at 3:30 PM Richard Sandiford
 wrote:
>
> TARGET_SETUP_INCOMING_VARARG_BOUNDS seems to be an unused vestige of the
> MPX support.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

> Richard
>
>
> 2019-08-15  Richard Sandiford  
>
> gcc/
> * target.def (setup_incoming_vararg_bounds): Remove.
> * doc/tm.texi (TARGET_SETUP_INCOMING_VARARG_BOUNDS): Remove.
> * doc/tm.texi.in: Regenerate.
> * targhooks.c (default_setup_incoming_vararg_bounds): Delete.
> * targhooks.h (default_setup_incoming_vararg_bounds): Likewise.
> * config/i386/i386.c (ix86_setup_incoming_vararg_bounds): Likewise.
> (TARGET_SETUP_INCOMING_VARARG_BOUNDS): Likewise.
>
> Index: gcc/target.def
> ===
> --- gcc/target.def  2019-08-13 22:33:30.929994105 +0100
> +++ gcc/target.def  2019-08-15 14:28:02.041548275 +0100
> @@ -4551,15 +4551,6 @@ returned by function call into @var{slot
>   default_store_returned_bounds)
>
>  DEFHOOK
> -(setup_incoming_vararg_bounds,
> - "Use it to store bounds for anonymous register arguments stored\n\
> -into the stack.  Arguments meaning is similar to\n\
> -@code{TARGET_SETUP_INCOMING_VARARGS}.",
> - void, (cumulative_args_t args_so_far, machine_mode mode, tree type,
> -   int *pretend_args_size, int second_time),
> - default_setup_incoming_vararg_bounds)
> -
> -DEFHOOK
>  (call_args,
>   "While generating RTL for a function call, this target hook is invoked 
> once\n\
>  for each argument passed to the function, either a register returned by\n\
> Index: gcc/doc/tm.texi
> ===
> --- gcc/doc/tm.texi 2019-08-13 22:33:30.801995048 +0100
> +++ gcc/doc/tm.texi 2019-08-15 14:28:02.041548275 +0100
> @@ -5314,12 +5314,6 @@ This hook is used by expand pass to emit
>  returned by function call into @var{slot}.
>  @end deftypefn
>
> -@deftypefn {Target Hook} void TARGET_SETUP_INCOMING_VARARG_BOUNDS 
> (cumulative_args_t @var{args_so_far}, machine_mode @var{mode}, tree 
> @var{type}, int *@var{pretend_args_size}, int @var{second_time})
> -Use it to store bounds for anonymous register arguments stored
> -into the stack.  Arguments meaning is similar to
> -@code{TARGET_SETUP_INCOMING_VARARGS}.
> -@end deftypefn
> -
>  @node Trampolines
>  @section Support for Nested Functions
>  @cindex support for nested functions
> Index: gcc/doc/tm.texi.in
> ===
> --- gcc/doc/tm.texi.in  2019-06-18 09:35:52.089892867 +0100
> +++ gcc/doc/tm.texi.in  2019-08-15 14:28:02.041548275 +0100
> @@ -3785,8 +3785,6 @@ These machine description macros help im
>
>  @hook TARGET_STORE_RETURNED_BOUNDS
>
> -@hook TARGET_SETUP_INCOMING_VARARG_BOUNDS
> -
>  @node Trampolines
>  @section Support for Nested Functions
>  @cindex support for nested functions
> Index: gcc/targhooks.c
> ===
> --- gcc/targhooks.c 2019-07-10 19:41:20.127948228 +0100
> +++ gcc/targhooks.c 2019-08-15 14:28:02.041548275 +0100
> @@ -2274,15 +2274,6 @@ std_gimplify_va_arg_expr (tree valist, t
>return build_va_arg_indirect_ref (addr);
>  }
>
> -void
> -default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE_UNUSED,
> - machine_mode mode ATTRIBUTE_UNUSED,
> - tree type ATTRIBUTE_UNUSED,
> - int *pretend_arg_size ATTRIBUTE_UNUSED,
> - int second_time ATTRIBUTE_UNUSED)
> -{
> -}
> -
>  /* An implementation of TARGET_CAN_USE_DOLOOP_P for targets that do
> not support nested low-overhead loops.  */
>
> Index: gcc/targhooks.h
> ===
> --- gcc/targhooks.h 2019-07-10 19:41:20.127948228 +0100
> +++ gcc/targhooks.h 2019-08-15 14:28:02.045548244 +0100
> @@ -265,11 +265,6 @@ extern rtx default_load_bounds_for_arg (
>  extern void default_store_bounds_for_arg (rtx, rtx, rtx, rtx);
>  extern rtx default_load_returned_bounds (rtx);
>  extern void default_store_returned_bounds (rtx,rtx);
> -extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca 
> ATTRIBUTE_UNUSED,
> - machine_mode mode 
> ATTRIBUTE_UNUSED,
> - tree type ATTRIBUTE_UNUSED,
> - int *pretend_arg_size 
> ATTRIBUTE_UNUSED,
> - int second_time 
> ATTRIBUTE_UNUSED);
>  extern bool default_optab_supported_p (int, machine_mode, machine_mode,
>optimization_type);
>  extern unsigned int default_max_noce_ifcvt_seq_cost (edge);
> Index: gcc/config/i386/i386.c
> =

Re: i386/asm-4 test: use amd64's natural addressing mode on all OSs

2019-08-15 Thread Alexandre Oliva
On Aug 15, 2019, Uros Bizjak  wrote:

> The immediate of lea is limited to +-2GB

... and we're talking about a code offset within a tiny translation
unit, with both reference and referenced address within the same
section.  It would be very surprising if the offset got up to 2KB, let
alone 2GB ;-)

The reason the testcase even mentions absolute addresses AFAICT is not
that it wishes to use them, it's that there's no portable way to use
PC-relative addresses on IA-32 (*), so the test gives up on platforms
that mandate PIC and goes with absolute addressing there.  I'm pretty
sure if it had a choice it would happily use PC-relative addressing,
even with a reasonably short range, if that was readily available...

(*) Something like 'call 1f; 1: popl %0; leal $foo-1b(%0), %0' is the
closest to widely available I'm aware of, but the '1f/1b' notation is
only available with GNU as AFAIK.
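
As a concrete sketch of that idiom (mine, untested; foo is assumed to be
defined elsewhere):

  void (*fn) (void);

  void
  take_addr (void)
  {
    /* Load the PC via a call/pop pair, then add foo's PC-relative offset;
       no absolute relocation against foo is needed, so it works under
       PIC, but the 1f/1b local labels are GNU as syntax.  */
    __asm ("call 1f; 1: popl %0; leal foo-1b(%0), %0" : "=r" (fn));
  }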

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


Remove TARGET_SETUP_INCOMING_VARARG_BOUNDS

2019-08-15 Thread Richard Sandiford
TARGET_SETUP_INCOMING_VARARG_BOUNDS seems to be an unused vestige of the
MPX support.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-08-15  Richard Sandiford  

gcc/
* target.def (setup_incoming_vararg_bounds): Remove.
* doc/tm.texi (TARGET_SETUP_INCOMING_VARARG_BOUNDS): Remove.
* doc/tm.texi.in: Regenerate.
* targhooks.c (default_setup_incoming_vararg_bounds): Delete.
* targhooks.h (default_setup_incoming_vararg_bounds): Likewise.
* config/i386/i386.c (ix86_setup_incoming_vararg_bounds): Likewise.
(TARGET_SETUP_INCOMING_VARARG_BOUNDS): Likewise.

Index: gcc/target.def
===
--- gcc/target.def  2019-08-13 22:33:30.929994105 +0100
+++ gcc/target.def  2019-08-15 14:28:02.041548275 +0100
@@ -4551,15 +4551,6 @@ returned by function call into @var{slot
  default_store_returned_bounds)
 
 DEFHOOK
-(setup_incoming_vararg_bounds,
- "Use it to store bounds for anonymous register arguments stored\n\
-into the stack.  Arguments meaning is similar to\n\
-@code{TARGET_SETUP_INCOMING_VARARGS}.",
- void, (cumulative_args_t args_so_far, machine_mode mode, tree type,
-   int *pretend_args_size, int second_time),
- default_setup_incoming_vararg_bounds)
-
-DEFHOOK
 (call_args,
  "While generating RTL for a function call, this target hook is invoked once\n\
 for each argument passed to the function, either a register returned by\n\
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi 2019-08-13 22:33:30.801995048 +0100
+++ gcc/doc/tm.texi 2019-08-15 14:28:02.041548275 +0100
@@ -5314,12 +5314,6 @@ This hook is used by expand pass to emit
 returned by function call into @var{slot}.
 @end deftypefn
 
-@deftypefn {Target Hook} void TARGET_SETUP_INCOMING_VARARG_BOUNDS 
(cumulative_args_t @var{args_so_far}, machine_mode @var{mode}, tree @var{type}, 
int *@var{pretend_args_size}, int @var{second_time})
-Use it to store bounds for anonymous register arguments stored
-into the stack.  Arguments meaning is similar to
-@code{TARGET_SETUP_INCOMING_VARARGS}.
-@end deftypefn
-
 @node Trampolines
 @section Support for Nested Functions
 @cindex support for nested functions
Index: gcc/doc/tm.texi.in
===
--- gcc/doc/tm.texi.in  2019-06-18 09:35:52.089892867 +0100
+++ gcc/doc/tm.texi.in  2019-08-15 14:28:02.041548275 +0100
@@ -3785,8 +3785,6 @@ These machine description macros help im
 
 @hook TARGET_STORE_RETURNED_BOUNDS
 
-@hook TARGET_SETUP_INCOMING_VARARG_BOUNDS
-
 @node Trampolines
 @section Support for Nested Functions
 @cindex support for nested functions
Index: gcc/targhooks.c
===
--- gcc/targhooks.c 2019-07-10 19:41:20.127948228 +0100
+++ gcc/targhooks.c 2019-08-15 14:28:02.041548275 +0100
@@ -2274,15 +2274,6 @@ std_gimplify_va_arg_expr (tree valist, t
   return build_va_arg_indirect_ref (addr);
 }
 
-void
-default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE_UNUSED,
- machine_mode mode ATTRIBUTE_UNUSED,
- tree type ATTRIBUTE_UNUSED,
- int *pretend_arg_size ATTRIBUTE_UNUSED,
- int second_time ATTRIBUTE_UNUSED)
-{
-}
-
 /* An implementation of TARGET_CAN_USE_DOLOOP_P for targets that do
not support nested low-overhead loops.  */
 
Index: gcc/targhooks.h
===
--- gcc/targhooks.h 2019-07-10 19:41:20.127948228 +0100
+++ gcc/targhooks.h 2019-08-15 14:28:02.045548244 +0100
@@ -265,11 +265,6 @@ extern rtx default_load_bounds_for_arg (
 extern void default_store_bounds_for_arg (rtx, rtx, rtx, rtx);
 extern rtx default_load_returned_bounds (rtx);
 extern void default_store_returned_bounds (rtx,rtx);
-extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca 
ATTRIBUTE_UNUSED,
- machine_mode mode 
ATTRIBUTE_UNUSED,
- tree type ATTRIBUTE_UNUSED,
- int *pretend_arg_size 
ATTRIBUTE_UNUSED,
- int second_time 
ATTRIBUTE_UNUSED);
 extern bool default_optab_supported_p (int, machine_mode, machine_mode,
   optimization_type);
 extern unsigned int default_max_noce_ifcvt_seq_cost (edge);
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  2019-08-13 22:35:11.737252196 +0100
+++ gcc/config/i386/i386.c  2019-08-15 14:28:02.037548302 +0100
@@ -4126,34 +4126,6 @@ ix86_setup_incoming_varargs (cumulative_
 setup_incoming_varargs_64 (&next_cum);
 }
 

Re: Add ARRAY_REF based access patch disambiguation

2019-08-15 Thread Jan Hubicka
Hi,
here is the updated version.
> > + /* We generally assume that both access paths starts by same sequence
> > +of refs.  However if number of array refs is not in sync, try
> > +to recover and pop elts until number match.  This helps the case
> > +where one access path starts by array and other by element.  */
> 
> I think part of the confusion is that we're doing this in the outer
> while loop, so "starts" applies to all sub-paths we consider?
> 
> Or only to the innermost?
> 
> So - why's this inside the loop?  Some actual access path pair
> examples in the comment would help.  And definitely more testcases
> since the single one you add is too simplistic to need all this code ;)

I have added a testcase.  Basically we hope the chain of array refs ends
in the same type, and in that case we want to peel off the outermost one.
You are right that it does not always work, especially if the innermost
reference is an array (which is rare).

I suppose one can do type matching once we actually have
same_type_for_tbaa_p working on ARRAY_REF.

I have added a testcase; if you would prefer to move the logic out of the
walking loop, I can do that.  I can also just drop it and handle this
later.

I put it into the inner loop basically to increase the chances that we
successfully walk access paths of different types in the situation where
-fno-strict-aliasing is used and the type sequences are not fully
compatible.

I plan to put some love into -fno-strict-aliasing incrementally.

This patch adds a testcase for access paths of different lengths and
fixes the other issues discussed.  It is another testcase where fre1
seems to give up and fre3 is needed.

Before fre1 we get:
test (int i, int j)
{
  int[10][10] * innerptr;
  int[10][10][10] * barptr.0_1;
  int[10][10][10] * barptr.1_2;
  int _9;

   :
  innerptr_4 = barptr;
  barptr.0_1 = barptr;
  (*barptr.0_1)[i_5(D)][2][j_6(D)] = 10;
  (*innerptr_4)[3][j_6(D)] = 11;
  barptr.1_2 = barptr;
  _9 = (*barptr.1_2)[i_5(D)][2][j_6(D)];
  return _9;

}

that is optimized to:
test (int i, int j)
{
  int[10][10] * innerptr;
  int _9;

   :
  innerptr_4 = barptr;
  MEM[(int[10][10][10] *)innerptr_4][i_5(D)][2][j_6(D)] = 10;
  (*innerptr_4)[3][j_6(D)] = 11;
  _9 = MEM[(int[10][10][10] *)innerptr_4][i_5(D)][2][j_6(D)];
  return _9;

}

before fre3 we get:
test (int i, int j)
{
  int[10][10] * innerptr;
  int _7;

   [local count: 1073741824]:
  innerptr_2 = barptr;
  MEM[(int[10][10][10] *)innerptr_2][i_3(D)][2][j_4(D)] = 10;
  (*innerptr_2)[3][j_4(D)] = 11;
  _7 = MEM[(int[10][10][10] *)innerptr_2][i_3(D)][2][j_4(D)];
  return _7;

}
and fre3 does:
test (int i, int j)
{
  int[10][10] * innerptr;

   [local count: 1073741824]:
  innerptr_2 = barptr;
  MEM[(int[10][10][10] *)innerptr_2][i_3(D)][2][j_4(D)] = 10;
  (*innerptr_2)[3][j_4(D)] = 11;
  return 10;

}

Honza

* tree-ssa-alias.c (nonoverlapping_component_refs_since_match_p):
Rename to ...
(nonoverlapping_refs_since_match_p): ... this; handle also
ARRAY_REFs.
(alias_stats): Update stats.
(dump_alias_stats): Likewise.
(cheap_array_ref_low_bound): New function.
(aliasing_matching_component_refs_p): Add partial_overlap
argument;
pass it to nonoverlapping_refs_since_match_p.
(aliasing_component_refs_walk): Update call of
aliasing_matching_component_refs_p
(nonoverlapping_array_refs_p): New function.
(decl_refs_may_alias_p, indirect_ref_may_alias_decl_p,
indirect_refs_may_alias_p): Update calls of
nonoverlapping_refs_since_match_p.
* gcc.dg/tree-ssa/alias-access-path-10.c: New testcase.

Index: testsuite/gcc.dg/tree-ssa/alias-access-path-10.c
===
--- testsuite/gcc.dg/tree-ssa/alias-access-path-10.c(nonexistent)
+++ testsuite/gcc.dg/tree-ssa/alias-access-path-10.c(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+struct a {int array[3];} a[10];
+int
+test(int i,int j)
+{
+  a[i].array[1]=123;
+  a[j].array[2]=2;
+  return a[i].array[1];
+}
+/* { dg-final { scan-tree-dump-times "return 123" 1 "fre1"} } */
Index: testsuite/gcc.dg/tree-ssa/alias-access-path-11.c
===
--- testsuite/gcc.dg/tree-ssa/alias-access-path-11.c(nonexistent)
+++ testsuite/gcc.dg/tree-ssa/alias-access-path-11.c(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-strict-aliasing -fdump-tree-fre3" } */
+typedef int outerarray[10][10][10];
+typedef int innerarray[10][10];
+outerarray *barptr;
+
+int
+test(int i,int j)
+{
+  innerarray *innerptr = (innerarray *)barptr;
+  (*barptr)[i][2][j]=10;;
+  (*innerptr)[3][j]=11;
+  return (*barptr)[i][2][j];
+}
+/* { dg-final { scan-tree-dump-times "return 10" 1 "fre3"} } */
Index: tree-ssa-alias.c
===
--- tree-ssa-a

Re: PR90724 - ICE with __sync_bool_compare_and_swap with -march=armv8.2-a

2019-08-15 Thread Prathamesh Kulkarni
On Thu, 8 Aug 2019 at 11:22, Prathamesh Kulkarni
 wrote:
>
> On Thu, 1 Aug 2019 at 15:34, Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 25 Jul 2019 at 11:56, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Wed, 17 Jul 2019 at 18:15, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Wed, 17 Jul 2019 at 13:45, Kyrill Tkachov
> > > >  wrote:
> > > > >
> > > > > Hi Prathamesh
> > > > >
> > > > > On 7/10/19 12:24 PM, Prathamesh Kulkarni wrote:
> > > > > > Hi,
> > > > > > For following test-case,
> > > > > > static long long AL[24];
> > > > > >
> > > > > > int
> > > > > > check_ok (void)
> > > > > > {
> > > > > >   return (__sync_bool_compare_and_swap (AL+1, 0x20003ll,
> > > > > > 0x1234567890ll));
> > > > > > }
> > > > > >
> > > > > > Compiling with -O2 -march=armv8.2-a results in:
> > > > > > pr90724.c: In function ‘check_ok’:
> > > > > > pr90724.c:7:1: error: unrecognizable insn:
> > > > > > 7 | }
> > > > > >   | ^
> > > > > > (insn 11 10 12 2 (set (reg:CC 66 cc)
> > > > > > (compare:CC (reg:DI 95)
> > > > > > (const_int 8589934595 [0x20003]))) "pr90724.c":6:11 
> > > > > > -1
> > > > > >  (nil))
> > > > > >
> > > > > > IIUC, the issue is that 0x20003 falls outside the range of
> > > > > > allowable immediate in cmp ? If it's replaced by a small constant 
> > > > > > then
> > > > > > it works.
> > > > > >
> > > > > > The ICE results with -march=armv8.2-a because, we enter if
> > > > > > (TARGET_LSE) { ... } condition
> > > > > > in aarch64_expand_compare_and_swap, while with -march=armv8.a it 
> > > > > > goes
> > > > > > into else,
> > > > > > which forces oldval into register if the predicate fails to match.
> > > > > >
> > > > > > The attached patch checks if y (oldval) satisfies 
> > > > > > aarch64_plus_operand
> > > > > > predicate and if not, forces it to be in register, which resolves 
> > > > > > ICE.
> > > > > > Does it look OK ?
> > > > > >
> > > > > > Bootstrap+testing in progress on aarch64-linux-gnu.
> > > > > >
> > > > > > PS: The issue has nothing to do with SVE, which I incorrectly
> > > > > > mentioned in bug report.
> > > > > >
> > > > > This looks ok to me (but you'll need maintainer approval).
> > > > >
> > > > > Does this fail on the branches as well?
> > > > Hi Kyrill,
> > > > Thanks for the review. The test also fails on gcc-9-branch (but not on 
> > > > gcc-8).
> > > Hi James,
> > > Is the patch OK to commit  ?
> > > https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html
> > ping * 3: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html
> ping * 4: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html
ping * 5: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Kyrill
> > > > >
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh


[committed][MSP430] Fix non-GNU style in gcc/config/msp430/*{c,h} files

2019-08-15 Thread Jozef Lawrynowicz
The attached committed patches fix various GNU style violations in the msp430
backend.

The fixed problems include:
- Incorrect indentation
- Whitespace before left square bracket
- 8 spaces used instead of tab
- Whitespace before closing parenthesis
- Lines more than 80 characters long

Successfully regtested for msp430-elf.
Committed on trunk as obvious.
>From 1298708ccd5da89dca906287dad0afb57d00c08b Mon Sep 17 00:00:00 2001
From: jozefl 
Date: Thu, 15 Aug 2019 12:55:33 +
Subject: [PATCH 1/2] 2019-08-15  Jozef Lawrynowicz  

	MSP430: Fix whitespace errors and incorrect indentation in
	config/msp430/*.{c,h} files

	* config/msp430/driver-msp430.c (msp430_select_cpu): Fix indentation.
	(msp430_select_hwmult_lib): Likewise.
	* config/msp430/msp430-devices.c (parse_devices_csv_1): Likewise.
	(msp430_extract_mcu_data): Likewise.
	(struct t_msp430_mcu_data): Likewise.
	* config/msp430/msp430.c (struct machine_function): Remove whitespace
	before left square bracket.
	(msp430_option_override): Fix indentation.
	(msp430_hard_regno_nregs_with_padding): Likewise.
	(msp430_initial_elimination_offset): Likewise.
	(msp430_special_register_convention_p): Remove whitespace before left
	square bracket and after exclamation mark.
	(msp430_evaluate_arg): Likewise.
	(msp430_callee_copies): Fix indentation.
	(msp430_gimplify_va_arg_expr): Likewise.
	(msp430_function_arg_advance): Remove whitespace before left square
	bracket.
	(reg_ok_for_addr): Likewise.
	(msp430_preserve_reg_p): Likewise.
	(msp430_compute_frame_info): Likewise.
	(msp430_asm_output_addr_const_extra): Add space between function name
	and open parenthesis.
	(has_section_name): Fix indentation.
	(msp430_attr): Remove trailing whitespace.
	(msp430_section_attr): Likewise.
	(msp430_data_attr): Likewise.
	(struct msp430_attribute_table): Fix comment and whitespace.
	(msp430_start_function): Remove whitespace before left square bracket.
	Add space between function name and open parenthesis.
	(msp430_select_section): Remove trailing whitespace.
	(msp430_section_type_flags): Remove trailing whitespace.
	(msp430_unique_section): Remove space before closing parenthesis.
	(msp430_output_aligned_decl_common): Change 8 spaces to a tab.
	(msp430_builtins): Remove whitespace before left square bracket.
	(msp430_init_builtins):	Fix indentation.
	(msp430_expand_prologue): Remove whitespace before left square bracket.
	Remove space before closing parenthesis.
	(msp430_expand_epilogue): Remove whitespace before left square bracket.
	(msp430_split_movsi): Remove space before closing parenthesis.
	(helper_function_name_mappings): Fix indentation.
	(msp430_use_f5_series_hwmult): Fix whitespace.
	(use_32bit_hwmult): Likewise.
	(msp430_no_hwmult): Likewise.
	(msp430_output_labelref): Remove whitespace before left square bracket.
	(msp430_print_operand_raw): Likewise.
	(msp430_print_operand_addr): Likewise.
	(msp430_print_operand): Add two spaces after '.' in comment.
	Fix trailing whitespace.
	(msp430x_extendhisi): Fix indentation.
	* config/msp430/msp430.h (TARGET_CPU_CPP_BUILTINS): Change 8 spaces to
	tab.
	(PC_REGNUM): Likewise.
	(STACK_POINTER_REGNUM): Likewise.
	(CC_REGNUM): Likewise.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@274536 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  |   60 ++
 gcc/config/msp430/driver-msp430.c  |   84 +-
 gcc/config/msp430/msp430-devices.c | 1234 ++--
 gcc/config/msp430/msp430.c |  401 +
 gcc/config/msp430/msp430.h |   18 +-
 5 files changed, 927 insertions(+), 870 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 458caa5a89e..86cfa99cf5c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,63 @@
+2019-08-15  Jozef Lawrynowicz  
+
+	MSP430: Fix whitespace errors and incorrect indentation in
+	config/msp430/*.{c,h} files
+
+	* config/msp430/driver-msp430.c (msp430_select_cpu): Fix indentation.
+	(msp430_select_hwmult_lib): Likewise.
+	* config/msp430/msp430-devices.c (parse_devices_csv_1): Likewise.
+	(msp430_extract_mcu_data): Likewise.
+	(struct t_msp430_mcu_data): Likewise.
+	* config/msp430/msp430.c (struct machine_function): Remove whitespace
+	before left square bracket.
+	(msp430_option_override): Fix indentation.
+	(msp430_hard_regno_nregs_with_padding): Likewise.
+	(msp430_initial_elimination_offset): Likewise.
+	(msp430_special_register_convention_p): Remove whitespace before left
+	square bracket and after exclamation mark.
+	(msp430_evaluate_arg): Likewise.
+	(msp430_callee_copies): Fix indentation.
+	(msp430_gimplify_va_arg_expr): Likewise.
+	(msp430_function_arg_advance): Remove whitespace before left square
+	bracket.
+	(reg_ok_for_addr): Likewise.
+	(msp430_preserve_reg_p): Likewise.
+	(msp430_compute_frame_info): Likewise.
+	(msp430_asm_output_addr_const_extra): Add space between function name
+	and open parenthesis.
+	(has_section_name): Fix indentation.
+	(msp430_attr): Remove trailing whitespace.
+	(msp430_section_attr): 

Re: i386/asm-4 test: use amd64's natural addressing mode on all OSs

2019-08-15 Thread Uros Bizjak
On Thu, Aug 15, 2019 at 3:01 PM Alexandre Oliva  wrote:
>
> On Aug 15, 2019, Uros Bizjak  wrote:
>
> > On Thu, Aug 15, 2019 at 1:39 PM Alexandre Oliva  wrote:
>
> >> If we just use the best-suited way to
> >> take the address of a function behind the compiler's back on each
> >> target variant, we're less likely to hit unexpected failures.
>
> > Perhaps we should use true absolute address:
>
> But why?  Using absolute addresses is not relevant at all to the test.
> We just need some means to take the address of foo without the
> compiler's realizing we're doing so.

The immediate of lea is limited to +-2GB, so we have in effect
-mcmodel=medium. Using movabs would be OK for all code models.

Uros.


Re: types for VR_VARYING

2019-08-15 Thread Aldy Hernandez




On 8/15/19 7:23 AM, Richard Biener wrote:

On Thu, Aug 15, 2019 at 12:40 PM Aldy Hernandez  wrote:


On 8/14/19 1:37 PM, Jeff Law wrote:

On 8/13/19 6:39 PM, Aldy Hernandez wrote:



On 8/12/19 7:46 PM, Jeff Law wrote:

On 8/12/19 12:43 PM, Aldy Hernandez wrote:

This is a fresh re-post of:

https://gcc.gnu.org/ml/gcc-patches/2019-07/msg6.html

Andrew gave me some feedback a week ago, and I obviously don't remember
what it was because I was about to leave on PTO.  However, I do remember
I addressed his concerns before getting drunk on rum in tropical islands.


FWIW found a great coffee infused rum while in Kauai last week.  I'm not
a coffee fan, but it was wonderful.  The one bottle we brought back
isn't going to last until Cauldron and I don't think I can get a special
order filled before I leave :(


You must bring some to Cauldron before we believe you. :)

That's the problem.  The nearest place I can get it is in Vegas and
there's no distributor in Montreal.   I can special order it in our
state run stores, but it won't be here in time.

Of course, I don't mind if you don't believe me.  More for me in that
case...



Is the supports_type_p stuff there to placate the calls from ipa-cp?  I
can live with it in the short term, but it really feels like there
should be something in the ipa-cp client that avoids this silliness.


I am not happy with this either, but there are various places where
statements that are !stmt_interesting_for_vrp() are still setting a
range of VARYING, which is then being ignored at a later time.

For example, vrp_initialize:

if (!stmt_interesting_for_vrp (phi))
  {
tree lhs = PHI_RESULT (phi);
set_def_to_varying (lhs);
prop_set_simulate_again (phi, false);
  }

Also in evrp_range_analyzer::record_ranges_from_stmt(), where we if the
statement is interesting for VRP but extract_range_from_stmt() does not
produce a useful range, we also set a varying for a range we will never
use.  Similarly for a statement that is not interesting in this hunk.

Ugh.  One could perhaps argue that setting any kind of range in these
circumstances is silly.   But I suspect it's necessary due to the
optimistic handling of VR_UNDEFINED in value_range_base::union_helper.
It's all coming back to me now...




Then there is vrp_prop::visit_stmt() where we also set VARYING for types
that VRP will never handle:

case IFN_ADD_OVERFLOW:
case IFN_SUB_OVERFLOW:
case IFN_MUL_OVERFLOW:
case IFN_ATOMIC_COMPARE_EXCHANGE:
  /* These internal calls return _Complex integer type,
 which VRP does not track, but the immediate uses
 thereof might be interesting.  */
  if (lhs && TREE_CODE (lhs) == SSA_NAME)
{
  imm_use_iterator iter;
  use_operand_p use_p;
  enum ssa_prop_result res = SSA_PROP_VARYING;

  set_def_to_varying (lhs);

I've adjusted the patch so that set_def_to_varying will set the range to
VR_UNDEFINED if !supports_type_p.  This is a fail safe, as we can't
really do anything with a nonsensical range.  I just don't want to leave
the range in an indeterminate state.


I think VR_UNDEFINED is unsafe due to value_range_base::union_helper.
And that's a more general than this patch.  VR_UNDEFINED is _not_ a safe
range to set something to if we can't handle it.  We have to use VR_VARYING.

Why?  See the beginning of value_range_base::union_helper:

 /* VR0 has the resulting range if VR1 is undefined or VR0 is varying.  */
 if (vr1->undefined_p ()
 || vr0->varying_p ())
   return *vr0;

 /* VR1 has the resulting range if VR0 is undefined or VR1 is varying.  */
 if (vr0->undefined_p ()
 || vr1->varying_p ())
   return *vr1;
This can get called for something like

a = <cond> ? name1 : name2;

If name1 was set to VR_UNDEFINED thinking that VR_UNDEFINED was a safe
value for something we can't handle, then we'll incorrectly return the
range for name2.


I think that if name1 was !supports_type_p, we will have never called
union/intersect.  We will have bailed at some point earlier.  However, I
do see your point about being consistent.



VR_UNDEFINED can only be used for the ranges of objects we haven't
processed.  If we can't produce a range for an object because the
statement is something we don't handle or just doesn't produce anything
useful, then the right result is VR_VARYING.
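
A tiny worked example of the hazard (mine, not from the thread), given the
optimistic rule that undefined unioned with x yields x:

  name1: VR_UNDEFINED   /* wrongly used for "statement we can't handle" */
  name2: [0, 10]
  union (name1, name2)  => [0, 10]   /* wrong: name1's def could be anything,
                                        so the result must be VARYING */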

This may be worth commenting at the definition site for VR_*.



I also noticed that Andrew's patch was setting num_vr_values to
num_ssa_names + num_ssa_names / 10.  I think he meant num_vr_values +
num_vr_values / 10.  Please verify the current incantation makes sense.

Going to assume this will be adjusted per the other messages in this thread.


Done.





diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 39ea22f0554..663dd6e2398 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -182,8 +182,10 

Re: i386/asm-4 test: use amd64's natural addressing mode on all OSs

2019-08-15 Thread Alexandre Oliva
On Aug 15, 2019, Uros Bizjak  wrote:

> On Thu, Aug 15, 2019 at 1:39 PM Alexandre Oliva  wrote:

>> If we just use the best-suited way to
>> take the address of a function behind the compiler's back on each
>> target variant, we're less likely to hit unexpected failures.

> Perhaps we should use true absolute address:

But why?  Using absolute addresses is not relevant at all to the test.
We just need some means to take the address of foo without the
compiler's realizing we're doing so.

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-15 Thread Richard Biener
On Thu, 15 Aug 2019, Bernd Edlinger wrote:

> On 8/15/19 10:55 AM, Richard Biener wrote:
> > On Wed, 14 Aug 2019, Bernd Edlinger wrote:
> > 
> >> On 8/14/19 2:00 PM, Richard Biener wrote:
> >>
> >> Well, yes, but I was scared away by the complexity of emit_move_insn_1.
> >>
> >> It could be done, but in the moment I would be happy to have these
> >> checks of one major strict alignment target, ARM is a good candidate
> >> since most instructions work even if they are accidentally
> >> using unaligned arguments.  So middle-end errors are not always
> >> visible by ordinary tests.  Nevertheless it is a blatant violation of the
> >> contract between middle-end and back-end, which should be avoided.
> > 
> > Fair enough.
> > 
>  Several struct-layout-1.dg testcase tripped over misaligned
>  complex_cst constants, fixed by varasm.c (align_variable).
>  This is likely a wrong code bug, because misaligned complex
>  constants, are expanded to misaligned MEM_REF, but the
>  expansion cannot handle misaligned constants, only packed
>  structure fields.
> >>>
> >>> Hmm.  So your patch overrides user-alignment here.  Wouldn't it
> >>> be better to do that more consciously by
> >>>
> >>>   if (! DECL_USER_ALIGN (decl)
> >>>   || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
> >>>   && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
> >>>
> 
> ? I don't know why that would be better?
> If the value is underaligned no matter why, pretend it was declared as
> naturally aligned if that causes wrong code otherwise.
> That was the idea here.

It would be better because then we ignore it and use what we'd use
by default rather than inventing sth new.  And your patch suggests
it might be needed to up align even w/o DECL_USER_ALIGN.

> >>> ?  And why is the movmisalign optab support missing here?
> >>>
> >>
> >> Yes, I wanted to replicate what we have in assign_parm_adjust_stack_rtl:
> >>
> >>   /* If we can't trust the parm stack slot to be aligned enough for its
> >>  ultimate type, don't use that slot after entry.  We'll make another
> >>  stack slot, if we need one.  */
> >>   if (stack_parm
> >>   && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
> >>&& targetm.slow_unaligned_access (data->nominal_mode,
> >>  MEM_ALIGN (stack_parm)))
> >>
> >> which also makes a variable more aligned than it is declared.
> >> But maybe both should also check the movmisalign optab in
> >> addition to slow_unaligned_access ?
> > 
> > Quite possible.
> > 
> 
> Will do, see attached new version of the patch.
> 
> >>> IMHO whatever code later fails to properly use unaligned loads
> >>> should be fixed instead rather than ignoring user requested alignment.
> >>>
> >>> Can you quote a short testcase that explains what exactly goes wrong?
> >>> The struct-layout ones are awkward to look at...
> >>>
> >>
> >> Sure,
> >>
> >> $ cat test.c
> >> _Complex float __attribute__((aligned(1))) cf;
> >>
> >> void foo (void)
> >> {
> >>   cf = 1.0i;
> >> }
> >>
> >> $ arm-linux-gnueabihf-gcc -S test.c 
> >> during RTL pass: expand
> >> test.c: In function 'foo':
> >> test.c:5:6: internal compiler error: in gen_movsf, at 
> >> config/arm/arm.md:7003
> >> 5 |   cf = 1.0i;
> >>   |   ~~~^~
> >> 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
> >>../../gcc-trunk/gcc/config/arm/arm.md:7003
> >> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
> >>../../gcc-trunk/gcc/recog.h:318
> >> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
> >>../../gcc-trunk/gcc/expr.c:3695
> >> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
> >>../../gcc-trunk/gcc/expr.c:3791
> >> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
> >>../../gcc-trunk/gcc/expr.c:3490
> >> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
> >>../../gcc-trunk/gcc/expr.c:3791
> >> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
> >>../../gcc-trunk/gcc/expr.c:5855
> >> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
> >>../../gcc-trunk/gcc/expr.c:5441
> > 
> > Huh, so why didn't it trigger
> > 
> >   /* Handle misaligned stores.  */
> >   mode = TYPE_MODE (TREE_TYPE (to));
> >   if ((TREE_CODE (to) == MEM_REF
> >|| TREE_CODE (to) == TARGET_MEM_REF)
> >   && mode != BLKmode
> >   && !mem_ref_refers_to_non_mem_p (to)
> >   && ((align = get_object_alignment (to))
> >   < GET_MODE_ALIGNMENT (mode))
> >   && (((icode = optab_handler (movmisalign_optab, mode))
> >!= CODE_FOR_nothing)
> >   || targetm.slow_unaligned_access (mode, align)))
> > {
> > 
> > ?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
> > var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl 0x2aad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.
> > 
> > Ah, 'to' is a plain DECL here so the above handling is incomplete.
> > IIRC component refs like __real cf = 0.f should be handled fi

Re: i386/asm-4 test: use amd64's natural addressing mode on all OSs

2019-08-15 Thread Uros Bizjak
On Thu, Aug 15, 2019 at 1:39 PM Alexandre Oliva  wrote:
>
> gcc.target/i386/asm-4.c uses amd64's natural PC-relative addressing
> mode on a single platform, using the 32-bit absolute addressing mode
> elsewhere.  There's no point in giving up amd64's natural addressing
> mode and insisting on the 32-bit one when we're targeting amd64, and
> having to make explicit exceptions for systems where that's found not
> to work for whatever reason.  If we just use the best-suited way to
> take the address of a function behind the compiler's back on each
> target variant, we're less likely to hit unexpected failures.
>
> Tested on x86_64-linux-gnu with unix{,-m32}.  Ok to install?

Perhaps we should use true absolute address:

--cut here--
Index: asm-4.c
===
--- asm-4.c (revision 274504)
+++ asm-4.c (working copy)
@@ -27,12 +27,10 @@
 void
 baz (void)
 {
-  /* Darwin loads 64-bit regions above the 4GB boundary so
- we need to use this instead.  */
-#if defined (__LP64__) && defined (__MACH__)
-  __asm ("leaq foo(%%rip), %0" : "=r" (fn));
+#if defined (__LP64__)
+  __asm ("movabsq $foo, %0" : "=r" (fn));
 #else
-  __asm ("movl $foo, %k0" : "=r" (fn));
+  __asm ("movl $foo, %0" : "=r" (fn));
 #endif
   if (fn (2, 3, 4, 5) != 14)
 abort ();
--cut here--

Uros.


[C++ PATCH] Implement P0848R3, Conditionally Trivial Special Member Functions.

2019-08-15 Thread Jason Merrill
With Concepts, overloads of special member functions can differ in
constraints, and this paper clarifies how that affects class properties: if
a class has a more constrained trivial copy constructor and a less
constrained non-trivial copy constructor, it is still trivially copyable.
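
A small sketch of the effect (my example, not from the patch; assumes C++20):

  #include <type_traits>

  template <class T>
  struct wrap
  {
    T val;
    wrap (const wrap&)
      requires std::is_trivially_copy_constructible_v<T> = default;
    wrap (const wrap& o) : val (o.val) {}   // less constrained, non-trivial
  };

  // The more constrained defaulted overload is the eligible copy
  // constructor of wrap<int>, so the class stays trivially copyable.
  static_assert (std::is_trivially_copyable_v<wrap<int>>);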

Tested x86_64-pc-linux-gnu, applying to trunk.

* tree.c (special_memfn_p): New.
* class.c (add_method): When overloading, hide ineligible special
member fns.
(check_methods): Set TYPE_HAS_COMPLEX_* here.
* decl.c (grok_special_member_properties): Not here.
* name-lookup.c (push_class_level_binding_1): Move overloaded
functions case down, accept FUNCTION_DECL as target_decl.
---
 gcc/cp/cp-tree.h |  3 +-
 gcc/cp/class.c   | 98 ++--
 gcc/cp/decl.c|  8 --
 gcc/cp/name-lookup.c |  6 +-
 gcc/cp/tree.c| 25 ++
 gcc/testsuite/g++.dg/concepts/pr89036.C  | 10 +++
 gcc/testsuite/g++.dg/cpp2a/cond-triv1.C  | 46 +++
 gcc/testsuite/g++.dg/cpp2a/cond-triv1a.C | 46 +++
 gcc/cp/ChangeLog | 11 +++
 9 files changed, 233 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/cond-triv1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/cond-triv1a.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index bdb7778c04b..05f91861b42 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6313,7 +6313,7 @@ extern void determine_key_method  (tree);
 extern void check_for_override (tree, tree);
 extern void push_class_stack   (void);
 extern void pop_class_stack(void);
-extern bool default_ctor_p (tree);
+extern bool default_ctor_p (const_tree);
 extern bool type_has_user_nondefault_constructor (tree);
 extern tree in_class_defaulted_default_constructor (tree);
 extern bool user_provided_p(tree);
@@ -7322,6 +7322,7 @@ extern tree cp_build_qualified_type_real  (tree, int, 
tsubst_flags_t);
 extern bool cv_qualified_p (const_tree);
 extern tree cv_unqualified (tree);
 extern special_function_kind special_function_p (const_tree);
+extern special_function_kind special_memfn_p   (const_tree);
 extern int count_trees (tree);
 extern int char_type_p (tree);
 extern void verify_stmt_tree   (tree);
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index b61152c7e72..cc53b15401a 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -994,6 +994,9 @@ add_method (tree type, tree method, bool via_using)
   tree *slot = find_member_slot (type, DECL_NAME (method));
   tree current_fns = slot ? *slot : NULL_TREE;
 
+  /* See below.  */
+  int losem = -1;
+
   /* Check to see if we've already got this method.  */
   for (ovl_iterator iter (current_fns); iter; ++iter)
 {
@@ -1070,9 +1073,48 @@ add_method (tree type, tree method, bool via_using)
   if (compparms (parms1, parms2)
  && (!DECL_CONV_FN_P (fn)
  || same_type_p (TREE_TYPE (fn_type),
- TREE_TYPE (method_type)))
-  && equivalently_constrained (fn, method))
+ TREE_TYPE (method_type
{
+  if (!equivalently_constrained (fn, method))
+   {
+ special_function_kind sfk = special_memfn_p (method);
+
+ if (sfk == sfk_none)
+   /* Non-special member functions coexist if they are not
+  equivalently constrained.  */
+   continue;
+
+ /* P0848: For special member functions, deleted, unsatisfied, or
+less constrained overloads are ineligible.  We implement this
+by removing them from CLASSTYPE_MEMBER_VEC.  Destructors don't
+use the notion of eligibility, and the selected destructor can
+be deleted, but removing unsatisfied or less constrained
+overloads has the same effect as overload resolution.  */
+ bool dtor = (sfk == sfk_destructor);
+ if (losem == -1)
+   losem = ((!dtor && DECL_DELETED_FN (method))
+|| !constraints_satisfied_p (method));
+ bool losef = ((!dtor && DECL_DELETED_FN (fn))
+   || !constraints_satisfied_p (fn));
+ int win;
+ if (losem || losef)
+   win = losem - losef;
+ else
+   win = more_constrained (fn, method);
+ if (win > 0)
+   /* Leave FN in the method vec, discard METHOD.  */
+   return false;
+ else if (win < 0)
+   {
+ /* Remove FN, add METHOD.  */
+ current_fns = iter.remove_node (current_fns);
+

Re: Patch to support extended characters in C/C++ identifiers

2019-08-15 Thread Joseph Myers
On Thu, 15 Aug 2019, Jason Merrill wrote:

> On 8/12/19 6:01 PM, Lewis Hyatt wrote:
> > Hello-
> > 
> > The attached patch for libcpp adds support for extended characters (e.g.
> > UTF-8)
> > in identifiers. A preliminary version of the patch was posted on PR c/67224
> > as
> > Comment 26 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224#c26) and
> > discussed with Joseph Myers. Here is an updated patch incorporating all
> > feedback received so far. I hope it is suitable now; please let me know if I
> > can do anything else to make it ready for you to apply. I am happy to work
> > on
> > it further, whatever is needed. I can't easily test on anything other than
> > x86_64-linux though. I did bootstrap all languages and run all tests on that
> > platform, everything was good.
> > 
> > The (relatively short) changes to libcpp are included inline here. I
> > attached
> > the test cases as a gzipped patch to avoid any problems with the encoding
> > (the
> > test cases contain some invalid UTF-8 and also other encodings such as
> > latin-1
> > as part of the testing).
> > 
> > Thanks for taking a look at it!
> 
> Looks good to me.  Joseph?

I'm a month behind on gcc-patches at present.  It will take me a while to 
get to this for detailed review.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-15 Thread Bernd Edlinger
On 8/15/19 10:55 AM, Richard Biener wrote:
> On Wed, 14 Aug 2019, Bernd Edlinger wrote:
> 
>> On 8/14/19 2:00 PM, Richard Biener wrote:
>>
>> Well, yes, but I was scared away by the complexity of emit_move_insn_1.
>>
>> It could be done, but in the moment I would be happy to have these
>> checks of one major strict alignment target, ARM is a good candidate
>> since most instructions work even if they are accidentally
>> using unaligned arguments.  So middle-end errors are not always
>> visible by ordinary tests.  Nevertheless it is a blatant violation of the
>> contract between middle-end and back-end, which should be avoided.
> 
> Fair enough.
> 
 Several struct-layout-1.dg testcase tripped over misaligned
 complex_cst constants, fixed by varasm.c (align_variable).
 This is likely a wrong code bug, because misaligned complex
 constants, are expanded to misaligned MEM_REF, but the
 expansion cannot handle misaligned constants, only packed
 structure fields.
>>>
> >>> Hmm.  So your patch overrides user-alignment here.  Wouldn't it
> >>> be better to do that more consciously by
>>>
>>>   if (! DECL_USER_ALIGN (decl)
>>>   || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>>>   && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
>>>

? I don't know why that would be better?
If the value is underaligned no matter why, pretend it was declared as
naturally aligned if that causes wrong code otherwise.
That was the idea here.

>>> ?  And why is the movmisalign optab support missing here?
>>>
>>
>> Yes, I wanted to replicate what we have in assign_parm_adjust_stack_rtl:
>>
>>   /* If we can't trust the parm stack slot to be aligned enough for its
>>  ultimate type, don't use that slot after entry.  We'll make another
>>  stack slot, if we need one.  */
>>   if (stack_parm
>>   && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
>>&& targetm.slow_unaligned_access (data->nominal_mode,
>>  MEM_ALIGN (stack_parm)))
>>
>> which also makes a variable more aligned than it is declared.
>> But maybe both should also check the movmisalign optab in
>> addition to slow_unaligned_access ?
> 
> Quite possible.
> 

Will do, see attached new version of the patch.

>>> IMHO whatever code later fails to properly use unaligned loads
>>> should be fixed instead rather than ignoring user requested alignment.
>>>
>>> Can you quote a short testcase that explains what exactly goes wrong?
>>> The struct-layout ones are awkward to look at...
>>>
>>
>> Sure,
>>
>> $ cat test.c
>> _Complex float __attribute__((aligned(1))) cf;
>>
>> void foo (void)
>> {
>>   cf = 1.0i;
>> }
>>
>> $ arm-linux-gnueabihf-gcc -S test.c 
>> during RTL pass: expand
>> test.c: In function 'foo':
>> test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003
>> 5 |   cf = 1.0i;
>>   |   ~~~^~
>> 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
>>  ../../gcc-trunk/gcc/config/arm/arm.md:7003
>> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>>  ../../gcc-trunk/gcc/recog.h:318
>> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
>>  ../../gcc-trunk/gcc/expr.c:3695
>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>>  ../../gcc-trunk/gcc/expr.c:3791
>> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
>>  ../../gcc-trunk/gcc/expr.c:3490
>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>>  ../../gcc-trunk/gcc/expr.c:3791
>> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
>>  ../../gcc-trunk/gcc/expr.c:5855
>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>>  ../../gcc-trunk/gcc/expr.c:5441
> 
> Huh, so why didn't it trigger
> 
>   /* Handle misaligned stores.  */
>   mode = TYPE_MODE (TREE_TYPE (to));
>   if ((TREE_CODE (to) == MEM_REF
>|| TREE_CODE (to) == TARGET_MEM_REF)
>   && mode != BLKmode
>   && !mem_ref_refers_to_non_mem_p (to)
>   && ((align = get_object_alignment (to))
>   < GET_MODE_ALIGNMENT (mode))
>   && (((icode = optab_handler (movmisalign_optab, mode))
>!= CODE_FOR_nothing)
>   || targetm.slow_unaligned_access (mode, align)))
> {
> 
> ?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
> var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl 0x2aad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.
> 
> Ah, 'to' is a plain DECL here so the above handling is incomplete.
> IIRC component refs like __real cf = 0.f should be handled fine
> again(?).  So, does adding || DECL_P (to) fix the case as well?
> 

So I tried this instead of the varasm.c change:

Index: expr.c
===
--- expr.c  (revision 274487)
+++ expr.c  (working copy)
@@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem
   /* Handle misaligned stores.  */
   mode = TYPE_MODE (TREE_TYPE (to));
   if ((TREE_CODE (to) == MEM_REF
-   || 

address change

2019-08-15 Thread Alexandre Oliva
Oops, I forgot to update the MAINTAINERS file a couple of months ago,
when the address there stopped working.

Honestly, I haven't really had much involvement with the frv, mn10300
or sh ports for almost 15 years, so I wouldn't mind if someone else
stepped up and took over, but until someone does, I don't mind
reviewing the occasional patch, so it's best if it can reach me ;-)

I'll put this in in the not-too-distant future.


for  ChangeLog

* MAINTAINERS: aoliva from @redhat.com to @gcc.gnu.org.
---
 MAINTAINERS |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 912663c44978..5d8402949bc0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -64,7 +64,7 @@ c-sky portYunhai Shang

 epiphany port  Joern Rennecke  
 fr30 port  Nick Clifton
 frv port   Nick Clifton
-frv port   Alexandre Oliva 
+frv port   Alexandre Oliva 
 ft32 port  James Bowman
 h8 portJeff Law
 hppa port  Jeff Law
@@ -83,7 +83,7 @@ microblazeMichael Eager   

 mips port  Matthew Fortune 
 mmix port  Hans-Peter Nilsson  
 mn10300 port   Jeff Law
-mn10300 port   Alexandre Oliva 
+mn10300 port   Alexandre Oliva 
 moxie port Anthony Green   
 msp430 portNick Clifton
 nds32 port Chung-Ju Wu 
@@ -105,7 +105,7 @@ rx port Nick Clifton

 s390 port  Hartmut Penner  
 s390 port  Ulrich Weigand  
 s390 port  Andreas Krebbel 
-sh portAlexandre Oliva 
+sh portAlexandre Oliva 
 sh portOleg Endo   
 sparc port David S. Miller 
 sparc port Eric Botcazou   
@@ -213,7 +213,7 @@ diagnostic messages Dodji Seketeli  

 diagnostic messagesDavid Malcolm   
 build machinery (*.in) Paolo Bonzini   
 build machinery (*.in) Nathanael Nerode
-build machinery (*.in) Alexandre Oliva 
+build machinery (*.in) Alexandre Oliva 
 build machinery (*.in) Ralf Wildenhues 
 docs co-maintainer Gerald Pfeifer  
 docs co-maintainer Joseph Myers

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


[COMMITTED] function.c (assign_parm_setup_reg): Handle misaligned stack arguments

2019-08-15 Thread Bernd Edlinger
Hi!

This is another approved part from my patch "Fix not 8-byte aligned ldrd/strd 
on ARMv5 (PR 89544)"
committed as "obvious".

$ svn diff -r274530:274531 -x -p
Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (Revision 274530)
+++ gcc/ChangeLog   (Revision 274531)
@@ -1,3 +1,7 @@
+2019-08-15  Bernd Edlinger  
+
+   * function.c (assign_parm_setup_reg): Handle misaligned stack arguments.
+
 2019-08-15  Martin Liska  
 
* tree-ssa-dce.c (propagate_necessity): We can't reach now
Index: gcc/function.c
===
--- gcc/function.c  (Revision 274530)
+++ gcc/function.c  (Revision 274531)
@@ -3127,6 +3127,7 @@ assign_parm_setup_reg (struct assign_parm_data_all
   int unsignedp = TYPE_UNSIGNED (TREE_TYPE (parm));
   bool did_conversion = false;
   bool need_conversion, moved;
+  enum insn_code icode;
   rtx rtl;
 
   /* Store the parm in a pseudoregister during the function, but we may
@@ -3188,7 +3189,6 @@ assign_parm_setup_reg (struct assign_parm_data_all
 conversion.  We verify that this insn does not clobber any
 hard registers.  */
 
-  enum insn_code icode;
   rtx op0, op1;
 
   icode = can_extend_p (promoted_nominal_mode, data->passed_mode,
@@ -3291,6 +3291,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
 
   did_conversion = true;
 }
+  else if (MEM_P (data->entry_parm)
+  && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+ > MEM_ALIGN (data->entry_parm)
+  && (((icode = optab_handler (movmisalign_optab,
+   promoted_nominal_mode))
+   != CODE_FOR_nothing)
+  || targetm.slow_unaligned_access (promoted_nominal_mode,
+MEM_ALIGN (data->entry_parm))))
+{
+  if (icode != CODE_FOR_nothing)
+   emit_insn (GEN_FCN (icode) (parmreg, validated_mem));
+  else
+   rtl = parmreg = extract_bit_field (validated_mem,
+   GET_MODE_BITSIZE (promoted_nominal_mode), 0,
+   unsignedp, parmreg,
+   promoted_nominal_mode, VOIDmode, false, NULL);
+}
   else
 emit_move_insn (parmreg, validated_mem);
 


Thanks
Bernd.


i386/asm-4 test: use amd64's natural addressing mode on all OSs

2019-08-15 Thread Alexandre Oliva
gcc.target/i386/asm-4.c uses amd64's natural PC-relative addressing
mode on a single platform, using the 32-bit absolute addressing mode
elsewhere.  There's no point in giving up amd64's natural addressing
mode and insisting on the 32-bit one when we're targeting amd64, and
having to make explicit exceptions for systems where that's found not
to work for whatever reason.  If we just use the best-suited way to
take the address of a function behind the compiler's back on each
target variant, we're less likely to hit unexpected failures.

Tested on x86_64-linux-gnu with unix{,-m32}.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.target/i386/asm-4.c: Use amd64 natural addressing mode
on all __LP64__ targets.
---
 gcc/testsuite/gcc.target/i386/asm-4.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/asm-4.c 
b/gcc/testsuite/gcc.target/i386/asm-4.c
index b86801032bc4..69dd1d3df0bf 100644
--- a/gcc/testsuite/gcc.target/i386/asm-4.c
+++ b/gcc/testsuite/gcc.target/i386/asm-4.c
@@ -29,7 +29,7 @@ baz (void)
 {
   /* Darwin loads 64-bit regions above the 4GB boundary so
  we need to use this instead.  */
-#if defined (__LP64__) && defined (__MACH__)
+#if defined (__LP64__)
   __asm ("leaq foo(%%rip), %0" : "=r" (fn));
 #else
   __asm ("movl $foo, %k0" : "=r" (fn));

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


[patch, fortran] Fix PR 91443

2019-08-15 Thread Thomas Koenig

Hello world,

this patch fixes PR 91443, in which we did not warn about a mismatched
external procedure. The problem was that the module this was called in
was resolved before parsing of the procedure ever started.

The approach taken here is to move the checking of external procedures
to a stage after normal resolution.  And, of course, fix the resulting
fallout from regression-testing :-)

There is also one policy change in the patch. Previously, we only warned
about mismatched declarations.  Now, this is a hard error unless the
user specifies -std=legacy.  The reason is that we have not yet solved
our single declaration problem, but it cannot be solved unless all
of a procedure's callers match.  People who have such broken code
should at least be made aware that they have a problem. However, I would
like to have some sort of agreement on this point before the patch
is committed.  This can also be changed (see the code at the bottom
of frontend-passes.c).

Once this is in, the next step is to issue errors for mismatching
calls where the callee is not in the same file.  This can be done
with the infrastructure of this patch.

So, OK for trunk?

Regards

Thomas

2019-08-15  Thomas Koenig  

PR fortran/91443
* frontend-passes.c (check_externals_expr): New function.
(check_externals_code): New function.
(gfc_check_externals): New function.
* gfortran.h (debug): Add prototypes for gfc_symbol * and
gfc_expr *.
(gfc_check_externals): Add prototype.
* interface.c (compare_actual_formal): Do not complain about
alternate returns if the formal argument is optional.
(gfc_procedure_use): Handle cases when an error has been issued
previously.  Break long line.
* parse.c (gfc_parse_file): Call gfc_check_externals for all
external procedures.
* resolve.c (resolve_global_procedure): Remove checking of
argument list.

2019-08-15  Thomas Koenig  

PR fortran/91443
* gfortran.dg/argument_checking_19.f90: New test.
* gfortran.dg/altreturn_10.f90: Change dg-warning to dg-error.
* gfortran.dg/dec_union_11.f90: Add -std=legacy.
* gfortran.dg/hollerith8.f90: Likewise. Remove warning for
Hollerith constant.
* gfortran.dg/integer_exponentiation_2.f90: New subroutine gee_i8;
use it to avoid type mismatches.
* gfortran.dg/pr41011.f: Add -std=legacy.
* gfortran.dg/whole_file_1.f90: Change warnings to errors.
* gfortran.dg/whole_file_2.f90: Likewise.

Index: fortran/frontend-passes.c
===
--- fortran/frontend-passes.c	(Revision 274394)
+++ fortran/frontend-passes.c	(Arbeitskopie)
@@ -56,7 +56,6 @@ static gfc_expr* check_conjg_transpose_variable (g
 static int call_external_blas (gfc_code **, int *, void *);
 static int matmul_temp_args (gfc_code **, int *,void *data);
 static int index_interchange (gfc_code **, int*, void *);
-
 static bool is_fe_temp (gfc_expr *e);
 
 #ifdef CHECKING_P
@@ -5364,3 +5363,100 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t code
 }
   return 0;
 }
+
+/* As a post-resolution step, check that all global symbols which are
+   not declared in the source file match in their call signatures.
+   We do this by looping over the code (and expressions). The first call
+   we happen to find is assumed to be canonical.  */
+
+/* Callback for external functions.  */
+
+static int
+check_externals_expr (gfc_expr **ep, int *walk_subtrees ATTRIBUTE_UNUSED,
+		  void *data ATTRIBUTE_UNUSED)
+{
+  gfc_expr *e = *ep;
+  gfc_symbol *sym, *def_sym;
+  gfc_gsymbol *gsym;
+
+  if (e->expr_type != EXPR_FUNCTION)
+return 0;
+
+  sym = e->value.function.esym;
+
+  if (sym == NULL || sym->attr.is_bind_c)
+return 0;
+
+  if (sym->attr.proc != PROC_EXTERNAL && sym->attr.proc != PROC_UNKNOWN)
+return 0;
+
+  gsym = gfc_find_gsymbol (gfc_gsym_root, sym->name);
+  if (gsym == NULL)
+return 0;
+
+  gfc_find_symbol (sym->name, gsym->ns, 0, &def_sym);
+
+  if (sym && def_sym)
+gfc_procedure_use (def_sym, &e->value.function.actual, &e->where);
+
+  return 0;
+}
+
+/* Callback for external code.  */
+
+static int
+check_externals_code (gfc_code **c, int *walk_subtrees ATTRIBUTE_UNUSED,
+		  void *data ATTRIBUTE_UNUSED)
+{
+  gfc_code *co = *c;
+  gfc_symbol *sym, *def_sym;
+  gfc_gsymbol *gsym;
+
+  if (co->op != EXEC_CALL)
+return 0;
+
+  sym = co->resolved_sym;
+  if (sym == NULL || sym->attr.is_bind_c)
+return 0;
+
+  if (sym->attr.proc != PROC_EXTERNAL && sym->attr.proc != PROC_UNKNOWN)
+return 0;
+
+  if (sym->attr.if_source == IFSRC_IFBODY || sym->attr.if_source == IFSRC_DECL)
+return 0;
+
+  gsym = gfc_find_gsymbol (gfc_gsym_root, sym->name);
+  if (gsym == NULL)
+return 0;
+
+  gfc_find_symbol (sym->name, gsym->ns, 0, &def_sym);
+
+  if (sym && def_sym)
+gfc_procedure_use (def_sym, &co->ext.actual, &co

Re: [patch][aarch64]: add intrinsics for vld1(q)_x4 and vst1(q)_x4

2019-08-15 Thread Kyrill Tkachov

Hi all,

On 8/6/19 10:51 AM, Richard Earnshaw (lists) wrote:

On 18/07/2019 18:18, James Greenhalgh wrote:
> On Mon, Jun 10, 2019 at 06:21:05PM +0100, Sylvia Taylor wrote:
>> Greetings,
>>
>> This patch adds the intrinsic functions for:
>> - vld1__x4
>> - vst1__x4
>> - vld1q__x4
>> - vst1q__x4
>>
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>
>> Ok for trunk? If yes, I don't have any commit rights, so can someone
>> please commit it on my behalf.
>
> Hi,
>
> I'm concerned by this strategy for implementing the arm_neon.h builtins:
>
>> +__extension__ extern __inline int8x8x4_t
>> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> +vld1_s8_x4 (const int8_t *__a)
>> +{
>> +  union { int8x8x4_t __i; __builtin_aarch64_simd_xi __o; } __au;
>> +  __au.__o
>> +    = __builtin_aarch64_ld1x4v8qi ((const 
__builtin_aarch64_simd_qi *) __a);

>> +  return __au.__i;
>> +}
>
> As far as I know this is undefined behaviour in C++11. This was the best
> resource I could find pointing to the relevant standards paragraphs.
>
> 
https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior

>
> That said, GCC explicitly allows it, so maybe this is fine?
>
> 
https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Optimize-Options.html#Type-punning

>
> Can anyone from the languages side chime in on whether we're exposing
> undefined behaviour (in either C or C++) here?

Yes, this is a GNU extension.  My only question is whether or not this
can be disabled within GCC if you're trying to check for strict
standards conformance of your code?  And if so, is there a way of making
sure that this header still works in that case?  A number of GNU
extensions can be protected with __extension__ but it's not clear how
that could be applied in this case.  Perhaps the outer __extension__ on
the function will already do that.

It should still work. The only relevant flag is -fstrict-aliasing and it 
is documented to preserve this case:


https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/Optimize-Options.html#Optimize-Options

Note that we've already been using this idiom in arm_neon.h since 2014 
[1] and it's worked fine.
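
As a minimal, self-contained sketch of the same union idiom outside
arm_neon.h (names here are invented for the example): with GCC, reading a
member other than the one last stored is the documented type-punning
extension, and it stays valid even under -fstrict-aliasing.

#include <stdint.h>

static inline uint32_t
float_bits (float f)
{
  union { float f; uint32_t u; } pun;
  pun.f = f;      /* store through one member ...  */
  return pun.u;   /* ... read it back through the other (GCC extension) */
}

int
main (void)
{
  /* Assumes IEEE single precision: 1.0f has the bit pattern 0x3f800000.  */
  return float_bits (1.0f) == 0x3f800000u ? 0 : 1;
}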


Thanks,

Kyrill

[1] http://gcc.gnu.org/r209880




R.



>
> Thanks,
> James
>
>
>
>>
>> Cheers,
>> Syl
>>
>> gcc/ChangeLog:
>>
>> 2019-06-10  Sylvia Taylor 
>>
>>   * config/aarch64/aarch64-simd-builtins.def:
>>   (ld1x4): New.
>>   (st1x4): Likewise.
>>   * config/aarch64/aarch64-simd.md:
>>   (aarch64_ld1x4): New pattern.
>>   (aarch64_st1x4): Likewise.
>>   (aarch64_ld1_x4_): Likewise.
>>   (aarch64_st1_x4_): Likewise.
>>   * config/aarch64/arm_neon.h:
>>   (vld1_s8_x4): New function.
>>   (vld1q_s8_x4): Likewise.
>>   (vld1_s16_x4): Likewise.
>>   (vld1q_s16_x4): Likewise.
>>   (vld1_s32_x4): Likewise.
>>   (vld1q_s32_x4): Likewise.
>>   (vld1_u8_x4): Likewise.
>>   (vld1q_u8_x4): Likewise.
>>   (vld1_u16_x4): Likewise.
>>   (vld1q_u16_x4): Likewise.
>>   (vld1_u32_x4): Likewise.
>>   (vld1q_u32_x4): Likewise.
>>   (vld1_f16_x4): Likewise.
>>   (vld1q_f16_x4): Likewise.
>>   (vld1_f32_x4): Likewise.
>>   (vld1q_f32_x4): Likewise.
>>   (vld1_p8_x4): Likewise.
>>   (vld1q_p8_x4): Likewise.
>>   (vld1_p16_x4): Likewise.
>>   (vld1q_p16_x4): Likewise.
>>   (vld1_s64_x4): Likewise.
>>   (vld1_u64_x4): Likewise.
>>   (vld1_p64_x4): Likewise.
>>   (vld1q_s64_x4): Likewise.
>>   (vld1q_u64_x4): Likewise.
>>   (vld1q_p64_x4): Likewise.
>>   (vld1_f64_x4): Likewise.
>>   (vld1q_f64_x4): Likewise.
>>   (vst1_s8_x4): Likewise.
>>   (vst1q_s8_x4): Likewise.
>>   (vst1_s16_x4): Likewise.
>>   (vst1q_s16_x4): Likewise.
>>   (vst1_s32_x4): Likewise.
>>   (vst1q_s32_x4): Likewise.
>>   (vst1_u8_x4): Likewise.
>>   (vst1q_u8_x4): Likewise.
>>   (vst1_u16_x4): Likewise.
>>   (vst1q_u16_x4): Likewise.
>>   (vst1_u32_x4): Likewise.
>>   (vst1q_u32_x4): Likewise.
>>   (vst1_f16_x4): Likewise.
>>   (vst1q_f16_x4): Likewise.
>>   (vst1_f32_x4): Likewise.
>>   (vst1q_f32_x4): Likewise.
>>   (vst1_p8_x4): Likewise.
>>   (vst1q_p8_x4): Likewise.
>>   (vst1_p16_x4): Likewise.
>>   (vst1q_p16_x4): Likewise.
>>   (vst1_s64_x4): Likewise.
>>   (vst1_u64_x4): Likewise.
>>   (vst1_p64_x4): Likewise.
>>   (vst1q_s64_x4): Likewise.
>>   (vst1q_u64_x4): Likewise.
>>   (vst1q_p64_x4): Likewise.
>>   (vst1_f64_x4): Likewise.
>>   (vst1q_f64_x4): Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-06-10  Sylvia Taylor 
>>
>>   * gcc.target/aarch64/advsimd-intrinsics/vld1x4.c: New test.
>>   * gcc.target/aarch64/advsimd-intrinsics/vst1x4.c: New test.
>



Re: [PATCH] Handle new operators with no arguments in DCE.

2019-08-15 Thread Richard Biener
On Thu, Aug 15, 2019 at 12:47 PM Martin Liška  wrote:
>
> PING^1

OK

> On 8/8/19 10:43 AM, Martin Liška wrote:
> > On 8/7/19 4:12 PM, Richard Biener wrote:
> >> On Wed, Aug 7, 2019 at 2:04 PM Martin Liška  wrote:
> >>>
> >>> On 8/7/19 12:51 PM, Jakub Jelinek wrote:
>  On Wed, Aug 07, 2019 at 12:44:28PM +0200, Martin Liška wrote:
> > On 8/7/19 11:51 AM, Richard Biener wrote:
> >> I think the simplest way to achieve this is to not copy, aka clear,
> >> DECL_IS_OPERATOR_* when cloning and removing arguments
> >> (cloning for a constant align argument should be OK for example, as is
> >> for a constant address).  Or simply always when cloning.
> >
> > Ok, then I'm suggesting following tested patch.
> >
> > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
>  What about LAMBDA_FUNCTION, doesn't cloning which changes arguments in 
>  any
>  way invalidate that too, i.e. shouldn't it be just
>    FUNCTION_DECL_DECL_TYPE (new_node->decl) = NONE;
> >>>
> >>> Well, how are lambdas involved in the new/delete DCE here? Lambdas with 
> >>> removed
> >>> arguments should not interfere here.
> >>
> >> But for coverage where we do
> >>
> >>   gcov_write_unsigned (DECL_ARTIFICIAL (current_function_decl)
> >>&& !DECL_FUNCTION_VERSIONED (current_function_decl)
> >>&& !DECL_LAMBDA_FUNCTION_P (current_function_decl));
> >>
> >> all clones should be considered artificial?
> >
> > Well, from coverage perspective most of them are fine.
> >
> >>
> >> Anyway, your patch is OK, we can think about lambdas separately.  Can you
> >> simplify the DCE code after the patch?
> >
> > I installed the patch and I'm sending the follow up cleanup.
> >
> > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >
> > Ready to be installed?
> > Thanks,
> > Martin
> >
> >>
> >> Thanks,
> >> Richard.
> >>
>  instead?  On the other side, if the cloning doesn't change arguments in 
>  any
>  way, do we still want to clear those flags?
> >>>
> >>> Well, I would consider it safer to drop it always.
> >>>
> >>> Martin
> >>>
> 
>    Jakub
> 
> >>>
> >
>


Re: types for VR_VARYING

2019-08-15 Thread Richard Biener
On Thu, Aug 15, 2019 at 12:40 PM Aldy Hernandez  wrote:
>
> On 8/14/19 1:37 PM, Jeff Law wrote:
> > On 8/13/19 6:39 PM, Aldy Hernandez wrote:
> >>
> >>
> >> On 8/12/19 7:46 PM, Jeff Law wrote:
> >>> On 8/12/19 12:43 PM, Aldy Hernandez wrote:
>  This is a fresh re-post of:
> 
>  https://gcc.gnu.org/ml/gcc-patches/2019-07/msg6.html
> 
>  Andrew gave me some feedback a week ago, and I obviously don't remember
>  what it was because I was about to leave on PTO.  However, I do remember
>  I addressed his concerns before getting drunk on rum in tropical islands.
> 
> >>> FWIW found a great coffee infused rum while in Kauai last week.  I'm not
> >>> a coffee fan, but it was wonderful.  The one bottle we brought back
> >>> isn't going to last until Cauldron and I don't think I can get a special
> >>> order filled before I leave :(
> >>
> >> You must bring some to Cauldron before we believe you. :)
> > That's the problem.  The nearest place I can get it is in Vegas and
> > there's no distributor in Montreal.   I can special order it in our
> > state run stores, but it won't be here in time.
> >
> > Of course, I don't mind if you don't believe me.  More for me in that
> > case...
> >
> >
> >>> Is the supports_type_p stuff there to placate the calls from ipa-cp?  I
> >>> can live with it in the short term, but it really feels like there
> >>> should be something in the ipa-cp client that avoids this silliness.
> >>
> >> I am not happy with this either, but there are various places where
> >> statements that are !stmt_interesting_for_vrp() are still setting a
> >> range of VARYING, which is then being ignored at a later time.
> >>
> >> For example, vrp_initialize:
> >>
> >>if (!stmt_interesting_for_vrp (phi))
> >>  {
> >>tree lhs = PHI_RESULT (phi);
> >>set_def_to_varying (lhs);
> >>prop_set_simulate_again (phi, false);
> >>  }
> >>
> >> Also in evrp_range_analyzer::record_ranges_from_stmt(), where we if the
> >> statement is interesting for VRP but extract_range_from_stmt() does not
> >> produce a useful range, we also set a varying for a range we will never
> >> use.  Similarly for a statement that is not interesting in this hunk.
> > Ugh.  One could perhaps argue that setting any kind of range in these
> > circumstances is silly.   But I suspect it's necessary due to the
> > optimistic handling of VR_UNDEFINED in value_range_base::union_helper.
> > It's all coming back to me now...
> >
> >
> >>
> >> Then there is vrp_prop::visit_stmt() where we also set VARYING for types
> >> that VRP will never handle:
> >>
> >>case IFN_ADD_OVERFLOW:
> >>case IFN_SUB_OVERFLOW:
> >>case IFN_MUL_OVERFLOW:
> >>case IFN_ATOMIC_COMPARE_EXCHANGE:
> >>  /* These internal calls return _Complex integer type,
> >> which VRP does not track, but the immediate uses
> >> thereof might be interesting.  */
> >>  if (lhs && TREE_CODE (lhs) == SSA_NAME)
> >>{
> >>  imm_use_iterator iter;
> >>  use_operand_p use_p;
> >>  enum ssa_prop_result res = SSA_PROP_VARYING;
> >>
> >>  set_def_to_varying (lhs);
> >>
> >> I've adjusted the patch so that set_def_to_varying will set the range to
> >> VR_UNDEFINED if !supports_type_p.  This is a fail safe, as we can't
> >> really do anything with a nonsensical range.  I just don't want to leave
> >> the range in an indeterminate state.
> >>
> > I think VR_UNDEFINED is unsafe due to value_range_base::union_helper.
> > And that's a more general than this patch.  VR_UNDEFINED is _not_ a safe
> > range to set something to if we can't handle it.  We have to use VR_VARYING.
> >
> > Why?  See the beginning of value_range_base::union_helper:
> >
> > /* VR0 has the resulting range if VR1 is undefined or VR0 is varying.  
> > */
> > if (vr1->undefined_p ()
> > || vr0->varying_p ())
> >   return *vr0;
> >
> > /* VR1 has the resulting range if VR0 is undefined or VR1 is varying.  
> > */
> > if (vr0->undefined_p ()
> > || vr1->varying_p ())
> >   return *vr1;
> > This can get called for something like
> >
> >a = <cond> ? name1 : name2;
> >
> > If name1 was set to VR_UNDEFINED thinking that VR_UNDEFINED was a safe
> > value for something we can't handle, then we'll incorrectly return the
> > range for name2.
>
> I think that if name1 was !supports_type_p, we will have never called
> union/intersect.  We will have bailed at some point earlier.  However, I
> do see your point about being consistent.
>
> >
> > VR_UNDEFINED can only be used for the ranges of objects we haven't
> > processed.  If we can't produce a range for an object because the
> > statement is something we don't handle or just doesn't produce anythign
> > useful, then the right result is VR_VARYING.
> >
> > This may be worth commenting at the definition site for VR_*.
> >
> >>
> >> I also noticed that Andrew's patch was setting num_vr_

Re: [PATCH 0/3] Libsanitizer: merge from trunk

2019-08-15 Thread Martin Liška
On 8/15/19 12:21 PM, Iain Sandoe wrote:
> 2) As noted on IRC, the version of automake used in the merge is 1.16.1 but 
> the GCC prereqs are for 1.15.1.  If it’s intended that automake-1.16.1 should 
> be used could this requirement be documented somewhere?

Thank you for the heads-up. Yes, I should have used 1.15.1, as documented here:
https://gcc.gnu.org/install/prerequisites.html

Feel free to send and install re-generated files in libsanitizer after your 
change.

Thanks,
Martin


[PATCH] Fix PR91445

2019-08-15 Thread Richard Biener


This fixes a regression for GCC 9 which was fixed on trunk as a side
effect of other modifications.  The patch backports some refactoring
and with it the relevant change,

-  if (*disambiguate_only)
+  /* If we are looking for redundant stores do not create new hashtable
+ entries from aliasing defs with made up alias-sets.  */
+  if (*disambiguate_only || !data->tbaa_p)
 return (void *)-1;

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

The new testcase also goes to trunk.

Richard.

2019-08-15  Richard Biener  

PR tree-optimization/91445
* gcc.dg/torture/pr91445.c: New testcase.

Backport from mainline
2019-07-05  Richard Biener  

PR tree-optimization/91091
* tree-ssa-alias.h (get_continuation_for_phi): Add tbaa_p parameter.
(walk_non_aliased_vuses): Likewise.
* tree-ssa-alias.c (maybe_skip_until): Pass down tbaa_p.
(get_continuation_for_phi): New tbaa_p parameter and pass
it down.
(walk_non_aliased_vuses): Likewise.
* tree-ssa-pre.c (translate_vuse_through_block): Likewise.
* tree-ssa-scopedtables.c (avail_exprs_stack::lookup_avail_expr):
Likewise.
* tree-ssa-sccvn.c (struct vn_walk_cb_data): Add tbaa_p flag.
(vn_reference_lookup_3): Handle and pass down tbaa_p flag.
(vn_reference_lookup_pieces): Adjust.
(vn_reference_lookup): Remove alias-set altering, instead pass
down false as tbaa_p.

* gcc.dg/tree-ssa/pr91091-2.c: New testcase.

2019-07-04  Richard Biener  

* tree-ssa-sccvn.h (vn_reference_lookup): Add last_vuse_ptr
argument.
* tree-ssa-sccvn.c (last_vuse_ptr, vn_walk_kind): Move
globals into...
(struct vn_walk_cb_data): New callback data struct.
(vn_reference_lookup_2): Adjust.
(vn_reference_lookup_3): Likewise.
(vn_reference_lookup_pieces): Likewise.
(vn_reference_lookup): Likewise, get last_vuse_ptr argument.
(visit_reference_op_load): Adjust.

Index: gcc/testsuite/gcc.dg/torture/pr91445.c
===
--- gcc/testsuite/gcc.dg/torture/pr91445.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr91445.c  (working copy)
@@ -0,0 +1,22 @@
+/* { dg-do run } */
+
+struct S { _Bool x; };
+
+void
+foo (struct S *s)
+{
+  __builtin_memset (s, 0x11, sizeof (struct S));
+  s->x = 1;
+}
+
+int
+main ()
+{
+  struct S s;
+  foo (&s);
+  char c;
+  __builtin_memcpy (&c, &s.x, 1);
+  if (c != 1)
+__builtin_abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/tree-ssa/pr91091-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/pr91091-2.c   (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr91091-2.c   (working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+struct s { int x; };
+struct t { int x; };
+
+void swap(struct s* p, struct t* q)
+{
+  p->x = q->x;
+  q->x = p->x;
+}
+
+/* The second statement is redundant.  */
+/* { dg-final { scan-tree-dump-times "x = " 1 "fre1" } } */
+/* { dg-final { scan-tree-dump-times " = \[^;\]*x;" 1 "fre1" } } */
Index: gcc/tree-ssa-alias.c
===
--- gcc/tree-ssa-alias.c(revision 274504)
+++ gcc/tree-ssa-alias.c(working copy)
@@ -2599,8 +2599,8 @@ stmt_kills_ref_p (gimple *stmt, tree ref
 
 static bool
 maybe_skip_until (gimple *phi, tree &target, basic_block target_bb,
- ao_ref *ref, tree vuse, unsigned int &limit, bitmap *visited,
- bool abort_on_visited,
+ ao_ref *ref, tree vuse, bool tbaa_p, unsigned int &limit,
+ bitmap *visited, bool abort_on_visited,
  void *(*translate)(ao_ref *, tree, void *, bool *),
  void *data)
 {
@@ -2634,7 +2634,7 @@ maybe_skip_until (gimple *phi, tree &tar
  /* An already visited PHI node ends the walk successfully.  */
  if (bitmap_bit_p (*visited, SSA_NAME_VERSION (PHI_RESULT (def_stmt))))
return !abort_on_visited;
- vuse = get_continuation_for_phi (def_stmt, ref, limit,
+ vuse = get_continuation_for_phi (def_stmt, ref, tbaa_p, limit,
   visited, abort_on_visited,
   translate, data);
  if (!vuse)
@@ -2649,7 +2649,7 @@ maybe_skip_until (gimple *phi, tree &tar
  if ((int)limit <= 0)
return false;
  --limit;
- if (stmt_may_clobber_ref_p_1 (def_stmt, ref))
+ if (stmt_may_clobber_ref_p_1 (def_stmt, ref, tbaa_p))
{
  bool disambiguate_only = true;
  if (translate
@@ -2681,7 +2681,7 @@ maybe_skip_until (gimple *phi, tree &tar
Returns NULL_TREE if no suitable virtual operand can be found.  */
 
 tree
-get_continuation_for_phi (gimple *phi, ao_ref *re

Re: [PATCH][i386] Fix PR91454, unrecognized insn

2019-08-15 Thread Uros Bizjak
On Thu, Aug 15, 2019 at 1:09 PM Richard Biener  wrote:
>
>
> The following fixes non-recognized RTL generated since my STV
> changes.  I've added a helper instead of enlarging the code
> even more.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> OK?
>
> Thanks,
> Richard.
>
> 2019-08-15  Richard Biener  
>
> PR target/91454
> * config/i386/i386-features.c (gen_gpr_to_xmm_move_src): New
> helper.
> (general_scalar_chain::make_vector_copies): Use it.

OK.

Thanks,
Uros.

> Index: gcc/config/i386/i386-features.c
> ===
> --- gcc/config/i386/i386-features.c (revision 274504)
> +++ gcc/config/i386/i386-features.c (working copy)
> @@ -658,6 +658,25 @@ scalar_chain::emit_conversion_insns (rtx
>emit_insn_after (insns, BB_HEAD (new_bb));
>  }
>
> +/* Generate the canonical SET_SRC to move GPR to a VMODE vector register,
> +   zeroing the upper parts.  */
> +
> +static rtx
> +gen_gpr_to_xmm_move_src (enum machine_mode vmode, rtx gpr)
> +{
> +  switch (GET_MODE_NUNITS (vmode))
> +{
> +case 1:
> +  return gen_rtx_SUBREG (vmode, gpr, 0);
> +case 2:
> +  return gen_rtx_VEC_CONCAT (vmode, gpr,
> +CONST0_RTX (GET_MODE_INNER (vmode)));
> +default:
> +  return gen_rtx_VEC_MERGE (vmode, gen_rtx_VEC_DUPLICATE (vmode, gpr),
> +   CONST0_RTX (vmode), GEN_INT 
> (HOST_WIDE_INT_1U));
> +}
> +}
> +
>  /* Make vector copies for all register REGNO definitions
> and replace its uses in a chain.  */
>
> @@ -684,13 +703,8 @@ general_scalar_chain::make_vector_copies
>   }
> else
>   emit_move_insn (tmp, reg);
> -   emit_insn (gen_rtx_SET
> -   (gen_rtx_SUBREG (vmode, vreg, 0),
> -gen_rtx_VEC_MERGE (vmode,
> -   gen_rtx_VEC_DUPLICATE (vmode,
> -  tmp),
> -   CONST0_RTX (vmode),
> -   GEN_INT (HOST_WIDE_INT_1U))));
> +   emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
> +   gen_gpr_to_xmm_move_src (vmode, tmp)));
>   }
> else if (!TARGET_64BIT && smode == DImode)
>   {
> @@ -720,13 +734,8 @@ general_scalar_chain::make_vector_copies
>   }
>   }
> else
> - emit_insn (gen_rtx_SET
> -  (gen_rtx_SUBREG (vmode, vreg, 0),
> -   gen_rtx_VEC_MERGE (vmode,
> -  gen_rtx_VEC_DUPLICATE (vmode,
> - reg),
> -  CONST0_RTX (vmode),
> -  GEN_INT (HOST_WIDE_INT_1U))));
> + emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
> + gen_gpr_to_xmm_move_src (vmode, reg)));
> rtx_insn *seq = get_insns ();
> end_sequence ();
> rtx_insn *insn = DF_REF_INSN (ref);


[PATCH][i386] Fix PR91454, unrecognized insn

2019-08-15 Thread Richard Biener


The following fixes non-recognized RTL generated since my STV
changes.  I've added a helper instead of enlarging the code
even more.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK?

Thanks,
Richard.

2019-08-15  Richard Biener  

PR target/91454
* config/i386/i386-features.c (gen_gpr_to_xmm_move_src): New
helper.
(general_scalar_chain::make_vector_copies): Use it.

Index: gcc/config/i386/i386-features.c
===
--- gcc/config/i386/i386-features.c (revision 274504)
+++ gcc/config/i386/i386-features.c (working copy)
@@ -658,6 +658,25 @@ scalar_chain::emit_conversion_insns (rtx
   emit_insn_after (insns, BB_HEAD (new_bb));
 }
 
+/* Generate the canonical SET_SRC to move GPR to a VMODE vector register,
+   zeroing the upper parts.  */
+
+static rtx
+gen_gpr_to_xmm_move_src (enum machine_mode vmode, rtx gpr)
+{
+  switch (GET_MODE_NUNITS (vmode))
+{
+case 1:
+  return gen_rtx_SUBREG (vmode, gpr, 0);
+case 2:
+  return gen_rtx_VEC_CONCAT (vmode, gpr,
+CONST0_RTX (GET_MODE_INNER (vmode)));
+default:
+  return gen_rtx_VEC_MERGE (vmode, gen_rtx_VEC_DUPLICATE (vmode, gpr),
+   CONST0_RTX (vmode), GEN_INT (HOST_WIDE_INT_1U));
+}
+}
+
 /* Make vector copies for all register REGNO definitions
and replace its uses in a chain.  */
 
@@ -684,13 +703,8 @@ general_scalar_chain::make_vector_copies
  }
else
  emit_move_insn (tmp, reg);
-   emit_insn (gen_rtx_SET
-   (gen_rtx_SUBREG (vmode, vreg, 0),
-gen_rtx_VEC_MERGE (vmode,
-   gen_rtx_VEC_DUPLICATE (vmode,
-  tmp),
-   CONST0_RTX (vmode),
-   GEN_INT (HOST_WIDE_INT_1U))));
+   emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
+   gen_gpr_to_xmm_move_src (vmode, tmp)));
  }
else if (!TARGET_64BIT && smode == DImode)
  {
@@ -720,13 +734,8 @@ general_scalar_chain::make_vector_copies
  }
  }
else
- emit_insn (gen_rtx_SET
-  (gen_rtx_SUBREG (vmode, vreg, 0),
-   gen_rtx_VEC_MERGE (vmode,
-  gen_rtx_VEC_DUPLICATE (vmode,
- reg),
-  CONST0_RTX (vmode),
-  GEN_INT (HOST_WIDE_INT_1U))));
+ emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
+ gen_gpr_to_xmm_move_src (vmode, reg)));
rtx_insn *seq = get_insns ();
end_sequence ();
rtx_insn *insn = DF_REF_INSN (ref);
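
/* Sketch (my reading of the helper above, not text from the patch) of the
   SET_SRC it produces for each GET_MODE_NUNITS case, with VMODE and SMODE
   standing for the vector and scalar modes of the chain:

     1 element:   (subreg:VMODE (reg:SMODE r) 0)
     2 elements:  (vec_concat:VMODE (reg:SMODE r) (const_int 0))
     otherwise:   (vec_merge:VMODE (vec_duplicate:VMODE (reg:SMODE r))
                                   (const_vector [0 ...])
                                   (const_int 1))

   i.e. in every case the scalar value lands in element 0 and the upper
   elements of the vector register are zeroed.  */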


[PATCH][GIMPLE-FE] Expose 'sizetype' to the GIMPLE FE

2019-08-15 Thread Richard Biener


This exposes 'sizetype' as __SIZETYPE__ to help writing "portable"
GIMPLE testcases.

Will commit soonish, more testcases might need adjustment
for some targets.

Richard.

2019-08-15  Richard Biener  

c-family/
* c-common.c (c_stddef_cpp_builtins): When the GIMPLE FE is
enabled, define __SIZETYPE__.

* gcc.dg/pr80170.c: Adjust.

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c (revision 274504)
+++ gcc/c-family/c-common.c (working copy)
@@ -5148,6 +5148,10 @@ c_stddef_cpp_builtins(void)
 builtin_define_with_value ("__INTPTR_TYPE__", INTPTR_TYPE, 0);
   if (UINTPTR_TYPE)
 builtin_define_with_value ("__UINTPTR_TYPE__", UINTPTR_TYPE, 0);
+  /* GIMPLE FE testcases need access to the GCC internal 'sizetype'.
+ Expose it as __SIZETYPE__.  */
+  if (flag_gimple)
+builtin_define_with_value ("__SIZETYPE__", SIZETYPE, 0);
 }
 
 static void
Index: gcc/testsuite/gcc.dg/pr80170.c
===
--- gcc/testsuite/gcc.dg/pr80170.c  (revision 274504)
+++ gcc/testsuite/gcc.dg/pr80170.c  (working copy)
@@ -24,11 +24,7 @@ NullB (void * misalignedPtr)
   struct B * b;
 
   bb_2:
-#if __SIZEOF_LONG__ == 8
-  b_2 = misalignedPtr_1(D) + 18446744073709551608ul;
-#else
-  b_2 = misalignedPtr_1(D) + 4294967292ul;
-#endif
+  b_2 = misalignedPtr_1(D) + _Literal (__SIZETYPE__) -__SIZEOF_POINTER__;
   __MEM <struct B> (b_2).a.a = _Literal (void *) 0;
   __MEM <struct B> (b_2).a.b = _Literal (void *) 0;
   return;


[PATCH, i386]: Fix recent STV testsuite failures

2019-08-15 Thread Uros Bizjak
The COMPARE RTX has a special conversion procedure that applies only
to DImode double-word operands. Do not convert single-word SImode and
DImode operands for now.

2019-08-15  Uroš Bizjak  

* config/i386/i386-features.c (general_scalar_chain::convert_insn)
<case COMPARE>: Revert 2019-08-14 change.
(convertible_comparison_p): Revert 2019-08-14 change.  Return false
for (TARGET_64BIT || mode != DImode).

Also update a couple of nearby comments.

Boostrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386-features.c
===
--- config/i386/i386-features.c (revision 274504)
+++ config/i386/i386-features.c (working copy)
@@ -1030,11 +1030,11 @@ general_scalar_chain::convert_insn (rtx_insn *insn
 case COMPARE:
   src = SUBREG_REG (XEXP (XEXP (src, 0), 0));
 
-  gcc_assert ((REG_P (src) && GET_MODE (src) == GET_MODE_INNER (vmode))
- || (SUBREG_P (src) && GET_MODE (src) == vmode));
+  gcc_assert ((REG_P (src) && GET_MODE (src) == DImode)
+ || (SUBREG_P (src) && GET_MODE (src) == V2DImode));
 
   if (REG_P (src))
-   subreg = gen_rtx_SUBREG (vmode, src, 0);
+   subreg = gen_rtx_SUBREG (V2DImode, src, 0);
   else
subreg = copy_rtx_if_shared (src);
   emit_insn_before (gen_vec_interleave_lowv2di (copy_rtx_if_shared 
(subreg),
@@ -1273,8 +1273,12 @@ has_non_address_hard_reg (rtx_insn *insn)
 (const_int 0 [0])))  */
 
 static bool
-convertible_comparison_p (rtx_insn *insn, enum machine_mode mode)
+convertible_comparison_p (rtx_insn *insn, machine_mode mode)
 {
+  /* ??? Currently convertible for double-word DImode chain only.  */
+  if (TARGET_64BIT || mode != DImode)
+return false;
+
   if (!TARGET_SSE4_1)
 return false;
 
@@ -1306,12 +1310,12 @@ static bool
 
   if (!SUBREG_P (op1)
   || !SUBREG_P (op2)
-  || GET_MODE (op1) != mode
-  || GET_MODE (op2) != mode
+  || GET_MODE (op1) != SImode
+  || GET_MODE (op2) != SImode
   || ((SUBREG_BYTE (op1) != 0
-  || SUBREG_BYTE (op2) != GET_MODE_SIZE (mode))
+  || SUBREG_BYTE (op2) != GET_MODE_SIZE (SImode))
  && (SUBREG_BYTE (op2) != 0
- || SUBREG_BYTE (op1) != GET_MODE_SIZE (mode))))
+ || SUBREG_BYTE (op1) != GET_MODE_SIZE (SImode))))
 return false;
 
   op1 = SUBREG_REG (op1);
@@ -1319,13 +1323,13 @@ static bool
 
   if (op1 != op2
   || !REG_P (op1)
-  || GET_MODE (op1) != GET_MODE_WIDER_MODE (mode).else_blk ())
+  || GET_MODE (op1) != DImode)
 return false;
 
   return true;
 }
 
-/* The DImode version of scalar_to_vector_candidate_p.  */
+/* The general version of scalar_to_vector_candidate_p.  */
 
 static bool
 general_scalar_to_vector_candidate_p (rtx_insn *insn, enum machine_mode mode)
@@ -1344,7 +1348,7 @@ general_scalar_to_vector_candidate_p (rtx_insn *in
   if (GET_CODE (src) == COMPARE)
 return convertible_comparison_p (insn, mode);
 
-  /* We are interested in DImode promotion only.  */
+  /* We are interested in "mode" only.  */
   if ((GET_MODE (src) != mode
&& !CONST_INT_P (src))
   || GET_MODE (dst) != mode)


Re: [PATCH] Handle new operators with no arguments in DCE.

2019-08-15 Thread Martin Liška
PING^1

On 8/8/19 10:43 AM, Martin Liška wrote:
> On 8/7/19 4:12 PM, Richard Biener wrote:
>> On Wed, Aug 7, 2019 at 2:04 PM Martin Liška  wrote:
>>>
>>> On 8/7/19 12:51 PM, Jakub Jelinek wrote:
 On Wed, Aug 07, 2019 at 12:44:28PM +0200, Martin Liška wrote:
> On 8/7/19 11:51 AM, Richard Biener wrote:
>> I think the simplest way to achieve this is to not copy, aka clear,
>> DECL_IS_OPERATOR_* when cloning and removing arguments
>> (cloning for a constant align argument should be OK for example, as is
>> for a constant address).  Or simply always when cloning.
>
> Ok, then I'm suggesting following tested patch.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

 What about LAMBDA_FUNCTION, doesn't cloning which changes arguments in any
 way invalidate that too, i.e. shouldn't it be just
   FUNCTION_DECL_DECL_TYPE (new_node->decl) = NONE;
>>>
>>> Well, how are lambdas involved in the new/delete DCE here? Lambdas with 
>>> removed
>>> arguments should not interfere here.
>>
>> But for coverage where we do
>>
>>   gcov_write_unsigned (DECL_ARTIFICIAL (current_function_decl)
>>&& !DECL_FUNCTION_VERSIONED (current_function_decl)
>>&& !DECL_LAMBDA_FUNCTION_P (current_function_decl));
>>
>> all clones should be considered artificial?
> 
> Well, from coverage perspective most of them are fine.
> 
>>
>> Anyway, your patch is OK, we can think about lambdas separately.  Can you
>> simplify the DCE code after the patch?
> 
> I installed the patch and I'm sending the follow up cleanup.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
>>
>> Thanks,
>> Richard.
>>
 instead?  On the other side, if the cloning doesn't change arguments in any
 way, do we still want to clear those flags?
>>>
>>> Well, I would consider it safer to drop it always.
>>>
>>> Martin
>>>

   Jakub

>>>
> 



Re: enforce canonicalization of value_range's

2019-08-15 Thread Aldy Hernandez




On 8/14/19 1:53 PM, Jeff Law wrote:

On 8/13/19 6:51 PM, Aldy Hernandez wrote:

Presumably this was better than moving the implementation earlier.


Actually, it was for ease of review.  I made some changes to the
function, and I didn't want the reviewer to miss them because I had
moved the function wholesale.  I can move the function earlier, after we
agree on the changes (see below).

Either works for me.  I think there was an informal effort to avoid
these kinds of forward decls eons ago because our inliner sucked, but in
the IPA world order in the source file really shouldn't matter.


Ok, I'll leave it as is, so I don't have to rebase the VARYING patch. 
When both patches are in, I'll move the definition up as an obvious change.








If we weren't on a path to kill VRP I'd probably suggest a distinct
effort to constify this code.  Some of the changes were a bit confusing
when it looked like we'd dropped a call to set the range of an object.
But those were just local copies, so setting the type/min/max directly
was actually fine.  constification would make this a bit clearer.  But
again, I don't think it's worth the effort given the long term
trajectory for tree-vrp.c.


I shouldn't be introducing any new confusion.  Did I add any new methods
that should've been const that aren't?  I can't see any??.  I'm happy to
fix anything I introduced.

IIRC we had an incoming range object passed by value, which we locally
modified and called the setter.

I spotted the dropped call to the setter and was going to call it out as
possibly broken.  But in investigating further I realized the object was
passed by value, so dropping the setter wasn't really a problem.

THe funny thing was we were doing this on source operands rather than
the destination operand.  Arguably the ranges for the source operands
should be constant which would have flagged that code as fishy from its
inception and I'm sure the code would have been restructured
appropriately and would have avoided the confusion.

So in summary, you didn't break anything.  It was a safe change you
made, but it wasn't immediately obvious it was safe.  If we had a
constified codebase the intent of the code would have been more obvious.







So where does the handle_pointers stuff matter?   I'm a bit surprised we
have to do anything special for them.


I've learned to touch as little of VRP as is necessary, as changing
anything to be more consistent breaks things in unexpected ways ;-).

In this particular case, TYPE_MIN_VALUE and TYPE_MAX_VALUE are not
defined for pointers, and I didn't want to change the meaning of
vrp_val_{min,max} throughout.  I was trying to minimize the changes to
existing behavior.  If it bothers you too much, we could remove it as a
follow up when we are sure there are no expected side-effects from the
rest of the patch. ??
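
For illustration, a rough sketch (not the code from the patch) of why such
a flag is needed at all: TYPE_MIN_VALUE/TYPE_MAX_VALUE are only populated
for integral types, so a pointer "max" has to be synthesized from the
type's precision instead of being read off the type node.

static tree
sketch_vrp_val_max (tree type, bool handle_pointers)
{
  if (INTEGRAL_TYPE_P (type))
    return TYPE_MAX_VALUE (type);
  if (POINTER_TYPE_P (type) && handle_pointers)
    /* All-ones value of the pointer's precision.  */
    return wide_int_to_tree (type,
                             wi::max_value (TYPE_PRECISION (type), UNSIGNED));
  return NULL_TREE;
}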

I don't mind exploring this as a follow-up.  I guess that a min/max
doesn't really have significant meaning for pointers.

I think rather than digging too deep into this, let's table it for now.
  I think the time to revisit will be as we work through removal of
tree-vrp at some point in the future.






OK.  I don't expect the answers to the minor questions above will
ultimately change anything.


I could appreciate a final nod before I commit.  And even then, I will
wait until the other patch is approved and commit them simultaneously.
They are a bit intertwined.

I'm nodding :-)


I've tested this patch in isolation, and am committing it while we agree 
on the varying one.


Thank you.

Aldy


Re: types for VR_VARYING

2019-08-15 Thread Aldy Hernandez

On 8/14/19 1:37 PM, Jeff Law wrote:

On 8/13/19 6:39 PM, Aldy Hernandez wrote:



On 8/12/19 7:46 PM, Jeff Law wrote:

On 8/12/19 12:43 PM, Aldy Hernandez wrote:

This is a fresh re-post of:

https://gcc.gnu.org/ml/gcc-patches/2019-07/msg6.html

Andrew gave me some feedback a week ago, and I obviously don't remember
what it was because I was about to leave on PTO.  However, I do remember
I addressed his concerns before getting drunk on rum in tropical islands.


FWIW found a great coffee infused rum while in Kauai last week.  I'm not
a coffee fan, but it was wonderful.  The one bottle we brought back
isn't going to last until Cauldron and I don't think I can get a special
order filled before I leave :(


You must bring some to Cauldron before we believe you. :)

That's the problem.  The nearest place I can get it is in Vegas and
there's no distributor in Montreal.   I can special order it in our
state run stores, but it won't be here in time.

Of course, I don't mind if you don't believe me.  More for me in that
case...



Is the supports_type_p stuff there to placate the calls from ipa-cp?  I
can live with it in the short term, but it really feels like there
should be something in the ipa-cp client that avoids this silliness.


I am not happy with this either, but there are various places where
statements that are !stmt_interesting_for_vrp() are still setting a
range of VARYING, which is then being ignored at a later time.

For example, vrp_initialize:

   if (!stmt_interesting_for_vrp (phi))
 {
   tree lhs = PHI_RESULT (phi);
   set_def_to_varying (lhs);
   prop_set_simulate_again (phi, false);
 }

Also in evrp_range_analyzer::record_ranges_from_stmt(), where we if the
statement is interesting for VRP but extract_range_from_stmt() does not
produce a useful range, we also set a varying for a range we will never
use.  Similarly for a statement that is not interesting in this hunk.

Ugh.  One could perhaps argue that setting any kind of range in these
circumstances is silly.   But I suspect it's necessary due to the
optimistic handling of VR_UNDEFINED in value_range_base::union_helper.
It's all coming back to me now...




Then there is vrp_prop::visit_stmt() where we also set VARYING for types
that VRP will never handle:

   case IFN_ADD_OVERFLOW:
   case IFN_SUB_OVERFLOW:
   case IFN_MUL_OVERFLOW:
   case IFN_ATOMIC_COMPARE_EXCHANGE:
 /* These internal calls return _Complex integer type,
    which VRP does not track, but the immediate uses
    thereof might be interesting.  */
 if (lhs && TREE_CODE (lhs) == SSA_NAME)
   {
 imm_use_iterator iter;
 use_operand_p use_p;
 enum ssa_prop_result res = SSA_PROP_VARYING;

 set_def_to_varying (lhs);

I've adjusted the patch so that set_def_to_varying will set the range to
VR_UNDEFINED if !supports_type_p.  This is a fail safe, as we can't
really do anything with a nonsensical range.  I just don't want to leave
the range in an indeterminate state.


I think VR_UNDEFINED is unsafe due to value_range_base::union_helper.
And that's a more general than this patch.  VR_UNDEFINED is _not_ a safe
range to set something to if we can't handle it.  We have to use VR_VARYING.

Why?  See the beginning of value_range_base::union_helper:

/* VR0 has the resulting range if VR1 is undefined or VR0 is varying.  */
if (vr1->undefined_p ()
|| vr0->varying_p ())
  return *vr0;

/* VR1 has the resulting range if VR0 is undefined or VR1 is varying.  */
if (vr0->undefined_p ()
|| vr1->varying_p ())
  return *vr1;
This can get called for something like

   a = <cond> ? name1 : name2;

If name1 was set to VR_UNDEFINED thinking that VR_UNDEFINED was a safe
value for something we can't handle, then we'll incorrectly return the
range for name2.


I think that if name1 was !supports_type_p, we will have never called 
union/intersect.  We will have bailed at some point earlier.  However, I 
do see your point about being consistent.




VR_UNDEFINED can only be used for the ranges of objects we haven't
processed.  If we can't produce a range for an object because the
statement is something we don't handle or just doesn't produce anythign
useful, then the right result is VR_VARYING.

This may be worth commenting at the definition site for VR_*.



I also noticed that Andrew's patch was setting num_vr_values to
num_ssa_names + num_ssa_names / 10.  I think he meant num_vr_values +
num_vr_values / 10.  Please verify the current incantation makes sense.

Going to assume this will be adjusted per the other messages in this thread.


Done.





diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 39ea22f0554..663dd6e2398 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -182,8 +182,10 @@ record_temporary_equivalences_from_phis (edge e,
new_vr->deep_copy (vr_values->get_value_range (src));
  e

Re: [PATCH 0/3] Libsanitizer: merge from trunk

2019-08-15 Thread Iain Sandoe
Hi Martin,

> On 14 Aug 2019, at 17:18, Jeff Law  wrote:
> 
> On 8/14/19 2:50 AM, Martin Liška wrote:
>> On 8/13/19 5:02 PM, Jeff Law wrote:
>>> On 8/13/19 7:07 AM, Martin Liska wrote:
 Hi.
 
 For this year, I decided to make a first merge now and the
 next (much smaller) at the end of October.
 
 The biggest change is rename of many files from .cc to .cpp.
 
 I bootstrapped the patch set on x86_64-linux-gnu and run
 asan/ubsan/tsan tests on x86_64, ppc64le (power8) and
 aarch64.
 
 Libasan SONAME has been already bumped compared to GCC 9.
 
 For other libraries, I don't see a reason for library bumping:
 
 $ abidiff /usr/lib64/libubsan.so.1.0.0 
 ./x86_64-pc-linux-gnu/libsanitizer/ubsan/.libs/libubsan.so.1.0.0 --stat
 Functions changes summary: 0 Removed, 0 Changed, 4 Added functions
 Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
 Function symbols changes summary: 3 Removed, 0 Added function symbols not 
 referenced by debug info
 Variable symbols changes summary: 0 Removed, 0 Added variable symbol not 
 referenced by debug info
 
 $ abidiff /usr/lib64/libtsan.so.0.0.0  
 ./x86_64-pc-linux-gnu/libsanitizer/tsan/.libs/libtsan.so.0.0.0 --stat
 Functions changes summary: 0 Removed, 0 Changed, 47 Added functions
 Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
 Function symbols changes summary: 1 Removed, 2 Added function symbols not 
 referenced by debug info
 Variable symbols changes summary: 0 Removed, 0 Added variable symbol not 
 referenced by debug info
 
 Ready to be installed?
>>> ISTM that a sanitizer merge during stage1 should be able to move forward
>>> without ACKs.  Similarly for other runtimes where we pull from some
>>> upstream master.
>> 
>> Good then. I've just installed the patch and also the refresh of 
>> LOCAL_PATCHES.
> Sounds good.  My tester will spin them on a variety of platforms over
> the next couple days.  I won't be at all surprised if the MIPS bits are
> still flakey.

1)

This breaks bootstrap on Darwin, with:
/src-local/gcc-trunk/libsanitizer/sanitizer_common/sanitizer_vector.h:18:10: 
fatal error: sanitizer_common/sanitizer_allocator_internal.h: No such file or 
directory
   18 | #include "sanitizer_common/sanitizer_allocator_internal.h"
  |  ^

Which is because the top level source dir is not present in the compile line.

I am bootstrapping the following fix on darwin and x86-64/powerpc-linux and 
will apply it to unbreak bootstrap  if those three succeed.

(using automake-1.16.1 pending resolution of point 2).

2) As noted on IRC, the version of automake used in the merge is 1.16.1 but the 
GCC prereqs are for 1.15.1.  If it’s intended that automake-1.16.1 should be 
used could this requirement be documented somewhere?

cheers
Iain

libsanitizer/

2019-08-15 Iain Sandoe 

* sanitizer_common/Makefile.am: Include top_srcdir unconditionally.
* sanitizer_common/Makefile.in: Regenerated



diff --git a/libsanitizer/sanitizer_common/Makefile.am 
b/libsanitizer/sanitizer_common/Makefile.am
index 7e8ce9476e..df9c294151 100644
--- a/libsanitizer/sanitizer_common/Makefile.am
+++ b/libsanitizer/sanitizer_common/Makefile.am
@@ -1,4 +1,4 @@
-AM_CPPFLAGS = -I $(top_srcdir)/include -isystem $(top_srcdir)/include/system
+AM_CPPFLAGS = -I $(top_srcdir)/include -I $(top_srcdir) -isystem 
$(top_srcdir)/include/system
 
 # May be used by toolexeclibdir.
 gcc_version := $(shell @get_gcc_base_ver@ $(top_srcdir)/../gcc/BASE-VER)
@@ -10,7 +10,6 @@ AM_CXXFLAGS += -std=gnu++11
 AM_CXXFLAGS += $(EXTRA_CXXFLAGS)
 if LIBBACKTRACE_SUPPORTED
 AM_CXXFLAGS += -DSANITIZER_LIBBACKTRACE -DSANITIZER_CP_DEMANGLE \
-  -I $(top_srcdir)/ \
   -I $(top_srcdir)/../libbacktrace \
   -I $(top_builddir)/libbacktrace \
   -I $(top_srcdir)/../include \



Add missing check for BUILT_IN_MD (PR 91444)

2019-08-15 Thread Richard Sandiford
In this PR we were passing an ordinary non-built-in function to
targetm.vectorize.builtin_md_vectorized_function, which is only
supposed to handle BUILT_IN_MD.

Tested on aarch64-linux-gnu and spot-checked on powerpc64el-linux-gnu.
Applied as obvious (r274524).

Richard


2019-08-15  Richard Sandiford  

gcc/
PR middle-end/91444
* tree-vect-stmts.c (vectorizable_call): Check that the function
is a BUILT_IN_MD function before passing it to
targetm.vectorize.builtin_md_vectorized_function.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-08-05 17:46:43.029559672 +0100
+++ gcc/tree-vect-stmts.c   2019-08-15 10:22:47.468694552 +0100
@@ -3376,7 +3376,7 @@ vectorizable_call (stmt_vec_info stmt_in
   if (cfn != CFN_LAST)
fndecl = targetm.vectorize.builtin_vectorized_function
  (cfn, vectype_out, vectype_in);
-  else if (callee)
+  else if (callee && fndecl_built_in_p (callee, BUILT_IN_MD))
fndecl = targetm.vectorize.builtin_md_vectorized_function
  (callee, vectype_out, vectype_in);
 }


[committed][AArch64] Add a aarch64_sve_mode_p query

2019-08-15 Thread Richard Sandiford
This patch adds an exported function for testing whether a mode is
an SVE mode.  The ACLE will make more use of it, but there's already
one place that can benefit.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274523.

Richard


2019-08-15  Richard Sandiford  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_sve_mode_p): Declare.
* config/aarch64/aarch64.c (aarch64_sve_mode_p): New function.
(aarch64_select_early_remat_modes): Use it.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2019-08-15 09:47:20.176358327 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2019-08-15 09:59:17.543066285 +0100
@@ -475,6 +475,7 @@ bool aarch64_masks_and_shift_for_bfi_p (
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
+bool aarch64_sve_mode_p (machine_mode);
 bool aarch64_sve_cnt_immediate_p (rtx);
 bool aarch64_sve_scalar_inc_dec_immediate_p (rtx);
 bool aarch64_sve_addvl_addpl_immediate_p (rtx);
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-08-15 09:57:26.03521 +0100
+++ gcc/config/aarch64/aarch64.c2019-08-15 09:59:17.547066257 +0100
@@ -1606,6 +1606,14 @@ aarch64_vector_data_mode_p (machine_mode
   return aarch64_classify_vector_mode (mode) & VEC_ANY_DATA;
 }
 
+/* Return true if MODE is any form of SVE mode, including predicates,
+   vectors and structures.  */
+bool
+aarch64_sve_mode_p (machine_mode mode)
+{
+  return aarch64_classify_vector_mode (mode) & VEC_ANY_SVE;
+}
+
 /* Return true if MODE is an SVE data vector mode; either a single vector
or a structure of vectors.  */
 static bool
@@ -19962,12 +19970,8 @@ aarch64_select_early_remat_modes (sbitma
   /* SVE values are not normally live across a call, so it should be
  worth doing early rematerialization even in VL-specific mode.  */
   for (int i = 0; i < NUM_MACHINE_MODES; ++i)
-{
-  machine_mode mode = (machine_mode) i;
-  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
-  if (vec_flags & VEC_ANY_SVE)
-   bitmap_set_bit (modes, i);
-}
+if (aarch64_sve_mode_p ((machine_mode) i))
+  bitmap_set_bit (modes, i);
 }
 
 /* Override the default target speculation_safe_value.  */


Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-08-15 Thread Uros Bizjak
On Tue, Aug 13, 2019 at 9:54 PM H.J. Lu  wrote:

> > > with the latest patch (this is with -m32) where -mstv causes
> > > all spills to go away and the cmoves replaced (so clearly
> > > better code after the patch) for pr65105-5.c, no obvious
> > > improvements for pr65105-3.c where cmov does appear with -mstv.
> > > I'd rather not "fix" those by adding -mno-stv but instead have
> > > the Intel people fix costing for slm and/or decide what to do.
> > > For pr65105-3.c I'm not sure why if-conversion didn't choose
> > > to use cmov, so clearly the enabled minmax patterns expose the
> > > "failure" here.
> > I'm not sure how much effort Intel is putting into Silvermont tuning
> > these days.  So I'd suggest giving HJ a heads-up and a reasonable period
> > of time to take a looksie, but I wouldn't hold the patch for long due to
> > a Silvermont tuning issue.
>
> Leave pr65105-3.c to fail for now.  We can take a look later.

I have a patch for this. The problem is with conversion of COMPARE,
which gets assigned to SImode chain, while in fact we expect very
specific form of DImode compare.

Uros.


[committed][AArch64] Fix predicate alignment for fixed-length SVE

2019-08-15 Thread Richard Sandiford
aarch64_simd_vector_alignment was only giving predicates 16-bit
alignment in VLA mode, not VLS mode.  I think the problem is latent
because we can't yet create an ABI predicate type, but it seemed worth
fixing in a standalone patch rather than as part of the main ACLE series.

The ACLE patches have tests for this.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274522.

Richard


2019-08-15  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Return
16 for SVE predicates even if they are fixed-length.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-08-15 09:52:24.842110687 +0100
+++ gcc/config/aarch64/aarch64.c2019-08-15 09:56:41.816215015 +0100
@@ -15915,11 +15915,13 @@ aarch64_simd_attr_length_rglist (machine
 static HOST_WIDE_INT
 aarch64_simd_vector_alignment (const_tree type)
 {
+  /* ??? Checking the mode isn't ideal, but VECTOR_BOOLEAN_TYPE_P can
+ be set for non-predicate vectors of booleans.  Modes are the most
+ direct way we have of identifying real SVE predicate types.  */
+  if (GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL)
+return 16;
   if (TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST)
-/* ??? Checking the mode isn't ideal, but VECTOR_BOOLEAN_TYPE_P can
-   be set for non-predicate vectors of booleans.  Modes are the most
-   direct way we have of identifying real SVE predicate types.  */
-return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
+return 128;
   return wi::umin (wi::to_wide (TYPE_SIZE (type)), 128).to_uhwi ();
 }
 


Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-08-15 Thread Richard Biener
On Wed, 14 Aug 2019, Bernd Edlinger wrote:

> On 8/14/19 2:00 PM, Richard Biener wrote:
> > On Thu, 8 Aug 2019, Bernd Edlinger wrote:
> > 
> >> On 8/2/19 9:01 PM, Bernd Edlinger wrote:
> >>> On 8/2/19 3:11 PM, Richard Biener wrote:
>  On Tue, 30 Jul 2019, Bernd Edlinger wrote:
> 
> >
> > I have no test coverage for the movmisalign optab though, so I
> > rely on your code review for that part.
> 
>  It looks OK.  I tried to make it trigger on the following on
>  i?86 with -msse2:
> 
>  typedef int v4si __attribute__((vector_size (16)));
> 
>  struct S { v4si v; } __attribute__((packed));
> 
>  v4si foo (struct S s)
>  {
>    return s.v;
>  }
> 
> >>>
> >>> Hmm, the entry_parm need to be a MEM_P and an unaligned one.
> >>> So the test case could be made to trigger it this way:
> >>>
> >>> typedef int v4si __attribute__((vector_size (16)));
> >>>
> >>> struct S { v4si v; } __attribute__((packed));
> >>>
> >>> int t;
> >>> v4si foo (struct S a, struct S b, struct S c, struct S d,
> >>>   struct S e, struct S f, struct S g, struct S h,
> >>>   int i, int j, int k, int l, int m, int n,
> >>>   int o, struct S s)
> >>> {
> >>>   t = o;
> >>>   return s.v;
> >>> }
> >>>
> >>
> >> Ah, I realized that there are already a couple of very similar
> >> test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c,
> >> gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c,
> >> which also manage to execute the movmisalign code with the latest patch
> >> version.  So I thought that it is not necessary to add another one.
> >>
> >>> However the code path is still not reached, since 
> >>> targetm.slow_unaligned_access
> >>> is always FALSE, which is probably a flaw in my patch.
> >>>
> >>> So I think,
> >>>
> >>> +  else if (MEM_P (data->entry_parm)
> >>> +  && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> >>> + > MEM_ALIGN (data->entry_parm)
> >>> +  && targetm.slow_unaligned_access (promoted_nominal_mode,
> >>> +MEM_ALIGN 
> >>> (data->entry_parm)))
> >>>
> >>> should probably better be
> >>>
> >>> +  else if (MEM_P (data->entry_parm)
> >>> +  && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> >>> + > MEM_ALIGN (data->entry_parm)
> >>> +&& (((icode = optab_handler (movmisalign_optab, 
> >>> promoted_nominal_mode))
> >>> + != CODE_FOR_nothing)
> >>> +|| targetm.slow_unaligned_access (promoted_nominal_mode,
> >>> +  MEM_ALIGN 
> >>> (data->entry_parm))))
> >>>
> >>> Right?
> >>>
> >>> Then the modified test case would use the movmisalign optab.
> >>> However nothing changes in the end, since the i386 back-end is used to 
> >>> work
> >>> around the middle end not using movmisalign optab when it should do so.
> >>>
> >>
> >> I prefer the second form of the check, as it offers more test coverage,
> >> and is probably more correct than the former.
> >>
> >> Note there are more variations of this misalign check in expr.c,
> >> some are somehow odd, like expansion of MEM_REF and VIEW_CONVERT_EXPR:
> >>
> >> && mode != BLKmode
> >> && align < GET_MODE_ALIGNMENT (mode))
> >>   {
> >> if ((icode = optab_handler (movmisalign_optab, mode))
> >> != CODE_FOR_nothing)
> >>   [...]
> >> else if (targetm.slow_unaligned_access (mode, align))
> >>   temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode),
> >> 0, TYPE_UNSIGNED (TREE_TYPE (exp)),
> >> (modifier == EXPAND_STACK_PARM
> >>  ? NULL_RTX : target),
> >> mode, mode, false, alt_rtl);
> >>
> >> I wonder if they are correct this way; why shouldn't we use the movmisalign
> >> optab if it exists, regardless of TARGET_SLOW_UNALIGNED_ACCESS?
> > 
> > Doesn't the code do exactly this?  Prefer movmisalign over 
> > extract_bit_field?
> > 
> 
> Ah, yes.  How could I miss that.
> 
> >>
> >>> I wonder if I should try to add a gcc_checking_assert to the mov 
> >>> expand
> >>> patterns that the memory is properly aligned?
> >>>
> >>
> >> Wow, that was a really exciting bug-hunt with those assertions around...
> > 
> > :)
> > 
>  @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> 
> did_conversion = true;
>   }
>  +  else if (MEM_P (data->entry_parm)
>  +  && GET_MODE_ALIGNMENT (promoted_nominal_mode)
>  + > MEM_ALIGN (data->entry_parm)
> 
>  we arrive here by-passing
> 
>    else if (need_conversion)
>  {
>    /* We did not have an insn to convert directly, or the sequence
>   generated appeared unsafe.  We must first copy the parm to a
>    

[committed][AArch64] Tweak operand choice for SVE predicate AND

2019-08-15 Thread Richard Sandiford
SVE defines an assembly alias:

   MOV pa.B, pb/Z, pc.B  ->  AND pa.B, pb/Z, pc.B, pc.B

Our and3 pattern was instead using the functionally-equivalent:

   AND pa.B, pb/Z, pb.B, pc.B
   
This patch duplicates pc.B instead so that the alias can be seen
in disassembly.

I wondered about using the alias in the pattern instead, but using AND
explicitly seems to fit better with the pattern name and surrounding code.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274521.

Richard


2019-08-15  Richard Sandiford  

gcc/
* config/aarch64/aarch64-sve.md (and3): Make the
operand order match the MOV /Z alias.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:47:20.176358327 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:54:12.977312970 +0100
@@ -3317,12 +3317,14 @@ (define_insn "*3"
 ;; -
 
 ;; Predicate AND.  We can reuse one of the inputs as the GP.
+;; Doubling the second operand is the preferred implementation
+;; of the MOV alias, so we use that instead of %1/z, %1, %2.
 (define_insn "and3"
   [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
(and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand" "Upa")
  (match_operand:PRED_ALL 2 "register_operand" "Upa")))]
   "TARGET_SVE"
-  "and\t%0.b, %1/z, %1.b, %2.b"
+  "and\t%0.b, %1/z, %2.b, %2.b"
 )
 
 ;; Unpredicated predicate EOR and ORR.


[committed][AArch64] Pass a pattern to aarch64_output_sve_cnt_immediate

2019-08-15 Thread Richard Sandiford
This patch makes us always pass an explicit vector pattern to
aarch64_output_sve_cnt_immediate, rather than assuming it's ALL.
The ACLE patches need to be able to pass in other values.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274520.

Richard


2019-08-15  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_output_sve_cnt_immediate): Take
the vector pattern as an aarch64_svpattern argument.  Update the
overloaded caller accordingly.
(aarch64_output_sve_scalar_inc_dec): Update call accordingly.
(aarch64_output_sve_vector_inc_dec): Likewise.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-08-15 09:49:56.159207559 +0100
+++ gcc/config/aarch64/aarch64.c2019-08-15 09:52:08.670229990 +0100
@@ -2902,16 +2902,17 @@ aarch64_sve_cnt_immediate_p (rtx x)
operand (a vector pattern followed by a multiplier in the range [1, 16]).
PREFIX is the mnemonic without the size suffix and OPERANDS is the
first part of the operands template (the part that comes before the
-   vector size itself).  FACTOR is the number of quadwords.
-   NELTS_PER_VQ, if nonzero, is the number of elements in each quadword.
-   If it is zero, we can use any element size.  */
+   vector size itself).  PATTERN is the pattern to use.  FACTOR is the
+   number of quadwords.  NELTS_PER_VQ, if nonzero, is the number of elements
+   in each quadword.  If it is zero, we can use any element size.  */
 
 static char *
 aarch64_output_sve_cnt_immediate (const char *prefix, const char *operands,
+ aarch64_svpattern pattern,
  unsigned int factor,
  unsigned int nelts_per_vq)
 {
-  static char buffer[sizeof ("sqincd\t%x0, %w0, all, mul #16")];
+  static char buffer[sizeof ("sqincd\t%x0, %w0, vl256, mul #16")];
 
   if (nelts_per_vq == 0)
 /* There is some overlap in the ranges of the four CNT instructions.
@@ -2924,12 +2925,16 @@ aarch64_output_sve_cnt_immediate (const
 
   factor >>= shift;
   unsigned int written;
-  if (factor == 1)
+  if (pattern == AARCH64_SV_ALL && factor == 1)
 written = snprintf (buffer, sizeof (buffer), "%s%c\t%s",
prefix, suffix, operands);
+  else if (factor == 1)
+written = snprintf (buffer, sizeof (buffer), "%s%c\t%s, %s",
+   prefix, suffix, operands, svpattern_token (pattern));
   else
-written = snprintf (buffer, sizeof (buffer), "%s%c\t%s, all, mul #%d",
-   prefix, suffix, operands, factor);
+written = snprintf (buffer, sizeof (buffer), "%s%c\t%s, %s, mul #%d",
+   prefix, suffix, operands, svpattern_token (pattern),
+   factor);
   gcc_assert (written < sizeof (buffer));
   return buffer;
 }
@@ -2939,7 +2944,8 @@ aarch64_output_sve_cnt_immediate (const
PREFIX is the mnemonic without the size suffix and OPERANDS is the
first part of the operands template (the part that comes before the
vector size itself).  X is the value of the vector size operand,
-   as a polynomial integer rtx.  */
+   as a polynomial integer rtx; we need to convert this into an "all"
+   pattern with a multiplier.  */
 
 char *
 aarch64_output_sve_cnt_immediate (const char *prefix, const char *operands,
@@ -2947,7 +2953,7 @@ aarch64_output_sve_cnt_immediate (const
 {
   poly_int64 value = rtx_to_poly_int64 (x);
   gcc_assert (aarch64_sve_cnt_immediate_p (value));
-  return aarch64_output_sve_cnt_immediate (prefix, operands,
+  return aarch64_output_sve_cnt_immediate (prefix, operands, AARCH64_SV_ALL,
   value.coeffs[1], 0);
 }
 
@@ -2971,10 +2977,10 @@ aarch64_output_sve_scalar_inc_dec (rtx o
   poly_int64 offset_value = rtx_to_poly_int64 (offset);
   gcc_assert (offset_value.coeffs[0] == offset_value.coeffs[1]);
   if (offset_value.coeffs[1] > 0)
-return aarch64_output_sve_cnt_immediate ("inc", "%x0",
+return aarch64_output_sve_cnt_immediate ("inc", "%x0", AARCH64_SV_ALL,
 offset_value.coeffs[1], 0);
   else
-return aarch64_output_sve_cnt_immediate ("dec", "%x0",
+return aarch64_output_sve_cnt_immediate ("dec", "%x0", AARCH64_SV_ALL,
 -offset_value.coeffs[1], 0);
 }
 
@@ -3079,11 +3085,11 @@ aarch64_output_sve_vector_inc_dec (const
   if (!aarch64_sve_vector_inc_dec_immediate_p (x, &factor, &nelts_per_vq))
 gcc_unreachable ();
   if (factor < 0)
-return aarch64_output_sve_cnt_immediate ("dec", operands, -factor,
-nelts_per_vq);
+return aarch64_output_sve_cnt_immediate ("dec", operands, AARCH64_SV_ALL,
+-factor, nelts_per_vq);
   else
-return aarch64_output_sve_cnt_immediate 

[committed][AArch64] Optimise aarch64_add_offset for SVE VL constants

2019-08-15 Thread Richard Sandiford
aarch64_add_offset contains code to decompose all SVE VL-based constants
into native operations.  The worst-case fallback is to load the number
of SVE elements into a register and use a general multiplication.
This patch improves that fallback by reusing expand_mult if
can_create_pseudo_p, rather than emitting a MULT pattern directly.

In order to increase the chances of being able to use a simple
add-and-shift, the patch also tries to compute VG * the lowest set
bit of the multiplier, rather than always using CNTD as the basis
for the multiplication path.
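
As a rough illustration of the lowest-set-bit idea (the numbers below are
invented and the helper is purely hypothetical, not GCC code): suppose the
offset to add is 20 * VG, where VG is the number of 64-bit granules in a
vector.  Basing the first step on 4 * VG (the lowest set bit of 20) leaves
a residual factor of 5, which expand_mult can implement as a shift and add:

/* Illustrative sketch only: offset = 20 * VG.  The base 4 * VG can be
   materialised with a single CNT-style instruction; the residual factor
   of 5 then becomes a shift-and-add.  */
long
compute_offset (long vg)
{
  long base = vg * 4;
  return base + (base << 2);   /* base * 5 == 20 * VG */
}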

This is tested by the ACLE patches but is really an independent
improvement.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274519.

Richard


2019-08-15  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_add_offset): In the fallback
multiplication case, try to compute VG * (lowest set bit) directly
rather than always basing the multiplication on VG.  Use
expand_mult for the multiplication if we can.

gcc/testsuite/
* gcc.target/aarch64/sve/loop_add_4.c: Expect 10 INCWs and
INCDs rather than 8.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-08-15 09:47:20.180358297 +0100
+++ gcc/config/aarch64/aarch64.c2019-08-15 09:49:13.659521097 +0100
@@ -73,6 +73,7 @@ #define INCLUDE_STRING
 #include "selftest-rtl.h"
 #include "rtx-vector-builder.h"
 #include "intl.h"
+#include "expmed.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -3465,20 +3466,36 @@ aarch64_add_offset (scalar_int_mode mode
}
   else
{
- /* Use CNTD, then multiply it by FACTOR.  */
- val = gen_int_mode (poly_int64 (2, 2), mode);
+ /* Base the factor on LOW_BIT if we can calculate LOW_BIT
+directly, since that should increase the chances of being
+able to use a shift and add sequence.  If LOW_BIT itself
+is out of range, just use CNTD.  */
+ if (low_bit <= 16 * 8)
+   factor /= low_bit;
+ else
+   low_bit = 1;
+
+ val = gen_int_mode (poly_int64 (low_bit * 2, low_bit * 2), mode);
  val = aarch64_force_temporary (mode, temp1, val);
 
- /* Go back to using a negative multiplication factor if we have
-no register from which to subtract.  */
- if (code == MINUS && src == const0_rtx)
+ if (can_create_pseudo_p ())
+   {
+ rtx coeff1 = gen_int_mode (factor, mode);
+ val = expand_mult (mode, val, coeff1, NULL_RTX, false, true);
+   }
+ else
{
- factor = -factor;
- code = PLUS;
+ /* Go back to using a negative multiplication factor if we have
+no register from which to subtract.  */
+ if (code == MINUS && src == const0_rtx)
+   {
+ factor = -factor;
+ code = PLUS;
+   }
+ rtx coeff1 = gen_int_mode (factor, mode);
+ coeff1 = aarch64_force_temporary (mode, temp2, coeff1);
+ val = gen_rtx_MULT (mode, val, coeff1);
}
- rtx coeff1 = gen_int_mode (factor, mode);
- coeff1 = aarch64_force_temporary (mode, temp2, coeff1);
- val = gen_rtx_MULT (mode, val, coeff1);
}
 
   if (shift > 0)
Index: gcc/testsuite/gcc.target/aarch64/sve/loop_add_4.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/loop_add_4.c   2019-03-08 
18:14:29.780994734 +
+++ gcc/testsuite/gcc.target/aarch64/sve/loop_add_4.c   2019-08-15 
09:49:13.659521097 +0100
@@ -68,7 +68,8 @@ TEST_ALL (LOOP)
 /* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, w[0-9]+\n} 
3 } } */
 /* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]+/z, 
\[x[0-9]+, x[0-9]+, lsl 2\]} 8 } } */
 /* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7]+, \[x[0-9]+, 
x[0-9]+, lsl 2\]} 8 } } */
-/* { dg-final { scan-assembler-times {\tincw\tx[0-9]+\n} 8 } } */
+/* 2 for the calculations of -17 and 17.  */
+/* { dg-final { scan-assembler-times {\tincw\tx[0-9]+\n} 10 } } */
 
 /* { dg-final { scan-assembler-times {\tdecw\tz[0-9]+\.s, all, mul #16\n} 1 } 
} */
 /* { dg-final { scan-assembler-times {\tdecw\tz[0-9]+\.s, all, mul #15\n} 1 } 
} */
@@ -85,7 +86,8 @@ TEST_ALL (LOOP)
 /* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, x[0-9]+\n} 
3 } } */
 /* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]+/z, 
\[x[0-9]+, x[0-9]+, lsl 3\]} 8 } } */
 /* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7]+, \[x[0-9]+, 
x[0-9]+, lsl 3\]} 8 } } */
-/* { dg-final { scan-assembler-times {\tincd\tx[0-9]+\n} 8 } } */
+/* 2 for the calculations of -17 and 17.  */
+/* { dg-final { scan-assembler-times {\tincd\tx[0-9

[committed][AArch64] Rework SVE INC/DEC handling

2019-08-15 Thread Richard Sandiford
The scalar addition patterns allowed all the VL constants that
ADDVL and ADDPL allow, but wrote the instructions as INC or DEC
if possible (i.e. adding or subtracting a number of elements * [1, 16]
when the source and target registers are the same).  That works for the
cases that the autovectoriser needs, but there are a few constants
that INC and DEC can handle but ADDPL and ADDVL can't.  E.g.:

inch x0, all, mul #9

is not a multiple of the number of bytes in an SVE register, and so
can't use ADDVL.  It represents 36 times the number of bytes in an
SVE predicate, putting it outside the range of ADDPL.
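
Spelling out that arithmetic with a hypothetical helper (illustration only,
not GCC code): INCH counts halfword elements, so for a vector of VL bytes
the amount added is 9 * (VL / 2) = 4.5 * VL, which is not a multiple of VL
(ADDVL's unit) and equals 36 * (VL / 8), outside ADDPL's [-32, 31] range:

/* Illustrative sketch only: the amount "inch x0, all, mul #9" adds to x0
   for a vector of VL bytes.  */
long
inch_amount (long vl)
{
  long halfwords_per_vector = vl / 2;
  return 9 * halfwords_per_vector;
}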

This patch therefore adds separate alternatives for INC and DEC,
tied to a new Uai constraint.  It also adds an explicit "scalar"
or "vector" to the function names, to avoid a clash with the
existing support for vector INC and DEC.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274518.

Richard


2019-08-15  Richard Sandiford  

gcc/
* config/aarch64/aarch64-protos.h
(aarch64_sve_scalar_inc_dec_immediate_p): Declare.
(aarch64_sve_inc_dec_immediate_p): Rename to...
(aarch64_sve_vector_inc_dec_immediate_p): ...this.
(aarch64_output_sve_addvl_addpl): Take a single rtx argument.
(aarch64_output_sve_scalar_inc_dec): Declare.
(aarch64_output_sve_inc_dec_immediate): Rename to...
(aarch64_output_sve_vector_inc_dec): ...this.
* config/aarch64/aarch64.c (aarch64_sve_scalar_inc_dec_immediate_p)
(aarch64_output_sve_scalar_inc_dec): New functions.
(aarch64_output_sve_addvl_addpl): Remove the base and offset
arguments.  Only handle true ADDVL and ADDPL instructions;
don't emit an INC or DEC.
(aarch64_sve_inc_dec_immediate_p): Rename to...
(aarch64_sve_vector_inc_dec_immediate_p): ...this.
(aarch64_output_sve_inc_dec_immediate): Rename to...
(aarch64_output_sve_vector_inc_dec): ...this.  Update call to
aarch64_sve_vector_inc_dec_immediate_p.
* config/aarch64/predicates.md (aarch64_sve_scalar_inc_dec_immediate)
(aarch64_sve_plus_immediate): New predicates.
(aarch64_pluslong_operand): Accept aarch64_sve_plus_immediate
rather than aarch64_sve_addvl_addpl_immediate.
(aarch64_sve_inc_dec_immediate): Rename to...
(aarch64_sve_vector_inc_dec_immediate): ...this.  Update call to
aarch64_sve_vector_inc_dec_immediate_p.
(aarch64_sve_add_operand): Update accordingly.
* config/aarch64/constraints.md (Uai): New constraint.
(vsi): Update call to aarch64_sve_vector_inc_dec_immediate_p.
* config/aarch64/aarch64.md (add3): Don't force the second
operand into a register if it satisfies aarch64_sve_plus_immediate.
(*add3_aarch64, *add3_poly_1): Add an alternative
for Uai.  Update calls to aarch64_output_sve_addvl_addpl.
* config/aarch64/aarch64-sve.md (add3): Call
aarch64_output_sve_vector_inc_dec instead of
aarch64_output_sve_inc_dec_immediate.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2019-08-15 09:22:03.039558220 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2019-08-15 09:47:06.552458841 +0100
@@ -476,8 +476,9 @@ bool aarch64_zero_extend_const_eq (machi
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
 bool aarch64_sve_cnt_immediate_p (rtx);
+bool aarch64_sve_scalar_inc_dec_immediate_p (rtx);
 bool aarch64_sve_addvl_addpl_immediate_p (rtx);
-bool aarch64_sve_inc_dec_immediate_p (rtx);
+bool aarch64_sve_vector_inc_dec_immediate_p (rtx);
 int aarch64_add_offset_temporaries (rtx);
 void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx);
 bool aarch64_mov_operand_p (rtx, machine_mode);
@@ -485,8 +486,9 @@ rtx aarch64_reverse_mask (machine_mode,
 bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64);
 bool aarch64_offset_9bit_signed_unscaled_p (machine_mode, poly_int64);
 char *aarch64_output_sve_cnt_immediate (const char *, const char *, rtx);
-char *aarch64_output_sve_addvl_addpl (rtx, rtx, rtx);
-char *aarch64_output_sve_inc_dec_immediate (const char *, rtx);
+char *aarch64_output_sve_scalar_inc_dec (rtx);
+char *aarch64_output_sve_addvl_addpl (rtx);
+char *aarch64_output_sve_vector_inc_dec (const char *, rtx);
 char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
 char *aarch64_output_simd_mov_immediate (rtx, unsigned,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-08-15 09:43:30.758050951 +0100
+++ gcc/config/aarch64/aarch64.c2019-08-15 09:47:06.560458781 +0100
@@ -2950,6 +2950,33 @@ aarch64_output_sve_cnt_immediate (cons

[committed][AArch64] Rework SVE REV[BHW] patterns

2019-08-15 Thread Richard Sandiford
The current SVE REV patterns follow the AArch64 scheme, in which
UNSPEC_REV reverses elements within an -bit granule.
E.g. UNSPEC_REV64 on VNx8HI reverses the four 16-bit elements
within each 64-bit granule.

The native SVE scheme is the other way around: UNSPEC_REV64 is seen
as an operation on 64-bit elements, with REVB swapping bytes within
the elements, REVH swapping halfwords, and so on.  This fits SVE more
naturally because the operation can then be predicated per -bit
granule/element.
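
As a bit-level illustration (a hypothetical helper, not GCC code, assuming
a little-endian register layout): REVH on 64-bit elements swaps the
halfwords within each element, which is the same byte permutation that the
Advanced SIMD description would call REV64 on 16-bit elements:

#include <stdint.h>

/* Illustrative sketch only: the effect of REVH on one 64-bit element.  */
static uint64_t
revh_d (uint64_t x)
{
  uint64_t h0 = (x >>  0) & 0xffff;
  uint64_t h1 = (x >> 16) & 0xffff;
  uint64_t h2 = (x >> 32) & 0xffff;
  uint64_t h3 = (x >> 48) & 0xffff;
  return (h0 << 48) | (h1 << 32) | (h2 << 16) | h3;
}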

Making the patterns use the Advanced SIMD scheme was more natural
when all we cared about were permutes, since we could then use
the source and target of the permute in their original modes.
However, the ACLE does need patterns that follow the native scheme,
treating them as operations on integer elements.  This patch defines
the patterns that way instead and updates the existing uses to match.

This also brings in a couple of helper routines from the ACLE branch.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274517.

Richard


2019-08-15  Richard Sandiford  

gcc/
* config/aarch64/iterators.md (UNSPEC_REVB, UNSPEC_REVH)
(UNSPEC_REVW): New constants.
(elem_bits): New mode attribute.
(SVE_INT_UNARY): New int iterator.
(optab): Handle UNSPEC_REV[BHW].
(sve_int_op): New int attribute.
(min_elem_bits): Handle VNx16QI and the predicate modes.
* config/aarch64/aarch64-sve.md (*aarch64_sve_rev64)
(*aarch64_sve_rev32, *aarch64_sve_rev16vnx16qi): Delete.
(@aarch64_pred_): New pattern.
* config/aarch64/aarch64.c (aarch64_sve_data_mode): New function.
(aarch64_sve_int_mode, aarch64_sve_rev_unspec): Likewise.
(aarch64_split_sve_subreg_move): Use UNSPEC_REV[BHW] instead of
unspecs based on the total width of the reversed data.
(aarch64_evpc_rev_local): Likewise (for SVE only).  Use a
reinterpret followed by a subreg on big-endian targets.

gcc/testsuite/
* gcc.target/aarch64/sve/revb_1.c: Restrict to little-endian targets.
Avoid including stdint.h.
* gcc.target/aarch64/sve/revh_1.c: Likewise.
* gcc.target/aarch64/sve/revw_1.c: Likewise.
* gcc.target/aarch64/sve/revb_2.c: New big-endian test.
* gcc.target/aarch64/sve/revh_2.c: Likewise.
* gcc.target/aarch64/sve/revw_2.c: Likewise.

Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2019-08-15 09:17:59.073360556 +0100
+++ gcc/config/aarch64/iterators.md 2019-08-15 09:41:46.174822592 +0100
@@ -476,6 +476,9 @@ (define_c_enum "unspec"
 UNSPEC_ANDF; Used in aarch64-sve.md.
 UNSPEC_IORF; Used in aarch64-sve.md.
 UNSPEC_XORF; Used in aarch64-sve.md.
+UNSPEC_REVB; Used in aarch64-sve.md.
+UNSPEC_REVH; Used in aarch64-sve.md.
+UNSPEC_REVW; Used in aarch64-sve.md.
 UNSPEC_SMUL_HIGHPART ; Used in aarch64-sve.md.
 UNSPEC_UMUL_HIGHPART ; Used in aarch64-sve.md.
 UNSPEC_COND_FABS   ; Used in aarch64-sve.md.
@@ -638,7 +641,10 @@ (define_mode_attr sizem1 [(QI "#7") (HI
 
 ;; The number of bits in a vector element, or controlled by a predicate
 ;; element.
-(define_mode_attr elem_bits [(VNx8HI "16") (VNx4SI "32") (VNx2DI "64")
+(define_mode_attr elem_bits [(VNx16BI "8") (VNx8BI "16")
+(VNx4BI "32") (VNx2BI "64")
+(VNx16QI "8") (VNx8HI "16")
+(VNx4SI "32") (VNx2DI "64")
 (VNx8HF "16") (VNx4SF "32") (VNx2DF "64")])
 
 ;; Attribute to describe constants acceptable in logical operations
@@ -1677,6 +1683,8 @@ (define_int_iterator UNPACK_UNSIGNED [UN
 
 (define_int_iterator MUL_HIGHPART [UNSPEC_SMUL_HIGHPART UNSPEC_UMUL_HIGHPART])
 
+(define_int_iterator SVE_INT_UNARY [UNSPEC_REVB UNSPEC_REVH UNSPEC_REVW])
+
 (define_int_iterator SVE_INT_REDUCTION [UNSPEC_ANDV
UNSPEC_IORV
UNSPEC_SMAXV
@@ -1777,6 +1785,9 @@ (define_int_attr optab [(UNSPEC_ANDF "an
(UNSPEC_ANDV "and")
(UNSPEC_IORV "ior")
(UNSPEC_XORV "xor")
+   (UNSPEC_REVB "revb")
+   (UNSPEC_REVH "revh")
+   (UNSPEC_REVW "revw")
(UNSPEC_UMAXV "umax")
(UNSPEC_UMINV "umin")
(UNSPEC_SMAXV "smax")
@@ -2045,7 +2056,10 @@ (define_int_attr sve_int_op [(UNSPEC_AND
 (UNSPEC_UMAXV "umaxv")
 (UNSPEC_UMINV "uminv")
 (UNSPEC_SMAXV "smaxv")
-(UNSPEC_SMINV "sminv")])
+(UNSPEC_SMINV "sminv")
+   

[committed][AArch64] Add more SVE FMLA and FMAD /z alternatives

2019-08-15 Thread Richard Sandiford
This patch makes the floating-point conditional FMA patterns provide the
same /z alternatives as the integer patterns added by a previous patch.
We can handle cases in which individual inputs are allocated to the same
register as the output, so we don't need to force all registers to be
different.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274516.

Richard


2019-08-15  Richard Sandiford  
Kugan Vivekanandarajah  

gcc/
* config/aarch64/aarch64-sve.md
(*cond__any): Add /z
alternatives in which one of the inputs is in the same register
as the output.

gcc/testsuite/
* gcc.target/aarch64/sve/cond_mla_5.c: Allow FMAD as well as FMLA
and FMSB as well as FMLS.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:37:10.528856480 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:38:53.656095524 +0100
@@ -3844,17 +3844,17 @@ (define_insn_and_rewrite "*cond_<
 ;; Predicated floating-point ternary operations, merging with an
 ;; independent value.
 (define_insn_and_rewrite "*cond__any"
-  [(set (match_operand:SVE_F 0 "register_operand" "=&w, &w, ?&w")
+  [(set (match_operand:SVE_F 0 "register_operand" "=&w, &w, &w, &w, &w, ?&w")
(unspec:SVE_F
- [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
+ [(match_operand: 1 "register_operand" "Upl, Upl, Upl, Upl, 
Upl, Upl")
   (unspec:SVE_F
 [(match_operand 6)
  (match_operand:SI 7 "aarch64_sve_gp_strictness")
- (match_operand:SVE_F 2 "register_operand" "w, w, w")
- (match_operand:SVE_F 3 "register_operand" "w, w, w")
- (match_operand:SVE_F 4 "register_operand" "w, w, w")]
+ (match_operand:SVE_F 2 "register_operand" "w, w, 0, w, w, w")
+ (match_operand:SVE_F 3 "register_operand" "w, w, w, 0, w, w")
+ (match_operand:SVE_F 4 "register_operand" "w, 0, w, w, w, w")]
 SVE_COND_FP_TERNARY)
-  (match_operand:SVE_F 5 "aarch64_simd_reg_or_zero" "Dz, 0, w")]
+  (match_operand:SVE_F 5 "aarch64_simd_reg_or_zero" "Dz, Dz, Dz, Dz, 
0, w")]
  UNSPEC_SEL))]
   "TARGET_SVE
&& !rtx_equal_p (operands[2], operands[5])
@@ -3863,6 +3863,9 @@ (define_insn_and_rewrite "*cond_<
&& aarch64_sve_pred_dominates_p (&operands[6], operands[1])"
   "@
movprfx\t%0., %1/z, %4.\;\t%0., %1/m, 
%2., %3.
+   movprfx\t%0., %1/z, %0.\;\t%0., %1/m, 
%2., %3.
+   movprfx\t%0., %1/z, %0.\;\t%0., %1/m, 
%3., %4.
+   movprfx\t%0., %1/z, %0.\;\t%0., %1/m, 
%2., %4.
movprfx\t%0., %1/m, %4.\;\t%0., %1/m, 
%2., %3.
#"
   "&& 1"
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_mla_5.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/cond_mla_5.c   2019-08-15 
09:22:03.047558159 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_mla_5.c   2019-08-15 
09:38:53.656095524 +0100
@@ -39,13 +39,13 @@ TEST_ALL (DEF_LOOP)
 /* { dg-final { scan-assembler-times {\t(?:mls|msb)\tz[0-9]+\.s, p[0-7]/m,} 1 
} } */
 /* { dg-final { scan-assembler-times {\t(?:mls|msb)\tz[0-9]+\.d, p[0-7]/m,} 1 
} } */
 
-/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.h, p[0-7]/m,} 1 } } */
-/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
-/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\t(?:fmla|fmad)\tz[0-9]+\.h, p[0-7]/m,} 
1 } } */
+/* { dg-final { scan-assembler-times {\t(?:fmla|fmad)\tz[0-9]+\.s, p[0-7]/m,} 
1 } } */
+/* { dg-final { scan-assembler-times {\t(?:fmla|fmad)\tz[0-9]+\.d, p[0-7]/m,} 
1 } } */
 
-/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.h, p[0-7]/m,} 1 } } */
-/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
-/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\t(?:fmls|fmsb)\tz[0-9]+\.h, p[0-7]/m,} 
1 } } */
+/* { dg-final { scan-assembler-times {\t(?:fmls|fmsb)\tz[0-9]+\.s, p[0-7]/m,} 
1 } } */
+/* { dg-final { scan-assembler-times {\t(?:fmls|fmsb)\tz[0-9]+\.d, p[0-7]/m,} 
1 } } */
 
 /* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-7]/z,} 2 } } 
*/
 /* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z,} 4 } } 
*/


[committed][AArch64] Add MOVPRFX alternatives for SVE EXT patterns

2019-08-15 Thread Richard Sandiford
We use EXT both to implement vec_extract for large indices and as a
permute.  In both cases we can use MOVPRFX to handle the case in which
the first input and output can't be tied.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274515.

Richard


2019-08-15  Richard Sandiford  

gcc/
* config/aarch64/aarch64-sve.md (*vec_extract_ext)
(*aarch64_sve_ext): Add MOVPRFX alternatives.

gcc/testsuite/
* gcc.target/aarch64/sve/ext_2.c: Expect a MOVPRFX.
* gcc.target/aarch64/sve/ext_3.c: New test.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:34:37.293987611 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:36:18.953237055 +0100
@@ -1356,16 +1356,19 @@ (define_insn "*vec_extract_du
 ;; Extract an element outside the range of DUP.  This pattern requires the
 ;; source and destination to be the same.
 (define_insn "*vec_extract_ext"
-  [(set (match_operand: 0 "register_operand" "=w")
+  [(set (match_operand: 0 "register_operand" "=w, ?&w")
(vec_select:
- (match_operand:SVE_ALL 1 "register_operand" "0")
+ (match_operand:SVE_ALL 1 "register_operand" "0, w")
  (parallel [(match_operand:SI 2 "const_int_operand")])))]
   "TARGET_SVE && INTVAL (operands[2]) * GET_MODE_SIZE (mode) >= 64"
   {
 operands[0] = gen_rtx_REG (mode, REGNO (operands[0]));
 operands[2] = GEN_INT (INTVAL (operands[2]) * GET_MODE_SIZE (mode));
-return "ext\t%0.b, %0.b, %0.b, #%2";
+return (which_alternative == 0
+   ? "ext\t%0.b, %0.b, %0.b, #%2"
+   : "movprfx\t%0, %1\;ext\t%0.b, %0.b, %1.b, #%2");
   }
+  [(set_attr "movprfx" "*,yes")]
 )
 
 ;; -
@@ -4700,17 +4703,20 @@ (define_insn "aarch64_sve_"
-  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
-   (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "0")
-(match_operand:SVE_ALL 2 "register_operand" "w")
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w, ?&w")
+   (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "0, w")
+(match_operand:SVE_ALL 2 "register_operand" "w, w")
 (match_operand:SI 3 "const_int_operand")]
UNSPEC_EXT))]
   "TARGET_SVE
&& IN_RANGE (INTVAL (operands[3]) * GET_MODE_SIZE (mode), 0, 255)"
   {
 operands[3] = GEN_INT (INTVAL (operands[3]) * GET_MODE_SIZE (mode));
-return "ext\\t%0.b, %0.b, %2.b, #%3";
+return (which_alternative == 0
+   ? "ext\\t%0.b, %0.b, %2.b, #%3"
+   : "movprfx\t%0, %1\;ext\\t%0.b, %0.b, %2.b, #%3");
   }
+  [(set_attr "movprfx" "*,yes")]
 )
 
 ;; -
Index: gcc/testsuite/gcc.target/aarch64/sve/ext_2.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/ext_2.c2019-03-08 
18:14:29.776994751 +
+++ gcc/testsuite/gcc.target/aarch64/sve/ext_2.c2019-08-15 
09:36:18.953237055 +0100
@@ -14,5 +14,4 @@ foo (void)
   asm volatile ("" :: "w" (x));
 }
 
-/* { dg-final { scan-assembler {\tmov\tz0\.d, z1\.d\n} } } */
-/* { dg-final { scan-assembler {\text\tz0\.b, z0\.b, z[01]\.b, #4\n} } } */
+/* { dg-final { scan-assembler {\tmovprfx\tz0, z1\n\text\tz0\.b, z0\.b, z1\.b, 
#4\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/ext_3.c
===
--- /dev/null   2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/ext_3.c2019-08-15 
09:36:18.953237055 +0100
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=1024" } */
+
+typedef int vnx4si __attribute__((vector_size (128)));
+
+void
+foo (void)
+{
+  register int x asm ("z0");
+  register vnx4si y asm ("z1");
+
+  asm volatile ("" : "=w" (y));
+  x = y[21];
+  asm volatile ("" :: "w" (x));
+}
+
+/* { dg-final { scan-assembler {\tmovprfx\tz0, z1\n\text\tz0\.b, z0\.b, z1\.b, 
#84\n} } } */


[committed][AArch64] Remove unneeded FSUB alternatives and add a new one

2019-08-15 Thread Richard Sandiford
The floating-point subtraction patterns don't need to handle
subtraction of constants, since those go through the addition
patterns instead.  There was a missing MOVPRFX alternative for
FSUBR though.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274514.

Richard


2019-08-15  Richard Sandiford  

gcc/
* config/aarch64/aarch64-sve.md (*sub3): Remove immediate
FADD and FSUB alternatives.  Add a MOVPRFX alternative for FSUBR.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:32:03.211125428 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:33:35.554443513 +0100
@@ -2878,34 +2878,31 @@ (define_insn_and_rewrite "*cond_add3"
-  [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w, w")
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, w, ?&w")
(unspec:SVE_F
- [(match_operand: 1 "register_operand" "Upl, Upl, Upl, Upl")
-  (match_operand:SI 4 "aarch64_sve_gp_strictness" "i, i, i, Z")
-  (match_operand:SVE_F 2 "aarch64_sve_float_arith_operand" "0, 0, vsA, 
w")
-  (match_operand:SVE_F 3 "aarch64_sve_float_arith_with_sub_operand" 
"vsA, vsN, 0, w")]
+ [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
+  (match_operand:SI 4 "aarch64_sve_gp_strictness" "i, Z, i")
+  (match_operand:SVE_F 2 "aarch64_sve_float_arith_operand" "vsA, w, 
vsA")
+  (match_operand:SVE_F 3 "register_operand" "0, w, 0")]
  UNSPEC_COND_FSUB))]
-  "TARGET_SVE
-   && (register_operand (operands[2], mode)
-   || register_operand (operands[3], mode))"
+  "TARGET_SVE"
   "@
-   fsub\t%0., %1/m, %0., #%3
-   fadd\t%0., %1/m, %0., #%N3
fsubr\t%0., %1/m, %0., #%2
-   #"
+   #
+   movprfx\t%0, %3\;fsubr\t%0., %1/m, %0., #%2"
   ; Split the unpredicated form after reload, so that we don't have
   ; the unnecessary PTRUE.
   "&& reload_completed
-   && register_operand (operands[2], mode)
-   && register_operand (operands[3], mode)"
+   && register_operand (operands[2], mode)"
   [(set (match_dup 0) (minus:SVE_F (match_dup 2) (match_dup 3)))]
+  ""
+  [(set_attr "movprfx" "*,*,yes")]
 )
 
 ;; Predicated floating-point subtraction from a constant, merging with the


[committed][AArch64] Add more unpredicated MOVPRFX alternatives

2019-08-15 Thread Richard Sandiford
FABD and some immediate instructions were missing MOVPRFX alternatives.
This is tested by the ACLE patches but is really an independent improvement.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274513.

Richard


2019-08-15  Richard Sandiford  
Kugan Vivekanandarajah  

gcc/
* config/aarch64/aarch64-sve.md (add3, sub3)
(3, *add3, *mul3)
(*fabd3): Add more MOVPRFX alternatives.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:29:08.156418215 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:31:07.711535282 +0100
@@ -1937,16 +1937,19 @@ (define_insn_and_rewrite "*cond_<
 ;; -
 
 (define_insn "add3"
-  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w, w")
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w, ?w, ?w, w")
(plus:SVE_I
- (match_operand:SVE_I 1 "register_operand" "%0, 0, 0, w")
- (match_operand:SVE_I 2 "aarch64_sve_add_operand" "vsa, vsn, vsi, 
w")))]
+ (match_operand:SVE_I 1 "register_operand" "%0, 0, 0, w, w, w")
+ (match_operand:SVE_I 2 "aarch64_sve_add_operand" "vsa, vsn, vsi, vsa, 
vsn, w")))]
   "TARGET_SVE"
   "@
add\t%0., %0., #%D2
sub\t%0., %0., #%N2
* return aarch64_output_sve_inc_dec_immediate (\"%0.\", 
operands[2]);
+   movprfx\t%0, %1\;add\t%0., %0., #%D2
+   movprfx\t%0, %1\;sub\t%0., %0., #%N2
add\t%0., %1., %2."
+  [(set_attr "movprfx" "*,*,*,yes,yes,*")]
 )
 
 ;; Merging forms are handled through SVE_INT_BINARY.
@@ -1960,14 +1963,16 @@ (define_insn "add3"
 ;; -
 
 (define_insn "sub3"
-  [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, ?&w")
(minus:SVE_I
- (match_operand:SVE_I 1 "aarch64_sve_arith_operand" "w, vsa")
- (match_operand:SVE_I 2 "register_operand" "w, 0")))]
+ (match_operand:SVE_I 1 "aarch64_sve_arith_operand" "w, vsa, vsa")
+ (match_operand:SVE_I 2 "register_operand" "w, 0, w")))]
   "TARGET_SVE"
   "@
sub\t%0., %1., %2.
-   subr\t%0., %0., #%D1"
+   subr\t%0., %0., #%D1
+   movprfx\t%0, %2\;subr\t%0., %0., #%D1"
+  [(set_attr "movprfx" "*,*,yes")]
 )
 
 ;; Merging forms are handled through SVE_INT_BINARY.
@@ -2320,14 +2325,16 @@ (define_insn_and_rewrite "*cond_<
 
 ;; Unpredicated integer binary logical operations.
 (define_insn "3"
-  [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?w, w")
(LOGICAL:SVE_I
- (match_operand:SVE_I 1 "register_operand" "%0, w")
- (match_operand:SVE_I 2 "aarch64_sve_logical_operand" "vsl, w")))]
+ (match_operand:SVE_I 1 "register_operand" "%0, w, w")
+ (match_operand:SVE_I 2 "aarch64_sve_logical_operand" "vsl, vsl, w")))]
   "TARGET_SVE"
   "@
\t%0., %0., #%C2
+   movprfx\t%0, %1\;\t%0., %0., #%C2
\t%0.d, %1.d, %2.d"
+  [(set_attr "movprfx" "*,yes,*")]
 )
 
 ;; Merging forms are handled through SVE_INT_BINARY.
@@ -2773,23 +2780,27 @@ (define_insn_and_rewrite "*cond_<
 
 ;; Predicated floating-point addition.
 (define_insn_and_split "*add3"
-  [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w")
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w, ?&w, ?&w")
(unspec:SVE_F
- [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
-  (match_operand:SI 4 "aarch64_sve_gp_strictness" "i, i, Z")
-  (match_operand:SVE_F 2 "register_operand" "%0, 0, w")
-  (match_operand:SVE_F 3 "aarch64_sve_float_arith_with_sub_operand" 
"vsA, vsN, w")]
+ [(match_operand: 1 "register_operand" "Upl, Upl, Upl, Upl, 
Upl")
+  (match_operand:SI 4 "aarch64_sve_gp_strictness" "i, i, Z, i, i")
+  (match_operand:SVE_F 2 "register_operand" "%0, 0, w, w, w")
+  (match_operand:SVE_F 3 "aarch64_sve_float_arith_with_sub_operand" 
"vsA, vsN, w, vsA, vsN")]
  UNSPEC_COND_FADD))]
   "TARGET_SVE"
   "@
fadd\t%0., %1/m, %0., #%3
fsub\t%0., %1/m, %0., #%N3
-   #"
+   #
+   movprfx\t%0, %2\;fadd\t%0., %1/m, %0., #%3
+   movprfx\t%0, %2\;fsub\t%0., %1/m, %0., #%N3"
   ; Split the unpredicated form after reload, so that we don't have
   ; the unnecessary PTRUE.
   "&& reload_completed
&& register_operand (operands[3], mode)"
   [(set (match_dup 0) (plus:SVE_F (match_dup 2) (match_dup 3)))]
+  ""
+  [(set_attr "movprfx" "*,*,*,yes,yes")]
 )
 
 ;; Predicated floating-point addition of a constant, merging with the
@@ -2972,23 +2983,26 @@ (define_insn_and_rewrite "*cond_sub3"
-  [(set (match_operand:SVE_F 0 "register_operand" "=w")
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, ?&w")
(unspec:SVE_F
- [(match_operand: 1 "register_op

[committed][AArch64] Use SVE reversed shifts in preference to MOVPRFX

2019-08-15 Thread Richard Sandiford
This patch makes us use reversed SVE shifts when the first operand
can't be tied to the output but the second can.  This is tested
more thoroughly by the ACLE patches but is really an independent
improvement.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274512.

Richard


2019-08-15  Richard Sandiford  
Prathamesh Kulkarni  

gcc/
* config/aarch64/aarch64-sve.md (*v3):
Add an alternative that uses reversed shifts.

gcc/testsuite/
* gcc.target/aarch64/sve/shift_1.c: Accept reversed shifts.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:25:43.333930987 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:27:49.844996586 +0100
@@ -2455,23 +2455,24 @@ (define_expand "v3"
 ;; likely to gain much and would make the instruction seem less uniform
 ;; to the register allocator.
 (define_insn_and_split "*v3"
-  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, ?&w")
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w, ?&w")
(unspec:SVE_I
- [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
+ [(match_operand: 1 "register_operand" "Upl, Upl, Upl, Upl")
   (ASHIFT:SVE_I
-(match_operand:SVE_I 2 "register_operand" "w, 0, w")
-(match_operand:SVE_I 3 "aarch64_sve_shift_operand" "D, w, 
w"))]
+(match_operand:SVE_I 2 "register_operand" "w, 0, w, w")
+(match_operand:SVE_I 3 "aarch64_sve_shift_operand" "D, w, 
0, w"))]
  UNSPEC_PRED_X))]
   "TARGET_SVE"
   "@
#
\t%0., %1/m, %0., %3.
+   r\t%0., %1/m, %3., %2.
movprfx\t%0, %2\;\t%0., %1/m, %0., %3."
   "&& reload_completed
&& !register_operand (operands[3], mode)"
   [(set (match_dup 0) (ASHIFT:SVE_I (match_dup 2) (match_dup 3)))]
   ""
-  [(set_attr "movprfx" "*,*,yes")]
+  [(set_attr "movprfx" "*,*,*,yes")]
 )
 
 ;; Unpredicated shift operations by a constant (post-RA only).
Index: gcc/testsuite/gcc.target/aarch64/sve/shift_1.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/shift_1.c  2019-03-08 
18:14:29.784994721 +
+++ gcc/testsuite/gcc.target/aarch64/sve/shift_1.c  2019-08-15 
09:27:49.844996586 +0100
@@ -75,9 +75,9 @@ DO_IMMEDIATE_OPS (63, int64_t, 63);
 /* { dg-final { scan-assembler-times {\tlsr\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, 
z[0-9]+\.s\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tlsl\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, 
z[0-9]+\.s\n} 2 } } */
 
-/* { dg-final { scan-assembler-times {\tasr\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, 
z[0-9]+\.d\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tlsr\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, 
z[0-9]+\.d\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tlsl\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, 
z[0-9]+\.d\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tasrr?\tz[0-9]+\.d, p[0-7]/m, 
z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tlsrr?\tz[0-9]+\.d, p[0-7]/m, 
z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tlslr?\tz[0-9]+\.d, p[0-7]/m, 
z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
 
 /* { dg-final { scan-assembler-times {\tasr\tz[0-9]+\.b, z[0-9]+\.b, #5\n} 1 } 
} */
 /* { dg-final { scan-assembler-times {\tlsr\tz[0-9]+\.b, z[0-9]+\.b, #5\n} 1 } 
} */


[committed][AArch64] Add a commutativity marker to the SVE [SU]ABD patterns

2019-08-15 Thread Richard Sandiford
This will be tested by the ACLE patches, but it's really an
independent improvement.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274510.

Richard


2019-08-15  Richard Sandiford  

gcc/
* config/aarch64/aarch64-sve.md (aarch64_abd_3): Add
a commutativity marker.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:22:03.043558190 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:25:18.602113663 +0100
@@ -2060,7 +2060,7 @@ (define_insn "aarch64_abd_3"
  [(match_operand: 1 "register_operand" "Upl, Upl")
   (minus:SVE_I
 (USMAX:SVE_I
-  (match_operand:SVE_I 2 "register_operand" "0, w")
+  (match_operand:SVE_I 2 "register_operand" "%0, w")
   (match_operand:SVE_I 3 "register_operand" "w, w"))
 (:SVE_I
   (match_dup 2)


[committed][AArch64] Use SVE MLA, MLS, MAD and MSB for conditional arithmetic

2019-08-15 Thread Richard Sandiford
This patch uses predicated MLA, MLS, MAD and MSB to implement
conditional "FMA"s on integers.  This also requires providing
the unpredicated optabs (fma and fnma) since otherwise
tree-ssa-math-opts.c won't try to use the conditional forms.
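
A rough sketch of the kind of loop this targets (invented for illustration;
the new cond_mla tests cover many more type/operation combinations):

/* Illustrative sketch only: a conditional integer multiply-add whose
   else-value is the accumulator, so it can map to a predicated MLA/MAD.  */
void
f (int *restrict r, int *restrict a, int *restrict b,
   int *restrict c, int *restrict pred, int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = pred[i] ? a[i] + b[i] * c[i] : a[i];
}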

We still want to use shifts and adds in preference to multiplications,
so the patch makes the optab expanders check for that.

The tests cover floating-point types too, which are already handled,
and which were already tested to some extent by gcc.dg/vect.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274509.

Richard


2019-08-15  Richard Sandiford  
Kugan Vivekanandarajah  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_prepare_sve_int_fma)
(aarch64_prepare_sve_cond_int_fma): Declare.
* config/aarch64/aarch64.c (aarch64_convert_mult_to_shift)
(aarch64_prepare_sve_int_fma): New functions.
(aarch64_prepare_sve_cond_int_fma): Likewise.
* config/aarch64/aarch64-sve.md
(cond_): Add a "@" marker.
(fma4, cond_fma, *cond_fma_2)
(*cond_fma_4, *cond_fma_any, fnma4)
(cond_fnma, *cond_fnma_2)
(*cond_fnma_4, *cond_fnma_any): New patterns.
(*madd): Rename to...
(*fma4): ...this.
(*msub): Rename to...
(*fnma4): ...this.

gcc/testsuite/
* gcc.target/aarch64/sve/cond_mla_1.c: New test.
* gcc.target/aarch64/sve/cond_mla_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_2.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_3.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_4.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_5.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_5_run.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_6.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_6_run.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_7.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_7_run.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_8.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_8_run.c: Likewise.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2019-08-15 09:20:26.0 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2019-08-15 09:20:53.712070358 +0100
@@ -630,6 +630,9 @@ bool aarch64_gen_adjusted_ldpstp (rtx *,
 void aarch64_expand_sve_vec_cmp_int (rtx, rtx_code, rtx, rtx);
 bool aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
 void aarch64_expand_sve_vcond (machine_mode, machine_mode, rtx *);
+
+bool aarch64_prepare_sve_int_fma (rtx *, rtx_code);
+bool aarch64_prepare_sve_cond_int_fma (rtx *, rtx_code);
 #endif /* RTX_CODE */
 
 void aarch64_init_builtins (void);
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-08-15 09:20:26.0 +0100
+++ gcc/config/aarch64/aarch64.c2019-08-15 09:20:53.712070358 +0100
@@ -16469,6 +16469,98 @@ aarch64_sve_expand_vector_init (rtx targ
 aarch64_sve_expand_vector_init_insert_elems (target, v, nelts);
 }
 
+/* Check whether VALUE is a vector constant in which every element
+   is either a power of 2 or a negated power of 2.  If so, return
+   a constant vector of log2s, and flip CODE between PLUS and MINUS
+   if VALUE contains negated powers of 2.  Return NULL_RTX otherwise.  */
+
+static rtx
+aarch64_convert_mult_to_shift (rtx value, rtx_code &code)
+{
+  if (GET_CODE (value) != CONST_VECTOR)
+return NULL_RTX;
+
+  rtx_vector_builder builder;
+  if (!builder.new_unary_operation (GET_MODE (value), value, false))
+return NULL_RTX;
+
+  scalar_mode int_mode = GET_MODE_INNER (GET_MODE (value));
+  /* 1 if the result of the multiplication must be negated,
+ 0 if it mustn't, or -1 if we don't yet care.  */
+  int negate = -1;
+  unsigned int encoded_nelts = const_vector_encoded_nelts (value);
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
+{
+  rtx elt = CONST_VECTOR_ENCODED_ELT (value, i);
+  if (!CONST_SCALAR_INT_P (elt))
+   return NULL_RTX;
+  rtx_mode_t val (elt, int_mode);
+  wide_int pow2 = wi::neg (val);
+  if (val != pow2)
+   {
+ /* It matters whether we negate or not.  Make that choice,
+and make sure that it's consistent with previous elements.  */
+ if (negate == !wi::neg_p (val))
+   return NULL_RTX;
+ negate = wi::neg_p (val);
+ if (!negate)
+   pow2 = val;
+   }
+  /* POW2 is now the value that we want to be a power of 2.  */
+  int shift = wi::exact_log2 (pow2);
+  if (shift < 0)
+   return NULL_RTX;
+  builder.quick_push (gen_int_mode (s

[committed][AArch64] Use SVE binary immediate instructions for conditional arithmetic

2019-08-15 Thread Richard Sandiford
This patch lets us use the immediate forms of FADD, FSUB, FSUBR,
FMUL, FMAXNM and FMINNM for conditional arithmetic.  (We already
use them for normal unconditional arithmetic.)
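
A rough sketch of the kind of loop this helps (invented for illustration,
not one of the new tests):

/* Illustrative sketch only: a conditional add of 1.0, which fits FADD's
   immediate form, with the first input as the else-value.  */
void
f (double *restrict r, double *restrict a, int *restrict pred, int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = pred[i] ? a[i] + 1.0 : a[i];
}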

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274508.

Richard


2019-08-15  Richard Sandiford  
Kugan Vivekanandarajah  

gcc/
* config/aarch64/aarch64.c (aarch64_print_vector_float_operand):
Print 2.0 naturally.
(aarch64_sve_float_mul_immediate_p): Return true for 2.0.
* config/aarch64/predicates.md
(aarch64_sve_float_negated_arith_immediate): New predicate,
renamed from aarch64_sve_float_arith_with_sub_immediate.
(aarch64_sve_float_arith_with_sub_immediate): Test for both
positive and negative constants.
(aarch64_sve_float_arith_with_sub_operand): Redefine as a register
or an aarch64_sve_float_arith_with_sub_immediate.
* config/aarch64/constraints.md (vsN): Use
aarch64_sve_float_negated_arith_immediate.
* config/aarch64/iterators.md (SVE_COND_FP_BINARY_I1): New int
iterator.
(sve_pred_fp_rhs2_immediate): New int attribute.
* config/aarch64/aarch64-sve.md
(cond_): Use
sve_pred_fp_rhs1_operand and sve_pred_fp_rhs2_operand.
(*cond__2_const)
(*cond__any_const)
(*cond_add_2_const, *cond_add_any_const)
(*cond_sub_3_const, *cond_sub_any_const): New patterns.

gcc/testsuite/
* gcc.target/aarch64/sve/cond_fadd_1.c: New test.
* gcc.target/aarch64/sve/cond_fadd_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fadd_2.c: Likewise.
* gcc.target/aarch64/sve/cond_fadd_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fadd_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fadd_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fadd_4.c: Likewise.
* gcc.target/aarch64/sve/cond_fadd_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_1.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_2.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_4.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_1.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_2.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_4.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_1.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_2.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_4.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_1.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_2.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_4.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_4_run.c: Likewise.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-08-15 09:16:03.0 +0100
+++ gcc/config/aarch64/aarch64.c2019-08-15 09:16:20.522088713 +0100
@@ -8289,6 +8289,8 @@ aarch64_print_vector_float_operand (FILE
  fixed form in the assembly syntax.  */
   if (real_equal (&r, &dconst0))
 asm_fprintf (f, "0.0");
+  else if (real_equal (&r, &dconst2))
+asm_fprintf (f, "2.0");
   else if (real_equal (&r, &dconst1))
 asm_fprintf (f, "1.0");
   else if (real_equal (&r, &dconsthalf))
@@ -15205,11 +15207,10 @@ aarch64_sve_float_mul_immediate_p (rtx x
 {
   rtx elt;
 
-  /* GCC will never generate a multiply with an immediate of 2, so there is no
- point testing for it (even though it is a valid constant).  */
   return (const_vec_duplicate_p (x, &elt)
  && GET_CODE (elt) == CONST_DOUBLE
- && real_equal (CONST_DOUBLE_REAL_VALUE (elt), &dconsthalf));
+ && (real_equal (CONST_DOUBLE_REAL_VALUE (elt), &dconsthalf)
+ 

use __builtin_alloca, drop non-standard alloca.h

2019-08-15 Thread Alexandre Oliva
Since alloca.h is not ISO C, most of our alloca-using tests seem to
rely on __builtin_alloca instead of including the header and calling
alloca.  This patch extends this practice to some of the exceptions I
found in gcc.target, marking them as requiring a functional alloca
while at that.

Tested on x86_64-linux-gnu, and manually compile-tested the non-x86
tests.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.target/arc/interrupt-6.c: Use __builtin_alloca, require
effective target support for alloca, drop include of alloca.h.
* gcc.target/i386/pr80969-3.c: Likewise.
* gcc.target/sparc/setjmp-1.c: Likewise.
* gcc.target/x86_64/abi/ms-sysv/gen.cc: Likewise.
* gcc.target/x86_64/abi/ms-sysv/ms-sysv.c: Likewise.
---
 gcc/testsuite/gcc.target/arc/interrupt-6.c |5 ++---
 gcc/testsuite/gcc.target/i386/pr80969-3.c  |5 ++---
 gcc/testsuite/gcc.target/sparc/setjmp-1.c  |4 ++--
 gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc |2 +-
 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c|2 +-
 5 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arc/interrupt-6.c 
b/gcc/testsuite/gcc.target/arc/interrupt-6.c
index d82bd67edd83..9cb0565f55c9 100644
--- a/gcc/testsuite/gcc.target/arc/interrupt-6.c
+++ b/gcc/testsuite/gcc.target/arc/interrupt-6.c
@@ -1,8 +1,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if "Not available for ARCv1" { arc700 || arc6xx } } */
 /* { dg-options "-O2 -mirq-ctrl-saved=r0-ilink" } */
-
-#include 
+/* { dg-require-effective-target alloca } */
 
 /* Check if ilink is recognized. Check how FP and BLINK are saved.
BLINK is saved last on the stack because the IRQ autosave will do
@@ -14,7 +13,7 @@ extern int bar (void *);
 void  __attribute__ ((interrupt("ilink")))
 foo(void)
 {
-  int *p = alloca (10);
+  int *p = __builtin_alloca (10);
   bar (p);
 }
 /* { dg-final { scan-assembler-not ".*fp,\\\[sp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c 
b/gcc/testsuite/gcc.target/i386/pr80969-3.c
index d902a771cc8f..318e06cd94c6 100644
--- a/gcc/testsuite/gcc.target/i386/pr80969-3.c
+++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c
@@ -2,11 +2,10 @@
 /* { dg-do compile { target { { ! x32 } && { ! avx512f_runtime } } } } */
 /* { dg-options "-Ofast -mabi=ms -mavx512f" } */
 /* { dg-require-effective-target avx512f } */
+/* { dg-require-effective-target alloca } */
 
 /* Test with alloca (and DRAP).  */
 
-#include 
-
 int a[56];
 volatile int b = -12345;
 volatile const int d = 42;
@@ -19,7 +18,7 @@ void (*volatile const foo_noinfo)(int *, int, int) = foo;
 
 int main (int argc, char *argv[]) {
   int c;
-  int *e = alloca (d);
+  int *e = __builtin_alloca (d);
   foo_noinfo (e, d, 0);
   for (; b; b++) {
 c = b;
diff --git a/gcc/testsuite/gcc.target/sparc/setjmp-1.c 
b/gcc/testsuite/gcc.target/sparc/setjmp-1.c
index d0fecb363270..699d7f7b8ff4 100644
--- a/gcc/testsuite/gcc.target/sparc/setjmp-1.c
+++ b/gcc/testsuite/gcc.target/sparc/setjmp-1.c
@@ -4,9 +4,9 @@
 /* { dg-do run { target *-*-solaris2.* *-*-linux* *-*-*bsd* } } */
 /* { dg-require-effective-target fpic } */
 /* { dg-options "-fPIC" } */
+/* { dg-require-effective-target alloca } */
 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -26,7 +26,7 @@ int main (void)
 {
   setjmp (jb);
 
-  char *p = alloca (256);
+  char *p = __builtin_alloca (256);
   memset (p, 0, 256);
   sprintf (p, "%d\n", foo);
 
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc 
b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc
index 701531480f62..818a8875a6dd 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc
+++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc
@@ -291,7 +291,7 @@ void fn::print_def (ostream &out) const
   if (get_msabi () && get_alloca ())
 {
   const char *size_str = m_args.empty () ? "42" : "a";
-  out << "  void *alloca_mem = alloca (8 + " << size_str << ");" << endl
+  out << "  void *alloca_mem = __builtin_alloca (8 + " << size_str << ");" 
<< endl
  << "  *(long*)alloca_mem = FLAG_ALLOCA;" << endl;
 }
   if (get_msabi () && get_varargs ())
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c 
b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c
index 5fdd1e20674b..abfcee6f56a4 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c
+++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c
@@ -49,6 +49,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 /* { dg-do run } */
 /* { dg-additional-sources "do-test.S" } */
 /* { dg-additional-options "-Wall" } */
+/* { dg-require-effective-target alloca } */
 
 #include 
 #include 
@@ -56,7 +57,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change,

[committed][AArch64] Use SVE FABD in conditional arithmetic

2019-08-15 Thread Richard Sandiford
This patch extends the FABD support so that it handles conditional
arithmetic.  We're relying on combine for this, since there's no
associated IFN_COND_* (yet?).
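
A rough sketch of the kind of loop this targets (invented for illustration,
not one of the new cond_fabd tests):

/* Illustrative sketch only: a conditional floating-point absolute
   difference whose else-value is one of the inputs, so it can map to a
   predicated FABD.  */
void
f (double *restrict r, double *restrict a, double *restrict b,
   int *restrict pred, int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = pred[i] ? __builtin_fabs (a[i] - b[i]) : a[i];
}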

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274507.

Richard


2019-08-15  Richard Sandiford  
Kugan Vivekanandarajah  

gcc/
* config/aarch64/aarch64-sve.md (*aarch64_cond_abd_2)
(*aarch64_cond_abd_3)
(*aarch64_cond_abd_any): New patterns.

gcc/testsuite/
* gcc.target/aarch64/sve/cond_fabd_1.c: New test.
* gcc.target/aarch64/sve/cond_fabd_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_2.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_4.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_5_run.c: Likewise.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:08:44.073462152 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:11:00.736451963 +0100
@@ -2795,6 +2795,123 @@ (define_insn_and_rewrite "*fabd3"
   }
 )
 
+;; Predicated floating-point absolute difference, merging with the first
+;; input.
+(define_insn_and_rewrite "*aarch64_cond_abd_2"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, ?&w")
+   (unspec:SVE_F
+ [(match_operand: 1 "register_operand" "Upl, Upl")
+  (unspec:SVE_F
+[(match_operand 4)
+ (match_operand:SI 5 "aarch64_sve_gp_strictness")
+ (unspec:SVE_F
+   [(match_operand 6)
+(match_operand:SI 7 "aarch64_sve_gp_strictness")
+(match_operand:SVE_F 2 "register_operand" "0, w")
+(match_operand:SVE_F 3 "register_operand" "w, w")]
+   UNSPEC_COND_FSUB)]
+UNSPEC_COND_FABS)
+  (match_dup 2)]
+ UNSPEC_SEL))]
+  "TARGET_SVE
+   && aarch64_sve_pred_dominates_p (&operands[4], operands[1])
+   && aarch64_sve_pred_dominates_p (&operands[6], operands[1])"
+  "@
+   fabd\t%0., %1/m, %0., %3.
+   movprfx\t%0, %2\;fabd\t%0., %1/m, %0., %3."
+  "&& (!rtx_equal_p (operands[1], operands[4])
+   || !rtx_equal_p (operands[1], operands[6]))"
+  {
+operands[4] = copy_rtx (operands[1]);
+operands[6] = copy_rtx (operands[1]);
+  }
+  [(set_attr "movprfx" "*,yes")]
+)
+
+;; Predicated floating-point absolute difference, merging with the second
+;; input.
+(define_insn_and_rewrite "*aarch64_cond_abd_3"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, ?&w")
+   (unspec:SVE_F
+ [(match_operand: 1 "register_operand" "Upl, Upl")
+  (unspec:SVE_F
+[(match_operand 4)
+ (match_operand:SI 5 "aarch64_sve_gp_strictness")
+ (unspec:SVE_F
+   [(match_operand 6)
+(match_operand:SI 7 "aarch64_sve_gp_strictness")
+(match_operand:SVE_F 2 "register_operand" "w, w")
+(match_operand:SVE_F 3 "register_operand" "0, w")]
+   UNSPEC_COND_FSUB)]
+UNSPEC_COND_FABS)
+  (match_dup 3)]
+ UNSPEC_SEL))]
+  "TARGET_SVE
+   && aarch64_sve_pred_dominates_p (&operands[4], operands[1])
+   && aarch64_sve_pred_dominates_p (&operands[6], operands[1])"
+  "@
+   fabd\t%0., %1/m, %0., %2.
+   movprfx\t%0, %3\;fabd\t%0., %1/m, %0., %2."
+  "&& (!rtx_equal_p (operands[1], operands[4])
+   || !rtx_equal_p (operands[1], operands[6]))"
+  {
+operands[4] = copy_rtx (operands[1]);
+operands[6] = copy_rtx (operands[1]);
+  }
+  [(set_attr "movprfx" "*,yes")]
+)
+
+;; Predicated floating-point absolute difference, merging with an
+;; independent value.
+(define_insn_and_rewrite "*aarch64_cond_abd<mode>_any"
+  [(set (match_operand:SVE_F 0 "register_operand" "=&w, &w, &w, &w, ?&w")
+   (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl, Upl, Upl, Upl")
+  (unspec:SVE_F
+[(match_operand 5)
+ (match_operand:SI 6 "aarch64_sve_gp_strictness")
+ (unspec:SVE_F
+   [(match_operand 7)
+(match_operand:SI 8 "aarch64_sve_gp_strictness")
+(match_operand:SVE_F 2 "register_operand" "0, w, w, w, w")
+(match_operand:SVE_F 3 "register_operand" "w, 0, w, w, w")]
+   UNSPEC_COND_FSUB)]
+UNSPEC_COND_FABS)
+  (match_operand:SVE_F 4 "aarch64_simd_reg_or_zero" "Dz, Dz, Dz, 0, w")]
+ UNSPEC_SEL))]
+  "TARGET_SVE
+   && !rtx_equal_p (operands[2], operands[4])
+   && !rtx_equal_p (operands[3], operands[4])
+   && aarch64_sve_pred_dominates_p (&operands[5], operands[1])
+   && aarch64_sve_pred_domina

match ld besides collect2 in gcov test

2019-08-15 Thread Alexandre Oliva
The regexp that checks that -lgcov is linked in when --coverage is
passed to the compiler driver requires the command line to match
'/collect2'.  Some of our targets don't match that, but they match /ld
or ${target_alias}-ld depending on the testing scenario, so I'd like
to tweak the test to match those as well.

Tested on x86_64-linux-gnu, and on the affected test scenarios.
Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.misc-tests/options.exp: Match /ld and -ld besides
/collect2.
---
 gcc/testsuite/gcc.misc-tests/options.exp |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.misc-tests/options.exp b/gcc/testsuite/gcc.misc-tests/options.exp
index 79535238fca8..c50784c84f31 100644
--- a/gcc/testsuite/gcc.misc-tests/options.exp
+++ b/gcc/testsuite/gcc.misc-tests/options.exp
@@ -65,7 +65,8 @@ proc check_for_all_options {language gcc_options compiler_pattern as_pattern ld_
fail "$test (assembler options)"
return
 }
-if {![regexp -- "/collect2(\\.exe)? .*$ld_pattern" $gcc_output]} {
+# Match /collect2, /ld, or *-ld.
+if {![regexp -- "(/collect2|\[-/\]ld)(\\.exe)? .*$ld_pattern" $gcc_output]} {
fail "$test (linker options)"
return
 }

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


[committed][AArch64] Use SVE [SU]ABD in conditional arithmetic

2019-08-15 Thread Richard Sandiford
This patch extends the [SU]ABD support so that it handles
conditional arithmetic.  We're relying on combine for this,
since there's no associated IFN_COND_* (yet?).
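
As with the FABD patch, here's an illustrative example of the kind of
loop this helps (a sketch, not one of the new tests verbatim); the
max-minus-min idiom under the condition, merging with one of the inputs,
should now collapse to a single predicated UABD/SABD:

/* Illustrative only; built with something like
   -O2 -ftree-vectorize -march=armv8.2-a+sve.  */
void
cond_uabd (unsigned int *restrict r, const int *restrict pred,
           const unsigned int *restrict a, const unsigned int *restrict b,
           int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = (pred[i] > 0
            ? ((a[i] < b[i] ? b[i] : a[i]) - (a[i] < b[i] ? a[i] : b[i]))
            : a[i]);
}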

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274506.

Richard


2019-08-15  Richard Sandiford  
Kugan Vivekanandarajah  

gcc/
* config/aarch64/aarch64-sve.md (*aarch64_cond_<su>abd<mode>_2)
(*aarch64_cond_<su>abd<mode>_any): New patterns.

gcc/testsuite/
* gcc.target/aarch64/sve/cond_abd_1.c: New test.
* gcc.target/aarch64/sve/cond_abd_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_abd_2.c: Likewise.
* gcc.target/aarch64/sve/cond_abd_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_abd_3.c: Likewise.
* gcc.target/aarch64/sve/cond_abd_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_abd_4.c: Likewise.
* gcc.target/aarch64/sve/cond_abd_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_abd_5.c: Likewise.
* gcc.target/aarch64/sve/cond_abd_5_run.c: Likewise.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:05:44.618788872 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-15 09:07:21.322073904 +0100
@@ -2073,6 +2073,84 @@ (define_insn "aarch64_<su>abd<mode>_3"
   [(set_attr "movprfx" "*,yes")]
 )
 
+;; Predicated integer absolute difference, merging with the first input.
+(define_insn_and_rewrite "*aarch64_cond_<su>abd<mode>_2"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?&w")
+   (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+  (minus:SVE_I
+(unspec:SVE_I
+  [(match_operand 4)
+   (USMAX:SVE_I
+ (match_operand:SVE_I 2 "register_operand" "0, w")
+ (match_operand:SVE_I 3 "register_operand" "w, w"))]
+  UNSPEC_PRED_X)
+(unspec:SVE_I
+  [(match_operand 5)
+   (<max_opp>:SVE_I
+ (match_dup 2)
+ (match_dup 3))]
+  UNSPEC_PRED_X))
+  (match_dup 2)]
+ UNSPEC_SEL))]
+  "TARGET_SVE"
+  "@
+   <su>abd\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
+   movprfx\t%0, %2\;<su>abd\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+  "&& (!CONSTANT_P (operands[4]) || !CONSTANT_P (operands[5]))"
+  {
+operands[4] = operands[5] = CONSTM1_RTX (<VPRED>mode);
+  }
+  [(set_attr "movprfx" "*,yes")]
+)
+
+;; Predicated integer absolute difference, merging with an independent value.
+(define_insn_and_rewrite "*aarch64_cond_<su>abd<mode>_any"
+  [(set (match_operand:SVE_I 0 "register_operand" "=&w, &w, &w, &w, ?&w")
+   (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl, Upl, Upl, Upl")
+  (minus:SVE_I
+(unspec:SVE_I
+  [(match_operand 5)
+   (USMAX:SVE_I
+ (match_operand:SVE_I 2 "register_operand" "0, w, w, w, w")
+ (match_operand:SVE_I 3 "register_operand" "w, 0, w, w, w"))]
+  UNSPEC_PRED_X)
+(unspec:SVE_I
+  [(match_operand 6)
+   (<max_opp>:SVE_I
+ (match_dup 2)
+ (match_dup 3))]
+  UNSPEC_PRED_X))
+  (match_operand:SVE_I 4 "aarch64_simd_reg_or_zero" "Dz, Dz, Dz, 0, w")]
+ UNSPEC_SEL))]
+  "TARGET_SVE
+   && !rtx_equal_p (operands[2], operands[4])
+   && !rtx_equal_p (operands[3], operands[4])"
+  "@
+   movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<su>abd\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
+   movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<su>abd\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype>
+   movprfx\t%0.<Vetype>, %1/z, %2.<Vetype>\;<su>abd\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
+   movprfx\t%0.<Vetype>, %1/m, %2.<Vetype>\;<su>abd\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
+   #"
+  "&& 1"
+  {
+if (!CONSTANT_P (operands[5]) || !CONSTANT_P (operands[6]))
+  operands[5] = operands[6] = CONSTM1_RTX (<VPRED>mode);
+else if (reload_completed
+&& register_operand (operands[4], <MODE>mode)
+&& !rtx_equal_p (operands[0], operands[4]))
+  {
+   emit_insn (gen_vcond_mask_<mode><vpred> (operands[0], operands[2],
+operands[4], operands[1]));
+   operands[4] = operands[2] = operands[0];
+  }
+else
+  FAIL;
+  }
+  [(set_attr "movprfx" "yes")]
+)
+
 ;; -
 ;;  [INT] Highpart multiplication
 ;; -
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_abd_1.c
===
--- /dev/null   2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_abd_1.c   2019-08-15 09:07:21.322073904 +0100
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define abd(A, B) (((A) < (B) ? (B) : (A)) - ((A) < (B) ? (A) : (B)))
+
+#define DEF_LOOP(TYPE) \
+  void __attribute__ ((noinline, noclone)) \
+  tes

Re: [PATCH 0/8] eBPF support for GCC

2019-08-15 Thread Richard Sandiford
"Jose E. Marchesi"  writes:
> . Dynamic stack allocation (alloca and VLAs) is achieved by using what
>   otherwise would be a perfectly normal general register, %r9, as a
>   pseudo stack pointer.  This has the disadvantage of making the
>   register "fixed" and therefore not available for general register
>   allocation.  Hopefully there is a way to conditionalize this, since
>   both alloca and VLAs are relatively uncommon; I haven't found it
>   yet.

In principle it's possible to define register eliminations for
target-specific registers as well as the usual FRAME/ARG_POINTER_REGNUM
crowd.  So you could have a fake fixed register to represent the pseudo
stack pointer, then allow that to be "eliminated" to %r9 in functions
that need it.  Functions that don't need it can continue (not) using the
fake register and leave %r9 free for general use.
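
FWIW, a minimal sketch of what that might look like (all names, register
numbers, and details below are made up for illustration, not taken from
the eBPF patches or an existing port):

/* A fake, fixed "pseudo stack pointer" register that only ever gets
   eliminated to the real %r9.  Functions that never refer to the fake
   register leave %r9 available to the register allocator.  */
#define BPF_PSEUDO_SP_REGNUM 12   /* fake register, marked fixed */
#define BPF_R9_REGNUM         9   /* real hard register */

#define ELIMINABLE_REGS \
  { { BPF_PSEUDO_SP_REGNUM, BPF_R9_REGNUM } }

/* With a single elimination pair the offset can simply be zero.  */
#define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET) ((OFFSET) = 0)

/* Behind TARGET_CAN_ELIMINATE: the elimination is always allowed; it
   only has an effect in functions that use alloca or VLAs, since those
   are the only ones that mention the fake register.  */
static bool
bpf_can_eliminate (const int from ATTRIBUTE_UNUSED,
                   const int to ATTRIBUTE_UNUSED)
{
  return true;
}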

Richard