Re: [PATCH AArch64/V3]Add new patterns for vcond_mask and vec_cmp

2017-06-27 Thread Andrew Pinski
On Mon, Aug 1, 2016 at 6:18 AM, Bin Cheng  wrote:
> Hi,
> This is the 3rd version patch implementing vcond_mask and vec_cmp patterns on 
> AArch64.
> Bootstrap and test along with next patch on AArch64, is it OK?
>
> Thanks,
> bin
>
> 2016-07-28  Alan Lawrence  
> Renlin Li  
> Bin Cheng  
>
> * config/aarch64/aarch64-simd.md (vec_cmp): New pattern.
> (vec_cmp): New pattern.
> (vec_cmpu): New pattern.
> (vcond_mask_): New pattern.

LTGT support is missing; LTGT can be generated via __builtin_islessgreater.
See PR 81228.

Thanks,
Andrew Pinski
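
For readers who have not met LTGT: a minimal sketch of source that uses
__builtin_islessgreater in a loop.  The function name mark_lessgreater and
the loop shape are assumptions made for illustration; this is not the
testcase from PR 81228, and whether the vectorizer actually emits an LTGT
vector comparison depends on the target and options.

```c
#include <assert.h>
#include <stddef.h>

/* Set out[i] to 1 where a[i] and b[i] are ordered and unequal, else 0.
   __builtin_islessgreater (x, y) is true when x < y or x > y and, unlike
   x != y, it is false when either operand is a NaN.  */
void
mark_lessgreater (const double *a, const double *b, int *out, size_t n)
{
  assert (n == 0 || (a != NULL && b != NULL && out != NULL));
  for (size_t i = 0; i < n; i++)
    out[i] = __builtin_islessgreater (a[i], b[i]) ? 1 : 0;
}
```

The NaN case is what separates LTGT from NE: a NaN operand makes the
result 0 here, whereas a[i] != b[i] would give 1.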


Re: [PATCH 3/3] Introduce IntegerRange for options (PR driver/79659).

2017-06-27 Thread Jeff Law
On 03/15/2017 03:58 AM, Martin Liška wrote:
> Huh, I forgot to attach the patch.
> 
> Martin
> 
> 0001-Introduce-IntegerRange-for-options-PR-driver-79659.patch
> 
> 
> From bb89456e6cecfa9497cf8e265d2083e762d5bc3e Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Mon, 27 Feb 2017 14:07:03 +0100
> Subject: [PATCH] Introduce IntegerRange for options (PR driver/79659).
> 
> gcc/ChangeLog:
> 
> 2017-02-28  Martin Liska  
> 
>   PR driver/79659
>   * common.opt: Add IntegerRange to various options.
>   * opt-functions.awk (integer_range_info): New function.
>   * optc-gen.awk: Add integer_range_info to cl_options struct.
>   * opts-common.c (decode_cmdline_option): Handle
>   CL_ERR_INT_RANGE_ARG.
>   (cmdline_handle_error): Likewise.
>   * opts.c (print_filtered_help): Show valid interval in
>   when --help is provided.
>   * opts.h (struct cl_option): Add range_min and range_max fields.
>   * config/i386/i386.opt: Add IntegerRange for -mbranch-cost.
> 
> gcc/c-family/ChangeLog:
> 
> 2017-02-28  Martin Liska  
> 
>   PR driver/79659
>   * c.opt: Add IntegerRange to various options.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-02-28  Martin Liska  
> 
>   PR driver/79659
>   * g++.dg/opt/pr79659.C: New test.
Presumably this never fully moved forward because it wasn't a regression?

This looks quite reasonable to me.  I'm not sure of the state of the
prereqs and you may want/need to add IntegerRange checks on newly added
options since this was first submitted.

If the prereqs are ack'd, then as far as I'm concerned this is good to
go and you're free to add any new IntegerRange checks you deem
necessary/desirable.

jeff



Re: [PATCH] Port Doxygen support script from Perl to Python; add unittests

2017-06-27 Thread Jeff Law
On 05/31/2017 08:10 AM, Martin Liška wrote:
> ..adding missing patch
> 
> 
> 0001-Doxygen-add-default-location-for-filters-and-output-.patch
> 
> 
> From 3021b695a8111e1552176529ab3342cdd2ae3a43 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Wed, 3 May 2017 11:42:41 +0200
> Subject: [PATCH] Doxygen: add default location for filters and output folder.
> 
> contrib/ChangeLog:
> 
> 2017-05-03  Martin Liska  
> 
>   * gcc.doxy: Add default location for filters and output folder.
>   * filter_gcc_for_doxygen_new: Rename to filter_gcc_for_doxygen.
>   * filter_params.pl: Remove.
OK.
jeff


Re: [PATCH], Add check ppc_cpu_supports_hw to testsuite

2017-06-27 Thread Jeff Law
On 06/27/2017 05:53 PM, Michael Meissner wrote:
> The PowerPC __builtin_cpu_supports and __builtin_cpu_is built-in functions
> require GLIBC 2.23, since they use fixed words at the end of thread control
> area to store the HWCAP and HWCAP2 bits.  If the compiler was not configured
> with the appropriate GLIBC, the compiler will generate a 0 as the result of
> the built-in function call.
> 
> I've been adding the target_clone attribute support to GCC, and the resolver
> function uses __builtin_cpu_supports to detect which hardware ISA is being
> used.  On systems with an older GLIBC, only the default clone function will
> get called because __builtin_cpu_supports returns 0.
> 
> This adds a target supports option in dejagnu so that future tests can use
> this to determine whether or not to test target_clones.
> 
> I have verified that this patch works with the patches I plan to submit
> tomorrow for enhancing the PowerPC target_clone support.
> 
> Can I install this into the trunk?
> 
> Given that GCC 7 supports __builtin_cpu_is and __builtin_cpu_supports, I would
> ask if I could backport this to GCC 7.x as well to allow future tests to be
> back ported.
> 
> 2017-06-27  Michael Meissner  
> 
>   PR target/81193
>   * lib/target-supports.exp
>   (check_ppc_cpu_supports_hw_available): New test to make sure
>   __builtin_cpu_supports works on power7 and newer.
OK for the trunk.  It's not my call on the release branches though.

jeff


Re: [PATCH], PR ipa/81238, make default target_clone static

2017-06-27 Thread Jeff Law
On 06/27/2017 05:34 PM, Michael Meissner wrote:
> In going through the target_clone support for the PowerPC, I noticed
> that the default target clone function was not explicitly set to private.
> This patch fixes this.  I have checked both the x86_64 and PowerPC target
> clone code, and the patch does make it private.
> 
> I'm doing bootstrap builds on both little endian power8 and an x86_64 system
> right now (both builds are at stage3, so I don't expect any problems).
> Assuming there is no regression, can I check this patch in to the trunk?
> 
> 2017-06-27  Michael Meissner  
> 
>   PR ipa/81238
>   * multiple_target.c (create_dispatcher_calls): Set the default
>   clone to be static, not public.
OK.
jeff



Re: C/C++ PATCH to add __typeof_noqual (PR c/65455, c/39985)

2017-06-27 Thread Martin Sebor

On 06/27/2017 04:31 PM, Joseph Myers wrote:
> On Tue, 27 Jun 2017, Martin Sebor wrote:
>
>>> There's the usual question of what should be done with arrays
>>> of qualified types (where C does not consider such an array type to be
>>> qualified, but C++ considers it to have the same qualifiers as the element
>>> type).  There's also the matter of qualifiers used internally by GCC to
>>> represent const and noreturn functions.
>>
>> What about _Atomic?  Should it also be removed?  If yes, how would
>
> I think so.  I'd think of this as being something like type as an rvalue,
> which is the unqualified, non-atomic version of the type.  Which is
> appropriate for various uses of type-generic macros where you declare
> temporary variables, and there is no need for those temporaries to be
> volatile, atomic, etc., even if the inputs or outputs for the macro are.

Yes, that would make sense to me.

>> one then generically define a cv-unqualified object of an atomic
>> type when given a const- or volatile-qualified atomic type or object?
>
> I'm doubtful of the utility of that.

Given the novelty of the concept in C I don't think it's safe to make
predictions about the utility of type-generic tools in the language.
The safest approach, IMO, is to take guidance from the experience
in C++ with its rich set of type traits including remove_const,
remove_volatile, and (to your later point) also remove_cv.  These
weren't introduced to C++ in 1998.  It took over a decade for
people to realize that they were not just useful but essential for
generic programming.

>> Yes, syntactically restrict is (kind of like) a qualifier, but
>> semantically it's nothing like it (the standard says it's more
>> akin to a storage specifier).  Most (but not all) of the essential
>
> Storage class specifiers aren't part of the type system at all.  All the
> usual rules for qualified types apply to restrict (whereas they *don't*
> necessarily apply to _Atomic).

Sure.  The difference is that restricted pointers carry with them
additional semantic constraints that other qualifiers don't, such
as those transferred by assigning a T* restrict to a T*.  Or
the outer-to-inner assignment limitation.  So while it looks like
a qualifier, I don't think restrict acts like one.

>> In my mind, all this speaks in favor of introducing simpler building
>> blocks.  From its name alone, the expected effects of a __remove_const
>> or __remove_atomic built-in (not to mention their utility) are far
>> clearer than those of __typeof_noqual__.
>
> If you *only* have blocks like that, you can't then write code that also
> removes whatever qualifiers might be added in future - you keep needing to
> update the generic code for future qualifiers.  For C90 you'd have had
> __remove_const and __remove_volatile, but then would have needed to update
> again for restrict, again after that for address spaces, and again after
> that for _Atomic.
>
> I.e., just having blocks to remove qualifiers of kind X is not sufficient
> without "remove all qualifiers (possibly except these kinds)" as well.  I
> suppose you could have __remove_quals (const volatile _Atomic, expr) and
> __remove_quals_except (_Atomic, expr) or similar (with some keyword that
> goes in there to mean "any address space").

Right.  My point isn't that the bigger features shouldn't exist,
but that they can and should be built on top of the primitives
and defined not in the compiler but in a header.  With
__builtin_remove_const() and __builtin_remove_volatile()
a __typeof_noqual(x) can be a macro that expands to these two,
plus any others as/if necessary, with any other additional
adjustments, again if/when necessary.  This is the C++ approach;
it has worked well there and I think it would work well for C
too.

Martin
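
To make the "qualifier stripping defined in a header" idea concrete, here
is a minimal sketch that works with existing GNU C.  It does not use the
proposed __typeof_noqual or any __builtin_remove_* built-in; the macro
names UNQUAL_SCALAR_TYPEOF and POINTS_TO_CONST_INT are hypothetical, and
the technique (a _Generic selection inside __typeof__) only covers the
scalar types that are listed explicitly.

```c
#include <assert.h>

/* Hypothetical header-level helper: yields the unqualified type of X for
   a fixed set of scalar types and leaves every other type unchanged.  The
   controlling expression of _Generic undergoes lvalue conversion, so a
   "const int" operand selects the "int" association.  */
#define UNQUAL_SCALAR_TYPEOF(x)              \
  __typeof__ (_Generic ((x),                 \
                        int: (int) 0,        \
                        long: (long) 0,      \
                        double: (double) 0,  \
                        default: (x)))

/* Evaluates to 1 if P has type "const int *", otherwise 0.  */
#define POINTS_TO_CONST_INT(p) _Generic ((p), const int *: 1, default: 0)

/* A helper that declares temporaries the way a type-generic macro would.  */
int
double_it (const int input)
{
  __typeof__ (input) qualified_tmp = input;        /* const int */
  UNQUAL_SCALAR_TYPEOF (input) plain_tmp = input;  /* int */

  assert (POINTS_TO_CONST_INT (&qualified_tmp) == 1);
  assert (POINTS_TO_CONST_INT (&plain_tmp) == 0);

  plain_tmp *= 2;  /* Allowed only because the temporary is a plain int.  */
  return plain_tmp;
}
```

The sketch also shows Joseph's objection in miniature: the list of
associations has to be extended by hand for every type and every future
qualifier, which is the maintenance cost a single "remove all qualifiers"
operation avoids.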


[patch, fortran] PR80164 ICE in gfc_format_decoder at gcc/fortran/error.c:933

2017-06-27 Thread Jerry DeLisle
I plan to commit the following patch, which Steve provided in the PR on
Bugzilla.  It is simple, and I will also backport it to 7.  Without the
patch we get a segfault when -Warray-temporaries is used on several
existing test cases.  I will prepare a test case based on one of those.


Regards,

Jerry

2017-06-27  Jerry DeLisle  

PR fortran/80164
* trans-stmt.c (gfc_trans_call): If no code expr, use code->loc
as warning/error locus.


diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index e4f1da54..a1e1dff7 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -452,7 +452,11 @@ gfc_trans_call (gfc_code * code, bool dependency_check,
 subscripts.  This could be prevented in the elemental case
 as temporaries are handled separatedly
 (below in gfc_conv_elemental_dependencies).  */
-  gfc_conv_loop_setup (&loop, &code->expr1->where);
+  if (code->expr1)
+   gfc_conv_loop_setup (&loop, &code->expr1->where);
+  else
+   gfc_conv_loop_setup (&loop, &code->loc);
+
   gfc_mark_ss_chain_used (ss, 1);

   /* Convert the arguments, checking for dependencies.  */


Re: [AARCH64] Disable pc relative literal load irrespective of TARGET_FIX_ERR_A53_84341

2017-06-27 Thread Kugan Vivekanandarajah
Hi Ramana,

On 27 June 2017 at 18:01, Ramana Radhakrishnan
 wrote:
> On 27/06/17 02:20, Kugan Vivekanandarajah wrote:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00614.html  added this
>> workaround to get kernel building with when TARGET_FIX_ERR_A53_843419
>> is enabled.
>>
>> This was added to support building kernel loadable modules. In kernel,
>> when CONFIG_ARM64_ERRATUM_843419 is selected, the relocation needed
>> for ADRP/LDR (R_AARCH64_ADR_PREL_PG_HI21 and
>> R_AARCH64_ADR_PREL_PG_HI21_NC are removed from the kernel to avoid
>> loading objects with possibly offending sequence). Thus, it could only
>> support pc relative literal loads.
>>
>> However, the following patch was posted to kernel to add
>> -mpc-relative-literal-loads
>> http://www.spinics.net/lists/arm-kernel/msg476149.html
>>
>> -mpc-relative-literal-loads is unconditionally added to the kernel
>> build as can be seen from:
>> https://github.com/torvalds/linux/blob/master/arch/arm64/Makefile
>>
>> Therefore this patch removes the hunk so that applications like
>> SPECcpu2017's 521/621.wrf can be built (with LTO in this case) without
>> -mno-pc-relative-literal-loads
>
>
> Is that because your compiler has defaulted to -mpc-relative-literal-loads
> because it has the workaround enabled by default ? I'm curious as to why
> others haven't seen this issue.
>

If TARGET_FIX_ERR_A53_843419 is selected, the compiler defaults to
-mpc-relative-literal-loads unless we explicitly specify
-mno-pc-relative-literal-loads.  The Linaro toolchain is built with
TARGET_FIX_ERR_A53_843419.

This linking of TARGET_FIX_ERR_A53_843419 and
-mpc-relative-literal-loads  should now be relaxed since the kernel
explicitly uses -mpc-relative-literal-loads.

This 1MiB issue should be very rarely seen even before you fixed it.

Thanks,
Kugan


> regards
> Ramana


Re: [PATCH], PR ipa/81238, make default target_clone static

2017-06-27 Thread Michael Meissner
On Tue, Jun 27, 2017 at 07:34:13PM -0400, Michael Meissner wrote:
> In going through the target_clone support for the PowerPC, I noticed
> that the default target clone function was not explicitly set to private.
> This patch fixes this.  I have checked both the x86_64 and PowerPC target
> clone code, and the patch does make it private.
> 
> I'm doing bootstrap builds on both little endian power8 and an x86_64 system
> right now (both builds are at stage3, so I don't expect any problems).
> Assuming there is no regression, can I check this patch in to the trunk?

Both PowerPC and x86_64 bootstraps have finished with no regressions.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



[PATCH], Add check ppc_cpu_supports_hw to testsuite

2017-06-27 Thread Michael Meissner
The PowerPC __builtin_cpu_supports and __builtin_cpu_is built-in functions
require GLIBC 2.23, since they use fixed words at the end of thread control
area to store the HWCAP and HWCAP2 bits.  If the compiler was not configured
with the appropriate GLIBC, the compiler will generate a 0 as the result of the
built-in function call.

I've been adding the target_clone attribute support to GCC, and the resolver
function uses __builtin_cpu_supports to detect which hardware ISA is being
used.  On systems with an older GLIBC, only the default clone function will get
called because __builtin_cpu_supports returns 0.

This adds a target supports option in dejagnu so that future tests can use this
to determine whether or not to test target_clones.

I have verified that this patch works with the patches I plan to submit
tomorrow for enhancing the PowerPC target_clone support.

Can I install this into the trunk?

Given that GCC 7 supports __builtin_cpu_is and __builtin_cpu_supports, I would
ask if I could backport this to GCC 7.x as well to allow future tests to be
back ported.

2017-06-27  Michael Meissner  

PR target/81193
* lib/target-supports.exp
(check_ppc_cpu_supports_hw_available): New test to make sure
__builtin_cpu_supports works on power7 and newer.

Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   (revision 249606)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -1930,6 +1930,37 @@ proc check_effective_target_powerpc64_no
} {-O2}]
 }
 
+# Return 1 if the target supports the __builtin_cpu_supports built-in,
+# including having a new enough library to support the test.  Cache the result.
+# Require at least a power7 to run on.
+
+proc check_ppc_cpu_supports_hw_available { } {
+return [check_cached_effective_target ppc_cpu_supports_hw_available {
+   # Some simulators are known to not support VSX/power8 instructions.
+   # For now, disable on Darwin
+   if { [istarget powerpc-*-eabi]
+|| [istarget powerpc*-*-eabispe]
+|| [istarget *-*-darwin*]} {
+   expr 0
+   } else {
+   set options "-mvsx"
+   check_runtime_nocache ppc_cpu_supports_hw_available {
+   int main()
+   {
+   #ifdef __MACH__
+ asm volatile ("xxlor vs0,vs0,vs0");
+   #else
+ asm volatile ("xxlor 0,0,0");
+   #endif
+ if (!__builtin_cpu_supports ("vsx"))
+   return 1;
+ return 0;
+   }
+   } $options
+   }
+}]
+}
+
 # Return 1 if the target supports executing power8 vector instructions, 0
 # otherwise.  Cache the result.
 
@@ -6922,6 +6953,7 @@ proc is-effective-target { arg } {
  "ppc_float128_sw" { set selected [check_ppc_float128_sw_available] }
  "ppc_float128_hw" { set selected [check_ppc_float128_hw_available] }
  "ppc_recip_hw"   { set selected [check_ppc_recip_hw_available] }
+ "ppc_cpu_supports_hw" { set selected [check_ppc_cpu_supports_hw_available] }
  "dfp_hw" { set selected [check_dfp_hw_available] }
  "htm_hw" { set selected [check_htm_hw_available] }
  "named_sections" { set selected [check_named_sections_available] }

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
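
As background for why a 0 result matters, here is a minimal sketch of the
kind of choice a target_clones resolver makes with __builtin_cpu_supports.
It is hand-written dispatch, not the resolver GCC generates; the names
PROBE_FEATURE, sum_default, sum_feature and pick_sum are hypothetical, and
the x86 branch is there only so the sketch builds on non-PowerPC hosts.

```c
#include <assert.h>
#include <stddef.h>

/* Feature string to probe.  "vsx" is the one the patch's runtime check
   uses on PowerPC; "sse2" is an assumed stand-in for x86 hosts.  */
#if defined (__powerpc__) || defined (__powerpc64__)
# define PROBE_FEATURE "vsx"
#elif defined (__x86_64__) || defined (__i386__)
# define PROBE_FEATURE "sse2"
#endif

/* Baseline version, always usable.  */
int
sum_default (const int *v, size_t n)
{
  assert (v != NULL || n == 0);
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += v[i];
  return total;
}

/* Stand-in for a clone compiled for PROBE_FEATURE; it computes the same
   result so the sketch stays portable.  */
int
sum_feature (const int *v, size_t n)
{
  return sum_default (v, n);
}

typedef int (*sum_fn) (const int *, size_t);

/* Pick an implementation the way a resolver would.  If the built-in
   always yields 0 (for example with a GLIBC that is too old), only the
   default version is ever selected.  */
sum_fn
pick_sum (void)
{
#ifdef PROBE_FEATURE
  if (__builtin_cpu_supports (PROBE_FEATURE))
    return sum_feature;
#endif
  return sum_default;
}
```

A test that only checks the result cannot tell which version ran, which
is why the effective-target check above has to probe the built-in itself.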



Re: [PATCH rs6000] remove implicit static var outputs of toc_relative_expr_p

2017-06-27 Thread Segher Boessenkool
Hi Aaron,

On Tue, Jun 27, 2017 at 11:43:57AM -0500, Aaron Sawdey wrote:
> The function toc_relative_expr_p implicitly sets two static vars
> (tocrel_base and tocrel_offset) that are declared in rs6000.c. The real
> purpose of this is to communicate between
> print_operand/print_operand_address and rs6000_output_addr_const_extra,
> which is called through the asm_out hook vector by something in the
> call tree under output_addr_const.
> 
> This patch changes toc_relative_expr_p to make tocrel_base and
> tocrel_offset be explicit const_rtx * args. All of the calls other than
> print_operand/print_operand_address are changed to have local const_rtx
> vars that are passed in.

If those locals aren't used, can you arrange to call toc_relative_expr_p
with NULL instead?  Or are they always used?

> The statics in rs6000.c are now called
> tocrel_base_oac and tocrel_offset_oac to reflect their use to
> communicate across output_addr_const, and that is now the only thing
> they are used for.

Can't say I like those names, very cryptic.  Not that I know of anything
better; the short names as they were weren't very nice either.

> --- gcc/config/rs6000/rs6000.c(revision 249639)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -8628,18 +8628,25 @@
> && ASM_OUTPUT_SPECIAL_POOL_ENTRY_P (get_pool_constant (base), Pmode));
>  }
>  
> -static const_rtx tocrel_base, tocrel_offset;
> +/* These are only used to pass through from 
> print_operand/print_operand_address
> + * to rs6000_output_addr_const_extra over the intervening function 
> + * output_addr_const which is not target code.  */

No leading * in a block comment please.  (And you have a trailing space).

> +static const_rtx tocrel_base_oac, tocrel_offset_oac;
>  
>  /* Return true if OP is a toc pointer relative address (the output
> of create_TOC_reference).  If STRICT, do not match non-split
> -   -mcmodel=large/medium toc pointer relative addresses.  */
> +   -mcmodel=large/medium toc pointer relative addresses.  Places base 
> +   and offset pieces in TOCREL_BASE and TOCREL_OFFSET respectively.  */

s/Places/Place/ (and another trailing space).

> -  tocrel_base = op;
> -  tocrel_offset = const0_rtx;
> +  *tocrel_base = op;
> +  *tocrel_offset = const0_rtx;
>if (GET_CODE (op) == PLUS && add_cint_operand (XEXP (op, 1), GET_MODE 
> (op)))
>  {
> -  tocrel_base = XEXP (op, 0);
> -  tocrel_offset = XEXP (op, 1);
> +  *tocrel_base = XEXP (op, 0);
> +  *tocrel_offset = XEXP (op, 1);
>  }

Maybe write this as

  if (GET_CODE (op) == PLUS && add_cint_operand (XEXP (op, 1), GET_MODE (op)))
{
  *tocrel_base = XEXP (op, 0);
  *tocrel_offset = XEXP (op, 1);
}
  else
{
  *tocrel_base = op;
  *tocrel_offset = const0_rtx;
}

or, if you allow NULL pointers,

  bool with_offset = GET_CODE (op) == PLUS
 && add_cint_operand (XEXP (op, 1), GET_MODE (op));
  if (tocrel_base)
*tocrel_base = with_offset ? XEXP (op, 0) : op;
  if (tocrel_offset)
*tocrel_offset = with_offset ? XEXP (op, 1) : const0_rtx;

or such.

> -  return (GET_CODE (tocrel_base) == UNSPEC
> -   && XINT (tocrel_base, 1) == UNSPEC_TOCREL);
> +  return (GET_CODE (*tocrel_base) == UNSPEC
> +   && XINT (*tocrel_base, 1) == UNSPEC_TOCREL);

Well, and then you have this, so you need to assign tocrel_base to a local
as well.

>  legitimate_constant_pool_address_p (const_rtx x, machine_mode mode,
>   bool strict)
>  {
> -  return (toc_relative_expr_p (x, strict)
> +  const_rtx tocrel_base, tocrel_offset;
> +  return (toc_relative_expr_p (x, strict, &tocrel_base, &tocrel_offset)

For example here it seems nothing uses tocrel_base?

It is probably nicer to have a separate function for toc_relative_expr_p
and one to pull the base/offset out.  And maybe don't keep it cached for
the output function either?  It has all info it needs, right, the full
address RTX?  I don't think it is measurably slower to pull the address
apart an extra time?


Segher
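
The NULL-tolerant variant suggested above, together with the point that
the final test needs a local copy of the base, can be modelled outside GCC
as follows.  This is a toy sketch: struct expr, its fields and
toy_toc_relative_p are hypothetical stand-ins for the RTX machinery, not
rs6000.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an RTX: either a leaf, or "base + constant offset".  */
struct expr
{
  bool is_plus;             /* True for base + offset.  */
  bool is_tocrel;           /* Models the UNSPEC_TOCREL check on a leaf.  */
  const struct expr *base;  /* Valid when is_plus.  */
  long offset;              /* Valid when is_plus.  */
};

/* Return true if OP, or its base when OP is base + offset, is marked
   TOC-relative.  BASE_OUT and OFFSET_OUT may each be NULL when the caller
   does not need that piece.  */
bool
toy_toc_relative_p (const struct expr *op, const struct expr **base_out,
                    long *offset_out)
{
  assert (op != NULL);

  /* Keep the pieces in locals so the final test never dereferences an
     out-pointer that the caller passed as NULL.  */
  const struct expr *base = op->is_plus ? op->base : op;
  long offset = op->is_plus ? op->offset : 0;

  if (base_out)
    *base_out = base;
  if (offset_out)
    *offset_out = offset;

  return base->is_tocrel;
}
```

A caller such as legitimate_constant_pool_address_p, which appears to need
only the offset, would then pass NULL for the base.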


[PATCH], PR ipa/81238, make default target_clone static

2017-06-27 Thread Michael Meissner
In going through the target_clone support for the PowerPC, I noticed
that the default target clone function was not explicitly set to private.  This
patch fixes this.  I have checked both the x86_64 and PowerPC target clone
code, and the patch does make it private.

I'm doing bootstrap builds on both little endian power8 and an x86_64 system
right now (both builds are at stage3, so I don't expect any problems).
Assuming there is no regression, can I check this patch in to the trunk?

2017-06-27  Michael Meissner  

PR ipa/81238
* multiple_target.c (create_dispatcher_calls): Set the default
clone to be static, not public.

Index: gcc/multiple_target.c
===
--- gcc/multiple_target.c   (revision 249710)
+++ gcc/multiple_target.c   (working copy)
@@ -148,6 +148,7 @@ create_dispatcher_calls (struct cgraph_n
}
 }
 
+  TREE_PUBLIC (node->decl) = 0;
   symtab->change_decl_assembler_name (node->decl,
  clone_function_name (node->decl,
   "default"));

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH/AARCH64] Improve aarch64 conditional compare usage

2017-06-27 Thread Jeff Law
On 05/01/2017 11:16 AM, Steve Ellcey wrote:
> This is a resubmittal of an earlier patch
> (https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00203.html) to improve the
> use of ccmp (conditional compare) on aarch64.  I made a couple of tweaks
> after the first submittal and retested now that we are back in stage 1.
> 
> Most of the changes are restructuring the code to allow the change and do
> not affect the actual output.  The actual behavour change is in
> ccmp_tree_comparison_p where we recoginize a boolean variable as well
> as a compare expression as code that can be done with a conditionial
> compare and in get_compare_parts where we treat a boolean variable X
> as 'X != 0' and generate that comparision.
> 
> Since the code in ccmp.c is ony used when TARGET_GEN_CCMP_FIRST is set
> and TARGET_GEN_CCMP_FIRST is only set for aarch64 this change will only
> affect aarch64.
> 
> Tested with no regressions and a new test is added to verify that we
> generate a ccmp instruction with the change.  I ran the SPEC2006 int
> tests and got a .02 increase in the SPECmark on a ThunderX box. The
> biggest increases were in mcf and astar.  One test, xlancbmk, did slow
> down but the overall SPEC result was a speedup.
> 
> OK for checkin?
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> GCC ChangeLog:
> 
> 2017-05-01  Steve Ellcey  
> 
>   * ccmp.c (ccmp_tree_comparison_p): New function.
>   (ccmp_candidate_p): Update to use above function.
>   (get_compare_parts): New function.
>   (expand_ccmp_next): Update to use new functions.
>   (expand_ccmp_expr_1): Take tree arg instead of gimple, update to use
>   new functions.
>   (expand_ccmp_expr): Pass tree instead of gimple to expand_ccmp_expr_1,
>   take mode as argument.
>   * ccmp.h (expand_ccmp_expr): Add mode as argument.
>   * expr.c (expand_expr_real_1): Pass mode as argument.
> 
> GCC Testsuite ChangeLog:
> 
> 
> 2017-05-01  Steve Ellcey  
> 
>   * gcc.target/aarch64/ccmp_2.c: New test.
> 
> 
> gcc.ccmp.patch
> 
> 
> diff --git a/gcc/ccmp.c b/gcc/ccmp.c
> index 92ca133..4fa3ebd 100644
> --- a/gcc/ccmp.c
> +++ b/gcc/ccmp.c
> @@ -38,6 +38,29 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ccmp.h"
>  #include "predict.h"
>  
> +/* Check whether T is a simple boolean variable or a SSA name
> +   set by a comparison operator in the same basic block.  */
> +static bool
> +ccmp_tree_comparison_p (tree t, basic_block bb)
> +{
> +  gimple *g = get_gimple_for_ssa_name (t);
> +  tree_code tcode;
> +
> +  /* If we have a boolean variable allow it and generate a compare
> + to zero reg when expanding.  */
> +  if (!g)
> +return (TREE_CODE (TREE_TYPE (t)) == BOOLEAN_TYPE);
Depending on how you use T, you might be better off checking T's range
and considering anything with the [0,1] range as a boolean.  That would
also pick up the case where T was set via a comparison, or the output of
a PHI with arguments that are all [0,1], etc.  I've found that to be a
useful improvement in a couple places.

See ssa_name_has_boolean_range.  I don't consider it a requirement for
this patch to go forward, but more something you might want to
investigate as a future improvement.

OK for the trunk.  Sorry about the delay.

jeff
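
For orientation, a minimal sketch of the source shape under discussion: a
comparison combined with a boolean variable.  The function names are
hypothetical and this is not the ccmp_2.c test from the patch; whether a
ccmp instruction is actually emitted depends on the target and the
optimization level.

```c
#include <assert.h>
#include <stdbool.h>

/* A comparison combined with a plain boolean.  With the patch, FLAG is
   handled as "flag != 0", so on aarch64 the whole condition becomes a
   candidate for a compare followed by a conditional compare.  */
int
both_hold (int a, int b, bool flag)
{
  return (a == b) && flag;
}

/* The possible extension mentioned above: an int whose value range is
   [0, 1] could be treated the same way.  The assert only documents that
   precondition in this sketch; inside GCC the information would come
   from range data, e.g. via ssa_name_has_boolean_range.  */
int
both_hold_ranged (int a, int b, int flag01)
{
  assert (flag01 == 0 || flag01 == 1);
  return (a == b) && flag01;
}
```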



libgo patch committed: AIX memory management

2017-06-27 Thread Ian Lance Taylor
On AIX:
* mmap does not allow mapping an already mapped range,
* the mmap range starts at 0x3000 for 32-bit processes,
* the mmap range starts at 0x7000_ for 64-bit processes.

This libgo patch by Matthieu Sarter addresses these issues.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249712)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-63b766d67098877496a4b79d7f41e731fbe8abc8
+66d14d95a5a453682fe387319c80bc4fc40d96ad
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/malloc.go
===
--- libgo/go/runtime/malloc.go  (revision 249205)
+++ libgo/go/runtime/malloc.go  (working copy)
@@ -291,6 +291,8 @@ func mallocinit() {
// allocation at 0x40 << 32 because when using 4k pages with 3-level
// translation buffers, the user address space is limited to 39 bits
// On darwin/arm64, the address space is even smaller.
+   // On AIX, mmap adresses range start at 0x0700_ for 64 bits
+   // processes.
arenaSize := round(_MaxMem, _PageSize)
bitmapSize = arenaSize / (sys.PtrSize * 8 / 2)
spansSize = arenaSize / _PageSize * sys.PtrSize
@@ -301,12 +303,15 @@ func mallocinit() {
p = uintptr(i)<<40 | uintptrMask&(0x0013<<28)
case GOARCH == "arm64":
p = uintptr(i)<<40 | uintptrMask&(0x0040<<32)
+   case GOOS == "aix":
+   i = 1
+   p = uintptr(i)<<32 | uintptrMask&(0x70<<52)
default:
p = uintptr(i)<<40 | uintptrMask&(0x00c0<<32)
}
pSize = bitmapSize + spansSize + arenaSize + _PageSize
p = uintptr(sysReserve(unsafe.Pointer(p), pSize, &reserved))
-   if p != 0 {
+   if p != 0 || GOOS == "aix" { // Useless to loop on AIX, as i is forced to 1
break
}
}
Index: libgo/go/runtime/mem_gccgo.go
===
--- libgo/go/runtime/mem_gccgo.go   (revision 249205)
+++ libgo/go/runtime/mem_gccgo.go   (working copy)
@@ -270,6 +270,11 @@ func sysMap(v unsafe.Pointer, n uintptr,
return
}
 
+   if GOOS == "aix" {
+   // AIX does not allow mapping a range that is already mapped.
+   // So always unmap first even if it is already unmapped.
+   munmap(v, n)
+   }
p := mmap(v, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_FIXED|_MAP_PRIVATE, mmapFD, 0)
if uintptr(p) == _MAP_FAILED && errno() == _ENOMEM {
throw("runtime: out of memory")
Index: libgo/runtime/runtime_c.c
===
--- libgo/runtime/runtime_c.c   (revision 249205)
+++ libgo/runtime/runtime_c.c   (working copy)
@@ -139,6 +139,10 @@ uintptr getEnd(void)
 uintptr
 getEnd()
 {
+#ifdef _AIX
+  // mmap adresses range start at 0x3000 on AIX for 32 bits processes
+  uintptr end = 0x3000U;
+#else
   uintptr end = 0;
   uintptr *pend;
 
@@ -146,6 +150,8 @@ getEnd()
   if (pend != nil) {
 end = *pend;
   }
+#endif
+
   return end;
 }
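
The sysMap change can be illustrated with plain POSIX calls.  This is a
minimal sketch in C rather than the libgo code; reserve_range and
map_fixed_after_unmap are hypothetical names, and the sketch assumes a
single-threaded caller, since another thread could in principle claim the
range between the munmap and the mmap.

```c
#define _DEFAULT_SOURCE 1  /* For MAP_ANON with strict -std= on glibc.  */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve N bytes of address space without making them accessible.
   Returns NULL on failure.  */
void *
reserve_range (size_t n)
{
  void *p = mmap (NULL, n, PROT_NONE, MAP_ANON | MAP_PRIVATE, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}

/* Map N bytes read/write at the fixed address V inside a reserved range.
   Unmapping first mirrors the AIX workaround in the patch; on systems
   where MAP_FIXED replaces an existing mapping, the munmap is redundant
   but harmless.  Returns V on success, NULL on failure.  */
void *
map_fixed_after_unmap (void *v, size_t n)
{
  assert (v != NULL && n > 0);
  munmap (v, n);
  void *p = mmap (v, n, PROT_READ | PROT_WRITE,
                  MAP_ANON | MAP_FIXED | MAP_PRIVATE, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}
```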
 


Re: C/C++ PATCH to add __typeof_noqual (PR c/65455, c/39985)

2017-06-27 Thread Joseph Myers
On Tue, 27 Jun 2017, Martin Sebor wrote:

> > There's the usual question of what should be done with arrays
> > of qualified types (where C does not consider such an array type to be
> > qualified, but C++ considers it to have the same qualifiers as the element
> > type).  There's also the matter of qualifiers used internally by GCC to
> > represent const and noreturn functions.
> 
> What about _Atomic?  Should it also be removed?  If yes, how would

I think so.  I'd think of this as being something like type as an rvalue, 
which is the unqualified, non-atomic version of the type.  Which is 
appropriate for various uses of type-generic macros where you declare 
temporary variables, and there is no need for those temporaries to be 
volatile, atomic, etc., even if the inputs or outputs for the macro are.

> one then generically define a cv-unqualified object of an atomic
> type when given a const- or volatile-qualified atomic type or object?

I'm doubtful of the utility of that.

> Yes, syntactically restrict is (kind of like) a qualifier, but
> semantically it's nothing like it (the standard says it's more
> akin to a storage specifier).  Most (but not all) of the essential

Storage class specifiers aren't part of the type system at all.  All the 
usual rules for qualified types apply to restrict (whereas they *don't* 
necessarily apply to _Atomic).

> In my mind, all this speaks in favor of introducing simpler building
> blocks.  From its name alone, the expected effects of a __remove_const
> or __remove_atomic built-in (not to mention their utility) are far
> clearer than those of __typeof_noqual__.

If you *only* have blocks like that, you can't then write code that also 
removes whatever qualifiers might be added in future - you keep needing to 
update the generic code for future qualifiers.  For C90 you'd have had 
__remove_const and __remove_volatile, but then would have needed to update 
again for restrict, again after that for address spaces, and again after 
that for _Atomic.

I.e., just having blocks to remove qualifiers of kind X is not sufficient 
without "remove all qualifiers (possibly except these kinds)" as well.  I 
suppose you could have __remove_quals (const volatile _Atomic, expr) and 
__remove_quals_except (_Atomic, expr) or similar (with some keyword that 
goes in there to mean "any address space").

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Simplify vec_merge of vec_duplicate with const_vector

2017-06-27 Thread Jeff Law
On 06/06/2017 02:25 AM, Kyrill Tkachov wrote:
> Hi all,
> 
> I'm trying to improve some of the RTL-level handling of vector lane
> operations on aarch64 and that
> involves dealing with a lot of vec_merge operations. One simplification
> that I noticed missing
> from simplify-rtx are combinations of vec_merge with vec_duplicate.
> In this particular case:
> (vec_merge (vec_duplicate (X)) (const_vector [A, B]) (const_int N))
> 
> which can be replaced with
> 
> (vec_concat (X) (B)) if N == 1 (0b01) or
> (vec_concat (A) (X)) if N == 2 (0b10).
> 
> For the aarch64 testcase in this patch this simplifications allows us to
> try to combine:
> (set (reg:V2DI 77 [ x ])
> (vec_concat:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64])
> (const_int 0 [0])))
> 
> instead of the more complex:
> (set (reg:V2DI 77 [ x ])
> (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1
> *y_3(D)+0 S8 A64]))
> (const_vector:V2DI [
> (const_int 0 [0])
> (const_int 0 [0])
> ])
> (const_int 1 [0x1])))
> 
> 
> For the simplified form above we already have an aarch64 pattern:
> *aarch64_combinez which
> is missing a DI/DFmode version due to an oversight, so this patch
> extends that pattern as well to
> use the VDC mode iterator that includes DI and DFmode (as well as V2HF
> which VD_BHSI was missing).
> The aarch64 hunk is needed to see the benefit of the simplify-rtx.c
> hunk, so I didn't split them
> into separate patches.
> 
> Before this for the testcase we'd generate:
> construct_lanedi:
> movi v0.4s, 0
> ldr x0, [x0]
> ins v0.d[0], x0
> ret
> 
> construct_lanedf:
> movi v0.2d, 0
> ldr d1, [x0]
> ins v0.d[0], v1.d[0]
> ret
> 
> but now we can generate:
> construct_lanedi:
> ldr d0, [x0]
> ret
> 
> construct_lanedf:
> ldr d0, [x0]
> ret
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2017-06-06  Kyrylo Tkachov  
> 
> * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
> Simplify vec_merge of vec_duplicate and const_vector.
> * config/aarch64/predicates.md (aarch64_simd_or_scalar_imm_zero):
> New predicate.
> * config/aarch64/aarch64-simd.md (*aarch64_combinez): Use VDC
> mode iterator.  Update predicate on operand 1 to
> handle non-const_vec constants.  Delete constraints.
> (*aarch64_combinez_be): Likewise for operand 2.
> 
> 2017-06-06  Kyrylo Tkachov  
> 
> * gcc.target/aarch64/construct_lane_zero_1.c: New test.
OK for the simplify-rtx parts.

jeff
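
For readers following along, the two rewrites quoted above can be checked with a small C model of the RTL lane semantics. This is only a sketch and not code from the patch: struct v2 and the helper functions are names made up here, and the model assumes that bit i of the vec_merge mask selects lane i from the first operand.

```c
#include <assert.h>

/* Two-lane vector of 64-bit elements, standing in for V2DI (hypothetical).  */
struct v2 { long long lane[2]; };

/* (vec_duplicate x): every lane holds x.  */
static struct v2 vec_duplicate (long long x)
{
  struct v2 r = { { x, x } };
  return r;
}

/* (vec_concat a b): lane 0 is a, lane 1 is b.  */
static struct v2 vec_concat (long long a, long long b)
{
  struct v2 r = { { a, b } };
  return r;
}

/* (vec_merge a b mask): lane i comes from a when bit i of mask is set,
   otherwise from b.  */
static struct v2 vec_merge (struct v2 a, struct v2 b, unsigned mask)
{
  struct v2 r;
  for (int i = 0; i < 2; i++)
    r.lane[i] = ((mask >> i) & 1) ? a.lane[i] : b.lane[i];
  return r;
}

static int same (struct v2 p, struct v2 q)
{
  return p.lane[0] == q.lane[0] && p.lane[1] == q.lane[1];
}

/* Returns 1 when both rewrites agree with the unsimplified vec_merge.  */
int check_merge_dup_const (long long x, long long a, long long b)
{
  struct v2 cv = vec_concat (a, b);	/* models (const_vector [A, B]) */
  return same (vec_merge (vec_duplicate (x), cv, 1), vec_concat (x, b))
	 && same (vec_merge (vec_duplicate (x), cv, 2), vec_concat (a, x));
}

void demo_merge_dup_const (void)
{
  assert (check_merge_dup_const (7, 0, 0));
  assert (check_merge_dup_const (-3, 11, 22));
}
```

With A = B = 0 the N == 1 case is the "load into lane 0, zero lane 1" form that the *aarch64_combinez pattern is meant to match.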



Re: [PATCH] vec_merge + vec_duplicate + vec_concat simplification

2017-06-27 Thread Jeff Law
On 06/06/2017 02:35 AM, Kyrill Tkachov wrote:
> Hi all,
> 
> Another vec_merge simplification that's missing is transforming:
> (vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N))
> into
> (vec_concat x z) if N == 1 (0b01) or
> (vec_concat y x) if N == 2 (0b10)
> 
> For the testcase in this patch on aarch64 this allows us to try matching
> during combine the pattern:
> (set (reg:V2DI 78 [ x ])
> (vec_concat:V2DI
> (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64])
> (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
> (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) +
> 8B]+0 S8 A64])))
> 
> rather than the more complex:
> (set (reg:V2DI 78 [ x ])
> (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (plus:DI (reg/v/f:DI 76
> [ y ])
> (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D)
> + 8B]+0 S8 A64]))
> (vec_duplicate:V2DI (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0
> S8 A64]))
> (const_int 2 [0x2])))
> 
> We don't actually have an aarch64 pattern for the simplified version
> above, but it's a simple enough
> form to add, so this patch adds such a pattern that performs a
> concatenated load of two 64-bit vectors
> in adjacent memory locations as a single Q-register LDR. The new aarch64
> pattern is needed to demonstrate
> the effectiveness of the simplify-rtx change, so I've kept them together
> as one patch.
> 
> Now for the testcase in the patch we can generate:
> construct_lanedi:
> ldr q0, [x0]
> ret
> 
> construct_lanedf:
> ldr q0, [x0]
> ret
> 
> instead of:
> construct_lanedi:
> ld1r {v0.2d}, [x0]
> ldr x0, [x0, 8]
> ins v0.d[1], x0
> ret
> 
> construct_lanedf:
> ld1r {v0.2d}, [x0]
> ldr d1, [x0, 8]
> ins v0.d[1], v1.d[0]
> ret
> 
> The new memory constraint Utq is needed because we need to allow only
> the Q-register addressing modes but
> the MEM expressions in the RTL pattern have 64-bit vector modes, and if
> we don't constrain them they will
> allow the D-register addressing modes during register allocation/address
> mode selection, which will produce
> invalid assembly.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2017-06-06  Kyrylo Tkachov  
> 
> * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
> Simplify vec_merge of vec_duplicate and vec_concat.
> * config/aarch64/constraints.md (Utq): New constraint.
> * config/aarch64/aarch64-simd.md (load_pair_lanes): New
> define_insn.
> 
> 2017-06-06  Kyrylo Tkachov  
> 
> * gcc.target/aarch64/load_v2vec_lanes_1.c: New test.
OK for the simplify-rtx bits.

jeff



Re: [PATCH][simplify-rtx] Simplify vec_merge of vec_duplicates into vec_concat

2017-06-27 Thread Jeff Law
On 06/06/2017 02:38 AM, Kyrill Tkachov wrote:
> Hi all,
> 
> Another vec_merge simplification that's missing from simplify-rtx.c is
> transforming
> a vec_merge of two vec_duplicates. For example:
> (set (reg:V2DF 80)
> (vec_merge:V2DF (vec_duplicate:V2DF (reg:DF 84))
> (vec_duplicate:V2DF (reg:DF 81))
> (const_int 2)))
> 
> Can be transformed into the simpler:
> (set (reg:V2DF 80)
> (vec_concat:V2DF (reg:DF 81)
> (reg:DF 84)))
> 
> I believe this should always be beneficial.
> I'm still looking into finding a small testcase demonstrating this, but
> on aarch64 SPEC
> I've seen this eliminate some really bizarre codegen where GCC was
> generating nonsense like:
>   ldr q18, [sp, 448]
>   ins v18.d[0], v23.d[0]
>   ins v18.d[1], v22.d[0]
> 
> With q18 being pushed and popped off the stack in the prologue and
> epilogue of the function!
> These are large files from SPEC that I haven't been able to analyse yet
> as to why GCC even attempts
> to do that, but with this patch it doesn't try to load a register and
> overwrite all its lanes.
> This patch shaves off about 5k of code size from zeusmp on aarch64 at
> -O3, so I believe it's a good
> thing to do.
> 
> Ok?
> 
> Thanks,
> Kyrill
> 
> 2017-06-06  Kyrylo Tkachov  
> 
> * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
> of two vec_duplicates into a vec_concat.
OK.  Though I'd really like to see a testcase to exercise the
simplification.

jeff
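
Short of a real testcase, the lane arithmetic behind this rewrite can at least be sanity-checked in plain C. The sketch below is an assumption-laden model rather than anything from simplify-rtx.c: merge_of_dups_lane is a hypothetical helper, and the parameters r84 and r81 only echo the register numbers in the example above.

```c
#include <assert.h>

/* Lane I of (vec_merge (vec_duplicate FIRST) (vec_duplicate SECOND) MASK):
   a set bit I in MASK picks the first operand, a clear bit the second.  */
static double merge_of_dups_lane (double first, double second,
				  unsigned mask, int i)
{
  return ((mask >> i) & 1) ? first : second;
}

/* With mask 2 (0b10) the result should equal (vec_concat r81 r84),
   i.e. lane 0 = r81 and lane 1 = r84.  Returns 1 when that holds.  */
int merge_of_dups_is_concat (double r84, double r81)
{
  double lane0 = merge_of_dups_lane (r84, r81, 2, 0);
  double lane1 = merge_of_dups_lane (r84, r81, 2, 1);
  return lane0 == r81 && lane1 == r84;
}

void demo_merge_of_dups (void)
{
  assert (merge_of_dups_is_concat (1.5, -2.0));
}
```

Because every lane of a vec_duplicate holds the same scalar, the merged vector only ever contains the two scalars, which is why a vec_concat of them is sufficient.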


Re: C/C++ PATCH to add __typeof_noqual (PR c/65455, c/39985)

2017-06-27 Thread Martin Sebor

On 06/27/2017 11:50 AM, Joseph Myers wrote:

On Tue, 27 Jun 2017, Martin Sebor wrote:


Another thing, with the current patch, __typeof_noqual__(const int)
would still produce "const int".  With the __atomic_load_n proposal
it'd return "int".  I don't know what we want to do for typenames,
but __typeof__(_Atomic int) produces "atomic int".


I missed that.  That seems surprising.  I would expect the trait
to evaluate to the same type regardless of the argument (type or
expression).  Why does it only strip qualifiers from expressions
and not also from types?


The type stripping from atomic expressions is basically what's necessary
for some stdatomic.h macros to work, while minimizing the risk to existing
code.  Of course when adding _Atomic, anything whatever could have been
done with atomic types without risk to existing code, but I suppose there
is a case for thinking of typeof (typename) as being purely like
parentheses - not modifying the type at all.

I'd expect __typeof_noqual to remove qualifiers from both expressions and
type names.


I agree, although this discussion has made me even more convinced
that a set of simpler primitives would be preferable.


There's the usual question of what should be done with arrays
of qualified types (where C does not consider such an array type to be
qualified, but C++ considers it to have the same qualifiers as the element
type).  There's also the matter of qualifiers used internally by GCC to
represent const and noreturn functions.


What about _Atomic?  Should it also be removed?  If yes, how would
one then generically define a cv-unqualified object of an atomic
type when given a const- or volatile-qualified atomic type or object?


Unless __typeof__ (p) q = p; declares a restrict-qualified q when
p is a restrict-qualified pointer I don't think __remove_restrict
is needed.  Restrict doesn't qualify a type but rather a pointer
object it applies to so I would find the effect above unexpected


restrict acts as a type qualifier in C terms, the type being
"restrict-qualified pointer to ...".  I'd expect it to work just like
const and volatile in __typeof and __typeof_noqual.


Yes, syntactically restrict is (kind of like) a qualifier, but
semantically it's nothing like it (the standard says it's more
akin to a storage specifier).  Most (but not all) of the essential
properties of a restrict-qualified pointer also aren't removed by
removing the qualifier.  Given a 'T restrict *p =  T *q = p;'
the pointer q is subject to the same aliasing constraints as p.
I'm pretty sure most users also expect the definition of q above
to be equivalent to '__typeof__(p) q = p;'  If it weren't (and if
restrict were, in fact, part of the type extracted from p by
__typeof__), and applied to q then the assignment from p to q
would be undefined.

Conversely, if __typeof_noqual__ did remove the restrict qualifier
(as I agree it should), then similarly to the _Atomic question above,
how would one define a pointer q of the (non-restricted) type T when
given a restrict-qualified pointer p to T such that the assignment
(q = p) didn't also discard const or volatile (or _Atomic) qualifiers?

In my mind, all this speaks in favor of introducing simpler building
blocks.  From its name alone, the expected effects of a __remove_const
or __remove_atomic built-in (not to mention their utility) are far
clearer than those of __typeof_noqual__.

Martin
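
To make the starting point of this discussion concrete, here is a small sketch of what the existing __typeof__ does with const today. It uses only __typeof__ and C11 _Generic as supported by GCC and Clang; the macro and function names are invented for the illustration, and nothing here models the proposed __typeof_noqual__ or __remove_const.

```c
#include <assert.h>

/* Evaluates to 1 when P points to a const-qualified int, 0 for plain int.  */
#define POINTS_TO_CONST_INT(p) _Generic ((p), const int *: 1, int *: 0)

/* __typeof__ applied to an expression: the lvalue x has type const int.  */
int typeof_expr_keeps_const (void)
{
  const int x = 42;
  __typeof__ (x) y = x;		/* y is declared as const int */
  return POINTS_TO_CONST_INT (&y);
}

/* __typeof__ applied to a type name acts like parentheses.  */
int typeof_typename_keeps_const (void)
{
  __typeof__ (const int) z = 7;	/* z is declared as const int */
  return POINTS_TO_CONST_INT (&z);
}

void demo_typeof_const (void)
{
  assert (typeof_expr_keeps_const () == 1);
  assert (typeof_typename_keeps_const () == 1);
}
```

Both functions report a const-qualified result, which is the behaviour a qualifier-removing primitive would be expected to change.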


Re: [gofrontend-dev] [PATCH, go]: S/390: Fix generation of PtraceRegs

2017-06-27 Thread Ian Lance Taylor
On Tue, Jun 27, 2017 at 12:52 AM, Andreas Krebbel
 wrote:
>
> go bootstrap fails on s390x starting with r249472. With including the ptrace 
> header the s390 special
> code in mksysinfo.sh isn't used anymore:
>
> if test "$regs" = ""; then
>   # s390
>   regs=`grep '^type __user_regs_struct struct' gen-sysinfo.go || true`
>   if test "$regs" != ""; then
> # Substructures of __user_regs_struct on s390
> upcase_fields "__user_psw_struct" "PtracePsw" >> ${OUT} || true
> upcase_fields "__user_fpregs_struct" "PtraceFpregs" >> ${OUT} || true
> upcase_fields "__user_per_struct" "PtracePer" >> ${OUT} || true
>   fi
> fi
>
> Instead we fall through to the code with the generic handling which appears 
> to work fine. The only
> difference is that the former code used to uppercase the initial letters of 
> the struct member while
> the generic handler doesn't. The only user however appears to be 
> syscall_linux_s390(x).go.
>
> The attached patch removes the mksysinfo.sh S/390 specific handling and 
> adjusts the
> syscall_linux_s390* file accordingly.
>
> This fixes the bootstrap on s390x.

Thanks for sending this.

Committed to mainline.

Ian


Re: fix libcc1 dependencies in toplevel Makefile

2017-06-27 Thread Alexandre Oliva
On Jun 26, 2017, Olivier Hainque  wrote:

> On Jun 22, 2017, at 14:12 , Alexandre Oliva  wrote:
>> Your patch takes care of the build dependencies of libcc1, which should
>> avoid some scenarios that might lead to concurrency between staged and
>> non-staged builds.  However, I don't see that it ensures libcc1 will be
>> built after GCC in bootstrap scenarios; it might do so under 'make
>> bootstrap', but probably not under 'make all-libcc1'.  I think we may
>> need some additional bootstrap-only explicit dependency for that to work
>> properly.

> I don't quite understand this: we're using the same prerequisite as target
> libraries, e.g. all-target-libstdc++-v3 or all-target-libbacktrace

Not quite.  Target libraries have deps on e.g. target-libgcc, look below
the following comments in Makefile.in:

# Dependencies for target modules on other target modules are
# described by lang_env_dependencies; the defaults apply to anything
# not mentioned there.

plus, maybe-configure*-target-libgcc depend on maybe-all*-gcc (see above
those comments).  The precise deps vary per bootstrap level, or
non-bootstrap.

But after the proposed patch there are no such deps for libcc1 in the
bootstrap case, so we might very well attempt to build libcc1 in
parallel with gcc.  We shouldn't do that.

But then, it all works out because we only build all-host after
bootstrap is complete; all-stage* doesn't depend on libcc1 at all.


> and I don't see other deps for these either.

> I don't see why the sequencing constraints for libcc1 should be tighter
> than those for the target libraries.

It was not about making them tighter, just about making them present.
Right now, in the bootstrap case, they're entirely implicit, by the fact
that we complete bootstrap first, then proceed to build all-host
all-target.  This deserves at least a comment somewhere, perhaps next to
libcc1 in Makefile.def, or next to depgcc.

Something to the effect that depgcc brings in a necessary dependency
that is implicit in the bootstrap case by the fact that we first
bootstrap, then proceed to build all-host all-target.

Perhaps instead of depgcc=true, we should have a new flag in
dependencies that indicates the dep should be non-bootstrap only.  Or
maybe the code that implements dependencies could figure it out on its
own, when it sees a dep between a non-bootstrap module and a bootstrap
one, and generate the deps within @if gcc-no-bootstrap/@endif.

I think this would get us the behavior we want in both bootstrap and
non-bootstrap cases, including the libcc1 configure dep that, as it is,
might cause GCC to be configured in parallel given the right (or rather
wrong) conditions.


On Jun 27, 2017, Olivier Hainque  wrote:

>> On Jun 26, 2017, at 09:16 , Olivier Hainque  wrote:

> make -j 32 BOOT_LDFLAGS=-Wl,--stack=0x200 CC=gcc 'ADAFLAGS=-W
> -Wall -gnatpg -gnata -gnatws -gnatU -gnatyN' CXXFLAGS=-O2
> BOOT_CFLAGS=-O2 CFLAGS=-O2 'LN_S=cp -p' 'BOOT_ADAFLAGS=-gnatpgn
> -gnatU' 'STAGE1_CFLAGS=-O2 -O0 -g' bootstrap

Thanks.  Given that 'bootstrap' is the only requested make target, we
can be assured that something iffy took place.  What I can't figure out
is how we even tried to build libcc1 during bootstrap, under that
configuration, because the current Makefile would only do that with
all-host, after bootstrap is complete.

> From the logs of discussions we tracked, the understanding
> of the dependency issue was that we *had* (before the patch),
> possibilities to have stage_current and maybe-all-gcc targets
> built concurrently, via

>> configure-target-libquadmath: stage_current
>> all-target-libquadmath: configure-target-libquadmath
>> maybe-all-target-libquadmath: all-target-libquadmath

>> all-target: maybe-all-target-libquadmath

> on the one hand,

>> all-libcc1: maybe-all-gcc

>> maybe-all-libcc1: all-libcc1

>> all-host: maybe-all-libcc1

> on the other hand.

> Does that make sense ?

Yeah.  Running all-gcc while unstage does its directory-moving dance
can't be good.  We can't have them both.


So, would you like to give the automatic figuring out of
non-bootstrap-on-bootstrap deps in dependencies, and guard them between
@if gcc-no-bootstrap and @endif (then both configure- and all- libcc1
deps would be adjusted this way)?  (I'm not saying it should be trivial
to do or anything like that; I'm not all that familiar with it and I'd
have to figure it out myself if I were to do it, but I think that would
be better than adding yet another means of introducing dependencies,
while leaving another risky dep in place)

Thanks,

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


[C++ PATCH] tsubs_decl

2017-06-27 Thread Nathan Sidwell
I was looking at why tsubst_decl wanted to call constructor_name. 
Turned out it was a check with no effect -- setting 'member' to 2 
instead of 1.  And we only ever check if member is zero or non-zero.


so deleted that check, turned member into a bool, and tidied up the code 
a bit to declare vars at their first initialization.


Applied to trunk.

nathan
--
Nathan Sidwell
2017-06-27  Nathan Sidwell  

	* pt.c (tsubst_decl ): Move var decls to
	initialization point.  Don't unnecessarily check for ctor name.

Index: pt.c
===
--- pt.c	(revision 249702)
+++ pt.c	(working copy)
@@ -12277,22 +12277,13 @@ tsubst_decl (tree t, tree args, tsubst_f
 
 case FUNCTION_DECL:
   {
-	tree ctx;
-	tree argvec = NULL_TREE;
-	tree *friends;
-	tree gen_tmpl;
-	tree type;
-	int member;
-	int args_depth;
-	int parms_depth;
+	tree gen_tmpl, argvec;
 
 	/* Nobody should be tsubst'ing into non-template functions.  */
 	gcc_assert (DECL_TEMPLATE_INFO (t) != NULL_TREE);
 
 	if (TREE_CODE (DECL_TI_TEMPLATE (t)) == TEMPLATE_DECL)
 	  {
-	tree spec;
-
 	/* If T is not dependent, just return it.  */
 	if (!uses_template_parms (DECL_TI_ARGS (t)))
 	  RETURN (t);
@@ -12310,9 +12301,7 @@ tsubst_decl (tree t, tree args, tsubst_f
 
 	/* Check to see if we already have this specialization.  */
 	hash = hash_tmpl_and_args (gen_tmpl, argvec);
-	spec = retrieve_specialization (gen_tmpl, argvec, hash);
-
-	if (spec)
+	if (tree spec = retrieve_specialization (gen_tmpl, argvec, hash))
 	  {
 		r = spec;
 		break;
@@ -12350,11 +12339,11 @@ tsubst_decl (tree t, tree args, tsubst_f
 
 	   which we can spot because the pattern will be a
 	   specialization in this case.  */
-	args_depth = TMPL_ARGS_DEPTH (args);
-	parms_depth =
+	int args_depth = TMPL_ARGS_DEPTH (args);
+	int parms_depth =
 	  TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (DECL_TI_TEMPLATE (t)));
-	if (args_depth > parms_depth
-		&& !DECL_TEMPLATE_SPECIALIZATION (t))
+
+	if (args_depth > parms_depth && !DECL_TEMPLATE_SPECIALIZATION (t))
 	  args = get_innermost_template_args (args, parms_depth);
 	  }
 	else
@@ -12371,23 +12360,18 @@ tsubst_decl (tree t, tree args, tsubst_f
 	   new decl (R) with appropriate types so that we can call
 	   determine_specialization.  */
 	gen_tmpl = NULL_TREE;
+	argvec = NULL_TREE;
 	  }
 
-	if (DECL_CLASS_SCOPE_P (t))
-	  {
-	if (DECL_NAME (t) == constructor_name (DECL_CONTEXT (t)))
-	  member = 2;
-	else
-	  member = 1;
-	ctx = tsubst_aggr_type (DECL_CONTEXT (t), args,
-complain, t, /*entering_scope=*/1);
-	  }
-	else
-	  {
-	member = 0;
-	ctx = DECL_CONTEXT (t);
-	  }
-	type = tsubst (TREE_TYPE (t), args, complain|tf_fndecl_type, in_decl);
+	tree ctx = DECL_CONTEXT (t);
+	bool member = ctx && TYPE_P (ctx);
+
+	if (member)
+	  ctx = tsubst_aggr_type (ctx, args,
+  complain, t, /*entering_scope=*/1);
+
+	tree type = tsubst (TREE_TYPE (t), args,
+			complain | tf_fndecl_type, in_decl);
 	if (type == error_mark_node)
 	  RETURN (error_mark_node);
 
@@ -12507,14 +12491,13 @@ tsubst_decl (tree t, tree args, tsubst_f
 	  DECL_TEMPLATE_INFO (r) = NULL_TREE;
 
 	/* Copy the list of befriending classes.  */
-	for (friends = &DECL_BEFRIENDING_CLASSES (r);
+	for (tree *friends = &DECL_BEFRIENDING_CLASSES (r);
 	 *friends;
 	 friends = &TREE_CHAIN (*friends))
 	  {
 	*friends = copy_node (*friends);
-	TREE_VALUE (*friends) = tsubst (TREE_VALUE (*friends),
-	args, complain,
-	in_decl);
+	TREE_VALUE (*friends)
+	  = tsubst (TREE_VALUE (*friends), args, complain, in_decl);
 	  }
 
 	if (DECL_CONSTRUCTOR_P (r) || DECL_DESTRUCTOR_P (r))


Re: [PATCH, rs6000] Add support to __builtin_cpu_supports() for two new HWCAP2 bits

2017-06-27 Thread Peter Bergner
On 6/27/17 11:07 AM, Segher Boessenkool wrote:
> Not use an installed header, that's not what I'm asking.  Share the
> source file, i.e., just copy it over from the glibc source tree (it
> should probably hold the master copy).  Fewer typos, cannot forget to
> update some entry, etc.

So the glibc file is:

  sysdeps/powerpc/bits/hwcap.h

which contains only the #define PPC_FEATURE[2]_* definitions.
The GCC file is:

  gcc/config/rs6000/ppc-auxv.h

and contains the same #define's as hwcap.h above, plus the additional
#defines:

/* The PLATFORM value stored in the TCB is offset by _DL_FIRST_PLATFORM.  */
#define _DL_FIRST_PLATFORM 32

/* AT_PLATFORM bits.  These must match the values defined in GLIBC. */
#define PPC_PLATFORM_POWER4 0
#define PPC_PLATFORM_PPC970 1
#define PPC_PLATFORM_POWER5 2
...

which match values in glibc's sysdeps/powerpc/dl-procinfo.h, but that
file contains a lot more than just the defines that we (GCC) don't
want or need.

ppc-auxv.h also contains the following helper macros that calculate the
fixed offsets to the TCB slots that glibc initializes, but glibc has
access to the structs that the slots live in, so they don't need these
helper macros and hence don't have them:

/* Thread Control Block (TCB) offsets of the AT_PLATFORM, AT_HWCAP and
   AT_HWCAP2 values.  These must match the values defined in GLIBC.  */
#define TCB_PLATFORM_OFFSET ((TARGET_64BIT) ? -28764 : -28724)
#define TCB_HWCAP_BASE_OFFSET ((TARGET_64BIT) ? -28776 : -28736)
#define TCB_HWCAP1_OFFSET \
  ((BYTES_BIG_ENDIAN) ? TCB_HWCAP_BASE_OFFSET : TCB_HWCAP_BASE_OFFSET+4)
#define TCB_HWCAP2_OFFSET \
  ((BYTES_BIG_ENDIAN) ? TCB_HWCAP_BASE_OFFSET+4 : TCB_HWCAP_BASE_OFFSET)
#define TCB_HWCAP_OFFSET(ID) \
  (((ID) == 0) ? TCB_HWCAP1_OFFSET : TCB_HWCAP2_OFFSET)

These are only used in rs6000.c, so I could move them there.

So given the above, how do we want to handle this?  If we were to copy a
header file(s) over from glibc, are we able to modify it in the process?
Ie, to remove the parts we don't need like hwcap.h's use of:

  #if !defined(_SYS_AUXV_H) && !defined(_SYSDEPS_SYSDEP_H)
  # error "Never include  directly; use  instead."
  #endif

which would trigger for our use of it.  And also to remove unneeded code from
dl-procinfo.h, since we only want the #defines.

Peter
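
For reference, the offset arithmetic done by the helper macros quoted above can be written out as ordinary C functions. This is a sketch only: in GCC, TARGET_64BIT and BYTES_BIG_ENDIAN are target macros, whereas here they are replaced by the hypothetical parameters is_64bit and big_endian so the four cases can be evaluated on any host. The numeric constants are the ones quoted from ppc-auxv.h.

```c
#include <assert.h>

/* Mirrors TCB_PLATFORM_OFFSET.  */
int tcb_platform_offset (int is_64bit)
{
  return is_64bit ? -28764 : -28724;
}

/* Mirrors TCB_HWCAP_BASE_OFFSET.  */
int tcb_hwcap_base_offset (int is_64bit)
{
  return is_64bit ? -28776 : -28736;
}

/* Mirrors TCB_HWCAP_OFFSET (ID): ID 0 selects AT_HWCAP, anything else
   AT_HWCAP2.  The two 32-bit words swap places with endianness.  */
int tcb_hwcap_offset (int id, int is_64bit, int big_endian)
{
  int base = tcb_hwcap_base_offset (is_64bit);
  int hwcap1 = big_endian ? base : base + 4;
  int hwcap2 = big_endian ? base + 4 : base;
  return id == 0 ? hwcap1 : hwcap2;
}

void demo_tcb_offsets (void)
{
  /* 64-bit little-endian: HWCAP2 sits at the base, HWCAP one word up.  */
  assert (tcb_hwcap_offset (1, 1, 0) == -28776);
  assert (tcb_hwcap_offset (0, 1, 0) == -28772);
}
```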




Re: Use ucontext_t not struct ucontext in linux-unwind.h files

2017-06-27 Thread Joseph Myers
On Tue, 27 Jun 2017, Joseph Myers wrote:

> Current glibc no longer gives the ucontext_t type the tag struct
> ucontext, to conform with POSIX namespace rules.  This requires
> various linux-unwind.h files in libgcc, that were previously using
> struct ucontext, to be fixed to use ucontext_t instead.  This is
> similar to the removal of the struct siginfo tag from siginfo_t some
> years ago.
> 
> This patch changes those files to use ucontext_t instead.  As the
> standard name, that should be unconditionally safe, so this is not
> restricted to architectures supported by glibc, or conditioned on the
> glibc version.
> 
> Testing compilation together with current glibc with glibc's
> build-many-glibcs.py.  OK to commit (mainline and active release
> branches) if that passes?

That compilation testing has now passed (together with a couple of glibc 
patches, now committed, to fix the build with -Wmultistatement-macros).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Fold (a > 0 ? 1.0 : -1.0) into copysign (1.0, a) and a * copysign (1.0, a) into abs(a)

2017-06-27 Thread Andrew Pinski
On Tue, Jun 27, 2017 at 7:56 AM, Richard Biener
 wrote:
> On June 27, 2017 4:52:28 PM GMT+02:00, Tamar Christina 
>  wrote:
>>> >> +(for cmp (gt ge lt le)
>>> >> + outp (convert convert negate negate)
>>> >> + outn (negate negate convert convert)
>>> >> + /* Transform (X > 0.0 ? 1.0 : -1.0) into copysign(1, X). */
>>> >> + /* Transform (X >= 0.0 ? 1.0 : -1.0) into copysign(1, X). */
>>> >> + /* Transform (X < 0.0 ? 1.0 : -1.0) into copysign(1,-X). */
>>> >> + /* Transform (X <= 0.0 ? 1.0 : -1.0) into copysign(1,-X). */
>>> >> +(simplify
>>> >> +  (cond (cmp @0 real_zerop) real_onep real_minus_onep)
>>> >> +  (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type)
>>> >> +   && types_match (type, TREE_TYPE (@0)))
>>> >> +   (switch
>>> >> +(if (types_match (type, float_type_node))
>>> >> + (BUILT_IN_COPYSIGNF { build_one_cst (type); } (outp @0)))
>>> >> +(if (types_match (type, double_type_node))
>>> >> + (BUILT_IN_COPYSIGN { build_one_cst (type); } (outp @0)))
>>> >> +(if (types_match (type, long_double_type_node))
>>> >> + (BUILT_IN_COPYSIGNL { build_one_cst (type); } (outp @0))
>>> >>
>>
>>Hi,
>>
>>Out of curiosity is there any reason why this transformation can't be
>>more general?
>>
>>e.g. Transform (X > 0.0 ? CST : -CST) into copysign(CST, X).
>
> That's also possible, yes.

I will be implementing that later today.

Thanks,
Andrew Pinski

>
>>we would at the very least avoid a csel or a branch then.
>>
>>Regards,
>>Tamar
>
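
A quick way to convince oneself of the generalized fold is to compare the two forms on a few inputs. The sketch below is illustrative only and makes several assumptions: copysign_bits is a hand-written stand-in for copysign() (so nothing from libm is needed), double is taken to be IEEE 754 binary64, CST is positive, and the inputs are non-zero and not NaN, in line with the !HONOR_NANS and !HONOR_SIGNED_ZEROS conditions in the pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIGN_BIT (UINT64_C (1) << 63)

/* Magnitude of MAG combined with the sign bit of SGN, done on the raw
   bits of an IEEE 754 binary64 value.  */
static double copysign_bits (double mag, double sgn)
{
  uint64_t m, s;
  memcpy (&m, &mag, sizeof m);
  memcpy (&s, &sgn, sizeof s);
  m = (m & ~SIGN_BIT) | (s & SIGN_BIT);
  memcpy (&mag, &m, sizeof mag);
  return mag;
}

/* The conditional form: X > 0.0 ? CST : -CST.  */
double select_form (double x, double cst)
{
  return x > 0.0 ? cst : -cst;
}

/* The folded form: copysign (CST, X).  */
double copysign_form (double x, double cst)
{
  return copysign_bits (cst, x);
}

/* The second fold from the subject line: a * copysign (1.0, a).  */
double abs_via_copysign (double x)
{
  return x * copysign_bits (1.0, x);
}

void demo_copysign_fold (void)
{
  assert (select_form (3.5, 2.0) == copysign_form (3.5, 2.0));
  assert (select_form (-0.25, 2.0) == copysign_form (-0.25, 2.0));
  assert (abs_via_copysign (-0.25) == 0.25);
}
```

Note that the two forms differ at x == 0.0 and for a negative CST, so a real implementation would need to guard or normalize those cases.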


Re: C/C++ PATCH to add __typeof_noqual (PR c/65455, c/39985)

2017-06-27 Thread Joseph Myers
On Tue, 27 Jun 2017, Martin Sebor wrote:

> > Another thing, with the current patch, __typeof_noqual__(const int)
> > would still produce "const int".  With the __atomic_load_n proposal
> > it'd return "int".  I don't know what we want to do for typenames,
> > but __typeof__(_Atomic int) produces "atomic int".
> 
> I missed that.  That seems surprising.  I would expect the trait
> to evaluate to the same type regardless of the argument (type or
> expression).  Why does it only strip qualifiers from expressions
> and not also from types?

The type stripping from atomic expressions is basically what's necessary 
for some stdatomic.h macros to work, while minimizing the risk to existing 
code.  Of course when adding _Atomic, anything whatever could have been 
done with atomic types without risk to existing code, but I suppose there 
is a case for thinking of typeof (typename) as being purely like 
parentheses - not modifying the type at all.

I'd expect __typeof_noqual to remove qualifiers from both expressions and 
type names.  There's the usual question of what should be done with arrays 
of qualified types (where C does not consider such an array type to be 
qualified, but C++ considers it to have the same qualifiers as the element 
type).  There's also the matter of qualifiers used internally by GCC to 
represent const and noreturn functions.

> Unless __typeof__ (p) q = p; declares a restrict-qualified q when
> p is a restrict-qualified pointer I don't think __remove_restrict
> is needed.  Restrict doesn't qualify a type but rather a pointer
> object it applies to so I would find the effect above unexpected

restrict acts as a type qualifier in C terms, the type being 
"restrict-qualified pointer to ...".  I'd expect it to work just like 
const and volatile in __typeof and __typeof_noqual.

-- 
Joseph S. Myers
jos...@codesourcery.com
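
The point about restrict being carried by the type can be observed with the existing __typeof__. The following is a minimal sketch, assuming a C11 compiler with the GNU __typeof__ extension; the function name is made up, and q is deliberately never assigned from p, since assigning between two restrict-qualified pointers declared in the same block is exactly the undefined case raised in this thread.

```c
#include <assert.h>

/* Returns 1 when __typeof__ (p) is "restrict-qualified pointer to int",
   0 when the qualifier has been dropped.  */
int typeof_keeps_restrict (void)
{
  int v = 0;
  int *restrict p = &v;
  __typeof__ (p) q = 0;		/* same type as p, not initialized from p */
  (void) p;
  /* &q has type "pointer to restrict-qualified pointer to int" only if
     q itself is restrict-qualified.  */
  return _Generic ((&q), int *restrict *: 1, int **: 0);
}

void demo_typeof_restrict (void)
{
  assert (typeof_keeps_restrict () == 1);
}
```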


Re: [PATCH GCC][6/6]Avoid aggressive predcom for high register pressure cases

2017-06-27 Thread Jeff Law
On 05/12/2017 05:28 AM, Bin Cheng wrote:
> Hi,
> Aggressive predcom could result in a larger number of loop-carried variables, 
causing high
> register pressure and spilling.  One example is the hot loop of 
> 436.cactusADM, in
> which >25 loop carried variables are introduced for the vectorized version 
> loop,
> depending on the vector factor.  This patch computes loop register pressure 
> on tree
> ssa using previously introduced interface.  It uses the information to prune 
> chains
> with a simple heuristic.  For example, combined and zero-length chains are 
> always
> allowed; other chains are allowed under register cost; and loop unrolling is 
> forced
> off if register pressure is high.  With this patch, the benchmark can be 
obviously
> improved on AArch64.
> 
> Bootstrap and test on x86_64 and AArch64, is it OK?
> 
> Thanks,
> bin
> 2017-05-10  Bin Cheng  
> 
>   * tree-predcom.c (stor-layout.h, tree-ssa-regpressure.h): New header
>   files.
>   (prune_chains): New function.
>   (tree_predictive_commoning_loop): Call compute_loop_reg_pressure to
>   compute reg pressure.  Prune chains based on reg pressure.  Force
>   to not unroll if reg pressure is high.
> 
So obviously this will be dependent on the whole set of earlier patches
and may need some tweaking if those change.  The most concerning part is
exposing register classes to this code.  If we could avoid that it would
be good.

But I'm on board with the idea of using an estimated register pressure
to tune this stuff a bit.

jeff
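
For anyone not familiar with the transformation, the source-level effect of predictive commoning on a simple recurrence looks roughly like the sketch below. It is a hand-written illustration, not output of tree-predcom.c, and the function names are invented. The variables x0 and x1 in the second version are the loop-carried values whose number the patch tries to keep in check.

```c
#include <assert.h>

/* Original loop: each iteration reloads a[i] and a[i + 1].  */
void recurrence_plain (int *a, int n)
{
  for (int i = 0; i + 2 < n; i++)
    a[i + 2] = a[i] + a[i + 1];
}

/* After predictive commoning (by hand): the two reused values are kept
   in x0 and x1 and carried from one iteration to the next, so the loop
   body has no loads but needs two extra live registers.  */
void recurrence_commoned (int *a, int n)
{
  if (n < 3)
    return;
  int x0 = a[0], x1 = a[1];
  for (int i = 0; i + 2 < n; i++)
    {
      int x2 = x0 + x1;
      a[i + 2] = x2;
      x0 = x1;
      x1 = x2;
    }
}

void demo_predcom (void)
{
  int p[8] = { 1, 1 }, q[8] = { 1, 1 };
  recurrence_plain (p, 8);
  recurrence_commoned (q, 8);
  for (int i = 0; i < 8; i++)
    assert (p[i] == q[i]);
}
```

A loop with many such chains trades many loads for many carried values, which is where the spilling described above comes from.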


Re: [PATCH GCC][3/6]New file computing regional register pressure on TREE SSA

2017-06-27 Thread Jeff Law
On 05/12/2017 05:28 AM, Bin Cheng wrote:
> Hi,
> This patch computes register pressure information on TREE SSA by a backward 
> live
> range data flow problem.  The major motivation is to estimate register 
> pressure
> for inner-most loop on TREE SSA, then other optimizations can use it.  So far 
> the
> information is used only in predcom later, but it could be useful to 
> implement a
> tree level scheduler in order to shrink live ranges.  Unfortunately the 
> example
> live range shrink pass I implemented doesn't have obvious impact on 
> performance.
> I think one reason is TER which effectively undoes its effect.  Maybe it will 
> be
> useful once TER/expanding is replaced with a better instruction selector, it 
> is
> not included in this patch.
> One fact I need to mention is David proposed a similar patch a long time ago at
> https://gcc.gnu.org/ml/gcc-patches/2008-12/msg01261.html.  It tries to compute
> register pressure information on tree ssa and shrink live ranges based on that
> information.  Unfortunately the patch wasn't merged in the end.  There have 
been
> quite a few changes in the GCC implementation, so I didn't use its code directly.  However,
> I did read that patch and had it in mind when implementing this one.  If there
> is any issue in this patch, it would be me that should be blamed.  I also sent
> message to David about this patch and the possible relation with his.
> 
> Bootstrap and test on x86_64 and AArch64.  Is it OK?
> 
> Thanks,
> bin
> 
> 2017-05-10  Xinliang David Li  
>   Bin Cheng  
> 
>   * Makefile.in (tree-ssa-regpressure.o): New object file.
>   * tree-ssa-regpressure.c: New file.
>   * tree-ssa-regpressure.h: New file.
Any thoughts on tests, either end-to-end or unit testing?

At a high level does this make more sense as a pass or as a function
that is called by other passes?  I don't have a strong opinion here,
just putting the question out there for discussion.

You've got a live computation solver in here.  Is there some reason you
don't use the existing life analysis code?   I'd prefer not to have
another life analysis implementation if we can avoid it.  And if you
were using that code, I think you can easily get the coalescing data
you're using as well.

I haven't gone through all the detail in the patch as I think we need to
make sure we've got the design issues right first.  But there are a
couple nits noted inline below.
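
To frame the design question, the core of a backward live-range computation for a single straight-line block can be sketched in a few lines of C. This is a toy model under stated assumptions, not the algorithm in tree-ssa-regpressure.c: struct stmt, its bitmask encoding of value numbers, and max_live_values are all hypothetical, and register classes, PHI nodes and control flow are ignored.

```c
#include <assert.h>

/* Hypothetical statement: DEF and USES are bitmasks of value numbers.  */
struct stmt { unsigned def; unsigned uses; };

static int count_bits (unsigned m)
{
  int n = 0;
  for (; m; m &= m - 1)
    n++;
  return n;
}

/* Walk the block backwards, maintaining the live set with
   live = (live - def) + uses, and return the largest live set seen.  */
int max_live_values (const struct stmt *code, int n, unsigned live_out)
{
  unsigned live = live_out;
  int max = count_bits (live);
  for (int i = n - 1; i >= 0; i--)
    {
      live = (live & ~code[i].def) | code[i].uses;
      int now = count_bits (live);
      if (now > max)
	max = now;
    }
  return max;
}

/* v0 = load; v1 = load; v2 = v0 + v1; v3 = v2 * v1; v4 = v3 + v0,
   with only v4 live on exit.  */
int demo_max_live (void)
{
  const struct stmt block[] = {
    { 1u << 0, 0 },
    { 1u << 1, 0 },
    { 1u << 2, (1u << 0) | (1u << 1) },
    { 1u << 3, (1u << 2) | (1u << 1) },
    { 1u << 4, (1u << 3) | (1u << 0) },
  };
  int max = max_live_values (block, 5, 1u << 4);
  assert (max == 3);
  return max;
}
```

Here v0, v1 and v2 are simultaneously live just before v3 is computed, giving a peak of three values.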






> 
> 
> 0003-tree-ssa-regpressure-20170504.txt
> 
> 
> From bf6e51ff68d87c372719de567d4de49d77744f77 Mon Sep 17 00:00:00 2001
> From: Bin Cheng 
> Date: Mon, 8 May 2017 15:20:27 +0100
> Subject: [PATCH 3/6] tree-ssa-regpressure-20170504.txt
> 
> ---
>  gcc/Makefile.in|   1 +
>  gcc/tree-ssa-regpressure.c | 829 
> +
>  gcc/tree-ssa-regpressure.h |  21 ++
>  3 files changed, 851 insertions(+)
>  create mode 100644 gcc/tree-ssa-regpressure.c
>  create mode 100644 gcc/tree-ssa-regpressure.h
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 97259ac..abfd4bc 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1534,6 +1534,7 @@ OBJS = \
>   tree-ssa-pre.o \
>   tree-ssa-propagate.o \
>   tree-ssa-reassoc.o \
> + tree-ssa-regpressure.o \
>   tree-ssa-sccvn.o \
>   tree-ssa-scopedtables.o \
>   tree-ssa-sink.o \
> diff --git a/gcc/tree-ssa-regpressure.c b/gcc/tree-ssa-regpressure.c
> new file mode 100644
> index 000..ebc6576
> --- /dev/null
> +++ b/gcc/tree-ssa-regpressure.c
> @@ -0,0 +1,829 @@
> +/* Reg Pressure Model and Live Range Shrinking Optimization on TREE SSA.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it
> +under the terms of the GNU General Public License as published by the
> +Free Software Foundation; either version 3, or (at your option) any
> +later version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT
> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "memmodel.h"
> +#include "ira.h"
So I suspect what we need from ira.h is fairly narrow.  Would it be
possible to pull the externalized interfaces we need for register
pressure at the gimple level into a new include file?

My worry is that as we pull in ira.h (and rtl.h and who knows what else)
into the gimple space we end up with a tangled mess of touching rtl
things in gimple which we'd like to avoid.

In fact, the first thing that 

Re: Backports to 6 (and 7, and 5)

2017-06-27 Thread Jakub Jelinek
On Tue, Jun 27, 2017 at 11:02:37AM -0500, Segher Boessenkool wrote:
> On Tue, Jun 27, 2017 at 09:18:07AM +0200, Richard Biener wrote:
> > On Mon, 26 Jun 2017, Segher Boessenkool wrote:
> > > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01853.html
> > > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01923.html
> > > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02048.html
> > > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02606.html
> > > bb-reorder: Improve compgotos pass (PR71785)
> > 
> > It's not clear this fixes a regression and as it is
> > a missed-optimization I'd not backport it at this point in time
> > (I understand it's in GCC 7 already).
> 
> It's a regression from 3.x, and a pretty severe one, but we can live
> with it for a bit longer, it's a bit invasive for a backport.

It is just a missed optimization, not wrong-code, and the regression is
pretty old, so I think it doesn't hurt if it stays fixed only in 7+.

Jakub


[C++ PATCH] CLASSTYPE_DESTRUCTOR

2017-06-27 Thread Nathan Sidwell
There can only be one destructor, so CLASSTYPE_DESTRUCTORS is a 
confusing name.  (We're not talking about the various flavours of the 
single destructor.)


Applied.

nathan
--
Nathan Sidwell
2017-06-27  Nathan Sidwell  

	* cp-tree.h (CLASSTYPE_DESTRUCTORS): Rename to ...
	(CLASSTYPE_DESTRUCTOR): ... this.
	* class.c (accessible_nvdtor_p,
	maybe_warn_about_overly_private_class,
	add_implicitly_declared_members,
	clone_constructors_and_destructors, type_has_virtual_destructor):
	Adjust for CLASSTYPE_DESTRUCTOR.
	(deduce_noexcept_on_destructors): Absorb into ...
	(check_bases_and_members): ... here.
	* except.c (dtor_nothrow): Adjust for CLASSTYPE_DESTRUCTOR.
	* init.c (build_delete): Likewise.
	* parser.c (cp_parser_lookup_name): Likewise.
	* pt.c (check_explicit_specialization): Likewise.
	* rtti.c (emit_support_tinfos): Likewise.
	* search.c (lookup_fnfields_idx_nolazy): Likewise.

Index: class.c
===
--- class.c	(revision 249693)
+++ class.c	(working copy)
@@ -1711,7 +1711,7 @@ inherit_targ_abi_tags (tree t)
 static bool
 accessible_nvdtor_p (tree t)
 {
-  tree dtor = CLASSTYPE_DESTRUCTORS (t);
+  tree dtor = CLASSTYPE_DESTRUCTOR (t);
 
   /* An implicitly declared destructor is always public.  And,
  if it were virtual, we would have created it by now.  */
@@ -2220,7 +2220,7 @@ maybe_warn_about_overly_private_class (t
   /* Even if some of the member functions are non-private, the class
  won't be useful for much if all the constructors or destructors
  are private: such an object can never be created or destroyed.  */
-  fn = CLASSTYPE_DESTRUCTORS (t);
+  fn = CLASSTYPE_DESTRUCTOR (t);
   if (fn && TREE_PRIVATE (fn))
 {
   warning (OPT_Wctor_dtor_privacy,
@@ -3366,18 +3366,17 @@ add_implicitly_declared_members (tree t,
  int cant_have_const_cctor,
  int cant_have_const_assignment)
 {
-  bool move_ok = false;
+  /* Destructor.  */
+  if (!CLASSTYPE_DESTRUCTOR (t))
+/* In general, we create destructors lazily.  */
+CLASSTYPE_LAZY_DESTRUCTOR (t) = 1;
 
-  if (cxx_dialect >= cxx11 && !CLASSTYPE_DESTRUCTORS (t)
+  bool move_ok = false;
+  if (cxx_dialect >= cxx11 && CLASSTYPE_LAZY_DESTRUCTOR (t)
   && !TYPE_HAS_COPY_CTOR (t) && !TYPE_HAS_COPY_ASSIGN (t)
   && !type_has_move_constructor (t) && !type_has_move_assign (t))
 move_ok = true;
 
-  /* Destructor.  */
-  if (!CLASSTYPE_DESTRUCTORS (t))
-/* In general, we create destructors lazily.  */
-CLASSTYPE_LAZY_DESTRUCTOR (t) = 1;
-
   /* [class.ctor]
 
  If there is no user-declared constructor for a class, a default
@@ -5015,8 +5014,9 @@ clone_constructors_and_destructors (tree
  we no longer need to know that.  */
   for (ovl_iterator iter (CLASSTYPE_CONSTRUCTORS (t)); iter; ++iter)
 clone_function_decl (*iter, /*update_methods=*/true);
-  for (ovl_iterator iter (CLASSTYPE_DESTRUCTORS (t)); iter; ++iter)
-clone_function_decl (*iter, /*update_methods=*/true);
+
+  if (tree dtor = CLASSTYPE_DESTRUCTOR (t))
+clone_function_decl (dtor, /*update_methods=*/true);
 }
 
 /* Deduce noexcept for a destructor DTOR.  */
@@ -5029,24 +5029,6 @@ deduce_noexcept_on_destructor (tree dtor
 		noexcept_deferred_spec);
 }
 
-/* For each destructor in T, deduce noexcept:
-
-   12.4/3: A declaration of a destructor that does not have an
-   exception-specification is implicitly considered to have the
-   same exception-specification as an implicit declaration (15.4).  */
-
-static void
-deduce_noexcept_on_destructors (tree t)
-{
-  /* If for some reason we don't have a CLASSTYPE_METHOD_VEC, we bail
- out now.  */
-  if (!CLASSTYPE_METHOD_VEC (t))
-return;
-
-  for (ovl_iterator iter (CLASSTYPE_DESTRUCTORS (t)); iter; ++iter)
-deduce_noexcept_on_destructor (*iter);
-}
-
 /* Subroutine of set_one_vmethod_tm_attributes.  Search base classes
of TYPE for virtual functions which FNDECL overrides.  Return a
mask of the tm attributes found therein.  */
@@ -5460,7 +5442,7 @@ type_has_virtual_destructor (tree type)
 return false;
 
   gcc_assert (COMPLETE_TYPE_P (type));
-  dtor = CLASSTYPE_DESTRUCTORS (type);
+  dtor = CLASSTYPE_DESTRUCTOR (type);
   return (dtor && DECL_VIRTUAL_P (dtor));
 }
 
@@ -5851,10 +5833,11 @@ check_bases_and_members (tree t)
  of potential interest.  */
   check_bases (t, &cant_have_const_ctor, &no_const_asn_ref);
 
-  /* Deduce noexcept on destructors.  This needs to happen after we've set
+  /* Deduce noexcept on destructor.  This needs to happen after we've set
  triviality flags appropriately for our bases.  */
   if (cxx_dialect >= cxx11)
-deduce_noexcept_on_destructors (t);
+if (tree dtor = CLASSTYPE_DESTRUCTOR (t))
+  deduce_noexcept_on_destructor (dtor);
 
   /* Check all the method declarations.  */
   check_methods (t);
Index: cp-tree.h
===
--- cp-tree.h	(revision 249693)
+++ cp-tree.h	(working 

Re: [RFC][AARCH64]Add 'r' integer register operand modifier. Document the common asm modifier for aarch64 target.

2017-06-27 Thread Renlin Li

Hi Andrew,

On 27/06/17 17:11, Andrew Pinski wrote:

On Tue, Jun 27, 2017 at 8:27 AM, Renlin Li  wrote:

Hi Andrew,

On 25/06/17 22:38, Andrew Pinski wrote:


On Tue, Jun 6, 2017 at 3:56 AM, Renlin Li  wrote:


Hi all,

In this patch, a new integer register operand modifier 'r' is added. This
will use the
proper register name according to the mode of corresponding operand.

'w' register for scalar integer mode smaller than DImode
'x' register for DImode

This allows more flexibility and would meet people's expectations.
It will help for ILP32 and LP64, and big-endian case.

A new section is added to document the AArch64 operand modifiers which
might
be used in inline assembly. It's not an exhaustive list covers every
modifier.
Only the most common and useful ones are documented.

The default behavior of integer operand without modifier is clearly
documented
as well. It's not changed so that the patch shouldn't break anything.

So with this patch, it should resolve the issues in PR63359.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63359


aarch64-none-elf regression test Okay. Okay to check in?



I think 'r' modifier is very fragile and can be used incorrectly and
wrong in some cases really..



The user could always (or be encouraged to) opt to a strict register
modifier to enforce consistent behavior in all cases.

I agree the flexibility might bring unexpected behavior in corner cases.
Do you have any examples to share off the top of your head? So that we can
discuss the benefit and pitfalls, and decide to improve the patch or
withdraw it.


One thing is TImode is missing.  I have an use case of __int128_t
inside inline-asm.
For me %r and TImode would produce "x0, x1".  This is one of the
reasons why I said it is fragile.



This is true. Actually, I intended 'r' to handle only the simplest case of a
single integer register,
so that people won't believe it's a magic thing that can handle everything.
I could improve the description of 'r' to explain its limitations clearly.

For TImode integer data, using 'r' produces the error
"invalid 'asm': invalid operand mode for register modifier 'r'"
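
For reference, the strict modifiers mentioned above can be used like this.
This is only a minimal sketch of the existing 'w' and 'x' modifiers, not of
the proposed 'r' modifier; the function names add_u32 and add_u64 are made up
for illustration, and the non-AArch64 branch is a plain C fallback so that
the example builds everywhere.

```c
#include <stdint.h>

/* Add two 32-bit values.  On AArch64 the %w modifier selects the
   32-bit "w" name of each operand register; on other targets fall
   back to plain C so the sketch still compiles.  */
static inline uint32_t
add_u32 (uint32_t a, uint32_t b)
{
#if defined (__aarch64__) && defined (__GNUC__)
  uint32_t sum;
  __asm__ ("add %w0, %w1, %w2" : "=r" (sum) : "r" (a), "r" (b));
  return sum;
#else
  return a + b;
#endif
}

/* Same for 64-bit values, using %x to select the "x" register name.  */
static inline uint64_t
add_u64 (uint64_t a, uint64_t b)
{
#if defined (__aarch64__) && defined (__GNUC__)
  uint64_t sum;
  __asm__ ("add %x0, %x1, %x2" : "=r" (sum) : "r" (a), "r" (b));
  return sum;
#else
  return a + b;
#endif
}
```

Spelling out the width in the template keeps the emitted register name
independent of the mode of the C operand, which is the consistency argument
made above.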




I like the documentation though.


As an aside %H is not documented here.  Noticed it because I am using
%H with TImode.


For the documentation as well, I only documented the most common modifiers that might be used in 
inline assembly. It's good to know more use cases.

I could add 'H' into the document.

Regards,
Renlin



Thanks,
Andrew



Thanks,
Renlin




Thanks,
Andrew



gcc/ChangeLog:

2017-06-06  Renlin Li  

  PR target/63359
  * config/aarch64/aarch64.c (aarch64_print_operand): Add 'r'
modifier.
  * doc/extend.texi (AArch64Operandmodifiers): New section.


Re: C/C++ PATCH to add __typeof_noqual (PR c/65455, c/39985)

2017-06-27 Thread Martin Sebor

On 06/27/2017 07:14 AM, Marek Polacek wrote:

On Mon, Jun 26, 2017 at 10:37:03AM -0600, Martin Sebor wrote:

On 06/23/2017 08:46 AM, Marek Polacek wrote:

This patch adds a variant of __typeof, called __typeof_noqual.  As the name
suggests, this variant always drops all qualifiers, not just when the type
is atomic.  This was discussed several times in the past, see e.g.

or

It's been brought to my attention again here:


One approach would be to just modify the current __typeof, but that could
cause some incompatibilities, I'm afraid.  This is based on rth's earlier
patch:  but I
didn't do the address space-stripping variant __typeof_noas.  I also added
a couple of missing things.


I haven't reviewed all the discussions super carefully so I wonder
what alternatives have been considered.  For instance, it seems to
me that it should be possible to emulate __typeof_noqual__ by relying
on the atomic built-ins' type-genericity.  E.g., like this:

  #define __typeof_noqual__(x) \
__typeof__ (__atomic_load_n ((__typeof__ (x)*)0, 0))


This doesn't seem to work with structs/arrays/VLA, so wouldn't help.
(typeof can't handle bit-fields, so no need to worry about those.)


You're right, it doesn't appear to work the way I thought.  I was
misled by a poor warning message into believing it did.  But it
seems that it should work.  atomic_load() is specified (with DR
459 applied) to take a const volatile _Atomic T* argument and
return a plain T.  Or is there a problem I'm missing?  (Btw.,
I used the atomic built-in only as an example of the approach
I was thinking of, hoping someone would come up with a better
built-in or other existing extension to make the solution look
less hacky.)
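
To make the idea of emulating the feature concrete, here is a rough sketch
of one workaround that is available today.  It is not the proposed
__typeof_noqual and not the __atomic_load_n approach: it relies on the fact
that the result of "(x) + 0" has the promoted, unqualified type of x.  That
limits it to arithmetic operands, and narrow types such as char are promoted
to int.  The names MAX_ARITH and max_of_const_ints are invented for the
example.

```c
/* MAX for arithmetic operands.  "(x) + 0" is an rvalue of the
   promoted, unqualified type of x, so the temporary can be assigned
   to even when the argument itself is const-qualified.  As in the
   documentation example, y may be evaluated twice.  */
#define MAX_ARITH(x, y)                         \
  __extension__ ({                              \
    __typeof__ ((x) + 0) max_ret = (x);         \
    if ((y) > max_ret)                          \
      max_ret = (y);                            \
    max_ret;                                    \
  })

/* With plain __typeof__ (a) the temporary would be "const int" and
   the assignment inside the macro would be rejected.  */
static int
max_of_const_ints (void)
{
  const int a = 3;
  const int b = 7;
  return MAX_ARITH (a, b);
}
```

This does nothing for structs, arrays or VLAs, which is the same gap pointed
out for the atomic built-in approach.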



Another thing, with the current patch, __typeof_noqual__(const int)
would still produce "const int".  With the __atomic_load_n proposal
it'd return "int".  I don't know what we want to do for typenames,
but __typeof__(_Atomic int) produces "atomic int".


I missed that.  That seems surprising.  I would expect the trait
to evaluate to the same type regardless of the argument (type or
expression).  Why does it only strip qualifiers from expressions
and not also from types?

The __typeof__(_Atomic T) example lends support to the idea of
more primitive traits.  It's not hard to envision use cases where
one might be interested in obtaining the underlying (non-atomic)
type of an expression or type without losing its cv qualifiers,
or vice versa.




Alternatively, adding support for lower-level C-only primitives like
__remove_const and __remove_volatile, to parallel the C++ library
traits, might provide a more general solution and avoid introducing
yet another mechanism for determining the type of an expression to
the languages (C++ already has a few).


I don't know if that wouldn't be overkill.  Qualifiers on rvalues are
meaningless in C and that's why my __typeof_noqual strips them all.
We'd then need even e.g. __remove_restrict, not sure if there's need for
these.  Maybe it is.


Unless __typeof__ (p) q = p; declares a restrict-qualified q when
p is a restrict-qualified pointer I don't think __remove_restrict
is needed.  Restrict doesn't qualify a type but rather a pointer
object it applies to so I would find the effect above unexpected
(notwithstanding the fact that a copy of a restricted pointer is
subject to some of the same constraints as the original even
without the qualification).

I would expect the other remove traits to be useful.  People
have been experimenting with generic programming in C (e.g.,
https://gcc.gnu.org/ml/gcc/2017-05/msg00082.html).  WG14 is
considering a proposal to add const-correct overloads of many
of the string functions (like strchr) that relies on a (very
simple) form of it.  The proposal's author (and the submitter
of PR 65455) also has done a lot of work in this space
(detecting/removing/adding qualifiers).  I don't have nearly
as much experience with this type of programming in C but the
lesson I learned from my work on C++ type traits is that where
a more complex feature can be decomposed into two or more
smaller, primitive ones, the latter sooner or later end up
being needed in other contexts as well and make valuable
general-purpose features on their own.




+@code{typeof_noqual} behaves the same except that it strips type qualifiers
+such as @code{const} and @code{volatile}, if given an expression.  This can
+be useful for certain macros when passed const arguments:
+
+@smallexample
+#define MAX(__x, __y)  \
+  (@{  \
+  __typeof_noqual(__x) __ret = __x;\
+  if (__y > __ret) __ret = __y; \
+__ret; \
+  @})


The example should probably avoid using reserved names (with

Re: [PATCH GCC][02/13]Skip distribution if there is no loop

2017-06-27 Thread Jeff Law
On 06/12/2017 11:02 AM, Bin Cheng wrote:
> Hi,
> this is a simple patch skipping distribution if there is no loop at all.
> 
> Bootstrap and test on x86_64 and AArch64.  Is it OK?
> 
> Thanks,
> bin
> 
> 2017-06-07  Bin Cheng  
> 
>   * cfgloop.h (pass_loop_distribution::execute): Skip if no loops.
> 
OK.
jeff


[PING^3] Re: [PATCH] c/c++: Add fix-it hints for suggested missing #includes

2017-06-27 Thread David Malcolm
Ping re:

  https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00321.html

On Tue, 2017-06-20 at 15:32 -0400, David Malcolm wrote:
> Ping re:
> 
>   https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00321.html
> 
> 
> On Fri, 2017-05-26 at 15:54 -0400, David Malcolm wrote:
> > Ping:
> >   https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00321.html
> > 
> > On Thu, 2017-05-04 at 12:36 -0400, David Malcolm wrote:
> > > As of r247522, fix-it-hints can suggest the insertion of new
> > > lines.
> > > 
> > > This patch uses this to implement a new "maybe_add_include_fixit"
> > > function in c-common.c and uses it in the two places where the C
> > > and
> > > C++
> > > frontend can suggest missing #include directives. [1]
> > > 
> > > The idea is that the user can then click on the fix-it in an IDE
> > > and have it add the #include for them (or use
> > > -fdiagnostics-generate-patch).
> > > 
> > > Examples can be seen in the test cases.
> > > 
> > > The function attempts to put the #include in a reasonable place:
> > > immediately after the last #include within the file, or at the
> > > top of the file.  It is idempotent, so
> > > -fdiagnostics-generate-patch does the right thing if several such
> > > diagnostics are emitted.
> > > 
> > > Successfully bootstrapped on x86_64-pc-linux-gnu.
> > > 
> > > OK for trunk?
> > > 
> > > [1] I'm working on a followup which tweaks another diagnostic so
> > > that
> > > it
> > > can suggest that a #include was missing, so I'll use it there as
> > > well.
> > > 
> > > gcc/c-family/ChangeLog:
> > >   * c-common.c (try_to_locate_new_include_insertion_point): New
> > >   function.
> > >   (per_file_includes_t): New typedef.
> > >   (added_includes_t): New typedef.
> > >   (added_includes): New variable.
> > >   (maybe_add_include_fixit): New function.
> > >   * c-common.h (maybe_add_include_fixit): New decl.
> > > 
> > > gcc/c/ChangeLog:
> > >   * c-decl.c (implicitly_declare): When suggesting a missing
> > >   #include, provide a fix-it hint.
> > > 
> > > gcc/cp/ChangeLog:
> > >   * name-lookup.c (get_std_name_hint): Add '<' and '>' around
> > >   the header names.
> > >   (maybe_suggest_missing_header): Update for addition of '<' and
> > > '>'
> > >   to above.  Provide a fix-it hint.
> > > 
> > > gcc/testsuite/ChangeLog:
> > >   * g++.dg/lookup/missing-std-include-2.C: New test case.
> > >   * gcc.dg/missing-header-fixit-1.c: New test case.
> > > ---
> > >  gcc/c-family/c-common.c| 117
> > > +
> > >  gcc/c-family/c-common.h|   2 +
> > >  gcc/c/c-decl.c |  10 +-
> > >  gcc/cp/name-lookup.c   |  94
> > > +
> > > --
> > > --
> > >  .../g++.dg/lookup/missing-std-include-2.C  |  55
> > > ++
> > >  gcc/testsuite/gcc.dg/missing-header-fixit-1.c  |  36 +++
> > >  6 files changed, 267 insertions(+), 47 deletions(-)
> > >  create mode 100644 gcc/testsuite/g++.dg/lookup/missing-std
> > > -include
> > > -2.C
> > >  create mode 100644 gcc/testsuite/gcc.dg/missing-header-fixit-1.c
> > > 
> > > diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> > > index 0884922..19f7e60 100644
> > > --- a/gcc/c-family/c-common.c
> > > +++ b/gcc/c-family/c-common.c
> > > @@ -7983,4 +7983,121 @@ c_flt_eval_method (bool maybe_c11_only_p)
> > >  return c_ts18661_flt_eval_method ();
> > >  }
> > >  
> > > +/* Attempt to locate a suitable location within FILE for a
> > > +   #include directive to be inserted before.  FILE should
> > > +   be a string from libcpp (pointer equality is used).
> > > +
> > > +   Attempt to return the location within FILE immediately
> > > +   after the last #include within that file, or the start of
> > > +   that file if it has no #include directives.
> > > +
> > > +   Return UNKNOWN_LOCATION if no suitable location is found,
> > > +   or if an error occurs.  */
> > > +
> > > +static location_t
> > > +try_to_locate_new_include_insertion_point (const char *file)
> > > +{
> > > +  /* Locate the last ordinary map within FILE that ended with a
> > > #include.  */
> > > +  const line_map_ordinary *last_include_ord_map = NULL;
> > > +
> > > +  /* ...and the next ordinary map within FILE after that one. 
> > >  */
> > > +  const line_map_ordinary *last_ord_map_after_include = NULL;
> > > +
> > > +  /* ...and the first ordinary map within FILE.  */
> > > +  const line_map_ordinary *first_ord_map_in_file = NULL;
> > > +
> > > +  for (unsigned int i = 0; i < LINEMAPS_ORDINARY_USED
> > > (line_table);
> > > i++)
> > > +{
> > > +  const line_map_ordinary *ord_map
> > > + = LINEMAPS_ORDINARY_MAP_AT (line_table, i);
> > > +
> > > +  const line_map_ordinary *from = INCLUDED_FROM (line_table,
> > > ord_map);
> > > +  if (from)
> > > + if (from->to_file == file)
> > > +   {
> > > + last_include_ord_map = from;
> > > + last_ord_map_after_include = NULL;
> > > +   }
> > > +
> > > +  if 

Re: [PATCH] LFS support for libbacktrace

2017-06-27 Thread Jeff Law
On 06/20/2017 05:15 AM, Richard Biener wrote:
> On Wed, 14 Jun 2017, Richard Biener wrote:
> 
>>
>> This fixes the [f]open use in libgfortran.  Doesn't fix the ones
>> in libsanitizer because those apparently use a copy because they
>> need to rename stuff...
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk
>> and branches?
> 
> Ping.
> 
> Richard.
> 
>> Thanks,
>> Richard.
>>
>> 2017-06-14  Richard Biener  
>>
>>  * configure.ac: Add AC_SYS_LARGEFILE.
>>  * config.h.in: Regenerate.
>>  * configure: Likewise.
OK.
jeff


Re: [PATCH GCC][01/13]Introduce internal function IFN_LOOP_DIST_ALIAS

2017-06-27 Thread Bin.Cheng
On Tue, Jun 27, 2017 at 3:59 PM, Richard Biener
 wrote:
> On June 27, 2017 4:27:17 PM GMT+02:00, "Bin.Cheng"  
> wrote:
>>On Tue, Jun 27, 2017 at 1:58 PM, Richard Biener
>> wrote:
>>> On Fri, Jun 23, 2017 at 12:10 PM, Bin.Cheng 
>>wrote:
>>>> On Mon, Jun 12, 2017 at 6:02 PM, Bin Cheng
>>wrote:
>>>>> Hi,
>>>>> I was asked by upstream to split the loop distribution patch into
>>small ones.
>>>>> It is hard because data structure and algorithm are closely coupled
>>together.
>>>>> Anyway, this is the patch series with smaller patches.  Basically I
>>tried to
>>>>> separate data structure and bug-fix changes apart with one as the
>>main patch.
>>>>> Note I only made necessary code refactoring in order to separate
>>patch, apart
>>>>> from that, there is no change against the last version.
>>>>>
>>>>> This is the first patch introducing new internal function
>>IFN_LOOP_DIST_ALIAS.
>>>>> GCC will distribute loops under condition of this function call.
>>>>>
>>>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>> Hi,
>>>> I need to update this patch fixing an issue in
>>>> vect_loop_dist_alias_call.  The previous patch fails to find some
>>>> IFN_LOOP_DIST_ALIAS calls.
>>>>
>>>> Bootstrap and test in series.  Is it OK?
>>>
>>> So I wonder if we really need to track ldist_alias_id or if we can do
>>sth
>>Yes, it is needed because otherwise we would probably end up falsely
>>searching for IFN_LOOP_DIST_ALIAS for a normal (not from distribution)
>>loop.
>>
>>> more "general", like tracking a copy_of or origin and then directly
>>> go to nearest_common_dominator (loop->header, copy_of->header)
>>> to find the controlling condition?
>>I tend not to record any pointer in the loop structure; it can easily go
>>dangling in a data structure that lives across passes.
>
> I didn't mean to record a pointer, just rename your field and make it more 
> general.  The common dominator thing should still work, no?
I might not be following.  If we record the original loop->num in the
renamed field, nearest_common_dominator can't work because we don't
have basic blocks to start the call?  The original loop could be
eliminated at several points, for example, instantly after
distribution, or folded in vectorizer for other loops distributed from
the original loop.
BTW, setting the copy_of/origin field in loop_version is not enough
for this use case, all generated loops (actually, except the versioned
loop) from distribution need to be set.
>
> As far as memory usage
>>is concerned.  I actually don't need a whole integer to record the
>>loop num.  I can simply restrict number of distributions in one
>>function to at most 256, and record such id in a char field in struct
>>loop?  Does this sound better?
>
> As said, tracking loop origin sounds useful anyway so I'd rather add and use 
> that somehow.
To be honest, I don't know.  The current field works like a unique
index of the distribution operation.  The original loop could be destroyed
at different points and thus no longer exist, which makes the recorded
copy_of/origin less meaningful?

Thanks,
bin
>
>>Thanks,
>>bin
>>>
>>> That said "ldist_alias_id" is a bit too narrow of purpose to "waste"
>>> an int inside struct loop?  I'd set copy_of/origin in loop_version for
>>example.
>>> 'origin' would probably be better given the ldist cases aren't really
>>> full "copies".
>>>
>>> fold_loop_dist_alias_call should re-use / rename
>>fold_loop_vectorized_call
>>> by just passing folded_value to it.
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> bin
>>>>>
>>>>> Thanks,
>>>>> bin
>>>>> 2017-06-07  Bin Cheng  
>>>>>
>>>>> * cfgloop.h (struct loop): New field ldist_alias_id.
>>>>> * cfgloopmanip.c (lv_adjust_loop_entry_edge): Comment
>>change.
>>>>> * internal-fn.c (expand_LOOP_DIST_ALIAS): New function.
>>>>> * internal-fn.def (LOOP_DIST_ALIAS): New.
>>>>> * tree-vectorizer.c (vect_loop_dist_alias_call): New
>>function.
>>>>> (fold_loop_dist_alias_call): New function.
>>>>> (vectorize_loops): Fold IFN_LOOP_DIST_ALIAS call depending
>>on
>>>>> successful vectorization or not.
>>>>>


Re: [PATCH, rs6000] Add support to __builtin_cpu_supports() for two new HWCAP2 bits

2017-06-27 Thread Peter Bergner
On 6/27/17 11:07 AM, Segher Boessenkool wrote:
> Not use an installed header, that's not what I'm asking.  Share the
> source file, i.e., just copy it over from the glibc source tree (it
> should probably hold the master copy).  Fewer typos, cannot forget to
> update some entry, etc.

Ah, that makes sense.  I'll have a look at how easy it is.
In the meantime, I'll hold off on committing this.

Peter




[PATCH rs6000] remove implicit static var outputs of toc_relative_expr_p

2017-06-27 Thread Aaron Sawdey
So, this is to set things up so I can in a future patch separate out
the code that deals with optimizing byte swapping for LE on P8 into a
separate file.

The function toc_relative_expr_p implicitly sets two static vars
(tocrel_base and tocrel_offset) that are declared in rs6000.c. The real
purpose of this is to communicate between
print_operand/print_operand_address and rs6000_output_addr_const_extra,
which is called through the asm_out hook vector by something in the
call tree under output_addr_const.

This patch changes toc_relative_expr_p to make tocrel_base and
tocrel_offset be explicit const_rtx * args. All of the calls other than
print_operand/print_operand_address are changed to have local const_rtx
vars that are passed in. The statics in rs6000.c are now called
tocrel_base_oac and tocrel_offset_oac to reflect their use to
communicate across output_addr_const, and that is now the only thing
they are used for.
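
As a rough illustration of the shape of this change (not the rs6000 code
itself), the sketch below replaces results returned through file-scope
statics with explicit out-parameters.  struct expr, split_base_offset and
offset_if_base are hypothetical stand-ins invented for the example; only the
pattern of asserting non-null pointers and storing through them mirrors the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an expression node.  */
struct expr
{
  bool is_plus;   /* True for "base + constant".  */
  int base_id;    /* Identifier of the base operand.  */
  long offset;    /* Constant addend; only meaningful when is_plus.  */
};

/* Split OP into base and offset, storing both through the
   caller-supplied pointers instead of file-scope statics.  Return
   true if the base is the one the caller is looking for.  */
static bool
split_base_offset (const struct expr *op, int wanted_base,
                   int *base, long *offset)
{
  assert (base != NULL && offset != NULL);

  *base = op->base_id;
  *offset = op->is_plus ? op->offset : 0;
  return *base == wanted_base;
}

/* A caller keeps the results in locals, so nothing leaks into global
   state between calls.  Returns -1 when the base does not match.  */
static long
offset_if_base (const struct expr *op, int wanted_base)
{
  int base;
  long offset;

  if (split_base_offset (op, wanted_base, &base, &offset))
    return offset;
  return -1;
}
```

With this shape, the only remaining statics are the ones that genuinely have
to cross a call into non-target code, which is what the _oac names are meant
to signal.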

Bootstrap and regtest passes in trunk 249639 (to avoid the bootstrap
fail), ok for trunk?


2017-06-27  Aaron Sawdey  

* config/rs6000/rs6000.c (toc_relative_expr_p): Make tocrel_base
and tocrel_offset be pointer args rather than implicitly using
static versions.
(legitimate_constant_pool_address_p, rs6000_emit_move,
const_load_sequence_p, adjust_vperm): Add local tocrel_base and
tocrel_offset and use in toc_relative_expr_p call.
(print_operand, print_operand_address): Use static tocrel_base_oac
and tocrel_offset_oac.
(rs6000_output_addr_const_extra): Use static tocrel_base_oac and
tocrel_offset_oac.


-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h	(revision 249639)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -40,7 +40,7 @@
 extern int small_data_operand (rtx, machine_mode);
 extern bool mem_operand_gpr (rtx, machine_mode);
 extern bool mem_operand_ds_form (rtx, machine_mode);
-extern bool toc_relative_expr_p (const_rtx, bool);
+extern bool toc_relative_expr_p (const_rtx, bool, const_rtx *, const_rtx *);
 extern void validate_condition_mode (enum rtx_code, machine_mode);
 extern bool legitimate_constant_pool_address_p (const_rtx, machine_mode,
 		bool);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 249639)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -8628,18 +8628,25 @@
 	  && ASM_OUTPUT_SPECIAL_POOL_ENTRY_P (get_pool_constant (base), Pmode));
 }
 
-static const_rtx tocrel_base, tocrel_offset;
+/* These are only used to pass through from print_operand/print_operand_address
+ * to rs6000_output_addr_const_extra over the intervening function 
+ * output_addr_const which is not target code.  */
+static const_rtx tocrel_base_oac, tocrel_offset_oac;
 
 /* Return true if OP is a toc pointer relative address (the output
of create_TOC_reference).  If STRICT, do not match non-split
-   -mcmodel=large/medium toc pointer relative addresses.  */
+   -mcmodel=large/medium toc pointer relative addresses.  Places base 
+   and offset pieces in TOCREL_BASE and TOCREL_OFFSET respectively.  */
 
 bool
-toc_relative_expr_p (const_rtx op, bool strict)
+toc_relative_expr_p (const_rtx op, bool strict, const_rtx *tocrel_base,
+		 const_rtx *tocrel_offset)
 {
   if (!TARGET_TOC)
 return false;
 
+  gcc_assert (tocrel_base != NULL && tocrel_offset != NULL);
+
   if (TARGET_CMODEL != CMODEL_SMALL)
 {
   /* When strict ensure we have everything tidy.  */
@@ -8655,16 +8662,16 @@
 	op = XEXP (op, 1);
 }
 
-  tocrel_base = op;
-  tocrel_offset = const0_rtx;
+  *tocrel_base = op;
+  *tocrel_offset = const0_rtx;
   if (GET_CODE (op) == PLUS && add_cint_operand (XEXP (op, 1), GET_MODE (op)))
 {
-  tocrel_base = XEXP (op, 0);
-  tocrel_offset = XEXP (op, 1);
+  *tocrel_base = XEXP (op, 0);
+  *tocrel_offset = XEXP (op, 1);
 }
 
-  return (GET_CODE (tocrel_base) == UNSPEC
-	  && XINT (tocrel_base, 1) == UNSPEC_TOCREL);
+  return (GET_CODE (*tocrel_base) == UNSPEC
+	  && XINT (*tocrel_base, 1) == UNSPEC_TOCREL);
 }
 
 /* Return true if X is a constant pool address, and also for cmodel=medium
@@ -8674,7 +8681,8 @@
 legitimate_constant_pool_address_p (const_rtx x, machine_mode mode,
 bool strict)
 {
-  return (toc_relative_expr_p (x, strict)
+  const_rtx tocrel_base, tocrel_offset;
+  return (toc_relative_expr_p (x, strict, &tocrel_base, &tocrel_offset)
 	  && (TARGET_CMODEL != CMODEL_MEDIUM
 	  || constant_pool_expr_p (XVECEXP (tocrel_base, 0, 0))
 	  || mode == QImode
@@ -11055,6 +11063,7 @@
   /* If this is a SYMBOL_REF that refers to a constant pool entry,
 	 and we have put it in the TOC, we just need to make 

Re: [PATCH, 1/4] Show value of GOMP_OPENACC_DIM in libgomp nvptx plugin

2017-06-27 Thread Tom de Vries

On 06/26/2017 01:31 PM, Tom de Vries wrote:

On 06/26/2017 01:24 PM, Tom de Vries wrote:

Hi,

I've written a patch series to facilitate debugging libgomp openacc 
testcase failures on the nvptx accelerator.



When running an openacc test-case on an nvptx accelerator, the 
following happens:

- the plugin obtains the ptx assembly for the acceleration kernels
- it calls the cuda jit to compile and link the ptx into a module
- it loads the module
- it starts an acceleration kernel

The patch series adds these environment variables:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS: a means to save the resulting module
   such that it can be investigated using nvdisasm and cuobjdump.
- GOMP_OPENACC_NVPTX_DISASM: a means to see the resulting module in
   the debug output,  by writing it into a file and calling nvdisasm on
   it
- GOMP_OPENACC_NVPTX_JIT: a means to set parameters of the
   compilation/linking process, currently supporting:
   * -O[0-4], mapping onto CU_JIT_OPTIMIZATION_LEVEL
   * -ori, mapping onto CU_JIT_NEW_SM3X_OPT


The patch series consists of these patches:

1. Show value of GOMP_OPENACC_DIM in libgomp nvptx plugin


This patch adds a debug message (for GOMP_DEBUG=1) about the value of 
the GOMP_OPENACC_DIM variable read from the environment.
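
A standalone sketch of this kind of reporting, outside libgomp, could look
like the following.  It does not use GOMP_PLUGIN_debug; format_env_var and
report_env_var are hypothetical names, stderr stands in for the plugin's
debug channel, and the "<unset>" text is a placeholder chosen for the
example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format "NAME: 'value'" into BUF, or "NAME: <unset>" if the variable
   is not in the environment.  Returns what snprintf returns.  */
static int
format_env_var (char *buf, size_t size, const char *var_name)
{
  const char *value = getenv (var_name);

  if (value == NULL)
    return snprintf (buf, size, "%s: <unset>\n", var_name);
  return snprintf (buf, size, "%s: '%s'\n", var_name, value);
}

/* Report the variable on stderr.  */
static void
report_env_var (const char *var_name)
{
  char buf[256];

  format_env_var (buf, sizeof buf, var_name);
  fputs (buf, stderr);
}
```

Calling report_env_var ("GOMP_OPENACC_DIM") once, at the point where the
variable is first read, gives the same kind of one-line trace as the debug
message described above.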




Committed as trivial.

Thanks,
- Tom


Thanks,
- Tom

0001-Show-value-of-GOMP_OPENACC_DIM-in-libgomp-nvptx-plugin.patch


Show value of GOMP_OPENACC_DIM in libgomp nvptx plugin

2017-06-26  Tom de Vries  

* plugin/plugin-nvptx.c (notify_var): New function.
(nvptx_exec): Use notify_var for GOMP_OPENACC_DIM.

---
  libgomp/plugin/plugin-nvptx.c | 12 +++-
  1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 0e1b3e2..71630b5 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -867,6 +867,14 @@ nvptx_get_num_devices (void)
return n;
  }
  
+static void
+notify_var (const char *var_name, const char *env_var)
+{
+  if (env_var == NULL)
+GOMP_PLUGIN_debug (0, "%s: \n", var_name);
+  else
+GOMP_PLUGIN_debug (0, "%s: '%s'\n", var_name, env_var);
+}
  
  static bool
  link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
@@ -1089,10 +1097,12 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
pthread_mutex_lock (&ptx_dev_lock);
if (!default_dims[0])
{
+ const char *var_name = "GOMP_OPENACC_DIM";
  /* We only read the environment variable once.  You can't
 change it in the middle of execution.  The syntax  is
 the same as for the -fopenacc-dim compilation option.  */
- const char *env_var = getenv ("GOMP_OPENACC_DIM");
+ const char *env_var = getenv (var_name);
+ notify_var (var_name, env_var);
  if (env_var)
{
  const char *pos = env_var;





Re: [PATCH] Fix PR bootstrap/81217

2017-06-27 Thread Jeff Law
On 06/27/2017 07:16 AM, Martin Liška wrote:
> Hello.
> 
> Following fixes the PR by removal of superfluous bootstrap_target.
> 
> Ready to be installed?
> 
> ChangeLog:
> 
> 2017-06-27  Martin Liska  
> 
>   PR bootstrap/81217
>   * Makefile.def: Remove superfluous bootstrap_target from
>   bootstrap_stage.
>   * Makefile.in: Re-generate the file.
OK.
jeff


Re: [PATCH][AArch64] Improve Cortex-A53 shift bypass

2017-06-27 Thread Ramana Radhakrishnan
On Wed, Jun 14, 2017 at 2:55 PM, James Greenhalgh
 wrote:
> On Fri, May 05, 2017 at 05:02:46PM +0100, Wilco Dijkstra wrote:
>> Richard Earnshaw (lists) wrote:
>>
>> > --- a/gcc/config/arm/aarch-common.c
>> > +++ b/gcc/config/arm/aarch-common.c
>> > @@ -254,12 +254,7 @@ arm_no_early_alu_shift_dep (rtx producer, rtx 
>> > consumer)
>> >  return 0;
>> >
>> >if ((early_op = arm_find_shift_sub_rtx (op)))
>> > -{
>> > -  if (REG_P (early_op))
>> > - early_op = op;
>> > -
>> > -  return !reg_overlap_mentioned_p (value, early_op);
>> > -}
>> > +return !reg_overlap_mentioned_p (value, early_op);
>> >
>> >return 0;
>> >  }
>>
>> > This function is used by several aarch32 pipeline description models.
>> > What testing have you given it there.  Are the changes appropriate for
>> > those cores as well?
>>
>> arm_find_shift_sub_rtx can only ever return NULL_RTX or a shift rtx, so the
>> check for REG_P is dead code. Bootstrap passes on ARM too of course.
>
> This took me a bit of head-scratching to get right...
>
> arm_find_shift_sub_rtx calls arm_find_sub_rtx_with_code, looking for
> ASHIFT, with find_any_shift set to TRUE. There, we're going to run
> through the subRTX of pattern, if the code of the subrtx is one of the
> shift-like patterns, we return X, otherwise we return NULL_RTX.
>
> Thus
>
>> > -  if (REG_P (early_op))
>> > - early_op = op;
>
> is not needed, and the code can be reduced to:
>
>   if ((early_op = arm_find_shift_sub_rtx (op)))
> return !reg_overlap_mentioned_p (value, early_op);
>   return 0;
>
> So, this looks fine to me from an AArch64 perspective - but you'll need an
> ARM OK too as this is shared code.


I'm about to run home for the day but this came in from
https://gcc.gnu.org/ml/gcc-patches/2013-09/msg02109.html and James
said in that email that this was put in to ensure no segfaults on
cortex-a15 / cortex-a7 tuning.

I'll try and look at it later this week.




Ramana

>
> Cheers,
> James
>


Re: fix libcc1 dependencies in toplevel Makefile

2017-06-27 Thread Olivier Hainque
Hi Alex,

> On Jun 26, 2017, at 09:16 , Olivier Hainque  wrote:
> 
>> I'd like to understand better what the concurrency problem is with the
>> current build machinery, before we proceed with this change.  If you
>> manage to trigger the problem again, could you try to further analyze
>> build logs to check for e.g. concurrent activation of all-gcc in both
>> the top-level Makefile and the recursed-into-for-stage1 Makefile, or
>> somesuch?  Something else worth considering is what the make targets
>> specified in the command line were.
> 
> The problems were showing pretty rarely, only on certain hosts, in
> certain load conditions. We should still have the logs around and I'll
> look into this. They are regular logs, without -d. I can almost for sure
> fetch the exact "make" command line involved.

This was:

make -j 32 BOOT_LDFLAGS=-Wl,--stack=0x200 CC=gcc 'ADAFLAGS=-W -Wall -gnatpg -gnata -gnatws -gnatU -gnatyN' CXXFLAGS=-O2 BOOT_CFLAGS=-O2 CFLAGS=-O2 'LN_S=cp -p' 'BOOT_ADAFLAGS=-gnatpgn -gnatU' 'STAGE1_CFLAGS=-O2 -O0 -g' bootstrap

From the logs of the discussions we tracked, our understanding
of the dependency issue was that, before the patch, the
stage_current and maybe-all-gcc targets could be built
concurrently, via

> configure-target-libquadmath: stage_current
> all-target-libquadmath: configure-target-libquadmath
> maybe-all-target-libquadmath: all-target-libquadmath

> all-target: maybe-all-target-libquadmath

on the one hand,

> all-libcc1: maybe-all-gcc

> maybe-all-libcc1: all-libcc1

> all-host: maybe-all-libcc1

on the other hand.

Does that make sense ?

Thanks for your feedback!

(Note that I'll be away from tomorrow to Monday)

Olivier





Re: [PATCH] PR libstdc++/81221 fix namespace qualification for parallel mode

2017-06-27 Thread Jonathan Wakely

On 27/06/17 15:46 +0100, Jonathan Wakely wrote:

std::sample needs to call _GLIBCXX_STD_A::__sample instead of
std::__sample, so that it works when Parallel Mode is active.

PR libstdc++/81221
* include/bits/stl_algo.h (sample): Qualify with _GLIBCXX_STD_A not
std.
* testsuite/25_algorithms/sample/81221.cc: New.

Tested powerpc64le-linux, committed to trunk, and will commit to the
gcc-7-branch shortly.


I forgot that tests can't use -D_GLIBCXX_PARALLEL if libgomp isn't
available. This makes it conditional on running "make check-parallel".

Committed to trunk.

commit 8fb437bed6c707288f8d1fa6c27c6e3ed6b422a4
Author: Jonathan Wakely 
Date:   Tue Jun 27 16:31:25 2017 +0100

PR libstdc++/81221 only run new test for check-parallel

	PR libstdc++/81221
	* testsuite/25_algorithms/sample/81221.cc: Disable except for
	check-parallel.

diff --git a/libstdc++-v3/testsuite/25_algorithms/sample/81221.cc b/libstdc++-v3/testsuite/25_algorithms/sample/81221.cc
index e6dd5e0..28ec0e3 100644
--- a/libstdc++-v3/testsuite/25_algorithms/sample/81221.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/sample/81221.cc
@@ -17,7 +17,6 @@
 
 // { dg-options "-std=gnu++17" }
 // { dg-do compile { target c++1z } }
+// { dg-require-parallel-mode "" }
 
-#undef _GLIBCXX_PARALLEL
-#define _GLIBCXX_PARALLEL 1
 #include 


Re: Fix genmultilib reuse rule checks for large sets of option combinations

2017-06-27 Thread Jeff Law
On 06/08/2017 02:28 PM, Joseph Myers wrote:
> genmultilib computes combination_space, a list of all combinations of
> options in MULTILIB_OPTIONS that might have multilibs built for them
> (some of which may end up not having multilibs built for them, and
> some of those may end up being mapped to other multilibs with
> MULTILIB_REUSE).  It is then used to validate the right hand part of
> MULTILIB_REUSE rules, checking with expr that combination_space
> matches a basic regular expression derived from that right hand part.
> 
> There are two problems with this approach to validation:
> 
> * It requires that right hand part to have options in the same order
>   as in MULTILIB_OPTIONS, in contradiction to the documentation of
>   MULTILIB_REUSE saying that order does not matter there.
> 
> * combination_space can be so large that the expr call fails with an
>   E2BIG error.  I have a local ARM configuration with 40 multilibs but
>   3840 combinations of options from MULTILIB_OPTIONS (so 3839 listed
>   in combination_space, since it doesn't list the default multilib)
>   and 996 MULTILIB_REUSE rules.  This generates a combination_space
>   string longer than the Linux kernel's MAX_ARG_STRLEN (PAGE_SIZE *
>   32, the limit on the length of a single argv string), so that expr
>   cannot be run.
> 
> This patch changes the validation approach to generate a much shorter
> extended regular expression for any sequence of multilib options in
> any order, and uses that for the validation instead.
> 
> Tested with a build for arm-none-eabi --with-multilib-list=aprofile
> (as a configuration that uses MULTILIB_REUSE).
> 
> 2017-06-08  Joseph Myers  
> 
>   * genmultilib (combination_space): Remove variable.
>   Validate reuse rules against regular expression for any sequence
>   of multilib options in any order.
Going to trust you on this :-)  regexps are far from my sweet spot.


jeff
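
To make the "any sequence of options in any order" check concrete, here is a rough sketch using POSIX extended regular expressions from C.  It is an illustration only: genmultilib is a shell script and the pattern it actually builds is not shown in this thread, so the pattern shape, the '/' separator and the function name reuse_rhs_is_valid are assumptions of this sketch.  Option names are assumed to contain no regex metacharacters.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Return 1 if COMBO is a '/'-separated sequence of names drawn from the
   alternation OPTS_ALT (for example "mthumb|marm"), in any order, and 0
   if it is not.  Return -1 if the pattern cannot be built or compiled.  */
int
reuse_rhs_is_valid (const char *opts_alt, const char *combo)
{
  char pattern[1024];
  regex_t re;
  int n, rc;

  /* The pattern grows with the number of options, not with the number of
     option combinations, so it stays short.  */
  n = snprintf (pattern, sizeof pattern, "^(%s)(/(%s))*$", opts_alt, opts_alt);
  if (n < 0 || (size_t) n >= sizeof pattern)
    return -1;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, combo, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

The point of the design is the size argument from the quoted description: a list of 3839 combinations can exceed the kernel's PAGE_SIZE * 32 limit on a single argument (131072 bytes with 4 KiB pages), whereas a pattern built from the option names alone does not.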


Re: [RFC][AARCH64]Add 'r' integer register operand modifier. Document the common asm modifier for aarch64 target.

2017-06-27 Thread Andrew Pinski
On Tue, Jun 27, 2017 at 8:27 AM, Renlin Li  wrote:
> Hi Andrew,
>
> On 25/06/17 22:38, Andrew Pinski wrote:
>>
>> On Tue, Jun 6, 2017 at 3:56 AM, Renlin Li  wrote:
>>>
>>> Hi all,
>>>
>>> In this patch, a new integer register operand modifier 'r' is added. This
>>> will use the
>>> proper register name according to the mode of corresponding operand.
>>>
>>> 'w' register for scalar integer mode smaller than DImode
>>> 'x' register for DImode
>>>
>>> This allows more flexibility and would meet people's expectations.
>>> It will help for ILP32 and LP64, and big-endian case.
>>>
>>> A new section is added to document the AArch64 operand modifiers which
>>> might
>>> be used in inline assembly. It's not an exhaustive list covers every
>>> modifier.
>>> Only the most common and useful ones are documented.
>>>
>>> The default behavior of integer operand without modifier is clearly
>>> documented
>>> as well. It's not changed so that the patch shouldn't break anything.
>>>
>>> So with this patch, it should resolve the issues in PR63359.
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63359
>>>
>>>
>>> aarch64-none-elf regression test Okay. Okay to check in?
>>
>>
>> I think 'r' modifier is very fragile and can be used incorrectly and
>> wrong in some cases really..
>
>
> The user could always (or be encouraged to) opt to a strict register
> modifier to enforce consistent behavior in all cases.
>
> I agree the flexibility might bring unexpected behavior in corner cases.
> Do you have any examples to share off the top of your head? So that we can
> discuss the benefit and pitfalls, and decide to improve the patch or
> withdraw it.

One thing is that TImode is missing.  I have a use case for __int128_t
inside inline-asm.
For me, %r with TImode would produce "x0, x1".  This is one of the
reasons why I said it is fragile.
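
For reference, a small sketch of the strict alternative mentioned above: spelling out the register width with the existing 'w' and 'x' modifiers rather than relying on a mode-dependent one.  The function names add32 and add64 are made up for this example, and the plain C fallback is only there so that the snippet also builds on non-AArch64 hosts.

```c
#include <stdint.h>

/* 32-bit add.  On AArch64 the 'w' modifier prints the w-register name of
   each operand; elsewhere fall back to plain C.  */
uint32_t
add32 (uint32_t a, uint32_t b)
{
#if defined (__aarch64__)
  uint32_t r;
  __asm__ ("add %w0, %w1, %w2" : "=r" (r) : "r" (a), "r" (b));
  return r;
#else
  return a + b;
#endif
}

/* 64-bit add.  The 'x' modifier prints the x-register name.  */
uint64_t
add64 (uint64_t a, uint64_t b)
{
#if defined (__aarch64__)
  uint64_t r;
  __asm__ ("add %x0, %x1, %x2" : "=r" (r) : "r" (a), "r" (b));
  return r;
#else
  return a + b;
#endif
}
```

With explicit modifiers the template says which instruction width is meant, independent of the operand's mode, which is the consistency argument made earlier in the thread.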

>
>> I like the documentation though.

As an aside %H is not documented here.  Noticed it because I am using
%H with TImode.

Thanks,
Andrew

>
> Thanks,
> Renlin
>
>
>>
>> Thanks,
>> Andrew
>>
>>>
>>> gcc/ChangeLog:
>>>
>>> 2017-06-06  Renlin Li  
>>>
>>>  PR target/63359
>>>  * config/aarch64/aarch64.c (aarch64_print_operand): Add 'r'
>>> modifier.
>>>  * doc/extend.texi (AArch64Operandmodifiers): New section.


Re: [PATCH 1/1] Remove ns32k leftover

2017-06-27 Thread Jeff Law
On 06/24/2017 11:11 PM, Maya Rashish wrote:
> Support for ns32k was removed in GCC4.
> ---
>  include/longlong.h | 36 
>  1 file changed, 36 deletions(-)
Thanks.  I added a ChangeLog entry and committed your change.

jeff


Re: [PATCH, rs6000] Add support to __builtin_cpu_supports() for two new HWCAP2 bits

2017-06-27 Thread Segher Boessenkool
On Tue, Jun 27, 2017 at 10:55:53AM -0500, Peter Bergner wrote:
> On 6/27/17 10:51 AM, Segher Boessenkool wrote:
> > On Mon, Jun 26, 2017 at 10:33:48PM -0500, Peter Bergner wrote:
> >> Tulio added support for two new AT_HWCAP2 bits to GLIBC which have been
> >> recently added to the kernel:
> >>
> >>   https://www.sourceware.org/ml/libc-alpha/2017-06/msg00069.html
> >>
> >> This patch adds support for them to the __builtin_cpu_supports() builtin
> >> function so we can test for them.
> >>
> >> Tested on powerpc64le-linux with no regressions.  Is this ok for trunk?
> > 
> > Okay.
> > 
> > Could we use a shared (with glibc) header, or reduce duplication some
> > other way?
> 
> Not really, since either GCC or GLIBC could be built with/against an
> older version of the other package and if that package were the one
> to have the defines, then we'd get a build error.

Not use an installed header, that's not what I'm asking.  Share the
source file, i.e., just copy it over from the glibc source tree (it
should probably hold the master copy).  Fewer typos, cannot forget to
update some entry, etc.


Segher


Re: [Doc, AArch64] Fix/Update AArch64 options.

2017-06-27 Thread Sandra Loosemore

On 06/27/2017 06:19 AM, Yvan Roux wrote:


diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 942a7d5..0fd1bfa 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -146,7 +146,7 @@ EnumValue
 Enum(aarch64_abi) String(lp64) Value(AARCH64_ABI_LP64)

 mpc-relative-literal-loads
-Target Report Save Var(pcrelative_literal_loads) Init(2) Save
+Target Report Var(pcrelative_literal_loads) Init(2) Save
 PC relative literal loads.

 msign-return-address=


I think this qualifies as an obvious fix.  I can't approve it if it 
isn't, anyway  ;-)



diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d1e097b..6e0e776 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -595,7 +595,9 @@ Objective-C and Objective-C++ Dialects}.
 -mlow-precision-recip-sqrt  -mno-low-precision-recip-sqrt@gol
 -mlow-precision-sqrt  -mno-low-precision-sqrt@gol
 -mlow-precision-div  -mno-low-precision-div @gol
--march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}}
+-mpc-relative-literal-loads -mno-pc-relative-literal-loads @gol


For options that have both positive and negative variants, we should 
only be listing the one that is not the default in the Option Summary 
table.  Can you please remove the existing redundant options listed for 
AArch64, instead of adding a new one?



+-msign-return-address=@var{scope} @gol
+-march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}  -moverride=@var{string}}

 @emph{Adapteva Epiphany Options}
 @gccoptlist{-mhalf-reg-file  -mprefer-short-insn-regs @gol
@@ -14158,8 +14160,10 @@ across releases.
 This option is only intended to be useful when developing GCC.

 @item -mpc-relative-literal-loads
+@item -mno-pc-relative-literal-loads


It is OK to list both the positive and negative forms in the full 
description, but in a table with multiple items in the same entry, the 
second and subsequent ones should use @itemx markup instead of @item.



 @opindex mpc-relative-literal-loads
-Enable PC-relative literal loads.  With this option literal pools are
+@opindex mno-pc-relative-literal-loads
+Enable or disable PC-relative literal loads.  With this option literal pools are
 accessed using a single instruction and emitted after each function.  This
 limits the maximum size of functions to 1MB.  This is enabled by default for
 @option{-mcmodel=tiny}.


-Sandra



Re: Backports to 6 (and 7, and 5)

2017-06-27 Thread Segher Boessenkool
On Tue, Jun 27, 2017 at 09:18:07AM +0200, Richard Biener wrote:
> On Mon, 26 Jun 2017, Segher Boessenkool wrote:
> > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01853.html
> > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01923.html
> > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02048.html
> > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02606.html
> > bb-reorder: Improve compgotos pass (PR71785)
> 
> It's not clear this fixes a regression and as it is
> a missed-optimization I'd not backport it at this point in time
> (I understand it's in GCC 7 already).

It's a regression from 3.x, and a pretty severe one, but we can live
with it for a bit longer, it's a bit invasive for a backport.

Thanks for the review,


Segher


Re: [PATCH, rs6000] Add support to __builtin_cpu_supports() for two new HWCAP2 bits

2017-06-27 Thread Peter Bergner
On 6/27/17 10:51 AM, Segher Boessenkool wrote:
> On Mon, Jun 26, 2017 at 10:33:48PM -0500, Peter Bergner wrote:
>> Tulio added support for two new AT_HWCAP2 bits to GLIBC which have been
>> recently added to the kernel:
>>
>>   https://www.sourceware.org/ml/libc-alpha/2017-06/msg00069.html
>>
>> This patch adds support for them to the __builtin_cpu_supports() builtin
>> function so we can test for them.
>>
>> Tested on powerpc64le-linux with no regressions.  Is this ok for trunk?
> 
> Okay.
> 
> Could we use a shared (with glibc) header, or reduce duplication some
> other way?

Not really, since either GCC or GLIBC could be built with/against an
older version of the other package and if that package were the one
to have the defines, then we'd get a build error.

Peter



Re: [PATCH, rs6000] Add support to __builtin_cpu_supports() for two new HWCAP2 bits

2017-06-27 Thread Segher Boessenkool
On Mon, Jun 26, 2017 at 10:33:48PM -0500, Peter Bergner wrote:
> Tulio added support for two new AT_HWCAP2 bits to GLIBC which have been
> recently added to the kernel:
> 
>   https://www.sourceware.org/ml/libc-alpha/2017-06/msg00069.html
> 
> This patch adds support for them to the __builtin_cpu_supports() builtin
> function so we can test for them.
> 
> Tested on powerpc64le-linux with no regressions.  Is this ok for trunk?

Okay.

Could we use a shared (with glibc) header, or reduce duplication some
other way?


Segher


>   * config/rs6000/ppc-auxv.h (PPC_FEATURE2_DARN): New define.
>   (PPC_FEATURE2_SCV): Likewise.
>   * config/rs6000/rs6000.c (cpu_supports_info): Use them.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/cpu-builtin-1.c (darn, scv): Add tests.
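
As a usage illustration, testing for the two new bits might look roughly like the sketch below.  The feature strings "darn" and "scv" are taken from the testsuite ChangeLog entry above; the wrapper names and the non-PowerPC fallback are assumptions of this sketch, and on PowerPC it needs a compiler and glibc new enough to know these bits.

```c
/* Return 1 if the running CPU reports the "darn" feature bit, else 0.
   On non-PowerPC hosts the builtin does not accept this string, so the
   sketch simply reports 0 there.  */
int
cpu_has_darn (void)
{
#if defined (__powerpc__) && defined (__GNUC__)
  return __builtin_cpu_supports ("darn") != 0;
#else
  return 0;
#endif
}

/* Same check for the "scv" feature bit.  */
int
cpu_has_scv (void)
{
#if defined (__powerpc__) && defined (__GNUC__)
  return __builtin_cpu_supports ("scv") != 0;
#else
  return 0;
#endif
}
```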


Re: [PATCH] multiarch support for non-glibc linux systems

2017-06-27 Thread Szabolcs Nagy
On 07/06/17 18:22, Szabolcs Nagy wrote:
> Current multiarch directory name is always *-linux-gnu* on linux,
> this patch configures different names for uclibc and musl targets.
> (tested by the debian rebootstrap scripts for various *-linux-musl
> and *-linux-uclibc targets see debian bug #861588)
> 
> gcc/
> 2017-06-07  Szabolcs Nagy  
> 
>   * config.gcc (*-linux-musl*): Add t-musl tmake_file.
>   (*-linux-uclibc*): Add t-uclibc tmake_file.
>   * config/t-musl: New.
>   * config/t-uclibc: New.
> 

ping.



Ping^2 Re: Fix genmultilib reuse rule checks for large sets of option combinations

2017-06-27 Thread Joseph Myers
Ping^2.  This patch is still pending review.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Patch] Forward triviality in variant

2017-06-27 Thread Jonathan Wakely

On 18/06/17 12:37 -0700, Tim Shen via libstdc++ wrote:

Besides the changes on the comments, I also changed the definition of
_S_trivial_copy_assign and _S_trivial_move_assign to match what union
has. See [class.copy.assign]p9.

On Thu, Jun 1, 2017 at 8:13 AM, Jonathan Wakely wrote:

On 30/05/17 02:16 -0700, Tim Shen via libstdc++ wrote:


diff --git a/libstdc++-v3/include/std/variant
b/libstdc++-v3/include/std/variant
index b9824a5182c..f81b815af09 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -290,6 +290,53 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  __ref_cast<_Tp>(__t));
}

+  template
+struct _Traits
+{
+  static constexpr bool is_default_constructible_v =
+  is_default_constructible_v::type>;
+  static constexpr bool is_copy_constructible_v =
+  __and_...>::value;
+  static constexpr bool is_move_constructible_v =
+  __and_...>::value;
+  static constexpr bool is_copy_assignable_v =
+  is_copy_constructible_v && is_move_constructible_v
+  && __and_...>::value;
+  static constexpr bool is_move_assignable_v =
+  is_move_constructible_v
+  && __and_...>::value;



It seems strange to me that these ones end with _v but the following
ones don't. Could we make them all have no _v suffix?


Done. They are internal traits only for readability, so I shortened
the names and made them libstdc++ style, e.g. _S_copy_ctor.




+  static constexpr bool is_dtor_trivial =
+  __and_...>::value;
+  static constexpr bool is_copy_ctor_trivial =
+  __and_...>::value;
+  static constexpr bool is_move_ctor_trivial =
+  __and_...>::value;
+  static constexpr bool is_copy_assign_trivial =
+  is_dtor_trivial
+  && is_copy_ctor_trivial
+  && __and_...>::value;
+  static constexpr bool is_move_assign_trivial =
+  is_dtor_trivial
+  && is_move_ctor_trivial
+  && __and_...>::value;
+
+  static constexpr bool is_default_ctor_noexcept =
+  is_nothrow_default_constructible_v<
+  typename _Nth_type<0, _Types...>::type>;
+  static constexpr bool is_copy_ctor_noexcept =
+  is_copy_ctor_trivial;
+  static constexpr bool is_move_ctor_noexcept =
+  is_move_ctor_trivial
+  || __and_...>::value;
+  static constexpr bool is_copy_assign_noexcept =
+  is_copy_assign_trivial;
+  static constexpr bool is_move_assign_noexcept =
+  is_move_assign_trivial ||
+  (is_move_ctor_noexcept
+   && __and_...>::value);
+};



Does using __and_ for any of those traits reduce the limit on the
number of alternatives in a variant? We switched to using fold
expressions in some contexts to avoid very deep instantiations, but I
don't know if these will hit the same problem, but it looks like it
will.


Done, using fold expressions instead. At one point we changed some fold
expressions to __and_, because __and_ has short circuiting; do fold
expressions short-circuit too? Now that I think about it, short
circuiting in a constant fold expression should be a QoI issue.


Fold expressions don't short-circuit ... I'm not sure if they would be
allowed to for QoI.


@@ -928,12 +1107,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
static constexpr size_t __index_of =
  __detail::__variant::__index_of_v<_Tp, _Types...>;

+  using _Traits = __detail::__variant::_Traits<_Types...>;
+
public:
-  constexpr variant()
-  noexcept(is_nothrow_default_constructible_v<__to_type<0>>) =
default;
-  variant(const variant&) = default;
+  variant() noexcept(_Traits::is_default_ctor_noexcept) = default;



Do we need the exception specifications here? Will the =default make
the right thing happen anyway? (And if not, won't we get an error by
trying to define the constructors as noexcept when the implicit
definition would not be noexcept?)


Done. Removed unnecessary noexcept qualifiers.

It turns out I mistakenly thought using "variant() = default" means
`variant() noexcept(false) = default`.


OK for trunk, thanks.




Re: [PATCH][ARM] Update max_cond_insns settings

2017-06-27 Thread Wilco Dijkstra

    
ping
    
Richard Earnshaw (lists) wrote:
> On 05/05/17 13:42, Wilco Dijkstra wrote:
>> Richard Earnshaw (lists) wrote:
>>> On 04/05/17 18:38, Wilco Dijkstra wrote:
>>> > Richard Earnshaw wrote:
>>> > 
>>> >>> -  5, /* Max cond insns.  */
>>> >>> +  2, /* Max cond insns.  */
>>> >>
>>> >> This parameter is also used for A32 code.  Is that really the right
>>> >> number there as well?
>>> >
>>> > Yes, this parameter has always been the same for ARM and Thumb-2.
>>>
>>> I know that.  I'm questioning whether that number (2) is right when on
>>> ARM.  It seems very low to me, especially when branches are unpredictable.
>> 
>> Why does it seem low? Benchmarking showed 2 was the best value for modern
>> cores. The same branch predictor is used, so the same settings should be
>> used for ARM and Thumb-2.
>
> Thumb2 code has to execute an additional instruction to start an IT
> sequence.  It might therefore seem reasonable for the ARM sequence to be
> one instruction longer.

The IT instruction has no inputs/outputs and thus behaves like a NOP, unlike
conditional instructions, which have real latencies and additional dependencies
due to being conditional. So the overhead of IT itself is small.

Wilco

Re: [PATCH][AArch64] Improve Cortex-A53 shift bypass

2017-06-27 Thread Wilco Dijkstra

ping

    
On Fri, May 05, 2017 at 05:02:46PM +0100, Wilco Dijkstra wrote:
> Richard Earnshaw (lists) wrote:
> 
> > --- a/gcc/config/arm/aarch-common.c
> > +++ b/gcc/config/arm/aarch-common.c
> > @@ -254,12 +254,7 @@ arm_no_early_alu_shift_dep (rtx producer, rtx consumer)
> >  return 0;
> >  
> >    if ((early_op = arm_find_shift_sub_rtx (op)))
> > -    {
> > -  if (REG_P (early_op))
> > - early_op = op;
> > -
> > -  return !reg_overlap_mentioned_p (value, early_op);
> > -    }
> > +    return !reg_overlap_mentioned_p (value, early_op);
> >  
> >    return 0;
> >  }
> 
> > This function is used by several aarch32 pipeline description models.
> > What testing have you given it there.  Are the changes appropriate for
> > those cores as well?
> 
> arm_find_shift_sub_rtx can only ever return NULL_RTX or a shift rtx, so the
> check for REG_P is dead code. Bootstrap passes on ARM too of course.

This took me a bit of head-scratching to get right...

arm_find_shift_sub_rtx calls arm_find_sub_rtx_with_code, looking for
ASHIFT, with find_any_shift set to TRUE. There, we're going to run
through the subRTX of pattern, if the code of the subrtx is one of the
shift-like patterns, we return X, otherwise we return NULL_RTX.

Thus 

> > -  if (REG_P (early_op))
> > - early_op = op;

is not needed, and the code can be reduced to:

  if ((early_op = arm_find_shift_sub_rtx (op)))
    return !reg_overlap_mentioned_p (value, early_op);
  return 0;

So, this looks fine to me from an AArch64 perspective - but you'll need an
ARM OK too as this is shared code.

Cheers,
James
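
The dead-code argument can be seen in a toy model.  The sketch below is not GCC's RTL: the enum, the struct and the name find_shift are invented for illustration, and the set of shift codes is an assumption.  What it shows is only the shape of the reasoning: a finder that returns either NULL or a node whose code is shift-like can never return a plain register, so a later "is it a register?" test on the result cannot fire.

```c
#include <stddef.h>

/* Toy stand-ins for RTL codes and expressions; these are not GCC's types.  */
enum code { REG, PLUS, ASHIFT, LSHIFTRT, ASHIFTRT, ROTATERT };

struct node
{
  enum code code;
  const struct node *op0;
  const struct node *op1;
};

static int
is_shift (enum code c)
{
  return c == ASHIFT || c == LSHIFTRT || c == ASHIFTRT || c == ROTATERT;
}

/* Depth-first search for the first shift-like node under X.  The result
   is either NULL or a node for which is_shift holds, never a REG.  */
const struct node *
find_shift (const struct node *x)
{
  const struct node *found;

  if (x == NULL)
    return NULL;
  if (is_shift (x->code))
    return x;
  found = find_shift (x->op0);
  return found ? found : find_shift (x->op1);
}
```

Whether the real arm_find_shift_sub_rtx has exactly this property for every pattern it is given is the question raised for the AArch32 side of the thread.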



Re: [PATCH][ARM] Remove movdi_vfp_cortexa8

2017-06-27 Thread Wilco Dijkstra

    
ping

    
Richard Earnshaw (lists) wrote:
>  (define_insn "*movdi_vfp"
> -  [(set (match_operand:DI 0 "nonimmediate_di_operand" 
> "=r,r,r,r,q,q,m,w,r,w,w, Uv")
> +  [(set (match_operand:DI 0 "nonimmediate_di_operand" 
> "=r,r,r,r,q,q,m,w,!r,w,w, Uv")

> Why have you introduced a no-reloads block on the 9th alternative for
> all variants?

That is the default behaviour when you don't explicitly set a cpu, so I kept
that.  See https://patches.linaro.org/patch/541/ for the original reason for
adding it.  Duplicating this pattern was a mistake since '!' wouldn't pessimize
other cores, as int<->fp moves typically have a non-trivial cost.

However given Cortex-A8 is ancient now we could just remove the '!'.

Wilco

Re: [PATCH][ARM] Fix ldrd offsets

2017-06-27 Thread Wilco Dijkstra
    

ping

From: Wilco Dijkstra
Sent: 03 November 2016 12:20
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Fix ldrd offsets
    
Fix ldrd offsets for Thumb-2: with TARGET_LDRD the range is +-1020 (multiples
of 4); without it the range is -255..4091.  This reduces the number of
addressing instructions when using DI mode operations (such as in PR77308).

Bootstrap & regress OK.

ChangeLog:
2015-11-03  Wilco Dijkstra  

    gcc/
    * config/arm/arm.c (arm_legitimate_index_p): Add comment.
    (thumb2_legitimate_index_p): Use correct range for DI/DF mode.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3c4c7042d9c2101619722b5822b3d1ca37d637b9..5d12cf9c46c27d60a278d90584bde36ec86bb3fe 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7486,6 +7486,8 @@ arm_legitimate_index_p (machine_mode mode, rtx index, RTX_CODE outer,
 {
   HOST_WIDE_INT val = INTVAL (index);
 
+ /* Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+    If vldr is selected it uses arm_coproc_mem_operand.  */
   if (TARGET_LDRD)
 return val > -256 && val < 256;
   else
@@ -7613,11 +7615,13 @@ thumb2_legitimate_index_p (machine_mode mode, rtx index, int strict_p)
   if (code == CONST_INT)
 {
   HOST_WIDE_INT val = INTVAL (index);
- /* ??? Can we assume ldrd for thumb2?  */
- /* Thumb-2 ldrd only has reg+const addressing modes.  */
- /* ldrd supports offsets of +-1020.
-    However the ldr fallback does not.  */
- return val > -256 && val < 256 && (val & 3) == 0;
+ /* Thumb-2 ldrd only has reg+const addressing modes.
+    Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+    If vldr is selected it uses arm_coproc_mem_operand.  */
+ if (TARGET_LDRD)
+   return IN_RANGE (val, -1020, 1020) && (val & 3) == 0;
+ else
+   return IN_RANGE (val, -255, 4095 - 4);
 }
   else
 return 0;    
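
The new Thumb-2 range check can be exercised in isolation.  The sketch below copies the two conditions from the hunk above into a standalone function; the name thumb2_di_offset_ok and the use of a plain long for the offset are assumptions of this sketch, not part of arm.c.

```c
#include <stdbool.h>

#define IN_RANGE(v, lo, hi) ((v) >= (lo) && (v) <= (hi))

/* Return true if VAL is an acceptable constant offset for a DI/DF mode
   access, following the conditions in the patch: with ldrd the offset
   must be a multiple of 4 within +-1020; without it the access is done
   as two ldr, so the second word at VAL + 4 must still be in range.  */
bool
thumb2_di_offset_ok (long val, bool target_ldrd)
{
  if (target_ldrd)
    return IN_RANGE (val, -1020, 1020) && (val & 3) == 0;
  return IN_RANGE (val, -255, 4095 - 4);
}
```

For example, an offset of 1020 is accepted with ldrd and 1018 is rejected for alignment, while without ldrd the upper bound is 4091 so that the second word still sits at 4095 or below.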

Re: [PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-06-27 Thread Wilco Dijkstra
    

ping


From: Wilco Dijkstra
Sent: 17 January 2017 18:00
To: GCC Patches
Cc: nd; Kyrylo Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove Thumb-2 iordi_not patterns
    
After Bernd's DImode patch [1] almost all DImode operations are expanded
early (except for -mfpu=neon). This means the Thumb-2 iordi_notdi_di
patterns are no longer used - the split ORR and NOT instructions are merged
into ORN by Combine.  With -mfpu=neon the iordi_notdi_di patterns are used
on Thumb-2, and after this patch the orndi3_neon pattern matches instead
(which still emits ORN).  After this there are no Thumb-2 specific DImode 
patterns.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02796.html

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/thumb2.md (iordi_notdi_di): Remove pattern.
    (iordi_notzesidi_di): Likewise.
    (iordi_notdi_zesidi): Likewise.
    (iordi_notsesidi_di): Likewise.

--

diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 2e7580f220eae1524fef69719b1796f50f5cf27c..91471d4650ecae4f4e87b549d84d11adf3014ad2 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1434,103 +1434,6 @@
    (set_attr "type" "alu_sreg")]
 )
 
-; Constants for op 2 will never be given to these patterns.
-(define_insn_and_split "*iordi_notdi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (match_operand:DI 1 "s_register_operand" "0,r"))
-   (match_operand:DI 2 "s_register_operand" "r,0")))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 1)) (match_dup 2)))
-   (set (match_dup 3) (ior:SI (not:SI (match_dup 4)) (match_dup 5)))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[4] = gen_highpart (SImode, operands[1]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-    operands[5] = gen_highpart (SImode, operands[2]);
-    operands[2] = gen_lowpart (SImode, operands[2]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notzesidi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (zero_extend:DI
-    (match_operand:SI 2 "s_register_operand" "r,r")))
-   (match_operand:DI 1 "s_register_operand" "0,?r")))]
-  "TARGET_THUMB2"
-  "#"
-  ; (not (zero_extend...)) means operand0 will always be 0x
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (const_int -1))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-  }"
-  [(set_attr "length" "4,8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notdi_zesidi"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
-   (zero_extend:DI
-    (match_operand:SI 1 "s_register_operand" "r,r"]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (not:SI (match_dup 4)))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-    operands[4] = gen_highpart (SImode, operands[2]);
-    operands[2] = gen_lowpart (SImode, operands[2]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notsesidi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (sign_extend:DI
-    (match_operand:SI 2 "s_register_operand" "r,r")))
-   (match_operand:DI 1 "s_register_operand" "0,r")))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (ior:SI (not:SI
-   (ashiftrt:SI (match_dup 2) (const_int 31)))
-  (match_dup 4)))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[4] = gen_highpart (SImode, operands[1]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
 (define_insn "*orsi_notsi_si"
   [(set (match_operand:SI 0 
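
What the removed splitters computed can be written out in C.  The sketch below is only an arithmetic illustration with made-up function names: a 64-bit "OR with complement" is two independent 32-bit ORN operations on the low and high halves, and in the zero-extended case the high half of the complemented operand is all ones, so the high result is a constant.

```c
#include <stdint.h>

/* One 32-bit ORN step: (~a) | b.  */
static uint32_t
orn32 (uint32_t a, uint32_t b)
{
  return ~a | b;
}

/* 64-bit (~a) | b computed as two independent 32-bit halves.  */
uint64_t
orn64_split (uint64_t a, uint64_t b)
{
  uint32_t lo = orn32 ((uint32_t) a, (uint32_t) b);
  uint32_t hi = orn32 ((uint32_t) (a >> 32), (uint32_t) (b >> 32));

  return ((uint64_t) hi << 32) | lo;
}

/* 64-bit (~zero_extend (a)) | b: the high half of the complement is all
   ones, so only the low half needs an ORN.  */
uint64_t
orn64_zext_split (uint32_t a, uint64_t b)
{
  uint32_t lo = orn32 (a, (uint32_t) b);

  return ((uint64_t) 0xffffffffu << 32) | lo;
}
```

Because the halves do not interact, expanding the DImode operation early and letting Combine form the two ORN instructions gives the same result as the dedicated patterns.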

Re: [PATCH][ARM] Improve max_insns_skipped logic

2017-06-27 Thread Wilco Dijkstra
    

ping


From: Wilco Dijkstra
Sent: 10 November 2016 17:19
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Improve max_insns_skipped logic
    
Improve the logic when setting max_insns_skipped.  Limit the maximum size of IT
to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed,
increasing codesize.  Given 4 works well for Thumb-2, use the same limit for ARM
for consistency. 

ChangeLog:
2016-11-04  Wilco Dijkstra  

    * config/arm/arm.c (arm_option_params_internal): Improve setting of
    max_insns_skipped.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f046854e9665d54911616fc1c60fee407188f7d6..29e8d1d07d918fbb2a627a653510dfc8587ee01a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2901,20 +2901,12 @@ arm_option_params_internal (void)
   targetm.max_anchor_offset = TARGET_MAX_ANCHOR_OFFSET;
 }
 
-  if (optimize_size)
-    {
-  /* If optimizing for size, bump the number of instructions that we
- are prepared to conditionally execute (even on a StrongARM).  */
-  max_insns_skipped = 6;
+  /* Increase the number of conditional instructions with -Os.  */
+  max_insns_skipped = optimize_size ? 4 : current_tune->max_insns_skipped;
 
-  /* For THUMB2, we limit the conditional sequence to one IT block.  */
-  if (TARGET_THUMB2)
-    max_insns_skipped = arm_restrict_it ? 1 : 4;
-    }
-  else
-    /* When -mrestrict-it is in use tone down the if-conversion.  */
-    max_insns_skipped = (TARGET_THUMB2 && arm_restrict_it)
-  ? 1 : current_tune->max_insns_skipped;
+  /* For THUMB2, we limit the conditional sequence to one IT block.  */
+  if (TARGET_THUMB2)
+    max_insns_skipped = MIN (max_insns_skipped, MAX_INSN_PER_IT_BLOCK);
 }
 
 /* True if -mflip-thumb should next add an attribute for the default

    

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-06-27 Thread Wilco Dijkstra
    

ping

From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust the rtx
cost for a DImode left shift by 1 so that adddi3 is preferred in all cases,
making the arm_ashldi3_1bit pattern redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit in keeping the arm_ashrdi3_1bit and
arm_lshrdi3_1bit patterns.
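
To illustrate why the left-shift pattern is redundant, here is a small
standalone sketch (the function name is made up for illustration) that shifts
a 64-bit value left by one using only 32-bit additions with a carry, which is
roughly what an adds/adc pair on the two halves does:

```c
#include <assert.h>
#include <stdint.h>

/* Shift X left by one bit using 32-bit adds on the low and high halves.  */
uint64_t
shl1_via_add (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);

  uint32_t lo2 = lo + lo;          /* Low half; wraps modulo 2^32.  */
  uint32_t carry = lo2 < lo;       /* 1 if the low-half add overflowed.  */
  uint32_t hi2 = hi + hi + carry;  /* High half plus carry-in.  */

  uint64_t result = ((uint64_t) hi2 << 32) | lo2;

  /* Cross-check against the plain shift.  */
  assert (result == (x << 1));
  return result;
}
```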

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
    (arm_ashldi3_1bit): Remove pattern.
    (ashrdi3): Remove shift by 1 expansion.
    (arm_ashrdi3_1bit): Remove pattern.
    (lshrdi3): Remove shift by 1 expansion.
    (arm_lshrdi3_1bit): Remove pattern.
    * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
    cost of ashldi3 by 1.
    * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
    (di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
   if (speed_p)
 *cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
   return true;
 }
   else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI    0 "s_register_operand" "=r,")
-    (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI    0 "s_register_operand" "")
 (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
 (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" 

Re: [RFC][PATCH][AArch64] Cleanup frame pointer usage

2017-06-27 Thread Wilco Dijkstra
ping
    
Wilco Dijkstra wrote:
> James Greenhalgh wrote:
>
> > I note this is still marked as an RFC, are you now proposing it as a
> > patch to be merged to trunk?
> 
> Absolutely. It was marked as an RFC to get some comments - I thought it
> may be controversial to separate the frame pointer and frame chain concept. 
> And this fixes the long standing bugs caused by changing the global frame
> pointer option to an incorrect value for the leaf function optimization.

Here is a rebased version that should patch without merge issues:

Cleanup frame pointer usage.  Introduce a boolean emit_frame_chain which
determines whether to store FP and LR and set up FP to point at this record.
When the frame pointer is enabled but not strictly required (e.g. no use of
alloca), we emit a frame chain in non-leaf functions, but don't use the
frame pointer to access locals.  This results in smaller code and unwind info.

Simplify the logic in aarch64_override_options_after_change_1 () and compute
whether the frame chain is required in aarch64_layout_frame () instead.
As a result aarch64_frame_pointer_required is now redundant.
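
As a rough standalone sketch of that decision (not the GCC code): the struct
and function names below are invented for illustration, the fields stand in
for GCC globals and per-function state, and the value 2 for
flag_omit_frame_pointer is assumed to mean "frame pointer enabled", following
the comment in the aarch64_layout_frame hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative inputs; in GCC these come from option flags, crtl and df.  */
struct frame_inputs
{
  bool frame_pointer_needed;          /* E.g. the function uses alloca.  */
  bool calls_eh_return;
  int flag_omit_frame_pointer;        /* 2 assumed to mean "FP enabled".  */
  bool flag_omit_leaf_frame_pointer;
  bool is_leaf;
  bool lr_live;                       /* LR is used in the function.  */
};

/* Return true if FP and LR should be stored and FP set up.  */
bool
emit_frame_chain_p (const struct frame_inputs *in)
{
  assert (in);

  /* A frame chain is always needed if FP is required or for EH returns.  */
  bool emit = in->frame_pointer_needed || in->calls_eh_return;

  /* With the frame pointer enabled, emit a chain unless this is a leaf
     function that does not use LR and leaf frame pointers are omitted.  */
  if (in->flag_omit_frame_pointer == 2
      && !(in->flag_omit_leaf_frame_pointer && in->is_leaf && !in->lr_live))
    emit = true;

  return emit;
}
```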

Convert all callee save/restore functions to use gen_frame_mem.

Bootstrap OK.

ChangeLog:
2017-06-15  Wilco Dijkstra  

    gcc/
    PR middle-end/60580
    * config/aarch64/aarch64.h (aarch64_frame):
 Add emit_frame_chain boolean.
    * config/aarch64/aarch64.c (aarch64_frame_pointer_required)
    Remove.
    (aarch64_layout_frame): Initialise emit_frame_chain.
    (aarch64_pushwb_single_reg): Use gen_frame_mem.
    (aarch64_pop_regs): Likewise.
    (aarch64_gen_load_pair): Likewise.
    (aarch64_save_callee_saves): Likewise.
    (aarch64_restore_callee_saves): Likewise.
    (aarch64_expand_prologue): Use emit_frame_chain.
    (aarch64_can_eliminate): Simplify. When FP needed or outgoing
    arguments are large, eliminate to FP, otherwise SP.
    (aarch64_override_options_after_change_1): Simplify.
    (TARGET_FRAME_POINTER_REQUIRED): Remove define.
--
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 08acdeb52d4083f50a4b44f43fb98009cdcc041f..722c39cfc4d57280d621fb6130e4d9f4d59d1e72 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -591,6 +591,9 @@ struct GTY (()) aarch64_frame
   /* The size of the stack adjustment after saving callee-saves.  */
   HOST_WIDE_INT final_adjust;
 
+  /* Store FP,LR and setup a frame pointer.  */
+  bool emit_frame_chain;
+
   unsigned wb_candidate1;
   unsigned wb_candidate2;
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fd3005d8056e65cb32c92bbd5eb752c977c885a5..a97b4bbe9dc0f7bccc90a9337519038041241531 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2761,24 +2761,6 @@ aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
   return "";
 }
 
-static bool
-aarch64_frame_pointer_required (void)
-{
-  /* In aarch64_override_options_after_change
- flag_omit_leaf_frame_pointer turns off the frame pointer by
- default.  Turn it back on now if we've not got a leaf
- function.  */
-  if (flag_omit_leaf_frame_pointer
-  && (!crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM)))
-    return true;
-
-  /* Force a frame pointer for EH returns so the return address is at FP+8.  */
-  if (crtl->calls_eh_return)
-    return true;
-
-  return false;
-}
-
 /* Mark the registers that need to be saved by the callee and calculate
    the size of the callee-saved registers area and frame record (both FP
    and LR may be omitted).  */
@@ -2791,6 +2773,18 @@ aarch64_layout_frame (void)
   if (reload_completed && cfun->machine->frame.laid_out)
 return;
 
+  /* Force a frame chain for EH returns so the return address is at FP+8.  */
+  cfun->machine->frame.emit_frame_chain
+    = frame_pointer_needed || crtl->calls_eh_return;
+
+  /* Emit a frame chain if the frame pointer is enabled.
+ If -momit-leaf-frame-pointer is used, do not use a frame chain
+ in leaf functions which do not use LR.  */
+  if (flag_omit_frame_pointer == 2
+  && !(flag_omit_leaf_frame_pointer && crtl->is_leaf
+  && !df_regs_ever_live_p (LR_REGNUM)))
+    cfun->machine->frame.emit_frame_chain = true;
+
 #define SLOT_NOT_REQUIRED (-2)
 #define SLOT_REQUIRED (-1)
 
@@ -2825,7 +2819,7 @@ aarch64_layout_frame (void)
 last_fp_reg = regno;
   }
 
-  if (frame_pointer_needed)
+  if (cfun->machine->frame.emit_frame_chain)
 {
   /* FP and LR are placed in the linkage record.  */
   cfun->machine->frame.reg_offset[R29_REGNUM] = 0;
@@ -2997,7 +2991,7 @@ aarch64_pushwb_single_reg (machine_mode mode, unsigned regno,
   reg = gen_rtx_REG (mode, regno);
   mem = gen_rtx_PRE_MODIFY (Pmode, base_rtx,
 plus_constant (Pmode, base_rtx, -adjustment));
-  mem = gen_rtx_MEM (mode, mem);
+  mem = gen_frame_mem (mode, mem);
 
   insn = 

Re: [PATCH v3][AArch64] Fix symbol offset limit

2017-06-27 Thread Wilco Dijkstra

ping

From: Wilco Dijkstra
Sent: 17 January 2017 15:14
To: Richard Earnshaw; GCC Patches; James Greenhalgh
Cc: nd
Subject: Re: [PATCH v3][AArch64] Fix symbol offset limit
    
Here is v3 of the patch - tree_fits_uhwi_p was necessary to ensure the size of a
declaration is an integer. So the question is whether we should allow
largish offsets outside of the bounds of symbols (v1), no offsets (this
version), or small offsets (small negative and positive offsets just outside a
symbol are common).
The only thing we can't allow is any offset like we currently do...

In aarch64_classify_symbol symbols are allowed full-range offsets on
relocations.  This means the offset can use all of the +/-4GB offset, leaving
no offset available for the symbol itself.  This results in relocation overflow
and link-time errors for simple expressions like _char + 0xff00.

To avoid this, limit the offset to +/-1GB so that the symbol needs to be within
a 3GB offset from its references.  For the tiny code model use a 64KB offset,
allowing most of the 1MB range for code/data between the symbol and its
references.  For symbols with a defined size, limit the offset to be within the
size of the symbol.
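
The limits above can be sketched as a standalone check.  This is only an
illustration under assumptions: the enum, the function name and the "negative
size means unknown" convention are invented here, and the size is taken to be
in the same units as the offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum code_model { MODEL_TINY, MODEL_SMALL };

/* Return true if SYMBOL + OFFSET may be used directly in a relocation.
   SIZE < 0 means the size of the symbol is not known.  */
bool
offset_ok_p (enum code_model model, int64_t offset, int64_t size)
{
  /* Only the tiny and small models are covered by this sketch.  */
  assert (model == MODEL_TINY || model == MODEL_SMALL);

  /* +/-64KB for the tiny model, +/-1GB for the small model.  */
  int64_t limit = model == MODEL_TINY ? 0x10000 : 0x40000000;

  if (offset < -limit || offset > limit)
    return false;

  /* If the size is known, the offset must stay within the symbol.  */
  if (size >= 0 && (offset < 0 || offset > size))
    return false;

  return true;
}
```

Offsets that fail the check would be forced to memory (SYMBOL_FORCE_TO_MEM in
the patch) rather than folded into the relocation.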


ChangeLog:
2017-01-17  Wilco Dijkstra  

    gcc/
    * config/aarch64/aarch64.c (aarch64_classify_symbol):
    Apply reasonable limit to symbol offsets.

    testsuite/
    * gcc.target/aarch64/symbol-range.c (foo): Set new limit.
    * gcc.target/aarch64/symbol-range-tiny.c (foo): Likewise.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e8d65ead95a3c5730c2ffe64a9e057779819f7b4..f1d54e332dc1cf1ef0bc4b1e46b0ebebe1c4cea4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9809,6 +9809,8 @@ aarch64_classify_symbol (rtx x, rtx offset)
   if (aarch64_tls_symbol_p (x))
 return aarch64_classify_tls_symbol (x);
 
+  const_tree decl = SYMBOL_REF_DECL (x);
+
   switch (aarch64_cmodel)
 {
 case AARCH64_CMODEL_TINY:
@@ -9817,25 +9819,45 @@ aarch64_classify_symbol (rtx x, rtx offset)
  we have no way of knowing the address of symbol at compile time
  so we can't accurately say if the distance between the PC and
  symbol + offset is outside the addressible range of +/-1M in the
-    TINY code model.  So we rely on images not being greater than
-    1M and cap the offset at 1M and anything beyond 1M will have to
-    be loaded using an alternative mechanism.  Furthermore if the
-    symbol is a weak reference to something that isn't known to
-    resolve to a symbol in this module, then force to memory.  */
+    TINY code model.  So we limit the maximum offset to +/-64KB and
+    assume the offset to the symbol is not larger than +/-(1M - 64KB).
+    Furthermore force to memory if the symbol is a weak reference to
+    something that doesn't resolve to a symbol in this module.  */
   if ((SYMBOL_REF_WEAK (x)
    && !aarch64_symbol_binds_local_p (x))
- || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575)
+ || !IN_RANGE (INTVAL (offset), -0x10000, 0x10000))
 return SYMBOL_FORCE_TO_MEM;
+
+ /* Limit offset to within the size of a declaration if available.  */
+ if (decl && DECL_P (decl))
+   {
+ const_tree decl_size = DECL_SIZE (decl);
+
+ if (tree_fits_uhwi_p (decl_size)
+ && !IN_RANGE (INTVAL (offset), 0, tree_to_uhwi (decl_size)))
+   return SYMBOL_FORCE_TO_MEM;
+   }
+
   return SYMBOL_TINY_ABSOLUTE;
 
 case AARCH64_CMODEL_SMALL:
   /* Same reasoning as the tiny code model, but the offset cap here is
-    4G.  */
+    1G, allowing +/-3G for the offset to the symbol.  */
   if ((SYMBOL_REF_WEAK (x)
    && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263),
-   HOST_WIDE_INT_C (4294967264)))
+ || !IN_RANGE (INTVAL (offset), -0x40000000, 0x40000000))
 return SYMBOL_FORCE_TO_MEM;
+
+ /* Limit offset to within the size of a declaration if available.  */
+ if (decl && DECL_P (decl))
+   {
+ const_tree decl_size = DECL_SIZE (decl);
+
+ if (tree_fits_uhwi_p (decl_size)
+ && !IN_RANGE (INTVAL (offset), 0, tree_to_uhwi (decl_size)))
+   return SYMBOL_FORCE_TO_MEM;
+   }
+
   return SYMBOL_SMALL_ABSOLUTE;
 
 case AARCH64_CMODEL_TINY_PIC:
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
index d7e46b059e41f2672b3a1da5506fa8944e752e01..d49ff4dbe5786ef6d343d2b90052c09676dd7fe5 100644
--- 

Re: [PATCH] Move static chain and non-local goto init after NOTE_INSN_FUNCTION_BEG (PR sanitize/81186).

2017-06-27 Thread Michael Matz
Hi,

On Tue, 27 Jun 2017, Martin Liška wrote:

> Following bug was for me very educative. I learned that we support 
> non-local gotos that can be combined with nested functions. Real fun :) 
> Well, the problem is that both cfun->nonlocal_goto_save_area and 
> cfun->static_chain_decl (emitted in expand_function_start) are put 
> before NOTE_INSN_FUNCTION_BEG. And so result of expand_used_vars is put 
> after these instrumentations. That causes problems as it uses stack 
> before we initialize it (use-after-return checking):

I don't think that's the right fix.  The purpose of 
NOTE_INSN_FUNCTION_BEG is to mark the "real" beginning of the function, 
i.e. without all the compiler generated stuff that's necessary to set up 
parameters or local variables and so on.  The goto save area and the 
static chain are also such compiler generated implementation details, and 
hence are correctly put in front of the function begin note.

Also if you put something in front of the static_chain_decl initialization 
(as you do if you move the parm_birth_insn in front of it) you'd have to 
make sure that the incoming hidden parameter containing the static chain 
(r10 on x86_64) isn't clobbered from function start up to that point.  
So that won't work either generally.

I don't know what exactly is the issue with calling 
__asan_handle_no_return before the other instructions emitted by 
expand_used_vars.  Either it shouldn't be called for the static chain 
(i.e. not instrumented) or whatever setup asan needs needs to happen in 
front of the static chain setup, but then without clobbering the incoming 
static chain param (!).


Ciao,
Michael.
> 
> expanded cfun->static_chain_decl:
> 
> (note 1 0 5 NOTE_INSN_DELETED)
> (note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (insn 2 5 3 2 (set (reg/f:DI 88 [ CHAIN.1 ])
> (reg:DI 39 r10 [ CHAIN.1 ])) "pr81186.c":5 -1
>  (nil))
> (insn 3 2 4 2 (set (mem/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
> (const_int -8 [0xfff8])) [0  S8 A64])
> (reg:DI 39 r10 [ CHAIN.1 ])) "pr81186.c":5 -1
>  (nil))
> (note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
> (insn 7 4 8 2 (set (reg/f:DI 87 [ _2 ])
> (reg/f:DI 88 [ CHAIN.1 ])) "pr81186.c":5 -1
>  (nil))
> (call_insn 8 7 9 2 (call (mem:QI (symbol_ref:DI ("__asan_handle_no_return") 
> [flags 0x41]   __builtin___asan_handle_no_return>) [0 __builtin___asan_handle_no_return S1 
> A8])
> (const_int 0 [0])) "pr81186.c":5 -1
>  (expr_list:REG_EH_REGION (const_int 0 [0])
> (nil))
> (nil))
> 
> expanded cfun->nonlocal_goto_save_area:
> 
> (note 1 0 34 NOTE_INSN_DELETED)
> (note 34 1 31 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (insn 31 34 32 2 (set (mem/f/c:DI (plus:DI (reg:DI 95)
> (const_int -64 [0xffc0])) [4 
> FRAME.0.__nl_goto_buf+0 S8 A64])
> (reg/f:DI 82 virtual-stack-vars)) "pr81186.c":3 -1
>  (nil))
> (insn 32 31 2 2 (set (mem/f/c:DI (plus:DI (reg:DI 95)
> (const_int -56 [0xffc8])) [4 
> FRAME.0.__nl_goto_buf+8 S8 A64])
> (reg/f:DI 7 sp)) "pr81186.c":3 -1
>  (nil))
> (insn 2 32 3 2 (parallel [
> (set (reg:DI 96)
> (plus:DI (reg/f:DI 82 virtual-stack-vars)
> (const_int -96 [0xffa0])))
> (clobber (reg:CC 17 flags))
> ]) "pr81186.c":3 -1
>  (nil))
> (insn 3 2 4 2 (set (reg:DI 97)
> (reg:DI 96)) "pr81186.c":3 -1
>  (nil))
> (insn 4 3 5 2 (set (reg:CCZ 17 flags)
> (compare:CCZ (mem/c:SI (symbol_ref:DI 
> ("__asan_option_detect_stack_use_after_return") [flags 0x40]   0x2ba005b5b750 __asan_option_detect_stack_use_after_return>) [5 
> __asan_option_detect_stack_use_after_return+0 S4 A32])
> (const_int 0 [0]))) "pr81186.c":3 -1
>  (nil))
> 
> And thus I suggest to move both these instrumentations after 
> NOTE_INSN_FUNCTION_BEG.
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/ChangeLog:
> 
> 2017-06-27  Martin Liska  
> 
> PR sanitize/81186
>   * function.c (expand_function_start): Move static chain and non-local
>   goto init after NOTE_INSN_FUNCTION_BEG.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-06-27  Martin Liska  
> 
> PR sanitize/81186
>   * gcc.dg/asan/pr81186.c: New test.
> ---
>  gcc/function.c  | 18 +-
>  gcc/testsuite/gcc.dg/asan/pr81186.c | 13 +
>  2 files changed, 22 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/asan/pr81186.c
> 
> 
> 

Re: [RFC][AARCH64]Add 'r' integer register operand modifier. Document the common asm modifier for aarch64 target.

2017-06-27 Thread Renlin Li

Hi Andrew,

On 25/06/17 22:38, Andrew Pinski wrote:

> On Tue, Jun 6, 2017 at 3:56 AM, Renlin Li  wrote:
>> Hi all,
>>
>> In this patch, a new integer register operand modifier 'r' is added. This
>> will use the proper register name according to the mode of corresponding
>> operand.
>>
>> 'w' register for scalar integer mode smaller than DImode
>> 'x' register for DImode
>>
>> This allows more flexibility and would meet people's expectations.
>> It will help for ILP32 and LP64, and big-endian case.
>>
>> A new section is added to document the AArch64 operand modifiers which might
>> be used in inline assembly. It's not an exhaustive list covers every
>> modifier. Only the most common and useful ones are documented.
>>
>> The default behavior of integer operand without modifier is clearly
>> documented as well. It's not changed so that the patch shouldn't break
>> anything.
>>
>> So with this patch, it should resolve the issues in PR63359.
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63359
>>
>> aarch64-none-elf regression test Okay. Okay to check in?
>
> I think 'r' modifier is very fragile and can be used incorrectly and
> wrong in some cases really..

The user could always (or be encouraged to) opt for a strict register modifier
to enforce consistent behavior in all cases.

I agree the flexibility might bring unexpected behavior in corner cases.
Do you have any examples to share off the top of your head? Then we can discuss
the benefits and pitfalls, and decide whether to improve the patch or withdraw
it.

> I like the documentation though.

Thanks,
Renlin

> Thanks,
> Andrew
>
>> gcc/ChangeLog:
>>
>> 2017-06-06  Renlin Li  
>>
>>  PR target/63359
>>  * config/aarch64/aarch64.c (aarch64_print_operand): Add 'r'
>>  modifier.
>>  * doc/extend.texi (AArch64Operandmodifiers): New section.


[C++ PATCH] Kill IDENTIFIER_TEMPLATE

2017-06-27 Thread Nathan Sidwell
I discovered IDENTIFIER_TEMPLATE is never assigned to, and consequently 
never non-null.


Nuked after a successful bootstrap.

nathan
--
Nathan Sidwell
2017-06-27  Nathan Sidwell  

	Kill IDENTIFIER_TEMPLATE.
	* cp-tree.h (lang_identifier): Remove class_template_info field.
	(IDENTIFIER_TEMPLATE): Delete.
	* name-lookup.c (constructor_name_full): Subsume into ...
	(constructor_name): ... here.  Don't check IDENTIFIER_TEMPLATE.
	(constructor_name_p): Likewise.
	* mangle.c (write_source_name): Likewise.
	* ptree.c (cxx_print_identifier): Likewise.

Index: cp-tree.h
===
--- cp-tree.h	(revision 249657)
+++ cp-tree.h	(working copy)
@@ -527,7 +527,6 @@ extern GTY(()) tree cp_global_trees[CPTI
 struct GTY(()) lang_identifier {
   struct c_common_identifier c_common;
   cxx_binding *bindings;
-  tree class_template_info;
   tree label_value;
 };
 
@@ -954,9 +953,6 @@ enum GTY(()) abstract_class_use {
 
 /* Macros for access to language-specific slots in an identifier.  */
 
-#define IDENTIFIER_TEMPLATE(NODE)	\
-  (LANG_IDENTIFIER_CAST (NODE)->class_template_info)
-
 /* The IDENTIFIER_BINDING is the innermost cxx_binding for the
 identifier.  Its PREVIOUS is the next outermost binding.  Each
 VALUE field is a DECL for the associated declaration.  Thus,
Index: mangle.c
===
--- mangle.c	(revision 249654)
+++ mangle.c	(working copy)
@@ -1460,11 +1460,6 @@ write_source_name (tree identifier)
 {
   MANGLE_TRACE_TREE ("source-name", identifier);
 
-  /* Never write the whole template-id name including the template
- arguments; we only want the template name.  */
-  if (IDENTIFIER_TEMPLATE (identifier))
-identifier = IDENTIFIER_TEMPLATE (identifier);
-
   write_unsigned_number (IDENTIFIER_LENGTH (identifier));
   write_identifier (IDENTIFIER_POINTER (identifier));
 }
Index: name-lookup.c
===
--- name-lookup.c	(revision 249654)
+++ name-lookup.c	(working copy)
@@ -3183,27 +3183,12 @@ set_identifier_type_value (tree id, tree
 }
 
 /* Return the name for the constructor (or destructor) for the
-   specified class TYPE.  When given a template, this routine doesn't
-   lose the specialization.  */
-
-static inline tree
-constructor_name_full (tree type)
-{
-  return TYPE_IDENTIFIER (TYPE_MAIN_VARIANT (type));
-}
-
-/* Return the name for the constructor (or destructor) for the
-   specified class.  When given a template, return the plain
-   unspecialized name.  */
+   specified class.  */
 
 tree
 constructor_name (tree type)
 {
-  tree name;
-  name = constructor_name_full (type);
-  if (IDENTIFIER_TEMPLATE (name))
-name = IDENTIFIER_TEMPLATE (name);
-  return name;
+  return TYPE_IDENTIFIER (TYPE_MAIN_VARIANT (type));
 }
 
 /* Returns TRUE if NAME is the name for the constructor for TYPE,
@@ -3212,8 +3197,6 @@ constructor_name (tree type)
 bool
 constructor_name_p (tree name, tree type)
 {
-  tree ctor_name;
-
   gcc_assert (MAYBE_CLASS_TYPE_P (type));
 
   if (!name)
@@ -3227,12 +3210,10 @@ constructor_name_p (tree name, tree type
   || TREE_CODE (type) == TYPEOF_TYPE)
 return false;
 
-  ctor_name = constructor_name_full (type);
+  tree ctor_name = constructor_name (type);
   if (name == ctor_name)
 return true;
-  if (IDENTIFIER_TEMPLATE (ctor_name)
-  && name == IDENTIFIER_TEMPLATE (ctor_name))
-return true;
+
   return false;
 }
 
Index: ptree.c
===
--- ptree.c	(revision 249654)
+++ ptree.c	(working copy)
@@ -181,7 +181,6 @@ cxx_print_identifier (FILE *file, tree n
   fprintf (file, "%s local bindings <%p>", get_identifier_kind_name (node),
 	   (void *) IDENTIFIER_BINDING (node));
   print_node (file, "label", IDENTIFIER_LABEL_VALUE (node), indent + 4);
-  print_node (file, "template", IDENTIFIER_TEMPLATE (node), indent + 4);
 }
 
 void


Re: [PATCH] Do not allow to inline ifunc resolvers (PR ipa/81128).

2017-06-27 Thread Jan Hubicka
> diff --git a/gcc/ipa-visibility.c b/gcc/ipa-visibility.c
> index d5a3ae56c46..69e6e295d55 100644
> --- a/gcc/ipa-visibility.c
> +++ b/gcc/ipa-visibility.c
> @@ -97,7 +97,8 @@ non_local_p (struct cgraph_node *node, void *data 
> ATTRIBUTE_UNUSED)
>&& !DECL_EXTERNAL (node->decl)
>&& !node->externally_visible
>&& !node->used_from_other_partition
> -  && !node->in_other_partition);
> +  && !node->in_other_partition
> +  && !lookup_attribute ("ifunc", DECL_ATTRIBUTES (node->decl)));
>  }
>  
>  /* Return true when function can be marked local.  */
> 
> It's questionable whether local.local can be true for an ifunc function.
> If it can, one would need to move the ifunc check earlier in:
> /* Return function availability.  See cgraph.h for description of individual
>    return values.  */
> enum availability
> cgraph_node::get_availability (symtab_node *ref)
> {
>   if (ref)
>     {
>       cgraph_node *cref = dyn_cast <cgraph_node *> (ref);
>       if (cref)
>         ref = cref->global.inlined_to;
>     }
>   enum availability avail;
>   if (!analyzed)
>     avail = AVAIL_NOT_AVAILABLE;
>   else if (local.local)
>     avail = AVAIL_LOCAL;
>   else if (global.inlined_to)
>     avail = AVAIL_AVAILABLE;
>   else if (transparent_alias)
>     ultimate_alias_target (&avail, ref);
>   else if (lookup_attribute ("ifunc", DECL_ATTRIBUTES (decl)))
>     avail = AVAIL_INTERPOSABLE;
> 
> ...
> 
> What solution do you prefer Honza?

Probably just update non_local_p to also check that availability is at least
AVAIL_AVAILABLE.  Then we will have one place to collect such side cases where
a !EXTERNAL function can be interposed.

Honza
> 
> Thanks,
> Martin
> 
> > 
> > Honza
> >> Martin
> >>
> >> gcc/ChangeLog:
> >>
> >> 2017-06-22  Martin Liska  
> >>
> >>* ipa-inline.c (can_inline_edge_p): Return false for ifunc fns.
> >>* ipa-visibility.c (can_replace_by_local_alias): Likewise.
> >>
> >> gcc/c-family/ChangeLog:
> >>
> >> 2017-06-22  Martin Liska  
> >>
> >>* c-attribs.c (handle_alias_ifunc_attribute): Append ifunc alias
> >>to a function declaration.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2017-06-22  Martin Liska  
> >>
> >>* gcc.target/i386/pr81128.c: New test.
> >> ---
> >>  gcc/c-family/c-attribs.c| 11 --
> >>  gcc/ipa-inline.c|  2 +
> >>  gcc/ipa-visibility.c|  3 +-
> >>  gcc/testsuite/gcc.target/i386/pr81128.c | 65 
> >> +
> >>  4 files changed, 77 insertions(+), 4 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.target/i386/pr81128.c
> >>
> >>
> > 
> >> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> >> index 2b6845f2cbd..626ffa1cde7 100644
> >> --- a/gcc/c-family/c-attribs.c
> >> +++ b/gcc/c-family/c-attribs.c
> >> @@ -1846,9 +1846,14 @@ handle_alias_ifunc_attribute (bool is_alias, tree 
> >> *node, tree name, tree args,
> >>TREE_STATIC (decl) = 1;
> >>  
> >>if (!is_alias)
> >> -  /* ifuncs are also aliases, so set that attribute too.  */
> >> -  DECL_ATTRIBUTES (decl)
> >> -= tree_cons (get_identifier ("alias"), args, DECL_ATTRIBUTES (decl));
> >> +  {
> >> +/* ifuncs are also aliases, so set that attribute too.  */
> >> +DECL_ATTRIBUTES (decl)
> >> +  = tree_cons (get_identifier ("alias"), args,
> >> +   DECL_ATTRIBUTES (decl));
> >> +DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("ifunc"),
> >> +NULL, DECL_ATTRIBUTES (decl));
> >> +  }
> >>  }
> >>else
> >>  {
> >> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> >> index fb20d3723cc..588fa9c41e4 100644
> >> --- a/gcc/ipa-inline.c
> >> +++ b/gcc/ipa-inline.c
> >> @@ -370,6 +370,8 @@ can_inline_edge_p (struct cgraph_edge *e, bool report,
> >>e->inline_failed = CIF_ATTRIBUTE_MISMATCH;
> >>inlinable = false;
> >>  }
> >> +  else if (lookup_attribute ("ifunc", DECL_ATTRIBUTES (e->callee->decl)))
> >> +  inlinable = false;
> >>/* Check if caller growth allows the inlining.  */
> >>else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
> >>   && !disregard_limits
> >> diff --git a/gcc/ipa-visibility.c b/gcc/ipa-visibility.c
> >> index d5a3ae56c46..79d05b41085 100644
> >> --- a/gcc/ipa-visibility.c
> >> +++ b/gcc/ipa-visibility.c
> >> @@ -345,7 +345,8 @@ can_replace_by_local_alias (symtab_node *node)
> >>
> >>return (node->get_availability () > AVAIL_INTERPOSABLE
> >>  && !decl_binds_to_current_def_p (node->decl)
> >> -&& !node->can_be_discarded_p ());
> >> +&& !node->can_be_discarded_p ()
> >> +&& !lookup_attribute ("ifunc", DECL_ATTRIBUTES (node->decl)));
> >>  }
> >>  
> >>  /* Return true if we can replace reference to NODE by local alias
> >> diff --git a/gcc/testsuite/gcc.target/i386/pr81128.c 
> >> b/gcc/testsuite/gcc.target/i386/pr81128.c
> >> new file mode 100644
> >> index 

Re: [PATCH] Do not allow to inline ifunc resolvers (PR ipa/81128).

2017-06-27 Thread Martin Liška
On 06/27/2017 04:57 PM, Jan Hubicka wrote:
>> Hello.
>>
>> Currently ifunc is interpreted as normal alias by IPA optimizations. That's
>> problematic, as we should not consider an ifunc alias as a candidate for
>> inlining or redirection.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>> And survives MVC tests on x86_64-linux-gnu.
>>
>> Ready to be installed?
> 
> Wasn't this supposed to go with arranging availability to be
> AVAIL_INTERPOSABLE (as the target will be interposed by linker to the
> correct target)

Should work; here's an alternative patch:

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 2b6845f2cbd..626ffa1cde7 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -1846,9 +1846,14 @@ handle_alias_ifunc_attribute (bool is_alias, tree *node, tree name, tree args,
TREE_STATIC (decl) = 1;
 
   if (!is_alias)
-   /* ifuncs are also aliases, so set that attribute too.  */
-   DECL_ATTRIBUTES (decl)
- = tree_cons (get_identifier ("alias"), args, DECL_ATTRIBUTES (decl));
+   {
+ /* ifuncs are also aliases, so set that attribute too.  */
+ DECL_ATTRIBUTES (decl)
+   = tree_cons (get_identifier ("alias"), args,
+DECL_ATTRIBUTES (decl));
+ DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("ifunc"),
+ NULL, DECL_ATTRIBUTES (decl));
+   }
 }
   else
 {
diff --git a/gcc/ipa-visibility.c b/gcc/ipa-visibility.c
index d5a3ae56c46..69e6e295d55 100644
--- a/gcc/ipa-visibility.c
+++ b/gcc/ipa-visibility.c
@@ -97,7 +97,8 @@ non_local_p (struct cgraph_node *node, void *data ATTRIBUTE_UNUSED)
   && !DECL_EXTERNAL (node->decl)
   && !node->externally_visible
   && !node->used_from_other_partition
-  && !node->in_other_partition);
+  && !node->in_other_partition
+  && !lookup_attribute ("ifunc", DECL_ATTRIBUTES (node->decl)));
 }
 
 /* Return true when function can be marked local.  */

It's questionable if local.local can be true for ifunc function? If can, one 
would need to
move check for ifunc aerly in:
/* Return function availability.  See cgraph.h for description of individual
   return values.  */
enum availability
cgraph_node::get_availability (symtab_node *ref)
{
  if (ref)
    {
      cgraph_node *cref = dyn_cast <cgraph_node *> (ref);
      if (cref)
        ref = cref->global.inlined_to;
    }
  enum availability avail;
  if (!analyzed)
    avail = AVAIL_NOT_AVAILABLE;
  else if (local.local)
    avail = AVAIL_LOCAL;
  else if (global.inlined_to)
    avail = AVAIL_AVAILABLE;
  else if (transparent_alias)
    ultimate_alias_target (&avail, ref);
  else if (lookup_attribute ("ifunc", DECL_ATTRIBUTES (decl)))
    avail = AVAIL_INTERPOSABLE;

...

What solution do you prefer Honza?

Thanks,
Martin
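
The ordering concern raised above can be illustrated with a toy model. This is only a sketch: `toy_node`, `avail_local_first` and `avail_ifunc_first` are hypothetical names invented for the illustration, and the structure is a drastic simplification of cgraph_node, not GCC code. It shows that if the local check comes first, a node that is both local and an ifunc is reported as AVAIL_LOCAL, whereas testing for ifunc earlier yields AVAIL_INTERPOSABLE.

```c
#include <assert.h>
#include <stdbool.h>

/* Same ordering as GCC's availability levels, reproduced for the sketch.  */
enum availability
{
  AVAIL_UNSET,
  AVAIL_NOT_AVAILABLE,
  AVAIL_INTERPOSABLE,
  AVAIL_AVAILABLE,
  AVAIL_LOCAL
};

/* Hypothetical, heavily simplified stand-in for a call graph node.  */
struct toy_node
{
  bool analyzed;   /* the body has been analyzed */
  bool local;      /* models local.local */
  bool is_ifunc;   /* models the presence of the "ifunc" attribute */
};

/* Order as in the quoted function: the local check wins over ifunc.  */
enum availability
avail_local_first (const struct toy_node *n)
{
  assert (n);
  if (!n->analyzed)
    return AVAIL_NOT_AVAILABLE;
  if (n->local)
    return AVAIL_LOCAL;
  if (n->is_ifunc)
    return AVAIL_INTERPOSABLE;
  return AVAIL_AVAILABLE;
}

/* Alternative order: the ifunc check is moved before the local check.  */
enum availability
avail_ifunc_first (const struct toy_node *n)
{
  assert (n);
  if (!n->analyzed)
    return AVAIL_NOT_AVAILABLE;
  if (n->is_ifunc)
    return AVAIL_INTERPOSABLE;
  if (n->local)
    return AVAIL_LOCAL;
  return AVAIL_AVAILABLE;
}
```

Whether local.local can actually be set for an ifunc node is exactly the open question in this message; the sketch only shows what each ordering would return if it can.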

> 
> Honza
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2017-06-22  Martin Liska  
>>
>>  * ipa-inline.c (can_inline_edge_p): Return false for ifunc fns.
>>  * ipa-visibility.c (can_replace_by_local_alias): Likewise.
>>
>> gcc/c-family/ChangeLog:
>>
>> 2017-06-22  Martin Liska  
>>
>>  * c-attribs.c (handle_alias_ifunc_attribute): Append ifunc alias
>>  to a function declaration.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2017-06-22  Martin Liska  
>>
>>  * gcc.target/i386/pr81128.c: New test.
>> ---
>>  gcc/c-family/c-attribs.c| 11 --
>>  gcc/ipa-inline.c|  2 +
>>  gcc/ipa-visibility.c|  3 +-
>>  gcc/testsuite/gcc.target/i386/pr81128.c | 65 
>> +
>>  4 files changed, 77 insertions(+), 4 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr81128.c
>>
>>
> 
>> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
>> index 2b6845f2cbd..626ffa1cde7 100644
>> --- a/gcc/c-family/c-attribs.c
>> +++ b/gcc/c-family/c-attribs.c
>> @@ -1846,9 +1846,14 @@ handle_alias_ifunc_attribute (bool is_alias, tree 
>> *node, tree name, tree args,
>>  TREE_STATIC (decl) = 1;
>>  
>>if (!is_alias)
>> -/* ifuncs are also aliases, so set that attribute too.  */
>> -DECL_ATTRIBUTES (decl)
>> -  = tree_cons (get_identifier ("alias"), args, DECL_ATTRIBUTES (decl));
>> +{
>> +  /* ifuncs are also aliases, so set that attribute too.  */
>> +  DECL_ATTRIBUTES (decl)
>> += tree_cons (get_identifier ("alias"), args,
>> + DECL_ATTRIBUTES (decl));
>> +  DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("ifunc"),
>> +  NULL, DECL_ATTRIBUTES (decl));
>> +}
>>  }
>>else
>>  {
>> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
>> index fb20d3723cc..588fa9c41e4 100644
>> --- a/gcc/ipa-inline.c
>> +++ b/gcc/ipa-inline.c
>> @@ -370,6 +370,8 @@ can_inline_edge_p (struct cgraph_edge *e, bool report,
>>

Re: [PATCH][AArch64] Fix PR79041

2017-06-27 Thread Yvan Roux
On 27 June 2017 at 16:55, Yvan Roux  wrote:
> Hi
>
> On 27 June 2017 at 16:49, Wilco Dijkstra  wrote:
>> As described in PR79041, -mcmodel=large -mpc-relative-literal-loads
>> may be used to avoid generating ADRP/ADD or ADRP/LDR.  However both
>> trunk and GCC7 may still emit ADRP for some constant pool literals.
>> Fix this by adding an aarch64_pcrelative_literal_loads check.
>>
>> OK for trunk/GCC7 backport?
>
> I can't approve it, but the patch is ok for me, I've built and
> regtested the very same patch for aarch64-linux-gnu,
> aarch64-none-elf and aarch64_be-none-elf targets :)
>
> Yvan
>
>> ChangeLog:
>> 2017-06-27  Wilco Dijkstra  
>>
>> PR target/79041
>> * config/aarch64/aarch64.c (aarch64_classify_symbol):
>> Avoid SYMBOL_SMALL_ABSOLUTE .
>> * testsuite/gcc.target/aarch64/pr79041-2.c: New test.
>> --
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 
>> 060cd8476d2954119daac495ecb059c9be73edbe..329d244e9cf16dbdf849e5dd02b3999caf0cd5a7
>>  100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -10042,7 +10042,7 @@ aarch64_classify_symbol (rtx x, rtx offset)
>>   /* This is alright even in PIC code as the constant
>>  pool reference is always PC relative and within
>>  the same translation unit.  */
>> - if (CONSTANT_POOL_ADDRESS_P (x))
>> + if (CONSTANT_POOL_ADDRESS_P (x) && 
>> !aarch64_pcrelative_literal_loads)

Maybe just a small note here: in my backport patch on gcc-6-branch I
kept the existing order in the condition:

- if (nopcrelative_literal_loads
+ if (!aarch64_pcrelative_literal_loads
  && CONSTANT_POOL_ADDRESS_P (x))

Can we keep it?  Or maybe I should wait for your patch to be committed
on trunk and include its backport in my patch for GCC 6.


>> return SYMBOL_SMALL_ABSOLUTE;
>>   else
>> return SYMBOL_FORCE_TO_MEM;
>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c 
>> b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
>> new file mode 100644
>> index 
>> ..e7899725bad2b770f8488a07f99792113275bdf2
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mcmodel=large -mpc-relative-literal-loads" } */
>> +
>> +__int128
>> +t (void)
>> +{
>> +  return (__int128)1 << 80;
>> +}
>> +
>> +/* { dg-final { scan-assembler "adr" } } */


Re: [PATCH GCC][01/13]Introduce internal function IFN_LOOP_DIST_ALIAS

2017-06-27 Thread Richard Biener
On June 27, 2017 4:27:17 PM GMT+02:00, "Bin.Cheng"  
wrote:
>On Tue, Jun 27, 2017 at 1:58 PM, Richard Biener
> wrote:
>> On Fri, Jun 23, 2017 at 12:10 PM, Bin.Cheng 
>wrote:
>>> On Mon, Jun 12, 2017 at 6:02 PM, Bin Cheng 
>wrote:
>>>> Hi,
>>>> I was asked by upstream to split the loop distribution patch into small ones.
>>>> It is hard because data structure and algorithm are closely coupled together.
>>>> Anyway, this is the patch series with smaller patches.  Basically I tried to
>>>> separate data structure and bug-fix changes apart with one as the main patch.
>>>> Note I only made necessary code refactoring in order to separate patch, apart
>>>> from that, there is no change against the last version.
>>>>
>>>> This is the first patch introducing new internal function IFN_LOOP_DIST_ALIAS.
>>>> GCC will distribute loops under condition of this function call.
>>>>
>>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>> Hi,
>>> I need to update this patch fixing an issue in
>>> vect_loop_dist_alias_call.  The previous patch fails to find some
>>> IFN_LOOP_DIST_ALIAS calls.
>>>
>>> Bootstrap and test in series.  Is it OK?
>>
>> So I wonder if we really need to track ldist_alias_id or if we can do
>sth
>Yes, it is needed because otherwise we probably falsely trying to
>search for IFN_LOOP_DIST_ALIAS for a normal (not from distribution)
>loop.
>
>> more "general", like tracking a copy_of or origin and then directly
>> go to nearest_common_dominator (loop->header, copy_of->header)
>> to find the controlling condition?
>I tend to not record any pointer in loop structure, it can easily go
>dangling for a across passes data structure.  

I didn't mean to record a pointer, just rename your field and make it more 
general.  The common dominator thing should still work, no?

>As far as memory usage
>is concerned.  I actually don't need a whole integer to record the
>loop num.  I can simply restrict number of distributions in one
>function to at most 256, and record such id in a char field in struct
>loop?  Does this sounds better?

As said, tracking loop origin sounds useful anyway so I'd rather add and use 
that somehow.

>Thanks,
>bin
>>
>> That said "ldist_alias_id" is a bit too narrow of purpose to "waste"
>> an int inside struct loop?  I'd set copy_of/origin in loop_version for
>example.
>> 'origin' would probably be better given the ldist cases aren't really
>> full "copies".
>>
>> fold_loop_dist_alias_call should re-use / rename
>fold_loop_vectorized_call
>> by just passing folded_value to it.
>>
>> Richard.
>>
>>> Thanks,
>>> bin

>>>> Thanks,
>>>> bin
>>>> 2017-06-07  Bin Cheng  
>>>>
>>>> * cfgloop.h (struct loop): New field ldist_alias_id.
>>>> * cfgloopmanip.c (lv_adjust_loop_entry_edge): Comment change.
>>>> * internal-fn.c (expand_LOOP_DIST_ALIAS): New function.
>>>> * internal-fn.def (LOOP_DIST_ALIAS): New.
>>>> * tree-vectorizer.c (vect_loop_dist_alias_call): New function.
>>>> (fold_loop_dist_alias_call): New function.
>>>> (vectorize_loops): Fold IFN_LOOP_DIST_ALIAS call depending on
>>>> successful vectorization or not.



Re: [PATCH] Do not allow to inline ifunc resolvers (PR ipa/81128).

2017-06-27 Thread Jan Hubicka
> Hello.
> 
> Currently ifunc is interpreted as normal alias by IPA optimizations. That's 
> problematic
> as should not consider ifunc alias as candidate for inlining, or redirection.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> And survives MVC tests on x86_64-linux-gnu.
> 
> Ready to be installed?

Wasn't this supposed to go with arranging availability to be AVAIL_INTERPOSABLE
(as the target will be interposed by linker to the correct target)

Honza
> Martin
> 
> gcc/ChangeLog:
> 
> 2017-06-22  Martin Liska  
> 
>   * ipa-inline.c (can_inline_edge_p): Return false for ifunc fns.
>   * ipa-visibility.c (can_replace_by_local_alias): Likewise.
> 
> gcc/c-family/ChangeLog:
> 
> 2017-06-22  Martin Liska  
> 
>   * c-attribs.c (handle_alias_ifunc_attribute): Append ifunc alias
>   to a function declaration.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-06-22  Martin Liska  
> 
>   * gcc.target/i386/pr81128.c: New test.
> ---
>  gcc/c-family/c-attribs.c| 11 --
>  gcc/ipa-inline.c|  2 +
>  gcc/ipa-visibility.c|  3 +-
>  gcc/testsuite/gcc.target/i386/pr81128.c | 65 
> +
>  4 files changed, 77 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr81128.c
> 
> 

> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index 2b6845f2cbd..626ffa1cde7 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -1846,9 +1846,14 @@ handle_alias_ifunc_attribute (bool is_alias, tree 
> *node, tree name, tree args,
>   TREE_STATIC (decl) = 1;
>  
>if (!is_alias)
> - /* ifuncs are also aliases, so set that attribute too.  */
> - DECL_ATTRIBUTES (decl)
> -   = tree_cons (get_identifier ("alias"), args, DECL_ATTRIBUTES (decl));
> + {
> +   /* ifuncs are also aliases, so set that attribute too.  */
> +   DECL_ATTRIBUTES (decl)
> + = tree_cons (get_identifier ("alias"), args,
> +  DECL_ATTRIBUTES (decl));
> +   DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("ifunc"),
> +   NULL, DECL_ATTRIBUTES (decl));
> + }
>  }
>else
>  {
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index fb20d3723cc..588fa9c41e4 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -370,6 +370,8 @@ can_inline_edge_p (struct cgraph_edge *e, bool report,
>e->inline_failed = CIF_ATTRIBUTE_MISMATCH;
>inlinable = false;
>  }
> +  else if (lookup_attribute ("ifunc", DECL_ATTRIBUTES (e->callee->decl)))
> +  inlinable = false;
>/* Check if caller growth allows the inlining.  */
>else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
>  && !disregard_limits
> diff --git a/gcc/ipa-visibility.c b/gcc/ipa-visibility.c
> index d5a3ae56c46..79d05b41085 100644
> --- a/gcc/ipa-visibility.c
> +++ b/gcc/ipa-visibility.c
> @@ -345,7 +345,8 @@ can_replace_by_local_alias (symtab_node *node)
>
>return (node->get_availability () > AVAIL_INTERPOSABLE
> && !decl_binds_to_current_def_p (node->decl)
> -   && !node->can_be_discarded_p ());
> +   && !node->can_be_discarded_p ()
> +   && !lookup_attribute ("ifunc", DECL_ATTRIBUTES (node->decl)));
>  }
>  
>  /* Return true if we can replace reference to NODE by local alias
> diff --git a/gcc/testsuite/gcc.target/i386/pr81128.c 
> b/gcc/testsuite/gcc.target/i386/pr81128.c
> new file mode 100644
> index 000..90a567ad690
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr81128.c
> @@ -0,0 +1,65 @@
> +/* PR ipa/81128 */
> +/* { dg-do run } */
> +/* { dg-options "-O3" } */
> +/* { dg-require-ifunc "" } */
> +
> +
> +#include 
> +#include 
> +#include 
> +
> +int resolver_fn = 0;
> +int resolved_fn = 0;
> +
> +static inline void
> +do_it_right_at_runtime_A ()
> +{
> +  resolved_fn++;
> +}
> +
> +static inline void
> +do_it_right_at_runtime_B ()
> +{
> +  resolved_fn++;
> +}
> +
> +static inline void do_it_right_at_runtime (void);
> +
> +void do_it_right_at_runtime (void)
> +  __attribute__ ((ifunc ("resolve_do_it_right_at_runtime")));
> +
> +static void (*resolve_do_it_right_at_runtime (void)) (void)
> +{
> +  srand (time (NULL));
> +  int r = rand ();
> +  resolver_fn++;
> +
> +  /* Use intermediate variable to get a warning for non-matching
> +   * prototype. */
> +  typeof(do_it_right_at_runtime) *func;
> +  if (r & 1)
> +func = do_it_right_at_runtime_A;
> +  else
> +func = do_it_right_at_runtime_B;
> +
> +  return (void *) func;
> +}
> +
> +int
> +main (void)
> +{
> +  const unsigned int ITERS = 10;
> +
> +  for (int i = ITERS; i > 0; i--)
> +{
> +  do_it_right_at_runtime ();
> +}
> +
> +  if (resolver_fn != 1)
> +__builtin_abort ();
> +
> +  if (resolved_fn != 10)
> +__builtin_abort ();
> +
> +  return 0;
> +}
> 



RE: [PATCH] Fold (a > 0 ? 1.0 : -1.0) into copysign (1.0, a) and a * copysign (1.0, a) into abs(a)

2017-06-27 Thread Richard Biener
On June 27, 2017 4:52:28 PM GMT+02:00, Tamar Christina 
 wrote:
>> >> +(for cmp (gt ge lt le)
>> >> + outp (convert convert negate negate)
>> >> + outn (negate negate convert convert)
>> >> + /* Transform (X > 0.0 ? 1.0 : -1.0) into copysign(1, X). */
>> >> + /* Transform (X >= 0.0 ? 1.0 : -1.0) into copysign(1, X). */
>> >> + /* Transform (X < 0.0 ? 1.0 : -1.0) into copysign(1,-X). */
>> >> + /* Transform (X <= 0.0 ? 1.0 : -1.0) into copysign(1,-X). */
>> >> +(simplify
>> >> +  (cond (cmp @0 real_zerop) real_onep real_minus_onep)
>> >> +  (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type)
>> >> +   && types_match (type, TREE_TYPE (@0)))
>> >> +   (switch
>> >> +(if (types_match (type, float_type_node))
>> >> + (BUILT_IN_COPYSIGNF { build_one_cst (type); } (outp @0)))
>> >> +(if (types_match (type, double_type_node))
>> >> + (BUILT_IN_COPYSIGN { build_one_cst (type); } (outp @0)))
>> >> +(if (types_match (type, long_double_type_node))
>> >> + (BUILT_IN_COPYSIGNL { build_one_cst (type); } (outp @0))
>> >>
>
>Hi,
>
>Out of curiosity is there any reason why this transformation can't be
>more general?
>
>e.g. Transform (X > 0.0 ? CST : -CST) into copysign(CST, X).

That's also possible, yes.

>we would at the very least avoid a csel or a branch then.
>
>Regards,
>Tamar



Re: [PATCH][AArch64] Fix PR79041

2017-06-27 Thread Yvan Roux
Hi

On 27 June 2017 at 16:49, Wilco Dijkstra  wrote:
> As described in PR79041, -mcmodel=large -mpc-relative-literal-loads
> may be used to avoid generating ADRP/ADD or ADRP/LDR.  However both
> trunk and GCC7 may still emit ADRP for some constant pool literals.
> Fix this by adding an aarch64_pcrelative_literal_loads check.
>
> OK for trunk/GCC7 backport?

I can't approve it, but the patch is ok for me, I've built and
regtested the very same patch for aarch64-linux-gnu,
aarch64-none-elf and aarch64_be-none-elf targets :)

Yvan

> ChangeLog:
> 2017-06-27  Wilco Dijkstra  
>
> PR target/79041
> * config/aarch64/aarch64.c (aarch64_classify_symbol):
> Avoid SYMBOL_SMALL_ABSOLUTE .
> * testsuite/gcc.target/aarch64/pr79041-2.c: New test.
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 060cd8476d2954119daac495ecb059c9be73edbe..329d244e9cf16dbdf849e5dd02b3999caf0cd5a7
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -10042,7 +10042,7 @@ aarch64_classify_symbol (rtx x, rtx offset)
>   /* This is alright even in PIC code as the constant
>  pool reference is always PC relative and within
>  the same translation unit.  */
> - if (CONSTANT_POOL_ADDRESS_P (x))
> + if (CONSTANT_POOL_ADDRESS_P (x) && 
> !aarch64_pcrelative_literal_loads)
> return SYMBOL_SMALL_ABSOLUTE;
>   else
> return SYMBOL_FORCE_TO_MEM;
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c 
> b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
> new file mode 100644
> index 
> ..e7899725bad2b770f8488a07f99792113275bdf2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mcmodel=large -mpc-relative-literal-loads" } */
> +
> +__int128
> +t (void)
> +{
> +  return (__int128)1 << 80;
> +}
> +
> +/* { dg-final { scan-assembler "adr" } } */


RE: [PATCH] Fold (a > 0 ? 1.0 : -1.0) into copysign (1.0, a) and a * copysign (1.0, a) into abs(a)

2017-06-27 Thread Tamar Christina
> >> +(for cmp (gt ge lt le)
> >> + outp (convert convert negate negate)
> >> + outn (negate negate convert convert)
> >> + /* Transform (X > 0.0 ? 1.0 : -1.0) into copysign(1, X). */
> >> + /* Transform (X >= 0.0 ? 1.0 : -1.0) into copysign(1, X). */
> >> + /* Transform (X < 0.0 ? 1.0 : -1.0) into copysign(1,-X). */
> >> + /* Transform (X <= 0.0 ? 1.0 : -1.0) into copysign(1,-X). */
> >> +(simplify
> >> +  (cond (cmp @0 real_zerop) real_onep real_minus_onep)
> >> +  (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type)
> >> +   && types_match (type, TREE_TYPE (@0)))
> >> +   (switch
> >> +(if (types_match (type, float_type_node))
> >> + (BUILT_IN_COPYSIGNF { build_one_cst (type); } (outp @0)))
> >> +(if (types_match (type, double_type_node))
> >> + (BUILT_IN_COPYSIGN { build_one_cst (type); } (outp @0)))
> >> +(if (types_match (type, long_double_type_node))
> >> + (BUILT_IN_COPYSIGNL { build_one_cst (type); } (outp @0))
> >>

Hi,

Out of curiosity is there any reason why this transformation can't be more 
general?

e.g. Transform (X > 0.0 ? CST : -CST) into copysign(CST, X).

we would at the very least avoid a csel or a branch then.

Regards,
Tamar
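
For illustration, the generalization asked about above can be checked with a small C sketch. The function names (`select_by_sign`, `copysign_form`, `mul_by_sign`) are made up for this note. Two assumptions are worth stating loudly: CST must be non-negative, and X must be neither NaN nor zero. At X == 0.0 the branchy form yields -CST while copysign(CST, +0.0) yields +CST, which is presumably why the quoted pattern is guarded by !HONOR_NANS and !HONOR_SIGNED_ZEROS.

```c
#include <assert.h>
#include <math.h>

/* Branchy form: X > 0.0 ? CST : -CST.  */
double
select_by_sign (double x, double cst)
{
  return x > 0.0 ? cst : -cst;
}

/* Branch-free form.  Only matches the branchy form when cst >= 0.0 and
   x is neither NaN nor zero (assumptions of this sketch).  */
double
copysign_form (double x, double cst)
{
  assert (cst >= 0.0);
  return copysign (cst, x);
}

/* Second fold from the subject line: a * copysign (1.0, a) is fabs (a)
   for any non-NaN a.  */
double
mul_by_sign (double a)
{
  return a * copysign (1.0, a);
}
```

For nonzero, non-NaN inputs, `select_by_sign (x, c)` and `copysign_form (x, c)` return the same value, so the conditional select (a csel or a branch) can be replaced by a sign-bit copy.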


Re: [PATCH] Bail out HARD_REGISTER vars in asan (PR sanitizer/81224).

2017-06-27 Thread Jakub Jelinek
On Tue, Jun 27, 2017 at 04:42:18PM +0200, Martin Liška wrote:
> Similar to what we do for UBSAN and TSAN, DECL_HARD_REGISTER variables should 
> not
> be instrumented.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/ChangeLog:
> 
> 2017-06-27  Martin Liska  
> 
>   PR sanitizer/81224
>   * asan.c (instrument_derefs): Bail out inner references
>   that are hard register variables.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-06-27  Martin Liska  
> 
>   PR sanitizer/81224
>   * gcc.dg/asan/pr81224.c: New test.
> ---
>  gcc/asan.c  |  3 +++
>  gcc/testsuite/gcc.dg/asan/pr81224.c | 10 ++
>  2 files changed, 13 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/asan/pr81224.c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/asan/pr81224.c
> @@ -0,0 +1,10 @@
> +/* PR sanitizer/80659 */
> +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> +
> +int a;
> +int
> +b ()
> +{
> +  register __attribute__ ((__vector_size__ (sizeof (int)))) int c 
> asm("xmm0");
> +  return c[a];
> +}

I'm sure this test will fail on i?86 if -msse isn't on by default.
So I think you want at least dg-additional-options "-msse2"
(no need for sse2 effective target, as it is dg-do compile only test).

And, I'd expect 4 * sizeof (int) instead of sizeof (int) as vector_size.

So, please test the testcase with something like
RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mno-sse,-m64\} asan.exp=pr81224.c'

Ok with that fixed.

Jakub


[PATCH][AArch64] Fix PR79041

2017-06-27 Thread Wilco Dijkstra
As described in PR79041, -mcmodel=large -mpc-relative-literal-loads
may be used to avoid generating ADRP/ADD or ADRP/LDR.  However both
trunk and GCC7 may still emit ADRP for some constant pool literals.
Fix this by adding an aarch64_pcrelative_literal_loads check.

OK for trunk/GCC7 backport?

ChangeLog:
2017-06-27  Wilco Dijkstra  

PR target/79041
* config/aarch64/aarch64.c (aarch64_classify_symbol):
Avoid SYMBOL_SMALL_ABSOLUTE .
* testsuite/gcc.target/aarch64/pr79041-2.c: New test.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
060cd8476d2954119daac495ecb059c9be73edbe..329d244e9cf16dbdf849e5dd02b3999caf0cd5a7
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10042,7 +10042,7 @@ aarch64_classify_symbol (rtx x, rtx offset)
  /* This is alright even in PIC code as the constant
 pool reference is always PC relative and within
 the same translation unit.  */
- if (CONSTANT_POOL_ADDRESS_P (x))
+ if (CONSTANT_POOL_ADDRESS_P (x) && !aarch64_pcrelative_literal_loads)
return SYMBOL_SMALL_ABSOLUTE;
  else
return SYMBOL_FORCE_TO_MEM;
diff --git a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c 
b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
new file mode 100644
index 
..e7899725bad2b770f8488a07f99792113275bdf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=large -mpc-relative-literal-loads" } */
+
+__int128
+t (void)
+{
+  return (__int128)1 << 80;
+}
+
+/* { dg-final { scan-assembler "adr" } } */


[PATCH] PR libstdc++/81221 fix namespace qualification for parallel mode

2017-06-27 Thread Jonathan Wakely

std::sample needs to call _GLIBCXX_STD_A::__sample instead of
std::__sample, so that it works when Parallel Mode is active.

PR libstdc++/81221
* include/bits/stl_algo.h (sample): Qualify with _GLIBCXX_STD_A not
std.
* testsuite/25_algorithms/sample/81221.cc: New.

Tested powerpc64le-linux, committed to trunk, and will commit to the
gcc-7-branch shortly.


commit e55a8a53c38e9c914d0f12e7a4fd98e6790ad68e
Author: Jonathan Wakely 
Date:   Tue Jun 27 13:46:43 2017 +0100

PR libstdc++/81221 fix namespace qualification for parallel mode

PR libstdc++/81221
* include/bits/stl_algo.h (sample): Qualify with _GLIBCXX_STD_A not
std.
* testsuite/25_algorithms/sample/81221.cc: New.

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index 246193f..fbebfdb 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -5831,8 +5831,9 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
"sample size must be an integer type");
 
   typename iterator_traits<_PopulationIterator>::difference_type __d = __n;
-  return std::__sample(__first, __last, __pop_cat{}, __out, __samp_cat{},
-  __d, std::forward<_UniformRandomBitGenerator>(__g));
+  return _GLIBCXX_STD_A::
+   __sample(__first, __last, __pop_cat{}, __out, __samp_cat{}, __d,
+std::forward<_UniformRandomBitGenerator>(__g));
 }
 #endif // C++17
 #endif // C++14
diff --git a/libstdc++-v3/testsuite/25_algorithms/sample/81221.cc 
b/libstdc++-v3/testsuite/25_algorithms/sample/81221.cc
new file mode 100644
index 000..e6dd5e0
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/sample/81221.cc
@@ -0,0 +1,23 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17" }
+// { dg-do compile { target c++1z } }
+
+#undef _GLIBCXX_PARALLEL
+#define _GLIBCXX_PARALLEL 1
+#include 


[PATCH] Bail out HARD_REGISTER vars in asan (PR sanitizer/81224).

2017-06-27 Thread Martin Liška
Similar to what we do for UBSAN and TSAN, DECL_HARD_REGISTER variables should 
not
be instrumented.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2017-06-27  Martin Liska  

PR sanitizer/81224
* asan.c (instrument_derefs): Bail out inner references
that are hard register variables.

gcc/testsuite/ChangeLog:

2017-06-27  Martin Liska  

PR sanitizer/81224
* gcc.dg/asan/pr81224.c: New test.
---
 gcc/asan.c  |  3 +++
 gcc/testsuite/gcc.dg/asan/pr81224.c | 10 ++
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/asan/pr81224.c


diff --git a/gcc/asan.c b/gcc/asan.c
index e730530930b..3f814819add 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1875,6 +1875,9 @@ instrument_derefs (gimple_stmt_iterator *iter, tree t,
   || bitsize != size_in_bytes * BITS_PER_UNIT)
 return;
 
+  if (VAR_P (inner) && DECL_HARD_REGISTER (inner))
+return;
+
   if (VAR_P (inner)
   && offset == NULL_TREE
   && bitpos >= 0
diff --git a/gcc/testsuite/gcc.dg/asan/pr81224.c b/gcc/testsuite/gcc.dg/asan/pr81224.c
new file mode 100644
index 000..5586fe04391
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr81224.c
@@ -0,0 +1,10 @@
+/* PR sanitizer/80659 */
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+
+int a;
+int
+b ()
+{
+  register __attribute__ ((__vector_size__ (sizeof (int)))) int c asm("xmm0");
+  return c[a];
+}



Re: [PATCH GCC][01/13]Introduce internal function IFN_LOOP_DIST_ALIAS

2017-06-27 Thread Bin.Cheng
On Tue, Jun 27, 2017 at 1:58 PM, Richard Biener
 wrote:
> On Fri, Jun 23, 2017 at 12:10 PM, Bin.Cheng  wrote:
>> On Mon, Jun 12, 2017 at 6:02 PM, Bin Cheng  wrote:
>>> Hi,
>>> I was asked by upstream to split the loop distribution patch into small 
>>> ones.
>>> It is hard because data structure and algorithm are closely coupled 
>>> together.
>>> Anyway, this is the patch series with smaller patches.  Basically I tried to
>>> separate data structure and bug-fix changes apart with one as the main 
>>> patch.
>>> Note I only made necessary code refactoring in order to separate patch, 
>>> apart
>>> from that, there is no change against the last version.
>>>
>>> This is the first patch introducing new internal function 
>>> IFN_LOOP_DIST_ALIAS.
>>> GCC will distribute loops under condition of this function call.
>>>
>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>> Hi,
>> I need to update this patch fixing an issue in
>> vect_loop_dist_alias_call.  The previous patch fails to find some
>> IFN_LOOP_DIST_ALIAS calls.
>>
>> Bootstrap and test in series.  Is it OK?
>
> So I wonder if we really need to track ldist_alias_id or if we can do sth
Yes, it is needed because otherwise we would probably end up falsely
searching for IFN_LOOP_DIST_ALIAS for a normal (not from distribution)
loop.

> more "general", like tracking a copy_of or origin and then directly
> go to nearest_common_dominator (loop->header, copy_of->header)
> to find the controlling condition?
I tend not to record any pointer in the loop structure; it can easily go
dangling in a data structure that lives across passes.  As far as memory
usage is concerned, I actually don't need a whole integer to record the
loop num.  I can simply restrict the number of distributions in one
function to at most 256 and record such an id in a char field in struct
loop.  Does this sound better?

Thanks,
bin
>
> That said "ldist_alias_id" is a bit too narrow of purpose to "waste"
> an int inside struct loop?  I'd set copy_of/origin in loop_version for example.
> 'origin' would probably be better given the ldist cases aren't really
> full "copies".
>
> fold_loop_dist_alias_call should re-use / rename fold_loop_vectorized_call
> by just passing folded_value to it.
>
> Richard.
>
>> Thanks,
>> bin
>>>
>>> Thanks,
>>> bin
>>> 2017-06-07  Bin Cheng  
>>>
>>> * cfgloop.h (struct loop): New field ldist_alias_id.
>>> * cfgloopmanip.c (lv_adjust_loop_entry_edge): Comment change.
>>> * internal-fn.c (expand_LOOP_DIST_ALIAS): New function.
>>> * internal-fn.def (LOOP_DIST_ALIAS): New.
>>> * tree-vectorizer.c (vect_loop_dist_alias_call): New function.
>>> (fold_loop_dist_alias_call): New function.
>>> (vectorize_loops): Fold IFN_LOOP_DIST_ALIAS call depending on
>>> successful vectorization or not.


Re: [PATCH GCC][13/13]Distribute loop with loop versioning under runtime alias check

2017-06-27 Thread Bin.Cheng
On Tue, Jun 27, 2017 at 1:44 PM, Richard Biener
 wrote:
> On Fri, Jun 23, 2017 at 12:30 PM, Bin.Cheng  wrote:
>> On Tue, Jun 20, 2017 at 10:22 AM, Bin.Cheng  wrote:
>>> On Mon, Jun 12, 2017 at 6:03 PM, Bin Cheng  wrote:
>>>> Hi,
>> Rebased V3 for changes in previous patches.  Bootstrap and test on
>> x86_64 and aarch64.
>
> why is ldist-12.c no longer distributed?  your comment says it doesn't expose
> more "parallelism" but the point is to reduce memory bandwidth requirements
> which it clearly does.
>
> Likewise for -13.c, -14.c.  -4.c may be a questionable case but the wording
> of "parallelism" still confuses me.
>
> Can you elaborate on that.  Now onto the patch:
Given we don't model data locality or memory bandwidth, whether
distribution enables loops that can be executed in parallel becomes the
major criterion for distribution.  BTW, I think a good memory stream
optimization model shouldn't consider small loops such as those in
ldist-12.c, etc., appropriate for distribution.

>
> +   Loop distribution is the dual of loop fusion.  It separates statements
> +   of a loop (or loop nest) into multiple loops (or loop nests) with the
> +   same loop header.  The major goal is to separate statements which may
> +   be vectorized from those that can't.  This pass implements distribution
> +   in the following steps:
>
> misses the goal of being a memory stream optimization, not only a 
> vectorization
> enabler.  distributing a loop can also reduce register pressure.
I will revise the comment, but as explained, enabling more
vectorization is, to some extent, the major criterion for distribution
now.
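
As a source-level illustration of what is being discussed (distribution guarded by a runtime alias check, with the original loop kept as the fallback version), here is a minimal sketch. It is not what the pass emits: `loop_fused`, `loop_distributed` and `ranges_overlap` are hypothetical names, and comparing addresses through uintptr_t is an assumption that holds on common flat-memory targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Original loop: a vectorizable statement and a recurrence share one body.
   c must have room for n + 1 elements.  */
void
loop_fused (int *a, const int *b, int *c, size_t n)
{
  assert (a && b && c);
  for (size_t i = 0; i < n; i++)
    {
      a[i] = b[i] + 1;      /* no loop-carried dependence */
      c[i + 1] = c[i] * 2;  /* loop-carried dependence on c */
    }
}

/* Return nonzero if byte ranges [p, p + pn) and [q, q + qn) intersect.  */
static int
ranges_overlap (const void *p, size_t pn, const void *q, size_t qn)
{
  uintptr_t ps = (uintptr_t) p, qs = (uintptr_t) q;
  return ps < qs + qn && qs < ps + pn;
}

/* Versioned form: run the distributed loops only when c aliases neither
   a nor b; otherwise fall back to the original loop.  */
void
loop_distributed (int *a, const int *b, int *c, size_t n)
{
  assert (a && b && c);
  size_t ab_bytes = n * sizeof (int);
  size_t c_bytes = (n + 1) * sizeof (int);

  if (!ranges_overlap (a, ab_bytes, c, c_bytes)
      && !ranges_overlap (b, ab_bytes, c, c_bytes))
    {
      for (size_t i = 0; i < n; i++)  /* candidate for vectorization */
        a[i] = b[i] + 1;
      for (size_t i = 0; i < n; i++)  /* stays scalar */
        c[i + 1] = c[i] * 2;
    }
  else
    loop_fused (a, b, c, n);
}
```

Only the pairs (a, c) and (b, c) need checking here, because the two statements interact only through c; a and b may overlap each other without changing the result of the first loop relative to the fused form.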

>
> You introduce ldist_alias_id in struct loop (probably in 01/n which I
> didn't look
> into yet).  If you don't use that please introduce it separately.
Hmm, yes it is introduced in patch [01/n] and set in this patch.

>
> + /* Be conservative.  If data references are not well analyzed,
> +or the two data references have the same base address and
> +offset, add dependence and consider it alias to each other.
> +In other words, the dependence can not be resolved by
> +runtime alias check.  */
> + if (!DR_BASE_ADDRESS (dr1) || !DR_BASE_ADDRESS (dr2)
> + || !DR_OFFSET (dr1) || !DR_OFFSET (dr2)
> + || !DR_INIT (dr1) || !DR_INIT (dr2)
> + || !DR_STEP (dr1) || !tree_fits_uhwi_p (DR_STEP (dr1))
> + || !DR_STEP (dr2) || !tree_fits_uhwi_p (DR_STEP (dr2))
> + || res == 0)
>
> ISTR a helper that computes whether we can handle a runtime alias check for
> a specific case?
I guess you mean runtime_alias_check_p that I factored out previously?
Unfortunately, it was factored out for the vectorizer's usage and doesn't
fit here straightforwardly.  Shall I try to further generalize the
interface as an independent patch?

>
> +  /* Depend on vectorizer to fold IFN_LOOP_DIST_ALIAS.  */
> +  if (flag_tree_loop_vectorize)
> +{
>
> so at this point I'd condition the whole runtime-alias check generating
> on flag_tree_loop_vectorize.  You seem to support versioning w/o
> that here but in other places disable versioning w/o flag_tree_loop_vectorize.
> That looks somewhat inconsistent...
It is a bit complicated.  In function version_for_distribution_p, we have
+
+  /* Need to version loop if runtime alias check is necessary.  */
+  if (alias_ddrs->length () > 0)
+return true;
+
+  /* Don't version the loop with call to IFN_LOOP_DIST_ALIAS if
+ vectorizer is not enable because no other pass can fold it.  */
+  if (!flag_tree_loop_vectorize)
+return false;
+

It means we also version loops even if a runtime alias check is
unnecessary.  I did this because we lack a cost model and the current
distribution may result in too many distributions.  If that's the case,
at least the vectorizer will remove the distributed version of the loop
and fall back to the original one.  Hmm, shall I change it into the code below:
+
+  /* Need to version loop if runtime alias check is necessary.  */
+  if (alias_ddrs->length () == 0)
+return false;
+
+  /* Don't version the loop with call to IFN_LOOP_DIST_ALIAS if
+ vectorizer is not enable because no other pass can fold it.  */
+  if (!flag_tree_loop_vectorize)
+return false;
+

Then I can remove the check you mentioned in function
version_loop_by_alias_check?

>
> +  /* Don't version loop if any partition is builtin.  */
> +  for (i = 0; partitions->iterate (i, &partition); ++i)
> +{
> +  if (partition->kind != PKIND_NORMAL)
> +   break;
> +}
>
> why's that?  Do you handle the case where only a subset of partitions
One reason is I generally consider distributed builtin functions as a
win, thus distribution won't be canceled later in vectorizer.  Another
reason is if all distributed loops are recognized as builtins, we
can't really version with current 

Re: [PATCH][AArch64] Fix ILP32 memory access

2017-06-27 Thread Richard Earnshaw (lists)
On 27/06/17 14:39, Wilco Dijkstra wrote:
> This patch fixes a failure in gcc.target/aarch64/reload-valid-spoff.c 
> triggered by https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01367.html -
> it supersedes https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01907.html
> as this fixes the root cause of the failure.
> 
> In ILP32 all memory accesses must have Pmode as the base address, but
> aarch64_expand_mov_immediate wasn't emitting a conversion in one case.
> Besides fixing this add an assert that flags any MEM operands that are
> not Pmode.
> 
> Passes regress (with/without ilp32). OK for commit?
> 
> ChangeLog:
> 2017-06-27  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
>   Convert memory address to Pmode.

Missing ChangeLog entry for the new assert.

> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 329d244e9cf16dbdf849e5dd02b3999caf0cd5a7..9038748ba049ba589f067f3f04c31704fe673d2c
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1958,6 +1958,8 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
> gcc_assert (can_create_pseudo_p ());
> base = gen_reg_rtx (ptr_mode);
> aarch64_expand_mov_immediate (base, XEXP (mem, 0));
> +   if (ptr_mode != Pmode)
> + base = convert_memory_address (Pmode, base);
> mem = gen_rtx_MEM (ptr_mode, base);
>   }
>  
> @@ -5207,6 +5209,7 @@ aarch64_print_operand (FILE *f, rtx x, int code)
>  
>   case MEM:
> output_address (GET_MODE (x), XEXP (x, 0));
> +   gcc_assert (GET_MODE (XEXP (x, 0)) == Pmode);
> break;

This is worthy of a comment.  Something like "All memory references must
be in Pmode, which is the natural mode of the machine.  This remains the
case even if ptr_mode is different, as for ILP32."

Ok with those changes.

R.
>  
>   case CONST:
> 



Use ucontext_t not struct ucontext in linux-unwind.h files

2017-06-27 Thread Joseph Myers
Current glibc no longer gives the ucontext_t type the tag struct
ucontext, to conform with POSIX namespace rules.  This requires
various linux-unwind.h files in libgcc, that were previously using
struct ucontext, to be fixed to use ucontext_t instead.  This is
similar to the removal of the struct siginfo tag from siginfo_t some
years ago.

This patch changes those files to use ucontext_t instead.  As that is
the standard name, it should be unconditionally safe, so the change is
not restricted to architectures supported by glibc, or conditioned on
the glibc version.

Testing compilation together with current glibc with glibc's
build-many-glibcs.py.  OK to commit (mainline and active release
branches) if that passes?

2017-06-27  Joseph Myers  

* config/aarch64/linux-unwind.h (aarch64_fallback_frame_state),
config/alpha/linux-unwind.h (alpha_fallback_frame_state),
config/bfin/linux-unwind.h (bfin_fallback_frame_state),
config/i386/linux-unwind.h (x86_64_fallback_frame_state,
x86_fallback_frame_state), config/m68k/linux-unwind.h (struct
uw_ucontext), config/nios2/linux-unwind.h (struct nios2_ucontext),
config/pa/linux-unwind.h (pa32_fallback_frame_state),
config/riscv/linux-unwind.h (riscv_fallback_frame_state),
config/sh/linux-unwind.h (sh_fallback_frame_state),
config/tilepro/linux-unwind.h (tile_fallback_frame_state),
config/xtensa/linux-unwind.h (xtensa_fallback_frame_state): Use
ucontext_t instead of struct ucontext.

Index: libgcc/config/aarch64/linux-unwind.h
===
--- libgcc/config/aarch64/linux-unwind.h(revision 249686)
+++ libgcc/config/aarch64/linux-unwind.h(working copy)
@@ -55,7 +55,7 @@ aarch64_fallback_frame_state (struct _Unwind_Conte
   struct rt_sigframe
   {
 siginfo_t info;
-struct ucontext uc;
+ucontext_t uc;
   };
 
   struct rt_sigframe *rt_;
Index: libgcc/config/alpha/linux-unwind.h
===
--- libgcc/config/alpha/linux-unwind.h  (revision 249686)
+++ libgcc/config/alpha/linux-unwind.h  (working copy)
@@ -51,7 +51,7 @@ alpha_fallback_frame_state (struct _Unwind_Context
 {
   struct rt_sigframe {
siginfo_t info;
-   struct ucontext uc;
+   ucontext_t uc;
   } *rt_ = context->cfa;
   sc = &rt_->uc.uc_mcontext;
 }
Index: libgcc/config/bfin/linux-unwind.h
===
--- libgcc/config/bfin/linux-unwind.h   (revision 249686)
+++ libgcc/config/bfin/linux-unwind.h   (working copy)
@@ -52,7 +52,7 @@ bfin_fallback_frame_state (struct _Unwind_Context
void *puc;
char retcode[8];
siginfo_t info;
-   struct ucontext uc;
+   ucontext_t uc;
   } *rt_ = context->cfa;
 
   /* The void * cast is necessary to avoid an aliasing warning.
Index: libgcc/config/i386/linux-unwind.h
===
--- libgcc/config/i386/linux-unwind.h   (revision 249686)
+++ libgcc/config/i386/linux-unwind.h   (working copy)
@@ -58,7 +58,7 @@ x86_64_fallback_frame_state (struct _Unwind_Contex
   if (*(unsigned char *)(pc+0) == 0x48
   && *(unsigned long long *)(pc+1) == RT_SIGRETURN_SYSCALL)
 {
-  struct ucontext *uc_ = context->cfa;
+  ucontext_t *uc_ = context->cfa;
   /* The void * cast is necessary to avoid an aliasing warning.
  The aliasing warning is correct, but should not be a problem
  because it does not alias anything.  */
@@ -138,7 +138,7 @@ x86_fallback_frame_state (struct _Unwind_Context *
siginfo_t *pinfo;
void *puc;
siginfo_t info;
-   struct ucontext uc;
+   ucontext_t uc;
   } *rt_ = context->cfa;
   /* The void * cast is necessary to avoid an aliasing warning.
  The aliasing warning is correct, but should not be a problem
Index: libgcc/config/m68k/linux-unwind.h
===
--- libgcc/config/m68k/linux-unwind.h   (revision 249686)
+++ libgcc/config/m68k/linux-unwind.h   (working copy)
@@ -33,7 +33,7 @@ see the files COPYING3 and COPYING.RUNTIME respect
 /* <sys/ucontext.h> is unfortunately broken right now.  */
 struct uw_ucontext {
unsigned long uc_flags;
-   struct ucontext  *uc_link;
+   ucontext_t   *uc_link;
stack_t   uc_stack;
mcontext_t uc_mcontext;
unsigned long uc_filler[80];
Index: libgcc/config/nios2/linux-unwind.h
===
--- libgcc/config/nios2/linux-unwind.h  (revision 249686)
+++ libgcc/config/nios2/linux-unwind.h  (working copy)
@@ -38,7 +38,7 @@ struct nios2_mcontext {
 
 struct nios2_ucontext {
   unsigned long uc_flags;
-  struct ucontext *uc_link;
+  ucontext_t *uc_link;
   stack_t uc_stack;
   struct 

Re: [PATCH][AArch64] Fix ldp/stp patterns for ILP32

2017-06-27 Thread Wilco Dijkstra
Hi,

This patch has been superseded by: 
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg02027.html

Wilco




minor cleanups in x86*-vxworks support file

2017-06-27 Thread Olivier Hainque
Hello,

Minor cleanups, tested together with the previous patches
introducing the vx7 support.

Committing to mainline.

With Kind Regards,

Olivier

2017-06-27  Jerome Lambourg  

* config/i386/vxworks.h (ASM_SPEC): Remove definition. No target
specific need, just fallback on defaults.
(ASM_OUTPUT_ALIGNED_BSS): Add #undef before #define.



0008-minor-cleanups-in-config-i386-vxworks.h.patch
Description: Binary data


Re: [Neon intrinsics] Literal vector construction through vcombine is poor

2017-06-27 Thread Richard Earnshaw (lists)
On 27/06/17 07:13, Michael Collison wrote:
> Richard,
> 
> I reworked the patch using an assert as you suggested. Bootstrapped and 
> retested. Okay for trunk?
> 

Yes, fine thanks.

R.

> 
> -Original Message-
> From: Richard Earnshaw (lists) [mailto:richard.earns...@arm.com] 
> Sent: Friday, June 23, 2017 2:09 AM
> To: Michael Collison ; GCC Patches 
> 
> Cc: nd 
> Subject: Re: [Neon intrinsics] Literal vector construction through vcombine 
> is poor
> 
> On 23/06/17 00:10, Michael Collison wrote:
>> Richard,
>>
>> I reworked the patch and retested on big endian as well as little. The 
>> original code was performing two swaps in the big endian case which works 
>> out to no swaps at all.
>>
>> I also updated the ChangeLog per your comments. Okay for trunk?
>>
>> 2017-06-19  Michael Collison  
>>
>>  * config/aarch64/aarch64-simd.md (aarch64_combine): Directly
>>  call aarch64_split_simd_combine.
>>  * (aarch64_combine_internal): Delete pattern.
>>  * config/aarch64/aarch64.c (aarch64_split_simd_combine):
>>  Allow register and subreg operands.
>>
>> -Original Message-
>> From: Richard Earnshaw (lists) [mailto:richard.earns...@arm.com]
>> Sent: Monday, June 19, 2017 6:37 AM
>> To: Michael Collison ; GCC Patches 
>> 
>> Cc: nd 
>> Subject: Re: [Neon intrinsics] Literal vector construction through 
>> vcombine is poor
>>
>> On 16/06/17 22:08, Michael Collison wrote:
>>> This patch improves code generation for literal vector construction by 
>>> expanding and exposing the pattern to rtl optimization earlier. The current 
>>> implementation delays splitting the pattern until after reload which 
>>> results in poor code generation for the following code:
>>>
>>>
>>> #include "arm_neon.h"
>>>
>>> int16x8_t
>>> foo ()
>>> {
>>>   return vcombine_s16 (vdup_n_s16 (0), vdup_n_s16 (8));
>>> }
>>>
>>> Trunk generates:
>>>
>>> foo:
>>> movi v1.2s, 0
>>> movi v0.4h, 0x8
>>> dup d2, v1.d[0]
>>> ins v2.d[1], v0.d[0]
>>> orr v0.16b, v2.16b, v2.16b
>>> ret
>>>
>>> With the patch we now generate:
>>>
>>> foo:
>>> movi v1.4h, 0x8
>>> movi v0.4s, 0
>>> ins v0.d[1], v1.d[0]
>>> ret
>>>
>>> Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk.
>>>
>>> 2017-06-15  Michael Collison  
>>>
>>> * config/aarch64/aarch64-simd.md(aarch64_combine_internal):
>>> Convert from define_insn_and_split into define_expand
>>> * config/aarch64/aarch64.c(aarch64_split_simd_combine):
>>> Allow register and subreg operands.
>>>
>>
>> Your changelog entry is confusing.  You've deleted the 
>> aarch64_combine_internal pattern entirely, having merged some of its 
>> functionality directly into its caller (aarch64_combine).
>>
>> So I think it should read:
>>
>> * config/aarch64/aarch64-simd.md (aarch64_combine): Directly call 
>> aarch64_split_simd_combine.
>> (aarch64_combine_internal): Delete pattern.
>> * ...
>>
>> Note also there should be a space between the file name and the open bracket 
>> for the first function name.
>>
>> Why don't you need the big-endian code path any more?
>>
>> R.
>>
>>>
>>> pr7057.patch
>>>
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-simd.md
>>> b/gcc/config/aarch64/aarch64-simd.md
>>> index c462164..4a253a9 100644
>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>> @@ -2807,27 +2807,11 @@
>>>op1 = operands[1];
>>>op2 = operands[2];
>>>  }
>>> -  emit_insn (gen_aarch64_combine_internal (operands[0], op1, 
>>> op2));
>>> -  DONE;
>>> -}
>>> -)
>>>  
>>> -(define_insn_and_split "aarch64_combine_internal"
>>> -  [(set (match_operand: 0 "register_operand" "=")
>>> -(vec_concat: (match_operand:VDC 1 "register_operand" "w")
>>> -  (match_operand:VDC 2 "register_operand" "w")))]
>>> -  "TARGET_SIMD"
>>> -  "#"
>>> -  "&& reload_completed"
>>> -  [(const_int 0)]
>>> -{
>>> -  if (BYTES_BIG_ENDIAN)
>>> -aarch64_split_simd_combine (operands[0], operands[2], operands[1]);
>>> -  else
>>> -aarch64_split_simd_combine (operands[0], operands[1], operands[2]);
>>> +  aarch64_split_simd_combine (operands[0], op1, op2);
>>> +
>>>DONE;
>>>  }
>>> -[(set_attr "type" "multiple")]
>>>  )
>>>  
>>>  (define_expand "aarch64_simd_combine"
>>> diff --git a/gcc/config/aarch64/aarch64.c 
>>> b/gcc/config/aarch64/aarch64.c index 2e385c4..46bd78b 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -1650,7 +1650,8 @@ aarch64_split_simd_combine (rtx dst, rtx src1, 
>>> rtx src2)
>>>  
>>>gcc_assert (VECTOR_MODE_P (dst_mode));
>>>  
>>> -  if (REG_P (dst) && REG_P (src1) && REG_P (src2))
>>> +  if (register_operand (dst, dst_mode) && register_operand (src1, src_mode)
>>> +  && register_operand (src2, 

x86_64-vxworks* support, vxworks target files updates

2017-06-27 Thread Olivier Hainque
Hello,

Minor updates to the x86 specific config files
for VxWorks, to account for 64bit ABIs.

Tested together with the previous patches in the
vx7 series, for x86_64-vxworks7 in particular.

Committing to mainline.

With Kind Regards,

Olivier


2017-06-27  Jerome Lambourg  
Olivier Hainque  

* config/i386/vxworks.h (DBX_REGISTER_NUMBER): Pick distinct
map for 64bits.
(TARGET_OS_CPP_BUILTINS): builtin_define CPU to X86_64 for 64bit
targets. Pick a default if no particular attempt applied.
(STACK_CHECK_PROTECT): Double for 64bit targets, which have
larger contexts.



0007-support-for-x86-and-x86_64-vxworks7-vxworks-config-f.patch
Description: Binary data


[PATCH][AArch64] Fix ILP32 memory access

2017-06-27 Thread Wilco Dijkstra
This patch fixes a failure in gcc.target/aarch64/reload-valid-spoff.c 
triggered by https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01367.html -
it supersedes https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01907.html
as this fixes the root cause of the failure.

In ILP32 all memory accesses must use a Pmode base address, but
aarch64_expand_mov_immediate wasn't emitting a conversion in one case.
Besides fixing this, add an assert that flags any MEM operands whose
address is not Pmode.
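
As a rough illustration of the ptr_mode/Pmode distinction (plain C, not
GCC code): under ILP32, pointers are 32 bits wide (ptr_mode) while
addresses are 64 bits wide (Pmode), so using a pointer as a base address
means widening it first.  The helper name below is hypothetical, and
zero-extension is an assumption about how the widening is done.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen a 32-bit ILP32 pointer value (ptr_mode) to
   a 64-bit address (Pmode).  Zero-extension is assumed here.  */
uint64_t
widen_ilp32_address (uint32_t ptr_value)
{
  uint64_t address = (uint64_t) ptr_value;

  /* The upper 32 bits of a zero-extended address are always clear.  */
  assert ((address >> 32) == 0);
  return address;
}

/* Small usage check.  */
void
selftest_widen (void)
{
  assert (widen_ilp32_address (0xfffffff0u) == 0xfffffff0ull);
}
```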

Passes regress (with/without ilp32). OK for commit?

ChangeLog:
2017-06-27  Wilco Dijkstra  

* config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
Convert memory address to Pmode.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
329d244e9cf16dbdf849e5dd02b3999caf0cd5a7..9038748ba049ba589f067f3f04c31704fe673d2c
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1958,6 +1958,8 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
  gcc_assert (can_create_pseudo_p ());
  base = gen_reg_rtx (ptr_mode);
  aarch64_expand_mov_immediate (base, XEXP (mem, 0));
+ if (ptr_mode != Pmode)
+   base = convert_memory_address (Pmode, base);
  mem = gen_rtx_MEM (ptr_mode, base);
}
 
@@ -5207,6 +5209,7 @@ aarch64_print_operand (FILE *f, rtx x, int code)
 
case MEM:
  output_address (GET_MODE (x), XEXP (x, 0));
+ gcc_assert (GET_MODE (XEXP (x, 0)) == Pmode);
  break;
 
case CONST:


Re: adjust libgcc build support for VxWorks to VxWorks 7

2017-06-27 Thread Olivier Hainque

> On Jun 27, 2017, at 15:25 , Ian Lance Taylor  wrote:
> 
> On Tue, Jun 27, 2017 at 5:17 AM, Olivier Hainque  wrote:
>> 
>> 2017-06-27  Olivier Hainque  
>> 
>>libgcc/
>>* config/t-vxworks7: New file.
>>* config.host (*-*-vxworks7): Use it.
> 
> This is OK.

Great, thanks for your super prompt feedback Ian!




Re: adjust libgcc build support for VxWorks to VxWorks 7

2017-06-27 Thread Ian Lance Taylor via gcc-patches
On Tue, Jun 27, 2017 at 5:17 AM, Olivier Hainque  wrote:
>
> 2017-06-27  Olivier Hainque  
>
> libgcc/
> * config/t-vxworks7: New file.
> * config.host (*-*-vxworks7): Use it.

This is OK.

Thanks.

Ian


[PATCH] Fix PR bootstrap/81217

2017-06-27 Thread Martin Liška
Hello.

The following patch fixes the PR by removing the superfluous bootstrap_target.

Ready to be installed?

ChangeLog:

2017-06-27  Martin Liska  

PR bootstrap/81217
* Makefile.def: Remove superfluous bootstrap_target from
bootstrap_stage.
* Makefile.in: Re-generate the file.
---
 Makefile.def |  3 +--
 Makefile.in  | 23 ---
 2 files changed, 1 insertion(+), 25 deletions(-)


diff --git a/Makefile.def b/Makefile.def
index 08d0dc08a46..bd7b080d905 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -623,8 +623,7 @@ bootstrap_stage = {
 bootstrap_stage = {
 	id=profile ; prev=1 ; };
 bootstrap_stage = {
-	id=train; prev=profile ;
-	bootstrap_target=profiledbootstrap ; };
+	id=train; prev=profile ; } ;
 bootstrap_stage = {
 	id=feedback ; prev=train;
 	bootstrap_target=profiledbootstrap ; };
diff --git a/Makefile.in b/Makefile.in
index 2e2e504e106..78db0982ba2 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -56151,29 +56151,6 @@ do-clean: clean-stagetrain
 
 
 
-.PHONY: profiledbootstrap profiledbootstrap-lean
-profiledbootstrap:
-	echo stagetrain > stage_final
-	@r=`${PWD_COMMAND}`; export r; \
-	s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-	$(MAKE) $(RECURSE_FLAGS_TO_PASS) stagetrain-bubble
-	@: $(MAKE); $(unstage)
-	@r=`${PWD_COMMAND}`; export r; \
-	s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-	TFLAGS="$(STAGEtrain_TFLAGS)"; \
-	$(MAKE) $(TARGET_FLAGS_TO_PASS) all-host all-target
-
-profiledbootstrap-lean:
-	echo stagetrain > stage_final
-	@r=`${PWD_COMMAND}`; export r; \
-	s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-	$(MAKE) $(RECURSE_FLAGS_TO_PASS) LEAN=: stagetrain-bubble
-	@: $(MAKE); $(unstage)
-	@r=`${PWD_COMMAND}`; export r; \
-	s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-	TFLAGS="$(STAGEtrain_TFLAGS)"; \
-	$(MAKE) $(TARGET_FLAGS_TO_PASS) all-host all-target
-
 
 # Rules to wipe a stage and all the following ones, also used for cleanstrap
 distclean-stageprofile:: distclean-stagetrain 



Re: C/C++ PATCH to add __typeof_noqual (PR c/65455, c/39985)

2017-06-27 Thread Marek Polacek
On Mon, Jun 26, 2017 at 10:37:03AM -0600, Martin Sebor wrote:
> On 06/23/2017 08:46 AM, Marek Polacek wrote:
> > This patch adds a variant of __typeof, called __typeof_noqual.  As the name
> > suggests, this variant always drops all qualifiers, not just when the type
> > is atomic.  This was discussed several times in the past, see e.g.
> > 
> > or
> > 
> > It's been brought to my attention again here:
> > 
> > 
> > One approach would be to just modify the current __typeof, but that could
> > cause some incompatibilities, I'm afraid.  This is based on rth's earlier
> > patch:  but I
> > didn't do the address space-stripping variant __typeof_noas.  I also added
> > a couple of missing things.
> 
> I haven't reviewed all the discussions super carefully so I wonder
> what alternatives have been considered.  For instance, it seems to
> me that it should be possible to emulate __typeof_noqual__ by relying
> on the atomic built-ins' type-genericity.  E.g., like this:
> 
>   #define __typeof_noqual__(x) \
> __typeof__ (__atomic_load_n ((__typeof__ (x)*)0, 0))

This doesn't seem to work with structs/arrays/VLA, so wouldn't help.
(typeof can't handle bit-fields, so no need to worry about those.)

Another thing, with the current patch, __typeof_noqual__(const int)
would still produce "const int".  With the __atomic_load_n proposal
it'd return "int".  I don't know what we want to do for typenames,
but __typeof__(_Atomic int) produces "atomic int".

> Alternatively, adding support for lower-level C-only primitives like
> __remove_const and __remove_volatile, to parallel the C++ library
> traits, might provide a more general solution and avoid introducing
> yet another mechanism for determining the type of an expression to
> the languages (C++ already has a few).

I don't know if that wouldn't be overkill.  Qualifiers on rvalues are
meaningless in C and that's why my __typeof_noqual strips them all.
We'd then even need e.g. __remove_restrict; I'm not sure if there's a
need for these.  Maybe there is.
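
For scalar operands, the point about rvalues can be seen with plain
__typeof__ already.  The following is only a sketch of a known workaround,
not what the patch implements: the type of the rvalue (x) + 0 carries no
qualifiers, so a temporary declared from it is assignable even when x is
const.  The macro and function names are hypothetical; note that + 0 also
applies the integer promotions and does not work for structs or arrays.

```c
#include <assert.h>

/* Sketch: MAX over scalars using a GNU statement expression.  The
   temporary takes the (unqualified) type of the rvalue (x) + 0.  */
#define MAX_NOQUAL(x, y)                \
  ({                                    \
    __typeof__ ((x) + 0) ret_ = (x);    \
    if ((y) > ret_)                     \
      ret_ = (y);                       \
    ret_;                               \
  })

/* With __typeof__ (x) instead, ret_ would be const int here and the
   assignment inside the macro would be rejected.  */
int
max_of_const_ints (void)
{
  const int a = 3;
  const int b = 7;
  return MAX_NOQUAL (a, b);
}

/* Small usage check.  */
void
selftest_max_noqual (void)
{
  assert (max_of_const_ints () == 7);
}
```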

> > +@code{typeof_noqual} behaves the same except that it strips type qualifiers
> > +such as @code{const} and @code{volatile}, if given an expression.  This can
> > +be useful for certain macros when passed const arguments:
> > +
> > +@smallexample
> > +#define MAX(__x, __y)  \
> > +  (@{  \
> > +  __typeof_noqual(__x) __ret = __x;\
> > +  if (__y > __ret) __ret = __y;\
> > +__ret; \
> > +  @})
> 
> The example should probably avoid using reserved names (with
> leading/double underscores).

No, because "typeof_noqual" isn't supported (but was in the first version
of the patch).

Marek


x86-vxworks7 and x86_64-vxworks7 support, "config" files

2017-06-27 Thread Olivier Hainque
Hello,

The "config" files bits for the x86 families of targets
for which VxWorks 7 support is being introduced.

Just accept the triplets on the libgcc end and handle the
common need of x86-64.h for x86_64 on the gcc front.

Tested together with the previous patches of the vxworks7
series. Committing to mainline.

With Kind Regards,

Olivier

2017-06-27  Jerome Lambourg  

gcc/
* config.gcc: Handle i*86-vxworks7 and x86_64-vxworks7.

libgcc/
* config.host: Likewise.



0006-support-for-x86-and-x86_64-VxWorks-7-config-files.patch
Description: Binary data


[PATCH] Move static chain and non-local goto init after NOTE_INSN_FUNCTION_BEG (PR sanitize/81186).

2017-06-27 Thread Martin Liška
Hi.

The following bug was very educational for me. I learned that we support
non-local gotos, which can be combined with nested functions. Real fun :)
The problem is that the initialization of both cfun->nonlocal_goto_save_area
and cfun->static_chain_decl (emitted in expand_function_start) is placed
before NOTE_INSN_FUNCTION_BEG, so the result of expand_used_vars ends up
after these instructions. That causes problems because they use the stack
before we initialize it (use-after-return checking):

expanded cfun->static_chain_decl:

(note 1 0 5 NOTE_INSN_DELETED)
(note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 5 3 2 (set (reg/f:DI 88 [ CHAIN.1 ])
(reg:DI 39 r10 [ CHAIN.1 ])) "pr81186.c":5 -1
 (nil))
(insn 3 2 4 2 (set (mem/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
(const_int -8 [0xfff8])) [0  S8 A64])
(reg:DI 39 r10 [ CHAIN.1 ])) "pr81186.c":5 -1
 (nil))
(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
(insn 7 4 8 2 (set (reg/f:DI 87 [ _2 ])
(reg/f:DI 88 [ CHAIN.1 ])) "pr81186.c":5 -1
 (nil))
(call_insn 8 7 9 2 (call (mem:QI (symbol_ref:DI ("__asan_handle_no_return") 
[flags 0x41]  ) 
[0 __builtin___asan_handle_no_return S1 A8])
(const_int 0 [0])) "pr81186.c":5 -1
 (expr_list:REG_EH_REGION (const_int 0 [0])
(nil))
(nil))

expanded cfun->nonlocal_goto_save_area:

(note 1 0 34 NOTE_INSN_DELETED)
(note 34 1 31 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 31 34 32 2 (set (mem/f/c:DI (plus:DI (reg:DI 95)
(const_int -64 [0xffc0])) [4 
FRAME.0.__nl_goto_buf+0 S8 A64])
(reg/f:DI 82 virtual-stack-vars)) "pr81186.c":3 -1
 (nil))
(insn 32 31 2 2 (set (mem/f/c:DI (plus:DI (reg:DI 95)
(const_int -56 [0xffc8])) [4 
FRAME.0.__nl_goto_buf+8 S8 A64])
(reg/f:DI 7 sp)) "pr81186.c":3 -1
 (nil))
(insn 2 32 3 2 (parallel [
(set (reg:DI 96)
(plus:DI (reg/f:DI 82 virtual-stack-vars)
(const_int -96 [0xffa0])))
(clobber (reg:CC 17 flags))
]) "pr81186.c":3 -1
 (nil))
(insn 3 2 4 2 (set (reg:DI 97)
(reg:DI 96)) "pr81186.c":3 -1
 (nil))
(insn 4 3 5 2 (set (reg:CCZ 17 flags)
(compare:CCZ (mem/c:SI (symbol_ref:DI 
("__asan_option_detect_stack_use_after_return") [flags 0x40]  ) [5 
__asan_option_detect_stack_use_after_return+0 S4 A32])
(const_int 0 [0]))) "pr81186.c":3 -1
 (nil))

I therefore suggest moving both of these initializations after
NOTE_INSN_FUNCTION_BEG.
The patch bootstraps on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2017-06-27  Martin Liska  

PR sanitize/81186
* function.c (expand_function_start): Move static chain and non-local
goto init after NOTE_INSN_FUNCTION_BEG.

gcc/testsuite/ChangeLog:

2017-06-27  Martin Liska  

PR sanitize/81186
* gcc.dg/asan/pr81186.c: New test.
---
 gcc/function.c  | 18 +-
 gcc/testsuite/gcc.dg/asan/pr81186.c | 13 +
 2 files changed, 22 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/asan/pr81186.c


diff --git a/gcc/function.c b/gcc/function.c
index f625489205b..5e8a56099a5 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5220,6 +5220,14 @@ expand_function_start (tree subr)
  In some cases this requires emitting insns.  */
   assign_parms (subr);
 
+  /* The following was moved from init_function_start.
+ The move is supposed to make sdb output more accurate.  */
+  /* Indicate the beginning of the function body,
+ as opposed to parm setup.  */
+  rtx_note *b = emit_note (NOTE_INSN_FUNCTION_BEG);
+
+  gcc_assert (NOTE_P (get_last_insn ()));
+
   /* If function gets a static chain arg, store it.  */
   if (cfun->static_chain_decl)
 {
@@ -5284,15 +5292,7 @@ expand_function_start (tree subr)
   update_nonlocal_goto_save_area ();
 }
 
-  /* The following was moved from init_function_start.
- The move is supposed to make sdb output more accurate.  */
-  /* Indicate the beginning of the function body,
- as opposed to parm setup.  */
-  emit_note (NOTE_INSN_FUNCTION_BEG);
-
-  gcc_assert (NOTE_P (get_last_insn ()));
-
-  parm_birth_insn = get_last_insn ();
+  parm_birth_insn = b;
 
   if (crtl->profile)
 {
diff --git a/gcc/testsuite/gcc.dg/asan/pr81186.c b/gcc/testsuite/gcc.dg/asan/pr81186.c
new file mode 100644
index 000..74d3837a482
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr81186.c
@@ -0,0 +1,13 @@
+/* PR sanitizer/81186 */
+/* { dg-do run } */
+
+int
+main ()
+{
+  __label__ l;
+  void f () { goto l; }
+
+  f ();
+l:
+  return 0;
+}



Re: [PATCH GCC][01/13]Introduce internal function IFN_LOOP_DIST_ALIAS

2017-06-27 Thread Richard Biener
On Fri, Jun 23, 2017 at 12:10 PM, Bin.Cheng  wrote:
> On Mon, Jun 12, 2017 at 6:02 PM, Bin Cheng  wrote:
>> Hi,
>> I was asked by upstream to split the loop distribution patch into small ones.
>> It is hard because data structure and algorithm are closely coupled together.
>> Anyway, this is the patch series with smaller patches.  Basically I tried to
>> separate data structure and bug-fix changes apart with one as the main patch.
>> Note I only made necessary code refactoring in order to separate patch, apart
>> from that, there is no change against the last version.
>>
>> This is the first patch introducing new internal function 
>> IFN_LOOP_DIST_ALIAS.
>> GCC will distribute loops under condition of this function call.
>>
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
> Hi,
> I need to update this patch fixing an issue in
> vect_loop_dist_alias_call.  The previous patch fails to find some
> IFN_LOOP_DIST_ALIAS calls.
>
> Bootstrap and test in series.  Is it OK?

So I wonder if we really need to track ldist_alias_id or if we can do
something more "general", like tracking a copy_of or origin and then
directly going to nearest_common_dominator (loop->header, copy_of->header)
to find the controlling condition?

That said, "ldist_alias_id" is a bit too narrow in purpose to "waste"
an int inside struct loop?  I'd set copy_of/origin in loop_version for example.
'origin' would probably be better given the ldist cases aren't really
full "copies".

fold_loop_dist_alias_call should re-use / rename fold_loop_vectorized_call
by just passing folded_value to it.

Richard.

> Thanks,
> bin
>>
>> Thanks,
>> bin
>> 2017-06-07  Bin Cheng  
>>
>> * cfgloop.h (struct loop): New field ldist_alias_id.
>> * cfgloopmanip.c (lv_adjust_loop_entry_edge): Comment change.
>> * internal-fn.c (expand_LOOP_DIST_ALIAS): New function.
>> * internal-fn.def (LOOP_DIST_ALIAS): New.
>> * tree-vectorizer.c (vect_loop_dist_alias_call): New function.
>> (fold_loop_dist_alias_call): New function.
>> (vectorize_loops): Fold IFN_LOOP_DIST_ALIAS call depending on
>> successful vectorization or not.


[PATCH] Do not allow to inline ifunc resolvers (PR ipa/81128).

2017-06-27 Thread Martin Liška
Hello.

Currently ifunc is interpreted as a normal alias by the IPA optimizations.
That's problematic because an ifunc alias should not be considered a
candidate for inlining or redirection.

The patch bootstraps on ppc64le-redhat-linux and survives regression tests.
It also survives MVC tests on x86_64-linux-gnu.

Ready to be installed?
Martin

gcc/ChangeLog:

2017-06-22  Martin Liska  

* ipa-inline.c (can_inline_edge_p): Return false for ifunc fns.
* ipa-visibility.c (can_replace_by_local_alias): Likewise.

gcc/c-family/ChangeLog:

2017-06-22  Martin Liska  

* c-attribs.c (handle_alias_ifunc_attribute): Append ifunc alias
to a function declaration.

gcc/testsuite/ChangeLog:

2017-06-22  Martin Liska  

* gcc.target/i386/pr81128.c: New test.
---
 gcc/c-family/c-attribs.c| 11 --
 gcc/ipa-inline.c|  2 +
 gcc/ipa-visibility.c|  3 +-
 gcc/testsuite/gcc.target/i386/pr81128.c | 65 +
 4 files changed, 77 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr81128.c


diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 2b6845f2cbd..626ffa1cde7 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -1846,9 +1846,14 @@ handle_alias_ifunc_attribute (bool is_alias, tree *node, tree name, tree args,
 	TREE_STATIC (decl) = 1;
 
   if (!is_alias)
-	/* ifuncs are also aliases, so set that attribute too.  */
-	DECL_ATTRIBUTES (decl)
-	  = tree_cons (get_identifier ("alias"), args, DECL_ATTRIBUTES (decl));
+	{
+	  /* ifuncs are also aliases, so set that attribute too.  */
+	  DECL_ATTRIBUTES (decl)
+	= tree_cons (get_identifier ("alias"), args,
+			 DECL_ATTRIBUTES (decl));
+	  DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("ifunc"),
+	  NULL, DECL_ATTRIBUTES (decl));
+	}
 }
   else
 {
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index fb20d3723cc..588fa9c41e4 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -370,6 +370,8 @@ can_inline_edge_p (struct cgraph_edge *e, bool report,
   e->inline_failed = CIF_ATTRIBUTE_MISMATCH;
   inlinable = false;
 }
+  else if (lookup_attribute ("ifunc", DECL_ATTRIBUTES (e->callee->decl)))
+  inlinable = false;
   /* Check if caller growth allows the inlining.  */
   else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
 	   && !disregard_limits
diff --git a/gcc/ipa-visibility.c b/gcc/ipa-visibility.c
index d5a3ae56c46..79d05b41085 100644
--- a/gcc/ipa-visibility.c
+++ b/gcc/ipa-visibility.c
@@ -345,7 +345,8 @@ can_replace_by_local_alias (symtab_node *node)
   
   return (node->get_availability () > AVAIL_INTERPOSABLE
 	  && !decl_binds_to_current_def_p (node->decl)
-	  && !node->can_be_discarded_p ());
+	  && !node->can_be_discarded_p ()
+	  && !lookup_attribute ("ifunc", DECL_ATTRIBUTES (node->decl)));
 }
 
 /* Return true if we can replace reference to NODE by local alias
diff --git a/gcc/testsuite/gcc.target/i386/pr81128.c b/gcc/testsuite/gcc.target/i386/pr81128.c
new file mode 100644
index 000..90a567ad690
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr81128.c
@@ -0,0 +1,65 @@
+/* PR ipa/81128 */
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+/* { dg-require-ifunc "" } */
+
+
+#include 
+#include 
+#include 
+
+int resolver_fn = 0;
+int resolved_fn = 0;
+
+static inline void
+do_it_right_at_runtime_A ()
+{
+  resolved_fn++;
+}
+
+static inline void
+do_it_right_at_runtime_B ()
+{
+  resolved_fn++;
+}
+
+static inline void do_it_right_at_runtime (void);
+
+void do_it_right_at_runtime (void)
+  __attribute__ ((ifunc ("resolve_do_it_right_at_runtime")));
+
+static void (*resolve_do_it_right_at_runtime (void)) (void)
+{
+  srand (time (NULL));
+  int r = rand ();
+  resolver_fn++;
+
+  /* Use intermediate variable to get a warning for non-matching
+   * prototype. */
+  typeof(do_it_right_at_runtime) *func;
+  if (r & 1)
+func = do_it_right_at_runtime_A;
+  else
+func = do_it_right_at_runtime_B;
+
+  return (void *) func;
+}
+
+int
+main (void)
+{
+  const unsigned int ITERS = 10;
+
+  for (int i = ITERS; i > 0; i--)
+{
+  do_it_right_at_runtime ();
+}
+
+  if (resolver_fn != 1)
+__builtin_abort ();
+
+  if (resolved_fn != 10)
+__builtin_abort ();
+
+  return 0;
+}



Re: [PATCH GCC][13/13]Distribute loop with loop versioning under runtime alias check

2017-06-27 Thread Richard Biener
On Fri, Jun 23, 2017 at 12:30 PM, Bin.Cheng  wrote:
> On Tue, Jun 20, 2017 at 10:22 AM, Bin.Cheng  wrote:
>> On Mon, Jun 12, 2017 at 6:03 PM, Bin Cheng  wrote:
>>> Hi,
>>> This is the main patch rewriting loop distribution in order to handle hmmer.
>>> It improves loop distribution by versioning loop under runtime alias check 
>>> conditions.
>>> As described in comments, the patch basically implements distribution in 
>>> the following
>>> steps:
>>>
>>>  1) Seed partitions with specific type statements.  For now we support
>>> two types seed statements: statement defining variable used outside
>>> of loop; statement storing to memory.
>>>  2) Build reduced dependence graph (RDG) for loop to be distributed.
>>> The vertices (RDG:V) model all statements in the loop and the edges
>>> (RDG:E) model flow and control dependencies between statements.
>>>  3) Apart from RDG, compute data dependencies between memory references.
>>>  4) Starting from seed statement, build up partition by adding the
>>> statements it depends on according to RDG's dependence information.
>>> Partition is classified as parallel type if it can be executed in
>>> parallel; or as sequential type if it can't.  Parallel type partition is
>>> further classified as different builtin kinds if it can be implemented as
>>> builtin function calls.
>>>  5) Build partition dependence graph (PG) based on data dependencies.
>>> The vertices (PG:V) model all partitions and the edges (PG:E) model
>>> all data dependencies between every pair of partitions.  In general,
>>> data dependence is either known or unknown at compilation time.  In C
>>> family languages, there are quite a lot of dependencies unknown at
>>> compilation time because of possible alias relations between data
>>> references.  We categorize PG's edges into two types: "true" edges that
>>> represent data dependencies known at compilation time; "alias" edges
>>> for all other data dependencies.
>>>  6) Traverse subgraph of PG as if all "alias" edges don't exist.  Merge
>>> partitions in each strongly connected component (SCC) correspondingly.
>>> Build new PG for merged partitions.
>>>  7) Traverse PG again and this time with both "true" and "alias" edges
>>> included.  We try to break SCCs by removing some edges.  Because
>>> SCCs by "true" edges are all fused in step 6), we can break SCCs
>>> by removing some "alias" edges.  It's NP-hard to choose optimal
>>> edge set, fortunately simple approximation is good enough for us
>>> given the small problem scale.
>>>  8) Collect all data dependencies of the removed "alias" edges.  Create
>>> runtime alias checks for collected data dependencies.
>>>  9) Version loop under the condition of runtime alias checks.  Given
>>> loop distribution generally introduces additional overhead, it is
>>> only useful if vectorization is achieved in distributed loop.  We
>>> version loop with internal function call IFN_LOOP_DIST_ALIAS.  If
>>> no distributed loop can be vectorized, we simply remove distributed
>>> loops and recover to the original one.
>>>
>>> Also, there is some more to improve in the future (which isn't difficult, I
>>> think):
>>>TODO:
>>>  1) We only distribute innermost loops now.  This pass should handle loop
>>> nests in the future.
>>>  2) We only fuse partitions in SCC now.  A better fusion algorithm is
>>> desired to minimize loop overhead, maximize parallelism and maximize
>>>
>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>> Trivial updated due to changes in previous patches.  Also fixed issues
>> mentioned by Kugan.
> Rebased V3 for changes in previous patches.  Bootstrap and test on
> x86_64 and aarch64.

Why is ldist-12.c no longer distributed?  Your comment says it doesn't expose
more "parallelism", but the point is to reduce memory bandwidth requirements,
which it clearly does.

Likewise for -13.c, -14.c.  -4.c may be a questionable case, but the wording
of "parallelism" still confuses me.

Can you elaborate on that?  Now onto the patch:

+   Loop distribution is the dual of loop fusion.  It separates statements
+   of a loop (or loop nest) into multiple loops (or loop nests) with the
+   same loop header.  The major goal is to separate statements which may
+   be vectorized from those that can't.  This pass implements distribution
+   in the following steps:

This misses the goal of being a memory stream optimization, not only a
vectorization enabler.  Distributing a loop can also reduce register pressure.
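
To make the versioning idea concrete, here is a minimal source-level sketch.
It is not what the pass emits (that happens on GIMPLE, guarded by
IFN_LOOP_DIST_ALIAS); the names loop_original, loop_versioned and
ranges_overlap are made up for illustration, and the overlap test is a
deliberately conservative stand-in for the real runtime alias checks.  Only
the cross-partition pairs (a/c, a/d, c/b) are checked, since references inside
one partition keep their original order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if [p, p+bytes) and [q, q+bytes) overlap.  */
static int
ranges_overlap (const void *p, const void *q, size_t bytes)
{
  uintptr_t x = (uintptr_t) p, y = (uintptr_t) q;
  return x < y + bytes && y < x + bytes;
}

/* Original loop: two statements per iteration, four memory streams.  */
void
loop_original (int *a, const int *b, int *c, const int *d, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      a[i] = b[i] + 1;
      c[i] = d[i] * 2;
    }
}

/* Versioned loop: run the distributed form only when the runtime check
   proves the two partitions cannot interfere; otherwise fall back to the
   original loop.  */
void
loop_versioned (int *a, const int *b, int *c, const int *d, size_t n)
{
  size_t bytes = n * sizeof (int);

  if (!ranges_overlap (a, c, bytes)
      && !ranges_overlap (a, d, bytes)
      && !ranges_overlap (c, b, bytes))
    {
      /* Partition 1: touches only a and b.  */
      for (size_t i = 0; i < n; i++)
        a[i] = b[i] + 1;
      /* Partition 2: touches only c and d.  */
      for (size_t i = 0; i < n; i++)
        c[i] = d[i] * 2;
    }
  else
    loop_original (a, b, c, d, n);
}
```

Each distributed loop walks two streams instead of four, which is the memory
stream effect mentioned above, independent of whether either loop is later
vectorized.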

You introduce ldist_alias_id in struct loop (probably in 01/n, which I
didn't look into yet).  If you don't use that, please introduce it separately.

+ /* Be 

Re: [PATCH 2/2] C++: bulletproof the %H and %I format codes (PR c++/81167)

2017-06-27 Thread Nathan Sidwell

On 06/22/2017 08:20 PM, David Malcolm wrote:


PR c++/81167 reports a case where a NULL is passed to one of these %qH,
and it turns out that we now ICE for this case (with a gcc_assert)
whereas previously we printed a '' for the type.

This patch slightly reworks the %H and %I-handling code so that it
gracefully handles NULL, fixing the ICE in that PR.
Whilst I was at it, I also fixed things so that if only one of the
%H/%I codes is present, we do the right thing (i.e. fall back to
the %T behavior).


I dislike this kind of silently accepting bogus use of an interface.  I 
much prefer the compiler to explode.


At least make it barf with a checking build, even if a production build 
permits the abuse.


nathan

--
Nathan Sidwell
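
The compromise suggested above (fail loudly in a checking build, degrade
gracefully in a production build) can be sketched roughly as follows.  This is
a standalone illustration, not GCC code: SKETCH_CHECKING and
printable_type_name are hypothetical names, and the fallback string is an
arbitrary placeholder.  Inside GCC the same effect would more likely come from
gcc_checking_assert.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch standing in for a checking-enabled build.  */
#ifndef SKETCH_CHECKING
#define SKETCH_CHECKING 0
#endif

/* Return a printable name for TYPE_NAME.  A NULL argument is a caller bug:
   abort in a checking build, fall back to a placeholder otherwise.  */
const char *
printable_type_name (const char *type_name)
{
  if (type_name == NULL)
    {
#if SKETCH_CHECKING
      fprintf (stderr, "internal error: NULL type passed to formatter\n");
      abort ();
#endif
      return "<null type>";
    }
  return type_name;
}
```

Building with -DSKETCH_CHECKING=1 turns the silent fallback into an abort, so
the bogus caller is found during development rather than papered over.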


Re: [PATCH, 2/4] Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin

2017-06-27 Thread Tom de Vries

On 06/26/2017 05:29 PM, Jakub Jelinek wrote:

On Mon, Jun 26, 2017 at 03:26:57PM +, Joseph Myers wrote:

On Mon, 26 Jun 2017, Tom de Vries wrote:


2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin


This patch adds handling of:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS=[01], and
- GOMP_OPENACC_NVPTX_DISASM=[01]

The filename used for dumping the module is plugin-nvptx..cubin.


Are you sure this use of getenv and writing to that file is safe for
setuid/setgid programs?  I'd expect you to need to use secure_getenv as in
plugin-hsa.c; certainly for anything that could result in writes to a
file like that.


Yeah, definitely it should be using secure_getenv/__secure_getenv.
And IMNSHO GOMP_DEBUG too.
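
For readers unfamiliar with the concern: a setuid/setgid program must not let
the invoking user's environment steer it into writing files.  A rough portable
approximation of that behaviour is sketched below.  It is an assumption-laden
illustration, not glibc's secure_getenv and not the contents of libgomp's
secure_getenv.h; careful_getenv and env_flag_enabled are hypothetical names.
The flag parsing mirrors the "exactly 1" test used in the patch.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for secure_getenv: ignore the environment when the
   real and effective IDs differ (i.e. setuid/setgid execution).  */
const char *
careful_getenv (const char *name)
{
  if (getuid () != geteuid () || getgid () != getegid ())
    return NULL;
  return getenv (name);
}

/* Return 1 only if the variable is set to exactly "1", otherwise 0.  */
int
env_flag_enabled (const char *name)
{
  const char *value = careful_getenv (name);
  return value != NULL && value[0] == '1' && value[1] == '\0';
}
```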



Updated patch using secure_getenv.h.

Thanks,
- Tom
Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin

2017-06-26  Tom de Vries  

	* plugin/plugin-nvptx.c (do_prog, debug_linkout): New functions.
	(link_ptx): Use debug_linkout.

---
 libgomp/plugin/plugin-nvptx.c | 105 ++
 1 file changed, 105 insertions(+)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 71630b5..7aa2b3b 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -47,6 +47,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #if PLUGIN_NVPTX_DYNAMIC
 # include 
@@ -138,6 +141,8 @@ init_cuda_lib (void)
 # define init_cuda_lib() true
 #endif
 
+#include "secure_getenv.h"
+
 /* Convenience macros for the frequently used CUDA library call and
error handling sequence as well as CUDA library calls that
do the error checking themselves or don't do it at all.  */
@@ -876,6 +881,104 @@ notify_var (const char *var_name, const char *env_var)
 GOMP_PLUGIN_debug (0, "%s: '%s'\n", var_name, env_var);
 }
 
+static void
+do_prog (const char *prog, const char *arg)
+{
+  pid_t pid = fork ();
+
+  if (pid == -1)
+    {
+      GOMP_PLUGIN_error ("Fork failed");
+      return;
+    }
+  else if (pid > 0)
+    {
+      int status;
+      waitpid (pid, &status, 0);
+      if (!WIFEXITED (status))
+	GOMP_PLUGIN_error ("Running %s %s failed", prog, arg);
+    }
+  else
+    {
+      execlp (prog, prog /* argv[0] */, arg, NULL);
+      abort ();
+    }
+}
+
+static void
+debug_linkout (void *linkout, size_t linkoutsize)
+{
+  static int gomp_openacc_nvptx_disasm = -1;
+  if (gomp_openacc_nvptx_disasm == -1)
+    {
+      const char *var_name = "GOMP_OPENACC_NVPTX_DISASM";
+      const char *env_var = secure_getenv (var_name);
+      notify_var (var_name, env_var);
+      gomp_openacc_nvptx_disasm
+	= ((env_var != NULL && env_var[0] == '1' && env_var[1] == '\0')
+	   ? 1 : 0);
+    }
+
+  static int gomp_openacc_nvptx_save_temps = -1;
+  if (gomp_openacc_nvptx_save_temps == -1)
+    {
+      const char *var_name = "GOMP_OPENACC_NVPTX_SAVE_TEMPS";
+      const char *env_var = secure_getenv (var_name);
+      notify_var (var_name, env_var);
+      gomp_openacc_nvptx_save_temps
+	= ((env_var != NULL && env_var[0] == '1' && env_var[1] == '\0')
+	   ? 1 : 0);
+    }
+
+  if (gomp_openacc_nvptx_disasm == 0
+      && gomp_openacc_nvptx_save_temps == 0)
+    return;
+
+  const char *prefix = "plugin-nvptx.";
+  const char *postfix = ".cubin";
+  const int len =	(strlen (prefix)
+			 + 20 /* %lld.  */
+			 + strlen (postfix)
+			 + 1  /* '\0'.  */);
+  char file_name[len];
+  int res = snprintf (file_name, len, "%s%lld%s", prefix,
+		  (long long)getpid (), postfix);
+  assert (res < len); /* Assert there's no truncation.  */
+
+  GOMP_PLUGIN_debug (0, "Generating %s with size %zu\n",
+		 file_name, linkoutsize);
+  FILE *cubin_file = fopen (file_name, "wb");
+  if (cubin_file == NULL)
+    {
+      GOMP_PLUGIN_debug (0, "Opening %s failed\n", file_name);
+      return;
+    }
+
+  fwrite (linkout, linkoutsize, 1, cubin_file);
+  unsigned int write_succeeded = ferror (cubin_file) == 0;
+  if (!write_succeeded)
+    GOMP_PLUGIN_debug (0, "Writing %s failed\n", file_name);
+
+  res = fclose (cubin_file);
+  if (res != 0)
+    GOMP_PLUGIN_debug (0, "Closing %s failed\n", file_name);
+
+  if (!write_succeeded)
+    return;
+
+  if (gomp_openacc_nvptx_disasm == 1)
+    {
+      GOMP_PLUGIN_debug (0, "Disassembling %s\n", file_name);
+      do_prog ("nvdisasm", file_name);
+    }
+
+  if (gomp_openacc_nvptx_save_temps == 0)
+    {
+      GOMP_PLUGIN_debug (0, "Removing %s\n", file_name);
+      remove (file_name);
+    }
+}
+
 static bool
 link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
 	  unsigned num_objs)
@@ -939,6 +1042,8 @@ link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
   return false;
 }
 
+  debug_linkout (linkout, linkoutsize);
+
   CUDA_CALL (cuModuleLoadData, module, linkout);
   CUDA_CALL (cuLinkDestroy, linkstate);
   return true;
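
One observation on do_prog in the patch above: it reports a failure only when
the child did not exit normally, so a nonzero exit status from the helper
program goes unnoticed.  A minimal standalone sketch of a variant that also
returns the exit status follows.  run_prog is a hypothetical name, the 127
convention for a failed exec is borrowed from the shell, and none of this is
part of the submitted patch.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run PROG with the single argument ARG and wait for it.  Returns the
   child's exit status (0..255), 127 if PROG could not be executed, or -1
   if fork/waitpid failed or the child was killed by a signal.  */
int
run_prog (const char *prog, const char *arg)
{
  pid_t pid = fork ();
  if (pid == -1)
    return -1;

  if (pid == 0)
    {
      execlp (prog, prog /* argv[0] */, arg, (char *) NULL);
      _exit (127);  /* Only reached if exec failed.  */
    }

  int status;
  while (waitpid (pid, &status, 0) == -1)
    if (errno != EINTR)  /* Retry only when interrupted by a signal.  */
      return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

A caller could then distinguish "could not run nvdisasm" (127) from "nvdisasm
ran but reported an error" (any other nonzero value).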


[Doc, AArch64] Fix/Update AArch64 options.

2017-06-27 Thread Yvan Roux
Hi,

I just noticed that some AArch64 options (-mpc-relative-literal-loads,
-msign-return-address=scope and -moverride=string) are missing in the
option summary part of the manual:

https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html#Option-Summary

and that the "-no" version of -mpc-relative-literal-loads is missing
from the AArch64 options page:

https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options

This patch fixes these issues and removes a redundant "Save" property
in the mpc-relative-literal-loads description.

Tested by re-generating the manual.  OK for trunk?

Thanks
Yvan

gcc/ChangeLog
2017-06-27  Yvan Roux  

   * config/aarch64/aarch64.opt
   (mpc-relative-literal-loads): Remove redundant property.
   * doc/invoke.texi (AArch64): Add missing options.
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 942a7d5..0fd1bfa 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -146,7 +146,7 @@ EnumValue
 Enum(aarch64_abi) String(lp64) Value(AARCH64_ABI_LP64)
 
 mpc-relative-literal-loads
-Target Report Save Var(pcrelative_literal_loads) Init(2) Save
+Target Report Var(pcrelative_literal_loads) Init(2) Save
 PC relative literal loads.
 
 msign-return-address=
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d1e097b..6e0e776 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -595,7 +595,9 @@ Objective-C and Objective-C++ Dialects}.
 -mlow-precision-recip-sqrt  -mno-low-precision-recip-sqrt@gol
 -mlow-precision-sqrt  -mno-low-precision-sqrt@gol
 -mlow-precision-div  -mno-low-precision-div @gol
--march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}}
+-mpc-relative-literal-loads -mno-pc-relative-literal-loads @gol
+-msign-return-address=@var{scope} @gol
+-march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}  -moverride=@var{string}}
 
 @emph{Adapteva Epiphany Options}
 @gccoptlist{-mhalf-reg-file  -mprefer-short-insn-regs @gol
@@ -14158,8 +14160,10 @@ across releases.
 This option is only intended to be useful when developing GCC.
 
 @item -mpc-relative-literal-loads
+@item -mno-pc-relative-literal-loads
 @opindex mpc-relative-literal-loads
-Enable PC-relative literal loads.  With this option literal pools are
+@opindex mno-pc-relative-literal-loads
+Enable or disable PC-relative literal loads.  With this option literal pools are
 accessed using a single instruction and emitted after each function.  This
 limits the maximum size of functions to 1MB.  This is enabled by default for
 @option{-mcmodel=tiny}.


adjust libgcc build support for VxWorks to VxWorks 7

2017-06-27 Thread Olivier Hainque
Hello,

This replicates for VxWorks 7 the "trick" used to let libgcc build for VxWorks
in the absence of fixincludes, namely the last piece of libgcc/config/t-vxworks:

...
# This ensures that the correct target headers are used; some
# VxWorks system headers have names that collide with GCC's
# internal (host) headers, e.g. regs.h.
LIBGCC2_INCLUDES = -nostdinc \
  `case "/$(MULTIDIR)" in \
 */mrtp*) echo -I$(WIND_USR)/h -I$(WIND_USR)/h/wrn/coreip ;; \
 *) echo -I$(WIND_BASE)/target/h -I$(WIND_BASE)/target/h/wrn/coreip ;; \
   esac`

As an alternative resolution, fixincludes can help with this problem and we
have "hacks" for this purpose. They aren't quite complete, however, and getting
fixincludes to work properly gets significantly more difficult with VxWorks 7
because sets of include subdirs need to be looked at. The issue raised on
https://gcc.gnu.org/ml/gcc/2017-06/msg3.html adds on top, and overall, the
fewer dependencies we have on fixincludes for VxWorks, the better.

The attached patch reinstates the existing trick for VxWorks 7, providing an
alternate t-vxworks7 file adjusting the set of system header file directories
to consider.  The rest is copied verbatim. It's too small to warrant sharing
and this will facilitate vx7 specific adjustments to the other parameters in
the future if need be.

As for the other patches of the vx7 series, this was included in a preliminary
gcc-7 based port in-house which passes almost 100% of the ACATS tests for Ada,
and lets the toolchain+libgcc build in absence of fixincludes on mainline.

With Kind Regards,

Olivier

2017-06-27  Olivier Hainque  

libgcc/
* config/t-vxworks7: New file.
* config.host (*-*-vxworks7): Use it.



0005-adjust-libgcc-build-support-for-VxWorks-to-VxWorks-7.patch
Description: Binary data

