Re: [patch, x86] Improve memcpy/memset strategy for Skylake.

2018-07-18 Thread Uros Bizjak
On Thu, Jul 19, 2018 at 8:20 AM, Koval, Julia  wrote:
> Yes, it gives small improvements (~2%) on 557.xz at -O2, and on
> 548.exchange (~2.5%) and 500.perlbench (~1%) at -Ofast in rate mode.
>
>> -Original Message-
>> From: Uros Bizjak [mailto:ubiz...@gmail.com]
>> Sent: Thursday, July 19, 2018 8:12 AM
>> To: Koval, Julia 
>> Cc: GCC Patches 
>> Subject: Re: [patch, x86] Improve memcpy/memset strategy for Skylake.
>>
>> On Thu, Jul 19, 2018 at 7:00 AM, Koval, Julia  wrote:
>> > Hi,
>> > This patch improves memset/memcpy strategy for Skylake. Ok for trunk?
>>
>> Is this patch based on some benchmark data?
>>
>> Uros.
>>
>> > * gcc/config/i386/x86-tune-costs.h (skylake_memcpy,
>> > skylake_memcpy): Replace rep_prefix with unrolling on 512.

OK for mainline with a fixed ChangeLog entry, something like:

...: Replace rep_prefix with unrolling for size 512.

Thanks,
Uros.


RE: [patch, x86] Improve memcpy/memset strategy for Skylake.

2018-07-18 Thread Koval, Julia
Yes, it gives small improvements (~2%) on 557.xz at -O2, and on
548.exchange (~2.5%) and 500.perlbench (~1%) at -Ofast in rate mode.

> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Thursday, July 19, 2018 8:12 AM
> To: Koval, Julia 
> Cc: GCC Patches 
> Subject: Re: [patch, x86] Improve memcpy/memset strategy for Skylake.
> 
> On Thu, Jul 19, 2018 at 7:00 AM, Koval, Julia  wrote:
> > Hi,
> > This patch improves memset/memcpy strategy for Skylake. Ok for trunk?
> 
> Is this patch based on some benchmark data?
> 
> Uros.
> 
> > * gcc/config/i386/x86-tune-costs.h (skylake_memcpy,
> > skylake_memcpy): Replace rep_prefix with unrolling on 512.
> >
> > Thanks,
> > Julia
> >


Re: [PATCH] fix a couple of bugs in const string folding (PR 86532)

2018-07-18 Thread Bernd Edlinger
  if (TREE_CODE (idx) != INTEGER_CST
      && TREE_CODE (argtype) == POINTER_TYPE)
    {
      /* From a pointer (but not array) argument extract the variable
         index to prevent get_addr_base_and_unit_offset() from failing
         due to it.  Use it later to compute the non-constant offset
         into the string and return it to the caller.  */
      varidx = idx;
      ref = TREE_OPERAND (arg, 0);

      tree type = TREE_TYPE (arg);
      if (TREE_CODE (type) == ARRAY_TYPE
          && TREE_CODE (type) != INTEGER_TYPE)
        return NULL_TREE;
    }

The condition TREE_CODE (type) == ARRAY_TYPE
&& TREE_CODE (type) != INTEGER_TYPE looks funny:
a check for ARRAY_TYPE already implies != INTEGER_TYPE.

  else if (DECL_P (arg))
    {
      array = arg;
      chartype = TREE_TYPE (arg);
    }

chartype is only used in the if (varidx) block, but varidx is always zero
in this case.

  while (TREE_CODE (chartype) == ARRAY_TYPE
         || TREE_CODE (chartype) == POINTER_TYPE)
    chartype = TREE_TYPE (chartype);

You multiply sizeof (chartype) by varidx, but you should probably
use the type of the TREE_OPERAND (arg, 0) above instead.

This is not in the patch, but I don't like it at all, because it compares the
size of a single initializer against the full size of the array.  It should
be compared against the innermost enclosing array instead:

  tree array_size = DECL_SIZE_UNIT (array);
  if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
    return NULL_TREE;

  /* Avoid returning a string that doesn't fit in the array
     it is stored in, like
     const char a[4] = "abcde";
     but do handle those that fit even if they have excess
     initializers, such as in
     const char a[4] = "abc\000\000";
     The excess elements contribute to TREE_STRING_LENGTH()
     but not to strlen().  */
  unsigned HOST_WIDE_INT length
    = strnlen (TREE_STRING_POINTER (init), TREE_STRING_LENGTH (init));
  if (compare_tree_int (array_size, length + 1) < 0)
    return NULL_TREE;

Consider the following test case:
$ cat part.c
const char a[2][3][8] = { { "a", "bb", "ccc" },
                          { "", "e", "ff" } };

int main ()
{
  int n = __builtin_strlen (&a[0][1][0]);

  if (n == 30)
    __builtin_abort ();
}

11413 if (!init || TREE_CODE (init) != STRING_CST)
(gdb) call debug(init)
"bb"
(gdb) n
11416 tree array_size = DECL_SIZE_UNIT (array);
(gdb) n
11417 if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
(gdb) call debug(array_size)
48
(gdb) n
11429   = strnlen (TREE_STRING_POINTER (init), TREE_STRING_LENGTH (init));
(gdb) n
11430 if (compare_tree_int (array_size, length + 1) < 0)
(gdb) n
11433 *ptr_offset = offset;
(gdb) 
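
To make the size mismatch concrete, here is a minimal stand-alone sketch
in plain C (not GCC internals).  It reuses the initializer from part.c;
the helper names whole_object_size, innermost_array_size and string_fits
are made up for illustration.  It only shows the arithmetic behind the
48 seen in the gdb session versus the 8 bytes of the innermost array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same initializer as in part.c above.  */
static const char a[2][3][8] = { { "a", "bb", "ccc" },
                                 { "", "e", "ff" } };

/* Size of the whole object: 2 * 3 * 8 bytes.  This corresponds to the
   value DECL_SIZE_UNIT (array) shows in the gdb session.  */
size_t
whole_object_size (void)
{
  return sizeof a;
}

/* Size of the innermost enclosing array that actually holds "bb".  */
size_t
innermost_array_size (void)
{
  return sizeof a[0][1];
}

/* Length of the initializer that the strlen call refers to.  */
size_t
bb_length (void)
{
  return strlen (a[0][1]);
}

/* Hypothetical helper: does a string of LENGTH characters plus its
   terminating NUL fit into SIZE bytes?  */
int
string_fits (size_t length, size_t size)
{
  assert (size > 0);
  return length + 1 <= size;
}
```

A string of 10 characters passes string_fits against the whole object
but not against the innermost array, which is the case the current
check lets through.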




Bernd.

Re: [patch, x86] Improve memcpy/memset strategy for Skylake.

2018-07-18 Thread Uros Bizjak
On Thu, Jul 19, 2018 at 7:00 AM, Koval, Julia  wrote:
> Hi,
> This patch improves memset/memcpy strategy for Skylake. Ok for trunk?

Is this patch based on some benchmark data?

Uros.

> * gcc/config/i386/x86-tune-costs.h (skylake_memcpy,
> skylake_memcpy): Replace rep_prefix with unrolling on 512.
>
> Thanks,
> Julia
>


[patch, x86] Improve memcpy/memset strategy for Skylake.

2018-07-18 Thread Koval, Julia
Hi,
This patch improves memset/memcpy strategy for Skylake. Ok for trunk?

* gcc/config/i386/x86-tune-costs.h (skylake_memcpy,
skylake_memcpy): Replace rep_prefix with unrolling on 512.

Thanks,
Julia



0001-memset.patch
Description: 0001-memset.patch


Re: [PATCH], Remove undocumented -mtoc-fusion from PowerPC

2018-07-18 Thread Segher Boessenkool
Hi Mike,

On Fri, Jul 13, 2018 at 04:56:13PM -0400, Michael Meissner wrote:
> This means rather than keeping the toc fusion around (that nobody used), I
> would prefer to delete the current code, and replace it with better code as I
> implement it.


> +++ gcc/config/rs6000/constraints.md  (working copy)

> +;; wG is now available.  Previously it was a memory operand suitable for TOC
> +;; fusion.

There are many other constraints unused.  Keep track of all, instead?
Like we have (at the top of this file)
;; Available constraint letters: e k q t u A B C D S T
you could do something similar for the "w" names.


> --- gcc/config/rs6000/predicates.md   (revision 262647)
> +++ gcc/config/rs6000/predicates.md   (working copy)
> @@ -412,7 +412,7 @@ (define_predicate "fpr_reg_operand"
>  ;;
>  ;; If this is a pseudo only allow for GPR fusion in power8.  If we have the
>  ;; power9 fusion allow the floating point types.
> -(define_predicate "toc_fusion_or_p9_reg_operand"
> +(define_predicate "p9_fusion_reg_operand"

The comment before this needs fixing, too:

;; Return true if this is a register that can has D-form addressing (GPR and
;; traditional FPR registers for scalars).  ISA 3.0 (power9) adds D-form
;; addressing for scalars in Altivec registers.
;;
;; If this is a pseudo only allow for GPR fusion in power8.  If we have the
;; power9 fusion allow the floating point types.

It's not clear to me what this really stands for (not before the patch,
either).


Okay for trunk.  Thanks!  Please follow up on the two points above.


Segher


Re: [C++ PATCH] Further get_identifier ("string literal") C++ FE caching

2018-07-18 Thread Jakub Jelinek
On Wed, Jul 18, 2018 at 06:00:20PM -0400, Nathan Sidwell wrote:
> So cool! Thanks.

Ok for both patches or just this one?

Jakub


Re: [C++ PATCH] Further get_identifier ("string literal") C++ FE caching

2018-07-18 Thread Nathan Sidwell
So cool! Thanks.
Sorry for the top posting
nathan

-- 
Nathan Sidwell

-------- Original message --------
From: Jakub Jelinek
Date: 7/18/18 17:04 (GMT-05:00)
To: Nathan Sidwell
Cc: Jason Merrill, gcc-patches@gcc.gnu.org
Subject: [C++ PATCH] Further get_identifier ("string literal") C++ FE caching

RFC: Patch to implement Aarch64 SIMD ABI

2018-07-18 Thread Steve Ellcey
This is a patch to support the Aarch64 SIMD ABI [1] in GCC.  I intend
to eventually follow this up with two more patches; one to define the
TARGET_SIMD_CLONE* macros and one to improve the GCC register
allocation/usage when calling SIMD functions.

The significant difference between the standard ARM ABI and the SIMD ABI
is that in the normal ABI a callee saves only the lower 64 bits of registers
V8-V15, whereas in the SIMD ABI the callee must save all 128 bits of
registers V8-V23.
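
As a rough illustration of that difference, the following stand-alone C
sketch encodes the two rules as a predicate.  It is not the code from
the patch: the function name callee_saved_bytes and the register
numbering (V0..V31 numbered 0..31) are assumptions made for the example,
and GCC's real V0_REGNUM has a different value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numbering for this sketch only: V0..V31 map to 0..31.  */
enum { TOY_V8 = 8, TOY_V15 = 15, TOY_V23 = 23, TOY_NUM_VREGS = 32 };

/* Return how many bytes of vector register VREG a callee must preserve:
   the low 8 bytes of V8-V15 under the standard ABI, or all 16 bytes of
   V8-V23 under the SIMD ABI.  Other registers are not callee-saved.  */
unsigned
callee_saved_bytes (unsigned vreg, bool simd_abi)
{
  assert (vreg < TOY_NUM_VREGS);
  if (simd_abi)
    return (vreg >= TOY_V8 && vreg <= TOY_V23) ? 16 : 0;
  return (vreg >= TOY_V8 && vreg <= TOY_V15) ? 8 : 0;
}
```

For example, V16 costs nothing to the callee under the standard ABI but
needs a full 16-byte save slot under the SIMD ABI.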

This patch checks for SIMD functions and saves the extra registers when
needed.  It does not change the caller behavior, so with just this patch
there may be values saved by both the caller and the callee.  This is not
efficient, but it is correct code.

This patch bootstraps and passes the GCC testsuite, but that only verifies
that I haven't broken anything; it doesn't validate the handling of SIMD
functions.  I tried to write some tests, but I could never get GCC to
generate code that would save the FP callee-save registers in the prologue.
Complex code might generate spills and fills, but it never triggered the
prologue/epilogue code to save V8-V23.  If anyone has ideas on how to write
a test that would cause GCC to generate this code, I would appreciate them.
Just doing lots of calculations with lots of intermediate values doesn't
seem to be enough.

Steve Ellcey
sell...@cavium.com

[1] 
https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi


2018-07-18  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_function_p): New function.
(aarch64_layout_frame): Check for simd function.
(aarch64_process_components): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (FP_SIMD_SAVED_REGNUM_P): New define.
* config/aarch64/aarch64.md (V23_REGNUM): New constant.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1369704..b25da11 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1026,6 +1026,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1404,6 +1413,18 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is a definition of a vectorized simd function.  */
+
+static bool
+aarch64_simd_function_p (tree fndecl)
+{
+  if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fndecl)) != NULL)
+    return true;
+  if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL)
+    return false;
+  return (VECTOR_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl))));
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
the lower 64 bits of a 128-bit register.  Tell the compiler the callee
clobbers the top 64 bits when restoring the bottom 64 bits.  */
@@ -4034,6 +4055,7 @@ aarch64_layout_frame (void)
 {
   HOST_WIDE_INT offset = 0;
   int regno, last_fp_reg = INVALID_REGNUM;
+  bool simd_function = aarch64_simd_function_p (cfun->decl);
 
   if (reload_completed && cfun->machine->frame.laid_out)
 return;
@@ -4068,7 +4090,8 @@ aarch64_layout_frame (void)
 
   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
 if (df_regs_ever_live_p (regno)
-	&& !call_used_regs[regno])
+	&& (!call_used_regs[regno]
+	    || (simd_function && FP_SIMD_SAVED_REGNUM_P (regno))))
   {
 	cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
 	last_fp_reg = regno;
@@ -4105,7 +4128,8 @@ aarch64_layout_frame (void)
   {
 	/* If there is an alignment gap between integer and fp callee-saves,
 	   allocate the last fp register to it if possible.  */
-	if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0)
+	if (regno == last_fp_reg && has_align_gap
+	&& !simd_function && (offset & 8) == 0)
 	  {
 	cfun->machine->frame.reg_offset[regno] = max_int_offset;
 	break;
@@ -4117,7 +4141,7 @@ aarch64_layout_frame (void)
 	else if (cfun->machine->frame.wb_candidate2 == INVALID_REGNUM
 		 && cfun->machine->frame.wb_candidate1 >= V0_REGNUM)
 	  cfun->machine->frame.wb_candidate2 = regno;
-	offset += UNITS_PER_WORD;
+	offset += simd_function ? UNITS_PER_VREG : UNITS_PER_WORD;
   }
 
   offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PE

Re: [PATCH, rs6000] Fix AIX test case failures

2018-07-18 Thread Segher Boessenkool
Hi Carl,

On Tue, Jul 17, 2018 at 04:39:58PM -0700, Carl Love wrote:
> I was requested to backport the patch for the AIX test case failures to
> GCC 8.  The trunk patch applied cleanly to GCC 8.  I updated the
> changelog patch, built and retested the patch on:
> 
>     powerpc64le-unknown-linux-gnu (Power 8 LE)  
> powerpc64-unknown-linux-gnu (Power 8 BE)
> AIX 7200-00-01-1543 (Power 8 BE)
> 
> With no regressions.
> 
> Please let me know if it is OK to apply the patch to the GCC 8 branch. 

Sure, it's okay.  Thanks!


Segher


> 2018-07-17  Carl Love  
> 
>   Backport from mainline
>   2018-07-16  Carl Love  
> 
>   PR target/86414
>   * gcc.target/powerpc/divkc3-2.c: Add dg-require-effective-target
>   longdouble128.
>   * gcc.target/powerpc/divkc3-3.c: Ditto.
>   * gcc.target/powerpc/mulkc3-2.c: Ditto.
>   * gcc.target/powerpc/mulkc3-3.c: Ditto.
>   * gcc.target/powerpc/fold-vec-mergehl-double.c: Update counts.
>   * gcc.target/powerpc/pr85456.c: Make check Linux and AIX specific.


[C++ PATCH] Further get_identifier ("string literal") C++ FE caching

2018-07-18 Thread Jakub Jelinek
On Wed, Jul 18, 2018 at 12:19:31PM +0200, Jakub Jelinek wrote:
> Shall I submit an incremental patch for the "abi_tag", "gnu", "begin", "end", 
> "get",
> "tuple_size", "tuple_element" etc. identifiers?

Here it is in an incremental patch.  I've tried to do it only for
get_identifier ("string literal") calls that can be called many times during
parsing rather than just at most once, and aren't related to -fgnu-tm,
-fopenmp, Obj-C++ or vtv.
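
For readers unfamiliar with the pattern, here is a toy stand-alone C
sketch of the idea: look a name up in the identifier table once at
start-up and reuse the cached pointer afterwards.  Everything prefixed
with toy_ is invented for the example and only mimics the shape of
cp_global_trees and initialize_predefined_identifiers; the real
get_identifier uses a hash table rather than this linear search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_index { TOY_BEGIN_IDENTIFIER, TOY_END_IDENTIFIER, TOY_MAX };

#define TOY_TABLE_SIZE 16

static const char *toy_table[TOY_TABLE_SIZE];  /* interned names */
static unsigned toy_table_len;
static unsigned toy_lookups;                   /* counts table searches */

/* Cached identifiers, indexed by enum toy_index.  */
static const char *toy_global_trees[TOY_MAX];
#define toy_begin_identifier toy_global_trees[TOY_BEGIN_IDENTIFIER]
#define toy_end_identifier   toy_global_trees[TOY_END_IDENTIFIER]

/* Intern NAME: return the unique pointer for it, adding it to the table
   on first use.  Returns NULL if the toy table is full.  */
const char *
toy_get_identifier (const char *name)
{
  assert (name != NULL);
  toy_lookups++;
  for (unsigned i = 0; i < toy_table_len; i++)
    if (strcmp (toy_table[i], name) == 0)
      return toy_table[i];
  if (toy_table_len == TOY_TABLE_SIZE)
    return NULL;
  toy_table[toy_table_len] = name;
  return toy_table[toy_table_len++];
}

/* Do the lookups once; later uses read the cached pointers.  */
void
toy_initialize_predefined_identifiers (void)
{
  toy_begin_identifier = toy_get_identifier ("begin");
  toy_end_identifier = toy_get_identifier ("end");
}

/* Number of table searches performed so far.  */
unsigned
toy_lookup_count (void)
{
  return toy_lookups;
}
```

After initialization, any number of uses of toy_begin_identifier leave
the lookup count unchanged, whereas each toy_get_identifier ("begin")
call searches the table again.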

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-07-18  Jakub Jelinek  

* cp-tree.h (enum cp_tree_index): Add
CPTI_{ABI_TAG,ALIGNED,BEGIN,END,GET,TUPLE_{ELEMENT,SIZE}}_IDENTIFIER
and CPTI_{GNU,TYPE,VALUE,FUN,CLOSURE}_IDENTIFIER.
(abi_tag_identifier, aligned_identifier, begin_identifier,
end_identifier, get__identifier, gnu_identifier,
tuple_element_identifier, tuple_size_identifier, type_identifier,
value_identifier, fun_identifier, closure_identifier): Define.
* decl.c (initialize_predefined_identifiers): Initialize the above
identifiers.
(get_tuple_size): Use tuple_size_identifier instead of
get_identifier ("tuple_size") and value_identifier instead of
get_identifier ("value").
(get_tuple_element_type): Use tuple_element_identifier instead of
get_identifier ("tuple_element") and type_identifier instead of
get_identifier ("type").
(get_tuple_decomp_init): Use get__identifier instead of
get_identifier ("get").
* lambda.c (maybe_add_lambda_conv_op): Use fun_identifier instead of
get_identifier ("_FUN").
* parser.c (cp_parser_lambda_declarator_opt): Use closure_identifier
instead of get_identifier ("__closure").
(cp_parser_std_attribute): Use gnu_identifier instead of
get_identifier ("gnu").
(cp_parser_std_attribute_spec): Likewise.  Use aligned_identifier
instead of get_identifier ("aligned").
* class.c (check_abi_tags, inherit_targ_abi_tags): Use
abi_tag_identifier instead of get_identifier ("abi_tag").

--- gcc/cp/cp-tree.h.jj	2018-07-18 11:57:55.980529748 +0200
+++ gcc/cp/cp-tree.h	2018-07-18 18:52:44.805248036 +0200
@@ -160,6 +160,18 @@ enum cp_tree_index
     CPTI_FOR_RANGE_IDENTIFIER,
     CPTI_FOR_BEGIN_IDENTIFIER,
     CPTI_FOR_END_IDENTIFIER,
+    CPTI_ABI_TAG_IDENTIFIER,
+    CPTI_ALIGNED_IDENTIFIER,
+    CPTI_BEGIN_IDENTIFIER,
+    CPTI_END_IDENTIFIER,
+    CPTI_GET_IDENTIFIER,
+    CPTI_GNU_IDENTIFIER,
+    CPTI_TUPLE_ELEMENT_IDENTIFIER,
+    CPTI_TUPLE_SIZE_IDENTIFIER,
+    CPTI_TYPE_IDENTIFIER,
+    CPTI_VALUE_IDENTIFIER,
+    CPTI_FUN_IDENTIFIER,
+    CPTI_CLOSURE_IDENTIFIER,
 
     CPTI_LANG_NAME_C,
     CPTI_LANG_NAME_CPLUSPLUS,
@@ -286,6 +298,18 @@ extern GTY(()) tree cp_global_trees[CPTI
 #define for_range_identifier		cp_global_trees[CPTI_FOR_RANGE_IDENTIFIER]
 #define for_begin_identifier		cp_global_trees[CPTI_FOR_BEGIN_IDENTIFIER]
 #define for_end_identifier		cp_global_trees[CPTI_FOR_END_IDENTIFIER]
+#define abi_tag_identifier		cp_global_trees[CPTI_ABI_TAG_IDENTIFIER]
+#define aligned_identifier		cp_global_trees[CPTI_ALIGNED_IDENTIFIER]
+#define begin_identifier		cp_global_trees[CPTI_BEGIN_IDENTIFIER]
+#define end_identifier			cp_global_trees[CPTI_END_IDENTIFIER]
+#define get__identifier		cp_global_trees[CPTI_GET_IDENTIFIER]
+#define gnu_identifier			cp_global_trees[CPTI_GNU_IDENTIFIER]
+#define tuple_element_identifier	cp_global_trees[CPTI_TUPLE_ELEMENT_IDENTIFIER]
+#define tuple_size_identifier		cp_global_trees[CPTI_TUPLE_SIZE_IDENTIFIER]
+#define type_identifier		cp_global_trees[CPTI_TYPE_IDENTIFIER]
+#define value_identifier		cp_global_trees[CPTI_VALUE_IDENTIFIER]
+#define fun_identifier			cp_global_trees[CPTI_FUN_IDENTIFIER]
+#define closure_identifier		cp_global_trees[CPTI_CLOSURE_IDENTIFIER]
 #define lang_name_c			cp_global_trees[CPTI_LANG_NAME_C]
 #define lang_name_cplusplus		cp_global_trees[CPTI_LANG_NAME_CPLUSPLUS]
 
--- gcc/cp/decl.c.jj	2018-07-18 11:59:06.220595473 +0200
+++ gcc/cp/decl.c	2018-07-18 18:52:58.676265952 +0200
@@ -4050,6 +4050,18 @@ initialize_predefined_identifiers (void)
     {"__for_range", &for_range_identifier, cik_normal},
     {"__for_begin", &for_begin_identifier, cik_normal},
     {"__for_end", &for_end_identifier, cik_normal},
+    {"abi_tag", &abi_tag_identifier, cik_normal},
+    {"aligned", &aligned_identifier, cik_normal},
+    {"begin", &begin_identifier, cik_normal},
+    {"end", &end_identifier, cik_normal},
+    {"get", &get__identifier, cik_normal},
+    {"gnu", &gnu_identifier, cik_normal},
+    {"tuple_element", &tuple_element_identifier, cik_normal},
+    {"tuple_size", &tuple_size_identifier, cik_normal},
+{"type", &type_identifie

[PING] Re: [PATCH 1/2] v5: Add "optinfo" framework

2018-07-18 Thread David Malcolm
Ping, re these patches:

"[PATCH 1/2] v5: Add "optinfo" framework"
  https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00535.html
 
"[PATCH 2/2] Add "-fsave-optimization-record""
  https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00536.html

Thanks
Dave

On Wed, 2018-07-11 at 07:37 -0400, David Malcolm wrote:
> Changes relative to v4:
> * eliminated optinfo subclasses as discussed
> * eliminated optinfo-internal.h, moving what remained into optinfo.h
> * added support for dump_gimple_expr_loc and dump_gimple_expr
> * more selftests
> 
> This patch implements a way to consolidate dump_* calls into
> optinfo objects, as enabling work towards being able to write out
> optimization records to a file (I'm focussing on that destination
> in this patch kit, rather than diagnostic remarks).
> 
> The patch adds the support for building optinfo instances from dump_*
> calls, but leaves implementing any *users* of them to followup
> patches.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> gcc/ChangeLog:
>   * Makefile.in (OBJS): Add optinfo.o.
>   * coretypes.h (class symtab_node): New forward decl.
>   (struct cgraph_node): New forward decl.
>   (class varpool_node): New forward decl.
>   * dump-context.h: New file.
>   * dumpfile.c: Include "optinfo.h", "dump-context.h",
> "cgraph.h",
>   "tree-pass.h".
>   (refresh_dumps_are_enabled): Use optinfo_enabled_p.
>   (set_dump_file): Call dumpfile_ensure_any_optinfo_are_flushed.
>   (set_alt_dump_file): Likewise.
>   (dump_context::~dump_context): New dtor.
>   (dump_gimple_stmt): Move implementation to...
>   (dump_context::dump_gimple_stmt): ...this new member function.
>   Add the stmt to any pending optinfo, creating one if need be.
>   (dump_gimple_stmt_loc): Move implementation to...
>   (dump_context::dump_gimple_stmt_loc): ...this new member
> function.
>   Start a new optinfo and add the stmt to it.
>   (dump_gimple_expr): Move implementation to...
>   (dump_context::dump_gimple_expr): ...this new member function.
>   Add the stmt to any pending optinfo, creating one if need be.
>   (dump_gimple_expr_loc): Move implementation to...
>   (dump_context::dump_gimple_expr_loc): ...this new member
> function.
>   Start a new optinfo and add the stmt to it.
>   (dump_generic_expr): Move implementation to...
>   (dump_context::dump_generic_expr): ...this new member function.
>   Add the tree to any pending optinfo, creating one if need be.
>   (dump_generic_expr_loc): Move implementation to...
>   (dump_context::dump_generic_expr_loc): ...this new member
>   function.  Add the tree to any pending optinfo, creating one if
>   need be.
>   (dump_printf): Move implementation to...
>   (dump_context::dump_printf_va): ...this new member
> function.  Add
>   the text to any pending optinfo, creating one if need be.
>   (dump_printf_loc): Move implementation to...
>   (dump_context::dump_printf_loc_va): ...this new member
> function.
>   Start a new optinfo and add the stmt to it.
>   (dump_dec): Move implementation to...
>   (dump_context::dump_dec): ...this new member function.  Add the
>   value to any pending optinfo, creating one if need be.
>   (dump_context::dump_symtab_node): New member function.
>   (dump_context::get_scope_depth): New member function.
>   (dump_context::begin_scope): New member function.
>   (dump_context::end_scope): New member function.
>   (dump_context::ensure_pending_optinfo): New member function.
>   (dump_context::begin_next_optinfo): New member function.
>   (dump_context::end_any_optinfo): New member function.
>   (dump_context::s_current): New global.
>   (dump_context::s_default): New global.
>   (dump_scope_depth): Delete global.
>   (dumpfile_ensure_any_optinfo_are_flushed): New function.
>   (dump_symtab_node): New function.
>   (get_dump_scope_depth): Reimplement in terms of dump_context.
>   (dump_begin_scope): Likewise.
>   (dump_end_scope): Likewise.
>   (selftest::temp_dump_context::temp_dump_context): New ctor.
>   (selftest::temp_dump_context::~temp_dump_context): New dtor.
>   (selftest::verify_item): New function.
>   (ASSERT_IS_TEXT): New macro.
>   (ASSERT_IS_TREE): New macro.
>   (ASSERT_IS_GIMPLE): New macro.
>   (selftest::test_capture_of_dump_calls): New test.
>   (selftest::dumpfile_c_tests): Call it.
>   * dumpfile.h (dump_printf, dump_printf_loc, dump_basic_block)
>   (dump_generic_expr_loc, dump_generic_expr,
> dump_gimple_stmt_loc)
>   (dump_gimple_stmt, dump_dec): Gather these related decls and
> add a
>   descriptive comment.
>   (dump_function, print_combine_total_stats,
> enable_rtl_dump_file)
>   (dump_node, dump_bb): Move these unrelated decls.
>   (class dump_manager): Add leading comment.
>   * op

Re: [PATCH] fix a couple of bugs in const string folding (PR 86532)

2018-07-18 Thread Martin Sebor

+  while (TREE_CODE (chartype) != INTEGER_TYPE)
+    chartype = TREE_TYPE (chartype);

This is a bit concerning.  First under what conditions is chartype not
going to be an INTEGER_TYPE?  And under what conditions will extracting
its type ultimately lead to something that is an INTEGER_TYPE?


chartype is usually (maybe even always) a pointer type here:

  const char a[] = "123";
  extern int i;
  n = strlen (&a[i]);


But your hunch was correct that the loop isn't safe because
the element type need not be an integer (I didn't know/forgot
that the function is called for non-strings too).  The loop
should be replaced by:

  while (TREE_CODE (chartype) == ARRAY_TYPE
         || TREE_CODE (chartype) == POINTER_TYPE)
    chartype = TREE_TYPE (chartype);

  if (TREE_CODE (chartype) != INTEGER_TYPE)
    return NULL;

I will update the patch before committing.
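
To show why the replacement loop is safe where the original was not,
here is a toy model in plain C.  It is only a sketch: struct toy_type,
the TOY_* codes and toy_char_type are invented stand-ins for GCC's tree
types, TREE_CODE and TREE_TYPE, and the sample types at the end exist
purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for tree codes.  */
enum toy_code { TOY_INTEGER, TOY_REAL, TOY_POINTER, TOY_ARRAY };

/* Toy stand-in for a type node.  INNER is the element or pointed-to
   type, and is NULL for scalar types.  */
struct toy_type
{
  enum toy_code code;
  const struct toy_type *inner;
};

/* Peel array and pointer layers only, then require an integer element
   type.  A loop that instead ran "while code != TOY_INTEGER" would
   follow the NULL inner pointer of a scalar such as TOY_REAL.  */
const struct toy_type *
toy_char_type (const struct toy_type *t)
{
  assert (t != NULL);
  while (t->code == TOY_ARRAY || t->code == TOY_POINTER)
    t = t->inner;
  return t->code == TOY_INTEGER ? t : NULL;
}

/* Sample types: char, double, char[], pointer to char[], double[].  */
static const struct toy_type toy_char = { TOY_INTEGER, NULL };
static const struct toy_type toy_double = { TOY_REAL, NULL };
static const struct toy_type toy_char_array = { TOY_ARRAY, &toy_char };
static const struct toy_type toy_ptr_to_char_array
  = { TOY_POINTER, &toy_char_array };
static const struct toy_type toy_double_array = { TOY_ARRAY, &toy_double };
```

Here toy_char_type (&toy_ptr_to_char_array) yields &toy_char, while
toy_char_type (&toy_double_array) yields NULL instead of walking off
the end of the type chain.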

FWIW, it seems like it would be useful to extend the function to
non-string arguments.  That way it would be able to return array
initializers in cases like this:

  const struct A { int a[2]; }
    a = { { 1, 2 } },
    b = { { 1, 2 } };

  int f (void)
  {
    return __builtin_memcmp (&a, &b, sizeof a);
  }

which would in turn make it possible to fold the result of
such calls analogously to strlen or strcmp.

Martin


Re: [PATCH] Call REAL(swapcontext) with indirect_return attribute on x86

2018-07-18 Thread Kostya Serebryany via gcc-patches
On Wed, Jul 18, 2018 at 12:29 PM H.J. Lu  wrote:
>
> On Wed, Jul 18, 2018 at 11:45 AM, Kostya Serebryany  wrote:
> > On Wed, Jul 18, 2018 at 11:40 AM H.J. Lu  wrote:
> >>
> >> On Wed, Jul 18, 2018 at 11:18 AM, Kostya Serebryany  
> >> wrote:
> >> > What's ENDBR and do we really need to have it in compiler-rt?
> >>
> >> When shadow stack from Intel CET is enabled,  the first instruction of all
> >> indirect branch targets must be a special instruction, ENDBR.  In this 
> >> case,
> >
> > I am confused.
> > CET is a security mitigation feature (and ENDBR is a pretty weak form
> > of such), while ASAN is a testing tool, rarely used in production and
> > almost never as a mitigation (which it is not!).
> > Why would anyone need to combine CET and ASAN in one process?
> >
>
> CET is transparent to ASAN.  It is perfectly OK to use -fcf-protection to
> enable CET together with ASAN.

It is ok, but does it make any sense?
If anything, the current ASAN interceptors are a large blob of
security vulnerabilities.
If we ever want to use ASAN (or, more likely, HWASAN) as a security
mitigation feature, we will need to get rid of these interceptors entirely.


>
> > Also, CET doesn't exist in the hardware yet, at least not publicly 
> > available.
> > Which means there should be no rush (am I wrong?) and we can do things
> > in the correct order:
> > implement the Clang/LLVM support, make the compiler-rt change in LLVM,
> > merge back to GCC.
>
> I am working with our LLVM people to address this.

Cool!


>
> H.J.
> > --kcc
> >
> >>
> >> int res = REAL(swapcontext)(oucp, ucp);
> >>
> >> REAL(swapcontext) may return via an indirect branch.
> >>
> >> Here compiler must insert ENDBR after call, like
> >>
> >> call *bar(%rip)
> >> endbr64
> >>
> >> > As usual, I am opposed to any gcc compiler-rt that bypass upstream.
> >>
> >> We want it to be fixed in upstream.  That is why I opened an LLVM bug.
> >>
> >>
> >> > --kcc
> >> >
> >> > On Wed, Jul 18, 2018 at 8:37 AM H.J. Lu  wrote:
> >> >>
> >> >> asan/asan_interceptors.cc has
> >> >>
> >> >> ...
> >> >>   int res = REAL(swapcontext)(oucp, ucp);
> >> >> ...
> >> >>
> >> >> REAL(swapcontext) is a function pointer to swapcontext in libc.  Since
> >> >> swapcontext may return via indirect branch on x86 when shadow stack is
> >> >> enabled, we need to call REAL(swapcontext) with indirect_return 
> >> >> attribute
> >> >> on x86 so that compiler can insert ENDBR after REAL(swapcontext) call.
> >> >>
> >> >> I opened an LLVM bug:
> >> >>
> >> >> https://bugs.llvm.org/show_bug.cgi?id=38207
> >> >>
> >> >> But it won't get fixed before indirect_return attribute is added to
> >> >> LLVM.  I'd like to get it fixed in GCC first.
> >> >>
> >> >> Tested on i386 and x86-64.  OK for trunk after
> >> >>
> >> >> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01007.html
> >> >>
> >> >> is approved?
> >> >>
> >> >> Thanks.
> >> >>
> >> >>
> >> >> H.J.
> >> >> ---
> >> >> PR target/86560
> >> >> * asan/asan_interceptors.cc (swapcontext): Call 
> >> >> REAL(swapcontext)
> >> >> with indirect_return attribute on x86.
> >> >> ---
> >> >>  libsanitizer/asan/asan_interceptors.cc | 6 ++
> >> >>  1 file changed, 6 insertions(+)
> >> >>
> >> >> diff --git a/libsanitizer/asan/asan_interceptors.cc 
> >> >> b/libsanitizer/asan/asan_interceptors.cc
> >> >> index a8f4b72723f..b8dde4f19c5 100644
> >> >> --- a/libsanitizer/asan/asan_interceptors.cc
> >> >> +++ b/libsanitizer/asan/asan_interceptors.cc
> >> >> @@ -267,7 +267,13 @@ INTERCEPTOR(int, swapcontext, struct ucontext_t 
> >> >> *oucp,
> >> >>uptr stack, ssize;
> >> >>ReadContextStack(ucp, &stack, &ssize);
> >> >>ClearShadowMemoryForContextStack(stack, ssize);
> >> >> +#if defined(__x86_64__) || defined(__i386__)
> >> >> +  int (*real_swapcontext) (struct ucontext_t *, struct ucontext_t *)
> >> >> +__attribute__((__indirect_return__)) = REAL(swapcontext);
> >> >> +  int res = real_swapcontext(oucp, ucp);
> >> >> +#else
> >> >>int res = REAL(swapcontext)(oucp, ucp);
> >> >> +#endif
> >> >>// swapcontext technically does not return, but program may swap 
> >> >> context to
> >> >>// "oucp" later, that would look as if swapcontext() returned 0.
> >> >>// We need to clear shadow for ucp once again, as it may be in 
> >> >> arbitrary
> >> >> --
> >> >> 2.17.1
> >> >>
> >>
> >>
> >>
> >> --
> >> H.J.
>
>
>
> --
> H.J.


Re: [PATCH] Call REAL(swapcontext) with indirect_return attribute on x86

2018-07-18 Thread H.J. Lu
On Wed, Jul 18, 2018 at 11:45 AM, Kostya Serebryany  wrote:
> On Wed, Jul 18, 2018 at 11:40 AM H.J. Lu  wrote:
>>
>> On Wed, Jul 18, 2018 at 11:18 AM, Kostya Serebryany  wrote:
>> > What's ENDBR and do we really need to have it in compiler-rt?
>>
>> When shadow stack from Intel CET is enabled,  the first instruction of all
>> indirect branch targets must be a special instruction, ENDBR.  In this case,
>
> I am confused.
> CET is a security mitigation feature (and ENDBR is a pretty weak form of 
> such),
> while ASAN is a testing tool, rarely used in production and almost
> never as a mitigation (which it is not!).
> Why would anyone need to combine CET and ASAN in one process?
>

CET is transparent to ASAN.  It is perfectly OK to use -fcf-protection to
enable CET together with ASAN.

> Also, CET doesn't exist in the hardware yet, at least not publicly available.
> Which means there should be no rush (am I wrong?) and we can do things
> in the correct order:
> implement the Clang/LLVM support, make the compiler-rt change in LLVM,
> merge back to GCC.

I am working with our LLVM people to address this.

H.J.
> --kcc
>
>>
>> int res = REAL(swapcontext)(oucp, ucp);
>>     This function may be
>> returned via an indirect branch.
>>
>> Here compiler must insert ENDBR after call, like
>>
>> call *bar(%rip)
>> endbr64
>>
>> > As usual, I am opposed to any gcc compiler-rt that bypass upstream.
>>
>> We want it to be fixed in upstream.  That is why I opened an LLVM bug.
>>
>>
>> > --kcc
>> >
>> > On Wed, Jul 18, 2018 at 8:37 AM H.J. Lu  wrote:
>> >>
>> >> asan/asan_interceptors.cc has
>> >>
>> >> ...
>> >>   int res = REAL(swapcontext)(oucp, ucp);
>> >> ...
>> >>
>> >> REAL(swapcontext) is a function pointer to swapcontext in libc.  Since
>> >> swapcontext may return via indirect branch on x86 when shadow stack is
>> >> enabled, we need to call REAL(swapcontext) with indirect_return attribute
>> >> on x86 so that compiler can insert ENDBR after REAL(swapcontext) call.
>> >>
>> >> I opened an LLVM bug:
>> >>
>> >> https://bugs.llvm.org/show_bug.cgi?id=38207
>> >>
>> >> But it won't get fixed before indirect_return attribute is added to
>> >> LLVM.  I'd like to get it fixed in GCC first.
>> >>
>> >> Tested on i386 and x86-64.  OK for trunk after
>> >>
>> >> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01007.html
>> >>
>> >> is approved?
>> >>
>> >> Thanks.
>> >>
>> >>
>> >> H.J.
>> >> ---
>> >> PR target/86560
>> >> * asan/asan_interceptors.cc (swapcontext): Call REAL(swapcontext)
>> >> with indirect_return attribute on x86.
>> >> ---
>> >>  libsanitizer/asan/asan_interceptors.cc | 6 ++
>> >>  1 file changed, 6 insertions(+)
>> >>
>> >> diff --git a/libsanitizer/asan/asan_interceptors.cc 
>> >> b/libsanitizer/asan/asan_interceptors.cc
>> >> index a8f4b72723f..b8dde4f19c5 100644
>> >> --- a/libsanitizer/asan/asan_interceptors.cc
>> >> +++ b/libsanitizer/asan/asan_interceptors.cc
>> >> @@ -267,7 +267,13 @@ INTERCEPTOR(int, swapcontext, struct ucontext_t 
>> >> *oucp,
>> >>uptr stack, ssize;
>> >>ReadContextStack(ucp, &stack, &ssize);
>> >>ClearShadowMemoryForContextStack(stack, ssize);
>> >> +#if defined(__x86_64__) || defined(__i386__)
>> >> +  int (*real_swapcontext) (struct ucontext_t *, struct ucontext_t *)
>> >> +__attribute__((__indirect_return__)) = REAL(swapcontext);
>> >> +  int res = real_swapcontext(oucp, ucp);
>> >> +#else
>> >>int res = REAL(swapcontext)(oucp, ucp);
>> >> +#endif
>> >>// swapcontext technically does not return, but program may swap 
>> >> context to
>> >>// "oucp" later, that would look as if swapcontext() returned 0.
>> >>// We need to clear shadow for ucp once again, as it may be in 
>> >> arbitrary
>> >> --
>> >> 2.17.1
>> >>
>>
>>
>>
>> --
>> H.J.



-- 
H.J.


Re: [PATCH] Fix _Pragma GCC diagnostic in macro expansions

2018-07-18 Thread David Malcolm
On Wed, 2018-07-04 at 10:53 +, Bernd Edlinger wrote:

Sorry for the delay in reviewing this.

> Hi,
> 
> currently _Pragma("GCC diagnostic ...") does not properly
> work in macro expansions.
> 
> Consider the following code:
> 
> #define B _Pragma("GCC diagnostic push") \
> _Pragma("GCC diagnostic ignored \"-Wattributes\"")
> #define E _Pragma("GCC diagnostic pop")
> 
> #define X() B int __attribute((unknown_attr)) x; E /* { dg-bogus
> "attribute directive ignored" } */
> 
> void test1(void)
> {
> X()  /* { dg-bogus "in expansion of macro" } */
> }
> 
> 
> Warnings happen in C++ despite the _Pragma, while C happens to
> suppress the warnings
> more or less by accident.
> 
> This is connected to the fact that GCC uses the location of the
> closing parenthesis of the
> function-like macro expansion in the _Pragma, while the rest of the
> locations are relative
> to the macro expansion point, which is the letter X in this case.
> 
> This patch changes the location of builtin macros and _Pragma to use
> the macro expansion
> point instead of the closing parenthesis.

While reviewing the various PRs for the testcases this touches, I
noticed:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61817#c3
which has a link to:
  http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1911.htm
which has:

  Suggested Technical Corrigendum

  Add to 6.10.8.1, paragraph 1, item __LINE__:

  The line number of a pp token is implementation defined to
  be the (physical) line number of either the first character
  or the last character of the pp token. The line number of a
  __LINE__ that spans multiple physical lines is implementation
  defined to be either the first line or the last line of that
  __LINE__. The line number of a __LINE__ in a macro body is the
  line number of the macro invocation. The line number of a macro
  invocation that spans multiple (physical or logical) lines is
  implementation defined to be either the line number of the
  first character of the macro name, the last character of the
  macro name or the closing ')' (if there is one).

I don't know the status of that suggested corrigendum, but if I'm
reading it right, changing the location from that of the closing ')' to
that of the macro name would keep us in compliance with that final
sentence (after extracting the location's line number).
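
To make the quoted wording concrete, here is a minimal sketch in C
(LINE_IN_BODY and the function names are made up for illustration, they
are not part of the patch or the testsuite).  It only measures which
line __LINE__ reports relative to the line holding the macro name, so
it does not depend on where the code sits in a file.  For an invocation
that spans two lines, the result may legitimately be 0 (macro name) or
1 (closing parenthesis), depending on the compiler and version.

```c
#include <assert.h>

/* Hypothetical macro: __LINE__ appears only in the macro body.  */
#define LINE_IN_BODY() __LINE__

/* Invocation on a single line: __LINE__ used directly and __LINE__
   coming from the macro body report the same line.  */
int same_line_delta(void)
{
  int direct = __LINE__, via_macro = LINE_IN_BODY();
  return via_macro - direct;
}

/* Invocation spanning two lines: the reported line is either the line
   of the macro name (delta 0) or the line of the closing parenthesis
   (delta 1).  */
int multi_line_delta(void)
{
  int name_line = __LINE__ + 1;   /* line holding LINE_IN_BODY below */
  int reported = LINE_IN_BODY(
  );
  assert(reported >= name_line);
  return reported - name_line;
}
```

With the change under review one would expect multi_line_delta () to
move from 1 to 0 in GCC, but that is my reading of the description
above rather than something I have measured.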

> A few test cases had to be adjusted, most changes were necessary
> because the __LINE__
> location moved to the macro expansion point, which looks like a
> straight improvement.
> 
> In pr61817-2.c the location of __LINE__ depends on
> -ftrack-macro-expansion: when it is enabled, the location of the
> macro argument is the spelling location, while all other
> locations change to the macro expansion point.
> 
> The C++ pragma_plugin.c is also affected by the change, because the
> input_location is now
> the spelling location of _Pragma in DO_PRAGMA and has to be converted
> to the expansion
> point of the macro to get the expected result.
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

Some nits:

> libcpp:
> 2018-07-04  Bernd Edlinger  
> 
> * macro.c (enter_macro_context): Change the location info for builtin
> macros from location of the closing parenthesis to location of the 
> macro
> expansion point.

Please update to say "builtin macros and _Pragma" here, rather than just
"builtin macros".

> testsuite:
> 2018-07-04  Bernd Edlinger  
> 
> * c-c++-common/cpp/diagnostic-pragma-2.c: New test.
> * c-c++-common/pr69558.c: Remove xfail.
> * gcc.dg/cpp/builtin-macro-1.c: Adjust test expectations.
> * gcc.dg/pr61817-1.c: Likewise.
> * gcc.dg/pr61817-2.c: Likewise.
> * g++.dg/plugin/pragma_plugin.c: Warn at expansion_point_location.

Please reference PR 69558 in both ChangeLog entries (given that this
fixes an XFAIL).

OK for trunk with the above changes; thanks.

It looks like with this change we can remove Jakub's r233058 hack.  I
briefly tested with it, and it seems to work.  But that can wait for a
followup.

Dave


Re: [PATCH] Call REAL(swapcontext) with indirect_return attribute on x86

2018-07-18 Thread Kostya Serebryany via gcc-patches
On Wed, Jul 18, 2018 at 11:40 AM H.J. Lu  wrote:
>
> On Wed, Jul 18, 2018 at 11:18 AM, Kostya Serebryany  wrote:
> > What's ENDBR and do we really need to have it in compiler-rt?
>
> When shadow stack from Intel CET is enabled,  the first instruction of all
> indirect branch targets must be a special instruction, ENDBR.  In this case,

I am confused.
CET is a security mitigation feature (and ENDBR is a pretty weak form of such),
while ASAN is a testing tool, rarely used in production and almost
never as a mitigation (which it is not!).
Why would anyone need to combine CET and ASAN in one process?

Also, CET doesn't exist in the hardware yet, at least not publicly available.
Which means there should be no rush (am I wrong?) and we can do things
in the correct order:
implement the Clang/LLVM support, make the compiler-rt change in LLVM,
merge back to GCC.

--kcc

>
> int res = REAL(swapcontext)(oucp, ucp);
>     This function may be
> returned via an indirect branch.
>
> Here compiler must insert ENDBR after call, like
>
> call *bar(%rip)
> endbr64
>
> > As usual, I am opposed to any gcc compiler-rt that bypass upstream.
>
> We want it to be fixed in upstream.  That is why I opened an LLVM bug.
>
>
> > --kcc
> >
> > On Wed, Jul 18, 2018 at 8:37 AM H.J. Lu  wrote:
> >>
> >> asan/asan_interceptors.cc has
> >>
> >> ...
> >>   int res = REAL(swapcontext)(oucp, ucp);
> >> ...
> >>
> >> REAL(swapcontext) is a function pointer to swapcontext in libc.  Since
> >> swapcontext may return via indirect branch on x86 when shadow stack is
> >> enabled, we need to call REAL(swapcontext) with indirect_return attribute
> >> on x86 so that compiler can insert ENDBR after REAL(swapcontext) call.
> >>
> >> I opened an LLVM bug:
> >>
> >> https://bugs.llvm.org/show_bug.cgi?id=38207
> >>
> >> But it won't get fixed before indirect_return attribute is added to
> >> LLVM.  I'd like to get it fixed in GCC first.
> >>
> >> Tested on i386 and x86-64.  OK for trunk after
> >>
> >> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01007.html
> >>
> >> is approved?
> >>
> >> Thanks.
> >>
> >>
> >> H.J.
> >> ---
> >> PR target/86560
> >> * asan/asan_interceptors.cc (swapcontext): Call REAL(swapcontext)
> >> with indirect_return attribute on x86.
> >> ---
> >>  libsanitizer/asan/asan_interceptors.cc | 6 ++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/libsanitizer/asan/asan_interceptors.cc 
> >> b/libsanitizer/asan/asan_interceptors.cc
> >> index a8f4b72723f..b8dde4f19c5 100644
> >> --- a/libsanitizer/asan/asan_interceptors.cc
> >> +++ b/libsanitizer/asan/asan_interceptors.cc
> >> @@ -267,7 +267,13 @@ INTERCEPTOR(int, swapcontext, struct ucontext_t *oucp,
> >>uptr stack, ssize;
> >>ReadContextStack(ucp, &stack, &ssize);
> >>ClearShadowMemoryForContextStack(stack, ssize);
> >> +#if defined(__x86_64__) || defined(__i386__)
> >> +  int (*real_swapcontext) (struct ucontext_t *, struct ucontext_t *)
> >> +__attribute__((__indirect_return__)) = REAL(swapcontext);
> >> +  int res = real_swapcontext(oucp, ucp);
> >> +#else
> >>int res = REAL(swapcontext)(oucp, ucp);
> >> +#endif
> >>// swapcontext technically does not return, but program may swap 
> >> context to
> >>// "oucp" later, that would look as if swapcontext() returned 0.
> >>// We need to clear shadow for ucp once again, as it may be in arbitrary
> >> --
> >> 2.17.1
> >>
>
>
>
> --
> H.J.


Re: [PATCH] Call REAL(swapcontext) with indirect_return attribute on x86

2018-07-18 Thread H.J. Lu
On Wed, Jul 18, 2018 at 11:18 AM, Kostya Serebryany  wrote:
> What's ENDBR and do we really need to have it in compiler-rt?

When shadow stack from Intel CET is enabled,  the first instruction of all
indirect branch targets must be a special instruction, ENDBR.  In this case,

int res = REAL(swapcontext)(oucp, ucp);
    This function may be
returned via an indirect branch.

Here compiler must insert ENDBR after call, like

call *bar(%rip)
endbr64
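
For readers less familiar with swapcontext, the following is a minimal
sketch of why the call site is special.  It is not the sanitizer code:
it assumes a glibc-style <ucontext.h>, all names are made up, and it
does not use the indirect_return attribute at all.  It only shows that
control comes back to the instruction after the swapcontext call
because some other context switched back, not because swapcontext
executed an ordinary return on the same stack.

```c
#include <assert.h>
#include <ucontext.h>

/* Hypothetical names, for illustration only.  */
static ucontext_t main_ctx, other_ctx;
static char other_stack[64 * 1024];
static int visits;

/* Runs on other_stack, then switches back into main_ctx.  */
static void other_entry(void)
{
  visits++;
  swapcontext(&other_ctx, &main_ctx);
}

/* Returns how many times other_entry ran before control came back,
   or -1 if swapcontext reported an error.  */
int swap_and_come_back(void)
{
  visits = 0;
  int rc = getcontext(&other_ctx);
  assert(rc == 0);
  other_ctx.uc_stack.ss_sp = other_stack;
  other_ctx.uc_stack.ss_size = sizeof other_stack;
  other_ctx.uc_link = &main_ctx;
  makecontext(&other_ctx, other_entry, 0);

  /* Execution resumes after this call only once other_entry has
     swapped back to main_ctx.  */
  int res = swapcontext(&main_ctx, &other_ctx);
  return res == 0 ? visits : -1;
}
```

The instruction following that last swapcontext call is the place where
the ENDBR discussed above would have to be.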

> As usual, I am opposed to any gcc compiler-rt that bypass upstream.

We want it to be fixed in upstream.  That is why I opened an LLVM bug.


> --kcc
>
> On Wed, Jul 18, 2018 at 8:37 AM H.J. Lu  wrote:
>>
>> asan/asan_interceptors.cc has
>>
>> ...
>>   int res = REAL(swapcontext)(oucp, ucp);
>> ...
>>
>> REAL(swapcontext) is a function pointer to swapcontext in libc.  Since
>> swapcontext may return via indirect branch on x86 when shadow stack is
>> enabled, we need to call REAL(swapcontext) with indirect_return attribute
>> on x86 so that compiler can insert ENDBR after REAL(swapcontext) call.
>>
>> I opened an LLVM bug:
>>
>> https://bugs.llvm.org/show_bug.cgi?id=38207
>>
>> But it won't get fixed before indirect_return attribute is added to
>> LLVM.  I'd like to get it fixed in GCC first.
>>
>> Tested on i386 and x86-64.  OK for trunk after
>>
>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01007.html
>>
>> is approved?
>>
>> Thanks.
>>
>>
>> H.J.
>> ---
>> PR target/86560
>> * asan/asan_interceptors.cc (swapcontext): Call REAL(swapcontext)
>> with indirect_return attribute on x86.
>> ---
>>  libsanitizer/asan/asan_interceptors.cc | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/libsanitizer/asan/asan_interceptors.cc 
>> b/libsanitizer/asan/asan_interceptors.cc
>> index a8f4b72723f..b8dde4f19c5 100644
>> --- a/libsanitizer/asan/asan_interceptors.cc
>> +++ b/libsanitizer/asan/asan_interceptors.cc
>> @@ -267,7 +267,13 @@ INTERCEPTOR(int, swapcontext, struct ucontext_t *oucp,
>>uptr stack, ssize;
>>ReadContextStack(ucp, &stack, &ssize);
>>ClearShadowMemoryForContextStack(stack, ssize);
>> +#if defined(__x86_64__) || defined(__i386__)
>> +  int (*real_swapcontext) (struct ucontext_t *, struct ucontext_t *)
>> +__attribute__((__indirect_return__)) = REAL(swapcontext);
>> +  int res = real_swapcontext(oucp, ucp);
>> +#else
>>int res = REAL(swapcontext)(oucp, ucp);
>> +#endif
>>// swapcontext technically does not return, but program may swap context 
>> to
>>// "oucp" later, that would look as if swapcontext() returned 0.
>>// We need to clear shadow for ucp once again, as it may be in arbitrary
>> --
>> 2.17.1
>>



-- 
H.J.


Re: [PATCH] Call REAL(swapcontext) with indirect_return attribute on x86

2018-07-18 Thread Kostya Serebryany via gcc-patches
What's ENDBR and do we really need to have it in compiler-rt?

As usual, I am opposed to any gcc compiler-rt changes that bypass upstream.

--kcc

On Wed, Jul 18, 2018 at 8:37 AM H.J. Lu  wrote:
>
> asan/asan_interceptors.cc has
>
> ...
>   int res = REAL(swapcontext)(oucp, ucp);
> ...
>
> REAL(swapcontext) is a function pointer to swapcontext in libc.  Since
> swapcontext may return via indirect branch on x86 when shadow stack is
> enabled, we need to call REAL(swapcontext) with indirect_return attribute
> on x86 so that compiler can insert ENDBR after REAL(swapcontext) call.
>
> I opened an LLVM bug:
>
> https://bugs.llvm.org/show_bug.cgi?id=38207
>
> But it won't get fixed before indirect_return attribute is added to
> LLVM.  I'd like to get it fixed in GCC first.
>
> Tested on i386 and x86-64.  OK for trunk after
>
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01007.html
>
> is approved?
>
> Thanks.
>
>
> H.J.
> ---
> PR target/86560
> * asan/asan_interceptors.cc (swapcontext): Call REAL(swapcontext)
> with indirect_return attribute on x86.
> ---
>  libsanitizer/asan/asan_interceptors.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/libsanitizer/asan/asan_interceptors.cc 
> b/libsanitizer/asan/asan_interceptors.cc
> index a8f4b72723f..b8dde4f19c5 100644
> --- a/libsanitizer/asan/asan_interceptors.cc
> +++ b/libsanitizer/asan/asan_interceptors.cc
> @@ -267,7 +267,13 @@ INTERCEPTOR(int, swapcontext, struct ucontext_t *oucp,
>uptr stack, ssize;
>ReadContextStack(ucp, &stack, &ssize);
>ClearShadowMemoryForContextStack(stack, ssize);
> +#if defined(__x86_64__) || defined(__i386__)
> +  int (*real_swapcontext) (struct ucontext_t *, struct ucontext_t *)
> +__attribute__((__indirect_return__)) = REAL(swapcontext);
> +  int res = real_swapcontext(oucp, ucp);
> +#else
>int res = REAL(swapcontext)(oucp, ucp);
> +#endif
>// swapcontext technically does not return, but program may swap context to
>// "oucp" later, that would look as if swapcontext() returned 0.
>// We need to clear shadow for ucp once again, as it may be in arbitrary
> --
> 2.17.1
>


Re: [wwwdocs] Document new sve-acle-branch

2018-07-18 Thread Gerald Pfeifer
On Wed, 18 Jul 2018, Richard Sandiford wrote:
> I've created a new git branch for developing the SVE ACLE (i.e. 
> intrinsics) implementation.  Is the branches entry below OK to commit?  

Yes, thank you!

(Perhaps ChangeLogs instead of changelogs, but I leave this to you.)

> Although the branch is on git rather than svn, other git branches have 
> also been documented here.

Makes sense.  We probably should (once out of CVS, ahem) rename
that page or split off the part covering the branches.

Gerald


[SVE ACLE] Add initial support for arm_sve.h

2018-07-18 Thread Richard Sandiford
This patch adds the target framework for handling the SVE ACLE,
starting with four functions: svadd, svptrue, svsub and svsubr.

The ACLE has both overloaded and non-overloaded names.  Without
the equivalent of clang's __attribute__((overloadable)), a header
file that declared all functions would need three sets of declarations:

- the non-overloaded forms (used for both C and C++)
- _Generic-based macros to handle overloading in C
- normal overloaded inline functions for C++
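
As a rough illustration of the second item, this is what a
_Generic-based overload looks like in plain C11.  The names demo_add,
demo_add_s32 and demo_add_f32 are invented for this sketch and are not
ACLE functions; a real header would need one such macro per overloaded
name and one association per element type.

```c
#include <assert.h>
#include <stdint.h>

/* Non-overloaded forms (hypothetical names).  */
static inline int32_t demo_add_s32(int32_t a, int32_t b) { return a + b; }
static inline float   demo_add_f32(float a, float b)     { return a + b; }

/* Overloaded form for C: select the function from the type of the
   first argument, then call it.  */
#define demo_add(a, b) \
  _Generic((a), int32_t: demo_add_s32, float: demo_add_f32)((a), (b))

/* Returns 1 if both overloads resolve and compute as expected.  */
int demo_add_check(void)
{
  int32_t i = demo_add((int32_t) 2, (int32_t) 3);
  float f = demo_add(1.5f, 2.0f);
  assert(i == 5);
  return i == 5 && f == 3.5f;
}
```

Passing an argument of a type that has no association is a compile-time
error whose message points into the macro expansion.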

This would likely require a lot of cut-&-paste.  It would probably
also lead to poor diagnostics and be slow to parse.

Another consideration is that some functions require certain arguments
to be integer constant expressions.  We can (sort of) enforce that
for calls to built-in functions using resolve_overloaded_builtin,
but it would be harder to enforce with inline forwarder functions.

For these reasons and others, the patch takes the approach of adding
a pragma that gets the compiler to insert the definitions itself.
This requires a slight variation on the existing lang hooks for
built-in functions, but otherwise it seems to just work.

It was easier to add the support without enumerating every function
at build time.  This in turn meant that it was easier if the SVE
builtins occupied a distinct numberspace from the existing AArch64 ones.
The patch therefore divides the built-in functions codes into "major"
and "minor" codes.  At present the major code is just "general" or "SVE".

For now, the patch is only expected to work for fixed-length SVE.
Some uses of the ACLE do manage to squeak through the front-end
in the normal vector-length agnostic mode, but that's more by
accident than design.  We're planning to work on proper frontend
support for "sizeless" types in parallel with the backend changes.

Other things not handled yet:

- support for the SVE AAPCS
- handling the built-ins correctly when the compiler is invoked
  without SVE enabled (e.g. if SVE is enabled later by a pragma)

Both of these are blockers to merging the support into trunk.

The aim is to make sure when adding a function that the function
produces the expected assembly output for all relevant combinations.
The patch adds a new check-function-bodies test to try to make
that easier.

Tested on aarch64-linux-gnu (with and without SVE) and committed
to aarch64/sve-acle-branch.

Richard




initial-sve-acle.diff.gz
Description: application/gzip


[AArch64] Add support for 16-bit FMOV immediates

2018-07-18 Thread Richard Sandiford
aarch64_float_const_representable_p was still returning false for
HFmode, so we wouldn't use 16-bit FMOV immediate.  E.g. before the
patch:

__fp16 foo (void) { return 0x1.1p-3; }

gave:

	mov	w0, 12352
	fmov	h0, w0

with -march=armv8.2-a+fp16, whereas now it gives:

	fmov	h0, 1.328125e-1

Tested on aarch64-linux-gnu, both with and without SVE.  OK to install?
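
For reference, a small stand-alone sketch of which values qualify.  It
is not the GCC implementation: it simply enumerates the values of the
form +/- (n/16) * 2^r with 16 <= n <= 31 and -3 <= r <= 4, which is my
summary of the 8-bit FMOV immediate encoding, and returns the
corresponding half-precision bit pattern.  The function name is made
up.

```c
#include <assert.h>

/* Returns the IEEE half-precision bit pattern of X if X is
   +/- (n/16) * 2^r with 16 <= n <= 31 and -3 <= r <= 4, else -1.  */
int demo_fmov_imm_fp16_bits(double x)
{
  int sign = x < 0;
  double mag = sign ? -x : x;

  for (int r = -3; r <= 4; r++)
    for (int n = 16; n <= 31; n++)
      {
        /* Scaling by a power of two is exact, so == is safe here.  */
        double v = n / 16.0;
        v = r >= 0 ? v * (1 << r) : v / (1 << -r);
        if (v == mag)
          {
            /* Exponent bias is 15; the 4 fraction bits of the
               immediate are the top 4 of the 10 mantissa bits.  */
            int bits = ((r + 15) << 10) | ((n - 16) << 6);
            assert(bits < 0x8000);
            return sign ? (bits | 0x8000) : bits;
          }
      }
  return -1;
}
```

For 0x1.1p-3 this gives 12352, the constant in the old "mov w0, 12352"
sequence above, and for 17.0 it gives 19520, the constant the old
f16_mov_immediate tests scanned for.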

Richard


2018-07-18  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_float_const_representable_p):
Allow HFmode constants if TARGET_FP_F16INST.

gcc/testsuite/
* gcc.target/aarch64/f16_mov_immediate_1.c: Expect fmov immediate
to be used.
* gcc.target/aarch64/f16_mov_immediate_2.c: Likewise.
* gcc.target/aarch64/f16_mov_immediate_3.c: Force +nofp16.
* gcc.target/aarch64/sve/single_1.c: Expect fmov immediate to be used
for .h.
* gcc.target/aarch64/sve/single_2.c: Likewise.
* gcc.target/aarch64/sve/single_3.c: Likewise.
* gcc.target/aarch64/sve/single_4.c: Likewise.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c	2018-07-18 18:45:26.0 +0100
+++ gcc/config/aarch64/aarch64.c	2018-07-18 18:45:27.025332090 +0100
@@ -14908,8 +14908,8 @@ aarch64_float_const_representable_p (rtx
   if (!CONST_DOUBLE_P (x))
 return false;
 
-  /* We don't support HFmode constants yet.  */
-  if (GET_MODE (x) == VOIDmode || GET_MODE (x) == HFmode)
+  if (GET_MODE (x) == VOIDmode
+  || (GET_MODE (x) == HFmode && !TARGET_FP_F16INST))
 return false;
 
   r = *CONST_DOUBLE_REAL_VALUE (x);
Index: gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_1.c
===
--- gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_1.c  2018-07-18 
18:45:26.0 +0100
+++ gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_1.c  2018-07-18 
18:45:27.025332090 +0100
@@ -44,6 +44,6 @@ __fp16 f5 ()
   return a;
 }
 
-/* { dg-final { scan-assembler-times "mov\tw\[0-9\]+, #?19520"   3 } } 
*/
-/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, 0xbc, lsl 8"  1 } 
} */
-/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, 0x4c, lsl 8"  1 } 
} */
+/* { dg-final { scan-assembler-times {fmov\th[0-9]+, #?1\.7e\+1}  3 } } */
+/* { dg-final { scan-assembler-times {fmov\th[0-9]+, #?-1\.0e\+0} 1 } } */
+/* { dg-final { scan-assembler-times {fmov\th[0-9]+, #?1\.6e\+1}  1 } } */
Index: gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_2.c
===
--- gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_2.c  2018-07-18 
18:45:26.0 +0100
+++ gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_2.c  2018-07-18 
18:45:27.025332090 +0100
@@ -40,6 +40,4 @@ float16_t f3(void)
 /* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, 0x5c, lsl 8" 1 } 
} */
 /* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, 0x7c, lsl 8" 1 } 
} */
 
-/* { dg-final { scan-assembler-times "mov\tw\[0-9\]+, 19520"  1 } 
} */
-/* { dg-final { scan-assembler-times "fmov\th\[0-9\], w\[0-9\]+"  1 } 
} */
-
+/* { dg-final { scan-assembler-times {fmov\th[0-9]+, #?1.7e\+1}   1 } 
} */
Index: gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_3.c
===
--- gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_3.c  2018-07-18 
18:45:26.0 +0100
+++ gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_3.c  2018-07-18 
18:45:27.025332090 +0100
@@ -1,6 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
 
+#pragma GCC target "+nofp16"
+
 __fp16 f4 ()
 {
   __fp16 a = 0.1;
Index: gcc/testsuite/gcc.target/aarch64/sve/single_1.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/single_1.c 2018-07-18 
18:45:26.0 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/single_1.c 2018-07-18 
18:45:27.025332090 +0100
@@ -36,7 +36,7 @@ TEST_LOOP (double, 3.0)
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, #6\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #7\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #8\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, #15360\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfmov\tz[0-9]+\.h, #1\.0e\+0\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfmov\tz[0-9]+\.s, #2\.0e\+0\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfmov\tz[0-9]+\.d, #3\.0e\+0\n} 1 } } */
 
Index: gcc/testsuite/gcc.target/aarch64/sve/single_2.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/single_2.c 2018-07-18 
18:45:26.0 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/single_2.c 2018-

[wwwdocs] Document new sve-acle-branch

2018-07-18 Thread Richard Sandiford
Hi,

I've created a new git branch for developing the SVE ACLE (i.e. intrinsics)
implementation.  Is the branches entry below OK to commit?  Although the
branch is on git rather than svn, other git branches have also been
documented here.

Thanks,
Richard


Index: htdocs/svn.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/svn.html,v
retrieving revision 1.222
diff -u -p -r1.222 svn.html
--- htdocs/svn.html 2 Jun 2018 21:16:11 -   1.222
+++ htdocs/svn.html 18 Jul 2018 14:44:59 -
@@ -394,6 +394,15 @@ the command svn log --stop-on-copy
 Architecture-specific
 
 
+  https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/aarch64/sve-acle-branch";>aarch64/sve-acle-branch
+  This https://gcc.gnu.org/wiki/GitMirror";>Git-only branch is
+  used for collaborative development of the AArch64 SVE ACLE implementation.
+  The branch is based off and merged with trunk.  Please send patches to
+  gcc-patches with an [SVE ACLE] tag in the subject line.
+  There's no need to use changelogs; the changelogs will instead be
+  written when the work is ready to be merged into trunk.  The branch is
+  maintained by Richard Sandiford.
+
   arc-20081210-branch
   The goal of this branch is to make the port to the ARCompact
   architecture available.  This branch is maintained by Joern Rennecke


Re: backporting fix for 85602 to GCC 8

2018-07-18 Thread Martin Sebor

On 07/18/2018 09:09 AM, Franz Sirl wrote:

> On 2018-07-18 at 01:50, Martin Sebor wrote:
>> If there are no objections I'd like to backport the solution
>> for PR 85602 to avoid a class of unnecessary warnings for
>> safe uses of nonstring arrays.  With the release coming up
>> later this week I'll go ahead and commit the patch tomorrow.
>>
>> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=261718
>
> Hi Martin,
>
> and please remember the follow-up fix
>
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=261751


I've committed the 85602 changes to 8-branch.  I also updated
the manual to mention that -Wstringop-truncation is enabled by
-Wall (thanks).  The rest seems out of scope so I'll look into
it for trunk.

Martin
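
For anyone following the thread who has not used the attribute, a
minimal sketch of the kind of code it is meant for.  The struct and
function names are made up, and the DEMO_NONSTRING wrapper is my own
addition so the example still compiles with compilers that do not know
the attribute.

```c
#include <assert.h>
#include <string.h>

/* Use the attribute only where the compiler reports support for it.  */
#if defined(__has_attribute)
# if __has_attribute(nonstring)
#  define DEMO_NONSTRING __attribute__((nonstring))
# endif
#endif
#ifndef DEMO_NONSTRING
# define DEMO_NONSTRING
#endif

struct demo_record
{
  /* Fixed-width field, deliberately not NUL-terminated.  */
  char tag[4] DEMO_NONSTRING;
};

/* Returns 1 if the four bytes were stored as expected.  */
int demo_tag_roundtrip(void)
{
  struct demo_record r;
  /* Fills all four bytes and leaves no room for a terminating NUL;
     this is the pattern -Wstringop-truncation is about.  */
  strncpy(r.tag, "ABCD", sizeof r.tag);
  assert(sizeof r.tag == 4);
  return memcmp(r.tag, "ABCD", sizeof r.tag) == 0;
}
```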



> The patch for PR 85602 makes the extended and enabled-by-Wall string
> warnings (which I like!) complete. There's a warning for the majority of
> cases and for the char-array-without-NUL cases there is the nonstring
> attribute describing it nicely, much better than to turn off the warning
> around such code.
> I know that probably not too many codebases will be affected, but for
> anyone affected the nonstring attribute is a much better way to avoid
> the warnings than to turn it off (and if they turn off the warnings for
> gcc-8 they often won't turn it on again for gcc-9+).
> The nonstring attribute is also the documented way to silence the warnings.
>
> BTW, while re-reading the documentation I noticed some minor omissions,
> I attached a patch (untested). Feel free to commit it (I have no access)
> if you think it's correct.
>
> Franz.
>
>
> 2018-07-12  Franz Sirl  
>
> * invoke.texi (Wstringop-overflow, Wstringop-truncation):
> Mention enabling via -Wall.
> (Wall): Add -Wstringop-overflow=2 and -Wstringop-truncation.






Re: [PATCH] Show valid options for -march and -mtune in --help=target for arm32 (PR driver/83193).

2018-07-18 Thread Thomas Preudhomme
Hi Martin,

Why is this needed when -mfpu does not seem to need it for instance?
Regarding the patch:

> -print "Name(processor_type) Type(enum processor_type)"
> -print "Known ARM CPUs (for use with the -mcpu= and -mtune= options):\n"
> +print "Name(processor_type) Type(enum processor_type) ForceHelp"
> +print "Known ARM CPUs (for use with the -mtune= options):\n"

Why change the text beyond adding ForceHelp?

> +@item ForceHelp
> +This property is optional.  If present, enum values is printed
> +in @option{--help} output.
> +

are printed

Thanks,

Thomas
On Wed, 18 Jul 2018 at 16:50, Martin Liška  wrote:
>
> Hi.
>
> This introduces new ForceHelp option flag that helps to
> print valid option enum values that are not directly
> used as a type of an option.
>
> May I please ask ARM folks to test the patch?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2018-07-18  Martin Liska  
>
> PR driver/83193
> * config/arm/arm-tables.opt: Add ForceHelp flag for
> processor_type and arch_name enum types.
> * config/arm/parsecpu.awk: Likewise.
> * doc/options.texi: Document new flag ForceHelp.
> * opt-read.awk: Parse ForceHelp and set it in construction.
> * optc-gen.awk: Likewise.
> * opts.c (print_filtered_help): Handle force_help option.
> * opts.h (struct cl_enum): New field force_help.
> ---
>  gcc/config/arm/arm-tables.opt | 6 +++---
>  gcc/config/arm/parsecpu.awk   | 6 +++---
>  gcc/doc/options.texi  | 4 
>  gcc/opt-read.awk  | 3 +++
>  gcc/optc-gen.awk  | 3 ++-
>  gcc/opts.c| 3 ++-
>  gcc/opts.h| 3 +++
>  7 files changed, 20 insertions(+), 8 deletions(-)
>
>


Compilation error in simple-object-elf.c

2018-07-18 Thread Eli Zaretskii
Hi,

I've built the pretest of GDB 8.2 with MinGW today, and bumped into a
compilation error in libiberty:

 if [ x"" != x ]; then \
   gcc -c -DHAVE_CONFIG_H -O2 -gdwarf-4 -g3 -D__USE_MINGW_ACCESS  -I. 
-I./../include   -W -Wall -Wwrite-strings -Wc++-compat -Wstrict-prototypes 
-pedantic  -D_GNU_SOURCE   ./simple-object-elf.c -o noasan/simple-object-elf.o; 
\
 else true; fi
 gcc -c -DHAVE_CONFIG_H -O2 -gdwarf-4 -g3 -D__USE_MINGW_ACCESS  -I. 
-I./../include   -W -Wall -Wwrite-strings -Wc++-compat -Wstrict-prototypes 
-pedantic  -D_GNU_SOURCE ./simple-object-elf.c -o simple-object-elf.o
 ./simple-object-elf.c: In function 
'simple_object_elf_copy_lto_debug_sections':
 ./simple-object-elf.c:1284:14: error: 'ENOTSUP' undeclared (first use in 
this function)
*err = ENOTSUP;
   ^~~
 ./simple-object-elf.c:1284:14: note: each undeclared identifier is 
reported only once for each function it appears in

Suggested fix:

2018-07-18  Eli Zaretskii  

* libiberty/simple-object-elf.c (ENOTSUP): If not defined by
  errno.h, redirect to ENOSYS.

--- libiberty/simple-object-elf.c~0 2018-07-04 18:41:59.0 +0300
+++ libiberty/simple-object-elf.c   2018-07-18 18:19:39.286654700 +0300
@@ -22,6 +22,10 @@ Boston, MA 02110-1301, USA.  */
 #include "simple-object.h"
 
 #include <errno.h>
+/* mingw.org's MinGW doesn't have ENOTSUP.  */
+#ifndef ENOTSUP
+# define ENOTSUP ENOSYS
+#endif
 #include 
 
 #ifdef HAVE_STDLIB_H
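
The same fallback, shown as a small stand-alone sketch outside
libiberty.  The function names are made up; the only part taken from
the suggested fix is the #ifndef block.

```c
#include <assert.h>
#include <errno.h>

/* Fall back to ENOSYS on C libraries whose errno.h lacks ENOTSUP.  */
#ifndef ENOTSUP
# define ENOTSUP ENOSYS
#endif

/* Hypothetical helper: report "not supported" through *ERR.  */
int demo_report_unsupported(int *err)
{
  assert(err != 0);
  *err = ENOTSUP;
  return -1;
}

/* Returns the errno value the helper stores.  */
int demo_unsupported_code(void)
{
  int err = 0;
  (void) demo_report_unsupported(&err);
  return err;
}
```

Callers that compare the stored value against ENOTSUP keep working on
both kinds of system, because the macro is defined either way.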



Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics

2018-07-18 Thread Kyrill Tkachov

Hi Richard,

On 18/07/18 16:27, Richard Sandiford wrote:

> Thanks for doing this.
>
> Kyrill  Tkachov  writes:
>> + calc = build_call_expr_internal_loc (input_location, ifn, type,
>> + 2, mvar, convert (type, val));
>
> (indentation looks off)
>
>> diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 
>> b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
>> new file mode 100644
>> index 
>> ..8c8ea063e5d0718dc829c1f5574c5b46040e6786
>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
>> @@ -0,0 +1,9 @@
>> +! { dg-do compile { target aarch64*-*-* } }
>> +! { dg-options "-O2 -fdump-tree-optimized" }
>> +
>> +subroutine fool (a, b, c, d, e, f, g, h)
>> +  real (kind=16) :: a, b, c, d, e, f, g, h
>> +  a = max (a, b, c, d, e, f, g, h)
>> +end subroutine
>> +
>> +! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
>> diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 
>> b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
>> new file mode 100644
>> index 
>> ..92368917fb48e0c468a16d080ab3a9ac842e01a7
>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
>> @@ -0,0 +1,9 @@
>> +! { dg-do compile { target aarch64*-*-* } }
>> +! { dg-options "-O2 -fdump-tree-optimized" }
>> +
>> +subroutine fool (a, b, c, d, e, f, g, h)
>> +  real (kind=16) :: a, b, c, d, e, f, g, h
>> +  a = min (a, b, c, d, e, f, g, h)
>> +end subroutine
>> +
>> +! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }
>
> Do these still pass?  I wouldn't have expected us to use __builtin_fmin*
> and __builtin_fmax* now.
>
> It would be good to have tests that we use ".FMIN" and ".FMAX" for kind=4
> and kind=8 on AArch64, since that's really the end goal here.


Doh, yes. I had spotted that myself after I had sent out the patch.
I've fixed that and the indentation issue in this small revision.

Given Janne's comments I will commit this tomorrow if there are no objections.
This patch should be a conservative improvement. If the Fortran folks decide
to sacrifice the more predictable NaN handling in favour of more optimisation
leeway by using MIN/MAX_EXPR unconditionally we can do that as a follow-up.

Thanks for the help,
Kyrill

2018-07-18  Kyrylo Tkachov  

* trans-intrinsic.c: (gfc_conv_intrinsic_minmax): Emit MIN_MAX_EXPR
or IFN_FMIN/FMAX sequence to calculate the min/max when possible.

2018-07-18  Kyrylo Tkachov  

* gfortran.dg/max_fmax_aarch64.f90: New test.
* gfortran.dg/min_fmin_aarch64.f90: Likewise.
* gfortran.dg/minmax_integer.f90: Likewise.

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..c9b5479740c3f98f906132fda5c252274c4b6edd 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans.h"
 #include "stringpool.h"
 #include "fold-const.h"
+#include "internal-fn.h"
 #include "tree-nested.h"
 #include "stor-layout.h"
 #include "toplev.h"	/* For rest_of_decl_compilation.  */
@@ -3874,14 +3875,15 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
 minmax (a1, a2, a3, ...)
 {
   mvar = a1;
-  if (a2 .op. mvar || isnan (mvar))
-mvar = a2;
-  if (a3 .op. mvar || isnan (mvar))
-mvar = a3;
+  mvar = COMP (mvar, a2)
+  mvar = COMP (mvar, a3)
   ...
-  return mvar
+  return mvar;
 }
- */
+Where COMP is MIN/MAX_EXPR for integral types or when we don't
+care about NaNs, or IFN_FMIN/MAX when the target has support for
+fast NaN-honouring min/max.  When neither holds expand a sequence
+of explicit comparisons.  */
 
 /* TODO: Mismatching types can occur when specific names are used.
These should be handled during resolution.  */
@@ -3891,7 +3893,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3913,77 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-{
-  tree cond, isnan;
 
+  internal_fn ifn = op == GT_EXPR ? IFN_FMAX : IFN_FMIN;
+
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+{
+  tree cond = NULL_TREE;
   val = args[i];
 
   /* Handle absent optional arguments by ignoring the comparison.  */
   if (argexpr->expr->expr_type == EXPR_VARIABLE
 	  && argexpr->expr->symtree->n.sym->attr.optional
 	  && TREE_CODE (val) == INDIRECT_REF)
-	cond = fold_build2_loc (input_location,
+	{
+	  cond = fold_build2_loc (input_location,
 NE_EXPR, logical_type_node,
 TREE_OPERAND (val, 0),
 			build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0));
-  else
-  {
-	cond = NULL_TREE;

[PATCH] Show valid options for -march and -mtune in --help=target for arm32 (PR driver/83193).

2018-07-18 Thread Martin Liška
Hi.

This introduces a new ForceHelp option flag that makes --help print
the valid values of an enum even when the enum is not directly
used as the type of an option.

May I please ask ARM folks to test the patch?
Thanks,
Martin

gcc/ChangeLog:

2018-07-18  Martin Liska  

PR driver/83193
* config/arm/arm-tables.opt: Add ForceHelp flag for
processor_type and arch_name enum types.
* config/arm/parsecpu.awk: Likewise.
* doc/options.texi: Document new flag ForceHelp.
* opt-read.awk: Parse ForceHelp and set it in construction.
* optc-gen.awk: Likewise.
* opts.c (print_filtered_help): Handle force_help option.
* opts.h (struct cl_enum): New field force_help.
---
 gcc/config/arm/arm-tables.opt | 6 +++---
 gcc/config/arm/parsecpu.awk   | 6 +++---
 gcc/doc/options.texi  | 4 
 gcc/opt-read.awk  | 3 +++
 gcc/optc-gen.awk  | 3 ++-
 gcc/opts.c| 3 ++-
 gcc/opts.h| 3 +++
 7 files changed, 20 insertions(+), 8 deletions(-)


diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index eacee746a39..cbaa67385d7 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -21,8 +21,8 @@
 ; .
 
 Enum
-Name(processor_type) Type(enum processor_type)
-Known ARM CPUs (for use with the -mcpu= and -mtune= options):
+Name(processor_type) Type(enum processor_type) ForceHelp
+Known ARM CPUs (for use with the -mtune= options):
 
 EnumValue
 Enum(processor_type) String(arm8) Value( TARGET_CPU_arm8)
@@ -298,7 +298,7 @@ EnumValue
 Enum(processor_type) String(cortex-r52) Value( TARGET_CPU_cortexr52)
 
 Enum
-Name(arm_arch) Type(int)
+Name(arm_arch) Type(int) ForceHelp
 Known ARM architectures (for use with the -march= option):
 
 EnumValue
diff --git a/gcc/config/arm/parsecpu.awk b/gcc/config/arm/parsecpu.awk
index aabe1b0c64c..162712acb0e 100644
--- a/gcc/config/arm/parsecpu.awk
+++ b/gcc/config/arm/parsecpu.awk
@@ -441,8 +441,8 @@ function gen_opt () {
 boilerplate("md")
 
 print "Enum"
-print "Name(processor_type) Type(enum processor_type)"
-print "Known ARM CPUs (for use with the -mcpu= and -mtune= options):\n"
+print "Name(processor_type) Type(enum processor_type) ForceHelp"
+print "Known ARM CPUs (for use with the -mtune= options):\n"
 
 ncpus = split (cpu_list, cpus)
 
@@ -454,7 +454,7 @@ function gen_opt () {
 }
 
 print "Enum"
-print "Name(arm_arch) Type(int)"
+print "Name(arm_arch) Type(int) ForceHelp"
 print "Known ARM architectures (for use with the -march= option):\n"
 
 narchs = split (arch_list, archs)
diff --git a/gcc/doc/options.texi b/gcc/doc/options.texi
index b3ca9f6fce6..1c9abac0b36 100644
--- a/gcc/doc/options.texi
+++ b/gcc/doc/options.texi
@@ -120,6 +120,10 @@ being described by this record.
 This property is required; it says what value (representable as
 @code{int}) should be used for the given string.
 
+@item ForceHelp
+This property is optional.  If present, enum values is printed
+in @option{--help} output.
+
 @item Canonical
 This property is optional.  If present, it says the present string is
 the canonical one among all those with the given value.  Other strings
diff --git a/gcc/opt-read.awk b/gcc/opt-read.awk
index 2072958e6ba..6d2be9e99d7 100644
--- a/gcc/opt-read.awk
+++ b/gcc/opt-read.awk
@@ -89,6 +89,9 @@ BEGIN {
 			enum_index[name] = n_enums
 			enum_unknown_error[name] = unknown_error
 			enum_help[name] = $3
+			enum_force_help[name] = test_flag("ForceHelp", props, "true")
+			if (enum_force_help[name] == "")
+			  enum_force_help[name] = "false"
 			n_enums++
 		}
 		else if ($1 == "EnumValue")  {
diff --git a/gcc/optc-gen.awk b/gcc/optc-gen.awk
index bf177e86330..5c4f4239db0 100644
--- a/gcc/optc-gen.awk
+++ b/gcc/optc-gen.awk
@@ -167,7 +167,8 @@ for (i = 0; i < n_enums; i++) {
 	print "cl_enum_" name "_data,"
 	print "sizeof (" enum_type[name] "),"
 	print "cl_enum_" name "_set,"
-	print "cl_enum_" name "_get"
+	print "cl_enum_" name "_get,"
+	print "" enum_force_help[name]
 	print "  },"
 }
 print "};"
diff --git a/gcc/opts.c b/gcc/opts.c
index b8ae8756b4f..214ef806cd5 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1337,7 +1337,8 @@ print_filtered_help (unsigned int include_flags,
 {
   unsigned int j, pos;
 
-  if (opts->x_help_enum_printed[i] != 1)
+  if (opts->x_help_enum_printed[i] != 1
+	  && !cl_enums[i].force_help)
 	continue;
   if (cl_enums[i].help == NULL)
 	continue;
diff --git a/gcc/opts.h b/gcc/opts.h
index 3723bdbf95b..c8777b3cd6a 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -193,6 +193,9 @@ struct cl_enum
 
   /* Function to get the value of a variable of this type.  */
   int (*get) (const void *var);
+
+  /* Force enum to be printed in help.  */
+  bool force_help;
 };
 
 extern const struct cl_enum cl_enums[];



Re: [PATCH 1/4] Clean up of new format of -falign-FOO.

2018-07-18 Thread Martin Sebor

On 07/04/2018 04:23 AM, marxin wrote:


gcc/ChangeLog:

2018-07-11  Martin Liska  

* align.h: New file.


Martin,

I'm seeing lots of warnings for this file:

/ssd/src/gcc/svn/gcc/align.h:53:32: warning: extended initializer lists 
only available with -std=c++11 or -std=gnu++11


The code that triggers them is:

+struct align_flags
+{
+  /* Default constructor.  */
+  align_flags (int log0 = 0, int maxskip0 = 0, int log1 = 0, int maxskip1 = 0)
+  {
+levels[0] = {log0, maxskip0};
+levels[1] = {log1, maxskip1};
+normalize ();
+  }

This form of assignment isn't valid in C++ 98.
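
One way to keep the constructor valid in C++98 is to assign the members
one by one.  The sketch below is only an illustration: it assumes each
element of levels has two int members called log and maxskip (the element
type is not shown in the quoted hunk, so align_flags_tuple and its member
names are stand-ins), and it leaves out the normalize () call.

```cpp
// Stand-in for the element type of levels[]; the type name and member
// names are assumptions made for this example.
struct align_flags_tuple
{
  int log;
  int maxskip;
};

struct align_flags
{
  /* Default constructor.  */
  align_flags (int log0 = 0, int maxskip0 = 0, int log1 = 0, int maxskip1 = 0)
  {
    // Memberwise assignment instead of "levels[0] = {log0, maxskip0};",
    // which relies on C++11 extended initializer lists.
    levels[0].log = log0;
    levels[0].maxskip = maxskip0;
    levels[1].log = log1;
    levels[1].maxskip = maxskip1;
  }

  align_flags_tuple levels[2];
};
```

This form compiles with -std=c++98 as well as with newer standards.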

Thanks
Martin


[PATCH] Provide extension hint for aarch64 target (PR driver/83193).

2018-07-18 Thread Martin Liška
Hi.

This patch improves aarch64 feature modifier hints.

May I please ask ARM folks to test the patch?
Thanks,
Martin

gcc/ChangeLog:

2018-07-18  Martin Liska  

PR driver/83193
* common/config/aarch64/aarch64-common.c (aarch64_parse_extension):
Set invalid_extension when there's any.
(aarch64_get_all_extension_candidates): New function.
(aarch64_rewrite_selected_cpu): Pass NULL as new argument.
* config/aarch64/aarch64-protos.h 
(aarch64_get_all_extension_candidates):
Declare new function.
* config/aarch64/aarch64.c (aarch64_parse_arch): Record
invalid_feature.
(aarch64_parse_cpu): Likewise.
(aarch64_print_hint_for_feature_modifier): New.
(aarch64_validate_mcpu): Record invalid feature modifier
and print hint for it.
(aarch64_validate_march): Likewise.
(aarch64_handle_attr_arch): Likewise.
(aarch64_handle_attr_cpu): Likewise.
(aarch64_handle_attr_isa_flags): Likewise.

gcc/testsuite/ChangeLog:

2018-07-18  Martin Liska  

PR driver/83193
* gcc.target/aarch64/spellcheck_7.c: New test.
* gcc.target/aarch64/spellcheck_8.c: New test.
---
 gcc/common/config/aarch64/aarch64-common.c| 20 +-
 gcc/config/aarch64/aarch64-protos.h   |  4 +-
 gcc/config/aarch64/aarch64.c  | 67 +++
 .../gcc.target/aarch64/spellcheck_7.c | 11 +++
 .../gcc.target/aarch64/spellcheck_8.c | 12 
 5 files changed, 97 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/spellcheck_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/spellcheck_8.c


diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 292fb818705..c2994514004 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -175,7 +175,8 @@ static const struct arch_to_arch_name all_architectures[] =
aarch64_parse_opt_result describing the result.  */
 
 enum aarch64_parse_opt_result
-aarch64_parse_extension (const char *str, unsigned long *isa_flags)
+aarch64_parse_extension (const char *str, unsigned long *isa_flags,
+			 char **invalid_extension)
 {
   /* The extension string is parsed left to right.  */
   const struct aarch64_option_extension *opt = NULL;
@@ -226,6 +227,11 @@ aarch64_parse_extension (const char *str, unsigned long *isa_flags)
   if (opt->name == NULL)
 	{
 	  /* Extension not found in list.  */
+	  if (invalid_extension)
+	{
+	  *invalid_extension = xstrdup (str);
+	  (*invalid_extension)[len] = '\0';
+	}
 	  return AARCH64_PARSE_INVALID_FEATURE;
 	}
 
@@ -235,6 +241,16 @@ aarch64_parse_extension (const char *str, unsigned long *isa_flags)
   return AARCH64_PARSE_OK;
 }
 
+/* Append all extension candidates and put them to CANDIDATES vector.  */
+
+void
+aarch64_get_all_extension_candidates (auto_vec<const char *> *candidates)
+{
+  const struct aarch64_option_extension *opt;
+  for (opt = all_extensions; opt->name != NULL; opt++)
+candidates->safe_push (opt->name);
+}
+
 /* Return a string representation of ISA_FLAGS.  DEFAULT_ARCH_FLAGS
gives the default set of flags which are implied by whatever -march
we'd put out.  Our job is to figure out the minimal set of "+" and
@@ -322,7 +338,7 @@ aarch64_rewrite_selected_cpu (const char *name)
 fatal_error (input_location, "unknown value %qs for -mcpu", name);
 
   unsigned long extensions = p_to_a->flags;
-  aarch64_parse_extension (extension_str.c_str (), &extensions);
+  aarch64_parse_extension (extension_str.c_str (), &extensions, NULL);
 
   std::string outstr = a_to_an->arch_name
 	+ aarch64_get_extension_string_for_isa_flags (extensions,
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index bc11a781c4b..4db274fb85d 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -550,7 +550,9 @@ bool aarch64_handle_option (struct gcc_options *, struct gcc_options *,
 			 const struct cl_decoded_option *, location_t);
 const char *aarch64_rewrite_selected_cpu (const char *name);
 enum aarch64_parse_opt_result aarch64_parse_extension (const char *,
-		   unsigned long *);
+		   unsigned long *,
+		   char **);
+void aarch64_get_all_extension_candidates (auto_vec<const char *> *candidates);
 std::string aarch64_get_extension_string_for_isa_flags (unsigned long,
 			unsigned long);
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1369704da3e..6fa03e4b091 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10229,7 +10229,7 @@ static void initialize_aarch64_code_model (struct gcc_options *);
 
 static enum aarch64_parse_opt_result
 aarch64_parse_arch (const char *to_parse, const struct processor **res,
-		unsigned long *isa_flags)
+		unsigned long *isa_flags, char **invalid_feat

[PATCH] Print default options selection for -march,-mcpu and -mtune for aarch64 (PR driver/83193).

2018-07-18 Thread Martin Liška
Hi.

This is the aarch64 fix for PR83193. It sets default values for the
-march, -mcpu and -mtune options so that --help=target -Q prints the
values actually in effect.

With the patch, this is the difference seen on my cross-compiler:

--- /home/marxin/Downloads/options-2-before.txt 2018-07-18 14:53:11.658146543 +0200
+++ /home/marxin/Downloads/options-2.txt        2018-07-18 14:52:30.113274284 +0200
@@ -1,10 +1,10 @@
 The following options are target specific:
   -mabi=ABI                   lp64
-  -march=ARCH
+  -march=                     armv8-a
   -mbig-endian                [disabled]
   -mbionic                    [disabled]
   -mcmodel=                   small
-  -mcpu=CPU
+  -mcpu=                      generic
   -mfix-cortex-a53-835769     [enabled]
   -mfix-cortex-a53-843419     [enabled]
   -mgeneral-regs-only         [disabled]
@@ -19,7 +19,7 @@
   -msve-vector-bits=N         scalable
   -mtls-dialect=              desc
   -mtls-size=                 24
-  -mtune=CPU
+  -mtune=                     generic
   -muclibc                    [disabled]

May I please ask ARM folks to test the patch?
Thanks,
Martin

gcc/ChangeLog:

2018-07-18  Martin Liska  

PR driver/83193
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set default values for x_aarch64_*_string strings.
* config/aarch64/aarch64.opt: Remove --{march,mcpu,mtune}==
prefix.
---
 gcc/config/aarch64/aarch64.c   | 7 +++
 gcc/config/aarch64/aarch64.opt | 6 +++---
 2 files changed, 10 insertions(+), 3 deletions(-)


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6fa03e4b091..d48e6278efa 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10713,6 +10713,13 @@ aarch64_override_options_internal (struct gcc_options *opts)
   && opts->x_optimize >= aarch64_tune_params.prefetch->default_opt_level)
 opts->x_flag_prefetch_loop_arrays = 1;
 
+  if (opts->x_aarch64_arch_string == NULL)
+opts->x_aarch64_arch_string = selected_arch->name;
+  if (opts->x_aarch64_cpu_string == NULL)
+opts->x_aarch64_cpu_string = selected_cpu->name;
+  if (opts->x_aarch64_tune_string == NULL)
+opts->x_aarch64_tune_string = selected_tune->name;
+
   aarch64_override_options_after_change_1 (opts);
 }
 
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 1426b45ff0f..7f0b65de37b 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -117,15 +117,15 @@ Enum(aarch64_tls_size) String(48) Value(48)
 
 march=
 Target RejectNegative ToLower Joined Var(aarch64_arch_string)
--march=ARCH	Use features of architecture ARCH.
+Use features of architecture ARCH.
 
 mcpu=
 Target RejectNegative ToLower Joined Var(aarch64_cpu_string)
--mcpu=CPU	Use features of and optimize for CPU.
+Use features of and optimize for CPU.
 
 mtune=
 Target RejectNegative ToLower Joined Var(aarch64_tune_string)
--mtune=CPU	Optimize for CPU.
+Optimize for CPU.
 
 mabi=
 Target RejectNegative Joined Enum(aarch64_abi) Var(aarch64_abi) Init(AARCH64_ABI_DEFAULT)



[PATCH] Call REAL(swapcontext) with indirect_return attribute on x86

2018-07-18 Thread H.J. Lu
asan/asan_interceptors.cc has

...
  int res = REAL(swapcontext)(oucp, ucp);
...

REAL(swapcontext) is a function pointer to swapcontext in libc.  Since
swapcontext may return via an indirect branch on x86 when shadow stack is
enabled, we need to call REAL(swapcontext) with the indirect_return attribute
on x86 so that the compiler can insert ENDBR after the REAL(swapcontext) call.

I opened an LLVM bug:

https://bugs.llvm.org/show_bug.cgi?id=38207

But it won't get fixed before indirect_return attribute is added to
LLVM.  I'd like to get it fixed in GCC first.

Tested on i386 and x86-64.  OK for trunk after

https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01007.html

is approved?

Thanks.


H.J.
---
PR target/86560
* asan/asan_interceptors.cc (swapcontext): Call REAL(swapcontext)
with indirect_return attribute on x86.
---
 libsanitizer/asan/asan_interceptors.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libsanitizer/asan/asan_interceptors.cc 
b/libsanitizer/asan/asan_interceptors.cc
index a8f4b72723f..b8dde4f19c5 100644
--- a/libsanitizer/asan/asan_interceptors.cc
+++ b/libsanitizer/asan/asan_interceptors.cc
@@ -267,7 +267,13 @@ INTERCEPTOR(int, swapcontext, struct ucontext_t *oucp,
   uptr stack, ssize;
   ReadContextStack(ucp, &stack, &ssize);
   ClearShadowMemoryForContextStack(stack, ssize);
+#if defined(__x86_64__) || defined(__i386__)
+  int (*real_swapcontext) (struct ucontext_t *, struct ucontext_t *)
+__attribute__((__indirect_return__)) = REAL(swapcontext);
+  int res = real_swapcontext(oucp, ucp);
+#else
   int res = REAL(swapcontext)(oucp, ucp);
+#endif
   // swapcontext technically does not return, but program may swap context to
   // "oucp" later, that would look as if swapcontext() returned 0.
   // We need to clear shadow for ucp once again, as it may be in arbitrary
-- 
2.17.1



[PATCH 2/3] i386: Change indirect_return to function type attribute

2018-07-18 Thread H.J. Lu
In

struct ucontext;
typedef struct ucontext ucontext_t;

extern int (*bar) (ucontext_t *__restrict __oucp,
   const ucontext_t *__restrict __ucp)
  __attribute__((__indirect_return__));

extern int res;

void
foo (ucontext_t *oucp, ucontext_t *ucp)
{
  res = bar (oucp, ucp);
}

bar() may return via an indirect branch.  This patch changes indirect_return
to a type attribute so that it can be applied to a variable or type of
function pointer, which lets ENDBR be inserted after the call to bar().

Tested on i386 and x86-64.  OK for trunk?

Thanks.


H.J.
---
gcc/

PR target/86560
* config/i386/i386.c (rest_of_insert_endbranch): Lookup
indirect_return as function type attribute.
(ix86_attribute_table): Change indirect_return to function
type attribute.
* doc/extend.texi: Update indirect_return attribute.

gcc/testsuite/

PR target/86560
* gcc.target/i386/pr86560-1.c: New test.
* gcc.target/i386/pr86560-2.c: Likewise.
* gcc.target/i386/pr86560-3.c: Likewise.
---
 gcc/config/i386/i386.c| 23 +++
 gcc/doc/extend.texi   |  5 +++--
 gcc/testsuite/gcc.target/i386/pr86560-1.c | 16 
 gcc/testsuite/gcc.target/i386/pr86560-2.c | 16 
 gcc/testsuite/gcc.target/i386/pr86560-3.c | 17 +
 5 files changed, 67 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr86560-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr86560-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr86560-3.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index aec739c3974..ac27248370b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2627,16 +2627,23 @@ rest_of_insert_endbranch (void)
{
  rtx call = get_call_rtx_from (insn);
  rtx fnaddr = XEXP (call, 0);
+ tree fndecl = NULL_TREE;
 
  /* Also generate ENDBRANCH for non-tail call which
 may return via indirect branch.  */
- if (MEM_P (fnaddr)
- && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF)
+ if (GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF)
+   fndecl = SYMBOL_REF_DECL (XEXP (fnaddr, 0));
+ if (fndecl == NULL_TREE)
+   fndecl = MEM_EXPR (fnaddr);
+ if (fndecl
+ && TREE_CODE (TREE_TYPE (fndecl)) != FUNCTION_TYPE
+ && TREE_CODE (TREE_TYPE (fndecl)) != METHOD_TYPE)
+   fndecl = NULL_TREE;
+ if (fndecl && TYPE_ARG_TYPES (TREE_TYPE (fndecl)))
{
- tree fndecl = SYMBOL_REF_DECL (XEXP (fnaddr, 0));
- if (fndecl
- && lookup_attribute ("indirect_return",
-  DECL_ATTRIBUTES (fndecl)))
+ tree fntype = TREE_TYPE (fndecl);
+ if (lookup_attribute ("indirect_return",
+   TYPE_ATTRIBUTES (fntype)))
need_endbr = true;
}
}
@@ -46101,8 +46108,8 @@ static const struct attribute_spec 
ix86_attribute_table[] =
 ix86_handle_fndecl_attribute, NULL },
   { "function_return", 1, 1, true, false, false, false,
 ix86_handle_fndecl_attribute, NULL },
-  { "indirect_return", 0, 0, true, false, false, false,
-ix86_handle_fndecl_attribute, NULL },
+  { "indirect_return", 0, 0, false, true, true, false,
+NULL, NULL },
 
   /* End element.  */
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8b4d3fd9de3..edeaec6d872 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -5861,8 +5861,9 @@ foo (void)
 @item indirect_return
 @cindex @code{indirect_return} function attribute, x86
 
-The @code{indirect_return} attribute on a function is used to inform
-the compiler that the function may return via indirect branch.
+The @code{indirect_return} attribute can be applied to a function,
+as well as variable or type of function pointer to inform the
+compiler that the function may return via indirect branch.
 
 @end table
 
diff --git a/gcc/testsuite/gcc.target/i386/pr86560-1.c 
b/gcc/testsuite/gcc.target/i386/pr86560-1.c
new file mode 100644
index 000..a2b702695c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr86560-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcf-protection" } */
+/* { dg-final { scan-assembler-times {\mendbr} 2 } } */
+
+struct ucontext;
+
+extern int (*bar) (struct ucontext *)
+  __attribute__((__indirect_return__));
+
+extern int res;
+
+void
+foo (struct ucontext *oucp)
+{
+  res = bar (oucp);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr86560-2.c 
b/gcc/testsuite/gcc.target/i386/pr

Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics

2018-07-18 Thread Richard Sandiford
Thanks for doing this.

Kyrill  Tkachov  writes:
> +   calc = build_call_expr_internal_loc (input_location, ifn, type,
> +   2, mvar, convert (type, val));

(indentation looks off)

> diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 
> b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
> new file mode 100644
> index 
> ..8c8ea063e5d0718dc829c1f5574c5b46040e6786
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
> @@ -0,0 +1,9 @@
> +! { dg-do compile { target aarch64*-*-* } }
> +! { dg-options "-O2 -fdump-tree-optimized" }
> +
> +subroutine fool (a, b, c, d, e, f, g, h)
> +  real (kind=16) :: a, b, c, d, e, f, g, h
> +  a = max (a, b, c, d, e, f, g, h)
> +end subroutine
> +
> +! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
> diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 
> b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
> new file mode 100644
> index 
> ..92368917fb48e0c468a16d080ab3a9ac842e01a7
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
> @@ -0,0 +1,9 @@
> +! { dg-do compile { target aarch64*-*-* } }
> +! { dg-options "-O2 -fdump-tree-optimized" }
> +
> +subroutine fool (a, b, c, d, e, f, g, h)
> +  real (kind=16) :: a, b, c, d, e, f, g, h
> +  a = min (a, b, c, d, e, f, g, h)
> +end subroutine
> +
> +! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }

Do these still pass?  I wouldn't have expected us to use __builtin_fmin*
and __builtin_fmax* now.

It would be good to have tests that we use ".FMIN" and ".FMAX" for kind=4
and kind=8 on AArch64, since that's really the end goal here.

Thanks,
Richard


Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics

2018-07-18 Thread Janne Blomqvist
On Wed, Jul 18, 2018 at 4:26 PM, Thomas König  wrote:

> Hi Kyrill,
>
> > On 18.07.2018 at 13:17, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:
> >
> > Thomas, Janne, would this relaxation of NaN handling be acceptable given
> > the benefits mentioned above? If so, what would be the recommended
> > adjustment to the nan_1.f90 test?
>
> I would be a bit careful about changing behavior in such a major way. What
> would the results with NaN and infinity then be, with or without
> optimization? Would the results be consistent with min(nan,num) vs
> min(num,nan)? Would they be consistent with the new IEEE standard?
>

AFAIU, MIN/MAX_EXPR do the right thing when comparing a normal number with
Inf. For NaN the result is undefined, and you might indeed have

min(a, NaN) = a
min(NaN, a) = NaN

where "a" is a normal number.

(I think that happens at least on x86 if MIN_EXPR is expanded to
minsd/minpd.)

Apparently what the proper result for min(a, NaN) should be is contentious
enough that minnum was removed from the upcoming IEEE 754 revision, and new
operations AFAICS have the semantics

minimum(a, NaN) = minimum(NaN, a) = NaN
minimumNumber(a, NaN) = minimumNumber(NaN, a) = a

That is minimumNumber corresponds to minnum in IEEE 754-2008 and fmin* in
C, and to the current behavior of gfortran.
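
As a rough illustration of the three behaviours discussed above, here is
a small C++ sketch.  The function names minimum_number, minimum_nan and
min_by_compare are made up for this example, and min_by_compare only
models a compare-and-select style expansion; it is not a claim about what
GCC emits for MIN_EXPR.  Which operand such a sequence returns for a NaN
input depends on how the comparison is written.

```cpp
#include <cmath>
#include <limits>

// Hypothetical name: fmin-style semantics (minnum in IEEE 754-2008).
// A single NaN operand is ignored and the number is returned.
double minimum_number (double a, double b)
{
  return std::fmin (a, b);
}

// Hypothetical name: NaN-propagating semantics, as described above for
// "minimum".  Any NaN operand yields NaN.
double minimum_nan (double a, double b)
{
  if (std::isnan (a) || std::isnan (b))
    return std::numeric_limits<double>::quiet_NaN ();
  return a < b ? a : b;
}

// Hypothetical name: plain compare-and-select.  Every comparison with a
// NaN is false, so the second operand comes back whenever either input
// is a NaN, which makes the result depend on operand order.
double min_by_compare (double a, double b)
{
  return a < b ? a : b;
}
```

With these definitions min_by_compare (1.0, NaN) is a NaN while
min_by_compare (NaN, 1.0) is 1.0; the other two functions give the same
answer regardless of operand order.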


> In general, I think that min(nan,num) should be nan and that our current
> behavior is not the best.
>

There was some extensive discussion of that in the Julia bug report I
linked to in an earlier message, and they came to the same conclusion and
changed their behavior.


> Does anybody have data points on how this is handled by other compilers?
>

The only other compiler I have access to at the moment is ifort (and not
the latest version), but maybe somebody has access to a wider variety?


> Oh, and if anything is changed, then compile and runtime behavior should
> always be the same.
>

Well, IFF we place some weight on the runtime behavior being particularly
sensible wrt NaN's, which it wouldn't be if we just use a plain
MIN/MAX_EXPR. Is it worth taking a performance hit for, though? In
particular, if other compilers are inconsistent, we might as well do
whatever is fastest.


-- 
Janne Blomqvist


Re: backporting fix for 85602 to GCC 8

2018-07-18 Thread Franz Sirl

Am 2018-07-18 um 01:50 schrieb Martin Sebor:

If there are no objections I'd like to backport the solution
for PR 85602 to avoid a class of unnecessary warnings for
safe uses of nonstring arrays.  With the release coming up
later this week I'll go ahead and commit the patch tomorrow.

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=261718


Hi Martin,

and please remember the follow-up fix

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=261751

The patch for PR 85602 makes the extended and enabled-by-Wall string 
warnings (which I like!) complete. There's a warning for the majority of 
cases and for the char-array-without-NUL cases there is the nonstring 
attribute describing it nicely, much better than to turn off the warning 
around such code.
I know that probably not too many codebases will be affected, but for 
anyone affected the nonstring attribute is a much better way to avoid 
the warnings than turning them off (and if they turn off the warnings for 
gcc-8 they often won't turn them on again for gcc-9+).

The nonstring attribute is also the documented way to silence the warnings.

BTW, while re-reading the documentation I noticed some minor omissions, 
I attached a patch (untested). Feel free to commit it (I have no access) 
if you think it's correct.


Franz.


2018-07-12  Franz Sirl  

* invoke.texi (Wstringop-overflow, Wstringop-truncation):
Mention enabling via -Wall.
(Wall): Add -Wstringop-overflow=2 and -Wstringop-truncation.


Index: invoke.texi
===
diff --git a/trunk/gcc/doc/invoke.texi b/trunk/gcc/doc/invoke.texi
--- a/trunk/gcc/doc/invoke.texi (revision 262850)
+++ b/trunk/gcc/doc/invoke.texi (working copy)
@@ -3992,6 +3992,8 @@
 -Wsizeof-pointer-memaccess @gol
 -Wstrict-aliasing  @gol
 -Wstrict-overflow=1  @gol
+-Wstringop-overflow=2  @gol
+-Wstringop-truncation  @gol
 -Wswitch  @gol
 -Wtautological-compare  @gol
 -Wtrigraphs  @gol
@@ -5318,7 +5320,7 @@
 @}
 @end smallexample
 
-Option @option{-Wstringop-overflow=2} is enabled by default.
+Option @option{-Wstringop-overflow=2} is enabled by @option{-Wall}.
 
 @table @gcctabopt
 @item -Wstringop-overflow
@@ -5416,6 +5418,8 @@
 such arrays GCC issues warnings unless it can prove that the use is
 safe.  @xref{Common Variable Attributes}.
 
+Option @option{-Wstringop-truncation} is enabled by @option{-Wall}.
+
 @item 
-Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{|}cold@r{|}malloc@r{]}
 @opindex Wsuggest-attribute=
 @opindex Wno-suggest-attribute=


Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics

2018-07-18 Thread Janne Blomqvist
On Wed, Jul 18, 2018 at 5:03 PM, Kyrill Tkachov  wrote:

>
> On 18/07/18 14:26, Thomas König wrote:
>
>> Hi Kyrill,
>>
>>> On 18.07.2018 at 13:17, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:
>>>
>>> Thomas, Janne, would this relaxation of NaN handling be acceptable given
>>> the benefits mentioned above? If so, what would be the recommended
>>> adjustment to the nan_1.f90 test?
>>>
>> I would be a bit careful about changing behavior in such a major way.
>> What would the results with NaN and infinity then be, with or without
>> optimization? Would the results be consistent with min(nan,num) vs
>> min(num,nan)? Would they be consistent with the new IEEE standard?
>>
>> In general, I think that min(nan,num) should be nan and that our current
>> behavior is not the best.
>>
>> Does anybody have data points on how this is handled by other compilers?
>>
>> Oh, and if anything is changed, then compile and runtime behavior should
>> always be the same.
>>
>
> Thanks, that makes it clearer what behaviour is accceptable.
>
> So this v3 patch follows Richard Sandiford's suggested approach of
> emitting IFN_FMIN/FMAX
> when dealing with floating-point values and NaN handling is important and
> the target
> supports the IFN_FMIN/FMAX. Otherwise the current explicit comparison
> sequence is emitted.
> For integer types and -ffast-math floating-point it will emit MIN/MAX_EXPR.
>
> With this patch the nan_1.f90 behaviour is preserved on all targets, we
> get the optimal
> sequence on aarch64 and on x86_64 we avoid the function call, with no
> changes in code generation.
>
> This gives the performance improvement on 521.wrf on aarch64 and leaves it
> unchanged on x86_64.
>
> I'm hoping this addresses all the concerns raised in this thread:
> * The NaN-handling behaviour is unchanged on all platforms.
> * The fast inline sequence is emitted where it is available.
> * No calls to library fmin*/fmax* are emitted where there were none.
> * MIN/MAX_EXPR sequences are emitted where possible.
>
> Is this acceptable?
>

So if I understand it correctly, the "internal fn" thing is a mechanism
for checking whether the target supports expanding a builtin inline
or whether it requires a call to an external library function?

If so, then yes, Ok, thanks for the patch!


-- 
Janne Blomqvist


Re: [PATCH] fix a couple of bugs in const string folding (PR 86532)

2018-07-18 Thread Martin Sebor

On 07/18/2018 02:31 AM, Richard Biener wrote:

On Tue, 17 Jul 2018, Martin Sebor wrote:


The attached update takes care of a couple of problems pointed
out by Bernd Edlinger in his comments on the bug.  The ICE he
mentioned in comment #20 was due to mixing sizetype, ssizetype,
and size_type_node in c_strlen().  AFAICS, some of it predates
the patch but my changes made it worse and also managed to trigger
it.


+has no internal zero bytes.  If the offset falls within the
bounds
+of the string subtract the offset from the length of the string,
+and return that.  Otherwise the length is zero.  Take care to
+use SAVE_EXPR in case the OFFSET has side-effects.  */
+  tree offsave = TREE_SIDE_EFFECTS (byteoff) ? save_expr (byteoff) :
byteoff;
+  offsave = fold_convert (ssizetype, offsave);
+  tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node,
offsave,
+ build_int_cst (ssizetype, len *
eltsize));
+  tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
offsave);
+  return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
+ build_zero_cst (ssizetype));

in what case are you expecting to return an actual COND_EXPR and
why is that useful?


It's necessary to correctly handle strings with multiple trailing
nuls, like in:

  const char a[8] = "123";
  int f (int i)
  {
return strlen (a + i);
  }

If (i <= 3) then the length is 3 - i.  If it's greater than 3 then
the length is zero.  I'd expect such strings to be quite common,
even pervasive, in the case of multidimensional arrays or arrays
of structs with array members.  (Probably less so in plain one-
dimensional arrays like the one above.)
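
As a rough sketch of what the folded expression computes for the array
above (an illustration in plain C, not the GCC implementation; the helper
names folded_len and runtime_len are hypothetical): the result is the
remaining length while the offset is within the string, and zero once the
offset lands in the trailing nuls.

```c
#include <stddef.h>
#include <string.h>

/* Same array as in the example above: "123" followed by five nul bytes.  */
static const char a[8] = "123";

/* Hypothetical helper mirroring the shape of the conditional: for an
   offset that stays inside the array (0 <= i < 8) the length is 3 - i
   while i <= 3, and zero afterwards.  */
size_t
folded_len (int i)
{
  return i <= 3 ? (size_t) (3 - i) : 0;
}

/* Reference result computed at run time, valid for 0 <= i < 8.  */
size_t
runtime_len (int i)
{
  return strlen (a + i);
}
```

For every in-bounds offset the two functions agree, which is the property
the conditional is meant to preserve.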


You return a signed value but bother to
guard it so it is never less than zero.  Why?  Why not simply
return the difference as you did before but with the side-effects
properly handled?


Hopefully the above answers this question (if there's a way
to do it in a more straightforward way please let me know).

FWIW, as I said in bug 86434, I think this folding is premature
and prevents other optimizations that I suspect would be more
profitable.  I'm only preserving it here for now but at some
point I hope we can agree to defer it until later when more
information about the offset is known and when it will
benefit other optimizations.  I read your comments on the bug
and I'll see if it's possible to have it both ways.

Martin




On 07/17/2018 09:19 AM, Martin Sebor wrote:

My enhancement to extract constant strings out of complex
aggregates committed last week introduced a couple of bugs in
dealing with non-constant indices and offsets.  One of the bugs
was fixed earlier today (PR 86528) but another one remains.  It
causes strlen (among other functions) to incorrectly fold
expressions involving a non-constant index into an array of
strings by treating the index the same as a non-constant
offset into it.

The non-constant index should either prevent the folding, or it
needs to handle it differently from an offset.

The attached patch takes the conservative approach of avoiding
the folding in this case.  The remaining changes deal with
the fallout from the fix.

Tested on x86_64-linux.

Martin










Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics

2018-07-18 Thread Kyrill Tkachov


On 18/07/18 14:26, Thomas König wrote:

Hi Kyrill,


Am 18.07.2018 um 13:17 schrieb Kyrill Tkachov :

Thomas, Janne, would this relaxation of NaN handling be acceptable given the 
benefits
mentioned above? If so, what would be the recommended adjustment to the 
nan_1.f90 test?

I would be a bit careful about changing behavior in such a major way. What 
would the results with NaN and infinity then be, with or without optimization? 
Would the results be consistent with min(nan,num) vs min(num,nan)? Would they 
be consistent with the new IEEE standard?

In general, I think that min(nan,num) should be nan and that our current 
behavior is not the best.

Does anybody have data points on how this is handled by other compilers?

Oh, and if anything is changed, then compile and runtime behavior should always 
be the same.


Thanks, that makes it clearer what behaviour is acceptable.

So this v3 patch follows Richard Sandiford's suggested approach of emitting 
IFN_FMIN/FMAX
when dealing with floating-point values and NaN handling is important and the 
target
supports the IFN_FMIN/FMAX. Otherwise the current explicit comparison sequence 
is emitted.
For integer types and -ffast-math floating-point it will emit MIN/MAX_EXPR.
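
To make the NaN behaviour that has to be preserved concrete, here is a
small C sketch of the explicit comparison sequence (an illustration only,
not the code the frontend generates; min_explicit is a hypothetical name).
A NaN held in the running minimum is replaced by the next argument, so
NaN is returned only when every argument is NaN.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: minimum of n >= 1 values using the explicit
   sequence "if (a .op. mvar || isnan (mvar)) mvar = a".  */
double
min_explicit (const double *args, size_t n)
{
  double mvar = args[0];
  for (size_t i = 1; i < n; i++)
    if (args[i] < mvar || isnan (mvar))
      mvar = args[i];
  return mvar;
}
```

With this sequence the NaN is ignored whether it comes first or later in
the argument list, which is the property nan_1.f90 relies on.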

With this patch the nan_1.f90 behaviour is preserved on all targets, we get the 
optimal
sequence on aarch64 and on x86_64 we avoid the function call, with no changes 
in code generation.

This gives the performance improvement on 521.wrf on aarch64 and leaves it 
unchanged on x86_64.

I'm hoping this addresses all the concerns raised in this thread:
* The NaN-handling behaviour is unchanged on all platforms.
* The fast inline sequence is emitted where it is available.
* No calls to library fmin*/fmax* are emitted where there were none.
* MIN/MAX_EXPR sequences are emitted where possible.

Is this acceptable?

Thanks,
Kyrill

2018-07-18  Kyrylo Tkachov  

* trans-intrinsic.c (gfc_conv_intrinsic_minmax): Emit MIN/MAX_EXPR
or IFN_FMIN/FMAX sequence to calculate the min/max when possible.

2018-07-18  Kyrylo Tkachov  

* gfortran.dg/max_fmaxl_aarch64.f90: New test.
* gfortran.dg/min_fminl_aarch64.f90: Likewise.
* gfortran.dg/minmax_integer.f90: Likewise.
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..6f5700f2a421d2a735d77c4c4ec0c4c9c058e727 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans.h"
 #include "stringpool.h"
 #include "fold-const.h"
+#include "internal-fn.h"
 #include "tree-nested.h"
 #include "stor-layout.h"
 #include "toplev.h"	/* For rest_of_decl_compilation.  */
@@ -3874,14 +3875,15 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
 minmax (a1, a2, a3, ...)
 {
   mvar = a1;
-  if (a2 .op. mvar || isnan (mvar))
-mvar = a2;
-  if (a3 .op. mvar || isnan (mvar))
-mvar = a3;
+  mvar = COMP (mvar, a2)
+  mvar = COMP (mvar, a3)
   ...
-  return mvar
+  return mvar;
 }
- */
+Where COMP is MIN/MAX_EXPR for integral types or when we don't
+care about NaNs, or IFN_FMIN/MAX when the target has support for
+fast NaN-honouring min/max.  When neither holds expand a sequence
+of explicit comparisons.  */
 
 /* TODO: Mismatching types can occur when specific names are used.
These should be handled during resolution.  */
@@ -3891,7 +3893,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3913,77 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-{
-  tree cond, isnan;
 
+  internal_fn ifn = op == GT_EXPR ? IFN_FMAX : IFN_FMIN;
+
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+{
+  tree cond = NULL_TREE;
   val = args[i];
 
   /* Handle absent optional arguments by ignoring the comparison.  */
   if (argexpr->expr->expr_type == EXPR_VARIABLE
 	  && argexpr->expr->symtree->n.sym->attr.optional
 	  && TREE_CODE (val) == INDIRECT_REF)
-	cond = fold_build2_loc (input_location,
+	{
+	  cond = fold_build2_loc (input_location,
 NE_EXPR, logical_type_node,
 TREE_OPERAND (val, 0),
 			build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0));
-  else
-  {
-	cond = NULL_TREE;
-
+	}
+  else if (!VAR_P (val) && !TREE_CONSTANT (val))
 	/* Only evaluate the argument once.  */
-	if (!VAR_P (val) && !TREE_CONSTANT (val))
-	  val = gfc_evaluate_now (val, &se->pre);
-  }
+	val = gfc_evaluate_now (val, &se->pre);
 
-  thencase = build2_v (MODIFY_EXPR, mvar, convert (type, val));
+  tree calc;
+  /* If we de

Re: [PATCH][debug] Handle references to skipped params in remap_ssa_name

2018-07-18 Thread Tom de Vries
On 07/06/2018 12:28 PM, Richard Biener wrote:
> On Thu, Jul 5, 2018 at 4:12 PM Tom de Vries  wrote:
>>
>> On 07/05/2018 01:39 PM, Richard Biener wrote:
>>> On Thu, Jul 5, 2018 at 1:25 PM Tom de Vries  wrote:

 [ was: Re: [testsuite/guality, committed] Prevent optimization of local in
 vla-1.c ]

 On Wed, Jul 04, 2018 at 02:32:27PM +0200, Tom de Vries wrote:
> On 07/03/2018 11:05 AM, Tom de Vries wrote:
>> On 07/02/2018 10:16 AM, Jakub Jelinek wrote:
>>> On Mon, Jul 02, 2018 at 09:44:04AM +0200, Richard Biener wrote:
 Given the array has size i + 1 its upper bound should be 'i' and 'i'
 should be available via DW_OP_[GNU_]entry_value.

 I see it is

 <175>   DW_AT_upper_bound : 10 byte block: 75 1 8 20 24 8 20 26 31
 1c   (DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl;
 DW_OP_const1u: 32; DW_OP_shra; DW_OP_lit1; DW_OP_minus)

 and %rdi is 1.  Not sure why gdb fails to print its length.  Yes, the
 storage itself doesn't have a location but the
 type specifies the size.

 (gdb) ptype a
 type = char [variable length]
 (gdb) p sizeof(a)
 $3 = 0

 this looks like a gdb bug to me?

>>
>> With gdb patch:
>> ...
>> diff --git a/gdb/findvar.c b/gdb/findvar.c
>> index 8ad5e25cb2..ebaff923a1 100644
>> --- a/gdb/findvar.c
>> +++ b/gdb/findvar.c
>> @@ -789,6 +789,8 @@ default_read_var_value
>>break;
>>
>>  case LOC_OPTIMIZED_OUT:
>> +  if (is_dynamic_type (type))
>> +   type = resolve_dynamic_type (type, NULL,
>> +/* Unused address.  */ 0);
>>return allocate_optimized_out_value (type);
>>
>>  default:
>> ...
>>
>> I get:
>> ...
>> $ ./gdb -batch -ex "b f1" -ex "r" -ex "p sizeof (a)" vla-1.exe
>> Breakpoint 1 at 0x4004a8: file vla-1.c, line 17.
>>
>> Breakpoint 1, f1 (i=i@entry=5) at vla-1.c:17
>> 17return a[0];
>> $1 = 6
>> ...
>>
>
> Well, for -O1 and -O2.
>
> For O3, I get instead:
> ...
> $ ./gdb vla-1.exe -q -batch -ex "b f1" -ex "run" -ex "p sizeof (a)"
> Breakpoint 1 at 0x4004b0: f1. (2 locations)
>
> Breakpoint 1, f1 (i=5) at vla-1.c:17
> 17return a[0];
> $1 = 0
> ...
>

 Hi,

 When compiling guality/vla-1.c with -O3 -g, vla 'a[i + 1]' in f1 is 
 optimized
 away, but f1 still contains a debug expression describing the upper bound 
 of the
 vla (D.1914):
 ...
  __attribute__((noinline))
  f1 (intD.6 iD.1900)
  {

saved_stack.1_2 = __builtin_stack_save ();
# DEBUG BEGIN_STMT
# DEBUG D#3 => i_1(D) + 1
# DEBUG D#2 => (long intD.8) D#3
# DEBUG D#1 => D#2 + -1
# DEBUG D.1914 => (sizetype) D#1
 ...

 Then f1 is cloned to a version f1.constprop with no parameters, eliminating
 parameter i, and 'DEBUG D#3 => i_1(D) + 1' turns into 'D#3 => NULL'.
 Consequently, 'print sizeof (a)' yields '0' in gdb.
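
For reference, a minimal sketch of the C-level semantics under discussion
(an illustration, not the actual vla-1.c test; vla_size is a hypothetical
name). sizeof applied to a variable-length array is evaluated at run time,
so for 'a[i + 1]' with i = 5 the program itself sees 6, the value one
would like gdb to report whenever the bound is still available.

```c
#include <stddef.h>

/* Hypothetical helper: size in bytes of a VLA with i + 1 char elements.
   Requires i >= 0 and a compiler with VLA support (C99, or C11 and later
   where VLAs are an optional feature).  */
size_t
vla_size (int i)
{
  char a[i + 1];
  a[0] = 0;          /* keep the array in use */
  return sizeof (a); /* evaluated at run time for a VLA */
}
```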
>>>
>>> So does gdb correctly recognize there isn't any size available or do we 
>>> somehow
>>> generate invalid debug info, not recognizing that D#3 => NULL means
>>> "optimized out" and thus all dependent expressions are "optimized out" as 
>>> well?
>>>
>>> That is, shouldn't gdb do
>>>
>>> (gdb) print sizeof (a)
>>> 
>>>
>>> ?
>>
>> The type for the vla that gcc is emitting is a DW_TAG_array_type with
>> DW_TAG_subrange_type without DW_AT_upper_bound or DW_AT_count, which
>> makes the upper bound value 'unknown'. So I'd say the debug info is valid.
> 
> OK, that sounds reasonable.  I wonder if languages like Ada have a way
> to declare an array type with unknown upper bound but known lower bound.
> For
> 
> typedef int arr[];
> arr *x;
> 
> we generate just
> 
>  <1><2d>: Abbrev Number: 2 (DW_TAG_typedef)
> <2e>   DW_AT_name: arr
> <32>   DW_AT_decl_file   : 1
> <33>   DW_AT_decl_line   : 1
> <34>   DW_AT_decl_column : 13
> <35>   DW_AT_type: <0x39>
>  <1><39>: Abbrev Number: 3 (DW_TAG_array_type)
> <3a>   DW_AT_type: <0x44>
> <3e>   DW_AT_sibling : <0x44>
>  <2><42>: Abbrev Number: 4 (DW_TAG_subrange_type)
>  <2><43>: Abbrev Number: 0
> 
> which does
> 
> (gdb) ptype arr
> type = int []
> (gdb) ptype x
> type = int (*)[]
> (gdb) p sizeof (arr)
> $1 = 0
> 
> so I wonder whether the patch makes it print 
> instead?  I think both 0 and  are less than ideal
> and maybe  would be better.  In the type case
> above it's certainly not "optimized out".
> 

I ran into trouble with the earlier posted gdb patch, in
gdb/testsuite/gdb.fortran/vla-sizeof.exp.

I've submitted a second try to gdb-patches ("[PATCH][exp] Interpret size
of vla with unknown size as " at
https://sou

Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics

2018-07-18 Thread Thomas König
Hi Kyrill,

> Am 18.07.2018 um 13:17 schrieb Kyrill Tkachov :
> 
> Thomas, Janne, would this relaxation of NaN handling be acceptable given the 
> benefits
> mentioned above? If so, what would be the recommended adjustment to the 
> nan_1.f90 test?

I would be a bit careful about changing behavior in such a major way. What 
would the results with NaN and infinity then be, with or without optimization? 
Would the results be consistent with min(nan,num) vs min(num,nan)? Would they 
be consistent with the new IEEE standard?

In general, I think that min(nan,num) should be nan and that our current 
behavior is not the best.

Does anybody have data points on how this is handled by other compilers?

Oh, and if anything is changed, then compile and runtime behavior should always 
be the same.

Regards, Thomas

[Patch, avr, PR85624] - Fix ICE when initializing 128-byte aligned array

2018-07-18 Thread Senthil Kumar Selvaraj
Hi,

The below patch fixes an ICE for the avr target when the setmemhi
expander is involved.

The setmemhi expander generated RTL ends up as an unrecognized insn
if the alignment of the destination exceeds that of a QI
mode const_int (127), AND the number of bytes to set fits in a QI
mode const_int. The second condition prevents *clrmemhi from matching,
and *clrmemqi does not match because it expects operand 3 (the alignment
const_int rtx) to be QI mode, and a value of 128 or greater does not fit.
  
The patch fixes this by changing the *clrmemqi pattern to match a HI
mode const_int, and also adds a testcase.
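
The range issue can be sketched in plain C (an illustration only;
fits_signed is a hypothetical helper, not a GCC predicate): a signed
8-bit quantity tops out at 127, so an alignment of 128 needs a wider
mode such as a 16-bit one.

```c
#include <stdbool.h>

/* Hypothetical helper: does VALUE fit in a two's complement signed
   integer of BITS bits?  Valid for 2 <= BITS <= 32.  */
bool
fits_signed (long long value, int bits)
{
  long long max = (1LL << (bits - 1)) - 1;
  long long min = -max - 1;
  return value >= min && value <= max;
}
```

Here fits_signed (128, 8) is false while fits_signed (128, 16) is true,
which matches the move from a QI to an HI operand described above.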

Regression test showed no new failures, ok to commit to trunk?

Regards
Senthil

gcc/ChangeLog:

2018-07-18  Senthil Kumar Selvaraj  

PR target/85624
* config/avr/avr.md (*clrmemqi): Change mode of operands[2]
from QI to HI.

gcc/testsuite/ChangeLog:

2018-07-18  Senthil Kumar Selvaraj  

PR target/85624
* gcc.target/avr/pr85624.c: New test.

diff --git gcc/config/avr/avr.md gcc/config/avr/avr.md
index e619e695418..644e3cfabc5 100644
--- gcc/config/avr/avr.md
+++ gcc/config/avr/avr.md
@@ -1095,7 +1095,7 @@
   [(set (mem:BLK (match_operand:HI 0 "register_operand" "e"))
 (const_int 0))
(use (match_operand:QI 1 "register_operand" "r"))
-   (use (match_operand:QI 2 "const_int_operand" "n"))
+   (use (match_operand:HI 2 "const_int_operand" "n"))
(clobber (match_scratch:HI 3 "=0"))
(clobber (match_scratch:QI 4 "=&1"))]
   ""
diff --git gcc/testsuite/gcc.target/avr/pr85624.c 
gcc/testsuite/gcc.target/avr/pr85624.c
new file mode 100644
index 000..ede2e80216a
--- /dev/null
+++ gcc/testsuite/gcc.target/avr/pr85624.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+/* This testcase exposes PR85624. An alignment directive with
+   a value greater than 127 on an array with dimensions that fit
+   QImode causes an 'unrecognizable insn' ICE. Turns out clrmemqi
+   did not match the pattern expanded by setmemhi, because it
+   assumed the alignment val will fit in a QI. */
+
+int foo() {
+  volatile int arr[3] __attribute__((aligned(128))) = {0};
+  return arr[2];
+}


[PATCH] Fix PR86557

2018-07-18 Thread Richard Biener


The following fixes the vectorizer part of PR86557, vectorizing
of EXACT_DIV_EXPR.  The x86 backend still lacks arithmetic DImode
right shift support for vectors without AVX512.
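
As a rough scalar sketch of the kind of lowering involved (an
illustration, not the vectorizer's code; div_pow2 and div_all are
hypothetical names): a signed division by a power of two can be written
as a conditional bias followed by an arithmetic right shift, which is why
vector arithmetic right shift support matters here. The sketch assumes
that '>>' on a negative int is an arithmetic shift, which GCC documents
but ISO C leaves implementation-defined.

```c
#include <stddef.h>

/* Hypothetical helper: x / 2**k rounding toward zero, for 1 <= k < 31.
   Assumes '>>' on a negative int shifts arithmetically.  */
int
div_pow2 (int x, int k)
{
  int bias = x < 0 ? (1 << k) - 1 : 0; /* nonzero only for negative x */
  return (x + bias) >> k;
}

/* A loop of this shape is the sort of candidate the pattern targets.  */
void
div_all (int *out, const int *in, size_t n, int k)
{
  for (size_t i = 0; i < n; i++)
    out[i] = div_pow2 (in[i], k);
}
```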

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-07-18  Richard Biener  

PR tree-optimization/86557
* tree-vect-patterns.c (vect_recog_divmod_pattern): Also handle
EXACT_DIV_EXPR.

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 4c22afd2b5f..0f63ccf87bb 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -2684,6 +2684,7 @@ vect_recog_divmod_pattern (stmt_vec_info stmt_vinfo, tree 
*type_out)
   switch (rhs_code)
 {
 case TRUNC_DIV_EXPR:
+case EXACT_DIV_EXPR:
 case TRUNC_MOD_EXPR:
   break;
 default:
@@ -2730,7 +2731,8 @@ vect_recog_divmod_pattern (stmt_vec_info stmt_vinfo, tree 
*type_out)
 
   cond = build2 (LT_EXPR, boolean_type_node, oprnd0,
 build_int_cst (itype, 0));
-  if (rhs_code == TRUNC_DIV_EXPR)
+  if (rhs_code == TRUNC_DIV_EXPR
+ || rhs_code == EXACT_DIV_EXPR)
{
  tree var = vect_recog_temp_ssa_var (itype, NULL);
  tree shift;


[Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-18 Thread Umesh Kalappa
Hi Nagy/Ramana,

Please help review the attached patch and let me know your comments.

No regressions in the gcc.target suite for the arm target.

Thank you
~Umesh

On Tue, Jul 17, 2018 at 4:01 PM, Umesh Kalappa  wrote:
> Will do, thanks.
> Thanks
>
> On Tue, Jul 17, 2018, 3:24 PM Ramana Radhakrishnan
>  wrote:
>>
>> On Tue, Jul 17, 2018 at 10:41 AM, Umesh Kalappa
>>  wrote:
>> > Hi Nagy,
>> >
>> > Please  help us with your comments on the attached patch for the issue
>> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86512)
>> >
>> > Thank you and waiting for your inputs on the same.
>>
>>
>> Patches should be sent to gcc-patches@gcc.gnu.org with a clear
>> description of what the patch hopes to
>> achieve and why this is correct, how was it tested and if a regression
>> test needs to be added - add one please.
>> Please read https://gcc.gnu.org/contribute.html before sending a patch.
>>
>> This is the wrong list to send patches to.
>>
>> regards
>> Ramana
>> > ~Umesh
>> >
>> > On Fri, Jul 13, 2018 at 1:22 PM, Umesh Kalappa
>> >  wrote:
>> >> Thank you and issue  raised at
>> >> gcc-patches@gcc.gnu.org
>> >>
>> >> ~Umesh
>> >>
>> >> On Thu, Jul 12, 2018 at 9:33 PM, Szabolcs Nagy 
>> >> wrote:
>> >>> On 12/07/18 16:20, Umesh Kalappa wrote:
>> 
>>  Hi everyone,
>> 
>>  we have our source base ,that was compiled for armv7 on gcc8.1 with
>>  soft-float and for following input
>> 
>>  a=0x0010
>>  b=0x0001
>> 
>>    result = a - b ;
>> 
>>  we are getting the result as "0x000e" and with
>>  -mhard-float (disabled the flush to zero mode ) we are getting the
>>  result as "0x000f" as expected.
>> 
>> >>>
>> >>> please submit it as a bug report to bugzilla
>> >>>
>> >>>
>>  while debugging the soft-float code,we see that ,the compiler calls
>>  the intrinsic "__aeabi_dsub" with arm calling conventions i.e passing
>>  "a" in r0 and r1 registers and respectively for "b".
>> 
>>  we are investigating the routine "__aeabi_dsub" that comes from
>>  libgcc
>>  for incorrect result  and meanwhile we would like to know that
>> 
>>  a)do libgcc routines/intrinsic for float operations support or
>>  consider the subnormal values ? ,if so how we can enable the same.
>> 
>>  Thank you
>>  ~Umesh
>> 
>> >>>


pr86512.patch
Description: Binary data
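
For readers who want to reproduce this class of problem, here is a small
C sketch that subtracts two doubles given as bit patterns and returns the
bit pattern of the result. The hexadecimal values in the quoted report
are truncated in this archive, so the inputs suggested below are an
assumption chosen for illustration (the smallest normal double and the
smallest subnormal), not necessarily the reporter's. The helper names are
hypothetical, and the sketch assumes IEEE 754 binary64 doubles.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit pattern.  */
uint64_t
bits_of (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

/* Reinterpret a 64-bit pattern as a double.  */
double
from_bits (uint64_t u)
{
  double d;
  memcpy (&d, &u, sizeof d);
  return d;
}

/* Subtract at run time; volatile keeps the compiler from folding it.  */
uint64_t
sub_bits (uint64_t a, uint64_t b)
{
  volatile double x = from_bits (a);
  volatile double y = from_bits (b);
  return bits_of (x - y);
}
```

With subnormals handled correctly and no flush-to-zero mode enabled,
sub_bits (0x0010000000000000, 0x0000000000000001) should return
0x000fffffffffffff, the largest subnormal.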


Re: [C++ PATCH] Disallow type specifiers among lambda-declarator decl-specifier-seq (PR c++/86550)

2018-07-18 Thread Jason Merrill
OK.

On Wed, Jul 18, 2018 at 6:20 PM, Jakub Jelinek  wrote:
> On Wed, Jul 18, 2018 at 11:34:30AM +1000, Jason Merrill wrote:
>> On Wed, Jul 18, 2018 at 4:57 AM, Jakub Jelinek  wrote:
>> > The standard says:
>> > "In the decl-specifier-seq of the lambda-declarator, each decl-specifier 
>> > shall
>> > either be mutable or constexpr."
>> > and the C++ FE has CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR flag for that.
>> > But as implemented, it is actually
>> > CP_PARSER_FLAGS_ONLY_TYPE_OR_MUTABLE_OR_CONSTEXPR
>> > as it allows mutable, constexpr and type specifiers.
>> >
>> > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
>> > trunk?
>> >
>> > 2018-07-17  Jakub Jelinek  
>> >
>> > PR c++/86550
>> > * parser.c (cp_parser_decl_specifier_seq): Don't parse a type 
>> > specifier
>> > if CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR.
>>
>> I think the diagnostic would be better if we parse the type-specifier
>> and then give an error about it being invalid in this context, rather
>> than not parse it and therefore give a syntax error; the constraint is
>> semantic rather than syntactic.
>
> So like this?
> It will diagnose each bool and int separately, but that is similar to how it
> will diagnose
> [] () static extern thread_local inline virtual virtual explicit {}
> too.
>
> 2018-07-18  Jakub Jelinek  
>
> PR c++/86550
> * parser.c (cp_parser_decl_specifier_seq): Diagnose invalid type
> specifier if CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR.
>
> * g++.dg/cpp0x/lambda/lambda-86550.C: New test.
>
> --- gcc/cp/parser.c.jj  2018-07-17 20:08:07.630224343 +0200
> +++ gcc/cp/parser.c 2018-07-18 10:09:10.655030931 +0200
> @@ -13797,6 +13797,9 @@ cp_parser_decl_specifier_seq (cp_parser*
>   found_decl_spec = true;
>   if (!is_cv_qualifier)
> decl_specs->any_type_specifiers_p = true;
> +
> + if ((flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR) != 0)
> +   error_at (token->location, "type-specifier invalid in 
> lambda");
> }
> }
>
> --- gcc/testsuite/g++.dg/cpp0x/lambda/lambda-86550.C.jj 2018-07-18 
> 10:05:02.894767883 +0200
> +++ gcc/testsuite/g++.dg/cpp0x/lambda/lambda-86550.C2018-07-18 
> 10:13:41.373318350 +0200
> @@ -0,0 +1,9 @@
> +// PR c++/86550
> +// { dg-do compile { target c++11 } }
> +
> +void
> +foo ()
> +{
> +  auto a = []() bool {};   // { dg-error "type-specifier 
> invalid in lambda" }
> +  auto b = []() bool bool bool bool int {};// { dg-error "type-specifier 
> invalid in lambda" }
> +}
>
>
> Jakub


Re: cleanup cross product code in VRP

2018-07-18 Thread Aldy Hernandez

Hi again!

Well, since this hasn't been reviewed and I'm about to overhaul the 
TYPE_OVERFLOW_WRAPS code anyhow, might as well lump it all in one patch.


On 07/16/2018 09:19 AM, Aldy Hernandez wrote:

Howdy!

I've abstracted out the cross product calculations into their own 
function, and have adapted it to deal with wide ints so it's more 
reusable.  It required some shuffling around, and implementing things a 
bit differently, but things should behave as before.


I also renamed vrp_int_const_binop to make its intent clearer, 
especially now that it's really just a wrapper to wide_int_binop that 
deals with overflow.


(If wide_int_binop_overflow is generally useful, perhaps we could merge 
it with wide_int_overflow.)


This is the same as the previous patch, plus I'm abstracting the 
TYPE_OVERFLOW_WRAPS code as well.  With this, the code dealing with 
MULT_EXPR in vrp gets reduced to handling value_range specific stuff. 
Yay code re-use!
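
For readers unfamiliar with the term, a cross product of two ranges can be
sketched as follows (a simplified illustration on plain ints, not the
wide_int code in the patch; struct irange and range_mult are hypothetical
names). The four corner products are computed in a wider type, loosely in
the spirit of widest2_int, and the result range is their minimum and
maximum. The sketch assumes long long is at least twice as wide as int.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical closed integer range [min, max].  */
struct irange
{
  int min;
  int max;
};

/* Multiply two ranges.  Returns false, leaving *res untouched, if any
   corner product does not fit in int.  */
bool
range_mult (struct irange a, struct irange b, struct irange *res)
{
  long long p[4] = {
    (long long) a.min * b.min, (long long) a.min * b.max,
    (long long) a.max * b.min, (long long) a.max * b.max
  };
  long long lo = p[0], hi = p[0];
  for (int i = 1; i < 4; i++)
    {
      if (p[i] < lo)
        lo = p[i];
      if (p[i] > hi)
        hi = p[i];
    }
  if (lo < INT_MIN || hi > INT_MAX)
    return false;
  res->min = (int) lo;
  res->max = (int) hi;
  return true;
}
```

For example, [-2, 3] times [4, 5] has corner products -8, -10, 12 and 15,
so the resulting range is [-10, 15].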


A few notes:

This is dead code.  I've removed it:

-  /* If we have an unsigned MULT_EXPR with two VR_ANTI_RANGEs,
-drop to VR_VARYING.  It would take more effort to compute a
-precise range for such a case.  For example, if we have
-op0 == 65536 and op1 == 65536 with their ranges both being
-~[0,0] on a 32-bit machine, we would have op0 * op1 == 0, so
-we cannot claim that the product is in ~[0,0].  Note that we
-are guaranteed to have vr0.type == vr1.type at this
-point.  */
-  if (vr0.type == VR_ANTI_RANGE
- && !TYPE_OVERFLOW_UNDEFINED (expr_type))
-   {
- set_value_range_to_varying (vr);
- return;
-   }
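
The situation described in the removed comment is easy to reproduce in
plain C (an illustration only; mul_wrap is a hypothetical name): with
32-bit wrapping arithmetic, two nonzero operands can still have a zero
product.

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit multiplication with wrap-around.  The
   product is formed in 64 bits and truncated, so the behaviour does not
   depend on the width of int.  */
uint32_t
mul_wrap (uint32_t a, uint32_t b)
{
  return (uint32_t) ((uint64_t) a * b);
}
```

mul_wrap (65536, 65536) is 0 even though both operands lie in ~[0,0].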

Also, the vrp_int typedef has a weird name, especially when we have 
widest2_int in gimple-fold.c that does exactly the same thing.  I've moved the 
common code to wide-int.h and tree.h so we can all share :).


At some point we could move the wide_int_range* and wide_int_binop* code 
into its own file.


Tested on x86-64 Linux.

OK?

gcc/

	* wide-int.h (widest2_int): New.
	* gimple-fold.c (arith_overflowed_p): Use it.
	* tree.h (widest2_int_cst): New.
	* tree-vrp.c (wide_int_binop_overflow): Rename from
	vrp_int_const_binop.
	Rewrite to work on trees.
	(extract_range_from_multiplicative_op_1): Abstract code to...
	(wide_int_range_min_max): ...here.
	(wide_int_range_cross_product): ...and here.
	(extract_range_from_binary_expr_1): Abstract overflow code to...
	(wide_int_range_cross_product_wrapping): ...here.
	* tree-vrp.h (wide_int_range_cross_product): New.
	(wide_int_range_cross_product_wrapping): New.

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index a6b42834d32..027ca4da97c 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -3986,9 +3986,6 @@ bool
 arith_overflowed_p (enum tree_code code, const_tree type,
 		const_tree arg0, const_tree arg1)
 {
-  typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) widest2_int;
-  typedef generic_wide_int  >
-widest2_int_cst;
   widest2_int warg0 = widest2_int_cst (arg0);
   widest2_int warg1 = widest2_int_cst (arg1);
   widest2_int wres;
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 2e1ee86a161..41274b3898c 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -968,64 +968,43 @@ value_range_constant_singleton (value_range *vr)
indeterminate.  */
 
 static bool
-vrp_int_const_binop (enum tree_code code, tree val1, tree val2, wide_int *res)
+wide_int_binop_overflow (wide_int &res,
+			 enum tree_code code,
+			 const wide_int &w0, const wide_int &w1,
+			 signop sign, bool overflow_undefined)
 {
-  wi::overflow_type overflow = wi::OVF_NONE;
-  signop sign = TYPE_SIGN (TREE_TYPE (val1));
-  wide_int w1 = wi::to_wide (val1);
-  wide_int w2 = wi::to_wide (val2);
-
-  switch (code)
-{
-case RSHIFT_EXPR:
-case LSHIFT_EXPR:
-  w2 = wi::to_wide (val2, TYPE_PRECISION (TREE_TYPE (val1)));
-  /* FALLTHRU */
-case MULT_EXPR:
-case TRUNC_DIV_EXPR:
-case EXACT_DIV_EXPR:
-case FLOOR_DIV_EXPR:
-case CEIL_DIV_EXPR:
-case ROUND_DIV_EXPR:
-  if (!wide_int_binop (*res, code, w1, w2, sign, &overflow))
-	return false;
-  break;
-
-default:
-  gcc_unreachable ();
-}
+  wi::overflow_type overflow;
+  if (!wide_int_binop (res, code, w0, w1, sign, &overflow))
+return false;
 
   /* If the operation overflowed return -INF or +INF depending on the
  operation and the combination of signs of the operands.  */
-  if (overflow
-  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (val1)))
-{
-  int sign1 = tree_int_cst_sgn (val1);
-  int sign2 = tree_int_cst_sgn (val2);
-
-  /* Notice that we only need to handle the restricted set of
-	 operations handled by extract_range_from_binary_expr.
-	 Among them, only multiplication, addition and subtraction
-	 can yield overflow without overflown operands because we
-	 are working with integral types only... except in the
-	 case VAL1 = -INF and VAL2 = -1 which overflows to +INF
-	 for division too.  */
-
-  /* F

[gomp5] Add support for C++ range for loops in #pragma omp {distribute,for,simd,taskloop}

2018-07-18 Thread Jakub Jelinek
Hi!

This patch adds support for C++ range for loops, including range for loops
with structured bindings for OpenMP constructs.

Tested on x86_64-linux, committed to gomp-5_0-branch.

2018-07-18  Jakub Jelinek  

gcc/
* tree.h (OMP_CLAUSE_FIRSTPRIVATE_NO_REFERENCE): Define.
* gimplify.c (gimplify_omp_for): Handle C++ range for loops with
NULL TREE_PURPOSE in OMP_FOR_ORIG_DECLS.  Firstprivatize
__for_end and __for_range temporaries on OMP_PARALLEL for
distribute parallel for{, simd}.
* omp-low.c (lower_rec_input_clauses): Handle
OMP_CLAUSE_FIRSTPRIVATE_NO_REFERENCE on OMP_CLAUSE_FIRSTPRIVATE
clauses.
gcc/c-family/
* c-omp.c (c_omp_check_loop_iv_r): Look for orig decl of C++
range for loops too.
gcc/cp/
* cp-tree.h (cp_convert_omp_range_for, cp_finish_omp_range_for,
finish_omp_for_block): Declare.
* parser.c (cp_parser_for): Pass false as new is_omp argument to
cp_parser_range_for.
(cp_parser_range_for): Add is_omp argument, return before finalizing
if it is true.
(cp_convert_omp_range_for, cp_finish_omp_range_for): New functions.
(cp_parser_omp_for_loop): Parse C++11 range for loops among omp
loops.
(cp_parser_omp_simd, cp_parser_omp_for, cp_parser_omp_distribute,
cp_parser_omp_taskloop): Call keep_next_level before
begin_omp_structured_block and call finish_omp_for_block on
finish_omp_structured_block result.
* semantics.c (handle_omp_for_class_iterator): Don't create a new
TREE_LIST if one has been created already for range for, just fill
TREE_PURPOSE and TREE_VALUE.
(finish_omp_for): Don't check cond/incr if cond is global_namespace.
Pass to c_omp_check_loop_iv_exprs orig_declv if non-NULL.  Don't
use IS_EMPTY_STMT on NULL pre_body.
(finish_omp_for_block): New function.
* pt.c (tsubst_decomp_names): Add forward declaration.
(tsubst_omp_for_iterator): Change orig_declv into a reference.
Handle range for loops.  Move orig_declv handling after declv/initv
handling.
(tsubst_expr): Call keep_next_level before begin_omp_structured_block.
Call cp_finish_omp_range_for for range for loops and use
{begin,finish}_omp_structured_block instead of {push,pop}_stmt_list
if there are any range for loops.  Call finish_omp_for_block on
finish_omp_structured_block result.
(dependent_omp_for_p): Always return true for range for loops if
processing_template_decl.
gcc/testsuite/
* g++.dg/gomp/for-21.C: New test.
libgomp/
* testsuite/libgomp.c++/for-23.C: New test.
* testsuite/libgomp.c++/for-24.C: New test.
* testsuite/libgomp.c++/for-25.C: New test.

--- gcc/tree.h.jj   2018-07-17 16:41:46.120069780 +0200
+++ gcc/tree.h  2018-07-17 17:24:39.972318592 +0200
@@ -1460,6 +1460,11 @@ extern tree maybe_wrap_with_location (tr
 #define OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_FIRSTPRIVATE)->base.public_flag)
 
+/* True on a FIRSTPRIVATE clause if only the reference and not what it refers
+   to should be firstprivatized.  */
+#define OMP_CLAUSE_FIRSTPRIVATE_NO_REFERENCE(NODE) \
+  TREE_PRIVATE (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_FIRSTPRIVATE))
+
 /* True on a LASTPRIVATE clause if a FIRSTPRIVATE clause for the same
decl is present in the chain.  */
 #define OMP_CLAUSE_LASTPRIVATE_FIRSTPRIVATE(NODE) \
--- gcc/gimplify.c.jj   2018-07-17 16:41:46.221069916 +0200
+++ gcc/gimplify.c  2018-07-17 17:24:39.975318596 +0200
@@ -10259,7 +10259,9 @@ gimplify_omp_for (tree *expr_p, gimple_s
   for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INIT (inner_for_stmt)); i++)
if (OMP_FOR_ORIG_DECLS (inner_for_stmt)
&& TREE_CODE (TREE_VEC_ELT (OMP_FOR_ORIG_DECLS (inner_for_stmt),
-   i)) == TREE_LIST)
+   i)) == TREE_LIST
+   && TREE_PURPOSE (TREE_VEC_ELT (OMP_FOR_ORIG_DECLS (inner_for_stmt),
+  i)))
  {
tree orig = TREE_VEC_ELT (OMP_FOR_ORIG_DECLS (inner_for_stmt), i);
/* Class iterators aren't allowed on OMP_SIMD, so the only
@@ -10313,6 +10315,43 @@ gimplify_omp_for (tree *expr_p, gimple_s
OMP_CLAUSE_CHAIN (c) = OMP_PARALLEL_CLAUSES (*data[1]);
OMP_PARALLEL_CLAUSES (*data[1]) = c;
  }
+  /* Similarly, take care of C++ range for temporaries, those should
+be firstprivate on OMP_PARALLEL if any.  */
+  if (data[1])
+   for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INIT (inner_for_stmt)); i++)
+ if (OMP_FOR_ORIG_DECLS (inner_for_stmt)
+ && TREE_CODE (TREE_VEC_ELT (OMP_FOR_ORIG_DECLS (inner_for_stmt),
+ i)) == TREE_LIST
+ && TREE_CHAIN (TREE_VEC_ELT (OMP

Re: [PATCH]Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

2018-07-18 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Jul 18, 2018 at 11:50 AM Kyrill Tkachov
>  wrote:
>>
>>
>> On 18/07/18 10:44, Richard Biener wrote:
>> > On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
>> >  wrote:
>> >> Hi Richard,
>> >>
>> >> On 17/07/18 14:27, Richard Biener wrote:
>> >>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
>> >>>  wrote:
>> >>>> Hi all,
>> >>>>
>> >>>> This is my first Fortran patch, so apologies if I'm missing something.
>> >>>> The current expansion of the min and max intrinsics explicitly expands
>> >>>> the comparisons between each argument to calculate the global min/max.
>> >>>> Some targets, like aarch64, have instructions that can calculate
>> >>>> the min/max
>> >>>> of two real (floating-point) numbers with the proper NaN-handling
>> >>>> semantics
>> >>>> (if both inputs are NaN, return NaN. If one is NaN, return the
>> >>>> other) and those
>> >>>> are the semantics provided by the __builtin_fmin/max family of
>> >>>> functions that expand
>> >>>> to these instructions.
>> >>>>
>> >>>> This patch makes the frontend emit __builtin_fmin/max directly to
>> >>>> compare each
>> >>>> pair of numbers when the numbers are floating-point, and use
>> >>>> MIN_EXPR/MAX_EXPR otherwise
>> >>>> (integral types and -ffast-math) which should hopefully be easier
>> >>>> to recognise in the
>> >>> What is Fortran's requirement on min/max intrinsics?  Doesn't it only
>> >>> require things that
>> >>> are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is
>> >> The current implementation expands to:
>> >>   mvar = a1;
>> >>   if (a2 .op. mvar || isnan (mvar))
>> >> mvar = a2;
>> >>   if (a3 .op. mvar || isnan (mvar))
>> >> mvar = a3;
>> >>   ...
>> >>   return mvar;
>> >>
>> >> That is, if one of the operands is a NaN it will return the other 
>> >> argument.
>> >> If both (all) are NaNs, it will return NaN. This is the same as the 
>> >> semantics of fmin/max
>> >> as far as I can tell.
>> >>
>> >>> /* Minimum and maximum values.  When used with floating point, if both
>> >>>  operands are zeros, or if either operand is NaN, then it is
>> >>> unspecified
>> >>>  which of the two operands is returned as the result.  */
>> >>>
>> >>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
>> >>> zeros or NaNs.
>> >>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS
>> >>> if signed
>> >>> zeros are significant.
>> >> True, MIN/MAX_EXPR would not be appropriate in that condition. I
>> >> guarded their use
>> >> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS
>> >> (type) && !HONOR_NANS (type).
>> >>
>> >>
>> >>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
>> >>> is a good idea,
>> >>> this may both generate bigger code and be slower.
>> >> The patch will generate fmin/fmax calls (or the fminf,fminl
>> >> variants) when mathfn_built_in advertises
>> >> them as available (does that mean they'll have a fast inline
>> >> implementation?)
>> > This doesn't mean anything given you make them available with your
>> > patch ;)  So I expect it may
>> > cause issues for !c99_runtime targets (and long double at least).
>>
>> Urgh, that can cause headaches...
>>
>> >> If the above doesn't hold and we can't use either MIN/MAX_EXPR or
>> >> fmin/fmax then the patch falls back
>> >> to the existing expansion.
>> > As said I would not use fmin/fmax calls here at all.
>>
>> ... Given the comments from Thomas and Janne, maybe we should just
>> emit MIN/MAX_EXPRs here
>> since there is no language requirement on NaN/signed zero handling on
>> these intrinsics?
>> That should make it simpler and more portable.
>
> That's the Fortran maintainers' call.
>
>> >> FWIW, this patch does improve performance on 521.wrf from SPEC2017
>> >> on aarch64.
>> > You said that, yes.  Even without -ffast-math?
>>
>> It improves at -O3 without -ffast-math in particular. With -ffast-math
>> phiopt optimisation
>> is more aggressive and merges the conditionals into MIN/MAX_EXPRs
>> (minmax_replacement in tree-ssa-phiopt.c)
>
> The question is will it be slower without -ffast-math, that is, when
> fmin/max() calls are emitted rather
> than inline conditionals.
>
> I think a patch just using MAX/MIN_EXPR within the existing
> constraints and otherwise falling back to
> the current code would be more obvious and other changes should be
> made independently.

If going to MIN_EXPR and MAX_EXPR unconditionally isn't acceptable,
maybe an alternative would be to go straight to internal functions,
under the usual:

  direct_internal_fn_supported_p (IFN_F{MIN,MAX}, type, OPTIMIZE_FOR_SPEED)

condition.

Thanks,
Richard


[PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics

2018-07-18 Thread Kyrill Tkachov

Hi all,

Thank you for the feedback so far.
This version of the patch doesn't try to emit fmin/fmax function calls but 
instead
emits MIN/MAX_EXPR sequences unconditionally.
I think a source of confusion in the original proposal (for me at least) was
that on aarch64 (that I primarily work on) we implement the fmin/fmax optabs
and therefore these calls are expanded to a single instruction.
But on x86_64 these optabs are not implemented and therefore expand to actual 
library calls.
Therefore at -O3 (no -ffast-math) I saw a gain on aarch64. But I measured today
on x86_64 and saw a regression.

Thomas and Janne suggested that the Fortran standard does not impose a 
requirement
on NaN handling for the min/max intrinsics, which would make emitting 
MIN/MAX_EXPR
sequences unconditionally a valid approach.

However, the gfortran.dg/nan_1.f90 test checks that handling of NaN values in
these intrinsics follows the IEEE semantics (min (nan, 2.0) == 2.0, for 
example).
This is not required by the standard, but is the existing gfortran behaviour.

If we end up always emitting MIN/MAX_EXPR sequences, like this version of the 
patch does,
then that test fails on some configurations of x86_64 and not others (for me it 
FAILs
by default, but passes with -march=native on my machine) and passes on AArch64.
This is expected since MIN/MAX_EXPR doesn't enforce IEEE behaviour on its 
arguments.
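
To make the NaN difference concrete, here is a minimal C sketch (not GCC code; both function names are made up for illustration). min_old_expansion mirrors the conditional sequence the frontend currently emits, while min_plain_select is only one possible lowering of a MIN_EXPR-style operation, whose result for NaN operands is unspecified:

```c
#include <math.h>

/* Mirrors the existing gfortran expansion for min (a1, a2):
     mvar = a1;
     if (a2 < mvar || isnan (mvar)) mvar = a2;
   A NaN in either position is replaced by the other operand.  */
double
min_old_expansion (double a1, double a2)
{
  double mvar = a1;
  if (a2 < mvar || isnan (mvar))
    mvar = a2;
  return mvar;
}

/* One possible lowering of a MIN_EXPR-like operation (an assumption made
   for illustration; MIN_EXPR itself leaves the NaN case unspecified).
   Every ordered comparison with NaN is false, so the result depends on
   which operand holds the NaN.  */
double
min_plain_select (double a1, double a2)
{
  return a2 < a1 ? a2 : a1;
}
```

With these definitions, min_old_expansion (NAN, 2.0) and min_old_expansion (2.0, NAN) both give 2.0, whereas min_plain_select (NAN, 2.0) gives NaN and min_plain_select (2.0, NAN) gives 2.0. An order dependence of this kind is presumably what makes nan_1.f90 pass on some configurations and fail on others.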

However, by always emitting MIN/MAX_EXPR the gfc_conv_intrinsic_minmax function 
is
simplified and, perhaps more importantly, generates faster code in the -O3 case.
With this patch I see performance improvement on 521.wrf on both AArch64 (3.7%)
and x86_64 (5.4%).

Thomas, Janne, would this relaxation of NaN handling be acceptable given the 
benefits
mentioned above? If so, what would be the recommended adjustment to the 
nan_1.f90 test?

Thanks,
Kyrill

2018-07-18  Kyrylo Tkachov  

* trans-intrinsic.c (gfc_conv_intrinsic_minmax): Emit MIN/MAX_EXPR
sequence to calculate the min/max.

2018-07-18  Kyrylo Tkachov  

* gfortran.dg/max_float.f90: New test.
* gfortran.dg/min_float.f90: Likewise.
* gfortran.dg/minmax_integer.f90: Likewise.
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..e5a1f1ddabeedc7b9f473db11e70f29548fc69ac 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -3874,14 +3874,11 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
 minmax (a1, a2, a3, ...)
 {
   mvar = a1;
-  if (a2 .op. mvar || isnan (mvar))
-mvar = a2;
-  if (a3 .op. mvar || isnan (mvar))
-mvar = a3;
+  mvar = MIN/MAX_EXPR (mvar, a2);
+  mvar = MIN/MAX_EXPR (mvar, a3);
   ...
-  return mvar
-}
- */
+  return mvar;
+}  */
 
 /* TODO: Mismatching types can occur when specific names are used.
These should be handled during resolution.  */
@@ -3891,7 +3888,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3908,37 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-{
-  tree cond, isnan;
 
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+{
+  tree cond = NULL_TREE;
   val = args[i];
 
   /* Handle absent optional arguments by ignoring the comparison.  */
   if (argexpr->expr->expr_type == EXPR_VARIABLE
 	  && argexpr->expr->symtree->n.sym->attr.optional
 	  && TREE_CODE (val) == INDIRECT_REF)
-	cond = fold_build2_loc (input_location,
+	{
+	  cond = fold_build2_loc (input_location,
 NE_EXPR, logical_type_node,
 TREE_OPERAND (val, 0),
 			build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0));
-  else
-  {
-	cond = NULL_TREE;
-
+	}
+  else if (!VAR_P (val) && !TREE_CONSTANT (val))
 	/* Only evaluate the argument once.  */
-	if (!VAR_P (val) && !TREE_CONSTANT (val))
-	  val = gfc_evaluate_now (val, &se->pre);
-  }
-
-  thencase = build2_v (MODIFY_EXPR, mvar, convert (type, val));
+	val = gfc_evaluate_now (val, &se->pre);
 
-  tmp = fold_build2_loc (input_location, op, logical_type_node,
-			 convert (type, val), mvar);
+  tree calc;
 
-  /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to
-	 __builtin_isnan might be made dependent on that module being loaded,
-	 to help performance of programs that don't rely on IEEE semantics.  */
-  if (FLOAT_TYPE_P (TREE_TYPE (mvar)))
-	{
-	  isnan = build_call_expr_loc (input_location,
-   builtin_decl_explicit (BUILT_IN_ISNAN),
-   1, mvar);
-	  tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR,
- logical_type_node, tmp,
- fold_convert (logical_type_node

[C++ PATCH] Hide __for_{range,begin,end} symbols (PR c++/85515, take 2)

2018-07-18 Thread Jakub Jelinek
On Fri, Jul 13, 2018 at 06:53:30PM +0200, Jakub Jelinek wrote:
> On Fri, Jul 13, 2018 at 12:24:02PM -0400, Nathan Sidwell wrote:
> > On 07/13/2018 09:49 AM, Jakub Jelinek wrote:
> > > Hi!
> > > 
> > > I'd like to ping the following C++ patches:
> > > 
> > > - PR c++/85515
> > >make range for temporaries unspellable during parsing and only
> > >turn them into spellable for debug info purposes
> > >http://gcc.gnu.org/ml/gcc-patches/2018-07/msg00086.html
> > 
> > 
> > How hard would it be to add the 6 special identifiers to the C++ global
> > table via initialize_predefined_identifiers (decl.c) and then use them
> > directly in the for range machinery?  repeated get_identifier
> > ("string-const") just smells bad.
> 
> Probably not too hard, but we have hundreds of other
> get_identifier ("string-const") calls in the middle-end, C++ FE, other FEs.
> Are those 6 more important than say "abi_tag", "gnu", "begin", "end", "get",
> "tuple_size", "tuple_element", and many others?
> 
> Is it worth caching those?

Anyway, here is an updated patch that uses the get_identifier caching.
Ok for trunk?
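
For readers unfamiliar with the pattern under discussion, the following is a rough, self-contained C sketch of the idea, not the actual GCC code: identifiers are interned once at start-up into a table indexed by an enum, and later uses are plain array loads instead of repeated string lookups. All names here (intern, init_predefined, ID_FOR_RANGE and so on) are hypothetical stand-ins for get_identifier, initialize_predefined_identifiers and the CPTI_* indices.

```c
#include <stdlib.h>
#include <string.h>

/* A toy identifier node; the real tree node is far richer.  */
struct ident
{
  const char *name;
  struct ident *next;
};

/* Toy "hash table": a simple linked list of all interned names.  */
static struct ident *all_idents;

/* Hypothetical analogue of get_identifier: return the unique node for
   NAME, creating it on first use.  Every call pays for a string search.  */
struct ident *
intern (const char *name)
{
  for (struct ident *p = all_idents; p; p = p->next)
    if (strcmp (p->name, name) == 0)
      return p;

  struct ident *n = malloc (sizeof *n);
  if (!n)
    abort ();
  n->name = name;	/* Assumes NAME outlives the table (string literals).  */
  n->next = all_idents;
  all_idents = n;
  return n;
}

/* Indices into the table of predefined identifiers.  */
enum predefined_id { ID_FOR_RANGE, ID_FOR_BEGIN, ID_FOR_END, ID_MAX };

static struct ident *predefined[ID_MAX];

/* Accessor macros: after initialization these are plain array loads.  */
#define for_range_ident predefined[ID_FOR_RANGE]
#define for_begin_ident predefined[ID_FOR_BEGIN]
#define for_end_ident   predefined[ID_FOR_END]

/* Intern each predefined name exactly once, at start-up.  */
void
init_predefined (void)
{
  static const char *const names[ID_MAX]
    = { "__for_range", "__for_begin", "__for_end" };
  for (int i = 0; i < ID_MAX; i++)
    predefined[i] = intern (names[i]);
}
```

After init_predefined () has run, for_range_ident is the same node that intern ("__for_range") would return, but obtaining it no longer involves a lookup.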

Shall I submit an incremental patch for the "abi_tag", "gnu", "begin", "end",
"get", "tuple_size", "tuple_element" etc. identifiers?

2018-07-18  Jakub Jelinek  

PR c++/85515
* cp-tree.h (enum cp_tree_index): Add
CPTI_FOR_{RANGE,BEGIN,END}{,_}_IDENTIFIER.
(for_range__identifier, for_begin__identifier, for_end__identifier,
for_range_identifier, for_begin_identifier, for_end_identifier):
Define.
* decl.c (initialize_predefined_identifiers): Initialize
for_{range,begin,end}{,_}_identifier.
* parser.c (build_range_temp): Use for_range__identifier instead of
get_identifier ("__for_range").
(cp_convert_range_for): Use for_begin__identifier and
for_end__identifier instead of get_identifier ("__for_begin") and
get_identifier ("__for_end").
* semantics.c (finish_for_stmt): Rename "__for_{range,begin,end} "
local symbols to "__for_{range,begin,end}".

* g++.dg/pr85515-2.C: Add expected dg-error.
* g++.dg/cpp0x/range-for36.C: New test.

--- gcc/cp/cp-tree.h.jj 2018-06-29 09:38:17.790306399 +0200
+++ gcc/cp/cp-tree.h    2018-07-18 11:57:55.980529748 +0200
@@ -154,6 +154,12 @@ enum cp_tree_index
 CPTI_AUTO_IDENTIFIER,
 CPTI_DECLTYPE_AUTO_IDENTIFIER,
 CPTI_INIT_LIST_IDENTIFIER,
+CPTI_FOR_RANGE__IDENTIFIER,
+CPTI_FOR_BEGIN__IDENTIFIER,
+CPTI_FOR_END__IDENTIFIER,
+CPTI_FOR_RANGE_IDENTIFIER,
+CPTI_FOR_BEGIN_IDENTIFIER,
+CPTI_FOR_END_IDENTIFIER,
 
 CPTI_LANG_NAME_C,
 CPTI_LANG_NAME_CPLUSPLUS,
@@ -274,6 +280,12 @@ extern GTY(()) tree cp_global_trees[CPTI
 #define auto_identifier                cp_global_trees[CPTI_AUTO_IDENTIFIER]
 #define decltype_auto_identifier       cp_global_trees[CPTI_DECLTYPE_AUTO_IDENTIFIER]
 #define init_list_identifier           cp_global_trees[CPTI_INIT_LIST_IDENTIFIER]
+#define for_range__identifier          cp_global_trees[CPTI_FOR_RANGE__IDENTIFIER]
+#define for_begin__identifier          cp_global_trees[CPTI_FOR_BEGIN__IDENTIFIER]
+#define for_end__identifier            cp_global_trees[CPTI_FOR_END__IDENTIFIER]
+#define for_range_identifier           cp_global_trees[CPTI_FOR_RANGE_IDENTIFIER]
+#define for_begin_identifier           cp_global_trees[CPTI_FOR_BEGIN_IDENTIFIER]
+#define for_end_identifier             cp_global_trees[CPTI_FOR_END_IDENTIFIER]
 #define lang_name_c                    cp_global_trees[CPTI_LANG_NAME_C]
 #define lang_name_cplusplus            cp_global_trees[CPTI_LANG_NAME_CPLUSPLUS]
 
--- gcc/cp/decl.c.jj    2018-07-12 21:34:44.798598796 +0200
+++ gcc/cp/decl.c   2018-07-18 11:59:06.220595473 +0200
@@ -4044,6 +4044,12 @@ initialize_predefined_identifiers (void)
 {"auto", &auto_identifier, cik_normal},
 {"decltype(auto)", &decltype_auto_identifier, cik_normal},
 {"initializer_list", &init_list_identifier, cik_normal},
+{"__for_range ", &for_range__identifier, cik_normal},
+{"__for_begin ", &for_begin__identifier, cik_normal},
+{"__for_end ", &for_end__identifier, cik_normal},
+{"__for_range", &for_range_identifier, cik_normal},
+{"__for_begin", &for_begin_identifier, cik_normal},
+{"__for_end", &for_end_identifier, cik_normal},
 {NULL, NULL, cik_normal}
   };
 
--- gcc/cp/parser.c.jj  2018-07-18 10:09:10.655030931 +0200
+++ gcc/cp/parser.c 2018-07-18 12:00:22.907667232 +0200
@@ -11952,8 +11952,8 @@ build_range_temp (tree range_expr)
  type_uses_auto (range_type));
 
   /* Create the __range variable.  */
-  range_temp = build_decl (input_location, VAR_DECL,
-  get_identifier ("__for_range"), range_type);
+  range_temp = build_decl (input_location, VAR_DECL, for_range__identifier,
+  range_type);
   TREE_USED (range_temp) = 1;
   DECL_ARTIFICIAL (range_tem

Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

2018-07-18 Thread Richard Biener
On Wed, Jul 18, 2018 at 11:50 AM Kyrill Tkachov
 wrote:
>
>
> On 18/07/18 10:44, Richard Biener wrote:
> > On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
> >  wrote:
> >> Hi Richard,
> >>
> >> On 17/07/18 14:27, Richard Biener wrote:
> >>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
> >>>  wrote:
> >>>> Hi all,
> >>>>
> >>>> This is my first Fortran patch, so apologies if I'm missing something.
> >>>> The current expansion of the min and max intrinsics explicitly expands
> >>>> the comparisons between each argument to calculate the global min/max.
> >>>> Some targets, like aarch64, have instructions that can calculate the
> >>>> min/max
> >>>> of two real (floating-point) numbers with the proper NaN-handling
> >>>> semantics
> >>>> (if both inputs are NaN, return NaN. If one is NaN, return the other)
> >>>> and those
> >>>> are the semantics provided by the __builtin_fmin/max family of functions
> >>>> that expand
> >>>> to these instructions.
> >>>>
> >>>> This patch makes the frontend emit __builtin_fmin/max directly to
> >>>> compare each
> >>>> pair of numbers when the numbers are floating-point, and use
> >>>> MIN_EXPR/MAX_EXPR otherwise
> >>>> (integral types and -ffast-math) which should hopefully be easier to
> >>>> recognise in the
> >>> What is Fortran's requirement on min/max intrinsics?  Doesn't it only
> >>> require things that
> >>> are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is
> >> The current implementation expands to:
> >>   mvar = a1;
> >>   if (a2 .op. mvar || isnan (mvar))
> >> mvar = a2;
> >>   if (a3 .op. mvar || isnan (mvar))
> >> mvar = a3;
> >>   ...
> >>   return mvar;
> >>
> >> That is, if one of the operands is a NaN it will return the other argument.
> >> If both (all) are NaNs, it will return NaN. This is the same as the 
> >> semantics of fmin/max
> >> as far as I can tell.
> >>
> >>> /* Minimum and maximum values.  When used with floating point, if both
> >>>  operands are zeros, or if either operand is NaN, then it is 
> >>> unspecified
> >>>  which of the two operands is returned as the result.  */
> >>>
> >>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
> >>> zeros or NaNs.
> >>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if 
> >>> signed
> >>> zeros are significant.
> >> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded 
> >> their use
> >> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) 
> >> && !HONOR_NANS (type).
> >>
> >>
> >>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
> >>> is a good idea,
> >>> this may both generate bigger code and be slower.
> >> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when 
> >> mathfn_built_in advertises
> >> them as available (does that mean they'll have a fast inline 
> >> implementation?)
> > This doesn't mean anything given you make them available with your
> > patch ;)  So I expect it may
> > cause issues for !c99_runtime targets (and long double at least).
>
> Urgh, that can cause headaches...
>
> >> If the above doesn't hold and we can't use either MIN/MAX_EXPR or
> >> fmin/fmax then the patch falls back
> >> to the existing expansion.
> > As said I would not use fmin/fmax calls here at all.
>
> ... Given the comments from Thomas and Janne, maybe we should just emit 
> MIN/MAX_EXPRs here
> since there is no language requirement on NaN/signed zero handling on these 
> intrinsics?
> That should make it simpler and more portable.

That's the Fortran maintainers' call.

> >> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on 
> >> aarch64.
> > You said that, yes.  Even without -ffast-math?
>
> It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt 
> optimisation
> is more aggressive and merges the conditionals into MIN/MAX_EXPRs 
> (minmax_replacement in tree-ssa-phiopt.c)

The question is will it be slower without -ffast-math, that is, when
fmin/max() calls are emitted rather
than inline conditionals.

I think a patch just using MAX/MIN_EXPR within the existing
constraints and otherwise falling back to
the current code would be more obvious and other changes should be
made independently.

Richard.

> Thanks,
> Kyrill
>
> > Richard.
> >
> >> Thanks,
> >> Kyrill
> >>
> >>> Richard.
> >>>
> >>>> midend and optimise. The previous approach of generating the open-coded
> >>>> version of that
> >>>> is used when we don't have an appropriate __builtin_fmin/max available.
> >>>> For example, for a configuration of x86_64-unknown-linux-gnu that I
> >>>> tested there was no
> >>>> 128-bit __builtin_fminl available.
> >>>>
> >>>> With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions
> >>>> being generated at -O3
> >>>> on aarch64 for 521.wrf from fprate SPEC2017 where none before were
> >>>> generated
> >>>> (we were generating explicit comparisons and NaN checks). This g

Re: [PATCH] Introduce instance discriminators

2018-07-18 Thread Richard Biener
On Wed, Jul 18, 2018 at 8:53 AM Alexandre Oliva  wrote:
>
> This patch is a rewrite of an earlier patch submitted at
> https://gcc.gnu.org/ml/gcc-patches/2012-11/msg02340.html
>
> With -gnateS, the Ada compiler sets itself up to output discriminators
> for different instantiations of generics, but the middle and back ends
> have lacked support for that.  This patch introduces the missing bits,
> translating the GNAT-internal representation of the instance map to an
> instance_table that maps ordinary line-map indices to instance
> discriminators.
>
> Instance discriminators are not compatible with LTO, in that the
> instance mapping is not preserved in LTO dumps.  There are no plans to
> preserve discriminators in them.

Because...?  I think that's a sentence that should cause me to say "no"
to this patch ;)

Is it possible to merge the BB discriminator stuff with the new framework?

> This patch (minus whitespace changes and tests) was regstrapped on
> x86_64-linux-gnu.  The final form of the patch was tested with a
> non-bootstrap build, and a single-test check-gnat run.  Ok to install?
>
>
> From: Olivier Hainque 
> for  libcpp/ChangeLog
>
> * include/line-map.h (ORDINARY_MAP_INDEX): New.
>
> for  gcc/ChangeLog
>
> * einput.c: New file.  Allow associating "line context"
> extension data to instruction location info, for sets of
> locations covered by an ordinary line_map structure.
> * einput.h: Likewise.
> * Makefile.in (OBJS): Add einput.o.
> * input.c (expand_location_1): On request, provide pointer to the
> line map that was used to resolve the input location.
> (map_expand_location): New function.  Same as expand_location,
> also providing the map from which the input location was resolved.
> (expand_location, expand_location_to_spelling_point): Adjust calls
> to expand_location_1.
> (linemap_client_expand_location_to_spelling_point): Likewise.
> * input.h (map_expand_location): Declare.
> * emit-rtl.c (insn_location): Handle a location_lc* argument.
> * rtl.h (insn_location): Adjust prototype.
> * print-rtl.c (print_rtx): Adjust call to insn_location.
> * modulo-sched.c (dump_insn_location): Likewise.
> * tree-inline.c (copy_bb): Copy discriminator field as well.
> * flag-types.h (loc_discriminator_type): New enum, allowing BB
> or INSTANCE_ID discriminators.
> * common.opt (loc_discriminator_kind): New variable, conveying the
> kind of discriminator we want to see emitted with source locations.
> * final.c (bb_discriminator, last_bb_discriminator): New statics,
> to track basic block discriminators.
> (final_start_function_1): Initialize them.
> (final_scan_insn_1): On NOTE_INSN_BASIC_BLOCK, track
> bb_discriminator.
> (notice_source_line): If INSN_HAS_LOCATION, update current
> discriminator from BB or INSTANCE_ID depending on the kind we're
> requested to convey.  When deciding to emit, account for both
> possible kinds of discriminators.
>
> for  gcc/ada
>
> * trans.c (gigi): When requested so, allocate and populate
> the gcc table controlling the emission of per-instance debug
> info.
>
> From: Alexandre Oliva  , Olivier Hainque  
> 
> for  gcc/testsuite/ChangeLog
>
> * gnat.dg/dinst.adb: New.
> * gnat.dg/dinst_pkg.ads, gnat.dg/dinst_pkg.adb: New.
> ---
>  gcc/Makefile.in |1 +
>  gcc/ada/gcc-interface/trans.c   |   10 ++
>  gcc/common.opt  |   12 
>  gcc/einput.c|   55 +++
>  gcc/einput.h|   50 
>  gcc/emit-rtl.c  |   11 +--
>  gcc/final.c |   29 +++---
>  gcc/flag-types.h|   14 +
>  gcc/input.c |   32 +---
>  gcc/input.h |2 +
>  gcc/modulo-sched.c  |2 +
>  gcc/print-rtl.c |2 +
>  gcc/rtl.h   |3 +-
>  gcc/testsuite/gnat.dg/dinst.adb |   20 +
>  gcc/testsuite/gnat.dg/dinst_pkg.adb |7 
>  gcc/testsuite/gnat.dg/dinst_pkg.ads |4 +++
>  gcc/tree-inline.c   |2 +
>  libcpp/include/line-map.h   |8 +
>  18 files changed, 247 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/einput.c
>  create mode 100644 gcc/einput.h
>  create mode 100644 gcc/testsuite/gnat.dg/dinst.adb
>  create mode 100644 gcc/testsuite/gnat.dg/dinst_pkg.adb
>  create mode 100644 gcc/testsuite/gnat.dg/dinst_pkg.ads
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 2a05a66ea9b87..f9a9fe8726b18 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1285,6 +1285,7 @@

Re: [PR86544] Fix Popcount detection generates different code on C and C++

2018-07-18 Thread Richard Biener
On Wed, Jul 18, 2018 at 4:19 AM Kugan Vivekanandarajah
 wrote:
>
> Attached patch fixes phi-opt not optimizing c++ testcase where we have
>
> if (b_4(D) == 0) instead of if (b_4(D) != 0) as shown in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86544
>
> Patch bootstrapped and regression tested on x86_64-linux-gnu with no
> new regressions.
>
> Is this OK for trunk?

OK.

Richard.

> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2018-07-18  Kugan Vivekanandarajah  
>
> PR middle-end/86544
> * tree-ssa-phiopt.c (cond_removal_in_popcount_pattern): Handle
> comparison with EQ_EXPR
> in last stmt.
>
> gcc/testsuite/ChangeLog:
>
> 2018-07-18  Kugan Vivekanandarajah  
>
> PR middle-end/86544
> * g++.dg/tree-ssa/pr86544.C: New test.
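
For context, a hedged C sketch of the two source shapes involved (this is not the testcase from the PR, and the function names are invented). Both functions guard the same bit-clearing loop, one testing b != 0 and the other b == 0, and both compute the same value as __builtin_popcountl, which is why one would expect phiopt to treat the two guard forms alike. Whether a given frontend actually produces the == or the != form in GIMPLE is not something this sketch demonstrates.

```c
/* Guard written as "!= 0": the counting loop sits in the then-branch.  */
int
popcount_ne (unsigned long b)
{
  int c = 0;
  if (b != 0)
    while (b)
      {
	b &= b - 1;	/* Clear the lowest set bit.  */
	c++;
      }
  return c;
}

/* Guard written as "== 0": early exit, loop on the fall-through path.  */
int
popcount_eq (unsigned long b)
{
  int c = 0;
  if (b == 0)
    return 0;
  while (b)
    {
      b &= b - 1;	/* Clear the lowest set bit.  */
      c++;
    }
  return c;
}
```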


Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

2018-07-18 Thread Kyrill Tkachov



On 18/07/18 10:44, Richard Biener wrote:

On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
 wrote:

Hi Richard,

On 17/07/18 14:27, Richard Biener wrote:

On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
 wrote:

Hi all,

This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return NaN. If one is NaN, return the other) and those
are the semantics provided by the __builtin_fmin/max family of functions that 
expand
to these instructions.

This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR 
otherwise
(integral types and -ffast-math) which should hopefully be easier to recognise 
in the

What is Fortran's requirement on min/max intrinsics?  Doesn't it only
require things that
are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is

The current implementation expands to:
  mvar = a1;
  if (a2 .op. mvar || isnan (mvar))
mvar = a2;
  if (a3 .op. mvar || isnan (mvar))
mvar = a3;
  ...
  return mvar;

That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the semantics 
of fmin/max
as far as I can tell.


/* Minimum and maximum values.  When used with floating point, if both
 operands are zeros, or if either operand is NaN, then it is unspecified
 which of the two operands is returned as the result.  */

which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
zeros or NaNs.
Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
zeros are significant.

True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their 
use
on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && 
!HONOR_NANS (type).



I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
is a good idea,
this may both generate bigger code and be slower.

The patch will generate fmin/fmax calls (or the fminf,fminl variants) when 
mathfn_built_in advertises
them as available (does that mean they'll have a fast inline implementation?)

This doesn't mean anything given you make them available with your
patch ;)  So I expect it may
cause issues for !c99_runtime targets (and long double at least).


Urgh, that can cause headaches...


If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax
then the patch falls back
to the existing expansion.

As said I would not use fmin/fmax calls here at all.


... Given the comments from Thomas and Janne, maybe we should just emit 
MIN/MAX_EXPRs here
since there is no language requirement on NaN/signed zero handling on these 
intrinsics?
That should make it simpler and more portable.


FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.

You said that, yes.  Even without -ffast-math?


It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt 
optimisation
is more aggressive and merges the conditionals into MIN/MAX_EXPRs 
(minmax_replacement in tree-ssa-phiopt.c)

Thanks,
Kyrill


Richard.


Thanks,
Kyrill


Richard.


midend and optimise. The previous approach of generating the open-coded version 
of that
is used when we don't have an appropriate __builtin_fmin/max available.
For example, for a configuration of x86_64-unknown-linux-gnu that I tested 
there was no
128-bit __builtin_fminl available.

With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being 
generated at -O3
on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
(we were generating explicit comparisons and NaN checks). This gave a 2.4% 
improvement
in performance on a Cortex-A72.

Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

2018-07-17  Kyrylo Tkachov  

   * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
   __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
   __builtin_fmaxl.
   * trans-intrinsic.c: Include builtins.h.
   (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
   functions to calculate the min/max.

2018-07-17  Kyrylo Tkachov  

   * gfortran.dg/max_fmaxf.f90: New test.
   * gfortran.dg/min_fminf.f90: Likewise.
   * gfortran.dg/minmax_integer.f90: Likewise.
   * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
   * gfortran.dg/min_fminl_aarch64.f90: Likewise.




Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

2018-07-18 Thread Richard Biener
On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
 wrote:
>
> Hi Richard,
>
> On 17/07/18 14:27, Richard Biener wrote:
> > On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
> >  wrote:
> >> Hi all,
> >>
> >> This is my first Fortran patch, so apologies if I'm missing something.
> >> The current expansion of the min and max intrinsics explicitly expands
> >> the comparisons between each argument to calculate the global min/max.
> >> Some targets, like aarch64, have instructions that can calculate the 
> >> min/max
> >> of two real (floating-point) numbers with the proper NaN-handling semantics
> >> (if both inputs are NaN, return NaN. If one is NaN, return the other) and
> >> those
> >> are the semantics provided by the __builtin_fmin/max family of functions 
> >> that expand
> >> to these instructions.
> >>
> >> This patch makes the frontend emit __builtin_fmin/max directly to compare 
> >> each
> >> pair of numbers when the numbers are floating-point, and use 
> >> MIN_EXPR/MAX_EXPR otherwise
> >> (integral types and -ffast-math) which should hopefully be easier to 
> >> recognise in the
> > What is Fortran's requirement on min/max intrinsics?  Doesn't it only
> > require things that
> > are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is
>
> The current implementation expands to:
>  mvar = a1;
>  if (a2 .op. mvar || isnan (mvar))
>mvar = a2;
>  if (a3 .op. mvar || isnan (mvar))
>mvar = a3;
>  ...
>  return mvar;
>
> That is, if one of the operands is a NaN it will return the other argument.
> If both (all) are NaNs, it will return NaN. This is the same as the semantics 
> of fmin/max
> as far as I can tell.
>
> > /* Minimum and maximum values.  When used with floating point, if both
> > operands are zeros, or if either operand is NaN, then it is unspecified
> > which of the two operands is returned as the result.  */
> >
> > which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
> > zeros or NaNs.
> > Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
> > zeros are significant.
>
> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded 
> their use
> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && 
> !HONOR_NANS (type).
>
>
> >
> > I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
> > is a good idea,
> > this may both generate bigger code and be slower.
>
> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when 
> mathfn_built_in advertises
> them as available (does that mean they'll have a fast inline implementation?)

This doesn't mean anything given you make them available with your
patch ;)  So I expect it may
cause issues for !c99_runtime targets (and long double at least).

> If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax
> then the patch falls back
> to the existing expansion.

As said I would not use fmin/fmax calls here at all.

> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.

You said that, yes.  Even without -ffast-math?

Richard.

> Thanks,
> Kyrill
>
> >
> > Richard.
> >
> >> midend and optimise. The previous approach of generating the open-coded 
> >> version of that
> >> is used when we don't have an appropriate __builtin_fmin/max available.
> >> For example, for a configuration of x86_64-unknown-linux-gnu that I tested 
> >> there was no
> >> 128-bit __builtin_fminl available.
> >>
> >> With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being 
> >> generated at -O3
> >> on aarch64 for 521.wrf from fprate SPEC2017 where none before were 
> >> generated
> >> (we were generating explicit comparisons and NaN checks). This gave a 2.4% 
> >> improvement
> >> in performance on a Cortex-A72.
> >>
> >> Bootstrapped and tested on aarch64-none-linux-gnu and 
> >> x86_64-unknown-linux-gnu.
> >>
> >> Ok for trunk?
> >> Thanks,
> >> Kyrill
> >>
> >> 2018-07-17  Kyrylo Tkachov  
> >>
> >>   * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
> >>   __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
> >>   __builtin_fmaxl.
> >>   * trans-intrinsic.c: Include builtins.h.
> >>   (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
> >>   functions to calculate the min/max.
> >>
> >> 2018-07-17  Kyrylo Tkachov  
> >>
> >>   * gfortran.dg/max_fmaxf.f90: New test.
> >>   * gfortran.dg/min_fminf.f90: Likewise.
> >>   * gfortran.dg/minmax_integer.f90: Likewise.
> >>   * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
> >>   * gfortran.dg/min_fminl_aarch64.f90: Likewise.
>
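
For readers following the NaN discussion above, here is a minimal C++ sketch of the two behaviours being compared for MIN: the open-coded expansion quoted in the thread and a chain of fmin calls. The function names open_coded_min and fmin_chain are made up for illustration and are not part of the patch; the sketch also assumes default floating-point semantics (no -ffast-math).

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper mirroring the open-coded expansion quoted above,
// specialised to MIN with three arguments (".op." becomes "<").
double open_coded_min(double a1, double a2, double a3)
{
    double mvar = a1;
    if (a2 < mvar || std::isnan(mvar))
        mvar = a2;
    if (a3 < mvar || std::isnan(mvar))
        mvar = a3;
    return mvar;
}

// The same reduction expressed with the C library fmin, which returns the
// non-NaN operand when exactly one operand is a NaN.
double fmin_chain(double a1, double a2, double a3)
{
    return std::fmin(std::fmin(a1, a2), a3);
}
```

Both helpers skip a NaN operand and only return NaN when every argument is a NaN. Neither says anything about which of +0.0 and -0.0 is returned, which is the signed-zero concern raised for MIN/MAX_EXPR.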


Re: [PATCH 1/4] Clean up of new format of -falign-FOO.

2018-07-18 Thread Martin Liška
On 07/18/2018 05:52 AM, Michael Collison wrote:
> Hi Martin,
> 
> Your alignment patch breaks the arm port. In the file arm.c, function 
> 'get_label_padding' the code uses:
> 
> static HOST_WIDE_INT
> get_label_padding (rtx label)
> {
>   HOST_WIDE_INT align, min_insn_size;
> 
>   align = 1 << label_to_alignment (label);
>   min_insn_size = TARGET_THUMB ? 2 : 4;
>   return align > min_insn_size ? align - min_insn_size : 0;
> }
> 
> Which breaks with your current change. I think this needs to be modified to:
> 
> 'align = 1 << label_to_alignment (label).levels[0].log'

Hello.

Sorry for the breakage; thanks to Jeff it's fixed this way.
r262848 should be fine.

Martin

> 
> Regards,
> 
> Michael Collison
> 
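
As a rough illustration of the arithmetic in the suggested fix, the sketch below models the padding computation with stand-in types. The names align_flags_sketch, alignment_level_sketch, make_flags_sketch and label_padding_sketch are hypothetical; only the shape of the expression .levels[0].log is taken from the message above, and the real GCC types may differ.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the structure returned by label_to_alignment;
// only the .levels[0].log access is taken from the thread.
struct alignment_level_sketch { int log; };
struct align_flags_sketch { alignment_level_sketch levels[2]; };

// Hypothetical constructor-like helper, used only to build test inputs.
align_flags_sketch make_flags_sketch(int log)
{
    align_flags_sketch flags{};
    flags.levels[0].log = log;
    return flags;
}

// Same arithmetic as the quoted get_label_padding, with the alignment read
// from levels[0].log instead of from a plain integer.
std::int64_t label_padding_sketch(const align_flags_sketch &flags, bool thumb)
{
    std::int64_t align = std::int64_t{1} << flags.levels[0].log;
    std::int64_t min_insn_size = thumb ? 2 : 4;
    return align > min_insn_size ? align - min_insn_size : 0;
}
```

With a log2 alignment of 3 (8 bytes) this gives 6 bytes of possible padding in Thumb mode and 4 in ARM mode; an alignment no larger than the minimum instruction size gives 0.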



Re: [PATCH] fix a couple of bugs in const string folding (PR 86532)

2018-07-18 Thread Richard Biener
On Tue, 17 Jul 2018, Martin Sebor wrote:

> The attached update takes care of a couple of problems pointed
> out by Bernd Edlinger in his comments on the bug.  The ICE he
> mentioned in comment #20 was due to mixing sizetype, ssizetype,
> and size_type_node in c_strlen().  AFAICS, some of it predates
> the patch but my changes made it worse and also managed to trigger
> it.

+has no internal zero bytes.  If the offset falls within the bounds
+of the string subtract the offset from the length of the string,
+and return that.  Otherwise the length is zero.  Take care to
+use SAVE_EXPR in case the OFFSET has side-effects.  */
+  tree offsave = TREE_SIDE_EFFECTS (byteoff) ? save_expr (byteoff) : byteoff;
+  offsave = fold_convert (ssizetype, offsave);
+  tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
+ build_int_cst (ssizetype, len * eltsize));
+  tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
+  return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
+ build_zero_cst (ssizetype));

in what case are you expecting to return an actual COND_EXPR and
why is that useful?  You return a signed value but bother to
guard it so it is never less than zero.  Why?  Why not simply
return the difference as you did before but with the side-effects
properly handled?
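
To make the question concrete, here is a small C++ sketch of the two alternatives using plain signed integers in place of trees. It is a simplification under stated assumptions: ssize_sketch stands in for ssizetype, a single len is used where the quoted hunk has both len and strelts, and the element size is taken to be 1. Both function names are invented for illustration.

```cpp
#include <cstddef>

// Signed stand-in for ssizetype (an assumption of this sketch).
using ssize_sketch = std::ptrdiff_t;

// Shape of the quoted COND_EXPR: the difference when the offset is within
// bounds, zero otherwise, so the result is never negative.
ssize_sketch guarded_length(ssize_sketch len, ssize_sketch offset)
{
    return offset <= len ? len - offset : 0;
}

// The simpler alternative asked about: return the difference directly.
ssize_sketch plain_difference(ssize_sketch len, ssize_sketch offset)
{
    return len - offset;
}
```

The two only disagree when the offset exceeds the length, where the plain difference goes negative (for example, length 5 and offset 7 gives -2) while the guarded form returns 0.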

> On 07/17/2018 09:19 AM, Martin Sebor wrote:
> > My enhancement to extract constant strings out of complex
> > aggregates committed last week introduced a couple of bugs in
> > dealing with non-constant indices and offsets.  One of the bugs
> > was fixed earlier today (PR 86528) but another one remains.  It
> > causes strlen (among other functions) to incorrectly fold
> > expressions involving a non-constant index into an array of
> > strings by treating the index the same as a non-constant
> > offset into it.
> > 
> > The non-constant index should either prevent the folding, or it
> > needs to handle it differently from an offset.
> > 
> > The attached patch takes the conservative approach of avoiding
> > the folding in this case.  The remaining changes deal with
> > the fallout from the fix.
> > 
> > Tested on x86_64-linux.
> > 
> > Martin
> > 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [C++ PATCH] Disallow type specifiers among lambda-declarator decl-specifier-seq (PR c++/86550)

2018-07-18 Thread Jakub Jelinek
On Wed, Jul 18, 2018 at 11:34:30AM +1000, Jason Merrill wrote:
> On Wed, Jul 18, 2018 at 4:57 AM, Jakub Jelinek  wrote:
> > The standard says:
> > "In the decl-specifier-seq of the lambda-declarator, each decl-specifier
> > shall either be mutable or constexpr."
> > and the C++ FE has CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR flag for that.
> > But as implemented, it is actually
> > CP_PARSER_FLAGS_ONLY_TYPE_OR_MUTABLE_OR_CONSTEXPR
> > as it allows mutable, constexpr and type specifiers.
> >
> > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> > trunk?
> >
> > 2018-07-17  Jakub Jelinek  
> >
> > PR c++/86550
> > * parser.c (cp_parser_decl_specifier_seq): Don't parse a type 
> > specifier
> > if CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR.
> 
> I think the diagnostic would be better if we parse the type-specifier
> and then give an error about it being invalid in this context, rather
> than not parse it and therefore give a syntax error; the constraint is
> semantic rather than syntactic.

So like this?
It will diagnose each bool and int separately, but that is similar to how it
will diagnose
[] () static extern thread_local inline virtual virtual explicit {}
too.

2018-07-18  Jakub Jelinek  

PR c++/86550
* parser.c (cp_parser_decl_specifier_seq): Diagnose invalid type
specifier if CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR.

* g++.dg/cpp0x/lambda/lambda-86550.C: New test.

--- gcc/cp/parser.c.jj  2018-07-17 20:08:07.630224343 +0200
+++ gcc/cp/parser.c 2018-07-18 10:09:10.655030931 +0200
@@ -13797,6 +13797,9 @@ cp_parser_decl_specifier_seq (cp_parser*
  found_decl_spec = true;
  if (!is_cv_qualifier)
decl_specs->any_type_specifiers_p = true;
+
+ if ((flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR) != 0)
+   error_at (token->location, "type-specifier invalid in lambda");
}
}
 
--- gcc/testsuite/g++.dg/cpp0x/lambda/lambda-86550.C.jj 2018-07-18 10:05:02.894767883 +0200
+++ gcc/testsuite/g++.dg/cpp0x/lambda/lambda-86550.C 2018-07-18 10:13:41.373318350 +0200
@@ -0,0 +1,9 @@
+// PR c++/86550
+// { dg-do compile { target c++11 } }
+
+void
+foo ()
+{
+  auto a = []() bool {};   // { dg-error "type-specifier invalid in lambda" }
+  auto b = []() bool bool bool bool int {};// { dg-error "type-specifier invalid in lambda" }
+}


Jakub
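
For contrast with the rejected forms in the new test, the following is a minimal sketch of a lambda whose decl-specifier-seq is valid under the wording quoted above. The function name count_calls_sketch is made up for illustration. Only mutable is exercised, since it works from C++11 on; constexpr in this position needs C++17.

```cpp
// Hypothetical example: 'mutable' is one of the two decl-specifiers the
// quoted wording allows after the lambda's parameter list.
int count_calls_sketch()
{
    int calls = 0;
    // The by-value capture can be modified because the lambda is mutable.
    auto counter = [calls]() mutable { return ++calls; };
    counter();
    return counter();
}

// By contrast, the forms in the new test are ill-formed, e.g.
//   auto a = []() bool {};
// where 'bool' is a type specifier rather than mutable or constexpr.
```

Calling count_calls_sketch() returns 2, since the lambda increments its own copy of calls on each invocation.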


Re: [C++ Patch] PR 59480 ("Missing error diagnostic: friend declaration specifying a default argument must be a definition")

2018-07-18 Thread Jason Merrill
OK.

On Thu, Jul 12, 2018 at 7:52 PM, Paolo Carlini  wrote:
> Hi,
>
> the below resolves the bug report and its duplicates by implementing - in a
> rather straightforward way, I believe - the resolution of DR 136, which also
> made it into C++17. Note that in the patch I used permerror instead of a plain
> error for consistency with the other check
> (check_redeclaration_no_default_args) which I added (rather) recently, and
> I'm exploiting that to allow two existing testcases to compile as they are.
> Tested x86_64-linux.
>
> Thanks, Paolo.
>
> /
>


Re: [PATCH] S/390: Add CFI for mcount call sequences

2018-07-18 Thread Andreas Krebbel
On 07/17/2018 12:48 PM, Ilya Leoshkevich wrote:
> Fixes unwind for mcount.
> 
> 2018-07-17  Ilya Leoshkevich  
> 
>   * config/s390/s390.c (s390_function_profiler):
> Generate CFI.

Applied. Thanks!

Andreas