Re: [PATCH, rs6000] Refactor FP vector comparison operators

2019-11-19 Thread Kewen.Lin
Hi Segher,

on 2019/11/20 1:29 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Nov 12, 2019 at 06:41:07PM +0800, Kewen.Lin wrote:
>> +;; code iterators and attributes for vector FP comparison operators:
>> +(define_code_iterator vector_fp_comparison_operator [lt le ne
>> + ungt unge unlt unle])
> 
> Let's indent that differently?
> 
> (define_code_iterator
>   vector_fp_comparison_operator [lt le ne ungt unge unlt unle])
> 
> is nice I think.
> 

Done.

>> +; There are 14 possible vector FP comparison operators, gt and eq of them have
>> +; been expanded above, so just support 12 remaining operators here.
>>  
>> +; 0. For ge:
> 
> (Don't number comments please).
> 

Done.

>> +(define_expand "vector_ge<mode>"
>> +  [(set (match_operand:VEC_F 0 "vlogical_operand")
>> +(ge:VEC_F (match_operand:VEC_F 1 "vlogical_operand")
>> +  (match_operand:VEC_F 2 "vlogical_operand")))]
>>   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>> +  "")
> 
> This already exists?  Line 764 in vector.md?
> 

Yes, but that one is only for VEC_F (i.e. vector FP); it's moved here to
centralize them.  Does that make sense?

>> +; 1. For lt/le/ne/ungt/unge/unlt/unle:
>> +; lt(a,b)   = gt(b,a)
>> +; le(a,b)   = ge(b,a)
>> +; unge(a,b) = ~lt(a,b)
>> +; unle(a,b) = ~gt(a,b)
>> +; ne(a,b)   = ~eq(a,b)
>> +; ungt(a,b) = ~le(a,b)
>> +; unlt(a,b) = ~ge(a,b)
>> +(define_insn_and_split "vector_<code><mode>"
>>[(set (match_operand:VEC_F 0 "vfloat_operand")
>> +(vector_fp_comparison_operator:VEC_F
>> +   (match_operand:VEC_F 1 "vfloat_operand")
>> +   (match_operand:VEC_F 2 "vfloat_operand")))]
>>   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>>"#"
>> +  "can_create_pseudo_p ()"
>> +  [(pc)]
>>  {
>> +  enum rtx_code cond = <CODE>;
>> +  rtx res = gen_reg_rtx (<MODE>mode);
>> +  bool need_invert = false;
>> +  switch (cond)
>> +{
>> +case LT:
>> +case LE:
>> +  cond = swap_condition (cond);
>> +  std::swap (operands[1], operands[2]);
>> +  break;
>> +case UNLE:
>> +case UNLT:
>> +case NE:
>> +case UNGE:
>> +case UNGT:
>> +  cond = reverse_condition_maybe_unordered (cond);
>> +  need_invert = true;
>> +  break;
>> +default:
>> +  gcc_unreachable ();
>> +}
>> +
>> +  rtx comp = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]);
>> +  emit_insn (gen_rtx_SET (res, comp));
>> +
>> +  if (need_invert)
>> +    emit_insn (gen_one_cmpl<mode>2 (res, res));
>> +
>> +  emit_insn (gen_rtx_SET (operands[0], res));
>>  })
> 
> Does this work for (say) ungt?  I would do two switch statements: the
> first only to do the invert, and then the second to do the swap (with
> the modified cond!)  So:
> 

Yes, it works, but it then has to go through a further split for "le".  It
looks better to split all the way in one go.  Thanks a lot for your example!

> {
>   enum rtx_code cond = <CODE>;
>   bool need_invert = false;
> 
>   if (cond == UNLE || cond == UNLT || cond == NE
>   || cond == UNGE || cond == UNGT)
> {
>   cond = reverse_condition_maybe_unordered (cond);
>   need_invert = true;
> }
> 
>   if (cond == LT || cond == LE)
> {
>   cond = swap_condition (cond);
>   std::swap (operands[1], operands[2]);
> }
> 

I added an assert here to guard against unexpected codes.
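
For reference, a minimal sketch of that guard (my assumption of its exact
form, not necessarily the committed wording), placed right after the two
if blocks: at that point only gt, ge and eq should remain.

  /* Only gt/ge/eq should survive the invert and swap steps above.  */
  gcc_assert (cond == GT || cond == GE || cond == EQ);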

>   rtx comp = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]);
> 
>   if (need_invert)
> {
>   rtx res = gen_reg_rtx (<MODE>mode);
>   emit_insn (gen_rtx_SET (res, comp));
>   emit_insn (gen_one_cmpl<mode>2 (operands[0], res));
> }
>   else
> emit_insn (gen_rtx_SET (operands[0], comp));
> })
> 

>> +; 2. For ltgt/uneq/ordered/unordered:
>> +; ltgt: gt(a,b) | gt(b,a)
>> +; uneq: ~gt(a,b) & ~gt(b,a)
>> +; ordered: ge(a,b) | ge(b,a)
>> +; unordered: ~ge(a,b) & ~ge(b,a)
> 
> You could write that as
>   ~(ge(a,b) | ge(b,a))
> which may show the structure better?
> 

Done.
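
To illustrate the rewritten identities, here is a rough sketch (an
assumption about one possible shape, not the actual split code in the
patch) of how e.g. uneq(a,b) = ~(gt(a,b) | gt(b,a)) could be emitted;
whether the combined not/ior form is matched directly or needs a further
split is left aside here:

  rtx gt1 = gen_reg_rtx (<MODE>mode);
  rtx gt2 = gen_reg_rtx (<MODE>mode);
  emit_insn (gen_rtx_SET (gt1, gen_rtx_GT (<MODE>mode, operands[1], operands[2])));
  emit_insn (gen_rtx_SET (gt2, gen_rtx_GT (<MODE>mode, operands[2], operands[1])));
  emit_insn (gen_rtx_SET (operands[0],
			  gen_rtx_NOT (<MODE>mode,
				       gen_rtx_IOR (<MODE>mode, gt1, gt2))));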

>> +(define_insn_and_split "vector_<code><mode>"
>>[(set (match_operand:VEC_F 0 "vfloat_operand")
>> +(vector_fp_extra_comparison_operator:VEC_F
>> +   (match_operand:VEC_F 1 "vfloat_operand")
>> +   (match_operand:VEC_F 2 "vfloat_operand")))]
>>   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>>"#"
>> +  "can_create_pseudo_p ()"
> 
> This should be "&& can_create_pseudo ()".
> 

Done.

> Also, can_create_pseudo in the split condition can fail, in theory
> anyway.  It should be part of the insn condition, instead, and even
> then it can fail: after the last split pass, but before reload, such
> an insn can in principle be created, and then it will never be split.
> 
> In this case, maybe you should add a scratch register; in the previous
> case, you can reuse operands[0] for res instead of making a new reg
> for it, if !can_create_pseudo ().
> 

I've updated the previous case to reuse operands[0] for res when
!can_create_pseudo_p ().  But for the latter one I hit an ICE when I added
two clobbers with scratch registers to the RTL pattern, like:

  [(set (match_operand:VEC_F 0 

Re: [PATCH] Fix ICE with __builtin_stack_save (PR c/90898)

2019-11-19 Thread Richard Biener
On Wed, 20 Nov 2019, Jakub Jelinek wrote:

> Hi!
> 
> I agree that we shouldn't have made __builtin_stack_{save,restore} public,
> but I'd fear it is too late to remove it now (and replace by internal fn),
> I'd think some code might use it to control when e.g. alloca will be
> released.  As the comment on insert_clobber* says, the addition of the
> clobbers is a best effort, we could end up not finding the right spot and
> not adding the CLOBBER in there even without user __builtin_* calls, so
> this patch just removes an unnecessary assertion and just will not find
> __builtin_stack_restore if something weird is seen, such as in the testcase
> storing of the __builtin_stack_save result into memory, or could be
> a cast or whatever else too.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2019-11-19  Jakub Jelinek  
> 
>   PR c/90898
>   * tree-ssa-ccp.c (insert_clobber_before_stack_restore): Remove
>   assertion.
>   (insert_clobbers_for_var): Fix a typo in function comment.
> 
>   * gcc.dg/pr90898.c: New test.
> 
> --- gcc/tree-ssa-ccp.c.jj 2019-11-13 10:54:47.093020613 +0100
> +++ gcc/tree-ssa-ccp.c2019-11-19 19:13:35.332705130 +0100
> @@ -2114,8 +2114,6 @@ insert_clobber_before_stack_restore (tre
>  else if (gimple_assign_ssa_name_copy_p (stmt))
>insert_clobber_before_stack_restore (gimple_assign_lhs (stmt), var,
>  visited);
> -else
> -  gcc_assert (is_gimple_debug (stmt));
>  }
>  
>  /* Advance the iterator to the previous non-debug gimple statement in the 
> same
> @@ -2140,9 +2138,9 @@ gsi_prev_dom_bb_nondebug (gimple_stmt_it
>  /* Find a BUILT_IN_STACK_SAVE dominating gsi_stmt (I), and insert
> a clobber of VAR before each matching BUILT_IN_STACK_RESTORE.
>  
> -   It is possible that BUILT_IN_STACK_SAVE cannot be find in a dominator 
> when a
> -   previous pass (such as DOM) duplicated it along multiple paths to a BB.  
> In
> -   that case the function gives up without inserting the clobbers.  */
> +   It is possible that BUILT_IN_STACK_SAVE cannot be found in a dominator 
> when
> +   a previous pass (such as DOM) duplicated it along multiple paths to a BB.
> +   In that case the function gives up without inserting the clobbers.  */
>  
>  static void
>  insert_clobbers_for_var (gimple_stmt_iterator i, tree var)
> --- gcc/testsuite/gcc.dg/pr90898.c.jj 2019-11-19 19:18:02.277712801 +0100
> +++ gcc/testsuite/gcc.dg/pr90898.c2019-11-19 19:18:52.613959787 +0100
> @@ -0,0 +1,16 @@
> +/* PR c/90898 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +void *p;
> +int bar (void);
> +void baz (int *);
> +
> +void
> +foo (void)
> +{
> +  p = __builtin_stack_save ();
> +  int a[(bar (), 2)];
> +  baz (a);
> +  __builtin_stack_restore (p);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH] Fix ICE during MEM_REF expansion (PR middle-end/90840)

2019-11-19 Thread Richard Biener
On Wed, 20 Nov 2019, Jakub Jelinek wrote:

> Hi!
> 
> On the following testcase we ICE on i686-linux (32-bit), because we store
> (first 96-bit, then 72-bit) structure into the first part of a 2x 96-bit
> complex long double, and for 96-bit floats there is no corresponding
> integral mode that covers it and we ICE when op0 is not in MEM (it is a
> REG).
> The following patch handles the simple case where the whole dest REG is
> covered and value is a MEM using a load from the memory, and for the rest
> just spills on the stack, similarly how we punt when for stores to complex
> REGs if the bitsize/bitnum cover portions of both halves.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2019-11-19  Jakub Jelinek  
> 
>   PR middle-end/90840
>   * expmed.c (store_bit_field_1): Handle the case where op0 is not a MEM
>   and has a mode that doesn't have corresponding integral type.
> 
>   * gcc.c-torture/compile/pr90840.c: New test.
> 
> --- gcc/expmed.c.jj   2019-11-15 00:37:32.0 +0100
> +++ gcc/expmed.c  2019-11-19 17:09:22.035129617 +0100
> @@ -840,6 +840,27 @@ store_bit_field_1 (rtx str_rtx, poly_uin
>if (MEM_P (op0))
>   op0 = adjust_bitfield_address_size (op0, op0_mode.else_blk (),
>   0, MEM_SIZE (op0));
> +  else if (!op0_mode.exists ())
> + {
> +   if (ibitnum == 0
> +   && known_eq (ibitsize, GET_MODE_BITSIZE (GET_MODE (op0)))
> +   && MEM_P (value)
> +   && !reverse)
> + {
> +   value = adjust_address (value, GET_MODE (op0), 0);
> +   emit_move_insn (op0, value);
> +   return true;
> + }
> +   if (!fallback_p)
> + return false;
> +   rtx temp = assign_stack_temp (GET_MODE (op0),
> + GET_MODE_SIZE (GET_MODE (op0)));
> +   emit_move_insn (temp, op0);
> +   store_bit_field_1 (temp, bitsize, bitnum, 0, 0, fieldmode, value,
> +  reverse, fallback_p);
> +   emit_move_insn (op0, temp);
> +   return true;
> + }
>else
>   op0 = gen_lowpart (op0_mode.require (), op0);
>  }
> --- gcc/testsuite/gcc.c-torture/compile/pr90840.c.jj  2019-11-19 
> 17:18:31.361918896 +0100
> +++ gcc/testsuite/gcc.c-torture/compile/pr90840.c 2019-11-19 
> 17:11:06.010575339 +0100
> @@ -0,0 +1,19 @@
> +/* PR middle-end/90840 */
> +struct S { long long a; int b; };
> +struct S foo (void);
> +struct __attribute__((packed)) T { long long a; char b; };
> +struct T baz (void);
> +
> +void
> +bar (void)
> +{
> +  _Complex long double c;
> +  *(struct S *)  = foo ();
> +}
> +
> +void
> +qux (void)
> +{
> +  _Complex long double c;
> +  *(struct T *)  = baz ();
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH v3] PR92398: Fix testcase failure of pr72804.c

2019-11-19 Thread luoxhu
The instruction sequence generated for P9LE is not worse than for P8LE:
mtvsrdd;xxlnot;stxv vs. not;not;std;std.
Update the test case to fix the failures.

v3:
Define and use check_effective_target_xxx etc.
pre_power8: ... power6, power7.
power8: power8 only.
post_power8: power8, power9 ...
post_power9: power9, power10 ...
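
In preprocessor terms the selectors correspond to (the post_power8 and
post_power9 conditions are my assumption, mirroring the pre_power8 and
power8 procs shown in the patch): pre_power8 is !defined(_ARCH_PWR8),
power8 is defined(_ARCH_PWR8) && !defined(_ARCH_PWR9), post_power8 is
defined(_ARCH_PWR8), and post_power9 is defined(_ARCH_PWR9).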

gcc/testsuite/ChangeLog:

2019-11-15  Luo Xiong Hu  

PR testsuite/92398
* gcc.target/powerpc/pr72804.c: Update instruction count per
platform.
* lib/target-supports.exp (check_effective_target_pre_power8): New.
(check_effective_target_power8): New.
(check_effective_target_post_power8): New.
(check_effective_target_post_power9): New.
---
 gcc/testsuite/gcc.target/powerpc/pr72804.c | 39 
 gcc/testsuite/lib/target-supports.exp  | 43 ++
 2 files changed, 74 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr72804.c 
b/gcc/testsuite/gcc.target/powerpc/pr72804.c
index 10e37caed6b..69da8942ddd 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr72804.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr72804.c
@@ -1,7 +1,6 @@
 /* { dg-do compile { target { lp64 } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-options "-O2 -mvsx -fno-inline-functions --param 
max-inline-insns-single-O2=200" } */
+/* { dg-require-effective-target powerpc_vsx_ok  } */
+/* { dg-options "-O2 -mvsx" } */
 
 __int128_t
 foo (__int128_t *src)
@@ -15,11 +14,35 @@ bar (__int128_t *dst, __int128_t src)
   *dst =  ~src;
 }
 
-/* { dg-final { scan-assembler-times "not " 4 } } */
-/* { dg-final { scan-assembler-times "std " 2 } } */
+/* { dg-final { scan-assembler-times "not " 2 { target post_power9 } } } */
+/* { dg-final { scan-assembler-not "std " { target post_power9 } } } */
+/* { dg-final { scan-assembler-not "stxvd2x" { target post_power9 } } } */
+/* { dg-final { scan-assembler-not "xxpermdi" { target post_power9 } } } */
+
+/* { dg-final { scan-assembler-times "not " 2 { target { {power8} && {be} } } 
} } */
+/* { dg-final { scan-assembler-not "std " { target { {power8} && {be} } } } } 
*/
+/* { dg-final { scan-assembler-times "stxvd2x" 1 { target { {power8} && {be} } 
} } } */
+/* { dg-final { scan-assembler-times "xxpermdi" 1 { target { {power8} && {be} 
} } } } */
+
+/* { dg-final { scan-assembler-times "not " 4 { target { {power8} && {le} } } 
} } */
+/* { dg-final { scan-assembler-times "std " 2 { target { {power8} && {le} } } 
} } */
+/* { dg-final { scan-assembler-not "stxvd2x" { target { {power8} && {le} } } } 
} */
+/* { dg-final { scan-assembler-not "xxpermdi" { target { {power8} && {le} } } 
} } */
+
+/* { dg-final { scan-assembler-not "lxvd2x" { target post_power8 } } } */
+
+/* { dg-final { scan-assembler-times "not " 2 { target { {pre_power8} && {be} 
} } } } */
+/* { dg-final { scan-assembler-not "std " { target { {pre_power8} && {be} } } 
} } */
+/* { dg-final { scan-assembler-times "lxvd2x" 1 { target { {pre_power8} && 
{be} } } } } */
+/* { dg-final { scan-assembler-times "stxvd2x" 1 { target { {pre_power8} && 
{be} } } } } */
+
+/* { dg-final { scan-assembler-times "not " 4 { target { {pre_power8} && {le} 
} } } } */
+/* { dg-final { scan-assembler-times "std " 2 { target { {pre_power8} && {le} 
} } } } */
+/* { dg-final { scan-assembler-not "lxvd2x" { target { {pre_power8} && {le} } 
} } } */
+/* { dg-final { scan-assembler-not "stxvd2x" { target { {pre_power8} && {le} } 
} } } */
+
+/* { dg-final { scan-assembler-not "xxpermdi" { target pre_power8 } } } */
+
 /* { dg-final { scan-assembler-times "ld " 2 } } */
-/* { dg-final { scan-assembler-not "lxvd2x" } } */
-/* { dg-final { scan-assembler-not "stxvd2x" } } */
-/* { dg-final { scan-assembler-not "xxpermdi" } } */
 /* { dg-final { scan-assembler-not "mfvsrd" } } */
 /* { dg-final { scan-assembler-not "mfvsrd" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 751045d4744..324ec152544 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2585,6 +2585,49 @@ proc check_effective_target_le { } {
 }]
 }
 
+# Return 1 if we're generating code for pre-power8 platforms.
+# Power8 is NOT included.
+
+proc check_effective_target_pre_power8 {  } {
+  return [check_no_compiler_messages_nocache pre_power8 assembly {
+   #if defined(_ARCH_PWR8)
+   #error NO
+   #endif
+  } ""]
+}
+
+# Return 1 if we're generating code for power8 platforms.
+
+proc check_effective_target_power8 {  } {
+  return [check_no_compiler_messages_nocache power8 assembly {
+   #if !(!defined(_ARCH_PWR9) && defined(_ARCH_PWR8))
+   #error NO
+   #endif
+  } ""]
+}
+
+# Return 1 if we're generating code for post-power8 platforms.
+# Power8 is included.
+
+proc check_effective_target_post_power8 {  } {
+  return [check_no_compiler_messages_nocache post_power8 assembly {
+   #if 

Re: [PATCH] [MIPS] Prevent MSA branches from being put into delay slots

2019-11-19 Thread Dragan Mladjenovic


On 16.11.2019. 00:33, Jeff Law wrote:
> On 11/15/19 10:27 AM, Dragan Mladjenovic wrote:
>> From: "Dragan Mladjenovic" 
>>
>>   This patch tightens the instruction definitions to make sure
>>   that MSA branch instructions cannot be put into delay slots and have their
>>   delay slots eligible for being filled. Also, MSA *div*3 patterns use MSA
>>   branches for zero checks but are not marked as being multi instruction and
>>   thus could be put into delay slots. This patch fixes that.
>>
>>   Testcase only covers if MSA branch delay slot is being filled.
>>
>> gcc/ChangeLog:
>>
>> 2019-11-15  Zoran Jovanovic 
>>  Dragan Mladjenovic  
>>
>>  * config/mips/mips-msa.md (msa__, 
>> msa__v_):
>>  Mark as not having "likely" version.
>>  * config/mips/mips.md (insn_count): The simd_div instruction with
>>  TARGET_CHECK_ZERO_DIV consists of 3 instructions.
>>  (can_delay): Exclude simd_branch.
>>  (define_delay *): Add simd_branch instructions.
>>  They have one regular delay slot.
> OK
> jeff
>

Thanks, committed as r278458.

Dragan


Re: [PATCH] Fix up arch= handling in x86 target attribute (PR target/90867)

2019-11-19 Thread Uros Bizjak
On Wed, Nov 20, 2019 at 12:53 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The arch= handling in target attribute right now clears almost all isa_flags
> and all isa_flags2, so that later call to ix86_option_override_internal
> can set just the isa options for the specific arch and nothing else.
> Unfortunately, it doesn't work, because next to the ix86_isa_flags{,2}
> we have also ix86_isa_flags{,2}_explicit bitmask and in
> ix86_option_override_internal and say for arch=x86_64 will not try to
> set sse2 isa when we cleared it in ix86_isa_flags but kept it set in
> ix86_isa_flags_explicit.
> So, the testcase works fine with -O2, but doesn't work with -O2 -msse2,
> in the former case ix86_isa_flags_explicit doesn't have MASK_SSE2 bit set,
> but in the latter case it does, so in the former case we end up with
> MASK_SSE2 in ix86_isa_flags, in the latter not in the function with target
> attribute.
>
> The following patch thus clears both ix86_isa_flags{,2} and corresponding
> ix86_isa_flags{,2}_explicit.  Also, so that say
> target ("arch=x86_64", "avx") works, the clearing is done when actually
> seeing the arch=, not at the end.  target ("avx", "arch=x86_64") will still
> not enable avx, like before, though.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2019-11-19  Jakub Jelinek  
>
> PR target/90867
> * config/i386/i386-options.c (ix86_valid_target_attribute_tree): Don't
> clear opts->x_ix86_isa_flags{,2} here...
> (ix86_valid_target_attribute_inner_p): ... but here when seeing
> arch=.  Also clear opts->x_ix86_isa_flags{,2}_explicit.
>
> * gcc.target/i386/pr90867.c: New test.

OK, also for backports.

Thanks,
Uros.

> --- gcc/config/i386/i386-options.c.jj   2019-11-18 12:07:54.672405129 +0100
> +++ gcc/config/i386/i386-options.c  2019-11-19 18:43:36.426606458 +0100
> @@ -1147,7 +1147,25 @@ ix86_valid_target_attribute_inner_p (tre
>   ret = false;
> }
>   else
> -   p_strings[opt] = xstrdup (p + opt_len);
> +   {
> + p_strings[opt] = xstrdup (p + opt_len);
> + if (opt == IX86_FUNCTION_SPECIFIC_ARCH)
> +   {
> + /* If arch= is set,  clear all bits in x_ix86_isa_flags,
> +except for ISA_64BIT, ABI_64, ABI_X32, and CODE16
> +and all bits in x_ix86_isa_flags2.  */
> + opts->x_ix86_isa_flags &= (OPTION_MASK_ISA_64BIT
> +| OPTION_MASK_ABI_64
> +| OPTION_MASK_ABI_X32
> +| OPTION_MASK_CODE16);
> + opts->x_ix86_isa_flags_explicit &= (OPTION_MASK_ISA_64BIT
> + | OPTION_MASK_ABI_64
> + | OPTION_MASK_ABI_X32
> + | OPTION_MASK_CODE16);
> + opts->x_ix86_isa_flags2 = 0;
> + opts->x_ix86_isa_flags2_explicit = 0;
> +   }
> +   }
> }
>
>else if (type == ix86_opt_enum)
> @@ -1225,18 +1243,8 @@ ix86_valid_target_attribute_tree (tree f
>/* If we are using the default tune= or arch=, undo the string 
> assigned,
>  and use the default.  */
>if (option_strings[IX86_FUNCTION_SPECIFIC_ARCH])
> -   {
> - opts->x_ix86_arch_string
> -   = ggc_strdup (option_strings[IX86_FUNCTION_SPECIFIC_ARCH]);
> -
> - /* If arch= is set,  clear all bits in x_ix86_isa_flags,
> -except for ISA_64BIT, ABI_64, ABI_X32, and CODE16.  */
> - opts->x_ix86_isa_flags &= (OPTION_MASK_ISA_64BIT
> -| OPTION_MASK_ABI_64
> -| OPTION_MASK_ABI_X32
> -| OPTION_MASK_CODE16);
> - opts->x_ix86_isa_flags2 = 0;
> -   }
> +   opts->x_ix86_arch_string
> + = ggc_strdup (option_strings[IX86_FUNCTION_SPECIFIC_ARCH]);
>else if (!orig_arch_specified)
> opts->x_ix86_arch_string = NULL;
>
> --- gcc/testsuite/gcc.target/i386/pr90867.c.jj  2019-11-19 18:49:11.746592189 
> +0100
> +++ gcc/testsuite/gcc.target/i386/pr90867.c 2019-11-19 18:48:17.271406789 
> +0100
> @@ -0,0 +1,30 @@
> +/* PR target/90867 */
> +/* { dg-do run { target lp64 } } */
> +/* { dg-options "-O2 -msse2" } */
> +
> +unsigned long long freq = 36UL;   /* 3.6 GHz = 3600.0 MHz */
> +
> +__attribute__((noipa)) void
> +bar (double x)
> +{
> +  static double d = 36.0;
> +  if (x != d)
> +__builtin_abort ();
> +  d /= 1000.0;
> +}
> +
> +__attribute__ ((target ("arch=x86-64"))) int
> +foo ()
> +{
> +  bar ((double) freq);
> +  bar (1e-3 * freq);
> +  bar (1e-6 * freq);
> +  bar (1e-9 * freq);
> +  return 0;
> +}
> +
> +int
> +main ()
> +{
> +  return foo ();
> +}
>
> Jakub
>


Re: [C++ PATCH] Fix weird addr_expr not supported by dump_expr messages (PR c++/90767)

2019-11-19 Thread Jason Merrill

On 11/19/19 11:38 PM, Jakub Jelinek wrote:

Hi!

The following patch is a minimal fix to avoid
cannot convert ‘‘addr_expr’ not supported by dump_type’ to ‘X*’
and similar messages.  The recently added complain_about_bad_argument
function expects a from_type argument, but conv->from isn't necessarily a
type, it can be an expression too.

With this patch one gets error like:
cannot convert ‘const X*’ to ‘X*’
and note like
initializing argument 'this' of ‘void X::foo()’
Still, perhaps what GCC 8 and earlier used to emit might be clearer:
pr90767-1.C: In member function ‘X::operator T() const’:
pr90767-1.C:12:7: error: no matching function for call to ‘X::foo() const’
pr90767-1.C:6:8: note: candidate: ‘void X::foo()’ 
pr90767-1.C:6:8: note:   passing ‘const X*’ as ‘this’ argument discards 
qualifiers
There is the print_conversion_rejection function that handles the various
cases, like this vs. other arguments, conv->from with expr type vs. type
etc.
Though, I must say I don't understand the reasons why 
complain_about_bad_argument
has been added and whether we'd want to emit there what
print_conversion_rejection prints as notes with 2 leading spaces instead as
errors with no leading spaces.


Historically, when we have a single candidate we assume it's chosen by 
overload resolution and try to call it, so we often get different 
diagnostics.  Sometimes better, sometimes worse.  In this case it seems 
about even.  I'm surprised this case was different in GCC 8.



In any case, I think the patch below is a step in the right direction.


Agreed, the patch is OK.


2019-11-19  Jakub Jelinek  

PR c++/90767
* call.c (complain_about_no_candidates_for_method_call): If
conv->from is not a type, pass to complain_about_bad_argument
lvalue_type of conv->from.

* g++.dg/diagnostic/pr90767-1.C: New test.
* g++.dg/diagnostic/pr90767-2.C: New test.

--- gcc/cp/call.c.jj2019-11-18 18:49:14.461880924 +0100
+++ gcc/cp/call.c   2019-11-19 14:40:19.121937148 +0100
@@ -9861,8 +9861,11 @@ complain_about_no_candidates_for_method_
  if (const conversion_info *conv
= maybe_get_bad_conversion_for_unmatched_call (candidate))
{
+ tree from_type = conv->from;
+ if (!TYPE_P (conv->from))
+   from_type = lvalue_type (conv->from);
  complain_about_bad_argument (conv->loc,
-  conv->from, conv->to_type,
+  from_type, conv->to_type,
   candidate->fn, conv->n_arg);
  return;
}
--- gcc/testsuite/g++.dg/diagnostic/pr90767-1.C.jj  2019-11-19 
14:48:00.386041586 +0100
+++ gcc/testsuite/g++.dg/diagnostic/pr90767-1.C 2019-11-19 14:46:53.395043036 
+0100
@@ -0,0 +1,15 @@
+// PR c++/90767
+// { dg-do compile }
+
+struct X {
+  int n;
+  void foo (); // { dg-message "initializing argument 'this'" }
+
+  template
+  operator T () const
+{
+  if (n == 0)
+   foo (); // { dg-error "cannot convert 'const X\\*' to 'X\\*'" }
+  return n;
+}
+};
--- gcc/testsuite/g++.dg/diagnostic/pr90767-2.C.jj  2019-11-19 
14:50:48.923522136 +0100
+++ gcc/testsuite/g++.dg/diagnostic/pr90767-2.C 2019-11-19 14:52:27.324051149 
+0100
@@ -0,0 +1,15 @@
+// PR c++/90767
+// { dg-do compile }
+
+struct A {
+  struct B { B (int) {} };
+
+  template 
+  void foo ()
+  {
+int x = 0;
+bar (x);   // { dg-error "cannot convert 'int' to 'A::B&'" }
+  }
+
+  void bar (B &) {} // { dg-message "initializing argument 1" }
+};

Jakub





Re: [C++ PATCH] Fix up lambda decl specifier parsing ICE (PR c++/90842)

2019-11-19 Thread Jason Merrill

On 11/19/19 11:46 PM, Jakub Jelinek wrote:

Hi!

In lambdas, the only valid decl specifiers are mutable, constexpr or
consteval.  For various other simple specifiers it is fine to parse them
and reject afterwards if the parsing is simple consuming of a single token
and setting some flags, but as the testcase shows, especially allowing
type specifiers, including new type definitions in there can cause ICEs.
The following patch punts for the cases where the parsing isn't that simple,
which I think is concept, typedef (there we e.g. commit tentative parsing)
or the type specifiers.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-19  Jakub Jelinek  

PR c++/90842
* parser.c (cp_parser_decl_specifier_seq): For concept, typedef
or type specifier with CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR
don't try to parse them at all.

* g++.dg/cpp1y/lambda-generic-90842.C: New test.
* g++.dg/cpp0x/lambda/lambda-86550.C: Adjust expected diagnostics.

--- gcc/cp/parser.c.jj  2019-11-14 09:13:24.356104252 +0100
+++ gcc/cp/parser.c 2019-11-19 17:47:24.776014270 +0100
@@ -14094,6 +14094,12 @@ cp_parser_decl_specifier_seq (cp_parser*
  break;
  
  case RID_CONCEPT:

+  if (flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR)
+{
+ found_decl_spec = false;
+ break;
+}
+
ds = ds_concept;
cp_lexer_consume_token (parser->lexer);


It would seem better to break after consuming the token, so we just skip 
the extra processing and still give the same error.


  
@@ -14136,6 +14142,12 @@ cp_parser_decl_specifier_seq (cp_parser*

  /* decl-specifier:
   typedef  */
case RID_TYPEDEF:
+  if (flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR)
+{
+ found_decl_spec = false;
+ break;
+}
+
  ds = ds_typedef;
  /* Consume the token.  */
  cp_lexer_consume_token (parser->lexer);
@@ -14229,7 +14241,9 @@ cp_parser_decl_specifier_seq (cp_parser*
  
/* If we don't have a DECL_SPEC yet, then we must be looking at

 a type-specifier.  */
-  if (!found_decl_spec && !constructor_p)
+  if (!found_decl_spec
+ && !constructor_p
+ && (flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR) == 0)


And instead of this, maybe set CP_PARSER_FLAGS_NO_TYPE_DEFINITIONS so we 
keep the same diagnostic for other type-specifiers?


Jason



Re: introduce -fcallgraph-info option

2019-11-19 Thread Alexandre Oliva
On Nov  6, 2019, Alexandre Oliva  wrote:

>   (CALLEE_FROM_CGRAPH_P): New.
>   (dump_final_callee_vcg, dump_final_node_vcg): New.

I goofed in the testing of this last-minute change.  It only works with
optimization disabled.  With any optimization enabled,
pass_remove_cgraph_callee_edges runs first thing in
pass_all_optimizations{,_g} without a subsequent
pass_rebuild_cgraph_edges.  This release takes place long before we'd
even know which calls make to expand, and thus to the callgraph-info
expected output.

It has just occurred to me that I could retain the logic but make it
conditional on !optimize, but that would be error prone (AFAICT there's
no reason why we don't release callees early without optimization) and
probably not worth it.


drop attempt to reuse cgraph callees for -fcallgraph-info

The information in cgraph callees is released long before we get to
the point at which -fcallgraph-info edges are dumped, or even
expanded.  It doesn't make sense to retain it longer: the edges
created for -fcallgraph-info are much smaller, and they don't even
coexist, so not even peak use grows.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* function.h (CALLEE_FROM_CGRAPH_P): Remove.
* function.c (record_final_call): Record even calls that might
have been in the cgraph.
* toplev.c (dump_final_node_vcg): Skip iteration over cgraph
callees.
---
 gcc/function.c |3 ---
 gcc/function.h |5 +
 gcc/toplev.c   |8 
 3 files changed, 1 insertion(+), 15 deletions(-)

diff --git a/gcc/function.c b/gcc/function.c
index 1fe956b..2534c92 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -6406,9 +6406,6 @@ rest_of_handle_thread_prologue_and_epilogue (void)
 void
 record_final_call (tree callee, location_t location)
 {
-  if (!callee || CALLEE_FROM_CGRAPH_P (callee))
-return;
-
   struct callinfo_callee datum = { location, callee };
   vec_safe_push (cfun->su->callees, datum);
 }
diff --git a/gcc/function.h b/gcc/function.h
index 14794c4..beb5c7d 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -192,15 +192,12 @@ public:
   poly_int64 length;
 };
 
-/* Describe emitted builtin calls for -fcallgraph-info.  Those that
-   are not builtin are taken from cgraph edges.  */
+/* Describe emitted calls for -fcallgraph-info.  */
 struct GTY(()) callinfo_callee
 {
   location_t location;
   tree decl;
 };
-#define CALLEE_FROM_CGRAPH_P(T)\
-  (!fndecl_built_in_p (T) && !DECL_IS_BUILTIN (T))
 
 /* Describe dynamic allocation for -fcallgraph-info=da.  */
 struct GTY(()) callinfo_dalloc
diff --git a/gcc/toplev.c b/gcc/toplev.c
index d4583ba..3b9f9ee 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1093,14 +1093,6 @@ dump_final_node_vcg (FILE *f)
 dump_final_callee_vcg (f, c->location, c->decl);
   vec_free (cfun->su->callees);
   cfun->su->callees = NULL;
-
-  cgraph_node *cnode = cgraph_node::get (current_function_decl);
-  for (cgraph_edge *e = cnode->callees; e; e = e->next_callee)
-if (CALLEE_FROM_CGRAPH_P (e->callee->decl))
-  dump_final_callee_vcg (f, gimple_location (e->call_stmt),
-e->callee->decl);
-  for (cgraph_edge *e = cnode->indirect_calls; e; e = e->next_callee)
-dump_final_callee_vcg (f, gimple_location (e->call_stmt), NULL);
 }
 
 /* Output stack usage and callgraph info, as requested.  */




-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);


Re: [PATCH v2] Add `--with-install-sysroot=' configuration option

2019-11-19 Thread Maciej W. Rozycki
On Tue, 19 Nov 2019, Joseph Myers wrote:

> > > 4. How does this interact with sysroot suffixes (again, this should be 
> > > made clear in the documentation)?
> > 
> >  There is no interaction, the patch merely changes where the libraries are 
> > installed.  If the installation sysroot directory chosen is not one known 
> > by the GCC driver, then the newly-installed target libraries won't be 
> > automatically used (that of course can be changed with the appropriate use 
> > of the `-B', `-L' and `--sysroot=' driver options).
> 
> Perhaps the "sysroot" phrasing of the option name is confusing.
> 
> The documentation in install.texi says "@var{dir} rather than 
> @option{$@{gcc_tooldir@}/lib}".  If that means, for example, that when 
> "-print-multi-os-directory" prints "../lib64" the libraries are installed 
> in $dir/../lib64 (so you'd pass --with-install-sysroot=/some/where/lib 
> rather than --with-install-sysroot=/some/where), it's definitely not a 
> sysroot.  If in fact $dir/lib/../lib64 would be used, the documentation 
> should say so.

 Documentation thinko here, thanks for your meticulousness!  Indeed that 
has to read "@option{$@{gcc_tooldir@}}" as per example code:

  case ${with_install_sysroot} in
no)
  toolexeclibdir='$(toolexecdir)/lib'
  ;;
*)
  toolexeclibdir=${with_install_sysroot}/lib
  ;;
  esac

where "@var{dir}" does get interpreted as a sysroot (as was also 
previously shown by my use example).

> But even then, if you configure GCC using "--with-sysroot" or 
> "--with-build-sysroot", both of those paths are the top-level sysroot, to 
> which the sysroot suffix gets appended before GCC uses it for any purpose, 
> unless you explicitly build using --no-sysroot-suffix.  So I still think 
> it's natural for a user of GCC's configure scripts to expect the new 
> option, like the other sysroot-related configure options, also to be one 
> to which the per-multilib sysroot suffix gets appended before GCC uses it.  
> And if it's not like that, the documentation needs to say so explicitly.

 Thanks for your concern, however again, AFAICT this change is tangential 
to any sysroot suffix, which necessarily has to be already included in the 
multilib OS directory as given by `-print-multi-os-directory', so that it 
gets embedded within $toolexeclibdir for the purpose of target library 
installation across the relevant subdirs, as per this excerpt from 
`configure' code right after the assignments quoted in the example above:

multi_os_directory=`$CC -print-multi-os-directory`
case $multi_os_directory in
  .) ;; # Avoid trailing /.
  *) toolexeclibdir=$toolexeclibdir/$multi_os_directory ;;
esac

or otherwise the existing arrangement where 
toolexeclibdir='$(toolexecdir)/lib' wouldn't have worked either (and 
neither would in the native case where toolexeclibdir='$(libdir)').
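
 As a hypothetical illustration: with `--with-install-sysroot=/opt/sysroot' 
and a multilib whose `-print-multi-os-directory' output is `../lib64', the 
code above yields toolexeclibdir=/opt/sysroot/lib/../lib64, i.e. any sysroot 
suffix is expected to be folded into the multilib OS directory already rather 
than appended by this option.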

 Does this answer clear your concern?  OK to apply with the documentation 
thinko fixed?

  Maciej


Re: [PATCH] Fix ICE during MEM_REF expansion (PR middle-end/90840)

2019-11-19 Thread Jeff Law
On 11/19/19 4:42 PM, Jakub Jelinek wrote:
> Hi!
> 
> On the following testcase we ICE on i686-linux (32-bit), because we store
> (first 96-bit, then 72-bit) structure into the first part of a 2x 96-bit
> complex long double, and for 96-bit floats there is no corresponding
> integral mode that covers it and we ICE when op0 is not in MEM (it is a
> REG).
> The following patch handles the simple case where the whole dest REG is
> covered and value is a MEM using a load from the memory, and for the rest
> just spills on the stack, similarly how we punt when for stores to complex
> REGs if the bitsize/bitnum cover portions of both halves.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2019-11-19  Jakub Jelinek  
> 
>   PR middle-end/90840
>   * expmed.c (store_bit_field_1): Handle the case where op0 is not a MEM
>   and has a mode that doesn't have corresponding integral type.
> 
>   * gcc.c-torture/compile/pr90840.c: New test.
OK
jeff



Re: [PATCH] Fix ICE with __builtin_stack_save (PR c/90898)

2019-11-19 Thread Jeff Law
On 11/19/19 4:58 PM, Jakub Jelinek wrote:
> Hi!
> 
> I agree that we shouldn't have made __builtin_stack_{save,restore} public,
> but I'd fear it is too late to remove it now (and replace by internal fn),
> I'd think some code might use it to control when e.g. alloca will be
> released.  As the comment on insert_clobber* says, the addition of the
> clobbers is a best effort, we could end up not finding the right spot and
> not adding the CLOBBER in there even without user __builtin_* calls, so
> this patch just removes an unnecessary assertion and just will not find
> __builtin_stack_restore if something weird is seen, such as in the testcase
> storing of the __builtin_stack_save result into memory, or could be
> a cast or whatever else too.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2019-11-19  Jakub Jelinek  
> 
>   PR c/90898
>   * tree-ssa-ccp.c (insert_clobber_before_stack_restore): Remove
>   assertion.
>   (insert_clobbers_for_var): Fix a typo in function comment.
> 
>   * gcc.dg/pr90898.c: New test.
OK
jeff



Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-11-19 Thread Jeff Law
On 11/19/19 5:03 PM, Jakub Jelinek wrote:
> On Tue, Sep 03, 2019 at 02:27:29PM -0600, Jeff Law wrote:
 +  /* The transformation below will inherently introduce a memory load,
 +   for which LHS may not be initialized yet if it is not in NOTRAP,
 +   so a -Wmaybe-uninitialized warning message could be triggered.
 +   Since it's a bit hard to have a very accurate uninitialization
 +   analysis to memory reference, we disable the warning here to avoid
 +   confusion.  */
 +  TREE_NO_WARNING (lhs) = 1;
>>>
>>> I don't like this, but not for the reasons Martin stated, we use
>>> TREE_NO_WARNING not just when we've emitted warnings, but in many places
>>> when we've done something that might trigger false positives.
>>> Yes, it would be nice to do it more selectively.
>>>
>>> The problem I see with the above though is that lhs might very well be
>>> a decl, and setting TREE_NO_WARNING on it then doesn't affect only the
>>> hoisted load, but also all other code that refers to the decl.
>> LHS is restricted to just MEM_REF, ARRAY_REF and COMPONENT_REF.  We'd be
>> setting the NO_WARNING bits on the toplevel expression, but not on
>> anything shared like a _DECL node.
>>
>> So what we're losing here would be things like out of bounds array
>> checks on the LHS, so it still sucks.
> 
> Sorry for dropping the ball on this.
> You're right, LHS is a MEM_REF, ARRAY_REF or COMPONENT_REF, so what I was
> worried about doesn't happen.
> I've tried using gimple_set_no_warning, but tree-ssa-uninit.c in this case
> actually doesn't look at that (it does only in a different spot).
> 
>>> If the TREE_NO_WARNING bit is set on something that isn't shareable, that is
>>> fine with me, like a MEM_REF, TARGET_MEM_REF or handled component.  If lhs
>>> is a decl, can we force a MEM_REF around it (and won't we fold it back to
>>> the decl?)?  Or perhaps better, can we gimple_set_no_warning on the load
>>> stmt instead?
>> We have the toplevel statement, so that might be worth a try as well.
> 
> But, what the patch did was set it on the tree that is later unshared,
> which means TREE_NO_WARNING wasn't set just on the rhs1 of the load, but
> also on the lhs of the store.
> 
> The following version fixes that and I've also added the testcase to the
> testsuite.
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2019-11-19  Jiangning Liu  
>   Jakub Jelinek  
> 
>   PR middle-end/91195
>   * tree-ssa-phiopt.c (cond_store_replacement): Move lhs unsharing
>   earlier.  Set TREE_NO_WARNING on the rhs1 of the artificially added
>   load.
> 
>   * gcc.dg/pr91195.c: New test.
OK
jeff



[PATCH v3] libgomp/test: Add flags to find libatomic in build-tree testing

2019-11-19 Thread Maciej W. Rozycki
Add flags to find libatomic in build-tree testing, fixing a catastrophic 
libgomp testsuite failure with `riscv*-*-linux*' targets, which imply 
`-latomic' with the `-pthread' GCC option, implied in turn by the 
`-fopenacc' and `-fopenmp' options, removing failures like:

.../bin/riscv64-linux-gnu-ld: cannot find -latomic
collect2: error: ld returned 1 exit status
compiler exited with status 1
FAIL: libgomp.c/../libgomp.c-c++-common/atomic-18.c (test for excess errors)
Excess errors:
.../bin/riscv64-linux-gnu-ld: cannot find -latomic

UNRESOLVED: libgomp.c/../libgomp.c-c++-common/atomic-18.c compilation failed to 
produce executable

and bringing overall test results for the `riscv64-linux-gnu' target 
(here with the `x86_64-linux-gnu' host and RISC-V QEMU in the Linux user 
emulation mode as the target board) from:

=== libgomp Summary ===

# of expected passes90
# of unexpected failures3267
# of expected failures  2
# of unresolved testcases   3247
# of unsupported tests  548

to:

=== libgomp Summary ===

# of expected passes6834
# of unexpected failures4
# of expected failures  4
# of unsupported tests  518

libgomp/
* testsuite/lib/libgomp.exp (libgomp_init): Add flags to find 
libatomic in build-tree testing.
---
On Wed, 20 Nov 2019, Jakub Jelinek wrote:

> >  Why do you think it is important that it is only the relevant (i.e. 
> > `riscv*') targets that the directory providing newly-built libatomic is 
> > pointed at ?
> 
> Such changes don't happen every day, no other port added it similarly to
> riscv during the 2.5 years since riscv had it, and no port needed it ever
> before.  With unneeded -L options the log file is larger, one needs to
> cut'n'paste more when trying to reproduce stuff etc.

 Fair enough.  OK to apply?

 NB I've decided to use `ishost' rather than `istarget', and updated the 
existing comment accordingly, even though the triplets are always the same 
here, because GCC's target is libgomp's host, e.g.:

Test run by macro on Wed Nov 20 00:39:23 2019
Target is riscv64-unknown-linux-gnu
Host   is riscv64-unknown-linux-gnu
Build  is x86_64-pc-linux-gnu

=== libgomp tests ===

Schedule of variations:
qemu-user-lp64d

Running target qemu-user-lp64d
[...]

-- and libgomp itself does not have a notion of a target (unlike e.g. 
libbfd).  Consequently I suspect that:

# Many hosts now default to a non-ASCII C locale, however, so
# they can set a charset encoding here if they need.
if { [ishost "*-*-cygwin*"] } {
  setenv LC_ALL C.ASCII
  setenv LANG C.ASCII
}

is incorrect for `dg-output' matching as it will execute for libgomp built 
for `*-*-cygwin*' rather than being necessarily verified on one, and 
therefore `isbuild' ought to be used here.

  Maciej

Changes from v2:

- Limit the change to `riscv*-*-linux*' targets.

- Refer to the host rather than target within libgomp.exp as GCC's target 
  is libgomp's host.

Changes from v1:

- Replaced references to "offload options" with "`-fopenacc' and 
  `-fopenmp' options".
---
 libgomp/testsuite/lib/libgomp.exp |   14 ++
 1 file changed, 14 insertions(+)

gcc-test-libgomp-atomic-lib-path.diff
Index: gcc/libgomp/testsuite/lib/libgomp.exp
===
--- gcc.orig/libgomp/testsuite/lib/libgomp.exp
+++ gcc/libgomp/testsuite/lib/libgomp.exp
@@ -174,6 +174,20 @@ proc libgomp_init { args } {
 # For build-tree testing, also consider the library paths used for builing.
 # For installed testing, we assume all that to be provided in the sysroot.
 if { $blddir != "" } {
+   # The `-fopenacc' and `-fopenmp' options imply `-pthread', and
+   # that implies `-latomic' on some hosts, so wire in libatomic
+   # build directories.
+   if [ishost "riscv*-*-linux*"] {
+   set shlib_ext [get_shlib_extension]
+   set atomic_library_path "${blddir}/../libatomic/.libs"
+   if { [file exists "${atomic_library_path}/libatomic.a"]
+|| [file exists \
+"${atomic_library_path}/libatomic.${shlib_ext}"] } {
+   lappend ALWAYS_CFLAGS \
+   "additional_flags=-L${atomic_library_path}"
+   append always_ld_library_path ":${atomic_library_path}"
+   }
+   }
global cuda_driver_include
global cuda_driver_lib
if { $cuda_driver_include != "" } {


Add release notes for new C2X features in GCC 10

2019-11-19 Thread Joseph Myers
I've committed this patch to add release notes for new C2X features in GCC 
10.

diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index 1e9c3f8d..52eb303c 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -63,6 +63,44 @@ a work-in-progress.
 
 
 
+C
+
+  Several new features from the upcoming C2X revision of the ISO C
+  standard are supported with -std=c2x
+  and -std=gnu2x.  Some of these features are also
+  supported as extensions when compiling for older language versions.
+  In addition to the features listed, some features previously
+  supported as extensions and now added to the C standard are enabled
+  by default in C2X mode and not diagnosed with -std=c2x
+  -Wpedantic.
+  
+The [[]] attribute syntax is supported, as in
+C++.  Existing attributes can be used with this syntax in forms
+such as [[gnu::const]].  The standard
+attributes [[deprecated]], [[fallthrough]]
+and [[maybe_unused]] are supported.
+UTF-8 character constants using the u8'' syntax
+are supported.
+float.h defines
+macros FLT_NORM_MAX, DBL_NORM_MAX
+and LDBL_NORM_MAX.
+When decimal floating-point arithmetic is
+supported, float.h defines
+macros DEC32_TRUE_MIN, DEC64_TRUE_MIN
+and DEC128_TRUE_MIN, in addition to the macros that
+were previously only defined if __STDC_WANT_DEC_FP__
+was defined before including float.h.
+In C2X mode, empty parentheses in a function definition give
+that function a type with a prototype for subsequent calls; other
+old-style function definitions are diagnosed by default in C2X
+mode.
+The strftime format checking supports
+the %OB and %Ob formats.
+In C2X mode, -fno-fp-int-builtin-inexact is
+enabled by default.
+  
+
+
 C++
 
   Several C++20 features have been implemented:


-- 
Joseph S. Myers
jos...@codesourcery.com


Add more pedwarns for [[]] C attributes on types

2019-11-19 Thread Joseph Myers
The standard [[]] attributes currently defined in C2x are all not
valid on types not being defined at the time.

Use on such types results in a warning from attribs.c about attributes
appertaining to types (the warning that I think is bogus in general
for both C and C++, applying as it does to all [[]] attributes
including gnu:: ones that are perfectly meaningful on types not being
defined and work fine when __attribute__ syntax is used instead).  If
that warning is removed (as I intend to do in a subsequent patch),
warnings may or may not result from the attribute handlers, depending
on whether those particular attribute handlers consider the attributes
meaningful in such a context.  In C, however, the rules about where
each [[]] attribute is valid are constraints, so a pedwarn, not a
warning, is required.

Because some handlers are shared between standard and gnu::
attributes, there can be cases that are valid for the GNU attribute
variant but not for the standard one.  So in general it is not correct
to rely on the attribute handlers to give all required pedwarns
(although in some cases, a pedwarn in the attribute handler is an
appropriate way of diagnosing an invalid use); they do not have the
information about whether the attribute was a gnu:: one and can
legitimately accept a wider range of uses for the gnu:: attributes.

This patch ensures appropriate diagnostics for invalid uses of C2x
standard attributes on types, and so helps pave the way for the
subsequent removal of the bogus check in attribs.c, by adding a check
run in the front end before calling decl_attributes; this check
removes the attributes from the list after calling pedwarn to avoid
subsequent duplicate warnings.
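
As a hypothetical illustration (not taken from the testsuite changes
below), the kind of constraint violation now diagnosed with a pedwarn is
a standard attribute applied to a type that is not being defined:

  int [[deprecated]] x;  /* pedwarn: 'deprecated' attribute ignored */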

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc/c:
2019-11-20  Joseph Myers  

* c-decl.c (c_warn_type_attributes): New function.
(groktypename, grokdeclarator, finish_declspecs): Call
c_warn_type_attributes before applying attributes to types.
* c-tree.h (c_warn_type_attributes): Declare.

gcc/testsuite:
2019-11-20  Joseph Myers  

* gcc.dg/c2x-attr-deprecated-2.c, gcc.dg/c2x-attr-fallthrough-2.c,
gcc.dg/c2x-attr-maybe_unused-2.c: Expect errors for invalid uses
of standard attributes on types.  Add more tests of invalid uses
on types.

Index: gcc/c/c-decl.c
===
--- gcc/c/c-decl.c  (revision 278451)
+++ gcc/c/c-decl.c  (working copy)
@@ -4525,6 +4525,26 @@ c_warn_unused_attributes (tree attrs)
   warning (OPT_Wattributes, "%qE attribute ignored",
   get_attribute_name (t));
 }
+
+/* Warn for standard attributes being applied to a type that is not
+   being defined, where that is a constraint violation, and return a
+   list of attributes with them removed.  */
+
+tree
+c_warn_type_attributes (tree attrs)
+{
+  tree *attr_ptr = &attrs;
+  while (*attr_ptr)
+if (get_attribute_namespace (*attr_ptr) == NULL_TREE)
+  {
+   pedwarn (input_location, OPT_Wattributes, "%qE attribute ignored",
+get_attribute_name (*attr_ptr));
+   *attr_ptr = TREE_CHAIN (*attr_ptr);
+  }
+else
+      attr_ptr = &TREE_CHAIN (*attr_ptr);
+  return attrs;
+}
 
 /* Called when a declaration is seen that contains no names to declare.
If its type is a reference to a structure, union or enum inherited
@@ -4883,6 +4903,7 @@ groktypename (struct c_type_name *type_name, tree
 DEPRECATED_NORMAL);
 
   /* Apply attributes.  */
+  attrs = c_warn_type_attributes (attrs);
+  decl_attributes (&type, attrs, 0);
 
   return type;
@@ -6295,10 +6316,13 @@ grokdeclarator (const struct c_declarator *declara
if (cxx11_attribute_p (attrs) && inner_decl->kind == cdk_id)
  returned_attrs = chainon (returned_attrs, attrs);
else
- returned_attrs = decl_attributes (&type,
-   chainon (returned_attrs,
-attrs),
-   attr_flags);
+ {
+   attrs = c_warn_type_attributes (attrs);
+   returned_attrs = decl_attributes (&type,
+ chainon (returned_attrs,
+  attrs),
+ attr_flags);
+ }
break;
  }
case cdk_array:
@@ -11676,6 +11700,7 @@ finish_declspecs (struct c_declspecs *specs)
 }
   if (specs->type != NULL)
 {
+  specs->postfix_attrs = c_warn_type_attributes (specs->postfix_attrs);
      decl_attributes (&specs->type, specs->postfix_attrs, 0);
   specs->postfix_attrs = NULL_TREE;
 }
Index: gcc/c/c-tree.h
===
--- gcc/c/c-tree.h  (revision 278451)
+++ gcc/c/c-tree.h  

Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-11-19 Thread Jakub Jelinek
On Tue, Sep 03, 2019 at 02:27:29PM -0600, Jeff Law wrote:
> >> +  /* The transformation below will inherently introduce a memory load,
> >> +   for which LHS may not be initialized yet if it is not in NOTRAP,
> >> +   so a -Wmaybe-uninitialized warning message could be triggered.
> >> +   Since it's a bit hard to have a very accurate uninitialization
> >> +   analysis to memory reference, we disable the warning here to avoid
> >> +   confusion.  */
> >> +  TREE_NO_WARNING (lhs) = 1;
> > 
> > I don't like this, but not for the reasons Martin stated, we use
> > TREE_NO_WARNING not just when we've emitted warnings, but in many places
> > when we've done something that might trigger false positives.
> > Yes, it would be nice to do it more selectively.
> > 
> > The problem I see with the above though is that lhs might very well be
> > a decl, and setting TREE_NO_WARNING on it then doesn't affect only the
> > hoisted load, but also all other code that refers to the decl.
> LHS is restricted to just MEM_REF, ARRAY_REF and COMPONENT_REF.  We'd be
> setting the NO_WARNING bits on the toplevel expression, but not on
> anything shared like a _DECL node.
> 
> So what we're losing here would be things like out of bounds array
> checks on the LHS, so it still sucks.

Sorry for dropping the ball on this.
You're right, LHS is a MEM_REF, ARRAY_REF or COMPONENT_REF, so what I was
worried about doesn't happen.
I've tried using gimple_set_no_warning, but tree-ssa-uninit.c in this case
actually doesn't look at that (it does only in a different spot).

> > If the TREE_NO_WARNING bit is set on something that isn't shareable, that is
> > fine with me, like a MEM_REF, TARGET_MEM_REF or handled component.  If lhs
> > is a decl, can we force a MEM_REF around it (and won't we fold it back to
> > the decl?)?  Or perhaps better, can we gimple_set_no_warning on the load
> > stmt instead?
> We have the toplevel statement, so that might be worth a try as well.

But, what the patch did was set it on the tree that is later unshared,
which means TREE_NO_WARNING wasn't set just on the rhs1 of the load, but
also on the lhs of the store.

The following version fixes that and I've also added the testcase to the
testsuite.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-19  Jiangning Liu  
Jakub Jelinek  

PR middle-end/91195
* tree-ssa-phiopt.c (cond_store_replacement): Move lhs unsharing
earlier.  Set TREE_NO_WARNING on the rhs1 of the artificially added
load.

* gcc.dg/pr91195.c: New test.

--- gcc/tree-ssa-phiopt.c.jj2019-11-13 10:54:53.882917365 +0100
+++ gcc/tree-ssa-phiopt.c   2019-11-19 20:51:33.345775363 +0100
@@ -2269,6 +2269,10 @@ cond_store_replacement (basic_block midd
   name = make_temp_ssa_name (TREE_TYPE (lhs), NULL, "cstore");
   new_stmt = gimple_build_assign (name, lhs);
   gimple_set_location (new_stmt, locus);
+  lhs = unshare_expr (lhs);
+  /* Set TREE_NO_WARNING on the rhs of the load to avoid uninit
+ warnings.  */
+  TREE_NO_WARNING (gimple_assign_rhs1 (new_stmt)) = 1;
   gsi_insert_on_edge (e1, new_stmt);
 
   /* 3) Create a PHI node at the join block, with one argument
@@ -2279,7 +2283,6 @@ cond_store_replacement (basic_block midd
   add_phi_arg (newphi, rhs, e0, locus);
   add_phi_arg (newphi, name, e1, locus);
 
-  lhs = unshare_expr (lhs);
   new_stmt = gimple_build_assign (lhs, PHI_RESULT (newphi));
 
   /* 4) Insert that PHI node.  */
--- gcc/testsuite/gcc.dg/pr91195.c.jj   2019-11-19 20:57:57.841024685 +0100
+++ gcc/testsuite/gcc.dg/pr91195.c  2019-11-19 20:57:40.767280052 +0100
@@ -0,0 +1,25 @@
+/* PR middle-end/91195 */
+/* { dg-do compile } */
+/* { dg-options "-Wmaybe-uninitialized -O2" } */
+
+int bar (char*);
+
+void
+foo (char *x, char *y)
+{
+  char *a[2];
+  int b = 0;
+
+  if (x)
+a[b++] = x;/* { dg-bogus "may be used uninitialized in 
this function" } */
+  if (y)
+a[b++] = y;
+
+  for (int j = 0; j < 4; j++) 
+switch (j)
+  {
+  case 0:
+   if (b == 0 || bar (a[0]))
+ break;
+  }
+}


Jakub



Re: [PATCH 2/3] [ARC] Add scaled load pattern

2019-11-19 Thread Jeff Law
On 11/19/19 2:02 AM, Claudiu Zissulescu wrote:
> ARC processors can use scaled addresses, i.e., the offset part of the
> load address can be shifted by 2 (multiplied by 4). Add this pattern
> and a test for it.
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.md (load_scaledsi): New pattern.
> 
> testcase/
> -xx-xx  Claudiu Zissulescu  
> 
>   * gcc.target/arc/scaled-ld.c: New test.
This is worrisome.  I'm pretty sure this has to be folded into the
existing move pattern to satisfy obscure reload requirements.

Jeff



[PATCH] Fix ICE with __builtin_stack_save (PR c/90898)

2019-11-19 Thread Jakub Jelinek
Hi!

I agree that we shouldn't have made __builtin_stack_{save,restore} public,
but I'd fear it is too late to remove it now (and replace by internal fn),
I'd think some code might use it to control when e.g. alloca will be
released.  As the comment on insert_clobber* says, the addition of the
clobbers is a best effort, we could end up not finding the right spot and
not adding the CLOBBER in there even without user __builtin_* calls, so
this patch just removes an unnecessary assertion and just will not find
__builtin_stack_restore if something weird is seen, such as in the testcase
storing of the __builtin_stack_save result into memory, or could be
a cast or whatever else too.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-19  Jakub Jelinek  

PR c/90898
* tree-ssa-ccp.c (insert_clobber_before_stack_restore): Remove
assertion.
(insert_clobbers_for_var): Fix a typo in function comment.

* gcc.dg/pr90898.c: New test.

--- gcc/tree-ssa-ccp.c.jj   2019-11-13 10:54:47.093020613 +0100
+++ gcc/tree-ssa-ccp.c  2019-11-19 19:13:35.332705130 +0100
@@ -2114,8 +2114,6 @@ insert_clobber_before_stack_restore (tre
 else if (gimple_assign_ssa_name_copy_p (stmt))
   insert_clobber_before_stack_restore (gimple_assign_lhs (stmt), var,
   visited);
-else
-  gcc_assert (is_gimple_debug (stmt));
 }
 
 /* Advance the iterator to the previous non-debug gimple statement in the same
@@ -2140,9 +2138,9 @@ gsi_prev_dom_bb_nondebug (gimple_stmt_it
 /* Find a BUILT_IN_STACK_SAVE dominating gsi_stmt (I), and insert
a clobber of VAR before each matching BUILT_IN_STACK_RESTORE.
 
-   It is possible that BUILT_IN_STACK_SAVE cannot be find in a dominator when a
-   previous pass (such as DOM) duplicated it along multiple paths to a BB.  In
-   that case the function gives up without inserting the clobbers.  */
+   It is possible that BUILT_IN_STACK_SAVE cannot be found in a dominator when
+   a previous pass (such as DOM) duplicated it along multiple paths to a BB.
+   In that case the function gives up without inserting the clobbers.  */
 
 static void
 insert_clobbers_for_var (gimple_stmt_iterator i, tree var)
--- gcc/testsuite/gcc.dg/pr90898.c.jj   2019-11-19 19:18:02.277712801 +0100
+++ gcc/testsuite/gcc.dg/pr90898.c  2019-11-19 19:18:52.613959787 +0100
@@ -0,0 +1,16 @@
+/* PR c/90898 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void *p;
+int bar (void);
+void baz (int *);
+
+void
+foo (void)
+{
+  p = __builtin_stack_save ();
+  int a[(bar (), 2)];
+  baz (a);
+  __builtin_stack_restore (p);
+}

Jakub



Re: [PATCH 3/3] [ARC] Register ARC specific passes with a .def file.

2019-11-19 Thread Jeff Law
On 11/19/19 2:02 AM, Claudiu Zissulescu wrote:
> Use arc-passes.def to register ARC specific passes.
> 
> Ok to apply?
> Claudiu
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc-protos.h (make_pass_arc_ifcvt): Declare.
>   (make_pass_arc_predicate_delay_insns): Likewise.
>   * config/arc/arc.c (class pass_arc_ifcvt): Reformat text, add gate
>   method.
>   (class pass_arc_predicate_delay_insns): Likewise.
>   (arc_init): Remove registering of ARC specific passes.
>   * config/arc/t-arc (PASSES_EXTRA): Add arc-passes.def.
>   * config/arc/arc-passes.def: New file.
OK
jeff



[PATCH] Fix up arch= handling in x86 target attribute (PR target/90867)

2019-11-19 Thread Jakub Jelinek
Hi!

The arch= handling in target attribute right now clears almost all isa_flags
and all isa_flags2, so that later call to ix86_option_override_internal
can set just the isa options for the specific arch and nothing else.
Unfortunately, that doesn't work, because next to ix86_isa_flags{,2}
we also have the ix86_isa_flags{,2}_explicit bitmasks, and
ix86_option_override_internal, say for arch=x86_64, will not try to
set the sse2 ISA when we cleared it in ix86_isa_flags but kept it set in
ix86_isa_flags_explicit.
So the testcase works fine with -O2 but doesn't work with -O2 -msse2:
in the former case ix86_isa_flags_explicit doesn't have the MASK_SSE2 bit
set, but in the latter case it does, so we end up with MASK_SSE2 in
ix86_isa_flags in the former case, but not in the latter in the function
with the target attribute.

The following patch thus clears both ix86_isa_flags{,2} and corresponding
ix86_isa_flags{,2}_explicit.  Also, so that say
target ("arch=x86_64", "avx") works, the clearing is done when actually
seeing the arch=, not at the end.  target ("avx", "arch=x86_64") will still
not enable avx, like before, though.
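
To make the ordering concrete, a small illustration (these declarations are
hypothetical, not part of the patch or its testsuite): with the patch the
clearing happens at the point the arch= string is seen, so an ISA option
that follows it in the same attribute survives, while one that precedes it
is still discarded.

  __attribute__((target ("arch=x86-64", "avx"))) void f1 (void); /* AVX enabled */
  __attribute__((target ("avx", "arch=x86-64"))) void f2 (void); /* AVX not enabled */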

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-19  Jakub Jelinek  

PR target/90867
* config/i386/i386-options.c (ix86_valid_target_attribute_tree): Don't
clear opts->x_ix86_isa_flags{,2} here...
(ix86_valid_target_attribute_inner_p): ... but here when seeing
arch=.  Also clear opts->x_ix86_isa_flags{,2}_explicit.

* gcc.target/i386/pr90867.c: New test.

--- gcc/config/i386/i386-options.c.jj   2019-11-18 12:07:54.672405129 +0100
+++ gcc/config/i386/i386-options.c  2019-11-19 18:43:36.426606458 +0100
@@ -1147,7 +1147,25 @@ ix86_valid_target_attribute_inner_p (tre
  ret = false;
}
  else
-   p_strings[opt] = xstrdup (p + opt_len);
+   {
+ p_strings[opt] = xstrdup (p + opt_len);
+ if (opt == IX86_FUNCTION_SPECIFIC_ARCH)
+   {
+ /* If arch= is set,  clear all bits in x_ix86_isa_flags,
+except for ISA_64BIT, ABI_64, ABI_X32, and CODE16
+and all bits in x_ix86_isa_flags2.  */
+ opts->x_ix86_isa_flags &= (OPTION_MASK_ISA_64BIT
+| OPTION_MASK_ABI_64
+| OPTION_MASK_ABI_X32
+| OPTION_MASK_CODE16);
+ opts->x_ix86_isa_flags_explicit &= (OPTION_MASK_ISA_64BIT
+ | OPTION_MASK_ABI_64
+ | OPTION_MASK_ABI_X32
+ | OPTION_MASK_CODE16);
+ opts->x_ix86_isa_flags2 = 0;
+ opts->x_ix86_isa_flags2_explicit = 0;
+   }
+   }
}
 
   else if (type == ix86_opt_enum)
@@ -1225,18 +1243,8 @@ ix86_valid_target_attribute_tree (tree f
   /* If we are using the default tune= or arch=, undo the string assigned,
 and use the default.  */
   if (option_strings[IX86_FUNCTION_SPECIFIC_ARCH])
-   {
- opts->x_ix86_arch_string
-   = ggc_strdup (option_strings[IX86_FUNCTION_SPECIFIC_ARCH]);
-
- /* If arch= is set,  clear all bits in x_ix86_isa_flags,
-except for ISA_64BIT, ABI_64, ABI_X32, and CODE16.  */
- opts->x_ix86_isa_flags &= (OPTION_MASK_ISA_64BIT
-| OPTION_MASK_ABI_64
-| OPTION_MASK_ABI_X32
-| OPTION_MASK_CODE16);
- opts->x_ix86_isa_flags2 = 0;
-   }
+   opts->x_ix86_arch_string
+ = ggc_strdup (option_strings[IX86_FUNCTION_SPECIFIC_ARCH]);
   else if (!orig_arch_specified)
opts->x_ix86_arch_string = NULL;
 
--- gcc/testsuite/gcc.target/i386/pr90867.c.jj	2019-11-19 18:49:11.746592189 +0100
+++ gcc/testsuite/gcc.target/i386/pr90867.c	2019-11-19 18:48:17.271406789 +0100
@@ -0,0 +1,30 @@
+/* PR target/90867 */
+/* { dg-do run { target lp64 } } */
+/* { dg-options "-O2 -msse2" } */
+
+unsigned long long freq = 3600000000UL;   /* 3.6 GHz = 3600.0 MHz */
+
+__attribute__((noipa)) void
+bar (double x)
+{
+  static double d = 3600000000.0;
+  if (x != d)
+__builtin_abort ();
+  d /= 1000.0;
+}
+
+__attribute__ ((target ("arch=x86-64"))) int
+foo ()
+{
+  bar ((double) freq);
+  bar (1e-3 * freq);
+  bar (1e-6 * freq);
+  bar (1e-9 * freq);
+  return 0;
+}
+
+int
+main ()
+{
+  return foo ();
+}

Jakub



[C++ PATCH] Fix up lambda decl specifier parsing ICE (PR c++/90842)

2019-11-19 Thread Jakub Jelinek
Hi!

In lambdas, the only valid decl specifiers are mutable, constexpr or
consteval.  For various other simple specifiers it is fine to parse them
and reject them afterwards, as long as parsing them just consumes a single
token and sets some flags; but, as the testcase shows, allowing type
specifiers in particular, including new type definitions, can cause ICEs.
The following patch punts for the cases where the parsing isn't that simple,
which I think are concept, typedef (there we e.g. commit tentative parsing)
and the type specifiers.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-19  Jakub Jelinek  

PR c++/90842
* parser.c (cp_parser_decl_specifier_seq): For concept, typedef
or type specifier with CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR
don't try to parse them at all.

* g++.dg/cpp1y/lambda-generic-90842.C: New test.
* g++.dg/cpp0x/lambda/lambda-86550.C: Adjust expected diagnostics.

--- gcc/cp/parser.c.jj  2019-11-14 09:13:24.356104252 +0100
+++ gcc/cp/parser.c 2019-11-19 17:47:24.776014270 +0100
@@ -14094,6 +14094,12 @@ cp_parser_decl_specifier_seq (cp_parser*
  break;
 
 case RID_CONCEPT:
+  if (flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR)
+{
+ found_decl_spec = false;
+ break;
+}
+
   ds = ds_concept;
   cp_lexer_consume_token (parser->lexer);
 
@@ -14136,6 +14142,12 @@ cp_parser_decl_specifier_seq (cp_parser*
  /* decl-specifier:
   typedef  */
case RID_TYPEDEF:
+  if (flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR)
+{
+ found_decl_spec = false;
+ break;
+}
+
  ds = ds_typedef;
  /* Consume the token.  */
  cp_lexer_consume_token (parser->lexer);
@@ -14229,7 +14241,9 @@ cp_parser_decl_specifier_seq (cp_parser*
 
   /* If we don't have a DECL_SPEC yet, then we must be looking at
 a type-specifier.  */
-  if (!found_decl_spec && !constructor_p)
+  if (!found_decl_spec
+ && !constructor_p
+ && (flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR) == 0)
{
  int decl_spec_declares_class_or_enum;
  bool is_cv_qualifier;
@@ -14288,9 +14302,6 @@ cp_parser_decl_specifier_seq (cp_parser*
  found_decl_spec = true;
  if (!is_cv_qualifier)
decl_specs->any_type_specifiers_p = true;
-
- if ((flags & CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR) != 0)
-   error_at (token->location, "type-specifier invalid in lambda");
}
}
 
--- gcc/testsuite/g++.dg/cpp1y/lambda-generic-90842.C.jj	2019-11-19 17:53:37.682440002 +0100
+++ gcc/testsuite/g++.dg/cpp1y/lambda-generic-90842.C	2019-11-19 17:53:01.815976144 +0100
@@ -0,0 +1,4 @@
+// PR c++/90842
+// { dg-do compile { target c++14 } }
+
+auto a = [](auto x) struct C { void foo (); } {};  // { dg-error "expected" }
--- gcc/testsuite/g++.dg/cpp0x/lambda/lambda-86550.C.jj	2018-07-18 23:01:22.824082949 +0200
+++ gcc/testsuite/g++.dg/cpp0x/lambda/lambda-86550.C	2019-11-19 17:54:26.978703112 +0100
@@ -4,6 +4,6 @@
 void
 foo ()
 {
-  auto a = []() bool {};			// { dg-error "type-specifier invalid in lambda" }
-  auto b = []() bool bool bool bool int {};	// { dg-error "type-specifier invalid in lambda" }
+  auto a = []() bool {};   // { dg-error "expected" }
+  auto b = []() bool bool bool bool int {};// { dg-error "expected" }
 }

Jakub



[PATCH] Fix ICE during MEM_REF expansion (PR middle-end/90840)

2019-11-19 Thread Jakub Jelinek
Hi!

On the following testcase we ICE on i686-linux (32-bit), because we store
a structure (first 96-bit, then 72-bit) into the first part of a 2x 96-bit
complex long double; for 96-bit floats there is no corresponding
integral mode that covers them, and we ICE when op0 is not a MEM (it is a
REG).
The following patch handles the simple case where the whole destination REG
is covered and the value is a MEM, using a load from that memory, and for
the rest just spills to the stack, similarly to how we punt for stores to
complex REGs when the bitsize/bitnum cover portions of both halves.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-19  Jakub Jelinek  

PR middle-end/90840
* expmed.c (store_bit_field_1): Handle the case where op0 is not a MEM
and has a mode that doesn't have corresponding integral type.

* gcc.c-torture/compile/pr90840.c: New test.

--- gcc/expmed.c.jj 2019-11-15 00:37:32.0 +0100
+++ gcc/expmed.c2019-11-19 17:09:22.035129617 +0100
@@ -840,6 +840,27 @@ store_bit_field_1 (rtx str_rtx, poly_uin
   if (MEM_P (op0))
op0 = adjust_bitfield_address_size (op0, op0_mode.else_blk (),
0, MEM_SIZE (op0));
+  else if (!op0_mode.exists ())
+   {
+ if (ibitnum == 0
+ && known_eq (ibitsize, GET_MODE_BITSIZE (GET_MODE (op0)))
+ && MEM_P (value)
+ && !reverse)
+   {
+ value = adjust_address (value, GET_MODE (op0), 0);
+ emit_move_insn (op0, value);
+ return true;
+   }
+ if (!fallback_p)
+   return false;
+ rtx temp = assign_stack_temp (GET_MODE (op0),
+   GET_MODE_SIZE (GET_MODE (op0)));
+ emit_move_insn (temp, op0);
+ store_bit_field_1 (temp, bitsize, bitnum, 0, 0, fieldmode, value,
+reverse, fallback_p);
+ emit_move_insn (op0, temp);
+ return true;
+   }
   else
op0 = gen_lowpart (op0_mode.require (), op0);
 }
--- gcc/testsuite/gcc.c-torture/compile/pr90840.c.jj	2019-11-19 17:18:31.361918896 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr90840.c	2019-11-19 17:11:06.010575339 +0100
@@ -0,0 +1,19 @@
+/* PR middle-end/90840 */
+struct S { long long a; int b; };
+struct S foo (void);
+struct __attribute__((packed)) T { long long a; char b; };
+struct T baz (void);
+
+void
+bar (void)
+{
+  _Complex long double c;
+  *(struct S *) &c = foo ();
+}
+
+void
+qux (void)
+{
+  _Complex long double c;
+  *(struct T *) &c = baz ();
+}

Jakub



[C++ PATCH] Fix weird addr_expr not supported by dump_expr messages (PR c++/90767)

2019-11-19 Thread Jakub Jelinek
Hi!

The following patch is a minimal fix to avoid
cannot convert ‘‘addr_expr’ not supported by dump_type’ to ‘X*’
and similar messages.  The recently added complain_about_bad_argument
function expects a from_type argument, but conv->from isn't necessarily a
type, it can be an expression too.

With this patch one gets error like:
cannot convert ‘const X*’ to ‘X*’
and note like
initializing argument 'this' of ‘void X::foo()’
Still, perhaps what GCC 8 and earlier used to emit might be clearer:
pr90767-1.C: In member function ‘X::operator T() const’:
pr90767-1.C:12:7: error: no matching function for call to ‘X::foo() const’
pr90767-1.C:6:8: note: candidate: ‘void X::foo()’ 
pr90767-1.C:6:8: note:   passing ‘const X*’ as ‘this’ argument discards qualifiers
There is the print_conversion_rejection function that handles the various
cases, like this vs. other arguments, conv->from with expr type vs. type
etc.
Though, I must say I don't understand why complain_about_bad_argument
has been added, and whether we'd want to emit there, as errors with no
leading spaces, what print_conversion_rejection prints as notes with two
leading spaces.

In any case, I think the patch below is a step in the right direction.

2019-11-19  Jakub Jelinek  

PR c++/90767
* call.c (complain_about_no_candidates_for_method_call): If
conv->from is not a type, pass to complain_about_bad_argument
lvalue_type of conv->from.

* g++.dg/diagnostic/pr90767-1.C: New test.
* g++.dg/diagnostic/pr90767-2.C: New test.

--- gcc/cp/call.c.jj2019-11-18 18:49:14.461880924 +0100
+++ gcc/cp/call.c   2019-11-19 14:40:19.121937148 +0100
@@ -9861,8 +9861,11 @@ complain_about_no_candidates_for_method_
  if (const conversion_info *conv
= maybe_get_bad_conversion_for_unmatched_call (candidate))
{
+ tree from_type = conv->from;
+ if (!TYPE_P (conv->from))
+   from_type = lvalue_type (conv->from);
  complain_about_bad_argument (conv->loc,
-  conv->from, conv->to_type,
+  from_type, conv->to_type,
   candidate->fn, conv->n_arg);
  return;
}
--- gcc/testsuite/g++.dg/diagnostic/pr90767-1.C.jj	2019-11-19 14:48:00.386041586 +0100
+++ gcc/testsuite/g++.dg/diagnostic/pr90767-1.C	2019-11-19 14:46:53.395043036 +0100
@@ -0,0 +1,15 @@
+// PR c++/90767
+// { dg-do compile }
+
+struct X {
+  int n;
+  void foo (); // { dg-message "initializing argument 'this'" }
+
+  template
+  operator T () const
+{
+  if (n == 0)
+   foo (); // { dg-error "cannot convert 'const X\\*' to 'X\\*'" }
+  return n;
+}
+};
--- gcc/testsuite/g++.dg/diagnostic/pr90767-2.C.jj	2019-11-19 14:50:48.923522136 +0100
+++ gcc/testsuite/g++.dg/diagnostic/pr90767-2.C	2019-11-19 14:52:27.324051149 +0100
@@ -0,0 +1,15 @@
+// PR c++/90767
+// { dg-do compile }
+
+struct A {
+  struct B { B (int) {} };
+
+  template 
+  void foo ()
+  {
+int x = 0;
+bar (x);   // { dg-error "cannot convert 'int' to 'A::B&'" }
+  }
+
+  void bar (B &) {} // { dg-message "initializing argument 1" }
+};

Jakub



[C/C++ PATCH] Fix up build of GCC 4.6 and earlier with GCC 9+ (PR c/90677)

2019-11-19 Thread Jakub Jelinek
Hi!

The following patch fixes build of older GCC releases with newer ones.
In GCC 4.6 and earlier, we had:
struct cgraph_node;
struct cgraph_node *cgraph_node (tree);
and that is something GCC 9+ errors on if it sees any __gcc_diag__ or
similar attributes, because the cgraph_node it wants to find is not a type.

As older GCC releases guard the __gcc_diag__ etc. attributes only on a
minimum GCC version that supports them, not also on a maximum version,
I think we need to make sure we don't reject what older GCCs used to have.

The following patch does that.  In addition, get_pointer_to_named_type
looked misnamed, because we actually aren't interested in getting gimple *
etc. type, but gimple.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
and eventually 9 too?

2019-11-19  Jakub Jelinek  

PR c/90677
* c-format.c (get_pointer_to_named_type): Renamed to ...
(get_named_type): ... this.  If result is FUNCTION_DECL for
cgraph_node, return cgraph_node from pointee of return type if
possible instead of emitting an error.
(init_dynamic_diag_info): Adjust get_pointer_to_named_type callers
to call get_named_type instead.  Formatting fixes.

* c-c++-common/pr90677.c: New test.

--- gcc/c-family/c-format.c.jj  2019-10-05 09:35:12.917997709 +0200
+++ gcc/c-family/c-format.c 2019-11-19 13:05:10.113308578 +0100
@@ -4899,21 +4899,39 @@ init_dynamic_gfc_info (void)
 }
 }
 
-/* Lookup the type named NAME and return a pointer-to-NAME type if found.
-   Otherwise, return void_type_node if NAME has not been used yet, or 
NULL_TREE if
-   NAME is not a type (issuing an error).  */
+/* Lookup the type named NAME and return a NAME type if found.
+   Otherwise, return void_type_node if NAME has not been used yet,
+   or NULL_TREE if NAME is not a type (issuing an error).  */
 
 static tree
-get_pointer_to_named_type (const char *name)
+get_named_type (const char *name)
 {
   tree result;
-  if ((result = maybe_get_identifier (name)))
+  if (tree id = maybe_get_identifier (name))
 {
-  result = identifier_global_value (result);
+  result = identifier_global_value (id);
   if (result)
{
  if (TREE_CODE (result) != TYPE_DECL)
{
+ if (TREE_CODE (result) == FUNCTION_DECL
+ && !strcmp (name, "cgraph_node")
+ && TREE_CODE (TREE_TYPE (result)) == FUNCTION_TYPE)
+   {
+ /* Before GCC 4.7, there used to be
+struct cgraph_node;
+struct cgraph_node *cgraph_node (tree);
+Don't error in this case, so that GCC 9+ can still
+compile GCC 4.6 and earlier.  */
+ tree res = TREE_TYPE (TREE_TYPE (result));
+ if (TREE_CODE (res) == POINTER_TYPE
+ && (TYPE_NAME (TREE_TYPE (res)) == id
+ || (TREE_CODE (TYPE_NAME (TREE_TYPE (res)))
+ == TYPE_DECL
+ && (DECL_NAME (TYPE_NAME (TREE_TYPE (res)))
+ == id
+   return TREE_TYPE (res);
+   }
  error ("%qs is not defined as a type", name);
  result = NULL_TREE;
}
@@ -4953,23 +4971,24 @@ init_dynamic_diag_info (void)
 an extra type level.  */
   if ((local_tree_type_node = maybe_get_identifier ("tree")))
{
- local_tree_type_node = identifier_global_value (local_tree_type_node);
+ local_tree_type_node
+   = identifier_global_value (local_tree_type_node);
  if (local_tree_type_node)
{
  if (TREE_CODE (local_tree_type_node) != TYPE_DECL)
{
  error ("% is not defined as a type");
- local_tree_type_node = 0;
+ local_tree_type_node = NULL_TREE;
}
  else if (TREE_CODE (TREE_TYPE (local_tree_type_node))
   != POINTER_TYPE)
{
  error ("% is not defined as a pointer type");
- local_tree_type_node = 0;
+ local_tree_type_node = NULL_TREE;
}
  else
-   local_tree_type_node =
- TREE_TYPE (TREE_TYPE (local_tree_type_node));
+   local_tree_type_node
+ = TREE_TYPE (TREE_TYPE (local_tree_type_node));
}
}
   else
@@ -4979,12 +4998,12 @@ init_dynamic_diag_info (void)
   /* Similar to the above but for gimple*.  */
   if (!local_gimple_ptr_node
   || local_gimple_ptr_node == void_type_node)
-local_gimple_ptr_node = get_pointer_to_named_type ("gimple");
+local_gimple_ptr_node = get_named_type ("gimple");
 
   /* Similar to the above but for cgraph_node*.  */
   if (!local_cgraph_node_ptr_node
   || local_cgraph_node_ptr_node == 

Re: [PATCH v2] libgomp/test: Add flags to find libatomic in build-tree testing

2019-11-19 Thread Jakub Jelinek
On Tue, Nov 19, 2019 at 11:11:39PM +, Maciej W. Rozycki wrote:
>  Why do you think it is important that it is only the relevant (i.e. 
> `riscv*') targets that the directory providing newly-built libatomic is 
> pointed at ?

Such changes don't happen every day, no other port added it similarly to
riscv during the 2.5 years since riscv had it, and no port needed it ever
before.  With unneeded -L options the log file is larger, one needs to
cut'n'paste more when trying to reproduce stuff etc.

Jakub



Re: Reverting r278411

2019-11-19 Thread Segher Boessenkool
Hi Richard,

On Tue, Nov 19, 2019 at 07:35:10PM +, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > r278411 causes bootstrap to fail on at least all powerpc*.  Also, I am
> > author of this simplify-rtx code, and I think what you did is all wrong.
> > Also, it should be two patches, the CSE part should be separate.  (I can
> > not tell if that is the part that regresses us, either).
> 
> I committed them as one patch because I'd tested them together,
> which I thought was the way this should be done.

Ideally you would test separate patches separately, too :-)

> In practice
> a clean revert would have to be done as a pair anyway: reverting
> one part individually would have required XFAILs on the tests.
> 
> And yeah, it was the cse.c part that was the problem.

Thanks for finding the problem so quickly, much appreciated.

> find_sets_in_insn has:
> 
>   /* Don't count call-insns, (set (reg 0) (call ...)), as a set.
>The hard function value register is used only once, to copy to
>someplace else, so it isn't worth cse'ing.  */
>   else if (GET_CODE (SET_SRC (x)) == CALL)
>   ;
> 
> So n_sets == 1 can be true if we have a single SET in parallel
> with a call.  Unsurprising, I guess no-one had MEM sets in
> parallel with a call, so it didn't matter until now.
> 
> I'm retesting with a !CALL_P check added to make sure that was
> the only problem.
> 
> Before I resubmit, why is the simplify-rtx.c part all wrong?

It was nice and simple, and it isn't anymore.  8 4 2 1 for the four of
lt gt eq un are hardly worth documenting or making symbolic constants
for, because a) they are familiar to all powerpc users anyway (the
assembler has those as predefined constants!), admittedly this isn't a
strong argument for most people; but also b) they were only used in two
short and obvious routines.  After your patch the routines are no
longer short or obvious.

A comparison result is exactly one of four things: lt, gt, eq, or un.
So we can express testing for any subset of those with just an OR of
the masks for the individual conditions.  Whether a comparison is
floating point, floating point no-nans, signed, or unsigned, is just
a property of the comparison, not of the result, in this representation.

If you do not mix signed and unsigned comparisons (they make not much
sense mixed anyway(*)), you need no changes at all: just translate
ltu/gtu/leu/geu to/from lt/gt/le/ge on the way in and out of this
function (there already are helper functions for that, signed_condition
and unsigned_condition).
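
To make that concrete, a minimal sketch of the representation (the mask
values are the powerpc convention mentioned above; only signed_condition
and unsigned_condition are real GCC helpers, the macro names here are
purely illustrative):

  /* A comparison result is exactly one of lt, gt, eq or un, so any
     predicate on results is just a bitwise OR of these masks.  */
  #define MASK_LT 8
  #define MASK_GT 4
  #define MASK_EQ 2
  #define MASK_UN 1

  /* le   = MASK_LT | MASK_EQ           = 0xa
     ltgt = MASK_LT | MASK_GT           = 0xc
     unge = MASK_UN | MASK_GT | MASK_EQ = 0x7
     Signedness is a property of the comparison, not of the result, so
     ltu/gtu/leu/geu can be mapped to lt/gt/le/ge with signed_condition
     on the way in and back with unsigned_condition on the way out.  */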


Segher

(*) lt(a,b) & ltu(a,b) means the signs of a and b are equal, and a

[PATCH v2] libgomp/test: Add flags to find libatomic in build-tree testing

2019-11-19 Thread Maciej W. Rozycki
Add flags to find libatomic in build-tree testing, fixing a catastrophic 
libgomp testsuite failure with targets such as `riscv-linux-gnu' that 
imply `-latomic' with the `-pthread' GCC option, implied in turn by the 
`-fopenacc' and `-fopenmp' options, removing failures like:

.../bin/riscv64-linux-gnu-ld: cannot find -latomic
collect2: error: ld returned 1 exit status
compiler exited with status 1
FAIL: libgomp.c/../libgomp.c-c++-common/atomic-18.c (test for excess errors)
Excess errors:
.../bin/riscv64-linux-gnu-ld: cannot find -latomic

UNRESOLVED: libgomp.c/../libgomp.c-c++-common/atomic-18.c compilation failed to 
produce executable

and bringing overall test results for the said target (here with the 
`x86_64-linux-gnu' host and RISC-V QEMU in the Linux user emulation mode 
as the target board) from:

=== libgomp Summary ===

# of expected passes            90
# of unexpected failures        3267
# of expected failures          2
# of unresolved testcases       3247
# of unsupported tests          548

to:

=== libgomp Summary ===

# of expected passes            6834
# of unexpected failures        4
# of expected failures          4
# of unsupported tests          518

libgomp/
* testsuite/lib/libgomp.exp (libgomp_init): Add flags to find 
libatomic in build-tree testing.
---
Hi Jakub,

 Thank you for your review.

On Mon, 18 Nov 2019, Jakub Jelinek wrote:

> > --- gcc.orig/libgomp/testsuite/lib/libgomp.exp
> > +++ gcc/libgomp/testsuite/lib/libgomp.exp
> > @@ -174,6 +174,16 @@ proc libgomp_init { args } {
> >  # For build-tree testing, also consider the library paths used for 
> > builing.
> >  # For installed testing, we assume all that to be provided in the 
> > sysroot.
> >  if { $blddir != "" } {
> > +   # Offload options imply `-pthread', and that implies `-latomic'
> > +   # on some targets, so wire in libatomic build directories.
> 
> -fopenmp is not an option I'd like to call Offload option, OpenMP is 
> much more than just offloading, and this isn't on some targets, but only 
> one, riscv*.
> So, I think it should be done only for that target and talk about 
> -fopenmp/-fopenacc options instead of Offload options.

 Thanks for straightening me out on OpenMP; I have adjusted the comment 
and the change description accordingly.

 I think it makes no sense to require people to update this test harness 
if a future target also requires access to libatomic; the `-L' option and 
the dynamic loader path are harmless if not required, as libatomic won't 
actually be pulled unless otherwise requested and there is no other 
library in libatomic's build directory that could be used unintentionally.  
Whereas in the absence of these settings a random version of libatomic 
already present somewhere on the system may accidentally be used and 
affect test results somehow.

 Why do you think it is important that it is only the relevant (i.e. 
`riscv*') targets that the directory providing newly-built libatomic is 
pointed at ?

  Maciej

Changes from v1:

- Replaced references to "offload options" with "`-fopenacc' and 
  `-fopenmp' options".
---
 libgomp/testsuite/lib/libgomp.exp |   11 +++
 1 file changed, 11 insertions(+)

gcc-test-libgomp-atomic-lib-path.diff
Index: gcc/libgomp/testsuite/lib/libgomp.exp
===
--- gcc.orig/libgomp/testsuite/lib/libgomp.exp
+++ gcc/libgomp/testsuite/lib/libgomp.exp
@@ -174,6 +174,17 @@ proc libgomp_init { args } {
 # For build-tree testing, also consider the library paths used for builing.
 # For installed testing, we assume all that to be provided in the sysroot.
 if { $blddir != "" } {
+   # The `-fopenacc' and `-fopenmp' options imply `-pthread', and
+   # that implies `-latomic' on some targets, so wire in libatomic
+   # build directories.
+   set shlib_ext [get_shlib_extension]
+   set atomic_library_path "${blddir}/../libatomic/.libs"
+   if { [file exists "${atomic_library_path}/libatomic.a"]
+|| [file exists \
+"${atomic_library_path}/libatomic.${shlib_ext}"] } {
+   lappend ALWAYS_CFLAGS "additional_flags=-L${atomic_library_path}"
+   append always_ld_library_path ":${atomic_library_path}"
+   }
global cuda_driver_include
global cuda_driver_lib
if { $cuda_driver_include != "" } {


libgo patch committed: Better cgo handling for . in pkgpath

2019-11-19 Thread Ian Lance Taylor
This libgo patch by Than McIntosh updates cgo's
gccgoPkgpathToSymbolNew() to bring it into conformance with the way
that gccgo now handles package paths with embedded dots (see
https://golang.org/cl/200838).  See also https://gcc.gnu.org/PR61880,
a related bug.  This patch is a copy of https://golang.org/cl/207957
in the main Go repo.  This is for https://golang.org/issue/35623.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 278316)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-25d5e9dca49ad3f49393b254dd87f8df51487c65
+245904ac148f15db530fb8d50f526c47d08024c5
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/cgo/out.go
===
--- libgo/go/cmd/cgo/out.go (revision 277299)
+++ libgo/go/cmd/cgo/out.go (working copy)
@@ -1318,8 +1318,10 @@ func gccgoPkgpathToSymbolNew(ppath strin
for _, c := range []byte(ppath) {
switch {
case 'A' <= c && c <= 'Z', 'a' <= c && c <= 'z',
-   '0' <= c && c <= '9', c == '_', c == '.':
+   '0' <= c && c <= '9', c == '_':
bsl = append(bsl, c)
+   case c == '.':
+   bsl = append(bsl, ".x2e"...)
default:
changed = true
encbytes := []byte(fmt.Sprintf("..z%02x", c))


Re: [patch, fortran] Load scalar intent-in variables at the beginning of procedures

2019-11-19 Thread Thomas Koenig

Am 19.11.19 um 11:39 schrieb Bernhard Reutner-Fischer:

+ char name[GFC_MAX_SYMBOL_LEN + 1];
+ snprintf (name, GFC_MAX_SYMBOL_LEN, "__dummy_%d_%s", var_num++,
+   f->sym->name);
+
+ if (gfc_get_sym_tree (name, ns, , false) != 0)

(I) you should + sizeof(__dummy__) + 10 for unsigned long %d or the like.


GFC_MAX_SYMBOL_LEN is the maximum length of a gfortran symbol. AFAIK, it
is not possible to use a longer symbol name than that, so it needs to be
truncated. Uniqueness of the variable name is guaranteed by the var_num
variable.

If the user puts dummy arguments Supercalifragilisticexpialidociousa and
Supercalifragilisticexpialidociousb into the argument list of a
procedure, he will have to look at the numbers to differentiate them :-)


(II) s/__dummy/__intent_in/ for clarity?


It's moved away a bit from INTENT(IN) now, because an argument which
cannot be modified (even by passing to a procedure with a dummy argument
with unknown intent) is now also handled.

Regards

Thomas


Re: [C++ Patch] Avoid duplicate warning

2019-11-19 Thread Jason Merrill

On 11/18/19 3:45 PM, Paolo Carlini wrote:

functions like c_common_truthvalue_conversion are shared with the C 
front-end thus don't get a tsubst_flags_t argument. It seems clear that 
long term we have to do something about those but in the meanwhile we 
have been using warning sentinels which, along the paths which certainly 
must have all the warnings suppressed, work well for now and cannot 
cause regressions. Here I propose to add (at least) one more to 
ocp_convert since I have a straightforward testcase.


OK.


Not sure if we want to proactively add, say, one for warn_parentheses too.


Not without a testcase.

Jason



Re: [C++ PATCH] c++/92450 - ICE with invalid nested name specifier.

2019-11-19 Thread Jason Merrill

On 11/18/19 7:04 PM, Marek Polacek wrote:

The newly added diagnostic causes an ICE because the new grokdeclarator call
returned error_mark_node and DECL_SOURCE_LOCATION crashes on that.  So don't
attempt to print the new diagnostic if we've encountered something not
resembling a bit-field.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-11-18  Marek Polacek  

PR c++/92450 - ICE with invalid nested name specifier.
* parser.c (cp_parser_member_declaration): Bail out for invalid code.

* g++.dg/parse/crash71.C: New test.

diff --git gcc/cp/parser.c gcc/cp/parser.c
index c473e7fd92f..caed6946977 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -25052,6 +25052,9 @@ cp_parser_member_declaration (cp_parser* parser)
  tree d = grokdeclarator (declarator, &decl_specifiers,
   BITFIELD, /*initialized=*/false,
   &attributes);
+ /* Hopelessly invalid code, give up.  */
+ if (error_operand_p (d))
+   goto out;
  error_at (DECL_SOURCE_LOCATION (d),
"bit-field %qD has non-integral type %qT",
d, TREE_TYPE (d));


Don't we still want the rest of the error-recovery code there, i.e. 
skipping to the end of the statement?


Jason



Re: C++ PATCH for c++/91363 - P0960R3: Parenthesized initialization of aggregates

2019-11-19 Thread Jason Merrill

On 11/19/19 1:44 AM, Marek Polacek wrote:

It also looks like you're using the LOOKUP flag to mean two different
things:

1) try to treat parenthesized args as an aggregate initializer
(build_new_method_call_1)
2) treat this CONSTRUCTOR as coming from parenthesized args
(store_init_value/digest_init)


Correct.


Why is the flag needed for #1?  When do we not want to try to treat the args
as an aggregate initializer?


There are cases where treating the args as an aggregate initializer causes
spurious overload resolution successes, e.g.

  void swap(int&, int&);

  int& get();

  struct pair {
void swap(pair&) noexcept(noexcept(swap(get(), get( { } // { dg-error "no 
matching function for call" }
  };

There are no viable candidates for pair::swap (# of args mismatch) but since
pair is an aggregate, build_new_method_call_1 would return a CONSTRUCTOR so
overload resolution would succeed.  Another case had to do with SFINAE and
decltype where we didn't evaluate the arg, but succeeding in the
no-viable-function case caused the compiler to choose the wrong function.


Hmm, but then the parenthesized list is arguments for swap, not an 
initializer for a single argument of swap.  That would be using it for 
copy-initialization, and we only want to treat parenthesized args as an 
aggregate initializer in direct-initialization.  Can we check for 
direct-init (i.e. !LOOKUP_ONLYCONVERTING) instead?



-  expr = get_target_expr_sfinae (digest_init (totype, expr, complain),
+  /* We want an implicit lookup, but when initializing an aggregate
+from a parenthesized list, we must remember not to warn about
+narrowing conversions.  */
+  flags = LOOKUP_IMPLICIT;
+  if (CONSTRUCTOR_IS_PAREN_INIT (expr))
+   flags |= LOOKUP_AGGREGATE_PAREN_INIT;
+  expr = get_target_expr_sfinae (digest_init_flags (totype, expr,
+   flags, complain),


I was thinking to set the flag inside digest_init, so callers don't need 
to know about it.



@@ -6704,6 +6761,14 @@ check_initializer (tree decl, tree init, int flags, vec<tree, va_gc> **cleanups)
}
  init_code = NULL_TREE;
}
+ else if (BRACE_ENCLOSED_INITIALIZER_P (init_code))
+   {
+ /* In C++20, the call to build_aggr_init could have created
+a CONSTRUCTOR to handle A(1, 2).  */
+ gcc_assert (cxx_dialect >= cxx2a);
+ init = init_code;
+ flags |= LOOKUP_AGGREGATE_PAREN_INIT;
+   }


I think you want to clear init_code after copying it into init.


+ init = build_aggr_init (decl, init, flags, tf_warning_or_error);
+ /* In C++20, a member initializer list can be initializing an
+aggregate from a parenthesized list of values:
+
+  struct S {
+A aggr;
+S() : aggr(1, 2, 3) { }
+  };
+
+ Build up an INIT_EXPR like we do for aggr{1, 2, 3}, so that
+ build_data_member_initialization can grok it.  */
+ if (cxx_dialect >= cxx2a)
+   {
+ tree t = init;
+ while (TREE_CODE (t) == EXPR_STMT
+|| TREE_CODE (t) == CONVERT_EXPR)
+   t = TREE_OPERAND (t, 0);
+ if (BRACE_ENCLOSED_INITIALIZER_P (t))
+   {
+ t = digest_init_flags (type, t,
+(LOOKUP_IMPLICIT
+ | LOOKUP_AGGREGATE_PAREN_INIT),
+tf_warning_or_error);
+ init = build2 (INIT_EXPR, type, decl, t);
+   }
+   }


All this should happen within the call to build_aggr_init. 
expand_default_init already builds INIT_EXPR in various cases; it could 
do so for this case, too.



+  if (BRACE_ENCLOSED_INITIALIZER_P (exp))
+{
+  gcc_assert (cxx_dialect >= cxx2a);
+  return finish_compound_literal (type, exp, complain,
+ fcl_functional_paren);
+}


How about handling this in build_cplus_new instead of places that also 
call build_cplus_new?


It might also be better to call digest_init in build_new_method_call_1 
and not involve finish_compound_literal at all, since the latter calls 
reshape_init.


Jason



Re: [PATCH] Multibyte awareness for diagnostics (PR 49973)

2019-11-19 Thread Joseph Myers
On Tue, 19 Nov 2019, David Malcolm wrote:

> If we're going with this approach (which I'll leave to Joseph), perhaps

I think reusing the glibc generator is appropriate.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PING] Re: [PATCH] Fix PR92088

2019-11-19 Thread Joseph Myers
On Tue, 19 Nov 2019, Richard Biener wrote:

> > +/* For nested functions disqualify ones taking VLAs by value
> > +   from inlining since the middle-end cannot deal with this.
> > +   ???  We should arrange for those to be passed by reference
> > +   with emitting the copy on the caller side in the frontend.  */
> > +if (storage_class == csc_none
> > +   && TREE_CODE (type) == FUNCTION_TYPE)
> > +  for (tree al = TYPE_ARG_TYPES (type); al; al = TREE_CHAIN (al))
> > +   {
> > + tree arg = TREE_VALUE (al);
> > + if (arg != error_mark_node
> > + && C_TYPE_VARIABLE_SIZE (TREE_VALUE (al)))

The second use of TREE_VALUE (al) looks like it would be better as a 
reference to arg.  OK with that change.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-11-19 Thread David Malcolm
On Sat, 2019-11-16 at 21:42 +0100, Thomas Schwinge wrote:
> Hi David!
> 
> On 2019-11-15T20:22:47-0500, David Malcolm 
> wrote:
> > This patch kit
> 
> (I have not looked at the patches.)  ;-)
> 
> > introduces a static analysis pass for GCC that can diagnose
> > various kinds of problems in C code at compile-time (e.g. double-
> > free,
> > use-after-free, etc).
> 
> Sounds great from the description!

Thanks.

> Would it make sense to add to the wiki page
>  a (high-level)
> comparison to other static analyzers (Coverity, cppcheck,
> clang-static-analyzer, others?), in terms of how they work, what
> their
> respective benefits are, what their design goals are, etc.  (Of
> course
> understanding that yours is much less mature at this point; talking
> about
> high-level design rather than current implementation status.)
> 
> For example, why do we want that in GCC instead of an external tool
> -- in
> part covered in your Rationale.  Can a compiler-side implementation
> benefit from having more information available than an external tool?
> GCC-side implementation is readily available (modulo GCC plugin
> installation?) vs. external ones need to be installed/set up first.
> GCC-side one only works with GCC-supported languages.  GCC-side one
> analyzes actual code being compiled -- thinking about preprocessor-
> level
> '#if' etc., which surely are problematic for external tools that are
> not
> actually replicating a real build.  And so on.  (If you don't want to
> spell out Coverity, cppcheck, clang-static-analyzer, etc., maybe just
> compare yours to external tools.)
> 
> Just an idea, because I wondered about these things.

Thanks; I've added some notes to the "Rationale" section of the wiki
page.

A lot of the information you're after is hidden in patch 2 of the kit,
in an analysis.texi (though admittedly that's hard to read in "patch
that adds a .texi file" form).

For now, I've uploaded a prebuilt version of the HTML to:

https://dmalcolm.fedorapeople.org/gcc/2019-11-19/gccint/Static-Analyzer.html


> > The analyzer runs as an IPA pass on the gimple SSA representation.
> > It associates state machines with data, with transitions at certain
> > statements and edges.  It finds "interesting" interprocedural paths
> > through the user's code, in which bogus state transitions happen.
> > 
> > For example, given:
> > 
> >free (ptr);
> >free (ptr);
> > 
> > at the first call, "ptr" transitions to the "freed" state, and
> > at the second call the analyzer complains, since "ptr" is already
> > in
> > the "freed" state (unless "ptr" is NULL, in which case it stays in
> > the NULL state for both calls).
> > 
> > Specific state machines include:
> > - a checker for malloc/free, for detecting double-free, resource
> > leaks,
> >   use-after-free, etc (sm-malloc.cc), and
> 
> I can immediately see how this can be useful for a bunch of
> 'malloc'/'free'-like etc. OpenACC Runtime Library calls as well as
> source
> code directives.  ..., and this would've flagged existing code in the
> libgomp OpenACC tests, which recently has given me some grief. Short
> summary/examples:
> 
> In addition to host-side 'malloc'/'free', there is device-side
> (separate
> memory space) 'acc_malloc'/'acc_free'. 

I've been thinking about generalizing the malloc/free checker to cover
resource acquisition/release pairs, adding a "domain" for the
allocation, where we'd complain if the resource release function isn't
of the same domain as the resource acquisition function.

Allocation domains might be:
  malloc/free
  C++ scalar new/delete
  C++ array new/delete
  FILE * (fopen/fclose)
  "foo_alloc"/"foo_release" for libfoo (i.e. user-extensible, via
attributes)

and thus catch things like deleting with scalar delete when the buffer
was allocated using new[], and various kinds of layering violations.
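
For illustration, a couple of hypothetical snippets of the kind such a
domain-aware checker could flag (these are not diagnostics the current
patch kit emits):

  #include <cstdio>
  #include <cstdlib>

  void wrong_deleter ()
  {
    int *p = new int[4];
    delete p;      /* allocated with array new[], released with scalar delete */
  }

  void wrong_layer ()
  {
    FILE *f = fopen ("log.txt", "r");
    if (f)
      free (f);    /* FILE * acquired via fopen should go to fclose, not free */
  }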

I'm afraid that I'm not very familiar with OpenACC.  Would
acc_malloc/acc_free fit into that pattern, or would more be needed? 
For example, can you e.g. dereference a device-side pointer in host
code, or would we ideally issue a diagnostic about that?

>  Static checking example: don't
> mix up host-side and device-side pointers.  (Both are normal C/C++
> pointers.  Hmm, maybe such checking could easily be implemented even
> outside of your checker by annotating the respective function
> declarations with an attribute describing which in/out arguments are
> host-side vs. device-side pointers.)
> 
> Then, there are functions to "map" host-side and device-side memory:
> 'acc_map_data'/'acc_unmap_data'.  Static checking example: you must
> not
> 'acc_free' memory spaces that are still mapped.

It sounds like this state machine is somewhat more complicated.

Is there a state transition diagram for this somewhere?  I don't have
that for my state machines, but there are at least lists of states; see
e.g. the various state_t within malloc_state_machine
near the top of:

Re: [C++ PATCH] Fix error-recovery with constexpr dtor (PR c++/92414)

2019-11-19 Thread Jason Merrill

On 11/18/19 11:22 PM, Jakub Jelinek wrote:

On Mon, Nov 18, 2019 at 02:38:51PM -0500, Jason Merrill wrote:

On 11/8/19 9:14 AM, Jakub Jelinek wrote:

Hi!

We ICE on the following testcase, because DECL_INITIAL (decl) is
error_mark_node due to previously reported error and
cxx_eval_outermost_constant_expr is unhappy if ctx.ctor is not
a CONSTRUCTOR, but error_mark_node.

If the initializer is invalid, it should have been diagnosed already and
there is no need to try to evaluate constexpr dtor on it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-08  Jakub Jelinek  

PR c++/92414
* constexpr.c (cxx_constant_dtor): Don't call
cxx_eval_outermost_constant_expr if DECL_INITIAL is erroneous.


Why add the error-checking here rather than in
cxx_eval_outermost_constant_expr when we use DECL_INITIAL?


So like this?
Also passed bootstrap/regtest on x86_64-linux and i686-linux:


OK, thanks.


2019-11-19  Jakub Jelinek  

PR c++/92414
* constexpr.c (cxx_eval_outermost_constant_expr): If DECL_INITIAL
on object is erroneous, return t without trying to evaluate
a constexpr dtor.

* g++.dg/cpp2a/constexpr-dtor4.C: New test.

--- gcc/cp/constexpr.c.jj   2019-11-18 21:50:49.539233013 +0100
+++ gcc/cp/constexpr.c  2019-11-18 21:52:47.217475549 +0100
@@ -5834,6 +5834,8 @@ cxx_eval_outermost_constant_expr (tree t
  gcc_assert (object && VAR_P (object));
  gcc_assert (DECL_DECLARED_CONSTEXPR_P (object));
  gcc_assert (DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (object));
+ if (error_operand_p (DECL_INITIAL (object)))
+   return t;
  ctx.ctor = unshare_expr (DECL_INITIAL (object));
  TREE_READONLY (ctx.ctor) = false;
  /* Temporarily force decl_really_constant_value to return false
--- gcc/testsuite/g++.dg/cpp2a/constexpr-dtor4.C.jj	2019-11-18 21:50:56.152134256 +0100
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-dtor4.C	2019-11-18 21:50:56.152134256 +0100
@@ -0,0 +1,15 @@
+// PR c++/92414
+// { dg-do compile { target c++2a } }
+
+struct A { virtual void foo (); };
+
+struct B : A {
+  constexpr B (int);   // { dg-warning "used but never defined" }
+  constexpr ~B () { }
+};
+
+struct D : B {
+  constexpr D () : B (42) { }  // { dg-error "used before its definition" }
+};
+
+constexpr D d; // { dg-message "in 'constexpr' expansion of" }


Jakub





Re: Add a new combine pass

2019-11-19 Thread Segher Boessenkool
On Tue, Nov 19, 2019 at 11:33:13AM +, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > On Sun, Nov 17, 2019 at 11:35:26PM +, Richard Sandiford wrote:
> >> While working on SVE, I've noticed several cases in which we fail
> >> to combine instructions because the combined form would need to be
> >> placed earlier in the instruction stream than the last of the
> >> instructions being combined.  This includes one very important
> >> case in the handling of the first fault register (FFR).
> >
> > Do you have an example of that?
> 
> It's difficult to share realistic examples at this stage since this
> isn't really the right forum for making them public for the first time.

Oh I'm very sorry.  In the future, just say "Future" and I know what
you mean :-)

>   /* Make sure that the value that is to be substituted for the register
>does not use any registers whose values alter in between.  However,
>If the insns are adjacent, a use can't cross a set even though we
>think it might (this can happen for a sequence of insns each setting
>the same destination; last_set of that register might point to
>a NOTE).  If INSN has a REG_EQUIV note, the register is always
>equivalent to the memory so the substitution is valid even if there
>are intervening stores.  Also, don't move a volatile asm or
>UNSPEC_VOLATILE across any other insns.  */
>   || (! all_adjacent
> && (((!MEM_P (src)
>   || ! find_reg_note (insn, REG_EQUIV, src))
>  && modified_between_p (src, insn, i3))
> || (GET_CODE (src) == ASM_OPERANDS && MEM_VOLATILE_P (src))
> || GET_CODE (src) == UNSPEC_VOLATILE))

So this would work if you had pseudos here, instead of the hard reg?
Because it is a hard reg it is the same number in both places, making it
hard to move.

> > How are dependencies represented in your new pass?  If it just does
> > walks over the insn stream for everything, you get quadratic complexity
> > if you move insns backwards.  We have that in combine already, mostly
> > from modified_between_p, but that is limited because of how LOG_LINKS
> > work, and we have been doing this for so long and there are no problems
> > found with it, so it must work in practice.  But I am worried about it
> > when moving insns back an unlimited distance.
> 
> It builds def-use chains, but using a constant limit on the number of
> explicitly-recorded uses.  All other uses go in a numerical live range
> from which they (conservatively) never escape.  The def-use chains
> represent memory as a single entity, a bit like in gimple.

Ah.  So that range thing ensures correctness.

Why don't you use DF for the DU chains?

> >> - it tries using REG_EQUAL notes for the final instruction.
> >
> > And that.
> 
> I meant REG_EQUAL notes on i3, i.e. it tries replacing the src of i3
> with i3's REG_EQUAL note and combining into that.  Does combine do that?
> I couldn't see it, and in:
> 
>https://gcc.gnu.org/ml/gcc/2019-06/msg00148.html
> 
> you seemed to reject the idea of allowing it.

Yes, I still do.  Do you have an example where it helps?

> >> - it can parallelise two independent instructions that both read from
> >>   the same register or both read from memory.
> >
> > That only if somehow there is a link between the two (so essentially
> > never).  The only combinations tried by combine are those via LOG_LINKs,
> > which are between a SET and the first corresponding use.  This is a key
> > factor that makes it kind of linear (instead of exponential) complexity.
> 
> Tracking limited def-use chains is what makes this last bit easy.
> We can just try parallelising two instructions from the (bounded) list
> of uses.  And for this case there's not any garbage rtl involved, since
> we reuse the same PARALLEL rtx between attempts.  The cost is basically
> all in the recog call (which would obviously mount up if we went
> overboard).

*All* examples above and below are just this.

If you disable everything else, what do the statistics look like then?

> > One thing I want to do is some mini-combine after every split, probably
> > only with the insns new from the split.  But we have no cfglayout mode
> > anymore then, and only hard regs (except in the first split pass, which
> > is just a little later than your new pass).
> 
> Yeah, sounds like it could be useful.  I guess there'd need to be
> an extra condition on the combination that the new insn can't be
> immediately split.

It would run *after* split.  Not interleaved with it.

> > And amount of garbage produced?
> 
> If -ftime-report stats are accurate, then the total amount of
> memory allocated is:
> 
> run-combine=2 (normal combine): 1793 kB
> run-combine=4 (new pass only):     98 kB
> run-combine=6 (both passes):     1871 kB (new pass accounts for 78 kB)
> 
> But again that's not a fair comparison when the main combine pass does more.

The way combine does SUBST is pretty 

[C++ PATCH] Consider parm types equivalence for operator rewrite tiebreaker.

2019-11-19 Thread Jason Merrill
The C++ committee continues to discuss how best to avoid breaking existing
code with the new rules for reversed operators.  A recent suggestion was to
base the tie-breaker on the parameter types of the candidates, which made a
lot of sense to me, so this patch implements that.

This patch also mentions that a candidate was reversed or rewritten when
printing the list of candidates, and warns about a comparison that becomes
recursive under the new rules.  There is no flag for this warning; people
can silence it by swapping the operands.
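
As an illustration of the new warning (this is not one of the testcases
added here), a comparison that becomes recursive under the C++20 rules:

  struct D
  {
    int i;
    bool operator== (int x) const
    {
      /* In C++20 the reversed candidate makes this resolve to
	 (*this).operator==(x), i.e. the current function called with
	 reversed arguments; writing *this == x instead silences it.  */
      return x == *this;
    }
  };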

Tested x86_64-pc-linux-gnu, applying to trunk.

* call.c (same_fn_or_template): Change to cand_parms_match.
(joust): Adjust.
(print_z_candidate): Mark rewritten/reversed candidates.
(build_new_op_1): Warn about recursive call with reversed arguments.
---
 gcc/cp/call.c | 63 ---
 .../g++.dg/cpp2a/spaceship-rewrite2.C | 12 
 .../g++.dg/cpp2a/spaceship-rewrite3.C | 10 +++
 .../g++.dg/cpp2a/spaceship-rewrite4.C | 12 
 4 files changed, 73 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/spaceship-rewrite2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/spaceship-rewrite3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/spaceship-rewrite4.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 13639a1c901..f4dfa7b3f56 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -3694,6 +3694,10 @@ print_z_candidate (location_t loc, const char *msgstr,
 inform (cloc, "%s%#qD (near match)", msg, fn);
   else if (DECL_DELETED_FN (fn))
 inform (cloc, "%s%#qD (deleted)", msg, fn);
+  else if (candidate->reversed ())
+inform (cloc, "%s%#qD (reversed)", msg, fn);
+  else if (candidate->rewritten ())
+inform (cloc, "%s%#qD (rewritten)", msg, fn);
   else
 inform (cloc, "%s%#qD", msg, fn);
   if (fn != candidate->fn)
@@ -6219,8 +6223,14 @@ build_new_op_1 (const op_location_t , enum tree_code 
code, int flags,
  else
{
  if (cand->reversed ())
-   /* We swapped these in add_candidate, swap them back now.  */
-   std::swap (cand->convs[0], cand->convs[1]);
+   {
+ /* We swapped these in add_candidate, swap them back now.  */
+ std::swap (cand->convs[0], cand->convs[1]);
+ if (cand->fn == current_function_decl)
+   warning_at (loc, 0, "in C++20 this comparison calls the "
+   "current function recursively with reversed "
+   "arguments");
+   }
  result = build_over_call (cand, LOOKUP_NORMAL, complain);
}
 
@@ -10995,18 +11005,32 @@ joust_maybe_elide_copy (z_candidate *)
   return false;
 }
 
-/* True if cand1 and cand2 represent the same function or function
-   template.  */
+/* True if the defining declarations of the two candidates have equivalent
+   parameters.  */
 
-static bool
-same_fn_or_template (z_candidate *cand1, z_candidate *cand2)
+bool
+cand_parms_match (z_candidate *c1, z_candidate *c2)
 {
-  if (cand1->fn == cand2->fn)
+  tree fn1 = c1->template_decl;
+  tree fn2 = c2->template_decl;
+  if (fn1 && fn2)
+{
+  fn1 = most_general_template (TI_TEMPLATE (fn1));
+  fn1 = DECL_TEMPLATE_RESULT (fn1);
+  fn2 = most_general_template (TI_TEMPLATE (fn2));
+  fn2 = DECL_TEMPLATE_RESULT (fn2);
+}
+  else
+{
+  fn1 = c1->fn;
+  fn2 = c2->fn;
+}
+  if (fn1 == fn2)
 return true;
-  if (!cand1->template_decl || !cand2->template_decl)
+  if (identifier_p (fn1) || identifier_p (fn2))
 return false;
-  return (most_general_template (TI_TEMPLATE (cand1->template_decl))
- == most_general_template (TI_TEMPLATE (cand2->template_decl)));
+  return compparms (TYPE_ARG_TYPES (TREE_TYPE (fn1)),
+   TYPE_ARG_TYPES (TREE_TYPE (fn2)));
 }
 
 /* Compare two candidates for overloading as described in
@@ -11155,20 +11179,11 @@ joust (struct z_candidate *cand1, struct z_candidate 
*cand2, bool warn,
 
  if (winner && comp != winner)
{
- if (same_fn_or_template (cand1, cand2))
-   {
- /* Ambiguity between normal and reversed versions of the
-same comparison operator; prefer the normal one.
-https://lists.isocpp.org/core/2019/10/7438.php  */
- if (cand1->reversed ())
-   winner = -1;
- else
-   {
- gcc_checking_assert (cand2->reversed ());
- winner = 1;
-   }
- break;
-   }
+ /* Ambiguity between normal and reversed comparison operators
+with the same parameter types; prefer the normal one.  */
+ if ((cand1->reversed () != cand2->reversed ())
+ && cand_parms_match (cand1, 

Re: [PATCH v2] PR92398: Fix testcase failure of pr72804.c

2019-11-19 Thread Segher Boessenkool
Hi!

On Mon, Nov 18, 2019 at 04:21:07PM +0800, luoxhu wrote:
> On 2019/11/15 18:17, Segher Boessenkool wrote:
> > On Thu, Nov 14, 2019 at 09:12:32PM -0600, Xiong Hu Luo wrote:
> >> P9LE generated instruction is not worse than P8LE.
> >> mtvsrdd;xxlnot;stxv vs. not;not;std;std.
> >> Update the test case to fix failures.
> > 
> > So this no longer runs it for p7, and it also doesn't run it for cpus
> > after p9 anymore.  Could you change it to be a test for p9 and above,
> > and one for before p9?  Does that work?
> Since one test file could not support multiple dg do-compile or dg-options,
> it seems not feasible to build both power7, power8 and power9 option in one
> source file, I added check_effective_target_power[7,8,9] and
> add_options_for_power[7,8,9] in target-supports.exp for each cpu configure.

You can add a selector to the scan-assembler tests, I meant.

There are two ways to run tests: in one way you use just what -mcpu=
etc. you happen to get; in the other way, you force particular options.
With the first way, you will automatically also test everything on newer
cpus.  With the second way, you also test (for example) p8 code gen when
actually testing on a p9 system.  Both approaches have their uses.

In this case, I would use the first approach, but okay :-)
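
Concretely, the selector approach would look roughly like this (counts are
only illustrative, and this assumes an effective target such as
has_arch_pwr9 is usable here), so a single source file keys its
expectations on the effective target:

  /* { dg-final { scan-assembler-times {\mnot\M} 4 { target { ! has_arch_pwr9 } } } } */
  /* { dg-final { scan-assembler-times {\mnot\M} 2 { target has_arch_pwr9 } } } */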

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr72804.h
> @@ -0,0 +1,17 @@
> +/* This test code is included into pr72804.p8.c and pr72804.p9.c

And p7.

> +   The two files have the tests for the number of instructions generated for
> +   P8LE versus P9LE.  */

Three files :-)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr72804.p7.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2" } */
> +/* { dg-add-options power7 } */
> +
> +/* { dg-final { scan-assembler-times "not " 4 {xfail be} } } */
> +/* { dg-final { scan-assembler-times "std " 2  } } */
> +/* { dg-final { scan-assembler-times "ld " 2 } } */
> +/* { dg-final { scan-assembler-not "lxvd2x" {xfail be} } } */
> +/* { dg-final { scan-assembler-not "stxvd2x" {xfail be} } } */
> +/* { dg-final { scan-assembler-not "xxpermdi" } } */
> +/* { dg-final { scan-assembler-not "mfvsrd" } } */
> +/* { dg-final { scan-assembler-not "mfvsrd" } } */

So this is exactly the same as p8.  Hmm.

> --- a/gcc/testsuite/gcc.target/powerpc/pr72804.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr72804.p9.c
> @@ -1,25 +1,16 @@
>  /* { dg-do compile { target { lp64 } } } */
> -/* { dg-skip-if "" { powerpc*-*-darwin* } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
> -/* { dg-options "-O2 -mvsx -fno-inline-functions --param 
> max-inline-insns-single-O2=200" } */
> +/* { dg-options "-O2 -mvsx" } */
> +/* { dg-add-options power9 } */

-mvsx is superfluous if you have -mcpu=power9 already.
You shouldn't test powerpc_vsx_ok if you force a CPU that has it.

> -/* { dg-final { scan-assembler-times "not " 4 } } */
> -/* { dg-final { scan-assembler-times "std " 2 } } */
> +/* { dg-final { scan-assembler-times "not " 2 { target power9 } } } */
> +/* { dg-final { scan-assembler-not "std " { target power9 } } } */
>  /* { dg-final { scan-assembler-times "ld " 2 } } */
>  /* { dg-final { scan-assembler-not "lxvd2x" } } */
>  /* { dg-final { scan-assembler-not "stxvd2x" } } */
>  /* { dg-final { scan-assembler-not "xxpermdi" } } */
>  /* { dg-final { scan-assembler-not "mfvsrd" } } */
>  /* { dg-final { scan-assembler-not "mfvsrd" } } */

You now force power9 for this test, so you shouldn't test for it anymore.

> +proc check_effective_target_power7 {  } {
> +  return [check_no_compiler_messages_nocache power7 assembly {
> + #if !(!defined(_ARCH_PWR8) && defined(_ARCH_PWR7))
> + #error !_ARCH_PWR7
> + #endif
> +  } "-mdejagnu-cpu=power7"]
> +}

So this tests for p7 exactly (not power8 or up).  Okay.  Add a comment
saying that?

> +proc add_options_for_power7 { flags } {
> +  if { [istarget powerpc*-*-*]  } {
> +return "$flags -mdejagnu-cpu=power7"
> +  }
> +  return "$flags"
> +}

Is this better than just adding -mdejagnu-cpu=power7 directly?


Segher


Re: Reverting r278411

2019-11-19 Thread Richard Sandiford
Segher Boessenkool  writes:
> Hi,
>
> r278411 causes bootstrap to fail on at least all powerpc*.  Also, I am
> author of this simplify-rtx code, and I think what you did is all wrong.
> Also, it should be two patches, the CSE part should be separate.  (I can
> not tell if that is the part that regresses us, either).

I committed them as one patch because I'd tested them together,
which I thought was the way this should be done.  In practice
a clean revert would have to be done as a pair anyway: reverting
one part individually would have required XFAILs on the tests.

And yeah, it was the cse.c part that was the problem.
find_sets_in_insn has:

  /* Don't count call-insns, (set (reg 0) (call ...)), as a set.
 The hard function value register is used only once, to copy to
 someplace else, so it isn't worth cse'ing.  */
  else if (GET_CODE (SET_SRC (x)) == CALL)
;

So n_sets == 1 can be true if we have a single SET in parallel
with a call.  Unsurprising, I guess no-one had MEM sets in
parallel with a call, so it didn't matter until now.

I'm retesting with a !CALL_P check added to make sure that was
the only problem.

Before I resubmit, why is the simplify-rtx.c part all wrong?

Thanks,
Richard


Remove unused parameter to estimate_edge_size_and_time

2019-11-19 Thread Jan Hubicka
Hi,
this patch removes an unused argument of estimate_edge_size_and_time.
Bootstrapped/regtested x86_64-linux, comitted.

* ipa-fnsummary.c (estimate_edge_size_and_time): Drop parameter PROP.
(estimate_calls_size_and_time): Update.
Index: ipa-fnsummary.c
===
--- ipa-fnsummary.c (revision 278441)
+++ ipa-fnsummary.c (working copy)
@@ -2950,7 +2950,6 @@ estimate_edge_devirt_benefit (struct cgr
 static inline void
 estimate_edge_size_and_time (struct cgraph_edge *e, int *size, int *min_size,
 sreal *time,
-int prob,
 vec known_vals,
 vec known_contexts,
 vec known_aggs,
@@ -2960,6 +2959,7 @@ estimate_edge_size_and_time (struct cgra
   int call_size = es->call_stmt_size;
   int call_time = es->call_stmt_time;
   int cur_size;
+
   if (!e->callee && hints && e->maybe_hot_p ()
   && estimate_edge_devirt_benefit (e, _size, _time,
   known_vals, known_contexts, known_aggs))
@@ -2968,12 +2968,8 @@ estimate_edge_size_and_time (struct cgra
   *size += cur_size;
   if (min_size)
 *min_size += cur_size;
-  if (!time)
-;
-  else if (prob == REG_BR_PROB_BASE)
+  if (time)
 *time += ((sreal)call_time) * e->sreal_frequency ();
-  else
-*time += ((sreal)call_time * prob) * e->sreal_frequency ();
 }
 
 
@@ -3019,7 +3015,7 @@ estimate_calls_size_and_time (struct cgr
 so we do not need to compute probabilities.  */
  estimate_edge_size_and_time (e, size,
   es->predicate ? NULL : min_size,
-  time, REG_BR_PROB_BASE,
+  time,
   known_vals, known_contexts,
   known_aggs, hints);
}
@@ -3031,7 +3027,7 @@ estimate_calls_size_and_time (struct cgr
  || es->predicate->evaluate (possible_truths))
estimate_edge_size_and_time (e, size,
 es->predicate ? NULL : min_size,
-time, REG_BR_PROB_BASE,
+time,
 known_vals, known_contexts, known_aggs,
 hints);
 }


Re: [PATCH] Fix slowness in demangler

2019-11-19 Thread Tim Rühsen
On 19.11.19 20:01, Jeff Law wrote:
> On 11/19/19 12:00 PM, Tim Rühsen wrote:
>> Thanks. Where exactly did it go ? Still can't see it in
>> git://sourceware.org/git/binutils-gdb.git. Is there another repository ?
> It's in the GCC repo.  binutils-gdb will (at some point) pull it from GCC.
> jeff

Thanks, good to know, will clone it from there :-)

Tim





Minor speedup in estimate_edge_badness

2019-11-19 Thread Jan Hubicka
Hi,
this patch avoids some extra calculation in edge_badness.

Bootstrapped/regtested x86_64-linux. Committed.

* ipa-inline.c (inlining_speedup): New function.
(edge_badness): Use it.
Index: ipa-inline.c
===
--- ipa-inline.c(revision 278441)
+++ ipa-inline.c(working copy)
@@ -768,6 +768,33 @@ compute_inlined_call_time (struct cgraph
   return time;
 }
 
+/* Determine time saved by inlining EDGE of frequency FREQ
+   where callee's runtime w/o inlining is UNINLINED_TIME
+   and with inlining is INLINED_TIME.  */
+
+inline sreal
+inlining_speedup (struct cgraph_edge *edge,
+ sreal freq,
+ sreal uninlined_time,
+ sreal inlined_time)
+{
+  sreal speedup = uninlined_time - inlined_time;
+  /* Handling of call_time should match one in ipa-inline-fnsummary.c
+ (estimate_edge_size_and_time).  */
+  sreal call_time = ipa_call_summaries->get (edge)->call_stmt_time;
+
+  if (freq > 0)
+{
+  speedup = (speedup + call_time);
+  if (freq != 1)
+   speedup = speedup * freq;
+}
+  else if (freq == 0)
+speedup = speedup >> 11;
+  gcc_checking_assert (speedup >= 0);
+  return speedup;
+}
+
 /* Return true if the speedup for inlining E is bigger than
PARAM_MAX_INLINE_MIN_SPEEDUP.  */
 
@@ -1149,10 +1176,8 @@ edge_badness (struct cgraph_edge *edge,
   sreal numerator, denominator;
   int overall_growth;
   sreal freq = edge->sreal_frequency ();
-  sreal inlined_time = compute_inlined_call_time (edge, edge_time, freq);
 
-  numerator = (compute_uninlined_call_time (edge, unspec_edge_time, freq)
-  - inlined_time);
+  numerator = inlining_speedup (edge, freq, unspec_edge_time, edge_time);
   if (numerator <= 0)
numerator = ((sreal) 1 >> 8);
   if (caller->count.ipa ().nonzero_p ())
@@ -1235,16 +1260,14 @@ edge_badness (struct cgraph_edge *edge,
  fprintf (dump_file,
   "  %f: guessed profile. frequency %f, count %" PRId64
   " caller count %" PRId64
-  " time w/o inlining %f, time with inlining %f"
+  " time saved %f"
   " overall growth %i (current) %i (original)"
   " %i (compensated)\n",
   badness.to_double (),
   freq.to_double (),
   edge->count.ipa ().initialized_p () ? edge->count.ipa 
().to_gcov_type () : -1,
   caller->count.ipa ().initialized_p () ? caller->count.ipa 
().to_gcov_type () : -1,
-  compute_uninlined_call_time (edge,
-   unspec_edge_time, 
freq).to_double (),
-  inlined_time.to_double (),
+  inlining_speedup (edge, freq, unspec_edge_time, 
edge_time).to_double (),
   estimate_growth (callee),
   callee_info->growth, overall_growth);
}


Re: [PATCH] Fix slowness in demangler

2019-11-19 Thread Tim Rühsen
On 16.11.19 18:14, Jeff Law wrote:
> On 11/12/19 9:39 AM, Tim Rühsen wrote:
>> On 11/12/19 4:15 PM, Ian Lance Taylor wrote:
>>> On Tue, Nov 12, 2019 at 6:15 AM Tim Rühsen  wrote:
 this is a proposal to fix
 https://sourceware.org/bugzilla/show_bug.cgi?id=25180

 In short:
 cxxfilt
 _ZZ1_DO1z1Dclaa1D1VEE1VE2zo

 takes several minutes with 100% CPU before it comes back with a result.

 With this patch the result is returned immediately. The test suite in
 binutils-gdb/libiberty/ throws no error.

 I'd like to note that I am not subscribed to the list, so please add me
 to CC when replying. Thanks in advance.
>>> This is OK with an appropriate ChangeLog entry.
>> Thanks for feedback, Ian.
>>
>> Attached is the patch with a ChangeLog entry.
>>
>> Regards, Tim
...

> THanks.  Installed after minor twiddling of the ChangeLog.
> 
> jeff

Thanks. Where exactly did it go ? Still can't see it in
git://sourceware.org/git/binutils-gdb.git. Is there another repository ?

Regards, Tim





Re: [PATCH] Fix slowness in demangler

2019-11-19 Thread Jeff Law
On 11/19/19 12:00 PM, Tim Rühsen wrote:
> On 16.11.19 18:14, Jeff Law wrote:
>> On 11/12/19 9:39 AM, Tim Rühsen wrote:
>>> On 11/12/19 4:15 PM, Ian Lance Taylor wrote:
 On Tue, Nov 12, 2019 at 6:15 AM Tim Rühsen  wrote:
> this is a proposal to fix
> https://sourceware.org/bugzilla/show_bug.cgi?id=25180
>
> In short:
> cxxfilt
> _ZZ1_DO1z1Dclaa1D1VEE1VE2zo
>
> takes several minutes with 100% CPU before it comes back with a result.
>
> With this patch the result is returned immediately. The test suite in
> binutils-gdb/libiberty/ throws no error.
>
> I'd like to note that I am not subscribed to the list, so please add me
> to CC when replying. Thanks in advance.
 This is OK with an appropriate ChangeLog entry.
>>> Thanks for feedback, Ian.
>>>
>>> Attached is the patch with a ChangeLog entry.
>>>
>>> Regards, Tim
> ...
> 
>> THanks.  Installed after minor twiddling of the ChangeLog.
>>
>> jeff
> 
> Thanks. Where exactly did it go ? Still can't see it in
> git://sourceware.org/git/binutils-gdb.git. Is there another repository ?
It's in the GCC repo.  binutils-gdb will (at some point) pull it from GCC.
jeff



Re: [C++ coroutines 3/6] Front end parsing and transforms.

2019-11-19 Thread Nathan Sidwell

On 11/17/19 5:25 AM, Iain Sandoe wrote:


+++ b/gcc/cp/coroutines.cc



+/* = Morph and Expand. =



+/* Helpers for label creation.  */
+static tree
+create_anon_label_with_ctx (location_t loc, tree ctx)
+{
+  tree lab = build_decl (loc, LABEL_DECL, NULL_TREE, void_type_node);
+
+  DECL_ARTIFICIAL (lab) = 1;
+  DECL_IGNORED_P (lab) = 1;


We can use true for such boolean flags now, your call.



+/* We mark our named labels as used, because we want to keep them in place
+   during development.  FIXME: Remove this before integration.  */


FIXME?


+struct __proxy_replace
+{
+  tree from, to;
+};


std::pair ?



+/* If this is a coreturn statement (or one wrapped in a cleanup) then
+   return the list of statements to replace it.  */
+static tree


space between comment and function -- here and elsewhere.


+  /* p.return_void and p.return_value are probably void, but it's not
+ clear if that's intended to be a guarantee.  CHECKME.  */


CHECKME?


+  /* We might have a single co_return statement, in which case, we do
+have to replace it with a list.  */


'do have to' reads weirdly -> 'have to' or 'do not have to' ?



+static tree
+co_await_expander (tree *stmt, int * /*do_subtree*/, void *d)
+{



+  r = build2_loc (loc, INIT_EXPR, TREE_TYPE (sv_handle), sv_handle,
+ suspend);
+  append_to_statement_list (r, _list);
+  tree resume
+   = lookup_member (TREE_TYPE (sv_handle), get_identifier ("resume"), 1, 0,
+tf_warning_or_error);
+  resume = build_new_method_call (sv_handle, resume, NULL, NULL_TREE,
+ LOOKUP_NORMAL, NULL, tf_warning_or_error);
+  resume = coro_build_cvt_void_expr_stmt (resume, loc);


Do we have a way of forcing this to be a tail call?  Should comment 
and/or TODO:



+  append_to_statement_list (resume, _list);
+}


// comments approaching ...


+  add_stmt (r); // case 0:
+  // Implement the suspend, a scope exit without clean ups.
+  r = build_call_expr_internal_loc (loc, IFN_CO_SUSPN, void_type_node, 1, 
susp);
+  r = coro_build_cvt_void_expr_stmt (r, loc);


// danger zone over :)  [there are others scattered around though]



+/* When we built the await expressions, we didn't know the coro frame
+   layout, therefore no idea where to find the promise or where to put
+   the awaitables.  Now we know these things, fill them in.  */
+static tree
+transform_await_expr (tree await_expr, struct __await_xform_data *xform)
+{
+  struct suspend_point_info *si = suspend_points->get (await_expr);
+  location_t loc = EXPR_LOCATION (await_expr);
+  if (!si)
+{
+  error_at (loc, "no suspend point info for %qD", await_expr);


that looks implementory speak -- is it an ICE?



+  /* FIXME: determine if it's better to walk the co_await several times with
+ a quick test, or once with a more complex test.  */


Probably can simply be an 'i'm not sure, I went with ... ' comment?



+  /* Update the block associated with the outer scope of the orig fn.  */
+  tree first = expr_first (fnbody);
+  if (first && TREE_CODE (first) == BIND_EXPR)
+{
+  /* We will discard this, since it's connected to the original scope
+nest... ??? CHECKME, this might be overly cautious.  */

?


+  tree block = BIND_EXPR_BLOCK (first);
+  if (block) // For this to be missing is probably a bug.


gcc_assert?


+   {
+ gcc_assert (BLOCK_SUPERCONTEXT (block) == NULL_TREE);
+ gcc_assert (BLOCK_CHAIN (block) == NULL_TREE);
+ BLOCK_SUPERCONTEXT (block) = top_block;
+ BLOCK_SUBBLOCKS (top_block) = block;
+   }
+}
+
+  add_stmt (actor_bind);
+  tree actor_body = push_stmt_list ();
+
+  /* FIXME: this is development marker, remove later.  */


FIXME



+  tree return_void = NULL_TREE;
+  tree rvm
+= lookup_promise_member (orig, "return_void", loc, false /*musthave*/);

  /*musthave=*/false  I think there are other similar cases



+  /* Get a reference to the final suspend var in the frame.  */
+  transform_await_expr (final_await, );
+  r = coro_build_expr_stmt (final_await, loc);
+  add_stmt (r);
+
+  /* now do the tail of the function.  */

now->Now


+  /* Here deallocate the frame (if we allocated it), which we will have at
+ present.  */


sentence is confusing.  reword?


+/* Helper that returns an identifier for an appended extension to the
+   current un-mangled function name.  */
+static tree
+get_fn_local_identifier (tree orig, const char *append)
+{
+  /* Figure out the bits we need to generate names for the outlined things
+ For consistency, this needs to behave the same way as
+ ASM_FORMAT_PRIVATE_NAME does. */


pity we don't have a generic helper already.


+  char *an;
+  if (DECL_ASSEMBLER_NAME (orig))
+an = ACONCAT ((IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (orig)), sep, 
append,
+  (char *) 0));
+  else if (DECL_USE_TEMPLATE (orig) && 

Re: [PATCH] Multibyte awareness for diagnostics (PR 49973)

2019-11-19 Thread Joseph Myers
On Tue, 19 Nov 2019, David Malcolm wrote:

> If we're going with this approach, and redistributing those unicode data
> files as part of our repo and tarballs, do we need some kind of
> copyright/license statement that spells out the situation?  What does
> glibc do for this?

glibc includes the files in localedata/unicode-gen/ directory.

My inclination is that we should include them in the GCC sources, but 
there were concerns about doing so when I asked about that issue when last 
updating the data used for warnings about normalization of UCNs in 
identifiers (which I should probably update again - given the work I did 
last time, this time it should just be a regeneration with newer input 
files).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Free dominance info at the beginning of pass_jump_after_combine

2019-11-19 Thread Segher Boessenkool
On Wed, Nov 13, 2019 at 01:15:18PM +0100, Ilya Leoshkevich wrote:
> > But OTOH it may well be the case that other things in cleanup_cfg make
> > the known dominance info invalid as well, in which case just the comment
> > is a bit misleading.  Sounds likely to me :-)
> 
> Yeah, that's what I worry about as well.  In particular, this block in
> try_optimize_cfg:

Heh, I did that code, whoops :-)

> /* Try to change a conditional branch to a return to the
>respective conditional return.  */
> if (EDGE_COUNT (b->succs) == 2
> && any_condjump_p (BB_END (b))
> && bb_is_just_return (BRANCH_EDGE (b)->dest, , ))
>   {
> if (redirect_jump (as_a  (BB_END (b)),
>PATTERN (ret), 0))
>   {
> if (use)
>   emit_insn_before (copy_insn (PATTERN (use)),
> BB_END (b));
> if (dump_file)
>   fprintf (dump_file, "Changed conditional jump %d->%d "
>"to conditional return.\n",
>b->index, BRANCH_EDGE (b)->dest->index);
> redirect_edge_succ (BRANCH_EDGE (b),
> EXIT_BLOCK_PTR_FOR_FN (cfun));
> BRANCH_EDGE (b)->flags &= ~EDGE_CROSSING;
> changed_here = true;
>   }
>   }
> 
> runs regardless of cleanup mode, and it makes use of redirect_edge_succ,
> which does not update dominators.

Yeah.  Of course this only does anything if the targeted block is only
a return insn (and maybe a USE of the return value), so nothing bad will
happen, but the dom info is not technically correct anymore.
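
A minimal sketch of the conservative approach in the thread subject (the
exact pass and call site are assumptions):

  /* Dominance info may be stale after combine's CFG changes; drop it
     up front instead of trying to keep it valid through cleanup_cfg.  */
  if (dom_info_available_p (CDI_DOMINATORS))
    free_dominance_info (CDI_DOMINATORS);
  cleanup_cfg (0);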


Segher


Re: [PATCH] Multibyte awareness for diagnostics (PR 49973)

2019-11-19 Thread David Malcolm
On Fri, 2019-09-27 at 16:41 -0400, Lewis Hyatt wrote:
> On Thu, Sep 26, 2019 at 08:46:56PM +, Joseph Myers wrote:
> > On Thu, 26 Sep 2019, Lewis Hyatt wrote:
> > 
> > > A couple notes: 
> > > - In order to avoid any portability problems with wchar_t,
> > > the
> > > equivalent of wcwidth() from libc is implemented in-house.
> > 
> > I'm uneasy about contrib/gen_wcwidth.cpp doing the generation using
> > host 
> > libc's wcwidth.  The effect is that libcpp/generated_cpp_wcwidth.h
> > is 
> > *not* reproducible unless you know exactly what host (libc version,
> > locale 
> > used when running the program, distribution patches to libc and
> > locale 
> > data) was used to run the program.  I think we need a generator
> > that works 
> > from Unicode data in some way so we can explicitly say what version
> > of the 
> > (unmodified) Unicode data was used.
> 
> Here is a revised patch that hopefully addresses your concerns. I
> borrowed the
> relevant Python code for parsing Unicode's data files from glibc,
> then added a
> new script that parses the locale data they output into the same sort
> of simply
> searchable tables I was creating before. The new generated table is
> very close
> to the old one, but there are some differences due to improvements
> that have
> been made to glibc recently, affecting 200 or so codepoints. The
> procedure for
> updating GCC's wcwidth would then be the following:
> 
> -The three Unicode data files live in contrib/unicode/
> {EastAsianWidth.txt,PropList.txt,UnicodeData.txt} and can be updated
> at any
> time when Unicode changes them.
> 
> -glibc's processing logic lives in two Python scripts in
> contrib/unicode/from_glibc and these would ideally be updated when
> glibc makes
> updates. It seems they occasionally put some manual overrides, etc.,
> based on
> feedback and bug reports. (These are the verbatim scripts from glibc
> with no
> changes, so they need only be copied over.)
> 
> -contrib/unicode/gen_wcwidth.py runs the glibc code, using GCC's
> Unicode data
> files as inputs, and produces the necessary tables for
> libcpp/generated_cpp_wcwidth.h.
> 
> Hope that sounds better. This way it is relatively straightforward to
> keep in
> sync with glibc (which seems desirable to me anyway), but is also
> always
> reproducible.
> 
> Note: I did not include the three large unicode data files in this
> emailed
> patch, although they would be committed as part of the patch
> presumably.
> They are available here:
> ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
> ftp://ftp.unicode.org/Public/UNIDATA/EastAsianWidth.txt
> ftp://ftp.unicode.org/Public/UNIDATA/PropList.txt
> 
> The rest of the patch is unchanged from before, other than one
> comment updated
> to reflect the new situation, and charset.c rebased to current trunk.
> 
> Thank you for taking the time to review this.
> 
> -Lewis

Thanks for posting this patch; I'm sorry about how long it's taken me
to review it.

BTW, have you done GCC contributor paperwork?
  https://gcc.gnu.org/contribute.html#legal

> diff --git a/contrib/unicode/from_glibc/unicode_utils.py 
> b/contrib/unicode/from_glibc/unicode_utils.py
> new file mode 100644
> index 000..a9e94cce418
> --- /dev/null
> +++ b/contrib/unicode/from_glibc/unicode_utils.py

[...snip...]

I'll leave it for Joseph to comment on whether this approach satisfies
his concerns; I'll focus on the diagnostic-show-locus.c changes.

I'm assuming that all of the Python code is Python 3, rather than 2
(I see python 3 shebangs and python3-style-print, so it looks good from
that POV).  It appears that there's no need for the build machine to
have Python, it's only needed when maintainers are refreshing the data
from glibc.

If we're going with this approach, and redistributing those unicode data
files as part of our repo and tarballs, do we need some kind of
copyright/license statement that spells out the situation?  What does
glibc do for this?

> diff --git a/contrib/unicode/from_glibc/utf8_gen.py 
> b/contrib/unicode/from_glibc/utf8_gen.py
> new file mode 100755
> index 000..0e5583cd259
> --- /dev/null
> +++ b/contrib/unicode/from_glibc/utf8_gen.py

[...snip...]

> diff --git a/contrib/unicode/gen_wcwidth.py b/contrib/unicode/gen_wcwidth.py
> new file mode 100755
> index 000..02b28bcedcf
> --- /dev/null
> +++ b/contrib/unicode/gen_wcwidth.py

[...snip...]

If we're going with this approach (which I'll leave to Joseph), perhaps
this directory should have a brief README (covering much of the material
you've mentioned in the email with the patch, talking about syncing with
glibc, regenerating the data, etc).

> diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
> index 4d563dda8f4..7a5bd36d962 100644
> --- a/gcc/diagnostic-show-locus.c
> +++ b/gcc/diagnostic-show-locus.c
> @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gcc-rich-location.h"
>  #include "selftest.h"
>  #include 

Re: [PATCH, rs6000] Refactor FP vector comparison operators

2019-11-19 Thread Segher Boessenkool
Hi!

On Tue, Nov 12, 2019 at 06:41:07PM +0800, Kewen.Lin wrote:
> +;; code iterators and attributes for vector FP comparison operators:
> +(define_code_iterator vector_fp_comparison_operator [lt le ne
> +  ungt unge unlt unle])

Let's indent that differently?

(define_code_iterator
  vector_fp_comparison_operator [lt le ne ungt unge unlt unle])

is nice I think.

> +; There are 14 possible vector FP comparison operators, gt and eq of them 
> have
> +; been expanded above, so just support 12 remaining operators here.
>  
> +; 0. For ge:

(Don't number comments please).

> +(define_expand "vector_ge"
> +  [(set (match_operand:VEC_F 0 "vlogical_operand")
> + (ge:VEC_F (match_operand:VEC_F 1 "vlogical_operand")
> +   (match_operand:VEC_F 2 "vlogical_operand")))]
>"VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
> +  "")

This already exists?  Line 764 in vector.md?

> +; 1. For lt/le/ne/ungt/unge/unlt/unle:
> +; lt(a,b)   = gt(b,a)
> +; le(a,b)   = ge(b,a)
> +; unge(a,b) = ~lt(a,b)
> +; unle(a,b) = ~gt(a,b)
> +; ne(a,b)   = ~eq(a,b)
> +; ungt(a,b) = ~le(a,b)
> +; unlt(a,b) = ~ge(a,b)
> +(define_insn_and_split "vector_"
>[(set (match_operand:VEC_F 0 "vfloat_operand")
> + (vector_fp_comparison_operator:VEC_F
> +(match_operand:VEC_F 1 "vfloat_operand")
> +(match_operand:VEC_F 2 "vfloat_operand")))]
>"VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
>"#"
> +  "can_create_pseudo_p ()"
> +  [(pc)]
>  {
> +  enum rtx_code cond = ;
> +  rtx res = gen_reg_rtx (mode);
> +  bool need_invert = false;
> +  switch (cond)
> +{
> +case LT:
> +case LE:
> +  cond = swap_condition (cond);
> +  std::swap (operands[1], operands[2]);
> +  break;
> +case UNLE:
> +case UNLT:
> +case NE:
> +case UNGE:
> +case UNGT:
> +  cond = reverse_condition_maybe_unordered (cond);
> +  need_invert = true;
> +  break;
> +default:
> +  gcc_unreachable ();
> +}
> +
> +  rtx comp = gen_rtx_fmt_ee (cond, mode, operands[1], operands[2]);
> +  emit_insn (gen_rtx_SET (res, comp));
> +
> +  if (need_invert)
> +emit_insn (gen_one_cmpl2 (res, res));
> +
> +  emit_insn (gen_rtx_SET (operands[0], res));
>  })

Does this work for (say) ungt?  I would do two switch statements: the
first only to do the invert, and then the second to do the swap (with
the modified cond!)  So:

{
  enum rtx_code cond = ;
  bool need_invert = false;

  if (cond == UNLE || cond == UNLT || cond == NE
  || cond == UNGE || cond == UNGT)
{
  cond = reverse_condition_maybe_unordered (cond);
  need_invert = true;
}

  if (cond == LT || cond == LE)
{
  cond = swap_condition (cond);
  std::swap (operands[1], operands[2]);
}

  rtx comp = gen_rtx_fmt_ee (cond, mode, operands[1], operands[2]);

  if (need_invert)
{
  rtx res = gen_reg_rtx (mode);
  emit_insn (gen_rtx_SET (res, comp));
  emit_insn (gen_one_cmpl2 (operands[0], res));
}
  else
emit_insn (gen_rtx_SET (operands[0], comp));
})

> +; 2. For ltgt/uneq/ordered/unordered:
> +; ltgt: gt(a,b) | gt(b,a)
> +; uneq: ~gt(a,b) & ~gt(b,a)
> +; ordered: ge(a,b) | ge(b,a)
> +; unordered: ~ge(a,b) & ~ge(b,a)

You could write that as
  ~(ge(a,b) | ge(b,a))
which may show the structure better?

> +(define_insn_and_split "vector_"
>[(set (match_operand:VEC_F 0 "vfloat_operand")
> + (vector_fp_extra_comparison_operator:VEC_F
> +(match_operand:VEC_F 1 "vfloat_operand")
> +(match_operand:VEC_F 2 "vfloat_operand")))]
>"VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
>"#"
> +  "can_create_pseudo_p ()"

This should be "&& can_create_pseudo_p ()".

Also, can_create_pseudo_p in the split condition can fail, in theory
anyway.  It should be part of the insn condition, instead, and even
then it can fail: after the last split pass, but before reload.  Such
an insn can in principle be created, and then it will never be split.

In this case, maybe you should add a scratch register; in the previous
case, you can reuse operands[0] for res instead of making a new reg
for it, if !can_create_pseudo ().
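
A minimal sketch of the reuse idea (where exactly this goes in the expander
body is an assumption):

  /* When no new pseudos can be created anymore, fall back to using the
     destination itself as the temporary.  */
  rtx res = can_create_pseudo_p () ? gen_reg_rtx (<MODE>mode) : operands[0];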

>  {
> +  enum rtx_code cond = ;
> +  rtx res1 = gen_reg_rtx (mode);
> +  rtx res2 = gen_reg_rtx (mode);
> +  bool need_invert = false;
> +  switch (cond)
> +{
> +case UNEQ:
> +  need_invert = true;
> +/* Fall through.  */
> +case LTGT:
> +  cond = GT;
> +  break;
> +case UNORDERED:
> +  need_invert = true;
> +/* Fall through.  */
> +case ORDERED:
> +  cond = GE;
> +  break;
> +default:
> +  gcc_unreachable ();
> +}
> +
> +  rtx comp1 = gen_rtx_fmt_ee (cond, mode, operands[1], operands[2]);
> +  emit_insn (gen_rtx_SET (res1, comp1));
> +  rtx comp2 = gen_rtx_fmt_ee (cond, mode, operands[2], operands[1]);
> +  emit_insn (gen_rtx_SET (res2, comp2));
> +
> +  emit_insn (gen_ior3 (res1, res1, res2));
> +
> +  if (need_invert)

Re: Symver attribute

2019-11-19 Thread Jakub Jelinek
On Tue, Nov 19, 2019 at 10:19:05AM -0600, Segher Boessenkool wrote:
> On Tue, Nov 19, 2019 at 03:53:43PM +0100, Jan Hubicka wrote:
> > > Sounds perfectly fine to me, but how many tests will need changing?  Is
> > > it only those that use symbol versioning directly?
> > 
> > There are no tests presently, I plan to write some so those will get
> > dg-require-symver.
> > 
> > .symver is output only if you use explicit symver function/variable
> > attribute. So the "only" downdisde of this is that instead of getting
> > friendly error message from GCC that your target does not support
> > symvers (because we can not easily check for it) you will get less
> > friendly error message from assembler.  I hope that is acceptale since
> > we have pre-existing situations like that already.
> 
> Ah, I thought this would happen for various things from libc as well, so
> there would be a lot of random fallout.  I probably misunderstood :-)

glibc so far uses inline asm with .symver directives.  That could change one
day of course conditionally to use the GCC symver attribute.

Jakub



Re: Reverting r278411

2019-11-19 Thread Segher Boessenkool
On Tue, Nov 19, 2019 at 04:32:09PM +, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > r278411 causes bootstrap to fail on at least all powerpc*.  Also, I am
> > author of this simplify-rtx code, and I think what you did is all wrong.
> > Also, it should be two patches, the CSE part should be separate.  (I can
> > not tell if that is the part that regresses us, either).
> >
> > Could you please revert the patch?  I'll re-implement the simplify-rtx
> > part of it properly, if you want.  And I can test the CSE part separately.
> > But right now we need trunk to work again.
> 
> OK, done as r278455.

Thanks!


Segher


Re: Reverting r278411

2019-11-19 Thread Richard Sandiford
Segher Boessenkool  writes:
> Hi,
>
> r278411 causes bootstrap to fail on at least all powerpc*.  Also, I am
> author of this simplify-rtx code, and I think what you did is all wrong.
> Also, it should be two patches, the CSE part should be separate.  (I can
> not tell if that is the part that regresses us, either).
>
> Could you please revert the patch?  I'll re-implement the simplify-rtx
> part of it properly, if you want.  And I can test the CSE part separately.
> But right now we need trunk to work again.

OK, done as r278455.

Richard


Re: Symver attribute

2019-11-19 Thread Segher Boessenkool
On Tue, Nov 19, 2019 at 03:53:43PM +0100, Jan Hubicka wrote:
> > Sounds perfectly fine to me, but how many tests will need changing?  Is
> > it only those that use symbol versioning directly?
> 
> There are no tests presently, I plan to write some so those will get
> dg-require-symver.
> 
> .symver is output only if you use explicit symver function/variable
> attribute. So the "only" downdisde of this is that instead of getting
> friendly error message from GCC that your target does not support
> symvers (because we can not easily check for it) you will get less
> friendly error message from assembler.  I hope that is acceptale since
> we have pre-existing situations like that already.

Ah, I thought this would happen for various things from libc as well, so
there would be a lot of random fallout.  I probably misunderstood :-)


Segher


Reverting r278411

2019-11-19 Thread Segher Boessenkool
Hi,

r278411 causes bootstrap to fail on at least all powerpc*.  Also, I am
author of this simplify-rtx code, and I think what you did is all wrong.
Also, it should be two patches, the CSE part should be separate.  (I can
not tell if that is the part that regresses us, either).

Could you please revert the patch?  I'll re-implement the simplify-rtx
part of it properly, if you want.  And I can test the CSE part separately.
But right now we need trunk to work again.

Thanks,


Segher


RFA; patch to fix PR90007

2019-11-19 Thread Vladimir Makarov

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90007

Some time ago, code was added that permits LRA to reload a hard register
into memory, as it already did for pseudos.  But this LRA possibility was
not reflected in recog.c.  The following patch fixes this discrepancy
and as a result fixes PR90007.


OK to commit?

Index: ChangeLog
===
--- ChangeLog	(revision 278451)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2019-11-19  Vladimir Makarov  
+
+	PR rtl-optimization/90007
+	* recog.c (constrain_operands): Permit hard registers too for
+	memory when LRA is used.
+
 2019-11-19  Martin Liska  
 
 	* toplev.c (general_init): Move the call...
Index: recog.c
===
--- recog.c	(revision 278413)
+++ recog.c	(working copy)
@@ -2757,10 +2757,9 @@ constrain_operands (int strict, alternat
 			   /* Before reload, accept what reload can turn
   into a mem.  */
 			   || (strict < 0 && CONSTANT_P (op))
-			   /* Before reload, accept a pseudo,
+			   /* Before reload, accept a pseudo or hard register,
   since LRA can turn it into a mem.  */
-			   || (strict < 0 && targetm.lra_p () && REG_P (op)
-   && REGNO (op) >= FIRST_PSEUDO_REGISTER)
+			   || (strict < 0 && targetm.lra_p () && REG_P (op))
 			   /* During reload, accept a pseudo  */
 			   || (reload_in_progress && REG_P (op)
    && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 278451)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2019-11-19  Vladimir Makarov  
+
+	PR rtl-optimization/90007
+	* gcc.target/i386/pr90007.c: New test.
+
 2019-11-15  Andrew Sutton  
 
 	PR c++/89913
Index: testsuite/gcc.target/i386/pr90007.c
===
--- testsuite/gcc.target/i386/pr90007.c	(nonexistent)
+++ testsuite/gcc.target/i386/pr90007.c	(working copy)
@@ -0,0 +1,15 @@
+/* PR rtl-optimization/90007 */
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-options "-march=bdver1 -mfpmath=387 -O1 -fschedule-insns -fselective-scheduling" } */
+
+void
+qj (int b9, int r9, int k4, int k0, int e7)
+{
+  (void) b9;
+  (void) r9;
+  (void) k4;
+
+  while (!!k0 == e7 * 1.1)
+{
+}
+}


Re: [PATCH][AArch64] PR79262: Adjust vector cost

2019-11-19 Thread Wilco Dijkstra
Hi Richard,

> I acked this here: 
> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01229.html

Thanks - I missed your email, but it's committed now. Yes we will
need to look at the vector costs again and retune them based on
recent vectorizer improvements and latest microarchitectures.

Cheers,
Wilco



Re: [PATCH] Fix dwarf-lineinfo inconsistency of inlined subroutines

2019-11-19 Thread Bernd Edlinger
On 11/19/19 3:01 AM, Alexandre Oliva wrote:
> Hello, Bernd,
> 
> Apologies for taking so long to respond.  I had not noticed your patch
> before being explicitly copied on it.
> 
> IIUC you're describing a problem in GDB, that could be summed up as its
> not paying attention to is_stmt and being unaware of location views, and
> you appear to be proposing to work around that by changing what AFAICT
> is a correct line number program into one that, though incorrect, will
> get more desirable behavior out of current GDB.
> 
> If that is so, I'm inclined to disagree that this is a desirable change.
> We should strive to generate correct, rather than incorrect debug
> information.  It was foreseen and expected that statement frontiers and
> location views would introduce occasional behavior regressions in
> debuggers unable to consume this information; what was not expected was
> that debuggers would lack support for it for so long.
> 
> I'd much rather make debug info incorrect, and instead introduce support
> for the extended debug info in consumers.  I realize it would take a lot
> more work to implement a proper fix there, however.  Unfortunately, I
> don't know of anyone currently working on that counterpart
> implementation, so the best recommendation I can offer to avoid this
> problem is to disable statement frontiers (-gno-statement-frontiers) and
> location views (-gno-variable-location-views).  This will get you line
> number programs and location lists that do not exercise features that
> GDB is not aware of.
> 
> Perhaps we should change our defaults, if the situation with GDB does
> not change :-(
> 

I tried those options, and -gno-variable-location-views changes
nothing in the debug experience, but -gno-statement-frontiers
makes gdb do this:

Breakpoint 1, get_alias_set (t=t@entry=0x77ff6c60) at 
../../gcc-10-20191117/gcc/alias.c:829
829 {
(gdb) n
837   if (t == error_mark_node
(gdb) n
829 {
(gdb) n
838   || (! TYPE_P (t)
(gdb) n
839   && (TREE_TYPE (t) == 0 || TREE_TYPE (t) == error_mark_node)))
(gdb) n
851   STRIP_NOPS (t);
(gdb)


So while the inline function is skipped this time, the function prologue
is executed twice, and the code is probably more jumpy than before.

It is interesting that my previous attempt at fixing it in gdb resulted
in basically the same effect :-/


My question to you is this: can you explain, given the following debug info, 
extracted
by readelf from alias.c, why the address 0x6e11 is in the inlined subroutine
and *not* in the get_alias_set function?  If you can explain it to me, I can 
probably
explain it to gdb as well.


  [0x8b37]  Special opcode 61: advance Address by 4 to 0x6e04 and Line by 0 to 3386
  [0x8b38]  Set is_stmt to 1
  [0x8b39]  Special opcode 189: advance Address by 13 to 0x6e11 and Line by 2 to 3388
  [0x8b3a]  Set is_stmt to 0
  [0x8b3b]  Copy (view 1)
  [0x8b3c]  Set File Name to entry 5 in the File Name Table
  [0x8b3e]  Set column to 8
  [0x8b40]  Advance Line by -2549 to 839

 <2><45b53>: Abbrev Number: 14 (DW_TAG_inlined_subroutine)
<45b54>   DW_AT_abstract_origin: <0x4c5b0>
<45b58>   DW_AT_entry_pc: 0x6e00
<45b60>   DW_AT_GNU_entry_view: 1
<45b62>   DW_AT_ranges  : 0x9d60
<45b66>   DW_AT_call_file   : 5
<45b67>   DW_AT_call_line   : 839
<45b69>   DW_AT_call_column : 8
<45b6a>   DW_AT_sibling : <0x45be4>

9d60 6e00 6e11
9d60 1115 112f
9d60 

So I read this: when pc==0x6e11, the location is line 3388 (in tree.h) and it
is a statement.  But the inlined subroutine "contains_struct_check" extends
from 0x6e00..0x6e11 and from 0x1115..0x112f.  Therefore 0x6e11 is OUTSIDE the
subroutine, and that is what gdb thinks there as well.


Thanks
Bernd.


Re: [PATCH] Use xchg for -Os (PR target/92549)

2019-11-19 Thread Vladimir Makarov

On 11/19/19 4:19 AM, Richard Biener wrote:

On Tue, Nov 19, 2019 at 10:04 AM Jakub Jelinek  wrote:

Hi!

xchg instruction is smaller, in some cases much smaller than 3 moves,
(e.g. in the testcase 2 bytes vs. 8 bytes), and is not a performance
disaster, but from Agner Fog tables and
https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
it doesn't seem to be something we'd want to use when optimizing for speed,
at least not on Intel.

While we have *swap patterns, those are very unlikely to be triggered
during combine, usually we have different pseudos in there and the actual
need for swapping is only materialized during RA.

The following patch does it when optimizing the insn for size only.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I wonder if IRA/LRA should be made aware of an xchg instruction
(and it's cost?) so it knows it doesn't actually need a temporary register?


There is one place where it could be used.  It is 
ira-emit.c::modify_move_list where register shuffling on region border 
happens.  It is possible that pseudo1:hr1, pseudo2:hr2 inside a region 
should be pseudo1:hr2 and pseudo2:hr1 outside the region.


First I used a trick of 3 xor insns to break the cycle w/o creating a new
temp.  But such move cycles were so rare that I just finished with new temp
creation.  It is an extremely rare case; the goal of IRA is to decrease any
move generation on region borders.  But if somebody implements xchg usage
there, I'll not be against this.
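
For illustration only, a sketch of the 3-xor trick in plain C (not the IRA
code itself):

  /* Swap two distinct registers without a temporary; if A and B name the
     same location, both end up zero, so callers must not do that.  */
  static inline void
  xor_swap (unsigned long *a, unsigned long *b)
  {
    *a ^= *b;   /* a holds a ^ b            */
    *b ^= *a;   /* b holds (a ^ b) ^ b = a  */
    *a ^= *b;   /* a holds (a ^ b) ^ a = b  */
  }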




Re: [PATCH][AArch64] PR79262: Adjust vector cost

2019-11-19 Thread Richard Sandiford
Wilco Dijkstra  writes:
> ping

I acked this here: https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01229.html

>
> PR79262 has been fixed for almost all AArch64 cpus, however the example is 
> still
> vectorized in a few cases, resulting in lower performance.  Increase the cost 
> of
> vector-to-scalar moves so it is more similar to the other vector costs. As a 
> result
> -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6
> performance improves.
>
> OK for commit?
>
> ChangeLog:
> 2018-01-22  Wilco Dijkstra  
>
> PR target/79262
> * config/aarch64/aarch64.c (generic_vector_cost): Adjust 
> vec_to_scalar_cost.
> --
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost =
>1, /* vec_int_stmt_cost  */
>1, /* vec_fp_stmt_cost  */
>2, /* vec_permute_cost  */
> -  1, /* vec_to_scalar_cost  */
> +  2, /* vec_to_scalar_cost  */
>1, /* scalar_to_vec_cost  */
>1, /* vec_align_load_cost  */
>1, /* vec_unalign_load_cost  */


Re: [PATCH][AArch64] PR79262: Adjust vector cost

2019-11-19 Thread Wilco Dijkstra


ping

PR79262 has been fixed for almost all AArch64 cpus, however the example is still
vectorized in a few cases, resulting in lower performance.  Increase the cost of
vector-to-scalar moves so it is more similar to the other vector costs. As a 
result
-mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6
performance improves.

OK for commit?

ChangeLog:
2018-01-22  Wilco Dijkstra  

PR target/79262
* config/aarch64/aarch64.c (generic_vector_cost): Adjust 
vec_to_scalar_cost.
--

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost =
   1, /* vec_int_stmt_cost  */
   1, /* vec_fp_stmt_cost  */
   2, /* vec_permute_cost  */
-  1, /* vec_to_scalar_cost  */
+  2, /* vec_to_scalar_cost  */
   1, /* scalar_to_vec_cost  */
   1, /* vec_align_load_cost  */
   1, /* vec_unalign_load_cost  */



Re: [PATCH][Arm] Only enable fsched-pressure with Ofast

2019-11-19 Thread Wilco Dijkstra
ping

The current pressure scheduler doesn't appear to correctly track register
pressure and avoid creating unnecessary spills when register pressure is high.
As a result disabling the early scheduler improves integer performance
considerably and reduces codesize as a bonus. Since scheduling floating point
code is generally beneficial (more registers and higher latencies), only enable
the pressure scheduler with -Ofast.

On Cortex-A57 this gives a 0.7% performance gain on SPECINT2006 as well
as a 0.2% codesize reduction.

Bootstrapped on armhf. OK for commit?

ChangeLog:

2019-11-06  Wilco Dijkstra  

* gcc/common/config/arm-common.c (arm_option_optimization_table):
Enable fsched_pressure with Ofast only.

--
diff --git a/gcc/common/config/arm/arm-common.c 
b/gcc/common/config/arm/arm-common.c
index 
41a920f6dc96833e778faa8dbcc19beac483734c..b761d3abd670a144a593c4b410b1e7fbdcb52f56
 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -38,7 +38,7 @@ static const struct default_options 
arm_option_optimization_table[] =
   {
 /* Enable section anchors by default at -O1 or higher.  */
 { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
-{ OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
+{ OPT_LEVELS_FAST, OPT_fsched_pressure, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 

Re: [PATCH][ARM] Improve max_cond_insns setting for Cortex cores

2019-11-19 Thread Wilco Dijkstra
ping

Various CPUs have max_cond_insns set to 5 due to historical reasons.
Benchmarking shows that max_cond_insns=2 is fastest on modern Cortex-A
cores, so change it to 2 for all Cortex-A cores.  Set max_cond_insns
to 4 on Thumb-2 architectures given it's already limited to that by
MAX_INSN_PER_IT_BLOCK.  Also use the CPU tuning setting when a CPU/tune
is selected if -mrestrict-it is not explicitly set.

On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well
as a 0.4% codesize reduction.

Bootstrapped on armhf. OK for commit?

ChangeLog:

2019-08-19  Wilco Dijkstra  

* gcc/config/arm/arm.c (arm_option_override_internal):
Use max_cond_insns from CPU tuning unless -mrestrict-it is used.
(arm_v6t2_tune): set max_cond_insns to 4.
(arm_cortex_tune): set max_cond_insns to 2.
(arm_cortex_a8_tune): Likewise.
(arm_cortex_a7_tune): Likewise.
(arm_cortex_a35_tune): Likewise.
(arm_cortex_a53_tune): Likewise.
(arm_cortex_a5_tune): Likewise.
(arm_cortex_a9_tune): Likewise.
(arm_v6m_tune): set max_cond_insns to 4.
---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
628cf02f23fb29392a63d87f561c3ee2fb73a515..38ac16ad1def91ca78ccfa98fd1679b2b5114851
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1943,7 +1943,7 @@ const struct tune_params arm_v6t2_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  4,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1968,7 +1968,7 @@ const struct tune_params arm_cortex_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1991,7 +1991,7 @@ const struct tune_params arm_cortex_a8_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2014,7 +2014,7 @@ const struct tune_params arm_cortex_a7_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2060,7 +2060,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2083,7 +2083,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2167,9 +2167,6 @@ const struct tune_params arm_xgene1_tune =
   tune_params::SCHED_AUTOPREF_OFF
 };
 
-/* Branches can be dual-issued on Cortex-A5, so conditional execution is
-   less appealing.  Set max_insns_skipped to a low value.  */
-
 const struct tune_params arm_cortex_a5_tune =
 {
   _extra_costs,
@@ -2178,7 +2175,7 @@ const struct tune_params arm_cortex_a5_tune =
   arm_cortex_a5_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  1,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   

Re: [committed] Punt instead of assertion failure when TYPE_MODE is not vmode (PR tree-optimization/92557)

2019-11-19 Thread Segher Boessenkool
On Tue, Nov 19, 2019 at 12:58:19PM +, Richard Sandiford wrote:
> For the record, I think the assert was valid and I'd have preferred it
> if we'd fixed the targets so that they don't return bogus modes for
> preferred_simd_mode.  After all, it's the targets that decide which
> vector modes are supported and what can go in registers.
> 
> So IMO the assert simply exposed a genuine target bug.  I don't strongly
> object though.

Why is this a bug?  Saying V2DI is the preferred vector mode for DI does
not imply that we can actually put V2DI in registers, does it?  It isn't
documented like that, in any case.

(Of course it is somewhat silly to return a mode that cannot be used, and
it is likely the target can do better.)
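
A purely hypothetical hook implementation to illustrate the point (the mode
choices are made up):

  /* Returning a vector mode here is only a preference; nothing in the
     documentation promises that the mode can live in hard registers.  */
  static machine_mode
  example_preferred_simd_mode (scalar_mode mode)
  {
    if (mode == DImode)
      return V2DImode;   /* Preferred, even if not register-capable.  */
    return word_mode;
  }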


Segher


Re: Symver attribute

2019-11-19 Thread Jan Hubicka
> On Tue, Nov 19, 2019 at 07:29:37AM +0100, Jan Hubicka wrote:
> > Current patch makes GCC to accept symver attribute on all ELF targets
> > and simply output .symver directive into the assembly file hoping for
> > the best.  I hope that is acceptable since user will be informed by
> > assembler that symver is not supported.
> > 
> > For testsuite we however needs tests to not fail when .symver is not
> > accepted by assembler that can probably by tested by the dg-require
> > infrastructure...
> > 
> > It is not perfect solution but also not much worse than what we do
> > elsewhere...
> 
> Sounds perfectly fine to me, but how many tests will need changing?  Is
> it only those that use symbol versioning directly?

There are no tests presently, I plan to write some so those will get
dg-require-symver.

.symver is output only if you use explicit symver function/variable
attribute.  So the "only" downside of this is that instead of getting a
friendly error message from GCC that your target does not support
symvers (because we can not easily check for it) you will get a less
friendly error message from the assembler.  I hope that is acceptable since
we have pre-existing situations like that already.
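
For illustration, what a use of the attribute looks like (the symbol and
version names are placeholders):

  /* Bind this definition to version node VERS_1 of symbol foo; if the
     assembler lacks .symver support, it will reject the directive.  */
  __attribute__ ((symver ("foo@VERS_1")))
  int foo_v1 (void)
  {
    return 1;
  }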

Honza


> 
> 
> Segher


Re: [PATCH] Add one more pass_convert_switch late.

2019-11-19 Thread Richard Biener
On Tue, Nov 19, 2019 at 2:54 PM Martin Liška  wrote:
>
> On 11/19/19 10:25 AM, Richard Biener wrote:
> > On Mon, Nov 18, 2019 at 10:17 AM Martin Liška  wrote:
> >>
> >> On 11/14/19 1:15 PM, Richard Biener wrote:
> >>> Hmm.  I was thinking of moving the pass instead of adding another one.
> >>> What's the reason to run switch-conversion during early optimization 
> >>> again?
> >>
> >> I'm adding CC, as he's the author of cswitch pass and he probably knows
> >> more about the pass placement. But I bet early conversion to array access
> >> rapidly simplifies CFG and other passes can benefit from that.
> >>
> >>>
> >>> But then it's probably not too bad... (and somehow I'd still like to see
> >>> switch-conversion, switch lowering and if-to-switch be "integrated"
> >>> somehow, analyzing the IL and then outputting optimized if/switch
> >>> according to the same cost metric).
> >>
> >> Well, my intuition is that gswitch statement is a canonical representation
> >> of a switch. And so that one wants to have if-to-switch early in the 
> >> pipeline
> >> and switch lowering very late?
> >
> > Just to throw in a tiny thing here - while a switch seems a natural 
> > canonical
> > representation it is actually a poor one for most GCC passes since they're
> > going to give up on switches and it's CFG complexity.  GCCs switch
> > representation also isn't exactly good (remember all that label lookup
> > stuff you have to do).  Switches are a quite high-level representation
> > and to me only "good" if we are actually going to emit a jump-table.
>
> I agree with you here. My simple counter example would be a user input
> where a switch will be explicitly used :)
>
> > Given converting a series of ifs to a switch is "hard" (the opposite
> > is easy) going for a switch early (before CFG transforms mess up
> > things too much) kind-of makes sense, but lowering back non-jump table
> > ones soon is equally important.
>
> What we can do is to share decision login in between iftoswitch pass and
> cswitch/switch lowering. That way we can convert chains of if statements
> that will be interesting for any of the later passes.
>
> One comment here is that 'if-elseif -> gswitch -> lowering to ifs' can
> still produce a faster code as be expand smart to a decision tree.

Sure.  As said in the if-to-switch pass "review" all these (now three)
passes should share an intermediate representation and costing they
then generate code from (either switch-converted lookups, switches
or optimized if-chains).  There's no reason to actually generate a switch
from an if-chain if we just want to optimize the if-chain (it might be
convenient for the code generator though, not sure).  Likewise if the
switch ends up switch-converted there's no reason to produce the
switch first.

But first and foremost sharing costing and transform decision code
which means having some unified representation of the IL semantics
is what is needed for this.

> The question still remains what do we want for GCC 10? A selective
> iftoswitch conversion?

Not sure if we really need it for GCC 10 - was there any pressing reason
other than PRxyz asks for it?

Thanks,
Richard.

> Thanks,
> Martin
>
> >
> > That's of course all unrelated to adding another switch-conversion pass.
> > (and yes, switch-converted IL is more friendly than a switch)
> >
> > Richard.
> >
> >> Martin
> >>
>


Re: [PATCH] Restore init_ggc_heuristics.

2019-11-19 Thread Richard Biener
On Tue, Nov 19, 2019 at 2:18 PM Martin Liška  wrote:
>
> On 11/19/19 1:42 PM, Richard Biener wrote:
> > Well, then call it from the caller of init_options_struct instead,
> > right after it or after the
> > langhook variant is called?
>
> Yes, that's definitely possible, there's a patch that I've just locally 
> tested.
>
> Ready for trunk?

OK.

> Thanks,
> Martin


[committed] Update loop-1.c test for amdgcn

2019-11-19 Thread Andrew Stubbs
I've committed this minor testcase update. The change updates the 
expected result for amdgcn, but shouldn't affect any other target.


The compiler used to calculate the function address once and then call 
it five times. Now it repeats the calculation five times, so the pattern 
doesn't match. The code is still correct for the purpose of the testcase 
either way, however, so I'm removing the over-fussy match.


Andrew
Update loop-1.c test for amdgcn

2019-11-19  Andrew Stubbs  

	gcc/testsuite/
	* gcc.dg/tree-ssa/loop-1.c: Change amdgcn assembler scan.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c
index 4b5a43457b0..39ee4dea883 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c
@@ -45,8 +45,6 @@ int xxx(void)
relaxation.  */
 /* CRIS and MSP430 keep the address in a register.  */
 /* m68k sometimes puts the address in a register, depending on CPU and PIC.  */
-/* AMD GCN loads symbol addresses as hi/lo pairs, and then reuses that for
-   each jump.  */
 
 /* { dg-final { scan-assembler-times "foo" 5 { xfail hppa*-*-* ia64*-*-* sh*-*-* cris-*-* crisv32-*-* fido-*-* m68k-*-* i?86-*-mingw* i?86-*-cygwin* x86_64-*-mingw* visium-*-* nvptx*-*-* pdp11*-*-* msp430-*-* amdgcn*-*-* } } } */
 /* { dg-final { scan-assembler-times "foo,%r" 5 { target hppa*-*-* } } } */
@@ -58,5 +56,4 @@ int xxx(void)
 /* { dg-final { scan-assembler-times "\[jb\]sr" 5 { target fido-*-* m68k-*-* pdp11-*-* } } } */
 /* { dg-final { scan-assembler-times "bra *tr,r\[1-9\]*,r21" 5 { target visium-*-* } } } */
 /* { dg-final { scan-assembler-times "(?n)\[ \t\]call\[ \t\].*\[ \t\]foo," 5 { target nvptx*-*-* } } } */
-/* { dg-final { scan-assembler-times "add_u32\t\[sv\]\[0-9\]*, \[sv\]\[0-9\]*, foo@rel32@lo" 1 { target { amdgcn*-*-* } } } } */
 /* { dg-final { scan-assembler-times "s_swappc_b64" 5 { target { amdgcn*-*-* } } } } */


Re: [PATCH] Add one more pass_convert_switch late.

2019-11-19 Thread Martin Liška

On 11/19/19 11:25 AM, Jakub Jelinek wrote:

The point I'm trying to make is that with if-to-switch, if cswitch doesn't
handle it, we have a regression.  If user writes it as a switch, it is a
missed optimization if we generate worse code than we could, but not a
regression.


Sure, but the question is how common are these conditions that reassoc can
transform using the fancy popcount techniques? On the other hand, if-to-switch
can optimize more switch statements to e.g. jump tables.

As mentioned in the previous email, I can imagine doing gswitch from an if
chain only if cswitch/switch lowering can handle it.

Martin


Re: [PATCH] Add one more pass_convert_switch late.

2019-11-19 Thread Martin Liška

On 11/19/19 10:25 AM, Richard Biener wrote:

On Mon, Nov 18, 2019 at 10:17 AM Martin Liška  wrote:


On 11/14/19 1:15 PM, Richard Biener wrote:

Hmm.  I was thinking of moving the pass instead of adding another one.
What's the reason to run switch-conversion during early optimization again?


I'm adding CC, as he's the author of cswitch pass and he probably knows
more about the pass placement. But I bet early conversion to array access
rapidly simplifies CFG and other passes can benefit from that.



But then it's probably not too bad... (and somehow I'd still like to see
switch-conversion, switch lowering and if-to-switch be "integrated"
somehow, analyzing the IL and then outputting optimized if/switch
according to the same cost metric).


Well, my intuition is that gswitch statement is a canonical representation
of a switch. And so that one wants to have if-to-switch early in the pipeline
and switch lowering very late?


Just to throw in a tiny thing here - while a switch seems a natural canonical
representation it is actually a poor one for most GCC passes since they're
going to give up on switches and it's CFG complexity.  GCCs switch
representation also isn't exactly good (remember all that label lookup
stuff you have to do).  Switches are a quite high-level representation
and to me only "good" if we are actually going to emit a jump-table.


I agree with you here. My simple counter example would be a user input
where a switch will be explicitly used :)


Given converting a series of ifs to a switch is "hard" (the opposite
is easy) going for a switch early (before CFG transforms mess up
things too much) kind-of makes sense, but lowering back non-jump table
ones soon is equally important.


What we can do is to share decision logic between the iftoswitch pass and
cswitch/switch lowering. That way we can convert chains of if statements
that will be interesting for any of the later passes.

One comment here is that 'if-elseif -> gswitch -> lowering to ifs' can
still produce faster code as we expand smartly to a decision tree.

The question still remains what do we want for GCC 10? A selective
iftoswitch conversion?

Thanks,
Martin



That's of course all unrelated to adding another switch-conversion pass.
(and yes, switch-converted IL is more friendly than a switch)

Richard.


Martin





Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-19 Thread Kyrill Tkachov



On 11/19/19 1:41 PM, Dennis Zhang wrote:

Hi Kyrill,

On 19/11/2019 11:21, Kyrill Tkachov wrote:

Hi Dennis,

On 11/12/19 5:32 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 15:57, Kyrill Tkachov wrote:

On 11/12/19 3:50 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 09:40, Kyrill Tkachov wrote:

Hi Dennis,

On 11/7/19 1:48 PM, Dennis Zhang wrote:

Hi Kyrill,

I have rebased the patch on top of current truck.
For resolve_overloaded, I redefined my memtag overloading function to
fit the latest resolve_overloaded_builtin interface.

Regression tested again and survived for aarch64-none-linux-gnu.

Please reply inline rather than top-posting on gcc-patches.



Cheers
Dennis

Changelog is updated as following:

gcc/ChangeLog:

2019-11-07  Dennis Zhang  

   * config/aarch64/aarch64-builtins.c (enum aarch64_builtins):
Add
   AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
   AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
   AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
   AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
   (aarch64_init_memtag_builtins): New.
   (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
   (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
   (aarch64_expand_builtin_memtag): New.
   (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
   (AARCH64_BUILTIN_SUBCODE): New macro.
   (aarch64_resolve_overloaded_memtag): New.
   (aarch64_resolve_overloaded_builtin_general): New hook. Call
   aarch64_resolve_overloaded_memtag to handle overloaded MTE
builtins.
   * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define
   __ARM_FEATURE_MEMORY_TAGGING when enabled.
   (aarch64_resolve_overloaded_builtin): Call
   aarch64_resolve_overloaded_builtin_general.
   * config/aarch64/aarch64-protos.h
   (aarch64_resolve_overloaded_builtin_general): New declaration.
   * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
   (TARGET_MEMTAG): Likewise.
   * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
   UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
   (irg, gmi, subp, addg, ldg, stg): New instructions.
   * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New
macro.
   (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
   (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag):
Likewise.
   * config/aarch64/predicates.md (aarch64_memtag_tag_offset):
New.
   (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
   * config/arm/types.md (memtag): New.
   * doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-11-07  Dennis Zhang  

   * gcc.target/aarch64/acle/memtag_1.c: New test.
   * gcc.target/aarch64/acle/memtag_2.c: New test.
   * gcc.target/aarch64/acle/memtag_3.c: New test.


On 04/11/2019 16:40, Kyrill Tkachov wrote:

Hi Dennis,

On 10/17/19 11:03 AM, Dennis Zhang wrote:

Hi,

Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
It can be used for spatial and temporal memory safety detection and
lightweight lock and key system.

This patch enables new intrinsics leveraging MTE instructions to
implement functionalities of creating tags, setting tags, reading
tags,
and manipulating tags.
The intrinsics are part of Arm ACLE extension:
https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics


The MTE ISA specification can be found at
https://developer.arm.com/docs/ddi0487/latest chapter D6.
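
For illustration, the new intrinsics are used along these lines when the
memtag extension is enabled (prototypes follow the ACLE document above, so
treat the exact signatures as assumptions):

  #include <arm_acle.h>

  void *
  tag_granule (void *p)
  {
    /* Create a random logical tag for p and colour the granule with it.  */
    void *q = __arm_mte_create_random_tag (p, 0);
    __arm_mte_set_tag (q);
    return q;
  }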

Bootstrapped and regtested for aarch64-none-linux-gnu.

Please help to check if it's OK for trunk.


This looks mostly ok to me but for further review this needs to be
rebased on top of current trunk as there are some conflicts with
the SVE
ACLE changes that recently went in. Most conflicts looks trivial to
resolve but one that needs more attention is the definition of the
TARGET_RESOLVE_OVERLOADED_BUILTIN hook.

Thanks,

Kyrill


Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang  

    * config/aarch64/aarch64-builtins.c (enum
aarch64_builtins): Add
    AARCH64_MEMTAG_BUILTIN_START,
AARCH64_MEMTAG_BUILTIN_IRG,
    AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
    AARCH64_MEMTAG_BUILTIN_INC_TAG,
AARCH64_MEMTAG_BUILTIN_SET_TAG,
    AARCH64_MEMTAG_BUILTIN_GET_TAG, and
AARCH64_MEMTAG_BUILTIN_END.
    (aarch64_init_memtag_builtins): New.
    (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
    (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
    (aarch64_expand_builtin_memtag): New.
    (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
    (AARCH64_BUILTIN_SUBCODE): New macro.
    (aarch64_resolve_overloaded_memtag): New.
    (aarch64_resolve_overloaded_builtin): New hook. Call
    

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-19 Thread Dennis Zhang
Hi Kyrill,

On 19/11/2019 11:21, Kyrill Tkachov wrote:
> Hi Dennis,
> 
> On 11/12/19 5:32 PM, Dennis Zhang wrote:
>> Hi Kyrill,
>>
>> On 12/11/2019 15:57, Kyrill Tkachov wrote:
>>> On 11/12/19 3:50 PM, Dennis Zhang wrote:
 Hi Kyrill,

 On 12/11/2019 09:40, Kyrill Tkachov wrote:
> Hi Dennis,
>
> On 11/7/19 1:48 PM, Dennis Zhang wrote:
>> Hi Kyrill,
>>
>> I have rebased the patch on top of current trunk.
>> For resolve_overloaded, I redefined my memtag overloading function to
>> fit the latest resolve_overloaded_builtin interface.
>>
>> Regression tested again and survived for aarch64-none-linux-gnu.
> Please reply inline rather than top-posting on gcc-patches.
>
>
>> Cheers
>> Dennis
>>
>> Changelog is updated as follows:
>>
>> gcc/ChangeLog:
>>
>> 2019-11-07  Dennis Zhang  
>>
>>   * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): 
>> Add
>>   AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
>>   AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
>>   AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
>>   AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
>>   (aarch64_init_memtag_builtins): New.
>>   (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
>>   (aarch64_general_init_builtins): Call
>> aarch64_init_memtag_builtins.
>>   (aarch64_expand_builtin_memtag): New.
>>   (aarch64_general_expand_builtin): Call
>> aarch64_expand_builtin_memtag.
>>   (AARCH64_BUILTIN_SUBCODE): New macro.
>>   (aarch64_resolve_overloaded_memtag): New.
>>   (aarch64_resolve_overloaded_builtin_general): New hook. Call
>>   aarch64_resolve_overloaded_memtag to handle overloaded MTE
>> builtins.
>>   * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): 
>> Define
>>   __ARM_FEATURE_MEMORY_TAGGING when enabled.
>>   (aarch64_resolve_overloaded_builtin): Call
>>   aarch64_resolve_overloaded_builtin_general.
>>   * config/aarch64/aarch64-protos.h
>>   (aarch64_resolve_overloaded_builtin_general): New declaration.
>>   * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
>>   (TARGET_MEMTAG): Likewise.
>>   * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
>>   UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
>>   (irg, gmi, subp, addg, ldg, stg): New instructions.
>>   * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New
>> macro.
>>   (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
>>   (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag):
>> Likewise.
>>   * config/aarch64/predicates.md (aarch64_memtag_tag_offset): 
>> New.
>>   (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
>>   * config/arm/types.md (memtag): New.
>>   * doc/invoke.texi (-memtag): Update description.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-07  Dennis Zhang  
>>
>>   * gcc.target/aarch64/acle/memtag_1.c: New test.
>>   * gcc.target/aarch64/acle/memtag_2.c: New test.
>>   * gcc.target/aarch64/acle/memtag_3.c: New test.
>>
>>
>> On 04/11/2019 16:40, Kyrill Tkachov wrote:
>>> Hi Dennis,
>>>
>>> On 10/17/19 11:03 AM, Dennis Zhang wrote:
 Hi,

 Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
 It can be used for spatial and temporal memory safety detection and
 lightweight lock and key system.

 This patch enables new intrinsics leveraging MTE instructions to
 implement functionalities of creating tags, setting tags, reading
 tags,
 and manipulating tags.
 The intrinsics are part of Arm ACLE extension:
 https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics 


 The MTE ISA specification can be found at
 https://developer.arm.com/docs/ddi0487/latest chapter D6.

 Bootstrapped and regtested for aarch64-none-linux-gnu.

 Please help to check if it's OK for trunk.

>>> This looks mostly ok to me but for further review this needs to be
>>> rebased on top of current trunk as there are some conflicts with
>>> the SVE
>>> ACLE changes that recently went in. Most conflicts looks trivial to
>>> resolve but one that needs more attention is the definition of the
>>> TARGET_RESOLVE_OVERLOADED_BUILTIN hook.
>>>
>>> Thanks,
>>>
>>> Kyrill
>>>
 Many Thanks
 Dennis

 gcc/ChangeLog:

 2019-10-16  Dennis Zhang  

    * config/aarch64/aarch64-builtins.c (enum
 aarch64_builtins): Add
  

Re: [PATCH] Restore init_ggc_heuristics.

2019-11-19 Thread Martin Liška

On 11/19/19 1:42 PM, Richard Biener wrote:

Well, then call it from the caller of init_options_struct instead,
right after it or after the
langhook variant is called?


Yes, that's definitely possible; there's a patch that I've just tested locally.

Ready for trunk?
Thanks,
Martin
>From b868170d585354e3cfedb4b6076ec66d475fa66d Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 19 Nov 2019 13:55:40 +0100
Subject: [PATCH] Restore init_ggc_heuristics.

gcc/ChangeLog:

2019-11-19  Martin Liska  

	* toplev.c (general_init): Move the call...
	(toplev::main): ... here as we need init_options_struct
	being called.
---
 gcc/toplev.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/toplev.c b/gcc/toplev.c
index d4583bac66c..cfc757d84ba 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1240,10 +1240,6 @@ general_init (const char *argv0, bool init_signals)
   /* Initialize register usage now so switches may override.  */
   init_reg_sets ();
 
-  /* This must be done after global_init_params but before argument
- processing.  */
-  init_ggc_heuristics ();
-
   /* Create the singleton holder for global state.  This creates the
  dump manager.  */
   g = new gcc::context ();
@@ -2377,6 +2373,10 @@ toplev::main (int argc, char **argv)
   init_options_struct (&global_options, &global_options_set);
   lang_hooks.init_options_struct (&global_options);
 
+  /* Init GGC heuristics must be called after we initialize
+ options.  */
+  init_ggc_heuristics ();
+
   /* Convert the options to an array.  */
   decode_cmdline_options_to_array_default_mask (argc,
 		CONST_CAST2 (const char **,
-- 
2.24.0



Re: [committed] Punt instead of assertion failure when TYPE_MODE is not vmode (PR tree-optimization/92557)

2019-11-19 Thread Jakub Jelinek
On Tue, Nov 19, 2019 at 12:58:19PM +, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > This patch restores the previous behavior, there could be many reasons why
> > TYPE_MODE is BLKmode or some integral mode instead of a vector mode,
> > unsupported vector mode, lack of available registers etc.
> 
> For the record, I think the assert was valid and I'd have preferred it
> if we'd fixed the targets so that they don't return bogus modes for
> preferred_simd_mode.  After all, it's the targets that decide which
> vector modes are supported and what can go in registers.
> 
> So IMO the assert simply exposed a genuine target bug.  I don't strongly
> object though.

I don't object if it is fixed in the targets, I just don't think
omp_clause_aligned_alignment is the place where such target bugs should be
diagnosed.  If we want to enforce it, we should enforce it in some self-test
(though, that wouldn't cover all possible ISA combinations), or in some
checking function somewhere where it is always triggered.
The vectorizer apparently doesn't enforce that.

Jakub



Re: [committed] Punt instead of assertion failure when TYPE_MODE is not vmode (PR tree-optimization/92557)

2019-11-19 Thread Richard Sandiford
Jakub Jelinek  writes:
> Hi!
>
> This patch restores the previous behavior, there could be many reasons why
> TYPE_MODE is BLKmode or some integral mode instead of a vector mode,
> unsupported vector mode, lack of available registers etc.

For the record, I think the assert was valid and I'd have preferred it
if we'd fixed the targets so that they don't return bogus modes for
preferred_simd_mode.  After all, it's the targets that decide which
vector modes are supported and what can go in registers.

So IMO the assert simply exposed a genuine target bug.  I don't strongly
object though.

Thanks,
Richard

> Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.
>
> 2019-11-19  Jakub Jelinek  
>
>   PR tree-optimization/92557
>   * omp-low.c (omp_clause_aligned_alignment): Punt if TYPE_MODE is not
>   vmode rather than asserting it always is.
>
>   * gcc.dg/gomp/pr92557.c: New test.
>
> --- gcc/omp-low.c.jj  2019-11-15 00:37:26.359069152 +0100
> +++ gcc/omp-low.c 2019-11-18 16:17:27.767714777 +0100
> @@ -4086,8 +4086,8 @@ omp_clause_aligned_alignment (tree claus
>   if (type == NULL_TREE || TYPE_MODE (type) != mode)
> continue;
>   type = build_vector_type_for_mode (type, vmode);
> - /* The functions above are not allowed to return invalid modes.  */
> - gcc_assert (TYPE_MODE (type) == vmode);
> + if (TYPE_MODE (type) != vmode)
> +   continue;
>   if (TYPE_ALIGN_UNIT (type) > al)
> al = TYPE_ALIGN_UNIT (type);
>}
> --- gcc/testsuite/gcc.dg/gomp/pr92557.c.jj2019-11-18 16:18:18.246961685 
> +0100
> +++ gcc/testsuite/gcc.dg/gomp/pr92557.c   2019-11-18 16:19:21.531017555 
> +0100
> @@ -0,0 +1,13 @@
> +/* PR tree-optimization/92557 */
> +/* { dg-do compile } */
> +/* { dg-additional-options "-maltivec" { target powerpc*-*-* } } */
> +
> +void
> +foo (double *p)
> +{
> +  int i;
> +
> +#pragma omp simd aligned (p)
> +  for (i = 0; i < 1; ++i)
> +p[i] = 7.0;
> +}
>
>   Jakub


Re: [PATCH] Restore init_ggc_heuristics.

2019-11-19 Thread Richard Biener
On Tue, Nov 19, 2019 at 12:37 PM Martin Liška  wrote:
>
> On 11/19/19 11:03 AM, Richard Biener wrote:
> > On Mon, Nov 18, 2019 at 1:24 PM Martin Liška  wrote:
> >>
> >> Hello.
> >>
> >> After my param to option transformation, we lost automatic GGC
> >> detection. It's because init_ggc_heuristics is called before
> >> init_options_struct which memsets all the values to zero first.
> >>
> >> I've tested the patch with --enable-checking=release and I hope
> >> Honza can test it more?
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >
> > I prefer to _not_ move all the functions.  Moving the init_ggc_heuristics
> > call is OK.
>
> I would like to, but opts.o is also put into all wrappers:
>
> g++ -no-pie   -g   -DIN_GCC -fno-exceptions -fno-rtti 
> -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
> -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic 
> -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  
> -DHAVE_CONFIG_H -static-libstdc++ -static-libgcc   -o xgcc gcc.o gcc-main.o 
> ggc-none.o \
>c/gccspec.o driver-i386.o  libcommon-target.a \
> libcommon.a ../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a 
> ../libiberty/libiberty.a ../libdecnumber/libdecnumber.a
> /usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: 
> libcommon-target.a(opts.o): in function `init_options_struct(gcc_options*, 
> gcc_options*)':
> /home/marxin/Programming/gcc/gcc/opts.c:292: undefined reference to 
> `init_ggc_heuristics()'
> collect2: error: ld returned 1 exit status
> make: *** [Makefile:2037: xgcc] Error 1
>
> and adding ggc-common.o to OBJS-libcommon-target will not work.
> That's why I also moved the implementation.

Well, then call it from the caller of init_options_struct instead,
right after it or after the
langhook variant is called?

Richard.

>
> Martin
>
> >
> > Thanks,
> > Richard.
> >
> >> Thanks,
> >> Martin
> >>
> >> gcc/ChangeLog:
> >>
> >> 2019-11-18  Martin Liska  
> >>
> >>  * ggc-common.c (ggc_rlimit_bound): Move to opts.c
> >>  (ggc_min_expand_heuristic): Likewise.
> >>  (ggc_min_heapsize_heuristic): Likewise.
> >>  (init_ggc_heuristics): Likewise.
> >>  * ggc.h (init_ggc_heuristics): Remove declaration.
> >>  * opts.c (ggc_rlimit_bound): Moved here from ggc-common.c.
> >>  (ggc_min_expand_heuristic): Likewise.
> >>  (ggc_min_heapsize_heuristic): Likewise.
> >>  (init_ggc_heuristics): Likewise.
> >>  (init_options_struct): Init GGC params.
> >>  * toplev.c (general_init): Remove call to init_ggc_heuristics.
> >> ---
> >>gcc/ggc-common.c | 103 -
> >>gcc/ggc.h|   3 --
> >>gcc/opts.c   | 106 +++
> >>gcc/toplev.c |   4 --
> >>4 files changed, 106 insertions(+), 110 deletions(-)
> >>
> >>
>


Re: [PATCH] Enable init_ggc_heuristics for ENABLE_GC_CHECKING.

2019-11-19 Thread Richard Biener
On Tue, Nov 19, 2019 at 12:36 PM Martin Liška  wrote:
>
> On 11/19/19 10:12 AM, Richard Biener wrote:
> > On Tue, Nov 19, 2019 at 9:12 AM Martin Liška  wrote:
> >>
> >> One potential improvement is to enable the heuristics
> >> for ENABLE_GC_CHECKING. The macro is about sanity checking
> >> and poisoning of GGC memory. Which seems to me completely
> >> independent to setting of the default parameters.
> >
> > Well, doing more collects also is "checking" and the GC checking part
> > this way becomes deterministic (not dependent on host setup) so no,
> > this is by no means an obvious change.  If we think we collect too
> > often we can adjust the GC param static defaults to todays reality instead
> > (mostly --param ggc-min-heapsize which has been 4MB for ages).
>
> Sure, that's probably a more reasonable change we can do. Do you have a
> suggestion what the default could be now?

I'm not sure.  For GC checking (poisoning, etc.) more collecting is
better because
then stuff is poisoned earlier.  The question is really the intent here.

> >
> > Btw, I see the params.opt default is now 4096 unconditionally so
> > does ENABLE_GC_ALWAYS_COLLECT no longer "work"?
> > It used to adjust the params.def default values.
>
> It will work with ENABLE_GC_ALWAYS_COLLECT once the patch
> in previous email will be applied.
>
> Martin
>
> >
> > Richard.
> >
> >> Ready to be installed?
> >> Thanks,
> >> Martin
>


Re: Symver attribute

2019-11-19 Thread Segher Boessenkool
On Tue, Nov 19, 2019 at 07:29:37AM +0100, Jan Hubicka wrote:
> The current patch makes GCC accept the symver attribute on all ELF targets
> and simply output a .symver directive into the assembly file, hoping for
> the best.  I hope that is acceptable since the user will be informed by
> the assembler when symver is not supported.
> 
> For the testsuite we however need tests to not fail when .symver is not
> accepted by the assembler; that can probably be tested by the dg-require
> infrastructure...
> 
> It is not a perfect solution, but also not much worse than what we do
> elsewhere...
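
(For reference, a use of the attribute under discussion would look roughly
like this; the exact spelling follows the patch, which is not quoted here,
so treat it as an assumption:

  __attribute__ ((symver ("foo@VERS_1")))
  int foo_v1 (void)
  {
    return 1;
  }

GCC would then emit a ".symver" directive binding foo_v1 to foo@VERS_1.)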

Sounds perfectly fine to me, but how many tests will need changing?  Is
it only those that use symbol versioning directly?


Segher


[patch, openacc] Adjust tests for amdgcn offloading

2019-11-19 Thread Andrew Stubbs
This patch adds GCN special casing for most of the OpenACC libgomp tests 
that require it. It also disables one testcase that explicitly uses CUDA.


OK to commit?

Andrew
Update OpenACC tests for amdgcn

2019-11-19  Andrew Stubbs  

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Handle gcn.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/function-not-offloaded.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/tile-1.c: Likewise.
	* testsuite/libgomp.oacc-fortran/error_stop-1.f: Likewise.
	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Disable on GCN.

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c
index b356feb8108..e82a03e8f3c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c
@@ -224,6 +224,8 @@ static void cb_compute_construct_end (acc_prof_info *prof_info, acc_event_info *
 
   if (acc_device_type == acc_device_host)
 assert (api_info->device_api == acc_device_api_none);
+  else if (acc_device_type == acc_device_gcn)
+assert (api_info->device_api == acc_device_api_other);
   else
 assert (api_info->device_api == acc_device_api_cuda);
   assert (api_info->valid_bytes == _ACC_API_INFO_VALID_BYTES);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
index 7cfc364e411..ddf647cda9b 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
@@ -106,6 +106,8 @@ static void cb_enqueue_launch_start (acc_prof_info *prof_info, acc_event_info *e
 assert (event_info->launch_event.vector_length >= 1);
   else if (acc_device_type == acc_device_nvidia) /* ... is special.  */
 assert (event_info->launch_event.vector_length == 32);
+  else if (acc_device_type == acc_device_gcn) /* ...and so is this.  */
+assert (event_info->launch_event.vector_length == 64);
   else
 {
 #ifdef __OPTIMIZE__
@@ -118,6 +120,8 @@ static void cb_enqueue_launch_start (acc_prof_info *prof_info, acc_event_info *e
 
   if (acc_device_type == acc_device_host)
 assert (api_info->device_api == acc_device_api_none);
+  else if (acc_device_type == acc_device_gcn)
+assert (api_info->device_api == acc_device_api_other);
   else
 assert (api_info->device_api == acc_device_api_cuda);
   assert (api_info->valid_bytes == _ACC_API_INFO_VALID_BYTES);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c
index ac6eb48cbbe..dc7c7582ce2 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c
@@ -265,6 +265,8 @@ static void cb_enter_data_end (acc_prof_info *prof_info, acc_event_info *event_i
 
   if (acc_device_type == acc_device_host)
 assert (api_info->device_api == acc_device_api_none);
+  else if (acc_device_type == acc_device_gcn)
+assert (api_info->device_api == acc_device_api_other);
   else
 assert (api_info->device_api == acc_device_api_cuda);
   assert (api_info->valid_bytes == _ACC_API_INFO_VALID_BYTES);
@@ -319,6 +321,8 @@ static void cb_exit_data_start (acc_prof_info *prof_info, acc_event_info *event_
 
   if (acc_device_type == acc_device_host)
 assert (api_info->device_api == acc_device_api_none);
+  else if (acc_device_type == acc_device_gcn)
+assert (api_info->device_api == acc_device_api_other);
   else
 assert (api_info->device_api == acc_device_api_cuda);
   assert (api_info->valid_bytes == _ACC_API_INFO_VALID_BYTES);
@@ -371,6 +375,8 @@ static void cb_exit_data_end (acc_prof_info *prof_info, acc_event_info *event_in
 
   if (acc_device_type == acc_device_host)
 assert (api_info->device_api == acc_device_api_none);
+  else if (acc_device_type == acc_device_gcn)
+assert (api_info->device_api == acc_device_api_other);
   else
 assert (api_info->device_api == acc_device_api_cuda);
   assert (api_info->valid_bytes == _ACC_API_INFO_VALID_BYTES);
@@ -510,6 +516,8 @@ static void cb_compute_construct_end (acc_prof_info *prof_info, acc_event_info *
 
   if (acc_device_type == acc_device_host)
 assert (api_info->device_api == acc_device_api_none);
+  else if (acc_device_type == acc_device_gcn)
+assert (api_info->device_api == acc_device_api_other);
   else
 assert (api_info->device_api == acc_device_api_cuda);
   assert (api_info->valid_bytes == 

[PATCH] Fix PR92581

2019-11-19 Thread Richard Biener


We are now vectorizing condition reduction chains but epilogue generation
doesn't consider them.  The following fixes this by simply emitting
a chain of index vector updates rather than trying to be clever
and combining the conditions which is quite target dependent.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-11-19  Richard Biener  

PR tree-optimization/92581
* tree-vect-loop.c (vect_create_epilog_for_reduction): For
condition reduction chains gather all conditions involved
for computing the index reduction vector.

* gcc.dg/vect/vect-cond-reduc-5.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 278431)
+++ gcc/tree-vect-loop.c(working copy)
@@ -4552,18 +4552,26 @@ vect_create_epilog_for_reduction (stmt_v
  zeroes.  */
   if (STMT_VINFO_REDUC_TYPE (reduc_info) == COND_REDUCTION)
 {
+  auto_vec<std::pair<tree, bool>, 2> ccompares;
   stmt_vec_info cond_info = STMT_VINFO_REDUC_DEF (reduc_info);
   cond_info = vect_stmt_to_vectorize (cond_info);
-  while (gimple_assign_rhs_code (cond_info->stmt) != COND_EXPR)
+  while (cond_info != reduc_info)
{
+ if (gimple_assign_rhs_code (cond_info->stmt) == COND_EXPR)
+   {
+ gimple *vec_stmt = STMT_VINFO_VEC_STMT (cond_info)->stmt;
+ gcc_assert (gimple_assign_rhs_code (vec_stmt) == VEC_COND_EXPR);
+ ccompares.safe_push
+   (std::make_pair (unshare_expr (gimple_assign_rhs1 (vec_stmt)),
+STMT_VINFO_REDUC_IDX (cond_info) == 2));
+   }
  cond_info
= loop_vinfo->lookup_def (gimple_op (cond_info->stmt,
 1 + STMT_VINFO_REDUC_IDX
(cond_info)));
  cond_info = vect_stmt_to_vectorize (cond_info);
}
-  gimple *vec_stmt = STMT_VINFO_VEC_STMT (cond_info)->stmt;
-  gcc_assert (gimple_assign_rhs_code (vec_stmt) == VEC_COND_EXPR);
+  gcc_assert (ccompares.length () != 0);
 
   tree indx_before_incr, indx_after_incr;
   poly_uint64 nunits_out = TYPE_VECTOR_SUBPARTS (vectype);
@@ -4605,37 +4613,35 @@ vect_create_epilog_for_reduction (stmt_v
   add_phi_arg (as_a <gphi *> (new_phi), vec_zero,
   loop_preheader_edge (loop), UNKNOWN_LOCATION);
 
-  /* Now take the condition from the loops original cond_expr
-(VEC_STMT) and produce a new cond_expr (INDEX_COND_EXPR) which for
+  /* Now take the condition from the loops original cond_exprs
+and produce a new cond_exprs (INDEX_COND_EXPR) which for
 every match uses values from the induction variable
 (INDEX_BEFORE_INCR) otherwise uses values from the phi node
 (NEW_PHI_TREE).
 Finally, we update the phi (NEW_PHI_TREE) to take the value of
 the new cond_expr (INDEX_COND_EXPR).  */
-
-  /* Duplicate the condition from vec_stmt.  */
-  tree ccompare = unshare_expr (gimple_assign_rhs1 (vec_stmt));
-
-  /* Create a conditional, where the condition is taken from vec_stmt
-(CCOMPARE).  The then and else values mirror the main VEC_COND_EXPR:
-the reduction phi corresponds to NEW_PHI_TREE and the new values
-correspond to INDEX_BEFORE_INCR.  */
-  gcc_assert (STMT_VINFO_REDUC_IDX (cond_info) >= 1);
-  tree index_cond_expr;
-  if (STMT_VINFO_REDUC_IDX (cond_info) == 2)
-   index_cond_expr = build3 (VEC_COND_EXPR, cr_index_vector_type,
- ccompare, indx_before_incr, new_phi_tree);
-  else
-   index_cond_expr = build3 (VEC_COND_EXPR, cr_index_vector_type,
- ccompare, new_phi_tree, indx_before_incr);
-  induction_index = make_ssa_name (cr_index_vector_type);
-  gimple *index_condition = gimple_build_assign (induction_index,
-index_cond_expr);
-  gsi_insert_before (&incr_gsi, index_condition, GSI_SAME_STMT);
-  stmt_vec_info index_vec_info = loop_vinfo->add_stmt (index_condition);
+  gimple_seq stmts = NULL;
+  for (int i = ccompares.length () - 1; i != -1; --i)
+   {
+ tree ccompare = ccompares[i].first;
+ if (ccompares[i].second)
+   new_phi_tree = gimple_build (&stmts, VEC_COND_EXPR,
+cr_index_vector_type,
+ccompare,
+indx_before_incr, new_phi_tree);
+ else
+   new_phi_tree = gimple_build (&stmts, VEC_COND_EXPR,
+cr_index_vector_type,
+ccompare,
+new_phi_tree, indx_before_incr);
+   }
+  gsi_insert_seq_before (&incr_gsi, stmts, GSI_SAME_STMT);
+  stmt_vec_info 

Re: [PATCH] Restore init_ggc_heuristics.

2019-11-19 Thread Martin Liška

On 11/19/19 11:03 AM, Richard Biener wrote:

On Mon, Nov 18, 2019 at 1:24 PM Martin Liška  wrote:


Hello.

After my param to option transformation, we lost automatic GGC
detection. It's because init_ggc_heuristics is called before
init_options_struct which memsets all the values to zero first.

I've tested the patch with --enable-checking=release and I hope
Honza can test it more?

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


I prefer to _not_ move all the functions.  Moving the init_ggc_heuristics
call is OK.


I would like to, but opts.o is also put into all wrappers:

g++ -no-pie   -g   -DIN_GCC -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic 
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  
-DHAVE_CONFIG_H -static-libstdc++ -static-libgcc   -o xgcc gcc.o gcc-main.o 
ggc-none.o \
  c/gccspec.o driver-i386.o  libcommon-target.a \
   libcommon.a ../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a 
../libiberty/libiberty.a ../libdecnumber/libdecnumber.a
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: 
libcommon-target.a(opts.o): in function `init_options_struct(gcc_options*, 
gcc_options*)':
/home/marxin/Programming/gcc/gcc/opts.c:292: undefined reference to 
`init_ggc_heuristics()'
collect2: error: ld returned 1 exit status
make: *** [Makefile:2037: xgcc] Error 1

and adding ggc-common.o to OBJS-libcommon-target will not work.
That's why I also moved the implementation.

Martin



Thanks,
Richard.


Thanks,
Martin

gcc/ChangeLog:

2019-11-18  Martin Liska  

 * ggc-common.c (ggc_rlimit_bound): Move to opts.c
 (ggc_min_expand_heuristic): Likewise.
 (ggc_min_heapsize_heuristic): Likewise.
 (init_ggc_heuristics): Likewise.
 * ggc.h (init_ggc_heuristics): Remove declaration.
 * opts.c (ggc_rlimit_bound): Moved here from ggc-common.c.
 (ggc_min_expand_heuristic): Likewise.
 (ggc_min_heapsize_heuristic): Likewise.
 (init_ggc_heuristics): Likewise.
 (init_options_struct): Init GGC params.
 * toplev.c (general_init): Remove call to init_ggc_heuristics.
---
   gcc/ggc-common.c | 103 -
   gcc/ggc.h|   3 --
   gcc/opts.c   | 106 +++
   gcc/toplev.c |   4 --
   4 files changed, 106 insertions(+), 110 deletions(-)






Re: [PATCH] Enable init_ggc_heuristics for ENABLE_GC_CHECKING.

2019-11-19 Thread Martin Liška

On 11/19/19 10:12 AM, Richard Biener wrote:

On Tue, Nov 19, 2019 at 9:12 AM Martin Liška  wrote:


One potential improvement is to enable the heuristics
for ENABLE_GC_CHECKING. The macro is about sanity checking
and poisoning of GGC memory. Which seems to me completely
independent to setting of the default parameters.


Well, doing more collects also is "checking" and the GC checking part
this way becomes deterministic (not dependent on host setup) so no,
this is by no means an obvious change.  If we think we collect too
often we can adjust the GC param static defaults to todays reality instead
(mostly --param ggc-min-heapsize which has been 4MB for ages).


Sure, that's probably a more reasonable change we can do. Do you have a
suggestion what the default could be now?



Btw, I see the params.opt default is now 4096 unconditionally so
does ENABLE_GC_ALWAYS_COLLECT no longer "work"?
It used to adjust the params.def default values.


It will work with ENABLE_GC_ALWAYS_COLLECT once the patch
in previous email will be applied.

Martin



Richard.


Ready to be installed?
Thanks,
Martin




Re: Add a new combine pass

2019-11-19 Thread Richard Sandiford
Segher Boessenkool  writes:
> On Sun, Nov 17, 2019 at 11:35:26PM +, Richard Sandiford wrote:
>> While working on SVE, I've noticed several cases in which we fail
>> to combine instructions because the combined form would need to be
>> placed earlier in the instruction stream than the last of the
>> instructions being combined.  This includes one very important
>> case in the handling of the first fault register (FFR).
>
> Do you have an example of that?

It's difficult to share realistic examples at this stage since this
isn't really the right forum for making them public for the first time.
But in rtl terms we have:

(set (reg/v:VNx16BI 102 [ ok ])
 (reg:VNx16BI 85 ffrt))
(set (reg:VNx16BI 85 ffrt)
 (unspec:VNx16BI [(reg:VNx16BI 85 ffrt)] UNSPEC_UPDATE_FFRT))
(set (reg:CC_NZC 66 cc)
 (unspec:CC_NZC
   [(reg:VNx16BI 106) repeated x2
(const_int 1 [0x1])
(reg/v:VNx16BI 102 [ ok ])] UNSPEC_PTEST))

and want to combine the first and third instruction at the site of the
first instruction.  Current combine gives:

Trying 18 -> 24:
   18: r102:VNx16BI=ffrt:VNx16BI
   24: cc:CC_NZC=unspec[r106:VNx16BI,r106:VNx16BI,0x1,r102:VNx16BI] 104
Can't combine i2 into i3

because of:

  /* Make sure that the value that is to be substituted for the register
 does not use any registers whose values alter in between.  However,
 If the insns are adjacent, a use can't cross a set even though we
 think it might (this can happen for a sequence of insns each setting
 the same destination; last_set of that register might point to
 a NOTE).  If INSN has a REG_EQUIV note, the register is always
 equivalent to the memory so the substitution is valid even if there
 are intervening stores.  Also, don't move a volatile asm or
 UNSPEC_VOLATILE across any other insns.  */
  || (! all_adjacent
  && (((!MEM_P (src)
|| ! find_reg_note (insn, REG_EQUIV, src))
   && modified_between_p (src, insn, i3))
  || (GET_CODE (src) == ASM_OPERANDS && MEM_VOLATILE_P (src))
  || GET_CODE (src) == UNSPEC_VOLATILE))

>> Combine currently requires the combined instruction to live at the same
>> location as i3.
>
> Or i2 and i3.
>
>> I thought about trying to relax that restriction, but it
>> would be difficult to do with the current pass structure while keeping
>> everything linear-ish time.
>
> s/difficult/impossible/, yes.
>
> A long time ago we had to only move insns forward for correctness even,
> but that should no longer be required, combine always is finite by other
> means now.
>
>> So this patch instead goes for an option that has been talked about
>> several times over the years: writing a new combine pass that just
>> does instruction combination, and not all the other optimisations
>> that have been bolted onto combine over time.  E.g. it deliberately
>> doesn't do things like nonzero-bits tracking, since that really ought
>> to be a separate, more global, optimisation.
>
> In my dreams tracking nonzero bits would be a dataflow problem.
>
>> This is still far from being a realistic replacement for the even
>> the combine parts of the current combine pass.  E.g.:
>> 
>> - it only handles combinations that can be built up from individual
>>   two-instruction combinations.
>
> And combine does any of {2,3,4}->{1,2} combinations, and it also can
> modify a third insn ("other_insn").  For the bigger ->1 combos, if it
> *can* be decomposed in a bunch of 2->1, then those result in insns that
> are greater cost than those we started with (or else those combinations
> *would* be done).  For the ->2 combinations, there are many ways those
> two insns can be formed: it can be the two arms of a parallel, or
> combine can break a non-matching insn into two at what looks like a good
> spot for that, or it can use a define_split for it.
>
> All those things lead to many more successful combinations :-)

Right.  I definitely want to support multi-insn combos too.  It's one of
the TODOs in the head comment, along with the other points in this list.
Like I say, it's not yet a realistic replacement for even the combine parts
of the current pass.

>> On a more positive note, the pass handles things that the current
>> combine pass doesn't:
>> 
>> - the main motivating feature mentioned above: it works out where
>>   the combined instruction could validly live and moves it there
>>   if necessary.  If there are a range of valid places, it tries
>>   to pick the best one based on register pressure (although only
>>   with a simple heuristic for now).
>
> How are dependencies represented in your new pass?  If it just does
> walks over the insn stream for everything, you get quadratic complexity
> if you move insns backwards.  We have that in combine already, mostly
> from modified_between_p, but that is limited because of how LOG_LINKS
> work, and we have been doing this for so long and there are no 

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-19 Thread Kyrill Tkachov

Hi Dennis,

On 11/12/19 5:32 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 15:57, Kyrill Tkachov wrote:

On 11/12/19 3:50 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 09:40, Kyrill Tkachov wrote:

Hi Dennis,

On 11/7/19 1:48 PM, Dennis Zhang wrote:

Hi Kyrill,

I have rebased the patch on top of current trunk.
For resolve_overloaded, I redefined my memtag overloading function to
fit the latest resolve_overloaded_builtin interface.

Regression tested again and survived for aarch64-none-linux-gnu.

Please reply inline rather than top-posting on gcc-patches.



Cheers
Dennis

Changelog is updated as follows:

gcc/ChangeLog:

2019-11-07  Dennis Zhang  

  * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
  AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
  AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
  AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
  AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
  (aarch64_init_memtag_builtins): New.
  (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
  (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
  (aarch64_expand_builtin_memtag): New.
  (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
  (AARCH64_BUILTIN_SUBCODE): New macro.
  (aarch64_resolve_overloaded_memtag): New.
  (aarch64_resolve_overloaded_builtin_general): New hook. Call
  aarch64_resolve_overloaded_memtag to handle overloaded MTE
builtins.
  * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
  __ARM_FEATURE_MEMORY_TAGGING when enabled.
  (aarch64_resolve_overloaded_builtin): Call
  aarch64_resolve_overloaded_builtin_general.
  * config/aarch64/aarch64-protos.h
  (aarch64_resolve_overloaded_builtin_general): New declaration.
  * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
  (TARGET_MEMTAG): Likewise.
  * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
  UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
  (irg, gmi, subp, addg, ldg, stg): New instructions.
  * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New
macro.
  (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
  (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag):
Likewise.
  * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
  (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
  * config/arm/types.md (memtag): New.
  * doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-11-07  Dennis Zhang  

  * gcc.target/aarch64/acle/memtag_1.c: New test.
  * gcc.target/aarch64/acle/memtag_2.c: New test.
  * gcc.target/aarch64/acle/memtag_3.c: New test.


On 04/11/2019 16:40, Kyrill Tkachov wrote:

Hi Dennis,

On 10/17/19 11:03 AM, Dennis Zhang wrote:

Hi,

Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
It can be used for spatial and temporal memory safety detection and
lightweight lock and key system.

This patch enables new intrinsics leveraging MTE instructions to
implement functionalities of creating tags, setting tags, reading
tags,
and manipulating tags.
The intrinsics are part of Arm ACLE extension:
https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics

The MTE ISA specification can be found at
https://developer.arm.com/docs/ddi0487/latest chapter D6.

Bootstrapped and regtested for aarch64-none-linux-gnu.

Please help to check if it's OK for trunk.


This looks mostly ok to me but for further review this needs to be
rebased on top of current trunk as there are some conflicts with
the SVE
ACLE changes that recently went in. Most conflicts looks trivial to
resolve but one that needs more attention is the definition of the
TARGET_RESOLVE_OVERLOADED_BUILTIN hook.

Thanks,

Kyrill


Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang  

   * config/aarch64/aarch64-builtins.c (enum
aarch64_builtins): Add
   AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
   AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
   AARCH64_MEMTAG_BUILTIN_INC_TAG,
AARCH64_MEMTAG_BUILTIN_SET_TAG,
   AARCH64_MEMTAG_BUILTIN_GET_TAG, and
AARCH64_MEMTAG_BUILTIN_END.
   (aarch64_init_memtag_builtins): New.
   (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
   (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
   (aarch64_expand_builtin_memtag): New.
   (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
   (AARCH64_BUILTIN_SUBCODE): New macro.
   (aarch64_resolve_overloaded_memtag): New.
   (aarch64_resolve_overloaded_builtin): New hook. Call
   aarch64_resolve_overloaded_memtag to handle overloaded MTE
builtins.
   * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define
   

Re: [arm, v3] Follow up for asm-flags (thumb1, ilp32)

2019-11-19 Thread Richard Earnshaw (lists)

On 19/11/2019 10:35, Richard Henderson wrote:

I'm not sure what happened to v2.  I can see it in my sent email, but it never
made it to the mailing list, and possibly not to Richard E. either.

So resending, with an extra testsuite fix for ilp32, spotted by Christophe.

Re thumb1, rather than an ifdef in config/arm/aarch-common.c, as I did in v1, I
am swapping out a targetm hook when changing into and out of thumb1 mode.


r~




OK.

R.


Re: [patch, fortran] Load scalar intent-in variables at the beginning of procedures

2019-11-19 Thread Bernhard Reutner-Fischer
On 16 November 2019 21:33:58 CET, Thomas Koenig  wrote:
>Hello world,
>
>here is an update to the patch.

+ char name[GFC_MAX_SYMBOL_LEN + 1];
+ snprintf (name, GFC_MAX_SYMBOL_LEN, "__dummy_%d_%s", var_num++,
+   f->sym->name);
+
+ if (gfc_get_sym_tree (name, ns, , false) != 0)

(I) you should + sizeof(__dummy__) + 10 for unsigned long %d or the like.
(II) s/__dummy/__intent_in/ for clarity?

thanks,


[arm, v3] Follow up for asm-flags (thumb1, ilp32)

2019-11-19 Thread Richard Henderson
I'm not sure what happened to v2.  I can see it in my sent email, but it never
made it to the mailing list, and possibly not to Richard E. either.

So resending, with an extra testsuite fix for ilp32, spotted by Christophe.

Re thumb1, rather than an ifdef in config/arm/aarch-common.c, as I did in v1, I
am swapping out a targetm hook when changing into and out of thumb1 mode.
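
For reference, the construct that __GCC_ASM_FLAG_OUTPUTS__ advertises, and
that the new thumb1 hook now rejects with a sorry(), is a flag-output
constraint along these lines (a hand-written example, not one of the new
tests):

  int nonzero_p (int x)
  {
    int r;
    /* Read the NE condition of the comparison directly as a boolean.  */
    __asm__ ("cmp %1, #0" : "=@ccne" (r) : "r" (x));
    return r;
  }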


r~


gcc/
* config/arm/arm-c.c (arm_cpu_builtins): Use def_or_undef_macro
to define __GCC_ASM_FLAG_OUTPUTS__.
* config/arm/arm.c (thumb1_md_asm_adjust): New function.
(arm_option_params_internal): Swap out targetm.md_asm_adjust
depending on TARGET_THUMB1.
* doc/extend.texi (FlagOutputOperands): Document thumb1 restriction.

gcc/testsuite/
* testsuite/gcc.target/arm/asm-flag-3.c: Skip for thumb1.
* testsuite/gcc.target/arm/asm-flag-5.c: Likewise.
* testsuite/gcc.target/arm/asm-flag-6.c: Likewise.
* testsuite/gcc.target/arm/asm-flag-4.c: New test.

* testsuite/gcc.target/aarch64/asm-flag-6.c: Use %w for
asm inputs to cmp instruction for ILP32.


diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index c4485ce7af1..546b35a5cbd 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -122,7 +122,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   if (arm_arch_notm)
 builtin_define ("__ARM_ARCH_ISA_ARM");
   builtin_define ("__APCS_32__");
-  builtin_define ("__GCC_ASM_FLAG_OUTPUTS__");
+
+  def_or_undef_macro (pfile, "__GCC_ASM_FLAG_OUTPUTS__", !TARGET_THUMB1);
 
   def_or_undef_macro (pfile, "__thumb__", TARGET_THUMB);
   def_or_undef_macro (pfile, "__thumb2__", TARGET_THUMB2);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1fd30c238cd..a6b401b7f2e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -325,6 +325,9 @@ static unsigned int arm_hard_regno_nregs (unsigned int, 
machine_mode);
 static bool arm_hard_regno_mode_ok (unsigned int, machine_mode);
 static bool arm_modes_tieable_p (machine_mode, machine_mode);
 static HOST_WIDE_INT arm_constant_alignment (const_tree, HOST_WIDE_INT);
+static rtx_insn * thumb1_md_asm_adjust (vec<rtx> &, vec<rtx> &,
+   vec<const char *> &, vec<rtx> &,
+   HARD_REG_SET &);
 
 /* Table of machine attributes.  */
 static const struct attribute_spec arm_attribute_table[] =
@@ -2941,6 +2944,11 @@ arm_option_params_internal (void)
   /* For THUMB2, we limit the conditional sequence to one IT block.  */
   if (TARGET_THUMB2)
 max_insns_skipped = MIN (max_insns_skipped, MAX_INSN_PER_IT_BLOCK);
+
+  if (TARGET_THUMB1)
+targetm.md_asm_adjust = thumb1_md_asm_adjust;
+  else
+targetm.md_asm_adjust = arm_md_asm_adjust;
 }
 
 /* True if -mflip-thumb should next add an attribute for the default
@@ -32528,6 +32536,23 @@ arm_run_selftests (void)
 #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
 #endif /* CHECKING_P */
 
+/* Worker function for TARGET_MD_ASM_ADJUST, while in thumb1 mode.
+   Unlike the arm version, we do NOT implement asm flag outputs.  */
+
+rtx_insn *
+thumb1_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
+ vec<const char *> &constraints,
+ vec<rtx> &/*clobbers*/, HARD_REG_SET &/*clobbered_regs*/)
+{
+  for (unsigned i = 0, n = outputs.length (); i < n; ++i)
+if (strncmp (constraints[i], "=@cc", 4) == 0)
+  {
+   sorry ("asm flags not supported in thumb1 mode");
+   break;
+  }
+  return NULL;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-arm.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
index 963b5a48c70..54d7fbf317d 100644
--- a/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
@@ -1,6 +1,12 @@
 /* Executable testcase for 'output flags.'  */
 /* { dg-do run } */
 
+#ifdef __LP64__
+#define W ""
+#else
+#define W "w"
+#endif
+
 int test_bits (long nzcv)
 {
   long n, z, c, v;
@@ -16,7 +22,7 @@ int test_cmps (long x, long y)
 {
   long gt, lt, ge, le;
 
-  __asm__ ("cmp %[x], %[y]"
+  __asm__ ("cmp %"W"[x], %"W"[y]"
   : "=@ccgt"(gt), "=@cclt"(lt), "=@ccge"(ge), "=@ccle"(le)
   : [x] "r"(x), [y] "r"(y));
 
@@ -30,7 +36,7 @@ int test_cmpu (unsigned long x, unsigned long y)
 {
   long gt, lt, ge, le;
 
-  __asm__ ("cmp %[x], %[y]"
+  __asm__ ("cmp %"W"[x], %"W"[y]"
   : "=@cchi"(gt), "=@cclo"(lt), "=@cchs"(ge), "=@ccls"(le)
   : [x] "r"(x), [y] "r"(y));
 
diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-1.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-1.c
index 9707ebfcebb..97104d3ac73 100644
--- a/gcc/testsuite/gcc.target/arm/asm-flag-1.c
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-1.c
@@ -1,6 +1,7 @@
 /* Test the valid @cc asm flag outputs.  */
 /* { dg-do compile } */
 /* { dg-options "-O" } */
+/* { dg-skip-if "" { arm_thumb1 } } */
 
 #ifndef __GCC_ASM_FLAG_OUTPUTS__
 #error "missing preprocessor 

Re: [PATCH] Add one more pass_convert_switch late.

2019-11-19 Thread Jakub Jelinek
On Tue, Nov 19, 2019 at 11:20:16AM +0100, Martin Liška wrote:
> On 11/19/19 10:12 AM, Jakub Jelinek wrote:
> > On Mon, Nov 18, 2019 at 03:34:35PM +0100, Martin Liška wrote:
> > > > Now, I believe with the if to gswitch optimization these will only 
> > > > rarely
> > > > kick in, because the IL will have switches that reassoc doesn't handle,
> > > > instead of series of ifs.
> > > 
> > > Yes, so my question is whether reassoc can handle gswitch statements
> > > similarly to series of if statements? Note that the user can explicitly
> > > write something like:
> > 
> > I don't see how.  Either the cswitch pass runs early, and then you need to
> > take it into account when deciding whether to expand it as series of ifs or
> > table lookup or bittest (say, realize you'd need just a single condition to
> > cover everything and go with if series), or it runs late and then on the
> > other side if reassoc turned gswitch into if series without taking into
> > account other possibilities like table lookup, it could pessimize stuff.
> > So, IMHO cswitch should be able to do this too.
> 
> I agree with you that we should teach switch lowering to also consider the
> transformation that reassoc does. I understand it as a generalized bit test,
> and I would be happy to implement that at the gswitch level. Which leads us
> back to the need for an if-to-switch conversion pass that will enable such
> optimization.

The point I'm trying to make is that with if-to-switch, if cswitch doesn't
handle it, we have a regression.  If user writes it as a switch, it is a
missed optimization if we generate worse code than we could, but not a
regression.

Jakub



Re: [PATCH] Add one more pass_convert_switch late.

2019-11-19 Thread Martin Liška

On 11/19/19 10:12 AM, Jakub Jelinek wrote:

On Mon, Nov 18, 2019 at 03:34:35PM +0100, Martin Liška wrote:

Now, I believe with the if to gswitch optimization these will only rarely
kick in, because the IL will have switches that reassoc doesn't handle,
instead of series of ifs.


Yes, so my question is whether reassoc can handle gswitch statements similarly
to series of if statements? Note that the user can explicitly write something like:


I don't see how.  Either the cswitch pass runs early, and then you need to
take it into account when deciding whether to expand it as series of ifs or
table lookup or bittest (say, realize you'd need just a single condition to
cover everything and go with if series), or it runs late and then on the
other side if reassoc turned gswitch into if series without taking into
account other possibilities like table lookup, it could pessimize stuff.
So, IMHO cswitch should be able to do this too.


I agree with you that we should teach switch lowering to also consider the
transformation that reassoc does. I understand it as a generalized bit test,
and I would be happy to implement that at the gswitch level. Which leads us
back to the need for an if-to-switch conversion pass that will enable such
optimization.

My counter-example is following:

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c 
b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c
index 944362ad076..58f9737b918 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c
@@ -6,10 +6,15 @@
 
 int test (int a, int b, int c)

 {
-  if ( a == 10 || a == 12 || a == 26)
-return b;
-  else
-return c;
+  switch (a)
+{
+case 10:
+case 12:
+case 26:
+  return b;
+default:
+  return c;
+}
 }
 
 int main ()


Note that I see both representations in the source code as equivalent.
It's probably just a matter of taste ;)
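
For reference, the bit-test form that reassoc produces for the if-chain
version looks roughly like this (hand-derived, so the exact guard and mask
are assumptions):

  int test_bittest (int a, int b, int c)
  {
    /* a == 10 || a == 12 || a == 26 as a range check plus a mask test:
       bits 0, 2 and 16 of 0x10005 correspond to 10, 12 and 26.  */
    unsigned int tmp = (unsigned int) a - 10;
    if (tmp <= 16 && ((1u << tmp) & 0x10005u) != 0)
      return b;
    return c;
  }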

Martin





switch (a)
case 0...30:
   return 1;
case 32:
   return 1;
case 34:
   return 1;

which won't be optimized by reassoc. While it can handle something 
0<=A<=30|A==32|A==34.


Jakub





Re: [PATCH] Initialize a variable due to -Wmaybe-uninitialized.

2019-11-19 Thread Andreas Schwab
On Nov 18 2019, Jeff Law wrote:

> On 11/18/19 6:17 AM, Martin Liška wrote:
>> Hi.
>> 
>> The patch is about yet another bootstrap -Wmaybe-uninitialized warning.
>> I've just tested that on risv64 cross compiler with latest trunk.
>> 
>> Ready to be installed?
>> Thanks,
>> Martin
>> 
>> gcc/ChangeLog:
>> 
>> 2019-11-18  Martin Liska  
>> 
>> PR bootstrap/92540
>> * config/riscv/riscv.c (riscv_address_insns): Initialize
>> addr in order to remove boostrap -Wmaybe-uninitialized
>> error.
> OK.  I had this internally, but haven't had time to analyze if the
> warning was a false positive or not.

IMHO it is a false positive.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] Restore init_ggc_heuristics.

2019-11-19 Thread Richard Biener
On Mon, Nov 18, 2019 at 1:24 PM Martin Liška  wrote:
>
> Hello.
>
> After my param to option transformation, we lost automatic GGC
> detection. It's because init_ggc_heuristics is called before
> init_options_struct which memsets all the values to zero first.
>
> I've tested the patch with --enable-checking=release and I hope
> Honza can test it more?
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

I prefer to _not_ move all the functions.  Moving the init_ggc_heuristics
call is OK.

Thanks,
Richard.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2019-11-18  Martin Liska  
>
> * ggc-common.c (ggc_rlimit_bound): Move to opts.c
> (ggc_min_expand_heuristic): Likewise.
> (ggc_min_heapsize_heuristic): Likewise.
> (init_ggc_heuristics): Likewise.
> * ggc.h (init_ggc_heuristics): Remove declaration.
> * opts.c (ggc_rlimit_bound): Moved here from ggc-common.c.
> (ggc_min_expand_heuristic): Likewise.
> (ggc_min_heapsize_heuristic): Likewise.
> (init_ggc_heuristics): Likewise.
> (init_options_struct): Init GGC params.
> * toplev.c (general_init): Remove call to init_ggc_heuristics.
> ---
>   gcc/ggc-common.c | 103 -
>   gcc/ggc.h|   3 --
>   gcc/opts.c   | 106 +++
>   gcc/toplev.c |   4 --
>   4 files changed, 106 insertions(+), 110 deletions(-)
>
>


Re: [PATCH v2 6/6] aarch64: Add testsuite checks for asm-flag

2019-11-19 Thread Christophe Lyon
On Tue, 19 Nov 2019 at 10:23, Richard Henderson
 wrote:
>
> On 11/19/19 9:29 AM, Christophe Lyon wrote:
> > On Mon, 18 Nov 2019 at 20:54, Richard Henderson
> >  wrote:
> >>
> >> On 11/18/19 1:30 PM, Christophe Lyon wrote:
> >>> I'm sorry to notice that the last test (asm-flag-6.c) fails to execute
> >>> when compiling with -mabi=ilp32. I have less details than for Arm,
> >>> because here I'm using the Foundation Model as simulator instead of
> >>> Qemu. In addition, I'm using an old version of it, so maybe it's a
> >>> simulator bug. Does it work on your side?
> >>
> >> I don't know how to test ilp32 with qemu.  Is there a distribution that 
> >> uses
> >> this mode, and one tests in system mode?  We don't have user-only support 
> >> for
> >> ilp32.
> >>
> >
> > Sorry I wasn't clear: I test aarch64-elf with -mabi=ilp32, using newlib.
>
> In the short term, can you please try this testsuite patch?
>
I confirm this patch makes the test pass.

>
> r~


Re: [C++ coroutines 4/6] Middle end expanders and transforms.

2019-11-19 Thread Richard Biener
On Sun, Nov 17, 2019 at 11:27 AM Iain Sandoe  wrote:
>
>
> As described in the covering note, the main part of this is the
> expansion of the library support builtins; these are simple boolean
> or numerical substitutions.
>
> The functionality of implementing an exit from scope without cleanup
> is performed here by lowering an IFN to a gimple goto.
>
> The final part is the expansion of the coroutine IFNs that describe the
> state machine connections to the dispatchers.
>
>In the front end we construct a single actor function that contains
>the coroutine state machine.
>
>The actor function has three entry conditions:
> 1. from the ramp, resume point 0 - to initial-suspend.
> 2. when resume () is executed (resume point N).
> 3. from the destroy () shim when that is executed.
>
>The actor function begins with two dispatchers; one for resume and
>one for destroy (where the initial entry from the ramp is a special-
>case of resume point 0).
>
>Each suspend point and each dispatch entry is marked with an IFN such
>that we can connect the relevant dispatchers to their target labels.
>
>So, if we have:
>
>CO_YIELD (NUM, FINAL, RES_LAB, DEST_LAB, FRAME_PTR)
>
>This is await point NUM, and is the final await if FINAL is non-zero.
>The resume point is RES_LAB, and the destroy point is DEST_LAB.
>
>We expect to find a CO_ACTOR (NUM) in the resume dispatcher and a
>CO_ACTOR (NUM+1) in the destroy dispatcher.
>
>Initially, the intent of keeping the resume and destroy paths together
>is that the conditionals controlling them are identical, and thus there
>would be duplication of any optimisation of those paths if the split
>were earlier.
>
>Subsequent inlining of the actor (and DCE) is then able to extract the
>resume and destroy paths as separate functions if that is found
>profitable by the optimisers.
>
>Once we have remade the connections to their correct positions, we elide
>the labels that the front end inserted.
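
To make the shape concrete, a hand-written C model of the dispatch described
above (for a coroutine with one await point) might look as follows; all names
and the exact numbering are assumptions for illustration only, the real thing
is GIMPLE built by the front end:

  struct frame { int resume_index; };

  void
  actor (struct frame *f, int destroying)
  {
    if (!destroying)
      switch (f->resume_index)        /* resume dispatcher */
        {
        case 0: goto resume_0;        /* initial entry from the ramp */
        case 2: goto resume_1;
        default: return;
        }
    else
      switch (f->resume_index)        /* destroy dispatcher */
        {
        case 1: goto destroy_0;
        case 3: goto destroy_1;
        default: return;
        }

   resume_0:
    /* Run up to await point 0, then arrange to re-enter after it.  */
    f->resume_index = 2;
    return;
   resume_1:
    /* Run from await point 0's resume label to the final suspend.  */
    return;
   destroy_0:
   destroy_1:
    /* Run the cleanups for the corresponding await point.  */
    return;
  }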

Comments inline.

> gcc/ChangeLog:
>
> 2019-11-17  Iain Sandoe  
>
> * Makefile.in: Add coroutine-passes.o.
> * coroutine-passes.cc: New file.
> * passes.def: Add pass_coroutine_lower_builtins,
> pass_coroutine_early_expand_ifns and
> pass_coroutine_finalize_frame.
> * tree-pass.h (make_pass_coroutine_lower_builtins): New.
> (make_pass_coroutine_early_expand_ifns): New.
> (make_pass_coroutine_finalize_frame): New.
> ---
>  gcc/Makefile.in |   1 +
>  gcc/coroutine-passes.cc | 621 
> 
>  gcc/passes.def  |   3 +
>  gcc/tree-pass.h |   3 +
>  4 files changed, 628 insertions(+)
>  create mode 100644 gcc/coroutine-passes.cc
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index ac21401..fc7226a 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1266,6 +1266,7 @@ OBJS = \
> compare-elim.o \
> context.o \
> convert.o \
> +   coroutine-passes.o \
> coverage.o \
> cppbuiltin.o \
> cppdefault.o \
> diff --git a/gcc/coroutine-passes.cc b/gcc/coroutine-passes.cc
> new file mode 100644
> index 000..33e1d38
> --- /dev/null
> +++ b/gcc/coroutine-passes.cc
> @@ -0,0 +1,621 @@
> +/* coroutine expansion and optimisation passes.
> +
> +   Copyright (C) 2018-2019 Free Software Foundation, Inc.
> +
> + Contributed by Iain Sandoe  under contract to Facebook.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "target.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "cgraph.h"
> +#include "pretty-print.h"
> +#include "diagnostic-core.h"
> +#include "fold-const.h"
> +#include "internal-fn.h"
> +#include "langhooks.h"
> +#include "gimplify.h"
> +#include "gimple-iterator.h"
> +#include "gimplify-me.h"
> +#include "gimple-walk.h"
> +#include "gimple-fold.h"
> +#include "tree-cfg.h"
> +#include "tree-into-ssa.h"
> +#include "tree-ssa-propagate.h"
> +#include "gimple-pretty-print.h"
> +#include "cfghooks.h"
> +
> +/* Here we:
> +   * lower the internal function that implements an exit from scope.
> +   * expand the builtins that are used to implement 

Re: [PATCH] Use xchg for -Os (PR target/92549)

2019-11-19 Thread Jan Hubicka
> Hi!
> 
> The xchg instruction is smaller, in some cases much smaller than 3 moves
> (e.g. in the testcase 2 bytes vs. 8 bytes), and is not a performance
> disaster, but judging from Agner Fog's tables and
> https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
> it doesn't seem to be something we'd want to use when optimizing for speed,
> at least not on Intel.
> 
> While we have *swap patterns, those are very unlikely to be triggered
> during combine; usually we have different pseudos in there, and the actual
> need for swapping only materializes during RA.
> 
> The following patch does it when optimizing the insn for size only.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2019-11-19  Jakub Jelinek  
> 
>   PR target/92549
>   * config/i386/i386.md (peephole2 for *swap): New peephole2.
> 
>   * gcc.target/i386/pr92549.c: New test.

Yep, it makes sense to me.
I remember poking around xchg a while ago.  We used to generate it
pre-reload, which had various unexpected consequences, but doing it via
a peephole seems safe.  So the patch is OK.

Honza
> 
> --- gcc/config/i386/i386.md.jj  2019-10-28 22:16:14.583008121 +0100
> +++ gcc/config/i386/i386.md   2019-11-18 17:06:36.050742545 +0100
> @@ -2787,6 +2787,17 @@
> (set_attr "amdfam10_decode" "double")
> (set_attr "bdver1_decode" "double")])
>  
> +(define_peephole2
> +  [(set (match_operand:SWI 0 "register_operand")
> + (match_operand:SWI 1 "register_operand"))
> +   (set (match_dup 1)
> + (match_operand:SWI 2 "register_operand"))
> +   (set (match_dup 2) (match_dup 0))]
> +  "peep2_reg_dead_p (3, operands[0])
> +   && optimize_insn_for_size_p ()"
> +  [(parallel [(set (match_dup 1) (match_dup 2))
> +   (set (match_dup 2) (match_dup 1))])])
> +
>  (define_expand "movstrict"
>[(set (strict_low_part (match_operand:SWI12 0 "register_operand"))
>   (match_operand:SWI12 1 "general_operand"))]
> --- gcc/testsuite/gcc.target/i386/pr92549.c.jj  2019-11-18 17:48:35.533177377 +0100
> +++ gcc/testsuite/gcc.target/i386/pr92549.c   2019-11-18 17:49:31.888336444 +0100
> @@ -0,0 +1,28 @@
> +/* PR target/92549 */
> +/* { dg-do compile } */
> +/* { dg-options "-Os -masm=att" } */
> +/* { dg-final { scan-assembler "xchgl" } } */
> +
> +#ifdef __i386__
> +#define R , regparm (2)
> +#else
> +#define R
> +#endif
> +
> +__attribute__((noipa R)) int
> +bar (int a, int b)
> +{
> +  return b - a + 5;
> +}
> +
> +__attribute__((noipa R)) int
> +foo (int a, int b)
> +{
> +  return 1 + bar (b, a);
> +}
> +
> +int
> +main ()
> +{
> +  return foo (39, 3);
> +}
> 
>   Jakub
> 


Re: [SVE] PR89007 - Implement generic vector average expansion

2019-11-19 Thread Prathamesh Kulkarni
On Mon, 18 Nov 2019 at 16:17, Kyrill Tkachov
 wrote:
>
> Hi Prathamesh,
>
> On 11/14/19 6:47 PM, Prathamesh Kulkarni wrote:
> > Hi,
> > As suggested in PR, the attached patch falls back to distributing
> > rshift over plus_expr instead of fallback widening -> arithmetic ->
> > narrowing sequence, if target support is not available.
> > Bootstrap+tested on x86_64-unknown-linux-gnu and aarch64-linux-gnu.
> > OK to commit ?
> >
> > Thanks,
> > Prathamesh
>
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c
> new file mode 100644
> index 000..b682f3f3b74
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +#define N 1024
> +unsigned char dst[N];
> +unsigned char in1[N];
> +unsigned char in2[N];
> +
> +void
> +foo ()
> +{
> +  for( int x = 0; x < N; x++ )
> +dst[x] = (in1[x] + in2[x] + 1) >> 1;
> +}
> +
> +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
> +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */
>
>
> I think you'll want to make the test a bit stronger so that it tests the
> actual instructions expected here.
> You'll also want to test the IFN_AVG_FLOOR case, as your patch adds support 
> for it too.
Hi Kyrill,
Thanks for the suggestions, I have updated the tests in the attached patch.
Does it look OK?

Thanks,
Prathamesh
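
For readers following the thread, the floor-rounding (IFN_AVG_FLOOR) case
corresponds to the same loop without the +1.  A hedged sketch of such a
companion test body follows; the directive lines, file name and scans in
the attached tests may differ.

#define N 1024
unsigned char dst[N];
unsigned char in1[N];
unsigned char in2[N];

void
foo_floor (void)
{
  /* No +1: this is the floor average, matched as IFN_AVG_FLOOR.  */
  for (int x = 0; x < N; x++)
    dst[x] = (in1[x] + in2[x]) >> 1;
}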
>
> Thanks,
> Kyrill
>
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 8ebbcd76b64..7025a3b4dc2 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -2019,22 +2019,59 @@ vect_recog_average_pattern (stmt_vec_info 
> last_stmt_info, tree *type_out)
>
> /* Check for target support.  */
> tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
> -  if (!new_vectype
> -  || !direct_internal_fn_supported_p (ifn, new_vectype,
> - OPTIMIZE_FOR_SPEED))
> +
> +  if (!new_vectype)
>   return NULL;
>
> +  bool ifn_supported
> += direct_internal_fn_supported_p (ifn, new_vectype, OPTIMIZE_FOR_SPEED);
> +
> /* The IR requires a valid vector type for the cast result, even though
>it's likely to be discarded.  */
> *type_out = get_vectype_for_scalar_type (vinfo, type);
> if (!*type_out)
>   return NULL;
>
> -  /* Generate the IFN_AVG* call.  */
> tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
> tree new_ops[2];
> vect_convert_inputs (last_stmt_info, 2, new_ops, new_type,
>unprom, new_vectype);
> +
> +  if (!ifn_supported)
> +{
> +  /* If there is no target support available, generate code
> +to distribute rshift over plus and add one depending
> +upon floor or ceil rounding.  */
> +
> +  tree one_cst = build_one_cst (new_type);
> +
> +  tree tmp1 = vect_recog_temp_ssa_var (new_type, NULL);
> +  gassign *g1 = gimple_build_assign (tmp1, RSHIFT_EXPR, new_ops[0], 
> one_cst);
> +
> +  tree tmp2 = vect_recog_temp_ssa_var (new_type, NULL);
> +  gassign *g2 = gimple_build_assign (tmp2, RSHIFT_EXPR, new_ops[1], 
> one_cst);
> +
> +  tree tmp3 = vect_recog_temp_ssa_var (new_type, NULL);
> +  gassign *g3 = gimple_build_assign (tmp3, PLUS_EXPR, tmp1, tmp2);
> +
> +  tree tmp4 = vect_recog_temp_ssa_var (new_type, NULL);
> +  tree_code c = (ifn == IFN_AVG_CEIL) ? BIT_IOR_EXPR : BIT_AND_EXPR;
> +  gassign *g4 = gimple_build_assign (tmp4, c, new_ops[0], new_ops[1]);
> +
> +  tree tmp5 = vect_recog_temp_ssa_var (new_type, NULL);
> +  gassign *g5 = gimple_build_assign (tmp5, BIT_AND_EXPR, tmp4, one_cst);
> +
> +  gassign *g6 = gimple_build_assign (new_var, PLUS_EXPR, tmp3, tmp5);
> +
> +  append_pattern_def_seq (last_stmt_info, g1, new_vectype);
> +  append_pattern_def_seq (last_stmt_info, g2, new_vectype);
> +  append_pattern_def_seq (last_stmt_info, g3, new_vectype);
> +  append_pattern_def_seq (last_stmt_info, g4, new_vectype);
> +  append_pattern_def_seq (last_stmt_info, g5, new_vectype);
> +  return vect_convert_output (last_stmt_info, type, g6, new_vectype);
> +}
> +
> +  /* Generate the IFN_AVG* call.  */
> gcall *average_stmt = gimple_build_call_internal (ifn, 2, new_ops[0],
> new_ops[1]);
> gimple_call_set_lhs (average_stmt, new_var);
>
2019-11-19  Prathamesh Kulkarni  

PR tree-optimization/89007
* tree-vect-patterns.c (vect_recog_average_pattern): If there is no
target support available, generate code to distribute rshift over plus
and add one depending upon floor or ceil rounding.

testsuite/
* gcc.target/aarch64/sve/pr89007.c: New test.
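
To spell out the identities the new sequence relies on, here is a scalar
sketch (illustrative only; the pattern emits the equivalent gimple on
vector operands, and the function names below are made up):

unsigned char
avg_ceil (unsigned char a, unsigned char b)
{
  /* (a + b + 1) >> 1 without widening: shift each operand first, then
     add 1 whenever at least one of the discarded low bits was set.  */
  return (a >> 1) + (b >> 1) + ((a | b) & 1);
}

unsigned char
avg_floor (unsigned char a, unsigned char b)
{
  /* (a + b) >> 1: the discarded low bits only carry when both are set,
     hence BIT_AND_EXPR instead of BIT_IOR_EXPR.  */
  return (a >> 1) + (b >> 1) + ((a & b) & 1);
}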

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
new file mode 100644
index 000..32095c63c61
--- /dev/null
+++ 

Re: [PATCH] libstdc++: Define C++20 range utilities and range factories

2019-11-19 Thread Jonathan Wakely

On 19/11/19 08:51 +, Jonathan Wakely wrote:

On 19/11/19 09:38 +0100, Stephan Bergmann wrote:

On 17/11/2019 02:07, Jonathan Wakely wrote:

This adds another chunk of the <ranges> header.

The changes from P1456R1 (Move-only views) and P1862R1 (Range adaptors
for non-copyable iterators) are included, but not the changes from
P1870R1 (forwarding-range<T> is too subtle).

The tests for subrange and iota_view are poor and should be improved.


At least with recent Clang trunk, with -std=c++2a this causes failures like


In file included from 
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/set:61:
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/bits/stl_set.h:1057:48:
 error: default initialization of an object of const type 'const bool'
template inline constexpr bool __enable_view_impl;
 ^
In file included from 
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/set:62:
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/bits/stl_multiset.h:1045:48:
 error: redefinition of '__enable_view_impl'
template inline constexpr bool __enable_view_impl;
 ^
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/10.0.0/../../../../include/c++/10.0.0/bits/stl_set.h:1057:48:
 note: previous definition is here
template inline constexpr bool __enable_view_impl;
 ^


and I'm not sure whether this is legal C++20 code that Clang is just 
not capable of handling?


I'm not sure either. I checked with GCC and it did what I expected
(the first declaration is not a definition, just a declaration). I'll
check if that's valid.

For now the simplest fix would be to just guard all those variable
templates with #if __cpp_lib_concepts since std::ranges::enable_view
only exists when concepts are supported (which is false for Clang).

I'll look into it, thanks for the heads up.


GCC is wrong. I reported https://gcc.gnu.org/PR92576 and I've fixed the
libstdc++ code with the attached patch.
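
For anyone puzzled by the difference, a minimal self-contained sketch of
the two forms (identifiers are made up and are not libstdc++'s):

// Ill-formed: without 'extern' this is a definition of a const variable
// template with no initializer:
//   template<typename T> inline constexpr bool is_wrapped;
//
// Valid: declare the primary template 'extern' and define only the
// partial specializations that matter.
template<typename T> struct wrapper { };

template<typename T> extern inline const bool is_wrapped;

template<typename T>
  inline constexpr bool is_wrapped<wrapper<T>> = true;

static_assert (is_wrapped<wrapper<int>>);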

commit 60c1e565ca97887d1d51490237bc3463875c4e99
Author: Jonathan Wakely 
Date:   Tue Nov 19 09:13:42 2019 +

libstdc++: Fix declarations of variable templates

This code is invalid and rejected by other compilers (see PR 92576).

* include/bits/regex.h (ranges::__detail::__enable_view_impl): Fix
declaration.
* include/bits/stl_multiset.h (ranges::__detail::__enable_view_impl):
Likewise.
* include/bits/stl_set.h (ranges::__detail::__enable_view_impl):
Likewise.
* include/bits/unordered_set.h (ranges::__detail::__enable_view_impl):
Likewise.
* include/debug/multiset.h (ranges::__detail::__enable_view_impl):
Likewise.
* include/debug/set.h (ranges::__detail::__enable_view_impl): Likewise.
* include/debug/unordered_set (ranges::__detail::__enable_view_impl):
Likewise.

diff --git a/libstdc++-v3/include/bits/regex.h b/libstdc++-v3/include/bits/regex.h
index 49994369563..4a19065fb58 100644
--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -2061,7 +2061,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
 #if __cplusplus > 201703L
 namespace ranges::__detail
 {
-  template inline constexpr bool __enable_view_impl;
+  template extern inline const bool __enable_view_impl;
   template
 inline constexpr bool __enable_view_impl>
   = false;
diff --git a/libstdc++-v3/include/bits/stl_multiset.h b/libstdc++-v3/include/bits/stl_multiset.h
index 9e34961f4a5..14207e62205 100644
--- a/libstdc++-v3/include/bits/stl_multiset.h
+++ b/libstdc++-v3/include/bits/stl_multiset.h
@@ -1042,7 +1042,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 #if __cplusplus > 201703L
 namespace ranges::__detail
 {
-  template inline constexpr bool __enable_view_impl;
+  template extern inline const bool __enable_view_impl;
   template
 inline constexpr bool
   __enable_view_impl<_GLIBCXX_STD_C::multiset<_Key, _Compare, _Alloc>>
diff --git a/libstdc++-v3/include/bits/stl_set.h b/libstdc++-v3/include/bits/stl_set.h
index 135d57af496..85f26c24dae 100644
--- a/libstdc++-v3/include/bits/stl_set.h
+++ b/libstdc++-v3/include/bits/stl_set.h
@@ -1054,7 +1054,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 #if __cplusplus > 201703L
 namespace ranges::__detail
 {
-  template inline constexpr bool __enable_view_impl;
+  template extern inline const bool __enable_view_impl;
   template
 inline constexpr bool
   __enable_view_impl<_GLIBCXX_STD_C::set<_Key, _Compare, _Alloc>> = false;
diff --git a/libstdc++-v3/include/bits/unordered_set.h b/libstdc++-v3/include/bits/unordered_set.h
index 98943fb1a47..b138d02bbe8 100644
--- a/libstdc++-v3/include/bits/unordered_set.h
+++ b/libstdc++-v3/include/bits/unordered_set.h
@@ -1775,7 +1775,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 #if 

Re: [PATCH] Add one more pass_convert_switch late.

2019-11-19 Thread Richard Biener
On Mon, Nov 18, 2019 at 10:17 AM Martin Liška  wrote:
>
> On 11/14/19 1:15 PM, Richard Biener wrote:
> > Hmm.  I was thinking of moving the pass instead of adding another one.
> > What's the reason to run switch-conversion during early optimization again?
>
> I'm adding CC, as he's the author of cswitch pass and he probably knows
> more about the pass placement. But I bet early conversion to array access
> rapidly simplifies CFG and other passes can benefit from that.
>
> >
> > But then it's probably not too bad... (and somehow I'd still like to see
> > switch-conversion, switch lowering and if-to-switch be "integrated"
> > somehow, analyzing the IL and then outputting optimized if/switch
> > according to the same cost metric).
>
> Well, my intuition is that the gswitch statement is a canonical representation
> of a switch.  And so one wants to have if-to-switch early in the pipeline
> and switch lowering very late?

Just to throw in a tiny thing here - while a switch seems a natural canonical
representation, it is actually a poor one for most GCC passes, since they're
going to give up on switches and their CFG complexity.  GCC's switch
representation also isn't exactly good (remember all that label lookup
stuff you have to do).  Switches are a quite high-level representation
and to me only "good" if we are actually going to emit a jump table.
Given that converting a series of ifs to a switch is "hard" (the opposite
is easy), going for a switch early (before CFG transforms mess things up
too much) kind of makes sense, but lowering back the non-jump-table
ones soon is equally important.

That's of course all unrelated to adding another switch-conversion pass.
(and yes, switch-converted IL is more friendly than a switch)
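
(As a concrete, hedged illustration of the array-access form mentioned
above - simplified, the real pass also handles ranges, holes and profile
counts:)

/* Before: a dense switch over constants, i.e. multiway control flow.  */
int
f_before (int c)
{
  switch (c)
    {
    case 0: return 7;
    case 1: return 3;
    case 2: return 9;
    case 3: return 1;
    default: return 0;
    }
}

/* After, roughly what switch-conversion produces: a bounds check plus a
   load from a generated constant table - straight-line, CFG-friendly IL.  */
static const int tab[4] = { 7, 3, 9, 1 };

int
f_after (int c)
{
  return (unsigned) c <= 3 ? tab[c] : 0;
}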

Richard.

> Martin
>


Re: [PATCH v2 6/6] aarch64: Add testsuite checks for asm-flag

2019-11-19 Thread Richard Henderson
On 11/19/19 9:29 AM, Christophe Lyon wrote:
> On Mon, 18 Nov 2019 at 20:54, Richard Henderson
>  wrote:
>>
>> On 11/18/19 1:30 PM, Christophe Lyon wrote:
>>> I'm sorry to notice that the last test (asm-flag-6.c) fails to execute
>>> when compiling with -mabi=ilp32. I have less details than for Arm,
>>> because here I'm using the Foundation Model as simulator instead of
>>> Qemu. In addition, I'm using an old version of it, so maybe it's a
>>> simulator bug. Does it work on your side?
>>
>> I don't know how to test ilp32 with qemu.  Is there a distribution that uses
>> this mode that one can test in system mode?  We don't have user-only support
>> for ilp32.
>>
> 
> Sorry I wasn't clear: I test aarch64-elf with -mabi=ilp32, using newlib.

In the short term, can you please try this testsuite patch?


r~
diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
index 963b5a48c70..54d7fbf317d 100644
--- a/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
@@ -1,6 +1,12 @@
 /* Executable testcase for 'output flags.'  */
 /* { dg-do run } */
 
+#ifdef __LP64__
+#define W ""
+#else
+#define W "w"
+#endif
+
 int test_bits (long nzcv)
 {
   long n, z, c, v;
@@ -16,7 +22,7 @@ int test_cmps (long x, long y)
 {
   long gt, lt, ge, le;
 
-  __asm__ ("cmp %[x], %[y]"
+  __asm__ ("cmp %"W"[x], %"W"[y]"
   : "=@ccgt"(gt), "=@cclt"(lt), "=@ccge"(ge), "=@ccle"(le)
   : [x] "r"(x), [y] "r"(y));
 
@@ -30,7 +36,7 @@ int test_cmpu (unsigned long x, unsigned long y)
 {
   long gt, lt, ge, le;
 
-  __asm__ ("cmp %[x], %[y]"
+  __asm__ ("cmp %"W"[x], %"W"[y]"
   : "=@cchi"(gt), "=@cclo"(lt), "=@cchs"(ge), "=@ccls"(le)
   : [x] "r"(x), [y] "r"(y));
 


Re: [PATCH] Use xchg for -Os (PR target/92549)

2019-11-19 Thread Uros Bizjak
On Tue, Nov 19, 2019 at 10:04 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The xchg instruction is smaller, in some cases much smaller than 3 moves
> (e.g. in the testcase 2 bytes vs. 8 bytes), and is not a performance
> disaster, but judging from Agner Fog's tables and
> https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
> it doesn't seem to be something we'd want to use when optimizing for speed,
> at least not on Intel.
>
> While we have *swap patterns, those are very unlikely to be triggered
> during combine; usually we have different pseudos in there, and the actual
> need for swapping only materializes during RA.
>
> The following patch does it when optimizing the insn for size only.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2019-11-19  Jakub Jelinek  
>
> PR target/92549
> * config/i386/i386.md (peephole2 for *swap): New peephole2.
>
> * gcc.target/i386/pr92549.c: New test.

OK with some test adjustments.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj  2019-10-28 22:16:14.583008121 +0100
> +++ gcc/config/i386/i386.md 2019-11-18 17:06:36.050742545 +0100
> @@ -2787,6 +2787,17 @@
> (set_attr "amdfam10_decode" "double")
> (set_attr "bdver1_decode" "double")])
>
> +(define_peephole2
> +  [(set (match_operand:SWI 0 "register_operand")
> +   (match_operand:SWI 1 "register_operand"))
> +   (set (match_dup 1)
> +   (match_operand:SWI 2 "register_operand"))
> +   (set (match_dup 2) (match_dup 0))]
> +  "peep2_reg_dead_p (3, operands[0])
> +   && optimize_insn_for_size_p ()"
> +  [(parallel [(set (match_dup 1) (match_dup 2))
> + (set (match_dup 2) (match_dup 1))])])
> +
>  (define_expand "movstrict"
>[(set (strict_low_part (match_operand:SWI12 0 "register_operand"))
> (match_operand:SWI12 1 "general_operand"))]
> --- gcc/testsuite/gcc.target/i386/pr92549.c.jj  2019-11-18 17:48:35.533177377 +0100
> +++ gcc/testsuite/gcc.target/i386/pr92549.c 2019-11-18 17:49:31.888336444 +0100
> @@ -0,0 +1,28 @@
> +/* PR target/92549 */
> +/* { dg-do compile } */
> +/* { dg-options "-Os -masm=att" } */
> +/* { dg-final { scan-assembler "xchgl" } } */
> +
> +#ifdef __i386__
> +#define R , regparm (2)
> +#else
> +#define R
> +#endif

Please use

/* { dg-additional-options "-mregparm=2" { target ia32 } } */

instead.

> +
> +__attribute__((noipa R)) int
> +bar (int a, int b)
> +{
> +  return b - a + 5;
> +}
> +
> +__attribute__((noipa R)) int
> +foo (int a, int b)
> +{
> +  return 1 + bar (b, a);
> +}
> +
> +int
> +main ()
> +{
> +  return foo (39, 3);
> +}

No need for main in compile-only tests.
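
Putting both adjustments together, the test might end up looking roughly
like this (a sketch only; the committed version may differ):

/* PR target/92549 */
/* { dg-do compile } */
/* { dg-options "-Os -masm=att" } */
/* { dg-additional-options "-mregparm=2" { target ia32 } } */
/* { dg-final { scan-assembler "xchgl" } } */

__attribute__((noipa)) int
bar (int a, int b)
{
  return b - a + 5;
}

__attribute__((noipa)) int
foo (int a, int b)
{
  return 1 + bar (b, a);
}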

> Jakub
>

