date:20211208

[PATCH] x86: Update -mtune=tremont

2021-12-08 Thread Cui,Lili via Gcc-patches

Hi Uros,

This patch is to update mtune for tremont.

Bootstrap is ok, and no regressions for i386/x86-64 testsuite.

OK for master?


Silvermont has a special handle in add_stmt_cost function, because it has in
order SIMD pipeline. But for Tremont, its SIMD pipeline is out of order,
remove Tremont from this special handle.

gcc/ChangeLog

* config/i386/i386.c (ix86_vector_costs::add_stmt_cost): Remove Tremont.
---
 gcc/config/i386/i386.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f1e41fd55f9..9f4ed34ffd5 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -23144,8 +23144,7 @@ ix86_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
  for Silvermont as it has out of order integer pipeline and can execute
  2 scalar instruction per tick, but has in order SIMD pipeline.  */
   if ((TARGET_CPU_P (SILVERMONT) || TARGET_CPU_P (GOLDMONT)
-   || TARGET_CPU_P (GOLDMONT_PLUS) || TARGET_CPU_P (TREMONT)
-   || TARGET_CPU_P (INTEL))
+   || TARGET_CPU_P (GOLDMONT_PLUS) || TARGET_CPU_P (INTEL))
   && stmt_info && stmt_info->stmt)
 {
   tree lhs_op = gimple_get_lhs (stmt_info->stmt);
-- 
2.17.1

Thanks,
Lili.

Re: [RFC] Overflow check in simplifying exit cond comparing two IVs.

2021-12-08 Thread Jiufu Guo via Gcc-patches

Richard Biener  writes:

> On Mon, 18 Oct 2021, Jiufu Guo wrote:
>
>> With reference the discussions in:
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574334.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572006.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578672.html
>> 
>> Base on the patches in above discussion, we may draft a patch to fix the
>> issue.
>> 
>> In this patch, to make sure it is ok to change '{b0,s0} op {b1,s1}' to
>> '{b0,s0-s1} op {b1,0}', we also compute the condition which could assume
>> both 2 ivs are not overflow/wrap: the niter "of '{b0,s0-s1} op {b1,0}'"
>> < the niter "of untill wrap for iv0 or iv1".
>> 
>> Does this patch make sense?
>
> Hum, the patch is mightly complex :/  I'm not sure we can throw
> artficial IVs at number_of_iterations_cond and expect a meaningful
> result.
>
> ISTR the problem is with number_of_iterations_ne[_max], but I would
> have to go and dig in myself again for a full recap of the problem.
> I did plan to do that, but not before stage3 starts.
>
> Thanks,
> Richard.

Hi Richard,

Thanks for your comment!  It is really complex, using artificial IVs and
recursively calling number_of_iterations_cond.  We may use a simpler way.
Not sure if you had started to dig into the problem.  I refined a patch.
Hope this patch is helpful.  This patch enhances the conditions in some
aspects. Attached are two test cases that could be handled.

---
 gcc/tree-ssa-loop-niter.c | 92 +++
 .../gcc.c-torture/execute/pr100740.c  | 11 +++
 gcc/testsuite/gcc.dg/vect/pr102131.c  | 47 ++
 3 files changed, 134 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr100740.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr102131.c

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 06954e437f5..ee1d7293c5c 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1788,6 +1788,70 @@ dump_affine_iv (FILE *file, affine_iv *iv)
 }
 }
 
+/* Generate expr: (HIGH - LOW) / STEP, under UTYPE. */
+
+static tree
+get_step_count (tree high, tree low, tree step, tree utype,
+   bool end_inclusive = false)
+{
+  tree delta = fold_build2 (MINUS_EXPR, TREE_TYPE (low), high, low);
+  delta = fold_convert (utype,delta);
+  if (end_inclusive)
+delta = fold_build2 (PLUS_EXPR, utype, delta, build_one_cst (utype));
+
+  if (tree_int_cst_sign_bit (step))
+step = fold_build1 (NEGATE_EXPR, TREE_TYPE (step), step);
+  step = fold_convert (utype, step);
+
+  return fold_build2 (FLOOR_DIV_EXPR, utype, delta, step);
+}
+
+/*  Get the additional assumption if both two steps are not zero.
+Assumptions satisfy that there is no overflow or wrap during
+v0 and v1 chasing.  */
+
+static tree
+extra_iv_chase_assumption (affine_iv *iv0, affine_iv *iv1, tree step,
+  enum tree_code code)
+{
+  /* No need additional assumptions.  */
+  if (code == NE_EXPR)
+return boolean_true_node;
+
+  /* it not safe to transform {b0, 1} < {b1, 2}.  */
+  if (tree_int_cst_sign_bit (step))
+return boolean_false_node;
+
+  /* No need addition assumption for pointer.  */
+  tree type = TREE_TYPE (iv0->base);
+  if (POINTER_TYPE_P (type))
+return boolean_true_node;
+
+  bool positive0 = !tree_int_cst_sign_bit (iv0->step);
+  bool positive1 = !tree_int_cst_sign_bit (iv1->step);
+  bool positive = !tree_int_cst_sign_bit (step);
+  tree utype = unsigned_type_for (type);
+  bool add1 = code == LE_EXPR;
+  tree niter = positive
+? get_step_count (iv1->base, iv0->base, step, utype, add1)
+: get_step_count (iv0->base, iv1->base, step, utype, add1);
+
+  int prec = TYPE_PRECISION (type);
+  signop sgn = TYPE_SIGN (type);
+  tree max = wide_int_to_tree (type, wi::max_value (prec, sgn));
+  tree min = wide_int_to_tree (type, wi::min_value (prec, sgn));
+  tree valid_niter0, valid_niter1;
+
+  valid_niter0 = positive0 ? get_step_count (max, iv0->base, iv0->step, utype)
+  : get_step_count (iv0->base, min, iv0->step, utype);
+  valid_niter1 = positive1 ? get_step_count (max, iv1->base, iv1->step, utype)
+  : get_step_count (iv1->base, min, iv1->step, utype);
+
+  tree e0 = fold_build2 (LT_EXPR, boolean_type_node, niter, valid_niter0);
+  tree e1 = fold_build2 (LT_EXPR, boolean_type_node, niter, valid_niter1);
+  return fold_build2 (TRUTH_AND_EXPR, boolean_type_node, e0, e1);
+}
+
 /* Determine the number of iterations according to condition (for staying
inside loop) which compares two induction variables using comparison
operator CODE.  The induction variable on left side of the comparison
@@ -1879,30 +1943,26 @@ number_of_iterations_cond (class loop *loop,
{iv0.base, iv0.step - iv1.step} cmp_code {iv1.base, 0}
 
  provided that either below condition is satisfied:
+ a. iv0.step and iv1.step are integer.
+

Re: [PR103302] skip multi-word pre-move clobber during lra

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/8/2021 9:08 PM, Alexandre Oliva wrote:

On Dec  8, 2021, Jeff Law  wrote:


expr.c (emit_move_multi_word): Skip clobber during lra.

OK.

I found a similar pattern of issuing clobbers for multi-word moves, but
not when reload_in_progress, in expr.c:emit_move_complex_parts.  I don't
have a testcase, but I'm tempted to propose '!lra_in_progress &&' for it
as well.  Can you think of any reason not to?
The only reason I can think of is we're in stage3 :-)  It'd be a lot 
easier to green light that if we could trigger an issue.





I also see lots of uses of reload_in_progress in machine-dependent code,
and I suspect many cases involving enabling patterns or checking for
legitimate addresses might benefit from the addition of lra_in_progress,
but that's too many occurrences to try to make sense of :-(

Yea, very likely.

jeff

Re: [PR103302] skip multi-word pre-move clobber during lra

2021-12-08 Thread Alexandre Oliva via Gcc-patches

On Dec  8, 2021, Jeff Law  wrote:

>> expr.c (emit_move_multi_word): Skip clobber during lra.

> OK.

I found a similar pattern of issuing clobbers for multi-word moves, but
not when reload_in_progress, in expr.c:emit_move_complex_parts.  I don't
have a testcase, but I'm tempted to propose '!lra_in_progress &&' for it
as well.  Can you think of any reason not to?

I also see lots of uses of reload_in_progress in machine-dependent code,
and I suspect many cases involving enabling patterns or checking for
legitimate addresses might benefit from the addition of lra_in_progress,
but that's too many occurrences to try to make sense of :-(

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about

Re: [PATCH] [i386]Add combine splitter to transform vashr/vlshr/vashl_optab to ashr/lshr/ashl_optab for const vector duplicate operand.

2021-12-08 Thread Hongtao Liu via Gcc-patches

On Wed, Dec 8, 2021 at 2:47 PM Haochen Jiang via Gcc-patches
 wrote:
>
> Hi,
>
> This patch add combine splitter to transform vashr/vlshr/vashl_optab to 
> ashr/lshr/ashl_optab for const vector duplicate operand.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
Ok.
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
> PR target/101796
> * config/i386/predicates.md (const_vector_operand):
> Add new predicate.
> * config/i386/sse.md(3):
> Add new define_split below.
>
> gcc/testsuite/ChangeLog:
>
> PR target/101796
> * gcc.target/i386/pr101796-1.c: New test.
> ---
>  gcc/config/i386/predicates.md  | 13 +
>  gcc/config/i386/sse.md | 14 ++
>  gcc/testsuite/gcc.target/i386/pr101796-1.c | 20 
>  3 files changed, 47 insertions(+)
>  create mode 100755 gcc/testsuite/gcc.target/i386/pr101796-1.c
>
> diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> index 4ccbe11b842..770e2f0c0dd 100644
> --- a/gcc/config/i386/predicates.md
> +++ b/gcc/config/i386/predicates.md
> @@ -1844,6 +1844,19 @@
>return true;
>  })
>
> +;; Return true if OP is a const vector with duplicate value.
> +(define_predicate "const_vector_duplicate_operand"
> +  (match_code "const_vector")
> +{
> +  rtx elt = XVECEXP (op, 0, 0);
> +  int i, nelt = XVECLEN (op, 0);
> +
> +  for (i = 1; i < nelt; ++i)
> +if (!rtx_equal_p (elt, XVECEXP (op, 0, i)))
> +  return false;
> +  return true;
> +})
> +
>  ;; Return true if OP is a parallel for a vbroadcast permute.
>  (define_predicate "avx_vbroadcast_operand"
>(and (match_code "parallel")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 08bdcddc111..a2c0c1209c7 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -15232,6 +15232,20 @@
> (const_string "0")))
> (set_attr "mode" "")])
>
> +;; PR target/101796: Transfrom movl+vpbranchcastw+vpsravw to vpsraw
> +;; when COUNT is immediate.
> +(define_split
> +  [(set (match_operand:VI248_AVX512BW 0 "register_operand")
> +   (any_shift:VI248_AVX512BW
> + (match_operand:VI248_AVX512BW 1 "nonimmediate_operand")
> + (match_operand:VI248_AVX512BW 2 "const_vector_duplicate_operand")))]
> +  "TARGET_AVX512F && GET_MODE_UNIT_BITSIZE (mode)
> +   > INTVAL (XVECEXP (operands[2], 0, 0))"
> +  [(set (match_dup 0)
> +   (any_shift:VI248_AVX512BW
> + (match_dup 1)
> + (match_dup 3)))]
> +  "operands[3] = XVECEXP (operands[2], 0, 0);")
>
>  (define_expand "vec_shl_"
>[(set (match_dup 3)
> diff --git a/gcc/testsuite/gcc.target/i386/pr101796-1.c 
> b/gcc/testsuite/gcc.target/i386/pr101796-1.c
> new file mode 100755
> index 000..32ae5909913
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101796-1.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512bw" } */
> +/* { dg-final {scan-assembler-times "vpsrlw\[ \\t\]" 1 } } */
> +/* { dg-final {scan-assembler-times "vpsllw\[ \\t\]" 1 } } */
> +/* { dg-final {scan-assembler-times "vpsraw\[ \\t\]" 1 } } */
> +/* { dg-final {scan-assembler-not "vpbroadcastw\[ \\t\]" } } */
> +/* { dg-final {scan-assembler-not "vpsrlvw\[ \\t\]" } } */
> +/* { dg-final {scan-assembler-not "vpsllvw\[ \\t\]" } } */
> +/* { dg-final {scan-assembler-not "vpsravw\[ \\t\]" } } */
> +#include 
> +
> +volatile __m512i a, b;
> +
> +void
> +foo()
> +{
> +  b = _mm512_srlv_epi16 (a, _mm512_set1_epi16 (3));
> +  b = _mm512_sllv_epi16 (a, _mm512_set1_epi16 (4));
> +  b = _mm512_srav_epi16 (a, _mm512_set1_epi16 (5));
> +}
> --
> 2.18.1
>


-- 
BR,
Hongtao

Re: [PR103097] tolerate reg-stack cross-block malformed asms

2021-12-08 Thread Alexandre Oliva via Gcc-patches

On Dec  8, 2021, Jeff Law  wrote:

> On 12/7/2021 7:00 PM, Alexandre Oliva via Gcc-patches wrote:
>> PR target/103097
>> * reg-stack.c (convert_regs_1): Move any_malformed_asm
>> resetting...
>> (reg_to_stack): ... here.

> So it's "stickier" after your change.  ie, instead of indicating if
> there was a malformed insn in a block, it's did we find a malformed
> insn anywhere.

Yeah, anywhere in the same function.

> Which implies the comment before the declaration of any_malformed_asm
> needs a trivial update since it stated "malformed insns in a block".

(-: But, but...  we still find it in a block!  ;-D

> OK with the trivial comment update.

Thanks, I'm making it s/block/function/.

Here's what I'm installing momentarily.


[PR103097] tolerate reg-stack cross-block malformed asms

The testcase shows malformed asms in one block confuse reg-stack logic
in another block.  Moving the resetting of any_malformed_asm to the
end of the pass enables it to take effect throughout the affected
function.


for  gcc/ChangeLog

PR target/103097
* reg-stack.c (convert_regs_1): Move any_malformed_asm
resetting...
(reg_to_stack): ... here.

for  gcc/testsuite/ChangeLog

PR target/103097
* gcc.target/i386/pr103097.c: New.
---
 gcc/reg-stack.c  |5 ++---
 gcc/testsuite/gcc.target/i386/pr103097.c |   30 ++
 2 files changed, 32 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103097.c

diff --git a/gcc/reg-stack.c b/gcc/reg-stack.c
index 1d9ea035cf44f..cc369f0b635a0 100644
--- a/gcc/reg-stack.c
+++ b/gcc/reg-stack.c
@@ -453,7 +453,7 @@ get_true_reg (rtx *pat)
   }
 }

-/* Set if we find any malformed asms in a block.  */
+/* Set if we find any malformed asms in a function.  */
 static bool any_malformed_asm;
 
 /* There are many rules that an asm statement for stack-like regs must
@@ -3014,8 +3014,6 @@ convert_regs_1 (basic_block block)
   bool cfg_altered = false;
   int debug_insns_with_starting_stack = 0;
 
-  any_malformed_asm = false;
-
   /* Choose an initial stack layout, if one hasn't already been chosen.  */
   if (bi->stack_in.top == -2)
 {
@@ -3385,6 +3383,7 @@ reg_to_stack (void)
  0, sizeof (char) * (max_uid + 1));
 
   convert_regs ();
+  any_malformed_asm = false;
 
   free_aux_for_blocks ();
   return true;
diff --git a/gcc/testsuite/gcc.target/i386/pr103097.c 
b/gcc/testsuite/gcc.target/i386/pr103097.c
new file mode 100644
index 0..2b7a307deec9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103097.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fharden-conditional-branches" } */
+
+/* This is a slightly simplified version of
+   gcc.target/s390/vector/long-double-asm-earlyclobber.c.  On x86, the f
+   constraints in asm statements imposes some requirements that the testcase
+   doesn't meet.  What's unusual is that -fharden-conditional-branches extends
+   the effects of the malformed asm onto a different basic blocks, which
+   reg-stack did not expect.  */
+
+#include 
+#include 
+
+void
+f (void)
+{
+  long double res, x = 0;
+  asm("" : "=f"(res) /* { dg-error "must specify a single register" } */
+  : "0"(x));
+  assert (res == x);
+}  
+
+void
+g (void)
+{
+  long double res, x = 0;
+  asm("" : "=g"(res) /* this is ok.  */
+  : "0"(x));
+  assert (res == x);
+}  


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about

Re: [PR103302] skip multi-word pre-move clobber during lra

2021-12-08 Thread Alexandre Oliva via Gcc-patches

On Dec  8, 2021, Jeff Law  wrote:

> On 12/7/2021 10:37 PM, Alexandre Oliva via Gcc-patches wrote:

>> expr.c (emit_move_multi_word): Skip clobber during lra.

> OK.  Nit in the ChangeLog.  You forgot a '*' before the expr.c entry.

Thanks, fixed.  Here's what I'm installing momentarily.


[PR103302] skip multi-word pre-move clobber during lra

If we emit clobbers before multi-word moves during lra, we get
confused if a copy ends up with input or output replaced with each
other: the clobber then kills the previous set, and it gets deleted.

This patch avoids emitting such clobbers when lra_in_progress.


for  gcc/ChangeLog

PR target/103302
* expr.c (emit_move_multi_word): Skip clobber during lra.

for  gcc/testsuite/ChangeLog

PR target/103302
* gcc.target/riscv/pr103302.c: New.
---
 gcc/expr.c|2 +
 gcc/testsuite/gcc.target/riscv/pr103302.c |   47 +
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr103302.c

diff --git a/gcc/expr.c b/gcc/expr.c
index b281525750978..0365625e7b835 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -3929,7 +3929,7 @@ emit_move_multi_word (machine_mode mode, rtx x, rtx y)
  hard regs shouldn't appear here except as return values.
  We never want to emit such a clobber after reload.  */
   if (x != y
-  && ! (reload_in_progress || reload_completed)
+  && ! (lra_in_progress || reload_in_progress || reload_completed)
   && need_clobber != 0)
 emit_clobber (x);
 
diff --git a/gcc/testsuite/gcc.target/riscv/pr103302.c 
b/gcc/testsuite/gcc.target/riscv/pr103302.c
new file mode 100644
index 0..822c408741645
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr103302.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-options "-Og -fharden-compares -fno-tree-dce -fno-tree-fre " } */
+
+typedef unsigned char u8;
+typedef unsigned char __attribute__((__vector_size__ (32))) v256u8;
+typedef unsigned short __attribute__((__vector_size__ (32))) v256u16;
+typedef unsigned short __attribute__((__vector_size__ (64))) v512u16;
+typedef unsigned int u32;
+typedef unsigned int __attribute__((__vector_size__ (4))) v512u32;
+typedef unsigned long long __attribute__((__vector_size__ (32))) v256u64;
+typedef unsigned long long __attribute__((__vector_size__ (64))) v512u64;
+typedef unsigned __int128 __attribute__((__vector_size__ (32))) v256u128;
+typedef unsigned __int128 __attribute__((__vector_size__ (64))) v512u128;
+
+v512u16 g;
+
+void
+foo0 (u8 u8_0, v256u16 v256u16_0, v512u16 v512u16_0, u32 u32_0, v512u32,
+  v256u64 v256u64_0, v512u64 v512u64_0, v256u128 v256u128_0,
+  v512u128 v512u128_0)
+{
+  u32_0 <= (v512u128) (v512u128_0 != u8_0);
+  v512u64 v512u64_1 =
+__builtin_shufflevector (v256u64_0, v512u64_0, 7, 8, 0, 9, 5, 0, 3, 1);
+  g = v512u16_0;
+  (v256u8) v256u16_0 + (v256u8) v256u128_0;
+}
+
+int
+main (void)
+{
+  foo0 (40, (v256u16)
+   {
+   }, (v512u16)
+   {
+   }, 0, (v512u32)
+   {
+   }, (v256u64)
+   {
+   }, (v512u64)
+   {
+   }, (v256u128)
+   {
+   }, (v512u128)
+   {
+   });
+}


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about

Re: [PATCH] pragma: Update target option node when optimization changes [PR103515]

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/6/2021 7:15 PM, Kewen.Lin via Gcc-patches wrote:

Hi,

For a function with optimize pragma, it's possible that the target
options change as optimization options change.  Now we create one
optimization option node when parsing pragma optimize, but don't
create target option node for possible target option changes.  It
makes later processing not detect the target options have actually
changed and doesn't update the target options accordingly.

This patch is to check whether target options have changed when
creating one optimization option node for pragma optimize, and
make one target option node if needed.  The associated test case
shows the difference.  Without this patch, the function foo1 will
perform unrolling which is unexpected.  The reason is that flag
unroll_only_small_loops isn't correctly set for it.  The value
is updated after parsing function foo2, but doesn't get restored
later since both decls don't have DECL_FUNCTION_SPECIFIC_TARGET
set and the hook think we don't need to switch.  With this patch,
there is no unrolling for foo1, which is also consistent with the
behavior by replacing pragma by attribute whether w/ and w/o this
patch.

Bootstrapped and regtested on x86_64-redhat-linux, aarch64-linux-gnu
and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
---
gcc/ChangeLog:

PR target/103515
* attribs.c (decl_attributes): Check if target options change and
create one node if so.

gcc/testsuite/ChangeLog:

PR target/103515
* gcc.target/powerpc/pr103515.c: New test.

OK
jeff

Re: [PATCH v3 1/7] ifcvt: Check if cmovs are needed.

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/6/2021 11:43 AM, Robin Dapp via Gcc-patches wrote:

When if-converting multiple SETs and we encounter a swap-style idiom

   if (a > b)
 {
   tmp = c;   // [1]
   c = d;
   d = tmp;
 }

ifcvt should not generate a conditional move for the instruction at
[1].

In order to achieve that, this patch goes through all relevant SETs
and marks the relevant instructions.  This helps to evaluate costs.

On top, only generate temporaries if the current cmov is going to
overwrite one of the comparands of the initial compare.
---
  gcc/ifcvt.c | 174 
  1 file changed, 150 insertions(+), 24 deletions(-)

OK.  Needs a ChangeLog entry though.

Jeff

Re: [PATCH v3 7/7] ifcvt: Run second pass if it is possible to omit a temporary.

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/6/2021 11:43 AM, Robin Dapp via Gcc-patches wrote:

If one of the to-be-converted SETs requires the original comparison
(i.e. in order to generate a min/max insn) but no other insn after it
does, we can omit creating temporaries, thus facilitating costing.
---
  gcc/ifcvt.c | 33 +++--
  1 file changed, 31 insertions(+), 2 deletions(-)
I'd generally prefer to refactor the bits between the restart label and 
the goto restart into a function and call it twice, passing in the 
additional bits to allow for better costing.  Can you look into that?  
If it's going to be major surgery, then the goto approach will be OK.


Conceptually I don't have any concerns with the patch.  It'll obviously 
need a ChangeLog entry.


Jeff

Re: [PATCH v3 5/7] ifcvt: Try re-using CC for conditional moves.

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/6/2021 11:43 AM, Robin Dapp via Gcc-patches wrote:

Following up on the previous patch, this patch makes
noce_convert_multiple emit two cmov sequences:  The same one as before
and a second one that tries to re-use the existing CC.  Then their costs
are compared and the cheaper one is selected.
---
  gcc/ifcvt.c | 112 +---
  1 file changed, 88 insertions(+), 24 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 3e78e1bb03d..d1e7db1ee27 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -83,7 +83,7 @@ static rtx_insn *last_active_insn (basic_block, int);
  static rtx_insn *find_active_insn_before (basic_block, rtx_insn *);
  static rtx_insn *find_active_insn_after (basic_block, rtx_insn *);
  static basic_block block_fallthru (basic_block);
-static rtx cond_exec_get_condition (rtx_insn *);
+static rtx cond_exec_get_condition (rtx_insn *, bool);
  static rtx noce_get_condition (rtx_insn *, rtx_insn **, bool);
  static int noce_operand_ok (const_rtx);
  static void merge_if_block (ce_if_block *);
@@ -427,7 +427,7 @@ cond_exec_process_insns (ce_if_block *ce_info 
ATTRIBUTE_UNUSED,
  /* Return the condition for a jump.  Do not do any special processing.  */
  
  static rtx

-cond_exec_get_condition (rtx_insn *jump)
+cond_exec_get_condition (rtx_insn *jump, bool get_reversed = false)
  {
rtx test_if, cond;
  
@@ -439,8 +439,9 @@ cond_exec_get_condition (rtx_insn *jump)
  
/* If this branches to JUMP_LABEL when the condition is false,

   reverse the condition.  */
-  if (GET_CODE (XEXP (test_if, 2)) == LABEL_REF
-  && label_ref_label (XEXP (test_if, 2)) == JUMP_LABEL (jump))
+  if (get_reversed || (GET_CODE (XEXP (test_if, 2)) == LABEL_REF
+  && label_ref_label (XEXP (test_if, 2))
+  == JUMP_LABEL (jump)))
Just a formatting nit.   Bring the condition after get_reversed down to 
a new line and reindent appropriately.  ie


  if (get_reversed
   || (blah blah blah
 && more blah == frobit))





  {
enum rtx_code rev = reversed_comparison_code (cond, jump);
if (rev == UNKNOWN)
@@ -3146,6 +3147,46 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx 
cond,
return false;
  }
  
+/* Helper function to emit a cmov sequence.  */

+
+static rtx_insn*
+try_emit_cmove_seq (struct noce_if_info *if_info, rtx temp,
+   rtx cond, rtx new_val, rtx old_val, bool need_cmov,
+   unsigned *cost, rtx *temp_dest,
+   rtx cc_cmp = NULL, rtx rev_cc_cmp = NULL)
Could you expand the function comment a bit here.  Saying something is a 
helper doesn't really help the reader understand what the purpose of the 
function really is.


With the formatting nit fixed, a better function comment and a suitable 
ChangeLog, this is fine.


jeff

Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries

2021-12-08 Thread Segher Boessenkool

Hi!

On Wed, Dec 08, 2021 at 07:06:30PM -0500, David Malcolm wrote:
> On Mon, 2021-12-06 at 13:40 -0600, Segher Boessenkool wrote:
> > Named address spaces are completely target-specific.  Defining them
> > with
> > a pragma like this does not allow you to set the pointer mode or
> > anything related to a custom LEGITIMATE_ADDRESS_P.
> 
> My thinking was that each custom address space is based on an existing
> address space, but is disjoint from it, where "based on" means "what it
> looks like in terms of RTL generation" (clearly I'm handwaving here).  
> 
> In patch 1a, the custom address spaces are all based on the generic
> address space (but disjoint from it); syntax could be added to base
> them on one of the target-specific address spaces.
> 
> >   It does not allow
> > you to sayy zero pointers are invalid in some address spaces and not in
> > others.
> 
> Syntax could be added for this, I suppose.
> 
> > You cannot provide any of the DWARF address space stuff this
> > way.
> 
> True.  I confess that I haven't thought about the debugging experience,
> and I'd need to think what would happen at the DWARF level.
> 
> > But most importantly, there are only four bits for the address
> > space field internally, and they are used by however a backend wants to
> > use them.
> 
> One of the ideas of patch 1a is to divide up this 4-bit space between
> the target-specific and the custom address spaces.  The backend code
> would need to be tweaked to decode the 4-bit value to get at the
> underlying target-specific address space value.  This is done by the
> function ensure_builtin_addr_space in patch 1a, though I've likely
> missed some places.
> 
> IIRC, the target that's currently using the most address spaces is avr,
> which I believe has 8 target-specific address spaces, in addition to
> the generic one, i.e. 9 builtin address spaces, which would leave room
> for up to 6 user-defined address spaces.

Except that a backend is free to use this bitfield any way it pleases.

All of the above says that what you want is something completely
orthogonal to and separate from named address spaces.  Very similar in
some ways, sure, but keeping it apart will work much better and be much
less pain.

> The Linux kernel's smatch
> annotations currently effectively introduce 4 custom address spaces,
> __user, __iomem, __percpu, and __rcu (assuming that __kernel is the
> generic address space), so it's something of a tight squeeze, but it
> does fit.  This doesn't account for out-of-tree backends, of course.

Or any future backends.

> > IMO it will be best to not mix this with address spaces in the user
> > interface (it is of course fine to *implement* it like that, or with
> > big overlap at least).
> 
> I was thinking the other way around, in that it should look like
> address spaces in terms of the user's source code, but has some
> implementation differences.

That does not solve any of the problems I brought up though.  That was
just a list of all the basic features from address spaces btw, from
gccint.

> > Allowing the user to define new address spaces does not jibe well with
> > how targets do (validly!) use them.
> 
> I think from a user's perspective it's a nice approach - my feeling is
> that it makes certain things easier for the user, whilst complicating
> things from a backend implementation perspective.
> 
> Plus you've raised various technical issues which I'd have to resolve
> if we went in this direction.

It is fine to have a (very) similar concept for the user, but it does
not work well at all to equate this to the existing concept of named
address spaces.

> Indeed - I think this "untrusted attribute" approach is much simpler
> implementation-wise than the "custom address space" approach, which is
> also in its favor.
> 
> I'm wondering if anyone from the kernel development community has
> strong opinions here, since the custom address space approach is
> potentially much more expressive.

Anything that is more expressive than you have thought through what the
consequences will be is not a feature but a danger.  Anything that does
not fit in with the rest structurally now, will never do that.

> Otherwise I think we're both preferring the "untrusted attribute"
> approach (patch 1b).

That attribute does not interfere with anything else afaics, so that is
much safer.

> > > I thing being able to express something along these lines would
> > > be useful even outside the analyzer, both for warnings and, when
> > > done right, perhaps also for optimization.  So I'm in favor of
> > > something like this.  I'll just reiterate here the comment on
> > > this attribute I sent you privately some time ago.
> > 
> > What is "success" though?  You probably want it so some checker can
> > make
> > sure you do handle failure some way, but how do you see what is
> > handling
> > failure and what is handling the successful case?
> 
> "success" and "failure" in this case are purely in terms of how we
> label events f

Re: [PATCH v3 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/6/2021 11:43 AM, Robin Dapp via Gcc-patches wrote:

Currently we only ever call emit_conditional_move with the comparison
(as well as its comparands) we got from the jump.  Thus, backends are
going to emit a CC comparison for every conditional move that is being
generated instead of re-using the existing CC.
This, combined with emitting temporaries for each conditional move,
causes sky-high costs for conditional moves.

This patch allows to re-use a CC so the costing situation is improved a
bit.
---
  gcc/expmed.c |   8 +--
  gcc/expr.c   |  10 ++--
  gcc/ifcvt.c  |  45 ++---
  gcc/optabs.c | 135 ++-
  gcc/optabs.h |   4 +-
  gcc/rtl.h|  11 -
  6 files changed, 150 insertions(+), 63 deletions(-)





@@ -4948,17 +4947,85 @@ emit_conditional_move (rtx target, enum rtx_code code, 
rtx op0, rtx op1,

+/* Helper for emitting a conditional move.  */
+
+static rtx
+emit_conditional_move_1 (rtx target, rtx comparison,
+rtx op2, rtx op3, machine_mode mode)
+{
+  enum insn_code icode;
+
+  if (comparison == NULL_RTX || !COMPARISON_P (comparison))
+return NULL_RTX;
+
+  /* If the two source operands are identical, that's just a move.  */
+  if (rtx_equal_p (op2, op3))
+{
+  if (!target)
+   target = gen_reg_rtx (mode);
+
+  emit_move_insn (target, op3);
+  return target;
+}
What if the condition has a side effect?  Doesn't this drop the side 
effect by converting the conditional move into a simple move?


That's my only technical concern.  Obviously the patch needs a ChangeLog.

Jeff

Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries

2021-12-08 Thread David Malcolm via Gcc-patches

On Mon, 2021-12-06 at 13:40 -0600, Segher Boessenkool wrote:
> On Mon, Dec 06, 2021 at 11:12:00AM -0700, Martin Sebor wrote:
> > On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> > > Approach 1: Custom Address Spaces
> > > =
> > > 
> > > GCC's C frontend supports target-specific address spaces; see:
> > >   https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
> > > Quoting the N1275 draft of ISO/IEC DTR 18037:
> > >   "Address space names are ordinary identifiers, sharing the same
> > > name
> > >   space as variables and typedef names.  Any such names follow the
> > > same
> > >   rules for scope as other ordinary identifiers (such as typedef
> > > names).
> > >   An implementation may provide an implementation-defined set of
> > >   intrinsic address spaces that are, in effect, predefined at the
> > > start
> > >   of every translation unit.  The names of intrinsic address spaces
> > > must
> > >   be reserved identifiers (beginning with an underscore and an
> > > uppercase
> > >   letter or with two underscores).  An implementation may also
> > >   optionally support a means for new address space names to be
> > > defined
> > >   within a translation unit."
> > > 
> > > Patch 1a in the following patch kit for GCC implements such a means
> > > to
> > > define new address spaces names in a translation unit, via a
> > > pragma:
> > >   #prgama GCC custom_address_space(NAME_OF_ADDRESS_SPACE)
> > > 
> > > For example, the Linux kernel could perhaps write:
> > > 
> > >   #define __kernel
> > >   #pragma GCC custom_address_space(__user)
> > >   #pragma GCC custom_address_space(__iomem)
> > >   #pragma GCC custom_address_space(__percpu)
> > >   #pragma GCC custom_address_space(__rcu)
> > > 
> > > and thus the C frontend can complain about code that mismatches
> > > __user
> > > and kernel pointers, e.g.:
> > > 
> > > custom-address-space-1.c: In function ‘test_argpass_to_p’:
> > > custom-address-space-1.c:29:14: error: passing argument 1 of 
> > > ‘accepts_p’
> > > from pointer to non-enclosed address space
> > >    29 |   accepts_p (p_user);
> > >   |  ^~
> > > custom-address-space-1.c:21:24: note: expected ‘void *’ but
> > > argument is
> > > of type ‘__user void *’
> > >    21 | extern void accepts_p (void *);
> > >   |    ^~
> > > custom-address-space-1.c: In function ‘test_cast_k_to_u’:
> > > custom-address-space-1.c:135:12: warning: cast to ‘__user’ address 
> > > space
> > > pointer from disjoint generic address space pointer
> > >   135 |   p_user = (void __user *)p_kernel;
> > >   |    ^
> > 
> > This seems like an excellent use of named address spaces :)
> 
> It has some big problems though.

Thanks for raising these points.

> 
> Named address spaces are completely target-specific.  Defining them
> with
> a pragma like this does not allow you to set the pointer mode or
> anything related to a custom LEGITIMATE_ADDRESS_P.

My thinking was that each custom address space is based on an existing
address space, but is disjoint from it, where "based on" means "what it
looks like in terms of RTL generation" (clearly I'm handwaving here).  

In patch 1a, the custom address spaces are all based on the generic
address space (but disjoint from it); syntax could be added to base
them on one of the target-specific address spaces.

>   It does not allow
> you to sayy zero pointers are invalid in some address spaces and not in
> others.

Syntax could be added for this, I suppose.

> You cannot provide any of the DWARF address space stuff this
> way.

True.  I confess that I haven't thought about the debugging experience,
and I'd need to think what would happen at the DWARF level.

> But most importantly, there are only four bits for the address
> space field internally, and they are used by however a backend wants to
> use them.

One of the ideas of patch 1a is to divide up this 4-bit space between
the target-specific and the custom address spaces.  The backend code
would need to be tweaked to decode the 4-bit value to get at the
underlying target-specific address space value.  This is done by the
function ensure_builtin_addr_space in patch 1a, though I've likely
missed some places.

IIRC, the target that's currently using the most address spaces is avr,
which I believe has 8 target-specific address spaces, in addition to
the generic one, i.e. 9 builtin address spaces, which would leave room
for up to 6 user-defined address spaces.  The Linux kernel's smatch
annotations currently effectively introduce 4 custom address spaces,
__user, __iomem, __percpu, and __rcu (assuming that __kernel is the
generic address space), so it's something of a tight squeeze, but it
does fit.  This doesn't account for out-of-tree backends, of course.

> 
> None of this cannot be solved, but all of it will have to be solved.

(nods)

> 
> IMO it will be best to not mix this with address spaces in the user
> interface (it is of c

[committed 2/4] d: Merge upstream dmd testsuite 568496d5b

2021-12-08 Thread Iain Buclaw via Gcc-patches

Hi,

This patch merges the D2 testsuite upstream dmd 568496d5b.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32, and
committed to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 568496d5b.
---
 gcc/testsuite/gdc.test/compilable/b19294.d|  69 ++
 gcc/testsuite/gdc.test/compilable/cdcmp.d |   2 +-
 .../compilable/ddoc_markdown_tables_22285.d   |  15 +
 .../gdc.test/compilable/dtoh_ignored.d|   5 +-
 .../gdc.test/compilable/imports/cstuff3.c |   6 -
 .../gdc.test/compilable/mixintype2.d  |  49 ++
 gcc/testsuite/gdc.test/compilable/noreturn1.d |  49 +-
 .../gdc.test/compilable/previewall.d  |  10 -
 .../gdc.test/compilable/reinterpretctfe.d |  14 +
 gcc/testsuite/gdc.test/compilable/sroa.d  |  55 ++
 .../gdc.test/compilable/stc_traits.d  | 172 
 gcc/testsuite/gdc.test/compilable/test15711.d |  31 +
 gcc/testsuite/gdc.test/compilable/test16492.d |  87 --
 gcc/testsuite/gdc.test/compilable/test19482.d |  68 ++
 gcc/testsuite/gdc.test/compilable/test21438.d |  15 +
 gcc/testsuite/gdc.test/compilable/test21794.d |  52 ++
 gcc/testsuite/gdc.test/compilable/test21850.d |  35 +
 gcc/testsuite/gdc.test/compilable/test22214.d |  16 +
 gcc/testsuite/gdc.test/compilable/test4.d |   4 +
 gcc/testsuite/gdc.test/compilable/test8.d |  11 +
 gcc/testsuite/gdc.test/compilable/test22292.d | 155 
 gcc/testsuite/gdc.test/compilable/test22388.d |  22 +
 gcc/testsuite/gdc.test/compilable/test22410.d |  59 ++
 gcc/testsuite/gdc.test/compilable/test22420.d |  88 +++
 gcc/testsuite/gdc.test/compilable/test22421.d |  19 +
 gcc/testsuite/gdc.test/compilable/test318.d   |  19 +
 gcc/testsuite/gdc.test/compilable/test4090.d  |  17 -
 gcc/testsuite/gdc.test/compilable/test9766.d  |   4 +-
 .../gdc.test/compilable/testcstuff3.d |   4 -
 .../gdc.test/compilable/transition_in.d   |  26 +
 gcc/testsuite/gdc.test/compilable/zerosize.d  |  13 +-
 .../gdc.test/fail_compilation/diag10327.d |   3 +-
 .../gdc.test/fail_compilation/diag20059.d |   2 +-
 .../gdc.test/fail_compilation/fail20618.d |  16 +
 .../gdc.test/fail_compilation/fail21091a.d|   3 +-
 .../gdc.test/fail_compilation/fail21091b.d|   3 +-
 .../gdc.test/fail_compilation/fail22084.d |   2 +-
 .../gdc.test/fail_compilation/fail22151.d |  24 +
 .../gdc.test/fail_compilation/fail22366.d |  15 +
 .../gdc.test/fail_compilation/fail225.d   |  17 -
 .../gdc.test/fail_compilation/fail287.d   |   2 +-
 .../gdc.test/fail_compilation/fail318.d   |   8 -
 .../gdc.test/fail_compilation/fail318_b.d |  11 +
 .../gdc.test/fail_compilation/fail7173.d  |   2 +-
 .../gdc.test/fail_compilation/foreach.d   |  14 +
 .../gdc.test/fail_compilation/foreach2.d  |  22 +
 .../gdc.test/fail_compilation/ice10212.d  |   2 +-
 .../gdc.test/fail_compilation/ice22377.d  |   8 +
 .../gdc.test/fail_compilation/ice7782.d   |   3 +-
 .../fail_compilation/imports/imp22329.d   |   4 +
 .../gdc.test/fail_compilation/noreturn.d  |   2 +-
 .../gdc.test/fail_compilation/noreturn2.d |  90 +++
 .../fail_compilation/reserved_version.d   |   6 +
 .../reserved_version_switch.d |   6 +
 .../gdc.test/fail_compilation/test17425.d |   2 +-
 .../gdc.test/fail_compilation/test17868b.d|   2 +-
 .../gdc.test/fail_compilation/test20998.d | 120 +++
 .../gdc.test/fail_compilation/test21093.d |  56 ++
 .../gdc.test/fail_compilation/test21380.d |  46 ++
 .../gdc.test/fail_compilation/test21930.d |  27 +
 .../gdc.test/fail_compilation/test22329.d |  21 +
 .../gdc.test/fail_compilation/test22361.d |  11 +
 .../gdc.test/fail_compilation/testOpApply.d   | 161 
 gcc/testsuite/gdc.test/runnable/aliasthis.d   |  36 +
 gcc/testsuite/gdc.test/runnable/dhry.d|  16 +
 gcc/testsuite/gdc.test/runnable/fix22372.d|  38 +
 gcc/testsuite/gdc.test/runnable/interpret.d   |  57 ++
 gcc/testsuite/gdc.test/runnable/noreturn1.d   |  47 ++
 gcc/testsuite/gdc.test/runnable/noreturn2.d   | 220 ++
 gcc/testsuite/gdc.test/runnable/sroa13220.d   | 103 +++
 gcc/testsuite/gdc.test/runnable/test15624.d   |  51 --
 gcc/testsuite/gdc.test/runnable/test21039.d   |  27 +
 gcc/testsuite/gdc.test/runnable/test22205.d   |  17 +
 gcc/testsuite/gdc.test/runnable/test22278.d   |  24 +
 gcc/testsuite/gdc.test/runnable/testOpApply.d | 142 
 gcc/testsuite/gdc.test/runnable/testmainb.d   |  15 +
 gcc/testsuite/gdc.test/runnable/uda.d |  48 ++
 gcc/testsuite/gdc.test/runnable/ufcs.d|   1 +
 .../runnable_cxx/extra-files/cpp22287.cpp | 337 
 .../gdc.test/runnable_cxx/test22287.d | 327 
 80 files changed, 3187 insertions(+), 221 deletions(-)
 create mode 100644 gcc/testsuite/gdc.test/compilable/b19294.d
 create mode 100644 
gcc/testsuite/gdc.test/compilable/ddoc_markdown_tables_22285.d
 delete mode 100644 gcc/testsuite/gdc.test/compilable/imports/cstuff3.c
 d

Re: [PATCH v3 3/7] ifcvt: Improve costs handling for noce_convert_multiple.

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/6/2021 11:43 AM, Robin Dapp via Gcc-patches wrote:

When noce_convert_multiple is called the original costs are not yet
initialized.  Therefore, up to now, costs were only ever unfairly
compared against COSTS_N_INSNS (2).  This would lead to
default_noce_conversion_profitable_p () rejecting all but the most
contrived of sequences.

This patch temporarily initializes the original costs by counting
adding costs for all sets inside the then_bb.
---
  gcc/ifcvt.c | 31 +++
  1 file changed, 27 insertions(+), 4 deletions(-)

OK with a ChangeLog once the prereqs are approved.

jeff

Re: [PATCH v3 2/7] ifcvt: Allow constants for noce_convert_multiple.

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/6/2021 11:43 AM, Robin Dapp via Gcc-patches wrote:

This lifts the restriction of not allowing constants for
noce_convert_multiple.  The code later checks if a valid sequence
is produced anyway.
---
  gcc/ifcvt.c | 13 -
  1 file changed, 8 insertions(+), 5 deletions(-)

Fine with a ChangeLog once the prereqs are approved.

jeff

Re: [PATCH v3 6/7] testsuite/s390: Add tests for noce_convert_multiple.

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/6/2021 11:43 AM, Robin Dapp via Gcc-patches wrote:

Add new s390-specific tests that check if we convert two SETs into two
loads on condition.  Remove the s390-specific target-check in
gcc.dg/ifcvt-4.c.
---
  gcc/testsuite/gcc.dg/ifcvt-4.c|  2 +-
  .../gcc.target/s390/ifcvt-two-insns-bool.c| 39 +++
  .../gcc.target/s390/ifcvt-two-insns-int.c | 39 +++
  .../gcc.target/s390/ifcvt-two-insns-long.c| 39 +++
  4 files changed, 118 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c
  create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
  create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c

This is fine with a ChangeLog entry once the prereqs are approved.

jeff

[committed] libstdc++: Fix undefined shift when _Atomic_word is 64-bit

2021-12-08 Thread Jonathan Wakely via Gcc-patches

On Wed, 8 Dec 2021 at 19:27, Jonathan Wakely wrote:
>
> On Wed, 8 Dec 2021 at 19:21, Jonathan Wakely wrote:
> >
> > On Wed, 8 Dec 2021 at 19:17, Rainer Orth wrote:
> > >
> > > Hi Jonathan,
> > >
> > > > I've pushed this change to trunk now (it was posted and reviewed in
> > > > stage 1, I just didn't get around to pushing it until now).
> > > >
> > > > The final version of the patch is attached to this mail.
> > >
> > > unfortunately, it breaks Solaris/SPARC bootstrap:
> > >
> > > In file included from 
> > > /var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/bits/shared_ptr.h:53,
> > >  from 
> > > /var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/memory:77,
> > >  from 
> > > /vol/gcc/src/hg/master/local/libstdc++-v3/include/precompiled/stdc++.h:82:
> > > /var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/bits/shared_ptr_base.h:
> > >  In member function 'void std::_Sp_counted_base<_Lp>::_M_release() [with 
> > > __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]':
> > > /var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/bits/shared_ptr_base.h:329:26:
> > >  error: right operand of shift expression '(1 << 64)' is greater than or 
> > > equal to the precision 64 of the left operand [-fpermissive]
> > >   329 | = 1LL + (1LL << (__CHAR_BIT__ * 
> > > sizeof(_Atomic_word)));
> > >   | ~^
> > > make[9]: *** [Makefile:1875: 
> > > sparc-sun-solaris2.11/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 1
> > >
> > > For 64-bit SPARC, _Atomic_word is long.
> >
> > Ah yes, so we need to disable this optimization. Patch coming up ...
>
> Gah, I remembered to check that:
>
>   constexpr bool __double_word
> = sizeof(long long) == 2 * sizeof(_Atomic_word);
>   // The ref-count members follow the vptr, so are aligned to
>   // alignof(void*).
>   constexpr bool __aligned = __alignof(long long) <= alignof(void*);
>   if _GLIBCXX17_CONSTEXPR (__lock_free && __double_word && __aligned)
>
>
> But for C++11 and C++14 that is a normal runtime condition not
> if-constexpr, so the undefined shift still gets compiled, even though
> it can never be reached at runtime.

Fixed like so. Tested sparc-sun-solaris2.11, pushed to trunk.
commit c15aa46cca0649b68613d3292cf71c7cc57ef78f
Author: Jonathan Wakely 
Date:   Wed Dec 8 19:36:24 2021

libstdc++: Fix undefined shift when _Atomic_word is 64-bit

The check for _Atomic_word being 32-bit is just a normal runtime
condition for C++11 and C++14, because it doesn't use if-constexpr. That
means the 1LL << (CHAR_BIT * sizeof(_Atomic_word)) expression expands to
1LL << 64 on Solaris, which is ill-formed.

This adds another indirection so that the shift width is zero if the
code is unreachable.

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr_base.h (_Sp_counted_base::_M_release()):
Make shift width conditional on __double_word condition.

diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h 
b/libstdc++-v3/include/bits/shared_ptr_base.h
index 90ad30947b0..f315d8f354f 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -325,8 +325,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr bool __aligned = __alignof(long long) <= alignof(void*);
   if _GLIBCXX17_CONSTEXPR (__lock_free && __double_word && __aligned)
{
- constexpr long long __unique_ref
-   = 1LL + (1LL << (__CHAR_BIT__ * sizeof(_Atomic_word)));
+ constexpr int __wordbits = __CHAR_BIT__ * sizeof(_Atomic_word);
+ constexpr int __shiftbits = __double_word ? __wordbits : 0;
+ constexpr long long __unique_ref = 1LL + (1LL << __shiftbits);
  auto __both_counts = reinterpret_cast(&_M_use_count);
 
  _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);

Re: [PATCH 3/3] Fix loop split incorrect count and probability

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/7/2021 10:54 PM, Xionghu Luo via Gcc-patches wrote:

In tree-ssa-loop-split.c, split_loop and split_loop_on_cond does two
kind of split. split_loop only works for single loop and insert edge at
exit when split, while split_loop_on_cond is not limited to single loop
and insert edge at latch when split.  Both split behavior should consider
loop count and probability update.  For split_loop, loop split condition
is moved in front of loop1 and loop2; But split_loop_on_cond moves the
condition between loop1 and loop2, this patch does:
  1) profile count proportion for both original loop and copied loop
without dropping down the true branch's count;
  2) probability update in the two loops and between the two loops.

Regression tested pass, OK for master?

Changes diff for split_loop and split_loop_on_cond cases:

1) diff base/loop-split.c.151t.lsplit patched/loop-split.c.152t.lsplit
...
 [local count: 118111600]:
if (beg_5(D) < end_8(D))
  goto ; [89.00%]
else
  goto ; [11.00%]

 [local count: 105119324]:
if (beg2_6(D) < c_9(D))
-goto ; [100.00%]
+goto ; [33.00%]
else
-goto ; [100.00%]
+goto ; [67.00%]

-   [local count: 105119324]:
+   [local count: 34689377]:
_25 = beg_5(D) + 1;
_26 = end_8(D) - beg_5(D);
_27 = beg2_6(D) + _26;
_28 = MIN_EXPR ;

-   [local count: 955630225]:
+   [local count: 315357973]:
# i_16 = PHI 
# j_17 = PHI 
printf ("a: %d %d\n", i_16, j_17);
i_11 = i_16 + 1;
j_12 = j_17 + 1;
if (j_12 < _28)
-goto ; [89.00%]
+goto ; [29.37%]
else
-goto ; [11.00%]
+goto ; [70.63%]

-   [local count: 850510901]:
+   [local count: 280668596]:
goto ; [100.00%]

-   [local count: 105119324]:
+   [local count: 70429947]:
# i_22 = PHI 
# j_23 = PHI 

 [local count: 955630225]:
# i_2 = PHI 
# j_1 = PHI 
i_20 = i_2 + 1;
j_21 = j_1 + 1;
if (end_8(D) > i_20)
-goto ; [89.00%]
+goto ; [59.63%]
else
-goto ; [11.00%]
+goto ; [40.37%]

-   [local count: 850510901]:
+   [local count: 569842305]:
goto ; [100.00%]

 [local count: 105119324]:
# i_29 = PHI 
# j_30 = PHI 
if (end_8(D) > i_29)
  goto ; [80.00%]
else
  goto ; [20.00%]

 [local count: 105119324]:

 [local count: 118111600]:
return 0;

  }
 [local count: 118111600]:
-  if (beg_5(D) < end_8(D))
+  _1 = end_6(D) - beg_7(D);
+  j_9 = _1 + beg2_8(D);
+  if (end_6(D) > beg_7(D))
  goto ; [89.00%]
else
  goto ; [11.00%]

 [local count: 105119324]:
-  if (beg2_6(D) < c_9(D))
-goto ; [100.00%]
+  if (j_9 >= c_11(D))
+goto ; [33.00%]
else
-goto ; [100.00%]
+goto ; [67.00%]

-   [local count: 105119324]:
-  _25 = beg_5(D) + 1;
-  _26 = end_8(D) - beg_5(D);
-  _27 = beg2_6(D) + _26;
-  _28 = MIN_EXPR ;
-
-   [local count: 955630225]:
-  # i_16 = PHI 
-  # j_17 = PHI 
-  printf ("a: %d %d\n", i_16, j_17);
-  i_11 = i_16 + 1;
-  j_12 = j_17 + 1;
-  if (j_12 < _28)
-goto ; [89.00%]
+   [local count: 34689377]:
+  _27 = end_6(D) + -1;
+  _28 = beg_7(D) - end_6(D);
+  _29 = j_9 + _28;
+  _30 = _29 + 1;
+  _31 = MAX_EXPR ;
+
+   [local count: 315357973]:
+  # i_18 = PHI 
+  # j_19 = PHI 
+  printf ("a: %d %d\n", i_18, j_19);
+  i_13 = i_18 + -1;
+  j_14 = j_19 + -1;
+  if (j_14 >= _31)
+goto ; [29.37%]
else
-goto ; [11.00%]
+goto ; [70.63%]

-   [local count: 850510901]:
+   [local count: 280668596]:
goto ; [100.00%]

-   [local count: 105119324]:
-  # i_22 = PHI 
-  # j_23 = PHI 
+   [local count: 70429947]:
+  # i_24 = PHI 
+  # j_25 = PHI 

 [local count: 955630225]:
-  # i_2 = PHI 
-  # j_1 = PHI 
-  i_20 = i_2 + 1;
-  j_21 = j_1 + 1;
-  if (end_8(D) > i_20)
+  # i_3 = PHI 
+  # j_2 = PHI 
+  i_22 = i_3 + -1;
+  j_23 = j_2 + -1;
+  if (beg_7(D) < i_22)
  goto ; [89.00%]
else
  goto ; [11.00%]

-   [local count: 850510901]:
+   [local count: 569842305]:
goto ; [100.00%]

 [local count: 105119324]:
-  # i_29 = PHI 
-  # j_30 = PHI 
-  if (end_8(D) > i_29)
+  # i_32 = PHI 
+  # j_33 = PHI 
+  if (beg_7(D) < i_32)
  goto ; [80.00%]
else
  goto ; [20.00%]

 [local count: 105119324]:

 [local count: 118111600]:
return 0;

  }

2) diff base/loop-cond-split-1.c.151t.lsplit  
patched/loop-cond-split-1.c.151t.lsplit:
...
 [local count: 118111600]:
if (n_7(D) > 0)
  goto ; [89.00%]
else
  goto ; [11.00%]

 [local count: 118111600]:
return;

 [local count: 105119324]:
pretmp_3 = ga;

-   [local count: 955630225]:
+   [local count: 315357973]:
# i_13 = PHI 
# prephitmp_12 = PHI 
if (prephitmp_12 != 0)
  goto ; [33.00%]
else
  goto ; [67.00%]

 [local count: 315357972]:
_2 = do_something ();
ga = _2;

-   [local count: 955630225]:
+   [local count: 315357973]:
# prephitmp_5 = PHI 
i_10 = inc (i_13);
if (n_7(D) > i_10)
  goto ; [89.00%]
else
  goto ; [11.00%]

 [loca

Re: [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/7/2021 10:54 PM, Xionghu Luo via Gcc-patches wrote:

r12-4526 cancelled jump thread path rotates loop. It exposes a issue in
profile-estimate when predict_extra_loop_exits, outer loop's exit edge
is marked as inner loop's extra loop exit and set with incorrect
prediction, then a hot inner loop will become cold loop finally through
optimizations, this patch add loop check when searching extra exit edges
to avoid unexpected predict_edge from predict_paths_for_bb.

Regression tested on P8LE, OK for master?

gcc/ChangeLog:

PR middle-end/103270
* predict.c (predict_extra_loop_exits): Add loop parameter.
(predict_loops): Call with loop argument.

gcc/testsuite/ChangeLog:

PR middle-end/103270
* gcc.dg/pr103270.c: New test.

OK
jeff

Re: [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/7/2021 10:54 PM, Xionghu Luo via Gcc-patches wrote:

gcc/ChangeLog:

* loop-invariant.c (find_invariants_bb): Check profile count
before motion.
(find_invariants_body): Add argument.

OK
jeff

[PATCH v4] c++: Handle auto(x) in parameter-declaration-clause [PR103401]

2021-12-08 Thread Marek Polacek via Gcc-patches

On Wed, Dec 08, 2021 at 03:09:00PM -0500, Jason Merrill wrote:
> On 12/8/21 13:32, Marek Polacek wrote:
> > On Wed, Dec 08, 2021 at 09:15:05AM -0500, Jason Merrill wrote:
> > > On 12/7/21 19:25, Marek Polacek wrote:
> > > > On Mon, Dec 06, 2021 at 04:44:06PM -0500, Jason Merrill wrote:
> > > > > Please also make this change to cp_parser_sizeof_operand, and add 
> > > > > tests
> > > > > involving sizeof/alignof in array bounds.  OK with that change.
> > > > 
> > > > Turns out we reject sizeof(auto(4)) because cp_parser_type_id_1 errors
> > > > "invalid use of auto".  So I've added a hack to make it work; auto(x)
> > > > is *not* a type-id, so reject that parse and let it be parsed as an
> > > > expression.
> > > > 
> > > > FWIW, I don't think we need to clear 
> > > > auto_is_implicit_function_template_parm_p
> > > > in cp_parser_sizeof_operand for parameters like int[sizeof(auto(10))] 
> > > > because
> > > > the auto is in a declarator and auto_is_... will have been cleared 
> > > > already in
> > > > cp_parser_parameter_declaration before parsing the declarator.  But 
> > > > I've added
> > > > it anyway, maybe there are other cases where it matters.
> > > > 
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > > In C++23, auto(x) is valid, so decltype(auto(x)) should also be valid,
> > > > so
> > > > 
> > > > void f(decltype(auto(0)));
> > > > 
> > > > should be just as
> > > > 
> > > > void f(int);
> > > > 
> > > > but currently, everytime we see 'auto' in a 
> > > > parameter-declaration-clause,
> > > > we try to synthesize_implicit_template_parm for it, creating a new 
> > > > template
> > > > parameter list.  The code above actually has us calling s_i_t_p twice;
> > > > once from cp_parser_decltype_expr -> cp_parser_postfix_expression which
> > > > fails and then again from cp_parser_decltype_expr -> 
> > > > cp_parser_expression.
> > > > So it looks like we have f and we accept ill-formed code.
> > > > 
> > > > This shows that we need to be more careful about synthesizing the
> > > > implicit template parameter.  [dcl.spec.auto.general] says that "A
> > > > placeholder-type-specifier of the form type-constraintopt auto can be
> > > > used as a decl-specifier of the decl-specifier-seq of a
> > > > parameter-declaration of a function declaration or lambda-expression..."
> > > > so this patch turns off auto_is_... after we've parsed the 
> > > > decl-specifier-seq.
> > > > 
> > > > That doesn't quite cut it yet though, because we also need to handle an
> > > > auto nested in the decl-specifier:
> > > > 
> > > > void f(decltype(new auto{0}));
> > > > 
> > > > therefore the cp_parser_decltype change.
> > > > 
> > > > To accept "sizeof(auto{10})", the cp_parser_type_id_1 hunk rejects the
> > > > current parse if it sees an auto followed by a ( or {.
> > > 
> > > The problem here doesn't seem specific to the ( or {, but that we're 
> > > giving
> > > a hard error in tentative parsing context; I think we want to guard that
> > > error with cp_parser_simulate_error like we do a few lines earlier for 
> > > class
> > > template placeholders.
> > 
> > I agree that that's generally the approach that makes sense, but in this
> > case it regresses our diagnostic :(.  For example,
> > 
> >int i = *(auto *) 0;
> > 
> > would give
> > 
> > q.C:1:11: error: expected primary-expression before ‘auto’
> >  1 | int i = *(auto *) 0;
> >|   ^~~~
> > q.C:1:11: error: expected ‘)’ before ‘auto’
> >  1 | int i = *(auto *) 0;
> >|  ~^~~~
> >|   )
> > 
> > instead of the current
> > 
> > q.C:1:11: error: invalid use of ‘auto’
> >  1 | int i = *(auto *) 0;
> >|   ^~~~
> > 
> > We just reject the parse in cp_parser_type_id_1 and then give an error in
> > cp_parser_primary_expression:
> > 
> >cp_parser_error (parser, "expected primary-expression");
> > 
> > I suppose I could add 'case RID_AUTO' to cp_parser_primary_expression and
> > issue an error there, but that doesn't understand decltype(auto) etc, and
> > still issues multiple error messages.
> > 
> > 
> > Or, maybe it would be OK to actually go with the cp_parser_simulate_error
> > approach and accept that about 5 tests produce somewhat worse diagnostic.
> > 
> > What's your preference?
> 
> Hmm.
> 
> auto( could be the beginning of e.g. auto(*)(), which is also a type-id, and
> might trip your assert instead of giving an error.

Ah, yes.
 
> So I think the latter is the way to go.

Patch attached.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

> I wonder about some time establishing a pattern in the parser that if a
> tentative parse results in error_mark_node without simulating an error, we
> repeat the same parse again to get the desired semantic error.  But that's a
> big project, not something to address this bug.

Interesting.  How would we handle e.g. the case when sizeof gets something
that isn't a valid typ

Re: [PR103302] skip multi-word pre-move clobber during lra

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/7/2021 10:37 PM, Alexandre Oliva via Gcc-patches wrote:

If we emit clobbers before multi-word moves during lra, we get
confused if a copy ends up with input or output replaced with each
other: the clobber then kills the previous set, and it gets deleted.

This patch avoids emitting such clobbers when lra_in_progress.

Regstrapped on x86_64-linux-gnu.  Verified that, applied on a riscv64
compiler that failed the test, the asm statements are no longer dropped
in the reload dumps.  Running a x86_64-x-riscv64 regression testing now.
Ok to install?


for  gcc/ChangeLog

PR target/103302
expr.c (emit_move_multi_word): Skip clobber during lra.

for  gcc/testsuite/ChangeLog

PR target/103302
* gcc.target/riscv/pr103302.c: New.

OK.  Nit in the ChangeLog.  You forgot a '*' before the expr.c entry.

jeff

Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries

2021-12-08 Thread David Malcolm via Gcc-patches

On Mon, 2021-12-06 at 11:12 -0700, Martin Sebor wrote:
> On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> > [Crossposting between gcc-patches@gcc.gnu.org and
> > linux-toolcha...@vger.kernel.org; sorry about my lack of kernel
> > knowledge, in case of the following seems bogus]
> > 
> > I've been trying to turn my prototype from the LPC2021 session on
> > "Adding kernel-specific test coverage to GCC's -fanalyzer option"
> > ( https://linuxplumbersconf.org/event/11/contributions/1076/ ) into
> > something that can go into GCC upstream without adding kernel-
> > specific
> > special cases, or requiring a GCC plugin.  The prototype simply
> > specialcased "copy_from_user" and "copy_to_user" in GCC, which is
> > clearly not OK.
> > 
> > This GCC patch kit implements detection of "trust boundaries", aimed
> > at
> > detection of "infoleaks" and of use of unsanitized attacker-
> > controlled
> > values ("taint") in the Linux kernel.
> > 
> > For example, here's an infoleak diagnostic (using notes to
> > express what fields and padding within a struct have not been
> > initialized):
> > 
> > infoleak-CVE-2011-1078-2.c: In function ‘test_1’:
> > infoleak-CVE-2011-1078-2.c:28:9: warning: potential exposure of
> > sensitive
> >    information by copying uninitialized data from stack across trust
> >    boundary [CWE-200] [-Wanalyzer-exposure-through-uninit-copy]
> >     28 | copy_to_user(optval, &cinfo, sizeof(cinfo));
> >    | ^~~
> >    ‘test_1’: events 1-3
> >  |
> >  |   21 | struct sco_conninfo cinfo;
> >  |  | ^
> >  |  | |
> >  |  | (1) region created on stack
> > here
> >  |  | (2) capacity: 6 bytes
> >  |..
> >  |   28 | copy_to_user(optval, &cinfo, sizeof(cinfo));
> >  |  | ~~~
> >  |  | |
> >  |  | (3) uninitialized data copied from stack here
> >  |
> > infoleak-CVE-2011-1078-2.c:28:9: note: 1 byte is uninitialized
> >     28 | copy_to_user(optval, &cinfo, sizeof(cinfo));
> >    | ^~~
> > infoleak-CVE-2011-1078-2.c:14:15: note: padding after field
> > ‘dev_class’ is uninitialized (1 byte)
> >     14 | __u8  dev_class[3];
> >    |   ^
> > infoleak-CVE-2011-1078-2.c:21:29: note: suggest forcing zero-
> > initialization by providing a ‘{0}’ initializer
> >     21 | struct sco_conninfo cinfo;
> >    | ^
> >    |   = {0}
> > 
> > I have to come up with a way of expressing trust boundaries in a way
> > that will be:
> > - acceptable to the GCC community (not be too kernel-specific), and
> > - useful to the Linux kernel community.
> > 
> > At LPC it was pointed out that the kernel already has various
> > annotations e.g. "__user" for different kinds of pointers, and that
> > it
> > would be best to reuse those.
> > 
> > 
> > Approach 1: Custom Address Spaces
> > =
> > 
> > GCC's C frontend supports target-specific address spaces; see:
> >    https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
> > Quoting the N1275 draft of ISO/IEC DTR 18037:
> >    "Address space names are ordinary identifiers, sharing the same
> > name
> >    space as variables and typedef names.  Any such names follow the
> > same
> >    rules for scope as other ordinary identifiers (such as typedef
> > names).
> >    An implementation may provide an implementation-defined set of
> >    intrinsic address spaces that are, in effect, predefined at the
> > start
> >    of every translation unit.  The names of intrinsic address spaces
> > must
> >    be reserved identifiers (beginning with an underscore and an
> > uppercase
> >    letter or with two underscores).  An implementation may also
> >    optionally support a means for new address space names to be
> > defined
> >    within a translation unit."
> > 
> > Patch 1a in the following patch kit for GCC implements such a means
> > to
> > define new address spaces names in a translation unit, via a pragma:
> >    #prgama GCC custom_address_space(NAME_OF_ADDRESS_SPACE)
> > 
> > For example, the Linux kernel could perhaps write:
> > 
> >    #define __kernel
> >    #pragma GCC custom_address_space(__user)
> >    #pragma GCC custom_address_space(__iomem)
> >    #pragma GCC custom_address_space(__percpu)
> >    #pragma GCC custom_address_space(__rcu)
> > 
> > and thus the C frontend can complain about code that mismatches
> > __user
> > and kernel pointers, e.g.:
> > 
> > custom-address-space-1.c: In function ‘test_argpass_to_p’:
> > custom-address-space-1.c:29:14: error: passing argument 1 of
> > ‘accepts_p’
> > from pointer to non-enclosed address s

Re: [PATCH] pch: Add support for relocation of the PCH data [PR71934]

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/8/2021 1:00 AM, Iain Sandoe wrote:



On 7 Dec 2021, at 14:50, Jakub Jelinek via Gcc-patches 
 wrote:

On Tue, Dec 07, 2021 at 10:55:07AM +0100, Jakub Jelinek via Gcc-patches wrote:

So, this patch instead builds a relocation table (sorted list of addresses
in the blob which needs relocation) at PCH save time, stores it in a very
compact form into the gch file and upon restore, adjusts pointers in GTY
roots (that is right away in the root structures) and the addresses in the
relocation table.
The cost on stdc++.gch/O2g.gch (previously 85MB large) is about 3% file size
growth, there are 2.5 million pointers that need relocation in the gch blob
and the relocation table uses uleb128 for address deltas and needs ~1.01 bytes
for one address that needs relocation, and about 20% compile time during
PCH save (I think it is mainly because of the need to qsort those 2.5
million pointers).  On PCH restore, if it doesn't need relocation (the usual
case), it is just an extra fread of sizeof (size_t) data and fseek
(in my tests real time on vanilla tree for #include  CU
was ~0.175s and with the patch but no relocation ~0.173s), while if it needs
relocation it took ~0.193s, i.e. 11.5% slower.

I'll note that without PCH that
#include 
int i;
testcase compiles with -O2 -g in ~1.199s, i.e. 6.2 times slower than PCH with
relocation and 6.9 times than PCH without relocation.

I’ve run tests across the Darwin range, including old and new m32 hosts, and 
this
seems to be working well.

The attached patch should be applied before (or merged with) the change for
relocation when it is applied - since the operation of the PCH hooks needs some
adjustment on Darwin.

thanks for working on this!
I think as the Darwin maintainer, you can just commit this once Jakub 
commits his bits.


jeff

Re: [PATCH] pch: Add support for relocation of the PCH data [PR71934]

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/7/2021 2:55 AM, Jakub Jelinek wrote:

Hi!

The following patch adds support for relocation of the PCH blob on PCH
restore if we don't manage to get the preferred map slot for it.
The GTY stuff knows where all the pointers are, after all it relocates
it once during PCH save from the addresses where it was initially allocated
to addresses in the preferred map slot.
But, if we were to do it solely using GTY info upon PCH restore, we'd need
another set of GTY functions, which I think would make it less maintainable
and I think it would also be more costly at PCH restore time.  Those
functions would need to call something to add bias to pointers that haven't
been marked yet and make sure not to add bias to any pointer twice.

So, this patch instead builds a relocation table (sorted list of addresses
in the blob which needs relocation) at PCH save time, stores it in a very
compact form into the gch file and upon restore, adjusts pointers in GTY
roots (that is right away in the root structures) and the addresses in the
relocation table.
The cost on stdc++.gch/O2g.gch (previously 85MB large) is about 3% file size
growth, there are 2.5 million pointers that need relocation in the gch blob
and the relocation table uses uleb128 for address deltas and needs ~1.01 bytes
for one address that needs relocation, and about 20% compile time during
PCH save (I think it is mainly because of the need to qsort those 2.5
million pointers).  On PCH restore, if it doesn't need relocation (the usual
case), it is just an extra fread of sizeof (size_t) data and fseek
(in my tests real time on vanilla tree for #include  CU
was ~0.175s and with the patch but no relocation ~0.173s), while if it needs
relocation it took ~0.193s, i.e. 11.5% slower.

The discovery of the pointers in the blob that need relocation is done
in the relocate_ptrs hook which does the pointer relocation during PCH save.
Unfortunately, I had to make one change to the gengtype stuff due to the
nested_ptr feature of GTY, which some libcpp headers and stringpool.c use.
The relocate_ptrs hook had 2 arguments, pointer to the pointer and a cookie.
When relocate_ptrs is done, in most cases it is called solely on the
subfields of the current object, so e.g.
   if ((void *)(x) == this_obj)
 op (&((*x).u.fld[0].rt_rtx), cookie);
so relocate_ptrs can assert that ptr_p is within the
state->ptrs[state->ptrs_i]->obj ..
state->ptrs[state->ptrs_i]->obj+state->ptrs[state->ptrs_i]->size-sizeof(void*)
range and compute from that the address in the blob which will need
relocation (state->ptrs[state->ptrs_i]->new_addr is the new address
given to it and ptr_p-state->ptrs[state->ptrs_i]->obj is the relative
offset.  Unfortunately, for nested_ptr gengtype emits something like:
   {
 union tree_node * x0 =
   ((*x).val.node.node) ? HT_IDENT_TO_GCC_IDENT (HT_NODE 
(((*x).val.node.node))) : NULL;
 if ((void *)(x) == this_obj)
   op (&(x0), cookie);
 (*x).val.node.node = (x0) ? CPP_HASHNODE (GCC_IDENT_TO_HT_IDENT 
((x0))) : NULL;
   }
so relocate_ptrs is called with an address of some temporary variable and
so doesn't know where the pointer will finally be.
So, I've added another argument to relocate_ptrs (and to
gt_pointer_operator).  For the most common case I pass NULL as the new middle
argument to that function, first one remains pointer to the pointer that
needs adjustment and last the cookie.  The NULL seems to be cheap to compute
and short in the gt*.[ch] files and stands for ptr_p is an address within
the this_obj's range, remember its address.  For the nested_ptr case, the
new middle argument contains actual address of the pointer that might need
to be relocated, so instead of the above
   op (&(x0), &((*x).val.node.node), cookie);
in there.  And finally, e.g. for the reorder case I need a way to tell
restore_ptrs to ignore a particular address for the relocation purposes
and only treat it the old way.  I've used for that the case when
the first and second arguments are equal.

In order to enable support for mapping PCH as fallback at different
addresses than the preferred ones, a small change is needed to the
host pch_use_address hooks.  One change I've done to all of them is
the change of the type of the first argument from void * to void *&,
such that the actual address can be told to the callers (or shall I
instead use void **?), but another change that still needs to be done
in them if they want the relocation is actually not fail if they couldn't
get a preferred address, but instead modify what the first argument
refers to.  I've done that only for host-linux.c and Iain is testing
similar change for host-darwin.c.  Didn't change hpux, netbsd, openbsd,
solaris, mingw32 or the fallbacks because I can't test those.

Bootstrapped/regtested on x86_64-linux and i686-linux (both as is and with
incremental
--- gcc/config/host-linux.c.jj  2021-12-06 22:22:42.00367 +0100
+++ gcc/config/host-linux.c

Re: [PATCH] Loop unswitching: support gswitch statements.

2021-12-08 Thread Andrew MacLeod via Gcc-patches


On 12/2/21 08:46, Richard Biener wrote:

On Thu, Dec 2, 2021 at 2:10 PM Martin Liška  wrote:

On 12/2/21 13:01, Richard Biener wrote:

On Thu, Dec 2, 2021 at 12:45 PM Martin Liška  wrote:

On 12/1/21 19:21, Andrew MacLeod wrote:

On 12/1/21 09:48, Martin Liška wrote:

On 12/1/21 15:34, Richard Biener wrote:

On Wed, Dec 1, 2021 at 3:25 PM Martin Liška  wrote:

On 12/1/21 15:19, Richard Biener wrote:

which is compute the range of 'lhs' on edge_true into predicate->true_range,
assign that same range to ->false_range and then invert it to get the
range on the false_edge.  What I am saying is that for better precision
you should do

 ranger->range_on_edge (predicate->false_range, edge_false, lhs);

rather than prematurely optimize this to the inversion of the true range
since yes, ranger is CFG sensitive and only the_last_ predicate on a
long CFG path is actually inverted.

What am I missing?

I might be misunderstood, but I think it's the problem defined here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584605.html

where I used the ranger->range_on_edge on the false_edge.

Ah, OK.  But then even the true_edge range is possibly wrong, no?

You are of course correct, I've just proved that in debugger ://


Consider

 for (;;)
{
if (a < 100)
  if (a > 50)  // unswitch on this
/* .. */
if (a < 120)
/* ... */
}

then you record [51, 99] for true_range of the a > 50 predicate and thus
simplification will simplify the if (a < 120) check, no?

Yep.


You can only record the range from the (CFG independent) a > 50 check,
thus [51, +INF] but of course at simplification time you can also use
the CFG context at each simplification location.

@Andrew: How can I easily get irange based just on a stmt? Something like 
fold_range
with int_range_max as the 3rd argument?


Sorry, I miss these things if I'm not directly CC'd a lot :-)

So you just want to know the basic range the stmt generates without context?
Sure, what you say would be fine, but your want to initialize it to the type 
range:

Yes, I want to know range of LHS in a gcond statement and the same for cases in 
a gswitch statement.


int_range_max range (TREE_TYPE (name));

you can also simply trigger it using the current SSA_NAME_RANGE_INFO global  
values query instead of the default current contextual one... which , if there 
isnt a global range, will automatically use the range of the type of the 
argument.. so maybe just try

fold_range (r, stmt, get_global_range_query ())

Doing

 predicate->true_range = int_range_max (TREE_TYPE (lhs));
 fold_range (predicate->true_range, stmt, get_global_range_query ());
 predicate->true_range.debug();

gives me _Bool VARYING.

Likely because that gives a range for the bool result rather than
a range for the LHS of a LHS op RHS on the true or false edge.

Yes :) I guess Andrew can help us.

some grepping and poking found be gimple_range_calc_op1 which should
eventually do the trick (for constant gimple_cond_rhs at least).

Wait, didn't the  gori()->outgoing_edge_range_p()  call do this for you? 
Its a more general API that invokes gimple_range_calc_op1() under the 
covers.


Im going to add myself to the cc so I don't miss any more of these :-)

Or am I just way behind in my non-addressed reading? and that did 
address your issue? :-)


Andrew

Re: [PATCH] D: fix UBSAN

2021-12-08 Thread Iain Buclaw via Gcc-patches

Excerpts from Martin Liška's message of December 6, 2021 1:03 pm:
> Fixes:
> gcc/d/expr.cc:2596:9: runtime error: null pointer passed as argument 2, which 
> is declared to never be null
> 
> Ready for master?
> Thanks,
> Martin
> 

Looks reasonable to me.

Iain.

[PATCH, committed] PR fortran/103609 - [11/12 Regression] ICE in gfc_sym_get_dummy_args, at fortran/symbol.c:5243

2021-12-08 Thread Harald Anlauf via Gcc-patches

Dear all,

the attached patch fixes a NULL pointer dereference for a missed
declaration of a dummy argument or a conflict of a procedure with
another decl.

Committed as obvious after regtesting on x86_64-pc-linux-gnu:

commit r12-5847-gb77968a70537429b4f548f90c369d26e6b6943cc
Author: Harald Anlauf 
Date:   Wed Dec 8 21:14:19 2021 +0100

Fortran: avoid NULL pointer dereference on missing or bad dummy arguments

gcc/fortran/ChangeLog:

PR fortran/103609
* symbol.c (gfc_sym_get_dummy_args): Catch NULL pointer
dereference.

gcc/testsuite/ChangeLog:

PR fortran/103609
* gfortran.dg/pr103609.f90: New test.

Thanks,
Harald
From b77968a70537429b4f548f90c369d26e6b6943cc Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 8 Dec 2021 21:14:19 +0100
Subject: [PATCH] Fortran: avoid NULL pointer dereference on missing or bad
 dummy arguments

gcc/fortran/ChangeLog:

	PR fortran/103609
	* symbol.c (gfc_sym_get_dummy_args): Catch NULL pointer
	dereference.

gcc/testsuite/ChangeLog:

	PR fortran/103609
	* gfortran.dg/pr103609.f90: New test.
---
 gcc/fortran/symbol.c   |  3 +++
 gcc/testsuite/gfortran.dg/pr103609.f90 | 15 +++
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr103609.f90

diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 179f6029ca3..ebd99846610 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -5240,6 +5240,9 @@ gfc_sym_get_dummy_args (gfc_symbol *sym)
 {
   gfc_formal_arglist *dummies;

+  if (sym == NULL)
+return NULL;
+
   dummies = sym->formal;
   if (dummies == NULL && sym->ts.interface != NULL)
 dummies = sym->ts.interface->formal;
diff --git a/gcc/testsuite/gfortran.dg/pr103609.f90 b/gcc/testsuite/gfortran.dg/pr103609.f90
new file mode 100644
index 000..57f6a3b1531
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr103609.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! PR fortran/103609 - ICE in gfc_sym_get_dummy_args
+! Contributed by G.Steinmetz
+
+program p
+  implicit none
+  integer :: i
+  do i = 1, 2
+ call s
+  end do
+contains
+  subroutine s
+call sub(x) ! { dg-error "has no IMPLICIT type" }
+  end
+end
--
2.26.2

Re: [PATCH v3] c++: Handle auto(x) in parameter-declaration-clause [PR103401]

2021-12-08 Thread Jason Merrill via Gcc-patches


On 12/8/21 13:32, Marek Polacek wrote:

On Wed, Dec 08, 2021 at 09:15:05AM -0500, Jason Merrill wrote:

On 12/7/21 19:25, Marek Polacek wrote:

On Mon, Dec 06, 2021 at 04:44:06PM -0500, Jason Merrill wrote:

Please also make this change to cp_parser_sizeof_operand, and add tests
involving sizeof/alignof in array bounds.  OK with that change.


Turns out we reject sizeof(auto(4)) because cp_parser_type_id_1 errors
"invalid use of auto".  So I've added a hack to make it work; auto(x)
is *not* a type-id, so reject that parse and let it be parsed as an
expression.

FWIW, I don't think we need to clear auto_is_implicit_function_template_parm_p
in cp_parser_sizeof_operand for parameters like int[sizeof(auto(10))] because
the auto is in a declarator and auto_is_... will have been cleared already in
cp_parser_parameter_declaration before parsing the declarator.  But I've added
it anyway, maybe there are other cases where it matters.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In C++23, auto(x) is valid, so decltype(auto(x)) should also be valid,
so

void f(decltype(auto(0)));

should be just as

void f(int);

but currently, everytime we see 'auto' in a parameter-declaration-clause,
we try to synthesize_implicit_template_parm for it, creating a new template
parameter list.  The code above actually has us calling s_i_t_p twice;
once from cp_parser_decltype_expr -> cp_parser_postfix_expression which
fails and then again from cp_parser_decltype_expr -> cp_parser_expression.
So it looks like we have f and we accept ill-formed code.

This shows that we need to be more careful about synthesizing the
implicit template parameter.  [dcl.spec.auto.general] says that "A
placeholder-type-specifier of the form type-constraintopt auto can be
used as a decl-specifier of the decl-specifier-seq of a
parameter-declaration of a function declaration or lambda-expression..."
so this patch turns off auto_is_... after we've parsed the decl-specifier-seq.

That doesn't quite cut it yet though, because we also need to handle an
auto nested in the decl-specifier:

void f(decltype(new auto{0}));

therefore the cp_parser_decltype change.

To accept "sizeof(auto{10})", the cp_parser_type_id_1 hunk rejects the
current parse if it sees an auto followed by a ( or {.


The problem here doesn't seem specific to the ( or {, but that we're giving
a hard error in tentative parsing context; I think we want to guard that
error with cp_parser_simulate_error like we do a few lines earlier for class
template placeholders.


I agree that that's generally the approach that makes sense, but in this
case it regresses our diagnostic :(.  For example,

   int i = *(auto *) 0;

would give

q.C:1:11: error: expected primary-expression before ‘auto’
 1 | int i = *(auto *) 0;
   |   ^~~~
q.C:1:11: error: expected ‘)’ before ‘auto’
 1 | int i = *(auto *) 0;
   |  ~^~~~
   |   )

instead of the current

q.C:1:11: error: invalid use of ‘auto’
 1 | int i = *(auto *) 0;
   |   ^~~~

We just reject the parse in cp_parser_type_id_1 and then give an error in
cp_parser_primary_expression:

   cp_parser_error (parser, "expected primary-expression");

I suppose I could add 'case RID_AUTO' to cp_parser_primary_expression and
issue an error there, but that doesn't understand decltype(auto) etc, and
still issues multiple error messages.


Or, maybe it would be OK to actually go with the cp_parser_simulate_error
approach and accept that about 5 tests produce somewhat worse diagnostic.

What's your preference?


Hmm.

auto( could be the beginning of e.g. auto(*)(), which is also a type-id, 
and might trip your assert instead of giving an error.


So I think the latter is the way to go.

I wonder about some time establishing a pattern in the parser that if a 
tentative parse results in error_mark_node without simulating an error, 
we repeat the same parse again to get the desired semantic error.  But 
that's a big project, not something to address this bug.



The second hunk broke lambda-generic-85713-2.C but I think the error we
issue with this patch is in fact correct, and clang++ agrees.


I don't think this is the second hunk anymore.  :)


Ah, fixed.

Marek

Re: [PATCH v2 3/5] fix up compute_objsize: factor out PHI handling

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/6/2021 10:32 AM, Martin Sebor wrote:

Attached is subset of the patch in part (4) below: factor out
PHI handling.  It applies on top of patch 3/5.

On 12/3/21 5:00 PM, Jeff Law wrote:



On 11/8/2021 7:34 PM, Martin Sebor via Gcc-patches wrote:

The pointer-query code that implements compute_objsize() that's
in turn used by most middle end access warnings now has a few
warts in it and (at least) one bug.  With the exception of
the bug the warts aren't behind any user-visible bugs that
I know of but they do cause problems in new code I've been
implementing on top of it.  Besides fixing the one bug (just
a typo) the attached patch cleans up these latent issues:

1) It moves the bndrng member from the access_ref class to
   access_data.  As a FIXME in the code notes, the member never
   did belong in the former and only takes up space in the cache.

2) The compute_objsize_r() function is big, unwieldy, and tedious
   to step through because of all the if statements that are better
   coded as one switch statement.  This change factors out more
   of its code into smaller handler functions as has been suggested
   and done a few times before.

3) (2) exposed a few places where I fail to pass the current
   GIMPLE statement down to ranger.  This leads to worse quality
   range info, including possible false positives and negatives.
   I just spotted these problems in code review but I haven't
   taken the time to come up with test cases.  This change fixes
   these oversights as well.

4) The handling of PHI statements is also in one big, hard-to-
   follow function.  This change moves the handling of each PHI
   argument into its own handler which merges it into the previous
   argument.  This makes the code easier to work with and opens it
   to reuse also for MIN_EXPR and MAX_EXPR.  (This is primarily
   used to print informational notes after warnings.)

5) Finally, the patch factors code to dump each access_ref
   cached by the pointer_query cache out of pointer_query::dump
   and into access_ref::dump.  This helps with debugging.

These changes should have no user-visible effect and other than
a regression test for the typo (PR 103143) come with no tests.
They've been tested on x86_64-linux.
Sigh.  You've identified 6 distinct changes above.  The 5 you've 
enumerated plus a typo fix somewhere.  There's no reason why they 
need to be a single patch and many reasons why they should be a 
series of independent patches.    Combining them into a single patch 
isn't how we do things and it hides the actual bugfix in here.


Please send a fix for the typo first since that should be able to 
trivially go forward.  Then  a patch for item #1.  That should be 
trivial to review when it's pulled out from teh rest of the patch. 
Beyond that, your choice on ordering, but you need to break this down.





Jeff




gcc-pointer_query-refactor-3.diff

commit 6ac1d37947ad5cf07fe133faaf8414f00e0eed13
Author: Martin Sebor 
Date:   Mon Dec 6 09:23:22 2021 -0700

 Introduce access_ref::merge_ref.
 
 gcc/ChangeLog:
 
 * pointer-query.cc (access_ref::merge_ref): Define new function.

 (access_ref::get_ref): Move code into merge_ref and call it.
 * pointer-query.h (access_ref::merge_ref): Declare new function.
OK.  But it's probably worth noting that this patch does more than just 
factoring out the PHI handling.  It also adds the MIN/MAX bits noted in 
the original cover letter.   That's not inherently as bad now that this 
patch isn't intermixed with the other work.





diff --git a/gcc/pointer-query.cc b/gcc/pointer-query.cc
index c75c4da6b60..24fbac84ec4 100644
--- a/gcc/pointer-query.cc
+++ b/gcc/pointer-query.cc




@ -766,7 +818,14 @@ access_ref::get_ref (vec *all_refs,
  
/* Avoid changing *THIS.  */

if (pref && pref != this)
-*pref = phi_ref;
+{
+  /* Keep the SSA_NAME of the PHI unchanged so that all PHI arguments
+can be referred to later if necessary.  This is useful even if
+they all refer to the same object.  */
+  tree ref = pref->ref;
+  *pref = phi_ref;
+  pref->ref = ref;
+}

I don't see any mention of this in the ChangeLog.

So I'm fine with the patch itself.  I would just ask for a better 
ChangeLog.  If one was to read the current ChangeLog they could easily 
be led to believe this patch was just refactoring, but it brings in 
other changes as well.


Thanks,

Jeff

[pushed] libgcc, Darwin: Build a libgcc_s.1 for backwards compatibility.

2021-12-08 Thread Iain Sandoe via Gcc-patches

In order to reslve a long-standing issue with inter-operation
with libSystem, we have bumped the SO name for libgcc_s.

Distributions might wish to install this new version into a
structure where exisiting code is already linked with the
compiler-local libgcc_s.1 (providing symbols exported by the
now-retired libgcc_ext.10.x shims).

The replacement libgcc_s.1 forwards the symbols from the new SO.
In order to support DYLD_LIBRARY_PATH on systems (where it works)
we forward the libSystem unwinder symbols from 10.7+ and a
compiler-local version of the libgcc unwinder on earlier.

For macOS 10.4 to 10.6 this is 'bug-compatible' with existing uses.
For 10.7+ the behaviour will now actually be correct.

This should be squashed with the initial libgcc changes for PR80556
in any backport (r12-5418-gd4943ce939d).

tested on powerpc, i686, x86_64 and aarch64 (where this will be a NOP
since there are no historical uses to cater for).
pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 
 
libgcc/ChangeLog:

* config.host (*-*-darwin*): Add logic to build a shared
unwinder library for Darwin8-10.
* config/i386/t-darwin: Build legacy libgcc_s.1.
* config/rs6000/t-darwin: Likewise.
* config/t-darwin: Reorganise the EH fragments to place
them for inclusion in a shared EH lib.
* config/t-slibgcc-darwin: Build a legacy libgcc_s.1 and
the supporting pieces (all FAT libs).
* config/t-darwin-noeh: Removed.
* config/darwin-unwind.ver: New file.
* config/rs6000/t-darwin-ehs: New file.
* config/t-darwin-ehs: New file.
---
 libgcc/config.host|  13 ++-
 libgcc/config/darwin-unwind.ver   |  30 +++
 libgcc/config/i386/t-darwin   |   3 +
 libgcc/config/rs6000/t-darwin |   3 +
 libgcc/config/rs6000/t-darwin-ehs |   5 ++
 libgcc/config/t-darwin|  19 -
 libgcc/config/t-darwin-ehs|   4 +
 libgcc/config/t-darwin-noeh   |   4 -
 libgcc/config/t-slibgcc-darwin| 130 ++
 9 files changed, 186 insertions(+), 25 deletions(-)
 create mode 100644 libgcc/config/darwin-unwind.ver
 create mode 100644 libgcc/config/rs6000/t-darwin-ehs
 create mode 100644 libgcc/config/t-darwin-ehs
 delete mode 100644 libgcc/config/t-darwin-noeh

diff --git a/libgcc/config.host b/libgcc/config.host
index 9475246593b..d4c035c7188 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -215,10 +215,17 @@ case ${host} in
 *-*-darwin*)
   asm_hidden_op=.private_extern
   tmake_file="$tmake_file t-darwin ${cpu_type}/t-darwin t-libgcc-pic"
-  extra_parts="crt3.o libd10-uwfef.a crttms.o crttme.o libemutls_w.a"
   # The unwinder is provided by the system shared libraries, do not add one
-  # to the shared libgcc.
-  tmake_file="$tmake_file t-darwin-noeh t-slibgcc-darwin"
+  # to the shared libgcc but, for older systems, we build a shared unwinder
+  # separately so that we can construct a libgcc_s.1 to use for binaries
+  # linked against the old libgcc_ext.10.x stubs.
+  case ${host} in
+*-*-darwin[89]* | *-*-darwin10*)
+  tmake_file="$tmake_file t-darwin-ehs ${cpu_type}/t-darwin-ehs"
+  ;;
+  esac
+  tmake_file="$tmake_file t-slibgcc-darwin"
+  extra_parts="crt3.o libd10-uwfef.a crttms.o crttme.o libemutls_w.a"
   ;;
 *-*-dragonfly*)
   tmake_file="$tmake_file t-crtstuff-pic t-libgcc-pic t-eh-dw2-dip"
diff --git a/libgcc/config/darwin-unwind.ver b/libgcc/config/darwin-unwind.ver
new file mode 100644
index 000..fb74cb2c5d2
--- /dev/null
+++ b/libgcc/config/darwin-unwind.ver
@@ -0,0 +1,30 @@
+# unwinder
+__Unwind_DeleteException
+__Unwind_Find_FDE
+__Unwind_ForcedUnwind
+__Unwind_GetGR
+__Unwind_GetIP
+__Unwind_GetLanguageSpecificData
+__Unwind_GetRegionStart
+__Unwind_GetTextRelBase
+__Unwind_GetDataRelBase
+__Unwind_RaiseException
+__Unwind_Resume
+__Unwind_SetGR
+__Unwind_SetIP
+__Unwind_FindEnclosingFunction
+__Unwind_GetCFA
+__Unwind_Backtrace
+__Unwind_Resume_or_Rethrow
+__Unwind_GetIPInfo
+
+___register_frame
+___register_frame_table
+___register_frame_info
+___register_frame_info_bases
+___register_frame_info_table
+___register_frame_info_table_bases
+
+___deregister_frame
+___deregister_frame_info
+___deregister_frame_info_bases
diff --git a/libgcc/config/i386/t-darwin b/libgcc/config/i386/t-darwin
index c6b3acaaca2..4c18da1efbf 100644
--- a/libgcc/config/i386/t-darwin
+++ b/libgcc/config/i386/t-darwin
@@ -4,3 +4,6 @@ LIB2FUNCS_EXCLUDE = _fixtfdi _fixunstfdi _floatditf _floatunditf
 
 # Extra symbols for this port.
 SHLIB_MAPFILES += $(srcdir)/config/i386/libgcc-darwin.ver
+
+# Build a legacy libgcc_s.1
+BUILD_LIBGCCS1 = YES
diff --git a/libgcc/config/rs6000/t-darwin b/libgcc/config/rs6000/t-darwin
index 8b513bdb1d7..183d0df92ce 100644
--- a/libgcc/config/rs6000/t-darwin
+++ b/libgcc/config/rs6000/t-darwin
@@ -56,3 +56,6 @@ unwind-dw2_s.o: HOST_LIBGCC2_CFLAGS += -maltivec
 unwind-dw2.o: HOST_LIBGCC2_CFLAGS += -maltivec
 
 LIB2ADDEH += $(srcdir)/config/rs6000

[pushed] Darwin: Amend pie options when linking mdynamic-no-pic.

2021-12-08 Thread Iain Sandoe via Gcc-patches

On i686 Darwin from macOS 10.7 onwards the default is to
link executables as PIE, which conflicts with code generated
using mdynamic-no-pic.  Rather than warn about this and then
get the user to add -Wl,-no_pie, we can inject this in the
link specs.

tested on i686,powerpc-darwin9,i686-darwin17, x86_64-darwin,
pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/ChangeLog:

* config/darwin.h (DARWIN_PIE_SPEC): Add -no_pie when
linking mdynamic-no-pic code on macOS > 10.7.
---
 gcc/config/darwin.h | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index 9895e7f99ec..0ce13207ad6 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -288,13 +288,17 @@ extern GTY(()) int darwin_ms_struct;
 #define DARWIN_RDYNAMIC "%{rdynamic:%nrdynamic is not supported}"
 #endif
 
-/* FIXME: we should check that the linker supports the -pie and -no_pie.
+/* Code built with mdynamic-no-pic does not support PIE/PIC, so  we disallow
+   these combinations; we also ensure that the no_pie option is passed to
+   ld64 on system versions that default to PIE when mdynamic-no-pic is given.
+   FIXME: we should check that the linker supports the -pie and -no_pie.
options.  */
 #define DARWIN_PIE_SPEC \
 "%{pie|fpie|fPIE:\
%{mdynamic-no-pic: \
  %n'-mdynamic-no-pic' overrides '-pie', '-fpie' or '-fPIE'; \
- :%:version-compare(>= 10.5 mmacosx-version-min= -pie) }} "
+ :%:version-compare(>= 10.5 mmacosx-version-min= -pie) }; \
+   mdynamic-no-pic:%:version-compare(>= 10.7 mmacosx-version-min= -no_pie) } "
 
 #define DARWIN_NOPIE_SPEC \
 "%{no-pie|fno-pie|fno-PIE: \
-- 
2.24.3 (Apple Git-128)

Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1

2021-12-08 Thread Jonathan Wakely via Gcc-patches

On Wed, 8 Dec 2021 at 19:21, Jonathan Wakely wrote:
>
> On Wed, 8 Dec 2021 at 19:17, Rainer Orth wrote:
> >
> > Hi Jonathan,
> >
> > > I've pushed this change to trunk now (it was posted and reviewed in
> > > stage 1, I just didn't get around to pushing it until now).
> > >
> > > The final version of the patch is attached to this mail.
> >
> > unfortunately, it breaks Solaris/SPARC bootstrap:
> >
> > In file included from 
> > /var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/bits/shared_ptr.h:53,
> >  from 
> > /var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/memory:77,
> >  from 
> > /vol/gcc/src/hg/master/local/libstdc++-v3/include/precompiled/stdc++.h:82:
> > /var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/bits/shared_ptr_base.h:
> >  In member function 'void std::_Sp_counted_base<_Lp>::_M_release() [with 
> > __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]':
> > /var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/bits/shared_ptr_base.h:329:26:
> >  error: right operand of shift expression '(1 << 64)' is greater than or 
> > equal to the precision 64 of the left operand [-fpermissive]
> >   329 | = 1LL + (1LL << (__CHAR_BIT__ * sizeof(_Atomic_word)));
> >   | ~^
> > make[9]: *** [Makefile:1875: 
> > sparc-sun-solaris2.11/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 1
> >
> > For 64-bit SPARC, _Atomic_word is long.
>
> Ah yes, so we need to disable this optimization. Patch coming up ...

Gah, I remembered to check that:

  constexpr bool __double_word
= sizeof(long long) == 2 * sizeof(_Atomic_word);
  // The ref-count members follow the vptr, so are aligned to
  // alignof(void*).
  constexpr bool __aligned = __alignof(long long) <= alignof(void*);
  if _GLIBCXX17_CONSTEXPR (__lock_free && __double_word && __aligned)


But for C++11 and C++14 that is a normal runtime condition not
if-constexpr, so the undefined shift still gets compiled, even though
it can never be reached at runtime.

Re: [PATCH] pru: Fixup flags for .pru_irq_map section

2021-12-08 Thread Dimitar Dimitrov

On Fri, Dec 03, 2021 at 11:33:48PM +0200, Dimitar Dimitrov wrote:
> I intend to merge this patch next week, unless I hear objections.  I
> consider it a bug fix which fits the Stage 3 criteria.  It fixes the
> RPMSG firmware examples in the latest version 6.0 of TI's PRU Software
> Package.
> 
> The .pru_irq_map section has been introduced by Linux kernel 5.10:
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c75c9fdac66efd8b54773368254ef330c276171b
> This section must not be loaded into target memory.
> 
> Binutils already includes the corresponding fix in the linker script:
>   
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=44b357eb9aefc77a8385e631d8e3035a664f2333
> 
> gcc/ChangeLog:
> 
>   * config/pru/pru.c (pru_section_type_flags): New function.
>   (TARGET_SECTION_TYPE_FLAGS): Wire it.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/pru/pru_irq_map.c: New test.
> 
> Signed-off-by: Dimitar Dimitrov 


Committed.

Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1

2021-12-08 Thread Jonathan Wakely via Gcc-patches

On Wed, 8 Dec 2021 at 19:17, Rainer Orth wrote:
>
> Hi Jonathan,
>
> > I've pushed this change to trunk now (it was posted and reviewed in
> > stage 1, I just didn't get around to pushing it until now).
> >
> > The final version of the patch is attached to this mail.
>
> unfortunately, it breaks Solaris/SPARC bootstrap:
>
> In file included from 
> /var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/bits/shared_ptr.h:53,
>  from 
> /var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/memory:77,
>  from 
> /vol/gcc/src/hg/master/local/libstdc++-v3/include/precompiled/stdc++.h:82:
> /var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/bits/shared_ptr_base.h:
>  In member function 'void std::_Sp_counted_base<_Lp>::_M_release() [with 
> __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]':
> /var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/bits/shared_ptr_base.h:329:26:
>  error: right operand of shift expression '(1 << 64)' is greater than or 
> equal to the precision 64 of the left operand [-fpermissive]
>   329 | = 1LL + (1LL << (__CHAR_BIT__ * sizeof(_Atomic_word)));
>   | ~^
> make[9]: *** [Makefile:1875: 
> sparc-sun-solaris2.11/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 1
>
> For 64-bit SPARC, _Atomic_word is long.

Ah yes, so we need to disable this optimization. Patch coming up ...

Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1

2021-12-08 Thread Rainer Orth

Hi Jonathan,

> I've pushed this change to trunk now (it was posted and reviewed in
> stage 1, I just didn't get around to pushing it until now).
>
> The final version of the patch is attached to this mail.

unfortunately, it breaks Solaris/SPARC bootstrap:

In file included from 
/var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/bits/shared_ptr.h:53,
 from 
/var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/memory:77,
 from 
/vol/gcc/src/hg/master/local/libstdc++-v3/include/precompiled/stdc++.h:82:
/var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/bits/shared_ptr_base.h:
 In member function 'void std::_Sp_counted_base<_Lp>::_M_release() [with 
__gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]':
/var/gcc/regression/master/11.4-gcc-gas/build/sparc-sun-solaris2.11/sparcv9/libstdc++-v3/include/bits/shared_ptr_base.h:329:26:
 error: right operand of shift expression '(1 << 64)' is greater than or equal 
to the precision 64 of the left operand [-fpermissive]
  329 | = 1LL + (1LL << (__CHAR_BIT__ * sizeof(_Atomic_word)));
  | ~^
make[9]: *** [Makefile:1875: 
sparc-sun-solaris2.11/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 1

For 64-bit SPARC, _Atomic_word is long.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH v2 5/5] fix up compute_objsize: add a dump function

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/6/2021 10:32 AM, Martin Sebor wrote:

Attached is the subset of the patch in part (5) below: Add
a new dump function.  It applies on top of patch 4/5.

On 12/3/21 5:00 PM, Jeff Law wrote:



On 11/8/2021 7:34 PM, Martin Sebor via Gcc-patches wrote:

The pointer-query code that implements compute_objsize() that's
in turn used by most middle end access warnings now has a few
warts in it and (at least) one bug.  With the exception of
the bug the warts aren't behind any user-visible bugs that
I know of but they do cause problems in new code I've been
implementing on top of it.  Besides fixing the one bug (just
a typo) the attached patch cleans up these latent issues:

1) It moves the bndrng member from the access_ref class to
   access_data.  As a FIXME in the code notes, the member never
   did belong in the former and only takes up space in the cache.

2) The compute_objsize_r() function is big, unwieldy, and tedious
   to step through because of all the if statements that are better
   coded as one switch statement.  This change factors out more
   of its code into smaller handler functions as has been suggested
   and done a few times before.

3) (2) exposed a few places where I fail to pass the current
   GIMPLE statement down to ranger.  This leads to worse quality
   range info, including possible false positives and negatives.
   I just spotted these problems in code review but I haven't
   taken the time to come up with test cases.  This change fixes
   these oversights as well.

4) The handling of PHI statements is also in one big, hard-to-
   follow function.  This change moves the handling of each PHI
   argument into its own handler which merges it into the previous
   argument.  This makes the code easier to work with and opens it
   to reuse also for MIN_EXPR and MAX_EXPR.  (This is primarily
   used to print informational notes after warnings.)

5) Finally, the patch factors code to dump each access_ref
   cached by the pointer_query cache out of pointer_query::dump
   and into access_ref::dump.  This helps with debugging.

These changes should have no user-visible effect and other than
a regression test for the typo (PR 103143) come with no tests.
They've been tested on x86_64-linux.
Sigh.  You've identified 6 distinct changes above.  The 5 you've 
enumerated plus a typo fix somewhere.  There's no reason why they 
need to be a single patch and many reasons why they should be a 
series of independent patches.    Combining them into a single patch 
isn't how we do things and it hides the actual bugfix in here.


Please send a fix for the typo first since that should be able to 
trivially go forward.  Then  a patch for item #1.  That should be 
trivial to review when it's pulled out from teh rest of the patch. 
Beyond that, your choice on ordering, but you need to break this down.





Jeff




gcc-pointer_query-refactor-5.diff

commit 2054b01fb383560b96d51fabfe9dee6dbd611f4a
Author: Martin Sebor 
Date:   Mon Dec 6 09:52:32 2021 -0700

 Add a new dump function.
 
 gcc/ChangeLog:
 
 * pointer-query.cc (access_ref::dump): Define new function

 (pointer_query::dump): Call it.
 * pointer-query.h (access_ref::dump): Declare new function.
OK.  I think it's worth also noting in the ChangeLog the additional 
dumping you added with the "pointer_query cache contents (again)" hunk.


Jeff

Re: [PATCH v2 4/5] fix up compute_objsize: refactor it into helpers

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/6/2021 10:32 AM, Martin Sebor wrote:

Attached is the subset of the patch in part (2) below: refactor
compute_objsize_r into helpers.  It applies on top of patch 3/5.

On 12/3/21 5:00 PM, Jeff Law wrote:



On 11/8/2021 7:34 PM, Martin Sebor via Gcc-patches wrote:

The pointer-query code that implements compute_objsize() that's
in turn used by most middle end access warnings now has a few
warts in it and (at least) one bug.  With the exception of
the bug the warts aren't behind any user-visible bugs that
I know of but they do cause problems in new code I've been
implementing on top of it.  Besides fixing the one bug (just
a typo) the attached patch cleans up these latent issues:

1) It moves the bndrng member from the access_ref class to
   access_data.  As a FIXME in the code notes, the member never
   did belong in the former and only takes up space in the cache.

2) The compute_objsize_r() function is big, unwieldy, and tedious
   to step through because of all the if statements that are better
   coded as one switch statement.  This change factors out more
   of its code into smaller handler functions as has been suggested
   and done a few times before.

3) (2) exposed a few places where I fail to pass the current
   GIMPLE statement down to ranger.  This leads to worse quality
   range info, including possible false positives and negatives.
   I just spotted these problems in code review but I haven't
   taken the time to come up with test cases.  This change fixes
   these oversights as well.

4) The handling of PHI statements is also in one big, hard-to-
   follow function.  This change moves the handling of each PHI
   argument into its own handler which merges it into the previous
   argument.  This makes the code easier to work with and opens it
   to reuse also for MIN_EXPR and MAX_EXPR.  (This is primarily
   used to print informational notes after warnings.)

5) Finally, the patch factors code to dump each access_ref
   cached by the pointer_query cache out of pointer_query::dump
   and into access_ref::dump.  This helps with debugging.

These changes should have no user-visible effect and other than
a regression test for the typo (PR 103143) come with no tests.
They've been tested on x86_64-linux.
Sigh.  You've identified 6 distinct changes above.  The 5 you've 
enumerated plus a typo fix somewhere.  There's no reason why they 
need to be a single patch and many reasons why they should be a 
series of independent patches.    Combining them into a single patch 
isn't how we do things and it hides the actual bugfix in here.


Please send a fix for the typo first since that should be able to 
trivially go forward.  Then  a patch for item #1.  That should be 
trivial to review when it's pulled out from teh rest of the patch. 
Beyond that, your choice on ordering, but you need to break this down.





Jeff




gcc-pointer_query-refactor-4.diff

commit 7d1d8cc18678ccaec5f274919e0c9910ccfea86e
Author: Martin Sebor 
Date:   Mon Dec 6 09:33:32 2021 -0700

 Refactor compute_objsize_r into helpers.
 
 gcc/ChangeLog:
 
 * pointer-query.cc (compute_objsize_r): Add an argument.

 (gimple_call_return_array): Pass a new argument to 
compute_objsize_r.
 (access_ref::merge_ref): Same.
 (access_ref::inform_access): Add an argument and use it.
 (access_data::access_data): Initialize new member.
 (handle_min_max_size): Pass a new argument to compute_objsize_r.
 (handle_decl): New function.
 (handle_array_ref): Pass a new argument to compute_objsize_r.
 Avoid incrementing deref.
 (set_component_ref_size): New function.
 (handle_component_ref): New function.
 (handle_mem_ref): Pass a new argument to compute_objsize_r.
 Only increment deref after successfully computing object size.
 (handle_ssa_name): New function.
 (compute_objsize_r): Move code into helpers and call them.
 (compute_objsize): Pass a new argument to compute_objsize_r.
 * pointer-query.h (access_ref::inform_access): Add an argument.
 (access_data::ostype): New member.
Thanks.  This was a bit uglier than I expected to review -- but I think 
that largely stems from the somewhat chaotic nature of the original code 
where we have fragments like


  block A

  block B

  block C

Where blocks A & C are closely related and you've refactored them into a 
common function.  Parts of B may stay while other parts end up in a 
different refactored function.  In and of itself that usually isn't too 
bad, but in this case git diff made a bit of a mess out of things making 
the rearrangements harder to follow.  I chased most stuff around and it 
all looked sensible.


OK.

Thanks,
Jeff

[COMMITTED] bpf: avoid potential NULL pointer dereference

2021-12-08 Thread David Faust via Gcc-patches

[Committed as obvious.]

If the result from SSA_NAME_DEF_STMT is NULL, we could try to
dereference it anyway and ICE. Avoid this.

gcc/ChangeLog:

* config/bpf/bpf.c (handle_attr_preserve): Avoid calling
is_gimple_assign with a NULL pointer.
---
 gcc/config/bpf/bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/bpf/bpf.c b/gcc/config/bpf/bpf.c
index 9d2c0bb6818..c054c1ead6b 100644
--- a/gcc/config/bpf/bpf.c
+++ b/gcc/config/bpf/bpf.c
@@ -1482,7 +1482,7 @@ handle_attr_preserve (function *fn)
  && TREE_CODE (TREE_OPERAND (expr, 0)) == SSA_NAME)
{
  gimple *def_stmt = SSA_NAME_DEF_STMT (TREE_OPERAND (expr, 
0));
- if (is_gimple_assign (def_stmt))
+ if (def_stmt && is_gimple_assign (def_stmt))
expr = gimple_assign_rhs1 (def_stmt);
}
 
-- 
2.31.1

Re: [PATCH v2 2/5] fix up compute_objsize: pass GIMPLE statement to it

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/6/2021 10:31 AM, Martin Sebor wrote:

Attached is the subset of the patch in part (3) below: Pass
GIMPLE statement to compute_objsize.  It applies on top of
patch 1/5.

On 12/3/21 5:00 PM, Jeff Law wrote:



On 11/8/2021 7:34 PM, Martin Sebor via Gcc-patches wrote:

The pointer-query code that implements compute_objsize() that's
in turn used by most middle end access warnings now has a few
warts in it and (at least) one bug.  With the exception of
the bug the warts aren't behind any user-visible bugs that
I know of but they do cause problems in new code I've been
implementing on top of it.  Besides fixing the one bug (just
a typo) the attached patch cleans up these latent issues:

1) It moves the bndrng member from the access_ref class to
   access_data.  As a FIXME in the code notes, the member never
   did belong in the former and only takes up space in the cache.

2) The compute_objsize_r() function is big, unwieldy, and tedious
   to step through because of all the if statements that are better
   coded as one switch statement.  This change factors out more
   of its code into smaller handler functions as has been suggested
   and done a few times before.

3) (2) exposed a few places where I fail to pass the current
   GIMPLE statement down to ranger.  This leads to worse quality
   range info, including possible false positives and negatives.
   I just spotted these problems in code review but I haven't
   taken the time to come up with test cases.  This change fixes
   these oversights as well.

4) The handling of PHI statements is also in one big, hard-to-
   follow function.  This change moves the handling of each PHI
   argument into its own handler which merges it into the previous
   argument.  This makes the code easier to work with and opens it
   to reuse also for MIN_EXPR and MAX_EXPR.  (This is primarily
   used to print informational notes after warnings.)

5) Finally, the patch factors code to dump each access_ref
   cached by the pointer_query cache out of pointer_query::dump
   and into access_ref::dump.  This helps with debugging.

These changes should have no user-visible effect and other than
a regression test for the typo (PR 103143) come with no tests.
They've been tested on x86_64-linux.
Sigh.  You've identified 6 distinct changes above.  The 5 you've 
enumerated plus a typo fix somewhere.  There's no reason why they 
need to be a single patch and many reasons why they should be a 
series of independent patches.    Combining them into a single patch 
isn't how we do things and it hides the actual bugfix in here.


Please send a fix for the typo first since that should be able to 
trivially go forward.  Then  a patch for item #1.  That should be 
trivial to review when it's pulled out from teh rest of the patch. 
Beyond that, your choice on ordering, but you need to break this down.





Jeff




gcc-pointer_query-refactor-2.diff

commit 64d34f249fb98d53ae8fcb3b36b01c2b9b05ea4f
Author: Martin Sebor 
Date:   Sat Dec 4 16:57:48 2021 -0700

 Pass GIMPLE statement to compute_objsize.
 
 gcc/ChangeLog:
 
 * gimple-ssa-warn-restrict.c (builtin_access::builtin_access): Pass

 GIMPLE statement to compute_objsize.
 * pointer-query.cc (compute_objsize): Add a statement argument.
 * pointer-query.h (compute_objsize): Define a new overload.

OK
jeff

Re: [PATCH v2 1/5] fix up compute_objsize: move bndrng into access_data

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/6/2021 10:31 AM, Martin Sebor wrote:

Attached is the subset of the patch in part (1) below:  Move
bndrng from access_ref to access_data.

On 12/3/21 5:00 PM, Jeff Law wrote:



On 11/8/2021 7:34 PM, Martin Sebor via Gcc-patches wrote:

The pointer-query code that implements compute_objsize() that's
in turn used by most middle end access warnings now has a few
warts in it and (at least) one bug.  With the exception of
the bug the warts aren't behind any user-visible bugs that
I know of but they do cause problems in new code I've been
implementing on top of it.  Besides fixing the one bug (just
a typo) the attached patch cleans up these latent issues:

1) It moves the bndrng member from the access_ref class to
   access_data.  As a FIXME in the code notes, the member never
   did belong in the former and only takes up space in the cache.

2) The compute_objsize_r() function is big, unwieldy, and tedious
   to step through because of all the if statements that are better
   coded as one switch statement.  This change factors out more
   of its code into smaller handler functions as has been suggested
   and done a few times before.

3) (2) exposed a few places where I fail to pass the current
   GIMPLE statement down to ranger.  This leads to worse quality
   range info, including possible false positives and negatives.
   I just spotted these problems in code review but I haven't
   taken the time to come up with test cases.  This change fixes
   these oversights as well.

4) The handling of PHI statements is also in one big, hard-to-
   follow function.  This change moves the handling of each PHI
   argument into its own handler which merges it into the previous
   argument.  This makes the code easier to work with and opens it
   to reuse also for MIN_EXPR and MAX_EXPR.  (This is primarily
   used to print informational notes after warnings.)

5) Finally, the patch factors code to dump each access_ref
   cached by the pointer_query cache out of pointer_query::dump
   and into access_ref::dump.  This helps with debugging.

These changes should have no user-visible effect and other than
a regression test for the typo (PR 103143) come with no tests.
They've been tested on x86_64-linux.
Sigh.  You've identified 6 distinct changes above.  The 5 you've 
enumerated plus a typo fix somewhere.  There's no reason why they 
need to be a single patch and many reasons why they should be a 
series of independent patches.    Combining them into a single patch 
isn't how we do things and it hides the actual bugfix in here.


Please send a fix for the typo first since that should be able to 
trivially go forward.  Then  a patch for item #1.  That should be 
trivial to review when it's pulled out from teh rest of the patch. 
Beyond that, your choice on ordering, but you need to break this down.





Jeff




gcc-pointer_query-refactor-1.diff

commit 1062c1154beeeae26e86f053946ea733ffb5d136
Author: Martin Sebor 
Date:   Sat Dec 4 16:46:17 2021 -0700

 Move bndrng from access_ref to access_data.
 
 gcc/ChangeLog:
 
 * gimple-ssa-warn-access.cc (check_access): Adjust to member name

 change.
 (pass_waccess::check_strncmp): Same.
 * pointer-query.cc (access_ref::access_ref): Remove arguments.
 Simpilfy.
 (access_data::access_data): Define new ctors.
 (access_data::set_bound): Define new member function.
 (compute_objsize_r): Remove unnecessary code.
 * pointer-query.h (struct access_ref): Remove ctor arguments.
 (struct access_data): Declare ctor overloads.
 (access_data::dst_bndrng): New member.
 (access_data::src_bndrng): New member.

OK.  Thanks for breaking this out.

jeff

[r12-5832 Regression] FAIL: g++.target/i386/pr100738-1.C -std=gnu++98 scan-assembler-times vblendvps[ \\t] 2 on Linux/x86_64

2021-12-08 Thread sunil.k.pandey via Gcc-patches

On Linux/x86_64,

691f05c2197a7b79cb2d7fdbabe3182e22da320a is the first bad commit
commit 691f05c2197a7b79cb2d7fdbabe3182e22da320a
Author: Haochen Jiang 
Date:   Thu Dec 2 15:30:17 2021 +0800

Add combine splitter to transform vpcmpeqd/vpxor/vblendvps to vblendvps for 
~op0

caused

FAIL: g++.target/i386/pr100738-1.C  -std=gnu++14  scan-assembler-not vpxor[ \\t]
FAIL: g++.target/i386/pr100738-1.C  -std=gnu++14  scan-assembler-times 
vblendvps[ \\t] 2
FAIL: g++.target/i386/pr100738-1.C  -std=gnu++17  scan-assembler-not vpxor[ \\t]
FAIL: g++.target/i386/pr100738-1.C  -std=gnu++17  scan-assembler-times 
vblendvps[ \\t] 2
FAIL: g++.target/i386/pr100738-1.C  -std=gnu++2a  scan-assembler-not vpxor[ \\t]
FAIL: g++.target/i386/pr100738-1.C  -std=gnu++2a  scan-assembler-times 
vblendvps[ \\t] 2
FAIL: g++.target/i386/pr100738-1.C  -std=gnu++98  scan-assembler-not vpxor[ \\t]
FAIL: g++.target/i386/pr100738-1.C  -std=gnu++98  scan-assembler-times 
vblendvps[ \\t] 2

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5832/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=g++.target/i386/pr100738-1.C --target_board='unix{-m32\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)

[r12-5835 Regression] FAIL: libgomp.c++/target-this-4.C execution test on Linux/x86_64

2021-12-08 Thread sunil.k.pandey via Gcc-patches

On Linux/x86_64,

0ab29cf0bb68960c1f87405f14b4fb2109254e2f is the first bad commit
commit 0ab29cf0bb68960c1f87405f14b4fb2109254e2f
Author: Chung-Lin Tang 
Date:   Wed Dec 8 22:28:03 2021 +0800

openmp: Improve OpenMP target support for C++ (PR92120)

caused

FAIL: libgomp.c++/target-lambda-1.C execution test
FAIL: libgomp.c++/target-this-3.C execution test
FAIL: libgomp.c++/target-this-4.C execution test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5835/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/target-lambda-1.C --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/target-lambda-1.C --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/target-lambda-1.C --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/target-lambda-1.C --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/target-this-3.C --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/target-this-3.C --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/target-this-3.C --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/target-this-3.C --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/target-this-4.C --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/target-this-4.C --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/target-this-4.C --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/target-this-4.C --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)

Re: [PATCH v3] c++: Handle auto(x) in parameter-declaration-clause [PR103401]

2021-12-08 Thread Marek Polacek via Gcc-patches

On Wed, Dec 08, 2021 at 09:15:05AM -0500, Jason Merrill wrote:
> On 12/7/21 19:25, Marek Polacek wrote:
> > On Mon, Dec 06, 2021 at 04:44:06PM -0500, Jason Merrill wrote:
> > > Please also make this change to cp_parser_sizeof_operand, and add tests
> > > involving sizeof/alignof in array bounds.  OK with that change.
> > 
> > Turns out we reject sizeof(auto(4)) because cp_parser_type_id_1 errors
> > "invalid use of auto".  So I've added a hack to make it work; auto(x)
> > is *not* a type-id, so reject that parse and let it be parsed as an
> > expression.
> > 
> > FWIW, I don't think we need to clear 
> > auto_is_implicit_function_template_parm_p
> > in cp_parser_sizeof_operand for parameters like int[sizeof(auto(10))] 
> > because
> > the auto is in a declarator and auto_is_... will have been cleared already 
> > in
> > cp_parser_parameter_declaration before parsing the declarator.  But I've 
> > added
> > it anyway, maybe there are other cases where it matters.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > In C++23, auto(x) is valid, so decltype(auto(x)) should also be valid,
> > so
> > 
> >void f(decltype(auto(0)));
> > 
> > should be just as
> > 
> >void f(int);
> > 
> > but currently, everytime we see 'auto' in a parameter-declaration-clause,
> > we try to synthesize_implicit_template_parm for it, creating a new template
> > parameter list.  The code above actually has us calling s_i_t_p twice;
> > once from cp_parser_decltype_expr -> cp_parser_postfix_expression which
> > fails and then again from cp_parser_decltype_expr -> cp_parser_expression.
> > So it looks like we have f and we accept ill-formed code.
> > 
> > This shows that we need to be more careful about synthesizing the
> > implicit template parameter.  [dcl.spec.auto.general] says that "A
> > placeholder-type-specifier of the form type-constraintopt auto can be
> > used as a decl-specifier of the decl-specifier-seq of a
> > parameter-declaration of a function declaration or lambda-expression..."
> > so this patch turns off auto_is_... after we've parsed the 
> > decl-specifier-seq.
> > 
> > That doesn't quite cut it yet though, because we also need to handle an
> > auto nested in the decl-specifier:
> > 
> >void f(decltype(new auto{0}));
> > 
> > therefore the cp_parser_decltype change.
> > 
> > To accept "sizeof(auto{10})", the cp_parser_type_id_1 hunk rejects the
> > current parse if it sees an auto followed by a ( or {.
> 
> The problem here doesn't seem specific to the ( or {, but that we're giving
> a hard error in tentative parsing context; I think we want to guard that
> error with cp_parser_simulate_error like we do a few lines earlier for class
> template placeholders.

I agree that that's generally the approach that makes sense, but in this
case it regresses our diagnostic :(.  For example,

  int i = *(auto *) 0;

would give

q.C:1:11: error: expected primary-expression before ‘auto’
1 | int i = *(auto *) 0;
  |   ^~~~
q.C:1:11: error: expected ‘)’ before ‘auto’
1 | int i = *(auto *) 0;
  |  ~^~~~
  |   )

instead of the current

q.C:1:11: error: invalid use of ‘auto’
1 | int i = *(auto *) 0;
  |   ^~~~

We just reject the parse in cp_parser_type_id_1 and then give an error in
cp_parser_primary_expression:

  cp_parser_error (parser, "expected primary-expression");

I suppose I could add 'case RID_AUTO' to cp_parser_primary_expression and
issue an error there, but that doesn't understand decltype(auto) etc, and
still issues multiple error messages.


Or, maybe it would be OK to actually go with the cp_parser_simulate_error
approach and accept that about 5 tests produce somewhat worse diagnostic.

What's your preference?

> > The second hunk broke lambda-generic-85713-2.C but I think the error we
> > issue with this patch is in fact correct, and clang++ agrees.
> 
> I don't think this is the second hunk anymore.  :)

Ah, fixed.

Marek

Re: [PATCH] Use -fopt-info in unswitch pass.

2021-12-08 Thread Martin Liška


On 12/8/21 16:23, Richard Biener wrote:

Likewise.

Otherwise looks OK.


Fine, I fixed all the notes and installed the patch as 
df704591a2cad3526456aa77be403d21c822724d.

Cheers,
Martin

Re: [PATCH] libstdc++: Allow std::condition_variable waits to be cancelled [PR103382]

2021-12-08 Thread Jonathan Wakely via Gcc-patches

On Wed, 8 Dec 2021 at 17:36, Ville Voutilainen wrote:
>
> On Wed, 8 Dec 2021 at 19:27, Jonathan Wakely via Libstdc++
>  wrote:
> > After resolving a PEBKAC issue, here's an incremental diff that
> > preserves the old behaviour for the existing @GLIBCXX_3.4.11 symbol,
> > but adds a new @@GLIBCXX_3.4.30 symbol that supports cancellation via
> > __forced_unwind.
> >
> > Maybe we should also do this in the implementation of the old noexcept 
> > function:
> >
> > __attribute__((used))
> > void
> > __nothrow_wait_cv::wait(std::unique_lock& lock) noexcept
> > {
> >   int old;
> >   int err = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
> >   this->condition_variable::wait(lock);
> >   if (!err && old != PTHREAD_CANCEL_DISABLE)
> > pthread_setcancelstate(old, &old);
> > }
> >
> > This would prevent cancellation from terminating a process if it uses
> > the old symbol. So we'd have a new symbol that supports cancellation,
> > and an old one that safely disables it.
>
> That sounds good to me.

I think I'll disable cancellation as a separate change, because it's
not what the existing symbol does, and we should look at doing it
anywhere else that will currently terminate.

> Also, I'm not sure it was pointed out, for the original: changing a
> noexcept function to start throwing can leak exceptions
> through other noexcept functions, hitting catch-blocks instead of
> terminates, or terminates that occur much later
> than intended. The compiler will assume that it doesn't need to set up
> the LSDA in a noexcept function if everything
> you call is noexcept, and then you don't have the LSDA that would
> terminate right then and there. That's probably
> a lesser problem for the thread cancellation exception than it would
> be for some others, but it's a blood-curdling/chilling possibility
> that we should just avoid. And you have done that, thanks for that.

Yes, and those kind of problems are probably more likely in practice,
because the compiler *always* treated that function as noexcept. Users
probably didn't care whether it was or not (and it isn't guaranteed to
be by the standard) so wouldn't have gone out of their way to write
code that depended on it being noexcept.

Re: [PATCH] enable -Winvalid-memory-order for C++ [PR99612]

2021-12-08 Thread Martin Sebor via Gcc-patches


On 12/8/21 10:14 AM, Jonathan Wakely wrote:

On Wed, 8 Dec 2021 at 16:49, Martin Sebor wrote:

I don't anticipate this change to lead to the same fallout
because it's unlikely for GCC to synthesize invalid memory
orders out of thin air;


Agreed. I don't think we'll have the same kind of issues. 99% of uses
of memory orders just use the constants explicitly, passing them
directly to the std::atomic member functions (or something that calls
them).


Ack.




and b) because the current solution
can only detect the problems in calls to atomic functions at
-O0 that are declared with attribute always_inline.  This
includes member functions defined in the enclosing atomic
class but not namespace-scope functions.  To make
the detection possible those would also have to be
always_inline.  If that's a change you'd like to see I can
look into making it happen.


I think we can ignore the namespace-scope functions in . Most people do.


I was thinking it would be nice to provide the same quality
for the C/C++ portability interface (see the test below that
triggers the warning at -O0 in C but requires -O1 to get it
in C++).  But it's not a big deal.

Martin

#if __cplusplus
#  include 
using std::memory_order_release;
using std::atomic_load_explicit;
extern std::atomic eai;
#else
#  include 
extern _Atomic int eai;
#endif

int load (void)
{
  return atomic_load_explicit (&eai, memory_order_release);
}

Re: [PATCH] libstdc++: Allow std::condition_variable waits to be cancelled [PR103382]

2021-12-08 Thread Ville Voutilainen via Gcc-patches

On Wed, 8 Dec 2021 at 19:27, Jonathan Wakely via Libstdc++
 wrote:
> After resolving a PEBKAC issue, here's an incremental diff that
> preserves the old behaviour for the existing @GLIBCXX_3.4.11 symbol,
> but adds a new @@GLIBCXX_3.4.30 symbol that supports cancellation via
> __forced_unwind.
>
> Maybe we should also do this in the implementation of the old noexcept 
> function:
>
> __attribute__((used))
> void
> __nothrow_wait_cv::wait(std::unique_lock& lock) noexcept
> {
>   int old;
>   int err = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
>   this->condition_variable::wait(lock);
>   if (!err && old != PTHREAD_CANCEL_DISABLE)
> pthread_setcancelstate(old, &old);
> }
>
> This would prevent cancellation from terminating a process if it uses
> the old symbol. So we'd have a new symbol that supports cancellation,
> and an old one that safely disables it.

That sounds good to me.

Also, I'm not sure it was pointed out, for the original: changing a
noexcept function to start throwing can leak exceptions
through other noexcept functions, hitting catch-blocks instead of
terminates, or terminates that occur much later
than intended. The compiler will assume that it doesn't need to set up
the LSDA in a noexcept function if everything
you call is noexcept, and then you don't have the LSDA that would
terminate right then and there. That's probably
a lesser problem for the thread cancellation exception than it would
be for some others, but it's a blood-curdling/chilling possibility
that we should just avoid. And you have done that, thanks for that.

Re: [PATCH 2/2][GCC] arm: Declare MVE types internally via pragma

2021-12-08 Thread Richard Earnshaw via Gcc-patches

On 08/12/2021 15:39, Murray Steele via Gcc-patches wrote:
> Hi,
> 
> Thank you for the feedback, I'll make the noted changes to the changelog and
> add the missing end-of-namespace comments.
> 
> On 08/12/2021 15:23, Richard Earnshaw wrote:
> 
>> diff --git a/gcc/config/arm/arm-mve-builtins.def 
>> b/gcc/config/arm/arm-mve-builtins.def
>> new file mode 100644
>> index 
>> ..02a46bec3e4cba6add9bce4021c732e15aa8b012
>> --- /dev/null
>> +++ b/gcc/config/arm/arm-mve-builtins.def
>> @@ -0,0 +1,41 @@
>>
>> +#ifndef DEF_MVE_TYPE
>> +#define DEF_MVE_TYPE(A, B)
>> +#endif
>>
>> When would this file ever be included when this macro wasn't defined? Better 
>> to require the caller to define this by using #error if it's missing.
>>
>> then...
>>
>> +
>> +#undef DEF_MVE_TYPE
>>
>> This isn't needed anymore, because caller should undef it after use.
> 
> 
> I'd added this because later patches that build from this series will most
> likely need to define further DEF_MVE_* macros, in the style of the current
> SVE implementation. You are right that it is unnecessary for right now though,
> and I'll remove it too.

The best thing to do in that case then is to require the caller to
explicitly define DEF_MVE_TYPE as a NOP when it isn't required.  It
means a bit more churn at each call site, but I think it's more robust
longer term as it is clear which operations are going to be extracted.

R.

> 
> Thanks again,
> 
> Murray
>

Re: [PATCH] libstdc++: Allow std::condition_variable waits to be cancelled [PR103382]

2021-12-08 Thread Jonathan Wakely via Gcc-patches

On Wed, 8 Dec 2021 at 00:36, Jonathan Wakely wrote:
>
> On Tue, 7 Dec 2021 at 21:52, Florian Weimer wrote:
> >
> > * Jonathan Wakely:
> >
> > > On Tue, 7 Dec 2021, 21:20 Florian Weimer via Libstdc++, 
> > > 
> > > wrote:
> > >
> > >  * Jonathan Wakely via Libstdc:
> > >
> > >  > If necessary we could keep the terminate-on-cancellation behaviour as
> > >  > 
> > > _ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE@GLIBCXX_3.4.11
> > >  > and export the new behaviour as @@GLIBCXX_3.4.30, although this patch
> > >  > doesn't do that.
> > >
> > >  Note that if this fix escapes into the wild and then you have to make
> > >  the symbol version change, you will break newer applications.  In a few
> > >  cases in glibc, we proactively added aliases at a different symbol
> > >  version, but with the same implementation (at first).
> > >
> > > To be safe, we probably should preserve the old behaviour for the old
> > > version of the symbol. If we decide that the new behaviour is always
> > > preferable, we could change that later by making the old symbol an
> > > alias for the new. If we don't decide that, we'll be glad we made it a
> > > separate symbol.
> >
> > On the other hand, with separate versions, it's possible to reintroduce
> > the old behavior at a later date, as a bugfix.  It's not strictly
> > necessary to do that work upfront.  It's just nice to have this option.
>
> Ah yes, a new symbol version gives us more flexibility in every direction.
>
> > > I'll see if I can get it working with two versioned symbols. We don't
> > > actually do that in libstdc++ currently, we only have a single version
> > > of every symbol.
> >
> > Ping me if you want to discuss options. 8->
>
> Thanks. I'll try it and let you know how I get on.

After resolving a PEBKAC issue, here's an incremental diff that
preserves the old behaviour for the existing @GLIBCXX_3.4.11 symbol,
but adds a new @@GLIBCXX_3.4.30 symbol that supports cancellation via
__forced_unwind.

Maybe we should also do this in the implementation of the old noexcept function:

__attribute__((used))
void
__nothrow_wait_cv::wait(std::unique_lock& lock) noexcept
{
  int old;
  int err = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
  this->condition_variable::wait(lock);
  if (!err && old != PTHREAD_CANCEL_DISABLE)
pthread_setcancelstate(old, &old);
}

This would prevent cancellation from terminating a process if it uses
the old symbol. So we'd have a new symbol that supports cancellation,
and an old one that safely disables it.
diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index 8f3c7b3827e..b747351a1b9 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1285,7 +1285,6 @@ GLIBCXX_3.4.11 {
 # condition_variable
 _ZNSt18condition_variable10notify_allEv;
 _ZNSt18condition_variable10notify_oneEv;
-_ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE;
 _ZNSt18condition_variableC1Ev;
 _ZNSt18condition_variableC2Ev;
 _ZNSt18condition_variableD1Ev;
@@ -1295,6 +1294,12 @@ GLIBCXX_3.4.11 {
 _ZNSt22condition_variable_anyD1Ev;
 _ZNSt22condition_variable_anyD2Ev;
 
+#ifndef HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT
+# The original definition of this symbol gets versioned as @GLIBCXX_3.4.11
+# if ".symver" is supported, or as @@GLIBCXX_3.4.11 otherwise.
+_ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE;
+#endif
+
 # thread
 _ZNSt6thread4joinEv;
 _ZNSt6thread6detachEv;
@@ -2401,6 +2406,11 @@ GLIBCXX_3.4.30 {
 
 _ZSt21__glibcxx_assert_fail*;
 
+#ifdef HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT
+# The new definition of this symbol gets versioned as @@GLIBCXX_3.4.30
+_ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE;
+#endif
+
 } GLIBCXX_3.4.29;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/doc/xml/manual/evolution.xml 
b/libstdc++-v3/doc/xml/manual/evolution.xml
index 271d2225c3a..fd08cd84d20 100644
--- a/libstdc++-v3/doc/xml/manual/evolution.xml
+++ b/libstdc++-v3/doc/xml/manual/evolution.xml
@@ -1038,6 +1038,13 @@ The bitmap, mt, 
and pool--enable-libstdcxx-allocator were removed.
 
 
+
+std::condition_variable::wait changed to be
+noexcept(false) to allow thread cancellation exceptions to
+be thrown from pthread_cond_wait without aborting
+the process.
+
+
 
 
 
diff --git a/libstdc++-v3/src/c++11/compatibility-condvar.cc 
b/libstdc++-v3/src/c++11/compatibility-condvar.cc
index 575d78055cb..439f1844e2c 100644
--- a/libstdc++-v3/src/c++11/compatibility-condvar.cc
+++ b/libstdc++-v3/src/c++11/compatibility-condvar.cc
@@ -54,4 +54,36 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 
+#if ! _GLIBCXX_INLINE_VERSION
+// XXX GLIBCXX_ABI Deprecated
+// gcc-12.1
+// std::condition_variable::wait changed to noexcept(false)
+#if defined(_GLIBCXX_SYMVER_GNU) && defined(_GLIBCXX_SHARED) \
+&& defined(_

Re: [PATCH] AArch64: Improve rotate patterns

2021-12-08 Thread Richard Sandiford via Gcc-patches

Wilco Dijkstra  writes:
> Improve and generalize rotate patterns. Rotates by more than half the
> bitwidth of a register are canonicalized to rotate left. Many existing
> shift patterns don't handle this case correctly, so add rotate left to
> the shift iterator and convert rotate left into ror during assembly
> output. Add missing zero_extend patterns for shifted BIC, ORN and EON.
>
> Passes bootstrap and regress. OK for commit?

OK, thanks.  I agree it should go in during stage 3 since handling
rotatert by a constant without rotate by a constant is a bug,
like you say.  And the port is supposed to be complete wrt.
zero_extend patterns.

FTR, the is_rotl isn't strictly necessary, since we could I think do:

  if ( == ROTATE)

But the fact that ROTATE is the left shift isn't very mnemonic,
so I agree the version in the patch is better.

Richard

>
> ChangeLog:
> 2021-12-07  Wilco Dijkstra  
>
> * config/aarch64/aarch64.md
> (and_3_compare0): Support rotate left.
> (and_si3_compare0_uxtw): Likewise.
> (_3): Likewise.
> (_si3_uxtw): Likewise.
> (one_cmpl_2): Likewise.
> (_one_cmpl_3): Likewise.
> (_one_cmpl_sidi_uxtw): New pattern.
> (eor_one_cmpl_3_alt): Support rotate left.
> (eor_one_cmpl_sidi3_alt_ze): Likewise.
> (and_one_cmpl_3_compare0): Likewise.
> (and_one_cmpl_si3_compare0_uxtw): Likewise.
> (and_one_cmpl_3_compare0_no_reuse): Likewise.
> (and_3nr_compare0): Likewise.
> (*si3_insn_uxtw): Use SHIFT_no_rotate.
> (rolsi3_insn_uxtw): New pattern.
> * config/aarch64/iterators.md (SHIFT): Add rotate left.
> (SHIFT_no_rotate): Add new iterator.
> (SHIFT:shift): Print rotate left as ror.
> (is_rotl): Add test for left rotate.
>
> * gcc.target/aarch64/ror_2.c: New test.
> * gcc.target/aarch64/ror_3.c: New test.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 5297b2d3f95744ac72e36814c6676cc97478d48b..f80679bcb34ea07918b1a0304b32be436568095d
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4419,7 +4419,11 @@ (define_insn "*and_3_compare0"
> (set (match_operand:GPI 0 "register_operand" "=r")
> (and:GPI (SHIFT:GPI (match_dup 1) (match_dup 2)) (match_dup 3)))]
>""
> -  "ands\\t%0, %3, %1,  %2"
> +{
> +  if ()
> +operands[2] = GEN_INT ( - UINTVAL (operands[2]));
> +  return "ands\\t%0, %3, %1,  %2";
> +}
>[(set_attr "type" "logics_shift_imm")]
>  )
>
> @@ -4436,7 +4440,11 @@ (define_insn "*and_si3_compare0_uxtw"
> (zero_extend:DI (and:SI (SHIFT:SI (match_dup 1) (match_dup 2))
> (match_dup 3]
>""
> -  "ands\\t%w0, %w3, %w1,  %2"
> +{
> +  if ()
> +operands[2] = GEN_INT (32 - UINTVAL (operands[2]));
> +  return "ands\\t%w0, %w3, %w1,  %2";
> +}
>[(set_attr "type" "logics_shift_imm")]
>  )
>
> @@ -4447,7 +4455,11 @@ (define_insn "*_3"
>   (match_operand:QI 2 "aarch64_shift_imm_" "n"))
>  (match_operand:GPI 3 "register_operand" "r")))]
>""
> -  "\\t%0, %3, %1,  %2"
> +{
> +  if ()
> +operands[2] = GEN_INT ( - UINTVAL (operands[2]));
> +  return "\\t%0, %3, %1,  %2";
> +}
>[(set_attr "type" "logic_shift_imm")]
>  )
>
> @@ -4504,17 +4516,6 @@ (define_split
>"operands[3] = gen_reg_rtx (mode);"
>  )
>
> -(define_insn "*_rol3"
> -  [(set (match_operand:GPI 0 "register_operand" "=r")
> -   (LOGICAL:GPI (rotate:GPI
> - (match_operand:GPI 1 "register_operand" "r")
> - (match_operand:QI 2 "aarch64_shift_imm_" "n"))
> -(match_operand:GPI 3 "register_operand" "r")))]
> -  ""
> -  "\\t%0, %3, %1, ror #( - %2)"
> -  [(set_attr "type" "logic_shift_imm")]
> -)
> -
>  ;; zero_extend versions of above
>  (define_insn "*_si3_uxtw"
>[(set (match_operand:DI 0 "register_operand" "=r")
> @@ -4524,19 +4525,11 @@ (define_insn "*_si3_uxtw"
>   (match_operand:QI 2 "aarch64_shift_imm_si" "n"))
>  (match_operand:SI 3 "register_operand" "r"]
>""
> -  "\\t%w0, %w3, %w1,  %2"
> -  [(set_attr "type" "logic_shift_imm")]
> -)
> -
> -(define_insn "*_rolsi3_uxtw"
> -  [(set (match_operand:DI 0 "register_operand" "=r")
> -   (zero_extend:DI
> -(LOGICAL:SI (rotate:SI
> - (match_operand:SI 1 "register_operand" "r")
> - (match_operand:QI 2 "aarch64_shift_imm_si" "n"))
> -(match_operand:SI 3 "register_operand" "r"]
> -  ""
> -  "\\t%w0, %w3, %w1, ror #(32 - %2)"
> +{
> +  if ()
> +operands[2] = GEN_INT (32 - UINTVAL (operands[2]));
> +  return "\\t%w0, %w3, %w1,  %2";
> +}
>[(set_attr "type" "logic_shift_imm")]
>  )
>
> @@ -4565,7 +4558,11 @@ (define_insn "*one_cmpl_2"
> (not:GPI (SHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>

Re: [PATCH] enable -Winvalid-memory-order for C++ [PR99612]

2021-12-08 Thread Jonathan Wakely via Gcc-patches

On Wed, 8 Dec 2021 at 16:49, Martin Sebor wrote:
> I don't anticipate this change to lead to the same fallout
> because it's unlikely for GCC to synthesize invalid memory
> orders out of thin air;

Agreed. I don't think we'll have the same kind of issues. 99% of uses
of memory orders just use the constants explicitly, passing them
directly to the std::atomic member functions (or something that calls
them).

>and b) because the current solution
> can only detect the problems in calls to atomic functions at
> -O0 that are declared with attribute always_inline.  This
> includes member functions defined in the enclosing atomic
> class but not namespace-scope functions.  To make
> the detection possible those would also have to be
> always_inline.  If that's a change you'd like to see I can
> look into making it happen.

I think we can ignore the namespace-scope functions in . Most people do.

Re: [PR103097] tolerate reg-stack cross-block malformed asms

2021-12-08 Thread Jeff Law via Gcc-patches





On 12/7/2021 7:00 PM, Alexandre Oliva via Gcc-patches wrote:

The testcase shows malformed asms in one block confuse reg-stack logic
in another block.  Moving the resetting of any_malformed_asm to the
end of the pass enables it to take effect throughout the affected
function.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR target/103097
* reg-stack.c (convert_regs_1): Move any_malformed_asm
resetting...
(reg_to_stack): ... here.

for  gcc/testsuite/ChangeLog

PR target/103097
* gcc.target/i386/pr103097.c: New.
So it's "stickier" after your change.  ie, instead of indicating if 
there was a malformed insn in a block, it's did we find a malformed insn 
anywhere.  Which implies the comment before the declaration of 
any_malformed_asm needs a trivial update since it stated "malformed 
insns in a block".


OK with the trivial comment update.

jeff

[PATCH] enable -Winvalid-memory-order for C++ [PR99612]

2021-12-08 Thread Martin Sebor via Gcc-patches


Even with -Wno-system-headers enabled, the -Winvalid-memory-order
code tries to make sure calls to atomic functions with invalid
memory orders are diagnosed even though the C atomic functions
are defined as macros in the  system header.
The warning triggers at all optimization levels, including -O0.

Independently, the core diagnostic enhancements implemented earlier
this year (the warning group control) enable warnings for functions
defined in system headers that are inlined into user code.  This
was done for similar reason as above: because it's desirable to
diagnose invalid calls made from user code to system functions
(e.g., buffer overflows, invalid or mismatched deallocations,
etc.)

However, the C macro solution interferes with the code diagnostic
changes and prevents the invalid memory model warnings from being
issued for the same problems in C++.  In addition, because C++
atomics are ordinary (inline) functions that call the underlying
__atomic_xxx built-ins, the invalid memory orders can only be
detected with both inlining and constant propagation enabled.

The attached patch removes these limitations and enables
-Winvalid-memory-order to trigger even for C++ std::atomic,
(almost) just like it does in C, at all optimization levels
including -O0.

To make that possible I had to move -Winvalid-memory-order from
builtins.c to a GIMPLE pass where it can use context-sensitive
range info at -O0, instead of relying on constant propagation
(only available at -O1 and above).  Although the same approach
could be used to emit better object code for C++ atomics at -O0
(i.e., use the right memory order instead of dropping to seq_cst),
this patch doesn't do that.)

In addition to enabling the warning I've also enhanced it to
include the memory models involved in the diagnosed call (both
the problem ones and the viable alternatives).

Tested on x86_64-linux.

Jonathan, I CC you for two reasons: a) because this solution
is based on your (as well as my own) preference for handling
C++ system headers, and because of our last week's discussion
of the false positives in std::string resulting from the same
choice there.

I don't anticipate this change to lead to the same fallout
because it's unlikely for GCC to synthesize invalid memory
orders out of thin air; and b) because the current solution
can only detect the problems in calls to atomic functions at
-O0 that are declared with attribute always_inline.  This
includes member functions defined in the enclosing atomic
class but not namespace-scope functions.  To make
the detection possible those would also have to be
always_inline.  If that's a change you'd like to see I can
look into making it happen.

Martin
PR middle-end/99612 - Remove "#pragma GCC system_header" from atomic file to warn on incorrect memory order

gcc/ChangeLog:

	PR middle-end/99612
	* builtins.c (get_memmodel): Move warning code to
	gimple-ssa-warn-access.cc.
	(expand_builtin_atomic_compare_exchange): Same.
	(expand_ifn_atomic_compare_exchange): Same.
	(expand_builtin_atomic_load): Same.
	(expand_builtin_atomic_store): Same.
	(expand_builtin_atomic_clear): Same.
	* doc/extend.texi (__atomic_exchange_n): Update valid memory
	models.
	* gimple-ssa-warn-access.cc (memmodel_to_uhwi): New function.
	(struct memmodel_pair): New struct.
	(memmodel_name): New function.
	(pass_waccess::maybe_warn_memmodel): New function.
	(pass_waccess::check_atomic_memmodel): New function.
	(pass_waccess::check_atomic_builtin): Handle memory model.
	* input.c (expansion_point_location_if_in_system_header): Return
	original location if expansion location is in a system header.

gcc/testsuite/ChangeLog:

	PR middle-end/99612
	* c-c++-common/pr83059.c: Adjust text of expected diagnostics.
	* gcc.dg/atomic-invalid-2.c: Same.
	* gcc.dg/atomic-invalid.c: Same.
	* c-c++-common/Winvalid-memory-model.c: New test.
	* g++.dg/warn/Winvalid-memory-model-2.C: New test.
	* g++.dg/warn/Winvalid-memory-model.C: New test.


diff --git a/gcc/builtins.c b/gcc/builtins.c
index 03829c03a5a..e7d0c276751 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5790,35 +5790,22 @@ expand_builtin_sync_lock_release (machine_mode mode, tree exp)
 static enum memmodel
 get_memmodel (tree exp)
 {
-  rtx op;
-  unsigned HOST_WIDE_INT val;
-  location_t loc
-= expansion_point_location_if_in_system_header (input_location);
-
   /* If the parameter is not a constant, it's a run time value so we'll just
  convert it to MEMMODEL_SEQ_CST to avoid annoying runtime checking.  */
   if (TREE_CODE (exp) != INTEGER_CST)
 return MEMMODEL_SEQ_CST;
 
-  op = expand_normal (exp);
+  rtx op = expand_normal (exp);
 
-  val = INTVAL (op);
+  unsigned HOST_WIDE_INT val = INTVAL (op);
   if (targetm.memmodel_check)
 val = targetm.memmodel_check (val);
   else if (val & ~MEMMODEL_MASK)
-{
-  warning_at (loc, OPT_Winvalid_memory_model,
-		  "unknown architecture specifier in memory model to builtin");
-  return MEMMODEL_SEQ_CST;
-}
+retur

[PATCH] AArch64: Improve rotate patterns

2021-12-08 Thread Wilco Dijkstra via Gcc-patches

Improve and generalize rotate patterns. Rotates by more than half the
bitwidth of a register are canonicalized to rotate left. Many existing
shift patterns don't handle this case correctly, so add rotate left to
the shift iterator and convert rotate left into ror during assembly
output. Add missing zero_extend patterns for shifted BIC, ORN and EON.

Passes bootstrap and regress. OK for commit?

ChangeLog:
2021-12-07  Wilco Dijkstra  

* config/aarch64/aarch64.md
(and_3_compare0): Support rotate left.
(and_si3_compare0_uxtw): Likewise.
(_3): Likewise.
(_si3_uxtw): Likewise.
(one_cmpl_2): Likewise.
(_one_cmpl_3): Likewise.
(_one_cmpl_sidi_uxtw): New pattern.
(eor_one_cmpl_3_alt): Support rotate left.
(eor_one_cmpl_sidi3_alt_ze): Likewise.
(and_one_cmpl_3_compare0): Likewise.
(and_one_cmpl_si3_compare0_uxtw): Likewise.
(and_one_cmpl_3_compare0_no_reuse): Likewise.
(and_3nr_compare0): Likewise.
(*si3_insn_uxtw): Use SHIFT_no_rotate.
(rolsi3_insn_uxtw): New pattern.
* config/aarch64/iterators.md (SHIFT): Add rotate left.
(SHIFT_no_rotate): Add new iterator.
(SHIFT:shift): Print rotate left as ror.
(is_rotl): Add test for left rotate.

* gcc.target/aarch64/ror_2.c: New test.
* gcc.target/aarch64/ror_3.c: New test.

---

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
5297b2d3f95744ac72e36814c6676cc97478d48b..f80679bcb34ea07918b1a0304b32be436568095d
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4419,7 +4419,11 @@ (define_insn "*and_3_compare0"
(set (match_operand:GPI 0 "register_operand" "=r")
(and:GPI (SHIFT:GPI (match_dup 1) (match_dup 2)) (match_dup 3)))]
   ""
-  "ands\\t%0, %3, %1,  %2"
+{
+  if ()
+operands[2] = GEN_INT ( - UINTVAL (operands[2]));
+  return "ands\\t%0, %3, %1,  %2";
+}
   [(set_attr "type" "logics_shift_imm")]
 )
 
@@ -4436,7 +4440,11 @@ (define_insn "*and_si3_compare0_uxtw"
(zero_extend:DI (and:SI (SHIFT:SI (match_dup 1) (match_dup 2))
(match_dup 3]
   ""
-  "ands\\t%w0, %w3, %w1,  %2"
+{
+  if ()
+operands[2] = GEN_INT (32 - UINTVAL (operands[2]));
+  return "ands\\t%w0, %w3, %w1,  %2";
+}
   [(set_attr "type" "logics_shift_imm")]
 )
 
@@ -4447,7 +4455,11 @@ (define_insn "*_3"
  (match_operand:QI 2 "aarch64_shift_imm_" "n"))
 (match_operand:GPI 3 "register_operand" "r")))]
   ""
-  "\\t%0, %3, %1,  %2"
+{
+  if ()
+operands[2] = GEN_INT ( - UINTVAL (operands[2]));
+  return "\\t%0, %3, %1,  %2";
+}
   [(set_attr "type" "logic_shift_imm")]
 )
 
@@ -4504,17 +4516,6 @@ (define_split
   "operands[3] = gen_reg_rtx (mode);"
 )
 
-(define_insn "*_rol3"
-  [(set (match_operand:GPI 0 "register_operand" "=r")
-   (LOGICAL:GPI (rotate:GPI
- (match_operand:GPI 1 "register_operand" "r")
- (match_operand:QI 2 "aarch64_shift_imm_" "n"))
-(match_operand:GPI 3 "register_operand" "r")))]
-  ""
-  "\\t%0, %3, %1, ror #( - %2)"
-  [(set_attr "type" "logic_shift_imm")]
-)
-
 ;; zero_extend versions of above
 (define_insn "*_si3_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -4524,19 +4525,11 @@ (define_insn "*_si3_uxtw"
  (match_operand:QI 2 "aarch64_shift_imm_si" "n"))
 (match_operand:SI 3 "register_operand" "r"]
   ""
-  "\\t%w0, %w3, %w1,  %2"
-  [(set_attr "type" "logic_shift_imm")]
-)
-
-(define_insn "*_rolsi3_uxtw"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-   (zero_extend:DI
-(LOGICAL:SI (rotate:SI
- (match_operand:SI 1 "register_operand" "r")
- (match_operand:QI 2 "aarch64_shift_imm_si" "n"))
-(match_operand:SI 3 "register_operand" "r"]
-  ""
-  "\\t%w0, %w3, %w1, ror #(32 - %2)"
+{
+  if ()
+operands[2] = GEN_INT (32 - UINTVAL (operands[2]));
+  return "\\t%w0, %w3, %w1,  %2";
+}
   [(set_attr "type" "logic_shift_imm")]
 )
 
@@ -4565,7 +4558,11 @@ (define_insn "*one_cmpl_2"
(not:GPI (SHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
(match_operand:QI 2 "aarch64_shift_imm_" 
"n"]
   ""
-  "mvn\\t%0, %1,  %2"
+{
+  if ()
+operands[2] = GEN_INT ( - UINTVAL (operands[2]));
+  return "mvn\\t%0, %1,  %2";
+}
   [(set_attr "type" "logic_shift_imm")]
 )
 
@@ -4672,7 +4669,28 @@ (define_insn 
"_one_cmpl_3"
   (match_operand:QI 2 "aarch64_shift_imm_" "n")))
 (match_operand:GPI 3 "register_operand" "r")))]
   ""
-  "\\t%0, %3, %1,  %2"
+{
+  if ()
+operands[2] = GEN_INT ( - UINTVAL (operands[2]));
+  return "\\t%0, %3, %1,  %2";
+}
+  [(set_attr "type" "logic_shift_imm")]
+)
+
+;; Zero-extend version of the above.
+(define_insn "_one_cmpl_sidi_uxtw"
+  [(set

Re: [PATCH][_GLIBCXX_DEBUG] Enhance std::erase_if for vector/deque

2021-12-08 Thread Jonathan Wakely via Gcc-patches

On Sun, 21 Nov 2021 at 11:26, François Dumont via Libstdc++
 wrote:
>
> I tried to use the same approach I used for node based containers but
> got ambiguity on erase calls. I think this simple version will do the work.
>
>  libstdc++: [_GLIBCXX_DEBUG] Enhance std::erase_if for vector/deque
>
>  libstdc++-v3/ChangeLog:
>
>  * include/std/deque (erase_if): Use _GLIBCXX_STD_C
> container reference and
>  __niter_wrap to limit _GLIBCXX_DEBUG mode impact.
>  * include/std/vector (erase_if): Likewise.
>
> Tested under Linux x86_64 normal and _GLIBCXX_DEBUG modes.
>
> Ok to commit ?

OK, thanks.

Re: [patch, Fortran] IEEE support for aarch64-apple-darwin

2021-12-08 Thread Richard Earnshaw via Gcc-patches





On 08/12/2021 15:47, FX via Gcc-patches wrote:

Hi Richard,


This isn't a full review, but I do have a question: is this really specific to 
Darwin?  or is it really generic aarch64 code?  If the former, then the file 
name is not right and it should reflect the darwin-specific nature of the 
contents.  If the latter, then I wonder why most other fortran targets don't 
seem to have specific implementations of this.


The code is not specific to Darwin, but right now I chose to only enable on 
Darwin because:
- All glibc targets are covered already by using glibc  function calls
- I don’t know if there are other aarch64 targets that exist, support Fortran 
and IEEE, but don’t have glibc

I’d suggest other non-glibc aarch64 targets could be added to the support and 
enable this code, but I don’t want to do it unless it’s been tested there. IEEE 
support is optional in Fortran, so I suggest we keep it “opt-in”: targets where 
it’s known to work enable it, but it’s off by default on other targets.

I hope this explains the rationale.

FX




OK, that makes sense.  I wonder if a comment to that effect is warranted 
somewhere.


One tricky thing about aarch64 IEEE support is that trapping exceptions 
(raising a software exception, as opposed to just setting an exceptional 
condition bit in the FPSR) is implementation defined.  The only way to 
tell if your implementation can support trapping an exception is to try 
to set the trapping enable bit in the FPCR.  If the bit is read back and 
is non-zero then that exception supports trapping; if it reads back as 
zero, then trapping that exception is not supported.  More details can 
be found in the armv8 Arm ARM.


HTH,

R.

Re: [patch, Fortran] IEEE support for aarch64-apple-darwin

2021-12-08 Thread FX via Gcc-patches

Hi Richard,

> This isn't a full review, but I do have a question: is this really specific 
> to Darwin?  or is it really generic aarch64 code?  If the former, then the 
> file name is not right and it should reflect the darwin-specific nature of 
> the contents.  If the latter, then I wonder why most other fortran targets 
> don't seem to have specific implementations of this.

The code is not specific to Darwin, but right now I chose to only enable on 
Darwin because:
- All glibc targets are covered already by using glibc  function calls
- I don’t know if there are other aarch64 targets that exist, support Fortran 
and IEEE, but don’t have glibc

I’d suggest other non-glibc aarch64 targets could be added to the support and 
enable this code, but I don’t want to do it unless it’s been tested there. IEEE 
support is optional in Fortran, so I suggest we keep it “opt-in”: targets where 
it’s known to work enable it, but it’s off by default on other targets.

I hope this explains the rationale.

FX

Re: [PATCH] nvptx: Use cvt to perform sign-extension of truncation.

2021-12-08 Thread Tom de Vries via Gcc-patches

On 8/27/21 12:07 PM, Roger Sayle wrote:
> 
> This patch introduces some new define_insn rules to the nvptx backend,
> to perform sign-extension of a truncation (from and to the same mode),
> using a single cvt instruction.  As an example, the following function
> 
> int foo(int x) { return (char)x; }
> 
> with -O2 currently generates:
> 
>   mov.u32 %r24, %ar0;
>   mov.u32 %r26, %r24;
>   cvt.s32.s8  %value, %r26;
> 
> and with this patch, now generates:
> 
>   mov.u32 %r24, %ar0;
>   cvt.s32.s8  %value, %r24;
> 
> This patch has been tested on nvptx-none hosted by x86_64-pc-linux-gnu
> with a top-level "make" (including newlib) and a "make check" with no
> new regressions.  Ok for mainline?
> 

Hi Roger,

thanks for the patch and apologies for the delay.

I was not happy with negative-testing-only in the test-case, so I've
split it up into 6 separate tests.

The first one is not influenced by the patch, so I've committed that one
separately.

The other 5 I've committed together with the patch.

Thanks,
- Tom

> 
> 2021-08-27  Roger Sayle  
> 
> gcc/ChangeLog
>   * config/nvptx/nvptx.md (*extend_trunc_2_qi,
>   *extend_trunc_2_hi, *extend_trunc_di2_si): New insns.
>   Use cvt to perform sign-extension of truncation in one step.
> 
> gcc/testsuite/ChangeLog
>   * gcc.target/nvptx/exttrunc.c: New test case.
> 
> 
> Roger
> --
>

Re: [PATCH 2/2][GCC] arm: Declare MVE types internally via pragma

2021-12-08 Thread Murray Steele via Gcc-patches

Hi,

Thank you for the feedback, I'll make the noted changes to the changelog and
add the missing end-of-namespace comments.

On 08/12/2021 15:23, Richard Earnshaw wrote:

> diff --git a/gcc/config/arm/arm-mve-builtins.def 
> b/gcc/config/arm/arm-mve-builtins.def
> new file mode 100644
> index 
> ..02a46bec3e4cba6add9bce4021c732e15aa8b012
> --- /dev/null
> +++ b/gcc/config/arm/arm-mve-builtins.def
> @@ -0,0 +1,41 @@
> 
> +#ifndef DEF_MVE_TYPE
> +#define DEF_MVE_TYPE(A, B)
> +#endif
> 
> When would this file ever be included when this macro wasn't defined? Better 
> to require the caller to define this by using #error if it's missing.
> 
> then...
> 
> +
> +#undef DEF_MVE_TYPE
> 
> This isn't needed anymore, because caller should undef it after use.

I'd added this because later patches that build from this series will most
likely need to define further DEF_MVE_* macros, in the style of the current
SVE implementation. You are right that it is unnecessary for right now though,
and I'll remove it too.

Thanks again,

Murray

Re: [patch, Fortran] IEEE support for aarch64-apple-darwin

2021-12-08 Thread Richard Earnshaw via Gcc-patches





On 06/12/2021 16:32, FX via Gcc-patches wrote:

Hi everyone,

Since support for target aarch64-apple-darwin has been submitted for review, 
it’s time to submit the Fortran part, i.e. enabling IEEE support on that target.

The patch has been in use now for several months, in a developer branch shipped 
by some distros on macOS (including Homebrew). It was authored more than a year 
ago, but I figured it wasn’t relevant to submit until the target was actually 
close to be in trunk: 
https://github.com/iains/gcc-darwin-arm64/commit/b107973550d3d9a9ce9acc751adbbe2171d13736

Bootstrapped and tested on aarch64-apple-darwin20 (macOS Big Sur) and 
aarch64-apple-darwin21 (macOS Monterey).

OK to merge?
Can someone point me to the right way of formatting ChangeLogs and commit 
entries, nowadays?


Thanks,
FX



This isn't a full review, but I do have a question: is this really 
specific to Darwin?  or is it really generic aarch64 code?  If the 
former, then the file name is not right and it should reflect the 
darwin-specific nature of the contents.  If the latter, then I wonder 
why most other fortran targets don't seem to have specific 
implementations of this.


R.

Re: [PATCH 2/2][GCC] arm: Declare MVE types internally via pragma

2021-12-08 Thread Richard Earnshaw via Gcc-patches





On 25/11/2021 09:42, Murray Steele via Gcc-patches wrote:

Changes from original patch:

1. Merged test_redef_* test files into one
2. Encapsulated contents of arm-mve-builtins.h in namespace arm_mve (missed
in initial patch).
3. Added extern declarations for scalar_types and acle_vector types to
arm-mve-builtins.h (missed in initial patch).
4. Added arm-mve-builtins.(cc|h) to gt_targets for arm-*-*-* (missed in
initial patch).
5. Added include for gt-arm-mve-builtins.h to arm-mve-builtins.cc (missed in
initial patch).
6. Removed explicit initialisation of handle_arm_mve_types_p as it is unneeded.

---

This patch moves the implementation of MVE ACLE types from
arm_mve_types.h to inside GCC via a new pragma, which replaces the prior
type definitions. This allows for the types to be used internally for
intrinsic function definitions.

Bootstrapped and regression tested on arm-none-linux-gnuabihf, and
regression tested on arm-eabi -- no issues.

Thanks,
Murray



Nearly there, but...

Your changelog entry needs some work:


gcc/ChangeLog:

 * config.gcc: Add arm-mve-builtins.o to extra_objs for arm-*-*-*
 targets.


 * config.gcc (arm*-*-*): Add arm-mve-builtins.o to extra_objs.


 * config/arm/arm-c.c (arm_pragma_arm): Handle new pragma.


 ... Handle "#pragma GCC arm".


 (arm_register_target_pragmas): Register new pragma.

 ... Register it.


 * config/arm/arm-protos.h: Add arm_mve namespace and declare
 arm_handle_mve_types_h.


 ...arm-protos.h (arm_mve::arm_handle_mve_types.h): New prototype.


 * config/arm/arm_mve_types.h: Replace MVE type definitions with
 new pragma.
 * config/arm/t-arm: Add arm-mve-builtins.o target.


.../t-arm (arm-mve-builtins.o): New target rule.


 * config/arm/arm-mve-builtins.cc: New file.
 * config/arm/arm-mve-builtins.def: New file.
 * config/arm/arm-mve-builtins.h: New file.

gcc/testsuite/ChangeLog:

 * gcc.target/arm/mve/mve.exp: Add new subdirectories.
 * gcc.target/arm/mve/general-c/type_redef_1.c: New test.
 * gcc.target/arm/mve/general/double_pragmas_1.c: New test.
 * gcc.target/arm/mve/general/nomve_1.c: New test.




diff --git a/gcc/config/arm/arm-mve-builtins.def 
b/gcc/config/arm/arm-mve-builtins.def

new file mode 100644
index 
..02a46bec3e4cba6add9bce4021c732e15aa8b012

--- /dev/null
+++ b/gcc/config/arm/arm-mve-builtins.def
@@ -0,0 +1,41 @@

+#ifndef DEF_MVE_TYPE
+#define DEF_MVE_TYPE(A, B)
+#endif

When would this file ever be included when this macro wasn't defined? 
Better to require the caller to define this by using #error if it's missing.


then...

+
+#undef DEF_MVE_TYPE

This isn't needed anymore, because caller should undef it after use.

diff --git a/gcc/config/arm/arm-mve-builtins.cc 
b/gcc/config/arm/arm-mve-builtins.cc

new file mode 100644
index 
..99ddc8d49aad39e057c1c0d349c6c02c278553d6

--- /dev/null
+++ b/gcc/config/arm/arm-mve-builtins.cc

...
+}
+
+}
+
+using namespace arm_mve;

Add a comment that this is the end of the earlier namespace declaration.

diff --git a/gcc/config/arm/arm-mve-builtins.h 
b/gcc/config/arm/arm-mve-builtins.h

new file mode 100644
index 
..c165ce6997b4560fc87626be4bbaa0e8afcbbfed

--- /dev/null
+++ b/gcc/config/arm/arm-mve-builtins.h
@@ -0,0 +1,41 @@


...
+}
+

Likewise.

R.

Re: [PATCH] Use -fopt-info in unswitch pass.

2021-12-08 Thread Richard Biener via Gcc-patches

On December 8, 2021 10:32:20 AM GMT+01:00, "Martin Liška"  
wrote:
>The patch is about porting of dump information to -fopt-info so
>that we can compare the current pass with a modified one.
>
>Right now, there are 1945 'optimized: Unswitching loop on condition' lines
>for SPEC 2006 benchmark.
>
>Moreover, I adjusted dump functions in profile-count.{ch} so that it dumps
>to a string buffer.
>
>Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
>Ready to be installed?
>Thanks,
>Martin
>
>gcc/ChangeLog:
>
>   * profile-count.c (profile_count::dump): Add function
>   that can dump to a provided buffer.
>   (profile_probability::dump): Likewise.
>   * profile-count.h: Likewise.
>   * tree-ssa-loop-unswitch.c (tree_unswitch_single_loop):
>   Use dump_printf_loc infrastructure.
>   (tree_unswitch_outer_loop): Likewise.
>   (find_loop_guard): Likewise.
>   (hoist_guard): Likewise.
>
>gcc/testsuite/ChangeLog:
>
>   * gcc.dg/loop-unswitch-1.c: Adjust test-case based on
>   dump_printf_loc.
>   * gcc.dg/loop-unswitch-2.c: Likewise.
>   * gcc.dg/loop-unswitch-3.c: Likewise.
>   * gcc.dg/loop-unswitch-4.c: Likewise.
>   * gcc.dg/loop-unswitch-5.c: Likewise.
>---
>  gcc/profile-count.c|  48 ++---
>  gcc/profile-count.h|   6 ++
>  gcc/testsuite/gcc.dg/loop-unswitch-1.c |   4 +-
>  gcc/testsuite/gcc.dg/loop-unswitch-2.c |   5 +-
>  gcc/testsuite/gcc.dg/loop-unswitch-3.c |   7 +-
>  gcc/testsuite/gcc.dg/loop-unswitch-4.c |   1 -
>  gcc/testsuite/gcc.dg/loop-unswitch-5.c |   2 +-
>  gcc/tree-ssa-loop-unswitch.c   | 131 +++--
>  8 files changed, 128 insertions(+), 76 deletions(-)
>
>diff --git a/gcc/profile-count.c b/gcc/profile-count.c
>index f7f4dffdc90..c04b4fe823d 100644
>--- a/gcc/profile-count.c
>+++ b/gcc/profile-count.c
>@@ -84,18 +84,28 @@ const char *profile_quality_display_names[] =
>"precise"
>  };
>  
>-/* Dump THIS to F.  */
>+/* Dump THIS to BUFFER.  */
>  
>  void
>-profile_count::dump (FILE *f) const
>+profile_count::dump (char *buffer) const
>  {
>if (!initialized_p ())
>-fprintf (f, "uninitialized");
>+sprintf (buffer, "uninitialized");
>else
>-fprintf (f, "%" PRId64 " (%s)", m_val,
>+sprintf (buffer, "%" PRId64 " (%s)", m_val,
>profile_quality_display_names[m_quality]);
>  }
>  
>+/* Dump THIS to F.  */
>+
>+void
>+profile_count::dump (FILE *f) const
>+{
>+  char buffer[64];
>+  dump (buffer);
>+  fputs (buffer, f);
>+}
>+
>  /* Dump THIS to stderr.  */
>  
>  void
>@@ -151,32 +161,44 @@ profile_count::stream_out (struct lto_output_stream *ob)
>streamer_write_uhwi_stream (ob, m_quality);
>  }
>  
>-/* Dump THIS to F.  */
>+
>+/* Output THIS to BUFFER.  */
>  
>  void
>-profile_probability::dump (FILE *f) const
>+profile_probability::dump (char *buffer) const
>  {
>if (!initialized_p ())
>-fprintf (f, "uninitialized");
>+sprintf (buffer, "uninitialized");
>else
>  {
>/* Make difference between 0.00 as a roundoff error and actual 0.
>Similarly for 1.  */
>if (m_val == 0)
>-fprintf (f, "never");
>+  buffer += sprintf (buffer, "never");
>else if (m_val == max_probability)
>-fprintf (f, "always");
>+  buffer += sprintf (buffer, "always");
>else
>-fprintf (f, "%3.1f%%", (double)m_val * 100 / max_probability);
>+  buffer += sprintf (buffer, "%3.1f%%", (double)m_val * 100 / 
>max_probability);
>+
>if (m_quality == ADJUSTED)
>-  fprintf (f, " (adjusted)");
>+  sprintf (buffer, " (adjusted)");
>else if (m_quality == AFDO)
>-  fprintf (f, " (auto FDO)");
>+  sprintf (buffer, " (auto FDO)");
>else if (m_quality == GUESSED)
>-  fprintf (f, " (guessed)");
>+  sprintf (buffer, " (guessed)");
>  }
>  }
>  
>+/* Dump THIS to F.  */
>+
>+void
>+profile_probability::dump (FILE *f) const
>+{
>+  char buffer[64];
>+  dump (buffer);
>+  fputs (buffer, f);
>+}
>+
>  /* Dump THIS to stderr.  */
>  
>  void
>diff --git a/gcc/profile-count.h b/gcc/profile-count.h
>index c7a45ac5ee3..f86091f23a8 100644
>--- a/gcc/profile-count.h
>+++ b/gcc/profile-count.h
>@@ -609,6 +609,9 @@ public:
>/* Output THIS to F.  */
>void dump (FILE *f) const;
>  
>+  /* Output THIS to BUFFER.  */
>+  void dump (char *buffer) const;
>+
>/* Print THIS to stderr.  */
>void debug () const;
>  
>@@ -1208,6 +1211,9 @@ public:
>/* Output THIS to F.  */
>void dump (FILE *f) const;
>  
>+  /* Output THIS to BUFFER.  */
>+  void dump (char *buffer) const;
>+
>/* Print THIS to stderr.  */
>void debug () const;
>  
>diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-1.c 
>b/gcc/testsuite/gcc.dg/loop-unswitch-1.c
>index de2fb2c0e4b..f9d628df510 100644
>--- a/gcc/testsuite/gcc.dg/loop-unswitch-1.c
>+++ b/gcc/testsuite/gcc.dg/loop-unswitch-1.c
>@@ -1,6 +1,6 @@
>  /* For PR rtl-optimization/27735  */

Re: [PATCH v3] c++: Handle auto(x) in parameter-declaration-clause [PR103401]

2021-12-08 Thread Jason Merrill via Gcc-patches


On 12/7/21 19:25, Marek Polacek wrote:

On Mon, Dec 06, 2021 at 04:44:06PM -0500, Jason Merrill wrote:

Please also make this change to cp_parser_sizeof_operand, and add tests
involving sizeof/alignof in array bounds.  OK with that change.


Turns out we reject sizeof(auto(4)) because cp_parser_type_id_1 errors
"invalid use of auto".  So I've added a hack to make it work; auto(x)
is *not* a type-id, so reject that parse and let it be parsed as an
expression.

FWIW, I don't think we need to clear auto_is_implicit_function_template_parm_p
in cp_parser_sizeof_operand for parameters like int[sizeof(auto(10))] because
the auto is in a declarator and auto_is_... will have been cleared already in
cp_parser_parameter_declaration before parsing the declarator.  But I've added
it anyway, maybe there are other cases where it matters.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In C++23, auto(x) is valid, so decltype(auto(x)) should also be valid,
so

   void f(decltype(auto(0)));

should be just as

   void f(int);

but currently, everytime we see 'auto' in a parameter-declaration-clause,
we try to synthesize_implicit_template_parm for it, creating a new template
parameter list.  The code above actually has us calling s_i_t_p twice;
once from cp_parser_decltype_expr -> cp_parser_postfix_expression which
fails and then again from cp_parser_decltype_expr -> cp_parser_expression.
So it looks like we have f and we accept ill-formed code.

This shows that we need to be more careful about synthesizing the
implicit template parameter.  [dcl.spec.auto.general] says that "A
placeholder-type-specifier of the form type-constraintopt auto can be
used as a decl-specifier of the decl-specifier-seq of a
parameter-declaration of a function declaration or lambda-expression..."
so this patch turns off auto_is_... after we've parsed the decl-specifier-seq.

That doesn't quite cut it yet though, because we also need to handle an
auto nested in the decl-specifier:

   void f(decltype(new auto{0}));

therefore the cp_parser_decltype change.

To accept "sizeof(auto{10})", the cp_parser_type_id_1 hunk rejects the
current parse if it sees an auto followed by a ( or {.


The problem here doesn't seem specific to the ( or {, but that we're 
giving a hard error in tentative parsing context; I think we want to 
guard that error with cp_parser_simulate_error like we do a few lines 
earlier for class template placeholders.



The second hunk broke lambda-generic-85713-2.C but I think the error we
issue with this patch is in fact correct, and clang++ agrees.


I don't think this is the second hunk anymore.  :)


The r11-1913 change is OK: we need to make sure that we see '(auto)' after
decltype to go ahead with 'decltype(auto)'.

PR c++/103401

gcc/cp/ChangeLog:

* parser.c (cp_parser_decltype): Clear
auto_is_implicit_function_template_parm_p.
(cp_parser_type_id_1): Reject this parse if we see auto(x) or auto{x}.
(cp_parser_parameter_declaration): Clear
auto_is_implicit_function_template_parm_p after parsing the
decl-specifier-seq.
(cp_parser_sizeof_operand): Clear
auto_is_implicit_function_template_parm_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/lambda-generic-85713-2.C: Add dg-error.
* g++.dg/cpp23/auto-fncast7.C: New test.
* g++.dg/cpp23/auto-fncast8.C: New test.
* g++.dg/cpp23/auto-fncast9.C: New test.
---
  gcc/cp/parser.c   | 32 ++
  .../g++.dg/cpp1y/lambda-generic-85713-2.C |  2 +-
  gcc/testsuite/g++.dg/cpp23/auto-fncast7.C |  9 
  gcc/testsuite/g++.dg/cpp23/auto-fncast8.C | 42 +++
  gcc/testsuite/g++.dg/cpp23/auto-fncast9.C | 17 
  5 files changed, 101 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast7.C
  create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast8.C
  create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast9.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 55e6a1a8b3a..6f9f84631e5 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -16432,6 +16432,16 @@ cp_parser_decltype (cp_parser *parser)
= parser->greater_than_is_operator_p;
parser->greater_than_is_operator_p = true;
  
+  /* Don't synthesize an implicit template type parameter here.  This

+could happen with C++23 code like
+
+  void f(decltype(new auto{0}));
+
+where we want to deduce the auto right away so that the parameter
+is of type 'int *'.  */
+  auto cleanup = make_temp_override
+   (parser->auto_is_implicit_function_template_parm_p, false);
+
/* Do not actually evaluate the expression.  */
++cp_unevaluated_operand;
  
@@ -24142,6 +24152,16 @@ cp_parser_type_id_1 (cp_parser *parser, cp_parser_flags flags,

  /* OK */;
else if (parser->in_result_type_constraint_p)
  /* OK */;
+   else if ((cp_le

Re: [PATCH] c++: Support &typeid(x) == &typeid(y) and typeid(x) == typeid(y) in constant evaluation [PR103600]

2021-12-08 Thread Jonathan Wakely via Gcc-patches

On Wed, 8 Dec 2021 at 13:53, Jason Merrill wrote:
> During constant evaluation, the operator== could compare the type_info
> address instead of the __name address, reducing this to the previous
> problem.

That makes sense to me. We might still want the libstdc++ changes in
case other compilers choose to do this differently. I'll get in touch
with some Clang and EDG people about that.

Re: [PATCH] c++: Support &typeid(x) == &typeid(y) and typeid(x) == typeid(y) in constant evaluation [PR103600]

2021-12-08 Thread Jason Merrill via Gcc-patches


On 12/8/21 05:35, Jakub Jelinek wrote:

Hi!

If the tinfo vars are emitted in the current TU, they are emitted at the end
of the compilation, and for some types they are exported from
libstdc++/libsupc++ and not emitted in the current TU at all.

The following patch allows constant folding of comparisons of typeid
addresses and makes it possible to implement P1328R1 - making type_info
operator== constexpr (Jonathan has a patch for that).

As mentioned in the PR, the varpool/middle-end code is trying to be
conservative with address comparisons of different vars if those vars
don't bind locally, because of possible aliases in other TUs etc.


Would it make sense to assume that DECL_ARTIFICIAL variables can't be 
aliases?  If not, could we have some way of marking a variable as 
non-aliasing, perhaps an attribute?



and so while match.pd folds &typeid(int) == &typeid(int) because
it is equality comparison with the same operands, for different typeids
it doesn't fold it.  The first constexpr.c hunk takes care of that
during constant evaluation.  As the std::type_info class has a protected
member __name only, I think people shouldn't be comparing anything but
the starts of those vars, so we don't need to deal with offsets into
those vars etc.

The other constexpr.c hunk and rtti.c change is to allow constexpr
operator== that will in the manifestly constant evaluation just
compare the __name members for equality.  Again, the _ZTS* variables
aren't constructed yet and for some types will not be at all.
So, what this patch does is when evaluating typeid(whatever).__name
fold the INDIRECT_REF on (const std::type_info *) &_ZTIwhatever
into a CONSTRUCTOR_NO_CLEARING ctor with just __name field initialized
to &"1S" etc.


During constant evaluation, the operator== could compare the type_info 
address instead of the __name address, reducing this to the previous 
problem.



Testcase coverage includes just the &typeid comparison, for typeid
comparison it will be in the libstdc++ patch.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-12-08  Jakub Jelinek  

PR c++/103600
* cp-tree.h (tinfo_for_constant_evaluation): Declare.
* constexpr.c (cxx_eval_binary_expression): Evaluate
&typeid(x) == &typeid(y) to true or false depending on whether
the typeid is the same or not.
(cxx_eval_indirect_ref): Fold typeid(S) to {.__name="1S"}.
* rtti.c (tinfo_for_constant_evaluation): New function.

* g++.dg/cpp0x/constexpr-typeid2.C: New test.

--- gcc/cp/cp-tree.h.jj 2021-12-04 11:02:03.925646583 +0100
+++ gcc/cp/cp-tree.h2021-12-07 16:58:06.698802940 +0100
@@ -7376,6 +7376,7 @@ extern tree build_dynamic_cast(locati
 tsubst_flags_t);
  extern void emit_support_tinfos   (void);
  extern bool emit_tinfo_decl   (tree);
+extern tree tinfo_for_constant_evaluation  (tree);
  extern unsigned get_pseudo_tinfo_index(tree);
  extern tree get_pseudo_tinfo_type (unsigned);
  extern tree build_if_nonnull  (tree, tree, tsubst_flags_t);
--- gcc/cp/constexpr.c.jj   2021-12-04 11:02:03.949646241 +0100
+++ gcc/cp/constexpr.c  2021-12-07 17:32:07.324739082 +0100
@@ -3346,6 +3346,25 @@ cxx_eval_binary_expression (const conste
lhs = cplus_expand_constant (lhs);
else if (TREE_CODE (rhs) == PTRMEM_CST)
rhs = cplus_expand_constant (rhs);
+
+  /* Fold &typeid(x) == &typeid(y) to
+whether the DECL_TINFO_P vars are the same or not.  */
+  if (flag_rtti)
+   {
+ tree op0 = lhs;
+ tree op1 = rhs;
+ STRIP_NOPS (op0);
+ STRIP_NOPS (op1);
+ if (TREE_CODE (op0) == ADDR_EXPR
+ && VAR_P (TREE_OPERAND (op0, 0))
+ && DECL_TINFO_P (TREE_OPERAND (op0, 0))
+ && TREE_CODE (op1) == ADDR_EXPR
+ && VAR_P (TREE_OPERAND (op1, 0))
+ && DECL_TINFO_P (TREE_OPERAND (op1, 0)))
+   r = constant_boolean_node ((TREE_OPERAND (op0, 0)
+   != TREE_OPERAND (op1, 0))
+  ^ is_code_eq, type);
+   }
  }
if (code == POINTER_PLUS_EXPR && !*non_constant_p
&& integer_zerop (lhs) && !integer_zerop (rhs))
@@ -5261,6 +5280,18 @@ cxx_eval_indirect_ref (const constexpr_c
  STRIP_NOPS (sub);
  if (TREE_CODE (sub) == ADDR_EXPR)
{
+ /* For *(const std::type_info *)&_ZTIwhatever return
+a constructor with no clearing and __name being the
+tinfo name.  */
+ if (flag_rtti
+ && VAR_P (TREE_OPERAND (sub, 0))
+ && DECL_TINFO_P (TREE_OPERAND (sub, 0))
+ && !lval
+ && TREE_TYPE (t) == const_type_info_type_node)
+   if (tree ret
+   = tinfo_for_constant_eva

Re: [PATCH] testsuite: Use attribute "noipa" in sibcall tests

2021-12-08 Thread Richard Sandiford via Gcc-patches

Hans-Peter Nilsson via Gcc-patches  writes:
> ...instead of attribute "noinline".
>
> For cris-elf, testsuite/gcc.dg/sibcall-3.c and sibcall-4.c "XPASS",
> without sibcalls being implemented.  On inspection, recurser_void2 is
> set to be an assembly-level alias for recurser_void1 as in
> ".set _recurser_void2,_recurser_void1" for both these cases.
>
> IOW, those "__attribute__((noinline))" should be
> "__attribute__((noipa))".  The astute reader will notice that I also
> adjust test-cases where self-recursion should occur: as mentioned in
> sibcall-1.c "self-recursion tail calls are optimized for all targets,
> regardless of presence of sibcall patterns".  But, that optimization
> happens even with "noipa", as observed by the test-cases still passing
> for cris-elf after patching.  Being of a small mind, I like
> consistency, but not all the time, so there's hope.
>
> Now, editing test-cases is usually frowned upon, but I have
> it from authoritative sources that the intent was as
> patched.  Ok to commit?
>
> testsuite:
>   * gcc.dg/sibcall-1.c, gcc.dg/sibcall-10.c,
>   gcc.dg/sibcall-2.c, gcc.dg/sibcall-3.c,
>   gcc.dg/sibcall-4.c, gcc.dg/sibcall-9.c: Replace
>   attribute "noinline" with "noipa".

OK, thanks.

Richard

> Change-Id: I001030b314883170e6ff7d77e38ed0c7b57fecec
> ---
>  gcc/testsuite/gcc.dg/sibcall-1.c  | 2 +-
>  gcc/testsuite/gcc.dg/sibcall-10.c | 6 +++---
>  gcc/testsuite/gcc.dg/sibcall-2.c  | 2 +-
>  gcc/testsuite/gcc.dg/sibcall-3.c  | 6 +++---
>  gcc/testsuite/gcc.dg/sibcall-4.c  | 6 +++---
>  gcc/testsuite/gcc.dg/sibcall-9.c  | 6 +++---
>  6 files changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/sibcall-1.c 
> b/gcc/testsuite/gcc.dg/sibcall-1.c
> index 367ee4374e16..f2bd3ade2176 100644
> --- a/gcc/testsuite/gcc.dg/sibcall-1.c
> +++ b/gcc/testsuite/gcc.dg/sibcall-1.c
> @@ -47,7 +47,7 @@ recurser_void (int n)
>  
>  void *trackpoint;
>  
> -void __attribute__ ((noinline))
> +void __attribute__ ((noipa))
>  track (int n)
>  {
>char stackpos[1];
> diff --git a/gcc/testsuite/gcc.dg/sibcall-10.c 
> b/gcc/testsuite/gcc.dg/sibcall-10.c
> index d6668dc748da..f692869a99e6 100644
> --- a/gcc/testsuite/gcc.dg/sibcall-10.c
> +++ b/gcc/testsuite/gcc.dg/sibcall-10.c
> @@ -46,7 +46,7 @@ int main ()
> reasonably sure is to make them have the same contents (regarding the
> n tests).  */
>  
> -static void __attribute__((noinline)) ATTR
> +static void __attribute__((noipa)) ATTR
>  recurser_void1 (void)
>  {
>if (n == 0 || n == 7 || n == 8)
> @@ -58,7 +58,7 @@ recurser_void1 (void)
>recurser_void2 ();
>  }
>  
> -static void __attribute__((noinline)) ATTR
> +static void __attribute__((noipa)) ATTR
>  recurser_void2 (void)
>  {
>if (n == 0 || n == 7 || n == 8)
> @@ -73,7 +73,7 @@ recurser_void2 (void)
>  
>  void *trackpoint;
>  
> -void __attribute__ ((noinline))
> +void __attribute__ ((noipa))
>  track ()
>  {
>char stackpos[1];
> diff --git a/gcc/testsuite/gcc.dg/sibcall-2.c 
> b/gcc/testsuite/gcc.dg/sibcall-2.c
> index a626273e6200..0d4df82b0ca4 100644
> --- a/gcc/testsuite/gcc.dg/sibcall-2.c
> +++ b/gcc/testsuite/gcc.dg/sibcall-2.c
> @@ -38,7 +38,7 @@ recurser_void (void)
>  
>  void *trackpoint;
>  
> -void __attribute__ ((noinline))
> +void __attribute__ ((noipa))
>  track ()
>  {
>char stackpos[1];
> diff --git a/gcc/testsuite/gcc.dg/sibcall-3.c 
> b/gcc/testsuite/gcc.dg/sibcall-3.c
> index 77227824827a..8f6e776937a8 100644
> --- a/gcc/testsuite/gcc.dg/sibcall-3.c
> +++ b/gcc/testsuite/gcc.dg/sibcall-3.c
> @@ -40,7 +40,7 @@ int main ()
> reasonably sure is to make them have the same contents (regarding the
> n tests).  */
>  
> -static void __attribute__((noinline)) ATTR
> +static void __attribute__((noipa)) ATTR
>  recurser_void1 (int n)
>  {
>if (n == 0 || n == 7 || n == 8)
> @@ -52,7 +52,7 @@ recurser_void1 (int n)
>recurser_void2 (n + 1);
>  }
>  
> -static void __attribute__((noinline)) ATTR
> +static void __attribute__((noipa)) ATTR
>  recurser_void2 (int n)
>  {
>if (n == 0 || n == 7 || n == 8)
> @@ -66,7 +66,7 @@ recurser_void2 (int n)
>  
>  void *trackpoint;
>  
> -void __attribute__ ((noinline))
> +void __attribute__ ((noipa))
>  track (int n)
>  {
>char stackpos[1];
> diff --git a/gcc/testsuite/gcc.dg/sibcall-4.c 
> b/gcc/testsuite/gcc.dg/sibcall-4.c
> index 7fa51cab0dcb..a587c7254959 100644
> --- a/gcc/testsuite/gcc.dg/sibcall-4.c
> +++ b/gcc/testsuite/gcc.dg/sibcall-4.c
> @@ -41,7 +41,7 @@ int main ()
> reasonably sure is to make them have the same contents (regarding the
> n tests).  */
>  
> -static void __attribute__((noinline)) ATTR
> +static void __attribute__((noipa)) ATTR
>  recurser_void1 (void)
>  {
>if (n == 0 || n == 7 || n == 8)
> @@ -53,7 +53,7 @@ recurser_void1 (void)
>recurser_void2 ();
>  }
>  
> -static void __attribute__((noinline)) ATTR
> +static void __attribute__((noipa)) ATTR
>  recurser_void2 (void)
>  {
>if (n == 0 || n == 7 || n == 8)
> @

Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1

2021-12-08 Thread Jonathan Wakely via Gcc-patches

I've pushed this change to trunk now (it was posted and reviewed in
stage 1, I just didn't get around to pushing it until now).

The final version of the patch is attached to this mail.

Thanks for the nice optimization, Maged!


On Wed, 4 Aug 2021 at 20:49, Maged Michael via Libstdc++
 wrote:
>
> On Wed, Aug 4, 2021 at 3:32 PM Jonathan Wakely 
> wrote:
>
> > On Wed, 4 Aug 2021 at 18:19, Maged Michael wrote:
> > >
> > > Sorry. I totally missed the rest of your message and the patch. My fuzzy
> > eyesight, which usually guesses correctly 90% of the time, mistook
> > "Secondly" on a line by itself for "Sincerely" :-)
> >
> > :-)
> >
> > > The noinlining was based on looking at generated code. That was for
> > clang. It was inlining the _M_last_use function for every instance of
> > _M_release (e.g., ~shared_ptr). This optimization with the noinline for
> > _M_release_last_use ended up reducing massive binary text sizes by 1.5%
> > (several megabytes)., which was an extra benefit. Without the noinline we
> > saw code size increase.
> >
> > Wow, that is a convincing argument for making it not inline, thanks.
> >
> > > IIUC, we van use the following. Right?
> > >
> > > __attribute__((__noinline__))
> >
> > Right.
> >
> > > I didn't understand the part about programmers doing #define noinline 1.
> > I don't see code in the patch that uses noinline.
> >
> > This is a valid C++ program:
> >
> > #define noinline 1
> > #include 
> > int main() { }
> >
> > But if anything in  uses "noinline" then this valid program
> > will not compile. Which is why we must use ((__noinline__)) instead of
> > ((noinline)).
> >
> > Thanks. Now I get it.
>
>
> >
> >
> > >
> > > How about something like this comment?
> > >
> > > // Noinline to avoid code size increase.
> >
> > Great, thanks.
> >
> > On Wed, 4 Aug 2021 at 18:34, Maged Michael wrote:
> > > Actually I take back what I said. Sorry. I think the logic in your patch
> > is correct. I missed some of the atomic decrements.
> > > But I'd be concerned about performance. If we make _M_release_last_use
> > noinline then we are adding overhead to the fast path of the original logic
> > (where both counts are 1).
> >
> > Oh, I see. So the code duplication serves a purpose. We want the
> > _M_release_last_use() code to be non-inline for the new logic, because
> > in the common case we never call it (we either detect that both counts
> > are 1 and do the dispose & destroy without atomic decrements, or we do
> > a single decrement and don't dispose or destroy anything). But for the
> > old logic, we do want that code inlined into _M_release (or
> > _M_release_orig as it was called in your patch). Have I got that right
> > now?
> >
> > Yes. Completely right.
>
>
> > What if we remove the __noinline__ from _M_release_last_use() so that
> > it can be inlined, but than add a noinline wrapper that can be called
> > when we don't want to inline it?
> >
> > So:
> >   // Called by _M_release() when the use count reaches zero.
> >   void
> >   _M_release_last_use() noexcept
> >   {
> > // unchanged from previous patch, but without the attribute.
> > // ...
> >   }
> >
> >   // As above, but 'noinline' to reduce code size on the cold path.
> >   __attribute__((__noinline__))
> >   void
> >   _M_release_last_use_cold() noexcept
> >   { _M_release_last_use(); }
> >
> >
> > And then:
> >
> >   template<>
> > inline void
> > _Sp_counted_base<_S_atomic>::_M_release() noexcept
> > {
> >   _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
> > #if ! _GLIBCXX_TSAN
> >   constexpr bool __lock_free
> > = __atomic_always_lock_free(sizeof(long long), 0)
> > && __atomic_always_lock_free(sizeof(_Atomic_word), 0);
> >   constexpr bool __double_word
> > = sizeof(long long) == 2 * sizeof(_Atomic_word);
> >   // The ref-count members follow the vptr, so are aligned to
> >   // alignof(void*).
> >   constexpr bool __aligned = __alignof(long long) <= alignof(void*);
> >   if _GLIBCXX17_CONSTEXPR (__lock_free && __double_word && __aligned)
> > {
> >   constexpr long long __unique_ref
> > = 1LL + (1LL << (__CHAR_BIT__ * sizeof(_Atomic_word)));
> >   auto __both_counts = reinterpret_cast(&_M_use_count);
> >
> >   _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
> >   if (__atomic_load_n(__both_counts, __ATOMIC_ACQUIRE) ==
> > __unique_ref)
> > {
> >   // Both counts are 1, so there are no weak references and
> >   // we are releasing the last strong reference. No other
> >   // threads can observe the effects of this _M_release()
> >   // call (e.g. calling use_count()) without a data race.
> >   *(long long*)(&_M_use_count) = 0;
> >   _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
> >   _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_c

Re: [PATCH] [2/2] arm: add arm bti pass

2021-12-08 Thread Richard Earnshaw via Gcc-patches





On 05/11/2021 08:55, Andrea Corallo via Gcc-patches wrote:

Hi all,

this patch enables Branch Target Identification Armv8.1-M Mechanism
[1].

This is achieved by moving and generalizing the Aarch64 "bti" pass so
it can be used also by the Arm backend.

The pass iterates through the instructions and adds the necessary BTI
instructions at the beginning of every function and at every landing
pads targeted by indirect jumps.

Regressioned and arm-linux-gnu aarch64-linux-gnu bootstraped.

Best Regards

   Andrea

[1] 




I think this patch needs splitting into two parts.  The first part is 
simply the restructuring of the existing aarch64 code (and function 
renaming - mostly aarch64 -> aarch) the second adds the new arm code. 
The first patch is essentially a nop but takes care of all the 
uninteresting changes that are needed to enable part two.


R.

Re: [PATCH] [1/2] arm: Implement cortex-M return signing address codegen

2021-12-08 Thread Richard Earnshaw via Gcc-patches





On 05/11/2021 08:52, Andrea Corallo via Gcc-patches wrote:

Hi all,

this patch enables address return signature and verification based on
Armv8.1-M Pointer Authentication [1].

To sign the return address, we use the PAC R12, LR, SP instruction
upon function entry.  This is signing LR using SP and storing the
result in R12.  R12 will be pushed into the stack.

During function epilogue R12 will be popped and AUT R12, LR, SP will
be used to verify that the content of LR is still valid before return.

Here an example of PAC instrumented function prologue and epilogue:

 pac r12, lr, sp
 push{r3, r7, lr}
 push{r12}
 sub sp, sp, #4


Which, as shown here, generates a stack which does not preserve 8-byte 
alignment.


Also, what's wrong with

pac r12, lr, sp
push{r3, r7, ip, lr}
?

Which saves 2 bytes in the prologue and ...


 [...] function body
 add sp, sp, #4
 pop {r12}
 pop {r3, r7, lr}
 aut r12, lr, sp
 bx  lr


pop {r3, r7, ip, lr}
aut r12, lr, sp
bx  lr

which saves 4 bytes in the epilogue (repeated for each instance of the 
epilogue).




The patch also takes care of generating a PACBTI instruction in place
of the sequence BTI+PAC when Branch Target Identification is enabled
contextually.



What about variadic functions?

What about functions where lr is live on entry (where it's used for 
passing the closure in nested functions)?




These two patches apply on top of Tejas series posted here [2].

Regressioned and arm-linux-gnu aarch64-linux-gnu bootstraped.

Best Regards

   Andrea

[1] 

[2] 




+static bool arm_pac_enabled_for_curr_function_p (void);

I really don't like that name.  There are a lot of functions with 
variations of 'current function' in the name already and this creates 
yet another variant.  Something like 
arm_current_function_pac_enabled_p() would be preferable; or, if that 
really is too long, use 'current_func' which already has usage within 
the compiler.


+(define_insn "pac_ip_lr_sp"
+  [(set (reg:DI IP_REGNUM)
+   (unspec:DI [(reg:DI SP_REGNUM) (reg:DI LR_REGNUM)]
+   UNSPEC_PAC_IP_LR_SP))]
+  ""
+  "pac\tr12, lr, sp")
+
+(define_insn "pacbti_ip_lr_sp"
+  [(set (reg:DI IP_REGNUM)
+   (unspec:DI [(reg:DI SP_REGNUM) (reg:DI LR_REGNUM)]
+   UNSPEC_PACBTI_IP_LR_SP))]
+  ""
+  "pacbti\tr12, lr, sp")
+
+(define_insn "aut_ip_lr_sp"
+  [(unspec:DI [(reg:DI IP_REGNUM) (reg:DI SP_REGNUM) (reg:DI LR_REGNUM)]
+  UNSPEC_AUT_IP_LR_SP)]
+  ""
+  "aut\tr12, lr, sp")
+

I think all these need a length attribute.  Also, they should only be 
enabled for thumb2 (certainly not in Arm state).
And when using explicit register names in an asm, prefix each name with 
'%|', just in case the assembler dialect has a register name prefix.


The names are somewhat unweildy, can't we use something more usefully 
descriptive, like 'pac_nop', 'pacbti_nop' and 'aut_nop', since all these 
instructions are using the architectural NOP space.


Finally, I think we need some more tests that cover the various 
frame-pointer flavours when used in combination with this feature and 
for various corners of the PCS.


R.

[PATCH] c++: Support &typeid(x) == &typeid(y) and typeid(x) == typeid(y) in constant evaluation [PR103600]

2021-12-08 Thread Jakub Jelinek via Gcc-patches

Hi!

If the tinfo vars are emitted in the current TU, they are emitted at the end
of the compilation, and for some types they are exported from
libstdc++/libsupc++ and not emitted in the current TU at all.

The following patch allows constant folding of comparisons of typeid
addresses and makes it possible to implement P1328R1 - making type_info
operator== constexpr (Jonathan has a patch for that).

As mentioned in the PR, the varpool/middle-end code is trying to be
conservative with address comparisons of different vars if those vars
don't bind locally, because of possible aliases in other TUs etc.
and so while match.pd folds &typeid(int) == &typeid(int) because
it is equality comparison with the same operands, for different typeids
it doesn't fold it.  The first constexpr.c hunk takes care of that
during constant evaluation.  As the std::type_info class has a protected
member __name only, I think people shouldn't be comparing anything but
the starts of those vars, so we don't need to deal with offsets into
those vars etc.

The other constexpr.c hunk and rtti.c change is to allow constexpr
operator== that will in the manifestly constant evaluation just
compare the __name members for equality.  Again, the _ZTS* variables
aren't constructed yet and for some types will not be at all.
So, what this patch does is when evaluating typeid(whatever).__name
fold the INDIRECT_REF on (const std::type_info *) &_ZTIwhatever
into a CONSTRUCTOR_NO_CLEARING ctor with just __name field initialized
to &"1S" etc.

Testcase coverage includes just the &typeid comparison, for typeid
comparison it will be in the libstdc++ patch.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-12-08  Jakub Jelinek  

PR c++/103600
* cp-tree.h (tinfo_for_constant_evaluation): Declare.
* constexpr.c (cxx_eval_binary_expression): Evaluate
&typeid(x) == &typeid(y) to true or false depending on whether
the typeid is the same or not.
(cxx_eval_indirect_ref): Fold typeid(S) to {.__name="1S"}.
* rtti.c (tinfo_for_constant_evaluation): New function.

* g++.dg/cpp0x/constexpr-typeid2.C: New test.

--- gcc/cp/cp-tree.h.jj 2021-12-04 11:02:03.925646583 +0100
+++ gcc/cp/cp-tree.h2021-12-07 16:58:06.698802940 +0100
@@ -7376,6 +7376,7 @@ extern tree build_dynamic_cast(locati
 tsubst_flags_t);
 extern void emit_support_tinfos(void);
 extern bool emit_tinfo_decl(tree);
+extern tree tinfo_for_constant_evaluation  (tree);
 extern unsigned get_pseudo_tinfo_index (tree);
 extern tree get_pseudo_tinfo_type  (unsigned);
 extern tree build_if_nonnull   (tree, tree, tsubst_flags_t);
--- gcc/cp/constexpr.c.jj   2021-12-04 11:02:03.949646241 +0100
+++ gcc/cp/constexpr.c  2021-12-07 17:32:07.324739082 +0100
@@ -3346,6 +3346,25 @@ cxx_eval_binary_expression (const conste
lhs = cplus_expand_constant (lhs);
   else if (TREE_CODE (rhs) == PTRMEM_CST)
rhs = cplus_expand_constant (rhs);
+
+  /* Fold &typeid(x) == &typeid(y) to
+whether the DECL_TINFO_P vars are the same or not.  */
+  if (flag_rtti)
+   {
+ tree op0 = lhs;
+ tree op1 = rhs;
+ STRIP_NOPS (op0);
+ STRIP_NOPS (op1);
+ if (TREE_CODE (op0) == ADDR_EXPR
+ && VAR_P (TREE_OPERAND (op0, 0))
+ && DECL_TINFO_P (TREE_OPERAND (op0, 0))
+ && TREE_CODE (op1) == ADDR_EXPR
+ && VAR_P (TREE_OPERAND (op1, 0))
+ && DECL_TINFO_P (TREE_OPERAND (op1, 0)))
+   r = constant_boolean_node ((TREE_OPERAND (op0, 0)
+   != TREE_OPERAND (op1, 0))
+  ^ is_code_eq, type);
+   }
 }
   if (code == POINTER_PLUS_EXPR && !*non_constant_p
   && integer_zerop (lhs) && !integer_zerop (rhs))
@@ -5261,6 +5280,18 @@ cxx_eval_indirect_ref (const constexpr_c
  STRIP_NOPS (sub);
  if (TREE_CODE (sub) == ADDR_EXPR)
{
+ /* For *(const std::type_info *)&_ZTIwhatever return
+a constructor with no clearing and __name being the
+tinfo name.  */
+ if (flag_rtti
+ && VAR_P (TREE_OPERAND (sub, 0))
+ && DECL_TINFO_P (TREE_OPERAND (sub, 0))
+ && !lval
+ && TREE_TYPE (t) == const_type_info_type_node)
+   if (tree ret
+   = tinfo_for_constant_evaluation (TREE_OPERAND (sub, 0)))
+ return ret;
+
  gcc_assert (!similar_type_p
  (TREE_TYPE (TREE_TYPE (sub)), TREE_TYPE (t)));
  /* DR 1188 says we don't have to deal with this.  */
--- gcc/cp/rtti.c.jj2021-09-15 08:55:37.569497474 +0200
+++ gcc/cp/rtti.c   2021-12-07 17:17:50.943944129 +0100
@@ -1700,4 +1700,39

Re: [PATCH] [1/2] arm: Implement cortex-M return signing address codegen

2021-12-08 Thread Andrea Corallo via Gcc-patches

Andrea Corallo via Gcc-patches  writes:

> Andrea Corallo via Gcc-patches  writes:
>
>> Hi all,
>>
>> this patch enables address return signature and verification based on
>> Armv8.1-M Pointer Authentication [1].
>>
>> To sign the return address, we use the PAC R12, LR, SP instruction
>> upon function entry.  This is signing LR using SP and storing the
>> result in R12.  R12 will be pushed into the stack.
>>
>> During function epilogue R12 will be popped and AUT R12, LR, SP will
>> be used to verify that the content of LR is still valid before return.
>>
>> Here an example of PAC instrumented function prologue and epilogue:
>>
>> pac r12, lr, sp
>> push{r3, r7, lr}
>> push{r12}
>> sub sp, sp, #4
>> [...] function body
>> add sp, sp, #4
>> pop {r12}
>> pop {r3, r7, lr}
>> aut r12, lr, sp
>> bx  lr
>>
>> The patch also takes care of generating a PACBTI instruction in place
>> of the sequence BTI+PAC when Branch Target Identification is enabled
>> contextually.
>>
>> These two patches apply on top of Tejas series posted here [2].
>>
>> Regressioned and arm-linux-gnu aarch64-linux-gnu bootstraped.
>>
>> Best Regards
>>
>>   Andrea
>>
>> [1] 
>> 
>> [2] 
>
> Ping
>
> Best Regards
>
>   Andrea

Hi all,

pinging this and 2/2.

Thanks

  Andrea

[PING][PATCH 2/2][GCC] arm: Declare MVE types internally via pragma

2021-12-08 Thread Murray Steele via Gcc-patches

Hi,


I'd like to ping this patch revision [1]. 

Thanks,
Murray

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585400.html

---

On 25/11/2021 09:42, Murray Steele wrote:
> Changes from original patch:
> 
> 1. Merged test_redef_* test files into one
> 2. Encapsulated contents of arm-mve-builtins.h in namespace arm_mve (missed
>in initial patch).
> 3. Added extern declarations for scalar_types and acle_vector types to
>arm-mve-builtins.h (missed in initial patch).
> 4. Added arm-mve-builtins.(cc|h) to gt_targets for arm-*-*-* (missed in
>initial patch).
> 5. Added include for gt-arm-mve-builtins.h to arm-mve-builtins.cc (missed in
>initial patch).
> 6. Removed explicit initialisation of handle_arm_mve_types_p as it is 
> unneeded.
> 
> ---
> 
> This patch moves the implementation of MVE ACLE types from
> arm_mve_types.h to inside GCC via a new pragma, which replaces the prior
> type definitions. This allows for the types to be used internally for
> intrinsic function definitions.
> 
> Bootstrapped and regression tested on arm-none-linux-gnuabihf, and
> regression tested on arm-eabi -- no issues.
> 
> Thanks,
> Murray
> 
> gcc/ChangeLog:
> 
> * config.gcc: Add arm-mve-builtins.o to extra_objs for arm-*-*-*
> targets.
> * config/arm/arm-c.c (arm_pragma_arm): Handle new pragma.
> (arm_register_target_pragmas): Register new pragma.
> * config/arm/arm-protos.h: Add arm_mve namespace and declare
> arm_handle_mve_types_h.
> * config/arm/arm_mve_types.h: Replace MVE type definitions with
> new pragma.
> * config/arm/t-arm: Add arm-mve-builtins.o target.
> * config/arm/arm-mve-builtins.cc: New file.
> * config/arm/arm-mve-builtins.def: New file.
> * config/arm/arm-mve-builtins.h: New file.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/arm/mve/mve.exp: Add new subdirectories.
> * gcc.target/arm/mve/general-c/type_redef_1.c: New test.
> * gcc.target/arm/mve/general/double_pragmas_1.c: New test.
> * gcc.target/arm/mve/general/nomve_1.c: New test.
>

Re: [PATCH] PR fortran/103418 - random_number() does not accept pointer, intent(in) array argument

2021-12-08 Thread Mikael Morin


On 07/12/2021 21:46, Harald Anlauf wrote:

Hi Mikael,

Am 07.12.21 um 21:17 schrieb Mikael Morin:

Hello,

On 05/12/2021 22:55, Harald Anlauf via Fortran wrote:

Dear all,

the check of dummy arguments with pointer attribute and INTENT(IN)
was broken in the case the argument was passed to an intrinsic.
We therefore rejected valid code as e.g. given in the PR.

The patch relaxes the excessive check.  This requires the adjustment
of one of the tests for MOVE_ALLOC.

The existing code looks dubious to me (or at least difficult to
understand), and your patch doesn’t make that any better.
I would rather try to remove the whole block, and fix the fallout on
move_alloc by adding calls to gfc_check_vardef_context in
gfc_check_move_alloc.
Can you try that instead?


I hadn't thought that far but will think about a possibly better
solution.


Hello,

I thought about it some more over night, and it is probably a poor 
suggestion to restrict the check to move_alloc only.  The existing code 
was added for move_alloc, but it has a broader scope.  Still, 
gfc_check_vardef_context has the correct checks and is the one to be used.

[PATCH] Use -fopt-info in unswitch pass.

2021-12-08 Thread Martin Liška


The patch is about porting of dump information to -fopt-info so
that we can compare the current pass with a modified one.

Right now, there are 1945 'optimized: Unswitching loop on condition' lines
for SPEC 2006 benchmark.

Moreover, I adjusted dump functions in profile-count.{ch} so that it dumps
to a string buffer.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* profile-count.c (profile_count::dump): Add function
that can dump to a provided buffer.
(profile_probability::dump): Likewise.
* profile-count.h: Likewise.
* tree-ssa-loop-unswitch.c (tree_unswitch_single_loop):
Use dump_printf_loc infrastructure.
(tree_unswitch_outer_loop): Likewise.
(find_loop_guard): Likewise.
(hoist_guard): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/loop-unswitch-1.c: Adjust test-case based on
dump_printf_loc.
* gcc.dg/loop-unswitch-2.c: Likewise.
* gcc.dg/loop-unswitch-3.c: Likewise.
* gcc.dg/loop-unswitch-4.c: Likewise.
* gcc.dg/loop-unswitch-5.c: Likewise.
---
 gcc/profile-count.c|  48 ++---
 gcc/profile-count.h|   6 ++
 gcc/testsuite/gcc.dg/loop-unswitch-1.c |   4 +-
 gcc/testsuite/gcc.dg/loop-unswitch-2.c |   5 +-
 gcc/testsuite/gcc.dg/loop-unswitch-3.c |   7 +-
 gcc/testsuite/gcc.dg/loop-unswitch-4.c |   1 -
 gcc/testsuite/gcc.dg/loop-unswitch-5.c |   2 +-
 gcc/tree-ssa-loop-unswitch.c   | 131 +++--
 8 files changed, 128 insertions(+), 76 deletions(-)

diff --git a/gcc/profile-count.c b/gcc/profile-count.c
index f7f4dffdc90..c04b4fe823d 100644
--- a/gcc/profile-count.c
+++ b/gcc/profile-count.c
@@ -84,18 +84,28 @@ const char *profile_quality_display_names[] =
   "precise"
 };
 
-/* Dump THIS to F.  */

+/* Dump THIS to BUFFER.  */
 
 void

-profile_count::dump (FILE *f) const
+profile_count::dump (char *buffer) const
 {
   if (!initialized_p ())
-fprintf (f, "uninitialized");
+sprintf (buffer, "uninitialized");
   else
-fprintf (f, "%" PRId64 " (%s)", m_val,
+sprintf (buffer, "%" PRId64 " (%s)", m_val,
 profile_quality_display_names[m_quality]);
 }
 
+/* Dump THIS to F.  */

+
+void
+profile_count::dump (FILE *f) const
+{
+  char buffer[64];
+  dump (buffer);
+  fputs (buffer, f);
+}
+
 /* Dump THIS to stderr.  */
 
 void

@@ -151,32 +161,44 @@ profile_count::stream_out (struct lto_output_stream *ob)
   streamer_write_uhwi_stream (ob, m_quality);
 }
 
-/* Dump THIS to F.  */

+
+/* Output THIS to BUFFER.  */
 
 void

-profile_probability::dump (FILE *f) const
+profile_probability::dump (char *buffer) const
 {
   if (!initialized_p ())
-fprintf (f, "uninitialized");
+sprintf (buffer, "uninitialized");
   else
 {
   /* Make difference between 0.00 as a roundoff error and actual 0.
 Similarly for 1.  */
   if (m_val == 0)
-fprintf (f, "never");
+   buffer += sprintf (buffer, "never");
   else if (m_val == max_probability)
-fprintf (f, "always");
+   buffer += sprintf (buffer, "always");
   else
-fprintf (f, "%3.1f%%", (double)m_val * 100 / max_probability);
+   buffer += sprintf (buffer, "%3.1f%%", (double)m_val * 100 / 
max_probability);
+
   if (m_quality == ADJUSTED)
-   fprintf (f, " (adjusted)");
+   sprintf (buffer, " (adjusted)");
   else if (m_quality == AFDO)
-   fprintf (f, " (auto FDO)");
+   sprintf (buffer, " (auto FDO)");
   else if (m_quality == GUESSED)
-   fprintf (f, " (guessed)");
+   sprintf (buffer, " (guessed)");
 }
 }
 
+/* Dump THIS to F.  */

+
+void
+profile_probability::dump (FILE *f) const
+{
+  char buffer[64];
+  dump (buffer);
+  fputs (buffer, f);
+}
+
 /* Dump THIS to stderr.  */
 
 void

diff --git a/gcc/profile-count.h b/gcc/profile-count.h
index c7a45ac5ee3..f86091f23a8 100644
--- a/gcc/profile-count.h
+++ b/gcc/profile-count.h
@@ -609,6 +609,9 @@ public:
   /* Output THIS to F.  */
   void dump (FILE *f) const;
 
+  /* Output THIS to BUFFER.  */

+  void dump (char *buffer) const;
+
   /* Print THIS to stderr.  */
   void debug () const;
 
@@ -1208,6 +1211,9 @@ public:

   /* Output THIS to F.  */
   void dump (FILE *f) const;
 
+  /* Output THIS to BUFFER.  */

+  void dump (char *buffer) const;
+
   /* Print THIS to stderr.  */
   void debug () const;
 
diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-1.c b/gcc/testsuite/gcc.dg/loop-unswitch-1.c

index de2fb2c0e4b..f9d628df510 100644
--- a/gcc/testsuite/gcc.dg/loop-unswitch-1.c
+++ b/gcc/testsuite/gcc.dg/loop-unswitch-1.c
@@ -1,6 +1,6 @@
 /* For PR rtl-optimization/27735  */
 /* { dg-do compile } */
-/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-details 
-fno-finite-loops" } */
+/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-all 
-fno-finite-loops" } */
 
 void set_color(void);

 void xml_colorize_lin

Re: Limit inlining functions called once

2021-12-08 Thread Bernhard Reutner-Fischer via Gcc-patches

On Tue, 7 Dec 2021 16:07:01 +0100
Jan Hubicka via Gcc-patches  wrote:

> Hi,
> as dicussed in PR ipa/103454 there are several benchmarks that regresses
> for -finline-functions-called once. Runtmes:
>  - tramp3d with -Ofast. 31%
>  - exchange2 with -Ofast 11-21%
>  - roms O2 9%-10%
>  - tonto 2.5-3.5% with LTO
> Build times:
>  - specfp2006 41% (mostly wrf that builds 71% faster)
>  - specint2006 1.5-3%
>  - specfp2017 64% (again mostly wrf)
>  - specint2017 2.5-3.5%
> 
> 
> This patch adds two params to tweak the behaviour:
>  1) max-inline-functions-called-once-loop-depth limiting the loop depth
> (this is useful primarily for exchange where the inlined function is in
>  loop depth 9)
>  2) max-inline-functions-called-once-insns
> We already have large-function-insns/growth parameters, but these are
> limiting also inlining small functions, so reducing them will regress
> very large functions that are hot.
> 
> Because inlining functions called once is meant just as a cleanup pass
> I think it makes sense to have separate limit for it.
> 
> I set the parmaeters to 6 and 4000.
> 4000 was chosen to make fatigue benchmark happy and that seems to be only one
> holding the value pretty high.  I opened
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103585 to track this.
> 
> I plan to reduce the value during before christmas after bit more testing 
> since
> it seems to be overall win even if we trade fatigue2 performance, but I would
> like to get more testing on larger C++ APPs first.

Will this hurt -Os -finline-limit=0 ?
thanks,

Re: [PATCH][GCC11] PR tree-optimization/103603 - Directly resolve range_of_stmt dependencies. (Port of PR 103231/103464)

2021-12-08 Thread Kito Cheng via Gcc-patches

Test result from RISC-V, tested on riscv64-unknown-elf and
riscv64-unknown-linux-gnu with no regressions.

Thanks :)

On Wed, Dec 8, 2021 at 4:19 AM Andrew MacLeod via Gcc-patches
 wrote:
>
> The following patch is a slight rework of the 2 patches which flatten
> rangers call stack.  It needed some tweaking since some of the routines
> have changed name or been adjusted.
>
> This has been bootstrapped on x86_64-pc-linux-gnu with no regressions.
> OK for gcc-11 release branch?
>
> Andrew
>

Re: [PATCH] pch: Add support for relocation of the PCH data [PR71934]

2021-12-08 Thread Iain Sandoe via Gcc-patches



> On 7 Dec 2021, at 14:50, Jakub Jelinek via Gcc-patches 
>  wrote:
> 
> On Tue, Dec 07, 2021 at 10:55:07AM +0100, Jakub Jelinek via Gcc-patches wrote:
>> So, this patch instead builds a relocation table (sorted list of addresses
>> in the blob which needs relocation) at PCH save time, stores it in a very
>> compact form into the gch file and upon restore, adjusts pointers in GTY
>> roots (that is right away in the root structures) and the addresses in the
>> relocation table.
>> The cost on stdc++.gch/O2g.gch (previously 85MB large) is about 3% file size
>> growth, there are 2.5 million pointers that need relocation in the gch blob
>> and the relocation table uses uleb128 for address deltas and needs ~1.01 
>> bytes
>> for one address that needs relocation, and about 20% compile time during
>> PCH save (I think it is mainly because of the need to qsort those 2.5
>> million pointers).  On PCH restore, if it doesn't need relocation (the usual
>> case), it is just an extra fread of sizeof (size_t) data and fseek
>> (in my tests real time on vanilla tree for #include  CU
>> was ~0.175s and with the patch but no relocation ~0.173s), while if it needs
>> relocation it took ~0.193s, i.e. 11.5% slower.
> 
> I'll note that without PCH that
> #include 
> int i;
> testcase compiles with -O2 -g in ~1.199s, i.e. 6.2 times slower than PCH with
> relocation and 6.9 times than PCH without relocation.

I’ve run tests across the Darwin range, including old and new m32 hosts, and 
this
seems to be working well.

The attached patch should be applied before (or merged with) the change for
relocation when it is applied - since the operation of the PCH hooks needs some
adjustment on Darwin.

thanks for working on this!
Iain



0001-Darwin-PCH-Rework-PCH-hooks-for-relocatable-implemen.patch
Description: Binary data

80 matches

Mail list logo