date:20210825

Re: [PATCH v3] Fix incomplete computation in fill_always_executed_in_1

2021-08-25 Thread Xionghu Luo via Gcc-patches




On 2021/8/24 16:20, Richard Biener wrote:
> On Tue, 24 Aug 2021, Xionghu Luo wrote:
> 
>>
>>
>> On 2021/8/19 20:11, Richard Biener wrote:
 -  class loop *inn_loop = loop;

  if (ALWAYS_EXECUTED_IN (loop->header) == NULL)
{
 @@ -3232,19 +3231,6 @@ fill_always_executed_in_1 (class loop *loop, 
 sbitmap contains_call)
 to disprove this if possible).  */
  if (bb->flags & BB_IRREDUCIBLE_LOOP)
break;
 -
 -if (!flow_bb_inside_loop_p (inn_loop, bb))
 -  break;
 -
 -if (bb->loop_father->header == bb)
 -  {
 -if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
 -  break;
 -
 -/* In a loop that is always entered we may proceed anyway.
 -   But record that we entered it and stop once we leave it.  */
 -inn_loop = bb->loop_father;
 -  }
}

  while (1)
>>> I'm not sure this will work correct (I'm not sure how the existing
>>> code makes it so either...).  That said, I can't poke any hole
>>> into the change.  What I see is that definitely
>>>
>>> if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>>>   last = bb;
>>>
>>> if (bitmap_bit_p (contains_call, bb->index))
>>>   break;
>>>
>>> doesn't work reliably since the DOM ordering will process blocks
>>> A B and C in random order for
>>>
>>> for (;;)
>>>  {
>>> if (cond)
>>>   {
>>> A: foo ();
>>>   }
>>> else B:;
>>> C:;
>>>  }
>>>
>>> and thus we can end up setting 'last' to C_before_  processing
>>> 'A' and thus arriving at the call foo () ...
>>>
>>> get_loop_body_in_dom_order does some "special sauce" but not
>>> to address the above problem - but it might be that a subtle
>>> issue like the above is the reason for the inner loop handling.
>>> The inner loop block order does_not_  adhere to this "special sauce",
>>> that is - the "Additionally, if a basic block s dominates
>>> the latch, then only blocks dominated by s are be after it."
>>> guarantee holds for the outer loop latch, not for the inner.
>>>
>>> Digging into the history of fill_always_executed_in_1 doesn't
>>> reveal anything - the inner loop handling has been present
>>> since introduction by Zdenek - but usually Zdenek has a reason
>>> for doing things as he does;)
>>
>> Yes, this is really complicated usage, thanks for point it out. :)
>> I constructed two cases to verify this with inner loop includes "If A; else 
>> B; C".
>> Finding that fill_sons_in_loop in get_loop_body_in_dom_order will also checks
>> whether the bb domintes outer loop’s latch, if C dominate outer loop’s latch,
>> C is postponed, the access order is ABC, 'last' won’t be set to C if A or B 
>> contains call;
> 
> But it depends on the order of visiting ABC and that's hard to put into
> a testcase since it depends on the order of edges and the processing
> of the dominance computation.  ABC are simply unordered with respect
> to a dominator walk.
> 
>> Otherwise if C doesn’t dominate outer loop’s latch in fill_sons_in_loop,
>> the access order is CAB, but 'last' also won’t be updated to C in 
>> fill_always_executed_in_1
>> since there is also dominate check, then if A or B contains call, it could 
>> break
>> successfully.
>>
>> C won't be set to ALWAYS EXECUTED for both circumstance.
>>
>>>
>>> Note it might be simply a measure against quadratic complexity,
>>> esp. since with your patch we also dive into not always executed
>>> subloops as you remove the
>>>
>>> if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>>>   break;
>>>
>>> check.  I suggest to evaluate behavior of the patch on a testcase
>>> like
>>>
>>> void foo (int n, int **k)
>>> {
>>> for (int i = 0; i < n; ++i)
>>>   if (k[0][i])
>>> for (int j = 0; j < n; ++j)
>>>   if (k[1][j])
>>> for (int l = 0; l < n; ++l)
>>>   if (k[2][l])
>>> ...
>>> }
>>
>> Theoretically the complexity is changing from L1(bbs) to 
>> L1(bbs)+L2(bbs)+L3(bbs)+…+Ln(bbs),
>> so fill_always_executed_in_1's execution time is supposed to be increase from
>> O(n) to O(n2)?  The time should depend on loop depth and bb counts.   I also 
>> drafted a
>> test case has 73-depth loop function with 25 no-ipa function copies each 
>> compiled
>> in lim2 and lim4 dependently.  Total execution time of 
>> fill_always_executed_in_1 is
>> increased from 32ms to 58ms, almost doubled but not quadratic?
> 
> It's more like n + (n-1) + (n-2) + ... + 1 which is n^2/2 but that's still
> O(n^2).
> 
>> It seems reasonable to see compiling time getting longer since most bbs are 
>> checked
>> more but a MUST to ensure early break correctly in every loop level...
>> Though loop nodes could be huge, loop depth will never be so large in actual 
>> code?
> 
>

Re: [PATCH] Fold more shuffle builtins to VEC_PERM_EXPR.

2021-08-25 Thread Hongtao Liu via Gcc-patches

On Thu, Aug 26, 2021 at 12:57 PM liuhongt  wrote:
>
>   This patch is a follow-up to [1], it fold all shufps/shufpd builtins into 
> gimple.
Of course for non-mask or mask all-ones version.
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2019-May/521983.html
>
> gcc/
> PR target/98167
> PR target/43147
> * config/i386/i386.c (ix86_gimple_fold_builtin): Fold
> IX86_BUILTIN_SHUFPD512, IX86_BUILTIN_SHUFPS512,
> IX86_BUILTIN_SHUFPD256， IX86_BUILTIN_SHUFPS，
> IX86_BUILTIN_SHUFPS256.
> (ix86_masked_all_ones): New function.
>
> gcc/testsuite/
> * gcc.target/i386/avx512f-vshufpd-1.c: Adjust testcase.
> * gcc.target/i386/avx512f-vshufps-1.c: Adjust testcase.
> * gcc.target/i386/pr43147.c: New test.
> ---
>  gcc/config/i386/i386.c| 90 ++-
>  .../gcc.target/i386/avx512f-vshufpd-1.c   |  3 +-
>  .../gcc.target/i386/avx512f-vshufps-1.c   |  3 +-
>  gcc/testsuite/gcc.target/i386/pr43147.c   | 15 
>  4 files changed, 87 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr43147.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ebec8668758..f3eed9f2426 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -17541,6 +17541,20 @@ ix86_vector_shift_count (tree arg1)
>return NULL_TREE;
>  }
>
> +/* Return true if arg_mask is all ones, arg_vec is corresponding vector.  */
> +static bool
> +ix86_masked_all_ones (unsigned HOST_WIDE_INT elems, tree arg_mask)
> +{
> +  if (TREE_CODE (arg_mask) != INTEGER_CST)
> +return false;
> +
> +  unsigned HOST_WIDE_INT mask = TREE_INT_CST_LOW (arg_mask);
> +  if ((mask | (HOST_WIDE_INT_M1U << elems)) != HOST_WIDE_INT_M1U)
> +return false;
> +
> +  return true;
> +}
> +
>  static tree
>  ix86_fold_builtin (tree fndecl, int n_args,
>tree *args, bool ignore ATTRIBUTE_UNUSED)
> @@ -18026,6 +18040,7 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>enum tree_code tcode;
>unsigned HOST_WIDE_INT count;
>bool is_vshift;
> +  unsigned HOST_WIDE_INT elems;
>
>switch (fn_code)
>  {
> @@ -18349,17 +18364,11 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>gcc_assert (n_args >= 2);
>arg0 = gimple_call_arg (stmt, 0);
>arg1 = gimple_call_arg (stmt, 1);
> -  if (n_args > 2)
> -   {
> - /* This is masked shift.  Only optimize if the mask is all ones.  */
> - tree argl = gimple_call_arg (stmt, n_args - 1);
> - if (!tree_fits_uhwi_p (argl))
> -   break;
> - unsigned HOST_WIDE_INT mask = tree_to_uhwi (argl);
> - unsigned elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> - if ((mask | (HOST_WIDE_INT_M1U << elems)) != HOST_WIDE_INT_M1U)
> -   break;
> -   }
> +  elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +  /* For masked shift, only optimize if the mask is all ones.  */
> +  if (n_args > 2
> + && !ix86_masked_all_ones (elems, gimple_call_arg (stmt, n_args - 
> 1)))
> +   break;
>if (is_vshift)
> {
>   if (TREE_CODE (arg1) != VECTOR_CST)
> @@ -18408,25 +18417,62 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> }
>break;
>
> +case IX86_BUILTIN_SHUFPD512:
> +case IX86_BUILTIN_SHUFPS512:
>  case IX86_BUILTIN_SHUFPD:
> +case IX86_BUILTIN_SHUFPD256:
> +case IX86_BUILTIN_SHUFPS:
> +case IX86_BUILTIN_SHUFPS256:
> +  arg0 = gimple_call_arg (stmt, 0);
> +  elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +  /* This is masked shuffle.  Only optimize if the mask is all ones.  */
> +  if (n_args > 3
> + && !ix86_masked_all_ones (elems,
> +   gimple_call_arg (stmt, n_args - 1)))
> +   break;
>arg2 = gimple_call_arg (stmt, 2);
>if (TREE_CODE (arg2) == INTEGER_CST)
> {
> + unsigned HOST_WIDE_INT shuffle_mask = TREE_INT_CST_LOW (arg2);
> + /* Check valid imm, refer to gcc.target/i386/testimm-10.c.  */
> + if (shuffle_mask > 255)
> +   return false;
> +
> + machine_mode imode = GET_MODE_INNER (TYPE_MODE (TREE_TYPE (arg0)));
>   location_t loc = gimple_location (stmt);
> - unsigned HOST_WIDE_INT imask = TREE_INT_CST_LOW (arg2);
> - arg0 = gimple_call_arg (stmt, 0);
> + tree itype = (imode == E_DFmode
> +   ? long_long_integer_type_node : integer_type_node);
> + tree vtype = build_vector_type (itype, elems);
> + tree_vector_builder elts (vtype, elems, 1);
> +
> +
> + /* Transform integer shuffle_mask to vector perm_mask which
> +is used by vec_perm_expr, refer to shuflp[sd]256/512 in sse.md.  
> */
> + for (unsigned i = 0; i != elems; i++)
> +   {
> + unsigned

[PATCH] Fold more shuffle builtins to VEC_PERM_EXPR.

2021-08-25 Thread liuhongt via Gcc-patches

  This patch is a follow-up to [1], it fold all shufps/shufpd builtins into 
gimple.
  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2019-May/521983.html

gcc/
PR target/98167
PR target/43147
* config/i386/i386.c (ix86_gimple_fold_builtin): Fold
IX86_BUILTIN_SHUFPD512, IX86_BUILTIN_SHUFPS512,
IX86_BUILTIN_SHUFPD256， IX86_BUILTIN_SHUFPS，
IX86_BUILTIN_SHUFPS256.
(ix86_masked_all_ones): New function.

gcc/testsuite/
* gcc.target/i386/avx512f-vshufpd-1.c: Adjust testcase.
* gcc.target/i386/avx512f-vshufps-1.c: Adjust testcase.
* gcc.target/i386/pr43147.c: New test.
---
 gcc/config/i386/i386.c| 90 ++-
 .../gcc.target/i386/avx512f-vshufpd-1.c   |  3 +-
 .../gcc.target/i386/avx512f-vshufps-1.c   |  3 +-
 gcc/testsuite/gcc.target/i386/pr43147.c   | 15 
 4 files changed, 87 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr43147.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ebec8668758..f3eed9f2426 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -17541,6 +17541,20 @@ ix86_vector_shift_count (tree arg1)
   return NULL_TREE;
 }
 
+/* Return true if arg_mask is all ones, arg_vec is corresponding vector.  */
+static bool
+ix86_masked_all_ones (unsigned HOST_WIDE_INT elems, tree arg_mask)
+{
+  if (TREE_CODE (arg_mask) != INTEGER_CST)
+return false;
+
+  unsigned HOST_WIDE_INT mask = TREE_INT_CST_LOW (arg_mask);
+  if ((mask | (HOST_WIDE_INT_M1U << elems)) != HOST_WIDE_INT_M1U)
+return false;
+
+  return true;
+}
+
 static tree
 ix86_fold_builtin (tree fndecl, int n_args,
   tree *args, bool ignore ATTRIBUTE_UNUSED)
@@ -18026,6 +18040,7 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   enum tree_code tcode;
   unsigned HOST_WIDE_INT count;
   bool is_vshift;
+  unsigned HOST_WIDE_INT elems;
 
   switch (fn_code)
 {
@@ -18349,17 +18364,11 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   gcc_assert (n_args >= 2);
   arg0 = gimple_call_arg (stmt, 0);
   arg1 = gimple_call_arg (stmt, 1);
-  if (n_args > 2)
-   {
- /* This is masked shift.  Only optimize if the mask is all ones.  */
- tree argl = gimple_call_arg (stmt, n_args - 1);
- if (!tree_fits_uhwi_p (argl))
-   break;
- unsigned HOST_WIDE_INT mask = tree_to_uhwi (argl);
- unsigned elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
- if ((mask | (HOST_WIDE_INT_M1U << elems)) != HOST_WIDE_INT_M1U)
-   break;
-   }
+  elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+  /* For masked shift, only optimize if the mask is all ones.  */
+  if (n_args > 2
+ && !ix86_masked_all_ones (elems, gimple_call_arg (stmt, n_args - 1)))
+   break;
   if (is_vshift)
{
  if (TREE_CODE (arg1) != VECTOR_CST)
@@ -18408,25 +18417,62 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
}
   break;
 
+case IX86_BUILTIN_SHUFPD512:
+case IX86_BUILTIN_SHUFPS512:
 case IX86_BUILTIN_SHUFPD:
+case IX86_BUILTIN_SHUFPD256:
+case IX86_BUILTIN_SHUFPS:
+case IX86_BUILTIN_SHUFPS256:
+  arg0 = gimple_call_arg (stmt, 0);
+  elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+  /* This is masked shuffle.  Only optimize if the mask is all ones.  */
+  if (n_args > 3
+ && !ix86_masked_all_ones (elems,
+   gimple_call_arg (stmt, n_args - 1)))
+   break;
   arg2 = gimple_call_arg (stmt, 2);
   if (TREE_CODE (arg2) == INTEGER_CST)
{
+ unsigned HOST_WIDE_INT shuffle_mask = TREE_INT_CST_LOW (arg2);
+ /* Check valid imm, refer to gcc.target/i386/testimm-10.c.  */
+ if (shuffle_mask > 255)
+   return false;
+
+ machine_mode imode = GET_MODE_INNER (TYPE_MODE (TREE_TYPE (arg0)));
  location_t loc = gimple_location (stmt);
- unsigned HOST_WIDE_INT imask = TREE_INT_CST_LOW (arg2);
- arg0 = gimple_call_arg (stmt, 0);
+ tree itype = (imode == E_DFmode
+   ? long_long_integer_type_node : integer_type_node);
+ tree vtype = build_vector_type (itype, elems);
+ tree_vector_builder elts (vtype, elems, 1);
+
+
+ /* Transform integer shuffle_mask to vector perm_mask which
+is used by vec_perm_expr, refer to shuflp[sd]256/512 in sse.md.  */
+ for (unsigned i = 0; i != elems; i++)
+   {
+ unsigned sel_idx;
+ /* Imm[1:0](if VL > 128, then use Imm[3:2],Imm[5:4],Imm[7:6])
+provide 2 select constrols for each element of the
+destination.  */
+ if (imode == E_DFmode)
+   sel_idx = (i & 1) * elems + (i & ~1)
+ + ((shuffle_mask >> i) & 1);
+

Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.

2021-08-25 Thread Hongtao Liu via Gcc-patches

On Thu, Aug 26, 2021 at 7:16 AM Jeff Law  wrote:
>
>
>
> On 8/24/2021 3:44 AM, Hongtao Liu via Gcc-patches wrote:
>
> On Tue, Aug 24, 2021 at 5:40 PM Hongtao Liu  wrote:
>
> On Tue, Aug 17, 2021 at 9:52 AM Hongtao Liu  wrote:
>
> On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu  wrote:
>
> On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches
>  wrote:
>
> On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
>  wrote:
>
> Richard Biener via Gcc-patches  writes:
>
> On Fri, Aug 6, 2021 at 5:32 AM liuhongt  wrote:
>
> Hi:
> ---
> OK, I think sth is amiss here upthread.  insv/extv do look like they
> are designed
> to work on integer modes (but docs do not say anything about this here).
> In fact the caller of extract_bit_field_using_extv is named
> extract_integral_bit_field.  Of course nothing seems to check what kind of
> modes we're dealing with, but we're for example happily doing
> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> some integer mode and op0 is HFmode?  From the above I get it's
> the other way around?  In that case we should wrap the
> call to extract_integral_bit_field, extracting in an integer mode with the
> same size as 'mode' and then converting the result as (subreg:HF (reg:HI 
> ...)).
> ---
>   This is a separate patch as a follow up of upper comments.
>
> gcc/ChangeLog:
>
> * expmed.c (extract_bit_field_1): Wrap the call to
> extract_integral_bit_field, extracting in an integer mode with
> the same size as 'tmode' and then converting the result
> as (subreg:tmode (reg:imode)).
>
> gcc/testsuite/ChangeLog:
> * gcc.target/i386/float16-5.c: New test.
> ---
>  gcc/expmed.c  | 19 +++
>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 
>  2 files changed, 31 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
>
> diff --git a/gcc/expmed.c b/gcc/expmed.c
> index 3143f38e057..72790693ef0 100644
> --- a/gcc/expmed.c
> +++ b/gcc/expmed.c
> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
> poly_uint64 bitnum,
>op0_mode = opt_scalar_int_mode ();
>  }
>
> +  /* Make sure we are playing with integral modes.  Pun with subregs
> + if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> + in extract_integral_bit_field.  */
> +  if (int_mode_for_mode (tmode).exists ()
>
> check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> cheaper.  Then imode should always be != tmode.  Maybe
> even GET_MDOE_CLASS (tmode) != MODE_INT since I'm not sure
> how it behaves for composite modes.
>
> Of course the least surprises would happen when we restrict this
> to FLOAT_MODE_P (tmode).
>
> Richard - any preferences?
>
> If the bug is that extract_integral_bit_field is being called with
> a non-integral mode parameter, then it looks odd that we can still
> fall through to it without an integral mode (when exists is false).
>
> If calling extract_integral_bit_field without an integral mode is
> a bug then I think we should have:
>
>   int_mode_for_mode (mode).require ()
>
> whenever mode is not already SCALAR_INT_MODE_P/is_a.
> Ideally we'd make the mode parameter scalar_int_mode too.
>
> extract_integral_bit_field currently has:
>
>   /* Find a correspondingly-sized integer field, so we can apply
>  shifts and masks to it.  */
>   scalar_int_mode int_mode;
>   if (!int_mode_for_mode (tmode).exists (_mode))
> /* If this fails, we should probably push op0 out to memory and then
>do a load.  */
> int_mode = int_mode_for_mode (mode).require ();
>
> which would seem to be redundant after this change.
>
> I'm not sure what exactly the bug is, but extract_integral_bit_field ends
> up creating a lowpart subreg that's not allowed and that ICEs (and I
> can't see a way to check beforehand).  So it seems to me at least
> part of that function doesn't expect non-integral extraction modes.
>
> But who knows - the code is older than I am (OK, not, but older than
> my involvment in GCC ;))
>
> How about attached patch w/ below changelog
>
> gcc/ChangeLog:
>
> * expmed.c (extract_bit_field_1): Make sure we're playing with
> integral modes before call extract_integral_bit_field.
> (extract_integral_bit_field): Add a parameter of type
> scalar_int_mode which corresponds to of tmode.
> And call extract_and_convert_fixed_bit_field instead of
> extract_fixed_bit_field and convert_extracted_bit_field.
> (extract_and_convert_fixed_bit_field): New function, it's a
> combination of extract_fixed_bit_field and
> convert_extracted_bit_field.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/i386/float16-5.c: New test.
>
> I'd like to ping this patch, or maybe we can use the patch before with
> richi's comments.
>
> Rebased and ping^2, there are plenty of avx512fp16 patches blocked by
> this patch, i'd like someone to help

Re: [Patch] Fix cygming-crtend.c build warning due to -Wprio-ctor-dtor

2021-08-25 Thread Jonathan Yong via Gcc-patches


On 8/25/21 11:06 PM, Jeff Law wrote:



On 8/25/2021 10:46 AM, Jonathan Yong via Gcc-patches wrote:

Attached patches OK?

    cygming-crtend.c: fix build warnings

    libgcc/Changelog:
    * config/i386/cygming-crtend.c: Fix register_frame_ctor
    and register_frame_dtor warnings.


    extend.texi: add note about reserved ctor/dtor priorities

    gcc/Changelog:
    * doc/extend.texi: Add note about reserved priorities
    to the constructor attribute.

Yes.  Both are OK.

jeff



Done, pushed to master, thanks.


OpenPGP_0x713B5FE29C145D45.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature

Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-08-25 Thread HAO CHEN GUI via Gcc-patches


Hi Bill,

   Thanks for your comments.

Hi Segher,

   Here is the ChangeLog and patch diff. Thanks.

2021-08-25 Haochen Gui 

gcc/
    * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):
    Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
    VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions.

gcc/testsuite/
    * gcc.target/powerpc/vec-minmax-1.c: New test.
    * gcc.target/powerpc/vec-minmax-2.c: Likewise.

diff --git a/gcc/config/rs6000/rs6000-call.c 
b/gcc/config/rs6000/rs6000-call.c

index b4e13af4dc6..90527734ceb 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -12159,6 +12159,11 @@ rs6000_gimple_fold_builtin 
(gimple_stmt_iterator *gsi)

   return true;
 /* flavors of vec_min.  */
 case VSX_BUILTIN_XVMINDP:
+    case ALTIVEC_BUILTIN_VMINFP:
+  if (!flag_finite_math_only || flag_signed_zeros)
+    return false;
+  /* Fall through to MIN_EXPR.  */
+  gcc_fallthrough ();
 case P8V_BUILTIN_VMINSD:
 case P8V_BUILTIN_VMINUD:
 case ALTIVEC_BUILTIN_VMINSB:
@@ -12167,7 +12172,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
*gsi)

 case ALTIVEC_BUILTIN_VMINUB:
 case ALTIVEC_BUILTIN_VMINUH:
 case ALTIVEC_BUILTIN_VMINUW:
-    case ALTIVEC_BUILTIN_VMINFP:
   arg0 = gimple_call_arg (stmt, 0);
   arg1 = gimple_call_arg (stmt, 1);
   lhs = gimple_call_lhs (stmt);
@@ -12177,6 +12181,11 @@ rs6000_gimple_fold_builtin 
(gimple_stmt_iterator *gsi)

   return true;
 /* flavors of vec_max.  */
 case VSX_BUILTIN_XVMAXDP:
+    case ALTIVEC_BUILTIN_VMAXFP:
+  if (!flag_finite_math_only || flag_signed_zeros)
+    return false;
+  /* Fall through to MAX_EXPR.  */
+  gcc_fallthrough ();
 case P8V_BUILTIN_VMAXSD:
 case P8V_BUILTIN_VMAXUD:
 case ALTIVEC_BUILTIN_VMAXSB:
@@ -12185,7 +12194,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
*gsi)

 case ALTIVEC_BUILTIN_VMAXUB:
 case ALTIVEC_BUILTIN_VMAXUH:
 case ALTIVEC_BUILTIN_VMAXUW:
-    case ALTIVEC_BUILTIN_VMAXFP:
   arg0 = gimple_call_arg (stmt, 0);
   arg1 = gimple_call_arg (stmt, 1);
   lhs = gimple_call_lhs (stmt);
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c 
b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c

new file mode 100644
index 000..547798fd65c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
@@ -0,0 +1,53 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */
+
+/* This test verifies that float or double vec_min/max are bound to
+   xv[min|max][d|s]p instructions when fast-math is not set.  */
+
+
+#include 
+
+#ifdef _BIG_ENDIAN
+   const int PREF_D = 0;
+#else
+   const int PREF_D = 1;
+#endif
+
+double vmaxd (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_max (va, vb), PREF_D);
+}
+
+double vmind (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_min (va, vb), PREF_D);
+}
+
+#ifdef _BIG_ENDIAN
+   const int PREF_F = 0;
+#else
+   const int PREF_F = 3;
+#endif
+
+float vmaxf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_max (va, vb), PREF_F);
+}
+
+float vminf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_min (va, vb), PREF_F);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c 
b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c

new file mode 100644
index 000..4c6f4365830
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c
@@ -0,0 +1,51 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */
+/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */
+
+/* This test verifies that float or double vec_min/max can be converted
+   to scalar comparison when fast-math is set.  */
+
+
+#include 
+
+#ifdef _BIG_ENDIAN
+   const int PREF_D = 0;
+#else
+   const int PREF_D = 1;
+#endif
+
+double vmaxd (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_max (va, vb), PREF_D);
+}
+
+double vmind (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+

Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.

2021-08-25 Thread Hongtao Liu via Gcc-patches

On Tue, Aug 24, 2021 at 7:39 PM Richard Biener
 wrote:
>
> On Tue, Aug 24, 2021 at 11:38 AM Hongtao Liu  wrote:
> >
> > On Tue, Aug 24, 2021 at 5:40 PM Hongtao Liu  wrote:
> > >
> > > On Tue, Aug 17, 2021 at 9:52 AM Hongtao Liu  wrote:
> > > >
> > > > On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu  wrote:
> > > > >
> > > > > On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
> > > > > >  wrote:
> > > > > > >
> > > > > > > Richard Biener via Gcc-patches  writes:
> > > > > > > > On Fri, Aug 6, 2021 at 5:32 AM liuhongt  
> > > > > > > > wrote:
> > > > > > > >>
> > > > > > > >> Hi:
> > > > > > > >> ---
> > > > > > > >> OK, I think sth is amiss here upthread.  insv/extv do look 
> > > > > > > >> like they
> > > > > > > >> are designed
> > > > > > > >> to work on integer modes (but docs do not say anything about 
> > > > > > > >> this here).
> > > > > > > >> In fact the caller of extract_bit_field_using_extv is named
> > > > > > > >> extract_integral_bit_field.  Of course nothing seems to check 
> > > > > > > >> what kind of
> > > > > > > >> modes we're dealing with, but we're for example happily doing
> > > > > > > >> expand_shift in 'mode'.  In the extract_integral_bit_field 
> > > > > > > >> call 'mode' is
> > > > > > > >> some integer mode and op0 is HFmode?  From the above I get it's
> > > > > > > >> the other way around?  In that case we should wrap the
> > > > > > > >> call to extract_integral_bit_field, extracting in an integer 
> > > > > > > >> mode with the
> > > > > > > >> same size as 'mode' and then converting the result as 
> > > > > > > >> (subreg:HF (reg:HI ...)).
> > > > > > > >> ---
> > > > > > > >>   This is a separate patch as a follow up of upper comments.
> > > > > > > >>
> > > > > > > >> gcc/ChangeLog:
> > > > > > > >>
> > > > > > > >> * expmed.c (extract_bit_field_1): Wrap the call to
> > > > > > > >> extract_integral_bit_field, extracting in an integer 
> > > > > > > >> mode with
> > > > > > > >> the same size as 'tmode' and then converting the result
> > > > > > > >> as (subreg:tmode (reg:imode)).
> > > > > > > >>
> > > > > > > >> gcc/testsuite/ChangeLog:
> > > > > > > >> * gcc.target/i386/float16-5.c: New test.
> > > > > > > >> ---
> > > > > > > >>  gcc/expmed.c  | 19 
> > > > > > > >> +++
> > > > > > > >>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 
> > > > > > > >>  2 files changed, 31 insertions(+)
> > > > > > > >>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> > > > > > > >>
> > > > > > > >> diff --git a/gcc/expmed.c b/gcc/expmed.c
> > > > > > > >> index 3143f38e057..72790693ef0 100644
> > > > > > > >> --- a/gcc/expmed.c
> > > > > > > >> +++ b/gcc/expmed.c
> > > > > > > >> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, 
> > > > > > > >> poly_uint64 bitsize, poly_uint64 bitnum,
> > > > > > > >>op0_mode = opt_scalar_int_mode ();
> > > > > > > >>  }
> > > > > > > >>
> > > > > > > >> +  /* Make sure we are playing with integral modes.  Pun with 
> > > > > > > >> subregs
> > > > > > > >> + if we aren't. When tmode is HFmode, op0 is SImode, there 
> > > > > > > >> will be ICE
> > > > > > > >> + in extract_integral_bit_field.  */
> > > > > > > >> +  if (int_mode_for_mode (tmode).exists ()
> > > > > > > >
> > > > > > > > check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> > > > > > > > cheaper.  Then imode should always be != tmode.  Maybe
> > > > > > > > even GET_MDOE_CLASS (tmode) != MODE_INT since I'm not sure
> > > > > > > > how it behaves for composite modes.
> > > > > > > >
> > > > > > > > Of course the least surprises would happen when we restrict this
> > > > > > > > to FLOAT_MODE_P (tmode).
> > > > > > > >
> > > > > > > > Richard - any preferences?
> > > > > > >
> > > > > > > If the bug is that extract_integral_bit_field is being called with
> > > > > > > a non-integral mode parameter, then it looks odd that we can still
> > > > > > > fall through to it without an integral mode (when exists is 
> > > > > > > false).
> > > > > > >
> > > > > > > If calling extract_integral_bit_field without an integral mode is
> > > > > > > a bug then I think we should have:
> > > > > > >
> > > > > > >   int_mode_for_mode (mode).require ()
> > > > > > >
> > > > > > > whenever mode is not already 
> > > > > > > SCALAR_INT_MODE_P/is_a.
> > > > > > > Ideally we'd make the mode parameter scalar_int_mode too.
> > > > > > >
> > > > > > > extract_integral_bit_field currently has:
> > > > > > >
> > > > > > >   /* Find a correspondingly-sized integer field, so we can apply
> > > > > > >  shifts and masks to it.  */
> > > > > > >   scalar_int_mode int_mode;
> > > > > > >   if (!int_mode_for_mode (tmode).exists (_mode))
> > > > > > > /* If this fails, we should probably push op0 out to memory 
> > > > > > > and then
> > > > > > >do a load.

[r12-3142 Regression] FAIL: 20_util/enable_shared_from_this/89303.cc (test for excess errors) on Linux/x86_64

2021-08-25 Thread sunil.k.pandey via Gcc-patches

On Linux/x86_64,

5c85f29537662f1f4195a102cbf0182ffa32d8ac is the first bad commit
commit 5c85f29537662f1f4195a102cbf0182ffa32d8ac
Author: Jan Hubicka 
Date:   Wed Aug 25 21:43:07 2021 +0200

Merge load/stores in ipa-modref summaries

caused

FAIL: 20_util/enable_shared_from_this/89303.cc (test for excess errors)
FAIL: g++.dg/torture/pr89303.C   -O1  (internal compiler error)
FAIL: g++.dg/torture/pr89303.C   -O1  (test for excess errors)

with GCC configured with



To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=20_util/enable_shared_from_this/89303.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=20_util/enable_shared_from_this/89303.cc 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg-torture.exp=g++.dg/torture/pr89303.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg-torture.exp=g++.dg/torture/pr89303.C 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)

[r12-3144 Regression] FAIL: gcc.dg/tree-ssa/pr64130.c scan-tree-dump evrp "int \\[-8589934591, -2\\]" on Linux/x86_64

2021-08-25 Thread sunil.k.pandey via Gcc-patches

On Linux/x86_64,

ed3de423f1694d30f90c024fb6e19e2c6323 is the first bad commit
commit ed3de423f1694d30f90c024fb6e19e2c6323
Author: Martin Sebor 
Date:   Wed Aug 25 14:36:13 2021 -0600

Avoid printing range table header alone.

caused

FAIL: gcc.dg/tree-ssa/evrp1.c scan-tree-dump evrp "\\[5, \\+INF"
FAIL: gcc.dg/tree-ssa/evrp2.c scan-tree-dump evrp "\\[4, 7\\]"
FAIL: gcc.dg/tree-ssa/evrp3.c scan-tree-dump evrp "\\[1, 10\\]"
FAIL: gcc.dg/tree-ssa/evrp4.c scan-tree-dump evrp "\\[1B, \\+INF\\]"
FAIL: gcc.dg/tree-ssa/evrp6.c scan-tree-dump evrp "\\[12, \\+INF"
FAIL: gcc.dg/tree-ssa/pr64130.c scan-tree-dump evrp "int \\[2, 8589934591\\]"
FAIL: gcc.dg/tree-ssa/pr64130.c scan-tree-dump evrp "int \\[-8589934591, -2\\]"

with GCC configured with



To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp1.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp1.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp1.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp1.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp2.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp2.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp2.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp2.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp3.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp3.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp3.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp3.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp4.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp4.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp4.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp4.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp6.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp6.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp6.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/evrp6.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr64130.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr64130.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr64130.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr64130.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)

[PATCH] Inline IBM long double __gcc_qsub

2021-08-25 Thread David Edelsohn via Gcc-patches

rs6000: inline ldouble __gcc_qsub

While performing some tests of IEEE 128 float for PPC64LE, Michael
Meissner noticed that __gcc_qsub is substantially slower than
__gcc_qadd.  __gcc_qsub valls __gcc_add with the second operand
negated.  Because the functions normally are invoked through
libgcc shared object, the extra PLT overhead has a large impact
on the overall time of the function.  Instead of trying to be
fancy with function decorations to prevent interposition, this
patch inlines the definition of __gcc_qadd into __gcc_qsub with
the negation propagated through the function.

libgcc/ChangeLog:

* config/rs6000/ibm-ldouble.c (__gcc_qsub): Inline negated
__gcc_qadd.

diff --git a/libgcc/config/rs6000/ibm-ldouble.c
b/libgcc/config/rs6000/ibm-ldouble.c
index 4c13453f975..ed74900e5c3 100644
--- a/libgcc/config/rs6000/ibm-ldouble.c
+++ b/libgcc/config/rs6000/ibm-ldouble.c
@@ -158,9 +158,42 @@ __gcc_qadd (double a, double aa, double c, double cc)
 }

 IBM128_TYPE
-__gcc_qsub (double a, double b, double c, double d)
+__gcc_qsub (double a, double aa, double c, double cc)
 {
-  return __gcc_qadd (a, b, -c, -d);
+  double xh, xl, z, q, zz;
+
+  z = a - c;
+
+  if (nonfinite (z))
+{
+  if (fabs (z) != inf())
+   return z;
+  z = -cc + aa - c + a;
+  if (nonfinite (z))
+   return z;
+  xh = z;  /* Will always be DBL_MAX.  */
+  zz = aa - cc;
+  if (fabs(a) > fabs(c))
+   xl = a - z - c + zz;
+  else
+   xl = -c - z + a + zz;
+}
+  else
+{
+  q = a - z;
+  zz = q - c + (a - (q + z)) + aa - cc;
+
+  /* Keep -0 result.  */
+  if (zz == 0.0)
+   return z;
+
+  xh = z + zz;
+  if (nonfinite (xh))
+   return xh;
+
+  xl = z - xh + zz;
+}
+  return pack_ldouble (xh, xl);
 }

 #ifdef __NO_FPRS__

Re: [PATCH] improve note location and refactor warn_uninit

2021-08-25 Thread Martin Sebor via Gcc-patches


On 8/25/21 5:00 PM, Jeff Law wrote:



On 8/25/2021 2:03 PM, Martin Sebor via Gcc-patches wrote:

Richard, some time ago you mentioned you'd had trouble getting
-Wuninitialized to print the note pointing to the uninitialized
variable.  I said I'd look into it, and so I did.  The attached
patch simplifies the warn_uninit() function to get rid of some
redundant cruft: besides a few pointless null tests and
a duplicate argument it also does away with ancient code that's
responsible for the problem you saw.  It turns out that tests
for the problem are already in the test suite but have been
xfailed since the day they were added, so the patch simply
removes the xfails.  I'll go ahead and commit this cleanup
as obvious later this week unless you have suggestions for
further changes.

Tested on x86_64-linux.
I'm going to object to installing this "as obvious".  I don't see how 
anyone could come to the conclusion this is obvious.


As I explained, the patch just a) removes unnecessary and/or
unused code, b) moves declarations closer to their point of
initialization, and c) gets rid of the linemap_location_before_p()
tests that prevent the note that Richard was looking for from being
printed.  It has no effect besides that and it seems rather trivial
to me.  But I don't want to reopen the whole "obviousness debate"
again.



Please wait for review.


Fine.

Martin

Re: [PATCH] avoid printing range table header alone

2021-08-25 Thread Martin Sebor via Gcc-patches


On 8/25/21 1:34 PM, Andrew MacLeod wrote:

On 8/25/21 3:15 PM, Martin Sebor wrote:

On 8/25/21 11:46 AM, Andrew MacLeod wrote:

On 8/25/21 1:25 PM, Martin Sebor wrote:

I noticed that the output of -fdump-tree-walloca (and probably all
passes that call gimple_ranger::export_global_ranges()) includes
the following two lines for all functions, even when there's
nothing else of relevance after them:

Exported global range table
===

I was a bit puzzled by it at first, wondering if it was leftover
debugging output, until I checked the code and realized it's
a header that may be (but isn't always) followed by the contents
of the table.

To reduce clutter in the dump the attached patch changes the function
to print the header only when it is followed by at least one line of
output.  (Another tweak to reduce even further the amount of output
printed by default that might be worth considering is to print
the table only when TDF_DETAILS is set.)

Martin


Except checking x == 1 won't work if ssa_name(1)  was not updated.

Just set a flag if the header hasn't been printed yet. tried and 
true. I guess you can protect it behind details if you want.


Doh!  I've fixed that.  While testing I came across another instance
of the same pattern so I've made the same change there as well.  I
also took the liberty to make the output slightly more consistent
between the two dumps.

Attached is what I'm planning to commit.  Please let me know if you
have any other suggestions.


OK, thats fine.


I got ahead of myself and pushed the change in r12-3144 before my
second test run finished.  A few tree-ssa tests needed adjusting
after the TDF_DETAILS change.  I fixed those r12-3152.





FWIW, I see duplicate output in the EVRP1 dump that looks just like
it could be the output under Non-varying global ranges. I wonder if
it's the result of the block in ranger_cache::fill_block_cache guarded
by if (DEBUG_RANGE_CACHE).

Martin 



The cache debug mechanism hasnt been fully reworked yet... i moved GORI 
and the ranger to the trace class.. I haven't decided yet what I want to 
change in the cache. its a bit dated.


regardless, if your sources are up to date, DEBUG_RANGE_CACHE is now 
guarded by:


#define DEBUG_RANGE_CACHE (dump_file && (param_evrp_mode & 
EVRP_MODE_CACHE) \

  == EVRP_MODE_CACHE)

so you'd only see that output if you specify --param=evrp-mode=cache or 
debug


It's not a big deal but the output I noticed is actually coming
from gimple_ranger::dump_bb ().  In the test I happened to look
at it shows up just before Non-varying global ranges:  It's
attached just to show you what I mean.

Martin

;; Function foo (foo, funcdef_no=0, decl_uid=1943, cgraph_uid=1, symbol_order=0)

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2
;; 2 succs { 1 }

Substituting values and folding statements

Folding statement: _3 = *x_2(D);
Not folded
Folding statement: return _3;
Not folded

=== BB 2 
 :
_3 = *x_2(D);
return _3;


char foo (char * x)
{
  char _3;

   :
  _3 = *x_2(D);
  return _3;

}



;; Function bar (bar, funcdef_no=1, decl_uid=1946, cgraph_uid=2, symbol_order=1)

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2
;; 2 succs { 1 }

Substituting values and folding statements

Folding statement: saved_stack.1_16 = __builtin_stack_save ();
Not folded
Folding statement: _18 = *x_17(D);
Not folded
Folding statement: _1 = (long int) _18;
Not folded
Folding statement: _2 = _1 + -1;
 Registering value_relation (_2 < _1) (bb2) at _2 = _1 + -1;
Not folded
Folding statement: _3 = (sizetype) _2;
Not folded
Folding statement: _4 = (sizetype) _18;
Not folded
Folding statement: _5 = (bitsizetype) _4;
Not folded
Folding statement: _6 = _5 * 64;
Not folded
Folding statement: _8 = _4 * 8;
Not folded
Folding statement: arr.0_26 = __builtin_alloca_with_align (_8, 64);
Not folded
Folding statement: __builtin_stack_restore (saved_stack.1_16);
Not folded
Folding statement: return;
Not folded

=== BB 2 
Relational : (_2 < _1)
 :
saved_stack.1_16 = __builtin_stack_save ();
_18 = *x_17(D);
_1 = (long int) _18;
_2 = _1 + -1;
_3 = (sizetype) _2;
_4 = (sizetype) _18;
_5 = (bitsizetype) _4;
_6 = _5 * 64;
_8 = _4 * 8;
arr.0_26 = __builtin_alloca_with_align (_8, 64);
__builtin_stack_restore (saved_stack.1_16);
return;

_1 : long int [-128, 127]
_2 : long int [-129, 126]
_3 : sizetype [0, 126][18446744073709551487, +INF]
_4 : sizetype [0, 127][18446744073709551488, +INF]
_5 : bitsizetype [0, 127][18446744073709551488, 18446744073709551615]
_6 : bitsizetype [0, 8128][0x3fe000, 0x3fffc0]
_8 : sizetype [0, 1016][18446744073709550592, 18446744073709551608]
arr.0_26 : void *[0:D.1956] * [1B, +INF]
Non-varying global ranges:
=:
_1  : long int [-128, 127]
_2  : long

Re: [PATCH 14/34] rs6000: Add remaining overloads

2021-08-25 Thread Segher Boessenkool

On Thu, Jul 29, 2021 at 08:31:01AM -0500, Bill Schmidt wrote:
>   * config/rs6000/rs6000-overload.def: Add remaining overloads.

> +; TODO: Note that the entry for VEC_ADDE currently gets ignored in
> +; altivec_resolve_overloaded_builtin.  Revisit whether we can remove
> +; that.  We still need to register the legal builtin forms here.
> +[VEC_ADDE, vec_adde, __builtin_vec_adde]
> +  vsq __builtin_vec_adde (vsq, vsq, vsq);
> +VADDEUQM  VADDEUQM_VSQ
> +  vuq __builtin_vec_adde (vuq, vuq, vuq);
> +VADDEUQM  VADDEUQM_VUQ

I'm not sure what this means.  "Currently" is the problem I think.  Do
you mean that the existing code (before this patch) ignores it already?

> +;  XVRSPIP{TARGET_VSX};VRFIP
> +[VEC_CEIL, vec_ceil, __builtin_vec_ceil]
> +  vf __builtin_vec_ceil (vf);
> +VRFIP
> +  vd __builtin_vec_ceil (vd);
> +XVRDPIP

Is that a comment you forgot to remove, or is there work to be done here?

> +[VEC_CMPEQ, vec_cmpeq, __builtin_vec_cmpeq]
> +;  XVCMPEQSP{TARGET_VSX};VCMPEQFP

And this.

> +;  XVCMPEQSP_P{TARGET_VSX};VCMPEQFP_P
> +[VEC_CMPEQ_P, SKIP, __builtin_vec_vcmpeq_p]

And more!

And more later.  It isn't clear to me at all what those comments mean,
and they are formatted haphazardly, so looks like a WIP?

> +; Note that the entries for VEC_MUL are currently ignored.  See rs6000-c.c:
> +; altivec_resolve_overloaded_builtin, where there is special-case code for
> +; VEC_MUL.  TODO: Is this really necessary?  Investigate.  Seven missing
> +; prototypes here...no corresponding builtins.  Also added "vmulld" in P10
> +; which could be used instead of MUL_V2DI, conditionally?

Space after "..." :-P

> +; Opportunity for improvement: We can use XVRESP instead of VREFP for
> +; TARGET_VSX.  We would need conditional dispatch to allow two possibilities.
> +; Some syntax like "XVRESP{TARGET_VSX};VREFP".
> +; TODO. 
> +[VEC_RE, vec_re, __builtin_vec_re]

Don't we already anyway?  The only difference is whether all VSRs are
allowed or only the VRs, no?  The RTL generated is just the same?  Or
maybe I am overlooking something :-)

> +; **
> +; **
> +; Deprecated overloads that should never have existed at all
> +; **
> +; **

The coding conventions say not to use showy block comments like that,
but it seems appropriate here :-)


Okay for trunk with the  looked at.  Please don't repost this one.
Thanks!


Segher

Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.

2021-08-25 Thread Jeff Law via Gcc-patches





On 8/24/2021 3:44 AM, Hongtao Liu via Gcc-patches wrote:

On Tue, Aug 24, 2021 at 5:40 PM Hongtao Liu  wrote:

On Tue, Aug 17, 2021 at 9:52 AM Hongtao Liu  wrote:

On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu  wrote:

On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches
 wrote:

On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
 wrote:

Richard Biener via Gcc-patches  writes:

On Fri, Aug 6, 2021 at 5:32 AM liuhongt  wrote:

Hi:
---
OK, I think sth is amiss here upthread.  insv/extv do look like they
are designed
to work on integer modes (but docs do not say anything about this here).
In fact the caller of extract_bit_field_using_extv is named
extract_integral_bit_field.  Of course nothing seems to check what kind of
modes we're dealing with, but we're for example happily doing
expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
some integer mode and op0 is HFmode?  From the above I get it's
the other way around?  In that case we should wrap the
call to extract_integral_bit_field, extracting in an integer mode with the
same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
---
   This is a separate patch as a follow up of upper comments.

gcc/ChangeLog:

 * expmed.c (extract_bit_field_1): Wrap the call to
 extract_integral_bit_field, extracting in an integer mode with
 the same size as 'tmode' and then converting the result
 as (subreg:tmode (reg:imode)).

gcc/testsuite/ChangeLog:
 * gcc.target/i386/float16-5.c: New test.
---
  gcc/expmed.c  | 19 +++
  gcc/testsuite/gcc.target/i386/float16-5.c | 12 
  2 files changed, 31 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c

diff --git a/gcc/expmed.c b/gcc/expmed.c
index 3143f38e057..72790693ef0 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
poly_uint64 bitnum,
op0_mode = opt_scalar_int_mode ();
  }

+  /* Make sure we are playing with integral modes.  Pun with subregs
+ if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
+ in extract_integral_bit_field.  */
+  if (int_mode_for_mode (tmode).exists ()

check !INTEGRAL_MODE_P (tmode) before, that should be slightly
cheaper.  Then imode should always be != tmode.  Maybe
even GET_MDOE_CLASS (tmode) != MODE_INT since I'm not sure
how it behaves for composite modes.

Of course the least surprises would happen when we restrict this
to FLOAT_MODE_P (tmode).

Richard - any preferences?

If the bug is that extract_integral_bit_field is being called with
a non-integral mode parameter, then it looks odd that we can still
fall through to it without an integral mode (when exists is false).

If calling extract_integral_bit_field without an integral mode is
a bug then I think we should have:

   int_mode_for_mode (mode).require ()

whenever mode is not already SCALAR_INT_MODE_P/is_a.
Ideally we'd make the mode parameter scalar_int_mode too.

extract_integral_bit_field currently has:

   /* Find a correspondingly-sized integer field, so we can apply
  shifts and masks to it.  */
   scalar_int_mode int_mode;
   if (!int_mode_for_mode (tmode).exists (_mode))
 /* If this fails, we should probably push op0 out to memory and then
do a load.  */
 int_mode = int_mode_for_mode (mode).require ();

which would seem to be redundant after this change.

I'm not sure what exactly the bug is, but extract_integral_bit_field ends
up creating a lowpart subreg that's not allowed and that ICEs (and I
can't see a way to check beforehand).  So it seems to me at least
part of that function doesn't expect non-integral extraction modes.

But who knows - the code is older than I am (OK, not, but older than
my involvment in GCC ;))


How about attached patch w/ below changelog

gcc/ChangeLog:

 * expmed.c (extract_bit_field_1): Make sure we're playing with
 integral modes before call extract_integral_bit_field.
 (extract_integral_bit_field): Add a parameter of type
 scalar_int_mode which corresponds to of tmode.
 And call extract_and_convert_fixed_bit_field instead of
 extract_fixed_bit_field and convert_extracted_bit_field.
 (extract_and_convert_fixed_bit_field): New function, it's a
 combination of extract_fixed_bit_field and
 convert_extracted_bit_field.

gcc/testsuite/ChangeLog:
 * gcc.target/i386/float16-5.c: New test.


I'd like to ping this patch, or maybe we can use the patch before with
richi's comments.

Rebased and ping^2, there are plenty of avx512fp16 patches blocked by
this patch, i'd like someone to help review this patch.


Please ignore the former attached patch, should be the patch attached here.

Richard.


+  && imode != tmode
+  && imode != GET_MODE (op0))
+{
+  rtx ret = extract_integral_bit_field (op0, op0_mode,
+

[committed] libstdc++: Add another non-reserved name to tests

2021-08-25 Thread Jonathan Wakely via Gcc-patches

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc: Check 'sz'.

Tested x86_64-linux. Committed to trunk.

commit ea5674687ac45fe7242c57220b699337899881f0
Author: Jonathan Wakely 
Date:   Wed Aug 25 23:19:25 2021

libstdc++: Add another non-reserved name to tests

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc: Check 'sz'.

diff --git a/libstdc++-v3/testsuite/17_intro/names.cc 
b/libstdc++-v3/testsuite/17_intro/names.cc
index b0cc21d6cda..b945511e088 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -108,6 +108,7 @@
 
 #define func (
 #define tmp (
+#define sz (
 
 #if __cplusplus < 201103L
 #define uses_allocator  (

Re: [PATCH] Use non-numbered clones for target_clones.

2021-08-25 Thread Jeff Law via Gcc-patches





On 8/25/2021 6:23 AM, Martin Liška wrote:

Hello.

There's updated version of the patch that:
- strips '.$number' suffix for default clones
- skips numbering for both declarations and definitions of target clones.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

0001-Use-non-numbered-clones-for-target_clones.patch

 From 99fdcd317cec7e3742e161f43951a67421506df7 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 20 Aug 2021 16:35:18 +0200
Subject: [PATCH] Use non-numbered clones for target_clones.

gcc/ChangeLog:

* cgraph.h (create_version_clone_with_body): Add new parameter.
* cgraphclones.c: Likewise.
* multiple_target.c (create_dispatcher_calls): Do not use
numbered suffixes.
(create_target_clone): Likewise here.

gcc/testsuite/ChangeLog:

* gcc.target/i386/mvc5.c: Scan assembly names.
* gcc.target/i386/mvc7.c: Likewise.
* gcc.target/i386/pr95778-1.c: Update scanned patterns.
* gcc.target/i386/pr95778-2.c: Likewise.

OK
jeff

[committed] libstdc++: Fix names.cc test failures on Windows

2021-08-25 Thread Jonathan Wakely via Gcc-patches

The Windows CRT headers define structs with members called f, x, y etc
so don't check those. There are also lots of unnecessary function
parameters in mingw headers using non-reserved names, e.g.

 uses p and z as parameters of mingw_gettimeofday
 uses j as a parameter of imaxabs
 uses l, o and func as parameter names

Those should be fixed in the headers instead.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc: Adjust for Windows.

Tested powerpc64le-linux. Committed to trunk.

commit f1a08f4d7839a0197a28ecd9ca6d4400769da3a7
Author: Jonathan Wakely 
Date:   Wed Aug 25 22:27:22 2021

libstdc++: Fix names.cc test failures on Windows

The Windows CRT headers define structs with members called f, x, y etc
so don't check those. There are also lots of unnecessary function
parameters in mingw headers using non-reserved names, e.g.

 uses p and z as parameters of mingw_gettimeofday
 uses j as a parameter of imaxabs
 uses l, o and func as parameter names

Those should be fixed in the headers instead.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc: Adjust for Windows.

diff --git a/libstdc++-v3/testsuite/17_intro/names.cc 
b/libstdc++-v3/testsuite/17_intro/names.cc
index 3cbe0d2da54..b0cc21d6cda 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -287,4 +287,14 @@
 
 #endif // __VXWORKS__
 
+#ifdef _WIN32
+#undef Value
+//  defines _CRT_FLOAT::f
+#undef f
+//  defines _CRT_DOUBLE::x and _LONGDOUBLE::x
+#undef x
+//  defines _complex::x and _complex::y
+#undef y
+#endif
+
 #include

[committed] libstdc++: Fix non-reserved names in

2021-08-25 Thread Jonathan Wakely via Gcc-patches

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/valarray: Uglify 'func' parameters.
* testsuite/17_intro/names.cc: Add 'func' to checks.

Tested powerpc64le-linux. Committed to trunk.

commit 0163bbaaef119ef9e98c4b3dcba159609f77c818
Author: Jonathan Wakely 
Date:   Wed Aug 25 22:24:54 2021

libstdc++: Fix non-reserved names in 

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/valarray: Uglify 'func' parameters.
* testsuite/17_intro/names.cc: Add 'func' to checks.

diff --git a/libstdc++-v3/include/std/valarray 
b/libstdc++-v3/include/std/valarray
index ad3e14ebe52..5adc94282ee 100644
--- a/libstdc++-v3/include/std/valarray
+++ b/libstdc++-v3/include/std/valarray
@@ -536,25 +536,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @brief  Apply a function to the array.
*
*  Returns a new valarray with elements assigned to the result of
-   *  applying func to the corresponding element of this array.  The new
+   *  applying __func to the corresponding element of this array.  The new
*  array has the same size as this one.
*
-   *  @param  func  Function of Tp returning Tp to apply.
+   *  @param  __func  Function of Tp returning Tp to apply.
*  @return  New valarray with transformed elements.
*/
-  _Expr<_ValFunClos<_ValArray, _Tp>, _Tp> apply(_Tp func(_Tp)) const;
+  _Expr<_ValFunClos<_ValArray, _Tp>, _Tp> apply(_Tp __func(_Tp)) const;
 
   /**
*  @brief  Apply a function to the array.
*
*  Returns a new valarray with elements assigned to the result of
-   *  applying func to the corresponding element of this array.  The new
+   *  applying __func to the corresponding element of this array.  The new
*  array has the same size as this one.
*
-   *  @param  func  Function of const Tp& returning Tp to apply.
+   *  @param  __func  Function of const Tp& returning Tp to apply.
*  @return  New valarray with transformed elements.
*/
-  _Expr<_RefFunClos<_ValArray, _Tp>, _Tp> apply(_Tp func(const _Tp&)) 
const;
+  _Expr<_RefFunClos<_ValArray, _Tp>, _Tp> apply(_Tp __func(const _Tp&)) 
const;
 
   /**
*  @brief  Resize array.
@@ -1062,18 +1062,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 inline _Expr<_ValFunClos<_ValArray, _Tp>, _Tp>
-valarray<_Tp>::apply(_Tp func(_Tp)) const
+valarray<_Tp>::apply(_Tp __func(_Tp)) const
 {
   typedef _ValFunClos<_ValArray, _Tp> _Closure;
-  return _Expr<_Closure, _Tp>(_Closure(*this, func));
+  return _Expr<_Closure, _Tp>(_Closure(*this, __func));
 }
 
   template
 inline _Expr<_RefFunClos<_ValArray, _Tp>, _Tp>
-valarray<_Tp>::apply(_Tp func(const _Tp &)) const
+valarray<_Tp>::apply(_Tp __func(const _Tp &)) const
 {
   typedef _RefFunClos<_ValArray, _Tp> _Closure;
-  return _Expr<_Closure, _Tp>(_Closure(*this, func));
+  return _Expr<_Closure, _Tp>(_Closure(*this, __func));
 }
 
 #define _DEFINE_VALARRAY_UNARY_OPERATOR(_Op, _Name) \
diff --git a/libstdc++-v3/testsuite/17_intro/names.cc 
b/libstdc++-v3/testsuite/17_intro/names.cc
index aca7a8e5812..3cbe0d2da54 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -106,6 +106,7 @@
 #endif
 #define z (
 
+#define func (
 #define tmp (
 
 #if __cplusplus < 201103L

Re: [PATCH] analyzer: Recognize __builtin_free as a matching deallocator

2021-08-25 Thread Jeff Law via Gcc-patches





On 8/25/2021 9:16 AM, Siddhesh Poyarekar wrote:

On 8/25/21 5:44 PM, Matthias Klose wrote:

On 7/28/21 1:44 PM, David Malcolm via Gcc-patches wrote:

On Wed, 2021-07-28 at 10:34 +0530, Siddhesh Poyarekar wrote:

Recognize __builtin_free as being equivalent to free when passed into
__attribute__((malloc ())), similar to how it is treated when it is
encountered as a call.  This fixes spurious warnings in glibc where
xmalloc family of allocators as well as reallocarray, memalign,
etc. are declared to have __builtin_free as the free function.

 gcc/analyzer/ChangeLog:
 * sm-malloc.cc
 (malloc_state_machine::get_or_create_deallocator): Recognize
 __builtin_free.

 gcc/testsuite/ChangeLog:
 * gcc.dg/analyzer/attr-malloc-1.c (compatible_alloc,
 compatible_alloc2): New extern allocator declarations.
 (test_9, test_10): New tests.


Looks good to me, thanks
Dave





Please could this be backported to all active branches?



Sure, it looks like only gcc11 needs this since malloc attribute 
matching seems recent.  David, I've never done a backport before, may 
I just cherry-pick, push and post a [committed] patch on list or does 
it need to go through review?
In general it's fine to cherry-pick fixes from the mainline to a release 
branch that fix regressions, incorrect code and the like.


This doesn't fall into one of those categories -- but given this is 
limited to the analyzer, I think it's fine to cherry-pick into the 
release branch as long as David is OK with it as well.


jeff

Re: [Patch] Fix cygming-crtend.c build warning due to -Wprio-ctor-dtor

2021-08-25 Thread Jeff Law via Gcc-patches





On 8/25/2021 10:46 AM, Jonathan Yong via Gcc-patches wrote:

Attached patches OK?

    cygming-crtend.c: fix build warnings

    libgcc/Changelog:
    * config/i386/cygming-crtend.c: Fix register_frame_ctor
    and register_frame_dtor warnings.


    extend.texi: add note about reserved ctor/dtor priorities

    gcc/Changelog:
    * doc/extend.texi: Add note about reserved priorities
    to the constructor attribute.

Yes.  Both are OK.

jeff

Re: [PATCH] diagnostics: Fix sporadic test failure

2021-08-25 Thread Jeff Law via Gcc-patches





On 8/25/2021 12:07 PM, H.J. Lu wrote:

On Sat, May 29, 2021 at 1:03 PM Jeff Law via Gcc-patches
 wrote:



On 5/29/2021 1:55 PM, Bernd Edlinger wrote:

On 5/29/21 9:31 PM, Jeff Law wrote:

On 5/28/2021 6:38 AM, Bernd Edlinger wrote:

Hi,

it turns out to be reproducible this way:

COLUMNS=80 make check-gcc-c RUNTESTFLAGS="plugin.exp=diagnostic*"

Running /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/plugin/plugin.exp ...
FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c
-fplugin=./diagnostic_plugin_test_tree_expression_range.so  1 blank line(s) 
in output
FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c
-fplugin=./diagnostic_plugin_test_tree_expression_range.so  expected multiline 
pattern lines 550-551 not found: "
__builtin_types_compatible_p \(long, int\) \+ f \(i\)\);.*\n
~\^~~\n"
FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c
-fplugin=./diagnostic_plugin_test_tree_expression_range.so (test for excess 
errors)

a lot more errors happen with COLUMNS=20.

Tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.


2021-05-28  Bernd Edlinger  

  * gcc.dg/plugin/diagnostic_plugin_show_trees.c (plugin_init): Fix 
caret_max_with.
  * gcc.dg/plugin/diagnostic_plugin_test_inlining.c
  (plugin_init): Likewise.
  * gcc.dg/plugin/diagnostic_plugin_test_paths.c (plugin_init): Likewise.
  * gcc.dg/plugin/diagnostic_plugin_test_string_literals.c
  (plugin_init): Likewise.
  * gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c
  (plugin_init): Likewise.

So while you've got a patch here, you haven't indicated what the problem 
actually was.  I'm guessing caret_max_width was uninitialized and thus we got 
random-ish values.  Presumably those randomish-values are what caused tests to 
occasionally appear to truncate their output and fail?

If that's the case, this is fine.  If it's something deeper, please provide a 
bit of background to help in evaluating the patch.


Yes, the problem is just the function get_terminal_width in diagnostic.c, can
somehow learn the X-terminal's window dimensions where the make check is
started:

int
get_terminal_width (void)
{
const char * s = getenv ("COLUMNS");
if (s != NULL) {
  int n = atoi (s);
  if (n > 0)
return n;
}

#ifdef TIOCGWINSZ
struct winsize w;
w.ws_col = 0;
if (ioctl (0, TIOCGWINSZ, ) == 0 && w.ws_col > 0)
  return w.ws_col;
#endif

return INT_MAX;
}

and this is used to initialize context->caret_max_width in 
diagnostic_set_caret_max_width,
and possibly also other places. This causes a small variation in the output that
is trips the test cases.  It is just an extra newline in some cases, that I 
have not
debugged why exactly this happens, but I assume this is intentional to make the
diagnostics spanning multiple lines better readable to a human.

Thanks.  So for the testsuite we certainly don't want to be doing that
:-)  OK for the trunk, thanks for chasing it down -- I've been seeing
these failures, but haven't had the time to chase down a root cause.


I'd like to backport it to GCC 11 branch to avoid random failures on
GCC 11 branch:

https://gcc.gnu.org/pipermail/gcc-regression/2021-August/075244.html

Sure.
jeff

Re: [PATCH] declare get_range_query attribute returns_nonnull

2021-08-25 Thread Jeff Law via Gcc-patches





On 8/25/2021 5:01 PM, Martin Sebor via Gcc-patches wrote:

Andrew, based on your remarks in our discussion Re: enable ranger
and caching in pass_waccess, I've added some comments to struct
function and get_range_query() and declared the latter with
attribute returns_nonnull to make it explicit both to readers
and to GCC that the x_range_query member is never null.

Tested on x86_64-linux.

Martin

gcc-get_range_query.diff

gcc/ChangeLog:
* function.h (function): Add comments.
(get_range_query): Same.  Add attribute returns nonnull.

OK
jeff

[PATCH] declare get_range_query attribute returns_nonnull

2021-08-25 Thread Martin Sebor via Gcc-patches


Andrew, based on your remarks in our discussion Re: enable ranger
and caching in pass_waccess, I've added some comments to struct
function and get_range_query() and declared the latter with
attribute returns_nonnull to make it explicit both to readers
and to GCC that the x_range_query member is never null.

Tested on x86_64-linux.

Martin
gcc/ChangeLog:
	* function.h (function): Add comments.
	(get_range_query): Same.  Add attribute returns nonnull.

diff --git a/gcc/function.h b/gcc/function.h
index 0db51775e7c..36003e7576a 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -312,7 +312,8 @@ struct GTY(()) function {
 
   /* Range query mechanism for functions.  The default is to pick up
  global ranges.  If a pass wants on-demand ranges OTOH, it must
- call enable/disable_ranger().  */
+ call enable/disable_ranger().  The pointer is never null.  It
+ should be queried by calling get_range_query().  */
   range_query * GTY ((skip)) x_range_query;
 
   /* Last statement uid.  */
@@ -719,10 +720,10 @@ extern const char *current_function_name (void);
 extern void used_types_insert (tree);
 
 /* Returns the currently active range access class.  When there is no active
-   range class, global ranges are used.  */
+   range class, global ranges are used.  Never returns null.  */
 
-inline range_query *
-get_range_query (struct function *fun)
+ATTRIBUTE_RETURNS_NONNULL inline range_query *
+get_range_query (const struct function *fun)
 {
   return fun->x_range_query;
 }

Re: [PATCH] improve note location and refactor warn_uninit

2021-08-25 Thread Jeff Law via Gcc-patches





On 8/25/2021 2:03 PM, Martin Sebor via Gcc-patches wrote:

Richard, some time ago you mentioned you'd had trouble getting
-Wuninitialized to print the note pointing to the uninitialized
variable.  I said I'd look into it, and so I did.  The attached
patch simplifies the warn_uninit() function to get rid of some
redundant cruft: besides a few pointless null tests and
a duplicate argument it also does away with ancient code that's
responsible for the problem you saw.  It turns out that tests
for the problem are already in the test suite but have been
xfailed since the day they were added, so the patch simply
removes the xfails.  I'll go ahead and commit this cleanup
as obvious later this week unless you have suggestions for
further changes.

Tested on x86_64-linux.
I'm going to object to installing this "as obvious".  I don't see how 
anyone could come to the conclusion this is obvious.


Please wait for review.

jeff

Re: [PATCH 13/34] rs6000: Add Cell builtins

2021-08-25 Thread Segher Boessenkool

On Thu, Jul 29, 2021 at 08:31:00AM -0500, Bill Schmidt wrote:
>   * config/rs6000/rs6000-builtin-new.def: Add cell stanza.

This one is fine, too.  Thanks!


Segher

Re: [PATCH 12/34] rs6000: Add miscellaneous builtins

2021-08-25 Thread Segher Boessenkool

On Thu, Jul 29, 2021 at 08:30:59AM -0500, Bill Schmidt wrote:
>   * config/rs6000/rs6000-builtin-new.def: Add ieee128-hw, dfp,
>   crypto, and htm stanzas.

Okay for trunk.  Thanks!


Segher

Re: [PATCH 11/34] rs6000: Add MMA builtins

2021-08-25 Thread Segher Boessenkool

Hi!

On Thu, Jul 29, 2021 at 08:30:58AM -0500, Bill Schmidt wrote:
>   * config/rs6000/rs6000-builtin-new.def: Add mma stanza.

Okay for trunk.  Thanks!


Segher

Re: [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.

2021-08-25 Thread Michael Meissner via Gcc-patches

On Wed, Aug 25, 2021 at 05:29:22PM -0500, Segher Boessenkool wrote:
> On Wed, Aug 25, 2021 at 02:22:06PM -0400, Michael Meissner wrote:
> > On Wed, Aug 25, 2021 at 12:44:16PM -0500, Segher Boessenkool wrote:
> > > Out of interest, did you notice any scheduling differences with this?
> > 
> > I don't use the built-ins so I wouldn't notice a difference.  I noticed 
> > this as
> > part of the next patch to add support for XXSPLTIDP (and ultimately 
> > XXSPLTIW in
> > a future patch).  The XXSPLTIDP instruction allows loading up many SFmode,
> > DFmode, and V2DFmode constants.  The XXSPLTIW instruction allows loading up
> > certain V16QImode, V8HImode, V4SImode, and V4SFmode constants.
> 
> Yeah, you might notice scheduling differences when "normal" code starts
> using this.  Builtins are meh :-)

Yep, that's my thought.  It just noticed the vecsimple stuff as I was adding
the xxsplti{w,dp,32dx} support.

> > However, I suspect if you aren't running spec on an otherwise
> > idle machine, things will change where XXSPLTIDP will be more of a win by
> > eliminating the loads.
> 
> It has more bandwidth and it is less contested, yup.  It does have the
> usual "prefixed" gotchas though.

Note, it replaces one prefixed instruction (plfd) with another (xxspltidp).

> > While XXSPLTIDP by itself is positive, unfortunately, there is a regression 
> > in
> > cactuBSSN_r (3%) when I add XXSPLTIW (but not XXSPLTIDP) that I'm trying to
> > track down.
> > 
> > If I add both instructions, several of the benchmarks improve (including
> > xalancbmk by 11% and x264_r by 27%), but cactuBSSN_r has the 3% regression 
> > and
> > fotonik3d_r also has a new 3% regression.
> > 
> > Given that many more programs use floating point constants than vector
> > constants (66,000 XXSPLTID's created vs. 5,000 XXSPLTIW's), I figure to push
> > the XXSPLTIDP now, and try to figure out the differences before submitting 
> > the
> > XXSPLTIW patch.
> 
> It would be interesting to figure out a pattern behind the regressions :-)

Definately.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.

2021-08-25 Thread Segher Boessenkool

On Wed, Aug 25, 2021 at 02:22:06PM -0400, Michael Meissner wrote:
> On Wed, Aug 25, 2021 at 12:44:16PM -0500, Segher Boessenkool wrote:
> > Out of interest, did you notice any scheduling differences with this?
> 
> I don't use the built-ins so I wouldn't notice a difference.  I noticed this 
> as
> part of the next patch to add support for XXSPLTIDP (and ultimately XXSPLTIW 
> in
> a future patch).  The XXSPLTIDP instruction allows loading up many SFmode,
> DFmode, and V2DFmode constants.  The XXSPLTIW instruction allows loading up
> certain V16QImode, V8HImode, V4SImode, and V4SFmode constants.

Yeah, you might notice scheduling differences when "normal" code starts
using this.  Builtins are meh :-)

> However, I suspect if you aren't running spec on an otherwise
> idle machine, things will change where XXSPLTIDP will be more of a win by
> eliminating the loads.

It has more bandwidth and it is less contested, yup.  It does have the
usual "prefixed" gotchas though.

> While XXSPLTIDP by itself is positive, unfortunately, there is a regression in
> cactuBSSN_r (3%) when I add XXSPLTIW (but not XXSPLTIDP) that I'm trying to
> track down.
> 
> If I add both instructions, several of the benchmarks improve (including
> xalancbmk by 11% and x264_r by 27%), but cactuBSSN_r has the 3% regression and
> fotonik3d_r also has a new 3% regression.
> 
> Given that many more programs use floating point constants than vector
> constants (66,000 XXSPLTID's created vs. 5,000 XXSPLTIW's), I figure to push
> the XXSPLTIDP now, and try to figure out the differences before submitting the
> XXSPLTIW patch.

It would be interesting to figure out a pattern behind the regressions :-)


Segher

[PATCH] Fix float128-call.c test for power8 IEEE 128 and power10.

2021-08-25 Thread Michael Meissner via Gcc-patches

>From 327273dfeec5c000f3c33ca7b88ee0097fd33586 Mon Sep 17 00:00:00 2001
From: Michael Meissner 
Date: Wed, 25 Aug 2021 00:31:35 -0400
Subject: [PATCH] Fix float128-call.c test for power8 IEEE 128 and power10.

I built a compiler on a little endian power8 system where the default long
double was IEEE 128-bit instead of IBM 128-bit.  I discovered that on
power8, we would generate a lxvd2x and xxpermdi to deal with the endianess
instead of the Altivec lxv.

In addition, I noticed the constant that was being loaded (1.0q) could be
loaded by the lxvkq instruction.

I rewrote the test to handle all forms of vector load and store that can
be generated.  And I changed the constant to be one that lxvkq does not
support.

I did bootstrap tests on the following systems, and the the test ran in all
environments (each of the systems were configured for the cpu mentioned):

   1)   Little endian power9  with IBM  128-bit long double
   2)   Little endian power9  with IEEE 128-bit long double
   3)   Little endian power8  with IBM  128-bit long double
   4)   Little endian power8  with IEEE 128-bit long double
   5)   Little endian power10 with IBM  128-bit long double
   6)   Little endian power10 with IEEE 128-bit long double
   7)   Big endianpower8  with IBM  128-bit long double

Can I check this patch into the master branch and later backport it to GCC-11?

2021-08-25  Michael Meissner  

gcc/testsuite/
* gcc.target/powerpc/float128-call.c: Fix test for IEEE 128-bit
long double and power10.
---
 .../gcc.target/powerpc/float128-call.c| 27 +--
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/float128-call.c 
b/gcc/testsuite/gcc.target/powerpc/float128-call.c
index b64ffc68bfa..d1cf47e4298 100644
--- a/gcc/testsuite/gcc.target/powerpc/float128-call.c
+++ b/gcc/testsuite/gcc.target/powerpc/float128-call.c
@@ -6,22 +6,33 @@
 #error "-mfloat128 is not supported."
 #endif
 
+/* Pick a constant to load that cannot be generated by the power10 lxvkq
+   instruction.  */
 #ifdef __LONG_DOUBLE_IEEE128__
 #define TYPE long double
-#define ONE  1.0L
+#define TEN  10.0L
 
 #else
 #define TYPE __float128
-#define ONE  1.0Q
+#define TEN  10.0Q
 #endif
 
 /* Test to make sure vector registers are used for passing IEEE 128-bit
floating point values and returning them. Also make sure the 'q' suffix is
-   handled.  */
-TYPE one (void) { return ONE; }
+   handled for __float128.  */
+TYPE one (void) { return TEN; }
 void store (TYPE a, TYPE *p) { *p = a; }
 
-/* { dg-final { scan-assembler {\mlxvd2x 34\M} {target be} } } */
-/* { dg-final { scan-assembler {\mstxvd2x 34\M} {target be} } } */
-/* { dg-final { scan-assembler {\mlvx 2\M} {target le} } }  */
-/* { dg-final { scan-assembler {\mstvx 2\M} {target le} } } */
+/* This regexp captures the different vector load/stores that can be generated:
+
+   lxvd2x  -- big endian power7/power8, little endian power8
+   lvx -- Altivec
+   lxv -- power9
+   plxv-- power10
+   lxvx-- X-form variant.
+   stxvd2x -- big endian power7/power8, little endian power8
+   stvx-- Altivec
+   lxvx-- power9/power10.  */
+
+/* { dg-final { scan-assembler {\mlxvd2x 34\M|\mlvx 2\M|\mp?lxvx? 34\M} } } */
+/* { dg-final { scan-assembler {\mstxvd2x 34\M|\mstvx 2\M|\mstxvx 34\M} } } */
-- 
2.31.1


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

[committed] libstdc++: Fix conditions for optimizing uninitialized algos [PR102064]

2021-08-25 Thread Jonathan Wakely via Gcc-patches

While laying some groundwork for constexpr std::vector, I noticed some
bugs in the std::uninitialized_xxx algorithms. The conditions being
checked for optimizing trivial cases were not quite right, as shown in
the examples in the PR.

This consolidates the checks into a single macro. The macro has
appropriate definitions for C++98 or for later standards, to avoid a #if
everywhere the checks are used. For C++11 and later the check makes a
call to a new function doing a static_assert to ensure we don't use
assignment in cases where construction would have been invalid.
Extracting that check to a separate function will be useful for
constexpr std::vector, as that can't use std::uninitialized_copy
directly because it isn't constexpr).

The consolidated checks mean that some slight variations in static
assert message are gone, as there is only one place that does the assert
now. That required adjusting some tests. As part of that the redundant
89164_c++17.cc test was merged into 89164.cc which is compiled as C++17
by default now, but can also use other -std options if the
C++17-specific error is made conditional with a target selector.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102064
* include/bits/stl_uninitialized.h (_GLIBCXX_USE_ASSIGN_FOR_INIT):
Define macro to check conditions for optimizing trivial cases.
(__check_constructible): New function to do static assert.
(uninitialized_copy, uninitialized_fill, uninitialized_fill_n):
Use new macro.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc:
Adjust dg-error pattern.
* testsuite/23_containers/vector/cons/89164.cc: Likewise. Add
C++17-specific checks from 89164_c++17.cc.
* testsuite/23_containers/vector/cons/89164_c++17.cc: Removed.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/102064.cc:
New test.
* 
testsuite/20_util/specialized_algorithms/uninitialized_copy_n/102064.cc:
New test.
* testsuite/20_util/specialized_algorithms/uninitialized_fill/102064.cc:
New test.
* 
testsuite/20_util/specialized_algorithms/uninitialized_fill_n/102064.cc:
New test.

Tested powerpc64le-linux. Committed to trunk.

commit ead408529d7a69873a7c14dd12fa043cd5862253
Author: Jonathan Wakely 
Date:   Wed Aug 25 21:10:48 2021

libstdc++: Fix conditions for optimizing uninitialized algos [PR102064]

While laying some groundwork for constexpr std::vector, I noticed some
bugs in the std::uninitialized_xxx algorithms. The conditions being
checked for optimizing trivial cases were not quite right, as shown in
the examples in the PR.

This consolidates the checks into a single macro. The macro has
appropriate definitions for C++98 or for later standards, to avoid a #if
everywhere the checks are used. For C++11 and later the check makes a
call to a new function doing a static_assert to ensure we don't use
assignment in cases where construction would have been invalid.
Extracting that check to a separate function will be useful for
constexpr std::vector, as that can't use std::uninitialized_copy
directly because it isn't constexpr).

The consolidated checks mean that some slight variations in static
assert message are gone, as there is only one place that does the assert
now. That required adjusting some tests. As part of that the redundant
89164_c++17.cc test was merged into 89164.cc which is compiled as C++17
by default now, but can also use other -std options if the
C++17-specific error is made conditional with a target selector.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102064
* include/bits/stl_uninitialized.h (_GLIBCXX_USE_ASSIGN_FOR_INIT):
Define macro to check conditions for optimizing trivial cases.
(__check_constructible): New function to do static assert.
(uninitialized_copy, uninitialized_fill, uninitialized_fill_n):
Use new macro.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc:
Adjust dg-error pattern.
* testsuite/23_containers/vector/cons/89164.cc: Likewise. Add
C++17-specific checks from 89164_c++17.cc.
* testsuite/23_containers/vector/cons/89164_c++17.cc: Removed.
* 
testsuite/20_util/specialized_algorithms/uninitialized_copy/102064.cc:
New test.
* 
testsuite/20_util/specialized_algorithms/uninitialized_copy_n/102064.cc:
New test.
* 
testsuite/20_util/specialized_algorithms/uninitialized_fill/102064.cc:
New test.
* 
testsuite/20_util/specialized_algorithms/uninitialized_fill_n/102064.cc:
New test.

diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h

[committed] libstdc++: Remove __gnu_cxx::rope::erase(size_type) [PR102048]

2021-08-25 Thread Jonathan Wakely via Gcc-patches

This function claims to remove a single character at index p, but it
actually removes p+1 characters beginning at p. So r.erase(0) removes
the first character, but r.erase(1) removes the second and third, and
r.erase(2) removes the second, third and fourth. This is not a useful
API.

The overload is present in the SGI STL  header that we
imported, but it isn't documented in the API reference. The erase
overloads that are documented are:

erase(const iterator& p)
erase(const iterator& f, const iterator& l)
erase(size_type i, size_type n);

Having an erase(size_type p) overload that erases a single character (as
the comment says it does) might be useful, but would be inconsistent
with std::basic_string::erase(size_type p = 0, size_type n = npos),
which erases from p to the end of the string when called with a single
argument.

Since the function isn't part of the documented API, doesn't do what it
claims to do (or anything useful) and "fixing" it would leave it
inconsistent with basic_string, I'm just removing that overload.

libstdc++-v3/ChangeLog:

PR libstdc++/102048
* include/ext/rope (rope::erase(size_type)): Remove broken
function.

Tested powerpc64le-linux. Committed to trunk.

commit 2cd229dec8d6716938de5052479d059d306969da
Author: Jonathan Wakely 
Date:   Wed Aug 25 16:42:49 2021

libstdc++: Remove __gnu_cxx::rope::erase(size_type) [PR102048]

This function claims to remove a single character at index p, but it
actually removes p+1 characters beginning at p. So r.erase(0) removes
the first character, but r.erase(1) removes the second and third, and
r.erase(2) removes the second, third and fourth. This is not a useful
API.

The overload is present in the SGI STL  header that we
imported, but it isn't documented in the API reference. The erase
overloads that are documented are:

erase(const iterator& p)
erase(const iterator& f, const iterator& l)
erase(size_type i, size_type n);

Having an erase(size_type p) overload that erases a single character (as
the comment says it does) might be useful, but would be inconsistent
with std::basic_string::erase(size_type p = 0, size_type n = npos),
which erases from p to the end of the string when called with a single
argument.

Since the function isn't part of the documented API, doesn't do what it
claims to do (or anything useful) and "fixing" it would leave it
inconsistent with basic_string, I'm just removing that overload.

libstdc++-v3/ChangeLog:

PR libstdc++/102048
* include/ext/rope (rope::erase(size_type)): Remove broken
function.

diff --git a/libstdc++-v3/include/ext/rope b/libstdc++-v3/include/ext/rope
index 9681dbc6225..d4406a942b6 100644
--- a/libstdc++-v3/include/ext/rope
+++ b/libstdc++-v3/include/ext/rope
@@ -2401,11 +2401,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
this->_M_tree_ptr = __result;
   }
 
-  // Erase, single character
-  void
-  erase(size_type __p)
-  { erase(__p, __p + 1); }
-
   // Insert, iterator variants.
   iterator
   insert(const iterator& __p, const rope& __r)

Re: [PING][PATCH] enable ranger and caching in pass_waccess

2021-08-25 Thread Martin Sebor via Gcc-patches


On 8/25/21 10:14 AM, Andrew MacLeod wrote:

On 8/25/21 11:20 AM, Martin Sebor wrote:

Ping: Andrew, did I answer your questions?  Do you (or anyone else)
have any other comments on the latest patch below?

https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577865.html



I wasn't attempting to block it, its outside my review purview..

I was merely commenting that you should not need a pointer to a 
range_query at all anymore









Are you planning to transition to using the get_range_query() 
interface instead of keeping a range_query pointer in the 
pointer_query class?


This pass and to a smaller extent the pointer_query class that's
used by it and the strlen pass are still a work in progress.
I also still need to convert the strlen pass to use Ranger and
I expect it will take some changes to pointer_query.  So at that
point, if going through get_range_query (cfun) everywhere is what
you recommend, I'm happy to do it.

absolutely. you should not need to even know whether you have a ranger 
instance running or not. get_range_query will give you whichever is 
active, and there is ALWAYS something active.. defaulting to the global 
version.


the code in get_range() seems to be the end of the call chain which uses 
the pointer and should be consolidated to something much simpler


  if (rvals && stmt)
     {
   if (!rvals->range_of_expr (vr, val, stmt))
     return NULL_TREE;

     <...>

   // ?? This entire function should use get_range_query or
get_global_range_query (),
   // instead of doing something different for RVALS and global ranges.

   if (!get_global_range_query ()->range_of_expr (vr, val) ||
vr.undefined_p ())
     return NULL_TREE;


This entire section can boil down to something like

if (!get_range_query (cfun)->range_of_expr (vr, val, stmt))
   return NULL;


So get_range_query (cfun) will never be null?  And no special handling
of INTEGER_CST is needed, or checking for SSA_NAME.  Nice.  Attached
is another revision of the same patch with this function simplified.

(FYI: I expect the whole function to eventually go away, and to make
other similar simplifications, but I'm not there yet.)


PS There has been an effort to get rid of global variables from GCC,
or, as the first step, to avoid accessing them directly(*).  If and
when that happens, it seems like each pass will have to store either
the ranger instance as a member (directly or indirectly, via a member
of a class that stores it) or the function passed to pass::execute()
if it wants to access either.

[*] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573975.html
The patch at the link above wasn't approved but IIUC removing globals
from GCC is still a goal.


I have no idea what direction that is going, but the net effect will be 
the same in the end.  You'll be calling get_range_query() with either 
the function pointer, or the pass pointer, or something.. but you should 
never need to pass around  a pointer to either a ranger or range-query.  
As I said earlier, this will likely be a pass property and accessed thru 
the pass eventually... but until then, use what we have.. cfun.  It'll 
be trivial to transition down the road to whatever the ultimate solution is.


Okay, I'll do that in my future changes.

Martin
Enable ranger and caching in pass_waccess.

gcc/ChangeLog:

	* gimple-ssa-warn-access.cc (get_size_range): Add argument.
	(check_access): Pass additional argument.
	(check_memop_access): Remove template and make a member function.
	(maybe_check_dealloc_call): Make a pass_waccess member function.
	(class pass_waccess): Add, rename, and remove members.
	(pass_waccess::pass_waccess): Adjust to name change.
	(pass_waccess::~pass_waccess): Same.
	(check_alloca): Make a member function.
	(check_alloc_size_call): Same.
	(check_strcat): Same.
	(check_strncat): Same.
	(check_stxcpy): Same.
	(check_stxncpy): Same.
	(check_strncmp): Same.
	(maybe_warn_rdwr_sizes): Rename...
	(pass_waccess::maybe_check_access_sizes): ...to this.
	(pass_waccess::check_call): Adjust to name changes.
	(pass_waccess::maybe_check_dealloc_call): Make a pass_waccess member
	function.
	(pass_waccess::execute): Adjust to name changes.
	* gimple-ssa-warn-access.h (check_memop_access): Remove.
	* pointer-query.cc (access_ref::phi): Handle null pointer.
	(access_ref::inform_access): Same.
	(pointer_query::put_ref): Modify a cached value, not a copy of it.
	(pointer_query::dump): New function.
	(compute_objsize_r): Avoid overwriting access_ref::bndrng.  Cache
	more results.
	* pointer-query.h (pointer_query::dump): Declare.
	* tree-ssa-strlen.c (get_range): Simplify.  Use function query.
	(dump_strlen_info): Use function query.
	(printf_strlen_execute): Factor code out into pointer_query::put_ref.

gcc/testsuite/ChangeLog:

	* gcc.dg/Wstringop-overflow-11.c: Remove xfails.
	* gcc.dg/Wstringop-overflow-12.c: Same.
	* gcc.dg/Wstringop-overflow-43.c: Add xfails.
	* gcc.dg/Wstringop-overflow-73.c: New

Re: [PATCH] Fix PR c++/66590: incorrect warning "reaches end of non-void function" for switch

2021-08-25 Thread Jason Merrill via Gcc-patches


On 8/13/21 2:40 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

So the problem here is there is code in the C++ front-end not to add a
break statement (to the IR) if the previous block does not fall through.
The problem is the code which does the check to see if the block
may fallthrough does not check a CLEANUP_STMT; it assumes it is always
fall through.  Anyways this adds the code for the case of a CLEANUP_STMT
that is only for !CLEANUP_EH_ONLY (the try/finally case).

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.


OK, thanks.


gcc/cp/ChangeLog:

* cp-objcp-common.c (cxx_block_may_fallthru): Handle
CLEANUP_STMT for the case which will be try/finally.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wreturn-5.C: New test.
---
  gcc/cp/cp-objcp-common.c  |  9 +
  gcc/testsuite/g++.dg/warn/Wreturn-5.C | 15 +++
  2 files changed, 24 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wreturn-5.C

diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
index beef0123b04..43888507b85 100644
--- a/gcc/cp/cp-objcp-common.c
+++ b/gcc/cp/cp-objcp-common.c
@@ -317,6 +317,15 @@ cxx_block_may_fallthru (const_tree stmt)
return true;
return block_may_fallthru (ELSE_CLAUSE (stmt));
  
+case CLEANUP_STMT:

+  /* Just handle the try/finally cases.  */
+  if (!CLEANUP_EH_ONLY (stmt))
+   {
+ return (block_may_fallthru (CLEANUP_BODY (stmt))
+ && block_may_fallthru (CLEANUP_EXPR (stmt)));
+   }
+  return true;
+
  default:
return c_block_may_fallthru (stmt);
  }
diff --git a/gcc/testsuite/g++.dg/warn/Wreturn-5.C 
b/gcc/testsuite/g++.dg/warn/Wreturn-5.C
new file mode 100644
index 000..543e33e905d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wreturn-5.C
@@ -0,0 +1,15 @@
+// PR C++/66590
+// { dg-do compile }
+// { dg-options "-Wall" }
+
+struct A{ ~A();};
+
+int f(int x)
+{
+A a;
+switch (x)
+{
+case 1: { A tmp; return 1; } break;
+default: return 0;
+}
+}  // { dg-bogus "control reaches end of non-void function" }

Re: [llvm-dev] [PATCH] Add optional _Float16 support

2021-08-25 Thread John McCall via Gcc-patches

On Wed, Aug 25, 2021 at 8:36 AM H.J. Lu  wrote:

> On Mon, Aug 23, 2021 at 10:55 PM John McCall  wrote:
> > On Thu, Jul 29, 2021 at 9:40 AM H.J. Lu  wrote:
> >> On Tue, Jul 13, 2021 at 9:24 AM H.J. Lu  wrote:
> >> > On Tue, Jul 13, 2021 at 8:41 AM Joseph Myers 
> wrote:
> >> > > On Tue, 13 Jul 2021, H.J. Lu wrote:
> >> > > > On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei <
> pengfei.w...@intel.com> wrote:
> >> > > > >
> >> > > > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1
> registers.
> >> > > > >
> >> > > > > Can you please explain the behavior here? Is there difference
> between _Float16 and _Complex _Float16 when return? I.e.,
> >> > > > > 1, In which case will _Float16 values return in both %xmm0 and
> %xmm1?
> >> > > > > 2, For a single _Float16 value, are both real part and
> imaginary part returned in %xmm0? Or returned in %xmm0 and %xmm1
> respectively?
> >> > > >
> >> > > > Here is the v2 patch to add the missing _Float16 bits.   The PDF
> file is at
> >> > > >
> >> > > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI
> >> > >
> >> > > This PDF shows _Complex _Float16 as having a size of 2 bytes
> (should be
> >> > > 4-byte size, 2-byte alignment).
> >> > >
> >> > > It also seems to change double from 4-byte to 8-byte alignment,
> which is
> >> > > wrong.  And it's inconsistent about whether it covers the long
> double =
> >> > > double (Android) case - it shows that case for _Complex long double
> but
> >> > > not for long double itself.
> >> >
> >> > Here is the v3 patch with the fixes.  I also updated the PDF file.
> >>
> >> Here is the final patch I checked in.   _Complex _Float16 is changed to
> return
> >> in XMM0 register.   The new PDF file is at
> >>
> >> https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI
> >
> >
> > This should be explicit that the real part is returned in bits 0..15 and
> the imaginary part is returned in bits 16..31, or however we conventionally
> designate subcomponents of a vector.
>
> How about this?
>
> diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
> index 860ff66..8f527c1 100644
> --- a/low-level-sys-info.tex
> +++ b/low-level-sys-info.tex
> @@ -457,6 +457,9 @@ and \texttt{unions}) are always returned in memory.
>  & \texttt{__float128} & memory \\
>  \hline
>  & \texttt{_Complex _Float16} & \reg{xmm0} \\
> +& & The real part is returned in bits 0..15. The imaginary part is
> +returned \\
> +& & in bits 16..31.\\
>  \cline{2-3}
>  Complex & \texttt{_Complex float} & \EDX:\EAX \\
>  floating- & & The real part is returned in \EAX. The imaginary part is
>
>
> https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/89eb3e52c7e5eadd58f7597508e13f34/intel386-psABI-2021-08-25.pdf


Looks good to me, thanks.

John.

Re: [PATCH] c++: Fix up value initialization of structs with zero width bitfields [PR102019]

2021-08-25 Thread Jason Merrill via Gcc-patches


On 8/23/21 4:16 PM, Jakub Jelinek wrote:

Hi!

The removal of remove_zero_width_bit_fields, in addition to triggering
some ABI issues that need solving anyway (ABI incompatibility between
C and C++) also resulted in UB inside of gcc, we now call build_zero_init
which calls build_int_cst on an integral type with TYPE_PRECISION of 0.

Fixed by ignoring the zero width bitfields.  I understand
build_value_init_noctor wants to initialize to 0 even unnamed bitfields
(of non-zero width), at least until we have some CONSTRUCTOR flag that says
that even all the padding bits should be cleared.

Bootstrapped/regtested on x86_64-linux, ok for trunk?


OK.


2021-08-23  Jakub Jelinek  

PR c++/102019
* init.c (build_value_init_noctor): Ignore unnamed zero-width
bitfields.

--- gcc/cp/init.c.jj2021-07-29 13:24:42.698012879 +0200
+++ gcc/cp/init.c   2021-08-23 14:43:08.791947395 +0200
@@ -427,6 +427,11 @@ build_value_init_noctor (tree type, tsub
  == NULL_TREE))
continue;
  
+	  /* Ignore unnamed zero-width bitfields.  */

+ if (DECL_UNNAMED_BIT_FIELD (field)
+ && integer_zerop (DECL_SIZE (field)))
+   continue;
+
  /* We could skip vfields and fields of types with
 user-defined constructors, but I think that won't improve
 performance at all; it should be simpler in general just

Jakub

Re: [PATCH] c++: Fix cp_tree_equal for template value args using dependent sizeof/alignof/noexcept expressions

2021-08-25 Thread Jason Merrill via Gcc-patches


On 8/21/21 12:55 AM, Barrett Adair via Gcc-patches wrote:

This patch fixes AST comparison for trailing return types using dependent
sizeof/alignof/noexcept expressions as template value arguments. I believe
this bug is over a decade old, hailing from GCC 4.6. I found it over 5
years ago and sat on the repro until I had time to fix it myself.


Thanks!


The new test cases demonstrate redeclarations mistakenly parsed as
ambiguous overloads. These fail with gcc, but pass with clang. The last
test case concerns a GNU extension for alignof (hence the -Wno-pedantic).
After applying this patch, bootstrap succeeds, the new tests pass, and my
native "make -k check" outputs look essentially similar to these based on
the same commit:
https://gcc.gnu.org/pipermail/gcc-testresults/2021-August/715032.html
(except I only waited for a few thousand of the libstdc++ tests to pass).

I think it makes sense to now split the switch-case blocks here since there
isn't much shared logic left, and I have a patch to do that if desired, but
I think the patch below is the cleanest for initial review.

I studied the history and churn of the comparing_specializations hacks.
While I am still not entirely certain this is the correct change to make
there, the change seems at least superficially reasonable to me. I haven't
been able to break it yet. Perhaps someone more familiar with this area of
the compiler can weigh in. See also the existing cp_unevaluated_operand
push/pop for these EXPR codes in cp_walk_tree.

I am awaiting a response to the copyright assignment request sent to
ass...@gnu.org.


Until that goes through, you can use DCO sign-off 
(https://gcc.gnu.org/dco.html).



commit fe64ca143dfb9021b4c47abb6382827c13e1f0cd
Author: Barrett Adair 
Date:   Fri Aug 20 15:37:36 2021 -0500


Please generate your patch with git format-patch for easier application.

It also looks like the patch was corrupted by your email client; a 
simple fix for that is to attach the patch instead of pasting it in the 
body of your message.  Or 'git help format-patch' has suggestions for 
ways to make including the patch inline work.


I mostly use git send-email myself, or attachments when using send-email 
would be too complicated.



 Fix cp_tree_equal for template value args using
 dependent sizeof/alignof/noexcept expressions

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 3c62dd74380..fce2a46cfa8 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -3903,40 +3903,41 @@ cp_tree_equal (tree t1, tree t2)
 as being equivalent to anything.  */
   if (VAR_P (o1) && DECL_NAME (o1) == NULL_TREE
  && !DECL_RTL_SET_P (o1))
/*Nop*/;
   else if (VAR_P (o2) && DECL_NAME (o2) == NULL_TREE
   && !DECL_RTL_SET_P (o2))
/*Nop*/;
   else if (!cp_tree_equal (o1, o2))
return false;

   return cp_tree_equal (TREE_OPERAND (t1, 1), TREE_OPERAND (t2, 1));
}

  case PARM_DECL:
/* For comparing uses of parameters in late-specified return types
   with an out-of-class definition of the function, but can also come
   up for expressions that involve 'this' in a member function
   template.  */



if (comparing_specializations
+  && !cp_unevaluated_operand


This looks dangerous; I would expect it to mean that in

template  struct A { };
template  void f(T t1) { A a; }
template  void g(T t2) { A a; }

the two As are treated as equivalent; that's what the 
comparing_specializations code is trying to prevent.



&& DECL_CONTEXT (t1) != DECL_CONTEXT (t2))
   /* When comparing hash table entries, only an exact match is
 good enough; we don't want to replace 'this' with the
 version from another function.  But be more flexible
 with parameters with identical contexts.  */
   return false;

if (same_type_p (TREE_TYPE (t1), TREE_TYPE (t2)))
   {
if (DECL_ARTIFICIAL (t1) ^ DECL_ARTIFICIAL (t2))
  return false;
if (CONSTRAINT_VAR_P (t1) ^ CONSTRAINT_VAR_P (t2))
  return false;
if (DECL_ARTIFICIAL (t1)
|| (DECL_PARM_LEVEL (t1) == DECL_PARM_LEVEL (t2)
&& DECL_PARM_INDEX (t1) == DECL_PARM_INDEX (t2)))
  return true;
   }
return false;

@@ -3974,63 +3975,68 @@ cp_tree_equal (tree t1, tree t2)
return true;

  case CONSTRAINT_INFO:
return cp_tree_equal (CI_ASSOCIATED_CONSTRAINTS (t1),
  CI_ASSOCIATED_CONSTRAINTS (t2));

  case CHECK_CONSTR:
return (CHECK_CONSTR_CONCEPT (t1) == CHECK_CONSTR_CONCEPT (t2)
&& comp_template_args (CHECK_CONSTR_ARGS (t1),
   CHECK_CONSTR_ARGS (t2)));

  case TREE_VEC:
/* These are template args.  Really we should be getting the
   caller to do this as it knows it to be true.  */
if (!comp_template_args (t1, t2, NULL, NULL, false))
   return false;
return true;

  case SIZEOF_EXPR:
  case ALIGNOF_EXPR:
+case NOEXCEPT_EXPR:
{
   tree o1 = TREE_OPERAND (t1, 0);
   tree o2 = TREE_OPERAND (t2, 0);

   if (code1 ==

[PATCH] improve note location and refactor warn_uninit

2021-08-25 Thread Martin Sebor via Gcc-patches


Richard, some time ago you mentioned you'd had trouble getting
-Wuninitialized to print the note pointing to the uninitialized
variable.  I said I'd look into it, and so I did.  The attached
patch simplifies the warn_uninit() function to get rid of some
redundant cruft: besides a few pointless null tests and
a duplicate argument it also does away with ancient code that's
responsible for the problem you saw.  It turns out that tests
for the problem are already in the test suite but have been
xfailed since the day they were added, so the patch simply
removes the xfails.  I'll go ahead and commit this cleanup
as obvious later this week unless you have suggestions for
further changes.

Tested on x86_64-linux.

Martin
Refactor warn_uninit() code, improve note location.

Related:
PR tree-optimization/17506 - warning about uninitialized variable points to wrong location
PR testsuite/37182 - Revision 139286 caused gcc.dg/pr17506.c and gcc.dg/uninit-15.c

gcc/ChangeLog:

	PR tree-optimization/17506
	PR testsuite/37182
	* tree-ssa-uninit.c (warn_uninit): Refactor and simplify.
	(warn_uninit_phi_uses): Remove argument from calls to warn_uninit.
	(warn_uninitialized_vars): Same.  Reduce visibility of locals.
	(warn_uninitialized_phi): Same.

gcc/testsuite/ChangeLog:

	PR tree-optimization/17506
	PR testsuite/37182
	* gcc.dg/diagnostic-tree-expr-ranges-2.c: Add expected output.
	* gcc.dg/uninit-15-O0.c: Remove xfail.
	* gcc.dg/uninit-15.c: Same.

diff --git a/gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges-2.c b/gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges-2.c
index 302e233a04a..1cbcad5ac7d 100644
--- a/gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges-2.c
+++ b/gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges-2.c
@@ -3,21 +3,25 @@
 
 int test_uninit_1 (void)
 {
-  int result;
-  return result;  /* { dg-warning "uninitialized" } */
-/* { dg-begin-multiline-output "" }
-   return result;
-  ^~
+  int result_1; /* { dg-message "declared here" } */
+  return result_1;  /* { dg-warning "uninitialized" } */
+  /* { dg-begin-multiline-output "" }
+   return result_1;
+  ^~~~
+   int result_1;
+   ^~~~
{ dg-end-multiline-output "" } */
 }
 
 int test_uninit_2 (void)
 {
-  int result;
-  result += 3; /* { dg-warning "uninitialized" } */
-/* { dg-begin-multiline-output "" }
-   result += 3;
-   ~~~^~~~
+  int result_2; /* { dg-message "declared here" } */
+  result_2 += 3;/* { dg-warning "uninitialized" } */
+  /* { dg-begin-multiline-output "" }
+   result_2 += 3;
+   ~^~~~
+   int result_2;
+   ^~~~
{ dg-end-multiline-output "" } */
-  return result;
+  return result_2;
 }
diff --git a/gcc/testsuite/gcc.dg/uninit-15-O0.c b/gcc/testsuite/gcc.dg/uninit-15-O0.c
index 36d96348617..1ee1388 100644
--- a/gcc/testsuite/gcc.dg/uninit-15-O0.c
+++ b/gcc/testsuite/gcc.dg/uninit-15-O0.c
@@ -14,7 +14,7 @@ void baz();
 
 void bar()
 {
-int j;   /* { dg-message "was declared here" {} { xfail *-*-* } } */
+int j;  /* { dg-message "declared here" } */
 for (; foo(j); ++j) /* { dg-warning "is used uninitialized" } */
 baz();
 }
diff --git a/gcc/testsuite/gcc.dg/uninit-15.c b/gcc/testsuite/gcc.dg/uninit-15.c
index 85cb116f349..7352662f627 100644
--- a/gcc/testsuite/gcc.dg/uninit-15.c
+++ b/gcc/testsuite/gcc.dg/uninit-15.c
@@ -20,7 +20,7 @@ void baz (void);
 void
 bar (void)
 {
-  int j; /* { dg-message "note: 'j' was declared here" "" { xfail *-*-* } } */
+  int j;/* { dg-message "note: 'j' was declared here" } */
   for (; foo (j); ++j)  /* { dg-warning "'j' is used uninitialized" } */
 baz ();
 }
diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index ad2cf48819b..3f9eab74e68 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -131,20 +131,19 @@ uninit_undefined_value_p (tree t)
again for plain uninitialized variables, since optimization may have
changed conditionally uninitialized to unconditionally uninitialized.  */
 
-/* Emit a warning for EXPR based on variable VAR at the point in the
-   program T, an SSA_NAME, is used being uninitialized.  The exact
-   warning text is in MSGID and DATA is the gimple stmt with info about
-   the location in source code.  When DATA is a GIMPLE_PHI, PHIARG_IDX
-   gives which argument of the phi node to take the location from.  WC
-   is the warning code.  */
+/* Emit warning OPT for variable VAR at the point in the program where
+   the SSA_NAME T is being used uninitialized.  The warning text is in
+   MSGID and STMT is the statement that does the uninitialized read.
+   PHI_ARG_LOC is the location of the PHI argument if T and VAR are one,
+   or UNKNOWN_LOCATION otherwise.  */
 
 static void
-warn_uninit (enum opt_code wc, tree t, tree expr, tree var,
-	 const char *gmsgid, void *data, location_t phiarg_loc)
+warn_uninit (opt_code opt, tree t, tree var, const char *gmsgid,
+	 gimple *context, location_t phi_arg_loc =

[PATCH] Generate XXSPLTIDP on power10.

2021-08-25 Thread Michael Meissner via Gcc-patches

Generate XXSPLTIDP on power10.

This patch implements XXSPLTIDP support for SF and DF scalar constants and V2DF
vector constants.  The XXSPLTIDP instruction is given a 32-bit immediate that
is converted to a vector of two DFmode constants.  The immediate is in SFmode
format, so only constants that fit as SFmode values can be loaded with
XXSPLTIDP.

I added a new constraint (eF) to match constants that can be loaded with the
XXSPLTIDP instruction.

I have added a temporary switch (-mxxspltidp) to control whether or not the
XXSPLTIDP instruction is generated.

I added 3 new tests to test loading up SF/DF scalar and V2DF vector
constants.

I have tested this with bootstrap compilers on power10 systems and there was no
regression.  I have built GCC with these patches on little endian power9 and
big endian power8 systems, and there were no regressions.

In addition, I have built and run the full Spec 2017 rate suite, comparing with
the patches enabled and not enabled.  There were roughly 66,000 XXSPLTIDP's
generated in the rate build for Spec 2017.  On a stand-alone system that is
running single threaded, blender_r has a 1.9% increase in performance, and rest
of the benchmarks are performance neutral.  However, I would expect that in a
real world scenario, switching to use XXSPLTIDP will increase performance due
to removing all of the loads.

Can I check this into the master branch?

2021-08-25  Michael Meissner  

gcc/
* config/rs6000/constraints.md (eF): New constraint.
* config/rs6000/predicates.md (easy_fp_constant): If we can load
the scalar constant with XXSPLTIDP, the floating point constant is
easy.
(xxspltidp_operand): New predicate.
(easy_vector_constant): If we can generate XXSPLTIDP, mark the
vector constant as easy.
* config/rs6000/rs6000-protos.h (xxspltidp_constant_p): New
declaration.
(prefixed_permute_p): Likewise.
* config/rs6000/rs6000.c (xxspltidp_constant_p): New function.
(output_vec_const_move): Add support for XXSPLTIDP.
(prefixed_permute_p): New function.
* config/rs6000/rs6000.md (prefixed attribute): Add support for
permute prefixed instructions.
(movsf_hardfloat): Add XXSPLTIDP support.
(mov_hardfloat32, FMOVE64 iterator): Likewise.
(mov_hardfloat64, FMOVE64 iterator): Likewise.
* config/rs6000/rs6000.opt (-mxxspltidp): New switch.
* config/rs6000/vsx.md (vsx_move_64bit): Add XXSPLTIDP
support.
(vsx_move_32bit): Likewise.
(vsx_splat_v2df_xxspltidp): New insn.
(XXSPLTIDP): New mode iterator.
(xxspltidp__internal): New insn and splits.
(xxspltidp__inst): Replace xxspltidp_v2df_inst with an
iterated form that also does SFmode, and DFmode.

gcc/testsuite/
* gcc.target/powerpc/vec-splat-constant-sf.c: New test.
* gcc.target/powerpc/vec-splat-constant-df.c: New test.
* gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
---
 gcc/config/rs6000/constraints.md  |   5 +
 gcc/config/rs6000/predicates.md   |  17 +++
 gcc/config/rs6000/rs6000-protos.h |   2 +
 gcc/config/rs6000/rs6000.c| 106 ++
 gcc/config/rs6000/rs6000.md   |  45 +---
 gcc/config/rs6000/rs6000.opt  |   4 +
 gcc/config/rs6000/vsx.md  |  64 ++-
 .../powerpc/vec-splat-constant-df.c   |  60 ++
 .../powerpc/vec-splat-constant-sf.c   |  60 ++
 .../powerpc/vec-splat-constant-v2df.c |  64 +++
 10 files changed, 405 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..ea2e4a267c3 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,6 +208,11 @@ (define_constraint "P"
   (and (match_code "const_int")
(match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x1")))
 
+;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
+(define_constraint "eF"
+  "A vector constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "xxspltidp_operand"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..134243e404b 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,11 @@ (define_predicate "easy_fp_constant"
   if (TARGET_VSX && op == CONST0_RTX (mode))
 return 1;
 
+  /* If we have the ISA 3.1

Re: [PATCH] avoid printing range table header alone

2021-08-25 Thread Andrew MacLeod via Gcc-patches


On 8/25/21 3:15 PM, Martin Sebor wrote:

On 8/25/21 11:46 AM, Andrew MacLeod wrote:

On 8/25/21 1:25 PM, Martin Sebor wrote:

I noticed that the output of -fdump-tree-walloca (and probably all
passes that call gimple_ranger::export_global_ranges()) includes
the following two lines for all functions, even when there's
nothing else of relevance after them:

Exported global range table
===

I was a bit puzzled by it at first, wondering if it was leftover
debugging output, until I checked the code and realized it's
a header that may be (but isn't always) followed by the contents
of the table.

To reduce clutter in the dump the attached patch changes the function
to print the header only when it is followed by at least one line of
output.  (Another tweak to reduce even further the amount of output
printed by default that might be worth considering is to print
the table only when TDF_DETAILS is set.)

Martin


Except checking x == 1 won't work if ssa_name(1)  was not updated.

Just set a flag if the header hasn't been printed yet. tried and 
true. I guess you can protect it behind details if you want.


Doh!  I've fixed that.  While testing I came across another instance
of the same pattern so I've made the same change there as well.  I
also took the liberty to make the output slightly more consistent
between the two dumps.

Attached is what I'm planning to commit.  Please let me know if you
have any other suggestions.


OK, thats fine.



FWIW, I see duplicate output in the EVRP1 dump that looks just like
it could be the output under Non-varying global ranges. I wonder if
it's the result of the block in ranger_cache::fill_block_cache guarded
by if (DEBUG_RANGE_CACHE).

Martin 



The cache debug mechanism hasnt been fully reworked yet... i moved GORI 
and the ranger to the trace class.. I haven't decided yet what I want to 
change in the cache. its a bit dated.


regardless, if your sources are up to date, DEBUG_RANGE_CACHE is now 
guarded by:


#define DEBUG_RANGE_CACHE (dump_file && (param_evrp_mode & 
EVRP_MODE_CACHE) \

 == EVRP_MODE_CACHE)

so you'd only see that output if you specify --param=evrp-mode=cache or 
debug


Andrew

Re: [PATCH] avoid printing range table header alone

2021-08-25 Thread Martin Sebor via Gcc-patches


On 8/25/21 11:46 AM, Andrew MacLeod wrote:

On 8/25/21 1:25 PM, Martin Sebor wrote:

I noticed that the output of -fdump-tree-walloca (and probably all
passes that call gimple_ranger::export_global_ranges()) includes
the following two lines for all functions, even when there's
nothing else of relevance after them:

Exported global range table
===

I was a bit puzzled by it at first, wondering if it was leftover
debugging output, until I checked the code and realized it's
a header that may be (but isn't always) followed by the contents
of the table.

To reduce clutter in the dump the attached patch changes the function
to print the header only when it is followed by at least one line of
output.  (Another tweak to reduce even further the amount of output
printed by default that might be worth considering is to print
the table only when TDF_DETAILS is set.)

Martin


Except checking x == 1 won't work if ssa_name(1)  was not updated.

Just set a flag if the header hasn't been printed yet. tried and true. I 
guess you can protect it behind details if you want.


Doh!  I've fixed that.  While testing I came across another instance
of the same pattern so I've made the same change there as well.  I
also took the liberty to make the output slightly more consistent
between the two dumps.

Attached is what I'm planning to commit.  Please let me know if you
have any other suggestions.

FWIW, I see duplicate output in the EVRP1 dump that looks just like
it could be the output under Non-varying global ranges. I wonder if
it's the result of the block in ranger_cache::fill_block_cache guarded
by if (DEBUG_RANGE_CACHE).

Martin
Avoid printing range table header alone.

gcc/ChangeLog:
	* gimple-range-cache.cc (ssa_global_cache::dump): Avoid printing
	range table header alone.
	* gimple-range.cc (gimple_ranger::export_global_ranges): Same.

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 4138d0556c6..facf981c15d 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -628,20 +628,32 @@ ssa_global_cache::clear ()
 void
 ssa_global_cache::dump (FILE *f)
 {
-  unsigned x;
-  int_range_max r;
-  fprintf (f, "Non-varying global ranges:\n");
-  fprintf (f, "=:\n");
-  for ( x = 1; x < num_ssa_names; x++)
-if (gimple_range_ssa_p (ssa_name (x)) &&
-	get_global_range (r, ssa_name (x))  && !r.varying_p ())
-  {
-	print_generic_expr (f, ssa_name (x), TDF_NONE);
-	fprintf (f, "  : ");
-	r.dump (f);
-	fprintf (f, "\n");
-  }
-  fputc ('\n', f);
+  /* Cleared after the table header has been printed.  */
+  bool print_header = true;
+  for (unsigned x = 1; x < num_ssa_names; x++)
+{
+  int_range_max r;
+  if (gimple_range_ssa_p (ssa_name (x)) &&
+	  get_global_range (r, ssa_name (x))  && !r.varying_p ())
+	{
+	  if (print_header)
+	{
+	  /* Print the header only when there's something else
+		 to print below.  */
+	  fprintf (f, "Non-varying global ranges:\n");
+	  fprintf (f, "==\n");
+	  print_header = false;
+	}
+
+	  print_generic_expr (f, ssa_name (x), TDF_NONE);
+	  fprintf (f, "  : ");
+	  r.dump (f);
+	  fprintf (f, "\n");
+	}
+}
+
+  if (!print_header)
+fputc ('\n', f);
 }
 
 // --
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index ef3afeacc90..fb23d567971 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -259,16 +259,11 @@ gimple_ranger::range_of_stmt (irange , gimple *s, tree name)
 void
 gimple_ranger::export_global_ranges ()
 {
-  unsigned x;
-  int_range_max r;
-  if (dump_file)
-{
-  fprintf (dump_file, "Exported global range table\n");
-  fprintf (dump_file, "===\n");
-}
-
-  for ( x = 1; x < num_ssa_names; x++)
+  /* Cleared after the table header has been printed.  */
+  bool print_header = true;
+  for (unsigned x = 1; x < num_ssa_names; x++)
 {
+  int_range_max r;
   tree name = ssa_name (x);
   if (name && !SSA_NAME_IN_FREE_LIST (name)
 	  && gimple_range_ssa_p (name)
@@ -276,21 +271,29 @@ gimple_ranger::export_global_ranges ()
 	  && !r.varying_p())
 	{
 	  bool updated = update_global_range (r, name);
+	  if (!updated || !dump_file || !(dump_flags & TDF_DETAILS))
+	continue;
 
-	  if (updated && dump_file)
+	  if (print_header)
 	{
-	  value_range vr = r;
-	  print_generic_expr (dump_file, name , TDF_SLIM);
-	  fprintf (dump_file, " --> ");
-	  vr.dump (dump_file);
+	  /* Print the header only when there's something else
+		 to print below.  */
+	  fprintf (dump_file, "Exported global range table:\n");
+	  fprintf (dump_file, "\n");
+	  print_header = false;
+	}
+
+	  value_range vr = r;
+	  print_generic_expr (dump_file, name , TDF_SLIM);
+	  fprintf (dump_file, "  : ");
+	  vr.dump (dump_file);
+	  fprintf (dump_file, "\n");
+

PING – Re: [Patch] Fortran: Fix Bind(C) char-len check, add ptr-contiguous check

2021-08-25 Thread Tobias Burnus


Early *PING*.
(I also should still review several Fortan patches... There are lots of
patches waiting for review :-/)

On 20.08.21 19:24, Tobias Burnus wrote:

The following is about interoperability (BIND(C)) only.


* The patch adds a missing check for pointer + contiguous.
(Rejected to avoid copy-in issues? Or checking issues?)


* And it corrects an issue regarding len > 1 characters. While

 subroutine foo(x)
character(len=2) :: x(*)

is valid Fortran code (the argument can be "abce" or ['a','b','c','d']
or ...)
– and would work also with bind(C) as the len=2 does not need to be
passed
as hidden argument as len is a constant.
However, it is not valid nonetheless.


OK? Comments?

Tobias


PS: Referenced locations in the standard (F2018):

C1554 If proc-language-binding-spec is specified for a procedure,
each of its dummy arguments shall be an interoperable procedure (18.3.6)
or a variable that is interoperable (18.3.4, 18.3.5), assumed-shape,
assumed-rank, assumed-type, of type CHARACTER with assumed length,
or that has the ALLOCATABLE or POINTER attribute.

18.3.1: "... If the type is character, the length type parameter is
interoperable if and only if its value is one. ..."

"18.3.4 Interoperability of scalar variables":
"... A named scalar Fortran variable is interoperable ... if it
is of type character12its length is not assumed or declared by
an expression that is not a constant expression."

18.3.5: Likewise but for arrays.

18.3.6 "... Fortran procedure interface is interoperable with a C
function prototype ..."
"(5) any dummy argument without the VALUE attribute corresponds
 to a formal parameter of the prototype that is of a pointer type,
and either
 • the dummy argument is interoperable with an entity of the
referenced type ..."
(Remark: those are passed as byte stream)
 "• the dummy argument is a nonallocatable nonpointer variable of
type
CHARACTER with assumed character length and the formal
parameter is
a pointer to CFI_cdesc_t,
  • the dummy argument is allocatable, assumed-shape,
assumed-rank, or
a pointer without the CONTIGUOUS attribute, and the formal
parameter
is a pointer to CFI_cdesc_t, or
(Remark: those two use an array descriptor, also for
explicit-size/assumed-size
arrays or for scalars.)
  • the dummy argument is assumed-type ..."


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Re: [PATCH] configure: Allow a host makefile fragment to override PIE flag settings.

2021-08-25 Thread Iain Sandoe via Gcc-patches




> On 25 Aug 2021, at 19:22, H.J. Lu  wrote:
> 
> On Wed, Aug 25, 2021 at 11:10 AM Iain Sandoe  wrote:
>> 
>> 
>> 
>>> On 25 Aug 2021, at 18:56, H.J. Lu via Gcc-patches  
>>> wrote:
>>> 
>>> On Wed, Aug 25, 2021 at 10:51 AM H.J. Lu  wrote:
 
 On Wed, Aug 25, 2021 at 10:42 AM Iain Sandoe  
 wrote:
> 
> Hi,
> 
>> On 20 Aug 2021, at 11:29, Richard Sandiford  
>> wrote:
>> 
 Maybe it would be easier to have the makefile fragments determine
 something like CODE_MODEL_CFLAGS, which can be "-fPIC", 
 "-mdynamic-no-pic",
 etc., and use:
 
 COMPILER += $(NO_PIE_CFLAGS) $(CODE_MODEL_CFLAGS)
>>> 
>>> OK. I have misgivings about this - the problem is that:
>>> 
>>> -fPIC -fno-PIE != -fno-PIE -fPIC,  which is not obvious to many folks - 
>>> who expect that
>>> the “last edition of a flag will be the one in force”.
>>> 
>>> So the PIE-ness and the PIC-ness are decoupled in the configury but 
>>> they need to be
>>> ordered specifically for targets that want PIC code by default (FWIW, I 
>>> don’t think Darwin
>>> is the only default-PIC case here, from discussions on irc).
>> 
>> Yeah, that's what the above was supposed to achieve.  In other words,
>> if you force non-PIE, you also need to follow that by 
>> $(CODE_MODEL_CFLAGS),
>> which restates whatever the base code model is.
>> 
>> If it's the decoupling you're worried about, then an alternative would
>> be to have:
>> 
>> NO_PIE_CFLAGS="-fno-PIE \$(CODE_MODEL_CFLAGS)”
> 
> I’d like to ask a couple of questions (of HJ who introduced the no-PIE 
> logic) before implementing this.
> 
> A. We use no-PIE for cc1* because that is needed to handle the PCH 
> implementation (which relies on the executables being loaded at the same 
> addresses each time).
> 
> B. It’s certainly not obvious to me why we need to build code to run on 
> $build to be no-PIE - I don’t see any such dependencies in the generators 
> etc.
> 
> - So Question1 - HJ what was the motivation for making the  XXX_BUILD_XXX 
> adopt no-PIE?
 
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934
 
> ——
> 
> Independently of this we seem to be building the objects for $host thus:
> 
> $(CXX) (NO_PIE_CFLAGS) -c $(ALL_CXXFLAGS) etc.
> 
> but we build for $build thus:
> 
> $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE 
> $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS)
> 
> which means that code model flags in $ALL_CXXFLAGS are overridden for 
> $build, but active for $host
> ^^ this is actually what causes the Darwin build fail - since on Darwin 
> we cannot build static linked code for user-space processes.
> 
> in any event that’s inconsistent (unless there’s a reason that it should 
> be different).
> 
> 
> 
> below are extracts from gcc/Makefile *on linux* which demonstrates the 
> different ordering.
> 
> AFAICT,
> NO_PIE_CFLAGS_FOR_BUILD, NO_PIE_FLAG_FOR_BUILD are dead variables?
>> 
>> ^^ what was the intention for these?
>> 
> Question 2 : HJ, what was your intention for how a configuration would 
> request PIC code (for example) for things to run on $build?
>>> 
>>> We need to fix
>>> 
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934
>> 
>> The need for no-PIE for the exectuables that consume PCH is completely clear 
>> (but a target can choose not to use PCH - and configure 
>> —disable-libstdcxx-pch).
>> 
>> The PR doesn’t explain:
>> 
>> 1. why it’s being enabled for the generators (and other $build code) which 
>> do not use PCH (AFAIK)
>> 
>> 2. why the flags ordering is different for $build and $host.
>> 
>> I am completely happy to make the fix Richard suggested - but we seem to be 
>> adding complexity rather than simplifying things; as noted in the PR there 
>> are targets that cannot use no-PIE and therefore have to disable PCH anyway.
> 
> Please follow:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2015-October/432180.html

I don’t see answers to the questions above in that thead or the PR - there is 
nothing so far in this that explains why no-PIE has been forced on for the 
stuff in gcc/build

but OK.. I’ll look at implementing the patch suggested by Richard,
thanks
Iain

Re: [PATCH] configure: Allow a host makefile fragment to override PIE flag settings.

2021-08-25 Thread H.J. Lu via Gcc-patches

On Wed, Aug 25, 2021 at 11:10 AM Iain Sandoe  wrote:
>
>
>
> > On 25 Aug 2021, at 18:56, H.J. Lu via Gcc-patches  
> > wrote:
> >
> > On Wed, Aug 25, 2021 at 10:51 AM H.J. Lu  wrote:
> >>
> >> On Wed, Aug 25, 2021 at 10:42 AM Iain Sandoe  
> >> wrote:
> >>>
> >>> Hi,
> >>>
>  On 20 Aug 2021, at 11:29, Richard Sandiford  
>  wrote:
> 
> >> Maybe it would be easier to have the makefile fragments determine
> >> something like CODE_MODEL_CFLAGS, which can be "-fPIC", 
> >> "-mdynamic-no-pic",
> >> etc., and use:
> >>
> >> COMPILER += $(NO_PIE_CFLAGS) $(CODE_MODEL_CFLAGS)
> >
> > OK. I have misgivings about this - the problem is that:
> >
> > -fPIC -fno-PIE != -fno-PIE -fPIC,  which is not obvious to many folks - 
> > who expect that
> > the “last edition of a flag will be the one in force”.
> >
> > So the PIE-ness and the PIC-ness are decoupled in the configury but 
> > they need to be
> > ordered specifically for targets that want PIC code by default (FWIW, I 
> > don’t think Darwin
> > is the only default-PIC case here, from discussions on irc).
> 
>  Yeah, that's what the above was supposed to achieve.  In other words,
>  if you force non-PIE, you also need to follow that by 
>  $(CODE_MODEL_CFLAGS),
>  which restates whatever the base code model is.
> 
>  If it's the decoupling you're worried about, then an alternative would
>  be to have:
> 
>  NO_PIE_CFLAGS="-fno-PIE \$(CODE_MODEL_CFLAGS)”
> >>>
> >>> I’d like to ask a couple of questions (of HJ who introduced the no-PIE 
> >>> logic) before implementing this.
> >>>
> >>> A. We use no-PIE for cc1* because that is needed to handle the PCH 
> >>> implementation (which relies on the executables being loaded at the same 
> >>> addresses each time).
> >>>
> >>> B. It’s certainly not obvious to me why we need to build code to run on 
> >>> $build to be no-PIE - I don’t see any such dependencies in the generators 
> >>> etc.
> >>>
> >>> - So Question1 - HJ what was the motivation for making the  XXX_BUILD_XXX 
> >>> adopt no-PIE?
> >>
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934
> >>
> >>> ——
> >>>
> >>> Independently of this we seem to be building the objects for $host thus:
> >>>
> >>> $(CXX) (NO_PIE_CFLAGS) -c $(ALL_CXXFLAGS) etc.
> >>>
> >>> but we build for $build thus:
> >>>
> >>> $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE 
> >>> $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS)
> >>>
> >>> which means that code model flags in $ALL_CXXFLAGS are overridden for 
> >>> $build, but active for $host
> >>> ^^ this is actually what causes the Darwin build fail - since on Darwin 
> >>> we cannot build static linked code for user-space processes.
> >>>
> >>> in any event that’s inconsistent (unless there’s a reason that it should 
> >>> be different).
> >>>
> >>> 
> >>>
> >>> below are extracts from gcc/Makefile *on linux* which demonstrates the 
> >>> different ordering.
> >>>
> >>> AFAICT,
> >>> NO_PIE_CFLAGS_FOR_BUILD, NO_PIE_FLAG_FOR_BUILD are dead variables?
>
> ^^ what was the intention for these?
>
> >>> Question 2 : HJ, what was your intention for how a configuration would 
> >>> request PIC code (for example) for things to run on $build?
> >
> > We need to fix
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934
>
> The need for no-PIE for the exectuables that consume PCH is completely clear 
> (but a target can choose not to use PCH - and configure 
> —disable-libstdcxx-pch).
>
> The PR doesn’t explain:
>
> 1. why it’s being enabled for the generators (and other $build code) which do 
> not use PCH (AFAIK)
>
> 2. why the flags ordering is different for $build and $host.
>
> I am completely happy to make the fix Richard suggested - but we seem to be 
> adding complexity rather than simplifying things; as noted in the PR there 
> are targets that cannot use no-PIE and therefore have to disable PCH anyway.

Please follow:

https://gcc.gnu.org/pipermail/gcc-patches/2015-October/432180.html



--
H.J.

Re: [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.

2021-08-25 Thread Michael Meissner via Gcc-patches

On Wed, Aug 25, 2021 at 12:44:16PM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Wed, Aug 25, 2021 at 12:37:14PM -0400, Michael Meissner wrote:
> > I noticed that the built-functions for xxspltiw, xxspltidp, xxsplti32dx,
> > xxpermx, and xxeval all used the 'vecsimple' type.  These instructions are
> > permute instructions (3 cycle latency) and should use 'vecperm' instead.
> 
> They are all executed on the PM pipe currently, yup.  If this changes
> later we'll have to fix it, but that is for then :-)
> 
> > While I was at it, I changed the UNSPEC name for xxspltidp to be
> > UNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.
> 
> In the future please do separate things as separate patches.
> 
> > * config/rs6000/vsx.md (UNSPEC_XXSPLTIDP): Rename from
> > UNSPEC_XXSPLTID.
> 
>   * config/rs6000/vsx.md (UNSPEC_XXSPLTID): Rename to...
>   (UNSPEC_XXSPLTIDP): ... this.
> 
> > (xxspltidp_v2df): Use vecperm type attribute.  Use
> > UUNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.
> 
> Typo ("UU").
> 
> Okay for trunk with those trivial fixes.  Also okay for backport to 11,
> it is trivial enough.  Thanks!

Thanks.

> Out of interest, did you notice any scheduling differences with this?

I don't use the built-ins so I wouldn't notice a difference.  I noticed this as
part of the next patch to add support for XXSPLTIDP (and ultimately XXSPLTIW in
a future patch).  The XXSPLTIDP instruction allows loading up many SFmode,
DFmode, and V2DFmode constants.  The XXSPLTIW instruction allows loading up
certain V16QImode, V8HImode, V4SImode, and V4SFmode constants.

I'm trying to iron out the slow-downs, and I wanted the scheduler to know it
needed to add insns between the XXSPLTIDP to load the constant and its use if
it can.  Right now, I'm seeing a slight boost in blender_r with XXSPLTIDP (over
doing a load).  However, I suspect if you aren't running spec on an otherwise
idle machine, things will change where XXSPLTIDP will be more of a win by
eliminating the loads.

While XXSPLTIDP by itself is positive, unfortunately, there is a regression in
cactuBSSN_r (3%) when I add XXSPLTIW (but not XXSPLTIDP) that I'm trying to
track down.

If I add both instructions, several of the benchmarks improve (including
xalancbmk by 11% and x264_r by 27%), but cactuBSSN_r has the 3% regression and
fotonik3d_r also has a new 3% regression.

Given that many more programs use floating point constants than vector
constants (66,000 XXSPLTID's created vs. 5,000 XXSPLTIW's), I figure to push
the XXSPLTIDP now, and try to figure out the differences before submitting the
XXSPLTIW patch.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH] configure: Allow a host makefile fragment to override PIE flag settings.

2021-08-25 Thread Iain Sandoe via Gcc-patches




> On 25 Aug 2021, at 18:56, H.J. Lu via Gcc-patches  
> wrote:
> 
> On Wed, Aug 25, 2021 at 10:51 AM H.J. Lu  wrote:
>> 
>> On Wed, Aug 25, 2021 at 10:42 AM Iain Sandoe  wrote:
>>> 
>>> Hi,
>>> 
 On 20 Aug 2021, at 11:29, Richard Sandiford  
 wrote:
 
>> Maybe it would be easier to have the makefile fragments determine
>> something like CODE_MODEL_CFLAGS, which can be "-fPIC", 
>> "-mdynamic-no-pic",
>> etc., and use:
>> 
>> COMPILER += $(NO_PIE_CFLAGS) $(CODE_MODEL_CFLAGS)
> 
> OK. I have misgivings about this - the problem is that:
> 
> -fPIC -fno-PIE != -fno-PIE -fPIC,  which is not obvious to many folks - 
> who expect that
> the “last edition of a flag will be the one in force”.
> 
> So the PIE-ness and the PIC-ness are decoupled in the configury but they 
> need to be
> ordered specifically for targets that want PIC code by default (FWIW, I 
> don’t think Darwin
> is the only default-PIC case here, from discussions on irc).
 
 Yeah, that's what the above was supposed to achieve.  In other words,
 if you force non-PIE, you also need to follow that by $(CODE_MODEL_CFLAGS),
 which restates whatever the base code model is.
 
 If it's the decoupling you're worried about, then an alternative would
 be to have:
 
 NO_PIE_CFLAGS="-fno-PIE \$(CODE_MODEL_CFLAGS)”
>>> 
>>> I’d like to ask a couple of questions (of HJ who introduced the no-PIE 
>>> logic) before implementing this.
>>> 
>>> A. We use no-PIE for cc1* because that is needed to handle the PCH 
>>> implementation (which relies on the executables being loaded at the same 
>>> addresses each time).
>>> 
>>> B. It’s certainly not obvious to me why we need to build code to run on 
>>> $build to be no-PIE - I don’t see any such dependencies in the generators 
>>> etc.
>>> 
>>> - So Question1 - HJ what was the motivation for making the  XXX_BUILD_XXX 
>>> adopt no-PIE?
>> 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934
>> 
>>> ——
>>> 
>>> Independently of this we seem to be building the objects for $host thus:
>>> 
>>> $(CXX) (NO_PIE_CFLAGS) -c $(ALL_CXXFLAGS) etc.
>>> 
>>> but we build for $build thus:
>>> 
>>> $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE 
>>> $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS)
>>> 
>>> which means that code model flags in $ALL_CXXFLAGS are overridden for 
>>> $build, but active for $host
>>> ^^ this is actually what causes the Darwin build fail - since on Darwin we 
>>> cannot build static linked code for user-space processes.
>>> 
>>> in any event that’s inconsistent (unless there’s a reason that it should be 
>>> different).
>>> 
>>> 
>>> 
>>> below are extracts from gcc/Makefile *on linux* which demonstrates the 
>>> different ordering.
>>> 
>>> AFAICT,
>>> NO_PIE_CFLAGS_FOR_BUILD, NO_PIE_FLAG_FOR_BUILD are dead variables?

^^ what was the intention for these?

>>> Question 2 : HJ, what was your intention for how a configuration would 
>>> request PIC code (for example) for things to run on $build?
> 
> We need to fix
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934

The need for no-PIE for the exectuables that consume PCH is completely clear 
(but a target can choose not to use PCH - and configure —disable-libstdcxx-pch).

The PR doesn’t explain:

1. why it’s being enabled for the generators (and other $build code) which do 
not use PCH (AFAIK)

2. why the flags ordering is different for $build and $host.

I am completely happy to make the fix Richard suggested - but we seem to be 
adding complexity rather than simplifying things; as noted in the PR there are 
targets that cannot use no-PIE and therefore have to disable PCH anyway.

thanks
Iain

> 
> first.
> 
> H.J.
>>> thanks
>>> Iain
>>> 
>>> ---
>>> 
>>> 
>>> ALL_CXXFLAGS = $(T_CFLAGS) $(CFLAGS-$@) $(CXXFLAGS) $(INTERNAL_CFLAGS) \
>>>  $(COVERAGE_FLAGS) $(ALIASING_FLAGS) $(NOEXCEPTION_FLAGS) \
>>> 
>>> --
>>> # Native compiler for the build machine and its switches.
>>> CC_FOR_BUILD = $(CC)
>>> CXX_FOR_BUILD = $(CXX)
>>> NO_PIE_CFLAGS_FOR_BUILD =
>>> NO_PIE_FLAG_FOR_BUILD =
>>> BUILD_CFLAGS= $(ALL_CFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE
>>> BUILD_CXXFLAGS = $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE
>>> BUILD_NO_PIE_CFLAGS = $(NO_PIE_CFLAGS)
>>> BUILD_CFLAGS += $(BUILD_NO_PIE_CFLAGS)
>>> BUILD_CXXFLAGS += $(BUILD_NO_PIE_CFLAGS)
>>> 
>>> # Native compiler that we use.  This may be C++ some day.
>>> COMPILER_FOR_BUILD = $(CXX_FOR_BUILD)
>>> BUILD_COMPILERFLAGS = $(BUILD_CXXFLAGS)
>>> 
>>> # Native linker that we use.
>>> LINKER_FOR_BUILD = $(CXX_FOR_BUILD)
>>> BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS)
>>> 
>>> # Native linker and preprocessor flags.  For x-fragment overrides.
>>> BUILD_LDFLAGS=$(LDFLAGS)
>>> BUILD_NO_PIE_FLAG = $(NO_PIE_FLAG)
>>> BUILD_LDFLAGS += $(BUILD_NO_PIE_FLAG)
>>> BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \
>>>-I$(srcdir)/../include  $(CPPINC)

Re: [PATCH] diagnostics: Fix sporadic test failure

2021-08-25 Thread H.J. Lu via Gcc-patches

On Sat, May 29, 2021 at 1:03 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 5/29/2021 1:55 PM, Bernd Edlinger wrote:
> >
> > On 5/29/21 9:31 PM, Jeff Law wrote:
> >>
> >> On 5/28/2021 6:38 AM, Bernd Edlinger wrote:
> >>> Hi,
> >>>
> >>> it turns out to be reproducible this way:
> >>>
> >>> COLUMNS=80 make check-gcc-c RUNTESTFLAGS="plugin.exp=diagnostic*"
> >>>
> >>> Running /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/plugin/plugin.exp ...
> >>> FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c
> >>>-fplugin=./diagnostic_plugin_test_tree_expression_range.so  1 blank 
> >>> line(s) in output
> >>> FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c
> >>>-fplugin=./diagnostic_plugin_test_tree_expression_range.so  expected 
> >>> multiline pattern lines 550-551 not found: "
> >>> __builtin_types_compatible_p \(long, int\) \+ f \(i\)\);.*\n  
> >>>   ~\^~~\n"
> >>> FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c
> >>>-fplugin=./diagnostic_plugin_test_tree_expression_range.so (test for 
> >>> excess errors)
> >>>
> >>> a lot more errors happen with COLUMNS=20.
> >>>
> >>> Tested on x86_64-pc-linux-gnu.
> >>> Is it OK for trunk?
> >>>
> >>>
> >>> Thanks
> >>> Bernd.
> >>>
> >>>
> >>> 2021-05-28  Bernd Edlinger  
> >>>
> >>>  * gcc.dg/plugin/diagnostic_plugin_show_trees.c (plugin_init): Fix 
> >>> caret_max_with.
> >>>  * gcc.dg/plugin/diagnostic_plugin_test_inlining.c
> >>>  (plugin_init): Likewise.
> >>>  * gcc.dg/plugin/diagnostic_plugin_test_paths.c (plugin_init): 
> >>> Likewise.
> >>>  * gcc.dg/plugin/diagnostic_plugin_test_string_literals.c
> >>>  (plugin_init): Likewise.
> >>>  * gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c
> >>>  (plugin_init): Likewise.
> >> So while you've got a patch here, you haven't indicated what the problem 
> >> actually was.  I'm guessing caret_max_width was uninitialized and thus we 
> >> got random-ish values.  Presumably those randomish-values are what caused 
> >> tests to occasionally appear to truncate their output and fail?
> >>
> >> If that's the case, this is fine.  If it's something deeper, please 
> >> provide a bit of background to help in evaluating the patch.
> >>
> > Yes, the problem is just the function get_terminal_width in diagnostic.c, 
> > can
> > somehow learn the X-terminal's window dimensions where the make check is
> > started:
> >
> > int
> > get_terminal_width (void)
> > {
> >const char * s = getenv ("COLUMNS");
> >if (s != NULL) {
> >  int n = atoi (s);
> >  if (n > 0)
> >return n;
> >}
> >
> > #ifdef TIOCGWINSZ
> >struct winsize w;
> >w.ws_col = 0;
> >if (ioctl (0, TIOCGWINSZ, ) == 0 && w.ws_col > 0)
> >  return w.ws_col;
> > #endif
> >
> >return INT_MAX;
> > }
> >
> > and this is used to initialize context->caret_max_width in 
> > diagnostic_set_caret_max_width,
> > and possibly also other places. This causes a small variation in the output 
> > that
> > is trips the test cases.  It is just an extra newline in some cases, that I 
> > have not
> > debugged why exactly this happens, but I assume this is intentional to make 
> > the
> > diagnostics spanning multiple lines better readable to a human.
> Thanks.  So for the testsuite we certainly don't want to be doing that
> :-)  OK for the trunk, thanks for chasing it down -- I've been seeing
> these failures, but haven't had the time to chase down a root cause.
>

I'd like to backport it to GCC 11 branch to avoid random failures on
GCC 11 branch:

https://gcc.gnu.org/pipermail/gcc-regression/2021-August/075244.html

Thanks.

-- 
H.J.

Merge stores/loads in modref summaries

2021-08-25 Thread Jan Hubicka

Hi,
this patch adds logic needed to merge neighbouring accesses in ipa-modref
summaries.  This helps analyzing array initializers and similar code.  It is
bit of work, since it breaks the fact that modref tree makes a good lattice for
dataflow: the access ranges can be extended indefinitely.  For this reason I
added counter tracking number of adjustments and a cap to limit them during the
dataflow.  This triggers in:
void
recurse (char *p, int n)
{
*p = 0;
if (n)
  recurse (p+1,n-1);
}

Where we work now hard enugh to determine:

access: Parm 0 param offset:0 offset:0 size:8 max_size:-1 adjusted 8 times

which is correct access info saying that param0 can be accessed from byte 0
in 8bit accesses with unknown max_size.

--param max-modref-accesses is now hit 8 times instead of 45 before the patch.
We hit --param param=modref-max-adjustments once for fft algorithm (where the
recursion really seems to walk array) and max-bases once for late modref and 9
times for IPA modref (it works on types rather than alias sets so it is more
likely to hit the limit).

I would be happy for suggestions how to simplify the merging logic. It is bit
convoluted since I need to know if I am going to adjust the range and need
to deal with poly_ints and possibly unknown sizes.

Incrementally I will add logic on improving behaviour when limits are met
instead of just giving up on the analysis.

With the patch I get following cc1plus stats:

Alias oracle query stats:
  refs_may_alias_p: 83135089 disambiguations, 101581194 queries
  ref_maybe_used_by_call_p: 590484 disambiguations, 84157326 queries
  call_may_clobber_ref_p: 345434 disambiguations, 349295 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 39520 queries
  nonoverlapping_refs_since_match_p: 33266 disambiguations, 66411 must 
overlaps, 100574 queries
  aliasing_component_refs_p: 66251 disambiguations, 9920037 queries
  TBAA oracle: 31033174 disambiguations 93485041 queries
   14359693 are in alias set 0
   11930606 queries asked about the same object
   129 queries asked about the same alias set
   0 access volatile
   34218393 are dependent in the DAG
   1943046 are aritificially in conflict with void *

Modref stats:
  modref use: 26293 disambiguations, 705198 queries 
  modref clobber: 1828340 disambiguations, 21213011 queries
  4748965 tbaa queries (0.223870 per modref query)  
  711083 base compares (0.033521 per modref query)  

PTA query stats:
  pt_solution_includes: 13119524 disambiguations, 33183481 queries
  pt_solutions_intersect: 1510541 disambiguations, 15368102 queries

this would suggest quite large improvement over my last run
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577962.html
(about 12% on overall disambiguation count)

but I also updated my setup so part of the increase may be accounted for
diferent libraries.  The overall size of modref access lists is about halved on
cc1plus that looks promising though. It may be that we less often hit the limit
on number of querries done in tree-ssa-alias.

Bootstrapped/regtested x86_64-linux.
I plan to commit the patch after bit more testing.

gcc/ChangeLog:

* doc/invoke.texi: Document --param modref-max-adjustments
* ipa-modref-tree.c (test_insert_search_collapse): Update.
(test_merge): Update.
* ipa-modref-tree.h (struct modref_access_node): Add adjustments;
(modref_access_node::operator==): Fix handling of access ranges.
(modref_access_node::contains): Constify parameter; handle also
mismatched parm offsets.
(modref_access_node::update): New function.
(modref_access_node::merge0: New function.
(unspecified_modref_access_node): Update constructor.
(modref_ref_node::insert_access): Add record_adjustments parameter;
handle merging.
(modref_ref_node::try_merge_with): New private function.
(modref_tree::insert): New record_adjustments parameter.
(modref_tree::merge): New record_adjustments parameter.
(modref_tree::copy_from): Update.
* ipa-modref.c (dump_access): Dump adjustments field.
(get_access): Update constructor.
(record_access): Update call of insert.
(record_access_lto): Update call of insert.
(merge_call_side_effects): Add record_adjustments parameter.
(get_access_for_fnspec): Update.
(process_fnspec): Update.
(analyze_call): Update.
(analyze_function): Update.
(read_modref_records): Update.
(ipa_merge_modref_summary_after_inlining): Update.
(propagate_unknown_call): Update.
(modref_propagate_in_scc): Update.
* params.opt: (param-max-modref-adjustments=): New.

gcc/testsuite/ChangeLog:

* gcc.dg/ipa/modref-1.c: Update testcase.
* gcc.dg/tree-ssa/modref-4.c: Update testcase.
* gcc.dg/tree-ssa/modref-8.c: New test.

diff --git

Re: [PATCH] configure: Allow a host makefile fragment to override PIE flag settings.

2021-08-25 Thread H.J. Lu via Gcc-patches

On Wed, Aug 25, 2021 at 10:51 AM H.J. Lu  wrote:
>
> On Wed, Aug 25, 2021 at 10:42 AM Iain Sandoe  wrote:
> >
> > Hi,
> >
> > > On 20 Aug 2021, at 11:29, Richard Sandiford  
> > > wrote:
> > >
> > >>> Maybe it would be easier to have the makefile fragments determine
> > >>> something like CODE_MODEL_CFLAGS, which can be "-fPIC", 
> > >>> "-mdynamic-no-pic",
> > >>> etc., and use:
> > >>>
> > >>> COMPILER += $(NO_PIE_CFLAGS) $(CODE_MODEL_CFLAGS)
> > >>
> > >> OK. I have misgivings about this - the problem is that:
> > >>
> > >> -fPIC -fno-PIE != -fno-PIE -fPIC,  which is not obvious to many folks - 
> > >> who expect that
> > >> the “last edition of a flag will be the one in force”.
> > >>
> > >> So the PIE-ness and the PIC-ness are decoupled in the configury but they 
> > >> need to be
> > >> ordered specifically for targets that want PIC code by default (FWIW, I 
> > >> don’t think Darwin
> > >> is the only default-PIC case here, from discussions on irc).
> > >
> > > Yeah, that's what the above was supposed to achieve.  In other words,
> > > if you force non-PIE, you also need to follow that by 
> > > $(CODE_MODEL_CFLAGS),
> > > which restates whatever the base code model is.
> > >
> > > If it's the decoupling you're worried about, then an alternative would
> > > be to have:
> > >
> > >  NO_PIE_CFLAGS="-fno-PIE \$(CODE_MODEL_CFLAGS)”
> >
> > I’d like to ask a couple of questions (of HJ who introduced the no-PIE 
> > logic) before implementing this.
> >
> > A. We use no-PIE for cc1* because that is needed to handle the PCH 
> > implementation (which relies on the executables being loaded at the same 
> > addresses each time).
> >
> > B. It’s certainly not obvious to me why we need to build code to run on 
> > $build to be no-PIE - I don’t see any such dependencies in the generators 
> > etc.
> >
> >  - So Question1 - HJ what was the motivation for making the  XXX_BUILD_XXX 
> > adopt no-PIE?
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934
>
> > ——
> >
> > Independently of this we seem to be building the objects for $host thus:
> >
> > $(CXX) (NO_PIE_CFLAGS) -c $(ALL_CXXFLAGS) etc.
> >
> > but we build for $build thus:
> >
> > $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE 
> > $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS)
> >
> > which means that code model flags in $ALL_CXXFLAGS are overridden for 
> > $build, but active for $host
> > ^^ this is actually what causes the Darwin build fail - since on Darwin we 
> > cannot build static linked code for user-space processes.
> >
> > in any event that’s inconsistent (unless there’s a reason that it should be 
> > different).
> >
> > 
> >
> > below are extracts from gcc/Makefile *on linux* which demonstrates the 
> > different ordering.
> >
> > AFAICT,
> > NO_PIE_CFLAGS_FOR_BUILD, NO_PIE_FLAG_FOR_BUILD are dead variables?
> >
> > Question 2 : HJ, what was your intention for how a configuration would 
> > request PIC code (for example) for things to run on $build?

We need to fix

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934

first.

H.J.
> > thanks
> > Iain
> >
> > ---
> >
> >
> > ALL_CXXFLAGS = $(T_CFLAGS) $(CFLAGS-$@) $(CXXFLAGS) $(INTERNAL_CFLAGS) \
> >   $(COVERAGE_FLAGS) $(ALIASING_FLAGS) $(NOEXCEPTION_FLAGS) \
> >
> > --
> > # Native compiler for the build machine and its switches.
> > CC_FOR_BUILD = $(CC)
> > CXX_FOR_BUILD = $(CXX)
> > NO_PIE_CFLAGS_FOR_BUILD =
> > NO_PIE_FLAG_FOR_BUILD =
> > BUILD_CFLAGS= $(ALL_CFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE
> > BUILD_CXXFLAGS = $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE
> > BUILD_NO_PIE_CFLAGS = $(NO_PIE_CFLAGS)
> > BUILD_CFLAGS += $(BUILD_NO_PIE_CFLAGS)
> > BUILD_CXXFLAGS += $(BUILD_NO_PIE_CFLAGS)
> >
> > # Native compiler that we use.  This may be C++ some day.
> > COMPILER_FOR_BUILD = $(CXX_FOR_BUILD)
> > BUILD_COMPILERFLAGS = $(BUILD_CXXFLAGS)
> >
> > # Native linker that we use.
> > LINKER_FOR_BUILD = $(CXX_FOR_BUILD)
> > BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS)
> >
> > # Native linker and preprocessor flags.  For x-fragment overrides.
> > BUILD_LDFLAGS=$(LDFLAGS)
> > BUILD_NO_PIE_FLAG = $(NO_PIE_FLAG)
> > BUILD_LDFLAGS += $(BUILD_NO_PIE_FLAG)
> > BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \
> > -I$(srcdir)/../include  $(CPPINC) $(CPPFLAGS)
> > --
> > # This is the variable actually used when we compile. If you change this,
> > # you probably want to update BUILD_CFLAGS in configure.ac
> > ALL_CFLAGS = $(T_CFLAGS) $(CFLAGS-$@) \
> >
> > build/%.o :  # dependencies provided by explicit rule later
> > $(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \
> > -o $@ $<
> >
> >  this has
> > $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE 
> > $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS)
> > NO_PIE_CFLAGS_FOR_BUILD is apparently ignored
> >
> > # Rule for the generator programs:
> > $(genprog:%=build/gen%$(build_exeext)): build/gen%$(build_exeext): 
> > build/gen%.o $(BUILD_LIBDEPS)
>

Re: [PATCH] configure: Allow a host makefile fragment to override PIE flag settings.

2021-08-25 Thread H.J. Lu via Gcc-patches

On Wed, Aug 25, 2021 at 10:42 AM Iain Sandoe  wrote:
>
> Hi,
>
> > On 20 Aug 2021, at 11:29, Richard Sandiford  
> > wrote:
> >
> >>> Maybe it would be easier to have the makefile fragments determine
> >>> something like CODE_MODEL_CFLAGS, which can be "-fPIC", 
> >>> "-mdynamic-no-pic",
> >>> etc., and use:
> >>>
> >>> COMPILER += $(NO_PIE_CFLAGS) $(CODE_MODEL_CFLAGS)
> >>
> >> OK. I have misgivings about this - the problem is that:
> >>
> >> -fPIC -fno-PIE != -fno-PIE -fPIC,  which is not obvious to many folks - 
> >> who expect that
> >> the “last edition of a flag will be the one in force”.
> >>
> >> So the PIE-ness and the PIC-ness are decoupled in the configury but they 
> >> need to be
> >> ordered specifically for targets that want PIC code by default (FWIW, I 
> >> don’t think Darwin
> >> is the only default-PIC case here, from discussions on irc).
> >
> > Yeah, that's what the above was supposed to achieve.  In other words,
> > if you force non-PIE, you also need to follow that by $(CODE_MODEL_CFLAGS),
> > which restates whatever the base code model is.
> >
> > If it's the decoupling you're worried about, then an alternative would
> > be to have:
> >
> >  NO_PIE_CFLAGS="-fno-PIE \$(CODE_MODEL_CFLAGS)”
>
> I’d like to ask a couple of questions (of HJ who introduced the no-PIE logic) 
> before implementing this.
>
> A. We use no-PIE for cc1* because that is needed to handle the PCH 
> implementation (which relies on the executables being loaded at the same 
> addresses each time).
>
> B. It’s certainly not obvious to me why we need to build code to run on 
> $build to be no-PIE - I don’t see any such dependencies in the generators etc.
>
>  - So Question1 - HJ what was the motivation for making the  XXX_BUILD_XXX 
> adopt no-PIE?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934

> ——
>
> Independently of this we seem to be building the objects for $host thus:
>
> $(CXX) (NO_PIE_CFLAGS) -c $(ALL_CXXFLAGS) etc.
>
> but we build for $build thus:
>
> $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE 
> $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS)
>
> which means that code model flags in $ALL_CXXFLAGS are overridden for $build, 
> but active for $host
> ^^ this is actually what causes the Darwin build fail - since on Darwin we 
> cannot build static linked code for user-space processes.
>
> in any event that’s inconsistent (unless there’s a reason that it should be 
> different).
>
> 
>
> below are extracts from gcc/Makefile *on linux* which demonstrates the 
> different ordering.
>
> AFAICT,
> NO_PIE_CFLAGS_FOR_BUILD, NO_PIE_FLAG_FOR_BUILD are dead variables?
>
> Question 2 : HJ, what was your intention for how a configuration would 
> request PIC code (for example) for things to run on $build?
>
> thanks
> Iain
>
> ---
>
>
> ALL_CXXFLAGS = $(T_CFLAGS) $(CFLAGS-$@) $(CXXFLAGS) $(INTERNAL_CFLAGS) \
>   $(COVERAGE_FLAGS) $(ALIASING_FLAGS) $(NOEXCEPTION_FLAGS) \
>
> --
> # Native compiler for the build machine and its switches.
> CC_FOR_BUILD = $(CC)
> CXX_FOR_BUILD = $(CXX)
> NO_PIE_CFLAGS_FOR_BUILD =
> NO_PIE_FLAG_FOR_BUILD =
> BUILD_CFLAGS= $(ALL_CFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE
> BUILD_CXXFLAGS = $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE
> BUILD_NO_PIE_CFLAGS = $(NO_PIE_CFLAGS)
> BUILD_CFLAGS += $(BUILD_NO_PIE_CFLAGS)
> BUILD_CXXFLAGS += $(BUILD_NO_PIE_CFLAGS)
>
> # Native compiler that we use.  This may be C++ some day.
> COMPILER_FOR_BUILD = $(CXX_FOR_BUILD)
> BUILD_COMPILERFLAGS = $(BUILD_CXXFLAGS)
>
> # Native linker that we use.
> LINKER_FOR_BUILD = $(CXX_FOR_BUILD)
> BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS)
>
> # Native linker and preprocessor flags.  For x-fragment overrides.
> BUILD_LDFLAGS=$(LDFLAGS)
> BUILD_NO_PIE_FLAG = $(NO_PIE_FLAG)
> BUILD_LDFLAGS += $(BUILD_NO_PIE_FLAG)
> BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \
> -I$(srcdir)/../include  $(CPPINC) $(CPPFLAGS)
> --
> # This is the variable actually used when we compile. If you change this,
> # you probably want to update BUILD_CFLAGS in configure.ac
> ALL_CFLAGS = $(T_CFLAGS) $(CFLAGS-$@) \
>
> build/%.o :  # dependencies provided by explicit rule later
> $(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \
> -o $@ $<
>
>  this has
> $(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE 
> $(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS)
> NO_PIE_CFLAGS_FOR_BUILD is apparently ignored
>
> # Rule for the generator programs:
> $(genprog:%=build/gen%$(build_exeext)): build/gen%$(build_exeext): 
> build/gen%.o $(BUILD_LIBDEPS)
> +$(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) -o $@ \
> $(filter-out $(BUILD_LIBDEPS), $^) $(BUILD_LIBS)
>
> --
> build/genversion$(build_exeext): build/genversion.o
> +$(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) \
> build/genversion.o -o $@
> =
>
> # The name of the compiler to use.
> COMPILER = $(CXX)
> COMPILER_FLAGS =

Re: [PATCH] avoid printing range table header alone

2021-08-25 Thread Andrew MacLeod via Gcc-patches


On 8/25/21 1:25 PM, Martin Sebor wrote:

I noticed that the output of -fdump-tree-walloca (and probably all
passes that call gimple_ranger::export_global_ranges()) includes
the following two lines for all functions, even when there's
nothing else of relevance after them:

Exported global range table
===

I was a bit puzzled by it at first, wondering if it was leftover
debugging output, until I checked the code and realized it's
a header that may be (but isn't always) followed by the contents
of the table.

To reduce clutter in the dump the attached patch changes the function
to print the header only when it is followed by at least one line of
output.  (Another tweak to reduce even further the amount of output
printed by default that might be worth considering is to print
the table only when TDF_DETAILS is set.)

Martin


Except checking x == 1 won't work if ssa_name(1)  was not updated.

Just set a flag if the header hasn't been printed yet. tried and true. I 
guess you can protect it behind details if you want.


Andrew

Re: [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.

2021-08-25 Thread Segher Boessenkool

Hi Mike,

On Wed, Aug 25, 2021 at 12:37:14PM -0400, Michael Meissner wrote:
> I noticed that the built-functions for xxspltiw, xxspltidp, xxsplti32dx,
> xxpermx, and xxeval all used the 'vecsimple' type.  These instructions are
> permute instructions (3 cycle latency) and should use 'vecperm' instead.

They are all executed on the PM pipe currently, yup.  If this changes
later we'll have to fix it, but that is for then :-)

> While I was at it, I changed the UNSPEC name for xxspltidp to be
> UNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.

In the future please do separate things as separate patches.

>   * config/rs6000/vsx.md (UNSPEC_XXSPLTIDP): Rename from
>   UNSPEC_XXSPLTID.

* config/rs6000/vsx.md (UNSPEC_XXSPLTID): Rename to...
(UNSPEC_XXSPLTIDP): ... this.

>   (xxspltidp_v2df): Use vecperm type attribute.  Use
>   UUNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.

Typo ("UU").

Okay for trunk with those trivial fixes.  Also okay for backport to 11,
it is trivial enough.  Thanks!

Out of interest, did you notice any scheduling differences with this?

Segher

Re: [PATCH] configure: Allow a host makefile fragment to override PIE flag settings.

2021-08-25 Thread Iain Sandoe via Gcc-patches

Hi,

> On 20 Aug 2021, at 11:29, Richard Sandiford  wrote:
> 
>>> Maybe it would be easier to have the makefile fragments determine
>>> something like CODE_MODEL_CFLAGS, which can be "-fPIC", "-mdynamic-no-pic",
>>> etc., and use:
>>> 
>>> COMPILER += $(NO_PIE_CFLAGS) $(CODE_MODEL_CFLAGS)
>> 
>> OK. I have misgivings about this - the problem is that:
>> 
>> -fPIC -fno-PIE != -fno-PIE -fPIC,  which is not obvious to many folks - who 
>> expect that
>> the “last edition of a flag will be the one in force”.
>> 
>> So the PIE-ness and the PIC-ness are decoupled in the configury but they 
>> need to be
>> ordered specifically for targets that want PIC code by default (FWIW, I 
>> don’t think Darwin
>> is the only default-PIC case here, from discussions on irc).
> 
> Yeah, that's what the above was supposed to achieve.  In other words,
> if you force non-PIE, you also need to follow that by $(CODE_MODEL_CFLAGS),
> which restates whatever the base code model is.
> 
> If it's the decoupling you're worried about, then an alternative would
> be to have:
> 
>  NO_PIE_CFLAGS="-fno-PIE \$(CODE_MODEL_CFLAGS)”

I’d like to ask a couple of questions (of HJ who introduced the no-PIE logic) 
before implementing this.

A. We use no-PIE for cc1* because that is needed to handle the PCH 
implementation (which relies on the executables being loaded at the same 
addresses each time).

B. It’s certainly not obvious to me why we need to build code to run on $build 
to be no-PIE - I don’t see any such dependencies in the generators etc.

 - So Question1 - HJ what was the motivation for making the  XXX_BUILD_XXX 
adopt no-PIE?

——

Independently of this we seem to be building the objects for $host thus:

$(CXX) (NO_PIE_CFLAGS) -c $(ALL_CXXFLAGS) etc.

but we build for $build thus:

$(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE 
$(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS)

which means that code model flags in $ALL_CXXFLAGS are overridden for $build, 
but active for $host
^^ this is actually what causes the Darwin build fail - since on Darwin we 
cannot build static linked code for user-space processes.

in any event that’s inconsistent (unless there’s a reason that it should be 
different).



below are extracts from gcc/Makefile *on linux* which demonstrates the 
different ordering.

AFAICT, 
NO_PIE_CFLAGS_FOR_BUILD, NO_PIE_FLAG_FOR_BUILD are dead variables?

Question 2 : HJ, what was your intention for how a configuration would request 
PIC code (for example) for things to run on $build?

thanks
Iain

---


ALL_CXXFLAGS = $(T_CFLAGS) $(CFLAGS-$@) $(CXXFLAGS) $(INTERNAL_CFLAGS) \
  $(COVERAGE_FLAGS) $(ALIASING_FLAGS) $(NOEXCEPTION_FLAGS) \

--
# Native compiler for the build machine and its switches.
CC_FOR_BUILD = $(CC)
CXX_FOR_BUILD = $(CXX)
NO_PIE_CFLAGS_FOR_BUILD = 
NO_PIE_FLAG_FOR_BUILD = 
BUILD_CFLAGS= $(ALL_CFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE
BUILD_CXXFLAGS = $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE
BUILD_NO_PIE_CFLAGS = $(NO_PIE_CFLAGS)
BUILD_CFLAGS += $(BUILD_NO_PIE_CFLAGS)
BUILD_CXXFLAGS += $(BUILD_NO_PIE_CFLAGS)

# Native compiler that we use.  This may be C++ some day.
COMPILER_FOR_BUILD = $(CXX_FOR_BUILD)
BUILD_COMPILERFLAGS = $(BUILD_CXXFLAGS)

# Native linker that we use.
LINKER_FOR_BUILD = $(CXX_FOR_BUILD)
BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS)

# Native linker and preprocessor flags.  For x-fragment overrides.
BUILD_LDFLAGS=$(LDFLAGS)
BUILD_NO_PIE_FLAG = $(NO_PIE_FLAG)
BUILD_LDFLAGS += $(BUILD_NO_PIE_FLAG)
BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \
-I$(srcdir)/../include  $(CPPINC) $(CPPFLAGS)
--
# This is the variable actually used when we compile. If you change this,
# you probably want to update BUILD_CFLAGS in configure.ac
ALL_CFLAGS = $(T_CFLAGS) $(CFLAGS-$@) \

build/%.o :  # dependencies provided by explicit rule later
$(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \
-o $@ $<

 this has 
$(CXX) -c $(ALL_CXXFLAGS) $(GENERATOR_CFLAGS) -DGENERATOR_FILE 
$(BUILD_NO_PIE_CFLAGS) $(BUILD_CPPFLAGS)
NO_PIE_CFLAGS_FOR_BUILD is apparently ignored

# Rule for the generator programs:
$(genprog:%=build/gen%$(build_exeext)): build/gen%$(build_exeext): build/gen%.o 
$(BUILD_LIBDEPS)
+$(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) -o $@ \
$(filter-out $(BUILD_LIBDEPS), $^) $(BUILD_LIBS)

--
build/genversion$(build_exeext): build/genversion.o
+$(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) \
build/genversion.o -o $@
=

# The name of the compiler to use.
COMPILER = $(CXX)
COMPILER_FLAGS = $(CXXFLAGS)
# If HOST_LIBS is set, then the user is controlling the libraries to
--
CET_HOST_FLAGS = -fcf-protection
COMPILER += $(CET_HOST_FLAGS)

--
# We don't want to compile the compilers with -fPIE, it make PCH fail.
COMPILER += $(NO_PIE_CFLAGS)

--
# A list of all the language-specific executables.
COMPILERS =  gnat1$(exeext) cc1$(exeext)

[PATCH] avoid printing range table header alone

2021-08-25 Thread Martin Sebor via Gcc-patches


I noticed that the output of -fdump-tree-walloca (and probably all
passes that call gimple_ranger::export_global_ranges()) includes
the following two lines for all functions, even when there's
nothing else of relevance after them:

Exported global range table
===

I was a bit puzzled by it at first, wondering if it was leftover
debugging output, until I checked the code and realized it's
a header that may be (but isn't always) followed by the contents
of the table.

To reduce clutter in the dump the attached patch changes the function
to print the header only when it is followed by at least one line of
output.  (Another tweak to reduce even further the amount of output
printed by default that might be worth considering is to print
the table only when TDF_DETAILS is set.)

Martin
Avoid printing range table header alone.

gcc/ChangeLog:
	* gimple-range.cc (gimple_ranger::export_global_ranges):

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index ef3afeacc90..521d7cee5f0 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -259,16 +259,9 @@ gimple_ranger::range_of_stmt (irange , gimple *s, tree name)
 void
 gimple_ranger::export_global_ranges ()
 {
-  unsigned x;
-  int_range_max r;
-  if (dump_file)
-{
-  fprintf (dump_file, "Exported global range table\n");
-  fprintf (dump_file, "===\n");
-}
-
-  for ( x = 1; x < num_ssa_names; x++)
+  for (unsigned x = 1; x < num_ssa_names; x++)
 {
+  int_range_max r;
   tree name = ssa_name (x);
   if (name && !SSA_NAME_IN_FREE_LIST (name)
 	  && gimple_range_ssa_p (name)
@@ -276,21 +269,28 @@ gimple_ranger::export_global_ranges ()
 	  && !r.varying_p())
 	{
 	  bool updated = update_global_range (r, name);
+	  if (!updated || !dump_file)
+	continue;
 
-	  if (updated && dump_file)
+	  if (x == 1)
 	{
-	  value_range vr = r;
-	  print_generic_expr (dump_file, name , TDF_SLIM);
-	  fprintf (dump_file, " --> ");
-	  vr.dump (dump_file);
+	  /* Print the header only when there's something else
+		 to print below.  */
+	  fprintf (dump_file, "Exported global range table\n");
+	  fprintf (dump_file, "===\n");
+	}
+
+	  value_range vr = r;
+	  print_generic_expr (dump_file, name , TDF_SLIM);
+	  fprintf (dump_file, " --> ");
+	  vr.dump (dump_file);
+	  fprintf (dump_file, "\n");
+	  int_range_max same = vr;
+	  if (same != r)
+	{
+	  fprintf (dump_file, " irange : ");
+	  r.dump (dump_file);
 	  fprintf (dump_file, "\n");
-	  int_range_max same = vr;
-	  if (same != r)
-		{
-		  fprintf (dump_file, " irange : ");
-		  r.dump (dump_file);
-		  fprintf (dump_file, "\n");
-		}
 	}
 	}
 }

Re: [PATCH] analyzer: Impose recursion limit on indirect calls.

2021-08-25 Thread David Malcolm via Gcc-patches

On Wed, 2021-08-25 at 21:25 +0530, Ankur Saini wrote:
> 
> 
> > On 25-Aug-2021, at 8:35 PM, David Malcolm 
> > wrote:
> > 
> > On Wed, 2021-08-25 at 20:31 +0530, Ankur Saini wrote:
> > > 
> > > 
> > > > On 25-Aug-2021, at 7:24 PM, Martin Liška 
> > > > wrote:
> > > > 
> > > > On 8/25/21 15:22, David Malcolm via Gcc-patches wrote:
> > > > > On Wed, 2021-08-25 at 13:39 +0530, Ankur Saini wrote:
> > > > > > This should also fix the failing regression found in PR
> > > > > > analyzer/101980.
> > > > > > 
> > > > > > - The patch is in sync with current master
> > > > > > - successfully bootstrapped and tested on x86_64-linux-gnu.
> > > > > > 
> > > > > The patch is OK for trunk.
> > > > > Thanks for fixing this
> > > > > Dave
> > > > 
> > > > Hello.
> > > > 
> > > > Quite accidentally, but I noticed the patch violates GNU coding
> > > > style:
> > > > 
> > > > $ git show 43a5d46feabd93ba78983919234f05f5fc9a0982 |
> > > > ./contrib/check_GNU_style.py -
> > > > 
> > > > === ERROR type #1: blocks of 8 spaces should be replaced with
> > > > tabs
> > > > (8 error(s)) ===
> > > > 
> > > > gcc/analyzer/engine.cc:3064:0: that exceed it further.
> > > > 
> > > > gcc/analyzer/engine.cc:3065:0: This is something of a
> > > > blunt
> > > > workaround, but it only
> > > > 
> > > > gcc/analyzer/engine.cc:3066:0: applies to recursion
> > > > (and
> > > > mutual recursion), not to
> > > > 
> > > > gcc/analyzer/engine.cc:3067:0: general call stacks.  */
> > > > 
> > > > gcc/analyzer/engine.cc:3069:0:  >
> > > > param_analyzer_max_recursion_depth)
> > > > 
> > > > gcc/analyzer/engine.cc:3071:0:if (logger)
> > > > 
> > > > gcc/analyzer/engine.cc:3072:0:  logger->log ("rejecting
> > > > call edge: recursion limit exceeded");
> > > > 
> > > > gcc/analyzer/engine.cc:3073:0:return false;
> > > > 
> > > 
> > > Looks like my editor again converted all the tabs to spaces.
> > > 
> > > btw, I can also see a lot of other places where 8 spaces are not
> > > begin converted to tabs, should I also change that accordingly or
> > > leave them the way it is and just update this patch ?
> > 
> > It's usually best to split out bugfixes from formatting/whitespace
> > changes, so maybe do it as two patches:
> > 
> > (1) an updated version of this patch to fix the recursion issue,
> > using
> > the correct whitespace for the lines that are touched
> 
> unfortunately, I already checked in and pushed the changes to the
> master before this was pointed out. 

Fair enough.

> But I can add them to the whitespace issue patch.

Sounds good.
Dave

> 
> > 
> > (2) a patch that fixes whitespaces issues, but doesn't change the
> > behavior of the code
> > 
> > 
> > Dave
>

[Patch] Fix cygming-crtend.c build warning due to -Wprio-ctor-dtor

2021-08-25 Thread Jonathan Yong via Gcc-patches


Attached patches OK?

cygming-crtend.c: fix build warnings

libgcc/Changelog:
* config/i386/cygming-crtend.c: Fix register_frame_ctor
and register_frame_dtor warnings.


extend.texi: add note about reserved ctor/dtor priorities

gcc/Changelog:
* doc/extend.texi: Add note about reserved priorities
to the constructor attribute.
From e9758ef8c03e617eafe13ca6512fd7cd0256abc7 Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Wed, 25 Aug 2021 16:33:36 +
Subject: [PATCH 1/2] cygming-crtend.c: fix build warnings

libgcc/Changelog:

	* config/i386/cygming-crtend.c: Fix register_frame_ctor
	and register_frame_dtor warnings.

Signed-off-by: Jonathan Yong <10wa...@gmail.com>
---
 libgcc/config/i386/cygming-crtend.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libgcc/config/i386/cygming-crtend.c b/libgcc/config/i386/cygming-crtend.c
index c7ba109a04a..4ab63427ee2 100644
--- a/libgcc/config/i386/cygming-crtend.c
+++ b/libgcc/config/i386/cygming-crtend.c
@@ -56,7 +56,10 @@ static EH_FRAME_SECTION_CONST int __FRAME_END__[]
 extern void __gcc_register_frame (void); 
 extern void __gcc_deregister_frame (void);
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wprio-ctor-dtor"
 static void register_frame_ctor (void) __attribute__ ((constructor (0)));
+#pragma GCC diagnostic pop
 
 static void
 register_frame_ctor (void)
@@ -65,7 +68,10 @@ register_frame_ctor (void)
 }
 
 #if !DEFAULT_USE_CXA_ATEXIT
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wprio-ctor-dtor"
 static void deregister_frame_dtor (void) __attribute__ ((destructor (0)));
+#pragma GCC diagnostic pop
 
 static void
 deregister_frame_dtor (void)
-- 
2.33.0

From 12b18651af252fb111e1a7375fe09fbc69987922 Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Wed, 25 Aug 2021 16:36:14 +
Subject: [PATCH 2/2] extend.texi: add note about reserved ctor/dtor priorities

gcc/Changelog:

	* doc/extend.texi: Add note about reserved priorities
	to the constructor attribute.

Signed-off-by: Jonathan Yong <10wa...@gmail.com>
---
 gcc/doc/extend.texi | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 49df8e6dc38..251a10302b4 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2787,16 +2787,16 @@ On some targets the attributes also accept an integer argument to
 specify a priority to control the order in which constructor and
 destructor functions are run.  A constructor
 with a smaller priority number runs before a constructor with a larger
-priority number; the opposite relationship holds for destructors.  So,
-if you have a constructor that allocates a resource and a destructor
-that deallocates the same resource, both functions typically have the
-same priority.  The priorities for constructor and destructor
-functions are the same as those specified for namespace-scope C++
-objects (@pxref{C++ Attributes}).  However, at present, the order in which
-constructors for C++ objects with static storage duration and functions
-decorated with attribute @code{constructor} are invoked is unspecified.
-In mixed declarations, attribute @code{init_priority} can be used to
-impose a specific ordering.
+priority number; the opposite relationship holds for destructors.  Note
+that priorities 0-100 are reserved.  So, if you have a constructor that
+allocates a resource and a destructor that deallocates the same
+resource, both functions typically have the same priority.  The
+priorities for constructor and destructor functions are the same as
+those specified for namespace-scope C++ objects (@pxref{C++ Attributes}).
+However, at present, the order in which constructors for C++ objects
+with static storage duration and functions decorated with attribute
+@code{constructor} are invoked is unspecified. In mixed declarations,
+attribute @code{init_priority} can be used to impose a specific ordering.
 
 Using the argument forms of the @code{constructor} and @code{destructor}
 attributes on targets where the feature is not supported is rejected with
-- 
2.33.0



OpenPGP_0x713B5FE29C145D45.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature

[PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.

2021-08-25 Thread Michael Meissner via Gcc-patches

Make xxsplti*, xpermx, xxeval be vecperm type.

I noticed that the built-functions for xxspltiw, xxspltidp, xxsplti32dx,
xxpermx, and xxeval all used the 'vecsimple' type.  These instructions are
permute instructions (3 cycle latency) and should use 'vecperm' instead.

While I was at it, I changed the UNSPEC name for xxspltidp to be
UNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.

I've built bootstrap compilers with this fix applied on little endian power9,
little endian power10, and big endian power8.  Can I check this patch into the
master branch.

I would also like to check this patch into the gcc-11 branch after a suitable
period of time.  Can I do this also?  Note, the patch for gcc-11 will be
different because the patch to move the xx* built-ins from altivec.md to vsx.md
has not been back ported to gcc 11.

2021-08-24  Michael Meissner  

gcc/
* config/rs6000/vsx.md (UNSPEC_XXSPLTIDP): Rename from
UNSPEC_XXSPLTID.
(xxspltiw_v4si): Use vecperm type attribute.
(xxspltiw_v4si_inst): Use vecperm type attribute.
(xxspltiw_v4sf_inst): Likewise.
(xxspltidp_v2df): Use vecperm type attribute.  Use
UUNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.
(xxspltidp_v2df_inst): Likewise.
(xxsplti32dx_v4si): Use vecperm type attribute.
(xxsplti32dx_v4si_inst): Likewise.
(xxsplti32dx_v4sf_inst): Likewise.
(xxblend_): Likewise.
(xxpermx): Likewise.
(xxpermx_inst): Likewise.
(xxeval): Likewise.
---
 gcc/config/rs6000/vsx.md | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e4ca6e94d49..bf033e31c1c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -374,7 +374,7 @@ (define_c_enum "unspec"
UNSPEC_VDIVEU
UNSPEC_XXEVAL
UNSPEC_XXSPLTIW
-   UNSPEC_XXSPLTID
+   UNSPEC_XXSPLTIDP
UNSPEC_XXSPLTI32DX
UNSPEC_XXBLEND
UNSPEC_XXPERMX
@@ -6414,7 +6414,7 @@ (define_insn "xxspltiw_v4si"
 UNSPEC_XXSPLTIW))]
  "TARGET_POWER10"
  "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
+ [(set_attr "type" "vecperm")
   (set_attr "prefixed" "yes")])
 
 (define_expand "xxspltiw_v4sf"
@@ -6434,14 +6434,14 @@ (define_insn "xxspltiw_v4sf_inst"
 UNSPEC_XXSPLTIW))]
  "TARGET_POWER10"
  "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
+ [(set_attr "type" "vecperm")
   (set_attr "prefixed" "yes")])
 
 ;; XXSPLTIDP built-in function support
 (define_expand "xxspltidp_v2df"
   [(set (match_operand:V2DF 0 "register_operand" )
(unspec:V2DF [(match_operand:SF 1 "const_double_operand")]
-UNSPEC_XXSPLTID))]
+UNSPEC_XXSPLTIDP))]
  "TARGET_POWER10"
 {
   long value = rs6000_const_f32_to_i32 (operands[1]);
@@ -6452,10 +6452,10 @@ (define_expand "xxspltidp_v2df"
 (define_insn "xxspltidp_v2df_inst"
   [(set (match_operand:V2DF 0 "register_operand" "=wa")
(unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-UNSPEC_XXSPLTID))]
+UNSPEC_XXSPLTIDP))]
   "TARGET_POWER10"
   "xxspltidp %x0,%1"
-  [(set_attr "type" "vecsimple")
+  [(set_attr "type" "vecperm")
(set_attr "prefixed" "yes")])
 
 ;; XXSPLTI32DX built-in function support
@@ -6476,7 +6476,7 @@ (define_expand "xxsplti32dx_v4si"
 GEN_INT (index), operands[3]));
DONE;
 }
- [(set_attr "type" "vecsimple")])
+ [(set_attr "type" "vecperm")])
 
 (define_insn "xxsplti32dx_v4si_inst"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
@@ -6486,7 +6486,7 @@ (define_insn "xxsplti32dx_v4si_inst"
 UNSPEC_XXSPLTI32DX))]
   "TARGET_POWER10"
   "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")
+  [(set_attr "type" "vecperm")
(set_attr "prefixed" "yes")])
 
 (define_expand "xxsplti32dx_v4sf"
@@ -6515,7 +6515,7 @@ (define_insn "xxsplti32dx_v4sf_inst"
 UNSPEC_XXSPLTI32DX))]
   "TARGET_POWER10"
   "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")
+  [(set_attr "type" "vecperm")
(set_attr "prefixed" "yes")])
 
 ;; XXBLEND built-in function support
@@ -6527,7 +6527,7 @@ (define_insn "xxblend_"
UNSPEC_XXBLEND))]
   "TARGET_POWER10"
   "xxblendv %x0,%x1,%x2,%x3"
-  [(set_attr "type" "vecsimple")
+  [(set_attr "type" "vecperm")
(set_attr "prefixed" "yes")])
 
 ;; XXPERMX built-in function support
@@ -6562,7 +6562,7 @@ (define_expand "xxpermx"
 
   DONE;
 }
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecperm")])
 
 (define_insn "xxpermx_inst"
   [(set (match_operand:V2DI 0 "register_operand" "+v")
@@ -6573,7 +6573,7 @@ (define_insn "xxpermx_inst"
 UNSPEC_XXPERMX))]
   "TARGET_POWER10"
   "xxpermx %x0,%x1,%x2,%x3,%4"
-  [(set_attr "type" "vecsimple")
+  [(set_attr "type" "vecperm")
(set_attr "prefixed" "yes")])
 
 ;; XXEVAL built-in function support
@@ -6586,6

Re: [PING][PATCH] enable ranger and caching in pass_waccess

2021-08-25 Thread Andrew MacLeod via Gcc-patches


On 8/25/21 11:20 AM, Martin Sebor wrote:

Ping: Andrew, did I answer your questions?  Do you (or anyone else)
have any other comments on the latest patch below?

https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577865.html



I wasn't attempting to block it, its outside my review purview..

I was merely commenting that you should not need a pointer to a 
range_query at all anymore









Are you planning to transition to using the get_range_query() 
interface instead of keeping a range_query pointer in the 
pointer_query class?


This pass and to a smaller extent the pointer_query class that's
used by it and the strlen pass are still a work in progress.
I also still need to convert the strlen pass to use Ranger and
I expect it will take some changes to pointer_query.  So at that
point, if going through get_range_query (cfun) everywhere is what
you recommend, I'm happy to do it.

absolutely. you should not need to even know whether you have a ranger 
instance running or not. get_range_query will give you whichever is 
active, and there is ALWAYS something active.. defaulting to the global 
version.


the code in get_range() seems to be the end of the call chain which uses 
the pointer and should be consolidated to something much simpler


 if (rvals && stmt)
    {
  if (!rvals->range_of_expr (vr, val, stmt))
    return NULL_TREE;

    <...>

  // ?? This entire function should use get_range_query or
   get_global_range_query (),
  // instead of doing something different for RVALS and global ranges.

  if (!get_global_range_query ()->range_of_expr (vr, val) ||
   vr.undefined_p ())
    return NULL_TREE;


This entire section can boil down to something like

if (!get_range_query (cfun)->range_of_expr (vr, val, stmt))
  return NULL;





PS There has been an effort to get rid of global variables from GCC,
or, as the first step, to avoid accessing them directly(*).  If and
when that happens, it seems like each pass will have to store either
the ranger instance as a member (directly or indirectly, via a member
of a class that stores it) or the function passed to pass::execute()
if it wants to access either.

[*] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573975.html
The patch at the link above wasn't approved but IIUC removing globals
from GCC is still a goal.


I have no idea what direction that is going, but the net effect will be 
the same in the end.  You'll be calling get_range_query() with either 
the function pointer, or the pass pointer, or something.. but you should 
never need to pass around  a pointer to either a ranger or range-query.  
As I said earlier, this will likely be a pass property and accessed thru 
the pass eventually... but until then, use what we have.. cfun.  It'll 
be trivial to transition down the road to whatever the ultimate solution is.


Andrew

Re: [PATCH] analyzer: Impose recursion limit on indirect calls.

2021-08-25 Thread Ankur Saini via Gcc-patches




> On 25-Aug-2021, at 8:35 PM, David Malcolm  wrote:
> 
> On Wed, 2021-08-25 at 20:31 +0530, Ankur Saini wrote:
>> 
>> 
>>> On 25-Aug-2021, at 7:24 PM, Martin Liška  wrote:
>>> 
>>> On 8/25/21 15:22, David Malcolm via Gcc-patches wrote:
 On Wed, 2021-08-25 at 13:39 +0530, Ankur Saini wrote:
> This should also fix the failing regression found in PR
> analyzer/101980.
> 
> - The patch is in sync with current master
> - successfully bootstrapped and tested on x86_64-linux-gnu.
> 
 The patch is OK for trunk.
 Thanks for fixing this
 Dave
>>> 
>>> Hello.
>>> 
>>> Quite accidentally, but I noticed the patch violates GNU coding
>>> style:
>>> 
>>> $ git show 43a5d46feabd93ba78983919234f05f5fc9a0982 |
>>> ./contrib/check_GNU_style.py -
>>> 
>>> === ERROR type #1: blocks of 8 spaces should be replaced with tabs
>>> (8 error(s)) ===
>>> 
>>> gcc/analyzer/engine.cc:3064:0: that exceed it further.
>>> 
>>> gcc/analyzer/engine.cc:3065:0: This is something of a blunt
>>> workaround, but it only
>>> 
>>> gcc/analyzer/engine.cc:3066:0: applies to recursion (and
>>> mutual recursion), not to
>>> 
>>> gcc/analyzer/engine.cc:3067:0: general call stacks.  */
>>> 
>>> gcc/analyzer/engine.cc:3069:0:  >
>>> param_analyzer_max_recursion_depth)
>>> 
>>> gcc/analyzer/engine.cc:3071:0:if (logger)
>>> 
>>> gcc/analyzer/engine.cc:3072:0:  logger->log ("rejecting
>>> call edge: recursion limit exceeded");
>>> 
>>> gcc/analyzer/engine.cc:3073:0:return false;
>>> 
>> 
>> Looks like my editor again converted all the tabs to spaces.
>> 
>> btw, I can also see a lot of other places where 8 spaces are not
>> begin converted to tabs, should I also change that accordingly or
>> leave them the way it is and just update this patch ?
> 
> It's usually best to split out bugfixes from formatting/whitespace
> changes, so maybe do it as two patches:
> 
> (1) an updated version of this patch to fix the recursion issue, using
> the correct whitespace for the lines that are touched

unfortunately, I already checked in and pushed the changes to the master before 
this was pointed out. But I can add them to the whitespace issue patch.

> 
> (2) a patch that fixes whitespaces issues, but doesn't change the
> behavior of the code
> 
> 
> Dave

[PING][PATCH] enable ranger and caching in pass_waccess

2021-08-25 Thread Martin Sebor via Gcc-patches


Ping: Andrew, did I answer your questions?  Do you (or anyone else)
have any other comments on the latest patch below?

https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577865.html

On 8/20/21 4:16 PM, Martin Sebor wrote:

On 8/20/21 7:09 AM, Andrew MacLeod wrote:

On 8/19/21 7:09 PM, Martin Sebor via Gcc-patches wrote:

The attached patch changes the new access warning pass to use
the per-function ranger instance.  To do that it makes a number
of the global static functions members of the pass (that involved
moving one to a later point in the file, increasing the diff;
the body of the function hasn't changed otherwise).  Still more
functions remain.  At the same time, the patch also enables
the simple pointer_query cache to avoid repeatedly recomputing
the properties of related pointers into the same objects, and
makes the cache more effective (trunk fails to cache a bunch of
intermediate results).  Finally, the patch enhances the debugging
support for the cache.

Other than the ranger/caching the changes have no user-visible
effect.



Why are you calling enable/disable ranger if you are passing a ranger 
instance around instead of using the get_range_query (cfun)->range* 
calls?


The pass stores an instance of the pointer_query class which in
turn stores a pointer to range_query (which is a copy of the ranger).
So storing it also in pass_waccess isn't necessary and can be
removed.  I've made that change in the attached update.  I'm not
sure the corresponding pointer should at some point also be removed
from the pointer_query class and replaced by calls to get_range_query
(cfun).  If so, that would take some surgery to the strlen pass which
also uses pointer_query and isn't quite ready to make this switch.




Are you planning to transition to using the get_range_query() 
interface instead of keeping a range_query pointer in the 
pointer_query class?


This pass and to a smaller extent the pointer_query class that's
used by it and the strlen pass are still a work in progress.
I also still need to convert the strlen pass to use Ranger and
I expect it will take some changes to pointer_query.  So at that
point, if going through get_range_query (cfun) everywhere is what
you recommend, I'm happy to do it.

Anyway, attached is an updated revision with the m_ranger member
removed and a few helpers changed to take a range_query argument
to use the pointer_query member instead.  It was retested on
x86_64-linux.

Martin

PS There has been an effort to get rid of global variables from GCC,
or, as the first step, to avoid accessing them directly(*).  If and
when that happens, it seems like each pass will have to store either
the ranger instance as a member (directly or indirectly, via a member
of a class that stores it) or the function passed to pass::execute()
if it wants to access either.

[*] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573975.html
The patch at the link above wasn't approved but IIUC removing globals
from GCC is still a goal.

Re: [PATCH] analyzer: Recognize __builtin_free as a matching deallocator

2021-08-25 Thread Siddhesh Poyarekar


On 8/25/21 5:44 PM, Matthias Klose wrote:

On 7/28/21 1:44 PM, David Malcolm via Gcc-patches wrote:

On Wed, 2021-07-28 at 10:34 +0530, Siddhesh Poyarekar wrote:

Recognize __builtin_free as being equivalent to free when passed into
__attribute__((malloc ())), similar to how it is treated when it is
encountered as a call.  This fixes spurious warnings in glibc where
xmalloc family of allocators as well as reallocarray, memalign,
etc. are declared to have __builtin_free as the free function.

 gcc/analyzer/ChangeLog:
 * sm-malloc.cc
 (malloc_state_machine::get_or_create_deallocator): Recognize
 __builtin_free.

 gcc/testsuite/ChangeLog:
 * gcc.dg/analyzer/attr-malloc-1.c (compatible_alloc,
 compatible_alloc2): New extern allocator declarations.
 (test_9, test_10): New tests.


Looks good to me, thanks
Dave





Please could this be backported to all active branches?



Sure, it looks like only gcc11 needs this since malloc attribute 
matching seems recent.  David, I've never done a backport before, may I 
just cherry-pick, push and post a [committed] patch on list or does it 
need to go through review?


Thanks,
Siddhesh

Re: [PATCH] analyzer: Impose recursion limit on indirect calls.

2021-08-25 Thread David Malcolm via Gcc-patches

On Wed, 2021-08-25 at 20:31 +0530, Ankur Saini wrote:
> 
> 
> > On 25-Aug-2021, at 7:24 PM, Martin Liška  wrote:
> > 
> > On 8/25/21 15:22, David Malcolm via Gcc-patches wrote:
> > > On Wed, 2021-08-25 at 13:39 +0530, Ankur Saini wrote:
> > > > This should also fix the failing regression found in PR
> > > > analyzer/101980.
> > > > 
> > > > - The patch is in sync with current master
> > > > - successfully bootstrapped and tested on x86_64-linux-gnu.
> > > > 
> > > The patch is OK for trunk.
> > > Thanks for fixing this
> > > Dave
> > 
> > Hello.
> > 
> > Quite accidentally, but I noticed the patch violates GNU coding
> > style:
> > 
> > $ git show 43a5d46feabd93ba78983919234f05f5fc9a0982 |
> > ./contrib/check_GNU_style.py -
> > 
> > === ERROR type #1: blocks of 8 spaces should be replaced with tabs
> > (8 error(s)) ===
> > 
> > gcc/analyzer/engine.cc:3064:0: that exceed it further.
> > 
> > gcc/analyzer/engine.cc:3065:0: This is something of a blunt
> > workaround, but it only
> > 
> > gcc/analyzer/engine.cc:3066:0: applies to recursion (and
> > mutual recursion), not to
> > 
> > gcc/analyzer/engine.cc:3067:0: general call stacks.  */
> > 
> > gcc/analyzer/engine.cc:3069:0:  >
> > param_analyzer_max_recursion_depth)
> > 
> > gcc/analyzer/engine.cc:3071:0:if (logger)
> > 
> > gcc/analyzer/engine.cc:3072:0:  logger->log ("rejecting
> > call edge: recursion limit exceeded");
> > 
> > gcc/analyzer/engine.cc:3073:0:return false;
> > 
> 
> Looks like my editor again converted all the tabs to spaces.
> 
> btw, I can also see a lot of other places where 8 spaces are not
> begin converted to tabs, should I also change that accordingly or
> leave them the way it is and just update this patch ?

It's usually best to split out bugfixes from formatting/whitespace
changes, so maybe do it as two patches:

(1) an updated version of this patch to fix the recursion issue, using
the correct whitespace for the lines that are touched

(2) a patch that fixes whitespaces issues, but doesn't change the
behavior of the code


Dave

Re: [PATCH] analyzer: Impose recursion limit on indirect calls.

2021-08-25 Thread Martin Liška


On 8/25/21 15:22, David Malcolm via Gcc-patches wrote:

On Wed, 2021-08-25 at 13:39 +0530, Ankur Saini wrote:

This should also fix the failing regression found in PR
analyzer/101980.

- The patch is in sync with current master
- successfully bootstrapped and tested on x86_64-linux-gnu.



The patch is OK for trunk.

Thanks for fixing this
Dave




Hello.

Quite accidentally, but I noticed the patch violates GNU coding style:

$ git show 43a5d46feabd93ba78983919234f05f5fc9a0982 | 
./contrib/check_GNU_style.py -

=== ERROR type #1: blocks of 8 spaces should be replaced with tabs (8 error(s)) 
===

gcc/analyzer/engine.cc:3064:0: that exceed it further.

gcc/analyzer/engine.cc:3065:0: This is something of a blunt workaround, 
but it only

gcc/analyzer/engine.cc:3066:0: applies to recursion (and mutual 
recursion), not to

gcc/analyzer/engine.cc:3067:0: general call stacks.  */

gcc/analyzer/engine.cc:3069:0:  > param_analyzer_max_recursion_depth)

gcc/analyzer/engine.cc:3071:0:if (logger)

gcc/analyzer/engine.cc:3072:0:  logger->log ("rejecting call edge: recursion 
limit exceeded");

gcc/analyzer/engine.cc:3073:0:return false;


Martin

Re: Ping: [PATCH] diagnostics: Support for -finput-charset [PR93067]

2021-08-25 Thread David Malcolm via Gcc-patches

On Tue, 2021-08-24 at 19:28 -0400, Lewis Hyatt wrote:
> On Tue, Aug 24, 2021 at 6:51 PM David Malcolm 
> wrote:
> > 
> > On Tue, 2021-08-24 at 08:17 -0400, Lewis Hyatt wrote:
> > > Hello-
> > > 
> > > I thought it might be a good time to check on this patch please?
> > > Thanks!
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576449.html
> > > 
> > > -Lewis
> > 
> > I went through that latest version of the patch and have no further
> > suggestions - I like the changes you made to incorporate the
> > changes I
> > had made to input.c.
> > 
> > The latest version of the patch is OK for trunk.
> > 
> > It might be an idea to rebase it and retest it before pushing it,
> > to
> > make sure nothing significant has changed in the last few weeks.
> > 
> > Thanks for your work on this, and sorry again for the delay in
> > reviewing it.
> > 
> > Dave
> > 
> > 
> 
> OK great, thanks for your time. I will push after retesting.
> 
> BTW, do you think it would be worthwhile to work on the other half of
> encoding support, i.e. translating from UTF-8 to the user's locale,
> when outputting diagnostics? I have probably 90% of a patch that does
> this, however it complexifies things a bit and I am not sure if it is
> really worth the trouble. What is rather manageable (that my patch in
> progress does now) is to replace non-translatable characters with
> something like UCN escapes. What is not so easy, is to do this and
> preserve the alignment of carets and label lines and such... this
> requires making the display width of a character also
> locale-dependent, which concept doesn't exist currently. Adding that
> feels like a lot of complication for what would be a little-used
> feature... Anyway, if you think a patch that does the translation
> without preserving the alignment would be useful, I could finish it
> up
> and send it. Otherwise I was kinda inclined to forget about it.
> Thanks!

Maybe post the patch you have so far, making clear that it's
unfinished/work-in-progress?  I think I'll find it easier to think
about this with a patch in hand rather than a description of a patch,
and also, that way the list archives will have a copy of your work in
case we do want to finish it at some later point (rather than it just
being on your hard drive).

Thanks
Dave

Re: [PATCH] analyzer: Impose recursion limit on indirect calls.

2021-08-25 Thread David Malcolm via Gcc-patches

On Wed, 2021-08-25 at 13:39 +0530, Ankur Saini wrote:
> This should also fix the failing regression found in PR
> analyzer/101980.
> 
> - The patch is in sync with current master
> - successfully bootstrapped and tested on x86_64-linux-gnu.
> 

The patch is OK for trunk.

Thanks for fixing this
Dave

Re: [llvm-dev] [PATCH] Add optional _Float16 support

2021-08-25 Thread H.J. Lu via Gcc-patches

On Mon, Aug 23, 2021 at 10:55 PM John McCall  wrote:
>
> On Thu, Jul 29, 2021 at 9:40 AM H.J. Lu  wrote:
>>
>> On Tue, Jul 13, 2021 at 9:24 AM H.J. Lu  wrote:
>> >
>> > On Tue, Jul 13, 2021 at 8:41 AM Joseph Myers  
>> > wrote:
>> > >
>> > > On Tue, 13 Jul 2021, H.J. Lu wrote:
>> > >
>> > > > On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei  
>> > > > wrote:
>> > > > >
>> > > > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 
>> > > > > > registers.
>> > > > >
>> > > > > Can you please explain the behavior here? Is there difference 
>> > > > > between _Float16 and _Complex _Float16 when return? I.e.,
>> > > > > 1, In which case will _Float16 values return in both %xmm0 and %xmm1?
>> > > > > 2, For a single _Float16 value, are both real part and imaginary 
>> > > > > part returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively?
>> > > >
>> > > > Here is the v2 patch to add the missing _Float16 bits.   The PDF file 
>> > > > is at
>> > > >
>> > > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI
>> > >
>> > > This PDF shows _Complex _Float16 as having a size of 2 bytes (should be
>> > > 4-byte size, 2-byte alignment).
>> > >
>> > > It also seems to change double from 4-byte to 8-byte alignment, which is
>> > > wrong.  And it's inconsistent about whether it covers the long double =
>> > > double (Android) case - it shows that case for _Complex long double but
>> > > not for long double itself.
>> >
>> > Here is the v3 patch with the fixes.  I also updated the PDF file.
>>
>> Here is the final patch I checked in.   _Complex _Float16 is changed to 
>> return
>> in XMM0 register.   The new PDF file is at
>>
>> https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI
>
>
> This should be explicit that the real part is returned in bits 0..15 and the 
> imaginary part is returned in bits 16..31, or however we conventionally 
> designate subcomponents of a vector.
>
> John.

How about this?

diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
index 860ff66..8f527c1 100644
--- a/low-level-sys-info.tex
+++ b/low-level-sys-info.tex
@@ -457,6 +457,9 @@ and \texttt{unions}) are always returned in memory.
 & \texttt{__float128} & memory \\
 \hline
 & \texttt{_Complex _Float16} & \reg{xmm0} \\
+& & The real part is returned in bits 0..15. The imaginary part is
+returned \\
+& & in bits 16..31.\\
 \cline{2-3}
 Complex & \texttt{_Complex float} & \EDX:\EAX \\
 floating- & & The real part is returned in \EAX. The imaginary part is

https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/89eb3e52c7e5eadd58f7597508e13f34/intel386-psABI-2021-08-25.pdf

-- 
H.J.

Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-08-25 Thread Bill Schmidt via Gcc-patches


Hi Haochen,

Thanks for the updates!  This looks good to me; please await Segher's 
response.


Bill

On 8/25/21 2:06 AM, HAO CHEN GUI wrote:

Hi,

      I refined the patch according to Bill's advice. I pasted the
ChangeLog and diff file here. If it doesn't work, please let me know.
Thanks.

2021-08-25 Haochen Gui 

gcc/
      * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):
      Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
      VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions.

gcc/testsuite/
      * gcc.target/powerpc/vec-minmax-1.c: New test.
      * gcc.target/powerpc/vec-minmax-2.c: Likewise.

diff --git a/gcc/config/rs6000/rs6000-call.c
b/gcc/config/rs6000/rs6000-call.c
index b4e13af4dc6..90527734ceb 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -12159,6 +12159,11 @@ rs6000_gimple_fold_builtin
(gimple_stmt_iterator *gsi)
     return true;
   /* flavors of vec_min.  */
   case VSX_BUILTIN_XVMINDP:
+    case ALTIVEC_BUILTIN_VMINFP:
+  if (!flag_finite_math_only || flag_signed_zeros)
+    return false;
+  /* Fall through to MIN_EXPR.  */
+  gcc_fallthrough ();
   case P8V_BUILTIN_VMINSD:
   case P8V_BUILTIN_VMINUD:
   case ALTIVEC_BUILTIN_VMINSB:
@@ -12167,7 +12172,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator
*gsi)
   case ALTIVEC_BUILTIN_VMINUB:
   case ALTIVEC_BUILTIN_VMINUH:
   case ALTIVEC_BUILTIN_VMINUW:
-    case ALTIVEC_BUILTIN_VMINFP:
     arg0 = gimple_call_arg (stmt, 0);
     arg1 = gimple_call_arg (stmt, 1);
     lhs = gimple_call_lhs (stmt);
@@ -12177,6 +12181,11 @@ rs6000_gimple_fold_builtin
(gimple_stmt_iterator *gsi)
     return true;
   /* flavors of vec_max.  */
   case VSX_BUILTIN_XVMAXDP:
+    case ALTIVEC_BUILTIN_VMAXFP:
+  if (!flag_finite_math_only || flag_signed_zeros)
+    return false;
+  /* Fall through to MAX_EXPR.  */
+  gcc_fallthrough ();
   case P8V_BUILTIN_VMAXSD:
   case P8V_BUILTIN_VMAXUD:
   case ALTIVEC_BUILTIN_VMAXSB:
@@ -12185,7 +12194,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator
*gsi)
   case ALTIVEC_BUILTIN_VMAXUB:
   case ALTIVEC_BUILTIN_VMAXUH:
   case ALTIVEC_BUILTIN_VMAXUW:
-    case ALTIVEC_BUILTIN_VMAXFP:
     arg0 = gimple_call_arg (stmt, 0);
     arg1 = gimple_call_arg (stmt, 1);
     lhs = gimple_call_lhs (stmt);
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
new file mode 100644
index 000..da92f059aea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
@@ -0,0 +1,53 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */
+
+/* This test verifies that float or double vec_min/max are bound to
+   xv[min|max][d|s]p instructions when fast-math is not set.  */
+
+
+#include 
+
+#ifdef _BIG_ENDIAN
+   const int PREF_D = 0;
+#else
+   const int PREF_D = 1;
+#endif
+
+double vmaxd (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_max (va, vb), PREF_D);
+}
+
+double vmind (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_min (va, vb), PREF_D);
+}
+
+#ifdef _BIG_ENDIAN
+   const int PREF_F = 0;
+#else
+   const int PREF_F = 3;
+#endif
+
+float vmaxf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_max (va, vb), PREF_F);
+}
+
+float vminf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_min (va, vb), PREF_F);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c
b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c
new file mode 100644
index 000..d318b933181
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c
@@ -0,0 +1,51 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */
+/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */
+
+/* This test verifies that float or double vec_min/max can be converted
+   to scalar comparison when fast-math is set.  */
+
+
+#include 
+
+#ifdef _BIG_ENDIAN
+   const int PREF_D = 0;
+#else
+   const int PREF_D = 1;
+#endif
+
+double vmaxd (double a, double b)
+{
+  vector double va = vec_promote (a,

Re: [PATCH] Use non-numbered clones for target_clones.

2021-08-25 Thread Martin Liška


Hello.

There's updated version of the patch that:
- strips '.$number' suffix for default clones
- skips numbering for both declarations and definitions of target clones.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
MartinFrom 99fdcd317cec7e3742e161f43951a67421506df7 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 20 Aug 2021 16:35:18 +0200
Subject: [PATCH] Use non-numbered clones for target_clones.

gcc/ChangeLog:

	* cgraph.h (create_version_clone_with_body): Add new parameter.
	* cgraphclones.c: Likewise.
	* multiple_target.c (create_dispatcher_calls): Do not use
	numbered suffixes.
	(create_target_clone): Likewise here.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/mvc5.c: Scan assembly names.
	* gcc.target/i386/mvc7.c: Likewise.
	* gcc.target/i386/pr95778-1.c: Update scanned patterns.
	* gcc.target/i386/pr95778-2.c: Likewise.

Co-Authored-By: Stefan Kneifel 
---
 gcc/cgraph.h  |  5 -
 gcc/cgraphclones.c| 11 ---
 gcc/multiple_target.c | 16 +++-
 gcc/testsuite/gcc.target/i386/mvc5.c  |  4 
 gcc/testsuite/gcc.target/i386/mvc7.c  |  8 ++--
 gcc/testsuite/gcc.target/i386/pr95778-1.c |  4 ++--
 gcc/testsuite/gcc.target/i386/pr95778-2.c |  4 ++--
 7 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index d54a258c2ee..4cdb3738b4d 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1006,13 +1006,16 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
  that will promote value of the attribute DECL_FUNCTION_SPECIFIC_TARGET
  of the declaration.
 
+ If VERSION_DECL is set true, use clone_function_name_numbered for the
+ function clone.  Otherwise, use clone_function_name.
+
  Return the new version's cgraph node.  */
   cgraph_node *create_version_clone_with_body
 (vec redirect_callers,
  vec *tree_map,
  ipa_param_adjustments *param_adjustments,
  bitmap bbs_to_copy, basic_block new_entry_block, const char *clone_name,
- tree target_attributes = NULL_TREE);
+ tree target_attributes = NULL_TREE, bool version_decl = true);
 
   /* Insert a new cgraph_function_version_info node into cgraph_fnver_htab
  corresponding to cgraph_node.  */
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index b16e68194e1..ae91dccd31d 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -987,6 +987,9 @@ cgraph_node::create_version_clone (tree new_decl,
that will promote value of the attribute DECL_FUNCTION_SPECIFIC_TARGET
of the declaration.
 
+   If VERSION_DECL is set true, use clone_function_name_numbered for the
+   function clone.  Otherwise, use clone_function_name.
+
Return the new version's cgraph node.  */
 
 cgraph_node *
@@ -995,7 +998,7 @@ cgraph_node::create_version_clone_with_body
vec *tree_map,
ipa_param_adjustments *param_adjustments,
bitmap bbs_to_copy, basic_block new_entry_block, const char *suffix,
-   tree target_attributes)
+   tree target_attributes, bool version_decl)
 {
   tree old_decl = decl;
   cgraph_node *new_version_node = NULL;
@@ -1016,8 +1019,10 @@ cgraph_node::create_version_clone_with_body
 new_decl = copy_node (old_decl);
 
   /* Generate a new name for the new version. */
-  DECL_NAME (new_decl) = clone_function_name_numbered (old_decl, suffix);
-  SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
+  tree fnname = (version_decl ? clone_function_name_numbered (old_decl, suffix)
+		: clone_function_name (old_decl, suffix));
+  DECL_NAME (new_decl) = fnname;
+  SET_DECL_ASSEMBLER_NAME (new_decl, fnname);
   SET_DECL_RTL (new_decl, NULL);
 
   DECL_VIRTUAL_P (new_decl) = 0;
diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
index 6c0565880c5..28d5f95dca5 100644
--- a/gcc/multiple_target.c
+++ b/gcc/multiple_target.c
@@ -166,9 +166,8 @@ create_dispatcher_calls (struct cgraph_node *node)
 	}
 }
 
-  symtab->change_decl_assembler_name (node->decl,
-  clone_function_name_numbered (
-	  node->decl, "default"));
+  tree fname = clone_function_name (node->decl, "default");
+  symtab->change_decl_assembler_name (node->decl, fname);
 
   if (node->definition)
 {
@@ -309,9 +308,9 @@ create_target_clone (cgraph_node *node, bool definition, char *name,
 
   if (definition)
 {
-  new_node = node->create_version_clone_with_body (vNULL, NULL,
-		   NULL, NULL,
-		   NULL, name, attributes);
+  new_node
+	= node->create_version_clone_with_body (vNULL, NULL, NULL, NULL, NULL,
+		name, attributes, false);
   if (new_node == NULL)
 	return NULL;
   new_node->force_output = true;
@@ -322,9 +321,8 @@ create_target_clone (cgraph_node *node, bool definition, char *name,
   new_node = cgraph_node::get_create (new_decl);
   DECL_ATTRIBUTES (new_decl) = attributes;
   /* Generate a new name for the new version.  */

Re: [GCC-11] [PATCH 0/5] Finish and general-regs-only

2021-08-25 Thread H.J. Lu via Gcc-patches

On Wed, Aug 25, 2021 at 12:34 AM Uros Bizjak  wrote:
>
> On Tue, Aug 24, 2021 at 4:57 PM H.J. Lu  wrote:
> >
> > On Sun, Aug 15, 2021 at 11:11 PM Richard Biener
> >  wrote:
> > >
> > > On Fri, Aug 13, 2021 at 3:51 PM H.J. Lu  wrote:
> > > >
> > > >  and target("general-regs-only") function attribute
> > > > were added to GCC 11.  But their implementations are incomplete.  I'd
> > > > like to backport the following patches to GCC 11 branch to finish them.
> > >
> > > Fine with me if x86 maintainers do not disagree (also see one comment I 
> > > have
> > > on the -mwait adding patch).
> >
> > Hi Uros, Honza,
> >
> > Do you have any comments?  The updated -mwait patch with LTO_minor_version
> > bump is at:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577471.html
>
> I don't have any comments, but IIRC, approved changes can be
> backported from mainline to release branches without additional
> approval.

I am checking them in.

Thanks.

> Uros.
>
> > Thanks.
> >
> > H.J.
> > > > H.J. Lu (5):
> > > >   x86: Add -mmwait for -mgeneral-regs-only
> > > >   x86: Use crc32 target option for CRC32 intrinsics
> > > >   x86: Remove OPTION_MASK_ISA_SSE4_2 from CRC32 _builtin functions
> > > >   x86: Enable the GPR only instructions for -mgeneral-regs-only
> > > >   : Add pragma GCC target("general-regs-only")
> > > >
> > > >  gcc/common/config/i386/i386-common.c   |  45 ++-
> > > >  gcc/config.gcc |   6 +-
> > > >  gcc/config/i386/i386-builtin.def   |   8 +-
> > > >  gcc/config/i386/i386-builtins.c|   4 +-
> > > >  gcc/config/i386/i386-c.c   |   2 +
> > > >  gcc/config/i386/i386-options.c |  12 +
> > > >  gcc/config/i386/i386.c |   6 +-
> > > >  gcc/config/i386/i386.h |   2 +
> > > >  gcc/config/i386/i386.md|   4 +-
> > > >  gcc/config/i386/i386.opt   |   4 +
> > > >  gcc/config/i386/ia32intrin.h   |  42 ++-
> > > >  gcc/config/i386/mwaitintrin.h  |  52 +++
> > > >  gcc/config/i386/pmmintrin.h|  13 +-
> > > >  gcc/config/i386/serializeintrin.h  |   7 +-
> > > >  gcc/config/i386/sse.md |   4 +-
> > > >  gcc/config/i386/x86gprintrin.h |  13 +
> > > >  gcc/doc/extend.texi|   5 +
> > > >  gcc/doc/invoke.texi|   8 +-
> > > >  gcc/testsuite/gcc.target/i386/crc32-6.c|  13 +
> > > >  gcc/testsuite/gcc.target/i386/monitor-2.c  |  27 ++
> > > >  gcc/testsuite/gcc.target/i386/pr101492-1.c |  10 +
> > > >  gcc/testsuite/gcc.target/i386/pr101492-2.c |  10 +
> > > >  gcc/testsuite/gcc.target/i386/pr101492-3.c |  10 +
> > > >  gcc/testsuite/gcc.target/i386/pr101492-4.c |  12 +
> > > >  gcc/testsuite/gcc.target/i386/pr99744-3.c  |  13 +
> > > >  gcc/testsuite/gcc.target/i386/pr99744-4.c  | 357 +
> > > >  gcc/testsuite/gcc.target/i386/pr99744-5.c  |  25 ++
> > > >  gcc/testsuite/gcc.target/i386/pr99744-6.c  |  23 ++
> > > >  gcc/testsuite/gcc.target/i386/pr99744-7.c  |  12 +
> > > >  gcc/testsuite/gcc.target/i386/pr99744-8.c  |  13 +
> > > >  30 files changed, 717 insertions(+), 45 deletions(-)
> > > >  create mode 100644 gcc/config/i386/mwaitintrin.h
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/crc32-6.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/monitor-2.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-1.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-2.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-3.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-4.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-3.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-4.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-5.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-6.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-7.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-8.c
> > > >
> > > > --
> > > > 2.31.1
> > > >
> >
> >
> >
> > --
> > H.J.



-- 
H.J.

Re: [PATCH] analyzer: Recognize __builtin_free as a matching deallocator

2021-08-25 Thread Matthias Klose

On 7/28/21 1:44 PM, David Malcolm via Gcc-patches wrote:
> On Wed, 2021-07-28 at 10:34 +0530, Siddhesh Poyarekar wrote:
>> Recognize __builtin_free as being equivalent to free when passed into
>> __attribute__((malloc ())), similar to how it is treated when it is
>> encountered as a call.  This fixes spurious warnings in glibc where
>> xmalloc family of allocators as well as reallocarray, memalign,
>> etc. are declared to have __builtin_free as the free function.
>>
>> gcc/analyzer/ChangeLog:
>> * sm-malloc.cc
>> (malloc_state_machine::get_or_create_deallocator): Recognize
>> __builtin_free.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.dg/analyzer/attr-malloc-1.c (compatible_alloc,
>> compatible_alloc2): New extern allocator declarations.
>> (test_9, test_10): New tests.
> 
> Looks good to me, thanks
> Dave
> 
> 


Please could this be backported to all active branches?

Re: [Patch v2] C, C++, Fortran, OpenMP: Add support for device-modifiers for 'omp target device'

2021-08-25 Thread Marcel Vollweiler


Hi Jakub,

I applied all your suggested changes and checked for no test regressions
on x86_64-linux with nvptx offloading. The revised patch is attached.

Do you think that it's ok to commit the code?

Thanks,

Marcel

Am 23.08.2021 um 19:47 schrieb Jakub Jelinek:

On Fri, Aug 20, 2021 at 09:18:32PM +0200, Marcel Vollweiler wrote:


--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -15864,37 +15864,81 @@ c_parser_omp_clause_map (c_parser *parser, tree list)
  }

  /* OpenMP 4.0:
-   device ( expression ) */
+>>>
   device ( expression )


Please remove all the >s.

+
+   OpenMP 5.0:
+   device ( [device-modifier :] integer-expression )
+
+   device-modifier:
+ ancestor | device_num */




+  /* A requires directive with the reverse_offload clause must be
+  specified.  */
+  if ((omp_requires_mask & OMP_REQUIRES_REVERSE_OFFLOAD) == 0)
+{
+  c_parser_error (parser, "a % directive with the "
+  "% clause must be "
+  "specified");


[BI think this diagnostics is confusing, it tells the user that it has to
do something but doesn't tell why.  It is also not a parser error.
So I think it should be instead
error_at (tok->location, "% device modifier not "
 "preceded by % directive "
 "with % clause");


+  parens.skip_until_found_close (parser);
+  return list;
+}
+  ancestor = true;
+}



+  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
+{
+  c_parser_error (parser, "expected integer expression");
+  return list;
  }

+  check_no_duplicate_clause (list, OMP_CLAUSE_DEVICE, "device");
+
+  c = build_omp_clause (clause_loc, OMP_CLAUSE_DEVICE);
+
+  OMP_CLAUSE_DEVICE_ID (c) = t;
+  OMP_CLAUSE_CHAIN (c) = list;
+  OMP_CLAUSE_DEVICE_ANCESTOR (c) = ancestor;
+
+  list = c;
return list;
  }

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 5349ef1..b4d8d81 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -15139,6 +15139,22 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
 case OMP_CLAUSE_COLLAPSE:
 case OMP_CLAUSE_FINAL:
 case OMP_CLAUSE_DEVICE:
+  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE
+  && OMP_CLAUSE_DEVICE_ANCESTOR (c))
+{
+  t = OMP_CLAUSE_DEVICE_ID (c);
+  if (TREE_CODE (t) == INTEGER_CST
+  && wi::to_widest (t) != 1)
+{
+  error_at (OMP_CLAUSE_LOCATION (c),
+"the % clause expression must evaluate to "
+"%<1%>");
+  remove = true;
+  break;
+}
+}
+  /* FALLTHRU */


For the C FE, I'd suggest to move this to the c_parser_omp_clause_device
routine like other similar checking is done there too.  And you can use
if (TREE_CODE (t) == INTEGER_CST && !integer_onep (t))

+  error_at (tok->location, "a % directive with the "



+   "% clause must be "
+   "specified");


See above.


@@ -38562,6 +38601,7 @@ cp_parser_omp_clause_device (cp_parser *parser, tree 
list,
c = build_omp_clause (location, OMP_CLAUSE_DEVICE);
OMP_CLAUSE_DEVICE_ID (c) = t;
OMP_CLAUSE_CHAIN (c) = list;
+  OMP_CLAUSE_DEVICE_ANCESTOR (c) = ancestor;


But in C++ the INTEGER_CST checking shouldn't be done here, because
the argument could be type or value dependent.


--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -7334,6 +7334,15 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type 
ort)
 "% id must be integral");
   remove = true;
 }
+  else if (OMP_CLAUSE_DEVICE_ANCESTOR (c)
+   && TREE_CODE (t) == INTEGER_CST
+   && wi::to_widest (t) != 1)


!integer_onep (t)


+  if (!(gfc_current_ns->omp_requires & OMP_REQ_REVERSE_OFFLOAD))
+{
+  gfc_error ("a % directive with the "
+ "% clause must be "
+ "specified at %C");


See above.


+  else if (gfc_match ("%e )", >device) == MATCH_YES)
+{
+}
+  else


Better != MATCH_YES and drop the {} else ?


+{
+  gfc_error ("Expected integer expression or a single device-"
+  "modifier % or % at %C");
+  break;
+}
+  continue;
+}
   if ((mask & OMP_CLAUSE_DEVICE)
   && openacc
   && gfc_match ("device ( ") == MATCH_YES



+  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE
+  && OMP_CLAUSE_DEVICE_ANCESTOR (c))
+{
+  if (code != OMP_TARGET)
+{
+error_at (OMP_CLAUSE_LOCATION (c),
+  "% clause with % is only "
+

[PATCH] tree-optimization/102046 - fix SLP build from scalars with patterns

2021-08-25 Thread Richard Biener via Gcc-patches

When we swap operands for SLP builds we lose track where exactly
pattern defs are - but we fail to update the any_pattern member
of the operands info.  Do so conservatively.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-08-25  Richard Biener  

PR tree-optimization/102046
* tree-vect-slp.c (vect_build_slp_tree_2): Conservatively
update ->any_pattern when swapping operands.

* gcc.dg/vect/pr102046.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr102046.c | 19 +++
 gcc/tree-vect-slp.c  |  4 
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr102046.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr102046.c 
b/gcc/testsuite/gcc.dg/vect/pr102046.c
new file mode 100644
index 000..ae48b497dc0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr102046.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -fvect-cost-model=dynamic" } */
+/* { dg-additional-options "-march=btver2" { target x86_64-*-* i?86-*-* } } */
+
+struct S
+{
+  unsigned a, b;
+};
+
+struct S g;
+
+void
+foo (struct S *o)
+{
+  struct S s = g;
+  s.b *= 3;
+  s.a -= s.a / 2;
+  *o = s;
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index edc11c62793..4d688c7a267 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2311,6 +2311,10 @@ out:
  }
  if (dump_enabled_p ())
dump_printf (MSG_NOTE, "\n");
+ /* After swapping some operands we lost track whether an
+operand has any pattern defs so be conservative here.  */
+ if (oprnds_info[0]->any_pattern || oprnds_info[1]->any_pattern)
+   oprnds_info[0]->any_pattern = oprnds_info[1]->any_pattern = true;
  /* And try again with scratch 'matches' ... */
  bool *tem = XALLOCAVEC (bool, group_size);
  if ((child = vect_build_slp_tree (vinfo, oprnd_info->def_stmts,
-- 
2.31.1

Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-08-25 Thread HAO CHEN GUI via Gcc-patches




On 25/8/2021 下午 4:17, HAO CHEN GUI via Gcc-patches wrote:

Hi Kewen,

  Thanks for your advice.

On 25/8/2021 下午 3:50, Kewen.Lin wrote:

Hi Haochen,

on 2021/8/25 下午3:06, HAO CHEN GUI via Gcc-patches wrote:

Hi,

 I refined the patch according to Bill's advice. I pasted the 
ChangeLog and diff file here. If it doesn't work, please let me 
know. Thanks.


2021-08-25 Haochen Gui 

gcc/

IIUC, this patch is for PR93127, one line for PR is missing here.
The patch does comes from the PR, but it doesn't work as the PR 
requires. So I am not sure if I should add the PR number.



 * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):
 Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
 VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions.

gcc/testsuite/

Same, need a PR line.


 * gcc.target/powerpc/vec-minmax-1.c: New test.
 * gcc.target/powerpc/vec-minmax-2.c: Likewise.


Maybe it's better to use pr93127-{1,2}.c for case names?

...

--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
@@ -0,0 +1,53 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */

I guess this "powerpc64le" isn't intentional?  The test case
has the macro to distinguish endianess, I assume we want this
to be compiled on BE?  If so, we just put the line below instead?
It should be tested on BE as well. I will replace it with 
'powerpc*-*-* && lp64'.

It should be 'powerpc*-*-*'. Thanks again!


/* { dg-do compile } */

And it needs extra testing on BE as well.  :)

Thanks for fixing this!

BR,
Kewen


+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */
+
+/* This test verifies that float or double vec_min/max are bound to
+   xv[min|max][d|s]p instructions when fast-math is not set. */
+
+
+#include 
+
+#ifdef _BIG_ENDIAN
+   const int PREF_D = 0;
+#else
+   const int PREF_D = 1;
+#endif
+
+double vmaxd (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_max (va, vb), PREF_D);
+}
+
+double vmind (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_min (va, vb), PREF_D);
+}
+
+#ifdef _BIG_ENDIAN
+   const int PREF_F = 0;
+#else
+   const int PREF_F = 3;
+#endif
+
+float vmaxf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_max (va, vb), PREF_F);
+}
+
+float vminf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_min (va, vb), PREF_F);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c 
b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c

new file mode 100644
index 000..d318b933181
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c
@@ -0,0 +1,51 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */
+/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */
+
+/* This test verifies that float or double vec_min/max can be 
converted

+   to scalar comparison when fast-math is set.  */
+
+
+#include 
+
+#ifdef _BIG_ENDIAN
+   const int PREF_D = 0;
+#else
+   const int PREF_D = 1;
+#endif
+
+double vmaxd (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_max (va, vb), PREF_D);
+}
+
+double vmind (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_min (va, vb), PREF_D);
+}
+
+#ifdef _BIG_ENDIAN
+   const int PREF_F = 0;
+#else
+   const int PREF_F = 3;
+#endif
+
+float vmaxf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_max (va, vb), PREF_F);
+}
+
+float vminf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_min (va, vb), PREF_F);
+}

Re: [PATCH] Change illegitimate constant into memref of constant pool in change_zero_ext.

2021-08-25 Thread Hongtao Liu via Gcc-patches

On Wed, Aug 25, 2021 at 5:14 AM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Tue, Aug 24, 2021 at 04:55:30PM +0800, liuhongt wrote:
> >   This patch extend change_zero_ext to change illegitimate constant
> > into constant pool, this will enable simplification of below:
>
> It should be in a separate function.  recog_for_combine will call both.
> But not both for the same RTL!  This is important.  Originally, combine
> tried only one thing for every combination of input instructions it got.
> Every extra attempt causes quite a bit more garbage (not to mention it
> takes time as well, recog isn't super cheap), so we should try to not
> make recog_for_combine exponential in the variants it tries..
>
> And of course the function name should always be descriptive of what a
> function does :-)
>
> >  change_zero_ext (rtx pat)
> > @@ -11417,6 +11418,23 @@ change_zero_ext (rtx pat)
> >  {
> >rtx x = **iter;
> >scalar_int_mode mode, inner_mode;
> > +  machine_mode const_mode = GET_MODE (x);
> > +
> > +  /* Change illegitimate constant into memref of constant pool.  */
> > +  if (CONSTANT_P (x)
> > +   && !const_vec_duplicate_p (x)
>
> This is x86-specific?  It makes no sense in generic code, anyway.  Or if
> it does it needs a big fat comment :-)
>
> > +   && const_mode != BLKmode
> > +   && GET_CODE (x) != HIGH
> > +   && GET_MODE_SIZE (const_mode).is_constant ()
> > +   && !targetm.legitimate_constant_p (const_mode, x)
> > +   && !targetm.cannot_force_const_mem (const_mode, x))
>
> You should only test that it did not recog, and then force a constant
> to memory.  You do not want to do this for every constant (rotate by 42
> won't likely match better if you force 42 to memory) so some
> sophistication will help here, but please do not make it target-
> specific.
>
> > + {
> > +   x = force_const_mem (GET_MODE (x), x);
>
> That mode is const_mode.
>
>
> Ideally you will have some example where it benefits some other target,
> too.  Running recog twice for a big fraction of all combine attempts,
> for no benefit at all, is not a good idea.  The zext* thing is there
> because combine *itself* creates a lot of extra zext*, whether those
> exist for the target or not.  So this isn't obvious precedent (and that
> wouldn't mean it is a good idea anyway ;-) )
When trying to construct a testcase for another backend, I realized
that the issue can also be solved by folding _mm_shuffle_ps into
gimple vec_perm_expr.
Let me try that way.

>
>
> Segher



-- 
BR,
Hongtao

Re: [PATCH] Change illegitimate constant into memref of constant pool in change_zero_ext.

2021-08-25 Thread Hongtao Liu via Gcc-patches

On Wed, Aug 25, 2021 at 5:14 AM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Tue, Aug 24, 2021 at 04:55:30PM +0800, liuhongt wrote:
> >   This patch extend change_zero_ext to change illegitimate constant
> > into constant pool, this will enable simplification of below:
>
> It should be in a separate function.  recog_for_combine will call both.
> But not both for the same RTL!  This is important.  Originally, combine
> tried only one thing for every combination of input instructions it got.
> Every extra attempt causes quite a bit more garbage (not to mention it
> takes time as well, recog isn't super cheap), so we should try to not
> make recog_for_combine exponential in the variants it tries..
>
> And of course the function name should always be descriptive of what a
> function does :-)
>
Yes, will do.
> >  change_zero_ext (rtx pat)
> > @@ -11417,6 +11418,23 @@ change_zero_ext (rtx pat)
> >  {
> >rtx x = **iter;
> >scalar_int_mode mode, inner_mode;
> > +  machine_mode const_mode = GET_MODE (x);
> > +
> > +  /* Change illegitimate constant into memref of constant pool.  */
> > +  if (CONSTANT_P (x)
> > +   && !const_vec_duplicate_p (x)
>
> This is x86-specific?  It makes no sense in generic code, anyway.  Or if
> it does it needs a big fat comment :-)
>
> > +   && const_mode != BLKmode
> > +   && GET_CODE (x) != HIGH
> > +   && GET_MODE_SIZE (const_mode).is_constant ()
> > +   && !targetm.legitimate_constant_p (const_mode, x)
> > +   && !targetm.cannot_force_const_mem (const_mode, x))
>
> You should only test that it did not recog, and then force a constant
> to memory.  You do not want to do this for every constant (rotate by 42
> won't likely match better if you force 42 to memory) so some
> sophistication will help here, but please do not make it target-
> specific.
When it comes to change_zero_ext, insn_code_number must < 0,  constant
pool is used as a second choice.
-cut--
  if (insn_code_number >= 0 || check_asm_operands (pat))
return insn_code_number;

  void *marker = get_undo_marker ();
  bool changed = false;

  if (GET_CODE (pat) == SET)
changed = change_zero_ext (pat);
---cut end-

>
> > + {
> > +   x = force_const_mem (GET_MODE (x), x);
>
> That mode is const_mode.
>
>
> Ideally you will have some example where it benefits some other target,
> too.  Running recog twice for a big fraction of all combine attempts,
> for no benefit at all, is not a good idea.  The zext* thing is there
This optimization is similar, pass_combine extracts const_vector from
constant pool and simplify it to another const_vector, but doesn't
realize the simplified const_vector should also be in constant pool.
Maybe I should just restrict this optimization to const_vector only.
 > because combine *itself* creates a lot of extra zext*, whether those
> exist for the target or not.  So this isn't obvious precedent (and that
> wouldn't mean it is a good idea anyway ;-) )
>
>
> Segher



-- 
BR,
Hongtao

Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-08-25 Thread HAO CHEN GUI via Gcc-patches


Hi Kewen,

  Thanks for your advice.

On 25/8/2021 下午 3:50, Kewen.Lin wrote:

Hi Haochen,

on 2021/8/25 下午3:06, HAO CHEN GUI via Gcc-patches wrote:

Hi,

     I refined the patch according to Bill's advice. I pasted the ChangeLog and 
diff file here. If it doesn't work, please let me know. Thanks.

2021-08-25 Haochen Gui 

gcc/

IIUC, this patch is for PR93127, one line for PR is missing here.
The patch does comes from the PR, but it doesn't work as the PR 
requires. So I am not sure if I should add the PR number.



     * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):
     Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
     VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions.

gcc/testsuite/

Same, need a PR line.


     * gcc.target/powerpc/vec-minmax-1.c: New test.
     * gcc.target/powerpc/vec-minmax-2.c: Likewise.


Maybe it's better to use pr93127-{1,2}.c for case names?

...

--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
@@ -0,0 +1,53 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */

I guess this "powerpc64le" isn't intentional?  The test case
has the macro to distinguish endianess, I assume we want this
to be compiled on BE?  If so, we just put the line below instead?
It should be tested on BE as well. I will replace it with 'powerpc*-*-* 
&& lp64'.


/* { dg-do compile } */

And it needs extra testing on BE as well.  :)

Thanks for fixing this!

BR,
Kewen


+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */
+
+/* This test verifies that float or double vec_min/max are bound to
+   xv[min|max][d|s]p instructions when fast-math is not set.  */
+
+
+#include 
+
+#ifdef _BIG_ENDIAN
+   const int PREF_D = 0;
+#else
+   const int PREF_D = 1;
+#endif
+
+double vmaxd (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_max (va, vb), PREF_D);
+}
+
+double vmind (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_min (va, vb), PREF_D);
+}
+
+#ifdef _BIG_ENDIAN
+   const int PREF_F = 0;
+#else
+   const int PREF_F = 3;
+#endif
+
+float vmaxf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_max (va, vb), PREF_F);
+}
+
+float vminf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_min (va, vb), PREF_F);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c 
b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c
new file mode 100644
index 000..d318b933181
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c
@@ -0,0 +1,51 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */
+/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */
+
+/* This test verifies that float or double vec_min/max can be converted
+   to scalar comparison when fast-math is set.  */
+
+
+#include 
+
+#ifdef _BIG_ENDIAN
+   const int PREF_D = 0;
+#else
+   const int PREF_D = 1;
+#endif
+
+double vmaxd (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_max (va, vb), PREF_D);
+}
+
+double vmind (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_min (va, vb), PREF_D);
+}
+
+#ifdef _BIG_ENDIAN
+   const int PREF_F = 0;
+#else
+   const int PREF_F = 3;
+#endif
+
+float vmaxf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_max (va, vb), PREF_F);
+}
+
+float vminf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_min (va, vb), PREF_F);
+}

[PATCH] analyzer: Impose recursion limit on indirect calls.

2021-08-25 Thread Ankur Saini via Gcc-patches

This should also fix the failing regression found in PR analyzer/101980.

- The patch is in sync with current master
- successfully bootstrapped and tested on x86_64-linux-gnu.



fix.patch
Description: Binary data


Thanks 
- Ankur

Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-08-25 Thread Kewen.Lin via Gcc-patches

Hi Haochen,

on 2021/8/25 下午3:06, HAO CHEN GUI via Gcc-patches wrote:
> Hi,
> 
>     I refined the patch according to Bill's advice. I pasted the ChangeLog 
> and diff file here. If it doesn't work, please let me know. Thanks.
> 
> 2021-08-25 Haochen Gui 
> 
> gcc/

IIUC, this patch is for PR93127, one line for PR is missing here.

>     * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):
>     Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
>     VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions.
> 
> gcc/testsuite/

Same, need a PR line.

>     * gcc.target/powerpc/vec-minmax-1.c: New test.
>     * gcc.target/powerpc/vec-minmax-2.c: Likewise.
> 

Maybe it's better to use pr93127-{1,2}.c for case names?

...
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
> @@ -0,0 +1,53 @@
> +/* { dg-do compile { target { powerpc64le-*-* } } } */

I guess this "powerpc64le" isn't intentional?  The test case
has the macro to distinguish endianess, I assume we want this
to be compiled on BE?  If so, we just put the line below instead?

/* { dg-do compile } */

And it needs extra testing on BE as well.  :)

Thanks for fixing this!

BR,
Kewen

> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */
> +
> +/* This test verifies that float or double vec_min/max are bound to
> +   xv[min|max][d|s]p instructions when fast-math is not set.  */
> +
> +
> +#include 
> +
> +#ifdef _BIG_ENDIAN
> +   const int PREF_D = 0;
> +#else
> +   const int PREF_D = 1;
> +#endif
> +
> +double vmaxd (double a, double b)
> +{
> +  vector double va = vec_promote (a, PREF_D);
> +  vector double vb = vec_promote (b, PREF_D);
> +  return vec_extract (vec_max (va, vb), PREF_D);
> +}
> +
> +double vmind (double a, double b)
> +{
> +  vector double va = vec_promote (a, PREF_D);
> +  vector double vb = vec_promote (b, PREF_D);
> +  return vec_extract (vec_min (va, vb), PREF_D);
> +}
> +
> +#ifdef _BIG_ENDIAN
> +   const int PREF_F = 0;
> +#else
> +   const int PREF_F = 3;
> +#endif
> +
> +float vmaxf (float a, float b)
> +{
> +  vector float va = vec_promote (a, PREF_F);
> +  vector float vb = vec_promote (b, PREF_F);
> +  return vec_extract (vec_max (va, vb), PREF_F);
> +}
> +
> +float vminf (float a, float b)
> +{
> +  vector float va = vec_promote (a, PREF_F);
> +  vector float vb = vec_promote (b, PREF_F);
> +  return vec_extract (vec_min (va, vb), PREF_F);
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c 
> b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c
> new file mode 100644
> index 000..d318b933181
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c
> @@ -0,0 +1,51 @@
> +/* { dg-do compile { target { powerpc64le-*-* } } } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */
> +/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */
> +
> +/* This test verifies that float or double vec_min/max can be converted
> +   to scalar comparison when fast-math is set.  */
> +
> +
> +#include 
> +
> +#ifdef _BIG_ENDIAN
> +   const int PREF_D = 0;
> +#else
> +   const int PREF_D = 1;
> +#endif
> +
> +double vmaxd (double a, double b)
> +{
> +  vector double va = vec_promote (a, PREF_D);
> +  vector double vb = vec_promote (b, PREF_D);
> +  return vec_extract (vec_max (va, vb), PREF_D);
> +}
> +
> +double vmind (double a, double b)
> +{
> +  vector double va = vec_promote (a, PREF_D);
> +  vector double vb = vec_promote (b, PREF_D);
> +  return vec_extract (vec_min (va, vb), PREF_D);
> +}
> +
> +#ifdef _BIG_ENDIAN
> +   const int PREF_F = 0;
> +#else
> +   const int PREF_F = 3;
> +#endif
> +
> +float vmaxf (float a, float b)
> +{
> +  vector float va = vec_promote (a, PREF_F);
> +  vector float vb = vec_promote (b, PREF_F);
> +  return vec_extract (vec_max (va, vb), PREF_F);
> +}
> +
> +float vminf (float a, float b)
> +{
> +  vector float va = vec_promote (a, PREF_F);
> +  vector float vb = vec_promote (b, PREF_F);
> +  return vec_extract (vec_min (va, vb), PREF_F);
> +}
>

Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]

2021-08-25 Thread Uros Bizjak via Gcc-patches

On Tue, Aug 24, 2021 at 5:22 PM Hongyu Wang  wrote:
>
> Hi Uros,
>
> Sorry for the late update. I have tried adjusting the combine pass but
> found it is not easy to modify shift const, so I came up with an
> alternative solution with your patch. It matches the non-canonical
> zero-extend in ix86_decompose_address and adjust ix86_rtx_cost to
> combine below pattern
>
> (set (reg:DI 85)
>(and:DI (ashift:DI (reg:DI 87)
>(const_int 1 [0x1]))
>(const_int 4294967294 [0xfffe])))
>
> Survived bootstrap and regtest on x86-64-linux. Ok for master?

gcc/ChangeLog:

PR target/101716
* config/i386/i386.c (ix86_live_on_entry): Adjust comment.
(ix86_decompose_address): Remove retval check for ASHIFT,
allow non-canonical zero extend if AND mask covers ASHIFT
count.
(ix86_legitimate_address_p): Adjust condition for decompose.
(ix86_rtx_costs): Adjust cost for lea with non-canonical
zero-extend.

OK.

Thanks,
Uros.

> Uros Bizjak  于2021年8月16日周一 下午5:26写道：
>
> >
> > On Mon, Aug 16, 2021 at 11:18 AM Hongyu Wang  wrote:
> > >
> > > > So, the question is if the combine pass really needs to zero-extend
> > > > with 0xfffe, the left shift << 1 guarantees zero in the LSB, so
> > > > 0x should be better and in line with canonical zero-extension
> > > > RTX.
> > >
> > > The shift mask is generated in simplify_shift_const_1:
> > >
> > > mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
> > >  int_result_mode);
> > > rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
> > > mask_rtx
> > >   = simplify_const_binary_operation (code, int_result_mode,
> > >  mask_rtx, count_rtx);
> > >
> > > Can we adjust the count for ashift if nonzero_bits overlaps it?
> > >
> > > > Also, ix86_decompose_address accepts ASHIFT RTX when ASHIFT is
> > > > embedded in the PLUS chain, but naked ASHIFT is rejected (c.f. the
> > > > call in ix86_legitimate_address_p) for some (historic?) reason. It
> > > > looks to me that this restriction is not necessary, since
> > > > ix86_legitimize_address can canonicalize ASHIFT RTXes without
> > > > problems. The attached patch that survives bootstrap and regtest can
> > > > help in your case.
> > >
> > > We have a split to transform ashift to mult, I'm afraid it could not
> > > help this issue.
> >
> > If you want existing *lea to accept ASHIFT RTX, it uses
> > address_no_seg_operand predicate which uses address_operand predicate,
> > which calls ix86_legitimate_address_p, which ATM rejects ASHIFT RTXes.
> >
> > Uros.

Re: [GCC-11] [PATCH 0/5] Finish and general-regs-only

2021-08-25 Thread Uros Bizjak via Gcc-patches

On Tue, Aug 24, 2021 at 4:57 PM H.J. Lu  wrote:
>
> On Sun, Aug 15, 2021 at 11:11 PM Richard Biener
>  wrote:
> >
> > On Fri, Aug 13, 2021 at 3:51 PM H.J. Lu  wrote:
> > >
> > >  and target("general-regs-only") function attribute
> > > were added to GCC 11.  But their implementations are incomplete.  I'd
> > > like to backport the following patches to GCC 11 branch to finish them.
> >
> > Fine with me if x86 maintainers do not disagree (also see one comment I have
> > on the -mwait adding patch).
>
> Hi Uros, Honza,
>
> Do you have any comments?  The updated -mwait patch with LTO_minor_version
> bump is at:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577471.html

I don't have any comments, but IIRC, approved changes can be
backported from mainline to release branches without additional
approval.

Uros.

> Thanks.
>
> H.J.
> > > H.J. Lu (5):
> > >   x86: Add -mmwait for -mgeneral-regs-only
> > >   x86: Use crc32 target option for CRC32 intrinsics
> > >   x86: Remove OPTION_MASK_ISA_SSE4_2 from CRC32 _builtin functions
> > >   x86: Enable the GPR only instructions for -mgeneral-regs-only
> > >   : Add pragma GCC target("general-regs-only")
> > >
> > >  gcc/common/config/i386/i386-common.c   |  45 ++-
> > >  gcc/config.gcc |   6 +-
> > >  gcc/config/i386/i386-builtin.def   |   8 +-
> > >  gcc/config/i386/i386-builtins.c|   4 +-
> > >  gcc/config/i386/i386-c.c   |   2 +
> > >  gcc/config/i386/i386-options.c |  12 +
> > >  gcc/config/i386/i386.c |   6 +-
> > >  gcc/config/i386/i386.h |   2 +
> > >  gcc/config/i386/i386.md|   4 +-
> > >  gcc/config/i386/i386.opt   |   4 +
> > >  gcc/config/i386/ia32intrin.h   |  42 ++-
> > >  gcc/config/i386/mwaitintrin.h  |  52 +++
> > >  gcc/config/i386/pmmintrin.h|  13 +-
> > >  gcc/config/i386/serializeintrin.h  |   7 +-
> > >  gcc/config/i386/sse.md |   4 +-
> > >  gcc/config/i386/x86gprintrin.h |  13 +
> > >  gcc/doc/extend.texi|   5 +
> > >  gcc/doc/invoke.texi|   8 +-
> > >  gcc/testsuite/gcc.target/i386/crc32-6.c|  13 +
> > >  gcc/testsuite/gcc.target/i386/monitor-2.c  |  27 ++
> > >  gcc/testsuite/gcc.target/i386/pr101492-1.c |  10 +
> > >  gcc/testsuite/gcc.target/i386/pr101492-2.c |  10 +
> > >  gcc/testsuite/gcc.target/i386/pr101492-3.c |  10 +
> > >  gcc/testsuite/gcc.target/i386/pr101492-4.c |  12 +
> > >  gcc/testsuite/gcc.target/i386/pr99744-3.c  |  13 +
> > >  gcc/testsuite/gcc.target/i386/pr99744-4.c  | 357 +
> > >  gcc/testsuite/gcc.target/i386/pr99744-5.c  |  25 ++
> > >  gcc/testsuite/gcc.target/i386/pr99744-6.c  |  23 ++
> > >  gcc/testsuite/gcc.target/i386/pr99744-7.c  |  12 +
> > >  gcc/testsuite/gcc.target/i386/pr99744-8.c  |  13 +
> > >  30 files changed, 717 insertions(+), 45 deletions(-)
> > >  create mode 100644 gcc/config/i386/mwaitintrin.h
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/crc32-6.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/monitor-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-3.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-4.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-3.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-4.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-5.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-6.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-7.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-8.c
> > >
> > > --
> > > 2.31.1
> > >
>
>
>
> --
> H.J.

Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-08-25 Thread HAO CHEN GUI via Gcc-patches


Hi,

    I refined the patch according to Bill's advice. I pasted the 
ChangeLog and diff file here. If it doesn't work, please let me know. 
Thanks.


2021-08-25 Haochen Gui 

gcc/
    * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):
    Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
    VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions.

gcc/testsuite/
    * gcc.target/powerpc/vec-minmax-1.c: New test.
    * gcc.target/powerpc/vec-minmax-2.c: Likewise.

diff --git a/gcc/config/rs6000/rs6000-call.c 
b/gcc/config/rs6000/rs6000-call.c

index b4e13af4dc6..90527734ceb 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -12159,6 +12159,11 @@ rs6000_gimple_fold_builtin 
(gimple_stmt_iterator *gsi)

   return true;
 /* flavors of vec_min.  */
 case VSX_BUILTIN_XVMINDP:
+    case ALTIVEC_BUILTIN_VMINFP:
+  if (!flag_finite_math_only || flag_signed_zeros)
+    return false;
+  /* Fall through to MIN_EXPR.  */
+  gcc_fallthrough ();
 case P8V_BUILTIN_VMINSD:
 case P8V_BUILTIN_VMINUD:
 case ALTIVEC_BUILTIN_VMINSB:
@@ -12167,7 +12172,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
*gsi)

 case ALTIVEC_BUILTIN_VMINUB:
 case ALTIVEC_BUILTIN_VMINUH:
 case ALTIVEC_BUILTIN_VMINUW:
-    case ALTIVEC_BUILTIN_VMINFP:
   arg0 = gimple_call_arg (stmt, 0);
   arg1 = gimple_call_arg (stmt, 1);
   lhs = gimple_call_lhs (stmt);
@@ -12177,6 +12181,11 @@ rs6000_gimple_fold_builtin 
(gimple_stmt_iterator *gsi)

   return true;
 /* flavors of vec_max.  */
 case VSX_BUILTIN_XVMAXDP:
+    case ALTIVEC_BUILTIN_VMAXFP:
+  if (!flag_finite_math_only || flag_signed_zeros)
+    return false;
+  /* Fall through to MAX_EXPR.  */
+  gcc_fallthrough ();
 case P8V_BUILTIN_VMAXSD:
 case P8V_BUILTIN_VMAXUD:
 case ALTIVEC_BUILTIN_VMAXSB:
@@ -12185,7 +12194,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
*gsi)

 case ALTIVEC_BUILTIN_VMAXUB:
 case ALTIVEC_BUILTIN_VMAXUH:
 case ALTIVEC_BUILTIN_VMAXUW:
-    case ALTIVEC_BUILTIN_VMAXFP:
   arg0 = gimple_call_arg (stmt, 0);
   arg1 = gimple_call_arg (stmt, 1);
   lhs = gimple_call_lhs (stmt);
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c 
b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c

new file mode 100644
index 000..da92f059aea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
@@ -0,0 +1,53 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */
+
+/* This test verifies that float or double vec_min/max are bound to
+   xv[min|max][d|s]p instructions when fast-math is not set.  */
+
+
+#include 
+
+#ifdef _BIG_ENDIAN
+   const int PREF_D = 0;
+#else
+   const int PREF_D = 1;
+#endif
+
+double vmaxd (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_max (va, vb), PREF_D);
+}
+
+double vmind (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_min (va, vb), PREF_D);
+}
+
+#ifdef _BIG_ENDIAN
+   const int PREF_F = 0;
+#else
+   const int PREF_F = 3;
+#endif
+
+float vmaxf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_max (va, vb), PREF_F);
+}
+
+float vminf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_min (va, vb), PREF_F);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c 
b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c

new file mode 100644
index 000..d318b933181
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c
@@ -0,0 +1,51 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */
+/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */
+
+/* This test verifies that float or double vec_min/max can be converted
+   to scalar comparison when fast-math is set.  */
+
+
+#include 
+
+#ifdef _BIG_ENDIAN
+   const int PREF_D = 0;
+#else
+   const int PREF_D = 1;
+#endif
+
+double vmaxd (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_max (va, vb), PREF_D);
+}
+
+double vmind (double a, double b)
+{
+  vector double va = vec_promote (a,

Re: [PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]

2021-08-25 Thread Hongtao Liu via Gcc-patches

On Wed, Aug 25, 2021 at 2:14 PM Kong, Lingling via Gcc-patches
 wrote:
>
> Hi,
>
> For avx512f_scattersi, mask operand only affect set src, we
> need to refine the pattern to let gcc know mask register also affect the dest.
> So we put mask operand into UNSPEC_VSIBADDR.
>
> Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}.
> Ok for master?
>
> gcc/ChangeLog:
>
> *config/i386/sse.md (scattersi): Add mask operand to
> UNSPEC_VSIBADDR.
> (scattersi): Likewise.
> (*avx512f_scattersi): Merge mask operand
> to set_dest.
> (*avx512f_scatterdi): Likewise
>
> gcc/testsuite/ChangeLog:
>
> *gcc.target/i386/avx512f-pr101472.c: New test.
> *gcc.target/i386/avx512vl-pr101472.c: Ditto.
Please follow GCC Coding Convention ChanLog which is described in
https://gcc.gnu.org/codingconventions.html#ChangeLogs.

-= gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2],
-   operands[4]), UNSPEC_VSIBADDR);
+= gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2],
+   operands[4], operands[1]),
UNSPEC_VSIBADDR);
Lines shall be at most 80 columns.
 })

 (define_insn "*avx512f_scattersi"
@@ -24214,10 +24214,11 @@
  [(unspec:P
 [(match_operand:P 0 "vsib_address_operand" "Tv")
  (match_operand: 2 "register_operand" "v")
- (match_operand:SI 4 "const1248_operand" "n")]
+ (match_operand:SI 4 "const1248_operand" "n")
+ (match_operand: 6 "register_operand" "1")]
 UNSPEC_VSIBADDR)])
(unspec:VI48F
- [(match_operand: 6 "register_operand" "1")
+ [(match_dup 6)
   (match_operand:VI48F 3 "register_operand" "v")]
  UNSPEC_SCATTER))
(clobber (match_scratch: 1 "="))]
@@ -24243,8 +24244,8 @@
   "TARGET_AVX512F"
 {
   operands[5]
-= gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2],
-   operands[4]), UNSPEC_VSIBADDR);
+= gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2],
+   operands[4], operands[1]),
UNSPEC_VSIBADDR);
Ditto.
 })

-- 
BR,
Hongtao

[PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]

2021-08-25 Thread Kong, Lingling via Gcc-patches

Hi,

For avx512f_scattersi, mask operand only affect set src, we
need to refine the pattern to let gcc know mask register also affect the dest.
So we put mask operand into UNSPEC_VSIBADDR.

Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}.
Ok for master?

gcc/ChangeLog:

*config/i386/sse.md (scattersi): Add mask operand to
UNSPEC_VSIBADDR.
(scattersi): Likewise.
(*avx512f_scattersi): Merge mask operand
to set_dest.
(*avx512f_scatterdi): Likewise

gcc/testsuite/ChangeLog:

*gcc.target/i386/avx512f-pr101472.c: New test.
*gcc.target/i386/avx512vl-pr101472.c: Ditto.


0001-i386-Fix-wrong-optimization-for-consecutive-masked-s.patch
Description: 0001-i386-Fix-wrong-optimization-for-consecutive-masked-s.patch

[PATCH] i386: Fix _mm512_fpclass_ps_mask in O0 [PR 101471]

2021-08-25 Thread Kong, Lingling via Gcc-patches

Hi,

For _mm512_fpclass_ps_mask in O0, mask should be (__mmask16)-1 instead of
(__mmask8)-1).

Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for master?

gcc/ChangeLog:

* gcc/config/i386/avx512dqintrin.h : fix _mm512_fpclass_ps_mask define in O0

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512f-pr101471.c: add new test


0001-i386-Fix-_mm512_fpclass_ps_mask-in-O0-PR-101471.patch
Description: 0001-i386-Fix-_mm512_fpclass_ps_mask-in-O0-PR-101471.patch

84 matches

Mail list logo