Re: [PATCH] configure: respect --with-build-time-tools [PR43301]

2022-08-02 Thread Alexandre Oliva via Gcc-patches
On Aug  2, 2022, Eric Gallager  wrote:

> On Tue, Aug 2, 2022 at 1:24 AM Alexandre Oliva  wrote:

>> -elif test -x as$build_exeext; then
>> +elif test -x as$build_exeext \
>> +   && { test "x$build_exeext" != "x" \
>> +|| test "x`grep '^# Invoke as, ld or nm from the build tree' \
>> + as`" = "x"; }; then
>> 
>> WDYT?

> Hi, thanks for the feedback; I'm a bit confused, though: where exactly
> would this proposed change go?

gcc/configure.ac, just before:

# Build using assembler in the current directory.
gcc_cv_as=./as$build_exeext
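For reference, the heuristic can be exercised outside configure. A minimal standalone sketch (the demo directory and wrapper file below are fabricated; only the marker string comes from the actual patch):

```shell
# Create a fake build-tree "as" that is really gcc's wrapper script; such
# wrappers carry a marker comment, which the proposed grep test uses to
# avoid mistaking them for a real assembler binary.
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf '#!/bin/sh\n# Invoke as, ld or nm from the build tree\nexec as "$@"\n' > as
chmod +x as

# The same test the patch adds (with build_exeext empty): an executable
# "as" counts only if the marker comment is absent.
if test -x as \
   && test "x`grep '^# Invoke as, ld or nm from the build tree' as`" = "x"; then
  kind="real assembler"
else
  kind="wrapper script"
fi
echo "$kind"
```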

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH v5] LoongArch: add movable attribute

2022-08-02 Thread Xi Ruoyao via Gcc-patches
On Wed, 2022-08-03 at 10:59 +0800, WANG Xuerui wrote:

> I don't think mindlessly caring for vendor forks is always correct. In
> fact I find the name "movable" too generic, and something like 
> "force_got_access" could be better.

The problem is "what should this do *if* we later add some code model
without GOT".  If it's named "movable", we generate a full 4-instruction
absolute (or PC-relative) address loading sequence when GOT is disabled.
If it's named "force_got_access", we report an error and reject the code
when GOT is disabled.

> I don't currently have time to test this, unfortunately, due to day job. 
> Might be able to give it a whirl one or two weeks later though...

Unfortunately, I can't access my dev system via SSH either: while I was
away, a sudden power surge happened and I had forgotten to configure
automatic power-on.

I'm in a bit of a rush because I want to make it into 12.2, which would
leave 12.1 as the only release that cannot build Linux >= 6.0.  But maybe
it just can't be backported anyway.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: 回复:[PATCH v5] LoongArch: add movable attribute

2022-08-02 Thread Xi Ruoyao via Gcc-patches
On Wed, 2022-08-03 at 10:55 +0800, chengl...@loongson.cn wrote:
> I think there is no problem with this patch.  But I have a question: the
> visibility attribute works, so is it necessary to add the movable
> attribute?

1. My use of -fPIC and visibility is not the way ELF visibility was
designed to be used.  It's hard to tell whether it's a legitimate use or
a misuse.
2. Adding -fPIC can have unwanted side effects; in particular, if we add
some optimizations only suitable for -fno-PIC, we'll miss them when
using -fPIC.  Note that -fPIC does not only mean "produce position
independent code", but "produce position independent code *suitable for
ELF dynamic libraries*".  So to other people it would look ridiculous to
use -fPIC for the kernel.
3. Huacai said (on the kernel ML) that he didn't like using
__attribute__((visibility)) like this, and I share his feeling.

> I'd like to wait for the kernel team to test the performance data of
> the two implementations before deciding whether to support this
> attribute.
> 
> What do you think?

Perhaps.  I can't access my dev system now anyway (I had configured the
SSH access, but then a sudden power surge happened and I hadn't
configured automatic power-on :( )

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v5] LoongArch: add movable attribute

2022-08-02 Thread WANG Xuerui

On 2022/8/3 09:36, Xi Ruoyao wrote:

Is it OK for trunk, or do I need to change something?

By the way, I'm looking into the possibility of including this in 12.2.
That would leave only 12.1 without this attribute, and we could just say
"building the kernel needs GCC 12.2 or later".

On Mon, 2022-08-01 at 18:07 +0800, Xi Ruoyao wrote:

Changes v4 -> v5: Fix changelog.  No code change.

Changes v3 -> v4:

  * Use "movable" as the attribute name as Huacai says it's already
used
    in downstream GCC fork.


I don't think mindlessly caring for vendor forks is always correct. In 
fact I find the name "movable" too generic, and something like 
"force_got_access" could be better.


I don't currently have time to test this, unfortunately, due to day job. 
Might be able to give it a whirl one or two weeks later though...



  * Remove an inaccurate line from the doc. (Initially I tried to
    implement a "model(...)" like IA64 or M32R. Then I changed my mind
    but forgot to remove the line copied from M32R doc.)

-- >8 --

A linker script and/or a section attribute may locate a local object in
some way unexpected by the code model, leading to a link failure.  This
happens when the Linux kernel loads a module with "local" per-CPU
variables.

Add an attribute to explicitly mark a variable whose address is not
limited by the code model, so we can work around such problems.

gcc/ChangeLog:

 * config/loongarch/loongarch.cc (loongarch_attribute_table):
 New attribute table.
 (TARGET_ATTRIBUTE_TABLE): Define the target hook.
 (loongarch_handle_addr_global_attribute): New static function.
 (loongarch_classify_symbol): Return SYMBOL_GOT_DISP for
 SYMBOL_REF_DECL with addr_global attribute.
 (loongarch_use_anchors_for_symbol_p): New static function.
 (TARGET_USE_ANCHORS_FOR_SYMBOL_P): Define the target hook.
 * doc/extend.texi (Variable Attributes): Document new
 LoongArch specific attribute.

gcc/testsuite/ChangeLog:

 * gcc.target/loongarch/addr-global.c: New test.
---
  gcc/config/loongarch/loongarch.cc | 63 +++
  gcc/doc/extend.texi   | 16 +
  .../gcc.target/loongarch/attr-movable.c   | 29 +
  3 files changed, 108 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/attr-movable.c

diff --git a/gcc/config/loongarch/loongarch.cc
b/gcc/config/loongarch/loongarch.cc
index 79687340dfd..6b6026700a6 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1643,6 +1643,15 @@ loongarch_classify_symbol (const_rtx x)
    && !loongarch_symbol_binds_local_p (x))
  return SYMBOL_GOT_DISP;
  
+  if (SYMBOL_REF_P (x))
+    {
+  tree decl = SYMBOL_REF_DECL (x);
+  /* A movable symbol may be moved away from the +/- 2GiB range around
+    the PC, so we have to use GOT.  */
+  if (decl && lookup_attribute ("movable", DECL_ATTRIBUTES (decl)))
+   return SYMBOL_GOT_DISP;
+    }
+
    return SYMBOL_PCREL;
  }
  
@@ -6068,6 +6077,54 @@ loongarch_starting_frame_offset (void)

    return crtl->outgoing_args_size;
  }
  
+static tree
+loongarch_handle_movable_attribute (tree *node, tree name, tree, int,
+   bool *no_add_attrs)
+{
+  tree decl = *node;
+  if (TREE_CODE (decl) == VAR_DECL)
+    {
+  if (DECL_CONTEXT (decl)
+ && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL
+ && !TREE_STATIC (decl))
+   {
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "%qE attribute cannot be specified for local "
+   "variables", name);
+ *no_add_attrs = true;
+   }
+    }
+  else
+    {
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+    }
+  return NULL_TREE;
+}
+
+static const struct attribute_spec loongarch_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "movable", 0, 0, true, false, false, false,
+    loongarch_handle_movable_attribute, NULL },
+  /* The last attribute spec is set to be NULL.  */
+  {}
+};
+
+bool
+loongarch_use_anchors_for_symbol_p (const_rtx symbol)
+{
+  tree decl = SYMBOL_REF_DECL (symbol);
+
+  /* A movable attribute indicates the linker may move the symbol away,
+ so the use of anchor may cause relocation overflow.  */
+  if (decl && lookup_attribute ("movable", DECL_ATTRIBUTES (decl)))
+    return false;
+
+  return default_use_anchors_for_symbol_p (symbol);
+}
+
  /* Initialize the GCC target structure.  */
  #undef TARGET_ASM_ALIGNED_HI_OP
  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -6256,6 +6313,12 @@ loongarch_starting_frame_offset (void)
  #undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
  #define TARGET_HAVE_SPECULATION_SAFE_VALUE
speculation_safe_value_not_needed
  
+#undef  TARGET_ATTRIBUTE_TABLE

+#define TARGET_ATTRIBUTE_TABLE 

[PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"

2022-08-02 Thread Takayuki 'January June' Suwa via Gcc-patches
Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
the data flow consistent, but it also increases register allocation
pressure and thus often creates many unwanted register-to-register moves
that cannot be optimized away.  It seems analogous to the partial
register stall that is a well-known problem on processors that do
register renaming.

In my opinion, when the register to be clobbered is a composite of hard
ones, we should clobber the individual elements separately; otherwise we
should clear the entire register to zero prior to use, as the
"init-regs" pass does (like the partial register stall workarounds on
x86 CPUs).  Such redundant zero constant assignments will be removed
later in the "cprop_hardreg" pass.

This patch may give better output code quality for the reasons above,
especially on architectures that don't have DFmode hard registers
(On architectures with such hard registers, this patch changes virtually
nothing).

For example (Espressif ESP8266, Xtensa without FP hard regs):

/* example */
double _Complex conjugate(double _Complex z) {
  __imag__(z) *= -1;
  return z;
}

;; before
conjugate:
movi.n  a6, -1
slli    a6, a6, 31
mov.n   a8, a2
mov.n   a9, a3
mov.n   a7, a4
xor a6, a5, a6
mov.n   a2, a8
mov.n   a3, a9
mov.n   a4, a7
mov.n   a5, a6
ret.n

;; after
conjugate:
movi.n  a6, -1
slli    a6, a6, 31
xor a6, a5, a6
mov.n   a5, a6
ret.n

gcc/ChangeLog:

* lower-subreg.cc (resolve_simple_move):
Clear the entire register to zero immediately after the clobber.
* expr.cc (emit_move_complex_parts):
Clobber the real and imaginary parts separately instead of the
whole complex register if possible.
---
 gcc/expr.cc | 26 --
 gcc/lower-subreg.cc |  7 ++-
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 80bb1b8a4c5..9732e8fd4e5 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -3775,15 +3775,29 @@ emit_move_complex_push (machine_mode mode, rtx x, rtx y)
 rtx_insn *
 emit_move_complex_parts (rtx x, rtx y)
 {
-  /* Show the output dies here.  This is necessary for SUBREGs
- of pseudos since we cannot track their lifetimes correctly;
- hard regs shouldn't appear here except as return values.  */
-  if (!reload_completed && !reload_in_progress
-  && REG_P (x) && !reg_overlap_mentioned_p (x, y))
-emit_clobber (x);
+  rtx_insn *re_insn, *im_insn;
 
   write_complex_part (x, read_complex_part (y, false), false, true);
+  re_insn = get_last_insn ();
   write_complex_part (x, read_complex_part (y, true), true, false);
+  im_insn = get_last_insn ();
+
+  /* Show the output dies here.  This is necessary for SUBREGs
+ of pseudos since we cannot track their lifetimes correctly.  */
+  if (can_create_pseudo_p ()
+  && REG_P (x) && ! reg_overlap_mentioned_p (x, y))
+{
+  /* Hard regs shouldn't appear here except as return values.  */
+  if (HARD_REGISTER_P (x) && REG_NREGS (x) % 2 == 0)
+   {
+ emit_insn_before (gen_clobber (SET_DEST (PATTERN (re_insn))),
+   re_insn);
+ emit_insn_before (gen_clobber (SET_DEST (PATTERN (im_insn))),
+   im_insn);
+   }
+  else
+   emit_insn_before (gen_clobber (x), re_insn);
+}
 
   return get_last_insn ();
 }
diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
index 03e9326c663..4ff0a7d1556 100644
--- a/gcc/lower-subreg.cc
+++ b/gcc/lower-subreg.cc
@@ -1086,7 +1086,12 @@ resolve_simple_move (rtx set, rtx_insn *insn)
   unsigned int i;
 
   if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
-   emit_clobber (dest);
+   {
+ emit_clobber (dest);
+ /* We clear the entire dest with zero after the clobber,
+similar to the "init-regs" pass.  */
+ emit_move_insn (dest, CONST0_RTX (GET_MODE (dest)));
+   }
 
   for (i = 0; i < words; ++i)
{
-- 
2.20.1


Re: [PATCH v5] LoongArch: add movable attribute

2022-08-02 Thread Xi Ruoyao via Gcc-patches
Is it OK for trunk, or do I need to change something?

By the way, I'm looking into the possibility of including this in 12.2.
That would leave only 12.1 without this attribute, and we could just say
"building the kernel needs GCC 12.2 or later".

On Mon, 2022-08-01 at 18:07 +0800, Xi Ruoyao wrote:
> Changes v4 -> v5: Fix changelog.  No code change.
> 
> Changes v3 -> v4:
> 
>  * Use "movable" as the attribute name as Huacai says it's already
> used
>    in downstream GCC fork.
>  * Remove an inaccurate line from the doc. (Initially I tried to
>    implement a "model(...)" like IA64 or M32R. Then I changed my mind
>    but forgot to remove the line copied from M32R doc.)
> 
> -- >8 --
> 
> A linker script and/or a section attribute may locate a local object in
> some way unexpected by the code model, leading to a link failure.  This
> happens when the Linux kernel loads a module with "local" per-CPU
> variables.
> 
> Add an attribute to explicitly mark a variable whose address is not
> limited by the code model, so we can work around such problems.
> 
> gcc/ChangeLog:
> 
> * config/loongarch/loongarch.cc (loongarch_attribute_table):
> New attribute table.
> (TARGET_ATTRIBUTE_TABLE): Define the target hook.
> (loongarch_handle_addr_global_attribute): New static function.
> (loongarch_classify_symbol): Return SYMBOL_GOT_DISP for
> SYMBOL_REF_DECL with addr_global attribute.
> (loongarch_use_anchors_for_symbol_p): New static function.
> (TARGET_USE_ANCHORS_FOR_SYMBOL_P): Define the target hook.
> * doc/extend.texi (Variable Attributes): Document new
> LoongArch specific attribute.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/loongarch/addr-global.c: New test.
> ---
>  gcc/config/loongarch/loongarch.cc | 63 +++
>  gcc/doc/extend.texi   | 16 +
>  .../gcc.target/loongarch/attr-movable.c   | 29 +
>  3 files changed, 108 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/attr-movable.c
> 
> diff --git a/gcc/config/loongarch/loongarch.cc
> b/gcc/config/loongarch/loongarch.cc
> index 79687340dfd..6b6026700a6 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -1643,6 +1643,15 @@ loongarch_classify_symbol (const_rtx x)
>    && !loongarch_symbol_binds_local_p (x))
>  return SYMBOL_GOT_DISP;
>  
> +  if (SYMBOL_REF_P (x))
> +    {
> +  tree decl = SYMBOL_REF_DECL (x);
> +  /* A movable symbol may be moved away from the +/- 2GiB range around
> +    the PC, so we have to use GOT.  */
> +  if (decl && lookup_attribute ("movable", DECL_ATTRIBUTES (decl)))
> +   return SYMBOL_GOT_DISP;
> +    }
> +
>    return SYMBOL_PCREL;
>  }
>  
> @@ -6068,6 +6077,54 @@ loongarch_starting_frame_offset (void)
>    return crtl->outgoing_args_size;
>  }
>  
> +static tree
> +loongarch_handle_movable_attribute (tree *node, tree name, tree, int,
> +   bool *no_add_attrs)
> +{
> +  tree decl = *node;
> +  if (TREE_CODE (decl) == VAR_DECL)
> +    {
> +  if (DECL_CONTEXT (decl)
> + && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL
> + && !TREE_STATIC (decl))
> +   {
> + error_at (DECL_SOURCE_LOCATION (decl),
> +   "%qE attribute cannot be specified for local "
> +   "variables", name);
> + *no_add_attrs = true;
> +   }
> +    }
> +  else
> +    {
> +  warning (OPT_Wattributes, "%qE attribute ignored", name);
> +  *no_add_attrs = true;
> +    }
> +  return NULL_TREE;
> +}
> +
> +static const struct attribute_spec loongarch_attribute_table[] =
> +{
> +  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
> +   affects_type_identity, handler, exclude } */
> +  { "movable", 0, 0, true, false, false, false,
> +    loongarch_handle_movable_attribute, NULL },
> +  /* The last attribute spec is set to be NULL.  */
> +  {}
> +};
> +
> +bool
> +loongarch_use_anchors_for_symbol_p (const_rtx symbol)
> +{
> +  tree decl = SYMBOL_REF_DECL (symbol);
> +
> +  /* A movable attribute indicates the linker may move the symbol away,
> + so the use of anchor may cause relocation overflow.  */
> +  if (decl && lookup_attribute ("movable", DECL_ATTRIBUTES (decl)))
> +    return false;
> +
> +  return default_use_anchors_for_symbol_p (symbol);
> +}
> +
>  /* Initialize the GCC target structure.  */
>  #undef TARGET_ASM_ALIGNED_HI_OP
>  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> @@ -6256,6 +6313,12 @@ loongarch_starting_frame_offset (void)
>  #undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
>  #define TARGET_HAVE_SPECULATION_SAFE_VALUE
> speculation_safe_value_not_needed
>  
> +#undef  TARGET_ATTRIBUTE_TABLE
> +#define TARGET_ATTRIBUTE_TABLE loongarch_attribute_table
> +
> +#undef  TARGET_USE_ANCHORS_FOR_SYMBOL_P
> +#define TARGET_USE_ANCHORS_FOR_SYMBOL_P
> 

Re: [PATCH] Mips: Fix the ASAN shadow offset hook for the n32 ABI

2022-08-02 Thread Xi Ruoyao via Gcc-patches
On Sat, 2022-07-30 at 14:22 +0100, Maciej W. Rozycki wrote:
> On Mon, 6 Jun 2022, Dimitrije Milosevic wrote:
> 
> >     * config/mips/mips.cc (mips_asan_shadow_offset): Reformat
> >     to handle the N32 ABI.
> 
>  That's not what the change does.
> 
> >     * config/mips/mips.h (SUBTARGET_SHADOW_OFFSET): Remove
> >     the macro, as it is not needed anymore.
> 
>  Why is the macro not needed anymore?

Because it's only used by mips_asan_shadow_offset, and now we code its
content directly into mips_asan_shadow_offset.

SUBTARGET_SHADOW_OFFSET is only needed if a different subtarget (say,
mips64el-freebsd) needs a different shadow offset.  But for MIPS we
don't have any subtarget other than mips*-linux-gnu* supporting ASAN, so
we can omit SUBTARGET_SHADOW_OFFSET and fold the content directly into
the asan_shadow_offset target hook.  The RISC-V port does the same.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] cselib: add function to check if SET is redundant [PR106187]

2022-08-02 Thread Jeff Law via Gcc-patches

On 8/2/2022 10:06 AM, Richard Earnshaw wrote:
On 01/08/2022 11:38, Richard Earnshaw via Gcc-patches wrote:
On 30/07/2022 20:57, Jeff Law via Gcc-patches wrote:
On 7/29/2022 7:52 AM, Richard Earnshaw via Gcc-patches wrote:
A SET operation that writes memory may have the same value as an 
earlier store but if the alias sets of the new and earlier store do 
not conflict then the set is not truly redundant.  This can happen, 
for example, if objects of different types share a stack slot.


To fix this we define a new function in cselib that first checks for
equality and if that is successful then finds the earlier store in the
value history and checks the alias sets.

The routine is used in two places elsewhere in the compiler. Firstly
in cfgcleanup and secondly in postreload.

gcc/ChangeLog:
* alias.h (mems_same_for_tbaa_p): Declare.
* alias.cc (mems_same_for_tbaa_p): New function.
* dse.cc (record_store): Use it instead of open-coding
alias check.
* cselib.h (cselib_redundant_set_p): Declare.
* cselib.cc: Include alias.h
(cselib_redundant_set_p): New function.
* cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead
of rtx_equal_for_cselib_p.
* postreload.c (reload_cse_simplify): Use cselib_redundant_set_p.
(reload_cse_noop_set_p): Delete.
Seems quite reasonable.   The only question I would have would be 
whether or not you considered including the aliasing info into the 
hashing used by cselib.  You'd probably still need the bulk of this 
patch as well since we could presumably still get a hash conflict 
with two stores of the same value to the same location, but with 
different alias sets (it's just much less likely), so perhaps it 
doesn't really buy us anything.


I thought about this, but if the alias set were included in the hash, 
then surely you'd get every alias set in a different value.  Then 
you'd miss the cases where the alias sets do conflict even though 
they are not the same.  Anyway, the values /are/ the same so in some 
circumstances you might want to know that.




Ideally this would include a testcase.  You might be able to turn 
that non-executable reduced case into something useful by scanning 
the post-reload dumps.


I considered this as well, but the testcase I have is far too 
fragile, I think.  The existing test only fails on Arm, only fails on 
11.2 (not 11.3 or gcc-12 onwards), relies on two objects with the 
same value being in distinct alias sets but still assigned to the 
same stack slot and for some copy dance to end up trying to write 
back the original value to the same slot but with a non-conflicting 
set.  And finally, the scheduler has to then try to move a load past 
the non-aliasing store.





To get anywhere close to this I think we'd need something akin to the 
gimple reader but for RTL so that we could set up all the conditions 
for the failure without the risk of an earlier transform blowing the 
test away.


I wasn't aware of the rtl reader already in the compiler.  But it 
doesn't really get me any closer as it is lacking in so many regards:


- It can't handle (const_double:SF ...) - it tries to handle the 
argument as an int.  This is a consequence, I think, of the reader 
being based on that for reading machine descriptions where FP 
const_double is simply never encountered.


- It doesn't seem to handle anything much more than very basic types, 
and in particular appears to have no way of ensuring that alias sets 
match up with the type system.




I even considered whether we could start with a gimple dump and 
bypassing all the tree/gimple transformations, but even that would be 
still at the mercy of the stack-slot allocation algorithm.


I spent a while trying to get some gimple out of the dumpers in a form 
that was usable, but that's pretty much a non-starter.  To make it 
work we'd need to add support for gimple clobbers on objects - without 
that there's no way to get the stack-slot sharing code to work.  
Furthermore, even feeding fully-optimized gimple directly into expand 
is such a long way from the postreload pass, that I can't believe the 
testcase would remain stable for long.


And the other major issue is that the original testcase is heavily 
templated C++ and neither of the parsers gimple or rtl is supported in 
cc1plus: converting the boilerplate to be C-friendly is probably going 
to be hard.


I can't afford to spend much more time on this, especially given the 
low-quality test we're going to get out of the end of the process.

Understood.  Let's just go with the patch as-is.  That's normal for 
cases where we can't produce a reasonable test.


jeff


Re: [PATCH] stack-protector: Check stack canary for noreturn function

2022-08-02 Thread Jeff Law via Gcc-patches

On 8/2/2022 11:43 AM, H.J. Lu wrote:
On Sat, Jul 30, 2022 at 1:30 PM Jeff Law via Gcc-patches wrote:
On 7/14/2022 3:55 PM, H.J. Lu via Gcc-patches wrote:

Check stack canary for noreturn function to catch stack corruption
before calling noreturn function.  For C++, check stack canary when
throwing exception or resuming stack unwind to avoid corrupted stack.

gcc/

   PR middle-end/58245
   * calls.cc (expand_call): Check stack canary for noreturn
   function.

gcc/testsuite/

   PR middle-end/58245
   * c-c++-common/pr58245-1.c: New test.
   * g++.dg/pr58245-1.C: Likewise.
   * g++.dg/fstack-protector-strong.C: Adjusted.

But is this really something we want?   I'd actually lean towards
eliminating the useless load -- I don't necessarily think we should be
treating non-returning paths specially here.

The whole point of the stack protector is to prevent the *return* path
from going to an attacker controlled location.  I'm not sure checking
the protector at this point actually does anything particularly useful.

throw is marked noreturn.   Since the unwind library may read
the stack contents to unwind the stack, if the stack is corrupted, the
exception handling may go wrong.   Should we handle this case?

That's the question I think we need to answer.  The EH paths are a known 
security issue on Windows and while ours are notably different I'm not 
sure if there's a real attack surface in those paths.  My sense is that 
if we need to tackle this that doing so on the throw side might be 
better as it's closer conceptually to when/how we check the canary for 
a normal return.


jeff


  --
H.J.




[COMMITTED] tree-optimization/106510 - Do not register edges for statements not understood.

2022-08-02 Thread Andrew MacLeod via Gcc-patches
When only integer ranges were supported, we knew all gimple COND 
statements were supported.  This is no longer true with float support, 
so gracefully do nothing when they are encountered.


You can choose to remove the "unsupported relational" range-ops if you 
so wish; we shouldn't need those.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
commit 70daecc03235aa7187b03681cebed6e04b32678e
Author: Andrew MacLeod 
Date:   Tue Aug 2 17:31:37 2022 -0400

Do not register edges for statements not understood.

Previously, all gimple_cond types were understood; with float values,
this is no longer true.  We should gracefully do nothing if the
gcond type is not supported.

PR tree-optimization/106510
gcc/
* gimple-range-fold.cc (fur_source::register_outgoing_edges):
  Check for unsupported statements early.

gcc/testsuite
* gcc.dg/pr106510.c: New.

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 923094abd62..689d8279627 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -1496,6 +1496,10 @@ fur_source::register_outgoing_edges (gcond *s, irange &lhs_range, edge e0, edge
   tree name;
   basic_block bb = gimple_bb (s);
 
+  range_op_handler handler (s);
+  if (!handler)
+return;
+
   if (e0)
 {
   // If this edge is never taken, ignore it.
@@ -1524,8 +1528,6 @@ fur_source::register_outgoing_edges (gcond *s, irange &lhs_range, edge e0, edge
   tree ssa2 = gimple_range_ssa_p (gimple_range_operand2 (s));
   if (ssa1 && ssa2)
 {
-  range_op_handler handler (s);
-  gcc_checking_assert (handler);
   if (e0)
 	{
 	  relation_kind relation = handler.op1_op2_relation (e0_range);
diff --git a/gcc/testsuite/gcc.dg/pr106510.c b/gcc/testsuite/gcc.dg/pr106510.c
new file mode 100644
index 000..24e91123f63
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr106510.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void foo ();
+void ine_ok() {
+  float y, x;
+  if (x < y || x > y || y)
+    foo ();
+}
+


[PATCH] c++: Extend -Wpessimizing-move to other contexts

2022-08-02 Thread Marek Polacek via Gcc-patches
In my recent patch, which enhanced -Wpessimizing-move so that it warns
about class prvalues too, I said that I'd like to extend it so that it
warns in more contexts where a std::move can prevent copy elision, such
as:

  T t = std::move(T());
  T t(std::move(T()));
  T t{std::move(T())};
  T t = {std::move(T())};
  void foo (T);
  foo (std::move(T()));

This patch does that by adding two maybe_warn_pessimizing_move calls.
These must happen before we've converted the initializers otherwise the
std::move will be buried in a TARGET_EXPR.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/106276

gcc/cp/ChangeLog:

* call.cc (build_over_call): Call maybe_warn_pessimizing_move.
* cp-tree.h (maybe_warn_pessimizing_move): Declare.
* decl.cc (build_aggr_init_full_exprs): Call
maybe_warn_pessimizing_move.
* typeck.cc (maybe_warn_pessimizing_move): Handle TREE_LIST and
CONSTRUCTOR.  Add a bool parameter and use it.  Adjust a diagnostic
message.
(check_return_expr): Adjust the call to maybe_warn_pessimizing_move.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/Wpessimizing-move7.C: Add dg-warning.
* g++.dg/cpp0x/Wpessimizing-move8.C: New test.
---
 gcc/cp/call.cc|  5 +-
 gcc/cp/cp-tree.h  |  1 +
 gcc/cp/decl.cc|  3 +-
 gcc/cp/typeck.cc  | 58 -
 .../g++.dg/cpp0x/Wpessimizing-move7.C | 16 ++---
 .../g++.dg/cpp0x/Wpessimizing-move8.C | 65 +++
 6 files changed, 120 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wpessimizing-move8.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 01a7be10077..370137ebd6d 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -9627,10 +9627,13 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
   if (!conversion_warning)
arg_complain &= ~tf_warning;
 
+  if (arg_complain & tf_warning)
+   maybe_warn_pessimizing_move (arg, type, /*return_p*/false);
+
   val = convert_like_with_context (conv, arg, fn, i - is_method,
   arg_complain);
   val = convert_for_arg_passing (type, val, arg_complain);
-   
+
   if (val == error_mark_node)
 return error_mark_node;
   else
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 3278b4114bd..5a8af22b509 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8101,6 +8101,7 @@ extern tree finish_right_unary_fold_expr (tree, int);
 extern tree finish_binary_fold_expr  (tree, tree, int);
 extern tree treat_lvalue_as_rvalue_p(tree, bool);
 extern bool decl_in_std_namespace_p (tree);
+extern void maybe_warn_pessimizing_move (tree, tree, bool);
 
 /* in typeck2.cc */
 extern void require_complete_eh_spec_types (tree, tree);
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 70ad681467e..dc6853a7de1 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -7220,9 +7220,10 @@ check_array_initializer (tree decl, tree type, tree init)
 
 static tree
 build_aggr_init_full_exprs (tree decl, tree init, int flags)
- 
 {
   gcc_assert (stmts_are_full_exprs_p ());
+  if (init)
+maybe_warn_pessimizing_move (init, TREE_TYPE (decl), /*return_p*/false);
   return build_aggr_init (decl, init, flags, tf_warning_or_error);
 }
 
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 9500c4e2fe8..2650beb780e 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -10368,17 +10368,17 @@ treat_lvalue_as_rvalue_p (tree expr, bool return_p)
 }
 }
 
-/* Warn about wrong usage of std::move in a return statement.  RETVAL
-   is the expression we are returning; FUNCTYPE is the type the function
-   is declared to return.  */
+/* Warn about dubious usage of std::move (in a return statement, if RETURN_P
+   is true).  EXPR is the std::move expression; TYPE is the type of the object
+   being initialized.  */
 
-static void
-maybe_warn_pessimizing_move (tree retval, tree functype)
+void
+maybe_warn_pessimizing_move (tree expr, tree type, bool return_p)
 {
   if (!(warn_pessimizing_move || warn_redundant_move))
 return;
 
-  location_t loc = cp_expr_loc_or_input_loc (retval);
+  const location_t loc = cp_expr_loc_or_input_loc (expr);
 
   /* C++98 doesn't know move.  */
   if (cxx_dialect < cxx11)
@@ -10390,14 +10390,32 @@ maybe_warn_pessimizing_move (tree retval, tree functype)
 return;
 
   /* This is only interesting for class types.  */
-  if (!CLASS_TYPE_P (functype))
+  if (!CLASS_TYPE_P (type))
 return;
 
+  /* A a = std::move (A());  */
+  if (TREE_CODE (expr) == TREE_LIST)
+{
+  if (list_length (expr) == 1)
+   expr = TREE_VALUE (expr);
+  else
+   return;
+}
+  /* A a = {std::move (A())};
+ A a{std::move (A())};  */
+  else if (TREE_CODE (expr) == CONSTRUCTOR)
+{
+  if 

Re: [COMMITTED,gcc12] c: Fix location for _Pragma tokens [PR97498]

2022-08-02 Thread Lewis Hyatt via Gcc-patches
On Mon, Aug 01, 2022 at 07:15:48PM -0400, Lewis Hyatt wrote:
> Hello-
> 
> This backport from r13-1596 to GCC 12 has been committed after
> pre-approval. This was a straightforward cherry-pick from master with no
> adjustments needed. I would like to note that subsequent to r13-1596, Thomas
> made a few commits to the libgomp testsuite to test for new diagnostic notes
> output after this patch; I have not backported these since I was not sure if
> that would be appropriate. I did verify that the libgomp testsuite changes
> work OK as-is on this branch, i.e. do not introduce any new failures,
> including with offloading enabled.

I have done the same for GCC 11 and 10 branches, patches attached. Thanks!

-Lewis
[GCC10] c: Fix location for _Pragma tokens [PR97498]

The handling of #pragma GCC diagnostic uses input_location, which is not always
as precise as needed; in particular the relative location of some tokens and a
_Pragma directive will crucially determine whether a given diagnostic is enabled
or suppressed in the desired way. PR97498 shows how the C frontend ends up with
input_location pointing to the beginning of the line containing a _Pragma()
directive, resulting in the wrong behavior if the diagnostic to be modified
pertains to some tokens found earlier on the same line. This patch fixes that by
addressing two issues:

a) libcpp was not assigning a valid location to the CPP_PRAGMA token
generated by the _Pragma directive.
b) C frontend was not setting input_location to something reasonable.

With this change, the C frontend is able to change input_location to point to
the _Pragma token as needed.

This is just a two-line fix (one for each of a) and b)), the testsuite changes
were needed only because the location on the tested warnings has been somewhat
improved, so the tests need to look for the new locations.

gcc/c/ChangeLog:

PR preprocessor/97498
* c-parser.c (c_parser_pragma): Set input_location to the
location of the pragma, rather than the start of the line.

libcpp/ChangeLog:

PR preprocessor/97498
* directives.c (destringize_and_run): Override the location of
the CPP_PRAGMA token from a _Pragma directive to the location of
the expansion point, as is done for the tokens lexed from it.

gcc/testsuite/ChangeLog:

PR preprocessor/97498
* c-c++-common/pr97498.c: New test.
* gcc.dg/pragma-message.c: Adapt for improved warning locations.

(cherry picked from commit 0587cef3d7962a8b0f44779589ba2920dd3d71e5)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 2d347ad927c..b2c7a74b464 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -12328,6 +12328,7 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
   unsigned int id;
   const char *construct = NULL;
 
+  input_location = c_parser_peek_token (parser)->location;
   id = c_parser_peek_token (parser)->pragma_kind;
   gcc_assert (id != PRAGMA_NONE);
 
diff --git a/gcc/testsuite/c-c++-common/pr97498.c b/gcc/testsuite/c-c++-common/pr97498.c
new file mode 100644
index 000..f5fa420415b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr97498.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wunused-function" } */
+#pragma GCC diagnostic ignored "-Wunused-function"
+static void f() {} _Pragma("GCC diagnostic error \"-Wunused-function\"") /* { dg-bogus "-Wunused-function" } */
diff --git a/gcc/testsuite/gcc.dg/pragma-message.c b/gcc/testsuite/gcc.dg/pragma-message.c
index 2f44b617710..1b7cf09de0a 100644
--- a/gcc/testsuite/gcc.dg/pragma-message.c
+++ b/gcc/testsuite/gcc.dg/pragma-message.c
@@ -42,9 +42,11 @@
 #pragma message ("Okay " THREE)  /* { dg-message "Okay 3" } */
 
 /* Create a TODO() that prints a message on compilation.  */
-#define DO_PRAGMA(x) _Pragma (#x)
-#define TODO(x) DO_PRAGMA(message ("TODO - " #x))
-TODO(Okay 4) /* { dg-message "TODO - Okay 4" } */
+#define DO_PRAGMA(x) _Pragma (#x) /* { dg-line pragma_loc1 } */
+#define TODO(x) DO_PRAGMA(message ("TODO - " #x)) /* { dg-line pragma_loc2 } */
+TODO(Okay 4) /* { dg-message "in expansion of macro 'TODO'" } */
+/* { dg-message "TODO - Okay 4" "test4.1" { target *-*-* } pragma_loc1 } */
+/* { dg-message "in expansion of macro 'DO_PRAGMA'" "test4.2" { target *-*-* } pragma_loc2 } */
 
 #if 0
 #pragma message ("Not printed")
diff --git a/libcpp/directives.c b/libcpp/directives.c
index cab9aad64d2..0d8545a7df9 100644
--- a/libcpp/directives.c
+++ b/libcpp/directives.c
@@ -1887,6 +1887,7 @@ destringize_and_run (cpp_reader *pfile, const cpp_string *in,
   maxcount = 50;
   toks = XNEWVEC (cpp_token, maxcount);
   toks[0] = pfile->directive_result;
+  toks[0].src_loc = expansion_loc;
 
   do
{
[GCC11] c: Fix location for _Pragma tokens [PR97498]

The handling of #pragma GCC diagnostic uses input_location, which is not always
as precise as needed; in particular the relative location 

Re: [PATCH v4 2/2] preprocessor/106426: Treat u8 character literals as unsigned in char8_t modes.

2022-08-02 Thread Joseph Myers
On Tue, 2 Aug 2022, Tom Honermann via Gcc-patches wrote:

> This patch corrects handling of UTF-8 character literals in preprocessing
> directives so that they are treated as unsigned types in char8_t enabled
> C++ modes (C++17 with -fchar8_t or C++20 without -fno-char8_t). Previously,
> UTF-8 character literals were always treated as having the same type as
> ordinary character literals (signed or unsigned dependent on target or use
> of the -fsigned-char or -funsigned char options).

OK in the absence of C++ maintainer objections within 72 hours.  (This is 
the case where, when I added support for such literals for C (commit 
7c5890cc0a0ecea0e88cc39e9fba6385fb579e61), I raised the question of 
whether they should be unsigned in the preprocessor for C++ as well.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v4 1/2] C: Implement C2X N2653 char8_t and UTF-8 string literal changes

2022-08-02 Thread Joseph Myers
On Tue, 2 Aug 2022, Tom Honermann via Gcc-patches wrote:

> This patch implements the core language and compiler dependent library
> changes adopted for C2X via WG14 N2653.  The changes include:
> - Change of type for UTF-8 string literals from array of const char to
>   array of const char8_t (unsigned char).
> - A new atomic_char8_t typedef.
> - A new ATOMIC_CHAR8_T_LOCK_FREE macro defined in terms of the existing
>   __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined macro.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.

2022-08-02 Thread Segher Boessenkool
Hi!

On Fri, Jul 29, 2022 at 07:57:51AM +0100, Roger Sayle wrote:
> > On Wed, Jul 27, 2022 at 02:42:25PM +0100, Roger Sayle wrote:
> > > This patch implements some additional zero-extension and
> > > sign-extension related optimizations in simplify-rtx.cc.  The original
> > > motivation comes from PR rtl-optimization/71775, where in comment #2
> > Andrew Pinski sees:
> > >
> > > Failed to match this instruction:
> > > (set (reg:DI 88 [ _1 ])
> > > (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
> > >
> > > On many platforms the result of DImode CTZ is constrained to be a
> > > small unsigned integer (between 0 and 64), hence the truncation to
> > > 32-bits (using a SUBREG) and the following sign extension back to
> > > 64-bits are effectively a no-op, so the above should ideally (often)
> > > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
> > 
> > And you can also do that if ctz is undefined for a zero argument!
> 
> Forgive my perhaps poor use of terminology.  The case of ctz 0 on
> x86_64 isn't "undefined behaviour" (UB) in the C/C++ sense that
> would allow us to do anything, but rather implementation defined (which
> Intel calls "undefined" in their documentation).

This is about CTZ in RTL, in GCC.  CTZ_DEFINED_VALUE_AT_ZERO is 0 here,
which means a zero argument gives an undefined result.

> Hence, we don't
> know which DI value is placed in the result register.  In this case,
> truncating to SI mode, then sign extending the result is not a no-op,
> as the top bits will/must now all be the same [though admittedly to an
> unknown undefined signbit].

And any value is valid.

> Hence the above optimization would 
> be invalid, as it doesn't guarantee the result would be sign-extended.

It does not have to be!  Truncating an undefined DImode value to SImode
gives an undefined SImode value.  On most architectures (including x86
afaik) you do not need to do any machine insn for that (the top 32 bits
in the register are just ignored for a SImode value).

> > Also, this is not correct for C[LT]Z_DEFINED_VALUE_AT_ZERO non-zero if the
> > value it returns in its second arg does not survive sign extending
> unmodified (if it
> > is 0x for an extend from SI to DI for example).
> 
> Fortunately, C[LT]Z_DEFINED_VALUE_AT_ZERO being defined to return a negative
> result, such as -1, is already handled (accounted for) in nonzero_bits.  The
> relevant code in rtlanal.cc's nonzero_bits1 is:

A negative result, yes.  But that was not my example.


Segher


[COMMITTED] Adjust testsuite/gcc.dg/tree-ssa/vrp-float-1.c

2022-08-02 Thread Aldy Hernandez via Gcc-patches
I missed the -details dump flag, plus I wasn't checking the actual folding.
As a bonus I had flipped the dump file name and the count, so the test
was coming out as unresolved, which I missed because I was only checking
for failures and passes.

Whooops.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/vrp-float-1.c: Adjust test so it passes.
---
 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-1.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-1.c
index 88faf72ac42..5be54267cf7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-1.c
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O2 -fdisable-tree-ethread -fdisable-tree-fre1 -fdump-tree-evrp" }
+// { dg-options "-O2 -fdisable-tree-ethread -fdisable-tree-fre1 -fdump-tree-evrp-details" }
 
 void bar ();
 void george ();
@@ -16,4 +16,4 @@ foo (float x, float y)
 }
 }
 
-// { dg-final { scan-tree-dump-times "Folding predicate x_*to 1" "evrp" 1 } }
+// { dg-final { scan-tree-dump-times "Folded into: if \\(1 != 0\\)" 1 "evrp" } }
-- 
2.37.1



[PATCH v4 2/2] preprocessor/106426: Treat u8 character literals as unsigned in char8_t modes.

2022-08-02 Thread Tom Honermann via Gcc-patches
This patch corrects handling of UTF-8 character literals in preprocessing
directives so that they are treated as unsigned types in char8_t enabled
C++ modes (C++17 with -fchar8_t or C++20 without -fno-char8_t). Previously,
UTF-8 character literals were always treated as having the same type as
ordinary character literals (signed or unsigned dependent on target or use
of the -fsigned-char or -funsigned char options).

PR preprocessor/106426

gcc/c-family/ChangeLog:
* c-opts.cc (c_common_post_options): Assign cpp_opts->unsigned_utf8char
subject to -fchar8_t, -fsigned-char, and/or -funsigned-char.

gcc/testsuite/ChangeLog:
* g++.dg/ext/char8_t-char-literal-1.C: Check signedness of u8 literals.
* g++.dg/ext/char8_t-char-literal-2.C: Check signedness of u8 literals.

libcpp/ChangeLog:
* charset.cc (narrow_str_to_charconst): Set signedness of CPP_UTF8CHAR
literals based on unsigned_utf8char.
* include/cpplib.h (cpp_options): Add unsigned_utf8char.
* init.cc (cpp_create_reader): Initialize unsigned_utf8char.
---
 gcc/c-family/c-opts.cc| 1 +
 gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C | 6 +-
 gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C | 4 
 libcpp/charset.cc | 4 ++--
 libcpp/include/cpplib.h   | 4 ++--
 libcpp/init.cc| 1 +
 6 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 108adc5caf8..02ce1e86cdb 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1062,6 +1062,7 @@ c_common_post_options (const char **pfilename)
   /* char8_t support is implicitly enabled in C++20 and C2X.  */
   if (flag_char8_t == -1)
 flag_char8_t = (cxx_dialect >= cxx20) || flag_isoc2x;
+  cpp_opts->unsigned_utf8char = flag_char8_t ? 1 : cpp_opts->unsigned_char;
 
   if (flag_extern_tls_init)
 {
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C
index 8ed85ccfdcd..2994dd38516 100644
--- a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C
+++ b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C
@@ -1,6 +1,6 @@
 // Test that UTF-8 character literals have type char if -fchar8_t is not enabled.
 // { dg-do compile }
-// { dg-options "-std=c++17 -fno-char8_t" }
+// { dg-options "-std=c++17 -fsigned-char -fno-char8_t" }
 
 template
   struct is_same
@@ -10,3 +10,7 @@ template
   { static const bool value = true; };
 
 static_assert(is_same::value, "Error");
+
+#if u8'\0' - 1 > 0
+#error "UTF-8 character literals not signed in preprocessor"
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C
index 7861736689c..db4fe70046d 100644
--- a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C
+++ b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C
@@ -10,3 +10,7 @@ template
   { static const bool value = true; };
 
 static_assert(is_same::value, "Error");
+
+#if u8'\0' - 1 < 0
+#error "UTF-8 character literals not unsigned in preprocessor"
+#endif
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index ca8b7cf7aa5..12e31632228 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1960,8 +1960,8 @@ narrow_str_to_charconst (cpp_reader *pfile, cpp_string str,
   /* Multichar constants are of type int and therefore signed.  */
   if (i > 1)
 unsigned_p = 0;
-  else if (type == CPP_UTF8CHAR && !CPP_OPTION (pfile, cplusplus))
-unsigned_p = 1;
+  else if (type == CPP_UTF8CHAR)
+unsigned_p = CPP_OPTION (pfile, unsigned_utf8char);
   else
 unsigned_p = CPP_OPTION (pfile, unsigned_char);
 
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index 3eba6f74b57..f9c042db034 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -581,8 +581,8 @@ struct cpp_options
  ints and target wide characters, respectively.  */
   size_t precision, char_precision, int_precision, wchar_precision;
 
-  /* True means chars (wide chars) are unsigned.  */
-  bool unsigned_char, unsigned_wchar;
+  /* True means chars (wide chars, UTF-8 chars) are unsigned.  */
+  bool unsigned_char, unsigned_wchar, unsigned_utf8char;
 
   /* True if the most significant byte in a word has the lowest
  address in memory.  */
diff --git a/libcpp/init.cc b/libcpp/init.cc
index f4ab83d2145..0242da5f55c 100644
--- a/libcpp/init.cc
+++ b/libcpp/init.cc
@@ -231,6 +231,7 @@ cpp_create_reader (enum c_lang lang, cpp_hash_table *table,
   CPP_OPTION (pfile, int_precision) = CHAR_BIT * sizeof (int);
   CPP_OPTION (pfile, unsigned_char) = 0;
   CPP_OPTION (pfile, unsigned_wchar) = 1;
+  CPP_OPTION (pfile, unsigned_utf8char) = 1;
   CPP_OPTION (pfile, bytes_big_endian) = 1;  /* does not matter */
 
   /* Default to no charset conversion.  */
-- 
2.32.0



[PATCH v4 1/2] C: Implement C2X N2653 char8_t and UTF-8 string literal changes

2022-08-02 Thread Tom Honermann via Gcc-patches
This patch implements the core language and compiler dependent library
changes adopted for C2X via WG14 N2653.  The changes include:
- Change of type for UTF-8 string literals from array of const char to
  array of const char8_t (unsigned char).
- A new atomic_char8_t typedef.
- A new ATOMIC_CHAR8_T_LOCK_FREE macro defined in terms of the existing
  __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined macro.

gcc/ChangeLog:

* ginclude/stdatomic.h (atomic_char8_t,
ATOMIC_CHAR8_T_LOCK_FREE): New typedef and macro.

gcc/c/ChangeLog:

* c-parser.c (c_parser_string_literal): Use char8_t as the type
of CPP_UTF8STRING when char8_t support is enabled.
* c-typeck.c (digest_init): Allow initialization of an array
of character type by a string literal with type array of
char8_t.

gcc/c-family/ChangeLog:

* c-lex.c (lex_string, lex_charconst): Use char8_t as the type
of CPP_UTF8CHAR and CPP_UTF8STRING when char8_t support is
enabled.
* c-opts.c (c_common_post_options): Set flag_char8_t if
targeting C2x.

gcc/testsuite/ChangeLog:
* gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c: New test.
* gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c: New test.
* gcc.dg/c11-utf8str-type.c: New test.
* gcc.dg/c17-utf8str-type.c: New test.
* gcc.dg/c2x-utf8str-type.c: New test.
* gcc.dg/c2x-utf8str.c: New test.
* gcc.dg/gnu2x-utf8str-type.c: New test.
* gcc.dg/gnu2x-utf8str.c: New test.
---
 gcc/c-family/c-lex.cc | 13 --
 gcc/c-family/c-opts.cc|  4 +-
 gcc/c/c-parser.cc | 16 ++-
 gcc/c/c-typeck.cc |  2 +-
 gcc/ginclude/stdatomic.h  |  6 +++
 .../atomic/c2x-stdatomic-lockfree-char8_t.c   | 42 +++
 .../atomic/gnu2x-stdatomic-lockfree-char8_t.c |  5 +++
 gcc/testsuite/gcc.dg/c11-utf8str-type.c   |  6 +++
 gcc/testsuite/gcc.dg/c17-utf8str-type.c   |  6 +++
 gcc/testsuite/gcc.dg/c2x-utf8str-type.c   |  6 +++
 gcc/testsuite/gcc.dg/c2x-utf8str.c| 34 +++
 gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c |  5 +++
 gcc/testsuite/gcc.dg/gnu2x-utf8str.c  | 34 +++
 13 files changed, 170 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c
 create mode 100644 gcc/testsuite/gcc.dg/c11-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/c17-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/c2x-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/c2x-utf8str.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu2x-utf8str.c

diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index 8bfa4f4024f..0b6f94e18a8 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -1352,7 +1352,14 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
default:
case CPP_STRING:
case CPP_UTF8STRING:
- value = build_string (1, "");
+ if (type == CPP_UTF8STRING && flag_char8_t)
+   {
+ value = build_string (TYPE_PRECISION (char8_type_node)
+   / TYPE_PRECISION (char_type_node),
+   "");  /* char8_t is 8 bits */
+   }
+ else
+   value = build_string (1, "");
  break;
case CPP_STRING16:
  value = build_string (TYPE_PRECISION (char16_type_node)
@@ -1425,9 +1432,7 @@ lex_charconst (const cpp_token *token)
 type = char16_type_node;
   else if (token->type == CPP_UTF8CHAR)
 {
-  if (!c_dialect_cxx ())
-   type = unsigned_char_type_node;
-  else if (flag_char8_t)
+  if (flag_char8_t)
 type = char8_type_node;
   else
 type = char_type_node;
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index b9f01a65ed7..108adc5caf8 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1059,9 +1059,9 @@ c_common_post_options (const char **pfilename)
   if (flag_sized_deallocation == -1)
 flag_sized_deallocation = (cxx_dialect >= cxx14);
 
-  /* char8_t support is new in C++20.  */
+  /* char8_t support is implicitly enabled in C++20 and C2X.  */
   if (flag_char8_t == -1)
-flag_char8_t = (cxx_dialect >= cxx20);
+flag_char8_t = (cxx_dialect >= cxx20) || flag_isoc2x;
 
   if (flag_extern_tls_init)
 {
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 92049d1a101..fa9395986de 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -7447,7 +7447,14 @@ c_parser_string_literal (c_parser *parser, bool translate, bool wide_ok)
default:
case CPP_STRING:
case CPP_UTF8STRING:
- value = build_string (1, "");
+

[PATCH v4 0/2] Implement C2X N2653 (char8_t) and correct UTF-8 character literal type in preprocessor directives for C++

2022-08-02 Thread Tom Honermann via Gcc-patches
This patch series provides an implementation and tests for the WG14 N2653
paper as adopted for C2X.

Additionally, a fix is included for the C++ preprocessor to treat UTF-8
character literals in preprocessor directives as an unsigned type in char8_t
enabled modes (in C++17 and earlier with -fchar8_t or in C++20 or later
without -fno-char8_t).

Tom Honermann (2):
  C: Implement C2X N2653 char8_t and UTF-8 string literal changes
  preprocessor/106426: Treat u8 character literals as unsigned in
char8_t modes.

 gcc/c-family/c-lex.cc | 13 --
 gcc/c-family/c-opts.cc|  5 ++-
 gcc/c/c-parser.cc | 16 ++-
 gcc/c/c-typeck.cc |  2 +-
 gcc/ginclude/stdatomic.h  |  6 +++
 .../g++.dg/ext/char8_t-char-literal-1.C   |  6 ++-
 .../g++.dg/ext/char8_t-char-literal-2.C   |  4 ++
 .../atomic/c2x-stdatomic-lockfree-char8_t.c   | 42 +++
 .../atomic/gnu2x-stdatomic-lockfree-char8_t.c |  5 +++
 gcc/testsuite/gcc.dg/c11-utf8str-type.c   |  6 +++
 gcc/testsuite/gcc.dg/c17-utf8str-type.c   |  6 +++
 gcc/testsuite/gcc.dg/c2x-utf8str-type.c   |  6 +++
 gcc/testsuite/gcc.dg/c2x-utf8str.c| 34 +++
 gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c |  5 +++
 gcc/testsuite/gcc.dg/gnu2x-utf8str.c  | 34 +++
 libcpp/charset.cc |  4 +-
 libcpp/include/cpplib.h   |  4 +-
 libcpp/init.cc|  1 +
 18 files changed, 185 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c
 create mode 100644 gcc/testsuite/gcc.dg/c11-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/c17-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/c2x-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/c2x-utf8str.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu2x-utf8str.c

-- 
2.32.0



[COMMITTED] PR tree-optimization/106474 - Check equivalencies when calculating range on entry.

2022-08-02 Thread Andrew MacLeod via Gcc-patches
This PR demonstrates a case where evaluating equivalencies can assist 
VRP in refining values.


Previous revisions did not do this due to the expense of calculating or 
checking for equivalencies every time we process a use.


An earlier change to range_from_dom provided modes for the cache to query, 
and in particular, it has a mode which can query the DOM tree but not 
update the cache.   This tends to be quite quick, and will not cause any 
memory increase.


We can also check if an ssa-name has had any outgoing range calculated 
thus far, and can significantly trim the number of equivalences that 
need to be examined.


This patch adjusts the on-entry propagator to check if there are any 
existing equivalence ranges before it updates the cache and returns the 
on-entry value to ranger.


This causes a very marginal slowdown of 0.34% building GCC across the two 
VRP passes using ranger.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

commit 53f7b80929978dd2773f58cfd0c2cfa49f54538e
Author: Andrew MacLeod 
Date:   Fri Jul 29 12:05:38 2022 -0400

Check equivalencies when calculating range on entry.

When propagating on-entry values in the cache, checking if any equivalence
has a known value can improve results.  No new calculations are made.
Only queries via dominators which do not populate the cache are checked.

PR tree-optimization/106474
gcc/
* gimple-range-cache.cc (ranger_cache::fill_block_cache): Query
range of equivalences that may contribute to the range.

gcc/testsuite/
* g++.dg/pr106474.C: New.

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index d9e160c9a2a..4782d47265e 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -1211,13 +1211,56 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
   // Check if a dominators can supply the range.
   if (range_from_dom (block_result, name, bb, RFD_FILL))
 {
-  m_on_entry.set_bb_range (name, bb, block_result);
   if (DEBUG_RANGE_CACHE)
 	{
 	  fprintf (dump_file, "Filled from dominator! :  ");
 	  block_result.dump (dump_file);
 	  fprintf (dump_file, "\n");
 	}
+  // See if any equivalences can refine it.
+  if (m_oracle)
+	{
+	  unsigned i;
+	  bitmap_iterator bi;
+	  // Query equivalences in read-only mode.
+	  const_bitmap equiv = m_oracle->equiv_set (name, bb);
+	  EXECUTE_IF_SET_IN_BITMAP (equiv, 0, i, bi)
+	{
+	  if (i == SSA_NAME_VERSION (name))
+		continue;
+	  tree equiv_name = ssa_name (i);
+	  basic_block equiv_bb = gimple_bb (SSA_NAME_DEF_STMT (equiv_name));
+
+	  // Check if the equiv has any ranges calculated.
+	  if (!m_gori.has_edge_range_p (equiv_name))
+		continue;
+
+	  // Check if the equiv definition dominates this block
+	  if (equiv_bb == bb ||
+		  (equiv_bb && !dominated_by_p (CDI_DOMINATORS, bb, equiv_bb)))
+		continue;
+
+	  Value_Range equiv_range (TREE_TYPE (equiv_name));
+	  if (range_from_dom (equiv_range, equiv_name, bb, RFD_READ_ONLY))
+		{
+		  if (block_result.intersect (equiv_range))
+		{
+		  if (DEBUG_RANGE_CACHE)
+			{
+			  fprintf (dump_file, "Equivalence update! :  ");
+			  print_generic_expr (dump_file, equiv_name, TDF_SLIM);
+			  fprintf (dump_file, "had range  :  ");
+			  equiv_range.dump (dump_file);
+			  fprintf (dump_file, " refining range to :");
+			  block_result.dump (dump_file);
+			  fprintf (dump_file, "\n");
+			}
+		}
+		}
+	}
+	}
+
+  m_on_entry.set_bb_range (name, bb, block_result);
   gcc_checking_assert (m_workback.length () == 0);
   return;
 }
diff --git a/gcc/testsuite/g++.dg/pr106474.C b/gcc/testsuite/g++.dg/pr106474.C
new file mode 100644
index 000..6cd37a20643
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr106474.C
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp " } */
+
+void foo();
+static void __attribute__ ((noinline)) DCEMarker0_() {foo ();}
+
+void f(bool s, bool c) {
+if ((!c == !s) && !c) {
+if (s) {
+DCEMarker0_();
+}
+}
+}
+
+// With equivalences, vrp should be able to remove all IFs.
+/* { dg-final { scan-tree-dump-not "goto" "evrp" } } */


Re: [PATCH] Fix ICE when multiple speculative targets are expanded in different ltrans unit [PR93318]

2022-08-02 Thread Martin Jambor
Hi,

On Thu, Jun 23 2022, Xionghu Luo via Gcc-patches wrote:
> There is a corner case for speculative multiple targets, that if speculative
> edges are streamed to different ltrans units, and then edges are expanded
> in one ltrans unit but the speculative property is not cleared by
> resolve_speculation in other ltrans unit finally cause ICE.

I am sorry but this does not sound right.  The ltrans compilations are
different processes and changes in the call graph of one cannot alter
the others.

In the ICEing ltrans, at what point does the speculative edge become
lacking an indirect counterpart?  If filing a bug report is impractical,
can you please try figuring that out with a gdb conditional breakpoint
on cgraph_edge::remove if it gets removed and/or watch point on its
speculative flag if its speculation gets resolved?

IMHO that point seems to the be the correct time to avoid this
situation, rather than gracefully ignoring it later.

> This patch fixes this by adding checking to guard speculative edge
> without no indirect edge binded to it.  No case is provided since this
> is from large program with 128 LTO ltrans units not easy to reduce
> originated from protobuf.
>
> Bootstrap and regression tested pass on x86_64-linux-gnu, OK for master?
>
> Signed-off-by: Xionghu Luo 
>
> gcc/ChangeLog:
>
>   PR ipa/93318
>   * cgraph.cc (cgraph_edge::resolve_speculation): Remove
>   speculative info if no indirect edge found.
>   * cgraph.h: Return NULL if the edges are resolved in other
>   ltrans units.
>   * tree-inline.cc (copy_bb): Only clone current edge if the
>   speculative call is processed in other ltrans units.
>
> Signed-off-by: Xionghu Luo 
> ---
>  gcc/cgraph.cc  |  7 +++
>  gcc/cgraph.h   |  3 ++-
>  gcc/tree-inline.cc | 11 +++
>  3 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
> index 7eeda53ca84..120c0ac7650 100644
> --- a/gcc/cgraph.cc
> +++ b/gcc/cgraph.cc
> @@ -1217,6 +1217,13 @@ cgraph_edge::resolve_speculation (cgraph_edge *edge, tree callee_decl)
>  e2 = edge;
>ref = e2->speculative_call_target_ref ();
>edge = edge->speculative_call_indirect_edge ();
> +  if (!edge)
> +  {
> +e2->speculative = false;
> +ref->remove_reference ();
> +return e2;
> +  }
> +
>if (!callee_decl
>|| !ref->referred->semantically_equivalent_p
>  (symtab_node::get (callee_decl)))
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 4be67e3cea9..5404f023e31 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -1773,11 +1773,12 @@ public:
>  if (!callee)
>return this;
>  for (cgraph_edge *e2 = caller->indirect_calls;
> -  true; e2 = e2->next_callee)
> +  e2; e2 = e2->next_callee)
>if (e2->speculative
> && call_stmt == e2->call_stmt
> && lto_stmt_uid == e2->lto_stmt_uid)
>   return e2;
> +return NULL;
>}
>  
>/* When called on any edge in speculative call and when given any target
> diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
> index 043e1d5987a..777d9efdd70 100644
> --- a/gcc/tree-inline.cc
> +++ b/gcc/tree-inline.cc
> @@ -2262,6 +2262,17 @@ copy_bb (copy_body_data *id, basic_block bb,
>  
> cgraph_edge *indirect
>= old_edge->speculative_call_indirect_edge ();
> +   if (indirect == NULL
> +   && old_edge->num_speculative_call_targets_p ()
> +== 0)
> + {
> +   cgraph_edge::resolve_speculation (old_edge);
> +   edge = old_edge->clone (id->dst_node, call_stmt,

using an edge after you have called resolve_speculation on it is bad,
even if you know that the current implementation won't deallocate it
from under you.

But again, I tend to thing the problem must be solved elsewhere.

Thanks,

Martin


Re: [PATCH 2/3 v3] testsuite: Add tests for C2X N2653 char8_t and UTF-8 string literal changes

2022-08-02 Thread Tom Honermann via Gcc-patches

On 8/2/22 12:53 PM, Joseph Myers wrote:

On Mon, 1 Aug 2022, Tom Honermann via Gcc-patches wrote:


This change provides new tests for the core language and compiler
dependent library changes adopted for C2X via WG14 N2653.

Could you please send a complete patch series?  I'm not sure what the
matching patches 1 and 3 are.  Also, I don't generally find it helpful for
tests to be separated from the patch making the changes they test, since
tests are necessary to review of that code.


Absolutely. I'll merge the implementation and test commits, so the next 
series (v4) will have just two commits; one for the C2X N2653 
implementation and the other for the C++ u8 preprocessor string type 
fix. Coming right up.


Tom.



Re: [PATCH] stack-protector: Check stack canary for noreturn function

2022-08-02 Thread H.J. Lu via Gcc-patches
On Sat, Jul 30, 2022 at 1:30 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 7/14/2022 3:55 PM, H.J. Lu via Gcc-patches wrote:
> > Check stack canary for noreturn function to catch stack corruption
> > before calling noreturn function.  For C++, check stack canary when
> > throwing exception or resuming stack unwind to avoid corrupted stack.
> >
> > gcc/
> >
> >   PR middle-end/58245
> >   * calls.cc (expand_call): Check stack canary for noreturn
> >   function.
> >
> > gcc/testsuite/
> >
> >   PR middle-end/58245
> >   * c-c++-common/pr58245-1.c: New test.
> >   * g++.dg/pr58245-1.C: Likewise.
> >   * g++.dg/fstack-protector-strong.C: Adjusted.
> But is this really something we want?   I'd actually lean towards
> eliminating the useless load -- I don't necessarily think we should be
> treating non-returning paths specially here.
>
> The whole point of the stack protector is to prevent the *return* path
> from going to an attacker controlled location.  I'm not sure checking
> the protector at this point actually does anything particularly useful.

throw is marked noreturn.   Since the unwind library may read
the stack contents to unwind the stack, if the stack is corrupted, the
exception handling may go wrong.   Should we handle this case?

 --
H.J.


Re: [PATCH] analyzer: support for creat, dup, dup2 and dup3 [PR106298]

2022-08-02 Thread David Malcolm via Gcc-patches
On Tue, 2022-08-02 at 22:08 +0530, Mir Immad wrote:
> The above patch is bootstrapped, lightly tested (on x86_64 Linux) and
> approved for trunk by David.

For reference, Immad sent that version to me off-list to me for review,
and I approved it.

He's committed it to trunk now (as r13-1936-g6a11f2d974a912).

Dave



Re: [x86 PATCH] PR target/47949: Use xchg to move from/to AX_REG with -Oz.

2022-08-02 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 2, 2022 at 4:59 PM Roger Sayle  wrote:
>
>
> This patch adds a peephole2 to i386.md to implement the suggestion in
> PR target/47949, of using xchg instead of mov for moving values to/from
> the %rax/%eax register, controlled by -Oz, as the xchg instruction is
> one byte shorter than the move it is replacing.
>
> The new test case is taken from the PR:
> int foo(int x) { return x; }
>
> where previously we'd generate:
> foo:    mov     %edi,%eax   // 2 bytes
> ret
>
> but with this patch, using -Oz, we generate:
> foo:    xchg    %eax,%edi   // 1 byte
> ret
>
> On the CSiBE benchmark, this saves a total of 10238 bytes (reducing
> the -Oz total from 3661796 bytes to 3651558 bytes, a 0.28% saving).
>
> Interestingly, some modern architectures (such as Zen 3) implement
> xchg using zero latency register renaming (just like mov), so in theory
> this transformation could be enabled when optimizing for speed, if
> benchmarking shows the improved code density produces consistently
> better performance.  However, this is architecture dependent, and
> there may be interactions using xchg (instead of a single_set) in the
> late RTL passes (such as cprop_hardreg), so for now I've restricted
> this to -Oz.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-08-02  Roger Sayle  
>
> gcc/ChangeLog
> PR target/47949
> * config/i386/i386.md (peephole2): New peephole2 to convert
> SWI48 moves to/from %rax/%eax where the src is dead to xchg,
> when optimizing for minimal size with -Oz.
>
> gcc/testsuite/ChangeLog
> PR target/47949
> * gcc.target/i386/pr47949.c: New test case.

OK.

BTW: The testcase can be improved to cover Dimode as well as SImode case:

+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-Oz" } */
+
+int foo(int x)

long foo (long x, long y)
{
   return y;
}

And add additional options of -mregparm=2 for ia32. This will move
%edx to %eax for 32bit and %rsi to %rax for 64bit targets.

Thanks,
Uros.
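For reference, a runnable sketch of the combined testcase suggested above; the function names are illustrative, and with -Oz (plus -mregparm=2 for ia32) each return-value move through %eax/%rax is the candidate for the one-byte xchg:

```c
/* Sketch of the extended testcase: returning the second argument forces a
   register-to-%eax (or %rsi-to-%rax) move, which the new peephole can
   shrink to xchg under -Oz.  Names here are illustrative, not from the
   actual committed test.  */
int foo_si (int x, int y) { return y; }
long foo_di (long x, long y) { return y; }
```

The behavior is unchanged either way; only the encoding of the move differs.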


Re: [PATCH] btf: do not use the CHAR `encoding' bit for BTF

2022-08-02 Thread Jose E. Marchesi via Gcc-patches


> On 8/2/22 08:42, Jose E. Marchesi wrote:
>> 
>>> On 7/26/22 14:58, Indu Bhagat wrote:
 On 7/22/22 4:23 AM, Jose E. Marchesi via Gcc-patches wrote:
>
> Contrary to CTF and our previous expectations, as per [1], turns out
> that in BTF:
>
> 1) The `encoding' field in integer types shall not be treated as a
> bitmap, but as an enumerated, i.e. these bits are exclusive to each
> other.
>
> 2) The CHAR bit in `encoding' shall _not_ be set when emitting types
> for char nor `unsigned char'.
>

 Hmm...well.  At this time, I suggest we make a note of this in the btf.h 
 for posterity that BTF_INT_CHAR is to not be used (i.e., BTF_INT_CHAR 
 should not be set for char / unsigned char).
>>>
>>> Agreed it would be good to add this note.
>> 
>> Hmm, I am not sure such a comment actually belongs to include/btf.h,
>> which is not specific to the compiler and is supposed to reflect the BTF
>> format per-se.  The CHAR bit is documented in the kernel documentation
>> and it may be used at some point by bpflib, or who knows what.
>
> OK you make a good point.
>
> In that case the patch LGTM to commit. Thanks!

Pushed to master.
Thanks!

>> 
>> That's why I put the comment in btfout.cc instead, to make it clear that
>> BTF_INT_CHAR is indeed not to be set for char / unsigned char by the
>> compiler:
>> 
> +  /* In BTF the CHAR `encoding' seems to not be used, so clear it
> + here.  */
> +  dtd->dtd_u.dtu_enc.cte_format &= ~BTF_INT_CHAR;


Re: [x86 PATCH] Improved pre-reload split of double word comparison against -1.

2022-08-02 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 2, 2022 at 1:31 PM Roger Sayle  wrote:
>
>
> This patch adds an extra optimization to *cmp_doubleword to improve
> the code generated for comparisons against -1.  Hypothetically, if a
> comparison against -1 reached this splitter we'd currently generate code
> that looks like:
>
> notq    %rdx        ; 3 bytes
> notq    %rax        ; 3 bytes
> orq     %rdx, %rax  ; 3 bytes
> setne   %al
>
> With this patch we would instead generate the superior:
>
> andq    %rdx, %rax  ; 3 bytes
> cmpq    $-1, %rax   ; 4 bytes
> setne   %al
>
> which is both faster and smaller, and also what's currently generated
> thanks to the middle-end splitting double word comparisons against
> zero and minus one during RTL expansion.  Should that change, this would
> become a missed-optimization regression, but this patch also (potentially)
> helps suitable comparisons created by CSE and combine.
>
> This patch has been tested on x86_64-pc-linux-gnu, on its own and in
> combination with a middle-end patch tweaking RTL expansion, both with
> and without --target-board=unix{-m32}, with no new failures.
> Ok for mainline?
>
>
> 2022-08-02  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (*cmp_doubleword): Add a special case
> to split comparisons against -1 using AND and CMP -1 instructions.

OK.

Thanks,
Uros.
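The identity behind the split can be checked directly: a double word equals -1 exactly when the AND of its halves is all-ones, which by De Morgan's law matches the old NOT/NOT/OR sequence. A small sketch (64-bit halves stand in for the two words):

```c
#include <stdint.h>

/* Old sequence: invert both halves and OR -- zero iff both halves were
   all-ones, i.e. the full value was -1.  */
static int eq_minus_one_old (uint64_t lo, uint64_t hi)
{
  return (~lo | ~hi) == 0;
}

/* New sequence: AND the halves and compare against -1 -- equal iff both
   halves are all-ones.  One instruction shorter in the RTL split.  */
static int eq_minus_one_new (uint64_t lo, uint64_t hi)
{
  return (lo & hi) == UINT64_MAX;
}
```

Both predicates agree on every input, which is what licenses the transformation.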


Re: [x86 PATCH] Support logical shifts by (some) integer constants in TImode STV.

2022-08-02 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 2, 2022 at 7:02 PM Roger Sayle  wrote:
>
>
> Hi Uros,
>
> > From: Uros Bizjak 
> > Sent: 31 July 2022 18:23
> > To: Roger Sayle 
> > On Fri, Jul 29, 2022 at 12:18 AM Roger Sayle 
> > wrote:
> > >
> > > This patch improves TImode STV by adding support for logical shifts by
> > > integer constants that are multiples of 8.  For the test case:
> > >
> > > __int128 a, b;
> > > void foo() { a = b << 16; }
> > >
> > > on x86_64, gcc -O2 currently generates:
> > >
> > > movq    b(%rip), %rax
> > > movq    b+8(%rip), %rdx
> > > shldq   $16, %rax, %rdx
> > > salq    $16, %rax
> > > movq    %rax, a(%rip)
> > > movq    %rdx, a+8(%rip)
> > > ret
> > >
> > > with this patch we now generate:
> > >
> > > movdqa  b(%rip), %xmm0
> > > pslldq  $2, %xmm0
> > > movaps  %xmm0, a(%rip)
> > > ret
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32},
> > > with no new failures.  Ok for mainline?
> > >
> > >
> > > 2022-07-28  Roger Sayle  
> > >
> > > gcc/ChangeLog
> > > * config/i386/i386-features.cc (compute_convert_gain): Add gain
> > > for converting suitable TImode shift to a V1TImode shift.
> > > (timode_scalar_chain::convert_insn): Add support for converting
> > > suitable ASHIFT and LSHIFTRT.
> > > (timode_scalar_to_vector_candidate_p): Consider logical shifts
> > > by integer constants that are multiples of 8 to be candidates.
> > >
> > > gcc/testsuite/ChangeLog
> > > * gcc.target/i386/sse4_1-stv-7.c: New test case.
> >
> > + case ASHIFT:
> > + case LSHIFTRT:
> > +  /* For logical shifts by constant multiples of 8.  */
> > +  igain = optimize_insn_for_size_p () ? COSTS_N_BYTES (4)
> > +                                      : COSTS_N_INSNS (1);
> >
> > Isn't the conversion an universal win for -O2 as well as for -Os? The 
> > conversion
> > to/from XMM register is already accounted for, so for -Os substituting
> > shldq/salq with pslldq should always be a win. I'd expect the cost 
> > calculation to
> > be similar to the general_scalar_chain::compute_convert_gain cost 
> > calculation
> > with m = 2.
>
> I agree that the terminology is perhaps a little confusing.  The
> compute_convert_gain function calculates the total "gain" from an
> STV chain, summing the igain of each instruction, and performs
> the STV transformation if this total gain is greater than zero.
> Hence positive values are good and negative values are bad.
>
> In this case, of a logical shift by multiple of 8, converting the chain is 
> indeed always
> beneficial, reducing by 4 bytes in size when optimizing for size, and 
> avoiding 1 fast
> instruction when optimizing for speed.  Having a "positive gain of four 
> bytes" sounds bad,
> but in this sense the gain is used as a synonym of "benefit" not "magnitude".
>
> By comparison, shifting a 128-bit value by a single bit is always a net loss,
> requiring three additional fast instructions, or 15 extra bytes in size.  However, it's
> still worth
> considering/capturing these counter-productive (i.e. negative) values, as they
> might be compensated for by other wins in the chain.
>
> Dealing with COSTS_N_BYTES (when optimizing for size) and COSTS_N_INSNS
> (when optimizing for speed) allows much finer granularity.  For example,
> the constant number of bits used in a shift/rotate, or the value of an
> immediate constant in a compare have significant effects on the size/speed
> of scalar vs. vector code, and this isn't (yet) something easily handled by 
> the
> simple "m" approximation used in general_scalar_chain::compute_convert_gain.
>
> See (comment #5 of) PR target/105034 which mentions the need for more
> accurate parameterization of compute_convert_gain (in response to the
> undesirable suggestion of simply disabling STV when optimizing for size).
>
> I hope this explains the above idiom.  Hopefully, things will become clearer
> when support for shifts by other bit counts, and arithmetic shifts, are added
> to this part of the code (STV).  I'll be sure to add more comments.

Thanks for the explanation, I was confused by the differences between
32bit DImode and 64bit TImode costs (these should IMO be quite
similar, so if the TImode cost function is now more precise, DImode
cost function should follow it).

> [Ok for mainline?]

OK. The cost functions are always in need of more accuracy.

Thanks,
Uros.
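The reason a byte-granular vector shift like pslldq suffices here is that a left shift by a multiple of 8 bits is exactly a whole-byte move. A quick sanity check of that equivalence (this sketch assumes a little-endian layout, as on x86):

```c
#include <string.h>

/* The shift being converted by STV.  */
static __int128 shl_bits (__int128 x, int n) { return x << n; }

/* The same operation done as a byte move of k whole bytes, mirroring what
   pslldq does to the 16-byte vector on little-endian x86.  */
static __int128 shl_bytes (__int128 x, int k)
{
  unsigned char b[16], r[16] = { 0 };
  memcpy (b, &x, 16);
  memcpy (r + k, b, 16 - k);   /* move each byte up by k positions */
  __int128 out;
  memcpy (&out, r, 16);
  return out;
}
```

So `x << 16` and a two-byte move produce identical results, which is why the shift-by-multiple-of-8 cases are safe STV candidates.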


RE: [x86 PATCH] Support logical shifts by (some) integer constants in TImode STV.

2022-08-02 Thread Roger Sayle


Hi Uros,

> From: Uros Bizjak 
> Sent: 31 July 2022 18:23
> To: Roger Sayle 
> On Fri, Jul 29, 2022 at 12:18 AM Roger Sayle 
> wrote:
> >
> > This patch improves TImode STV by adding support for logical shifts by
> > integer constants that are multiples of 8.  For the test case:
> >
> > __int128 a, b;
> > void foo() { a = b << 16; }
> >
> > on x86_64, gcc -O2 currently generates:
> >
> > movq    b(%rip), %rax
> > movq    b+8(%rip), %rdx
> > shldq   $16, %rax, %rdx
> > salq    $16, %rax
> > movq    %rax, a(%rip)
> > movq    %rdx, a+8(%rip)
> > ret
> >
> > with this patch we now generate:
> >
> > movdqa  b(%rip), %xmm0
> > pslldq  $2, %xmm0
> > movaps  %xmm0, a(%rip)
> > ret
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> >
> >
> > 2022-07-28  Roger Sayle  
> >
> > gcc/ChangeLog
> > * config/i386/i386-features.cc (compute_convert_gain): Add gain
> > for converting suitable TImode shift to a V1TImode shift.
> > (timode_scalar_chain::convert_insn): Add support for converting
> > suitable ASHIFT and LSHIFTRT.
> > (timode_scalar_to_vector_candidate_p): Consider logical shifts
> > by integer constants that are multiples of 8 to be candidates.
> >
> > gcc/testsuite/ChangeLog
> > * gcc.target/i386/sse4_1-stv-7.c: New test case.
> 
> + case ASHIFT:
> + case LSHIFTRT:
> +  /* For logical shifts by constant multiples of 8.  */
> +  igain = optimize_insn_for_size_p () ? COSTS_N_BYTES (4)
> +                                      : COSTS_N_INSNS (1);
> 
> Isn't the conversion an universal win for -O2 as well as for -Os? The 
> conversion
> to/from XMM register is already accounted for, so for -Os substituting
> shldq/salq with pslldq should always be a win. I'd expect the cost 
> calculation to
> be similar to the general_scalar_chain::compute_convert_gain cost calculation
> with m = 2.

I agree that the terminology is perhaps a little confusing.  The
compute_convert_gain function calculates the total "gain" from an
STV chain, summing the igain of each instruction, and performs
the STV transformation if this total gain is greater than zero.
Hence positive values are good and negative values are bad.

In this case, of a logical shift by multiple of 8, converting the chain is 
indeed always
beneficial, reducing by 4 bytes in size when optimizing for size, and avoiding 
1 fast
instruction when optimizing for speed.  Having a "positive gain of four bytes" 
sounds bad,
but in this sense the gain is used as a synonym of "benefit" not "magnitude".

By comparison, shifting a 128-bit value by a single bit is always a net loss,
requiring three additional fast instructions, or 15 extra bytes in size.  However, it's
still worth
considering/capturing these counter-productive (i.e. negative) values, as they
might be compensated for by other wins in the chain.

Dealing with COSTS_N_BYTES (when optimizing for size) and COSTS_N_INSNS
(when optimizing for speed) allows much finer granularity.  For example,
the constant number of bits used in a shift/rotate, or the value of an
immediate constant in a compare have significant effects on the size/speed
of scalar vs. vector code, and this isn't (yet) something easily handled by the
simple "m" approximation used in general_scalar_chain::compute_convert_gain.

See (comment #5 of) PR target/105034 which mentions the need for more
accurate parameterization of compute_convert_gain (in response to the
undesirable suggestion of simply disabling STV when optimizing for size). 

I hope this explains the above idiom.  Hopefully, things will become clearer
when support for shifts by other bit counts, and arithmetic shifts, are added
to this part of the code (STV).  I'll be sure to add more comments.

[Ok for mainline?]

Cheers,
Roger
--




Re: How to check -std=c89 or -std=gnu89 is set in C FE?

2022-08-02 Thread Qing Zhao via Gcc-patches


> On Aug 2, 2022, at 12:34 PM, Marek Polacek  wrote:
> 
> On Tue, Aug 02, 2022 at 04:19:56PM +, Qing Zhao via Gcc-patches wrote:
>> Hi, Joseph,
>> 
>> When -std=c89 or -std=gnu89 present in the command line, in C FE, which 
>> flags should be
>> checked to decide it’s -std=c89 or -std=gnu89?
> 
> You should be able to check flag_iso and related.

Got it, thank you!

Qing
> 
> Marek
> 



Re: [PATCH 2/3 v3] testsuite: Add tests for C2X N2653 char8_t and UTF-8 string literal changes

2022-08-02 Thread Joseph Myers
On Mon, 1 Aug 2022, Tom Honermann via Gcc-patches wrote:

> This change provides new tests for the core language and compiler
> dependent library changes adopted for C2X via WG14 N2653.

Could you please send a complete patch series?  I'm not sure what the 
matching patches 1 and 3 are.  Also, I don't generally find it helpful for 
tests to be separated from the patch making the changes they test, since 
tests are necessary for review of that code.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH take #2] Some additional zero-extension related optimizations in simplify-rtx.

2022-08-02 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches  writes:
> "Roger Sayle"  writes:
>> Many thanks to Segher and Richard for pointing out that my removal
>> of optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version
>> of this patch was incorrect, and my assumption that these would be 
>> subsumed by val_signbit_known_clear_p was mistaken.  That the
>> tests for ABS and FFS looked out of place, was not an indication that
>> they were not required, but that we were missing simplifications for
>> the related SS_ABS, PARITY, POPCOUNT, CLRSB, CLZ and CTZ etc.
>> To make up for this mistake, in this revised patch I've not only restored
>> the tests for ABS and FFS, but also added the many sibling RTX codes
>> that I'd also expect to see optimized here, such as ABS(PARITY(x)).
>>
>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>> and make -k check, both with and without --target_board=unix{-m32},
>> with no new failures.  Ok for mainline?
>>
>>
>> 2022-08-02  Roger Sayle  
>> Segher Boessenkool  
>> Richard Sandiford  
>
> Thanks, but I didn't actually write anything. :-)
>
>> gcc/ChangeLog
>> * simplify_rtx.cc (simplify_unary_operation_1) : Add
>> optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS, CLZ, CTZ
>> and LSHIFTRT that are all positive to complement the existing
>> FFS and (idempotent) ABS simplifications.
>> : Canonicalize SIGN_EXTEND to ZERO_EXTEND when
>> val_signbit_known_clear_p is true of the operand.
>> Simplify sign extensions of SUBREG truncations of operands
>> that are already suitably (zero) extended.
>> : Simplify zero extensions of SUBREG truncations
>> of operands that are already suitably zero extended.
>>
>> Thanks in advance,
>> Roger
>> --
>>
>>> -Original Message-
>>> From: Richard Sandiford 
>>> Sent: 02 August 2022 10:39
>>> To: Roger Sayle 
>>> Cc: gcc-patches@gcc.gnu.org; 'Segher Boessenkool'
>>> 
>>> Subject: Re: [PATCH] Some additional zero-extension related optimizations
>> in
>>> simplify-rtx.
>>> 
>>> "Roger Sayle"  writes:
>>> > This patch implements some additional zero-extension and
>>> > sign-extension related optimizations in simplify-rtx.cc.  The original
>>> > motivation comes from PR rtl-optimization/71775, where in comment #2
>>> Andrew Pinski sees:
>>> >
>>> > Failed to match this instruction:
>>> > (set (reg:DI 88 [ _1 ])
>>> > (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
>>> >
>>> > On many platforms the result of DImode CTZ is constrained to be a
>>> > small unsigned integer (between 0 and 64), hence the truncation to
>>> > 32-bits (using a SUBREG) and the following sign extension back to
>>> > 64-bits are effectively a no-op, so the above should ideally (often)
>>> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
>>> >
>>> > To implement this, and some closely related transformations, we build
>>> > upon the existing val_signbit_known_clear_p predicate.  In the first
>>> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign
>>> > bit set, so the simplification of ABS (ABS (x)) and ABS (FFS (x))
>>> > can itself be simplified.
>>> 
>>> I think I misunderstood, but just in case: RTL ABS is well-defined for the
>> minimum
>>> integer (giving back the minimum integer), so we can't assume that ABS
>> leaves
>>> the sign bit clear.
>>> 
>>> Thanks,
>>> Richard
>>> 
>>> > The second transformation is that we can canonicalize SIGN_EXTEND to
>>> > ZERO_EXTEND (as in the PR 71775 case above) when the operand's
>>> > sign-bit is known to be clear.  The final two chunks are for
>>> > SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a truncating
>>> > SUBREG respectively.  The nonzero_bits of a truncating SUBREG
>>> > pessimistically thinks that the upper bits may have an arbitrary value
>>> > (by taking the SUBREG), so we need look deeper at the SUBREG's operand
>>> > to confirm that the high bits are known to be zero.
>>> >
>>> > Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
>>> > default architecture options is undefined at zero, so we can't be sure
>>> > the upper bits of reg:DI 88 will be sign extended (all zeros or all
>> ones).
>>> > nonzero_bits knows this, so the above transformations don't trigger,
>>> > but the transformations themselves are perfectly valid for other
>>> > operations such as FFS, POPCOUNT and PARITY, and on other
>>> > targets/-march settings where CTZ is defined at zero.
>>> >
>>> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>>> > and make -k check, both with and without --target_board=unix{-m32},
>>> > with no new failures.  Testing with CSiBE shows these transformations
>>> > trigger on several source files (and with -Os reduces the size of the
>>> > code).  Ok for mainline?
>>> >
>>> >
>>> > 2022-07-27  Roger Sayle  
>>> >
>>> > gcc/ChangeLog
>>> > * simplify_rtx.cc 
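The no-op extension motivating the PR 71775 case can be illustrated at the source level: for a nonzero 64-bit input, ctz is bounded by [0, 63], so the truncate-then-sign-extend round trip cannot change the value.

```c
/* Illustration of the simplification target: __builtin_ctzll on a nonzero
   input returns a value in [0, 63], so the narrowing to 32 bits (the
   SUBREG:SI) followed by sign-extension back to 64 bits (sign_extend:DI)
   is a no-op -- the sign bit of the narrow value is always clear.  */
static long long sext_of_truncated_ctz (unsigned long long x)
{
  int narrow = (int) __builtin_ctzll (x);   /* truncating SUBREG */
  return (long long) narrow;                /* sign extension back */
}
```

Note the caveat from the thread still applies in RTL: this is only simplifiable when ctz is defined at zero or the input is known nonzero.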

Re: [PATCH] analyzer: support for creat, dup, dup2 and dup3 [PR106298]

2022-08-02 Thread Mir Immad via Gcc-patches
The above patch is bootstrapped, lightly tested (on x86_64 Linux) and
approved for trunk by David.

On Tue, Aug 2, 2022 at 10:04 PM Immad Mir  wrote:

> This patch extends the state machine in sm-fd.cc to support
> creat, dup, dup2 and dup3 functions.
>
> Lightly tested on x86_64 Linux.
>
> gcc/analyzer/ChangeLog:
> PR analyzer/106298
> * sm-fd.cc (fd_state_machine::on_open): Add
> creat, dup, dup2 and dup3 functions.
> (enum dup): New.
> (fd_state_machine::valid_to_unchecked_state): New.
> (fd_state_machine::on_creat): New.
> (fd_state_machine::on_dup): New.
>
> gcc/testsuite/ChangeLog:
> PR analyzer/106298
> * gcc.dg/analyzer/fd-1.c: Add tests for 'creat'.
> * gcc.dg/analyzer/fd-2.c: Likewise.
> * gcc.dg/analyzer/fd-4.c: Likewise.
> * gcc.dg/analyzer/fd-dup-1.c: New tests.
>
> Signed-off-by: Immad Mir 
> ---
>  gcc/analyzer/sm-fd.cc| 129 -
>  gcc/testsuite/gcc.dg/analyzer/fd-1.c |  21 +++
>  gcc/testsuite/gcc.dg/analyzer/fd-2.c |  15 ++
>  gcc/testsuite/gcc.dg/analyzer/fd-4.c |  31 +++-
>  gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c | 223 +++
>  5 files changed, 415 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
>
> diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> index ed923ade100..8bb76d72b05 100644
> --- a/gcc/analyzer/sm-fd.cc
> +++ b/gcc/analyzer/sm-fd.cc
> @@ -69,6 +69,14 @@ enum access_directions
>DIRS_WRITE
>  };
>
> +/* An enum for distinguishing between dup, dup2 and dup3.  */
> +enum dup
> +{
> +  DUP_1,
> +  DUP_2,
> +  DUP_3
> +};
> +
>  class fd_state_machine : public state_machine
>  {
>  public:
> @@ -114,7 +122,9 @@ public:
>bool is_readonly_fd_p (state_t s) const;
>bool is_writeonly_fd_p (state_t s) const;
>enum access_mode get_access_mode_from_flag (int flag) const;
> -
> +  /* Function for one-to-one correspondence between valid
> + and unchecked states.  */
> +  state_t valid_to_unchecked_state (state_t state) const;
>/* State for a constant file descriptor (>= 0) */
>state_t m_constant_fd;
>
> @@ -147,6 +157,8 @@ public:
>  private:
>void on_open (sm_context *sm_ctxt, const supernode *node, const gimple
> *stmt,
> const gcall *call) const;
> +  void on_creat (sm_context *sm_ctxt, const supernode *node, const gimple
> *stmt,
> +   const gcall *call) const;
>void on_close (sm_context *sm_ctxt, const supernode *node, const gimple
> *stmt,
>  const gcall *call) const;
>void on_read (sm_context *sm_ctxt, const supernode *node, const gimple
> *stmt,
> @@ -170,6 +182,9 @@ private:
>const gimple *stmt, const gcall *call,
>const tree callee_fndecl, const char *attr_name,
>access_directions fd_attr_access_dir) const;
> +  void check_for_dup (sm_context *sm_ctxt, const supernode *node,
> +   const gimple *stmt, const gcall *call, const tree callee_fndecl,
> +   enum dup kind) const;
>  };
>
>  /* Base diagnostic class relative to fd_state_machine.  */
> @@ -723,6 +738,20 @@ fd_state_machine::is_constant_fd_p (state_t state)
> const
>return (state == m_constant_fd);
>  }
>
> +fd_state_machine::state_t
> +fd_state_machine::valid_to_unchecked_state (state_t state) const
> +{
> +  if (state == m_valid_read_write)
> +    return m_unchecked_read_write;
> +  else if (state == m_valid_write_only)
> +    return m_unchecked_write_only;
> +  else if (state == m_valid_read_only)
> +    return m_unchecked_read_only;
> +  else
> +    gcc_unreachable ();
> +  return NULL;
> +}
> +
>  bool
>  fd_state_machine::on_stmt (sm_context *sm_ctxt, const supernode *node,
>const gimple *stmt) const
> @@ -736,6 +765,11 @@ fd_state_machine::on_stmt (sm_context *sm_ctxt, const
> supernode *node,
> return true;
>   } //  "open"
>
> +   if (is_named_call_p (callee_fndecl, "creat", call, 2))
> + {
> +   on_creat (sm_ctxt, node, stmt, call);
> + } // "creat"
> +
> if (is_named_call_p (callee_fndecl, "close", call, 1))
>   {
> on_close (sm_ctxt, node, stmt, call);
> @@ -754,6 +788,23 @@ fd_state_machine::on_stmt (sm_context *sm_ctxt, const
> supernode *node,
> return true;
>   } // "read"
>
> +   if (is_named_call_p (callee_fndecl, "dup", call, 1))
> + {
> +   check_for_dup (sm_ctxt, node, stmt, call, callee_fndecl, DUP_1);
> +   return true;
> + }
> +
> +   if (is_named_call_p (callee_fndecl, "dup2", call, 2))
> + {
> +   check_for_dup (sm_ctxt, node, stmt, call, callee_fndecl, DUP_2);
> +   return true;
> + }
> +
> +
> +   if (is_named_call_p (callee_fndecl, "dup3", call, 3))
> + {
> +   check_for_dup (sm_ctxt, node, stmt, 

Re: How to check -std=c89 or -std=gnu89 is set in C FE?

2022-08-02 Thread Marek Polacek via Gcc-patches
On Tue, Aug 02, 2022 at 04:19:56PM +, Qing Zhao via Gcc-patches wrote:
> Hi, Joseph,
> 
> When -std=c89 or -std=gnu89 present in the command line, in C FE, which flags 
> should be
> checked to decide it’s -std=c89 or -std=gnu89?

You should be able to check flag_iso and related.

Marek
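A minimal mock of the suggested check; the real flags live in the C front end, so the exact combination should be verified there rather than taken from this sketch:

```c
#include <string.h>

/* Stand-ins for the front-end flags, not real GCC code: flag_iso is set
   for strict ISO modes (e.g. -std=c89) and clear for GNU dialects
   (-std=gnu89); flag_isoc99 is set for C99 and later standards.  */
static int flag_iso;
static int flag_isoc99;

static const char *c_dialect (void)
{
  if (flag_isoc99)
    return "c99-or-later";
  return flag_iso ? "c89" : "gnu89";
}
```

With these semantics, `!flag_isoc99 && flag_iso` selects -std=c89 and `!flag_isoc99 && !flag_iso` selects -std=gnu89.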



Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader

2022-08-02 Thread Jan Hubicka via Gcc-patches
> On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> 
> > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener  wrote:
> > >
> > > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> > >
> > > > Unfortunately, this was before my time, so I don't know.
> > > >
> > > > That being said, thanks for tackling these issues that my work
> > > > triggered last release.  Much appreciated.
> > >
> > > Ah.  But it was your r12-324-g69e5544210e3c0 that did
> > >
> > > -  else if (n_insns > 1)
> > > +  else if (!m_speed_p && n_insns > 1)
> > >
> > > causing the breakage on the 12 branch.  That leads to a simpler
> > > fix I guess.  Will re-test and also backport to GCC 12 if successful.
> > 
> > Huh.  It's been a while, but that looks like a typo.  That patch was
> > supposed to be non-behavior changing.
> 
> Exactly my thinking so reverting it shouldn't be a reason for
> detailed questions.  Now, the contains_hot_bb computation is,
> that one was introduced by Honza in r7-6476-g0f0c2cc3a17efa
> together with the comment and a testcase.
> 
> So - Honza, what was the reasoning to look at raw BB counts here
> rather than for example the path entry edge count?
I think the explanation is in the final comment:
  /* Threading is profitable if the path duplicated is hot but also
     in a case we separate cold path from hot path and permit optimization
     of the hot path later.  Be on the aggressive side here.  In some testcases,
     as in PR 78407 this leads to noticeable improvements.  */

If you have a non-threadable hot path, threading out cold paths will make
it easier to optimize, since you have fewer meets in the dataflow.

Honza
> 
> Richard.


Re: [PATCH] IPA: reduce what we dump in normal mode

2022-08-02 Thread Jan Hubicka via Gcc-patches
> gcc/ChangeLog:
> 
>   * profile.cc (compute_branch_probabilities): Dump details only
>   if TDF_DETAILS.
>   * symtab.cc (symtab_node::dump_base): Do not dump pointer unless
>   TDF_ADDRESS is used, it makes comparison harder.
> ---
>  gcc/profile.cc | 2 +-
>  gcc/symtab.cc  | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/profile.cc b/gcc/profile.cc
> index 08af512cbca..92de821b8bb 100644
> --- a/gcc/profile.cc
> +++ b/gcc/profile.cc
> @@ -766,7 +766,7 @@ compute_branch_probabilities (unsigned cfg_checksum, 
> unsigned lineno_checksum)
> sum2 += freq2;
>   }
>   }
> -  if (dump_file)
> +  if (dump_file && (dump_flags & TDF_DETAILS))
If you disable dumping, you can also disable the collection of stats
which is guarded by if (dump_file) as well.  Otherwise the patch is OK.
>   {
> double nsum1 = 0, nsum2 = 0;
> stats.qsort (cmp_stats);
> diff --git a/gcc/symtab.cc b/gcc/symtab.cc
> index 8670337416e..f2d96c0268b 100644
> --- a/gcc/symtab.cc
> +++ b/gcc/symtab.cc
> @@ -894,7 +894,8 @@ symtab_node::dump_base (FILE *f)
>};
>  
>fprintf (f, "%s (%s)", dump_asm_name (), name ());
> -  dump_addr (f, " @", (void *)this);
> +  if (dump_flags & TDF_ADDRESS)
> +dump_addr (f, " @", (void *)this);
>fprintf (f, "\n  Type: %s", symtab_type_names[type]);
>  
>if (definition)
> -- 
> 2.37.1
> 


How to check -std=c89 or -std=gnu89 is set in C FE?

2022-08-02 Thread Qing Zhao via Gcc-patches
Hi, Joseph,

When -std=c89 or -std=gnu89 present in the command line, in C FE, which flags 
should be
checked to decide it’s -std=c89 or -std=gnu89?

Thanks a lot for your help.

Qing

Re: [PATCH v2] cselib: add function to check if SET is redundant [PR106187]

2022-08-02 Thread Richard Earnshaw via Gcc-patches




On 01/08/2022 11:38, Richard Earnshaw via Gcc-patches wrote:



On 30/07/2022 20:57, Jeff Law via Gcc-patches wrote:



On 7/29/2022 7:52 AM, Richard Earnshaw via Gcc-patches wrote:
A SET operation that writes memory may have the same value as an 
earlier store but if the alias sets of the new and earlier store do 
not conflict then the set is not truly redundant.  This can happen, 
for example, if objects of different types share a stack slot.


To fix this we define a new function in cselib that first checks for
equality and if that is successful then finds the earlier store in the
value history and checks the alias sets.

The routine is used in two places elsewhere in the compiler. Firstly
in cfgcleanup and secondly in postreload.

gcc/ChangeLog:
* alias.h (mems_same_for_tbaa_p): Declare.
* alias.cc (mems_same_for_tbaa_p): New function.
* dse.cc (record_store): Use it instead of open-coding
alias check.
* cselib.h (cselib_redundant_set_p): Declare.
* cselib.cc: Include alias.h
(cselib_redundant_set_p): New function.
* cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead
of rtx_equal_for_cselib_p.
* postreload.c (reload_cse_simplify): Use cselib_redundant_set_p.
(reload_cse_noop_set_p): Delete.
Seems quite reasonable.   The only question I would have would be 
whether or not you considered including the aliasing info into the 
hashing used by cselib.  You'd probably still need the bulk of this 
patch as well since we could presumably still get a hash conflict with 
two stores of the same value to the same location, but with different 
alias sets (it's just much less likely), so perhaps it doesn't really 
buy us anything.


I thought about this, but if the alias set were included in the hash, 
then surely you'd get every alias set in a different value.  Then you'd 
miss the cases where the alias sets do conflict even though they are not 
the same.  Anyway, the values /are/ the same so in some circumstances 
you might want to know that.




Ideally this would include a testcase.  You might be able to turn that 
non-executable reduced case into something useful by scanning the 
post-reload dumps.


I considered this as well, but the testcase I have is far too fragile, I 
think.  The existing test only fails on Arm, only fails on 11.2 (not 
11.3 or gcc-12 onwards), relies on two objects with the same value being 
in distinct alias sets but still assigned to the same stack slot and for 
some copy dance to end up trying to write back the original value to the 
same slot but with a non-conflicting set.  And finally, the scheduler 
has to then try to move a load past the non-aliasing store.





To get anywhere close to this I think we'd need something akin to the 
gimple reader but for RTL so that we could set up all the conditions for 
the failure without the risk of an earlier transform blowing the test away.


I wasn't aware of the rtl reader already in the compiler.  But it 
doesn't really get me any closer as it is lacking in so many regards:


- It can't handle (const_double:SF ...) - it tries to handle the 
argument as an int.  This is a consequence, I think, of the reader being 
based on that for reading machine descriptions where FP const_double is 
simply never encountered.


- It doesn't seem to handle anything much more than very basic types, 
and in particular appears to have no way of ensuring that alias sets 
match up with the type system.




I even considered whether we could start with a gimple dump and 
bypassing all the tree/gimple transformations, but even that would be 
still at the mercy of the stack-slot allocation algorithm.


I spent a while trying to get some gimple out of the dumpers in a form 
that was usable, but that's pretty much a non-starter.  To make it work 
we'd need to add support for gimple clobbers on objects - without that 
there's no way to get the stack-slot sharing code to work.  Furthermore, 
even feeding fully-optimized gimple directly into expand is such a long 
way from the postreload pass, that I can't believe the testcase would 
remain stable for long.


And the other major issue is that the original testcase is heavily 
templated C++, and neither the gimple nor the rtl parser is supported in 
cc1plus: converting the boilerplate to be C-friendly is probably going 
to be hard.


I can't afford to spend much more time on this, especially given the 
low-quality test we're going to get out of the end of the process.






Jeff


R.




Re: [PATCH] btf: do not use the CHAR `encoding' bit for BTF

2022-08-02 Thread David Faust via Gcc-patches



On 8/2/22 08:42, Jose E. Marchesi wrote:
> 
>> On 7/26/22 14:58, Indu Bhagat wrote:
>>> On 7/22/22 4:23 AM, Jose E. Marchesi via Gcc-patches wrote:

 Contrary to CTF and our previous expectations, as per [1], turns out
 that in BTF:

 1) The `encoding' field in integer types shall not be treated as a
 bitmap, but as an enumeration, i.e. these bits are exclusive to each
 other.

 2) The CHAR bit in `encoding' shall _not_ be set when emitting types
 for char or `unsigned char'.

>>>
>>> Hmm...well.  At this time, I suggest we make a note of this in the btf.h 
>>> for posterity that BTF_INT_CHAR is to not be used (i.e., BTF_INT_CHAR 
>>> should not be set for char / unsigned char).
>>
>> Agreed it would be good to add this note.
> 
> Hmm, I am not sure such a comment actually belongs in include/btf.h,
> which is not specific to the compiler and is supposed to reflect the BTF
> format per se.  The CHAR bit is documented in the kernel documentation
> and it may be used at some point by bpflib, or who knows what.

OK you make a good point.

In that case the patch LGTM to commit. Thanks!

> 
> That's why I put the comment in btfout.cc instead, to make it clear that
> BTF_INT_CHAR is indeed not to be set for char / unsigned char by the
> compiler:
> 
 +  /* In BTF the CHAR `encoding' seems to not be used, so clear it
 + here.  */
 +  dtd->dtd_u.dtu_enc.cte_format &= ~BTF_INT_CHAR;


Re: [PATCH] btf: do not use the CHAR `encoding' bit for BTF

2022-08-02 Thread Jose E. Marchesi via Gcc-patches


> On 7/26/22 14:58, Indu Bhagat wrote:
>> On 7/22/22 4:23 AM, Jose E. Marchesi via Gcc-patches wrote:
>>>
>>> Contrary to CTF and our previous expectations, as per [1], turns out
>>> that in BTF:
>>>
>>> 1) The `encoding' field in integer types shall not be treated as a
>>> bitmap, but as an enumeration, i.e. these bits are exclusive to each
>>> other.
>>>
>>> 2) The CHAR bit in `encoding' shall _not_ be set when emitting types
>>> for char or `unsigned char'.
>>>
>> 
>> Hmm...well.  At this time, I suggest we make a note of this in the btf.h 
>> for posterity that BTF_INT_CHAR is to not be used (i.e., BTF_INT_CHAR 
>> should not be set for char / unsigned char).
>
> Agreed it would be good to add this note.

Hmm, I am not sure such a comment actually belongs in include/btf.h,
which is not specific to the compiler and is supposed to reflect the BTF
format per se.  The CHAR bit is documented in the kernel documentation
and it may be used at some point by bpflib, or who knows what.

That's why I put the comment in btfout.cc instead, to make it clear that
BTF_INT_CHAR is indeed not to be set for char / unsigned char by the
compiler:

>>> +  /* In BTF the CHAR `encoding' seems to not be used, so clear it
>>> + here.  */
>>> +  dtd->dtd_u.dtu_enc.cte_format &= ~BTF_INT_CHAR;


Re: [GCC13][Patch][V2][2/2]Add a new option -fstrict-flex-array[=n] and attribute strict_flex_array(n) and use it in PR101836

2022-08-02 Thread Qing Zhao via Gcc-patches
Thanks a lot for your testing on Linux Kernel.
Will work on the version 3 of this patch soon.

Qing

> On Aug 2, 2022, at 11:30 AM, Kees Cook  wrote:
> 
> On Tue, Jul 19, 2022 at 02:11:19PM +, Qing Zhao wrote:
>> From a09f39ded462611286a44d9e8273de8342673ba2 Mon Sep 17 00:00:00 2001
>> From: Qing Zhao 
>> Date: Mon, 18 Jul 2022 18:12:26 +
>> Subject: [PATCH 2/2] Use new flag DECL_NOT_FLEXARRAY in __builtin_object_size
>> [PR101836]
>> 
>> Use new flag DECL_NOT_FLEXARRAY to determine whether the trailing array
>> of a structure is flexible array member in __builtin_object_size.
> 
> FWIW, with these patches I've done builds of the Linux kernel using
> the ...=3 level, and things are looking correct on our end. (There are,
> as expected, many places in Linux that need to be fixed, and that work
> is on-going, guided by this option's results.)
> 
> -Kees
> 
> -- 
> Kees Cook



Re: [GCC13][Patch][V2][2/2]Add a new option -fstrict-flex-array[=n] and attribute strict_flex_array(n) and use it in PR101836

2022-08-02 Thread Kees Cook via Gcc-patches
On Tue, Jul 19, 2022 at 02:11:19PM +, Qing Zhao wrote:
> From a09f39ded462611286a44d9e8273de8342673ba2 Mon Sep 17 00:00:00 2001
> From: Qing Zhao 
> Date: Mon, 18 Jul 2022 18:12:26 +
> Subject: [PATCH 2/2] Use new flag DECL_NOT_FLEXARRAY in __builtin_object_size
> [PR101836]
> 
> Use new flag DECL_NOT_FLEXARRAY to determine whether the trailing array
> of a structure is flexible array member in __builtin_object_size.

FWIW, with these patches I've done builds of the Linux kernel using
the ...=3 level, and things are looking correct on our end. (There are,
as expected, many places in Linux that need to be fixed, and that work
is on-going, guided by this option's results.)

-Kees

-- 
Kees Cook


[x86 PATCH] PR target/47949: Use xchg to move from/to AX_REG with -Oz.

2022-08-02 Thread Roger Sayle

This patch adds a peephole2 to i386.md to implement the suggestion in
PR target/47949, of using xchg instead of mov for moving values to/from
the %rax/%eax register, controlled by -Oz, as the xchg instruction is
one byte shorter than the move it is replacing.

The new test case is taken from the PR:
int foo(int x) { return x; }

where previously we'd generate:
foo:mov %edi,%eax  // 2 bytes
ret

but with this patch, using -Oz, we generate:
foo:xchg %eax,%edi  // 1 byte
ret

On the CSiBE benchmark, this saves a total of 10238 bytes (reducing
the -Oz total from 3661796 bytes to 3651558 bytes, a 0.28% saving).

Interestingly, some modern architectures (such as Zen 3) implement
xchg using zero-latency register renaming (just like mov), so in theory
this transformation could be enabled when optimizing for speed, if
benchmarking shows the improved code density produces consistently
better performance.  However, this is architecture-dependent, and
there may be interactions from using xchg (instead of a single_set) in
the late RTL passes (such as cprop_hardreg), so for now I've restricted
this to -Oz.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-08-02  Roger Sayle  

gcc/ChangeLog
PR target/47949
* config/i386/i386.md (peephole2): New peephole2 to convert
SWI48 moves to/from %rax/%eax where the src is dead to xchg,
when optimizing for minimal size with -Oz.

gcc/testsuite/ChangeLog
PR target/47949
* gcc.target/i386/pr47949.c: New test case.


Thanks,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index f1158e1..11629ce 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3018,6 +3018,18 @@
   [(parallel [(set (match_dup 1) (match_dup 2))
  (set (match_dup 2) (match_dup 1))])])
 
+;; Convert moves to/from AX_REG into xchg with -Oz.
+(define_peephole2
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+   (match_operand:SWI48 1 "general_reg_operand"))]
+ "optimize_size > 1
+  && (REGNO (operands[0]) == AX_REG
+  || REGNO (operands[1]) == AX_REG)
+  && optimize_insn_for_size_p ()
+  && peep2_reg_dead_p (1, operands[1])"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 1) (match_dup 0))])])
+
 (define_expand "movstrict"
   [(set (strict_low_part (match_operand:SWI12 0 "register_operand"))
(match_operand:SWI12 1 "general_operand"))]
diff --git a/gcc/testsuite/gcc.target/i386/pr47949.c 
b/gcc/testsuite/gcc.target/i386/pr47949.c
new file mode 100644
index 000..7a58fa4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr47949.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-Oz" } */
+
+int foo(int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler "xchg" } } */


Where in C++ module streaming to handle a new bitfield added in "tree_decl_common"

2022-08-02 Thread Qing Zhao via Gcc-patches
Hi, Nathan,

I am adding a new bitfield “decl_not_flexarray” in “tree_decl_common”
(gcc/tree-core.h) for the new GCC feature -fstrict-flex-array.


diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index ea9f281f1cc..458c6e6ceea 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1813,7 +1813,10 @@ struct GTY(()) tree_decl_common {
 TYPE_WARN_IF_NOT_ALIGN.  */
  unsigned int warn_if_not_align : 6;

-  /* 14 bits unused.  */
+  /* In FIELD_DECL, this is DECL_NOT_FLEXARRAY.  */
+  unsigned int decl_not_flexarray : 1;
+
+  /* 13 bits unused.  */

  /* UID for points-to sets, stable over copying from inlining.  */
  unsigned int pt_uid;


(Please refer to the following for details:

https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598556.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598965.html
)

Richard mentioned the following:

"I've not seen it so you are probably missing it - the bit has to be
streamed in tree-streamer-{in,out}.cc to be usable from LTO.  Possibly
C++ module streaming also needs to handle it.”

I have figured out where to add the handling of the bit in
“tree-streamer-{in,out}.cc”.  However, it’s quite difficult for me to
locate where I should add the handling of this new bit in C++ module
streaming; could you please help me with this?

Thanks a lot for your help.

Qing

Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader

2022-08-02 Thread Aldy Hernandez via Gcc-patches
Feel free to blame me for everything except the profitability code and the
generic block copier. That stuff was all there before and I mostly avoided
it.

:-)

Thanks for the work in this space.
Aldy

On Tue, Aug 2, 2022, 15:29 Richard Biener  wrote:

> On Tue, 2 Aug 2022, Aldy Hernandez wrote:
>
> > On Tue, Aug 2, 2022 at 1:59 PM Richard Biener  wrote:
> > >
> > > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> > >
> > > > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener 
> wrote:
> > > > >
> > > > > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> > > > >
> > > > > > Unfortunately, this was before my time, so I don't know.
> > > > > >
> > > > > > That being said, thanks for tackling these issues that my work
> > > > > > triggered last release.  Much appreciated.
> > > > >
> > > > > Ah.  But it was your r12-324-g69e5544210e3c0 that did
> > > > >
> > > > > -  else if (n_insns > 1)
> > > > > +  else if (!m_speed_p && n_insns > 1)
> > > > >
> > > > > causing the breakage on the 12 branch.  That leads to a simpler
> > > > > fix I guess.  Will re-test and also backport to GCC 12 if
> successful.
> > > >
> > > > Huh.  It's been a while, but that looks like a typo.  That patch was
> > > > supposed to be non-behavior changing.
> > >
> > > Exactly my thinking so reverting it shouldn't be a reason for
> > > detailed questions.  Now, the contains_hot_bb computation is,
> >
> > Sorry for the pain.
>
> So - actually the change was probably done on purpose (even if
> reverting - which I've now already done - caused no testsuite regressions).
> That's because the whole function is invoked N + 1 times for a path
> of length N and we definitely want to avoid using the size optimization
> heuristics when the path is not complete yet.  I think the proper
> way is to do
>
> diff --git a/gcc/tree-ssa-threadbackward.cc
> b/gcc/tree-ssa-threadbackward.cc
> index ba114e98a41..6979398ef76 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/tree-ssa-threadbackward.cc
> @@ -767,7 +767,11 @@ back_threader_profitability::profitable_path_p (const
> vec _path,
>   as in PR 78407 this leads to noticeable improvements.  */
>if (m_speed_p
>&& ((taken_edge && optimize_edge_for_speed_p (taken_edge))
> - || contains_hot_bb))
> + || contains_hot_bb
> + /* Avoid using the size heuristics when not doing the final
> +thread evaluation, we get called for each added BB
> +to the path.  */
> + || !taken_edge))
>  {
>if (n_insns >= param_max_fsm_thread_path_insns)
> {
>
> thus assume there'll be a hot BB in the future.
>
> That said, the very best fix would be to not call this function
> N + 1 times (I have a patch to call it only N times - yay), but
> instead factor out parts to be called per BB plus keeping enough
> state so we can incrementally collect info.
>
> There's more "odd" things in the backward threader, of course :/
>
> I'm looking for things applicable to the GCC 12 branch right now
> so will try the above.
>
> Richard.
>
>


Re: [GCC13][Patch][V2][1/2]Add a new option -fstrict-flex-array[=n] and attribute strict_flex_array(n) and use it in PR101836

2022-08-02 Thread Qing Zhao via Gcc-patches


> On Aug 2, 2022, at 3:03 AM, Richard Biener  wrote:
> 
> On Mon, 1 Aug 2022, Qing Zhao wrote:
> 
>> 
>> 
>>> On Aug 1, 2022, at 3:38 AM, Richard Biener  wrote:
>>> 
>>> On Fri, 29 Jul 2022, Qing Zhao wrote:
>>> 
 Hi, Richard,
 
 Thanks a lot for your comments and suggestions. (And sorry for my late 
 reply).
 
> On Jul 28, 2022, at 3:26 AM, Richard Biener  wrote:
> 
> On Tue, 19 Jul 2022, Qing Zhao wrote:
> 
>> From 3854004802b8e2f132ebf218fc35a632f5e80c6a Mon Sep 17 00:00:00 2001
>> From: Qing Zhao 
>> Date: Mon, 18 Jul 2022 17:04:12 +
>> Subject: [PATCH 1/2] Add a new option -fstrict-flex-array[=n] and new
>> attribute strict_flex_array
>> 
>> Add the following new option -fstrict-flex-array[=n] and a corresponding
>> attribute strict_flex_array to GCC:
>> 
>> '-fstrict-flex-array'
>>  Treat the trailing array of a structure as a flexible array member
>>  in a stricter way.  The positive form is equivalent to
>>  '-fstrict-flex-array=3', which is the strictest.  A trailing array
>>  is treated as a flexible array member only when it is declared as a
>>  flexible array member per C99 standard onwards.  The negative form
>>  is equivalent to '-fstrict-flex-array=0', which is the least
>>  strict.  All trailing arrays of structures are treated as flexible
>>  array members.
>> 
>> '-fstrict-flex-array=LEVEL'
>>  Treat the trailing array of a structure as a flexible array member
>>  in a stricter way.  The value of LEVEL controls the level of
>>  strictness.
>> 
>>  The possible values of LEVEL are the same as for the
>>  'strict_flex_array' attribute (*note Variable Attributes::).
>> 
>>  You can control this behavior for a specific trailing array field
>>  of a structure by using the variable attribute 'strict_flex_array'
>>  attribute (*note Variable Attributes::).
>> 
>> 'strict_flex_array (LEVEL)'
>>  The 'strict_flex_array' attribute should be attached to the
>>  trailing array field of a structure.  It specifies the level of
>>  strictness of treating the trailing array field of a structure as a
>>  flexible array member.  LEVEL must be an integer between 0 and 3.
>> 
>>  LEVEL=0 is the least strict level, all trailing arrays of
>>  structures are treated as flexible array members.  LEVEL=3 is the
>>  strictest level, only when the trailing array is declared as a
>>  flexible array member per C99 standard onwards ([]), it is treated
>>  as a flexible array member.
>> 
>>  There are two more levels in between 0 and 3, which are provided to
>>  support older code that uses the GCC zero-length array extension ([0])
>>  or a one-element array as a flexible array member ([1]): When LEVEL is 1,
>>  the trailing array is treated as a flexible array member when it is
>>  declared as either [], [0], or [1]; When LEVEL is 2, the trailing
>>  array is treated as a flexible array member when it is declared as
>>  either [], or [0].
>> 
>>  This attribute can be used with or without '-fstrict-flex-array'.
>>  When both the attribute and the option are present at the same time,
>>  the level of strictness for the specific trailing array field
>>  is determined by the attribute.
>> 
>> gcc/c-family/ChangeLog:
>> 
>>  * c-attribs.cc (handle_strict_flex_array_attribute): New function.
>>  (c_common_attribute_table): New item for strict_flex_array.
>>  * c.opt (fstrict-flex-array): New option.
>>  (fstrict-flex-array=): New option.
>> 
>> gcc/c/ChangeLog:
>> 
>>  * c-decl.cc (add_flexible_array_elts_to_size): Call new utility
>>  routine flexible_array_member_p.
>>  (is_flexible_array_member_p): New function.
>>  (finish_struct): Set the new DECL_NOT_FLEXARRAY flag.
>> 
>> gcc/ChangeLog:
>> 
>>  * doc/extend.texi: Document strict_flex_array attribute.
>>  * doc/invoke.texi: Document -fstrict-flex-array[=n] option.
>>  * tree-core.h (struct tree_decl_common): New bit field
>>  decl_not_flexarray.
>>  * tree.cc (component_ref_size): Reorg by using new utility functions.
>>  (flexible_array_member_p): New function.
>>  (zero_length_array_p): Likewise.
>>  (one_element_array_p): Likewise.
>>  (flexible_array_type_p): Likewise.
>>  * tree.h (DECL_NOT_FLEXARRAY): New flag.
>>  (zero_length_array_p): New function prototype.
>>  (one_element_array_p): Likewise.
>>  (flexible_array_member_p): Likewise.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.dg/strict-flex-array-1.c: New test.
>> ---
>> gcc/c-family/c-attribs.cc  |  47 
>> gcc/c-family/c.opt |   7 ++
>> gcc/c/c-decl.cc|  91 +--
>> gcc/doc/extend.texi|  25 

Re: [RFA configure parts] aarch64: Make cc1 handle --with options

2022-08-02 Thread Richard Earnshaw via Gcc-patches




On 13/06/2022 15:33, Richard Sandiford via Gcc-patches wrote:

On aarch64, --with-arch, --with-cpu and --with-tune only have an
effect on the driver, so “./xgcc -B./ -O3” can give significantly
different results from “./cc1 -O3”.  --with-arch did have a limited
effect on ./cc1 in previous releases, although it didn't work
entirely correctly.

Being of a lazy persuasion, I've got used to ./cc1 selecting SVE for
--with-arch=armv8.2-a+sve without having to supply an explicit -march,
so this patch makes ./cc1 emulate the relevant OPTION_DEFAULT_SPECS.
It relies on Wilco's earlier clean-ups.

The patch makes config.gcc define WITH_FOO_STRING macros for each
supported --with-foo option.  This could be done only in aarch64-
specific code, but I thought it could be useful on other targets
too (and can be safely ignored otherwise).  There didn't seem to
be any existing and potentially clashing uses of macros with this
style of name.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK for the configure
bits?

Richard


gcc/
* config.gcc: Define WITH_FOO_STRING macros for each supported
--with-foo option.
* config/aarch64/aarch64.cc (aarch64_override_options): Emulate
OPTION_DEFAULT_SPECS.
* config/aarch64/aarch64.h (OPTION_DEFAULT_SPECS): Reference the above.
---
  gcc/config.gcc| 14 ++
  gcc/config/aarch64/aarch64.cc |  8 
  gcc/config/aarch64/aarch64.h  |  5 -
  3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index cdbefb5b4f5..e039230431c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -5865,6 +5865,20 @@ else
configure_default_options="{ ${t} }"
  fi
  
+for option in $supported_defaults

+do
+   lc_option=`echo $option | sed s/-/_/g`
+   uc_option=`echo $lc_option | tr a-z A-Z`
+   eval "val=\$with_$lc_option"
+   if test -n "$val"
+   then
+   val="\\\"$val\\\""
+   else
+   val=nullptr
+   fi
+   tm_defines="$tm_defines WITH_${uc_option}_STRING=$val"
+done


This bit would really be best reviewed by a non-arm maintainer.  It 
generally looks OK.  My only comment would be why define anything if the 
corresponding --with-foo was not specified.  They you can use #ifdef to 
test if the user specified a default.


R.


+
  if test "$target_cpu_default2" != ""
  then
if test "$target_cpu_default" != ""
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d21e041eccb..0bc700b81ad 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18109,6 +18109,14 @@ aarch64_override_options (void)
if (aarch64_branch_protection_string)
  aarch64_validate_mbranch_protection (aarch64_branch_protection_string);
  
+  /* Emulate OPTION_DEFAULT_SPECS.  */

+  if (!aarch64_arch_string && !aarch64_cpu_string)
+aarch64_arch_string = WITH_ARCH_STRING;
+  if (!aarch64_arch_string && !aarch64_cpu_string)
+aarch64_cpu_string = WITH_CPU_STRING;
+  if (!aarch64_cpu_string && !aarch64_tune_string)
+aarch64_tune_string = WITH_TUNE_STRING;
+
/* -mcpu=CPU is shorthand for -march=ARCH_FOR_CPU, -mtune=CPU.
   If either of -march or -mtune is given, they override their
   respective component of -mcpu.  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 80cfe4b7407..3122dbd7098 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -1267,7 +1267,10 @@ extern enum aarch64_code_model aarch64_cmodel;
  /* Support for configure-time --with-arch, --with-cpu and --with-tune.
 --with-arch and --with-cpu are ignored if either -mcpu or -march is used.
 --with-tune is ignored if either -mtune or -mcpu is used (but is not
-   affected by -march).  */
+   affected by -march).
+
+   There is corresponding code in aarch64_override_options that emulates
+   this behavior when cc1  are invoked directly.  */
  #define OPTION_DEFAULT_SPECS  \
{"arch", "%{!march=*:%{!mcpu=*:-march=%(VALUE)}}" },\
{"cpu",  "%{!march=*:%{!mcpu=*:-mcpu=%(VALUE)}}" },   \


Re: [PATCH take #2] Some additional zero-extension related optimizations in simplify-rtx.

2022-08-02 Thread Richard Sandiford via Gcc-patches
"Roger Sayle"  writes:
> Many thanks to Segher and Richard for pointing out that my removal
> of optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version
> of this patch was incorrect, and my assumption that these would be 
> subsumed by val_signbit_known_clear_p was mistaken.  That the
> tests for ABS and FFS looked out of place was not an indication that
> they were not required, but that we were missing simplifications for
> the related SS_ABS, PARITY, POPCOUNT, CLRSB, CLZ and CTZ etc.
> To make up for this mistake, in this revised patch I've not only restored
> the tests for ABS and FFS, but also added the many sibling RTX codes
> that I'd also expect to see optimized here, such as ABS(PARITY(x)).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-08-02  Roger Sayle  
> Segher Boessenkool  
> Richard Sandiford  

Thanks, but I didn't actually write anything. :-)

> gcc/ChangeLog
> * simplify-rtx.cc (simplify_unary_operation_1) : Add
> optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS, CLZ, CTZ
> and LSHIFTRT that are all positive to complement the existing
> FFS and (idempotent) ABS simplifications.
> : Canonicalize SIGN_EXTEND to ZERO_EXTEND when
> val_signbit_known_clear_p is true of the operand.
> Simplify sign extensions of SUBREG truncations of operands
> that are already suitably (zero) extended.
> : Simplify zero extensions of SUBREG truncations
> of operands that are already suitably zero extended.
>
> Thanks in advance,
> Roger
> --
>
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: 02 August 2022 10:39
>> To: Roger Sayle 
>> Cc: gcc-patches@gcc.gnu.org; 'Segher Boessenkool'
>> 
>> Subject: Re: [PATCH] Some additional zero-extension related optimizations
>> in simplify-rtx.
>> 
>> "Roger Sayle"  writes:
>> > This patch implements some additional zero-extension and
>> > sign-extension related optimizations in simplify-rtx.cc.  The original
>> > motivation comes from PR rtl-optimization/71775, where in comment #2
>> > Andrew Pinski sees:
>> >
>> > Failed to match this instruction:
>> > (set (reg:DI 88 [ _1 ])
>> > (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
>> >
>> > On many platforms the result of DImode CTZ is constrained to be a
>> > small unsigned integer (between 0 and 64), hence the truncation to
>> > 32-bits (using a SUBREG) and the following sign extension back to
>> > 64-bits are effectively a no-op, so the above should ideally (often)
>> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
>> >
>> > To implement this, and some closely related transformations, we build
>> > upon the existing val_signbit_known_clear_p predicate.  In the first
>> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign
>> > bit set, so the simplification of ABS (ABS (x)) and ABS (FFS (x))
>> > can itself be simplified.
>> 
>> I think I misunderstood, but just in case: RTL ABS is well-defined for the
>> minimum integer (giving back the minimum integer), so we can't assume that
>> ABS leaves the sign bit clear.
>> 
>> Thanks,
>> Richard
>> 
>> > The second transformation is that we can canonicalize SIGN_EXTEND to
>> > ZERO_EXTEND (as in the PR 71775 case above) when the operand's
>> > sign-bit is known to be clear.  The final two chunks are for
>> > SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a truncating
>> > SUBREG respectively.  The nonzero_bits of a truncating SUBREG
>> > pessimistically thinks that the upper bits may have an arbitrary value
>> > (by taking the SUBREG), so we need look deeper at the SUBREG's operand
>> > to confirm that the high bits are known to be zero.
>> >
>> > Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
>> > default architecture options is undefined at zero, so we can't be sure
>> > the upper bits of reg:DI 88 will be sign extended (all zeros or all
>> > ones).
>> > nonzero_bits knows this, so the above transformations don't trigger,
>> > but the transformations themselves are perfectly valid for other
>> > operations such as FFS, POPCOUNT and PARITY, and on other
>> > targets/-march settings where CTZ is defined at zero.
>> >
>> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>> > and make -k check, both with and without --target_board=unix{-m32},
>> > with no new failures.  Testing with CSiBE shows these transformations
>> > trigger on several source files (and with -Os reduces the size of the
>> > code).  Ok for mainline?
>> >
>> >
>> > 2022-07-27  Roger Sayle  
>> >
>> > gcc/ChangeLog
>> > * simplify-rtx.cc (simplify_unary_operation_1) : Simplify
>> > test as both FFS and ABS result in nonzero_bits returning a
>> > mask that satisfies val_signbit_known_clear_p.
>> >  

Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader

2022-08-02 Thread Richard Biener via Gcc-patches
On Tue, 2 Aug 2022, Aldy Hernandez wrote:

> On Tue, Aug 2, 2022 at 1:59 PM Richard Biener  wrote:
> >
> > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> >
> > > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener  wrote:
> > > >
> > > > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> > > >
> > > > > Unfortunately, this was before my time, so I don't know.
> > > > >
> > > > > That being said, thanks for tackling these issues that my work
> > > > > triggered last release.  Much appreciated.
> > > >
> > > > Ah.  But it was your r12-324-g69e5544210e3c0 that did
> > > >
> > > > -  else if (n_insns > 1)
> > > > +  else if (!m_speed_p && n_insns > 1)
> > > >
> > > > causing the breakage on the 12 branch.  That leads to a simpler
> > > > fix I guess.  Will re-test and also backport to GCC 12 if successful.
> > >
> > > Huh.  It's been a while, but that looks like a typo.  That patch was
> > > supposed to be non-behavior changing.
> >
> > Exactly my thinking so reverting it shouldn't be a reason for
> > detailed questions.  Now, the contains_hot_bb computation is,
> 
> Sorry for the pain.

So - actually the change was probably done on purpose (even if
reverting - which I've now already done - caused no testsuite regressions).
That's because the whole function is invoked N + 1 times for a path
of length N and we definitely want to avoid using the size optimization
heuristics when the path is not complete yet.  I think the proper
way is to do

diff --git a/gcc/tree-ssa-threadbackward.cc 
b/gcc/tree-ssa-threadbackward.cc
index ba114e98a41..6979398ef76 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -767,7 +767,11 @@ back_threader_profitability::profitable_path_p (const 
vec _path,
  as in PR 78407 this leads to noticeable improvements.  */
   if (m_speed_p
   && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
- || contains_hot_bb))
+ || contains_hot_bb
+ /* Avoid using the size heuristics when not doing the final
+thread evaluation, we get called for each added BB
+to the path.  */
+ || !taken_edge))
 {
   if (n_insns >= param_max_fsm_thread_path_insns)
{

thus assume there'll be a hot BB in the future.

That said, the very best fix would be to not call this function
N + 1 times (I have a patch to call it only N times - yay), but
instead factor out parts to be called per BB plus keeping enough
state so we can incrementally collect info.

There's more "odd" things in the backward threader, of course :/

I'm looking for things applicable to the GCC 12 branch right now
so will try the above.

Richard.


[PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader

2022-08-02 Thread Richard Biener via Gcc-patches
I am trying to make sense of back_threader_profitability::profitable_path_p
and the first thing I notice is that we do

  /* Threading is profitable if the path duplicated is hot but also
 in a case we separate cold path from hot path and permit optimization
 of the hot path later.  Be on the agressive side here. In some testcases,
 as in PR 78407 this leads to noticeable improvements.  */
  if (m_speed_p
  && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
  || contains_hot_bb))
{
  if (n_insns >= param_max_fsm_thread_path_insns)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
 "the number of instructions on the path "
 "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
  return false;
}
...
}
  else if (!m_speed_p && n_insns > 1)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
 "duplication of %i insns is needed and optimizing for size.\n",
 n_insns);
  return false;
}
...
  return true;

thus we apply the n_insns >= param_max_fsm_thread_path_insns only
to "hot paths".  The comment above this isn't entirely clear on whether
this is by design ("Be on the aggressive side here ...") but I think
this is a mistake.  In fact the "hot path" check seems entirely
useless, since if the path is not hot we simply continue threading it.

This was caused by r12-324-g69e5544210e3c0 and the following simply
reverts the offending change.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-ssa-threadbackward.cc
(back_threader_profitability::profitable_path_p): Apply
size constraints to all paths again.
---
 gcc/tree-ssa-threadbackward.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 0519f2a8c4b..ba114e98a41 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -794,7 +794,7 @@ back_threader_profitability::profitable_path_p (const 
vec _path,
  return false;
}
 }
-  else if (!m_speed_p && n_insns > 1)
+  else if (n_insns > 1)
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-- 
2.35.3


Re: [RFA] Implement basic range operators to enable floating point VRP.

2022-08-02 Thread Aldy Hernandez via Gcc-patches
Alright, I know everyone's pretty busy and I am blocking myself here,
so I hereby approve my own patch, since I am the maintainer ;-).

After all, I'd like to have 5-6 days to field any possible fallout
before I disappear off to a tropical island to drink mojitos and beer.
Ok, not really-- just changing diapers, but without a laptop.

Feedback still welcome; I'll just address it as a follow-up.

Pushed.
Aldy

On Sun, Jul 31, 2022 at 8:11 PM Aldy Hernandez  wrote:
>
> PING
>
> Andrew, anyone, would you mind giving this a once over?  I realize
> reviewing ranger's range-op code is not on anyone's list of
> priorities, but I could use a sanity check.
>
> The patch is sufficiently self-contained to easily catch anything
> caused by it, and I'd like to commit earlier in the week to have
> enough time to field any possible fallout before I take a few days off
> next week.
>
> Updated patch attached.
>
> Thanks.
> Aldy
>
> On Mon, Jul 25, 2022 at 8:50 PM Aldy Hernandez  wrote:
> >
> > Without further ado, here is the implementation for floating point
> > range operators, plus the switch to enable all ranger clients to
> > handle floats.
> >
> > These are bare bone implementations good enough for relation operators
> > to work, while keeping the NAN bits up to date in the frange.  There
> > is also minimal support for keeping track of +-INF when it is obvious.
> >
> > I have included some basic tests to help get a feel of what is
> > ultimately handled.
> >
> > Since range-ops is the domain specific core of ranger, I think its
> > best if a global maintainer or an FP expert could review this.
> >
> > OK for trunk?
> >
> > Tested on x86-64 Linux.
> >
> > p.s. I haven't done extensive testing in DOM, but with this we're mighty
> > close for the forward threader there to be replaceable with the backward
> > threader, thus removing the last use of the forward threader.
> >
> > gcc/ChangeLog:
> >
> > * range-op-float.cc (finite_operands_p): New.
> > (frelop_early_resolve): New.
> > (default_frelop_fold_range): New.
> > (class foperator_equal): New.
> > (class foperator_not_equal): New.
> > (class foperator_lt): New.
> > (class foperator_le): New.
> > (class foperator_gt): New.
> > (class foperator_ge): New.
> > (class foperator_unordered): New.
> > (class foperator_ordered): New.
> > (class foperator_relop_unknown): New.
> > (floating_op_table::floating_op_table): Add above classes to
> > floating op table.
> > * value-range.h (frange::supports_p): Enable.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/opt/pr94589-2.C: Add notes.
> > * gcc.dg/tree-ssa/vrp-float-1.c: New test.
> > * gcc.dg/tree-ssa/vrp-float-11.c: New test.
> > * gcc.dg/tree-ssa/vrp-float-3.c: New test.
> > * gcc.dg/tree-ssa/vrp-float-4.c: New test.
> > * gcc.dg/tree-ssa/vrp-float-6.c: New test.
> > * gcc.dg/tree-ssa/vrp-float-7.c: New test.
> > * gcc.dg/tree-ssa/vrp-float-8.c: New test.
> > ---
> >  gcc/range-op-float.cc| 564 +++
> >  gcc/testsuite/g++.dg/opt/pr94589-2.C |  25 +
> >  gcc/testsuite/gcc.dg/tree-ssa/vrp-float-1.c  |  19 +
> >  gcc/testsuite/gcc.dg/tree-ssa/vrp-float-11.c |  26 +
> >  gcc/testsuite/gcc.dg/tree-ssa/vrp-float-3.c  |  18 +
> >  gcc/testsuite/gcc.dg/tree-ssa/vrp-float-4.c  |  16 +
> >  gcc/testsuite/gcc.dg/tree-ssa/vrp-float-6.c  |  20 +
> >  gcc/testsuite/gcc.dg/tree-ssa/vrp-float-7.c  |  14 +
> >  gcc/testsuite/gcc.dg/tree-ssa/vrp-float-8.c  |  26 +
> >  gcc/value-range.h|   3 +-
> >  10 files changed, 729 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-11.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-3.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-4.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-6.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-7.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-8.c
> >
> > diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
> > index 8e9d83e3827..d94ff6f915a 100644
> > --- a/gcc/range-op-float.cc
> > +++ b/gcc/range-op-float.cc
> > @@ -150,6 +150,50 @@ range_operator_float::op1_op2_relation (const irange 
> >  ATTRIBUTE_UNUSED) cons
> >return VREL_VARYING;
> >  }
> >
> > +// Return TRUE if OP1 and OP2 are known to be free of NANs.
> > +
> > +static inline bool
> > +finite_operands_p (const frange , const frange )
> > +{
> > +  return (flag_finite_math_only
> > + || (op1.get_nan ().no_p ()
> > + && op2.get_nan ().no_p ()));
> > +}
> > +
> > +// Floating version of relop_early_resolve that takes into account NAN
> > +// and -ffinite-math-only.
> > +
> > +inline bool
> > +frelop_early_resolve (irange 

Re: Porting the Docs to Sphinx - project status

2022-08-02 Thread Martin Liška
On 1/31/22 15:06, Martin Liška wrote:
> Hello.
> 
> It's about 5 months since the last project status update:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577108.html
> Now it's pretty clear that it won't be merged before GCC 12.1 gets released.
> 
> So where are we? I contacted the documentation maintainers (Gerald, Sandra and 
> Joseph) at the
> end of the year in a private email, where I pinged the patches. My takeaway 
> is that both
> Gerald and Joseph are fine with the porting, while Sandra has some concerns. 
> Based on her
> feedback, I was able to improve the generated PDF output significantly and 
> I'm pleased by the
> provided feedback. That led to the following 2 Sphinx pull requests that 
> need to be merged
> before we can migrate the documentation: [1], [2].
> 
> Since the last time I also made one more round of proofreading and the layout 
> was improved
> (mainly for PDF part). Current version of the documentation can be seen here:
> https://splichal.eu/scripts/sphinx/
> 
> I would like to finish the transition once GCC 12.1 gets released in May/June 
> this year.
> There are still some minor regressions, but overall the Sphinx-based 
> documentation should
> be a significant improvement over what we've got right now.
> 
> Please take this email as urgent call for a feedback!
> 
> Thank you,
> Martin
> 
> [1] https://github.com/sphinx-doc/sphinx/pull/10087
> [2] https://github.com/sphinx-doc/sphinx/pull/10001

Hello.

It's been another 5 months since the last project status update. In order to 
address some of Sandra's comments
I had to work with upstream to get a few pull requests merged. That has 
been achieved, and Sphinx 5.1.0
can build the converted documentation.

What has changed since the last time:
1) I made a couple of proof-reading rounds and the documentation should not 
contain any significant defects.
2) I switched to Furo, a more responsive HTML template that works nicely on 
mobile devices as well.
3) The number of Sphinx warnings has been reduced.
4) The configure script now supports --with-sphinx-build ([1]), which can be 
used when Sphinx is installed into a pip environment.
5) A Building Documentation chapter was added to the gccint manual ([2]).
6) I made a couple of screenshots that show how the docs are better ([3], 
Chapter 1).

I'm not planning on sending a patchset, as most of the patches would not fit 
within the email size limit, so please fetch the following
branch instead (the last 2 fixme commits will not be installed):

$ git fetch origin refs/users/marxin/heads/sphinx-v7
$ git checkout FETCH_HEAD

The generated documentation (built from GCC source based on the git branch) can 
be seen here:
https://splichal.eu/gccsphinx-final/

I would like to merge the documentation during the summer, if possible before 
bigger changes land in the fall.

Thank you,
Martin

[1] 
https://splichal.eu/gccsphinx-final/html/install/configuration.html#cmdoption-with-sphinx-build
[2] 
https://splichal.eu/gccsphinx-final/html/gccint/source-tree-structure-and-build-system/the-gcc-subdirectory/building-documentation.html
[3] https://splichal.eu/scripts/sphinx/demo/_build/latex/demo.pdf


[PATCH] Implement streamer for frange.

2022-08-02 Thread Aldy Hernandez via Gcc-patches
This patch allows us to export floating point ranges into the SSA name
(SSA_NAME_RANGE_INFO).

[Richi, in PR24021 you suggested that match.pd could use global float
ranges, because it would generally not invoke ranger.  This patch
implements the boilerplate to save the frange globally.]

[Jeff, we've also been talking in parallel of using NAN knowledge
during expansion to RTL.  This patch will provide the NAN bits in the
SSA name.]

Since frange's current implementation is just a shell, with no
actual endpoints, frange_storage_slot only contains frange_props, which
fits inside a byte.  When we have endpoints, y'all can decide if it's
worth saving them, or if the NAN/etc bits are good enough.

I've tested this on my branch which has FP-ranger enabled.  I will 
commit once I do a final round of testing on vanilla trunk.

gcc/ChangeLog:

* tree-core.h (struct tree_ssa_name): Add frange_info and
reshuffle the rest.
* value-range-storage.cc (vrange_storage::alloc_slot): Add case
for frange.
(vrange_storage::set_vrange): Same.
(vrange_storage::get_vrange): Same.
(vrange_storage::fits_p): Same.
(frange_storage_slot::alloc_slot): New.
(frange_storage_slot::set_frange): New.
(frange_storage_slot::get_frange): New.
(frange_storage_slot::fits_p): New.
* value-range-storage.h (class frange_storage_slot): New.
---
 gcc/tree-core.h| 12 
 gcc/value-range-storage.cc | 61 +-
 gcc/value-range-storage.h  | 19 
 3 files changed, 85 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index ea9f281f1cc..86a07c282af 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1589,17 +1589,17 @@ struct GTY(()) tree_ssa_name {
 
   /* Value range information.  */
   union ssa_name_info_type {
+/* Ranges for integers.  */
+struct GTY ((tag ("0"))) irange_storage_slot *irange_info;
+/* Ranges for floating point numbers.  */
+struct GTY ((tag ("1"))) frange_storage_slot *frange_info;
 /* Pointer attributes used for alias analysis.  */
-struct GTY ((tag ("0"))) ptr_info_def *ptr_info;
+struct GTY ((tag ("2"))) ptr_info_def *ptr_info;
 /* This holds any range info supported by ranger (except ptr_info
above) and is managed by vrange_storage.  */
 void * GTY ((skip)) range_info;
-/* GTY tag when the range in the range_info slot above satisfies
-   irange::supports_type_p.  */
-struct GTY ((tag ("1"))) irange_storage_slot *irange_info;
   } GTY ((desc ("%1.typed.type ?" \
-   "!POINTER_TYPE_P (TREE_TYPE ((tree)&%1)) : 2"))) info;
-
+   "(POINTER_TYPE_P (TREE_TYPE ((tree)&%1)) ? 2 : 
SCALAR_FLOAT_TYPE_P (TREE_TYPE ((tree)&%1))) : 3"))) info;
   /* Immediate uses list for this SSA_NAME.  */
   struct ssa_use_operand_t imm_uses;
 };
diff --git a/gcc/value-range-storage.cc b/gcc/value-range-storage.cc
index 8b5ab544ce3..ea3b83ca641 100644
--- a/gcc/value-range-storage.cc
+++ b/gcc/value-range-storage.cc
@@ -40,7 +40,8 @@ vrange_storage::alloc_slot (const vrange )
 
   if (is_a  (r))
 return irange_storage_slot::alloc_slot (*m_alloc, as_a  (r));
-
+  if (is_a  (r))
+return frange_storage_slot::alloc_slot (*m_alloc, as_a  (r));
   return NULL;
 }
 
@@ -55,6 +56,12 @@ vrange_storage::set_vrange (void *slot, const vrange )
   gcc_checking_assert (s->fits_p (as_a  (r)));
   s->set_irange (as_a  (r));
 }
+  else if (is_a  (r))
+{
+  frange_storage_slot *s = static_cast  (slot);
+  gcc_checking_assert (s->fits_p (as_a  (r)));
+  s->set_frange (as_a  (r));
+}
   else
 gcc_unreachable ();
 }
@@ -70,6 +77,12 @@ vrange_storage::get_vrange (const void *slot, vrange , 
tree type)
= static_cast  (slot);
   s->get_irange (as_a  (r), type);
 }
+  else if (is_a  (r))
+{
+  const frange_storage_slot *s
+   = static_cast  (slot);
+  s->get_frange (as_a  (r), type);
+}
   else
 gcc_unreachable ();
 }
@@ -85,6 +98,12 @@ vrange_storage::fits_p (const void *slot, const vrange )
= static_cast  (slot);
   return s->fits_p (as_a  (r));
 }
+  if (is_a  (r))
+{
+  const frange_storage_slot *s
+   = static_cast  (slot);
+  return s->fits_p (as_a  (r));
+}
   gcc_unreachable ();
   return false;
 }
@@ -215,3 +234,43 @@ debug (const irange_storage_slot )
   storage.dump ();
   fprintf (stderr, "\n");
 }
+
+// Implementation of frange_storage_slot.
+
+frange_storage_slot *
+frange_storage_slot::alloc_slot (vrange_allocator , const frange )
+{
+  size_t size = sizeof (frange_storage_slot);
+  frange_storage_slot *p
+= static_cast  (allocator.alloc (size));
+  new (p) frange_storage_slot (r);
+  return p;
+}
+
+void
+frange_storage_slot::set_frange (const frange )
+{
+  gcc_checking_assert (fits_p (r));
+  gcc_checking_assert (!r.undefined_p ());
+
+  m_props = r.m_props;
+}
+
+void

[PATCH] More frange::set cleanups.

2022-08-02 Thread Aldy Hernandez via Gcc-patches
Will commit pending a final round of tests.

gcc/ChangeLog:

* value-range.cc (frange::set): Initialize m_props and cleanup.
---
 gcc/value-range.cc | 47 +++---
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index dc06f8b0078..dd5a4303908 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -292,38 +292,47 @@ frange::set (tree min, tree max, value_range_kind kind)
 
   m_kind = kind;
   m_type = TREE_TYPE (min);
-
-  // Mark NANness.
-  if (real_isnan (TREE_REAL_CST_PTR (min))
-  || real_isnan (TREE_REAL_CST_PTR (max)))
-{
-  gcc_checking_assert (operand_equal_p (min, max));
-  m_props.nan_set_yes ();
-}
-  else
-m_props.nan_set_no ();
+  m_props.set_varying ();
 
   bool is_min = vrp_val_is_min (min);
   bool is_max = vrp_val_is_max (max);
+  bool is_nan = (real_isnan (TREE_REAL_CST_PTR (min))
+|| real_isnan (TREE_REAL_CST_PTR (max)));
 
-  // Mark when the endpoints can't be INF.
-  if (!is_min)
-m_props.ninf_set_no ();
-  if (!is_max)
-m_props.inf_set_no ();
+  // Ranges with a NAN and a non-NAN endpoint are nonsensical.
+  gcc_checking_assert (!is_nan || operand_equal_p (min, max));
 
-  // Mark when the endpoints are definitely INF.
+  // The properties for singletons can be all set ahead of time.
   if (operand_equal_p (min, max))
 {
+  // Set INF properties.
   if (is_min)
m_props.ninf_set_yes ();
-  else if (is_max)
+  else
+   m_props.ninf_set_no ();
+  if (is_max)
m_props.inf_set_yes ();
+  else
+   m_props.inf_set_no ();
+  // Set NAN property.
+  if (is_nan)
+   m_props.nan_set_yes ();
+  else
+   m_props.nan_set_no ();
+}
+  else
+{
+  // Mark when the endpoints can't be +-INF.
+  if (!is_min)
+   m_props.ninf_set_no ();
+  if (!is_max)
+   m_props.inf_set_no ();
 }
 
   // Check for swapped ranges.
-  gcc_checking_assert (m_props.nan_yes_p ()
-  || tree_compare (LE_EXPR, min, max));
+  gcc_checking_assert (is_nan || tree_compare (LE_EXPR, min, max));
+
+  normalize_kind ();
 
   if (flag_checking)
 verify_range ();
-- 
2.37.1
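To make the property handling in the patch above easier to follow, here is a self-contained model of the endpoint classification that frange::set performs: for a singleton all three properties (NINF, INF, NAN) are decided outright, while for a proper range only the impossible infinities are ruled out. The enum, struct, and function names below are illustrative stand-ins, not the actual frange API.

```cpp
#include <cmath>
#include <limits>

enum tri { NO, YES, VARYING };     // model of a tristate frange property
struct props { tri ninf, inf, nan; };

static props
classify (double min, double max)
{
  props p = { VARYING, VARYING, VARYING };
  bool is_min = (min == -std::numeric_limits<double>::infinity ());
  bool is_max = (max == std::numeric_limits<double>::infinity ());
  bool is_nan = std::isnan (min) || std::isnan (max);

  if (min == max || is_nan)
    {
      // Singleton: every property can be set ahead of time.
      p.ninf = is_min ? YES : NO;
      p.inf  = is_max ? YES : NO;
      p.nan  = is_nan ? YES : NO;
    }
  else
    {
      // Proper range: only mark endpoints that can't be +-INF.
      if (!is_min) p.ninf = NO;
      if (!is_max) p.inf = NO;
    }
  return p;
}
```

For example, classify (0, inf) leaves both the INF and NAN properties VARYING, mirroring how the rewritten frange::set only rules things out for non-singleton ranges.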



[PATCH] Limit ranger query in ipa-prop.cc to integrals.

2022-08-02 Thread Aldy Hernandez via Gcc-patches
ipa-* still works on legacy value_range's, which only support
integrals.  This patch limits the query to integrals, so as not to get a
floating point range that can't exist in an irange.

Will commit pending a final round of tests.

gcc/ChangeLog:

* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Limit ranger
query to integrals.
---
 gcc/ipa-prop.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index fb8f97397dc..ca5b9f31570 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -2303,6 +2303,10 @@ ipa_compute_jump_functions_for_edge (struct 
ipa_func_body_info *fbi,
{
  if (TREE_CODE (arg) == SSA_NAME
  && param_type
+ /* Limit the ranger query to integral types as the rest
+of this file uses value_range's, which only hold
+integers and pointers.  */
+ && irange::supports_p (TREE_TYPE (arg))
  && get_range_query (cfun)->range_of_expr (vr, arg)
  && !vr.undefined_p ())
{
-- 
2.37.1



Re: [PATCH] configure: respect --with-build-time-tools [PR43301]

2022-08-02 Thread Eric Gallager via Gcc-patches
On Tue, Aug 2, 2022 at 1:24 AM Alexandre Oliva  wrote:
>
> Hello, Eric,
>
> Thanks for looking into this.
>
> On Aug  1, 2022, Eric Gallager via Gcc-patches  
> wrote:
>
> >> This just reassigns the value that was retrieved a couple of lines
> >> above from the very same variable.
>
> > Oh, ok, so I guess this isn't necessary after all?
>
> Yeah, I don't see how this patch could make any difference as to the
> reported problem.
>
> > In which case we can just close 43301 as INVALID then?
>
> AFAICT, with_build_time_tools is dealt with in the top level configure,
> setting up *_FOR_TARGET after searching for the tool names in the
> specified location.
>
> However, there's a potentially confusing consequence of the current
> code: gcc/configure looks for ./as$build_exeext in the build tree, and
> uses that without overwriting it if found, so if an earlier configure
> run created an 'as' script, a reconfigure will just use it, without
> creating the file again, even if it would have changed
> ORIGINAL_AS_FOR_TARGET in it.
>
> I suppose the patch was tested by the original submitter on a clean
> build tree, so it would *appear* to have made a difference in fixing the
> problem, while it was actually just a no-op, and the apparent fix was a
> consequence of the clean build tree.
>
> So, the patch is not useful, but we may want to avoid the confusing
> scenario somehow.
>
> I suppose the point of not creating such a tiny script again is not to
> avoid unnecessary rebuilding of dependencies (I don't even see how
> dependencies on the script would come into play), so creating it again
> wouldn't hurt.  However, we wish to avoid overwriting an assembler
> copied into the build tree for use by gcc during the build.  Perhaps:
>
> -elif test -x as$build_exeext; then
> +elif test -x as$build_exeext \
> +   && { test "x$build_exeext" != "x" \
> +|| test "x`grep '^# Invoke as, ld or nm from the build tree' \
> + as`" = "x"; }; then
>
> WDYT?

Hi, thanks for the feedback; I'm a bit confused, though: where exactly
would this proposed change go?

>
> --
> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 
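As a hedged illustration of Alexandre's proposed check, the following stand-alone sketch simulates a build tree where an earlier configure run left a wrapper 'as' script behind, and shows that the marker-line grep keeps such a script from being mistaken for a real assembler. The wrapper contents and paths here are hypothetical; only the marker comment matches the proposal.

```shell
# Simulate an earlier configure run that left a wrapper 'as' script in
# the build tree, then apply the proposed detection logic.
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf '#!/bin/sh\n# Invoke as, ld or nm from the build tree\nexec /usr/bin/as "$@"\n' > as
chmod +x as
build_exeext=
if test -x as$build_exeext \
   && { test "x$build_exeext" != "x" \
        || test "x`grep '^# Invoke as, ld or nm from the build tree' as`" = "x"; }; then
  echo "real assembler: reuse ./as"
else
  echo "wrapper script: regenerate ./as"
fi
cd /
rm -rf "$tmpdir"
```

With an empty $build_exeext the grep finds the marker, the compound test fails, and configure would fall through to regenerating the script instead of reusing it.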


Re: [COMMITTED] Make irange dependency explicit for range_of_ssa_name_with_loop_info.

2022-08-02 Thread Richard Biener via Gcc-patches
On Tue, Aug 2, 2022 at 2:11 PM Aldy Hernandez  wrote:
>
> On Tue, Aug 2, 2022 at 2:07 PM Richard Biener
>  wrote:
> >
> > On Tue, Aug 2, 2022 at 1:41 PM Aldy Hernandez  wrote:
> > >
> > > On Tue, Aug 2, 2022 at 9:19 AM Richard Biener
> > >  wrote:
> > > >
> > > > On Mon, Aug 1, 2022 at 8:17 AM Aldy Hernandez via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > Even though ranger is type agnostic, SCEV seems to only work with
> > > > > integers.  This patch removes some FIXME notes making it explicit that
> > > > > bounds_of_var_in_loop only works with iranges.
> > > >
> > > > SCEV also handles floats, where do you see this failing?  Yes, support 
> > > > is
> > > > somewhat limited, but still.
> > >
> > > Oh, it does?  We could convert  range_of_ssa_name_with_loop_info to
> > > the type agnostic vrange if so... (see attached untested patch).
> > >
> > > I'm looking at the testcase for 24021 with the attached patch, and
> > > bounds_of_var_in_loop() is returning false because SCEV couldn't
> > > figure out the direction of the loop:
> > >
> > >   dir = scev_direction (chrec);
> > >   if (/* Do not adjust ranges if we do not know whether the iv increases
> > >  or decreases,  ... */
> > >   dir == EV_DIR_UNKNOWN
> > >   /* ... or if it may wrap.  */
> > >   || scev_probably_wraps_p (NULL_TREE, init, step, stmt,
> > > get_chrec_loop (chrec), true))
> > > return false;
> > >
> > > The chrec in question is rather simple... a loop increasing by 0.01:
> > >
> > > (gdb) p debug(chrec)
> > > {1.2081668171172168513294309377670288085938e-2, +,
> > > 1.0001490116119384765625e-1}_1
> >
> > Well, yeah - I meant it "supports" floats in that it can analyze the CHRECs 
> > (you
> > quote that) and it shouldn't ICE anywhere.  But of course some things may
> > return "I don't know" when not implemented.  Like scev_direction which has
> >
> >   step = CHREC_RIGHT (chrec);
> >   if (TREE_CODE (step) != INTEGER_CST)
> > return EV_DIR_UNKNOWN;
> >
> >   if (tree_int_cst_sign_bit (step))
> > return EV_DIR_DECREASES;
> >   else
> > return EV_DIR_GROWS;
> >
> > so it lacks support for REAL_CST step.  Would be interesting to see what 
> > happens
> > with a loop with -0.0 "increment" and honoring signed zeros.  That said,
> > inspecting the sign bit of a REAL_CST and handling zero (of any sign) as
> > EV_DIR_UNKNOWN would probably work.
> >
> > > I also see that bounds_of_var_in_loop() calls niter's
> > > max_loop_iterations(), which if I understand things correctly, bails
> > > at several points if the step is not integral.
> >
> > Yes, a lot of code is "simplified" by not considering FP IVs.  But it 
> > "works"
> > in terms of "it doesn't ICE" ;)
>
> That's a very generous interpretation of "works" :-).

;)

> In that case I will make our code type agnostic, with the previously
> attached patch (after testing).  Then whenever SCEV and
> bounds_of_var_in_loop (which also seems to be integer centric) support
> floats, everything will just work.

Sounds good.

Richard.

> Thanks.
> Aldy
>


Re: [COMMITTED] Make irange dependency explicit for range_of_ssa_name_with_loop_info.

2022-08-02 Thread Aldy Hernandez via Gcc-patches
On Tue, Aug 2, 2022 at 2:07 PM Richard Biener
 wrote:
>
> On Tue, Aug 2, 2022 at 1:41 PM Aldy Hernandez  wrote:
> >
> > On Tue, Aug 2, 2022 at 9:19 AM Richard Biener
> >  wrote:
> > >
> > > On Mon, Aug 1, 2022 at 8:17 AM Aldy Hernandez via Gcc-patches
> > >  wrote:
> > > >
> > > > Even though ranger is type agnostic, SCEV seems to only work with
> > > > integers.  This patch removes some FIXME notes making it explicit that
> > > > bounds_of_var_in_loop only works with iranges.
> > >
> > > SCEV also handles floats, where do you see this failing?  Yes, support is
> > > somewhat limited, but still.
> >
> > Oh, it does?  We could convert  range_of_ssa_name_with_loop_info to
> > the type agnostic vrange if so... (see attached untested patch).
> >
> > I'm looking at the testcase for 24021 with the attached patch, and
> > bounds_of_var_in_loop() is returning false because SCEV couldn't
> > figure out the direction of the loop:
> >
> >   dir = scev_direction (chrec);
> >   if (/* Do not adjust ranges if we do not know whether the iv increases
> >  or decreases,  ... */
> >   dir == EV_DIR_UNKNOWN
> >   /* ... or if it may wrap.  */
> >   || scev_probably_wraps_p (NULL_TREE, init, step, stmt,
> > get_chrec_loop (chrec), true))
> > return false;
> >
> > The chrec in question is rather simple... a loop increasing by 0.01:
> >
> > (gdb) p debug(chrec)
> > {1.2081668171172168513294309377670288085938e-2, +,
> > 1.0001490116119384765625e-1}_1
>
> Well, yeah - I meant it "supports" floats in that it can analyze the CHRECs 
> (you
> quote that) and it shouldn't ICE anywhere.  But of course some things may
> return "I don't know" when not implemented.  Like scev_direction which has
>
>   step = CHREC_RIGHT (chrec);
>   if (TREE_CODE (step) != INTEGER_CST)
> return EV_DIR_UNKNOWN;
>
>   if (tree_int_cst_sign_bit (step))
> return EV_DIR_DECREASES;
>   else
> return EV_DIR_GROWS;
>
> so it lacks support for REAL_CST step.  Would be interesting to see what 
> happens
> with a loop with -0.0 "increment" and honoring signed zeros.  That said,
> inspecting the sign bit of a REAL_CST and handling zero (of any sign) as
> EV_DIR_UNKNOWN would probably work.
>
> > I also see that bounds_of_var_in_loop() calls niter's
> > max_loop_iterations(), which if I understand things correctly, bails
> > at several points if the step is not integral.
>
> Yes, a lot of code is "simplified" by not considering FP IVs.  But it "works"
> in terms of "it doesn't ICE" ;)

That's a very generous interpretation of "works" :-).

In that case I will make our code type agnostic, with the previously
attached patch (after testing).  Then whenever SCEV and
bounds_of_var_in_loop (which also seems to be integer centric) support
floats, everything will just work.

Thanks.
Aldy



Re: [COMMITTED] Make irange dependency explicit for range_of_ssa_name_with_loop_info.

2022-08-02 Thread Richard Biener via Gcc-patches
On Tue, Aug 2, 2022 at 1:41 PM Aldy Hernandez  wrote:
>
> On Tue, Aug 2, 2022 at 9:19 AM Richard Biener
>  wrote:
> >
> > On Mon, Aug 1, 2022 at 8:17 AM Aldy Hernandez via Gcc-patches
> >  wrote:
> > >
> > > Even though ranger is type agnostic, SCEV seems to only work with
> > > integers.  This patch removes some FIXME notes making it explicit that
> > > bounds_of_var_in_loop only works with iranges.
> >
> > SCEV also handles floats, where do you see this failing?  Yes, support is
> > somewhat limited, but still.
>
> Oh, it does?  We could convert  range_of_ssa_name_with_loop_info to
> the type agnostic vrange if so... (see attached untested patch).
>
> I'm looking at the testcase for 24021 with the attached patch, and
> bounds_of_var_in_loop() is returning false because SCEV couldn't
> figure out the direction of the loop:
>
>   dir = scev_direction (chrec);
>   if (/* Do not adjust ranges if we do not know whether the iv increases
>  or decreases,  ... */
>   dir == EV_DIR_UNKNOWN
>   /* ... or if it may wrap.  */
>   || scev_probably_wraps_p (NULL_TREE, init, step, stmt,
> get_chrec_loop (chrec), true))
> return false;
>
> The chrec in question is rather simple... a loop increasing by 0.01:
>
> (gdb) p debug(chrec)
> {1.2081668171172168513294309377670288085938e-2, +,
> 1.0001490116119384765625e-1}_1

Well, yeah - I meant it "supports" floats in that it can analyze the CHRECs (you
quote that) and it shouldn't ICE anywhere.  But of course some things may
return "I don't know" when not implemented.  Like scev_direction which has

  step = CHREC_RIGHT (chrec);
  if (TREE_CODE (step) != INTEGER_CST)
return EV_DIR_UNKNOWN;

  if (tree_int_cst_sign_bit (step))
return EV_DIR_DECREASES;
  else
return EV_DIR_GROWS;

so it lacks support for REAL_CST step.  Would be interesting to see what happens
with a loop with -0.0 "increment" and honoring signed zeros.  That said,
inspecting the sign bit of a REAL_CST and handling zero (of any sign) as
EV_DIR_UNKNOWN would probably work.
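That suggestion could be modeled along these lines, using a plain double in place of GCC's REAL_CST. This is a sketch only; the names below do not reflect the actual tree-chrec interfaces.

```cpp
#include <cmath>

enum ev_direction { EV_DIR_GROWS, EV_DIR_DECREASES, EV_DIR_UNKNOWN };

// Classify a floating-point IV step.  A NaN step, or a zero step of
// either sign, is reported as unknown -- which keeps -0.0 safe when
// signed zeros are honored; otherwise the sign bit decides.
static ev_direction
fp_step_direction (double step)
{
  if (std::isnan (step) || step == 0.0)  // +0.0 and -0.0 both compare equal to 0
    return EV_DIR_UNKNOWN;
  return std::signbit (step) ? EV_DIR_DECREASES : EV_DIR_GROWS;
}
```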

> I also see that bounds_of_var_in_loop() calls niter's
> max_loop_iterations(), which if I understand things correctly, bails
> at several points if the step is not integral.

Yes, a lot of code is "simplified" by not considering FP IVs.  But it "works"
in terms of "it doesn't ICE" ;)

> I'd be happy to adapt our code, if SCEV can give me useful information.
>
> Aldy


Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader

2022-08-02 Thread Aldy Hernandez via Gcc-patches
On Tue, Aug 2, 2022 at 1:59 PM Richard Biener  wrote:
>
> On Tue, 2 Aug 2022, Aldy Hernandez wrote:
>
> > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener  wrote:
> > >
> > > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> > >
> > > > Unfortunately, this was before my time, so I don't know.
> > > >
> > > > That being said, thanks for tackling these issues that my work
> > > > triggered last release.  Much appreciated.
> > >
> > > Ah.  But it was your r12-324-g69e5544210e3c0 that did
> > >
> > > -  else if (n_insns > 1)
> > > +  else if (!m_speed_p && n_insns > 1)
> > >
> > > causing the breakage on the 12 branch.  That leads to a simpler
> > > fix I guess.  Will re-test and also backport to GCC 12 if successful.
> >
> > Huh.  It's been a while, but that looks like a typo.  That patch was
> > supposed to be non-behavior changing.
>
> Exactly my thinking so reverting it shouldn't be a reason for
> detailed questions.  Now, the contains_hot_bb computation is,

Sorry for the pain.

Thanks for taking care of this.
Aldy

> that one was introduced by Honza in r7-6476-g0f0c2cc3a17efa
> together with the comment and a testcase.
>
> So - Honza, what was the reasoning to look at raw BB counts here
> rather than for example the path entry edge count?
>
> Richard.
>



Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader

2022-08-02 Thread Richard Biener via Gcc-patches
On Tue, 2 Aug 2022, Aldy Hernandez wrote:

> On Tue, Aug 2, 2022 at 1:45 PM Richard Biener  wrote:
> >
> > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> >
> > > Unfortunately, this was before my time, so I don't know.
> > >
> > > That being said, thanks for tackling these issues that my work
> > > triggered last release.  Much appreciated.
> >
> > Ah.  But it was your r12-324-g69e5544210e3c0 that did
> >
> > -  else if (n_insns > 1)
> > +  else if (!m_speed_p && n_insns > 1)
> >
> > causing the breakage on the 12 branch.  That leads to a simpler
> > fix I guess.  Will re-test and also backport to GCC 12 if successful.
> 
> Huh.  It's been a while, but that looks like a typo.  That patch was
> supposed to be non-behavior changing.

Exactly my thinking so reverting it shouldn't be a reason for
detailed questions.  Now, the contains_hot_bb computation is,
that one was introduced by Honza in r7-6476-g0f0c2cc3a17efa
together with the comment and a testcase.

So - Honza, what was the reasoning to look at raw BB counts here
rather than for example the path entry edge count?

Richard.


[PATCH take #2] Some additional zero-extension related optimizations in simplify-rtx.

2022-08-02 Thread Roger Sayle

Many thanks to Segher and Richard for pointing out that my removal
of optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version
of this patch was incorrect, and my assumption that these would be 
subsumed by val_signbit_known_clear_p was mistaken.  That the
tests for ABS and FFS looked out of place was not an indication that
they were not required, but rather that we were missing simplifications for
the related SS_ABS, PARITY, POPCOUNT, CLRSB, CLZ and CTZ etc.
To make up for this mistake, in this revised patch I've not only restored
the tests for ABS and FFS, but also added the many sibling RTX codes
that I'd also expect to see optimized here, such as ABS(PARITY(x)).

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?
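As a hedged C++-level analogue of the PR 71775 pattern these simplifications target (the real transformation happens on the RTL form, not in source code): on x86-64 a nonzero input gives __builtin_ctzl a result in [0, 63], so its sign bit is clear and the truncate-then-sign-extend round trip is removable.

```cpp
#include <cassert>

// The count is at most 63, so its sign bit is clear: truncating to
// 32 bits and sign-extending back to 64 bits changes nothing.
long
widened_ctz (unsigned long x)
{
  assert (x != 0);                        // ctz is undefined at zero here
  int narrow = (int) __builtin_ctzl (x);  // the truncating SUBREG
  return (long) narrow;                   // the sign_extend the patch removes
}
```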


2022-08-02  Roger Sayle  
Segher Boessenkool  
Richard Sandiford  

gcc/ChangeLog
	* simplify-rtx.cc (simplify_unary_operation_1) : Add
optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS, CLZ, CTZ
and LSHIFTRT that are all positive to complement the existing
FFS and (idempotent) ABS simplifications.
: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
val_signbit_known_clear_p is true of the operand.
Simplify sign extensions of SUBREG truncations of operands
that are already suitably (zero) extended.
: Simplify zero extensions of SUBREG truncations
of operands that are already suitably zero extended.

Thanks in advance,
Roger
--

> -Original Message-
> From: Richard Sandiford 
> Sent: 02 August 2022 10:39
> To: Roger Sayle 
> Cc: gcc-patches@gcc.gnu.org; 'Segher Boessenkool'
> 
> Subject: Re: [PATCH] Some additional zero-extension related optimizations
in
> simplify-rtx.
> 
> "Roger Sayle"  writes:
> > This patch implements some additional zero-extension and
> > sign-extension related optimizations in simplify-rtx.cc.  The original
> > motivation comes from PR rtl-optimization/71775, where in comment #2
> Andrew Pinski sees:
> >
> > Failed to match this instruction:
> > (set (reg:DI 88 [ _1 ])
> > (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
> >
> > On many platforms the result of DImode CTZ is constrained to be a
> > small unsigned integer (between 0 and 64), hence the truncation to
> > 32-bits (using a SUBREG) and the following sign extension back to
> > 64-bits are effectively a no-op, so the above should ideally (often)
> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
> >
> > To implement this, and some closely related transformations, we build
> > upon the existing val_signbit_known_clear_p predicate.  In the first
> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
> > bit set, so the simplification of of ABS (ABS (x)) and ABS (FFS (x))
> > can itself be simplified.
> 
> I think I misunderstood, but just in case: RTL ABS is well-defined for the
minimum
> integer (giving back the minimum integer), so we can't assume that ABS
leaves
> the sign bit clear.
> 
> Thanks,
> Richard
> 
> > The second transformation is that we can canonicalize SIGN_EXTEND to
> > ZERO_EXTEND (as in the PR 71775 case above) when the operand's
> > sign-bit is known to be clear.  The final two chunks are for
> > SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a truncating
> > SUBREG respectively.  The nonzero_bits of a truncating SUBREG
> > pessimistically thinks that the upper bits may have an arbitrary value
> > (by taking the SUBREG), so we need look deeper at the SUBREG's operand
> > to confirm that the high bits are known to be zero.
> >
> > Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
> > default architecture options is undefined at zero, so we can't be sure
> > the upper bits of reg:DI 88 will be sign extended (all zeros or all
ones).
> > nonzero_bits knows this, so the above transformations don't trigger,
> > but the transformations themselves are perfectly valid for other
> > operations such as FFS, POPCOUNT and PARITY, and on other
> > targets/-march settings where CTZ is defined at zero.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Testing with CSiBE shows these transformations
> > trigger on several source files (and with -Os reduces the size of the
> > code).  Ok for mainline?
> >
> >
> > 2022-07-27  Roger Sayle  
> >
> > gcc/ChangeLog
> > * simplify-rtx.cc (simplify_unary_operation_1) <ABS>: Simplify
> > test as both FFS and ABS result in nonzero_bits returning a
> > mask that satisfies val_signbit_known_clear_p.
> > <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
> > val_signbit_known_clear_p is true of the operand.
> > Simplify sign extensions of SUBREG truncations of operands
> > that are 
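As an aside for readers following the thread, the identity under discussion is easy to check at the source level. The sketch below is a hypothetical standalone illustration (not GCC internals): for a nonzero 64-bit input, ctz yields a value in [0, 63], so truncating the result to 32 bits and sign-extending it back is a no-op, and the sign extension is equally a zero extension.

```c
#include <stdint.h>

/* (sign_extend:DI (subreg:SI (ctz:DI x) 0)) modeled in C: truncate the
   64-bit ctz result to 32 bits, then sign-extend back to 64 bits.
   Input must be nonzero so ctz is well defined.  */
int64_t ctz_sext (uint64_t x)
{
  int32_t truncated = (int32_t) __builtin_ctzll (x); /* subreg:SI */
  return (int64_t) truncated;                        /* sign_extend:DI */
}

/* The simplified form: just the ctz, no truncation/extension pair.  */
int64_t ctz_plain (uint64_t x)
{
  return (int64_t) __builtin_ctzll (x);
}
```

Both functions agree for every nonzero input, which is exactly why the extension pair can be dropped when the operation's result range is known.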

Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader

2022-08-02 Thread Aldy Hernandez via Gcc-patches
On Tue, Aug 2, 2022 at 1:45 PM Richard Biener  wrote:
>
> On Tue, 2 Aug 2022, Aldy Hernandez wrote:
>
> > Unfortunately, this was before my time, so I don't know.
> >
> > That being said, thanks for tackling these issues that my work
> > triggered last release.  Much appreciated.
>
> Ah.  But it was your r12-324-g69e5544210e3c0 that did
>
> -  else if (n_insns > 1)
> +  else if (!m_speed_p && n_insns > 1)
>
> causing the breakage on the 12 branch.  That leads to a simpler
> fix I guess.  Will re-test and also backport to GCC 12 if successful.

Huh.  It's been a while, but that looks like a typo.  That patch was
supposed to be non-behavior changing.

Aldy



[PATCH] autopar TLC

2022-08-02 Thread Richard Biener via Gcc-patches
The following removes all excessive update_ssa calls from OMP
expansion, thereby rewriting the atomic load and store cases to
GIMPLE code generation.  I don't think autopar ever exercises the
atomics code though.

There's not much test coverage overall so I've built SPEC 2k17
with -floop-parallelize-all -ftree-parallelize-loops=2 with and
without LTO (and otherwise -Ofast plus -march=haswell) without
fallout.

If there's any fallout it's not OK to update SSA form for
each and every OMP stmt lowered.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Any objections?

Thanks,
Richard.

* omp-expand.cc (expand_omp_atomic_load): Emit GIMPLE
directly.  Avoid update_ssa when in SSA form.
(expand_omp_atomic_store): Likewise.
(expand_omp_atomic_fetch_op): Avoid update_ssa when in SSA
form.
(expand_omp_atomic_pipeline): Likewise.
(expand_omp_atomic_mutex): Likewise.
* tree-parloops.cc (gen_parallel_loop): Use
TODO_update_ssa_no_phi after loop_version.
---
 gcc/omp-expand.cc| 81 +++-
 gcc/tree-parloops.cc |  2 +-
 2 files changed, 50 insertions(+), 33 deletions(-)

diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index 64e6308fc7b..48fbd157c6e 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -8617,7 +8617,7 @@ expand_omp_atomic_load (basic_block load_bb, tree addr,
   basic_block store_bb;
   location_t loc;
   gimple *stmt;
-  tree decl, call, type, itype;
+  tree decl, type, itype;
 
   gsi = gsi_last_nondebug_bb (load_bb);
   stmt = gsi_stmt (gsi);
@@ -8637,23 +8637,33 @@ expand_omp_atomic_load (basic_block load_bb, tree addr,
   itype = TREE_TYPE (TREE_TYPE (decl));
 
   enum omp_memory_order omo = gimple_omp_atomic_memory_order (stmt);
-  tree mo = build_int_cst (NULL, omp_memory_order_to_memmodel (omo));
-  call = build_call_expr_loc (loc, decl, 2, addr, mo);
+  tree mo = build_int_cst (integer_type_node,
+  omp_memory_order_to_memmodel (omo));
+  gcall *call = gimple_build_call (decl, 2, addr, mo);
+  gimple_set_location (call, loc);
+  gimple_set_vuse (call, gimple_vuse (stmt));
+  gimple *repl;
   if (!useless_type_conversion_p (type, itype))
-call = fold_build1_loc (loc, VIEW_CONVERT_EXPR, type, call);
-  call = build2_loc (loc, MODIFY_EXPR, void_type_node, loaded_val, call);
-
-  force_gimple_operand_gsi (&gsi, call, true, NULL_TREE, true, GSI_SAME_STMT);
-  gsi_remove (&gsi, true);
+{
+  tree lhs = make_ssa_name (itype);
+  gimple_call_set_lhs (call, lhs);
+  gsi_insert_before (&gsi, call, GSI_SAME_STMT);
+  repl = gimple_build_assign (loaded_val,
+ build1 (VIEW_CONVERT_EXPR, type, lhs));
+  gimple_set_location (repl, loc);
+}
+  else
+{
+  gimple_call_set_lhs (call, loaded_val);
+  repl = call;
+}
+  gsi_replace (&gsi, repl, true);
 
   store_bb = single_succ (load_bb);
   gsi = gsi_last_nondebug_bb (store_bb);
   gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_ATOMIC_STORE);
  gsi_remove (&gsi, true);
 
-  if (gimple_in_ssa_p (cfun))
-update_ssa (TODO_update_ssa_no_phi);
-
   return true;
 }
 
@@ -8669,7 +8679,7 @@ expand_omp_atomic_store (basic_block load_bb, tree addr,
   basic_block store_bb = single_succ (load_bb);
   location_t loc;
   gimple *stmt;
-  tree decl, call, type, itype;
+  tree decl, type, itype;
   machine_mode imode;
   bool exchange;
 
@@ -8710,25 +8720,36 @@ expand_omp_atomic_store (basic_block load_bb, tree addr,
   if (!useless_type_conversion_p (itype, type))
 stored_val = fold_build1_loc (loc, VIEW_CONVERT_EXPR, itype, stored_val);
   enum omp_memory_order omo = gimple_omp_atomic_memory_order (stmt);
-  tree mo = build_int_cst (NULL, omp_memory_order_to_memmodel (omo));
-  call = build_call_expr_loc (loc, decl, 3, addr, stored_val, mo);
+  tree mo = build_int_cst (integer_type_node,
+  omp_memory_order_to_memmodel (omo));
+  stored_val = force_gimple_operand_gsi (&gsi, stored_val, true, NULL_TREE,
+					 true, GSI_SAME_STMT);
+  gcall *call = gimple_build_call (decl, 3, addr, stored_val, mo);
+  gimple_set_location (call, loc);
+  gimple_set_vuse (call, gimple_vuse (stmt));
+  gimple_set_vdef (call, gimple_vdef (stmt));
+
+  gimple *repl = call;
   if (exchange)
 {
   if (!useless_type_conversion_p (type, itype))
-   call = build1_loc (loc, VIEW_CONVERT_EXPR, type, call);
-  call = build2_loc (loc, MODIFY_EXPR, void_type_node, loaded_val, call);
+   {
+ tree lhs = make_ssa_name (itype);
+ gimple_call_set_lhs (call, lhs);
+	 gsi_insert_before (&gsi, call, GSI_SAME_STMT);
+ repl = gimple_build_assign (loaded_val,
+ build1 (VIEW_CONVERT_EXPR, type, lhs));
+ gimple_set_location  (repl, loc);
+   }
+  else
+   gimple_call_set_lhs (call, loaded_val);
 }
-
-  force_gimple_operand_gsi (&gsi, call, true, 
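For context, the calls the patch builds correspond at the source level to the GCC `__atomic` builtins with an explicit memory-order argument. The following is a hedged standalone sketch (hypothetical helper names; it only mirrors the shape of the two- and three-operand calls, not the GIMPLE machinery):

```c
#include <stdint.h>

/* Mirrors the two-operand call (address, memmodel) that
   expand_omp_atomic_load builds for an OMP atomic read.  */
uint64_t atomic_read (uint64_t *addr)
{
  return __atomic_load_n (addr, __ATOMIC_SEQ_CST);
}

/* Mirrors the three-operand call (address, value, memmodel) that
   expand_omp_atomic_store builds for an OMP atomic write.  */
void atomic_write (uint64_t *addr, uint64_t val)
{
  __atomic_store_n (addr, val, __ATOMIC_SEQ_CST);
}
```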

Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader

2022-08-02 Thread Richard Biener via Gcc-patches
On Tue, 2 Aug 2022, Aldy Hernandez wrote:

> Unfortunately, this was before my time, so I don't know.
> 
> That being said, thanks for tackling these issues that my work
> triggered last release.  Much appreciated.

Ah.  But it was your r12-324-g69e5544210e3c0 that did

-  else if (n_insns > 1)
+  else if (!m_speed_p && n_insns > 1)

causing the breakage on the 12 branch.  That leads to a simpler
fix I guess.  Will re-test and also backport to GCC 12 if successful.

Richard.

> Aldy
> 
> On Tue, Aug 2, 2022 at 10:41 AM Richard Biener  wrote:
> >
> > I am trying to make sense of back_threader_profitability::profitable_path_p
> > and the first thing I notice is that we do
> >
> >   /* Threading is profitable if the path duplicated is hot but also
> >  in a case we separate cold path from hot path and permit optimization
> >  of the hot path later.  Be on the agressive side here. In some 
> > testcases,
> >  as in PR 78407 this leads to noticeable improvements.  */
> >   if (m_speed_p
> >   && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
> >   || contains_hot_bb))
> > {
> >   if (n_insns >= param_max_fsm_thread_path_insns)
> > {
> >   if (dump_file && (dump_flags & TDF_DETAILS))
> > fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> >  "the number of instructions on the path "
> >  "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
> >   return false;
> > }
> > ...
> > }
> >   else if (!m_speed_p && n_insns > 1)
> > {
> >   if (dump_file && (dump_flags & TDF_DETAILS))
> > fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> >  "duplication of %i insns is needed and optimizing for 
> > size.\n",
> >  n_insns);
> >   return false;
> > }
> > ...
> >   return true;
> >
> > thus we apply the n_insns >= param_max_fsm_thread_path_insns only
> > to "hot paths".  The comment above this isn't entirely clear whether
> > this is by design ("Be on the aggressive side here ...") but I think
> > this is a mistake.  In fact the "hot path" check seems entirely
> > useless since if the path is not hot we simply continue threading it.
> >
> > I have my reservations about how we compute hot (contains_hot_bb
> > in particular), but the following first refactors the above to apply
> > the size constraints always and then _not_ threading if the path
> > is not considered hot (but allow threading if n_insns <= 1 as with
> > the !m_speed_p case).
> >
> > As for contains_hot_bb - it might be that this consciously captures
> > the case where we separate a cold from a hot path even though the
> > threaded path itself is cold.  Consider
> >
> >A
> >   / \ (unlikely)
> >  B   C
> >   \ /
> >D
> >   / \
> >  ..  abort()
> >
> > when we want to thread A -> B -> D -> abort () and A (or D)
> > has a hot BB count then we have contains_hot_bb even though
> > the counts on the path itself are small.  In fact when we
> > thread the only relevant count for the resulting threaded
> > path is the count of A with the A->C probability applied
> > (that should also the count to subtract from the blocks
> > we copied - sth missing for the backwards threader as well).
> >
> > So I'm wondering how the logic computing contains_hot_bb
> > relates to the above comment before the costing block.
> > Anyone remembers?
> >
> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> >
> > * tree-ssa-threadbackward.cc
> > (back_threader_profitability::profitable_path_p): Apply
> > size constraints to all paths.  Do not thread cold paths.
> > ---
> >  gcc/tree-ssa-threadbackward.cc | 53 +-
> >  1 file changed, 33 insertions(+), 20 deletions(-)
> >
> > diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> > index 0519f2a8c4b..a887568032b 100644
> > --- a/gcc/tree-ssa-threadbackward.cc
> > +++ b/gcc/tree-ssa-threadbackward.cc
> > @@ -761,22 +761,43 @@ back_threader_profitability::profitable_path_p (const 
> > vec<basic_block> &m_path,
> > *creates_irreducible_loop = true;
> >  }
> >
> > -  /* Threading is profitable if the path duplicated is hot but also
> > +  const int max_cold_insns = 1;
> > +  if (!m_speed_p && n_insns > max_cold_insns)
> > +{
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +   fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> > +"duplication of %i insns is needed and optimizing for 
> > size.\n",
> > +n_insns);
> > +  return false;
> > +}
> > +  else if (n_insns >= param_max_fsm_thread_path_insns)
> > +{
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +   fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> > +"the number of instructions on the path "
> > +"exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
> > +
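The reordered gating in the quoted diff can be modeled as a small standalone predicate (hypothetical names and parameters, purely illustrative of the control flow: size limits are applied to every path first, and hotness is only consulted afterwards):

```c
#include <stdbool.h>

/* Model of the restructured profitable_path_p gating.  A path of at
   most max_cold_insns instructions is always acceptable; longer paths
   must fit the size parameter and, when optimizing for speed, be hot.  */
static const int max_cold_insns = 1;

bool profitable (bool speed_p, int n_insns, bool hot_path,
                 int max_path_insns)
{
  if (!speed_p && n_insns > max_cold_insns)
    return false;                       /* optimizing for size */
  if (n_insns >= max_path_insns)
    return false;                       /* too long, hot or not */
  return n_insns <= max_cold_insns || hot_path;
}
```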

Re: [COMMITTED] Make irange dependency explicit for range_of_ssa_name_with_loop_info.

2022-08-02 Thread Aldy Hernandez via Gcc-patches
On Tue, Aug 2, 2022 at 9:19 AM Richard Biener
 wrote:
>
> On Mon, Aug 1, 2022 at 8:17 AM Aldy Hernandez via Gcc-patches
>  wrote:
> >
> > Even though ranger is type agnostic, SCEV seems to only work with
> > integers.  This patch removes some FIXME notes making it explicit that
> > bounds_of_var_in_loop only works with iranges.
>
> SCEV also handles floats, where do you see this failing?  Yes, support is
> somewhat limited, but still.

Oh, it does?  We could convert range_of_ssa_name_with_loop_info to
the type agnostic vrange if so... (see attached untested patch).

I'm looking at the testcase for 24021 with the attached patch, and
bounds_of_var_in_loop() is returning false because SCEV couldn't
figure out the direction of the loop:

  dir = scev_direction (chrec);
  if (/* Do not adjust ranges if we do not know whether the iv increases
 or decreases,  ... */
  dir == EV_DIR_UNKNOWN
  /* ... or if it may wrap.  */
  || scev_probably_wraps_p (NULL_TREE, init, step, stmt,
get_chrec_loop (chrec), true))
return false;

The chrec in question is rather simple... a loop increasing by 0.01:

(gdb) p debug(chrec)
{1.2081668171172168513294309377670288085938e-2, +,
1.0001490116119384765625e-1}_1
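For reference, a chrec of that shape comes from a simple float induction variable; a minimal, hypothetical reconstruction (not the actual PR 24021 testcase) would be:

```c
/* A float IV starting near 0.012 and stepping by roughly 0.1 per
   iteration.  SCEV represents it as {init, +, step}_1, but cannot
   prove the direction/wrapping properties it proves for integer IVs.  */
float last_value (int iters)
{
  float f = 0.012081668171172168f;
  for (int i = 0; i < iters; i++)
    f += 0.10000149011611938f;
  return f;
}
```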

I also see that bounds_of_var_in_loop() calls niter's
max_loop_iterations(), which if I understand things correctly, bails
at several points if the step is not integral.

I'd be happy to adapt our code, if SCEV can give me useful information.

Aldy
From 86752bb57b0534b496b51559ec8e28e56fb3ea4e Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Tue, 2 Aug 2022 13:27:16 +0200
Subject: [PATCH] Make range_of_ssa_name_with_loop_info type agnostic.

gcc/ChangeLog:

	* gimple-range-fold.cc (fold_using_range::range_of_phi): Remove
	irange check.
	(tree_lower_bound): New.
	(tree_upper_bound): New.
	(fold_using_range::range_of_ssa_name_with_loop_info): Convert to
	vrange.
	* gimple-range-fold.h (range_of_ssa_name_with_loop_info): Change
	argument to vrange.
---
 gcc/gimple-range-fold.cc | 46 ++--
 gcc/gimple-range-fold.h  |  2 +-
 2 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 7665c954f2b..923094abd62 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -854,13 +854,12 @@ fold_using_range::range_of_phi (vrange &r, gphi *phi, fur_source &src)
 
   // If SCEV is available, query if this PHI has any knonwn values.
   if (scev_initialized_p ()
-  && !POINTER_TYPE_P (TREE_TYPE (phi_def))
-  && irange::supports_p (TREE_TYPE (phi_def)))
+  && !POINTER_TYPE_P (TREE_TYPE (phi_def)))
 {
   class loop *l = loop_containing_stmt (phi);
   if (l && loop_outer (l))
 	{
-	  int_range_max loop_range;
+	  Value_Range loop_range (type);
 	  range_of_ssa_name_with_loop_info (loop_range, phi_def, l, phi, src);
 	  if (!loop_range.varying_p ())
 	{
@@ -1330,10 +1329,32 @@ fold_using_range::range_of_cond_expr  (vrange &r, gassign *s, fur_source &src)
   return true;
 }
 
+// Return the lower bound of R as a tree.
+
+static inline tree
+tree_lower_bound (const vrange &r, tree type)
+{
+  if (is_a <irange> (r))
+    return wide_int_to_tree (type, as_a <irange> (r).lower_bound ());
+  // ?? Handle floats when they contain endpoints.
+  return NULL;
+}
+
+// Return the upper bound of R as a tree.
+
+static inline tree
+tree_upper_bound (const vrange &r, tree type)
+{
+  if (is_a <irange> (r))
+    return wide_int_to_tree (type, as_a <irange> (r).upper_bound ());
+  // ?? Handle floats when they contain endpoints.
+  return NULL;
+}
+
 // If SCEV has any information about phi node NAME, return it as a range in R.
 
 void
-fold_using_range::range_of_ssa_name_with_loop_info (irange &r, tree name,
+fold_using_range::range_of_ssa_name_with_loop_info (vrange &r, tree name,
 		class loop *l, gphi *phi,
  		fur_source &src)
 {
@@ -1341,24 +1362,27 @@ fold_using_range::range_of_ssa_name_with_loop_info (irange &r, tree name,
   tree min, max, type = TREE_TYPE (name);
   if (bounds_of_var_in_loop (&min, &max, src.query (), l, phi, name))
 {
-  if (TREE_CODE (min) != INTEGER_CST)
+  if (!is_gimple_constant (min))
 	{
 	  if (src.query ()->range_of_expr (r, min, phi) && !r.undefined_p ())
-	min = wide_int_to_tree (type, r.lower_bound ());
+	min = tree_lower_bound (r, type);
 	  else
 	min = vrp_val_min (type);
 	}
-  if (TREE_CODE (max) != INTEGER_CST)
+  if (!is_gimple_constant (max))
 	{
 	  if (src.query ()->range_of_expr (r, max, phi) && !r.undefined_p ())
-	max = wide_int_to_tree (type, r.upper_bound ());
+	max = tree_upper_bound (r, type);
 	  else
 	max = vrp_val_max (type);
 	}
-  r.set (min, max);
+  if (min && max)
+	{
+	  r.set (min, max);
+	  return;
+	}
 }
-  else
-r.set_varying (type);
+  r.set_varying (type);
 }
 
 // ---
diff --git a/gcc/gimple-range-fold.h 

[x86 PATCH] Improved pre-reload split of double word comparison against -1.

2022-08-02 Thread Roger Sayle

This patch adds an extra optimization to *cmp<dwi>_doubleword to improve
the code generated for comparisons against -1.  Hypothetically, if a
comparison against -1 reached this splitter we'd currently generate code
that looks like:

        notq    %rdx            ; 3 bytes
        notq    %rax            ; 3 bytes
orq %rdx, %rax  ; 3 bytes
setne   %al

With this patch we would instead generate the superior:

        andq    %rdx, %rax      ; 3 bytes
        cmpq    $-1, %rax       ; 4 bytes
setne   %al

which is both faster and smaller, and also what's currently generated
thanks to the middle-end splitting double word comparisons against
zero and minus one during RTL expansion.  Should that change, this would
become a missed-optimization regression, but this patch also (potentially)
helps suitable comparisons created by CSE and combine.
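The transformation rests on the identity that a double word equals -1 exactly when the AND of its two halves equals -1. A quick C sanity check of both code shapes (a standalone illustration, not the splitter itself):

```c
#include <stdbool.h>
#include <stdint.h>

/* The old splitter's shape: NOT both halves and OR them; the double
   word is all-ones iff the OR of the complements is zero.  */
bool eq_m1_not_or (int64_t lo, int64_t hi)
{
  return (~lo | ~hi) == 0;
}

/* The patched shape: AND the halves and compare against -1.  */
bool eq_m1_and_cmp (int64_t lo, int64_t hi)
{
  return (lo & hi) == -1;
}
```

The generated asm tests the inverse condition with setne, but the underlying identity is the same.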

This patch has been tested on x86_64-pc-linux-gnu, on its own and in
combination with a middle-end patch tweaking RTL expansion, both with
and without --target-board=unix{-m32}, with no new failures.
Ok for mainline?


2022-08-02  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.md (*cmp<dwi>_doubleword): Add a special case
to split comparisons against -1 using AND and CMP -1 instructions.


Thanks again,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index f1158e1..e8f3851 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1526,6 +1526,15 @@
   operands[i] = force_reg (<MODE>mode, operands[i]);
 
   operands[4] = gen_reg_rtx (<MODE>mode);
+
+  /* Special case comparisons against -1.  */
+  if (operands[1] == constm1_rtx && operands[3] == constm1_rtx)
+{
+  emit_insn (gen_and<mode>3 (operands[4], operands[0], operands[2]));
+  emit_insn (gen_cmp_1 (<MODE>mode, operands[4], constm1_rtx));
+  DONE;
+}
+
   if (operands[1] == const0_rtx)
 emit_move_insn (operands[4], operands[0]);
   else if (operands[0] == const0_rtx)


[PATCH] tree-optimization/106497 - more forward threader can-copy-bb

2022-08-02 Thread Richard Biener via Gcc-patches
This adds EDGE_COPY_SRC_JOINER_BLOCK sources to the set of blocks
we need to check we can duplicate.

Bootstrapped and tested on x86-64-unknown-linux-gnu, pushed.

PR tree-optimization/106497
* tree-ssa-threadupdate.cc (fwd_jt_path_registry::update_cfg):
Also verify we can copy EDGE_COPY_SRC_JOINER_BLOCK.

* gcc.dg/torture/pr106497.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr106497.c | 20 
 gcc/tree-ssa-threadupdate.cc|  3 ++-
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr106497.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr106497.c 
b/gcc/testsuite/gcc.dg/torture/pr106497.c
new file mode 100644
index 000..601200de9e3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr106497.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-tree-dce" } */
+
+int n;
+
+__attribute__ ((pure,returns_twice)) int
+bar (void);
+
+int
+foo (int x)
+{
+  n = 0;
+
+  bar ();
+
+  if (x && n)
+return 0;
+
+  foo (x);
+}
diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc
index 0f2b319d44a..59c268a3567 100644
--- a/gcc/tree-ssa-threadupdate.cc
+++ b/gcc/tree-ssa-threadupdate.cc
@@ -2679,7 +2679,8 @@ fwd_jt_path_registry::update_cfg (bool 
may_peel_loop_headers)
  {
edge e = (*path)[j]->e;
if (m_removed_edges->find_slot (e, NO_INSERT)
-   || ((*path)[j]->type == EDGE_COPY_SRC_BLOCK
+   || (((*path)[j]->type == EDGE_COPY_SRC_BLOCK
+|| (*path)[j]->type == EDGE_COPY_SRC_JOINER_BLOCK)
&& !can_duplicate_block_p (e->src)))
  break;
  }
-- 
2.35.3


[PATCH] MIPS: improve -march=native arch detection

2022-08-02 Thread YunQiang Su
If we cannot get the info from options and cpuinfo, we try to get it from:
  1. getauxval(AT_BASE_PLATFORM), introduced since Linux 5.7
  2. _MIPS_ARCH from host compiler.

This fixes use of the wrong loader on r5/r6 platforms with
 -march=native.

gcc/ChangeLog:
* config/mips/driver-native.cc (host_detect_local_cpu):
  Try getauxval(AT_BASE_PLATFORM) and _MIPS_ARCH, too.
---
 gcc/config/mips/driver-native.cc | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/config/mips/driver-native.cc b/gcc/config/mips/driver-native.cc
index 47627f85ce1..9aa7044c0b8 100644
--- a/gcc/config/mips/driver-native.cc
+++ b/gcc/config/mips/driver-native.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #define IN_TARGET_CODE 1
 
+#include <sys/auxv.h>
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -46,15 +47,15 @@ host_detect_local_cpu (int argc, const char **argv)
   bool arch;
 
   if (argc < 1)
-return NULL;
+goto fallback_cpu;
 
   arch = strcmp (argv[0], "arch") == 0;
   if (!arch && strcmp (argv[0], "tune"))
-return NULL;
+goto fallback_cpu;
 
   f = fopen ("/proc/cpuinfo", "r");
   if (f == NULL)
-return NULL;
+goto fallback_cpu;
 
   while (fgets (buf, sizeof (buf), f) != NULL)
 if (startswith (buf, "cpu model"))
@@ -84,8 +85,23 @@ host_detect_local_cpu (int argc, const char **argv)
 
   fclose (f);
 
+fallback_cpu:
+/*FIXME: how about other OSes, like FreeBSD? */
+#ifdef __linux__
+  /*Note: getauxval may return NULL as:
+   * AT_BASE_PLATFORM is supported since Linux 5.7
+   * Or from older version of qemu-user
+   * */
+  if (cpu == NULL)
+cpu = (const char *) getauxval (AT_BASE_PLATFORM);
+#endif
+
   if (cpu == NULL)
+#if defined (_MIPS_ARCH)
+cpu = _MIPS_ARCH;
+#else
 return NULL;
+#endif
 
   return concat ("-m", argv[0], "=", cpu, NULL);
 }
-- 
2.30.2
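The control flow the patch introduces amounts to a fallback chain over three information sources. A standalone model of that chain (hypothetical names, no GCC or Linux dependencies):

```c
#include <stddef.h>

/* Model of host_detect_local_cpu after the patch: try /proc/cpuinfo
   first, then the AT_BASE_PLATFORM auxval, then the compile-time
   _MIPS_ARCH default; give up only if all three are unavailable.  */
const char *detect_cpu (const char *cpuinfo, const char *base_platform,
                        const char *builtin_arch)
{
  const char *cpu = cpuinfo;
  if (cpu == NULL)
    cpu = base_platform;        /* getauxval (AT_BASE_PLATFORM) */
  if (cpu == NULL)
    cpu = builtin_arch;         /* _MIPS_ARCH */
  return cpu;                   /* may still be NULL */
}
```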



Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader

2022-08-02 Thread Aldy Hernandez via Gcc-patches
Unfortunately, this was before my time, so I don't know.

That being said, thanks for tackling these issues that my work
triggered last release.  Much appreciated.

Aldy

On Tue, Aug 2, 2022 at 10:41 AM Richard Biener  wrote:
>
> I am trying to make sense of back_threader_profitability::profitable_path_p
> and the first thing I notice is that we do
>
>   /* Threading is profitable if the path duplicated is hot but also
>  in a case we separate cold path from hot path and permit optimization
>  of the hot path later.  Be on the agressive side here. In some testcases,
>  as in PR 78407 this leads to noticeable improvements.  */
>   if (m_speed_p
>   && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
>   || contains_hot_bb))
> {
>   if (n_insns >= param_max_fsm_thread_path_insns)
> {
>   if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
>  "the number of instructions on the path "
>  "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
>   return false;
> }
> ...
> }
>   else if (!m_speed_p && n_insns > 1)
> {
>   if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
>  "duplication of %i insns is needed and optimizing for 
> size.\n",
>  n_insns);
>   return false;
> }
> ...
>   return true;
>
> thus we apply the n_insns >= param_max_fsm_thread_path_insns only
> to "hot paths".  The comment above this isn't entirely clear whether
> this is by design ("Be on the aggressive side here ...") but I think
> this is a mistake.  In fact the "hot path" check seems entirely
> useless since if the path is not hot we simply continue threading it.
>
> I have my reservations about how we compute hot (contains_hot_bb
> in particular), but the following first refactors the above to apply
> the size constraints always and then _not_ threading if the path
> is not considered hot (but allow threading if n_insns <= 1 as with
> the !m_speed_p case).
>
> As for contains_hot_bb - it might be that this consciously captures
> the case where we separate a cold from a hot path even though the
> threaded path itself is cold.  Consider
>
>A
>   / \ (unlikely)
>  B   C
>   \ /
>D
>   / \
>  ..  abort()
>
> when we want to thread A -> B -> D -> abort () and A (or D)
> has a hot BB count then we have contains_hot_bb even though
> the counts on the path itself are small.  In fact when we
> thread the only relevant count for the resulting threaded
> path is the count of A with the A->C probability applied
> (that should also the count to subtract from the blocks
> we copied - sth missing for the backwards threader as well).
>
> So I'm wondering how the logic computing contains_hot_bb
> relates to the above comment before the costing block.
> Anyone remembers?
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> * tree-ssa-threadbackward.cc
> (back_threader_profitability::profitable_path_p): Apply
> size constraints to all paths.  Do not thread cold paths.
> ---
>  gcc/tree-ssa-threadbackward.cc | 53 +-
>  1 file changed, 33 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> index 0519f2a8c4b..a887568032b 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/tree-ssa-threadbackward.cc
> @@ -761,22 +761,43 @@ back_threader_profitability::profitable_path_p (const 
> vec<basic_block> &m_path,
> *creates_irreducible_loop = true;
>  }
>
> -  /* Threading is profitable if the path duplicated is hot but also
> +  const int max_cold_insns = 1;
> +  if (!m_speed_p && n_insns > max_cold_insns)
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> +"duplication of %i insns is needed and optimizing for 
> size.\n",
> +n_insns);
> +  return false;
> +}
> +  else if (n_insns >= param_max_fsm_thread_path_insns)
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> +"the number of instructions on the path "
> +"exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
> +  return false;
> +}
> +
> +  /* Threading is profitable if the path duplicated is small or hot but also
>   in a case we separate cold path from hot path and permit optimization
>   of the hot path later.  Be on the agressive side here. In some 
> testcases,
>   as in PR 78407 this leads to noticeable improvements.  */
> -  if (m_speed_p
> -  && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
> - || contains_hot_bb))
> +  if (!(n_insns <= max_cold_insns
> +   || contains_hot_bb
> +   || 

Re: [PATCH][pushed] gcc-changelog: do not run extra deduction

2022-08-02 Thread Jonathan Wakely via Gcc-patches
On Tue, 2 Aug 2022 at 09:51, Martin Liška wrote:
>
> Do not deduce changelog for root ChangeLog ('').
>
> Small cleanup, pushed.
>
> @Jonathan: Please update the server hook.

Done. It looks like several recent changes to the hook had not been
committed to the gcc-hooks repo, so I've done that now.


Re: [PATCH] IPA: reduce what we dump in normal mode

2022-08-02 Thread Richard Biener via Gcc-patches
On Tue, Aug 2, 2022 at 10:46 AM Martin Liška  wrote:

OK

> gcc/ChangeLog:
>
> * profile.cc (compute_branch_probabilities): Dump details only
> if TDF_DETAILS.
> * symtab.cc (symtab_node::dump_base): Do not dump pointer unless
> TDF_ADDRESS is used, it makes comparison harder.
> ---
>  gcc/profile.cc | 2 +-
>  gcc/symtab.cc  | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/profile.cc b/gcc/profile.cc
> index 08af512cbca..92de821b8bb 100644
> --- a/gcc/profile.cc
> +++ b/gcc/profile.cc
> @@ -766,7 +766,7 @@ compute_branch_probabilities (unsigned cfg_checksum, 
> unsigned lineno_checksum)
>   sum2 += freq2;
> }
> }
> -  if (dump_file)
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> {
>   double nsum1 = 0, nsum2 = 0;
>   stats.qsort (cmp_stats);
> diff --git a/gcc/symtab.cc b/gcc/symtab.cc
> index 8670337416e..f2d96c0268b 100644
> --- a/gcc/symtab.cc
> +++ b/gcc/symtab.cc
> @@ -894,7 +894,8 @@ symtab_node::dump_base (FILE *f)
>};
>
>fprintf (f, "%s (%s)", dump_asm_name (), name ());
> -  dump_addr (f, " @", (void *)this);
> +  if (dump_flags & TDF_ADDRESS)
> +dump_addr (f, " @", (void *)this);
>fprintf (f, "\n  Type: %s", symtab_type_names[type]);
>
>if (definition)
> --
> 2.37.1
>


Re: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.

2022-08-02 Thread Richard Sandiford via Gcc-patches
"Roger Sayle"  writes:
> This patch implements some additional zero-extension and sign-extension
> related optimizations in simplify-rtx.cc.  The original motivation comes
> from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees:
>
> Failed to match this instruction:
> (set (reg:DI 88 [ _1 ])
> (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
>
> On many platforms the result of DImode CTZ is constrained to be a
> small unsigned integer (between 0 and 64), hence the truncation to
> 32-bits (using a SUBREG) and the following sign extension back to
> 64-bits are effectively a no-op, so the above should ideally (often)
> be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
>
> To implement this, and some closely related transformations, we build
> upon the existing val_signbit_known_clear_p predicate.  In the first
> chunk, nonzero_bits knows that FFS and ABS can't leave the sign
> bit set, so the simplification of ABS (ABS (x)) and ABS (FFS (x))
> can itself be simplified.

I think I misunderstood, but just in case: RTL ABS is well-defined
for the minimum integer (giving back the minimum integer), so we
can't assume that ABS leaves the sign bit clear.

Thanks,
Richard
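The point is easy to verify in two's-complement arithmetic. A standalone illustration follows; since C-level signed negation of INT64_MIN is undefined behavior, the check is phrased in unsigned arithmetic, the way RTL defines ABS:

```c
#include <stdbool.h>
#include <stdint.h>

/* RTL-style ABS on a 64-bit two's-complement value, computed in
   unsigned arithmetic so the minimum-integer case is well defined:
   negating the minimum integer gives back the minimum integer.  */
uint64_t rtl_abs (uint64_t x)
{
  return (x >> 63) ? (0 - x) : x;
}

/* Does the value have its sign bit set?  */
bool signbit_set (uint64_t x)
{
  return (x >> 63) != 0;
}
```

So ABS of 0x8000000000000000 still has its sign bit set, which is why nonzero_bits must not claim the bit is clear after ABS.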

> The second transformation is that we can
> canonicalize SIGN_EXTEND to ZERO_EXTEND (as in the PR 71775 case above)
> when the operand's sign-bit is known to be clear.  The final two chunks
> are for SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a
> truncating SUBREG respectively.  The nonzero_bits of a truncating
> SUBREG pessimistically thinks that the upper bits may have an
> arbitrary value (by taking the SUBREG), so we need look deeper at the
> SUBREG's operand to confirm that the high bits are known to be zero.
>
> Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
> default architecture options is undefined at zero, so we can't be sure
> the upper bits of reg:DI 88 will be sign extended (all zeros or all ones).
> nonzero_bits knows this, so the above transformations don't trigger,
> but the transformations themselves are perfectly valid for other
> operations such as FFS, POPCOUNT and PARITY, and on other targets/-march
> settings where CTZ is defined at zero.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Testing with CSiBE shows these transformations
> trigger on several source files (and with -Os reduces the size of the
> code).  Ok for mainline?
>
>
> 2022-07-27  Roger Sayle  
>
> gcc/ChangeLog
> * simplify-rtx.cc (simplify_unary_operation_1) <ABS>: Simplify
> test as both FFS and ABS result in nonzero_bits returning a
> mask that satisfies val_signbit_known_clear_p.
> <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
> val_signbit_known_clear_p is true of the operand.
> Simplify sign extensions of SUBREG truncations of operands
> that are already suitably (zero) extended.
> <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
> of operands that are already suitably zero extended.
>
>
> Thanks in advance,
> Roger
> --
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index fa20665..e62bf56 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -1366,9 +1366,8 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>   break;
>  
>/* If operand is something known to be positive, ignore the ABS.  */
> -  if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
> -   || val_signbit_known_clear_p (GET_MODE (op),
> - nonzero_bits (op, GET_MODE (op
> +  if (val_signbit_known_clear_p (GET_MODE (op),
> +  nonzero_bits (op, GET_MODE (op
>   return op;
>  
>/* If operand is known to be only -1 or 0, convert ABS to NEG.  */
> @@ -1615,6 +1614,24 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>   }
>   }
>  
> +  /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
> + we know the sign bit of OP must be clear.  */
> +  if (val_signbit_known_clear_p (GET_MODE (op),
> +  nonzero_bits (op, GET_MODE (op
> + return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
> +
> +  /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> +  if (GET_CODE (op) == SUBREG
> +   && subreg_lowpart_p (op)
> +   && GET_MODE (SUBREG_REG (op)) == mode
> +   && is_a <scalar_int_mode> (mode, &int_mode)
> +   && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> +   && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> +   && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> +   && (nonzero_bits (SUBREG_REG (op), mode)
> +   & ~(GET_MODE_MASK (op_mode)>>1)) == 0)
> + return SUBREG_REG (op);
> +
> 

Re: [PATCH 2/2]middle-end: Support recognition of three-way max/min.

2022-08-02 Thread Richard Biener via Gcc-patches
On Tue, Aug 2, 2022 at 10:33 AM Tamar Christina via Gcc-patches
 wrote:
>
> > > > > When this function replaces the edge it doesn't seem to update the
> > > > dominators.
> > > > > Since It's replacing the middle BB we then end up with an error
> > > > >
> > > > > gcc/testsuite/gcc.dg/tree-ssa/minmax-14.c:17:1: error: dominator
> > > > > of 5 should be 4, not 2
> > > > >
> > > > > during early verify. So instead, I replace the BB but defer its
> > > > > deletion until cleanup which removes it and updates the dominators.
> > > >
> > > > Hmm, for a diamond shouldn't you replace
> > > >
> > > >   if (EDGE_SUCC (cond_block, 0)->dest == bb)
> > > > edge_to_remove = EDGE_SUCC (cond_block, 1);
> > > >   else
> > > > edge_to_remove = EDGE_SUCC (cond_block, 0);
> > > >
> > > > with
> > > >
> > > >   if (EDGE_SUCC (cond_block, 0)->dest == bb)
> > > > edge_to_remove = EDGE_SUCC (cond_block, 1);
> > > >   else if (EDGE_SUCC (cond_block, 1)->dest == bb)
> > > > edge_to_remove = EDGE_SUCC (cond_block, 0);
> > > >
> > > > thus, the code expects to be left with a fallthru to the PHI block
> > > > which is expected to have the immediate dominator being cond_block
> > > > but with a diamond there's a (possibly empty) block in between and
> > > > dominators are wrong.
> > >
> > > Agreed, but the (EDGE_SUCC (cond_block, 1)->dest == bb) doesn't seem
> > > like the Right one since for a diamond there will be a block in
> > > between the two.  Did you perhaps mean  EDGE_SUCC (EDGE_SUCC
> > > (cond_block, 1)->dest, 0)->dest == bb? i.e. that that destination across 
> > > the
> > diamond be bb, and then you remove the middle block?
> >
> > Hmm, I think my condition was correct - the code tries to remove the edge to
> > the middle-block and checks the remaining edge falls through to the merge
> > block.  With a true diamond there is no fallthru to the merge block to keep 
> > so
> > we better don't remove any edge?
> >
> > > For the minmax diamond we want both edges removed, since all the code
> > > in the middle BBs are now dead.  But this is probably not true in the 
> > > general
> > sense.
>
> Ah! Sorry I was firing a few cylinders short, I get what you mean now:
>
> @@ -425,8 +439,19 @@ replace_phi_edge_with_variable (basic_block cond_block,
>edge edge_to_remove;
>if (EDGE_SUCC (cond_block, 0)->dest == bb)
>  edge_to_remove = EDGE_SUCC (cond_block, 1);
> -  else
> +  else if (EDGE_SUCC (cond_block, 1)->dest == bb)
>  edge_to_remove = EDGE_SUCC (cond_block, 0);
> +  else
> +{
> +  /* If neither edge from the conditional is the final bb
> +then we must have a diamond block, in which case
> +the true edge was changed by SET_USE above and we must
> +mark the other edge as the false edge.  */
> > +  gcond *cond = as_a <gcond *> (last_stmt (cond_block));
> +  gimple_cond_make_false (cond);
> +  return;
> +}
> +

Note there is already

  if (EDGE_COUNT (edge_to_remove->dest->preds) == 1)
{
...
}
  else
{
  /* If there are other edges into the middle block make
 CFG cleanup deal with the edge removal to avoid
 updating dominators here in a non-trivial way.  */
  gcond *cond = as_a <gcond *> (last_stmt (cond_block));
  if (edge_to_remove->flags & EDGE_TRUE_VALUE)
gimple_cond_make_false (cond);
  else
gimple_cond_make_true (cond);
}

I'm not sure how you can say 'e' is always the true edge?  May I suggest
to amend the first condition with edge_to_remove && (and initialize that
to NULL) and use e->flags instead of edge_to_remove in the else,
of course also inverting the logic since we're keeping 'e'?

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok with this Change?
>
> Thanks,
> Tamar


[PATCH][pushed] gcc-changelog: do not run extra deduction

2022-08-02 Thread Martin Liška
Do not deduce changelog for root ChangeLog ('').

Small cleanup, pushed.

@Jonathan: Please update the server hook.

Martin

contrib/ChangeLog:

	* gcc-changelog/git_commit.py: Do not deduce changelog for root ChangeLog.
---
 contrib/gcc-changelog/git_commit.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index a6b5ff04f22..7f6ff87ba99 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -626,7 +626,7 @@ class GitCommit:
 
 def deduce_changelog_locations(self):
 for entry in self.changelog_entries:
-if not entry.folder:
+if entry.folder is None:
 changelog = None
 for file in entry.files:
 location = self.get_file_changelog_location(file)
-- 
2.37.1



[PATCH] IPA: reduce what we dump in normal mode

2022-08-02 Thread Martin Liška
gcc/ChangeLog:

* profile.cc (compute_branch_probabilities): Dump details only
if TDF_DETAILS.
* symtab.cc (symtab_node::dump_base): Do not dump pointer unless
TDF_ADDRESS is used, it makes comparison harder.
---
 gcc/profile.cc | 2 +-
 gcc/symtab.cc  | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/profile.cc b/gcc/profile.cc
index 08af512cbca..92de821b8bb 100644
--- a/gcc/profile.cc
+++ b/gcc/profile.cc
@@ -766,7 +766,7 @@ compute_branch_probabilities (unsigned cfg_checksum, unsigned lineno_checksum)
  sum2 += freq2;
}
}
-  if (dump_file)
+  if (dump_file && (dump_flags & TDF_DETAILS))
{
  double nsum1 = 0, nsum2 = 0;
  stats.qsort (cmp_stats);
diff --git a/gcc/symtab.cc b/gcc/symtab.cc
index 8670337416e..f2d96c0268b 100644
--- a/gcc/symtab.cc
+++ b/gcc/symtab.cc
@@ -894,7 +894,8 @@ symtab_node::dump_base (FILE *f)
   };
 
   fprintf (f, "%s (%s)", dump_asm_name (), name ());
-  dump_addr (f, " @", (void *)this);
+  if (dump_flags & TDF_ADDRESS)
+dump_addr (f, " @", (void *)this);
   fprintf (f, "\n  Type: %s", symtab_type_names[type]);
 
   if (definition)
-- 
2.37.1



[PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader

2022-08-02 Thread Richard Biener via Gcc-patches
I am trying to make sense of back_threader_profitability::profitable_path_p
and the first thing I notice is that we do

  /* Threading is profitable if the path duplicated is hot but also
 in a case we separate cold path from hot path and permit optimization
 of the hot path later.  Be on the agressive side here. In some testcases,
 as in PR 78407 this leads to noticeable improvements.  */
  if (m_speed_p
  && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
  || contains_hot_bb))
{
  if (n_insns >= param_max_fsm_thread_path_insns)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
 "the number of instructions on the path "
 "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
  return false;
}
...
}
  else if (!m_speed_p && n_insns > 1)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
 "duplication of %i insns is needed and optimizing for size.\n",
 n_insns);
  return false;
}
...
  return true;

thus we apply the n_insns >= param_max_fsm_thread_path_insns only
to "hot paths".  The comment above this isn't entirely clear whether
this is by design ("Be on the aggressive side here ...") but I think
this is a mistake.  In fact the "hot path" check seems entirely
useless since if the path is not hot we simply continue threading it.

I have my reservations about how we compute hot (contains_hot_bb
in particular), but the following first refactors the above to apply
the size constraints always and then _not_ threading if the path
is not considered hot (but allow threading if n_insns <= 1 as with
the !m_speed_p case).

As for contains_hot_bb - it might be that this consciously captures
the case where we separate a cold from a hot path even though the
threaded path itself is cold.  Consider

   A
  / \ (unlikely)
 B   C
  \ /
   D
  / \
 ..  abort()

when we want to thread A -> B -> D -> abort () and A (or D)
has a hot BB count then we have contains_hot_bb even though
the counts on the path itself are small.  In fact when we
thread the only relevant count for the resulting threaded
path is the count of A with the A->C probability applied
(that should also the count to subtract from the blocks
we copied - sth missing for the backwards threader as well).

So I'm wondering how the logic computing contains_hot_bb
relates to the above comment before the costing block.
Anyone remembers?

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

	* tree-ssa-threadbackward.cc
(back_threader_profitability::profitable_path_p): Apply
size constraints to all paths.  Do not thread cold paths.
---
 gcc/tree-ssa-threadbackward.cc | 53 +-
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 0519f2a8c4b..a887568032b 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -761,22 +761,43 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
*creates_irreducible_loop = true;
 }
 
-  /* Threading is profitable if the path duplicated is hot but also
+  const int max_cold_insns = 1;
+  if (!m_speed_p && n_insns > max_cold_insns)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+"duplication of %i insns is needed and optimizing for size.\n",
+n_insns);
+  return false;
+}
+  else if (n_insns >= param_max_fsm_thread_path_insns)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+"the number of instructions on the path "
+"exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
+  return false;
+}
+
+  /* Threading is profitable if the path duplicated is small or hot but also
  in a case we separate cold path from hot path and permit optimization
  of the hot path later.  Be on the agressive side here. In some testcases,
  as in PR 78407 this leads to noticeable improvements.  */
-  if (m_speed_p
-  && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
- || contains_hot_bb))
+  if (!(n_insns <= max_cold_insns
+   || contains_hot_bb
+   || (taken_edge && optimize_edge_for_speed_p (taken_edge
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+"path is not profitable to thread.\n");
+  return false;
+}
+
+  /* If the path is not small to duplicate and either the entry or
+ the final destination is probably never executed avoid separating
+ the cold path since that can lead 

RE: [PATCH 2/2]middle-end: Support recognition of three-way max/min.

2022-08-02 Thread Tamar Christina via Gcc-patches
> > > > When this function replaces the edge it doesn't seem to update the
> > > dominators.
> > > > Since It's replacing the middle BB we then end up with an error
> > > >
> > > > gcc/testsuite/gcc.dg/tree-ssa/minmax-14.c:17:1: error: dominator
> > > > of 5 should be 4, not 2
> > > >
> > > > during early verify. So instead, I replace the BB but defer its
> > > > deletion until cleanup which removes it and updates the dominators.
> > >
> > > Hmm, for a diamond shouldn't you replace
> > >
> > >   if (EDGE_SUCC (cond_block, 0)->dest == bb)
> > > edge_to_remove = EDGE_SUCC (cond_block, 1);
> > >   else
> > > edge_to_remove = EDGE_SUCC (cond_block, 0);
> > >
> > > with
> > >
> > >   if (EDGE_SUCC (cond_block, 0)->dest == bb)
> > > edge_to_remove = EDGE_SUCC (cond_block, 1);
> > >   else if (EDGE_SUCC (cond_block, 1)->dest == bb)
> > > edge_to_remove = EDGE_SUCC (cond_block, 0);
> > >
> > > thus, the code expects to be left with a fallthru to the PHI block
> > > which is expected to have the immediate dominator being cond_block
> > > but with a diamond there's a (possibly empty) block in between and
> > > dominators are wrong.
> >
> > Agreed, but the (EDGE_SUCC (cond_block, 1)->dest == bb) doesn't seem
> > like the Right one since for a diamond there will be a block in
> > between the two.  Did you perhaps mean  EDGE_SUCC (EDGE_SUCC
> > (cond_block, 1)->dest, 0)->dest == bb? i.e. that that destination across the
> diamond be bb, and then you remove the middle block?
> 
> Hmm, I think my condition was correct - the code tries to remove the edge to
> the middle-block and checks the remaining edge falls through to the merge
> block.  With a true diamond there is no fallthru to the merge block to keep so
> we better don't remove any edge?
> 
> > For the minmax diamond we want both edges removed, since all the code
> > in the middle BBs are now dead.  But this is probably not true in the 
> > general
> sense.

Ah! Sorry I was firing a few cylinders short, I get what you mean now:

@@ -425,8 +439,19 @@ replace_phi_edge_with_variable (basic_block cond_block,
   edge edge_to_remove;
   if (EDGE_SUCC (cond_block, 0)->dest == bb)
 edge_to_remove = EDGE_SUCC (cond_block, 1);
-  else
+  else if (EDGE_SUCC (cond_block, 1)->dest == bb)
 edge_to_remove = EDGE_SUCC (cond_block, 0);
+  else
+{
+  /* If neither edge from the conditional is the final bb
+then we must have a diamond block, in which case
+the true edge was changed by SET_USE above and we must
+mark the other edge as the false edge.  */
> +  gcond *cond = as_a <gcond *> (last_stmt (cond_block));
+  gimple_cond_make_false (cond);
+  return;
+}
+

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok with this Change?

Thanks,
Tamar


Re: [PATCH] Teach VN about masked/len stores

2022-08-02 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> The following teaches VN to handle reads from .MASK_STORE and
> .LEN_STORE.  For this push_partial_def is extended first for
> convenience so we don't have to handle the full def case in the
> caller (possibly other paths can be simplified then).  Also
> the partial definition stored value can have an offset applied
> so we don't have to build a fake RHS when we register the pieces
> of an existing store.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, Kewen is
> going to test on powerpc.
>
> I'm not sure about whether it's worth (or easily possible) to
> handle .MASK_STORE_LANES, I think handling the constant def case
> might be possible but since it has an intrinsic permute it might
> make more sense to rewrite the constant def case into a .MASK_STORE?
> (does the mask apply to the destination memory order or the source
> lane order?)

There's one mask bit for each structure, rather than one for each
element.  So converting a constant .MASK_STORE_LANES to a constant
.MASK_STORE would require some massaging of the predicate.

Thanks,
Richard

>
>   PR tree-optimization/106365
>   * tree-ssa-sccvn.cc (pd_data::rhs_off): New field determining
>   the offset to start encoding of RHS from.
>   (vn_walk_cb_data::vn_walk_cb_data): Initialize it.
>   (vn_walk_cb_data::push_partial_def): Allow the first partial
>   definition to be fully providing the def.  Offset RHS
>   before encoding if requested.
>   (vn_reference_lookup_3): Initialize def_rhs everywhere.
>   Add support for .MASK_STORE and .LEN_STORE (partial) definitions.
>
>   * gcc.target/i386/vec-maskstore-vn.c: New testcase.
> ---
>  .../gcc.target/i386/vec-maskstore-vn.c|  30 +++
>  gcc/tree-ssa-sccvn.cc | 255 ++
>  2 files changed, 228 insertions(+), 57 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vec-maskstore-vn.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vec-maskstore-vn.c b/gcc/testsuite/gcc.target/i386/vec-maskstore-vn.c
> new file mode 100644
> index 000..98213905ece
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vec-maskstore-vn.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx2 -fdump-tree-fre5" } */
> +
> +void __attribute__((noinline,noclone))
> +foo (int *out, int *res)
> +{
> +  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
> +  int i;
> +  for (i = 0; i < 16; ++i)
> +{
> +  if (mask[i])
> +out[i] = i;
> +}
> +  int o0 = out[0];
> +  int o7 = out[7];
> +  int o14 = out[14];
> +  int o15 = out[15];
> +  res[0] = o0;
> +  res[2] = o7;
> +  res[4] = o14;
> +  res[6] = o15;
> +}
> +
> +/* Vectorization produces .MASK_STORE, unrolling will unroll the two
> +   vector iterations.  FRE5 after that should be able to CSE
> +   out[7] and out[15], but leave out[0] and out[14] alone.  */
> +/* { dg-final { scan-tree-dump " = o0_\[0-9\]+;" "fre5" } } */
> +/* { dg-final { scan-tree-dump " = 7;" "fre5" } } */
> +/* { dg-final { scan-tree-dump " = o14_\[0-9\]+;" "fre5" } } */
> +/* { dg-final { scan-tree-dump " = 15;" "fre5" } } */
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index f41d5031365..7d947b55a27 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -1790,6 +1790,7 @@ struct pd_range
>  struct pd_data
>  {
>tree rhs;
> +  HOST_WIDE_INT rhs_off;
>HOST_WIDE_INT offset;
>HOST_WIDE_INT size;
>  };
> @@ -1816,6 +1817,7 @@ struct vn_walk_cb_data
>   unsigned int pos = 0, prec = w.get_precision ();
>   pd_data pd;
>   pd.rhs = build_constructor (NULL_TREE, NULL);
> + pd.rhs_off = 0;
>   /* When bitwise and with a constant is done on a memory load,
>  we don't really need all the bits to be defined or defined
>  to constants, we don't really care what is in the position
> @@ -1976,6 +1978,7 @@ vn_walk_cb_data::push_partial_def (pd_data pd,
>  
>bool pd_constant_p = (TREE_CODE (pd.rhs) == CONSTRUCTOR
>   || CONSTANT_CLASS_P (pd.rhs));
> +  pd_range *r;
>if (partial_defs.is_empty ())
>  {
>/* If we get a clobber upfront, fail.  */
> @@ -1989,65 +1992,70 @@ vn_walk_cb_data::push_partial_def (pd_data pd,
>first_set = set;
>first_base_set = base_set;
>last_vuse_ptr = NULL;
> -  /* Continue looking for partial defs.  */
> -  return NULL;
> -}
> -
> -  if (!known_ranges)
> -{
> -  /* ???  Optimize the case where the 2nd partial def completes things.  */
> -  gcc_obstack_init (_obstack);
> -  known_ranges = splay_tree_new_with_allocator (pd_range_compare, 0, 0,
> - pd_tree_alloc,
> - pd_tree_dealloc, this);
> -  splay_tree_insert (known_ranges,
> -  (splay_tree_key)&first_range.offset,
> -  (splay_tree_value)&first_range);
> -  

[PATCH] gcov-dump: add --stable option

2022-08-02 Thread Martin Liška
The option prints TOP N counters in a stable format
usable for comparison (diff).

Will install the patch if there are no objections.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Martin

gcc/ChangeLog:

* doc/gcov-dump.texi: Document the new option.
* gcov-dump.cc (main): Parse the new option.
(print_usage): Show the option.
(tag_counters): Sort key:value pairs of TOP N counter.
---
 gcc/doc/gcov-dump.texi |  5 
 gcc/gcov-dump.cc   | 61 +-
 2 files changed, 59 insertions(+), 7 deletions(-)

diff --git a/gcc/doc/gcov-dump.texi b/gcc/doc/gcov-dump.texi
index 4f4e355dd28..0491ab17bc1 100644
--- a/gcc/doc/gcov-dump.texi
+++ b/gcc/doc/gcov-dump.texi
@@ -62,6 +62,7 @@ gcov-dump [@option{-v}|@option{--version}]
  [@option{-l}|@option{--long}]
  [@option{-p}|@option{--positions}]
  [@option{-r}|@option{--raw}]
+ [@option{-s}|@option{--stable}]
  @var{gcovfiles}
 @c man end
 @end ignore
@@ -85,6 +86,10 @@ Dump positions of records.
 @itemx --raw
 Print content records in raw format.
 
+@item -s
+@itemx --stable
+Print content in stable format usable for comparison.
+
 @item -v
 @itemx --version
 Display the @command{gcov-dump} version number (on the standard output),
diff --git a/gcc/gcov-dump.cc b/gcc/gcov-dump.cc
index 0804c794e9e..85b1be8859e 100644
--- a/gcc/gcov-dump.cc
+++ b/gcc/gcov-dump.cc
@@ -28,6 +28,10 @@ along with Gcov; see the file COPYING3.  If not see
 #include "gcov-io.h"
 #include "gcov-io.cc"
 
> +#include <vector>
+
+using namespace std;
+
 static void dump_gcov_file (const char *);
 static void print_prefix (const char *, unsigned, gcov_position_t);
 static void print_usage (void);
@@ -50,6 +54,7 @@ typedef struct tag_format
 static int flag_dump_contents = 0;
 static int flag_dump_positions = 0;
 static int flag_dump_raw = 0;
+static int flag_dump_stable = 0;
 
 static const struct option options[] =
 {
@@ -57,7 +62,9 @@ static const struct option options[] =
   { "version",  no_argument,   NULL, 'v' },
   { "long", no_argument,   NULL, 'l' },
   { "positions",   no_argument,   NULL, 'o' },
-  { 0, 0, 0, 0 }
+  { "raw", no_argument,   NULL, 'r' },
+  { "stable",  no_argument,   NULL, 's' },
+  {}
 };
 
 #define VALUE_PADDING_PREFIX "  "
@@ -96,7 +103,7 @@ main (int argc ATTRIBUTE_UNUSED, char **argv)
 
   diagnostic_initialize (global_dc, 0);
 
-  while ((opt = getopt_long (argc, argv, "hlprvw", options, NULL)) != -1)
+  while ((opt = getopt_long (argc, argv, "hlprsvw", options, NULL)) != -1)
 {
   switch (opt)
{
@@ -115,6 +122,9 @@ main (int argc ATTRIBUTE_UNUSED, char **argv)
case 'r':
  flag_dump_raw = 1;
  break;
+   case 's':
+ flag_dump_stable = 1;
+ break;
default:
  fprintf (stderr, "unknown flag `%c'\n", opt);
}
@@ -134,6 +144,8 @@ print_usage (void)
   printf ("  -l, --long   Dump record contents too\n");
   printf ("  -p, --positions  Dump record positions\n");
   printf ("  -r, --rawPrint content records in raw format\n");
+  printf ("  -s, --stable Print content in stable "
+ "format usable for comparison\n");
   printf ("  -v, --versionPrint version number\n");
   printf ("\nFor bug reporting instructions, please see:\n%s.\n",
   bug_report_url);
@@ -439,16 +451,52 @@ tag_counters (const char *filename ATTRIBUTE_UNUSED,
   int n_counts = GCOV_TAG_COUNTER_NUM (length);
   bool has_zeros = n_counts < 0;
   n_counts = abs (n_counts);
+  unsigned counter_idx = GCOV_COUNTER_FOR_TAG (tag);
 
   printf (" %s %u counts%s",
- counter_names[GCOV_COUNTER_FOR_TAG (tag)], n_counts,
+ counter_names[counter_idx], n_counts,
  has_zeros ? " (all zero)" : "");
   if (flag_dump_contents)
 {
> +  vector<gcov_type> counters;
   for (int ix = 0; ix != n_counts; ix++)
+   counters.push_back (has_zeros ? 0 : gcov_read_counter ());
+
+  /* Make stable sort for TOP N counters.  */
+  if (flag_dump_stable)
+   if (counter_idx == GCOV_COUNTER_V_INDIR
+   || counter_idx == GCOV_COUNTER_V_TOPN)
+ {
+   unsigned start = 0;
+   while (start < counters.size ())
+ {
+   unsigned n = counters[start + 1];
+
+   /* Use bubble sort.  */
+   for (unsigned i = 1; i <= n; ++i)
+ for (unsigned j = i; j <= n; ++j)
+   {
+ gcov_type key1 = counters[start + 2 * i];
+ gcov_type value1 = counters[start + 2 * i + 1];
+ gcov_type key2 = counters[start + 2 * j];
+ gcov_type value2 = counters[start + 2 * j + 1];
+
+ if (value1 < value2 || (value1 == value2 && key1 < key2))
+   {
+ std::swap 

Ping^3: [RFA configure parts] aarch64: Make cc1 handle --with options

2022-08-02 Thread Richard Sandiford via Gcc-patches
Ping^3 for the configure bits.

Richard Sandiford via Gcc-patches  writes:
> On aarch64, --with-arch, --with-cpu and --with-tune only have an
> effect on the driver, so “./xgcc -B./ -O3” can give significantly
> different results from “./cc1 -O3”.  --with-arch did have a limited
> effect on ./cc1 in previous releases, although it didn't work
> entirely correctly.
>
> Being of a lazy persuasion, I've got used to ./cc1 selecting SVE for
> --with-arch=armv8.2-a+sve without having to supply an explicit -march,
> so this patch makes ./cc1 emulate the relevant OPTION_DEFAULT_SPECS.
> It relies on Wilco's earlier clean-ups.
>
> The patch makes config.gcc define WITH_FOO_STRING macros for each
> supported --with-foo option.  This could be done only in aarch64-
> specific code, but I thought it could be useful on other targets
> too (and can be safely ignored otherwise).  There didn't seem to
> be any existing and potentially clashing uses of macros with this
> style of name.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK for the configure
> bits?
>
> Richard
>
>
> gcc/
>   * config.gcc: Define WITH_FOO_STRING macros for each supported
>   --with-foo option.
>   * config/aarch64/aarch64.cc (aarch64_override_options): Emulate
>   OPTION_DEFAULT_SPECS.
>   * config/aarch64/aarch64.h (OPTION_DEFAULT_SPECS): Reference the above.
> ---
>  gcc/config.gcc| 14 ++
>  gcc/config/aarch64/aarch64.cc |  8 
>  gcc/config/aarch64/aarch64.h  |  5 -
>  3 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index cdbefb5b4f5..e039230431c 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -5865,6 +5865,20 @@ else
>   configure_default_options="{ ${t} }"
>  fi
>  
> +for option in $supported_defaults
> +do
> + lc_option=`echo $option | sed s/-/_/g`
> + uc_option=`echo $lc_option | tr a-z A-Z`
> + eval "val=\$with_$lc_option"
> + if test -n "$val"
> + then
> + val="\\\"$val\\\""
> + else
> + val=nullptr
> + fi
> + tm_defines="$tm_defines WITH_${uc_option}_STRING=$val"
> +done
> +
>  if test "$target_cpu_default2" != ""
>  then
>   if test "$target_cpu_default" != ""
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index d21e041eccb..0bc700b81ad 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -18109,6 +18109,14 @@ aarch64_override_options (void)
>if (aarch64_branch_protection_string)
>  aarch64_validate_mbranch_protection (aarch64_branch_protection_string);
>  
> +  /* Emulate OPTION_DEFAULT_SPECS.  */
> +  if (!aarch64_arch_string && !aarch64_cpu_string)
> +aarch64_arch_string = WITH_ARCH_STRING;
> +  if (!aarch64_arch_string && !aarch64_cpu_string)
> +aarch64_cpu_string = WITH_CPU_STRING;
> +  if (!aarch64_cpu_string && !aarch64_tune_string)
> +aarch64_tune_string = WITH_TUNE_STRING;
> +
>/* -mcpu=CPU is shorthand for -march=ARCH_FOR_CPU, -mtune=CPU.
>   If either of -march or -mtune is given, they override their
>   respective component of -mcpu.  */
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 80cfe4b7407..3122dbd7098 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -1267,7 +1267,10 @@ extern enum aarch64_code_model aarch64_cmodel;
>  /* Support for configure-time --with-arch, --with-cpu and --with-tune.
> --with-arch and --with-cpu are ignored if either -mcpu or -march is used.
> --with-tune is ignored if either -mtune or -mcpu is used (but is not
> -   affected by -march).  */
> +   affected by -march).
> +
> +   There is corresponding code in aarch64_override_options that emulates
> +   this behavior when cc1 etc. are invoked directly.  */
>  #define OPTION_DEFAULT_SPECS \
>{"arch", "%{!march=*:%{!mcpu=*:-march=%(VALUE)}}" },   \
>{"cpu",  "%{!march=*:%{!mcpu=*:-mcpu=%(VALUE)}}" },   \


Re: [PATCH] Add condition coverage profiling

2022-08-02 Thread Jørgen Kvalsvik via Gcc-patches
On 15/07/2022 15:47, Jørgen Kvalsvik wrote:
> On 15/07/2022 15:31, Sebastian Huber wrote:
>> On 15.07.22 13:47, Jørgen Kvalsvik via Gcc-patches wrote:
>>> 2. New vocabulary for the output - decisions for, well, the decisions. It 
>>> also
>>> writes at most one line per condition:
>>>
>>> decisions covered 1/4
>>> condition  0 not covered (true false)
>>> condition  1 not covered (true)
>>
>> Do we really have multiple decisions? I think we have only one decision 
>> composed
>> of conditions and zero or more boolean operators. We have variants of 
>> condition
>> outcomes.
>>
> 
> Maybe, I'm not sure - you could argue that since a fixture of boolean
> "dominates" the outcome (either by short circuiting subsequent terms or 
> masking
> preceding terms) then that fixture is a decision which leads to one of two
> outcomes.  The other parameters may change but they wouldn't change the
> decision. You have 2^N inputs but only N+1 decisions. Personally the 
> "variants"
> phrasing doesn't feel right to me.
> 
> That being said I'm open to making this whatever the maintainers feel is
> appropriate.

Coming back to this, these are the terms and definitions borrowed from
the wiki [1]:

Condition
A condition is a leaf-level Boolean expression (it cannot be broken down
into simpler Boolean expressions).

Decision
A Boolean expression composed of conditions and zero or more Boolean
operators. A decision without a Boolean operator is a condition.

Condition coverage
Every condition in a decision in the program has taken all possible outcomes
at least once.

Decision coverage
Every point of entry and exit in the program has been invoked at least once,
and every decision in the program has taken all possible outcomes at least once.



I agree that decisions n/m, based on these definitions, is not the best output
unless it is specifically overloaded to mean "the family of bit vectors that
cover some conditions". Not great.

Based on this established terminology I can think of a few good candidates:

condition outcomes covered n/m
outcomes covered n/m

What do you think?

[1] https://en.wikipedia.org/wiki/Modified_condition/decision_coverage


[PATCH] RFC: Extend SLP permutation optimisations

2022-08-02 Thread Richard Sandiford via Gcc-patches
Currently SLP tries to force permute operations "down" the graph
from loads in the hope of reducing the total number of permutes
needed or (in the best case) removing the need for the permutes
entirely.  This patch tries to extend it as follows:

- Allow loads to take a different permutation from the one they
  started with, rather than choosing between "original permutation"
  and "no permutation".

- Allow changes in both directions, if the target supports the
  reverse permute operation.

- Treat the placement of permute operations as a two-way dataflow
  problem: after propagating information from leaves to roots (as now),
  propagate information back up the graph.

- Take execution frequency into account when optimising for speed,
  so that (for example) permutes inside loops have a higher cost
  than permutes outside loops.

- Try to reduce the total number of permutes when optimising for
  size, even if that increases the number of permutes on a given
  execution path.

See the big block comment above vect_optimize_slp_pass for
a detailed description.

A while back I posted a patch to extend the existing optimisation
to non-consecutive loads.  This patch doesn't include that change
(although it's a possible future addition).

The original motivation for doing this was to add a framework that would
allow other layout differences in future.  The two main ones are:

- Make it easier to represent predicated operations, including
  predicated operations with gaps.  E.g.:

 a[0] += 1;
 a[1] += 1;
 a[3] += 1;

  could be a single load/add/store for SVE.  We could handle this
  by representing a layout such as { 0, 1, _, 2 } or { 0, 1, _, 3 }
  (depending on what's being counted).  We might need to move
  elements between lanes at various points, like with permutes.

  (This would first mean adding support for stores with gaps.)

- Make it easier to switch between an even/odd and unpermuted layout
  when switching between wide and narrow elements.  E.g. if a widening
  operation produces an even vector and an odd vector, we should try
  to keep operations on the wide elements in that order rather than
  force them to be permuted back "in order".

To give some examples of what the current patch does:

int f1(int *__restrict a, int *__restrict b, int *__restrict c,
   int *__restrict d)
{
  a[0] = (b[1] << c[3]) - d[1];
  a[1] = (b[0] << c[2]) - d[0];
  a[2] = (b[3] << c[1]) - d[3];
  a[3] = (b[2] << c[0]) - d[2];
}

continues to produce the same code as before when optimising for
speed: b, c and d are permuted at load time.  But when optimising
for size we instead permute c into the same order as b+d and then
permute the result of the arithmetic into the same order as a:

ldr q1, [x2]
ldr q0, [x1]
ext v1.16b, v1.16b, v1.16b, #8 // <--
sshl    v0.4s, v0.4s, v1.4s
ldr q1, [x3]
sub v0.4s, v0.4s, v1.4s
rev64   v0.4s, v0.4s   // <--
str q0, [x0]
ret

The following function:

int f2(int *__restrict a, int *__restrict b, int *__restrict c,
   int *__restrict d)
{
  a[0] = (b[3] << c[3]) - d[3];
  a[1] = (b[2] << c[2]) - d[2];
  a[2] = (b[1] << c[1]) - d[1];
  a[3] = (b[0] << c[0]) - d[0];
}

continues to push the reverse down to just before the store,
like the current code does.

In:

int f3(int *__restrict a, int *__restrict b, int *__restrict c,
   int *__restrict d)
{
  for (int i = 0; i < 100; ++i)
{
  a[0] = (a[0] + c[3]);
  a[1] = (a[1] + c[2]);
  a[2] = (a[2] + c[1]);
  a[3] = (a[3] + c[0]);
  c += 4;
}
}

the loads of a are hoisted and the stores of a are sunk, so that
only the load from c happens in the loop.  When optimising for
speed, we prefer to have the loop operate on the reversed layout,
changing on entry and exit from the loop:

mov x3, x0
adrp    x0, .LC0
add x1, x2, 1600
ldr q2, [x0, #:lo12:.LC0]
ldr q0, [x3]
mov v1.16b, v0.16b
tbl     v0.16b, {v0.16b - v1.16b}, v2.16b   // <--
.p2align 3,,7
.L6:
ldr q1, [x2], 16
add v0.4s, v0.4s, v1.4s
cmp x2, x1
bne .L6
mov v1.16b, v0.16b
adrp    x0, .LC0
ldr q2, [x0, #:lo12:.LC0]
tbl     v0.16b, {v0.16b - v1.16b}, v2.16b   // <--
str q0, [x3]
ret

Similarly, for the very artificial testcase:

int f4(int *__restrict a, int *__restrict b, int *__restrict c,
   int *__restrict d)
{
  int a0 = a[0];
  int a1 = a[1];
  int a2 = a[2];
  int a3 = a[3];
  for (int i = 0; i < 100; ++i)
{
  a0 ^= c[0];
  a1 ^= c[1];
  a2 ^= c[2];
  a3 ^= c[3];
  c += 4;
  for (int j = 0; j < 100; ++j)
{
  a0 += d[1];
  a1 += d[0];
  a2 += d[3];
  a3 += d[2];
  d += 4;
}
  b[0] = a0;
  b[1] = a1;
  b[2] = a2;
 

Re: [COMMITTED] Make irange dependency explicit for range_of_ssa_name_with_loop_info.

2022-08-02 Thread Richard Biener via Gcc-patches
On Mon, Aug 1, 2022 at 8:17 AM Aldy Hernandez via Gcc-patches
 wrote:
>
> Even though ranger is type agnostic, SCEV seems to only work with
> integers.  This patch removes some FIXME notes making it explicit that
> bounds_of_var_in_loop only works with iranges.

SCEV also handles floats; where do you see this failing?  Yes, support is
somewhat limited, but still.
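
As context, a minimal (invented) example of the integer case this code path serves, where the loop PHI has bounds that SCEV can describe:

```c
#include <assert.h>

/* The PHI for i has the range [0, 9], which is the kind of fact
   bounds_of_var_in_loop derives from SCEV.  */
static int sum_below_ten (void)
{
  int sum = 0;
  for (int i = 0; i < 10; i++)
    sum += i;
  return sum;   /* in principle foldable to a constant via these ranges */
}
```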

> Tested on x86-64 Linux.
>
> gcc/ChangeLog:
>
> * gimple-range-fold.cc (fold_using_range::range_of_phi): Only
> query SCEV for integers.
> (fold_using_range::range_of_ssa_name_with_loop_info): Remove
> irange check.
> ---
>  gcc/gimple-range-fold.cc | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
> index 6f907df5bf5..7665c954f2b 100644
> --- a/gcc/gimple-range-fold.cc
> +++ b/gcc/gimple-range-fold.cc
> @@ -853,12 +853,14 @@ fold_using_range::range_of_phi (vrange &r, gphi *phi, fur_source &src)
>}
>
>// If SCEV is available, query if this PHI has any knonwn values.
> -  if (scev_initialized_p () && !POINTER_TYPE_P (TREE_TYPE (phi_def)))
> +  if (scev_initialized_p ()
> +  && !POINTER_TYPE_P (TREE_TYPE (phi_def))
> +  && irange::supports_p (TREE_TYPE (phi_def)))
>  {
> -  value_range loop_range;
>class loop *l = loop_containing_stmt (phi);
>if (l && loop_outer (l))
> {
> + int_range_max loop_range;
>   range_of_ssa_name_with_loop_info (loop_range, phi_def, l, phi, src);
>   if (!loop_range.varying_p ())
> {
> @@ -1337,9 +1339,7 @@ fold_using_range::range_of_ssa_name_with_loop_info (irange &r, tree name,
>  {
>gcc_checking_assert (TREE_CODE (name) == SSA_NAME);
>tree min, max, type = TREE_TYPE (name);
> -  // FIXME: Remove the supports_p() once all this can handle floats, etc.
> -  if (irange::supports_p (type)
> -  && bounds_of_var_in_loop (&min, &max, src.query (), l, phi, name))
> +  if (bounds_of_var_in_loop (&min, &max, src.query (), l, phi, name))
>  {
>if (TREE_CODE (min) != INTEGER_CST)
> {
> --
> 2.37.1
>


Re: [GCC13][Patch][V2][1/2]Add a new option -fstrict-flex-array[=n] and attribute strict_flex_array(n) and use it in PR101836

2022-08-02 Thread Richard Biener via Gcc-patches
On Mon, 1 Aug 2022, Qing Zhao wrote:

> 
> 
> > On Aug 1, 2022, at 3:38 AM, Richard Biener  wrote:
> > 
> > On Fri, 29 Jul 2022, Qing Zhao wrote:
> > 
> >> Hi, Richard,
> >> 
> >> Thanks a lot for your comments and suggestions. (And sorry for my late 
> >> reply).
> >> 
> >>> On Jul 28, 2022, at 3:26 AM, Richard Biener  wrote:
> >>> 
> >>> On Tue, 19 Jul 2022, Qing Zhao wrote:
> >>> 
>  From 3854004802b8e2f132ebf218fc35a632f5e80c6a Mon Sep 17 00:00:00 2001
>  From: Qing Zhao 
>  Date: Mon, 18 Jul 2022 17:04:12 +
>  Subject: [PATCH 1/2] Add a new option -fstrict-flex-array[=n] and new
>  attribute strict_flex_array
>  
>  Add the following new option -fstrict-flex-array[=n] and a corresponding
>  attribute strict_flex_array to GCC:
>  
>  '-fstrict-flex-array'
>    Treat the trailing array of a structure as a flexible array member
>    in a stricter way.  The positive form is equivalent to
>    '-fstrict-flex-array=3', which is the strictest.  A trailing array
>    is treated as a flexible array member only when it is declared as a
>    flexible array member per C99 standard onwards.  The negative form
>    is equivalent to '-fstrict-flex-array=0', which is the least
>    strict.  All trailing arrays of structures are treated as flexible
>    array members.
>  
>  '-fstrict-flex-array=LEVEL'
>    Treat the trailing array of a structure as a flexible array member
>    in a stricter way.  The value of LEVEL controls the level of
>    strictness.
>  
>    The possible values of LEVEL are the same as for the
>    'strict_flex_array' attribute (*note Variable Attributes::).
>  
>    You can control this behavior for a specific trailing array field
>    of a structure by using the variable attribute 'strict_flex_array'
>    attribute (*note Variable Attributes::).
>  
>  'strict_flex_array (LEVEL)'
>    The 'strict_flex_array' attribute should be attached to the
>    trailing array field of a structure.  It specifies the level of
>    strictness of treating the trailing array field of a structure as a
>    flexible array member.  LEVEL must be an integer between 0 and 3.
>  
>    LEVEL=0 is the least strict level, all trailing arrays of
>    structures are treated as flexible array members.  LEVEL=3 is the
>    strictest level, only when the trailing array is declared as a
>    flexible array member per C99 standard onwards ([]), it is treated
>    as a flexible array member.
>  
>    There are two more levels in between 0 and 3, which are provided to
>    support older codes that use GCC zero-length array extension ([0])
>    or one-size array as flexible array member ([1]): When LEVEL is 1,
>    the trailing array is treated as a flexible array member when it is
>    declared as either [], [0], or [1]; When LEVEL is 2, the trailing
>    array is treated as a flexible array member when it is declared as
>    either [], or [0].
>  
>    This attribute can be used with or without '-fstrict-flex-array'.
>    When both the attribute and the option are present at the same time,
>    the level of the strictness for the specific trailing array field
>    is determined by the attribute.
>  
>  gcc/c-family/ChangeLog:
>  
>   * c-attribs.cc (handle_strict_flex_array_attribute): New function.
>   (c_common_attribute_table): New item for strict_flex_array.
>   * c.opt (fstrict-flex-array): New option.
>   (fstrict-flex-array=): New option.
>  
>  gcc/c/ChangeLog:
>  
>   * c-decl.cc (add_flexible_array_elts_to_size): Call new utility
>   routine flexible_array_member_p.
>   (is_flexible_array_member_p): New function.
>   (finish_struct): Set the new DECL_NOT_FLEXARRAY flag.
>  
>  gcc/ChangeLog:
>  
>   * doc/extend.texi: Document strict_flex_array attribute.
>   * doc/invoke.texi: Document -fstrict-flex-array[=n] option.
>   * tree-core.h (struct tree_decl_common): New bit field
>   decl_not_flexarray.
>   * tree.cc (component_ref_size): Reorg by using new utility functions.
>   (flexible_array_member_p): New function.
>   (zero_length_array_p): Likewise.
>   (one_element_array_p): Likewise.
>   (flexible_array_type_p): Likewise.
>   * tree.h (DECL_NOT_FLEXARRAY): New flag.
>   (zero_length_array_p): New function prototype.
>   (one_element_array_p): Likewise.
>   (flexible_array_member_p): Likewise.
>  
>  gcc/testsuite/ChangeLog:
>  
>   * gcc.dg/strict-flex-array-1.c: New test.
>  ---
>  gcc/c-family/c-attribs.cc  |  47 
>  gcc/c-family/c.opt |   7 ++
>  gcc/c/c-decl.cc|  91 +--
>  gcc/doc/extend.texi|  25 
>  gcc/doc/invoke.texi 
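
The level semantics described above can be sketched as follows (illustrative only: the struct names are invented, the size checks assume the usual layout where [] and [0] trailing arrays contribute no storage, and the attribute is guarded since it only exists once this patch is in):

```c
#include <assert.h>

#ifdef __has_attribute
# if __has_attribute (strict_flex_array)
#  define STRICT_FLEX(n) __attribute__ ((strict_flex_array (n)))
# endif
#endif
#ifndef STRICT_FLEX
# define STRICT_FLEX(n) /* attribute unavailable on this compiler */
#endif

struct s_c99   { int n; int data[];  };  /* flexible at every level (0-3) */
struct s_zero  { int n; int data[0]; };  /* flexible at levels 0-2 only */
struct s_one   { int n; int data[1]; };  /* flexible at levels 0-1 only */
struct s_fixed { int n; int data[4]; };  /* flexible only at level 0 */

/* Per-field override: never treated as flexible, whatever -fstrict-flex-array
   says on the command line.  */
struct s_attr  { int n; int data[1] STRICT_FLEX (3); };

/* The [] and [0] forms contribute no storage; [1] and [4] do.  */
int check_sizes (void)
{
  return sizeof (struct s_c99)   == 1 * sizeof (int)
      && sizeof (struct s_zero)  == 1 * sizeof (int)
      && sizeof (struct s_one)   == 2 * sizeof (int)
      && sizeof (struct s_fixed) == 5 * sizeof (int);
}
```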

[PATCH] tree-optimization/106498 - reduce SSA updates in autopar

2022-08-02 Thread Richard Biener via Gcc-patches
The following reduces the number of SSA updates done during autopar
OMP expansion, specifically avoiding the cases that just add virtual
operands (where maybe none have been before) in dead regions of the CFG.

Instead virtual SSA update is delayed until after the pass.  There's
much more TLC needed here, but test coverage makes it really difficult.
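
The idiom the patch moves to can be sketched like this (the flag names mirror the patch; the numeric values and the function body are invented for illustration; GCC defines the real flags in tree-pass.h):

```c
#include <assert.h>

/* Invented values standing in for GCC's TODO flags.  */
#define TODO_cleanup_cfg              (1u << 0)
#define TODO_update_ssa_only_virtuals (1u << 1)

/* Instead of calling update_ssa () and cleanup_tree_cfg () inline at each
   expansion site, the pass returns TODO flags and lets the pass manager
   run both once, after the whole pass has finished.  */
static unsigned int
execute_expand_omp_sketch (int in_ssa /* stands in for gimple_in_ssa_p (cfun) */)
{
  /* ... OMP region expansion would happen here ... */
  return TODO_cleanup_cfg
         | (in_ssa ? TODO_update_ssa_only_virtuals : 0u);
}
```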

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/106498
* omp-expand.cc (expand_omp_taskreg): Do not perform virtual
SSA update here.
(expand_omp_for): Or here.
(execute_expand_omp): Instead schedule it here together
with CFG cleanup via TODO.
---
 gcc/omp-expand.cc | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index 936adff7f45..64e6308fc7b 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -1530,8 +1530,6 @@ expand_omp_taskreg (struct omp_region *region)
 expand_teams_call (new_bb, as_a <gomp_teams *> (entry_stmt));
   else
 expand_task_call (region, new_bb, as_a <gomp_task *> (entry_stmt));
-  if (gimple_in_ssa_p (cfun))
-update_ssa (TODO_update_ssa_only_virtuals);
 }
 
 /* Information about members of an OpenACC collapsed loop nest.  */
@@ -8191,9 +8189,6 @@ expand_omp_for (struct omp_region *region, gimple *inner_stmt)
  (enum built_in_function) next_ix, sched_arg,
  inner_stmt);
 }
-
-  if (gimple_in_ssa_p (cfun))
-update_ssa (TODO_update_ssa_only_virtuals);
 }
 
 /* Expand code for an OpenMP sections directive.  In pseudo code, we generate
@@ -10591,13 +10586,10 @@ execute_expand_omp (void)
 
   expand_omp (root_omp_region);
 
-  if (flag_checking && !loops_state_satisfies_p (LOOPS_NEED_FIXUP))
-verify_loop_structure ();
-  cleanup_tree_cfg ();
-
   omp_free_regions ();
 
-  return 0;
+  return (TODO_cleanup_cfg
+ | (gimple_in_ssa_p (cfun) ? TODO_update_ssa_only_virtuals : 0));
 }
 
 /* OMP expansion -- the default pass, run before creation of SSA form.  */
-- 
2.35.3


[PATCH] lto/106334 - fix previous fix wrt -flto-partition=none

2022-08-02 Thread Richard Biener via Gcc-patches
This adjusts the assert guard to include -flto-partition=none which
behaves as WPA.

Bootstrapped & tested on x86_64-unknown-linux-gnu, pushed.

PR lto/106334
* dwarf2out.cc (dwarf2out_register_external_die): Adjust
assert.
---
 gcc/dwarf2out.cc | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 3ac39c1a5b0..cfea9cf6451 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -6069,11 +6069,12 @@ dwarf2out_register_external_die (tree decl, const char *sym,
 
   if (!external_die_map)
 external_die_map = hash_map<tree, sym_off_pair>::create_ggc (1000);
-  /* When we do tree merging during WPA we can end up re-using GC memory
- as there's currently no way to unregister external DIEs.  Ideally
- we'd register them only after merging finished but allowing override
- here is easiest.  See PR106334.  */
-  gcc_checking_assert (flag_wpa || !external_die_map->get (decl));
+  /* When we do tree merging during WPA or with -flto-partition=none we
+ can end up re-using GC memory as there's currently no way to unregister
+ external DIEs.  Ideally we'd register them only after merging finished
+ but allowing override here is easiest.  See PR106334.  */
+  gcc_checking_assert (!(in_lto_p && !flag_wpa)
+  || !external_die_map->get (decl));
   sym_off_pair p = { IDENTIFIER_POINTER (get_identifier (sym)), off };
   external_die_map->put (decl, p);
 }
-- 
2.35.3