PING^1: [PATCH] ipa-devirt: check precision mismatch of enum values [PR101396]

2021-07-18 Thread Xi Ruoyao via Gcc-patches
Ping.

On Sun, 2021-07-11 at 01:48 +0800, Xi Ruoyao wrote:
> We compare enum values (in wide_int) to check for ODR violations.
> However, comparing two wide_int values with different precision
> triggers an assert, leading to an ICE.  With the enum-base introduced
> in C++11, it's easy to run into this situation.
> 
> To fix the issue, we need to explicitly check for this kind of
> mismatch, and emit a proper warning message when one is found.
> 
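> To illustrate (this sketch is not part of the patch and just assumes the
> usual wide-int.h API, i.e. wi::shwi and its checking asserts), the failure
> mode is essentially:
> 
>   wide_int a = wi::shwi (0, 32);  /* 32-bit precision */
>   wide_int b = wi::shwi (0, 64);  /* 64-bit precision */
>   bool eq = (a == b);             /* precisions differ, so the checking
>                                      assert in wide-int.h fires -> ICE */
> 
> which is what the existing comparison runs into when two translation
> units give the same enum different enum-bases.
> 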
> Bootstrapped & regtested on x86_64-linux-gnu.  Ok for trunk?
> 
> gcc/
> 
> PR ipa/101396
> * ipa-devirt.c (ipa_odr_read_section): Compare the precision of
>   enum values, and emit a warning if they mismatch.
> 
> gcc/testsuite/
> 
> PR ipa/101396
> * g++.dg/lto/pr101396_0.C: New test.
> * g++.dg/lto/pr101396_1.C: New test.
> ---
>  gcc/ipa-devirt.c                      |  9 +++++++++
>  gcc/testsuite/g++.dg/lto/pr101396_0.C | 12 ++++++++++++
>  gcc/testsuite/g++.dg/lto/pr101396_1.C | 10 ++++++++++
>  3 files changed, 31 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/lto/pr101396_0.C
>  create mode 100644 gcc/testsuite/g++.dg/lto/pr101396_1.C
> 
> diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
> index 8cd1100aba9..8deec75b2df 100644
> --- a/gcc/ipa-devirt.c
> +++ b/gcc/ipa-devirt.c
> @@ -4193,6 +4193,8 @@ ipa_odr_read_section (struct lto_file_decl_data *file_data, const char *data,
>   if (do_warning != -1 || j >= this_enum.vals.length ())
> continue;
>   if (strcmp (id, this_enum.vals[j].name)
> + || (val.get_precision() !=
> + this_enum.vals[j].val.get_precision())
>   || val != this_enum.vals[j].val)
> {
>   warn_name = xstrdup (id);
> @@ -4260,6 +4262,13 @@ ipa_odr_read_section (struct lto_file_decl_data *file_data, const char *data,
>     "name %qs differs from name %qs defined"
>     " in another translation unit",
>     this_enum.vals[j].name, warn_name);
> + else if (this_enum.vals[j].val.get_precision() !=
> +  warn_value.get_precision())
> +   inform (this_enum.vals[j].locus,
> +   "name %qs is defined as %u-bit while another "
> +   "translation unit defines it as %u-bit",
> +   warn_name, this_enum.vals[j].val.get_precision(),
> +   warn_value.get_precision());
>   /* FIXME: In case there is easy way to print wide_ints,
>      perhaps we could do it here instead of overflow check.  */
>   else if (wi::fits_shwi_p (this_enum.vals[j].val)
> diff --git a/gcc/testsuite/g++.dg/lto/pr101396_0.C b/gcc/testsuite/g++.dg/lto/pr101396_0.C
> new file mode 100644
> index 000..b7a2947a880
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/lto/pr101396_0.C
> @@ -0,0 +1,12 @@
> +/* { dg-lto-do link } */
> +
> +enum A : __UINT32_TYPE__ { // { dg-lto-warning "6: type 'A' violates the C\\+\\+ One Definition Rule" }
> +  a, // { dg-lto-note "3: name 'a' is defined as 32-bit while another translation unit defines it as 64-bit" }
> +  b,
> +  c
> +};
> +
> +int main()
> +{
> +  return (int) A::a;
> +}
> diff --git a/gcc/testsuite/g++.dg/lto/pr101396_1.C b/gcc/testsuite/g++.dg/lto/pr101396_1.C
> new file mode 100644
> index 000..a6d032d694d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/lto/pr101396_1.C
> @@ -0,0 +1,10 @@
> +enum A : __UINT64_TYPE__ { // { dg-lto-note "6: an enum with different value name is defined in another translation unit" }
> +  a, // { dg-lto-note "3: mismatching definition" }
> +  b,
> +  c
> +};
> +
> +int f(enum A x)
> +{
> +  return (int) x;
> +}

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University



Ping [PATCH, rs6000] fix failure test cases caused by disabling mode promotion for pseudos [PR100952]

2021-07-18 Thread HAO CHEN GUI via Gcc-patches

Hi,

  Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574503.html

Thanks.

On 6/7/2021 11:11 AM, HAO CHEN GUI wrote:


Hi

   The patch changes the matching conditions in pr81384.c and pr56605.c. 
The original conditions failed to match because mode promotion is disabled.


   The attachments are the patch diff and change log file.

   Bootstrapped and tested on powerpc64le-linux with no regressions. 
Is this okay for trunk? Any recommendations? Thanks a lot.




Ping [PATCH, rs6000] fix execution failure of parity_1.f90 on P10 [PR100952]

2021-07-18 Thread HAO CHEN GUI via Gcc-patches

Hi,

 Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575036.html

Thanks

On 13/7/2021 9:38 AM, HAO CHEN GUI wrote:

Hi,

   I refined the patch according to Segher's advice. Is this okay for 
trunk? Any recommendations? Thanks a lot.


On 6/7/2021 11:01 AM, HAO CHEN GUI wrote:

Hi,

   The patch fixes the wrong "if" fall-through in the "cstore4" expander, 
which caused the comparison pattern to be expanded twice on P10.


   The attachments are the patch diff and change log file.

    Bootstrapped and tested on powerpc64le-linux with no regressions. 
Is this okay for trunk? Any recommendations? Thanks a lot.




Re: [PATCH] Fold bswap32(x) != 0 to x != 0 (and related transforms)

2021-07-18 Thread Marc Glisse

On Sun, 18 Jul 2021, Roger Sayle wrote:


+(if (GIMPLE || !TREE_SIDE_EFFECTS (@0))


I don't think you need to worry about that, the general genmatch machinery 
is already supposed to take care of it. All the existing cases in match.pd 
are about cond_expr, where counting the occurrences of each @i is not 
reliable.


--
Marc Glisse


RE: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-07-18 Thread Di Zhao OS via Gcc-patches

I tried to improve the patch following your advice and to catch more
opportunities.  Hope it'll be helpful.

On 6/24/21 8:29 AM, Richard Biener wrote: 
> On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches wrote:
> 
> I have some reservations about extending the ad-hoc "predicated value" code.
> 
> Some comments on the patch:
> 
> +/* hashtable & helpers to record equivalences at given bb.  */
> +
> +typedef struct val_equiv_s
> +{
> +  val_equiv_s *next;
> +  val_equiv_s *unwind_to;
> +  hashval_t hashcode;
> +  /* SSA name this val_equiv_s is associated with.  */
> +  tree name;
> +  /* RESULT in a vn_pval entry is the SSA name of an equivalence.  */
> +  vn_pval *values;
> +} * val_equiv_t;
> 
> all of this (and using a hashtable for recording) is IMHO a bit overkill.
> Since you only ever record equivalences for values the more natural place to
> hook those in is the vn_ssa_aux structure where we also record the
> availability chain.
 
I tried to store the equivalences in the vn_ssa_aux structure, but I couldn't
get the second case to work: I need to record the equivalence of a PHI
expression's result and arguments, but their value numbers become VARYING
first, so they won't be changed afterwards.  Maybe I'm missing something, or
is there a way to force-change a VARYING result?

Besides, as currently used, the equivalences only need to be "predictable"
rather than available, so maybe availability chains do not represent them
very well?

> There's little commentary in the new code, in particular function-level
> comments are missing everywhere.

Added more comments.

> There are complexity issues, like I see val_equiv_insert has a "recurse"
> feature but also find_predicated_value_by_equiv is quadratic in the number of
> equivalences of the lhs/rhs.  Without knowing what the recursion on the
> former is for - nothing tells me - I suspect either or both should be
> redundant.

The intention was, given {A==B, B==C, X==Y, Y==Z} and a previous result of
"C opcode Z", to find the result of "A opcode Y". I removed the "recurse"
feature and modified the searching logic to solve the issue.  Now a temporary
hash_set is used to record the equivalences that are visited when searching.

> You seem to record equivalences at possible use points which looks odd at best
> - I'd expected equivalences being recorded at the same point we record
> predicated values and for the current condition, not the one determining some
> other predication.
> What was the motivation to do it the way you do it?

The purpose is to "bring down" what is known from a previous basic block
that effectively dominates the current block but does not actually do so
(in the example this is because jump threading is hindered by a loop).
For example, in this case:

  if (a != 0)
    // Nothing useful can be recorded here, because this BB doesn't dominate
    // the BB that we want to simplify.
    c = b;
  for (unsigned i = 0; i < c; i++)
    {
      if (a != 0)  // The recording is triggered here.
        {
          // c == b will be recorded here, so it can be used for simplification.
          // In gimple it is the equivalence of a PHI's result and argument.
          if (i >= b)
            foo ();

This requires finding a previous condition that is identical to the current
one, so it is convenient to do this in FRE.  Besides, since FRE records
derived predicates, there may also be optimization opportunities for related
conditions.  This is included in the new patch.

Besides, to find more opportunities, I added a hash map to store mappings
from immediate dominators to basic blocks with PHIs of interest.

> Why is the code conditional on 'iterate'?

I haven't worked out how to make it fit the non-iterate mode, so it currently
breaks the if-conversion pass.  I think this is because some of the
equivalence recordings are too optimistic for the non-iterate mode.

> You've probably realized that there's no "nice" way to handle temporary
> equivalences in a VN scheme using hashing for expressions (unless you degrade
> hashes a lot).

I modified the code to use TREE_HASH on SSA names.  Would that be better?

> You quote opportunities that are catched with this like
> 
> +  if (a != 0)
> +{
> +  c = b;
> +}
> +  for (unsigned i = 0; i < c; i++)
> +{
> +  if (a != 0)
> +   {
> + if (i >= b)
> +   /* Should be eliminated.
> +*/
> +   foo ();
> 
> but say other "obvious" cases like
> 
> +  if (a != 0)
> +{
> +  c = b;
> +}
> +  for (unsigned i = 0; i < c; i++)
> +{
> +  if (a != 0)
> +   {
> +   /* Should be zero.  */
>  return b - c;
> 
> are not handled.  That's in line with the current "value predication"
> which mainly aims at catching simple jump threading opportunities; you only
> simplify conditions with the recorded equivalences.  But then the complexity
> of handling equivalences does probably not outweigh the opportunities caught -

Re: [PATCH] ix86: Enable the GPR only instructions for -mgeneral-regs-only

2021-07-18 Thread Uros Bizjak via Gcc-patches
On Sun, Jul 18, 2021 at 3:40 AM H.J. Lu  wrote:
>
> For -mgeneral-regs-only, enable the GPR only instructions which are
> enabled implicitly by SSE ISAs unless they have been disabled explicitly.
>
> gcc/
>
> PR target/101492
> * common/config/i386/i386-common.c (ix86_handle_option): For
> -mgeneral-regs-only, enable the GPR only instructions which are
> enabled implicitly by SSE ISAs unless they have been disabled
> explicitly.
>
> gcc/testsuite/
>
> PR target/101492
> * gcc.target/i386/pr101492-1.c: New test.
> * gcc.target/i386/pr101492-2.c: Likewise.
> * gcc.target/i386/pr101492-3.c: Likewise.
> * gcc.target/i386/pr101492-4.c: Likewise.

LGTM.

Thanks,
Uros.

> ---
>  gcc/common/config/i386/i386-common.c       | 27 +++++++++++++++++++++++--
>  gcc/testsuite/gcc.target/i386/pr101492-1.c | 10 ++++++++++
>  gcc/testsuite/gcc.target/i386/pr101492-2.c | 10 ++++++++++
>  gcc/testsuite/gcc.target/i386/pr101492-3.c | 10 ++++++++++
>  gcc/testsuite/gcc.target/i386/pr101492-4.c | 12 ++++++++++++
>  5 files changed, 67 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101492-4.c
>
> diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
> index e156cc34584..76ab1a14e54 100644
> --- a/gcc/common/config/i386/i386-common.c
> +++ b/gcc/common/config/i386/i386-common.c
> @@ -354,16 +354,39 @@ ix86_handle_option (struct gcc_options *opts,
>  case OPT_mgeneral_regs_only:
>if (value)
> {
> + HOST_WIDE_INT general_regs_only_flags = 0;
> + HOST_WIDE_INT general_regs_only_flags2 = 0;
> +
> + /* NB: Enable the GPR only instructions which are enabled
> +implicitly by SSE ISAs unless they have been disabled
> +explicitly.  */
> + if (TARGET_SSE4_2_P (opts->x_ix86_isa_flags))
> +   {
> + if (!TARGET_EXPLICIT_CRC32_P (opts))
> +   general_regs_only_flags |= OPTION_MASK_ISA_CRC32;
> + if (!TARGET_EXPLICIT_POPCNT_P (opts))
> +   general_regs_only_flags |= OPTION_MASK_ISA_POPCNT;
> +   }
> + if (TARGET_SSE3_P (opts->x_ix86_isa_flags))
> +   {
> + if (!TARGET_EXPLICIT_MWAIT_P (opts))
> +   general_regs_only_flags2 |= OPTION_MASK_ISA2_MWAIT;
> +   }
> +
>   /* Disable MMX, SSE and x87 instructions if only
>  general registers are allowed.  */
>   opts->x_ix86_isa_flags
> &= ~OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET;
>   opts->x_ix86_isa_flags2
> &= ~OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET;
> + opts->x_ix86_isa_flags |= general_regs_only_flags;
> + opts->x_ix86_isa_flags2 |= general_regs_only_flags2;
>   opts->x_ix86_isa_flags_explicit
> -   |= OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET;
> +   |= (OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET
> +   | general_regs_only_flags);
>   opts->x_ix86_isa_flags2_explicit
> -   |= OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET;
> +   |= (OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET
> +   | general_regs_only_flags2);
>
>   opts->x_target_flags &= ~MASK_80387;
> }
> diff --git a/gcc/testsuite/gcc.target/i386/pr101492-1.c b/gcc/testsuite/gcc.target/i386/pr101492-1.c
> new file mode 100644
> index 000..41002571761
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101492-1.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse4.2 -mgeneral-regs-only" } */
> +
> +#include 
> +
> +unsigned int
> +foo1 (unsigned int x, unsigned int y)
> +{
> +  return __crc32d (x, y);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr101492-2.c b/gcc/testsuite/gcc.target/i386/pr101492-2.c
> new file mode 100644
> index 000..c7d24f43c39
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101492-2.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse4.2 -mgeneral-regs-only" } */
> +
> +#include 
> +
> +unsigned int
> +foo1 (unsigned int x)
> +{
> +  return _mm_popcnt_u32 (x);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr101492-3.c b/gcc/testsuite/gcc.target/i386/pr101492-3.c
> new file mode 100644
> index 000..37e2071ab57
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101492-3.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse3 -mgeneral-regs-only" } */
> +
> +#include 
> +
> +void
> +foo1 (unsigned int x, unsigned int y)
> +{
> +  _mm_mwait (x, y);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr101492-4.c b/gcc/testsuite/gcc.target/i386/pr101492-4.c
> new file mode 100644
> index 000..c5a4f0abd25
> --- /dev/null
> 

Re: [PATCH] x86: Don't issue vzeroupper if callee returns AVX register

2021-07-18 Thread Uros Bizjak via Gcc-patches
On Sun, Jul 18, 2021 at 6:47 PM H.J. Lu  wrote:
>
> Don't issue vzeroupper before function call if callee returns AVX
> register since callee must be compiled with AVX.
>
> gcc/
>
> PR target/101495
> * config/i386/i386.c (ix86_check_avx_upper_stores): Moved before
> ix86_avx_u128_mode_needed.
> (ix86_avx_u128_mode_needed): Return AVX_U128_DIRTY if callee
> returns AVX register.
>
> gcc/testsuite/
>
> PR target/101495
> * gcc.target/i386/avx-vzeroupper-28.c: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.c                      | 32 ++++++++++++++++++++------------
>  .../gcc.target/i386/avx-vzeroupper-28.c     | 17 +++++++++++++++++
>  2 files changed, 37 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx-vzeroupper-28.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 9d74b7a191b..e6c82624272 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -14093,6 +14093,18 @@ ix86_check_avx_upper_register (const_rtx exp)
>   && GET_MODE_BITSIZE (GET_MODE (exp)) > 128);
>  }
>
> +/* Check if a 256bit or 512bit AVX register is referenced in stores.   */
> +
> +static void
> +ix86_check_avx_upper_stores (rtx dest, const_rtx, void *data)
> + {
> +   if (ix86_check_avx_upper_register (dest))
> +{
> +  bool *used = (bool *) data;
> +  *used = true;
> +}
> + }
> +
>  /* Return needed mode for entity in optimize_mode_switching pass.  */
>
>  static int
> @@ -14117,6 +14129,14 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
> }
> }
>
> +  /* Needed mode is set to AVX_U128_CLEAN if there are no 256bit
> + nor 512bit registers used in the function return register.  */
> +  bool avx_upper_reg_found = false;
> +  note_stores (insn, ix86_check_avx_upper_stores,
> +  &avx_upper_reg_found);
> +  if (avx_upper_reg_found)
> +   return AVX_U128_DIRTY;
> +
>/* If the function is known to preserve some SSE registers,
>  RA and previous passes can legitimately rely on that for
>  modes wider than 256 bits.  It's only safe to issue a
> @@ -14217,18 +14237,6 @@ ix86_mode_needed (int entity, rtx_insn *insn)
>return 0;
>  }
>
> -/* Check if a 256bit or 512bit AVX register is referenced in stores.   */
> -
> -static void
> -ix86_check_avx_upper_stores (rtx dest, const_rtx, void *data)
> - {
> -   if (ix86_check_avx_upper_register (dest))
> -{
> -  bool *used = (bool *) data;
> -  *used = true;
> -}
> - }
> -
>  /* Calculate mode of upper 128bit AVX registers after the insn.  */
>
>  static int
> diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-28.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-28.c
> new file mode 100644
> index 000..381ee9a7f96
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-28.c
> @@ -0,0 +1,17 @@
> +/* PR target/101495  */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx2 -mtune=generic -dp" } */
> +
> +#include 
> +
> +extern __m256 x, y;
> +extern __m256 bar (void);
> +
> +__m256
> +foo ()
> +{
> +  x = y;
> +  return bar ();
> +}
> +
> +/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
> --
> 2.31.1
>


[PATCH] x86: Don't issue vzeroupper if callee returns AVX register

2021-07-18 Thread H.J. Lu via Gcc-patches
Don't issue vzeroupper before function call if callee returns AVX
register since callee must be compiled with AVX.

gcc/

PR target/101495
* config/i386/i386.c (ix86_check_avx_upper_stores): Moved before
ix86_avx_u128_mode_needed.
(ix86_avx_u128_mode_needed): Return AVX_U128_DIRTY if callee
returns AVX register.

gcc/testsuite/

PR target/101495
* gcc.target/i386/avx-vzeroupper-28.c: New test.
---
 gcc/config/i386/i386.c                      | 32 ++++++++++++++++++++------------
 .../gcc.target/i386/avx-vzeroupper-28.c     | 17 +++++++++++++++++
 2 files changed, 37 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-vzeroupper-28.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9d74b7a191b..e6c82624272 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14093,6 +14093,18 @@ ix86_check_avx_upper_register (const_rtx exp)
  && GET_MODE_BITSIZE (GET_MODE (exp)) > 128);
 }
 
+/* Check if a 256bit or 512bit AVX register is referenced in stores.   */
+
+static void
+ix86_check_avx_upper_stores (rtx dest, const_rtx, void *data)
+ {
+   if (ix86_check_avx_upper_register (dest))
+{
+  bool *used = (bool *) data;
+  *used = true;
+}
+ }
+
 /* Return needed mode for entity in optimize_mode_switching pass.  */
 
 static int
@@ -14117,6 +14129,14 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
}
}
 
+  /* Needed mode is set to AVX_U128_CLEAN if there are no 256bit
+ nor 512bit registers used in the function return register.  */
+  bool avx_upper_reg_found = false;
+  note_stores (insn, ix86_check_avx_upper_stores,
+  &avx_upper_reg_found);
+  if (avx_upper_reg_found)
+   return AVX_U128_DIRTY;
+
   /* If the function is known to preserve some SSE registers,
 RA and previous passes can legitimately rely on that for
 modes wider than 256 bits.  It's only safe to issue a
@@ -14217,18 +14237,6 @@ ix86_mode_needed (int entity, rtx_insn *insn)
   return 0;
 }
 
-/* Check if a 256bit or 512bit AVX register is referenced in stores.   */
- 
-static void
-ix86_check_avx_upper_stores (rtx dest, const_rtx, void *data)
- {
-   if (ix86_check_avx_upper_register (dest))
-{
-  bool *used = (bool *) data;
-  *used = true;
-}
- } 
-
 /* Calculate mode of upper 128bit AVX registers after the insn.  */
 
 static int
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-28.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-28.c
new file mode 100644
index 000..381ee9a7f96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-28.c
@@ -0,0 +1,17 @@
+/* PR target/101495  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2 -mtune=generic -dp" } */
+
+#include 
+
+extern __m256 x, y;
+extern __m256 bar (void);
+
+__m256
+foo ()
+{
+  x = y;
+  return bar ();
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
-- 
2.31.1



[PATCH] Fold bswap32(x) != 0 to x != 0 (and related transforms)

2021-07-18 Thread Roger Sayle
This patch to match.pd implements several closely related folding
simplifications at the tree level, making use of the property that the
bit-permutation functions rotate and bswap have inverses.

[1] bswap(X) eq/ne C, for constant C, simplifies to X eq/ne C'
where C'=bswap(C), generalizing the transform in the subject.
[2] bswap(X) eq/ne bswap(Y) simplifies to X eq/ne Y.
[3] lrotate(X,C1) eq/ne C2 simplifies to X eq/ne C3, where
C3 = rrotate(C2,C1), i.e. apply the inverse rotation to C2.
[4] Likewise, rrotate(X,C1) eq/ne C2 simplifies to X eq/ne C3,
where C3 = lrotate(C2,C1).
[5] rotate(X,Z) eq/ne rotate(Y,Z) simplifies to X eq/ne Y, when
the bit-count Z (the same on both sides) has no side-effects.
[6] rotate(X,Y) eq/ne 0 simplifies to X eq/ne 0 if Y has no
side-effects.
[7] Likewise, rotate(X,Y) eq/ne -1 simplifies to X eq/ne -1,
if Y has no side-effects.
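
For instance (illustrative only, not one of the new tests), transform [1]
lets a comparison such as

  int f (unsigned int x)
  {
    return __builtin_bswap32 (x) == 0x12345678;
  }

fold at the tree level to x == 0x78563412u, since
__builtin_bswap32 (0x78563412) == 0x12345678, and no bswap is emitted.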

This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and "make -k check" with no new failures.

Ok for mainline?


2021-07-18  Roger Sayle  

gcc/ChangeLog
* match.pd (rotate): Simplify equality/inequality of rotations.
(bswap): Simplify equality/inequality tests of byte swapping.

gcc/testsuite/ChangeLog
* gcc.dg/fold-eqrotate-1.c: New test case.
* gcc.dg/fold-eqbswap-1.c: New test case.

Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/match.pd b/gcc/match.pd
index beb8d27..aa850bb 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3312,6 +3312,25 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  { tree rotate_type = TREE_TYPE (@0); }
   (convert (rotate (convert:rotate_type @1) @2))
 
+(for cmp (eq ne)
+ (for rotate (lrotate rrotate)
+  invrot (rrotate lrotate)
+  /* (X >>r Y) cmp (Z >>r Y) may simplify to X cmp Z. */
+  (simplify
+   (cmp (rotate @1 @0) (rotate @2 @0))
+(if (GIMPLE || !TREE_SIDE_EFFECTS (@0))
+ (cmp @1 @2)))
+  /* (X >>r C1) cmp C2 may simplify to X cmp C3. */
+  (simplify
+   (cmp (rotate @0 INTEGER_CST@1) INTEGER_CST@2)
+   (cmp @0 { const_binop (invrot, TREE_TYPE (@0), @2, @1); }))
+  /* (X >>r Y) cmp C where C is 0 or ~0, may simplify to X cmp C.  */
+  (simplify
+   (cmp (rotate @0 @1) INTEGER_CST@2)
+(if ((GIMPLE || !TREE_SIDE_EFFECTS (@1))
+&& (integer_zerop (@2) || integer_all_onesp (@2)))
+ (cmp @0 @2)))))
+
 /* Simplifications of conversions.  */
 
 /* Basic strip-useless-type-conversions / strip_nops.  */
@@ -3622,6 +3641,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (simplify
(bswap (bitop:c (bswap @0) @1))
   (bitop @0 (bswap @1))))
+ (for cmp (eq ne)
+  (simplify
+   (cmp (bswap @0) (bswap @1))
+   (cmp @0 @1))
+  (simplify
+   (cmp (bswap @0) INTEGER_CST@1)
+   (cmp @0 (bswap @1))))
  /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) & C2.  */
  (simplify
   (bit_and (convert1? (rshift@0 (convert2? (bswap@4 @1)) INTEGER_CST@2))
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

int test1(int x, int y)
{
#if __SIZEOF_INT__ == 4
  return __builtin_bswap32(x) == __builtin_bswap32(y);
#else
  return x == y;
#endif
}

int test2(int x, int y)
{
#if __SIZEOF_INT__ == 4
  return __builtin_bswap32(x) != __builtin_bswap32(y);
#else
  return x != y;
#endif
}

int test3(int x)
{
#if __SIZEOF_INT__ == 4
  return __builtin_bswap32(x) == 12345;
#else
  return x;
#endif
}

int test4(int x)
{
#if __SIZEOF_INT__ == 4
  return __builtin_bswap32(x) != 12345;
#else
  return x;
#endif
}

int test1ll(long long x, long long y)
{
#if __SIZEOF_LONG_LONG__ == 8
  return __builtin_bswap64(x) == __builtin_bswap64(y);
#else
  return x == y;
#endif
}

int test2ll(long long x, long long y)
{
#if __SIZEOF_LONG_LONG__ == 8
  return __builtin_bswap64(x) != __builtin_bswap64(y);
#else
  return x != y;
#endif
}

int test3ll(long long x)
{
#if __SIZEOF_LONG_LONG__ == 8
  return __builtin_bswap64(x) == 12345;
#else
  return (int)x;
#endif
}

int test4ll(long long x)
{
#if __SIZEOF_LONG_LONG__ == 8
  return __builtin_bswap64(x) != 12345;
#else
  return (int)x;
#endif
}

int test1s(short x, short y)
{
#if __SIZEOF_SHORT__ == 2
  return __builtin_bswap16(x) == __builtin_bswap16(y);
#else
  return x == y;
#endif
}

int test2s(short x, short y)
{
#if __SIZEOF_SHORT__ == 2
  return __builtin_bswap16(x) != __builtin_bswap16(y);
#else
  return x != y;
#endif
}

int test3s(short x)
{
#if __SIZEOF_SHORT__ == 2
  return __builtin_bswap16(x) == 12345;
#else
  return (int)x;
#endif
}

int test4s(short x)
{
#if __SIZEOF_SHORT__ == 2
  return __builtin_bswap16(x) != 12345;
#else
  return (int)x;
#endif
}

/* { dg-final { scan-tree-dump-times "__builtin_bswap" 0 "optimized" } } */

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

int test1(unsigned int x, unsigned int y)
{
#if __SIZEOF_INT__ == 4
  unsigned int r1 = (x << 16) | (x >> 16);
  unsigned int r2 = (y << 16) | (y >> 16);
  return r1 == r2;
#else
  return x == y;

Re: [PATCH] Fix PR 101453: ICE with optimize and large integer constant

2021-07-18 Thread Andreas Schwab
On Jul 18 2021, Segher Boessenkool wrote:

> On Sat, Jul 17, 2021 at 03:13:59PM -0700, Andrew Pinski wrote:
>> On Sat, Jul 17, 2021 at 2:31 PM Segher Boessenkool wrote:
>> > On Fri, Jul 16, 2021 at 11:35:25AM -0700, apinski--- via Gcc-patches wrote:
>> > > --- a/gcc/c-family/c-common.c
>> > > +++ b/gcc/c-family/c-common.c
>> > > @@ -5799,7 +5799,7 @@ parse_optimize_options (tree args, bool attr_p)
>> > >
>> > >if (TREE_CODE (value) == INTEGER_CST)
>> > >   {
>> > > -   char buffer[20];
>> > > +   char buffer[HOST_BITS_PER_LONG / 3 + 4];
>> > > sprintf (buffer, "-O%ld", (long) TREE_INT_CST_LOW (value));
>> > > vec_safe_push (optimize_args, ggc_strdup (buffer));
>> > >   }
>> >
>> > This calculation is correct, assuming "value" is non-negative.  You can
>> > easily avoid all of that though:
>> 
>> Actually it is still correct even for negative values because we are
>> adding 4 rather than 3 to make sure there is enough room for the sign
>> character.
>
> Say your longs have only two bits, one sign and one value (so it is
> {-2, -1, 0, 1}).  There you get  char buffer[4]  although you need 5
> here if you care about negative numbers: '-', 'O', '-', '1', 0.
>
> But longs are at least 32 bits, of course.  And 14 chars is (just)
> enough, because you divide by only 3 now (instead of log2(10), or
> 3.32 as before), giving some leeway.
>
> (n/3+4 also fails for longs of sizes 5, 8, or 11 bits, but no more).

/* Bound on length of the string representing an unsigned integer
   value representable in B bits.  log10 (2.0) < 146/485.  The
   smallest value of B where this bound is not tight is 2621.  */
#define INT_BITS_STRLEN_BOUND(b) (((b) * 146 + 484) / 485)

/* Bound on length of the string representing an integer type or expression T.
   Subtract 1 for the sign bit if T is signed, and then add 1 more for
   a minus sign if needed.

   Because _GL_SIGNED_TYPE_OR_EXPR sometimes returns 1 when its argument is
   unsigned, this macro may overestimate the true bound by one byte when
   applied to unsigned types of size 2, 4, 16, ... bytes.  */
#define INT_STRLEN_BOUND(t) \
  (INT_BITS_STRLEN_BOUND (TYPE_WIDTH (t) - _GL_SIGNED_TYPE_OR_EXPR (t)) \
   + _GL_SIGNED_TYPE_OR_EXPR (t))
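
As a quick sanity check of that bound: INT_BITS_STRLEN_BOUND (64)
= (64 * 146 + 484) / 485 = 9828 / 485 = 20, which matches the 20 decimal
digits of 2^64 - 1 = 18446744073709551615.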

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH] Fix PR 101453: ICE with optimize and large integer constant

2021-07-18 Thread Segher Boessenkool
On Sat, Jul 17, 2021 at 03:13:59PM -0700, Andrew Pinski wrote:
> On Sat, Jul 17, 2021 at 2:31 PM Segher Boessenkool wrote:
> > On Fri, Jul 16, 2021 at 11:35:25AM -0700, apinski--- via Gcc-patches wrote:
> > > --- a/gcc/c-family/c-common.c
> > > +++ b/gcc/c-family/c-common.c
> > > @@ -5799,7 +5799,7 @@ parse_optimize_options (tree args, bool attr_p)
> > >
> > >if (TREE_CODE (value) == INTEGER_CST)
> > >   {
> > > -   char buffer[20];
> > > +   char buffer[HOST_BITS_PER_LONG / 3 + 4];
> > > sprintf (buffer, "-O%ld", (long) TREE_INT_CST_LOW (value));
> > > vec_safe_push (optimize_args, ggc_strdup (buffer));
> > >   }
> >
> > This calculation is correct, assuming "value" is non-negative.  You can
> > easily avoid all of that though:
> 
> Actually it is still correct even for negative values because we are
> adding 4 rather than 3 to make sure there is enough room for the sign
> character.

Say your longs have only two bits, one sign and one value (so it is
{-2, -1, 0, 1}).  There you get  char buffer[4]  although you need 5
here if you care about negative numbers: '-', 'O', '-', '1', 0.

But longs are at least 32 bits, of course.  And 14 chars is (just)
enough, because you divide by only 3 now (instead of log2(10), or
3.32 as before), giving some leeway.

(n/3+4 also fails for longs of sizes 5, 8, or 11 bits, but no more).
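
Concretely, for 32-bit longs the bound gives 32 / 3 + 4 = 14, and the worst
case "-O-2147483648" needs exactly 2 + 11 + 1 = 14 bytes including the
terminating NUL.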


This only strengthens my point though: it is much easier and it requires
much less thinking, but most of all it is less error-prone, if you let
the compiler do the tricky bits.  But it is less fun!  :-)


Segher