Re: [PATCH 3/4 v2] ivopts: Consider cost_step on different forms during unrolling

2020-08-21 Thread Bin.Cheng via Gcc-patches
On Tue, Aug 18, 2020 at 5:03 PM Kewen.Lin  wrote:
>
> Hi Bin,
>
> > I see, it's similar to the auto-increment case where cost should be
> > recorded only once.  So this is okay given 1) fine predicting
> > rtl-unroll is likely impossible here; 2) the patch has very limited
> > impact.
> >
> Really appreciate your time and patience!
>
> I extended the previous version to address Richard S.'s comments on
> candidates with the same base/step but different offsets here:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547014.html.
>
> The previous version only allowed the candidate derived from the group
> of interest; this updated patch extends it to candidates which have the
> same base/step and the same or different offsets, as long as the offsets
> stay within the acceptable range once unrolling is considered.
>
> For one particular case like:
>
> for (i = 0; i < SIZE; i++)
>   y[i] = a * x[i] + z[i];
>
> we will mark reg_offset_p for IV candidates on x as below:
>   - (unsigned long) (x_18(D) + 8)    // only mark this before.
>   - x_18(D) + 8
>   - (unsigned long) (x_18(D) + 24)
>   - (unsigned long) ((vector(2) double *) (x_18(D) + 8) + 18446744073709551600)
>   ...
>
> Would you mind reviewing it again?  Thanks in advance!
I trust you with the change.
>
> Bootstrapped/regtested on powerpc64le-linux-gnu P8 and P9.
>
> SPEC2017 P9 performance run has no remarkable degradations/improvements.
Is this run with unroll-loops?
Could you exercise the code with unroll-loops enabled when
bootstrap/regtest please?  It doesn't matter if cases fail with
unroll-loops, just making sure there is no fallout.  Otherwise it's
fine with me.

Thanks,
bin
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * tree-ssa-loop-ivopts.c (struct iv_cand): New field reg_offset_p.
> (struct ivopts_data): New field consider_reg_offset_for_unroll_p.
> (mark_reg_offset_candidates): New function.
> (add_candidate_1): Set reg_offset_p to false for new candidate.
> (set_group_iv_cost): Scale up group cost with estimate_unroll_factor
> if consider_reg_offset_for_unroll_p.
> (determine_iv_cost): Increase step cost with estimate_unroll_factor if
> consider_reg_offset_for_unroll_p.
> (tree_ssa_iv_optimize_loop): Call estimate_unroll_factor, update
> consider_reg_offset_for_unroll_p.
>
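
To picture why candidates that share a base and step but differ by a small offset matter once a loop is unrolled, here is a minimal hand-unrolled C sketch of the loop quoted above. The function names and the unroll factor of 4 are illustrative assumptions and are not taken from the patch; the sketch only shows the addressing shape (one pointer bump per unrolled iteration, constant offsets for the individual accesses) that the thread is discussing.

```c
#include <assert.h>
#include <stddef.h>

/* Reference form of the loop from the thread.  */
static void
axpz_simple (double *y, const double *x, const double *z, double a, size_t n)
{
  assert (n == 0 || (x && y && z));
  for (size_t i = 0; i < n; i++)
    y[i] = a * x[i] + z[i];
}

/* Hand-unrolled by 4 (a hypothetical factor).  Each array is walked by one
   pointer that is bumped once per unrolled iteration; the four accesses
   use constant offsets 0..3 from that pointer, i.e. reg+offset addressing,
   so the step cost is paid once per four elements.  */
static void
axpz_unrolled4 (double *y, const double *x, const double *z, double a,
                size_t n)
{
  size_t i = 0;

  assert (n == 0 || (x && y && z));
  for (; i + 4 <= n; i += 4, x += 4, y += 4, z += 4)
    {
      y[0] = a * x[0] + z[0];
      y[1] = a * x[1] + z[1];
      y[2] = a * x[2] + z[2];
      y[3] = a * x[3] + z[3];
    }

  /* Remainder iterations when n is not a multiple of 4.  */
  for (; i < n; i++, x++, y++, z++)
    *y = a * *x + *z;
}
```

Both functions compute the same result; only the number of induction-variable updates per element differs, which is the quantity the cost adjustment in the patch is trying to reflect.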


Re: [PATCH][GCC][GCC-10 backport] arm: Require MVE memory operand for destination of vst1q intrinsic

2020-08-21 Thread Ramana Radhakrishnan via Gcc-patches
On Fri, Aug 21, 2020 at 2:28 PM Joe Ramsay  wrote:
>
> From: Joe Ramsay 
>
> Hi,
>
> Previously, the machine description patterns for vst1q accepted a generic
> memory operand for the destination, which could lead to an unrecognised
> builtin when expanding vst1q* intrinsics. This change fixes the pattern to
> only accept MVE memory operands.
>
> Tested on arm-none-eabi, clean w.r.t. gcc and CMSIS-DSP testsuites. Backports
> cleanly onto gcc-10 branch. OK for backport?
>

OK.

Ramana



> Thanks,
> Joe
>
> gcc/ChangeLog:
>
> PR target/96683
> * config/arm/mve.md (mve_vst1q_f): Require MVE memory operand for
> destination.
> (mve_vst1q_): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR target/96683
> * gcc.target/arm/mve/intrinsics/vst1q_f16.c: New test.
> * gcc.target/arm/mve/intrinsics/vst1q_s16.c: New test.
> * gcc.target/arm/mve/intrinsics/vst1q_s8.c: New test.
> * gcc.target/arm/mve/intrinsics/vst1q_u16.c: New test.
> * gcc.target/arm/mve/intrinsics/vst1q_u8.c: New test.
>
> (cherry picked from commit 91d206adfe39ce063f6a5731b92a03c05e82e94a)
> ---
>  gcc/config/arm/mve.md   |  4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 10 +++---
>  6 files changed, 37 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 9758862..465b39a 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -9330,7 +9330,7 @@
>[(set_attr "length" "4")])
>
>  (define_expand "mve_vst1q_f"
> -  [(match_operand: 0 "memory_operand")
> +  [(match_operand: 0 "mve_memory_operand")
> (unspec: [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
>]
>"TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
> @@ -9340,7 +9340,7 @@
>  })
>
>  (define_expand "mve_vst1q_"
> -  [(match_operand:MVE_2 0 "memory_operand")
> +  [(match_operand:MVE_2 0 "mve_memory_operand")
> (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
>]
>"TARGET_HAVE_MVE"
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> index 363b4ca..312b746 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> @@ -10,12 +10,16 @@ foo (float16_t * addr, float16x8_t value)
>vst1q_f16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (float16_t * addr, float16x8_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> +
> +void
> +foo2 (float16_t a, float16x8_t x)
> +{
> +  vst1q (&a, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> index 37c4713..cd14e2c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> @@ -10,12 +10,16 @@ foo (int16_t * addr, int16x8_t value)
>vst1q_s16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (int16_t * addr, int16x8_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> +
> +void
> +foo2 (int16_t a, int16x8_t x)
> +{
> +  vst1q (&a, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> index fe5edea..0004c80 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> @@ -10,12 +10,16 @@ foo (int8_t * addr, int8x16_t value)
>vst1q_s8 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrb.8"  }  } */
> -
>  void
>  foo1 (int8_t * addr, int8x16_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrb.8"  }  } */
> +/* { dg-final { scan-assembler-times "vstrb.8" 2 }  } */
> +
> +void
> +foo2 (int8_t a, int8x16_t x)
> +{
> +  vst1q (&a, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> index a4c8c1a..248e7ce 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> @@ -10,12 +10,16 @@ foo (uint16_t * addr, uint16x8_t value)
>vst1q_u16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  

Re: [PATCH] arm: Fix -mpure-code support/-mslow-flash-data for armv8-m.base [PR94538]

2020-08-21 Thread Ramana Radhakrishnan via Gcc-patches
On Wed, Aug 19, 2020 at 10:32 AM Christophe Lyon via Gcc-patches
 wrote:
>
> armv8-m.base (cortex-m23) has the movt instruction, so we need to
> disable the define_split to generate a constant in this case,
> otherwise we get incorrect insn constraints as described in PR94538.
>
> We also need to fix the pure-code alternative for thumb1_movsi_insn
> because the assembler complains with instructions like
> movs r0, #:upper8_15:1234
> (Internal error in md_apply_fix)
> We now generate movs r0, 4 instead.
>
> 2020-08-19  Christophe Lyon  
> gcc/ChangeLog:
>
> * config/arm/thumb1.md: Disable set-constant splitter when
> TARGET_HAVE_MOVT.
> (thumb1_movsi_insn): Fix -mpure-code
> alternative.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arm/pure-code/pr94538-1.c: New test.
> * gcc.target/arm/pure-code/pr94538-2.c: New test.

I take it that this fixes the ICE rather than addressing the code
generation / performance bits that Wilco was referring to?  The other
code quality / performance issues listed in the PR should really be
broken down into separate PRs, while we take this as the fix for the
ICE.

Under that assumption OK.

Ramana

> ---
>  gcc/config/arm/thumb1.md   | 66 ++
>  gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c | 13 +
>  gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c | 12 
>  3 files changed, 79 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c
>
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index 0ff8190..f0129db 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -70,6 +70,7 @@ (define_split
>"TARGET_THUMB1
> && arm_disable_literal_pool
> && GET_CODE (operands[1]) == CONST_INT
> +   && !TARGET_HAVE_MOVT
> && !satisfies_constraint_I (operands[1])"
>[(clobber (const_int 0))]
>"
> @@ -696,18 +697,59 @@ (define_insn "*thumb1_movsi_insn"
>"TARGET_THUMB1
> && (   register_operand (operands[0], SImode)
> || register_operand (operands[1], SImode))"
> -  "@
> -   movs%0, %1
> -   movs%0, %1
> -   movw%0, %1
> -   #
> -   #
> -   ldmia\\t%1, {%0}
> -   stmia\\t%0, {%1}
> -   movs\\t%0, #:upper8_15:%1; lsls\\t%0, #8; adds\\t%0, #:upper0_7:%1; lsls\\t%0, #8; adds\\t%0, #:lower8_15:%1; lsls\\t%0, #8; adds\\t%0, #:lower0_7:%1
> -   ldr\\t%0, %1
> -   str\\t%1, %0
> -   mov\\t%0, %1"
> +{
> +  switch (which_alternative)
> +{
> +  default:
> +  case 0: return "movs\t%0, %1";
> +  case 1: return "movs\t%0, %1";
> +  case 2: return "movw\t%0, %1";
> +  case 3: return "#";
> +  case 4: return "#";
> +  case 5: return "ldmia\t%1, {%0}";
> +  case 6: return "stmia\t%0, {%1}";
> +  case 7:
> +  /* pure-code alternative: build the constant byte by byte,
> +instead of loading it from a constant pool.  */
> +   {
> + int i;
> + HOST_WIDE_INT op1 = INTVAL (operands[1]);
> + bool mov_done_p = false;
> + rtx ops[2];
> + ops[0] = operands[0];
> +
> + /* Emit upper 3 bytes if needed.  */
> + for (i = 0; i < 3; i++)
> +   {
> +  int byte = (op1 >> (8 * (3 - i))) & 0xff;
> +
> + if (byte)
> +   {
> + ops[1] = GEN_INT (byte);
> + if (mov_done_p)
> +   output_asm_insn ("adds\t%0, %1", ops);
> + else
> +   output_asm_insn ("movs\t%0, %1", ops);
> + mov_done_p = true;
> +   }
> +
> + if (mov_done_p)
> +   output_asm_insn ("lsls\t%0, #8", ops);
> +   }
> +
> + /* Emit lower byte if needed.  */
> + ops[1] = GEN_INT (op1 & 0xff);
> + if (!mov_done_p)
> +   output_asm_insn ("movs\t%0, %1", ops);
> + else if (op1 & 0xff)
> +   output_asm_insn ("adds\t%0, %1", ops);
> + return "";
> +   }
> +  case 8: return "ldr\t%0, %1";
> +  case 9: return "str\t%1, %0";
> +  case 10: return "mov\t%0, %1";
> +}
> +}
>[(set_attr "length" "2,2,4,4,4,2,2,14,2,2,2")
>(set_attr "type" "mov_reg,mov_imm,mov_imm,multiple,multiple,load_4,store_4,alu_sreg,load_4,store_4,mov_reg")
> (set_attr "pool_range" "*,*,*,*,*,*,*, *,1018,*,*")
> diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c b/gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
> new file mode 100644
> index 000..31061d5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "skip override" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
> +/* { dg-options "-mpure-code -mcpu=cortex-m23 -march=armv8-m.base -mthumb -mfloat-abi=soft" } */
> +
> +typedef int 
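
The byte-by-byte sequence emitted by the new pure-code alternative (case 7 in the quoted thumb1_movsi_insn hunk) can be modelled in plain C to check that it rebuilds the constant. This is only a sketch: the function name is hypothetical, the register is simulated with a uint32_t rather than the HOST_WIDE_INT used in the patch, and the instruction counter is added purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simulate the movs/adds/lsls sequence from the quoted hunk.  Returns the
   value left in the simulated destination register and stores the number
   of instructions that would be emitted in *INSNS.  */
static uint32_t
build_constant_bytewise (uint32_t op1, int *insns)
{
  uint32_t reg = 0;
  int mov_done = 0;

  assert (insns != 0);
  *insns = 0;

  /* Upper three bytes, most significant first.  */
  for (int i = 0; i < 3; i++)
    {
      uint32_t byte = (op1 >> (8 * (3 - i))) & 0xff;

      if (byte)
        {
          if (mov_done)
            reg += byte;        /* adds %0, #byte */
          else
            reg = byte;         /* movs %0, #byte */
          mov_done = 1;
          ++*insns;
        }

      if (mov_done)
        {
          reg <<= 8;            /* lsls %0, #8 */
          ++*insns;
        }
    }

  /* Lowest byte.  */
  if (!mov_done)
    {
      reg = op1 & 0xff;         /* movs %0, #low */
      ++*insns;
    }
  else if (op1 & 0xff)
    {
      reg += op1 & 0xff;        /* adds %0, #low */
      ++*insns;
    }

  return reg;
}
```

In this model the worst case is seven instructions (for example 0x12345678), which is consistent with the length of 14 listed for that alternative in the quoted pattern if each instruction is two bytes, while constants with zero bytes need fewer.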

Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-08-21 Thread Ramana Radhakrishnan via Gcc-patches
On Mon, Aug 17, 2020 at 7:42 PM Dennis Zhang  wrote:
>
>
> Hi all,
>
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds RTL templates for MVE vsub instructions using 'minus' instead of
> unspec expression to make the instructions recognizable for vectorization.
> MVE target is added in sub3 optab. The sub3 optab is
> modified to use a mode iterator that selects available modes for various
> targets correspondingly.
> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> support vectorization.
>
> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
>
> MVE instructions are documented here:
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
>

Hi Dennis,

Thanks for this patch.  However, a quick read suggests that it could
do with some refactoring, or indeed with being broken down further.

1. The refactor for TARGET_NEON_IWMMXT and friends: the motivation
isn't obvious to me on a quick read.  I'll try and read that
again.  Please document why these complex TARGET_ macros exist, how
they are expected to be used in the machine description and what they
are intended to do.
2. It seems odd that we would have
 "&& ((mode != V2SFmode && mode != V4SFmode)
     || flag_unsafe_math_optimizations))" apply to TARGET_NEON but not
apply this to TARGET_MVE_FLOAT in the sub3 expander.  The point
is that if it isn't safe to vectorize a subtract for Neon, why is it
safe to do the same for MVE?  This was done in 2010 by Julian to fix
PR target/43703 - isn't this applicable on MVE as well?
3. I'm also going to quibble a bit about the use of VSEL as the name
of an iterator, as that conflates it with the instruction vsel and it's
not obvious what's going on here.
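
For context, the kind of loop that a 'minus'-based sub3 pattern lets the vectorizer handle looks roughly like the sketch below. The function name and signature are hypothetical and this is not the content of the vect_sub tests in the patch; it uses an integer element type, for which the unsafe-math question in point 2 does not arise.

```c
#include <assert.h>
#include <stdint.h>

/* Element-wise subtraction over non-overlapping arrays.  With a vector
   subtract pattern available for the element mode, an auto-vectorizer can
   process several elements per iteration instead of one.  */
void
vect_sub_s32 (int32_t *restrict dst, const int32_t *restrict a,
              const int32_t *restrict b, int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    dst[i] = a[i] - b[i];
}
```

Whether the loop is actually vectorized depends on the target options and optimization level; the float variants are the ones affected by the flag_unsafe_math_optimizations condition quoted above.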


> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
>

I'm a bit confused as to why this was exposed by the new MVE
vector modes enabled by this patch.

> The patch is regtested for arm-none-eabi and bootstrapped for
> arm-none-linux-gnueabihf.
>
Was this bootstrapped and regression tested for arm-none-linux-gnueabihf
with --with-fpu=neon in the configuration?


> Is it OK for trunk please?



Ramana

>
> Thanks
> Dennis
>
> gcc/ChangeLog:
>
> 2020-08-10  Dennis Zhang  
>
> * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
> * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
> (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
> (TARGET_NEON_MVE_HFP): Likewise.
> * config/arm/iterators.md (VSEL): New mode iterator to select modes
> for corresponding targets.
> * config/arm/mve.md (mve_vsubq): New entry for vsub instruction
> using expression 'minus'.
> (mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
> * config/arm/neon.md (sub3): Removed here. Integrated in the
> sub3 in vec-common.md
> * config/arm/vec-common.md (sub3): Enable MVE target. Use VSEL
> to select available modes. Exclude TARGET_NEON_FP16INST from
> TARGET_NEON statement. Integrate TARGET_NEON_FP16INST which is
> originally in neon.md.
>
> gcc/testsuite/ChangeLog:
>
> 2020-08-10  Dennis Zhang  
>
> * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> option -fno-ipa-icf and change the instruction count from 8 to 16.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
> * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
> * gcc.target/arm/mve/vect/vect_sub_1.c: New test.


Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.

2020-08-21 Thread Josh Triplett
On Thu, Aug 20, 2020 at 07:00:13PM -0300, Giuliano Belinassi wrote:
> This patch series adds a new flag "-fparallel-jobs=" to control if the
> compiler should try to compile the current file in parallel.
[...]
> Bootstrapped and Regtested on Linux x86_64.
> 
> Giuliano Belinassi (6):
>   Modify gcc driver for parallel compilation
>   Implement a new partitioner for parallel compilation
>   Implement fork-based parallelism engine
>   Add `+' for Jobserver Integration
>   Add invoke documentation
>   New tests for parallel compilation feature

Very nice!

I'm interested in testing this on a highly parallel system. What
baseline do these patches apply to?  They don't seem to apply to GCC
trunk.

Also, I tried to bootstrap the current tip of the devel/autopar_devel
branch, but ended up with compiler segfaults that all look like this:
../../gcc/zlib/compress.c:86:1: internal compiler error: Segmentation fault
   86 | }
  | ^


Re: [PATCH] c++: Implement P1009: Array size deduction in new-expressions.

2020-08-21 Thread Jason Merrill via Gcc-patches

On 8/20/20 4:22 PM, Marek Polacek wrote:

This patch implements C++20 P1009, allowing code like

   new double[]{1,2,3}; // array bound will be deduced

Since this proposal makes the initialization rules more consistent, it is
applied to all previous versions of C++ (thus, effectively, all the way back
to C++11).

My patch is based on Jason's patch that handled the basic case.  I've
extended it to work with ()-init and also the string literal case.
Further testing revealed that to handle stuff like

   new int[]{t...};

in a template, we have to consider such a NEW_EXPR type-dependent.
Obviously, we first have to expand the pack to be able to deduce the
number of elements in the array.

Curiously, while implementing this proposal, I noticed that we fail
to accept

   new char[4]{"abc"};

so I've assigned 77841 to self.  I think the fix will depend on the
build_new_1 hunk in this patch.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/93529
* init.c (build_new_1): Handle new char[]{"foo"}.
(build_new): Deduce the array size in new-expression if not
present.  Handle ()-init.  Handle initializing an array from
a string literal.
* parser.c (cp_parser_new_type_id): Leave [] alone.
(cp_parser_direct_new_declarator): Allow [].
* pt.c (type_dependent_expression_p): In a NEW_EXPR, consider
array types whose dimension has to be deduced type-dependent.

gcc/testsuite/ChangeLog:

PR c++/93529
* g++.dg/cpp0x/sfinae4.C: Adjust expected result after P1009.
* g++.dg/cpp2a/new-array1.C: New test.
* g++.dg/cpp2a/new-array2.C: New test.
* g++.dg/cpp2a/new-array3.C: New test.

Co-authored-by: Jason Merrill 
---
  gcc/cp/init.c   | 54 ++-
  gcc/cp/parser.c | 11 ++--
  gcc/cp/pt.c |  4 ++
  gcc/testsuite/g++.dg/cpp0x/sfinae4.C|  8 ++-
  gcc/testsuite/g++.dg/cpp2a/new-array1.C | 70 +
  gcc/testsuite/g++.dg/cpp2a/new-array2.C | 22 
  gcc/testsuite/g++.dg/cpp2a/new-array3.C | 17 ++
  7 files changed, 180 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/new-array1.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/new-array2.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/new-array3.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 872c23453fd..ae1177079e4 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3559,8 +3559,8 @@ build_new_1 (vec **placement, tree type, tree nelts,
else if (array_p)
{
  tree vecinit = NULL_TREE;
- if (vec_safe_length (*init) == 1
- && DIRECT_LIST_INIT_P ((**init)[0]))
+ const size_t len = vec_safe_length (*init);
+ if (len == 1 && DIRECT_LIST_INIT_P ((**init)[0]))
{
  vecinit = (**init)[0];
  if (CONSTRUCTOR_NELTS (vecinit) == 0)
@@ -3578,6 +3578,15 @@ build_new_1 (vec **placement, tree type, tree nelts,
  vecinit = digest_init (arraytype, vecinit, complain);
}
}
+ /* This handles code like new char[]{"foo"}.  */
+ else if (len == 1
+  && char_type_p (TYPE_MAIN_VARIANT (type))
+  && TREE_CODE (tree_strip_any_location_wrapper ((**init)[0]))
+ == STRING_CST)
+   {
+ vecinit = (**init)[0];
+ STRIP_ANY_LOCATION_WRAPPER (vecinit);
+   }
  else if (*init)
  {
if (complain & tf_error)
@@ -3917,6 +3926,47 @@ build_new (location_t loc, vec **placement, tree type,
return error_mark_node;
  }
  
+  /* P1009: Array size deduction in new-expressions.  */

+  if (TREE_CODE (type) == ARRAY_TYPE
+  && !TYPE_DOMAIN (type)
+  && *init)
+{
+  /* This means we have 'new T[]()'.  */
+  if ((*init)->is_empty ())
+   {
+ tree ctor = build_constructor (init_list_type_node, NULL);
+ CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
+ vec_safe_push (*init, ctor);
+   }
+  tree &elt = (**init)[0];
+  /* The C++20 'new T[](e_0, ..., e_k)' case allowed by P0960.  */
+  if (!DIRECT_LIST_INIT_P (elt) && cxx_dialect >= cxx20)
+   {
+ /* Handle new char[]("foo").  */
+ if (vec_safe_length (*init) == 1
+ && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type)))
+ && TREE_CODE (tree_strip_any_location_wrapper (elt))
+== STRING_CST)
+   /* Leave it alone: the string should not be wrapped in {}.  */;
+ else
+   {
+ /* Create a CONSTRUCTOR from the vector INIT.  */
+ tree list = build_tree_list_vec (*init);
+ tree ctor = build_constructor_from_list (init_list_type_node, list);
+ CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
+ CONSTRUCTOR_IS_PAREN_INIT 

Re: [PATCH 2/5] C front end support to detect out-of-bounds accesses to array parameters

2020-08-21 Thread Martin Sebor via Gcc-patches

On 8/19/20 6:09 PM, Joseph Myers wrote:

On Wed, 19 Aug 2020, Martin Sebor via Gcc-patches wrote:


I think you need a while loop there, not just an if, to account for the
case of multiple consecutive cdk_attrs.  At least the GNU attribute syntax

 direct-declarator:
[...]
   ( gnu-attributes[opt] declarator )

should produce multiple consecutive cdk_attrs for each level of
parentheses with attributes inside.


I had considered a loop but couldn't find a way to trigger what you
describe (or a test in the testsuite that would do it) so I didn't
use one.  I saw loops like that in other places but I couldn't get
even those to uncover such a test case.  Here's what I tried:

   #define A(N) __attribute__ ((aligned (N), may_alias))
   int n;
   void f (int (* A (2) A (4) (* A (2) A (4) (* A (2) A (4) [n])[n])));

Sequences of consecutive attributes are all chained together.

I've added the loop here but I have no test for it.  It would be
good to add one if it really is needed.


The sort of thing I'm thinking of would be, where A is some attribute:

void f (int (A (A (A arg))));

(that example doesn't involve an array, but it illustrates the syntax I'd
expect to produce multiple consecutive cdk_attrs).


Yes, that does it, thanks.  But as a result of the test:

  if (pd->kind != cdk_array)
continue;

I don't see how to write a declaration where the if rather than
a loop would cause trouble.  If next->kind == cdk_attrs after
the test in the if statement (i.e., before I replaced it with
the while loop), the test above would be true and the for loop
would continue.  The next test for next->kind would then skip
over the attrs.

Let me know if I'm missing something.  Otherwise I'll just leave
the loop there with no test.

Martin


Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-21 Thread John David Anglin
Hi Roger,

On 2020-08-21 8:53 a.m., Roger Sayle wrote:
> I was wondering whether Dave or Jeff (or someone else with access
> to real hardware) might "spin" this patch for me?
This may be totally unrelated to this patch but I hit this error in stage2 
testing your change:
build/genattrtab ../../gcc/gcc/common.md ../../gcc/gcc/config/pa/pa.md 
insn-conditions.md \
    -Atmp-attrtab.c -Dtmp-dfatab.c -Ltmp-latencytab.c
genattrtab: Internal error: abort in attr_alt_union, at genattrtab.c:2383

It's great that you are back helping with the middle-end.

Regards,
Dave

-- 
John David Anglin  dave.ang...@bell.net



[PATCH] hppa: PR middle-end/87256: Improved hppa_rtx_costs avoids synth_mult madness.

2020-08-21 Thread Roger Sayle

This is my proposed fix to PR middle-end/87256 where synth_mult takes an
unreasonable amount of CPU time determining an optimal sequence of
instructions to perform multiplications by (large) integer constants on
hppa.
One workaround, proposed in bugzilla, is to increase the size of the hash
table used to cache/reuse intermediate results.  This helps, but it only
works around the (more subtle) underlying problem.

The real issue is that the hppa_rtx_costs function is providing wildly
inaccurate values (estimates) to the middle-end.  For example, (p*q)+(r*s)
would appear to be cheaper than a single multiplication.  Another
example is that "(ashiftrt:di regA regB)" is claimed to be only
COST_N_INSNS(1) when in fact the hppa backend actually generates:

ashrdi: ldi 32,%r28
and %r24,%r28,%r28
comib,= 0,%r28,.L6
mtsar %r24
subi 31,%r24,%r24
extrs %r25,0,1,%r28
mtsar %r24
bv %r0(%r2)
vextrs %r25,32,%r29
.L6:uaddcm %r0,%r24,%r28
vshd %r0,%r26,%r29
subi 31,%r28,%r28
mtsar %r28
zdep %r25,30,31,%r19
subi 31,%r24,%r24
zvdep %r19,32,%r19
mtsar %r24
or %r19,%r29,%r29
bv %r0(%r2)
vextrs %r25,32,%r28

which is slightly more than a single instruction.

It turns out that simply tightening up the logic in hppa_rtx_costs to
return more reasonable values dramatically reduces the number of recursive
invocations in synth_mult for the test case in PR87256, and presumably
also produces faster code (that should be observable in benchmarks).
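
As a rough illustration of what synth_mult is searching for, a multiplication by a constant can often be replaced by a short chain of shift-and-add steps of the shNadd shape, and the cost estimates decide whether such a chain looks cheaper than a real multiply. The helper below models (x << n) + y for n in 1..3; the function names and the choice of the constant 45 are hypothetical, and the sequence is hand-picked rather than what GCC would necessarily select.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a shNadd-style operation: (x << n) + y for n in 1..3,
   computed modulo 2^32.  */
static uint32_t
shadd (unsigned n, uint32_t x, uint32_t y)
{
  assert (n >= 1 && n <= 3);
  return (x << n) + y;
}

/* x * 45 in two shift-and-add steps:
     9x  = 8x + x        (shift by 3, add x)
     45x = 4 * 9x + 9x   (shift by 2, add 9x)  */
static uint32_t
mul45 (uint32_t x)
{
  uint32_t x9 = shadd (3, x, x);
  return shadd (2, x9, x9);
}
```

If the cost function claims that each step, or a full multiply, is cheaper than it really is, the search has little basis for pruning candidate sequences, which fits the behaviour described in the PR.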

Unfortunately, once again this has only been tested via cross-compilers
to hppa-unknown-linux-gnu and hppa64-unknown-linux-gnu hosted on
x86_64-pc-linux-gnu, but I can confirm cross-compilation is much faster.
Many thanks in advance to anyone who can bootstrap and regression test
this on real hardware.  In an ideal world, changes to rtx_costs should
be pretty safe: this function can't introduce any bugs, only expose those
that are already present (but possibly latent) elsewhere in the compiler.  Ha.


2020-08-21  Roger Sayle  

gcc/ChangeLog
PR middle-end/87256
* config/pa/pa.c (hppa_rtx_costs_shadd_p): New helper function
to check for coefficients supported by shNadd and shladd,l.
(hppa_rtx_costs):  Rewrite to avoid using estimates based upon
FACTOR and enable recursing deeper into RTL expressions.


Thanks again.
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/pa/pa.c b/gcc/config/pa/pa.c
index 07d3287..cc876e6 100644
--- a/gcc/config/pa/pa.c
+++ b/gcc/config/pa/pa.c
@@ -1492,6 +1492,33 @@ hppa_address_cost (rtx X, machine_mode mode ATTRIBUTE_UNUSED,
 }
 }
 
+/* Return true if X represents a (possibly non-canonical) shNadd pattern.
+   The machine mode of X is known to be SImode or DImode.  */
+
+static bool
+hppa_rtx_costs_shadd_p (rtx x)
+{
+  if (GET_CODE (x) != PLUS
+  || !REG_P (XEXP (x, 1)))
+return false;
+  rtx op0 = XEXP (x, 0);
+  if (GET_CODE (op0) == ASHIFT
+  && CONST_INT_P (XEXP (op0, 1))
+  && REG_P (XEXP (op0, 0)))
+{
+  unsigned HOST_WIDE_INT x = UINTVAL (XEXP (op0, 1));
+  return x == 1 || x == 2 || x == 3;
+}
+  if (GET_CODE (op0) == MULT
+  && CONST_INT_P (XEXP (op0, 1))
+  && REG_P (XEXP (op0, 0)))
+{
+  unsigned HOST_WIDE_INT x = UINTVAL (XEXP (op0, 1));
+  return x == 2 || x == 4 || x == 8;
+}
+  return false;
+}
+
 /* Compute a (partial) cost for rtx X.  Return true if the complete
cost has been computed, and false if subexpressions should be
scanned.  In either case, *TOTAL contains the cost result.  */
@@ -1499,15 +1526,16 @@ hppa_address_cost (rtx X, machine_mode mode ATTRIBUTE_UNUSED,
 static bool
 hppa_rtx_costs (rtx x, machine_mode mode, int outer_code,
int opno ATTRIBUTE_UNUSED,
-   int *total, bool speed ATTRIBUTE_UNUSED)
+   int *total, bool speed)
 {
-  int factor;
   int code = GET_CODE (x);
 
   switch (code)
 {
 case CONST_INT:
-  if (INTVAL (x) == 0)
+  if (outer_code == SET)
+   *total = COSTS_N_INSNS (1);
+  else if (INTVAL (x) == 0)
*total = 0;
   else if (INT_14_BITS (x))
*total = 1;
@@ -1537,25 +1565,28 @@ hppa_rtx_costs (rtx x, machine_mode mode, int outer_code,
   if (GET_MODE_CLASS (mode) == MODE_FLOAT)
{
  *total = COSTS_N_INSNS (3);
- return true;
}
-
-  /* A mode size N times larger than SImode needs O(N*N) more insns.  */
-  factor = GET_MODE_SIZE (mode) / 4;
-  if (factor == 0)
-   factor = 1;
-
-  if (TARGET_PA_11 && !TARGET_DISABLE_FPREGS && !TARGET_SOFT_FLOAT)
-   *total = factor * factor * COSTS_N_INSNS (8);
+  else if (mode == DImode)
+   {
+  if (TARGET_PA_11 && !TARGET_DISABLE_FPREGS && !TARGET_SOFT_FLOAT)
+   *total = COSTS_N_INSNS (32);
+ else
+   *total = COSTS_N_INSNS (80);
+   }
   

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread Uros Bizjak via Gcc-patches
On Fri, Aug 21, 2020 at 6:29 PM Hongtao Liu  wrote:
>
> On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  wrote:
> >
> > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  wrote:
> > >
> > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> > > >
> > > > > > > gcc/
> > > > > > > PR target/88808
> > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow
> > > > > > > QImode data go into mask registers.
> > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > constraints
> > > > > > > for mask registers.
> > > > > > > (*movqi_internal): Ditto.
> > > > > > > (*anddi_1): Support mask register operations
> > > > > > > (*and_1): Ditto.
> > > > > > > (*andqi_1): Ditto.
> > > > > > > (*andn_1): Ditto.
> > > > > > > (*_1): Ditto.
> > > > > > > (*qi_1): Ditto.
> > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > (define_peephole2): Move constant 0/-1 directly into mask
> > > > > > > registers.
> > > > > > > * config/i386/predicates.md (mask_reg_operand): New 
> > > > > > > predicate.
> > > > > > > * config/i386/sse.md (define_split): Add post-reload 
> > > > > > > splitters
> > > > > > > that would convert "generic" patterns to mask patterns.
> > > > > > > (*knotsi_1_zext): New define_insn.
> > > > > > >
> > > > > > > gcc/testsuite/
> > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > >
> > > > > > A little nit, please put new splitters after the instruction 
> > > > > > pattern.
> > > > > >
> > > > > > OK for the whole patch set with the above change,
> > > > > >
> > > > >
> > > > > Yes, thanks for the review.
> > > >
> > > > Please note that your patch introduces several testsuite fails with 
> > > > -m32:
> > > >
> > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> > > >
> > >
> > > I can't reproduce this failure.
> >
> > Because you are running it on AVX512 enabled target.
> >
> > > > Program received signal SIGILL, Illegal instruction.
> > > > 0x080490ac in __get_cpuid_count (__edx=,
> > > > __ecx=, __ebx=, __eax= > > > pointer>,
> > > > __subleaf=0, __leaf=7) at 
> > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, 
> > > > *__edx);
> > > >
> > > >0x080490a3 <+51>:cpuid
> > > >0x080490a5 <+53>:mov$0x1,%eax
> > > >0x080490aa <+58>:mov%ecx,%esi
> > > > => 0x080490ac <+60>:kmovd  %ebx,%k0
> > > >0x080490b0 <+64>:mov%edi,%ecx
> > > >0x080490b2 <+66>:mov%edi,%ebx
> > > >
> > > > kmov insn is generated for __cpuid_count function, where the binary
> > > > determines, if the new instructions are supported. The binary will
> > > > crash in the detection code if the processor lacks AVX512
> > > > instructions.
> > > >
> > >
> > > IMHO, the testcase shouldn't be run on processors without AVX512BW.
> >
> > No, it could run, because it checks for AVX512BW at runtime.
> >
>
> Got it.
>
> > > Because in avx512bitalg-vpopcntb-1.c, there's /* {
> > > dg-require-effective-target avx512bw } */.
> >
> > This is to check the toolchain for support.
> >
> > > what's the version of your assembler?
> >
> > GNU assembler version 2.34-4.fc32
> >
>
> If the assembler supports avx512bw but the processor does not, the test
> would pass the `dg-require-effective-target avx512bw` condition and be
> run, then crash with an illegal instruction.

Please look at avx512-check.h. This is where main function lives.
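
For readers following along, here is a rough sketch of the kind of
runtime check being discussed.  It is not the contents of
avx512-check.h; the function name and the local bit macro are
hypothetical, and a complete check would also have to confirm that the
OS saves the AVX512 register state.  The point is only that this
detection code runs on every processor, so it must not itself contain
mask-register instructions such as kmovd.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical local name: CPUID leaf 7, subleaf 0 reports AVX512BW
   in bit 30 of EBX.  */
#define SKETCH_BIT_AVX512BW (1u << 30)

/* Return 1 if CPUID reports AVX512BW, 0 otherwise.  On non-x86
   targets the sketch simply reports 0.  */
int
cpu_reports_avx512bw (void)
{
#if defined(__i386__) || defined(__x86_64__)
  unsigned int eax, ebx, ecx, edx;

  /* __get_cpuid_count returns 0 when the requested leaf is not
     supported by the processor.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;
  return (ebx & SKETCH_BIT_AVX512BW) != 0;
#else
  return 0;
#endif
}
```

A test's main function would call something like this first and skip
the AVX512 body when it returns 0, which is why a SIGILL inside the
cpuid helper defeats the whole scheme.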

> > Please add something like
> > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
> > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).
> >
> > Handle this in inline_secondary_memory_needed to reject direct moves
> > for all other targets. This should disable direct moves for generic
> > targets.
> >
>
> Yes, I'll add it.

Thanks.

Uros.


Re: [Patch, fortran] PR fortran/95352 - ICE on select rank with assumed-size selector and lbound intrinsic

2020-08-21 Thread Thomas Koenig via Gcc-patches

Hi Jose,

Proposed patch to PR95352 - ICE on select rank with assumed-size 
selector and lbound intrinsic.


Patch tested only on x86_64-pc-linux-gnu.

Add check for NULL pointer before trying to access structure member, 
patch by Steve Kargl.


this is OK, but you'll have to adjust your ChangeLog.

I'll only write this once for your series of patches
(I think you just broke the record for most patches per day :-)

Regarding the ChangeLog, for this and for your other patches:
Before this is accepted by the scripts, you will need to massage
the git commit log into a form that the scripts will accept.

If you have previously run contrib/gcc-git-customization.sh
(which I recommend), you can run your commit through
"git gcc-verify" to check it.  Read

https://gcc.gnu.org/codingconventions.html#ChangeLogs

to find the gory details.  Running contrib/mklog.py will
prepare a template for the ChangeLog.

Thanks a lot for taking up this patch!

Best regards

Thomas



Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread Hongtao Liu via Gcc-patches
On Sat, Aug 22, 2020 at 1:08 AM H.J. Lu  wrote:
>
> On Fri, Aug 21, 2020 at 10:02 AM H.J. Lu  wrote:
> >
> > On Fri, Aug 21, 2020 at 9:46 AM Hongtao Liu  wrote:
> > >
> > > On Sat, Aug 22, 2020 at 12:36 AM H.J. Lu  wrote:
> > > >
> > > > On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu  wrote:
> > > > >
> > > > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > > > gcc/
> > > > > > > > > > > PR target/88808
> > > > > > > > > > > * config/i386/i386.c 
> > > > > > > > > > > (ix86_preferred_reload_class): Allow
> > > > > > > > > > > QImode data go into mask registers.
> > > > > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > > > > > constraints
> > > > > > > > > > > for mask registers.
> > > > > > > > > > > (*movqi_internal): Ditto.
> > > > > > > > > > > (*anddi_1): Support mask register operations
> > > > > > > > > > > (*and_1): Ditto.
> > > > > > > > > > > (*andqi_1): Ditto.
> > > > > > > > > > > (*andn_1): Ditto.
> > > > > > > > > > > (*_1): Ditto.
> > > > > > > > > > > (*qi_1): Ditto.
> > > > > > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > > > > > (define_peephole2): Move constant 0/-1 directly 
> > > > > > > > > > > into mask
> > > > > > > > > > > registers.
> > > > > > > > > > > * config/i386/predicates.md (mask_reg_operand): 
> > > > > > > > > > > New predicate.
> > > > > > > > > > > * config/i386/sse.md (define_split): Add 
> > > > > > > > > > > post-reload splitters
> > > > > > > > > > > that would convert "generic" patterns to mask 
> > > > > > > > > > > patterns.
> > > > > > > > > > > (*knotsi_1_zext): New define_insn.
> > > > > > > > > > >
> > > > > > > > > > > gcc/testsuite/
> > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New 
> > > > > > > > > > > testcase.
> > > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust 
> > > > > > > > > > > testcase.
> > > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > > > > > >
> > > > > > > > > > A little nit, please put new splitters after the 
> > > > > > > > > > instruction pattern.
> > > > > > > > > >
> > > > > > > > > > OK for the whole patch set with the above change,
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes, thanks for the review.
> > > > > > > >
> > > > > > > > Please note that your patch introduces several testsuite fails 
> > > > > > > > with -m32:
> > > > > > > >
> > > > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g 
> > > > > > > > avx512bitalg-vpopcntb-1.c
> > > > > > > >
> > > > > > >
> > > > > > > I can't reproduce this failure.
> > > > > >
> > > > > > Because you are running it on AVX512 enabled target.
> > > > > >
> > > > > > > > Program received signal SIGILL, Illegal instruction.
> > > > > > > > 0x080490ac in __get_cpuid_count (__edx=,
> > > > > > > > __ecx=, __ebx=, 
> > > > > > > > __eax= > > > > > > > pointer>,
> > > > > > > > __subleaf=0, __leaf=7) at 
> > > > > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > > > > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, 
> > > > > > > > *__ecx, *__edx);
> > > > > > > >
> > > > > > > >0x080490a3 <+51>:cpuid
> > > > > > > >0x080490a5 <+53>:mov$0x1,%eax
> > > > > > > >0x080490aa <+58>:mov%ecx,%esi
> > > > > > > > => 0x080490ac <+60>:kmovd  %ebx,%k0
> > > > > > > >0x080490b0 <+64>:mov%edi,%ecx
> > > > > > > >0x080490b2 <+66>:mov%edi,%ebx
> > > > > > > >
> > > > > > > > kmov insn is generated for __cpuid_count function, where the 
> > > > > > > > binary
> > > > > > > > determines, if the new instructions are supported. The binary 
> > > > > > > > will
> > > > > > > > crash in the detection code if the processor lacks AVX512
> > > > > > > > instructions.
> > > > > > > >
> > > > > > >
> > > > > > > IMHO, the testcase shouldn't be run on processors without 
> > > > > > > AVX512BW.
> > > > > >
> > > > > > No, it could run, because it checks for AVX512BW at runtime.
> > > > > >
> > > > >
> > > > > Got it.
> > > > >
> > > > > > > Because in  avx512bitalg-vpopcntb-1.c, there's /*
> > > > > > > dg-require-effective-target 

Re: [Patch, fortran] PR fortran/94110 - Passing an assumed-size to an assumed-shape argument should be rejected

2020-08-21 Thread Thomas Koenig via Gcc-patches

Hi Jose,

Proposed patch to PR94110 - Passing an assumed-size to an assumed-shape 
argument should be rejected.


OK for master.

Thanks a lot for the patch!

Best regards

Thomas



Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread H.J. Lu via Gcc-patches
On Fri, Aug 21, 2020 at 10:02 AM H.J. Lu  wrote:
>
> On Fri, Aug 21, 2020 at 9:46 AM Hongtao Liu  wrote:
> >
> > On Sat, Aug 22, 2020 at 12:36 AM H.J. Lu  wrote:
> > >
> > > On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu  wrote:
> > > >
> > > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  wrote:
> > > > >
> > > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  
> > > > > > wrote:
> > > > > > >
> > > > > > > > > > gcc/
> > > > > > > > > > PR target/88808
> > > > > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): 
> > > > > > > > > > Allow
> > > > > > > > > > QImode data go into mask registers.
> > > > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > > > > constraints
> > > > > > > > > > for mask registers.
> > > > > > > > > > (*movqi_internal): Ditto.
> > > > > > > > > > (*anddi_1): Support mask register operations
> > > > > > > > > > (*and_1): Ditto.
> > > > > > > > > > (*andqi_1): Ditto.
> > > > > > > > > > (*andn_1): Ditto.
> > > > > > > > > > (*_1): Ditto.
> > > > > > > > > > (*qi_1): Ditto.
> > > > > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > > > > (define_peephole2): Move constant 0/-1 directly 
> > > > > > > > > > into mask
> > > > > > > > > > registers.
> > > > > > > > > > * config/i386/predicates.md (mask_reg_operand): New 
> > > > > > > > > > predicate.
> > > > > > > > > > * config/i386/sse.md (define_split): Add 
> > > > > > > > > > post-reload splitters
> > > > > > > > > > that would convert "generic" patterns to mask 
> > > > > > > > > > patterns.
> > > > > > > > > > (*knotsi_1_zext): New define_insn.
> > > > > > > > > >
> > > > > > > > > > gcc/testsuite/
> > > > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust 
> > > > > > > > > > testcase.
> > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > > > > >
> > > > > > > > > A little nit, please put new splitters after the instruction 
> > > > > > > > > pattern.
> > > > > > > > >
> > > > > > > > > OK for the whole patch set with the above change,
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes, thanks for the review.
> > > > > > >
> > > > > > > Please note that your patch introduces several testsuite fails 
> > > > > > > with -m32:
> > > > > > >
> > > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g 
> > > > > > > avx512bitalg-vpopcntb-1.c
> > > > > > >
> > > > > >
> > > > > > I can't reproduce this failure.
> > > > >
> > > > > Because you are running it on AVX512 enabled target.
> > > > >
> > > > > > > Program received signal SIGILL, Illegal instruction.
> > > > > > > 0x080490ac in __get_cpuid_count (__edx=,
> > > > > > > __ecx=, __ebx=, 
> > > > > > > __eax= > > > > > > pointer>,
> > > > > > > __subleaf=0, __leaf=7) at 
> > > > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > > > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, 
> > > > > > > *__ecx, *__edx);
> > > > > > >
> > > > > > >0x080490a3 <+51>:cpuid
> > > > > > >0x080490a5 <+53>:mov$0x1,%eax
> > > > > > >0x080490aa <+58>:mov%ecx,%esi
> > > > > > > => 0x080490ac <+60>:kmovd  %ebx,%k0
> > > > > > >0x080490b0 <+64>:mov%edi,%ecx
> > > > > > >0x080490b2 <+66>:mov%edi,%ebx
> > > > > > >
> > > > > > > kmov insn is generated for __cpuid_count function, where the 
> > > > > > > binary
> > > > > > > determines, if the new instructions are supported. The binary will
> > > > > > > crash in the detection code if the processor lacks AVX512
> > > > > > > instructions.
> > > > > > >
> > > > > >
> > > > > > IMHO, the testcase shouldn't be run on processors without AVX512BW.
> > > > >
> > > > > No, it could run, because it checks for AVX512BW at runtime.
> > > > >
> > > >
> > > > Got it.
> > > >
> > > > > > Because in avx512bitalg-vpopcntb-1.c, there's /* {
> > > > > > dg-require-effective-target avx512bw } */.
> > > > >
> > > > > This is to check the toolchain for support.
> > > > >
> > > > > > what's the version of your assembler?
> > > > >
> > > > > GNU assembler version 2.34-4.fc32
> > > > >
> > > >
> > > > If assembler supports avx512bw, but processor not, the test would pass
> > > > condition `dg-require-effective-target 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread H.J. Lu via Gcc-patches
On Fri, Aug 21, 2020 at 9:46 AM Hongtao Liu  wrote:
>
> On Sat, Aug 22, 2020 at 12:36 AM H.J. Lu  wrote:
> >
> > On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu  wrote:
> > >
> > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  wrote:
> > > >
> > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  wrote:
> > > > >
> > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> > > > > >
> > > > > > > > > gcc/
> > > > > > > > > PR target/88808
> > > > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): 
> > > > > > > > > Allow
> > > > > > > > > QImode data go into mask registers.
> > > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > > > constraints
> > > > > > > > > for mask registers.
> > > > > > > > > (*movqi_internal): Ditto.
> > > > > > > > > (*anddi_1): Support mask register operations
> > > > > > > > > (*and_1): Ditto.
> > > > > > > > > (*andqi_1): Ditto.
> > > > > > > > > (*andn_1): Ditto.
> > > > > > > > > (*_1): Ditto.
> > > > > > > > > (*qi_1): Ditto.
> > > > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > > > (define_peephole2): Move constant 0/-1 directly into 
> > > > > > > > > mask
> > > > > > > > > registers.
> > > > > > > > > * config/i386/predicates.md (mask_reg_operand): New 
> > > > > > > > > predicate.
> > > > > > > > > * config/i386/sse.md (define_split): Add post-reload 
> > > > > > > > > splitters
> > > > > > > > > that would convert "generic" patterns to mask 
> > > > > > > > > patterns.
> > > > > > > > > (*knotsi_1_zext): New define_insn.
> > > > > > > > >
> > > > > > > > > gcc/testsuite/
> > > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust 
> > > > > > > > > testcase.
> > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > > > >
> > > > > > > > A little nit, please put new splitters after the instruction 
> > > > > > > > pattern.
> > > > > > > >
> > > > > > > > OK for the whole patch set with the above change,
> > > > > > > >
> > > > > > >
> > > > > > > Yes, thanks for the review.
> > > > > >
> > > > > > Please note that your patch introduces several testsuite fails with 
> > > > > > -m32:
> > > > > >
> > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> > > > > >
> > > > >
> > > > > I can't reproduce this failure.
> > > >
> > > > Because you are running it on AVX512 enabled target.
> > > >
> > > > > > Program received signal SIGILL, Illegal instruction.
> > > > > > 0x080490ac in __get_cpuid_count (__edx=,
> > > > > > __ecx=, __ebx=, 
> > > > > > __eax= > > > > > pointer>,
> > > > > > __subleaf=0, __leaf=7) at 
> > > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, 
> > > > > > *__edx);
> > > > > >
> > > > > >0x080490a3 <+51>:cpuid
> > > > > >0x080490a5 <+53>:mov$0x1,%eax
> > > > > >0x080490aa <+58>:mov%ecx,%esi
> > > > > > => 0x080490ac <+60>:kmovd  %ebx,%k0
> > > > > >0x080490b0 <+64>:mov%edi,%ecx
> > > > > >0x080490b2 <+66>:mov%edi,%ebx
> > > > > >
> > > > > > kmov insn is generated for __cpuid_count function, where the binary
> > > > > > determines, if the new instructions are supported. The binary will
> > > > > > crash in the detection code if the processor lacks AVX512
> > > > > > instructions.
> > > > > >
> > > > >
> > > > > IMHO, the testcase shouldn't be run on processors without AVX512BW.
> > > >
> > > > No, it could run, because it checks for AVX512BW at runtime.
> > > >
> > >
> > > Got it.
> > >
> > > > > Because in avx512bitalg-vpopcntb-1.c, there's /* {
> > > > > dg-require-effective-target avx512bw } */.
> > > >
> > > > This is to check the toolchain for support.
> > > >
> > > > > what's the version of your assembler?
> > > >
> > > > GNU assembler version 2.34-4.fc32
> > > >
> > >
> > > If the assembler supports avx512bw but the processor does not, the test
> > > would pass the `dg-require-effective-target avx512bw` condition and be
> > > run, then crash with an illegal instruction.
> > >
> > > > Please add something like
> > > > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
> > > > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).
> > > >
> > > > Handle this in inline_secondary_memory_needed to reject 

[PATCH] middle-end: PR tree-optimization/21137: STRIP_NOPS avoids missed optimization.

2020-08-21 Thread Roger Sayle

PR tree-optimization/21137 is now an old enhancement request pointing out
that an optimization I added back in 2006, to optimize "((x>>31)&64) != 0"
as "x < 0", doesn't fire in the presence of unanticipated type conversions.
The fix is to call STRIP_NOPS at the appropriate point.
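
As an illustration only (this is not the pr21137.c testcase, and the
function names are made up), the source-level shapes involved look
roughly like the sketch below.  It assumes a 32-bit int and an
arithmetic right shift of negative values, which is what GCC
implements.  The cast variant shows the sort of sign-changing
conversion that can sit between the shift and the mask.

```c
#include <stdint.h>

/* Direct form: the shift leaves 0 or -1, so masking with a single
   bit and comparing against zero is just a sign test.  */
int
sign_test_plain (int32_t x)
{
  return ((x >> 31) & 64) != 0;
}

/* Same test with a sign-changing cast between the shift and the
   mask.  */
int
sign_test_cast (int32_t x)
{
  return ((uint32_t) (x >> 31) & 64) != 0;
}

/* What both of the above are equivalent to.  */
int
sign_test_folded (int32_t x)
{
  return x < 0;
}

/* The non-overflowing case: ((X >> C1) & C2) != 0 is the same as
   (X & (C2 << C1)) != 0, here with C1 = 2 and C2 = 1.  */
int
bit_test_shifted (int32_t x)
{
  return ((x >> 2) & 1) != 0;
}

int
bit_test_masked (int32_t x)
{
  return (x & 4) != 0;
}
```

Whether a particular compiler version folds the cast variant is not
something the sketch asserts; it only shows that the three sign tests
agree on every input.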

I'd considered moving this transformation to match.pd, but it's a lot of
complex logic that (I suspect) would be just as ugly in match.pd as it is
in fold-const.c.

This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
and "make -k check" with no new failures.
Ok for mainline?

2020-08-21  Roger Sayle  

gcc/ChangeLog
PR tree-optimization/21137
* gcc/fold-const.c (fold_binary_loc) [NE_EXPR/EQ_EXPR]: Call
STRIP_NOPS when checking whether to simplify ((x>>C1)&C2) != 0.

gcc/testsuite/ChangeLog
PR tree-optimization/21137
* gcc.dg/pr21137.c: New test.


Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 9fc4c2a..efe77e7 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -11596,45 +11596,53 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type,
 C1 is a valid shift constant, and C2 is a power of two, i.e.
 a single bit.  */
   if (TREE_CODE (arg0) == BIT_AND_EXPR
- && TREE_CODE (TREE_OPERAND (arg0, 0)) == RSHIFT_EXPR
- && TREE_CODE (TREE_OPERAND (TREE_OPERAND (arg0, 0), 1))
-== INTEGER_CST
  && integer_pow2p (TREE_OPERAND (arg0, 1))
  && integer_zerop (arg1))
{
- tree itype = TREE_TYPE (arg0);
- tree arg001 = TREE_OPERAND (TREE_OPERAND (arg0, 0), 1);
- prec = TYPE_PRECISION (itype);
-
- /* Check for a valid shift count.  */
- if (wi::ltu_p (wi::to_wide (arg001), prec))
+ tree arg00 = TREE_OPERAND (arg0, 0);
+ STRIP_NOPS (arg00);
+ if (TREE_CODE (arg00) == RSHIFT_EXPR
+ && TREE_CODE (TREE_OPERAND (arg00, 1)) == INTEGER_CST)
{
- tree arg01 = TREE_OPERAND (arg0, 1);
- tree arg000 = TREE_OPERAND (TREE_OPERAND (arg0, 0), 0);
- unsigned HOST_WIDE_INT log2 = tree_log2 (arg01);
- /* If (C2 << C1) doesn't overflow, then ((X >> C1) & C2) != 0
-can be rewritten as (X & (C2 << C1)) != 0.  */
- if ((log2 + TREE_INT_CST_LOW (arg001)) < prec)
+ tree itype = TREE_TYPE (arg00);
+ tree arg001 = TREE_OPERAND (arg00, 1);
+ prec = TYPE_PRECISION (itype);
+
+ /* Check for a valid shift count.  */
+ if (wi::ltu_p (wi::to_wide (arg001), prec))
{
- tem = fold_build2_loc (loc, LSHIFT_EXPR, itype, arg01, arg001);
- tem = fold_build2_loc (loc, BIT_AND_EXPR, itype, arg000, tem);
- return fold_build2_loc (loc, code, type, tem,
- fold_convert_loc (loc, itype, arg1));
-   }
- /* Otherwise, for signed (arithmetic) shifts,
-((X >> C1) & C2) != 0 is rewritten as X < 0, and
-((X >> C1) & C2) == 0 is rewritten as X >= 0.  */
- else if (!TYPE_UNSIGNED (itype))
-   return fold_build2_loc (loc, code == EQ_EXPR ? GE_EXPR : LT_EXPR, type,
-   arg000, build_int_cst (itype, 0));
- /* Otherwise, of unsigned (logical) shifts,
-((X >> C1) & C2) != 0 is rewritten as (X,false), and
-((X >> C1) & C2) == 0 is rewritten as (X,true).  */
- else
-   return omit_one_operand_loc (loc, type,
+ tree arg01 = TREE_OPERAND (arg0, 1);
+ tree arg000 = TREE_OPERAND (arg00, 0);
+ unsigned HOST_WIDE_INT log2 = tree_log2 (arg01);
+ /* If (C2 << C1) doesn't overflow, then
+((X >> C1) & C2) != 0 can be rewritten as
+(X & (C2 << C1)) != 0.  */
+ if ((log2 + TREE_INT_CST_LOW (arg001)) < prec)
+   {
+ tem = fold_build2_loc (loc, LSHIFT_EXPR, itype,
+arg01, arg001);
+ tem = fold_build2_loc (loc, BIT_AND_EXPR, itype,
+arg000, tem);
+ return fold_build2_loc (loc, code, type, tem,
+   fold_convert_loc (loc, itype, arg1));
+   }
+ /* Otherwise, for signed (arithmetic) shifts,
+((X >> C1) & C2) != 0 is rewritten as X < 0, and
+((X >> C1) & C2) == 0 is rewritten as X >= 0.  */
+ else if (!TYPE_UNSIGNED (itype))
+   return fold_build2_loc (loc, code == EQ_EXPR ? GE_EXPR
+: LT_EXPR,
+  

[PATCH] middle-end: Simplify popcount/parity of bswap/rotate.

2020-08-21 Thread Roger Sayle

This simple patch to match.pd optimizes away bit permutation
operations, specifically bswap and rotate, in calls to popcount and
parity.  Although this patch has been developed and tested on LP64,
it relies on there being no truncations or extensions to "marry up"
the appropriate PARITY, PARITYL and PARITYLL forms with either BSWAP32
or BSWAP64, assuming this transformation won't fire if the integral
types have different sizes.
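
A small sketch of the identities behind this, for illustration (the
helper and function names are hypothetical, and unsigned int is
assumed to be 32 bits wide): byte swaps and rotates only permute bits,
so the number of set bits, and hence its parity, is unchanged.

```c
#include <stdint.h>

/* Portable 32-bit rotate right; the masking keeps both shift counts
   in range even when n is 0.  */
static uint32_t
rotr32 (uint32_t x, unsigned int n)
{
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}

/* Each of these returns the same value as the plain
   __builtin_popcount (x) or __builtin_parity (x).  */
int
popcount_of_bswap (uint32_t x)
{
  return __builtin_popcount (__builtin_bswap32 (x));
}

int
popcount_of_rotate (uint32_t x)
{
  return __builtin_popcount (rotr32 (x, 4));
}

int
parity_of_bswap (uint32_t x)
{
  return __builtin_parity (__builtin_bswap32 (x));
}

int
parity_of_rotate (uint32_t x)
{
  return __builtin_parity (rotr32 (x, 4));
}
```

The size caveat in the description matters because pairing, say,
a 64-bit popcount with a 32-bit bswap would involve an extension or
truncation in between, and then the bits are no longer a pure
permutation of the input.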

The following patch has been tested on x86_64-pc-linux-gnu with
"make bootstrap" and "make -k check" with no new failures.
Ok for mainline?

2020-08-21  Roger Sayle  

gcc/ChangeLog
* gcc/match.pd (popcount(bswapN(x)) -> popcount(x),
popcount(rotate(x)) -> popcount(x), parity(bswapN(x)) -> parity(x),
parity(rotate(x)) -> parity(x)): New simplifications.

gcc/testsuite/ChangeLog
* gcc.dg/fold-popcount-6.c: New test.
* gcc.dg/fold-popcount-7.c: New test.
* gcc.dg/fold-parity-6.c: New test.
* gcc.dg/fold-parity-7.c: New test.

Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/match.pd b/gcc/match.pd
index c3b8816..7e8a893 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6060,12 +6060,40 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (bit_and (POPCOUNT @0) integer_onep)
   (PARITY @0))
 
+/* Rely on no conversion to marry POPCOUNT, POPCOUNTL and POPCOUNTLL.  */
+(simplify
+  (POPCOUNT (BUILT_IN_BSWAP32 @0))
+  (POPCOUNT @0))
+(simplify
+  (POPCOUNT (BUILT_IN_BSWAP64 @0))
+  (POPCOUNT @0))
+
+(for popcount (POPCOUNT)
+  (for rot (lrotate rrotate)
+(simplify
+  (popcount (rot @0 @1))
+  (popcount @0))))
+
 /* PARITY simplifications.  */
 /* parity(~X) is parity(X).  */
 (simplify
   (PARITY (bit_not @0))
   (PARITY @0))
 
+/* Rely on no conversion to marry PARITY, PARITYL and PARITYLL.  */
+(simplify
+  (PARITY (BUILT_IN_BSWAP32 @0))
+  (PARITY @0))
+(simplify
+  (PARITY (BUILT_IN_BSWAP64 @0))
+  (PARITY @0))
+
+(for parity (PARITY)
+  (for rot (lrotate rrotate)
+(simplify
+  (parity (rot @0 @1))
+  (parity @0))))
+
 /* parity(X)^parity(Y) is parity(X^Y).  */
 (simplify
   (bit_xor (PARITY:s @0) (PARITY:s @1))
diff --git a/gcc/testsuite/gcc.dg/fold-popcount-6.c b/gcc/testsuite/gcc.dg/fold-popcount-6.c
new file mode 100644
index 000..37b55a1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-popcount-6.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int foo(unsigned int x)
+{
+  if (sizeof(unsigned int) == 4)
+return __builtin_popcount (__builtin_bswap32(x));
+  return x;
+}
+
+unsigned int bar(unsigned long x)
+{
+  if (sizeof(unsigned long) == 8)
+return __builtin_popcountl (__builtin_bswap64(x));
+  if (sizeof(unsigned long) == 4)
+return __builtin_popcountl (__builtin_bswap32(x));
+  return x;
+}
+
+/* { dg-final { scan-tree-dump-times "bswap" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/fold-popcount-7.c b/gcc/testsuite/gcc.dg/fold-popcount-7.c
new file mode 100644
index 000..fdcefe1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-popcount-7.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int foo(unsigned int x)
+{
+  if (sizeof(unsigned int) == 4) {
+unsigned int y = (x>>4) | (x<<28);
+return __builtin_popcount(y);
+  } else return x;
+}
+
+unsigned int bar(unsigned long x)
+{
+  if (sizeof(unsigned long) == 8) {
+unsigned long y = (x>>4) | (x<<60);
+return __builtin_popcountl (y);
+  } else if(sizeof(unsigned long) == 4) {
+unsigned long y = (x>>4) | (x<<28);
+return __builtin_popcountl (y);
+  } else return (unsigned int)x;
+}
+
+/* { dg-final { scan-tree-dump-times " r>> " 0 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/fold-parity-6.c b/gcc/testsuite/gcc.dg/fold-parity-6.c
new file mode 100644
index 000..ece0048
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-parity-6.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int foo(unsigned int x)
+{
+  if (sizeof(unsigned int) == 4)
+return __builtin_parity (__builtin_bswap32(x));
+  return x;
+}
+
+unsigned int bar(unsigned long x)
+{
+  if (sizeof(unsigned long) == 8)
+return __builtin_parityl (__builtin_bswap64(x));
+  if (sizeof(unsigned long) == 4)
+return __builtin_parityl (__builtin_bswap32(x));
+  return (unsigned int)x;
+}
+
+/* { dg-final { scan-tree-dump-times "bswap" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/fold-parity-7.c b/gcc/testsuite/gcc.dg/fold-parity-7.c
new file mode 100644
index 000..9b5085b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-parity-7.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int foo(unsigned int x)
+{
+  if (sizeof(unsigned int) == 4) {
+unsigned int y = (x>>4) | (x<<28);
+return __builtin_parity (y);
+  } else return x;
+}
+
+unsigned int bar(unsigned long 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread Hongtao Liu via Gcc-patches
On Sat, Aug 22, 2020 at 12:36 AM H.J. Lu  wrote:
>
> On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu  wrote:
> >
> > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  wrote:
> > >
> > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  wrote:
> > > >
> > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> > > > >
> > > > > > > > gcc/
> > > > > > > > PR target/88808
> > > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): 
> > > > > > > > Allow
> > > > > > > > QImode data go into mask registers.
> > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > > constraints
> > > > > > > > for mask registers.
> > > > > > > > (*movqi_internal): Ditto.
> > > > > > > > (*anddi_1): Support mask register operations
> > > > > > > > (*and_1): Ditto.
> > > > > > > > (*andqi_1): Ditto.
> > > > > > > > (*andn_1): Ditto.
> > > > > > > > (*_1): Ditto.
> > > > > > > > (*qi_1): Ditto.
> > > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > > (define_peephole2): Move constant 0/-1 directly into 
> > > > > > > > mask
> > > > > > > > registers.
> > > > > > > > * config/i386/predicates.md (mask_reg_operand): New 
> > > > > > > > predicate.
> > > > > > > > * config/i386/sse.md (define_split): Add post-reload 
> > > > > > > > splitters
> > > > > > > > that would convert "generic" patterns to mask patterns.
> > > > > > > > (*knotsi_1_zext): New define_insn.
> > > > > > > >
> > > > > > > > gcc/testsuite/
> > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust 
> > > > > > > > testcase.
> > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > > >
> > > > > > > A little nit, please put new splitters after the instruction 
> > > > > > > pattern.
> > > > > > >
> > > > > > > OK for the whole patch set with the above change,
> > > > > > >
> > > > > >
> > > > > > Yes, thanks for the review.
> > > > >
> > > > > Please note that your patch introduces several testsuite fails with 
> > > > > -m32:
> > > > >
> > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> > > > >
> > > >
> > > > I can't reproduce this failure.
> > >
> > > Because you are running it on AVX512 enabled target.
> > >
> > > > > Program received signal SIGILL, Illegal instruction.
> > > > > 0x080490ac in __get_cpuid_count (__edx=,
> > > > > __ecx=, __ebx=, __eax= > > > > pointer>,
> > > > > __subleaf=0, __leaf=7) at 
> > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, 
> > > > > *__edx);
> > > > >
> > > > >0x080490a3 <+51>:cpuid
> > > > >0x080490a5 <+53>:mov$0x1,%eax
> > > > >0x080490aa <+58>:mov%ecx,%esi
> > > > > => 0x080490ac <+60>:kmovd  %ebx,%k0
> > > > >0x080490b0 <+64>:mov%edi,%ecx
> > > > >0x080490b2 <+66>:mov%edi,%ebx
> > > > >
> > > > > kmov insn is generated for __cpuid_count function, where the binary
> > > > > determines, if the new instructions are supported. The binary will
> > > > > crash in the detection code if the processor lacks AVX512
> > > > > instructions.
> > > > >
> > > >
> > > > IMHO, the testcase shouldn't be run on processors without AVX512BW.
> > >
> > > No, it could run, because it checks for AVX512BW at runtime.
> > >
> >
> > Got it.
> >
> > > > Because in avx512bitalg-vpopcntb-1.c, there's /* {
> > > > dg-require-effective-target avx512bw } */.
> > >
> > > This is to check the toolchain for support.
> > >
> > > > what's the version of your assembler?
> > >
> > > GNU assembler version 2.34-4.fc32
> > >
> >
> > If the assembler supports avx512bw but the processor does not, the test
> > would pass the `dg-require-effective-target avx512bw` condition and be
> > run, then crash with an illegal instruction.
> >
> > > Please add something like
> > > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
> > > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).
> > >
> > > Handle this in inline_secondary_memory_needed to reject direct moves
> > > for all other targets. This should disable direct moves for generic
> > > targets.
> > >
> >
> > Yes, I'll add it.
> >
>
>
> (define_insn "*movsi_internal"
>   [(set (match_operand:SI 0 "nonimmediate_operand"
> "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread H.J. Lu via Gcc-patches
On Fri, Aug 21, 2020 at 9:35 AM H.J. Lu  wrote:
>
> On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu  wrote:
> >
> > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  wrote:
> > >
> > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  wrote:
> > > >
> > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> > > > >
> > > > > > > > gcc/
> > > > > > > > PR target/88808
> > > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): 
> > > > > > > > Allow
> > > > > > > > QImode data go into mask registers.
> > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > > constraints
> > > > > > > > for mask registers.
> > > > > > > > (*movqi_internal): Ditto.
> > > > > > > > (*anddi_1): Support mask register operations
> > > > > > > > > (*and<mode>_1): Ditto.
> > > > > > > > > (*andqi_1): Ditto.
> > > > > > > > > (*andn<mode>_1): Ditto.
> > > > > > > > > (*<code><mode>_1): Ditto.
> > > > > > > > > (*<code>qi_1): Ditto.
> > > > > > > > > (*one_cmpl<mode>2_1): Ditto.
> > > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > > (define_peephole2): Move constant 0/-1 directly into 
> > > > > > > > mask
> > > > > > > > registers.
> > > > > > > > * config/i386/predicates.md (mask_reg_operand): New 
> > > > > > > > predicate.
> > > > > > > > * config/i386/sse.md (define_split): Add post-reload 
> > > > > > > > splitters
> > > > > > > > that would convert "generic" patterns to mask patterns.
> > > > > > > > (*knotsi_1_zext): New define_insn.
> > > > > > > >
> > > > > > > > gcc/testsuite/
> > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust 
> > > > > > > > testcase.
> > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > > >
> > > > > > > A little nit, please put new splitters after the instruction 
> > > > > > > pattern.
> > > > > > >
> > > > > > > OK for the whole patch set with the above change,
> > > > > > >
> > > > > >
> > > > > > Yes, thanks for the review.
> > > > >
> > > > > Please note that your patch introduces several testsuite fails with 
> > > > > -m32:
> > > > >
> > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> > > > >
> > > >
> > > > I can't reproduce this failure.
> > >
> > > Because you are running it on AVX512 enabled target.
> > >
> > > > > Program received signal SIGILL, Illegal instruction.
> > > > > 0x080490ac in __get_cpuid_count (__edx=<synthetic pointer>,
> > > > > __ecx=<synthetic pointer>, __ebx=<synthetic pointer>, __eax=<synthetic pointer>,
> > > > > __subleaf=0, __leaf=7) at /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx);
> > > > >
> > > > >    0x080490a3 <+51>:    cpuid
> > > > >    0x080490a5 <+53>:    mov    $0x1,%eax
> > > > >    0x080490aa <+58>:    mov    %ecx,%esi
> > > > > => 0x080490ac <+60>:    kmovd  %ebx,%k0
> > > > >    0x080490b0 <+64>:    mov    %edi,%ecx
> > > > >    0x080490b2 <+66>:    mov    %edi,%ebx
> > > > >
> > > > > kmov insn is generated for __cpuid_count function, where the binary
> > > > > determines, if the new instructions are supported. The binary will
> > > > > crash in the detection code if the processor lacks AVX512
> > > > > instructions.
> > > > >
> > > >
> > > > IMHO, the testcase shouldn't be run on processors without AVX512BW.
> > >
> > > No, it could run, because it checks for AVX512BW at runtime.
> > >
> >
> > Got it.
> >
> > > > Because in  avx512bitalg-vpopcntb-1.c, there's /*
> > > > dg-require-effective-target avx512bw } */.
> > >
> > > This is to check the toolchain for support.
> > >
> > > > what's the version of your assembler?
> > >
> > > GNU assembler version 2.34-4.fc32
> > >
> >
> > If the assembler supports avx512bw but the processor does not, the test
> > would pass the `dg-require-effective-target avx512bw` condition and be run,
> > then crash with an illegal instruction.
> >
> > > Please add something like
> > > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
> > > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).
> > >
> > > Handle this in inline_secondary_memory_needed to reject direct moves
> > > for all other targets. This should disable direct moves for generic
> > > targets.
> > >
> >
> > Yes, I'll add it.
> >
>
>
> (define_insn "*movsi_internal"
>   [(set (match_operand:SI 0 "nonimmediate_operand"
> "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread H.J. Lu via Gcc-patches
On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu  wrote:
>
> On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  wrote:
> >
> > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  wrote:
> > >
> > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> > > >
> > > > > > > gcc/
> > > > > > > PR target/88808
> > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow
> > > > > > > QImode data go into mask registers.
> > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > constraints
> > > > > > > for mask registers.
> > > > > > > (*movqi_internal): Ditto.
> > > > > > > (*anddi_1): Support mask register operations
> > > > > > > > (*and<mode>_1): Ditto.
> > > > > > > > (*andqi_1): Ditto.
> > > > > > > > (*andn<mode>_1): Ditto.
> > > > > > > > (*<code><mode>_1): Ditto.
> > > > > > > > (*<code>qi_1): Ditto.
> > > > > > > > (*one_cmpl<mode>2_1): Ditto.
> > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > (define_peephole2): Move constant 0/-1 directly into mask
> > > > > > > registers.
> > > > > > > * config/i386/predicates.md (mask_reg_operand): New 
> > > > > > > predicate.
> > > > > > > * config/i386/sse.md (define_split): Add post-reload 
> > > > > > > splitters
> > > > > > > that would convert "generic" patterns to mask patterns.
> > > > > > > (*knotsi_1_zext): New define_insn.
> > > > > > >
> > > > > > > gcc/testsuite/
> > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > >
> > > > > > A little nit, please put new splitters after the instruction 
> > > > > > pattern.
> > > > > >
> > > > > > OK for the whole patch set with the above change,
> > > > > >
> > > > >
> > > > > Yes, thanks for the review.
> > > >
> > > > Please note that your patch introduces several testsuite fails with 
> > > > -m32:
> > > >
> > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> > > >
> > >
> > > I can't reproduce this failure.
> >
> > Because you are running it on AVX512 enabled target.
> >
> > > > Program received signal SIGILL, Illegal instruction.
> > > > 0x080490ac in __get_cpuid_count (__edx=<synthetic pointer>,
> > > > __ecx=<synthetic pointer>, __ebx=<synthetic pointer>, __eax=<synthetic pointer>,
> > > > __subleaf=0, __leaf=7) at /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx);
> > > >
> > > >    0x080490a3 <+51>:    cpuid
> > > >    0x080490a5 <+53>:    mov    $0x1,%eax
> > > >    0x080490aa <+58>:    mov    %ecx,%esi
> > > > => 0x080490ac <+60>:    kmovd  %ebx,%k0
> > > >    0x080490b0 <+64>:    mov    %edi,%ecx
> > > >    0x080490b2 <+66>:    mov    %edi,%ebx
> > > >
> > > > kmov insn is generated for __cpuid_count function, where the binary
> > > > determines, if the new instructions are supported. The binary will
> > > > crash in the detection code if the processor lacks AVX512
> > > > instructions.
> > > >
> > >
> > > IMHO, the testcase shouldn't be run on processors without AVX512BW.
> >
> > No, it could run, because it checks for AVX512BW at runtime.
> >
>
> Got it.
>
> > > Because in  avx512bitalg-vpopcntb-1.c, there's /*
> > > dg-require-effective-target avx512bw } */.
> >
> > This is to check the toolchain for support.
> >
> > > what's the version of your assembler?
> >
> > GNU assembler version 2.34-4.fc32
> >
>
> If the assembler supports avx512bw but the processor does not, the test
> would pass the `dg-require-effective-target avx512bw` condition and be run,
> then crash with an illegal instruction.
>
> > Please add something like
> > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
> > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).
> >
> > Handle this in inline_secondary_memory_needed to reject direct moves
> > for all other targets. This should disable direct moves for generic
> > targets.
> >
>
> Yes, I'll add it.
>


(define_insn "*movsi_internal"
  [(set (match_operand:SI 0 "nonimmediate_operand"
"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
(match_operand:SI 1 "general_operand"
"g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
  "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
...
 [(set (attr "isa")
 (cond [(eq_attr "alternative" "12,13")
  (const_string "sse2")
   ]
   (const_string "*")))

is wrong.   mask register alternatives should be 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread Hongtao Liu via Gcc-patches
On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  wrote:
>
> On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  wrote:
> >
> > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> > >
> > > > > > gcc/
> > > > > > PR target/88808
> > > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow
> > > > > > QImode data go into mask registers.
> > > > > > * config/i386/i386.md: (*movhi_internal): Adjust constraints
> > > > > > for mask registers.
> > > > > > (*movqi_internal): Ditto.
> > > > > > (*anddi_1): Support mask register operations
> > > > > > > (*and<mode>_1): Ditto.
> > > > > > > (*andqi_1): Ditto.
> > > > > > > (*andn<mode>_1): Ditto.
> > > > > > > (*<code><mode>_1): Ditto.
> > > > > > > (*<code>qi_1): Ditto.
> > > > > > > (*one_cmpl<mode>2_1): Ditto.
> > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > (define_peephole2): Move constant 0/-1 directly into mask
> > > > > > registers.
> > > > > > * config/i386/predicates.md (mask_reg_operand): New 
> > > > > > predicate.
> > > > > > * config/i386/sse.md (define_split): Add post-reload 
> > > > > > splitters
> > > > > > that would convert "generic" patterns to mask patterns.
> > > > > > (*knotsi_1_zext): New define_insn.
> > > > > >
> > > > > > gcc/testsuite/
> > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > >
> > > > > A little nit, please put new splitters after the instruction pattern.
> > > > >
> > > > > OK for the whole patch set with the above change,
> > > > >
> > > >
> > > > Yes, thanks for the review.
> > >
> > > Please note that your patch introduces several testsuite fails with -m32:
> > >
> > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> > >
> >
> > I can't reproduce this failure.
>
> Because you are running it on AVX512 enabled target.
>
> > > Program received signal SIGILL, Illegal instruction.
> > > 0x080490ac in __get_cpuid_count (__edx=<synthetic pointer>,
> > > __ecx=<synthetic pointer>, __ebx=<synthetic pointer>, __eax=<synthetic pointer>,
> > > __subleaf=0, __leaf=7) at /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx);
> > >
> > >    0x080490a3 <+51>:    cpuid
> > >    0x080490a5 <+53>:    mov    $0x1,%eax
> > >    0x080490aa <+58>:    mov    %ecx,%esi
> > > => 0x080490ac <+60>:    kmovd  %ebx,%k0
> > >    0x080490b0 <+64>:    mov    %edi,%ecx
> > >    0x080490b2 <+66>:    mov    %edi,%ebx
> > >
> > > kmov insn is generated for __cpuid_count function, where the binary
> > > determines, if the new instructions are supported. The binary will
> > > crash in the detection code if the processor lacks AVX512
> > > instructions.
> > >
> >
> > IMHO, the testcase shouldn't be run on processors without AVX512BW.
>
> No, it could run, because it checks for AVX512BW at runtime.
>

Got it.

> > Because in  avx512bitalg-vpopcntb-1.c, there's /*
> > dg-require-effective-target avx512bw } */.
>
> This is to check the toolchain for support.
>
> > what's the version of your assembler?
>
> GNU assembler version 2.34-4.fc32
>

If the assembler supports avx512bw but the processor does not, the test
would pass the `dg-require-effective-target avx512bw` condition and be run,
then crash with an illegal instruction.

> Please add something like
> X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
> and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).
>
> Handle this in inline_secondary_memory_needed to reject direct moves
> for all other targets. This should disable direct moves for generic
> targets.
>

Yes, I'll add it.

> Uros.



-- 
BR,
Hongtao


[OG10] cherry pick a bunch of OpenMP 5 patches

2020-08-21 Thread Tobias Burnus

OG10 = devel/omp/gcc-10
a GCC 10 branch with some additional OpenMP/OpenACC/offloading patches

I have cherry-picked the following GCC 11 patches,
related to OpenMP 5 features (newest commit first):

commit 8ec8095634cab5053da4c49935eeba13f2aee2fa
gcc/fortran/module.c: Fix indentation
(cherry picked from commit 6de5600a8bd1ef0ad3d57670efdcc68bb3484276)
commit cee7f178c9b06461530ef7079c4daa1dfbbcc7a5
OpenMP: Add 'omp requires' to Fortran (mostly parsing)
(cherry picked from commit 269322ece17202632bc354e9c510e4a5bd6ad84b)
commit ade8db2028ed6735167950e2d36da4dc218cee3a
Fortran: Add support for OpenMP's nontemporal clause
(cherry picked from commit 21cfe724cbdc30612bf1ef59b26f19ada2210832)
commit 835160b024ce34bc6f258e7df366bb5c15e12a4b
OpenMP: Handle order(concurrent) clause in gfortran
(cherry picked from commit d8140b9ed3c0fed041aedaff3fa4a603984ca10f)
commit a477484ea40cdce12038ce7f8dc84772f06fc151
Fortran: Fix for OpenMP's 'lastprivate(conditional:'
(cherry picked from commit 7bd72dd5a385dfa6d49cfe640cefc9ed187361d3)
commit 4b339407ad42ba13fea125b4b0c651d3d3d02891
OpenMP: Support 'lastprivate (conditional:' in Fortran
(cherry picked from commit 084dc63a0200e60e0fbb7c36b412a158d234f5c0)

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread Uros Bizjak via Gcc-patches
On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  wrote:
>
> On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> >
> > > > > gcc/
> > > > > PR target/88808
> > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow
> > > > > QImode data go into mask registers.
> > > > > * config/i386/i386.md: (*movhi_internal): Adjust constraints
> > > > > for mask registers.
> > > > > (*movqi_internal): Ditto.
> > > > > (*anddi_1): Support mask register operations
> > > > > > (*and<mode>_1): Ditto.
> > > > > > (*andqi_1): Ditto.
> > > > > > (*andn<mode>_1): Ditto.
> > > > > > (*<code><mode>_1): Ditto.
> > > > > > (*<code>qi_1): Ditto.
> > > > > > (*one_cmpl<mode>2_1): Ditto.
> > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > (*one_cmplqi2_1): Ditto.
> > > > > (define_peephole2): Move constant 0/-1 directly into mask
> > > > > registers.
> > > > > * config/i386/predicates.md (mask_reg_operand): New predicate.
> > > > > * config/i386/sse.md (define_split): Add post-reload splitters
> > > > > that would convert "generic" patterns to mask patterns.
> > > > > (*knotsi_1_zext): New define_insn.
> > > > >
> > > > > gcc/testsuite/
> > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > >
> > > > A little nit, please put new splitters after the instruction pattern.
> > > >
> > > > OK for the whole patch set with the above change,
> > > >
> > >
> > > Yes, thanks for the review.
> >
> > Please note that your patch introduces several testsuite fails with -m32:
> >
> > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> >
>
> I can't reproduce this failure.

Because you are running it on AVX512 enabled target.

> > Program received signal SIGILL, Illegal instruction.
> > 0x080490ac in __get_cpuid_count (__edx=<synthetic pointer>,
> > __ecx=<synthetic pointer>, __ebx=<synthetic pointer>, __eax=<synthetic pointer>,
> > __subleaf=0, __leaf=7) at /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx);
> >
> >    0x080490a3 <+51>:    cpuid
> >    0x080490a5 <+53>:    mov    $0x1,%eax
> >    0x080490aa <+58>:    mov    %ecx,%esi
> > => 0x080490ac <+60>:    kmovd  %ebx,%k0
> >    0x080490b0 <+64>:    mov    %edi,%ecx
> >    0x080490b2 <+66>:    mov    %edi,%ebx
> >
> > kmov insn is generated for __cpuid_count function, where the binary
> > determines, if the new instructions are supported. The binary will
> > crash in the detection code if the processor lacks AVX512
> > instructions.
> >
>
> IMHO, the testcase shouldn't be run on processors without AVX512BW.

No, it could run, because it checks for AVX512BW at runtime.

> Because in  avx512bitalg-vpopcntb-1.c, there's /*
> dg-require-effective-target avx512bw } */.

This is to check the toolchain for support.

> what's the version of your assembler?

GNU assembler version 2.34-4.fc32

Please add something like
X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).

Handle this in inline_secondary_memory_needed to reject direct moves
for all other targets. This should disable direct moves for generic
targets.

Uros.


Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread H.J. Lu via Gcc-patches
On Fri, Aug 21, 2020 at 8:41 AM Hongtao Liu  wrote:
>
> On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> >
> > > > > gcc/
> > > > > PR target/88808
> > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow
> > > > > QImode data go into mask registers.
> > > > > * config/i386/i386.md: (*movhi_internal): Adjust constraints
> > > > > for mask registers.
> > > > > (*movqi_internal): Ditto.
> > > > > (*anddi_1): Support mask register operations
> > > > > > (*and<mode>_1): Ditto.
> > > > > > (*andqi_1): Ditto.
> > > > > > (*andn<mode>_1): Ditto.
> > > > > > (*<code><mode>_1): Ditto.
> > > > > > (*<code>qi_1): Ditto.
> > > > > > (*one_cmpl<mode>2_1): Ditto.
> > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > (*one_cmplqi2_1): Ditto.
> > > > > (define_peephole2): Move constant 0/-1 directly into mask
> > > > > registers.
> > > > > * config/i386/predicates.md (mask_reg_operand): New predicate.
> > > > > * config/i386/sse.md (define_split): Add post-reload splitters
> > > > > that would convert "generic" patterns to mask patterns.
> > > > > (*knotsi_1_zext): New define_insn.
> > > > >
> > > > > gcc/testsuite/
> > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > >
> > > > A little nit, please put new splitters after the instruction pattern.
> > > >
> > > > OK for the whole patch set with the above change,
> > > >
> > >
> > > Yes, thanks for the review.
> >
> > Please note that your patch introduces several testsuite fails with -m32:
> >
> > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> >
>
> I can't reproduce this failure.
>
> > Program received signal SIGILL, Illegal instruction.
> > 0x080490ac in __get_cpuid_count (__edx=<synthetic pointer>,
> > __ecx=<synthetic pointer>, __ebx=<synthetic pointer>, __eax=<synthetic pointer>,
> > __subleaf=0, __leaf=7) at /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx);
> >
> >    0x080490a3 <+51>:    cpuid
> >    0x080490a5 <+53>:    mov    $0x1,%eax
> >    0x080490aa <+58>:    mov    %ecx,%esi
> > => 0x080490ac <+60>:    kmovd  %ebx,%k0
> >    0x080490b0 <+64>:    mov    %edi,%ecx
> >    0x080490b2 <+66>:    mov    %edi,%ebx
> >
> > kmov insn is generated for __cpuid_count function, where the binary
> > determines, if the new instructions are supported. The binary will
> > crash in the detection code if the processor lacks AVX512
> > instructions.
> >
>
> IMHO, the testcase shouldn't be run on processors without AVX512BW.
> Because in  avx512bitalg-vpopcntb-1.c, there's /* {
> dg-require-effective-target avx512bw } */.
>

dg-require-effective-target avx512bw checks assembler support.
__cpuid_count can't use any mask instructions.   Please run this test
on CPU without AVX512.


-- 
H.J.
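
For readers following the thread, the runtime check under discussion can be sketched roughly as below. This is a minimal sketch, not the code from the testsuite: the function name `cpu_reports_avx512bw` and the macro `AVX512BW_EBX_BIT` are made up for illustration, and a complete check would also confirm OS support for the AVX-512 register state (via XGETBV), which is omitted here. The point made above is that code like this runs before the feature is known to exist, so the compiler must not emit mask-register instructions such as kmovd inside it.

```c
#if defined (__i386__) || defined (__x86_64__)
#include <cpuid.h>   /* GCC/Clang x86 header providing __get_cpuid_count */
#endif

/* CPUID.(EAX=7,ECX=0):EBX bit 30 reports AVX512BW.  The macro name is
   a local, illustrative one.  */
#define AVX512BW_EBX_BIT (1u << 30)

/* Hypothetical helper: return 1 if the CPU reports AVX512BW in CPUID
   leaf 7, subleaf 0, and 0 otherwise (including on non-x86 hosts).
   OS-level state support is deliberately not checked in this sketch.  */
int
cpu_reports_avx512bw (void)
{
#if defined (__i386__) || defined (__x86_64__)
  unsigned int eax, ebx, ecx, edx;

  /* __get_cpuid_count returns 0 if the requested leaf is unsupported.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;
  return (ebx & AVX512BW_EBX_BIT) != 0;
#else
  return 0;
#endif
}
```

A test would typically call such a helper first and skip the AVX512BW-specific body when it returns 0, which is why the helper itself has to execute safely on processors without AVX512.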


Re: [PATCH] Using gen_int_mode instead of GEN_INT to avoid ICE caused by type promotion.

2020-08-21 Thread Hongtao Liu via Gcc-patches
On Fri, Aug 21, 2020 at 5:44 PM Richard Sandiford
 wrote:
>
> Hongtao Liu via Gcc-patches  writes:
> > ping ^ 4, it's a very simple fix for ICE.
>
> OK, thanks.  (Reviewing on the basis that I agree it's a simple rtx
> correctness fix.)
>

Thanks for the review.

> Richard
>
> >
> > On Mon, Aug 10, 2020 at 6:00 PM Hongtao Liu  wrote:
> >>
> >> Ping^3
> >>
> >> On Tue, Aug 4, 2020 at 4:21 PM Hongtao Liu  wrote:
> >> >
> >> > ping ^2
> >> >
> >> > On Mon, Jul 27, 2020 at 5:31 PM Hongtao Liu  wrote:
> >> > >
> >> > > ping
> >> > >
> >> > > On Wed, Jul 22, 2020 at 3:57 PM Hongtao Liu  wrote:
> >> > > >
> >> > > >   Bootstrap is ok, regression test is ok for i386 backend.
> >> > > >
> >> > > > gcc/
> >> > > > PR target/96262
> >> > > > * config/i386/i386-expand.c
> >> > > > (ix86_expand_vec_shift_qihi_constant): Refine.
> >> > > >
> >> > > > gcc/testsuite/
> >> > > > * gcc.target/i386/pr96262-1.c: New test.
> >> > > >
> >> > > > ---
> >> > > >  gcc/config/i386/i386-expand.c |  6 +++---
> >> > > >  gcc/testsuite/gcc.target/i386/pr96262-1.c | 11 +++
> >> > > >  2 files changed, 14 insertions(+), 3 deletions(-)
> >> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr96262-1.c
> >> > > >
> >> > > > diff --git a/gcc/config/i386/i386-expand.c 
> >> > > > b/gcc/config/i386/i386-expand.c
> >> > > > index e194214804b..d57d043106a 100644
> >> > > > --- a/gcc/config/i386/i386-expand.c
> >> > > > +++ b/gcc/config/i386/i386-expand.c
> >> > > > @@ -19537,7 +19537,7 @@ bool
> >> > > >  ix86_expand_vec_shift_qihi_constant (enum rtx_code code, rtx dest,
> >> > > > rtx op1, rtx op2)
> >> > > >  {
> >> > > >machine_mode qimode, himode;
> >> > > > -  unsigned int and_constant, xor_constant;
> >> > > > +  HOST_WIDE_INT and_constant, xor_constant;
> >> > > >HOST_WIDE_INT shift_amount;
> >> > > >rtx vec_const_and, vec_const_xor;
> >> > > >rtx tmp, op1_subreg;
> >> > > > @@ -19612,7 +19612,7 @@ ix86_expand_vec_shift_qihi_constant (enum
> >> > > > rtx_code code, rtx dest, rtx op1, rtx
> >> > > >emit_move_insn (dest, simplify_gen_subreg (qimode, tmp, himode, 
> >> > > > 0));
> >> > > >emit_move_insn (vec_const_and,
> >> > > >   ix86_build_const_vector (qimode, true,
> >> > > > -  GEN_INT (and_constant)));
> >> > > > +  gen_int_mode 
> >> > > > (and_constant,
> >> > > > QImode)));
> >> > > >emit_insn (gen_and (dest, dest, vec_const_and));
> >> > > >
> >> > > >/* For ASHIFTRT, perform extra operation like
> >> > > > @@ -19623,7 +19623,7 @@ ix86_expand_vec_shift_qihi_constant (enum
> >> > > > rtx_code code, rtx dest, rtx op1, rtx
> >> > > >vec_const_xor = gen_reg_rtx (qimode);
> >> > > >emit_move_insn (vec_const_xor,
> >> > > >   ix86_build_const_vector (qimode, true,
> >> > > > -  GEN_INT 
> >> > > > (xor_constant)));
> >> > > > +  gen_int_mode
> >> > > > (xor_constant, QImode)));
> >> > > >emit_insn (gen_xor (dest, dest, vec_const_xor));
> >> > > >emit_insn (gen_sub (dest, dest, vec_const_xor));
> >> > > >  }
> >> > > > diff --git a/gcc/testsuite/gcc.target/i386/pr96262-1.c
> >> > > > b/gcc/testsuite/gcc.target/i386/pr96262-1.c
> >> > > > new file mode 100644
> >> > > > index 000..1825388072e
> >> > > > --- /dev/null
> >> > > > +++ b/gcc/testsuite/gcc.target/i386/pr96262-1.c
> >> > > > @@ -0,0 +1,11 @@
> >> > > > +/* PR target/96262 */
> >> > > > +/* { dg-do compile } */
> >> > > > +/* { dg-options "-mavx512bw -O" } */
> >> > > > +
> >> > > > +typedef char __attribute__ ((__vector_size__ (64))) V;
> >> > > > +
> >> > > > +V
> >> > > > +foo (V v)
> >> > > > +{
> >> > > > +  return ~(v << 1);
> >> > > > +}
> >> > > > --
> >> > > >
> >> > > > --
> >> > > > BR,
> >> > > > Hongtao
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > BR,
> >> > > Hongtao
> >> >
> >> >
> >> >
> >> > --
> >> > BR,
> >> > Hongtao
> >>
> >>
> >>
> >> --
> >> BR,
> >> Hongtao



-- 
BR,
Hongtao
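
As background for the fix above: GEN_INT wraps its argument as given, whereas gen_int_mode first truncates and sign-extends the value for the requested mode, so the resulting constant is in the canonical form expected for that mode. The standalone sketch below only mimics that canonicalization for an 8-bit (QImode-sized) value; `canonical_qi_constant` is a hypothetical helper written for illustration and is not part of GCC, and the sample mask 0xfe is an assumed example rather than a value taken from the patch.

```c
#include <stdint.h>

/* Hypothetical helper, not GCC code: keep the low 8 bits of VALUE and
   sign-extend them, giving the value an 8-bit signed constant would
   have.  */
int64_t
canonical_qi_constant (int64_t value)
{
  int64_t low = value & 0xff;   /* keep only the low 8 bits */
  return (low ^ 0x80) - 0x80;   /* sign-extend from bit 7 */
}
```

With this helper, an unsigned-looking mask such as 0xfe maps to -2, while values that already fit in a signed 8-bit range, such as 0x7f, are unchanged.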


Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread Hongtao Liu via Gcc-patches
On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
>
> > > > gcc/
> > > > PR target/88808
> > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow
> > > > QImode data go into mask registers.
> > > > * config/i386/i386.md: (*movhi_internal): Adjust constraints
> > > > for mask registers.
> > > > (*movqi_internal): Ditto.
> > > > (*anddi_1): Support mask register operations
> > > > > (*and<mode>_1): Ditto.
> > > > > (*andqi_1): Ditto.
> > > > > (*andn<mode>_1): Ditto.
> > > > > (*<code><mode>_1): Ditto.
> > > > > (*<code>qi_1): Ditto.
> > > > > (*one_cmpl<mode>2_1): Ditto.
> > > > (*one_cmplsi2_1_zext): Ditto.
> > > > (*one_cmplqi2_1): Ditto.
> > > > (define_peephole2): Move constant 0/-1 directly into mask
> > > > registers.
> > > > * config/i386/predicates.md (mask_reg_operand): New predicate.
> > > > * config/i386/sse.md (define_split): Add post-reload splitters
> > > > that would convert "generic" patterns to mask patterns.
> > > > (*knotsi_1_zext): New define_insn.
> > > >
> > > > gcc/testsuite/
> > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > >
> > > A little nit, please put new splitters after the instruction pattern.
> > >
> > > OK for the whole patch set with the above change,
> > >
> >
> > Yes, thanks for the review.
>
> Please note that your patch introduces several testsuite fails with -m32:
>
> gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
>

I can't reproduce this failure.

> Program received signal SIGILL, Illegal instruction.
> 0x080490ac in __get_cpuid_count (__edx=<synthetic pointer>,
> __ecx=<synthetic pointer>, __ebx=<synthetic pointer>, __eax=<synthetic pointer>,
> __subleaf=0, __leaf=7) at /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx);
>
>    0x080490a3 <+51>:    cpuid
>    0x080490a5 <+53>:    mov    $0x1,%eax
>    0x080490aa <+58>:    mov    %ecx,%esi
> => 0x080490ac <+60>:    kmovd  %ebx,%k0
>    0x080490b0 <+64>:    mov    %edi,%ecx
>    0x080490b2 <+66>:    mov    %edi,%ebx
>
> kmov insn is generated for __cpuid_count function, where the binary
> determines, if the new instructions are supported. The binary will
> crash in the detection code if the processor lacks AVX512
> instructions.
>

IMHO, the testcase shouldn't be run on processors without AVX512BW.
Because in  avx512bitalg-vpopcntb-1.c, there's /* {
dg-require-effective-target avx512bw } */.

what's the version of your assembler?

> Uros.



-- 
BR,
Hongtao


[GCC 10][patch, committed] Backported: [LTO/offloading] Fix offloading-compilation ICE without -flto (PR84320)

2020-08-21 Thread Tobias Burnus

Seemingly, the patch which caused this made it now to GCC 10;
at least it fails now with offloading on the OG10 branch, after
merging the trunk into that branch.

Hence, I committed this to GCC 10 to avoid this ICE. It occurs
here for libgomp.c/../libgomp.c-c++-common/reduction-16.c when
compiling with '-fopenmp' for offloading (when offloading is
configured).

Committed my patch as obvious :-)

I also merged GCC 10 into OG10 to have it there as well.

Tobias

commit 2974c828615b240f66b208301b5a73c6a07fcb22
Author: Tobias Burnus 
Date:   Tue May 26 18:24:28 2020 +0200

[LTO/offloading] Fix offloading-compilation ICE without -flto (PR84320)

gcc/ChangeLog:
PR ipa/95320
* ipa-utils.h (odr_type_p): Also permit calls with
only flag_generate_offload set.

(cherry picked from commit c5ab336ba106a407a67e84d8faac5b0ea6f18310)

diff --git a/gcc/ipa-utils.h b/gcc/ipa-utils.h
index 6597593d138..178c2cbe446 100644
--- a/gcc/ipa-utils.h
+++ b/gcc/ipa-utils.h
@@ -245,7 +245,7 @@ odr_type_p (const_tree t)
 {
   /* We do not have this information when not in LTO, but we do not need
  to care, since it is used only for type merging.  */
-  gcc_checking_assert (in_lto_p || flag_lto);
+  gcc_checking_assert (in_lto_p || flag_lto || flag_generate_offload);
   return TYPE_NAME (t) && TREE_CODE (TYPE_NAME (t)) == TYPE_DECL
  && DECL_ASSEMBLER_NAME_SET_P (TYPE_NAME (t));
 }


[PATCH][GCC][GCC-10 backport] arm: Require MVE memory operand for destination of vst1q intrinsic

2020-08-21 Thread Joe Ramsay
From: Joe Ramsay 

Hi,

Previously, the machine description patterns for vst1q accepted a generic memory
operand for the destination, which could lead to an unrecognised builtin when
expanding vst1q* intrinsics. This change fixes the pattern to only accept MVE
memory operands.

Tested on arm-none-eabi, clean w.r.t. gcc and CMSIS-DSP testsuites. Backports
cleanly onto gcc-10 branch. OK for backport?

Thanks,
Joe

gcc/ChangeLog:

PR target/96683
* config/arm/mve.md (mve_vst1q_f<mode>): Require MVE memory operand for
destination.
(mve_vst1q_<supf><mode>): Likewise.

gcc/testsuite/ChangeLog:

PR target/96683
* gcc.target/arm/mve/intrinsics/vst1q_f16.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_s8.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_u16.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_u8.c: New test.

(cherry picked from commit 91d206adfe39ce063f6a5731b92a03c05e82e94a)
---
 gcc/config/arm/mve.md   |  4 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 10 +++---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 10 +++---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 10 +++---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 10 +++---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 10 +++---
 6 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 9758862..465b39a 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -9330,7 +9330,7 @@
   [(set_attr "length" "4")])
 
 (define_expand "mve_vst1q_f<mode>"
-  [(match_operand:<MVE_CNVT> 0 "memory_operand")
+  [(match_operand:<MVE_CNVT> 0 "mve_memory_operand")
    (unspec:<MVE_CNVT> [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
   ]
   "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
@@ -9340,7 +9340,7 @@
 })
 
 (define_expand "mve_vst1q_<supf><mode>"
-  [(match_operand:MVE_2 0 "memory_operand")
+  [(match_operand:MVE_2 0 "mve_memory_operand")
    (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
   ]
   "TARGET_HAVE_MVE"
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
index 363b4ca..312b746 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
@@ -10,12 +10,16 @@ foo (float16_t * addr, float16x8_t value)
   vst1q_f16 (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrh.16"  }  } */
-
 void
 foo1 (float16_t * addr, float16x8_t value)
 {
   vst1q (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrh.16"  }  } */
+/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
+
+void
+foo2 (float16_t a, float16x8_t x)
+{
+  vst1q (&a, x);
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
index 37c4713..cd14e2c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
@@ -10,12 +10,16 @@ foo (int16_t * addr, int16x8_t value)
   vst1q_s16 (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrh.16"  }  } */
-
 void
 foo1 (int16_t * addr, int16x8_t value)
 {
   vst1q (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrh.16"  }  } */
+/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
+
+void
+foo2 (int16_t a, int16x8_t x)
+{
+  vst1q (&a, x);
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
index fe5edea..0004c80 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
@@ -10,12 +10,16 @@ foo (int8_t * addr, int8x16_t value)
   vst1q_s8 (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrb.8"  }  } */
-
 void
 foo1 (int8_t * addr, int8x16_t value)
 {
   vst1q (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrb.8"  }  } */
+/* { dg-final { scan-assembler-times "vstrb.8" 2 }  } */
+
+void
+foo2 (int8_t a, int8x16_t x)
+{
+  vst1q (&a, x);
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
index a4c8c1a..248e7ce 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
@@ -10,12 +10,16 @@ foo (uint16_t * addr, uint16x8_t value)
   vst1q_u16 (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrh.16"  }  } */
-
 void
 foo1 (uint16_t * addr, uint16x8_t value)
 {
   vst1q (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrh.16"  }  } */
+/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
+
+void
+foo2 (uint16_t a, uint16x8_t x)
+{
+  vst1q (&a, x);
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread Uros Bizjak via Gcc-patches
> > > gcc/
> > > PR target/88808
> > > * config/i386/i386.c (ix86_preferred_reload_class): Allow
> > > QImode data go into mask registers.
> > > * config/i386/i386.md: (*movhi_internal): Adjust constraints
> > > for mask registers.
> > > (*movqi_internal): Ditto.
> > > (*anddi_1): Support mask register operations
> > > (*and_1): Ditto.
> > > (*andqi_1): Ditto.
> > > (*andn_1): Ditto.
> > > (*_1): Ditto.
> > > (*qi_1): Ditto.
> > > (*one_cmpl2_1): Ditto.
> > > (*one_cmplsi2_1_zext): Ditto.
> > > (*one_cmplqi2_1): Ditto.
> > > (define_peephole2): Move constant 0/-1 directly into mask
> > > registers.
> > > * config/i386/predicates.md (mask_reg_operand): New predicate.
> > > * config/i386/sse.md (define_split): Add post-reload splitters
> > > that would convert "generic" patterns to mask patterns.
> > > (*knotsi_1_zext): New define_insn.
> > >
> > > gcc/testsuite/
> > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> >
> > A little nit, please put new splitters after the instruction pattern.
> >
> > OK for the whole patch set with the above change,
> >
>
> Yes, thanks for the review.

Please note that your patch introduces several testsuite fails with -m32:

gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c

Program received signal SIGILL, Illegal instruction.
0x080490ac in __get_cpuid_count (__edx=,
__ecx=, __ebx=, __eax=,
__subleaf=0, __leaf=7) at /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx);

   0x080490a3 <+51>:cpuid
   0x080490a5 <+53>:mov$0x1,%eax
   0x080490aa <+58>:mov%ecx,%esi
=> 0x080490ac <+60>:kmovd  %ebx,%k0
   0x080490b0 <+64>:mov%edi,%ecx
   0x080490b2 <+66>:mov%edi,%ebx

A kmov insn is generated in the __cpuid_count function, where the
binary determines whether the new instructions are supported. The
binary will crash in the detection code if the processor lacks AVX512
instructions.

Uros.


[PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-21 Thread Roger Sayle

This patch improves the code generated on PA-RISC for DImode
(double word) left shifts by small constants (1-31).  This target
has a very cool shd instruction that can be recognized by combine
for simple shifts, but relying on combine is fragile for more
complicated functions.  This patch tweaks pa.md's ashldi3 expander,
to form the optimal two instruction shd/zdep sequence at RTL
expansion time.

As an example of the benefits of this approach, the simple function

unsigned long long u9(unsigned long long x) { return x*9; }

currently generates 9 instructions

u9: copy %r25,%r28
copy %r26,%r29
extru %r26,2,3,%r21
zdep %r25,28,29,%r19
zdep %r26,28,29,%r20
or %r21,%r19,%r19
add %r29,%r20,%r29
addc %r28,%r19,%r28
bv,n %r0(%r2)

and with this patch now requires only 7:

u9: copy %r25,%r28
copy %r26,%r29
shd %r26,%r25,29,%r19
zdep %r26,28,29,%r20
add %r29,%r20,%r29
addc %r28,%r19,%r28
bv,n %r0(%r2)


This improvement is a first step towards getting synth_mult to
behave sanely on hppa (PR middle-end/87256).
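
The arithmetic behind the two-instruction sequence can be sketched in
portable C.  This is only a model under stated assumptions: struct dword,
shift_left_dword, split and join are hypothetical names, and the code
mirrors the intended result of the shift, not the operand order of the
RTL pattern or of the shd instruction.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit value held in two 32-bit registers.  */
struct dword
{
  uint32_t hi;
  uint32_t lo;
};

static struct dword
split (uint64_t v)
{
  struct dword r = { (uint32_t) (v >> 32), (uint32_t) v };
  return r;
}

static uint64_t
join (struct dword x)
{
  return ((uint64_t) x.hi << 32) | x.lo;
}

/* Left-shift a double word by N, valid only for 1 <= N <= 31.
   The high word takes bits from both halves, which is the part a
   single double-register shift can produce; the low word is a plain
   32-bit left shift.  */
static struct dword
shift_left_dword (struct dword x, unsigned n)
{
  struct dword r;
  r.hi = (x.hi << n) | (x.lo >> (32 - n));
  r.lo = x.lo << n;
  return r;
}

/* x * 9 written as x + (x << 3), the shape used in the u9 example.  */
static uint64_t
times9 (uint64_t x)
{
  return x + join (shift_left_dword (split (x), 3));
}
```

The range restriction matters: with n equal to 0 or 32 the expression
x.lo >> (32 - n) or x.hi << n would shift by the full word width, which
is undefined in C, so those cases need a different path.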

Unfortunately, it's been a long while since I've had access to a
hppa system, so apart from building a cross-compiler and looking at
the assembler it generates, this patch is completely untested.
I was wondering whether Dave or Jeff (or someone else with access
to real hardware) might "spin" this patch for me?

2020-08-21  Roger Sayle  

* config/pa/pa.md (ashldi3): Additionally, on !TARGET_64BIT
generate a two instruction shd/zdep sequence when shifting
registers by suitable constants.
(shd_internal): New define_expand to provide gen_shd_internal.


Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md
index 6350c68..e7b7635 100644
--- a/gcc/config/pa/pa.md
+++ b/gcc/config/pa/pa.md
@@ -6416,9 +6416,32 @@
   [(set (match_operand:DI 0 "register_operand" "")
(ashift:DI (match_operand:DI 1 "lhs_lshift_operand" "")
   (match_operand:DI 2 "arith32_operand" "")))]
-  "TARGET_64BIT"
+  ""
   "
 {
+  if (!TARGET_64BIT)
+{
+  if (REG_P (operands[0]) && GET_CODE (operands[2]) == CONST_INT)
+   {
+  unsigned HOST_WIDE_INT shift = UINTVAL (operands[2]);
+ if (shift >= 1 && shift <= 31)
+   {
+ rtx dst = operands[0];
+ rtx src = force_reg (DImode, operands[1]);
+ emit_insn (gen_shd_internal (gen_highpart (SImode, dst),
+  gen_highpart (SImode, src),
+  GEN_INT (32-shift),
+  gen_lowpart (SImode, src),
+  GEN_INT (shift)));
+ emit_insn (gen_ashlsi3 (gen_lowpart (SImode, dst),
+ gen_lowpart (SImode, src),
+ GEN_INT (shift)));
+ DONE;
+   }
+   }
+  /* Fallback to using optabs.c's expand_doubleword_shift.  */
+  FAIL;
+}
   if (GET_CODE (operands[2]) != CONST_INT)
 {
   rtx temp = gen_reg_rtx (DImode);
@@ -6705,6 +6728,15 @@
   [(set_attr "type" "shift")
(set_attr "length" "4")])
 
+(define_expand "shd_internal"
+  [(set (match_operand:SI 0 "register_operand")
+   (ior:SI
+ (lshiftrt:SI (match_operand:SI 1 "register_operand")
+  (match_operand:SI 2 "const_int_operand"))
+ (ashift:SI (match_operand:SI 3 "register_operand")
+(match_operand:SI 4 "const_int_operand"))))]
+  "")
+
 (define_insn ""
   [(set (match_operand:SI 0 "register_operand" "=r")
(and:SI (ashift:SI (match_operand:SI 1 "register_operand" "r")


Re: [PATCH] Fix libstdc++ testsuite to handle VxWorks gthreads implementation

2020-08-21 Thread Alexandre Oliva
On Dec 20, 2019, Jonathan Wakely  wrote:

> On 10/12/19 15:58 +0100, Corentin Gay wrote:
>> This patch was tested on x86_64-linux and is part of our nightly testing
>> on all platforms, including VxWorks.

> Was it tested on AIX?

> I think dg-require-gthreads will prevent the tests running for the
> single-threaded multilib on AIX, so it will work OK. But there's a
> chance it will need fixing. Let's wait and see (I'm currently unable
> to build GCC on AIX).

Sorry it took us so long to get back on this.  I've just refreshed the
patch at :

- resolving some trivial conflicts within 30_threads/shared_mutex,

- updating some renamed test file names within 30_threads/this_thread, and

- dropping the obviated change to 30_threads/this_thread/yield.cc

and gave it a spin on gcc111 in the cfarm.

There weren't any changes to the libstdc++ results, according to
test_summary, not even in the unsupported (or any other) test counts.

> OK for trunk, thanks.

Given the approval and the lack of significant changes, I'll put this in
unless there are objections in the next 48 hours.  Thanks for the review!



Fix libstdc++ testsuite to handle VxWorks gthreads implementation

From: Corentin Gay 

When implementing the support for gthreads in VxWorks, we stumbled on
a problem in the testsuite. In the libstdc++ testsuite, we
indiscriminately add the `-pthread` switch to the tests that require
linking against the pthread library. In certain cases, such as
VxWorks, the gthread interface relies on the system native threads
library and the `-pthread` switch does not exist.

This patch adds a condition for the use of the `-pthread` switch. It
adds it only if the target supports it. The patch also adds
`dg-require-gthreads` in tests that were lacking it.

libstdc++/ChangeLog:
* testsuite/20_util/shared_ptr/atomic/3.cc: Do not require POSIX
threads and add -pthread only on targets supporting them.
* testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc:
Likewise.
* testsuite/20_util/shared_ptr/thread/mutex_weaktoshared.cc:
Likewise.
* testsuite/30_threads/async/42819.cc: Likewise.
* testsuite/30_threads/async/49668.cc: Likewise.
* testsuite/30_threads/async/54297.cc: Likewise.
* testsuite/30_threads/async/any.cc: Likewise.
* testsuite/30_threads/async/async.cc: Likewise.
* testsuite/30_threads/async/except.cc: Likewise.
* testsuite/30_threads/async/launch.cc: Likewise.
* testsuite/30_threads/async/lwg2021.cc: Likewise.
* testsuite/30_threads/async/sync.cc: Likewise.
* testsuite/30_threads/call_once/39909.cc: Likewise.
* testsuite/30_threads/call_once/49668.cc: Likewise.
* testsuite/30_threads/call_once/60497.cc: Likewise.
* testsuite/30_threads/call_once/call_once1.cc: Likewise.
* testsuite/30_threads/call_once/dr2442.cc: Likewise.
* testsuite/30_threads/condition_variable/54185.cc: Likewise.
* testsuite/30_threads/condition_variable/cons/1.cc: Likewise.
* testsuite/30_threads/condition_variable/members/1.cc: Likewise.
* testsuite/30_threads/condition_variable/members/2.cc: Likewise.
* testsuite/30_threads/condition_variable/members/3.cc: Likewise.
* testsuite/30_threads/condition_variable/members/53841.cc: Likewise.
* testsuite/30_threads/condition_variable/members/68519.cc: Likewise.
* testsuite/30_threads/condition_variable/native_handle/typesizes.cc:
Likewise.
* testsuite/30_threads/condition_variable_any/50862.cc: Likewise.
* testsuite/30_threads/condition_variable_any/53830.cc: Likewise.
* testsuite/30_threads/condition_variable_any/cond.cc: Likewise.
* testsuite/30_threads/condition_variable_any/cons/1.cc: Likewise.
* testsuite/30_threads/condition_variable_any/members/1.cc: Likewise.
* testsuite/30_threads/condition_variable_any/members/2.cc: Likewise.
* testsuite/30_threads/future/cons/move.cc: Likewise.
* testsuite/30_threads/future/members/45133.cc: Likewise.
* testsuite/30_threads/future/members/get.cc: Likewise.
* testsuite/30_threads/future/members/get2.cc: Likewise.
* testsuite/30_threads/future/members/share.cc: Likewise.
* testsuite/30_threads/future/members/valid.cc: Likewise.
* testsuite/30_threads/future/members/wait.cc: Likewise.
* testsuite/30_threads/future/members/wait_for.cc: Likewise.
* testsuite/30_threads/future/members/wait_until.cc: Likewise.
* testsuite/30_threads/lock/1.cc: Likewise.
* testsuite/30_threads/lock/2.cc: Likewise.
* testsuite/30_threads/lock/3.cc: Likewise.
* testsuite/30_threads/lock/4.cc: Likewise.
* testsuite/30_threads/mutex/cons/1.cc: Likewise.
* testsuite/30_threads/mutex/dest/destructor_locked.cc: 

Re: [PATCH] libgccjit: update some comments in libgccjit.c

2020-08-21 Thread David Malcolm via Gcc-patches
On Wed, 2020-08-19 at 09:24 +0200, Andrea Corallo wrote:
> Hi all,
> 
> just a small patch updating some comments that apparently went out of
> sync a while ago adding gcc_jit_context_new_rvalue_from_long.

> Okay for trunk?

Yes

Thanks for fixing these
Dave



Re: [OG10] merge GCC 10 into branch; cherry-pick two OpenMP patches

2020-08-21 Thread Tobias Burnus

On 8/21/20 9:55 AM, Tobias Burnus wrote:

* c0db5b424d33577e633895c9c430bc1626336fb5
  Backport of 'Fortran: Fix OpenMP's 'if(simd:' etc. conditions'


Missed that OG10 has changed the warning to an error;
this could be also something for the trunk, matching C/C++
which does print an error ...

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
commit 091fd5dae659b16428ea5ec782d91fa5295991ff
Author: Tobias Burnus 
Date:   Fri Aug 21 13:28:06 2020 +0200

Update dg-* in gfortran.dg/gomp/pr67500.f90

Contrary to GCC 11, OG10 uses an error instead of a warning,
cf. commit 271c7fef548a86676d304b1eb2be5c0d47280bd6.

gcc/testsuite/
* gfortran.dg/gomp/pr67500.f90: Change dg-warning to
dg-error.

diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index 07ff35ad086..5eeea5afabd 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,8 @@
+2020-08-21  Tobias Burnus  
+
+	* gfortran.dg/gomp/pr67500.f90: Change dg-warning to
+	dg-error.
+
 2020-08-21  Tobias Burnus  
 
 	Backport from mainline
diff --git a/gcc/testsuite/gfortran.dg/gomp/pr67500.f90 b/gcc/testsuite/gfortran.dg/gomp/pr67500.f90
index 1cecdc48578..11ed69f10a7 100644
--- a/gcc/testsuite/gfortran.dg/gomp/pr67500.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/pr67500.f90
@@ -10,11 +10,11 @@ subroutine f2
 end
 
 subroutine f3 (i)
-  !$omp declare simd simdlen(-2)   ! { dg-warning "INTEGER expression of SIMDLEN clause at .1. must be positive" }
+  !$omp declare simd simdlen(-2)   ! { dg-error "INTEGER expression of SIMDLEN clause at .1. must be positive" }
 end subroutine
 
 subroutine f4
-  !$omp declare simd simdlen(0)	   ! { dg-warning "INTEGER expression of SIMDLEN clause at .1. must be positive" }
+  !$omp declare simd simdlen(0)	   ! { dg-error "INTEGER expression of SIMDLEN clause at .1. must be positive" }
 end
 
 subroutine foo(p, d, n)
@@ -31,11 +31,11 @@ subroutine foo(p, d, n)
   do i = 1, 16
   end do
 
-  !$omp simd safelen(-2)! { dg-warning "INTEGER expression of SAFELEN clause at .1. must be positive" }
+  !$omp simd safelen(-2)! { dg-error "INTEGER expression of SAFELEN clause at .1. must be positive" }
   do i = 1, 16
   end do
 
-  !$omp simd safelen(0) ! { dg-warning "INTEGER expression of SAFELEN clause at .1. must be positive" }
+  !$omp simd safelen(0) ! { dg-error "INTEGER expression of SAFELEN clause at .1. must be positive" }
   do i = 1, 16
   end do
 


[committed] libstdc++: Skip PSTL tests when installed TBB is too old [PR 96718]

2020-08-21 Thread Jonathan Wakely via Gcc-patches
These tests do not actually require TBB, because they only inspect the
feature test macros present in the headers. However, if TBB is installed
then its headers will be included, and the version will be checked. If
the version is too old, compilation fails due to a #error directive.

This change disables the tests if TBB is not present, so that we skip
them instead of failing.

libstdc++-v3/ChangeLog:

PR libstdc++/96718
* testsuite/25_algorithms/pstl/feature_test-2.cc: Require
tbb-backend effective target.
* testsuite/25_algorithms/pstl/feature_test-3.cc: Likewise.
* testsuite/25_algorithms/pstl/feature_test-5.cc: Likewise.
* testsuite/25_algorithms/pstl/feature_test.cc: Likewise.

Tested x86_64-linux. Committed to trunk and gcc-10.

commit 988fb2f597d67cdf3603654372c020c28448441f
Author: Jonathan Wakely 
Date:   Fri Aug 21 12:01:05 2020

libstdc++: Skip PSTL tests when installed TBB is too old [PR 96718]

These tests do not actually require TBB, because they only inspect the
feature test macros present in the headers. However, if TBB is installed
then its headers will be included, and the version will be checked. If
the version is too old, compilation fails due to a #error directive.

This change disables the tests if TBB is not present, so that we skip
them instead of failing.

libstdc++-v3/ChangeLog:

PR libstdc++/96718
* testsuite/25_algorithms/pstl/feature_test-2.cc: Require
tbb-backend effective target.
* testsuite/25_algorithms/pstl/feature_test-3.cc: Likewise.
* testsuite/25_algorithms/pstl/feature_test-5.cc: Likewise.
* testsuite/25_algorithms/pstl/feature_test.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test-2.cc 
b/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test-2.cc
index 3e74f89bc41..88c5ea5b1d1 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test-2.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test-2.cc
@@ -17,6 +17,7 @@
 
 // { dg-options "-std=gnu++17" }
 // { dg-do preprocess { target c++17 } }
+// { dg-require-effective-target tbb-backend }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test-3.cc 
b/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test-3.cc
index 7693fe03548..4d75a186211 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test-3.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test-3.cc
@@ -17,6 +17,7 @@
 
 // { dg-options "-std=gnu++17" }
 // { dg-do preprocess { target c++17 } }
+// { dg-require-effective-target tbb-backend }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test-5.cc 
b/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test-5.cc
index 2d991958e75..f6f910204fe 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test-5.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test-5.cc
@@ -17,6 +17,7 @@
 
 // { dg-options "-std=gnu++17" }
 // { dg-do preprocess { target c++17 } }
+// { dg-require-effective-target tbb-backend }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test.cc 
b/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test.cc
index c3a9be5e45a..817bda3474e 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/feature_test.cc
@@ -17,6 +17,7 @@
 
 // { dg-options "-std=gnu++17" }
 // { dg-do preprocess { target c++17 } }
+// { dg-require-effective-target tbb-backend }
 
 #include 
 


[Patch, fortran] PR fortran/95352 - ICE on select rank with assumed-size selector and lbound intrinsic

2020-08-21 Thread José Rui Faustino de Sousa via Gcc-patches

Hi all!

Proposed patch to PR95352 - ICE on select rank with assumed-size 
selector and lbound intrinsic.


Patch tested only on x86_64-pc-linux-gnu.

Add check for NULL pointer before trying to access structure member, 
patch by Steve Kargl.
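
The shape of the fix can be illustrated with a small C sketch.  It is a
simplified model, not the gfortran data structures: struct expr, struct
array_spec, MAX_DIMS and constant_lower_bound are hypothetical names
chosen for the example.  The point is only that the pointer is tested
before its member is read, so a dimension with no recorded lower bound
is treated as "not a known constant" instead of being dereferenced.

```c
#include <stddef.h>

enum expr_type { EXPR_CONSTANT, EXPR_VARIABLE };

/* Hypothetical, much reduced expression node.  */
struct expr
{
  enum expr_type type;
  int value;
};

#define MAX_DIMS 15

/* Hypothetical array specification: one optional lower bound per
   dimension; a NULL entry means no bound expression is recorded.  */
struct array_spec
{
  int rank;
  struct expr *lower[MAX_DIMS];
};

/* Return 1 and store the bound in *OUT if the lower bound of dimension
   D (1-based) is a known constant; return 0 otherwise.  */
static int
constant_lower_bound (const struct array_spec *as, int d, int *out)
{
  const struct expr *lb = as->lower[d - 1];

  /* Check the pointer first; only then look at the member.  */
  if (lb != NULL && lb->type == EXPR_CONSTANT)
    {
      *out = lb->value;
      return 1;
    }
  return 0;
}
```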


Thank you very much.

Best regards,
José Rui


2020-8-21  Steve Kargl 

 PR fortran/95352
 * simplify.c (simplify_bound_dim): Add check for NULL pointer before
 trying to access structure member.

2020-8-21  José Rui Faustino de Sousa  

 PR fortran/95352
 * PR95352.f90: New test.
diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index 074b50c..a1153dd 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -4080,7 +4080,7 @@ simplify_bound_dim (gfc_expr *array, gfc_expr *kind, int d, int upper,
   || (coarray && d == as->rank + as->corank
 	  && (!upper || flag_coarray == GFC_FCOARRAY_SINGLE)))
 {
-  if (as->lower[d-1]->expr_type == EXPR_CONSTANT)
+  if (as->lower[d-1] && as->lower[d-1]->expr_type == EXPR_CONSTANT)
 	{
 	  gfc_free_expr (result);
 	  return gfc_copy_expr (as->lower[d-1]);
diff --git a/gcc/testsuite/gfortran.dg/PR95352.f90 b/gcc/testsuite/gfortran.dg/PR95352.f90
new file mode 100644
index 000..20c8167
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR95352.f90
@@ -0,0 +1,27 @@
+! { dg-do compile }
+!
+! Test the fix for PR95352
+! 
+  
+module ice6_m
+
+  implicit none
+
+contains
+
+  function ice6_s(a) result(ierr)
+integer, intent(in) :: a(..)
+
+integer :: ierr
+
+integer :: lb
+
+select rank(a)
+rank(*)
+  lb = lbound(a, dim=1)
+  if(lbound(a, dim=1)/=lb) ierr = -1
+end select
+return
+  end function ice6_s
+  
+end module ice6_m


Re: [PATCH] cmpelim: recognize extra clobbers in insns

2020-08-21 Thread Richard Sandiford
Pip Cet  writes:
>> Pip Cet via Gcc-patches  writes:
>> > I'm working on the AVR cc0 -> CCmode conversion (bug#92729). One
>> > problem is that the cmpelim pass is currently very strict in requiring
>> > insns of the form
>> >
>> > (parallel [(set (reg:SI) (op:SI ... ...))
>> >(clobber (reg:CC REG_CC))])
>> >
>> > when in fact AVR's insns often have the form
>> >
>> > (parallel [(set (reg:SI) (op:SI ... ...))
>> >(clobber (scratch:QI))
>> >(clobber (reg:CC REG_CC))])
>> >
>> > The attached patch relaxes checks in the cmpelim code to recognize
>> > such insns, and makes it attempt to recognize
>> >
>> > (parallel [(set (reg:CC REG_CC) (compare:CC ... ...))
>> >(set (reg:SI (op:SI ... ...)))
>> >(clobber (scratch:QI))])
>> >
>> > as a new insn for that example. This appears to work.
>>
>> The idea looks good.  However, I think it'd be better (or at least
>> more usual) for the define_insns to list the clobbers the other
>> way around:
>>
>> (parallel [(set (reg:SI) (op:SI ... ...))
>>(clobber (reg:CC REG_CC))
>>(clobber (scratch:QI))])
>
> That makes sense, thanks for the suggestion. I realized quite quickly
> that I would have to reproduce the patterns precisely, including order
> in a parallel, and decided to go with the wrong consistent option by
> always placing the CC clobber last.
>
> My question is whether it makes more sense to recognize either form
> (i.e. a clobber of targetm.flags_regnum at any position in a parallel)
> or whether it's okay to assume that the clobber is always the second
> element in the parallel. I'm leaning towards the latter version.

Yeah, I agree the latter sounds best.  We should be able to test whether
the rest of the parallel is suitable by adding a single_set test
(after testing everything else).

Thanks,
Richard


Re: [PATCH] Using gen_int_mode instead of GEN_INT to avoid ICE caused by type promotion.

2020-08-21 Thread Richard Sandiford
Hongtao Liu via Gcc-patches  writes:
> ping ^ 4, it's a very simple fix for ICE.

OK, thanks.  (Reviewing on the basis that I agree it's a simple rtx
correctness fix.)

Richard
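
For readers following the thread, the correctness issue can be sketched
in plain C.  An integer constant is expected to be kept sign-extended
from the width of its mode, so a byte-sized (QImode) element such as 0xfe
is canonically -2 rather than 254; gen_int_mode applies that truncation,
whereas GEN_INT uses the value as given.  The helper below is a
hypothetical stand-in that models the 8-bit case only; it is not GCC
code.

```c
#include <stdint.h>

/* Hypothetical helper: reduce a wide constant to its canonical 8-bit
   form, i.e. keep the low 8 bits and sign-extend from bit 7.  */
static int64_t
canonical_qi (int64_t c)
{
  uint64_t low = (uint64_t) c & 0xff;
  return (low & 0x80) ? (int64_t) low - 0x100 : (int64_t) low;
}
```

Under this model a mask such as 0xfe and the value -2 map to the same
canonical constant, while values below 0x80 are unchanged.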

>
> On Mon, Aug 10, 2020 at 6:00 PM Hongtao Liu  wrote:
>>
>> Ping^3
>>
>> On Tue, Aug 4, 2020 at 4:21 PM Hongtao Liu  wrote:
>> >
>> > ping ^2
>> >
>> > On Mon, Jul 27, 2020 at 5:31 PM Hongtao Liu  wrote:
>> > >
>> > > ping
>> > >
>> > > On Wed, Jul 22, 2020 at 3:57 PM Hongtao Liu  wrote:
>> > > >
>> > > >   Bootstrap is ok, regression test is ok for i386 backend.
>> > > >
>> > > > gcc/
>> > > > PR target/96262
>> > > > * config/i386/i386-expand.c
>> > > > (ix86_expand_vec_shift_qihi_constant): Refine.
>> > > >
>> > > > gcc/testsuite/
>> > > > * gcc.target/i386/pr96262-1.c: New test.
>> > > >
>> > > > ---
>> > > >  gcc/config/i386/i386-expand.c |  6 +++---
>> > > >  gcc/testsuite/gcc.target/i386/pr96262-1.c | 11 +++
>> > > >  2 files changed, 14 insertions(+), 3 deletions(-)
>> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr96262-1.c
>> > > >
>> > > > diff --git a/gcc/config/i386/i386-expand.c 
>> > > > b/gcc/config/i386/i386-expand.c
>> > > > index e194214804b..d57d043106a 100644
>> > > > --- a/gcc/config/i386/i386-expand.c
>> > > > +++ b/gcc/config/i386/i386-expand.c
>> > > > @@ -19537,7 +19537,7 @@ bool
>> > > >  ix86_expand_vec_shift_qihi_constant (enum rtx_code code, rtx dest,
>> > > > rtx op1, rtx op2)
>> > > >  {
>> > > >machine_mode qimode, himode;
>> > > > -  unsigned int and_constant, xor_constant;
>> > > > +  HOST_WIDE_INT and_constant, xor_constant;
>> > > >HOST_WIDE_INT shift_amount;
>> > > >rtx vec_const_and, vec_const_xor;
>> > > >rtx tmp, op1_subreg;
>> > > > @@ -19612,7 +19612,7 @@ ix86_expand_vec_shift_qihi_constant (enum
>> > > > rtx_code code, rtx dest, rtx op1, rtx
>> > > >emit_move_insn (dest, simplify_gen_subreg (qimode, tmp, himode, 0));
>> > > >emit_move_insn (vec_const_and,
>> > > >   ix86_build_const_vector (qimode, true,
>> > > > -  GEN_INT (and_constant)));
>> > > > +  gen_int_mode (and_constant,
>> > > > QImode)));
>> > > >emit_insn (gen_and (dest, dest, vec_const_and));
>> > > >
>> > > >/* For ASHIFTRT, perform extra operation like
>> > > > @@ -19623,7 +19623,7 @@ ix86_expand_vec_shift_qihi_constant (enum
>> > > > rtx_code code, rtx dest, rtx op1, rtx
>> > > >vec_const_xor = gen_reg_rtx (qimode);
>> > > >emit_move_insn (vec_const_xor,
>> > > >   ix86_build_const_vector (qimode, true,
>> > > > -  GEN_INT 
>> > > > (xor_constant)));
>> > > > +  gen_int_mode
>> > > > (xor_constant, QImode)));
>> > > >emit_insn (gen_xor (dest, dest, vec_const_xor));
>> > > >emit_insn (gen_sub (dest, dest, vec_const_xor));
>> > > >  }
>> > > > diff --git a/gcc/testsuite/gcc.target/i386/pr96262-1.c
>> > > > b/gcc/testsuite/gcc.target/i386/pr96262-1.c
>> > > > new file mode 100644
>> > > > index 000..1825388072e
>> > > > --- /dev/null
>> > > > +++ b/gcc/testsuite/gcc.target/i386/pr96262-1.c
>> > > > @@ -0,0 +1,11 @@
>> > > > +/* PR target/96262 */
>> > > > +/* { dg-do compile } */
>> > > > +/* { dg-options "-mavx512bw -O" } */
>> > > > +
>> > > > +typedef char __attribute__ ((__vector_size__ (64))) V;
>> > > > +
>> > > > +V
>> > > > +foo (V v)
>> > > > +{
>> > > > +  return ~(v << 1);
>> > > > +}
>> > > > --
>> > > >
>> > > > --
>> > > > BR,
>> > > > Hongtao
>> > >
>> > >
>> > >
>> > > --
>> > > BR,
>> > > Hongtao
>> >
>> >
>> >
>> > --
>> > BR,
>> > Hongtao
>>
>>
>>
>> --
>> BR,
>> Hongtao


Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-08-21 Thread Richard Sandiford
Alex Coplan  writes:
> Hi Richard,
>
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: 18 August 2020 09:35
>> To: Alex Coplan 
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw ;
>> Marcus Shawcroft ; Kyrylo Tkachov
>> 
>> Subject: Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend
>> syntax
>> 
>> Alex Coplan  writes:
>> > Note that an obvious omission here is that this patch does not touch
>> the
>> > mult patterns such as *add__mult_. I found
>> > that I couldn't hit these patterns with C code since multiplications by
>> > powers of two always get turned into shifts by earlier RTL passes. If
>> > there's a way to reliably hit these patterns, then perhaps these should
>> > be updated as well.
>> 
>> Hmm.  Feels like we should either update them or delete them.  E.g.:
>> 
>>   *adds__multp2
>>   *subs__multp2
>> 
>> were added alongside the adds3.c and subs3.c tests that you're updating,
>> so if the tests don't/no longer need the multp2 patterns to pass,
>> there's a good chance that the patterns are redundant.
>> 
>> For reasons I never understood, the canonical representation is to use
>> (mult …) for powers of 2 inside a (mem …) but shifts outside of (mem …)s.
>> So perhaps the patterns were originally for address calculations that had
>> been moved outside of a (mem …) and not updated to shifts instead of
>> mults.
>
> Thanks for the review, and for clarifying this. I tried removing these
> together with the *_mul_imm_ patterns (e.g. *adds_mul_imm_) and
> the only failure was gcc/testsuite/gcc.dg/torture/pr34330.c when
> compiled with -Os -ftree-vectorize which appears to depend on the
> *add_mul_imm_di pattern. Without this pattern, we ICE in LRA on this
> input.
>
> In this case, GCC appears to have done exactly what you described: we
> have the (plus (mult ...) ...) nodes inside (mem)s prior to register
> allocation, and then we end up pulling these out without converting them
> to shifts.
>
> Seeing this behaviour (and in particular seeing the ICE) makes me
> hesitant to just go ahead and remove the other patterns. That said, I
> have a patch to remove the following patterns:
>
>  *adds__multp2
>  *subs__multp2
>  *add__mult_
>  *add__mult_si_uxtw
>  *add__multp2
>  *add_si_multp2_uxtw
>  *add_uxt_multp2
>  *add_uxtsi_multp2_uxtw
>  *sub__multp2
>  *sub_si_multp2_uxtw
>  *sub_uxt_multp2
>  *sub_uxtsi_multp2_uxtw
>  *sub_uxtsi_multp2_uxtw
>
> (together with the predicate aarch64_pwr_imm3 which is only used in
> these patterns) and this bootstraps/regtests just fine.
>
> So, I have a couple of questions:
>
> (1) Should it be considered a bug if we pull (plus (mult (power of 2)
>...) ...) out of a (mem) RTX without re-writing the (mult) as a
>shift?

IMO, yes.  But if we have an example in which it happens, we have
to fix it before removing the patterns.  That could end up being
a bit of a rabbit hole, and could affect other targets too.

If we keep the patterns, we should fix the [su]xtw problem in:

  *adds__multp2
  *subs__multp2
  *add__mult_
  *add_uxt_multp2
  *sub_uxt_multp2

too.  (Plus any others I missed, if that isn't the full list.)

Thanks,
Richard


Re: [RISC-V] Add support for AddressSanitizer on RISC-V GCC

2020-08-21 Thread joshua via Gcc-patches
Hi Palmer,

The 64-bit RISC-V Linux port has a minimum of 39-bit virtual addresses, so it 
should be 1<<36 for 64-bit targets. In the implementation of address sanitizer, 
we need a shadow memory that is 1/8th of the memory size, which is
where the 36 comes from. I don't think the choice of this value is arbitrary. 
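
The arithmetic above can be sketched in C, assuming the usual
AddressSanitizer mapping in which one shadow byte covers eight bytes of
memory, i.e. shadow = (addr >> 3) + offset.  The function names are
hypothetical, and deriving the offset directly from the number of
virtual address bits is the reasoning of this message, not a rule every
port follows.

```c
#include <stdint.h>

/* One shadow byte describes 2^3 = 8 bytes of application memory.  */
#define SHADOW_SCALE 3

/* Hypothetical helper: shadow offset for an address space of VA_BITS
   bits, taken as 1/8th of the address space size.  */
static uint64_t
shadow_offset_for_va_bits (unsigned va_bits)
{
  return (uint64_t) 1 << (va_bits - SHADOW_SCALE);
}

/* Map an application address to the address of its shadow byte.  */
static uint64_t
mem_to_shadow (uint64_t addr, uint64_t offset)
{
  return (addr >> SHADOW_SCALE) + offset;
}
```

With 39 address bits this gives 1<<36, and with 32 address bits it gives
the 1<<29 mentioned elsewhere in the thread for 32-bit targets.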

Jun
--
From: Palmer Dabbelt 
Sent: Friday, August 21, 2020 00:04
To: gcc-patches 
Cc: Kito Cheng ; Andrew Waterman ; 
gcc-patches ; cooper.joshua 

Subject: Re: [RISC-V] Add support for AddressSanitizer on RISC-V GCC

On Wed, 19 Aug 2020 02:25:37 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> Hi Andrew:
>
> I am not sure the reason why some targets pick different numbers.
> It seems it's not only target dependent but also OS dependent[1].
>
> For RV32, I think using 1<<29 like other 32 bit targets is fine.
>
> [1] 
> https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/asan/asan_mapping.h#L159
>
> Hi Joshua:
>
> Could you update that for RV32, and this patch will be pending until
> LLVM accepts the libsanitizer part.

This is ABI, and Linux only supports kasan on rv64 right now so it's
technically undefined.  It's probably best to avoid picking an arbitrary number
for rv32, as we still have some open questions WRT the kernel memory map over
there.  I doubt that will get sorted out for a while, as the rv32 doesn't get a
lot of attention (though hopefully the glibc stuff will help out).

> On Wed, Aug 19, 2020 at 4:48 PM Andrew Waterman  wrote:
>>
>> I'm having trouble understanding why different ports chose their
>> various constants--e.g., SPARC uses 1<<29 for 32-bit and 1<<43 for
>> 64-bit, whereas x86 uses 1<<29 and 0x7fff8000, respectively.  So I
>> can't comment on the choice of the constant 1<<36 for RISC-V.  But
>> isn't it a problem that 1<<36 is not a valid Pmode value for ILP32?

This is for kasan (not regular asan), which requires some coordination between
the kernel's memory map and the compiler's inline address sanitizer (as you
can't just pick your own memory map).  Essentially what's going on is that
there's an array of valid tags associated with each address, which is checked
in-line by the compiler for performance reasons (IIRC it used to be library
routines).  The compiler needs to know how to map between addresses and tags,
which depends on the kernel's memory map -- essentially baking the kernel's
memory map into the compiler.  That's why the constants seem somewhat
arbitrary.

In order to save memory there's some lossiness in the address->tag mapping.
Most 32-bit ports pick a tag array that's 1/8th of the memory size, which is
where the 29 comes from.  I don't see any reason why that wouldn't be workable
on rv32, but it seems better to make sure that's the case rather than just
making up an ABI :)
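
To make the arithmetic concrete, here is a minimal sketch of the usual
AddressSanitizer mapping, shadow = (addr >> 3) + offset, assuming the 8-to-1
ratio described above.  The helper names are hypothetical and not part of the
patch.  1<<36 is the value the patch uses for RV64; 1<<29 is only the value most
32-bit ports use and is shown for comparison, not as a settled RV32 ABI.

```c
#include <stdint.h>

/* 8 application bytes map to 1 shadow byte, hence the shift by 3.  */
#define ASAN_SHADOW_SHIFT 3

/* Shadow byte address for ADDR, given the target's shadow offset.  */
static uint64_t
asan_shadow_address (uint64_t addr, uint64_t shadow_offset)
{
  return (addr >> ASAN_SHADOW_SHIFT) + shadow_offset;
}

/* Hypothetical helper: choose an offset by pointer width.  The 32-bit
   value is an assumption borrowed from other ports.  */
static uint64_t
sketch_shadow_offset (int pointer_bits)
{
  return pointer_bits == 64 ? UINT64_C (1) << 36 : UINT64_C (1) << 29;
}

/* Nonzero if OFFSET is representable in a pointer of POINTER_BITS bits.  */
static int
offset_fits_pointer (uint64_t offset, int pointer_bits)
{
  return pointer_bits >= 64 || (offset >> pointer_bits) == 0;
}
```

With these definitions, offset_fits_pointer (UINT64_C (1) << 36, 32) is zero,
which is the ILP32 problem Andrew points out: the RV64 constant cannot be
represented in a 32-bit Pmode.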

>> On Wed, Aug 19, 2020 at 1:02 AM Joshua via Gcc-patches
>>  wrote:
>> >
>> > From: cooper.joshua 
>> >
>> > gcc/
>> >
>> > * config/riscv/riscv.c (asan_shadow_offset): Implement the offset 
>> > of asan shadow memory for risc-v.
>> > (asan_shadow_offset): new macro definition.
>> > ---
>> >
>> >  gcc/config/riscv/riscv.c | 11 +++
>> >  1 file changed, 11 insertions(+)
>> >
>> > diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
>> > index 63b0c38..b85b459 100644
>> > --- a/gcc/config/riscv/riscv.c
>> > +++ b/gcc/config/riscv/riscv.c
>> > @@ -5292,6 +5292,14 @@ riscv_gpr_save_operation_p (rtx op)
>> >return true;
>> >  }
>> >
>> > +/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
>> > +
>> > +static unsigned HOST_WIDE_INT
>> > +riscv_asan_shadow_offset (void)
>> > +{
>> > +  return HOST_WIDE_INT_1U << 36;
>> > +}
>> > +
>> >  /* Initialize the GCC target structure.  */
>> >  #undef TARGET_ASM_ALIGNED_HI_OP
>> >  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
>> > @@ -5475,6 +5483,9 @@ riscv_gpr_save_operation_p (rtx op)
>> >  #undef TARGET_NEW_ADDRESS_PROFITABLE_P
>> >  #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p
>> >
>> > +#undef TARGET_ASAN_SHADOW_OFFSET
>> > +#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset
>> > +
>> >  struct gcc_target targetm = TARGET_INITIALIZER;
>> >
>> >  #include "gt-riscv.h"
>> > --
>> > 2.7.4
>> >



RE: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-08-21 Thread Tamar Christina
Hi Martin,

> Hi,
> 
> On Thu, Aug 20 2020, Richard Sandiford wrote:
> >>
> >>
> >> Really appreciate your detailed explanation.  BTW, my previous
> >> patch for PGO build on exchange2 takes a similar approach by setting
> >> each cloned node to 1/10th of the frequency several months ago :)
> >>
> >> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/546926.html
> >
> > Does it seem likely that we'll reach a resolution on this soon?
> > I take the point that the patch that introduced the exchange
> > regression
> > [https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551757.html]
> > was just uncovering a latent issue, but still, this is a large
> > regression in an important benchmark to be carrying around.  For those
> > of us doing regular benchmark CI, the longer the performance trough
> > gets, the harder it is to spot other unrelated regressions in the
> > “properly optimised” code.
> >
> > So if we don't have a specific plan for fixing the regression soon, I
> > think we should consider reverting the patch until we have something
> > that avoids the exchange regression, even though the patch wasn't
> > technically wrong.
> 
> Honza's changes have been motivated to a large extent as an enabler for IPA-CP
> heuristics changes to actually speed up 548.exchange2_r.
> 
> On my AMD Zen2 machine, the run-time of exchange2 was 358 seconds two
> weeks ago, this week it is 403, but with my WIP (and so far untested) patch
> below it is just 276 seconds - faster than one built with GCC 8 which needs
> 283 seconds.
> 
> I'll be interested in knowing if it also works this well on other 
> architectures.
> 

Many thanks for working on this!

I tried this on an AArch64 Neoverse-N1 machine and didn't see any difference.
Do I need any flags for it to work? The patch was applied on top of 
656218ab982cc22b826227045826c92743143af1

And I tried 3 runs:
1) -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once
2) -mcpu=native -Ofast -fomit-frame-pointer -flto -fno-inline-functions-called-once
3) -mcpu=native -Ofast -fomit-frame-pointer -flto

The first one used to give us the best result.  With this patch there's no
difference between 1 and 2 (11% regression), and the 3rd one is about 15% on top
of that.

If there's anything I can do to help just let me know.

Cheers,
Tamar

> The patch still needs a bit of a cleanup.  The change of the default value of
> ipa-cp-unit-growth needs to be done only for small compilation units (like
> inlining does).  I should experiment if the value of
> param_ipa_cp_loop_hint_bonus should be changed or not.  And last but not
> least, I also want to clean-up the interfaces between ipa-fnsummary.c and
> ipa-cp.c a bit.  I am working on all of this and hope to finish the patch set
> in a few (working) days.
> 
> The bottom line is that there is a plan to address this regression.
> 
> Martin
> 
> 
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index e4910a04ffa..0d44310503a 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -3190,11 +3190,23 @@ devirtualization_time_bonus (struct
> cgraph_node *node,
>  /* Return time bonus incurred because of HINTS.  */
> 
>  static int
> -hint_time_bonus (cgraph_node *node, ipa_hints hints)
> +hint_time_bonus (cgraph_node *node, ipa_hints hints, sreal
> known_iter_freq,
> +  sreal known_strides_freq)
>  {
>int result = 0;
> -  if (hints & (INLINE_HINT_loop_iterations | INLINE_HINT_loop_stride))
> -result += opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
> +  sreal bonus_for_one = opt_for_fn (node->decl,
> + param_ipa_cp_loop_hint_bonus);
> +
> +  if (hints & INLINE_HINT_loop_iterations)
> +{
> +  /* FIXME: This should probably be way more nuanced.  */
> +  result += (known_iter_freq * bonus_for_one).to_int ();
> +}
> +  if (hints & INLINE_HINT_loop_stride)
> +{
> +  /* FIXME: And this as well.  */
> +  result += (known_strides_freq * bonus_for_one).to_int ();
> +}
> +
>return result;
>  }
> 
> @@ -3395,12 +3407,13 @@ perform_estimation_of_a_value (cgraph_node *node, vec<tree> known_csts,
>  int est_move_cost, ipcp_value_base *val)
>  {
>int size, time_benefit;
> -  sreal time, base_time;
> +  sreal time, base_time, known_iter_freq, known_strides_freq;
>ipa_hints hints;
> 
>estimate_ipcp_clone_size_and_time (node, known_csts, known_contexts,
>known_aggs, &size, &time,
> -  &base_time, &hints);
> +  &base_time, &hints, &known_iter_freq,
> +  &known_strides_freq);
>base_time -= time;
>if (base_time > 65535)
>  base_time = 65535;
> @@ -3414,7 +3427,7 @@ perform_estimation_of_a_value (cgraph_node *node, vec<tree> known_csts,
>  time_benefit = base_time.to_int ()
>+ devirtualization_time_bonus (node, known_csts, known_contexts,
>
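
For readers skimming the quoted diff, the change to hint_time_bonus can be
sketched outside of GCC roughly as follows.  This is a simplified model under
stated assumptions, not the real code: the hint bits are hypothetical
stand-ins for ipa_hints, double replaces GCC's sreal, and the cast to int only
approximates sreal's to_int ().

```c
/* Hypothetical stand-ins for the two ipa_hints bits used in the diff.  */
enum
{
  HINT_LOOP_ITERATIONS = 1 << 0,
  HINT_LOOP_STRIDE = 1 << 1
};

/* Before the patch: one flat bonus if either hint is present.  */
static int
hint_time_bonus_old (unsigned hints, int bonus_for_one)
{
  int result = 0;
  if (hints & (HINT_LOOP_ITERATIONS | HINT_LOOP_STRIDE))
    result += bonus_for_one;
  return result;
}

/* After the patch: each hint contributes the bonus scaled by the
   frequency of the loops it applies to.  */
static int
hint_time_bonus_new (unsigned hints, int bonus_for_one,
                     double known_iter_freq, double known_strides_freq)
{
  int result = 0;
  if (hints & HINT_LOOP_ITERATIONS)
    result += (int) (known_iter_freq * bonus_for_one);
  if (hints & HINT_LOOP_STRIDE)
    result += (int) (known_strides_freq * bonus_for_one);
  return result;
}
```

The practical difference is that a hint attached to a hot loop now weighs more
than the same hint on a cold one, whereas the old code treated them alike.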

Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-08-21 Thread Richard Sandiford
xiezhiheng  writes:
>> -Original Message-
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: Thursday, August 20, 2020 4:55 PM
>> To: xiezhiheng 
>> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> emitted at -O3
>> 
>> xiezhiheng  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> >> Sent: Wednesday, August 19, 2020 6:06 PM
>> >> To: xiezhiheng 
>> >> Cc: Richard Biener ;
>> gcc-patches@gcc.gnu.org
>> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> >> emitted at -O3
>> >>
>> >> xiezhiheng  writes:
>> >> > I added FLAGS for some of the intrinsics in aarch64-simd-builtins.def
>> >> > as a first try, including all the add/sub arithmetic intrinsics.
>> >> >
>> >> > For something like the faddp intrinsic, which only handles floating-point
>> >> > operations, both the FP and NONE flags are suitable because FLAG_FP will
>> >> > be added later if the intrinsic handles floating-point operations.  I
>> >> > prefer FP since it is clearer.
>> >>
>> >> Sounds good to me.
>> >>
>> >> > But the qadd intrinsics modify the FPSR register, which is a scenario
>> >> > I missed before.  I am considering adding an additional flag,
>> >> > FLAG_WRITE_FPSR, to represent it.
>> >>
>> >> I don't think we make any attempt to guarantee that the Q flag is
>> >> meaningful after saturating intrinsics.  To do that, we'd need to model
>> >> the modification of the flag in the .md patterns too.
>> >>
>> >> So my preference would be to leave this out and just use NONE for the
>> >> saturating forms too.
>> >
>> > The problem is that the test case in the attachment has different results
>> > under -O0 and -O2.
>> 
>> Right.  But my point was that I don't think that use case is supported.
>> If you want to use saturating instructions and read the Q flag afterwards,
>> the saturating instructions need to be inline asm too.
>> 
>> > In the gimple phase the statement:
>> >   _9 = __builtin_aarch64_uqaddv2si_uuu (op0_4, op1_6);
>> > would be treated as dead code if we set the NONE flag for saturating
>> > intrinsics.  Adding FLAG_WRITE_FPSR would help fix this problem.
>> >
>> > Even when we set FLAG_WRITE_FPSR, the uqadd insn:
>> >   (insn 11 10 12 2 (set (reg:V2SI 97)
>> > (us_plus:V2SI (reg:V2SI 98)
>> > (reg:V2SI 99))) {aarch64_uqaddv2si}
>> >  (nil))
>> > could also be eliminated in the RTL phase because it will be treated as a
>> > dead insn.  So I think we might also need to modify the saturating
>> > instruction patterns to add the side effect of setting the FPSR register.
>> 
>> The problem is that FPSR is global state and we don't in general
>> know who might read it.  So if we modelled the modification of the FPSR,
>> we'd never be able to fold away saturating arithmetic that does actually
>> saturate at compile time, because we'd never know whether the program
>> wanted the effect on the Q flag result to be visible (perhaps to another
>> function that the compiler can't see).  We'd also be unable to remove
>> results that really are dead.
>> 
>> So I think this is one of those situations in which we can't keep all
>> constituents happy.  Catering for people who want to read the Q flag
>> would make things worse for those who want saturating arithmetic to be
>> optimised as aggressively as possible.  And the same holds in reverse.
>
> I agree.  The test case is extracted from
> gcc.target/aarch64/advsimd-intrinsics/vqadd.c.
> If we set the NONE flag for saturating intrinsics, it would fail in regression
> testing because some qadd intrinsics would be treated as dead code and be
> eliminated.
>   Running target unix
>   Running ./gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp ...
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O0  (test for excess errors)
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O0  execution test
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O1  (test for excess errors)
>   FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O1  execution test
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2  (test for excess errors)
>   FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2  execution test
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O3 -g  (test for excess errors)
>   FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O3 -g  execution test
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Os  (test for excess errors)
>   FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Os  execution test
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Og -g  (test for excess errors)
>   FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Og -g  execution test
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2 -flto -fno-use-linker-plugin
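
The trade-off discussed in this thread can be illustrated with a portable
model; it is a sketch only and does not use the real intrinsics.  Here q_flag
is a hypothetical stand-in for the sticky Q flag in FPSR, and uqadd_u32 models
one lane of an unsigned saturating add.  Because the flag is an ordinary
global in this model, the compiler has to keep the call in saturates () even
though the result is discarded.  That is the guarantee a FLAG_WRITE_FPSR style
annotation (plus matching .md patterns) would provide, and also why it would
block folding and dead-code removal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sticky saturation (Q) flag in FPSR.  */
static bool q_flag;

/* Unsigned saturating add, modelling one lane of uqadd.  */
static uint32_t
uqadd_u32 (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;     /* wraps modulo 2^32 on overflow */
  if (sum < a)
    {
      q_flag = true;        /* side effect that outlives the result */
      return UINT32_MAX;
    }
  return sum;
}

/* The result is dead here; only the flag is inspected afterwards.  */
static bool
saturates (uint32_t a, uint32_t b)
{
  q_flag = false;
  (void) uqadd_u32 (a, b);
  return q_flag;
}
```

If uqadd_u32 were instead described to the compiler as having no side effects,
as the NONE flag does for the builtin, the call in saturates () would be
removable and the flag check would observe stale state.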

[OG10] merge GCC 10 into branch; cherry-pick two OpenMP patches

2020-08-21 Thread Tobias Burnus

OG10 = devel/omp/gcc-10 – a GCC 10 branch with some additional 
OpenMP/OpenACC/offloading patches

Commits:
* 16052969a54db5df3d37fdcc81acba6ed1ec8c6a
  moved OG10 ChangeLog items to ChangeLog.omp
* 612fee635bbb1198bc550c9c328330cae3259ed5
  Merged origin/releases/gcc-10 into branch
* f99ec65aea0d432b98a160035bebb6be23b9acfe
  Backport of 'OpenMP: Support 'if (simd:/cancel:' in Fortran'
* c0db5b424d33577e633895c9c430bc1626336fb5
  Backport of 'Fortran: Fix OpenMP's 'if(simd:' etc. conditions'

Thanks,

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [PATCH] vxworks: Fix GCC selftests for *-wrs-vxworks7-* targets

2020-08-21 Thread Olivier Hainque
Hello Iain,

> On 20 Aug 2020, at 14:54, Iain Buclaw  wrote:
> 
>> We have a batch of vxworks changes queued that we will be submitting soon,
>> and we might get to rationalize this with other places along the way.
>> 
> 
> Running the build through one more time, and I've noticed that the make
> recipe macro_list also triggers a VSB_DIR not defined error.
> 
> However unlike selftests, it does not result in a failed build, so
> probably only a minor concern.

Thanks for the note.

macro_list is essentially for fixincludes, a notorious source
of headaches for VxWorks. IIRC, there is code to inhibit macro-lists
somewhere down the line so it doesn't seem problematic.

It seems ok for the kind of sanity-check build you are doing anyway, and
not the kind of thing we want to make special efforts hiding, as we want
to catch such missing definitions in builds intended for real environments,
when we do need visibility on the system headers.

Cheers,

Olivier


[PATCH] C-SKY: Add -mbacktrace option.

2020-08-21 Thread Jojo R
gcc/ChangeLog:

* config/csky/csky.opt (TARGET_BACKTRACE): New.
* doc/invoke.texi (C-SKY Options): Document -mbacktrace.

---
 gcc/config/csky/csky.opt | 4 
 gcc/doc/invoke.texi  | 7 ++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/config/csky/csky.opt b/gcc/config/csky/csky.opt
index 60b51e5..00fa343 100644
--- a/gcc/config/csky/csky.opt
+++ b/gcc/config/csky/csky.opt
@@ -192,3 +192,7 @@ Set the branch costs to roughly the specified number of 
instructions.
 msched-prolog
 Target Report Var(flag_sched_prolog) Init(0)
 Permit scheduling of function prologue and epilogue sequences.
+
+mbacktrace
+Target Report Var(TARGET_BACKTRACE) Init(0) Undocumented
+Generate code can be backtraced.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6352eab..41d0634 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -821,7 +821,7 @@ Objective-C and Objective-C++ Dialects}.
 -mdsp  -medsp  -mvdsp @gol
 -mdiv  -msmart  -mhigh-registers  -manchor @gol
 -mpushpop  -mmultiple-stld  -mconstpool  -mstack-size  -mccrt @gol
--mbranch-cost=@var{n}  -mcse-cc  -msched-prolog}
+-mbranch-cost=@var{n}  -mcse-cc  -msched-prolog -mbacktrace}
 
 @emph{Darwin Options}
 @gccoptlist{-all_load  -allowable_client  -arch  -arch_errors_fatal @gol
@@ -20434,6 +20434,11 @@ this option can result in code that is not compliant 
with the C-SKY V2 ABI
 prologue requirements and that cannot be debugged or backtraced.
 It is disabled by default.
 
+@item -mbacktrace
+@itemx -mno-backtrace
+@opindex mbacktrace
+Generate code can be backtraced.  This option defaults to off.
+
 @end table
 
 @node Darwin Options
-- 
1.9.1