Re: [PATCH] Fix 300.twolf regression caused by STV

2019-08-20 Thread Richard Biener
On Tue, 20 Aug 2019, Uros Bizjak wrote:

> >> Uros noted that STV with !TImode isn't supposed to run before combine.
> >> The following adjusts things accordingly and now the pass runs twice
> >> for TARGET_64BIT.  I've also adjusted another gpr->xmm move to
> >> use (vec_merge (vec_duplicate..)) style rather than using a subreg.
> >> This isn't strictly neccesary to fix the bug though and my previous
> >> needs to do this might have been caused by the pass running too early.
> >>
> >> So - with or without this consistency part?
> 
> The true STV pass (SI/DImode) is intended to run after combine, so
> memory operands are merged to the instruction together with other
> scalar simplications combine can perform.
> 
> As noted by HJ, there is intended pessimization of moves betwen
> register sets that was introduced before IRA and
> preffered_for_{speed,size} attribute.
> 
> >> Bootstrap / regtest running on x86_64-unknown-linux-gnu, OK?
> 
> IMO, the patch is OK, the regression should be solved by retuning
> iter-regset move cost. We should not trick the compiler by hiding the
> move using tricks where we make the move RTX more complex, this is
> fragile and will regress as soon as the compiler learns corresponding
> simplification. One example is the referred failing minmax-6.c case.

OK, I have applied the patch now and the minmax-6.c testcase now FAILs
(while the added minmax-7.c one passes).

Let's work from this state, testers will pick up any other fallout.

Richard.

> Uros.
> 
> > It regresses
> >
> > FAIL: gcc.target/i386/minmax-6.c scan-assembler-not rsp
> >
> > Have to investigate that again then (it caused the change to
> > use (vec_merge (vec_duplicate..)) for conversion in the first place)
> >
> > Richard.
> >
> >> Thanks,
> >> Richard.
> >>
> >> 2019-08-19  Richard Biener  
> >>
> >> PR target/91154
> >> * config/i386/i386-features.c (general_scalar_chain::convert_op):
> >> Use (vec_merge (vec_duplicate..)) style vector from scalar move.
> >> (convert_scalars_to_vector): Add timode_p parameter and use it
> >> to guard TImode-only operation.
> >> (pass_stv::gate): Adjust so STV runs twice for TARGET_64BIT.
> >> (pass_stv::execute): Pass down timode_p.
> >>
> >> * gcc.target/i386/minmax-7.c: New testcase.
> 

-- 
Richard Biener 
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

Re: [PATCH] Fix 300.twolf regression caused by STV

2019-08-20 Thread Uros Bizjak
>> Uros noted that STV with !TImode isn't supposed to run before combine.
>> The following adjusts things accordingly and now the pass runs twice
>> for TARGET_64BIT.  I've also adjusted another gpr->xmm move to
>> use (vec_merge (vec_duplicate..)) style rather than using a subreg.
>> This isn't strictly neccesary to fix the bug though and my previous
>> needs to do this might have been caused by the pass running too early.
>>
>> So - with or without this consistency part?

The true STV pass (SI/DImode) is intended to run after combine, so
memory operands are merged to the instruction together with other
scalar simplications combine can perform.

As noted by HJ, there is intended pessimization of moves betwen
register sets that was introduced before IRA and
preffered_for_{speed,size} attribute.

>> Bootstrap / regtest running on x86_64-unknown-linux-gnu, OK?

IMO, the patch is OK, the regression should be solved by retuning
iter-regset move cost. We should not trick the compiler by hiding the
move using tricks where we make the move RTX more complex, this is
fragile and will regress as soon as the compiler learns corresponding
simplification. One example is the referred failing minmax-6.c case.

Uros.

> It regresses
>
> FAIL: gcc.target/i386/minmax-6.c scan-assembler-not rsp
>
> Have to investigate that again then (it caused the change to
> use (vec_merge (vec_duplicate..)) for conversion in the first place)
>
> Richard.
>
>> Thanks,
>> Richard.
>>
>> 2019-08-19  Richard Biener  
>>
>> PR target/91154
>> * config/i386/i386-features.c (general_scalar_chain::convert_op):
>> Use (vec_merge (vec_duplicate..)) style vector from scalar move.
>> (convert_scalars_to_vector): Add timode_p parameter and use it
>> to guard TImode-only operation.
>> (pass_stv::gate): Adjust so STV runs twice for TARGET_64BIT.
>> (pass_stv::execute): Pass down timode_p.
>>
>> * gcc.target/i386/minmax-7.c: New testcase.


Re: [PATCH] Fix 300.twolf regression caused by STV

2019-08-19 Thread Richard Biener
On Mon, 19 Aug 2019, Richard Biener wrote:

> 
> Uros noted that STV with !TImode isn't supposed to run before combine.
> The following adjusts things accordingly and now the pass runs twice
> for TARGET_64BIT.  I've also adjusted another gpr->xmm move to
> use (vec_merge (vec_duplicate..)) style rather than using a subreg.
> This isn't strictly neccesary to fix the bug though and my previous
> needs to do this might have been caused by the pass running too early.
> 
> So - with or without this consistency part?
> 
> Bootstrap / regtest running on x86_64-unknown-linux-gnu, OK?

It regresses

FAIL: gcc.target/i386/minmax-6.c scan-assembler-not rsp

Have to investigate that again then (it caused the change to
use (vec_merge (vec_duplicate..)) for conversion in the first place)

Richard.

> Thanks,
> Richard.
> 
> 2019-08-19  Richard Biener  
> 
>   PR target/91154
>   * config/i386/i386-features.c (general_scalar_chain::convert_op):
>   Use (vec_merge (vec_duplicate..)) style vector from scalar move.
>   (convert_scalars_to_vector): Add timode_p parameter and use it
>   to guard TImode-only operation.
>   (pass_stv::gate): Adjust so STV runs twice for TARGET_64BIT.
>   (pass_stv::execute): Pass down timode_p.
> 
>   * gcc.target/i386/minmax-7.c: New testcase.
> 
> Index: gcc/config/i386/i386-features.c
> ===
> --- gcc/config/i386/i386-features.c   (revision 274666)
> +++ gcc/config/i386/i386-features.c   (working copy)
> @@ -910,7 +910,9 @@ general_scalar_chain::convert_op (rtx *o
>  {
>rtx tmp = gen_reg_rtx (GET_MODE (*op));
>  
> -  emit_insn_before (gen_move_insn (tmp, *op), insn);
> +  emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
> +  gen_gpr_to_xmm_move_src (vmode, *op)),
> + insn);
>*op = gen_rtx_SUBREG (vmode, tmp, 0);
>  
>if (dump_file)
> @@ -1664,7 +1666,7 @@ timode_remove_non_convertible_regs (bitm
> instructions into vector mode when profitable.  */
>  
>  static unsigned int
> -convert_scalars_to_vector ()
> +convert_scalars_to_vector (bool timode_p)
>  {
>basic_block bb;
>int converted_insns = 0;
> @@ -1690,7 +1692,7 @@ convert_scalars_to_vector ()
>  {
>rtx_insn *insn;
>FOR_BB_INSNS (bb, insn)
> - if (TARGET_64BIT
> + if (timode_p
>   && timode_scalar_to_vector_candidate_p (insn))
> {
>   if (dump_file)
> @@ -1699,7 +1701,7 @@ convert_scalars_to_vector ()
>  
>   bitmap_set_bit ([2], INSN_UID (insn));
> }
> - else
> + else if (!timode_p)
> {
>   /* Check {SI,DI}mode.  */
>   for (unsigned i = 0; i <= 1; ++i)
> @@ -1715,7 +1717,7 @@ convert_scalars_to_vector ()
> }
>  }
>  
> -  if (TARGET_64BIT)
> +  if (timode_p)
>  timode_remove_non_convertible_regs ([2]);
>for (unsigned i = 0; i <= 1; ++i)
>  general_remove_non_convertible_regs ([i]);
> @@ -1875,13 +1877,13 @@ public:
>/* opt_pass methods: */
>virtual bool gate (function *)
>  {
> -  return (timode_p == !!TARGET_64BIT
> +  return ((!timode_p || TARGET_64BIT)
> && TARGET_STV && TARGET_SSE2 && optimize > 1);
>  }
>  
>virtual unsigned int execute (function *)
>  {
> -  return convert_scalars_to_vector ();
> +  return convert_scalars_to_vector (timode_p);
>  }
>  
>opt_pass *clone ()
> Index: gcc/testsuite/gcc.target/i386/minmax-7.c
> ===
> --- gcc/testsuite/gcc.target/i386/minmax-7.c  (nonexistent)
> +++ gcc/testsuite/gcc.target/i386/minmax-7.c  (working copy)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=haswell" } */
> +
> +extern int numBins;
> +extern int binOffst;
> +extern int binWidth;
> +extern int Trybin;
> +void foo (int);
> +
> +void bar (int aleft, int axcenter)
> +{
> +  int a1LoBin = (((Trybin=((axcenter + aleft)-binOffst)/binWidth)<0)
> +  ? 0 : ((Trybin>numBins) ? numBins : Trybin));
> +  foo (a1LoBin);
> +}
> +
> +/* We do not want the RA to spill %esi for it's dual-use but using
> +   pminsd is OK.  */
> +/* { dg-final { scan-assembler-not "rsp" { target { ! { ia32 } } } } } */
> +/* { dg-final { scan-assembler "pminsd" } } */
> 

-- 
Richard Biener 
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

[PATCH] Fix 300.twolf regression caused by STV

2019-08-19 Thread Richard Biener


Uros noted that STV with !TImode isn't supposed to run before combine.
The following adjusts things accordingly and now the pass runs twice
for TARGET_64BIT.  I've also adjusted another gpr->xmm move to
use (vec_merge (vec_duplicate..)) style rather than using a subreg.
This isn't strictly neccesary to fix the bug though and my previous
needs to do this might have been caused by the pass running too early.

So - with or without this consistency part?

Bootstrap / regtest running on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2019-08-19  Richard Biener  

PR target/91154
* config/i386/i386-features.c (general_scalar_chain::convert_op):
Use (vec_merge (vec_duplicate..)) style vector from scalar move.
(convert_scalars_to_vector): Add timode_p parameter and use it
to guard TImode-only operation.
(pass_stv::gate): Adjust so STV runs twice for TARGET_64BIT.
(pass_stv::execute): Pass down timode_p.

* gcc.target/i386/minmax-7.c: New testcase.

Index: gcc/config/i386/i386-features.c
===
--- gcc/config/i386/i386-features.c (revision 274666)
+++ gcc/config/i386/i386-features.c (working copy)
@@ -910,7 +910,9 @@ general_scalar_chain::convert_op (rtx *o
 {
   rtx tmp = gen_reg_rtx (GET_MODE (*op));
 
-  emit_insn_before (gen_move_insn (tmp, *op), insn);
+  emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
+gen_gpr_to_xmm_move_src (vmode, *op)),
+   insn);
   *op = gen_rtx_SUBREG (vmode, tmp, 0);
 
   if (dump_file)
@@ -1664,7 +1666,7 @@ timode_remove_non_convertible_regs (bitm
instructions into vector mode when profitable.  */
 
 static unsigned int
-convert_scalars_to_vector ()
+convert_scalars_to_vector (bool timode_p)
 {
   basic_block bb;
   int converted_insns = 0;
@@ -1690,7 +1692,7 @@ convert_scalars_to_vector ()
 {
   rtx_insn *insn;
   FOR_BB_INSNS (bb, insn)
-   if (TARGET_64BIT
+   if (timode_p
&& timode_scalar_to_vector_candidate_p (insn))
  {
if (dump_file)
@@ -1699,7 +1701,7 @@ convert_scalars_to_vector ()
 
bitmap_set_bit ([2], INSN_UID (insn));
  }
-   else
+   else if (!timode_p)
  {
/* Check {SI,DI}mode.  */
for (unsigned i = 0; i <= 1; ++i)
@@ -1715,7 +1717,7 @@ convert_scalars_to_vector ()
  }
 }
 
-  if (TARGET_64BIT)
+  if (timode_p)
 timode_remove_non_convertible_regs ([2]);
   for (unsigned i = 0; i <= 1; ++i)
 general_remove_non_convertible_regs ([i]);
@@ -1875,13 +1877,13 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
 {
-  return (timode_p == !!TARGET_64BIT
+  return ((!timode_p || TARGET_64BIT)
  && TARGET_STV && TARGET_SSE2 && optimize > 1);
 }
 
   virtual unsigned int execute (function *)
 {
-  return convert_scalars_to_vector ();
+  return convert_scalars_to_vector (timode_p);
 }
 
   opt_pass *clone ()
Index: gcc/testsuite/gcc.target/i386/minmax-7.c
===
--- gcc/testsuite/gcc.target/i386/minmax-7.c(nonexistent)
+++ gcc/testsuite/gcc.target/i386/minmax-7.c(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+
+extern int numBins;
+extern int binOffst;
+extern int binWidth;
+extern int Trybin;
+void foo (int);
+
+void bar (int aleft, int axcenter)
+{
+  int a1LoBin = (((Trybin=((axcenter + aleft)-binOffst)/binWidth)<0)
+? 0 : ((Trybin>numBins) ? numBins : Trybin));
+  foo (a1LoBin);
+}
+
+/* We do not want the RA to spill %esi for it's dual-use but using
+   pminsd is OK.  */
+/* { dg-final { scan-assembler-not "rsp" { target { ! { ia32 } } } } } */
+/* { dg-final { scan-assembler "pminsd" } } */