Re: [PATCH, rs6000] Map dcbtst, dcbtt to n2=0 for __builtin_prefetch builtin.

2018-05-09 Thread Segher Boessenkool
Hi,

On Tue, May 08, 2018 at 05:04:33PM -0700, Carl Love wrote:
> On Tue, 2018-05-08 at 11:24 -0500, Segher Boessenkool wrote:
> > What ISA version is required for the TH field to do anything?  Will
> > it work on older machines too (just ignored)?  What assembler version
> > is required?
> 
> I went back and checked.  The mnemonics for 
> 
>   dcbtt RA,RB  dcbt for TH value of 0b1
>   dcbtstt RA,RB dcbtst for TH value of 0b1.
> 
> were introduced in ISA 2.06.

So we need a check for that.  TH=0b1 is new in ISA 2.06 as well.
For older CPUs the hint is undefined.

I think we can assume that all assemblers for which we can compile with
-mcpu=power7 will support the 3-argument form of dcbt, and have dcbtt etc.
as well.  So you just need to test for TARGET_POPCNTD I think?

> There is another pair of mnemonics 
> 
>   dcbtds RA,RB,TH   dcbt for TH values of 0b0 or
>     0b01000 - 0b0;
>     other TH values are invalid.
> 
>   dcbtstds RA,RB,TH  dcbtst for TH values of 0b0
>          or 0b01000 - 0b01010;
>      other TH values are invalid.
> 
> that could be used instead.  These are both supported starting with 
> ISA 2.05.  The dcbtds is actually supported back to ISA 2.03 but the
> dcbtstds is not.

Those do not cover 0b1.

> #ifndef HAVE_AS_POPCNTB   
>   
> #undef  TARGET_POPCNTB
>   
> #define TARGET_POPCNTB 0  
>   
> #endif  
> 
> I haven't found anything that I could use specifically for Power 7 and
> newer.  Not sure if it is worth defining a HAVE_AS_DCBTT to do
> something similar?  Seems a bit over kill.  Thoughts on how to limit
> the generation of dcbtt and dcbtstt to Power 7 or newer?

There is TARGET_POPCNTD.  (TARGET_POPCNTB is for ISA 2.02).


Segher


Re: [PATCH, rs6000] Map dcbtst, dcbtt to n2=0 for __builtin_prefetch builtin.

2018-05-08 Thread Carl Love
Segher:

On Tue, 2018-05-08 at 11:24 -0500, Segher Boessenkool wrote:
> What ISA version is required for the TH field to do anything?  Will
> it work on older machines too (just ignored)?  What assembler version
> is required?

I went back and checked.  The mnemonics for 

  dcbtt RA,RB  dcbt for TH value of 0b1
  dcbtstt RA,RB dcbtst for TH value of 0b1.

were introduced in ISA 2.06.

There is another pair of mnemonics 

  dcbtds RA,RB,TH   dcbt for TH values of 0b0 or
    0b01000 - 0b0;
    other TH values are invalid.

  dcbtstds RA,RB,TH  dcbtst for TH values of 0b0
         or 0b01000 - 0b01010;
     other TH values are invalid.

that could be used instead.  These are both supported starting with 
ISA 2.05.  The dcbtds is actually supported back to ISA 2.03 but the
dcbtstds is not.

I was looking for some kind of conditional compilation for Power 7 or
newer.  In rs6000.h there are defines for the assembler supporting the
popcount byte instruction, 

#ifndef HAVE_AS_POPCNTB 
#undef  TARGET_POPCNTB  
#define TARGET_POPCNTB 0
#endif  

I haven't found anything that I could use specifically for Power 7 and
newer.  Not sure if it is worth defining a HAVE_AS_DCBTT to do
something similar?  Seems a bit over kill.  Thoughts on how to limit
the generation of dcbtt and dcbtstt to Power 7 or newer?

   Carl Love



Re: [PATCH, rs6000] Map dcbtst, dcbtt to n2=0 for __builtin_prefetch builtin.

2018-05-08 Thread Segher Boessenkool
Hi Carl,

On Mon, May 07, 2018 at 01:34:55PM -0700, Carl Love wrote:
> This patch maps n2=0 to generate the dcbtstt mnemonic (dcbst for TH
> value of 0b1) for a write prefetch and dcbtst for n2 in range
> [1,3].  
> 
> The dcbtt mnemonic (dcbt for TH value of 0b1) is generated for a
> read prefetch when n2=0 and the dbct instruction is generated for n2 in
> range [1,3].
> 
> The ISA states that the value TH = 0b1 is a hint that the processor
> will probably soon perform a load from the addressed block. 

(s/dcbst/dcbtst/).  Yup, sounds good.

> The regression testing of the patch was done on 
> 
>powerpc64le-unknown-linux-gnu (Power 8 LE)
> 
> with no regressions.  

What ISA version is required for the TH field to do anything?  Will
it work on older machines too (just ignored)?  What assembler version
is required?

> 2018-05-07  Carl Love  
> 
> * config/rs6000/rs6000.md: Add dcbtst, dcbtt instruction generation
>   to define_insn prefetch.

* config/rs6000/rs6000.md (prefetch): Generate dcbtt and dcbtstt instructions
if operands[2] is 0.

or similar.

> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 2b15cca..7429d33 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -13233,10 +13233,19 @@
>(match_operand:SI 2 "const_int_operand" "n"))]
>""
>  {
> -  if (GET_CODE (operands[0]) == REG)
> -return INTVAL (operands[1]) ? "dcbtst 0,%0" : "dcbt 0,%0";
> -  return INTVAL (operands[1]) ? "dcbtst %a0" : "dcbt %a0";
> -}
> +  if (GET_CODE (operands[0]) == REG) {

Use REG_P please.

The correct formatting is

if (this)
  {
something;
  }
else
  {
whatever;
  }

> +if (INTVAL (operands[1]) == 0)

You can also say

  if (operands[1] == const0_rtx)

if that is easier to read.

> +  }
> + }

If you do indenting right there never is a single space indent difference
(always is even).

It's a pity we need to decide between %a0 and not.  Hardly seems worth
making another output modifier for though.


Segher


[PATCH, rs6000] Map dcbtst, dcbtt to n2=0 for __builtin_prefetch builtin.

2018-05-07 Thread Carl Love
GCC maintainers:

The architecture independent builtin __builtin_prefetch() is defined
as:

void __builtin_prefetch (const void *addr, int n1, int n2)
 
n1 -
prefetch read = 0, prefetch write = 1
n2 - temporal locality 0 to 3.  No
temporal locality = 0, high temporal
 locality = 3.

The implementation for Power maps to define_insn "prefetch" in
gcc/config/rs6000/rs6000.md.  The Power implementation currently
ignores the value of n2 and simply generates the dcbtst and dbct
instructions. 

This patch maps n2=0 to generate the dcbtstt mnemonic (dcbst for TH
value of 0b1) for a write prefetch and dcbtst for n2 in range
[1,3].  

The dcbtt mnemonic (dcbt for TH value of 0b1) is generated for a
read prefetch when n2=0 and the dbct instruction is generated for n2 in
range [1,3].

The ISA states that the value TH = 0b1 is a hint that the processor
will probably soon perform a load from the addressed block. 
 
There is an existing test case in
gcc/testsuite/gcc.target/sh/prefetch.dump.  The test case generates the
following output with the patch:

gcc -g -c -o prefetch prefetch.c
objdump -S -d prefetch > prefetch.dump
more prefetch.dump

...
  __builtin_prefetch (&data[0], 0, 0);  
   c:   2c 00 3f 39 addir9,r31,44   
  10:   2c 4a 00 7e dcbtt   0,r9
  __builtin_prefetch (&data[0], 0, 1);  
  14:   2c 00 3f 39 addir9,r31,44   
  18:   2c 4a 00 7c dcbt0,r9
  __builtin_prefetch (&data[0], 0, 2);  
  1c:   2c 00 3f 39 addir9,r31,44   
  20:   2c 4a 00 7c dcbt0,r9
  __builtin_prefetch (&data[0], 0, 3);  
  24:   2c 00 3f 39 addir9,r31,44   
  28:   2c 4a 00 7c dcbt0,r9
  __builtin_prefetch (&data[0], 1, 0);  
  2c:   2c 00 3f 39 addir9,r31,44   
  30:   ec 49 00 7e dcbtstt 0,r9
  __builtin_prefetch (&data[0], 1, 1);  
  34:   2c 00 3f 39 addir9,r31,44   
  38:   ec 49 00 7c dcbtst  0,r9
  __builtin_prefetch (&data[0], 1, 2);  
  3c:   2c 00 3f 39 addir9,r31,44   
  40:   ec 49 00 7c dcbtst  0,r9
  __builtin_prefetch (&data[0], 1, 3);  
  44:   2c 00 3f 39 addir9,r31,44   
  48:   ec 49 00 7c dcbtst  0,r9 
...

The regression testing of the patch was done on 

   powerpc64le-unknown-linux-gnu (Power 8 LE)

with no regressions.  

Please let me know if the patch looks OK for GCC mainline.

 Carl Love


gcc/ChangeLog:

2018-05-07  Carl Love  

* config/rs6000/rs6000.md: Add dcbtst, dcbtt instruction generation
to define_insn prefetch.
---
 gcc/config/rs6000/rs6000.md | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2b15cca..7429d33 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -13233,10 +13233,19 @@
 (match_operand:SI 2 "const_int_operand" "n"))]
   ""
 {
-  if (GET_CODE (operands[0]) == REG)
-return INTVAL (operands[1]) ? "dcbtst 0,%0" : "dcbt 0,%0";
-  return INTVAL (operands[1]) ? "dcbtst %a0" : "dcbt %a0";
-}
+  if (GET_CODE (operands[0]) == REG) {
+if (INTVAL (operands[1]) == 0)
+  return INTVAL (operands[2]) ? "dcbt 0,%0" : "dcbtt 0,%0";
+else
+  return INTVAL (operands[2]) ? "dcbtst 0,%0" : "dcbtstt 0,%0";
+
+  } else {
+if (INTVAL (operands[1]) == 0)
+  return INTVAL (operands[2]) ? "dcbt %a0" : "dcbtt %a0";
+else
+  return INTVAL (operands[2]) ? "dcbtst %a0" : "dcbtstt %a0";
+  }
+ }
   [(set_attr "type" "load")])
 
 ;; Handle -fsplit-stack.
-- 
2.7.4