Re: [PATCH, rs6000] Map dcbtst, dcbtt to n2=0 for __builtin_prefetch builtin.
Hi, On Tue, May 08, 2018 at 05:04:33PM -0700, Carl Love wrote: > On Tue, 2018-05-08 at 11:24 -0500, Segher Boessenkool wrote: > > What ISA version is required for the TH field to do anything? Will > > it work on older machines too (just ignored)? What assembler version > > is required? > > I went back and checked. The mnemonics for > > dcbtt RA,RB dcbt for TH value of 0b1 > dcbtstt RA,RB dcbtst for TH value of 0b1. > > were introduced in ISA 2.06. So we need a check for that. TH=0b1 is new in ISA 2.06 as well. For older CPUs the hint is undefined. I think we can assume that all assemblers for which we can compile with -mcpu=power7 will support the 3-argument form of dcbt, and have dcbtt etc. as well. So you just need to test for TARGET_POPCNTD I think? > There is another pair of mnemonics > > dcbtds RA,RB,TH dcbt for TH values of 0b0 or > 0b01000 - 0b0; > other TH values are invalid. > > dcbtstds RA,RB,TH dcbtst for TH values of 0b0 > or 0b01000 - 0b01010; > other TH values are invalid. > > that could be used instead. These are both supported starting with > ISA 2.05. The dcbtds is actually supported back to ISA 2.03 but the > dcbtstds is not. Those do not cover 0b1. > #ifndef HAVE_AS_POPCNTB > > #undef TARGET_POPCNTB > > #define TARGET_POPCNTB 0 > > #endif > > I haven't found anything that I could use specifically for Power 7 and > newer. Not sure if it is worth defining a HAVE_AS_DCBTT to do > something similar? Seems a bit over kill. Thoughts on how to limit > the generation of dcbtt and dcbtstt to Power 7 or newer? There is TARGET_POPCNTD. (TARGET_POPCNTB is for ISA 2.02). Segher
Re: [PATCH, rs6000] Map dcbtst, dcbtt to n2=0 for __builtin_prefetch builtin.
Segher: On Tue, 2018-05-08 at 11:24 -0500, Segher Boessenkool wrote: > What ISA version is required for the TH field to do anything? Will > it work on older machines too (just ignored)? What assembler version > is required? I went back and checked. The mnemonics for dcbtt RA,RB dcbt for TH value of 0b1 dcbtstt RA,RB dcbtst for TH value of 0b1. were introduced in ISA 2.06. There is another pair of mnemonics dcbtds RA,RB,TH dcbt for TH values of 0b0 or 0b01000 - 0b0; other TH values are invalid. dcbtstds RA,RB,TH dcbtst for TH values of 0b0 or 0b01000 - 0b01010; other TH values are invalid. that could be used instead. These are both supported starting with ISA 2.05. The dcbtds is actually supported back to ISA 2.03 but the dcbtstds is not. I was looking for some kind of conditional compilation for Power 7 or newer. In rs6000.h there are defines for the assembler supporting the popcount byte instruction, #ifndef HAVE_AS_POPCNTB #undef TARGET_POPCNTB #define TARGET_POPCNTB 0 #endif I haven't found anything that I could use specifically for Power 7 and newer. Not sure if it is worth defining a HAVE_AS_DCBTT to do something similar? Seems a bit over kill. Thoughts on how to limit the generation of dcbtt and dcbtstt to Power 7 or newer? Carl Love
Re: [PATCH, rs6000] Map dcbtst, dcbtt to n2=0 for __builtin_prefetch builtin.
Hi Carl, On Mon, May 07, 2018 at 01:34:55PM -0700, Carl Love wrote: > This patch maps n2=0 to generate the dcbtstt mnemonic (dcbst for TH > value of 0b1) for a write prefetch and dcbtst for n2 in range > [1,3]. > > The dcbtt mnemonic (dcbt for TH value of 0b1) is generated for a > read prefetch when n2=0 and the dbct instruction is generated for n2 in > range [1,3]. > > The ISA states that the value TH = 0b1 is a hint that the processor > will probably soon perform a load from the addressed block. (s/dcbst/dcbtst/). Yup, sounds good. > The regression testing of the patch was done on > >powerpc64le-unknown-linux-gnu (Power 8 LE) > > with no regressions. What ISA version is required for the TH field to do anything? Will it work on older machines too (just ignored)? What assembler version is required? > 2018-05-07 Carl Love > > * config/rs6000/rs6000.md: Add dcbtst, dcbtt instruction generation > to define_insn prefetch. * config/rs6000/rs6000.md (prefetch): Generate dcbtt and dcbtstt instructions if operands[2] is 0. or similar. > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 2b15cca..7429d33 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -13233,10 +13233,19 @@ >(match_operand:SI 2 "const_int_operand" "n"))] >"" > { > - if (GET_CODE (operands[0]) == REG) > -return INTVAL (operands[1]) ? "dcbtst 0,%0" : "dcbt 0,%0"; > - return INTVAL (operands[1]) ? "dcbtst %a0" : "dcbt %a0"; > -} > + if (GET_CODE (operands[0]) == REG) { Use REG_P please. The correct formatting is if (this) { something; } else { whatever; } > +if (INTVAL (operands[1]) == 0) You can also say if (operands[1] == const0_rtx) if that is easier to read. > + } > + } If you do indenting right there never is a single space indent difference (always is even). It's a pity we need to decide between %a0 and not. Hardly seems worth making another output modifier for though. Segher
[PATCH, rs6000] Map dcbtst, dcbtt to n2=0 for __builtin_prefetch builtin.
GCC maintainers: The architecture independent builtin __builtin_prefetch() is defined as: void __builtin_prefetch (const void *addr, int n1, int n2) n1 - prefetch read = 0, prefetch write = 1 n2 - temporal locality 0 to 3. No temporal locality = 0, high temporal locality = 3. The implementation for Power maps to define_insn "prefetch" in gcc/config/rs6000/rs6000.md. The Power implementation currently ignores the value of n2 and simply generates the dcbtst and dbct instructions. This patch maps n2=0 to generate the dcbtstt mnemonic (dcbst for TH value of 0b1) for a write prefetch and dcbtst for n2 in range [1,3]. The dcbtt mnemonic (dcbt for TH value of 0b1) is generated for a read prefetch when n2=0 and the dbct instruction is generated for n2 in range [1,3]. The ISA states that the value TH = 0b1 is a hint that the processor will probably soon perform a load from the addressed block. There is an existing test case in gcc/testsuite/gcc.target/sh/prefetch.dump. The test case generates the following output with the patch: gcc -g -c -o prefetch prefetch.c objdump -S -d prefetch > prefetch.dump more prefetch.dump ... __builtin_prefetch (&data[0], 0, 0); c: 2c 00 3f 39 addir9,r31,44 10: 2c 4a 00 7e dcbtt 0,r9 __builtin_prefetch (&data[0], 0, 1); 14: 2c 00 3f 39 addir9,r31,44 18: 2c 4a 00 7c dcbt0,r9 __builtin_prefetch (&data[0], 0, 2); 1c: 2c 00 3f 39 addir9,r31,44 20: 2c 4a 00 7c dcbt0,r9 __builtin_prefetch (&data[0], 0, 3); 24: 2c 00 3f 39 addir9,r31,44 28: 2c 4a 00 7c dcbt0,r9 __builtin_prefetch (&data[0], 1, 0); 2c: 2c 00 3f 39 addir9,r31,44 30: ec 49 00 7e dcbtstt 0,r9 __builtin_prefetch (&data[0], 1, 1); 34: 2c 00 3f 39 addir9,r31,44 38: ec 49 00 7c dcbtst 0,r9 __builtin_prefetch (&data[0], 1, 2); 3c: 2c 00 3f 39 addir9,r31,44 40: ec 49 00 7c dcbtst 0,r9 __builtin_prefetch (&data[0], 1, 3); 44: 2c 00 3f 39 addir9,r31,44 48: ec 49 00 7c dcbtst 0,r9 ... The regression testing of the patch was done on powerpc64le-unknown-linux-gnu (Power 8 LE) with no regressions. Please let me know if the patch looks OK for GCC mainline. Carl Love gcc/ChangeLog: 2018-05-07 Carl Love * config/rs6000/rs6000.md: Add dcbtst, dcbtt instruction generation to define_insn prefetch. --- gcc/config/rs6000/rs6000.md | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 2b15cca..7429d33 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -13233,10 +13233,19 @@ (match_operand:SI 2 "const_int_operand" "n"))] "" { - if (GET_CODE (operands[0]) == REG) -return INTVAL (operands[1]) ? "dcbtst 0,%0" : "dcbt 0,%0"; - return INTVAL (operands[1]) ? "dcbtst %a0" : "dcbt %a0"; -} + if (GET_CODE (operands[0]) == REG) { +if (INTVAL (operands[1]) == 0) + return INTVAL (operands[2]) ? "dcbt 0,%0" : "dcbtt 0,%0"; +else + return INTVAL (operands[2]) ? "dcbtst 0,%0" : "dcbtstt 0,%0"; + + } else { +if (INTVAL (operands[1]) == 0) + return INTVAL (operands[2]) ? "dcbt %a0" : "dcbtt %a0"; +else + return INTVAL (operands[2]) ? "dcbtst %a0" : "dcbtstt %a0"; + } + } [(set_attr "type" "load")]) ;; Handle -fsplit-stack. -- 2.7.4