Hi Sam,

Yes, you can test Turboshaft Reducers directly. You can find examples in test/unittests/compiler/turboshaft/ <https://source.chromium.org/chromium/chromium/src/+/main:v8/test/unittests/compiler/turboshaft/>. Some examples:
- Basic tests: https://source.chromium.org/chromium/chromium/src/+/main:v8/test/unittests/compiler/turboshaft/control-flow-unittest.cc
- Bit more complex tests: https://source.chromium.org/chromium/chromium/src/+/main:v8/test/unittests/compiler/turboshaft/late-load-elimination-reducer-unittest.cc

You'll see that we don't have that many unittests for Reducers yet. As a result, the framework may be lacking some useful features. If this happens, feel free to add whatever you need :)

Cheers,
Darius

On Fri, 12 Jul 2024 at 11:18, Sam Parker-Haynes <[email protected]> wrote:
> Hi,
>
> I think I've got a reasonable implementation of this, where I'm performing
> the reduction in machine-optimization-reducer.h. Is there a way of testing
> Turboshaft reducers directly, or will I need to write an mjsunit test?
>
> cheers
>
> On Wednesday, June 5, 2024 at 4:34:50 PM UTC+1 Sam Parker-Haynes wrote:
>
>> Okay, good!!
>>
>> So, although I want to generate horizontal reduction operations, I'm
>> currently thinking about lowering these to pairwise instructions, such as
>> SSE/AVX haddp and Neon faddp. The TS op will have the semantics of a
>> recursively pairwise operation, so targets should be able to lower it to a
>> variety of optimised sequences, which does mean we'd be able to use addv
>> for ints on aarch64.
>>
>> Thanks again,
>> Sam
>>
>> On Wednesday, June 5, 2024 at 4:04:36 PM UTC+1 [email protected] wrote:
>>
>>> And one more thing that will be nicer in a Reducer than in the
>>> instruction selector: you don't have to worry about CanCover :o :o :o
>>>
>>> Btw, as far as I can tell, there are no corresponding Intel operations
>>> for vaddvq (which I guess is what you want to generate), but I think that
>>> it's still better in a reducer than in the ISEL directly. Maybe add an #ifdef
>>> V8_TARGET_ARCH_ARM64 around the arm64-specific opcodes that you define.
>>>
>>> Cheers,
>>> Darius
>>>
>>> On Wednesday, June 5, 2024 at 4:56:56 PM UTC+2 Matthias Liedtke wrote:
>>>
>>>> Hi,
>>>>
>>>> I quickly synced with Darius:
>>>> 1) In general it makes sense to do the matching on the graph itself
>>>> (i.e. in a reducer) assuming this is a generic pattern for which there
>>>> might also be specialized / optimized instructions on other architectures.
>>>> 2) Intel is working on a re-vectorization pass to replace 128-bit SIMD
>>>> operations with 256-bit SIMD operations. So, if these optimized "add +
>>>> shuffle" operations exist on Intel as well, there would be a clear benefit
>>>> in doing it in a reducer that could then potentially run prior to the
>>>> revectorization (which would require additional modifications to the
>>>> revectorizer).
>>>>
>>>> In general it's advisable to have as few architecture-specific code
>>>> paths in the reducers as possible, so the operations shouldn't be
>>>> overfitted to some arm64-only instructions.
>>>> Still, having some SIMD operations with clear semantics in the graph
>>>> that only exist on some architectures is fine.
>>>>
>>>> I don't think pattern matching on the graph is likely to be more effort
>>>> or slower than pattern matching during instruction selection.
>>>> Given the complexity of arm64 and x64 ISel code, I'm happy about
>>>> anything that isn't added on top of that. :)
>>>>
>>>> Cheers,
>>>> Matthias
>>>>
>>>> On Wed, Jun 5, 2024 at 3:59 PM Sam Parker-Haynes <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'd like to add some pattern matching, for Turboshaft, to recognise
>>>>> add + shuffle patterns which correspond to a horizontal pairwise
>>>>> reduction.
>>>>> I've started doing this with wasm::SimdShuffle helpers and then during
>>>>> arm64 instruction selection, but it feels like the pattern matching should
>>>>> be done in a generic place too... So, I was thinking about adding four
>>>>> more kinds (I32x4, I64x2, F32x4 and F64x2 PairwiseReduction)
>>>>> to Simd128UnaryOp and then performing the combining in
>>>>> machine-optimization-reducer.
>>>>>
>>>>> Does this sound reasonable enough..? Or is plumbing this into the
>>>>> TS IR likely to be significantly more complicated than
>>>>> backend pattern matching?
>>>>>
>>>>> Thanks,
>>>>> Sam

-- 
-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/CAKRYUpvVwTWwG8_6Ns-JN29QamMk9GE7As9bNc6_T4Hw_8zYNw%40mail.gmail.com.
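For readers following the thread, here is a minimal standalone sketch of what "recursively pairwise" reduction semantics could look like for an F32x4 value, and of one add + shuffle sequence that produces the same association order. This is not V8 code: the type F32x4, the helpers PairwiseAdd, Shuffle and Add, and the particular shuffle indices are all hypothetical names and choices made for illustration, and the patterns actually matched in the reducer may differ.

```cpp
#include <array>

// Hypothetical scalar model of a 128-bit vector with four float lanes.
using F32x4 = std::array<float, 4>;

// One pairwise step on a single input: adjacent lanes are summed.
// (Loosely modelled on a pairwise add with both operands equal.)
F32x4 PairwiseAdd(const F32x4& v) {
  return F32x4{{v[0] + v[1], v[2] + v[3], v[0] + v[1], v[2] + v[3]}};
}

// Recursive pairwise reduction: two steps for four lanes.
// Lane 0 ends up holding (v0 + v1) + (v2 + v3).
float PairwiseReduce(const F32x4& v) {
  return PairwiseAdd(PairwiseAdd(v))[0];
}

// Lane-wise helpers used to express the same thing as add + shuffle.
F32x4 Shuffle(const F32x4& v, const std::array<int, 4>& idx) {
  return F32x4{{v[idx[0]], v[idx[1]], v[idx[2]], v[idx[3]]}};
}

F32x4 Add(const F32x4& a, const F32x4& b) {
  return F32x4{{a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]}};
}

// An add + shuffle sequence with the same association order as
// PairwiseReduce. The shuffle indices are an assumption chosen so that
// lane 0 is computed as (v0 + v1) + (v2 + v3).
float AddShuffleReduce(const F32x4& v) {
  const std::array<int, 4> kSwapAdjacent = {{1, 0, 3, 2}};
  const std::array<int, 4> kSwapHalves = {{2, 3, 0, 1}};
  F32x4 step1 = Add(v, Shuffle(v, kSwapAdjacent));       // lane 0: v0 + v1, lane 2: v2 + v3
  F32x4 step2 = Add(step1, Shuffle(step1, kSwapHalves)); // lane 0: step1[0] + step1[2]
  return step2[0];
}
```

Because both functions add the lanes in the same order, they should agree even for floating-point inputs where a different association could round differently; for integer kinds the order would not matter.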
