Great, thanks for the tips Darius! On Thursday, August 31, 2023 at 1:40:30 PM UTC+1 dmerc...@google.com wrote:
> Hi Sam, > > ReduceBranch in branch-elimination-reducer.h sounds like the right place > to do this (the method is declared with "REDUCE(Branch)" rather than just > "ReduceBranch"). > You can probably draw some inspiration from "REDUCE(Goto)" in this same > file: this method tries to eliminate double diamonds by identifying Gotos > to blocks that end with a branch whose condition is currently known, and > inline the destination (see the 2nd part of the topmost comment of the > file). > > Note that "if_true" in this ReduceBranch is the "true" block in the output > graph that we're currently building, which is currently not bound and is > thus empty (we always start by emitting all of the predecessors of a block > before emitting it (except for loop headers)). You thus have to look at its > origin in the input graph, like the beginning of ReduceBranch does with > "if_true_origin = Asm().OriginForBlockStart(if_true)". > > Also note that in your example, once you've inlined B1 into B0, you do not > need to explicitly skip B1 later: B1 will now have 0 predecessors, and will > thus automatically be skipped by the OptimizationPhase. > > Let me know if you have any questions and feel free to add me as reviewer > to your CL ;) > > Cheers, > Darius > > On Thu, 31 Aug 2023 at 14:21, Sam Parker-Haynes <sam.p...@arm.com> wrote: > >> Hi Nico, >> >> I finally got around to trying to make a prototype and I'm unsure how to >> proceed. As an overview, I want to find a triangle CFG and hoist the >> conditional 'middle' block into its only predecessor. Something like this: >> B0: >> x = ComparisonOp a, b, kind >> BranchOp x, B2, B1 >> B1: >> y = ComparisonOp c, d, kind >> BranchOp y, B2, B3 >> B2: >> z = .... >> ------------------------------------- >> B0: >> x = ComparisionOp a, b, kind >> y = ConditionalComparisonOp a, b, x, kind >> BranchOp y, B2, B3 >> B2: >> z = ... >> B3: >> >> My current implementation would, from `Bind`, search the CFG from B2 and >> decide that B1 can be hoisted into B0. From there, we need to replace the >> BranchOp in B0 with the one in B1. We'r'e then replacing the condition of >> that branch with a new ConditonalCompareOp. >> >> If this sounds feasible, where/how should I do the replacing? I'm still >> yet to figure out if there is an interface which a reducer has to >> implement, but I've seen several examples which sound promising: >> REDUCE(Branch), >> ReduceBranchCondition and ReduceInputGraphBranch. Would any of these be >> right place or would they be a misuse of how the reducer stacks operate..? >> >> cheers, >> sam >> On Thursday, July 27, 2023 at 3:47:08 PM UTC+1 Nico Hartmann wrote: >> >>> Great to hear that it's working as expected. >>> >>> The phase ordering is a hard problem indeed. There is no one answer to >>> that question. Every phase has a cost in the sense that the graph has to be >>> iterated and a new graph has to be emitted. Although the cost is relatively >>> small, it is not free. In other words, you might not want to put every >>> reducer in its own phase. Besides some reducers that have a few additional >>> constraints (e.g. they always run on the input graph or they always need to >>> be at the bottom of the stack), if you have a situation where ReducerA >>> recognizes and optimizes a certain pattern and you have a ReducerB that may >>> produce those as part of some other reductions/optimizations, then it might >>> typically be a good idea to run ReducerA (somewhere) behind ReducerB in a >>> stack. >>> >>> Other than that, for the pipeline we have so far, we just tried >>> different combinations and compared performance of generated code and of >>> the compilation itself. >>> >>> Hope that helps a bit, >>> Nico >>> >>> On Thu, Jul 27, 2023 at 3:49 PM Sam Parker-Haynes <sam.p...@arm.com> >>> wrote: >>> >>>> Right, it is now working as expected. I've broken the WasmOptimizePhase >>>> into two: >>>> >>>> void WasmOptimizePhase::Run(Zone* temp_zone) { >>>> >>>> UnparkedScopeIfNeeded scope(PipelineData::Get().broker(), >>>> v8_flags.turboshaft_trace_reduction); >>>> // TODO(14108): Add more reducers as needed. >>>> OptimizationPhase<WasmLoweringReducer, >>>> >>>> MachineOptimizationReducerSignallingNanPossible>::Run(temp_zone); >>>> OptimizationPhase<ValueNumberingReducer>::Run(temp_zone); >>>> >>>> And this indeed cleans up the graph to remove dead code, which allows >>>> the backend to perform optimisations under CanCover: >>>> >>>> B5: AO#4 (no frame) instructions: [11, 12) >>>> predecessors: B4 >>>> 11: gap () () >>>> Arm64Cmp32 && branch if equal v0(R) #50 [rpo_immediate:7] >>>> [rpo_immediate:6] >>>> >>>> So, my final question :) how does one decide how to organize passes? >>>> There's always going to be an ordering problem, but what are the >>>> trade-offs >>>> to grouping/splitting reducers into OptimizationPhase(s)? >>>> >>>> Sam >>>> On Thursday, July 27, 2023 at 1:24:25 PM UTC+1 Sam Parker-Haynes wrote: >>>> >>>>> Ah, I think I was miss-reading the Arm64 instructions: >>>>> Arm64CompareAndBranch is cbz (?) so the control-flow would work out... >>>>> Too >>>>> many levels of conditionals for me! >>>>> On Thursday, July 27, 2023 at 11:34:02 AM UTC+1 Sam Parker-Haynes >>>>> wrote: >>>>> >>>>>> If I'm reading the IR correctly, the branch has already been >>>>>> rewritten to directly use #26, so I'm not expecting there to be two >>>>>> comparisons. The branch targets are the same, but the condition is being >>>>>> inverted (again) during ISel. I think the IR should be translated to >>>>>> this: >>>>>> >>>>>> v7(R) = Arm64Cmp32 && branch if equal v0(R) #50 [rpo_immediate:7] >>>>>> [rpo_immediate:6] >>>>>> >>>>>> And many thanks for the pipeline overview! I will continue playing. >>>>>> >>>>>> >>>>>> On Thursday, July 27, 2023 at 9:52:27 AM UTC+1 Nico Hartmann wrote: >>>>>> >>>>>>> Hi Sam, >>>>>>> >>>>>>> Briefly summarized, the Turboshaft pipeline is organized into phases >>>>>>> where each phase is defined by a stack of reducers (e.g. >>>>>>> `OptimizationPhase<SomeReducer, SomeOtherReducer, >>>>>>> AnotherReducer>::Run()`). >>>>>>> A Turboshaft graph is passed to each phase (the "input graph") and the >>>>>>> phase then iterates over the graph and passes each operation through >>>>>>> the >>>>>>> stack. At the bottom of the stack, a new operation (or a sequence of >>>>>>> operations) is generated into the phase's output (the "output graph"). >>>>>>> The >>>>>>> output graph is then the next phase's input. However, some phases run >>>>>>> analyses. Those always process the phase's input graph regardless of >>>>>>> where >>>>>>> they are in the stack. The `DeadCodeEliminationReducer` is one of these >>>>>>> that runs an analysis on the input graph and then skips emitting dead >>>>>>> operations during reduction. As a result, operations that became dead >>>>>>> during a previous reduction in the same phase (reducer stack), will not >>>>>>> be >>>>>>> removed. >>>>>>> >>>>>>> There are two ways to get rid of unused operations: >>>>>>> - When a phase iterates over the input graph, all operations that >>>>>>> don't have uses are just skipped. So once an operation that doesn't >>>>>>> have >>>>>>> any uses is emitted into the output graph, it will be removed (or >>>>>>> rather >>>>>>> ignored when generating the next graph) at the beginning of the next >>>>>>> phase. >>>>>>> - Using the `DeadCodeEliminationReducer`, which is a more powerful >>>>>>> liveness analysis that can do much more. It will run a full fixpoint >>>>>>> analysis on the input graph to find all live operations to remove all >>>>>>> dead >>>>>>> operations in one step. It will also eliminate unnecessary control flow >>>>>>> by >>>>>>> rewriting BranchOp and GotoOp targets and it will rewrite PhiOps and >>>>>>> such. >>>>>>> If you want to make use of this, it has to be in the following phase, >>>>>>> because - as described above - it always runs on the phase's input >>>>>>> graph. >>>>>>> However, this is a more expensive analysis+reducer, so I would not >>>>>>> recommend using it for the example you provided above but just rely on >>>>>>> the >>>>>>> next OptimizationPhase run to remove the unused operations. >>>>>>> >>>>>>> Regarding the ISel, I think the generated code looks okay. `v7(R) = >>>>>>> ...` is the comparison of `v0` with `50` (so that's basically `26: >>>>>>> Word32Equal(4, 25)`) and then we branch based on this result. Please >>>>>>> clarify what you think is wrong here. >>>>>>> >>>>>>> Nico >>>>>>> >>>>>>> On Wed, Jul 26, 2023 at 12:35 PM Sam Parker-Haynes <sam.p...@arm.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Nico, >>>>>>>> >>>>>>>> I've been having a poke around using a toy wasm example. I'm trying >>>>>>>> to understand how the wasm optimization pipeline works. I have this >>>>>>>> basic >>>>>>>> block: >>>>>>>> >>>>>>>> BLOCK B7 <- B4 >>>>>>>> >>>>>>>> 26: Constant()[word32: 50] >>>>>>>> >>>>>>>> 27: Equal(#3, #26)[Word32] >>>>>>>> >>>>>>>> 28: Constant()[word32: 0] >>>>>>>> >>>>>>>> 29: Equal(#27, #28)[Word32] >>>>>>>> >>>>>>>> 30: Branch(#29)[B10, B8, None] >>>>>>>> >>>>>>>> And I can see that MachineOptimizationReducer changes the branch >>>>>>>> (29) condition, but it doesn't remove (28) and (27): >>>>>>>> >>>>>>>> BLOCK B6 <- B4 >>>>>>>> 25: Constant()[word32: 50] >>>>>>>> 26: Equal(#3, #25)[Word32] >>>>>>>> 27: Constant()[word32: 0] >>>>>>>> 28: Equal(#26, #27)[Word32] >>>>>>>> 29: Branch(#26)[B7, B8, None] >>>>>>>> >>>>>>>> I've tried adding DeadCodeElimination, but it always seems to run >>>>>>>> *before* MachineOptimizationReducer, how can I get it to run >>>>>>>> afterwards? >>>>>>>> >>>>>>>> What I find somewhat worrying is that this IR is lowered to >>>>>>>> equivalent looking Turbofan IR but then ISel appears to be using >>>>>>>> Word32Equal (28) as the branch condition again: >>>>>>>> >>>>>>>> --- BLOCK B5 id7 <- B4 --- >>>>>>>> 24: IfFalse(22) >>>>>>>> 25: Int32Constant[50] >>>>>>>> 26: Word32Equal(4, 25) >>>>>>>> 27: Int32Constant[0] >>>>>>>> 28: Word32Equal(26, 27) >>>>>>>> 29: Branch[Unspecified, None](26) -> B7, B6 >>>>>>>> >>>>>>>> B5: AO#4 (no frame) instructions: [11, 13) >>>>>>>> predecessors: B4 >>>>>>>> 11: gap () () >>>>>>>> v7(R) = Arm64Cmp32 && set if equal v0(R) #50 >>>>>>>> 12: gap () () >>>>>>>> Arm64CompareAndBranch32 && branch if not equal v7(R) >>>>>>>> [rpo_immediate:7] [rpo_immediate:6] >>>>>>>> >>>>>>>> Any idea of what's going here? >>>>>>>> >>>>>>>> cheers! >>>>>>>> On Tuesday, July 25, 2023 at 10:53:01 AM UTC+1 Sam Parker-Haynes >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Great, thanks for the info, I'll take a look around. >>>>>>>>> >>>>>>>>> On Tuesday, July 25, 2023 at 10:47:07 AM UTC+1 Nico Hartmann wrote: >>>>>>>>> >>>>>>>>>> There is not a lot of (up to date) documentation. A very brief >>>>>>>>>> summary: >>>>>>>>>> >>>>>>>>>> The new IR's operations (which replace Turbofan's nodes) are >>>>>>>>>> defined in turboshaft/operations.h >>>>>>>>>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/operations.h> >>>>>>>>>> . >>>>>>>>>> The current state of the pipeline is as follows: We run >>>>>>>>>> Turbofan's frontend and then translate to the Turboshaft IR (in >>>>>>>>>> turboshaft/graph-builder.cc >>>>>>>>>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/graph-builder.cc>). >>>>>>>>>> >>>>>>>>>> Next, the Turboshaft optimization and lowering phases run on the CFG >>>>>>>>>> (see >>>>>>>>>> primarily pipeline.cc >>>>>>>>>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/pipeline.cc> >>>>>>>>>> and >>>>>>>>>> the turboshaft/*-phase.h included from there). Eventually, we >>>>>>>>>> translate the >>>>>>>>>> CFG back into Turbofan's sea-of-nodes IR (in >>>>>>>>>> turboshaft/recreate-schedule.cc >>>>>>>>>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/recreate-schedule.cc>) >>>>>>>>>> >>>>>>>>>> to run the backend. >>>>>>>>>> >>>>>>>>>> We are currently working on reimplementing the rest of >>>>>>>>>> Turbofan's phases to have a full Turboshaft CFG pipeline eventually, >>>>>>>>>> but we >>>>>>>>>> plan to ship a first version of Turboshaft soonish, which still only >>>>>>>>>> covers >>>>>>>>>> part of the pipeline and relies on the translation back and forth. >>>>>>>>>> >>>>>>>>>> On Tue, Jul 25, 2023 at 11:05 AM Sam Parker-Haynes < >>>>>>>>>> sam.p...@arm.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Nico, >>>>>>>>>>> >>>>>>>>>>> A CFG IR sounds good to me :) I've been hesitate to look as I >>>>>>>>>>> didn't know the status, is there any documentation? Is it currently >>>>>>>>>>> piggybacking off turbofan for some functionality? I can't >>>>>>>>>>> immediately see >>>>>>>>>>> where/how graph nodes are defined. >>>>>>>>>>> >>>>>>>>>>> cheers, >>>>>>>>>>> >>>>>>>>>>> On Tuesday, July 25, 2023 at 9:37:26 AM UTC+1 Nico Hartmann >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Sam, >>>>>>>>>>>> >>>>>>>>>>>> We are currently migrating Turbofan to a new CFG-based IR >>>>>>>>>>>> (Turboshaft). This could help. >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> Nico >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Jul 25, 2023 at 10:23 AM Sam Parker-Haynes < >>>>>>>>>>>> sam.p...@arm.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I've been wondering how to go about adding general support for >>>>>>>>>>>>> conditional operations in Turbofan, specifically conditional >>>>>>>>>>>>> compares for >>>>>>>>>>>>> AArch64. But I now see that Intel APX extensions also support >>>>>>>>>>>>> more >>>>>>>>>>>>> conditional instructions, beyond cmov. >>>>>>>>>>>>> >>>>>>>>>>>>> I'm not a huge fan of flag continuations, as it's been too >>>>>>>>>>>>> easy for me to introduce bugs while using them, so is there a >>>>>>>>>>>>> safer way to >>>>>>>>>>>>> approach this? >>>>>>>>>>>>> >>>>>>>>>>>>> Machine-level predication would seem to require a CFG in a >>>>>>>>>>>>> form that isn't really available until during/after >>>>>>>>>>>>> jump-threading. So a >>>>>>>>>>>>> few questions about jump threading: >>>>>>>>>>>>> - Could predication happen there? >>>>>>>>>>>>> - Why does jump threading run so late, why not before regalloc? >>>>>>>>>>>>> >>>>>>>>>>>>> And, in general, what is the policy on having optimisation >>>>>>>>>>>>> passes that run on machine code? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Sam >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> -- >>>>>>>>>>>>> v8-dev mailing list >>>>>>>>>>>>> v8-...@googlegroups.com >>>>>>>>>>>>> http://groups.google.com/group/v8-dev >>>>>>>>>>>>> --- >>>>>>>>>>>>> You received this message because you are subscribed to the >>>>>>>>>>>>> Google Groups "v8-dev" group. >>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from >>>>>>>>>>>>> it, send an email to v8-dev+un...@googlegroups.com. >>>>>>>>>>>>> To view this discussion on the web visit >>>>>>>>>>>>> https://groups.google.com/d/msgid/v8-dev/80169017-f248-4424-89ad-97260f28643dn%40googlegroups.com >>>>>>>>>>>>> >>>>>>>>>>>>> <https://groups.google.com/d/msgid/v8-dev/80169017-f248-4424-89ad-97260f28643dn%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>>>>>>>> . >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Nico Hartmann | Software Engineer | nicoha...@google.com | Chrome >>>>>>>>>>>> - V8 >>>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> -- >>>>>>>>>>> v8-dev mailing list >>>>>>>>>>> v8-...@googlegroups.com >>>>>>>>>>> http://groups.google.com/group/v8-dev >>>>>>>>>>> --- >>>>>>>>>>> You received this message because you are subscribed to the >>>>>>>>>>> Google Groups "v8-dev" group. >>>>>>>>>>> To unsubscribe from this group and stop receiving emails from >>>>>>>>>>> it, send an email to v8-dev+un...@googlegroups.com. >>>>>>>>>>> >>>>>>>>>> To view this discussion on the web visit >>>>>>>>>>> https://groups.google.com/d/msgid/v8-dev/f4cb3d5e-d703-4e0a-b65d-1fbacf1573a0n%40googlegroups.com >>>>>>>>>>> >>>>>>>>>>> <https://groups.google.com/d/msgid/v8-dev/f4cb3d5e-d703-4e0a-b65d-1fbacf1573a0n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>>>>>> . >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Nico Hartmann | Software Engineer | nicoha...@google.com | Chrome >>>>>>>>>> - V8 >>>>>>>>>> >>>>>>>>> -- >>>>>>>> -- >>>>>>>> v8-dev mailing list >>>>>>>> v8-...@googlegroups.com >>>>>>>> http://groups.google.com/group/v8-dev >>>>>>>> --- >>>>>>>> You received this message because you are subscribed to the Google >>>>>>>> Groups "v8-dev" group. >>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>> send an email to v8-dev+un...@googlegroups.com. >>>>>>>> >>>>>>> To view this discussion on the web visit >>>>>>>> https://groups.google.com/d/msgid/v8-dev/87824ff9-a7a0-4a22-b69b-44e686d07f1an%40googlegroups.com >>>>>>>> >>>>>>>> <https://groups.google.com/d/msgid/v8-dev/87824ff9-a7a0-4a22-b69b-44e686d07f1an%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>>> . >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nico Hartmann | Software Engineer | nicoha...@google.com | Chrome - >>>>>>> V8 >>>>>>> >>>>>> -- >>>> -- >>>> v8-dev mailing list >>>> v8-...@googlegroups.com >>>> http://groups.google.com/group/v8-dev >>>> --- >>>> You received this message because you are subscribed to the Google >>>> Groups "v8-dev" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to v8-dev+un...@googlegroups.com. >>>> >>> To view this discussion on the web visit >>>> https://groups.google.com/d/msgid/v8-dev/2565d8b5-4153-4d11-81f2-76b0bbcd223an%40googlegroups.com >>>> >>>> <https://groups.google.com/d/msgid/v8-dev/2565d8b5-4153-4d11-81f2-76b0bbcd223an%40googlegroups.com?utm_medium=email&utm_source=footer> >>>> . >>>> >>> >>> >>> -- >>> Nico Hartmann | Software Engineer | nicoha...@google.com | Chrome - V8 >>> >> -- >> -- >> v8-dev mailing list >> v8-...@googlegroups.com >> http://groups.google.com/group/v8-dev >> --- >> You received this message because you are subscribed to the Google Groups >> "v8-dev" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to v8-dev+un...@googlegroups.com. >> > To view this discussion on the web visit >> https://groups.google.com/d/msgid/v8-dev/7df09dff-8493-4e78-9b1c-071023ef6d05n%40googlegroups.com >> >> <https://groups.google.com/d/msgid/v8-dev/7df09dff-8493-4e78-9b1c-071023ef6d05n%40googlegroups.com?utm_medium=email&utm_source=footer> >> . >> > Darius Mercadier > > Software Engineer > > dmerc...@google.com > > Google Germany GmbH > > Erika-Mann-Straße 33 > > 80636 München > > Geschäftsführer: Paul Manicle, Liana Sebastian > > Registergericht und -nummer: Hamburg, HRB 86891 > > Sitz der Gesellschaft: Hamburg > > Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten > haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, > löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, > dass die E-Mail an die falsche Person gesendet wurde. > > > > This e-mail is confidential. If you received this communication by > mistake, please don't forward it to anyone else, please erase all copies > and attachments, and please let me know that it has gone to the wrong > person. > -- -- v8-dev mailing list v8-dev@googlegroups.com http://groups.google.com/group/v8-dev --- You received this message because you are subscribed to the Google Groups "v8-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/3b6693d5-5bf5-4681-a425-1e27e3f0a9dbn%40googlegroups.com.