Right, it is now working as expected. I've broken the WasmOptimizePhase
into two:
void WasmOptimizePhase::Run(Zone* temp_zone) {
UnparkedScopeIfNeeded scope(PipelineData::Get().broker(),
v8_flags.turboshaft_trace_reduction);
// TODO(14108): Add more reducers as needed.
OptimizationPhase<WasmLoweringReducer,
MachineOptimizationReducerSignallingNanPossible>::Run(temp_zone);
OptimizationPhase<ValueNumberingReducer>::Run(temp_zone);
And this indeed cleans up the graph to remove dead code, which allows the
backend to perform optimisations under CanCover:
B5: AO#4 (no frame) instructions: [11, 12)
predecessors: B4
11: gap () ()
Arm64Cmp32 && branch if equal v0(R) #50 [rpo_immediate:7]
[rpo_immediate:6]
So, my final question :) how does one decide how to organize passes?
There's always going to be an ordering problem, but what are the trade-offs
to grouping/splitting reducers into OptimizationPhase(s)?
Sam
On Thursday, July 27, 2023 at 1:24:25 PM UTC+1 Sam Parker-Haynes wrote:
> Ah, I think I was miss-reading the Arm64 instructions:
> Arm64CompareAndBranch is cbz (?) so the control-flow would work out... Too
> many levels of conditionals for me!
> On Thursday, July 27, 2023 at 11:34:02 AM UTC+1 Sam Parker-Haynes wrote:
>
>> If I'm reading the IR correctly, the branch has already been rewritten to
>> directly use #26, so I'm not expecting there to be two comparisons. The
>> branch targets are the same, but the condition is being inverted (again)
>> during ISel. I think the IR should be translated to this:
>>
>> v7(R) = Arm64Cmp32 && branch if equal v0(R) #50 [rpo_immediate:7]
>> [rpo_immediate:6]
>>
>> And many thanks for the pipeline overview! I will continue playing.
>>
>>
>> On Thursday, July 27, 2023 at 9:52:27 AM UTC+1 Nico Hartmann wrote:
>>
>>> Hi Sam,
>>>
>>> Briefly summarized, the Turboshaft pipeline is organized into phases
>>> where each phase is defined by a stack of reducers (e.g.
>>> `OptimizationPhase<SomeReducer, SomeOtherReducer, AnotherReducer>::Run()`).
>>> A Turboshaft graph is passed to each phase (the "input graph") and the
>>> phase then iterates over the graph and passes each operation through the
>>> stack. At the bottom of the stack, a new operation (or a sequence of
>>> operations) is generated into the phase's output (the "output graph"). The
>>> output graph is then the next phase's input. However, some phases run
>>> analyses. Those always process the phase's input graph regardless of where
>>> they are in the stack. The `DeadCodeEliminationReducer` is one of these
>>> that runs an analysis on the input graph and then skips emitting dead
>>> operations during reduction. As a result, operations that became dead
>>> during a previous reduction in the same phase (reducer stack), will not be
>>> removed.
>>>
>>> There are two ways to get rid of unused operations:
>>> - When a phase iterates over the input graph, all operations that don't
>>> have uses are just skipped. So once an operation that doesn't have any uses
>>> is emitted into the output graph, it will be removed (or rather ignored
>>> when generating the next graph) at the beginning of the next phase.
>>> - Using the `DeadCodeEliminationReducer`, which is a more powerful
>>> liveness analysis that can do much more. It will run a full fixpoint
>>> analysis on the input graph to find all live operations to remove all dead
>>> operations in one step. It will also eliminate unnecessary control flow by
>>> rewriting BranchOp and GotoOp targets and it will rewrite PhiOps and such.
>>> If you want to make use of this, it has to be in the following phase,
>>> because - as described above - it always runs on the phase's input graph.
>>> However, this is a more expensive analysis+reducer, so I would not
>>> recommend using it for the example you provided above but just rely on the
>>> next OptimizationPhase run to remove the unused operations.
>>>
>>> Regarding the ISel, I think the generated code looks okay. `v7(R) = ...`
>>> is the comparison of `v0` with `50` (so that's basically `26:
>>> Word32Equal(4, 25)`) and then we branch based on this result. Please
>>> clarify what you think is wrong here.
>>>
>>> Nico
>>>
>>> On Wed, Jul 26, 2023 at 12:35 PM Sam Parker-Haynes <[email protected]>
>>> wrote:
>>>
>>>> Hi Nico,
>>>>
>>>> I've been having a poke around using a toy wasm example. I'm trying to
>>>> understand how the wasm optimization pipeline works. I have this basic
>>>> block:
>>>>
>>>> BLOCK B7 <- B4
>>>>
>>>> 26: Constant()[word32: 50]
>>>>
>>>> 27: Equal(#3, #26)[Word32]
>>>>
>>>> 28: Constant()[word32: 0]
>>>>
>>>> 29: Equal(#27, #28)[Word32]
>>>>
>>>> 30: Branch(#29)[B10, B8, None]
>>>>
>>>> And I can see that MachineOptimizationReducer changes the branch (29)
>>>> condition, but it doesn't remove (28) and (27):
>>>>
>>>> BLOCK B6 <- B4
>>>> 25: Constant()[word32: 50]
>>>> 26: Equal(#3, #25)[Word32]
>>>> 27: Constant()[word32: 0]
>>>> 28: Equal(#26, #27)[Word32]
>>>> 29: Branch(#26)[B7, B8, None]
>>>>
>>>> I've tried adding DeadCodeElimination, but it always seems to run
>>>> *before* MachineOptimizationReducer, how can I get it to run
>>>> afterwards?
>>>>
>>>> What I find somewhat worrying is that this IR is lowered to equivalent
>>>> looking Turbofan IR but then ISel appears to be using Word32Equal (28) as
>>>> the branch condition again:
>>>>
>>>> --- BLOCK B5 id7 <- B4 ---
>>>> 24: IfFalse(22)
>>>> 25: Int32Constant[50]
>>>> 26: Word32Equal(4, 25)
>>>> 27: Int32Constant[0]
>>>> 28: Word32Equal(26, 27)
>>>> 29: Branch[Unspecified, None](26) -> B7, B6
>>>>
>>>> B5: AO#4 (no frame) instructions: [11, 13)
>>>> predecessors: B4
>>>> 11: gap () ()
>>>> v7(R) = Arm64Cmp32 && set if equal v0(R) #50
>>>> 12: gap () ()
>>>> Arm64CompareAndBranch32 && branch if not equal v7(R)
>>>> [rpo_immediate:7] [rpo_immediate:6]
>>>>
>>>> Any idea of what's going here?
>>>>
>>>> cheers!
>>>> On Tuesday, July 25, 2023 at 10:53:01 AM UTC+1 Sam Parker-Haynes wrote:
>>>>
>>>>> Great, thanks for the info, I'll take a look around.
>>>>>
>>>>> On Tuesday, July 25, 2023 at 10:47:07 AM UTC+1 Nico Hartmann wrote:
>>>>>
>>>>>> There is not a lot of (up to date) documentation. A very brief
>>>>>> summary:
>>>>>>
>>>>>> The new IR's operations (which replace Turbofan's nodes) are defined
>>>>>> in turboshaft/operations.h
>>>>>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/operations.h>
>>>>>> .
>>>>>> The current state of the pipeline is as follows: We run
>>>>>> Turbofan's frontend and then translate to the Turboshaft IR (in
>>>>>> turboshaft/graph-builder.cc
>>>>>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/graph-builder.cc>).
>>>>>>
>>>>>> Next, the Turboshaft optimization and lowering phases run on the CFG
>>>>>> (see
>>>>>> primarily pipeline.cc
>>>>>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/pipeline.cc>
>>>>>> and
>>>>>> the turboshaft/*-phase.h included from there). Eventually, we translate
>>>>>> the
>>>>>> CFG back into Turbofan's sea-of-nodes IR (in
>>>>>> turboshaft/recreate-schedule.cc
>>>>>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/recreate-schedule.cc>)
>>>>>>
>>>>>> to run the backend.
>>>>>>
>>>>>> We are currently working on reimplementing the rest of
>>>>>> Turbofan's phases to have a full Turboshaft CFG pipeline eventually, but
>>>>>> we
>>>>>> plan to ship a first version of Turboshaft soonish, which still only
>>>>>> covers
>>>>>> part of the pipeline and relies on the translation back and forth.
>>>>>>
>>>>>> On Tue, Jul 25, 2023 at 11:05 AM Sam Parker-Haynes <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Nico,
>>>>>>>
>>>>>>> A CFG IR sounds good to me :) I've been hesitate to look as I didn't
>>>>>>> know the status, is there any documentation? Is it currently
>>>>>>> piggybacking
>>>>>>> off turbofan for some functionality? I can't immediately see where/how
>>>>>>> graph nodes are defined.
>>>>>>>
>>>>>>> cheers,
>>>>>>>
>>>>>>> On Tuesday, July 25, 2023 at 9:37:26 AM UTC+1 Nico Hartmann wrote:
>>>>>>>
>>>>>>>> Hi Sam,
>>>>>>>>
>>>>>>>> We are currently migrating Turbofan to a new CFG-based IR
>>>>>>>> (Turboshaft). This could help.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Nico
>>>>>>>>
>>>>>>>> On Tue, Jul 25, 2023 at 10:23 AM Sam Parker-Haynes <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I've been wondering how to go about adding general support for
>>>>>>>>> conditional operations in Turbofan, specifically conditional compares
>>>>>>>>> for
>>>>>>>>> AArch64. But I now see that Intel APX extensions also support more
>>>>>>>>> conditional instructions, beyond cmov.
>>>>>>>>>
>>>>>>>>> I'm not a huge fan of flag continuations, as it's been too easy
>>>>>>>>> for me to introduce bugs while using them, so is there a safer way to
>>>>>>>>> approach this?
>>>>>>>>>
>>>>>>>>> Machine-level predication would seem to require a CFG in a form
>>>>>>>>> that isn't really available until during/after jump-threading. So a
>>>>>>>>> few
>>>>>>>>> questions about jump threading:
>>>>>>>>> - Could predication happen there?
>>>>>>>>> - Why does jump threading run so late, why not before regalloc?
>>>>>>>>>
>>>>>>>>> And, in general, what is the policy on having optimisation passes
>>>>>>>>> that run on machine code?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sam
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> --
>>>>>>>>> v8-dev mailing list
>>>>>>>>> [email protected]
>>>>>>>>> http://groups.google.com/group/v8-dev
>>>>>>>>> ---
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "v8-dev" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/v8-dev/80169017-f248-4424-89ad-97260f28643dn%40googlegroups.com
>>>>>>>>>
>>>>>>>>> <https://groups.google.com/d/msgid/v8-dev/80169017-f248-4424-89ad-97260f28643dn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Nico Hartmann | Software Engineer | [email protected] | Chrome
>>>>>>>> - V8
>>>>>>>>
>>>>>>> --
>>>>>>> --
>>>>>>> v8-dev mailing list
>>>>>>> [email protected]
>>>>>>> http://groups.google.com/group/v8-dev
>>>>>>> ---
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "v8-dev" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>>
>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/v8-dev/f4cb3d5e-d703-4e0a-b65d-1fbacf1573a0n%40googlegroups.com
>>>>>>>
>>>>>>> <https://groups.google.com/d/msgid/v8-dev/f4cb3d5e-d703-4e0a-b65d-1fbacf1573a0n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Nico Hartmann | Software Engineer | [email protected] | Chrome -
>>>>>> V8
>>>>>>
>>>>> --
>>>> --
>>>> v8-dev mailing list
>>>> [email protected]
>>>> http://groups.google.com/group/v8-dev
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "v8-dev" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>>
>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/v8-dev/87824ff9-a7a0-4a22-b69b-44e686d07f1an%40googlegroups.com
>>>>
>>>> <https://groups.google.com/d/msgid/v8-dev/87824ff9-a7a0-4a22-b69b-44e686d07f1an%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>
>>>
>>> --
>>> Nico Hartmann | Software Engineer | [email protected] | Chrome - V8
>>>
>>
--
--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups
"v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/v8-dev/2565d8b5-4153-4d11-81f2-76b0bbcd223an%40googlegroups.com.