Hi,
Thanks both!
Looking again at the IR, I wonder if it is control related. This is the
well-behaved IR, where both instances of the Int32AddWithOverflow have a
control input.
--- BLOCK B1145 id5053 <- B1143, B1144 ---
43: Merge(42, 13024)
53: Phi[kRepWord32](18, 13025, 43)
52: Phi[kRepWord32](17, 13021, 43)
51: EffectPhi(15, 13024, 43)
13034: ChangeInt32ToFloat64(53)
6670: Load[kRepTaggedPointer|kTypeAny](6114, 23562, 51, 43)
13026: Int32AddWithOverflow(53, 53, 43)
13027: Projection[1](13026, 43)
13028: Branch[Machine, False](13027, 43) -> B1147, B1146
--- BLOCK B1158 id5027 <- B1157, B1153 ---
13061: Merge(13054, 13044)
13063: Phi[kRepFloat64](13056, 13060, 13061)
13062: EffectPhi(13056, 78, 13061)
6672: Load[kRepTaggedPointer|kTypeAny](6111, 23562, 13062, 13061)
13068: Int32AddWithOverflow(53, 53, 13061)
13069: Projection[1](13068, 13061)
13070: Branch[Machine, False](13069, 13061) -> B1160, B1159
However, with Turboshaft, the blocks look like this:
--- BLOCK B1145 id1146 <- B1143, B1144 ---
5473: Phi[kRepWord32](5449, 5470)
5474: Phi[kRepWord32](5450, 5466)
5475: Int64Constant[19]
5476: Load[kRepTaggedPointer|kTypeAny](28, 5475)
5477: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3)
5478: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32,
sparse:^^](5474, 5473)
5479: TypedStateValues[, sparse:..]
5480: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny,
kRepTagged|kTypeAny, kRepTagged|kTypeAny, kRepTagged|kTypeAny,
sparse:^^^^^...](5478, 5479, 27, 26, 25)
5481: TypedStateValues[, sparse:.]
5482: FrameState[UNOPTIMIZED_FRAME, 59, PokeAt(0), 0x191e0015eab5
<SharedFunctionInfo test_div>](5477, 5480, 5481, 2, 8, 0)
5483: Int32AddWithOverflow(5473, 5473)
5484: Projection[0](5483)
5485: Projection[1](5483)
5486: Branch[Unspecified, False](5485) -> B1147, B1146
--- BLOCK B1158 id1159 <- B1153, B1157 ---
5549: Phi[kRepFloat64](5534, 5548)
5550: Float64RoundDown(5549)
5551: Int64Constant[19]
5552: Load[kRepTaggedPointer|kTypeAny](27, 5551)
5553: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3)
5554: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32,
sparse:^^](5474, 5473)
5555: TypedStateValues[, sparse:..]
5556: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny,
sparse:^^......](5554, 5555)
5557: TypedStateValues[, sparse:.]
5558: FrameState[UNOPTIMIZED_FRAME, 92, PokeAt(0), 0x191e0015eab5
<SharedFunctionInfo test_div>](5553, 5556, 5557, 2, 8, 0)
5559: Branch[Unspecified, False](5485) -> B1160, B1159
I have seen that Turboshaft has an EmitProjectionReducer
(https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/assembler.h;drc=f5a7962861b208e9cf82e61c1fa9f8dc0d216a87;l=606)
and seems to be closely related to ValueNumbering. Without control inputs,
I can't see why ValueNumbering, or something similar, wouldn't remove the
second instance.
> It could be an artifact of scheduling
I will have a look at the scheduler.
> However, this should be prevented by this check
Yes, it's that check that I'm trying to remove :) and this works for
everything but projections. Surely we should be able to combine an
operation even if it has multiple users?
Regards,
Sam
On Wednesday, November 8, 2023 at 5:20:37 PM UTC [email protected] wrote:
> Hi,
>
> > With a turbofan-only flow, the overflow operation and the projection
> will be duplicated for the second branch, and everything is fine.
>
> I'm not sure which phase could duplicate a Int32AddWithOverflow. I don't
> think that the Instruction Selector does this. It could be an artifact of
> scheduling (which happens before effect control linearization or before
> instruction selection). I'm not sure it's a conscious optimization though,
> or just a side-effect of the scheduler.
>
> > Is it legal for all projected values to be live across basic blocks? If
> so, who should be responsible for ensuring values are live where they need
> to be?
>
> Yes it is legal (and probably happens fairly often). As to who should
> ensure that the values are live... It sounds like here the bug is in the
> Instruction Selector. What probably happens is that when emitting the 1st
> branch, we decide that it can cover the Projection[1] and don't need to
> emit it, which is wrong. However, this should be prevented by this check
> https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/backend/x64/instruction-selector-x64.cc;l=4278:
>
> when user=the_first_branch and value=projection[1], CanCover should return
> false, because the projection has another use.
>
> If you can provide an example that we can reproduce, I'd be happy to have
> a look, but without being able to reproduce, we're just speculating :/
>
> > I don't understand why it has a control input
>
> Projections don't really need control inputs, but I think that having
> control inputs in projections is somewhat useful for scheduling in
> Turbofan. We trying removing the control input (
> https://crrev.com/c/4570503), and performance regressed in some cases.
>
> Cheers,
> Darius
>
> On Wednesday, November 8, 2023 at 6:02:40 PM UTC+1 Matthias Liedtke wrote:
>
>> Hi Sam,
>>
>> Honestly I don't feel knowledgeable enough to answer this question, so I
>> hope some Turbofan experts can chime in.
>> This line
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/common-operator.cc;l=1622>
>> defines
>> that the Projection in Turbofan has one value input, a control input and a
>> value output but no control output.
>> I don't understand why it has a control input or what it exactly means
>> that an operation has a control input vs. let's say a Select operation that
>> doesn't have one.
>> Still, I'd assume that the Branch in B4 should have a control input that
>> at some point leads to the same control input as "2: Projection [1]" and
>> therefore I'd assume that it's valid as the control input of the projection
>> dominates the branch that uses it.
>> Looking at the schedule, as the Projection is in B0, it is guaranteed to
>> be "executed" before B4, so the virtual register should be defined and
>> should be "available" / usable by all instructions following itself in B0
>> and in all instructions in blocks that are reachable from B0 directly or
>> indirectly.
>> I don't think there is anything limiting the liveness of a virtual
>> register, so as long as a block A dominates another Block B, all virtual
>> registers defined in A are usable in B.
>> So your snippet looks totally fine based on my (limited) understanding of
>> Turbofan.
>>
>> However, I'd assume it should be
>> 1: Projection[0] (0)
>> 2: Projection[1] (0)
>> and not
>> 1: Projection[0] (0)
>> 2: Projection[1] (1)
>> (i.e. both projections should have Int32AddWithOverflow as an input)
>>
>> Best regards,
>> Matthias
>>
>>
>> On Wed, Nov 8, 2023 at 5:24 PM Sam Parker-Haynes <[email protected]>
>> wrote:
>>
>>> Hi Matthias,
>>>
>>> My RecreateSchedule example looks like this:
>>>
>>> B0:
>>> 0: Int32AddWithOverflow
>>> 1: Projection[0] (0)
>>> 2: Projection[1] (1)
>>> Branch (2)
>>> ...
>>>
>>> B4:
>>> Branch (2) <-- how should this branch receive (2)
>>>
>>> I know I said it looks okay at the IR level, but since the IR isn't
>>> documented that may not be true ;) So, what are the semantics of a
>>> Projection..?
>>> - Is the Projection[1] of an Overflow operation a part of the control
>>> chain?
>>> - Is it legal for all projected values to be live across basic blocks?
>>> If so, who should be responsible for ensuring values are live where they
>>> need to be?
>>>
>>> Thanks,
>>> Sam
>>>
>>> On Wednesday, November 8, 2023 at 2:15:58 PM UTC Sam Parker-Haynes wrote:
>>>
>>> Also, are any instruction selection tests being added for Turboshaft?
>>> Something like the equivalent of
>>> test/unittests/compiler/arm64/instruction-selector-arm64-unittest.cc?
>>>
>>> On Wednesday, November 8, 2023 at 1:26:12 PM UTC Sam Parker-Haynes wrote:
>>>
>>> EDIT: only passes with --no-turboshaft
>>>
>>> On Wednesday, November 8, 2023 at 1:20:03 PM UTC Sam Parker-Haynes wrote:
>>>
>>> Hi Matthias,
>>>
>>> Ah, I hadn't seen that, so yes I've been tweaking the OG selector
>>> function for arm64. On the x64 side of things, everything is fine with
>>> 'pure' turboshaft pipeline but, as with my arm64 checkout, the issue arises
>>> when modifying the turbofan selector function and using ---no-turboshaft.
>>>
>>> So, I guess the problem is with the RecreateSchedule?
>>>
>>> Thanks,
>>> Sam
>>>
>>> On Wednesday, November 8, 2023 at 1:12:56 PM UTC [email protected]
>>> wrote:
>>>
>>> Hi Sam,
>>>
>>> I don't have any insights on x64 for VisitWordCompareZero but as a quick
>>> note, the TurboshaftAdapter instruction selection implementation for
>>> VisitWordCompareZero for arm64 was just added yesterday by me (
>>> https://chromium-review.googlesource.com/c/v8/v8/+/5002003).
>>>
>>> So, just to clarify, I assume you are talking about the Turbofan {Node*
>>> / TurbofanAdapter} based instruction selection with Turboshaft
>>> optimizations followed by a RecreateSchedule?
>>> If you do changes for the TurbofanAdapter instruction selection on arm64
>>> for code that already has an implementation for the TurboshaftAdapter,
>>> could you add a TODO(mliedtke) on the Turboshaft implementation with some
>>> nice description and CC me on the gerrit changes, so these don't get lost
>>> when we switch over to the TurboshaftAdapter?
>>>
>>> Thanks and best regards,
>>> Matthias
>>>
>>> On Wed, Nov 8, 2023 at 1:54 PM Sam Parker-Haynes <[email protected]>
>>> wrote:
>>>
>>> Hi,
>>>
>>> I've been investigating rematerialising flag setting instructions in
>>> VisitWordCompareZero, just by removing the CanCover check, and something is
>>> going wrong with turboshaft.
>>>
>>> I have an arithmetic overflow operation, Int32AddWithOverflow, and the
>>> overflow check is used by two branch operations. With a turbofan-only flow,
>>> the overflow operation and the projection will be duplicated for the second
>>> branch, and everything is fine. But with turboshaft, neither the overflow
>>> operation or projection are duplicated and the second branch uses the
>>> original projection[1] value. This looks fine at the IR level, but ends up
>>> broken after isel when the register allocator comes across a virtual
>>> register without a definition. This happens for both x64 and arm64, so I'm
>>> assuming turboshaft is making some assumptions, based on the current
>>> behaviour, that are non-obvious to me.
>>>
>>> Any ideas?
>>>
>>> Thanks!
>>> Sam
>>>
>>>
>>>
>>> --
>>> --
>>> v8-dev mailing list
>>> [email protected]
>>> http://groups.google.com/group/v8-dev
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "v8-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/v8-dev/175a4ba2-9d33-45bb-84e3-91d41a722846n%40googlegroups.com
>>>
>>> <https://groups.google.com/d/msgid/v8-dev/175a4ba2-9d33-45bb-84e3-91d41a722846n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>> --
>>> --
>>> v8-dev mailing list
>>> [email protected]
>>> http://groups.google.com/group/v8-dev
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "v8-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/v8-dev/46693e6c-1216-493b-a104-2c9d41075987n%40googlegroups.com
>>>
>>> <https://groups.google.com/d/msgid/v8-dev/46693e6c-1216-493b-a104-2c9d41075987n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
--
--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups
"v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/v8-dev/d56a5e3a-90b9-4e49-bf93-3751d7b16db9n%40googlegroups.com.