Hi, Thanks both!
Looking again at the IR, I wonder if it is control related. This is the well-behaved IR, where both instances of the Int32AddWithOverflow have a control input. --- BLOCK B1145 id5053 <- B1143, B1144 --- 43: Merge(42, 13024) 53: Phi[kRepWord32](18, 13025, 43) 52: Phi[kRepWord32](17, 13021, 43) 51: EffectPhi(15, 13024, 43) 13034: ChangeInt32ToFloat64(53) 6670: Load[kRepTaggedPointer|kTypeAny](6114, 23562, 51, 43) 13026: Int32AddWithOverflow(53, 53, 43) 13027: Projection[1](13026, 43) 13028: Branch[Machine, False](13027, 43) -> B1147, B1146 --- BLOCK B1158 id5027 <- B1157, B1153 --- 13061: Merge(13054, 13044) 13063: Phi[kRepFloat64](13056, 13060, 13061) 13062: EffectPhi(13056, 78, 13061) 6672: Load[kRepTaggedPointer|kTypeAny](6111, 23562, 13062, 13061) 13068: Int32AddWithOverflow(53, 53, 13061) 13069: Projection[1](13068, 13061) 13070: Branch[Machine, False](13069, 13061) -> B1160, B1159 However, with Turboshaft, the blocks look like this: --- BLOCK B1145 id1146 <- B1143, B1144 --- 5473: Phi[kRepWord32](5449, 5470) 5474: Phi[kRepWord32](5450, 5466) 5475: Int64Constant[19] 5476: Load[kRepTaggedPointer|kTypeAny](28, 5475) 5477: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3) 5478: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32, sparse:^^](5474, 5473) 5479: TypedStateValues[, sparse:..] 5480: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny, kRepTagged|kTypeAny, kRepTagged|kTypeAny, kRepTagged|kTypeAny, sparse:^^^^^...](5478, 5479, 27, 26, 25) 5481: TypedStateValues[, sparse:.] 5482: FrameState[UNOPTIMIZED_FRAME, 59, PokeAt(0), 0x191e0015eab5 <SharedFunctionInfo test_div>](5477, 5480, 5481, 2, 8, 0) 5483: Int32AddWithOverflow(5473, 5473) 5484: Projection[0](5483) 5485: Projection[1](5483) 5486: Branch[Unspecified, False](5485) -> B1147, B1146 --- BLOCK B1158 id1159 <- B1153, B1157 --- 5549: Phi[kRepFloat64](5534, 5548) 5550: Float64RoundDown(5549) 5551: Int64Constant[19] 5552: Load[kRepTaggedPointer|kTypeAny](27, 5551) 5553: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3) 5554: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32, sparse:^^](5474, 5473) 5555: TypedStateValues[, sparse:..] 5556: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny, sparse:^^......](5554, 5555) 5557: TypedStateValues[, sparse:.] 5558: FrameState[UNOPTIMIZED_FRAME, 92, PokeAt(0), 0x191e0015eab5 <SharedFunctionInfo test_div>](5553, 5556, 5557, 2, 8, 0) 5559: Branch[Unspecified, False](5485) -> B1160, B1159 I have seen that Turboshaft has an EmitProjectionReducer (https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/assembler.h;drc=f5a7962861b208e9cf82e61c1fa9f8dc0d216a87;l=606) and seems to be closely related to ValueNumbering. Without control inputs, I can't see why ValueNumbering, or something similar, wouldn't remove the second instance. > It could be an artifact of scheduling I will have a look at the scheduler. > However, this should be prevented by this check Yes, it's that check that I'm trying to remove :) and this works for everything but projections. Surely we should be able to combine an operation even if it has multiple users? Regards, Sam On Wednesday, November 8, 2023 at 5:20:37 PM UTC dmerc...@google.com wrote: > Hi, > > > With a turbofan-only flow, the overflow operation and the projection > will be duplicated for the second branch, and everything is fine. > > I'm not sure which phase could duplicate a Int32AddWithOverflow. I don't > think that the Instruction Selector does this. It could be an artifact of > scheduling (which happens before effect control linearization or before > instruction selection). I'm not sure it's a conscious optimization though, > or just a side-effect of the scheduler. > > > Is it legal for all projected values to be live across basic blocks? If > so, who should be responsible for ensuring values are live where they need > to be? > > Yes it is legal (and probably happens fairly often). As to who should > ensure that the values are live... It sounds like here the bug is in the > Instruction Selector. What probably happens is that when emitting the 1st > branch, we decide that it can cover the Projection[1] and don't need to > emit it, which is wrong. However, this should be prevented by this check > https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/backend/x64/instruction-selector-x64.cc;l=4278: > > when user=the_first_branch and value=projection[1], CanCover should return > false, because the projection has another use. > > If you can provide an example that we can reproduce, I'd be happy to have > a look, but without being able to reproduce, we're just speculating :/ > > > I don't understand why it has a control input > > Projections don't really need control inputs, but I think that having > control inputs in projections is somewhat useful for scheduling in > Turbofan. We trying removing the control input ( > https://crrev.com/c/4570503), and performance regressed in some cases. > > Cheers, > Darius > > On Wednesday, November 8, 2023 at 6:02:40 PM UTC+1 Matthias Liedtke wrote: > >> Hi Sam, >> >> Honestly I don't feel knowledgeable enough to answer this question, so I >> hope some Turbofan experts can chime in. >> This line >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/common-operator.cc;l=1622> >> defines >> that the Projection in Turbofan has one value input, a control input and a >> value output but no control output. >> I don't understand why it has a control input or what it exactly means >> that an operation has a control input vs. let's say a Select operation that >> doesn't have one. >> Still, I'd assume that the Branch in B4 should have a control input that >> at some point leads to the same control input as "2: Projection [1]" and >> therefore I'd assume that it's valid as the control input of the projection >> dominates the branch that uses it. >> Looking at the schedule, as the Projection is in B0, it is guaranteed to >> be "executed" before B4, so the virtual register should be defined and >> should be "available" / usable by all instructions following itself in B0 >> and in all instructions in blocks that are reachable from B0 directly or >> indirectly. >> I don't think there is anything limiting the liveness of a virtual >> register, so as long as a block A dominates another Block B, all virtual >> registers defined in A are usable in B. >> So your snippet looks totally fine based on my (limited) understanding of >> Turbofan. >> >> However, I'd assume it should be >> 1: Projection[0] (0) >> 2: Projection[1] (0) >> and not >> 1: Projection[0] (0) >> 2: Projection[1] (1) >> (i.e. both projections should have Int32AddWithOverflow as an input) >> >> Best regards, >> Matthias >> >> >> On Wed, Nov 8, 2023 at 5:24 PM Sam Parker-Haynes <sam.p...@arm.com> >> wrote: >> >>> Hi Matthias, >>> >>> My RecreateSchedule example looks like this: >>> >>> B0: >>> 0: Int32AddWithOverflow >>> 1: Projection[0] (0) >>> 2: Projection[1] (1) >>> Branch (2) >>> ... >>> >>> B4: >>> Branch (2) <-- how should this branch receive (2) >>> >>> I know I said it looks okay at the IR level, but since the IR isn't >>> documented that may not be true ;) So, what are the semantics of a >>> Projection..? >>> - Is the Projection[1] of an Overflow operation a part of the control >>> chain? >>> - Is it legal for all projected values to be live across basic blocks? >>> If so, who should be responsible for ensuring values are live where they >>> need to be? >>> >>> Thanks, >>> Sam >>> >>> On Wednesday, November 8, 2023 at 2:15:58 PM UTC Sam Parker-Haynes wrote: >>> >>> Also, are any instruction selection tests being added for Turboshaft? >>> Something like the equivalent of >>> test/unittests/compiler/arm64/instruction-selector-arm64-unittest.cc? >>> >>> On Wednesday, November 8, 2023 at 1:26:12 PM UTC Sam Parker-Haynes wrote: >>> >>> EDIT: only passes with --no-turboshaft >>> >>> On Wednesday, November 8, 2023 at 1:20:03 PM UTC Sam Parker-Haynes wrote: >>> >>> Hi Matthias, >>> >>> Ah, I hadn't seen that, so yes I've been tweaking the OG selector >>> function for arm64. On the x64 side of things, everything is fine with >>> 'pure' turboshaft pipeline but, as with my arm64 checkout, the issue arises >>> when modifying the turbofan selector function and using ---no-turboshaft. >>> >>> So, I guess the problem is with the RecreateSchedule? >>> >>> Thanks, >>> Sam >>> >>> On Wednesday, November 8, 2023 at 1:12:56 PM UTC mlie...@google.com >>> wrote: >>> >>> Hi Sam, >>> >>> I don't have any insights on x64 for VisitWordCompareZero but as a quick >>> note, the TurboshaftAdapter instruction selection implementation for >>> VisitWordCompareZero for arm64 was just added yesterday by me ( >>> https://chromium-review.googlesource.com/c/v8/v8/+/5002003). >>> >>> So, just to clarify, I assume you are talking about the Turbofan {Node* >>> / TurbofanAdapter} based instruction selection with Turboshaft >>> optimizations followed by a RecreateSchedule? >>> If you do changes for the TurbofanAdapter instruction selection on arm64 >>> for code that already has an implementation for the TurboshaftAdapter, >>> could you add a TODO(mliedtke) on the Turboshaft implementation with some >>> nice description and CC me on the gerrit changes, so these don't get lost >>> when we switch over to the TurboshaftAdapter? >>> >>> Thanks and best regards, >>> Matthias >>> >>> On Wed, Nov 8, 2023 at 1:54 PM Sam Parker-Haynes <sam.p...@arm.com> >>> wrote: >>> >>> Hi, >>> >>> I've been investigating rematerialising flag setting instructions in >>> VisitWordCompareZero, just by removing the CanCover check, and something is >>> going wrong with turboshaft. >>> >>> I have an arithmetic overflow operation, Int32AddWithOverflow, and the >>> overflow check is used by two branch operations. With a turbofan-only flow, >>> the overflow operation and the projection will be duplicated for the second >>> branch, and everything is fine. But with turboshaft, neither the overflow >>> operation or projection are duplicated and the second branch uses the >>> original projection[1] value. This looks fine at the IR level, but ends up >>> broken after isel when the register allocator comes across a virtual >>> register without a definition. This happens for both x64 and arm64, so I'm >>> assuming turboshaft is making some assumptions, based on the current >>> behaviour, that are non-obvious to me. >>> >>> Any ideas? >>> >>> Thanks! >>> Sam >>> >>> >>> >>> -- >>> -- >>> v8-dev mailing list >>> v8-...@googlegroups.com >>> http://groups.google.com/group/v8-dev >>> --- >>> You received this message because you are subscribed to the Google >>> Groups "v8-dev" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to v8-dev+un...@googlegroups.com. >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/v8-dev/175a4ba2-9d33-45bb-84e3-91d41a722846n%40googlegroups.com >>> >>> <https://groups.google.com/d/msgid/v8-dev/175a4ba2-9d33-45bb-84e3-91d41a722846n%40googlegroups.com?utm_medium=email&utm_source=footer> >>> . >>> >>> -- >>> -- >>> v8-dev mailing list >>> v8-...@googlegroups.com >>> http://groups.google.com/group/v8-dev >>> --- >>> You received this message because you are subscribed to the Google >>> Groups "v8-dev" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to v8-dev+un...@googlegroups.com. >>> >> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/v8-dev/46693e6c-1216-493b-a104-2c9d41075987n%40googlegroups.com >>> >>> <https://groups.google.com/d/msgid/v8-dev/46693e6c-1216-493b-a104-2c9d41075987n%40googlegroups.com?utm_medium=email&utm_source=footer> >>> . >>> >> -- -- v8-dev mailing list v8-dev@googlegroups.com http://groups.google.com/group/v8-dev --- You received this message because you are subscribed to the Google Groups "v8-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/d56a5e3a-90b9-4e49-bf93-3751d7b16db9n%40googlegroups.com.