Aaaaaaah, thank you! Well, now it seems obvious :) I think, for now, I will leave the continuation alone and special-case projections to still require CanCover.
My question remains around testing. I can currently test specific isel patterns for Turbofan, is there anything like that for Turboshaft? Thanks again, Sam On Thursday, November 9, 2023 at 12:00:07 PM UTC dmerc...@google.com wrote: > Arf sorry, I read a bit too quickly your initial message and missed that > you were trying to remove a CanCover in VisitWordCompareZero. > > > Surely we should be able to combine an operation even if it has multiple > users? > > Yes, we should (but we don't). We've been thinking about relaxing some > CanCovers for this purpose for a while, but haven't had time to get to it > just yet. It's not surprising that it currently doesn't work, but it would > be nice to make it work. > > > So could Overflow operations and/or Projections[1] be special-cased..? > And why is there a difference here with a normal comparison? > > First, I think that very few operations besides Overflow binops have > projections. Only Calls come to mind, and they are fairly different with > regard to VisitWordCompareZero and CanCover (you probably never want to > cover a Call). So, Overflow binops are different in the sense that they > have 2 outputs and in particular an overflow output. > > Then, for a more useful explanation of what it going on: > In VisitWordCompareZero, the continuation is a ForBranch when coming from > a VisitBranch (which is your case). With such continuation, the overflow > bit is never Defined even if it's used (see VisitBinop: it clearly has a > single output regardless of the continuation, and EmitWithContinuation only > defines an additional output for Set continuations), so it's not surprising > that you end up with a virtual register (the one for the overflow bit) > being used but not defined: you use it in the 2nd branch, but the 1st > branch doesn't define it. > If you look at VisitInt32AddWithOverflow, you'll see that it uses a > VisitBinop (like VisitWordCompareZero does for branches on > Int32AddWithOverflow's) with a ForSet continuation, and this ForSet > continuation causes EmitWithContinuation to emit the overflow bit. > > In order to allow VisitWordCompareZero to cover Int32AddWithOverflow when > the overflow bit has other uses (which should all be already Defined), > you'll need to change something about the continuation used in > VisitWordCompareZero. A ForBranchAndSet continuation sounds reasonable, but > I haven't looked at the details. > > Happy to help if you have other questions :) > > Cheers, > Darius > > On Thursday, November 9, 2023 at 11:31:24 AM UTC+1 sam.p...@arm.com wrote: > >> I can confirm that by disabling value numbering, a second >> Int32AddWithOverflow does exist: >> >> --- BLOCK B1158 id1159 <- B1153, B1157 --- >> 5768: Phi[kRepFloat64](5753, 5767) >> 5769: Float64RoundDown(5768) >> 5770: Int64Constant[19] >> 5771: Load[kRepTaggedPointer|kTypeAny](27, 5770) >> 5772: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3) >> 5773: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32, >> sparse:^^](5690, 5689) >> 5774: TypedStateValues[, sparse:..] >> 5775: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny, >> sparse:^^......](5773, 5774) >> 5776: TypedStateValues[, sparse:.] >> 5777: FrameState[UNOPTIMIZED_FRAME, 92, PokeAt(0), 0x191e0015eab5 >> <SharedFunctionInfo test_div>](5772, 5775, 5776, 2, 8, 0) >> 5778: Int32AddWithOverflow(5689, 5689) >> 5779: Projection[0](5778) >> 5780: Projection[1](5778) >> 5781: Branch[Unspecified, False](5780) -> B1160, B1159 >> >> So could Overflow operations and/or Projections[1] be special-cased..? >> And why is there a difference here with a normal comparison? >> >> Thanks, >> Sam >> >> On Thursday, November 9, 2023 at 9:50:53 AM UTC Sam Parker-Haynes wrote: >> >>> Hi, >>> >>> Thanks both! >>> >>> Looking again at the IR, I wonder if it is control related. This is the >>> well-behaved IR, where both instances of the Int32AddWithOverflow have a >>> control input. >>> >>> --- BLOCK B1145 id5053 <- B1143, B1144 --- >>> >>> 43: Merge(42, 13024) >>> >>> 53: Phi[kRepWord32](18, 13025, 43) >>> >>> 52: Phi[kRepWord32](17, 13021, 43) >>> >>> 51: EffectPhi(15, 13024, 43) >>> >>> 13034: ChangeInt32ToFloat64(53) >>> 6670: Load[kRepTaggedPointer|kTypeAny](6114, 23562, 51, 43) >>> 13026: Int32AddWithOverflow(53, 53, 43) >>> >>> 13027: Projection[1](13026, 43) >>> >>> 13028: Branch[Machine, False](13027, 43) -> B1147, B1146 >>> >>> --- BLOCK B1158 id5027 <- B1157, B1153 --- >>> 13061: Merge(13054, 13044) >>> 13063: Phi[kRepFloat64](13056, 13060, 13061) >>> 13062: EffectPhi(13056, 78, 13061) >>> 6672: Load[kRepTaggedPointer|kTypeAny](6111, 23562, 13062, 13061) >>> 13068: Int32AddWithOverflow(53, 53, 13061) >>> 13069: Projection[1](13068, 13061) >>> 13070: Branch[Machine, False](13069, 13061) -> B1160, B1159 >>> >>> However, with Turboshaft, the blocks look like this: >>> >>> --- BLOCK B1145 id1146 <- B1143, B1144 --- >>> 5473: Phi[kRepWord32](5449, 5470) >>> 5474: Phi[kRepWord32](5450, 5466) >>> 5475: Int64Constant[19] >>> 5476: Load[kRepTaggedPointer|kTypeAny](28, 5475) >>> 5477: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3) >>> 5478: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32, >>> sparse:^^](5474, 5473) >>> 5479: TypedStateValues[, sparse:..] >>> 5480: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny, >>> kRepTagged|kTypeAny, kRepTagged|kTypeAny, kRepTagged|kTypeAny, >>> sparse:^^^^^...](5478, 5479, 27, 26, 25) >>> 5481: TypedStateValues[, sparse:.] >>> 5482: FrameState[UNOPTIMIZED_FRAME, 59, PokeAt(0), 0x191e0015eab5 >>> <SharedFunctionInfo test_div>](5477, 5480, 5481, 2, 8, 0) >>> 5483: Int32AddWithOverflow(5473, 5473) >>> 5484: Projection[0](5483) >>> 5485: Projection[1](5483) >>> 5486: Branch[Unspecified, False](5485) -> B1147, B1146 >>> >>> --- BLOCK B1158 id1159 <- B1153, B1157 --- >>> 5549: Phi[kRepFloat64](5534, 5548) >>> 5550: Float64RoundDown(5549) >>> 5551: Int64Constant[19] >>> 5552: Load[kRepTaggedPointer|kTypeAny](27, 5551) >>> 5553: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3) >>> 5554: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32, >>> sparse:^^](5474, 5473) >>> 5555: TypedStateValues[, sparse:..] >>> 5556: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny, >>> sparse:^^......](5554, 5555) >>> 5557: TypedStateValues[, sparse:.] >>> 5558: FrameState[UNOPTIMIZED_FRAME, 92, PokeAt(0), 0x191e0015eab5 >>> <SharedFunctionInfo test_div>](5553, 5556, 5557, 2, 8, 0) >>> 5559: Branch[Unspecified, False](5485) -> B1160, B1159 >>> >>> >>> I have seen that Turboshaft has an EmitProjectionReducer ( >>> https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/assembler.h;drc=f5a7962861b208e9cf82e61c1fa9f8dc0d216a87;l=606) >>> >>> and seems to be closely related to ValueNumbering. Without control inputs, >>> I can't see why ValueNumbering, or something similar, wouldn't remove the >>> second instance. >>> >>> > It could be an artifact of scheduling >>> >>> I will have a look at the scheduler. >>> >>> > However, this should be prevented by this check >>> >>> Yes, it's that check that I'm trying to remove :) and this works for >>> everything but projections. Surely we should be able to combine an >>> operation even if it has multiple users? >>> >>> Regards, >>> Sam >>> >>> On Wednesday, November 8, 2023 at 5:20:37 PM UTC dmerc...@google.com >>> wrote: >>> >>>> Hi, >>>> >>>> > With a turbofan-only flow, the overflow operation and the projection >>>> will be duplicated for the second branch, and everything is fine. >>>> >>>> I'm not sure which phase could duplicate a Int32AddWithOverflow. I >>>> don't think that the Instruction Selector does this. It could be an >>>> artifact of scheduling (which happens before effect control linearization >>>> or before instruction selection). I'm not sure it's a conscious >>>> optimization though, or just a side-effect of the scheduler. >>>> >>>> > Is it legal for all projected values to be live across basic blocks? >>>> If so, who should be responsible for ensuring values are live where they >>>> need to be? >>>> >>>> Yes it is legal (and probably happens fairly often). As to who should >>>> ensure that the values are live... It sounds like here the bug is in the >>>> Instruction Selector. What probably happens is that when emitting the 1st >>>> branch, we decide that it can cover the Projection[1] and don't need to >>>> emit it, which is wrong. However, this should be prevented by this check >>>> https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/backend/x64/instruction-selector-x64.cc;l=4278: >>>> >>>> when user=the_first_branch and value=projection[1], CanCover should return >>>> false, because the projection has another use. >>>> >>>> If you can provide an example that we can reproduce, I'd be happy to >>>> have a look, but without being able to reproduce, we're just speculating :/ >>>> >>>> > I don't understand why it has a control input >>>> >>>> Projections don't really need control inputs, but I think that having >>>> control inputs in projections is somewhat useful for scheduling in >>>> Turbofan. We trying removing the control input ( >>>> https://crrev.com/c/4570503), and performance regressed in some cases. >>>> >>>> Cheers, >>>> Darius >>>> >>>> On Wednesday, November 8, 2023 at 6:02:40 PM UTC+1 Matthias Liedtke >>>> wrote: >>>> >>>>> Hi Sam, >>>>> >>>>> Honestly I don't feel knowledgeable enough to answer this question, so >>>>> I hope some Turbofan experts can chime in. >>>>> This line >>>>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/common-operator.cc;l=1622> >>>>> defines >>>>> that the Projection in Turbofan has one value input, a control input and >>>>> a >>>>> value output but no control output. >>>>> I don't understand why it has a control input or what it exactly means >>>>> that an operation has a control input vs. let's say a Select operation >>>>> that >>>>> doesn't have one. >>>>> Still, I'd assume that the Branch in B4 should have a control input >>>>> that at some point leads to the same control input as "2: Projection [1]" >>>>> and therefore I'd assume that it's valid as the control input of the >>>>> projection dominates the branch that uses it. >>>>> Looking at the schedule, as the Projection is in B0, it is guaranteed >>>>> to be "executed" before B4, so the virtual register should be defined and >>>>> should be "available" / usable by all instructions following itself in B0 >>>>> and in all instructions in blocks that are reachable from B0 directly or >>>>> indirectly. >>>>> I don't think there is anything limiting the liveness of a virtual >>>>> register, so as long as a block A dominates another Block B, all virtual >>>>> registers defined in A are usable in B. >>>>> So your snippet looks totally fine based on my (limited) understanding >>>>> of Turbofan. >>>>> >>>>> However, I'd assume it should be >>>>> 1: Projection[0] (0) >>>>> 2: Projection[1] (0) >>>>> and not >>>>> 1: Projection[0] (0) >>>>> 2: Projection[1] (1) >>>>> (i.e. both projections should have Int32AddWithOverflow as an input) >>>>> >>>>> Best regards, >>>>> Matthias >>>>> >>>>> >>>>> On Wed, Nov 8, 2023 at 5:24 PM Sam Parker-Haynes <sam.p...@arm.com> >>>>> wrote: >>>>> >>>>>> Hi Matthias, >>>>>> >>>>>> My RecreateSchedule example looks like this: >>>>>> >>>>>> B0: >>>>>> 0: Int32AddWithOverflow >>>>>> 1: Projection[0] (0) >>>>>> 2: Projection[1] (1) >>>>>> Branch (2) >>>>>> ... >>>>>> >>>>>> B4: >>>>>> Branch (2) <-- how should this branch receive (2) >>>>>> >>>>>> I know I said it looks okay at the IR level, but since the IR isn't >>>>>> documented that may not be true ;) So, what are the semantics of a >>>>>> Projection..? >>>>>> - Is the Projection[1] of an Overflow operation a part of the control >>>>>> chain? >>>>>> - Is it legal for all projected values to be live across basic >>>>>> blocks? If so, who should be responsible for ensuring values are live >>>>>> where >>>>>> they need to be? >>>>>> >>>>>> Thanks, >>>>>> Sam >>>>>> >>>>>> On Wednesday, November 8, 2023 at 2:15:58 PM UTC Sam Parker-Haynes >>>>>> wrote: >>>>>> >>>>>> Also, are any instruction selection tests being added for Turboshaft? >>>>>> Something like the equivalent of >>>>>> test/unittests/compiler/arm64/instruction-selector-arm64-unittest.cc? >>>>>> >>>>>> On Wednesday, November 8, 2023 at 1:26:12 PM UTC Sam Parker-Haynes >>>>>> wrote: >>>>>> >>>>>> EDIT: only passes with --no-turboshaft >>>>>> >>>>>> On Wednesday, November 8, 2023 at 1:20:03 PM UTC Sam Parker-Haynes >>>>>> wrote: >>>>>> >>>>>> Hi Matthias, >>>>>> >>>>>> Ah, I hadn't seen that, so yes I've been tweaking the OG selector >>>>>> function for arm64. On the x64 side of things, everything is fine with >>>>>> 'pure' turboshaft pipeline but, as with my arm64 checkout, the issue >>>>>> arises >>>>>> when modifying the turbofan selector function and using ---no-turboshaft. >>>>>> >>>>>> So, I guess the problem is with the RecreateSchedule? >>>>>> >>>>>> Thanks, >>>>>> Sam >>>>>> >>>>>> On Wednesday, November 8, 2023 at 1:12:56 PM UTC mlie...@google.com >>>>>> wrote: >>>>>> >>>>>> Hi Sam, >>>>>> >>>>>> I don't have any insights on x64 for VisitWordCompareZero but as a >>>>>> quick note, the TurboshaftAdapter instruction selection implementation >>>>>> for >>>>>> VisitWordCompareZero for arm64 was just added yesterday by me ( >>>>>> https://chromium-review.googlesource.com/c/v8/v8/+/5002003). >>>>>> >>>>>> So, just to clarify, I assume you are talking about the Turbofan >>>>>> {Node* / TurbofanAdapter} based instruction selection with Turboshaft >>>>>> optimizations followed by a RecreateSchedule? >>>>>> If you do changes for the TurbofanAdapter instruction selection on >>>>>> arm64 for code that already has an implementation for the >>>>>> TurboshaftAdapter, could you add a TODO(mliedtke) on the Turboshaft >>>>>> implementation with some nice description and CC me on the gerrit >>>>>> changes, >>>>>> so these don't get lost when we switch over to the TurboshaftAdapter? >>>>>> >>>>>> Thanks and best regards, >>>>>> Matthias >>>>>> >>>>>> On Wed, Nov 8, 2023 at 1:54 PM Sam Parker-Haynes <sam.p...@arm.com> >>>>>> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I've been investigating rematerialising flag setting instructions in >>>>>> VisitWordCompareZero, just by removing the CanCover check, and something >>>>>> is >>>>>> going wrong with turboshaft. >>>>>> >>>>>> I have an arithmetic overflow operation, Int32AddWithOverflow, and >>>>>> the overflow check is used by two branch operations. With a >>>>>> turbofan-only >>>>>> flow, the overflow operation and the projection will be duplicated for >>>>>> the >>>>>> second branch, and everything is fine. But with turboshaft, neither the >>>>>> overflow operation or projection are duplicated and the second branch >>>>>> uses >>>>>> the original projection[1] value. This looks fine at the IR level, but >>>>>> ends >>>>>> up broken after isel when the register allocator comes across a virtual >>>>>> register without a definition. This happens for both x64 and arm64, so >>>>>> I'm >>>>>> assuming turboshaft is making some assumptions, based on the current >>>>>> behaviour, that are non-obvious to me. >>>>>> >>>>>> Any ideas? >>>>>> >>>>>> Thanks! >>>>>> Sam >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -- >>>>>> v8-dev mailing list >>>>>> v8-...@googlegroups.com >>>>>> http://groups.google.com/group/v8-dev >>>>>> --- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "v8-dev" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>> send an email to v8-dev+un...@googlegroups.com. >>>>>> To view this discussion on the web visit >>>>>> https://groups.google.com/d/msgid/v8-dev/175a4ba2-9d33-45bb-84e3-91d41a722846n%40googlegroups.com >>>>>> >>>>>> <https://groups.google.com/d/msgid/v8-dev/175a4ba2-9d33-45bb-84e3-91d41a722846n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>> . >>>>>> >>>>>> -- >>>>>> -- >>>>>> v8-dev mailing list >>>>>> v8-...@googlegroups.com >>>>>> http://groups.google.com/group/v8-dev >>>>>> --- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "v8-dev" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>> send an email to v8-dev+un...@googlegroups.com. >>>>>> >>>>> To view this discussion on the web visit >>>>>> https://groups.google.com/d/msgid/v8-dev/46693e6c-1216-493b-a104-2c9d41075987n%40googlegroups.com >>>>>> >>>>>> <https://groups.google.com/d/msgid/v8-dev/46693e6c-1216-493b-a104-2c9d41075987n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>> . >>>>>> >>>>> -- -- v8-dev mailing list v8-dev@googlegroups.com http://groups.google.com/group/v8-dev --- You received this message because you are subscribed to the Google Groups "v8-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/3aabdaf2-7f9c-4b54-8b09-25441fb4b277n%40googlegroups.com.