Arf sorry, I read a bit too quickly your initial message and missed that you were trying to remove a CanCover in VisitWordCompareZero.
> Surely we should be able to combine an operation even if it has multiple users? Yes, we should (but we don't). We've been thinking about relaxing some CanCovers for this purpose for a while, but haven't had time to get to it just yet. It's not surprising that it currently doesn't work, but it would be nice to make it work. > So could Overflow operations and/or Projections[1] be special-cased..? And why is there a difference here with a normal comparison? First, I think that very few operations besides Overflow binops have projections. Only Calls come to mind, and they are fairly different with regard to VisitWordCompareZero and CanCover (you probably never want to cover a Call). So, Overflow binops are different in the sense that they have 2 outputs and in particular an overflow output. Then, for a more useful explanation of what it going on: In VisitWordCompareZero, the continuation is a ForBranch when coming from a VisitBranch (which is your case). With such continuation, the overflow bit is never Defined even if it's used (see VisitBinop: it clearly has a single output regardless of the continuation, and EmitWithContinuation only defines an additional output for Set continuations), so it's not surprising that you end up with a virtual register (the one for the overflow bit) being used but not defined: you use it in the 2nd branch, but the 1st branch doesn't define it. If you look at VisitInt32AddWithOverflow, you'll see that it uses a VisitBinop (like VisitWordCompareZero does for branches on Int32AddWithOverflow's) with a ForSet continuation, and this ForSet continuation causes EmitWithContinuation to emit the overflow bit. In order to allow VisitWordCompareZero to cover Int32AddWithOverflow when the overflow bit has other uses (which should all be already Defined), you'll need to change something about the continuation used in VisitWordCompareZero. A ForBranchAndSet continuation sounds reasonable, but I haven't looked at the details. Happy to help if you have other questions :) Cheers, Darius On Thursday, November 9, 2023 at 11:31:24 AM UTC+1 sam.p...@arm.com wrote: > I can confirm that by disabling value numbering, a second > Int32AddWithOverflow does exist: > > --- BLOCK B1158 id1159 <- B1153, B1157 --- > 5768: Phi[kRepFloat64](5753, 5767) > 5769: Float64RoundDown(5768) > 5770: Int64Constant[19] > 5771: Load[kRepTaggedPointer|kTypeAny](27, 5770) > 5772: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3) > 5773: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32, > sparse:^^](5690, 5689) > 5774: TypedStateValues[, sparse:..] > 5775: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny, > sparse:^^......](5773, 5774) > 5776: TypedStateValues[, sparse:.] > 5777: FrameState[UNOPTIMIZED_FRAME, 92, PokeAt(0), 0x191e0015eab5 > <SharedFunctionInfo test_div>](5772, 5775, 5776, 2, 8, 0) > 5778: Int32AddWithOverflow(5689, 5689) > 5779: Projection[0](5778) > 5780: Projection[1](5778) > 5781: Branch[Unspecified, False](5780) -> B1160, B1159 > > So could Overflow operations and/or Projections[1] be special-cased..? And > why is there a difference here with a normal comparison? > > Thanks, > Sam > > On Thursday, November 9, 2023 at 9:50:53 AM UTC Sam Parker-Haynes wrote: > >> Hi, >> >> Thanks both! >> >> Looking again at the IR, I wonder if it is control related. This is the >> well-behaved IR, where both instances of the Int32AddWithOverflow have a >> control input. >> >> --- BLOCK B1145 id5053 <- B1143, B1144 --- >> >> 43: Merge(42, 13024) >> >> 53: Phi[kRepWord32](18, 13025, 43) >> >> 52: Phi[kRepWord32](17, 13021, 43) >> >> 51: EffectPhi(15, 13024, 43) >> >> 13034: ChangeInt32ToFloat64(53) >> 6670: Load[kRepTaggedPointer|kTypeAny](6114, 23562, 51, 43) >> 13026: Int32AddWithOverflow(53, 53, 43) >> >> 13027: Projection[1](13026, 43) >> >> 13028: Branch[Machine, False](13027, 43) -> B1147, B1146 >> >> --- BLOCK B1158 id5027 <- B1157, B1153 --- >> 13061: Merge(13054, 13044) >> 13063: Phi[kRepFloat64](13056, 13060, 13061) >> 13062: EffectPhi(13056, 78, 13061) >> 6672: Load[kRepTaggedPointer|kTypeAny](6111, 23562, 13062, 13061) >> 13068: Int32AddWithOverflow(53, 53, 13061) >> 13069: Projection[1](13068, 13061) >> 13070: Branch[Machine, False](13069, 13061) -> B1160, B1159 >> >> However, with Turboshaft, the blocks look like this: >> >> --- BLOCK B1145 id1146 <- B1143, B1144 --- >> 5473: Phi[kRepWord32](5449, 5470) >> 5474: Phi[kRepWord32](5450, 5466) >> 5475: Int64Constant[19] >> 5476: Load[kRepTaggedPointer|kTypeAny](28, 5475) >> 5477: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3) >> 5478: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32, >> sparse:^^](5474, 5473) >> 5479: TypedStateValues[, sparse:..] >> 5480: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny, >> kRepTagged|kTypeAny, kRepTagged|kTypeAny, kRepTagged|kTypeAny, >> sparse:^^^^^...](5478, 5479, 27, 26, 25) >> 5481: TypedStateValues[, sparse:.] >> 5482: FrameState[UNOPTIMIZED_FRAME, 59, PokeAt(0), 0x191e0015eab5 >> <SharedFunctionInfo test_div>](5477, 5480, 5481, 2, 8, 0) >> 5483: Int32AddWithOverflow(5473, 5473) >> 5484: Projection[0](5483) >> 5485: Projection[1](5483) >> 5486: Branch[Unspecified, False](5485) -> B1147, B1146 >> >> --- BLOCK B1158 id1159 <- B1153, B1157 --- >> 5549: Phi[kRepFloat64](5534, 5548) >> 5550: Float64RoundDown(5549) >> 5551: Int64Constant[19] >> 5552: Load[kRepTaggedPointer|kTypeAny](27, 5551) >> 5553: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3) >> 5554: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32, >> sparse:^^](5474, 5473) >> 5555: TypedStateValues[, sparse:..] >> 5556: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny, >> sparse:^^......](5554, 5555) >> 5557: TypedStateValues[, sparse:.] >> 5558: FrameState[UNOPTIMIZED_FRAME, 92, PokeAt(0), 0x191e0015eab5 >> <SharedFunctionInfo test_div>](5553, 5556, 5557, 2, 8, 0) >> 5559: Branch[Unspecified, False](5485) -> B1160, B1159 >> >> >> I have seen that Turboshaft has an EmitProjectionReducer ( >> https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/assembler.h;drc=f5a7962861b208e9cf82e61c1fa9f8dc0d216a87;l=606) >> >> and seems to be closely related to ValueNumbering. Without control inputs, >> I can't see why ValueNumbering, or something similar, wouldn't remove the >> second instance. >> >> > It could be an artifact of scheduling >> >> I will have a look at the scheduler. >> >> > However, this should be prevented by this check >> >> Yes, it's that check that I'm trying to remove :) and this works for >> everything but projections. Surely we should be able to combine an >> operation even if it has multiple users? >> >> Regards, >> Sam >> >> On Wednesday, November 8, 2023 at 5:20:37 PM UTC dmerc...@google.com >> wrote: >> >>> Hi, >>> >>> > With a turbofan-only flow, the overflow operation and the projection >>> will be duplicated for the second branch, and everything is fine. >>> >>> I'm not sure which phase could duplicate a Int32AddWithOverflow. I don't >>> think that the Instruction Selector does this. It could be an artifact of >>> scheduling (which happens before effect control linearization or before >>> instruction selection). I'm not sure it's a conscious optimization though, >>> or just a side-effect of the scheduler. >>> >>> > Is it legal for all projected values to be live across basic blocks? >>> If so, who should be responsible for ensuring values are live where they >>> need to be? >>> >>> Yes it is legal (and probably happens fairly often). As to who should >>> ensure that the values are live... It sounds like here the bug is in the >>> Instruction Selector. What probably happens is that when emitting the 1st >>> branch, we decide that it can cover the Projection[1] and don't need to >>> emit it, which is wrong. However, this should be prevented by this check >>> https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/backend/x64/instruction-selector-x64.cc;l=4278: >>> >>> when user=the_first_branch and value=projection[1], CanCover should return >>> false, because the projection has another use. >>> >>> If you can provide an example that we can reproduce, I'd be happy to >>> have a look, but without being able to reproduce, we're just speculating :/ >>> >>> > I don't understand why it has a control input >>> >>> Projections don't really need control inputs, but I think that having >>> control inputs in projections is somewhat useful for scheduling in >>> Turbofan. We trying removing the control input ( >>> https://crrev.com/c/4570503), and performance regressed in some cases. >>> >>> Cheers, >>> Darius >>> >>> On Wednesday, November 8, 2023 at 6:02:40 PM UTC+1 Matthias Liedtke >>> wrote: >>> >>>> Hi Sam, >>>> >>>> Honestly I don't feel knowledgeable enough to answer this question, so >>>> I hope some Turbofan experts can chime in. >>>> This line >>>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/common-operator.cc;l=1622> >>>> defines >>>> that the Projection in Turbofan has one value input, a control input and a >>>> value output but no control output. >>>> I don't understand why it has a control input or what it exactly means >>>> that an operation has a control input vs. let's say a Select operation >>>> that >>>> doesn't have one. >>>> Still, I'd assume that the Branch in B4 should have a control input >>>> that at some point leads to the same control input as "2: Projection [1]" >>>> and therefore I'd assume that it's valid as the control input of the >>>> projection dominates the branch that uses it. >>>> Looking at the schedule, as the Projection is in B0, it is guaranteed >>>> to be "executed" before B4, so the virtual register should be defined and >>>> should be "available" / usable by all instructions following itself in B0 >>>> and in all instructions in blocks that are reachable from B0 directly or >>>> indirectly. >>>> I don't think there is anything limiting the liveness of a virtual >>>> register, so as long as a block A dominates another Block B, all virtual >>>> registers defined in A are usable in B. >>>> So your snippet looks totally fine based on my (limited) understanding >>>> of Turbofan. >>>> >>>> However, I'd assume it should be >>>> 1: Projection[0] (0) >>>> 2: Projection[1] (0) >>>> and not >>>> 1: Projection[0] (0) >>>> 2: Projection[1] (1) >>>> (i.e. both projections should have Int32AddWithOverflow as an input) >>>> >>>> Best regards, >>>> Matthias >>>> >>>> >>>> On Wed, Nov 8, 2023 at 5:24 PM Sam Parker-Haynes <sam.p...@arm.com> >>>> wrote: >>>> >>>>> Hi Matthias, >>>>> >>>>> My RecreateSchedule example looks like this: >>>>> >>>>> B0: >>>>> 0: Int32AddWithOverflow >>>>> 1: Projection[0] (0) >>>>> 2: Projection[1] (1) >>>>> Branch (2) >>>>> ... >>>>> >>>>> B4: >>>>> Branch (2) <-- how should this branch receive (2) >>>>> >>>>> I know I said it looks okay at the IR level, but since the IR isn't >>>>> documented that may not be true ;) So, what are the semantics of a >>>>> Projection..? >>>>> - Is the Projection[1] of an Overflow operation a part of the control >>>>> chain? >>>>> - Is it legal for all projected values to be live across basic blocks? >>>>> If so, who should be responsible for ensuring values are live where they >>>>> need to be? >>>>> >>>>> Thanks, >>>>> Sam >>>>> >>>>> On Wednesday, November 8, 2023 at 2:15:58 PM UTC Sam Parker-Haynes >>>>> wrote: >>>>> >>>>> Also, are any instruction selection tests being added for Turboshaft? >>>>> Something like the equivalent of >>>>> test/unittests/compiler/arm64/instruction-selector-arm64-unittest.cc? >>>>> >>>>> On Wednesday, November 8, 2023 at 1:26:12 PM UTC Sam Parker-Haynes >>>>> wrote: >>>>> >>>>> EDIT: only passes with --no-turboshaft >>>>> >>>>> On Wednesday, November 8, 2023 at 1:20:03 PM UTC Sam Parker-Haynes >>>>> wrote: >>>>> >>>>> Hi Matthias, >>>>> >>>>> Ah, I hadn't seen that, so yes I've been tweaking the OG selector >>>>> function for arm64. On the x64 side of things, everything is fine with >>>>> 'pure' turboshaft pipeline but, as with my arm64 checkout, the issue >>>>> arises >>>>> when modifying the turbofan selector function and using ---no-turboshaft. >>>>> >>>>> So, I guess the problem is with the RecreateSchedule? >>>>> >>>>> Thanks, >>>>> Sam >>>>> >>>>> On Wednesday, November 8, 2023 at 1:12:56 PM UTC mlie...@google.com >>>>> wrote: >>>>> >>>>> Hi Sam, >>>>> >>>>> I don't have any insights on x64 for VisitWordCompareZero but as a >>>>> quick note, the TurboshaftAdapter instruction selection implementation >>>>> for >>>>> VisitWordCompareZero for arm64 was just added yesterday by me ( >>>>> https://chromium-review.googlesource.com/c/v8/v8/+/5002003). >>>>> >>>>> So, just to clarify, I assume you are talking about the Turbofan >>>>> {Node* / TurbofanAdapter} based instruction selection with Turboshaft >>>>> optimizations followed by a RecreateSchedule? >>>>> If you do changes for the TurbofanAdapter instruction selection on >>>>> arm64 for code that already has an implementation for the >>>>> TurboshaftAdapter, could you add a TODO(mliedtke) on the Turboshaft >>>>> implementation with some nice description and CC me on the gerrit >>>>> changes, >>>>> so these don't get lost when we switch over to the TurboshaftAdapter? >>>>> >>>>> Thanks and best regards, >>>>> Matthias >>>>> >>>>> On Wed, Nov 8, 2023 at 1:54 PM Sam Parker-Haynes <sam.p...@arm.com> >>>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> I've been investigating rematerialising flag setting instructions in >>>>> VisitWordCompareZero, just by removing the CanCover check, and something >>>>> is >>>>> going wrong with turboshaft. >>>>> >>>>> I have an arithmetic overflow operation, Int32AddWithOverflow, and the >>>>> overflow check is used by two branch operations. With a turbofan-only >>>>> flow, >>>>> the overflow operation and the projection will be duplicated for the >>>>> second >>>>> branch, and everything is fine. But with turboshaft, neither the overflow >>>>> operation or projection are duplicated and the second branch uses the >>>>> original projection[1] value. This looks fine at the IR level, but ends >>>>> up >>>>> broken after isel when the register allocator comes across a virtual >>>>> register without a definition. This happens for both x64 and arm64, so >>>>> I'm >>>>> assuming turboshaft is making some assumptions, based on the current >>>>> behaviour, that are non-obvious to me. >>>>> >>>>> Any ideas? >>>>> >>>>> Thanks! >>>>> Sam >>>>> >>>>> >>>>> >>>>> -- >>>>> -- >>>>> v8-dev mailing list >>>>> v8-...@googlegroups.com >>>>> http://groups.google.com/group/v8-dev >>>>> --- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "v8-dev" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to v8-dev+un...@googlegroups.com. >>>>> To view this discussion on the web visit >>>>> https://groups.google.com/d/msgid/v8-dev/175a4ba2-9d33-45bb-84e3-91d41a722846n%40googlegroups.com >>>>> >>>>> <https://groups.google.com/d/msgid/v8-dev/175a4ba2-9d33-45bb-84e3-91d41a722846n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>> . >>>>> >>>>> -- >>>>> -- >>>>> v8-dev mailing list >>>>> v8-...@googlegroups.com >>>>> http://groups.google.com/group/v8-dev >>>>> --- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "v8-dev" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to v8-dev+un...@googlegroups.com. >>>>> >>>> To view this discussion on the web visit >>>>> https://groups.google.com/d/msgid/v8-dev/46693e6c-1216-493b-a104-2c9d41075987n%40googlegroups.com >>>>> >>>>> <https://groups.google.com/d/msgid/v8-dev/46693e6c-1216-493b-a104-2c9d41075987n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>> . >>>>> >>>> -- -- v8-dev mailing list v8-dev@googlegroups.com http://groups.google.com/group/v8-dev --- You received this message because you are subscribed to the Google Groups "v8-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/7762a4fc-8c08-45fa-a4dd-88f4179ca92fn%40googlegroups.com.