Aaaaaaah, thank you! Well, now it seems obvious :)

I think, for now, I will leave the continuation alone and special-case 
projections to still require CanCover.

My question remains around testing. I can currently test specific isel 
patterns for Turbofan, is there anything like that for Turboshaft?

Thanks again,
Sam

On Thursday, November 9, 2023 at 12:00:07 PM UTC dmerc...@google.com wrote:

> Arf sorry, I read a bit too quickly your initial message and missed that 
> you were trying to remove a CanCover in VisitWordCompareZero.
>
> > Surely we should be able to combine an operation even if it has multiple 
> users?
>
> Yes, we should (but we don't). We've been thinking about relaxing some 
> CanCovers for this purpose for a while, but haven't had time to get to it 
> just yet. It's not surprising that it currently doesn't work, but it would 
> be nice to make it work.
>
> > So could Overflow operations and/or Projections[1] be special-cased..? 
> And why is there a difference here with a normal comparison?
>
> First, I think that very few operations besides Overflow binops have 
> projections. Only Calls come to mind, and they are fairly different with 
> regard to VisitWordCompareZero and CanCover (you probably never want to 
> cover a Call). So, Overflow binops are different in the sense that they 
> have 2 outputs and in particular an overflow output.
>
> Then, for a more useful explanation of what it going on:
> In VisitWordCompareZero, the continuation is a ForBranch when coming from 
> a VisitBranch (which is your case). With such continuation, the overflow 
> bit is never Defined even if it's used (see VisitBinop: it clearly has a 
> single output regardless of the continuation, and EmitWithContinuation only 
> defines an additional output for Set continuations), so it's not surprising 
> that you end up with a virtual register (the one for the overflow bit) 
> being used but not defined: you use it in the 2nd branch, but the 1st 
> branch doesn't define it.
> If you look at VisitInt32AddWithOverflow, you'll see that it uses a 
> VisitBinop (like VisitWordCompareZero does for branches on 
> Int32AddWithOverflow's) with a ForSet continuation, and this ForSet 
> continuation causes EmitWithContinuation to emit the overflow bit.
>
> In order to allow VisitWordCompareZero to cover Int32AddWithOverflow when 
> the overflow bit has other uses (which should all be already Defined), 
> you'll need to change something about the continuation used in 
> VisitWordCompareZero. A ForBranchAndSet continuation sounds reasonable, but 
> I haven't looked at the details.
>
> Happy to help if you have other questions :)
>
> Cheers,
> Darius
>
> On Thursday, November 9, 2023 at 11:31:24 AM UTC+1 sam.p...@arm.com wrote:
>
>> I can confirm that by disabling value numbering, a second 
>> Int32AddWithOverflow does exist:
>>
>> --- BLOCK B1158 id1159 <- B1153, B1157 ---
>>   5768: Phi[kRepFloat64](5753, 5767)
>>   5769: Float64RoundDown(5768)
>>   5770: Int64Constant[19]
>>   5771: Load[kRepTaggedPointer|kTypeAny](27, 5770)
>>   5772: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3)
>>   5773: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32, 
>> sparse:^^](5690, 5689)
>>   5774: TypedStateValues[, sparse:..]
>>   5775: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny, 
>> sparse:^^......](5773, 5774)
>>   5776: TypedStateValues[, sparse:.]
>>   5777: FrameState[UNOPTIMIZED_FRAME, 92, PokeAt(0), 0x191e0015eab5 
>> <SharedFunctionInfo test_div>](5772, 5775, 5776, 2, 8, 0)
>>   5778: Int32AddWithOverflow(5689, 5689)
>>   5779: Projection[0](5778)
>>   5780: Projection[1](5778)
>>   5781: Branch[Unspecified, False](5780) -> B1160, B1159
>>
>> So could Overflow operations and/or Projections[1] be special-cased..? 
>> And why is there a difference here with a normal comparison?
>>
>> Thanks,
>> Sam
>>
>> On Thursday, November 9, 2023 at 9:50:53 AM UTC Sam Parker-Haynes wrote:
>>
>>> Hi,
>>>
>>> Thanks both!
>>>
>>> Looking again at the IR, I wonder if it is control related. This is the 
>>> well-behaved IR, where both instances of the Int32AddWithOverflow have a 
>>> control input.
>>>
>>> --- BLOCK B1145 id5053 <- B1143, B1144 ---                               
>>>                                                       
>>>   43: Merge(42, 13024)                                                   
>>>                                                       
>>>   53: Phi[kRepWord32](18, 13025, 43)                                     
>>>                                                       
>>>   52: Phi[kRepWord32](17, 13021, 43)                                     
>>>                                                       
>>>   51: EffectPhi(15, 13024, 43)                                           
>>>                                                       
>>>   13034: ChangeInt32ToFloat64(53)
>>>   6670: Load[kRepTaggedPointer|kTypeAny](6114, 23562, 51, 43) 
>>>   13026: Int32AddWithOverflow(53, 53, 43)                               
>>>                                                        
>>>   13027: Projection[1](13026, 43)                                       
>>>                                                        
>>>   13028: Branch[Machine, False](13027, 43) -> B1147, B1146
>>>
>>> --- BLOCK B1158 id5027 <- B1157, B1153 ---
>>>   13061: Merge(13054, 13044)
>>>   13063: Phi[kRepFloat64](13056, 13060, 13061)
>>>   13062: EffectPhi(13056, 78, 13061)
>>>   6672: Load[kRepTaggedPointer|kTypeAny](6111, 23562, 13062, 13061)
>>>   13068: Int32AddWithOverflow(53, 53, 13061)
>>>   13069: Projection[1](13068, 13061)
>>>   13070: Branch[Machine, False](13069, 13061) -> B1160, B1159
>>>
>>> However, with Turboshaft, the blocks look like this:
>>>
>>> --- BLOCK B1145 id1146 <- B1143, B1144 ---
>>>   5473: Phi[kRepWord32](5449, 5470)
>>>   5474: Phi[kRepWord32](5450, 5466)
>>>   5475: Int64Constant[19]
>>>   5476: Load[kRepTaggedPointer|kTypeAny](28, 5475)
>>>   5477: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3)
>>>   5478: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32, 
>>> sparse:^^](5474, 5473)
>>>   5479: TypedStateValues[, sparse:..]
>>>   5480: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny, 
>>> kRepTagged|kTypeAny, kRepTagged|kTypeAny, kRepTagged|kTypeAny, 
>>> sparse:^^^^^...](5478, 5479, 27, 26, 25)
>>>   5481: TypedStateValues[, sparse:.]
>>>   5482: FrameState[UNOPTIMIZED_FRAME, 59, PokeAt(0), 0x191e0015eab5 
>>> <SharedFunctionInfo test_div>](5477, 5480, 5481, 2, 8, 0)
>>>   5483: Int32AddWithOverflow(5473, 5473)
>>>   5484: Projection[0](5483)
>>>   5485: Projection[1](5483)
>>>   5486: Branch[Unspecified, False](5485) -> B1147, B1146
>>>
>>> --- BLOCK B1158 id1159 <- B1153, B1157 ---
>>>   5549: Phi[kRepFloat64](5534, 5548)
>>>   5550: Float64RoundDown(5549)
>>>   5551: Int64Constant[19]
>>>   5552: Load[kRepTaggedPointer|kTypeAny](27, 5551)
>>>   5553: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3)
>>>   5554: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32, 
>>> sparse:^^](5474, 5473)
>>>   5555: TypedStateValues[, sparse:..]
>>>   5556: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny, 
>>> sparse:^^......](5554, 5555)
>>>   5557: TypedStateValues[, sparse:.]
>>>   5558: FrameState[UNOPTIMIZED_FRAME, 92, PokeAt(0), 0x191e0015eab5 
>>> <SharedFunctionInfo test_div>](5553, 5556, 5557, 2, 8, 0)
>>>   5559: Branch[Unspecified, False](5485) -> B1160, B1159
>>>
>>>
>>> I have seen that Turboshaft has an EmitProjectionReducer (
>>> https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/assembler.h;drc=f5a7962861b208e9cf82e61c1fa9f8dc0d216a87;l=606)
>>>  
>>> and seems to be closely related to ValueNumbering. Without control inputs, 
>>> I can't see why ValueNumbering, or something similar, wouldn't remove the 
>>> second instance.  
>>>
>>> >  It could be an artifact of scheduling
>>>
>>> I will have a look at the scheduler.
>>>
>>> > However, this should be prevented by this check
>>>
>>> Yes, it's that check that I'm trying to remove :) and this works for 
>>> everything but projections. Surely we should be able to combine an 
>>> operation even if it has multiple users? 
>>>
>>> Regards,
>>> Sam
>>>
>>> On Wednesday, November 8, 2023 at 5:20:37 PM UTC dmerc...@google.com 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> > With a turbofan-only flow, the overflow operation and the projection 
>>>> will be duplicated for the second branch, and everything is fine.
>>>>
>>>> I'm not sure which phase could duplicate a Int32AddWithOverflow. I 
>>>> don't think that the Instruction Selector does this. It could be an 
>>>> artifact of scheduling (which happens before effect control linearization 
>>>> or before instruction selection). I'm not sure it's a conscious 
>>>> optimization though, or just a side-effect of the scheduler.
>>>>
>>>> > Is it legal for all projected values to be live across basic blocks? 
>>>> If so, who should be responsible for ensuring values are live where they 
>>>> need to be?
>>>>
>>>> Yes it is legal (and probably happens fairly often). As to who should 
>>>> ensure that the values are live... It sounds like here the bug is in the 
>>>> Instruction Selector. What probably happens is that when emitting the 1st 
>>>> branch, we decide that it can cover the Projection[1] and don't need to 
>>>> emit it, which is wrong. However, this should be prevented by this check 
>>>> https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/backend/x64/instruction-selector-x64.cc;l=4278:
>>>>  
>>>> when user=the_first_branch and value=projection[1], CanCover should return 
>>>> false, because the projection has another use.
>>>>
>>>> If you can provide an example that we can reproduce, I'd be happy to 
>>>> have a look, but without being able to reproduce, we're just speculating :/
>>>>
>>>> > I don't understand why it has a control input
>>>>
>>>> Projections don't really need control inputs, but I think that having 
>>>> control inputs in projections is somewhat useful for scheduling in 
>>>> Turbofan. We trying removing the control input (
>>>> https://crrev.com/c/4570503), and performance regressed in some cases.
>>>>
>>>> Cheers,
>>>> Darius
>>>>
>>>> On Wednesday, November 8, 2023 at 6:02:40 PM UTC+1 Matthias Liedtke 
>>>> wrote:
>>>>
>>>>> Hi Sam,
>>>>>
>>>>> Honestly I don't feel knowledgeable enough to answer this question, so 
>>>>> I hope some Turbofan experts can chime in.
>>>>> This line 
>>>>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/common-operator.cc;l=1622>
>>>>>  defines 
>>>>> that the Projection in Turbofan has one value input, a control input and 
>>>>> a 
>>>>> value output but no control output.
>>>>> I don't understand why it has a control input or what it exactly means 
>>>>> that an operation has a control input vs. let's say a Select operation 
>>>>> that 
>>>>> doesn't have one.
>>>>> Still, I'd assume that the Branch in B4 should have a control input 
>>>>> that at some point leads to the same control input as "2: Projection [1]" 
>>>>> and therefore I'd assume that it's valid as the control input of the 
>>>>> projection dominates the branch that uses it.
>>>>> Looking at the schedule, as the Projection is in B0, it is guaranteed 
>>>>> to be "executed" before B4, so the virtual register should be defined and 
>>>>> should be "available" / usable by all instructions following itself in B0 
>>>>> and in all instructions in blocks that are reachable from B0 directly or 
>>>>> indirectly.
>>>>> I don't think there is anything limiting the liveness of a virtual 
>>>>> register, so as long as a block A dominates another Block B, all virtual 
>>>>> registers defined in A are usable in B.
>>>>> So your snippet looks totally fine based on my (limited) understanding 
>>>>> of Turbofan.
>>>>>
>>>>> However, I'd assume it should be
>>>>>   1: Projection[0] (0)
>>>>>   2: Projection[1] (0)
>>>>> and not
>>>>>   1: Projection[0] (0)
>>>>>   2: Projection[1] (1)
>>>>> (i.e. both projections should have Int32AddWithOverflow as an input)
>>>>>
>>>>> Best regards,
>>>>> Matthias
>>>>>
>>>>>
>>>>> On Wed, Nov 8, 2023 at 5:24 PM Sam Parker-Haynes <sam.p...@arm.com> 
>>>>> wrote:
>>>>>
>>>>>> Hi Matthias,
>>>>>>
>>>>>> My RecreateSchedule example looks like this:
>>>>>>
>>>>>> B0:
>>>>>>   0: Int32AddWithOverflow
>>>>>>   1: Projection[0] (0)
>>>>>>   2: Projection[1] (1)
>>>>>>   Branch (2)
>>>>>> ...
>>>>>>
>>>>>> B4:
>>>>>>   Branch (2)    <-- how should this branch receive (2)
>>>>>>
>>>>>> I know I said it looks okay at the IR level, but since the IR isn't 
>>>>>> documented that may not be true ;) So, what are the semantics of a 
>>>>>> Projection..?
>>>>>> - Is the Projection[1] of an Overflow operation a part of the control 
>>>>>> chain?
>>>>>> - Is it legal for all projected values to be live across basic 
>>>>>> blocks? If so, who should be responsible for ensuring values are live 
>>>>>> where 
>>>>>> they need to be?
>>>>>>
>>>>>> Thanks,
>>>>>> Sam
>>>>>>
>>>>>> On Wednesday, November 8, 2023 at 2:15:58 PM UTC Sam Parker-Haynes 
>>>>>> wrote:
>>>>>>
>>>>>> Also, are any instruction selection tests being added for Turboshaft? 
>>>>>> Something like the equivalent of 
>>>>>> test/unittests/compiler/arm64/instruction-selector-arm64-unittest.cc?
>>>>>>
>>>>>> On Wednesday, November 8, 2023 at 1:26:12 PM UTC Sam Parker-Haynes 
>>>>>> wrote:
>>>>>>
>>>>>> EDIT: only passes with --no-turboshaft
>>>>>>
>>>>>> On Wednesday, November 8, 2023 at 1:20:03 PM UTC Sam Parker-Haynes 
>>>>>> wrote:
>>>>>>
>>>>>> Hi Matthias,
>>>>>>
>>>>>> Ah, I hadn't seen that, so yes I've been tweaking the OG selector 
>>>>>> function for arm64. On the x64 side of things, everything is fine with 
>>>>>> 'pure' turboshaft pipeline but, as with my arm64 checkout, the issue 
>>>>>> arises 
>>>>>> when modifying the turbofan selector function and using ---no-turboshaft.
>>>>>>
>>>>>> So, I guess the problem is with the RecreateSchedule?
>>>>>>
>>>>>> Thanks,
>>>>>> Sam 
>>>>>>
>>>>>> On Wednesday, November 8, 2023 at 1:12:56 PM UTC mlie...@google.com 
>>>>>> wrote:
>>>>>>
>>>>>> Hi Sam,
>>>>>>
>>>>>> I don't have any insights on x64 for VisitWordCompareZero but as a 
>>>>>> quick note, the TurboshaftAdapter instruction selection implementation 
>>>>>> for 
>>>>>> VisitWordCompareZero for arm64 was just added yesterday by me (
>>>>>> https://chromium-review.googlesource.com/c/v8/v8/+/5002003).
>>>>>>
>>>>>> So, just to clarify, I assume you are talking about the Turbofan 
>>>>>> {Node* / TurbofanAdapter} based instruction selection with Turboshaft 
>>>>>> optimizations followed by a RecreateSchedule?
>>>>>> If you do changes for the TurbofanAdapter instruction selection on 
>>>>>> arm64 for code that already has an implementation for the 
>>>>>> TurboshaftAdapter, could you add a TODO(mliedtke) on the Turboshaft 
>>>>>> implementation with some nice description and CC me on the gerrit 
>>>>>> changes, 
>>>>>> so these don't get lost when we switch over to the TurboshaftAdapter?
>>>>>>
>>>>>> Thanks and best regards,
>>>>>> Matthias
>>>>>>
>>>>>> On Wed, Nov 8, 2023 at 1:54 PM Sam Parker-Haynes <sam.p...@arm.com> 
>>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've been investigating rematerialising flag setting instructions in 
>>>>>> VisitWordCompareZero, just by removing the CanCover check, and something 
>>>>>> is 
>>>>>> going wrong with turboshaft.
>>>>>>
>>>>>> I have an arithmetic overflow operation, Int32AddWithOverflow, and 
>>>>>> the overflow check is used by two branch operations. With a 
>>>>>> turbofan-only 
>>>>>> flow, the overflow operation and the projection will be duplicated for 
>>>>>> the 
>>>>>> second branch, and everything is fine. But with turboshaft, neither the 
>>>>>> overflow operation or projection are duplicated and the second branch 
>>>>>> uses 
>>>>>> the original projection[1] value. This looks fine at the IR level, but 
>>>>>> ends 
>>>>>> up broken after isel when the register allocator comes across a virtual 
>>>>>> register without a definition. This happens for both x64 and arm64, so 
>>>>>> I'm 
>>>>>> assuming turboshaft is making some assumptions, based on the current 
>>>>>> behaviour, that are non-obvious to me.
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> Thanks!
>>>>>> Sam
>>>>>>
>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> -- 
>>>>>> v8-dev mailing list
>>>>>> v8-...@googlegroups.com
>>>>>> http://groups.google.com/group/v8-dev
>>>>>> --- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "v8-dev" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to v8-dev+un...@googlegroups.com.
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/v8-dev/175a4ba2-9d33-45bb-84e3-91d41a722846n%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/v8-dev/175a4ba2-9d33-45bb-84e3-91d41a722846n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>>> -- 
>>>>>> -- 
>>>>>> v8-dev mailing list
>>>>>> v8-...@googlegroups.com
>>>>>> http://groups.google.com/group/v8-dev
>>>>>> --- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "v8-dev" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to v8-dev+un...@googlegroups.com.
>>>>>>
>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/v8-dev/46693e6c-1216-493b-a104-2c9d41075987n%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/v8-dev/46693e6c-1216-493b-a104-2c9d41075987n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>>

-- 
-- 
v8-dev mailing list
v8-dev@googlegroups.com
http://groups.google.com/group/v8-dev
--- 
You received this message because you are subscribed to the Google Groups 
"v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to v8-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/v8-dev/3aabdaf2-7f9c-4b54-8b09-25441fb4b277n%40googlegroups.com.

Reply via email to