Re: [v8-dev] Optimizing Atomics for TurboFan

Ben Smith Thu, 14 Jan 2016 12:08:59 -0800


On Wednesday, January 13, 2016 at 11:02:13 PM UTC-8, Jaroslav Sevcik wrote:
>
> Replies inline.
>
> On Wed, Jan 13, 2016 at 9:52 PM, Ben Smith <bi...@chromium.org> wrote:
>
>> Comments inline.
>>
>> On Wednesday, January 13, 2016 at 8:27:47 AM UTC-8, Jaroslav Sevcik wrote:
>>>
>>> I am sorry I am late to this thread. I would also prefer avoiding the 
>>> inline assembly. The stub approach seems less brittle. We should not be too 
>>> far from being able to generate stubs from Turbofan, which should mean 
>>> there would be just one implementation of each atomic.
>>>
>>
>> I like this idea, but I'm not sure how to implement it. Quoting myself 
>> from above:
>>
>> The Atomics functions are polymorphic over the different TypedArray types 
>> (similar to ArrayBuffer keyed property access). So currently the functions 
>> check the TypedArray base type and forward to the current atomic 
>> instruction for that type. But TurboFan for asm.js should have enough type 
>> information to determine the correct TypedArray type without this check, so 
>> it makes sense that the codestubs should not have this check.
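A rough Python model of the run-time dispatch described above. It shows what a stub-level type check looks like and what a TurboFan path specialized to a known array type could skip. `ElementKind`, `TypedArray`, `LOAD_STUBS` and `atomics_load_generic` are hypothetical names. The per-kind loads loosely mirror the AtomicLoad8/AtomicLoad16 stubs proposed in this thread. Only a few integer kinds are modelled, and the reads are plain `struct` reads rather than real atomic instructions.

```python
import struct
from enum import Enum


class ElementKind(Enum):
    """Hypothetical integer element kinds: (struct format char, byte width)."""
    INT8 = ("b", 1)
    UINT8 = ("B", 1)
    INT16 = ("h", 2)
    UINT16 = ("H", 2)
    INT32 = ("i", 4)
    UINT32 = ("I", 4)


class TypedArray:
    """Toy stand-in for a TypedArray: a byte buffer plus an element kind."""

    def __init__(self, kind, length):
        self.kind = kind
        self.buffer = bytearray(length * kind.value[1])


def _make_load(kind):
    """Build a per-kind load that knows its element type statically (no check)."""
    fmt, width = kind.value

    def load(array, index):
        # Little-endian read of one element; NOT an atomic instruction here.
        return struct.unpack_from("<" + fmt, array.buffer, index * width)[0]

    return load


# One specialized "stub" per element kind.
LOAD_STUBS = {kind: _make_load(kind) for kind in ElementKind}


def atomics_load_generic(array, index):
    """Generic entry point: check the element kind at run time, then forward."""
    stub = LOAD_STUBS.get(array.kind)
    if stub is None:
        raise TypeError("Atomics.load: unsupported TypedArray type")
    return stub(array, index)
```

A caller that already knows the array is an Int16Array can call `LOAD_STUBS[ElementKind.INT16]` directly. That is the analogue of dropping the check when the type is known at compile time.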
>>
>> So assuming we have codestubs like AtomicLoad8, AtomicLoad16, etc., how 
>> can I call these from FCG? It looks like the current way code stubs are 
>> called in FCG is via intrinsics (e.g. Math.pow or String.fromCharCode). But 
>> if I am hooking into the JavaScript functions, then I'll have to match 
>> Atomics.load first, then do the TypedArray check in the FCG compiler so I 
>> can forward to the correct codestub. It looks like it should work, but I'm 
>> wondering if it is the right way to do things. It seems like the current 
>> path uses type feedback in an IC to handle this, though that seems like a 
>> lot more work, and AFAICT won't be optimized without CS anyway. Is that 
>> correct?
>>
>>
> I actually thought we would do the TypedArray check inside each stub 
> (and then wire it up through the intrinsic). As a first step, we would 
> just call the stubs from FCG, CS and TF. This would avoid the overhead of 
> the JS->C call (which is the main cost in the current implementation).
>
> As the next step, we could specialize for constant typed arrays in TF. This 
> would avoid the switch on types and the (arguably small) overhead of the 
> stub call for asm.js code, because there we specialize to the constant 
> typed arrays in the context. If done naively, we would duplicate the code 
> for emitting the atomic ops. One way to avoid the duplication would be to 
> create macro-assembler wrappers that emit the right sequence for each 
> atomic and then use them in both the stubs and the TF codegen.
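One way to picture the "one wrapper, two callers" idea, again as a Python model rather than V8's actual MacroAssembler interface. `Assembler`, `emit_atomic_load`, `generate_generic_stub`, `lower_for_constant_array` and the instruction strings are all hypothetical placeholders. Element kinds are keyed only by bit width for brevity.

```python
class Assembler:
    """Hypothetical stand-in for a macro-assembler: records emitted instructions."""

    def __init__(self):
        self.code = []

    def emit(self, instr):
        self.code.append(instr)


WIDTHS = (8, 16, 32)


def emit_atomic_load(asm, width):
    """Shared wrapper: the single place that knows the load sequence per width."""
    # Placeholder mnemonic; a real wrapper would emit the target's exact sequence.
    asm.emit(f"atomic_load{width} result, [base + index]")


def generate_generic_stub(asm):
    """Stub path: switch on the element width at run time, one case per width."""
    asm.emit("switch element_width")
    for width in WIDTHS:
        asm.emit(f"case_{width}:")
        emit_atomic_load(asm, width)
        asm.emit("return")
    asm.emit("default: throw TypeError")


def lower_for_constant_array(asm, width):
    """TurboFan path: the typed array is a known constant, so no switch is emitted."""
    emit_atomic_load(asm, width)
```

Because both paths call `emit_atomic_load`, a change to the sequence for a given width is made in exactly one place. The specialized path simply omits the switch.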
>


Ah, that makes sense, thanks.
 

>
> If we feel this has to be optimized even more, we could implement the ICs 
> for the atomic ops to gather feedback and then consume it in TF and CS. I 
> do not think the added complexity would pay for itself.
>
> Does it make any sense?
>

Yes, thanks. I'll start working on the stub for Atomics.load first. When I 
have that ready I'll send it to you in a CL so we can see if I'm on the 
right track.
 

>
>  
>
>
>> And speaking of CrankShaft, doesn't implementing the type check in FCG 
>> require me to do the same check in CS? Or is there a way to share this code 
>> between the two?
>>
>>  
>>
>>>
>>> As you say, the biggest problem is the simulators, but I am hoping we 
>>> could just have a global lock for all the atomic instructions + fences in 
>>> the simulators and use that to get synchronization.
>>>
>>
>> Yeah, if all compilers use the same stub and always run through the 
>> simulator it should be easier to reason about. I was more concerned about 
>> how to handle it when some code is running through the simulator emulating 
>> atomic instructions for the target architecture, and some code is 
>> implemented as inline assembly in the host architecture running real atomic 
>> instructions.
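A minimal sketch of the single-global-lock idea, assuming a simulator whose memory is modelled as a Python list. `SimulatedMemory` and its methods are hypothetical. The return values follow the JavaScript `Atomics.add` and `Atomics.compareExchange` convention of returning the old value. The lock only provides atomicity for operations that go through it. Native host code using real atomic instructions would never take `_global_lock`, so it would not be serialized against these operations, which is the mixed case described above.

```python
import threading


class SimulatedMemory:
    """Hypothetical simulator memory: every atomic op and fence takes one global lock."""

    def __init__(self, size):
        self._words = [0] * size
        self._global_lock = threading.Lock()  # shared by all atomics and fences

    def atomic_load(self, addr):
        with self._global_lock:
            return self._words[addr]

    def atomic_store(self, addr, value):
        with self._global_lock:
            self._words[addr] = value

    def atomic_add(self, addr, delta):
        """Read-modify-write under the lock; returns the old value."""
        with self._global_lock:
            old = self._words[addr]
            self._words[addr] = old + delta
            return old

    def compare_exchange(self, addr, expected, replacement):
        """Stores replacement only if the old value equals expected; returns old."""
        with self._global_lock:
            old = self._words[addr]
            if old == expected:
                self._words[addr] = replacement
            return old

    def fence(self):
        # Taking and releasing the same lock serializes the fence with
        # every other simulated atomic operation.
        with self._global_lock:
            pass


def worker(mem, iterations):
    for _ in range(iterations):
        mem.atomic_add(0, 1)


# Usage: four simulated threads increment the same word concurrently.
mem = SimulatedMemory(1)
threads = [threading.Thread(target=worker, args=(mem, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```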
>>
>  
> Yeah, you are right, that just would not work. This is another argument 
> for only using the atomic ops that we generate.
>
>  
>>
>>>
>>> Cheers, Jaro
>>>
>>> On Wed, Jan 13, 2016 at 6:11 AM, Benedikt Meurer <bme...@chromium.org> 
>>> wrote:
>>>
>>>> Hey Ben,
>>>>
>>>> Sorry for the delay. I'm still not 100% sure why we need the inline 
>>>> assembly. As long as we are C++-only we should be able to use the atomics 
>>>> provided by Chrome and/or C++? Or am I missing something here?
>>>>
>>>
>> The atomics provided by Chrome don't have the properties we want (they 
>> aren't sequentially consistent), and they are missing many of the functions 
>> we want too. The C++ atomics and intrinsics do not guarantee specific 
>> instruction sequences, so we can't safely use them in concert with the 
>> instruction sequences we'd generate in TurboFan. AIUI, there are often 
>> alternative, mutually incompatible instruction sequences that can be used 
>> to implement C++11 atomics (see this document 
>> <https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html>), so we need to 
>> be sure we're generating the same ones in all cases.
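To make the incompatibility concrete: the linked mappings page describes two x86 schemes for sequentially consistent accesses. One puts the full barrier on the store (XCHG). The other puts it on the load (MFENCE before a plain MOV). Each is correct on its own, but x86 may reorder a plain store with a later load from a different address. A plain-MOV store from one scheme followed by a plain-MOV load from the other therefore has no barrier between them. The small checker below models only that store-then-load case; the instruction names are simplified labels, not assembler syntax.

```python
# Full barriers under x86-TSO: MFENCE and locked read-modify-writes
# (XCHG with a memory operand is implicitly locked).
FULL_BARRIERS = {"MFENCE", "XCHG", "LOCK XADD"}
WRITES = {"MOV store", "XCHG"}
READS = {"MOV load", "LOCK XADD"}

# Scheme A: barrier on the seq_cst store.
SCHEME_A = {"load": ["MOV load"], "store": ["XCHG"]}
# Scheme B: barrier on the seq_cst load.
SCHEME_B = {"load": ["MFENCE", "MOV load"], "store": ["MOV store"]}


def store_then_load_ordered(store_seq, load_seq):
    """True if a full barrier lies between the store's write and the later
    load's read (inclusive), which x86 needs to keep them in program order."""
    write_idx = max(i for i, ins in enumerate(store_seq) if ins in WRITES)
    read_idx = min(i for i, ins in enumerate(load_seq) if ins in READS)
    window = store_seq[write_idx:] + load_seq[: read_idx + 1]
    return any(ins in FULL_BARRIERS for ins in window)
```

Mixing is unsafe in only one direction here, but one is enough. If one side stores with scheme B and the other loads with scheme A, the store-buffering outcome that sequential consistency forbids becomes observable.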
>>
>
> Agreed.
>

-- 
-- 
v8-dev mailing list
v8-dev@googlegroups.com
http://groups.google.com/group/v8-dev