Re: [v8-dev] Optimizing Atomics for TurboFan

'Jaroslav Sevcik' via v8-dev Wed, 13 Jan 2016 23:03:03 -0800

Replies inline.

On Wed, Jan 13, 2016 at 9:52 PM, Ben Smith <bi...@chromium.org> wrote:


> Comments inline.
>
> On Wednesday, January 13, 2016 at 8:27:47 AM UTC-8, Jaroslav Sevcik wrote:
>>
>> I am sorry I am late to this thread. I would also prefer avoiding the
>> inline assembly. The stub approach seems less brittle. We should not be too
>> far from being able to generate stubs from Turbofan, which should mean
>> there would be just one implementation of each atomic.
>>
>
> I like this idea, but I'm not sure how to implement it. Quoting myself
> from above:
>
> The Atomics functions are polymorphic over the different TypedArray types
> (similar to ArrayBuffer keyed property access). So currently the functions
> check the TypedArray base type and forward to the current atomic
> instruction for that type. But TurboFan for asm.js should have enough type
> information to determine the correct TypedArray type without this check, so
> it makes sense that the codestubs should not have this check.
>
> So assuming we have codestubs like AtomicLoad8, AtomicLoad16, etc. how can
> I call these from FCG? It looks like the current way code stubs are called
> in FCG is via intrinsics (e.g. Math.pow or String.fromCharCode). But if I
> am hooking to the JavaScript functions then I'll have to match Atomics.load
> first, then do the TypedArray check in the FCG compiler so I can forward to
> the correct codestub. Looks like it should work, but I'm wondering if it is
> the right way to do things. It seems like the current path uses type
> feedback in an IC to handle this, though that seems like a lot more work,
> and AFAICT won't be optimized without CS anyway, is that correct?
>
>
I actually thought we would do the TypedArray check inside each of the stubs
(and then wire it up through the intrinsic). As a first step, we would just
call the stubs from FCG, CS and TF. This would avoid the overhead of the
JS->C call (which is the main cost in the current implementation).
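
To make the first step concrete, here is a minimal sketch of a stub that
does the TypedArray check internally. It is not V8 code: ElementKind,
TypedArrayView, LoadElementSeqCst and AtomicLoadStub are hypothetical names,
and the GCC/Clang builtin __atomic_load_n only stands in for whatever
instruction sequence the real stub would emit.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical element kinds for illustration; not V8's real enum.
enum class ElementKind { kInt8, kUint8, kInt16, kUint16, kInt32, kUint32 };

// Hypothetical stand-in for an integer TypedArray view over shared memory.
struct TypedArrayView {
  ElementKind kind;
  void* data;  // must be aligned for the element type
};

// Sequentially consistent load of one element of type T.
// __atomic_load_n is a GCC/Clang builtin, used here only as a placeholder
// for the instruction sequence a generated stub would contain.
template <typename T>
int64_t LoadElementSeqCst(const TypedArrayView& view, size_t index) {
  T* slot = static_cast<T*>(view.data) + index;
  return static_cast<int64_t>(__atomic_load_n(slot, __ATOMIC_SEQ_CST));
}

// One polymorphic "stub": the TypedArray check lives inside it, so every
// caller can share the same entry point. Bounds checks are omitted.
int64_t AtomicLoadStub(const TypedArrayView& view, size_t index) {
  switch (view.kind) {
    case ElementKind::kInt8:   return LoadElementSeqCst<int8_t>(view, index);
    case ElementKind::kUint8:  return LoadElementSeqCst<uint8_t>(view, index);
    case ElementKind::kInt16:  return LoadElementSeqCst<int16_t>(view, index);
    case ElementKind::kUint16: return LoadElementSeqCst<uint16_t>(view, index);
    case ElementKind::kInt32:  return LoadElementSeqCst<int32_t>(view, index);
    case ElementKind::kUint32: return LoadElementSeqCst<uint32_t>(view, index);
  }
  return 0;  // not reached for valid kinds
}
```

The switch is the per-call cost that a later specialization on a constant
typed array could remove.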

As the next step, we could specialize for constant typed arrays in TF. This
would avoid the switch on types and the (arguably small) overhead of the stub
call for asm.js code, because there we specialize to the constant typed
arrays in the context. If done naively, we would duplicate the code for
emitting the atomic ops. One way to avoid the duplication would be to
create macro-assembler wrappers for emitting the right sequence for each
atomic and then use them in both the stubs and in the TF codegen.
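
A toy illustration of the shared-wrapper idea, assuming a made-up
ToyAssembler that records mnemonics instead of emitting machine code. The
mnemonics ("store32", "fence", "stub_prologue", "stub_return") and all
function names are invented for this sketch and say nothing about the real
sequence on any architecture. The only point is that the stub builder and
the inline codegen both go through a single wrapper.

```cpp
#include <string>
#include <vector>

// Hypothetical toy assembler: records mnemonics in emission order.
class ToyAssembler {
 public:
  void Emit(const std::string& op) { code_.push_back(op); }
  const std::vector<std::string>& code() const { return code_; }

 private:
  std::vector<std::string> code_;
};

// The single shared wrapper. If the sequence for an atomic store ever
// changes, it changes here and nowhere else.
void EmitAtomicStore(ToyAssembler& masm, int bits) {
  masm.Emit("store" + std::to_string(bits));
  masm.Emit("fence");
}

// Stub path: prologue, the shared sequence, then return.
std::vector<std::string> BuildAtomicStoreStub(int bits) {
  ToyAssembler masm;
  masm.Emit("stub_prologue");
  EmitAtomicStore(masm, bits);
  masm.Emit("stub_return");
  return masm.code();
}

// Inline path (the specialized TF case): only the shared sequence.
std::vector<std::string> BuildInlineAtomicStore(int bits) {
  ToyAssembler masm;
  EmitAtomicStore(masm, bits);
  return masm.code();
}
```

With this shape the stub body and the inlined code cannot drift apart,
which is what we need if both are going to touch the same memory.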

If we feel this has to be optimized even more, we could implement the ICs
for the atomic ops to gather feedback and then consume it in TF and CS. I
do not think the added complexity would pay for itself.

Does it make any sense?




> And speaking of CrankShaft, doesn't implementing the type check in FCG
> require me to do the same check in CS? Or is there a way to share this code
> between the two?
>
>
>
>>
>> As you say, the biggest problem is the simulators, but I am hoping we
>> could just have a global lock for all the atomic instructions + fences in
>> the simulators and use that to get synchronization.
>>
>
> Yeah, if all compilers use the same stub and always run through the
> simulator it should be easier to reason about. I was more concerned about
> how to handle it when some code is running through the simulator emulating
> atomic instructions for the target architecture, and some code is
> implemented as inline assembly in the host architecture running real atomic
> instructions.
>

Yeah, you are right, that just would not work. This is another argument for
only using the atomic ops that we generate.
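
For the simulator side, the global-lock idea quoted above could look
roughly like the sketch below. ToySimulator and its methods are hypothetical
and are not the V8 simulator interface. The lock only provides
synchronization if every atomic access goes through it, which is exactly why
mixing in host inline assembly would break it.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical simulator model: every simulated atomic instruction takes
// one process-wide lock before touching simulated memory.
class ToySimulator {
 public:
  explicit ToySimulator(size_t words) : memory_(words, 0) {}

  int32_t AtomicLoad(size_t index) {
    std::lock_guard<std::mutex> guard(GlobalLock());
    return memory_[index];
  }

  void AtomicStore(size_t index, int32_t value) {
    std::lock_guard<std::mutex> guard(GlobalLock());
    memory_[index] = value;
  }

  // Adds `value` to the slot and returns the previous contents.
  int32_t AtomicAdd(size_t index, int32_t value) {
    std::lock_guard<std::mutex> guard(GlobalLock());
    int32_t old_value = memory_[index];
    memory_[index] = old_value + value;
    return old_value;
  }

 private:
  // A single lock shared by all simulator instances.
  static std::mutex& GlobalLock() {
    static std::mutex lock;
    return lock;
  }

  std::vector<int32_t> memory_;
};
```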


>
>>
>> Cheers, Jaro
>>
>> On Wed, Jan 13, 2016 at 6:11 AM, Benedikt Meurer <bme...@chromium.org>
>> wrote:
>>
>>> Hey Ben,
>>>
>>> Sorry for the delay. I'm still not 100% sure why we need the inline
>>> assembly. As long as we are C++-only we should be able to use the atomics
>>> provided by Chrome and/or C++? Or am I missing something here?
>>>
>>
> The atomics provided by Chrome don't have the properties we want (they
> aren't sequentially consistent), and they are missing many of the functions
> we want too. The C++ atomics and intrinsics do not have explicit
> instruction sequences, so we can't safely use them in concert with the
> instruction sequences we'd generate in TurboFan. AIUI, there are often
> alternate incompatible instruction sequences that can be used to implement
> C++11 atomics (see this document
> <https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html>), so we need to
> be sure we're generating the same ones in all cases.
>

Agreed.
