Hi Nadav,

----- Original Message -----
> Hi Dan!
> > On Sep 28, 2014, at 6:44 AM, Dan Gohman <sunf...@mozilla.com> wrote:
> > 
> > Hi Nadav,
> > 
> > I agree with much of your assessment of the proposed SIMD.js API.
> > However, I don't believe its unsuitability for some problems
> > invalidates it for solving other very important problems, which it is
> > well suited for. Performance portability is actually one of SIMD.js'
> > biggest strengths: it's not the kind of performance portability that
> > aims for a consistent percentage of peak on every machine (which, as you
> > note, of course an explicit 128-bit SIMD API won't achieve), it's the
> > kind of performance portability that achieves predictable performance
> > and minimizes surprises across machines (though yes, there are some
> > unavoidable ones, but overall the picture is quite good).
> There is a tradeoff between the performance portability of the SIMD.js ISA
> and its usefulness. A small number of instructions (that only target 32-bit
> data types, no masks, etc.) is not useful for developing non-trivial vector
> programs. You need 16bit vector elements to support WebGL vertex indices,
> and lane-masking for implementing predicated control flow for programs like
> ray tracers. Introducing a large number of vector instructions will expose
> the performance portability problems. I don’t believe that there is a sweet
> spot in this tradeoff. I don’t think that we can find a small set of
> instructions that will be useful for writing non-trivial vector code that is
> performance portable.

My belief in the existence of a sweet spot is based on looking at other 
systems, hardware and software, that have already gone there.

For an interesting example, take a look at this page:


Every SIMD operation used in that article is directly supported by a 
corresponding function in SIMD.js today. We do have an open question on whether 
we should do something different for the rsqrt instruction, since the hardware 
only provides an approximation. In this case the code requires some 
Newton-Raphson, which may give us some flexibility, but several things are 
possible there. And of course, sweet spot doesn't mean cure-all.
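
As a sketch of the kind of refinement involved (plain scalar JS for clarity; the function name is mine, not part of the SIMD.js API), one Newton-Raphson step takes a rough hardware-style rsqrt estimate most of the way to full precision:

```javascript
// Refine a rough reciprocal-square-root estimate with one
// Newton-Raphson iteration: y' = y * (1.5 - 0.5 * x * y * y).
// Hardware rsqrt instructions (e.g. SSE rsqrtps) return only an
// approximation; here a deliberately sloppy estimate stands in for it.
function refineRsqrt(x, y0) {
  return y0 * (1.5 - 0.5 * x * y0 * y0);
}

const x = 2;
const rough = 0.7;                    // true 1/sqrt(2) is ~0.7071
const refined = refineRsqrt(x, rough);
```

Each iteration roughly doubles the number of correct bits, which is why one or two steps after the hardware estimate is a common pattern.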

Also, I am preparing to propose that SIMD.js handle 16-bit vector elements too 
(int16x8). It fits pretty naturally into the overall model. There are some 
challenges on some architectures, but there are challenges with alternative 
approaches too, and overall the story looks good.
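
To make the lane layout concrete, here is a polyfill-style sketch in plain JS of what an int16x8 addition would compute (the helper name is mine, not the proposed API; Int16Array supplies the 16-bit wrap-around that hardware lanes have):

```javascript
// Polyfill-style sketch: lane-wise addition of two 8-lane vectors of
// 16-bit integers. Assignment into an Int16Array truncates each lane
// to 16 bits, matching hardware wrap-around semantics.
function int16x8Add(a, b) {
  const out = new Int16Array(8);
  for (let i = 0; i < 8; i++) out[i] = a[i] + b[i];  // wraps mod 2^16
  return out;
}

const a = Int16Array.of(1, 2, 3, 4, 32767, -32768, 100, -100);
const b = Int16Array.of(1, 1, 1, 1, 1, -1, -100, 100);
const sum = int16x8Add(a, b);
// lane 4 wraps: 32767 + 1 becomes -32768 in 16-bit arithmetic
```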

Other changes are being discussed too. In general, the SIMD.js spec is 
still evolving; participation is welcome :-).

> > This is an example of a weakness of depending on automatic vectorization
> > alone. High-level language features create complications which can lead
> > to surprising performance problems. Compiler transformations to target
> > specialized hardware features often have widely varying applicability.
> > Expensive analyses can sometimes enable more and better vectorization,
> > but when a compiler has to do an expensive complex analysis in order to
> > optimize, it's unlikely that a programmer can count on other compilers
> > doing the exact same analysis and optimizing in all the same cases. This
> > is a problem we already face in many areas of compilers, but it's more
> > pronounced with vectorization than many other optimizations.
> I agree with this argument. Compiler optimizations are unpredictable. You
> never know when the register allocator will decide to spill a variable
> inside a hot loop, or when a memory operation will confuse the alias
> analysis. I also agree that loop vectorization is especially sensitive.
> However, it looks like the kind of vectorization that is needed to replace
> SIMD.js is a very simple SLP vectorization
> <http://llvm.org/docs/Vectorizers.html#the-slp-vectorizer> (BB
> vectorization). It is really easy for a compiler to combine a few scalar
> arithmetic operations into a vector. LLVM’s SLP-vectorizer supports
> vectorization of computations across basic blocks and succeeds in surprising
> places, like vectorization of STDLIB code where the ‘begin' and ‘end'
> iterators fit into a 128-bit register!

That's a surprising trick!

I agree that SLP vectorization doesn't have the same level of "performance 
cliff" as loop vectorization. And, it may be a desirable thing for JS JITs to 
start doing.
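
For concreteness, the shape SLP vectorization recognizes is a group of independent, isomorphic scalar operations on adjacent data (plain JS sketch; the function name is illustrative):

```javascript
// Four independent, isomorphic adds on adjacent elements: exactly the
// pattern an SLP (basic-block) vectorizer can fuse into a single
// 128-bit vector add, with no loop analysis required.
function add4(dst, a, b, i) {
  dst[i]     = a[i]     + b[i];
  dst[i + 1] = a[i + 1] + b[i + 1];
  dst[i + 2] = a[i + 2] + b[i + 2];
  dst[i + 3] = a[i + 3] + b[i + 3];
}

const a = Float32Array.of(1, 2, 3, 4);
const b = Float32Array.of(10, 20, 30, 40);
const dst = new Float32Array(4);
add4(dst, a, b, 0);
```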

Even so, there is still value in an explicit SIMD API in the present. For the 
core features, instead of giving developers sets of expression patterns to 
follow to ensure SLP recognition, we are giving names to those patterns and 
letting developers identify which patterns they wish to use by their names. We 
can coordinate, compare, and standardize them by name across browsers, and in 
the future we may make a variety of interesting extensions to the API which 
developers will be able to feature-test for.

And if, in the future, SLP vectorization proves itself reliable enough in JS, 
then we can drop our custom JIT implementations of SIMD.js and just use the 
polyfill again, and SIMD.js as a language feature can just fade away. The 
footprint in the language is quite minimal. And also, work done on SIMD.js 
won't have been wasted, because a lot of this code is code that would be needed 
to support an auto-vectorizer as well. In fact, SIMD.js may be a natural step 
toward the future you propose. We may also observe that LLVM itself took this 
route, with explicit SIMD constructs well established before it added 
auto-vectorization on top of them.

> > In contrast, the proposed SIMD.js has the property that code using it
> > will not depend on expensive compiler analysis in the JIT, and is much
> > more likely to deliver predictable performance in practice between
> > different JIT implementations and across a very practical variety of
> > hardware architectures.
> Performance portability across JITs should not motivate us to solve a
> compiler problem in the language itself. JITs should continue to evolve and
> learn new tricks. Introducing new language features increases the barrier of
> entry for new JavaScript implementations.

New JITs not concerned with SIMD optimization can use the polyfill.

New JITs which do wish to optimize SIMD code will find SIMD.js much easier to 
implement than auto-vectorization. It builds on typed values, something the JS 
language is already moving to, and otherwise it just adds a bunch of 
straightforward functions which map to simple instruction sequences -- many of 
them being instruction sequences that an auto-vectorizing JIT would also need.
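
A minimal sketch of what one such polyfill function looks like (the API shape is approximated from the SIMD.js drafts; the names here are illustrative): a plain function over four lanes that a JIT may later replace with a single vector instruction.

```javascript
// Minimal Float32x4-style polyfill sketch. Math.fround rounds each
// lane to single precision, the way 32-bit hardware lanes would.
const float32 = Math.fround;

function Float32x4(x, y, z, w) {
  return { x: float32(x), y: float32(y), z: float32(z), w: float32(w) };
}

// Lane-wise add: in an optimizing JIT this whole call maps to one
// 128-bit add (e.g. addps on SSE); in the polyfill it is just scalar code.
function float32x4Add(a, b) {
  return Float32x4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
}

const v = float32x4Add(Float32x4(1, 2, 3, 4), Float32x4(0.5, 0.5, 0.5, 0.5));
```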

> > In fact, a good example of short and long vector models coexisting is in
> > these popular GPU programming models that you mentioned, where short
> > vectors represent things in the problem domains like colors and
> > coordinates, and are then broken down by the compiler to participate in
> > the long vectors, as you described. It's very plausible that the
> > proposed SIMD.js could be adapted to combine with a future long-vector
> > approach in the same way.
> Data-parallel languages like GLSL and OpenCL are statically typed and vector
> types are used to increase the developer productivity. Using vector types in
> data-parallel languages often hurts performance because it forces the memory
> layout to be AOS instead of SOA. In JavaScript, the library Three.js
> <http://threejs.org/> introduces data types such as “THREE.Vector3” that are
> used to describe the problem domain, and not to accelerate code.

On GPUs like Mali-T600 or many AMD GPU architectures, there is a natural float4 
type and other 128-bit types in hardware. Many GPUs have designs that naturally 
fit concepts in the graphics problem domain.

Also, if developers wish to use data types which increase their productivity or 
which are part of the problem domain, then they may wish to use AOS rather than 
SOA regardless of whether the underlying type is "SIMD" or not. It is indeed 
always an interesting question whether it's desirable to decrease developer 
productivity in order to specialize for performance on some platforms.
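
The layout tradeoff under discussion can be made concrete with typed arrays (plain JS sketch; the helper name is mine):

```javascript
// AOS: each point's fields are adjacent -- natural for problem-domain
// types like a Vector3, and friendly to 128-bit short-vector loads:
//   [x0, y0, z0, x1, y1, z1, ...]
// SOA: each field gets its own array -- what long-vector / GPU-style
// compilers prefer, since one wide load grabs many x's at once:
//   xs = [x0, x1, ...], ys = [y0, y1, ...], zs = [z0, z1, ...]
function aosToSoa(aos, n) {
  const xs = new Float32Array(n);
  const ys = new Float32Array(n);
  const zs = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    xs[i] = aos[3 * i];
    ys[i] = aos[3 * i + 1];
    zs[i] = aos[3 * i + 2];
  }
  return { xs, ys, zs };
}

const aos = Float32Array.of(1, 2, 3, 4, 5, 6);  // two (x, y, z) points
const soa = aosToSoa(aos, 2);
```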

webkit-dev mailing list
