Author: Richard Plangger <r...@pasra.at>
Branch: vecopt-merge
Changeset: r79117:3c01987639c7
Date: 2015-08-21 15:43 +0200
http://bitbucket.org/pypy/pypy/changeset/3c01987639c7/

Log: documentation additions (command line flags), added description of the ABC optimization, note on limitations

diff --git a/rpython/doc/jit/index.rst b/rpython/doc/jit/index.rst
--- a/rpython/doc/jit/index.rst
+++ b/rpython/doc/jit/index.rst
@@ -25,6 +25,7 @@
    pyjitpl5
    optimizer
    virtualizable
+   vectorization
 
 - :doc:`Overview <overview>`: motivating our approach
 
diff --git a/rpython/doc/jit/optimizer.rst b/rpython/doc/jit/optimizer.rst
--- a/rpython/doc/jit/optimizer.rst
+++ b/rpython/doc/jit/optimizer.rst
@@ -178,6 +178,10 @@
 It is prepended to all optimizations and thus extends the Optimizer class
 and unrolls the loop once before it proceeds.
 
+Vectorization
+-------------
+
+- :doc:`Vectorization <vectorization>`
 
 What is missing from this document
 ----------------------------------
diff --git a/rpython/doc/jit/vectorization.rst b/rpython/doc/jit/vectorization.rst
--- a/rpython/doc/jit/vectorization.rst
+++ b/rpython/doc/jit/vectorization.rst
@@ -7,6 +7,18 @@
 that is that they use the same index variable and offset can be expressed as a
 a linear or affine combination.
 
+Command line flags:
+
+* --jit vec=1: turns on the vectorization for marked jitdrivers
+  (e.g. those in the NumPyPy module).
+* --jit vec_all=1: turns on the vectorization for any jit driver. See parameters for
+  the filtering heuristics of traces.
+* --jit vec_ratio=2: A number from 0 to 10 that represents a real number (vec_ratio / 10).
+  This filters traces if vec_all is enabled. N is the trace count then the number of
+  vector transformable operations (add_int -> vec_add_int) M, the following must hold:
+  M / N >= (vec_ratio / 10)
+* --jit vec_length=60: The maximum number of trace instructions the vectorizer filters for.
+
 Features
 --------
 
@@ -38,6 +50,28 @@
 load/store instructions) are not removed. The backend removes these instructions
 while assembling the trace.
 
+In addition a simple heuristic (enabled by --jit vec_all=1) tries to remove
+array bound checks for application level loops. It tries to identify the array
+bound checks and adds a transitive guard at the top of the loop::
+
+    label(...)
+    ...
+    guard(i < n) # index guard
+    ...
+    guard(i < len(a))
+    a = load(..., i, ...)
+    ...
+    jump(...)
+    # becomes
+    guard(n < len(a))
+    label(...)
+    guard(i < n) # index guard
+    ...
+    a = load(..., i, ...)
+    ...
+    jump(...)
+
+
 Future Work and Limitations
 ---------------------------
 
@@ -54,5 +88,9 @@
   to have 2 xmm registers (one filled with zero bits and the other with one every bit).
   This cuts down 2 instructions for guard checking, trading for higher register pressure.
 * prod, sum are only supported by 64 bit data types
+* isomorphic function prevents the following cases for combination into a pair:
+  1) getarrayitem_gc, getarrayitem_gc_pure
+  2) int_add(v,1), int_sub(v,-1)
 
 .. _PMUL: http://stackoverflow.com/questions/8866973/can-long-integer-routines-benefit-from-sse/8867025#8867025
+
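
The vec_ratio filter described in the diff can be illustrated with a minimal Python sketch. This is not the RPython implementation: the function name `passes_vec_ratio`, the set `VECTORIZABLE_OPS`, and the operation names in it are hypothetical and chosen only for illustration. The single thing taken from the documentation text is the inequality M / N >= vec_ratio / 10. Rejecting an empty trace is an additional assumption.

```python
# Hypothetical set of operation names that have a vector counterpart.
# The real list lives in the vectorizer and is not reproduced here.
VECTORIZABLE_OPS = {"int_add", "float_add", "raw_load", "raw_store"}


def passes_vec_ratio(trace_ops, vec_ratio=2):
    """Return True if M / N >= vec_ratio / 10.

    N is the number of operations in the trace and M is the number of
    operations that could be transformed into vector operations.
    """
    n = len(trace_ops)
    if n == 0:
        # Assumption: an empty trace is never worth vectorizing.
        return False
    m = sum(1 for op in trace_ops if op in VECTORIZABLE_OPS)
    # Compare in integers (M * 10 >= vec_ratio * N) to avoid float rounding.
    return m * 10 >= vec_ratio * n


trace = ["int_add", "guard_true", "int_lt", "raw_load", "jump"]
print(passes_vec_ratio(trace, vec_ratio=2))
print(passes_vec_ratio(trace, vec_ratio=5))
```

In this sketch two of the five operations are vectorizable, so the trace passes with the default ratio of 2 (0.4 >= 0.2) and is filtered out with a ratio of 5 (0.4 < 0.5).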
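
The array bound check (ABC) transformation from the diff can also be mimicked at application level. The following Python sketch is only an analogy for what happens on the trace: `sum_checked` and `sum_hoisted` are hypothetical names, and falling back to the checked loop stands in for a failing guard leaving the optimized trace. The sketch tests `n <= len(a)`, which is enough for `i < n` to imply `i < len(a)`; the documentation writes the hoisted guard as `n < len(a)`.

```python
def sum_checked(a, n):
    """Loop with the bounds check inside the body, as in the original trace."""
    total = 0
    i = 0
    while i < n:                # index guard
        if not i < len(a):      # bounds check executed on every iteration
            raise IndexError("index out of range")
        total += a[i]
        i += 1
    return total


def sum_hoisted(a, n):
    """Same loop with one transitive guard in front of it."""
    if not n <= len(a):
        # Guard failed: fall back to the version that checks every index.
        return sum_checked(a, n)
    total = 0
    i = 0
    while i < n:                # index guard; i < n <= len(a) holds here
        total += a[i]           # no per-iteration bounds check needed
        i += 1
    return total


print(sum_hoisted([1, 2, 3], 3))
```

Both functions return the same result for valid inputs and raise the same `IndexError` when `n` exceeds the array length; the hoisted version just pays for the comparison once instead of once per iteration.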