Hi Armin, hi all, below are my 2 cents, with some background on advanced optimizations like cross-loop optimization and automatic vectorization, which others have mentioned without references. These optimizations target numerical code like the one discussed here (and are mostly specific to it), in compiled languages like C/C++/Fortran.
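To make concrete what "cross-loop" means for the a + b - c example discussed in this thread: if each array operator is implemented as its own loop, a + b - c executes as two separate passes with a temporary vector in between, which is exactly what loop fusion removes. A minimal Python sketch of the naive, unfused version (the Vec class here is made up purely for illustration):

    # Illustrative only: "a + b - c" when each operator is its own loop.
    # Each of + and - below runs a full pass and allocates a temporary,
    # which is what cross-loop optimization (loop fusion) gets rid of.

    class Vec(object):
        def __init__(self, data):
            self.data = list(data)

        def __add__(self, other):
            # first loop: full pass over a and b, allocates a temporary
            return Vec(x + y for x, y in zip(self.data, other.data))

        def __sub__(self, other):
            # second loop: another full pass, reading the temporary back
            return Vec(x - y for x, y in zip(self.data, other.data))

    a = Vec([1.0, 2.0, 3.0])
    b = Vec([10.0, 20.0, 30.0])
    c = Vec([0.5, 0.5, 0.5])
    d = a + b - c          # two loops, one temporary vector
    print(d.data)          # [10.5, 21.5, 32.5]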
On Fri, Dec 17, 2010 at 13:44, Armin Rigo <[email protected]> wrote:
> Hi Alex,
>
> On Thu, Dec 16, 2010 at 8:28 PM, Alex Gaynor <[email protected]> wrote:
>>> Regarding this - I was thinking about having a + b - c create a
>>> bytecode that would be executed using small interpreter with a
>>> jit-merge-point and a hint "can be vectorized".
>>
>> That seems like a pretty big special case, why not work at the larger idea
>> of cross-loop optimizations?
>
> We have cross-loop optimizations.  Unless I'm misunderstanding you, we
> have already done it.  That's how for example map(f, somelist) gets
> JITted: it is a small loop that contains a call to the "normal" loop
> of the interpreter for f().  This is JITted as the loop from map()
> containing inline code from the interpreter loop.

Cross-loop optimizations are useful to optimize C/Fortran numerical code.
For code like a + b - c, you want your compiler to optimize together
different loops from the application (the loop implementing a + b and the
one subtracting c from the result). For instance, if a, b, c are vectors,
a + b - c is equivalent (after inlining) to the following code, which
should be transformed into a single loop: that reduces loop overhead, but
more importantly it performs only one pass over d, and thus improves cache
locality for huge vectors that do not fit entirely in the CPU caches:

    # a, b, c, d are vectors
    for i in range(length):
        d[i] = a[i] + b[i]
    for i in range(length):
        d[i] = d[i] - c[i]

Optimized result:

    for i in range(length):
        d[i] = a[i] + b[i] - c[i]

> I am a bit unsure however what is being discussed here, because
> fijal's comment is very terse and contains hidden issues, like when to
> create the bytecode (at runtime?  but how can it be a green constant
> then?

I think he wants to do something similar to regexp compilation + JITting,
as described on PyPy's blog. One can compile a regexp to bytecode (at
runtime), but afterwards the result is constant for the regexp
interpreter. In this case, I read the statement as: fijal wants to compile
not a regexp but the equivalent of a C++ expression template (see my other
email). For a + b - c, you'd get Minus(Plus(a, b), c), and hopefully
compile it directly to the efficient code above (a rough sketch of what I
mean is in the P.S. below).

> And what is the hint "can be vectorized"?).

I can explain what "vectorized" means (it is a specific loop
optimization), but I'm not sure why you would want the annotation. A
reasonable possibility is to run this optimization pass only on code
carrying such an annotation, because most code doesn't need it.

Most likely, in this context he refers to (automatic) loop vectorization,
i.e., automatically converting the code to use SIMD instructions. On
current processors, the following loop over 4-element vectors can be
implemented with a single SIMD add instruction, but the compiler has to
figure that out:

    for i in range(4):
        c[i] = a[i] + b[i]

In general, a SIMD operation operates in parallel on two short vector
operands, rather than on two scalar operands. The Intel C/Fortran
compilers, and more recently also GCC, can perform this rewriting without
annotations from the user. See for instance:
http://en.wikipedia.org/wiki/Vectorization_(parallel_computing)
and the introduction of:
http://software.intel.com/en-us/articles/vectorization-with-the-intel-compilers-part-i/

-- 
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/
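P.S. To make the expression-template idea more concrete, here is a minimal
Python sketch of the kind of object tree and single-pass evaluation I have
in mind; the Vec/Plus/Minus names and the evaluate() helper are purely
illustrative, not an existing PyPy or NumPy API:

    # Illustrative sketch only: a tiny "expression template" in Python.
    # Building a + b - c computes nothing yet; it builds a small tree
    # (Minus(Plus(a, b), c)) that is then evaluated element-wise in one pass,
    # with no temporary vectors.

    class Expr(object):
        def __add__(self, other):
            return Plus(self, other)

        def __sub__(self, other):
            return Minus(self, other)

    class Vec(Expr):
        def __init__(self, data):
            self.data = list(data)

        def item(self, i):
            return self.data[i]

        def __len__(self):
            return len(self.data)

    class Plus(Expr):
        def __init__(self, left, right):
            self.left, self.right = left, right

        def item(self, i):
            return self.left.item(i) + self.right.item(i)

        def __len__(self):
            return len(self.left)

    class Minus(Expr):
        def __init__(self, left, right):
            self.left, self.right = left, right

        def item(self, i):
            return self.left.item(i) - self.right.item(i)

        def __len__(self):
            return len(self.left)

    def evaluate(expr):
        # Single loop over the whole tree: one pass, no temporaries.
        return Vec(expr.item(i) for i in range(len(expr)))

    a = Vec([1.0, 2.0, 3.0])
    b = Vec([10.0, 20.0, 30.0])
    c = Vec([0.5, 0.5, 0.5])
    d = evaluate(a + b - c)   # builds Minus(Plus(a, b), c), then one fused loop
    print(d.data)             # [10.5, 21.5, 32.5]

A JIT (or a compiler given the equivalent C++ expression template) would
hopefully turn exactly this single-pass evaluation into the fused loop
shown earlier in this mail.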
