Hi Maciej,
I think you totally misunderstood me, possibly because I was not
clear; see below. In short, I was wondering whether the approach of
the original code made any sense, and my guess was "mostly not",
exactly because there is little constant folding possible in the code
as it is written.

[Hart, I don't think that any O(N^2) implementation of the DFT (which
is what is in the code), i.e. two nested for loops, should be written
to explicitly take advantage of the JIT. I don't know the FFT
algorithm well, but a few vague ideas suggest "yes" there, because
constant folding the length could _maybe_ allow constant folding the
permutations applied to the data in the Cooley–Tukey FFT algorithm.]
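
For concreteness, here is a minimal sketch of the kind of O(N^2) DFT
loop I mean (plain Python for illustration, not the pastie's actual
code):

import math

def dft(re, im):
    # Naive O(N^2) DFT: two nested loops over the input length.
    # There is no "program" being interpreted here, so there is no
    # bytecode-like value for a tracing JIT to treat as green.
    n = len(re)
    out_re = [0.0] * n
    out_im = [0.0] * n
    for k in range(n):
        for t in range(n):
            a = -2.0 * math.pi * k * t / n
            c, s = math.cos(a), math.sin(a)
            out_re[k] += re[t] * c - im[t] * s
            out_im[k] += re[t] * s + im[t] * c
    return out_re, out_im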

On Thu, Aug 19, 2010 at 12:11, Maciej Fijalkowski <[email protected]> wrote:
> Hi
<snip>

>> Moreover, I'm not sure you need to use the JIT yourself.
>> - Your code is RPython, so you could as well just translate it without
>> JIT annotations, and it will be compiled to C code.
>> - Otherwise, you could write that as an app-level function, i.e. in
>> normal Python, and pass it to a translated PyPy-JIT interpreter. Did
>> you try and benchmark the code?
>> Can I ask you why you did not write that as an app-level function, i.e.
>> as normal Python code, to use PyPy's JIT directly, without needing
>> detailed understanding of the JIT?
>> It would be interesting to see a comparison (and have it on the web,
>> after some code review).
>
> The JIT can essentially speed things up through constant folding
> based on the bytecode. The bytecode should be the only green variable
> here and all others (that you don't want to specialize over) should
> be red and not promoted. In your case it's very likely you compile a
> new loop very often (overspecialization).

I see no bytecode in the example - it's a DFT implementation.
For each combination of green variables there are only 1024
iterations, and there are 1024 such combinations, so the JIT compiles
a new loop for nearly every combination: overspecialization is almost
guaranteed.
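
To make the overspecialization concrete: with the rlib.jit API (a
sketch from memory, assuming a JitDriver taking 'greens' and 'reds'
lists and a jit_merge_point hint; the variable names are illustrative,
not the pastie's), marking a per-iteration value green looks roughly
like this:

from pypy.rlib.jit import JitDriver

# BAD: 'k' is the outer DFT index, so it takes 1024 distinct values
# per call; declaring it green makes the JIT compile one specialized
# loop per value of k instead of a single generic loop.
driver = JitDriver(greens=['k'], reds=['t', 'n', 'acc', 'data'])

def inner_sum(k, n, data):
    acc = 0.0
    t = 0
    while t < n:
        driver.jit_merge_point(k=k, t=t, n=n, acc=acc, data=data)
        acc += data[t]  # stand-in for the real DFT inner body
        t += 1
    return acc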

My next question, inspired by the specific code, is: is JITted code
ever thrown away if too much is generated? Even for valid use cases,
most JITs can generate too much code, and they then need to choose
what to keep and what to throw away.

>> Especially, I'm not sure that as currently written you're getting any
>> speedup, and I seriously wonder whether the JIT could give an
>> additional speedup over RPython here (the regexp interpreter is a
>> completely different case, since it compiles a regexp, but why do you
>> compile an array?).
>
> That's silly, our Python interpreter is an RPython program. Anything
> that has a meaningfully defined "bytecode" or a "compile time
> constant" can be sped up by the JIT. For example, a templating
> language.

You misunderstood me: I totally agree with you, and my understanding
is that in the given program (which I read almost in full) constant
folding makes little sense.
That program is written with RPython + JIT, but it has green
variables which are not at all "compile time constants", so my "I
seriously wonder" was meant as "I seriously wonder whether what you
are trying makes any sense". As I argued, the only constant folding
possible is for the array length. And again, I wonder whether even
that is worth it; my guess tends towards "no", but a benchmark is
needed (there will probably be some improvement).
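
If one wanted to try anyway, the only plausible setup I can see is
the array length as the single green - a sketch under the same
assumed rlib.jit API as above:

from pypy.rlib.jit import JitDriver

# 'n' is the one value constant over the whole computation, so it is
# the only plausible green; the loop indices and the data stay red.
# At best the JIT can then fold 'n' (and whatever depends only on it)
# into the compiled loop - probably a small win, hence the benchmark.
driver = JitDriver(greens=['n'], reds=['k', 't', 'acc', 'data'])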

I was just a bit vague because I have only studied the PyPy docs (and
papers about tracing compilation) so far. But your answer confirms
that my original analysis is correct, and that maybe I should write
more clearly.

>> I think just raw CPython can be 340x slower than C (I assume NumPy
>> uses C)

> You should check more and have fewer assumptions.

I did some checks, actually on PyPy's blog, not definitive though,
and I stand by what I meant (see below). Without reading the pastie
in full, however, my comments are out of context.
I guess your tone is fine, since you thought I wrote nonsense. But in
general, I have yet to see a guideline forbidding "IIRC" and similar
ways of discussing (the above was an _educated_ guess), especially
when the writer remembers correctly (as in this case).
Having said that, I'm always happy to see counterexamples and learn
something, if they exist. In this case, for what I actually meant
(and wrote, IMHO), a counterexample would be an RPython or JITted
program >= 340x slower than C.

About the speed ratio: the code pastie reports that the JITted
RPython code is 340x slower than the NumPy code, and I was saying
that such a ratio is unreasonable; in this case, it arises from
overspecialization caused by misuse of the JIT.

For the speed ratios among CPython, C, and RPython, I was comparing
with:
http://morepypy.blogspot.com/2010/06/jit-for-regular-expression-matching.html.
What I meant is that JITted code can't be that much slower than C.

For NumPy, I had read this:
http://morepypy.blogspot.com/2009/07/pypy-numeric-experiments.html,
which mostly implies that NumPy is written in C (it actually says
"NumPy's C version", but I had missed that). And for the specific
microbenchmark discussed there, the performance gap between NumPy and
CPython is ~100x.

Best regards
-- 
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/