On Wed, 14 Mar 2007, Armin Rigo wrote:
> Hi Chris,

Hey Armin,

I'm cc'ing pypy-dev, because this is suitable for a broader audience. I'm sorry for the delay in my reply; I have been travelling recently.
> After reading your slides from yesterday's presentation, it occurred to
> me that you might have a wrong idea about what PyPy is.

I'm sorry about that. The third section of my talk was only loosely based on pypy. Some of it was "inspired" by pypy, some of it was "unification" ideas that I have been thinking about for a while, some is based on my own experience with static analysis, and some is just wishful thinking. I'm sorry if it came across as "here is what pypy is doing"; that is not what I intended.

> double-guessing here, so of course I may be completely off - please bear
> with me. What PyPy is definitely not, is a Python compiler. In order
> to get to a common ground of discussion, may I suggest the following
> references:

Right. If I had to do a layman's summary, I would say that pypy provides infrastructure for building interpreters in [r]python. This infrastructure makes it much easier than starting from scratch, e.g. by providing reusable components for language runtimes (like GCs).

My talk (available here: http://llvm.org/pubs/2007-03-12-BossaLLVMIntro.html ) clearly doesn't describe pypy as it exists today, but I think it describes a place that pypy could get to if the community desired it (in other words, I think the strengths of pypy and of its community both play well into this). The members of the pypy project clearly have experience with type inference, clearly understand dynamic language issues, and clearly understand that while the semantics of these languages differ widely, there is also a lot of commonality. If the pypy community isn't interested, I will approach others, eventually falling back to doing it directly in LLVM if needed.

For the record, I have read Brett Cannon's thesis. I don't think his results are applicable, for a number of reasons. First, his work was in the context of an interpreter that used type information for optimization.
The approach I'm describing uses type information to give substantially more information to a static compiler. The result is that the generated code would be dramatically more efficient, and without the overhead of an interpreter loop this is far more significant than in his evaluation. Also, significantly more type information would be available to the type propagator if you used more aggressive techniques than he did.

To me, the much more significant issue with python (and ruby, and others) for integer operations is that they automatically promote to big integers on demand. In practice, this means that the code generated by a static compiler would be optimized for the small case, but would fall back gracefully in the large case. If you're familiar with X86 asm, I'd expect code for a series of adds to look like this:

  add EAX, EBX
  jo recover_1
  add EAX, 17
  jo recover_2
  add EAX, ECX
  jo recover_3

The idea is that (in the common case where the app deals with small integers) you have *extremely* fast code with easily predictable branches. The recovery code could, for example, package up the registers into real "integer" objects, and then call back into the interpreter to handle the hard case (note that this is very similar to the fast paths in the standard python interpreter, which eventually falls back to calling PyNumber_Add). Floating point code, if type inference is successful, does not need this sort of recovery code (unlike integer code). Other languages (like Javascript) don't have this issue, but they have others (e.g. there are no integers :), only floats).

Of course I'm biased, but I think that this sort of code could easily and naturally be generated and optimized by LLVM, if you chose to target it. This would let LLVM do the range analysis required to eliminate the branches (when possible), handle other low-level optimization, codegen, etc. You would get extremely high performance code and a very retargettable compiler.
Has python even been executed with a sustained performance of two instructions per add? :)

-Chris

--
http://nondot.org/sabre/
http://llvm.org/