Hi,
as Antoine pointed out in the corresponding issue
(http://bugs.python.org/issue14757#msg160870), measuring/assessing
real-world performance of my patch would be interesting. I mentioned
that I am not aware of any relevant Python 3 program/application to
report numbers for (but guess that the
Hello,
http://www.python.org/psf/contrib/
I took care of the formalities.
I am not sure how to proceed further. Would python-dev want me to draft a PEP?
Regards,
--stefan
PS: Personally, I am not 100 percent convinced that having a PEP is a
good thing in this case, as it makes a perfectly
I think you'll find that we don't keep a lot of things secret about CPython
and its implementation.
Yeah, I agree that this is in principle a good thing and what makes
CPython ideally suited for research. However, my optimizations make
use of unused opcodes, which might be used in the future by
Hello Mark,
A URL for the code repository (with an open-source license),
so code can be reviewed.
It is hard to review and update a giant patch.
OK, I took Nick's advice to heart and created a fork from the official
cpython mirror on bitbucket. You can view the code patched in
(branch:
Hi,
I only had a little time to spend on my open-sourcing efforts, which is
why I could not get back to python-dev any earlier...
Yesterday I forward-ported my patches to revision 76549
(13c30fe3f427), which only took 25 minutes or so (primarily due to the
small changes necessary to Python itself
I really don't think that is a problem. The core contributors can deal
well with complexity in my experience. :-)
No no, I wasn't trying to insinuate anything like that at all. No, I
just figured that having the code generator able to generate 4
optimizations where only one is supported
How many times did you regenerate this code until you got it right?
Well, honestly, I changed the code generator to pack the new
optimized instruction derivatives densely into the available opcodes,
so that I can make optimal use of what's there. Thus I only generated
the code twice for this
But let me put this straight: as an open-source project, we are hesitant to
accept changes which depend on closed software. Even if your optimization
techniques would result in performance a hundred times better than what is
currently achieved, we would still be wary to accept them.
Please
On Wed, Feb 1, 2012 at 09:46, Guido van Rossum gu...@python.org wrote:
Let's make one thing clear. The Python core developers need to be able
to reproduce your results from scratch, and that means access to the
templates, code generators, inputs, and everything else you used. (Of
course for
I assume "yes" here means "yes, I'm aware" and not "yes, I'm using Python
2", right? And you're building on top of the existing support for threaded
code in order to improve it?
Your assumption is correct, I'm sorry for the sloppiness (I was
heading out for lunch.) None of the code is 2.x
If I read the patch correctly, most of it is auto-generated (and there
is probably a few spurious changes that blow it up, such as the
python-gdb.py file).
Hm, honestly I don't know where the python-gdb.py file comes from, I
thought it came with the switch from 3.1 to the tip version I was
There is also the issue of the two test modules removed from the
test suite.
Oh, I'm sorry, seems like the patch did contain too much of my
development stuff. (I did remove them before, because they were always
failing due to the instruction opcodes being changed because of
quickening; they
Hello,
Could you try benchmarking with the standard benchmarks:
http://hg.python.org/benchmarks/
and see what sort of performance gains you get?
Yeah, of course. I already did. Refer to the page listed below for
details. I did not look into the results yet, though.
How portable is the
Well, you're aware that Python already uses threaded code where
available? Or are you testing against Python 2?
Yes, and I am building on that.
--stefan
___
Python-Dev mailing list
Python-Dev@python.org
Hi,
On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson benja...@python.org wrote:
2011/11/8 stefan brunthaler s.bruntha...@uci.edu:
How does that sound?
I think I can hear real patches and benchmarks most clearly.
I spent the better part of my 20% time on implementing the work as
suggested
Hi guys,
while there is at least some interest in incorporating my
optimizations, response has still been low. I figure that the changes
are probably too much for a single big incorporation step. On a recent
flight, I thought about cutting it down to make it more easily
digestible. The basic idea
as promised, I created a publicly available preview of an
implementation with my optimizations, which is available under the
following location:
https://bitbucket.org/py3_pio/preview/wiki/Home
One very important thing that I forgot was to indicate that you have
to use computed gotos (i.e.,
1) The SFC optimisation is purely based on static code analysis, right? I
assume it takes loops into account (and just multiplies scores for inner
loops)? Is that what you mean with nesting level? Obviously, static
analysis can sometimes be misleading, e.g. when there's a rare special case
Hi,
as promised, I created a publicly available preview of an
implementation with my optimizations, which is available under the
following location:
https://bitbucket.org/py3_pio/preview/wiki/Home
I followed Nick's advice and added an overview/introduction and some
usage guidance at the wiki page
I think that you must deal with big-endianness, because some RISC machines
cannot handle data in little-endian format at all.
In WPython I wrote some macros that handle both endiannesses, but lacking
big-endian machines I never had the opportunity to verify whether something
was wrong.
I am sorry for the
So, basically, you built a JIT compiler but don't want to call it that,
right? Just because it compiles byte code to other byte code rather than to
native CPU instructions does not mean it doesn't compile Just In Time.
For me, a definition of a JIT compiler or any dynamic compilation
subsystem
Changing the bytecode width wouldn't make the interpreter more complex.
No, but I think Stefan is proposing to add a *second* byte code format,
in addition to the one that remains there. That would certainly be an
increase in complexity.
Yes, indeed I have a more straightforward instruction
On Tue, Aug 30, 2011 at 09:42, Guido van Rossum gu...@python.org wrote:
Stefan, have you shared a pointer to your code yet? Is it open source?
I have no shared code repository, but could create one (is there any
pydev preferred provider?). I have all the copyrights on the code, and
I would like
Do you really need it to match a machine word? Or is, say, a 16-bit
format sufficient.
Hm, technically no, but practically it makes more sense, as (at least
for x86 architectures) having opargs and opcodes in half-words can be
efficiently expressed in assembly. On 64-bit architectures, I could
Do I sense that the bytecode format is no longer platform-independent?
That will need a bit of discussion. I bet there are some things around
that depend on that.
Hm, I haven't really thought about that in detail or for very long; I
ran it on PowerPC 970 and Intel Atom i7 without problems (the
Um, I'm sorry, but that reply sounds incredibly naive, like you're not
really sure what the on-disk format for .pyc files is or why it would
matter. You're not even answering the question, except indirectly --
it seems that you've never even thought about the possibility of
generating a .pyc
Ok, then there's something else you haven't told us. Are you saying
that the original (old) bytecode is still used (and hence written to
and read from .pyc files)?
Short answer: yes.
Long answer: I added an invocation counter to the code object and keep
interpreting in the usual Python
On Tue, Aug 30, 2011 at 13:42, Benjamin Peterson benja...@python.org wrote:
2011/8/30 stefan brunthaler ste...@brunthaler.net:
I will remove my development commentaries and create a private
repository at bitbucket for you* to take an early look like Georg (and
more or less Terry, too
Hi,
pretty much a year ago I wrote about the optimizations I did for my
PhD thesis that target the Python 3 series interpreters. While I got
some replies, the discussion never really picked up and no final
explicit conclusion was reached. AFAICT, because of the following two
factors, my
Perhaps there would be something to say given patches/overviews/specifics.
Currently I don't have patches, but for an overview and specifics, I
can provide the following:
* My optimizations basically rely on quickening to incorporate
run-time information.
* I use two separate instruction
The question really is whether this is an all-or-nothing deal. If you
could identify smaller parts that can be applied independently, interest
would be higher.
Well, it's not an all-or-nothing deal. In my current architecture, I
can selectively enable most of the optimizations as I see fit.
Does it speed up Python? :-) Could you provide numbers (benchmarks)?
Yes, it does ;)
The maximum overall speedup I achieved was by a factor of 2.42 on my
i7-920 for the spectralnorm benchmark of the Computer Language
Benchmarks Game.
Others from the same set are:
binarytrees: 1.9257 (1.9891)
Personally, I *like* CPython fitting into the simple-and-portable
niche in the Python interpreter space. Armin Rigo made the judgment
years ago that CPython was a poor platform for serious optimisation
when he stopped working on Psyco and started PyPy instead, and I think
the contrasting
Hi,
I guess it would be a good idea to quickly outline my inline caching
approach, so that we all have a basic understanding of how it works.
If we take for instance the BINARY_ADD instruction, the interpreter
evaluates the actual operand types and chooses the matching operation
implementation at
This sounds like wpython (a CPython derivative with a wider set of byte code
commands) could benefit from it.
I am aware of the wpython project of Cesare di Mauro. I change the
instruction format from bytecode to wordcode, too (because it allows
for more efficient instruction decoding).
How do you generate the specialized opcode implementations?
I have a small code generator written in Python that uses Mako
templates to generate C files that can be included in the main
interpreter. It is a data driven approach that uses type information
gathered by gdb and checks whether given
wpython has reached 1.1 final version. If you are interested, you can find
it here: http://code.google.com/p/wpython2/ and you can download the new
slides that cover the improvements over 1.0 alpha.
Thanks for the hint, I will definitely check your new slides.
Did you use wpython wordcode
I think I was wrong, but now I understand. The inlining you want is
to get the nb_add body, not the opcode body.
Exactly. This would increase performance by quite a bit -- I will start
experimenting with that stuff a.s.a.p.
The example you've given brings up a correctness issue. It seems
Hello,
during the last year, I have developed a couple of quickening-based
optimizations for the Python 3.1 interpreter. As part of my PhD
programme, I have published a first technique that combines quickening
with inline caching at this year's ECOOP, and subsequently extended
this technique to
Is the source code under an open source non-copyleft license?
I am (unfortunately) not employed or funded by anybody, so I think
that I can license/release the code as I see fit.
Have you checked that the whole regression test suite passes?
Currently, I am sure my prototype will not pass the
The Springer link [1] at least shows the front page to give more of an
idea as to what this is about.
Thanks, I forgot to mention the link.
The idea does sound potentially interesting, although I'm not sure how
applicable it will be with a full-blown LLVM-based JIT on the way for
3.3 (via