Re: [Python-Dev] Python 3 optimizations, continued, continued again...
On Wed, Feb 1, 2012 at 20:08, stefan brunthaler s.bruntha...@uci.edu wrote: I understand all of these issues. Currently, it's not really a mess, but much more complicated than it needs to be for only supporting the inca optimization. I really don't think that is a problem. The core contributors can deal well with complexity in my experience. :-) //Lennart ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
I really don't think that is a problem. The core contributors can deal well with complexity in my experience. :-) No, no, I wasn't trying to insinuate anything like that at all. I just figured that having the code generator generate 4 optimizations when only one is supported is a bad idea for several reasons, maintainability among them. Anyway, I've just completed the integration of the code generator and put the corresponding patch on my page (http://www.ics.uci.edu/~sbruntha/pydev.html) for download. The license thing is still missing; I'll take care of that tomorrow or sometime next week. Regards, --stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
How many times did you regenerate this code until you got it right? Well, honestly, I changed the code generator to pack the new optimized instruction derivatives densely into the available opcodes, so that I can make optimal use of what's there. Thus I only generated the code twice for this patch. And how do you know that you really got it so right that it was the last time ever that you needed your generator for it? I am positive that I am going to need my code generator in the future, as I have several ideas to increase performance even more. As I have mentioned before, my quickening-based inline caching technique is very simple, and if it were to crash, chances are that one of the inline-cache miss guards doesn't capture all scenarios, i.e., is non-exhaustive. The regression tests pass, as do the official benchmarks plus the computer language benchmarks game. In addition, this has been my line of research since 2009, so I have extensive experience with it, too. What if the C structure of any of those several types ever changes? Since I optimize interpreter instructions, any change that affects their implementation requires changing the optimized instructions, too. Having the code generator ready for such things would certainly be a good idea (probably also for generating the default interpreter dispatch loop), since you could also add your own profile for your application/domain to re-use the remaining 30+ instruction opcodes. The direct answer is that I would need to re-generate the driver file, which is basically a gdb dump plus an Emacs macro (please note that I have not needed to do that since working with ~3.0b1). I will add a list of the types I use for specializing to the patch section on the additional resources page of my homepage (including a fixed patch addressing what Georg brought to my attention).
--stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
Message written by stefan brunthaler on 1 Feb 2012 at 16:55: And how do you know that you really got it so right that it was the last time ever that you needed your generator for it? I am positive that I am going to need my code generator in the future, as I have several ideas to increase performance even more. Hello, Stefan. First let me thank you for your interest in improving the interpreter. We appreciate and encourage efforts to make it perform better. But let me put this straight: as an open-source project, we are hesitant to accept changes which depend on closed software. Even if your optimization techniques would result in performance a hundred times better than what is currently achieved, we would still be wary of accepting them. Please note that this is not because of lack of trust or, worse, greed for your code. We need to make sure that under no circumstances is our codebase in danger because something important was left out along the way. Maintenance of generated code is yet another nuisance that had better be strongly justified. -- Best regards, Łukasz Langa Senior Systems Architecture Engineer IT Infrastructure Department Grupa Allegro Sp. z o.o.
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
But let me put this straight: as an open-source project, we are hesitant to accept changes which depend on closed software. Even if your optimization techniques would result in performance a hundred times better than what is currently achieved, we would still be wary of accepting them. Please note that this is not because of lack of trust or, worse, greed for your code. We need to make sure that under no circumstances is our codebase in danger because something important was left out along the way. I am positive that the code generator does not depend on any closed-source components; I just use mako for storing the C code templates that I generate -- everything else I wrote myself. Of course, I'll give the code generator to pydev, too, if necessary. However, I need to strip it down so that it does not do all the other stuff that you don't need. I just wanted to give you the implementation now, since Benjamin said that he wants to see real code and results first. If you want to integrate the inca optimization, I am going to start working on this asap. Maintenance of generated code is yet another nuisance that had better be strongly justified. I agree, but the nice thing is that the technique is very simple: only if you changed a significant part of the interpreter's implementation would you need to change the optimized derivatives, too. If one generates the default interpreter implementation, too, then one gets the optimizations almost for free. For maintenance reasons I chose to use a template-based system, too, since this gives you a direct correspondence between the actual code and what's generated, without interfering with the code generator at all. --stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
Let's make one thing clear. The Python core developers need to be able to reproduce your results from scratch, and that means access to the templates, code generators, inputs, and everything else you used. (Of course for stuff you didn't write that's already open source, all we need is a pointer to the open source project and the exact version/configuration you used, plus any local mods you made.) I understand that you're hesitant to just dump your current mess, and you want to clean it up before you show it to us. That's fine. But until you're ready to show it, we're not going to integrate any of your work into CPython, even though some of us (maybe Benjamin) may be interested in kicking its tires. And remember, it doesn't need to be perfect (in fact perfectionism is probably a bad idea here). But it does need to be open source. Every single bit of it. (And no GPL, please.) --Guido 2012/2/1 stefan brunthaler s.bruntha...@uci.edu: But let me put this straight: as an open-source project, we are hesitant to accept changes which depend on closed software. Even if your optimization techniques would result in performance a hundred times better than what is currently achieved, we would still be wary of accepting them. Please note that this is not because of lack of trust or, worse, greed for your code. We need to make sure that under no circumstances is our codebase in danger because something important was left out along the way. I am positive that the code generator does not depend on any closed-source components; I just use mako for storing the C code templates that I generate -- everything else I wrote myself. Of course, I'll give the code generator to pydev, too, if necessary. However, I need to strip it down so that it does not do all the other stuff that you don't need. I just wanted to give you the implementation now, since Benjamin said that he wants to see real code and results first.
If you want to integrate the inca optimization, I am going to start working on this asap. Maintenance of generated code is yet another nuisance that had better be strongly justified. I agree, but the nice thing is that the technique is very simple: only if you changed a significant part of the interpreter's implementation would you need to change the optimized derivatives, too. If one generates the default interpreter implementation, too, then one gets the optimizations almost for free. For maintenance reasons I chose to use a template-based system, too, since this gives you a direct correspondence between the actual code and what's generated, without interfering with the code generator at all. --stefan -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
On Wed, Feb 1, 2012 at 09:46, Guido van Rossum gu...@python.org wrote: Let's make one thing clear. The Python core developers need to be able to reproduce your results from scratch, and that means access to the templates, code generators, inputs, and everything else you used. (Of course for stuff you didn't write that's already open source, all we need is a pointer to the open source project and the exact version/configuration you used, plus any local mods you made.) I understand that you're hesitant to just dump your current mess, and you want to clean it up before you show it to us. That's fine. But until you're ready to show it, we're not going to integrate any of your work into CPython, even though some of us (maybe Benjamin) may be interested in kicking its tires. And remember, it doesn't need to be perfect (in fact perfectionism is probably a bad idea here). But it does need to be open source. Every single bit of it. (And no GPL, please.) I understand all of these issues. Currently, it's not really a mess, but much more complicated than it needs to be for only supporting the inca optimization. I don't know what the time frame for a possible integration is (my guess is that it'd be safe anyway to disable it, like the threaded code support was handled.) As for the license: I really don't care about that at all; the only thing nice to have would be a pointer to my home page and/or the corresponding research, but that's about all on my wish list. --stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
On Feb 1, 2012, at 12:46 PM, Guido van Rossum wrote: I understand that you're hesitant to just dump your current mess, and you want to clean it up before you show it to us. That's fine. (...) And remember, it doesn't need to be perfect (in fact perfectionism is probably a bad idea here). Just as a general point of advice to open source contributors, I'd suggest erring on the side of the latter rather than the former suggestion here: dump your current mess, along with the relevant caveats (it's a mess, much of it is irrelevant) so that other developers can help you clean it up, rather than putting the entire burden of the cleanup on yourself. Experience has taught me that most people who hold back work because it needs cleanup eventually run out of steam and their work never gets integrated and maintained. -glyph
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
On Wed, Feb 1, 2012 at 11:08 AM, stefan brunthaler s.bruntha...@uci.edu wrote: On Wed, Feb 1, 2012 at 09:46, Guido van Rossum gu...@python.org wrote: Let's make one thing clear. The Python core developers need to be able to reproduce your results from scratch, and that means access to the templates, code generators, inputs, and everything else you used. (Of course for stuff you didn't write that's already open source, all we need is a pointer to the open source project and the exact version/configuration you used, plus any local mods you made.) I understand that you're hesitant to just dump your current mess, and you want to clean it up before you show it to us. That's fine. But until you're ready to show it, we're not going to integrate any of your work into CPython, even though some of us (maybe Benjamin) may be interested in kicking its tires. And remember, it doesn't need to be perfect (in fact perfectionism is probably a bad idea here). But it does need to be open source. Every single bit of it. (And no GPL, please.) I understand all of these issues. Currently, it's not really a mess, but much more complicated than it needs to be for only supporting the inca optimization. I don't know what the time frame for a possible integration is (my guess is that it'd be safe anyway to disable it, like the threaded code support was handled.) It won't be integrated until you have published your mess. As for the license: I really don't care about that at all; the only thing nice to have would be a pointer to my home page and/or the corresponding research, but that's about all on my wish list. Please don't try to enforce that in the license. That usually backfires. Use Apache 2, which is what the PSF prefers. -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
I assume yes here means yes, I'm aware and not yes, I'm using Python 2, right? And you're building on top of the existing support for threaded code in order to improve it? Your assumption is correct; I'm sorry for the sloppiness (I was heading out for lunch.) None of the code is 2.x compatible; all of my work has always targeted Python 3.x. My work does not improve threaded code (as in the interpreter dispatch technique), but enables efficient and purely interpretative inline caching via quickening. (So, after execution of BINARY_ADD, I rewrite the specific occurrence of the bytecode instruction to a, say, FLOAT_ADD instruction and ensure that my assumption is correct in the FLOAT_ADD instruction.) Thanks, --stefan
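For readers unfamiliar with the technique, the rewriting Stefan describes can be sketched in a few lines of C. Everything below (the opcode names, the toy object model, the de-quickening fallback policy) is an illustrative assumption, not code from the actual patch:

```c
#include <assert.h>

/* Toy model of quickening-based inline caching: hypothetical opcodes only. */
enum { OP_BINARY_ADD, OP_FLOAT_ADD, OP_HALT };
typedef enum { T_LONG, T_FLOAT } Tag;
typedef struct { Tag tag; double f; long i; } Obj;

/* Executes `code` over a small operand stack.  The generic BINARY_ADD
 * rewrites itself in place to FLOAT_ADD when it observes two floats;
 * FLOAT_ADD's guard de-quickens on an inline-cache miss. */
static Obj run(unsigned char *code, Obj *stack, int nstack, int *rewrites)
{
    int pc = 0, sp = nstack;
    for (;;) {
        switch (code[pc]) {
        case OP_BINARY_ADD:
            if (stack[sp - 2].tag == T_FLOAT && stack[sp - 1].tag == T_FLOAT) {
                code[pc] = OP_FLOAT_ADD;   /* quicken this occurrence */
                (*rewrites)++;
                continue;                  /* re-dispatch at the same pc */
            }
            stack[sp - 2].i += stack[sp - 1].i;
            sp--; pc++;
            break;
        case OP_FLOAT_ADD:
            /* inline-cache miss guard: the type assumption must still hold */
            if (stack[sp - 2].tag != T_FLOAT || stack[sp - 1].tag != T_FLOAT) {
                code[pc] = OP_BINARY_ADD;  /* miss: fall back to generic */
                continue;
            }
            stack[sp - 2].f += stack[sp - 1].f;
            sp--; pc++;
            break;
        case OP_HALT:
        default:
            return stack[sp - 1];
        }
    }
}
```

The point is that the rewrite happens at one specific occurrence of the instruction, so each site specializes independently, and the guard keeps the specialized instruction safe if the type assumption later breaks.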
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
If I read the patch correctly, most of it is auto-generated (and there are probably a few spurious changes that blow it up, such as the python-gdb.py file). Hm, honestly I don't know where the python-gdb.py file comes from; I thought it came with the switch from 3.1 to the tip version I was using. Anyway, I did not touch it, or at least have no recollection of doing so. Regarding the spurious changes: this might very well be the case. Regression testing works, and it would actually be fairly easy to find crashes (e.g., by tracing all executed bytecode instructions and seeing if all of them are actually executed; I could easily do that if wanted/necessary.) But the tool that actually generates the code doesn't seem to be included? (Which means that in this form, the patch couldn't possibly be accepted.) Well, the tool is not included because it does a lot more (e.g., generate the code for elimination of reference count operations.) Unfortunately, my interpreter architecture that achieves the highest speedups is more complicated, and I got the feeling that this was not going over well with python-dev. So, I had the idea of basically using just one (but a major) optimization technique and going with that. I don't see why you would need my code generator, though. Not that I mind, but I would need to strip down and remove many parts of it and also make it more accessible to other people. However, if python-dev decides that it wants to include the optimizations and requires the code generator, I'll happily chip in the extra work and give you the corresponding code generator, too. Thanks, --stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
On 31.01.2012 16:46, stefan brunthaler wrote: If I read the patch correctly, most of it is auto-generated (and there are probably a few spurious changes that blow it up, such as the python-gdb.py file). Hm, honestly I don't know where the python-gdb.py file comes from; I thought it came with the switch from 3.1 to the tip version I was using. Anyway, I did not touch it, or at least have no recollection of doing so. Regarding the spurious changes: this might very well be the case. Regression testing works, and it would actually be fairly easy to find crashes (e.g., by tracing all executed bytecode instructions and seeing if all of them are actually executed; I could easily do that if wanted/necessary.) There is also the issue of the two test modules removed from the test suite. But the tool that actually generates the code doesn't seem to be included? (Which means that in this form, the patch couldn't possibly be accepted.) Well, the tool is not included because it does a lot more (e.g., generate the code for elimination of reference count operations.) Unfortunately, my interpreter architecture that achieves the highest speedups is more complicated, and I got the feeling that this was not going over well with python-dev. So, I had the idea of basically using just one (but a major) optimization technique and going with that. I don't see why you would need my code generator, though. Not that I mind, but I would need to strip down and remove many parts of it and also make it more accessible to other people. However, if python-dev decides that it wants to include the optimizations and requires the code generator, I'll happily chip in the extra work and give you the corresponding code generator, too. Well, nobody wants to review generated code. Georg
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
There is also the issue of the two test modules removed from the test suite. Oh, I'm sorry; it seems the patch contained too much of my development stuff. (I had removed them because they were always failing due to the instruction opcodes being changed by quickening; they pass the tests, though.) Well, nobody wants to review generated code. I agree. The code generator basically uses templates that contain the information, plus a dump of the C structure of several types, which it traverses to see which of them implement which functions. There is really no magic there; the most complex thing is to get the inline-cache miss checks for function calls right. But I tried to make the generated code look pretty, so that working with it is not too much of a hassle. The code generator itself is a little bit more complicated, so I am not sure it would help a lot... best, --stefan
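The call-site guard mentioned above can be illustrated with a minimal sketch. The `CallSite` structure, the function names, and the re-specialize-on-miss policy are all hypothetical assumptions for illustration, not the patch's actual scheme:

```c
#include <assert.h>

/* A specialized call site caches its last callee and verifies, on every
 * execution, that the target is still the same before taking the fast path. */
typedef long (*callee_t)(long);

static long twice(long x)  { return 2 * x; }
static long thrice(long x) { return 3 * x; }

typedef struct { callee_t cached; } CallSite;

static long call_cached(CallSite *site, callee_t target, long arg, int *misses)
{
    if (site->cached != target) {  /* inline-cache miss guard */
        (*misses)++;
        site->cached = target;     /* re-specialize for the new target */
    }
    return site->cached(arg);      /* fast path: call through the cache */
}
```

Getting this right in a real interpreter is harder than it looks here, because the guard has to cover every way the callee can change (rebinding, monkey-patching, deallocation), which is presumably what makes these checks the tricky part.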
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
stefan brunthaler, 31.01.2012 22:17: Well, nobody wants to review generated code. I agree. The code generator basically uses templates that contain the information and a dump of the C-structure of several types to traverse and see which one of them implements which functions. There is really no magic there, the most complex thing is to get the inline-cache miss checks for function calls right. But I tried to make the generated code look pretty, so that working with it is not too much of a hassle. The code generator itself is a little bit more complicated, so I am not sure it would help a lot... How many times did you regenerate this code until you got it right? And how do you know that you really got it so right that it was the last time ever that you needed your generator for it? What if the C structure of any of those several types ever changes? Stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
Hello, Could you try benchmarking with the standard benchmarks: http://hg.python.org/benchmarks/ and see what sort of performance gains you get? Yeah, of course. I already did; refer to the page listed below for details. I have not looked into the results yet, though. How portable is the threaded interpreter? Well, you can implement threaded code on any machine that supports indirect branch instructions. Fortunately, GCC supports the labels-as-values feature, which makes it available on any machine that supports GCC. My optimizations themselves are portable, and I tested them on a PowerPC for my thesis, too. (AFAIR, LLVM supports this feature, too.) Do you have a public repository for the code, so we can take a look? I have created a patch (as Benjamin wanted) and put all of the resources (i.e., benchmark results and the patch itself) on my home page: http://www.ics.uci.edu/~sbruntha/pydev.html Regards, --stefan
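For reference, a minimal sketch of what labels-as-values (computed-goto) dispatch looks like. The toy instruction set and names are made up for illustration; the `&&label` syntax is the GCC extension being discussed, also accepted by Clang:

```c
#include <assert.h>

enum { OP_ADD, OP_MUL, OP_HALT };

/* Token-threaded dispatch: each opcode indexes a table of handler
 * addresses, and every handler jumps directly to the next handler
 * instead of going back through a central switch. */
static long interp(const unsigned char *code, long acc, long operand)
{
    /* one entry per opcode: the address of its handler label */
    static const void *dispatch[] = { &&op_add, &&op_mul, &&op_halt };
#define DISPATCH() goto *dispatch[*code++]
    DISPATCH();
op_add:  acc += operand; DISPATCH();
op_mul:  acc *= operand; DISPATCH();
op_halt: return acc;
#undef DISPATCH
}
```

The win over a plain `switch` is one indirect branch per instruction, placed at the end of each handler, which gives the CPU's branch predictor per-instruction history to work with.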
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
Hello, Well, you can implement threaded code on any machine that support indirect branch instructions. Fortunately, GCC supports the label-as-values feature, which makes it available on any machine that supports GCC. My optimizations themselves are portable, and I tested them on a PowerPC for my thesis, too. (AFAIR, llvm supports this feature, too.) Well, you're aware that Python already uses threaded code where available? Or are you testing against Python 2? Regards Antoine.
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
Well, you're aware that Python already uses threaded code where available? Or are you testing against Python 2? Yes, and I am building on that. --stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
stefan brunthaler, 30.01.2012 20:18: Well, you're aware that Python already uses threaded code where available? Or are you testing against Python 2? Yes, and I am building on that. I assume yes here means yes, I'm aware and not yes, I'm using Python 2, right? And you're building on top of the existing support for threaded code in order to improve it? Stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
On 30.01.2012 20:06, stefan brunthaler wrote: Do you have a public repository for the code, so we can take a look? I have created a patch (as Benjamin wanted) and put all of the resources (i.e., benchmark results and the patch itself) on my home page: http://www.ics.uci.edu/~sbruntha/pydev.html If I read the patch correctly, most of it is auto-generated (and there are probably a few spurious changes that blow it up, such as the python-gdb.py file). But the tool that actually generates the code doesn't seem to be included? (Which means that in this form, the patch couldn't possibly be accepted.) Georg
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
stefan brunthaler wrote: Hi, On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson benja...@python.org wrote: 2011/11/8 stefan brunthaler s.bruntha...@uci.edu: How does that sound? I think I can hear real patches and benchmarks most clearly. I spent the better part of my -20% time on implementing the work as suggested. Please find the benchmarks attached to this email, I just did them on my system (i7-920, Linux 3.0.0-15, GCC 4.6.1). Could you try benchmarking with the standard benchmarks: http://hg.python.org/benchmarks/ and see what sort of performance gains you get? I branched off the regular 3.3a0 default tip changeset 73977 shortly after your email. I do not have an official patch yet, but am going to create one if wanted. Changes to the existing interpreter are minimal, the biggest chunk is a new interpreter dispatch loop. How portable is the threaded interpreter? Do you have a public repository for the code, so we can take a look? Cheers, Mark.
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
Hi, On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson benja...@python.org wrote: 2011/11/8 stefan brunthaler s.bruntha...@uci.edu: How does that sound? I think I can hear real patches and benchmarks most clearly. I spent the better part of my -20% time on implementing the work as suggested. Please find the benchmarks attached to this email, I just did them on my system (i7-920, Linux 3.0.0-15, GCC 4.6.1). I branched off the regular 3.3a0 default tip changeset 73977 shortly after your email. I do not have an official patch yet, but am going to create one if wanted. Changes to the existing interpreter are minimal, the biggest chunk is a new interpreter dispatch loop. Merging dispatch loops eliminates some of my optimizations, but my inline caching technique enables inlining some functionality, which results in visible speedups. The code is normalized to the non-threaded-code version of the CPython interpreter (named vanilla), so that I can reference it to my preceding results. I anticipate *no* compatibility issues and the interpreter requires less than 100 KiB of extra memory at run-time. Since my interpreter is using 215 of a maximum of 255 instructions, there is room for adding additional derivatives, e.g., for popular Python libraries, too. Let me know what python-dev thinks of this and have a nice weekend, --stefan PS: AFAIR the version without partial stack frame caching also passes all regression tests modulo the ones that test against specific bytecodes. 
currently processing: bench/binarytrees.py3.py
phd-cpy-3a0-thr-cod-pytho arg: 10 | time: 0.161876 | stdev: 0.007780 | var: 0.61 | mem: 6633.60
phd-cpy-3a0-thr-cod-pytho arg: 12 | time: 0.699243 | stdev: 0.019112 | var: 0.000365 | mem: 8142.67
phd-cpy-3a0-thr-cod-pytho arg: 14 | time: 3.388344 | stdev: 0.048042 | var: 0.002308 | mem: 13586.93
phd-cpy-pio-sne-pre-pyt-no-psf arg: 10 | time: 0.153875 | stdev: 0.003828 | var: 0.15 | mem: 6873.73
phd-cpy-pio-sne-pre-pyt-no-psf arg: 12 | time: 0.632572 | stdev: 0.019121 | var: 0.000366 | mem: 8246.27
phd-cpy-pio-sne-pre-pyt-no-psf arg: 14 | time: 3.020988 | stdev: 0.043483 | var: 0.001891 | mem: 13640.27
phd-cpy-pio-sne-pre-pytho arg: 10 | time: 0.150942 | stdev: 0.005157 | var: 0.27 | mem: 6901.87
phd-cpy-pio-sne-pre-pytho arg: 12 | time: 0.660841 | stdev: 0.020538 | var: 0.000422 | mem: 8286.80
phd-cpy-pio-sne-pre-pytho arg: 14 | time: 3.184198 | stdev: 0.051103 | var: 0.002612 | mem: 13680.40
phd-cpy-3a0-van-pytho arg: 10 | time: 0.202812 | stdev: 0.005480 | var: 0.30 | mem: 6633.33
phd-cpy-3a0-van-pytho arg: 12 | time: 0.908456 | stdev: 0.015744 | var: 0.000248 | mem: 8153.07
phd-cpy-3a0-van-pytho arg: 14 | time: 4.364805 | stdev: 0.037522 | var: 0.001408 | mem: 13593.60
### phd-cpy-3a0-thr-cod-pytho : 1.2887 (avg-sum: 1.416488)
### phd-cpy-pio-sne-pre-pyt-no-psf: 1.4383 (avg-sum: 1.269145)
### phd-cpy-pio-sne-pre-pytho : 1.3704 (avg-sum: 1.331994)
### phd-cpy-3a0-van-pytho : 1. (avg-sum: 1.825358)
currently processing: bench/fannkuch.py3.py
phd-cpy-3a0-thr-cod-pytho arg: 8 | time: 0.172677 | stdev: 0.006620 | var: 0.44 | mem: 6424.13
phd-cpy-3a0-thr-cod-pytho arg: 9 | time: 1.426755 | stdev: 0.035545 | var: 0.001263 | mem: 6425.20
phd-cpy-pio-sne-pre-pyt-no-psf arg: 8 | time: 0.168010 | stdev: 0.010277 | var: 0.000106 | mem: 6481.07
phd-cpy-pio-sne-pre-pyt-no-psf arg: 9 | time: 1.345817 | stdev: 0.033127 | var: 0.001097 | mem: 6479.60
phd-cpy-pio-sne-pre-pytho arg: 8 | time: 0.165876 | stdev: 0.007136 | var: 0.51 | mem: 6520.00
phd-cpy-pio-sne-pre-pytho arg: 9 | time: 1.351150 | stdev: 0.028822 | var: 0.000831 | mem: 6519.73
phd-cpy-3a0-van-pytho arg: 8 | time: 0.216146 | stdev: 0.012879 | var: 0.000166 | mem: 6419.07
phd-cpy-3a0-van-pytho arg: 9 | time: 1.834247 | stdev: 0.028224 | var: 0.000797 | mem: 6418.67
### phd-cpy-3a0-thr-cod-pytho : 1.2820 (avg-sum: 0.799716)
### phd-cpy-pio-sne-pre-pyt-no-psf: 1.3544 (avg-sum: 0.756913)
### phd-cpy-pio-sne-pre-pytho : 1.3516 (avg-sum: 0.758513)
### phd-cpy-3a0-van-pytho : 1. (avg-sum: 1.025197)
currently processing: bench/fasta.py3.py
phd-cpy-3a0-thr-cod-pytho arg: 5 | time: 0.374023 | stdev: 0.010870 | var: 0.000118 | mem: 6495.07
phd-cpy-3a0-thr-cod-pytho arg: 10 | time: 0.714577 | stdev: 0.024713 | var: 0.000611 | mem: 6495.47
phd-cpy-3a0-thr-cod-pytho arg: 15 | time: 1.062866 | stdev: 0.040138 | var: 0.001611 | mem: 6496.27
phd-cpy-pio-sne-pre-pyt-no-psf arg: 5 | time: 0.345621 | stdev: 0.022549 | var: 0.000508 | mem: 6551.87
phd-cpy-pio-sne-pre-pyt-no-psf
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
2012/1/27 stefan brunthaler s.bruntha...@uci.edu: Hi, On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson benja...@python.org wrote: 2011/11/8 stefan brunthaler s.bruntha...@uci.edu: How does that sound? I think I can hear real patches and benchmarks most clearly. I spent the better part of my -20% time on implementing the work as suggested. Please find the benchmarks attached to this email, I just did them on my system (i7-920, Linux 3.0.0-15, GCC 4.6.1). I branched off the regular 3.3a0 default tip changeset 73977 shortly after your email. I do not have an official patch yet, but am going to create one if wanted. Changes to the existing interpreter are minimal, the biggest chunk is a new interpreter dispatch loop. Merging dispatch loops eliminates some of my optimizations, but my inline caching technique enables inlining some functionality, which results in visible speedups. The code is normalized to the non-threaded-code version of the CPython interpreter (named vanilla), so that I can reference it to my preceding results. I anticipate *no* compatibility issues and the interpreter requires less than 100 KiB of extra memory at run-time. Since my interpreter is using 215 of a maximum of 255 instructions, there is room for adding additional derivatives, e.g., for popular Python libraries, too. Let me know what python-dev thinks of this and have a nice weekend, Cool. It'd be nice to see a patch. -- Regards, Benjamin
[Python-Dev] Python 3 optimizations, continued, continued again...
Hi guys,

while there is at least some interest in incorporating my optimizations, the response has still been low. I figure that the changes are probably too much for a single big incorporation step. On a recent flight, I thought about cutting them down to make them more easily digestible.

The basic idea is to remove the optimized interpreter dispatch loop and the advanced instruction format and use the existing ones. Currently (rev. ca8a0dfb2176), opcode.h uses 109 of potentially available 255 instructions using the current instruction format. Hence, up to 149 instruction opcodes could be given to optimized instruction derivatives. Consequently, a possible change would require changing: a) opcode.h, to add new instruction opcodes, b) ceval.c, to include the new instruction opcodes in PyEval_EvalFrameEx, c) abstract.c and object.c (possibly other files), to add the quickening/rewriting function calls.

If this is more interesting, I could start evaluating which instruction opcodes should be allocated to which derivatives to get the biggest benefit. This is a lot easier to implement (because I can re-use the existing instruction implementations) and can easily be made conditionally compilable, similar to the computed-gotos option. Since the changes are minimal, it is also simpler for everybody else to understand and deal with. On the downside, however, not all optimizations are possible and/or make sense within the given limit of instructions (no data-object inlining and no reference-count elimination.)

How does that sound? Have a nice day, --stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
2011/11/8 stefan brunthaler s.bruntha...@uci.edu: How does that sound? I think I can hear real patches and benchmarks most clearly. -- Regards, Benjamin
Re: [Python-Dev] Python 3 optimizations continued...
stefan brunthaler, 02.09.2011 06:37:
 as promised, I created a publicly available preview of an implementation with my optimizations, which is available under the following location: https://bitbucket.org/py3_pio/preview/wiki/Home
 I followed Nick's advice and added a valuable overview/introduction at the wiki page the link points to; I am positive that spending 10 minutes reading it will provide you with valuable information regarding what's happening.

It does, thanks. A couple of remarks:

1) The SFC optimisation is purely based on static code analysis, right? I assume it takes loops into account (and just multiplies scores for inner loops)? Is that what you mean by "nesting level"? Obviously, static analysis can sometimes be misleading, e.g. when there's a rare special case with lots of loops that needs to adapt input data in some way, but in general I'd expect that this heuristic would tend to hit the important cases, especially for well-structured code with short functions.

2) The RC elimination is tricky to get right and thus somewhat dangerous, but sounds worthwhile and should work particularly well on a stack-based byte code interpreter like CPython.

3) Inline caching also sounds worthwhile, although I wonder how large the savings will be here. You'd save a couple of indirect jumps at the C-API level, sure, but apart from that, my guess is that it would highly depend on the type of instruction. Certain (repeated) calls to C-implemented functions would likely benefit quite a bit, for example, which would be a nice optimisation by itself, e.g. for builtins. I would expect that the same applies to iterators; even a couple of percent faster iteration can make a great deal of difference, and a substantial set of iterators are implemented in C, e.g. itertools, range, zip and friends. I'm not so sure about arithmetic operations. In Cython, we (currently?) do not optimistically replace these with more specific code (unless we know the types at compile time), because it complicates the generated C code, and indirect jumps aren't so slow that the benefit would be important. Savings are *much* higher when data can be unboxed, so much so that the slight improvement from optimistic type guesses is totally dwarfed in Cython. I would expect that the return on investment is better when the types are actually known at runtime, as in your case.

4) Regarding inlined object references, I would expect that it's much more worthwhile to speed up LOAD_GLOBAL and LOAD_NAME than LOAD_CONST. I guess that this would be best helped by watching the module dict and the builtin dict internally and invalidating the interpreter state after changes (e.g. by providing a change counter in those dicts and checking it in the instructions that access them), and otherwise keeping the objects cached. Simply watching the dedicated instructions that change that state isn't enough, as Python allows code to change these dicts directly through their dict interface.

All in all, your list does sound like an interesting set of changes that are both understandable and worthwhile.

Stefan
Re: [Python-Dev] Python 3 optimizations continued...
 as promised, I created a publicly available preview of an implementation with my optimizations, which is available under the following location: https://bitbucket.org/py3_pio/preview/wiki/Home

One very important thing that I forgot was to indicate that you have to use computed gotos (i.e., configure --with-computed-gotos), otherwise it won't work. (Though I think most people can figure this out easily, knowing it a priori isn't too bad.)

Regards, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
 1) The SFC optimisation is purely based on static code analysis, right? I assume it takes loops into account (and just multiplies scores for inner loops)? Is that what you mean by "nesting level"? Obviously, static analysis can sometimes be misleading, e.g. when there's a rare special case with lots of loops that needs to adapt input data in some way, but in general I'd expect that this heuristic would tend to hit the important cases, especially for well-structured code with short functions.

Yes, currently I only use the heuristic to statically estimate the utility of assigning an optimized slot to a local variable. And, another yes, nested blocks (like for-statements) are what I have in mind when using "nesting level". I was told that the algorithm itself is very similar to linear-scan register allocation, modulo the ability to spill values, of course. From my benchmarks and in-depth analysis of several programs, I found this to work very well. In fact, the only problematic situation I found is (unfortunately) in one of the top-most executed functions in US' bm_django.py: there is one loop that almost never gets executed, but this loop gives precedence to the local variables used inside it. Because of this, I already have an idea for a better approach: first, use the static heuristic to compute stack slot scores, then count back-branches (I would need this anyway, as _Py_CheckInterval is gone and OSR/hot-swapping is in general a good idea) and record their frequency. Next, just replace the current static weight of 100 by the dynamically recorded weight. Consequently, you should get better allocations. (Please note that I did some quantitative analysis of Python functions to determine that using 4 SFC slots covers a substantial fraction of functions [IIRC 95%] in the trivial scenario where there are at most 4 local variables.)
 2) The RC elimination is tricky to get right and thus somewhat dangerous, but sounds worthwhile and should work particularly well on a stack-based byte code interpreter like CPython.

Well, it was very tricky to get right when I first implemented it, around Christmas 2009. The current implementation is reasonably simple to understand; however, it depends on the function refcount_effect to give me correct information at all times. I got the biggest performance improvement on one benchmark on the PowerPC and think that RISC architectures in general benefit more from this optimization (eliminating the load, add and store instructions) than x86 CISCs do (an INCREF is just an add on the memory location without data dependencies, so fairly cheap). In any case, however, you get the replication effect of improving CPU branch prediction by having these additional instruction derivatives. It would be interesting (research-wise, too) to be able to measure whether the reduction in memory operations makes Python programs use less energy, and if so, how big the difference is.

 3) Inline caching also sounds worthwhile, although I wonder how large the savings will be here. You'd save a couple of indirect jumps at the C-API level, sure, but apart from that, my guess is that it would highly depend on the type of instruction. Certain (repeated) calls to C-implemented functions would likely benefit quite a bit, for example, which would be a nice optimisation by itself, e.g. for builtins. I would expect that the same applies to iterators; even a couple of percent faster iteration can make a great deal of difference, and a substantial set of iterators are implemented in C, e.g. itertools, range, zip and friends. I'm not so sure about arithmetic operations. In Cython, we (currently?) do not optimistically replace these with more specific code (unless we know the types at compile time), because it complicates the generated C code, and indirect jumps aren't so slow that the benefit would be important. Savings are *much* higher when data can be unboxed, so much so that the slight improvement from optimistic type guesses is totally dwarfed in Cython. I would expect that the return on investment is better when the types are actually known at runtime, as in your case.

Well, in my thesis I already hint at another improvement of the existing design that can work on unboxed data as well (while still being an interpreter.) I am eager to try this, but don't know how much time I can spend on it (because there are several other research projects I am actively involved in.) In my experience, this works very well, and you cannot actually report good speedups without inline-caching arithmetic operations, simply because that's where all JITs shine and most benchmarks don't reflect real-world scenarios but mathematics-inclined microbenchmarks. Also, if in the future compilers (gcc and clang) are able to inline the invoked functions, higher speedups will be possible.

 4) Regarding inlined object references, I would expect that it's much more worthwhile to speed up LOAD_GLOBAL and LOAD_NAME than LOAD_CONST.
Re: [Python-Dev] Python 3 optimizations continued...
stefan brunthaler, 02.09.2011 17:55:
 4) Regarding inlined object references, I would expect that it's much more worthwhile to speed up LOAD_GLOBAL and LOAD_NAME than LOAD_CONST. I guess that this would be best helped by watching the module dict and the builtin dict internally and invalidating the interpreter state after changes (e.g. by providing a change counter in those dicts and checking that in the instructions that access them), and otherwise keeping the objects cached. Simply watching the dedicated instructions that change that state isn't enough as Python allows code to change these dicts directly through their dict interface. [...]
 Thanks for the pointers to the dict stuff, I will take a look (IIRC, Antoine pointed me in the same direction last year, but I think the design was slightly different then),

Not unlikely, Antoine tends to know the internals pretty well. The Cython project has been (hand-wavingly) thinking about this also: implement our own module type with its own __setattr__ (and dict proxy) in order to speed up access to the globals in the *very* likely case that they rarely or never change after module initialisation time and that most critical code accesses them read-only from within functions. If it turns out that this makes sense for CPython in general, it wouldn't be a bad idea to join forces at some point in order to make this readily usable for both sides.

Stefan
Re: [Python-Dev] Python 3 optimizations continued...
 For a comparative real-world benchmark I tested Martin von Loewis' django port (there are not that many meaningful Python 3 real-world benchmarks) and got a speedup of 1.3 (without IIS). This is reasonably good; US got a speedup of 1.35 on this benchmark. I just checked that pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures seem to be not working currently or *really* fast...), but I cannot tell directly how that relates to speedups (it just says "less is better" and I did not quickly find an explanation). Since I did this benchmark last year, I have spent more time investigating it and found that I could do better, but I would have to guess as to how much. (An interesting aside, though: on this benchmark, the executable never grew to more than 5 megs of memory usage, exactly like the vanilla Python 3 interpreter.)

PyPy is ~12x faster on the django benchmark FYI
Re: [Python-Dev] Python 3 optimizations continued...
Maciej Fijalkowski, 02.09.2011 20:42: For a comparative real world benchmark I tested Martin von Loewis' django port (there are not that many meaningful Python 3 real world benchmarks) and got a speedup of 1.3 (without IIS). This is reasonably well, US got a speedup of 1.35 on this benchmark. I just checked that pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures seem to be not working currently or *really* fast...), but I cannot tell directly how that relates to speedups (it just says less is better and I did not quickly find an explanation). PyPy is ~12x faster on the django benchmark FYI FYI, there's a recent thread up on the pypy ML where someone is complaining about PyPy being substantially slower than CPython when running Django on top of SQLite. Also note that PyPy doesn't implement Py3 yet, so the benchmark results are not comparable anyway. As usual, benchmark results depend on what you do in your benchmarks. Stefan
Re: [Python-Dev] Python 3 optimizations continued...
On Fri, Sep 2, 2011 at 9:20 PM, Stefan Behnel stefan...@behnel.de wrote: Maciej Fijalkowski, 02.09.2011 20:42: For a comparative real world benchmark I tested Martin von Loewis' django port (there are not that many meaningful Python 3 real world benchmarks) and got a speedup of 1.3 (without IIS). This is reasonably well, US got a speedup of 1.35 on this benchmark. I just checked that pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures seem to be not working currently or *really* fast...), but I cannot tell directly how that relates to speedups (it just says less is better and I did not quickly find an explanation). PyPy is ~12x faster on the django benchmark FYI FYI, there's a recent thread up on the pypy ML where someone is complaining about PyPy being substantially slower than CPython when running Django on top of SQLite. Also note that PyPy doesn't implement Py3 yet, so the benchmark results are not comparable anyway. Yes, sqlite is slow. It's also much faster in trunk than in 1.6 and there is an open ticket about it :) The django benchmark is just templating, so it does not involve a database.
Re: [Python-Dev] Python 3 optimizations continued...
On 8/30/2011 4:41 PM, stefan brunthaler wrote:
 Ok, then there's something else you haven't told us. Are you saying that the original (old) bytecode is still used (and hence written to and read from .pyc files)?
 Short answer: yes. Long answer: I added an invocation counter to the code object and keep interpreting in the usual Python interpreter until this counter reaches a configurable threshold. When it reaches this threshold, I create the new instruction format and interpret with this optimized representation. All the macros look exactly the same in the source code, they are just redefined to use the different instruction format. I am at no point serializing this representation or the runtime information gathered by me, as any subsequent invocation might have different characteristics.

When the switchover to the new instruction format happens, what happens to sys.settrace() tracing? Will it report the same sequence of line numbers? For a small but important class of program executions, this is more important than speed.

--Ned.

 Best, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
2011/9/1 Ned Batchelder n...@nedbatchelder.com When the switchover to the new instruction format happens, what happens to sys.settrace() tracing? Will it report the same sequence of line numbers? For a small but important class of program executions, this is more important than speed. --Ned A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well). Regards, Cesare
Re: [Python-Dev] Python 3 optimizations continued...
Cesare Di Mauro wrote: 2011/9/1 Ned Batchelder n...@nedbatchelder.com mailto:n...@nedbatchelder.com When the switchover to the new instruction format happens, what happens to sys.settrace() tracing? Will it report the same sequence of line numbers? For a small but important class of program executions, this is more important than speed. --Ned A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well). What happens if tracing is enabled *during* the execution of the new instruction format? Some sort of deoptimisation will be required in order to recover the correct VM state. Cheers, Mark. Regards, Cesare
Re: [Python-Dev] Python 3 optimizations continued...
2011/9/1 Mark Shannon m...@hotpy.org Cesare Di Mauro wrote: 2011/9/1 Ned Batchelder n...@nedbatchelder.com mailto: n...@nedbatchelder.com When the switchover to the new instruction format happens, what happens to sys.settrace() tracing? Will it report the same sequence of line numbers? For a small but important class of program executions, this is more important than speed. --Ned A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well). What happens if tracing is enabled *during* the execution of the new instruction format? Some sort of deoptimisation will be required in order to recover the correct VM state. Cheers, Mark. Sure. I don't think that the regular ceval.c loop will be dropped when executing the new instruction format, so we can intercept a change like this using the why variable, for example, or something similar that is normally used to break the regular loop execution. Anyway, we need to take a look at the code. Cheers, Cesare
Re: [Python-Dev] Python 3 optimizations continued...
On Sep 1, 2011, at 5:23 AM, Cesare Di Mauro wrote: A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well). Correct me if I'm wrong: doesn't this mean that no profiler will accurately be able to measure the performance impact of the new instruction format, and therefore one may get incorrect data when one is trying to make a CPU optimization for real-world performance?
Re: [Python-Dev] Python 3 optimizations continued...
On Thu, Sep 1, 2011 at 10:15 AM, Glyph Lefkowitz gl...@twistedmatrix.com wrote: On Sep 1, 2011, at 5:23 AM, Cesare Di Mauro wrote: A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well). Correct me if I'm wrong: doesn't this mean that no profiler will accurately be able to measure the performance impact of the new instruction format, and therefore one may get incorrect data when on is trying to make a CPU optimization for real-world performance? Well, profilers already skew results by adding call overhead. But tracing for debugging and profiling don't do exactly the same thing: debug tracing stops at every line, but profiling only executes hooks at the start and end of a function(*). So I think the function body could still be executed using the new format (assuming this is turned on/off per code object anyway). (*) And whenever a generator yields or is resumed. I consider that an annoying bug though, just as the debugger doesn't do the right thing with yield -- there's no way to continue until the yielding generator is resumed short of setting a manual breakpoint on the next line. -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
Hi, as promised, I created a publicly available preview of an implementation with my optimizations, which is available under the following location: https://bitbucket.org/py3_pio/preview/wiki/Home I followed Nick's advice and added a valuable overview/introduction at the wiki page the link points to; I am positive that spending 10 minutes reading it will provide you with valuable information regarding what's happening. In addition, as Guido already mentioned, this is more or less a direct copy of my research branch without some of my private comments and *no* additional refactorings for software-engineering issues (which I am very much aware of.) I hope this clarifies a *lot* and makes it easier to see what parts are involved and how all the pieces fit together. I hope you'll like it, have fun, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
stefan brunthaler, 30.08.2011 22:41:
 Ok, there's something else you haven't told us. Are you saying that the original (old) bytecode is still used (and hence written to and read from .pyc files)?
 Short answer: yes. Long answer: I added an invocation counter to the code object and keep interpreting in the usual Python interpreter until this counter reaches a configurable threshold. When it reaches this threshold, I create the new instruction format and interpret with this optimized representation. All the macros look exactly the same in the source code, they are just redefined to use the different instruction format. I am at no point serializing this representation or the runtime information gathered by me, as any subsequent invocation might have different characteristics.

So, basically, you built a JIT compiler but don't want to call it that, right? Just because it compiles byte code to other byte code rather than to native CPU instructions does not mean it doesn't compile Just In Time. That actually sounds like a nice feature in general. It could even replace (or accompany?) the existing peephole optimiser as part of a more general optimisation architecture, in the sense that it could apply byte code optimisations at runtime rather than compile time, potentially based on better knowledge about what's actually going on.

 I will remove my development commentaries and create a private repository at bitbucket

I agree with the others that it's best to open up your repository for everyone who is interested. I can see no reason why you would want to close it back down once it's there.

Stefan
Re: [Python-Dev] Python 3 optimizations continued...
 I think that you must deal with big endianness because some RISCs can't handle data in little-endian format at all. In WPython I wrote some macros which handle both endiannesses, but lacking big-endian machines I never had the opportunity to verify whether something was wrong.

I am sorry for the temporal lapse of not getting back to this directly yesterday; we were just heading out for lunch, and I figured it out only then but immediately forgot it on our way back to the lab... So, as I have already said, I evaluated my optimizations on x86 (little-endian) and PowerPC 970 (big-endian), and I did not have to change any of my instruction decoding during interpretation. (The only nasty bug I still remember vividly was that while on gcc for x86 the data type char defaults to signed, it defaults to unsigned on PowerPC's gcc.) When I have time and access to a PowerPC machine again (an ARM might be interesting, too), I will take a look at the generated assembly code to figure out why this is working. (I have some ideas why it might work without changing the code.) If I run into any problems, I'll gladly contact you :)

BTW: AFAIR, we emailed last year regarding wpython, and IIRC your optimizations could primarily be summarized as clever superinstructions. I have not implemented anything in that area at all (and have in fact not even touched the compiler and its peephole optimizer), but if parts of my implementation get in, I am sure that you could add some of your work on top of that, too.

Cheers, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
 So, basically, you built a JIT compiler but don't want to call it that, right? Just because it compiles byte code to other byte code rather than to native CPU instructions does not mean it doesn't compile Just In Time.

For me, a definition of a JIT compiler or any dynamic compilation subsystem entails that native machine code is generated at run-time. Furthermore, I am not compiling from bytecode to bytecode, but rather changing the instruction encoding underneath and subsequently using quickening to optimize interpretation. But, OTOH, I am not aware of a canonical definition of JIT compilation, so it depends ;)

 I agree with the others that it's best to open up your repository for everyone who is interested. I can see no reason why you would want to close it back down once it's there.

Well, my code has primarily been a vehicle for my research in that area and thus is not immediately suited to adoption (it does not adhere to Python C coding standards, contains lots of private comments about various facts, debugging hints, etc.). The explanation for this is easy: when I started out on my research, it was far from clear that it would be successful and really that much faster. So, I would like to clean up the comments and some parts of the code and publish the code I have without any of the clean-up work for naming conventions, etc., so that you can all take a look and it is clear what it's all about. After that we can have a factual discussion about whether it fits the bill for you, too, and if so, which changes (naming conventions, extensive documentation, etc.) are necessary *before* any adoption is reasonable for you, too. That seems to be a good way to start off and get results and feedback quickly; any ideas/complaints/comments/suggestions?

Best regards, --stefan

PS: I am using Nick's suggested plan to incorporate my changes directly into the most recent version, as mine is currently only running on Python 3.1.
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 10:04 PM, Cesare Di Mauro cesare.di.ma...@gmail.com wrote: It isn't, because motivation to do something new with CPython vanishes, at least on some areas (virtual machine / ceval.c), even having some ideas to experiment with. That's why in my last talk on EuroPython I decided to move on other areas (Python objects). Cesare, I'm really sorry that you became so disillusioned that you abandoned wordcode. I agree that we were too optimistic about Unladen Swallow. Also that the existence of PyPy and its PR machine (:-) should not stop us from improving CPython. I'm wondering if, with your experience in creating WPython, you could review Stefan Brunthaler's code and approach (once he's put it up for review) and possibly the two of you could even work on a joint project? -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
On Wed, Aug 31, 2011 at 10:08 AM, stefan brunthaler ste...@brunthaler.net wrote: Well, my code has primarily been a vehicle for my research in that area and thus is not immediately suited to adoption [...]. But if you want to be taken seriously as a researcher, you should publish your code! Without publication of your *code*, research in your area cannot be reproduced by others, so it is not science. Please stop being shy and open up what you have. The software engineering issues can be dealt with separately! -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/31 stefan brunthaler ste...@brunthaler.net I think that you must deal with big endianness because some RISC machines can't handle data in little-endian format at all. In WPython I have written some macros which handle both endiannesses, but lacking big-endian machines I never had the opportunity to verify whether something was wrong. I am sorry for the delay in not getting back to this directly yesterday; we were just heading out for lunch, and I figured it out only then but immediately forgot it on our way back to the lab... So, as I have already said, I evaluated my optimizations on x86 (little-endian) and PowerPC 970 (big-endian) and I did not have to change any of my instruction decoding during interpretation. (The only nasty bug I still remember vividly was that on gcc for x86 the data type char defaults to signed, whereas it defaults to unsigned on PowerPC's gcc.) When I have time and access to a PowerPC machine again (an ARM might be interesting, too), I will take a look at the generated assembly code to figure out why this is working. (I have some ideas why it might work without changing the code.) If I run into any problems, I'll gladly contact you :) BTW: AFAIR, we emailed last year regarding wpython, and IIRC your optimizations could primarily be summarized as clever superinstructions. I have not implemented anything in that area at all (and have in fact not even touched the compiler and its peephole optimizer), but if parts of my implementation get in, I am sure that you could add some of your work on top of that, too. Cheers, --stefan You're right. I took a look at our old e-mails, and I found more details about your work. It's definitely not affected by processor endianness, so you don't need any check: it just works, because you produce the new opcodes in memory and consume them in memory as well. Looking at your examples, I think that WPython wordcode usage can be useful only for the simplest ones.
That's because superinstructions group together several actions that need to be split again into simpler ones by a tracing JIT/compiler like yours, if you want to keep it simple. You said that you added about 400 specialized instructions last year with the usual bytecodes, but wordcodes will require quite a few more (this can compromise performance on CPUs with small data caches). So I think that it'll be better to finish your work, with all tests passing, before thinking about adding something on top (which, for me, sounds like a machine-code JIT O:-) Regards, Cesare
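As an illustration of the superinstruction idea Cesare describes (the opcode names and the toy stack machine below are mine, not actual CPython or wpython code), a fused handler performs the work of several simple opcodes with a single dispatch — which is exactly the grouping a tracing JIT would later have to split apart again:

```python
# Toy stack machine: a fused "superinstruction" replaces a common
# three-opcode sequence, trading one dispatch for three.
def run(code, consts):
    stack = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == "LOAD_CONST":
            pc += 1
            stack.append(consts[code[pc]])
        elif op == "BINARY_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "LOAD_CONST_LOAD_CONST_ADD":  # fused superinstruction
            pc += 2
            stack.append(consts[code[pc - 1]] + consts[code[pc]])
        elif op == "RETURN_VALUE":
            return stack.pop()
        pc += 1

# Both programs compute consts[0] + consts[1]; the fused version needs
# two dispatches (fused op + return) instead of four.
plain = ["LOAD_CONST", 0, "LOAD_CONST", 1, "BINARY_ADD", "RETURN_VALUE"]
fused = ["LOAD_CONST_LOAD_CONST_ADD", 0, 1, "RETURN_VALUE"]
```

The point of the trade-off: each fused handler is another entry in the dispatch table and another body in the instruction cache, which is why adding many of them can hurt on CPUs with small caches.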
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/31 Guido van Rossum gu...@python.org On Tue, Aug 30, 2011 at 10:04 PM, Cesare Di Mauro cesare.di.ma...@gmail.com wrote: It isn't, because motivation to do something new with CPython vanishes, at least on some areas (virtual machine / ceval.c), even having some ideas to experiment with. That's why in my last talk on EuroPython I decided to move on other areas (Python objects). Cesare, I'm really sorry that you became so disillusioned that you abandoned wordcode. I agree that we were too optimistic about Unladen Swallow. Also that the existence of PyPy and its PR machine (:-) should not stop us from improving CPython. I never stopped thinking about new optimizations. A lot can be done on CPython, even without resorting to something like a JIT et al. I'm wondering if, with your experience in creating WPython, you could review Stefan Brunthaler's code and approach (once he's put it up for review) and possibly the two of you could even work on a joint project? -- --Guido van Rossum (python.org/~guido) Yes, I can. I'll wait for Stefan to update his source (reaching Python 3.2 at least), as he has said he intends to do, and for everything to be published, in order to review the code. I also agree with you that right now it doesn't need to look state-of-the-art. First make it work, then make it nicer. ;) Regards, Cesare
Re: [Python-Dev] Python 3 optimizations continued...
On Thu, Sep 1, 2011 at 3:28 AM, Guido van Rossum gu...@python.org wrote: On Tue, Aug 30, 2011 at 10:04 PM, Cesare Di Mauro Cesare, I'm really sorry that you became so disillusioned that you abandoned wordcode. I agree that we were too optimistic about Unladen Swallow. Also that the existence of PyPy and its PR machine (:-) should not stop us from improving CPython. Yep, and I'll try to do a better job of discouraging creeping complexity (without adequate payoffs) without the harmful side effect of discouraging experimentation with CPython performance improvements in general. It's massive "rewrite the world" changes, which don't adequately account for all the ways CPython gets used or for the fact that core devs need to be able to effectively *review* the changes, that are unlikely to ever get anywhere. More localised changes, or those that are relatively easy to explain, have a much better chance. So I'll switch my tone to just trying to make sure that portability and maintainability concerns are given due weight :) Cheers, Nick. P.S. I suspect a big part of my attitude stems from the fact that we're still trying to untangle some of the consequences of committing the PEP 3118 new buffer API implementation with inadequate review (it turns out the implementation didn't reflect the PEP, and the PEP had deficiencies of its own), and I was one of the ones advocating in favour of that patch. Once bitten, twice shy, etc. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Python 3 optimizations continued...
Nick Coghlan wrote: Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Me, too! I like that I can read the CPython source and understand what it's doing most of the time. Please don't screw that up by attempting to perform heroic optimisations. -- Greg
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 08:57, Greg Ewing greg.ew...@canterbury.ac.nz wrote: Nick Coghlan wrote: Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Me, too! I like that I can read the CPython source and understand what it's doing most of the time. Please don't screw that up by attempting to perform heroic optimisations. -- Following this argument to the extreme, the bytecode evaluation code of CPython can be simplified quite a bit. Lose 2x performance but gain a lot of readability. Does that sound like a good deal? I don't intend to sound sarcastic, just to show that IMHO this argument isn't a good one. I think that even cleverly optimized code can be properly written and *documented* to make the task of understanding it feasible. Personally, I'd love CPython to be a bit faster and see no reason to give up optimization opportunities for the sake of code readability. Eli
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 4:22 PM, Eli Bendersky eli...@gmail.com wrote: On Tue, Aug 30, 2011 at 08:57, Greg Ewing greg.ew...@canterbury.ac.nz wrote: Following this argument to the extreme, the bytecode evaluation code of CPython can be simplified quite a bit. Lose 2x performance but gain a lot of readability. Does that sound like a good deal? I don't intend to sound sarcastic, just show that IMHO this argument isn't a good one. I think that even clever optimized code can be properly written and *documented* to make the task of understanding it feasible. Personally, I'd love CPython to be a bit faster and see no reason to give up optimization opportunities for the sake of code readability. Yeah, it's definitely a trade-off - the point I was trying to make is that there *is* a trade-off being made between complexity and speed. I think the computed-gotos stuff struck a nice balance - the macro-fu involved means that you can still understand what the main eval loop is *doing*, even if you don't know exactly what's hidden behind the target macros. Ditto for the older opcode prediction feature and the peephole optimiser - separation of concerns means that you can understand the overall flow of events without needing to understand every little detail. This is where the request to extract individual orthogonal changes and submit separate patches comes from - it makes it clear that the independent changes *can* be separated cleanly, and aren't a giant ball of incomprehensible mud. It's the difference between complex (lots of moving parts, that can each be understood on their own and are then composed into a meaningful whole) and complicated (massive patches that don't work at all if any one component is delayed) Eugene Toder's AST optimiser work that I still hope to get into 3.3 will have to undergo a similar process - the current patch covers a bit too much ground and needs to be broken up into smaller steps before we can seriously consider pushing it into the core. 
Regards, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Python 3 optimizations continued...
Nick Coghlan, 30.08.2011 02:00: On Tue, Aug 30, 2011 at 7:14 AM, Antoine Pitrou wrote: On Mon, 29 Aug 2011 11:33:14 -0700 stefan brunthaler wrote: * The optimized dispatch routine has a changed instruction format (word-sized instead of bytecodes) that allows for regular instruction decoding (without the HAS_ARG check) and inlining of some objects in the instruction format on 64-bit architectures. Having a word-sized bytecode format would probably be acceptable in itself, so if you want to submit a patch for that, go ahead. Although any such patch should discuss how it compares with Cesare's work on wpython. Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Armin Rigo made the judgment years ago that CPython was a poor platform for serious optimisation when he stopped working on Psyco and started PyPy instead, and I think the contrasting fates of PyPy and Unladen Swallow have borne out that opinion. Significantly increasing the complexity of CPython for speed-ups that are dwarfed by those available through PyPy seems like a poor trade-off to me. If Stefan can cut down his changes into smaller feature chunks, thus making their benefit reproducible and verifiable by others, it's well worth reconsidering whether even a visible increase in complexity isn't worth the improved performance, one patch at a time. Even if PyPy's performance tops the improvements, it's worth remembering that PyPy is also a very different kind of system than CPython, with different resource requirements and a different level of maturity, compatibility, portability, etc. There are many reasons to continue using CPython, not only in corner cases, and there are many people who would be happy about a faster CPython. Raising the bar has its virtues. That being said, I also second Nick's reference to wpython.
If CPython grows its byte code size anyway (which, as I understand, is one part of the proposed changes), it's worth looking at wpython first, given that it has been around and working for a while. The other proposed changes sound like at least some of them are independent from this one. Stefan
Re: [Python-Dev] Python 3 optimizations continued...
Nick Coghlan wrote: On Tue, Aug 30, 2011 at 7:14 AM, Antoine Pitrou solip...@pitrou.net wrote: On Mon, 29 Aug 2011 11:33:14 -0700 stefan brunthaler s.bruntha...@uci.edu wrote: * The optimized dispatch routine has a changed instruction format (word-sized instead of bytecodes) that allows for regular instruction decoding (without the HAS_ARG check) and inlining of some objects in the instruction format on 64-bit architectures. Having a word-sized bytecode format would probably be acceptable in itself, so if you want to submit a patch for that, go ahead. Although any such patch should discuss how it compares with Cesare's work on wpython. Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. CPython has a large number of micro-optimisations scattered all over the code base. By removing these and adding large-scale optimisations, like Stefan's, the code base *might* actually get smaller overall (and thus simpler) *and* faster. Of course, CPython must remain portable. [snip] At a bare minimum, I don't think any significant changes should be made under the "it will be faster" justification until the bulk of the real-world benchmark suite used for speed.pypy.org is available for Python 3. (Wasn't there a GSoC project about that?) +1 Cheers, Mark.
Re: [Python-Dev] Python 3 optimizations continued...
Martin v. Löwis wrote: So, the two big issues aside, is there any interest in incorporating these optimizations in Python 3? The question really is whether this is an all-or-nothing deal. If you could identify smaller parts that can be applied independently, interest would be higher. Also, I'd be curious whether your techniques help or hinder a potential integration of a JIT generator. A JIT compiler is not a silver bullet; translation to machine code is just one of many optimisations performed by PyPy. A compiler merely removes interpretative overhead, at the cost of significantly increased code size, whereas Stefan's work attacks both interpreter overhead and some of the inefficiencies due to dynamic typing. If Unladen Swallow achieved anything, it was to demonstrate that a JIT alone does not work well. My (experimental) HotPy VM has similar baseline speed to CPython, yet is able to outperform Unladen Swallow using interpreter-only optimisations. (It goes even faster with the compiler turned on :) ) Cheers, Mark. Regards, Martin
Re: [Python-Dev] Python 3 optimizations continued...
Although any such patch should discuss how it compares with Cesare's work on wpython. Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Changing the bytecode width wouldn't make the interpreter more complex. No, but I think Stefan is proposing to add a *second* byte code format, in addition to the one that remains there. That would certainly be an increase in complexity. Some years ago we were waiting for Unladen Swallow to improve itself and be ported to Python 3. Now it seems we are waiting for PyPy to be ported to Python 3. I'm not sure how "let's just wait" is a good trade-off if someone proposes interesting patches (which, of course, remains to be seen). I completely agree. Let's not put unmet preconditions on such projects. For example, I still plan to write a JIT for Python at some point. This may happen in two months, or in two years. I wouldn't try to stop anybody from contributing improvements that may become obsolete with the JIT. The only recent case where I *did* try to stop people is with PEP 393, where I do believe that some of the changes that had been made over the last year become redundant. Regards, Martin
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, 30 Aug 2011 13:29:59 +1000 Nick Coghlan ncogh...@gmail.com wrote: Anecdotal, non-reproducible performance figures are *not* the way to go about serious optimisation efforts. What about anecdotal *and* reproducible performance figures? :) I may be half-joking, but we already have a set of py3k-compatible benchmarks and, besides, sometimes a timeit invocation gives a good idea of whether an approach is fruitful or not. While a permanent public reference with historical tracking of performance figures is even better, let's not freeze everything until it's ready. (For example, do we need to wait for speed.python.org before PEP 393 is accepted?) Regards Antoine.
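To make Antoine's timeit suggestion concrete, here is a minimal sketch (the expressions compared are arbitrary examples of mine, not from the thread); absolute numbers are machine-dependent, so only the relative comparison between the two variants is meaningful:

```python
import timeit

# Rough micro-benchmark of two equivalent computations.  Quick-and-dirty
# by design: enough to see whether an approach is fruitful, not a
# substitute for a full benchmark suite like perf.py.
t_loop = timeit.timeit(
    "total = 0\nfor i in range(100): total += i", number=10_000)
t_builtin = timeit.timeit("sum(range(100))", number=10_000)

print(f"explicit loop: {t_loop:.3f}s   sum() builtin: {t_builtin:.3f}s")
```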
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 9:38 PM, Antoine Pitrou solip...@pitrou.net wrote: On Tue, 30 Aug 2011 13:29:59 +1000 Nick Coghlan ncogh...@gmail.com wrote: Anecdotal, non-reproducible performance figures are *not* the way to go about serious optimisation efforts. What about anecdotal *and* reproducible performance figures? :) I may be half-joking, but we already have a set of py3k-compatible benchmarks and, besides, sometimes a timeit invocation gives a good idea of whether an approach is fruitful or not. While a permanent public reference with historical tracking of performance figures is even better, let's not freeze everything until it's ready. (for example, do we need to wait for speed.python.org before PEP 393 is accepted?) Yeah, I'd neglected the idea of just running perf.py for pre- and post-patch performance comparisons. You're right that that can generate sufficient info to make a well-informed decision. I'd still really like it if some of the people advocating that we care about CPython performance actually volunteered to spearhead the effort to get speed.python.org up and running, though. As far as I know, the hardware's spinning idly waiting to be given work to do :P Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Python 3 optimizations continued...
Changing the bytecode width wouldn't make the interpreter more complex. No, but I think Stefan is proposing to add a *second* byte code format, in addition to the one that remains there. That would certainly be an increase in complexity. Yes, indeed I have a more straightforward instruction format to allow for more efficient decoding. Just going from bytecode size to word-code size without changing the instruction format is going to require 8 (or word-size) times more memory on a 64-bit system. From an optimization perspective, the irregular instruction format was the biggest problem, because checking for HAS_ARG is always on the fast path and mostly unpredictable. Hence, I chose to extend the instruction format to word size and use the additional space so that the upper half holds the argument and the lower half the actual opcode. Encoding is more efficient, and *not* more complex. Using profiling to indicate what code is hot, I don't waste too much memory on encoding this regular instruction format. For example, I still plan to write a JIT for Python at some point. This may happen in two months, or in two years. I wouldn't try to stop anybody from contributing improvements that may become obsolete with the JIT. I would not necessarily argue that my optimizations would become obsolete; if you still think about writing a JIT, it might make sense to re-use what I've got and not start from scratch. E.g., building a simple JIT compiler that just inlines the operation implementations as templates to eliminate the interpretative overhead (in a similar vein to Piumarta and Riccardi's paper from 1998) might be a good start. Though I don't want to pre-influence your JIT design, I'm just thinking out loud... Regards, --stefan
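The regular format Stefan describes (argument in the upper half, opcode in the lower half, no HAS_ARG check) can be sketched as follows. The 16/16 split, helper names, and 32-bit word are illustrative assumptions for brevity; his actual instructions are word-sized:

```python
OPCODE_BITS = 16  # assumed split; the real layout may differ

def encode(opcode, oparg):
    # Every instruction is one fixed-size word: argument in the upper
    # half, opcode in the lower half.  Decoding never needs to branch
    # on whether an argument is present.
    return (oparg << OPCODE_BITS) | opcode

def decode(word):
    # Two cheap, branch-free operations -- the point of the regular format.
    return word & ((1 << OPCODE_BITS) - 1), word >> OPCODE_BITS

word = encode(100, 3)          # hypothetical opcode 100, oparg 3
opcode, oparg = decode(word)
```

Contrast this with classic CPython bytecode, where the interpreter must first test HAS_ARG to learn whether the next two bytes belong to the current instruction.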
Re: [Python-Dev] Python 3 optimizations continued...
Stefan, have you shared a pointer to your code yet? Is it open source? It sounds like people are definitely interested and it would make sense to let them experiment with your code and review it. -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 09:42, Guido van Rossum gu...@python.org wrote: Stefan, have you shared a pointer to your code yet? Is it open source? I have no shared code repository, but could create one (is there any pydev-preferred provider?). I have all the copyrights on the code, and I would like to open-source it. It sounds like people are definitely interested and it would make sense to let them experiment with your code and review it. That sounds fine. I need to do some clean-up work (the code contains most of my comments to remind me of issues) and it currently does not pass all regression tests. But if people want to take a look first to decide if they want it, then that's good enough for me. (I just wanted to know if there is substantial interest, so that it eventually pays off to find and fix the remaining bugs.) --stefan
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, 30 Aug 2011 08:27:13 -0700 stefan brunthaler ste...@brunthaler.net wrote: Changing the bytecode width wouldn't make the interpreter more complex. No, but I think Stefan is proposing to add a *second* byte code format, in addition to the one that remains there. That would certainly be an increase in complexity. Yes, indeed I have a more straightforward instruction format to allow for more efficient decoding. Just going from bytecode size to word-code size without changing the instruction format is going to require 8 (or word-size) times more memory on a 64-bit system. Do you really need it to match a machine word? Or is, say, a 16-bit format sufficient? Regards Antoine.
Re: [Python-Dev] Python 3 optimizations continued...
Do you really need it to match a machine word? Or is, say, a 16-bit format sufficient? Hm, technically no, but practically it makes more sense, as (at least for x86 architectures) having opargs and opcodes in half-words can be efficiently expressed in assembly. On 64-bit architectures, I could also inline data object references that fit into the 32-bit upper half. It turns out that most constant objects fit nicely into this, and I have used this for a special cache region (again below 2^32) for global objects, too. So, technically it's not necessary, but practically it makes a lot of sense. (Most of these things work on 32-bit systems, too. For architectures with a smaller word size, we can adapt or disable the optimizations.) Cheers, --stefan
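Stefan's inlining of 32-bit-addressable references into the upper half of a 64-bit instruction word might look roughly like the following simulation. The cache layout and all names here are my assumptions for illustration, not his actual implementation:

```python
# Stand-in for a cache region kept at addresses (or indices) below 2**32,
# so a reference into it fits the 32-bit upper half of a 64-bit word.
global_cache = []

def intern_in_cache(obj):
    # Place an object in the cache and return its 32-bit-safe handle.
    global_cache.append(obj)
    idx = len(global_cache) - 1
    assert idx < 2**32  # must fit the upper half of the instruction word
    return idx

def encode64(opcode, cache_idx):
    # 64-bit instruction word: object handle above, opcode below.
    return (cache_idx << 32) | opcode

def fetch_inlined(word):
    # The operand is recovered straight from the instruction word,
    # skipping the usual indirection through a constants tuple.
    return global_cache[word >> 32]

idx = intern_in_cache("some_global")   # hypothetical cached object
word = encode64(7, idx)                # hypothetical opcode 7
```

The design benefit being simulated: the interpreter's operand fetch becomes a shift plus one array access, instead of an index load through the code object's constants.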
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 10:50 AM, stefan brunthaler ste...@brunthaler.net wrote: Do you really need it to match a machine word? Or is, say, a 16-bit format sufficient. Hm, technically no, but practically it makes more sense, as (at least for x86 architectures) having opargs and opcodes in half-words can be efficiently expressed in assembly. On 64bit architectures, I could also inline data object references that fit into the 32bit upper half. It turns out that most constant objects fit nicely into this, and I have used this for a special cache region (again below 2^32) for global objects, too. So, technically it's not necessary, but practically it makes a lot of sense. (Most of these things work on 32bit systems, too. For architectures with a smaller size, we can adapt or disable the optimizations.) Do I sense that the bytecode format is no longer platform-independent? That will need a bit of discussion. I bet there are some things around that depend on that. -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
Do I sense that the bytecode format is no longer platform-independent? That will need a bit of discussion. I bet there are some things around that depend on that. Hm, I haven't really thought about that in detail or for very long. I ran it on PowerPC 970 and Intel Atom i7 without problems (the latter ones are a non-issue) and think that it can be portable. I just stuff argument and opcode into one word for regular instruction decoding, like a RISC CPU, and I realize there might be little/big endian issues, but they surely can be conditionally compiled... --stefan
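The little/big-endian concern raised here comes down to how a packed instruction word lays out in memory. A small demonstration with the struct module (using a hypothetical 32-bit word with a 16/16 opcode/argument split for brevity):

```python
import struct

# One encoded instruction word: oparg 200 in the upper half,
# opcode 100 in the lower half.
word = (200 << 16) | 100

little = struct.pack("<I", word)  # x86-style byte order
big = struct.pack(">I", word)     # PowerPC-style byte order

# The in-memory byte sequences differ between the two platforms...
assert little != big
# ...but decoding with the platform's own byte order recovers the same
# instruction word.  This is why code that both produces and consumes
# the words in memory on the same machine never notices endianness --
# only a serialized on-disk format would have to pick one byte order
# (or compile the accessors conditionally).
assert struct.unpack("<I", little) == struct.unpack(">I", big) == (word,)
```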
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 11:23 AM, stefan brunthaler ste...@brunthaler.net wrote: Do I sense that the bytecode format is no longer platform-independent? That will need a bit of discussion. I bet there are some things around that depend on that. Hm, I haven't really thought about that in detail and for longer, I ran it on PowerPC 970 and Intel Atom i7 without problems (the latter ones are a non-issue) and think that it can be portable. I just stuff argument and opcode into one word for regular instruction decoding like a RISC CPU, and I realize there might be little/big endian issues, but they surely can be conditionally compiled... Um, I'm sorry, but that reply sounds incredibly naive, like you're not really sure what the on-disk format for .pyc files is or why it would matter. You're not even answering the question, except indirectly -- it seems that you've never even thought about the possibility of generating a .pyc file on one platform and copying it to a computer using a different one. -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
Um, I'm sorry, but that reply sounds incredibly naive, like you're not really sure what the on-disk format for .pyc files is or why it would matter. You're not even answering the question, except indirectly -- it seems that you've never even thought about the possibility of generating a .pyc file on one platform and copying it to a computer using a different one. Well, it may sound incredibly naive, but the truth is: I am never storing the optimized representation to disk, it's done purely at runtime when profiling tells me it makes sense to make the switch. Thus I circumvent many of the problems outlined by you. So I am positive that a full fledged change of the representation has many more intricacies to it, but my approach is only tangentially related... --stefan
Re: [Python-Dev] Python 3 optimizations continued...
On 8/30/2011 1:23 PM, stefan brunthaler wrote: (I just wanted to know if there is substantial interest so that it eventually pays off to find and fix the remaining bugs) It is the nature of our development process that there usually can be no guarantee of acceptance of future code. The rather early acceptance of Unladen Swallow was to me something of an anomaly. I also think it was something of a mistake insofar as it discouraged other efforts, like yours. I think the answer you have gotten is that there is a) substantial interest and b) a willingness to consider a major change such as switching from bytecode to something else. There also seem to be two main concerns: 1) that the increase in complexity be 'less' than the increase in speed, and 2) that the changes be presented in small enough chunks that they can be reviewed. Whether this is good enough for you to proceed is for you to decide. -- Terry Jan Reedy
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 11:34 AM, stefan brunthaler ste...@brunthaler.net wrote: Um, I'm sorry, but that reply sounds incredibly naive, like you're not really sure what the on-disk format for .pyc files is or why it would matter. You're not even answering the question, except indirectly -- it seems that you've never even thought about the possibility of generating a .pyc file on one platform and copying it to a computer using a different one. Well, it may sound incredibly naive, but the truth is: I am never storing the optimized representation to disk, it's done purely at runtime when profiling tells me it makes sense to make the switch. Thus I circumvent many of the problems outlined by you. So I am positive that a full fledged change of the representation has many more intricacies to it, but my approach is only tangentially related... Ok, then there's something else you haven't told us. Are you saying that the original (old) bytecode is still used (and hence written to and read from .pyc files)? -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
Am 30.08.2011 20:34, schrieb stefan brunthaler: Um, I'm sorry, but that reply sounds incredibly naive, like you're not really sure what the on-disk format for .pyc files is or why it would matter. You're not even answering the question, except indirectly -- it seems that you've never even thought about the possibility of generating a .pyc file on one platform and copying it to a computer using a different one. Well, it may sound incredibly naive, but the truth is: I am never storing the optimized representation to disk, it's done purely at runtime when profiling tells me it makes sense to make the switch. Thus I circumvent many of the problems outlined by you. So I am positive that a full fledged change of the representation has many more intricacies to it, but my approach is only tangentially related... You know, instead of all these half-explanations, giving us access to the code would shut us up much more effectively. Don't worry about not passing tests, this is what the official trunk does half of the time ;) Georg
Re: [Python-Dev] Python 3 optimizations continued...
Ok, then there's something else you haven't told us. Are you saying that the original (old) bytecode is still used (and hence written to and read from .pyc files)? Short answer: yes. Long answer: I added an invocation counter to the code object and keep interpreting in the usual Python interpreter until this counter reaches a configurable threshold. When it reaches this threshold, I create the new instruction format and interpret with this optimized representation. All the macros look exactly the same in the source code, they are just redefined to use the different instruction format. I am at no point serializing this representation or the runtime information gathered by me, as any subsequent invocation might have different characteristics. I will remove my development commentaries and create a private repository at bitbucket for you* to take an early look, like Georg (and more or less Terry, too) suggested. Is that a good way for most of you? (I would then give access to whomever wants to take a look.) Best, --stefan *: not personally targeted at Guido (who is naturally very much welcome to take a look, too) but addressed to python-dev in general.
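A rough sketch of the switching scheme Stefan describes above: a per-code-object invocation counter that, past a configurable threshold, derives the optimized representation at runtime without ever serializing it. All names and the threshold value here are hypothetical illustrations, not his actual implementation.

```python
# Sketch only: the real switch happens inside ceval.c's dispatch loops.

THRESHOLD = 1000  # hypothetical; the real threshold is configurable

def derive_wordcode(bytecode):
    # placeholder for building the word-sized instruction format
    return list(bytecode)

class CodeObject:
    def __init__(self, bytecode):
        self.bytecode = bytecode   # the regular representation from the .pyc file
        self.invocations = 0
        self.optimized = None      # built lazily at runtime, never written to disk

    def run(self):
        self.invocations += 1
        if self.optimized is None and self.invocations >= THRESHOLD:
            self.optimized = derive_wordcode(self.bytecode)
        return "optimized dispatch" if self.optimized is not None else "regular dispatch"

co = CodeObject(b"\x64\x00")
results = [co.run() for _ in range(THRESHOLD)]
assert results[0] == "regular dispatch"
assert results[-1] == "optimized dispatch"
```

Note how the on-disk format is untouched: only the in-memory representation changes once profiling says it is worthwhile.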
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/30 stefan brunthaler ste...@brunthaler.net: I will remove my development commentaries and create a private repository at bitbucket for you* to take an early look like Georg (and more or less Terry, too) suggested. Is that a good way for most of you? (I would then give access to whomever wants to take a look.) And what is wrong with a public one? -- Regards, Benjamin
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 13:42, Benjamin Peterson benja...@python.org wrote: 2011/8/30 stefan brunthaler ste...@brunthaler.net: I will remove my development commentaries and create a private repository at bitbucket for you* to take an early look like Georg (and more or less Terry, too) suggested. Is that a good way for most of you? (I would then give access to whomever wants to take a look.) And what is wrong with a public one? Well, since it does not fully pass all regression tests and is just meant for people to take a first look to find out if it's interesting, I think I might take it offline after you've had a look. It seems to me that that is easier to do with a private repository, but in general, I don't have a problem with a public one... Regards, --stefan PS: If you want to, I can also just put a tarball on my home page and post a link here. It's not that I want control/influence over who is allowed to look and who isn't.
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 1:54 PM, Benjamin Peterson benja...@python.org wrote: 2011/8/30 stefan brunthaler ste...@brunthaler.net: On Tue, Aug 30, 2011 at 13:42, Benjamin Peterson benja...@python.org wrote: 2011/8/30 stefan brunthaler ste...@brunthaler.net: I will remove my development commentaries and create a private repository at bitbucket for you* to take an early look like Georg (and more or less Terry, too) suggested. Is that a good way for most of you? (I would then give access to whomever wants to take a look.) And what is wrong with a public one? Well, since it does not fully pass all regression tests and is just meant for people to take a first look to find out if it's interesting, I think I might take it offline after you had a look. It seems to me that that is easier to be done with a private repository, but in general, I don't have a problem with a public one... Well, if your intention is for people to look at it, public seems to be the best solution. +1 The point of open source is more eyeballs and the ability for anyone else to pick up code and run in whatever direction they want (license permitting) with it. :)
Re: [Python-Dev] Python 3 optimizations continued...
On Aug 30, 2011, at 9:05 AM, Nick Coghlan ncogh...@gmail.com wrote: On Tue, Aug 30, 2011 at 9:38 PM, Antoine Pitrou solip...@pitrou.net wrote: On Tue, 30 Aug 2011 13:29:59 +1000 Nick Coghlan ncogh...@gmail.com wrote: Anecdotal, non-reproducible performance figures are *not* the way to go about serious optimisation efforts. What about anecdotal *and* reproducible performance figures? :) I may be half-joking, but we already have a set of py3k-compatible benchmarks and, besides, sometimes a timeit invocation gives a good idea of whether an approach is fruitful or not. While a permanent public reference with historical tracking of performance figures is even better, let's not freeze everything until it's ready. (for example, do we need to wait for speed.python.org before PEP 393 is accepted?) Yeah, I'd neglected the idea of just running perf.py for pre- and post-patch performance comparisons. You're right that that can generate sufficient info to make a well-informed decision. I'd still really like it if some of the people advocating that we care about CPython performance actually volunteered to spearhead the effort to get speed.python.org up and running, though. As far as I know, the hardware's spinning idly waiting to be given work to do :P Cheers, Nick. Discussion of speed.python.org should happen on the mailing list for that project if possible.
Re: [Python-Dev] Python 3 optimizations continued...
On Wed, Aug 31, 2011 at 3:23 AM, stefan brunthaler ste...@brunthaler.net wrote: On Tue, Aug 30, 2011 at 09:42, Guido van Rossum gu...@python.org wrote: Stefan, have you shared a pointer to your code yet? Is it open source? I have no shared code repository, but could create one (is there any pydev preferred provider?). I have all the copyrights on the code, and I would like to open-source it. Currently, the easiest way to create shared repositories for CPython variants is to start with bitbucket's mirror of the main CPython repo: https://bitbucket.org/mirror/cpython/overview Use the website to create your own public CPython fork, then edit the configuration of your local copy of the CPython repo to point to your new bitbucket repo rather than the main one on hg.python.org. hg push/pull can then be used as normal to publish in-development material to the world. 'hg pull' from hg.python.org makes it fairly easy to track the trunk. One key thing is to avoid making any changes of your own on the official CPython branches (i.e. default, 3.2, 2.7). Instead, use a named branch for anything you're working on. This makes it much easier to generate standalone patches later on. My own public sandbox (https://bitbucket.org/ncoghlan/cpython_sandbox/overview) is set up that way, and you can see plenty of other examples on bitbucket. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Python 3 optimizations continued...
On Wed, Aug 31, 2011 at 9:21 AM, Jesse Noller jnol...@gmail.com wrote: Discussion of speed.python.org should happen on the mailing list for that project if possible. Hah, that's how out of the loop I am on that front - I didn't even know there *was* a mailing list for it :) Subscribed! Cheers, Nick. P.S. For anyone else that is interested: http://mail.python.org/mailman/listinfo/speed -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Python 3 optimizations continued...
On 8/30/2011 2:12 PM, Guido van Rossum wrote: On Tue, Aug 30, 2011 at 10:50 AM, stefan brunthaler ste...@brunthaler.net wrote: Do you really need it to match a machine word? Or is, say, a 16-bit format sufficient. Hm, technically no, but practically it makes more sense, as (at least for x86 architectures) having opargs and opcodes in half-words can be efficiently expressed in assembly. On 64bit architectures, I could also inline data object references that fit into the 32bit upper half. It turns out that most constant objects fit nicely into this, and I have used this for a special cache region (again below 2^32) for global objects, too. So, technically it's not necessary, but practically it makes a lot of sense. (Most of these things work on 32bit systems, too. For architectures with a smaller size, we can adapt or disable the optimizations.) Do I sense that the bytecode format is no longer platform-independent? That will need a bit of discussion. I bet there are some things around that depend on that. I find myself more comfortable with Cesare Di Mauro's idea of expanding to 16 bits as the code unit. His basic idea was using 2, 4, or 6 bytes instead of 1, 3, or 6. It actually tended to save space because many ops with small ints (which are very common) contract from 3 bytes to 2 bytes or from 9(?) (two instructions) to 6. I am sorry he was not able to follow up on the initial promising results. The dis output was probably easier to read than the current output. Perhaps he made a mistake in combining the above idea with a shift from stack to hybrid stack+register design. -- Terry Jan Reedy
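A sketch of the 16-bit code unit idea Terry describes, showing why small-int ops contract from 3 bytes to 2. This is an illustration of the general scheme under assumed layout choices, not the actual wpython encoding.

```python
# Each 16-bit code unit is 2 bytes; an instruction occupies 1 or 2 units
# (2 or 4 bytes) here, versus 1 or 3 bytes in the classic byte-oriented format.

def encode_16bit(opcode, oparg=None):
    """Return a list of 16-bit code units for one instruction (layout assumed)."""
    assert 0 <= opcode < (1 << 8)
    if oparg is None:
        return [opcode]                 # 2 bytes (old format: 1 byte)
    if oparg < (1 << 8):
        return [(oparg << 8) | opcode]  # 2 bytes: small arg packed inline (old: 3)
    return [opcode, oparg & 0xFFFF]     # 4 bytes for a larger argument

assert len(encode_16bit(23)) == 1         # no-arg op
assert len(encode_16bit(100, 5)) == 1     # small-int arg fits in one unit
assert len(encode_16bit(100, 4096)) == 2  # bigger arg needs an extension unit
```

The space trade-off Terry mentions falls out of the middle case: arguments below 256 are by far the most common, and they shrink rather than grow.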
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/30 Antoine Pitrou solip...@pitrou.net Changing the bytecode width wouldn't make the interpreter more complex. It depends on the kind of changes. :) WPython introduced a very different intermediate code representation that required a big change to the peephole optimizer in the 1.0 alpha version. In 1.1 final I decided to completely move that code into ast.c (mostly for constant folding) and compiler.c (for the usual peepholer usage: seeking out patterns to substitute with better ones) because I found it simpler and more convenient. In the end, taking out some new optimizations that I've implemented along the way, the interpreter code is a bit more complex. Some years ago we were waiting for Unladen Swallow to improve itself and be ported to Python 3. Now it seems we are waiting for PyPy to be ported to Python 3. I'm not sure how "let's just wait" is a good trade-off if someone proposes interesting patches (which, of course, remains to be seen). Regards Antoine. It isn't, because motivation to do something new with CPython vanishes, at least in some areas (virtual machine / ceval.c), even having some ideas to experiment with. That's why in my last talk at EuroPython I decided to move to other areas (Python objects). Regards Cesare
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/30 Nick Coghlan ncogh...@gmail.com Yeah, it's definitely a trade-off - the point I was trying to make is that there *is* a trade-off being made between complexity and speed. I think the computed-gotos stuff struck a nice balance - the macro-fu involved means that you can still understand what the main eval loop is *doing*, even if you don't know exactly what's hidden behind the target macros. Ditto for the older opcode prediction feature and the peephole optimiser - separation of concerns means that you can understand the overall flow of events without needing to understand every little detail. This is where the request to extract individual orthogonal changes and submit separate patches comes from - it makes it clear that the independent changes *can* be separated cleanly, and aren't a giant ball of incomprehensible mud. It's the difference between complex (lots of moving parts, that can each be understood on their own and are then composed into a meaningful whole) and complicated (massive patches that don't work at all if any one component is delayed) Eugene Toder's AST optimiser work that I still hope to get into 3.3 will have to undergo a similar process - the current patch covers a bit too much ground and needs to be broken up into smaller steps before we can seriously consider pushing it into the core. Regards, Nick. Sometimes it cannot be done, because big changes produce big patches as well. I don't see a problem here if the code is well written (as required by the Python community :) and the developer is available to talk about his work to clear up any doubts. Regards Cesare
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/30 stefan brunthaler ste...@brunthaler.net Yes, indeed I have a more straightforward instruction format to allow for more efficient decoding. Just going from bytecode size to word-code size without changing the instruction format is going to require 8 (or word-size) times more memory on a 64bit system. From an optimization perspective, the irregular instruction format was the biggest problem, because checking for HAS_ARG is always on the fast path and mostly unpredictable. Hence, I chose to extend the instruction format to have word-size and use the additional space to have the upper half be used for the argument and the lower half for the actual opcode. Encoding is more efficient, and *not* more complex. Using profiling to indicate what code is hot, I don't waste too much memory on encoding this regular instruction format. Regards, --stefan That seems exactly the WPython approach, albeit I used the new wordcode in place of the old bytecode. Take a look at it. ;) Regards Cesare
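As a rough illustration of the regular format Stefan describes -- argument in the upper half of the word, opcode in the lower half, so every instruction decodes the same way and the HAS_ARG check disappears from the fast path. The 32-bit width and the opcode numbers below are illustrative assumptions, not his actual layout.

```python
# One fixed-size word per instruction: RISC-like, uniform decoding.

HALF = 16
MASK = (1 << HALF) - 1

def encode(opcode, oparg):
    assert 0 <= opcode <= MASK and 0 <= oparg <= MASK
    return (oparg << HALF) | opcode

def decode(word):
    # same two operations for every instruction -- no HAS_ARG branch
    return word & MASK, word >> HALF

word = encode(100, 3)   # opcode number 100 is purely illustrative
assert decode(word) == (100, 3)
```

The cost is the memory blow-up Stefan mentions, which is why he only re-encodes code that profiling has identified as hot.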
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/30 stefan brunthaler ste...@brunthaler.net Do I sense that the bytecode format is no longer platform-independent? That will need a bit of discussion. I bet there are some things around that depend on that. Hm, I haven't really thought about that in detail and for longer, I ran it on PowerPC 970 and Intel Atom i7 without problems (the latter ones are a non-issue) and think that it can be portable. I just stuff argument and opcode into one word for regular instruction decoding like a RISC CPU, and I realize there might be little/big endian issues, but they surely can be conditionally compiled... --stefan I think that you must deal with big endianness because some RISC machines can't handle data in little-endian format at all. In WPython I wrote some macros which handle both endiannesses, but lacking big-endian machines I never had the opportunity to verify whether something was wrong. Regards Cesare
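The endianness concern above can be seen concretely: the same instruction bytes, loaded as one word, yield different values depending on byte order, which is exactly why conditionally compiled decoding macros are needed. A small sketch (opcode/argument values are illustrative):

```python
import struct

word = (3 << 16) | 100                       # argument 3, opcode 100
raw = struct.pack("<I", word)                # memory layout on a little-endian machine

assert struct.unpack("<I", raw)[0] == word   # little-endian load: fields intact
assert struct.unpack(">I", raw)[0] != word   # big-endian load scrambles the fields
```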
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/31 Terry Reedy tjre...@udel.edu I find myself more comfortable with the Cesare Di Mauro's idea of expanding to 16 bits as the code unit. His basic idea was using 2, 4, or 6 bytes instead of 1, 3, or 6. It can be expanded to opcodes longer than 6 bytes, if needed. The format is designed to be flexible enough to accommodate such changes without pain. It actually tended to save space because many ops with small ints (which are very common) contract from 3 bytes to 2 bytes or from 9(?) (two instructions) to 6. It can pack up to 4 (old) opcodes into one wordcode (superinstruction). Wordcodes are designed to favor instruction grouping. I am sorry he was not able to follow up on the initial promising results. In a few words: lack of interest. Why spend (so much) time on a project when you see that the community is oriented towards other directions (Unladen Swallow at first, PyPy in the last period, given the substantial drop of the former)? Also, Guido seems to dislike what he regards as hacks, and never showed interest. In WPython 1.1 I rolled back the hack that I introduced in PyObject types (a couple of fields) in 1.0 alpha, to make the code more polished (but with a noticeable drop in performance). But again, I saw no interest in WPython, so I decided to put a stop to it, blocking my initial idea to go for Python 3. The dis output was probably easier to read than the current output. Perhaps he made a mistake in combining the above idea with a shift from stack to hybrid stack+register design. -- Terry Jan Reedy As I already said, wordcodes are designed to favor grouping. So it was quite natural to become a hybrid VM. Anyway, both space and performance gained from this wordcode property. ;) Regards Cesare
[Python-Dev] Python 3 optimizations continued...
Hi, pretty much a year ago I wrote about the optimizations I did for my PhD thesis that target the Python 3 series interpreters. While I got some replies, the discussion never really picked up and no final explicit conclusion was reached. AFAICT, because of the following two factors, my optimizations were not that interesting for inclusion with the distribution at that time: a) Unladen Swallow was targeting Python 3, too. b) My prototype did not pass the regression tests. As of November 2010 (IIRC), Google is not supporting work on US anymore, and the project is stalled. (If I am wrong and there is still activity and any plans with the corresponding PEP, please let me know.) Which is why I recently spent some time fixing issues so that I can run the regression tests. There is still some work to be done, but by and large it should be possible to complete all regression tests in reasonable time (with the actual infrastructure in place, enabling optimizations later on is not a problem at all.) So, the two big issues aside, is there any interest in incorporating these optimizations in Python 3? Have a nice day, --stefan PS: It probably is unusual, but in a part of my home page I have created a link to indicate interest (makes both counting and voting easier, http://www.ics.uci.edu/~sbruntha/) There were also links indicating interest in funding the work; I have disabled these, so as not to upset anybody or give the impression of begging for money...
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/29 stefan brunthaler s.bruntha...@uci.edu: So, the two big issues aside, is there any interest in incorporating these optimizations in Python 3? Perhaps there would be something to say given patches/overviews/specifics. -- Regards, Benjamin
Re: [Python-Dev] Python 3 optimizations continued...
Perhaps there would be something to say given patches/overviews/specifics. Currently I don't have patches, but for an overview and specifics, I can provide the following:
* My optimizations basically rely on quickening to incorporate run-time information.
* I use two separate instruction dispatch routines, and use profiling to switch from the regular Python 3 dispatch routine to an optimized one (the implementation is actually vice versa, but that is not important now).
* The optimized dispatch routine has a changed instruction format (word-sized instead of bytecodes) that allows for regular instruction decoding (without the HAS_ARG check) and inlining of some objects in the instruction format on 64bit architectures.
* I use inline caching based on quickening (passes almost all regression tests [302 out of 307]), eliminate reference count operations using quickening (passes but has a memory leak), promote frequently accessed local variables to their dedicated instructions (passes), and cache LOAD_GLOBAL/LOAD_NAME objects in the instruction encoding when possible (I am working on this right now.)
The changes I made can be summarized as:
* I changed some header files to accommodate additional information (Python.h, ceval.h, code.h, frameobject.h, opcode.h, tupleobject.h).
* I changed mostly abstract.c to incorporate runtime type feedback.
* All other changes target mostly ceval.c; all supplementary code is in a sub-directory named opt, and all generated files are in a sub-directory within that (opt/gen).
* I have a code generator in place that takes care of generating all the functions; it uses the Mako template system for creating C code and does not necessarily need to be shipped with the interpreter (though one can play around and experiment with it.)
So, all in all, the changes to the actual implementation are not that big, and most of the code is generated (using sloccount, opt has 1990 lines of C, and opt/gen has 8649 lines of C).
That's a quick summary, if there are any further or more in-depth questions, let me know. best, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
So, the two big issues aside, is there any interest in incorporating these optimizations in Python 3? The question really is whether this is an all-or-nothing deal. If you could identify smaller parts that can be applied independently, interest would be higher. Also, I'd be curious whether your techniques help or hinder a potential integration of a JIT generator. Regards, Martin
Re: [Python-Dev] Python 3 optimizations continued...
The question really is whether this is an all-or-nothing deal. If you could identify smaller parts that can be applied independently, interest would be higher. Well, it's not an all-or-nothing deal. In my current architecture, I can selectively enable most of the optimizations as I see fit. The only pre-requisite (in my implementation) is that I have two dispatch loops with a changed instruction format. It is, however, not a technical necessity, just the way I implemented it. Basically, you can choose whatever you like best, and I could extract that part. I am just offering to add all the things that I have done :) Also, I'd be curious whether your techniques help or hinder a potential integration of a JIT generator. This is something I have previously frequently discussed with several JIT people. IMHO, having my optimizations in-place also helps a JIT compiler, since it can re-use the information I gathered to generate more aggressively optimized native machine code right away (the inline caches can be generated with the type information right away, some functions could be inlined with the guard statements subsumed, etc.) Another benefit could be that the JIT compiler can spend longer time on generating code, because the interpreter is already faster (so in some cases it would probably not make sense to include a non-optimizing fast and simple JIT compiler). There are others on the list, who probably can/want to comment on this, too. That aside, I think that while having a JIT is an important goal, I can very well imagine scenarios where the additional memory consumption (for the generated native machine code) of a JIT for each process (I assume that the native machine code caches are not shared) hinders scalability. 
I have in fact no data to back this up, but I think that would be an interesting trade-off: say, a 30% gain in performance without substantial additional memory requirements on my existing hardware, compared to higher achievable speedups that require more machines. Regards, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
On Mon, 29 Aug 2011 11:33:14 -0700 stefan brunthaler s.bruntha...@uci.edu wrote: * The optimized dispatch routine has a changed instruction format (word-sized instead of bytecodes) that allows for regular instruction decoding (without the HAS_ARG-check) and inlining of some objects in the instruction format on 64bit architectures. Having a word-sized bytecode format would probably be acceptable in itself, so if you want to submit a patch for that, go ahead. Regards Antoine.
Re: [Python-Dev] Python 3 optimizations continued...
On Monday, 29 August 2011 at 19:35:14, stefan brunthaler wrote: pretty much a year ago I wrote about the optimizations I did for my PhD thesis that target the Python 3 series interpreters Does it speed up Python? :-) Could you provide numbers (benchmarks)? Victor
Re: [Python-Dev] Python 3 optimizations continued...
Does it speed up Python? :-) Could you provide numbers (benchmarks)? Yes, it does ;) The maximum overall speedup I achieved was by a factor of 2.42 on my i7-920 for the spectralnorm benchmark of the computer language benchmarks game. Others from the same set are:
binarytrees: 1.9257 (1.9891)
fannkuch: 1.6509 (1.7264)
fasta: 1.5446 (1.7161)
mandelbrot: 2.0040 (2.1847)
nbody: 1.6165 (1.7602)
spectralnorm: 2.2538 (2.4176)
---
overall: 1.8213 (1.9382)
(The first number is the combination of all optimizations, the one in parentheses is with my last optimization [Interpreter Instruction Scheduling] enabled, too.) For a comparative real-world benchmark I tested Martin von Loewis' django port (there are not that many meaningful Python 3 real-world benchmarks) and got a speedup of 1.3 (without IIS). That compares reasonably well: US got a speedup of 1.35 on this benchmark. I just checked that pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures seem to be either not working currently or *really* fast...), but I cannot tell directly how that relates to speedups (it just says less is better and I did not quickly find an explanation). Since I did this benchmark last year, I have spent more time investigating it and found that I could do better, but I would have to guess as to how much. (An interesting aside though: on this benchmark, the executable never grew beyond 5 megs of memory usage, exactly like the vanilla Python 3 interpreter.) hth, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 7:14 AM, Antoine Pitrou solip...@pitrou.net wrote: On Mon, 29 Aug 2011 11:33:14 -0700 stefan brunthaler s.bruntha...@uci.edu wrote: * The optimized dispatch routine has a changed instruction format (word-sized instead of bytecodes) that allows for regular instruction decoding (without the HAS_ARG-check) and inlining of some objects in the instruction format on 64bit architectures. Having a word-sized bytecode format would probably be acceptable in itself, so if you want to submit a patch for that, go ahead. Although any such patch should discuss how it compares with Cesare's work on wpython. Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Armin Rigo made the judgment years ago that CPython was a poor platform for serious optimisation when he stopped working on Psyco and started PyPy instead, and I think the contrasting fates of PyPy and Unladen Swallow have borne out that opinion. Significantly increasing the complexity of CPython for speed-ups that are dwarfed by those available through PyPy seems like a poor trade-off to me. At a bare minimum, I don't think any significant changes should be made under the "it will be faster" justification until the bulk of the real-world benchmark suite used for speed.pypy.org is available for Python 3. (Wasn't there a GSoC project about that?) Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Python 3 optimizations continued...
Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Armin Rigo made the judgment years ago that CPython was a poor platform for serious optimisation when he stopped working on Psyco and started PyPy instead, and I think the contrasting fates of PyPy and Unladen Swallow have borne out that opinion. Significantly increasing the complexity of CPython for speed-ups that are dwarfed by those available through PyPy seems like a poor trade-off to me. I agree with the trade-off, but the nice thing is that CPython's interpreter remains simple and portable using my optimizations. All of these optimizations are purely interpretative and the complexity of CPython is not affected much. (For example, I have an inline-cached version of BINARY_ADD that is called INCA_FLOAT_ADD [INCA being my abbreviation for INline CAching]; you don't actually have to look at its source code, since it is generated by my code generator, but you can immediately tell what's going on by looking at instruction traces.) So, the interpreter remains fully portable and any compatibility issues with C modules should not occur either. At a bare minimum, I don't think any significant changes should be made under the "it will be faster" justification until the bulk of the real-world benchmark suite used for speed.pypy.org is available for Python 3. (Wasn't there a GSoC project about that?) Having more tests would surely be helpful; as already said, the most real-world stuff I can do is Martin's django patch (some of the other benchmarks though are from the shootout and I can [and did] run them, too {binarytrees, fannkuch, fasta, mandelbrot, nbody and spectralnorm}. I also have the AI benchmark from Unladen Swallow but no current figures.) Best, --stefan
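To illustrate the quickening idea behind INCA_FLOAT_ADD, here is a small Python analogue: a generic instruction rewrites its call site to a float-specialized derivative, which keeps a guard that de-optimizes on an inline-cache miss. All names and the dict-based call site are illustrative; in Stefan's interpreter this rewriting happens on the word-code instruction stream itself.

```python
def generic_add(a, b, site):
    # first execution observes the operand types and quickens the call site
    if type(a) is float and type(b) is float:
        site["impl"] = inca_float_add    # quicken: BINARY_ADD -> INCA_FLOAT_ADD
    return a + b

def inca_float_add(a, b, site):
    # inline-cache miss guard: fall back if the type assumption breaks
    if type(a) is not float or type(b) is not float:
        site["impl"] = generic_add       # de-optimize back to the generic form
        return generic_add(a, b, site)
    return a + b                         # fast path: operand types already known

site = {"impl": generic_add}
assert site["impl"](1.5, 2.5, site) == 4.0
assert site["impl"] is inca_float_add    # the call site was quickened
assert site["impl"](1, 2, site) == 3     # guard catches the int/int miss
assert site["impl"] is generic_add       # and the site reverts
```

The guards are the part Stefan mentions elsewhere in the thread: if they are non-exhaustive, a cache miss would go unnoticed and crash, which is why the regression tests matter so much for this technique.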
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, 30 Aug 2011 10:00:28 +1000 Nick Coghlan ncogh...@gmail.com wrote:

Having a word-sized bytecode format would probably be acceptable in itself, so if you want to submit a patch for that, go ahead. Any such patch should discuss how it compares with Cesare's work on wpython, though.

> Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space.

Changing the bytecode width wouldn't make the interpreter more complex.

> Armin Rigo made the judgment years ago that CPython was a poor platform for serious optimisation when he stopped working on Psyco and started PyPy instead, and I think the contrasting fates of PyPy and Unladen Swallow have borne out that opinion.

Well, PyPy didn't show any significant achievements before they spent *much* more time on it than the Unladen Swallow guys did. Whether or not a good JIT is possible on top of CPython might remain a largely unanswered question.

> Significantly increasing the complexity of CPython for speed-ups that are dwarfed by those available through PyPy seems like a poor trade-off to me.

Some years ago we were waiting for Unladen Swallow to improve itself and be ported to Python 3. Now it seems we are waiting for PyPy to be ported to Python 3. I'm not sure how "let's just wait" is a good trade-off if someone proposes interesting patches (which, of course, remains to be seen).

> At a bare minimum, I don't think any significant changes should be made under the "it will be faster" justification until the bulk of the real-world benchmark suite used for speed.pypy.org is available for Python 3. (Wasn't there a GSoC project about that?)

I'm not sure what "the bulk" is, but have you already taken a look at http://hg.python.org/benchmarks/ ?

Regards
Antoine.
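[Editor's note: as a rough illustration of what a word-sized bytecode format buys, every instruction becomes a fixed-size opcode/oparg pair, so fetch and decode in the dispatch loop are a constant-stride step rather than a variable-length decode. The 2-byte layout below is hypothetical, not wpython's actual encoding.]

```python
# Hypothetical word-sized instruction stream: each instruction is one
# opcode byte plus one oparg byte (2 bytes total). A sketch of the
# general idea only, not wpython's real format.

def decode(wordcode):
    """Return (opcode, oparg) pairs from a bytes object of 2-byte units."""
    return [(wordcode[i], wordcode[i + 1]) for i in range(0, len(wordcode), 2)]

# Two made-up instructions: opcode 100 with arg 0, opcode 23 with arg 1.
program = bytes([100, 0, 23, 1])
assert decode(program) == [(100, 0), (23, 1)]
```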
Re: [Python-Dev] Python 3 optimizations continued...
On Mon, Aug 29, 2011 at 2:05 PM, stefan brunthaler s.bruntha...@uci.edu wrote:

>> The question really is whether this is an all-or-nothing deal. If you could identify smaller parts that can be applied independently, interest would be higher.
>
> Well, it's not an all-or-nothing deal. In my current architecture, I can selectively enable most of the optimizations as I see fit. The only prerequisite (in my implementation) is that I have two dispatch loops with a changed instruction format. That is, however, not a technical necessity, just the way I implemented it. Basically, you can choose whatever you like best, and I could extract that part. I am just offering to add all the things that I have done :)

+1 from me on going forward with your performance improvements. The more you can break them down into individual smaller patch sets the better, as they can be reviewed and applied as needed. A prerequisites patch, a patch for the wide opcodes, etc. For benchmarks, given this is Python 3, just get as many useful ones running as you can.

Some in this thread seemed to give the impression that CPython performance is not something to care about. I disagree. I see CPython being the main implementation of Python used in most places for a long time. Improving its performance merely raises the bar to be met by other implementations if they want to compete. That is a good thing!

-gps

> Also, I'd be curious whether your techniques help or hinder a potential integration of a JIT generator.

This is something I have previously discussed with several JIT people. IMHO, having my optimizations in place also helps a JIT compiler, since it can reuse the information I gathered to generate more aggressively optimized native machine code right away (the inline caches can be generated with the type information right away, some functions could be inlined with the guard statements subsumed, etc.)
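[Editor's note: a minimal sketch of the reuse Stefan suggests — the types an inline cache observes amount to a per-site feedback table that a JIT could consult to decide where a single guarded fast path suffices. The data structure below is invented for illustration; it is not Stefan's implementation.]

```python
from collections import Counter

# Invented structure: per-instruction-site record of observed operand
# types, as an inline cache could accumulate it as a side effect.
type_feedback = {}  # instruction offset -> Counter of (type, type) pairs

def record(offset, a, b):
    """What an inline-cache hit/miss path could log while executing."""
    site = type_feedback.setdefault(offset, Counter())
    site[(type(a).__name__, type(b).__name__)] += 1

def is_monomorphic(offset):
    """A JIT could emit one specialized fast path (guard included) for
    sites where only a single type pair has ever been observed."""
    seen = type_feedback.get(offset)
    return seen is not None and len(seen) == 1

record(0, 1.0, 2.0)
record(0, 3.0, 4.0)
assert is_monomorphic(0)      # only (float, float) seen: specialize freely
record(0, 1, 2)
assert not is_monomorphic(0)  # polymorphic site: needs a fallback path
```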
Another benefit could be that the JIT compiler can spend more time on generating code, because the interpreter is already faster (so in some cases it would probably not make sense to include a non-optimizing, fast-and-simple JIT compiler). There are others on the list who can probably comment on this, too.

That aside, I think that while having a JIT is an important goal, I can very well imagine scenarios where the additional memory consumption (for the generated native machine code) of a JIT for each process (I assume the native machine code caches are not shared) hinders scalability. I have no data to back this up, but I think that would be an interesting trade-off: say, a 30% performance gain without substantial additional memory requirements on existing hardware, versus higher achievable speedups that require more machines.

Regards,
--stefan
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 12:38 PM, Gregory P. Smith g...@krypto.org wrote:

> Some in this thread seemed to give the impression that CPython performance is not something to care about. I disagree. I see CPython being the main implementation of Python used in most places for a long time. Improving its performance merely raises the bar to be met by other implementations if they want to compete. That is a good thing!

Not the impression I intended to give. I merely want to highlight that we need to be careful that incremental increases in complexity are justified with real, measured performance improvements.

PyPy has set the bar on how to do that - people that seriously want to make CPython faster need to focus on getting speed.python.org sorted *first* (so we know where we're starting) and *then* work on trying to improve CPython's numbers relative to that starting point. The PSF has the hardware to run the site but, unless more has been going on in the background than I am aware of, is still lacking trusted volunteers to do the following:

1. Getting codespeed up and running on the PSF hardware
2. Hooking it in to the CPython source control infrastructure
3. Getting a reasonable set of benchmarks running on 3.x (likely starting with the already ported set in Mercurial, but eventually we want the full suite that PyPy uses)
4. Once PyPy, Jython and IronPython offer 3.x compatible versions, starting to include them as well (alternatively, offer 2.x performance comparisons, although that's less interesting from a CPython point of view since it can't be used to guide future CPython optimisation efforts)

Anecdotal, non-reproducible performance figures are *not* the way to go about serious optimisation efforts. Using a dedicated machine is vulnerable to architecture-specific idiosyncrasies, but ad hoc testing on other systems can still be used as a sanity check.

Regards, Nick.
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia