[Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov
Hi, This is the second email thread I start regarding implementing an opcode cache in ceval loop. Since my first post on this topic: - I've implemented another optimization (LOAD_ATTR); - I've added detailed statistics mode so that I can "see" how the cache performs and tune it; - some

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Sven R. Kunze
On 01.02.2016 20:51, Yury Selivanov wrote: If LOAD_ATTR gets too many cache misses (20 in my current patch) it gets deoptimized, and the default implementation is used. So if the code is very dynamic - there's no improvement, but no performance penalty either. Will you re-try optimizing it?
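The deoptimization rule quoted in this message can be sketched in Python. This is only a toy model, not the patch itself: the class `AttrCacheSite`, its fields, and the choice to key the cache on `type(obj)` are assumptions made for illustration. Only the threshold of 20 misses comes from the message, and whether the real patch deoptimizes at exactly 20 or just past it is not stated.

```python
MAX_MISSES = 20  # threshold quoted in the thread; all other names are hypothetical


class AttrCacheSite:
    """Toy model of a single LOAD_ATTR site guarded by a type check."""

    def __init__(self, name):
        self.name = name
        self.cached_type = None   # type seen on the last cache fill
        self.hits = 0
        self.misses = 0
        self.deoptimized = False

    def load(self, obj):
        if self.deoptimized:
            # Default implementation: no cache bookkeeping at all.
            return getattr(obj, self.name)
        if type(obj) is self.cached_type:
            # Cache hit. A real cache would skip the generic lookup here;
            # this sketch only counts the hit.
            self.hits += 1
            return getattr(obj, self.name)
        # Cache miss: count it, then either refill the cache or give up.
        self.misses += 1
        if self.misses >= MAX_MISSES:
            self.deoptimized = True
        else:
            self.cached_type = type(obj)
        return getattr(obj, self.name)


class A:
    x = 1


class B:
    x = 2


stable = AttrCacheSite("x")
for _ in range(100):
    stable.load(A())           # same type every time: one miss, then hits

dynamic = AttrCacheSite("x")
for i in range(40):
    dynamic.load(A() if i % 2 else B())  # type flips on every call
```

In this model the stable site records a single miss and stays optimized, while the site that sees alternating types gives up after its 20th miss and never touches the cache again. That matches the "no improvement, but no performance penalty either" behaviour described above, under the stated assumptions.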

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov
Hi Damien, On 2016-02-01 3:59 PM, Damien George wrote: Hi Yury, That's great news about the speed improvements with the dict offset cache! The cache struct is defined in code.h [2], and is 32 bytes long. When a code object becomes hot, it gets a cache offset table allocated for it (+1 byte

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Sven R. Kunze
On 01.02.2016 21:35, Yury Selivanov wrote: It's important to understand that if we have a lot of cache misses after the code object was executed 1000 times, it doesn't make sense to keep trying to update that cache. It just means that the code, in that particular point, works with different

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Andrew Barnert via Python-Dev
Looking over the thread and the two issues, you've got good arguments for why the improved code will be the most common code, and good benchmarks for various kinds of real-life code, but it doesn't seem like you'd tried to stress it on anything that could be made worse. From your explanations

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Sven R. Kunze
On 01.02.2016 19:28, Brett Cannon wrote: A search for [stack vs register based virtual machine] will get you some information. Alright. :) Will go for that. You aren't really supposed to yet. :) In Pyjion's case we are still working on compatibility, let alone trying to show a speed

Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Ethan Furman
On 02/01/2016 08:40 AM, R. David Murray wrote: On Mon, 01 Feb 2016 14:12:27 +1100, Steven D'Aprano wrote: I find that being able to easily open stdlib .py files in a text editor to read the source is extremely valuable. I've learned much more from reading the source than from (e.g.)

Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread R. David Murray
On Mon, 01 Feb 2016 14:12:27 +1100, Steven D'Aprano wrote: > On Sun, Jan 31, 2016 at 08:23:00PM +, Brett Cannon wrote: > > So freezing the stdlib helps on UNIX and not on OS X (if my old testing is > > still accurate). I guess the next question is what it does on Windows

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Brett Cannon
On Mon, 1 Feb 2016 at 09:08 Yury Selivanov wrote: > > > On 2016-01-29 11:28 PM, Steven D'Aprano wrote: > > On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote: > >> Hi, > >> > >> > >> tl;dr The summary is that I have a patch that improves CPython > >>

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Brett Cannon
On Mon, 1 Feb 2016 at 12:16 Yury Selivanov wrote: > Brett, > > On 2016-02-01 3:08 PM, Brett Cannon wrote: > > > > > > On Mon, 1 Feb 2016 at 11:51 Yury Selivanov > > wrote: > > > > Hi Brett, > > > [..] > > > >

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Sven R. Kunze
On 01.02.2016 18:18, Brett Cannon wrote: On Mon, 1 Feb 2016 at 09:08 Yury Selivanov wrote: On 2016-01-29 11:28 PM, Steven D'Aprano wrote: > On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote: >> Hi,

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Brett Cannon
On Mon, 1 Feb 2016 at 10:21 Sven R. Kunze wrote: > > > On 01.02.2016 18:18, Brett Cannon wrote: > > > > On Mon, 1 Feb 2016 at 09:08 Yury Selivanov < > yselivanov...@gmail.com> wrote: > >> >> >> On 2016-01-29 11:28 PM, Steven D'Aprano wrote: >> > On Wed,

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Brett Cannon
On Mon, 1 Feb 2016 at 11:11 Yury Selivanov wrote: > Hi, > > This is the second email thread I start regarding implementing an opcode > cache in ceval loop. Since my first post on this topic: > > - I've implemented another optimization (LOAD_ATTR); > > - I've added

Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Sven R. Kunze
Thanks, Brett. I wasn't aware of lazy imports either. I think that approach is even better at reducing startup time than freezing the stdlib. On 31.01.2016 18:57, Brett Cannon wrote: I have opened http://bugs.python.org/issue26252 to track writing the example (and before ppl go playing with the lazy loader,

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov
On 2016-02-01 4:02 PM, Sven R. Kunze wrote: On 01.02.2016 21:35, Yury Selivanov wrote: It's important to understand that if we have a lot of cache misses after the code object was executed 1000 times, it doesn't make sense to keep trying to update that cache. It just means that the code, in

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Damien George
Hi Yury, That's great news about the speed improvements with the dict offset cache! > The cache struct is defined in code.h [2], and is 32 bytes long. When a > code object becomes hot, it gets a cache offset table allocated for it > (+1 byte for each opcode) + an array of cache structs. Ok, so

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov
Sven, On 2016-02-01 4:32 PM, Sven R. Kunze wrote: On 01.02.2016 22:27, Yury Selivanov wrote: Right now they are private constants in ceval.c. I will (maybe) expose a private API via the _testcapi module to re-define them (set them to 1 or 0), only to write better unittests. I have no plans

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov
On 2016-02-01 4:21 PM, Yury Selivanov wrote: Hi Damien, On 2016-02-01 3:59 PM, Damien George wrote: [..] But then how do you index the cache, do you keep a count of the current opcode number? If I remember correctly, CPython has some opcodes taking 1 byte, and some taking 3 bytes, so the

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Sven R. Kunze
On 01.02.2016 22:27, Yury Selivanov wrote: Right now they are private constants in ceval.c. I will (maybe) expose a private API via the _testcapi module to re-define them (set them to 1 or 0), only to write better unittests. I have no plans to make those constants public or have a public API

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Mark Lawrence
On 01/02/2016 16:54, Yury Selivanov wrote: On 2016-01-29 11:28 PM, Steven D'Aprano wrote: On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote: Hi, tl;dr The summary is that I have a patch that improves CPython performance up to 5-10% on macro benchmarks. Benchmarks results on

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Brett Cannon
On Mon, 1 Feb 2016 at 11:51 Yury Selivanov wrote: > Hi Brett, > > On 2016-02-01 2:30 PM, Brett Cannon wrote: > > > > > > On Mon, 1 Feb 2016 at 11:11 Yury Selivanov > > wrote: > > > > Hi, > > > [..] > > > >

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov
Brett, On 2016-02-01 3:08 PM, Brett Cannon wrote: On Mon, 1 Feb 2016 at 11:51 Yury Selivanov wrote: Hi Brett, [..] The first two fields are used to make sure that we have objects of the same type. If it changes,

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov
Hi Brett, On 2016-02-01 2:30 PM, Brett Cannon wrote: On Mon, 1 Feb 2016 at 11:11 Yury Selivanov wrote: Hi, [..] What's next? First, I'd like to merge the new LOAD_METHOD opcode, see issue 26110

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov
On 2016-02-01 3:21 PM, Brett Cannon wrote: On Mon, 1 Feb 2016 at 12:16 Yury Selivanov wrote: Brett, On 2016-02-01 3:08 PM, Brett Cannon wrote: > > > On Mon, 1 Feb 2016 at 11:51 Yury Selivanov

Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Andrew Barnert via Python-Dev
On Feb 1, 2016, at 09:59, mike.romb...@comcast.net wrote: > > If the stdlib were to use implicit namespace packages > ( https://www.python.org/dev/peps/pep-0420/ ) and the various > loaders/importers as well, then python could do what I've done with an > embedded python application for years.

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov
On 2016-02-01 3:27 PM, Sven R. Kunze wrote: On 01.02.2016 20:51, Yury Selivanov wrote: If LOAD_ATTR gets too many cache misses (20 in my current patch) it gets deoptimized, and the default implementation is used. So if the code is very dynamic - there's no improvement, but no performance

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov
Andrew, On 2016-02-01 4:29 PM, Andrew Barnert wrote: Looking over the thread and the two issues, you've got good arguments for why the improved code will be the most common code, and good benchmarks for various kinds of real-life code, but it doesn't seem like you'd tried to stress it on

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Greg Ewing
Sven R. Kunze wrote: Are there some resources on why register machines are considered faster than stack machines? If a register VM is faster, it's probably because each register instruction does the work of about 2-3 stack instructions, meaning fewer trips around the eval loop, so less
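The point about trips around the eval loop can be illustrated with two toy interpreters. This is a rough sketch only: the instruction names, the tuple encodings, and both `run_*` functions are invented for this example and do not correspond to CPython's bytecode or to any proposed register VM.

```python
def run_stack(program, env):
    """Execute a toy stack program; return the number of dispatch steps."""
    stack = []
    dispatches = 0
    for op, arg in program:
        dispatches += 1                 # one trip around the loop per instruction
        if op == "LOAD":
            stack.append(env[arg])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            env[arg] = stack.pop()
        else:
            raise ValueError(f"unknown stack op: {op}")
    return dispatches


def run_register(program, env):
    """Execute a toy register program; return the number of dispatch steps."""
    dispatches = 0
    for op, dst, src1, src2 in program:
        dispatches += 1
        if op == "ADD":
            # Operands and destination are named in the instruction itself.
            env[dst] = env[src1] + env[src2]
        else:
            raise ValueError(f"unknown register op: {op}")
    return dispatches


# The statement "c = a + b" in each encoding.
stack_program = [("LOAD", "a"), ("LOAD", "b"), ("ADD", None), ("STORE", "c")]
register_program = [("ADD", "c", "a", "b")]

stack_env = {"a": 1, "b": 2}
register_env = {"a": 1, "b": 2}
stack_steps = run_stack(stack_program, stack_env)
register_steps = run_register(register_program, register_env)
```

Both programs compute the same value for `c`, but the stack version goes through the dispatch loop once per push, operation, and store, while the register version does it in a single step. Real interpreters differ in many other ways (instruction size, decoding cost), so this only shows the dispatch-count argument, not an overall speed claim.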

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Yury Selivanov
Hi Brett, On 2016-02-01 12:18 PM, Brett Cannon wrote: On Mon, 1 Feb 2016 at 09:08 Yury Selivanov wrote: [..] If I were to do some big refactoring of the ceval loop, I'd probably consider implementing a register VM.

Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread mike . romberg
> " " == Barry Warsaw writes: >> On Feb 01, 2016, at 11:40 AM, R. David Murray wrote: >> I don't know about anyone else, but on my own development >> systems it is not that unusual for me to *edit* the stdlib >> files (to add debug prints) while

Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Barry Warsaw
On Feb 01, 2016, at 11:40 AM, R. David Murray wrote: >Well, Brett said it would be optional, though perhaps the above >paragraph is asking about doing it in our Windows build. But the linux >distros might also make use of the option if it exists, so the question is >very meaningful. However, you'd

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Yury Selivanov
On 2016-01-29 11:28 PM, Steven D'Aprano wrote: On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote: Hi, tl;dr The summary is that I have a patch that improves CPython performance up to 5-10% on macro benchmarks. Benchmarks results on Macbook Pro/Mac OS X, desktop CPU/Linux,

Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Nikolaus Rath
On Feb 01 2016, mike.romb...@comcast.net wrote: " " == Barry Warsaw writes: >> On Feb 01, 2016, at 11:40 AM, R. David Murray wrote: >> I don't know about anyone else, but on my own development >> systems it is not that unusual for me to *edit* the >>

Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Andrew Barnert via Python-Dev
On Feb 1, 2016, at 19:44, Terry Reedy wrote: > >> On 2/1/2016 3:39 PM, Andrew Barnert via Python-Dev wrote: >> >> There are already multiple duplicate questions every month on >> StackOverflow from people asking "how do I find the source to stdlib >> module X". The canonical

Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Terry Reedy
On 2/1/2016 3:39 PM, Andrew Barnert via Python-Dev wrote: There are already multiple duplicate questions every month on StackOverflow from people asking "how do I find the source to stdlib module X". The canonical answer starts off by explaining how to import the module and use its __file__,
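The `__file__` approach mentioned in this message looks roughly like the following. This is a small sketch, and the helper name `source_path` is made up for illustration. It uses only `importlib.import_module` and `inspect.getsourcefile` from the standard library, and it shows why the technique depends on the stdlib being shipped as ordinary `.py` files: built-in modules have no `__file__` at all.

```python
import importlib
import inspect


def source_path(module_name):
    """Return (module.__file__, source file) for a module, or None where unavailable."""
    module = importlib.import_module(module_name)
    # Built-in modules (and possibly frozen ones) carry no __file__ attribute.
    file_attr = getattr(module, "__file__", None)
    try:
        source = inspect.getsourcefile(module)
    except TypeError:
        # inspect raises TypeError for built-in modules.
        source = None
    return file_attr, source


if __name__ == "__main__":
    for name in ("json", "sys"):
        print(name, source_path(name))
```

On a conventional CPython install, `json` resolves to a readable `__init__.py` on disk, while `sys` yields `None` for both values. What a frozen or zipped stdlib would report here is exactly the concern being raised in this thread.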

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Sven R. Kunze
On 01.02.2016 17:54, Yury Selivanov wrote: If I were to do some big refactoring of the ceval loop, I'd probably consider implementing a register VM. While register VMs are a bit faster than stack VMs (up to 20-30%), they would also allow us to apply more optimizations, and even bolt on a

Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Brett Cannon
On Mon, 1 Feb 2016 at 08:48 R. David Murray wrote: > On Mon, 01 Feb 2016 14:12:27 +1100, Steven D'Aprano > wrote: > > On Sun, Jan 31, 2016 at 08:23:00PM +, Brett Cannon wrote: > > > So freezing the stdlib helps on UNIX and not on OS X (if my old >