[Python-Dev] Speeding up CPython

2020-10-20 Thread Mark Shannon
Hi everyone, CPython is slow. We all know that, yet little is done to fix it. I'd like to change that. I have a plan to speed up CPython by a factor of five over the next few years. But it needs funding. I am aware that there have been several promised speed ups in the past that have

Re: [Python-Dev] Speeding up CPython 5-10%

2016-05-18 Thread zreed
No problem, I did not think you were attacking me or find your response rude. On Wed, May 18, 2016, at 01:06 PM, Cesare Di Mauro wrote: > If you feel like I've attacked you, I apologize: it wasn't my > intention. Please, don't take it personally: I only reported my honest > opinion, albeit after

Re: [Python-Dev] Speeding up CPython 5-10%

2016-05-18 Thread Cesare Di Mauro
If you feel like I've attacked you, I apologize: it wasn't my intention. Please, don't take it personally: I only reported my honest opinion, albeit after a re-read it looks too rude, and I'm sorry for that. Regarding the post-bytecode optimization issues, they are mainly represented by the constant

Re: [Python-Dev] Speeding up CPython 5-10%

2016-05-18 Thread zreed
Your criticisms may very well be true. IIRC though, I wrote that pass because what was available was not general enough. The stackdepth_walk function made assumptions that, while true of code generated by the current cpython frontend, were not universally true. If a goal is to move this

Re: [Python-Dev] Speeding up CPython 5-10%

2016-05-18 Thread Cesare Di Mauro
2016-05-17 8:25 GMT+02:00 : > In the project https://github.com/zachariahreed/byteasm I mentioned on > the list earlier this month, I have a pass that computes stack usage > for a given sequence of bytecodes. It seems to be a fair bit more > aggressive than cpython. Maybe

Re: [Python-Dev] Speeding up CPython 5-10%

2016-05-17 Thread zreed
In the project https://github.com/zachariahreed/byteasm I mentioned on the list earlier this month, I have a pass that computes stack usage for a given sequence of bytecodes. It seems to be a fair bit more aggressive than cpython. Maybe it's more generally useful. It's pure python rather than C

Re: [Python-Dev] Speeding up CPython 5-10%

2016-05-16 Thread Cesare Di Mauro
2016-05-16 17:55 GMT+02:00 Meador Inge : > On Sun, May 15, 2016 at 2:23 AM, Cesare Di Mauro < > cesare.di.ma...@gmail.com> wrote: > > >> Just one thing that comes to my mind: is the stack depth calculation >> routine changed? It was suboptimal, and calculating a better number

Re: [Python-Dev] Speeding up CPython 5-10%

2016-05-16 Thread Meador Inge
On Sun, May 15, 2016 at 2:23 AM, Cesare Di Mauro wrote: > Just one thing that comes to my mind: is the stack depth calculation > routine changed? It was suboptimal, and calculating a better number > decreases stack allocation, and increases the frame usage. > This is
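For readers unfamiliar with the number under discussion: the compiler's stack depth estimate ends up on each code object as `co_stacksize`, and the interpreter uses it when sizing the frame's value stack. The sketch below only inspects that attribute; the functions `flat` and `nested` are made-up examples, and the values printed depend on the CPython version.

```python
def flat(a, b, c, d):
    # Each statement needs only a couple of stack slots at a time.
    x = a + b
    y = c + d
    return x + y


def nested(a, b, c, d):
    # Building nested literals keeps more values live on the stack at once.
    return [a, [b, [c, [d]]]]


def report():
    # Print the compiler's stack size estimate for each example function.
    for fn in (flat, nested):
        print(fn.__name__, fn.__code__.co_stacksize)
```

Calling `report()` shows the estimate for each function. An overestimate is harmless for correctness but makes every frame for that code object larger than it needs to be.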

Re: [Python-Dev] Speeding up CPython 5-10%

2016-05-15 Thread Cesare Di Mauro
2016-02-01 17:54 GMT+01:00 Yury Selivanov : > Thanks for bringing this up! > > IIRC wpython was about using "fat" bytecodes, i.e. using 64 bits per > bytecode instead of 8. No, it used 16, 32, or 48 bits per opcode (1, 2, or 3 16-bit words). > That allows to minimize

Re: [Python-Dev] Speeding up CPython 5-10%

2016-05-15 Thread Cesare Di Mauro
2016-02-02 10:28 GMT+01:00 Victor Stinner : > 2016-01-27 19:25 GMT+01:00 Yury Selivanov : > > tl;dr The summary is that I have a patch that improves CPython > performance > > up to 5-10% on macro benchmarks. Benchmark results on Macbook Pro/Mac

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-04 Thread Nick Coghlan
On 3 February 2016 at 03:52, Brett Cannon wrote: > Fifth, if we manage to show that a C API can easily be added to CPython to > make a JIT something that can simply be plugged in and be useful, then we > will also have a basic JIT framework for people to use. As I said, our use

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-02 Thread Peter Ludemann via Python-Dev
Also, modern compiler technology tends to use "infinite register" machines for the intermediate representation, then uses register coloring to assign the actual registers (and generate spill code if needed). I've seen work on inter-function optimization for avoiding some register loads and stores

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-02 Thread Brett Cannon
On Tue, 2 Feb 2016 at 01:29 Victor Stinner wrote: > Hi, > > I'm back from the FOSDEM event in Brussels, it was really cool. I gave a > talk about FAT Python and I got good feedback. But friends told me > that people now have expectations on FAT Python. It looks like

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-02 Thread Yury Selivanov
On 2016-02-02 4:28 AM, Victor Stinner wrote: [..] I take a first look at your patch and sorry, Thanks for the initial code review! I'm skeptical about the design. I have to play with it a little bit more to check if there is no better design. So far I see two things you are worried

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-02 Thread Victor Stinner
Hi, I'm back from the FOSDEM event in Brussels, it was really cool. I gave a talk about FAT Python and I got good feedback. But friends told me that people now have expectations on FAT Python. It looks like people care about Python performance :-) FYI the slides of my talk:

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-02 Thread Sven R. Kunze
On 02.02.2016 00:27, Greg Ewing wrote: Sven R. Kunze wrote: Are there some resources on why register machines are considered faster than stack machines? If a register VM is faster, it's probably because each register instruction does the work of about 2-3 stack instructions, meaning less

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Sven R. Kunze
On 01.02.2016 19:28, Brett Cannon wrote: A search for [stack vs register based virtual machine] will get you some information. Alright. :) Will go for that. You aren't really supposed to yet. :) In Pyjion's case we are still working on compatibility, let alone trying to show a speed

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Brett Cannon
On Mon, 1 Feb 2016 at 09:08 Yury Selivanov wrote: > > > On 2016-01-29 11:28 PM, Steven D'Aprano wrote: > > On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote: > >> Hi, > >> > >> > >> tl;dr The summary is that I have a patch that improves CPython > >>

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Sven R. Kunze
On 01.02.2016 18:18, Brett Cannon wrote: On Mon, 1 Feb 2016 at 09:08 Yury Selivanov > wrote: On 2016-01-29 11:28 PM, Steven D'Aprano wrote: > On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote: >> Hi,

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Brett Cannon
On Mon, 1 Feb 2016 at 10:21 Sven R. Kunze wrote: > > > On 01.02.2016 18:18, Brett Cannon wrote: > > > > On Mon, 1 Feb 2016 at 09:08 Yury Selivanov < > yselivanov...@gmail.com> wrote: > >> >> >> On 2016-01-29 11:28 PM, Steven D'Aprano wrote: >> > On Wed,

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Mark Lawrence
On 01/02/2016 16:54, Yury Selivanov wrote: On 2016-01-29 11:28 PM, Steven D'Aprano wrote: On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote: Hi, tl;dr The summary is that I have a patch that improves CPython performance up to 5-10% on macro benchmarks. Benchmark results on

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Greg Ewing
Sven R. Kunze wrote: Are there some resources on why register machines are considered faster than stack machines? If a register VM is faster, it's probably because each register instruction does the work of about 2-3 stack instructions, meaning fewer trips around the eval loop, so less
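The "2-3 stack instructions per register instruction" point can be illustrated with a toy model. This is a minimal sketch, not CPython's bytecode: the two instruction sets, the programs, and the function names are all invented for illustration. Both programs compute `a = b + c * d`, and each interpreter counts how many times it goes around its dispatch loop.

```python
# Stack form: operands are pushed and popped one instruction at a time.
STACK_PROGRAM = [
    ("LOAD", "b"),
    ("LOAD", "c"),
    ("LOAD", "d"),
    ("MUL", None),
    ("ADD", None),
    ("STORE", "a"),
]

# Register form: each instruction names its destination and both operands.
REGISTER_PROGRAM = [
    ("MUL", "t", "c", "d"),
    ("ADD", "a", "b", "t"),
]


def run_stack(program, env):
    """Run the toy stack program; return the number of dispatches."""
    stack = []
    dispatches = 0
    for op, arg in program:
        dispatches += 1
        if op == "LOAD":
            stack.append(env[arg])
        elif op == "STORE":
            env[arg] = stack.pop()
        elif op in ("ADD", "MUL"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(op)
    return dispatches


def run_register(program, regs):
    """Run the toy register program; return the number of dispatches."""
    dispatches = 0
    for op, dst, lhs, rhs in program:
        dispatches += 1
        if op == "ADD":
            regs[dst] = regs[lhs] + regs[rhs]
        elif op == "MUL":
            regs[dst] = regs[lhs] * regs[rhs]
        else:
            raise ValueError(op)
    return dispatches
```

With `{"b": 2, "c": 3, "d": 4}` as input, the stack version takes six dispatches and the register version two for the same result, which is the ratio the message describes. The toy says nothing about the cost of decoding wider register instructions.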

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Yury Selivanov
Hi Brett, On 2016-02-01 12:18 PM, Brett Cannon wrote: On Mon, 1 Feb 2016 at 09:08 Yury Selivanov > wrote: [..] If I were to do some big refactoring of the ceval loop, I'd probably consider implementing a register VM.

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Yury Selivanov
On 2016-01-29 11:28 PM, Steven D'Aprano wrote: On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote: Hi, tl;dr The summary is that I have a patch that improves CPython performance up to 5-10% on macro benchmarks. Benchmark results on Macbook Pro/Mac OS X, desktop CPU/Linux,

Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Sven R. Kunze
On 01.02.2016 17:54, Yury Selivanov wrote: If I were to do some big refactoring of the ceval loop, I'd probably consider implementing a register VM. While register VMs are a bit faster than stack VMs (up to 20-30%), they would also allow us to apply more optimizations, and even bolt on a

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-29 Thread Damien George
Hi Yury, > An off-topic: have you ever tried hg.python.org/benchmarks > or compare MicroPython vs CPython? I'm curious if MicroPython > is faster -- in that case we'll try to copy some optimization > ideas. I've tried a small number of those benchmarks, but not in any rigorous way, and not

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-29 Thread Stefan Behnel
Yury Selivanov wrote on 27.01.2016 at 19:25: > tl;dr The summary is that I have a patch that improves CPython performance > up to 5-10% on macro benchmarks. Benchmark results on Macbook Pro/Mac OS > X, desktop CPU/Linux, server CPU/Linux are available at [1]. There are no > slowdowns that I

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-29 Thread Yury Selivanov
On 2016-01-29 5:00 AM, Stefan Behnel wrote: Yury Selivanov wrote on 27.01.2016 at 19:25: [..] LOAD_METHOD looks at the object on top of the stack, and checks if the name resolves to a method or to a regular attribute. If it's a method, then we push the unbound method object and the object
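The LOAD_METHOD behaviour described above can be modelled in pure Python. This is a rough sketch, not the patch itself: the value stack is a plain list, `NULL`, `load_method`, and `call_method` are hypothetical names, and because the preview is cut off, what gets pushed in the non-method case (the attribute plus a `NULL` marker) is an assumption.

```python
import types

NULL = object()      # hypothetical marker meaning "no self to prepend"
_MISSING = object()  # internal sentinel for a failed type lookup


def _lookup_on_type(tp, name):
    # Find the raw attribute on the type without binding it to an instance.
    for klass in tp.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def load_method(stack, name):
    """Toy LOAD_METHOD: replace the top object with a (callable, self) pair."""
    obj = stack.pop()
    raw = _lookup_on_type(type(obj), name)
    shadowed = name in getattr(obj, "__dict__", {})
    if isinstance(raw, types.FunctionType) and not shadowed:
        # Method case: push the plain function and the object, so no bound
        # method object has to be created.
        stack.append(raw)
        stack.append(obj)
    else:
        # Regular attribute: fall back to ordinary attribute lookup.
        stack.append(getattr(obj, name))
        stack.append(NULL)


def call_method(stack, argc):
    """Toy CALL_METHOD: pop args and the pair, call, and push the result."""
    args = [stack.pop() for _ in range(argc)][::-1]
    self_or_null = stack.pop()
    func = stack.pop()
    if self_or_null is NULL:
        stack.append(func(*args))
    else:
        stack.append(func(self_or_null, *args))
```

The saving in the method case is the bound method object that `obj.method` would otherwise allocate on every call and discard immediately afterwards.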

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-29 Thread Yury Selivanov
Hi Damien, BTW I just saw (and backed!) your new Kickstarter campaign to port MicroPython to ESP8266, good stuff! On 2016-01-29 7:38 AM, Damien George wrote: Hi Yury, [..] Do you use opcode dictionary caching only for LOAD_GLOBAL-like opcodes? Do you have an equivalent of LOAD_FAST, or you

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-29 Thread Steven D'Aprano
On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote: > Hi, > > > tl;dr The summary is that I have a patch that improves CPython > performance up to 5-10% on macro benchmarks. Benchmark results on > Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are available > at [1].

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-27 Thread Damien George
Hi Yuri, I think these are great ideas to speed up CPython. They are probably the simplest yet most effective ways to get performance improvements in the VM. MicroPython has had LOAD_METHOD/CALL_METHOD from the start (inspired by PyPy, and the main reason to have it is because you don't need to

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-27 Thread Yury Selivanov
BTW, this optimization also makes some old optimization tricks obsolete. 1. No need to write 'def func(len=len)'. Global lookups will be fast. 2. No need to save bound methods: obj = [] obj_append = obj.append for _ in range(10**6): obj_append(something) This hand-optimized code would
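Written out as runnable code, the two hand-optimization tricks mentioned above look roughly like this. It is a sketch for comparison only: the function names and the `report` helper are made up, and whether the "optimized" variants are still measurably faster depends on the CPython version being used.

```python
import timeit


def count_plain(items):
    # `len` is resolved as a global/builtin name on every iteration.
    total = 0
    for item in items:
        total += len(item)
    return total


def count_default_arg(items, len=len):
    # Trick 1: the default argument turns `len` into a fast local variable.
    total = 0
    for item in items:
        total += len(item)
    return total


def fill_plain(n):
    # `out.append` is looked up again on every iteration.
    out = []
    for i in range(n):
        out.append(i)
    return out


def fill_cached_method(n):
    # Trick 2: look up the bound method once and reuse it in the loop.
    out = []
    out_append = out.append
    for i in range(n):
        out_append(i)
    return out


def report(repeat=200):
    # Time each variant; absolute numbers vary by machine and Python version.
    words = ["abc"] * 1000
    for fn, arg in [
        (count_plain, words),
        (count_default_arg, words),
        (fill_plain, 1000),
        (fill_cached_method, 1000),
    ]:
        seconds = timeit.timeit(lambda: fn(arg), number=repeat)
        print(f"{fn.__name__:20s} {seconds:.4f}s")
```

Calling `report()` prints one timing per variant; each pair of functions returns identical results, so only the timings differ.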

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-27 Thread Yury Selivanov
On 2016-01-27 3:46 PM, Glenn Linderman wrote: On 1/27/2016 12:37 PM, Yury Selivanov wrote: MicroPython also has dictionary lookup caching, but it's a bit different to your proposal. We do something much simpler: each opcode that has a cache ability (eg LOAD_GLOBAL, STORE_GLOBAL,

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-27 Thread Glenn Linderman
On 1/27/2016 12:37 PM, Yury Selivanov wrote: MicroPython also has dictionary lookup caching, but it's a bit different to your proposal. We do something much simpler: each opcode that has a cache ability (eg LOAD_GLOBAL, STORE_GLOBAL, LOAD_ATTR, etc) includes a single byte in the opcode which

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-27 Thread Brett Cannon
On Wed, 27 Jan 2016 at 10:26 Yury Selivanov wrote: > Hi, > > > tl;dr The summary is that I have a patch that improves CPython > performance up to 5-10% on macro benchmarks. Benchmark results on > Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are available >

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-27 Thread Damien George
Hi Yury, (Sorry for misspelling your name previously!) > Yes, we'll need to add CALL_METHOD{_VAR|_KW|etc} opcodes to optimize all > kinds of method calls. However, I'm not sure how big the impact will be, > need to do more benchmarking. I never did such fine grained analysis with MicroPython.

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-27 Thread Yury Selivanov
Damien, On 2016-01-27 4:20 PM, Damien George wrote: Hi Yury, (Sorry for misspelling your name previously!) NP. As long as the first letter is "y" I don't care ;) Yes, we'll need to add CALL_METHOD{_VAR|_KW|etc} opcodes to optimize all kinds of method calls. However, I'm not sure how big

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-27 Thread Yury Selivanov
On 2016-01-27 3:10 PM, Damien George wrote: Hi Yuri, I think these are great ideas to speed up CPython. They are probably the simplest yet most effective ways to get performance improvements in the VM. Thanks! MicroPython has had LOAD_METHOD/CALL_METHOD from the start (inspired by PyPy,

[Python-Dev] Speeding up CPython 5-10%

2016-01-27 Thread Yury Selivanov
Hi, tl;dr The summary is that I have a patch that improves CPython performance up to 5-10% on macro benchmarks. Benchmark results on Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are available at [1]. There are no slowdowns that I could reproduce consistently. There are

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-27 Thread Yury Selivanov
On 2016-01-27 3:01 PM, Brett Cannon wrote: [..] We can also optimize LOAD_METHOD. Chances are high that 'obj' in 'obj.method()' will be of the same type every time we execute the code object. So if we had an opcode cache, LOAD_METHOD could then cache a
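The idea that one call site usually sees one type can be sketched as a one-entry cache keyed on `type(obj)`. The preview is cut off before it says what would actually be cached, so everything here is an assumption: the class `MethodSiteCache` is hypothetical, it caches the function found on the type, and it skips the invalidation a real implementation would need when a class is modified or an instance attribute shadows the method.

```python
class MethodSiteCache:
    """Hypothetical one-entry cache for a single `obj.method()` call site."""

    def __init__(self, name):
        self.name = name
        self.cached_type = None
        self.cached_func = None
        self.hits = 0
        self.misses = 0

    def lookup(self, obj):
        tp = type(obj)
        if tp is self.cached_type:
            # Same type as last time: skip the MRO walk entirely.
            self.hits += 1
            return self.cached_func
        self.misses += 1
        for klass in tp.__mro__:
            if self.name in vars(klass):
                func = vars(klass)[self.name]
                break
        else:
            raise AttributeError(self.name)
        # Remember the result for the next object of this type.
        self.cached_type = tp
        self.cached_func = func
        return func
```

For example, looking up `upper` on three strings in a row through one `MethodSiteCache("upper")` walks the MRO once and serves the other two lookups from the cache.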

Re: [Python-Dev] Speeding up CPython 5-10%

2016-01-27 Thread Yury Selivanov
As Brett suggested, I've just run the benchmark suite with memory tracking on. The results are here: https://gist.github.com/1st1/1851afb2773526fd7c58 Looks like the memory increase is around 1%. One synthetic micro-benchmark, unpack_sequence, contains hundreds of lines that load a global