Re: [Python-Dev] Python 3 optimizations, continued, continued again...
On Wed, Feb 1, 2012 at 20:08, stefan brunthaler s.bruntha...@uci.edu wrote: I understand all of these issues. Currently, it's not really a mess, but much more complicated than it needs to be for only supporting the inca optimization. I really don't think that is a problem. The core contributors can deal well with complexity in my experience. :-) //Lennart ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
I really don't think that is a problem. The core contributors can deal well with complexity in my experience. :-) No, no, I wasn't trying to insinuate anything like that at all. I just figured that having the code generator generate 4 optimizations when only one is supported is a bad idea for several reasons, maintainability among them. Anyway, I've just completed the integration of the code generator and put the corresponding patch on my page (http://www.ics.uci.edu/~sbruntha/pydev.html) for download. The license thing is still missing; I'll take care of that tomorrow or sometime next week. Regards, --stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
How many times did you regenerate this code until you got it right? Well, honestly, I changed the code generator to pack the new optimized instruction derivatives densely into the available opcodes, so that I can make optimal use of what's there. Thus I only generated the code twice for this patch. And how do you know that you really got it so right that it was the last time ever that you needed your generator for it? I am positive that I am going to need my code generator in the future, as I have several ideas to increase performance even more. As I have mentioned before, my quickening-based inline caching technique is very simple, and if it were to crash, chances are that one of the inline-cache miss guards doesn't capture all scenarios, i.e., is non-exhaustive. The regression tests pass, as do the official benchmarks plus the computer language benchmarks game. In addition, this has been my line of research since 2009, so I have extensive experience with it, too. What if the C structure of any of those several types ever changes? Since I optimize interpreter instructions, any change that affects their implementation requires changing the optimized instructions, too. Having the code generator ready for such things would certainly be a good idea (probably also for generating the default interpreter dispatch loop), since you could also add your own profile for your application/domain to re-use the remaining 30+ instruction opcodes. The direct answer is that I would need to re-generate the driver file, which is basically a gdb dump plus an Emacs macro (please note that I have not needed to do that since working with ~3.0b1). I will add a list of the types I use for specializing to the patch section on the additional resources page of my homepage (including a fixed patch addressing what Georg brought to my attention).
--stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
Message written by stefan brunthaler on 1 Feb 2012 at 16:55: And how do you know that you really got it so right that it was the last time ever that you needed your generator for it? I am positive that I am going to need my code generator in the future, as I have several ideas to increase performance even more. Hello, Stefan. First let me thank you for your interest in improving the interpreter. We appreciate and encourage efforts to make it perform better. But let me put this straight: as an open-source project, we are hesitant to accept changes which depend on closed software. Even if your optimization techniques would result in performance a hundred times better than what is currently achieved, we would still be wary of accepting them. Please note that this is not because of lack of trust or, worse, greed for your code. We need to make sure that under no circumstances is our codebase in danger because something important was left out along the way. Maintenance of generated code is yet another nuisance that had better be strongly justified. -- Best regards, Łukasz Langa Senior Systems Architecture Engineer IT Infrastructure Department Grupa Allegro Sp. z o.o.
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
But let me put this straight: as an open-source project, we are hesitant to accept changes which depend on closed software. Even if your optimization techniques would result in performance a hundred times better than what is currently achieved, we would still be wary of accepting them. Please note that this is not because of lack of trust or, worse, greed for your code. We need to make sure that under no circumstances is our codebase in danger because something important was left out along the way. I am positive that the code generator does not depend on any closed-source components; I just use mako for storing the C code templates that I generate -- everything else I wrote myself. Of course, I'll give the code generator to pydev, too, if necessary. However, I need to strip it down so that it does not do all the other stuff that you don't need. I just wanted to give you the implementation now, since Benjamin said that he wants to see real code and results first. If you want to integrate the inca optimization, I am going to start working on this asap. Maintenance of generated code is yet another nuisance that had better be strongly justified. I agree, but the nice thing is that the technique is very simple: only if you changed a significant part of the interpreter's implementation would you need to change the optimized derivatives, too. If one generates the default interpreter implementation, too, then one gets the optimizations almost for free. For maintenance reasons I chose to use a template-based system, too, since this gives you a direct correspondence between the actual code and what's generated, without interfering with the code generator at all. --stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
Let's make one thing clear. The Python core developers need to be able to reproduce your results from scratch, and that means access to the templates, code generators, inputs, and everything else you used. (Of course for stuff you didn't write that's already open source, all we need is a pointer to the open source project and the exact version/configuration you used, plus any local mods you made.) I understand that you're hesitant to just dump your current mess, and you want to clean it up before you show it to us. That's fine. But until you're ready to show it, we're not going to integrate any of your work into CPython, even though some of us (maybe Benjamin) may be interested in kicking its tires. And remember, it doesn't need to be perfect (in fact perfectionism is probably a bad idea here). But it does need to be open source. Every single bit of it. (And no GPL, please.) --Guido 2012/2/1 stefan brunthaler s.bruntha...@uci.edu: But let me put this straight: as an open-source project, we are hesitant to accept changes which depend on closed software. Even if your optimization techniques would result in performance a hundred times better than what is currently achieved, we would still be wary of accepting them. Please note that this is not because of lack of trust or, worse, greed for your code. We need to make sure that under no circumstances is our codebase in danger because something important was left out along the way. I am positive that the code generator does not depend on any closed-source components; I just use mako for storing the C code templates that I generate -- everything else I wrote myself. Of course, I'll give the code generator to pydev, too, if necessary. However, I need to strip it down so that it does not do all the other stuff that you don't need. I just wanted to give you the implementation now, since Benjamin said that he wants to see real code and results first.
If you want to integrate the inca optimization, I am going to start working on this asap. Maintenance of generated code is yet another nuisance that had better be strongly justified. I agree, but the nice thing is that the technique is very simple: only if you changed a significant part of the interpreter's implementation would you need to change the optimized derivatives, too. If one generates the default interpreter implementation, too, then one gets the optimizations almost for free. For maintenance reasons I chose to use a template-based system, too, since this gives you a direct correspondence between the actual code and what's generated, without interfering with the code generator at all. --stefan -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
On Wed, Feb 1, 2012 at 09:46, Guido van Rossum gu...@python.org wrote: Let's make one thing clear. The Python core developers need to be able to reproduce your results from scratch, and that means access to the templates, code generators, inputs, and everything else you used. (Of course for stuff you didn't write that's already open source, all we need is a pointer to the open source project and the exact version/configuration you used, plus any local mods you made.) I understand that you're hesitant to just dump your current mess, and you want to clean it up before you show it to us. That's fine. But until you're ready to show it, we're not going to integrate any of your work into CPython, even though some of us (maybe Benjamin) may be interested in kicking its tires. And remember, it doesn't need to be perfect (in fact perfectionism is probably a bad idea here). But it does need to be open source. Every single bit of it. (And no GPL, please.) I understand all of these issues. Currently, it's not really a mess, but much more complicated than it needs to be for only supporting the inca optimization. I don't know what the time frame for a possible integration is (my guess is that it'd be safe anyway to disable it, like the threaded code support was handled.) As for the license: I really don't care about that at all; the only thing nice to have would be a pointer to my home page and/or the corresponding research, but that's about all on my wish list. --stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
On Feb 1, 2012, at 12:46 PM, Guido van Rossum wrote: I understand that you're hesitant to just dump your current mess, and you want to clean it up before you show it to us. That's fine. (...) And remember, it doesn't need to be perfect (in fact perfectionism is probably a bad idea here). Just as a general point of advice to open source contributors, I'd suggest erring on the side of the latter rather than the former suggestion here: dump your current mess, along with the relevant caveats (it's a mess, much of it is irrelevant) so that other developers can help you clean it up, rather than putting the entire burden of the cleanup on yourself. Experience has taught me that most people who hold back work because it needs cleanup eventually run out of steam and their work never gets integrated and maintained. -glyph
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
On Wed, Feb 1, 2012 at 11:08 AM, stefan brunthaler s.bruntha...@uci.edu wrote: On Wed, Feb 1, 2012 at 09:46, Guido van Rossum gu...@python.org wrote: Let's make one thing clear. The Python core developers need to be able to reproduce your results from scratch, and that means access to the templates, code generators, inputs, and everything else you used. (Of course for stuff you didn't write that's already open source, all we need is a pointer to the open source project and the exact version/configuration you used, plus any local mods you made.) I understand that you're hesitant to just dump your current mess, and you want to clean it up before you show it to us. That's fine. But until you're ready to show it, we're not going to integrate any of your work into CPython, even though some of us (maybe Benjamin) may be interested in kicking its tires. And remember, it doesn't need to be perfect (in fact perfectionism is probably a bad idea here). But it does need to be open source. Every single bit of it. (And no GPL, please.) I understand all of these issues. Currently, it's not really a mess, but much more complicated than it needs to be for only supporting the inca optimization. I don't know what the time frame for a possible integration is (my guess is that it'd be safe anyway to disable it, like the threaded code support was handled.) It won't be integrated until you have published your mess. As for the license: I really don't care about that at all; the only thing nice to have would be a pointer to my home page and/or the corresponding research, but that's about all on my wish list. Please don't try to enforce that in the license. That usually backfires. Use Apache 2, which is what the PSF prefers. -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
I assume yes here means yes, I'm aware and not yes, I'm using Python 2, right? And you're building on top of the existing support for threaded code in order to improve it? Your assumption is correct; I'm sorry for the sloppiness (I was heading out for lunch.) None of the code is 2.x compatible; all of my work has always targeted Python 3.x. My work does not improve threaded code (as in the interpreter dispatch technique), but enables efficient and purely interpretative inline caching via quickening. (So, after execution of BINARY_ADD, I rewrite the specific occurrence of the bytecode instruction to a, say, FLOAT_ADD instruction and ensure that my assumption is correct in the FLOAT_ADD instruction.) Thanks, --stefan
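For readers unfamiliar with the technique, the rewriting Stefan describes can be sketched in a few lines of C. Everything below (the opcode names, the toy object model, the de-quickening fallback policy) is an illustrative assumption, not code from the actual patch:

```c
#include <assert.h>

/* Toy model of quickening-based inline caching: hypothetical opcodes only. */
enum { OP_BINARY_ADD, OP_FLOAT_ADD, OP_HALT };
typedef enum { T_LONG, T_FLOAT } Tag;
typedef struct { Tag tag; double f; long i; } Obj;

/* Executes `code` over a small operand stack.  The generic BINARY_ADD
 * rewrites itself in place to FLOAT_ADD when it observes two floats;
 * FLOAT_ADD's guard de-quickens on an inline-cache miss. */
static Obj run(unsigned char *code, Obj *stack, int nstack, int *rewrites)
{
    int pc = 0, sp = nstack;
    for (;;) {
        switch (code[pc]) {
        case OP_BINARY_ADD:
            if (stack[sp - 2].tag == T_FLOAT && stack[sp - 1].tag == T_FLOAT) {
                code[pc] = OP_FLOAT_ADD;   /* quicken this occurrence */
                (*rewrites)++;
                continue;                  /* re-dispatch at the same pc */
            }
            stack[sp - 2].i += stack[sp - 1].i;
            sp--; pc++;
            break;
        case OP_FLOAT_ADD:
            /* inline-cache miss guard: the type assumption must still hold */
            if (stack[sp - 2].tag != T_FLOAT || stack[sp - 1].tag != T_FLOAT) {
                code[pc] = OP_BINARY_ADD;  /* miss: fall back to generic */
                continue;
            }
            stack[sp - 2].f += stack[sp - 1].f;
            sp--; pc++;
            break;
        case OP_HALT:
        default:
            return stack[sp - 1];
        }
    }
}
```

The point is that the rewrite happens at one specific occurrence of the instruction, so each site specializes independently, and the guard keeps the specialized instruction safe if the type assumption later breaks.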
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
If I read the patch correctly, most of it is auto-generated (and there are probably a few spurious changes that blow it up, such as the python-gdb.py file). Hm, honestly I don't know where the python-gdb.py file comes from; I thought it came with the switch from 3.1 to the tip version I was using. Anyway, I did not touch it, or at least have no recollection of doing so. Regarding the spurious changes: this might very well be the case. Regression testing works, and it would actually be fairly easy to find crashes (e.g., by tracing all executed bytecode instructions and seeing if all of them are actually executed; I could easily do that if wanted/necessary.) But the tool that actually generates the code doesn't seem to be included? (Which means that in this form, the patch couldn't possibly be accepted.) Well, the tool is not included because it does a lot more (e.g., generate the code for elimination of reference count operations.) Unfortunately, my interpreter architecture that achieves the highest speedups is more complicated, and I got the feeling that this was not going over well with python-dev. So, I had the idea of basically using just one (but a major) optimization technique and going with that. I don't see why you would need my code generator, though. Not that I mind, but I would need to strip down and remove many parts of it and also make it more accessible to other people. However, if python-dev decides that it wants to include the optimizations and requires the code generator, I'll happily chip in the extra work and give you the corresponding code generator, too. Thanks, --stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
On 31.01.2012 16:46, stefan brunthaler wrote: If I read the patch correctly, most of it is auto-generated (and there are probably a few spurious changes that blow it up, such as the python-gdb.py file). Hm, honestly I don't know where the python-gdb.py file comes from; I thought it came with the switch from 3.1 to the tip version I was using. Anyway, I did not touch it, or at least have no recollection of doing so. Regarding the spurious changes: this might very well be the case. Regression testing works, and it would actually be fairly easy to find crashes (e.g., by tracing all executed bytecode instructions and seeing if all of them are actually executed; I could easily do that if wanted/necessary.) There is also the issue of the two test modules removed from the test suite. But the tool that actually generates the code doesn't seem to be included? (Which means that in this form, the patch couldn't possibly be accepted.) Well, the tool is not included because it does a lot more (e.g., generate the code for elimination of reference count operations.) Unfortunately, my interpreter architecture that achieves the highest speedups is more complicated, and I got the feeling that this was not going over well with python-dev. So, I had the idea of basically using just one (but a major) optimization technique and going with that. I don't see why you would need my code generator, though. Not that I mind, but I would need to strip down and remove many parts of it and also make it more accessible to other people. However, if python-dev decides that it wants to include the optimizations and requires the code generator, I'll happily chip in the extra work and give you the corresponding code generator, too. Well, nobody wants to review generated code. Georg
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
There is also the issue of the two test modules removed from the test suite. Oh, I'm sorry; it seems the patch contained too much of my development stuff. (I had removed them because they were always failing due to the instruction opcodes being changed by quickening; they pass the tests, though.) Well, nobody wants to review generated code. I agree. The code generator basically uses templates that contain the information, plus a dump of the C structure of several types, which it traverses to see which of them implement which functions. There is really no magic there; the most complex thing is to get the inline-cache miss checks for function calls right. But I tried to make the generated code look pretty, so that working with it is not too much of a hassle. The code generator itself is a little bit more complicated, so I am not sure it would help a lot... best, --stefan
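The call-site guard mentioned above can be illustrated with a minimal sketch. The `CallSite` structure, the function names, and the re-specialize-on-miss policy are all hypothetical assumptions for illustration, not the patch's actual scheme:

```c
#include <assert.h>

/* A specialized call site caches its last callee and verifies, on every
 * execution, that the target is still the same before taking the fast path. */
typedef long (*callee_t)(long);

static long twice(long x)  { return 2 * x; }
static long thrice(long x) { return 3 * x; }

typedef struct { callee_t cached; } CallSite;

static long call_cached(CallSite *site, callee_t target, long arg, int *misses)
{
    if (site->cached != target) {  /* inline-cache miss guard */
        (*misses)++;
        site->cached = target;     /* re-specialize for the new target */
    }
    return site->cached(arg);      /* fast path: call through the cache */
}
```

Getting this right in a real interpreter is harder than it looks here, because the guard has to cover every way the callee can change (rebinding, monkey-patching, deallocation), which is presumably what makes these checks the tricky part.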
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
stefan brunthaler, 31.01.2012 22:17: Well, nobody wants to review generated code. I agree. The code generator basically uses templates that contain the information and a dump of the C-structure of several types to traverse and see which one of them implements which functions. There is really no magic there, the most complex thing is to get the inline-cache miss checks for function calls right. But I tried to make the generated code look pretty, so that working with it is not too much of a hassle. The code generator itself is a little bit more complicated, so I am not sure it would help a lot... How many times did you regenerate this code until you got it right? And how do you know that you really got it so right that it was the last time ever that you needed your generator for it? What if the C structure of any of those several types ever changes? Stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
Hello, Could you try benchmarking with the standard benchmarks: http://hg.python.org/benchmarks/ and see what sort of performance gains you get? Yeah, of course. I already did; refer to the page listed below for details. I have not looked into the results yet, though. How portable is the threaded interpreter? Well, you can implement threaded code on any machine that supports indirect branch instructions. Fortunately, GCC supports the labels-as-values feature, which makes it available on any machine that supports GCC. My optimizations themselves are portable, and I tested them on a PowerPC for my thesis, too. (AFAIR, LLVM supports this feature, too.) Do you have a public repository for the code, so we can take a look? I have created a patch (as Benjamin wanted) and put all of the resources (i.e., benchmark results and the patch itself) on my home page: http://www.ics.uci.edu/~sbruntha/pydev.html Regards, --stefan
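For reference, a minimal sketch of what labels-as-values (computed-goto) dispatch looks like. The toy instruction set and names are made up for illustration; the `&&label` syntax is the GCC extension being discussed, also accepted by Clang:

```c
#include <assert.h>

enum { OP_ADD, OP_MUL, OP_HALT };

/* Token-threaded dispatch: each opcode indexes a table of handler
 * addresses, and every handler jumps directly to the next handler
 * instead of going back through a central switch. */
static long interp(const unsigned char *code, long acc, long operand)
{
    /* one entry per opcode: the address of its handler label */
    static const void *dispatch[] = { &&op_add, &&op_mul, &&op_halt };
#define DISPATCH() goto *dispatch[*code++]
    DISPATCH();
op_add:  acc += operand; DISPATCH();
op_mul:  acc *= operand; DISPATCH();
op_halt: return acc;
#undef DISPATCH
}
```

The win over a plain `switch` is one indirect branch per instruction, placed at the end of each handler, which gives the CPU's branch predictor per-instruction history to work with.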
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
Hello, Well, you can implement threaded code on any machine that support indirect branch instructions. Fortunately, GCC supports the label-as-values feature, which makes it available on any machine that supports GCC. My optimizations themselves are portable, and I tested them on a PowerPC for my thesis, too. (AFAIR, llvm supports this feature, too.) Well, you're aware that Python already uses threaded code where available? Or are you testing against Python 2? Regards Antoine.
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
Well, you're aware that Python already uses threaded code where available? Or are you testing against Python 2? Yes, and I am building on that. --stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
stefan brunthaler, 30.01.2012 20:18: Well, you're aware that Python already uses threaded code where available? Or are you testing against Python 2? Yes, and I am building on that. I assume yes here means yes, I'm aware and not yes, I'm using Python 2, right? And you're building on top of the existing support for threaded code in order to improve it? Stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
On 30.01.2012 20:06, stefan brunthaler wrote: Do you have a public repository for the code, so we can take a look? I have created a patch (as Benjamin wanted) and put all of the resources (i.e., benchmark results and the patch itself) on my home page: http://www.ics.uci.edu/~sbruntha/pydev.html If I read the patch correctly, most of it is auto-generated (and there are probably a few spurious changes that blow it up, such as the python-gdb.py file). But the tool that actually generates the code doesn't seem to be included? (Which means that in this form, the patch couldn't possibly be accepted.) Georg
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
stefan brunthaler wrote: Hi, On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson benja...@python.org wrote: 2011/11/8 stefan brunthaler s.bruntha...@uci.edu: How does that sound? I think I can hear real patches and benchmarks most clearly. I spent the better part of my -20% time on implementing the work as suggested. Please find the benchmarks attached to this email, I just did them on my system (i7-920, Linux 3.0.0-15, GCC 4.6.1). Could you try benchmarking with the standard benchmarks: http://hg.python.org/benchmarks/ and see what sort of performance gains you get? I branched off the regular 3.3a0 default tip changeset 73977 shortly after your email. I do not have an official patch yet, but am going to create one if wanted. Changes to the existing interpreter are minimal, the biggest chunk is a new interpreter dispatch loop. How portable is the threaded interpreter? Do you have a public repository for the code, so we can take a look? Cheers, Mark.
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
Hi, On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson benja...@python.org wrote: 2011/11/8 stefan brunthaler s.bruntha...@uci.edu: How does that sound? I think I can hear real patches and benchmarks most clearly. I spent the better part of my -20% time on implementing the work as suggested. Please find the benchmarks attached to this email, I just did them on my system (i7-920, Linux 3.0.0-15, GCC 4.6.1). I branched off the regular 3.3a0 default tip changeset 73977 shortly after your email. I do not have an official patch yet, but am going to create one if wanted. Changes to the existing interpreter are minimal, the biggest chunk is a new interpreter dispatch loop. Merging dispatch loops eliminates some of my optimizations, but my inline caching technique enables inlining some functionality, which results in visible speedups. The code is normalized to the non-threaded-code version of the CPython interpreter (named vanilla), so that I can reference it to my preceding results. I anticipate *no* compatibility issues and the interpreter requires less than 100 KiB of extra memory at run-time. Since my interpreter is using 215 of a maximum of 255 instructions, there is room for adding additional derivatives, e.g., for popular Python libraries, too. Let me know what python-dev thinks of this and have a nice weekend, --stefan PS: AFAIR the version without partial stack frame caching also passes all regression tests modulo the ones that test against specific bytecodes. 
currently processing: bench/binarytrees.py3.py
phd-cpy-3a0-thr-cod-pytho arg: 10 | time: 0.161876 | stdev: 0.007780 | var: 0.61 | mem: 6633.60
phd-cpy-3a0-thr-cod-pytho arg: 12 | time: 0.699243 | stdev: 0.019112 | var: 0.000365 | mem: 8142.67
phd-cpy-3a0-thr-cod-pytho arg: 14 | time: 3.388344 | stdev: 0.048042 | var: 0.002308 | mem: 13586.93
phd-cpy-pio-sne-pre-pyt-no-psf arg: 10 | time: 0.153875 | stdev: 0.003828 | var: 0.15 | mem: 6873.73
phd-cpy-pio-sne-pre-pyt-no-psf arg: 12 | time: 0.632572 | stdev: 0.019121 | var: 0.000366 | mem: 8246.27
phd-cpy-pio-sne-pre-pyt-no-psf arg: 14 | time: 3.020988 | stdev: 0.043483 | var: 0.001891 | mem: 13640.27
phd-cpy-pio-sne-pre-pytho arg: 10 | time: 0.150942 | stdev: 0.005157 | var: 0.27 | mem: 6901.87
phd-cpy-pio-sne-pre-pytho arg: 12 | time: 0.660841 | stdev: 0.020538 | var: 0.000422 | mem: 8286.80
phd-cpy-pio-sne-pre-pytho arg: 14 | time: 3.184198 | stdev: 0.051103 | var: 0.002612 | mem: 13680.40
phd-cpy-3a0-van-pytho arg: 10 | time: 0.202812 | stdev: 0.005480 | var: 0.30 | mem: 6633.33
phd-cpy-3a0-van-pytho arg: 12 | time: 0.908456 | stdev: 0.015744 | var: 0.000248 | mem: 8153.07
phd-cpy-3a0-van-pytho arg: 14 | time: 4.364805 | stdev: 0.037522 | var: 0.001408 | mem: 13593.60
### phd-cpy-3a0-thr-cod-pytho : 1.2887 (avg-sum: 1.416488)
### phd-cpy-pio-sne-pre-pyt-no-psf: 1.4383 (avg-sum: 1.269145)
### phd-cpy-pio-sne-pre-pytho : 1.3704 (avg-sum: 1.331994)
### phd-cpy-3a0-van-pytho : 1. (avg-sum: 1.825358)
currently processing: bench/fannkuch.py3.py
phd-cpy-3a0-thr-cod-pytho arg: 8 | time: 0.172677 | stdev: 0.006620 | var: 0.44 | mem: 6424.13
phd-cpy-3a0-thr-cod-pytho arg: 9 | time: 1.426755 | stdev: 0.035545 | var: 0.001263 | mem: 6425.20
phd-cpy-pio-sne-pre-pyt-no-psf arg: 8 | time: 0.168010 | stdev: 0.010277 | var: 0.000106 | mem: 6481.07
phd-cpy-pio-sne-pre-pyt-no-psf arg: 9 | time: 1.345817 | stdev: 0.033127 | var: 0.001097 | mem: 6479.60
phd-cpy-pio-sne-pre-pytho arg: 8 | time: 0.165876 | stdev: 0.007136 | var: 0.51 | mem: 6520.00
phd-cpy-pio-sne-pre-pytho arg: 9 | time: 1.351150 | stdev: 0.028822 | var: 0.000831 | mem: 6519.73
phd-cpy-3a0-van-pytho arg: 8 | time: 0.216146 | stdev: 0.012879 | var: 0.000166 | mem: 6419.07
phd-cpy-3a0-van-pytho arg: 9 | time: 1.834247 | stdev: 0.028224 | var: 0.000797 | mem: 6418.67
### phd-cpy-3a0-thr-cod-pytho : 1.2820 (avg-sum: 0.799716)
### phd-cpy-pio-sne-pre-pyt-no-psf: 1.3544 (avg-sum: 0.756913)
### phd-cpy-pio-sne-pre-pytho : 1.3516 (avg-sum: 0.758513)
### phd-cpy-3a0-van-pytho : 1. (avg-sum: 1.025197)
currently processing: bench/fasta.py3.py
phd-cpy-3a0-thr-cod-pytho arg: 5 | time: 0.374023 | stdev: 0.010870 | var: 0.000118 | mem: 6495.07
phd-cpy-3a0-thr-cod-pytho arg: 10 | time: 0.714577 | stdev: 0.024713 | var: 0.000611 | mem: 6495.47
phd-cpy-3a0-thr-cod-pytho arg: 15 | time: 1.062866 | stdev: 0.040138 | var: 0.001611 | mem: 6496.27
phd-cpy-pio-sne-pre-pyt-no-psf arg: 5 | time: 0.345621 | stdev: 0.022549 | var: 0.000508 | mem: 6551.87
phd-cpy-pio-sne-pre-pyt-no-psf
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
2012/1/27 stefan brunthaler s.bruntha...@uci.edu: Hi, On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson benja...@python.org wrote: 2011/11/8 stefan brunthaler s.bruntha...@uci.edu: How does that sound? I think I can hear real patches and benchmarks most clearly. I spent the better part of my -20% time on implementing the work as suggested. Please find the benchmarks attached to this email, I just did them on my system (i7-920, Linux 3.0.0-15, GCC 4.6.1). I branched off the regular 3.3a0 default tip changeset 73977 shortly after your email. I do not have an official patch yet, but am going to create one if wanted. Changes to the existing interpreter are minimal, the biggest chunk is a new interpreter dispatch loop. Merging dispatch loops eliminates some of my optimizations, but my inline caching technique enables inlining some functionality, which results in visible speedups. The code is normalized to the non-threaded-code version of the CPython interpreter (named vanilla), so that I can reference it to my preceding results. I anticipate *no* compatibility issues and the interpreter requires less than 100 KiB of extra memory at run-time. Since my interpreter is using 215 of a maximum of 255 instructions, there is room for adding additional derivatives, e.g., for popular Python libraries, too. Let me know what python-dev thinks of this and have a nice weekend, Cool. It'd be nice to see a patch. -- Regards, Benjamin
[Python-Dev] Python 3 optimizations, continued, continued again...
Hi guys,

while there is at least some interest in incorporating my optimizations, the response has still been low. I figure that the changes are probably too much for a single big incorporation step. On a recent flight, I thought about cutting them down to make them more easily digestible.

The basic idea is to remove the optimized interpreter dispatch loop and the advanced instruction format and use the existing ones. Currently (rev. ca8a0dfb2176), opcode.h uses 109 of potentially available 255 instructions using the current instruction format. Hence, up to 149 instruction opcodes could be given to optimized instruction derivatives. Consequently, a possible change would require changing: a) opcode.h, to add new instruction opcodes, b) ceval.c, to include the new instruction opcodes in PyEval_EvalFrameEx, c) abstract.c and object.c (possibly other files), to add the quickening/rewriting function calls.

If this is more interesting, I could start evaluating which instruction opcodes should be allocated to which derivatives to get the biggest benefit. This is a lot easier to implement (because I can re-use the existing instruction implementations) and can easily be made conditionally compilable, similar to the computed-gotos option. Since the changes are minimal, it is also simpler for everybody else to understand and deal with. On the downside, however, not all optimizations are possible and/or make sense within the given limit of instructions (no data-object inlining and no reference-count elimination.)

How does that sound? Have a nice day, --stefan
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
2011/11/8 stefan brunthaler s.bruntha...@uci.edu: How does that sound? I think I can hear real patches and benchmarks most clearly. -- Regards, Benjamin
Re: [Python-Dev] Python 3 optimizations continued...
stefan brunthaler, 02.09.2011 06:37:
 as promised, I created a publicly available preview of an implementation with my optimizations, which is available under the following location: https://bitbucket.org/py3_pio/preview/wiki/Home
 I followed Nick's advice and added a valuable overview/introduction at the wiki page the link points to; I am positive that spending 10 minutes reading it will provide you with valuable information regarding what's happening.

It does, thanks. A couple of remarks:

1) The SFC optimisation is purely based on static code analysis, right? I assume it takes loops into account (and just multiplies scores for inner loops)? Is that what you mean by "nesting level"? Obviously, static analysis can sometimes be misleading, e.g. when there's a rare special case with lots of loops that needs to adapt input data in some way, but in general I'd expect that this heuristic would tend to hit the important cases, especially for well-structured code with short functions.

2) The RC elimination is tricky to get right and thus somewhat dangerous, but sounds worthwhile and should work particularly well on a stack-based byte code interpreter like CPython.

3) Inline caching also sounds worthwhile, although I wonder how large the savings will be here. You'd save a couple of indirect jumps at the C-API level, sure, but apart from that, my guess is that it would highly depend on the type of instruction. Certain (repeated) calls to C-implemented functions would likely benefit quite a bit, for example, which would be a nice optimisation by itself, e.g. for builtins. I would expect that the same applies to iterators; even a couple of percent faster iteration can make a great deal of difference, and a substantial set of iterators are implemented in C, e.g. itertools, range, zip and friends. I'm not so sure about arithmetic operations. In Cython, we (currently?) do not optimistically replace these with more specific code (unless we know the types at compile time), because it complicates the generated C code, and indirect jumps aren't so slow that the benefit would be important. Savings are *much* higher when data can be unboxed, so much so that the slight improvement from optimistic type guesses is totally dwarfed in Cython. I would expect that the return on investment is better when the types are actually known at runtime, as in your case.

4) Regarding inlined object references, I would expect that it's much more worthwhile to speed up LOAD_GLOBAL and LOAD_NAME than LOAD_CONST. I guess that this would be best helped by watching the module dict and the builtin dict internally and invalidating the interpreter state after changes (e.g. by providing a change counter in those dicts and checking it in the instructions that access them), and otherwise keeping the objects cached. Simply watching the dedicated instructions that change that state isn't enough, as Python allows code to change these dicts directly through their dict interface.

All in all, your list does sound like an interesting set of changes that are both understandable and worthwhile.

Stefan
Re: [Python-Dev] Python 3 optimizations continued...
 as promised, I created a publicly available preview of an implementation with my optimizations, which is available under the following location: https://bitbucket.org/py3_pio/preview/wiki/Home

One very important thing that I forgot was to indicate that you have to use computed gotos (i.e., configure --with-computed-gotos), otherwise it won't work. (Though I think most people can figure this out easily, knowing it a priori isn't too bad.)

Regards, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
 1) The SFC optimisation is purely based on static code analysis, right? I assume it takes loops into account (and just multiplies scores for inner loops)? Is that what you mean by "nesting level"? Obviously, static analysis can sometimes be misleading, e.g. when there's a rare special case with lots of loops that needs to adapt input data in some way, but in general I'd expect that this heuristic would tend to hit the important cases, especially for well-structured code with short functions.

Yes, currently I only use the heuristic to statically estimate the utility of assigning an optimized slot to a local variable. And, another yes, nested blocks (like for-statements) are what I have in mind when using "nesting level". I was told that the algorithm itself is very similar to linear-scan register allocation, modulo the ability to spill values, of course. From my benchmarks and in-depth analysis of several programs, I found this to work very well. In fact, the only problematic situation I found is (unfortunately) in one of the top-most executed functions in US' bm_django.py: there is one loop that almost never gets executed, but this loop gives precedence to the local variables used inside it. Because of this, I already have an idea for a better approach: first, use the static heuristic to compute stack slot scores, then count back-branches (I would need this anyway, as _Py_CheckInterval is gone and OSR/hot-swapping is in general a good idea) and record their frequency. Next, just replace the current static weight of 100 by the dynamically recorded weight. Consequently, you should get better allocations. (Please note that I did some quantitative analysis of Python functions to determine that using 4 SFC slots covers a substantial fraction of functions [IIRC 95%] in the trivial scenario where there are at most 4 local variables.)
 2) The RC elimination is tricky to get right and thus somewhat dangerous, but sounds worthwhile and should work particularly well on a stack-based byte code interpreter like CPython.

Well, it was very tricky to get right when I first implemented it, around Christmas 2009. The current implementation is reasonably simple to understand; however, it depends on the function refcount_effect to give me correct information at all times. I got the biggest performance improvement on one benchmark on the PowerPC and think that RISC architectures in general benefit more from this optimization (eliminating the load, add and store instructions) than x86 CISCs do (an INCREF is just an add on the memory location without data dependencies, so fairly cheap). In any case, however, you get the replication effect of improving CPU branch prediction by having these additional instruction derivatives. It would be interesting (research-wise, too) to be able to measure whether the reduction in memory operations makes Python programs use less energy, and if so, how big the difference is.

 3) Inline caching also sounds worthwhile, although I wonder how large the savings will be here. You'd save a couple of indirect jumps at the C-API level, sure, but apart from that, my guess is that it would highly depend on the type of instruction. Certain (repeated) calls to C-implemented functions would likely benefit quite a bit, for example, which would be a nice optimisation by itself, e.g. for builtins. I would expect that the same applies to iterators; even a couple of percent faster iteration can make a great deal of difference, and a substantial set of iterators are implemented in C, e.g. itertools, range, zip and friends. I'm not so sure about arithmetic operations. In Cython, we (currently?) do not optimistically replace these with more specific code (unless we know the types at compile time), because it complicates the generated C code, and indirect jumps aren't so slow that the benefit would be important. Savings are *much* higher when data can be unboxed, so much so that the slight improvement from optimistic type guesses is totally dwarfed in Cython. I would expect that the return on investment is better when the types are actually known at runtime, as in your case.

Well, in my thesis I already hint at another improvement of the existing design that can work on unboxed data as well (while still being an interpreter.) I am eager to try this, but don't know how much time I can spend on it (because there are several other research projects I am actively involved in.) In my experience, this works very well, and you cannot actually report good speedups without inline-caching arithmetic operations, simply because that's where all JITs shine and most benchmarks don't reflect real-world scenarios but mathematics-inclined microbenchmarks. Also, if in the future compilers (gcc and clang) are able to inline the invoked functions, higher speedups will be possible.

 4) Regarding inlined object references, I would expect that it's much more worthwhile to speed up LOAD_GLOBAL and LOAD_NAME than LOAD_CONST.
Re: [Python-Dev] Python 3 optimizations continued...
stefan brunthaler, 02.09.2011 17:55:
 4) Regarding inlined object references, I would expect that it's much more worthwhile to speed up LOAD_GLOBAL and LOAD_NAME than LOAD_CONST. I guess that this would be best helped by watching the module dict and the builtin dict internally and invalidating the interpreter state after changes (e.g. by providing a change counter in those dicts and checking that in the instructions that access them), and otherwise keeping the objects cached. Simply watching the dedicated instructions that change that state isn't enough as Python allows code to change these dicts directly through their dict interface. [...]
 Thanks for the pointers to the dict stuff, I will take a look (IIRC, Antoine pointed me in the same direction last year, but I think the design was slightly different then),

Not unlikely, Antoine tends to know the internals pretty well. The Cython project has been (hand-wavingly) thinking about this also: implement our own module type with its own __setattr__ (and dict proxy) in order to speed up access to the globals in the *very* likely case that they rarely or never change after module initialisation time and that most critical code accesses them read-only from within functions. If it turns out that this makes sense for CPython in general, it wouldn't be a bad idea to join forces at some point in order to make this readily usable for both sides.

Stefan
Re: [Python-Dev] Python 3 optimizations continued...
 For a comparative real-world benchmark I tested Martin von Loewis' django port (there are not that many meaningful Python 3 real-world benchmarks) and got a speedup of 1.3 (without IIS). This is reasonably good; US got a speedup of 1.35 on this benchmark. I just checked that pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures seem to be not working currently or *really* fast...), but I cannot tell directly how that relates to speedups (it just says "less is better" and I did not quickly find an explanation). Since I did this benchmark last year, I have spent more time investigating it and found that I could do better, but I would have to guess as to how much. (An interesting aside, though: on this benchmark, the executable never grew to more than 5 megs of memory usage, exactly like the vanilla Python 3 interpreter.)

PyPy is ~12x faster on the django benchmark FYI
Re: [Python-Dev] Python 3 optimizations continued...
Maciej Fijalkowski, 02.09.2011 20:42: For a comparative real world benchmark I tested Martin von Loewis' django port (there are not that many meaningful Python 3 real world benchmarks) and got a speedup of 1.3 (without IIS). This is reasonably well, US got a speedup of 1.35 on this benchmark. I just checked that pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures seem to be not working currently or *really* fast...), but I cannot tell directly how that relates to speedups (it just says less is better and I did not quickly find an explanation). PyPy is ~12x faster on the django benchmark FYI FYI, there's a recent thread up on the pypy ML where someone is complaining about PyPy being substantially slower than CPython when running Django on top of SQLite. Also note that PyPy doesn't implement Py3 yet, so the benchmark results are not comparable anyway. As usual, benchmark results depend on what you do in your benchmarks. Stefan
Re: [Python-Dev] Python 3 optimizations continued...
On Fri, Sep 2, 2011 at 9:20 PM, Stefan Behnel stefan...@behnel.de wrote: Maciej Fijalkowski, 02.09.2011 20:42: For a comparative real world benchmark I tested Martin von Loewis' django port (there are not that many meaningful Python 3 real world benchmarks) and got a speedup of 1.3 (without IIS). This is reasonably well, US got a speedup of 1.35 on this benchmark. I just checked that pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures seem to be not working currently or *really* fast...), but I cannot tell directly how that relates to speedups (it just says less is better and I did not quickly find an explanation). PyPy is ~12x faster on the django benchmark FYI FYI, there's a recent thread up on the pypy ML where someone is complaining about PyPy being substantially slower than CPython when running Django on top of SQLite. Also note that PyPy doesn't implement Py3 yet, so the benchmark results are not comparable anyway. Yes, sqlite is slow. It's also much faster in trunk than in 1.6 and there is an open ticket about it :) The django benchmark is just templating, so it does not involve a database.
Re: [Python-Dev] Python 3 optimizations continued...
On 8/30/2011 4:41 PM, stefan brunthaler wrote:
 Ok, then there's something else you haven't told us. Are you saying that the original (old) bytecode is still used (and hence written to and read from .pyc files)?
 Short answer: yes. Long answer: I added an invocation counter to the code object and keep interpreting in the usual Python interpreter until this counter reaches a configurable threshold. When it reaches this threshold, I create the new instruction format and interpret with this optimized representation. All the macros look exactly the same in the source code, they are just redefined to use the different instruction format. I am at no point serializing this representation or the runtime information gathered by me, as any subsequent invocation might have different characteristics.

When the switchover to the new instruction format happens, what happens to sys.settrace() tracing? Will it report the same sequence of line numbers? For a small but important class of program executions, this is more important than speed.

--Ned.

 Best, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
2011/9/1 Ned Batchelder n...@nedbatchelder.com When the switchover to the new instruction format happens, what happens to sys.settrace() tracing? Will it report the same sequence of line numbers? For a small but important class of program executions, this is more important than speed. --Ned A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well). Regards, Cesare
Re: [Python-Dev] Python 3 optimizations continued...
Cesare Di Mauro wrote: 2011/9/1 Ned Batchelder n...@nedbatchelder.com mailto:n...@nedbatchelder.com When the switchover to the new instruction format happens, what happens to sys.settrace() tracing? Will it report the same sequence of line numbers? For a small but important class of program executions, this is more important than speed. --Ned A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well). What happens if tracing is enabled *during* the execution of the new instruction format? Some sort of deoptimisation will be required in order to recover the correct VM state. Cheers, Mark. Regards, Cesare
Re: [Python-Dev] Python 3 optimizations continued...
2011/9/1 Mark Shannon m...@hotpy.org Cesare Di Mauro wrote: 2011/9/1 Ned Batchelder n...@nedbatchelder.com mailto: n...@nedbatchelder.com When the switchover to the new instruction format happens, what happens to sys.settrace() tracing? Will it report the same sequence of line numbers? For a small but important class of program executions, this is more important than speed. --Ned A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well). What happens if tracing is enabled *during* the execution of the new instruction format? Some sort of deoptimisation will be required in order to recover the correct VM state. Cheers, Mark. Sure. I don't think that the regular ceval.c loop will be dropped when executing the new instruction format, so we can intercept a change like this using the why variable, for example, or something similar that is normally used to break the regular loop execution. Anyway, we need to take a look at the code. Cheers, Cesare
Re: [Python-Dev] Python 3 optimizations continued...
On Sep 1, 2011, at 5:23 AM, Cesare Di Mauro wrote: A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well). Correct me if I'm wrong: doesn't this mean that no profiler will accurately be able to measure the performance impact of the new instruction format, and therefore one may get incorrect data when one is trying to make a CPU optimization for real-world performance?
Re: [Python-Dev] Python 3 optimizations continued...
On Thu, Sep 1, 2011 at 10:15 AM, Glyph Lefkowitz gl...@twistedmatrix.com wrote: On Sep 1, 2011, at 5:23 AM, Cesare Di Mauro wrote: A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well). Correct me if I'm wrong: doesn't this mean that no profiler will accurately be able to measure the performance impact of the new instruction format, and therefore one may get incorrect data when on is trying to make a CPU optimization for real-world performance? Well, profilers already skew results by adding call overhead. But tracing for debugging and profiling don't do exactly the same thing: debug tracing stops at every line, but profiling only executes hooks at the start and end of a function(*). So I think the function body could still be executed using the new format (assuming this is turned on/off per code object anyway). (*) And whenever a generator yields or is resumed. I consider that an annoying bug though, just as the debugger doesn't do the right thing with yield -- there's no way to continue until the yielding generator is resumed short of setting a manual breakpoint on the next line. -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
Hi, as promised, I created a publicly available preview of an implementation with my optimizations, which is available under the following location: https://bitbucket.org/py3_pio/preview/wiki/Home I followed Nick's advice and added a valuable overview/introduction at the wiki page the link points to; I am positive that spending 10 minutes reading it will provide you with valuable information regarding what's happening. In addition, as Guido already mentioned, this is more or less a direct copy of my research branch without some of my private comments and *no* additional refactorings for software-engineering issues (which I am very much aware of.) I hope this clarifies a *lot* and makes it easier to see what parts are involved and how all the pieces fit together. I hope you'll like it, have fun, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
stefan brunthaler, 30.08.2011 22:41:
 Ok, there's something else you haven't told us. Are you saying that the original (old) bytecode is still used (and hence written to and read from .pyc files)?
 Short answer: yes. Long answer: I added an invocation counter to the code object and keep interpreting in the usual Python interpreter until this counter reaches a configurable threshold. When it reaches this threshold, I create the new instruction format and interpret with this optimized representation. All the macros look exactly the same in the source code, they are just redefined to use the different instruction format. I am at no point serializing this representation or the runtime information gathered by me, as any subsequent invocation might have different characteristics.

So, basically, you built a JIT compiler but don't want to call it that, right? Just because it compiles byte code to other byte code rather than to native CPU instructions does not mean it doesn't compile Just In Time. That actually sounds like a nice feature in general. It could even replace (or accompany?) the existing peephole optimiser as part of a more general optimisation architecture, in the sense that it could apply byte code optimisations at runtime rather than compile time, potentially based on better knowledge about what's actually going on.

 I will remove my development commentaries and create a private repository at bitbucket

I agree with the others that it's best to open up your repository for everyone who is interested. I can see no reason why you would want to close it back down once it's there.

Stefan
Re: [Python-Dev] Python 3 optimizations continued...
 I think that you must deal with big endianness because some RISCs can't handle data in little-endian format at all. In WPython I wrote some macros which handle both endiannesses, but lacking big-endian machines I never had the opportunity to verify whether something was wrong.

I am sorry for the temporal lapse of not getting back to this directly yesterday; we were just heading out for lunch, and I figured it out only then but immediately forgot it on our way back to the lab... So, as I have already said, I evaluated my optimizations on x86 (little-endian) and PowerPC 970 (big-endian), and I did not have to change any of my instruction decoding during interpretation. (The only nasty bug I still remember vividly was that while on gcc for x86 the data type char defaults to signed, it defaults to unsigned on PowerPC's gcc.) When I have time and access to a PowerPC machine again (an ARM might be interesting, too), I will take a look at the generated assembly code to figure out why this is working. (I have some ideas why it might work without changing the code.) If I run into any problems, I'll gladly contact you :)

BTW: AFAIR, we emailed last year regarding wpython, and IIRC your optimizations could primarily be summarized as clever superinstructions. I have not implemented anything in that area at all (and have in fact not even touched the compiler and its peephole optimizer), but if parts of my implementation get in, I am sure that you could add some of your work on top of that, too.

Cheers, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
 So, basically, you built a JIT compiler but don't want to call it that, right? Just because it compiles byte code to other byte code rather than to native CPU instructions does not mean it doesn't compile Just In Time.

For me, a definition of a JIT compiler or any dynamic compilation subsystem entails that native machine code is generated at run-time. Furthermore, I am not compiling from bytecode to bytecode, but rather changing the instruction encoding underneath and subsequently using quickening to optimize interpretation. But, OTOH, I am not aware of a canonical definition of JIT compilation, so it depends ;)

 I agree with the others that it's best to open up your repository for everyone who is interested. I can see no reason why you would want to close it back down once it's there.

Well, my code has primarily been a vehicle for my research in that area and thus is not immediately suited to adoption (it does not adhere to Python C coding standards, contains lots of private comments about various facts, debugging hints, etc.). The explanation for this is easy: when I started out on my research, it was far from clear that it would be successful and really that much faster. So, I would like to clean up the comments and some parts of the code and publish the code I have without any of the clean-up work for naming conventions, etc., so that you can all take a look and it is clear what it's all about. After that we can have a factual discussion about whether it fits the bill for you, too, and if so, which changes (naming conventions, extensive documentation, etc.) are necessary *before* any adoption is reasonable for you, too. That seems to be a good way to start off and get results and feedback quickly; any ideas/complaints/comments/suggestions?

Best regards, --stefan

PS: I am using Nick's suggested plan to incorporate my changes directly into the most recent version, as mine is currently only running on Python 3.1.
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 10:04 PM, Cesare Di Mauro cesare.di.ma...@gmail.com wrote: It isn't, because motivation to do something new with CPython vanishes, at least on some areas (virtual machine / ceval.c), even having some ideas to experiment with. That's why in my last talk on EuroPython I decided to move on other areas (Python objects). Cesare, I'm really sorry that you became so disillusioned that you abandoned wordcode. I agree that we were too optimistic about Unladen Swallow. Also that the existence of PyPy and its PR machine (:-) should not stop us from improving CPython. I'm wondering if, with your experience in creating WPython, you could review Stefan Brunthaler's code and approach (once he's put it up for review) and possibly the two of you could even work on a joint project? -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
On Wed, Aug 31, 2011 at 10:08 AM, stefan brunthaler ste...@brunthaler.net wrote: Well, my code has primarily been a vehicle for my research in that area and thus is not immediately suited to adoption [...]. But if you want to be taken seriously as a researcher, you should publish your code! Without publication of your *code*, research in your area cannot be reproduced by others, so it is not science. Please stop being shy and open up what you have. The software engineering issues can be dealt with separately! -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/31 stefan brunthaler ste...@brunthaler.net I think that you must deal with big endianness because some RISC machines can't handle data in little-endian format at all. In WPython I have written some macros which handle both endiannesses, but lacking big-endian machines I never had the opportunity to verify whether something was wrong. I am sorry for the delay in not getting back to this directly yesterday; we were just heading out for lunch, and I figured it out only then but immediately forgot it on our way back to the lab... So, as I have already said, I evaluated my optimizations on x86 (little-endian) and PowerPC 970 (big-endian) and I did not have to change any of my instruction decoding during interpretation. (The only nasty bug I still remember vividly was that on gcc for x86 the data type char defaults to signed, whereas it defaults to unsigned on PowerPC's gcc.) When I have time and access to a PowerPC machine again (an ARM might be interesting, too), I will take a look at the generated assembly code to figure out why this is working. (I have some ideas why it might work without changing the code.) If I run into any problems, I'll gladly contact you :) BTW: AFAIR, we emailed last year regarding wpython, and IIRC your optimizations could primarily be summarized as clever superinstructions. I have not implemented anything in that area at all (and have in fact not even touched the compiler and its peephole optimizer), but if parts of my implementation get in, I am sure that you could add some of your work on top of that, too. Cheers, --stefan You're right. I took a look at our old e-mails, and I found more details about your work. It's definitely not affected by processor endianness, so you don't need any check: it just works, because you produce the new opcodes in memory and consume them in memory as well. Looking at your examples, I think that WPython wordcode usage can be useful only for the simplest ones.
That's because superinstructions group together several actions that need to be split again into simpler ones by a tracing JIT/compiler like yours, if you want to keep it simple. You said that you added about 400 specialized instructions last year with the usual bytecodes, but wordcodes will require quite a few more (this can compromise performance on CPUs with small data caches). So I think that it'll be better to finish your work, with all tests passing, before thinking about adding something on top (which, for me, sounds like a machine-code JIT O:-) Regards, Cesare
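As an illustration of the superinstruction idea Cesare describes (the opcode names and the toy stack machine below are mine, not actual CPython or wpython code), a fused handler performs the work of several simple opcodes with a single dispatch — which is exactly the grouping a tracing JIT would later have to split apart again:

```python
# Toy stack machine: a fused "superinstruction" replaces a common
# three-opcode sequence, trading one dispatch for three.
def run(code, consts):
    stack = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == "LOAD_CONST":
            pc += 1
            stack.append(consts[code[pc]])
        elif op == "BINARY_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "LOAD_CONST_LOAD_CONST_ADD":  # fused superinstruction
            pc += 2
            stack.append(consts[code[pc - 1]] + consts[code[pc]])
        elif op == "RETURN_VALUE":
            return stack.pop()
        pc += 1

# Both programs compute consts[0] + consts[1]; the fused version needs
# two dispatches (fused op + return) instead of four.
plain = ["LOAD_CONST", 0, "LOAD_CONST", 1, "BINARY_ADD", "RETURN_VALUE"]
fused = ["LOAD_CONST_LOAD_CONST_ADD", 0, 1, "RETURN_VALUE"]
```

The point of the trade-off: each fused handler is another entry in the dispatch table and another body in the instruction cache, which is why adding many of them can hurt on CPUs with small caches.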
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/31 Guido van Rossum gu...@python.org On Tue, Aug 30, 2011 at 10:04 PM, Cesare Di Mauro cesare.di.ma...@gmail.com wrote: It isn't, because motivation to do something new with CPython vanishes, at least on some areas (virtual machine / ceval.c), even having some ideas to experiment with. That's why in my last talk on EuroPython I decided to move on other areas (Python objects). Cesare, I'm really sorry that you became so disillusioned that you abandoned wordcode. I agree that we were too optimistic about Unladen Swallow. Also that the existence of PyPy and its PR machine (:-) should not stop us from improving CPython. I never stopped thinking about new optimizations. A lot can be done on CPython, even without resorting to something like a JIT et al. I'm wondering if, with your experience in creating WPython, you could review Stefan Brunthaler's code and approach (once he's put it up for review) and possibly the two of you could even work on a joint project? -- --Guido van Rossum (python.org/~guido) Yes, I can. I'll wait for Stefan to update his source (reaching Python 3.2 at least), as he has said he intends to do, and for everything to be published, in order to review the code. I also agree with you that right now it doesn't need to look state-of-the-art. First make it work, then make it nicer. ;) Regards, Cesare
Re: [Python-Dev] Python 3 optimizations continued...
On Thu, Sep 1, 2011 at 3:28 AM, Guido van Rossum gu...@python.org wrote: On Tue, Aug 30, 2011 at 10:04 PM, Cesare Di Mauro Cesare, I'm really sorry that you became so disillusioned that you abandoned wordcode. I agree that we were too optimistic about Unladen Swallow. Also that the existence of PyPy and its PR machine (:-) should not stop us from improving CPython. Yep, and I'll try to do a better job of discouraging creeping complexity (without adequate payoffs) without the harmful side effect of discouraging experimentation with CPython performance improvements in general. It's massive "rewrite the world" changes, which don't adequately account for all the ways CPython gets used or for the fact that core devs need to be able to effectively *review* the changes, that are unlikely to ever get anywhere. More localised changes, or those that are relatively easy to explain, have a much better chance. So I'll switch my tone to just trying to make sure that portability and maintainability concerns are given due weight :) Cheers, Nick. P.S. I suspect a big part of my attitude stems from the fact that we're still trying to untangle some of the consequences of committing the PEP 3118 new buffer API implementation with inadequate review (it turns out the implementation didn't reflect the PEP, and the PEP had deficiencies of its own), and I was one of the ones advocating in favour of that patch. Once bitten, twice shy, etc. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Python 3 optimizations continued...
Nick Coghlan wrote: Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Me, too! I like that I can read the CPython source and understand what it's doing most of the time. Please don't screw that up by attempting to perform heroic optimisations. -- Greg
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 08:57, Greg Ewing greg.ew...@canterbury.ac.nz wrote: Nick Coghlan wrote: Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Me, too! I like that I can read the CPython source and understand what it's doing most of the time. Please don't screw that up by attempting to perform heroic optimisations. -- Following this argument to the extreme, the bytecode evaluation code of CPython can be simplified quite a bit. Lose 2x performance but gain a lot of readability. Does that sound like a good deal? I don't intend to sound sarcastic, just to show that IMHO this argument isn't a good one. I think that even cleverly optimized code can be properly written and *documented* to make the task of understanding it feasible. Personally, I'd love CPython to be a bit faster and see no reason to give up optimization opportunities for the sake of code readability. Eli
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 4:22 PM, Eli Bendersky eli...@gmail.com wrote: On Tue, Aug 30, 2011 at 08:57, Greg Ewing greg.ew...@canterbury.ac.nz wrote: Following this argument to the extreme, the bytecode evaluation code of CPython can be simplified quite a bit. Lose 2x performance but gain a lot of readability. Does that sound like a good deal? I don't intend to sound sarcastic, just show that IMHO this argument isn't a good one. I think that even clever optimized code can be properly written and *documented* to make the task of understanding it feasible. Personally, I'd love CPython to be a bit faster and see no reason to give up optimization opportunities for the sake of code readability. Yeah, it's definitely a trade-off - the point I was trying to make is that there *is* a trade-off being made between complexity and speed. I think the computed-gotos stuff struck a nice balance - the macro-fu involved means that you can still understand what the main eval loop is *doing*, even if you don't know exactly what's hidden behind the target macros. Ditto for the older opcode prediction feature and the peephole optimiser - separation of concerns means that you can understand the overall flow of events without needing to understand every little detail. This is where the request to extract individual orthogonal changes and submit separate patches comes from - it makes it clear that the independent changes *can* be separated cleanly, and aren't a giant ball of incomprehensible mud. It's the difference between complex (lots of moving parts, that can each be understood on their own and are then composed into a meaningful whole) and complicated (massive patches that don't work at all if any one component is delayed) Eugene Toder's AST optimiser work that I still hope to get into 3.3 will have to undergo a similar process - the current patch covers a bit too much ground and needs to be broken up into smaller steps before we can seriously consider pushing it into the core. 
Regards, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Python 3 optimizations continued...
Nick Coghlan, 30.08.2011 02:00: On Tue, Aug 30, 2011 at 7:14 AM, Antoine Pitrou wrote: On Mon, 29 Aug 2011 11:33:14 -0700 stefan brunthaler wrote: * The optimized dispatch routine has a changed instruction format (word-sized instead of bytecodes) that allows for regular instruction decoding (without the HAS_ARG check) and inlining of some objects in the instruction format on 64-bit architectures. Having a word-sized bytecode format would probably be acceptable in itself, so if you want to submit a patch for that, go ahead. Although any such patch should discuss how it compares with Cesare's work on wpython. Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Armin Rigo made the judgment years ago that CPython was a poor platform for serious optimisation when he stopped working on Psyco and started PyPy instead, and I think the contrasting fates of PyPy and Unladen Swallow have borne out that opinion. Significantly increasing the complexity of CPython for speed-ups that are dwarfed by those available through PyPy seems like a poor trade-off to me. If Stefan can cut down his changes into smaller feature chunks, thus making their benefit reproducible and verifiable by others, it's well worth reconsidering whether even a visible increase in complexity isn't worth the improved performance, one patch at a time. Even if PyPy's performance tops the improvements, it's worth remembering that PyPy is also a very different kind of system than CPython, with different resource requirements and a different level of maturity, compatibility, portability, etc. There are many reasons to continue using CPython, not only in corner cases, and there are many people who would be happy about a faster CPython. Raising the bar has its virtues. That being said, I also second Nick's reference to wpython.
If CPython grows its byte code size anyway (which, as I understand, is one part of the proposed changes), it's worth looking at wpython first, given that it has been around and working for a while. The other proposed changes sound like at least some of them are independent from this one. Stefan
Re: [Python-Dev] Python 3 optimizations continued...
Nick Coghlan wrote: On Tue, Aug 30, 2011 at 7:14 AM, Antoine Pitrou solip...@pitrou.net wrote: On Mon, 29 Aug 2011 11:33:14 -0700 stefan brunthaler s.bruntha...@uci.edu wrote: * The optimized dispatch routine has a changed instruction format (word-sized instead of bytecodes) that allows for regular instruction decoding (without the HAS_ARG check) and inlining of some objects in the instruction format on 64-bit architectures. Having a word-sized bytecode format would probably be acceptable in itself, so if you want to submit a patch for that, go ahead. Although any such patch should discuss how it compares with Cesare's work on wpython. Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. CPython has a large number of micro-optimisations scattered all over the code base. By removing these and adding large-scale optimisations, like Stefan's, the code base *might* actually get smaller overall (and thus simpler) *and* faster. Of course, CPython must remain portable. [snip] At a bare minimum, I don't think any significant changes should be made under the "it will be faster" justification until the bulk of the real-world benchmark suite used for speed.pypy.org is available for Python 3. (Wasn't there a GSoC project about that?) +1 Cheers, Mark.
Re: [Python-Dev] Python 3 optimizations continued...
Martin v. Löwis wrote: So, the two big issues aside, is there any interest in incorporating these optimizations in Python 3? The question really is whether this is an all-or-nothing deal. If you could identify smaller parts that can be applied independently, interest would be higher. Also, I'd be curious whether your techniques help or hinder a potential integration of a JIT generator. A JIT compiler is not a silver bullet; translation to machine code is just one of many optimisations performed by PyPy. A compiler merely removes interpretative overhead, at the cost of significantly increased code size, whereas Stefan's work attacks both interpreter overhead and some of the inefficiencies due to dynamic typing. If Unladen Swallow achieved anything, it was to demonstrate that a JIT alone does not work well. My (experimental) HotPy VM has similar baseline speed to CPython, yet is able to outperform Unladen Swallow using interpreter-only optimisations. (It goes even faster with the compiler turned on :) ) Cheers, Mark. Regards, Martin
Re: [Python-Dev] Python 3 optimizations continued...
Although any such patch should discuss how it compares with Cesare's work on wpython. Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Changing the bytecode width wouldn't make the interpreter more complex. No, but I think Stefan is proposing to add a *second* byte code format, in addition to the one that remains there. That would certainly be an increase in complexity. Some years ago we were waiting for Unladen Swallow to improve itself and be ported to Python 3. Now it seems we are waiting for PyPy to be ported to Python 3. I'm not sure how "let's just wait" is a good trade-off if someone proposes interesting patches (which, of course, remains to be seen). I completely agree. Let's not put unmet preconditions on such projects. For example, I still plan to write a JIT for Python at some point. This may happen in two months, or in two years. I wouldn't try to stop anybody from contributing improvements that may become obsolete with the JIT. The only recent case where I *did* try to stop people is with PEP 393, where I do believe that some of the changes that had been made over the last year become redundant. Regards, Martin
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, 30 Aug 2011 13:29:59 +1000 Nick Coghlan ncogh...@gmail.com wrote: Anecdotal, non-reproducible performance figures are *not* the way to go about serious optimisation efforts. What about anecdotal *and* reproducible performance figures? :) I may be half-joking, but we already have a set of py3k-compatible benchmarks and, besides, sometimes a timeit invocation gives a good idea of whether an approach is fruitful or not. While a permanent public reference with historical tracking of performance figures is even better, let's not freeze everything until it's ready. (For example, do we need to wait for speed.python.org before PEP 393 is accepted?) Regards Antoine.
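To make Antoine's timeit suggestion concrete, here is a minimal sketch (the expressions compared are arbitrary examples of mine, not from the thread); absolute numbers are machine-dependent, so only the relative comparison between the two variants is meaningful:

```python
import timeit

# Rough micro-benchmark of two equivalent computations.  Quick-and-dirty
# by design: enough to see whether an approach is fruitful, not a
# substitute for a full benchmark suite like perf.py.
t_loop = timeit.timeit(
    "total = 0\nfor i in range(100): total += i", number=10_000)
t_builtin = timeit.timeit("sum(range(100))", number=10_000)

print(f"explicit loop: {t_loop:.3f}s   sum() builtin: {t_builtin:.3f}s")
```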
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 9:38 PM, Antoine Pitrou solip...@pitrou.net wrote: On Tue, 30 Aug 2011 13:29:59 +1000 Nick Coghlan ncogh...@gmail.com wrote: Anecdotal, non-reproducible performance figures are *not* the way to go about serious optimisation efforts. What about anecdotal *and* reproducible performance figures? :) I may be half-joking, but we already have a set of py3k-compatible benchmarks and, besides, sometimes a timeit invocation gives a good idea of whether an approach is fruitful or not. While a permanent public reference with historical tracking of performance figures is even better, let's not freeze everything until it's ready. (for example, do we need to wait for speed.python.org before PEP 393 is accepted?) Yeah, I'd neglected the idea of just running perf.py for pre- and post-patch performance comparisons. You're right that that can generate sufficient info to make a well-informed decision. I'd still really like it if some of the people advocating that we care about CPython performance actually volunteered to spearhead the effort to get speed.python.org up and running, though. As far as I know, the hardware's spinning idly waiting to be given work to do :P Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Python 3 optimizations continued...
Changing the bytecode width wouldn't make the interpreter more complex. No, but I think Stefan is proposing to add a *second* byte code format, in addition to the one that remains there. That would certainly be an increase in complexity. Yes, indeed I have a more straightforward instruction format to allow for more efficient decoding. Just going from bytecode size to word-code size without changing the instruction format is going to require 8 (or word-size) times more memory on a 64-bit system. From an optimization perspective, the irregular instruction format was the biggest problem, because checking for HAS_ARG is always on the fast path and mostly unpredictable. Hence, I chose to extend the instruction format to word size and use the additional space so that the upper half holds the argument and the lower half the actual opcode. Encoding is more efficient, and *not* more complex. Using profiling to indicate what code is hot, I don't waste too much memory on encoding this regular instruction format. For example, I still plan to write a JIT for Python at some point. This may happen in two months, or in two years. I wouldn't try to stop anybody from contributing improvements that may become obsolete with the JIT. I would not necessarily argue that my optimizations would become obsolete; if you still think about writing a JIT, it might make sense to re-use what I've got and not start from scratch. E.g., building a simple JIT compiler that just inlines the operation implementations as templates to eliminate the interpretative overhead (in a similar vein to Piumarta and Riccardi's paper from 1998) might be a good start. Though I don't want to pre-influence your JIT design, I'm just thinking out loud... Regards, --stefan
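The regular format Stefan describes (argument in the upper half, opcode in the lower half, no HAS_ARG check) can be sketched as follows. The 16/16 split, helper names, and 32-bit word are illustrative assumptions for brevity; his actual instructions are word-sized:

```python
OPCODE_BITS = 16  # assumed split; the real layout may differ

def encode(opcode, oparg):
    # Every instruction is one fixed-size word: argument in the upper
    # half, opcode in the lower half.  Decoding never needs to branch
    # on whether an argument is present.
    return (oparg << OPCODE_BITS) | opcode

def decode(word):
    # Two cheap, branch-free operations -- the point of the regular format.
    return word & ((1 << OPCODE_BITS) - 1), word >> OPCODE_BITS

word = encode(100, 3)          # hypothetical opcode 100, oparg 3
opcode, oparg = decode(word)
```

Contrast this with classic CPython bytecode, where the interpreter must first test HAS_ARG to learn whether the next two bytes belong to the current instruction.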
Re: [Python-Dev] Python 3 optimizations continued...
Stefan, have you shared a pointer to your code yet? Is it open source? It sounds like people are definitely interested and it would make sense to let them experiment with your code and review it. -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 09:42, Guido van Rossum gu...@python.org wrote: Stefan, have you shared a pointer to your code yet? Is it open source? I have no shared code repository, but could create one (is there any pydev-preferred provider?). I have all the copyrights on the code, and I would like to open-source it. It sounds like people are definitely interested and it would make sense to let them experiment with your code and review it. That sounds fine. I need to do some clean-up work (the code contains most of my comments to remind me of issues) and it currently does not pass all regression tests. But if people want to take a look first to decide if they want it, then that's good enough for me. (I just wanted to know if there is substantial interest, so that it eventually pays off to find and fix the remaining bugs.) --stefan
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, 30 Aug 2011 08:27:13 -0700 stefan brunthaler ste...@brunthaler.net wrote: Changing the bytecode width wouldn't make the interpreter more complex. No, but I think Stefan is proposing to add a *second* byte code format, in addition to the one that remains there. That would certainly be an increase in complexity. Yes, indeed I have a more straightforward instruction format to allow for more efficient decoding. Just going from bytecode size to word-code size without changing the instruction format is going to require 8 (or word-size) times more memory on a 64-bit system. Do you really need it to match a machine word? Or is, say, a 16-bit format sufficient? Regards Antoine.
Re: [Python-Dev] Python 3 optimizations continued...
Do you really need it to match a machine word? Or is, say, a 16-bit format sufficient? Hm, technically no, but practically it makes more sense, as (at least for x86 architectures) having opargs and opcodes in half-words can be efficiently expressed in assembly. On 64-bit architectures, I could also inline data object references that fit into the 32-bit upper half. It turns out that most constant objects fit nicely into this, and I have used this for a special cache region (again below 2^32) for global objects, too. So, technically it's not necessary, but practically it makes a lot of sense. (Most of these things work on 32-bit systems, too. For architectures with a smaller word size, we can adapt or disable the optimizations.) Cheers, --stefan
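Stefan's inlining of 32-bit-addressable references into the upper half of a 64-bit instruction word might look roughly like the following simulation. The cache layout and all names here are my assumptions for illustration, not his actual implementation:

```python
# Stand-in for a cache region kept at addresses (or indices) below 2**32,
# so a reference into it fits the 32-bit upper half of a 64-bit word.
global_cache = []

def intern_in_cache(obj):
    # Place an object in the cache and return its 32-bit-safe handle.
    global_cache.append(obj)
    idx = len(global_cache) - 1
    assert idx < 2**32  # must fit the upper half of the instruction word
    return idx

def encode64(opcode, cache_idx):
    # 64-bit instruction word: object handle above, opcode below.
    return (cache_idx << 32) | opcode

def fetch_inlined(word):
    # The operand is recovered straight from the instruction word,
    # skipping the usual indirection through a constants tuple.
    return global_cache[word >> 32]

idx = intern_in_cache("some_global")   # hypothetical cached object
word = encode64(7, idx)                # hypothetical opcode 7
```

The design benefit being simulated: the interpreter's operand fetch becomes a shift plus one array access, instead of an index load through the code object's constants.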
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 10:50 AM, stefan brunthaler ste...@brunthaler.net wrote: Do you really need it to match a machine word? Or is, say, a 16-bit format sufficient. Hm, technically no, but practically it makes more sense, as (at least for x86 architectures) having opargs and opcodes in half-words can be efficiently expressed in assembly. On 64bit architectures, I could also inline data object references that fit into the 32bit upper half. It turns out that most constant objects fit nicely into this, and I have used this for a special cache region (again below 2^32) for global objects, too. So, technically it's not necessary, but practically it makes a lot of sense. (Most of these things work on 32bit systems, too. For architectures with a smaller size, we can adapt or disable the optimizations.) Do I sense that the bytecode format is no longer platform-independent? That will need a bit of discussion. I bet there are some things around that depend on that. -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
Do I sense that the bytecode format is no longer platform-independent? That will need a bit of discussion. I bet there are some things around that depend on that. Hm, I haven't really thought about that in detail or for very long. I ran it on PowerPC 970 and Intel Atom i7 without problems (the latter ones are a non-issue) and think that it can be portable. I just stuff argument and opcode into one word for regular instruction decoding, like a RISC CPU, and I realize there might be little/big endian issues, but they surely can be conditionally compiled... --stefan
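The little/big-endian concern raised here comes down to how a packed instruction word lays out in memory. A small demonstration with the struct module (using a hypothetical 32-bit word with a 16/16 opcode/argument split for brevity):

```python
import struct

# One encoded instruction word: oparg 200 in the upper half,
# opcode 100 in the lower half.
word = (200 << 16) | 100

little = struct.pack("<I", word)  # x86-style byte order
big = struct.pack(">I", word)     # PowerPC-style byte order

# The in-memory byte sequences differ between the two platforms...
assert little != big
# ...but decoding with the platform's own byte order recovers the same
# instruction word.  This is why code that both produces and consumes
# the words in memory on the same machine never notices endianness --
# only a serialized on-disk format would have to pick one byte order
# (or compile the accessors conditionally).
assert struct.unpack("<I", little) == struct.unpack(">I", big) == (word,)
```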
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 11:23 AM, stefan brunthaler ste...@brunthaler.net wrote: Do I sense that the bytecode format is no longer platform-independent? That will need a bit of discussion. I bet there are some things around that depend on that. Hm, I haven't really thought about that in detail and for longer, I ran it on PowerPC 970 and Intel Atom i7 without problems (the latter ones are a non-issue) and think that it can be portable. I just stuff argument and opcode into one word for regular instruction decoding like a RISC CPU, and I realize there might be little/big endian issues, but they surely can be conditionally compiled... Um, I'm sorry, but that reply sounds incredibly naive, like you're not really sure what the on-disk format for .pyc files is or why it would matter. You're not even answering the question, except indirectly -- it seems that you've never even thought about the possibility of generating a .pyc file on one platform and copying it to a computer using a different one. -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
Um, I'm sorry, but that reply sounds incredibly naive, like you're not really sure what the on-disk format for .pyc files is or why it would matter. You're not even answering the question, except indirectly -- it seems that you've never even thought about the possibility of generating a .pyc file on one platform and copying it to a computer using a different one. Well, it may sound incredibly naive, but the truth is: I am never storing the optimized representation to disk, it's done purely at runtime when profiling tells me it makes sense to make the switch. Thus I circumvent many of the problems outlined by you. So I am positive that a full fledged change of the representation has many more intricacies to it, but my approach is only tangentially related... --stefan
Re: [Python-Dev] Python 3 optimizations continued...
On 8/30/2011 1:23 PM, stefan brunthaler wrote: (I just wanted to know if there is substantial interest so that it eventually pays off to find and fix the remaining bugs) It is the nature of our development process that there usually can be no guarantee of acceptance of future code. The rather early acceptance of Unladen Swallow was to me something of an anomaly. I also think it was something of a mistake insofar as it discouraged other efforts, like yours. I think the answer you have gotten is that there is a) substantial interest and b) a willingness to consider a major change such as switching from bytecode to something else. There also seem to be two main concerns: 1) that the increase in complexity be 'less' than the increase in speed, and 2) that the changes be presented in small enough chunks that they can be reviewed. Whether this is good enough for you to proceed is for you to decide. -- Terry Jan Reedy
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 11:34 AM, stefan brunthaler ste...@brunthaler.net wrote: Um, I'm sorry, but that reply sounds incredibly naive, like you're not really sure what the on-disk format for .pyc files is or why it would matter. You're not even answering the question, except indirectly -- it seems that you've never even thought about the possibility of generating a .pyc file on one platform and copying it to a computer using a different one. Well, it may sound incredibly naive, but the truth is: I am never storing the optimized representation to disk, it's done purely at runtime when profiling tells me it makes sense to make the switch. Thus I circumvent many of the problems outlined by you. So I am positive that a full fledged change of the representation has many more intricacies to it, but my approach is only tangentially related... Ok, then there's something else you haven't told us. Are you saying that the original (old) bytecode is still used (and hence written to and read from .pyc files)? -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
Am 30.08.2011 20:34, schrieb stefan brunthaler: Um, I'm sorry, but that reply sounds incredibly naive, like you're not really sure what the on-disk format for .pyc files is or why it would matter. You're not even answering the question, except indirectly -- it seems that you've never even thought about the possibility of generating a .pyc file on one platform and copying it to a computer using a different one. Well, it may sound incredibly naive, but the truth is: I am never storing the optimized representation to disk, it's done purely at runtime when profiling tells me it makes sense to make the switch. Thus I circumvent many of the problems outlined by you. So I am positive that a full fledged change of the representation has many more intricacies to it, but my approach is only tangentially related... You know, instead of all these half-explanations, giving us access to the code would shut us up much more effectively. Don't worry about not passing tests, this is what the official trunk does half of the time ;) Georg
Re: [Python-Dev] Python 3 optimizations continued...
Ok, then there's something else you haven't told us. Are you saying that the original (old) bytecode is still used (and hence written to and read from .pyc files)? Short answer: yes. Long answer: I added an invocation counter to the code object and keep interpreting in the usual Python interpreter until this counter reaches a configurable threshold. When it reaches this threshold, I create the new instruction format and interpret with this optimized representation. All the macros look exactly the same in the source code, they are just redefined to use the different instruction format. I am at no point serializing this representation or the runtime information gathered by me, as any subsequent invocation might have different characteristics. I will remove my development commentaries and create a private repository at bitbucket for you* to take an early look, like Georg (and more or less Terry, too) suggested. Is that a good way for most of you? (I would then give access to whomever wants to take a look.) Best, --stefan *: not personally targeted at Guido (who is naturally very much welcome to take a look, too) but addressed to python-dev in general.
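A rough sketch of the switching scheme Stefan describes above: a per-code-object invocation counter that, past a configurable threshold, derives the optimized representation at runtime without ever serializing it. All names and the threshold value here are hypothetical illustrations, not his actual implementation.

```python
# Sketch only: the real switch happens inside ceval.c's dispatch loops.

THRESHOLD = 1000  # hypothetical; the real threshold is configurable

def derive_wordcode(bytecode):
    # placeholder for building the word-sized instruction format
    return list(bytecode)

class CodeObject:
    def __init__(self, bytecode):
        self.bytecode = bytecode   # the regular representation from the .pyc file
        self.invocations = 0
        self.optimized = None      # built lazily at runtime, never written to disk

    def run(self):
        self.invocations += 1
        if self.optimized is None and self.invocations >= THRESHOLD:
            self.optimized = derive_wordcode(self.bytecode)
        return "optimized dispatch" if self.optimized is not None else "regular dispatch"

co = CodeObject(b"\x64\x00")
results = [co.run() for _ in range(THRESHOLD)]
assert results[0] == "regular dispatch"
assert results[-1] == "optimized dispatch"
```

Note how the on-disk format is untouched: only the in-memory representation changes once profiling says it is worthwhile.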
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/30 stefan brunthaler ste...@brunthaler.net: I will remove my development commentaries and create a private repository at bitbucket for you* to take an early look like Georg (and more or less Terry, too) suggested. Is that a good way for most of you? (I would then give access to whomever wants to take a look.) And what is wrong with a public one? -- Regards, Benjamin
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 13:42, Benjamin Peterson benja...@python.org wrote: 2011/8/30 stefan brunthaler ste...@brunthaler.net: I will remove my development commentaries and create a private repository at bitbucket for you* to take an early look like Georg (and more or less Terry, too) suggested. Is that a good way for most of you? (I would then give access to whomever wants to take a look.) And what is wrong with a public one? Well, since it does not fully pass all regression tests and is just meant for people to take a first look to find out if it's interesting, I think I might take it offline after you've had a look. It seems to me that that is easier to do with a private repository, but in general, I don't have a problem with a public one... Regards, --stefan PS: If you want to, I can also just put a tarball on my home page and post a link here. It's not that I want control/influence over who is allowed to look and who isn't.
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 1:54 PM, Benjamin Peterson benja...@python.org wrote: 2011/8/30 stefan brunthaler ste...@brunthaler.net: On Tue, Aug 30, 2011 at 13:42, Benjamin Peterson benja...@python.org wrote: 2011/8/30 stefan brunthaler ste...@brunthaler.net: I will remove my development commentaries and create a private repository at bitbucket for you* to take an early look like Georg (and more or less Terry, too) suggested. Is that a good way for most of you? (I would then give access to whomever wants to take a look.) And what is wrong with a public one? Well, since it does not fully pass all regression tests and is just meant for people to take a first look to find out if it's interesting, I think I might take it offline after you had a look. It seems to me that that is easier to be done with a private repository, but in general, I don't have a problem with a public one... Well, if your intention is for people to look at it, public seems to be the best solution. +1 The point of open source is more eyeballs and the ability for anyone else to pick up code and run in whatever direction they want (license permitting) with it. :)
Re: [Python-Dev] Python 3 optimizations continued...
On Aug 30, 2011, at 9:05 AM, Nick Coghlan ncogh...@gmail.com wrote: On Tue, Aug 30, 2011 at 9:38 PM, Antoine Pitrou solip...@pitrou.net wrote: On Tue, 30 Aug 2011 13:29:59 +1000 Nick Coghlan ncogh...@gmail.com wrote: Anecdotal, non-reproducible performance figures are *not* the way to go about serious optimisation efforts. What about anecdotal *and* reproducible performance figures? :) I may be half-joking, but we already have a set of py3k-compatible benchmarks and, besides, sometimes a timeit invocation gives a good idea of whether an approach is fruitful or not. While a permanent public reference with historical tracking of performance figures is even better, let's not freeze everything until it's ready. (for example, do we need to wait for speed.python.org before PEP 393 is accepted?) Yeah, I'd neglected the idea of just running perf.py for pre- and post-patch performance comparisons. You're right that that can generate sufficient info to make a well-informed decision. I'd still really like it if some of the people advocating that we care about CPython performance actually volunteered to spearhead the effort to get speed.python.org up and running, though. As far as I know, the hardware's spinning idly waiting to be given work to do :P Cheers, Nick. Discussion of speed.python.org should happen on the mailing list for that project if possible.
Re: [Python-Dev] Python 3 optimizations continued...
On Wed, Aug 31, 2011 at 3:23 AM, stefan brunthaler ste...@brunthaler.net wrote: On Tue, Aug 30, 2011 at 09:42, Guido van Rossum gu...@python.org wrote: Stefan, have you shared a pointer to your code yet? Is it open source? I have no shared code repository, but could create one (is there any pydev preferred provider?). I have all the copyrights on the code, and I would like to open-source it. Currently, the easiest way to create shared repositories for CPython variants is to start with bitbucket's mirror of the main CPython repo: https://bitbucket.org/mirror/cpython/overview Use the website to create your own public CPython fork, then edit the configuration of your local copy of the CPython repo to point to your new bitbucket repo rather than the main one on hg.python.org. hg push/pull can then be used as normal to publish in-development material to the world. 'hg pull' from hg.python.org makes it fairly easy to track the trunk. One key thing is to avoid making any changes of your own on the official CPython branches (i.e. default, 3.2, 2.7). Instead, use a named branch for anything you're working on. This makes it much easier to generate standalone patches later on. My own public sandbox (https://bitbucket.org/ncoghlan/cpython_sandbox/overview) is set up that way, and you can see plenty of other examples on bitbucket. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Python 3 optimizations continued...
On Wed, Aug 31, 2011 at 9:21 AM, Jesse Noller jnol...@gmail.com wrote: Discussion of speed.python.org should happen on the mailing list for that project if possible. Hah, that's how out of the loop I am on that front - I didn't even know there *was* a mailing list for it :) Subscribed! Cheers, Nick. P.S. For anyone else that is interested: http://mail.python.org/mailman/listinfo/speed -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Python 3 optimizations continued...
On 8/30/2011 2:12 PM, Guido van Rossum wrote: On Tue, Aug 30, 2011 at 10:50 AM, stefan brunthaler ste...@brunthaler.net wrote: Do you really need it to match a machine word? Or is, say, a 16-bit format sufficient. Hm, technically no, but practically it makes more sense, as (at least for x86 architectures) having opargs and opcodes in half-words can be efficiently expressed in assembly. On 64bit architectures, I could also inline data object references that fit into the 32bit upper half. It turns out that most constant objects fit nicely into this, and I have used this for a special cache region (again below 2^32) for global objects, too. So, technically it's not necessary, but practically it makes a lot of sense. (Most of these things work on 32bit systems, too. For architectures with a smaller size, we can adapt or disable the optimizations.) Do I sense that the bytecode format is no longer platform-independent? That will need a bit of discussion. I bet there are some things around that depend on that. I find myself more comfortable with Cesare Di Mauro's idea of expanding to 16 bits as the code unit. His basic idea was using 2, 4, or 6 bytes instead of 1, 3, or 6. It actually tended to save space because many ops with small ints (which are very common) contract from 3 bytes to 2 bytes or from 9(?) (two instructions) to 6. I am sorry he was not able to follow up on the initial promising results. The dis output was probably easier to read than the current output. Perhaps he made a mistake in combining the above idea with a shift from stack to hybrid stack+register design. -- Terry Jan Reedy
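A sketch of the 16-bit code unit idea Terry describes, showing why small-int ops contract from 3 bytes to 2. This is an illustration of the general scheme under assumed layout choices, not the actual wpython encoding.

```python
# Each 16-bit code unit is 2 bytes; an instruction occupies 1 or 2 units
# (2 or 4 bytes) here, versus 1 or 3 bytes in the classic byte-oriented format.

def encode_16bit(opcode, oparg=None):
    """Return a list of 16-bit code units for one instruction (layout assumed)."""
    assert 0 <= opcode < (1 << 8)
    if oparg is None:
        return [opcode]                 # 2 bytes (old format: 1 byte)
    if oparg < (1 << 8):
        return [(oparg << 8) | opcode]  # 2 bytes: small arg packed inline (old: 3)
    return [opcode, oparg & 0xFFFF]     # 4 bytes for a larger argument

assert len(encode_16bit(23)) == 1         # no-arg op
assert len(encode_16bit(100, 5)) == 1     # small-int arg fits in one unit
assert len(encode_16bit(100, 4096)) == 2  # bigger arg needs an extension unit
```

The space trade-off Terry mentions falls out of the middle case: arguments below 256 are by far the most common, and they shrink rather than grow.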
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/30 Antoine Pitrou solip...@pitrou.net Changing the bytecode width wouldn't make the interpreter more complex. It depends on the kind of changes. :) WPython introduced a very different intermediate code representation that required a big change to the peephole optimizer in the 1.0 alpha version. In 1.1 final I decided to completely move that code into ast.c (mostly for constant folding) and compiler.c (for the usual peepholer usage: seeking out patterns to substitute with better ones) because I found it simpler and more convenient. In the end, taking out some new optimizations that I've implemented along the way, the interpreter code is a bit more complex. Some years ago we were waiting for Unladen Swallow to improve itself and be ported to Python 3. Now it seems we are waiting for PyPy to be ported to Python 3. I'm not sure how "let's just wait" is a good trade-off if someone proposes interesting patches (which, of course, remains to be seen). Regards Antoine. It isn't, because motivation to do something new with CPython vanishes, at least in some areas (virtual machine / ceval.c), even having some ideas to experiment with. That's why in my last talk at EuroPython I decided to move to other areas (Python objects). Regards Cesare
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/30 Nick Coghlan ncogh...@gmail.com Yeah, it's definitely a trade-off - the point I was trying to make is that there *is* a trade-off being made between complexity and speed. I think the computed-gotos stuff struck a nice balance - the macro-fu involved means that you can still understand what the main eval loop is *doing*, even if you don't know exactly what's hidden behind the target macros. Ditto for the older opcode prediction feature and the peephole optimiser - separation of concerns means that you can understand the overall flow of events without needing to understand every little detail. This is where the request to extract individual orthogonal changes and submit separate patches comes from - it makes it clear that the independent changes *can* be separated cleanly, and aren't a giant ball of incomprehensible mud. It's the difference between complex (lots of moving parts, that can each be understood on their own and are then composed into a meaningful whole) and complicated (massive patches that don't work at all if any one component is delayed) Eugene Toder's AST optimiser work that I still hope to get into 3.3 will have to undergo a similar process - the current patch covers a bit too much ground and needs to be broken up into smaller steps before we can seriously consider pushing it into the core. Regards, Nick. Sometimes it cannot be done, because big changes produce big patches as well. I don't see a problem here if the code is well written (as required by the Python community :) and the developer is available to talk about his work to clear up any doubts. Regards Cesare
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/30 stefan brunthaler ste...@brunthaler.net Yes, indeed I have a more straightforward instruction format to allow for more efficient decoding. Just going from bytecode size to word-code size without changing the instruction format is going to require 8 (or word-size) times more memory on a 64bit system. From an optimization perspective, the irregular instruction format was the biggest problem, because checking for HAS_ARG is always on the fast path and mostly unpredictable. Hence, I chose to extend the instruction format to have word-size and use the additional space to have the upper half be used for the argument and the lower half for the actual opcode. Encoding is more efficient, and *not* more complex. Using profiling to indicate what code is hot, I don't waste too much memory on encoding this regular instruction format. Regards, --stefan That seems exactly the WPython approach, albeit I used the new wordcode in place of the old bytecode. Take a look at it. ;) Regards Cesare
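As a rough illustration of the regular format Stefan describes -- argument in the upper half of the word, opcode in the lower half, so every instruction decodes the same way and the HAS_ARG check disappears from the fast path. The 32-bit width and the opcode numbers below are illustrative assumptions, not his actual layout.

```python
# One fixed-size word per instruction: RISC-like, uniform decoding.

HALF = 16
MASK = (1 << HALF) - 1

def encode(opcode, oparg):
    assert 0 <= opcode <= MASK and 0 <= oparg <= MASK
    return (oparg << HALF) | opcode

def decode(word):
    # same two operations for every instruction -- no HAS_ARG branch
    return word & MASK, word >> HALF

word = encode(100, 3)   # opcode number 100 is purely illustrative
assert decode(word) == (100, 3)
```

The cost is the memory blow-up Stefan mentions, which is why he only re-encodes code that profiling has identified as hot.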
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/30 stefan brunthaler ste...@brunthaler.net Do I sense that the bytecode format is no longer platform-independent? That will need a bit of discussion. I bet there are some things around that depend on that. Hm, I haven't really thought about that in detail and for longer, I ran it on PowerPC 970 and Intel Atom i7 without problems (the latter ones are a non-issue) and think that it can be portable. I just stuff argument and opcode into one word for regular instruction decoding like a RISC CPU, and I realize there might be little/big endian issues, but they surely can be conditionally compiled... --stefan I think that you must deal with big endianness because some RISC machines can't handle data in little-endian format at all. In WPython I wrote some macros which handle both endiannesses, but lacking big-endian machines I never had the opportunity to verify whether something was wrong. Regards Cesare
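The endianness concern above can be seen concretely: the same instruction bytes, loaded as one word, yield different values depending on byte order, which is exactly why conditionally compiled decoding macros are needed. A small sketch (opcode/argument values are illustrative):

```python
import struct

word = (3 << 16) | 100                       # argument 3, opcode 100
raw = struct.pack("<I", word)                # memory layout on a little-endian machine

assert struct.unpack("<I", raw)[0] == word   # little-endian load: fields intact
assert struct.unpack(">I", raw)[0] != word   # big-endian load scrambles the fields
```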
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/31 Terry Reedy tjre...@udel.edu I find myself more comfortable with the Cesare Di Mauro's idea of expanding to 16 bits as the code unit. His basic idea was using 2, 4, or 6 bytes instead of 1, 3, or 6. It can be expanded to opcodes longer than 6 bytes, if needed. The format is designed to be flexible enough to accommodate such changes without pain. It actually tended to save space because many ops with small ints (which are very common) contract from 3 bytes to 2 bytes or from 9(?) (two instructions) to 6. It can pack up to 4 (old) opcodes into one wordcode (superinstruction). Wordcodes are designed to favor instruction grouping. I am sorry he was not able to follow up on the initial promising results. In a few words: lack of interest. Why spend (so much) time on a project when you see that the community is oriented towards other directions (Unladen Swallow at first, PyPy in the last period, given the substantial drop of the former)? Also, Guido seems to dislike what he regards as hacks, and never showed interest. In WPython 1.1 I rolled back the hack that I introduced in PyObject types (a couple of fields) in 1.0 alpha, to make the code more polished (but with a noticeable drop in performance). But again, I saw no interest in WPython, so I decided to put a stop to it, blocking my initial idea to go for Python 3. The dis output was probably easier to read than the current output. Perhaps he made a mistake in combining the above idea with a shift from stack to hybrid stack+register design. -- Terry Jan Reedy As I already said, wordcodes are designed to favor grouping. So it was quite natural to become a hybrid VM. Anyway, both space and performance gained from this wordcode property. ;) Regards Cesare
[Python-Dev] Python 3 optimizations continued...
Hi, pretty much a year ago I wrote about the optimizations I did for my PhD thesis that target the Python 3 series interpreters. While I got some replies, the discussion never really picked up and no final explicit conclusion was reached. AFAICT, because of the following two factors, my optimizations were not that interesting for inclusion with the distribution at that time: a) Unladen Swallow was targeting Python 3, too. b) My prototype did not pass the regression tests. As of November 2010 (IIRC), Google is not supporting work on US anymore, and the project is stalled. (If I am wrong and there is still activity and any plans with the corresponding PEP, please let me know.) Which is why I recently spent some time fixing issues so that I can run the regression tests. There is still some work to be done, but by and large it should be possible to complete all regression tests in reasonable time (with the actual infrastructure in place, enabling optimizations later on is not a problem at all.) So, the two big issues aside, is there any interest in incorporating these optimizations in Python 3? Have a nice day, --stefan PS: It probably is unusual, but in a part of my home page I have created a link to indicate interest (makes both counting and voting easier, http://www.ics.uci.edu/~sbruntha/) There were also links indicating interest in funding the work; I have disabled these, so as not to upset anybody or give the impression of begging for money...
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/29 stefan brunthaler s.bruntha...@uci.edu: So, the two big issues aside, is there any interest in incorporating these optimizations in Python 3? Perhaps there would be something to say given patches/overviews/specifics. -- Regards, Benjamin
Re: [Python-Dev] Python 3 optimizations continued...
Perhaps there would be something to say given patches/overviews/specifics. Currently I don't have patches, but for an overview and specifics, I can provide the following:
* My optimizations basically rely on quickening to incorporate run-time information.
* I use two separate instruction dispatch routines, and use profiling to switch from the regular Python 3 dispatch routine to an optimized one (the implementation is actually vice versa, but that is not important now).
* The optimized dispatch routine has a changed instruction format (word-sized instead of bytecodes) that allows for regular instruction decoding (without the HAS_ARG check) and inlining of some objects in the instruction format on 64bit architectures.
* I use inline caching based on quickening (passes almost all regression tests [302 out of 307]), eliminate reference count operations using quickening (passes but has a memory leak), promote frequently accessed local variables to their dedicated instructions (passes), and cache LOAD_GLOBAL/LOAD_NAME objects in the instruction encoding when possible (I am working on this right now.)
The changes I made can be summarized as:
* I changed some header files to accommodate additional information (Python.h, ceval.h, code.h, frameobject.h, opcode.h, tupleobject.h).
* I changed mostly abstract.c to incorporate runtime type feedback.
* All other changes target mostly ceval.c; all supplementary code is in a sub-directory named opt, and all generated files are in a sub-directory within that (opt/gen).
* I have a code generator in place that takes care of generating all the functions; it uses the Mako template system for creating C code and does not necessarily need to be shipped with the interpreter (though one can play around and experiment with it.)
So, all in all, the changes to the actual implementation are not that big, and most of the code is generated (using sloccount, opt has 1990 lines of C, and opt/gen has 8649 lines of C).
That's a quick summary, if there are any further or more in-depth questions, let me know. best, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
So, the two big issues aside, is there any interest in incorporating these optimizations in Python 3? The question really is whether this is an all-or-nothing deal. If you could identify smaller parts that can be applied independently, interest would be higher. Also, I'd be curious whether your techniques help or hinder a potential integration of a JIT generator. Regards, Martin
Re: [Python-Dev] Python 3 optimizations continued...
The question really is whether this is an all-or-nothing deal. If you could identify smaller parts that can be applied independently, interest would be higher. Well, it's not an all-or-nothing deal. In my current architecture, I can selectively enable most of the optimizations as I see fit. The only pre-requisite (in my implementation) is that I have two dispatch loops with a changed instruction format. It is, however, not a technical necessity, just the way I implemented it. Basically, you can choose whatever you like best, and I could extract that part. I am just offering to add all the things that I have done :) Also, I'd be curious whether your techniques help or hinder a potential integration of a JIT generator. This is something I have previously frequently discussed with several JIT people. IMHO, having my optimizations in-place also helps a JIT compiler, since it can re-use the information I gathered to generate more aggressively optimized native machine code right away (the inline caches can be generated with the type information right away, some functions could be inlined with the guard statements subsumed, etc.) Another benefit could be that the JIT compiler can spend longer time on generating code, because the interpreter is already faster (so in some cases it would probably not make sense to include a non-optimizing fast and simple JIT compiler). There are others on the list, who probably can/want to comment on this, too. That aside, I think that while having a JIT is an important goal, I can very well imagine scenarios where the additional memory consumption (for the generated native machine code) of a JIT for each process (I assume that the native machine code caches are not shared) hinders scalability. 
I have in fact no data to back this up, but I think that would be an interesting trade-off: say, a 30% gain in performance without substantial additional memory requirements on my existing hardware, compared to higher achievable speedups that require more machines. Regards, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
On Mon, 29 Aug 2011 11:33:14 -0700 stefan brunthaler s.bruntha...@uci.edu wrote: * The optimized dispatch routine has a changed instruction format (word-sized instead of bytecodes) that allows for regular instruction decoding (without the HAS_ARG-check) and inlining of some objects in the instruction format on 64bit architectures. Having a word-sized bytecode format would probably be acceptable in itself, so if you want to submit a patch for that, go ahead. Regards Antoine.
Re: [Python-Dev] Python 3 optimizations continued...
On Monday, 29 August 2011 at 19:35:14, stefan brunthaler wrote: pretty much a year ago I wrote about the optimizations I did for my PhD thesis that target the Python 3 series interpreters Does it speed up Python? :-) Could you provide numbers (benchmarks)? Victor
Re: [Python-Dev] Python 3 optimizations continued...
Does it speed up Python? :-) Could you provide numbers (benchmarks)? Yes, it does ;) The maximum overall speedup I achieved was by a factor of 2.42 on my i7-920 for the spectralnorm benchmark of the computer language benchmarks game. Others from the same set are:
binarytrees: 1.9257 (1.9891)
fannkuch: 1.6509 (1.7264)
fasta: 1.5446 (1.7161)
mandelbrot: 2.0040 (2.1847)
nbody: 1.6165 (1.7602)
spectralnorm: 2.2538 (2.4176)
---
overall: 1.8213 (1.9382)
(The first number is the combination of all optimizations, the one in parentheses is with my last optimization [Interpreter Instruction Scheduling] enabled, too.) For a comparative real-world benchmark I tested Martin von Loewis' django port (there are not that many meaningful Python 3 real-world benchmarks) and got a speedup of 1.3 (without IIS). That compares reasonably well: US got a speedup of 1.35 on this benchmark. I just checked that pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures seem to be either not working currently or *really* fast...), but I cannot tell directly how that relates to speedups (it just says less is better and I did not quickly find an explanation). Since I did this benchmark last year, I have spent more time investigating it and found that I could do better, but I would have to guess as to how much. (An interesting aside though: on this benchmark, the executable never grew beyond 5 megs of memory usage, exactly like the vanilla Python 3 interpreter.) hth, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 7:14 AM, Antoine Pitrou solip...@pitrou.net wrote: On Mon, 29 Aug 2011 11:33:14 -0700 stefan brunthaler s.bruntha...@uci.edu wrote: * The optimized dispatch routine has a changed instruction format (word-sized instead of bytecodes) that allows for regular instruction decoding (without the HAS_ARG-check) and inlining of some objects in the instruction format on 64bit architectures. Having a word-sized bytecode format would probably be acceptable in itself, so if you want to submit a patch for that, go ahead. Although any such patch should discuss how it compares with Cesare's work on wpython. Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Armin Rigo made the judgment years ago that CPython was a poor platform for serious optimisation when he stopped working on Psyco and started PyPy instead, and I think the contrasting fates of PyPy and Unladen Swallow have borne out that opinion. Significantly increasing the complexity of CPython for speed-ups that are dwarfed by those available through PyPy seems like a poor trade-off to me. At a bare minimum, I don't think any significant changes should be made under the "it will be faster" justification until the bulk of the real-world benchmark suite used for speed.pypy.org is available for Python 3. (Wasn't there a GSoC project about that?) Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Python 3 optimizations continued...
Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Armin Rigo made the judgment years ago that CPython was a poor platform for serious optimisation when he stopped working on Psyco and started PyPy instead, and I think the contrasting fates of PyPy and Unladen Swallow have borne out that opinion. Significantly increasing the complexity of CPython for speed-ups that are dwarfed by those available through PyPy seems like a poor trade-off to me. I agree with the trade-off, but the nice thing is that CPython's interpreter remains simple and portable using my optimizations. All of these optimizations are purely interpretative and the complexity of CPython is not affected much. (For example, I have an inline-cached version of BINARY_ADD that is called INCA_FLOAT_ADD [INCA being my abbreviation for INline CAching]; you don't actually have to look at its source code, since it is generated by my code generator, but you can immediately tell what's going on by looking at instruction traces.) So, the interpreter remains fully portable and any compatibility issues with C modules should not occur either. At a bare minimum, I don't think any significant changes should be made under the "it will be faster" justification until the bulk of the real-world benchmark suite used for speed.pypy.org is available for Python 3. (Wasn't there a GSoC project about that?) Having more tests would surely be helpful; as already said, the most real-world stuff I can do is Martin's django patch (some of the other benchmarks though are from the shootout and I can [and did] run them, too {binarytrees, fannkuch, fasta, mandelbrot, nbody and spectralnorm}. I also have the AI benchmark from Unladen Swallow but no current figures.) Best, --stefan
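To illustrate the quickening idea behind INCA_FLOAT_ADD, here is a small Python analogue: a generic instruction rewrites its call site to a float-specialized derivative, which keeps a guard that de-optimizes on an inline-cache miss. All names and the dict-based call site are illustrative; in Stefan's interpreter this rewriting happens on the word-code instruction stream itself.

```python
def generic_add(a, b, site):
    # first execution observes the operand types and quickens the call site
    if type(a) is float and type(b) is float:
        site["impl"] = inca_float_add    # quicken: BINARY_ADD -> INCA_FLOAT_ADD
    return a + b

def inca_float_add(a, b, site):
    # inline-cache miss guard: fall back if the type assumption breaks
    if type(a) is not float or type(b) is not float:
        site["impl"] = generic_add       # de-optimize back to the generic form
        return generic_add(a, b, site)
    return a + b                         # fast path: operand types already known

site = {"impl": generic_add}
assert site["impl"](1.5, 2.5, site) == 4.0
assert site["impl"] is inca_float_add    # the call site was quickened
assert site["impl"](1, 2, site) == 3     # guard catches the int/int miss
assert site["impl"] is generic_add       # and the site reverts
```

The guards are the part Stefan mentions elsewhere in the thread: if they are non-exhaustive, a cache miss would go unnoticed and crash, which is why the regression tests matter so much for this technique.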
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, 30 Aug 2011 10:00:28 +1000 Nick Coghlan ncogh...@gmail.com wrote:

Having a word-sized bytecode format would probably be acceptable in itself, so if you want to submit a patch for that, go ahead. Any such patch should discuss how it compares with Cesare's work on wpython, though.

> Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space.

Changing the bytecode width wouldn't make the interpreter more complex.

> Armin Rigo made the judgment years ago that CPython was a poor platform for serious optimisation when he stopped working on Psyco and started PyPy instead, and I think the contrasting fates of PyPy and Unladen Swallow have borne out that opinion.

Well, PyPy didn't show any significant achievements before they spent *much* more time on it than the Unladen Swallow guys did. Whether or not a good JIT is possible on top of CPython might remain a largely unanswered question.

> Significantly increasing the complexity of CPython for speed-ups that are dwarfed by those available through PyPy seems like a poor trade-off to me.

Some years ago we were waiting for Unladen Swallow to improve itself and be ported to Python 3. Now it seems we are waiting for PyPy to be ported to Python 3. I'm not sure how "let's just wait" is a good trade-off if someone proposes interesting patches (which, of course, remains to be seen).

> At a bare minimum, I don't think any significant changes should be made under the "it will be faster" justification until the bulk of the real-world benchmark suite used for speed.pypy.org is available for Python 3. (Wasn't there a GSoC project about that?)

I'm not sure what "the bulk" is, but have you already taken a look at http://hg.python.org/benchmarks/ ?

Regards
Antoine.
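[Editor's note: as a rough illustration of what a word-sized bytecode format buys, every instruction becomes a fixed-size opcode/oparg pair, so fetch and decode in the dispatch loop are a constant-stride step rather than a variable-length decode. The 2-byte layout below is hypothetical, not wpython's actual encoding.]

```python
# Hypothetical word-sized instruction stream: each instruction is one
# opcode byte plus one oparg byte (2 bytes total). A sketch of the
# general idea only, not wpython's real format.

def decode(wordcode):
    """Return (opcode, oparg) pairs from a bytes object of 2-byte units."""
    return [(wordcode[i], wordcode[i + 1]) for i in range(0, len(wordcode), 2)]

# Two made-up instructions: opcode 100 with arg 0, opcode 23 with arg 1.
program = bytes([100, 0, 23, 1])
assert decode(program) == [(100, 0), (23, 1)]
```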
Re: [Python-Dev] Python 3 optimizations continued...
On Mon, Aug 29, 2011 at 2:05 PM, stefan brunthaler s.bruntha...@uci.edu wrote:

>> The question really is whether this is an all-or-nothing deal. If you could identify smaller parts that can be applied independently, interest would be higher.
>
> Well, it's not an all-or-nothing deal. In my current architecture, I can selectively enable most of the optimizations as I see fit. The only prerequisite (in my implementation) is that I have two dispatch loops with a changed instruction format. That is, however, not a technical necessity, just the way I implemented it. Basically, you can choose whatever you like best, and I could extract that part. I am just offering to add all the things that I have done :)

+1 from me on going forward with your performance improvements. The more you can break them down into individual smaller patch sets the better, as they can be reviewed and applied as needed. A prerequisites patch, a patch for the wide opcodes, etc. For benchmarks, given this is Python 3, just get as many useful ones running as you can.

Some in this thread seemed to give the impression that CPython performance is not something to care about. I disagree. I see CPython being the main implementation of Python used in most places for a long time. Improving its performance merely raises the bar to be met by other implementations if they want to compete. That is a good thing!

-gps

> Also, I'd be curious whether your techniques help or hinder a potential integration of a JIT generator.

This is something I have previously discussed with several JIT people. IMHO, having my optimizations in place also helps a JIT compiler, since it can reuse the information I gathered to generate more aggressively optimized native machine code right away (the inline caches can be generated with the type information right away, some functions could be inlined with the guard statements subsumed, etc.)
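[Editor's note: a minimal sketch of the reuse Stefan suggests — the types an inline cache observes amount to a per-site feedback table that a JIT could consult to decide where a single guarded fast path suffices. The data structure below is invented for illustration; it is not Stefan's implementation.]

```python
from collections import Counter

# Invented structure: per-instruction-site record of observed operand
# types, as an inline cache could accumulate it as a side effect.
type_feedback = {}  # instruction offset -> Counter of (type, type) pairs

def record(offset, a, b):
    """What an inline-cache hit/miss path could log while executing."""
    site = type_feedback.setdefault(offset, Counter())
    site[(type(a).__name__, type(b).__name__)] += 1

def is_monomorphic(offset):
    """A JIT could emit one specialized fast path (guard included) for
    sites where only a single type pair has ever been observed."""
    seen = type_feedback.get(offset)
    return seen is not None and len(seen) == 1

record(0, 1.0, 2.0)
record(0, 3.0, 4.0)
assert is_monomorphic(0)      # only (float, float) seen: specialize freely
record(0, 1, 2)
assert not is_monomorphic(0)  # polymorphic site: needs a fallback path
```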
Another benefit could be that the JIT compiler can spend more time on generating code, because the interpreter is already faster (so in some cases it would probably not make sense to include a non-optimizing, fast-and-simple JIT compiler). There are others on the list who can probably comment on this, too.

That aside, I think that while having a JIT is an important goal, I can very well imagine scenarios where the additional memory consumption (for the generated native machine code) of a JIT for each process (I assume the native machine code caches are not shared) hinders scalability. I have no data to back this up, but I think that would be an interesting trade-off: say, a 30% performance gain without substantial additional memory requirements on existing hardware, versus higher achievable speedups that require more machines.

Regards,
--stefan
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 12:38 PM, Gregory P. Smith g...@krypto.org wrote:

> Some in this thread seemed to give the impression that CPython performance is not something to care about. I disagree. I see CPython being the main implementation of Python used in most places for a long time. Improving its performance merely raises the bar to be met by other implementations if they want to compete. That is a good thing!

Not the impression I intended to give. I merely want to highlight that we need to be careful that incremental increases in complexity are justified with real, measured performance improvements.

PyPy has set the bar on how to do that - people that seriously want to make CPython faster need to focus on getting speed.python.org sorted *first* (so we know where we're starting) and *then* work on trying to improve CPython's numbers relative to that starting point. The PSF has the hardware to run the site but, unless more has been going on in the background than I am aware of, is still lacking trusted volunteers to do the following:

1. Getting codespeed up and running on the PSF hardware
2. Hooking it in to the CPython source control infrastructure
3. Getting a reasonable set of benchmarks running on 3.x (likely starting with the already ported set in Mercurial, but eventually we want the full suite that PyPy uses)
4. Once PyPy, Jython and IronPython offer 3.x compatible versions, starting to include them as well (alternatively, offer 2.x performance comparisons, although that's less interesting from a CPython point of view since it can't be used to guide future CPython optimisation efforts)

Anecdotal, non-reproducible performance figures are *not* the way to go about serious optimisation efforts. Using a dedicated machine is vulnerable to architecture-specific idiosyncrasies, but ad hoc testing on other systems can still be used as a sanity check.

Regards, Nick.
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia