Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
2016-04-13 23:23 GMT+02:00 Victor Stinner: > Hopefully, I don't expect 32-bit parameters in the wild, only 24-bit > parameter for function with annotation. > I never found 32-bit parameters, and not even 24-bit ones. I think that their usage is as rare as all planets alignment. ;-) That's why with in WPython I supported only 8, 16, and 32-bit parameters (which are 6 bytes long). Regards, Cesare ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
2016-04-13 18:24 GMT+02:00 Victor Stinner: > Demur Rumed proposes a different change to use a regular bytecode > using 16-bit units: an instruction has always one 8-bit argument, it's > zero if the instruction doesn't have an argument: > >http://bugs.python.org/issue26647 > > According to benchmarks, it looks faster: > > http://bugs.python.org/issue26647#msg263339 > > IMHO it's a nice enhancement: it makes the code simpler. The most > interesting change is made in Python/ceval.c: > > -if (HAS_ARG(opcode)) > -oparg = NEXTARG(); > +oparg = NEXTARG(); > > This code is the very hot loop evaluating Python bytecode. I expect > that removing a conditional branch here can reduce the CPU branch > misprediction. > Correct. The old bytecode format wasn't so much predictable for the CPU. > > Right now, ceval.c still fetchs opcode and then oparg with two 8-bit > instructions. Later, we can discuss if it would be possible to ensure > that the bytecode is always aligned to 16-bit in memory to fetch the > two bytes using a uint16_t* pointer. > > Maybe we can overallocate 1 byte in codeobject.c and align manually > the memory block if needed. Or ceval.c should maybe copy the code if > it's not aligned? > > Raymond Hettinger proposes something like that, but it looks like > there are concerns about non-aligned memory accesses: > >http://bugs.python.org/issue25823 > > The cost of non-aligned memory accesses depends on the CPU > architecture, but it can raise a SIGBUS on some arch (MIPS and > SPARC?). > > Victor > It should not be a problem, since every PyObject is allocated with PyAlloc (however I don't remember if it's the correct name) which AFAIK guarantees a base 8 bytes alignment. So, it's safe to use an unsigned int for keeping/referencing a word at the time. The only problem with such approach is related to the processor endianess, but it can be solved with proper macros (like I did with WPython). Regards, Cesare ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
Oh nice. Did you see my recent "bytecode" project? http://bytecode.readthedocs.io/ Victor Le 5 mai 2016 8:30 PM,a écrit : > Here is something I wrote because I was also unsatisfied with byteplay's > API: https://github.com/zachariahreed/byteasm. Maybe it's useful in a > discussion of "minimum viable" api for bytecode manipulation. > > > > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
Here is something I wrote because I was also unsatisfied with byteplay's API: https://github.com/zachariahreed/byteasm. Maybe it's useful in a discussion of "minimum viable" api for bytecode manipulation. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
> On Apr 24, 2016, at 2:31 PM, Victor Stinnerwrote: > > 2016-04-24 23:16 GMT+02:00 Raymond Hettinger : >>> On Apr 24, 2016, at 1:16 PM, Victor Stinner >>> wrote: >>> I proposed to not try to optimize ceval.c to fetch (oparg, opval) in a >>> single 16-bit operation. It should be easy to implement it later, but >>> I prefer to focus on changing the format of the bytecode. >> >> Improving instruction decoding was the whole point and it was what >> kicked-off the work on the patch. It is also where most of the performance >> improvement comes from and isn't the difficult part of the patch. The >> persnickety parts of the patch lay elsewhere, so there is really nothing to >> be gained gutting out our actual objective. >> >> The OPs original patch had already gotten this part done and it ran fine for >> me. > > Oh wait, my phrasing is unclear. I do want optimize the (opcode, > oparg) fetch, I just suggested to split the patch in two parts, and > first review carefully the first part. Unless it is presenting a tough review challenge, we should do whatever we can to make it easier on the OP who seems to be working with very limited computational resources (I had to run the benchmarks for him because his setup lacked the requisite resources). He's already put a lot of work into the patch which is pretty good shape when it arrived. The opcode/oparg fetch logic is mostly already isolated to the part of the patch that touches ceval.c. I found that part to be relatively clean and clear. The part that took the most time to go through was for peephole.c. How about we let Yury and Serhiy take a pass at it as is. And, if they would benefit from splitting the patch into parts, then perhaps one of us with better tooling can pitch in to the help the OP. Raymond ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
2016-04-24 23:16 GMT+02:00 Raymond Hettinger: >> On Apr 24, 2016, at 1:16 PM, Victor Stinner wrote: >> I proposed to not try to optimize ceval.c to fetch (oparg, opval) in a >> single 16-bit operation. It should be easy to implement it later, but >> I prefer to focus on changing the format of the bytecode. > > Improving instruction decoding was the whole point and it was what kicked-off > the work on the patch. It is also where most of the performance improvement > comes from and isn't the difficult part of the patch. The persnickety parts > of the patch lay elsewhere, so there is really nothing to be gained gutting > out our actual objective. > > The OPs original patch had already gotten this part done and it ran fine for > me. Oh wait, my phrasing is unclear. I do want optimize the (opcode, oparg) fetch, I just suggested to split the patch in two parts, and first review carefully the first part. Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
> On Apr 24, 2016, at 1:16 PM, Victor Stinnerwrote: > > I proposed to not try to optimize ceval.c to fetch (oparg, opval) in a > single 16-bit operation. It should be easy to implement it later, but > I prefer to focus on changing the format of the bytecode. Improving instruction decoding was the whole point and it was what kicked-off the work on the patch. It is also where most of the performance improvement comes from and isn't the difficult part of the patch. The persnickety parts of the patch lay elsewhere, so there is really nothing to be gained gutting out our actual objective. The OPs original patch had already gotten this part done and it ran fine for me. Raymond ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
Hi Raymond, 2016-04-24 21:45 GMT+02:00 Raymond Hettinger: > I think the word code patch should go in sooner rather than later. Several > of us have been through the patch and it is in pretty good shape (some parts > still need work though). The earlier this goes in, the more time we'll have > to shake out any unexpected secondary effects. Yury Selivanov and Serhiy Storchaka told me that they will review shortly the patch. I give them one more week and then I will push the patch. I agree that the patch is in a good shape. I reviewed first versions of the change. I pushed some minor and obvious changes. I also asked to revert unrelated changes. I proposed to not try to optimize ceval.c to fetch (oparg, opval) in a single 16-bit operation. It should be easy to implement it later, but I prefer to focus on changing the format of the bytecode. Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
> On Apr 23, 2016, at 8:59 AM, Serhiy Storchakawrote: > > I collected statistics for use opcodes with different arguments during > running CPython tests. Estimated size with using wordcode is 1.33 times less > than with using current bytecode. > > [1] http://comments.gmane.org/gmane.comp.python.ideas/38293 I think the word code patch should go in sooner rather than later. Several of us have been through the patch and it is in pretty good shape (some parts still need work though). The earlier this goes in, the more time we'll have to shake out any unexpected secondary effects. perfect-is-the-enemy-of-good-ly yours, Raymond P.S. The patch is smaller, more tractable, and in better shape than the C version of OrderedDict was when it went in. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
On 13.04.16 19:33, Guido van Rossum wrote: Nice work. I think that for CPython, speed is much more important than memory use for the code. Disk space is practically free for anything smaller than a video. :-) I collected statistics for use opcodes with different arguments during running CPython tests. Estimated size with using wordcode is 1.33 times less than with using current bytecode. [1] http://comments.gmane.org/gmane.comp.python.ideas/38293 ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
Just on the off chance that it’s related, could it have something to do with the bug in findlabels? http://bugs.python.org/issue26448 (I have high confidence that my patch fixes the problem, just haven’t gotten around to completing the tests.) From: Demur Rumed [mailto:gunkm...@gmail.com] Sent: Saturday, April 16, 2016 17:05 To: python-dev@python.org Subject: Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units The outstanding bug with this patch right now is a regression in line numbers causing the test for http://bugs.python.org/issue9936 to fail. I've tried to debug it without success ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
The outstanding bug with this patch right now is a regression in line numbers causing the test for http://bugs.python.org/issue9936 to fail. I've tried to debug it without success ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
Le jeudi 14 avril 2016, Nick Coghlana écrit : > > > IHMO it's not a big deal to update these projects for the future > > Python 3.6. I can even help them to support the new bytecode format. > > We've also had previous discussions on adding a "minimum viable > bytecode editing" API to the standard library, and updating these > third party modules to support wordcode instead of bytecode could > provide a good use-case-driven opportunity for defining that (i.e. it > wouldn't be about providing an end user facing API directly, but > rather about letting CPython take care of the bookkeeping details for > things like lnotab and sorting out jump targets). Yeah, I know well this discussion since it started with my PEP 511. I wrote the bytecode as a tool for the discussion, to try to understand better the use case. The main task was to design the API. I first looked at byteplay and codetranformer projects, but I found some issues in their design. Their API has some design issues. IMHO their API is not the best to modify bytecode. My goal is to support Bytecode.from_code(code).to_code()==code: store enough information to be able to emit again exactly the same bytecode (line numbers, exact argument value, etc.). I started with a long email, but I decided to document differences in bytecode documentation: https://bytecode.readthedocs.org/en/latest/byteplay_codetransformer.html Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
On 14 April 2016 at 08:26, Victor Stinnerwrote: > 2016-04-14 0:11 GMT+02:00 Ryan Gonzalez : >> So code that depends on iterating through bytecode via HAS_ARG is going to >> break... > > Sure. This change is backward incompatible for applications parsing > bytecode in C or Python. That's why the patch also has to update the > dis module. > > I don't see how you plan to keep the backwad compatibility, since the > argument size changed from 2 bytes to 1 byte. You must update your > code (written in C or Python or whatever). > > Hopefully, the dis was enhanced in Python 3.4: get_instructions() now > gives nice Instructon objects rather than only pure text output. > > FYI I wrote my own library to decode and decode bytecode. It provides > abstract bytecode objects to easily modify bytecode: > https://bytecode.readthedocs.org/ > > I suggest to use such library (or simply the dis module for simple > needs) if you have to handle bytecode, rather than writing your own > code. > > I know a few other projects which handle directly bytecode: > > * https://pypi.python.org/pypi/codetransformer > * https://github.com/serprex/byteplay > * https://pypi.python.org/pypi/coverage > > IHMO it's not a big deal to update these projects for the future > Python 3.6. I can even help them to support the new bytecode format. +1 We've also had previous discussions on adding a "minimum viable bytecode editing" API to the standard library, and updating these third party modules to support wordcode instead of bytecode could provide a good use-case-driven opportunity for defining that (i.e. it wouldn't be about providing an end user facing API directly, but rather about letting CPython take care of the bookkeeping details for things like lnotab and sorting out jump targets). Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
On 2016-04-13 12:24 PM, Victor Stinner wrote: Can someone please review the change? +1 for the change. I can take a look at the patch in a few days. Yury ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
2016-04-14 0:11 GMT+02:00 Ryan Gonzalez: > So code that depends on iterating through bytecode via HAS_ARG is going to > break... Sure. This change is backward incompatible for applications parsing bytecode in C or Python. That's why the patch also has to update the dis module. I don't see how you plan to keep the backwad compatibility, since the argument size changed from 2 bytes to 1 byte. You must update your code (written in C or Python or whatever). Hopefully, the dis was enhanced in Python 3.4: get_instructions() now gives nice Instructon objects rather than only pure text output. FYI I wrote my own library to decode and decode bytecode. It provides abstract bytecode objects to easily modify bytecode: https://bytecode.readthedocs.org/ I suggest to use such library (or simply the dis module for simple needs) if you have to handle bytecode, rather than writing your own code. I know a few other projects which handle directly bytecode: * https://pypi.python.org/pypi/codetransformer * https://github.com/serprex/byteplay * https://pypi.python.org/pypi/coverage IHMO it's not a big deal to update these projects for the future Python 3.6. I can even help them to support the new bytecode format. Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
So code that depends on iterating through bytecode via HAS_ARG is going to break... Darn it. :/ -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/ On Apr 13, 2016 4:44 PM, "Victor Stinner"wrote: > Le mercredi 13 avril 2016, Ryan Gonzalez a écrit : > >> What is the value of HAS_ARG going to be now? >> > > I asked Demur to keep HAS_ARG(). Not really for backward compatibility, > but for the dis module: to keep a nice assembler. There are also debug > traces in ceval.c which use it. > > For ceval.c, we might use HAS_ARG() to micro-optimize oparg=0 (hardcode 0 > rather than reading the bytecode) for operators with no argument. Or maybe > it's completly useless :-) > > Victor > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
Le mercredi 13 avril 2016, Ryan Gonzaleza écrit : > What is the value of HAS_ARG going to be now? > I asked Demur to keep HAS_ARG(). Not really for backward compatibility, but for the dis module: to keep a nice assembler. There are also debug traces in ceval.c which use it. For ceval.c, we might use HAS_ARG() to micro-optimize oparg=0 (hardcode 0 rather than reading the bytecode) for operators with no argument. Or maybe it's completly useless :-) Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
The EXTENDED_ARG is included in the multibyte ops, I treat it just like any other operator. Here's a snippet of my hacked-dis.dis output, which made it clear to me that I could just count them as an "operator with word operand." Line 3000: x = x if x or not x and x is None else x 0001dc83 7c 00 00 LOAD_FAST x 0001dc86 91 01 00 EXTENDED_ARG1 0001dc89 70 9f dc JUMP_IF_TRUE_OR_POP L1dc9f 0001dc8c 7c 00 00 LOAD_FAST x 0001dc8f 0c UNARY_NOT 0001dc90 91 01 00 EXTENDED_ARG1 0001dc93 6f 9f dc JUMP_IF_FALSE_OR_POPL1dc9f 0001dc96 7c 00 00 LOAD_FAST x 0001dc99 74 01 00 LOAD_GLOBAL None 0001dc9c 6b 08 00 COMPARE_OP 'is' L1dc9f: 0001dc9f 91 01 00 EXTENDED_ARG1 0001dca2 72 ab dc POP_JUMP_IF_FALSE L1dcab 0001dca5 7c 00 00 LOAD_FAST x 0001dca8 6e 03 00 JUMP_FORWARDL1dcae (+3) L1dcab: 0001dcab 7c 00 00 LOAD_FAST x L1dcae: 0001dcae 7d 00 00 STORE_FAST x On Wed, Apr 13, 2016 at 2:23 PM, Victor Stinnerwrote: > 2016-04-13 23:02 GMT+02:00 Eric Fahlgren : > > Percentage of 1-byte args= 96.80% > > Yeah, I expected such high ratio. Good news that you confirm it. > > > > Non-argument ops =53,719 > > One-byte args= 368,787 > > Multi-byte args =12,191 > > Again, only a very few arguments take multiple bytes. Good, the > bytecode will be smaller. > > IMHO it's more a nice side effect than a real goal. The runtime > performance matters more than the size of the bytecode, it's not like > a bytecode take 4 MB. It's probably closer to 1 KB and so can probably > benefit of the fatest CPU caches. > > > > Just for the record, here's my arithmetic: > > byteCodeSize = 1*nonArgumentOps + 3*oneByteArgs + 3*multiByteArgs > > wordCodeSize = 2*nonArgumentOps + 2*oneByteArgs + 4*multiByteArgs > > If multiByteArgs means any size > 1 byte, the wordCodeSize formula is > wrong: > > - no parameter: 2 bytes > - 8-bit parameter: 2 bytes > - 16-bit parameter: 4 bytes > - 24-bit parameter: 6 bytes > - 32-bit parameter: 8 bytes > > But you wrote that you didn't see EXTEND_ARG, so I guess that > multibyte means 16-bit in your case, and so your formula is correct. > > Hopefully, I don't expect 32-bit parameters in the wild, only 24-bit > parameter for function with annotation. > > > > (It is interesting to note that I have never encountered an EXTENDED_ARG > operator in the wild, only in my own synthetic examples.) > > As I wrote, EXTENDED_ARG can be seen when MAKE_FUNCTION is used with > annotations. > > Victor > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
What is the value of HAS_ARG going to be now? -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/ On Apr 13, 2016 11:26 AM, "Victor Stinner"wrote: > Hi, > > In the middle of recent discussions about Python performance, it was > discussed to change the Python bytecode. Serhiy proposed to reuse > MicroPython short bytecode to reduce the disk space and reduce the > memory footprint. > > Demur Rumed proposes a different change to use a regular bytecode > using 16-bit units: an instruction has always one 8-bit argument, it's > zero if the instruction doesn't have an argument: > >http://bugs.python.org/issue26647 > > According to benchmarks, it looks faster: > > http://bugs.python.org/issue26647#msg263339 > > IMHO it's a nice enhancement: it makes the code simpler. The most > interesting change is made in Python/ceval.c: > > -if (HAS_ARG(opcode)) > -oparg = NEXTARG(); > +oparg = NEXTARG(); > > This code is the very hot loop evaluating Python bytecode. I expect > that removing a conditional branch here can reduce the CPU branch > misprediction. > > I reviewed first versions of the change, and IMHO it's almost ready to > be merged. But I would prefer to have a review from a least a second > core reviewer. > > Can someone please review the change? > > -- > > The side effect of wordcode is that arguments in 0..255 now uses 2 > bytes per instruction instead of 3, so it also reduce the size of > bytecode for the most common case. > > Larger argument, 16-bit argument (0..65,535), now uses 4 bytes instead > of 3. Arguments are supported up to 32-bit: 24-bit uses 3 units (6 > bytes), 32-bit uses 4 units (8 bytes). MAKE_FUNCTION uses 16-bit > argument for keyword defaults and 24-bit argument for annotations. > Other common instruction known to use large argument are jumps for > bytecode longer than 256 bytes. > > -- > > Right now, ceval.c still fetchs opcode and then oparg with two 8-bit > instructions. Later, we can discuss if it would be possible to ensure > that the bytecode is always aligned to 16-bit in memory to fetch the > two bytes using a uint16_t* pointer. > > Maybe we can overallocate 1 byte in codeobject.c and align manually > the memory block if needed. Or ceval.c should maybe copy the code if > it's not aligned? > > Raymond Hettinger proposes something like that, but it looks like > there are concerns about non-aligned memory accesses: > >http://bugs.python.org/issue25823 > > The cost of non-aligned memory accesses depends on the CPU > architecture, but it can raise a SIGBUS on some arch (MIPS and > SPARC?). > > Victor > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
2016-04-13 23:02 GMT+02:00 Eric Fahlgren: > Percentage of 1-byte args= 96.80% Yeah, I expected such high ratio. Good news that you confirm it. > Non-argument ops =53,719 > One-byte args= 368,787 > Multi-byte args =12,191 Again, only a very few arguments take multiple bytes. Good, the bytecode will be smaller. IMHO it's more a nice side effect than a real goal. The runtime performance matters more than the size of the bytecode, it's not like a bytecode take 4 MB. It's probably closer to 1 KB and so can probably benefit of the fatest CPU caches. > Just for the record, here's my arithmetic: > byteCodeSize = 1*nonArgumentOps + 3*oneByteArgs + 3*multiByteArgs > wordCodeSize = 2*nonArgumentOps + 2*oneByteArgs + 4*multiByteArgs If multiByteArgs means any size > 1 byte, the wordCodeSize formula is wrong: - no parameter: 2 bytes - 8-bit parameter: 2 bytes - 16-bit parameter: 4 bytes - 24-bit parameter: 6 bytes - 32-bit parameter: 8 bytes But you wrote that you didn't see EXTEND_ARG, so I guess that multibyte means 16-bit in your case, and so your formula is correct. Hopefully, I don't expect 32-bit parameters in the wild, only 24-bit parameter for function with annotation. > (It is interesting to note that I have never encountered an EXTENDED_ARG > operator in the wild, only in my own synthetic examples.) As I wrote, EXTENDED_ARG can be seen when MAKE_FUNCTION is used with annotations. Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
On Wednesday, April 13, 2016 09:25, Victor Stinner wrote: > The side effect of wordcode is that arguments in 0..255 now uses 2 bytes per > instruction instead of 3, so it also reduce the size of bytecode for the most > common case. > > Larger argument, 16-bit argument (0..65,535), now uses 4 bytes instead of 3. > Arguments are supported up to 32-bit: 24-bit uses 3 units (6 bytes), 32-bit > uses 4 > units (8 bytes). MAKE_FUNCTION uses 16-bit argument for keyword defaults and > 24-bit argument for annotations. > Other common instruction known to use large argument are jumps for bytecode > longer than 256 bytes. A couple months ago during an earlier discussion of wordcode, I got curious enough to instrument dis.dis so that I could calculate the actual size changes expected in practice. I ran it on a large chunk of our product code, here are the results (looks best with a fixed font). I suspect the fairly significant reduction in footprint will also give better cache hit characteristics, so we might see some "magic" speed ups from that, too. Code-generating source lines =70,792 Total bytes = 1,196,653 Argument-bearing operators = 380,978 Operands over 1 byte long=12,191 Extended arguments = 0 Percentage of 1-byte args= 96.80% Total operators = 434,697 Non-argument ops =53,719 One-byte args= 368,787 Multi-byte args =12,191 Byte code size = 1,196,653 Word code size = 893,776 Word:byte size = 74.69% Just for the record, here's my arithmetic: byteCodeSize = 1*nonArgumentOps + 3*oneByteArgs + 3*multiByteArgs wordCodeSize = 2*nonArgumentOps + 2*oneByteArgs + 4*multiByteArgs (It is interesting to note that I have never encountered an EXTENDED_ARG operator in the wild, only in my own synthetic examples.) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
Nice work. I think that for CPython, speed is much more important than memory use for the code. Disk space is practically free for anything smaller than a video. :-) On Wed, Apr 13, 2016 at 9:24 AM, Victor Stinnerwrote: > Hi, > > In the middle of recent discussions about Python performance, it was > discussed to change the Python bytecode. Serhiy proposed to reuse > MicroPython short bytecode to reduce the disk space and reduce the > memory footprint. > > Demur Rumed proposes a different change to use a regular bytecode > using 16-bit units: an instruction has always one 8-bit argument, it's > zero if the instruction doesn't have an argument: > >http://bugs.python.org/issue26647 > > According to benchmarks, it looks faster: > > http://bugs.python.org/issue26647#msg263339 > > IMHO it's a nice enhancement: it makes the code simpler. The most > interesting change is made in Python/ceval.c: > > -if (HAS_ARG(opcode)) > -oparg = NEXTARG(); > +oparg = NEXTARG(); > > This code is the very hot loop evaluating Python bytecode. I expect > that removing a conditional branch here can reduce the CPU branch > misprediction. > > I reviewed first versions of the change, and IMHO it's almost ready to > be merged. But I would prefer to have a review from a least a second > core reviewer. > > Can someone please review the change? > > -- > > The side effect of wordcode is that arguments in 0..255 now uses 2 > bytes per instruction instead of 3, so it also reduce the size of > bytecode for the most common case. > > Larger argument, 16-bit argument (0..65,535), now uses 4 bytes instead > of 3. Arguments are supported up to 32-bit: 24-bit uses 3 units (6 > bytes), 32-bit uses 4 units (8 bytes). MAKE_FUNCTION uses 16-bit > argument for keyword defaults and 24-bit argument for annotations. > Other common instruction known to use large argument are jumps for > bytecode longer than 256 bytes. > > -- > > Right now, ceval.c still fetchs opcode and then oparg with two 8-bit > instructions. Later, we can discuss if it would be possible to ensure > that the bytecode is always aligned to 16-bit in memory to fetch the > two bytes using a uint16_t* pointer. > > Maybe we can overallocate 1 byte in codeobject.c and align manually > the memory block if needed. Or ceval.c should maybe copy the code if > it's not aligned? > > Raymond Hettinger proposes something like that, but it looks like > there are concerns about non-aligned memory accesses: > >http://bugs.python.org/issue25823 > > The cost of non-aligned memory accesses depends on the CPU > architecture, but it can raise a SIGBUS on some arch (MIPS and > SPARC?). > > Victor > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Wordcode: new regular bytecode using 16-bit units
Hi, In the middle of recent discussions about Python performance, it was discussed to change the Python bytecode. Serhiy proposed to reuse MicroPython short bytecode to reduce the disk space and reduce the memory footprint. Demur Rumed proposes a different change to use a regular bytecode using 16-bit units: an instruction has always one 8-bit argument, it's zero if the instruction doesn't have an argument: http://bugs.python.org/issue26647 According to benchmarks, it looks faster: http://bugs.python.org/issue26647#msg263339 IMHO it's a nice enhancement: it makes the code simpler. The most interesting change is made in Python/ceval.c: -if (HAS_ARG(opcode)) -oparg = NEXTARG(); +oparg = NEXTARG(); This code is the very hot loop evaluating Python bytecode. I expect that removing a conditional branch here can reduce the CPU branch misprediction. I reviewed first versions of the change, and IMHO it's almost ready to be merged. But I would prefer to have a review from a least a second core reviewer. Can someone please review the change? -- The side effect of wordcode is that arguments in 0..255 now uses 2 bytes per instruction instead of 3, so it also reduce the size of bytecode for the most common case. Larger argument, 16-bit argument (0..65,535), now uses 4 bytes instead of 3. Arguments are supported up to 32-bit: 24-bit uses 3 units (6 bytes), 32-bit uses 4 units (8 bytes). MAKE_FUNCTION uses 16-bit argument for keyword defaults and 24-bit argument for annotations. Other common instruction known to use large argument are jumps for bytecode longer than 256 bytes. -- Right now, ceval.c still fetchs opcode and then oparg with two 8-bit instructions. Later, we can discuss if it would be possible to ensure that the bytecode is always aligned to 16-bit in memory to fetch the two bytes using a uint16_t* pointer. Maybe we can overallocate 1 byte in codeobject.c and align manually the memory block if needed. Or ceval.c should maybe copy the code if it's not aligned? Raymond Hettinger proposes something like that, but it looks like there are concerns about non-aligned memory accesses: http://bugs.python.org/issue25823 The cost of non-aligned memory accesses depends on the CPU architecture, but it can raise a SIGBUS on some arch (MIPS and SPARC?). Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com