Nice work. I think that for CPython, speed is much more important than memory use for the code. Disk space is practically free for anything smaller than a video. :-)

On Wed, Apr 13, 2016 at 9:24 AM, Victor Stinner <victor.stin...@gmail.com> wrote:
> Hi,
>
> In the middle of recent discussions about Python performance, it was
> discussed to change the Python bytecode. Serhiy proposed to reuse
> MicroPython short bytecode to reduce the disk space and reduce the
> memory footprint.
>
> Demur Rumed proposes a different change to use a regular bytecode
> using 16-bit units: an instruction always has one 8-bit argument, it's
> zero if the instruction doesn't have an argument:
>
> http://bugs.python.org/issue26647
>
> According to benchmarks, it looks faster:
>
> http://bugs.python.org/issue26647#msg263339
>
> IMHO it's a nice enhancement: it makes the code simpler. The most
> interesting change is made in Python/ceval.c:
>
> -        if (HAS_ARG(opcode))
> -            oparg = NEXTARG();
> +        oparg = NEXTARG();
>
> This code is the very hot loop evaluating Python bytecode. I expect
> that removing a conditional branch here can reduce the CPU branch
> misprediction.
>
> I reviewed the first versions of the change, and IMHO it's almost ready
> to be merged. But I would prefer to have a review from at least a second
> core reviewer.
>
> Can someone please review the change?
>
> --
>
> The side effect of wordcode is that arguments in 0..255 now use 2
> bytes per instruction instead of 3, so it also reduces the size of
> bytecode for the most common case.
>
> Larger arguments, 16-bit arguments (0..65,535), now use 4 bytes instead
> of 3. Arguments are supported up to 32-bit: 24-bit uses 3 units (6
> bytes), 32-bit uses 4 units (8 bytes). MAKE_FUNCTION uses a 16-bit
> argument for keyword defaults and a 24-bit argument for annotations.
> Other common instructions known to use large arguments are jumps for
> bytecode longer than 256 bytes.
>
> --
>
> Right now, ceval.c still fetches opcode and then oparg with two 8-bit
> instructions.
> Later, we can discuss if it would be possible to ensure
> that the bytecode is always aligned to 16-bit in memory to fetch the
> two bytes using a uint16_t* pointer.
>
> Maybe we can overallocate 1 byte in codeobject.c and align manually
> the memory block if needed. Or ceval.c should maybe copy the code if
> it's not aligned?
>
> Raymond Hettinger proposes something like that, but it looks like
> there are concerns about non-aligned memory accesses:
>
> http://bugs.python.org/issue25823
>
> The cost of non-aligned memory accesses depends on the CPU
> architecture, but it can raise a SIGBUS on some arch (MIPS and
> SPARC?).
>
> Victor
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org

-- 
--Guido van Rossum (python.org/~guido)
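
For reference, the size arithmetic in Victor's message can be sketched roughly as below. This is only an illustration, assuming each instruction is one 16-bit unit carrying 8 bits of argument and that every further 8 bits of argument costs one more unit, up to 32 bits. The name `wordcode_size` is hypothetical and does not come from the patch.

```python
def wordcode_size(oparg: int) -> int:
    """Bytes taken by one instruction in the proposed 16-bit-unit format.

    Assumption for this sketch: one 2-byte unit holds the opcode plus the
    low 8 bits of the argument; each additional 8 bits of argument needs
    one extra 2-byte prefix unit, up to a 32-bit argument.
    """
    if not 0 <= oparg <= 0xFFFFFFFF:
        raise ValueError("argument must fit in 32 bits")
    units = 1
    while oparg > 0xFF:
        oparg >>= 8
        units += 1
    return 2 * units


# Print the size for arguments at each boundary mentioned in the message.
for arg in (0, 255, 256, 65535, 65536, 2**24, 2**32 - 1):
    print(f"oparg={arg:>10}: {wordcode_size(arg)} bytes")
```

Under these assumptions an argument in 0..255 takes 2 bytes, a 16-bit argument 4 bytes, a 24-bit argument 6 bytes and a 32-bit argument 8 bytes, which matches the figures quoted above.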