Hi Damien,

On 2016-02-01 3:59 PM, Damien George wrote:
> Hi Yury,
>
> That's great news about the speed improvements with the dict offset cache!
>
>> The cache struct is defined in code.h [2], and is 32 bytes long. When a
>> code object becomes hot, it gets a cache offset table allocated for it
>> (+1 byte for each opcode) + an array of cache structs.
>
> Ok, so each opcode has a 1-byte cache that sits separately from the
> actual bytecode.  But a lot of opcodes don't use it so that leads to
> some wasted memory, correct?

Each code object has a list of opcodes and their arguments
(bytes object == unsigned char array).

"Hot" code objects additionally have an offset table (unsigned
chars, one per bytecode byte) and a cache-entries array (I hope
your email client displays the following correctly):

   opcodes          offset       cache entries
                    table

    OPCODE            0            cache for 1st LOAD_ATTR
    ARG1              0            cache for 1st LOAD_GLOBAL
    ARG2              0            cache for 2nd LOAD_ATTR
    OPCODE            0            cache for 1st LOAD_METHOD
    LOAD_ATTR         1            ...
    ARG1              0
    ARG2              0
    OPCODE            0
    LOAD_GLOBAL       2
    ARG1              0
    ARG2              0
    LOAD_ATTR         3
    ARG1              0
    ARG2              0
    ...              ...
    LOAD_METHOD       4
    ...              ...

When, say, a LOAD_ATTR opcode executes, it first checks whether
the code object has a non-NULL cache-entries table.

If it does, that LOAD_ATTR then uses the offset table (indexing
it with its `INSTR_OFFSET()`) to find its position in the
cache-entries table.
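
To make the lookup concrete, here is a minimal Python model of it (the real
code is C inside the eval loop). The function name `lookup_cache_entry` is
hypothetical, and two details are assumptions rather than something stated
above: that the offset is the byte position of the opcode itself, and that
the table stores 1-based slots so that 0 can mean "no cache".

```python
def lookup_cache_entry(offset_table, cache_entries, instr_offset):
    """Return the cache entry for the opcode at instr_offset, or None."""
    if cache_entries is None:        # code object is not hot yet
        return None
    slot = offset_table[instr_offset]
    if slot == 0:                    # assumption: 0 means "no cache here"
        return None
    return cache_entries[slot - 1]   # assumption: slots are 1-based


# Same layout as the diagram: LOAD_ATTR at byte 4, LOAD_GLOBAL at byte 8,
# second LOAD_ATTR at byte 11; every other byte maps to 0.
offset_table = bytes([0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 3, 0, 0])
cache_entries = [
    "cache for 1st LOAD_ATTR",
    "cache for 1st LOAD_GLOBAL",
    "cache for 2nd LOAD_ATTR",
]

entry = lookup_cache_entry(offset_table, cache_entries, 8)
```

Because the table is indexed by byte offset and not by opcode number, the
mix of 1-byte and 3-byte opcodes does not matter for the lookup.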


> But then how do you index the cache, do you keep a count of the
> current opcode number?  If I remember correctly, CPython has some
> opcodes taking 1 byte, and some taking 3 bytes, so the offset into the
> bytecode cannot be easily mapped to a bytecode number.

First, when a code object is created, it doesn't have
an offset table and cache entries (those are set to NULL).

Each code object has a new field that counts how many times
it has been called.  Each time a code object is called with
PyEval_EvalFrameEx, that field is incremented.

Once a code object has been called more than 1024 times, we:

1. Allocate memory for its offset table.

2. Iterate through its opcodes and count how many
LOAD_ATTR, LOAD_METHOD and LOAD_GLOBAL opcodes it has.

3. As part of (2), initialize the offset table with the
correct mapping.  Opcodes that use the cache get a non-zero
entry in the offset table; all other opcodes get zero.
Opcode args always have zeros in the offset table.

4. Allocate the cache-entries table.
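
The steps above can be sketched in Python roughly as follows. This is a toy
model, not the actual patch: the `Code` class, `on_call`, `build_cache`, the
opcode numbers, and the "opcodes >= 90 take two argument bytes" rule are all
made up for illustration, and the cap at 255 slots is an assumption that
follows from the table holding unsigned chars.

```python
HOT_THRESHOLD = 1024

# Toy opcode numbers for illustration; not CPython's real values.
POP_TOP, LOAD_ATTR, LOAD_GLOBAL, LOAD_METHOD = 1, 100, 101, 102
HAVE_ARGUMENT = 90   # toy rule: opcodes >= 90 are followed by 2 arg bytes
CACHED_OPCODES = {LOAD_ATTR, LOAD_GLOBAL, LOAD_METHOD}


class Code:
    def __init__(self, bytecode):
        self.bytecode = bytes(bytecode)
        self.call_count = 0
        self.offset_table = None    # stays None (NULL) until the code is hot
        self.cache_entries = None


def build_cache(code):
    table = bytearray(len(code.bytecode))   # step 1: all zeros
    n = 0
    i = 0
    while i < len(code.bytecode):           # steps 2 and 3 in one pass
        op = code.bytecode[i]
        # Assumption: a 1-byte slot can only address 255 entries,
        # so any further opcodes are simply left uncached.
        if op in CACHED_OPCODES and n < 255:
            n += 1
            table[i] = n                    # 1-based slot; args stay 0
        i += 3 if op >= HAVE_ARGUMENT else 1
    code.offset_table = bytes(table)
    code.cache_entries = [None] * n         # step 4


def on_call(code):
    code.call_count += 1
    if code.call_count > HOT_THRESHOLD and code.cache_entries is None:
        build_cache(code)


code = Code([POP_TOP,
             LOAD_ATTR, 0, 0,
             POP_TOP,
             LOAD_GLOBAL, 0, 0,
             LOAD_ATTR, 0, 0])
for _ in range(HOT_THRESHOLD + 1):
    on_call(code)
```

After the loop, the offset table has one byte per bytecode byte, with
non-zero entries only at the three cached opcodes.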

Yury
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
