Hi Brett,

On 2016-02-01 2:30 PM, Brett Cannon wrote:


On Mon, 1 Feb 2016 at 11:11 Yury Selivanov <yselivanov...@gmail.com> wrote:

    Hi,

[..]

    What's next?
    ------------

    First, I'd like to merge the new LOAD_METHOD opcode, see issue 26110
    [1].  It's a very straightforward optimization, the patch is small and
    easy to review.
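For readers unfamiliar with the optimization, here is a rough Python-level sketch of the idea (not the C patch itself, and the class/method names are made up for illustration): a plain `obj.meth(x)` first materializes a temporary bound-method object and then calls it; the LOAD_METHOD/CALL_METHOD pair avoids that allocation when the attribute is an ordinary function found on the type, by pushing the unbound function and `self` separately.

```python
# Hypothetical sketch of what the LOAD_METHOD fast path avoids.
# A plain call `obj.hello(x)` does roughly:
#     tmp = obj.hello   # allocates a temporary bound-method object
#     tmp(x)            # calls it; the bound method is then discarded

class Greeter:
    def hello(self, name):
        return "hello, " + name

obj = Greeter()

# Roughly what the optimized path does when it applies: look the plain
# function up on the type and pass self explicitly -- no bound method
# object is ever created.
func = type(obj).__dict__["hello"]
result = func(obj, "world")

assert result == obj.hello("world")
```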


+1 from me.


    Second, I'd like to merge the new opcode cache, see issue 26219 [5].
    All unittests pass.  Memory usage increase is very moderate (<1 MB for
    the entire test suite), and the performance increase is significant.
    The only potential blocker for this is PEP 509 approval (which I'd be
    happy to assist with).


Given that it improves performance across the board and also eliminates the various tricks people use to cache globals and built-ins, a big +1 from me. I guess that means Victor needs to ask for pronouncement on PEP 509.

Great! AFAIK Victor still needs to update the PEP with some changes (globally unique ma_version). My patch includes the latest implementation of PEP 509, and it works fine (no regressions, no broken unittests). I can also assist with reviewing Victor's implementation if the PEP is accepted.
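To make the PEP 509 connection concrete, here is an illustrative Python-level model (not CPython internals; all names here are invented for the sketch) of how a per-opcode LOAD_GLOBAL cache can use a dict version tag: as long as the version hasn't changed, the cached value is still valid and the dict lookup is skipped entirely.

```python
# Sketch of a version-tag-based global cache.  In CPython the version
# would live on the dict itself (PEP 509's ma_version_tag); here we
# simulate it with a small wrapper.

class VersionedDict:
    """Dict wrapper whose version bumps on every mutation."""
    def __init__(self):
        self._d = {}
        self.version = 0

    def __setitem__(self, key, value):
        self._d[key] = value
        self.version += 1

    def __getitem__(self, key):
        return self._d[key]

class GlobalCache:
    """Caches one lookup, valid only while the dict version is unchanged."""
    def __init__(self):
        self.version = None
        self.value = None

    def load(self, d, key):
        if self.version == d.version:   # cache hit: skip the dict lookup
            return self.value
        self.value = d[key]             # cache miss: look up and remember
        self.version = d.version
        return self.value

g = VersionedDict()
g["x"] = 1
cache = GlobalCache()
assert cache.load(g, "x") == 1   # miss: fills the cache
assert cache.load(g, "x") == 1   # hit: dict lookup skipped
g["x"] = 2                       # mutation bumps the version
assert cache.load(g, "x") == 2   # cache invalidated, fresh lookup
```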


BTW, where does LOAD_ATTR fit into all of this?

The LOAD_ATTR optimization doesn't use any of PEP 509's new machinery (if I understand your question correctly). It's based on the following assumptions (the same ones that make JITs work so well):

1. Most classes don't implement __getattribute__.

2. A lot of attributes are stored in objects' __dict__s.

3. Most attributes aren't shadowed by descriptors/getters-setters; most code just uses "self.attr".

4. An average method/function works on objects of the same type, which means that those objects were constructed in a very similar (if not identical) fashion.

For instance:

class F:
    def __init__(self, name):
        self.name = name
    def say(self):
        print(self.name)   # <- For all F instances,
                           # offset of 'name' in `F().__dict__`s
                           # will be the same
If LOAD_ATTR gets too many cache misses (20 in my current patch), it gets deoptimized and the default implementation is used. So if the code is very dynamic, there's no improvement, but no performance penalty either.

In my patch, I use the cache to store (for LOAD_ATTR specifically):

- pointer to object's type
- type->tp_version_tag
- the last successful __dict__ offset

The first two fields are used to make sure that we have objects of the same type. If the type changes, we deoptimize the opcode immediately. Then we try the offset. If it's successful, we have a cache hit. If not, that's fine; we'll try a few more times before deoptimizing the opcode.
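Here is a rough Python-level model of that hit/miss/deoptimize logic (again illustrative, not the C patch: the tp_version_tag check and the real __dict__ offset can't be expressed in pure Python, so a plain key probe stands in for the offset probe, and the threshold is lowered for brevity):

```python
DEOPT_THRESHOLD = 3   # the real patch uses 20

class AttrCache:
    """Toy model of the per-opcode LOAD_ATTR cache."""
    def __init__(self, name):
        self.name = name
        self.cached_type = None   # models the cached pointer to the type
        self.misses = 0
        self.deoptimized = False

    def load(self, obj):
        if self.deoptimized:
            # Deoptimized: fall back to the default lookup forever,
            # so very dynamic code pays no extra cost.
            return getattr(obj, self.name)
        if type(obj) is self.cached_type and self.name in obj.__dict__:
            return obj.__dict__[self.name]     # fast path: cache hit
        self.misses += 1                       # cache miss
        if self.misses >= DEOPT_THRESHOLD:
            self.deoptimized = True
        self.cached_type = type(obj)           # re-arm for this type
        return getattr(obj, self.name)

class F:
    def __init__(self, name):
        self.name = name

cache = AttrCache("name")
f = F("spam")
assert cache.load(f) == "spam"   # first call: miss, cache is filled
assert cache.load(f) == "spam"   # second call: hit on the fast path
```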


    What do you think?


It all looks great to me!

Thanks!

Yury

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com