#8611: speed up cached_function and cached_method
---------------------------------------+------------------------------------
Reporter: jason | Owner: tbd
Type: enhancement | Status: needs_review
Priority: major | Milestone: sage-4.6.1
Component: misc | Keywords: cached method
Author: Jason Grout, Simon King | Upstream: N/A
Reviewer: | Merged:
Work_issues: |
---------------------------------------+------------------------------------
Changes:
* keywords: => cached method
* status: needs_work => needs_review
* work_issues: Patch doesn't apply =>
* author: Jason Grout => Jason Grout, Simon King
Comment:
Replying to [comment:4 davidloeffler]:
> Might it not be better for @cached_method to store its cache as an
> attribute of the instance object, rather than having a single cache which
> stores the values for all instances of that class?
It was not the case that there was a single cache for all instances: the
cache was stored in the instance for `cached_method`, resp. in the parent
of the instance for `cached_in_parent_method`. However, part of the
overhead came from the fact that the cached method ''itself'' was an
attribute of the class, not of the instance.
I just attached a new patch that goes far beyond Jason's approach.
The following examples show how my patch improves the performance of
cached methods:
'''__Using instance attributes__'''
Polynomial ideals have a cached method `groebner_basis`. Without
my patch, asking for `I.groebner_basis` returns a
`CachedMethodCaller` - a ''new instance'' every time the cached
method is requested. Of course, that is a waste of time. Therefore,
I store the cached method caller as an attribute of the instance,
which happens when the attribute is first requested:
{{{
sage: R.<x, y, z> = PolynomialRing(QQ, 3)
sage: I = R*(x^3 + y^3 + z^3,x^4-y^4)
sage: I.__dict__.has_key('groebner_basis')
False
sage: I.groebner_basis
Cached version of <function groebner_basis at 0x15a0668>
sage: I.__dict__['groebner_basis']
Cached version of <function groebner_basis at 0x15a0668>
sage: I.groebner_basis is I.groebner_basis # False without my patch
True
}}}
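The underlying technique - a non-data descriptor that shadows itself with an instance attribute on first access - can be sketched in plain Python. All class names here are illustrative stand-ins, not Sage's actual code:

```python
class CachedMethodCaller:
    """Stand-in for the real caller: caches results per argument tuple."""
    def __init__(self, f, inst):
        self.f = f
        self.inst = inst
        self.cache = {}

    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.f(self.inst, *args)
        return self.cache[args]


class cached_method:
    """Non-data descriptor (only __get__), so an entry in the instance's
    __dict__ shadows it and __get__ is never called again."""
    def __init__(self, f):
        self.f = f

    def __get__(self, inst, cls):
        if inst is None:
            return self
        caller = CachedMethodCaller(self.f, inst)
        # Store the caller in the instance's dictionary: subsequent
        # attribute lookups find it there, bypassing the descriptor.
        inst.__dict__[self.f.__name__] = caller
        return caller


class Example:
    @cached_method
    def square(self, n):
        return n * n


e = Example()
assert 'square' not in e.__dict__   # nothing stored yet
assert e.square is e.square         # same caller object on every access
assert e.square(4) == 16
assert 'square' in e.__dict__       # the caller now lives in the instance
```

This mirrors the `I.groebner_basis is I.groebner_basis` behavior shown above: after the first lookup, the caller is an ordinary instance attribute.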
This already yields a considerable speedup. However, pickling posed
a problem: since the cached method caller is in the instance's
dictionary, Python would try to pickle it together with the instance -
but functions can not be pickled.
The solution is "creative" (I hope you don't find it crazy - at least
it works!):
* `CachedMethodCaller` has a `__reduce__` method that creates a
pickle - but unpickling it creates something else, namely an instance
of the new class `CachedMethodPickle`. It knows only the name of the
original cached method and the instance to which it belongs.
* After unpickling, one wants to work with the cached method. Working
means requesting attributes (such as `__call__`). `CachedMethodPickle`
has a `__getattr__` method that first removes the entry of the
instance's dictionary pointing to it - so, it effectively commits
suicide.
* Since the `CachedMethodPickle` has removed itself, the original
cached method is again available from the class of the instance.
Here is an example:
{{{
sage: I.groebner_basis()
[y^5*z^3 - 1/4*x^2*z^6 + 1/2*x*y*z^6 + 1/4*y^2*z^6, x^2*y*z^3 - x*y^2*z^3
+ 2*y^3*z^3 + z^6, x*y^3 + y^4 + x*z^3, x^3 + y^3 + z^3]
}}}
We now pickle and unpickle the ideal. The cached method
`groebner_basis` is replaced by a placeholder:
{{{
sage: J = loads(dumps(I))
sage: J.groebner_basis
Pickle of the cached method "groebner_basis"
}}}
But as soon as any other attribute is requested from the
placeholder, it replaces itself by the cached method, and
the entries of the cache are preserved:
{{{
sage: J.groebner_basis.is_in_cache()
True
sage: J.groebner_basis
Cached version of <function groebner_basis at 0x...>
sage: J.groebner_basis() == I.groebner_basis()
True
}}}
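The placeholder mechanism described above can be sketched in plain Python. All names below (`SquareCaller`, `Placeholder`, `_cache_square`) are invented for illustration; only the roles correspond to `CachedMethodCaller` and `CachedMethodPickle`:

```python
import pickle


class cached_square:
    """Non-data descriptor standing in for @cached_method."""
    def __get__(self, inst, cls):
        if inst is None:
            return self
        caller = SquareCaller(inst)
        inst.__dict__['square'] = caller    # shadow the descriptor
        return caller


class SquareCaller:
    def __init__(self, inst):
        self.inst = inst
        # The cache lives on the instance itself, so it survives pickling
        # independently of the (unpicklable) caller.
        self.cache = inst.__dict__.setdefault('_cache_square', {})

    def __call__(self, n):
        if n not in self.cache:
            self.cache[n] = n * n
        return self.cache[n]

    def __reduce__(self):
        # Pickle as a lightweight placeholder instead of the caller.
        return (Placeholder, (self.inst,))


class Placeholder:
    """Stand-in for CachedMethodPickle: knows only its instance."""
    def __init__(self, inst):
        self.inst = inst

    def __getattr__(self, attr):
        inst = self.__dict__['inst']
        del inst.__dict__['square']         # remove itself ("suicide") ...
        return getattr(inst.square, attr)   # ... so the descriptor runs again


class Obj:
    square = cached_square()


o = Obj()
o.square(4)                                 # fill the cache
p = pickle.loads(pickle.dumps(o))
assert isinstance(p.__dict__['square'], Placeholder)
assert p.square.cache == {4: 16}            # placeholder replaced itself
```

On the first attribute access after unpickling, the placeholder deletes its own dictionary entry and delegates to the freshly recreated caller, which picks the preserved cache back up - just as in the `J.groebner_basis.is_in_cache()` example.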
'''__Creating the cache key__'''
1. I found that most of the overhead comes from the computation
of the cache key, which relies on
`sage.misc.function_mangling.ArgumentFixer`.
Internally, it works on lists and has some loops, so I thought
it would benefit from Cython - and indeed it does!
Moreover, I added a shortcut in the method `fix_to_pos`: if there
are no named arguments, then the result is essentially the list of
positional arguments followed by the list of arguments with default
values.
2. Some more cache key improvements take place inside
`sage.misc.cachefunc`.
Originally, the key was obtained by a method `get_key`, which in turn
called `_get_instance_key`. To reduce the overhead of calling
two methods, I inlined the code of `_get_instance_key` and
removed that method.
3. I also use Jason's idea of a shortcut for the common case
of an empty argument list. However, even if the argument list is
empty, there might still be arguments with a default value - and
explicitly providing these default values should yield the same cache
key. Hence, the following would not work with Jason's patch (continuing
the example above):
{{{
sage: I.groebner_basis(algorithm='') is I.groebner_basis()
True
}}}
The solution is to compute the cache key for an empty argument list
''once'' and store the result for later use. That is another reason why
it is good to have one `CachedMethodCaller` permanently attached to an
instance (rather than always creating a new one): otherwise, caching
the default key would be difficult.
'''__Accessing the cache__'''
Originally, a method `get_cache` was used to obtain the cache dictionary.
In some cases, that method in turn called `_get_instance_cache` - so,
again, there was the overhead of calling two methods.
With my patch, the cache is ''always'' available as an attribute of
the `CachedMethodCaller` instance - hence, access is very fast. There is
almost no memory overhead, since it is simply an additional pointer to a
dictionary that is an attribute of the instance (resp. of its parent):
{{{
sage: I.groebner_basis.cache is I._cache__groebner_basis
True
}}}
There is an additional benefit of this approach: if the instance has
no `__dict__` or does not allow attributes to be set, it is still
possible to cache the method. Similarly, if the parent of the instance
does not allow attributes to be set, `cached_in_parent_method` still
results in a cached method (but the cache is stored in the instance,
not in its parent).
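The fallback can be sketched as follows. `get_cache_dict` is a hypothetical helper (not Sage's code): it tries to store the cache on the parent, as `cached_in_parent_method` would, and falls back to the instance itself when the parent rejects attribute assignment:

```python
def get_cache_dict(instance, name):
    """Return a cache dict for `name`, preferring the parent as holder
    but falling back to the instance if the parent rejects attributes."""
    for holder in (instance.parent(), instance):
        try:
            cache = getattr(holder, name, None)
            if cache is None:
                cache = {}
                setattr(holder, name, cache)    # may raise AttributeError
            return cache
        except AttributeError:
            continue                            # try the next holder
    return {}   # last resort: a fresh, unshared cache


class RigidParent:
    __slots__ = ()          # no __dict__: attribute assignment raises


class Elt:
    def __init__(self, parent):
        self._parent = parent

    def parent(self):
        return self._parent


e = Elt(RigidParent())
cache = get_cache_dict(e, '_cache_bar')
assert cache is getattr(e, '_cache_bar')        # fell back to the instance
assert get_cache_dict(e, '_cache_bar') is cache  # and is reused afterwards
```

This matches the `F_ZZ.barP` behavior shown in the timings below: when `ZZ` refuses attribute assignment, the cache silently moves to the element.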
'''__Documentation__'''
I extended the documentation, but did not cover `ClearCacheOnPickle`,
which my patch does not change. Apart from that, the doctest coverage
of `sage.misc.cachefunc` is complete. For `sage.misc.function_mangling`,
the coverage script complains "Please add a `TestSuite(s).run()` doctest."
I don't know where that comes from.
Of course, all doctests pass for me. I did not test whether the
documentation looks nice in HTML.
'''__Performance__'''
Last but not least, what the ticket is about: Performance!
Here is the setting:
{{{
sage: class Foo:
....: def __init__(self,P=None):
....: if P is None:
....: self._parent=self
....: else:
....: self._parent=P
....: @cached_method
....: def bar(self,m,n=3):
....: return m^n
....: @cached_in_parent_method
....: def barP(self,m,n=3):
....: return m^n
....: def parent(self):
....: return self._parent
....:
sage: F = Foo()
sage: F_ZZ = Foo(ZZ)
}}}
__Without my patch__
{{{
# cache in F:
sage: F.bar(2,3) is F.bar(2) is F.bar(2,n=3) is F.bar(n=3,m=2)
True
sage: timeit('F.bar(2)', number=10000)
10000 loops, best of 3: 18 µs per loop
sage: timeit('F.bar(2,3)', number=10000)
10000 loops, best of 3: 18.2 µs per loop
sage: timeit('F.bar(2,n=3)', number=10000)
10000 loops, best of 3: 20.2 µs per loop
sage: timeit('F.bar(n=3,m=2)', number=10000)
10000 loops, best of 3: 20.2 µs per loop
}}}
{{{
# cache in F as a parent of itself:
sage: F.barP(2,3) is F.barP(2) # BUG!
False
sage: F.barP.get_key(2,3)
((<__main__.Foo instance at 0x4570d88>, 2, 3), ())
sage: F.barP.get_key(2)
((<__main__.Foo instance at 0x4570d88>, 2), ())
sage: timeit('F.barP(2)', number=10000)
10000 loops, best of 3: 21.9 µs per loop
sage: timeit('F.barP(2,3)', number=10000)
10000 loops, best of 3: 22.4 µs per loop
sage: timeit('F.barP(2,n=3)', number=10000)
10000 loops, best of 3: 23.8 µs per loop
sage: timeit('F.barP(n=3,m=2)', number=10000)
10000 loops, best of 3: 24.5 µs per loop
}}}
{{{
# cache in ZZ, which does not allow attribute assignment
sage: F_ZZ.barP(2)
Traceback (most recent call last):
...
AttributeError: 'dictproxy' object has no attribute 'setdefault'
}}}
__With my patch__
{{{
# cache in F:
sage: F.bar(2,3) is F.bar(2) is F.bar(2,n=3) is F.bar(n=3,m=2)
True
sage: timeit('F.bar(2)', number=10000)
10000 loops, best of 3: 3.27 µs per loop
sage: timeit('F.bar(2,3)', number=10000)
10000 loops, best of 3: 3.26 µs per loop
sage: timeit('F.bar(2,n=3)', number=10000)
10000 loops, best of 3: 3.63 µs per loop
sage: timeit('F.bar(n=3,m=2)', number=10000)
10000 loops, best of 3: 3.69 µs per loop
}}}
{{{
# cache in F as parent of itself:
# The bug has disappeared
sage: F.barP(2,3) is F.barP(2) is F.barP(2,n=3) is F.barP(n=3,m=2)
True
sage: timeit('F.barP(2)', number=10000)
10000 loops, best of 3: 6.09 µs per loop
sage: timeit('F.barP(2,3)', number=10000)
10000 loops, best of 3: 5.5 µs per loop
sage: timeit('F.barP(2,n=3)', number=10000)
10000 loops, best of 3: 5.86 µs per loop
sage: timeit('F.barP(n=3,m=2)', number=10000)
10000 loops, best of 3: 6.03 µs per loop
}}}
{{{
# cache in ZZ, which does not allow attribute assignment.
# It works, since the cache is now automatically moved
# to F_ZZ, as a last resort
sage: F_ZZ.barP(2,3) is F_ZZ.barP(2) is F_ZZ.barP(2,n=3) is F_ZZ.barP(n=3,m=2)
True
sage: timeit('F_ZZ.barP(2)', number=10000)
10000 loops, best of 3: 5.6 µs per loop
sage: timeit('F_ZZ.barP(2,3)', number=10000)
10000 loops, best of 3: 5.56 µs per loop
sage: timeit('F_ZZ.barP(2,n=3)', number=10000)
10000 loops, best of 3: 5.86 µs per loop
sage: timeit('F_ZZ.barP(n=3,m=3)', number=10000)
10000 loops, best of 3: 6.1 µs per loop
}}}
Here is the effect of using the shortcut:
{{{
sage: class BAR:
....: @cached_method
....: def bar(self,m=2,n=3):
....: return m^n
....:
sage: B = BAR()
sage: B.bar() is B.bar(2,3)
True
}}}
Without my patch:
{{{
sage: timeit('B.bar(n=2,m=3)', number=10000)
10000 loops, best of 3: 20.4 µs per loop
sage: timeit('B.bar()', number=10000)
10000 loops, best of 3: 17.1 µs per loop
}}}
With my patch:
{{{
sage: timeit('B.bar(n=2,m=3)', number=10000)
10000 loops, best of 3: 3.73 µs per loop
sage: timeit('B.bar()', number=10000)
10000 loops, best of 3: 1.95 µs per loop
}}}
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/8611#comment:7>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB