#16316: cached_function and cached_method for unhashable elements
-------------------------------------+-------------------------------------
       Reporter:  saraedum           |        Owner:
           Type:  enhancement        |       Status:  needs_work
       Priority:  minor              |    Milestone:  sage-6.3
      Component:  misc               |   Resolution:
       Keywords:                     |    Merged in:
        Authors:  Julian Rueth       |    Reviewers:  Peter Bruin
Report Upstream:  N/A                |  Work issues:
         Branch:                     |       Commit:
  u/saraedum/ticket/16316            |  dc37d43b55bfdff5c3c4b93be8c298955d53872e
   Dependencies:  #16251             |     Stopgaps:
-------------------------------------+-------------------------------------
Changes (by tscrim):

 * status:  needs_review => needs_work


Comment:

 Replying to [comment:19 saraedum]:
 > I'm not sure I understand. If p-adics implemented a non-trivial
 `__hash__` then this could never be made to be consistent with our current
 `==`. The intermediate function creates something with a new `==` which is
 then consistent with its `__hash__`. Does this answer your question?

 Just making sure it is different from the hash requirements. By unique,
 you mean this bit of doc:
 {{{
         An implementation must make sure that for elements ``a`` and ``b``,
         if ``a != b``, then also ``a._cache_key() != b._cache_key()``. In
         pratice this means that the ``_cache_key`` should always include
         the ``id`` of the parent.
 }}}
 whose contrapositive is: if `a._cache_key() == b._cache_key()`, then `a
 == b` (assuming that `==` in `p`-adics satisfies `a == b` iff `not (a !=
 b)`). Note that this is the opposite requirement of hash: `a == b`
 implies `hash(a) == hash(b)`. In order to retrieve elements from the
 cache, you also need `a._cache_key()` to never change over the lifetime
 of `a`. Enforcing this is no different from enforcing it for `__hash__`.
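
 To make the two contracts concrete, here is a plain-Python sketch (not
 Sage code; the `Element` class is hypothetical) of a `_cache_key` that
 includes the `id` of the parent, as the quoted docstring prescribes:
 {{{
# __hash__ contract:    a == b  implies  hash(a) == hash(b)
# _cache_key contract:  a != b  implies  a._cache_key() != b._cache_key()

class Element:
    def __init__(self, parent, data):
        self.parent = parent
        self.data = data

    def __eq__(self, other):
        # deliberately coarse equality: the parent is ignored
        return self.data == other.data

    def _cache_key(self):
        # including id(self.parent) keeps elements of distinct
        # parents apart even when == calls them equal
        return (id(self.parent), self.data)

P1, P2 = object(), object()
a, b = Element(P1, 1), Element(P2, 1)
assert a == b                            # equal under ==
assert a._cache_key() != b._cache_key()  # but cached separately
 }}}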

 I'm good with the new wording (BTW, there's a typo (pratice), and this
 docstring needs `::`).

 However, there is a code smell here for me. The more I think about it,
 the more I feel that `==` should be strengthened (i.e. fewer things
 compare as equal) and a `__hash__` method implemented.
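
 A hedged sketch of that alternative design in plain Python (the class
 name is made up): strengthen `==` so the parent participates in
 equality, and a consistent `__hash__` becomes possible without any
 `_cache_key` detour:
 {{{
class StrictElement:
    def __init__(self, parent, data):
        self.parent = parent
        self.data = data

    def __eq__(self, other):
        # strengthened ==: elements of distinct parents never compare equal
        return self.parent is other.parent and self.data == other.data

    def __hash__(self):
        # now hash can agree with ==: a == b implies hash(a) == hash(b)
        return hash((id(self.parent), self.data))

P = object()
assert StrictElement(P, 1) == StrictElement(P, 1)
assert hash(StrictElement(P, 1)) == hash(StrictElement(P, 1))
assert StrictElement(P, 1) != StrictElement(object(), 1)
 }}}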

 ----

 Now for some timings, before:
 {{{
 sage: @cached_function
 ....: def foo(x):
 ....:     return x
 ....:
 sage: R.<x> = QQ[]
 sage: C = crystals.Tableaux(['A',2], shape=[2,1])
 sage: p = R(range(100))
 sage: q = R(range(10000))
 sage: pl = R(range(1000000))
 sage: m = matrix(10, 10, range(100))

 sage: %timeit foo(5)
 1000000 loops, best of 3: 1.46 µs per loop
 sage: %timeit foo(C)
 1000000 loops, best of 3: 942 ns per loop

 sage: %timeit foo(p)
 1000 loops, best of 3: 387 µs per loop
 sage: %timeit foo(q)
 10 loops, best of 3: 29.5 ms per loop
 sage: %timeit foo(pl)
 1 loops, best of 3: 2.85 s per loop

 sage: foo(m) # fails as it should
 }}}
 with the branch:
 {{{
 sage: %timeit foo(5)
 1000000 loops, best of 3: 1.51 µs per loop
 sage: %timeit foo(C)
 1000000 loops, best of 3: 976 ns per loop

 sage: %timeit foo(p)
 1000 loops, best of 3: 390 µs per loop
 sage: %timeit foo(q)
 10 loops, best of 3: 30.4 ms per loop
 sage: %timeit foo(pl)
 1 loops, best of 3: 3.07 s per loop

 sage: foo(m) # Still fails
 }}}
 So there's no statistically significant slowdown that I can see.

 ----

 Some other design questions/comments:

 - Is there some reason why you didn't `cpdef _cache_key(self)` or make
 it a `@lazy_attribute`, since it cannot change over the lifetime of the
 object? The latter would actually speed things up for objects whose
 hash is expensive to compute (e.g. large-degree dense polynomials).

 - I'm worried about not using the hash of polynomials for caching. I
 feel that `_cache_key` should return `self` for every hashable object.
 For example, these would no longer cache as the same element:
 {{{
 sage: R.<x> = QQ[]
 sage: p = R(range(100))
 sage: hash(p)
 1520766690567384545
 sage: hash(p._cache_key())
 -4253877232222459194

 sage: R.<x> = ZZ[]
 sage: p = R(range(100))
 sage: hash(p)
 1520766690567384545
 sage: hash(p._cache_key())
 6965062123311622150
 }}}
   So now you're requiring everyone to be ''very'' careful about the
 parent an object belongs to.

 - I'm pretty sure using the parent's `id` will break pickling with the
 cache. Suppose you create a parent `A` and store an element `a` in a
 cache `C`. Now pickle and unpickle `a`, `A`, and `C`: the `id` of `A`
 is different after unpickling, so we are no longer able to retrieve `a`
 from `C`:
 {{{
 sage: R = QQ['x']
 sage: p = R(range(100))
 sage: hash(p._cache_key())
 -4253877232222459194
 sage: dumps(p)
 # This is some pickle
 # Start a new sage session:
 sage: s = # The pickle above
 sage: hash(loads(s)._cache_key())
 2293648738429524678
 sage: hash(loads(s)) # This is persistent across sessions
 1520766690567384545
 }}}
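
 The failure mode can be reproduced in plain Python without Sage at all
 (the `Parent` class here is a stand-in): pickling and unpickling gives
 the parent a fresh `id`, so any cached key embedding `id(parent)` no
 longer matches:
 {{{
import pickle

class Parent:
    pass

P = Parent()
key_before = (id(P), "element data")

# unpickling stands in for "start a new sage session"
P2 = pickle.loads(pickle.dumps(P))
key_after = (id(P2), "element data")

# the two parents are alive simultaneously, so their ids must differ;
# the cached entry stored under key_before is unreachable via key_after
assert key_before != key_after
 }}}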

 I think both of the last two are issues that need to be addressed.
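
 On the `@lazy_attribute` point: in plain Python, the same one-time key
 computation can be sketched with `functools.cached_property`, which
 plays a role similar to Sage's `lazy_attribute` (the `Element` class is
 hypothetical):
 {{{
from functools import cached_property

class Element:
    def __init__(self, parent, coeffs):
        self.parent = parent
        self.coeffs = tuple(coeffs)

    @cached_property
    def _cache_key(self):
        # computed once, then stored on the instance; this is safe only
        # because the key never changes over the lifetime of the object
        return (id(self.parent), self.coeffs)

e = Element(object(), range(100))
assert e._cache_key is e._cache_key  # second access reuses the stored key
 }}}
 Note that this turns `_cache_key` into an attribute rather than a
 method, which is exactly the trade-off the `@lazy_attribute` suggestion
 makes.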

--
Ticket URL: <http://trac.sagemath.org/ticket/16316#comment:22>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB
