#14711: Weak references in the coercion graph
-------------------------------------+-------------------------------------
       Reporter:  jpflori            |        Owner:  davidloeffler
           Type:  defect             |       Status:  needs_work
       Priority:  critical           |    Milestone:  sage-5.13
      Component:  number fields      |   Resolution:
       Keywords:  memleak, number    |    Merged in:
  field, QuadraticField              |    Reviewers:
        Authors:  Simon King         |  Work issues:  String repr. of
Report Upstream:  N/A                |  weakened maps; copying/pickling of
         Branch:                     |  maps
  u/SimonKing/ticket/14711           |       Commit:
   Dependencies:                     |  5168cfd15d4cdbaa3ffdbd4be0f7d783f77257c7
                                     |     Stopgaps:
-------------------------------------+-------------------------------------
Changes (by SimonKing):

 * status:  needs_review => needs_work
 * work_issues:   => String repr. of weakened maps; copying/pickling of maps


Old description:

> The following quickly eats up memory:
> {{{
> sage: for D in xrange(2,2**32):
> ....:     QuadraticField(-D);
> ....:
> }}}
> (This is with 5.10.rc0)
>
> __Problem analysis__
>
> The quadratic field is created with a coerce embedding into `CLF`. At the
> same
> time, this coerce embedding is stored in `CLF._coerce_from_hash`:
> {{{
> sage: phi = CLF.coerce_map_from(Q)
> sage: phi is Q.coerce_embedding()
> True
> sage: Q in CLF._introspect_coerce()['_coerce_from_hash']
> True
> }}}
> The "coerce_from_hash" is a `MonoDict`, hence, has only a weak reference
> to the key
> (Q, in this case). However, there still is a ''strong'' reference from
> CLF to the coerce map phi. And phi has a strong reference to its
> domain, thus, to Q. Hence, the existence of CLF prevents garbage
> collection of
> Q.
>
> And there is a second chain of strong references from CLF to Q: From CLF
> to
> phi to the parent of phi (i.e., a homset) to the domain Q of this homset.
>
> __Suggested solution__
>
> We can not turn the reference from CLF to phi into a weak reference,
> because
> then even a strong reference to Q would not prevent phi from garbage
> collection. Hence, we need to break the above mentioned reference chains
> in
> two points. In the attached branch, maps generally keep a strong
> reference to
> the codomain (this is important in composite maps and actions), but those
> used
> ''in the coercion system'' (and ''only'' there!!) will only have a weak
> reference to the domain, and they set the cdef `._parent` attribute to
> `None`
> (hence, we also override `.parent()`, so that it reconstructs the homset
> if
> the weak reference to the domain is still valid).
>
> To preserve the `domain()/codomain()` interface, I have removed the
> method
> `domain()` and have replaced it by a cdef public attribute that will
> either
> hold a weak reference (which returns the domain when called, hence, the
> interface does not change) or a `ConstantFunction` (which should actually
> be
> faster to call than a method). Since accessing a cdef attribute is still
> faster, the cdef attribute `_codomain` is kept (since this will always be
> a
> strong reference), but `_domain` has been removed.
>
> This "weakening of references" is done for the coercions found by
> `discover_coerce_map_from()` stored into `_coerce_from_hash`. So, this
> mainly
> happens for things done with `_coerce_map_from_()` and with composite
> maps. Similarly for `_convert_from_hash`.
>
> Weakening is ''not'' used on the maps that are explicitly registered by
> `.register_embedding()` and `.register_coercion()`. This is in order to
> preserve the connectivity of the coercion graph. The `register_*` methods
> are only used on selected maps, that are of particular importance for the
> backtrack search in `discover_coerce_map_from()`. These ''strong''
> registrations do not propagate: Compositions of strongly registered
> coercions found by `discover_coerce_map_from()` will be weakened.
>
> Since weakened maps should not be used outside of the coercion system,
> its
> string representation shows a warning to replace them by a copy. The
> attached
> branch implements copying of maps in some additional cases.
>
> `SchemeMorphism` can not inherit from `Morphism`, because of a bug with
> multiple inheritance of a Python class from Cython extension classes. But
> once
> this bug is fixed, we surely want to make `SchemeMorphism` inherit from
> `Morphism`. This transition is prepared here.
>
> In any case, the commit messages should give a concise description of
> what has
> been done.
>
> '''__Still TODO__'''
>
> Let the string representation of weakened maps point the user to the need
> of
> creating a copy.
>
> '''__TODO in future tickets__'''
>
> - Provide a documentation of the use of weak references in coercion, and
> of
>   different ways of registering coercions, with their different impacts
> on
>   garbage collecion.
> - Provide a version of `.register_coercion()` that weakens the coercion
>   map. It would hence have the same effect as returning a map by
>   `._coerce_map_from_()`, but of course `._coerce_map_from()` could not
> easily
>   be changed in an interactive session.
> - provide copying for ''all'' kinds of maps.
>
> '''__Effects on the overall functioning of Sage__'''
>
> It is conceivable that some parts of Sage still suppose implicitly that
> stuff
> cached with `UniqueRepresentation` is ''permanently'' cached, even though
> the
> seemingly permanent cache was not more than a consequence of a memory
> leak in
> the coercion system. With the attached branch, garbage collection of
> parent
> structures will much more often become possible. Hence, code that relied
> on a
> fake-permanent cache would now need to create the same parent repeatedly.
>
> I (Simon) have tested how many additional parent creations occur with the
> attached branch when running `sage -t --all`. The findings are summarised
> in
> comment:107: The number of additional parent creations increased by not
> more
> than 1% for all but two parent classes (both related with tableaux). I
> also
> found that the time to run the tests did not significantly increase.
>
> Jean-Pierre has occasionally stated that some of his computations have
> been
> infeasible with the memory leak in the above example. I hope that his
> computations will now succeed.

New description:

 The following quickly eats up memory:
 {{{
 sage: for D in xrange(2,2**32):
 ....:     QuadraticField(-D);
 ....:
 }}}
 (This is with 5.10.rc0)

 __Problem analysis__

 The quadratic field is created with a coerce embedding into `CLF`. At the
 same
 time, this coerce embedding is stored in `CLF._coerce_from_hash`:
 {{{
 sage: phi = CLF.coerce_map_from(Q)
 sage: phi is Q.coerce_embedding()
 True
 sage: Q in CLF._introspect_coerce()['_coerce_from_hash']
 True
 }}}
 The "coerce_from_hash" is a `MonoDict`, hence, has only a weak reference
 to the key
 (Q, in this case). However, there still is a ''strong'' reference from
 CLF to the coerce map phi. And phi has a strong reference to its
 domain, thus, to Q. Hence, the existence of CLF prevents garbage
 collection of
 Q.

 And there is a second chain of strong references from CLF to Q: From CLF
 to
 phi to the parent of phi (i.e., a homset) to the domain Q of this homset.

 __Suggested solution__

 We can not turn the reference from CLF to phi into a weak reference,
 because
 then even a strong reference to Q would not prevent phi from garbage
 collection. Hence, we need to break the above mentioned reference chains
 in
 two points. In the attached branch, maps generally keep a strong reference
 to
 the codomain (this is important in composite maps and actions), but those
 used
 ''in the coercion system'' (and ''only'' there!!) will only have a weak
 reference to the domain, and they set the cdef `._parent` attribute to
 `None`
 (hence, we also override `.parent()`, so that it reconstructs the homset
 if
 the weak reference to the domain is still valid).

 To preserve the `domain()/codomain()` interface, I have removed the method
 `domain()` and have replaced it by a cdef public attribute that will
 either
 hold a weak reference (which returns the domain when called, hence, the
 interface does not change) or a `ConstantFunction` (which should actually
 be
 faster to call than a method). Since accessing a cdef attribute is still
 faster, the cdef attribute `_codomain` is kept (since this will always be
 a
 strong reference), but `_domain` has been removed.

 This "weakening of references" is done for the coercions found by
 `discover_coerce_map_from()` stored into `_coerce_from_hash`. So, this
 mainly
 happens for things done with `_coerce_map_from_()` and with composite
 maps. Similarly for `_convert_from_hash`.

 Weakening is ''not'' used on the maps that are explicitly registered by
 `.register_embedding()` and `.register_coercion()`. This is in order to
 preserve the connectivity of the coercion graph. The `register_*` methods
 are only used on selected maps, that are of particular importance for the
 backtrack search in `discover_coerce_map_from()`. These ''strong''
 registrations do not propagate: Compositions of strongly registered
 coercions found by `discover_coerce_map_from()` will be weakened.

 Since weakened maps should not be used outside of the coercion system, its
 string representation shows a warning to replace them by a copy. The
 attached
 branch implements copying of maps in some additional cases.

 `SchemeMorphism` can not inherit from `Morphism`, because of a bug with
 multiple inheritance of a Python class from Cython extension classes. But
 once
 this bug is fixed, we surely want to make `SchemeMorphism` inherit from
 `Morphism`. This transition is prepared here.

 In any case, the commit messages should give a concise description of what
 has
 been done.

 '''__Still TODO__'''

 - Let the string representation of weakened maps point the user to the
 need of
 creating a copy.
 - Provide copying for ''all'' kinds of maps.

 '''__TODO in future tickets__'''

 - Provide a documentation of the use of weak references in coercion, and
 of
   different ways of registering coercions, with their different impacts on
   garbage collecion.
 - Provide a version of `.register_coercion()` that weakens the coercion
   map. It would hence have the same effect as returning a map by
   `._coerce_map_from_()`, but of course `._coerce_map_from()` could not
 easily
   be changed in an interactive session.

 '''__Effects on the overall functioning of Sage__'''

 It is conceivable that some parts of Sage still suppose implicitly that
 stuff
 cached with `UniqueRepresentation` is ''permanently'' cached, even though
 the
 seemingly permanent cache was not more than a consequence of a memory leak
 in
 the coercion system. With the attached branch, garbage collection of
 parent
 structures will much more often become possible. Hence, code that relied
 on a
 fake-permanent cache would now need to create the same parent repeatedly.

 I (Simon) have tested how many additional parent creations occur with the
 attached branch when running `sage -t --all`. The findings are summarised
 in
 comment:107: The number of additional parent creations increased by not
 more
 than 1% for all but two parent classes (both related with tableaux). I
 also
 found that the time to run the tests did not significantly increase.

 Jean-Pierre has occasionally stated that some of his computations have
 been
 infeasible with the memory leak in the above example. I hope that his
 computations will now succeed.

--

Comment:

 I think changing the string representation of weakened maps should be done
 here. And then, in a couple of tests, one needs to copy the map in order
 to get the test pass.

 Therefore, I suggest to implement copying for all maps ''here'' as well,
 not on a different ticket. After all, it is not difficult: One just looks
 at the list of cdef attributes, and implements `_extra_slots` and
 `_update_slots` taking exactly these attributes into account. The only
 difficulty is to really catch ''all'' kinds of maps.

 Note that in most cases `phi == loads(dumps(phi))` would return False, but
 this is since comparison of maps is often not implemented---and this is
 what I will certainly ''not'' attempt to implement here.

--
Ticket URL: <http://trac.sagemath.org/ticket/14711#comment:118>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to