#14711: Weak references in the coercion graph
-------------------------------------+-------------------------------------
Reporter: jpflori | Owner: davidloeffler
Type: defect | Status: needs_work
Priority: critical | Milestone: sage-5.13
Component: number fields | Resolution:
Keywords: memleak, number | Merged in:
field, QuadraticField | Reviewers:
Authors: Simon King | Work issues: String repr. of
Report Upstream: N/A | weakened maps; copying/pickling of
Branch: | maps
u/SimonKing/ticket/14711 | Commit:
Dependencies: | 5168cfd15d4cdbaa3ffdbd4be0f7d783f77257c7
| Stopgaps:
-------------------------------------+-------------------------------------
Changes (by SimonKing):
* status: needs_review => needs_work
* work_issues: => String repr. of weakened maps; copying/pickling of maps
Old description:
> The following quickly eats up memory:
> {{{
> sage: for D in xrange(2,2**32):
> ....: QuadraticField(-D);
> ....:
> }}}
> (This is with 5.10.rc0)
>
> __Problem analysis__
>
> The quadratic field is created with a coerce embedding into `CLF`. At the
> same
> time, this coerce embedding is stored in `CLF._coerce_from_hash`:
> {{{
> sage: phi = CLF.coerce_map_from(Q)
> sage: phi is Q.coerce_embedding()
> True
> sage: Q in CLF._introspect_coerce()['_coerce_from_hash']
> True
> }}}
> The "coerce_from_hash" is a `MonoDict`, hence, has only a weak reference
> to the key
> (Q, in this case). However, there still is a ''strong'' reference from
> CLF to the coerce map phi. And phi has a strong reference to its
> domain, thus, to Q. Hence, the existence of CLF prevents garbage
> collection of
> Q.
>
> And there is a second chain of strong references from CLF to Q: From CLF
> to
> phi to the parent of phi (i.e., a homset) to the domain Q of this homset.
>
> __Suggested solution__
>
> We can not turn the reference from CLF to phi into a weak reference,
> because
> then even a strong reference to Q would not prevent phi from garbage
> collection. Hence, we need to break the above mentioned reference chains
> in
> two points. In the attached branch, maps generally keep a strong
> reference to
> the codomain (this is important in composite maps and actions), but those
> used
> ''in the coercion system'' (and ''only'' there!!) will only have a weak
> reference to the domain, and they set the cdef `._parent` attribute to
> `None`
> (hence, we also override `.parent()`, so that it reconstructs the homset
> if
> the weak reference to the domain is still valid).
>
> To preserve the `domain()/codomain()` interface, I have removed the
> method
> `domain()` and have replaced it by a cdef public attribute that will
> either
> hold a weak reference (which returns the domain when called, hence, the
> interface does not change) or a `ConstantFunction` (which should actually
> be
> faster to call than a method). Since accessing a cdef attribute is still
> faster, the cdef attribute `_codomain` is kept (since this will always be
> a
> strong reference), but `_domain` has been removed.
>
> This "weakening of references" is done for the coercions found by
> `discover_coerce_map_from()` stored into `_coerce_from_hash`. So, this
> mainly
> happens for things done with `_coerce_map_from_()` and with composite
> maps. Similarly for `_convert_from_hash`.
>
> Weakening is ''not'' used on the maps that are explicitly registered by
> `.register_embedding()` and `.register_coercion()`. This is in order to
> preserve the connectivity of the coercion graph. The `register_*` methods
> are only used on selected maps, that are of particular importance for the
> backtrack search in `discover_coerce_map_from()`. These ''strong''
> registrations do not propagate: Compositions of strongly registered
> coercions found by `discover_coerce_map_from()` will be weakened.
>
> Since weakened maps should not be used outside of the coercion system,
> its
> string representation shows a warning to replace them by a copy. The
> attached
> branch implements copying of maps in some additional cases.
>
> `SchemeMorphism` can not inherit from `Morphism`, because of a bug with
> multiple inheritance of a Python class from Cython extension classes. But
> once
> this bug is fixed, we surely want to make `SchemeMorphism` inherit from
> `Morphism`. This transition is prepared here.
>
> In any case, the commit messages should give a concise description of
> what has
> been done.
>
> '''__Still TODO__'''
>
> Let the string representation of weakened maps point the user to the need
> of
> creating a copy.
>
> '''__TODO in future tickets__'''
>
> - Provide a documentation of the use of weak references in coercion, and
> of
> different ways of registering coercions, with their different impacts
> on
> garbage collecion.
> - Provide a version of `.register_coercion()` that weakens the coercion
> map. It would hence have the same effect as returning a map by
> `._coerce_map_from_()`, but of course `._coerce_map_from()` could not
> easily
> be changed in an interactive session.
> - provide copying for ''all'' kinds of maps.
>
> '''__Effects on the overall functioning of Sage__'''
>
> It is conceivable that some parts of Sage still suppose implicitly that
> stuff
> cached with `UniqueRepresentation` is ''permanently'' cached, even though
> the
> seemingly permanent cache was not more than a consequence of a memory
> leak in
> the coercion system. With the attached branch, garbage collection of
> parent
> structures will much more often become possible. Hence, code that relied
> on a
> fake-permanent cache would now need to create the same parent repeatedly.
>
> I (Simon) have tested how many additional parent creations occur with the
> attached branch when running `sage -t --all`. The findings are summarised
> in
> comment:107: The number of additional parent creations increased by not
> more
> than 1% for all but two parent classes (both related with tableaux). I
> also
> found that the time to run the tests did not significantly increase.
>
> Jean-Pierre has occasionally stated that some of his computations have
> been
> infeasible with the memory leak in the above example. I hope that his
> computations will now succeed.
New description:
The following quickly eats up memory:
{{{
sage: for D in xrange(2,2**32):
....: QuadraticField(-D);
....:
}}}
(This is with 5.10.rc0)
__Problem analysis__
The quadratic field is created with a coerce embedding into `CLF`. At the
same
time, this coerce embedding is stored in `CLF._coerce_from_hash`:
{{{
sage: phi = CLF.coerce_map_from(Q)
sage: phi is Q.coerce_embedding()
True
sage: Q in CLF._introspect_coerce()['_coerce_from_hash']
True
}}}
The "coerce_from_hash" is a `MonoDict`, hence, has only a weak reference
to the key
(Q, in this case). However, there still is a ''strong'' reference from
CLF to the coerce map phi. And phi has a strong reference to its
domain, thus, to Q. Hence, the existence of CLF prevents garbage
collection of
Q.
And there is a second chain of strong references from CLF to Q: From CLF
to
phi to the parent of phi (i.e., a homset) to the domain Q of this homset.
__Suggested solution__
We can not turn the reference from CLF to phi into a weak reference,
because
then even a strong reference to Q would not prevent phi from garbage
collection. Hence, we need to break the above mentioned reference chains
in
two points. In the attached branch, maps generally keep a strong reference
to
the codomain (this is important in composite maps and actions), but those
used
''in the coercion system'' (and ''only'' there!!) will only have a weak
reference to the domain, and they set the cdef `._parent` attribute to
`None`
(hence, we also override `.parent()`, so that it reconstructs the homset
if
the weak reference to the domain is still valid).
To preserve the `domain()/codomain()` interface, I have removed the method
`domain()` and have replaced it by a cdef public attribute that will
either
hold a weak reference (which returns the domain when called, hence, the
interface does not change) or a `ConstantFunction` (which should actually
be
faster to call than a method). Since accessing a cdef attribute is still
faster, the cdef attribute `_codomain` is kept (since this will always be
a
strong reference), but `_domain` has been removed.
This "weakening of references" is done for the coercions found by
`discover_coerce_map_from()` stored into `_coerce_from_hash`. So, this
mainly
happens for things done with `_coerce_map_from_()` and with composite
maps. Similarly for `_convert_from_hash`.
Weakening is ''not'' used on the maps that are explicitly registered by
`.register_embedding()` and `.register_coercion()`. This is in order to
preserve the connectivity of the coercion graph. The `register_*` methods
are only used on selected maps, that are of particular importance for the
backtrack search in `discover_coerce_map_from()`. These ''strong''
registrations do not propagate: Compositions of strongly registered
coercions found by `discover_coerce_map_from()` will be weakened.
Since weakened maps should not be used outside of the coercion system, its
string representation shows a warning to replace them by a copy. The
attached
branch implements copying of maps in some additional cases.
`SchemeMorphism` can not inherit from `Morphism`, because of a bug with
multiple inheritance of a Python class from Cython extension classes. But
once
this bug is fixed, we surely want to make `SchemeMorphism` inherit from
`Morphism`. This transition is prepared here.
In any case, the commit messages should give a concise description of what
has
been done.
'''__Still TODO__'''
- Let the string representation of weakened maps point the user to the
need of
creating a copy.
- Provide copying for ''all'' kinds of maps.
'''__TODO in future tickets__'''
- Provide a documentation of the use of weak references in coercion, and
of
different ways of registering coercions, with their different impacts on
garbage collecion.
- Provide a version of `.register_coercion()` that weakens the coercion
map. It would hence have the same effect as returning a map by
`._coerce_map_from_()`, but of course `._coerce_map_from()` could not
easily
be changed in an interactive session.
'''__Effects on the overall functioning of Sage__'''
It is conceivable that some parts of Sage still suppose implicitly that
stuff
cached with `UniqueRepresentation` is ''permanently'' cached, even though
the
seemingly permanent cache was not more than a consequence of a memory leak
in
the coercion system. With the attached branch, garbage collection of
parent
structures will much more often become possible. Hence, code that relied
on a
fake-permanent cache would now need to create the same parent repeatedly.
I (Simon) have tested how many additional parent creations occur with the
attached branch when running `sage -t --all`. The findings are summarised
in
comment:107: The number of additional parent creations increased by not
more
than 1% for all but two parent classes (both related with tableaux). I
also
found that the time to run the tests did not significantly increase.
Jean-Pierre has occasionally stated that some of his computations have
been
infeasible with the memory leak in the above example. I hope that his
computations will now succeed.
--
Comment:
I think changing the string representation of weakened maps should be done
here. And then, in a couple of tests, one needs to copy the map in order
to get the test pass.
Therefore, I suggest to implement copying for all maps ''here'' as well,
not on a different ticket. After all, it is not difficult: One just looks
at the list of cdef attributes, and implements `_extra_slots` and
`_update_slots` taking exactly these attributes into account. The only
difficulty is to really catch ''all'' kinds of maps.
Note that in most cases `phi == loads(dumps(phi))` would return False, but
this is since comparison of maps is often not implemented---and this is
what I will certainly ''not'' attempt to implement here.
--
Ticket URL: <http://trac.sagemath.org/ticket/14711#comment:118>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/groups/opt_out.