#14711: Memleak when creating QuadraticField
-------------------------------------+-------------------------------------
       Reporter:  jpflori            |        Owner:  davidloeffler
           Type:  defect             |       Status:  needs_review
       Priority:  critical           |    Milestone:  sage-5.13
      Component:  number fields      |   Resolution:
       Keywords:  memleak, number    |    Merged in:
  field, QuadraticField              |    Reviewers:
        Authors:  Simon King         |  Work issues:  Fix elliptic curves
Report Upstream:  N/A                |  code
         Branch:                     |       Commit:
  u/SimonKing/ticket/14711           |  05fb569cb132a8c89713021f1f4b25cd2dd7cb1c
   Dependencies:                     |     Stopgaps:
-------------------------------------+-------------------------------------

Comment (by SimonKing):

 Replying to [comment:96 nbruin]:
 > I agree that there is a place for such strong connections, but I have
 severe reservations about declaring it [`.register_coercion()`] is the
 only way or even the default way to inform the system
 > about coercions.

 Well, I have mentioned `._coerce_map_from_(...)` in several previous
 posts, and if you look at my thematic tutorial on categories and coercion,
 you'll find that I consider ''this'' the default. And it only yields
 ''weak'' caching.

 > I have severe reservations about declaring that this code "will never be
 memory efficient" in sage.

 I think that we want ''some'' particularly important coercions to be tied
 to the lifetime of the codomain, and thus we use `.register_coercion()`,
 and we want other coercions to be tied to the minimum of the lifetimes of
 domain and codomain, and thus we use `._coerce_map_from_()`. I don't think
 we have a problem here.

 > Consider:
 > {{{
 > Qx.<x>=QQ[]
 > K.<a>=NumberField(x^4-2)
 > L.<b>=NumberField(x^2-2,embedding=a^2)
 > }}}
 > This fits perfectly in the "unchanging universe" model. Also note that
 the coercion system does not need to let L keep K alive, since the
 construction parameters,
 > which get kept alive for the life of L by `CachedRepresentation` or
 something analogous, refer to K already.

 It isn't `CachedRepresentation`, but this doesn't matter.

 > Now consider
 > {{{
 > M.<b>=NumberField(x^2-2)
 > }}}
 > In the "unchanging universe" (and in sage as well) we have that M is
 distinct from L. However, I think it's unrealistic to expect that all
 "embeddings" etc. can be
 > specified at construction time.

 Again, nobody has claimed that everything needs to be declared at
 construction
 time. There are some particularly important coercions registered at
 construction time, namely '''the''' coerce embedding (if it exists then it
 is
 unique) and those installed by `.register_coercion()`. Everything else is
 dynamical, based on `_coerce_map_from_()`.

 In my description from comment:84, note that the digraph is not totally
 static. It has static parts (corresponding to coerce embeddings and
 coercions
 fixed by `.register_coercion()`) and dynamic shortcuts (corresponding to
 `_coerce_map_from_`).


 > So I think, even though it's not possible currently in sage, that one
 should allow for
 > {{{
 > m1=Hom(M,K)([a^2])
 > m2=Hom(M,K)([-a^2])
 > M.register_embedding(m1)
 > }}}

 I don't know if this is reasonable, but at least it is against what people
 originally wanted with the coerce embedding. If you declare the coerce
 embedding phi from a number field `K` to, say, `CC`, then you consider `K`
 as a
 subfield of `CC`. If you provide another number field `L`, which is
 isomorphic
 to `K`, with a different embedding psi into `CC`, then adding an element
 of
 `K` to an element of `L` is done by embedding both into `CC` and then
 adding
 complex numbers.

 Since we think of K and L as different subfields of `CC` and not as
 abstract
 fields, we ''must'' consider K and L as different objects, and so the
 different
 embedding must play a role in the cache key for K and L. This is why they
 have
 to be provided at construction time.

 It would be a totally different way of thinking if you tried to do the
 same
 with `CC.register_coercion(phi/psi)` or with
 `CC._coerce_map_from_(K/L)`. Namely, in both cases, you would not be able
 to
 add elements of K and L, because neither K nor L would know about the
 embedding. And in fact you would consider K and L as abstract fields, and
 you
 would in fact want `K is L` (at least if you fancy unique parents, which I
 do...). And then the axioms for coercion would strike: There can be at
 most
 one coercion from `K` (i.e., `L`) to `CC`. Hence, you could ''not''
 simultaneously declare different embeddings of `K` into `CC` as coercions.

 Since it pretty much seems to me that number theorists want to comfortable
 compute with different isomorphic subfields of `CC`, it would thus simply
 not
 feasible to restrict oneself to `.register_coercion` and
 `_coerce_map_from_`:
 One ''needs'' coerce embeddings, and one ''needs'' that they are part of
 the
 defining data of a number field.

 > Note that the choice of `m1` or `m2` here leads to different relations
 between `M` and `K` and hence different universes.
 > In other words, our concept of "globally unique" is not powerful enough
 to capture the
 > full identity of objects, which would include the coercion relations
 with
 > objects that haven't been "discovered" yet.

 I would state it differently. In order to define `K` (a subfield of CC),
 there
 is no way around providing the embedding during creation. "Discovering" a
 coercion relation seems the wrong approach here.

 And speaking about memory: The embedding of K into CC is stored as an
 attribute of K, not of CC. Hence, K keeps CC alive, but CC does not
 prevent K
 from garbage collection. So, I really don't understand where you see a
 problem.

 > In practice, we can usually work around that by
 > for instance changing the names of generators and hence create
 artificially differently labelled objects but that's already not a
 possibility for creating
 > different copies of `ZZ^n`, since there are no generator names to choose
 there.

 Well, if one has obvious distinguishing data, such as an embedding, then
 there
 is nothing artificial when using them.

 > I think one has to accept the reality here: what we have is a collection
 of objects whose relations do change in time.

 No. I don't see anything dynamic in your "embedded numberfield" examples.
 A
 subfield is a subfield is a subfield.

 > That's not the only thing coercion does. It may also find "common
 covering
 > structures", which may lead to construction of new parents. Those
 definitely
 > don't deserve to get nailed into memory. Yet, the code that creates
 these
 > parents will look (to the coercion system) as a "user", so it would be
 using
 > these strong-referencing coercion registration routines.

 What you seem to mention here is the pushout construction. It mainly
 relies on
 "construction functors". I don't even know if it takes the coerce
 embeddings
 into account at all.

 Anyway, the new parents created by pushouts indeed play the same role as
 parents
 created by the user. Let's try to be more concrete. Let P and Q be
 parents, you want to add an element p of P to an element q of Q, and the
 pushout
 construction finds a parent R such that both P and Q coerce into R,
 allowing
 to perform the addition in R, resulting in `r=R(p)+R(q)`.

 Now, it could be that `R.register_coercion(P)` and
 `R.register_coercion(Q)`
 are both executed in `R.__init__` (but see the remark below). In the
 current
 code (also in my branch), this would imply a strong reference chain from R
 to
 both P and Q. Hence, even if you did `del p,q,P,Q`, P and Q could not be
 garbage collected.

 But I don't think we should see this problematic, for several reasons:

 - Pushout constructions don't arise particularly often. Normally, either P
   coerces into Q or Q coerces into P, or both embed into the same parent
   anyway, and I have mentioned above: With a coerce embedding, the
 existance
   of R would not prevent P and Q from garbage collection (plus, it has
 nothing
   to do with pushout anyway...)

 - Is `register_coercion` really used so often? I think `_coerce_map_from_`
 is
   more commonly used, and then the existence of R would ''not'' prevent P
 and
   Q from garbage collection.

 > I think it may well be a feature, not a bug, that one at some point can
 just
 > be left with the shortcuts and that the intermediates have fallen out.

 How would you guarantee that you kept in mind enough shortcuts to not
 change
 connectivity by throwing away intermediates?

 > The
 > natural way of staying close to "discovering a permanent universe" is by
 > never throwing away anything that has been discovered, and I think we
 agree
 > that's a "memory leak".

 No, we disagree.

 It is a memory leak if a connected component (not taking into account
 shortcuts) of the coercion graph can not be garbage collected, even though
 there is no external strong reference (which may be a coerce embedding) to
 any
 vertex of the connected component.

 Note that this was exactly the problem with the example from the ticket
 description! As I have pointed out in comment:18, it was ''not'' the case
 that
 the problem lay in `_coerce_from_list_`, because this was empty. In
 particular, it was ''not'' the case that `.register_coercion()` was to
 blame.

 Instead, the memory leak came from short-cuts, i.e., from the stuff
 stored in `_coerce_from_hash`, which can also be seen in
 [attachment:chain.png].

 Hence, the quadratic field and CC did belong to different connected
 components
 of the coercion graph, but the shortcut kept Q alive.

 > > And there surely is a means available to add a coercion that doesn't
 tie the
 > > lifespan of two parents too closely: Implement
 `P._coerce_map_from_(Q)`,
 > > which can return a map or simply "True" (in the latter case,
 conversion is
 > > used to coerce Q into P). The result is cached in
 `P._coerce_from_hash`, but
 > > not in `P._coerce_from_list`.

 > You mean: implement
 > `P._install_coerce_map_from_(m)`, which does:
 > `_coerce_from_hash[Domain(m)]=m`. I think it is quite important to be
 able
 > to manipulate the coercion graph without having to modify library code.

 This might be a good idea. So, we would have `.register_coercion()` for
 permanent pathways, and `_install_coerce_map_from()` (with a similar
 semantics, i.e., you can either just provide the parent or a whole
 morphism)
 for impermanent shortcuts.

--
Ticket URL: <http://trac.sagemath.org/ticket/14711#comment:99>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to