[sage-devel] Re: Pickling and refactoring of combinat

Nicolas M. Thiery Mon, 16 Feb 2009 14:09:51 -0800

        Dear William, dear Anne, dear all,

> William, in another thread:
> In a lot of the rest of Sage, doing pickling "by construction"
> instead of "by data structure" is nearly pointless.  Very often one
> uses pickles this way in Sage:


> sage: A = make_some_object(...)   # instant
> sage: A.do_stuff()                            # takes a few minutes or hours
> sage: # Now A has extra useful information cached about itself
> sage: A.save('A')             # save my valuable work

> If A were pickled "by construction" as you suggest above, then
> everything cached about A vanishes.

> Maybe in sage-combinat objects don't get enriched like in number
> theory (say), so this just isn't an issue for you.

Thanks for pointing out this use case; as Anne mentioned, we also
encounter it, but for us it is a slightly marginal (but important!)
use case; we tend to have a huge number of fairly trivial parents.

I'll give an updated design proposal below.

But first a short preamble for Anne:

------------------------------------------------------------------------------
"In computer science, in the context of data storage and transmission,
serialization is the process of converting an object into a sequence
of bits so that it can be stored on a storage medium (such as a file,
or a memory buffer) or transmitted across a network connection
link. When the resulting series of bits is reread according to the
serialization format, it can be used to create a semantically
identical clone of the original object. For many complex objects, such
as those that make extensive use of references, this process is not
straightforward." http://en.wikipedia.org/wiki/Serialization

In Python, serialization is called pickling. Think of putting your
object into a jar of vinegar to preserve it for later reuse. Hence the
term picklejar for a bunch of objects which have been pickled.

Note that there are two levels of difficulties with pickling:
 - being able to pickle and unpickle properly
 - being able to unpickle an object that has been created a long time
   ago by a much older version of Sage.

How to pickle C = Partitions(40, min_part = 2)?

 - "by data structure": this is how an object is pickled by default
   in Python: store its class and its data structure (i.e., roughly
   its dictionary). On unpickling, an object with the same class and
   data structure is recreated.

 - "by construction": in that case, we would instead store that C was
   constructed by Partitions, with 40, min_part = 2 as arguments.
   As William pointed out, we might also want to store that there are
   6153 of them, as this is not instantaneous to compute.
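To make the two options concrete, here is a minimal pure-Python sketch.
It does not use Sage: ByData, ByConstruction and _rebuild are
hypothetical stand-ins for Partitions, and _internal_table stands for
any implementation detail we would rather keep out of the pickle.

```python
import pickle

class ByData:
    """Default pickling: the class and the instance __dict__ are stored."""
    def __init__(self, n, min_part=1):
        self.n = n
        self.min_part = min_part
        self._internal_table = {}   # implementation detail, ends up in the pickle

class ByConstruction:
    """Pickled as 'call the constructor with (n, min_part)' plus chosen extras."""
    def __init__(self, n, min_part=1):
        self.n = n
        self.min_part = min_part
        self._internal_table = {}   # implementation detail, NOT pickled
        self.cardinality = None     # expensive result worth keeping

    def __reduce__(self):
        # (callable, args, state): on unpickling, pickle calls
        # _rebuild(n, min_part) and then passes state to __setstate__.
        return (_rebuild, (self.n, self.min_part),
                {"cardinality": self.cardinality})

    def __setstate__(self, state):
        self.cardinality = state["cardinality"]

def _rebuild(n, min_part):
    # Hypothetical stand-in for "constructed by Partitions".
    return ByConstruction(n, min_part=min_part)

a = ByData(40, min_part=2)
a._internal_table["scratch"] = 1
b = pickle.loads(pickle.dumps(a))      # whole __dict__ comes back

c = ByConstruction(40, min_part=2)
c.cardinality = 6153
c._internal_table["scratch"] = 1
d = pickle.loads(pickle.dumps(c))      # rebuilt, then cardinality restored
```

With the second class, renaming or dropping _internal_table in a later
version cannot break old pickles, since it never reaches them.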

------------------------------------------------------------------------------

Claims:

(a) For most parents, the default python pickling method "by data
    structure" impedes refactoring by increasing the potential for
    backward compatibility issues.

(b) Implementing pickling is tricky; at the same time, we want to
    promote the implementation of new parents by users (I can make
    that case separately on demand). So, as much as possible, we want
    a sane default pickling implementation that works in most use
    cases.

(c) Among other features, it should be easy to specify which parts of
    the cache should be pickled, or not.

Rationale for (a)
 - Pickling by data structure violates the encapsulation principle,
   and reveals implementation details. This is particularly true for
   Parents, which tend to have intricate data structures where many
   attributes are mostly of a technical nature:

      sage: QQ[x].__dict__
      {'_PolynomialRing_general__cyclopoly_cache': {},
       '_PolynomialRing_general__generator': x,
       '_PolynomialRing_general__is_sparse': False,
       '_PolynomialRing_general__polynomial_class': <class 'sage.rings.polynomial.polynomial_element_generic.Polynomial_rational_dense'>,
       '_has_singular': True,
       '_implementation_names': (None,),
       '_implementation_repr': ''}

   This can be various caches, data for the coercion mechanism,
   etc. All those things are likely to evolve, very possibly
   independently of the code for the parent itself.

 - The actual class of an object could also be considered an
   implementation detail, which we would like to be free to
   change. This becomes particularly acute with the new category
   mechanism, where classes are created on the fly, depending on the
   hierarchy of categories; again, something that is likely to evolve
   independently of the code for the parent itself.

 - Parents more often than not have (or should have) unique
   representation. How to guarantee this without having all parents of
   a kind be constructed / reconstructed through a single
   construction?


A proposal:

 - Modify @cache_method to offer the two following variants (better
   names welcome!)

       @cache_method
       def bla(self, truc):
           ...

       @cache_method_no_pickle
       def ble(self, truc):
           ...

   The caches for o.bla(...) and o.ble(...) would be stored
   respectively in
    - o._cache['bla']
    - o._cache_no_pickle['ble']

   (instead of o._cache__ble currently)
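A rough sketch of how the two variants could be written in plain Python.
Only the names cache_method, cache_method_no_pickle, _cache and
_cache_no_pickle come from the proposal; the helper _make_cached, the
class Demo and the restriction to hashable positional arguments are
assumptions made for the sake of the example.

```python
import functools

def _make_cached(store_name):
    """Build a caching decorator whose results live in the attribute
    store_name of the object, under the name of the method."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args):
            store = self.__dict__.setdefault(store_name, {})
            per_method = store.setdefault(method.__name__, {})
            if args not in per_method:          # args must be hashable
                per_method[args] = method(self, *args)
            return per_method[args]
        return wrapper
    return decorator

cache_method = _make_cached("_cache")
cache_method_no_pickle = _make_cached("_cache_no_pickle")

class Demo:
    @cache_method
    def bla(self, truc):
        return truc * 2

    @cache_method_no_pickle
    def ble(self, truc):
        return truc + 1

    def __getstate__(self):
        # Drop the part of the cache that must not be pickled.
        state = dict(self.__dict__)
        state.pop("_cache_no_pickle", None)
        return state

o = Demo()
o.bla(3)
o.ble(3)
```

After these two calls, o._cache holds the result of bla and
o._cache_no_pickle the result of ble, and only the former appears in
the state returned by __getstate__.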


 - If a parent P has a construction method, pickle P by its
   construction. Also include P._cache in the pickle.

   Upon unpickling, call the construction, and insert back the cache.
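A sketch of what this could look like, again in plain Python rather than
Sage. The registry CONSTRUCTORS, the function _unpickle_by_construction,
the attribute _construction and the class ToyPartitions are all
hypothetical; the point is only that the pickle records a public
constructor name, its arguments and the picklable cache, and nothing
about the class or the rest of the dictionary.

```python
import pickle

CONSTRUCTORS = {}   # hypothetical registry: public name -> callable

def _unpickle_by_construction(name, args, kwargs, cache):
    # Call the construction, then insert the pickled cache back.
    parent = CONSTRUCTORS[name](*args, **kwargs)
    parent._cache.update(cache)
    return parent

class ToyPartitions:
    def __init__(self, n, min_part=1):
        # Remember how this parent was constructed.
        self._construction = ("Partitions", (n,), {"min_part": min_part})
        self._cache = {}
        self._cache_no_pickle = {}

    def __reduce__(self):
        name, args, kwargs = self._construction
        # Neither the class nor _cache_no_pickle is mentioned here.
        return (_unpickle_by_construction, (name, args, kwargs, self._cache))

CONSTRUCTORS["Partitions"] = ToyPartitions

P = ToyPartitions(40, min_part=2)
P._cache["cardinality"] = 6153            # valuable, should survive
P._cache_no_pickle["scratch"] = "temp"    # should not survive
Q = pickle.loads(pickle.dumps(P))
```

Since the pickle only refers to the name "Partitions", the class behind
it could be renamed or replaced later, as long as the registry entry
keeps accepting the same arguments.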


Issues:

 - The construction may have already started to populate the
   cache. This is typically true for a parent with unique
   representation, where the same parent can preexist in the same Sage
   session. There needs to be some cache-merging mechanism.

 - The construction may want to use pieces from the cache if they are
   available. How to hand them down? Systematically passing the cache
   as an extra argument to the construction is not desirable, since
   this means that all parent implementers would have to handle it in
   their __init__ method.
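To illustrate the first issue only, here is a small sketch of cache
merging for a parent with unique representation. Everything in it is an
assumption: the table _INSTANCES, the names make_parent, UniqueParent
and _unpickle_unique, and in particular the merging policy (entries
already present in the live parent win over pickled ones).

```python
import pickle

_INSTANCES = {}   # hypothetical table of live parents, keyed by construction

class UniqueParent:
    def __init__(self, key):
        self.key = key
        self._cache = {}

    def __reduce__(self):
        return (_unpickle_unique, (self.key, self._cache))

def make_parent(key):
    """Return the unique parent for key, creating it on first use."""
    if key not in _INSTANCES:
        _INSTANCES[key] = UniqueParent(key)
    return _INSTANCES[key]

def _unpickle_unique(key, pickled_cache):
    parent = make_parent(key)      # may be a parent that already exists
    for name, value in pickled_cache.items():
        # Assumed policy: keep what the live parent already computed.
        parent._cache.setdefault(name, value)
    return parent

P = make_parent(("Partitions", 40, 2))
P._cache["cardinality"] = 6153
data = pickle.dumps(P)

# Later in the same session: the parent still exists, with another cache.
P._cache.clear()
P._cache["first"] = "live value"
Q = pickle.loads(data)
```

Here Q is the very same object as P, and its cache contains both the
live entry and the one restored from the pickle. The second issue
(letting the construction itself use the cache) is not addressed by
this sketch.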

Best regards,
                                Nicolas

PS: since we discussed this over the phone, I mention right away a:

+1 from Florent

--
Nicolas M. Thiéry "Isil" <[email protected]>
http://Nicolas.Thiery.name/
