On Mon, Feb 16, 2009 at 11:09:34PM +0100, Nicolas Thiéry wrote:
>       Dear William, dear Anne, dear all,
> 
> > William, in another thread:
> > In a lot of the rest of Sage, doing pickling "by construction"
> > instead of "by data structure" is nearly pointless.  Very often one
> > uses pickles this way in Sage:
> 
> > sage: A = make_some_object(...)   # instant
> > sage: A.do_stuff()                            # takes a few minutes or hours
> > sage: # Now A has extra useful information cached about itself
> > sage: A.save('A')             # save my valuable work
> 
> > If A were pickled "by construction" as you suggest above, then
> > everything cached about A vanishes.
> 
> > Maybe in sage-combinat objects don't get enriched like in number
> > theory (say), so this just isn't an issue for you.
> 
> Thanks for pointing out this use case; as Anne mentioned, we also
> encounter it, but it is a somewhat marginal (though important!) use
> case for us: we tend to have a huge number of fairly trivial parents.
> 
> I'll give an updated design proposal below.
> 
> But first a short preamble for Anne:
> 
> ------------------------------------------------------------------------------
> "In computer science, in the context of data storage and transmission,
> serialization is the process of converting an object into a sequence
> of bits so that it can be stored on a storage medium (such as a file,
> or a memory buffer) or transmitted across a network connection
> link. When the resulting series of bits is reread according to the
> serialization format, it can be used to create a semantically
> identical clone of the original object. For many complex objects, such
> as those that make extensive use of references, this process is not
> straightforward." http://en.wikipedia.org/wiki/Serialization
> 
> In Python, serialization is called pickling. Think of putting your
> object into a jar of vinegar to preserve it for later reuse; hence the
> term picklejar for a collection of pickled objects.
> 
> Note that there are two levels of difficulty with pickling:
>  - being able to pickle and unpickle properly;
>  - being able to unpickle an object that was created a long time
>    ago by a much older version of Sage.
> 
> How to pickle C = Partitions(40, min_part = 2)?
> 
>  - "by data structure": this is how Python pickles an object by
>    default: it stores the object's class and its data structure (i.e.,
>    roughly its dictionary). On unpickling, an object with the same
>    class and data structure is recreated.
> 
>  - "by construction": instead, we could store that C was constructed
>    by Partitions, with 40, min_part = 2 as arguments. As William
>    pointed out, we might also want to store that there are 6153 of
>    them, as this is not instantaneous to compute.
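To make the two options concrete, here is a small plain-Python sketch (not Sage code: `count_partitions`, `PartitionsByData` and `PartitionsByConstruction` are hypothetical toy stand-ins for Partitions(40, min_part = 2)). The first class relies on Python's default pickling of the instance dictionary; the second defines `__reduce__` so that only the constructor arguments are stored. It also illustrates William's point: pickled by construction, the cached count is lost unless it is explicitly added to the pickle.

```python
import functools
import pickle


def count_partitions(n, min_part):
    """Number of partitions of n whose parts are all >= min_part."""
    @functools.lru_cache(maxsize=None)
    def count(remaining, smallest):
        # Partitions of `remaining` into parts >= `smallest`, with parts
        # listed in weakly increasing order so each is counted once.
        if remaining == 0:
            return 1
        return sum(count(remaining - part, part)
                   for part in range(smallest, remaining + 1))
    return count(n, min_part)


class PartitionsByData:
    """Pickled "by data structure": Python's default (class + __dict__)."""
    def __init__(self, n, min_part):
        self.n = n
        self.min_part = min_part
        self._cardinality = None  # cache, filled in on demand

    def cardinality(self):
        if self._cardinality is None:
            self._cardinality = count_partitions(self.n, self.min_part)
        return self._cardinality


class PartitionsByConstruction(PartitionsByData):
    """Pickled "by construction": only the constructor arguments are stored."""
    def __reduce__(self):
        return (PartitionsByConstruction, (self.n, self.min_part))


A = PartitionsByData(40, 2)
B = PartitionsByConstruction(40, 2)
A.cardinality()
B.cardinality()
A2 = pickle.loads(pickle.dumps(A))
B2 = pickle.loads(pickle.dumps(B))
print(A2._cardinality)  # 6153: the cache travelled with the data structure
print(B2._cardinality)  # None: rebuilt from (40, 2), cache not stored
```

Note that the "by construction" pickle never mentions `_cardinality`, so renaming that attribute later cannot break it.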
> 
> ------------------------------------------------------------------------------
> 
> Claims:
> 
> (a) For most parents, the default python pickling method "by data
>     structure" impedes refactoring by increasing the potential for
>     backward compatibility issues.
> 
> (b) Implementing pickling is tricky; at the same time, we want to
>     encourage users to implement new parents (I can argue this point
>     separately if needed). So, as much as possible, we want a sane
>     default pickling implementation that works in most use cases.
> 
> (c) Among other features, it should be easy to specify which parts of
>     the cache should be pickled, or not.
> 
> Rationale for (a)
>  - Pickling by data structure violates the encapsulation principle
>    and reveals implementation details. This is particularly true for
>    Parents, which tend to have intricate data structures in which many
>    attributes are mostly of a technical nature:
> 
>       sage: QQ[x].__dict__
>       {'_PolynomialRing_general__cyclopoly_cache': {},
>        '_PolynomialRing_general__generator': x,
>        '_PolynomialRing_general__is_sparse': False,
>        '_PolynomialRing_general__polynomial_class': <class 'sage.rings.polynomial.polynomial_element_generic.Polynomial_rational_dense'>,
>        '_has_singular': True,
>        '_implementation_names': (None,),
>        '_implementation_repr': ''}
> 
>    These can be various caches, data for the coercion mechanism,
>    etc. All of them are likely to evolve, quite possibly
>    independently of the code for the parent itself.
> 
>  - The actual class of an object could also be considered an
>    implementation detail, which we would like to be free to
>    change. This becomes particularly acute with the new category
>    mechanism, where classes are created on the fly depending on the
>    hierarchy of categories: again, something that is likely to evolve
>    independently of the code for the parent itself.
> 
>  - Parents more often than not have (or should have) unique
>    representation. How can this be guaranteed unless all parents of
>    a kind are constructed / reconstructed through a single
>    construction?
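The last two points can be illustrated together with a plain-Python toy (not Sage code; `parent_class`, `UniqueParent` and `make_parent` are hypothetical names, and building classes with `type()` is only a stand-in for what a category mechanism might do). Default pickling records an object's class by module and name, so an instance of a class created on the fly cannot be pickled at all. Pickling by construction only needs one stable, importable construction function, and routing unpickling through that function also preserves unique representation.

```python
import functools
import pickle


@functools.lru_cache(maxsize=None)
def parent_class(category):
    """Hypothetical: build (once per category) a class on the fly."""
    return type("Parent_in_" + category, (UniqueParent,), {"category": category})


class UniqueParent:
    """Toy parent with unique representation, pickled by construction."""
    _instances = {}

    def __reduce__(self):
        # Store how to rebuild the object, not which class it happens to be.
        return (make_parent, (self.category, self.args))


def make_parent(category, args):
    """Hypothetical single construction path: one instance per (category, args)."""
    key = (category, args)
    if key not in UniqueParent._instances:
        obj = parent_class(category)()
        obj.args = args
        UniqueParent._instances[key] = obj
    return UniqueParent._instances[key]


def plain_dynamic_instance(category):
    """Same idea without __reduce__: default pickling must locate the class."""
    return type("Dynamic_" + category, (object,), {})()


try:
    pickle.dumps(plain_dynamic_instance("Rings"))
    default_pickling_works = True
except (pickle.PicklingError, AttributeError):
    default_pickling_works = False  # class not found as module.name

P = make_parent("Rings", (40, 2))
Q = pickle.loads(pickle.dumps(P))
print(default_pickling_works)  # False
print(Q is P)                  # True
```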
> 
> 
> A proposal:
> 
>  - Modify @cache_method to offer the following two variants (better
>    names welcome!):
> 
>        @cache_method
>        def bla(self, truc):
>            ...
> 
>        @cache_method_no_pickle
>        def ble(self, truc):
>            ...
> 
>    The caches for o.bla(...) and o.ble(...) would be stored,
>    respectively, in
>     - o._cache['bla']
>     - o._cache_no_pickle['ble']
> 
>    (instead of in o._cache__ble, as is currently the case).
> 
> 
>  - If a parent P has a construction method, pickle P by its
>    construction, and also include its cache P._cache in the pickle.
> 
>    Upon unpickling, call the construction and put the cache back.
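A minimal plain-Python sketch of the whole proposal, under explicit assumptions: the two decorators keep results in `self._cache` / `self._cache_no_pickle`, keyed by method name and then by the (hashable, positional) arguments; `ToyParent` is a hypothetical toy class with unique representation via `__new__`; `__reduce__` returns the construction plus the picklable cache as state; and `__setstate__` merges that cache into whatever the construction returned. This is not the existing Sage decorator, and the merge policy (entries already present win) is only one possible answer to the first issue below.

```python
import functools
import pickle


def _caching_decorator(attribute):
    """Build a decorator caching results in self.<attribute>[method name]."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args):
            # Only hashable positional arguments are supported in this sketch.
            store = self.__dict__.setdefault(attribute, {})
            cache = store.setdefault(method.__name__, {})
            if args not in cache:
                cache[args] = method(self, *args)
            return cache[args]
        return wrapper
    return decorator


cache_method = _caching_decorator("_cache")
cache_method_no_pickle = _caching_decorator("_cache_no_pickle")


class ToyParent:
    """Hypothetical parent with unique representation, pickled by construction."""
    _instances = {}

    def __new__(cls, *args):
        key = (cls, args)
        if key not in cls._instances:
            obj = super().__new__(cls)
            obj._args = args
            cls._instances[key] = obj
        return cls._instances[key]

    def __reduce__(self):
        # (construction, arguments, state): only the picklable cache is kept.
        return (type(self), self._args, {"_cache": self.__dict__.get("_cache", {})})

    def __setstate__(self, state):
        # The construction may return a parent that already exists in this
        # session; merge, keeping entries that are already present.
        mine = self.__dict__.setdefault("_cache", {})
        for name, entries in state["_cache"].items():
            target = mine.setdefault(name, {})
            for args, value in entries.items():
                target.setdefault(args, value)

    @cache_method
    def bla(self, truc):
        return sum(self._args) * truc

    @cache_method_no_pickle
    def ble(self, truc):
        return sum(self._args) + truc


P = ToyParent(40, 2)
P.bla(3)                          # 126, stored in P._cache['bla']
P.ble(3)                          # 45, stored in P._cache_no_pickle['ble']
data = pickle.dumps(P)

ToyParent._instances.clear()      # simulate a fresh Sage session
Q = pickle.loads(data)
print(Q is P)                     # False: rebuilt by construction
print(Q._cache)                   # {'bla': {(3,): 126}}
print("_cache_no_pickle" in Q.__dict__)  # False
print(pickle.loads(data) is Q)    # True: same parent, caches merged
```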
> 
> 
> Issues:
> 
>  - The construction may already have started to populate the
>    cache. This is typically the case for a parent with unique
>    representation, where the same parent may already exist in the
>    current Sage session. Some cache-merging mechanism is needed.
> 
>  - The construction may want to use pieces of the cache if they are
>    available. How should they be handed down? Systematically passing
>    the cache as an extra argument to the construction is not desirable,
>    since it would mean that all parent implementers would have to
>    handle it in their __init__ method.

I forgot to mention: unless we take some care, this change will
probably break most pickle jars at once, or at least the "data
structure" ones (but Michael won't mind, right? :-)). The same will
happen again with the introduction of the category framework.

However, I guess it should be possible to synchronise both changes
and work around this problem by adding just a bit of backward
compatibility code in a few very specific places.
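For that backward compatibility code, the Python standard library already offers a hook: subclassing `pickle.Unpickler` and overriding `find_class` lets old pickles that refer to a renamed or removed class be redirected to its replacement. A minimal sketch, with `OldPartitions` / `NewPartitions` as made-up names and the renaming table as an assumption (Sage may well want its own registration mechanism on top of this):

```python
import io
import pickle


class OldPartitions:
    """Stand-in for a class as it existed in an older version."""
    def __init__(self, n):
        self.n = n


# A pickle produced by the "old version"...
old_pickle = pickle.dumps(OldPartitions(40))
# ...after which the class is refactored away, so plain pickle.loads fails.
del OldPartitions


class NewPartitions:
    def __init__(self, n):
        self.n = n


# Hypothetical table mapping old (module, name) pairs to current classes.
RENAMED = {(__name__, "OldPartitions"): NewPartitions}


class CompatUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Redirect known old names; defer to the default lookup otherwise.
        if (module, name) in RENAMED:
            return RENAMED[(module, name)]
        return super().find_class(module, name)


obj = CompatUnpickler(io.BytesIO(old_pickle)).load()
print(type(obj).__name__, obj.n)  # NewPartitions 40
```

This only redirects class lookups; if the data structure itself changed, a `__setstate__` on the new class would also be needed, which is exactly the kind of fragility pickling by construction tries to avoid.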

Cheers,
                                Nicolas
--
Nicolas M. Thiéry "Isil" <[email protected]>
http://Nicolas.Thiery.name/

--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---
