On Mon, Feb 16, 2009 at 11:09:34PM +0100, Nicolas Thiéry wrote:
> Dear William, dear Anne, dear all,
>
> > William, in another thread:
> > In a lot of the rest of Sage, doing pickling "by construction"
> > instead of "by data structure" is nearly pointless. Very often one
> > uses pickles this way in Sage:
>
> > sage: A = make_some_object(...) # instant
> > sage: A.do_stuff() # takes a few minutes or hours
> > sage: # Now A has extra useful information cached about itself
> > sage: A.save('A') # save my valuable work
>
> > If A were pickled "by construction" as you suggest above, then
> > everything cached about A vanishes.
>
> > Maybe in sage-combinat objects don't get enriched like in number
> > theory (say), so this just isn't an issue for you.
>
> Thanks for pointing out this use case; as Anne mentioned, we also
> encounter it, but for us it is a slightly marginal (though important!)
> use case: we tend to have a huge number of fairly trivial parents.
>
> I'll give an updated design proposal below.
>
> But first a short preamble for Anne:
>
> ------------------------------------------------------------------------------
> "In computer science, in the context of data storage and transmission,
> serialization is the process of converting an object into a sequence
> of bits so that it can be stored on a storage medium (such as a file,
> or a memory buffer) or transmitted across a network connection
> link. When the resulting series of bits is reread according to the
> serialization format, it can be used to create a semantically
> identical clone of the original object. For many complex objects, such
> as those that make extensive use of references, this process is not
> straightforward." http://en.wikipedia.org/wiki/Serialization
>
> In Python, serialization is called pickling: think of putting your
> object into a jar of vinegar to preserve it for later use. Hence the
> term picklejar for a collection of objects that have been pickled.
>
> Note that pickling involves two levels of difficulty:
> - being able to pickle and unpickle properly;
> - being able to unpickle an object that was created long ago by a
> much older version of Sage.
>
> How to pickle C = Partitions(40, min_part = 2)?
>
> - "by data structure": this is how Python pickles an object by
> default: store its class and its data structure (i.e., roughly
> its dictionary). On unpickling, an object with the same class and
> data structure is recreated.
>
> - "by construction": instead, we could store that C was
> constructed by calling Partitions with the arguments 40 and
> min_part = 2. As William pointed out, we might also want to store
> that there are 6153 of them, as this is not instantaneous to compute.
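For concreteness, here is a minimal plain-Python sketch of the two strategies. ToyPartitions, ToyPartitionsByConstruction and count_partitions are hypothetical stand-ins (not Sage's Partitions); the only real machinery used is the standard pickle module and its __reduce__ protocol.

```python
import pickle

def count_partitions(n, min_part):
    """Number of partitions of n whose parts are all >= min_part (simple DP)."""
    ways = [1] + [0] * n  # ways[m]: partitions of m using the parts seen so far
    for part in range(min_part, n + 1):
        for m in range(part, n + 1):
            ways[m] += ways[m - part]
    return ways[n]

class ToyPartitions:
    """Hypothetical stand-in for Partitions(n, min_part=k); default pickling."""
    def __init__(self, n, min_part=1):
        self._n = n
        self._min_part = min_part
        self._cardinality = None  # cache, filled in lazily

    def cardinality(self):
        if self._cardinality is None:
            self._cardinality = count_partitions(self._n, self._min_part)
        return self._cardinality

class ToyPartitionsByConstruction(ToyPartitions):
    """Same object, but pickled by construction."""
    def __reduce__(self):
        # Only "call this class with (n, min_part)" is stored; __dict__ is ignored.
        return (type(self), (self._n, self._min_part))

C = ToyPartitions(40, min_part=2)
C.cardinality()                    # 6153, now cached in C.__dict__
D = pickle.loads(pickle.dumps(C))  # by data structure
print(D._cardinality)              # 6153: the cache travelled with __dict__

E = ToyPartitionsByConstruction(40, min_part=2)
E.cardinality()
F = pickle.loads(pickle.dumps(E))  # by construction
print(F._cardinality)              # None: rebuilt from (40, 2), cache lost
```

This is William's point in miniature: pickling by construction alone drops whatever was cached, unless selected cached values are stored alongside the construction.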
>
> ------------------------------------------------------------------------------
>
> Claims:
>
> (a) For most parents, the default python pickling method "by data
> structure" impedes refactoring by increasing the potential for
> backward compatibility issues.
>
> (b) Implementing pickling is tricky; at the same time, we want to
> encourage users to implement new parents (I can argue this
> separately if needed). So, as much as possible, we want a sane
> default pickling implementation that works in most use cases.
>
> (c) Among other features, it should be easy to specify which parts of
> the cache should be pickled, or not.
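To illustrate claim (a), a minimal plain-Python sketch using two hypothetical classes, Fragile and Sturdy, and only the standard pickle module. Redefining the classes later in the same script stands in for a new release in which an internal attribute is renamed. The object pickled by data structure still unpickles, but into a broken object; the one pickled by construction survives the same refactoring.

```python
import pickle

# Old release
class Fragile:
    """Pickled by data structure (Python's default): its __dict__ is stored."""
    def __init__(self, n):
        self._n = n
    def size(self):
        return self._n

class Sturdy:
    """Pickled by construction: only the call Sturdy(n) is stored."""
    def __init__(self, n):
        self._n = n
    def size(self):
        return self._n
    def __reduce__(self):
        return (type(self), (self.size(),))

old_fragile = pickle.dumps(Fragile(5))  # stores {'_n': 5}
old_sturdy = pickle.dumps(Sturdy(5))    # stores "call Sturdy(5)"

# New release: the same internal attribute is renamed in both classes
class Fragile:
    def __init__(self, n):
        self._size = n
    def size(self):
        return self._size

class Sturdy:
    def __init__(self, n):
        self._size = n
    def size(self):
        return self._size
    def __reduce__(self):
        return (type(self), (self.size(),))

restored = pickle.loads(old_fragile)  # loads without complaint...
try:
    restored.size()                   # ...but the object is broken
    fragile_ok = True
except AttributeError:
    fragile_ok = False

print(fragile_ok)                       # False
print(pickle.loads(old_sturdy).size())  # 5
```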
>
> Rationale for (a):
> - Pickling by data structure violates the encapsulation principle
> and reveals implementation details. This is particularly true for
> parents, which tend to have intricate data structures in which many
> attributes are of a mostly technical nature:
>
> sage: QQ[x].__dict__
> {'_PolynomialRing_general__cyclopoly_cache': {},
> '_PolynomialRing_general__generator': x,
> '_PolynomialRing_general__is_sparse': False,
> '_PolynomialRing_general__polynomial_class': <class 'sage.rings.polynomial.polynomial_element_generic.Polynomial_rational_dense'>,
> '_has_singular': True,
> '_implementation_names': (None,),
> '_implementation_repr': ''}
>
> These include various caches, data for the coercion mechanism,
> etc. All of these are likely to evolve, quite possibly
> independently of the code for the parent itself.
>
> - The actual class of an object could also be considered an
> implementation detail, which we would like to be free to
> change. This becomes particularly acute with the new category
> mechanism, where classes are created on the fly depending on the
> hierarchy of categories; again, something that is likely to evolve
> independently of the code for the parent itself.
>
> - Parents more often than not have (or should have) unique
> representation. How can this be guaranteed unless all parents of
> a given kind are constructed (and reconstructed on unpickling)
> through a single construction?
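A minimal plain-Python sketch of why unique representation and pickling by construction go together. UniqueToyParent is hypothetical (this is not Sage's actual mechanism): instances are cached by their construction arguments, and __reduce__ routes unpickling back through that same construction, so a round trip returns the existing instance. Pickling by data structure, by contrast, would always produce a new, distinct object on unpickling.

```python
import pickle

class UniqueToyParent:
    """Hypothetical parent with unique representation: one instance per arguments."""
    _instances = {}  # (class, n, min_part) -> the unique instance

    def __new__(cls, n, min_part=1):
        key = (cls, n, min_part)
        if key not in cls._instances:
            obj = super().__new__(cls)
            obj._n, obj._min_part = n, min_part
            cls._instances[key] = obj
        return cls._instances[key]

    def __reduce__(self):
        # Pickle by construction: unpickling calls UniqueToyParent(n, min_part),
        # which hands back the existing instance whenever there is one.
        return (type(self), (self._n, self._min_part))

P = UniqueToyParent(40, min_part=2)
print(UniqueToyParent(40, 2) is P)         # True
print(pickle.loads(pickle.dumps(P)) is P)  # True: uniqueness survives pickling
```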
>
>
> A proposal:
>
> - Modify @cache_method to offer the following two variants (better
> names welcome!):
>
> @cache_method
> def bla(self, truc):
> ...
>
> @cache_method_no_pickle
> def ble(self, truc):
> ...
>
> The caches for o.bla(...) and o.ble(...) would be stored, respectively, in
> - o._cache['bla']
> - o._cache_no_pickle['ble']
>
> (instead of in an attribute such as o._cache__ble, as is done currently)
>
>
> - If a parent P has a construction method, pickle P by its
> construction, and include P._cache in the pickle.
>
> Upon unpickling, call the construction, then insert the cache back.
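A minimal plain-Python sketch of the proposal as a whole, under stated assumptions: cache_method and cache_method_no_pickle are hypothetical decorators (not Sage's existing caching decorator), _construction() is a hypothetical hook returning (callable, args) (unrelated to Sage's coercion construction()), and only positional, hashable arguments are cached. The pickle carries the construction plus o._cache; o._cache_no_pickle is simply left out.

```python
import functools
import pickle

def _make_cached(method, store_name):
    """Wrap method so its results are memoized in self.<store_name>[name][args]."""
    @functools.wraps(method)
    def wrapper(self, *args):
        # Stores are created lazily, so implementers need no special __init__.
        store = self.__dict__.setdefault(store_name, {})
        per_method = store.setdefault(method.__name__, {})
        if args not in per_method:  # positional, hashable arguments only
            per_method[args] = method(self, *args)
        return per_method[args]
    return wrapper

def cache_method(method):
    """Hypothetical variant whose cache is pickled with the parent."""
    return _make_cached(method, "_cache")

def cache_method_no_pickle(method):
    """Hypothetical variant whose cache is dropped on pickling."""
    return _make_cached(method, "_cache_no_pickle")

def _reconstruct(construction, args, cache):
    """Unpickling hook: rebuild the parent by construction, then put the cache back."""
    parent = construction(*args)
    # Naive: pickled entries overwrite any the construction already made.
    parent.__dict__.setdefault("_cache", {}).update(cache)
    return parent

class ToyParent:
    """Hypothetical parent; _construction() returns (callable, args)."""
    def __init__(self, n):
        self._n = n

    def _construction(self):
        return type(self), (self._n,)

    def __reduce__(self):
        construction, args = self._construction()
        return _reconstruct, (construction, args, self.__dict__.get("_cache", {}))

    @cache_method
    def bla(self, truc):
        return self._n * truc  # stands in for an expensive computation

    @cache_method_no_pickle
    def ble(self, truc):
        return self._n + truc

P = ToyParent(40)
print(P.bla(2), P.ble(2))          # 80 42
Q = pickle.loads(pickle.dumps(P))
print(Q._cache)                    # {'bla': {(2,): 80}}
print("_cache_no_pickle" in Q.__dict__)  # False
```

Creating the stores lazily inside the decorator is one way to keep the burden off parent implementers, in the spirit of (b).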
>
>
> Issues:
>
> - The construction may already have started to populate the
> cache. This is typically the case for a parent with unique
> representation, where the same parent may already exist in the
> current Sage session. Some cache-merging mechanism is needed.
>
> - The construction may want to use pieces of the cache if they are
> available. How should they be handed down? Systematically passing
> the cache as an extra argument to the construction is not desirable,
> since this means that all parent implementers would have to handle
> it in their __init__ method.
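For the first issue, one possible merging policy, sketched in plain Python: merge_caches is a hypothetical helper, and caches are assumed to be dicts mapping a method name to {args: value}. Entries already computed in the live session win, and entries only present in the pickle are added. The assumption is that cached values are deterministic, so either copy is correct; keeping the live one avoids replacing objects that other code in the session may already hold references to. The values below are purely illustrative.

```python
def merge_caches(live, pickled):
    """Merge a cache read from a pickle into the cache of an already-existing parent.

    Hypothetical policy: entries computed in the current session win; entries
    only present in the pickle are added.
    """
    for method_name, entries in pickled.items():
        target = live.setdefault(method_name, {})
        for args, value in entries.items():
            target.setdefault(args, value)  # never overwrite a live entry
    return live

# A unique parent already alive in this session has computed 'cardinality';
# the pickle additionally carries an entry for a (hypothetical) method 'bla'.
live = {"cardinality": {(): 6153}}
pickled = {"cardinality": {(): 6153}, "bla": {(2,): 80}}
merge_caches(live, pickled)
print(live)  # {'cardinality': {(): 6153}, 'bla': {(2,): 80}}
```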
I forgot to mention: unless some care is taken, this change will
probably break most pickle jars at once, or at least the pickles
stored "by data structure" (but Michael won't mind, right? :-)). The
same will happen again with the introduction of the category framework.
However, I guess it should be possible to synchronise both changes,
and to work around this problem by inserting just a bit of backward
compatibility code at a few very specific places.
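As an illustration of the kind of backward-compatibility code I mean, here is a minimal plain-Python sketch. OldParent, NewParent and the RENAMED table are hypothetical. It relies on the standard pickle.Unpickler.find_class hook: an old pickle that refers to a class under a name that no longer exists is redirected to its replacement, while everything else falls through to the default lookup.

```python
import io
import pickle

class OldParent:
    """Hypothetical class as it was named in an old release."""
    def __init__(self, n):
        self.n = n

old_data = pickle.dumps(OldParent(7))  # an "old" pickle referring to OldParent

class NewParent:
    """The refactored class that replaces OldParent."""
    def __init__(self, n):
        self.n = n

# Table of (module, old class name) -> replacement class.
RENAMED = {(OldParent.__module__, "OldParent"): NewParent}
del OldParent  # the old name no longer exists in this "new release"

class CompatUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Redirect references to renamed classes; defer to the default otherwise.
        if (module, name) in RENAMED:
            return RENAMED[(module, name)]
        return super().find_class(module, name)

obj = CompatUnpickler(io.BytesIO(old_data)).load()
print(type(obj).__name__, obj.n)  # NewParent 7
```

A plain pickle.loads(old_data) would fail here, since the name OldParent can no longer be found; the override keeps that compatibility shim in one place.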
Cheers,
Nicolas
--
Nicolas M. Thiéry "Isil" <[email protected]>
http://Nicolas.Thiery.name/
--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---