Dear William, dear Anne, dear all,
> William, in another thread:
> In a lot of the rest of Sage, doing pickling "by construction"
> instead of "by data structure" is nearly pointless. Very often one
> uses pickles this way in Sage:
> sage: A = make_some_object(...) # instant
> sage: A.do_stuff() # takes a few minutes or hours
> sage: # Now A has extra useful information cached about itself
> sage: A.save('A') # save my valuable work
> If A were pickled "by construction" as you suggest above, then
> everything cached about A vanishes.
> Maybe in sage-combinat objects don't get enriched like in number
> theory (say), so this just isn't an issue for you.
Thanks for pointing out this use case; as Anne mentioned, we also
encounter it, but this is slightly marginal (but important!) use case
for us; we tend to have a huge number of fairly trivial parents.
I'll give an updated design proposal below.
But first a short preamble for Anne:
------------------------------------------------------------------------------
"In computer science, in the context of data storage and transmission,
serialization is the process of converting an object into a sequence
of bits so that it can be stored on a storage medium (such as a file,
or a memory buffer) or transmitted across a network connection
link. When the resulting series of bits is reread according to the
serialization format, it can be used to create a semantically
identical clone of the original object. For many complex objects, such
as those that make extensive use of references, this process is not
straightforward." http://en.wikipedia.org/wiki/Serialization
In python, serialization is called pickling. Think putting your object
into a jar of vinegar to preserve it for reuse later. Hence the term
picklejar for a bunch of objects which have been pickled.
Note that there are two levels of difficulties with pickling:
- being able to pickle and unpickle properly
- being able to unpickle an object that has been created a long time
ago by a much older version of Sage.
How to pickle C = Partitions(40, min_part = 2)?
- "by data structure": this is how an object is pickled the default
in python: store its class and its data structure (i.e., roughly
its dictionary). On unpickling, an object with the same class and
data structure is recreated.
- "by construction": instead, in that case, we could store that C was
constructed by Partitions, by passing 40, min_part = 2 as argument.
As William pointed out, we might also want to store that there are
6153 of them, as this is not instantaneous to compute.
------------------------------------------------------------------------------
Claims:
(a) For most parents, the default python pickling method "by data
structure" impedes refactoring by increasing the potential for
backward compatibility issues.
(b) Implementing pickling is tricky; at the same time, we want to
promote the implementation of new parents by users (case to be
done separately on demand). So, as much as possible, we want a
sane default pickling implementation that works in most use cases.
(c) Among other features, it should be easy to specify which parts of
the cache should be pickled, or not.
Rationale for (a)
- Pickling by data structure violates the encapsulation principle,
and reveals implementation details. This is particularly true for
Parents which tend to have intricated datastructure where many
attributes are mostly of technical nature:
sage: QQ[x].__dict__
{'_PolynomialRing_general__cyclopoly_cache': {},
'_PolynomialRing_general__generator': x,
'_PolynomialRing_general__is_sparse': False,
'_PolynomialRing_general__polynomial_class': <class
'sage.rings.polynomial.polynomial_element_generic.Polynomial_rational_dense'>,
'_has_singular': True,
'_implementation_names': (None,),
'_implementation_repr': ''}
This can be various caches, data for the coercion mechanism,
etc. All those things are likely to evolve, very possibly
independently of the code for the parent itself.
- The actual class of an object could be also considered as an
implementation detail, which we would like to be free to
change. This becomes particularly acute with the new category
mechanism, where classes are created on the fly, depending on the
hierarchy of categories; again, something that is likely to evolve
independently of the code for the parent itself.
- Parents more often than not have (or should have) unique
representation. How to guarantee this without having all parents of
a kind be constructed / reconstructed through a single
construction?
A proposal:
- Modify @cache_method to offer the two following variants (better names
welcome!)
@cache_method
def bla(self, truc)
...
@cache_method_no_pickle
def ble(self, truc)
...
The cache for o.bla(...) would be stored respectively in
- o._cache['bla']
- o._cache_no_pickle['ble']
(instead of o._cache__ble currently)
- If a parent P has a construction method, pickle P by its
construction. Also include P.cache in the pickle.
Upon unpickling, call the construction, and insert back the cache.
Issues:
- The construction may have readily started to populate the
cache. This is typically true for a parent with unique
representation, where the same parent can preexist in the same sage
session. There needs to be some cache-merging mechanism.
- The construction may want to use pieces from the cache if they are
available. How to hand it down? Systematically passing the cache as
extra argument to the construction is not desirable, since this
means that all parent implementers would have to handle this in
their __init__ method.
Best regards,
Nicolas
PS: since we discussed this over the phone, I mention right away a:
+1 from Florent
--
Nicolas M. ThiƩry "Isil" <[email protected]>
http://Nicolas.Thiery.name/
--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---