On 10/5/07, Neil Schemenauer <[EMAIL PROTECTED]> wrote:
> On Thu, Oct 04, 2007 at 02:49:16AM -0400, Alexandre Vassalotti wrote:
> > Could you elaborate on what you are trying to do?
>
> I'm trying to efficiently pickle a 'unicode' subclass.  I'm
> disappointed that it's not possible to be as efficient as the
> built-in unicode class, even when using an extension code.

There are a few things you could do to produce smaller pickle streams.
If you are certain that the objects you will pickle are not
self-referential, then you can set Pickler.fast to True. This disables
the "memo", which adds a 2-byte overhead to each object pickled
(depending on the input, this might or might not shorten the resulting
stream, since shared objects are then written out in full each time
they appear). If this isn't enough, you could subclass Pickler and
Unpickler and define a custom rule for your unicode subclass.

An obvious optimization for pickle, in Py3k, would be to add support
for short unicode strings. Currently, there is a 4-byte overhead per
string. Since Py3k is unicode throughout, this overhead can become
quite large.

> > Could point out specific examples of the "old code" that you are referring
> > to?
>
> I don't have time right now to point at specific code.  How about
> the code that implements all the different versions of __reduce__
> and code for __getinitargs__, __getstate__, __setstate__?

At first glance, __reduce__ seems to be useful only for instances of
subclasses of built-in types. However, __getnewargs__ could easily
replace it for that. So, removing __reduce__ (and __reduce_ex__) is
probably a good idea.

As far as I know, the current pickle module doesn't use
__getinitargs__ (this is one of the things the documentation is
totally wrong about). As for __getstate__ and __setstate__, I think
they are essential. Without them, you couldn't pickle objects with
__slots__ or save the I/O state of certain objects.

It would certainly be possible to simplify the algorithm used for
pickling class instances a little.
In pseudo-code, it would look something like this:

    from pickle import UnpicklingError

    def save_obj(obj):
        # Let obj be an instance of a user-defined class.
        cls = obj.__class__
        if hasattr(obj, "__getnewargs__"):
            args = obj.__getnewargs__()
        else:
            args = ()
        if hasattr(obj, "__getstate__"):
            state = obj.__getstate__()
        else:
            state = obj.__dict__
        return (cls, args, state)

    def load_obj(cls, args, state):
        obj = cls.__new__(cls, *args)
        if hasattr(obj, "__getstate__"):
            # A class that customizes its state must also restore it.
            # Look the method up first, so that an AttributeError raised
            # inside __setstate__ itself is not masked.
            try:
                setstate = obj.__setstate__
            except AttributeError:
                raise UnpicklingError("__getstate__ without __setstate__")
            setstate(state)
        else:
            obj.__dict__.update(state)
        return obj

The main difference between this and the current method used to pickle
instances is the use of __getnewargs__ instead of __reduce__.

-- Alexandre
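
For anyone who wants to try the scheme above, here is a minimal
runnable sketch in present-day Python 3, applied to a string subclass.
It is an illustration under stated assumptions, not what the pickle
module does. The Tagged class and its lang attribute are hypothetical.
One deliberate deviation from the pseudo-code: since Python 3.11 every
object inherits a default __getstate__, so the loader here checks for
__setstate__ instead and otherwise falls back to updating __dict__.

```python
# Sketch only: Tagged is a hypothetical class, and save_obj/load_obj
# adapt the pseudo-code above rather than reproduce the pickle module.


class Tagged(str):
    """Hypothetical str subclass carrying one extra attribute."""

    def __new__(cls, value, lang="en"):
        self = super().__new__(cls, value)
        self.lang = lang
        return self


def save_obj(obj):
    """Reduce obj to a (class, constructor args, state) triple."""
    cls = type(obj)
    # str already provides __getnewargs__, returning (plain_value,).
    args = obj.__getnewargs__() if hasattr(obj, "__getnewargs__") else ()
    if hasattr(obj, "__getstate__"):
        state = obj.__getstate__()
    else:
        state = obj.__dict__
    return cls, args, state


def load_obj(cls, args, state):
    """Rebuild an instance from the triple produced by save_obj."""
    obj = cls.__new__(cls, *args)
    if hasattr(obj, "__setstate__"):
        obj.__setstate__(state)
    elif state:
        # No custom restore hook: treat the state as instance attributes.
        obj.__dict__.update(state)
    return obj


original = Tagged("hello", lang="fr")
restored = load_obj(*save_obj(original))
print(type(restored).__name__, str(restored), restored.lang)
```

The string value travels through __getnewargs__ and the extra
attribute through the state, so no __reduce__ is needed for the round
trip.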