On 03/10/2013 09:19 AM, Jim Fulton wrote:
>>> Now, ZODB is built on top of pickles.  And pickles in Python 2
>>> know about two kinds of strings: str and unicode.  But there are
>>> actually *three* kinds of strings in Python-land:
>>> * bytes
>>> * unicode
>>> * native strings (same as bytes in Python 2, same as unicode in
>>>   Python 3)
> I hadn't encountered that term before.  I see it informally used to 
> refer to ``str``, which is bytes in Python 2 and Unicode in Python 3. 
> This isn't a different kind of string.

WSGI (PEP 3333) uses the term.  In the case of pickles, we need to
guarantee that the strings used for attribute names (including module
attributes) are native strings.

Oops, I see that you arrive there below:  I should have read all the way
to the bottom.

> Is this an issue for anything but names (object attributes and global 
> names)?
> I don't think there's a "native strings" issue.  There *does* seem to 
> be a name issue.  In Python 2 and Python 3, (non-buggy) unicode
> aware applications use bytes and unicode the same way, unicode for
> text, bytes for data.
> AFAICT, Python 3 has (admirably) changed the way names are
> implemented to use unicode, rather than ASCII.
> Am I missing something?
> This is a somewhat thorny, but still fairly restricted problem.  I 
> would hazard to guess that 99.923% of persistent classes pickle their 
> state using their instance dictionaries.  99.9968% for regular Python 
> classes.  We know when we're pickling and unpickling instances and we 
> can apply transformations necessary for the target platforms.
> I think the fix is pretty straightforward.
> In the default __setstate__ provided by Persistent, and when loading 
> non-persistent instances:
> - On Python 2, ASCII encode unicode attribute names.
> - On Python 3, ASCII decode byte attribute names.
> The same transformation is necessary when looking up global names.
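
A minimal sketch of the proposed transformation, assuming the state is a
plain instance dictionary.  The names native_name, normalize_state and
Example are hypothetical, not part of persistent or zodbpickle:

```python
TEXT_TYPE = type(u"")      # ``unicode`` on Python 2, ``str`` on Python 3
PY2 = str is bytes         # on Python 2 the native ``str`` is the bytes type


def native_name(name):
    """Return ``name`` as the native string type of the running Python."""
    if PY2 and isinstance(name, TEXT_TYPE):
        # Python 2: ASCII-encode unicode attribute names.
        return name.encode("ascii")
    if not PY2 and isinstance(name, bytes):
        # Python 3: ASCII-decode bytes attribute names.
        return name.decode("ascii")
    return name


def normalize_state(state):
    """Return a copy of an instance-dict state with native-string keys."""
    return dict((native_name(key), value) for key, value in state.items())


class Example(object):
    """Hypothetical stand-in for a class with a default-style __setstate__."""

    def __setstate__(self, state):
        self.__dict__.update(normalize_state(state))
```

Non-ASCII names raise UnicodeEncodeError / UnicodeDecodeError here rather
than being guessed at, which matches the "may not be sharable" caveat for
such names.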

Hmm, if zodbpickle has to handle the issue for non-persistent instances
and global names, wouldn't it be simpler to make it handle persistent
instances too?  It can examine the stack inside 'load_dict' to figure out
that the context is an instance, right?

> This will cover the vast majority of cases where the default 
> __setstate__ is used.  In rare cases where a custom setstate is used, 
> or when Python 3 non-ASCII attribute names are used, then databases 
> may not be sharable across Python versions.

Code with custom __setstate__ / __getstate__ where the difference matters
is going to need porting anyway, so it might as well straddle both Python
versions.

> There is also likely to be breakage in dictionaries or BTrees where 
> applications are sloppy about mixing Unicode and byte keys.  I don't 
> think we should try to compensate for this.

Mixing None into a tree with bytes / text keys is likely a bigger problem
than mixed bytes / text keys.
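
To illustrate with plain Python comparisons only (this is not a test of
BTrees itself, and can_order is a hypothetical helper): Python 3 refuses
to order None against either string type, as well as bytes against text.

```python
def can_order(a, b):
    """Return True if ``a < b`` can be evaluated without a TypeError."""
    try:
        a < b
    except TypeError:
        return False
    return True


# Under Python 3:
#   can_order(b"a", b"b")  is True   (same type)
#   can_order(b"a", u"a")  is False  (bytes vs. text)
#   can_order(None, b"a")  is False  (None vs. bytes)
#   can_order(None, u"a")  is False  (None vs. text)
```

Python 2 accepts all four comparisons, which is how such keys could end up
in the same tree in the first place.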

> These applications need to be fixed.  One could write a database
> analysis script to detect this kind of breakage (looking for mixed
> string and unicode keys).
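
The core check of such a script might look like the sketch below.  It
assumes only that each container exposes a keys() method, as a dict does;
walking the database to find the containers is not shown, and key_kinds
and has_mixed_string_keys are hypothetical names:

```python
TEXT_TYPE = type(u"")      # ``unicode`` on Python 2, ``str`` on Python 3


def key_kinds(mapping):
    """Return the set of key categories found in ``mapping``."""
    kinds = set()
    for key in mapping.keys():
        if key is None:
            kinds.add("None")
        elif isinstance(key, bytes):      # ``str`` on Python 2
            kinds.add("bytes")
        elif isinstance(key, TEXT_TYPE):
            kinds.add("text")
        else:
            kinds.add("other")
    return kinds


def has_mixed_string_keys(mapping):
    """Return True if ``mapping`` has both bytes keys and text keys."""
    return {"bytes", "text"} <= key_kinds(mapping)
```

Reporting "None" separately also lets the same pass flag trees that mix
None in with string keys.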

-- 
Tres Seaver          +1 540-429-0999          tsea...@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com


For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org