(Resending because I used the wrong From address and the mail got stuck in moderation.)
Some goals, in order of decreasing priority:

1. ZODB should work on Python 3.
2. ZODB databases created on Python 2 should be loadable with ZODB on
   Python 3.
3. ZODB databases created on Python 3 should be loadable with ZODB on
   Python 2.

This will be kinda longish, so please settle down.

Now, ZODB is built on top of pickles.  And pickles in Python 2 know about
two kinds of strings: str and unicode.  But there are actually *three*
kinds of strings in Python-land:

  * bytes
  * unicode
  * native strings (same as bytes in Python 2, same as unicode in Python 3)

Unfortunately we cannot distinguish bytes from native strings in the
pickles produced on Python 2: both kinds are pickled as STRING, BINSTRING
or SHORT_BINSTRING opcodes.

If we assume they're native strings, we can break pickles that contain
binary data, in one of three possible ways:

  i. assume 'ascii' and raise UnicodeDecodeError while loading

  ii. assume 'latin-1' and silently give applications unicode objects
      where they expect bytes

  iii. assume 'utf-8' and combine the disadvantages of both of the above
       methods: sometimes fail, sometimes return unicode where
       applications expect bytes

One very common example of binary data: persistent object references.

What if we break stride with the standard library pickle, do our own
pickle[1] and load BINSTRINGs as bytes?

  iv. assume bytes [2]

Then we break *every object instance* by putting byte strings into the
instance __dict__ on Python 3:

    >>> obj.__dict__[b'attr'] = value
    >>> obj.attr
    Traceback ...
    AttributeError: ...

What if we try to detect which SHORT_BINSTRINGs are bytes and which ones
are native strings?

  v. try to decode 'ascii', if that fails, return bytes [3]

Then we, again, get the disadvantage of approach (ii), only in a very
inconsistent manner: sometimes pickled binary data unpickles into
unicode.  Half of your OIDs are now u'\0\0\0\0\0\0\0\x7f', the other half
is b'\0\0\0\0\0\0\0\x80'.
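To make that OID split concrete, here is a minimal sketch for Python 3.
The helper load_binstring() is hypothetical: it only mimics what
approach (v) does to a single string payload and is not zodbpickle's
actual code.  The OIDs are built with struct on the assumption that they
are 8-byte big-endian integers, as in the example values above.

```python
import struct


def load_binstring(raw):
    """Hypothetical stand-in for approach (v): ASCII-decode, else keep bytes."""
    try:
        return raw.decode('ascii')
    except UnicodeDecodeError:
        return raw


# Two neighbouring 8-byte big-endian OIDs, 127 and 128.
oid_127 = struct.pack('>Q', 0x7f)
oid_128 = struct.pack('>Q', 0x80)

loaded_127 = load_binstring(oid_127)   # every byte is < 0x80, so it decodes
loaded_128 = load_binstring(oid_128)   # 0x80 is not ASCII, so it stays bytes

print(type(loaded_127).__name__, repr(loaded_127))
print(type(loaded_128).__name__, repr(loaded_128))
```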
ZODB itself can cope with that [4], but will someone think of the
childre^H^H^H^H^H applications?

What if we introduce a way for applications to specify whether they want
bytes or unicode?

  vi. define an explicit schema of some kind for each Persistent
      subclass, e.g. _p_load_as_bytes = ('names', 'of', 'attributes');
      advanced users can override __setstate__ and do type fixups in
      there

I don't know.  I haven't had the time to think this through yet.  It
sounds like a huge amount of work for everyone.

[1] https://github.com/zopefoundation/zodbpickle
[2] zodbpickle.pickle.Unpickler(encoding='bytes')
[3] zodbpickle.pickle.Unpickler(encoding='ascii', errors='bytes')
[4] this is the status quo of the 'py3' branch in the ZODB repo

That's the situation with loading.  I've implemented approach (v) in the
ZODB py3 branch, but I'm by no means certain it is acceptable.

But that's not all, there's more fun to be had on the dumping side too!

We want pickles created by ZODB to be

  a) reasonably short
  b) round-trippable (what you dump, you get back on load)
  c) compatible with Python 2
  d) noload()able [5]

[5] i.e. we want to be able to do garbage collection without actually
    instantiating user-defined classes (think of a ZEO server that
    doesn't have the right modules in sys.path, or standalone zodbgc
    processing), which is why we added noload() back into zodbpickle.
    noload() must be able to crawl the pickles and get back OIDs from
    persistent references.

There are problems with each of these requirements, and solutions for
those problems make the other requirements impossible to implement.

  * Python 3 pickles bytestrings using a fancy REDUCE opcode, as a
    function call to codecs.encode(u'decoded bytestring', 'latin-1').
    This makes them large and breaks (a), and our noload() copied from
    Python 2.x stdlib is unable to handle them, breaking (d). [8]

  * Why does Python 3 pickle bytestrings this way?
    Because that's the only way to get round-trippability with Python 3's
    interpretation of BINSTRING opcodes as unicode, if you use pickle
    protocols 0, 1, or 2.  Pickle protocol 3 has separate opcodes for all
    three kinds of strings (bytes, unicode, native -- remember?), but
    it's incompatible with Python 2, breaking requirement (c).

  * We could implement a custom pickler [6] and pickle bytestrings as
    SHORT_BINSTRING, fulfilling requirements (a), (c) and (d), but this
    breaks (b), i.e. round-tripping.

[6] zodbpickle.pickle.Pickler(bytes_as_strings=True) [7]
[7] this is the status quo of the 'py3' branch in the ZODB repo
[8] OTOH we could implement special support for REDUCE of
    codecs.encode() in our noload -- I almost got that working before
    Jim suggested a different approach, which is [6].

At least there's some nice symmetry: no matter if you pickle your stuff
on Python 2 or Python 3, you get to deal with bytes becoming unicode when
you unpickle.

These kinds of guessing games are inevitable when you're migrating
pickles from Python 2 to Python 3, but do we want to make them mandatory
for day-to-day operation?

Perhaps we ought to drop our original goal (3) and require an explicit
one-time possibly-lossy conversion process for goal (2), then use pickle
protocol 3 on Python 3 and have short pickles and perfect round-tripping
of bytestrings?

Then there's ZEO, which uses pickles for both payloads _and_ for
marshalling in its RPC layer.  That's also fun, but I think we can at
least declare that ZEO server and client must be on the same Python
version, perhaps by bumping the protocol version.

So, this is where things stand right now.  Plus a few relatively minor
matters like adding missing noload() tests to zodbpickle and making
zodbpickle work on Python 3.2 [9]

[9] https://mail.zope.org/pipermail/checkins/2013-March/065813.html

Other than that, the ZODB py3 branch works on Python 3.3 [10].  As long
as you're prepared to deal with bytestrings magically transforming into
unicodes.
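For reference, the protocol 2 versus protocol 3 trade-off for bytestrings
described above can be checked on Python 3 with the standard library
alone.  This is a minimal sketch: it uses the stdlib pickle and
pickletools modules rather than zodbpickle, and the sample value is just
an 8-byte OID-like string chosen for illustration.

```python
import pickle
import pickletools

data = b'\x00\x00\x00\x00\x00\x00\x00\x80'

# Protocol 2 is the newest protocol Python 2 can read; protocol 3 is
# Python 3 only.
p2 = pickle.dumps(data, protocol=2)
p3 = pickle.dumps(data, protocol=3)

# List the opcode names each pickle is made of.
ops2 = [op.name for op, arg, pos in pickletools.genops(p2)]
ops3 = [op.name for op, arg, pos in pickletools.genops(p3)]

print(len(p2), ops2)   # function call via GLOBAL ... REDUCE
print(len(p3), ops3)   # a dedicated bytes opcode

# Both round-trip on Python 3; the difference is size and opcode shape.
roundtrip2 = pickle.loads(p2)
roundtrip3 = pickle.loads(p3)
```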
[10] Stephan reported running an actual small demo application with it.

Where do we go from here?

Marius Gedminas
-- 
Basically, what "Ajax" means is "Javascript now works."
        -- Paul Graham
_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev