On Tue, Apr 16, 2013 at 4:38 PM, Tres Seaver <tsea...@palladion.com> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > After getting a bit bogged down during the PyCon US 2013 sprints, I'd > like to restart the discussion by outlining the problem as I think I > understand it now. > > Proposal for ZODB pickle compatibility > ====================================== > > Issues > - ------ > > - - There exists no forward-compatible way to pickle bytes on Python2 > (Py3k pickle module "guesses", decoding any Python2 ``str`` using > ``latin1``). > > - - Some data pickled as ``str`` on Python2 truly is binary (e.g., > ``Pdata`` objects for Zope2's ``OFS.Image.File`` and > ``OFS.Image.Image`` types; crypto hases?) > > - - Some Python2 applications may have the same attribute for a given > class stored both as ``str`` and as ``unicode`` (due e.g., to bugs in > the code, literal defaults, browser quirks, changes to code over > time). > > > Scenarios > - --------- > > > .. _py2_forever: > > Existing Python2-only Application > +++++++++++++++++++++++++++++++++ > > - - Code for the app is never(ish) going to migrate to Py3k. > > - - Using an updated / supported ZODb package **must** be possible > > - - Ideally, requires no changes to application code. > > - - Ideally, requies no database fixup / conversion. > > - - Best strategy is likely ignore_compat_. > > > .. _py3k_only: > > New, Py3k-only Application > ++++++++++++++++++++++++++ > > - - Code for the app will run only on Py3k. > > - - Running with the latest-and-greatest ZODB **must** be possible. > > - - Ideally, the code for the app will make no concessions to backward- > compatibility. > > - - Best strategy is likely ignore_compat_. > > > .. _migrate_w_convert: > > Python2 Application Migrating to Py3k > +++++++++++++++++++++++++++++++++++++ > > - - Application code "straddles" both Pythons using "compatible subset" > dialect, but only during the migration period. > > - - During that period, code **must** be able to open the database from > both Python2 and Py3k. > > - - Ideally, application code will need to make no concessions to > backward-compatibility after migration. > > - - It is acceptable to run a conversion process which normalizes all > active records in the database prior to testing. > > - - For databases which are already "binary clean" (binary data exists > only in blobs; the application creates no new non-blob binary > attributes), the best strategy is likely ignore_compat_.
I don't like the idea of supporting binary data only in blobs. > > - - For databases which are not already "binary clean" (there may be > non-blob binary attributes), the best strategy is likely to > convert_storages_, followed by replace_py2_cpickle_ (if the Python2 > client might create new non-blob binary attributes). > > - - wrap_storages_ (on the Python2 side) might be simpler than > replace_py2_cpickle_, if the sources of non-blob binary attributes are > well understood. > > > .. _straddle_w_convert: > > Python2 Application Straddling Python2 / Py3k (1) > +++++++++++++++++++++++++++++++++++++++++++++++++ > > - - Application code "straddles" both Pythons using "compatible subset" > dialect. > > - - Code **must** be able to open the database from both Python2 and Py3k. > > - - It is acceptable to run a conversion process which normalizes all > active records in the database prior to testing. > > - - For databases which are already "binary clean" (binary data exists > only in blobs; the application creates no new non-blob binary > attributes), the best strategy is likely ignore_compat_. > > - - For databases which are not already "binary clean" (there may be > non-blob binary attributes), the best strategy is likely to > convert_storages_, followed by replace_py2_cpickle_ (if the Python2 > client might create new non-blob binary attributes). > > - - For cases where Python2 and Py3k clients may share the database for an > extended period, and where disruption to the Python2 clients must be > minimized, the replace_py3k_pickle_ strategy might be preferred, until > convert_storages_ becomes feasible. IMO, _replace_py2_cPickle is the best strategy in this scenario. As noted above, I think it's important to support non-blob binary data. > > > .. _straddle_no_convert: > > Python2 Application Migrating to Py3k (2) > +++++++++++++++++++++++++++++++++++++++++ > > - - Application code "straddles" both Pythons using "compatible subset" > dialect. > > - - Code **must** be able to open the database from both Python2 and Py3k. > > - - It is **not** acceptable to run a conversion process which normalizes > all active records in the database prior to testing (e.g., the > database is too large to convert on existing hardware, or the downtime > required for conversion is unacceptable). > > - - Because disruption to the Python2 clients must be minimized, the best > strategy is likely replace_py3k_pickle_ until convert_storages_ > becomes feasible. > > - - Alternatively, wrap_storages_ might be the best strategy for the Py3k > clients. I prefer _replace_py2_cPickle for this scenario also. In fact, I prefer it for any scenario that involved Python 2 & 3. > > > Strategies > - ---------- > > > .. _ignore_compat: > > Ignore compatibility > ++++++++++++++++++++ > > Use the stdlib pickle support in its default mode. > > - - No changes to the ``ZODB`` packages on Python2 or Py3k. > > - - Pickles created under Python2 will be readable on Py3k; however, > *all* bytes data will be coerced (via ``latin1``) to unicode. > > - - Pickles created under Py3k will likely not be readable on Python2 > (Python2 has no support for ``protocol 3``). > > - - Easiest usage for applications which are never going to straddle. > > - - Compatibility will only be achievalble via one-time conversions (where > the conversion script uses one of the other strategies or tools). > > > .. _replace_py3k_pickle: > > Replace Py3k ``pickle`` > +++++++++++++++++++++++ > > Keep pickling in the Python2 / protocol 1 way we have always done. > > - - No changes to the ``ZODB`` packages on Python2. Storages do not need > to be configured with any custom pickle support. > > - - On Py3k, ``ZODB`` uses pickler / unpickler from the ``zodbpickle`` > module, such that Python2 ``str`` objects are unpickled as ``bytes``; > ``bytes`` are pickled using the ``protocol 1`` opcodes (so that > Python2 will unpickle them as ``str``). I think we should reject this one. It breaks Python 3 instances. > > .. _replace_py2_cPickle: > > Replace Python2 ``cPickle`` > +++++++++++++++++++++++++++ > > Move to pickling in the new protocol 3 way (native under Py3k). > > - - On Python2, applications which need to ensure that ``bytes`` objects > unpickle correctly under Py3k need must be changed to use a new type, > ``zodbpickle,binary``. ``ZODB`` is configured with pickler / upickler > from ``zodbpickle``, such that objects of this type will be pickled > using the ``protocol 3`` opcodes for bytes (so that Py3k will unpickle > them as ``bytes``). > > - - Existing data for the affected classes will need to be fixed up using > a variation of convert_storages_. > > - - No changes to the ``ZODB`` packages on Py3k. Storages do not need to > be configured with any custom pickle support. This is a winner. > > .. _convert_storages: > > Convert Database Storages > +++++++++++++++++++++++++ > > - - Need tool(s) to identify problematic data: > > - Classes which mix ``str`` and ``unicode`` values for the same > attribute across records / instances. > > - - Utility which can apply per-class transforms to state pickles: > > - E.g., for instances of ``OFS.Image.Pdata``, convert the ``data`` > attribute (which should be a Python2 ``str``) to > ``zodbpickle.binary``. (Of course, these would probably be better > off written out as blobs). > > - Or, for some application which mixes ``str`` and ``unicode`` under > Python2 (either across instances or across transaction): upconvert > any value of type ``str`` for the given attribute(s) to ``unicode``, > using a configured encoding strategy (e.g, try ``utf8`` first, > falling back to ``latin1``). > > - - One-time converter utility would use ``copyTransactionsFrom``-style > pattern, opening the existing database readonly, getting pickles for > each transaction, invoking the converter utility for each instance to > fix up the pickle, then writing the converted pickles into the new > database. > > > .. _wrap_storages: > > Wrap Database Storages > ++++++++++++++++++++++ > > - - A wrapper storage uses the converter utility (identified above) during > the ``load`` operation, fixing up the object state it is handed to the > instance's ``__setstate__``. > > - - During the ``save`` operation, the wrapper would fix up pickled > instance state (after calling ``__getstate__``). > > - - Wrappers might be applied under Python2 (e.g., for apps where the > databse is already converted to ``protocol 3``) as an alternative to > replace_py2_cpickle_. > > - - Wrappers might be applied under Py3k (e.g., for apps where the databse > is not already converted to ``protocol 3``) as an alternative to > replace_py3k_pickle_.. > > > Concrete Proposal > - ----------------- > > I believe we will need to update ``zodbpickle`` and ``ZDOB`` to allow > for any of the strategies to be applied. > > - - ``zodbpickle`` should provide the script which analyzes pickles in > a database for inconsistent ``str`` / ``unicode`` usage. See: > https://github.com/jimfulton/dbstringanalysis > > - - ``zodbpickle`` should provide the utility for registering per-class > fixups. > > - - ``zodbpickle`` should provide the script which uses that utility > do to one-time conversion of a storage (supporting convert_storages_). > > - - ``zodbpickle`` should provide a new ``binary`` type which Python2 > applications can begin using to signal that attributes should be > unpickled in Py3k as ``bytes``. See: > https://github.com/zopefoundation/zodbpickle/tree/py2_explicit_bytes > > - - ``zodbpickle`` should provide a pickler/unpickler for use by > Python2 clients who operate against converted storages > (replace_py2_cpickle_). See: > https://github.com/zopefoundation/zodbpickle/tree/py2_explicit_bytes > > - - ``zodbpickle`` should provide a pickler/unpickler for use by > Py3k clients who operate against unconverted storages > (replace_py3k_pickle_). See: > https://github.com/zopefoundation/zodbpickle > > - - ``zodbpickle`` might need to provide a wrapper storage supporting > straddle_no_convert_. > > > Comments? Thanks for taking the time to work all of this out. It sounds rather complex. :) Jim -- Jim Fulton http://www.linkedin.com/in/jimfulton _______________________________________________ For more information about ZODB, see http://zodb.org/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev