On Tue, Apr 16, 2013 at 4:38 PM, Tres Seaver <tsea...@palladion.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> After getting a bit bogged down during the PyCon US 2013 sprints, I'd
> like to restart the discussion by outlining the problem as I think I
> understand it now.
>
> Proposal for ZODB pickle compatibility
> ======================================
>
> Issues
> - ------
>
> - - There exists no forward-compatible way to pickle bytes on Python2
>   (Py3k pickle module "guesses", decoding any Python2 ``str`` using
>   ``latin1``).
>
> - - Some data pickled as ``str`` on Python2 truly is binary (e.g.,
>   ``Pdata`` objects for Zope2's ``OFS.Image.File`` and
>   ``OFS.Image.Image`` types;  crypto hashes?)
>
> - - Some Python2 applications may have the same attribute for a given
>   class stored both as ``str`` and as ``unicode`` (due e.g., to bugs in
>   the code, literal defaults, browser quirks, changes to code over
>   time).
>
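
To make the first two issues concrete, here is a minimal sketch (not taken from the proposal) of what happens when Py3k's stdlib ``pickle`` reads a Python2 ``str`` pickle. The pickle bytes are hand-written for illustration, and the ``encoding`` argument is passed explicitly in both calls:

```python
import pickle

# Hand-written protocol 1 pickle of the two-byte Python2 ``str`` '\xff\xfe':
# 'U' is the SHORT_BINSTRING opcode, '\x02' the length, '.' is STOP.
py2_pickle = b"U\x02\xff\xfe."

# Decoding via latin1 always succeeds, but yields text, not binary data.
as_text = pickle.loads(py2_pickle, encoding="latin1")

# encoding="bytes" leaves the payload alone, but applies to *every* str
# in the pickle, including ones that really were text.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")

print(type(as_text).__name__, repr(as_text))
print(type(as_bytes).__name__, repr(as_bytes))
```

Neither setting can tell binary from text on a per-attribute basis, which is the gap the strategies below try to close.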
>
> Scenarios
> - ---------
>
>
> .. _py2_forever:
>
> Existing Python2-only Application
> +++++++++++++++++++++++++++++++++
>
> - - Code for the app is never(ish) going to migrate to Py3k.
>
> - - Using an updated / supported ZODB package **must** be possible.
>
> - - Ideally, requires no changes to application code.
>
> - - Ideally, requires no database fixup / conversion.
>
> - - Best strategy is likely ignore_compat_.
>
>
> .. _py3k_only:
>
> New, Py3k-only Application
> ++++++++++++++++++++++++++
>
> - - Code for the app will run only on Py3k.
>
> - - Running with the latest-and-greatest ZODB **must** be possible.
>
> - - Ideally, the code for the app will make no concessions to backward-
>   compatibility.
>
> - - Best strategy is likely ignore_compat_.
>
>
> .. _migrate_w_convert:
>
> Python2 Application Migrating to Py3k
> +++++++++++++++++++++++++++++++++++++
>
> - - Application code "straddles" both Pythons using "compatible subset"
>   dialect, but only during the migration period.
>
> - - During that period, code **must** be able to open the database from
>   both Python2 and Py3k.
>
> - - Ideally, application code will need to make no concessions to
>   backward-compatibility after migration.
>
> - - It is acceptable to run a conversion process which normalizes all
>   active records in the database prior to testing.
>
> - - For databases which are already "binary clean" (binary data exists
>   only in blobs; the application creates no new non-blob binary
>   attributes), the best strategy is likely ignore_compat_.

I don't like the idea of supporting binary data only in blobs.

>
> - - For databases which are not already "binary clean" (there may be
>   non-blob binary attributes), the best strategy is likely to
>   convert_storages_, followed by replace_py2_cpickle_ (if the Python2
>   client might create new non-blob binary attributes).
>
> - - wrap_storages_ (on the Python2 side) might be simpler than
>   replace_py2_cpickle_, if the sources of non-blob binary attributes are
>   well understood.
>
>
> .. _straddle_w_convert:
>
> Python2 Application Straddling Python2 / Py3k (1)
> +++++++++++++++++++++++++++++++++++++++++++++++++
>
> - - Application code "straddles" both Pythons using "compatible subset"
>   dialect.
>
> - - Code **must** be able to open the database from both Python2 and Py3k.
>
> - - It is acceptable to run a conversion process which normalizes all
>   active records in the database prior to testing.
>
> - - For databases which are already "binary clean" (binary data exists
>   only in blobs; the application creates no new non-blob binary
>   attributes), the best strategy is likely ignore_compat_.
>
> - - For databases which are not already "binary clean" (there may be
>   non-blob binary attributes), the best strategy is likely to
>   convert_storages_, followed by replace_py2_cpickle_ (if the Python2
>   client might create new non-blob binary attributes).
>
> - - For cases where Python2 and Py3k clients may share the database for an
>   extended period, and where disruption to the Python2 clients must be
>   minimized, the replace_py3k_pickle_ strategy might be preferred, until
>   convert_storages_ becomes feasible.

IMO, _replace_py2_cPickle is the best strategy in this scenario.

As noted above, I think it's important to support non-blob binary
data.

>
>
> .. _straddle_no_convert:
>
> Python2 Application Straddling Python2 / Py3k (2)
> +++++++++++++++++++++++++++++++++++++++++++++++++
>
> - - Application code "straddles" both Pythons using "compatible subset"
>   dialect.
>
> - - Code **must** be able to open the database from both Python2 and Py3k.
>
> - - It is **not** acceptable to run a conversion process which normalizes
>   all active records in the database prior to testing (e.g., the
>   database is too large to convert on existing hardware, or the downtime
>   required for conversion is unacceptable).
>
> - - Because disruption to the Python2 clients must be minimized, the best
>   strategy is likely replace_py3k_pickle_ until convert_storages_
>   becomes feasible.
>
> - - Alternatively, wrap_storages_ might be the best strategy for the Py3k
>   clients.

I prefer _replace_py2_cPickle for this scenario also.  In fact, I prefer it
for any scenario that involves Python 2 & 3.

>
>
> Strategies
> - ----------
>
>
> .. _ignore_compat:
>
> Ignore compatibility
> ++++++++++++++++++++
>
> Use the stdlib pickle support in its default mode.
>
> - - No changes to the ``ZODB`` packages on Python2 or Py3k.
>
> - - Pickles created under Python2 will be readable on Py3k;  however,
>   *all* bytes data will be coerced (via ``latin1``) to unicode.
>
> - - Pickles created under Py3k will likely not be readable on Python2
>   (Python2 has no support for ``protocol 3``).
>
> - - Easiest usage for applications which are never going to straddle.
>
> - - Compatibility will only be achievable via one-time conversions (where
>   the conversion script uses one of the other strategies or tools).
>
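
The protocol gap mentioned above can be inspected from Py3k with the stdlib ``pickletools`` module. This is a small illustrative sketch; the helper name ``opcode_names`` is made up for the example:

```python
import pickle
import pickletools

payload = b"\xff\xfe"

def opcode_names(data):
    """Return the names of the opcodes used in a pickle byte string."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]

# Protocol 3 has a dedicated bytes opcode, which Python2 cannot read.
proto3_ops = opcode_names(pickle.dumps(payload, protocol=3))

# Protocol 2 has no bytes opcode, so Py3k emits a call (REDUCE) that
# rebuilds the bytes from a latin1-encoded text string instead.
proto2_ops = opcode_names(pickle.dumps(payload, protocol=2))

print(proto3_ops)
print(proto2_ops)
```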
>
> .. _replace_py3k_pickle:
>
> Replace Py3k ``pickle``
> +++++++++++++++++++++++
>
> Keep pickling in the Python2 / protocol 1 way we have always done.
>
> - - No changes to the ``ZODB`` packages on Python2.  Storages do not need
>   to be configured with any custom pickle support.
>
> - - On Py3k, ``ZODB`` uses pickler / unpickler from the ``zodbpickle``
>   module, such that Python2 ``str`` objects are unpickled as ``bytes``;
>   ``bytes`` are pickled using the ``protocol 1`` opcodes (so that
>   Python2 will unpickle them as ``str``).

I think we should reject this one.  It breaks Python 3 instances.

>
> .. _replace_py2_cPickle:
>
> Replace Python2 ``cPickle``
> +++++++++++++++++++++++++++
>
> Move to pickling in the new protocol 3 way (native under Py3k).
>
> - - On Python2, applications which need to ensure that ``bytes`` objects
>   unpickle correctly under Py3k must be changed to use a new type,
>   ``zodbpickle.binary``.  ``ZODB`` is configured with pickler / unpickler
>   from ``zodbpickle``, such that objects of this type will be pickled
>   using the ``protocol 3`` opcodes for bytes (so that Py3k will unpickle
>   them as ``bytes``).
>
> - - Existing data for the affected classes will need to be fixed up using
>   a variation of convert_storages_.
>
> - - No changes to the ``ZODB`` packages on Py3k.  Storages do not need to
>   be configured with any custom pickle support.

This is a winner.

>
> .. _convert_storages:
>
> Convert Database Storages
> +++++++++++++++++++++++++
>
> - - Need tool(s) to identify problematic data:
>
>   - Classes which mix ``str`` and ``unicode`` values for the same
>     attribute across records / instances.
>
> - - Utility which can apply per-class transforms to state pickles:
>
>   - E.g., for instances of ``OFS.Image.Pdata``, convert the ``data``
>     attribute (which should be a Python2 ``str``) to
>     ``zodbpickle.binary``.  (Of course, these would probably be better
>     off written out as blobs).
>
>   - Or, for some application which mixes ``str`` and ``unicode`` under
>     Python2 (either across instances or across transactions):  upconvert
>     any value of type ``str`` for the given attribute(s) to ``unicode``,
>     using a configured encoding strategy (e.g., try ``utf8`` first,
>     falling back to ``latin1``).
>
> - - One-time converter utility would use ``copyTransactionsFrom``-style
>   pattern, opening the existing database readonly, getting pickles for
>   each transaction, invoking the converter utility for each instance to
>   fix up the pickle, then writing the converted pickles into the new
>   database.
>
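
A rough sketch of the per-class transform idea, written for Py3k (where a Python2 ``str`` shows up as ``bytes``). All names here (``upconvert``, ``register_fixup``, ``fix_state``, and the ``myapp.models.Document`` class path) are hypothetical and are not part of ``zodbpickle`` or ``ZODB``:

```python
def upconvert(value, encodings=("utf-8", "latin-1")):
    """Decode a byte string by trying each configured encoding in order."""
    if isinstance(value, str):
        return value  # already text, nothing to do
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Unreachable with the default list (latin-1 accepts any byte), but
    # possible if the caller configures a stricter list of encodings.
    raise ValueError("no configured encoding could decode %r" % (value,))

# Hypothetical registry: class path -> {attribute name: transform}.
FIXUPS = {}

def register_fixup(class_path, attribute, transform):
    FIXUPS.setdefault(class_path, {})[attribute] = transform

def fix_state(class_path, state):
    """Apply the registered transforms to a copy of an instance's state dict."""
    transforms = FIXUPS.get(class_path, {})
    return {
        name: transforms[name](value) if name in transforms else value
        for name, value in state.items()
    }

register_fixup("myapp.models.Document", "title", upconvert)

fixed = fix_state(
    "myapp.models.Document",
    {"title": b"caf\xc3\xa9", "body": b"\x00\x01"},
)
legacy = upconvert(b"caf\xe9")  # not valid utf8, falls back to latin1
print(fixed, legacy)
```

Attributes without a registered transform (``body`` here) pass through untouched, so truly binary values stay binary.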
>
> .. _wrap_storages:
>
> Wrap Database Storages
> ++++++++++++++++++++++
>
> - - A wrapper storage uses the converter utility (identified above) during
>   the ``load`` operation, fixing up the object state before it is handed
>   to the instance's ``__setstate__``.
>
> - - During the ``save`` operation, the wrapper would fix up pickled
>   instance state (after calling ``__getstate__``).
>
> - - Wrappers might be applied under Python2 (e.g., for apps where the
>   database is already converted to ``protocol 3``) as an alternative to
>   replace_py2_cpickle_.
>
> - - Wrappers might be applied under Py3k (e.g., for apps where the database
>   is not already converted to ``protocol 3``) as an alternative to
>   replace_py3k_pickle_.
>
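
The wrapping idea might look roughly like the sketch below. It is deliberately simplified and does not use the real ZODB storage interface: ``DictStorage`` and ``FixupWrapper`` are hypothetical stand-ins, each record is assumed to be a single pickled state dict, and the fixup callables are supplied by the application:

```python
import pickle

class DictStorage:
    """Hypothetical stand-in for a storage: maps oid -> pickled state."""
    def __init__(self):
        self._records = {}

    def load(self, oid):
        return self._records[oid]

    def store(self, oid, data):
        self._records[oid] = data

class FixupWrapper:
    """Apply fixups to pickled state on the way in and out of a storage."""
    def __init__(self, base, fix_on_load, fix_on_store):
        self._base = base
        self._fix_on_load = fix_on_load
        self._fix_on_store = fix_on_store

    def load(self, oid):
        state = pickle.loads(self._base.load(oid))
        return pickle.dumps(self._fix_on_load(state), protocol=3)

    def store(self, oid, data):
        state = pickle.loads(data)
        self._base.store(oid, pickle.dumps(self._fix_on_store(state), protocol=3))

def decode_title(state):
    """Example fixup: turn a bytes ``title`` into text, leave the rest alone."""
    fixed = dict(state)
    if isinstance(fixed.get("title"), bytes):
        fixed["title"] = fixed["title"].decode("utf-8")
    return fixed

oid = b"\x00" * 8
base = DictStorage()
base.store(oid, pickle.dumps({"title": b"caf\xc3\xa9"}, protocol=3))

wrapped = FixupWrapper(base, fix_on_load=decode_title, fix_on_store=lambda s: s)
loaded = pickle.loads(wrapped.load(oid))
print(loaded)
```

The underlying record is never rewritten by ``load``, which is what makes this usable where a one-time conversion is not acceptable; the cost is an extra unpickle / pickle round trip on every access.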
>
> Concrete Proposal
> - -----------------
>
> I believe we will need to update ``zodbpickle`` and ``ZODB`` to allow
> for any of the strategies to be applied.
>
> - - ``zodbpickle`` should provide the script which analyzes pickles in
>   a database for inconsistent ``str`` / ``unicode`` usage.  See:
>   https://github.com/jimfulton/dbstringanalysis
>
> - - ``zodbpickle`` should provide the utility for registering per-class
>   fixups.
>
> - - ``zodbpickle`` should provide the script which uses that utility
>   to do one-time conversion of a storage (supporting convert_storages_).
>
> - - ``zodbpickle`` should provide a new ``binary`` type which Python2
>   applications can begin using to signal that attributes should be
>   unpickled in Py3k as ``bytes``.  See:
>   https://github.com/zopefoundation/zodbpickle/tree/py2_explicit_bytes
>
> - - ``zodbpickle`` should provide a pickler/unpickler for use by
>   Python2 clients who operate against converted storages
>   (replace_py2_cpickle_). See:
>   https://github.com/zopefoundation/zodbpickle/tree/py2_explicit_bytes
>
> - - ``zodbpickle`` should provide a pickler/unpickler for use by
>   Py3k clients who operate against unconverted storages
>   (replace_py3k_pickle_). See:
>   https://github.com/zopefoundation/zodbpickle
>
> - - ``zodbpickle`` might need to provide a wrapper storage supporting
>   straddle_no_convert_.
>
>
> Comments?

Thanks for taking the time to work all of this out.

It sounds rather complex. :)

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
