Hi all,

We recently ran into some surprising behaviour when layering zc.zlibstorage on top of RelStorage (i.e., the application doesn't read or write compressed records in the SQL database), and I was wondering if anyone else had noticed the same thing. We came up with what seems to be a workaround for our use cases so far, and I'm curious whether it seems like a good idea to others or whether there are reasons to avoid it.

Background
----------

We noticed this when we had some FileStorage ZODB databases that were compressed with zc.zlibstorage and we uploaded them to a RelStorage database (also configured with zc.zlibstorage) using zodbconvert, with a configuration like so:

    %import zc.zlibstorage

    <zlibstorage source>
      <filestorage source>
        path ...data/data.fs
      </filestorage>
    </zlibstorage>

    <zlibstorage destination>
      <relstorage destination>
        ...
      </relstorage>
    </zlibstorage>

When we started our application against the RelStorage database (using a similar RelStorage + zc.zlibstorage configuration), we were surprised to be greeted with "UnpicklingError: bad pickle data." A quick peek at the pickled data (in the SQL database's object_state table) showed that it was compressed by zc.zlibstorage: it had the leading '.z' record marker.

Setting a breakpoint in ZODB.Connection showed us that, despite the configuration, Connection's `_storage` was not a ZlibStorage after all, but a plain RelStorage. However, the ZODB.DB instance's `storage` *was* a ZlibStorage (as expected).

IMVCCStorage
------------

The issue seems to be that when a ZODB.Connection is constructed with a storage that provides `IMVCCStorage`, it retains and uses, not that storage object, but the result of calling `IMVCCStorage.new_instance()`. As a storage wrapper, ZlibStorage claims to provide all the same interfaces as the underlying storage; any attributes provided by the underlying storage that ZlibStorage itself does not provide are delegated to the underlying storage through `__getattr__`. ZlibStorage does not define `new_instance`, so the underlying RelStorage is invoked to create and return a raw RelStorage instance, with the net result being that the Connection uses the raw RelStorage and never the wrapping ZlibStorage and so can never read or write compressed records.

(zodbconvert uses the storage objects directly, without a Connection and new_instance, so that's why it was able to write compressed records initially.)
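
To make the delegation problem concrete, here is a minimal, self-contained sketch using toy stand-ins (no ZODB, RelStorage, or zc.zlibstorage required). The names `ToyBase` and `ToyWrapper` are hypothetical and only mimic the `__getattr__` delegation described above; they are not the real implementations, and the compression logic is simplified.

```python
import zlib


class ToyBase:
    """Hypothetical stand-in for an MVCC storage such as RelStorage."""

    def __init__(self):
        self.records = {}

    def new_instance(self):
        # Return a fresh instance of the same class sharing the same data.
        other = ToyBase()
        other.records = self.records
        return other

    def store(self, oid, data):
        self.records[oid] = data

    def load(self, oid):
        return self.records[oid]


class ToyWrapper:
    """Hypothetical stand-in for a compressing wrapper such as ZlibStorage."""

    def __init__(self, base):
        self.base = base

    def store(self, oid, data):
        # Mark compressed records with a leading '.z', as in the post.
        self.base.store(oid, b".z" + zlib.compress(data))

    def load(self, oid):
        data = self.base.load(oid)
        if data[:2] == b".z":
            return zlib.decompress(data[2:])
        return data

    def __getattr__(self, name):
        # Only reached for attributes the wrapper does not define itself,
        # so new_instance falls through to the base storage.
        return getattr(self.base, name)


wrapped = ToyWrapper(ToyBase())
wrapped.store(1, b"some pickle")

# This is roughly what a Connection would end up holding.
per_connection = wrapped.new_instance()

print(type(per_connection).__name__)  # ToyBase: the wrapper has been lost
print(per_connection.load(1)[:2])     # b'.z': raw compressed bytes leak out
print(wrapped.load(1))                # b'some pickle': the wrapper still works
```

The per-connection object is the unwrapped base, so it hands back the raw '.z' data, which is consistent with the unpickling error we saw.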

Solution
--------

The solution that we're testing now (and which so far seems to be working: our application can read and write the compressed database records) was to patch a new_instance method into ZlibStorage to make it wrap the underlying storage again:

    from zc.zlibstorage import ZlibStorage

    def new_instance(self):
        # Create an uninitialised wrapper of the same class
        new_self = type(self).__new__(type(self))
        # Preserve _transform, etc
        new_self.__dict__ = self.__dict__.copy()
        # Wrap a fresh per-connection instance of the underlying storage
        new_self.base = self.base.new_instance()
        # Because these are bound methods, we must re-copy
        # them or ivars might be wrong, like _transaction
        for name in self.copied_methods:
            v = getattr(new_self.base, name, None)
            if v is not None:
                setattr(new_self, name, v)
        return new_self

    ZlibStorage.new_instance = new_instance

Does anyone have any comments on this? (My google-fu didn't find this mentioned before, but my google-fu is sometimes weak.) The only IMVCCStorage I've worked with so far has been RelStorage; is this likely to break with others, or are we likely to run into trouble with RelStorage down the line? Does this seem like something that could make it into the tree, or is it special-purpose enough that we should continue patching?

Thanks,
Jason