[ZODB-Dev] Towards ZODB on Python 3
(Resending because I used the wrong From address and the mail got stuck in moderation.)

Some goals, in order of decreasing priority:

1. ZODB should work on Python 3.
2. ZODB databases created on Python 2 should be loadable with ZODB on Python 3.
3. ZODB databases created on Python 3 should be loadable with ZODB on Python 2.

This will be kinda longish, so please settle down.

Now, ZODB is built on top of pickles. And pickles in Python 2 know about two kinds of strings: str and unicode. But there are actually *three* kinds of strings in Python-land:

* bytes
* unicode
* native strings (same as bytes in Python 2, same as unicode in Python 3)

Unfortunately we cannot distinguish bytes from native strings in the pickles produced on Python 2: both kinds are pickled as STRING, BINSTRING or SHORT_BINSTRING opcodes. If we assume they're native strings, we can break pickles that contain binary data, in one of three possible ways:

i. assume 'ascii' and raise UnicodeDecodeError while loading
ii. assume 'latin-1' and silently give applications unicode objects where they expect strings
iii. assume 'utf-8' and combine the disadvantages of both of the above methods: sometimes fail, sometimes return unicode where applications expect bytes

One very common example of binary data: persistent object references.

What if we break stride with the standard library pickle, do our own pickle [1] and load BINSTRINGs as bytes?

iv. assume bytes [2]

Then we break *every object instance* by putting byte strings into the instance __dict__ on Python 3:

    obj.__dict__[b'attr'] = value
    obj.attr
    Traceback ...
    AttributeError: ...

What if we try to detect which SHORT_BINSTRINGs are bytes and which ones are native strings?

v. try to decode as 'ascii'; if that fails, return bytes [3]

Then we, again, get the disadvantage of approach (ii), only in a very inconsistent manner: sometimes pickled binary data unpickles into unicode. Half of your OIDs are now u'\0\0\0\0\0\0\0\x7f', the other half is b'\0\0\0\0\0\0\0\x80'.
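For what it's worth, the loading choices above can be reproduced with the stdlib pickle module on Python 3. The byte string below is hand-crafted to match what Python 2's pickle emits for the str '\x00\x80' at protocol 2: a SHORT_BINSTRING opcode that carries no hint about whether the payload was text or binary.

```python
# Reproducing approaches (i), (ii) and (iv) with stdlib pickle on Python 3.
import pickle

# Hand-crafted equivalent of Python 2's pickle.dumps('\x00\x80', 2):
# PROTO 2, SHORT_BINSTRING of length 2, STOP.
py2_pickle = b'\x80\x02U\x02\x00\x80.'

# (i) assume 'ascii': loading binary data simply fails
try:
    pickle.loads(py2_pickle, encoding='ascii')
except UnicodeDecodeError:
    print('UnicodeDecodeError')

# (ii) assume 'latin-1': loads, but binary data comes back as unicode
print(repr(pickle.loads(py2_pickle, encoding='latin-1')))   # '\x00\x80'

# (iv) assume bytes: stdlib pickle also supports encoding='bytes'
print(repr(pickle.loads(py2_pickle, encoding='bytes')))     # b'\x00\x80'
```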
ZODB itself can cope with that [4], but will someone think of the childre^H^H^H^H^H applications?

What if we introduce a way for applications to specify whether they want bytes or unicode?

vi. define an explicit schema of some kind for each Persistent subclass, e.g. _p_load_as_bytes = ('names', 'of', 'attributes'); advanced users can override __setstate__ and do type fixups in there

I don't know. I haven't had the time to think this through yet. It sounds like a huge amount of work for everyone.

[1] https://github.com/zopefoundation/zodbpickle
[2] zodbpickle.pickle.Unpickler(encoding='bytes')
[3] zodbpickle.pickle.Unpickler(encoding='ascii', errors='bytes')
[4] this is the status quo of the 'py3' branch in the ZODB repo

That's the situation with loading. I've implemented approach (v) in the ZODB py3 branch, but I'm by no means certain it is acceptable.

But that's not all, there's more fun to be had on the dumping side too! We want pickles created by ZODB to be:

a) reasonably short
b) round-trippable (what you dump, you get back on load)
c) compatible with Python 2
d) noload()able [5]

[5] i.e. we want to be able to do garbage collection without actually instantiating user-defined classes (think of a ZEO server that doesn't have the right modules in sys.path, or standalone zodbgc processing), which is why we added noload() back into zodbpickle. noload() must be able to crawl the pickles and get back OIDs from persistent references.

There are problems with each of these requirements, and solutions for those problems make the other requirements impossible to implement.

* Python 3 pickles bytestrings using a fancy REDUCE opcode, as a function call to codecs.encode(u'decoded bytestring', 'latin-1'). This makes them large and breaks (a), and our noload() copied from the Python 2.x stdlib is unable to handle them, breaking (d). [8]

* Why does Python 3 pickle bytestrings this way?
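A toy version of the idea behind noload() can be sketched with the stdlib alone: walk the pickle opcode stream with pickletools.genops and collect string payloads (such as OIDs inside persistent references) without ever instantiating a class. The real noload() lives in zodbpickle; this only illustrates the approach.

```python
# Sketch: crawling a pickle for string payloads without loading objects.
import pickletools

# Hand-crafted protocol-2 pickle of a Python 2 str holding a binary OID:
# PROTO 2, SHORT_BINSTRING of length 8, STOP.
p = b'\x80\x02U\x08\x00\x00\x00\x00\x00\x00\x00\x7f.'

# genops yields (opcode, argument, position) without building any objects,
# so no user-defined classes need to be importable.
payloads = [arg for op, arg, pos in pickletools.genops(p)
            if op.name in ('STRING', 'BINSTRING', 'SHORT_BINSTRING')]
print(payloads)   # ['\x00\x00\x00\x00\x00\x00\x00\x7f']
```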
Because that's the only way to get round-trippability with Python 3's interpretation of BINSTRING opcodes as unicode, if you use pickle protocols 0, 1, or 2. Pickle protocol 3 has separate opcodes for all three kinds of strings (bytes, unicode, native -- remember?), but it's incompatible with Python 2, breaking requirement (c).

* We could implement a custom pickler [6] and pickle bytestrings as SHORT_BINSTRING, fulfilling requirements (a), (c) and (d), but this breaks (b), i.e. round-tripping.

[6] zodbpickle.pickle.Pickler(bytes_as_strings=True) [7]
[7] this is the status quo of the 'py3' branch in the ZODB repo
[8] OTOH we could implement special support for REDUCE of codecs.encode() in our noload() -- I almost got that working before Jim suggested a different approach, which is [6].

At least there's some nice symmetry: no matter if you pickle your
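The size cost and the protocol split described above are easy to see with the stdlib pickle: at protocols <= 2 a bytes object becomes a REDUCE call through codecs.encode, while protocol 3 uses a dedicated bytes opcode that Python 2 cannot read.

```python
# Comparing how Python 3 pickles the same binary OID at protocols 2 and 3.
import pickle

oid = b'\x00\x00\x00\x00\x00\x00\x00\x80'   # a typical 8-byte OID

p2 = pickle.dumps(oid, protocol=2)  # REDUCE of codecs.encode: large
p3 = pickle.dumps(oid, protocol=3)  # SHORT_BINBYTES: compact, py3-only

print(b'_codecs' in p2)    # True  -- the global reference to codecs.encode
print(len(p2) > len(p3))   # True  -- protocol 2 pays a big size penalty
print(pickle.loads(p2) == pickle.loads(p3) == oid)   # both round-trip on py3
```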
Re: [ZODB-Dev] Cache warm up time
I'd be curious to know what your results are, whichever path you decide to take! Might help inform me as to what might help on my server...

One thing I haven't yet understood is - how come the ZEO server itself doesn't have a cache? It seems that would be a logical place to put one, as the ZEO server generally rarely gets restarted, at least for the use case of running both the ZEO server and the clients on the same machine.

On Fri, Mar 8, 2013 at 1:46 AM, Roché Compaan ro...@upfrontsystems.co.za wrote:

Thanks, there are definitely some settings relating to the persistent cache that I haven't tried before, simply because I've been avoiding them. I'd still be interested to know if one can leverage the RelStorage memcache code for a ZEO cache, so if Shane doesn't get around to it I'll have a stab at it myself. Loading objects from a persistent cache will still cause IO, so to me it seems that it would be a big win to keep the cache in memory even while restarting.

-- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za

On Thu, Mar 7, 2013 at 9:35 PM, Leonardo Rochael Almeida leoroch...@gmail.com wrote:

This mail from Jim on this list a couple of years ago was chock-full of nice tips: https://mail.zope.org/pipermail/zodb-dev/2011-May/014180.html In particular:

- Yes, use a persistent cache. Recent versions are reliable. Make it as large as reasonable (e.g. at most the size of your packed database, at least the size of objects that you want to be around after a restart).
- Consider using zc.zlibstorage to compress the data that's stored in ZODB.
- Set drop-cache-rather-verify to true on the client (avoids a long restart time where your client is revalidating the ZEO cache).
- Set invalidation-age on the server to at least an hour or two, so that you can deal with being disconnected from the storage server for a reasonable period of time without having to verify.
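For concreteness, those tips might look roughly like the ZConfig sketch below. All values here (addresses, sizes, file names) are placeholder assumptions, and the option names should be double-checked against the ZEO and zc.zlibstorage versions actually in use:

```
%import zc.zlibstorage

# client side (e.g. in zope.conf): compressed storage over a ZEO client
# with a persistent on-disk cache
<zlibstorage>
  <zeoclient>
    server localhost:8100
    # 'client' names a persistent cache file that survives restarts
    client zeo1
    cache-size 2GB
    # skip full cache verification after a restart or long disconnect
    drop-cache-rather-verify true
  </zeoclient>
</zlibstorage>

# server side (zeo.conf): keep a couple of hours of invalidations so
# reconnecting clients don't have to verify their whole cache
<zeo>
  address 8100
  invalidation-age 7200
</zeo>
```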
Cheers, Leo

On Thu, Mar 7, 2013 at 3:54 PM, Roché Compaan ro...@upfrontsystems.co.za wrote:

We have a setup that is running just fine when the caches are warm, but it takes several minutes after a restart before the cache warms up. As per usual, big catalog indexes seem to be the problem. I was wondering about two things.

Firstly, in 2011 in this thread https://mail.zope.org/pipermail/zodb-dev/2011-October/014398.html about zeo.memcache, Shane said that he could adapt the caching code in RelStorage for ZEO. Shane, do you still plan to do this? Do you think an instance can restart without having to reload most objects into the cache?

Secondly, I was wondering to what extent using persistent caches can improve cache warm-up time, and whether persistent caches are usable or not, given that at various times in the past it was recommended that one try and avoid them.

-- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za

___ For more information about ZODB, see http://zodb.org/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Cache warm up time
On Fri, Mar 8, 2013 at 2:17 PM, Claudiu Saftoiu csaft...@gmail.com wrote:

Once I know the difference I'll probably be able to answer this myself, but I wonder why the ZEO server doesn't do the sort of caching that allows the client to operate so quickly on the indices once they are loaded.

IIRC ZEO not only takes bytes from the storage and puts them on a socket, it has a kind of heavy protocol for sending objects that has overhead on each object, so lots of small objects (that total 400mb in size) take a lot more time than sending a 400mb blob.

-- Leonardo Santagada
Re: [ZODB-Dev] Cache warm up time
On Fri, Mar 8, 2013 at 12:31 PM, Leonardo Santagada santag...@gmail.com wrote:

On Fri, Mar 8, 2013 at 2:17 PM, Claudiu Saftoiu csaft...@gmail.com wrote:

Once I know the difference I'll probably be able to answer this myself, but I wonder why the ZEO server doesn't do the sort of caching that allows the client to operate so quickly on the indices once they are loaded.

IIRC ZEO not only takes bytes from the storage and puts them on a socket, it has a kind of heavy protocol for sending objects that has overhead on each object, so lots of small objects (that total 400mb in size) take a lot more time than sending a 400mb blob.

Ah, that would make perfect sense. So ZEO and catalog indices really don't mix well at all.
Re: [ZODB-Dev] Cache warm up time
It would be great if there was a way to advise ZODB in advance that certain objects will be required, so it could fetch multiple object states in a single request to the storage server.

I saw a ZODB prefetching discussion a long time ago, but maybe the authors themselves can weigh in here: http://www.python.org/~jeremy/weblog/030418.html

-- Mikko Ohtamaa http://opensourcehacker.com http://twitter.com/moo9000
Re: [ZODB-Dev] Cache warm up time
A very simple alternative to prefetching would be to load the whole DB into memory indiscriminately, if it is configured to do so. This way, you can store your catalog in a separate db and request all of it from the ZEO server and cache it straight away.

I'm still partial to a memcached cache that can survive a restart. The first prize would be if it's possible to share the cache between ZEO clients.

-- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za

On Fri, Mar 8, 2013 at 7:50 PM, Laurence Rowe l...@lrowe.co.uk wrote:

The slowdown is largely because ZODB only loads objects one at a time. Loading a large catalogue requires paying that latency (network + software) each time; 400mb of catalogue data may well equate to something like 1 million objects, and therefore 1 million loads in series. Once the data is loaded into the object cache you only need to fetch invalidated objects.

It would be great if there was a way to advise ZODB in advance that certain objects would be required, so it could fetch multiple object states in a single request to the storage server.
Laurence
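Laurence's serial-load argument can be put into back-of-envelope numbers. The per-object figures below are illustrative assumptions, not measurements: ~400 bytes per catalogue object and a 0.2 ms round trip (network + software latency) per load.

```python
# Why per-object loads dominate cache warm-up: latency paid in series.
catalogue_bytes = 400 * 1024 * 1024   # 400mb of catalogue data
bytes_per_object = 400                # assumed average object size
rtt_ms = 0.2                          # assumed latency per object load

objects = catalogue_bytes // bytes_per_object   # roughly a million objects
warmup_seconds = objects * rtt_ms / 1000

print(objects)                 # 1048576
print(round(warmup_seconds))   # 210 -- about 3.5 minutes of pure latency
```

Even with generous assumptions, the latency term alone lands in the "several minutes after a restart" range reported earlier in the thread, which is why batching loads (or a cache that survives restarts) matters so much.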
Re: [ZODB-Dev] RelStorage 1.5.1 and persistent 4.0.5+ Incompatible (patch)
On 03/07/2013 10:48 AM, jason.mad...@nextthought.com wrote:

On Mar 7, 2013, at 11:35, Sean Upton sdup...@gmail.com wrote:

On Thu, Mar 7, 2013 at 7:31 AM, jason.mad...@nextthought.com wrote:

I only spotted two uses of this assumption in RelStorage, the above-mentioned `_prepare_tid`, plus `pack`. The following simple patch to change those places to use `raw` makes our own internal tests (python2.7, MySQL) pass.

Why not fork https://github.com/zodb/relstorage and submit a pull request?

Because I didn't realize that repository existed :) I will do so, thanks.

On that note, though, the PyPI page still links to the SVN repository at http://svn.zope.org/relstorage/trunk/ (which is also what comes up in a Google search), and that repository still has all its contents; it's missing the 'MOVED_TO_GITHUB' file that's commonly there when a project has been moved (e.g., [1]). With a bit of searching I found the announcement on this list that development had been moved [2], but at first glance it looks like SVN is still the place to be. If the move is complete, maybe it would be good to replace the SVN contents with the MOVED_TO_GITHUB pointer?

Thanks for the patch and suggestion. I intend to handle RelStorage pull requests during/around PyCon next week. :-)

Shane
[ZODB-Dev] transaction: synchronizer newTransaction() behavior
Hi there,

I've been discussing this issue with Laurence Rowe on the pylons-dev mailing list, and he suggested bringing it up here. I'm writing a MongoDB data manager for the Python transaction package: https://github.com/countvajhula/mongomorphism

I noticed that for a synchronizer, the beforeCompletion() and afterCompletion() methods are always called once the synch has been registered, but the newTransaction() method is only called when an explicit call to transaction.begin() is made. Since it's possible for transactions to be started without this explicit call, I was wondering if there was a good reason why these two cases (explicitly vs implicitly begun transactions) would be treated differently. That is, should the following two cases not be equivalent, and therefore should the newTransaction() method be called in both cases:

(1)
    t = transaction.get()
    t.join(my_dm)
    # ..some changes to the data..
    transaction.commit()

and:

(2)
    transaction.begin()
    t = transaction.get()
    t.join(my_dm)
    # ..some changes to the data..
    transaction.commit()

In my mongo dm implementation, I am using the synchronizer to do some initialization before each transaction gets underway, and am currently requiring explicit calls to transaction.begin() at the start of each transaction. Unfortunately, it appears that other third-party libraries using the transaction library may not be calling begin() explicitly, and in particular my data manager doesn't work when used with pyramid_tm.

Another thing I noticed was that a synchronizer cannot be registered like so:

    transaction.manager.registerSynch(MySynch())

... and can only be registered like this:

    synch = MySynch()
    transaction.manager.registerSynch(synch)

... which I'm told is due to MySynch() being stored in a WeakSet, which means it gets garbage collected. Currently this means that I'm retaining a reference to the synch as a global that I never use.
It just seems a bit contrived, so I thought I'd mention that as well, in case there's anything that can be done about it.

Any thoughts? Thanks!

-Sid
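The WeakSet behaviour Sid describes is easy to demonstrate with the stdlib alone. Here MySynch is a stand-in for a real synchronizer, and the weakref.WeakSet plays the role of the transaction manager's internal set of synchs; the point is only that an anonymous instance has no strong reference keeping it alive, so it vanishes before it can ever be called:

```python
# Why transaction.manager.registerSynch(MySynch()) silently does nothing:
# the manager keeps synchronizers in a weak set, so an instance with no
# other reference is garbage-collected right after registration.
import gc
import weakref

class MySynch:
    def beforeCompletion(self, txn): pass
    def afterCompletion(self, txn): pass
    def newTransaction(self, txn): pass

synchs = weakref.WeakSet()   # stand-in for the manager's synch set

synchs.add(MySynch())        # anonymous instance: collected immediately
gc.collect()
print(len(synchs))           # 0 -- the synch is already gone

keeper = MySynch()           # a named reference keeps it alive
synchs.add(keeper)
gc.collect()
print(len(synchs))           # 1 -- still registered
```

So keeping the synch around as a module-level global, awkward as it feels, is exactly what makes the second form work.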