Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?
On 18 January 2013 10:21, Claudiu Saftoiu csaft...@gmail.com wrote:

Er, to be clearer: my goal is for the preload to load everything into the cache that the query mechanism might use. It seems the bucket approach only takes ~10 seconds on the 350k-sized index trees vs. ~60-90 seconds. This seems to indicate that fewer things end up being pre-loaded...

I guess I was too subtle before. Preloading is a waste of time. Just use a persistent ZEO cache of adequate size and be done with it.

Okay. I did that, and I only tried the preloading because it didn't seem I was getting what I wanted. To wit: I ran a simple query and it took a good few minutes. It's true, after it took a few minutes, it ran instantly, and even after a server restart it only took a few seconds, but I don't understand why it took a few minutes in the first place. There are only 750k objects in that database, and I gave it a cache object size of 5 million; the packed database .fs is only 400 megabytes, and I gave it a cache byte size of 3000 megabytes. Then when I change one parameter of the query (to ask for objects with a month of November instead of October), it takes another few minutes...

Speaking to your point, preloading didn't seem to help either (I had 'preloaded' dozens of times over the past few days and the queries still took forever), but the fact remains: it does not seem unreasonable to want these queries to run instantly from the get-go, given that is the point of indexing in the first place. As it stands now, for certain queries I could probably do better loading each object and filtering it via Python, because I wouldn't have to deal with loading the indices in order to run the 'fast' query, but this seems to defeat the point of indices entirely, and I'd like to not have to create custom search routines for every separate query. Again, maybe I'm doing something wrong, but I haven't been able to figure it out yet.

I made a view to display the output of cacheDetailSize like Jeff suggested and I got something like this:

    db = ...
    for conn_d in db.cacheDetailSize():
        writer.write("%(connection)s, size=%(size)s, non-ghost-size=%(ngsize)s\n" % conn_d)

output:

    Connection at 0684fe90, size=635683, non-ghost-size=209039
    Connection at 146c5ad0, size=3490, non-ghost-size=113

That is after having run the 'preloading'. It seems that when the query takes forever, the non-ghost-size is slowly increasing (~100 objects/second) while the 'size' stays the same. Once the query is done after having taken a few minutes, each subsequent run is instant and the ngsize doesn't grow. My naive question is: it has plenty of RAM, why does it not just load everything into the RAM? Any suggestions? There must be a way to effectively use indexing with zodb and what I'm doing isn't working.

Have you confirmed that the ZEO client cache file is being used? Configure logging to display the ZEO messages to make sure. The client cache is transient by default, so you will need to enable persistent client caching to see an effect past restarts:

    <zeoclient>
      client zeo1
      ...
    </zeoclient>

https://github.com/zopefoundation/ZODB/blob/master/doc/zeo-client-cache.txt

Laurence
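For reference, the same persistent cache can be enabled programmatically; a minimal sketch, assuming ZEO 3.x's ClientStorage keyword arguments ('client' names the on-disk cache, 'var' is the directory it lives in), with the address and sizes made up:

    from ZEO.ClientStorage import ClientStorage
    from ZODB import DB

    # 'client' switches on the persistent disk cache (files named zeo1-*.zec);
    # without it the cache is transient and vanishes on restart.
    storage = ClientStorage(('localhost', 8100), client='zeo1',
                            var='/var/zeo-cache',
                            cache_size=3000 * 1024 * 1024)  # bytes
    db = DB(storage)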
Re: [ZODB-Dev] RFC: ZODB 4.0 (without persistent)
On 14 October 2012 22:49, Jim Fulton j...@zope.com wrote:

On Sun, Oct 14, 2012 at 5:28 PM, Tres Seaver tsea...@palladion.com wrote: ... Well, I don't have time to chase BTrees. This could always be done in ZODB 5. :) I could help chop BTrees out, if that would be useful: most of the effort will be purely subtractive in the ZODB package (I don't think anything depends on BTrees).

FileStorage uses BTrees for its in-memory index. MappingStorage uses BTrees. There are ZODB tests that use BTrees, but I suppose they could be fixed. I just don't think the win is that great in separating BTrees at this time.

I don't think Hanno is suggesting removing BTrees as a dependency from ZODB but rather breaking out the BTrees package into a separate PyPI distribution to make it more visible to potential users outside of the ZODB community, e.g. http://www.reddit.com/r/Python/comments/exj74/btree_c_extension_module_for_python_alpha/

To do that, refactoring tests shouldn't be required. I guess it could be argued that the fsBTree should be part of the ZODB rather than the BTrees distribution, but leaving it where it is would be much easier.

Laurence
Re: [ZODB-Dev] RFC: ZODB 4.0 (without persistent)
On 14 October 2012 23:33, Jim Fulton j...@zope.com wrote:

On Sun, Oct 14, 2012 at 6:07 PM, Laurence Rowe l...@lrowe.co.uk wrote: ... I don't think Hanno is suggesting removing BTrees as a dependency from ZODB but rather breaking out the BTrees package into a separate PyPI distribution to make it more visible to potential users outside of the ZODB community. ...

I think if we released a package named BTrees and people looked at it and saw that it was dependent on persistent and ZODB, they'd get pissed. Let's leave BTrees alone for now.

Presumably the dependency tree would look something like:

    persistent <- BTrees <- ZODB <- ZEO

The persistent dependency is definitely less to swallow than the whole ZODB for a potential user of the BTrees package, but it's still a complication and there's no urgent reason to make the change now. Smaller, iterative changes usually win.

Laurence
Re: [ZODB-Dev] Storm/ZEO deadlocks (was Re: [Zope-dev] [announce] NEO 1.0 - scalable and redundant storage for ZODB)
On 30 August 2012 19:19, Shane Hathaway sh...@hathawaymix.org wrote:

On 08/30/2012 10:14 AM, Marius Gedminas wrote: On Wed, Aug 29, 2012 at 06:30:50AM -0400, Jim Fulton wrote: On Wed, Aug 29, 2012 at 2:29 AM, Marius Gedminas mar...@gedmin.as wrote: On Tue, Aug 28, 2012 at 06:31:05PM +0200, Vincent Pelletier wrote: On Tue, 28 Aug 2012 16:31:20 +0200, Martijn Pieters m...@zopatista.com wrote: Anything else different? Did you make any performance comparisons between RelStorage and NEO?

I believe the main difference compared to all other ZODB Storage implementations is the finer-grained locking scheme: in all storage implementations I know, there is a database-level lock during the entire second phase of 2PC, whereas in NEO transactions are serialised only when they alter a common set of objects.

This could be a compelling point. I've seen deadlocks in an app that tried to use both ZEO and PostgreSQL via the Storm ORM. (The thread holding the ZEO commit lock was blocked waiting for the PostgreSQL commit to finish, while the PostgreSQL server was waiting for some other transaction to either commit or abort -- and that other transaction couldn't proceed because it was waiting for the ZEO lock.)

This sounds like an application/transaction configuration problem.

*shrug* Here's the code to reproduce it: http://pastie.org/4617132

To avoid this sort of deadlock, you need to always commit in a consistent order. You also need to configure ZEO (or NEO) to time-out transactions that take too long to finish the second phase.

The deadlock happens in tpc_begin() in both threads, which is the first phase, AFAIU. AFAICS Thread #2 first performs tpc_begin() for ClientStorage and takes the ZEO commit lock. Then it enters tpc_begin() for Storm's StoreDataManager and blocks waiting for a response from PostgreSQL -- which is delayed because the PostgreSQL server is waiting to see if the other thread, Thread #1, will commit or abort _its_ transaction, which is conflicting with the one from Thread #2. Meanwhile Thread #1 is blocked in ZODB's tpc_begin(), trying to acquire the ZEO commit lock held by Thread #2.

So thread 1 acquires in this order: 1. PostgreSQL, 2. ZEO. Thread 2 acquires in this order: 1. ZEO, 2. PostgreSQL.

SQL databases handle deadlocks by detecting and automatically rolling back transactions, while the transaction package expects all data managers to completely avoid deadlocks using the sortKey method. I haven't looked at the code, but I imagine Storm's StoreDataManager implements IDataManager. I wonder if StoreDataManager provides a consistent sortKey. The sortKey method must return a string (not an integer or other object) that is consistent yet different from all other participating data managers.

Storm's DataManager defines sortKey as:

    def sortKey(self):
        # Stores in TPC mode should be the last to be committed, this makes
        # it possible to have TPC behavior when there's only a single store
        # not in TPC mode.
        if self._store._tpc:
            prefix = "zz"
        else:
            prefix = "aa"
        return "%s_store_%d" % (prefix, id(self))

http://bazaar.launchpad.net/~storm/storm/trunk/view/head:/storm/zope/zstorm.py#L320

(By default self._store._tpc is set to False.)

This is essentially similar to zope.sqlalchemy's, the single phase variant being:

    def sortKey(self):
        # Try to sort last, so that we vote last - we may commit in tpc_vote(),
        # which allows Zope to roll back its transaction if the RDBMS
        # threw a conflict error.
        return "~sqlalchemy:%d" % id(self.tx)

http://zope3.pov.lt/trac/browser/zope.sqlalchemy/trunk/src/zope/sqlalchemy/datamanager.py#L105

(The TPC variant simply omits the leading tilde as it is not required to sort last - zope.sqlalchemy commits in tpc_vote() rather than tpc_finish() when using single phase commit.)

ZEO's sortKey is:

    def sortKey(self):
        # If the client isn't connected to anything, it can't have a
        # valid sortKey(). Raise an error to stop the transaction early.
        if self._server_addr is None:
            raise ClientDisconnected
        else:
            return '%s:%s' % (self._storage, self._server_addr)

http://zope3.pov.lt/trac/browser/ZODB/trunk/src/ZEO/ClientStorage.py#L698

(self._storage defaults to the string '1'.)

This should mean that ZEO always gets a sortKey like '1:./zeosock' in the example given whereas Storm gets a sortKey like 'aa_storm_12345' (though the final number will vary per transaction.) Which should mean a consistent sort order and ZEO always committing first. It seems StormDataManager only commits in tpc_finish, doing nothing in either of the commit() or tpc_vote() stages when in 1PC mode. As ZEO sorts first, a failure to commit by Storm could never abort the ZEO server's
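To make that ordering concrete, a tiny illustration (the key values are representative stand-ins, not taken from a real run): the transaction package orders participating data managers by plain string comparison of their sortKey() results, so ZEO's '1:...' sorts before Storm's 'aa_...', and zope.sqlalchemy's '~...' sorts last:

    # ASCII ordering: '1' (0x31) < 'a' (0x61) < '~' (0x7e)
    keys = ['aa_storm_140268', '1:./zeosock', '~sqlalchemy:140269']
    print(sorted(keys))
    # -> ['1:./zeosock', 'aa_storm_140268', '~sqlalchemy:140269']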
Re: [ZODB-Dev] ZODB via Pipe/Socket
On 20 March 2012 16:52, Adam Tauno Williams awill...@whitemice.org wrote: Is it possible to open a ZODB in a thread and share it with other threads via a filesystem socket or pipe [rather than a TCP connection]? I've searched around and haven't found any reference to such a configuration.

This resolved bug report suggests you can, using ZEO: https://bugs.launchpad.net/zodb/+bug/663259

Laurence
Re: [ZODB-Dev] Build compression into ZODB 3.11?
On 14 March 2012 17:47, Jim Fulton j...@zope.com wrote: I'm pretty happy with how zc.zlibstorage has worked out. Should I build this into ZODB 3.11? +1 BTW, lz4 compression looks interesting. The Python binding (at least from PyPI) is broken. I submitted an issue. Hopefully it will be fixed.

FWIW, I experimented with c_zlib from https://gist.github.com/242459 in order to use a zlib default dictionary - a 32KB string used to pre-fill the compression buffer. Using a ~75MB Data.fs from a Plone site that compressed down to ~30MB with zc.zlibstorage normally, the most successful dictionary I tried was the end of the Data.fs itself which saved only an additional 6% over an empty dictionary. That feels like an unfair test to me, probably deduplicating serialized catalog bucket values. The next best was the last 32KB from another Plone Data.fs which only managed to save an additional 2.5% and a fairly short dictionary with common pickled classes saved an additional 2%. None of those savings seem worthwhile pursuing further given the extra brittleness involved.

Laurence
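For anyone repeating the experiment on a modern Python: since 3.3 the stdlib exposes preset dictionaries directly via the zdict argument, so a ctypes wrapper like c_zlib is no longer needed. A sketch, with the file name invented:

    import zlib

    # Use the last 32KB of an existing Data.fs as the preset dictionary.
    with open('Data.fs', 'rb') as f:
        f.seek(0, 2)                          # seek to the end
        f.seek(max(0, f.tell() - 32768))
        dictionary = f.read()

    def compress(data):
        c = zlib.compressobj(zdict=dictionary)
        return c.compress(data) + c.flush()

    def decompress(data):
        d = zlib.decompressobj(zdict=dictionary)
        return d.decompress(data) + d.flush()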
Re: [ZODB-Dev] Server-side caching
On 13 February 2012 10:06, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote:

The OS' file-system cache acts as a storage server cache. The storage server does (essentially) no processing to data read from disk, so an application-level cache would add nothing over the disk cache provided by the storage server.

I see, then I guess it would be good to have at least the same amount of RAM as the total size of the DB, no? From what I see in our server, the linux buffer cache takes around 13GB of the 16G available, while the rest is mostly taken by the ZEO process (1.7G). The database is 17GB on disk.

Adding enough memory so the database fits in RAM is always a good idea. Since the introduction of blobs, this should be possible (and relatively cheap) for most ZODB deployments. For Plone sites, a 30GB pre-blobs Data.fs typically falls to 2-3GB with blobs. There's also the wrapper storage zc.zlibstorage which compresses ZODB records, allowing more of the database to fit in RAM (RelStorage has an option to compress records.)

Also note that, for better or worse, FileStorage uses an in-memory index of current record positions, so no disk access is needed to find current data.

Yes, but pickles still have to be retrieved, right? I guess this would mean random access (for a database like ours, in which we have many small objects), which doesn't favor cache performance. I'm asking this because in the tests we've made with SSDs we have seen a 20% decrease in reading time for non-client-cached objects. So, there seems to be some disk i/o going on.

The mean performance improvement doesn't tell the whole story here. With most of your database in the file-system cache, median read times will be identical, but your 95th percentile read times will show a huge decrease as the seek time on an SSD is orders of magnitude lower than the seek time of a spinning disk. Even when you have enough RAM so the OS can cache the database in memory, I still think SSDs are worthwhile. Packing the database, backing up or any operation that churns through the disk can all cause the database to drop out of the file-system cache. Be sure to choose an SSD with capacitor backup so it won't lose your data, see: http://blog.2ndquadrant.com/en/2011/04/intel-ssd-now-off-the-sherr-sh.html.

In general, I'd say no. It can depend on lots of details, including:
- database size
- active set size
- network speed
- memory and disk speeds on clients and servers
- ...

In any case, from what I see, these client caches cannot be shared between processes, which doesn't make them very useful for us, as we have many parallel processes asking for the same objects over and over again.

You could try a ZEO fanout setup too, where you have a ZEO server running on each client machine. The intermediary ZEO's client cache (you could put it on tmpfs if you have enough RAM) is then shared between all the clients running on that machine.

Laurence
Re: [ZODB-Dev] zeopack error
On 9 February 2012 11:24, Jim Fulton j...@zope.com wrote: I'm sorry I haven't had time to look at this. Still don't really. Thanks Marius!!!

On Wed, Feb 8, 2012 at 6:48 PM, Marius Gedminas mar...@gedmin.as wrote: On Thu, Feb 09, 2012 at 01:25:48AM +0200, Marius Gedminas wrote: On Wed, Feb 08, 2012 at 01:24:55PM +0100, Kaweh Kazemi wrote: Recap: last week I examined problems I had packing our 4GB users storage. ...

    import pickle
    import pprint

    # f is the problem data record, opened elsewhere
    unp = pickle.Unpickler(f)
    unp.persistent_load = lambda oid: 'persistent reference %r' % oid
    pprint.pprint(unp.load())

    {'data': {"persistent reference ['m', ('game', '\\x00\\x00\\x00\\x00\\x00\\x00\\tT', <class '__main__.Tool'>)]": 1,
              "persistent reference ['m', ('game', '\\x00\\x00\\x00\\x00\\x00\\x00\\x12\\x03', <class '__main__.EnergyPack'>)]": 1}}

Note the reference to __main__. This is almost certainly the root problem. Classes shouldn't be defined in __main__ (except when experimenting). At one time, I thought pickle disallowed pickling classes from __main__. ZODB probably should. It's a bug magnet.

Those look like cross-database references to me. The original error (aaaugh Mutt makes it hard for me to look upthread while I'm writing a response) was something about non-hashable lists? Looks like a piece of code is trying to put persistent references into a dict, which can't possibly work in all cases. ...

During my checks I realized that running the pack in a Python 2.7 environment (using the same ZODB version - 3.10.3) works fine, the pack reduces our 4GB storage to 1GB. But our production server uses Python 2.6 (same ZODB 3.10.3) which yields the problem (though the test had been done on OS X 10.7.3 - 64bit, and the production server is Debian Squeeze 32bit). I've no idea why running the same ZODB version on Python 2.7 instead of 2.6 would make this error go away.

Duh! The code that fails is in the standard library -- in the cPickle module:

    Traceback (most recent call last):
      ...
      File "/usr/local/lib/python2.6/dist-packages/ZODB3-3.10.3-py2.6-linux-i686.egg/ZODB/FileStorage/fspack.py", line 328, in findrefs
        return self.referencesf(self._file.read(dh.plen))
      File "/usr/local/lib/python2.6/dist-packages/ZODB3-3.10.3-py2.6-linux-i686.egg/ZODB/serialize.py", line 630, in referencesf
        u.noload()
    TypeError: unhashable type: 'list'

Since the bug is in the stdlib, it's not surprising that the newer stdlib cPickle from Python 2.7 fixes it. I suspect a bug in the application (defining persistent classes in __main__) is the root problem that's aggravated by the cPickle problem.

The pickle's classes were defined in a normal module; I think Marius just aliased those modules to __main__ and defined the classes there in order to load the pickle without the original code:

    sys.modules['game.objects.item'] = sys.modules['__main__']  # hack
    sys.modules['game.objects'] = sys.modules['__main__']  # hack
    sys.modules['game'] = sys.modules['__main__']  # hack

Laurence
Re: [ZODB-Dev] [X-Post] Figure out bottle neck in a repoze.bfg based web app
On 24 January 2012 13:50, steve st...@lonetwin.net wrote: Hi All, I apologize for the cross-post but by this mail I simply hope to get a few pointers on how to narrow down to the problem I am seeing. I shall post to the relevant list if I have further questions. So here is the issue:

Short description: I've got a repoze.bfg application running on top of zeo/zodb across multiple servers, served using mod_wsgi and it's showing bad resource usage (both high memory consumption as well as CPU usage). Are there any steps i can do to localise whether this is an issue with zeo/zodb/mod_wsgi configuration, and/or usage?

Long description:

* I have a repoze.bfg (version 1.3) based app, which uses zodb (over zeo, version 3.10.2) as the backend and is served up using apache+mod_wsgi. All running on minimal debian 6.0 based amazon instances.

* The architecture is 1 zodb server and 4 app instances running on individual EC2 instances (all in the same availability zone). All of the instances are behind an amazon Elastic Load Balancer.

* At the web-server, we don't customize apache much (ie: we pretty much use the stock debian apache config). We use mod_wsgi (version 3.3-2) to serve the application in daemon mode, with the following parameters:

    WSGIDaemonProcess webapp user=appname threads=7 processes=4 maximum-requests=1 python-path=/path/to/virtualenv/eggs

* The web app is the only thing that is served from these instances, and we serve the static content using apache rather than the web app.

* The zodb config on the db server looks like:

    <zeo>
      address 8886
      read-only false
      invalidation-queue-size 1000
      pid-filename $INSTANCE/var/ZEO.pid
      # monitor-address 8887
      # transaction-timeout SECONDS
    </zeo>

    <blobstorage 1>
      <filestorage>
        path $INSTANCE/var/webapp.db
      </filestorage>
      blob-dir $INSTANCE/var/blobs
    </blobstorage>

* The zeo connection string (for repoze.zodbconn-0.11) is:

    zodb_uri = zeo://<zodb server ip>:8886/?blob_dir=/path/to/var/blobs&shared_blob_dir=false&connection_pool_size=50&cache_size=1024MB&drop_cache_rather_verify=true

(Note: the drop_cache_rather_verify=true is for faster startups)

Now with this, on live we have typical load such as:

    top - 13:34:54 up 1 day, 8:22, 2 users, load average: 11.87, 8.75, 6.37
    Tasks: 85 total, 2 running, 82 sleeping, 0 stopped, 1 zombie
    Cpu(s): 81.1%us, 6.7%sy, 0.0%ni, 11.8%id, 0.0%wa, 0.0%hi, 0.1%si, 0.2%st
    Mem: 15736220k total, 7867340k used, 7868880k free, 283332k buffers
    Swap: 0k total, 0k used, 0k free, 1840876k cached

      PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
     5079 appname  21  0 1587m 1.2g 6264 S   77  8.1 9:23.86 apache2
     5065 appname  20  0 1545m 1.2g 6272 S   95  7.9 9:31.24 apache2
     5144 appname  20  0 1480m 1.1g 6260 S   86  7.4 5:49.92 apache2
     5127 appname  20  0 1443m 1.1g 6264 S   94  7.2 7:13.10 apache2

As you can see, that's a very high load avg., and the apache processes spawned for mod_wsgi (identifiable because of the user whose context they run under) consume about 1.2G resident memory each. With a constant load like this, the app. response progressively degrades. We've tried to tweak the number of processes, the cache_size in the zeo connection string but all to no avail. So, now rather than shoot in the dark, I would appreciate suggestions on how I might be able to isolate the bottle-neck in the stack.

One thing to note is that this high load and memory usage is only seen on the production instances. When we test the app. using ab or funkload on a similar setup (2 app instances instead of 4), we do not see this problem. Any pointers/comments would be appreciated.
(Following up only on zodb-dev as I'm not subscribed to the other lists.)

I'm guessing, but I suspect your load tests may only be reading from the ZODB so you rarely see any cache misses.

The most important tuning parameters for ZODB in respect to memory usage are the number of threads and the connection_cache_size. The connection_cache_size controls the number of persistent objects kept live in the interpreter at a time. It's a per-connection setting, and as each thread needs its own connection, memory usage increases proportionally to connection_cache_size * number of threads.

Most people use either one or two threads per process with the ZODB. I know plone.recipe.zope2instance defaults to two threads per process, though I think this is only to avoid locking up in the case of Plone being configured to load an RSS feed from itself. The Python Global Interpreter Lock prevents threads from running concurrently, so with ZEO running so many threads per process is likely to be counter-productive. Try with one or two threads and perhaps up the connection_cache_size (though loading from the zeo cache is very quick you must ensure your working set fits in the connection cache or else you'll be loading the
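A back-of-envelope version of that proportionality; every number below is a placeholder to swap for your own measurements:

    # Rough memory estimate for the setup quoted above; all figures invented.
    connection_cache_size = 10000   # objects kept live per connection
    threads_per_process = 7         # the WSGIDaemonProcess setting above
    processes = 4
    avg_object_bytes = 5000         # measure your own; this varies wildly

    total = (connection_cache_size * threads_per_process *
             processes * avg_object_bytes)
    print('~%.1f GB of live objects per machine' % (total / 1e9))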
Re: [ZODB-Dev] zeo.memcache
On 12 October 2011 23:53, Shane Hathaway sh...@hathawaymix.org wrote: As I see it, a cache of this type can take 2 basic approaches: it can either store {oid: (state, tid)}, or it can store {(oid, tid): (state, last_tid)}. The former approach is much simpler, but since memcache has no transaction guarantees whatsoever, it would lead to consistency errors. The latter approach makes it possible to avoid all consistency errors even with memcache, but it requires interesting algorithms to make efficient use of the cache. I chose the latter.

On first reading I had thought that the {oid: (state, tid)} approach would not necessarily lead to consistency errors as a connection could simply discard cached values where the cached state tid is later than the current transaction's last tid. But I guess that it must be impossible for a committing connection to guarantee that all cached oids remain invalidated during a commit and are not refilled with a previous state by another connection performing a read. This would necessitate the same checkpointing algorithm to avoid consistency errors.

I sometimes wonder if it would be better to separate the maintenance of the oid_tid mapping from the storage of object states. A database storing only the oid_tid mapping and enough previous tids to support current transactions -- essentially the Data.fs.index -- would always fit easily in RAM and could conceivably be replicated to every machine in a cluster to ensure fast lookups. The storage / caching of object states could then be very simple.

Laurence
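A sketch of that simpler {oid: (state, tid)} scheme, with hypothetical stand-in cache/storage interfaces (this is not RelStorage's actual API), showing where the refill race lives:

    def load(cache, storage, oid, view_tid):
        # view_tid: the last tid visible to this connection's view.
        entry = cache.get(oid)
        if entry is not None:
            state, tid = entry
            if tid <= view_tid:            # not newer than our view: usable
                return state, tid
        state, tid = storage.load(oid)     # miss or too new: ask the storage
        # This refill races with committers: a reader can re-insert a stale
        # state *after* a commit has invalidated the key, which is why the
        # {(oid, tid): ...} keying and checkpointing are needed instead.
        cache.set(oid, (state, tid))
        return state, tid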
Re: [ZODB-Dev] Corrupted OOTreeSet - strange behavior
On 18 July 2011 11:07, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote: Hello, I have an OOTreeSet in my DB that is behaving a bit funny (seems to be corrupted). I thought I could get some more information by performing a sanity check, but that doesn't seem to help a lot:

    c in s
    False
    c in list(s)
    True
    s._check()

Shouldn't there be an error in this case?

TreeSets are essentially BTrees with only keys. This means that the members of a TreeSet must have a stable ordering. I suspect that c's class does not define the comparison methods (such as __lt__), which means under Python 2 it falls back to the default ordering based on object id (Python 3 will raise a TypeError instead, avoiding this problem.) With ZODB an object's Python id (the memory address of the object) will change whenever it is reloaded, i.e. across restarts, after invalidation or removal from the cache.

A TreeSet is ordered, so the containment check only needs to perform a lookup to see whether an object is a member of the TreeSet; as the id of the object has changed, its expected position has changed and it is not found. Testing containment in a list, by contrast, compares against every object in the list, which is why `c in list(s)` still finds it.

The _check() method only confirms that the BTree/TreeSet's internal data structure is consistent. It does not check every item, so it does not show an error in this case.

You will need to add comparison methods to the class of the objects you are storing in the TreeSet and then rebuild the TreeSets.

Laurence
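A minimal sketch of that fix, assuming Python 2 (as in this thread) and a key attribute that never changes once set:

    from persistent import Persistent
    from BTrees.OOBTree import OOTreeSet

    class Member(Persistent):
        def __init__(self, key):
            self.key = key            # must stay immutable for the set's lifetime

        def __cmp__(self, other):     # stable total ordering across reloads
            return cmp(self.key, other.key)

    s = OOTreeSet()
    c = Member(42)
    s.insert(c)
    assert c in s                     # remains true after ghosting/reload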
Re: [ZODB-Dev] Corrupted OOTreeSet - strange behavior
On 18 July 2011 13:08, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote: TreeSets are essentially BTrees with only keys. This means that the members of a TreeSet must have a stable ordering. I suspect that that c's class does not define the comparison methods (such as __lt__) which means under Python 2 it falls back to the default ordering based on object id (Python 3 will raise a TypeError instead, avoiding this problem.) With ZODB an object's Python id (the memory address of the object) will change whenever it is reloaded, i.e. across restarts, after invalidation or removal from the cache.

Yes, I know that. But I have a __cmp__ function defined, based on an object property that never changes. That should be enough, no?

I think it should, but are you absolutely certain it never changes? Does list(s) == sorted(list(s)) and does list(s) == list(OOTreeSet(s))?

Laurence
Re: [ZODB-Dev] RFC: Blobs in S3
On 6 July 2011 19:44, Jim Fulton j...@zope.com wrote: We're evaluating AWS for some of our applications and I'm thinking of adding some options to support using S3 to store Blobs:

1. Allow a storage in a ZEO storage server to store Blobs in S3. This would probably be through some sort of abstraction to make this not actually depend on S3. It would likely leverage the fact that a storage server's interaction with blobs is more limited than application code.

2. Extend blob objects to provide an optional URL to fetch data from. This would allow applications to provide S3 (or similar service) URLs for blobs, rather than serving blob data themselves.

2.1 If I did this I think I'd also add a blob size property, so you could get a blob's size without opening the blob file or downloading it from a database server.

Option 3. Handle blob URLs at the application level. To make this work for the S3 case, I think we'd have to use a ZEO server connection to be called by application code. Something like:

    self.blob = ZODB.blob.Blob()
    f = self.blob.open('w')
    f.write(some_data)

Option 1 is fairly straightforward, and low risk. Option 2 is much trickier:

- It's an API change
- There are bits of implementation that depend on the current blob record format. I'm not sure if these bits extend beyond the ZODB code base.
- The handling of blob object state would be a little delicate, since some of the state would be set on the storage server.
- The win depends on being able to load a blob file independently of loading blob objects, although the ZEO blob cache implementation already depends on this.

Adding the ability to store blobs in S3 would be an excellent feature for AWS based deployments. I'm not convinced that presenting S3 urls to the end users is terribly useful as there is no ability to set a Content-Disposition header and the url will not end with the correct file extension, which will cause problems for users downloading files. I would imagine a more common setup would be to serve the S3 stored blobs through a proxy server running in EC2, using something similar to Nginx's X-Accel-Redirect. Lovely Systems has some information on generating an S3 Authorization header in Nginx here: http://www.lovelysystems.com/nginx-as-an-amazon-s3-authentication-proxy-2/ - though generating an authenticated S3 URL in Python to set in the X-Accel-Redirect header would lead to much simpler proxy configuration.

In either case though, I don't see why doing so would necessitate changing the blob record format - presumably a blob's url can be simply mapped from the S3 blobstorage configuration and a blob's oid and tid?

Laurence
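A sketch of the Python side of that proxy idea, using the boto of the era; the bucket and key names are invented and the exact Nginx location wiring is left out:

    from boto.s3.connection import S3Connection

    conn = S3Connection()                     # credentials from the environment
    bucket = conn.get_bucket('blob-bucket')   # hypothetical bucket
    key = bucket.get_key('0x01/0x02ab.blob')  # e.g. mapped from oid/tid
    signed = key.generate_url(expires_in=60)  # time-limited, signed GET URL
    # The app would then hand this off to Nginx, e.g. via an internal
    # location that proxies to S3 (framework- and config-specific):
    # response.headers['X-Accel-Redirect'] = '/s3/' + signed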
Re: [ZODB-Dev] RFC: Blobs in S3
On 7 July 2011 16:55, Jim Fulton j...@zope.com wrote: On Thu, Jul 7, 2011 at 10:49 AM, Laurence Rowe l...@lrowe.co.uk wrote: ... One thing I found with my (rather naive) experiments building s3storage a few years ago is that you need to ensure requests to S3 are made in parallel to get reasonable performance. This would be a lesser problem with blobs, but even then you might have multiple file uploads in the same request. The boto library is really useful, but doesn't support async requests.

Right, it occurred to me that commit performance with s3 might be an issue.

I guess the simplest implementation would only upload a blob to S3 in tpc_begin as that is where the tid is set (and presumably the tid will form part of the blob's S3 url.) With large files that might make tpc_begin take a long time to complete as it waits for the blob data to be loaded into S3. It might be better to upload large blobs to a temporary s3 url first and then only make an S3 copy in tpc_begin; you'd need to do some benchmarks to see if this was worthwhile for all files or only files over a certain size.

I think I get where you're going, although I'd quibble with the details. There is certainly some opportunity for doing things in parallel up until you get to tpc_vote. I wonder if renames in S3 take much time. I can imagine that they do.

Thinking about this again, perhaps it would be better to store a url or uuid in the blob's record. This would allow a blob's S3 url to be assigned much earlier as it need not contain the tid. The commit would not then need to involve any requests to S3 at all. While I don't suppose an S3 copy request should be any slower than a zero byte PUT (S3 only promises eventual consistency), you still need to pay the latency.

Laurence
Re: [ZODB-Dev] Immutable blobs?
On 9 May 2011 13:32, Hanno Schlichting ha...@hannosch.eu wrote: On Mon, May 9, 2011 at 2:26 PM, Laurence Rowe l...@lrowe.co.uk wrote: While looking at the Plone versioning code the other day, it struck me that it would be much more efficient to implement file versioning if we could rely on blobs never changing after their first commit, as a copy of the file data would not need to be made proactively in the versioning repository in case the blob was changed in a future transaction. Subclassing of blobs is not supported, but looking at the code I didn't see anything that actively prevented this other than the Blob.__init__ itself. Is there something I've missed here? I had thought that an ImmutableBlob could be implemented by overriding the open and consumeFile methods of Blob to prevent modification after first commit.

I thought blobs are always immutable by design?

Blobs can be opened writable in subsequent transactions with blob.open('w'). This leads to the blob storage creating a new file when the transaction is committed - the naming scheme is basically oid/tid.blob.

Laurence
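A sketch of that ImmutableBlob idea; detecting "already committed" via _p_serial is my assumption, not something from the thread, and nothing here is tested against blob internals:

    from ZODB.blob import Blob
    from ZODB.utils import z64

    class ImmutableBlob(Blob):
        def _already_committed(self):
            # A freshly created persistent object keeps serial z64 until
            # its first commit (an assumption this sketch relies on).
            return self._p_serial is not None and self._p_serial != z64

        def open(self, mode='r'):
            if 'w' in mode and self._already_committed():
                raise ValueError('blob is immutable after first commit')
            return Blob.open(self, mode)

        def consumeFile(self, filename):
            if self._already_committed():
                raise ValueError('blob is immutable after first commit')
            Blob.consumeFile(self, filename)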
Re: [ZODB-Dev] How to check for setting the same values on persistent objects?
On 4 May 2011 10:53, Hanno Schlichting ha...@hannosch.eu wrote: Hi. I tried to analyze the overhead of changing content in Plone a bit. It turns out we write back a lot of persistent objects to the database, even though the actual values of these objects haven't changed. Digging deeper I tried to understand what happens here:

1. persistent.__setattr__ will always set _p_changed to True and thus cause the object to be written back
2. Some BTree buckets define the VALUE_SAME macro. If the macro is available and the new value is the same as the old, the change is ignored
3. The VALUE_SAME macro is only defined for the int, long and float value variants but not the object based ones
4. All code in Products.ZCatalog does explicit comparisons of the old and new value and ignores non-value-changes. I haven't seen any other code doing this.

I'm assuming doing a general check for old == new is not safe, as it might not be implemented correctly for all objects and doing the comparison might be expensive. But I'm still curious if we could do something about this. Some ideas:

1. Encourage everyone to do the old == new check in all application code before setting attributes on persistent objects.
Pros: This works today, you know what type of values you are dealing with and can be certain when to apply this, you might be able to avoid some computation if you store multiple values based on the same input data.
Cons: It clutters all code.

2. Create new persistent base classes which do the checking in their __setattr__ methods (see the sketch after this list).
Pros: A lot less cluttering in the application code.
Cons: All applications would need to use the new base classes. Developers might not understand the difference between the variants and use the checking versions, even though they store data which isn't cheap to compare.

2.a. Create new base classes and do type checking for built-in types.
Pros: Safer to use than always doing value comparisons.
Cons: Still separate base classes and overhead of doing type checks.

3. Compare object state at the level of the pickled binary data. This would need to work at the level of the ZODB connection. When doing savepoints or commits, the registered objects flagged as _p_changed would be checked before being added to the modified list. In order to do this, we need to get the old value of the object, either by loading it again from the database or by keeping a cache of the non-modified state of all objects. The latter could be done in persistent.__setattr__, where we add the pristine state of an object into a separate cache before doing any changes to it. This probably should be a cache with an upper limit, so we avoid running out of memory for connections that change a lot of objects. The cache would only need to hold the binary data and not unpickle it.
Pros: On the level of the binary data, the comparison is rather cheap and safe to do.
Cons: We either add more database reads or complex change tracking; the change tracking would require more memory for keeping a copy of the pristine object. Interactions with ghosted objects and the new cache could be fragile.

4. Compare the binary data on the server side.
Pros: We can get to the old state rather quickly and only need to deal with binary string data.
Cons: We make all write operations slower, by adding additional read overhead, especially those which really do change data. This won't work on RelStorage. We only save disk space and cache invalidations, but still do the bulk of the work and send data over the network.

I probably missed some approaches here.
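For concreteness, a rough sketch of idea 2 (the checking base class); it inherits all the caveats listed above about comparison cost and unreliable __eq__ implementations:

    from persistent import Persistent

    _marker = object()

    class CheckingPersistent(Persistent):
        def __setattr__(self, name, value):
            if not name.startswith(('_p_', '_v_')):
                old = getattr(self, name, _marker)
                try:
                    if old is not _marker and old == value:
                        return          # same value: don't mark _p_changed
                except Exception:
                    pass                # incomparable: fall through and set
            Persistent.__setattr__(self, name, value)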
None of the approaches feels like a good solution to me. Doing it server side (4) is a bad idea in my book. Option 3 seems to be the most transparent and safe version, but is also the most complicated to write with all interactions to other caches. It's also not clear what additional responsibilities this would introduce for subclasses of persistent which overwrite various hooks. Maybe option one is the easiest here, but it would need some documentation about this being a best practice. Until now I didn't realize the implications of setting attributes to unchanged values. Persistent objects are also used as a cache and in that case code relies on an object being invalidated to ensure its _v_ attributes are cleared. Comparing at the pickle level would break these caches. I suspect that this is only really a problem for the catalogue. Content objects will always change on the pickle level when they are invalidated as they will have their modification date updated. I imagine you also see archetypes doing bad things as it tends to store one persistent object per field, but that is just bad practise. It would be interesting to see the performance impact of adding newvalue != oldvalue checks on the catalogue data structures. This would also prevent the unindex logic being called unnecessarily. I don't think that the dobbin requirement
Re: [ZODB-Dev] transaction as context manager, exception during commit
On 24 February 2011 10:17, Chris Withers ch...@simplistix.co.uk wrote: Hi Jim, The current __exit__ for transaction managers looks like this:

    def __exit__(self, t, v, tb):
        if v is None:
            self.commit()
        else:
            self.abort()

..which means that if you're using the transaction package as a context manager and, say, a relational database integrity constraint is violated, then you're left with a hosed transaction that still needs aborting. How would you feel about the above changing to:

    def __exit__(self, t, v, tb):
        if v is None:
            try:
                self.commit()
            except:
                self.abort()
                raise
        else:
            self.abort()

If this is okay, I'll be happy to write the tests and make the changes provided someone does a release when I have...

Looking at the way ZPublisher handles this, I think you're right. I think you might also need to modify the __exit__ in Attempt, which additionally handles retrying transactions that fail.

Laurence
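To see the problem in miniature (a hand-rolled example, not from the thread):

    import transaction

    try:
        with transaction.manager:   # __exit__ calls commit() on clean exit
            pass  # suppose a joined data manager later fails in tpc_vote
    except Exception:
        transaction.abort()   # currently required by hand; the proposed
                              # __exit__ would abort before re-raising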
Re: [ZODB-Dev] RelStorage pack with history-free storage results in POSKeyErrors
On 26 January 2011 21:57, Jürgen Herrmann juergen.herrm...@xlhost.de wrote: is there a script or some example code to search for cross db references? i'm also eager to find out... for now i disabled my packing cronjobs.

Packing with garbage collection disabled (pack-gc = false) should definitely be safe.

Laurence
Re: [ZODB-Dev] RelStorage pack with history-free storage results in POSKeyErrors
On 26 January 2011 23:11, Chris Withers ch...@simplistix.co.uk wrote: On 26/01/2011 22:49, Laurence Rowe wrote: On 26 January 2011 21:57, Jürgen Herrmann juergen.herrm...@xlhost.de wrote: is there a script or some example code to search for cross db references? i'm also eager to find out... for now i disabled my packing cronjobs. Packing with garbage collection disabled (pack-gc = false) should definitely be safe.

Am I right in thinking this is pointless if you're using a history-free storage?

Yes.

Laurence
Re: [ZODB-Dev] RelStorage and PosKey errors - is this a risky hotfix?
On 24 January 2011 21:28, Shane Hathaway sh...@hathawaymix.org wrote: On 01/24/2011 02:02 PM, Anton Stonor wrote: Hi there, We have recently experienced a couple of PosKey errors with a Plone 4 site running RelStorage 1.4.1 and Mysql 5.1. After digging down we found that the objects that were throwing PosKeyErrors actually existed in the object_state table with pickles etc, however not in the current_object table. After inserting the missing pointers into the current_object table, everything worked fine:

    mysql> SELECT zoid, tid FROM object_state WHERE zoid=561701;
    +--------+--------------------+
    | zoid   | tid                |
    +--------+--------------------+
    | 561701 | 255267099158685832 |
    +--------+--------------------+

    mysql> INSERT INTO current_object(zoid, tid) VALUES('561701', '255267099158685832');

Looks like it works -- but is this a safe way to fix PosKeyErrors? Now, I wonder why these pointers were deleted from the current_object table in the first place. My money is on packing -- and it might fit with the fact that we recently ran a pack that removed an unusually large amount of transactions in a single pack (100,000+ transactions). But I don't know how to investigate the root cause further. Ideas?

This suggests MySQL not only lost some data (due to a MySQL bug or a filesystem-level error), but it failed to enforce a foreign key that is supposed to ensure this never happens. I think you need to check the integrity of your filesystem (e2fsck -f) and database (mysqlcheck -c). You might also reconsider the choice to use MySQL.

Must this imply a failure to maintain a foreign key constraint? While there are FK constraints on current_object (zoid, tid) -> object_state (zoid, tid), there is no foreign key that might prevent a current_object row from being incorrectly deleted. I think that means the possibilities are:

1. The current_object table was not updated properly during a commit, or was corrupted so that some rows were lost.
2. Something goes wrong during pack gc (either in the pack logic or on the database).
3. Database corruption.

Laurence
Re: [ZODB-Dev] RelStorage recommended maintenance
On 21 January 2011 20:57, Shane Hathaway sh...@hathawaymix.org wrote: On 01/21/2011 10:46 AM, Chris Withers wrote: I'm wondering what the recommended maintenance for these two types of storage are that I use: - keep-history=true, never want to lose any revisions My guess is zodbpack with pack-gc as true, but what do I specify for the number of days in order to keep all history? Is 100 years enough? 365.24 * 100 = 36524 ;-)

Why would you pack a database from which you don't want to lose any revisions?

Laurence
Re: [ZODB-Dev] Plone in P2P using Zope over DHT
I'm not very optimistic about this I'm afraid. First the problems with using Plone:

* Plone relies heavily on its in-ZODB indexes of all content (portal_catalog). This means that every edit will change lots of objects (without versioning ~15-20, most of which are in the catalogue).
* At least with archetypes a content object's data is spread over multiple objects. (This should be better with Dexterity, though you will still have multiple objects for locking and workflow.)
* If you use versioning you'll see ~100 objects changed in an edit.
* Even loading the front-page will take a long time - in my experiments writing an amazon s3 backend for ZODB the extra latency of fetching each object was really noticeable.

But I'm not sure even a simpler ZODB CMS would be a good fit for a p2p DHT:

* ZODB is transactional using two phase commit. With p2p latencies, these commits will be horribly slow - all clients storing changed objects would need to participate in the transaction.
* Each client's object cache will need to know about invalidations; I don't see any way of supplying these from a DHT.

I expect you'd have more success storing content items as single content objects / pages in the DHT and then generating indexes based on that. You'll need some way of storing parent-child relationships between the content objects too, as updating a single list of child objects will be incredibly difficult to get right in a distributed system.

Laurence

On 4 January 2011 11:40, Aran Dunkley a...@organicdesign.co.nz wrote: Thanks for the feedback Vincent :-) it sounds like NEO is pretty close to being SQL-free. As one of the NEO team, what are your thoughts on the practicality of running Plone in a P2P environment with the latencies experienced in standard DHT (such as for example those based on Kademlia) implementations?

On 04/01/11 22:27, Vincent Pelletier wrote: Hi. On Tuesday 4 January 2011 07:18:34, Aran Dunkley wrote: The problem is that it uses SQL for its indexing queries (they quote NoSQL as meaning Not only SQL). SQL cannot work in P2P space, but can be made to work on server-clusters.

Yes, we use MySQL, and it bites us on both worlds actually:
- in relational world, we irritate developers as we ask questions like why does InnoDB load a whole row when we just select primary key columns, which ends up with don't store blobs in mysql
- in key-value world, because NoSQL using MySQL doesn't look consistent

So, why do we use MySQL in NEO? We use InnoDB as an efficient BTree implementation, which handles persistence. We use MySQL as a handy data definition language (NEO is still evolving, we need an easy way to tweak table structure when a new feature requires it), but we don't need any transactional isolation (each MySQL process used for NEO is accessed by only one process through one connection). We want to stop using MySQL InnoDB in favour of leaner-and-meaner back-ends. I would especially like to try kyoto cabinet[1] in on-disk BTree mode, but it requires more work than the existing MySQL adaptor and there are more urgent tasks in NEO.

Just as a proof-of-concept, NEO can use a Python BTree implementation as an alternative (RAM-only) storage back-end. We use ZODB's BTree implementation, which might look surprising as it's designed to be stored in a ZODB... But they work just as well in-RAM, and that's all I needed for such a proof-of-concept.
[1] http://fallabs.com/kyotocabinet/

Regards,
Re: [ZODB-Dev] RelStorage support to Microsoft SQLServer
On 17 November 2010 16:34, Alan Runyan runy...@gmail.com wrote: I have read that there is a problem to implement MS-SQL adapter for Relstorage because the “Two phase commit” feature is not exposed by MS-SQL server . unsure about that. probably depends on the client access library.

At least when I looked at pyodbc/FreeTDS in 2008 FreeTDS did not have support for the tds packets necessary for joining an XA transaction. (FreeTDS is the odbc - SQL Server driver used on unix.) See: http://article.gmane.org/gmane.comp.db.tds.freetds/9598. I did have some more information on a SQLAlchemy wiki page but that seems to have gone now. However, two phase commit may not be necessary with the current version of RelStorage - it's not used with PostgreSQL anymore.

Is there solution to overcome this problem, Without introducing too many layers? Can we use PyMSSQL and ADODB Python extension to implement the relstorage Adapter for MS-SQL. i recently had a discussion with some guys about this. i am unsure what their analysis was. but my opinion: - adodbapi is not good. - pymssql i've not used - pyodbc we used but it doesnt support storedprocs. works ok. - mxodbc we use and highly recommend. yes mxodbc costs money but you have support. i spoke with shane about this in the past about which library would he probably use if he were to support mssqlserver and his unresearched/not definitive answer was mxodbc. mainly because its supported and has been in production usage for almost a decade.

I've used stored procedures with pyodbc: http://code.google.com/p/pyodbc/wiki/StoredProcedures

Laurence
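On the stored-procedure point: pyodbc can call procedures through the standard ODBC call escape (output parameters are the historically awkward part); a sketch with a made-up DSN and procedure name:

    import pyodbc

    conn = pyodbc.connect('DSN=mssql;UID=user;PWD=secret')  # hypothetical DSN
    cursor = conn.cursor()
    # ODBC call escape syntax; input parameters bind as usual.
    cursor.execute("{CALL get_current_tid (?)}", 561701)
    row = cursor.fetchone()
    conn.commit()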
Re: [ZODB-Dev] RelStorage support to Microsoft SQLServer
On 17 November 2010 17:05, Laurence Rowe l...@lrowe.co.uk wrote: ... I did have some more information on a SQLAlchemy wiki page but that seems to have gone now.

Found that here: http://www.sqlalchemy.org/trac/wiki/MSSQLTwoPhaseCommit

Laurence
Re: [ZODB-Dev] zodb monitor port / tailing a .fs
On 14 October 2010 01:28, Darryl Dixon - Winterhouse Consulting darryl.di...@winterhouseconsulting.com wrote: On 13/10/2010 15:23, Jim Fulton wrote: You can connect to the monitor port in 3.9 and earlier, if the monitor port is configured. In 3.10, the monitor server is replaced by a ZEO client method, server_status. This tells you much the same information that's in the log messages.

Okay, monitor port up and running now. I see commits listed when I'm not expecting any. Do we have any kind of tail -f /some/filestorage.fs yet? (or have we always had such a tool) to see what the last few transactions in the underlying file storage look like in a human-readable form?

fsdump.py gets you pretty close (ZODB/scripts/fsdump.py). Between that and the Undo log for the DB inside Zope, you might be able to figure it out...

There's also fstail:

    $ bin/zopepy -m ZODB.scripts.fstail var/filestorage/Data.fs
    2010-09-03 22:15:17.658204: hash=2e11770947c4c9af50cfec0183c38b460507cad6 user=' admin' description='/Plone/login_failed' length=1126 offset=8229031
    2010-08-21 23:28:12.580279: hash=c1e7af2df41b6506db65681bc7f2f58587cb8b8b user=' admin' description='/Plone/front-page/plone_lock_operations/safe_unlock' length=279 offset=8228776
    2010-08-21 23:28:03.903884: hash=9ca763b978c804c87d920945d4c0b5470bb3aad4 user=' admin' description='/Plone/atct_edit' length=919 offset=8227814
    2010-08-21 23:28:01.835501: hash=dec67acaa2685822d68d586ef83eff13e12d3e78 user=' admin' description='/Plone/front-page/plone_lock_operations/safe_unlock' length=279 offset=8227562
    ...

Laurence
Re: [ZODB-Dev] read-only database
On 27 September 2010 18:26, Nathan Van Gheem vangh...@gmail.com wrote: BTW, I thought I could just use the ZPublisherEventsBackup to abort every transaction when zope is in read-only... Kind of hacky, but not too bad :)

That sounds really evil, but I guess it should work... plone.app.imaging / plone.scale create scales on demand, caching as an annotation. You could define a different storage method by overriding the plone.app.imaging.scaling.ImageScaling view.

Laurence
Re: [ZODB-Dev] db.undoLog() does not work as documented ..
On 23 August 2010 17:51, Jim Fulton j...@zope.com wrote: It's worth noting that these are not the docs. I didn't write or review them. I don't have any control over zodb.org. I have no idea how to comment on the docs. (I could possibly find out, but I don't have time to work that hard.) ... This is problematic. I didn't write the docs and the docs are not part of the software. I can't do anything about this I don't know if anyone who can deal with bugs has any control over zodb.org.

That website is created from svn+ssh://svn.zope.org/repos/main/zodbdocs/trunk

Those docs used to live in ZODB. I converted them to rst from latex to make them easier to edit. They've mostly not been updated since 2002 though.

Laurence
Re: [ZODB-Dev] db.undoLog() does not work as documented ..
On 23 August 2010 19:08, Jim Fulton j...@zope.com wrote: On Mon, Aug 23, 2010 at 1:08 PM, Laurence Rowe l...@lrowe.co.uk wrote: On 23 August 2010 17:51, Jim Fulton j...@zope.com wrote: It's worth noting that these are not the docs. I didn't write or review them. I don't have any control over zodb.org. I have no idea how to comment on the docs. (I could possibly find out, but I don't have time to work that hard.) ... This is problematic. I didn't write the docs and the docs are not part of the software. I can't do anything about this I don't know if anyone who can deal with bugs has any control over zodb.org. That website is created from svn+ssh://svn.zope.org/repos/main/zodbdocs/trunk

Cool.

Those docs used to live in ZODB. I converted them to rst from latex to make them easier to edit.

I appreciate the good intention. :) Honestly.

They've mostly not been updated since 2002 though.

:/ Sigh. I'll have to think about what the next step is then. This will probably involve deleting lots of wrong content. Do you maintain zodb.org then? What's the process for updating it?

I created the zodbdocs following the examples of zope2docs and zope3docs for docs.zope.org. Jens Vagelpohl added them to the cron job that updates docs.zope.org and is the person responsible for that site as far as I know. At some point they were moved to zodb.org, which runs on the same box. As for updating it, just check in changes to svn. The cron job picks up changes and rebuilds the sphinx docs every hour (or perhaps every day).

Laurence
Re: [ZODB-Dev] Weird KeyError with OOBTree
On 16 August 2010 13:13, Tres Seaver tsea...@palladion.com wrote: Hanno Schlichting wrote: On Mon, Aug 16, 2010 at 12:14 PM, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote: Could this be some problem with using persistent objects as keys in a BTree? Some comparison problem? I'm not entirely sure about this, but I think using persistent objects as keys isn't supported. Looking at the code, I doubt using anything except simple types like unicode strings or tuples of simple types will work without further work. From what I can see in the code, BTrees use functions like PyObject_Compare to compare different keys. Persistent doesn't implement any special compare function and falls back to the standard hash algorithm for an object. This happens to be its memory address. The memory address obviously changes over time and the same address gets reused for different objects. I think implementing a stable hash function for your type could make this work though. The ZODB gods correct me please :) BTrees require comparability, rather than hashability: your persistent type needs to define a total ordering[1], which typically means defining '__cmp__' for your class. You could also define just '__eq__' and '__lt__', but '__cmp__' is slightly more efficient. [1]http://www.zodb.org/documentation/guide/modules.html#total-ordering-and-persistence While ZODB 3.8 makes it possible to use Persistent objects as keys in a BTree, it's almost certainly a bad idea: a lookup will incur many more object loads while traversing the BTree, as the Persistent keys have to be loaded before they can be compared. Consider using one of these alternatives instead:

* Set the IOTreeSet as an attribute directly on the persistent object.
* Use http://pypi.python.org/pypi/zope.intid and use the intid for the key. (This uses http://pypi.python.org/pypi/zope.keyreference which uses the referenced object's oid and database name to perform the comparison, avoiding the need to load the persistent object.)

Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
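By way of illustration, a minimal sketch of such a total ordering (the Ticket class is invented; this is Python 2, matching the '__cmp__' advice above):

    from persistent import Persistent

    class Ticket(Persistent):
        # Order by a stable attribute so comparisons never depend on
        # memory addresses or on whether the object is loaded.
        def __init__(self, number):
            self.number = number

        def __cmp__(self, other):
            return cmp(self.number, other.number)

Even with a correct ordering, each key comparison during a BTree lookup may load a Ticket from the database, which is why the alternatives listed above are usually preferable.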
Re: [ZODB-Dev] Weird KeyError with OOBTree
On 16 August 2010 17:29, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote: Consider using one of these alternatives instead: * Set the IOTreeSet as an attribute directly on the persistent object. You mean on the persistent object I am using as a key? Yes. * Use http://pypi.python.org/pypi/zope.intid and use the intid for the key. (This uses http://pypi.python.org/pypi/zope.keyreference which uses the referenced object's oid and database name to perform the comparison, avoiding the need to load the persistent object.) This looks really nice. However it seems to depend on a lot of zope libraries that I'm not currently including: location, component, security... well, I guess they're not that large. I will give it a look, maybe I'll use it. I guess you could avoid the dependencies by using (obj._p_jar.db().database_name, obj._p_oid) as the key. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
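For illustration, a hedged sketch of that scheme (stable_key is a hypothetical helper; the object must already be stored, so that _p_jar and _p_oid are set):

    def stable_key(obj):
        # Compares by (database name, oid): cheap, stable, and never
        # loads the referenced object's state.
        return (obj._p_jar.db().database_name, obj._p_oid)

    index[stable_key(author)] = treeset  # index/author/treeset are placeholders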
Re: [ZODB-Dev] SpatialIndex
On 28 June 2010 15:23, Nitro ni...@dr-code.org wrote: Am 28.06.2010, 14:10 Uhr, schrieb Dylan Jay d...@pretaweb.com: I don't use a lot of indexes other than what comes with plone, but I can see the value of what you're suggesting in having an installable, tested collection of indexes. I can also see that this is a really big itch for you and you've already identified a bunch of candidates to include. So why not go the next step and create a package that's nothing more than requirement specifications and release it with versions corresponding to zodb releases. Then others may find it useful and help you support it. And if it's really popular it may even get taken into consideration with zodb packaging. Who knows? Thanks for your feedback, Dylan. My main problem is not the lack of an index collection. It's one of the problems I faced, but not the main one. Indexing is just a small part of working with a database. The main problem (imo) is that there are already 50 zodb related packages on pypi and none of them has gathered a lot of people working on them. I don't see why this should be any different if I publish yet another package. Especially if most people use plone and the built-in indices. Just look at what happened to ZCatalog Standalone. Here's a little metaphor for what I'm trying to say: Once upon a time there was a man who wanted to go to the bakery to buy a bread. He thought "I'll be done with this quickly; after all, many people want to buy a bread." So he went off to visit the ZBakery. When asking for a bread, the people in the ZBakery told him there's no need to sell whole breads. They said: "See, we have all the ingredients here so you can make a bread suiting your own taste. Look, there's ZFlour, ZMilk and ZSalt. And if you rummage through the corners of this bakery, you might also find ZFlour2, CustomFlour and MyOwnCoolFlour. We don't know if they are any good, because each flour is used by just one or two people." The man thought about it for a while and went off to try the different flours. When he wanted to try the CustomFlour it did not work. It turned out this was because CustomFlour relied on 3rdPartyMill and 3rdPartyMill had a problem, so CustomFlour was broken. The man shook his head after he realized a few dozen people had already tried to get CustomFlour and nobody had pointed out the problem to its producer. Finding out about all of this took the whole morning, so he finally took his lunch break. After his lunch break was over and he had finally found a flour suiting his bread, he went off to look at the different milks and salts. He experienced similar problems there. One of the milks had just a label "milk" on it; the other areas of the packaging were blank. The man had no idea if the milk in question would work for his bread or not. So he had to analyze the contents of the milk to see if it might be useful. It turned out the milk was mislabeled and not a milk at all. As the sun was already touching the horizon and the air was getting cold, the man ignored the milk for the time being and went looking for salt. He did not have to search for long and was delighted to find a single salt which would just work. When the man looked out of a window of the ZBakery he saw it was already dark and went home. When lying in bed he thought to himself: All I wanted this morning was a bread. Now I'm about to fall asleep and still don't have one. The bakery even had all the ingredients! But why did they make me try and analyze each ingredient?
I even would've taken a bread which tasted a bit worse than the bread which I now have to bake on my own. The other customers of ZBakery surely also want breads, rolls and cake. Aren't they interested in creating a standard package of breads, rolls and cake? If they'd work together on a single bread, they'd all benefit from the improved recipes. New customers would immediately notice that there's a good default bread which many people like. These customers might point their colleagues at ZBakery, because it sells tasty, ready-to-use breads. If there was a special customer he could still bake his own bread using the individual ingredients. Pondering all these things he slid into a deep sleep. When he woke up the next morning he found a handful of committed people who had gathered in the ZBakery to bake and sell their first bread together... There are some valid criticisms in here. One problem with PyPI is that there is no way to clearly mark a package as having been superseded, as zc.relationship was by zc.relation. So why don't we all work on the same packages? The main reason is one of legacy. Plone is built on Zope2 and ZCatalog. It works, but it is not without its issues - we can't have queries that join from that catalog to a zc.relation catalog. Standalone ZCatalog failed because it came too early - Zope2 was only recently eggified, so to be successful the standalone ZCatalog would have needed to be usable in Zope2.
Re: [ZODB-Dev] SpatialIndex
On 28 June 2010 19:31, Nitro ni...@dr-code.org wrote: Am 28.06.2010, 16:52 Uhr, schrieb Laurence Rowe l...@lrowe.co.uk: So why don't we all work on the same packages? The main reason is one of legacy. Plone is built on Zope2 and ZCatalog. It works, but it is not without its issues - we can't have queries that join from that catalog to a zc.relation catalog. Standalone ZCatalog failed because it came too early - Zope2 was only recently eggified, so to be successful the standalone ZCatalog would have needed to be usable in Zope2. Nobody has bothered with this because non-legacy code shouldn't be using ZCatalog anyway - there are newer and better ways of doing it. Oh, nice to know. I was already writing test cases for standalone ZCatalog integration in my project as all other indices seemed tied to plone :) In general, if it's not on PyPI it doesn't exist as far as the Zope world is concerned. (I can't find any references to standalone ZCatalog after 2005.) Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] SpatialIndex
On 28 June 2010 21:27, Nitro ni...@dr-code.org wrote: ZODB is a general python object database with a much wider audience than just plone. It suits desktop applications just as well as applications you'd normally use twisted and pickle for. Forcing all those zope dependencies like buildout on people does not add to the attractiveness of ZODB for users outside zope. Having indices only in plone also does not make sense. Many applications would benefit from keyword, field, full-text, spatial, you-name-it indices. Yet extracting individual packages from zope/plone is impossible due to the slew of dependencies. While I can accept a dependency like zope.interface I don't accept a lot of the others. It really prevents ZODB from living up to its full potential in non-plone applications. Remember that Plone is an eight year old application that is built on top of a 12 year old application server. There has been much progress since then (and plenty of people who build non-Plone ZODB based applications), but the size of the codebase means it is not possible to always be using the current best practice. http://zope2.zope.org/about-zope-2/the-history-of-zope Nobody would recommend that you try to extract stuff from Plone or Zope2. In my opinion there are two main sources of packages for non-Zope2 dependent applications; a sketch of the second follows this list.

* The ZTK extracted the core of Zope 3 and is used in application servers such as Grok and BlueBream. It contains zope.catalog and its related packages. There are several extensions on top of this such as zc.catalog and hurry.query. The 1.0 release has not been made yet, but the underlying packages are stable. Installing zope.catalog requires a total of 34 packages (ZODB3 requires 10). http://docs.zope.org/zopetoolkit/releases/packages-trunk.html
* The Repoze project has focussed on making zope technologies more easily accessible to applications outside of Zope. Whilst the ZTK project has improved things a lot, it is still a relatively large chunk to swallow whole. repoze.catalog is extracted from zope.catalog and requires only zope.index in addition to ZODB3. http://docs.repoze.org/catalog/

At the very lowest level are the indexes themselves, such as zope.index and zc.relation; a spatial index would fit in here too. (Health warning: I'm mostly a Plone developer, so do not yet have experience using these packages) Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
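To give a flavour of the repoze.catalog option, a hedged sketch based on its documented API of the time (the Doc class and the attribute name are invented for illustration):

    from repoze.catalog.catalog import Catalog
    from repoze.catalog.indexes.field import CatalogFieldIndex

    class Doc(object):
        def __init__(self, modified):
            self.modified = modified

    catalog = Catalog()
    catalog['modified'] = CatalogFieldIndex('modified')  # attribute discriminator

    catalog.index_doc(1, Doc(modified=20100628))
    numdocs, results = catalog.search(modified=(20100601, 20100630))  # range query

The catalog stores only docids; mapping docids back to objects (e.g. via an IOBTree) is left to the application, which is part of what keeps its dependency footprint so small.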
Re: [ZODB-Dev] Spatial indices
It really depends on what you are trying to achieve. The simplest solution would probably be to use a geohash string within an OOBTree. If you need a full geospatial solution, PostGIS is featureful, easy to use, and simple to integrate transactionally with ZODB. Reinventing the wheel is rarely the right option, though it might be more fun ;) Laurence On 16 June 2010 16:45, Nitro ni...@dr-code.org wrote: Hello, I tried to find a spatial index which integrates seamlessly into the ZODB. Unfortunately I did not find a satisfying solution anywhere. So I came up with three solutions for how this could be implemented:

1) Write a native r-tree package, just like the current BTrees. Would likely have to be written in C for performance.

2) Make use of the existing B+ trees by using a space filling curve such as the Z-curve or Hilbert curve to transform higher-dimensional data into 1D data which can then be stored in a BTree. Since B+ trees also provide range querying capabilities this should give good query performance. Unsure how much speed-up a C implementation of the insert/query functions would give. More info: http://www.scholarpedia.org/article/B-tree_and_UB-tree and http://www.drdobbs.com/184410998 .

3) Use the already existing Rtree package from http://pypi.python.org/pypi/Rtree . It's a thin wrapper around a C library, so it should be very fast. I can see two methods to make this work:

3a) - Create an rtree.RTree (stored in a separate file) and an OOTreeSet.
- inserting: insert item into BTree. Then insert item's oid into Rtree.
- querying: user supplies bounding box, rtree is queried, oids are returned. Look up objects by oid in BTree.
- zeo: does not work out-of-the-box with zeo since the Rtrees on different machines are not synchronized.

3b) - Create an rtree.RTree, an OOTreeSet and an IOTree. Difference to 3a): create the RTree with a custom storage manager (example: http://svn.gispython.org/svn/spatialindex/spatialindex/trunk/src/storagemanager/MemoryStorageManager.h and http://trac.gispython.org/spatialindex/browser/spatialindex/trunk/README#storage-manager ). This storage manager stores each page in the IOTree (key: pageId, value: pageData).
- inserting: insert item into BTree. Then insert item's oid into Rtree. Causes storage manager to write out changed rtree pages to the IOTree.
- querying: user supplies bounding box, rtree is queried, pages for the rtree are returned from the IOTree, oids finally returned from the query. Look up objects by oid in BTree.
- zeo: works out-of-the-box with zeo since the rtree pulls its data from a btree (which is hooked up with zeo).

Conclusion:
1) Native r-tree package: It is a lot of work which has already been done before. Bug-prone. Ruled out.
2) Spatial index on top of current BTrees: Looks interesting, could be done in python. Disadvantages: unclear UB-tree patent situation, unclear how much work this really is.
3a) Does not work with zeo out-of-the-box. Ruled out.
3b) Requires writing a custom storage manager for the rtree package (likely in C). Provides different trees. Basic technology (rtrees + btrees) is tested.

Would it make sense to add a default spatial index to ZODB? Does anybody of you have any experience with one of the mentioned solutions? Is anybody else interested in having a zodb spatial index?
-Matthias ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - zodb-...@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
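Returning to the geohash-in-an-OOBTree suggestion above, a hedged sketch of the idea (it assumes the third-party python-geohash package; the key layout and precision are arbitrary choices):

    import geohash  # third-party python-geohash package (an assumption)
    from BTrees.OOBTree import OOBTree

    index = OOBTree()

    def insert(docid, lat, lon):
        # Nearby points share geohash prefixes, so they cluster in key order
        key = '%s-%s' % (geohash.encode(lat, lon), docid)
        index[key] = docid

    def query(prefix):
        # Everything whose geohash starts with `prefix`: a BTree range scan
        return list(index.values(prefix, prefix + '\xff'))

Because geohash strings of nearby points share prefixes, a bounding-box style lookup reduces to a handful of BTree range scans; the trade-off is that a box may straddle geohash cells, so real queries need to merge several prefix scans and filter the results.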
[ZODB-Dev] Merge request - transaction savepoint release support
Hi Jim, I've created a new branch for my savepoint release changes following the 1.1 release here: svn+ssh://svn.zope.org/repos/main/transaction/branches/elro-savepoint-release-1.1 This does seem to be a real requirement, as I've had another request to provide this functionality for zope.sqlalchemy - when a large number of savepoints are used, the eventual commit can lead to a `RuntimeError: maximum recursion depth exceeded` in SQLAlchemy as it attempts to unroll its nested subtransactions. Laurence On 17 January 2010 15:45, Laurence Rowe l...@lrowe.co.uk wrote: 2010/1/17 Jim Fulton j...@zope.com: On Sat, Jan 16, 2010 at 1:03 PM, Laurence Rowe l...@lrowe.co.uk wrote: I've had a request to add savepoint release support to zope.sqlalchemy as some databases seem to limit the number of savepoints in a transaction. I've added this in a branch of transaction here: svn+ssh://svn.zope.org/repos/main/transaction/branches/elro-savepoint-release From the changelog:

* Add support for savepoint.release(). Some databases only support a limited number of savepoints or subtransactions; this provides an opportunity for a data manager to free those resources.
* Rename InvalidSavepointRollbackError to InvalidSavepointError (BBB provided.)

If there are no objections, I shall merge this to trunk. I'll review and merge. Great, thanks! What does it mean to release a savepoint? How is this different from aborting a savepoint? I ask particularly in light of: On Sat, Jan 16, 2010 at 2:26 PM, Laurence Rowe l...@lrowe.co.uk wrote: 2010/1/16 Laurence Rowe l...@lrowe.co.uk: I'm still not sure this will allow me to add savepoint release support to zope.sqlalchemy, as SQLAlchemy has a concept of nested transactions rather than savepoints. http://groups.google.com/group/sqlalchemy/browse_thread/thread/7a4632587fd97724 Michael Bayer noted on the sqlalchemy group that on RELEASE SAVEPOINT Postgresql destroys all subsequent savepoints. My branch now implements this behaviour. For zope.sqlalchemy I commit the sqlalchemy subtransaction on savepoint.release(). This translates to a RELEASE SAVEPOINT on postgresql, best described by their docs here: RELEASE SAVEPOINT destroys a savepoint previously defined in the current transaction. Destroying a savepoint makes it unavailable as a rollback point, but it has no other user visible behavior. It does not undo the effects of commands executed after the savepoint was established. (To do that, see ROLLBACK TO SAVEPOINT.) Destroying a savepoint when it is no longer needed allows the system to reclaim some resources earlier than transaction end. RELEASE SAVEPOINT also destroys all savepoints that were established after the named savepoint was established. http://developer.postgresql.org/pgdocs/postgres/sql-release-savepoint.html Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
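For orientation, a minimal sketch of the API the branch adds (savepoint() is the existing transaction API; release() is the new call):

    import transaction

    txn = transaction.begin()
    sp = txn.savepoint()
    # ... more work, possibly creating many further savepoints ...
    sp.release()  # lets data managers free savepoint resources early
    transaction.commit()

Per the PostgreSQL semantics quoted above, releasing a savepoint also destroys any savepoints established after it.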
Re: [ZODB-Dev] Automating retry management
On 11 May 2010 15:08, Jim Fulton j...@zope.com wrote: On Tue, May 11, 2010 at 8:38 AM, Benji York be...@zope.com wrote: On Tue, May 11, 2010 at 7:34 AM, Jim Fulton j...@zope.com wrote: [...] The best I've been able to come up with is something like:

    t = ZODB.transaction(3)
    while t.trying:
        with t:
            ... transaction body ...

I think you could get this to work:

    for transaction in ZODB.retries(3):
        with transaction:
            ... transaction body ...

ZODB.retries would return an iterator that would raise StopIteration on the next go-round if the previously yielded context manager exited without a ConflictError. This is an improvement. It's still unsatisfying, but I don't think I'm going to get satisfaction. :) BTW, if I do something like this, I think I'll add a retry exception to the transaction package and have ZODB.POSException.ConflictError extend it so I can add the retry automation to the transaction package. The repoze.retry package lets you configure a list of exceptions. http://pypi.python.org/pypi/repoze.retry http://svn.repoze.org/repoze.retry/trunk/repoze/retry/__init__.py Though it seems inspecting the error text is required for most SQL database errors to know if they are retryable, as ZPsycopgDA does:

    except (psycopg2.ProgrammingError, psycopg2.IntegrityError), e:
        if e.args[0].find('concurrent update') > -1:
            raise ConflictError

(https://dndg.it/cgi-bin/gitweb.cgi?p=public/psycopg2.git;a=blob;f=ZPsycopgDA/db.py) For PostgreSQL it should be sufficient to catch these errors and raise Retry during tpc_vote. For databases which do not provide MVCC in the same way as PostgreSQL, concurrency errors could be manifested at any point in the transaction. Even Oracle can raise an error during a long running transaction when insufficient rollback space is available, resulting in what is essentially a read conflict error. Such errors could not be caught by a data manager and reraised as a Retry exception. I think it might be useful to add an optional method to data managers that is queried by the retry automation machinery to see if an exception should potentially be retried. Perhaps this would best be accomplished in two steps: 1. Add an optional property to data managers called ``retryable``. This is a list of potentially retryable exceptions. When a data manager is added to the transaction, the transaction's list of retryable exceptions is extended by the joining data manager's list of retryable exceptions.

    t = transaction.begin()
    try:
        application()
    except tuple(t.retryable), e:
        t.retry(e)

2. t.retry(e) then checks with each registered data manager whether that particular exception is retryable, and if so raises Retry.

    def retry(self, e):
        for datamanager in self._resources:
            try:
                retryable = datamanager.retryable
            except AttributeError:
                continue
            if isinstance(e, tuple(retryable)):
                datamanager.retry(e)  # dm may raise Retry here

Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
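For comparison, a minimal self-contained sketch of the retry loop under discussion (run_with_retries is a hypothetical helper, not an existing ZODB or transaction API):

    import transaction
    from ZODB.POSException import ConflictError

    def run_with_retries(func, attempts=3):
        # Run func in its own transaction, retrying on ConflictError
        # up to `attempts` times before giving up.
        for attempt in range(attempts):
            try:
                transaction.begin()
                func()
                transaction.commit()
                return
            except ConflictError:
                transaction.abort()
                if attempt == attempts - 1:
                    raise

This is essentially what repoze.retry does at the WSGI layer, minus the request replay.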
Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage (even with cache-size-bytes)
I think this means that you are storing all of your data in a single persistent object, the database root PersistentMapping. You need to break up your data into persistent objects (instances of classes that inherit from persistent.Persistent) for ZODB to have a chance of managing memory; only whole persistent objects can be ghosted out of the cache. You want to do something like:

    import transaction
    from ZODB import FileStorage, DB
    from BTrees.LOBTree import BTree, TreeSet

    storage = FileStorage.FileStorage('/tmp/test-filestorage.fs')
    db = DB(storage)
    conn = db.open()
    root = conn.root()

    transaction.begin()
    index = root['index'] = BTree()
    values = index[1] = TreeSet()
    values.add(42)
    transaction.commit()

You should probably read: http://www.zodb.org/documentation/guide/modules.html#btrees-package. Since that was written, 'L' variants of the BTree types have been introduced for storing 64-bit integers. I'm using an LOBTree because that maps 64-bit integers to python objects. For values I'm using an LOTreeSet, though you could also use an LLTreeSet (which has larger buckets). Laurence On 12 May 2010 00:37, Ryan Noon rmn...@gmail.com wrote: Hi Jim, I'm really sorry for the miscommunication, I thought I made that clear in my last email: I'm wrapping ZODB in a 'ZMap' class that just forwards all the dictionary methods to the ZODB root and allows easy interchangeability with my old sqlite OODB abstraction. wordid_to_docset is a ZMap, which just wraps the ZODB boilerplate/connection and forwards dictionary methods to the root. If this seems superfluous, it was just to maintain backwards compatibility with all of the code I'd already written for the sqlite OODB I was using before I switched to ZODB. Whenever you see something like wordid_to_docset[id] it's just doing self.root[id] behind the scenes in a __setitem__ call inside the ZMap class, which I've pasted below. The db is just storing longs mapped to array('L')'s with a few thousand longs in them. I'm going to try switching to the persistent data structure that Laurence suggested (a pointer to relevant documentation would be really useful), but I'm still sorta worried because in my experimentation with ZODB so far I've never been able to observe it sticking to any cache limits, no matter how often I tell it to garbage collect (even when storing very small values that should give it adequate granularity...see my experiment at the end of my last email). If the memory reported to the OS by Python 2.6 is the problem I'd understand, but memory usage goes up the second I start adding new things (which indicates that Python is asking for more and not actually freeing internally, no?). If you feel there's something pathological about my memory access patterns in this operation I can just do the actual inversion step in Hadoop and load the output into ZODB for my application later; I was just hoping to keep all of my data in OODBs the entire time. Thanks again all of you for your collective time. I really like ZODB so far, and it bugs me that I'm likely screwing it up somewhere.
Cheers, Ryan

    class ZMap(object):

        def __init__(self, name=None, dbfile=None, cache_size_mb=512, autocommit=True):
            self.name = name
            self.dbfile = dbfile
            self.autocommit = autocommit
            self.__hash__ = None  # can't hash this
            # first things first, figure out if we need to make up a name
            if self.name == None:
                self.name = make_up_name()
            if sep in self.name:
                if self.name[-1] == sep:
                    self.name = self.name[:-1]
                self.name = self.name.split(sep)[-1]
            if self.dbfile == None:
                self.dbfile = self.name + '.zdb'
            self.storage = FileStorage(self.dbfile, pack_keep_old=False)
            self.cache_size = cache_size_mb * 1024 * 1024
            self.db = DB(self.storage, pool_size=1,
                         cache_size_bytes=self.cache_size,
                         historical_cache_size_bytes=self.cache_size,
                         database_name=self.name)
            self.connection = self.db.open()
            self.root = self.connection.root()
            print 'Initializing ZMap %s in file %s with %dmb cache. Current %d items' % (self.name, self.dbfile, cache_size_mb, len(self.root))

        # basic operators
        def __eq__(self, y):  # x == y
            return self.root.__eq__(y)
        def __ge__(self, y):  # x >= y
            return len(self) >= len(y)
        def __gt__(self, y):  # x > y
            return len(self) > len(y)
        def __le__(self, y):  # x <= y
            return not self.__gt__(y)
        def __lt__(self, y):  # x < y
            return not self.__ge__(y)
        def __len__(self):  # len(x)
            return len(self.root)

        # dictionary stuff
        def __getitem__(self, key):  # x[key]
            return self.root[key]
        def __setitem__(self, key, value):  # x[key] = value
            self.root[key] = value
            self.__commit_check()  # write back if necessary
        def __delitem__(self, key):  # del x[key]
            del self.root[key]
        def get(self, key,
Re: [ZODB-Dev] Problem with handling of data managers that join transactions after savepoints
On 10 May 2010 21:41, Jim Fulton j...@zope.com wrote: A. Change transaction._transaction.AbortSavepoint to remove the datamanager from the transaction's resources (joined data managers) when the savepoint is rolled back and abort is called on the data manager. Then, if the data manager rejoins, it will have joined only once. Update the documentation of the data manager abort method (in IDataManager) to say that abort is called either when a transaction is aborted or when rolling back to a savepoint created before the data manager joined, and that the data manager is no longer joined to the transaction after abort is called. This is a backward incompatible change to the interface (because it weakens a precondition) that is unlikely to cause harm. I plan to implement A soon if there are no objections. Unless someone somehow convinces me to do D, I'll also add an assertion in the Transaction.join method to raise an error if a data manager joins more than once. Option A sounds sensible. It also means I won't have to change anything in the zope.sqlalchemy data manager. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage (even with cache-size-bytes)
I think that moving to an LLTreeSet for the docset will significantly reduce your memory usage. Non-persistent objects are stored as part of their parent persistent object's record. Each LOBTree bucket contains up to 60 (key, value) pairs. When the values are non-persistent objects they are stored as part of the bucket object's record, and so accessing any key of a bucket in a transaction brings up to 60 docsets into memory. I would not be surprised if your program forces most of your data into memory each batch - as most words are in most documents. At the very least you should move to an LLSet (essentially a single BTree bucket). An LLTreeSet has the additional advantage of being scalable to many values, and if under load from multiple clients you are far less likely to see conflicts. Laurence On 11 May 2010 01:20, Ryan Noon rmn...@gmail.com wrote: P.S. About the data structures: wordset is a freshly unpickled python set from my old sqlite oodb thingy. The new docsets I'm keeping are 'L' arrays from the stdlib array module. I'm up for using ZODB's builtin persistent data structures if it makes a lot of sense to do so, but it sorta breaks my abstraction a bit and I feel like the memory issues I'm having are somewhat independent of the container data structures (as I'm having the same issue just with fixed size strings). Thanks! -Ryan On Mon, May 10, 2010 at 5:16 PM, Ryan Noon rmn...@gmail.com wrote: Hi all, I've incorporated everybody's advice, but I still can't get memory to obey cache-size-bytes. I'm using the new 3.10 from pypi (but the same behavior happens on the server where I was using 3.10 from the new lucid apt repos). I'm going through a mapping where we take one long integer docid and map it to a collection of long integers (wordset) and trying to invert it into a mapping for each wordid in those wordsets to a set of the original docids (docset). I've even tried calling cacheMinimize after every single docset append, but reported memory to the OS never goes down and the process continues to allocate like crazy. I'm wrapping ZODB in a ZMap class that just forwards all the dictionary methods to the ZODB root and allows easy interchangeability with my old sqlite OODB abstraction. Here's the latest version of my code (minorly instrumented... see below):

    try:
        max_docset_size = 0
        for docid, wordset in docid_to_wordset.iteritems():
            for wordid in wordset:
                if wordid_to_docset.has_key(wordid):
                    docset = wordid_to_docset[wordid]
                else:
                    docset = array('L')
                docset.append(docid)
                if len(docset) > max_docset_size:
                    max_docset_size = len(docset)
                    print 'Max docset is now %d (owned by wordid %d)' % (max_docset_size, wordid)
                wordid_to_docset[wordid] = docset
                wordid_to_docset.garbage_collect()
                wordid_to_docset.connection.cacheMinimize()
            n_docs_traversed += 1
            if n_docs_traversed % 100 == 1:
                status_tick()
            if n_docs_traversed % 5 == 1:
                self.do_commit()
        self.do_commit()
    except KeyboardInterrupt, ex:
        self.log_write('Caught keyboard interrupt, committing...')
        self.do_commit()

I'm keeping track of the greatest docset (which would be the largest possible thing not able to be paged out) and it's only 10,152 longs (at 8 bytes each according to the array module's documentation) at the point 75 seconds into the operation when the process has allocated 224 MB (on a cache_size_bytes of 64*1024*1024). On a lark I just made an empty ZMap in the interpreter and filled it with 1M unique strings. It took up something like 190mb. I committed it and mem usage went up to 420mb.
I then ran cacheMinimize (memory stayed at 420mb). Then I inserted another 1M entries (strings keyed on ints) and mem usage went up to 820mb. Then I committed and memory usage dropped to ~400mb and went back up to 833mb. Then I ran cacheMinimize again and memory usage stayed there. Does this example (totally decoupled from any other operations by me) make sense to experienced ZODB people? I have really no functional mental model of ZODB's memory usage patterns. I love using it, but I really want to find some way to get its allocations under control. I'm currently running this on a Macbook Pro, but it seems to be behaving the same way on Windows and Linux. I really appreciate all of the help so far, and if there're any other pieces of my code that might help please let me know. Cheers, Ryan On Mon, May 10, 2010 at 3:18 PM, Jim Fulton j...@zope.com wrote: On Mon, May 10, 2010 at 5:39 PM, Ryan Noon rmn...@gmail.com wrote: First off, thanks everybody. I'm
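To make the LLTreeSet suggestion above concrete, a minimal hedged sketch (names follow the thread; the helper is invented for illustration):

    from BTrees.LOBTree import LOBTree
    from BTrees.LLBTree import LLTreeSet

    wordid_to_docset = LOBTree()

    def add_posting(wordid, docid):
        # Each LLTreeSet is its own persistent object with its own
        # record, so touching one bucket of the outer tree no longer
        # drags every docset into memory, and docsets can be ghosted.
        docset = wordid_to_docset.get(wordid)
        if docset is None:
            docset = wordid_to_docset[wordid] = LLTreeSet()
        docset.insert(docid)

Unlike an array('L') value, an LLTreeSet does not have to be re-assigned into the tree after each append, and only the changed set (not the whole bucket) is written on commit.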
Re: [ZODB-Dev] Changing the pickle protocol?
I suspect that something like 90% of ZODB pickle data will be string values, so the scope for reducing the space used by a ZODB through the newer pickle protocol – and even the class registry – is limited. What would make a significant impact on data size is compression. With lots of short strings it's probably best to use a preset dictionary (which sadly does not seem to be exposed through the python zlib module). Text is usually very amenable to compression, and now we have blobs most binary data will no longer be in the Data.fs. Compression could either be implemented at the database level (which is probably cleanest) or at the application level (which would also reduce the size of content objects in memory). This would bring clear wins where I/O or memory bandwidth are the limiting factors - CPUs spend most of their time waiting for data to be copied into their cache from memory. Laurence 2010/4/28 Hanno Schlichting ha...@hannosch.eu: Hi. The ZODB currently uses a hardcoded pickle protocol 1. There's both the more efficient protocol 2 and, in Python 3, protocol 3. Protocol 2 has seen various improvements in recent Python versions, triggered by its use in memcached. I'd be interested to work on changing the protocol. How should I approach this? I can see three general approaches:

1. Hardcode the version to 2 in all places, instead of 1. Pros: Easy to do, backwards compatible with all supported Python versions. Cons: Still inflexible.

2. Make the protocol version configurable. Pros: Gives control to the user; one could change the protocol used for storages or persistent caches independently. Cons: More overhead, different protocol versions could have different bugs.

3. Make the format configurable. Shane made a proposal in this direction at some point. This would abstract the persistent format and allow for different serialization formats. As part of this one could also have different pickle/protocol combinations. Pros: Lots of flexibility, it might be possible to access the data from different languages. Cons: Even more overhead.

If I am to look into any of these options, which one should I look into? Option 1 is obviously the easiest and I made a branch for this at some point already. I'm not particularly interested in option 3 myself, as I haven't had the use-case. Thanks for any advice, Hanno ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - zodb-...@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
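On the compression point, a minimal sketch of record-level compression (the '.z' marker scheme mirrors what zc.zlibstorage later shipped; treat the details here as illustrative):

    import zlib

    def compress_record(data):
        # Keep the original record if compression does not shrink it,
        # which is common with lots of short strings.
        compressed = zlib.compress(data)
        if len(compressed) < len(data):
            return '.z' + compressed
        return data

    def decompress_record(stored):
        if stored[:2] == '.z':
            return zlib.decompress(stored[2:])
        return stored

A real implementation must also escape raw records that happen to start with the marker; a preset zlib dictionary, as mentioned above, would improve ratios on short strings but is not reachable from the Python 2 zlib module.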
Re: [ZODB-Dev] plone and postgres connection problems
I've had this issue reported to me in the context of zope.sqlalchemy, but have been unable to reproduce it. Others have also seen it, but as far as I am aware have not been able to reproduce it either: http://www.mail-archive.com/pgsql-hack...@postgresql.org/msg146522.html There have now been three sightings, in the context of SQLAlchemy, Django, and plain dbapi2 usage (RelStorage), so I suspect it is a real issue with psycopg2; however, until it can be reproduced I'm not hopeful it can be fixed. Laurence On 19 April 2010 19:34, lista administracion reference.l...@gmail.com wrote: Hi We have two servers with Plone that point to the same database [Postgres 8] so that if one fails the other keeps working. The problem is that at least once a day Plone stops responding for a few minutes (pages come back blank or take a long time to load), which causes apache to send a Proxy Error. It resolves itself, but it is annoying for our users to wait 5 to 10 minutes, not counting the lack of availability of the page. We find that when it fails, PostgreSQL keeps a connection in the 'idle in transaction' state for a long time, and after about 10 minutes this connection terminates automatically. This causes Plone to fail for at least 10 minutes. The workaround is to restart Plone or kill the process as follows:

    postgres 23267 0.0 0.1 2172016 7768 ? S 10:17 0:00 postgres: ploneadmin plonetesting 10.9.33.116(45189) idle in transaction
    kill -15 23267

We have 2 servers with Apache/Plone (4GB RAM, RedHat 5.4, Apache 2.2.3, Plone 3.3.4, RelStorage-1.4.0b3) and 1 server with Postgres (4GB RAM, RedHat 5.4, Postgres 8.1.18). Configuration:

    <relstorage>
      blob-dir var/blobs
      <postgresql>
        dsn dbname='plone' user='admin' host='10.9.33.128' password='password123'
      </postgresql>
    </relstorage>

Recently we modified these parameters on both servers and now it takes about 3 days to fail:

    blob-dir var/blobs
    cache-local-mb 512
    cache-prefix prod
    cache-delta-size-limit 5000
    commit-lock-timeout 10
    poll-interval 60
    pack-dry-run true
    pack-batch-timeout 8
    pack-duty-cycle 0.3
    pack-max-delay 30

We do not know if the problem is plone, postgres or the network. I hope someone can advise something. thanks in advance. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - zodb-...@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Low-level reference finding, tracking, deleting?
On 17 April 2010 05:27, Jeff Shell j...@bottlerocket.net wrote: We encountered a problem during an export/import in a Zope 3 based application that resulted in something not being importable. This is from our very first Zope 3 based application, and I stumbled across some very old adapter/utility registrations that I thought I had cleared out. There are references to `zope.interface.adapter.Null` which haven't been around for years. This is in an old `LocalAdapterRegistry` which, again, I thought I had removed a long time ago. These objects and what they reference are not part of our normal object graph, and I was surprised to see them. Given an oid, how can I trace what references that object/oid? There is something in our normal object hierarchy retaining a reference, but I don't know how to find it, and imagine that trying to investigate/load the objects from the ZODB level will help me find the culprit. I describe how to do this in an article here: http://plone.org/documentation/kb/debug-zodb-bloat Since then, Jim has written zc.zodbdgc, whose multi-zodb-check-refs script will optionally produce a database of reverse references. http://www.mail-archive.com/zodb-dev@zope.org/msg04389.html Are there low level deletion tools in the ZODB to delete individual objects? You delete an object by removing all references to it, so that it becomes liable for garbage collection. Persistent component registrations will be referenced from the registry as well as the _components container. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Using zodb and blobs
Running your test script on my small Amazon EC2 instance on Linux takes between 0.0 and 0.04 seconds (I had to remove the divide by total to avoid a zero division error). 0.02 is 5000/s. Laurence On 14 April 2010 00:25, Nitro ni...@dr-code.org wrote: 40 tps sounds low: are you pushing blob content over the wire somehow? I have seen the ZEO storage committing transactions at least an order of magnitude faster than that (e.g., when processing incoming newswire feeds). I would guess that there could have been some other latencies involved in your setup (e.g., that 0-100ms lag you mention below). See my attached test script. It outputs 45-55 transactions/s for a 100 byte sized payload. Maybe there's a very fundamental flaw in the way the test is set up. Note that I am testing on a regular desktop machine (Windows 7, WoW64, 4GB RAM, 1TB hard disk capable of transfer rates > 100MB/s). The zeo server and clients will be in different physical locations, so I'd probably have to employ some shared filesystem which can deal with that. Speaking of locations of server and clients, is it a problem - as in zeo will perform very badly under these circumstances as it was not designed for this - if they are not in the same location (typical latency 0-100ms)? That depends on the mix of reads and writes in your application. I have personally witnessed a case where the clients stayed up and serving pages over a whole weekend in a clusterfsck where both the ZEO server and the monitoring infrastructure went belly up. This was for a large corporate intranet, in case that helps: the problem surfaced mid-morning on Monday when the employee in charge of updating the lunch menu for the week couldn't save the changes. Haha, I hope they solved this critical problem in time! In my case the clients might be down for a couple of days (typically 1 or 2 days) and they should not spend 30 mins in cache verification time each time they reconnect. So if these 300k objects take up 1k each, then they occupy 300 MB of ram which I am fine with. If the client is disconnected for any period of time, it is far more likely that just dumping the cache and starting over fresh will be a win. The 'invalidation_queue' is primarily to support clients which remain up while the storage server is down or unreachable. Yes, taking the verification time hit is my plan for now. However, dumping the whole client cache is something I'd like to avoid, since the app I am working on will not work over a corporate intranet and thus the bandwidth for transferring the blobs is limited (and so can take up considerable time). Maybe I am overestimating the whole client cache problem though. Thanks again for your valuable advice, -Matthias ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - zodb-...@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Checking the length of OOBTree
A BTree does not keep track of its length (len() has to walk the tree counting keys, which is why it is slow for big data sets). See BTrees.Length.Length: http://apidoc.zope.org/++apidoc++/Code/BTrees/Length/Length/index.html Laurence On 8 April 2010 16:36, Leszek Syroka leszek.marek.syr...@cern.ch wrote: Hi, what is the fastest way of checking the number of elements in an OOBTree? The execution time of len(OOBTree.keys()) and len(OOBTree) is exactly the same. For big data sets the execution time is unacceptable. I found out that in the implementation of OOBTree (written in C) there is a variable called 'len', which seems to contain the length of the tree. Is it possible to access that variable from the python code without modifying the source? Best regards Leszek ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - zodb-...@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
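To make that concrete, a minimal sketch of the usual pattern: keep a Length object next to the tree and update it on every insert/delete (names are illustrative):

    from BTrees.OOBTree import OOBTree
    from BTrees.Length import Length

    tree = OOBTree()
    size = Length()

    def insert(key, value):
        # BTree.insert() returns 1 only if the key was actually added
        if tree.insert(key, value):
            size.change(1)

    def delete(key):
        del tree[key]
        size.change(-1)

    count = size()  # O(1) lookup, no tree walk

Length also resolves write conflicts by merging concurrent changes, so two clients incrementing the counter at the same time will not raise a ConflictError.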
Re: [ZODB-Dev] Savepoint release support
2010/1/17 Jim Fulton j...@zope.com: On Sat, Jan 16, 2010 at 1:03 PM, Laurence Rowe l...@lrowe.co.uk wrote: I've had a request to add savepoint release support to zope.sqlalchemy as some databases seem to limit the number of savepoints in a transaction. I've added this in a branch of transaction here: svn+ssh://svn.zope.org/repos/main/transaction/branches/elro-savepoint-release From the changelog:

* Add support for savepoint.release(). Some databases only support a limited number of savepoints or subtransactions; this provides an opportunity for a data manager to free those resources.
* Rename InvalidSavepointRollbackError to InvalidSavepointError (BBB provided.)

If there are no objections, I shall merge this to trunk. I'll review and merge. Great, thanks! What does it mean to release a savepoint? How is this different from aborting a savepoint? I ask particularly in light of: On Sat, Jan 16, 2010 at 2:26 PM, Laurence Rowe l...@lrowe.co.uk wrote: 2010/1/16 Laurence Rowe l...@lrowe.co.uk: I'm still not sure this will allow me to add savepoint release support to zope.sqlalchemy, as SQLAlchemy has a concept of nested transactions rather than savepoints. http://groups.google.com/group/sqlalchemy/browse_thread/thread/7a4632587fd97724 Michael Bayer noted on the sqlalchemy group that on RELEASE SAVEPOINT Postgresql destroys all subsequent savepoints. My branch now implements this behaviour. For zope.sqlalchemy I commit the sqlalchemy subtransaction on savepoint.release(). This translates to a RELEASE SAVEPOINT on postgresql, best described by their docs here: RELEASE SAVEPOINT destroys a savepoint previously defined in the current transaction. Destroying a savepoint makes it unavailable as a rollback point, but it has no other user visible behavior. It does not undo the effects of commands executed after the savepoint was established. (To do that, see ROLLBACK TO SAVEPOINT.) Destroying a savepoint when it is no longer needed allows the system to reclaim some resources earlier than transaction end. RELEASE SAVEPOINT also destroys all savepoints that were established after the named savepoint was established. http://developer.postgresql.org/pgdocs/postgres/sql-release-savepoint.html Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] ZODB3 installation ambiguous conclusion
2009/12/20 Ross Boylan rossboy...@stanfordalumni.org: easy_install ZODB3 looked fairly good during installation until the end:

    Processing transaction-1.0.0.tar.gz
    Running transaction-1.0.0\setup.py -q bdist_egg --dist-dir c:\users\ross\appdata\local\temp\easy_install-cw1i4f\transaction-1.0.0\egg-dist-tmp-z7nrfd
    Adding transaction 1.0.0 to easy-install.pth file
    Installed c:\python26\lib\site-packages\transaction-1.0.0-py2.6.egg
    Finished processing dependencies for ZODB3
    WARNING: An optional code optimization (C extension) could not be compiled.
    Optimizations for this package will not be available!
    Unable to find vcvarsall.bat

This seems to say things will work, just not as fast as they could. But I'm a little puzzled why things would work at all, since I don't have a build environment on the machine (well, there is a compiler that's part of the MS SDK, but I'm not really sure how capable or operational it is--it did seem to compile some sample C code in the kit). Is there a pure python fallback for the C code? I thought ZODB had some C-level magic. ZODB requires C-code modules to work, but pre-compiled win32 eggs are available, and presumably that is what easy_install picked. It's not clear to me what generated that warning, but then I don't use Windows. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] No module named Persistence
2009/12/20 Ross Boylan rossboy...@stanfordalumni.org: The IPC10 presentation says

    # Works as a side-effect of importing ZODB above
    from Persistence import Persistent

I tried that (with the indicated other imports first). It led to a 'No module' error. I tried commenting out the line, since the comment could be interpreted to mean that importing ZODB already does what's necessary. But there was no Persistent class defined I could use. I tried 'from Globals import Persistent', as suggested in a 1998 posting. This produced 'No module named Globals'. Suggestions? That is the old Zope2 persistence base class. Try 'from persistent import Persistent'. http://docs.zope.org/zodb/zodbguide/prog-zodb.html#writing-a-persistent-class (note that guide has probably not been updated since ZODB 3.7, so don't expect any newer features to be documented there). Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
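For completeness, a minimal sketch using the modern import (the Note class is invented for illustration):

    from persistent import Persistent

    class Note(Persistent):
        # attribute writes mark the object as changed, and it is
        # written out when the enclosing transaction commits
        def __init__(self, text=''):
            self.text = text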
Re: [ZODB-Dev] Python properties on Persistent objects
2009/12/17 Mikko Ohtamaa mi...@redinnovation.com: Hi, I need a little clarification on whether properties should work on Persistent objects. I am running ZODB 3.8.4 on Plone 3.3. I am using plone.behavior and adapters to retrofit objects with a new behavior (a HeaderBehavior object). This object is also editable through a z3c.form interface. z3c.form requires a context variable on the object, e.g. to look up dynamic vocabularies. To avoid having this object.context attribute be persistent (as it's known every time by the factory method of the adapter which creates/looks up HeaderBehavior) I tried to spoof the context variable using properties and an internal volatile variable. This was a trick I learnt somewhere (getpaid.core?) This sounds like you are passing context somewhere where a view is expected. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
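For reference, a hedged sketch of the volatile-attribute trick being described (the class name follows the poster's; nothing here is a confirmed fix for the z3c.form issue). Attributes with a _v_ prefix are never pickled, so the context never reaches the database:

    from persistent import Persistent

    class HeaderBehavior(Persistent):
        def _get_context(self):
            return self._v_context

        def _set_context(self, value):
            # _v_ marks a volatile attribute: kept in memory only,
            # discarded whenever the object is ghosted or reloaded
            self._v_context = value

        context = property(_get_context, _set_context)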
Re: [ZODB-Dev] Data.fs size grows non-stop
2009/12/9 Pedro Ferreira jose.pedro.ferre...@cern.ch: Hello, Just zodbbrowser with no prefix: http://pypi.python.org/pypi/zodbbrowser https://launchpad.net/zodbbrowser It's a web-app: it can connect to your ZEO server so you can inspect the DB while it's being used. We tried this, but we currently get an error related to the general security policy for zope.app. Maybe we need to install Zope? This would be a very handy tool. I'd suggest dumping the last few transactions with one of the ZODB scripts (fsdump.py perhaps) and seeing what objects get modified. That's what we've been doing, and we got some clues. We've modified Jim's script in order to find out which OIDs are being rewritten, and how much space they are taking, and this is a fragment of it:

    OID                                 class_name                        total_size   percent_size  n_pickles  min_size  avg_size  max_size
    '\x00\x00\x00\x00%T\x89{'           BTrees.OOBTree.OOBucket           17402831841  30%           8683       1977885   2004241   2026518
    '\x00\x00\x00\x00%T\x89|'           BTrees.OOBTree.OOBucket           14204430890  24%           8683       1616904   1635889   1651956
    '\x00\x00\x00\x00\x04dUH'           MaKaC.common.indexes.StatusIndex  11955954522  20%           28513      418230    419315    420294
    '\x00\x00\x00\x00%\xa0%\x7f'        BTrees.OOBTree.OOBucket           3532998238   6%            11238      307112    314379    320647
    '\x00\x00\x00\x00%\xa0%\x80'        BTrees.OOBTree.OOBucket           2193843302   3%            11238      190816    195216    199007
    '\x00\x00\x00\x00\x04\x8e\xb6\x04'  BTrees.OOBTree.OOBucket           1728216003   3%            1953       880615    884903    887285
    [...]

As you can see, we have an OOBucket occupying more than 2MB (!) per write. That's almost 17GB considering only the last 1M transactions of the DB (we get ~3M transactions per week). We believe this bucket belongs to some OOBTree-based index that we are using, whose values are Python lists (maybe that was a bad choice to start with?). In any case, how do OOBuckets work? Is it a simple key space segmentation strategy, or are the values taken into account as well? Our theory is that an OOBTree simply divides the N keys into K buckets, and doesn't care about the contents. So, since we are adding very large lists as values, the tree remains unbalanced, and since new contents will be added to this last bucket, each rewrite will imply the addition of ~2MB to the file storage. BTree buckets have no concept of the size of their contents; they split when their number of keys reaches a threshold (30 for OOBTrees). Will the replacement of these lists with a persistent structure such as a PersistentList solve the issue? The list would then be stored as a separate persistent object, so changes to the bucket would not rewrite the entire list object. The downside of this is that your application may become slower, as reading the contents of the index will incur additional object loads. Zope2's ZCatalog stores index data as tuples in BTrees, but only a small amount of metadata is stored (so the buckets are maybe 30-60KB). It sounds like you are storing a large amount of metadata in the index, or perhaps inadvertently indexing something. I've seen similar problems caused by binary data ending up in a text index (where a 'word' ended up being several megabytes). Load the object to check the problem is large values, rather than large keys. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
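To illustrate the suggested fix, a hedged sketch (the index and names are placeholders): with PersistentList values, each list becomes its own database record, so an append writes the small list record instead of rewriting a multi-megabyte bucket:

    from BTrees.OOBTree import OOBTree
    from persistent.list import PersistentList

    index = OOBTree()

    def add_entry(key, item):
        values = index.get(key)
        if values is None:
            values = index[key] = PersistentList()
        # mutating the PersistentList marks only the list as changed;
        # the OOBucket holding the reference is untouched
        values.append(item)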
Re: [ZODB-Dev] Data.fs size grows non-stop
2009/12/7 Jose Benito Gonzalez Lopez jose.benito.gonza...@cern.ch: Dear ZODB developers, Since some time ago (not sure since when) our database has gone from 15GB to 65GB very fast, and it keeps growing little by little (2 to 5 GB per day). It is clear that something is not correct in it. We would like to check which objects are taking most of the space or just try to find out what is going on... Any help or suggestions would be much appreciated. Take a look at my write up here: http://plone.org/documentation/kb/debug-zodb-bloat You will want analyze.py from the latest ZODB release (or download it from http://svn.zope.org/ZODB/trunk/src/ZODB/scripts/); the version that ships with Zope 2.10.9 is broken. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] repozo, neither official nor supported, apparently...
2009/11/20 Chris Withers ch...@simplistix.co.uk: Jim Fulton wrote: On Thu, Nov 19, 2009 at 7:01 PM, Chris Withers ch...@simplistix.co.uk wrote: Jim Fulton wrote: There's nothing official or supported about a backup solution without automated tests. So I guess there isn't one. Right, so what does Zope Corp use? We use ZRS, of course. Well, ZRS solves the HA challenge the same way as zeoraid, if I understand correctly, but what about offsite backups and the like? The project I'm currently working on uses repozo to create backups that:

- get hoovered by the hosting provider's backup mechanisms and rotated offsite daily
- get sprayed by rsync over ssh to a DR site on another continent

How would ZRS solve these problems? I'd prefer that there be a file-storage backup solution out of the box. repozo is the logical choice. It sounds like it needs some love though. This isn't something I'd likely get to soon. I'm not sure how much love repozo needs. It works, and it won't need changing until FileStorage's format changes, which I don't see happening any time soon. Maybe this test I added for analyze.py could be a helpful template. http://zope3.pov.lt/trac/changeset/100422 Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] repozo, neither official nor supported, apparently...
2009/11/20 Jim Fulton j...@zope.com: On Fri, Nov 20, 2009 at 9:32 AM, Chris Withers ch...@simplistix.co.uk wrote: ... I'm not sure how much love repozo needs. It works, and it won't need changing until FileStorage's format changes, which I don't see happening any time soon. It just occurred to me that repozo doesn't support blobs. This was touched on in the thread "Backing up Data.fs and blob directory": https://mail.zope.org/pipermail/zodb-dev/2008-September/012094.html While there is no direct support in repozo, the approach of first taking a repozo backup followed by a blob directory backup works so long as you do not pack between the repozo and blob backups. (Blobs newer than the repozo backup are safely ignored.) Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
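In practice that ordering looks something like this (paths are illustrative; -B selects backup mode, -r the backup repository and -f the FileStorage):

    $ bin/repozo -B -r /backups/filestorage -f var/filestorage/Data.fs
    $ rsync -a var/blobstorage/ /backups/blobstorage/

The only hard requirement, as noted above, is not packing the storage between the two steps.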
Re: [ZODB-Dev] ZEO and blobs: missing tmp file breaks transaction on retry
2009/11/13 Martin Aspeli optilude+li...@gmail.com: Hanno Schlichting wrote: On Fri, Nov 13, 2009 at 5:40 PM, Jim Fulton j...@zope.com wrote: On Fri, Nov 13, 2009 at 10:18 AM, Mikko Ohtamaa mi...@redinnovation.com wrote: Unfortunately the application having the issues is Plone 3.3. ZODB 3.9 depends on Zope 2.12 so, right? ZODB doesn't depend on Zope anything. :) Plone 3.3 may use an earlier version of ZODB, but perhaps it is possible to get it to work with a later one. I wouldn't know. :) Plone 3.x uses Zope 2.10 and ZODB 3.7. Upgrading it to ZODB 3.8.x is trivial. But the changes in ZODB 3.9 (essentially the removal of the version feature) require a bunch of non-trivial changes to Zope2. So only Zope 2.12 works with ZODB 3.9. Anyone using Plone 3.x who wants to use blobs is therefore stuck with ZODB 3.8.x. It's not supported by Plone and considered experimental on all layers :) Meanwhile, several people have used it in production. I was a little taken aback to discover that it is considered somewhat experimental (and it seems, a bit broken) in ZODB 3.8 (as distinct from the Plone integration package, plone.app.blob, which indeed has been experimental up until now). I think a lot of other people would be too. A lot of people would be very happy if this bug in ZODB 3.8 could be fixed, since the option of upgrading is not there (since ZODB 3.9 introduces too-incompatible changes to work with Zope 2.10) for anyone on a released, stable version of Plone. Presumably ZODB 3.9 maintains backwards compatibility for ZEO clients, so a ZODB 3.9 ZEO server could be used with Zope 2.10 + ZODB 3.8 clients? Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] tools for analysing the tail of a filestorage?
This may help: http://plone.org/documentation/how-to/debug-zodb-bloat/ Laurence Chris Withers wrote: Hi All, I have a filestorage being used by Zope 2 that is mysteriously growing. I don't have confidence in the Undo tab, since this setup has two storages, one mounted into the other. I tried fstail.py, and while it tells me the same info as the Undo tab (except with more certainty that it's showing the right storage results ;-) it doesn't say much about the objects in question... Are there any other tools that might tell me more? cheers, Chris -- Simplistix - Content Management, Batch Processing Python Consulting - http://www.simplistix.co.uk ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] URGENT: ZODB down - Important Software Application at CERN
2009/5/27 Chris Withers ch...@simplistix.co.uk: Laurence Rowe wrote: Jim Fulton wrote: Well said. A feature I'd like to add is the ability to have persistent objects that don't get their own database records, so that you can get the benefit of having them track their changes without incurring the expense of a separate database object. +lots Hanno Schlichting recently posted a nice graph showing the persistent structure of a Plone Page object and its 9 (!) sub-objects. http://blog.hannosch.eu/2009/05/visualizing-persistent-structure-of.html That graph isn't quite correct ;-) workflow_history has DateTime objects in it, and I think they get their own pickle. I had a major win on one CMFWorkflow project by changing the workflow implementation to use a better data structure *and* store ints instead of DateTime objects. CMF should change this... Good point, though it is 'correct' for an object that has not undergone any workflow transitions yet, as is the case here ;) Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] URGENT: ZODB down - Important Software Application at CERN
Jim Fulton wrote: On May 26, 2009, at 10:16 AM, Pedro Ferreira wrote: In any case, it's not such a surprising number, since we have ~73141 event objects and ~344484 contribution objects, plus ~492016 resource objects, and then each one of these may contain authors, and for sure some associated objects that store different bits of info... So, even if it doesn't include revisions, 19M is not such a surprising number. I've also tried to run the analyze.py script, but it returns a stream of 'type' object is unsubscriptable errors, due to:

    classinfo = pickle.loads(record.data)[0]

any suggestion? No. Unfortunately, most of the scripts in ZODB aren't tested or documented well and tend to bitrot. Also, is there any documentation about the basic structures of the database available? We found some information spread through different sites, but we couldn't find exhaustive documentation for the API (information about the different kinds of persistent classes, etc...). Is there any documentation on this? No. Comprehensive ZODB documentation is needed. This is an upcoming project for me. I have a patch at https://bugs.launchpad.net/zodb/+bug/223331 which fixes this. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] URGENT: ZODB down - Important Software Application at CERN
Jim Fulton wrote: Well said. A feature I'd like to add is the ability to have persistent objects that don't get their own database records, so that you can get the benefit of having them track their changes without incurring the expense of a separate database object. +lots Hanno Schlichting recently posted a nice graph showing the persistent structure of a Plone Page object and its 9 (!) sub-objects. http://blog.hannosch.eu/2009/05/visualizing-persistent-structure-of.html A sub-persistent type would allow us to fix the latency problems we experience without needing to re-engineer Archetypes at the same time. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] ZODB Documentation
A few weeks ago I converted the ZODB/ZEO Programming Guide and a few more articles into structured text and added them to the zope2docs buildout. I've now moved them to their own buildout in svn+ssh://svn.zope.org/repos/main/zodbdocs/trunk and they will soon appear at http://docs.zope.org/zodb (thanks Jens!) This means we now have two copies of the programming guide, one in latex in the ZODB sources and one in stx in zodbdocs. I'd like to propose removing the latex version and direct any changes to the stx version. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] ZODB Documentation
Andreas Jung wrote: On 26.05.09 19:08, Andreas Jung wrote: On 26.05.09 18:54, Laurence Rowe wrote: A few weeks ago I converted the ZODB/ZEO Programming Guide and a few more articles into structured text and added them to the zope2docs buildout. I've now moved them to their own buildout in svn+ssh://svn.zope.org/repos/main/zodbdocs/trunk and they will soon appear at http://docs.zope.org/zodb (thanks Jens!) There is also (the same?) ZODB documentation available under http://docs.zope.org/zope2/articles/ We should get rid of one copy. Oops... sorry for misreading... just seen your checkins for moving the stuff. They're actually copies at the moment; once Jens performs his magic I'll remove them from the Zope 2 buildout. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] ZODB Documentation
Jim Fulton wrote: On May 26, 2009, at 12:54 PM, Laurence Rowe wrote: A few weeks ago I converted the ZODB/ZEO Programming Guide and a few more articles into structured text and added them to the zope2docs buildout. I've now moved them to their own buildout in svn+ssh://svn.zope.org/repos/main/zodbdocs/trunk and they will soon appear at http://docs.zope.org/zodb (thanks Jens!) This means we now have two copies of the programming guide, one in latex in the ZODB sources and one in stx in zodbdocs. I'd like to propose removing the latex version and direct any changes to the stx version. +1 (I'd repressed knowledge of the latex version.) Done. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] URGENT: ZODB down - Important Software Application at CERN
Pedro Ferreira wrote: Dear all, Thanks a lot for your help. In fact, it was a matter of increasing the maximum recursion limit. There's still an unsolved issue, though. Each time we try to recover a backup using repozo, we get a CRC error. Is this normal? Has it happened to anyone? I guess we have a very large database, for what is normal in ZODB applications. We were wondering if there's any way to optimize the size (and performance) of such a large database, through the removal of unused objects and useless data. We perform packs on a weekly basis, but we're not sure if this is enough, or if there are other ways of lightening up the DB. Any recommendations regarding this point? You might want to try packing without garbage collection, which is a much cheaper operation. See http://mail.zope.org/pipermail/zodb-dev/2009-January/012365.html Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
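As a concrete sketch of that alternative: newer ZODB releases expose a pack-gc option on FileStorage (an assumption worth verifying against your ZODB version), so packing without garbage collection can be configured rather than scripted. The path below is illustrative:

    <filestorage>
      path /var/zodb/Data.fs
      # Skip the expensive reference tracing; unreferenced objects
      # are kept, but old object revisions are still removed.
      pack-gc false
    </filestorage>

If your ZODB predates the option, the thread linked above describes how to achieve the same effect.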
Re: [ZODB-Dev] Proposal (version 2): cross database reference seat belt
Christian Theune wrote: Hi, On Tue, 2009-04-28 at 13:54 -0400, Jim Fulton wrote: Thanks again! (Note to everyone else, Shane and I discussed this on IRC, along with another alternative that I'll mention below.) I like version 2 better than version 1. I'd be inclined to simplify it and skip the configuration flag and simply publish an event any time we see a cross-database reference when saving an object. Here's proposed solution 3. :)

- We add a flag to disable new cross-database references unless they are explicitly registered.
- We add a connection method to register a reference:

    def registerCrossDatabaseReference(from_, to):
        """Register a new cross-database reference from from_ to to."""

- We arrange that connections can recognize old cross-database references.

If someone accidentally creates a new reference and the flag is set, then the transaction will be aborted. An interim step, if we're in a hurry to get 3.9 out, is to simply add the flag. This would disallow cross-database references in new applications. These applications could still support multiple databases by providing application-level traversal across databases. I think I'm reading something incorrectly: is there an emphasis on *new* applications? The flag would disallow the creation of cross-database references for a given DB -- independent of whether the app is new or old, right? Only depending on whether the application uses a ZODB that has the feature and has it enabled. Right? I think the emphasis was on new versus existing cross-database references. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] ZEO and time.sleep
For Plone, the standard remedy to this problem is to separate out portal_catalog into its own storage (ZEO has support for serving multiple storages; see the zeo.conf sketch after this message). You may then control the object cache size per storage, setting the one for the portal_catalog storage large enough to keep all its objects in the cache. As navigation is driven from the catalog this can significantly help performance and reduce the number of ZEO loads to only those objects required to traverse to the published object. Other things that might help:

* Reduce the number of zserver-threads from the default 4; object caches are per thread, so this allows you to have fewer, larger caches.
* Use FileSystemStorage for Archetypes; this can help if you serve many files. Files are stored in 64k pdata chunks, and serving a large file can clear your cache. With newer versions of Plone you can use ZODB 3.8 and blobs.
* Put Varnish or some other proxy cache in front and cache aggressively.
* Buy more memory; memory is cheap.

Hope that helps, Laurence

Juan Pablo Gimenez wrote: Hi all... I'm profiling a big plone 2.5 instance with huge performance problems and I was wondering if this bug is still present in zope 2.9.9-final, http://mail.zope.org/pipermail/zodb-dev/2007-March/010855.html We can't increment the zodb-cache-size because we're running out of memory... so a lot of times we read objects from zeo/zodb... Any help will be really appreciated... Saludos... ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
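A rough server-side sketch of the first suggestion above -- a zeo.conf serving the main storage plus a separate catalog storage. The storage names and paths are illustrative assumptions; clients would then mount the catalog storage at portal_catalog's mount point and give that database a larger cache size:

    <zeo>
      address 8100
    </zeo>
    # ZEO serves every storage section defined below; clients
    # select one by its name ('1' is the conventional default).
    <filestorage 1>
      path /var/zodb/main.fs
    </filestorage>
    <filestorage catalog>
      path /var/zodb/catalog.fs
    </filestorage>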
Re: [ZODB-Dev] Relstorage pack problems
Shane Hathaway wrote: I should note that this KeyError occurs while trying to report on a KeyError. I need to fix that. Fortunately, the same error pops out anyway. There's a fix for this in the Jarn branch. Note that to collect more interesting data it rolls back the load connection at this point, relying on the KeyError to cause the transaction to fail. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] How to turn off 'GC' when packing on ZODB3.6.2
eastxing wrote: Hi, I am using Plone 2.5.5 with Zope 2.9.8-final and ZODB 3.6.2. My Data.fs is now nearly 26G in size, with almost 140k Plone objects and more than 4100k zope objects in the database. Since 2 months ago, I have not been able to pack my database successfully. In recent days I tried to pack it again, but after more than 72 hours of running, the pack process had not ended. I read lots of discussions on the forum; some said that turning off 'GC' when packing will improve the speed tremendously. Then I found an experimental product -- 'zc.FileStorage', written by Jim, but it seems it can only be used with ZODB 3.8 or later. So what should I do on ZODB 3.6.2 to turn off 'GC' when packing? PS: If this is the wrong place to ask the question, please let me know and I'll move it to the right place. As an alternative to backporting the changes to pack, you could try doing a zexp export of the site, and then reimport the zexp into a blank Data.fs. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] problem with broken
Broken objects occur when the class for a pickled object cannot be imported. To change the location of a class, you need to provide an alias at the old location so that the object can be unpickled, i.e. MyOldClassName = MyNewClassName. You can only remove MyOldClassName after you have updated all of the pickles (with your code below). Laurence

Adam GROSZER wrote: Hello, I'm having a problem with broken objects here. It's coming when I'm trying to evolve generations and the generation just touches all objects in the ZODB to store them again with the non-deprecated classes. The code is like this:

    storage = context.connection._storage
    next_oid = None
    n = 0
    while True:
        oid, tid, data, next_oid = storage.record_iternext(next_oid)
        obj = context.connection.get(oid)
        # Make sure that we tell all objects that they have been changed.
        # Who cares whether it is true! :-)
        obj._p_activate()
        obj._p_changed = True
        if next_oid is None:
            break

2008-11-04T19:40:16 ERROR SiteError http://localhost:8080/++etc++process/@@generations.html
Traceback (most recent call last):
  File F:\W\Zope3\src\zope\publisher\publish.py, line 133, in publish
    result = publication.callObject(request, obj)
  ...
  File F:\W\Zope3\src\zope\tal\talinterpreter.py, line 343, in interpret
    handlers[opcode](self, args)
  File F:\W\Zope3\src\zope\tal\talinterpreter.py, line 583, in do_setLocal_tal
    self.engine.setLocal(name, self.engine.evaluateValue(expr))
  File F:\W\Zope3\src\zope\tales\tales.py, line 696, in evaluate
    return expression(self)
  File F:\W\Zope3\src\zope\tales\expressions.py, line 217, in __call__
    return self._eval(econtext)
  File F:\W\Zope3\src\zope\tales\expressions.py, line 211, in _eval
    return ob()
  File F:\W\Zope3\src\zope\app\generations\browser\managers.py, line 182, in evolve
    transaction.commit()
  File F:\W\Zope3\src\transaction\_manager.py, line 93, in commit
    return self.get().commit()
  File F:\W\Zope3\src\transaction\_transaction.py, line 322, in commit
    self._commitResources()
  File F:\W\Zope3\src\transaction\_transaction.py, line 416, in _commitResources
    rm.commit(self)
  File F:\W\Zope3\src\ZODB\Connection.py, line 541, in commit
    self._commit(transaction)
  File F:\W\Zope3\src\ZODB\Connection.py, line 586, in _commit
    self._store_objects(ObjectWriter(obj), transaction)
  File F:\W\Zope3\src\ZODB\Connection.py, line 620, in _store_objects
    p = writer.serialize(obj)  # This calls __getstate__ of obj
  File F:\W\Zope3\src\ZODB\serialize.py, line 405, in serialize
    meta = klass, newargs()
  File F:\W\Zope3\src\ZODB\broken.py, line 325, in __getnewargs__
    return self.__Broken_newargs__
AttributeError: 'VocabularyManager' object has no attribute '__Broken_newargs__'
___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
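The alias itself is tiny. A sketch with hypothetical module names: if VocabularyManager moved from myapp.vocabulary to myapp.managers, the old module keeps a binding to the new class until every pickle has been rewritten:

    # myapp/vocabulary.py -- the *old* location (hypothetical paths)
    # Existing pickles reference myapp.vocabulary.VocabularyManager;
    # importing the class here lets them unpickle as the new class.
    # Remove this alias only after all pickles have been updated.
    from myapp.managers import VocabularyManager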
Re: [ZODB-Dev] Amazon SimpleDB Adapter
Shane Hathaway wrote: Benjamin Liles wrote: Currently at the Plone conference it seems that a large number of people are beginning to host their Plone sites on the Amazon EC2 service. A SimpleDB adapter might be a good way to provide persistent storage for an EC2-based Zope instance. Has there been any interest in this? If I was to write one, should I add it to RelStorage or create my own package along the lines of relstorage.adapters.simpledb? This sounds interesting! We should add an adapter to RelStorage. We might run into some trouble with MVCC, but I think we can solve that. We should also use Amazon S3 directly for blob storage. In general, Amazon's services seem a much better fit for ZODB apps than what Google is offering. I'm not sure RelStorage is the best place for it - SimpleDB is very different to relational databases. A couple of years ago I experimented with s3storage [1]. This turned out to be very slow due to the number of writes performed every transaction - one per object, though this could be improved if the writes were parallelized. It reached the point where zope2 would start up. This took about 10 or 15 minutes at the time (I did not have access to EC2 at the time and this was over public wifi). It worked by creating its own indexes in S3. I don't think SimpleDB will give any advantage unless it is shown to be faster to query than S3. You cannot store pickles directly in SimpleDB because it is limited to an attribute size of 1024 bytes. The challenge in building such a system is that Amazon's eventual consistency model means you cannot know how up to date your view of the data is. I think it could make a great backend for storing pickles (keyed by oid, tid) but it is probably much easier to have a separate index to consult during loadSerial. It may also be worth experimenting with DirectoryStorage over s3fs [2]. Laurence [1] http://code.google.com/p/s3storage [2] http://code.google.com/p/s3fs ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Broken instances after refactoring in ZODB
Leonardo Santagada wrote: On Oct 4, 2008, at 12:36 PM, Wichert Akkerman wrote: Adam wrote: Thanks for that, guys, I've not used a mailing list like this before so unsure how to respond. If ZODB stores the Package.Module.Class name in the pickle would it be possible for me to simply rename them in the binary file? Possible it is, but probably harder than just doing what they said. My confusion here is that I've globally imported everything from the packages into the current namespace of my main module. ZODB shouldn't be aware I've moved the modules since for all intents and purposes to Python, they are still there. It doesn't matter where you import it from or to - python uses the location of the actual implementation and ZODB uses that. If you move your implementation to another place you have to either update all objects in the ZODB or add module aliases. Wichert. I would like to know where it gets that info from? I would guess from __module__. Correct. Why doesn't zodb have a table of some form for this info? I heard that sometimes for very small objects the string containing this information can use up to 30% of the whole space of the file (using FileStorage). How does RelStorage store this? I believe this is what the python pickle protocol 2 was created for. However I think when someone last looked the potential space savings with real world data did not justify making the change (Hanno has a branch in svn for this). Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
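To make the "where" concrete: the pickle records the module where the class was defined, not where it happens to be imported. A quick illustration (the package names are hypothetical):

    # mypackage/impl.py
    from persistent import Persistent

    class MyClass(Persistent):
        pass

    # Elsewhere, even after 'from mypackage.impl import MyClass' into
    # some other namespace, the pickled class reference still points
    # at the definition site:
    print MyClass.__module__   # -> 'mypackage.impl'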
Re: [ZODB-Dev] Zope memory usage
Izak Burger-2 wrote: Dieter Maurer wrote: This is standard behaviour with long running processes on a system without memory compaction: Of course, I remember now, there was something about that in my Operating Systems course ten years ago :-) I suppose the bigger page sizes used on some architectures doesn't help. The zope instance in question is 2.10.5, which includes ZODB 3.7.1. Can we simply swap that out with 3.8.0? Or should we rather do a svn diff on the dm-memory_size_limited-cache branch (based on 3.7.0) and see if that applies cleanly to 3.7.1 (I suspect it will)? I'm using the 3.8 branch (that will become 3.8.1) for its blob support happily with Plone 3.1 and Zope 2.10 Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB not saving sometimes
Andreas Jung wrote: --On 22. Juni 2008 08:49:32 -0700 tsmiller [EMAIL PROTECTED] wrote: Gary, I have been using the ZODB for about a year and a half with a bookstore application. I am just now about ready to put it out on the internet for people to use. I have had the same problem with saving data. I have tried a lot of things. But I have never gotten the database to save consistently. I can create an x number of records one right after the other that uses the exact same code to save them, and it is likely that all of them will save perfectly except one - or maybe two. We have never seen that - except with badly written application code. Committing a transaction should always commit the data. That's the sense of a transaction system. The only reason I can imagine causing such a failure: bare try..except within your code suppressing ZODB conflict errors. The other likely cause of this is modifying non-persistent sub objects and not setting _p_changed = True on the parent persistent object. e.g.:

    dbroot['a_list'] = [1, 2, 3]
    transaction.commit()
    a_list = dbroot['a_list']
    a_list.append(4)
    transaction.commit()

The second commit actually has no effect as the persistence machinery has not been notified that the object has changed. This is not immediately apparent though, as the 'live' object shows what you expect:

    >>> a_list
    [1, 2, 3, 4]

And if a later transaction also modifies the persistent object, then all of the data is saved. To avoid this, avoid using mutable, non-persistent types for storage in the ZODB; replace lists and dicts with PersistentList and PersistentMapping. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: Advice on ZODB with large datasets
It's helpful to post your responses to the mailing list, that way when someone else has a similar problem in the future they'll be able to find the information. Inheriting from Persistent is also necessary to control the granularity of the database. Persistent objects are saved as separate `records` by ZODB. Other objects do not have a _p_oid attribute and have to be saved as part of their parent record. Laurence

2008/6/19 [EMAIL PROTECTED]: Laurence Rowe wrote: [EMAIL PROTECTED] wrote: Does your record class inherit from persistent.Persistent? 650k integers + object pointers should only be of the order of 10 MB or so. It sounds to me like the record data is being stored in the BTree bucket directly. No, it does not. It's just a simple dictionary for the time being. I assumed the BTree bucket would itself know to load the values only when they are explicitly requested, and that the Persistence of the objects just merely meant that the database didn't keep track of changes of nonpersistent objects. I will try copying my dictionaries to PersistentMappings for now. Something like this should lead to smaller bucket objects where the record data is only loaded when you access the values of the btree:

    >>> from BTrees.IOBTree import IOBTree
    >>> bt = IOBTree()
    >>> from persistent import Persistent
    >>> class Record(Persistent):
    ...     def __init__(self, data):
    ...         super(Record, self).__init__()
    ...         self.data = data
    ...
    >>> rec = Record('my really long string data')
    >>> bt[1] = rec

___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: zodb does not save transaction
tsmiller wrote: I have a bookstore that uses the ZODB as its storage. It uses qooxdoo as the client and CherryPy for the server. The server has a 'saveBookById' routine that works 'most' of the time. However, sometimes the transaction.commit() does NOT commit the changes and when I restart my server the changes are lost. This sounds like you are using mutable data types (like lists or dicts) in the non-persistence aware variants. Christian, thanks for the reply. When I save a book I save a dictionary where all of the keys are strings and all of the values are strings. But what you say makes sense. I keep thinking that it must have something to do with the data itself. I will check very carefully to make sure that I am not saving anything but strings in the book record. Thanks. Tom

The problem is not saving things that are not strings, but modifying a non-persistent object without notifying the parent persistent object that a change has happened and it needs to be saved. e.g. you have a persistent object (inherits from persistent.Persistent) pobj:

    >>> pobj.dict = {}
    >>> transaction.commit()
    >>> pobj.dict['foo'] = 'bar'
    >>> transaction.commit()
    >>> print pobj.dict
    {'foo': 'bar'}
    # restart your python process
    >>> print pobj.dict
    {}

Instead you must either tell zodb the object has changed:

    >>> pobj.dict = {}
    >>> transaction.commit()
    >>> pobj.dict['foo'] = 'bar'
    >>> pobj._p_changed = True  # alternatively: pobj.dict = pobj.dict
    >>> transaction.commit()
    >>> print pobj.dict
    {'foo': 'bar'}
    # restart your python process
    >>> print pobj.dict
    {'foo': 'bar'}

Or use a persistence aware replacement:

    >>> from persistent.mapping import PersistentMapping
    >>> pobj.dict = PersistentMapping()
    >>> transaction.commit()
    >>> pobj.dict['foo'] = 'bar'
    >>> transaction.commit()
    >>> print pobj.dict
    {'foo': 'bar'}
    # restart your python process
    >>> print pobj.dict
    {'foo': 'bar'}

The same principles apply to other mutable non-persistent objects, such as lists. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: PGStorage
PGStorage does require packing currently, but it would be fairly trivial to change it to only store single revisions. Postgres would still ensure MVCC. Then you just need to make sure the Postgres auto-vacuum daemon is running. Laurence David Pratt wrote: Yes, Shane had done some benchmarking about a year or so ago. PGStorage was actually faster with small writes but slower for larger ones. As far as packing: as a zodb implementation, packing is still required to reduce the size of data in Postgres. BTW Stephan, where is Lovely using it - a site example? I had read some time ago that they were exploring it but not that it was being used. Regards, David Stephan Richter wrote: On Tuesday 22 January 2008, Dieter Maurer wrote: OracleStorage was abandoned because it was almost an order of magnitude slower than FileStorage. Actually, Lovely Systems uses PGStorage because it is faster for them. Regards, Stephan ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
Matt Hamilton wrote: David Binger dbinger at mems-exchange.org writes: On Nov 2, 2007, at 6:20 AM, Lennart Regebro wrote: Lots of people don't do nightly packs, I'm pretty sure such a process needs to be completely automatic. The question is whether doing it in a separate process in the background, or every X transactions, or every X seconds, or something. Okay, perhaps the trigger should be the depth of the small-bucket tree. That may just end up causing delays periodically in transactions... i.e. delays that the user sees, as opposed to doing it via another thread or something. But then as only one thread would be doing this at a time it might not be too bad. -Matt

ClockServer sections can now be specified in zope.conf. If you specify them with a period of say 10 mins (or even 2) then the queue should never get too large, and the linear search time is not a problem as n is small. Essentially you end up with a solution very similar to QueueCatalog but with the queue being searchable. The pain is then in modifying all of the indexes to search the queue in addition to their standard data structures. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: ZODB Benchmarks
It looks like ZODB performance in your test has the same O(log n) performance as PostgreSQL checkpoints (the periodic drops in your graph). This should come as no surprise. B-Trees have a theoretical Search/Insert/Delete time complexity equal to the height of the tree, which is (up to) log(n).

So why is PostgreSQL so much faster? It's using a Write-Ahead-Log for inserts. Instead of inserting into the (B-Tree based) data files at every transaction commit, it writes a record to the WAL. This does not require traversal of the B-Tree and has O(1) time complexity. The penalty for this is that read operations become more complex: they must look first in the WAL and overlay those results with the main index. The WAL is never allowed to get too large, or its in-memory index would become too big.

If you are going to have this number of records -- in a single B-Tree -- then use a relational database. It's what they're optimised for. Laurence

Roché Compaan wrote: Well, I finally realised that ZODB benchmarks are not going to fall from the sky, so compelled by a project that needs to scale to very large numbers and a general desire to have real numbers I started to write some benchmarks. My first goal was to get a baseline and test performance for the most basic operations like inserts and lookups. The first test tests BTree performance (OOBTree to be specific) and inserts instances of a persistent class into a BTree. Each instance has a single attribute that is 1K in size. The test tries out different commit intervals - the first iteration commits every 10 inserts, the second iteration commits every 100 inserts and the last one commits every 1000 inserts. I don't have results for the second and third iterations since the first iteration takes a couple of hours to complete and I'm still waiting for the results on the second and third iteration. The results so far are worrying in that performance deteriorates logarithmically. The test kicks off with a bang at close to 750 inserts per second, but after 1 million objects the insert rate drops to 260 inserts per second and at 10 million objects the rate is not even 60 inserts per second. Why?

In an attempt to determine if this drop in performance is normal I created a test with Postgres purely to observe transaction rate and not to compare it with the ZODB. In Postgres the transaction rate hovers around 2700 inserts throughout the test. There are periodic drops but I guess these are times when Postgres flushes to disc. I was hoping to have a consistent transaction rate in the ZODB too. See the attached image for the comparison. I also attach csv files of the data collected by both tests. During the last Plone conference I started a project called zodbbench, available here: https://svn.plone.org/svn/collective/collective.zodbbench The tests are written as unit tests and are run with a testrunner script. The project uses buildout to make it easy to get going. Unfortunately installing it with buildout on some systems seems to lead to weird import errors that I can't explain so I would appreciate it if somebody with buildout fu can look at it.
What I would appreciate more, though, is an explanation of the drop in performance or, alternatively, why the test is insane ;-) ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: AW: diploma thesis: ZODB Indexing
Christian Theune wrote: snip / We imagine we need two kinds of components to make this work:

1. A query processor that could look like:

    class IQueryProcessor(Interface):

        def query(...):
            """Returns a list of matching objects.

            The parameters are specific to the query processor in use.
            """

   Alternatively, as the signature of the only method isn't specified anyway, we could make each query processor define its own interface instead.

2. An object collection that serves two purposes:
   a) maintain indexes
   b) provide a low-level query API that is rich enough to let different query processors e.g. for SQL, xpath, ... work against them.

   This is the one that needs most work to get the separation of concerns right. One split we came up with are the responsibilities to define:
   - which objects to index
   - how to store the indexes
   - how to derive the structural relations between objects

   Those could be separated into individual components and make the object collection a component that joins those together.

On the definition of indexes: we're not sure whether a generic set of indexes will be sufficient (e.g. the three indexes from XISS - class index, attribute index, structural index) or do those need to be exchanged? For our ad-hoc querying we certainly don't want to have to set up specialised indexes to make things work, but maybe optional indexes could be used when possible -- just like RDBMS.

Make sure you take a look at SQLAlchemy's implementation of this, sqlalchemy.orm.query. RDBMSs do not get fast querying for free... They just revert to a complete record scan when they do not have an index - analogous to the find tab in the ZMI. As anyone who has ever queried such a database can attest, it ain't quick. (RDBMSs tend to create implicit indexes on primary and foreign keys also.) Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: checking what refers to an object in zodb
Chris, I think you're looking at forward references when you want to look at back references. This might help: http://plone.org/documentation/how-to/debug-zodb-bloat (you might have to change the refmap to be in a zodb with that much data though) Laurence

Chris Withers wrote: Hi All, We have a big(ish) zodb, which is about 29GB in size. Thanks to the laughable difficulty of getting larger disks in big corporates, we've been looking into what's taking up that 29GB and were a bit surprised by the results. Using space.py from the ZODBTools in Zope 2.9.4, it turns out that we have a lot of PersistentMapping's:

    990,359  13,555,382,871  Persistence.mapping.PersistentMapping

So, that's almost half of the 29GB! AT's default storage is a PersistentMapping called _md so this isn't too surprising. However, when looking into it, it turns out that half of the PersistentMapping's actually appear to be workflow_history's from DCWorkflow. To try and find out which objects were referencing all these workflow histories, we tried the following, starting with one of the oids of these histories:

    from ZODB.FileStorage import FileStorage
    from ZODB.serialize import referencesf

    fs = FileStorage(path, read_only=1)
    data, serialno = fs.load(oid, '')
    refs = referencesf(data)

To our surprise, all of the workflow histories returned an empty list for refs. What does this mean? Is there a bug that means these objects are hanging around even though there are no references? Are we using the wrong method to find references to these objects? (if it helps, we pack to 1 day and each pack removes between 0.5GB and 1GB from the overall size) If there's any more info that would be helpful here, please ask away... cheers, Chris ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
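A minimal sketch of building such a reverse-reference map offline with the FileStorage iterator (the variable names are illustrative; for a 29GB storage you would want to persist the mapping rather than hold it in a dict):

    from ZODB.FileStorage import FileIterator
    from ZODB.serialize import referencesf

    backrefs = {}
    for txn in FileIterator('Data.fs'):
        for record in txn:
            if record.data is None:
                continue   # e.g. undo records carry no pickle
            for ref in referencesf(record.data):
                # referenced oid -> oids whose pickles point at it
                backrefs.setdefault(ref, set()).add(record.oid)

backrefs[oid] then answers "who points at this workflow_history?", which is the question forward references cannot.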
[ZODB-Dev] Supporting a DataManager without Two Phase Commit
Hi, Several people have made SQLalchemy integrations recently. SQLAlchemy does not support Two Phase Commit (2PC) so correctly tying it in with zope's transactions is tricky. With multiple One Phase Commit (1PC) DataManagers the problem is of course intractable, but given the popularity of mappers like SQLAlchemy I think Zope should support a single 1PC DataManager. This websphere document describes a method to integrate a single 1PC resource with 2PC resources: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.express.doc/info/exp/lao/tasks/tla_ep.html Following a discussion with several of the sqlalchemy integration authors on #plone today we came up with the following hack to implement this: http://dev.plone.org/collective/browser/collective.lead/trunk/collective/lead/tx.py The DataManager is given a high sortKey to ensure that it is considered last, and commits in tpc_vote, before the other (2PC) DataManagers' tpc_finish methods are called. The hack obviously relies on only one DataManager making use of the trick. It would be nice to make this supported directly so that an error could be thrown when more than one 1PC DataManager joined a transaction. This could be implemented by changing the signature of transaction._transaction.Transaction.join to have an optional single_phase argument (default would be False). The 1PC resource would then be registered separately from the 2PC resources and _commitResources would call commit on the 1PC resource between tpc_vote and tpc_finish. If you think this would be helpful I'll try and supply a patch (need to look into the detail of how failed transactions are cleaned up). Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
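A bare-bones sketch of the trick described above (this is not the collective.lead code itself; the session object and its commit/rollback methods stand in for whatever the 1PC library provides):

    class OnePhaseDataManager(object):
        """Joins the zope transaction but really commits in tpc_vote."""

        def __init__(self, session):
            self.session = session   # hypothetical 1PC resource

        def sortKey(self):
            # '~' sorts after typical storage sort keys, so the 2PC
            # resources complete their first phase before we commit.
            return '~onephase:%d' % id(self)

        def abort(self, transaction):
            self.session.rollback()

        def tpc_begin(self, transaction):
            pass

        def commit(self, transaction):
            pass

        def tpc_vote(self, transaction):
            # The lone one-phase resource commits here: after every
            # 2PC resource has voted, but before any tpc_finish runs.
            # A failure raised here still aborts the zope transaction.
            self.session.commit()

        def tpc_finish(self, transaction):
            pass

        def tpc_abort(self, transaction):
            self.session.rollback()

Joining it with transaction.get().join(OnePhaseDataManager(session)) enrols it alongside the ZODB connection; because its sortKey sorts last, its tpc_vote runs after all the others.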
[ZODB-Dev] Re: KeyError / POSKeyError
You need to provide the full traceback so we can tell where it is coming from. My guess (though I'm surprised by the particular error) is that you have perhaps got content owned by users in a user folder outside the site that is no longer accessible when you mount the database on its own. If that is the case then you need to write a script to fix up the __ac_local_roles__ on the affected objects. Laurence

Tim Tisdall wrote: Here's the thing... I get a KeyError if that ZODB is on its own, but if I create a fammed-old object that's similar to what it's looking for, it will then throw a POSKeyError. The Plone instance was created fresh and then only the file contents of the old site were copied over to the new instance. The migration of the old Plone site didn't work, but it did manage to make it so I could access the files contained within and copy them over. I didn't copy over any stylings, products, users, widget things... I'm pretty sure I just copied over AT types and a few basic zope files (like DTML files and zope page templates). -Tim

On 3/23/07, Christian Theune [EMAIL PROTECTED] wrote: Hi, Can you tell whether you get a KeyError or a POSKeyError? If you get a KeyError, it's likely that the app (Plone) is broken, e.g. during the migration you mentioned. A POSKeyError would (very likely) not talk about a key like 'fammed-old', so I suspect you don't have a corruption in your storage/database but your application. Christian

On Friday, 23.03.2007, 12:04 -0400, Tim Tisdall wrote: I've got a 1gb ZODB that contains a single plone site and I'm not able to access any part of it via the ZMI. It keeps saying that it's looking for key fammed-old which is another plone site in another ZODB file. Basically I managed to partly migrate a Plone 2.0 to Plone 2.5 and then copied over the file contents from that instance into a new Plone instance. I have no idea why the new one would be referencing the old one, but it seemed to always throw this error if the old database was unmounted. I've tried several cookbook fixes I've found, but the problem is that the plone instance itself is throwing the KeyError. Deleting the whole plone instance is not going to help me much. Any suggestions? I've also tried running the fsrecovery.py, but it simply makes a complete duplicate of the file. fstest.py doesn't seem to find any errors. fsrefs.py finds a series of errors, but I have no idea what to do with that information. It seems that it's finding that it's referencing fammed-old and that that doesn't exist. -Tim

-- gocept gmbh co. kg - forsterstraße 29 - 06112 halle/saale - germany www.gocept.com - [EMAIL PROTECTED] - phone +49 345 122 9889 7 - fax +49 345 122 9889 1 - zope and plone consulting and development ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: roll back filestorage zodb to a certain date?
Jim Fulton wrote: snip / I wasn't asking about implementation. Here are some questions:

- Should this create a new FileStorage? Or should it modify the existing FileStorage in place?
Probably create a new one (analogous to a pack). Seems safer than truncating to me.
- Should this work while the FileStorage is being used?
I don't think this is important. If a new file is created it can open the existing one read-only anyhow.
- Should this behave transactional?
No need if it creates a new file.

However it's done, it'll sure beat the 'iterate through transactions to find the offset for a particular time, then dd to create a truncated copy' method that I use ;-) Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Re: History-less FileStorage?
I'm sure you're probably aware of these, but I thought I'd file this summary while they were in my head. There is no history-less FileStorage. It is essentially a transaction log.

DirectoryStorage has Minimal.py, which is history-less and very simple, though it is not proven in production. It could be a good candidate for storing the catalogue, though I imagine you would want to rebuild after an unclean shutdown of zope in this case. http://dirstorage.sourceforge.net/FAQ.html

BDBStorage never made it. http://wiki.zope.org/ZODB/BDBStorage.html

PGStorage does store the history. However it would be fairly simple to rework it not to (indeed it would simplify the code considerably). Performance is similar to or better than ZEO + FileStorage, though slower than local FileStorage. http://sourceforge.net/projects/pgstorage

Laurence

Stefan H. Holek wrote: Do we have a history-less (i.e. no-grow) FileStorage? Thanks, Stefan -- Anything that, in happening, causes something else to happen, causes something else to happen. --Douglas Adams ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev