Re: [ZODB-Dev] Some interesting (to some:) numbers
Hello Jim,

Tuesday, May 11, 2010, 8:36:46 PM, you wrote:

JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink wrote:
>> On Sun, May 9, 2010 at 8:33 PM, Jim Fulton wrote:
>>>
>>> Our recent discussion of compression made me curious, so I did some
>>> analysis of pickle sizes in one of our large databases. This is for a
>>> content management system. The database is packed weekly. It doesn't
>>> include media, which are in blobs.
>>>
>>> There were ~19 million transactions in the database and around 130
>>> million data records. About 60% of the size was taken up by BTrees.
>>> Compressing pickles using zlib with default compression reduced the
>>> pickle sizes by ~58%. The average uncompressed record size was 1163
>>> bytes; the average compressed size was ~493 bytes.
>>>
>>> This is probably enough of a savings to make compression interesting.

JF> ...

>> That's really interesting! Did you notice any issues performance-wise, or
>> didn't you check that yet?

JF> OK, I did some crude tests. It looks like compressing is a little
JF> less expensive than pickling, and decompressing is a little more
JF> expensive than unpickling, which is to say this is pretty cheap. For
JF> example, decompressing a data record took around 20 microseconds on my
JF> machine. A typical ZEO load takes tens of milliseconds. Even in Shane's
JF> zodb shootout benchmark, which loads data from RAM, load times are
JF> several hundred microseconds or more.

JF> I don't think compression will hurt performance. It is likely to
JF> help it in practice because:

JF> - There will be less data to send back and forth to remote servers.
JF> - Smaller databases will get more benefit from disk caches.
JF>   (Databases will be more likely to fit on SSDs.)
JF> - ZEO caches (and RelStorage memcached caches) will be able to hold
JF>   more object records.

I was thinking about using other compressors. I found this:
http://tukaani.org/lzma/benchmarks.html
It seems gzip/zlib is the fastest, at some expense of compression ratio.
--
Best regards,
Adam GROSZER  mailto:agros...@gmail.com
--
Quote of the day:
What this country needs is a good five-cent microcomputer.

___
For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage (even with cache-size-bytes)
Thanks Laurence, this looks really helpful. The simplicity of ZODB's concept and the joy of using it apparently hide some of the complexity necessary to use it efficiently. I'll check this out when I circle back to data stuff tomorrow. Have a great morning/day/evening!

-Ryan

On Tue, May 11, 2010 at 5:44 PM, Laurence Rowe wrote:
> I think this means that you are storing all of your data in a single
> persistent object, the database root PersistentMapping. You need to
> break up your data into persistent objects (instances of classes that
> inherit from persistent.Persistent) for the ZODB to have a chance of
> managing memory. You want to do something like:
>
> import transaction
> from ZODB import FileStorage, DB
> from BTrees.LOBTree import BTree, TreeSet
>
> storage = FileStorage.FileStorage('/tmp/test-filestorage.fs')
> db = DB(storage)
> conn = db.open()
> root = conn.root()
> transaction.begin()
> index = root['index'] = BTree()
> values = index[1] = TreeSet()
> values.add(42)
> transaction.commit()
>
> You should probably read
> http://www.zodb.org/documentation/guide/modules.html#btrees-package.
> Since that was written, L variants of the BTree types have been
> introduced for storing 64-bit integers. I'm using an LOBTree because
> that maps 64-bit integers to Python objects. For values I'm using an
> LOTreeSet, though you could also use an LLTreeSet (which has larger
> buckets).
>
> Laurence
>
> On 12 May 2010 00:37, Ryan Noon wrote:
> > Hi Jim,
> > I'm really sorry for the miscommunication, I thought I made that clear in my
> > last email:
> > "I'm wrapping ZODB in a 'ZMap' class that just forwards all the dictionary
> > methods to the ZODB root and allows easy interchangeability with my old
> > sqlite OODB abstraction."
> > wordid_to_docset is a "ZMap", which just wraps the ZODB
> > boilerplate/connection and forwards dictionary methods to the root. If this
> > seems superfluous, it was just to maintain backwards compatibility with all
> > of the code I'd already written for the sqlite OODB I was using before I
> > switched to ZODB. Whenever you see something like wordid_to_docset[id] it's
> > just doing self.root[id] behind the scenes in a __setitem__ call inside the
> > ZMap class, which I've pasted below.
> > The db is just storing longs mapped to array('L')'s with a few thousand
> > longs in em. I'm going to try switching to the persistent data structure
> > that Laurence suggested (a pointer to relevant documentation would be really
> > useful), but I'm still sorta worried because in my experimentation with ZODB
> > so far I've never been able to observe it sticking to any cache limits, no
> > matter how often I tell it to garbage collect (even when storing very small
> > values that should give it adequate granularity...see my experiment at the
> > end of my last email). If the memory reported to the OS by Python 2.6 is
> > the problem I'd understand, but memory usage goes up the second I start
> > adding new things (which indicates that Python is asking for more and not
> > actually freeing internally, no?).
> > If you feel there's something pathological about my memory access patterns
> > in this operation I can just do the actual inversion step in Hadoop and load
> > the output into ZODB for my application later, I was just hoping to keep all
> > of my data in OODB's the entire time.
> > Thanks again all of you for your collective time. I really like ZODB so
> > far, and it bugs me that I'm likely screwing it up somewhere.
> > Cheers,
> > Ryan
> >
> > class ZMap(object):
> >
> >     def __init__(self, name=None, dbfile=None, cache_size_mb=512,
> >                  autocommit=True):
> >         self.name = name
> >         self.dbfile = dbfile
> >         self.autocommit = autocommit
> >
> >         self.__hash__ = None  # can't hash this
> >
> >         # first things first, figure out if we need to make up a name
> >         if self.name == None:
> >             self.name = make_up_name()
> >         if sep in self.name:
> >             if self.name[-1] == sep:
> >                 self.name = self.name[:-1]
> >             self.name = self.name.split(sep)[-1]
> >
> >         if self.dbfile == None:
> >             self.dbfile = self.name + '.zdb'
> >
> >         self.storage = FileStorage(self.dbfile, pack_keep_old=False)
> >         self.cache_size = cache_size_mb * 1024 * 1024
> >
> >         self.db = DB(self.storage, pool_size=1,
> >                      cache_size_bytes=self.cache_size,
> >                      historical_cache_size_bytes=self.cache_size,
> >                      database_name=self.name)
> >         self.connection = self.db.open()
> >         self.root = self.connection.root()
> >
> >         print 'Initializing ZMap "%s" in file "%s" with %dmb cache. Current %d items' % (
> >             self.name, self.dbfile, cache_size_mb, len(self.root))
> >
> >     # basic operators
> >     def __eq__(self, y):  # x == y
> >         return self.root.__eq__(y)
> >     def __ge__(self, y):  # x >= y
> >         return len(self) >= len(y)
Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage (even with cache-size-bytes)
I think this means that you are storing all of your data in a single
persistent object, the database root PersistentMapping. You need to
break up your data into persistent objects (instances of classes that
inherit from persistent.Persistent) for the ZODB to have a chance of
managing memory. You want to do something like:

    import transaction
    from ZODB import FileStorage, DB
    from BTrees.LOBTree import BTree, TreeSet

    storage = FileStorage.FileStorage('/tmp/test-filestorage.fs')
    db = DB(storage)
    conn = db.open()
    root = conn.root()
    transaction.begin()
    index = root['index'] = BTree()
    values = index[1] = TreeSet()
    values.add(42)
    transaction.commit()

You should probably read
http://www.zodb.org/documentation/guide/modules.html#btrees-package.
Since that was written, L variants of the BTree types have been
introduced for storing 64-bit integers. I'm using an LOBTree because
that maps 64-bit integers to Python objects. For values I'm using an
LOTreeSet, though you could also use an LLTreeSet (which has larger
buckets).

Laurence

On 12 May 2010 00:37, Ryan Noon wrote:
> Hi Jim,
> I'm really sorry for the miscommunication, I thought I made that clear in my
> last email:
> "I'm wrapping ZODB in a 'ZMap' class that just forwards all the dictionary
> methods to the ZODB root and allows easy interchangeability with my old
> sqlite OODB abstraction."
> wordid_to_docset is a "ZMap", which just wraps the ZODB
> boilerplate/connection and forwards dictionary methods to the root. If this
> seems superfluous, it was just to maintain backwards compatibility with all
> of the code I'd already written for the sqlite OODB I was using before I
> switched to ZODB. Whenever you see something like wordid_to_docset[id] it's
> just doing self.root[id] behind the scenes in a __setitem__ call inside the
> ZMap class, which I've pasted below.
> The db is just storing longs mapped to array('L')'s with a few thousand
> longs in em. I'm going to try switching to the persistent data structure
> that Laurence suggested (a pointer to relevant documentation would be really
> useful), but I'm still sorta worried because in my experimentation with ZODB
> so far I've never been able to observe it sticking to any cache limits, no
> matter how often I tell it to garbage collect (even when storing very small
> values that should give it adequate granularity...see my experiment at the
> end of my last email). If the memory reported to the OS by Python 2.6 is
> the problem I'd understand, but memory usage goes up the second I start
> adding new things (which indicates that Python is asking for more and not
> actually freeing internally, no?).
> If you feel there's something pathological about my memory access patterns
> in this operation I can just do the actual inversion step in Hadoop and load
> the output into ZODB for my application later, I was just hoping to keep all
> of my data in OODB's the entire time.
> Thanks again all of you for your collective time. I really like ZODB so
> far, and it bugs me that I'm likely screwing it up somewhere.
> Cheers,
> Ryan
>
> class ZMap(object):
>
>     def __init__(self, name=None, dbfile=None, cache_size_mb=512,
>                  autocommit=True):
>         self.name = name
>         self.dbfile = dbfile
>         self.autocommit = autocommit
>
>         self.__hash__ = None  # can't hash this
>
>         # first things first, figure out if we need to make up a name
>         if self.name == None:
>             self.name = make_up_name()
>         if sep in self.name:
>             if self.name[-1] == sep:
>                 self.name = self.name[:-1]
>             self.name = self.name.split(sep)[-1]
>
>         if self.dbfile == None:
>             self.dbfile = self.name + '.zdb'
>
>         self.storage = FileStorage(self.dbfile, pack_keep_old=False)
>         self.cache_size = cache_size_mb * 1024 * 1024
>
>         self.db = DB(self.storage, pool_size=1,
>                      cache_size_bytes=self.cache_size,
>                      historical_cache_size_bytes=self.cache_size,
>                      database_name=self.name)
>         self.connection = self.db.open()
>         self.root = self.connection.root()
>
>         print 'Initializing ZMap "%s" in file "%s" with %dmb cache. Current %d items' % (
>             self.name, self.dbfile, cache_size_mb, len(self.root))
>
>     # basic operators
>     def __eq__(self, y):  # x == y
>         return self.root.__eq__(y)
>     def __ge__(self, y):  # x >= y
>         return len(self) >= len(y)
>     def __gt__(self, y):  # x > y
>         return len(self) > len(y)
>     def __le__(self, y):  # x <= y
>         return not self.__gt__(y)
>     def __lt__(self, y):  # x < y
>         return not self.__ge__(y)
>     def __len__(self):  # len(x)
>         return len(self.root)
>
>     # dictionary stuff
>     def __getitem__(self, key):  # x[key]
>         return self.root[key]
>     def __setitem__(self, key, value):  # x[key] = value
>         self.root[key] = value
>         self.__commit_check()  # write back if necessary
>     def __delitem__(self, key):  # del x[key]
>         del self.root[key]
Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage (even with cache-size-bytes)
Hi Jim,

I'm really sorry for the miscommunication, I thought I made that clear in my last email:

"I'm wrapping ZODB in a 'ZMap' class that just forwards all the dictionary methods to the ZODB root and allows easy interchangeability with my old sqlite OODB abstraction."

wordid_to_docset is a "ZMap", which just wraps the ZODB boilerplate/connection and forwards dictionary methods to the root. If this seems superfluous, it was just to maintain backwards compatibility with all of the code I'd already written for the sqlite OODB I was using before I switched to ZODB. Whenever you see something like wordid_to_docset[id] it's just doing self.root[id] behind the scenes in a __setitem__ call inside the ZMap class, which I've pasted below.

The db is just storing longs mapped to array('L')'s with a few thousand longs in em. I'm going to try switching to the persistent data structure that Laurence suggested (a pointer to relevant documentation would be really useful), but I'm still sorta worried because in my experimentation with ZODB so far I've never been able to observe it sticking to any cache limits, no matter how often I tell it to garbage collect (even when storing very small values that should give it adequate granularity...see my experiment at the end of my last email). If the memory reported to the OS by Python 2.6 is the problem I'd understand, but memory usage goes up the second I start adding new things (which indicates that Python is asking for more and not actually freeing internally, no?).

If you feel there's something pathological about my memory access patterns in this operation I can just do the actual inversion step in Hadoop and load the output into ZODB for my application later, I was just hoping to keep all of my data in OODB's the entire time.

Thanks again all of you for your collective time. I really like ZODB so far, and it bugs me that I'm likely screwing it up somewhere.
Cheers,
Ryan


class ZMap(object):

    def __init__(self, name=None, dbfile=None, cache_size_mb=512,
                 autocommit=True):
        self.name = name
        self.dbfile = dbfile
        self.autocommit = autocommit

        self.__hash__ = None  # can't hash this

        # first things first, figure out if we need to make up a name
        if self.name == None:
            self.name = make_up_name()
        if sep in self.name:
            if self.name[-1] == sep:
                self.name = self.name[:-1]
            self.name = self.name.split(sep)[-1]

        if self.dbfile == None:
            self.dbfile = self.name + '.zdb'

        self.storage = FileStorage(self.dbfile, pack_keep_old=False)
        self.cache_size = cache_size_mb * 1024 * 1024

        self.db = DB(self.storage, pool_size=1,
                     cache_size_bytes=self.cache_size,
                     historical_cache_size_bytes=self.cache_size,
                     database_name=self.name)
        self.connection = self.db.open()
        self.root = self.connection.root()

        print 'Initializing ZMap "%s" in file "%s" with %dmb cache. Current %d items' % (
            self.name, self.dbfile, cache_size_mb, len(self.root))

    # basic operators
    def __eq__(self, y):  # x == y
        return self.root.__eq__(y)
    def __ge__(self, y):  # x >= y
        return len(self) >= len(y)
    def __gt__(self, y):  # x > y
        return len(self) > len(y)
    def __le__(self, y):  # x <= y
        return not self.__gt__(y)
    def __lt__(self, y):  # x < y
        return not self.__ge__(y)
    def __len__(self):  # len(x)
        return len(self.root)

    # dictionary stuff
    def __getitem__(self, key):  # x[key]
        return self.root[key]
    def __setitem__(self, key, value):  # x[key] = value
        self.root[key] = value
        self.__commit_check()  # write back if necessary
    def __delitem__(self, key):  # del x[key]
        del self.root[key]
    def get(self, key, default=None):  # x[key] if key in x, else default
        return self.root.get(key, default)
    def has_key(self, key):  # True if x has key, else False
        return self.root.has_key(key)
    def items(self):  # list of key/val pairs
        return self.root.items()
    def keys(self):
        return self.root.keys()
    def pop(self, key, default=None):
        return self.root.pop(key, default)
    def popitem(self):  # remove and return an arbitrary key/val pair
        return self.root.popitem()
    def setdefault(self, key, default=None):
        # D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
        return self.root.setdefault(key, default)
    def values(self):
        return self.root.values()
    def copy(self):  # copy it? dubiously necessary at the moment
        NOT_IMPLEMENTED('copy')

    # iteration
    def __iter__(self):  # iter(x)
        return self.root.iterkeys()
    def iteritems(self):  # iterator over items, this can be hella optimized
        return self.root.iteritems()
    def itervalues(self):
        return self.root.itervalues()
    def iterkeys(self):
        return self.root.iterkeys()

    # prac
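The ZMap above hangs every key off the root mapping, so ZODB stores the whole mapping as one data record. A rough, self-contained illustration of why that defeats the cache (using pickle size as a stand-in for ZODB record size; the numbers and shapes here are made up, not from Ryan's data):

```python
import pickle

# One big mapping, as when everything lives in the root PersistentMapping:
# ZODB would serialize this as a single record, so any change rewrites the
# whole pickle, and the cache can only hold or evict it as one unit.
big = {i: list(range(20)) for i in range(10000)}
one_record = len(pickle.dumps(big, 2))

# The same data split into many small chunks, which is roughly what a
# BTree does with its buckets: each bucket is a separate persistent
# object with its own small record, loaded and evicted independently.
buckets = [{i: list(range(20)) for i in range(start, start + 100)}
           for start in range(0, 10000, 100)]
largest_bucket = max(len(pickle.dumps(b, 2)) for b in buckets)

print(one_record, largest_bucket)
```

With many small records, cache-size-bytes has something fine-grained to evict; with one giant record, it does not.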
Re: [ZODB-Dev] Some interesting (to some:) numbers
On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink wrote:
> On Sun, May 9, 2010 at 8:33 PM, Jim Fulton wrote:
>>
>> Our recent discussion of compression made me curious, so I did some
>> analysis of pickle sizes in one of our large databases. This is for a
>> content management system. The database is packed weekly. It doesn't
>> include media, which are in blobs.
>>
>> There were ~19 million transactions in the database and around 130
>> million data records. About 60% of the size was taken up by BTrees.
>> Compressing pickles using zlib with default compression reduced the
>> pickle sizes by ~58%. The average uncompressed record size was 1163
>> bytes; the average compressed size was ~493 bytes.
>>
>> This is probably enough of a savings to make compression interesting.

...

> That's really interesting! Did you notice any issues performance-wise, or
> didn't you check that yet?

OK, I did some crude tests. It looks like compressing is a little
less expensive than pickling, and decompressing is a little more
expensive than unpickling, which is to say this is pretty cheap. For
example, decompressing a data record took around 20 microseconds on my
machine. A typical ZEO load takes tens of milliseconds. Even in Shane's
zodb shootout benchmark, which loads data from RAM, load times are
several hundred microseconds or more.

I don't think compression will hurt performance. It is likely to
help it in practice because:

- There will be less data to send back and forth to remote servers.
- Smaller databases will get more benefit from disk caches.
  (Databases will be more likely to fit on SSDs.)
- ZEO caches (and RelStorage memcached caches) will be able to hold
  more object records.

Jim

--
Jim Fulton
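The kind of measurement described above can be sketched in a few lines; the record below is an invented stand-in for a persistent object's state (Jim's analysis iterated over real FileStorage data records):

```python
import pickle
import zlib

# Hypothetical record, standing in for one persistent object's state.
record = {"title": "Some document",
          "body": "lorem ipsum dolor sit amet " * 40,
          "tags": ["cms", "zodb", "compression"]}

pickled = pickle.dumps(record, protocol=2)
compressed = zlib.compress(pickled)  # default compression level

savings = 1.0 - float(len(compressed)) / len(pickled)
print(len(pickled), len(compressed), savings)

# Round-tripping restores the original pickle bytes exactly.
assert zlib.decompress(compressed) == pickled
```

Real savings depend heavily on record contents; repetitive text like this compresses far better than dense BTree buckets might.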
Re: [ZODB-Dev] Some interesting (to some:) numbers
Adam GROSZER wrote:
> Hello Jim,
>
> Tuesday, May 11, 2010, 4:46:46 PM, you wrote:
>
> JF> On Tue, May 11, 2010 at 7:47 AM, Adam GROSZER wrote:
>>> Hello Jim,
>>>
>>> Tuesday, May 11, 2010, 1:37:19 PM, you wrote:
>>>
>>> JF> On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER wrote:
>>>> Hello Jim,
>>>>
>>>> Tuesday, May 11, 2010, 12:33:04 PM, you wrote:
>>>>
>>>> JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER wrote:
>>>>> Hello Jim,
>>>>>
>>>>> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>>>>>
>>>>> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink wrote:
>>>>>> That's really interesting! Did you notice any issues performance-wise,
>>>>>> or didn't you check that yet?
>>>>>
>>>>> JF> I didn't check performance. I just iterated over a file storage file,
>>>>> JF> checking compressed and uncompressed pickle sizes.
>>>>>
>>>>> I'd say some checksum is then also needed to detect bit failures that
>>>>> mess up the compressed data.
>>>>
>>>> JF> Why?
>>>>
>>>> I think the gzip algo compresses to a bit-stream, where if even one bit
>>>> has an error the rest of the uncompressed data might be a total mess.
>>>> If that one bit is relatively early in the stream it's fatal.
>>>> Salvaging the data is not a joy either.
>>>> I know at this level we should expect that the OS and any underlying
>>>> infrastructure should provide error-free data or fail.
>>>> Though I've seen some magic situations where a file copied without
>>>> error through a network, but at the end the CRC check failed on it :-O
>>>
>>> JF> How would a checksum help? All it would do is tell you you're hosed.
>>> JF> It wouldn't make you any less hosed.
>>>
>>> Yes, but I would know why it's hosed.
>
> JF> How so? How would you know why it is hosed?
>
> Because of data corruption in the compressed stream.
>
> JF> Note BTW that the zlib format already includes a checksum.
>
> JF> http://www.faqs.org/rfcs/rfc1950.html
>
> I missed that. Case closed then ;-) Sorry for the noise.
A zipped file is no different from other (binary) data stored within the ZODB: data corruption can always occur, zipped or not.

Side note: we implemented compression support as part of our CMS on the application layer, where we store large binary files as linked PData chains. However, we do not compress in every case - only for certain content types (it does not make sense to compress zip or jar files). We also store an md5 hash for each object (and have never had a corruption issue so far).

Andreas
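The approach Andreas describes can be sketched as follows; the function names, the flag convention, and the content-type list are all illustrative, not taken from any particular CMS:

```python
import hashlib
import zlib

# Content types that are already compressed and not worth recompressing
# (an illustrative list; a real system would tune this).
SKIP_COMPRESSION = {"application/zip", "application/java-archive",
                    "image/jpeg", "image/png"}

def store_blob(data, content_type):
    """Return (payload, compressed_flag, md5_hex) for storage."""
    digest = hashlib.md5(data).hexdigest()  # integrity check on retrieval
    if content_type in SKIP_COMPRESSION:
        return data, False, digest
    return zlib.compress(data), True, digest

def load_blob(payload, compressed, md5_hex):
    """Reverse store_blob, verifying the stored hash."""
    data = zlib.decompress(payload) if compressed else payload
    if hashlib.md5(data).hexdigest() != md5_hex:
        raise ValueError("stored data is corrupt")
    return data
```

Storing the hash of the *uncompressed* bytes, as here, lets the check survive a change of compression scheme.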
Re: [ZODB-Dev] Problem with handling of data managers that join transactions after savepoints
On 5/11/10 19:41, Chris Withers wrote:
> Jim Fulton wrote:
>>>> I plan to implement A soon if there are no objections.
>>>>
>>>> Unless someone somehow convinced me to do D, I'll also add an
>>>> assertion in the Transaction.join method to raise an error if a data
>>>> manager joins more than once.
>>> Option A sounds sensible. It also means I won't have to change
>>> anything in the zope.sqlalchemy data manager.
>>
>> Very cool. I was hoping non-ZODB-data-manager authors
>> were paying attention. :)
>>
>> If anyone knows of any other, I would appreciate someone forwarding
>> this thread to them.
>
> zope.sendmail and MaildropHost have data managers.
> I've seen some file-based things that were data managers, and a few
> others in people's BFG stacks; maybe they can pipe up and/or let the
> others of those WSGI components know.

I wrote repoze.filesafe a while ago to do transaction-aware creation of
files, which uses a datamanager. It never tries to join a transaction
more than once, so should be fine.

Wichert.
Re: [ZODB-Dev] Problem with handling of data managers that join transactions after savepoints
Jim Fulton wrote:
>>> I plan to implement A soon if there are no objections.
>>>
>>> Unless someone somehow convinced me to do D, I'll also add an
>>> assertion in the Transaction.join method to raise an error if a
>>> data manager joins more than once.
>> Option A sounds sensible. It also means I won't have to change
>> anything in the zope.sqlalchemy data manager.
>
> Very cool. I was hoping non-ZODB-data-manager authors
> were paying attention. :)
>
> If anyone knows of any other, I would appreciate someone forwarding
> this thread to them.

zope.sendmail and MaildropHost have data managers.
I've seen some file-based things that were data managers, and a few
others in people's BFG stacks; maybe they can pipe up and/or let the
others of those WSGI components know.

I'm also likely about to write one, but I'm dumb, so can't comment
meaningfully on the options ;-)

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
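The join-once assertion Jim proposes for Transaction.join can be sketched with a tiny stand-in transaction; this is not the transaction package's actual code, just an illustration of the guard:

```python
class Transaction:
    """Tiny stand-in for transaction.Transaction, just enough to show
    the proposed join-once assertion."""

    def __init__(self):
        self._resources = []  # joined data managers

    def join(self, datamanager):
        # The proposed guard: a data manager may join a given
        # transaction at most once.
        if datamanager in self._resources:
            raise ValueError("data manager joined twice")
        self._resources.append(datamanager)


class DummyDataManager:
    """Placeholder; a real data manager would also implement the
    two-phase commit methods (tpc_begin, commit, tpc_vote, ...)."""


t = Transaction()
dm = DummyDataManager()
t.join(dm)
try:
    t.join(dm)  # second join of the same dm is rejected
    rejoined = True
except ValueError:
    rejoined = False
```

Data managers like zope.sqlalchemy's, which track whether they have already joined, would never trip such a guard.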
Re: [ZODB-Dev] Automating retry management
On Tue, May 11, 2010 at 11:35 AM, Laurence Rowe wrote:
> On 11 May 2010 15:08, Jim Fulton wrote:
>> On Tue, May 11, 2010 at 8:38 AM, Benji York wrote:
>>> On Tue, May 11, 2010 at 7:34 AM, Jim Fulton wrote:
>>>> [...] The best I've been
>>>> able to come up with is something like:
>>>>
>>>> t = ZODB.transaction(3)
>>>> while t.trying:
>>>>     with t:
>>>>         ... transaction body ...
>>>
>>> I think you could get this to work:
>>>
>>> for transaction in ZODB.retries(3):
>>>     with transaction:
>>>         ... transaction body ...
>>>
>>> ZODB.retries would return an iterator that would raise StopIteration on
>>> the next go-round if the previously yielded context manager exited
>>> without a ConflictError.
>>
>> This is an improvement. It's still unsatisfying, but I don't think I'm
>> going to get satisfaction. :)
>>
>> BTW, if I do something like this, I think I'll add a retry exception to
>> the transaction package and have ZODB.POSException.ConflictError
>> extend it so I can add the retry automation to the transaction package.
>
> The repoze.retry package lets you configure a list of exceptions.
> http://pypi.python.org/pypi/repoze.retry
> http://svn.repoze.org/repoze.retry/trunk/repoze/retry/__init__.py
>
> Though it seems inspecting the error text is required for most SQL
> database errors to know if they are retryable, as ZPsycopgDA does:
>
>     except (psycopg2.ProgrammingError, psycopg2.IntegrityError), e:
>         if e.args[0].find("concurrent update") > -1:
>             raise ConflictError
>
> (https://dndg.it/cgi-bin/gitweb.cgi?p=public/psycopg2.git;a=blob;f=ZPsycopgDA/db.py)
>
> For PostgreSQL it should be sufficient to catch these errors and raise
> Retry during tpc_vote.
>
> For databases which do not provide MVCC in the same way as PostgreSQL,
> concurrency errors could be manifested at any point in the
> transaction. Even Oracle can raise an error during a long-running
> transaction when insufficient rollback space is available, resulting
> in what is essentially a read conflict error. Such errors could not be
> caught by a data manager and reraised as a Retry exception.
>
> I think it might be useful to add an optional method to data managers
> that is queried by the retry automation machinery to see if an
> exception should potentially be retried. Perhaps this would best be
> accomplished in two steps:
>
> 1. Add an optional property to data managers called ``retryable``.
> This is a list of potentially retryable exceptions. When a data
> manager is added to the transaction, the transaction's list of
> retryable exceptions is extended by the joining data manager's list of
> retryable exceptions.
>
>     t = transaction.begin()
>     try:
>         application()
>     except t.retryable, e:
>         t.retry(e)
>
> 2. t.retry(e) then checks with each registered data manager whether that
> particular exception is retryable, and if so raises Retry.
>
>     def retry(self, e):
>         for datamanager in self._resources:
>             try:
>                 retry = datamanager.retry
>             except AttributeError:
>                 continue
>             if isinstance(e, datamanager.retryable):
>                 datamanager.retry(e)  # dm may raise Retry here

Thanks. I don't think we need 1 and 2. I'm inclined to go with 2.

Jim

--
Jim Fulton
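Benji's ZODB.retries idea quoted above can be sketched as a generator of context managers; everything here (the class, the exception, the early-stop signalling) is an illustrative mock, not an actual ZODB API:

```python
class ConflictError(Exception):
    """Stand-in for ZODB.POSException.ConflictError."""


class _Attempt(object):
    """Context manager for one attempt; swallows only ConflictError."""

    def __init__(self):
        self.conflicted = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.conflicted = (exc_type is not None
                           and issubclass(exc_type, ConflictError))
        return self.conflicted  # True suppresses the ConflictError


def retries(n):
    """Yield context managers until the body succeeds, at most n times;
    re-raise ConflictError once the attempts are exhausted."""
    for _ in range(n):
        attempt = _Attempt()
        yield attempt
        if not attempt.conflicted:
            return  # previous body succeeded: stop iterating
    raise ConflictError("retried %d times" % n)


# Usage: the body runs again only if it raised ConflictError.
attempts = []
for t in retries(3):
    with t:
        attempts.append(1)
        if len(attempts) < 3:
            raise ConflictError()
```

A real implementation would also abort/commit the transaction around each attempt; that bookkeeping is omitted here.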
Re: [ZODB-Dev] Automating retry management
On 11 May 2010 15:08, Jim Fulton wrote:
> On Tue, May 11, 2010 at 8:38 AM, Benji York wrote:
>> On Tue, May 11, 2010 at 7:34 AM, Jim Fulton wrote:
>>> [...] The best I've been
>>> able to come up with is something like:
>>>
>>> t = ZODB.transaction(3)
>>> while t.trying:
>>>     with t:
>>>         ... transaction body ...
>>
>> I think you could get this to work:
>>
>> for transaction in ZODB.retries(3):
>>     with transaction:
>>         ... transaction body ...
>>
>> ZODB.retries would return an iterator that would raise StopIteration on
>> the next go-round if the previously yielded context manager exited
>> without a ConflictError.
>
> This is an improvement. It's still unsatisfying, but I don't think I'm
> going to get satisfaction. :)
>
> BTW, if I do something like this, I think I'll add a retry exception to
> the transaction package and have ZODB.POSException.ConflictError
> extend it so I can add the retry automation to the transaction package.

The repoze.retry package lets you configure a list of exceptions.
http://pypi.python.org/pypi/repoze.retry
http://svn.repoze.org/repoze.retry/trunk/repoze/retry/__init__.py

Though it seems inspecting the error text is required for most SQL
database errors to know if they are retryable, as ZPsycopgDA does:

    except (psycopg2.ProgrammingError, psycopg2.IntegrityError), e:
        if e.args[0].find("concurrent update") > -1:
            raise ConflictError

(https://dndg.it/cgi-bin/gitweb.cgi?p=public/psycopg2.git;a=blob;f=ZPsycopgDA/db.py)

For PostgreSQL it should be sufficient to catch these errors and raise
Retry during tpc_vote.

For databases which do not provide MVCC in the same way as PostgreSQL,
concurrency errors could be manifested at any point in the
transaction. Even Oracle can raise an error during a long-running
transaction when insufficient rollback space is available, resulting
in what is essentially a read conflict error. Such errors could not be
caught by a data manager and reraised as a Retry exception.

I think it might be useful to add an optional method to data managers
that is queried by the retry automation machinery to see if an
exception should potentially be retried. Perhaps this would best be
accomplished in two steps:

1. Add an optional property to data managers called ``retryable``.
This is a list of potentially retryable exceptions. When a data
manager is added to the transaction, the transaction's list of
retryable exceptions is extended by the joining data manager's list of
retryable exceptions.

    t = transaction.begin()
    try:
        application()
    except t.retryable, e:
        t.retry(e)

2. t.retry(e) then checks with each registered data manager whether that
particular exception is retryable, and if so raises Retry.

    def retry(self, e):
        for datamanager in self._resources:
            try:
                retry = datamanager.retry
            except AttributeError:
                continue
            if isinstance(e, datamanager.retryable):
                datamanager.retry(e)  # dm may raise Retry here

Laurence
Re: [ZODB-Dev] Automating retry management
On 11.05.2010 17:08, Nitro wrote:
> On 11.05.2010 16:01, Jim Fulton wrote:
>
>> This wouldn't work. You would need to re-execute the suite
>> for each retry. It's not enough to just keep committing the same
>> transaction. (There are other details wrong with the code above,
>> but they are fixable.) Python doesn't provide a way to keep
>> executing the suite.
>
> You are right.
>
> The only thing I could come up with was something like below, using a
> decorator instead of a context.
>
> -Matthias
>
> @doTransaction(count=5)
> def storeData():
>     ... store data here ...
>
> def doTransaction(transaction=None, count=3):
>     def decorator(func):
>         def do():
>             for i in range(1 + count):
>                 try:
>                     func()
>                 except:
>                     transaction.abort()
>                     raise
>                 try:
>                     transaction.commit()
>                 except ConflictError:
>                     if i == count:
>                         raise
>                 else:
>                     return
>         return do

This should read "return do()", i.e. the decorator should directly execute
the storeData function.

All in all I think Benji's proposal looks better :-)

-Matthias
Re: [ZODB-Dev] Automating retry management
On 11.05.2010 16:01, Jim Fulton wrote:

> This wouldn't work. You would need to re-execute the suite
> for each retry. It's not enough to just keep committing the same
> transaction. (There are other details wrong with the code above,
> but they are fixable.) Python doesn't provide a way to keep
> executing the suite.

You are right.

The only thing I could come up with was something like below, using a
decorator instead of a context.

-Matthias

@doTransaction(count=5)
def storeData():
    ... store data here ...

def doTransaction(transaction=None, count=3):
    def decorator(func):
        def do():
            for i in range(1 + count):
                try:
                    func()
                except:
                    transaction.abort()
                    raise
                try:
                    transaction.commit()
                except ConflictError:
                    if i == count:
                        raise
                else:
                    return
        return do
    return decorator
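The retry loop in the decorator above can be exercised without a real database by substituting a mock transaction; ConflictError, FakeTransaction, and do_transaction below are all stand-ins written for this sketch, not ZODB APIs:

```python
class ConflictError(Exception):
    """Stand-in for ZODB.POSException.ConflictError."""


class FakeTransaction:
    """Mock of the transaction module's commit/abort, programmed to
    conflict a fixed number of times before succeeding."""

    def __init__(self, conflicts=0):
        self.conflicts = conflicts
        self.commits = 0
        self.aborts = 0

    def commit(self):
        if self.conflicts > 0:
            self.conflicts -= 1
            raise ConflictError()
        self.commits += 1

    def abort(self):
        self.aborts += 1


def do_transaction(transaction, count=3):
    """Same structure as the posted decorator: re-run the body on
    ConflictError, up to `count` retries."""
    def decorator(func):
        def do():
            for i in range(1 + count):
                try:
                    func()
                except Exception:
                    transaction.abort()  # body failed: abort and re-raise
                    raise
                try:
                    transaction.commit()
                except ConflictError:
                    if i == count:
                        raise  # retries exhausted
                else:
                    return  # committed successfully
        return do
    return decorator


txn = FakeTransaction(conflicts=2)
calls = []

@do_transaction(txn, count=3)
def store_data():
    calls.append(1)

store_data()  # body runs three times: two conflicts, then one commit
```

This mirrors the original's key property: the suite is re-executed on each retry, not merely re-committed.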
Re: [ZODB-Dev] Some interesting (to some:) numbers
Hello Jim, Tuesday, May 11, 2010, 4:46:46 PM, you wrote: JF> On Tue, May 11, 2010 at 7:47 AM, Adam GROSZER wrote: >> Hello Jim, >> >> Tuesday, May 11, 2010, 1:37:19 PM, you wrote: >> >> JF> On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER wrote: Hello Jim, Tuesday, May 11, 2010, 12:33:04 PM, you wrote: JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER wrote: >> Hello Jim, >> >> Monday, May 10, 2010, 1:27:00 PM, you wrote: >> >> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink >> wrote: That's really interesting! Did you notice any issues performance wise, or didn't you check that yet? >> >> JF> I didn't check performance. I just iterated over a file storage file, >> JF> checking compressed and uncompressed pickle sizes. >> >> I'd say some checksum is then also needed to detect bit failures that >> mess up the compressed data. JF> Why? I think the gzip algo compresses to a bit-stream, where even one bit has an error the rest of the uncompressed data might be a total mess. If that one bit is relatively early in the stream it's fatal. Salvaging the data is not a joy either. I know at this level we should expect that the OS and any underlying infrastructure should provide error-free data or fail. Tho I've seen some magic situations where the file copied without error through a network, but at the end CRC check failed on it :-O >> >> JF> How would a checksum help? All it would do is tell you your hosed. >> JF> It wouldn't make you any less hosed. >> >> Yes, but I would know why it's hosed. JF> How so? How would you know why it is hosed. Because of data corruption in the compressed stream. JF> Note BTW that the zlib format already includes a checksum. JF> http://www.faqs.org/rfcs/rfc1950.html I missed that. Case closed then ;-) Sorry for the noise. -- Best regards, Adam GROSZERmailto:agros...@gmail.com -- Quote of the day: Some are atheists by neglect; others are so by affectation; they that think there is no God at some times do not think so at all times. 
- Benjamin Whichcote
Re: [ZODB-Dev] Some interesting (to some:) numbers
On Tue, May 11, 2010 at 7:47 AM, Adam GROSZER wrote:
> Hello Jim,
>
> Tuesday, May 11, 2010, 1:37:19 PM, you wrote:
>
> JF> On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER wrote:
>>> Hello Jim,
>>>
>>> Tuesday, May 11, 2010, 12:33:04 PM, you wrote:
>>>
>>> JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER wrote:
>>>>> Hello Jim,
>>>>>
>>>>> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>>>>>
>>>>> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink wrote:
>>>>>>> That's really interesting! Did you notice any issues performance wise,
>>>>>>> or didn't you check that yet?
>>>>>
>>>>> JF> I didn't check performance. I just iterated over a file storage file,
>>>>> JF> checking compressed and uncompressed pickle sizes.
>>>>>
>>>>> I'd say some checksum is then also needed to detect bit failures that
>>>>> mess up the compressed data.
>>>
>>> JF> Why?
>>>
>>> I think the gzip algo compresses to a bit-stream, where if even one bit
>>> has an error the rest of the uncompressed data might be a total mess.
>>> If that one bit is relatively early in the stream it's fatal.
>>> Salvaging the data is not a joy either.
>>> I know at this level we should expect that the OS and any underlying
>>> infrastructure should provide error-free data or fail.
>>> Tho I've seen some magic situations where the file copied without
>>> error through a network, but at the end CRC check failed on it :-O
>
> JF> How would a checksum help? All it would do is tell you you're hosed.
> JF> It wouldn't make you any less hosed.
>
> Yes, but I would know why it's hosed.

How so? How would you know why it is hosed?

Note BTW that the zlib format already includes a checksum.

http://www.faqs.org/rfcs/rfc1950.html

Jim
--
Jim Fulton
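[Editor's note] As the RFC 1950 reference says, every zlib stream ends with an Adler-32 checksum of the uncompressed data, so corruption is detected at decompression time rather than handed on to the unpickler. A small illustration (here the flipped bit is placed in the stored checksum trailer, where detection is guaranteed):

```python
import zlib

payload = b"some pickle bytes " * 64
blob = bytearray(zlib.compress(payload))

# sanity check: an intact stream round-trips
assert zlib.decompress(bytes(blob)) == payload

# flip one bit in the Adler-32 trailer (the last four bytes of the
# zlib format, per RFC 1950)
blob[-1] ^= 0x01

try:
    zlib.decompress(bytes(blob))
    detected = False
except zlib.error:
    detected = True

assert detected  # zlib raises an error instead of returning junk
```

A bit flip earlier in the deflate stream is also caught, either as an invalid code during decompression or as a checksum mismatch at the end.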
Re: [ZODB-Dev] Automating retry management
On Tue, May 11, 2010 at 10:08 AM, Jim Fulton wrote:
> This is an improvement. It's still unsatisfying, but I don't think I'm
> going to get satisfaction. :)

Given that PEP 343 explicitly mentions *not* supporting an auto retry construct, I should think not. :)

--
Benji York
Re: [ZODB-Dev] Automating retry management
On Tue, May 11, 2010 at 8:38 AM, Benji York wrote:
> On Tue, May 11, 2010 at 7:34 AM, Jim Fulton wrote:
>> [...] The best I've been
>> able to come up with is something like:
>>
>>     t = ZODB.transaction(3)
>>     while t.trying:
>>         with t:
>>             ... transaction body ...
>
> I think you could get this to work:
>
>     for transaction in ZODB.retries(3):
>         with transaction:
>             ... transaction body ...
>
> ZODB.retries would return an iterator that would raise StopIteration on
> the next go-round if the previously yielded context manager exited
> without a ConflictError.

This is an improvement. It's still unsatisfying, but I don't think I'm going to get satisfaction. :)

BTW, if I do something like this, I think I'll add a retry exception to the transaction package and have ZODB.POSException.ConflictError extend it so I can add the retry automation to the transaction package.

Jim
--
Jim Fulton
Re: [ZODB-Dev] Some interesting (to some:) numbers
On May 11, 2010, at 9:53 AM, Lennart Regebro wrote:

> On Tue, May 11, 2010 at 14:47, Adam GROSZER wrote:
>> Probably that crappy data would make the unpickler fail... or wait a
>> second... the unpickler is a **SECURITY HOLE** in python, isn't it?
>> That means feed it some random data... and stay tuned for the
>> unexpected.
>
> That a bitflip would generate random data that actually did anything
> at all is a bit like if you shake a puzzle box and out comes a
> dinosaur and bites your leg. :-)
>
>> The thing is that a single bitflip could cause a LOT of crap.
>
> Most likely it would generate an unpickling error. But yeah, in
> theory at least you are right. I have no idea what the performance
> penalty would be, but a checksum would feel good. :)

Most likely a bit flip in uncompressed data is much worse, as it will probably pass unnoticed until it causes major pain somewhere far away from where the bit flip occurred. In that sense, compressing data all the way to a ZEO client is better, since it gives a higher chance of fail-stop behavior. I think, maybe :)

--
Leonardo Santagada
santagada at gmail.com
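[Editor's note] The fail-stop point can be made concrete. In a bare pickle, a bit flip inside the payload can survive unpickling unnoticed, while the same kind of damage to a zlib-wrapped record is caught by the stream's checksum. A sketch (the byte offset assumes pickle protocol 4's layout, where a short string's UTF-8 bytes sit just before the trailing MEMOIZE and STOP opcodes):

```python
import pickle
import zlib

obj = "transaction log entry 0001"

# --- uncompressed: the bit flip passes unnoticed ---
raw = bytearray(pickle.dumps(obj, protocol=4))
raw[-3] ^= 0x01            # flip a bit in the last character of the string
corrupted = pickle.loads(bytes(raw))
assert corrupted != obj    # silently wrong data, no exception raised

# --- compressed: the same class of damage is detected ---
blob = bytearray(zlib.compress(pickle.dumps(obj, protocol=4)))
blob[-1] ^= 0x01           # flip a bit in the stored Adler-32 trailer
try:
    pickle.loads(zlib.decompress(bytes(blob)))
    detected = False
except zlib.error:
    detected = True
assert detected            # fail-stop: the error surfaces at load time
```

So the compressed record is the one more likely to fail loudly at the point of damage, which is Leonardo's argument in miniature.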
Re: [ZODB-Dev] Automating retry management
On Tue, May 11, 2010 at 7:52 AM, Nitro wrote:
> I'm already using custom transaction/savepoint context managers in my
> code. I use them like
...
> Now you could probably extend this to look like
>
>     class TransactionContext(object):
>         def __init__(self, txn = None, retryCount = 3):
>             if txn is None:
>                 txn = transaction.get()
>             self.txn = txn
>             self.retryCount = retryCount
>
>         def __enter__(self):
>             return self.txn
>
>         def __exit__(self, t, v, tb):
>             if t is not None:
>                 self.txn.abort()
>             else:
>                 for i in range(self.retryCount):
>                     try:
>                         self.txn.commit()
>                     except ConflictError as exc2:
>                         exc = exc2
>                     else:
>                         return
>                 raise exc
>
> The looping/except part could probably look nicer. Use case looks like:
>
>     with TransactionContext(mytransaction, retryCount = 5):
>         db.root['sp_test'] = 'init'
>
> Does this look similar to what you were looking for?

This wouldn't work. You would need to re-execute the suite for each retry. It's not enough to just keep committing the same transaction. (There are other details wrong with the code above, but they are fixable.) Python doesn't provide a way to keep executing the suite.

Jim
--
Jim Fulton
Re: [ZODB-Dev] Some interesting (to some:) numbers
On Tue, May 11, 2010 at 14:47, Adam GROSZER wrote:
> Probably that crappy data would make the unpickler fail... or wait a
> second... the unpickler is a **SECURITY HOLE** in python, isn't it?
> That means feed it some random data... and stay tuned for the
> unexpected.

That a bitflip would generate random data that actually did anything at all is a bit like if you shake a puzzle box and out comes a dinosaur and bites your leg. :-)

> The thing is that a single bitflip could cause a LOT of crap.

Most likely it would generate an unpickling error. But yeah, in theory at least you are right. I have no idea what the performance penalty would be, but a checksum would feel good. :)

--
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64
Re: [ZODB-Dev] Some interesting (to some:) numbers
Hello, Tuesday, May 11, 2010, 1:59:17 PM, you wrote: N> Am 11.05.2010, 13:47 Uhr, schrieb Adam GROSZER : >> Hello Jim, >> >> Tuesday, May 11, 2010, 1:37:19 PM, you wrote: >> >> JF> On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER >> wrote: Hello Jim, Tuesday, May 11, 2010, 12:33:04 PM, you wrote: JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER wrote: >> Hello Jim, >> >> Monday, May 10, 2010, 1:27:00 PM, you wrote: >> >> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink >> wrote: That's really interesting! Did you notice any issues performance wise, or didn't you check that yet? >> >> JF> I didn't check performance. I just iterated over a file storage >> file, >> JF> checking compressed and uncompressed pickle sizes. >> >> I'd say some checksum is then also needed to detect bit failures that >> mess up the compressed data. JF> Why? I think the gzip algo compresses to a bit-stream, where even one bit has an error the rest of the uncompressed data might be a total mess. If that one bit is relatively early in the stream it's fatal. Salvaging the data is not a joy either. I know at this level we should expect that the OS and any underlying infrastructure should provide error-free data or fail. Tho I've seen some magic situations where the file copied without error through a network, but at the end CRC check failed on it :-O >> >> JF> How would a checksum help? All it would do is tell you your hosed. >> JF> It wouldn't make you any less hosed. >> >> Yes, but I would know why it's hosed. >> Not like I'm expecting 2+2=4 and get 5 somewhere deep in the custom >> app that does some calculation. N> You could have bitflips anywhere in the database, not just the payload N> parts. You'd have to checksum and test everything all the time. Imo it's N> not worth the complexity and performance penalty given today's redundant N> storages like RAID, ZRS or zeoraid. N> Btw, the current pickle payload format is not secured against any bitflips N> either I think. 
The difference between the uncompressed and the compressed case is that if you have bitflips in an uncompressed data stream, you get, let's say, a B instead of an A, or a 3 instead of a 1. That hits hard in numbers/IDs, but keeps strings still human-readable, because the rest is still there. In a compressed stream, the rest of the pickle/payload would probably be crap.

Probably that crappy data would make the unpickler fail... or wait a second... the unpickler is a **SECURITY HOLE** in python, isn't it? That means feed it some random data... and stay tuned for the unexpected.

The thing is that a single bitflip could cause a LOT of crap. You're right that currently there's no protection against such bitflips, but I'd rather present the user a nice error than some crappy data.

--
Best regards,
Adam GROSZER
mailto:agros...@gmail.com

--
Quote of the day:
If necessity is the mother of invention, discontent is the father of progress.
- David Rockefeller
Re: [ZODB-Dev] Automating retry management
On Tue, May 11, 2010 at 7:34 AM, Jim Fulton wrote:
> [...] The best I've been
> able to come up with is something like:
>
>     t = ZODB.transaction(3)
>     while t.trying:
>         with t:
>             ... transaction body ...

I think you could get this to work:

    for transaction in ZODB.retries(3):
        with transaction:
            ... transaction body ...

ZODB.retries would return an iterator that would raise StopIteration on the next go-round if the previously yielded context manager exited without a ConflictError.

--
Benji York
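[Editor's note] The ZODB.retries idea can be sketched as a generator. ConflictError here is a stand-in for ZODB.POSException.ConflictError and the commit is stubbed out; a real version would commit in __exit__ and presumably re-raise once the attempts are exhausted.

```python
class ConflictError(Exception):
    """Stand-in for ZODB.POSException.ConflictError."""

class _Attempt:
    """One transaction attempt; a real version would commit in __exit__."""
    def __init__(self):
        self.succeeded = False

    def __enter__(self):
        return self

    def __exit__(self, t, v, tb):
        if t is None:
            self.succeeded = True   # clean exit: the retry loop can stop
            return False
        # swallow conflicts so the for loop gets another go-round;
        # anything else propagates as usual
        return issubclass(t, ConflictError)

def retries(n):
    """Yield context managers until one exits without a ConflictError."""
    for _ in range(n):
        attempt = _Attempt()
        yield attempt
        if attempt.succeeded:
            return   # StopIteration on the next go-round

body_runs = []
for attempt in retries(3):
    with attempt:
        body_runs.append(1)
        if len(body_runs) < 2:
            raise ConflictError("simulated conflict")

# the suite ran twice: one conflicted attempt, then a clean one
```

Since the with body sits inside the for loop, the suite really is re-executed on each retry, which is the property the decorator and while-loop variants are also chasing.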
Re: [ZODB-Dev] Automating retry management
>     def __exit__(self, t, v, tb):
>         if t is not None:
>             self.txn.abort()
>         else:
>             for i in range(self.retryCount):

Oops, bug here. It should read range(1 + self.retryCount). It should probably have unittests anyway :-)

-Matthias
Re: [ZODB-Dev] Some interesting (to some:) numbers
Am 11.05.2010, 13:47 Uhr, schrieb Adam GROSZER : > Hello Jim, > > Tuesday, May 11, 2010, 1:37:19 PM, you wrote: > > JF> On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER > wrote: >>> Hello Jim, >>> >>> Tuesday, May 11, 2010, 12:33:04 PM, you wrote: >>> >>> JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER >>> wrote: > Hello Jim, > > Monday, May 10, 2010, 1:27:00 PM, you wrote: > > JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink > wrote: >>> That's really interesting! Did you notice any issues performance >>> wise, or >>> didn't you check that yet? > > JF> I didn't check performance. I just iterated over a file storage > file, > JF> checking compressed and uncompressed pickle sizes. > > I'd say some checksum is then also needed to detect bit failures that > mess up the compressed data. >>> >>> JF> Why? >>> >>> I think the gzip algo compresses to a bit-stream, where even one bit >>> has an error the rest of the uncompressed data might be a total mess. >>> If that one bit is relatively early in the stream it's fatal. >>> Salvaging the data is not a joy either. >>> I know at this level we should expect that the OS and any underlying >>> infrastructure should provide error-free data or fail. >>> Tho I've seen some magic situations where the file copied without >>> error through a network, but at the end CRC check failed on it :-O > > JF> How would a checksum help? All it would do is tell you your hosed. > JF> It wouldn't make you any less hosed. > > Yes, but I would know why it's hosed. > Not like I'm expecting 2+2=4 and get 5 somewhere deep in the custom > app that does some calculation. You could have bitflips anywhere in the database, not just the payload parts. You'd have to checksum and test everything all the time. Imo it's not worth the complexity and performance penalty given today's redundant storages like RAID, ZRS or zeoraid. Btw, the current pickle payload format is not secured against any bitflips either I think. 
-Matthias
Re: [ZODB-Dev] Automating retry management
I'm already using custom transaction/savepoint context managers in my code. I use them like

    with TransactionContext():
        db.root['sp_test'] = 'init'
        with SavepointContext():
            db.root['sp_test'] = 'saved'

One of the context managers:

    class TransactionContext(object):
        def __init__(self, txn = None):
            if txn is None:
                txn = transaction.get()
            self.txn = txn

        def __enter__(self):
            return self.txn

        def __exit__(self, t, v, tb):
            if t is not None:
                self.txn.abort()
            else:
                self.txn.commit()

Now you could probably extend this to look like

    class TransactionContext(object):
        def __init__(self, txn = None, retryCount = 3):
            if txn is None:
                txn = transaction.get()
            self.txn = txn
            self.retryCount = retryCount

        def __enter__(self):
            return self.txn

        def __exit__(self, t, v, tb):
            if t is not None:
                self.txn.abort()
            else:
                for i in range(self.retryCount):
                    try:
                        self.txn.commit()
                    except ConflictError as exc2:
                        exc = exc2
                    else:
                        return
                raise exc

The looping/except part could probably look nicer. Use case looks like:

    with TransactionContext(mytransaction, retryCount = 5):
        db.root['sp_test'] = 'init'

Does this look similar to what you were looking for?

-Matthias

For completeness, here's my savepoint manager:

    class SavepointContext(object):
        def __enter__(self, txn = None):
            if txn is None:
                txn = transaction.get()
            self.savepoint = txn.savepoint()
            return self.savepoint

        def __exit__(self, type, value, traceback):
            if type is not None:
                self.savepoint.rollback()
Re: [ZODB-Dev] Some interesting (to some:) numbers
Hello Jim, Tuesday, May 11, 2010, 1:37:19 PM, you wrote: JF> On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER wrote: >> Hello Jim, >> >> Tuesday, May 11, 2010, 12:33:04 PM, you wrote: >> >> JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER wrote: Hello Jim, Monday, May 10, 2010, 1:27:00 PM, you wrote: JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink wrote: >> That's really interesting! Did you notice any issues performance wise, or >> didn't you check that yet? JF> I didn't check performance. I just iterated over a file storage file, JF> checking compressed and uncompressed pickle sizes. I'd say some checksum is then also needed to detect bit failures that mess up the compressed data. >> >> JF> Why? >> >> I think the gzip algo compresses to a bit-stream, where even one bit >> has an error the rest of the uncompressed data might be a total mess. >> If that one bit is relatively early in the stream it's fatal. >> Salvaging the data is not a joy either. >> I know at this level we should expect that the OS and any underlying >> infrastructure should provide error-free data or fail. >> Tho I've seen some magic situations where the file copied without >> error through a network, but at the end CRC check failed on it :-O JF> How would a checksum help? All it would do is tell you your hosed. JF> It wouldn't make you any less hosed. Yes, but I would know why it's hosed. Not like I'm expecting 2+2=4 and get 5 somewhere deep in the custom app that does some calculation. -- Best regards, Adam GROSZERmailto:agros...@gmail.com -- Quote of the day: The Past is over for all of us... the Future is promised to none of us. - Wayne Dyer ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Some interesting (to some:) numbers
On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER wrote:
> Hello Jim,
>
> Tuesday, May 11, 2010, 12:33:04 PM, you wrote:
>
> JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER wrote:
>>> Hello Jim,
>>>
>>> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>>>
>>> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink wrote:
>>>>> That's really interesting! Did you notice any issues performance wise, or
>>>>> didn't you check that yet?
>>>
>>> JF> I didn't check performance. I just iterated over a file storage file,
>>> JF> checking compressed and uncompressed pickle sizes.
>>>
>>> I'd say some checksum is then also needed to detect bit failures that
>>> mess up the compressed data.
>
> JF> Why?
>
> I think the gzip algo compresses to a bit-stream, where if even one bit
> has an error the rest of the uncompressed data might be a total mess.
> If that one bit is relatively early in the stream it's fatal.
> Salvaging the data is not a joy either.
> I know at this level we should expect that the OS and any underlying
> infrastructure should provide error-free data or fail.
> Tho I've seen some magic situations where the file copied without
> error through a network, but at the end CRC check failed on it :-O

How would a checksum help? All it would do is tell you you're hosed. It wouldn't make you any less hosed.

Jim
--
Jim Fulton
[ZODB-Dev] Automating retry management
So I'm about to update the transaction package and this gives me an opportunity to do something I've been meaning to do for a while, which is to add support for the Python with statement:

    with transaction:
        ... transaction body ...

and:

    with some_transaction_manager as t:
        ... transaction body, accesses current transaction as t ...

This looks really great, IMO, but there's a major piece missing, which is dealing with transient transaction failures due to conflicts. If using an optimistic transaction mechanism, like ZODB's, you have to deal with conflict errors. If using a lock-based transaction mechanism, you'd have to deal with deadlock detection. In either case, you have to detect conflicts and retry the transaction.

I wonder how other Python database interfaces deal with this. (I just skimmed the DBI v2 spec and didn't see anything.) What happens, for example, if there are conflicting writes in a Postgres or Oracle application? I assume that some sort of exception is raised.

I also wonder how this situation could be handled elegantly. To deal with conflicts (assuming transaction had with statement support), you'd end up with:

    tries = 0
    while 1:
        try:
            with transaction:
                conn.root.x = 1
        except ZODB.POSException.ConflictError:
            tries += 1
            if tries > 3:
                raise
        else:
            break

Yuck! (Although it's better than it would be without transaction with statement support.)

In web applications, we generally don't see the retry management because the framework takes care of it for us. That is, until we write a script to do something outside of a web application.

This would be easier to automate if Python let us write custom looping structures or allowed full anonymous functions. The best I've been able to come up with is something like:

    t = ZODB.transaction(3)
    while t.trying:
        with t:
            ... transaction body ...
Here the transaction function returns an object that:

- keeps track of how many times it's tried and manages a "trying" attribute that is true while we haven't given up or succeeded, and
- is a context manager that takes care of transaction boundaries and updates the trying attr depending on the transaction outcome.

This version is better than the try/except version, but isn't entirely satisfying. :)

Does anyone have any better ideas?

I use a ZODB function, because ConflictError is ZODB specific. It would be nice if this could be standardized, so that a mechanism could be defined by the transaction package.

Jim
--
Jim Fulton
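[Editor's note] The "t = ZODB.transaction(3); while t.trying:" shape described above can be mocked up without ZODB to show the moving parts. ConflictError is again a stand-in, the commit is elided, and the class name is illustrative only.

```python
class ConflictError(Exception):
    """Stand-in for ZODB.POSException.ConflictError."""

class RetryingTransaction:
    """Context manager that counts attempts and manages a `trying` flag."""
    def __init__(self, tries):
        self.remaining = tries
        self.trying = True      # true until success or attempts run out

    def __enter__(self):
        return self

    def __exit__(self, t, v, tb):
        self.remaining -= 1
        if t is None:
            self.trying = False          # success; real code commits here
            return False
        if issubclass(t, ConflictError) and self.remaining > 0:
            return True                  # swallow; the while loop retries
        return False                     # out of tries, or unrelated error

runs = []
t = RetryingTransaction(3)
while t.trying:
    with t:
        runs.append(1)
        if len(runs) == 1:
            raise ConflictError("simulated conflict")

# first attempt conflicted, second succeeded: the suite ran twice
```

The with statement swallows conflicts (by returning True from __exit__) only while attempts remain, so a persistent conflict eventually escapes to the caller, matching the behavior of the try/except version.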
Re: [ZODB-Dev] Some interesting (to some:) numbers
Hello Jim,

Tuesday, May 11, 2010, 12:33:04 PM, you wrote:

JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER wrote:
>> Hello Jim,
>>
>> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>>
>> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink wrote:
>>>> That's really interesting! Did you notice any issues performance wise, or
>>>> didn't you check that yet?
>>
>> JF> I didn't check performance. I just iterated over a file storage file,
>> JF> checking compressed and uncompressed pickle sizes.
>>
>> I'd say some checksum is then also needed to detect bit failures that
>> mess up the compressed data.

JF> Why?

I think the gzip algo compresses to a bit-stream, where if even one bit has an error, the rest of the uncompressed data might be a total mess. If that one bit is relatively early in the stream, it's fatal. Salvaging the data is not a joy either. I know at this level we should expect that the OS and any underlying infrastructure should provide error-free data or fail. Tho I've seen some magic situations where a file copied without error through a network, but at the end a CRC check failed on it :-O

--
Best regards,
Adam GROSZER
mailto:agros...@gmail.com

--
Quote of the day:
You may have to fight a battle more than once to win it.
- Margaret Thatcher
Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage (even with cache-size-bytes)
On Mon, May 10, 2010 at 8:20 PM, Ryan Noon wrote:
> P.S. About the data structures:
> wordset is a freshly unpickled python set from my old sqlite oodb thingy.
> The new docsets I'm keeping are 'L' arrays from the stdlib array module.
> I'm up for using ZODB's builtin persistent data structures if it makes a
> lot of sense to do so, but it sorta breaks my abstraction a bit and I feel
> like the memory issues I'm having are somewhat independent of the container
> data structures (as I'm having the same issue just with fixed size strings).

This is getting tiresome. We can't really advise you because we can't see what data structures you're using, and we're wasting too much time guessing. We wouldn't have to guess and grill you if you showed a complete demonstration program, or at least one that showed what the heck you're doing. The program you've shown so far is so incomplete, perhaps we're missing the obvious.

In your original program, you never actually store anything in the database. You assign the database root to self.root, but never use self.root. (The variable self is not defined and we're left to assume that this disembodied code is part of a method definition.)

In your most recent snippet, you don't show any database access. If you never actually store anything in the database, then nothing will be removed from memory. You're inserting data into wordid_to_docset, but you don't show its definition and won't tell us what it is.

Jim
--
Jim Fulton
Re: [ZODB-Dev] Some interesting (to some:) numbers
On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER wrote:
> Hello Jim,
>
> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>
> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink wrote:
>>> That's really interesting! Did you notice any issues performance wise, or
>>> didn't you check that yet?
>
> JF> I didn't check performance. I just iterated over a file storage file,
> JF> checking compressed and uncompressed pickle sizes.
>
> I'd say some checksum is then also needed to detect bit failures that
> mess up the compressed data.

Why?

Jim
--
Jim Fulton
Re: [ZODB-Dev] Some interesting (to some:) numbers
Hello Jim,

Monday, May 10, 2010, 1:27:00 PM, you wrote:

JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink wrote:
>> That's really interesting! Did you notice any issues performance wise, or
>> didn't you check that yet?

JF> I didn't check performance. I just iterated over a file storage file,
JF> checking compressed and uncompressed pickle sizes.

I'd say some checksum is then also needed to detect bit failures that mess up the compressed data.

--
Best regards,
Adam GROSZER
mailto:agros...@gmail.com

--
Quote of the day:
A true friend is someone who is there for you when he'd rather be anywhere else.
- Len Wein