Re: [ZODB-Dev] Some interesting (to some:) numbers

2010-05-11 Thread Adam GROSZER
Hello Jim,

Tuesday, May 11, 2010, 8:36:46 PM, you wrote:

JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink  wrote:
>> On Sun, May 9, 2010 at 8:33 PM, Jim Fulton  wrote:
>>>
>>> Our recent discussion of compression made me curious so I did some
>>> analysis of pickle sizes in one of our large databases. This is for a
>>> content management system.  The database is packed weekly.  It doesn't
>>> include media, which are in blobs.
>>>
>>> There were ~19 million transactions in the database and around 130
>>> million data records. About 60% of the size was taken up by BTrees.
>>> Compressing pickles using zlib with default compression reduced the
>>> pickle sizes by ~58%. The average uncompressed record size was 1163
>>> bytes.  The average compressed size was ~493 bytes.
>>>
>>> This is probably enough of a savings to make compression interesting.

JF> ...

>> That's really interesting! Did you notice any issues performance wise, or
>> didn't you check that yet?

JF> OK, I did some crude tests.  It looks like compressing is a little
JF> less expensive than pickling and decompressing is a little more
JF> expensive than unpickling, which is to say this is pretty cheap.  For
JF> example, decompressing a data record took around 20 microseconds on my
JF> machine. A typical ZEO load takes 10s of milliseconds. Even in Shane's
JF> zodb shootout benchmark which loads data from ram, load times are
JF> several hundred microseconds or more.

JF> I don't think compression will hurt performance.  It is likely to
JF> help it in practice because:

JF> - There will be less data to send back and forth to remote servers.

JF> - Smaller databases will get more benefit from disk caches.
JF>   (Databases will be more likely to fit on ssds.)

JF> - ZEO caches (and relstorage memcached caches) will be able to hold
JF>   more object records.

I was thinking about using other compressors.
I found this:
http://tukaani.org/lzma/benchmarks.html
Seems like gzip/zlib is the fastest, at some expense of compression ratio.
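
For anyone who wants to reproduce Jim's measurement on their own database,
it might look roughly like this (just a sketch; the path is a placeholder
and the record handling is my assumption about the iterator API):

import zlib
from ZODB.FileStorage import FileStorage

storage = FileStorage('/path/to/Data.fs', read_only=True)  # placeholder path
raw = packed = records = 0
for txn in storage.iterator():
    for record in txn:
        if not record.data:           # undo/delete records carry no pickle
            continue
        records += 1
        raw += len(record.data)
        packed += len(zlib.compress(record.data))   # default compression level
storage.close()

print 'records: %d' % records
print 'average raw size: %.0f bytes' % (raw / float(records))
print 'average compressed size: %.0f bytes' % (packed / float(records))
print 'savings: %.0f%%' % (100.0 * (1.0 - packed / float(raw)))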

-- 
Best regards,
 Adam GROSZER    mailto:agros...@gmail.com
--
Quote of the day:
What this country needs is a good five-cent microcomputer.

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage (even with cache-size-bytes)

2010-05-11 Thread Ryan Noon
Thanks Laurence, this looks really helpful.  The simplicity of ZODB's
concept and the joy of using it apparently hide some of the complexity
necessary to use it efficiently.  I'll check this out when I circle back to
data stuff tomorrow.

Have a great morning/day/evening!
-Ryan

On Tue, May 11, 2010 at 5:44 PM, Laurence Rowe  wrote:

> I think this means that you are storing all of your data in a single
> persistent object, the database root PersistentMapping. You need to
> break up your data into persistent objects (instances of objects that
> inherit from persistent.Persistent) for the ZODB to have a chance of
> performing memory mapping. You want to do something like:
>
> import transaction
> from ZODB import FileStorage, DB
> from BTrees.LOBTree import BTree, TreeSet
> storage = FileStorage.FileStorage('/tmp/test-filestorage.fs')
> db = DB(storage)
> conn = db.open()
> root = conn.root()
> transaction.begin()
> index = root['index'] = BTree()
> values = index[1] = TreeSet()
> values.add(42)
> transaction.commit()
>
> You should probably read:
> http://www.zodb.org/documentation/guide/modules.html#btrees-package.
> Since that was written, the L variants of the BTree types have been
> introduced for storing 64bit integers. I'm using an LOBTree because
> that maps 64bit integers to python objects. For values I'm using an
> LOTreeSet, though you could also use an LLTreeSet (which has larger
> buckets).
>
> Laurence
>
> On 12 May 2010 00:37, Ryan Noon  wrote:
> > Hi Jim,
> > I'm really sorry for the miscommunication, I thought I made that clear in
> my
> > last email:
> > "I'm wrapping ZODB in a 'ZMap' class that just forwards all the
> dictionary
> > methods to the ZODB root and allows easy interchangeability with my old
> > sqlite OODB abstraction."
> > wordid_to_docset is a "ZMap", which just wraps the ZODB
> > boilerplate/connection and forwards dictionary methods to the root.  If
> this
> > seems superfluous, it was just to maintain backwards compatibility with
> all
> > of the code I'd already written for the sqlite OODB I was using before I
> > switched to ZODB.  Whenever you see something like wordid_to_docset[id]
> it's
> > just doing self.root[id] behind the scenes in a __setitem__ call inside
> the
> > ZMap class, which I've pasted below.
> > The db is just storing longs mapped to array('L')'s with a few thousand
> > longs in em.  I'm going to try switching to the persistent data structure
> > that Laurence suggested (a pointer to relevant documentation would be
> really
> > useful), but I'm still sorta worried because in my experimentation with
> ZODB
> > so far I've never been able to observe it sticking to any cache limits,
> no
> > matter how often I tell it to garbage collect (even when storing very
> small
> > values that should give it adequate granularity...see my experiment at
> the
> > end of my last email).  If the memory reported to the OS by Python 2.6 is
> > the problem I'd understand, but memory usage goes up the second I start
> > adding new things (which indicates that Python is asking for more and not
> > actually freeing internally, no?).
> > If you feel there's something pathological about my memory access
> patterns
> > in this operation I can just do the actual inversion step in Hadoop and
> load
> > the output into ZODB for my application later, I was just hoping to keep
> all
> > of my data in OODB's the entire time.
> > Thanks again all of you for your collective time.  I really like ZODB so
> > far, and it bugs me that I'm likely screwing it up somewhere.
> > Cheers,
> > Ryan
> >
> >
> > class ZMap(object):
> >
> > def __init__(self, name=None, dbfile=None, cache_size_mb=512,
> > autocommit=True):
> > self.name = name
> > self.dbfile = dbfile
> > self.autocommit = autocommit
> >
> > self.__hash__ = None #can't hash this
> >
> > #first things first, figure out if we need to make up a name
> > if self.name == None:
> > self.name = make_up_name()
> > if sep in self.name:
> > if self.name[-1] == sep:
> > self.name = self.name[:-1]
> > self.name = self.name.split(sep)[-1]
> >
> >
> > if self.dbfile == None:
> > self.dbfile = self.name + '.zdb'
> >
> > self.storage = FileStorage(self.dbfile, pack_keep_old=False)
> > self.cache_size = cache_size_mb * 1024 * 1024
> >
> > self.db = DB(self.storage, pool_size=1,
> > cache_size_bytes=self.cache_size,
> > historical_cache_size_bytes=self.cache_size, database_name=self.name)
> > self.connection = self.db.open()
> > self.root = self.connection.root()
> >
> > print 'Initializing ZMap "%s" in file "%s" with %dmb cache.
> Current
> > %d items' % (self.name, self.dbfile, cache_size_mb, len(self.root))
> >
> > # basic operators
> > def __eq__(self, y): # x == y
> > return self.root.__eq__(y)
> > def __ge__(self, y): # x >= y
> > return 

Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage (even with cache-size-bytes)

2010-05-11 Thread Laurence Rowe
I think this means that you are storing all of your data in a single
persistent object, the database root PersistentMapping. You need to
break up your data into persistent objects (instances of objects that
inherit from persistent.Persistent) for the ZODB to have a chance of
performing memory mapping. You want to do something like:

import transaction
from ZODB import FileStorage, DB
from BTrees.LOBTree import BTree, TreeSet
storage = FileStorage.FileStorage('/tmp/test-filestorage.fs')
db = DB(storage)
conn = db.open()
root = conn.root()
transaction.begin()
index = root['index'] = BTree()
values = index[1] = TreeSet()
values.add(42)
transaction.commit()

You should probably read:
http://www.zodb.org/documentation/guide/modules.html#btrees-package.
Since that was written, the L variants of the BTree types have been
introduced for storing 64bit integers. I'm using an LOBTree because
that maps 64bit integers to python objects. For values I'm using an
LOTreeSet, though you could also use an LLTreeSet (which has larger
buckets).
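
For the wordid -> docset case that might look something like this (just a
sketch, continuing the snippet above; the key and document ids are made up):

import transaction
from BTrees.LOBTree import LOBTree
from BTrees.LLBTree import LLTreeSet

wordid_to_docset = root['wordid_to_docset'] = LOBTree()
docset = wordid_to_docset.setdefault(12345, LLTreeSet())
docset.update([1, 2, 3])        # document ids for this word
transaction.commit()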

Laurence

On 12 May 2010 00:37, Ryan Noon  wrote:
> Hi Jim,
> I'm really sorry for the miscommunication, I thought I made that clear in my
> last email:
> "I'm wrapping ZODB in a 'ZMap' class that just forwards all the dictionary
> methods to the ZODB root and allows easy interchangeability with my old
> sqlite OODB abstraction."
> wordid_to_docset is a "ZMap", which just wraps the ZODB
> boilerplate/connection and forwards dictionary methods to the root.  If this
> seems superfluous, it was just to maintain backwards compatibility with all
> of the code I'd already written for the sqlite OODB I was using before I
> switched to ZODB.  Whenever you see something like wordid_to_docset[id] it's
> just doing self.root[id] behind the scenes in a __setitem__ call inside the
> ZMap class, which I've pasted below.
> The db is just storing longs mapped to array('L')'s with a few thousand
> longs in em.  I'm going to try switching to the persistent data structure
> that Laurence suggested (a pointer to relevant documentation would be really
> useful), but I'm still sorta worried because in my experimentation with ZODB
> so far I've never been able to observe it sticking to any cache limits, no
> matter how often I tell it to garbage collect (even when storing very small
> values that should give it adequate granularity...see my experiment at the
> end of my last email).  If the memory reported to the OS by Python 2.6 is
> the problem I'd understand, but memory usage goes up the second I start
> adding new things (which indicates that Python is asking for more and not
> actually freeing internally, no?).
> If you feel there's something pathological about my memory access patterns
> in this operation I can just do the actual inversion step in Hadoop and load
> the output into ZODB for my application later, I was just hoping to keep all
> of my data in OODB's the entire time.
> Thanks again all of you for your collective time.  I really like ZODB so
> far, and it bugs me that I'm likely screwing it up somewhere.
> Cheers,
> Ryan
>
>
> class ZMap(object):
>
>     def __init__(self, name=None, dbfile=None, cache_size_mb=512,
> autocommit=True):
>         self.name = name
>         self.dbfile = dbfile
>         self.autocommit = autocommit
>
>         self.__hash__ = None #can't hash this
>
>         #first things first, figure out if we need to make up a name
>         if self.name == None:
>             self.name = make_up_name()
>         if sep in self.name:
>             if self.name[-1] == sep:
>                 self.name = self.name[:-1]
>             self.name = self.name.split(sep)[-1]
>
>
>         if self.dbfile == None:
>             self.dbfile = self.name + '.zdb'
>
>         self.storage = FileStorage(self.dbfile, pack_keep_old=False)
>         self.cache_size = cache_size_mb * 1024 * 1024
>
>         self.db = DB(self.storage, pool_size=1,
> cache_size_bytes=self.cache_size,
> historical_cache_size_bytes=self.cache_size, database_name=self.name)
>         self.connection = self.db.open()
>         self.root = self.connection.root()
>
>         print 'Initializing ZMap "%s" in file "%s" with %dmb cache. Current
> %d items' % (self.name, self.dbfile, cache_size_mb, len(self.root))
>
>     # basic operators
>     def __eq__(self, y): # x == y
>         return self.root.__eq__(y)
>     def __ge__(self, y): # x >= y
>         return len(self) >= len(y)
>     def __gt__(self, y): # x > y
>         return len(self) > len(y)
>     def __le__(self, y): # x <= y
>         return not self.__gt__(y)
>     def __lt__(self, y): # x < y
>         return not self.__ge__(y)
>     def __len__(self): # len(x)
>         return len(self.root)
>
>
>     # dictionary stuff
>     def __getitem__(self, key): # x[key]
>         return self.root[key]
>     def __setitem__(self, key, value): # x[key] = value
>         self.root[key] = value
>         self.__commit_check() # write back if necessary
>
>     def __de

Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage (even with cache-size-bytes)

2010-05-11 Thread Ryan Noon
Hi Jim,

I'm really sorry for the miscommunication, I thought I made that clear in my
last email:

"I'm wrapping ZODB in a 'ZMap' class that just forwards all the dictionary
methods to the ZODB root and allows easy interchangeability with my old
sqlite OODB abstraction."

wordid_to_docset is a "ZMap", which just wraps the ZODB
boilerplate/connection and forwards dictionary methods to the root.  If this
seems superfluous, it was just to maintain backwards compatibility with all
of the code I'd already written for the sqlite OODB I was using before I
switched to ZODB.  Whenever you see something like wordid_to_docset[id] it's
just doing self.root[id] behind the scenes in a __setitem__ call inside the
ZMap class, which I've pasted below.

The db is just storing longs mapped to array('L')'s with a few thousand
longs in em.  I'm going to try switching to the persistent data structure
that Laurence suggested (a pointer to relevant documentation would be really
useful), but I'm still sorta worried because in my experimentation with ZODB
so far I've never been able to observe it sticking to any cache limits, no
matter how often I tell it to garbage collect (even when storing very small
values that should give it adequate granularity...see my experiment at the
end of my last email).  If the memory reported to the OS by Python 2.6 is
the problem I'd understand, but memory usage goes up the second I start
adding new things (which indicates that Python is asking for more and not
actually freeing internally, no?).
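
For what it's worth, once I switch to the BTree layout I'm planning to batch
the inserts roughly like this (a sketch; stream_of_postings is made up, and
cacheGC/cacheMinimize are the Connection methods that, as I understand it,
enforce and flush the pickle cache):

import transaction
from BTrees.LLBTree import LLTreeSet

# conn is the open ZODB connection; wordid_to_docset is an LOBTree in its root
wordid_to_docset = conn.root()['wordid_to_docset']

BATCH = 10000
for n, (wordid, docids) in enumerate(stream_of_postings):   # made-up input
    docset = wordid_to_docset.setdefault(wordid, LLTreeSet())
    docset.update(docids)
    if n % BATCH == 0:
        transaction.get().savepoint(True)   # modified objects become evictable
        conn.cacheGC()                      # enforce cache-size-bytes now
transaction.commit()
conn.cacheMinimize()                        # or flush the cache entirely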

If you feel there's something pathological about my memory access patterns
in this operation I can just do the actual inversion step in Hadoop and load
the output into ZODB for my application later, I was just hoping to keep all
of my data in OODB's the entire time.

Thanks again all of you for your collective time.  I really like ZODB so
far, and it bugs me that I'm likely screwing it up somewhere.

Cheers,
Ryan



class ZMap(object):

    def __init__(self, name=None, dbfile=None, cache_size_mb=512,
                 autocommit=True):
        self.name = name
        self.dbfile = dbfile
        self.autocommit = autocommit

        self.__hash__ = None #can't hash this

        #first things first, figure out if we need to make up a name
        if self.name == None:
            self.name = make_up_name()
        if sep in self.name:
            if self.name[-1] == sep:
                self.name = self.name[:-1]
            self.name = self.name.split(sep)[-1]

        if self.dbfile == None:
            self.dbfile = self.name + '.zdb'

        self.storage = FileStorage(self.dbfile, pack_keep_old=False)
        self.cache_size = cache_size_mb * 1024 * 1024

        self.db = DB(self.storage, pool_size=1,
                     cache_size_bytes=self.cache_size,
                     historical_cache_size_bytes=self.cache_size,
                     database_name=self.name)
        self.connection = self.db.open()
        self.root = self.connection.root()

        print 'Initializing ZMap "%s" in file "%s" with %dmb cache. Current %d items' % (
            self.name, self.dbfile, cache_size_mb, len(self.root))

    # basic operators
    def __eq__(self, y): # x == y
        return self.root.__eq__(y)
    def __ge__(self, y): # x >= y
        return len(self) >= len(y)
    def __gt__(self, y): # x > y
        return len(self) > len(y)
    def __le__(self, y): # x <= y
        return not self.__gt__(y)
    def __lt__(self, y): # x < y
        return not self.__ge__(y)
    def __len__(self): # len(x)
        return len(self.root)

    # dictionary stuff
    def __getitem__(self, key): # x[key]
        return self.root[key]

    def __setitem__(self, key, value): # x[key] = value
        self.root[key] = value
        self.__commit_check() # write back if necessary

    def __delitem__(self, key): # del x[key]
        del self.root[key]

    def get(self, key, default=None): # x[key] if key in x, else default
        return self.root.get(key, default)

    def has_key(self, key): # True if x has key, else False
        return self.root.has_key(key)

    def items(self): # list of key/val pairs
        return self.root.items()

    def keys(self):
        return self.root.keys()

    def pop(self, key, default=None):
        return self.root.pop(key, default)

    def popitem(self): #remove and return an arbitrary key/val pair
        return self.root.popitem()

    def setdefault(self, key, default=None):
        #D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
        return self.root.setdefault(key, default)

    def values(self):
        return self.root.values()

    def copy(self): #copy it? dubiously necessary at the moment
        NOT_IMPLEMENTED('copy')

    # iteration
    def __iter__(self): # iter(x)
        return self.root.iterkeys()

    def iteritems(self): #iterator over items, this can be hellaoptimized
        return self.root.iteritems()

    def itervalues(self):
        return self.root.itervalues()

    def iterkeys(self):
        return self.root.iterkeys()

    # prac

Re: [ZODB-Dev] Some interesting (to some:) numbers

2010-05-11 Thread Jim Fulton
On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink  wrote:
> On Sun, May 9, 2010 at 8:33 PM, Jim Fulton  wrote:
>>
>> Our recent discussion of compression made me curious so I did some
>> analysis of pickle sizes in one of our large databases. This is for a
>> content management system.  The database is packed weekly.  It doesn't
>> include media, which are in blobs.
>>
>> There were ~19 million transactions in the database and around 130
>> million data records. About 60% of the size was taken up by BTrees.
>> Compressing pickles using zlib with default compression reduced the
>> pickle sizes by ~58%. The average uncompressed record size was 1163
>> bytes.  The average compressed size was ~493 bytes.
>>
>> This is probably enough of a savings to make compression interesting.

...

> That's really interesting! Did you notice any issues performance wise, or
> didn't you check that yet?

OK, I did some crude tests.  It looks like compressing is a little
less expensive than pickling and decompressing is a little more
expensive than unpickling, which is to say this is pretty cheap.  For
example, decompressing a data record took around 20 microseconds on my
machine. A typical ZEO load takes 10s of milliseconds. Even in Shane's
zodb shootout benchmark which loads data from ram, load times are
several hundred microseconds or more.
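
For reference, the kind of crude comparison I mean can be sketched like this
(the sample object is just a stand-in for a real data record):

import pickle, timeit, zlib

obj = {'title': u'a page', 'body': u'some text ' * 100, 'ids': range(200)}
data = pickle.dumps(obj, 2)
zdata = zlib.compress(data)

n = 10000
for label, func in [('pickle', lambda: pickle.dumps(obj, 2)),
                    ('unpickle', lambda: pickle.loads(data)),
                    ('compress', lambda: zlib.compress(data)),
                    ('decompress', lambda: zlib.decompress(zdata))]:
    seconds = timeit.timeit(func, number=n)
    print '%-10s %6.1f microseconds/call' % (label, seconds / n * 1e6)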

I don't think compression will hurt performance.  It is likely to
help it in practice because:

- There will be less data to send back and forth to remote servers.

- Smaller databases will get more benefit from disk caches.
  (Databases will be more likely to fit on ssds.)

- ZEO caches (and relstorage memcached caches) will be able to hold
  more object records.

Jim

--
Jim Fulton
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Some interesting (to some:) numbers

2010-05-11 Thread Andreas Jung

Adam GROSZER wrote:
> Hello Jim,
> 
> 
> 
> Tuesday, May 11, 2010, 4:46:46 PM, you wrote:
> 
> JF> On Tue, May 11, 2010 at 7:47 AM, Adam GROSZER  wrote:
>>> Hello Jim,
>>>
>>> Tuesday, May 11, 2010, 1:37:19 PM, you wrote:
>>>
>>> JF> On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER  
>>> wrote:
> Hello Jim,
>
> Tuesday, May 11, 2010, 12:33:04 PM, you wrote:
>
> JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER  
> wrote:
>>> Hello Jim,
>>>
>>> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>>>
>>> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink  
>>> wrote:
> That's really interesting! Did you notice any issues performance 
> wise, or
> didn't you check that yet?
>>> JF> I didn't check performance. I just iterated over a file storage 
>>> file,
>>> JF> checking compressed and uncompressed pickle sizes.
>>>
>>> I'd say some checksum is then also needed to detect bit failures that
>>> mess up the compressed data.
> JF> Why?
>
> I think the gzip algo compresses to a bit-stream, where even one bit
> has an error the rest of the uncompressed data might be a total mess.
> If that one bit is relatively early in the stream it's fatal.
> Salvaging the data is not a joy either.
> I know at this level we should expect that the OS and any underlying
> infrastructure should provide error-free data or fail.
> Tho I've seen some magic situations where the file copied without
> error through a network, but at the end CRC check failed on it :-O
>>> JF> How would a checksum help?  All it would do is tell you you're hosed.
>>> JF> It wouldn't make you any less hosed.
>>>
>>> Yes, but I would know why it's hosed.
> 
> JF> How so?  How would you know why it is hosed.
> 
> Because of data corruption in the compressed stream.
> 
> JF> Note BTW that the zlib format already includes a checksum.
> 
> JF>   http://www.faqs.org/rfcs/rfc1950.html
> 
> I missed that. Case closed then ;-) Sorry for the noise.

A zipped file is no different from other (binary) data stored within
the ZODB. Data corruption can always occur - zipped or not.

Side note: we implemented compression support as part of our CMS at the
application layer, where we store large binary files as linked PData
chains. However we do not compress in every case - only for certain
content-types (it does not make sense to compress zip or jar files).
We also store an md5 hash for each object (and have never had a corruption
issue so far).
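
Roughly like this, at the application layer (a sketch; the attribute names
and the list of skipped content-types are invented here):

import hashlib, zlib

ALREADY_COMPRESSED = ('application/zip', 'application/java-archive',
                      'image/jpeg', 'image/png')

def store_payload(obj, data, content_type):
    obj.md5 = hashlib.md5(data).hexdigest()      # kept for integrity checks
    obj.compressed = content_type not in ALREADY_COMPRESSED
    obj.data = zlib.compress(data) if obj.compressed else data

def load_payload(obj):
    data = zlib.decompress(obj.data) if obj.compressed else obj.data
    if hashlib.md5(data).hexdigest() != obj.md5:
        raise ValueError('md5 mismatch - stored payload is corrupt')
    return data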

Andreas
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Problem with handling of data managers that join transactions after savepoints

2010-05-11 Thread Wichert Akkerman
On 5/11/10 19:41 , Chris Withers wrote:
> Jim Fulton wrote:
 I plan to implement A soon if there are no objections.

 Unless someone somehow convinced me to do D, I'll also add an
 assertion in the Transaction.join method to raise an error if a
 data manager joins more than once.
>>> Option A sounds sensible. It also means I won't have to change
>>> anything in the zope.sqlalchemy data manager.
>>
>> Very cool. I was hoping non-ZODB-data-manager authors
>> were paying attention. :)
>>
>> If anyone knows of any other, I would appreciate someone forwarding
>> this thread to them.
>
> zope.sendmail and MaildropHost have data managers.
> I've seen some file-based things that were data managers, and a few
> others in people's BFG stacks; maybe they can pipe up and/or let the
> authors of those WSGI components know.

I wrote repoze.filesafe a while ago to do transaction-aware creation of 
files, which uses a datamanager. It never tries to join a transaction 
more than once, so should be fine.

Wichert.
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Problem with handling of data managers that join transactions after savepoints

2010-05-11 Thread Chris Withers
Jim Fulton wrote:
>>> I plan to implement A soon if there are no objections.
>>>
>>> Unless someone somehow convinced me to do D, I'll also add an
>>> assertion in the Transaction.join method to raise an error if a
>>> data manager joins more than once.
>> Option A sounds sensible. It also means I won't have to change
>> anything in the zope.sqlalchemy data manager.
> 
> Very cool. I was hoping non-ZODB-data-manager authors
> were paying attention. :)
> 
> If anyone knows of any other, I would appreciate someone forwarding
> this thread to them.

zope.sendmail and MaildropHost have data managers.
I've seen some file-based things that were data managers, and a few 
others in people's BFG stacks; maybe they can pipe up and/or let the 
authors of those WSGI components know.

I'm also likely about to write one, but I'm dumb, so can't comment 
meaningfully on the options ;-)

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
 - http://www.simplistix.co.uk

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Automating retry management

2010-05-11 Thread Jim Fulton
On Tue, May 11, 2010 at 11:35 AM, Laurence Rowe  wrote:
> On 11 May 2010 15:08, Jim Fulton  wrote:
>> On Tue, May 11, 2010 at 8:38 AM, Benji York  wrote:
>>> On Tue, May 11, 2010 at 7:34 AM, Jim Fulton  wrote:
 [...] The best I've been
 able to come up with is something like:

    t = ZODB.transaction(3)
    while t.trying:
        with t:
            ... transaction body ...
>>>
>>> I think you could get this to work:
>>>
>>> for transaction in ZODB.retries(3):
>>>    with transaction:
>>>        ... transaction body ...
>>>
>>> ZODB.retries would return an iterator that would raise StopIteration on
>>> the next go-round if the previously yielded context manager exited
>>> without a ConflictError.
>>
>> This is an improvement. It's still unsatisfying, but I don't think I'm going 
>> to
>> get satisfaction. :)
>>
>> BTW, if I do something like this, I think I'll add a retry exception to
>> the transaction package and have ZODB.POSException.ConflictError
>> extend it so I can add the retry automation to the transaction package.
>
> The repoze.retry package lets you configure a list of exceptions.
> http://pypi.python.org/pypi/repoze.retry
> http://svn.repoze.org/repoze.retry/trunk/repoze/retry/__init__.py
>
>  Though it seems inspecting the error text is required for most sql
> database errors to know if they are retryable, as ZPsycoPGDA does:
>
>  188                 except (psycopg2.ProgrammingError,
> psycopg2.IntegrityError), e:
>  189                     if e.args[0].find("concurrent update") > -1:
>  190                         raise ConflictError
>
> (https://dndg.it/cgi-bin/gitweb.cgi?p=public/psycopg2.git;a=blob;f=ZPsycopgDA/db.py)
>
> For PostgreSQL it should be sufficient to catch these errors and raise
> Retry during tpc_vote.
>
> For databases which do not provide MVCC in the same way as PostgreSQL,
> concurrency errors could be manifested at any point in the
> transaction. Even Oracle can raise an error during a long running
> transaction when insufficient rollback space is available, resulting
> in what is essentially a read conflict error. Such errors could not be
> caught by a data manager and reraised as a Retry exception.
>
> I think it might be useful to add an optional method to data managers
> that is queried by the retry automation machinery to see if an
> exception should potentially be retried. Perhaps this would best be
> accomplished in two steps:
>
> 1. Add an optional property to data managers called ``retryable``.
> This is a list of potentially retryable exceptions. When a data
> manager is added to the transaction, the transaction's list of
> retryable exceptions is extended by the joining data manager's list of
> retryable exceptions.
>
> t = transaction.begin()
> try:
>    application()
> except t.retryable, e:
>    t.retry(e):
>
> 2. t.retry(e) then checks with each registered data manager whether that
> particular exception is retryable, and if so raises Retry.
>
> def retry(self, e):
>    for datamanager in self._resources:
>        try:
>            retry = datamanager.retry
>        except AttributeError:
>            continue
>        if isinstance(e, datamanager.retryable):
>            datamanager.retry(e) # dm may raise Retry here

Thanks.

I don't think we need 1 and 2.
I'm inclined to go with 2.

Jim

-- 
Jim Fulton
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Automating retry management

2010-05-11 Thread Laurence Rowe
On 11 May 2010 15:08, Jim Fulton  wrote:
> On Tue, May 11, 2010 at 8:38 AM, Benji York  wrote:
>> On Tue, May 11, 2010 at 7:34 AM, Jim Fulton  wrote:
>>> [...] The best I've been
>>> able to come up with is something like:
>>>
>>>    t = ZODB.transaction(3)
>>>    while t.trying:
>>>        with t:
>>>            ... transaction body ...
>>
>> I think you could get this to work:
>>
>> for transaction in ZODB.retries(3):
>>    with transaction:
>>        ... transaction body ...
>>
>> ZODB.retries would return an iterator that would raise StopIteration on
>> the next go-round if the previously yielded context manager exited
>> without a ConflictError.
>
> This is an improvement. It's still unsatisfying, but I don't think I'm going 
> to
> get satisfaction. :)
>
> BTW, if I do something like this, I think I'll add a retry exception to
> the transaction package and have ZODB.POSException.ConflictError
> extend it so I can add the retry automation to the transaction package.

The repoze.retry package lets you configure a list of exceptions.
http://pypi.python.org/pypi/repoze.retry
http://svn.repoze.org/repoze.retry/trunk/repoze/retry/__init__.py

 Though it seems inspecting the error text is required for most sql
database errors to know if they are retryable, as ZPsycoPGDA does:

 188 except (psycopg2.ProgrammingError,
psycopg2.IntegrityError), e:
 189 if e.args[0].find("concurrent update") > -1:
 190 raise ConflictError

(https://dndg.it/cgi-bin/gitweb.cgi?p=public/psycopg2.git;a=blob;f=ZPsycopgDA/db.py)

For PostgreSQL it should be sufficient to catch these errors and raise
Retry during tpc_vote.

For databases which do not provide MVCC in the same way as PostgreSQL,
concurrency errors could be manifested at any point in the
transaction. Even Oracle can raise an error during a long running
transaction when insufficient rollback space is available, resulting
in what is essentially a read conflict error. Such errors could not be
caught by a data manager and reraised as a Retry exception.

I think it might be useful to add an optional method to data managers
that is queried by the retry automation machinery to see if an
exception should potentially be retried. Perhaps this would best be
accomplished in two steps:

1. Add an optional property to data managers called ``retryable``.
This is a list of potentially retryable exceptions. When a data
manager is added to the transaction, the transaction's list of
retryable exceptions is extended by the joining data manager's list of
retryable exceptions.

t = transaction.begin()
try:
    application()
except t.retryable, e:
    t.retry(e)

2. t.retry(e) then checks with each registered data manager whether that
particular exception is retryable, and if so raises Retry.

def retry(self, e):
    for datamanager in self._resources:
        try:
            retry = datamanager.retry
        except AttributeError:
            continue
        if isinstance(e, datamanager.retryable):
            datamanager.retry(e) # dm may raise Retry here
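
A data manager taking part in this might then look something like the
following (a sketch with made-up exception names; the Retry exception is
hypothetical until the transaction package grows one):

class Retry(Exception):
    """Hypothetical exception the retry machinery would catch."""

class SerializationError(Exception):
    """Stand-in for a backend's 'concurrent update' error."""

class MyDataManager(object):
    # step 1: advertise which exceptions are potentially retryable
    retryable = (SerializationError,)

    # step 2: t.retry(e) calls this; raising Retry asks the surrounding
    # machinery to re-run the transaction body
    def retry(self, e):
        if isinstance(e, self.retryable):
            raise Retry(e)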

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Automating retry management

2010-05-11 Thread Nitro
On 11.05.2010, 17:08, Nitro  wrote:

> On 11.05.2010, 16:01, Jim Fulton  wrote:
>
>> This wouldn't work.  You would need to re-execute the suite
>> for each retry. It's not enough to just keep committing the same
>> transaction. (There are other details wrong with the code above,
>> but they are fixable.)  Python doesn't provide a way to keep
>> executing the suite.
>
> You are right.
>
> The only thing I could come up with was something like below, using a
> decorator instead of a context.
>
> -Matthias
>
> @doTransaction(count = 5)
> def storeData():
> ... store data here ...
>
> def doTransaction(transaction = None, count = 3):
>  def decorator(func):
>  def do():
>  for i in range(1+count):
>  try:
>  func()
>  except:
>  transaction.abort()
>  raise
>  try:
>  transaction.commit()
>  except ConflictError:
>  if i == count:
>  raise
>  else:
>  return
>  return do

This should read return do(), i.e. the decorator should directly execute  
the storeData function.

All in all I think Benji's proposal looks better :-)

-Matthias
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Automating retry management

2010-05-11 Thread Nitro
On 11.05.2010, 16:01, Jim Fulton  wrote:

> This wouldn't work.  You would need to re-execute the suite
> for each retry. It's not enough to just keep committing the same
> transaction. (There are other details wrong with the code above,
> but they are fixable.)  Python doesn't provide a way to keep
> executing the suite.

You are right.

The only thing I could come up with was something like below, using a  
decorator instead of a context.

-Matthias

@doTransaction(count = 5)
def storeData():
    ... store data here ...

def doTransaction(transaction = None, count = 3):
    def decorator(func):
        def do():
            for i in range(1+count):
                try:
                    func()
                except:
                    transaction.abort()
                    raise
                try:
                    transaction.commit()
                except ConflictError:
                    if i == count:
                        raise
                else:
                    return
        return do
    return decorator

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Some interesting (to some:) numbers

2010-05-11 Thread Adam GROSZER
Hello Jim,



Tuesday, May 11, 2010, 4:46:46 PM, you wrote:

JF> On Tue, May 11, 2010 at 7:47 AM, Adam GROSZER  wrote:
>> Hello Jim,
>>
>> Tuesday, May 11, 2010, 1:37:19 PM, you wrote:
>>
>> JF> On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER  wrote:
 Hello Jim,

 Tuesday, May 11, 2010, 12:33:04 PM, you wrote:

 JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER  
 wrote:
>> Hello Jim,
>>
>> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>>
>> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink  
>> wrote:
 That's really interesting! Did you notice any issues performance wise, 
 or
 didn't you check that yet?
>>
>> JF> I didn't check performance. I just iterated over a file storage file,
>> JF> checking compressed and uncompressed pickle sizes.
>>
>> I'd say some checksum is then also needed to detect bit failures that
>> mess up the compressed data.

 JF> Why?

 I think the gzip algo compresses to a bit-stream, where even one bit
 has an error the rest of the uncompressed data might be a total mess.
 If that one bit is relatively early in the stream it's fatal.
 Salvaging the data is not a joy either.
 I know at this level we should expect that the OS and any underlying
 infrastructure should provide error-free data or fail.
 Tho I've seen some magic situations where the file copied without
 error through a network, but at the end CRC check failed on it :-O
>>
>> JF> How would a checksum help?  All it would do is tell you you're hosed.
>> JF> It wouldn't make you any less hosed.
>>
>> Yes, but I would know why it's hosed.

JF> How so?  How would you know why it is hosed.

Because of data corruption in the compressed stream.

JF> Note BTW that the zlib format already includes a checksum.

JF>   http://www.faqs.org/rfcs/rfc1950.html

I missed that. Case closed then ;-) Sorry for the noise.


-- 
Best regards,
 Adam GROSZER    mailto:agros...@gmail.com
--
Quote of the day:
Some are atheists by neglect; others are so by affectation; they that think 
there is no God at some times do not think so at all times. 
- Benjamin Whichcote 

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Some interesting (to some:) numbers

2010-05-11 Thread Jim Fulton
On Tue, May 11, 2010 at 7:47 AM, Adam GROSZER  wrote:
> Hello Jim,
>
> Tuesday, May 11, 2010, 1:37:19 PM, you wrote:
>
> JF> On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER  wrote:
>>> Hello Jim,
>>>
>>> Tuesday, May 11, 2010, 12:33:04 PM, you wrote:
>>>
>>> JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER  
>>> wrote:
> Hello Jim,
>
> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>
> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink  
> wrote:
>>> That's really interesting! Did you notice any issues performance wise, 
>>> or
>>> didn't you check that yet?
>
> JF> I didn't check performance. I just iterated over a file storage file,
> JF> checking compressed and uncompressed pickle sizes.
>
> I'd say some checksum is then also needed to detect bit failures that
> mess up the compressed data.
>>>
>>> JF> Why?
>>>
>>> I think the gzip algo compresses to a bit-stream, where even one bit
>>> has an error the rest of the uncompressed data might be a total mess.
>>> If that one bit is relatively early in the stream it's fatal.
>>> Salvaging the data is not a joy either.
>>> I know at this level we should expect that the OS and any underlying
>>> infrastructure should provide error-free data or fail.
>>> Tho I've seen some magic situations where the file copied without
>>> error through a network, but at the end CRC check failed on it :-O
>
> JF> How would a checksum help?  All it would do is tell you you're hosed.
> JF> It wouldn't make you any less hosed.
>
> Yes, but I would know why it's hosed.

How so?  How would you know why it is hosed?

Note BTW that the zlib format already includes a checksum.

  http://www.faqs.org/rfcs/rfc1950.html
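
That adler32 check means a flipped bit in a compressed record normally
surfaces as a zlib.error at decompression time rather than as silently
corrupt data. A quick way to see it:

import zlib

data = zlib.compress('some pickled object data ' * 50)
corrupt = data[:20] + chr(ord(data[20]) ^ 0x01) + data[21:]   # flip one bit

zlib.decompress(data)            # fine
try:
    zlib.decompress(corrupt)
except zlib.error, e:
    print 'corruption detected:', e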

Jim

-- 
Jim Fulton
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Automating retry management

2010-05-11 Thread Benji York
On Tue, May 11, 2010 at 10:08 AM, Jim Fulton  wrote:
> This is an improvement. It's still unsatisfying, but I don't think I'm going 
> to
> get satisfaction. :)

Given that PEP 343 explicitly mentions *not* supporting an auto retry
construct, I should think not. :)
-- 
Benji York
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Automating retry management

2010-05-11 Thread Jim Fulton
On Tue, May 11, 2010 at 8:38 AM, Benji York  wrote:
> On Tue, May 11, 2010 at 7:34 AM, Jim Fulton  wrote:
>> [...] The best I've been
>> able to come up with is something like:
>>
>>    t = ZODB.transaction(3)
>>    while t.trying:
>>        with t:
>>            ... transaction body ...
>
> I think you could get this to work:
>
> for transaction in ZODB.retries(3):
>    with transaction:
>        ... transaction body ...
>
> ZODB.retries would return an iterator that would raise StopIteration on
> the next go-round if the previously yielded context manager exited
> without a ConflictError.

This is an improvement. It's still unsatisfying, but I don't think I'm going to
get satisfaction. :)

BTW, if I do something like this, I think I'll add a retry exception to
the transaction package and have ZODB.POSException.ConflictError
extend it so I can add the retry automation to the transaction package.

Jim

-- 
Jim Fulton
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Some interesting (to some:) numbers

2010-05-11 Thread Leonardo Santagada

On May 11, 2010, at 9:53 AM, Lennart Regebro wrote:

> On Tue, May 11, 2010 at 14:47, Adam GROSZER  wrote:
>> Probably that crappy data would make the unpickler fail... or wait a
>> second... the unpickler is a **SECURITY HOLE** in python, isn't it?
>> That means feed it some random data... and stay tuned for the
>> unexpected.
> 
> That a bitflip would generate random data that actually did anything
> at all is a bit like if you shake a puzzle box and out comes a
> dinosaur and bites your leg. :-)
> 
>> The thing is that a single bitflip could cause a LOT of crap.
> 
> Most likely it would generate an unpickling error. But yeah, in
> theory at least you are right. I have no idea what the performance
> penalty would be, but a checksum would feel good. :)

Most likely a bit flip in uncompressed data is much worse, as it will probably 
pass unnoticed until it causes major pain somewhere far away from where the 
bit flip occurred. In that sense, keeping data compressed all the way to a ZEO 
client is better, since it gives a higher chance of fail-stop. I think, maybe :)


--
Leonardo Santagada
santagada at gmail.com



___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Automating retry management

2010-05-11 Thread Jim Fulton
On Tue, May 11, 2010 at 7:52 AM, Nitro  wrote:
> I'm already using custom transaction/savepoint context managers in my
> code. I use them like

...
> Now you could probably extend this to look like
>
> class TransactionContext(object):
>     def __init__(self, txn = None, retryCount = 3):
>         if txn is None:
>             txn = transaction.get()
>         self.txn = txn
>         self.retryCount = retryCount
>
>     def __enter__(self):
>         return self.txn
>
>     def __exit__(self, t, v, tb):
>         if t is not None:
>             self.txn.abort()
>         else:
>             for i in range(self.retryCount):
>                 try:
>                     self.txn.commit()
>                 except ConflictError as exc2:
>                     exc = exc2
>                 else:
>                     return
>             raise exc
>
> The looping/except part could probably look nicer. Use case looks like:
>
> with TransactionContext(mytransaction, retryCount = 5):
>     db.root['sp_test'] = 'init'
>
> Does this look similar to what you were looking for?

This wouldn't work.  You would need to re-execute the suite
for each retry. It's not enough to just keep committing the same
transaction. (There are other details wrong with the code above,
but they are fixable.)  Python doesn't provide a way to keep
executing the suite.

Jim

-- 
Jim Fulton
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Some interesting (to some:) numbers

2010-05-11 Thread Lennart Regebro
On Tue, May 11, 2010 at 14:47, Adam GROSZER  wrote:
> Probably that crappy data would make the unpickler fail... or wait a
> second... the unpickler is a **SECURITY HOLE** in python, isn't it?
> That means feed it some random data... and stay tuned for the
> unexpected.

That a bitflip would generate random data that actually did anything
at all is a bit like if you shake a puzzle box and out comes a
dinosaur and bites your leg. :-)

> The thing is that a single bitflip could cause a LOT of crap.

Most likely it would generate an unpickling error. But yeah, in
theory at least you are right. I have no idea what the performance
penalty would be, but a checksum would feel good. :)

-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Some interesting (to some:) numbers

2010-05-11 Thread Adam GROSZER
Hello,

Tuesday, May 11, 2010, 1:59:17 PM, you wrote:

N> On 11.05.2010, 13:47, Adam GROSZER  wrote:

>> Hello Jim,
>>
>> Tuesday, May 11, 2010, 1:37:19 PM, you wrote:
>>
>> JF> On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER   
>> wrote:
 Hello Jim,

 Tuesday, May 11, 2010, 12:33:04 PM, you wrote:

 JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER   
 wrote:
>> Hello Jim,
>>
>> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>>
>> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink  
>>  wrote:
 That's really interesting! Did you notice any issues performance  
 wise, or
 didn't you check that yet?
>>
>> JF> I didn't check performance. I just iterated over a file storage  
>> file,
>> JF> checking compressed and uncompressed pickle sizes.
>>
>> I'd say some checksum is then also needed to detect bit failures that
>> mess up the compressed data.

 JF> Why?

 I think the gzip algo compresses to a bit-stream, where even one bit
 has an error the rest of the uncompressed data might be a total mess.
 If that one bit is relatively early in the stream it's fatal.
 Salvaging the data is not a joy either.
 I know at this level we should expect that the OS and any underlying
 infrastructure should provide error-free data or fail.
 Tho I've seen some magic situations where the file copied without
 error through a network, but at the end CRC check failed on it :-O
>>
>> JF> How would a checksum help?  All it would do is tell you you're hosed.
>> JF> It wouldn't make you any less hosed.
>>
>> Yes, but I would know why it's hosed.
>> Not like I'm expecting 2+2=4 and get 5 somewhere deep in the custom
>> app that does some calculation.

N> You could have bitflips anywhere in the database, not just the payload
N> parts. You'd have to checksum and test everything all the time. Imo it's
N> not worth the complexity and performance penalty given today's redundant
N> storages like RAID, ZRS or zeoraid.

N> Btw, the current pickle payload format is not secured against any bitflips
N> either I think.

The difference between the uncompressed and compressed cases is that if you
have bitflips in an uncompressed data stream then you get, let's say, a
B instead of an A, or 3 instead of 1. That hits hard in numbers/IDs, but
keeps strings still human-readable, because the rest is still there.
In a compressed stream the rest of the pickle/payload would probably be
crap. Probably that crappy data would make the unpickler fail... or wait a
second... the unpickler is a **SECURITY HOLE** in python, isn't it?
That means feed it some random data... and stay tuned for the
unexpected.
The thing is that a single bitflip could cause a LOT of crap.

You're right that currently there's no protection against such
bitflips, but I'd rather present the user a nice error than some
crappy data.

-- 
Best regards,
 Adam GROSZER    mailto:agros...@gmail.com
--
Quote of the day:
If necessity is the mother of invention, discontent is the father of progress. 
- David Rockefeller 

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Automating retry management

2010-05-11 Thread Benji York
On Tue, May 11, 2010 at 7:34 AM, Jim Fulton  wrote:
> [...] The best I've been
> able to come up with is something like:
>
>    t = ZODB.transaction(3)
>    while t.trying:
>        with t:
>            ... transaction body ...

I think you could get this to work:

for transaction in ZODB.retries(3):
    with transaction:
        ... transaction body ...

ZODB.retries would return an iterator that would raise StopIteration on
the next go-round if the previously yielded context manager exited
without a ConflictError.
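
Something along these lines, maybe (just a sketch of the idea, not an
existing API; a real version would need to decide what to do after the last
failed attempt):

import transaction
from ZODB.POSException import ConflictError

class _Attempt(object):
    succeeded = False

    def __enter__(self):
        return transaction.begin()

    def __exit__(self, t, v, tb):
        if t is None:
            try:
                transaction.commit()
                self.succeeded = True
            except ConflictError:
                transaction.abort()
            return False
        transaction.abort()
        return issubclass(t, ConflictError)   # swallow conflicts so we retry

def retries(n):
    for i in range(n):
        attempt = _Attempt()
        yield attempt
        if attempt.succeeded:
            return          # StopIteration: previous attempt committed
    # all n attempts conflicted; a real version should probably re-raise here

# usage:
# for attempt in retries(3):
#     with attempt:
#         ... transaction body ...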
-- 
Benji York
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Automating retry management

2010-05-11 Thread Nitro
>  def __exit__(self, t, v, tb):
>  if t is not None:
>  self.txn.abort()
>  else:
>  for i in range(self.retryCount):

Oops, bug here. It should read range(1 + self.retryCount). It should  
probably have unittests anyway :-)

-Matthias
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Some interesting (to some:) numbers

2010-05-11 Thread Nitro
On 11.05.2010, 13:47, Adam GROSZER  wrote:

> Hello Jim,
>
> Tuesday, May 11, 2010, 1:37:19 PM, you wrote:
>
> JF> On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER   
> wrote:
>>> Hello Jim,
>>>
>>> Tuesday, May 11, 2010, 12:33:04 PM, you wrote:
>>>
>>> JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER   
>>> wrote:
> Hello Jim,
>
> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>
> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink  
>  wrote:
>>> That's really interesting! Did you notice any issues performance  
>>> wise, or
>>> didn't you check that yet?
>
> JF> I didn't check performance. I just iterated over a file storage  
> file,
> JF> checking compressed and uncompressed pickle sizes.
>
> I'd say some checksum is then also needed to detect bit failures that
> mess up the compressed data.
>>>
>>> JF> Why?
>>>
>>> I think the gzip algo compresses to a bit-stream, where even one bit
>>> has an error the rest of the uncompressed data might be a total mess.
>>> If that one bit is relatively early in the stream it's fatal.
>>> Salvaging the data is not a joy either.
>>> I know at this level we should expect that the OS and any underlying
>>> infrastructure should provide error-free data or fail.
>>> Tho I've seen some magic situations where the file copied without
>>> error through a network, but at the end CRC check failed on it :-O
>
> JF> How would a checksum help?  All it would do is tell you you're hosed.
> JF> It wouldn't make you any less hosed.
>
> Yes, but I would know why it's hosed.
> Not like I'm expecting 2+2=4 and get 5 somewhere deep in the custom
> app that does some calculation.

You could have bitflips anywhere in the database, not just in the payload
parts. You'd have to checksum and test everything all the time. IMO it's
not worth the complexity and performance penalty given today's redundant
storages like RAID, ZRS or zeoraid.

Btw, the current pickle payload format is not secured against bitflips
either, I think.

-Matthias
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Automating retry management

2010-05-11 Thread Nitro
I'm already using custom transaction/savepoint context managers in my  
code. I use them like

with TransactionContext():
    db.root['sp_test'] = 'init'


with SavepointContext():
    db.root['sp_test'] = 'saved'

One of the context managers:

class TransactionContext(object):
    def __init__(self, txn = None):
        if txn is None:
            txn = transaction.get()
        self.txn = txn

    def __enter__(self):
        return self.txn

    def __exit__(self, t, v, tb):
        if t is not None:
            self.txn.abort()
        else:
            self.txn.commit()


Now you could probably extend this to look like

class TransactionContext(object):
    def __init__(self, txn = None, retryCount = 3):
        if txn is None:
            txn = transaction.get()
        self.txn = txn
        self.retryCount = retryCount

    def __enter__(self):
        return self.txn

    def __exit__(self, t, v, tb):
        if t is not None:
            self.txn.abort()
        else:
            for i in range(self.retryCount):
                try:
                    self.txn.commit()
                except ConflictError as exc2:
                    exc = exc2
                else:
                    return
            raise exc

The looping/except part could probably look nicer. Use case looks like:

with TransactionContext(mytransaction, retryCount = 5):
    db.root['sp_test'] = 'init'

Does this look similar to what you were looking for?

-Matthias

For completeness, here's my savepoint manager:

class SavepointContext(object):
    def __enter__(self, txn = None):
        if txn is None:
            txn = transaction.get()
        self.savepoint = txn.savepoint()
        return self.savepoint

    def __exit__(self, type, value, traceback):
        if type is not None:
            self.savepoint.rollback()
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Some interesting (to some:) numbers

2010-05-11 Thread Adam GROSZER
Hello Jim,

Tuesday, May 11, 2010, 1:37:19 PM, you wrote:

JF> On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER  wrote:
>> Hello Jim,
>>
>> Tuesday, May 11, 2010, 12:33:04 PM, you wrote:
>>
>> JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER  wrote:
 Hello Jim,

 Monday, May 10, 2010, 1:27:00 PM, you wrote:

 JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink  
 wrote:
>> That's really interesting! Did you notice any issues performance wise, or
>> didn't you check that yet?

 JF> I didn't check performance. I just iterated over a file storage file,
 JF> checking compressed and uncompressed pickle sizes.

 I'd say some checksum is then also needed to detect bit failures that
 mess up the compressed data.
>>
>> JF> Why?
>>
>> I think the gzip algo compresses to a bit-stream, where even one bit
>> has an error the rest of the uncompressed data might be a total mess.
>> If that one bit is relatively early in the stream it's fatal.
>> Salvaging the data is not a joy either.
>> I know at this level we should expect that the OS and any underlying
>> infrastructure should provide error-free data or fail.
>> Tho I've seen some magic situations where the file copied without
>> error through a network, but at the end CRC check failed on it :-O

JF> How would a checksum help?  All it would do is tell you you're hosed.
JF> It wouldn't make you any less hosed.

Yes, but I would know why it's hosed.
It's not like I'd be expecting 2+2=4 and getting 5 somewhere deep in the
custom app that does some calculation.

-- 
Best regards,
 Adam GROSZER    mailto:agros...@gmail.com
--
Quote of the day:
The Past is over for all of us... the Future is promised to none of us. 
- Wayne Dyer 

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Some interesting (to some:) numbers

2010-05-11 Thread Jim Fulton
On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER  wrote:
> Hello Jim,
>
> Tuesday, May 11, 2010, 12:33:04 PM, you wrote:
>
> JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER  wrote:
>>> Hello Jim,
>>>
>>> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>>>
>>> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink  
>>> wrote:
> That's really interesting! Did you notice any issues performance wise, or
> didn't you check that yet?
>>>
>>> JF> I didn't check performance. I just iterated over a file storage file,
>>> JF> checking compressed and uncompressed pickle sizes.
>>>
>>> I'd say some checksum is then also needed to detect bit failures that
>>> mess up the compressed data.
>
> JF> Why?
>
> I think the gzip algo compresses to a bit-stream, where even one bit
> has an error the rest of the uncompressed data might be a total mess.
> If that one bit is relatively early in the stream it's fatal.
> Salvaging the data is not a joy either.
> I know at this level we should expect that the OS and any underlying
> infrastructure should provide error-free data or fail.
> Tho I've seen some magic situations where the file copied without
> error through a network, but at the end CRC check failed on it :-O

How would a checksum help?  All it would do is tell you you're hosed.
It wouldn't make you any less hosed.

Jim

-- 
Jim Fulton
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Automating retry management

2010-05-11 Thread Jim Fulton
So I'm about to update the transaction package and this gives me an
opportunity to do something I've been meaning to do for a while, which
is to add support for the Python with statement:

with transaction:
    ... transaction body ...

and:

with some_transaction_manager as t:
    ... transaction body, accesses current transaction as t ...

This looks really great, IMO, but there's a major piece missing, which
is dealing with transient transaction failures due to conflicts.  If
using an optimistic transaction mechanism, like ZODB's, you have to
deal with conflict errors.  If using a lock-based transaction
mechanism, you'd have to deal with deadlock detection. In either case,
you have to detect conflicts and retry the transaction.

I wonder how other Python database interfaces deal with this.  (I just
skimmed the DBI v2 spec and didn't see anything.) What happens, for
example, if there are conflicting writes in a Postgres or Oracle
application? I assume that some sort of exception is raised.

I also wonder how this situation could be handled elegantly.  To deal
with conflicts, (assuming transaction had with statement support)
you'd end up with:

tries = 0
while 1:
    try:
        with transaction:
            conn.root.x = 1
    except ZODB.POSException.ConflictError:
        tries += 1
        if tries > 3:
            raise
    else:
        break

Yuck!  (Although it's better than it would be without transaction
with statement support.) In web applications, we generally don't see
the retry management because the framework takes care of it for
us. That is until we write a script to do something outside of a web
application.

This would be easier to automate if Python let us write custom looping
structures or allowed full anonymous functions.  The best I've been
able to come up with is something like:

t = ZODB.transaction(3)
while t.trying:
    with t:
        ... transaction body ...

Here the transaction function returns an object that:

- keeps track of how many times it's tried and manages a "trying"
  attribute that is true while we haven't given up or succeeded, and

- is a context manager that takes care of transaction boundaries and
  updates the trying attr depending on transaction outcome.

This version is better than the try/except one,
but isn't entirely satisfying. :)

Does anyone have any better ideas?

I use a ZODB function, because ConflictError is ZODB specific.  It
would be nice if this could be standardized, so that a mechanism could
be defined by the transaction package.

Jim

--
Jim Fulton
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Some interesting (to some:) numbers

2010-05-11 Thread Adam GROSZER
Hello Jim,

Tuesday, May 11, 2010, 12:33:04 PM, you wrote:

JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER  wrote:
>> Hello Jim,
>>
>> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>>
>> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink  wrote:
 That's really interesting! Did you notice any issues performance wise, or
 didn't you check that yet?
>>
>> JF> I didn't check performance. I just iterated over a file storage file,
>> JF> checking compressed and uncompressed pickle sizes.
>>
>> I'd say some checksum is then also needed to detect bit failures that
>> mess up the compressed data.

JF> Why?

I think the gzip algorithm compresses to a bit-stream, where if even one bit
has an error the rest of the uncompressed data might be a total mess.
If that one bit is relatively early in the stream it's fatal.
Salvaging the data is not a joy either.
I know at this level we should expect the OS and any underlying
infrastructure to provide error-free data or fail.
Though I've seen some magic situations where a file copied without
error through a network, but at the end the CRC check failed on it :-O

-- 
Best regards,
 Adam GROSZER    mailto:agros...@gmail.com
--
Quote of the day:
You may have to fight a battle more than once to win it. 
- Margaret Thatcher 

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage (even with cache-size-bytes)

2010-05-11 Thread Jim Fulton
On Mon, May 10, 2010 at 8:20 PM, Ryan Noon  wrote:
> P.S. About the data structures:
> wordset is a freshly unpickled python set from my old sqlite oodb thingy.
> The new docsets I'm keeping are 'L' arrays from the stdlib array module.
>  I'm up for using ZODB's builtin persistent data structures if it makes a
> lot of sense to do so, but it sorta breaks my abstraction a bit and I feel
> like the memory issues I'm having are somewhat independent of the container
> data structures (as I'm having the same issue just with fixed size strings).

This is getting tiresome.  We can't really advise you because we can't
see what data structures you're using and we're wasting too much time
guessing. We wouldn't have to guess and grill you if you showed a
complete demonstration program, or at least one that showed what the
heck you're doing.

The program you've shown so far is so incomplete that perhaps we're
missing the obvious.

In your original program, you never actually store anything in the
database. You assign the database root to self.root, but never use
self.root. (The variable self is not defined and we're left to assume
that this disembodied code is part of a method definition.) In your
most recent snippet, you don't show any database access. If you
never actually store anything in the database, then nothing will be
removed from memory.

You're inserting data into wordid_to_docset, but you don't show its
definition and won't tell us what it is.

Jim

--
Jim Fulton
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Some interesting (to some:) numbers

2010-05-11 Thread Jim Fulton
On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER  wrote:
> Hello Jim,
>
> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>
> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink  wrote:
>>> That's really interesting! Did you notice any issues performance wise, or
>>> didn't you check that yet?
>
> JF> I didn't check performance. I just iterated over a file storage file,
> JF> checking compressed and uncompressed pickle sizes.
>
> I'd say some checksum is then also needed to detect bit failures that
> mess up the compressed data.

Why?

Jim

-- 
Jim Fulton
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Some interesting (to some:) numbers

2010-05-11 Thread Adam GROSZER
Hello Jim,

Monday, May 10, 2010, 1:27:00 PM, you wrote:

JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink  wrote:
>> That's really interesting! Did you notice any issues performance wise, or
>> didn't you check that yet?

JF> I didn't check performance. I just iterated over a file storage file,
JF> checking compressed and uncompressed pickle sizes.

I'd say some checksum is then also needed to detect bit failures that
mess up the compressed data.

-- 
Best regards,
 Adam GROSZER    mailto:agros...@gmail.com
--
Quote of the day:
A true friend is someone who is there for you when he'd rather be anywhere 
else. 
- Len Wein 

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev