[Zope-dev] Re: ZOBD and pointers
[Yair Benita] Reading this answer I understand that anything I store should be persistent, even if it's a list I don't plan to edit.

[Tim Peters] I wouldn't say that. For example, for _most_ applications it would be foolish to create a subclass of Persistent to store an integer, as opposed to just storing an integer directly. I can conceive of (unlikely!) applications where there may be advantages to storing integers as persistent objects, though.

[Tres Seaver] As, for instance, where the integer changes much more frequently than the other attributes, which are large enough that re-storing them just because the integer attribute changed is painful.

Yup, that's a possible reason. Another recently popped up, which I'll exaggerate to make the point: you have 100,000 distinct integer ids, and you have 10,000 objects each with a (Python) list containing 10,000 of those ids. If you load those all into memory, Python will allocate space for 10,000 * 10,000 = 100 million integer objects, and that will consume more than a gigabyte of RAM. But if integers are stored as one unique persistent object per unique integer, it can't require more than 100 thousand distinct persistent integers in memory (because that's the total number of distinct integer ids). The RAM difference is a factor of about 1000 (but ignoring that it takes more RAM to hold a persistent wrapper than to hold a straight integer).

I'll note that IISets avoid this problem via a different route: they hold their integers as raw bits, not as Python integer objects. When you extract an element from an IISet, a Python integer object is created on-the-fly to wrap the bits.

[Tres Seaver] Making the attribute a persistent sub-object also eliminates the chance of a ConflictError based on changes to the other attributes.

I didn't follow that one. If other attributes change, they can trigger conflict errors, right?

[Tres Seaver] This is the use case which drives BTrees.Length, right?
The important part of that is its conflict resolution method, which keeps track of the correct final size of a BTree in the face of concurrent mutations. BTrees don't keep track of their own size because every addition or deletion would have to percolate the change in size back up to the root of the BTree, and we'd get conflict errors on the root object then. As is, most additions and deletions change only the leaf Bucket node where the mutation takes place, giving mutation often-useful spatial locality in the face of concurrent mutations.

I wish we could do better than that, though: from what I see, most people don't realize that len(some_BTree) takes time linear in the number of elements, and sucks the entire BTree into RAM. The rest seem to have trouble, at least at first, using BTrees.Length correctly. I suppose that's what you get when a scheme is driven by pragmatic implementation compromises instead of by semantic necessity.

Given enough pain, it should be possible to hide the BTrees.Length strategy under the covers, although I'm not sure the increase in storage size could be justified to users who have mastered the details of doing it manually (the problem being that many uses for BTrees never care to ask for the size, so wouldn't want to pay extra overheads for keeping track of size efficiently).

___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
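The conflict resolution described in the message above can be sketched without ZODB. This is a minimal illustration, assuming the counter is merged by applying both transactions' deltas to the value they started from; the function name resolve_length is hypothetical, and in BTrees.Length the corresponding hook is the _p_resolveConflict(old, s1, s2) method.

```python
def resolve_length(old, committed, ours):
    """Merge two concurrent changes to a size counter.

    old       -- value both transactions started from
    committed -- value written by the transaction that committed first
    ours      -- value the conflicting transaction wants to write
    """
    # Apply both deltas to the common starting value:
    # old + (committed - old) + (ours - old)
    return committed + ours - old


# Two transactions start from a size of 10. One adds 3 keys, the other
# removes 1 key; neither sees the other's change.
merged = resolve_length(old=10, committed=13, ours=9)
print(merged)  # 12
```

Because the merge only needs the three integer states, it never has to look at the BTree itself, which is why the counter can live in its own small object.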
[Zope-dev] Re: ZOBD and pointers
Tim Peters wrote:

[Tres Seaver] Making the attribute a persistent sub-object also eliminates the chance of a ConflictError based on changes to the other attributes.

[Tim Peters] I didn't follow that one. If other attributes change, they can trigger conflict errors, right?

Imagine object A with attributes 'foo' (a string), 'bar' (a normal Python int), and 'baz' (a hypothetical persistent int). Assigning directly to 'baz' would still conflict with assigning to 'foo' or 'bar'; however, the persistent int object might have an update protocol which made its value changeable without needing to rebind another persistent int into its parent.

[Tres Seaver] This is the use case which drives BTrees.Length, right?

[Tim Peters] The important part of that is its conflict resolution method, which keeps track of the correct final size of a BTree in the face of concurrent mutations. BTrees don't keep track of their own size because every addition or deletion would have to percolate the change in size back up to the root of the BTree, and we'd get conflict errors on the root object then. As is, most additions and deletions change only the leaf Bucket node where the mutation takes place, giving mutation often-useful spatial locality in the face of concurrent mutations. I wish we could do better than that, though: from what I see, most people don't realize that len(some_BTree) takes time linear in the number of elements, and sucks the entire BTree into RAM. The rest seem to have trouble, at least at first, using BTrees.Length correctly. I suppose that's what you get when a scheme is driven by pragmatic implementation compromises instead of by semantic necessity.
Given enough pain, it should be possible to hide the BTrees.Length strategy under the covers, although I'm not sure the increase in storage size could be justified to users who have mastered the details of doing it manually (the problem being that many uses for BTrees never care to ask for the size, so wouldn't want to pay extra overheads for keeping track of size efficiently).

OK, cool.

Tres.
--
Tres Seaver    Palladion Software    "Excellence by Design"    http://palladion.com
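The "doing it manually" pattern mentioned in the quoted text can be sketched as follows. This is a toy illustration only: CountedMapping is a hypothetical helper, a plain dict stands in for the BTree, and a plain integer stands in for the BTrees.Length counter (whose change(delta) method would be called instead of adjusting the integer).

```python
class CountedMapping:
    """Toy mapping that tracks its size explicitly (hypothetical helper)."""

    def __init__(self):
        self._data = {}   # stand-in for a BTree
        self._size = 0    # stand-in for a BTrees.Length counter

    def set(self, key, value):
        # Count only genuinely new keys; overwriting must not bump the size.
        if key not in self._data:
            self._size += 1
        self._data[key] = value

    def delete(self, key):
        # Raises KeyError before the counter is touched if key is absent.
        del self._data[key]
        self._size -= 1

    def size(self):
        # Constant time: no walk over the stored items.
        return self._size


m = CountedMapping()
m.set("a", 1)
m.set("b", 2)
m.set("a", 3)   # overwrite, size unchanged
m.delete("b")
print(m.size())  # 1
```

The easy mistake is to adjust the counter on every assignment rather than only when a key is actually added or removed.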
[Zope-dev] Re: ZOBD and pointers
As far as I am aware, ZODB will store a list of pointers to the lists of z objects. For efficient use of ZODB, be careful that your list is stored in an efficient way, at least if the list is long or updated often.

When you pack your ZODB does it take up a lot less space? If so it may be that a lot of space is being wasted storing the updated lists of object references. Unless you use a special PersistentList ZODB will have no choice but to store a new copy of the whole list when that list is modified. If you have long lists then this can be a big problem.

The Persistent classes have special handling to make them more efficient. So instead of lists use PersistentLists and instead of dicts use BTrees, as these may be stored more efficiently in the ZODB.

Also have a look at the analyze.py script to try and track down where the space is being used. My notes here may be helpful too: http://zopelabs.com/cookbook/1114086617

Hope that helps,

Laurence

Yair Benita wrote:

Hi All,

As my ZODB data files become larger and larger I am looking at ways to make the structure of my objects more efficient. To simplify my question, suppose I have two different classes and both contain a list of objects from a third class:

    class x has the attribute x.elements = [objects of class z]
    class y has the attribute y.elements = [objects of class z]

As far as I understand Python, the lists x.elements and y.elements contain pointers to the z objects previously defined. What I wanted to know is how ZODB handles that (or maybe I should say: how pickle handles that) when saving to a file. Will the pointers be converted to a copy of the z class objects, or will one copy of the z class objects be saved and then x.elements and y.elements will still be lists of pointers?

Thanks for the help,

Yair
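The pickle half of Yair's question can be checked with the standard library alone. The sketch below uses plain dicts as stand-ins for the x, y and z objects (an assumption made to keep it self-contained). In ZODB, each Persistent object is pickled into its own record and referred to by oid, while non-persistent objects are pickled inside the record of whichever persistent object holds them, so the "two separate pickles" case is the one to watch for when z is not Persistent.

```python
import pickle

# Stand-ins for two z objects shared by x.elements and y.elements.
z1 = {"name": "z1"}
z2 = {"name": "z2"}
x = {"elements": [z1, z2]}
y = {"elements": [z1, z2]}

# One pickle containing both holders: each z is written once and the
# second mention becomes a back-reference, so identity survives.
x1, y1 = pickle.loads(pickle.dumps((x, y)))
print(x1["elements"][0] is y1["elements"][0])  # True

# Two independent pickles: each one carries its own copy of the z objects.
x2 = pickle.loads(pickle.dumps(x))
y2 = pickle.loads(pickle.dumps(y))
print(x2["elements"][0] is y2["elements"][0])  # False
```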
Re: [Zope-dev] Re: ZOBD and pointers
[Laurence Rowe] ... Unless you use a special PersistentList ZODB will have no choice but to store a new copy of the whole list when that list is modified.

Caution: that's true of a PersistentList too. The purpose of PersistentList isn't really to supply more-efficient storage (that's the purpose of the various BTree classes). The purpose of PersistentList is this:

    myobject.my_list_attribute[3] = 4

If my_list_attribute is a plain Python list, the persistence machinery has no way to know that my_list_attribute's state mutated, so the assignment above will not get stored to disk at the next commit unless you _also_ do

    myobject._p_changed = True # or 1

If my_list_attribute is a PersistentList, then the persistence machinery does know when its state mutates, and there's no need to manage _p_changed manually. But in either case, the entire state of my_list_attribute gets stored to disk whenever any part of it changes.

The only difference in what gets stored in the example above is that myobject's state also gets stored to disk if my_list_attribute is a Python list (assuming myobject._p_changed gets set to a true value by hand), while myobject's state does not need to get written to disk again if my_list_attribute is a PersistentList (then myobject refers to my_list_attribute via the latter's oid, and that oid hasn't changed, so there's no need to store myobject's state again). The entire state of the list attribute gets written out in either case.

[Laurence Rowe] If you have long lists then this can be a big problem.

Very true.

[Laurence Rowe] The Persistent classes have special handling to make them more efficient.

Sometimes true, but not in the PersistentList case.

[Laurence Rowe] So instead of lists use PersistentLists

If the goal is to save space, generally no, PersistentList won't help that; to the contrary, their state takes a little more space on disk than a plain list.
[Laurence Rowe] and instead of dicts use BTrees,

That one's different: a BTree is really a graph of (potentially _very_) many distinct persistent objects, and BTrees were designed to support space- and time-efficient mutation.

[Laurence Rowe] as these may be stored more efficiently in the ZODB.

For BTrees, yes.
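The change-detection point in this message can be illustrated with a toy model. This is not ZODB code: Owner and TrackedList are hypothetical stand-ins for a persistent object and a PersistentList, and the changed flag plays the role of _p_changed. The sketch only shows why an in-place list mutation is invisible to the owning object.

```python
class Owner:
    """Toy stand-in for a persistent object; tracks a 'changed' flag."""

    def __init__(self):
        object.__setattr__(self, "changed", False)

    def __setattr__(self, name, value):
        # Attribute assignment goes through here, so the owner notices it.
        object.__setattr__(self, name, value)
        if name != "changed":
            object.__setattr__(self, "changed", True)


class TrackedList(list):
    """Toy stand-in for PersistentList: flags itself on item assignment."""

    changed = False

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self.changed = True


obj = Owner()
obj.plain = [1, 2, 3, 5]
obj.tracked = TrackedList([1, 2, 3, 5])
obj.changed = False          # pretend a commit just happened

obj.plain[3] = 4             # mutates the list, never calls Owner.__setattr__
print(obj.changed)           # False: the owner has no idea anything changed

obj.tracked[3] = 4           # the list itself records the mutation
print(obj.tracked.changed)   # True
print(obj.changed)           # False: the owner need not be stored again

obj.changed = True           # the manual step a plain list requires
```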
[Zope-dev] Re: ZOBD and pointers
Tim Peters wrote:

[Yair Benita] ... Reading this answer I understand that anything I store should be persistent, even if it's a list I don't plan to edit.

[Tim Peters] I wouldn't say that. For example, for _most_ applications it would be foolish to create a subclass of Persistent to store an integer, as opposed to just storing an integer directly. I can conceive of (unlikely!) applications where there may be advantages to storing integers as persistent objects, though.

As, for instance, where the integer changes much more frequently than the other attributes, which are large enough that re-storing them just because the integer attribute changed is painful.

Making the attribute a persistent sub-object also eliminates the chance of a ConflictError based on changes to the other attributes.

This is the use case which drives BTrees.Length, right?

Tres.
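The re-storing cost Tres describes can be estimated with pickle alone. This is a rough sketch, assuming a database record is approximately the pickle of one object's state; the attribute names text, hits and value are made up for the illustration, and real ZODB records carry some extra per-record overhead.

```python
import pickle

big_text = "x" * 100_000   # large, rarely changing attribute
parent_state = {"text": big_text, "hits": 0}

# Counter stored inline: every change to 'hits' re-stores the whole state.
inline_record = len(pickle.dumps(parent_state))

# Counter stored as its own object: only this small state is rewritten.
counter_record = len(pickle.dumps({"value": 0}))

print(inline_record > 100_000)  # True
print(counter_record < 100)     # True
```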