[Zope-dev] Re: ZOBD and pointers

2005-06-21 Thread Tim Peters
[Yair Benita]
 Reading this answer I understand that anything I store should be
 persistent, even if it's a list I don't plan to edit.

[Tim Peters]
 I wouldn't say that.  For example, for _most_ applications it would be
 foolish to create a subclass of Persistent to store an integer, as
 opposed to just storing an integer directly.  I can conceive of
 (unlikely!) applications where there may be advantages to storing
 integers as persistent objects, though.

[Tres Seaver]
 As, for instance, where the integer changes much more frequently than
 the other attributes, which are large enough that re-storing them just
 because the integer attribute changed is painful.

Yup, that's a possible reason.  Another recently popped up, which I'll
exaggerate to make the point:  you have 100,000 distinct integer ids,
and you have 10,000 objects each with a (Python) list containing
10,000 of those ids.  If you load those all into memory, Python will
allocate space for 10,000*10,000 = 100 million integer objects, and that
will consume more than a gigabyte of RAM.  But if integers are stored
as one unique persistent object per unique integer, it can't require
more than 100 thousand distinct persistent integers in memory (because
that's the total number of distinct integer ids).  The RAM difference
is a factor of about 1000 (but ignoring that it takes more RAM to hold
a persistent wrapper than to hold a straight integer).
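
The counts in that example can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are made up for illustration, and it counts objects rather than bytes.

```python
# Counts from the exaggerated example in the message above.
distinct_ids = 100_000
objects = 10_000
ids_per_list = 10_000

# Worst case when every list element becomes its own int object on load.
separate_int_objects = objects * ids_per_list

# Upper bound when each distinct id is one shared persistent object.
shared_int_objects = distinct_ids

ratio = separate_int_objects // shared_int_objects
print(separate_int_objects)  # 100000000
print(ratio)                 # 1000
```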

I'll note that IISets avoid this problem via a different route:  they
hold their integers as raw bits, not as Python integer objects.  When
you extract an element from an IISet, a Python integer object is
created on-the-fly to wrap the bits.
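
The "raw bits" idea can be illustrated with the standard library's array module. This is an analogy only, not an IISet: array("i") is assumed here as a stand-in because it also packs machine integers and builds a Python int object only when an element is read.

```python
from array import array

# Stand-in for the idea only: array("i") packs C ints; it is not an IISet.
ids = array("i", range(1000))

# Storage is one fixed-size machine int per element,
# not one Python int object per element.
packed_bytes = ids.itemsize * len(ids)

# Indexing creates a Python int object on the fly to wrap the stored bits.
value = ids[500]
print(type(value).__name__, value)  # int 500
```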

 Making the attribute a persistent sub-object also eliminates the chance of a
 ConflictError based on changes to the other attributes.

I didn't follow that one.  If other attributes change, they can
trigger conflict errors, right?

 This is the use case which drives BTrees.Length, right?

The important part of that is its conflict resolution method, which
keeps track of the correct final size of a BTree in the face of
concurrent mutations.  BTrees don't keep track of their own size
because every addition or deletion would have to percolate the change
in size back up to the root of the BTree, and we'd get conflict errors
on the root object then.  As is, most additions and deletions change
only the leaf Bucket node where the mutation takes place, giving
mutation often-useful spatial locality in the face of concurrent
mutations.
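
As far as I understand it, the conflict resolution described above boils down to simple arithmetic on three values: the size both transactions started from, the size already committed by the other transaction, and the size this transaction wants to store. The function name below is hypothetical and stands alone; it is a sketch of the idea, not the BTrees.Length API.

```python
def resolve_length_conflict(old, committed, new):
    """Merge two concurrent changes to a counter that both started from `old`."""
    # Apply this transaction's delta (new - old) on top of the committed value.
    return committed + new - old


# Both transactions started from a recorded size of 10.
old = 10
committed = old + 3   # the other transaction added three keys and committed first
new = old - 1         # this transaction deleted one key
resolved = resolve_length_conflict(old, committed, new)
print(resolved)  # 12
```

Because both deltas are preserved, neither transaction has to be retried just because the counter changed underneath it.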

I wish we could do better than that, though:  from what I see, most
people don't realize that len(some_BTree) takes time linear in the
number of elements, and sucks the entire BTree into RAM.  The rest
seem to have trouble, at least at first, using BTrees.Length
correctly.  I suppose that's what you get when a scheme is driven by
pragmatic implementation compromises instead of by semantic necessity.
 Given enough pain, it should be possible to hide the BTrees.Length
strategy under the covers, although I'm not sure the increase in
storage size could be justified to users who have mastered the details
of doing it manually (the problem being that many uses for BTrees
never care to ask for the size, so wouldn't want to pay extra
overheads for keeping track of size efficiently).
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


[Zope-dev] Re: ZOBD and pointers

2005-06-21 Thread Tres Seaver

Tim Peters wrote:

 [Tres Seaver]

Making the attribute a persistent sub-object also eliminates the chance of a
ConflictError based on changes to the other attributes.
 
 
 I didn't follow that one.  If other attributes change, they can
 trigger conflict errors, right?

Imagine object A with attributes 'foo' (a string), 'bar' (a normal
Python int), and 'baz' (a hypothetical persistent int).  Assigning
directly to 'baz' would still conflict with assigning to 'foo' or 'bar';
 however, the persistent int object might have an update protocol
which made its value changeable without needing to rebind another PI
into its parent.

This is the use case which drives BTrees.Length, right?
 
 
 The important part of that is its conflict resolution method, which
 keeps track of the correct final size of a BTree in the face of
 concurrent mutations.  BTrees don't keep track of their own size
 because every addition or deletion would have to percolate the change
 in size back up to the root of the BTree, and we'd get conflict errors
 on the root object then.  As is, most additions and deletions change
 only the leaf Bucket node where the mutation takes place, giving
 mutation often-useful spatial locality in the face of concurrent
 mutations.
 
 I wish we could do better than that, though:  from what I see, most
 people don't realize that len(some_BTree) takes time linear in the
 number of elements, and sucks the entire BTree into RAM.  The rest
 seem to have trouble, at least at first, using BTrees.Length
 correctly.  I suppose that's what you get when a scheme is driven by
 pragmatic implementation compromises instead of by semantic necessity.
  Given enough pain, it should be possible to hide the BTrees.Length
 strategy under the covers, although I'm not sure the increase in
 storage size could be justified to users who have mastered the details
 of doing it manually (the problem being that many uses for BTrees
 never care to ask for the size, so wouldn't want to pay extra
 overheads for keeping track of size efficiently).

OK, cool.

Tres.


[Zope-dev] Re: ZOBD and pointers

2005-06-20 Thread Laurence Rowe
As far as I am aware, ZODB will store a list of pointers to the lists of
z objects. For efficient use of ZODB, take care that your list is stored
in an efficient way, at least if the list is long or updated often.


When you pack your ZODB does it take up a lot less space? If so it may 
be that a lot of space is being wasted storing the updated lists of 
object references. Unless you use a special PersistentList ZODB will 
have no choice but to store a new copy of the whole list when that list 
is modified. If you have long lists then this can be a big problem. The 
Persistent classes have special handling to make them more efficient.


So instead of lists use PersistentLists and instead of dicts use BTrees, 
as these may be stored more efficiently in the ZODB.


Also have a look at the analyze.py script to try and track down where 
the space is being used. My notes here may be helpful too 
http://zopelabs.com/cookbook/1114086617


Hope that helps,

Laurence

Yair Benita wrote:

Hi All,

As my ZODB data files become larger and larger I am looking at ways to make
the structure of my objects more efficient. To simplify my question, suppose
I have two different classes and both contain a list of objects from a
third class:

class x has the attribute x.elements = [objects of class z]
class y has the attribute y.elements = [objects of class z]

As far as I understand python the lists x.elements and y.elements contain
pointers to the z objects previously defined. What I wanted to know is how
ZODB handles that (or maybe I should say: how pickle handles that) when
saving to a file. Will the pointers be converted to a copy of the z class
objects or will one copy of the z class objects be saved and then the
x.elements and y.elements will still be a list of pointers?

Thanks for the help,
Yair
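
The pickle half of the question can be tried out with the standard library alone. The sketch below does not use ZODB; it uses a plain dict as a stand-in for a z object, and it assumes the usual description of ZODB in which each persistent object is pickled into its own record. Under that assumption, a non-persistent z reachable from two persistent objects behaves like the "two separate pickles" case, while a z that is itself persistent is stored once and referred to by oid.

```python
import pickle

# A plain dict stands in for an instance of class z.
z = {"name": "shared"}
x_elements = [z]
y_elements = [z]

# One pickle holding both lists: z is written once and the sharing survives.
x2, y2 = pickle.loads(pickle.dumps((x_elements, y_elements)))
shared_in_one_pickle = x2[0] is y2[0]

# Two separate pickles: each carries its own copy of z.
x3 = pickle.loads(pickle.dumps(x_elements))
y3 = pickle.loads(pickle.dumps(y_elements))
shared_across_pickles = x3[0] is y3[0]

print(shared_in_one_pickle, shared_across_pickles)  # True False
```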




Re: [Zope-dev] Re: ZOBD and pointers

2005-06-20 Thread Tim Peters
[Laurence Rowe]
 ...
 Unless you use a special PersistentList ZODB will have no choice but
 to store a new copy of the whole list when that list is modified.

Caution:  that's true of a PersistentList too.  The purpose of
PersistentList isn't really to supply more-efficient storage (that's
the purpose of the various BTree classes).  The purpose of
PersistentList is this:

myobject.my_list_attribute[3] = 4

If my_list_attribute is a plain Python list, the persistence machinery
has no way to know that my_list_attribute's state mutated, so the
assignment above will not get stored to disk at the next commit unless
you _also_ do

myobject._p_changed = True # or 1

If my_list_attribute is a PersistentList, then the persistence
machinery does know when its state mutates, and there's no need to
manage _p_changed manually.

But in either case, the entire state of my_list_attribute gets stored
to disk whenever any part of it changes.  The only difference in what
gets stored in the example above is that myobject's state also gets
stored to disk if my_list_attribute is a Python list (assuming
myobject._p_changed gets set to a true value by hand), while
myobject's state does not need to get written to disk again if
my_list_attribute is a PersistentList (then myobject refers to
my_list_attribute via the latter's oid, and that oid hasn't changed,
so there's no need to store myobject's state again).  The entire state
of the list attribute gets written out in either case.
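
Why the persistence machinery cannot see the item assignment on a plain list can be shown with a toy class. Tracked and its changed flag are hypothetical stand-ins invented for this sketch; they only mimic the general idea of noticing attribute assignment and are not how Persistent or _p_changed is implemented.

```python
class Tracked:
    """Toy stand-in for a persistent object: attribute assignment marks it changed."""

    def __init__(self):
        object.__setattr__(self, "changed", False)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name != "changed":
            object.__setattr__(self, "changed", True)


obj = Tracked()
obj.my_list_attribute = [0, 1, 2, 3]
obj.changed = False  # pretend a commit just happened

# Mutating the list in place never calls obj.__setattr__, so obj cannot notice.
obj.my_list_attribute[3] = 4
changed_after_item_assignment = obj.changed
print(changed_after_item_assignment)  # False

# The manual fix, analogous to setting _p_changed by hand.
obj.changed = True
```

A list type that tracks its own mutations, which is the role PersistentList plays, removes the need for the last line.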

 If you have long lists then this can be a big problem.

Very true.

 The Persistent classes have special handling to make them more efficient.

Sometimes true, but not in the PersistentList case.

 So instead of lists use PersistentLists

If the goal is to save space, generally no, PersistentList won't help
that; to the contrary, their state takes a little more space on disk
than a plain list.

 and instead of dicts use BTrees,

That one's different:  a BTree is really a graph of (potentially
_very_) many distinct persistent objects, and BTrees were designed to
support space- and time- efficient mutation.

 as these may be stored more efficiently in the ZODB.

For BTrees, yes.


[Zope-dev] Re: ZOBD and pointers

2005-06-20 Thread Tres Seaver

Tim Peters wrote:
 [Yair Benita]
 
...
Reading this answer I understand that anything I store should be
persistent, even if it's a list I don't plan to edit.
 
 
 I wouldn't say that.  For example, for _most_ applications it would be
 foolish to create a subclass of Persistent to store an integer, as
 opposed to just storing an integer directly.  I can conceive of
 (unlikely!) applications where there may be advantages to storing
 integers as persistent objects, though.

As, for instance, where the integer changes much more frequently than
the other attributes, which are large enough that re-storing them just
because the integer attribute changed is painful.  Making the attribute
a persistent sub-object also eliminates the chance of a ConflictError
based on changes to the other attributes.  This is the use case which
drives BTrees.Length, right?


Tres.