[Yair Benita]
> I recently started to use ZODB and python as my chosen database
> solution.
> I am having a few problems with retaining changes in BTrees. I have
> read the documentation and am aware of the _p_changed attribute. Still,
> here is what I observe:
>
> ############################
> # consider this simplified example
> # lets ignore the open database method for now.
> T = OOBTree()
>
> # my trees contain integers as keys and sets as values
> T.update({1:set([1,2,3]), 2:set([5,6,7])})
>
> # I would really really like this to work
> T[1].add(6)
> T._p_changed = True
> get_transaction().commit()
>
> # but it doesn't.
> # changes are not saved when I close the database and reopen it

That's right. A Python set is not itself a persistent object, and so doesn't magically inform the persistence system when it mutates. A BTree is a persistent object, but under the covers it's actually a (potentially very large) graph of distinct persistent objects. That's what makes it scalable. Setting T._p_changed told the root object of this graph that its state changed, and that doesn't do you any good (in fact, the root object did not change). There is no direct way to access the interior BTree and Bucket nodes in a BTree graph, so you need to be trickier to make this work.

> # This works:
> T[1].add(6)
> T.update({1:T[1]})

The conventional idiom should also work:

    T[1] = T[1]

That manages to (in effect) set _p_changed on the invisible (to you) interior Bucket node holding T[1]. Sometimes you'll see code like this:

    some_object.some_attr = some_object.some_attr

That's the same trick. For example, if p is an instance of some persistent class, and p.list is a Python list, then

    p.list.append(42)

doesn't mark p as changed, but adding

    p.list = p.list

does mark it changed. Whether that's more or less obscure than

    p._p_changed = True

is somewhat in the eye of the beholder.

> The thing is my sets tend to be very big

Then you definitely don't want to use a non-persistent type for this. The entire state of a non-persistent object gets stored all over again when _any_ part of it changes. That's an easy way to change a linear-time algorithm into a quadratic-time one.

> and I am not sure but I think that using T.update({1:T[1]}) will slow me
> down since a dictionary is first created with a copy of the set which is
> very big

No; Python never, ever makes a copy of anything unless you explicitly ask for a copy. T.update({1: T[1]}) is just a little slower than T[1]=T[1], and both ways just move a few pointers around, independent of how large len(T[1]) may be.

> and then the OOBTree is updated. Or am I wrong here?

As above.
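
If you want to convince yourself of the no-copy point, here is a small check in plain Python. It is only a sketch: an ordinary dict stands in for the OOBTree (my assumption, so that it runs without ZODB installed), and the names big, table, and wrapper are made up for illustration. It says nothing about persistence, only about references versus copies.

```python
# A plain dict stands in for the OOBTree here (assumption: this sketch
# only illustrates Python's reference semantics, not ZODB persistence).
big = set(range(1_000_000))
table = {1: big}

# Building {1: table[1]} stores a reference to the very same set object.
wrapper = {1: table[1]}
same_object_in_wrapper = wrapper[1] is big

# update() and item assignment also rebind references without copying.
table.update(wrapper)
table[1] = table[1]
same_object_in_table = table[1] is big

# A copy is made only when one is requested explicitly.
explicit_copy = set(big)
copy_is_distinct = explicit_copy is not big and explicit_copy == big

print(same_object_in_wrapper, same_object_in_table, copy_is_distinct)
```

Both identity checks come out true no matter how large the set is, which is why neither spelling gets slower as len(T[1]) grows.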

What really kills you here is that the _commit_ time is proportional to len(T[1]), because the entire state of a non-persistent object is stored to disk whenever any part of it changes. That's why people recommend using an IITreeSet instead. Like a BTree, that's actually a (potentially very large) graph of independent persistent objects.

Do not use an IISet; use an IITreeSet here. An IISet is a single persistent object, and has the same problem as a plain Python set in that the entire state needs to be stored whenever any piece changes. That isn't true of an IITreeSet.
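
To put rough numbers on the linear-versus-quadratic point, here is a toy model in plain Python. It is a sketch under loud assumptions: it only adds up pickle sizes (ZODB stores object state as pickles), the "bucketed" variant is a crude stand-in for an IITreeSet rather than how BTrees actually split and link their buckets, and the function names and bucket_size=60 are invented for the illustration.

```python
import pickle


def bytes_written_whole_set(n):
    """Re-pickle the entire set after each of n insertions.

    Models a plain Python set (or an IISet): one object, so every
    change stores the whole state again.
    """
    s = set()
    total = 0
    for i in range(n):
        s.add(i)
        total += len(pickle.dumps(s))
    return total


def bytes_written_bucketed(n, bucket_size=60):
    """Toy model of a tree of small buckets.

    Only the bucket that received the new element is re-pickled.
    bucket_size is an arbitrary illustrative value, not a BTrees constant.
    """
    buckets = [[]]
    total = 0
    for i in range(n):
        if len(buckets[-1]) == bucket_size:
            buckets.append([])  # start a fresh bucket when the last is full
        buckets[-1].append(i)
        total += len(pickle.dumps(buckets[-1]))
    return total


whole_1k = bytes_written_whole_set(1000)
whole_2k = bytes_written_whole_set(2000)
bucketed_1k = bytes_written_bucketed(1000)
bucketed_2k = bytes_written_bucketed(2000)

# Doubling n roughly quadruples the whole-set cost but only about
# doubles the bucketed cost.
print("whole-set growth factor:", whole_2k / whole_1k)
print("bucketed growth factor: ", bucketed_2k / bucketed_1k)
```

The exact byte counts depend on the pickle protocol, so only the growth factors matter here: committing after every change to one big object costs on the order of n*n bytes in total, while touching one bounded-size bucket per change stays on the order of n.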