Tim Peters wrote:
How expensive are savepoints?
6, maybe 6.2, depending on the units you're using <wink>. Seriously, how
can such a question be answered? How expensive is math.log()?
My professor of numerical mathematics would say it is very expensive,
because log() takes far more than 10 CPU cycles. *g*
Savepoints are a generalization of subtransactions (and subtransactions are
now implemented "on top of" savepoints), so if you think the cost of a
subtransaction was 100, the cost of a savepoint will be somewhere around 100
too. Modified state has to be written to temp file(s) in either case, and
in such a way that it can be forgotten later if desired.
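The idea of modified state being written to a temp file "in such a way that it can be forgotten later" can be made concrete with the toy model below. It is not ZODB's TmpStore. The class name ToySavepointStore, the oid-keyed dict of dirty states, and the pickle-to-tempfile layout are all illustrative assumptions. The model shows why a savepoint is more than a marker: every state modified since the previous savepoint is serialized to disk, and a rollback truncates the file and restores the index.

```python
import pickle
import tempfile


class ToySavepointStore:
    """Hypothetical toy model of savepoints, NOT ZODB's TmpStore.

    Modified states are spilled to a temporary file at each savepoint;
    rolling back discards everything written after that savepoint.
    """

    def __init__(self):
        self._file = tempfile.TemporaryFile()  # binary read/write temp file
        self._index = {}  # oid -> file offset of its latest saved state
        self.dirty = {}   # oid -> state modified since the last savepoint

    def savepoint(self):
        # Append every dirty state to the file and record where it landed.
        for oid, state in self.dirty.items():
            self._index[oid] = self._file.tell()
            pickle.dump(state, self._file)
        self.dirty.clear()
        # The savepoint is just "file length + index" at this moment.
        return (self._file.tell(), dict(self._index))

    def rollback(self, snapshot):
        # Forget all states written after the snapshot was taken.
        offset, index = snapshot
        self._file.truncate(offset)
        self._file.seek(offset)
        self._index = index
        self.dirty.clear()

    def load(self, oid):
        # Unsaved changes win; otherwise read the last saved state.
        if oid in self.dirty:
            return self.dirty[oid]
        end = self._file.tell()
        self._file.seek(self._index[oid])
        state = pickle.load(self._file)
        self._file.seek(end)  # keep appending at the end of the file
        return state
```

The cost of savepoint() in this model grows with the number and size of dirty states, not with the number of savepoints alone.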
I was able to track down a savepoint() call to TmpStorage, which, if I'm
right, stores parts of the current subtransaction in a file. I wasn't
sure whether savepoint() just marks a point in the middle of a
transaction or actually stores the transaction somewhere. From my point of
view it is costly compared to __add__(). *g*
If I were you, I'd just _try_ it, and fiddle as necessary until I was happy
with the tradeoffs I saw on my real data. It's not possible to guess the
outcome; e.g., if "a typical call" to migrate() takes 10 seconds for your
objects, the time to make a savepoint will probably be relatively
insignificant; if migrate() takes a nanosecond, the time to make a savepoint
will be relatively huge.
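Trying it can be as simple as timing the same loop with and without a savepoint per object. The sketch below assumes that migrate and the sample of objects are hypothetical stand-ins for your own code. In a ZODB script the savepoint argument would be transaction.savepoint.

```python
import time


def time_migration(objects, migrate, savepoint=None):
    """Return wall-clock seconds spent migrating all objects.

    Hypothetical helper: migrate(obj) and savepoint() are caller-supplied.
    If savepoint is given, it is called after every object.
    """
    start = time.perf_counter()
    for obj in objects:
        migrate(obj)
        if savepoint is not None:
            savepoint()
    return time.perf_counter() - start
```

Run it on a representative sample of real objects, calling transaction.abort() between the two runs so both start from the same state.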
I'm migrating CMF objects to Archetypes objects, including metadata,
security and so on. Migrating a typical object takes about 0.2 to 1
second, including catalog updates. A folderish object with hundreds to
thousands of children requires much more time, but that's the fault of
the catalog: every object is uncataloged and cataloged again ... ugly and
time-consuming, but required in Zope2. I wish we had events ...
with and without the savepoint line, where `tree` was an OOBTree and `N` was
1000, and it took 10x longer with the savepoint line. This is probably
close to a worst case, because `tree[i] = 2*i` most often modifies the same
bucket it modified on the previous iteration, and taking a savepoint on each
iteration therefore requires writing out the full state for each bucket many
times (about 15 times each, in fact). Without the savepoint line, each
bucket state is materialized to disk only once.
If I change it to an IIBTree, the discrepancy is even larger (about a factor
of 15), because IIBTrees tend to put many more (key, value) pairs in their
buckets than OOBTrees do, so each bucket state gets written out many more
times with the savepoint line (about 60 times each) than without.
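The "about 15" and "about 60" write counts above fall out of a simple model, sketched below. The following assumptions are the model's own, not ZODB's code:

- Keys arrive in ascending order.
- A bucket that grows past its maximum size splits in half, and the new right-hand bucket receives the following keys.
- The maximum bucket sizes are 30 for OOBTree and 120 for IIBTree. These are the BTrees defaults, as far as I know.
- Writes of interior BTree nodes are ignored.

The function counts how often each bucket's state is written, comparing a savepoint after every insert against a single commit at the end.

```python
def bucket_writes(n, max_bucket_size, savepoint_each_iteration):
    """Simplified model: count state writes per bucket for n ascending inserts."""
    writes = {}     # bucket id -> number of times its state was written
    dirty = set()   # buckets modified since the last savepoint
    current, size, next_id = 0, 0, 1
    for _ in range(n):
        size += 1
        dirty.add(current)
        if size > max_bucket_size:
            # Split: the old bucket keeps half, the new one gets the rest
            # and receives all later (larger) keys.
            current, next_id = next_id, next_id + 1
            size -= size // 2
            dirty.add(current)
        if savepoint_each_iteration:
            # A savepoint writes every bucket modified since the previous one.
            for b in dirty:
                writes[b] = writes.get(b, 0) + 1
            dirty.clear()
    # The final commit writes whatever is still dirty, once each.
    for b in dirty:
        writes[b] = writes.get(b, 0) + 1
    return writes


for name, max_size in [("OOBTree", 30), ("IIBTree", 120)]:
    w = bucket_writes(1000, max_size, savepoint_each_iteration=True)
    print(name, round(sum(w.values()) / len(w), 1))
# OOBTree 16.1
# IIBTree 63.4
```

Without savepoints every bucket is written exactly once, so these averages are also the ratios of total writes. The measured slowdowns (about 10x and 15x) are smaller. Presumably this is because each iteration does other work besides writing bucket state.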
OTOH, if your idea of migrate() doesn't make changes to the same containers
(or other persistent objects) across iterations, the discrepancy should get
smaller, approaching a factor of 1.0 in the limit (if no two iterations
modify the same persistent object). It's not possible to quantify that in
advance without knowing everything about your objects, your containers, and
all the details involved in what your migrate() does.
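The same argument reduces to a ratio that can be computed for any access pattern. The sketch rests on a simplified cost model, which is an assumption:

- A savepoint after every iteration writes each object modified in that iteration.
- A single commit writes each modified object once.

The argument touched_per_iteration is a hypothetical list of the persistent objects, given here as ids, that each iteration modifies.

```python
def savepoint_write_ratio(touched_per_iteration):
    """Writes with a savepoint per iteration divided by writes with one commit."""
    with_savepoints = sum(len(set(objs)) for objs in touched_per_iteration)
    without = len(set().union(*touched_per_iteration))
    return with_savepoints / without


# Two iterations that both touch a shared container:
print(savepoint_write_ratio([{"folder", "a"}, {"folder", "b"}]))
# 1.3333333333333333
```

Disjoint iterations give exactly 1.0. Every shared container or catalog object pushes the ratio up.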
I could write some code that migrates all objects in a folder before
calling savepoint(), but it's not worth the added complexity.
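A middle ground that avoids per-folder bookkeeping is to take a savepoint every N objects, regardless of which folder they live in. The sketch below makes these assumptions:

- migrate_in_batches, migrate and batch_size are hypothetical names.
- savepoint is whatever callable you would otherwise call per object, for example transaction.savepoint.

Larger batches mean fewer repeated writes of shared objects. The likely cost is that more modified state stays in memory between savepoints.

```python
def migrate_in_batches(objects, migrate, savepoint, batch_size=100):
    """Call migrate(obj) for each object and savepoint() after every batch_size objects.

    Hypothetical helper. Objects after the last full batch are left for the
    final commit. Returns the number of savepoints taken.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    taken = 0
    for count, obj in enumerate(objects, start=1):
        migrate(obj)
        if count % batch_size == 0:
            savepoint()
            taken += 1
    return taken
```

In a ZODB script you would pass transaction.savepoint and call transaction.commit() once the loop finishes.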
Of course if this is a one-time migration, I wouldn't worry about expense at
all -- for all I know, it took me longer to write this reply than it will
take you to run the migration script <0.6 wink>.
I wouldn't be so sure if I were you. I'm migrating all data of nearly all
objects from one set of content types (CMF) to another set
(ATContentTypes). On a very large site like plone.org the migration
ran for about 1 to 2 hours.
For more information about ZODB, see the ZODB Wiki:
ZODB-Dev mailing list - ZODB-Dev@zope.org