Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs
On 11/16/13 2:01 AM, Jens W. Klein wrote:
> Did I miss something? Any opinions much appreciated! Expect updates in this thread :)

We did experience relstorage problems that we think could have been related to packing - we're not sure yet. So, I'm following this thread and your updates with great interest!

regards,
jw

___
For more information about ZODB, see http://zodb.org/
ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs
On Fri, Nov 15, 2013 at 8:01 PM, Jens W. Klein j...@bluedynamics.com wrote:
> I started a new packing script for Relstorage (history free, postgresql). It is based on incoming reference counting.

Did you look at zc.zodbdgc? I think it implements something very close to what you're proposing. It's been in production for a few years now at ZC. Not sure if it would need to be updated for relstorage.

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs
On 11/18/13 12:19 PM, Jim Fulton wrote:
> On Fri, Nov 15, 2013 at 8:01 PM, Jens W. Klein j...@bluedynamics.com wrote:
>> I started a new packing script for Relstorage (history free, postgresql). It is based on incoming reference counting.
>
> Did you look at zc.zodbdgc? I think it implements something very close to what you're proposing. It's been in production for a few years now at ZC. Not sure if it would need to be updated for relstorage.

AFAICT it does not work against a RelStorage backend. Or at least that is what I understand from:

http://www.zodb.org/en/latest/documentation/articles/multi-zodb-gc.html

[...] This documentation does not apply to RelStorage which has the same features built-in, but accessible in different ways. Look at the options for the zodbpack script. The --prepack option creates a table containing the same information as we are creating in the reference database [...]

regards,
jw
Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs
On Mon, Nov 18, 2013 at 8:43 AM, Jan-Wijbrand Kolman janwijbr...@gmail.com wrote:
> On 11/18/13 12:19 PM, Jim Fulton wrote:
>> On Fri, Nov 15, 2013 at 8:01 PM, Jens W. Klein j...@bluedynamics.com wrote:
>>> I started a new packing script for Relstorage (history free, postgresql). It is based on incoming reference counting.
>>
>> Did you look at zc.zodbdgc? I think it implements something very close to what you're proposing. It's been in production for a few years now at ZC. Not sure if it would need to be updated for relstorage.
>
> AFAICT it does not work against a relstorage backend. Or at least I think to understand that from:
>
> http://www.zodb.org/en/latest/documentation/articles/multi-zodb-gc.html
>
> [...] This documentation does not apply to RelStorage which has the same features built-in, but accessible in different ways. Look at the options for the zodbpack script. The --prepack option creates a table containing the same information as we are creating in the reference database [...]

I didn't write that.

I think zodbdgc probably would work, possibly with some modifications. If nothing else, it should be consulted, but then again, writing software is fun.

Note that the important aspect here isn't cross-database references, but the garbage collection algorithm, which is incremental and uses a linear scan of the database.

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs
On 11/15/2013 06:01 PM, Jens W. Klein wrote:
> The idea is simple:
>
> - iterate over all transactions starting with the lowest transaction id (tid)
> - for each transaction load the object states connected with tid
> - for each state fetch its outgoing references and fill a table where all incoming references of an object are stored as an array. If a state has no references, write it anyway to the table with empty outgoing references

I would describe the RelStorage packing algorithm with the same words, but since you reimplemented the algorithm from scratch, you found a more optimal implementation for your database. Good work!

Shane
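The three steps quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual script and not RelStorage's packer: `build_incoming_refs` and the `states_by_tid` mapping (tid to a list of `(oid, outgoing_oids)` pairs) are hypothetical names, and the outgoing references are given directly instead of being extracted from pickled object states.

```python
def build_incoming_refs(states_by_tid):
    """Build a mapping oid -> list of oids that reference it.

    states_by_tid is a hypothetical stand-in for the database:
    {tid: [(oid, [outgoing_oid, ...]), ...]}.
    """
    incoming = {}
    # Step 1: walk the transactions starting with the lowest tid.
    for tid in sorted(states_by_tid):
        # Step 2: visit every object state written by this transaction.
        for oid, outgoing in states_by_tid[tid]:
            # Record the object even if nothing refers to it yet.
            incoming.setdefault(oid, [])
            # Step 3: register this object as an incoming reference
            # on each object it points to.
            for target in outgoing:
                sources = incoming.setdefault(target, [])
                if oid not in sources:
                    sources.append(oid)
    return incoming


if __name__ == "__main__":
    states = {
        1: [(0, [1, 2])],          # object 0 refers to 1 and 2
        2: [(1, [2]), (2, [])],    # object 1 refers to 2
        3: [(3, [])],              # object 3 is referenced by nobody
    }
    for oid, sources in sorted(build_incoming_refs(states).items()):
        print(oid, sources)
```

In this toy data, object 3 ends up with an empty list of incoming references, which is what would make it a candidate for removal.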
Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs
Hi Jim,

thanks for the hint (also in the other post). I looked at zc.zodbdgc and took some inspiration from it. As far as I understand, it stores the incoming references in a separate FileStorage backend. So it works similarly to my implementation but uses the ZODB infrastructure. I don't see how to make zc.zodbdgc play with RelStorage, and since it works on the abstracted ZODB level using pickles, I suspected it would not be fast enough for so many objects, so I skipped this alternative.

Jens

On 2013-11-18 12:19, Jim Fulton wrote:
> On Fri, Nov 15, 2013 at 8:01 PM, Jens W. Klein j...@bluedynamics.com wrote:
>> I started a new packing script for Relstorage (history free, postgresql). It is based on incoming reference counting.
>
> Did you look at zc.zodbdgc? I think it implements something very close to what you're proposing. It's been in production for a few years now at ZC. Not sure if it would need to be updated for relstorage.
>
> Jim

--
Klein Partner KG, member of BlueDynamics Alliance
Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs
On 2013-11-18 17:29, Shane Hathaway wrote:
> On 11/15/2013 06:01 PM, Jens W. Klein wrote:
>> The idea is simple:
>>
>> - iterate over all transactions starting with the lowest transaction id (tid)
>> - for each transaction load the object states connected with tid
>> - for each state fetch its outgoing references and fill a table where all incoming references of an object are stored as an array. If a state has no references, write it anyway to the table with empty outgoing references
>
> I would describe the RelStorage packing algorithm with the same words, but since you reimplemented the algorithm from scratch, you found a more optimal implementation for your database. Good work!

Thanks for the feedback! The only change I made concerns the arrays: they are difficult to handle in Postgres (no fun at all). It is much easier (and faster!) to add one row for each incoming reference (plus a reference to self). Deleting is now easy: fetch all objects whose only remaining reference row is the self-reference, remove those objects together with their self-reference rows, and delete the incoming-reference rows that the deleted objects contributed to the objects they referenced.

My SQL was a bit rusty after some years of ODBs. I'm sure a person with deeper SQL knowledge could optimize the queries in some way.

regards
Jens

--
Klein Partner KG, member of BlueDynamics Alliance
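The row-per-reference layout and the deletion loop described above might look roughly like the following. It is a minimal sketch under several assumptions, not the actual script: Python's `sqlite3` stands in for PostgreSQL, the table name `refs`, its columns, and the functions `load_refs` and `collect_garbage` are hypothetical, and oid 0 is treated as the ZODB root object that must never be removed even though nothing refers to it.

```python
import sqlite3

ROOT_OID = 0  # assumption: the root object is kept unconditionally


def load_refs(conn, outgoing):
    """Store one row per incoming reference, plus a self-reference row.

    outgoing is a hypothetical stand-in for the object states:
    {oid: [oid referenced by this object, ...]}.
    """
    conn.execute(
        "CREATE TABLE refs ("
        " target INTEGER NOT NULL,"
        " source INTEGER NOT NULL,"
        " PRIMARY KEY (target, source))"
    )
    for oid, targets in outgoing.items():
        # The self-reference row makes every object visible in the table.
        conn.execute("INSERT OR IGNORE INTO refs VALUES (?, ?)", (oid, oid))
        for target in targets:
            conn.execute(
                "INSERT OR IGNORE INTO refs VALUES (?, ?)", (target, oid)
            )


def collect_garbage(conn):
    """Repeatedly remove objects whose only row is the self-reference."""
    removed = []
    while True:
        rows = conn.execute(
            "SELECT target FROM refs WHERE target != ? "
            "GROUP BY target "
            "HAVING COUNT(*) = 1 AND MIN(source) = target",
            (ROOT_OID,),
        ).fetchall()
        garbage = [row[0] for row in rows]
        if not garbage:
            return sorted(removed)
        for oid in garbage:
            # Deletes the self-reference row and every incoming-reference
            # row this object contributed to the objects it pointed to.
            conn.execute("DELETE FROM refs WHERE source = ?", (oid,))
        removed.extend(garbage)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Root 0 refers to 1; object 2 refers to 3 but nothing refers to 2.
    load_refs(conn, {0: [1], 1: [], 2: [3], 3: []})
    print(collect_garbage(conn))
```

A real pack would also delete the object rows themselves; here only the reference table is maintained. Note that counting incoming references in this way does not by itself remove unreachable objects that refer to each other in a cycle, and the thread does not say how the script treats that case.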
Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs
On Mon, Nov 18, 2013 at 12:00 PM, Jens W. Klein j...@bluedynamics.com wrote:
> Hi Jim,
>
> thanks for the hint (also in the other post). I looked at zc.zodbdgc and took some inspiration from it. As far as I understand it stores the incoming references in a separate filestorage backend.

This is just a temporary file to avoid storing all the data in memory.

> So this works similar to my implementation but uses the ZODB infrastructure. I don't see how I make zc.zodbdgc play with Relstorage and since it works on the abstracted ZODB level using pickles

I don't know what you're saying, since I don't know what "it" refers to. zodbdgc works with storages. RelStorage conforms to the storage API. It's possible some changes would be needed, but they should be minor.

> I suspected it to be not fast enough for so many objects

No idea why, or what "fast enough" is. We use it on a database with ~200 million objects.

> - so I skipped this alternative.

Good luck.

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton