Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs

2013-11-18 Thread Jan-Wijbrand Kolman

On 11/16/13 2:01 AM, Jens W. Klein wrote:
 Did I miss something? Any opinions much appreciated!

 Expect updates in this thread :)

We have experienced RelStorage problems that we think may have been 
related to packing, though we're not sure yet. So I'm following this thread 
and your updates with great interest!


regards, jw

___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs

2013-11-18 Thread Jim Fulton
On Fri, Nov 15, 2013 at 8:01 PM, Jens W. Klein j...@bluedynamics.com wrote:
 I started a new packing script for Relstorage (history free, postgresql). It
 is based on incoming reference counting.

Did you look at zc.zodbdgc?  I think it implements something very close to
what you're proposing.  It's been in production for a few years now at ZC.

Not sure if it would need to be updated for relstorage.

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton


Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs

2013-11-18 Thread Jan-Wijbrand Kolman

On 11/18/13 12:19 PM, Jim Fulton wrote:

On Fri, Nov 15, 2013 at 8:01 PM, Jens W. Klein j...@bluedynamics.com wrote:

I started a new packing script for Relstorage (history free, postgresql). It
is based on incoming reference counting.


Did you look at zc.zodbdgc?  I think it implements something very close to
what you're proposing.  It's been in production for a few years now at ZC.

Not sure if it would need to be updated for relstorage.


AFAICT it does not work against a RelStorage backend. Or at least that is 
what I understand from:


http://www.zodb.org/en/latest/documentation/articles/multi-zodb-gc.html

[...] This documentation does not apply to RelStorage which has the same 
features built-in, but accessible in different ways. Look at the options 
for the zodbpack script. The --prepack option creates a table containing 
the same information as we are creating in the reference database [...]



regards, jw





Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs

2013-11-18 Thread Jim Fulton
On Mon, Nov 18, 2013 at 8:43 AM, Jan-Wijbrand Kolman
janwijbr...@gmail.com wrote:
 On 11/18/13 12:19 PM, Jim Fulton wrote:

 On Fri, Nov 15, 2013 at 8:01 PM, Jens W. Klein j...@bluedynamics.com
 wrote:

 I started a new packing script for Relstorage (history free, postgresql).
 It is based on incoming reference counting.


 Did you look at zc.zodbdgc?  I think it implements something very close to
 what you're proposing.  It's been in production for a few years now at ZC.

 Not sure if it would need to be updated for relstorage.


 AFAICT it does not work against a relstorage backend. Or at least I think to
 understand that from:

 http://www.zodb.org/en/latest/documentation/articles/multi-zodb-gc.html

 [...] This documentation does not apply to RelStorage which has the same
 features built-in, but accessible in different ways. Look at the options for
 the zodbpack script. The --prepack option creates a table containing the same
 information as we are creating in the reference database [...]

I didn't write that.  I think zodbdgc probably would work, possibly
with some modifications.
If nothing else, it should be consulted, but then again, writing
software is fun.

Note that the important aspect here isn't cross-database references,
but the garbage
collection algorithm, which is incremental and uses a linear scan of
the database.

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton


Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs

2013-11-18 Thread Shane Hathaway

On 11/15/2013 06:01 PM, Jens W. Klein wrote:

The idea is simple:

- iterate over all transactions starting with the lowest
   transaction id (tid)
- for each transaction load the object states connected with tid
- for each state fetch its outgoing references and fill a table where
   all incoming references of an object are stored as an array.
   If a state has no references, write it to the table anyway with empty
   outgoing references
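
The steps quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the packing script: the input format (a list of `(tid, [(oid, outgoing_oids), ...])` tuples) and the function name `build_incoming_refs` are hypothetical, and a plain dict stands in for the database table. It also assumes a history-free storage, so every oid appears in exactly one transaction.

```python
def build_incoming_refs(transactions):
    """Build a mapping oid -> list of oids that reference it.

    `transactions` is a hypothetical in-memory stand-in for the
    storage: an iterable of (tid, states), where states is a list
    of (oid, outgoing_oids) pairs.
    """
    incoming = {}
    # Iterate over all transactions, lowest transaction id first.
    for tid, states in sorted(transactions, key=lambda txn: txn[0]):
        # Load the object states connected with this tid.
        for oid, outgoing in states:
            # Register the object even if it references nothing, so
            # that it still shows up in the table.
            incoming.setdefault(oid, [])
            # Record this object as an incoming reference of every
            # object it points to.
            for target in outgoing:
                incoming.setdefault(target, []).append(oid)
    return incoming
```

With such a table, every object other than the root whose list of incoming references is empty is a candidate for removal.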


I would describe the RelStorage packing algorithm with the same words, 
but since you reimplemented the algorithm from scratch, you found a more 
optimal implementation for your database. Good work!


Shane



Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs

2013-11-18 Thread Jens W. Klein

Hi Jim,

thanks for the hint (also in the other post). I looked at zc.zodbdgc and 
took some inspiration from it. As far as I understand, it stores the 
incoming references in a separate filestorage backend. So it works 
similarly to my implementation but uses the ZODB infrastructure. I don't 
see how to make zc.zodbdgc play with RelStorage, and since it works at the 
abstracted ZODB level using pickles, I suspected it would not be fast 
enough for so many objects, so I skipped this alternative.


Jens

On 2013-11-18 12:19, Jim Fulton wrote:

On Fri, Nov 15, 2013 at 8:01 PM, Jens W. Klein j...@bluedynamics.com wrote:

I started a new packing script for Relstorage (history free, postgresql). It
is based on incoming reference counting.


Did you look at zc.zodbdgc?  I think it implements something very close to
what you're proposing.  It's been in production for a few years now at ZC.

Not sure if it would need to be updated for relstorage.

Jim




--
Klein & Partner KG, member of BlueDynamics Alliance



Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs

2013-11-18 Thread Jens W. Klein

On 2013-11-18 17:29, Shane Hathaway wrote:

On 11/15/2013 06:01 PM, Jens W. Klein wrote:

The idea is simple:

- iterate over all transactions starting with the lowest
   transaction id (tid)
- for each transaction load the object states connected with tid
- for each state fetch its outgoing references and fill a table where
   all incoming references of an object are stored as an array.
   If a state has no references, write it to the table anyway with empty
   outgoing references


I would describe the RelStorage packing algorithm with the same words,
but since you reimplemented the algorithm from scratch, you found a more
optimal implementation for your database. Good work!



Thanks for the feedback!

The only change I made concerns the arrays: they are difficult to 
handle in postgres (no fun at all). It is much easier (and faster!) to 
add one row for each incoming reference (plus a reference to self). 
Deleting is now easy: fetch all objects whose only reference is the 
self-reference, then remove each object, its remaining self-reference, 
and the incoming references it contributed to the objects it referenced.
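
A minimal sketch of this row-per-reference layout, for illustration only. It uses Python's `sqlite3` module as a stand-in for PostgreSQL, and the table names (`object_ref`, `object_state`), the function names, and the `ROOT_OID` constant are all assumptions rather than names from the actual script. Repeating the delete step until no orphan is left is also an assumption about how the cascade would be driven.

```python
import sqlite3

ROOT_OID = 0  # assumption: the root object is never collected


def load_refs(conn, outgoing):
    """Store one row per incoming reference, plus a self-reference.

    `outgoing` is a hypothetical mapping oid -> list of referenced oids.
    """
    conn.execute(
        "CREATE TABLE object_ref ("
        "target INTEGER, source INTEGER, PRIMARY KEY (target, source))"
    )
    conn.execute("CREATE TABLE object_state (oid INTEGER PRIMARY KEY)")
    for oid, refs in outgoing.items():
        conn.execute("INSERT INTO object_state VALUES (?)", (oid,))
        # The self-reference guarantees every object has at least one row.
        conn.execute("INSERT OR IGNORE INTO object_ref VALUES (?, ?)", (oid, oid))
        for target in refs:
            conn.execute(
                "INSERT OR IGNORE INTO object_ref VALUES (?, ?)", (target, oid)
            )


def collect_garbage(conn):
    """Remove objects whose only remaining reference is the self-reference."""
    removed = []
    while True:
        rows = conn.execute(
            "SELECT target FROM object_ref WHERE target != ? "
            "GROUP BY target "
            "HAVING COUNT(*) = 1 AND MIN(source) = target "
            "ORDER BY target",
            (ROOT_OID,),
        ).fetchall()
        orphans = [row[0] for row in rows]
        if not orphans:
            return removed
        for oid in orphans:
            conn.execute("DELETE FROM object_state WHERE oid = ?", (oid,))
            # Deletes the self-reference and every incoming reference
            # this object contributed to the objects it pointed at.
            conn.execute("DELETE FROM object_ref WHERE source = ?", (oid,))
        removed.extend(orphans)
```

Deleting by `source` handles both the self-reference row and the rows the deleted object added for its targets in one statement, which is what makes the row-per-reference layout simpler than updating arrays.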


My SQL was a bit rusty after some years of ODBs. I'm sure a person with 
deeper SQL knowledge could optimize the queries further.


regards Jens
--
Klein & Partner KG, member of BlueDynamics Alliance



Re: [ZODB-Dev] Optimizing RelStorage Packing for large DBs

2013-11-18 Thread Jim Fulton
On Mon, Nov 18, 2013 at 12:00 PM, Jens W. Klein j...@bluedynamics.com wrote:
 Hi Jim,

 thanks for the hint (also in the other post). I looked at zc.zodbdgc and
 took some inspiration from it. As far as I understand it stores the incoming
 references in a separate filestorage backend.

This is just a temporary file to avoid storing all the data in memory.

 So this works similarly to my
 implementation but uses the ZODB infrastructure. I don't see how I make
 zc.zodbdgc play with Relstorage and since it works on the abstracted ZODB
 level using pickles

I don't know what you're saying, since I don't know what it refers to.

zodbdgc works with storages.  relstorage conforms to the storage API.
It's possible some changes would be needed, but they should be minor.

 I suspected it to be not fast enough for so many objects

No idea why, or what fast enough is.  We use it on a database with
~200 million objects.

 - so I skipped this alternative.

Good luck.

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton