Re: [ZODB-Dev] Relstorage and over growing database.

2013-11-14 Thread Martijn Pieters
On Wed, Nov 13, 2013 at 9:24 AM, Jens W. Klein j...@bluedynamics.com wrote:

 Thanks Martijn for the hint, but we are using a history-free database, so
 growth only comes from deleted objects in our case.


Right, that is an important distinction.


 When in history-free mode, is it possible to detect deleted objects at
 store time? That way we could add the zoid at store time to an
 objects_deleted table in order to clean them up later.


No, because multiple references to an object might exist. There is no
reference counting in a ZODB, hence the intensive manual tree traversal job
when garbage collecting.
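
The traversal Martijn describes can be sketched in a few lines of plain Python: with no reference counts, the only way to find garbage is to walk the whole object graph from the root and keep everything reachable. This is a toy model of the idea (the real pre-pack extracts referenced zoids from the pickles):

```python
from collections import deque

def find_garbage(refs, root=0):
    """refs maps each zoid to the zoids its state references.
    Anything not reachable from the root is garbage."""
    reachable = set()
    queue = deque([root])
    while queue:
        zoid = queue.popleft()
        if zoid in reachable:
            continue
        reachable.add(zoid)
        queue.extend(refs.get(zoid, ()))
    return set(refs) - reachable

# Object 3 was "deleted" from object 0's tree, but 3 still references 4,
# so neither can be found without walking the graph.
refs = {0: [1, 2], 1: [], 2: [1], 3: [4], 4: []}
print(sorted(find_garbage(refs)))  # [3, 4]
```

This is why a store-time objects_deleted table cannot work on its own: whether an object is garbage depends on every other object's references, not on any single store operation.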


  Another way to speed up graph traversal would be to store the
 object references in a field of object_state. At the moment we have to read
 the pickle in order to get the referenced zoids. Storing additional,
 redundant information might not be perfect, but it would allow packing/GC of
 the database without any knowledge of the state objects' structure, e.g.
 using a stored procedure.


That sounds like a feasible idea, at least to me.
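
As a sketch, such a change might look like the following. This is purely hypothetical: the column name, array type, and index are invented for illustration and are not part of RelStorage's actual schema (PostgreSQL syntax):

```sql
-- Hypothetical: keep extracted references next to the state so a
-- pack/GC pass (even a stored procedure) can walk the graph
-- without unpickling object_state.state.
ALTER TABLE object_state ADD COLUMN referenced_zoids BIGINT[];

-- Would be populated at store time from the pickle's outgoing
-- references; a GIN index lets reverse lookups stay cheap.
CREATE INDEX object_state_refs_idx
    ON object_state USING GIN (referenced_zoids);
```

The trade-off is exactly the one Jens names: the data is redundant with the pickles and must be kept consistent on every store.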


 I would like to know what the RelStorage experts think about these ideas.


You may want to contact Shane directly; he *may* not be reading this list
actively.

-- 
Martijn Pieters
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Relstorage and over growing database.

2013-11-13 Thread Jens W. Klein

On 2013-11-12 17:48, Martijn Pieters wrote:


On Mon, Nov 11, 2013 at 9:24 PM, Daniel Widerin dan...@widerin.net wrote:

Anyone experienced similar problems packing large RelStorage databases?
The graph traversal takes a really long time. Maybe we can improve that
by storing additional information in the relational database?


You should (at least initially) pack *without* GC (set pack-gc from true to
false); I packed a humongous RelStorage-backed database before, and
packed to earlier dates in the past first to minimize the amount of data
removed in a single transaction.

Only when we were down to a reasonable size database did we enable
garbage collection.


Thanks Martijn for the hint, but we are using a history-free database,
so growth only comes from deleted objects in our case.


When in history-free mode, is it possible to detect deleted objects at
store time? That way we could add the zoid at store time to an
objects_deleted table in order to clean them up later.


Another way to speed up graph traversal would be to store the
object references in a field of object_state. At the moment we have to
read the pickle in order to get the referenced zoids. Storing additional,
redundant information might not be perfect, but it would allow packing/GC
of the database without any knowledge of the state objects' structure,
e.g. using a stored procedure.


I would like to know what the RelStorage experts think about these ideas.

kind regards
Jens
--
Klein  Partner KG, member of BlueDynamics Alliance



Re: [ZODB-Dev] Relstorage and over growing database.

2013-11-12 Thread Martijn Pieters
On Mon, Nov 11, 2013 at 9:24 PM, Daniel Widerin dan...@widerin.net wrote:

 Anyone experienced similar problems packing large RelStorage databases?
 The graph traversal takes a really long time. Maybe we can improve that
 by storing additional information in the relational database?


You should (at least initially) pack *without* GC (set pack-gc from true to
false); I packed a humongous RelStorage-backed database before, and packed
to earlier dates in the past first to minimize the amount of data removed
in a single transaction.

Only when we were down to a reasonable size database did we enable garbage
collection.
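
In configuration terms, Martijn's advice corresponds to something like the following zodbpack config. This is a sketch, not a verified setup: the DSN is a placeholder, and the exact adapter section depends on your backend:

```
<relstorage>
  # step 1: pack without garbage collection
  pack-gc false
  <postgresql>
    # placeholder DSN -- substitute your own
    dsn dbname='zodb'
  </postgresql>
</relstorage>
```

Then run it repeatedly with a shrinking window, e.g. `zodbpack --days 365 pack.conf`, then `--days 180`, and so on, and only flip pack-gc back to true once the database is down to a reasonable size.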

-- 
Martijn Pieters


Re: [ZODB-Dev] Relstorage and over growing database.

2013-11-11 Thread Daniel Widerin
Hi, just want to share our experience:

My ZODB contains 300 million objects on RelStorage/PostgreSQL. The number of
objects is caused by BTrees stored on Plone Dexterity content types. Its
size is 160GB. At that size it's impossible to pack, because the pre-pack
alone takes 100 days.

jensens and I are searching for different packing algorithms and
methods to achieve better packing performance. We'll keep you updated
here!

How I solved my problem for now:

I converted it to FileStorage, which took about 40 hours; the Data.fs was
55GB in size. Then I ran zeopack on that database, which succeeded and
reduced it to 7.8 GB, still containing 40 million objects. After that I
migrated back to RelStorage because of its better performance, and the
result is an 11 GB database in PostgreSQL.
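
For reference, the round trip Daniel describes is driven by zodbconvert configs along these lines (a sketch; the path and DSN are placeholders):

```
<relstorage source>
  <postgresql>
    dsn dbname='zodb'
  </postgresql>
</relstorage>
<filestorage destination>
  path /var/tmp/Data.fs
</filestorage>
```

Run `zodbconvert convert.conf`, pack the resulting Data.fs, then swap the source and destination sections in a second config to convert back.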

Anyone experienced similar problems packing large RelStorage databases?
The graph traversal takes a really long time. Maybe we can improve that
by storing additional information in the relational database?

Any hints or comments are welcome.

Daniel

On 07.02.13 23:10, Shane Hathaway wrote:
 On 02/07/2013 01:54 PM, Jürgen Herrmann wrote:
 On 07.02.2013 21:18, Jürgen Herrmann wrote:
 I know that's entirely not your fault but may be worth mentioning
 in the docs. RelStorage with MySQL works *very* well for DB sizes up to
 5GB or so; above that, not so much :/

 Also for the docs: on disk RelStorage/MySQL uses 4x the size of a
 FileStorage with the same contents. As the packing tables are filled, this
 grows by another factor of ~2. If you don't pack very regularly,
 you might quickly end up with DBs whose sheer size doesn't permit
 packing anymore.
 
 I suspect there are ways to fix all of that in the MySQL configuration.
 Like any SQL database, MySQL needs tuning as it grows.  Meanwhile,
 FileStorage doesn't really have any knobs, and it always stores in a
 fairly optimal way, so it's easier to use.
 
 FileStorage has a couple of issues that often drive people to
 RelStorage: (1) the on-disk format is unique to FileStorage, and there
 aren't many tools available for analyzing and fixing a broken Data.fs.
 (2) FileStorage only supports multiple clients through ZEO, which has
 relatively high latency.  If these issues don't impact you, then
 FileStorage is clearly the better choice for you.
 
 Shane
 


Re: [ZODB-Dev] Relstorage and over growing database.

2013-11-11 Thread Jim Fulton
On Mon, Nov 11, 2013 at 4:24 PM, Daniel Widerin dan...@widerin.net wrote:
 Hi, just want to share our experience:

 My ZODB contains 300 million objects on RelStorage/PostgreSQL. The number of
 objects is caused by BTrees stored on Plone Dexterity content types. Its
 size is 160GB. At that size it's impossible to pack, because the pre-pack
 alone takes 100 days.

 jensens and I are searching for different packing algorithms and
 methods to achieve better packing performance. We'll keep you updated
 here!

 How I solved my problem for now:

 I converted it to FileStorage, which took about 40 hours; the Data.fs was
 55GB in size. Then I ran zeopack on that database, which succeeded and
 reduced it to 7.8 GB, still containing 40 million objects. After that I
 migrated back to RelStorage because of its better performance, and the
 result is an 11 GB database in PostgreSQL.

Hah. Nice.  Have you measured an improvement in relstorage performance
in practice? Is it enough to justify this hassle?

WRT packing algorithms:

- You might look at zc.FileStorage which takes a slightly different approach
  than FileStorage:

  - Does most of the packing work in a separate process to avoid the GIL.

  - Doesn't do GC.

  - Has some other optimizations I don't recall.  For our large databases,
it's much faster than normal file-storage packing.

- Consider separating garbage collection and packing.  This allows
  garbage collection to be run mostly against a replica and to be spread
  out, if necessary.  Look at zc.zodbdgc.

 Anyone experienced similar problems packing large RelStorage databases?
 The graph traversal takes a really long time. Maybe we can improve that
 by storing additional information in the relational database?

 Any hints or comments are welcome.

Definitely look at zodbdgc. It doesn't traverse the graph; it essentially does
reference counting and is able to iterate over the database, which, for
FileStorage, is relatively quick.
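
The contrast Jim describes can be modeled in a few lines: instead of walking the graph from the root, one linear scan over all records builds in-degrees, and anything nobody references (plus whatever that cascades to) is garbage. This is only a toy sketch of the idea; zc.zodbdgc's real implementation additionally handles multi-database references:

```python
def refcount_garbage(records, root=0):
    """records: {zoid: [referenced zoids]}.  One linear scan builds
    in-degrees; non-root objects with no inbound references are
    garbage, and deleting them can cascade.  Note: unreachable
    *cycles* keep each other alive under pure reference counting."""
    counts = {zoid: 0 for zoid in records}
    for outgoing in records.values():
        for ref in outgoing:
            counts[ref] += 1
    dead = [z for z, c in counts.items() if c == 0 and z != root]
    garbage = set()
    while dead:
        z = dead.pop()
        garbage.add(z)
        for ref in records[z]:          # cascade the deletion
            counts[ref] -= 1
            if counts[ref] == 0 and ref != root and ref not in garbage:
                dead.append(ref)
    return garbage

records = {0: [1], 1: [], 3: [4], 4: []}
print(sorted(refcount_garbage(records)))  # [3, 4]
```

The point is the access pattern: a sequential scan of every record, which FileStorage serves quickly, rather than random-access traversal.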

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton


Re: [ZODB-Dev] Relstorage and over growing database.

2013-02-07 Thread Shane Hathaway

On 02/06/2013 04:23 AM, Jürgen Herrmann wrote:

I think this is not entirely correct. I ran into problems several
times when new_oid was emptied! Maybe Shane can confirm this?
(results in read conflict errors)


Ah, that's true. You do need to replicate new_oid.


Then I'd like to talk a little about my current RelStorage setup here:
It's backed by MySQL, a history-preserving setup. Recently one of our
DBs started to grow very quickly, and its object_state.ibd (InnoDB)
file is just over 86GB as of today. Packing now fails due to MySQL
not being able to complete sorts in the object_ref table. object_ref
is also very big (36GB MYD file, 25GB MYI file). I took a backup of the
DB and let zodbconvert convert it back to a FileStorage; the resulting
file is 6GB (!). I will pack it and see how big it is then. I will
also investigate how big on disk this DB would be when stored in
PostgreSQL. This situation poses another problem for us: using
zodbconvert to convert this mess to a FileStorage takes just over an
hour when writing to a ramdisk. I suspect converting to Postgres
will take more than 10 hours, which is unacceptable for us as this
is a live database and cannot be offline for more than 2-3 hours at
night. So we will have to look into a special zodbconvert
that uses a two-step process:
1. import transactions to the new storage from a MySQL DB backup
2. import the rest of the transactions that occurred after the backup
was made from the live database (which is offline for that time,
of course)

Looking at zodbconvert using copyTransactionsFrom(), I think this
should be possible, but up to now I did not investigate further.
Maybe Shane could confirm this? Maybe this could also be turned
into a neat way of getting incremental backups out of ZODBs in
general?


Yes, that could work.
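
The two-step process Jürgen sketches can be modeled abstractly. Plain lists of (tid, data) pairs stand in for storages here; a real implementation would use storage.iterator() and copyTransactionsFrom(), and the names below are invented for illustration:

```python
def incremental_copy(backup, live, destination):
    """backup/live: ordered lists of (tid, data) pairs; destination:
    a list we append to.  Step 1: replay the whole backup.  Step 2:
    replay only live transactions newer than the backup's last tid."""
    destination.extend(backup)
    last_tid = backup[-1][0] if backup else b''
    for tid, data in live:
        if tid > last_tid:   # only transactions committed after the backup
            destination.append((tid, data))
    return destination

backup = [(b'01', 'a'), (b'02', 'b')]
live = [(b'01', 'a'), (b'02', 'b'), (b'03', 'c')]
print(incremental_copy(backup, live, []))
# [(b'01', 'a'), (b'02', 'b'), (b'03', 'c')]
```

Because ZODB tids are monotonically increasing, step 2 only needs the last tid from step 1, which is also why the same trick could serve as an incremental backup mechanism.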

As for MySQL growing tables without bounds... well, that wouldn't 
surprise me very much.


Shane



Re: [ZODB-Dev] Relstorage and over growing database.

2013-02-07 Thread Jürgen Herrmann

On 07.02.2013 20:22, Shane Hathaway wrote:

On 02/06/2013 04:23 AM, Jürgen Herrmann wrote:

I think this is not entirely correct. I ran into problems several
times when new_oid was emptied! Maybe Shane can confirm this?
(results in read conflict errors)


Ah, that's true. You do need to replicate new_oid.

Then I'd like to talk a little about my current RelStorage setup here:
It's backed by MySQL, a history-preserving setup. Recently one of our
DBs started to grow very quickly, and its object_state.ibd (InnoDB)
file is just over 86GB as of today. Packing now fails due to MySQL
not being able to complete sorts in the object_ref table. object_ref
is also very big (36GB MYD file, 25GB MYI file). I took a backup of the
DB and let zodbconvert convert it back to a FileStorage; the resulting
file is 6GB (!). I will pack it and see how big it is then. I will
also investigate how big on disk this DB would be when stored in
PostgreSQL. This situation poses another problem for us: using
zodbconvert to convert this mess to a FileStorage takes just over an
hour when writing to a ramdisk. I suspect converting to Postgres
will take more than 10 hours, which is unacceptable for us as this
is a live database and cannot be offline for more than 2-3 hours at
night. So we will have to look into a special zodbconvert
that uses a two-step process:
1. import transactions to the new storage from a MySQL DB backup
2. import the rest of the transactions that occurred after the backup
was made from the live database (which is offline for that time,
of course)

Looking at zodbconvert using copyTransactionsFrom(), I think this
should be possible, but up to now I did not investigate further.
Maybe Shane could confirm this? Maybe this could also be turned
into a neat way of getting incremental backups out of ZODBs in
general?


Yes, that could work.

As for MySQL growing tables without bounds... well, that wouldn't
surprise me very much.


I know that's entirely not your fault but may be worth mentioning
in the docs. RelStorage with MySQL works *very* well for DB sizes up to
5GB or so; above that, not so much :/

That issue has given me some sleepless nights, especially because
the conversion step to another storage type takes quite a long
time. But in less than two hours I came up with a workable
solution today; see the other messages on the list regarding
that issue. I LOVE OPEN SOURCE. I LOVE PYTHON. :)

best regards,
Jürgen
--

XLhost.de ® - web hosting from super-small to eXtra Large


XLhost.de GmbH
Jürgen Herrmann, Managing Director
Boelckestrasse 21, 93051 Regensburg, Germany

Managing Director: Jürgen Herrmann
Registered under: HRB9918
VAT ID: DE245931218

Phone: +49 (0)800 XLHOSTDE [0800 95467833]
Fax:   +49 (0)800 95467830
Web:   http://www.XLhost.de


Re: [ZODB-Dev] Relstorage and over growing database.

2013-02-07 Thread Jürgen Herrmann

On 07.02.2013 21:18, Jürgen Herrmann wrote:

On 07.02.2013 20:22, Shane Hathaway wrote:

On 02/06/2013 04:23 AM, Jürgen Herrmann wrote:

I think this is not entirely correct. I ran into problems several
times when new_oid was emptied! Maybe Shane can confirm this?
(results in read conflict errors)


Ah, that's true. You do need to replicate new_oid.

Then I'd like to talk a little about my current RelStorage setup here:
It's backed by MySQL, a history-preserving setup. Recently one of our
DBs started to grow very quickly, and its object_state.ibd (InnoDB)
file is just over 86GB as of today. Packing now fails due to MySQL
not being able to complete sorts in the object_ref table. object_ref
is also very big (36GB MYD file, 25GB MYI file). I took a backup of the
DB and let zodbconvert convert it back to a FileStorage; the resulting
file is 6GB (!). I will pack it and see how big it is then. I will
also investigate how big on disk this DB would be when stored in
PostgreSQL. This situation poses another problem for us: using
zodbconvert to convert this mess to a FileStorage takes just over an
hour when writing to a ramdisk. I suspect converting to Postgres
will take more than 10 hours, which is unacceptable for us as this
is a live database and cannot be offline for more than 2-3 hours at
night. So we will have to look into a special zodbconvert
that uses a two-step process:
1. import transactions to the new storage from a MySQL DB backup
2. import the rest of the transactions that occurred after the backup
was made from the live database (which is offline for that time,
of course)

Looking at zodbconvert using copyTransactionsFrom(), I think this
should be possible, but up to now I did not investigate further.
Maybe Shane could confirm this? Maybe this could also be turned
into a neat way of getting incremental backups out of ZODBs in
general?


Yes, that could work.

As for MySQL growing tables without bounds... well, that wouldn't
surprise me very much.


I know that's entirely not your fault but may be worth mentioning
in the docs. RelStorage with MySQL works *very* well for DB sizes up to
5GB or so; above that, not so much :/


Also for the docs: on disk RelStorage/MySQL uses 4x the size of a
FileStorage with the same contents. As the packing tables are filled, this
grows by another factor of ~2. If you don't pack very regularly,
you might quickly end up with DBs whose sheer size doesn't permit
packing anymore.

best regards,
Jürgen







Re: [ZODB-Dev] Relstorage and over growing database.

2013-02-07 Thread Shane Hathaway

On 02/07/2013 01:54 PM, Jürgen Herrmann wrote:

On 07.02.2013 21:18, Jürgen Herrmann wrote:

I know that's entirely not your fault but may be worth mentioning
in the docs. RelStorage with MySQL works *very* well for DB sizes up to
5GB or so; above that, not so much :/


Also for the docs: on disk RelStorage/MySQL uses 4x the size of a
FileStorage with the same contents. As the packing tables are filled, this
grows by another factor of ~2. If you don't pack very regularly,
you might quickly end up with DBs whose sheer size doesn't permit
packing anymore.


I suspect there are ways to fix all of that in the MySQL configuration. 
Like any SQL database, MySQL needs tuning as it grows.  Meanwhile, 
FileStorage doesn't really have any knobs, and it always stores in a 
fairly optimal way, so it's easier to use.


FileStorage has a couple of issues that often drive people to 
RelStorage: (1) the on-disk format is unique to FileStorage, and there 
aren't many tools available for analyzing and fixing a broken Data.fs. 
(2) FileStorage only supports multiple clients through ZEO, which has 
relatively high latency.  If these issues don't impact you, then 
FileStorage is clearly the better choice for you.


Shane



Re: [ZODB-Dev] Relstorage and over growing database.

2013-02-02 Thread Shane Hathaway

On 02/01/2013 09:08 PM, Juan A. Diaz wrote:

Reading some comments [0] in the code
(relstorage/adapters/schema.py) I could see that the object_ref
table is used during packing. The question then is: in a
history-preserving database, is there something we could do to
decrease the size of that table? Could it be safe to truncate it? We
want to move to a history-free storage, but for now we are
looking at options and actions to perform on production without the


object_ref is essentially a cache of object_state, and object_refs_added 
is a list of what's in that cache.  Therefore you can freely truncate 
object_ref as long as you also truncate object_refs_added.  Don't 
truncate them during packing, though.
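
As SQL, Shane's advice amounts to the following (against RelStorage's history-preserving schema, and only while no pack is in progress):

```sql
-- Safe only when no pack is running: object_ref is a cache of
-- object_state, and object_refs_added tracks what is in that cache,
-- so both must be truncated together.  The next pre-pack rebuilds them.
TRUNCATE object_ref;
TRUNCATE object_refs_added;
```

The cost is that the next pre-pack has to re-extract references for every object instead of only new ones, so it will run correspondingly longer.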



When we realized that the size of the database was growing out of control, we
started to make daily packs, but now after reading this comment I'm starting
to think that could be part of the problem. Could it be? On a
normal day the DB grows by about 2.2GB, but after a pack the DB
size decreases by close to 1.5GB or 2GB.


If your database grows by 2.2 GB per day, it's not surprising that the 
database is 15 GB.  With drive and RAM sizes today, 15 GB doesn't sound 
like a problem to me... unless it's on a Raspberry Pi. :-)


Shane



Re: [ZODB-Dev] Relstorage and over growing database.

2013-02-02 Thread Juan A. Diaz
2013/2/2 Shane Hathaway sh...@hathawaymix.org:
 On 02/01/2013 09:08 PM, Juan A. Diaz wrote:

 Reading some comments [0] in the code
 (relstorage/adapters/schema.py) I could see that the object_ref
 table is used during packing. The question then is: in a
 history-preserving database, is there something we could do to
 decrease the size of that table? Could it be safe to truncate it? We
 want to move to a history-free storage, but for now we are
 looking at options and actions to perform on production without the


 object_ref is essentially a cache of object_state, and object_refs_added is
 a list of what's in that cache.  Therefore you can freely truncate
 object_ref as long as you also truncate object_refs_added.  Don't truncate
 them during packing, though.

Do you think that adding an option to zodbpack to truncate these tables
after the pack could be a good idea?


 When we realized that the size of the database was growing out of control, we
 started to make daily packs, but now after reading this comment I'm starting
 to think that could be part of the problem. Could it be? On a
 normal day the DB grows by about 2.2GB, but after a pack the DB
 size decreases by close to 1.5GB or 2GB.


 If your database grows by 2.2 GB per day, it's not surprising that the
 database is 15 GB.  With drive and RAM sizes today, 15 GB doesn't sound like
 a problem to me... unless it's on a Raspberry Pi. :-)

 Shane


Yes, but the size of the new objects that remain in the
database after the pack is only around 200MB.

Also, 15GB as you say is not a really big database these days, but we
are synchronizing our databases through a low-bandwidth channel across
various datacenters, and in some cases recovering the database from a
failure in the sync process is a real pain! Do you think it could be
safe not to replicate those tables? Are there other
tables that maybe we don't need to replicate?

Cheers and many thanks for your time :)

nueces...


Re: [ZODB-Dev] Relstorage and over growing database.

2013-02-02 Thread Juan A. Diaz
2013/2/2 Shane Hathaway sh...@hathawaymix.org:
 On 02/02/2013 04:13 PM, Juan A. Diaz wrote:

 2013/2/2 Shane Hathaway sh...@hathawaymix.org:

 On 02/01/2013 09:08 PM, Juan A. Diaz wrote:

 Do you think that adding an option to zodbpack to truncate these tables
 after the pack could be a good idea?


 The object_ref table is intended to help the next pack run quickly, but I
 suppose it might make sense to clear it anyway with an option.

OK, I'm going to try to implement it next week and send a pull request :)

 If your database grows by 2.2 GB per day, it's not surprising that the
 database is 15 GB.  With drive and RAM sizes today, 15 GB doesn't sound
 like a problem to me... unless it's on a Raspberry Pi. :-)


 Yes, but the size of the new objects that remain in the
 database after the pack is only around 200MB.

 Also, 15GB as you say is not a really big database these days, but we
 are synchronizing our databases through a low-bandwidth channel across
 various datacenters, and in some cases recovering the database from a
 failure in the sync process is a real pain! Do you think it could be
 safe not to replicate those tables? Are there other
 tables that maybe we don't need to replicate?


 You never need to replicate the MyISAM tables.  Only the InnoDB tables
 (object_state, current_object, transaction) need to be replicated.
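
On the MySQL side, that list can be turned into replication filters, roughly like this hypothetical my.cnf fragment (the schema name is a placeholder; new_oid is included per the earlier advice in this thread):

```ini
# replica's my.cnf -- replicate only the InnoDB tables RelStorage needs
replicate-do-table = zodb.object_state
replicate-do-table = zodb.current_object
replicate-do-table = zodb.transaction
replicate-do-table = zodb.new_oid
```

This keeps the large MyISAM packing tables (object_ref, object_refs_added, and the pack_* tables) out of the low-bandwidth replication stream entirely.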

Perfect!

 Shane


Thanks again for your time =)

Cheers.

nueces...