Re: [ZODB-Dev] Relstorage and over growing database.
On Wed, Nov 13, 2013 at 9:24 AM, Jens W. Klein j...@bluedynamics.com wrote:

> Thanks Martijn for the hint, but we are using a history-free database, so growth in our case comes only from deleted objects.

Right, that is an important distinction.

> When in history-free mode, is it possible to detect deleted objects at store time? That way we could add the zoid to an objects_deleted table at store time in order to clean them up later.

No, because multiple references to an object might exist. There is no reference counting in a ZODB, hence the intensive manual tree traversal when garbage collecting.

> Another way to speed up graph traversal would be to store the object references in a field of object_state. At the moment we have to read the pickle in order to get the referenced zoids. Storing additional - redundant - information might not be perfect, but it would allow us to pack/gc the database without any knowledge of the state objects' structure, i.e. using a stored procedure.

That sounds like a feasible idea, at least to me.

> I would like to know what the RelStorage experts think about these ideas.

You may want to contact Shane directly; he *may* not be reading this list actively.

--
Martijn Pieters

___
For more information about ZODB, see http://zodb.org/
ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev
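The store-time bookkeeping Jens proposes for references can be sketched in a few lines. This is a toy model - `Storage`, `store`, and `garbage` are illustrative names, not RelStorage's schema or API; the point is that the mark phase touches only the redundant reference column, never a pickle:

```python
# Sketch of the "store references redundantly" idea from this thread:
# instead of unpickling object_state at pack time to discover outgoing
# references, record them in a side table when the object is stored.

class Storage:
    def __init__(self):
        self.object_state = {}   # zoid -> opaque pickle bytes
        self.object_ref = {}     # zoid -> set of referenced zoids (redundant column)

    def store(self, zoid, state, refs):
        """Store the state and, at the same time, the zoids it references.

        `refs` would come from the pickler (it already sees the persistent
        references while serializing), so pack time needs no unpickling.
        """
        self.object_state[zoid] = state
        self.object_ref[zoid] = set(refs)

    def garbage(self, root=0):
        """Mark phase using only object_ref -- no pickle reads."""
        seen, stack = set(), [root]
        while stack:
            zoid = stack.pop()
            if zoid in seen:
                continue
            seen.add(zoid)
            stack.extend(self.object_ref.get(zoid, ()))
        return set(self.object_state) - seen
```

Because the traversal reads only integer columns, it could in principle run entirely server-side (e.g. in a stored procedure), as suggested above.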
Re: [ZODB-Dev] Relstorage and over growing database.
On 2013-11-12 17:48, Martijn Pieters wrote:

> On Mon, Nov 11, 2013 at 9:24 PM, Daniel Widerin dan...@widerin.net wrote:
>
>> Anyone experienced similar problems packing large RelStorage databases? The graph traversal takes a really long time. Maybe we can improve that by storing additional information in the relational database?
>
> You should (at least initially) pack *without* GC (set pack-gc to false); I packed a humongous RelStorage-backed database before, and packed to earlier dates in the past first to minimize the amount of data removed in a single transaction. Only when we were down to a reasonably sized database did we enable garbage collection.

Thanks Martijn for the hint, but we are using a history-free database, so growth in our case comes only from deleted objects.

When in history-free mode, is it possible to detect deleted objects at store time? That way we could add the zoid to an objects_deleted table at store time in order to clean them up later.

Another way to speed up graph traversal would be to store the object references in a field of object_state. At the moment we have to read the pickle in order to get the referenced zoids. Storing additional - redundant - information might not be perfect, but it would allow us to pack/gc the database without any knowledge of the state objects' structure, i.e. using a stored procedure.

I would like to know what the RelStorage experts think about these ideas.

kind regards
Jens

--
Klein Partner KG, member of BlueDynamics Alliance
Re: [ZODB-Dev] Relstorage and over growing database.
On Mon, Nov 11, 2013 at 9:24 PM, Daniel Widerin dan...@widerin.net wrote:

> Anyone experienced similar problems packing large RelStorage databases? The graph traversal takes a really long time. Maybe we can improve that by storing additional information in the relational database?

You should (at least initially) pack *without* GC (set pack-gc to false); I packed a humongous RelStorage-backed database before, and packed to earlier dates in the past first to minimize the amount of data removed in a single transaction. Only when we were down to a reasonably sized database did we enable garbage collection.

--
Martijn Pieters
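Both pieces of Martijn's advice - disabling GC and packing to successively earlier dates - rest on what packing to a point in time does in a history-preserving storage: every non-current revision at or before the pack time is dropped, the current-as-of-pack-time revision and everything newer is kept. A toy model (record layout and names are illustrative):

```python
# Toy model of packing a history-preserving storage to a point in time,
# without garbage collection (the pack-gc false case discussed above).
# Records are (tid, zoid) pairs, in tid order.

def pack_to(records, pack_tid):
    """Drop every revision at or before pack_tid that is not its zoid's
    newest revision at pack_tid.  Revisions after pack_tid are kept."""
    newest_at_pack = {}
    for tid, zoid in records:
        if tid <= pack_tid and tid > newest_at_pack.get(zoid, -1):
            newest_at_pack[zoid] = tid
    return [
        (tid, zoid) for tid, zoid in records
        if tid > pack_tid or tid == newest_at_pack.get(zoid)
    ]
```

Packing first to an early pack_tid and then to later ones removes the same rows overall, but in several smaller transactions - which is exactly the motivation given above.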
Re: [ZODB-Dev] Relstorage and over growing database.
Hi, just want to share our experience:

My ZODB contains 300 million objects on RelStorage/PostgreSQL. The number of objects is caused by BTrees stored on Plone Dexterity content types. Its size is 160GB. At that size it's impossible to pack, because the pre-pack alone takes 100 days. jensens and I are searching for different packing algorithms and methods to achieve better packing performance. We'll keep you updated here!

How I solved my problem for now: I converted to FileStorage, which took about 40 hours; the resulting Data.fs was 55GB in size. Then I ran zeopack on that database - which succeeded, and the database was reduced to 7.8GB - still containing 40 million objects. After that I migrated back to RelStorage because of its better performance, and the result is an 11GB database in PostgreSQL.

Anyone experienced similar problems packing large RelStorage databases? The graph traversal takes a really long time. Maybe we can improve that by storing additional information in the relational database?

Any hints or comments are welcome.

Daniel

Am 07.02.13 23:10, schrieb Shane Hathaway:

> On 02/07/2013 01:54 PM, Jürgen Herrmann wrote:
>
>> Am 07.02.2013 21:18, schrieb Jürgen Herrmann:
>>
>>> I know that's not your fault at all, but it may be worth mentioning in the docs. RelStorage with MySQL works *very* well for DB sizes up to 5GB or so; above that - not so much :/
>>
>> Also for the docs: on disk, RelStorage/MySQL uses 4x the size of a FileStorage with the same contents. As the packing tables are filled, this grows by another factor of ~2. If you don't pack very regularly, you can quickly end up with DBs that don't permit packing anymore because of their size.
>
> I suspect there are ways to fix all of that in the MySQL configuration. Like any SQL database, MySQL needs tuning as it grows. Meanwhile, FileStorage doesn't really have any knobs, and it always stores in a fairly optimal way, so it's easier to use.
> FileStorage has a couple of issues that often drive people to RelStorage: (1) the on-disk format is unique to FileStorage, and there aren't many tools available for analyzing and fixing a broken Data.fs. (2) FileStorage only supports multiple clients through ZEO, which has relatively high latency. If these issues don't impact you, then FileStorage is clearly the better choice for you.
>
> Shane
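The round trips Daniel describes (RelStorage to FileStorage, zeopack, then back) are driven by zodbconvert, whose config file names a source and a destination storage. A minimal sketch of the first leg might look like this - the dsn and path are placeholders, and keep-history should match the source database:

```
<relstorage source>
  keep-history false
  <postgresql>
    dsn dbname='zodb' host='localhost'
  </postgresql>
</relstorage>
<filestorage destination>
  path /var/tmp/Data.fs
</filestorage>
```

Run as `zodbconvert convert.conf`; the reverse leg swaps which storage is marked `source` and which `destination`.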
Re: [ZODB-Dev] Relstorage and over growing database.
On Mon, Nov 11, 2013 at 4:24 PM, Daniel Widerin dan...@widerin.net wrote:

> Hi, just want to share our experience:
>
> My ZODB contains 300 million objects on RelStorage/PostgreSQL. The number of objects is caused by BTrees stored on Plone Dexterity content types. Its size is 160GB. At that size it's impossible to pack, because the pre-pack alone takes 100 days. jensens and I are searching for different packing algorithms and methods to achieve better packing performance. We'll keep you updated here!
>
> How I solved my problem for now: I converted to FileStorage, which took about 40 hours; the resulting Data.fs was 55GB in size. Then I ran zeopack on that database - which succeeded, and the database was reduced to 7.8GB - still containing 40 million objects. After that I migrated back to RelStorage because of its better performance, and the result is an 11GB database in PostgreSQL.

Hah. Nice. Have you measured an improvement in RelStorage performance in practice? Is it enough to justify this hassle?

WRT packing algorithms:

- You might look at zc.FileStorage, which takes a slightly different approach than FileStorage:
  - It does most of the packing work in a separate process to avoid the GIL.
  - It doesn't do GC.
  - It has some other optimizations I don't recall.
  For our large databases, it's much faster than normal file-storage packing.

- Consider separating garbage collection and packing. This allows garbage collection to be run mostly against a replica and to be spread out, if necessary. Look at zc.zodbdgc.

> Anyone experienced similar problems packing large RelStorage databases? The graph traversal takes a really long time. Maybe we can improve that by storing additional information in the relational database?
>
> Any hints or comments are welcome.

Definitely look at zodbdgc. It doesn't traverse the graph. It essentially does reference counting and is able to iterate over the database, which, for FileStorage, is relatively quick.
Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
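The approach Jim describes - one linear scan of the database rather than a traversal from the root - can be sketched with naive reference counting. This is a toy model, not zc.zodbdgc's actual implementation; note that pure reference counting as written here never collects garbage *cycles*, which need extra handling:

```python
# One pass over the database builds in-bound reference counts; anything
# never referenced (other than the root) is garbage, and deleting it
# decrements counts, possibly exposing more garbage.

from collections import deque

def find_garbage(refs, root=0):
    """refs: zoid -> iterable of referenced zoids (one scan of the db)."""
    counts = {zoid: 0 for zoid in refs}
    for outgoing in refs.values():
        for to in outgoing:
            counts[to] = counts.get(to, 0) + 1
    queue = deque(z for z, c in counts.items() if c == 0 and z != root)
    garbage = set()
    while queue:
        zoid = queue.popleft()
        if zoid in garbage:
            continue
        garbage.add(zoid)
        for to in refs.get(zoid, ()):
            counts[to] -= 1          # the referrer is gone...
            if counts[to] == 0 and to != root:
                queue.append(to)     # ...so this may now be garbage too
    return garbage
```

The appeal for large databases is that the expensive part is a sequential iteration (fast for FileStorage, and runnable against a replica), not millions of random pickle loads.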
Re: [ZODB-Dev] Relstorage and over growing database.
On 02/06/2013 04:23 AM, Jürgen Herrmann wrote:

> I think this is not entirely correct. I ran into problems several times when new_oid was emptied! Maybe Shane can confirm this? (It results in read conflict errors.)

Ah, that's true. You do need to replicate new_oid.

> Then I'd like to talk a little about my current RelStorage setup here: it's backed by MySQL, in a history-preserving setup. Recently one of our DBs started to grow very quickly, and its object_state.ibd (InnoDB) file is just over 86GB as of today. Packing now fails due to MySQL not being able to complete sorts in the object_ref table. object_ref is also very big (36GB MYD file, 25GB MYI file). I took a backup of the DB and let zodbconvert convert it back to a FileStorage; the resulting file is 6GB (!). I will pack it and see how big it is then. I will also investigate how big on disk this DB would be when stored in PostgreSQL.
>
> This situation poses another problem for us: using zodbconvert to convert this mess to a FileStorage takes just over an hour when writing to a ramdisk. I suspect converting to Postgres will take more than 10 hours, which is unacceptable for us, as this is a live database and cannot be offline for more than 2-3 hours in the night. So we will have to investigate a special zodbconvert that uses a two-step process:
>
> 1. Import transactions to the new storage from a MySQL db backup.
> 2. Import the rest of the transactions, which occurred after the backup was made, from the live database (which is offline for that time, of course).
>
> Looking at zodbconvert using copyTransactionsFrom(), I think this should be possible, but up to now I did not investigate further. Maybe Shane could confirm this? Maybe this could also be transformed into a neat way of getting incremental backups out of ZODBs in general?

Yes, that could work.

As for MySQL growing tables without bounds... well, that wouldn't surprise me very much.
Shane
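The two-step conversion Jürgen sketches (and Shane confirms could work) boils down to resuming a transaction copy from the last tid already copied. A toy model with transactions as (tid, data) pairs - the real copyTransactionsFrom() operates on storage iterators, and the names here are illustrative:

```python
# Step 1 copies the bulk of the transactions from an offline backup while
# the live database stays up; step 2, during a short downtime window,
# copies only the transactions that arrived after the backup was taken.

def copy_transactions(dest, txns, start_after=None):
    """Append transactions (tid, data) with tid > start_after to dest.

    Returns the last tid in dest, i.e. the resume point for the next run.
    """
    for tid, data in txns:
        if start_after is None or tid > start_after:
            dest.append((tid, data))
    return dest[-1][0] if dest else None
```

The same resume-from-tid idea would also give the incremental backups mentioned above: each run copies only transactions newer than the previous run's last tid.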
Re: [ZODB-Dev] Relstorage and over growing database.
Am 07.02.2013 20:22, schrieb Shane Hathaway:

> [...]
>
> Yes, that could work.
>
> As for MySQL growing tables without bounds... well, that wouldn't surprise me very much.

I know that's not your fault at all, but it may be worth mentioning in the docs. RelStorage with MySQL works *very* well for DB sizes up to 5GB or so; above that - not so much :/

That issue has given me some sleepless nights, especially because the conversion step to another storage type takes quite a long time. But in less than two hours I came up with a workable solution today; see the other messages on the list regarding that issue.

I LOVE OPEN SOURCE. I LOVE PYTHON. :)

best regards,
Jürgen

--
XLhost.de ® - Webhosting von supersmall bis eXtra Large
XLhost.de GmbH
Jürgen Herrmann, Geschäftsführer
Boelckestrasse 21, 93051 Regensburg, Germany
Registriert unter: HRB9918
Umsatzsteuer-Identifikationsnummer: DE245931218
Fon: +49 (0)800 XLHOSTDE [0800 95467833]
Fax: +49 (0)800 95467830
Web: http://www.XLhost.de
Re: [ZODB-Dev] Relstorage and over growing database.
Am 07.02.2013 21:18, schrieb Jürgen Herrmann:

> Am 07.02.2013 20:22, schrieb Shane Hathaway:
>
>> [...]
>
> I know that's not your fault at all, but it may be worth mentioning in the docs. RelStorage with MySQL works *very* well for DB sizes up to 5GB or so; above that - not so much :/

Also for the docs: on disk, RelStorage/MySQL uses 4x the size of a FileStorage with the same contents. As the packing tables are filled, this grows by another factor of ~2. If you don't pack very regularly, you can quickly end up with DBs that don't permit packing anymore because of their size.

best regards,
Jürgen
Re: [ZODB-Dev] Relstorage and over growing database.
On 02/07/2013 01:54 PM, Jürgen Herrmann wrote:

> Am 07.02.2013 21:18, schrieb Jürgen Herrmann:
>
>> I know that's not your fault at all, but it may be worth mentioning in the docs. RelStorage with MySQL works *very* well for DB sizes up to 5GB or so; above that - not so much :/
>
> Also for the docs: on disk, RelStorage/MySQL uses 4x the size of a FileStorage with the same contents. As the packing tables are filled, this grows by another factor of ~2. If you don't pack very regularly, you can quickly end up with DBs that don't permit packing anymore because of their size.

I suspect there are ways to fix all of that in the MySQL configuration. Like any SQL database, MySQL needs tuning as it grows. Meanwhile, FileStorage doesn't really have any knobs, and it always stores in a fairly optimal way, so it's easier to use.

FileStorage has a couple of issues that often drive people to RelStorage: (1) the on-disk format is unique to FileStorage, and there aren't many tools available for analyzing and fixing a broken Data.fs. (2) FileStorage only supports multiple clients through ZEO, which has relatively high latency. If these issues don't impact you, then FileStorage is clearly the better choice for you.

Shane
Re: [ZODB-Dev] Relstorage and over growing database.
On 02/01/2013 09:08 PM, Juan A. Diaz wrote:

> Reading some comments [0] in the code (relstorage/adapters/schema.py), I could see that the object_ref table is used during packing. The question, then: in a history-preserving database, is there something we could do to decrease the size of that table? Would it be safe to truncate it? We want to move to a history-free storage, but for now we are looking for options and actions to perform on production.

object_ref is essentially a cache of object_state, and object_refs_added is a list of what's in that cache. Therefore you can freely truncate object_ref as long as you also truncate object_refs_added. Don't truncate them during packing, though.

> When we realized that the size of the database was growing out of control, we started to make daily packs, but now, after reading this comment, I'm starting to think that that could be part of the problem. Could it be? Normally the DB grows by about 2.2GB a day, but after a pack the DB size decreases by close to 1.5GB or 2GB.

If your database grows by 2.2GB per day, it's not surprising that the database is 15GB. With drive and RAM sizes today, 15GB doesn't sound like a problem to me... unless it's on a Raspberry Pi. :-)

Shane
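Shane's advice amounts to two statements, using the table names from the RelStorage schema. Run them only while no pack is in progress, and always together:

```sql
-- object_ref is a cache of object_state; object_refs_added records
-- what is already in that cache.  Truncate both, or neither.
TRUNCATE TABLE object_ref;
TRUNCATE TABLE object_refs_added;
```

The cost is that the next pack must rebuild the cache from scratch, so its pre-pack phase will be slower.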
Re: [ZODB-Dev] Relstorage and over growing database.
2013/2/2 Shane Hathaway sh...@hathawaymix.org:

> On 02/01/2013 09:08 PM, Juan A. Diaz wrote:
>
>> [...]
>
> object_ref is essentially a cache of object_state, and object_refs_added is a list of what's in that cache. Therefore you can freely truncate object_ref as long as you also truncate object_refs_added. Don't truncate them during packing, though.

Do you think it would be a good idea to add an option to zodbpack to truncate these tables after the pack?

> [...]
>
> If your database grows by 2.2GB per day, it's not surprising that the database is 15GB. With drive and RAM sizes today, 15GB doesn't sound like a problem to me... unless it's on a Raspberry Pi. :-)

Yes, but after the pack the size of new objects that remain in the database is only about 200MB. Also, 15GB, as you say, is not a really big database these days, but we are synchronizing our databases over a low-bandwidth channel across various datacenters, and in some cases recovering the database from a failure in the sync process is a real pain! Do you think it could be safe not to replicate those tables? Are there other tables that maybe we don't need to replicate?

Cheers, and many thanks for your time :)

nueces...
Re: [ZODB-Dev] Relstorage and over growing database.
2013/2/2 Shane Hathaway sh...@hathawaymix.org:

> On 02/02/2013 04:13 PM, Juan A. Diaz wrote:
>
>> Do you think it would be a good idea to add an option to zodbpack to truncate these tables after the pack?
>
> The object_ref table is intended to help the next pack run quickly, but I suppose it might make sense to clear it anyway with an option.

OK, I'm going to try to implement it next week and send a pull request. :)

>> Yes, but after the pack the size of new objects that remain in the database is only about 200MB. Also, 15GB, as you say, is not a really big database these days, but we are synchronizing our databases over a low-bandwidth channel across various datacenters, and in some cases recovering the database from a failure in the sync process is a real pain! Do you think it could be safe not to replicate those tables? Are there other tables that maybe we don't need to replicate?
>
> You never need to replicate the MyISAM tables. Only the InnoDB tables (object_state, current_object, transaction) need to be replicated.

Perfect!

> Shane

Thanks again for your time. =)

Cheers.

nueces...
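Shane's table list - plus the new_oid correction from earlier in this thread - could be expressed as replication filters on the replica. A sketch for MySQL, assuming the schema is named `zodb` (adjust to your schema name):

```
# my.cnf on the replica -- replicate only the tables that carry data,
# skipping the MyISAM pack/GC tables entirely
replicate-do-table = zodb.object_state
replicate-do-table = zodb.current_object
replicate-do-table = zodb.transaction
# per the earlier correction in this thread, new_oid is needed too
replicate-do-table = zodb.new_oid
```

Everything not listed (object_ref, object_refs_added, the pack_* tables) is rebuilt by the next pack, so skipping it saves bandwidth without losing data.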