Re: [ZODB-Dev] zodb conversion questions
Am 06.02.2013 15:05, schrieb Jürgen Herrmann:
> Hi there!
>
> I have a RelStorage with a MySQL backend that grew out of bounds, and we're
> looking into different backend solutions now - possibly also going back to
> FileStorage and using ZEO. Anyway, we'll have to convert the databases at
> some point. As these are live DBs, we cannot shut them down for longer than
> the usual maintenance window during the night, so for maybe 2-3 hours. A
> full conversion will never complete in that time, so we're looking for a
> process that can split the conversion into two phases:
>
> 1. Copy transactions from a backup of the source DB to the destination DB.
>    This can take a long time, we don't care. Note the last
>    timestamp/transaction id converted.
> 2. Shut down the source DB.
> 3. Copy transactions from the source DB to the destination DB, starting at
>    the last converted transaction id. This should be fast, as only a few
>    transactions need to be converted, say 1%.
>
> If I were to reimplement copyTransactionsFrom() to accept a start
> transaction id/timestamp, would this result in dest being an exact copy of
> source?
>
>     source = open_my_source_storage()
>     dest = open_my_destination_storage()
>     dest.copyTransactionsFrom(source)
>     last_txn_id = source.lastTransaction()
>     source.close()
>     dest.close()
>
>     source = open_my_source_storage()
>     # add some transactions
>     source.close()
>
>     source = open_my_source_storage()
>     dest = open_my_destination_storage()
>     dest.copyTransactionsFrom(source, last_txn_id=last_txn_id)
>     source.close()
>     dest.close()

I will reply to myself here :)

This actually works, tested with a modified version of FileStorage for now.
I modified the signature of copyTransactionsFrom() to look like this:

    def copyTransactionsFrom(self, source, verbose=0, not_before_tid=None):

not_before_tid is a packed tid or None, None meaning copy everything (the
default, so no existing API usage would break).

Is there public interest in modifying this API permanently? Anybody want to
look at the actual code changes?

best regards,
Jürgen Herrmann

--
XLhost.de ® - Webhosting von supersmall bis eXtra Large

XLhost.de GmbH
Jürgen Herrmann, Geschäftsführer
Boelckestrasse 21, 93051 Regensburg, Germany

Geschäftsführer: Jürgen Herrmann
Registriert unter: HRB9918
Umsatzsteuer-Identifikationsnummer: DE245931218

Fon: +49 (0)800 XLHOSTDE [0800 95467833]
Fax: +49 (0)800 95467830
Web: http://www.XLhost.de
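For readers following along: the proposed change amounts to an extra filter at
the top of the usual replay loop. Below is a minimal sketch of the idea as a
free function, roughly mirroring the iterator()/restore() replay that ZODB's
stock copy helper performs. It is illustrative only, not the actual patch, and
blob handling is ignored.

    def copy_transactions(source, dest, not_before_tid=None):
        """Replay transactions from source into dest.

        not_before_tid is a packed 8-byte tid or None; transactions with
        tid <= not_before_tid are skipped, so a second run can pick up
        where an earlier full copy left off.
        """
        for txn in source.iterator():
            if not_before_tid is not None and txn.tid <= not_before_tid:
                continue
            dest.tpc_begin(txn, txn.tid, txn.status)
            for record in txn:
                # restore() keeps the original oid/tid instead of assigning
                # new ones, so dest ends up with the same history as source.
                dest.restore(record.oid, record.tid, record.data,
                             getattr(record, 'version', ''),
                             record.data_txn, txn)
            dest.tpc_vote(txn)
            dest.tpc_finish(txn)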
Re: [ZODB-Dev] zodb conversion questions
On Thu, Feb 7, 2013 at 10:48 AM, Jürgen Herrmann
juergen.herrm...@xlhost.de wrote:
> [...]
>
> I will reply to myself here :)
>
> This actually works, tested with a modified version of FileStorage for now.
> I modified the signature of copyTransactionsFrom() to look like this:
>
>     def copyTransactionsFrom(self, source, verbose=0, not_before_tid=None):

``start`` would be better, to be consistent with the iterator API.

> not_before_tid is a packed tid or None, None meaning copy everything (the
> default, so no existing API usage would break).
>
> Is there public interest in modifying this API permanently?

+.1

This API is a bit of an attractive nuisance. I'd rather people learn how to
use iterators in their own scripts, as they are very useful and powerful.
This API just hides that.

The second part, replaying old transactions, is a bit more subtle, but it's
still worth it for people to be aware of it.

If I were doing this today, I'd make this documentation rather than API.
But then, documentation ... whimper.

> Anybody want to look at the actual code changes?

Sure, if they have tests.

Unfortunately, we can only accept pull requests from zope contributors.
Are you one? Wanna be one? :)

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Jerky is better than bacon! http://zo.pe/Kqm
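The iterator API referred to here is IStorageIteration.iterator(start=None,
stop=None), where start and stop are packed tids and the range is inclusive.
A minimal sketch of the "do it in your own script" approach under that
assumption, reusing the source/dest/last_txn_id names from the earlier
example and again assuming the destination implements restore():

    from ZODB.utils import p64, u64

    # iterator(start) is inclusive, so bump the last copied tid by one to
    # replay only the transactions committed after the first copy pass.
    start = p64(u64(last_txn_id) + 1)
    for txn in source.iterator(start):
        dest.tpc_begin(txn, txn.tid, txn.status)
        for record in txn:
            dest.restore(record.oid, record.tid, record.data,
                         getattr(record, 'version', ''), record.data_txn, txn)
        dest.tpc_vote(txn)
        dest.tpc_finish(txn)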
Re: [ZODB-Dev] zodb conversion questions
@jim, resent to the list, sorry.

Am 07.02.2013 17:11, schrieb Jim Fulton:
> On Thu, Feb 7, 2013 at 10:48 AM, Jürgen Herrmann
> juergen.herrm...@xlhost.de wrote:
>> [...]
>>
>> I modified the signature of copyTransactionsFrom() to look like this:
>>
>>     def copyTransactionsFrom(self, source, verbose=0, not_before_tid=None):
>
> ``start`` would be better, to be consistent with the iterator API.

This was my first approach, though for my use case it would be misleading,
as the code roughly looks like this:

    if tid <= not_before_tid:
        continue

and it excludes the given tid from the transactions that are re-stored.
Maybe we can come up with a better name, but "start" doesn't nail it :)

>> not_before_tid is a packed tid or None, None meaning copy everything (the
>> default, so no existing API usage would break).
>>
>> Is there public interest in modifying this API permanently?
>
> +.1
>
> This API is a bit of an attractive nuisance. I'd rather people learn how
> to use iterators in their own scripts, as they are very useful and
> powerful. This API just hides that.

Not sure I understand this correctly, maybe you could elaborate a bit more?
For my use case, you'd suggest I just use the storage iterator and
walk/re-store the transactions in my own code? There's a lot of checking
and branching going on inside copyTransactionsFrom(), that's why I asked
whether this would work in the first place.

> The second part, replaying old transactions, is a bit more subtle, but
> it's still worth it for people to be aware of it.
>
> If I were doing this today, I'd make this documentation rather than API.
> But then, documentation ... whimper.
>
>> Anybody want to look at the actual code changes?
>
> Sure, if they have tests.
>
> Unfortunately, we can only accept pull requests from zope contributors.
> Are you one? Wanna be one? :)

I'll look at the supplied tests and see if I can turn my test script into a
proper test case for the test suite. Shouldn't be too hard. We'll decide
about the contributor stuff after that :)

Btw, I need this to be in the ZODB version that current Zope2 uses - is
that one on github already? If so, where can I find it? Even if I don't
become a contributor, this would make generating patches much easier.

Jim, thanks for your help!

Jürgen
Re: [ZODB-Dev] Relstorage and over growing database.
On 02/06/2013 04:23 AM, Jürgen Herrmann wrote:
> I think this is not entirely correct. I ran into problems several times
> when new_oid was emptied! Maybe Shane can confirm this? (results in read
> conflict errors)

Ah, that's true. You do need to replicate new_oid.

> Then I'd like to talk a little about my current RelStorage setup here:
> It's backed by MySQL, history-preserving setup. Recently one of our DBs
> started to grow very quickly, and its object_state.ibd (InnoDB) file is
> just over 86GB as of today. Packing now fails due to MySQL not being able
> to complete sorts in the object_ref table. object_ref is also very big
> (36GB MYD file, 25GB MYI file).
>
> I took a backup of the DB and let zodbconvert convert it back to a
> FileStorage; the resulting file is 6GB (!). I will pack it and see how
> big it is then. I will also investigate how big on disk this DB would be
> when stored in PostgreSQL.
>
> This situation poses another problem for us: using zodbconvert to convert
> this mess to a FileStorage takes just over an hour when writing to a
> ramdisk. I suspect converting to Postgres will take more than 10 hours,
> which is unacceptable for us, as this is a live database and cannot be
> offline for more than 2-3 hours in the night. So we will have to
> investigate a special zodbconvert that uses a two-step process:
>
> 1. Import transactions to the new storage from a MySQL DB backup.
> 2. Import the rest of the transactions that occurred after the backup was
>    made from the live database (which is offline for that time, of course).
>
> Looking at zodbconvert using copyTransactionsFrom(), I think this should
> be possible, but up to now I did not investigate further. Maybe Shane
> could confirm this? Maybe this could also be transformed into a neat way
> of getting incremental backups out of ZODBs in general?

Yes, that could work.

As for MySQL growing tables without bounds... well, that wouldn't surprise
me very much.

Shane
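To make the incremental-backup idea concrete: a small sketch of a driver that
records the last copied tid between runs. The state file and the helper name
are invented for illustration, not an existing tool; the replay loop is the
same restore()-based one discussed elsewhere in the thread, and blobs are
again ignored.

    import os
    from ZODB.utils import p64, u64

    def incremental_copy(source, dest, state_file='last_copied_tid'):
        """Copy only transactions newer than the tid recorded by the last run."""
        start = None
        if os.path.exists(state_file):
            with open(state_file, 'rb') as f:
                # resume just after the tid recorded last time (start is inclusive)
                start = p64(u64(f.read()) + 1)
        last = None
        for txn in source.iterator(start):
            dest.tpc_begin(txn, txn.tid, txn.status)
            for record in txn:
                dest.restore(record.oid, record.tid, record.data,
                             getattr(record, 'version', ''),
                             record.data_txn, txn)
            dest.tpc_vote(txn)
            dest.tpc_finish(txn)
            last = txn.tid
        if last is not None:
            with open(state_file, 'wb') as f:
                f.write(last)  # remember where to resume next time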
Re: [ZODB-Dev] Relstorage and over growing database.
Am 07.02.2013 20:22, schrieb Shane Hathaway:
> [...]
>
> Yes, that could work.
>
> As for MySQL growing tables without bounds... well, that wouldn't
> surprise me very much.

I know that's entirely not your fault, but it may be worth mentioning in
the docs. RelStorage with MySQL works *very* well for DB sizes below 5GB
or so; above that - not so much :/

That issue has given me some sleepless nights, especially because the
conversion step to another storage type takes quite a long time. But in
less than two hours I came up with a workable solution today - see the
other messages on the list regarding that issue.

I LOVE OPEN SOURCE. I LOVE PYTHON. :)

best regards,
Jürgen
Re: [ZODB-Dev] Relstorage and over growing database.
Am 07.02.2013 21:18, schrieb Jürgen Herrmann:
> Am 07.02.2013 20:22, schrieb Shane Hathaway:
>> [...]
>>
>> As for MySQL growing tables without bounds... well, that wouldn't
>> surprise me very much.
>
> I know that's entirely not your fault, but it may be worth mentioning in
> the docs. RelStorage with MySQL works *very* well for DB sizes below 5GB
> or so; above that - not so much :/

Also for the docs: on disk, RelStorage/MySQL uses about 4x the size of a
FileStorage with the same contents. As the packing tables are filled, this
grows by another factor of ~2. If you don't pack very regularly, you can
quickly end up with DBs that are too big to be packed at all anymore.

best regards,
Jürgen
Re: [ZODB-Dev] Relstorage and over growing database.
On 02/07/2013 01:54 PM, Jürgen Herrmann wrote:
> Am 07.02.2013 21:18, schrieb Jürgen Herrmann:
>> I know that's entirely not your fault, but it may be worth mentioning in
>> the docs. RelStorage with MySQL works *very* well for DB sizes below 5GB
>> or so; above that - not so much :/
>
> Also for the docs: on disk, RelStorage/MySQL uses about 4x the size of a
> FileStorage with the same contents. As the packing tables are filled,
> this grows by another factor of ~2. If you don't pack very regularly, you
> can quickly end up with DBs that are too big to be packed at all anymore.

I suspect there are ways to fix all of that in the MySQL configuration.
Like any SQL database, MySQL needs tuning as it grows. Meanwhile,
FileStorage doesn't really have any knobs, and it always stores in a fairly
optimal way, so it's easier to use.

FileStorage has a couple of issues that often drive people to RelStorage:
(1) the on-disk format is unique to FileStorage, and there aren't many
tools available for analyzing and fixing a broken Data.fs, and (2)
FileStorage only supports multiple clients through ZEO, which has
relatively high latency. If these issues don't impact you, then FileStorage
is clearly the better choice for you.

Shane