Re: [ZODB-Dev] zodb conversion questions

2013-02-07 Thread Jürgen Herrmann

Am 06.02.2013 15:05, schrieb Jürgen Herrmann:

Hi there!

I have a relstorage with a mysql backend that grew out of bounds
and we're looking into different backend solutions now. Possibly
also going back to FileStorage and using zeo...

Anyway we'll have to convert the databases at some point. As these
are live DBs we cannot shut them down for longer than the
usual maintenance interval during the night, so for maybe 2-3h.

a full conversion process will never complete in this time so
we're looking for a process that can split the conversion into
two phases:

1. copy transactions from a backup of the source db to the destination
   db. this can take a long time, we don't care. note the last
   timestamp/transaction_id converted.
2. shut down the source db
3. copy transactions from the source db to the destination db, starting
   at the last converted transaction_id. this should be fast, as only
   a few transactions need to be converted, say < 1%.


if i reimplemented copyTransactionsFrom() to accept a start
transaction_id/timestamp, would this result in dest being an exact
copy of source?

source = open_my_source_storage()
dest = open_my_destination_storage()
dest.copyTransactionsFrom(source)
last_txn_id = source.lastTransaction()
source.close()
dest.close()

source = open_my_source_storage()
# add some transactions
source.close()

source = open_my_source_storage()
dest = open_my_destination_storage()
dest.copyTransactionsFrom(source, last_txn_id=last_txn_id)
source.close()
dest.close()


I will reply to myself here :) This actually works, tested with a
modified version of FileStorage for now. I modified the signature
of copyTransactionsFrom to look like this:

def copyTransactionsFrom(self, source, verbose=0, not_before_tid=None):

not_before_tid is a packed tid or None, None meaning copy all
(the default, so no existing API usage would break).
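The proposed skip semantics can be modelled without any storage class at all. This is a minimal sketch of the behaviour described above; the `Txn` record type and the helper name are invented for illustration and are not part of the ZODB API:

```python
from collections import namedtuple

# Stand-in for a transaction record; only the tid matters here.
Txn = namedtuple("Txn", "tid")

def skip_already_copied(transactions, not_before_tid=None):
    """Yield only transactions strictly newer than not_before_tid.

    not_before_tid is the last tid already present in the destination
    (or None, meaning copy everything); that tid itself is excluded,
    since it was already copied in the first pass.
    """
    for txn in transactions:
        if not_before_tid is not None and txn.tid <= not_before_tid:
            continue
        yield txn

# Five fake 8-byte tids, shaped like packed ZODB tids.
txns = [Txn(bytes(7) + bytes([i])) for i in range(5)]
remaining = list(skip_already_copied(txns, not_before_tid=txns[2].tid))
# remaining holds the two transactions newer than txns[2]
```

Because packed tids sort lexicographically in time order, a plain byte comparison against the bookmark tid is enough.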

Is there public interest in modifying this API permanently?
Anybody want to look at the actual code changes?

best regards,
Jürgen Herrmann
--

XLhost.de ® - Webhosting von supersmall bis eXtra Large 


XLhost.de GmbH
Jürgen Herrmann, Geschäftsführer
Boelckestrasse 21, 93051 Regensburg, Germany

Geschäftsführer: Jürgen Herrmann
Registriert unter: HRB9918
Umsatzsteuer-Identifikationsnummer: DE245931218

Fon:  +49 (0)800 XLHOSTDE [0800 95467833]
Fax:  +49 (0)800 95467830
Web:  http://www.XLhost.de
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] zodb conversion questions

2013-02-07 Thread Jim Fulton
On Thu, Feb 7, 2013 at 10:48 AM, Jürgen Herrmann
juergen.herrm...@xlhost.de wrote:
 Am 06.02.2013 15:05, schrieb Jürgen Herrmann:

 Hi there!

 I have a relstorage with a mysql backend that grew out of bounds
 and we're looking into different backend solutions now. Possibly
 also going back to FileStorage and using zeo...

 Anyway we'll have to convert the databases at some point. As these
 are live DBs we cannot shut them down for longer than the
 usual maintenance interval during the night, so for maybe 2-3h.

 a full conversion process will never complete in this time so
 we're looking for a process that can split the conversion into
 two phases:

 1. copy transactions from a backup of the source db to the destination
    db. this can take a long time, we don't care. note the last
    timestamp/transaction_id converted.
 2. shut down the source db
 3. copy transactions from the source db to the destination db, starting
    at the last converted transaction_id. this should be fast, as only
    a few transactions need to be converted, say < 1%.


 if i reimplemented copyTransactionsFrom() to accept a start
 transaction_id/timestamp, would this result in dest being an exact
 copy of source?

 source = open_my_source_storage()
 dest = open_my_destination_storage()
 dest.copyTransactionsFrom(source)
 last_txn_id = source.lastTransaction()
 source.close()
 dest.close()

 source = open_my_source_storage()
 # add some transactions
 source.close()

 source = open_my_source_storage()
 dest = open_my_destination_storage()
 dest.copyTransactionsFrom(source, last_txn_id=last_txn_id)
 source.close()
 dest.close()


 I will reply to myself here :) This actually works, tested with a
 modified version of FileStorage for now. I modified the signature
 of copyTransactionsFrom to look like this:

 def copyTransactionsFrom(self, source, verbose=0, not_before_tid=None):

``start`` would be better to be consistent with the iterator API.

 not_before_tid is a packed tid or None, None meaning copy all
 (the default, so no existing API usage would break).

 Is there public interest in modifying this API permanently?

+.1

This API is a bit of an attractive nuisance.  I'd rather people
learn how to use iterators in their own scripts, as they are very
useful and powerful.  This API just hides that.

The second part, replaying old transactions, is a bit more subtle,
but it's still worth it for people to be aware of it.

If I were doing this today, I'd make this documentation
rather than API. But then, documentation ... whimper.

 Anybody want to look at the actual code changes?

Sure, if they have tests.  Unfortunately, we can only accept
pull requests from zope contributors. Are you one?
Wanna be one? :)

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
Jerky is better than bacon! http://zo.pe/Kqm


Re: [ZODB-Dev] zodb conversion questions

2013-02-07 Thread Jürgen Herrmann

@jim, resent to the list, sorry.

Am 07.02.2013 17:11, schrieb Jim Fulton:

On Thu, Feb 7, 2013 at 10:48 AM, Jürgen Herrmann
juergen.herrm...@xlhost.de wrote:

Am 06.02.2013 15:05, schrieb Jürgen Herrmann:


Hi there!

I have a relstorage with a mysql backend that grew out of bounds
and we're looking into different backend solutions now. Possibly
also going back to FileStorage and using zeo...

Anyway we'll have to convert the databases at some point. As these
are live DBs we cannot shut them down for longer than the
usual maintenance interval during the night, so for maybe 2-3h.

a full conversion process will never complete in this time so
we're looking for a process that can split the conversion into
two phases:

1. copy transactions from a backup of the source db to the destination
   db. this can take a long time, we don't care. note the last
   timestamp/transaction_id converted.
2. shut down the source db
3. copy transactions from the source db to the destination db, starting
   at the last converted transaction_id. this should be fast, as only
   a few transactions need to be converted, say < 1%.


if i reimplemented copyTransactionsFrom() to accept a start
transaction_id/timestamp, would this result in dest being an exact
copy of source?

source = open_my_source_storage()
dest = open_my_destination_storage()
dest.copyTransactionsFrom(source)
last_txn_id = source.lastTransaction()
source.close()
dest.close()

source = open_my_source_storage()
# add some transactions
source.close()

source = open_my_source_storage()
dest = open_my_destination_storage()
dest.copyTransactionsFrom(source, last_txn_id=last_txn_id)
source.close()
dest.close()



I will reply to myself here :) This actually works, tested with a
modified version of FileStorage for now. I modified the signature
of copyTransactionsFrom to look like this:

def copyTransactionsFrom(self, source, verbose=0, not_before_tid=None):


``start`` would be better to be consistent with the iterator API.


this was my first approach, though for my use case it would be
misleading, as the code roughly looks like this:

  if tid <= not_before_tid:
      continue

and it excludes the given tid from the transactions re-stored. maybe
we can come up with a better name but start doesn't nail it :)




not_before_tid is a packed tid or None, None meaning copy all
(the default, so no existing API usage would break).

Is there public interest in modifying this API permanently?


+.1

This API is a bit of an attractive nuisance.  I'd rather people
learn how to use iterators in their own scripts, as they are very
useful and powerful.  This API just hides that.


not sure i understand this correctly, maybe you could elaborate a
bit more? for my use case you'd suggest i just use the storage
iterator and walk/re-store the transactions in my own code?
there's a lot of checking and branching going on inside
copyTransactionsFrom(), which is why i asked whether this would work
in the first place.
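For reference, the iterator-based approach Jim alludes to boils down to replaying each transaction onto the destination through the two-phase-commit storage API. The sketch below models that control flow with stub classes so it runs standalone; a real script would make the same tpc_begin/restore/tpc_vote/tpc_finish calls on an actual storage opened the normal way (the stub classes themselves are invented, and blob handling is omitted):

```python
class StubRecord:
    """Minimal stand-in for a data record yielded by a transaction."""
    def __init__(self, oid, tid):
        self.oid, self.tid = oid, tid
        self.data, self.version, self.data_txn = b"pickle", "", None

class StubTxn:
    """Minimal stand-in for a transaction from storage.iterator()."""
    def __init__(self, tid, oids):
        self.tid, self.status = tid, " "
        self._records = [StubRecord(oid, tid) for oid in oids]
    def __iter__(self):
        return iter(self._records)

class RecordingStorage:
    """Records the calls a destination storage would receive."""
    def __init__(self):
        self.calls = []
    def tpc_begin(self, txn, tid, status):
        self.calls.append(("tpc_begin", tid))
    def restore(self, oid, serial, data, version, prev_txn, txn):
        self.calls.append(("restore", oid))
    def tpc_vote(self, txn):
        self.calls.append(("tpc_vote", txn.tid))
    def tpc_finish(self, txn):
        self.calls.append(("tpc_finish", txn.tid))

def copy_transactions(transactions, dest, not_before_tid=None):
    """Roughly the loop inside copyTransactionsFrom(): replay each
    transaction onto dest, optionally skipping tids already copied."""
    for txn in transactions:
        if not_before_tid is not None and txn.tid <= not_before_tid:
            continue
        dest.tpc_begin(txn, txn.tid, txn.status)
        for r in txn:
            dest.restore(r.oid, r.tid, r.data, r.version, r.data_txn, txn)
        dest.tpc_vote(txn)
        dest.tpc_finish(txn)

txns = [StubTxn(bytes([i]) * 8, [bytes(8)]) for i in range(3)]
dest = RecordingStorage()
copy_transactions(txns, dest, not_before_tid=txns[0].tid)
# dest.calls now holds the replay of the last two transactions
```

Driving this loop from `source.iterator()` in a standalone script is what makes the checking and branching visible instead of hidden behind the method.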


The second part, replaying old transactions, is a bit more subtle,
but it's still worth it for people to be aware of it.

If I were doing this today, I'd make this documentation
rather than API. But then, documentation ... whimper.


Anybody want to look at the actual code changes?


Sure, if they have tests.  Unfortunately, we can only accept
pull requests from zope contributors. Are you one?
Wanna be one? :)


i'll look at the supplied test and see if i can make my test script
a proper test case for the test suite. shouldn't be too hard.

we'll decide about the contributor stuff after that :)

btw i need this to be in the ZODB version that current Zope2 uses.
is that one on github already? if so, where can i find it? even
if i don't become a contributor this would make generating patches
much easier.



Jim


thanks for your help!

Jürgen



Re: [ZODB-Dev] Relstorage and over growing database.

2013-02-07 Thread Shane Hathaway

On 02/06/2013 04:23 AM, Jürgen Herrmann wrote:

I think this is not entirely correct. I ran into problems several
times when new_oid was emptied! Maybe Shane can confirm this?
(results in read conflict errors)


Ah, that's true. You do need to replicate new_oid.


Then I'd like to talk a little about my current relstorage setup here:
It's backed by mysql, history-preserving setup. Recently one of our
DBs started to grow very quickly and its object_state.ibd (InnoDB)
file is just over 86GB as of today. Packing now fails due to mysql
not being able to complete sorts in the object_ref table. object_ref
is also very big (36GB MYD file, 25GB MYI file). I took a backup of the
DB and let zodbconvert convert it back to a FileStorage, the resulting
file is 6GB (!). I will pack it and see how big it is then. I will
also investigate how big on disk this DB would be when stored in
postgresql. This situation poses another problem for us: using
zodbconvert to convert this mess to a FileStorage takes just over an
hour when writing to a ramdisk. I suspect converting to postgres
will take more than 10 hours, which is unacceptable for us as this
is a live database and cannot be offline for more than 2-3 hours in
the night. So we will have to investigate a special zodbconvert
that uses a two step process:
1. import transactions to the new storage from a mysql db backup
2. import the rest of the transactions that occurred after the backup
   was made from the live database (which is offline for that time
   of course)

looking at zodbconvert using copyTransactionsFrom() i think this
should be possible but up to now i did not investigate further.
maybe shane could confirm this? maybe this could also be transformed
into a neat way of getting incremental backups out of zodbs in
general?


Yes, that could work.
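Sketched end to end, the two-step process amounts to recording a high-water tid after the bulk copy and replaying only the tail. The helper below models this with plain (tid, data) pairs instead of real storage transactions, purely to show the bookkeeping; it is not an actual zodbconvert feature:

```python
def copy_after(source, dest, after=None):
    """Append to dest every (tid, data) pair with tid > after and
    return the new high-water tid.  A real implementation would
    replay storage transactions instead of list entries."""
    last = after
    for tid, data in source:
        if after is not None and tid <= after:
            continue
        dest.append((tid, data))
        last = tid
    return last

# Phase 1: bulk copy from a backup while the source stays live.
backup = [(1, "a"), (2, "b"), (3, "c")]
dest = []
mark = copy_after(backup, dest)            # mark == 3

# Phase 2: in the maintenance window, with the source offline,
# copy only the tail that accumulated after the backup was taken.
live = backup + [(4, "d"), (5, "e")]
mark = copy_after(live, dest, after=mark)  # mark == 5, dest complete
```

The same bookkeeping would give incremental backups: keep the last returned tid and rerun phase 2 against the live storage whenever a fresh increment is wanted.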

As for MySQL growing tables without bounds... well, that wouldn't 
surprise me very much.


Shane



Re: [ZODB-Dev] Relstorage and over growing database.

2013-02-07 Thread Jürgen Herrmann

Am 07.02.2013 20:22, schrieb Shane Hathaway:

On 02/06/2013 04:23 AM, Jürgen Herrmann wrote:

I think this is not entirely correct. I ran into problems several
times when new_oid was emptied! Maybe Shane can confirm this?
(results in read conflict errors)


Ah, that's true. You do need to replicate new_oid.

Then I'd like to talk a little about my current relstorage setup here:

It's backed by mysql, history-preserving setup. Recently one of our
DBs started to grow very quickly and its object_state.ibd (InnoDB)
file is just over 86GB as of today. Packing now fails due to mysql
not being able to complete sorts in the object_ref table. object_ref
is also very big (36GB MYD file, 25GB MYI file). I took a backup of the
DB and let zodbconvert convert it back to a FileStorage, the resulting
file is 6GB (!). I will pack it and see how big it is then. I will
also investigate how big on disk this DB would be when stored in
postgresql. This situation poses another problem for us: using
zodbconvert to convert this mess to a FileStorage takes just over an
hour when writing to a ramdisk. I suspect converting to postgres
will take more than 10 hours, which is unacceptable for us as this
is a live database and cannot be offline for more than 2-3 hours in
the night. So we will have to investigate a special zodbconvert
that uses a two step process:
1. import transactions to the new storage from a mysql db backup
2. import the rest of the transactions that occurred after the backup
   was made from the live database (which is offline for that time
   of course)

looking at zodbconvert using copyTransactionsFrom() i think this
should be possible but up to now i did not investigate further.
maybe shane could confirm this? maybe this could also be transformed
into a neat way of getting incremental backups out of zodbs in
general?


Yes, that could work.

As for MySQL growing tables without bounds... well, that wouldn't
surprise me very much.


I know that's not your fault at all but it may be worth mentioning
in the docs. RelStorage with MySQL works *very* well for DB sizes
< 5GB or so, above that - not so much :/

That issue has given me some sleepless nights, especially because
the conversion step to another storage type takes quite a long
time. But in less than two hours i came up with a workable
solution today, maybe see the other messages on the list regarding
that issue. I LOVE OPEN SOURCE. I LOVE PYTHON. :)

best regards,
Jürgen


Re: [ZODB-Dev] Relstorage and over growing database.

2013-02-07 Thread Jürgen Herrmann

Am 07.02.2013 21:18, schrieb Jürgen Herrmann:

Am 07.02.2013 20:22, schrieb Shane Hathaway:

On 02/06/2013 04:23 AM, Jürgen Herrmann wrote:

I think this is not entirely correct. I ran into problems several
times when new_oid was emptied! Maybe Shane can confirm this?
(results in read conflict errors)


Ah, that's true. You do need to replicate new_oid.

Then I'd like to talk a little about my current relstorage setup here:

It's backed by mysql, history-preserving setup. Recently one of our
DBs started to grow very quickly and its object_state.ibd (InnoDB)
file is just over 86GB as of today. Packing now fails due to mysql
not being able to complete sorts in the object_ref table. object_ref
is also very big (36GB MYD file, 25GB MYI file). I took a backup of the
DB and let zodbconvert convert it back to a FileStorage, the resulting
file is 6GB (!). I will pack it and see how big it is then. I will
also investigate how big on disk this DB would be when stored in
postgresql. This situation poses another problem for us: using
zodbconvert to convert this mess to a FileStorage takes just over an
hour when writing to a ramdisk. I suspect converting to postgres
will take more than 10 hours, which is unacceptable for us as this
is a live database and cannot be offline for more than 2-3 hours in
the night. So we will have to investigate a special zodbconvert
that uses a two step process:
1. import transactions to the new storage from a mysql db backup
2. import the rest of the transactions that occurred after the backup
   was made from the live database (which is offline for that time
   of course)

looking at zodbconvert using copyTransactionsFrom() i think this
should be possible but up to now i did not investigate further.
maybe shane could confirm this? maybe this could also be transformed
into a neat way of getting incremental backups out of zodbs in
general?


Yes, that could work.

As for MySQL growing tables without bounds... well, that wouldn't
surprise me very much.


I know that's not your fault at all but it may be worth mentioning
in the docs. RelStorage with MySQL works *very* well for DB sizes
< 5GB or so, above that - not so much :/


Also for the docs: on disk RelStorage/MySQL uses 4x the size of a
FileStorage with the same contents. As the packing tables are filled
this grows by another factor of ~2. If you don't pack very regularly
you might quickly end up with DBs that are too big to pack at all.

best regards,
Jürgen



That issue has given me some sleepless nights, especially because
the conversion step to another storage type takes quite a long
time. But in less than two hours i came up with a workable
solution today, maybe see the other messages on the list regarding
that issue. I LOVE OPEN SOURCE. I LOVE PYTHON. :)

best regards,
Jürgen


Re: [ZODB-Dev] Relstorage and over growing database.

2013-02-07 Thread Shane Hathaway

On 02/07/2013 01:54 PM, Jürgen Herrmann wrote:

Am 07.02.2013 21:18, schrieb Jürgen Herrmann:

I know that's not your fault at all but it may be worth mentioning
in the docs. RelStorage with MySQL works *very* well for DB sizes
< 5GB or so, above that - not so much :/


Also for the docs: on disk RelStorage/MySQL uses 4x the size of a
FileStorage with the same contents. As the packing tables are filled
this grows by another factor of ~2. If you don't pack very regularly
you might quickly end up with DBs that are too big to pack at all.


I suspect there are ways to fix all of that in the MySQL configuration. 
Like any SQL database, MySQL needs tuning as it grows.  Meanwhile, 
FileStorage doesn't really have any knobs, and it always stores in a 
fairly optimal way, so it's easier to use.


FileStorage has a couple of issues that often drive people to 
RelStorage: (1) the on-disk format is unique to FileStorage, and there 
aren't many tools available for analyzing and fixing a broken Data.fs. 
(2) FileStorage only supports multiple clients through ZEO, which has 
relatively high latency.  If these issues don't impact you, then 
FileStorage is clearly the better choice for you.


Shane
