Re: [ZODB-Dev] Relstorage pack problems
On Fri, Jan 23, 2009 at 8:45 PM, Shane Hathaway sh...@hathawaymix.org wrote: Shane Hathaway wrote: Assuming your bad script caused your problem, it is likely that packing will still mess up your database, since you still probably have mixed-up object_state rows. Don't pack until I've had a chance to look again. Here is some more analysis. Now that I understand you accidentally merged two databases into one by forcing copyTransactionsFrom() to run when it shouldn't, I looked for the transactions you merged. First I looked for the OIDs with a confused transaction ID. = select zoid from current_object where tid != (select max(tid) from object_state where object_state.zoid = current_object.zoid); zoid -- 7 10 12 11 9 8 (6 rows) Then I listed all non-current transaction IDs for those objects. = select zoid, tid from object_state where zoid in (7,8,9,10,11,12) and tid != (select tid from current_object where current_object.zoid = object_state.zoid); zoid |tid --+ 8 | 250499913748614178 9 | 250499913748614178 10 | 250499913748614178 11 | 250499913748614178 12 | 250499913748614178 7 | 250499913748614178 (6 rows) Based on this information and the information in my last email, I can deduce that you fortunately merged only two transactions from another database and that while the merge caused conflicts, these objects haven't been otherwise modified. Note that the bad database merge could have happened at any time, not necessarily November 17 when these transactions were created. Anyone with access to your database and your broken script could cause this problem again. Fix the script quickly. Here are the two bad transactions: 250499913441768123 | initial database creation 250499913748614178 | /manage_main\012\012Created Zope Application You need to delete all traces of these two transactions from your database. Before you do, please ensure nothing is actually using them. The query below should not return any rows. select * from current_object where tid in (250499913441768123, 250499913748614178); Assuming that query returns no rows, here is how you can remove the bad transactions: update object_state set prev_tid = 0 where prev_tid in (250499913441768123, 250499913748614178); delete from object_state where tid in (250499913441768123, 250499913748614178); delete from object_ref where tid in (250499913441768123, 250499913748614178); delete from object_refs_added where tid in (250499913441768123, 250499913748614178); delete from transaction where tid in (250499913441768123, 250499913748614178); commit; Once you've done that, you should see no more anomalies in current_object: = select zoid from current_object where tid != (select max(tid) from object_state where object_state.zoid = current_object.zoid); zoid -- (0 rows) I used several shortcuts for this solution, particularly the statement that sets prev_tid to 0. If you had merged a more complex database, I wouldn't have been able to use shortcuts. I'm glad to know RelStorage didn't do anything wrong after all. Perhaps the copyTransactionsFrom() method could work harder to prevent a mishap like this, but that method is part of the ZODB API, not RelStorage, so I don't have as much control over it. However, I still don't want you to pack yet because my experiments with packing your database has revealed some unexpected behavior. I'm going to look into it. Thanks again, Shane. We'll fix the script, try removing this 2 transactions and packing on a copy of the database, to see what happens. The last pack on a copy works, but then the application raises a KeyError 8, probably becouse this zoid is one of the affected by the wrong transactions. -- Santi Camps (Earcon S.L.) http://www.earcon.com http://www.kmkey.com ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Relstorage pack problems
On Fri, Jan 23, 2009 at 2:38 AM, Shane Hathaway sh...@hathawaymix.org wrote: Santi, I hope you don't mind me discussing your database in public. I'm not going to talk about anything that looks like it could be private. Other RelStorage users might benefit from the analysis. Hi Shane That's right, of course. Furthermore, thanks a lot for this analysis. Looking at your database, I see that something bad happened just before transaction 250499913441768123. That number is an encoded time stamp: from ZODB.TimeStamp import TimeStamp from ZODB.utils import p64 str(TimeStamp(p64(250499913441768123))) '2008-11-17 19:36:04.913130' The transaction log entry says initial database creation, which means that the database had no root object (OID 0), so ZODB created one and started a brand new database. Strange! This happened about an hour after a transaction labeled: /asp_ekartek/kmkey_iso/portal_setup/manage_doUpgrades I'm guessing that an upgrade script did something horribly wrong that day. I've been revising what happens this day, and I think we are near to get the guilty.This day the database was migrated from DirectoryStorage to Relstorage.AFAIK, the upgrade should be done in DirectoryStorage, before the conversion to Relstorage. I don't think the upgrade can corrupt the database, all operations ara high level ones, and we never change ZODB objects by hand. So, the problem should be in the conversion process. I attach the script we use to do the conversion. Be free to include in Relstorage if you think it's useful and it is well done (as I said, I really don't know much about ZODB, I just mix zodbconvert.py with some DirectoryStorage code) Furthermore, the entry for OID 0 in the current_object table points to an old transaction rather than the most recent transaction that modified OID 0. That's not supposed to happen, even when you undo. I hope RelStorage didn't do that! Did you or someone on your team change current_object by hand? I can understand why you would, since a simple modification to current_object would be a nice quick fix for the broken upgrade. The fix would not be complete, though, because now the object_state table and the current_object table disagree on the current state of OID 0. According to object_state, even now, the current state of OID 0 still points to the small object graph that was accidentally created on November 17. The pack code relies more on object_state than on current_object, so the pack code sees only a handful of objects that are reachable. Packing with garbage collection removes everything that is not reachable. The current_object table is really just a cache of object_state. If the schema were fully normalized, there would not be a current_object table. In theory, the current_object table makes it possible to load ZODB objects quickly. But if the current_object table results in problems like this, I need to consider alternatives. We've used the attached script to convert a lot of others databases, that are packing successfully, so something special should occurs in this case.The only explanation I can found is that the conversion would be done without unmounting the DirectoryStorage database from its Zope mount point, and that caused the problem. The right way we use to convert databases from DS to RS is, first of all, detach them from Zope, then convert, and then mount the resultant RS, but It's possible that were a human mistake that day. In any case, I believe you can get out of this mess pretty easily. You need to delete the extra object states for OID 0 created on November 17. I tried this in my copy of your database: delete from object_state where zoid = 0 and tid in ( 250499913441768123, 250499913748614178); After that, select count(1) from object_state where zoid = 0; should tell you there is only one state in the database for OID 0. Packing should work fine then. It seemed to do the right thing on my copy, but I don't have your application code to check it. Thank you very much for that information. I really could not be able to found it myself.We will try it as soon as possible and let you know the results Regards -- Santi Camps (Earcon S.L.) http://www.earcon.com http://www.kmkey.com #!/usr/bin/env python ## # # Copyright (c) 2008 Zope Corporation and Contributors. # All Rights Reserved. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. # THIS SOFTWARE IS PROVIDED AS IS AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. # ## ZODB storage
Re: [ZODB-Dev] Relstorage pack problems
Santi Camps wrote: I attach the script we use to do the conversion. Be free to include in Relstorage if you think it's useful and it is well done (as I said, I really don't know much about ZODB, I just mix zodbconvert.py with some DirectoryStorage code) Ah-ha, you removed from zodbconvert.py the code that checks whether the destination already has data. That was unnecessary and a big mistake. By removing that error condition, you made the script blind to the mess it was making. Assuming your bad script caused your problem, it is likely that packing will still mess up your database, since you still probably have mixed-up object_state rows. Don't pack until I've had a chance to look again. Put the error condition back. Here is what it looks like: if storage_has_data(destination): msg = Error: the destination storage has data. Try --clear. sys.exit(msg) It comes just before the call to copyTransactionsFrom(). Also, your current storage_has_data() function simply returns True, which is insane. Here is the correct method: def storage_has_data(storage): i = storage.iterator() try: i[0] except IndexError: return False return True It's possible that function has a bug due to recent changes in the storage iterator API. If it does, I'll fix it. Shane ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Relstorage pack problems
Alan Runyan wrote: On Fri, Jan 23, 2009 at 1:45 PM, Shane Hathaway sh...@hathawaymix.org mailto:sh...@hathawaymix.org wrote: = select zoid from current_object where tid != (select max(tid) from object_state where object_state.zoid = current_object.zoid); zoid -- 7 10 12 11 9 8 (6 rows) Wow. All of that with some SQL. Cool, huh? One caveat: While PostgreSQL executes this query in seconds, MySQL 5.0 would likely mis-optimize the subquery and take hours. :-/ For MySQL I'd have to use join syntax instead. Out of curiosity what would be the process if you were using FileStorage to diagnose the same problem? Do the same thing by iterating over the entire storage and filling in some structures and doing the comparison manually? Probably, though ZODB has some pretty good tools for low-level FileStorage debugging. Sounds like it would take a few hours, with SQL -- it seems pretty straightforward. I think so too, although with FileStorage it would have been possible to guess at the time when the databases were accidentally merged, by looking at the position of the bad transactions in the file. FileStorage also would have reacted differently to the merged databases, though I have no idea how. The sooner this gets into the mainline of ZODB -- the better. Thanks. Still, FileStorage should always be the default. It's simpler and easier to set up than RelStorage. Shane ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Relstorage pack problems
Interesting. RelStorage is already set up to do something like that. It splits packing into two phases. In the first phase, it makes a complete list of every row for the second phase to delete. The second phase could be delayed as long as we want. Between the two phases, if any database connections use objects that have been marked for deletion, we could cancel the pack and flag a software bug. If you want we make any kind of test with our databases, It will be a pleasure It would be very helpful to me if you could provide a copy of your database for me to debug. I'm hoping for a compressed SQL dump. Shane Hi again I already know what happens: either object_ref and object_refs_added tables are completely empty in my database. I can't understand why, but it explains the data loose when packing (nothing references nothing). So, the mystery is not in Relstorage but in the data. The only right ways to solve it is to disable pack-gc or try to fill this tables correctly. Do you know any simple way to do it ? Perhaps an export / import should fill this tables ? Thanks a lot for your help -- Santi Camps (Earcon S.L.) http://www.earcon.com http://www.kmkey.com ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Relstorage pack problems
Santi, I hope you don't mind me discussing your database in public. I'm not going to talk about anything that looks like it could be private. Other RelStorage users might benefit from the analysis. Looking at your database, I see that something bad happened just before transaction 250499913441768123. That number is an encoded time stamp: from ZODB.TimeStamp import TimeStamp from ZODB.utils import p64 str(TimeStamp(p64(250499913441768123))) '2008-11-17 19:36:04.913130' The transaction log entry says initial database creation, which means that the database had no root object (OID 0), so ZODB created one and started a brand new database. Strange! This happened about an hour after a transaction labeled: /asp_ekartek/kmkey_iso/portal_setup/manage_doUpgrades I'm guessing that an upgrade script did something horribly wrong that day. Furthermore, the entry for OID 0 in the current_object table points to an old transaction rather than the most recent transaction that modified OID 0. That's not supposed to happen, even when you undo. I hope RelStorage didn't do that! Did you or someone on your team change current_object by hand? I can understand why you would, since a simple modification to current_object would be a nice quick fix for the broken upgrade. The fix would not be complete, though, because now the object_state table and the current_object table disagree on the current state of OID 0. According to object_state, even now, the current state of OID 0 still points to the small object graph that was accidentally created on November 17. The pack code relies more on object_state than on current_object, so the pack code sees only a handful of objects that are reachable. Packing with garbage collection removes everything that is not reachable. The current_object table is really just a cache of object_state. If the schema were fully normalized, there would not be a current_object table. In theory, the current_object table makes it possible to load ZODB objects quickly. But if the current_object table results in problems like this, I need to consider alternatives. In any case, I believe you can get out of this mess pretty easily. You need to delete the extra object states for OID 0 created on November 17. I tried this in my copy of your database: delete from object_state where zoid = 0 and tid in ( 250499913441768123, 250499913748614178); After that, select count(1) from object_state where zoid = 0; should tell you there is only one state in the database for OID 0. Packing should work fine then. It seemed to do the right thing on my copy, but I don't have your application code to check it. Please let me know whether current_object was edited by hand. If it was, then we have a documentation bug. If RelStorage did that on its own, we may have a software bug. Shane ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Relstorage pack problems
Laurence Rowe wrote: Shane Hathaway wrote: I should note that this KeyError occurs while trying to report on a KeyError. I need to fix that. Fortunately, the same error pops out anyway. There's a fix for this in the Jarn branch. Note that to collect more interesting data it rolls back the load connection at this point, relying on the KeyError to cause the transaction to fail. I saw that, but I didn't feel good about the idea of rolling back the transaction. I wanted the log message to show the state of the database as seen by that connection at that moment in time. That's the kind of information you can't get with a SQL query executed at a later time. I figured that if you really want to see all object states, you can just use SQL queries later. Shane ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Relstorage pack problems
On Jan 19, 2009, at 2:22 PM, Shane Hathaway wrote: A proper fix would be to make the mounted object reachable from the mounted database root. I believe the Zope 2 support for mounted databases does this automatically, but it's possible that Zope 3 takes an unwise shortcut. Zope 3 doesn't support mounting. It simply supports multiple databases. It is up to developers to decide how to use multiple databases. This is a significant potential pitfall. IMO, the best solution long term is to provide a distributed garbage collector that is independent of packing. ZODB 3.9 adds a hook to enable this and I intend to write a distributed garbage collector soon. Jim -- Jim Fulton Zope Corporation ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Relstorage pack problems
On Mon, Jan 19, 2009 at 8:22 PM, Shane Hathaway sh...@hathawaymix.org wrote: Santi Camps wrote: We've been using RelStorage 1.1.c1 in production environments over postgresql 8.1 for some time. It has been working really fine, but yesterday we have a big problem packing a relstorage zodb mounted as a zope mount point. I'm guessing that your mounted object is not attached to the root of the mounted database. In that situation, both RelStorage and FileStorage will pack away practically everything in the mounted database, because from the viewpoint of the mounted database, the disconnected object is unreachable and thus garbage. A quick workaround is to disable garbage collection in zope.conf; then you can pack without losing any objects at all. A proper fix would be to make the mounted object reachable from the mounted database root. I believe the Zope 2 support for mounted databases does this automatically, but it's possible that Zope 3 takes an unwise shortcut. Sorry, I miss to say we are using Zope 2.9.4, and that mounted object is already reachable from the mounted database root (adding a ZODB mount point to the root thought ZMI). Although, the resultant database, after pack, has a big size. So, not all was deleted, but seems the root object not to be found. We are using the same method on a lot of databases and has been working fine for a long time, until this case. I know it's very difficult to say what happened, but I've reported the case because perhaps it's possible to take some measures in order to avoid data loss while packing. I don't know a lot about ZODB, but DirectoryStorage, for instance, don't delete objects immediately, but uses a 'deleted' mark and don't remove them until X days (X configured in the settings). I love this caution of DirectoryStorage, but I like all the other of RelStorage :-) If you want we make any kind of test with our databases, It will be a pleasure Thanks a lot for your help -- Santi Camps (Earcon S.L.) http://www.earcon.com http://www.kmkey.com ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Relstorage pack problems
Santi Camps wrote: Sorry, I miss to say we are using Zope 2.9.4, and that mounted object is already reachable from the mounted database root (adding a ZODB mount point to the root thought ZMI). Ok. Although, the resultant database, after pack, has a big size. So, not all was deleted, but seems the root object not to be found. A side note: PostgreSQL does not actually reduce the size of a database until you vacuum full. Packing a ZODB in PostgreSQL actually temporarily increases the size of the database even if it contains a lot less data. Still, vacuum full is not the right way to control the database size; the right solution is to enable autovacuum and pack on a regular basis. We are using the same method on a lot of databases and has been working fine for a long time, until this case. I know it's very difficult to say what happened, but I've reported the case because perhaps it's possible to take some measures in order to avoid data loss while packing. I don't know a lot about ZODB, but DirectoryStorage, for instance, don't delete objects immediately, but uses a 'deleted' mark and don't remove them until X days (X configured in the settings). I love this caution of DirectoryStorage, but I like all the other of RelStorage :-) Interesting. RelStorage is already set up to do something like that. It splits packing into two phases. In the first phase, it makes a complete list of every row for the second phase to delete. The second phase could be delayed as long as we want. Between the two phases, if any database connections use objects that have been marked for deletion, we could cancel the pack and flag a software bug. If you want we make any kind of test with our databases, It will be a pleasure It would be very helpful to me if you could provide a copy of your database for me to debug. I'm hoping for a compressed SQL dump. Shane ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Relstorage pack problems
Shane Hathaway wrote: I should note that this KeyError occurs while trying to report on a KeyError. I need to fix that. Fortunately, the same error pops out anyway. There's a fix for this in the Jarn branch. Note that to collect more interesting data it rolls back the load connection at this point, relying on the KeyError to cause the transaction to fail. Laurence ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Relstorage pack problems
Santi Camps wrote: We've been using RelStorage 1.1.c1 in production environments over postgresql 8.1 for some time. It has been working really fine, but yesterday we have a big problem packing a relstorage zodb mounted as a zope mount point. I'm guessing that your mounted object is not attached to the root of the mounted database. In that situation, both RelStorage and FileStorage will pack away practically everything in the mounted database, because from the viewpoint of the mounted database, the disconnected object is unreachable and thus garbage. A quick workaround is to disable garbage collection in zope.conf; then you can pack without losing any objects at all. A proper fix would be to make the mounted object reachable from the mounted database root. I believe the Zope 2 support for mounted databases does this automatically, but it's possible that Zope 3 takes an unwise shortcut. But just one second later (22:11:32) a KeyError appears in event.log, and the content of the database couldn't been seen in ZMI. We try to remount it, restart zope, and so on, but nothing works, so we had to restore a backup. That's the traceback of event.log: Traceback (most recent call last): File /usr/local/zope/lib/python/ZODB/Connection.py, line 768, in setstate self._setstate(obj) File /usr/local/zope/lib/python/ZODB/Connection.py, line 804, in _setstate p, serial = self._storage.load(obj._p_oid, self._version) File /usr/lib/python2.4/site-packages/relstorage/relstorage.py, line 311, in load self._log_keyerror(oid_int, no tid found(2)) File /usr/lib/python2.4/site-packages/relstorage/relstorage.py, line 255, in _log_keyerror rows = adapter.iter_object_history(cursor, oid_int) File /usr/lib/python2.4/site-packages/relstorage/adapters/common.py, line 229, in iter_object_history raise KeyError(oid) KeyError: 7L I should note that this KeyError occurs while trying to report on a KeyError. I need to fix that. Fortunately, the same error pops out anyway. Shane ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev