Re: [ZODB-Dev] Recovering from BTree corruption
On Sep 12, 2007, at 10:28 AM, Jim Fulton wrote:

> ...
>>>> - checkbtrees.py
>>>> - fstest.py
>>>
>>> There's an fsrefs script that checks internal references I believe.
>>
>> fsrefs.py shows loads of problems in both the data.fs and the
>> resources.fs, probably > 200 entries per database, e.g.:
>>
>> oid 0xD87110L BTrees._OOBTree.OOBucket
>> last updated: 2007-09-04 14:43:37.687332, tid=0x37020D3A0CC9DCCL
>> refers to invalid objects:
>> oid ('\x00\x00\x00\x00\x00\xb0+f', None) missing: ''
>> oid ('\x00\x00\x00\x00\x00\xb0N\xbc', None) missing: ''
>> oid ('\x00\x00\x00\x00\x00\xb0N\xbd', None) missing: ''
>> oid ('\x00\x00\x00\x00\x00\xd7\xb1\xa0', None) missing: ''
>> oid ('\x00\x00\x00\x00\x00\xc5\xe8:', None) missing: ''
>> oid ('\x00\x00\x00\x00\x00\xc3\xc6l', None) missing: ''
>> oid ('\x00\x00\x00\x00\x00\xc3\xc6m', None) missing: ''
>> oid ('\x00\x00\x00\x00\x00\xcahC', None) missing: ''
>> oid ('\x00\x00\x00\x00\x00\xaf\x07\xc1', None) missing: ''
> ...
>> - How do I tell if something is a reference to another database?
>
> I don't know how to do this with fsrefs. I'm not 100% sure that
> fsrefs recognizes cross-database references.

I did a little looking at fsrefs. It doesn't analyze the types of
references; it just tries to load objects. This approach, aside from
being less informative than it should be, totally fails with multiple
databases: cross-database references will always be reported as
"missing" by fsrefs.

> I'll try to make some time in the next few days to look at this issue.

Man, it's hard to make time ...

> I'll look at fsrefs a bit more closely to:
>
> - make sure it understands cross-database references, and

It doesn't.

> - make sure it reports whether missing references are local or remote.

Haha ;)

> I'd like to decide what to do next based on this investigation. In
> particular, I want to be sure whether the problems you are having are
> actually due to cross-database reference issues.
>
> I'll also look at writing a tool that might be able to recover lost
> objects from backup databases. The idea is that a tool would scan a
> database for missing oids and save the list to files, separating
> references to different databases. Then there'd be another tool that
> would read this list and a list of old database files, scan the files
> looking for oids in the list, and extract records if they are found.

I spent some time on an analysis tool. See:

  http://svn.zope.org/zc.fsutil/branches/dev/

and especially:

  http://svn.zope.org/zc.fsutil/branches/dev/src/zc/fsutil/references.txt?view=auto

It will help you figure out if you have holes and separate
cross-database and local references. You may have to work a little,
though. The data structures produced will allow you to analyze broken
cross-database references in a way that should be fairly obvious.
(Hint: you'll have to generate data for each database and make sure
that all of the oids mentioned in the set of cross-database references
are actually present in the named databases.)

A major challenge is handling large databases. We have databases with
millions of objects, and I kept having to trim the amount of data
analyzed to fit the data structures in memory. It is interesting to
look at the evolution of the data structures over the last couple of
days as I tried to cope with scale. The obvious next step is to store
data in a database rather than in memory. This will slow things down,
but will allow me to work with arbitrarily large databases and keep
richer data structures.

Assuming that you still care about this (you've been quiet :), I
suggest using this tool to find the holes. (You can also use it to
find the objects that refer to the missing objects.) Then, once you've
found the missing oids, you should go to the backups, open file
storages on them and, if the oids are present, copy the pickles to the
database under repair. Something like:

    import transaction
    from ZODB.utils import z64  # 8 zero bytes: "no previous revision"

    # Read the current pickle for each missing oid from the backup.
    pickles = [backup_storage.load(oid, '')[0] for oid in oids]

    # Write them into the damaged storage in one two-phase commit.
    t = transaction.begin()
    s = database_with_hole  # the storage under repair
    s.tpc_begin(t)
    for oid, p in zip(oids, pickles):
        s.store(oid, z64, p, '', t)
    s.tpc_vote(t)
    s.tpc_finish(t)

If you don't have the data in backups, then you might be able to use
information about the objects referring to the missing objects to
repair the referring objects by hand, deleting the references to the
missing objects.

Hope this helps.

Jim

--
Jim Fulton
Zope Corporation

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list - ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev
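Jim's point that fsrefs never looks at the *type* of a reference can be made concrete. The following is a minimal sketch, not fsrefs or zc.fsutil code, that sorts the persistent references in one data record into local and cross-database ones, then applies Jim's hint of checking foreign oids against the named databases. It assumes (1) a data record is two back-to-back pickles, class metadata then state, and (2) persistent ids take the shapes described in the ZODB.serialize module docstring: a bare oid, an (oid, class_meta) tuple, or an extended [kind, args] list where 'm' and 'n' mark multi-database references and 'w' a weak one. The names classify_refs and dangling_foreign are hypothetical.

```python
import io
import pickle


class _Stub:
    """Stand-in for every class named in a record, so nothing is imported."""

    def __init__(self, *args, **kwargs):
        pass

    def __setstate__(self, state):
        pass


class _RefCollector(pickle.Unpickler):
    """Unpickler that records persistent ids instead of resolving them."""

    def __init__(self, file, refs):
        # encoding='bytes' keeps Python 2-era str oids as bytes.
        super().__init__(file, encoding='bytes')
        self.refs = refs

    def persistent_load(self, pid):
        self.refs.append(pid)
        return None  # only the reference matters, not the target object

    def find_class(self, module, name):
        return _Stub


def classify_refs(record):
    """Return (local_oids, foreign_refs) for one data record.

    ASSUMPTION: the record is two pickles (class metadata, then state)
    and persistent ids use the shapes from the ZODB.serialize docstring.
    foreign_refs is a list of (database_name, oid) pairs.
    """
    refs = []
    stream = io.BytesIO(record)
    for _ in range(2):  # class-metadata pickle, then state pickle
        _RefCollector(stream, refs).load()

    local, foreign = [], []
    for pid in refs:
        if isinstance(pid, bytes):        # bare oid: local reference
            local.append(pid)
        elif isinstance(pid, tuple):      # (oid, class metadata): local
            local.append(pid[0])
        elif isinstance(pid, list):       # [kind, args]: extended reference
            kind, args = pid
            if isinstance(kind, bytes):   # Python 2 pickles give bytes here
                kind = kind.decode('ascii')
            if kind in ('m', 'n'):        # multi-database: (db_name, oid, ...)
                foreign.append((args[0], args[1]))
            # 'w' (weak) references are skipped: they may dangle by design.
    return local, foreign


def dangling_foreign(foreign, present):
    """Foreign references whose oid is absent from the named database.

    `present` maps database name -> set of oids that exist there (for
    example, collected by scanning each database separately).
    """
    return [(db, oid) for db, oid in foreign
            if oid not in present.get(db, set())]
```

Run over every record of each database, the local lists give the candidates for ordinary holes, while dangling_foreign answers the cross-database half of the question separately, which is exactly the distinction fsrefs cannot make.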
Re: [ZODB-Dev] Recovering from BTree corruption
On Sep 11, 2007, at 10:27 AM, Alan Runyan wrote:

>> And, as you said in another note, the BTree folder actually lives in
>> the resources database.
>
> Correct, the BTree is in /plone/resources/files to be exact.
>
>> Cross-database references are inherently weak. A reference from a
>> foreign database doesn't prevent an object from being treated as
>> garbage. So, if the only reference to an object is from a foreign
>> database, then the object is considered garbage. It doesn't sound
>> like this is what's affecting you. The cross-database reference is
>> to the BTree. It sounds like the internal references are within the
>> database.
>
> Well. Someone could have 'copy/pasted' a file from the content
> database into the resources/files database. That could have been one
> issue.

:( BTW, I assume you mean cut/paste, aka move.

>>> - checkbtrees.py
>>> - fstest.py
>>
>> There's an fsrefs script that checks internal references I believe.
>
> fsrefs.py shows loads of problems in both the data.fs and the
> resources.fs, probably > 200 entries per database, e.g.:
>
> oid 0xD87110L BTrees._OOBTree.OOBucket
> last updated: 2007-09-04 14:43:37.687332, tid=0x37020D3A0CC9DCCL
> refers to invalid objects:
> oid ('\x00\x00\x00\x00\x00\xb0+f', None) missing: ''
> oid ('\x00\x00\x00\x00\x00\xb0N\xbc', None) missing: ''
> oid ('\x00\x00\x00\x00\x00\xb0N\xbd', None) missing: ''
> oid ('\x00\x00\x00\x00\x00\xd7\xb1\xa0', None) missing: ''
> oid ('\x00\x00\x00\x00\x00\xc5\xe8:', None) missing: ''
> oid ('\x00\x00\x00\x00\x00\xc3\xc6l', None) missing: ''
> oid ('\x00\x00\x00\x00\x00\xc3\xc6m', None) missing: ''
> oid ('\x00\x00\x00\x00\x00\xcahC', None) missing: ''
> oid ('\x00\x00\x00\x00\x00\xaf\x07\xc1', None) missing: ''

Interesting. I wonder if these are actually cross-database references.

> My questions are:
>
> - I imagine if there are 'invalid' references this is considered
>   "corruption" or "inconsistency"?

I consider this inconsistency. The file structure is intact, but the
data isn't what it should be. Not that it matters to the end user what
we call it.

> - How do I tell if something is a reference to another database?

I don't know how to do this with fsrefs. I'm not 100% sure that fsrefs
recognizes cross-database references.

> - Having these invalid references, is this common to ZODB
>   applications?

No.

>> Possibly, there's a backup that has data records for the missing OIDs.
>
> Going to ask hosting company to pull up backups for the past few
> weeks. But how am I going to find this, other than by "seeing if the
> folder allows me to iterate over the items" without throwing
> POSKeyError? Does that sound like a decent litmus test?

Well, there's also fsrefs.

I'll try to make some time in the next few days to look at this issue.
I'll look at fsrefs a bit more closely to:

- make sure it understands cross-database references, and
- make sure it reports whether missing references are local or remote.

I'd like to decide what to do next based on this investigation. In
particular, I want to be sure whether the problems you are having are
actually due to cross-database reference issues.

I'll also look at writing a tool that might be able to recover lost
objects from backup databases. The idea is that a tool would scan a
database for missing oids and save the list to files, separating
references to different databases. Then there'd be another tool that
would read this list and a list of old database files, scan the files
looking for oids in the list, and extract records if they are found.

I do suspect we need to do something about cross-database references.
My long-term plan is to:

- add an option to file storages to skip garbage collection when
  packing, and
- add a multi-database garbage-collection protocol and tool.

In the short term, it might be good to have a mechanism for limiting
which objects can have cross-database references to them, to limit the
chance of inadvertent cross-database references via move. This would
need to be fleshed out, though, which takes time. Perhaps something can
be done at the Zope or Plone level, in the code for moving objects, to
make sure that objects aren't moved between databases.

Jim

--
Jim Fulton, CTO, Zope Corporation
(540) 361-1714
http://www.zope.com  http://www.zope.org
Python Powered!  http://www.python.org
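Jim's short-term idea of stopping moves between databases could start as a check like the one below. This is a hypothetical sketch: the names CrossDatabaseMoveError, database_name and check_same_database are made up, and wiring the check into Zope's or Plone's move/paste code is not shown. It relies only on a stored persistent object's _p_jar being its ZODB Connection, Connection.db() returning the DB, and DB.database_name holding the name the database was opened with; it assumes each mounted database was given a distinct name.

```python
class CrossDatabaseMoveError(Exception):
    """Raised when a move would create a cross-database reference."""


def database_name(obj):
    """Name of the database `obj` is stored in, or None if not stored yet.

    Uses obj._p_jar (the ZODB Connection) and Connection.db().database_name.
    """
    jar = getattr(obj, '_p_jar', None)
    if jar is None:
        return None  # new object: it will land in the target's database
    return jar.db().database_name


def check_same_database(obj, target_container):
    """Refuse to move `obj` into a container that lives in another database."""
    source = database_name(obj)
    target = database_name(target_container)
    if source is not None and target is not None and source != target:
        raise CrossDatabaseMoveError(
            'refusing to move %r from database %r to %r'
            % (obj, source, target))
```

A move handler would call check_same_database(obj, new_parent) before relinking; a copy, which creates a fresh object in the target database, would not need the check.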
Re: [ZODB-Dev] Recovering from BTree corruption
Alan Runyan wrote at 2007-9-11 09:27 -0500:
> ...
>oid 0xD87110L BTrees._OOBTree.OOBucket
>last updated: 2007-09-04 14:43:37.687332, tid=0x37020D3A0CC9DCCL
>refers to invalid objects:
>oid ('\x00\x00\x00\x00\x00\xb0+f', None) missing: ''
>oid ('\x00\x00\x00\x00\x00\xb0N\xbc', None) missing: ''
>oid ('\x00\x00\x00\x00\x00\xb0N\xbd', None) missing: ''
>oid ('\x00\x00\x00\x00\x00\xd7\xb1\xa0', None) missing: ''
>oid ('\x00\x00\x00\x00\x00\xc5\xe8:', None) missing: ''
>oid ('\x00\x00\x00\x00\x00\xc3\xc6l', None) missing: ''
>oid ('\x00\x00\x00\x00\x00\xc3\xc6m', None) missing: ''
>oid ('\x00\x00\x00\x00\x00\xcahC', None) missing: ''
>oid ('\x00\x00\x00\x00\x00\xaf\x07\xc1', None) missing: ''

Looks as if the "OOBucket" has lost quite a few value links (as only a
single one links to the next bucket).

>My questions are:
>
> - I imagine if there are 'invalid' references this is considered "corruption"
>   or "inconsistency"?

It depends on your preferences.

> ...
> - Having these invalid references, is this common to ZODB applications?

No. At least not for ZODB applications that do not use inter-database
references.

>> Possibly, there's a backup that has data records for the missing OIDs.
>
>Going to ask hosting company to pull up backups for the past few weeks.
>But how am I going to find this, other than by "seeing if the folder allows
>me to iterate over the items" without throwing POSKeyError? Does that sound
>like a decent litmus test?

You can also run "fsrefs" on it. When you do not get "missing ...",
then the backup does not have your POSKeyError (but may lack quite a
few newer modifications).

--
Dieter
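Dieter's fsrefs run tells you whether a backup is free of the hole; the second tool Jim proposes goes further and pulls the missing records out of older files. A minimal sketch of that lookup, assuming each backup has already been opened as a storage object (for instance with ZODB.FileStorage.FileStorage(path, read_only=True)) whose load(oid, version) returns (pickle, serial) and raises POSKeyError, a KeyError subclass, for an absent oid, as ZODB 3's FileStorage does. find_in_backups is a hypothetical name.

```python
def find_in_backups(missing_oids, backups):
    """Look up each missing oid in a list of backup storages.

    `backups` is a list of (label, storage) pairs, newest first. Each
    storage is assumed to offer load(oid, version) -> (pickle, serial)
    and to raise a KeyError subclass (POSKeyError) for an absent oid.

    Returns (found, still_missing): found maps oid -> (label, pickle).
    """
    found = {}
    for oid in missing_oids:
        for label, storage in backups:
            try:
                pickle_data, _serial = storage.load(oid, '')
            except KeyError:
                continue  # not in this backup; try an older one
            found[oid] = (label, pickle_data)
            break  # the newest backup holding the oid wins
    still_missing = [oid for oid in missing_oids if oid not in found]
    return found, still_missing
```

Preferring the newest backup keeps the recovered records as close as possible to the state just before the damage; the pickles in `found` can then be written into the damaged storage with the tpc_begin/store/tpc_vote/tpc_finish sequence from Jim's Sep 12 message, and `still_missing` lists the references that will have to be repaired by hand.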
Re: [ZODB-Dev] Recovering from BTree corruption
> And, as you said in another note, the BTree folder actually lives in
> the resources database.

Correct, the BTree is in /plone/resources/files to be exact.

> Cross-database references are inherently weak. A reference from a
> foreign database doesn't prevent an object from being treated as
> garbage. So, if the only reference to an object is from a foreign
> database, then the object is considered garbage. It doesn't sound
> like this is what's affecting you. The cross-database reference is
> to the BTree. It sounds like the internal references are within the
> database.

Well. Someone could have 'copy/pasted' a file from the content database
into the resources/files database. That could have been one issue.

> > - checkbtrees.py
> > - fstest.py
>
> There's an fsrefs script that checks internal references I believe.

fsrefs.py shows loads of problems in both the data.fs and the
resources.fs, probably > 200 entries per database, e.g.:

oid 0xD87110L BTrees._OOBTree.OOBucket
last updated: 2007-09-04 14:43:37.687332, tid=0x37020D3A0CC9DCCL
refers to invalid objects:
oid ('\x00\x00\x00\x00\x00\xb0+f', None) missing: ''
oid ('\x00\x00\x00\x00\x00\xb0N\xbc', None) missing: ''
oid ('\x00\x00\x00\x00\x00\xb0N\xbd', None) missing: ''
oid ('\x00\x00\x00\x00\x00\xd7\xb1\xa0', None) missing: ''
oid ('\x00\x00\x00\x00\x00\xc5\xe8:', None) missing: ''
oid ('\x00\x00\x00\x00\x00\xc3\xc6l', None) missing: ''
oid ('\x00\x00\x00\x00\x00\xc3\xc6m', None) missing: ''
oid ('\x00\x00\x00\x00\x00\xcahC', None) missing: ''
oid ('\x00\x00\x00\x00\x00\xaf\x07\xc1', None) missing: ''

My questions are:

- I imagine if there are 'invalid' references this is considered
  "corruption" or "inconsistency"?
- How do I tell if something is a reference to another database?
- Having these invalid references, is this common to ZODB applications?

> Possibly, there's a backup that has data records for the missing OIDs.

Going to ask hosting company to pull up backups for the past few weeks.
But how am I going to find this, other than by "seeing if the folder
allows me to iterate over the items" without throwing POSKeyError? Does
that sound like a decent litmus test?

--
Alan Runyan
Enfold Systems, Inc.
http://www.enfoldsystems.com/
phone: +1.713.942.2377x111
fax: +1.832.201.8856
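Alan's litmus test (can the folder's items be iterated without a POSKeyError?) becomes more informative if it records how far iteration gets; the last key reached is also the "LK" that Dieter's second recovery approach starts from. A minimal sketch, assuming the tree or folder mapping supports items(); probe_iteration is a hypothetical helper. Since persistent values may come back as ghosts, it calls their _p_activate() so that a missing value record also surfaces; for a folder of very big files that means loading each file object's own record. POSKeyError subclasses KeyError, so catching KeyError covers it.

```python
def probe_iteration(mapping):
    """Walk mapping.items() until the first load failure.

    Returns (count, last_good_key, error): `count` items were read
    successfully, `last_good_key` is the last key fully read before the
    failure (None if the very first item failed), and `error` is the
    exception, or None if the walk completed.
    """
    count, last_key = 0, None
    try:
        for key, value in mapping.items():
            activate = getattr(value, '_p_activate', None)
            if activate is not None:
                activate()  # load a ghost so a missing record shows up here
            count += 1
            last_key = key
    except KeyError as exc:  # POSKeyError is a KeyError subclass
        return count, last_key, exc
    return count, last_key, None
```

Run against a candidate backup, a result with `error` set to None is the "passes the litmus test" case; otherwise `last_good_key` says where the damage begins.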
Re: [ZODB-Dev] Recovering from BTree corruption
On Sep 10, 2007, at 10:34 AM, Alan Runyan wrote:

> Hi guys.
>
> It seems that one of our customers has a corrupted BTree. I would love
> for someone to provide some insight on how we can recover the data.
>
> We have two databases: 1 for resources and 1 for 'content'. Resources
> contain lots of very big files. The system is configured to have a
> mount point at /plone/resources; the folder is a subclass of
> BTreeFolder, using an OOBTree as its internal data structure.

And, as you said in another note, the BTree folder actually lives in
the resources database.

> Anytime I iterate over the keys I get POSKeyError. Anytime I iterate
> over the values, the same. If I run BTree.check() on the data
> structure's tree attribute (the OOBTree itself) I get a POSKeyError.
> Running utils.checkbtrees doesn't say this BTree has a problem.
>
> While debugging this I had a conversation with sidnei about mounted
> databases. He recalled that if you're using a mounted database you
> should not pack. If for some reason your mounted database had a cross
> reference to another database and somehow you had a dangling reference
> to the other database, it would cause POSKeyError.

Cross-database references are inherently weak. A reference from a
foreign database doesn't prevent an object from being treated as
garbage. So, if the only reference to an object is from a foreign
database, then the object is considered garbage. It doesn't sound like
this is what's affecting you. The cross-database reference is to the
BTree. It sounds like the internal references are within the database.

> Are there any other ways of "testing consistency" of FileStorage other
> than:
>
> - checkbtrees.py
> - fstest.py

There's an fsrefs script that checks internal references I believe.

> And any ideas how I can salvage the data? This BTree, of course, had
> the most valuable data.

Possibly, there's a backup that has data records for the missing OIDs.

Jim
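Jim's point that cross-database references are weak, and Sidnei's advice not to pack, both follow from how packing decides what is garbage: it only follows references inside the database being packed. The toy model below (plain Python, not FileStorage's pack code) computes which oids a per-database pack would keep. An object referenced only from another database is unreachable from this database's root and is dropped, after which the foreign reference dangles and raises POSKeyError when followed. pack_survivors and the sample oids are illustrative.

```python
def pack_survivors(root_oid, local_refs):
    """Oids a per-database pack would keep: those reachable from the root.

    `local_refs` maps oid -> list of oids it references *in the same
    database*. References arriving from other databases are not in this
    graph at all, which is why they cannot keep anything alive.
    """
    seen = {root_oid}
    stack = [root_oid]
    while stack:  # iterative depth-first traversal
        oid = stack.pop()
        for child in local_refs.get(oid, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# The resources database: 'orphan' is referenced only from the content
# database (e.g. after a move), so nothing here points at it.
resources = {
    'root': ['files'],
    'files': ['bucket1', 'bucket2'],
    'bucket1': [], 'bucket2': [],
    'orphan': ['orphan_child'], 'orphan_child': [],
}
kept = pack_survivors('root', resources)
garbage = set(resources) - kept  # 'orphan' and 'orphan_child' are dropped
```

Jim's long-term plan of a multi-database garbage-collection protocol amounts to running this traversal over the union of all databases' graphs, with cross-database edges included.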
Re: [ZODB-Dev] Recovering from BTree corruption
Alan Runyan wrote at 2007-9-10 09:34 -0500:
> ...
>While debugging this I had a conversation with sidnei about mounted
>databases. He recalled that if you're using a mounted database you
>should not pack. If for some reason your mounted database had a cross
>reference to another database and somehow you had a dangling reference
>to the other database, it would cause POSKeyError.

BTrees are actually directed acyclic graphs (DAGs) with two node types:
"tree" (internal node) and "bucket" (leaf). Besides its children, a
"tree" contains a link to its leftmost bucket. Besides its keys/values,
a "bucket" contains a link to the next "bucket".

When you iterate over "keys" or "values", the leftmost bucket is
accessed via the root's leftmost-bucket link, and then all buckets are
visited via the "next bucket" links.

Your description seems to indicate that you have lost a "next bucket"
link.

If you are lucky, then the tree access structure (the children links
of the "tree" nodes) is still intact -- or, if not, at least partially
intact. Then you will be able to recover large parts of your tree.

You have two options:

 * Reconstruct the tree from its pickles. This is how BTree checking
   works.

 * Determine the last key ("LK") before you get the "POSKeyError";
   then use the tree structure to access the next available key. You
   may need to try ever larger values above "LK" to skip a potentially
   damaged part of the tree.

I would start with the second approach and switch to the first one
when it becomes too tedious.

--
Dieter
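Dieter's description can be illustrated with a toy model (dicts, not the real BTrees pickle format or C code). Iteration follows the leftmost-bucket link and then the "next bucket" links, so one lost bucket stops it; walking the "tree" nodes' children instead skips the lost node and keeps going, which is the sense in which an intact access structure lets you recover most of the data. ToyStorage, iterate_via_next_links and salvage_via_children are hypothetical names. With real BTrees, Dieter's second approach would instead probe with key-range calls such as tree.minKey(k) (smallest key >= k) or tree.keys(min=k), trying larger k until one succeeds.

```python
class ToyStorage:
    """oid -> record; loading a missing oid raises KeyError (like POSKeyError)."""

    def __init__(self, records):
        self.records = records

    def load(self, oid):
        return self.records[oid]


def iterate_via_next_links(storage, tree_oid):
    """How keys()/values() iterate: leftmost bucket, then 'next' links."""
    oid = storage.load(tree_oid)['first']
    while oid is not None:
        bucket = storage.load(oid)  # KeyError here = lost next-bucket link
        yield from bucket['items']
        oid = bucket['next']


def salvage_via_children(storage, oid, recovered=None, lost=None):
    """Walk the tree's child links (not 'next' links), skipping lost nodes.

    Returns (recovered_items, lost_oids).
    """
    if recovered is None:
        recovered, lost = [], []
    try:
        node = storage.load(oid)
    except KeyError:
        lost.append(oid)  # this subtree is gone; carry on with its siblings
        return recovered, lost
    if node['kind'] == 'bucket':
        recovered.extend(node['items'])
    else:  # internal 'tree' node: descend into every child in key order
        for child in node['children']:
            salvage_via_children(storage, child, recovered, lost)
    return recovered, lost


records = {
    'root': {'kind': 'tree', 'first': 'b1', 'children': ['b1', 'b2', 'b3']},
    'b1': {'kind': 'bucket', 'items': [('a', 1), ('b', 2)], 'next': 'b2'},
    # 'b2' is deliberately absent: its record was lost.
    'b3': {'kind': 'bucket', 'items': [('e', 5)], 'next': None},
}
storage = ToyStorage(records)
```

Iterating this toy tree yields ('a', 1) and ('b', 2) and then fails on 'b2', much as Alan's keys() and values() did; salvage_via_children also returns ('e', 5) and reports 'b2' as the lost node.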
Re: [ZODB-Dev] Recovering from BTree corruption
On Sep 10, 2007, at 10:34 AM, Alan Runyan wrote:

> Hi guys.
>
> It seems that one of our customers has a corrupted BTree. I would love
> for someone to provide some insight on how we can recover the data.
>
> We have two databases: 1 for resources and 1 for 'content'. Resources
> contain lots of very big files. The system is configured to have a
> mount point at /plone/resources; the folder is a subclass of
> BTreeFolder, using an OOBTree as its internal data structure.

Does the BTree folder live in the content database or the resources
database?

Jim
[ZODB-Dev] Recovering from BTree corruption
Hi guys.

It seems that one of our customers has a corrupted BTree. I would love
for someone to provide some insight on how we can recover the data.

We have two databases: 1 for resources and 1 for 'content'. Resources
contain lots of very big files. The system is configured to have a
mount point at /plone/resources; the folder is a subclass of
BTreeFolder, using an OOBTree as its internal data structure.

Anytime I iterate over the keys I get POSKeyError. Anytime I iterate
over the values, the same. If I run BTree.check() on the data
structure's tree attribute (the OOBTree itself) I get a POSKeyError.
Running utils.checkbtrees doesn't say this BTree has a problem.

While debugging this I had a conversation with sidnei about mounted
databases. He recalled that if you're using a mounted database you
should not pack. If for some reason your mounted database had a cross
reference to another database and somehow you had a dangling reference
to the other database, it would cause POSKeyError.

Are there any other ways of "testing consistency" of FileStorage other
than:

- checkbtrees.py
- fstest.py

And any ideas how I can salvage the data? This BTree, of course, had
the most valuable data.

cheers

--
Alan Runyan