I will take a look at all of the above and report back to you. I have Friday off so should have some time to work on it this weekend.

Todd

Vadim Gritsenko wrote:
Todd Byrne wrote:
Well this was an EOF exception so the database file was truncated, is
there an away to recover the non corrupted documents?

I was thinking a bit about it.

It is relatively simple to add a basic sanity check into the filer.open() method to see if file was properly closed or not: length of the file should be exactly header + pageCount * pageSize. So it is easy to catch EOF condition.

In addition to this check, filer probably can write additional byte into the header to indicate clean shutdown. It can write 1 when file is opened, and 0 right before it closed. So next time you open, if you see '1' in there, it means database was not properly shut down last time.

Once you know if file was shut down incorrectly, we could try recover data from it (starting with taking a backup, probably?)

For hash filer, it can walk through main table and collect all non empty records, and recover collision chains. BTree filer also can be traversed. So most of the information can be retrievable. Some, of course, could be lost.

Let me know if you want to implement any of the above.

Vadim

Simple scan through the collection until the error occurs again?

I am going to work on some changes to propagate the IOException into the
other methods.

Todd

Vadim Gritsenko wrote:
Todd Byrne wrote:

In getBTreeNode(long page, BTreeNode parent)(line ) if an exception gets thrown for what ever reason it just gets ignored and null returned. This
seems problematic because nothing that calls this method checks for
null. Worse seems getChildNode(int idx)(line 534) doesn't check and is
called in about 11 places and none of their callers check for null.

My hunch is to let the exception be thrown and then we will get meaning full stack traces instead of NPEs later and have to search the log files later. If an exception was thrown like PageNotFound we could catch it in
the result code and gracefully skip the document.

Thoughts Ideas?

The only exception which should be thrown from this method is
IOException, as far as I can see. And I agree, I think it is better to
get IOException rather then rather pointless NPE.

So for a quick fix, I'd try and change method signature(s) to include
IOException and pass it up instead of eating it.


It brings though a point of how xindice should handle exceptional
situations like this. If it is at all possible, it should recover. E.g.,
it is better to be able to extract some documents from damaged
collection then none at all.


Vadim



Reply via email to