index not a SequenceFile.

Andrew Hulbert Tue, 18 Oct 2016 07:28:32 -0700

Mike,

So backing up and then later deleting the recovery directories a few times did the trick. It seemed that removing the initial bad one caused the others to go through for the most part...

I believe all the WAL files were there. I'll look for the WAL deleted in the GC logs and see if there's any evidence of that. It is version 1.6.4, by the way. Unfortunately I can't send the logs to you here, but I did save them off and I'll talk to Jeff about what we can do.


We are currently getting a new error that I'm going to look into...

Expected protocol id ffffffff82 but got 0

Expected protocol id ffffffff82 but got 6e

etc.

Looking into that now! Thanks for the help so far, as usual!

Andrew

On 10/18/2016 09:46 AM, Michael Wall wrote:

Andrew,

That is what I was going to suggest you try. Where is that "Unable to find recovery files for extent" log? Any way we can see some actual logs?

Are all the WALs there? Do you find any of the WALs deleted by GC in the gc logs? Do you find any duplicate WALs in the HDFS trash?

On Tue, Oct 18, 2016 at 9:32 AM, Andrew Hulbert <[email protected]> wrote:


    Mike,

    For one of the WALs I backed up the recovery directory and that
    initiated a new recovery attempt as indicated in the tserver debug
    log...

    Then the exception was thrown:

    Unable to find recovery files for extent xxxxxx logentry xxxxx
    hdfs://path/to/wal/yyyy

    Any ideas? I figure we can zero out the WAL and it will go on with
    life but it would be nice to try and get the data!

    Thanks!


    On 10/18/2016 08:55 AM, Jeff Kubina wrote:


    On Tue, Oct 18, 2016 at 6:32 AM, Michael Wall <[email protected]> wrote:

        Take a look at the master logs for where the WAL was sorted
        to the /accumulo/recovery/... directory.  Then look to see if
        those WALs are still around and contain content.


    Checked one of them, yes it is around with content.

        Where is this EOF exception, on a tserver?


    Yes, the tserver.

        Is the master log complaining about anything?


    Repeating a message similar to the tserver but also that the
    tablet assignment failed for the tserver.

    tservers are not balancing because of all this.

Re: java.IO.EOFException: ..../accumulo/recovery/.../part-r-00000/index not a SequenceFile.
