Note that the error is more like this:
Expected protocol id ffffff82 but got 35 (0!;38\\;82,<servername>:9997,
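One possible reading of these messages, offered as a sketch and not a diagnosis: assuming the text comes from a Thrift compact-protocol reader that expects a leading marker byte of 0x82, the odd-looking "ffffff82" is just that byte sign-extended to 32 bits before being printed as hex. The "got" values seen in this thread (35, 0, 6e) would then be whatever byte happened to arrive first. The helper names below are illustrative only and are not part of Accumulo or Thrift.

```python
def signed_byte(raw: int) -> int:
    """Interpret an unsigned byte value (0..255) as a signed 8-bit integer."""
    return raw - 256 if raw > 127 else raw


def int32_hex(value: int) -> str:
    """Format a signed integer as 32-bit two's-complement hex, without padding."""
    return format(value & 0xFFFFFFFF, "x")


# Assumption: 0x82 is the marker byte the reader expects at the start of a message.
EXPECTED_MARKER = 0x82

expected_text = int32_hex(signed_byte(EXPECTED_MARKER))

# The "got ..." values reported in this thread, mapped to ASCII where printable.
observed = {
    value: (chr(value) if 32 <= value < 127 else None)
    for value in (0x35, 0x00, 0x6E)
}

print(expected_text)
print(observed)
```

If that assumption holds, two of the observed bytes are printable ASCII and one is NUL, which may hint that the reader was handed data that is not a compact-protocol message at all.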
On 10/18/2016 10:28 AM, Andrew Hulbert wrote:
So backing up and then later deleting the recovery directories a few
times did the trick. It seemed that removing the initial bad one
caused the others to go through for the most part...
I believe all the WAL files were there. I'll look for the WAL deleted
in the GC logs and see if there's any evidence of that. It is version
1.6.4 by the way. Unfortunately can't send the logs to you here but I
did save them off and I'll talk to Jeff about what we can do.
We are currently getting a new error that I'm going to look into...
Expected protocol id ffffffff82 but got 0
Expected protocol id ffffffff82 but got 6e
Looking into that now! Thanks for the help so far, as usual!
On 10/18/2016 09:46 AM, Michael Wall wrote:
That is what I was going to suggest you try. Where is that "Unable
to find recovery files for extent" log? Any way we can see some logs?
Are all the WALs there? Do you find any of the WALs deleted by GC in
the gc logs? Do you find any duplicate WALs in the HDFS trash?
On Tue, Oct 18, 2016 at 9:32 AM, Andrew Hulbert <ahulb...@ccri.com
For one of the WALs I backed up the recovery directory and that
initiated a new recovery attempt as indicated in the tserver log.
Then the exception was thrown:
Unable to find recovery files for extent xxxxxx logentry xxxxx
Any ideas? I figure we can zero out the WAL and it will go on
with life but it would be nice to try and get the data!
On 10/18/2016 08:55 AM, Jeff Kubina wrote:
On Tue, Oct 18, 2016 at 6:32 AM, Michael Wall <mjw...@gmail.com
Take a look at the master logs for where the WAL was sorted
to the /accumulo/recovery/... directory. Then look to see
if those WALs are still around and contain content.
Checked one of them, yes it is around with content.
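A rough sketch of the "still around and contain content" check, assuming the output of `hdfs dfs -ls -R` for the recovery directory has been saved as text. The sample listing, the EXAMPLE-ID directory, and the file names are hypothetical and only illustrate the column layout; the function names are likewise made up for this example.

```python
from typing import List, Tuple

# Hypothetical listing text; real paths and sizes will differ.
SAMPLE_LISTING = """\
Found 3 items
drwxr-xr-x   - accumulo supergroup          0 2016-10-18 06:10 /accumulo/recovery/EXAMPLE-ID
-rw-r--r--   3 accumulo supergroup    1048576 2016-10-18 06:10 /accumulo/recovery/EXAMPLE-ID/example-a
-rw-r--r--   3 accumulo supergroup          0 2016-10-18 06:10 /accumulo/recovery/EXAMPLE-ID/example-b
"""


def parse_listing(text: str) -> List[Tuple[str, int]]:
    """Return (path, size_in_bytes) for each regular file in the listing."""
    files = []
    for line in text.splitlines():
        # Columns: permissions, replication, owner, group, size, date, time, path.
        fields = line.split(None, 7)
        # Skip the "Found N items" header, blank lines, and directory entries.
        if len(fields) != 8 or not fields[0].startswith("-"):
            continue
        files.append((fields[7], int(fields[4])))
    return files


def empty_files(text: str) -> List[str]:
    """Return the paths of regular files whose reported size is zero bytes."""
    return [path for path, size in parse_listing(text) if size == 0]


print(empty_files(SAMPLE_LISTING))
```

A zero-length file is not necessarily a problem, since some files may be empty by design, so the result is only a prompt for a closer look.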
Where is this EOF exception, on a tserver?
Yes, the tserver.
Is the master log complaining about anything?
It keeps repeating a message similar to the tserver's, and also reports
that tablet assignment failed for the tserver.
The tservers are not balancing because of all this.