Andrew,

That is what I was going to suggest you try. Where is that "Unable to find recovery files for extent" log? Any way we can see some actual logs?
Are all the WALs there? Do you find any of the WALs deleted by GC in the gc logs? Do you find any duplicate WALs in the HDFS trash?

On Tue, Oct 18, 2016 at 9:32 AM, Andrew Hulbert <ahulb...@ccri.com> wrote:

> Mike,
>
> For one of the WALs I backed up the recovery directory and that initiated
> a new recovery attempt as indicated in the tserver debug log...
>
> Then the exception was thrown:
>
> Unable to find recovery files for extent xxxxxx logentry xxxxx
> hdfs://path/to/wal/yyyy
>
> Any ideas? I figure we can zero out the WAL and it will go on with life
> but it would be nice to try and get the data!
>
> Thanks!
>
> On 10/18/2016 08:55 AM, Jeff Kubina wrote:
>
> On Tue, Oct 18, 2016 at 6:32 AM, Michael Wall <mjw...@gmail.com> wrote:
>
>> Take a look at the master logs for where the WAL was sorted to the
>> /accumulo/recovery/... directory. Then look to see if those WALs are
>> still around and contain content.
>
> Checked one of them, yes it is around with content.
>
>> Where is this EOF exception, on a tserver?
>
> Yes, the tserver.
>
>> Is the master log complaining about anything?
>
> Repeating a message similar to the tserver but also that the tablet
> assignment failed for the tserver.
>
> tservers are not balancing because of all this.