Hi folks,
We experienced a problem this morning with a recovery on 1.6.1 that went
something like this:
FileNotFoundException: File does not exist:
hdfs:///accumulo/recovery/<uuid>/failed/data
at Tablet.java:1410
at Tablet.java:1233
etc.
at TabletServer:2923
Interestingly enough, hdfs:///accumulo/recovery/<uuid>/failed was a
0-byte file, not a directory, and it was preventing tablets from getting
assigned. I am not sure what caused the original failure, but I believe
a tserver node was going down: the master indicated it was trying to
shut down a tserver, which was so badly off that someone just rekicked
the node.
I looked through the fixes for 1.6.2, 1.6.3, 1.6.4 and 1.6.5 and didn't
see anything related on the release notes pages, but I haven't gone
through all the tickets yet. I haven't been able to get anyone to
upgrade to 1.6.5 yet, and perhaps it's already fixed.
Just wondering if that's something that has been seen before?
To fix it, I just deleted the 0-byte failed file, and it proceeded.
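For anyone who hits the same thing, the check and cleanup can be
sketched roughly as below. This is only a sketch, not the exact commands
that were run: the function name remove_stray_failed_marker is made up
for illustration, and you pass it the full path of the failed entry
under the affected recovery directory. It assumes the stock hdfs dfs
-test and -rm commands are on the PATH, and it refuses to delete
anything unless the path is a zero-length file.

```shell
# Sketch only: the function name and its argument are illustrative.
# Argument: full HDFS path of the "failed" entry in the recovery directory.
remove_stray_failed_marker() {
  marker="$1"

  # "hdfs dfs -test -d" exits 0 when the path is a directory.
  # A directory is not the stray marker described above, so leave it alone.
  if hdfs dfs -test -d "$marker"; then
    echo "directory, leaving alone: $marker"
    return 1
  fi

  # "hdfs dfs -test -z" exits 0 when the path is a zero-length file.
  if hdfs dfs -test -z "$marker"; then
    hdfs dfs -rm "$marker"
    return 0
  fi

  echo "not a zero-length file (or missing), leaving alone: $marker"
  return 1
}
```

Call it as remove_stray_failed_marker followed by the path of the failed
entry. It returns non-zero and deletes nothing if the path turns out to
be a directory or a non-empty file.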
Thanks!
Andrew