Hi folks,

We experienced a problem this morning with a recovery on 1.6.1 that went something like this:

FileNotFoundException: File does not exist: hdfs:///accumulo/recovery/<uuid>/failed/data

at Tablet.java:1410
at Tablet.java:1233
etc.
at TabletServer:2923

Interestingly enough, hdfs:///accumulo/recovery/<uuid>/failed was a 0-byte file, not a directory, and it was preventing tablets from getting assigned. I am not sure what caused the original failure, but I believe a tserver node was going down: the master indicated it was trying to shut down a tserver, and the node was so bad off that someone just rekicked it.

I looked through the release notes for 1.6.2 through 1.6.5 and didn't see anything related, but I haven't gone through all the tickets yet. I haven't been able to get anyone to upgrade to 1.6.5 yet, and perhaps it's already fixed there.

Just wondering if that's something that has been seen before?

To fix it, I just deleted the "failed" file, and it proceeded.
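
For anyone hitting the same thing, here is a rough sketch of that cleanup, not the exact commands that were run. It assumes the `hdfs` CLI is on the PATH and uses its `-test -d`, `-test -z`, and `-rm` options; the helper names (`hdfs_cmd`, `is_zero_byte_file`, `remove_stray_failed_file`) are made up for illustration. It only removes `failed` when it is a zero-length file rather than a directory, and it defaults to a dry run.

```python
import subprocess


def hdfs_cmd(*args):
    """Build an `hdfs dfs` command line as an argument list (hypothetical helper)."""
    return ["hdfs", "dfs", *args]


def is_zero_byte_file(path, run=subprocess.run):
    """Return True if `path` is not a directory and has zero length."""
    # `hdfs dfs -test -d PATH` exits 0 when PATH is a directory.
    is_dir = run(hdfs_cmd("-test", "-d", path)).returncode == 0
    # `hdfs dfs -test -z PATH` exits 0 when PATH is a zero-length file.
    is_empty = run(hdfs_cmd("-test", "-z", path)).returncode == 0
    return (not is_dir) and is_empty


def remove_stray_failed_file(recovery_dir, run=subprocess.run, dry_run=True):
    """Delete RECOVERY_DIR/failed only if it is a 0-byte file.

    Returns the delete command (as a list) when the file qualifies,
    or None when nothing should be removed. With dry_run=True the
    command is returned but not executed.
    """
    failed = recovery_dir.rstrip("/") + "/failed"
    if not is_zero_byte_file(failed, run):
        return None
    cmd = hdfs_cmd("-rm", failed)
    if not dry_run:
        run(cmd, check=True)
    return cmd
```

Calling `remove_stray_failed_file("/accumulo/recovery/<uuid>")` with the real recovery directory substituted returns the `hdfs dfs -rm` command it would run; pass `dry_run=False` to actually delete the file.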

Thanks!

Andrew
