Thanks, Josh. As this is our stage cluster, we aren't too worried about the missing data; I just want to clean up the metadata so the queue looks better. I'll take the back-fill approach and see how that goes.
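
For reference, the first step of the back-fill (finding `order` entries that have no matching `repl` entry) can be sketched roughly as below. This is only an illustration using plain Python data, not the Accumulo client API: `OrderEntry`, `find_missing_repl`, and the sample WAL paths are hypothetical names, and the assumption that an `order` entry and a `repl` entry can be matched on (WAL path, table id) would need to be checked against the actual replication table schema.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class OrderEntry(NamedTuple):
    """Hypothetical stand-in for one entry read from the `order` section."""
    wal_path: str  # WAL file the entry refers to
    table_id: str  # source table the WAL belongs to


def find_missing_repl(
    order_entries: Iterable[OrderEntry],
    repl_keys: Set[Tuple[str, str]],
) -> List[OrderEntry]:
    """Return the `order` entries with no (wal_path, table_id) match in `repl`."""
    return [
        entry
        for entry in order_entries
        if (entry.wal_path, entry.table_id) not in repl_keys
    ]


# Made-up sample data standing in for a scan of the replication table.
order = [
    OrderEntry("hdfs://nn/accumulo/wal/a", "1"),
    OrderEntry("hdfs://nn/accumulo/wal/b", "1"),
    OrderEntry("hdfs://nn/accumulo/wal/c", "2"),
]
repl = {("hdfs://nn/accumulo/wal/a", "1")}

# These are the entries a back-fill tool would need to write `repl` records for.
missing = find_missing_repl(order, repl)
for entry in missing:
    print(entry.wal_path, entry.table_id)
```

What to write into the back-filled `repl` entries is deliberately left out, since that depends on the schema and on how the missing WALs should be treated.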
--Adam

On Mon, Jul 24, 2017 at 1:55 PM, Josh Elser <josh.el...@gmail.com> wrote:

> On 7/24/17 1:44 PM, Adam J. Shook wrote:
>
>> We had some corrupt WAL blocks on our stage environment the other day and
>> opted to delete them. We now have some missing metadata and about 3k files
>> pending for replication. I've dug into it a bit and noticed that many of
>> the WALs in the `order` queue of the replication table A) no longer exist
>> in HDFS and B) have no entries in the `repl` section of the replication
>> table.
>>
>> Based on the code, if there are no entries in the `repl` section, then
>> the work will never be queued for completion via ZooKeeper and therefore
>> never finished -- does this make sense?
>
> Yeah, that sounds about right. I'm lamenting that I never wrote up docs
> for the user-manual to cover the table-schema. I should ... do that...
>
> I think the order entry is created when the repl entry is. Would have to
> dig back into code though.
>
>> What'd be the suggestion here to proceed? I'm thinking a one-off tool to
>> backfill the `repl` section should do the trick, but I am wondering if
>> this is something that should be changed in Accumulo?
>
> A tool to back-fill makes sense to me. I'm not sure what we could do in
> Accumulo automatically. Any time there is data-loss (data gone missing or
> old data coming back), Accumulo really can't do anything on its own. As you
> described in your scenario, you made the conscious decision to nuke the
> files with missing blocks. However, providing tools to handle "common"
> failure scenarios outside of our purview sounds like a good idea.
>
> Improving our docs around how to "re-sync" two tables being replicated
> would also be great. We have the hammer via snapshot+export, just need to
> be clear with the instructions.
>
>> Cheers,
>> --Adam