We had some corrupt WAL blocks in our staging environment the other day and
opted to delete them. We now have some missing metadata and about 3k files
pending replication.  I've dug into it a bit and noticed that many of
the WALs in the `order` queue of the replication table A) no longer exist
in HDFS and B) have no entries in the `repl` section of the replication
table.

Based on the code, if a WAL has no entries in the `repl` section, its work
will never be queued for completion via ZooKeeper and so will never
finish. Does that make sense?  What would you suggest as a way to
proceed?  I'm thinking a one-off tool to backfill the `repl` section should
do the trick, but I'm also wondering whether this is something that should
be changed in Accumulo.
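To make the situation concrete, here is a minimal sketch of the check I've
been doing by hand. It assumes the `order` entries, the `repl` entries, and
an HDFS listing have already been dumped into plain lists of WAL paths. The
function and class names are hypothetical, and nothing here calls Accumulo's
API. It flags every WAL in `order` that has no `repl` entry, since those
would never be queued. It then splits them into files that are gone from
HDFS and files that still exist and could be backfilled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StrandedWals:
    # WALs in `order` with no `repl` entry whose file is gone from HDFS
    missing_from_hdfs: list[str]
    # WALs in `order` with no `repl` entry whose file still exists
    still_present: list[str]


def find_stranded_wals(order_wals, repl_wals, hdfs_wals) -> StrandedWals:
    """Hypothetical helper: partition `order` WALs lacking a `repl` entry.

    Inputs are assumed to be iterables of WAL paths taken from a scan of
    the replication table and an HDFS directory listing.
    """
    repl = set(repl_wals)
    hdfs = set(hdfs_wals)
    # Without a `repl` entry, nothing gets queued in ZooKeeper for the WAL.
    stranded = [w for w in order_wals if w not in repl]
    return StrandedWals(
        missing_from_hdfs=[w for w in stranded if w not in hdfs],
        still_present=[w for w in stranded if w in hdfs],
    )


if __name__ == "__main__":
    result = find_stranded_wals(
        order_wals=["wal-1", "wal-2", "wal-3", "wal-4"],
        repl_wals=["wal-1"],
        hdfs_wals=["wal-1", "wal-3"],
    )
    print(result)
```

In this toy input, `wal-2` and `wal-4` are stranded and already deleted,
while `wal-3` is stranded but still on disk. Only that second group seems
like something a backfill could actually help with.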

Cheers,
--Adam
