I'm not sure copying data out of HDFS is what you would want to do, though I suppose it depends on how much data you're storing there. If you want a backup on a different system, but you have too much data to store outside of a distributed file system, you could consider using distcp to copy from one HDFS instance to another.
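For what it's worth, here is a rough sketch of what the distcp invocation could look like, written as a small Python helper that only assembles the command line. The NameNode hosts, ports, and paths are made up for illustration, and `build_distcp_command` is a hypothetical helper, not part of Hadoop or Accumulo; only `hadoop distcp <source> <destination>` and its `-update` option come from the tool itself.

```python
import shlex


def build_distcp_command(source_uri, backup_uri, update=False):
    """Return the argv list for a `hadoop distcp` copy between two HDFS instances."""
    command = ["hadoop", "distcp"]
    if update:
        # -update asks distcp to skip files that already match at the destination.
        command.append("-update")
    command.extend([source_uri, backup_uri])
    return command


# Hypothetical cluster addresses and directories; substitute your own.
SOURCE = "hdfs://nn-prod:8020/accumulo"
BACKUP = "hdfs://nn-backup:8020/backups/accumulo"

command = build_distcp_command(SOURCE, BACKUP)

# Print the command so it can be reviewed before being run by hand or from cron.
print(" ".join(shlex.quote(part) for part in command))
```

Passing the list to `subprocess.run(command, check=True)` on a machine with the Hadoop client installed would launch the copy; the sketch stops at printing so nothing runs by accident.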
You can't clone the !METADATA table. In 1.5.0, you can export and import tables, a feature designed to help you copy a table to a different cluster (see docs/examples/README.export). Cloning your tables could help, but in the case of !METADATA corruption you're still in the position of manually creating a new table with the same configuration (and split points, if you know them) and bulk importing the old data files.

I don't know whether table export could be used to back up the metadata and configuration of a cloned table, so that you could recover its state later on the same system if the original table has gotten corrupted. It's possible.

Billie

On Fri, May 31, 2013 at 11:05 AM, Mike Hugo <[email protected]> wrote:

> I'm curious to know how people are backing up data in Accumulo.
>
> We are planning on copying data out of HDFS on some regular basis to be
> able to do full restore.
>
> We've also ended up getting into a state of having a corrupt !METADATA
> table a few times. I'm wondering if doing a clone on a few tables on a
> periodic basis (like every hour, for a few hours) might be one way to help
> us recover from that situation.
>
> E.g., if we did a clone on all tables, including the !METADATA table hourly,
> and we didn't necessarily care about losing data in the last hour time
> frame, could we simply restore from one of those clones if we get into a
> corrupted state?
>
> Is there another mechanism for snapshotting / backing up data in Accumulo?
>
> Thanks for your thoughts!
>
> Mike
