On Fri, May 31, 2013 at 2:39 PM, Billie Rinaldi <[email protected]> wrote:

> I'm not sure copying data out of HDFS is what you would want to do, though
> I suppose it depends on how much data you're storing there. If you want a
> backup on a different system, but you have too much data to store outside
> of a distributed file system, you could consider using distcp to copy from
> one HDFS instance to another.
>
> You can't clone the !METADATA table. In 1.5.0, you can export and import
> tables, which is designed to help you copy a table to a different cluster
> (see docs/examples/README.export). Cloning your tables could help, but in
> the case of !METADATA corruption you're still in the position of manually
> creating a new table with the same configuration (and split points if you
> know them) and bulk importing the old data files. I don't know if table
> export could be used to back up the metadata and configuration of a cloned
> table to help you recover its state later on the same system if the
> original table has gotten corrupted. It's possible.
>

Export table will save the table's state (what's in !METADATA and ZooKeeper)
to a zip file. So even if you do not actually copy the exported table, it can
be used to save table metadata. I made a comment on ACCUMULO-942 about using
export table to obtain a consistent snapshot of HDFS and Accumulo metadata.
That system metadata could be backed up.

>
> Billie
>
>
> On Fri, May 31, 2013 at 11:05 AM, Mike Hugo <[email protected]> wrote:
>
>> I'm curious to know how people are backing up data in Accumulo.
>>
>> We are planning on copying data out of HDFS on some regular basis to be
>> able to do a full restore.
>>
>> We've also ended up getting into a state of having a corrupt !METADATA
>> table a few times. I'm wondering if doing a clone on a few tables on a
>> periodic basis (like every hour, for a few hours) might be one way to help
>> us recover from that situation.
>>
>> E.g., if we did a clone on all tables, including the !METADATA table,
>> hourly, and we didn't necessarily care about losing data in the last hour
>> time frame, could we simply restore from one of those clones if we get into
>> a corrupted state?
>>
>> Is there another mechanism for snapshotting / backing up data in Accumulo?
>>
>> Thanks for your thoughts!
>>
>> Mike
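
For reference, here is a rough sketch of the export-based workflow discussed in this thread. It assumes the 1.5.0 shell commands described in docs/examples/README.export (clonetable, offline, exporttable) plus hadoop distcp. The helper function, table name, and directories are made up for illustration, and it only builds the command list rather than running anything. A table has to be offline while it is exported, so the sketch exports a clone and leaves the original online.

```python
def export_backup_plan(table, export_dir, dest_dir, clone_suffix="_exp"):
    """Return the ordered commands for a clone -> offline -> export -> distcp backup.

    Lines starting with 'accumulo-shell>' are meant for the Accumulo shell;
    the line starting with '$' is an ordinary OS shell command.
    All names passed in are hypothetical examples, not real cluster paths.
    """
    clone = table + clone_suffix
    return [
        # Clone so the original table can stay online during the export.
        f"accumulo-shell> clonetable {table} {clone}",
        # The table being exported must be offline for the whole export.
        f"accumulo-shell> offline {clone}",
        # Writes the table metadata and a distcp.txt file list into export_dir.
        f"accumulo-shell> exporttable -t {clone} {export_dir}",
        # Copy the files named in distcp.txt to the backup location.
        f"$ hadoop distcp -f {export_dir}/distcp.txt {dest_dir}",
    ]


if __name__ == "__main__":
    for step in export_backup_plan("table1", "/tmp/table1_export", "/tmp/table1_export_dest"):
        print(step)
```

Even if the distcp step is skipped, the export directory still holds the saved table metadata, which is the part that matters for recovering from a corrupt !METADATA table.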
