Adam, 

Thanks for the quick response. So, now that I understand the caveats, what I 
would like to know is how this would be done.


Patrick




-----Original Message-----
From: Adam Fuchs <[email protected]>
To: user <[email protected]>
Sent: Thu, Jul 5, 2012 10:13 am
Subject: Re: Recovering Tables from HDFS


Hi Patrick,


The short answer is yes, but there are a few caveats:
1. As you said, information that is sitting in the in-memory map and in the 
write-ahead log will not be in those files. You can periodically call flush 
(Connector.getTableOperations().flush(...)) to guarantee that your data has 
made it into the RFiles.
2. Old data that has been deleted may reappear. RFiles can be shared by multiple 
tablets, which happens when tablets split. Often one of those tablets then 
compacts, getting rid of its delete keys, but the file holding the original data 
remains in HDFS because it is still referenced by another tablet (or has not 
yet been garbage collected). If you're using Accumulo in an append-only 
fashion, then this will not be a problem.
3. For the same reasons as #2, if you're doing any aggregation you might run 
into counts being incorrect.
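[Editor's note: a minimal sketch of the periodic flush mentioned in caveat #1, using the TableOperations flush call. The instance name, zookeeper host, credentials, and table name are placeholders, not details from this thread.]

```java
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;

// Placeholder connection details -- substitute your own instance/credentials.
ZooKeeperInstance instance = new ZooKeeperInstance("myInstance", "zkhost:2181");
Connector conn = instance.getConnector("user", "password".getBytes());

// Flush the table's in-memory map out to RFiles in HDFS. Null start/end rows
// cover the whole table; wait=true blocks until the flush has completed, so
// afterwards the RFiles contain everything written so far.
conn.tableOperations().flush("mytable", null, null, true);
```

Calling this on a schedule (e.g. before each backup pass) bounds how much recent data would be missing from the RFiles.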


You might also check out the table cloning feature introduced in 1.4 as a means 
for backing up a table: 
http://accumulo.apache.org/1.4/user_manual/Table_Configuration.html#Cloning_Tables
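[Editor's note: a sketch of the cloning call, assuming a `conn` Connector as above; the table names are placeholders.]

```java
// Clone "mytable" as a cheap backup snapshot. The third argument (true) asks
// Accumulo to flush the source table first; the null map/set mean the clone
// inherits all of the source table's properties unchanged.
conn.tableOperations().clone("mytable", "mytable_backup", true, null, null);
```

A clone initially shares the source table's RFiles, so it is fast and space-efficient until the tables diverge.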


Cheers,
Adam



On Thu, Jul 5, 2012 at 9:52 AM,  <[email protected]> wrote:



users@accumulo,


I need help understanding whether tables could be recovered or backed up by 
taking their files stored in HDFS and reattaching them to tablet servers, even 
though this would mean losing the information in recent mutations and 
write-ahead logs. The documentation on recovery focuses on the failure of a 
tablet server, but in the event of a master failure, or another situation 
where the tablet servers cannot be used, it would be beneficial to know 
whether the files in HDFS can be used for recovery.
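[Editor's note: one way recovery along these lines could be sketched is with Accumulo's bulk-import API, which moves existing RFiles into a table rather than literally "reattaching" them. The table name and HDFS paths below are hypothetical, and a `conn` Connector is assumed.]

```java
// Recreate an empty table, then bulk-import the recovered RFiles into it.
// "/recovered-rfiles" holds the copied RFiles; any files that cannot be
// imported are moved to "/import-failures". setTime=false keeps the
// timestamps already stored in the files.
conn.tableOperations().create("recovered_table");
conn.tableOperations().importDirectory(
    "recovered_table", "/recovered-rfiles", "/import-failures", false);
```

The caveats Adam lists above (missing in-memory/WAL data, resurrected deletes, skewed aggregates) all apply to data recovered this way.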


Thanks,


Patrick Lynch