All of the components you need to perform point-in-time recovery of an Accumulo instance already exist. I have been working on a tool[1] in my copious amounts of free time to integrate them into something usable, but it doesn't actually use the files in the trash. My approach is to let you determine your MTTR and schedule your backups accordingly; the backup is there in case you are not able to recover your database using the techniques in the current documentation.
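In the meantime, a manual recovery from the trash looks roughly like the sketch below. The table ID (2) and all of the paths are made up for illustration; adjust them for your instance:

    # locate the compacted-away RFiles in the Accumulo user's trash
    hadoop fs -ls -R /user/accumulo/.Trash/Current/accumulo/tables/2/

    # copy them out of the trash into a staging directory (the failure
    # directory must exist and be empty for the bulk import below)
    hadoop fs -mkdir /tmp/recover /tmp/recover-fail
    hadoop fs -cp '/user/accumulo/.Trash/Current/accumulo/tables/2/*/*.rf' /tmp/recover/

    # bulk-import the recovered files into a fresh table from the Accumulo shell
    root@instance> createtable recovered
    root@instance recovered> importdirectory /tmp/recover /tmp/recover-fail false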
[1] https://github.com/dlmarion/raccovery

From: James Hughes [mailto:[email protected]]
Sent: Monday, August 17, 2015 4:28 PM
To: [email protected]
Subject: Re: Accumulo GC and Hadoop trash settings

Ok, I can see the benefit of being able to recover data. Is this process documented? And is there any kind of user-friendly tool for it?

On Mon, Aug 17, 2015 at 4:11 PM, <[email protected]> wrote:

It's not temporary files, it's any file that has been compacted away. If you keep files around longer than {dfs.namenode.checkpoint.period}, then you have a chance to recover in case your most recent checkpoint is corrupt (the relevant settings are summarized at the end of this thread).

_____

From: "James Hughes" <[email protected]>
To: [email protected]
Sent: Monday, August 17, 2015 3:57:57 PM
Subject: Accumulo GC and Hadoop trash settings

Hi all,

From reading about the Accumulo GC, it sounds like temporary files are routinely deleted during GC cycles. In a small testing environment, I've seen the HDFS Accumulo user's .Trash folder holding tens of gigabytes of data.

Is there any reason that the default value for gc.trash.ignore is false? Is there any downside to deleting GC'ed files completely?

Thanks in advance,

Jim

http://accumulo.apache.org/1.6/accumulo_user_manual.html#_gc_trash_ignore
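For reference, the retention window described above comes down to three settings. The values below are illustrative, not recommendations; the point is to keep the trash interval comfortably longer than the checkpoint period:

    # core-site.xml (HDFS): minutes that deleted files stay in .Trash
    # (Hadoop's default of 0 disables the trash entirely)
    fs.trash.interval = 1440

    # hdfs-site.xml (HDFS): seconds between namenode checkpoints (default 3600)
    dfs.namenode.checkpoint.period = 3600

    # accumulo-site.xml: false (the default) makes the Accumulo GC move
    # candidate files to the trash instead of deleting them outright
    gc.trash.ignore = false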
