Some advanced recovery steps are documented[1], but there is no
automated "fix it for you" tool.
It's probably a good idea to either set "fs.trash.interval" and/or
"fs.trash.checkpoint.interval" in core-site.xml to values that reflect
how much HDFS space you can afford to devote to trash, or just turn off
trash and take the necessary steps to make sure your data is backed up
(if that's a priority for you).
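For example, a minimal core-site.xml sketch of those two settings (the
values here are illustrative placeholders, not recommendations; both
intervals are in minutes):

  <property>
    <name>fs.trash.interval</name>
    <!-- Minutes that deleted files stay in .Trash before being purged;
         1440 = 24 hours. Setting this to 0 disables trash entirely. -->
    <value>1440</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <!-- Minutes between trash checkpoints; should be <= fs.trash.interval.
         If set to 0, it defaults to the value of fs.trash.interval. -->
    <value>60</value>
  </property>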
HDFS (and Accumulo, for that matter) is only as reliable as the hardware
and configuration you run it on. Both are built to be robust and
reliable systems, but they aren't without their flaws given enough time.
[1] http://accumulo.apache.org/1.7/accumulo_user_manual.html#_advanced_system_recovery
James Hughes wrote:
Ok, I can see the benefit of being able to recover data. Is this
process documented? And is there any kind of user-friendly tool for it?
On Mon, Aug 17, 2015 at 4:11 PM, <[email protected]> wrote:
It's not temporary files, it's any file that has been compacted
away. If you keep files around longer than
{dfs.namenode.checkpoint.period}, then you have a chance to recover
in case your most recent checkpoint is corrupt.
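For reference, the property mentioned above lives in hdfs-site.xml; a
minimal sketch, assuming the usual Hadoop default of one hour (the value
is in seconds):

  <property>
    <name>dfs.namenode.checkpoint.period</name>
    <!-- Seconds between NameNode metadata checkpoints; 3600 = 1 hour. -->
    <value>3600</value>
  </property>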
------------------------------------------------------------------------
*From: *"James Hughes" <[email protected] <mailto:[email protected]>>
*To: *[email protected] <mailto:[email protected]>
*Sent: *Monday, August 17, 2015 3:57:57 PM
*Subject: *Accumulo GC and Hadoop trash settings
Hi all,
From reading about the Accumulo GC, it sounds like temporary files
are routinely deleted during GC cycles. In a small testing
environment, I've seen the HDFS Accumulo user's .Trash folder have 10s of
gigabytes of data.
Is there any reason that the default value for gc.trash.ignore is
false? Is there any downside to deleting GC'ed files completely?
Thanks in advance,
Jim
http://accumulo.apache.org/1.6/accumulo_user_manual.html#_gc_trash_ignore
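For reference, a minimal sketch of how gc.trash.ignore would be set in
accumulo-site.xml if you decide to bypass the trash entirely (the value
shown is for illustration, not a recommendation):

  <property>
    <name>gc.trash.ignore</name>
    <!-- When true, the Accumulo GC deletes files outright instead of
         moving them to the HDFS user's .Trash folder. -->
    <value>true</value>
  </property>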