Some advanced recovery steps are documented [1], but there is no automated "fix it for you" tool.

It's probably a good idea either to set "fs.trash.interval" and/or "fs.trash.checkpoint.interval" in core-site.xml to values that reflect the HDFS space you actually have available, or to just turn off trash and take the necessary steps to make sure your data is backed up (if that's a priority for you).
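
For example, something like this in core-site.xml would keep trashed files around for roughly a day and checkpoint the trash every hour (the numbers are illustrative, not a recommendation; both properties are in minutes, and setting fs.trash.interval to 0 is the "turn trash off" route):

<!-- core-site.xml: illustrative values only; size these to the HDFS space you actually have -->
<property>
  <name>fs.trash.interval</name>
  <!-- minutes before a trash checkpoint is deleted; 0 disables trash entirely -->
  <value>1440</value>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <!-- minutes between trash checkpoints; 0 means "same as fs.trash.interval" -->
  <value>60</value>
</property>

As the reply quoted below points out, the useful comparison is against dfs.namenode.checkpoint.period (3600 seconds by default; note the different unit): keeping trashed files longer than that gives you a chance to recover files that were compacted away before a bad checkpoint.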

HDFS (and Accumulo, for that matter) is only as reliable as the hardware it runs on and the configuration you give it. Both are built to be robust, reliable systems, but neither is without its flaws given enough time.


[1] http://accumulo.apache.org/1.7/accumulo_user_manual.html#_advanced_system_recovery

James Hughes wrote:
Ok, I can see the benefit of being able to recover data.  Is this
process documented?  And is there any kind of user-friendly tool for it?

On Mon, Aug 17, 2015 at 4:11 PM, <[email protected]> wrote:


      It's not temporary files, it's any file that has been compacted
    away. If you keep files around longer than
    {dfs.namenode.checkpoint.period}, then you have a chance to recover
    in case your most recent checkpoint is corrupt.

    ------------------------------------------------------------------------
    *From: *"James Hughes" <[email protected] <mailto:[email protected]>>
    *To: *[email protected] <mailto:[email protected]>
    *Sent: *Monday, August 17, 2015 3:57:57 PM
    *Subject: *Accumulo GC and Hadoop trash settings


    Hi all,

    From reading about the Accumulo GC, it sounds like temporary files
    are routinely deleted during GC cycles.  In a small testing
    environment, I've seen the HDFS Accumulo user's .Trash folder grow
    to tens of gigabytes of data.

    Is there any reason that the default value for gc.trash.ignore is
    false?  Is there any downside to deleting GC'ed files completely?

    Thanks in advance,

    Jim

    http://accumulo.apache.org/1.6/accumulo_user_manual.html#_gc_trash_ignore
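
On the gc.trash.ignore question above: flipping that default is a one-property change, presumably set in accumulo-site.xml alongside the other gc.* settings; the snippet below is just a sketch.

<!-- accumulo-site.xml: sketch only; gc.trash.ignore defaults to false,
     i.e. the GC moves deleted files into the HDFS user's .Trash first -->
<property>
  <name>gc.trash.ignore</name>
  <!-- true = delete candidate files outright, skipping trash -->
  <value>true</value>
</property>

The downside is the one discussed above: you lose the window for pulling compacted-away files back out of trash if you ever need them for recovery.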

