[jira] [Commented] (HDFS-12996) DataNode Replica Trash

2018-01-23 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336172#comment-16336172
 ] 

Hanisha Koneru commented on HDFS-12996:
---

Thanks [~aw] for reviewing the doc.

We considered something along similar lines. However, we don't want to take 
automatic snapshots on behalf of the user, because a snapshot keeps the deleted 
namespace alive and NN memory is not relieved as the user expects. For example, 
say a user deletes 10M files and expects NN memory usage to go down; an 
automatic snapshot would prevent this.
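
To make the concern concrete, here is a minimal sketch (paths and snapshot 
name are illustrative, and /data must already be snapshottable): after a 
snapshot, a deleted tree is still reachable through the .snapshot path, so its 
inodes and block metadata stay in the NN heap.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotPinsMetadata {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Assumes "hdfs dfsadmin -allowSnapshot /data" has been run.
    fs.createSnapshot(new Path("/data"), "s1");
    fs.delete(new Path("/data/bigdir"), true); // user expects NN heap to shrink...
    // ...but the deleted tree is still live under the snapshot, so it does not:
    System.out.println(fs.exists(new Path("/data/.snapshot/s1/bigdir"))); // true
  }
}
{code}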

> DataNode Replica Trash
> --
>
> Key: HDFS-12996
> URL: https://issues.apache.org/jira/browse/HDFS-12996
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: DataNode_Replica_Trash_Design_Doc.pdf
>
>
> DataNode Replica Trash will allow administrators to recover from a recent 
> delete request that resulted in catastrophic loss of user data. This is 
> achieved by placing all invalidated blocks in a replica trash on the datanode 
> before completely purging them from the system. The design doc is attached 
> here.






[jira] [Commented] (HDFS-12996) DataNode Replica Trash

2018-01-15 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326698#comment-16326698
 ] 

Allen Wittenauer commented on HDFS-12996:
-

bq. Also the design looks very similar to Checkpointing/Snapshots.

The fact that this JIRA even exists suggests that snapshots are/were a 
failure. On other file systems, snapshots are exactly the recovery model for 
these types of deletes.

...

Reading through the doc, there are a handful of spots where I see the use 
cases are extremely limited.  But I'm really left with a basic question:

Why isn't there an option to just have the NN automatically take a snapshot 
for deletes over a certain size, and then automatically delete these snapshots 
after X amount of time? Wouldn't that add the protection that is being 
requested while avoiding the requirement to restart the NN? 
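
A minimal sketch of that idea as a client-side guard, assuming a hypothetical 
file-count threshold, retention window, and snapshot-name prefix (an NN-side 
option would of course live in the NN itself); it uses only the stock 
FileSystem snapshot API:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Sketch only: snapshot before big deletes, expire old guard snapshots. */
public final class SnapshotGuard {
  private static final long FILE_THRESHOLD = 1_000_000L;        // illustrative
  private static final long MAX_AGE_MS = 24L * 60 * 60 * 1000;  // "X amount of time"
  private static final String PREFIX = "predelete-";            // hypothetical naming

  /** snapRoot must already be snapshottable (hdfs dfsadmin -allowSnapshot). */
  public static boolean deleteWithGuard(FileSystem fs, Path snapRoot, Path target)
      throws IOException {
    if (fs.getContentSummary(target).getFileCount() > FILE_THRESHOLD) {
      // Encode creation time in the name so the expiry pass can find old guards.
      fs.createSnapshot(snapRoot, PREFIX + System.currentTimeMillis());
    }
    return fs.delete(target, true);
  }

  /** Drop guard snapshots older than MAX_AGE_MS; run periodically. */
  public static void expireOldGuards(FileSystem fs, Path snapRoot)
      throws IOException {
    long cutoff = System.currentTimeMillis() - MAX_AGE_MS;
    for (FileStatus s : fs.listStatus(new Path(snapRoot, ".snapshot"))) {
      String name = s.getPath().getName();
      if (name.startsWith(PREFIX)
          && Long.parseLong(name.substring(PREFIX.length())) < cutoff) {
        fs.deleteSnapshot(snapRoot, name);
      }
    }
  }
}
{code}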









[jira] [Commented] (HDFS-12996) DataNode Replica Trash

2018-01-12 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324775#comment-16324775
 ] 

Hanisha Koneru commented on HDFS-12996:
---

Thanks [~virajith] for the review.
bq. In Section 13, Step 5, which component receives the restore-replica-trash 
command? From my understanding this should be the Namenode. However, in step 5 
the Namenode is not running. How is this going to work?
You are right. The Namenode would first need to be restarted, and then the 
restore-replica-trash command issued. Steps 5 and 6 were interchanged; we will 
fix this in the next version of the document. Thanks for catching this.
bq. Have you looked at "undoing" the delete operation? I think this would need 
to handle conflicts (for example, /foo/bar was deleted but a new /foo/bar/ is 
created) but could avoid having to stop and restart the Namenode.
Our current proposal handles the simpler use case of recovery by rolling 
everything back to an earlier point in time. Something similar to what you 
propose could be built on top of the replica trash, though.
bq. Can the replica purge daemon functionality be implemented in the 
VolumeScanner?
Yes, that may be possible. We will look into it when we get to the purge 
implementation.
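
For illustration only, a sketch of what such a purge pass might look like; the 
per-volume trash directory layout and the retention window are assumptions 
based on the design doc's proposal, not existing code:

{code:java}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical purge pass over a per-volume replica trash directory. */
public class ReplicaTrashPurger {
  private static final long RETENTION_MS = 7L * 24 * 60 * 60 * 1000; // assumed

  // Assumes a flat directory of block/meta files; a real layout may need recursion.
  public static void purge(Path trashDir) throws IOException {
    long cutoff = System.currentTimeMillis() - RETENTION_MS;
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(trashDir)) {
      for (Path entry : entries) {
        BasicFileAttributes attrs =
            Files.readAttributes(entry, BasicFileAttributes.class);
        // Purge entries whose time in trash has exceeded the retention window.
        if (attrs.lastModifiedTime().toMillis() < cutoff) {
          Files.delete(entry);
        }
      }
    }
  }
}
{code}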







[jira] [Commented] (HDFS-12996) DataNode Replica Trash

2018-01-10 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321337#comment-16321337
 ] 

Hanisha Koneru commented on HDFS-12996:
---

Thanks for the review, [~shahrs87].

bq. Suppose user1 and user2 deleted some of their directories (let's say dir1 
and dir2, respectively). If user1 wants to recover their directory, will we 
recover dir2 as well?

Yes. In the current design, recovery is done by rolling back to an earlier 
image. We could separately build a more fine-grained recovery mechanism on top 
of the replica trash.

bq. Many of our clients (let's say user1) use /tmp/ to store their 
intermediate task output (to work around quota problems). After a job 
completes, they delete this space and use the same location to store the next 
job's output. In the meantime, if some other user (let's say user2) wants to 
recover their mistakenly deleted directory, then we will go back in time for 
user1, which might corrupt user1's output directory.

True. This again is a trade-off between recovering the deleted data and 
undoing operations performed after the delete. Only an administrator can make 
this call.

The goal of this feature is to provide a safeguard for recovering from 
catastrophic mistakes, where it is acceptable to lose a few recent changes in 
order to get the deleted data back.

bq. Also the design looks very similar to Checkpointing/Snapshots.

I didn't get what you mean by checkpointing in this context. If you take 
frequent rolling snapshots, e.g. hourly snapshots of the root directory from a 
cron job, then you don't need this feature and can recover deleted files from 
a recent snapshot. However, very few clusters are set up for this.
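
For completeness, recovery from such a rolling snapshot is just a copy back 
out of the read-only .snapshot path; a minimal sketch, with illustrative paths 
and snapshot name:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class SnapshotRestore {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Restore a mistakenly deleted directory from the latest hourly snapshot.
    Path src = new Path("/data/.snapshot/hourly-2018-01-10-09/project1");
    Path dst = new Path("/data/project1");
    FileUtil.copy(fs, src, fs, dst, false /* deleteSource */, conf);
  }
}
{code}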








[jira] [Commented] (HDFS-12996) DataNode Replica Trash

2018-01-09 Thread Virajith Jalaparti (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319531#comment-16319531
 ] 

Virajith Jalaparti commented on HDFS-12996:
---

Thanks for posting the design document [~hanishakoneru]. A few questions:
# In Section 13, Step 5, which component receives the _restore-replica-trash_ 
command? From my understanding this should be the Namenode. However, in step 5 
the Namenode is not running. How is this going to work?
# Have you looked at "undoing" the delete operation? I think this would need 
to handle conflicts (for example, /foo/bar was deleted but a new /foo/bar/ is 
created; see the sketch below) but could avoid having to stop and restart the 
Namenode.
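
To illustrate the conflict case in the second question (the names and the 
conflict policy are hypothetical, not part of the current proposal), any undo 
would first have to detect that the target was re-created after the delete:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical conflict check an "undo delete" would need. */
public class UndoConflictCheck {
  public static Path restoreTarget(FileSystem fs, Path deleted) throws IOException {
    if (fs.exists(deleted)) {
      // /foo/bar was re-created after the delete: restore beside it instead.
      return new Path(deleted.getParent(), deleted.getName() + ".restored");
    }
    return deleted; // no conflict: restore in place
  }
}
{code}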







[jira] [Commented] (HDFS-12996) DataNode Replica Trash

2018-01-08 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317342#comment-16317342
 ] 

Rushabh S Shah commented on HDFS-12996:
---

[~hanishakoneru]: Looks like a good improvement. Thanks for the design.
I skimmed through the design document and have a couple of questions.
1. Suppose user1 and user2 deleted some of their directories (let's say dir1 
and dir2, respectively). If user1 wants to recover their directory, will we 
recover dir2 as well?
2. Another scenario I am concerned about: many of our clients (let's say 
user1) use {{/tmp/}} to store their intermediate task output (to work 
around quota problems). After a job completes, they delete this space and use 
the same location to store the next job's output. In the meantime, if some 
other user (let's say user2) wants to recover their mistakenly deleted 
directory, then we will go back in time for user1, which might corrupt user1's 
output directory.

Also the design looks very similar to Checkpointing/Snapshots.








[jira] [Commented] (HDFS-12996) DataNode Replica Trash

2018-01-08 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317272#comment-16317272
 ] 

Hanisha Koneru commented on HDFS-12996:
---

Didn't know that. Thanks [~jojochuang] :).







[jira] [Commented] (HDFS-12996) DataNode Replica Trash

2018-01-08 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317266#comment-16317266
 ] 

Wei-Chiu Chuang commented on HDFS-12996:


Just FYI, you should be able to move the JIRA from HADOOP to HDFS by clicking 
on More -> Move. I've done the same many times.



