[jira] [Comment Edited] (HDFS-15614) Initialize snapshot trash root during NameNode startup if enabled

2021-04-14 Thread Siyao Meng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320927#comment-17320927
 ] 

Siyao Meng edited comment on HDFS-15614 at 4/14/21, 11:31 AM:
--

Thanks for bringing this up [~ayushtkn].


{quote}
And this fails, And yep there is an ambiguity.
{quote}

The reason is that 
[{{DFS#provisionSnapshotTrash}}|https://github.com/apache/hadoop/blob/c6539e3289711d29f508930bbda40302f48ddf4c/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L2984]
 follows the implementation of its EZ counterpart 
[{{DFS#provisionEZTrash}}|https://github.com/apache/hadoop/blob/c6539e3289711d29f508930bbda40302f48ddf4c/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L2913].
 {{dfs.provisionSnapshotTrash}} is not automatically called from 
{{dfs.allowSnapshot}}, mirroring the encryption zone behavior, where 
{{provisionEZTrash}} is not called from {{createEncryptionZone}} either.
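
To illustrate, a rough sketch (not a committed test; given a {{Configuration}} 
{{conf}} pointing at HDFS, and with the {{provisionSnapshotTrash}} signature 
assumed from the DFS link above):

{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

// A raw DFS client call leaves the snapshottable dir without a trash root.
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
Path dir = new Path("/snapdir");
dfs.mkdirs(dir);

dfs.allowSnapshot(dir);
// /snapdir/.Trash does NOT exist at this point.

// Only the explicit provisioning call creates it
// (signature assumed from DFS#provisionSnapshotTrash linked above):
dfs.provisionSnapshotTrash(dir, HdfsAdmin.TRASH_PERMISSION);
// /snapdir/.Trash exists now.
{code}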

Therefore this would be the same deal for encryption zone trash root creation. 
If we replace the {{dfs.allowSnapshot}} calls with {{dfs.createEncryptionZone}} 
in the first test case, we should likewise find the trash root inside the 
encryption zone missing, since the zone-creation call *alone* does not 
provision it.

I suggest we post some guidelines, and note in the javadoc, that 
{{allowSnapshot}} is better performed via the dfsadmin CLI (and likewise 
{{createEncryptionZone}}, if such a note isn't there already).


{quote}
How come a client side feature that important, that can make the cluster go 
down in times of critical situation like failover, Again a test to show that:
{quote}

[name 
quota|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html#Name_Quotas]
 can become an issue indeed.
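
A rough sketch of that failure mode (illustrative only, given a 
{{DistributedFileSystem}} {{dfs}}):

{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.protocol.HdfsConstants;

// A tight name quota on a snapshottable dir leaves no room for .Trash.
Path snapDir = new Path("/snapdir");
dfs.mkdirs(snapDir);
// A namespace quota of 1 counts the directory itself, so no children fit.
dfs.setQuota(snapDir, 1, HdfsConstants.QUOTA_DONT_SET);
dfs.allowSnapshot(snapDir);
// Any later attempt to mkdir /snapdir/.Trash now fails with
// NSQuotaExceededException -- at NN startup this would abort provisioning.
{code}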


I think I get your point. Maybe a better way to create those necessary Trash 
dirs is to ask an admin to run a dfsadmin command *manually* after flipping 
{{dfs.namenode.snapshot.trashroot.enabled}} to {{true}}.

We already have {{dfsadmin -provisionSnapshotTrash}}, but it can only be run on 
one directory at a time. A {{dfsadmin -provisionSnapshotTrash -all}} option 
could be implemented to cover every existing snapshottable directory in one 
shot.
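
A rough sketch of what the client side of {{-all}} could look like (the loop 
is an assumption, not a committed design; {{getSnapshottableDirListing}} and 
{{provisionSnapshotTrash}} are the existing DFS calls referenced above):

{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.protocol.SnapshottableDirectoryStatus;

// Hypothetical core of "dfsadmin -provisionSnapshotTrash -all":
// iterate all snapshottable dirs and provision a trash root in each.
SnapshottableDirectoryStatus[] dirs = dfs.getSnapshottableDirListing();
if (dirs != null) { // null when there are no snapshottable dirs
  for (SnapshottableDirectoryStatus status : dirs) {
    Path root = status.getFullPath();
    // same per-directory call the existing subcommand performs
    dfs.provisionSnapshotTrash(root, HdfsAdmin.TRASH_PERMISSION);
  }
}
{code}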


Cheers,
Siyao



> Initialize snapshot trash root during NameNode startup if enabled
> -
>
> Key: HDFS-15614
> URL: https://issues.apache.org/jira/browse/HDFS-15614
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This is a follow-up to HDFS-15607.
> Goal:
> Initialize (create) snapshot trash root for all existing snapshottable 
> directories if {{dfs.namenode.snapshot.trashroot.enabled}} is set to 
> {{true}}. So admins won't have to run {{dfsadmin -provisionTrash}} manually 
> on each existing snapshottable directory.

[jira] [Comment Edited] (HDFS-15614) Initialize snapshot trash root during NameNode startup if enabled

2021-04-12 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319168#comment-17319168
 ] 

Shashikant Banerjee edited comment on HDFS-15614 at 4/12/21, 8:46 AM:
--

Thanks [~ayushtkn]. The {{getAllSnapshottableDirs()}} call in itself is not a 
heavy one IMO. It does not depend on the number of snapshots present in the 
system.

{quote}
1. What if the mkdirs fail? the namenode will crash, ultimately all Namenodes 
will try this stuff in an attempt to become active and come out of safemode. 
Hence all the namenodes will crash. Why mkdirs can fail, could be many reasons, 
I can tell you one which I tried: Namespace Quotas, and yep the namenode 
crashed. can be bunch of such cases
{quote}
If mkdir fails to create the .Trash directory inside the snapshottable root, 
then strict ordering/processing of all entries during snapshot deletion cannot 
be guaranteed. If this feature is to be used, .Trash needs to be within the 
snapshottable directory, which is similar to the case with encryption zones.

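For context, the client already resolves trash roots per directory via the 
public {{FileSystem#getTrashRoot}} API; a sketch of the expected behavior with 
the flag on (the exact resolution is an assumption based on this feature):

{code:java}
import org.apache.hadoop.fs.Path;

// With dfs.namenode.snapshot.trashroot.enabled=true, trash resolution for a
// file under a snapshottable dir is expected to stay inside that dir:
Path file = new Path("/snapdir/data/file1");
Path trashRoot = dfs.getTrashRoot(file);
// expected: /snapdir/.Trash/<current user>, so a trash move never crosses
// the snapshottable-directory boundary, preserving snapshot ordering.
{code}
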
{quote}
2. Secondly, An ambiguity, A client did an allowSnapshot say not from HdfsAdmin 
he didn't had any Trash directory in the snapshot dir, Suddenly a failover 
happened, he would get a trash directory in its snapshot directory, Which he 
never created.
{quote}
If a new directory is made snapshottable with the feature flag turned on, the 
.Trash directory gets created implicitly as part of the allowSnapshot call. I 
don't think there is an ambiguity here.
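
A minimal test sketch of that behavior (assuming, per the discussion above, 
that trash provisioning is hooked into {{HdfsAdmin#allowSnapshot}}; 
illustrative only, not a committed test):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

Configuration conf = new HdfsConfiguration();
conf.setBoolean("dfs.namenode.snapshot.trashroot.enabled", true);
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();
try {
  cluster.waitActive();
  DistributedFileSystem dfs = cluster.getFileSystem();
  Path dir = new Path("/snapdir");
  dfs.mkdirs(dir);
  // allowSnapshot via HdfsAdmin is expected to provision the trash root
  new HdfsAdmin(dfs.getUri(), conf).allowSnapshot(dir);
  // verify: /snapdir/.Trash now exists
  assert dfs.exists(new Path(dir, ".Trash"));
} finally {
  cluster.shutdown();
}
{code}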
{quote}
Third, The time cost, The namenode startup or the namenode failover or let it 
be coming out of safemode should be fast, They are actually contributing to 
cluster down time, and here we are doing like first getSnapshottableDirs which 
itself would be a heavy call if you have a lot of snapshots, then for each 
directory, one by one we are doing a getFileInfo and then a mkdir, seems like 
time-consuming. Not sure about the memory consumption at that point due to this 
though...
{quote}
I don't think {{getSnapshottableDirs()}} is a very heavy call in typical 
setups. It has nothing to do with the number of snapshots that exist in the 
system.
{quote}
Fourth, Why the namenode needs to do a client operation? It is the server. And 
that too while starting up, This mkdirs from namenode to self is itself 
suspicious, Bunch of namenode crashing coming up trying to become active, 
trying to push same edits, Hopefully you would have taken that into account and 
pretty sure such things won't occur, Namenodes won't collide even in the rarest 
cases. yep and all safe with the permissions..
{quote}
This is important for provisioning snapshot trash so that the ordered snapshot 
deletion feature can be used when the system already has pre-existing 
snapshottable directories.

 

