[jira] [Comment Edited] (HDFS-15614) Initialize snapshot trash root during NameNode startup if enabled
[ https://issues.apache.org/jira/browse/HDFS-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320927#comment-17320927 ] Siyao Meng edited comment on HDFS-15614 at 4/14/21, 11:31 AM:
--

Thanks for bringing this up [~ayushtkn].

{quote}And this fails, And yep there is an ambiguity.{quote}

The reason is that [{{DFS#provisionSnapshotTrash}}|https://github.com/apache/hadoop/blob/c6539e3289711d29f508930bbda40302f48ddf4c/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L2984] follows the implementation of its EZ counterpart, [{{DFS#provisionEZTrash}}|https://github.com/apache/hadoop/blob/c6539e3289711d29f508930bbda40302f48ddf4c/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L2913]. {{dfs.provisionSnapshotTrash}} is not automatically called from {{dfs.allowSnapshot}}, mirroring how encryption zones behave. Therefore the same applies to encryption zone trash root creation: if we replace the {{dfs.allowSnapshot}} calls with {{dfs.createEncryptionZone}} in the first test case, we should likewise find the trash root inside the encryption zone missing when the zone is created *alone*, without the explicit provision call. I suggest we publish some guidelines and add a note in the javadoc that allowSnapshot is better performed via the dfsadmin CLI (and createEncryptionZone as well, if not documented already).

{quote}How come a client side feature that important, that can make the cluster go down in times of critical situation like failover, Again a test to show that:{quote}

A [name quota|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html#Name_Quotas] can indeed become an issue. I see your point. A better way to create those necessary Trash dirs may be to ask an admin to run the dfsadmin command *manually* after flipping {{dfs.namenode.snapshot.trashroot.enabled}} to {{true}}.
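The behavior described above can be sketched against the {{DistributedFileSystem}} API. This is only an illustration, assuming a running cluster with {{fs.defaultFS}} pointing at HDFS and a hypothetical path {{/user/alice}}; it is not code from the patch:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SnapshotTrashSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumes fs.defaultFS is an hdfs:// URI, so the cast is valid.
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
    Path dir = new Path("/user/alice"); // hypothetical directory

    // Per the discussion: allowSnapshot() alone does NOT create the
    // trash root inside the snapshottable directory.
    dfs.allowSnapshot(dir);
    Path trashRoot = new Path(dir, ".Trash");
    System.out.println("after allowSnapshot: " + dfs.exists(trashRoot));

    // The trash root must be provisioned explicitly, mirroring the
    // encryption-zone provisionEZTrash flow.
    dfs.provisionSnapshotTrash(dir);
    System.out.println("after provisionSnapshotTrash: " + dfs.exists(trashRoot));
  }
}
```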
Currently we already have {{dfsadmin -provisionSnapshotTrash}}, but it can only be applied to one directory at a time. A {{dfsadmin -provisionSnapshotTrash -all}} option could be implemented to cover all snapshottable directories at once. Cheers, Siyao

> Initialize snapshot trash root during NameNode startup if enabled
> -
>
> Key: HDFS-15614
> URL: https://issues.apache.org/jira/browse/HDFS-15614
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Siyao Meng
> Assignee: Siyao Meng
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> This is a follow-up to HDFS-15607.
> Goal:
> Initialize (create) snapshot trash root for all existing snapshottable
> directories if {{dfs.namenode.snapshot.trashroot.enabled}} is set to
> {{true}}. So admins won't have to run {{dfsadmin -provisionTrash}} manually
> on
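The manual admin workflow discussed in the thread would look roughly like the following against a running cluster. The directory paths are hypothetical, and the {{-all}} flag is only a proposal in this discussion, not an existing option:

```shell
# Existing command: provision the snapshot trash root one
# snapshottable directory at a time.
hdfs dfsadmin -provisionSnapshotTrash /data/project1
hdfs dfsadmin -provisionSnapshotTrash /data/project2

# Proposed in this thread (not yet implemented):
# provision trash roots for all snapshottable directories at once.
# hdfs dfsadmin -provisionSnapshotTrash -all
```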
[jira] [Comment Edited] (HDFS-15614) Initialize snapshot trash root during NameNode startup if enabled
[ https://issues.apache.org/jira/browse/HDFS-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319168#comment-17319168 ] Shashikant Banerjee edited comment on HDFS-15614 at 4/12/21, 8:46 AM:
--

Thanks [~ayushtkn]. The "getAllSnapshottableDirs()" call in itself is not a heavy one IMO. It does not depend on the number of snapshots present in the system.

{code:java}
1. What if the mkdirs fail? the namenode will crash, ultimately all Namenodes will try this stuff in an attempt to become active and come out of safemode. Hence all the namenodes will crash. Why mkdirs can fail, could be many reasons, I can tell you one which I tried: Namespace Quotas, and yep the namenode crashed. can be bunch of such cases {code}
If mkdir fails to create the Trash directory inside the snapshot root, then strict ordering/processing of all entries during snapshot deletion cannot be guaranteed. If this feature is to be used, .Trash needs to be within the snapshottable directory, which is similar to the case with encryption zones.

{code:java}
2. Secondly, An ambiguity, A client did an allowSnapshot say not from HdfsAdmin he didn't had any Trash directory in the snapshot dir, Suddenly a failover happened, he would get a trash directory in its snapshot directory, Which he never created.{code}
If a new directory is made snapshottable with the feature flag turned on, the .Trash directory gets created implicitly as a part of the allowSnapshot call. I don't think there is an ambiguity here.

{code:java}
Third, The time cost, The namenode startup or the namenode failover or let it be coming out of safemode should be fast, They are actually contributing to cluster down time, and here we are doing like first getSnapshottableDirs which itself would be a heavy call if you have a lot of snapshots, then for each directory, one by one we are doing a getFileInfo and then a mkdir, seems like time-consuming. Not sure about the memory consumption at that point due to this though...
{code}
I don't think getSnapshottableDirs() is a very heavy call in typical setups. It has nothing to do with the number of snapshots that exist in the system.

{code:java}
Fourth, Why the namenode needs to do a client operation? It is the server. And that too while starting up, This mkdirs from namenode to self is itself suspicious, Bunch of namenode crashing coming up trying to become active, trying to push same edits, Hopefully you would have taken that into account and pretty sure such things won't occur, Namenodes won't collide even in the rarest cases. yep and all safe with the permissions.. {code}
This is important for provisioning snapshot trash so that the ordered snapshot deletion feature can be used when the system already has pre-existing snapshottable directories.
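The provisioning pass being debated (one {{getSnapshottableDirs}} listing, then a {{getFileInfo}}/{{mkdirs}} pair per directory) can be sketched from the client side roughly as below. This is a simplified illustration assuming a running cluster; the trash-root permission shown is illustrative, and this is not the actual NameNode-side code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.SnapshottableDirectoryStatus;

public class ProvisionAllSnapshotTrash {
  public static void main(String[] args) throws Exception {
    // Assumes fs.defaultFS is an hdfs:// URI, so the cast is valid.
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());

    // One listing call; its cost depends on the number of snapshottable
    // directories, not on the number of snapshots they contain.
    SnapshottableDirectoryStatus[] dirs = dfs.getSnapshottableDirListing();
    if (dirs == null) {
      return; // no snapshottable directories
    }

    for (SnapshottableDirectoryStatus status : dirs) {
      Path trashRoot = new Path(status.getFullPath(), ".Trash");
      if (!dfs.exists(trashRoot)) {
        // mkdirs can fail here, e.g. if a name quota on the directory
        // is already exhausted -- the failure mode raised in this thread.
        dfs.mkdirs(trashRoot, new FsPermission((short) 0700));
      }
    }
  }
}
```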