[jira] [Updated] (HDFS-15638) Make Hive tables directory permission check flat

2020-10-16 Thread Xinli Shang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinli Shang updated HDFS-15638:
---
Description: 
Problem: Currently, when a user tries to access a file, they need the 
permissions of its parent and ancestors as well as the permission of that file 
itself. This is generally correct, but for Hive table directories/files, all the 
files under a partition or even a table usually have the same permissions for 
the same set of ACL groups. Although the permissions and ACL groups are the 
same, the writer still needs to call setfacl() for every file. This results in a 
huge number of RPC calls to the NN. HDFS has default ACLs to solve that, but they 
only apply to create and copy, not to rename. However, in Hive ETL, rename is 
very common. 

Proposal: Add a 1-bit flag to directory inodes to indicate whether the directory 
is a Hive table directory. If that flag is set, all sub-directories and files 
under it simply use that directory's permission and ACL group settings, so Hive 
ETL does not need to set permissions at the file level. If that flag is not set 
(the default), everything works as before. Setting or unsetting the flag 
requires admin privilege. 
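
For illustration, a rough sketch of how the permission check could short-circuit 
at a flagged ancestor (the isHiveFlatDir() accessor and the helper names below 
are made up for the sketch, not an existing API):

{code}
// Sketch only: during path resolution, if an ancestor directory carries the
// proposed 1-bit "flat" flag, check permission on that directory alone and
// skip the per-file permission/ACL checks underneath it.
INodeDirectory flatAncestor = null;
for (INode inode : pathComponents) {
  if (inode.isDirectory() && inode.asDirectory().isHiveFlatDir()) { // hypothetical flag accessor
    flatAncestor = inode.asDirectory();
    break;
  }
}
if (flatAncestor != null) {
  checkAccess(flatAncestor, requestedAccess);              // only the flagged directory's perms/ACLs
} else {
  checkTraverseAndAccess(pathComponents, requestedAccess); // existing per-component behavior
}
{code}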

  was:
Problem: Currently, when a user tries to access a file, they need not only 
the permission of that file but also the permissions of its parent and 
ancestors. This is correct, but for Hive table directories/files, all the files 
under a partition or even a table usually have the same permissions for the 
same set of ACL groups. Although the permissions and ACL groups are the same, 
the writer sometimes still needs to call setfacl() for every file. This results 
in a huge number of RPC calls to the NN. HDFS has default ACLs to solve that, but 
they only apply to create and copy, not to rename. However, in Hive ETL, rename 
is very common. 

Proposal: Add a 1-bit flag to directory inodes to indicate whether the directory 
is a Hive table directory. If that flag is set, all sub-directories and files 
under it simply use that directory's permission and ACL group settings, so Hive 
ETL does not need to set permissions at the file level. If that flag is not set 
(the default), everything works as before. Setting or unsetting the flag 
requires admin privilege. 


> Make Hive tables directory permission check flat 
> -
>
> Key: HDFS-15638
> URL: https://issues.apache.org/jira/browse/HDFS-15638
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Xinli Shang
>Priority: Major
>
> Problem: Currently, when a user tries to access a file, they need the 
> permissions of its parent and ancestors as well as the permission of that file 
> itself. This is generally correct, but for Hive table directories/files, all 
> the files under a partition or even a table usually have the same permissions 
> for the same set of ACL groups. Although the permissions and ACL groups are 
> the same, the writer still needs to call setfacl() for every file. This results 
> in a huge number of RPC calls to the NN. HDFS has default ACLs to solve that, 
> but they only apply to create and copy, not to rename. However, in Hive ETL, 
> rename is very common. 
> Proposal: Add a 1-bit flag to directory inodes to indicate whether the 
> directory is a Hive table directory. If that flag is set, all sub-directories 
> and files under it simply use that directory's permission and ACL group 
> settings, so Hive ETL does not need to set permissions at the file level. If 
> that flag is not set (the default), everything works as before. Setting or 
> unsetting the flag requires admin privilege. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15638) Make Hive tables directory permission check flat

2020-10-16 Thread Xinli Shang (Jira)
Xinli Shang created HDFS-15638:
--

 Summary: Make Hive tables directory permission check flat 
 Key: HDFS-15638
 URL: https://issues.apache.org/jira/browse/HDFS-15638
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Xinli Shang


Problem: Currently, when a user tries to access a file, they need not only 
the permission of that file but also the permissions of its parent and 
ancestors. This is correct, but for Hive table directories/files, all the files 
under a partition or even a table usually have the same permissions for the 
same set of ACL groups. Although the permissions and ACL groups are the same, 
the writer sometimes still needs to call setfacl() for every file. This results 
in a huge number of RPC calls to the NN. HDFS has default ACLs to solve that, but 
they only apply to create and copy, not to rename. However, in Hive ETL, rename 
is very common. 

Proposal: Add a 1-bit flag to directory inodes to indicate whether the directory 
is a Hive table directory. If that flag is set, all sub-directories and files 
under it simply use that directory's permission and ACL group settings, so Hive 
ETL does not need to set permissions at the file level. If that flag is not set 
(the default), everything works as before. Setting or unsetting the flag 
requires admin privilege. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency

2020-10-16 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215697#comment-17215697
 ] 

Ahmed Hussein commented on HDFS-15618:
--

I checked the failing JUnit tests. They are unrelated to the patch.
I will file new Jiras for the tests that seem to have been broken for some time. 
On multiple occasions I have noticed a "domino effect" in HDFS tests: a test 
that fails or times out causes other tests to fail because they cannot bind to a 
port or cannot get enough resources.
An example is testRead() in TestBlockTokenWithDFSStriped and 
TestBlockTokenWithDFS, where port 19870 is used by both test cases.
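
As a general illustration of one way to avoid such collisions (not part of this 
patch), a test can ask for an ephemeral port instead of hard-coding one:

{code}
// Sketch only: bind to port 0 so the OS assigns a free port, instead of a
// fixed port such as 19870 that another test class may also use.
Configuration conf = new HdfsConfiguration();
conf.set(DFSConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY, "127.0.0.1:0");
conf.set(DFSConfigKeys.DFS_DATANODE_ADDRESS_KEY, "127.0.0.1:0");

// Or reserve a genuinely free port up front:
int freePort;
try (ServerSocket socket = new ServerSocket(0)) {
  freePort = socket.getLocalPort();
}
{code}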

The following is the stack trace of the TestDeadNodeDetection failure reported 
by Hadoop QA.

{code:bash}
java.net.BindException: Problem binding to [localhost:44881] 
java.net.BindException: Address already in use; For more details see:  
http://wiki.apache.org/hadoop/BindException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:908)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:809)
at org.apache.hadoop.ipc.Server.bind(Server.java:640)
at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1210)
at org.apache.hadoop.ipc.Server.<init>(Server.java:3103)
at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1039)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.<init>(ProtobufRpcEngine2.java:430)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2.getServer(ProtobufRpcEngine2.java:350)
at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:848)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:1031)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1452)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:513)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2868)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2774)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2818)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2494)
at 
org.apache.hadoop.hdfs.TestDeadNodeDetection.testDeadNodeDetectionDeadNodeRecovery(TestDeadNodeDetection.java:226)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 

[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency

2020-10-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215691#comment-17215691
 ] 

Hadoop QA commented on HDFS-15618:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
58s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} |  | {color:green} The patch appears to include 2 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 2s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
17s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  4s{color} |  | {color:green} branch has no errors when building and 
testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
58s{color} |  | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
55s{color} |  | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 9s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
8s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} |  | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 45s{color} |  | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
1s{color} |  | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 

[jira] [Commented] (HDFS-15634) Invalidate block on decommissioning DataNode after replication

2020-10-16 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215676#comment-17215676
 ] 

Stephen O'Donnell commented on HDFS-15634:
--

The issue you have seen is a fairly extreme example - not too many clusters 
will have 200 nodes decommissioned at the same time, I suspect. The empty-node 
problem is a valid concern on smaller clusters, and in the 
decommission-recommission case the empty node may never catch up to the other 
nodes, as the default block placement policy picks nodes randomly.

The long lock hold is a problem even when a node (or perhaps rack) goes dead 
unexpectedly. I think it would be better to try to fix that generally, rather 
than doing something special for decommission. I am also not really comfortable 
changing the current "non destructive" decommission flow to something that 
removes the blocks from the DN.

If you look at HeartbeatManager.heartbeatCheck(...), it seems to handle only one 
dead DN per check interval. The check interval is either 5 minutes by default or 
30 seconds if "dfs.namenode.avoid.write.stale.datanode" is true.

Then it ultimately calls BlockManager.removeBlocksAssociatedTo(...), which does 
the expensive work of removing the blocks. In that method, I wonder if we could 
drop and re-take the write lock periodically so that it does not hold the lock 
for too long?


{code}
  /** Remove the blocks associated to the given datanode. */
  void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
    providedStorageMap.removeDatanode(node);
    for (DatanodeStorageInfo storage : node.getStorageInfos()) {
      final Iterator<BlockInfo> it = storage.getBlockIterator();
      // add the BlockInfos to a new collection as the
      // returned iterator is not modifiable.
      Collection<BlockInfo> toRemove = new ArrayList<>();
      while (it.hasNext()) {
        toRemove.add(it.next());
      }

      // Could we drop and re-take the write lock in this loop every 1000 blocks?
      for (BlockInfo b : toRemove) {
        removeStoredBlock(b, node);
      }
    }
    // Remove all pending DN messages referencing this DN.
    pendingDNMessages.removeAllMessagesForDatanode(node);

    node.resetBlocks();
    invalidateBlocks.remove(node);
  }
{code}

We see some nodes with 5M, 10M or even more blocks sometimes, so this would 
help them in general.

I am not sure whether there would be any negative consequences of dropping and 
re-taking the write lock in this scenario.
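
For illustration only, a rough sketch of what yielding the lock could look like 
(the batch size and the way the lock is dropped and re-acquired are assumptions, 
not an actual patch, and whether the iteration stays safe while the lock is 
released would need review):

{code}
// Sketch: release and re-acquire the namesystem write lock every BATCH
// removals so other operations can make progress during large removals.
private static final int BATCH = 1000;

void removeStoredBlocksBatched(Collection<BlockInfo> toRemove,
    DatanodeDescriptor node) {
  int processed = 0;
  for (BlockInfo b : toRemove) {
    removeStoredBlock(b, node);
    if (++processed % BATCH == 0) {
      namesystem.writeUnlock();   // let queued readers/writers in
      namesystem.writeLock();     // then continue the removal
    }
  }
}
{code}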

> Invalidate block on decommissioning DataNode after replication
> --
>
> Key: HDFS-15634
> URL: https://issues.apache.org/jira/browse/HDFS-15634
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: write lock.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Right now, when a DataNode starts decommissioning, the Namenode marks it as 
> decommissioning, its blocks are replicated over to different DataNodes, and 
> the node is then marked as decommissioned. These blocks are not touched since 
> they are not counted as live replicas.
> Proposal: Invalidate these blocks once they are replicated and there are 
> enough live replicas in the cluster.
> Reason: A recent shutdown of decommissioned datanodes to finish the flow 
> caused a Namenode latency spike, since the namenode needs to remove all of the 
> blocks from its memory and this step requires holding the write lock. If we 
> had gradually invalidated these blocks, the deletion would be much easier and 
> faster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15634) Invalidate block on decommissioning DataNode after replication

2020-10-16 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215654#comment-17215654
 ] 

Fengnan Li commented on HDFS-15634:
---

Thanks for the comment [~sodonnell]. More context here:

We were decommissioning to swap with better hardware so these datanodes would 
not be used anymore.

We are running 2.8.2 with about 350K blocks on each datanode after they are 
decommissioned. We stopped ~200 datanodes at once (sounds crazy... and it does).

I attached the graph for the writelock at that time. !write lock.png!

The goal of the whole ticket is not really about whether there will be missing 
blocks or not, and I don't think there will be unless you decommission datanodes 
holding all replicas of a block at the same time, which is out of the 
discussion. What I am proposing is to mitigate the impact on namenode 
performance. From this perspective, recommissioning a datanode that still holds 
all its blocks, or stopping the node and having the namenode clean up all the 
blocks at once, is not ideal.

Balancing is actually not a concern for a large cluster (>3k datanodes) with 
high traffic since other blocks will soon fill up this new datanode.

> Invalidate block on decommissioning DataNode after replication
> --
>
> Key: HDFS-15634
> URL: https://issues.apache.org/jira/browse/HDFS-15634
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: write lock.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Right now, when a DataNode starts decommissioning, the Namenode marks it as 
> decommissioning, its blocks are replicated over to different DataNodes, and 
> the node is then marked as decommissioned. These blocks are not touched since 
> they are not counted as live replicas.
> Proposal: Invalidate these blocks once they are replicated and there are 
> enough live replicas in the cluster.
> Reason: A recent shutdown of decommissioned datanodes to finish the flow 
> caused a Namenode latency spike, since the namenode needs to remove all of the 
> blocks from its memory and this step requires holding the write lock. If we 
> had gradually invalidated these blocks, the deletion would be much easier and 
> faster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15634) Invalidate block on decommissioning DataNode after replication

2020-10-16 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15634:
--
Attachment: write lock.png

> Invalidate block on decommissioning DataNode after replication
> --
>
> Key: HDFS-15634
> URL: https://issues.apache.org/jira/browse/HDFS-15634
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: write lock.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Right now, when a DataNode starts decommissioning, the Namenode marks it as 
> decommissioning, its blocks are replicated over to different DataNodes, and 
> the node is then marked as decommissioned. These blocks are not touched since 
> they are not counted as live replicas.
> Proposal: Invalidate these blocks once they are replicated and there are 
> enough live replicas in the cluster.
> Reason: A recent shutdown of decommissioned datanodes to finish the flow 
> caused a Namenode latency spike, since the namenode needs to remove all of the 
> blocks from its memory and this step requires holding the write lock. If we 
> had gradually invalidated these blocks, the deletion would be much easier and 
> faster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15630) RBF: Fix wrong client IP info in CallerContext when requests mount points with multi-destinations.

2020-10-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215650#comment-17215650
 ] 

Íñigo Goiri commented on HDFS-15630:


In addition to [~ferhui]'s comments, a couple of minor ones.
Let's put the full javadoc for the old invokeMethod signature and just add "Set 
clientIp and callerContext get from server context." at the end of the 
description.


> RBF: Fix wrong client IP info in CallerContext when requests mount points 
> with multi-destinations.
> --
>
> Key: HDFS-15630
> URL: https://issues.apache.org/jira/browse/HDFS-15630
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: Chengwei Wang
>Assignee: Chengwei Wang
>Priority: Major
> Attachments: HDFS-15630.001.patch, HDFS-15630.002.patch, 
> HDFS-15630.003.patch
>
>
> There are two issues about client IP info in CallerContext when we try to 
> request mount points with multi-destinations.
>  # the clientIp would duplicate in CallerContext when 
> RouterRpcClient#invokeSequential.
>  # the clientIp would miss in CallerContext when 
> RouterRpcClient#invokeConcurrent. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14383) Compute datanode load based on StoragePolicy

2020-10-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215648#comment-17215648
 ] 

Íñigo Goiri commented on HDFS-14383:


[^HDFS-14383-02.patch] LGTM.
+1

[~LiJinglun], can you verify this solves your use case too?
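
For reference, the arithmetic from the example in the description below, spelled 
out (illustrative only):

{code}
// The cluster-wide average ignores the StoragePolicy tiers:
//   10 HOT nodes * 50 xceivers + 90 COLD nodes * 10 xceivers = 1400 xceivers
//   avgLoad   = 1400 / 100 = 14
//   threshold = 2 * avgLoad = 28
// Every HOT node (~50 xceivers) exceeds 28, so the whole HOT tier is rejected
// by the load check even though the COLD tier is nearly idle.
{code}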

> Compute datanode load based on StoragePolicy
> 
>
> Key: HDFS-14383
> URL: https://issues.apache.org/jira/browse/HDFS-14383
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Affects Versions: 2.7.3, 3.1.2
>Reporter: Karthik Palanisamy
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14383-01.patch, HDFS-14383-02.patch
>
>
> The Datanode load check logic needs to be changed because the existing 
> computation does not consider StoragePolicy.
> DatanodeManager#getInServiceXceiverAverage
> {code}
> public double getInServiceXceiverAverage() {
>   double avgLoad = 0;
>   final int nodes = getNumDatanodesInService();
>   if (nodes != 0) {
>     final int xceivers = heartbeatManager
>         .getInServiceXceiverCount();
>     avgLoad = (double)xceivers/nodes;
>   }
>   return avgLoad;
> }
> {code}
>  
> For example: with 10 HOT nodes averaging 50 xceivers and 90 COLD nodes 
> averaging 10 xceivers, the threshold calculated by the NN is 28 (((500 + 
> 900)/100)*2), which means those 10 nodes (the whole HOT tier) become 
> unavailable while the COLD-tier nodes are barely in use. Turning this check 
> off helps to mitigate the issue; however, 
> dfs.namenode.replication.considerLoad helps to "balance" the load of the DNs, 
> and turning it off can lead to situations where specific DNs are 
> "overloaded".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency

2020-10-16 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215646#comment-17215646
 ] 

Ahmed Hussein commented on HDFS-15618:
--

Thanks [~kihwal]! I think that's a good idea to have a default value for the 
MiniDFSCluster.
I added a default value in the {{MiniDFSCluster}} builder.

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> Datanode shutdown has very long latency: the block scanner waits for 5 
> minutes to join each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe not to wait for them on Datanode shutdown.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15618) Improve datanode shutdown latency

2020-10-16 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated HDFS-15618:
-
Attachment: HDFS-15618.004.patch

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> Datanode shutdown has very long latency: the block scanner waits for 5 
> minutes to join each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe not to wait for them on Datanode shutdown.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15628) HttpFS server throws NPE if a file is a symlink

2020-10-16 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215497#comment-17215497
 ] 

Kihwal Lee commented on HDFS-15628:
---

I've committed this to trunk, branch-3.3, branch-3.2 and branch-3.1. Thanks for 
working on this, [~ahussein].

> HttpFS server throws NPE if a file is a symlink
> ---
>
> Key: HDFS-15628
> URL: https://issues.apache.org/jira/browse/HDFS-15628
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, httpfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15628.001.patch, HDFS-15628.002.patch
>
>
> If a directory containing a symlink is listed, the client (WebHdfsFileSystem) 
> blows up with an NPE. If {{type}} is {{SYMLINK}}, there must be a {{symlink}} 
> field whose value is the link target string. HttpFS returns a response 
> without the {{symlink}} field. {{WebHdfsFileSystem}} assumes it is there for a 
> symlink and blindly tries to parse it, causing the NPE.
> This is not an issue if the destination cluster does not have symlinks 
> enabled.
>  
> {code:bash}
> java.io.IOException: localhost:55901: Response decoding failure: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathResponseRunner.getResponse(WebHdfsFileSystem.java:967)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:816)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:638)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:676)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:672)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1731)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testListSymLinkStatus(BaseTestHttpFSWith.java:388)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.operation(BaseTestHttpFSWith.java:1230)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testOperation(BaseTestHttpFSWith.java:1363)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.hadoop.test.TestHdfsHelper$HdfsStatement.evaluate(TestHdfsHelper.java:95)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at 
> org.apache.hadoop.test.TestExceptionHelper$1.evaluate(TestExceptionHelper.java:42)
>   at 
> org.apache.hadoop.test.TestJettyHelper$1.evaluate(TestJettyHelper.java:74)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at 

[jira] [Updated] (HDFS-15628) HttpFS server throws NPE if a file is a symlink

2020-10-16 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15628:
--
Fix Version/s: 3.1.5

> HttpFS server throws NPE if a file is a symlink
> ---
>
> Key: HDFS-15628
> URL: https://issues.apache.org/jira/browse/HDFS-15628
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, httpfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15628.001.patch, HDFS-15628.002.patch
>
>
> If a directory containing a symlink is listed, the client (WebHdfsFileSystem) 
> blows up with an NPE. If {{type}} is {{SYMLINK}}, there must be a {{symlink}} 
> field whose value is the link target string. HttpFS returns a response 
> without the {{symlink}} field. {{WebHdfsFileSystem}} assumes it is there for a 
> symlink and blindly tries to parse it, causing the NPE.
> This is not an issue if the destination cluster does not have symlinks 
> enabled.
>  
> {code:bash}
> java.io.IOException: localhost:55901: Response decoding failure: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathResponseRunner.getResponse(WebHdfsFileSystem.java:967)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:816)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:638)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:676)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:672)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1731)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testListSymLinkStatus(BaseTestHttpFSWith.java:388)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.operation(BaseTestHttpFSWith.java:1230)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testOperation(BaseTestHttpFSWith.java:1363)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.hadoop.test.TestHdfsHelper$HdfsStatement.evaluate(TestHdfsHelper.java:95)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at 
> org.apache.hadoop.test.TestExceptionHelper$1.evaluate(TestExceptionHelper.java:42)
>   at 
> org.apache.hadoop.test.TestJettyHelper$1.evaluate(TestJettyHelper.java:74)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 

[jira] [Updated] (HDFS-15627) Audit log deletes before collecting blocks

2020-10-16 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15627:
--
Fix Version/s: 3.2.3
   3.1.5
   3.4.0
   3.3.1
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I've committed this to trunk, branch-3.3, branch-3.2 and branch-3.1. Thanks for 
working on this [~ahussein].

> Audit log deletes before collecting blocks
> --
>
> Key: HDFS-15627
> URL: https://issues.apache.org/jira/browse/HDFS-15627
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: logging, namenode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15627.001.patch
>
>
> Deletes currently collect blocks under the write lock, write the edit, do the 
> incremental block delete, and finally +audit log+. The order should be: 
> collect blocks, edit log, +audit log+, incremental delete. Once the edit is 
> durable it is consistent to audit log the delete; there is no sense in 
> deferring the audit into the indeterminate future.
> The problem showed up when the server hung due to large deletes, but it was 
> not easy to identify the cause. It should have been easily identifiable as 
> the first delete logged after the hang.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15627) Audit log deletes before collecting blocks

2020-10-16 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15627:
--
Summary: Audit log deletes before collecting blocks  (was: Audit log 
deletes after edit is written)

> Audit log deletes before collecting blocks
> --
>
> Key: HDFS-15627
> URL: https://issues.apache.org/jira/browse/HDFS-15627
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: logging, namenode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15627.001.patch
>
>
> Deletes currently collect blocks under the write lock, write the edit, do the 
> incremental block delete, and finally +audit log+. The order should be: 
> collect blocks, edit log, +audit log+, incremental delete. Once the edit is 
> durable it is consistent to audit log the delete; there is no sense in 
> deferring the audit into the indeterminate future.
> The problem showed up when the server hung due to large deletes, but it was 
> not easy to identify the cause. It should have been easily identifiable as 
> the first delete logged after the hang.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15627) Audit log deletes after edit is written

2020-10-16 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215476#comment-17215476
 ] 

Kihwal Lee commented on HDFS-15627:
---

+1 lgtm

> Audit log deletes after edit is written
> ---
>
> Key: HDFS-15627
> URL: https://issues.apache.org/jira/browse/HDFS-15627
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: logging, namenode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15627.001.patch
>
>
> Deletes currently collect blocks under the write lock, write the edit, do the 
> incremental block delete, and finally +audit log+. The order should be: 
> collect blocks, edit log, +audit log+, incremental delete. Once the edit is 
> durable it is consistent to audit log the delete; there is no sense in 
> deferring the audit into the indeterminate future.
> The problem showed up when the server hung due to large deletes, but it was 
> not easy to identify the cause. It should have been easily identifiable as 
> the first delete logged after the hang.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency

2020-10-16 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215454#comment-17215454
 ] 

Kihwal Lee commented on HDFS-15618:
---

It can be as fancy as adding a builder method for setting it (for exceptional 
cases) with a default value of 30 seconds. Or simply set it to 30 in places 
like {{startDataNodes()}}.
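
A rough sketch of the builder-method option (the method, field, and config key 
names below are placeholders, not the actual patch):

{code}
// In MiniDFSCluster.Builder (sketch only):
private int scannerJoinTimeoutSec = 30;              // default used by tests

public Builder setScannerJoinTimeout(int seconds) {  // hypothetical setter
  this.scannerJoinTimeoutSec = seconds;
  return this;
}

// startDataNodes() would then push the value into the DN conf, e.g.
// conf.setInt(<volume scanner join timeout key>, scannerJoinTimeoutSec);
{code}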

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch
>
>
> Datanode shutdown has very long latency: the block scanner waits for 5 
> minutes to join each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe not to wait for them on Datanode shutdown.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency

2020-10-16 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215452#comment-17215452
 ] 

Kihwal Lee commented on HDFS-15618:
---

In production, there is no harm in exiting after waiting for 5 seconds. But in 
junit, as you pointed out, it might cause more failures when the environment is 
slow.  We can set the timeout to something like 30 seconds in the mini dfs 
cluster's base config.

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch
>
>
> Datanode shutdown has very long latency: the block scanner waits for 5 
> minutes to join each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe not to wait for them on Datanode shutdown.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf

2020-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15635?focusedWorklogId=501492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501492
 ]

ASF GitHub Bot logged work on HDFS-15635:
-

Author: ASF GitHub Bot
Created on: 16/Oct/20 10:01
Start Date: 16/Oct/20 10:01
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2389:
URL: https://github.com/apache/hadoop/pull/2389#issuecomment-709951351


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  6s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 14s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 17s |  |  trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  compile  |  17m 49s |  |  trunk passed with JDK 
Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  checkstyle  |   0m 46s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 28s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  18m 37s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  trunk passed with JDK 
Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +0 :ok: |  spotbugs  |   2m 34s |  |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   2m 32s |  |  trunk passed  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 57s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  23m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | -1 :x: |  javac  |  23m 40s | 
[/diff-compile-javac-root-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2389/1/artifact/out/diff-compile-javac-root-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt)
 |  root-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 generated 1 new + 2055 unchanged - 
1 fixed = 2056 total (was 2056)  |
   | +1 :green_heart: |  compile  |  22m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  javac  |  22m 36s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m 49s | 
[/diff-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2389/1/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common-project/hadoop-common: The patch generated 7 new + 8 
unchanged - 0 fixed = 15 total (was 8)  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  |  The patch has no 
whitespace issues.  |
   | +1 :green_heart: |  shadedclient  |  19m 20s |  |  patch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  the patch passed with JDK 
Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  findbugs  |   3m  6s |  |  the patch passed  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  11m  2s | 
[/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2389/1/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 52s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 186m 15s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.fs.viewfs.TestViewFSOverloadSchemeCentralMountTableConfig |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2389/1/artifact/out/Dockerfile
 |
   | GITHUB PR | 

[jira] [Updated] (HDFS-15637) Viewfs should make mount-table to read from central place

2020-10-16 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated HDFS-15637:

Component/s: viewfs

> Viewfs should make mount-table to read from central place
> -
>
> Key: HDFS-15637
> URL: https://issues.apache.org/jira/browse/HDFS-15637
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: viewfs
>Reporter: Junfan Zhang
>Priority: Major
>
> Like [HDFS-15637|https://issues.apache.org/jira/browse/HDFS-15637].
> Viewfs should read the mount-table from a central place, to solve the problem 
> that the mount-table conf is difficult to update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15637) Viewfs should make mount-table to read from central place

2020-10-16 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215296#comment-17215296
 ] 

Junfan Zhang edited comment on HDFS-15637 at 10/16/20, 9:42 AM:


Hi [~umamaheswararao], [~shv], [~abhishekd], [~hexiaoqiao]. Can you please take 
a look at this issue? Thanks. If it is ok, I will take it over.


was (Author: zuston):
Hi [~umamaheswararao]. Can you please take a look at this issue? Thanks. If it 
is ok, I will take it over.

> Viewfs should make mount-table to read from central place
> -
>
> Key: HDFS-15637
> URL: https://issues.apache.org/jira/browse/HDFS-15637
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>
> Like [HDFS-15637|https://issues.apache.org/jira/browse/HDFS-15637].
> Viewfs should read the mount-table from a central place, to solve the problem 
> that the mount-table conf is difficult to update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15637) Viewfs should make mount-table to read from central place

2020-10-16 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215296#comment-17215296
 ] 

Junfan Zhang commented on HDFS-15637:
-

Hi [~umamaheswararao]. Can you please take a look at this issue? Thanks. If it 
is ok, I will take it over.

> Viewfs should make mount-table to read from central place
> -
>
> Key: HDFS-15637
> URL: https://issues.apache.org/jira/browse/HDFS-15637
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>
> Like [HDFS-15637|https://issues.apache.org/jira/browse/HDFS-15637].
> Viewfs should read the mount-table from a central place, to solve the problem 
> that the mount-table conf is difficult to update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15637) Viewfs should make mount-table to read from central place

2020-10-16 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated HDFS-15637:

Description: 
Like [HDFS-15637|https://issues.apache.org/jira/browse/HDFS-15637].

Viewfs should read the mount-table from a central place, to solve the problem 
that the mount-table conf is difficult to update.

> Viewfs should make mount-table to read from central place
> -
>
> Key: HDFS-15637
> URL: https://issues.apache.org/jira/browse/HDFS-15637
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>
> Like [HDFS-15637|https://issues.apache.org/jira/browse/HDFS-15637].
> Viewfs should read the mount-table from a central place, to solve the problem 
> that the mount-table conf is difficult to update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15637) Viewfs should make mount-table to read from central place

2020-10-16 Thread Junfan Zhang (Jira)
Junfan Zhang created HDFS-15637:
---

 Summary: Viewfs should make mount-table to read from central place
 Key: HDFS-15637
 URL: https://issues.apache.org/jira/browse/HDFS-15637
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Junfan Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15634) Invalidate block on decommissioning DataNode after replication

2020-10-16 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215288#comment-17215288
 ] 

Stephen O'Donnell edited comment on HDFS-15634 at 10/16/20, 9:27 AM:
-

{quote}
Proposal: Invalidate these blocks once they are replicated and there are enough 
live replicas in the cluster.
{quote}

Looking at the PR, you are adding these blocks to addToInvalidates(...) which 
will actually remove the replicas from the DNs. 

I am not sure this is a good idea, for a few reasons:

1. Right now, a decommissioned DN is untouched by the process - if something 
goes wrong with decommission (which we have seen happen) we can just 
recommission the node again and know all the blocks are still safely present.

2. I seem to recall there are some edge cases where a decommissioned but still 
online replica can be read.

3. On some clusters, nodes are decommissioned for maintenance (yes, they should 
use maintenance mode, but some don't) such as OS upgrades and then 
recommissioned. In these cases, when the DN rejoins, the blocks will become 
over replicated and then the NN will remove replicas randomly. This is arguably 
better than adding back an empty node, which may require running the balancer 
to move data onto it. If we remove the blocks from the DN while it is 
decommissioning, then on recommission we can only ever add back an empty node. 

{quote}
 A recent shutdown of decommissioned datanodes to finish the flow caused a 
Namenode latency spike, since the namenode needs to remove all of the blocks 
from its memory and this step requires holding the write lock. If we had 
gradually invalidated these blocks, the deletion would be much easier and faster.
{quote}

1. What version were you running when you saw this problem?

2. How many blocks approximately were on the DNs which were stopped after 
decommission completed?

3. How many decommissioned hosts were stopped when this happened?

4. How long did the NN hold the write lock for approximately?

I am wondering if there would be a better way to handle this, possibly yielding 
the write lock periodically while removing the blocks, as this same problem 
would also exist for a node going dead unexpectedly, not just during decommission.
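
To make that suggestion concrete, a minimal self-contained sketch of that kind 
of batched removal is below. It is not NameNode code: the lock is a stand-in 
for the namesystem lock, and the batch size and method names are assumptions.

{code}
import java.util.Iterator;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch: remove a dead/decommissioned node's blocks in bounded
 * batches, releasing the (stand-in) write lock between batches so other
 * operations can make progress. Not actual NameNode code.
 */
public class BatchedBlockRemovalSketch {
  private static final int BLOCKS_PER_LOCK_HOLD = 10_000; // assumed, tunable

  private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();

  public void removeBlocks(Iterator<Long> blockIds) {
    while (blockIds.hasNext()) {
      nsLock.writeLock().lock();
      try {
        int processed = 0;
        while (blockIds.hasNext() && processed < BLOCKS_PER_LOCK_HOLD) {
          removeStoredBlock(blockIds.next());
          processed++;
        }
      } finally {
        // The lock is dropped between batches; queued readers/writers run here.
        nsLock.writeLock().unlock();
      }
    }
  }

  private void removeStoredBlock(long blockId) {
    // Placeholder for the real per-block cleanup done under the lock.
  }
}
{code}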


was (Author: sodonnell):
{quote}
Proposal: Invalidate these blocks once they are replicated and there are enough 
live replicas in the cluster.
{quote}

Looking at the PR, you are adding these blocks to addToInvalidates(...) which 
will actually remove the replicas from the DNs. 

I am not sure this is a good idea, for a few reasons:

1. Right now, a decommissioned DN is untouched by the process - if something 
goes wrong with decommission (which we have seen happen) we can just 
recommission the node again and know all the blocks are still safely present.

2. I seem to recall there are some edge cases where a decommissioned but still 
online replica can be read.

3. On some clusters, nodes are decommissioned for maintenance (yes, they should 
use maintenance mode, but some don't) such as OS upgrades and then 
recommissioned. In these cases, when the DN rejoins, the blocks will become 
over replicated and then the NN will remove replicas randomly. This is arguably 
better than adding back an empty node, which may require running the balancer 
to move data onto it. If we remove the blocks from the DN while it is 
decommissioning, then on recommission we can only ever add back an empty node. 

{quote}
 A recent shutdown of decommissioned datanodes to finish the flow caused a 
Namenode latency spike, since the namenode needs to remove all of the blocks 
from its memory and this step requires holding the write lock. If we had 
gradually invalidated these blocks, the deletion would be much easier and faster.
{quote}

What version were you running when you saw this problem?

How many blocks approximately were on the DNs which were stopped after 
decommission completed?

How many decommissioned hosts were stopped when this happened?

I am wondering if there would be a better way to handle this, possibly yielding 
the write lock periodically while removing the blocks, as this same problem 
would also exist for a node going dead unexpectedly, not just during decommission.

> Invalidate block on decommissioning DataNode after replication
> --
>
> Key: HDFS-15634
> URL: https://issues.apache.org/jira/browse/HDFS-15634
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Right now, when a DataNode starts decommissioning, the Namenode will mark it 
> as decommissioning and its blocks will be replicated over to different 

[jira] [Commented] (HDFS-15634) Invalidate block on decommissioning DataNode after replication

2020-10-16 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215288#comment-17215288
 ] 

Stephen O'Donnell commented on HDFS-15634:
--

{quote}
Proposal: Invalidate these blocks once they are replicated and there are enough 
live replicas in the cluster.
{quote}

Looking at the PR, you are adding these blocks to addToInvalidates(...) which 
will actually remove the replicas from the DNs. 

I am not sure this is a good idea, for a few reasons:

1. Right now, a decommissioned DN is untouched by the process - if something 
goes wrong with decommission (which we have seen happen) we can just 
recommission the node again and know all the blocks are still safely present.

2. I seem to recall there are some edge cases where a decommissioned but still 
online replica can be read.

3. On some clusters, nodes are decommissioned for maintenance (yes, they should 
use maintenance mode, but some don't) such as OS upgrades and then 
recommissioned. In these cases, when the DN rejoins, the blocks will become 
over replicated and then the NN will remove replicas randomly. This is arguably 
better than adding back an empty node, which may require running the balancer 
to move data onto it. If we remove the blocks from the DN while it is 
decommissioning, then on recommission we can only ever add back an empty node. 

{quote}
 A recent shutdown of decommissioned datanodes to finish the flow caused a 
Namenode latency spike, since the namenode needs to remove all of the blocks 
from its memory and this step requires holding the write lock. If we had 
gradually invalidated these blocks, the deletion would be much easier and faster.
{quote}

What version were you running when you saw this problem?

How many blocks approximately were on the DNs which were stopped after 
decommission completed?

How many decommissioned hosts were stopped when this happened?

I am wondering if there would be a better way to handle this, possibly yielding 
the write lock periodically while removing the blocks, as this same problem 
would also exist for a node going dead unexpectedly, not just during decommission.

> Invalidate block on decommissioning DataNode after replication
> --
>
> Key: HDFS-15634
> URL: https://issues.apache.org/jira/browse/HDFS-15634
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Right now, when a DataNode starts decommissioning, the Namenode will mark it 
> as decommissioning and its blocks will be replicated over to different 
> DataNodes; the node is then marked as decommissioned. These blocks are not 
> touched since they are not counted as live replicas.
> Proposal: Invalidate these blocks once they are replicated and there are 
> enough live replicas in the cluster.
> Reason: A recent shutdown of decommissioned datanodes to finish the flow 
> caused a Namenode latency spike, since the namenode needs to remove all of 
> the blocks from its memory and this step requires holding the write lock. If 
> we had gradually invalidated these blocks, the deletion would be much easier 
> and faster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf

2020-10-16 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated HDFS-15635:

Description: 
According to HDFS-15289, the default mount table loader is 
{{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}.

In some scenarios, users want to implement the mount table loader by 
themselves, so it is necessary to dynamically configure the loader.

 

cc [~shv], [~abhishekd], [~hexiaoqiao]
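
As a rough sketch of what the proposal might look like (not code from any 
patch): the configuration key below is an assumption, not an existing Hadoop 
key, and it assumes the MountTableConfigLoader interface and the default 
HCFSMountTableConfigLoader from HDFS-15289 are accessible to the caller. The 
loader class could be resolved via Configuration#getClass with the current 
loader as the default.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.viewfs.HCFSMountTableConfigLoader;
import org.apache.hadoop.fs.viewfs.MountTableConfigLoader;
import org.apache.hadoop.util.ReflectionUtils;

public class MountTableLoaderFactorySketch {
  /** Hypothetical key; not part of the current Hadoop configuration. */
  public static final String LOADER_IMPL_KEY = "fs.viewfs.mounttable.loader.impl";

  public static MountTableConfigLoader createLoader(Configuration conf) {
    Class<? extends MountTableConfigLoader> clazz = conf.getClass(
        LOADER_IMPL_KEY,
        HCFSMountTableConfigLoader.class, // current default from HDFS-15289
        MountTableConfigLoader.class);
    // ReflectionUtils injects the Configuration if the loader is Configurable.
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}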

  was:
According to HDFS-15289, the default mount table loader is 
{{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}.

In some scenarios, users want to implement the mount table loader by 
themselves, so it is necessary to dynamically configure the loader.

 

cc [~shv], [~abhishekd]


> ViewFileSystemOverloadScheme support specifying mount table loader imp 
> through conf
> ---
>
> Key: HDFS-15635
> URL: https://issues.apache.org/jira/browse/HDFS-15635
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: viewfsOverloadScheme
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to HDFS-15289, the default mount table loader is 
> {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}.
> In some scenarios, users want to implement the mount table loader by 
> themselves, so it is necessary to dynamically configure the loader.
>  
> cc [~shv], [~abhishekd], [~hexiaoqiao]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf

2020-10-16 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215223#comment-17215223
 ] 

Junfan Zhang edited comment on HDFS-15635 at 10/16/20, 8:26 AM:


Hi [~umamaheswararao], can you please take a look at the PR? Thanks.


was (Author: zuston):
Hi [~umamaheswararao], please review it. Thanks!

> ViewFileSystemOverloadScheme support specifying mount table loader imp 
> through conf
> ---
>
> Key: HDFS-15635
> URL: https://issues.apache.org/jira/browse/HDFS-15635
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: viewfsOverloadScheme
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to HDFS-15289, the default mount table loader is 
> {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}.
> In some scenarios, users want to implement the mount table loader by 
> themselves, so it is necessary to dynamically configure the loader.
>  
> cc [~shv], [~abhishekd]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf

2020-10-16 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated HDFS-15635:

Description: 
According to HDFS-15289, the default mount table loader is 
{{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}.

In some scenarios, users want to implement the mount table loader by 
themselves, so it is necessary to dynamically configure the loader.

 

cc [~shv], [~abhishekd]

  was:
According to HDFS-15289, the default mount table loader is 
{{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}.
 

In some scenarios, users want to implement the mount table loader by 
themselves, so it is necessary to dynamically configure the loader.


> ViewFileSystemOverloadScheme support specifying mount table loader imp 
> through conf
> ---
>
> Key: HDFS-15635
> URL: https://issues.apache.org/jira/browse/HDFS-15635
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: viewfsOverloadScheme
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to HDFS-15289, the default mount table loader is 
> {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}.
> In some scenarios, users want to implement the mount table loader by 
> themselves, so it is necessary to dynamically configure the loader.
>  
> cc [~shv], [~abhishekd]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15636) NameNode computes load by group when choosing datanodes.

2020-10-16 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun resolved HDFS-15636.

Resolution: Duplicate

Duplicate of HDFS-14383.

> NameNode computes load by group when choosing datanodes.
> 
>
> Key: HDFS-15636
> URL: https://issues.apache.org/jira/browse/HDFS-15636
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
>
> We have an HDFS cluster used for HBase with 251 ssd datanodes and 30 hdd 
> datanodes. The hot files are stored with ALL_SSD and the cold ones are stored 
> with HOT. There is a big chance the NameNode can't choose enough nodes for 
> writing disk files (with storage policy HOT) because of 'NODE_TOO_BUSY'. A 
> temporary solution is to increase 
> 'dfs.namenode.redundancy.considerLoad.factor', but that may unbalance the 
> load across all the datanodes.
> We should let the NameNode compute load by group. The ssd nodes and hdd nodes 
> are counted separately and each group has its own average load. When the 
> NameNode chooses an hdd node, it only compares the node's load with the 
> average load of the hdd group.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14383) Compute datanode load based on StoragePolicy

2020-10-16 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215242#comment-17215242
 ] 

Jinglun commented on HDFS-14383:


I met the same problem recently. This patch makes sense to me. Thanks 
[~ayushtkn] for your work.

> Compute datanode load based on StoragePolicy
> 
>
> Key: HDFS-14383
> URL: https://issues.apache.org/jira/browse/HDFS-14383
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Affects Versions: 2.7.3, 3.1.2
>Reporter: Karthik Palanisamy
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14383-01.patch, HDFS-14383-02.patch
>
>
> Datanode load check logic needs to be changed because the existing 
> computation does not consider the StoragePolicy.
> DatanodeManager#getInServiceXceiverAverage
> {code}
> public double getInServiceXceiverAverage() {
>   double avgLoad = 0;
>   final int nodes = getNumDatanodesInService();
>   if (nodes != 0) {
>     final int xceivers = heartbeatManager
>         .getInServiceXceiverCount();
>     avgLoad = (double) xceivers / nodes;
>   }
>   return avgLoad;
> }
> {code}
>  
> For example: with 10 nodes (HOT) averaging 50 xceivers and 90 nodes (COLD) 
> averaging 10 xceivers, the threshold calculated by the NN is 28 (((500 + 
> 900)/100)*2), which means those 10 nodes (the whole HOT tier) become 
> unavailable while the COLD tier nodes are barely in use. Turning this check 
> off helps to mitigate the issue; however, 
> dfs.namenode.replication.considerLoad helps to "balance" the load of the DNs, 
> so turning it off can lead to situations where specific DNs become 
> "overloaded".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15636) NameNode computes load by group when choosing datanodes.

2020-10-16 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215232#comment-17215232
 ] 

Jinglun commented on HDFS-15636:


Hi [~ayushtkn], it does seem related. I'll give it a check.

> NameNode computes load by group when choosing datanodes.
> 
>
> Key: HDFS-15636
> URL: https://issues.apache.org/jira/browse/HDFS-15636
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
>
> We have an HDFS cluster used for HBase with 251 ssd datanodes and 30 hdd 
> datanodes. The hot files are stored with ALL_SSD and the cold ones are stored 
> with HOT. There is a big chance the NameNode can't choose enough nodes for 
> writing disk files (with storage policy HOT) because of 'NODE_TOO_BUSY'. A 
> temporary solution is to increase 
> 'dfs.namenode.redundancy.considerLoad.factor', but that may unbalance the 
> load across all the datanodes.
> We should let the NameNode compute load by group. The ssd nodes and hdd nodes 
> are counted separately and each group has its own average load. When the 
> NameNode chooses an hdd node, it only compares the node's load with the 
> average load of the hdd group.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15636) NameNode computes load by group when choosing datanodes.

2020-10-16 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215231#comment-17215231
 ] 

Jinglun commented on HDFS-15636:


We can implement this in two steps:
 # Let the NameNode support computing load by group. The NameNode should 
resolve the group for each datanode and count 'nodes in service' and 'xceiver 
count' for each group (in DatanodeStats). A rough sketch of this step is shown 
below, after the list.
 # Add a new BlockPlacementPolicy that considers load by group.
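
For illustration, here is a minimal self-contained sketch of the first step; 
the class name, group keys, and methods are hypothetical and this is not 
actual DatanodeStats code.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Illustrative only: track 'nodes in service' and 'xceiver count' per group
 * (e.g. "SSD" vs "HDD") so the average load can be computed per group rather
 * than across the whole cluster.
 */
public class GroupedLoadStatsSketch {
  private static final class GroupStats {
    final LongAdder nodesInService = new LongAdder();
    final LongAdder xceiverCount = new LongAdder();
  }

  private final Map<String, GroupStats> groups = new ConcurrentHashMap<>();

  /** Called when a datanode of the given group enters service. */
  public void add(String group, int xceivers) {
    GroupStats s = groups.computeIfAbsent(group, g -> new GroupStats());
    s.nodesInService.increment();
    s.xceiverCount.add(xceivers);
  }

  /** Called when a datanode of the given group leaves service. */
  public void subtract(String group, int xceivers) {
    GroupStats s = groups.get(group);
    if (s != null) {
      s.nodesInService.decrement();
      s.xceiverCount.add(-xceivers);
    }
  }

  /** Average xceivers per in-service node of the group, or 0 if it is empty. */
  public double getInServiceXceiverAverage(String group) {
    GroupStats s = groups.get(group);
    if (s == null) {
      return 0;
    }
    long nodes = s.nodesInService.sum();
    return nodes == 0 ? 0 : (double) s.xceiverCount.sum() / nodes;
  }
}
{code}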

> NameNode computes load by group when choosing datanodes.
> 
>
> Key: HDFS-15636
> URL: https://issues.apache.org/jira/browse/HDFS-15636
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
>
> We have an HDFS cluster used for HBase with 251 ssd datanodes and 30 hdd 
> datanodes. The hot files are stored with ALL_SSD and the cold ones are stored 
> with HOT. There is a big chance the NameNode can't choose enough nodes for 
> writing disk files (with storage policy HOT) because of 'NODE_TOO_BUSY'. A 
> temporary solution is to increase 
> 'dfs.namenode.redundancy.considerLoad.factor', but that may unbalance the 
> load across all the datanodes.
> We should let the NameNode compute load by group. The ssd nodes and hdd nodes 
> are counted separately and each group has its own average load. When the 
> NameNode chooses an hdd node, it only compares the node's load with the 
> average load of the hdd group.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15636) NameNode computes load by group when choosing datanodes.

2020-10-16 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215228#comment-17215228
 ] 

Ayush Saxena commented on HDFS-15636:
-

Similar to HDFS-14383?

> NameNode computes load by group when choosing datanodes.
> 
>
> Key: HDFS-15636
> URL: https://issues.apache.org/jira/browse/HDFS-15636
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
>
> We have an HDFS cluster used for HBase with 251 ssd datanodes and 30 hdd 
> datanodes. The hot files are stored with ALL_SSD and the cold ones are stored 
> with HOT. There is a big chance the NameNode can't choose enough nodes for 
> writing disk files (with storage policy HOT) because of 'NODE_TOO_BUSY'. A 
> temporary solution is to increase 
> 'dfs.namenode.redundancy.considerLoad.factor', but that may unbalance the 
> load across all the datanodes.
> We should let the NameNode compute load by group. The ssd nodes and hdd nodes 
> are counted separately and each group has its own average load. When the 
> NameNode chooses an hdd node, it only compares the node's load with the 
> average load of the hdd group.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15636) NameNode computes load by group when choosing datanodes.

2020-10-16 Thread Jinglun (Jira)
Jinglun created HDFS-15636:
--

 Summary: NameNode computes load by group when choosing datanodes.
 Key: HDFS-15636
 URL: https://issues.apache.org/jira/browse/HDFS-15636
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jinglun


We have an HDFS cluster used for HBase with 251 ssd datanodes and 30 hdd 
datanodes. The hot files are stored with ALL_SSD and the cold ones are stored 
with HOT. There is a big chance the NameNode can't choose enough nodes for 
writing disk files (with storage policy HOT) because of 'NODE_TOO_BUSY'. A 
temporary solution is to increase 
'dfs.namenode.redundancy.considerLoad.factor', but that may unbalance the load 
across all the datanodes.
We should let the NameNode compute load by group. The ssd nodes and hdd nodes 
are counted separately and each group has its own average load. When the 
NameNode chooses an hdd node, it only compares the node's load with the 
average load of the hdd group.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15636) NameNode computes load by group when choosing datanodes.

2020-10-16 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun reassigned HDFS-15636:
--

Assignee: Jinglun

> NameNode computes load by group when choosing datanodes.
> 
>
> Key: HDFS-15636
> URL: https://issues.apache.org/jira/browse/HDFS-15636
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
>
> We have an HDFS cluster used for HBase with 251 ssd datanodes and 30 hdd 
> datanodes. The hot files are stored with ALL_SSD and the cold ones are stored 
> with HOT. There is a big chance the NameNode can't choose enough nodes for 
> writing disk files (with storage policy HOT) because of 'NODE_TOO_BUSY'. A 
> temporary solution is to increase 
> 'dfs.namenode.redundancy.considerLoad.factor', but that may unbalance the 
> load across all the datanodes.
> We should let the NameNode compute load by group. The ssd nodes and hdd nodes 
> are counted separately and each group has its own average load. When the 
> NameNode chooses an hdd node, it only compares the node's load with the 
> average load of the hdd group.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf

2020-10-16 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215223#comment-17215223
 ] 

Junfan Zhang commented on HDFS-15635:
-

Hi [~umamaheswararao], please review it. Thanks!

> ViewFileSystemOverloadScheme support specifying mount table loader imp 
> through conf
> ---
>
> Key: HDFS-15635
> URL: https://issues.apache.org/jira/browse/HDFS-15635
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: viewfsOverloadScheme
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to HDFS-15289, the default mount table loader is 
> {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}.
>  
> In some scenarios, users want to implement the mount table loader by 
> themselves, so it is necessary to dynamically configure the loader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf

2020-10-16 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated HDFS-15635:

External issue ID: https://github.com/apache/hadoop/pull/2389

> ViewFileSystemOverloadScheme support specifying mount table loader imp 
> through conf
> ---
>
> Key: HDFS-15635
> URL: https://issues.apache.org/jira/browse/HDFS-15635
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: viewfsOverloadScheme
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to HDFS-15289, the default mount table loader is 
> {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}.
>  
> In some scenarios, users want to implement the mount table loader by 
> themselves, so it is necessary to dynamically configure the loader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf

2020-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15635?focusedWorklogId=501439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501439
 ]

ASF GitHub Bot logged work on HDFS-15635:
-

Author: ASF GitHub Bot
Created on: 16/Oct/20 06:54
Start Date: 16/Oct/20 06:54
Worklog Time Spent: 10m 
  Work Description: zuston opened a new pull request #2389:
URL: https://github.com/apache/hadoop/pull/2389


   
   Link to [HDFS-15635](https://issues.apache.org/jira/browse/HDFS-15635)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 501439)
Remaining Estimate: 0h
Time Spent: 10m

> ViewFileSystemOverloadScheme support specifying mount table loader imp 
> through conf
> ---
>
> Key: HDFS-15635
> URL: https://issues.apache.org/jira/browse/HDFS-15635
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: viewfsOverloadScheme
>Reporter: Junfan Zhang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to HDFS-15289, the default mount table loader is 
> {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}.
>  
> In some scenarios, users want to implement the mount table loader by 
> themselves, so it is necessary to dynamically configure the loader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf

2020-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-15635:
--
Labels: pull-request-available  (was: )

> ViewFileSystemOverloadScheme support specifying mount table loader imp 
> through conf
> ---
>
> Key: HDFS-15635
> URL: https://issues.apache.org/jira/browse/HDFS-15635
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: viewfsOverloadScheme
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to HDFS-15289, the default mount table loader is 
> {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}.
>  
> In some scenarios, users want to implement the mount table loader by 
> themselves, so it is necessary to dynamically configure the loader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf

2020-10-16 Thread Junfan Zhang (Jira)
Junfan Zhang created HDFS-15635:
---

 Summary: ViewFileSystemOverloadScheme support specifying mount 
table loader imp through conf
 Key: HDFS-15635
 URL: https://issues.apache.org/jira/browse/HDFS-15635
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: viewfsOverloadScheme
Reporter: Junfan Zhang


According to HDFS-15289, the default mount table loader is 
{{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}.
 

In some scenarios, users want to implement the mount table loader by 
themselves, so it is necessary to dynamically configure the loader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15634) Invalidate block on decommissioning DataNode after replication

2020-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15634?focusedWorklogId=501431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501431
 ]

ASF GitHub Bot logged work on HDFS-15634:
-

Author: ASF GitHub Bot
Created on: 16/Oct/20 06:05
Start Date: 16/Oct/20 06:05
Worklog Time Spent: 10m 
  Work Description: fengnanli commented on pull request #2388:
URL: https://github.com/apache/hadoop/pull/2388#issuecomment-709817743


   Will put up another patch with a UT soon.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 501431)
Time Spent: 1h  (was: 50m)

> Invalidate block on decommissioning DataNode after replication
> --
>
> Key: HDFS-15634
> URL: https://issues.apache.org/jira/browse/HDFS-15634
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Right now, when a DataNode starts decommissioning, the Namenode will mark it 
> as decommissioning and its blocks will be replicated over to different 
> DataNodes; the node is then marked as decommissioned. These blocks are not 
> touched since they are not counted as live replicas.
> Proposal: Invalidate these blocks once they are replicated and there are 
> enough live replicas in the cluster.
> Reason: A recent shutdown of decommissioned datanodes to finish the flow 
> caused a Namenode latency spike, since the namenode needs to remove all of 
> the blocks from its memory and this step requires holding the write lock. If 
> we had gradually invalidated these blocks, the deletion would be much easier 
> and faster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15634) Invalidate block on decommissioning DataNode after replication

2020-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15634?focusedWorklogId=501429=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501429
 ]

ASF GitHub Bot logged work on HDFS-15634:
-

Author: ASF GitHub Bot
Created on: 16/Oct/20 06:04
Start Date: 16/Oct/20 06:04
Worklog Time Spent: 10m 
  Work Description: fengnanli commented on a change in pull request #2388:
URL: https://github.com/apache/hadoop/pull/2388#discussion_r506070581



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
##
@@ -3512,7 +3512,11 @@ private Block addStoredBlock(final BlockInfo block,
 int numUsableReplicas = num.liveReplicas() +
 num.decommissioning() + num.liveEnteringMaintenanceReplicas();
 
-if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
+
+// if block is still under construction, then done for now
+if (!storedBlock.isCompleteOrCommitted()) {

Review comment:
   I felt quite confused by the original structure, since the early return 
was put after the statements it is trying to avoid.
   I can make it a single return, no big deal.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 501429)
Time Spent: 40m  (was: 0.5h)

> Invalidate block on decommissioning DataNode after replication
> --
>
> Key: HDFS-15634
> URL: https://issues.apache.org/jira/browse/HDFS-15634
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Right now, when a DataNode starts decommissioning, the Namenode will mark it 
> as decommissioning and its blocks will be replicated over to different 
> DataNodes; the node is then marked as decommissioned. These blocks are not 
> touched since they are not counted as live replicas.
> Proposal: Invalidate these blocks once they are replicated and there are 
> enough live replicas in the cluster.
> Reason: A recent shutdown of decommissioned datanodes to finish the flow 
> caused a Namenode latency spike, since the namenode needs to remove all of 
> the blocks from its memory and this step requires holding the write lock. If 
> we had gradually invalidated these blocks, the deletion would be much easier 
> and faster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15634) Invalidate block on decommissioning DataNode after replication

2020-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15634?focusedWorklogId=501430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501430
 ]

ASF GitHub Bot logged work on HDFS-15634:
-

Author: ASF GitHub Bot
Created on: 16/Oct/20 06:04
Start Date: 16/Oct/20 06:04
Worklog Time Spent: 10m 
  Work Description: fengnanli commented on a change in pull request #2388:
URL: https://github.com/apache/hadoop/pull/2388#discussion_r506070741



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
##
@@ -3559,9 +3558,26 @@ private Block addStoredBlock(final BlockInfo block,
 if ((corruptReplicasCount > 0) && (numLiveReplicas >= fileRedundancy)) {
   invalidateCorruptReplicas(storedBlock, reportedBlock, num);
 }
+if (shouldInvalidateDecommissionedRedundancy(num, fileRedundancy)) {
+  for (DatanodeStorageInfo storage : blocksMap.getStorages(block)) {
+final DatanodeDescriptor datanode = storage.getDatanodeDescriptor();
+if (datanode.isDecommissioned()
+|| datanode.isDecommissionInProgress()) {
+  addToInvalidates(storedBlock, datanode);
+}
+  }
+}
 return storedBlock;
   }
 
+  // If there are enough live replicas, start invalidating
+  // decommissioned + decommissioning replicas
+  private boolean shouldInvalidateDecommissionedRedundancy(NumberReplicas num,

Review comment:
   Good idea.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 501430)
Time Spent: 50m  (was: 40m)

> Invalidate block on decommissioning DataNode after replication
> --
>
> Key: HDFS-15634
> URL: https://issues.apache.org/jira/browse/HDFS-15634
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Right now, when a DataNode starts decommissioning, the Namenode will mark it 
> as decommissioning and its blocks will be replicated over to different 
> DataNodes; the node is then marked as decommissioned. These blocks are not 
> touched since they are not counted as live replicas.
> Proposal: Invalidate these blocks once they are replicated and there are 
> enough live replicas in the cluster.
> Reason: A recent shutdown of decommissioned datanodes to finish the flow 
> caused a Namenode latency spike, since the namenode needs to remove all of 
> the blocks from its memory and this step requires holding the write lock. If 
> we had gradually invalidated these blocks, the deletion would be much easier 
> and faster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org